Shape complementarity at protein-protein interfaces

June 15, 2017 | Author: Raquel Norel | Category: Algorithms, Biological Sciences, Biopolymers, Proteins, Chemical Sciences, Protein Conformation


RAQUEL NOREL,¹ SHUO L. LIN,² HAIM J. WOLFSON,¹ and RUTH NUSSINOV²,³,*

¹Computer Science Department, School of Mathematical Sciences, Tel Aviv University, Tel Aviv 69978, Israel; ²Laboratory of Mathematical Biology, PRI/Dynacorp, NCI-FCRF, Bldg. 469, Rm. 151, Frederick, Maryland 21702, USA; and ³Sackler Institute of Molecular Medicine, Faculty of Medicine, Tel Aviv University, Tel Aviv 69978, Israel

SYNOPSIS

A matching algorithm using surface complementarity between receptor and ligand protein molecules is outlined. The molecular surfaces are represented by "critical points," describing holes and knobs. Holes (maxima of a shape function) are matched with knobs (minima). This simple and appealing surface representation has been previously described by Connolly [(1986) Biopolymers, Vol. 25, pp. 1229-1247]. However, attempts to implement this description in a docking scheme have been unsuccessful (e.g., Connolly, ibid.). In order to decrease the combinatorial complexity and to make the execution time affordable, four critical hole/knob point matches were sought. This approach failed because some bound interfaces are relatively flat and do not possess four critical point matches. On the other hand, matching fewer critical points requires a very time-consuming, full conformational (grid) space search [Wang (1991) Journal of Computational Chemistry, Vol. 12, pp. 746-750]. Here we show that, despite the initial failure of this approach, a simple and straightforward modification of the matching algorithm makes this surface representation work well. Of the 16 protein-protein complexes we have tried, 15 were successfully docked, including two immunoglobulins. The entire molecular surfaces were considered, with absolutely no additional information regarding the binding sites. The whole process is completely automated, with no manual intervention in either the input atomic coordinate data or the matching. We have been able to reach this level of performance with the hole/knob surface description by using pairs of critical points, along with their surface normals, in the calculation of the transformation matrix. The success of this approach suggests that future docking methods should use geometric docking as the first screening filter. Because a geometrically based docking methodology predicts correct, along with incorrect, receptor-ligand bound conformations, all solutions need to undergo energy screening to differentiate between them. © 1994 John Wiley & Sons, Inc.

INTRODUCTION

The problem of protein-protein recognition is exceedingly complex. Two major issues are involved: the first concerns the geometrical fitting of the molecular surfaces, whereas the second takes into account chemical interactions. Here we treat the first of these issues. Geometrical fitting of two molecules necessitates proper surface representation. In order to efficiently and accurately dock a ligand onto a receptor surface, an adequate description of their respective molecular surfaces is required. Molecular surfaces are often described by dots. A dot description is convenient, since it enables matching corresponding dots on the two molecular surfaces. However, dot-surface matching may also present some difficulties. In order to have a complete surface contour description, a dense dot output is preferable; on the other hand, attempting to match too many dots may easily result in a combinatorial explosion. Furthermore, if the dots are too close to each other, the distances between them might fall within the error range of matching a ligand dot with a receptor dot. One way to deal with this problem is to develop a sparse dot surface representation that properly describes the salient topographical features of the molecular surface, such as peaks of hills and bottoms of pits.

Biopolymers, Vol. 34, 933-940 (1994)
© 1994 John Wiley & Sons, Inc. CCC 0006-3525/94/070933-08

* To whom correspondence should be addressed at NCI-FCRF, Bldg. 469, Rm. 151, Frederick, MD 21702.

Several years ago, M. Connolly developed a surface representation geared toward this goal.³ Starting with a dense dot description (with the density determined by the user), many dots are then weeded out. Only critical points, describing local knobs and holes, are retained. A knob is a local minimum of the shape function; a depression (hole) is a local maximum. The values of the shape function at these points are measured by constructing spheres of a given radius at each of these points. The areas of the spheres contained within the protein are measured and divided by the square of the radius, giving a solid angle. In order to qualify as knobs (holes), i.e., minima (maxima), the resulting solid angles have to be below (above) a given threshold.

Connolly³ proceeded to use this surface representation in his docking scheme, which consisted of three phases: (a) for each knob (hole) on one molecular surface, a list of matching holes (knobs) on the other is created; (b) matchings between quartets of knobs and holes on the molecular surfaces of the ligand and the receptor are sought; (c) potential, geometrically matched solutions are screened for overlaps between the two molecules in other regions. The rationale behind requiring four knob/hole point matches is combinatorial: the exponentially growing combinatorial tree is pruned once mismatched point pairs are detected. Three points define a rotation and a translation; the fourth point specifies the chirality.* Once a four-point match is detected, a least-squares fit between the matching pairs is calculated. In addition to transforming the coordinates of one quartet with respect to the other, their corresponding outward-pointing unit vectors are transformed as well.
The angles between each of the vectors of the first quartet and their corresponding transformed vectors of the second quartet are required to be within a predetermined error threshold (180° ± 20°). Using this scheme, Connolly succeeded in correctly docking the α/β hemoglobin subunit surfaces. However, at the interface between trypsin and its inhibitor, there are no four matching knob/hole point pairs that satisfy the minima/maxima requirement.⁴ Consequently, correct docking of trypsin with the trypsin inhibitor was not achieved. Based on similar principles, H. Wang⁴ has proposed matching only one knob with one hole. Although this approach succeeded in matching both the two hemoglobin subunits and trypsin with the trypsin inhibitor, the time requirement was prohibitive. Matching one critical-point pair determines the three translational parameters. Two of the rotational parameters are given by superimposing the normal axes. The remaining sixth degree of freedom is dealt with by carrying out a grid search. A main result of Wang's study is the demonstration of the reason for Connolly's failure: namely, there are no four matching maxima/minima point pairs between trypsin and the trypsin inhibitor.

* Actually, the fourth point is redundant in determining a three-dimensional (3D) transformation; a matching ordered triplet of noncollinear points has all the required information. Adding a fourth point constitutes a stricter matching condition.

Our approach follows the two stages of the Connolly molecular surface representation. In the first stage, Connolly's Molecular Surface (MS) program¹,² is employed. This program is based on Richards' definition of a molecular surface.⁵ A water molecule, depicted as a ball 1.4 Å in radius, is rolled over the van der Waals atomic surfaces of the receptor and ligand molecules. Narrow crevices are bridged, and the surface is described in terms of concave (where the water sphere touches three atoms), convex (one atom is touched), and saddle (the water molecule touches two atoms) regions. In the second stage, the output dots obtained during the construction of the surface are filtered as detailed below. Despite the similarity to Connolly's description of the molecular surfaces, we have successfully docked 15 protein-protein complexes. In all of these cases the rms deviation between the calculated and crystal complexes is under 3.2 Å. The trypsin/trypsin inhibitor complex, as well as two immunoglobulin-lysozyme complexes, are among these. We failed to predict the docking of one complex (4SGB, serine proteinase B with its potato inhibitor). For each of the complexes the matching required at most 61 min on a SUN SPARC workstation.
Far more time was spent on scoring the large number of generated geometric solutions (up to 4.5 h for 3HFM, the immunoglobulin G1 Fab fragment with lysozyme as its antigen). There is no predefinition of the binding sites on either the receptor or the ligand. Furthermore, in 11 of the 15 complexes the correct solutions rank in the top 1% of all potential geometrically matched solutions; only for one of the correctly predicted complexes does the best solution rank outside the top 4% of all potential solutions. We have been able to achieve this level of performance with a previously largely unsuccessful critical point representation owing to our matching algorithm. In particular, while we use only two critical points for matching the surfaces, we have avoided a full conformational grid search. By making use of the two surface normals at the critical points, the transformation can be computed straightforwardly. Before calculating the least-squares fit and the transformation matrix, the matching of the surfaces at the points is checked, reducing the number of potential transformations.

The results obtained here demonstrate the validity of the original approach conceived by Connolly.³ His critical point surface representation is logical. It is unfortunate that, owing to the failure of his docking scheme, that approach has to a large extent been abandoned. With a simple, logical twist, the scheme can be made to work. There are, however, two shortcomings that we have still encountered. First, owing to the rigid criteria used in the definition of a critical point (see below for details), the number of critical points is quite small. Thus, while their number still suffices for protein-protein docking, small ligands are inadequately described; attempts to dock small ligands, such as drugs or cofactors, onto large protein receptors have been unsuccessful. Second, not all receptor-ligand docking calculations have resulted in correct solutions. Furthermore, even in the 15 (out of 16) complexes where correct docking solutions have been achieved, one would like (a) to reduce the number of potential solutions and (b) to improve the rotation and translation, and thus achieve better rms deviations. It is these goals, along with improved execution times, that current first-stage geometrical docking approaches should preferably address.⁶ Optimizing the geometrical fit, as well as obtaining fewer solutions, will result in less time being spent on the next stage, the assessment of the energy.

METHODS

Critical Points

Using Connolly's Molecular Surface (MS) algorithm, the dot surfaces of the receptor and ligand molecules are computed.¹,² A small subset of the molecular surface dots generated by the MS program is selected; these dots represent critical points on the molecular surfaces. Our method for selecting the critical points is very similar to the one described by Connolly.³ The atomic coordinates are taken from the Protein Data Bank (PDB).⁷ It is simpler to work with a discrete representation of the molecule. For this purpose, after generating the molecular surface, the molecule is mapped onto a 3D grid. Each grid element is called a voxel (volume element). We are working with a grid size of 0.25 Å.
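The grid mapping described above, together with the voxel-counting shape function defined in the next paragraph, can be illustrated with a short NumPy sketch. The function names (`voxelize`, `shape_function`, `local_extrema`) are hypothetical, atom radii are assumed to be passed in already enlarged by the probe radius, a voxel is assumed to be interior when its centre lies inside an atom sphere, and the knob/hole thresholds are left to the caller because the sketch returns only the count and V. It is an illustration under those assumptions, not the authors' code.

```python
import numpy as np


def voxelize(centers, radii, spacing=0.25, pad=6.5):
    """Map spheres onto a boolean 3D grid; True marks an interior voxel.

    Assumption: a voxel is interior when its centre lies inside any sphere.
    `pad` (in A) leaves room for a 6 A ball around any surface point.
    """
    centers = np.asarray(centers, dtype=float)
    radii = np.asarray(radii, dtype=float)
    origin = centers.min(axis=0) - radii.max() - pad
    upper = centers.max(axis=0) + radii.max() + pad
    shape = np.ceil((upper - origin) / spacing).astype(int) + 1
    grid = np.zeros(shape, dtype=bool)
    for c, r in zip(centers, radii):
        # Only visit the bounding box of this sphere.
        lo = np.maximum(np.floor((c - r - origin) / spacing).astype(int), 0)
        hi = np.minimum(np.ceil((c + r - origin) / spacing).astype(int) + 1, shape)
        axes = [origin[k] + spacing * np.arange(lo[k], hi[k]) for k in range(3)]
        x, y, z = np.meshgrid(*axes, indexing="ij")
        inside = (x - c[0]) ** 2 + (y - c[1]) ** 2 + (z - c[2]) ** 2 <= r * r
        grid[lo[0]:hi[0], lo[1]:hi[1], lo[2]:hi[2]] |= inside
    return grid, origin


def shape_function(grid, origin, spacing, point, radius=6.0):
    """Return (count, ball_voxels, normal) for one surface point.

    count       -- interior voxels inside the ball (the shape function)
    ball_voxels -- voxels in the whole ball (V), so count / V is the filled fraction
    normal      -- unit vector from the centre of mass of the intersected
                   interior voxels towards the point (points out of the molecule)
    Assumes the ball intersects at least one interior voxel.
    """
    p = np.asarray(point, dtype=float)
    lo = np.maximum(np.floor((p - radius - origin) / spacing).astype(int), 0)
    hi = np.minimum(np.ceil((p + radius - origin) / spacing).astype(int) + 1,
                    grid.shape)
    axes = [origin[k] + spacing * np.arange(lo[k], hi[k]) for k in range(3)]
    x, y, z = np.meshgrid(*axes, indexing="ij")
    in_ball = (x - p[0]) ** 2 + (y - p[1]) ** 2 + (z - p[2]) ** 2 <= radius ** 2
    hit = in_ball & grid[lo[0]:hi[0], lo[1]:hi[1], lo[2]:hi[2]]
    centre = np.array([x[hit].mean(), y[hit].mean(), z[hit].mean()])
    normal = p - centre
    return int(hit.sum()), int(in_ball.sum()), normal / np.linalg.norm(normal)


def local_extrema(points, values, candidate, k=12, find_max=True):
    """Indices of candidates whose value is the max (min) among their k nearest neighbours."""
    pts = np.asarray(points, dtype=float)
    vals = np.asarray(values, dtype=float)
    d2 = ((pts[:, None, :] - pts[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)            # a point is not its own neighbour
    neighbours = np.argsort(d2, axis=1)[:, :k]
    keep = []
    for i in np.flatnonzero(candidate):
        nb = vals[neighbours[i]]
        if (vals[i] >= nb.max()) if find_max else (vals[i] <= nb.min()):
            keep.append(int(i))
    return keep
```

A knob would then be a dot whose fraction count/V is below the chosen knob threshold and which survives `local_extrema(..., find_max=False)`; holes use the upper threshold and `find_max=True`. Neighbours are taken here among all surface dots, which is one possible reading of the 12-nearest-neighbour rule.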

A voxel occupied by any atom is designated as an interior voxel. To fill the vacancies between atoms in the interior of the molecule, the radii of these atoms are taken as the sum of their van der Waals radius and the probe radius. At each MS dot, a sphere of fixed radius is constructed (we use 6 Å, which approximates the radius of an amino acid). The shape function at the point is defined as the volume of the intersection of this sphere with the molecule. Using the discrete grid representation, we count the number of interior voxels inside the sphere (which is also modeled using voxels). This shape function measures the local convexity/concavity of the surface: when its value is small, the surface is convex (knob); when its value is high, the surface is concave (hole). Two conditions must be satisfied for a point to be selected as a hole/knob:

1. To be a candidate hole (knob), the shape function at the point must be above (below) a fixed fraction of V, V being the volume of (or the number of small cubes that make up) the 6 Å radius ball. This ensures that the selected points are high knobs or deep holes.
2. For the points satisfying the above constraint, we look for the 12 nearest neighbors. A candidate hole (knob) having the maximum (minimum) value of the shape function among its neighbors is selected. In the examples we have tested, using an MS dot density of 5 dots/Å², the 12 closest neighbors to a point are less than 2.0 Å away in Euclidean (L2) distance.

For each critical point we also compute the normal vector, defined as the unit vector in the direction of the line connecting the center of mass of the intersected section with the critical point. At the end of the process, for each molecule, we have lists of knobs and holes along with their associated normal vectors.

Matching

To determine a transformation that superposes one body onto another, one needs a minimum of three (noncollinear) matching point pairs in both objects. However, three or four matching knob-hole point pairs are not always present at the receptor-ligand interface. Indeed, this may have been the reason why Connolly's matching scheme failed to correctly dock the trypsin inhibitor into its trypsin receptor.³,⁴
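Three noncollinear matched pairs are enough for a least-squares rigid superposition. The sketch below uses the standard SVD-based (Kabsch) solution as a stand-in for the least-squares step and, following the scheme described in the paragraphs that follow, builds the extra correspondences from the points one unit along the surface normals, with the ligand normals inverted. It assumes NumPy; `kabsch` and `pair_transform` are illustrative names, not the authors' routines.

```python
import numpy as np


def kabsch(moving, fixed):
    """Least-squares rotation R and translation t with R @ moving[i] + t ~ fixed[i]."""
    P = np.asarray(moving, dtype=float)
    Q = np.asarray(fixed, dtype=float)
    pc, qc = P.mean(axis=0), Q.mean(axis=0)
    H = (P - pc).T @ (Q - qc)                 # 3x3 cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))    # guard against a reflection
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return R, qc - R @ pc


def pair_transform(rec_pts, rec_normals, lig_pts, lig_normals, unit=1.0):
    """Superpose a ligand critical-point pair onto a receptor pair.

    Each pair contributes its two points plus the points `unit` along the
    (unit) normals; the ligand normals are inverted first so that the two
    surfaces face each other. The four correspondences are collinear only if
    both normals are parallel to the segment joining the critical points.
    """
    a = np.asarray(rec_pts, dtype=float)
    b = np.asarray(lig_pts, dtype=float)
    na = np.asarray(rec_normals, dtype=float)
    nb = -np.asarray(lig_normals, dtype=float)   # inverted ligand normals
    fixed = np.vstack([a, a + unit * na])
    moving = np.vstack([b, b + unit * nb])
    return kabsch(moving, fixed)
```

The returned R and t map ligand coordinates into the receptor frame as `R @ x + t`; for an exact complementary match they recover the true rigid motion.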


A critical difference between our approach and previous ones is the use of the surface normals. This enables using only two critical points in the calculation of the transformation: the normals themselves serve as the third (and fourth) points. Thus, not only are we able to dock receptor-ligand surfaces with only two knob/hole point matches, but the complexity of the docking method is reduced as well. For each pair of critical points from one molecule (e.g., the receptor), we compute the transformation with each compatible pair from the second molecule (e.g., the ligand). Compatibility implies that a knob must be paired with a hole (and vice versa). In the second molecule (the ligand) the normals are inverted. The points that are one unit away from the critical points, in the normal orientation, are aligned as well. For example, in Figure 1 point "a" is aligned with point "d," and point "b" is aligned with point "e," which is the reversal of "c." Since protein-protein interactions are localized, and generally do not span distant regions of the proteins without covering close-range ones, we need not consider pairs of critical points whose distance from each other is above a threshold (we use 20 Å).

A pair of points with the associated normals contains additional information that enables fast pruning of incompatible pairs. Thus, for each pair of points we compute a geometric and symbolic signature. The signature of the pair includes the labels of the points (knob/hole), the distance between the points, the angles formed between the line segment joining the points and each of the normals, and the torsion angle between the two planes formed by the line segment and each normal (see Figure 2). Only pairs with compatible signatures are considered. In the matching, we allow a tolerance of 2 Å in the distance between the critical points, and 0.9 radians between the corresponding angles in the receptor and ligand. The allowed sum of the differences of the two corresponding angles between a normal and the line segment (angles ∠a and ∠b in Figure 2) is 1.6 radians. The difference in the sum of all three angles (the torsion angle and the two angles between the line segment and the normals) is not allowed to exceed 2.1 radians. Unlike the approach taken by Connolly in his docking scheme, no volume complementarity of the knob/hole shape functions is required here; we found this constraint to be too restrictive. For receptor-ligand critical-point/normal pairs having complementary signatures, the 3D rotation and translation achieving the minimal least-squares distance (in matching the two points and two normals) is computed.⁸

Figure 1. Critical points and their normals (panels: Receptor, Ligand).

Figure 2. Signature information. A and B are the critical points; NA (NB) is the normal at point A (B). The angle between AB and NA (NB) is ∠a (∠b). The angle between the planes generated by the normals and the line segment between both critical points is ∠c.

Receptor-Ligand Overlap Check

The receptor is mapped onto a 3D grid in a manner similar to that described above, with a grid size of 1 Å. During the mapping, 1 Å is added to the van der Waals radius of each interior atom. The sizes of the exterior atoms are defined by their van der Waals radii. Voxels occupied by interior receptor atoms are initially designated interior voxels; however, if an exterior atom sphere falls into an interior voxel, the latter is converted into an exterior voxel. MS surface points of the receptor molecule (computed at a density of 5 dots/Å²) are assigned 1 Å radii and are also mapped onto the same grid. Voxels containing MS spheres are then labeled surface voxels. The surface of the ligand is described by the MS algorithm at a density of 0.5 dot/Å². The ligand atoms are next transformed and mapped onto the same grid. If a ligand atom center falls in an interior voxel, molecular penetration is registered and the solution is discarded. For the remaining cases, a score function is computed. The function measures the contact between the receptor and the ligand, with a penalty for surface overlap. The ligand MS dots are transformed and mapped, and three counters are kept: one for ligand MS dots that fall in interior voxels (I), a second for dots that fall in exterior voxels (E), and a third for ligand MS dots falling in surface voxels (S). The score function is computed as S - 4E - 10I. Surface contact increases the score, whereas overlap of the ligand surface dots with the receptor atoms reduces it. The "goodness" of the scoring function has been tested on the complexes in the crystallographic (PDB) database. Using this scoring function, the potential docking solutions obtained by our matching algorithm are ranked.
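The penetration test and the S - 4E - 10I score can be sketched as follows, assuming the 1 Å receptor grid has already been labeled as described (interior, exterior, and surface voxels). The integer label codes and the helper names are hypothetical, a voxel is assumed to span [origin + k·h, origin + (k+1)·h) along each axis, and R, t stand for one candidate rotation and translation of the ligand.

```python
import numpy as np

# Hypothetical label codes for the receptor grid.
EMPTY, INTERIOR, EXTERIOR, SURFACE = 0, 1, 2, 3


def lookup(labels, origin, spacing, points):
    """Label of the voxel containing each point (EMPTY if outside the grid)."""
    idx = np.floor((np.asarray(points, dtype=float) - origin) / spacing).astype(int)
    ok = np.all((idx >= 0) & (idx < np.array(labels.shape)), axis=1)
    out = np.full(len(idx), EMPTY)
    out[ok] = labels[idx[ok, 0], idx[ok, 1], idx[ok, 2]]
    return out


def score_pose(labels, origin, spacing, R, t, lig_atoms, lig_dots):
    """Transform the ligand; return None on penetration, else S - 4E - 10I."""
    R = np.asarray(R, dtype=float)
    atoms = np.asarray(lig_atoms, dtype=float) @ R.T + t
    if np.any(lookup(labels, origin, spacing, atoms) == INTERIOR):
        return None                      # a ligand atom centre lies inside the receptor
    dots = np.asarray(lig_dots, dtype=float) @ R.T + t
    hits = lookup(labels, origin, spacing, dots)
    S = int(np.sum(hits == SURFACE))     # contact with the receptor surface
    E = int(np.sum(hits == EXTERIOR))    # overlap with exterior atoms
    I = int(np.sum(hits == INTERIOR))    # overlap with interior atoms
    return S - 4 * E - 10 * I
```

Ranking the candidate transformations then amounts to sorting the non-None scores in decreasing order.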

RESULTS

Table I lists the 16 crystallographically determined protein-protein complexes that were extracted from the Protein Data Bank (PDB⁷) and tested with our docking scheme, along with the resolutions of the crystal structures. The list includes 10 proteinases (1cho, 1tec, 1tgs, 2kai, 2ptc, 2sec, 2sni, 2tgp, 4sgb, 4tpi) and 3 immunoglobulins (1fdl, 2hfl, 3hfm). These 16 complexes contain relatively large ligands. Table II lists the number of potential receptor-ligand geometrically compatible docking solutions obtained for each complex, and the ranking of the best solution among them. Since in these test cases we knew that the "correct" solution should ideally have 0 displacement and 0 angular distance, we have defined the "best" solution to be the solution having the minimal rms deviation from the original PDB file ligand among all the candidate solutions with translation distance less than 4 Å and angular distance less than 0.4. (For 4sgb no such solution exists; thus we allowed a translation distance of 4.5 Å.) Of course, when the correct outcome is not known, one cannot predict the "best" solution, and all the candidate solutions would have to undergo an energy evaluation procedure as an additional filter. We have used this best-solution criterion only to estimate the efficacy of our overlap score.

In 11 of these complexes (1cho, 1tec, 1tgs, 2kai, 2mhb, 2ptc, 2sni, 2tgp, 4cpa, 4hvp, 4tpi), our overlap score ranks the geometrically optimal solution, the one having the lowest rms deviation from the PDB file ligand (the so-called best candidate solution in Table II), among the top 1% of all potential solutions obtained (see Table II). The rotations, translations, and rms deviations of the optimal solutions, and the total central processing unit (CPU) minutes needed to complete each complex docking run, are also noted. The rms deviations are calculated between the atoms of the ligand in the transformed docked solution and the ligand atoms in the crystal complex. In one case (4sgb), an rms deviation of almost 5 Å is obtained. This complex, whose docking failed, has the fewest receptor-ligand critical knob/hole point pairs. The next worst case (1fdl), with an rms of 3.1 Å, is an antibody-antigen complex.

Table I. The Complexes Used in This Studyᵃ

No.  PDB   Receptor (chain)          Ligand (chain)                                      Res. (Å)
1    1cho  α-Chymotrypsin (E)        Chain E 2                                           1.8
2    1fdl  IgG1 Fab fragment (LH)    Lysozyme (Y)                                        2.5
3    1tec  Thermitase eglin-c (E)    Leech (I)                                           2.2
4    1tgs  Trypsinogen (Z)           Porcine pancreatic secretory trypsin inhibitor (I)  1.8
5    2hfl  IgG1 Fab fragment (LH)    Lysozyme (Y)                                        2.5
6    2kai  Kallikrein A (AB)         Bovine pancreatic trypsin inhibitor (I)             2.5
7    2mhb  Hemoglobin α-chain (A)    β-chain (B)                                         2.0
8    2ptc  β-Trypsin (E)             Pancreatic trypsin inhibitor (I)                    1.9
9    2sec  Subtilisin Carlsberg (E)  Genetically engineered N-acetyl eglin-c (I)         1.8
10   2sni  Subtilisin novo (E)       Chymotrypsin inhibitor (I)                          2.1
11   2tgp  Trypsinogen (Z)           Pancreatic trypsin inhibitor (I)                    1.9
12   3hfm  IgG1 Fab fragment (LH)    Lysozyme (Y)                                        3.0
13   4cpa  Carboxypeptidase          Potato carboxypeptidase A inhibitor (I)             2.5
14   4hvp  HIV-1 protease chain A    Chain B                                             2.3
15   4sgb  Serine proteinase (E)     Potato inhibitor PCI-1 (I)                          2.1
16   4tpi  Trypsinogen (Z)           Pancreatic trypsin inhibitor (I)                    2.2

ᵃ The PDB code of each complex is noted in the PDB column. The chain is given in parentheses next to the receptor and ligand description. The resolution of the complex is noted in the last column.
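The "best solution" criterion used in the Results above can be written down compactly. The paper does not say how the translation distance and the angular distance are measured, so this sketch assumes the norm of the translation vector and the rotation angle of R, with each candidate (R, t) applied to the crystal ligand coordinates so that the ideal answer is the identity. The function names are illustrative.

```python
import numpy as np


def rmsd(a, b):
    """Root-mean-square deviation between two equally ordered (N, 3) coordinate arrays."""
    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return float(np.sqrt(np.mean(np.sum(diff ** 2, axis=1))))


def rotation_angle(R):
    """Rotation angle (radians) of a proper rotation matrix: tr(R) = 1 + 2 cos(theta)."""
    c = (np.trace(R) - 1.0) / 2.0
    return float(np.arccos(np.clip(c, -1.0, 1.0)))


def best_solution(candidates, crystal_ligand, max_trans=4.0, max_angle=0.4):
    """Return (index, rmsd) of the lowest-RMSD candidate pose (R, t) within the
    translation and angle limits, or None if no candidate qualifies.

    Assumption: translation distance = |t|, angular distance = rotation angle of R.
    """
    xyz = np.asarray(crystal_ligand, dtype=float)
    best = None
    for i, (R, t) in enumerate(candidates):
        if np.linalg.norm(t) >= max_trans or rotation_angle(R) >= max_angle:
            continue
        d = rmsd(xyz @ np.asarray(R, dtype=float).T + t, xyz)
        if best is None or d < best[1]:
            best = (i, d)
    return best
```

For 4sgb the same function would be called with `max_trans=4.5`, mirroring the relaxed limit stated in the text.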


Table II. The Best Solution Obtained for Each Complexᵃ

No.  PDB   Potential   Rank of   Within   rot     trans   RMS    CPU match   CPU score
           solutions   best      top %            (Å)     (Å)    (min)       (min)
1    1cho  115116      2         1%       0.000   0.695   0.74   12.4        44.8
2    1fdl  394776      124309    32%      0.233   1.738   3.10   43.9        193.2
3    1tec  131283      95        1%       0.158   1.549   2.12   8.8         27.7
4    1tgs  144438      552       1%       0.176   0.661   1.57   12.9        32.2
5    2hfl  457664      6792      2%       0.123   1.049   1.69   52.6        230.8
6    2kai  166331      672       1%       0.210   1.494   2.37   15.3        39.5
7    2mhb  381018      49        1%       0.000   0.639   0.68   34.0        174.2
8    2ptc  147345      161       1%       0.063   1.567   1.63   9.9         28.4
9    2sec  149405      3518      3%       0.148   0.966   1.59   11.4        31.4
10   2sni  178493      88        1%       0.127   0.958   1.53   12.4        39.2
11   2tgp  115822      180       1%       0.114   0.416   0.94   8.4         22.8
12   3hfm  565376      17637     4%       0.176   2.016   2.70   60.5        275.4
13   4cpa  87923       106       1%       0.215   2.418   2.95   5.4         11.9
14   4hvp  174012      2         1%       0.071   0.530   0.87   12.7        52.5
15   4sgb  44184       13691     31%      0.304   4.021   4.92   3.5         7.6
16   4tpi  126829      11        1%       0.077   1.300   1.44   7.9         23.3

ᵃ The number of potential solutions generated for each complex is given in the third column. The numerical ranking of the best solution, according to the score described in Methods, is noted along with its location (percentwise) within all potential solutions (fourth and fifth columns). The next three columns refer to the best solution: the rotation, the translation, and the rms deviation of the transformed ligand atoms from the crystal-bound atoms. The ninth column indicates the CPU time (in minutes) needed to complete the matching for each example. The last column indicates the CPU time (in minutes) required to carry out the scoring of the solutions.
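The "Within top %" column of Table II appears to be 100 × rank / potential solutions, rounded up to the next whole percent. That rule is an inference from the numbers, not something the paper states; the short check below reproduces all 16 entries under it and also recovers the 11 complexes ranked in the top 1%.

```python
# (PDB code, potential solutions, rank of best solution, reported "within top" %),
# transcribed from Table II.
TABLE_II = [
    ("1cho", 115116, 2, 1), ("1fdl", 394776, 124309, 32),
    ("1tec", 131283, 95, 1), ("1tgs", 144438, 552, 1),
    ("2hfl", 457664, 6792, 2), ("2kai", 166331, 672, 1),
    ("2mhb", 381018, 49, 1), ("2ptc", 147345, 161, 1),
    ("2sec", 149405, 3518, 3), ("2sni", 178493, 88, 1),
    ("2tgp", 115822, 180, 1), ("3hfm", 565376, 17637, 4),
    ("4cpa", 87923, 106, 1), ("4hvp", 174012, 2, 1),
    ("4sgb", 44184, 13691, 31), ("4tpi", 126829, 11, 1),
]


def within_top_percent(rank, total):
    """Smallest whole percentage p with rank <= p% of total, i.e. ceil(100 * rank / total)."""
    return -(-100 * rank // total)          # integer ceiling, no float rounding


computed = [within_top_percent(rank, total) for _, total, rank, _ in TABLE_II]
reported = [pct for *_, pct in TABLE_II]
top1 = [code for (code, *_), p in zip(TABLE_II, computed) if p == 1]
print(computed == reported, len(top1))   # True 11
```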

The results presented here have been obtained with one set of parameters. Furthermore, unlike previous work,⁴ no optimizations were carried out. However, since the calculation of the transformation depends critically on the surface normal vectors, even a small inaccuracy in the placement of the critical points, and thus in the direction of the normals, may significantly affect the quality of the solutions. Clearly, using more than one set of parameters would have improved the quality of the results; here, however, we have opted to use only one set. The correct solutions do not always rank at the top. Since our ranking function is based on a receptor-ligand contact score, this suggests that the correct solutions cannot be distinguished solely by the amount of their buried surface. On average (over all the tested examples), the total number of pairs of critical points is close to 16 × 10³ (15,756), and the number of "pairs of pairs" (to get all possible matches) is 3.8 × 10⁸ (see Table III); the complementary-signature check (first stage) and the overlap check discard many wrong solutions. However, too many solutions are still left and need to undergo energy evaluation. An alternative approach would have been to reduce the number of potential solutions via microscopic examination of the salient features of the surface at the matching stage. This latter approach would require complementarity of the details of the surface shape, including saddle points, ridges, and valleys. Adopting this approach may, however, result in difficulties, as receptor-ligand surface complementarity is imperfect; being too restrictive at the matching stage may result in skipping the correct solutions altogether. For this reason, the logic adopted in this work has been to include correct solutions along with "false positives" first, filtering out the latter later. This necessitates either additional filtering steps to reduce the number of potential solutions, or higher-precision ranking. Such steps are currently being examined, with encouraging preliminary results.

Table III. The Sizes of the Receptors and Ligands and the Number of Knobs and Holes They Containᵃ

           Receptor                        Ligand
No.  PDB   AA   Int   Ext   Holes  Knobs   AA   Int   Ext   Holes  Knobs
1    1cho  146  655   392   57     144     53   469   232   31     108
2    1fdl  432  1601  1705  214    287     129  527   473   44     103
3    1tec  279  825   1178  97     156     63   306   521   25     72
4    1tgs  233  807   838   135    155     56   281   215   20     66
5    2hfl  424  1606  1621  220    270     129  517   483   60     93
6    2kai  231  876   922   121    148     57   269   169   22     71
7    2mhb  141  581   487   84     118     146  618   515   87     119
8    2ptc  229  786   842   119    144     58   261   192   23     64
9    2sec  275  797   1122  111    141     62   306   223   22     66
10   2sni  275  832   1105  112    155     63   306   206   30     65
11   2tgp  229  831   797   107    144     58   275   178   22     62
12   3hfm  429  1552  1741  214    262     129  516   484   59     110
13   4cpa  307  1002  1434  132    181     36   184   91    11     47
14   4hvp  99   465   280   54     101     99   457   288   46     95
15   4sgb  227  603   706   60     97      51   247   132   17     56
16   4tpi  230  836   792   108    141     58   279   176   23     60

ᵃ AA: number of amino acids; Int: the interior atoms; Ext: the exterior atoms, i.e., those for which MS surface dots are generated.
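The first-stage pair filter discussed above (pairs of critical points at most 20 Å apart, each described by a signature of labels, distance, two point-normal angles, and a torsion angle, as in Figure 2) can be sketched as follows. The names are hypothetical, normals are assumed to be unit vectors, the torsion is taken as a signed dihedral, and the 1.6 and 2.1 radian limits are read as bounds on sums of absolute angle differences, which is one interpretation of the Methods text.

```python
import numpy as np

COMPLEMENT = {"knob": "hole", "hole": "knob"}


def candidate_pairs(points, max_dist=20.0):
    """Index pairs (i < j) of critical points at most max_dist (A) apart."""
    p = np.asarray(points, dtype=float)
    d = np.linalg.norm(p[:, None, :] - p[None, :, :], axis=-1)
    i, j = np.nonzero(np.triu(d <= max_dist, k=1))
    return list(zip(i.tolist(), j.tolist()))


def _angle(u, v):
    return float(np.arccos(np.clip(np.dot(u, v), -1.0, 1.0)))


def signature(labels, pa, pb, na, nb, invert=False):
    """(labels, distance, angle a, angle b, torsion c) for one critical-point pair.

    Angles a and b lie between the unit vector A->B and each normal; c is the
    signed dihedral between the planes (A, B, A+nA) and (A, B, B+nB).
    Use invert=True for the ligand, whose normals are reversed before matching.
    """
    pa, pb = np.asarray(pa, dtype=float), np.asarray(pb, dtype=float)
    na, nb = np.asarray(na, dtype=float), np.asarray(nb, dtype=float)
    if invert:
        na, nb = -na, -nb
    ab = pb - pa
    dist = float(np.linalg.norm(ab))
    u = ab / dist
    # Normal components perpendicular to the A-B axis define the torsion angle.
    va, vb = na - np.dot(na, u) * u, nb - np.dot(nb, u) * u
    torsion = float(np.arctan2(np.dot(np.cross(va, vb), u), np.dot(va, vb)))
    return tuple(labels), dist, _angle(u, na), _angle(u, nb), torsion


def compatible(sig_r, sig_l, d_tol=2.0, ang_tol=0.9, ab_tol=1.6, all_tol=2.1):
    """Check a receptor-pair signature against a ligand-pair signature."""
    (lab_r, d_r, a_r, b_r, c_r), (lab_l, d_l, a_l, b_l, c_l) = sig_r, sig_l
    if tuple(COMPLEMENT[x] for x in lab_r) != tuple(lab_l):
        return False                        # knobs must face holes
    if abs(d_r - d_l) > d_tol:
        return False
    da, db = abs(a_r - a_l), abs(b_r - b_l)
    dc = abs((c_r - c_l + np.pi) % (2 * np.pi) - np.pi)   # wrapped torsion difference
    if max(da, db, dc) > ang_tol or da + db > ab_tol:
        return False
    return da + db + dc <= all_tol
```

Every receptor pair from `candidate_pairs` would be compared with every ligand pair (the "pairs of pairs"), so the number of signature checks is the product of the two pair counts; only pairs passing `compatible` go on to the least-squares transformation and the overlap check. A fuller implementation would presumably also try each ligand pair in both orders.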

DISCUSSION

Above we have demonstrated that a simple and appealing concept proposed by Connolly in 1986,³ describing molecular surfaces in terms of their critical points, can work nicely in a straightforward geometrically based docking scheme. This approach is based on the notion that geometrical surface complementarity at the molecular interface is fundamental for molecular recognition.⁴,⁶,⁹⁻¹⁴ Over a decade ago, Connolly¹,² implemented the molecular surface representation proposed by Richards.⁵ Dots are placed on the van der Waals atomic surfaces touched by a water sphere. However, these dots could not be used directly in a docking scheme, for two main reasons: first, comparing too many dots results in a combinatorial explosion; and second, if the density of the dots is too high, the distances between pairs of dots are too small and fall within the error threshold of the matching. At first sight it would appear that the simplest approach would thus have been to reduce the dot density: rather than, say, outputting 5 or 10 dots per Å² of surface area, a user could have chosen to describe the surface by, say, 1 or 0.1 dots per Å². Unfortunately, this simplistic route does not work. The sampling of the dots on the surface is a probabilistic occurrence, and sparsely sampled dots will not describe the surface adequately unless they are well placed. Furthermore, since the surface normals at the dots are critical for surface matching, depending on the contour of the surface, even a small deviation in the positioning of the dots may result in a sharp change in the direction of the normals. Requiring a match between the directions of the receptor and ligand surface normals does not work under such circumstances. Connolly³ thus proceeded to select critical dots, describing interest points on the molecular surfaces. These points, representing protruding knobs or deep holes, were used in his matching scheme. It worked in one case (the α/β subunits of hemoglobin). Despite repeated trials, it did not correctly dock the trypsin inhibitor into its trypsin receptor, where the interface is relatively flat: the four-point matches between the receptor and the ligand at the interface, required by Connolly's method, are not present. Owing to this failure, this sparse-point, special-feature representation of the molecular surface has largely been abandoned in favor of other approaches. Here we show that Connolly's representation works quite nicely. Indeed, no other geometrically based docking approach published over the years performs better for protein-protein docking. Furthermore, the success obtained here also immediately indicates the points still needing improvement: (a) The points describing the molecular surface should be selected and placed with higher accuracy.
Higher accuracy in the placement of the dots will result not only in a better description of the surfaces, but also in more reliable surface normals. This in turn is expected to yield docking solutions with better rms deviations relative to the crystal structures, which will clearly also be better candidates for the energy assessment calculations. Such an algorithm is also expected to be more robust. (b) A good surface description should preferably succeed in docking both small ligands and protein ligands. This requires that the number of surface points not be too small. Last, (c) higher efficiency in the matching calculations is needed. The approach employed here also indicates the advantages of using the normals in the calculation of the transformation matrices. Development of our current surface representation and docking algorithms addresses the above issues to achieve these goals.⁶ Energy assessment and minimization is the next required stage.¹⁵⁻¹⁸

We would like to thank Drs. D. Covell, R. Jernigan, J. V. Maizel, and D. Fischer for helpful discussions. We also thank G. Smythers for his continuing superb technical help. The research of R. Nussinov has been sponsored by the National Cancer Institute, DHHS, under Contract No. 1-CO-74102 with Program Resources, Inc. The contents of this publication do not necessarily reflect the views or policies of the DHHS, nor does mention of trade names, commercial products, or organizations imply endorsement by the U.S. Government. The research of H. J. Wolfson has been supported in part by Grant No. 89-00481 from the U.S.-Israel Binational Science Foundation (BSF), Jerusalem, Israel. The research of R. Nussinov at Tel Aviv University has been supported in part by Grant No. 9100219 from the BSF. The research of H. J. Wolfson and R. Nussinov in Israel has been supported in part by a grant from the Israel Science Foundation administered by the Israel Academy of Sciences. R. Norel acknowledges support from the Eshkol Fellowship. This work formed part of the Ph.D. thesis of R. Norel, Tel Aviv University.

REFERENCES

1. Connolly, M. L. (1983) Science 221, 709-713.
2. Connolly, M. L. (1983) J. Appl. Cryst. 16, 548-558.
3. Connolly, M. L. (1986) Biopolymers 25, 1229-1247.
4. Wang, H. (1991) J. Comput. Chem. 12, 746-750.
5. Richards, F. M. (1977) Annu. Rev. Biophys. Bioeng. 6, 151-176.
6. Lin, S. L., Nussinov, R., Fischer, D. & Wolfson, H. J. (1994) Proteins 18, 94-101.
7. Bernstein, F. C., Koetzle, T., Williams, G., Meyer, E., Brice, M., Rodgers, J., Kennard, O., Shimanouchi, T. & Tasumi, M. (1977) J. Mol. Biol. 112, 535-542.
8. Besl, P. J. & McKay, N. D. (1992) IEEE Trans. Pattern Anal. Machine Intell. 14, 239-256.
9. Kuntz, I. D., Blaney, J. M., Oatley, S. J., Langridge, R. & Ferrin, T. E. (1982) J. Mol. Biol. 161, 269-288.
10. Jiang, F. & Kim, S. H. (1991) J. Mol. Biol. 219, 79-102.
11. Katchalski-Katzir, E., Shariv, I., Eisenstein, M., Friesem, A. A., Aflalo, C. & Vakser, I. A. (1992) Proc. Natl. Acad. Sci. USA 89, 2195-2199.
12. Connolly, M. L. (1992) Biopolymers 32, 1215-1236.
13. Kasinos, N., Lilley, G. A., Subbarao, N. & Haneef, I. (1992) Protein Eng. 5, 69-75.
14. Norel, R., Fischer, D., Wolfson, H. & Nussinov, R. (1994) Protein Eng. 7, 39-46.
15. Shoichet, B. K. & Kuntz, I. D. (1991) J. Mol. Biol. 221, 327-346.
16. Bacon, D. J. & Moult, J. (1992) J. Mol. Biol. 225, 849-858.
17. Cherfils, J. & Janin, J. (1993) Curr. Opin. Struct. Biol. 3, 265-269.
18. Goodsell, D. S. & Olson, A. J. (1990) Proteins 8, 195-202.

Received August 31, 1993 Accepted December 27, 1993
