Protein Sequences Yield a Proteomic Code


Journal of Biomolecular Structure & Dynamics, ISSN 0739-1102, Volume 21, Issue Number 3 (2003), ©Adenine Press (2003)

http://www.jbsdonline.com

Abstract

Analysis of crystallized protein structures suggests that globular proteins are organized as consecutively connected units of 25-35 residues. These units are closed loops, that is, returns of the polypeptide chain trajectory into close contact with itself. This universal feature, of apparently polymer-statistical nature, is the basis for a fundamentally novel view of globular proteins as loop-fold structures. The same unit size has been detected by positional autocorrelation analysis in protein sequences translated from complete prokaryotic genomes, which strongly indicates an evolutionary connection of the units. The units are further characterized by prototype sequences that match their numerous derivatives in the translated genomes. Matches to the five strongest prokaryotic prototypes and to three prototypes of C. elegans are identified in the sequences of crystallized proteins, and their structures are analyzed. In the majority of cases the corresponding segments of the polypeptide chains form closed loops, though the evolutionary fate of every prototype element is shown to be rather diverse. The loop ends can be separated by sequence-wise distant segments and stabilized by spatial interactions in the context of the overall globular structure. The units belong to a presumably limited spectrum of sequence prototypes, the full repertoire of which would constitute a proteomic code.

Igor N. Berezovsky1,*,§
Alla Kirzhner2
Valery M. Kirzhner2
Vladimir R. Rosenfeld2
Edward N. Trifonov1,2

1Department of Structural Biology, The Weizmann Institute of Science, P.O.B. 26, Rehovot 76100, Israel
2Genome Diversity Center, Institute of Evolution, University of Haifa, Haifa 31905, Israel
§Current Address: Dept. of Chemistry and Chemical Biology, Harvard University, 12 Oxford Street M-105, Cambridge, MA 02138, USA
*Email: [email protected]

Introduction

The expanding volume of protein sequences and structures, and the need to characterize and compare newly appearing sequences on a massive scale, call for a more intensive study of the rules of protein sequence and structure organization. One such rule is suggested by a recent analysis of chain-to-chain contacts in crystallized proteins, which revealed a conspicuous prevalence of closed loops of a certain size, 25-30 residues (1-3). The residues closing the loops are in tight van der Waals contact, forming "locks" of hydrophobic and bulky amino acids from both ends of the loops (2, 4-6). It has also been found that the preferred sequence distance between hydrophobic residues, measured by positional correlations along protein sequences translated from complete prokaryotic genomes, is 25-30 residues as well (5, 6), which corroborates the closed loop size observed in protein crystals. (Below, for brevity, the protein sequences translated from the genomes are called proteomes.) The typical closed loop, together with the locks at its ends, has a sequence size of 30-35 residues. These structural units closely follow one another along the protein sequences (1, 3), which suggests a straightforward evolutionary scenario: such arrays were formed by fusion of the respective short early genes encoding the individual loops (5, 7). If protein sequences are indeed designed this way, one would expect that some of the most frequent ancient sequence motifs of 30-35 residues might still be detectable in extant sequences. An important difference between our approach and the massive earlier work on protein evolution (8-20) is that it is based on specific universal structural units with established standard properties (the chain return property and size). In the search for unit-size sequence patterns, we conducted massive comparisons of the protein sequences translated from 23 complete prokaryotic genomes and from the genome of C. elegans. Several of the most frequently represented motifs of typical loop-n-lock size are detected. Remarkably, in crystallized proteins these sequence motifs are found embedded in closed loops, as one would anticipate. This demonstrates that the loop-n-lock units, each with its specific sequence, are basic elements of globular protein structure.

Materials and Methods

Protein sequences contain numerous segments with different degrees of similarity that can be revealed by various sequence match procedures. Hypothetical descendants of the same prototype may differ from one another beyond traditional similarity thresholds, which demands a special approach. For example, the model patterns AXCY and MBND, entirely unrelated by traditional criteria, are both 50 per cent identical to the "non-existing" pattern ABCD. In this work, therefore, sequence comparisons are made not between actually existing sequence segments, but between hypothetical (non-existing) patterns and natural sequences. The outline of this original search procedure is described below. All 30-residue sequences (over one million) from the complete translated genome of E. coli (4288 proteins) have been matched to all protein sequences of 23 complete prokaryotic genomes (over 42,000 proteins) in search of the most populous similar sequence segments. Similarly, segments of the first 2000 proteins of the C. elegans proteome have been matched to the remaining C. elegans protein sequences. The selected segments served as initial patterns for a multi-step iterative derivation of the final prototype patterns. Since deletions/insertions are about an order of magnitude less frequent than point changes, gaps were not considered in these calculations.

Extraction of the Prototype Sequence Patterns

In this work, direct sequence-to-sequence comparisons are made. Two 30-residue segments are considered matching if more than 10 residues coincide.
This threshold is chosen to guarantee that the observed identity of two compared sequences is well above random expectation. The respective statistical formalism is described below.

Statistics of Non-gap Pair-wise Sequence Comparison: Consider a word W of size n = 30 letters over the 20-letter alphabet A, with probability p_i (i = 1, …, 20) for each amino acid, so that every position of W is filled independently by a letter from A. What is the probability that two such words W1 and W2 have m letters in common at the same sequence positions? The probability of a single match, i.e., of having the same letter at a given position, is

$$p = \sum_{i=1}^{20} p_i^2 .$$

Therefore, the probability of matches at any m given positions within the words W1 and W2 is $p^m$. The remaining positions carry non-matching letters with probability $(1 - p)^{n-m}$. Considering all possible arrangements of matching and non-matching positions, one derives the probability of having exactly m matches between W1 and W2:

$$P(m) = \binom{n}{m} p^m (1 - p)^{n-m} .$$

This expression is the binomial distribution with mean $np$ and variance $np(1 - p)$. For a uniform amino acid composition, $p_i = 0.05$ for every i. Then p = 0.05 as well, and np = 1.5, with variance np(1 - p) = 1.425. For a non-uniform amino acid composition the respective values are somewhat different. Take as an example the natural amino acid composition of the 23 proteomes analyzed in this work. The calculated amino acid frequencies are: 0.108 (L), 0.087 (A), 0.072 (G), 0.072 (V), 0.067 (E), 0.064 (I), 0.060 (S), 0.056 (K), 0.056 (R), 0.051 (T), 0.049 (D), 0.044 (P), 0.042 (F), 0.039 (N), 0.035 (Q), 0.031 (Y), 0.023 (M), 0.020 (H), 0.012 (W), and 0.010 (C). In this case, $p = \sum_{i=1}^{20} p_i^2 = 0.062$.

Accordingly, np = 1.9, with variance 1.74. The probability of having exactly m matches decays rapidly, so that beyond m = 5 it is negligible (0.0073, 0.0016, 0.0003, 0.00005, and 0.00001 for m = 6 to 10, respectively). Thus, taking the threshold m = 10 for the pair-wise comparisons is safe.

In a few cases, strongly compositionally biased initial segments may result in degenerate final patterns. Indeed, since proteins are generally alanine- and leucine-rich, a test sequence biased towards alanine and leucine would collect simple A, L repeat sequences rather than any specific pattern of A and/or L. To avoid this instability, the letters of the tested sequence in the first round of matching were taken with weights reciprocal to their occurrences.

The Procedure:
(i) In the case of prokaryotes, for every 30-residue segment of the E. coli protein sequences, the total number of matching segments within the 23 prokaryotic proteomes is found. The threshold for the segment-to-segment comparisons is taken equal to 10 matching (identical) residues. The segments matching any given initial test segment of E. coli are combined into respective families. The largest families (several hundred segments) are taken for further treatment.
(ii) The consensus sequence (the most frequent amino acid in each of the 30 positions) is calculated for a given family.
(iii) If this consensus differs from the consensus of the previous round, it is used for the next matching round to derive a refined consensus sequence. Two sequences are considered different if at least one letter differs.
(iv) The matching rounds are repeated until no further change is seen.
This procedure was used to detect the strongest patterns, i.e., those that collect significantly more matches in the natural sequences than in the respective shuffled sequences.
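As a numerical check of the pair-wise comparison statistics above, the short Python sketch below recomputes the single-match probability p from the listed amino acid frequencies and evaluates the binomial probabilities P(m) for n = 30. It is an illustration, not the authors' code; the names `FREQS`, `single_match_probability` and `match_pmf` are hypothetical. Summing the squared frequencies gives p of about 0.0616, which the text rounds to 0.062; the tail values quoted above are reproduced with the rounded value.

```python
from math import comb

# Amino acid frequencies of the 23 prokaryotic proteomes, as listed in the text.
FREQS = {
    "L": 0.108, "A": 0.087, "G": 0.072, "V": 0.072, "E": 0.067,
    "I": 0.064, "S": 0.060, "K": 0.056, "R": 0.056, "T": 0.051,
    "D": 0.049, "P": 0.044, "F": 0.042, "N": 0.039, "Q": 0.035,
    "Y": 0.031, "M": 0.023, "H": 0.020, "W": 0.012, "C": 0.010,
}


def single_match_probability(freqs):
    """p = sum of p_i^2: chance that two independently drawn letters coincide."""
    return sum(f * f for f in freqs.values())


def match_pmf(m, n, p):
    """Binomial probability of exactly m identical positions in two n-letter words."""
    return comb(n, m) * p**m * (1 - p) ** (n - m)


n = 30
p_exact = single_match_probability(FREQS)  # about 0.0616
p = round(p_exact, 3)                      # 0.062, the value used in the text
mean, var = n * p, n * p * (1 - p)         # binomial mean and variance

print(f"p = {p_exact:.4f} (rounded {p}), np = {mean:.1f}, variance = {var:.2f}")
for m in range(6, 11):
    print(f"P(m = {m}) = {match_pmf(m, n, p):.5f}")
```

The printed values round to np = 1.9, variance 1.74 and the probabilities quoted for m = 6 to 10. With the unrounded p the tail is slightly smaller (about 0.0071 at m = 6), which does not change the conclusion about the threshold.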
The numbers of segments belonging to the prototype families are well beyond the occurrences in the respective randomized (shuffled) sequences of the same ensemble (see also Results and Discussion and Table I). For eukaryotes, the same procedure was performed with 30-residue segments of the first 2000 proteins of C. elegans in order to reconstruct possible prototypes. With the same threshold for the segment-to-segment comparison as above, the three strongest prototypes were derived. These prototypes were then matched to the sequences of five eukaryotic genomes (C. elegans, A. thaliana, M. musculus, D. melanogaster, and F. rubripes). Comparing the numbers of natural sequence segments matching the prototypes, collected on the C. elegans genome and on the combination of five eukaryotic genomes, with the numbers of matching segments in the respective shuffled sequences shows large differences (see also Table II and Fig. 3).

Sequences

The protein sequences of the following complete prokaryotic genomes were used for the calculations. Archaea: A. pernix, A. fulgidus, M. thermoautotrophicum, and P. abyssi. Eubacteria: A. aeolicus, B. burgdorferi, C. jejuni, C. pneumoniae, C. trachomatis, D. radiodurans, E. coli, H. influenzae, H. pylori, M. tuberculosis, M. pneumoniae, N. meningitidis, R. prowazekii, Synechocystis, T. maritima, T. pallidum, U. urealyticum, V. cholerae, and X. fastidiosa. Only one eukaryotic proteome, that of C. elegans, is analyzed here. Four other proteomes, of A. thaliana, M. musculus, D. melanogaster, and Fugu (F. rubripes), were used to demonstrate the difference between matches on natural and randomized (shuffled) sequences. The sequences were provided by the National Center for Biotechnology Information via the Entrez Browser.
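The iterative derivation of a prototype, steps (i)-(iv) of the Procedure above, can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: matches are counted without gaps, a segment joins the family when it shares at least `threshold` identical positions with the current pattern (the text phrases the criterion both as "more than 10" and as "equal to 10", so `>=` is an assumption), and the reciprocal-frequency weighting of the first round is omitted. All function names are hypothetical.

```python
from collections import Counter


def count_matches(a, b):
    """Identical residues at the same positions of two ungapped segments."""
    return sum(x == y for x, y in zip(a, b))


def collect_family(pattern, proteome, threshold):
    """(i) All segments of the pattern's length matching it at >= threshold positions."""
    n = len(pattern)
    return [
        seq[i:i + n]
        for seq in proteome
        for i in range(len(seq) - n + 1)
        if count_matches(pattern, seq[i:i + n]) >= threshold
    ]


def consensus(family):
    """(ii) Most frequent residue in each column; ties go to the first one seen."""
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*family))


def derive_prototype(seed, proteome, threshold=10, max_rounds=50):
    """(iii)-(iv) Re-match with the new consensus until it no longer changes."""
    pattern, family = seed, []
    for _ in range(max_rounds):
        family = collect_family(pattern, proteome, threshold)
        if not family:
            break
        new_pattern = consensus(family)
        if new_pattern == pattern:  # no letter changed: converged
            break
        pattern = new_pattern
    return pattern, family


# Toy run: the consensus ABCD never occurs in the toy "proteome" itself.
toy = ["AXCDQQ", "MBCDQQ", "ABNDQQ"]
prototype, family = derive_prototype("AXCD", toy, threshold=2)
print(prototype, family)
```

Starting from the existing segment AXCD, the toy run converges in two rounds to ABCD, a consensus absent from every toy sequence, in the spirit of the AXCY/MBND/ABCD example in Materials and Methods. On real data the seed would be a 30-residue E. coli segment and `threshold` would be 10.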

Mapping of the Prototypes in the Crystal Structures: The derived 30-residue prototype sequences are compared to the sequences of crystallized structures (21), and the highest matches above the threshold of ten residues are selected. The respective segments of the crystallized proteins are excised from the structures and displayed in a projection convenient for viewing. The structures of the best sequence matches to the given prototypes are displayed in galleries of chain returns for the prokaryotic Prototypes I-V (Fig. 2) and the eukaryotic Prototypes 1-3 (Fig. 3), respectively.

Results and Discussion

Search for the Dominant Sequence Motifs in Prokaryotes

Protein sequences contain numerous sections with detectable degrees of identity that can be revealed by various sequence match procedures. In the simplified procedure used in this work, all 30-residue sequences (over one million) from the complete proteome of E. coli (4288 proteins) have been matched to all protein sequences of the 23 complete prokaryotic proteomes (over 42,000 proteins) in the search for the most populous similarities. The matches found form many distinct families. The prototype sequences for these families are calculated by an iterative procedure, starting from the respective original frequently matching 30-residue sequence. The consensus of the matching group of sequences is derived at every stage, and the procedure is repeated until the final equilibrium consensus is reached. Many such prokaryotic prototypes are detected, of which the first five (the strongest) are described in this paper. Table I lists the prototypes PI-PV together with the numbers of respective matching segments in the proteomes. It is important to note that these prototype sequences are of consensus nature and are not present as such in the proteomes. The consensus sequences (see Table I) are calculated from the frequency tables representing the corresponding families of segments found in the 23 proteomes.
The numbers of matching segments observed in each case are compared to the respective numbers for randomized (shuffled) proteomic sequences of the same ensemble.

Table I: Major prokaryotic protein sequence prototypes. The match threshold is 10 for all cases.

Prototype  Sequence (length)                        Matches in 23 proteomes  Matches in 23 shuffled proteomes
PI         GEIVALVGPSGSGKSTLLRALAGLLkPtsG (30)      1218                     75
PII        LSGGQRQRVAIARALAlePKLLLLDEPTSALD (32)    907                      157
PIII       DVIVVGAGPAGLAAALvLARAGAKVLVIE (29)       674                      158
PIV        RRGIGMVFQNYALFPHLTVLENVALGL (27)         280                      29
PV         PVIILTARDDEEDRVeGLELGADDYLTKPF (30)      197                      16

Table II: Major eukaryotic protein sequence prototypes. The match threshold is 10 for all cases.

Prototype  Sequence                          Matches in C. elegans (shuffled)  Matches in 5 proteomes (shuffled)
P1         FIAIQILEALEYLHSKGIIHRDLKPENILL    331 (8)                           1133 (15)
P2         DKDGLTPLHLAAKNGHVEVVRLLLENGADV    281 (3)                           642 (18)
P3         GYHFGVLSCRACAAFFRRTVVSKKKYKCCK    231 (1)                           231 (3)
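The comparison with shuffled sequences in Tables I and II can be illustrated with the Python sketch below. The text does not specify how the proteomes were randomized; shuffling residues within each protein, which preserves protein lengths and amino acid composition, is an assumption here, as are all function names. The last lines compute the natural-to-shuffled ratios implied by Table I.

```python
import random


def shuffled_proteome(proteome, seed=0):
    """Control set: residues shuffled within each protein (assumed scheme)."""
    rng = random.Random(seed)
    shuffled = []
    for seq in proteome:
        residues = list(seq)
        rng.shuffle(residues)
        shuffled.append("".join(residues))
    return shuffled


def count_matching_segments(prototype, proteome, threshold=10):
    """Ungapped windows sharing >= threshold identical positions with the prototype."""
    n = len(prototype)
    return sum(
        sum(a == b for a, b in zip(prototype, seq[i:i + n])) >= threshold
        for seq in proteome
        for i in range(len(seq) - n + 1)
    )


def enrichment(natural, shuffled):
    """Natural-to-shuffled ratio of match counts."""
    return natural / shuffled if shuffled else float("inf")


# Matches in 23 natural vs. 23 shuffled proteomes, copied from Table I.
TABLE_I = {
    "PI": (1218, 75), "PII": (907, 157), "PIII": (674, 158),
    "PIV": (280, 29), "PV": (197, 16),
}
for name, (natural, shuffled) in TABLE_I.items():
    print(name, round(enrichment(natural, shuffled), 1))
```

With real data, `count_matching_segments` would be run with the same prototype and threshold on the natural proteomes and on `shuffled_proteome(...)` of them. By Table I the ratio ranges from about 4 (PIII) to about 16 (PI).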

A search for similarities in the BLOCKS database (22) revealed the following most prominent functional associations of the prototype sequences of Table I. Prototype PI has the consensus sequence GEIVALVGPSGSGKSTLLRALAGLLKPTSG and corresponds to the known ATP/GTP-binding site motif, the P-loop (22-24). Prototype II has the consensus LSGGQRQRVAIARALALEPKLLLLDEPTSALD and corresponds to one of the signatures of the ABC transporter family (IPB001140). Prototype III has the consensus DVIVVGAGPAGLAAALVLARAGAKVLVIE and its highest matches are to the families IPB000103 (pyridine nucleotide-disulfide oxidoreductase class II), PR00420 (flavoprotein monooxygenase), and PR00368 (FAD-dependent pyridine nucleotide reductase) in the BLOCKS database (22). Prototype IV, RRGIGMVFQNYALFPHLTVLENVALGL, matches the family IPB003401 (oxygen-independent coproporphyrinogen III oxidase). Prototype V, PVIILTARDDEEDRVEGLELGADDYLTKPF, matches the families IPB001789 (response regulator receiver domain) and IPB001867 (transcriptional regulatory protein).

[Figure 1 plots: five panels, PI to PV, each showing residue frequency (0 to 1) versus sequence position.]

Figure 1 shows histograms of the occurrences of the consensus (most frequent) residues of the prototypes (dark gray), as well as of the flanking regions. The frequencies in the flanking regions are calculated to define the prototype size limits more accurately; these limits are found to vary from 27 to 32 residues, within the expected range. In some cases the histograms indicate possible alternative borders (see PI). This may mean that some prototypes have given rise to several related families. Further detailed study is necessary to outline their scope and sequence details (see also below).

Figure 1: Frequency values within the consensus prokaryotic prototype sequences and beyond. The dark gray columns correspond to the prototype sequences given in Table I. The set of five prototype sequences is the result of an initial massive screening. The match threshold for the 30-residue sequences tested in the search is taken equal to ten residues.
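The histograms of Figure 1 plot, for each position of a family alignment extended into the flanks, the frequency of the most frequent residue. A minimal Python sketch of that computation follows; it assumes equal-length, ungapped segments that already include flanking residues. The `core_borders` rule (keep the positions whose consensus frequency reaches a cutoff) is a hypothetical simplification for illustration only; in the text the size limits are read from the histograms.

```python
from collections import Counter


def consensus_frequency_profile(segments):
    """(residue, frequency) of the most frequent residue in each aligned column."""
    profile = []
    for column in zip(*segments):
        residue, count = Counter(column).most_common(1)[0]
        profile.append((residue, count / len(column)))
    return profile


def core_borders(profile, cutoff):
    """Hypothetical rule: first and last positions whose frequency reaches the cutoff."""
    kept = [i for i, (_, freq) in enumerate(profile) if freq >= cutoff]
    return (kept[0], kept[-1]) if kept else None


# Toy family: a conserved ABCD core with variable two-residue flanks.
segments = ["QWABCDER", "PLABCDMN", "RTABCEKS"]
profile = consensus_frequency_profile(segments)
print([round(freq, 2) for _, freq in profile])
print(core_borders(profile, cutoff=0.5))
```

In the toy family the consensus frequency drops from 1.0 in the core to 1/3 in the flanks, so the cutoff rule recovers positions 2-5 (0-based) as the core; an alternative border, as noted for PI, would show up as a second plateau in the profile.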

Figure 2: Structural representatives of the prokaryotic prototypes. A, B, Prototype I (PI); C-E, PII; F, PIV; G, H, PIII; I, J, PV. In-the-plane projections of the loops are shown. Sequences, their positions in the crystallized protein, numbers of matches to the prototypes, and PDB descriptions of the respective proteins are indicated for each loop. [Structure images not reproduced; panel data follow.]

A. 1b0u, ATP-binding subunit of the histidine permease from S. typhimurium, chain a (15 matches)
27 QARAGGDVISIIGSSGSGKSTFLRCINFLEKPSEG 61
PI: LTLKPGEIVALVGPSGSGKSTLLRALAGLLKPTSG

B. 1d6j, Adenosine 5'-phosphosulfate (APS) kinase from P. chrysogenum, chain a (12 matches)
25 GLTIWLTGLSASGKSTLAVELEHQLVRDRR 54
PI: GEIVALVGPSGSGKSTLLRALAGLLKPTSG

C. 1b0u, ATP-binding subunit of the histidine permease from S. typhimurium, chain a (26 matches)
155 LSGGQQQRVSIARALAMEPDVLLFDEPTSALD 186
PII: LSGGQRQRVAIARALALEPKLLLLDEPTSALD

D. 1f2t, ATP-free Rad50 ABC-ATPase from P. furiosus, chain b (15 matches)
798 IALGLAFRLAMSLYLAGEISLLILDEPTPYLD 829
PII: LSGGQRQRVAIARALALEPKLLLLDEPTSALD

E. 1cs1, Cystathionine gamma-synthase (CGS) from E. coli, chain a (12 matches)
120 LFVDQGDEQALRAALAEKPKLVLVESPSNPLL 151
PII: LSGGQRQRVAIARALALEPKLLLLDEPTSALD

F. 1b0u, ATP-binding subunit of the histidine permease from S. typhimurium, chain a (13 matches)
92 RTRLTMVFQHFNLWSHMTVLENVMEAP 118
PIV: RRGIGMVFQNYALFPHLTVLENVALGL

G. 1d4c, Flavocytochrome c fumarate reductase from S. putrefaciens, chain c (16 matches)
127 DVVIIGSGGAGLAAAVSARDAGAKVILLE 155
PIII: DVIVVGAGPAGLAAALVLARAGAKVLVIE

H. 3grs, Glutathione reductase, oxidized form (E), from human (H. sapiens) erythrocytes (13 matches)
22 DYLVIGGGSGGLASARRAAELGARAAVVE 50
PIII: DVIVVGAGPAGLAAALVLARAGAKVLVIE

I. 1qmp, Sporulation response regulator Spo0A from B. stearothermophilus, chain d (13 matches)
79 NVIMLTAFGQEDVTKKAVELGASYFILKPF 108
PV: PVIILTARDDEEDRVEGLELGADDYLTKPF

J. 1dbw, Transcriptional regulatory protein FixJ from R. meliloti, chain a (12 matches)
77 PSIVITGHGDVPMAVEAMKAGAVDFIEKPF 106
PV: PVIILTARDDEEDRVEGLELGADDYLTKPF

The Prototypes Correspond to Closed Loops

To describe the structural properties of the family members belonging to the prototype sequences, the whole PDB_SELECT set of protein crystal structures (21) was searched for the best matches to the prototypes. For each prototype, one to several representatives are found with 12 or more matching residues. Figure 2 displays the structures of the best matches to the prototype sequences. Despite structural differences between the examples, even within one and the same prototype, the detected matches are all of closed-loop character, which would not be observed if randomly chosen sequence segments from crystallized proteins were considered. This means that the loop-n-lock structure is encoded by the sequence prototypes in a way not compromised by mutational changes in their diverged representatives. Figures 2A and 2B show the top representatives of prototype PI in the available protein crystal structures, with 15 and 12 matching residues, respectively. Both sequences contain the ATP(GTP)-binding site (22-24). Apparently, sequence divergence has led to substantial differences in the protein chain paths and secondary structure elements within the loops. In particular, the closed loop in Figure 2A has a contour length five residues longer (light gray in the histogram for PI, Fig. 1). The evolutionary fate of every prototype element is expected to be rather diverse, as the examples in Figure 2 indeed demonstrate. In the case of Prototype I (Fig. 2A and B), the loop ends are separated by sequence-wise distant segments (not shown), which together form composite locks. In other cases the ends interact directly or are stabilized by spatial interactions in the context of the overall globular structure (Fig. 2E and F). The PII representatives (Fig. 2C-E) are all variants of the same combination of secondary structure elements, even though the structure in Figure 2E deviates. The structures in Figure 2G,H (PIII) and 2I,J (PV) are essentially identical, while the structure in Figure 2F (PIV) is of its own type. This further illustrates our earlier observation that closed loops of nearly standard size, 25-30 residues, may contain diverse combinations of secondary structure elements (1-3). That is, the secondary structure elements, and even their presence, are not characteristic of closed loops as a category. On the other hand, individual types of closed loops do have specific secondary structures. Interestingly, the same structure type may belong to different sequence prototypes (for example, Fig. 2G-J, prototypes PIII and PV).

Sequence/Structure Motifs in C. elegans

A similar analysis of the proteins of C. elegans, taking its first 2000 proteins as a source of initial 30-residue segments and screening the rest of the proteins, resulted in the three strongest sequence/structure prototypes. Remarkably, the highest matches to these sequence prototypes are also found in the respective PDB structures as returns of the polypeptide chain trajectory: P1 has a representative in a kinase (1a06, residues 120-149), P2 matches an ankyrin repeat (1awc_B, residues 31-60), and P3 coincides with a zinc finger (1by4_A, residues 1152-1181).
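Searching PDB_SELECT sequences for the best match to a prototype amounts to sliding the prototype along each chain without gaps and keeping the window with the most identical residues. A minimal Python sketch under that assumption follows; `best_window` and its `start` argument (the residue number of the first sequence position) are hypothetical names, and `>= threshold` is an assumed reading of "above the threshold of ten residues". The usage line checks panel C of Figure 2, where the 1b0u segment 155-186 shares 26 residues with PII.

```python
def best_window(prototype, sequence, start=1, threshold=10):
    """Best ungapped placement of a prototype along a chain.

    Returns (residue_number, segment, matches) for the window with the most
    identical positions, or None if it has fewer than `threshold` matches.
    Residue numbers assume the chain's first residue is numbered `start`.
    """
    n = len(prototype)
    best = None
    for i in range(len(sequence) - n + 1):
        segment = sequence[i:i + n]
        matches = sum(a == b for a, b in zip(prototype, segment))
        if best is None or matches > best[2]:
            best = (start + i, segment, matches)
    if best is None or best[2] < threshold:
        return None
    return best


PII = "LSGGQRQRVAIARALALEPKLLLLDEPTSALD"
# Toy chain: the Figure 2C segment padded with residues absent from PII.
chain = "WWWWW" + "LSGGQQQRVSIARALAMEPDVLLFDEPTSALD" + "WWWWW"
print(best_window(PII, chain, start=150))
```

Real chains would be read from the structure files with their own residue numbering; whether the selected segment actually forms a closed loop still has to be checked on the coordinates.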

Figure 3: Structural representatives of the consensus eukaryotic prototype sequences in crystallized proteins. A, P1 has a representative in a kinase (1a06, residues 120-149); B, P2 matches an ankyrin repeat (1awc_B, residues 31-60); C, P3 coincides with a zinc finger (1by4_A, residues 1152-1181). [Structure images not reproduced.]

A. Calmodulin-dependent protein kinase from rat (1a06, residues 120-149)
B. Mouse GABP alpha/beta domain (1awc_B, residues 31-60)
C. Retinoic acid receptor RXR-alpha (1by4_A, residues 1152-1181)

What would be the expected total number of such sequence prototypes? As a rough estimate, the data of Linial et al. (25) on the classification of protein sequence fragments of 50 residues can be used. Although this fragment size is different, the estimated number (about 100) of distinct prototype sequences of 50 residues should be of the same order as for the 30-residue sequence motifs. Our experience indicates that the number of sequence prototypes may turn out to be even smaller. Every loop-n-lock element of a protein would correspond to one sequence type and, perhaps, to several dominating secondary structure types. Thus, proteins would be characterized as sequences of units, written in an alphabet corresponding to the limited variety of prototypes (see also (25)). Such a description would be important not only for the functional diagnostics of a given protein sequence; it would also provide a lead for sequence-based calculation of the folded structure of the protein, since every prototype sequence unit also forms a closed loop with specific consensus structural details. Conservation of the chain return property was specifically demonstrated in our recently published work (26). The tertiary contacts between the loops would be the next stage of protein folding calculations. The linear succession of closed loops in globular proteins (1-3) suggests cotranslational folding, whereby newly synthesized sections fold consecutively, loop after loop. Verifying this scenario would require high-resolution experimentation. We believe that the hypothetical initial loop closures during cotranslational folding are not necessarily final. The loops may open and close again in the process of folding, and some may acquire final non-loop conformations. In general, however, loop formation leads to a tremendous reduction of the conformational space during folding (27) and thus provides a basis for resolving the so-called Levinthal paradox (28, 29).

Concluding Remarks

Mapping the well-characterized basic loop-n-lock units along a protein sequence would, essentially, provide all the features necessary for the sequence, structural and functional characterization of the protein. Thus, the repertoire of prototype loop-n-lock units would represent a proteomic code. The eight strongest units described above are the beginning of a long and laborious journey towards completing the proteomic code.
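As a purely illustrative sketch of what mapping the units along a protein sequence could look like computationally, the Python function below greedily tiles a sequence with non-overlapping prototype matches and returns the resulting string of unit labels. The greedy left-to-right scheme, the threshold handling and all names are hypothetical assumptions; the text does not describe a mapping algorithm.

```python
def map_units(sequence, prototypes, threshold=10):
    """Greedy left-to-right tiling of a sequence with prototype matches (hypothetical).

    At each position, the best prototype with >= threshold identical residues
    is recorded and the scan jumps past it; otherwise it advances by one residue.
    Returns a list of (start_position_1_based, prototype_name, matches).
    """
    units, i = [], 0
    while i < len(sequence):
        best = None
        for name, proto in prototypes.items():
            window = sequence[i:i + len(proto)]
            if len(window) < len(proto):
                continue  # prototype would run past the end of the sequence
            matches = sum(a == b for a, b in zip(proto, window))
            if matches >= threshold and (best is None or matches > best[2]):
                best = (i + 1, name, matches, len(proto))
        if best is None:
            i += 1
        else:
            units.append(best[:3])
            i += best[3]
    return units


# Toy example with two made-up four-letter "prototypes".
units = map_units("ABCXEFGHZZ", {"U1": "ABCD", "U2": "EFGH"}, threshold=3)
print(units)                                  # list of (position, unit, matches)
print("-".join(name for _, name, _ in units))
```

A greedy scan is only the simplest choice; it can miss a better tiling when matches overlap, and an optimal segmentation (for example, by dynamic programming over match scores) would be a natural refinement once a fuller repertoire of prototypes is available.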

Acknowledgements

We are grateful to S. Pietrokovski for help with the BLOCKS database, to M. D. Frank-Kamenetskii for discussion, to A. R. Fersht and D. Eisenberg for comments, and to Mrs. A. Weinberg for editing the text. I. N. B. is a Postdoctoral Fellow of the Feinberg Graduate School, Weizmann Institute of Science. V. M. K. is supported by the Ministry of Absorption.

References and Footnotes

1. Berezovsky, I. N., Grosberg, A. Y. and Trifonov, E. N. FEBS Lett. 466, 283-286 (2000).
2. Berezovsky, I. N. and Trifonov, E. N. J. Mol. Biol. 307, 1419-1426 (2001).
3. Berezovsky, I. N. and Trifonov, E. N. Protein Eng. 14, 403-407 (2001).
4. Lamarine, M., Mornon, J.-P., Berezovsky, I. N. and Chomilier, J. Cell. Mol. Life Sci. 58, 492-498 (2001).
5. Trifonov, E. N., Kirzhner, A., Kirzhner, V. M. and Berezovsky, I. N. J. Mol. Evol. 53, 394-401 (2001).
6. Berezovsky, I. N., Kirzhner, A., Kirzhner, V. M. and Trifonov, E. N. Proteins 45, 346-350 (2001).
7. Trifonov, E. N. and Berezovsky, I. N. Curr. Opin. Struct. Biol. 13, 110-114 (2003).
8. Levitt, M. and Chothia, C. Nature 261, 552-558 (1976).
9. Wierenga, R. K., Terpstra, P. and Hol, W. G. J. J. Mol. Biol. 187, 101-107 (1986).
10. Han, K. F., Bystroff, C. and Baker, D. Protein Sci. 6, 1587-1590 (1997).
11. Bystroff, C. and Baker, D. J. Mol. Biol. 281, 565-577 (1998).
12. Dokholyan, N. V. and Shakhnovich, E. I. J. Mol. Biol. 312, 289-307 (2001).
13. Liu, Y. and Eisenberg, D. Protein Sci. 11, 1285-1299 (2002).
14. Apic, G., Gough, J. and Teichmann, S. A. J. Mol. Biol. 310, 311-325 (2001).
15. Todd, A. E., Orengo, C. A. and Thornton, J. M. J. Mol. Biol. 307, 1113-1143 (2001).
16. Teichmann, S. A., Murzin, A. G. and Chothia, C. Curr. Opin. Struct. Biol. 11, 354-363 (2001).
17. Aravind, L., Mazumder, R., Vasudevan, S. and Koonin, E. V. Curr. Opin. Struct. Biol. 12, 392-399 (2002).
18. Grishin, N. V. J. Struct. Biol. 134, 167-185 (2001).
19. Kinch, L. N. and Grishin, N. V. Curr. Opin. Struct. Biol. 12, 400-408 (2002).
20. Lupas, A. N., Ponting, C. P. and Russell, R. B. J. Struct. Biol. 134, 191-203 (2001).
21. Hobohm, U. and Sander, C. Protein Sci. 3, 522-524 (1994).
22. Henikoff, S. and Henikoff, J. G. Genomics 19, 97-107 (1994).
23. Saraste, M., Sibbald, P. R. and Wittinghofer, A. Trends Biochem. Sci. 15, 430-434 (1990).
24. Hofmann, K., Bucher, P., Falquet, L. and Bairoch, A. Nucleic Acids Res. 27, 215-219 (1999).
25. Linial, M., Linial, N., Tishby, N. and Yona, G. J. Mol. Biol. 268, 539-556 (1997).
26. Berezovsky, I. N., Kirzhner, V. M., Kirzhner, A., Rosenfeld, V. R. and Trifonov, E. N. Protein Eng. 15, 955-957 (2002).
27. Ittah, V. and Haas, E. Biochemistry 34, 4493-4506 (1995).
28. Berezovsky, I. N. and Trifonov, E. N. J. Biomol. Struct. Dynam. 20, 5-6 (2002).
29. Berezovsky, I. N. and Trifonov, E. N. J. Biomol. Struct. Dynam. 20, 315-316 (2002).

Date Received: January 29, 2003

Communicated by the Editor Maxim Frank-Kamenetskii
