De novo peptide sequencing by tandem MS using complementary CID and electron transfer dissociation

Share Embed


Descripción

3736 Andreas Bertsch1 Andreas Leinenbach2,3 Anton Pervukhin4 Markus Lubeck5 Ralf Hartmer5 Carsten Baessmann5 Yasser Abbas Elnakady6 Rolf Mu¨ller6 Sebastian Bo¨cker4 Christian G. Huber3 Oliver Kohlbacher1 1

Center for Bioinformatics Tu¨bingen, Eberhard-KarlsUniversita¨t Tu¨bingen, Tu¨bingen, Germany 2 Instrumental Analysis and Bioanalysis, Saarland University, Saarbru¨cken, Germany 3 Department of Molecular Biology, Division of Chemistry and Bioanalytics, University of Salzburg, Salzburg, Austria 4 Lehrstuhl Bioinformatik, Friedrich-Schiller-Universita¨t Jena, Jena, Germany 5 Bruker Daltonik GmbH, Bremen, Germany 6 Department of Pharmaceutical Biotechnology, Saarland University, Saarbru¨cken, Germany

Electrophoresis 2009, 30, 3736–3747

Research Article

De novo peptide sequencing by tandem MS using complementary CID and electron transfer dissociation De novo sequencing of peptides using tandem MS is difficult due to missing fragment ions in the spectra commonly obtained after CID of peptide precursor ions. Complementing CID spectra with spectra obtained in an ion-trap mass spectrometer upon electron transfer dissociation (ETD) significantly increases the sequence coverage with diagnostic ions. In the de novo sequencing algorithm CompNovo presented here, a divide-and-conquer approach was combined with an efficient mass decomposition algorithm to exploit the complementary information contained in CID and ETD spectra. After optimizing the parameters for the algorithm on a well-defined training data set obtained for peptides from nine known proteins, the CompNovo algorithm was applied to the de novo sequencing of peptides derived from a whole protein extract of Sorangium cellulosum bacteria. To 2406 pairs of CID and ETD spectra contained in this data set, 675 fully correct sequences were assigned, which represent a success rate of 28.1%. It is shown that the CompNovo algorithm yields significantly improved sequencing accuracy as compared with published approaches using only CID spectra or combined CID and ETD spectra. Keywords: CID / De novo / ETD / OpenMS / Sorangium cellulosum DOI 10.1002/elps.200900332

Received December 18, 2008 Revised July 22, 2009 Accepted July 29, 2009

1 Introduction The fully automated annotation of mass spectra generated by MS/MS of proteolytic peptides followed by computer-aided database searching forms the basis of high-throughput identification of proteins in proteome analysis [1, 2]. Nevertheless, the majority of mass spectra generated in a proteomic experiment remains un-annotated, mainly because of incomplete sequence information in the mass spectra or the lack of sequence information about the protein under investigation in sequence databases. Moreover, the similarity of many protein subsequences together with incomplete sequence information in the mass spectra results in falseCorrespondence: Andreas Bertsch, Universita¨t Tu¨bingen, Wilhelm Schickard Institute, Sand 14, 72076 Tu¨bingen, Germany E-mail: [email protected] Fax:149-7071-29-5152

Abbreviations: ECD, electron capture dissociation; ETD, electron transfer dissociation

& 2009 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim

positive identifications [3]. In order to increase the reliability of protein identifications based on MS data in proteome analysis, it is therefore mandatory both to improve coverage of the complete peptide sequence with diagnostic fragment ions and to implement efficient computer algorithms to automatically interpret the fragment ion spectra [4]. Gas-phase fragmentation of peptide ions by low-energy CID allows inference of their sequence from typical ion series [5, 6], which means primarily b-ions and y-ions resulting from cleavage of the peptide bond. More recently, CID has been complemented by electron-induced fragmentation methods such as electron capture dissociation (ECD, [7]) or electron transfer dissociation (ETD, [8]). In ETD, multiply protonated peptide ions are fragmented upon transfer of electrons from an electron carrier molecule derived from anthracene or fluoranthene by chemical ionization, resulting in dissociation of the peptide radical ions due to highly exothermic electron recombination reactions. In contrast to CID, ETD has no preferred cleavage sites except proline, which leads to a more uniform distribution of the fragment ions, mainly of c- and z-type, along the www.electrophoresis-journal.com

Electrophoresis 2009, 30, 3736–3747

peptide sequence. Hence, CID and ETD are complementary fragmentation techniques, leading to a more comprehensive coverage of a peptide sequence with fragment ions [9]. It was shown that the sequence coverage of doubly charged peptide ions could be increased from 84% with CID alone to 91% upon combining fragment information from CID and ECD [4]. If a sequence database of the species under investigation is available, annotation of fragment ion mass spectra can be performed by matching theoretical ions of candidate peptide sequences to the fragment ions observed in the experimental mass spectrum. A variety of database searching algorithms are currently in use for fully automated spectrum interpretation, such as MASCOT [2], SEQUEST [1], OMSSA [10], and X!Tandem [11]. Although these approaches are standard procedures in proteomic research, there are cases where appropriate sequence databases are unavailable or where sequencing errors or sequence polymorphisms render database searching unsuitable. For those cases, de novo identification, i.e. the inference of the peptide sequence directly from the experimental spectrum, represents a viable alternative. The de novo identification problem has been studied intensively [12–17]. Common methods can be subdivided into two steps: candidate sequence generation and scoring. Most of the approaches (e.g. [14–22]) use a spectrum graph, in which the nodes correspond to ions of the spectrum and edges correspond to valid mass differences. A valid mass difference means that the difference between the m/z ratios of two ions in the spectrum corresponds to the mass of one of the common amino acids. The candidate sequences are generated by traversal through the graph. Another approach makes use of a hidden Markov model, which directly scores the generated sequence [23]. A divide-and-conquer approach, where the candidates are generated from subspectra (between abundant ions) via exhaustive enumeration of amino acid combinations has also been used [24] for de novo sequencing. Very recently, an integer linear program formulation was published [25], showing superior performance compared with other well-known de novo searching tools both for data measured on high-resolution quadrupole time-of-flight and for low-resolution iontrap mass spectrometers. Although some approaches show good performances on high-quality spectra, in which most of the theoretical ions are present, the overall performance compared with database-driven approaches is still poor. Scoring of candidate sequences against the experimental spectrum was also studied intensively (e.g. [15, 26–28]), because such a scoring procedure is needed by most of the database-driven and de novo approaches. Horn et al. [29] presented a method of whole protein de novo sequencing employing CID and ECD as complementary fragmentation techniques on a high-resolution Fourier-transform ion cyclotron resonance mass spectrometer. Exploiting the high mass accuracy of Fourier-transform ion cyclotron resonance-MS/MS and the complementarity of both fragmentation techniques, the same group devised a linear de novo sequencing algorithm, which starts with the most reliable & 2009 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim

Proteomics and 2-DE

3737

fragment ion data and propagates linearly by including additional amino acid masses [4]. From a total of 7852 MS/MS data sets included in the de novo sequencing evaluation, the completely correct sequences of 1704 peptides could be annotated, representing 22% of all peptides identified of all peptides identified by MASCOT with a Mowse score greater than or equal to 34. Very recently, Datta et al. [30] implemented a sophisticated machine learning approach to make use of ETD and CID spectra for de novo sequencing. They use a tree-augmented naive Bayes classifier to learn specific features of fragmentation types in the spectra and combine them into one spectrum. Then a graph-based approach was employed to obtain the final list of peptides. The focus of this work, however, was on the generation of peptide candidates. Recent instrumental developments have led to the commercial availability of ion-trap mass spectrometers, which are able to generate CID and ETD spectra from the same precursor ion in a short period of time [8]. Given the fact that high-resolution mass spectrometers are quite expensive and that the majority of protein identification data today is still measured on low-resolution ion-trap mass spectrometers, we aim in this study at a method for the de novo sequencing of peptides employing CID and ETD in an ion-trap mass spectrometer. We demonstrate that these spectra contain complementary information that can be used to make de novo identification much more reliable. In order to optimize the parameters employed for a combinatorial algorithm, which we call CompNovo, we utilize CID and ETD fragment ion spectra obtained from tryptic peptides of nine standard proteins. The algorithm then employs a divide-and-conquer approach together with rapid mass decomposition to annotate the fragment ion masses derived from the combined CID and ETD spectrum. Finally, the optimized parameters are utilized to obtain de novo sequence information from CID and ETD fragmentation data derived from the measurement of the whole proteome of the bacterium Sorangium cellulosum [31].

2 Materials and methods 2.1 Experimental data generation For the generation of the training data set, nine proteins, obtained from Sigma (St. Louis, MO) or Fluka (Buchs, Switzerland), were partitioned into two mixtures and tryptically digested (trypsin from Promega, Madison, WI) using published protocols [32]. The mixtures contained the proteins in concentrations between 0.3 and 1.7 pmol/mL. Mixture 1 comprised thyroglobulin (bovine thyroid gland) and albumin (bovine serum), whereas mixture 2 contained b-casein (bovine milk), conalbumin (chicken egg white), myelin basic protein (bovine), hemoglobin (human), leptin (human), creatine phosphokinase (rabbit muscle), a1-acidglycoprotein (human plasma), and albumin (bovine serum). The resulting peptide mixtures were then separated using capillary ion-pair RP HPLC (IP-RP-HPLC) and www.electrophoresis-journal.com

3738

A. Bertsch et al.

subsequently identified by ESI-MS/MS as described in detail in [32, 33]. Briefly, one-microliter samples was desalted and preconcentrated at a flow rate of 10 mL/min of 0.05% aqueous TFA in a capillary/nano HPLC system (Model Ultimate, Dionex Benelux, Amsterdam, The Netherlands) using a 10.0  0.2 mm monolithic poly-(styrene/ divinylbenzene) preconcentration column, which was prepared according to the published protocols [32, 34]. The separation of the preconcentrated peptides was carried out upon transfer to a 50  0.1 mm monolithic poly-(styrene/ divinylbenzene) separation column (Dionex Benelux), employing a gradient of 0–40% ACN in 0.05% v/v aqueous TFA in 60 min at 551C. Digests were analyzed in quadruplicate at a flow rate of 0.7 mL/min. An ion-trap mass spectrometer with implemented ETD module (Model HCTultra PTM discovery system, Bruker Daltonik GmbH, Bremen, Germany) and equipped with a modified ESI-ion source (spray capillary: fused silica capillary, 0.090 mm od and 0.020 mm id) was used. A detailed description of the ETD setup of the ion-trap instrument including the negative chemical ionization source needed for the generation of the reagent anion of fluoranthene and the transfer into the ion trap can be found in [35]. Tandem mass spectra were generated using sequential CID and ETD fragmentation of the same precursor ion. Singly charged analytes were automatically excluded. The ETD reaction time, i.e. the reaction time for electron transfer from fluoranthene anions to peptide cations, was set to 100 ms. A short low-energy collisional activation step (‘‘smart decomposition’’, Bruker Daltonics [36]) was performed after electron transfer to further improve the fragmentation especially of doubly and triply charged peptides, because the fragments often do not dissociate due to non-covalent binding forces. The scan range was 210–2000 Da to exclude fluoranthene ions and to achieve a reasonable scan time for one spectrum pair. To generate a data set of a real life sample, approximately 690 mg of proteins extracted from cultivated S. cellulosum [31], a soil living bacterium, were tryptically digested (trypsin from Promega) using the published protocols [32]. The resulting peptide mixture was separated using an offline two-dimensional HPLC setup exactly as described in [37]. IP-RP-HPLC-ESI-MS/MS conditions and parameters for alternating CID and ETD fragmentation and detection were as described in the previous paragraph.

2.2 Data preprocessing A training data set was extracted from the analysis of the two digests of the nine proteins. After optimization of the algorithmic parameters with the training data set, the algorithm was applied to the de novo sequencing of peptides in a digest of a protein extract from S. cellulosum. Since none of the commercially available database-driven search engines supported the use of CID and ETD spectrum pairs during the identification process and identification using & 2009 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim

Electrophoresis 2009, 30, 3736–3747

ETD only showed considerably poorer performance with very few identifications unique to ETD, only CID spectra were used for the annotation of the peptides in the data sets. CID spectra from the eight HPLC runs of the standard protein mixture were identified through database searches using MASCOT version 2.1.03 [2], allowing one missed cleavage and carboxymethylation of cysteine as fixed modification and oxidation of methionine as variable modification. Oxidized methionine has almost the same mass as phenylalanine, which will be taken into consideration in our evaluation below. Spectra of doubly charged precursors and hits to tryptic peptides with a peptide score greater than 10 were used. This low-score cut-off was comparable to that of used by others [38] and ensured that spectra of lower quality were also included in the training data set. Manual inspection of the spectra with scores less than 10 showed that the ion series were highly incomplete and the spectra therefore not suitable for parameter estimation. In order to avoid bias to specific sequences, we allowed spectra corresponding to the same peptide sequence and charge combination only once in the data set. The final data set was constructed by selecting all spectra of peptides with a molecular mass less than or equal to 2000 Da where the peptide sequence was a sub-sequence of one of the proteins present in the sample. This selection resulted in a data set of 156 pairs of ETD/CID spectra of different doubly charged peptides. Only seven triply charged peptides were present and these were treated separately. In order to distinguish correctly from incorrectly determined sequences, the peptides in the benchmark data set of S. cellulosum were identified using MASCOT, version 2.2.04, OMSSA, version 2.1.4 and X!Tandem, version 08-02-01. The protein sequence database used included all 9320 protein sequences from S. cellulosum of Uni-ProtKB/Swiss-Prot release 15.4/57.4 and 86 protein sequences of trypsin and keratin. Cysteine carboxymethylation and methionine oxidation were included as variable modifications and up to one missed cleavage was allowed during the search runs. To ensure that the peptides included in the data set were correctly identified, a more complex verification procedure was performed for the identifications due to the unknown protein content in the sample. The identification was carried out using a composite database in which reversed sequences of all proteins were appended to the original S. cellulosum sequence database [39]. False discovery rates were estimated from the obtained identification lists. A peptide sequence was accepted if at least two identification engines identified the peptide as a top hit with a false discovery rate of 5% or better. If more than one spectrum pair represented the same sequence, the one with the best average false discovery rate was chosen. The 5% false discovery rate score cut-offs for the three search engines were 0.5 for the MASCOT E-value (MASCOT Mowse score 420), 0.09 for the X!Tandem E-value, and 0.1 for the OMSSA E-value. This means that only scores better than the thresholds were accepted for the consensus. Using the 38 261 MS/MS spectrum pairs in our www.electrophoresis-journal.com

Electrophoresis 2009, 30, 3736–3747

data set, this filtering procedure resulted in 2406 spectrum pairs of charge two and 123 of charge three, in which each spectrum pair corresponded to a unique peptide sequence. The data sets used in this study have been deposited in the Proteomics Identification Database (PRIDE [40, 41]) and are publicly available under the experiment accessions 8689 and 8690.

2.3 Algorithm overview The CompNovo algorithm comprises two steps: candidate generation and candidate scoring. After preprocessing of the spectrum, a divide-and-conquer algorithm decomposes the spectrum into smaller parts until the segment is small enough such that the mass decomposition can generate all amino acid compositions for a given mass difference. From this list of compositions, all permutations of sequences are generated, which are then scored against the spectra. The best scoring sequences are kept and used as candidates for the segment. After all segments have been processed, the final candidate list is generated and the candidate sequences are scored against the spectra. This is performed by a twostep approach. First, very simple spectra are generated from

Figure 1. Schematic diagram of the CompNovo de novo sequencing algorithm. Each spectrum pair was preprocessed to assign a score for being a y-ion to each peak. The spectrum was subdivided into smaller sections using putative y-ions as pivot ions. Sufficiently small subsections were then decomposed into amino acid compositions, which in turn were used to generate subsequences. The scored subsequences of two subsections of a section were joined and scored against the spectrum pair. Candidates of the whole spectrum were re-scored using a more detailed scoring scheme.

& 2009 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim

Proteomics and 2-DE

3739

the candidate sequences and scored against the experimental spectra. The best candidates are then used in a second scoring step to generate more detailed theoretical spectra, e.g., by including additional ion series, losses, and isotope clusters, which are in turn scored against the experimental spectra. The various modules of the CompNovo sequencing algorithm are schematically shown in Fig. 1. We will now describe the individual parts of the algorithm in more detail. The CompNovo algorithm is part of the OpenMS platform and downloadable from www.openms.de. The source code of CompNovo is available and will be released under an open-source license as part of the upcoming version of our software packages OpenMS [42] and TOPP [43].

2.4 Spectrum preprocessing The CompNovo algorithm starts with y-ions of the CID spectrum. To select as many true y-ions as possible from the spectrum and to reduce the probability of selecting other ion types in the divide-and-conquer step, the following scoring steps are performed. The initial score for each of the ions present in the spectrum is its intensity. As ETD spectra contain abundant peaks of the precursor ion in almost all cases [44], the mass of the peptide can be determined within the mass tolerance of the tandem MS scan. This allows an accurate scoring of the complementary features of the spectra. First, the isotope patterns of each of the peaks are scored against the theoretical isotope distribution pattern based on the molecular mass and the average elemental composition of peptides with respect to carbon, hydrogen, oxygen, and nitrogen. A simple correlation is used as a multiplicative scoring in subsequent steps. Doubly charged ions that have isotope ions correlating well with a theoretical isotope distribution are transformed and inserted as singly charged ions into the spectrum, or the intensity is added to the corresponding singly charged ion if this is already present in the spectrum. The scores of a witness set of ions, i.e. the set of all ions that support the y-ion, are used to enhance its score. This includes neutral loss of water and ammonia from y-ions as well as complementary b-ions, which can be witnessed by putative a-ions. In addition to the witness set present in the CID spectrum, the c- and z-type ions of the ETD spectrum are used to provide further evidence that a supposed y-ion is correctly assigned. ETD spectra often contain radical variants of the c- and z-type ions also [44]; therefore, the single radical ions are also used for scoring. The intensities of the y-ion and the weighted intensities of the witness ions are added. Here, the weighting factor is derived from the relative difference of the deviation of the expected versus calculated mass difference. For instance, if a putative y-ion is witnessed by a water loss ion, the mass difference between both ions should be 18.0 Da. However, if it is observed in a distance of 18.2 Da the factor is 0.5, given a mass tolerance of 0.4 Da in the MS/MS scan. This is the same factor as used in the www.electrophoresis-journal.com

3740

Electrophoresis 2009, 30, 3736–3747

A. Bertsch et al.

spectra comparison, described in Section 2.8. Finally, the low-mass-range and high-mass-range candidates are checked to determine whether a mass decomposition is possible for the ions. If no valid mass decomposition was found for a specific ion, the ion cannot be a y-ion and the score is set to 0 to avoid selection as pivot ion. All the ions which have positive scores can be used as pivot ions.

2.5 Rapid mass decomposition In the second step of candidate generation, mass differences between ions are used to generate candidates for the subsequent divide-and-conquer analysis. Given a pair of pivot masses, we search for all amino acid compositions that are compatible with this mass difference, that is, have a theoretical mass close to the measured mass difference. Clearly, we cannot recover the order of amino acids solely from the mass difference, but we can infer the multiplicity of each amino acid in every hypothetical amino acid sequence explaining the mass difference. In this context, we say that we decompose the mass difference over the amino acid alphabet. To find all of these decompositions, there exist efficient algorithms [45, 46] where running time only depends on the number of decompositions of the input mass, and not on the mass itself. In fact, these algorithms can only be applied to integer masses, and hence we use the correction described in [47] to account for rounding error accumulation. This approach is efficient and at the same time guarantees that no decomposition is missed. Mass accuracy of the ion-trap mass spectra used in our analysis is comparatively low. To guarantee that the complete analysis pipeline is sufficiently fast, decompositions must be computed in a matter of milliseconds. Doing so over the amino acid alphabet, which can also include additional modified amino acid masses, can be carried out only for relatively small mass differences.

2.6 Divide-and-conquer approach In order to be able to correlate measured mass differences with a reasonable number of peptide sequence candidates, subsequences are generated using a divide-and-conquer approach, similar to the one described by Zhang [24]. The key difference is that, instead of using b-ions as pivot ions, our approach uses y-ions. A high-scoring putative y-ion is used in the beginning to divide the spectrum into two parts. The upper m/z part represents the prefix of the peptide sequence and the lower m/z part the suffix of the peptide sequence. If such a newly created segment is wider than a given threshold, it is further divided into two new subsegments using a pivot ion. To ensure that the correct sequence is generated, and to guard against occasional false pivot ions (non-y-ions), several pivot ions are selected per step. If the subsegment is small enough, the rapid mass decomposition algorithm is used to generate all possible & 2009 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim

amino acid compositions explaining the observed mass difference. From each of the compositions all permutations of possible sequences are generated. To reduce the number of permutations, they are scored against the experimental spectra and only the best candidates are kept. This is done by creating theoretical spectra of the peaks generated by the permutations for a subsegment. When a subsegment is located somewhere in the center of the whole mass range of the spectrum, a prefix and a suffix are added during generation of the theoretical spectra. These spectra are then in turn used for comparison with the experimental spectrum pair. When the generation of the candidates of both segments of a divide step is completed, the results are combined. Each of the candidates of the left segment is combined with every candidate of the right segment. The created set of candidates is again scored and only the best candidates are kept. The whole procedure is repeated for several ions and the candidates are recorded for the final scoring of the full-sized sequence candidates.

2.7 Theoretical spectra Theoretical spectra are used to score against the experimental spectra. CID and ETD spectra are treated separately, and then both scores are simply added to get a similarity of a sequence and the experimental spectrum pair. To speed up computations, CID spectra are generated in two ways. The first type is used in the divide-and-conquer part and includes only b- and y-ions of same intensities. The second type of CID spectra also accounts for intensities of the different ions [1] including a-, b-, and y-ions as well as neutral losses from b- and y-ions. Water loss ions are added if the b- or y-ion contains at least one of S, T, E, or D. Ammonia loss ions are added if the b- or y-ion contains at least one of Q, N, R, or K. The relative intensity of y-ions is 100%, b-ions are included with a relative intensity of 80%. Losses from y-ions are included with 10% and losses from b-ions with 2% relative intensity. The intensity of the b-and y-ions (but not of the loss ions) is distributed over the monoisotopic and the isotope peaks according to the isotope distribution. ETD spectra are generated using c-and z-type ions and using the isotope distribution to add also isotope peaks. No differences are made for the intensities of different ion types in ETD spectra.

2.8 Candidate scoring Finally, the candidates produced by the divide-and-conquer approach are scored against the experimental spectra. For this purpose theoretical CID and ETD spectra of the peptides are generated. These spectra are compared with the experimental spectrum and a ranked list according to the similarity is created. As the number of candidates can be very large, a two-step scoring is applied. The similarities of the spectra are scored using the following similarity www.electrophoresis-journal.com

Electrophoresis 2009, 30, 3736–3747

function and a weighting factor is used to account for the differences in expected and observed ion positions: fi;j ¼

e  jPi  Pj j ; jPi  Pj j

where f is the weighting factor of ion i of the first spectrum and ion j of the second spectrum. The positions are noted as Pi and Pj , respectively. Parameter e is the mass tolerance allowed for the scoring and it was set to 0.4. The similarity of two spectra is calculated as follows: P pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi Ii  Ij  fi;j i;j ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi s ¼ pP P ffi ; I i i j Ij where Ii represents the intensity of the i-th peak of the first spectrum and Ij stands for the intensity of the j-th peak of the second spectrum. For the first scoring, the numerator is calculated by using all possible pairs of ions where the positions do not differ more than e. For the more expensive final scoring, this is only calculated for the best pairs, which means each ion can have at most one partner in the other spectrum. The pairs are determined using a spectrum alignment algorithm, where the number of paired ions is maximized and the sum of position distances is minimized. Intensities are ignored by the spectrum alignment algorithm. First, very simple spectra for CID consisting only of b- and y-ions are created, and for theoretical ETD spectra only c- and z-type ions are used. The candidates are scored using the first version of the scoring scheme. The best peptide sequences of the pre-scoring are used for an indepth scoring. Therefore, b- and y-ions as well as a-ions and neutral losses are used with different intensities in a theoretical spectrum. Additionally, doubly charged b- and y-ions are used. The ETD spectra are simpler because only c-type ions and z-type ions are used. The ions and the isotopes of the ions are used for scoring against experimental CID and ETD spectrum. Finally, a ranked list is generated from the scores, which is the output of the algorithm.

2.9 Quality assessment To count the number of correct amino acids of the predicted versus the correct peptide sequence, the following rules are used. Amino acid residues I/L, Q/K as well as F and oxidized methionine cannot be distinguished by their masses on low-resolution mass spectrometers and are therefore treated as one amino acid. The predicted and correct sequences are compared from the left to the right and an amino acid of the predicted peptide sequence is counted as correct if the corresponding amino acid in the correct sequence is identical. ‘‘Corresponding’’ in this case means that the prefix masses of the predicted and correct peptide sequences of the amino acid pair under consideration do not differ by more than 2 Da. This relatively high threshold takes into account the sum of the precursor mass tolerance and the fragment mass tolerance. This threshold also handles the case, e.g., & 2009 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim

Proteomics and 2-DE

3741

of the peptides DFPIANGER (correct) and NFPIANGER (sequenced) where the prefix masses of the correct amino acids, FPIANGER are shifted by 1 Da. The rules given above have another effect. If, for example, a correct amino acid K is predicted as a sub-sequence GA, this is counted as one error, if a correct subsequence GA is predicted as K, this is counted as two errors. The longest consecutive amino acid sequence of a predicted peptide and the number of incorrectly predicted amino acids are then computed using these rules.

2.10 Parameter determination for the CompNovo algorithm The parameters of the CompNovo algorithm used for the benchmark data set were determined using the training data set. To do so, a grid search for each of the individual parameters was performed. Final parameters were 450 Da for the maximal decomposition mass difference, the 40 best solutions were kept for each subsegment, and then divide steps were performed nine times using nine different pivot ions, if available. The 250 best-scoring peptide candidates were selected for rescoring using detailed CID spectra. For the specification of arbitrary post-translational modifications, CompNovo used the controlled vocabulary terms as specified by the PSI-MOD working group [48]. Additionally, the modifications can be set to fixed or variable, as the case requires.

2.11 Benchmarking against other de novo programs PepNovo [21], version v2.00, was used employing LTQ_LOW_TRYP as model parameter, which matches a low-resolution ion-trap mass spectrometer. The precursor mass tolerance was set to 1.5 Da, which was the same as for CompNovo. The fragment mass tolerance was automatically set to 0.4 Da by the chosen model. A post-translational modification was created for cysteine with a shift of 158.0055 Da to account for the carboxymethylation of cysteine, which was set as variable. LutefiskXP [17], version 1.0.5a, was used. Default parameters were used except for the ion tolerance, which was set to 0.4 Da, the same as for CompNovo. Additionally, the cysteine mass was set to the mass of carboxymethylated cysteine.

3 Results and discussion 3.1 Complementary fragmentation observed in ETD and CID spectra One of the major advantages of using both ETD and CID fragmentation techniques in one experiment rests within the complementary properties of the spectra they produce [49, 50]. Peptides dissociated by CID tend to show fragments www.electrophoresis-journal.com

3742

Electrophoresis 2009, 30, 3736–3747

A. Bertsch et al.

A 0.8

0.2 Relative Abundance

Relative Abundance

0.7

B b-ions y-ions

CID 2+

0.6 0.5 0.4 0.3 0.2

c-ions z-ions

ETD 2+

0.15

0.1

0.05

0.1 0

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

0

1

0.2

Mass of Ion/Mass of Precursor Ion

C

1 0.8 0.6 0.4

1

c-ions z-ions

ETD 3+ 0.3 Relative Abundance

Relative Abundance

D0.35

b-ions y-ions

CID 3+

0.3 0.4 0.5 0.6 0.7 0.8 0.9 Mass of Ion/Mass of Precursor Ion

0.25 0.2 0.15 0.1

0.2 0.05 0

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

0

0.2

Mass of Ion/Mass of Precursor Ion

from cleavages more in the middle of the sequence [51]. In contrast, peptides dissociated through ETD fragment preferentially toward the ends of the sequence. These effects are shown in Figs. 2A and B, which was created using doubly charged peptides of the benchmark data set of S. cellulosum peptides. It shows clearly that, although b-ions or y-ions might be present, the intensities are comparatively low at the terminal sequence positions. On the contrary, c-and z-ions frequently occur in high abundance in the higher mass range, indicating preferential cleavage of terminal amino acid residues. As reported previously, ETD-specific ions generally have significantly lower intensities compared with those generated by CID [49, 50]. Figures 2C and D show the same analysis using triply charged peptides. Although ETD spectra tend to show many abundant ions, the overall sequence coverage in the CID spectra drops dramatically, especially for high-mass ions. To quantify this observation and to demonstrate the complementarity of the fragmentation methods, we performed another statistical analysis of the frequency of observation of the ions in our benchmark data set for CID and ETD fragmentation. For each backbone position the number of observed ions was counted and plotted in Fig. 3. Figures 3A and B show the complementarity of the fragmentation properties of ETD and CID for doubly charged precursor ions. It can be seen that the amino- and carboxyterminal amino acids are primarily covered by c- and z-type ions in the ETD spectra, whereas the amino acids in the middle of the peptide sequence are much more completely covered with b- and y-ions. The combination of both types is thus the logical consequence in an attempt to cover the complete sequence in a de novo sequencing approach. & 2009 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Figure 2. Relative abundances of the different ions produced upon CID (A, C) and ETD fragmentation (B, D) of doubly (A, C) and triply (B, D) charged precursor ions as a function of the molecular mass of the ions. Adapted from [51], the bars show the median peak intensities and the lines show the 25th and 75th percentile of the peak intensities. The upper plots were created using doubly charged peptides (n 5 2406) and the bottom ones using triply charged peptides.

Mass of Ion/Mass of Precursor Ion

Figures 3C and D show that although ETD spectra of triply charged peptides cover most of the sequence ions with reasonable probabilities, the probabilities of seeing ions specific to CID fragmentation drop dramatically in this case. This makes the combination of both types of fragmentation less useful in the case of triply charged ions.

3.2 Application of the CompNovo algorithm to the analysis of an S. cellulosum protein lysate In order to determine the most suitable parameters for the CompNovo sequencing algorithm, we used the CID and ETD mass spectra generated in an ion-trap instrument operated in alternating CID/ETD data acquisition mode of the same precursor ion. So-called smart decomposition was enabled in the instrument, which imposes a short collisional activation step before the expulsion of the fragment ions from the trap to support the dissociation of radical precursor ions activated by electron transfer. A well-defined training set of tryptic peptides obtained upon digestion of nine standard proteins and acceptance only of peptide sequences as correct that were part of the included protein sequences (resulting in 156 pairs of CID/ETD spectra) allowed adjusting the parameters to the values given in Section 2. Using the optimized parameters, the algorithm was applied to the de novo sequencing of tryptic peptides from proteins contained in a lysate of S. cellulosum bacteria. Since we cannot define a priori which sequence delivered by the algorithm is a correct sequence in the large data set, we used consensus identifications by means of at least two of the three different database searching routines, including www.electrophoresis-journal.com

Proteomics and 2-DE

Electrophoresis 2009, 30, 3736–3747

1

b-ions y-ions

CID 2+

B Relative Frequency

Relative Frequency

A

0.8 0.6 0.4 0.2

C Relative Frequency

1

1

2

3

4

0.8 0.6 0.4

0

5 k-5 k-4 k-3 k-2 k-1 Backbone Position b-ions y-ions

CID 3+

0.8 0.6 0.4 0.2 0

c-ions z-ions

ETD 2+

0.2

D 1 Relative Frequency

0

1

1

2

3

4

5 k-5 k-4 k-3 k-2 k-1 Backbone Position c-ions z-ions

ETD 3+

0.8 0.6 0.4 0.2

1

2

3

4

5

3743

k-5 k-4 k-3 k-2 k-1

0

1

2

Backbone Position

MASCOT, OMSSA, and X!Tandem and a maximum of false discovery rate of 5% to obtain the sequences corresponding to the evaluated mass spectra. These consensus identifications yielded 2406 different pairs of CID/ETD spectra which were submitted to the CompNovo algorithm. To these spectrum pairs CompNovo assigned a total of 675 fully correct sequences, representing a success rate of 28.1%. Although, at first sight, this number might look moderate, it is among the highest success rates published so far for the de novo sequencing of peptides. Moreover, this result is superior to a de novo sequencing algorithm applied to CID and ECD data, in which 1704 full sequences out of 7852 candidates (22%) with high-quality mass spectra (MASCOT Mowse score 34) were successfully annotated (see Fig. 3 in [4]). In its present implementation, the improved success rate of CompNovo comes, however, at the cost of considerable computing time. Despite the rapid mass decomposition algorithm, CompNovo on average needs about 13 s per spectrum pair from the benchmark data set to suggest a set of peptide sequences (CompNovoCID about 8 s). PepNovo is able to annotate one spectrum in about 0.2 s, whereas LutefiskXP needs more than 30 s per spectrum. Speed improvements are planned in terms of parallelization of the implementation. Further, it should be mentioned that the implementation has been optimized for identification performance and has not yet been optimized for computational speed. Overall search times for de novo identification are not the bottleneck in the whole proteome analysis (Total search time for the whole proteome of S. cellulosum was about 20 h computing time.). The reason for the inability to correctly assign all sequences is not due to the inability of the algorithm itself to

& 2009 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim

3

4

5

k-5 k-4 k-3 k-2 k-1

Backbone Position

Figure 3. Coverage of amino acids in a peptide sequence with fragment ions upon CID (A, C) and ETD (B, D) fragmentation of doubly (A, B) and triply (C, D) charged precursor ions. The numbers indicate the positions of the respective amino acids in the sequence, in which 1 stands for the amino-terminal amino acid and k for the carboxy-terminal amino acid. The upper plots were created from 2406 spectrum pairs of doubly charged peptides, the bottom plots from 123 spectrum pairs of triply charged peptides.

find the correct sequence for a given mass spectrum, but predominantly lies in the absence of unequivocal sequence information in the MS data. We included mass spectra identified with a MASCOT Mowse score of about 20 (5% false discovery rate cut-off) in our evaluation data set, in order to include spectra of medium quality for the assignments. Manual inspection of the spectra corroborated the missing or low abundance of diagnostic ions in the spectra, as shown in Figs. 4 and 5. Moreover, CompNovo was able to correctly identify 74% of all amino acids in the 2406 spectrum pairs of the benchmark data set (i.e. correct amino acid at the correct position), which proves the high performance of the algorithm for retrieving sequence information from combined CID and ETD-MS/MS data. Allowing a maximum of two incorrect amino acids in the sequences, about 52% of the sequences of the benchmark data set were assigned to the corresponding mass spectra. Another interesting aspect of the data evaluation is that CompNovo yielded the correct sequence within the top ten scoring suggested sequences from about 50% of the spectra. Additionally, more than 75% of the reported sequences contain at least a correct subsequence of length six and almost 85% contain a tag of a length of five, which are already unique for a peptide in many cases. This constitutes highly useful information in data evaluations focused at supporting database searching routines and represents a basis for utilizing the reported hits in a homology-based sequence search [52]. To show that our approach can reveal novel peptides not identified by standard sequence database searches, we performed de novo sequencing on all spectra of the S. cellulosum data set with CompNovo. As a simple test, we searched for sequences among the CompNovo top-scoring sequences that

www.electrophoresis-journal.com

3744

Electrophoresis 2009, 30, 3736–3747

A. Bertsch et al.

A2500

B16000 CID 2+ S

L

A

R

1500

ETD 2+

14000 A

A

G

E

L

F

F

L

E F

A

A

R

A

L

S

12000 S

1000

L

R

10000 Intensity

Intensity

2000

A A

A

G

E

L

F

F

L

E G

A

A A

Figure 4. (A) CID spectrum of the doubly charged precursor ion of the peptide SLAAGLFEAR. The doubly charged peptide shows a few matching peaks, most of the peaks remained unmatched. (B) ETD spectrum of the peptide SLAAGLFEAR. Although a few peaks could be matched in the spectrum, most of the signals remain unannotated.

R L

S

8000 6000 4000

500

2000 0

0

200

400

600

800

0

1000

0

200

400

600

m/z

A 400000

1000

B140000 CID 2+

ETD 2+

350000

120000 S A L G V A P A P

300000

E

W

V D

L

V

E

K

S A L G V A P A P

E

W

V

D

L

V

E

K

100000 K

E

V

L

D

V

W

E

P A P A V G L A S

K

Intensity

250000 Intensity

800

m/z

200000

E

V

L

D

V

W

E

P A P A V G L A S

80000 60000

150000 40000

100000

20000

50000 0

0

200

400

600

800

1000 1200 1400 1600 1800 m/z

0

0

200

400

600

800

1000 1200 1400 1600 1800 m/z

Figure 5. (A) CID spectrum of the doubly charge peptide SALGVAPAPEWVDLVEK. Although this peptide shows a good coverage of ions, some backbone fragment ions in the middle are absent. Additionally, most of the peaks in the lower m/z range have very low intensities. (B) The ETD spectrum of the peptide SALGVAPAPEWVDLVEK shows an almost complete ladder of ions, representing fragmentation at the N-terminal part of the peptide. However, some of the ions have low intensities.

were identical to sequences of the reference proteome within one amino acid (single-point mutations). Of these, we manually selected three examples for further inspection. The three spectra could be identified by manual annotation as the peptides FENMGAQMVR, LLDQGQAGDNVFCLLR, and TTDVTGTVNLPEGVR. The spectra and the sequences are available in the Supporting Information (Figs. 1–3). All three spectrum pairs show good coverage of all relevant ions. The peptides can thus be identified as single-point mutations of the reference proteome with high confidence based on their CompNovo identification. The results discussed above clearly prove that de novo sequencing algorithms can be successfully applied to interpret low-resolution tandem mass spectra acquired with a mass accuracy of 70.3 Da. Although most of the information required for sequencing lie in the series of fragment ions, in which the mass differences between fragment ions encode the sequence, the mass accuracy is clearly not sufficient to distinguish some amino acids, such as glutamine/asparagine, as well as phenylalanine and oxidized methionine. Although isobaric amino acids, such as leucine and isoleucine, cannot even be resolved on high-resolution mass spectrometers, higher mass accuracy as offered by

& 2009 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim

time-of-flight or Fourier-transform instruments will help to further improve the accuracy of the CompNovo algorithm.

3.3 Influence of precursor ion charge state on de novo sequencing Figures 6A and B show the annotated CID and ETD mass spectra of the doubly charged precursor ion of the peptide LFIDPVLGLAPYQGR from the protein Succinyl-CoA ligase (ADP forming) subunit b (sce9142) of S. cellulosum. Although de novo sequencing using the CID spectrum and all CID-based de novo sequencing tools failed to provide the correct sequence because of missing or low-abundant signals, the combination of both spectra and application of CompNovo yielded the fully correct sequence as top-ranked hit. The signals annotated in the spectra clearly show that all positions in the amino acid sequence are well covered by diagnostic ions in the CID or ETD spectrum. As discussed above, the sequence coverage decreased significantly in the CID spectra of triply charged peptides (Fig. 3C), whereas ETD spectra still contained diagnostic ions for most of the amino acids in the sequences, although at significantly less signal intensities (Fig. 3D). In due conse-

www.electrophoresis-journal.com

Proteomics and 2-DE

Electrophoresis 2009, 30, 3736–3747

algorithm on our benchmark data set with those employing other publicly available de novo software packages (LutefiskXP and PepNovo). Since these packages use CID spectra only, CompNovo has a clear advantage through using both CID and ETD spectra. In order to characterize the improvement that combined ETD and CID offer over CID, we also tested a slightly modified version of CompNovo that uses the same algorithm but makes use only of CID spectra (CompNovoCID). The results of this comparison are summarized in Table 1. This table reports the correctly identified peptides as well as identifications with one to three incorrect amino acids with respect to the true sequence. It can be seen that CompNovo not only facilitates a significant improvement in the success rate of suggested sequences with respect to other algorithms (from 2.2 to 28.1% for fully correct peptides) but also significantly improves the correctness of the sequences upon inclusion of ETD data (from 9.8 to 28.1% for fully correct peptide sequences). The identification rates for the different algorithms depending on the number of tolerated false amino acid annotations are shown in Fig. 7. It can be seen that CompNovo starts with a success rate of 28.1% for entirely correct peptide sequences (number of allowed incorrect residues 5 0), which is significantly larger than for all other investigated algorithms. CompNovo reaches a success rate of more than 50% already upon the allowance of two wrong amino acids, whereas the other algorithms require a tolerance of at least five falsely assigned amino acids for the same success rate. Although CompNovoCID was able to identify 55% of all amino acids correctly, about

quence, the identification performance of de novo sequencing was much lower for triply charged peptides. The coverage of cleavage sites dropped as evidenced by the ion statistics and, as expected, the identification performance of the CID-based algorithms was comparatively low. In 123 CID mass spectra of triply charged precursor ions found in our benchmark data set, PepNovo identified only 19.5% of the amino acids correctly, whereas CompNovo identified 33.5% of the amino acids in the corresponding pairs of CID/ETD mass spectra. However, the implemented experimental setup and parameters predominantly yielded spectrum pairs of doubly charged precursor ions and only about 4.9% of the spectra were from triply charged peptides in the benchmark data set, which makes the de novo sequencing of triply and even higher charged peptide ions not a very important issue. The CID and ETD spectra of the triply charged peptide precursor ion LFIDPVLGLAPYQGR shown in Figs. 6C and D demonstrate the poor coverage of the sequence with abundant fragment ions. From this fragmentation data, none of the algorithms was able to identify the correct sequence of the peptide. Nevertheless, CompNovo at least correctly identified a sequence tag of six consecutive correct amino acids in the peptide sequence.

3.4 Comparison of de novo sequencing algorithms and improvement of sequence annotation by including ETD spectral information In order to assess the performance of CompNovo, we compared the results of the implementation of the

A

3745

B 9000

35000

CID 2+ L

F

I

ETD 2+

8000

30000 D

P

V

L G L

A P

Y

Q G

R

7000

L

6000

R

F

I

D

P

V

L G L

A P

Y

Q G

R

25000 G Q

Y

P A

L G L

V

P

D

I

F

L

Intensity

Intensity

R

20000 15000

G Q

Y

P A

L G L

V

P

D

I

F

L

5000 4000 3000

10000 2000 5000

1000

0

0

0

200

400

600

800 1000 m/z

1200

1400

1600

C 10000

0

200

600

800 m/z

1000

1200

1400

1600

Q G

R

D 6000 CID 3+

ETD 3+ 5000

8000

L

F R

I

G Q

D Y

P

V

P A

L G L L G L

A P V

P

Y D

Q G I

F

R

L

4000

L

6000

Intensity

Intensity

400

R

F G Q

I

D

P

V

L G L

A P

Y

P A

L G L

V

400

600

800 m/z

1000

P

Y D

I

F

L

3000

4000 2000 2000

1000

0

0 0

200

400

600

800 1000 m/z

1200

1400

1600

& 2009 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim

0

200

1200

1400

1600

Figure 6. CID (A, C) and ETD (B, D) mass spectra of doubly (A, B) and triply (C, D) charged precursor ions of the peptide LFIDPVLGLAPYQGR. (A, B) The spectra in (A) and (B) show a good coverage of highly abundant ions. However, only CompNovo was able to annotate the correct sequence, although the ETD spectrum of the doubly charged peptide contained only a small number of evidences. (B, D) The triply charged CID spectrum shows poor fragmentation behavior. Although the ETD spectrum shows most of the ions as low abundant signals, CompNovo was only able to identify a sequence tag of length six.

www.electrophoresis-journal.com

3746

Electrophoresis 2009, 30, 3736–3747

A. Bertsch et al.

Table 1. Identification rates for the different de novo search programs for the benchmark data set consisting of 2406 spectrum pairsa)

Correct peptides Within one residue Within two residues Within three residues Total correct residues

LutefiskXP (%)

PepNovo (%)

CompNovoCID (%)

CompNovo (%)

0.0 0.0 1.4 2.4 8.5

2.7 2.9 12.1 18.3 46.3

9.8 9.9 24.9 31.8 54.8

28.1 29.0 51.7 60.1 73.7

a) Only the top-ranked peptide sequences were considered for each spectrum pair delivered by each search engine.

engines perform in terms of correctly predicted subsequences. This measure is useful for assessing the possibility of using the reported hits in a homology-based sequence search. Although about 84% of the sequences reported by CompNovo contain a correct tag of length five, this is the case for only 53 and 9% of the sequences retrieved by PepNovo and LutefiskXP, respectively.

Identification Rate

1 0.8 0.6 0.4

0

4 Concluding remarks

LutefiskXP PepNovo CompNovoCID CompNovo

0.2

0

2

4 6 8 10 12 14 16 18 20 Number of Allowed Incorrect Residues

22

Figure 7. Identification rates of the different algorithms depending on the number of tolerated false amino acid annotations in the top-ranked peptide sequence (benchmark data set, n 5 2406).

Relative Frequency

1

LutefiskXP PepNovo CompNovoCID CompNovo

0.8 0.6 0.4 0.2 0

0

2

4 6 8 10 12 14 16 18 Number of Correct Consecutive Residues

20

Figure 8. Correctly predicted subsequences in the top-ranked peptide sequences as a function of subsequence length upon application of different de novo algorithms on the benchmark data set (n 5 2406).

74% of the amino acids were correctly sequenced by CompNovo, representing a significant increase over methods using only CID spectra for this data set. For more than 50% of the spectra, CompNovo has a correct sequence within the Top 10, CompNovoCID for about 24% and PepNovo for about 11%. This shows that the sequence generation works very well for the divide-and-conquer approach. Figure 8 shows how the different identification & 2009 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim

De novo sequencing using fragmentation data from a combination of CID and ETD spectra gives significantly improved results when compared with de novo sequencing based on CID data only. The better de novo sequencing performance is due to the complementary fragmentation information contained in the two types of spectra, which results in more complete ion series covering the peptide sequence. The CompNovo de novo sequencing algorithm makes use of this complementary information and thus more frequently succeeds in annotating a correct sequence where de novo sequencing based on CID or ETD spectra alone fails due to the lack of sequence information. We further show that using a rapid mass decomposition algorithm improves the de novo sequencing algorithm. Improvements in the ion scoring model and theoretical spectra generation as well as the utilization of accurate fragment ion masses as obtained from high-resolution mass spectrometers are expected to facilitate achieving performances comparable to sequence database searching algorithms. The authors thank Andreas Ka¨mper for proofreading the manuscript. The project was supported by Bundesministerium fu¨r Bildung und Forschung (BMBF) grant 0313842A. The authors have declared no conflict of interest.

5 References [1] Eng, J. K., McCormack, A. L., Yates, J. R., III, J. Am. Soc. Mass Spectrom. 1994, 5, 976–989. [2] Perkins, D. N., Pappin, D. J. C., Creasy, D. M., Cottrell, J. S. Electrophoresis 1999, 20, 3551–3567. [3] Nesvizhskii, A. I., Vitek, O., Aebersold, R., Nat. Methods 2007, 4, 787–797.

www.electrophoresis-journal.com

Electrophoresis 2009, 30, 3736–3747

[4] Savitski, M. M., Nielsen, M. L., Kjeldsen, F., Zubarev, R. A., J. Proteome Res. 2005, 4, 2348–2354. [5] Roepstorff, P., Fohlmann, J., Biomed. Mass Spectrom. 1984, 11, 601. [6] Biemann, K., Biomed. Environ. Mass Spectrom. 1988, 16, 99–111. [7] Zubarev, R. A., Kelleher, N. L., McLafferty, F. W., J. Am. Chem. Soc. 1998, 120, 3265–3266. [8] Syka, J. E., Coon, J. J., Schroeder, M. J., Shabanowitz, J., Hunt, D. F., Proc. Natl. Acad. Sci. USA 2004, 101, 9528–9533.

Proteomics and 2-DE

3747

[31] Schneiker, S., Perlova, O., Kaiser, O., Gerth, K., Alici, A., Altmeyer, M. O., Bartels, D., et al., Nat. Biotechnol. 2007, 25, 1281–1289. [32] Schley, C., Swart, R., Huber, C. G., J. Chromatogr. A 2006, 1136, 210–220. [33] Toll, H., Wintringer, R., Schweiger-Hufnagel, U., Huber, C. G., J. Sep. Sci. 2005, 28, 1666–1674. [34] Premstaller, A., Oberacher, H., Huber, C. G., Anal. Chem. 2000, 72, 4386.

[9] Zubarev, R. A., Mass Spectrom. Rev. 2003, 22, 57–77.

[35] Hartmer, R., Kaplan, D. A., Gebhardt, C. A., Ledertheil, T., Brekenfeld, A., Int. J. Mass Spectrom. 2008, 276, 82–90.

[10] Geer, L. Y., Markey, S. P., Kowalak, J. A., Wagner, L., Xu, M., Maynard, D.M., Yang, X., et al., J. Proteome Res. 2004, 3, 958–964.

[36] Hartmer, R., Lubeck, M., Kiehne, A., Baessmann, C., Brekenfeld, A., 54th ASMS Conference on Mass Spectrometry and Allied Topics, San Antonio, 2006.

[11] Craig, R., Beavis, R., Bioinformatics 2004, 20, 1466–1467.

[37] Delmotte, N., Lasaosa, M., Tholey, A., Heinzle, E., Huber, C. G., J. Proteome Res. 2007, 6, 4363–4373.

[12] Bartels, C., Biomed. Environ. Mass Spectrom. 1990, 19, 363–368. [13] Fernandez-de-Cossio, J., Gonzalez, J., Besada, V., Comput. Appl. Biosci. 1995, 11, 427–434. [14] Taylor, J. A., Johnson, R. S., Rapid Commun. Mass Spectrom. 1997, 11, 1067–1075. [15] Dancik, V., Addona, T. A., Clauser, K. R., Vath, J. E., Pevzner, P. A., J. Comput. Biol. 1999, 6, 327–342. [16] Chen, T., Kao, M. Y., Tepel, M., Rush, J., Church, G. M., J. Comput. Biol. 2001, 8, 325–337. [17] Taylor, J. A., Johnson, R. S., Anal. Chem. 2001, 73, 2594–2604. [18] Fernandez-de-Cossio, J., Gonzalez, J., Satomi, Y., Shima, T., Okumura, N., Besada, V., Betancourt, L., et al., Electrophoresis 2000, 21, 1694–1699. [19] Chen, T., Bingwen, L., J. Comp. Biol. 2003, 10, 1–12. [20] Bern, M., Goldberg, D., J. Comp. Biol. 2006, 13, 364–378. [21] Frank, A., Pevzner, P., Anal. Chem. 2005, 77, 964–973. [22] Lubeck, O., Sewell, C., Gu, S., Chen, X., Cai, D. M., Proc. IEEE 2002, 90, 1868–1874. [23] Fischer, B., Roth, V., Roos, F., Grossmann, J., Baginsky, S., Widmayer, P., Gruissem, W., Buhmann, J. M., Anal. Chem. 2005, 77, 7265–7273. [24] Zhang, Z., Anal. Chem. 2004, 76, 6374–6383. [25] DiMaggio, P. A., Jr. , Floudas, C. A., Anal. Chem. 2007, 79, 1433–1446.

[38] Vasilescu, J., Smith, J. C., Zweitzig, D. R., Denis, N. J., Haines, D. S., Figeys, D., J. Mass Spectrom. 2008, 43, 296–304. [39] Elias, J. E., Haas, W., Faherty, B. K., Gygi, S., Nat. Methods 2005, 2, 667–675. [40] Martens, L., Hermjakob, H., Jones, P., Adamski, M., Taylor, C., States, D., Gevaert, K., et al., Proteomics 2005, 5, 3537–3545. [41] Jones, P., Coˆte´, R. G., Martens, L., Quinn, A. F., Taylor, C. F., Derache, W., Hermjakob, H., Apweiler, R., Nucleic Acids Res. 2006, 1, D659–D666. [42] Sturm, S., Bertsch, A., Gro¨pl, C., Hildebrandt, A., Hussong, R., Lange, E., Pfeifer, N., et al., BMC Bioinformatics 2008, 9, 163. [43] Kohlbacher, O., Reinert, K., Gro¨pl, C., Lange, E., Pfeifer, N., Schulz-Trieglaff, O., Sturm, M., Bioinformatics 2007, 23, e191–e197. [44] Pitteri, S. J., Chrisman, P. A., McLuckey, S. A., Anal. Chem. 2005, 77, 5662–5669. [45] Bo¨cker, S., Liptak, Z., Martin, M., Pervukhin, A., Sudek, H., Bioinformatics 2008, 24, 591–593. [46] Bo¨cker, S., Liptak, Z., Algorithmica 2007, 48, 413–432. [47] Bo¨cker, S., Letzel, M., Liptak, Z., Pervukhin, A., Bioinformatics 2009, 25, 218–224.

[26] Bafna, V., Edwards, N., Bioinformatics 2001, 17, S13–S21.

[48] Montecchi-Palazzi, L., Beavis, R., Binz, P. A., Chalkley, R. J., Cottrell, J., Creasy, D., Shofstahl, J., et al., Nat. Biotechnol. 2008, 26, 864–866.

[27] Havilio, M., Haddad, Y., Smilansky, Z., Anal. Chem. 2003, 75, 435–444.

[49] Good, D. M., Wirtala, M., McAlister, G. C., Coon, J. J., Mol. Cell. Proteomics 2007, 6, 1942–1951.

[28] Elias, J. E., Gibbons, F. D., King, O. D., Roth, F. P., Gygi, S. P., Nat. Biotechnol. 2004, 22, 214–219.

[50] Molina, H., Matthiesen, R., Kandasamy, K., Padey, A., Anal. Chem. 2008, 80, 4825–4835.

[29] Horn, D. M., Zubarev, R. A., McLafferty, R. W., Proc. Natl. Acad. Sci. USA 2000, 97, 10313–10317.

[51] Tabb, D. L., Smith, L. L., Breci, L. A., Wysocki, V. H., Lin, D., Yates, J. R., Anal. Chem. 2003, 75, 1155–1163.

[30] Datta, R., Bern, M., Proc. Res. Comput. Mol. Biol. 2008, 4955, 140–153.

[52] Kim, S., Gupta, N., Bandeira, N., Pevzner, P. A., Mol. Cell. Proteomics 2009, 8, 53–69.

& 2009 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim

www.electrophoresis-journal.com

Lihat lebih banyak...

Comentarios

Copyright © 2017 DATOSPDF Inc.