Method Enabling Fast Partial Sequencing of cDNA Clones


Analytical Biochemistry 292, 266–271 (2001) doi:10.1006/abio.2001.5094, available online at http://www.idealibrary.com

Tommy Nordström,* Baback Gharizadeh,* Nader Pourmand,† Pål Nyrén,* and Mostafa Ronaghi†,¹
*Department of Biotechnology, Royal Institute of Technology, Stockholm, Sweden; and †Stanford Genome Technology Center, Stanford University, Palo Alto, California 94304

Received December 13, 2000; published online April 10, 2001

Pyrosequencing is a nonelectrophoretic single-tube DNA sequencing method that takes advantage of cooperativity between four enzymes to monitor DNA synthesis. To investigate the feasibility of this recently developed technique for tag sequencing, 64 colonies of a selected human cDNA library were sequenced by both pyrosequencing and Sanger DNA sequencing. To determine the length needed to find a unique DNA sequence, 100 human sequence tags were retrieved from the database and different lengths from each sequence were randomly analyzed. A homology search based on 20 and 30 nucleotides produced 97 and 98% unique hits, respectively. A homology search based on 100 nucleotides could identify all searched genes. Pyrosequencing was employed to produce sequence data for 30 nucleotides. A similar search using BLAST revealed 16 different genes. Forty-six percent of the sequences shared homology with one gene at different positions. Two of the 64 clones had unique sequences. The search results from pyrosequencing were in 100% agreement with conventional DNA sequencing methods. The possibility of using a fully automated pyrosequencer machine for future high-throughput tag sequencing is discussed. © 2001 Academic Press

Key Words: pyrosequencing; cDNA sequencing; automation; EST sequencing; gene discovery.

¹ To whom correspondence and reprint requests should be addressed at Stanford Genome Technology Center, Stanford University, 855 California Avenue, Palo Alto, CA 94304. Fax: (650) 812-1975. E-mail: [email protected].

The ability to identify genes at the nucleic acid level has prompted efforts to develop methods for sequence-based nucleic acid analysis. Several principles that can provide large amounts of specific genetic information have been described (1, 2). DNA microarray technology provides a powerful tool for gene discovery and for studying gene expression (3). The use of this technique depends, however, on the genetic information obtained from databases, which are usually derived by DNA sequencing. Single-pass partial sequencing of cDNA clones to generate expressed sequence tags (ESTs²) also offers a useful method of gene discovery, and it has been widely applied in different organisms (1, 4–8). However, this technique, like RNA blotting (9), S1 nuclease protection (10), and differential display (11), can evaluate only a limited number of genes. To perform large-scale EST analysis, robust, accurate, and inexpensive platforms allowing time-effective analysis of large numbers of samples will be required.

To increase the capacity of conventional DNA sequencing in comparative EST analysis, serial analysis of gene expression (SAGE) was recently described (12, 13). Compared to single-pass partial sequencing methods, SAGE produces approximately 25–30 times more gene-specific sequence data of several gene tags in a single run. However, this technique faces limitations, since many mRNA molecules do not have proper restriction sites for cleavage by the conventional endonucleases. Furthermore, the accuracy of gene identification from the obtained sequence length (9–13 bases) is limited to about 90–95%. Most recently, parallel signature sequencing based on ligation and cleavage was presented (14, 15). This technique holds great promise for very-high-throughput expression analyses, although the accuracy of gene identification in yeast is limited to 90%.

Pyrosequencing (16) is emerging as a widely applicable, alternative technology for the detailed characterization of nucleic acids; its coupling with simple, efficient molecular biological protocols enables DNA analysis with high information content per analysis. This technique employs coupled enzymatic reactions using DNA polymerase, ATP sulfurylase, and luciferase to monitor DNA synthesis (17).
Presence of a nucleotide-degrading enzyme (apyrase) in the system enables iterative nucleotide addition to the reaction mixture. The technique has the potential advantages of accuracy, flexibility, and parallel processing and can be easily automated. Furthermore, it dispenses with the need for labeled primers, labeled nucleotides, and gel electrophoresis. The methodological performance of pyrosequencing in determination of difficult secondary structures (18), mutation detection (19), resequencing of disease genes (20), sequencing on DNA microarrays (21), single-nucleotide polymorphism analysis on single-stranded (22, 23) and double-stranded (24) DNA, and DNA sequencing on double-stranded DNA (25) has been shown to be feasible. Most recently, the addition of single-stranded DNA-binding protein to the pyrosequencing reaction system has proven to be useful for long read sequencing and sequence determination of difficult templates as well as for providing flexibility in primer design (26).

In this work, we performed single-pass DNA sequencing by both pyrosequencing and Sanger DNA sequencing with the aim of gene finding. We analyzed 64 randomly cloned cDNAs obtained after phage display, panning, and enrichment. Reliable sequence data were recorded by pyrosequencing for at least 30 bases on PCR templates of different lengths; 30 bases proved to be a suitable length for gene identification in higher organisms.

² Abbreviations used: ESTs, expressed sequence tags; SAGE, serial analysis of gene expression.

0003-2697/01 $35.00 Copyright © 2001 by Academic Press. All rights of reproduction in any form reserved.

MATERIALS AND METHODS

Oligonucleotide Synthesis and Purification

The oligonucleotides BSS-PCR-UP (5′-Biotin-GTTCCTTTCTGTGCGGCCCGGCCG), SS-PCR-DOWN (5′-CTTCAGAGATCAGTTTCTGCTCGGG), and FSS-SEQ-DOWN (5′-CTGCTCGGGCCCAGATCTG) were synthesized and HPLC purified by Interactiva (Ulm, Germany).

cDNA Library, Phage Display, Panning, and Enrichment

Human spleen EasyMATCH phage display cDNA (CLONTECH Laboratories, Basingstoke, UK) was used for preparation of the phage stock library. Panning rounds were performed using standard procedures involving M13K07 helper phage (New England Biolabs, Beverly, MA). Subsequently, the obtained phage particles were incubated with 500 µg streptavidin-coated paramagnetic beads (Dynal AS, Oslo, Norway). The bound phage particles were eluted with 500 µl 0.2 M glycine, pH 2.5, and titrated after each round of panning by infecting log-phase host cells. The obtained cell suspension was spread onto a large agar plate. Sixty-four colonies were picked for further analysis by pyrosequencing and Sanger DNA sequencing.
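As a quick check on the oligonucleotides listed under Oligonucleotide Synthesis and Purification, the following sketch reports each primer's length and GC content and looks for a stretch shared between the 3′ end of SS-PCR-DOWN and the 5′ end of FSS-SEQ-DOWN. The helper names `gc_fraction` and `suffix_prefix_overlap` are illustrative only, and reading a shared stretch as a sign that the sequencing primer sits partly within the PCR primer site is an interpretation, not a statement from the original text.

```python
# Primer sequences as given in the text (5' to 3'; the biotin label is omitted).
PRIMERS = {
    "BSS-PCR-UP": "GTTCCTTTCTGTGCGGCCCGGCCG",
    "SS-PCR-DOWN": "CTTCAGAGATCAGTTTCTGCTCGGG",
    "FSS-SEQ-DOWN": "CTGCTCGGGCCCAGATCTG",
}


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a sequence."""
    return sum(base in "GC" for base in seq) / len(seq)


def suffix_prefix_overlap(first: str, second: str) -> str:
    """Longest string that ends `first` and also starts `second`."""
    for size in range(min(len(first), len(second)), 0, -1):
        if first.endswith(second[:size]):
            return second[:size]
    return ""


for name, seq in PRIMERS.items():
    print(f"{name}: {len(seq)} nt, GC = {gc_fraction(seq):.0%}")

shared = suffix_prefix_overlap(PRIMERS["SS-PCR-DOWN"], PRIMERS["FSS-SEQ-DOWN"])
print(f"shared 3'/5' stretch: {shared} ({len(shared)} nt)")
```

With the sequences as printed, the two downstream primers share a nine-nucleotide stretch, which would be consistent with a sequencing primer that overlaps the end of the PCR primer and extends further into the amplified product.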


Signature Length Analysis

One hundred sequences were retrieved from the human EST database. A 20-nucleotide-long region was randomly selected from each retrieved sequence and a homology search was performed to investigate the uniqueness of the signature length. To investigate how signature length improves the search result, the searched 20-nucleotide-long sequence was extended by 10 and 80 nucleotides, respectively (giving signatures of 30 and 100 nucleotides), and a basic BLAST search was carried out.

DNA Amplification and Template Preparation

PCRs were performed with BSS-PCR-UP and SS-PCR-DOWN on several colonies from the library. Sixty-four PCR products of different lengths ranging from 100 to 800 nucleotides were immobilized onto streptavidin-coated paramagnetic beads according to the supplier’s (Dynal AS) recommendations. Single-stranded DNA was obtained by removing the supernatant after incubating the immobilized PCR product in 0.10 M NaOH for 3 min. FSS-SEQ-DOWN sequencing primer was hybridized to the immobilized ssDNA strand in 10 mM Tris–acetate, pH 7.5, and 20 mM magnesium acetate.

Pyrosequencing

Pyrosequencing was performed at room temperature in a volume of 50 µl on a single-tube automated pyrosequencing system (kindly supplied by Pyrosequencing AB, Uppsala, Sweden). One-third of a primed PCR product was added to the pyrosequencing reaction mixture containing 10 U exonuclease-deficient Klenow DNA polymerase (Amersham Pharmacia Biotech, Uppsala, Sweden), 40 mU apyrase (Sigma Chemical Co., U.S.A.), 100 ng purified luciferase (BioThema, Dalarö, Sweden), 15 mU recombinantly produced ATP sulfurylase (27), 0.5 µg single-stranded DNA-binding protein (Amersham Pharmacia Biotech), 0.1 M Tris–acetate (pH 7.75), 0.5 mM EDTA, 5 mM magnesium acetate, 0.1% bovine serum albumin, 1 mM dithiothreitol, 5 µM adenosine 5′-phosphosulfate (APS), 0.4 mg/ml polyvinylpyrrolidone (360,000), and 100 µg/ml D-luciferin (BioThema).
The sequencing procedure was carried out by stepwise elongation of the primer strand upon sequential addition of the different deoxynucleoside triphosphates (Amersham Pharmacia Biotech) and simultaneous degradation of nucleotides by apyrase. Nucleotides were added every 60 s. The output of light resulting from nucleotide incorporation was detected by a photomultiplier tube. The data were obtained in Microsoft Excel format and base calling was carried out manually by measuring the height of the signals. Sequence similarity searches were performed using BLAST.
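The manual base-calling step described above, in which the height of each light signal is read off, can be illustrated with a short sketch. It assumes, hypothetically, that every peak height has already been normalized so that a single incorporated nucleotide gives a height of about 1.0. The function name `call_bases`, the dispensation order, the noise threshold, and the example peak values are all illustrative choices and not details from the original work.

```python
from typing import Sequence


def call_bases(
    dispensed: Sequence[str],
    heights: Sequence[float],
    noise_floor: float = 0.5,
) -> str:
    """Turn normalized pyrogram peak heights into a base sequence.

    dispensed   -- nucleotide added at each step, e.g. "ACGTACGT"
    heights     -- peak height per step, normalized so that a single
                   incorporation is about 1.0 (an assumption of this sketch)
    noise_floor -- peaks below this value count as "no incorporation"
    """
    if len(dispensed) != len(heights):
        raise ValueError("one peak height is needed per dispensed nucleotide")

    called = []
    for base, height in zip(dispensed, heights):
        if height < noise_floor:
            continue  # background signal: nothing was incorporated
        # The signal is taken to be proportional to the number of identical
        # bases in a row, so rounding the height gives the run length.
        run_length = int(height + 0.5)
        called.append(base * run_length)
    return "".join(called)


# Hypothetical data: two dispensation cycles in the order A, C, G, T.
order = "ACGTACGT"
peaks = [1.02, 0.05, 0.03, 1.95, 0.04, 2.10, 0.97, 0.02]
print(call_bases(order, peaks))  # ATTCCG
```

Rounding works only while the signal stays proportional to the number of incorporated bases, so a sketch like this would become unreliable for long runs of identical nucleotides.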


Conventional DNA Sequencing

The sequencing data obtained from pyrosequencing were confirmed by Sanger DNA sequencing on an ABI 377 (64 lanes) using BigDye terminator chemistry and SS-PCR-DOWN as sequencing primer, as described earlier (8).

RESULTS

Pyrosequencing on a Selected cDNA Library from Human

Pyrosequencing was performed on a selected cDNA library, which was obtained from a human gene library processed by phage display with a bait protein, followed by rounds of panning and enrichment (Fig. 1). PCR was performed directly on the bacterial colonies obtained, using a biotinylated primer as one of the primers. The PCR products were immobilized onto streptavidin-coated paramagnetic beads. After alkali treatment, the immobilized DNA strands were analyzed by pyrosequencing. The objective was to sequence 30 nucleotides of each colony by pyrosequencing and then compare the data with those obtained from Sanger DNA sequencing. A fully automated single-tube pyrosequencing system (Fig. 2) was used that precisely dispensed four different nucleotides in a volume of 200 nl. In this automated system, the dispenser set moves across the reaction well to perform sequential addition of the four different nucleotides under constant agitation of the well, and incorporation is recorded as a flash of light by a photomultiplier tube. This automated machine was used to show the feasibility of the system in terms of reaction volume, format, mixing efficiency, and dispensing. Pyrosequencing yielded accurate data on PCR products of different lengths, as shown by typical pyrograms (Fig. 3). The data clearly revealed a high signal-to-noise ratio and proportional signals in the presence of more than one identical nucleotide. Sixty-four cDNA samples, each between 100 and 800 nucleotides in length, were analyzed.

FIG. 1. A schematic representation of the approach used for gene discovery. A selected human cDNA library was obtained after phage display with a bait protein, followed by rounds of panning and enrichment from a human gene library. Pyrosequencing was performed to obtain short reads on a cDNA template, and the result was analyzed using BLAST.

FIG. 2. Schematic representation of the automated single-tube pyrosequencing system. The automated machine is based on precise delivery of four different nucleotides. The dispenser set moves along the X-axis to perform sequential addition of nucleotides. The output of light is detected by a photomultiplier tube and the data are obtained in Microsoft Excel format. In the latest version of the automated pyrosequencing system (PSQ 96), the dispenser set moves across a microtiter plate and the output of light from the whole plate is detected simultaneously by a single CCD camera.

FIG. 3. Typical pyrograms for sequencing of 30 nucleotides obtained from pyrosequencing on a selected cDNA library from human. The lengths of the analyzed PCR products (in nucleotides) were (a) 420, (b) 700, (c) 280, (d) 320, (e) 280, and (f) 300. Base calling was performed manually and the following sequences were obtained: (a) AATCCGCCCCCTCGCCCGTCACGCACCGCAC, (b) ATACCGGTCCGGAATTCCCGGGTCGACCTA, (c) ATCCGACATATGCACGTATT-GATATTCGCAC, (d) ATGTCGTATTACCCTATAGTGAGTCGTATT, (e) ATACCAGCT-TTCCCTATAGTGAGTCGTATTAAG, and (f) ATCCCATGTTGCCCTATGGAATCCAGAGC. The pyrosequencing reaction was performed as described under Materials and Methods.

Theoretical Database Search

The sequence order of nucleotides determines the nature of the DNA. Theoretically, sequence information of nine nucleotides in a row distinguishes 4^9 = 262,144 different sequences, which exceeds the number of genes in humans; in practice, however, it has been shown that gene finding in complex organisms requires information from a longer stretch of DNA sequence. Here we performed database analysis by searching sequence lengths of 20, 30, and 100 nucleotides to find a unique homology. Ninety-seven percent of the searches proved to be unique when 20 nucleotides were analyzed. With information from 30 nucleotides, 98% of the sequences proved to be unique. The 2% that shared homology with other genes were shown upon closer analysis to belong to gene
families. By analysis of more than 100 nucleotides, all of the sequences were found to be unique.

Database Search

The data obtained from the first 30 nucleotides with pyrosequencing (the sequence after 30 nucleotides was removed in Fig. 3) and Sanger DNA sequencing on the above-described cDNA library were in 100% agreement, indicating high accuracy of the system for short reads. A similar search using BLAST revealed 16 different genes. Forty-six percent of the sequences shared homology with one gene at different positions. Two of the 64 clones had unique sequences. The search results from pyrosequencing were in 100% agreement with conventional DNA sequencing methods.

DISCUSSION

We have demonstrated the performance of pyrosequencing for EST sequencing on 64 cDNA templates of different lengths. Accurate data for short reads can be obtained by pyrosequencing, in contrast to the other previously described sequencing-by-synthesis methods (28–31). High fidelity in this system is obtained by using unlabeled nucleotides, by adding nucleotides in amounts approximately within the K_M range of the DNA polymerase, and by the presence of a nucleotide-degrading enzyme (apyrase). The apyrase degrades the nucleotide to a concentration far below the K_M of the polymerase within a few seconds; however, the presence of apyrase in the system limits the reading of homopolymeric regions when the number of identical nucleotides exceeds 5–6 bases. As demonstrated in the pyrograms in Fig. 3, relatively constant signal intensities for each base incorporation during a run are obtained, indicating very efficient polymerization and degradation in each cycle. The major obstacle to obtaining longer read lengths is likely the accumulation of inhibitory substances during the sequencing process, which is indicated by the slower degradation of ATP by apyrase in later cycles. Removal of the inhibitory substances will increase the efficiency of apyrase in degradation of nucleoside triphosphates, thereby allowing more synchronized extensions and longer reads.

There are several advantages in using pyrosequencing compared to SAGE analysis and parallel signature sequencing based on ligation–cleavage: (i) the original clone is directly available for further analysis, i.e., for additional sequence information and for expression analysis; (ii) pyrosequencing uses both random and full-length cDNA libraries for sequence analysis, while a certain loss of information is inherent in the SAGE and ligation–cleavage-based sequencing methods since these techniques are based on digestion of the cDNA at a given restriction site; (iii) pyrosequencing can use the standard search protocol for finding sequence homology, while special software is required for analysis of data from the other above-mentioned techniques; and (iv) pyrosequencing generates longer read lengths than the SAGE and ligation–cleavage-based sequencing methods, which simplifies gene identification. In addition, pyrosequencing offers almost the same range of accuracy in gene identification as partial cDNA sequencing by the Sanger technique.

This study was performed with a single-tube pyrosequencing system, which is limited in throughput; however, parallel processing strategies are under development. Recently, a microtiter plate-based pyrosequencing system was developed which uses a single CCD unit to record the flashes of light produced from each well of the plate. This machine might be useful for future high-throughput tag sequencing. Further work is also underway to miniaturize this system using microarray and microfluidics formats, which will increase the throughput and decrease the material cost by at least two orders of magnitude.

ACKNOWLEDGMENTS

This work was supported by grants from the Swedish Research Council for Engineering Sciences (TFR) and the Swedish National Board for Industrial and Technical Development (NUTEK).

REFERENCES

1. Adams, M. D., Kelley, J. M., Gocayne, J. D., et al. (1991) Complementary DNA sequencing: Expressed sequence tags and human genome project. Science 252, 1651–1656.
2. DeRisi, J., Penland, L., Brown, P. O., et al. (1996) Use of a cDNA microarray to analyse gene expression patterns in human cancer. Nat. Genet. 14, 457–460.
3. DeRisi, J. L., Iyer, V. R., and Brown, P. O. (1997) Exploring the metabolic and genetic control of gene expression on a genomic scale. Science 278, 680–686.
4. Adams, M. D., Soares, M. B., Kerlavage, A. R., Fields, C., and Venter, J. C. (1993) Rapid cDNA sequencing (expressed sequence tags) from a directionally cloned human infant brain cDNA library. Nat. Genet. 4, 373–380.
5. Adams, M. D., Kerlavage, A. R., Fleischmann, R. D., et al. (1995) Initial assessment of human gene diversity and expression patterns based upon 83 million nucleotides of cDNA sequence. Nature 377, 3–21.
6. Affara, N. A., Bentley, E., Davey, P., Pelmear, A., and Jones, M. H. (1994) The identification of novel gene sequences of the human adult testis. Genomics 22, 205–210.
7. Ajioka, J. W., Boothroyd, J. C., Brunk, B. P., et al. (1998) Gene discovery by EST sequencing in Toxoplasma gondii reveals sequences restricted to the Apicomplexa. Genome Res. 8, 18–28.
8. Sterky, F., Regan, S., Karlsson, J., et al. (1998) Gene discovery in the wood-forming tissues of poplar: Analysis of 5,692 expressed sequence tags. Proc. Natl. Acad. Sci. USA 95, 13330–13335.
9. Alwine, J. C., Kemp, D. J., and Stark, G. R. (1977) Method for detection of specific RNAs in agarose gels by transfer to diazobenzyloxymethyl-paper and hybridization with DNA probes. Proc. Natl. Acad. Sci. USA 74, 5350–5354.
10. Berk, A. J., and Sharp, P. A. (1977) Sizing and mapping of early adenovirus mRNAs by gel electrophoresis of S1 endonuclease-digested hybrids. Cell 12, 721–732.
11. Liang, P., and Pardee, A. B. (1992) Differential display of eukaryotic messenger RNA by means of the polymerase chain reaction. Science 257, 967–971.
12. Velculescu, V. E., Zhang, L., Vogelstein, B., and Kinzler, K. W. (1995) Serial analysis of gene expression. Science 270, 484–487.
13. Velculescu, V. E., Zhang, L., Zhou, W., et al. (1997) Characterization of the yeast transcriptome. Cell 88, 243–251.
14. Brenner, S., Johnson, M., Bridgham, J., et al. (2000) Gene expression analysis by massively parallel signature sequencing (MPSS) on microbead arrays. Nat. Biotechnol. 18, 630–634.
15. Brenner, S., Williams, S. R., Vermaas, E. H., et al. (2000) In vitro cloning of complex mixtures of DNA on microbeads: Physical separation of differentially expressed cDNAs. Proc. Natl. Acad. Sci. USA 97, 1665–1670.
16. Ronaghi, M., Uhlen, M., and Nyren, P. (1998) A sequencing method based on real-time pyrophosphate. Science 281, 363–365.
17. Hyman, E. D. (1988) A new method of sequencing DNA. Anal. Biochem. 174, 423–436.
18. Ronaghi, M., Nygren, M., Lundeberg, J., and Nyren, P. (1999) Analyses of secondary structures in DNA by pyrosequencing. Anal. Biochem. 267, 65–71.
19. Ahmadian, A., Lundeberg, J., Nyren, P., Uhlen, A., and Ronaghi, M. (2000) Analysis of the p53 tumor suppressor gene by pyrosequencing. BioTechniques 28, 140–144, 146–147.
20. Garcia, C. A., Ahmadian, A., Gharizadeh, B., Lundeberg, J., Ronaghi, M., and Nyren, P. (2000) Mutation detection by pyrosequencing: Sequencing of exons 5–8 of the p53 tumor suppressor gene. Gene 253, 249–257.
21. Kwiatkowski, M., Fredriksson, S., Isaksson, A., Nilsson, M., and Landegren, U. (1999) Inversion of in situ synthesized oligonucleotides: Improved reagents for hybridization and primer extension in DNA microarrays. Nucleic Acids Res. 27, 4710–4714.
22. Ahmadian, A., Gharizadeh, B., Gustafsson, A. C., et al. (2000) Single-nucleotide polymorphism analysis by pyrosequencing. Anal. Biochem. 280, 103–110.
23. Milan, D., Jeon, J.-T., Looft, C., et al. (2000) A mutation in PRKAG3 associated with excess glycogen content in pig skeletal muscle. Science 288, 1248–1251.
24. Nordstrom, T., Ronaghi, M., Forsberg, L., De Faire, U., Morgenstern, R., and Nyren, P. (2000) Direct analysis of single-nucleotide polymorphism on double-stranded DNA by pyrosequencing. Biotechnol. Appl. Biochem. 31, 107–112.
25. Nordstrom, T., Nourizad, K., Ronaghi, M., and Nyren, P. (2000) Method enabling pyrosequencing on double-stranded DNA. Anal. Biochem. 282, 186–193.
26. Ronaghi, M. (2000) Improved performance of pyrosequencing using single-stranded DNA-binding protein. Anal. Biochem. 286, 282–288.
27. Karamohamed, S., Nilsson, J., Nourizad, K., Ronaghi, M., Pettersson, B., and Nyren, P. (1999) Production, purification, and luminometric analysis of recombinant Saccharomyces cerevisiae MET3 adenosine triphosphate sulfurylase expressed in Escherichia coli. Protein Expr. Purif. 15, 381–388.
28. Canard, B., and Sarfati, R. S. (1994) DNA polymerase fluorescent substrates with reversible 3′-tags. Gene 148, 1–6.
29. Cheesman, P. C. (1994) Method for sequencing polynucleotides, U.S. Patent No. 5.302509.
30. Metzker, M. L., Raghavachari, R., Richards, S., et al. (1994) Termination of DNA synthesis by novel 3′-modified-deoxyribonucleoside 5′-triphosphates. Nucleic Acids Res. 22, 4259–4267.
31. Tsien, R. Y., Ross, P., Fahnestock, M., and Johnston, A. J. (1991) PCT WO 91/06678.
