TECHNICAL ARTICLE: Simultaneous cloning of multiple nuclear genes by pooling PCR products of variable size: a cost-effective method of improving efficiency in large-scale genetic analyses

July 8, 2017 | Author: Jason Addison | Category: Genetics, Biological Sciences, PCR, Cost effectiveness, Diploid, Large Scale

Molecular Ecology Notes (2007) 7, 389–392

doi: 10.1111/j.1471-8286.2006.01660.x

TECHNICAL ARTICLE

Blackwell Publishing Ltd

Simultaneous cloning of multiple nuclear genes by pooling PCR products of variable size: a cost-effective method of improving efficiency in large-scale genetic analyses

JASON A. ADDISON

Department of Ecology and Evolutionary Biology, Earth and Marine Science Building, University of California Santa Cruz, Santa Cruz, CA 95064, USA

Abstract

I present a simple approach to overcome the high cost and low efficiency of cloning polymerase chain reaction (PCR) products for individuals in wide-scale population genetic analyses. The methodology reduces the number of cloning reactions per individual by engineering a suite of genetic markers that differ in size and pooling these PCR products prior to cloning. Alleles from each gene are then recovered by screening transformed bacterial colonies and identifying the inserts corresponding to each gene based on size. I demonstrate the utility of this technique by presenting the results I obtained from cloning four nuclear genes in 118 individuals from three species of sea urchins (Strongylocentrotus purpuratus, S. droebachiensis and S. pallidus). Of the 472 different PCR products I cloned, I recovered at least one allele for 432 of them (91.5%) by screening between 16 and 32 bacterial colonies for each individual. Recovery efficiency was biased: the two largest fragments (1130 and 800 bp) were recovered 100% of the time, while the two smaller fragments (580 and 650 bp) were recovered in 85.6% and 81.4% of the experiments, respectively. I discuss the promise of this application for wide-scale genetic analyses.

Keywords: diploid, DNA polymorphism, nuclear gene, PCR, pooled cloning

Received 13 August 2006; revision accepted 13 November 2006

Correspondence: Jason A. Addison, Fax: 831-459-5353; E-mail: [email protected]

© 2006 The Author. Journal compilation © 2006 Blackwell Publishing Ltd

The analysis of nuclear genes in studies of population genetics is surprisingly infrequent and often limited to model organisms (see Zhang & Hewitt 2003). This limitation may be due in part to the difficulty of identifying alleles in heterozygous individuals when both strands are sequenced at the same time [i.e. direct sequencing of polymerase chain reaction (PCR) products from diploids]. Even though powerful algorithms exist to assign allelic identity (e.g. Clark 1990; Stephens et al. 2001), computational approaches such as these are often insufficient to assign all sequences in a data set, particularly when nucleotide diversity is high. Circumventing these issues requires the independent sequencing of one or both copies of a nuclear gene in a diploid species. While methodologies such as the development of allele-specific primers work for some species (e.g. Atlantic cod: Stenvik et al. 2006), often the easiest way to separate the individual alleles is to clone each PCR product prior to sequencing. Since cloning DNA at the individual level is both time consuming and expensive, molecular ecologists either dramatically limit their sample size or avoid the step altogether.

There are two ways to increase the cost effectiveness of cloning genes in a large-scale multilocus genetic analysis: use fewer reagents in each cloning reaction or clone more genes at once. Here I present a simple approach to simultaneously clone several PCR products from multiple genes in each individual. I show that by engineering PCR primers to generate products that differ in size for each target gene, it is possible to pool the fragments from each individual and clone them at the same time. The recovery of each gene from the transformed bacterial colonies is accomplished by characterizing the insert size with a simple PCR test. While many studies sequence multiple clones to obtain the full diploid genotype for each gene, the goal of this protocol is to rapidly survey patterns of nucleotide diversity and genealogical structure among populations or species by randomly sampling one allele at each gene for each individual. I demonstrate the utility of this method by summarizing the results of a large-scale genetic analysis involving the collection of sequence data from four nuclear genes in individuals (n = 118) of three species of sea urchins (Strongylocentrotus purpuratus, S. droebachiensis and S. pallidus).

Table 1 Fragment size (bp), ratio of the PCR products pooled in each cloning reaction, and overall recovery rate of the four nuclear genes cloned in 118 individual sea urchins

Locus    Size (bp)   Ratio   Recovery (%)
SoxB2    1130        1       100
Cyclin   800         4       99.2
sm32     650         5       81.4
gp96     580         5       85.6

Methods

Marker development

To develop the genetic markers used in this study, I obtained from GenBank four cDNA clones from genes characterized in the purple sea urchin (Strongylocentrotus purpuratus) (SoxB2, Cyclin, sm32, gp96). Using BLAST to search the preliminary assembly of the S. purpuratus genome (www.hgsc.bcm.tmc.edu/projects/seaurchin/), I was able to identify the positions of both the exons and introns for each target gene. Using this information, I designed exon-specific PCR primers to amplify between 1200 bp and 1500 bp of each gene. I produced and directly sequenced PCR products from 10 individual S. purpuratus collected near Santa Cruz, California. Cross-amplification in two closely related species, Strongylocentrotus droebachiensis and Strongylocentrotus pallidus, was successful for all genes in some samples, and I directly sequenced three individuals of each species that were collected near Friday Harbor, Washington. The sequences obtained had a high level of nucleotide and insertion and deletion (indel) polymorphism, and most sequences only read from 100 to 500 bp before a frame-shift in one of the alleles made the chromatograms illegible. Using these preliminary data, I engineered primers for each gene to produce PCR products that differed in size by at least 70 bp (Table 1). This difference in size allowed for easy discrimination of transformant bacterial colonies containing inserts of each gene by agarose gel electrophoresis.
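The size-spacing requirement can be checked in a few lines before primers are ordered. The sketch below is illustrative only: it takes the fragment sizes from Table 1 and reports the closest pair of products. The function name `min_size_gap` and the dictionary layout are assumptions made for this example and are not part of the published protocol.

```python
# Fragment sizes (bp) from Table 1; names below are illustrative only.
FRAGMENT_SIZES_BP = {"SoxB2": 1130, "Cyclin": 800, "sm32": 650, "gp96": 580}


def min_size_gap(sizes_bp):
    """Return (gap, larger_locus, smaller_locus) for the closest pair of products."""
    ordered = sorted(sizes_bp.items(), key=lambda item: item[1], reverse=True)
    gaps = [
        (big_bp - small_bp, big, small)
        for (big, big_bp), (small, small_bp) in zip(ordered, ordered[1:])
    ]
    return min(gaps)


gap, larger, smaller = min_size_gap(FRAGMENT_SIZES_BP)
print(f"Closest pair: {larger}/{smaller}, {gap} bp apart")
```

For the four markers used here, the closest pair is sm32 and gp96, which matches the 70-bp minimum stated in the text.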

Cloning

PCR products for all four genes were obtained using an Idaho Technology 1605 Capillary Air Thermo-Cycler (Idaho Technology, Inc.). PCRs were performed in 10-µL volumes consisting of 1.0 to 10 ng of template DNA, 1× Taq Extender Buffer (Stratagene), 0.2 mM dNTPs (Amersham Biosciences), 0.1 mg/mL bovine serum albumin (BSA), 0.5 mM MgSO4, 0.25 µM forward and reverse primers, 0.4 U of Taq polymerase (New England Biolabs Inc.), and 0.4 U of Taq Extender (Stratagene). Reaction mixes were sealed in glass capillary tubes prior to amplification. The thermal cycling protocol consisted of a 1-min denaturation at 94 °C, followed by 35 cycles of 1-s denaturation at 94 °C, 1-s annealing at 49 °C, and 90-s extension at 68 °C. To ensure the complete addition of the 3′ A overhang, the final extension at 72 °C was held for 10 min. To ensure primer fidelity and to estimate the relative DNA concentration of each gene, a small volume of each PCR product (3 µL) was resolved in 1% agarose and visualized using ethidium bromide. Successful PCRs were then pooled together in equal quantities (approximately 150 ng each) and copurified in agarose using a Zymoclean Gel DNA Recovery Kit (Zymo Research). I then cloned the pooled DNA mixture by inserting approximately 8–12 ng into 2.5 ng of the plasmid using the TOPO TA cloning kit (Invitrogen) following the manufacturer’s recommended protocol.
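Pooling equal masses of products that differ in length does not give equal numbers of molecules, because a given mass of a shorter fragment contains more copies. The following sketch converts the approximate 150 ng of each product into picomoles. It assumes an average mass of about 650 g/mol per base pair of double-stranded DNA, a common rule of thumb that is not a value taken from this paper, and the function name `ng_to_pmol` is hypothetical.

```python
# Approximate average mass of one base pair of double-stranded DNA (g/mol).
# This rule-of-thumb constant is an assumption, not a value from the paper.
AVG_BP_MASS_G_PER_MOL = 650.0

# Fragment sizes (bp) from Table 1.
FRAGMENT_SIZES_BP = {"SoxB2": 1130, "Cyclin": 800, "sm32": 650, "gp96": 580}


def ng_to_pmol(mass_ng, length_bp):
    """Convert a mass of double-stranded DNA (ng) to picomoles of molecules."""
    return mass_ng * 1000.0 / (length_bp * AVG_BP_MASS_G_PER_MOL)


# About 150 ng of each product went into the equal-mass pool.
for locus, length_bp in FRAGMENT_SIZES_BP.items():
    print(f"{locus}: {ng_to_pmol(150.0, length_bp):.3f} pmol")
```

Under this approximation, the 580-bp product contributes roughly twice as many molecules as the 1130-bp product when both are pooled at equal mass.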

Colony screening and identification of inserts based on size

To isolate each of the four genes, I used vector-specific primers to PCR amplify the inserts from an initial 16 colonies per individual. Using a sterile flat toothpick, I re-suspended each colony in separate wells on a 96-well PCR plate, each containing 50–100 µL of sterile dH2O. PCRs were performed (in a separate plate) under oil in 5-µL total volume with the following reagents: 1 µL re-suspended bacterial DNA, 1× Thermopol buffer, 0.2 mM dNTPs, 0.5 mM MgSO4, 0.25 µM of each T3 (5′-AATAACCCTCACTAAAGGGA) and T7 (5′-TAATACGACTCACATTAGGG) primer, and 0.2 U Taq polymerase (New England Biolabs Inc.). PCR products were generated using an ABI GeneAmp 9700 thermal cycler, and the cycling protocol consisted of an initial denaturation of 3 min at 95 °C, 35 cycles of 30 s at 94 °C, 30 s at 55 °C, and 1 min 30 s at 72 °C, followed by a final extension of 1 min 30 s at 72 °C. To identify the insert size for each colony, the total volume of each amplified product was resolved in 1.8% agarose and visualized using ethidium bromide. In the cases where an allele for each of the four genes was not recovered, I screened an additional 8–16 colonies.

Sequencing was performed by randomly selecting one allele from each gene and generating a 20-µL PCR product using the protocol described above. Products were gel purified using Zymo-Spin columns, sequenced with both the T3 and T7 primers using ABI Big Dye Terminators (version 3.0), and resolved using an ABI 3100 Automated Capillary Sequencer (Applied Biosystems).
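Assigning a colony to a gene from its band size can be sketched as a nearest-match lookup. Because the vector primers sit outside the cloning site, each colony-PCR band is longer than the insert by a fixed stretch of vector sequence. That flank length is not given in the text, so it is left as a parameter here and the 150-bp value used below is purely hypothetical, as are the function name `assign_locus`, the tolerance, and the example band sizes.

```python
# Expected insert sizes (bp) from Table 1.
INSERT_SIZES_BP = {"SoxB2": 1130, "Cyclin": 800, "sm32": 650, "gp96": 580}


def assign_locus(band_bp, flank_bp, tolerance_bp=30):
    """Return the locus whose expected colony-PCR band is closest to band_bp.

    flank_bp is the vector sequence amplified along with the insert by the
    vector primers; its value depends on the vector and is hypothetical here.
    Returns None when no locus lies within tolerance_bp of the band.
    """
    best_locus, best_error = None, None
    for locus, insert_bp in INSERT_SIZES_BP.items():
        error = abs(band_bp - (insert_bp + flank_bp))
        if best_error is None or error < best_error:
            best_locus, best_error = locus, error
    return best_locus if best_error <= tolerance_bp else None


# Hypothetical band sizes read from a gel, with an assumed 150-bp flank.
for band in (1285, 790, 735, 400):
    print(band, assign_locus(band, flank_bp=150))
```

The default tolerance is kept below half of the 70-bp minimum spacing so that no band can match two genes. A band far from every expected size, such as an empty vector, returns `None`.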


Fig. 1 The fraction of the 118 experiments for which an allele was recovered for one, two, three or four of the nuclear genes simultaneously cloned from individual sea urchins. Each bar represents the maximum number of colonies screened, and in many cases, fewer than 24 or 32 were required to obtain an allele for all four fragments.
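How many colonies are needed to see every gene at least once can be explored with a simple probability model. The sketch below is not an analysis from the paper: it assumes that each screened colony is an independent draw and that each gene makes up a fixed proportion of the colonies, and the proportions shown are invented for illustration. The function name `prob_all_recovered` is likewise hypothetical.

```python
from itertools import combinations


def prob_all_recovered(proportions, n_colonies):
    """Probability that every locus appears at least once among n_colonies.

    Assumes each screened colony is an independent draw and carries locus i
    with probability proportions[i] (inclusion-exclusion over missing loci).
    """
    total = 0.0
    for k in range(len(proportions) + 1):
        for missing in combinations(proportions, k):
            total += (-1) ** k * (1.0 - sum(missing)) ** n_colonies
    return total


# Hypothetical colony proportions: an even pool and a skewed pool.
even = [0.25, 0.25, 0.25, 0.25]
skewed = [0.55, 0.25, 0.10, 0.10]
for n in (16, 24, 32):
    print(n, round(prob_all_recovered(even, n), 3),
          round(prob_all_recovered(skewed, n), 3))
```

Under these assumptions, a skewed pool needs more colonies than an even one to reach the same chance of recovering all four genes, which is the practical reason for adjusting the pooling ratio.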

Results

PCR products for all four genes were successfully amplified in all 118 individuals. Initially, the equal pooling ratios of the genes produced a biased recovery of the largest gene (SoxB2), as 95% of all the bacterial colonies screened in the first experiment (n = 6) had an allele that corresponded in size to this fragment (data not shown). The remaining 5% of the colonies screened had inserts consistent with the second largest gene (Cyclin). A second test of this protocol, in which several pooling ratios were tested, determined that reducing the proportion of the over-represented fragments (SoxB2, Cyclin) in the pooled mix successfully improved the recovery of the smallest fragments (sm32, gp96) (Table 1). Averaging over all 118 individuals included in this analysis, the overall recovery efficiency of at least one allele for each of the nuclear genes ranged from 81.4% to 100% (Table 1). In total, 91.5% (432/472) of the cloned fragments were recovered by screening between 16 and 32 bacterial colonies (Fig. 1).

Sequencing of cloned PCR products is known to generate errors through single-base substitutions (e.g. Kobayashi et al. 1999). To assess the influence of Taq-induced errors, I sequenced at least four, and in most cases six, clones per individual for 13 randomly selected gene fragments. I detected the full diploid genotype in eight of these cases, of which Taq-induced errors were observed in three (two single-base substitutions and one double-base substitution).
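The per-locus percentages in Table 1 can be reconciled with the overall figure of 432/472 by converting them back to whole individuals. The per-locus counts produced below are back-calculated from rounded percentages and are therefore an inference, not numbers reported in the text; only the totals are stated in the paper.

```python
N_INDIVIDUALS = 118

# Recovery rates (%) from Table 1.
RECOVERY_PERCENT = {"SoxB2": 100.0, "Cyclin": 99.2, "sm32": 81.4, "gp96": 85.6}

# Convert each rounded percentage back to a whole number of individuals.
recovered = {
    locus: round(N_INDIVIDUALS * pct / 100.0)
    for locus, pct in RECOVERY_PERCENT.items()
}
total_recovered = sum(recovered.values())
total_cloned = N_INDIVIDUALS * len(RECOVERY_PERCENT)

print(recovered)
print(f"{total_recovered}/{total_cloned} = {100 * total_recovered / total_cloned:.1f}%")
```

The back-calculated counts sum to 432 of 472, or 91.5%, in agreement with the overall recovery reported above.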

Discussion

The results from this experiment indicate that both the cost and the time involved in cloning nuclear genes from diploid species can be reduced if the primers designed for each locus are specifically engineered to produce fragments that differ in size. By pooling the fragments from each individual and simultaneously cloning them, the total number of cloning reactions required to collect a large multigene data set decreases dramatically. Recovery of these fragments is simply a matter of PCR screening transformed bacterial colonies in order to identify those that house recombinant inserts corresponding to each gene. Although the screening process required to identify alleles from each gene is slightly more time consuming, this methodology is more efficient than cloning each PCR product individually. Furthermore, by cloning four genes at once I was able to reduce the cost of the cloning reagents to 25% of the original cost. Preliminary results using a new set of genetic markers in the same species of sea urchins suggest that this protocol may be extended to multilocus systems with more than four genes.

In this study, I used one quarter of the volumes recommended by the manufacturer’s protocol for both the ligation and transformation reactions (i.e. each ligation reaction was 1.5-µL instead of 6-µL final volume). Even at this small scale, spreading only 10–40 µL of bacterial transformants on Luria-Bertani plates produced anywhere from 50 to 200 colonies. By performing the cloning reactions at this smaller scale, I was able to reduce the cost of each cloning reaction further, to about 25% of that of a standard reaction. I produced a nucleotide sequence data set of 432 different gene copies in 118 individual sea urchins by using the cloning reagents for fewer than 40 standard reactions. Individually cloning single genes for each individual using the manufacturer’s recommended volumes would have consumed a minimum of 472 standard reactions. By pooling PCR products and scaling back reaction volumes, I was therefore able to reduce the cost associated with the cloning step by more than an order of magnitude, and this represents a substantial saving to the budget-minded molecular ecologist.

I encountered two technical artefacts in this experiment: one is a result of the pooling ratios of the four genes, and the other can be attributed to the actual process of cloning PCR products. It is unclear why the smaller fragments were under-represented in the colonies I screened, and this result was unexpected as the cloning of larger fragments is thought to be less efficient (TOPO manual). Brownstein et al. (1996) show that Taq polymerase is least efficient at adding a nontemplate 3′ A next to an A, and one possibility is that the addition of this 3′ A was more efficient with the larger DNA fragments as a direct result of their primer sequences. However, none of the eight primers used in this study began with a 5′-T, and there was no clear relationship between the cloning efficiency and the 5′ nucleotide of each primer. Cloning experiments using several different sets of nuclear genes have, in some cases, shown a similar bias in the overall recovery efficiency of some fragments. However, when a bias was encountered, the recovery efficiency of the fragments was not related to their size, suggesting that the cloning bias might simply be due to stochastic processes acting during each step in the protocol. Although the mechanism for the bias in cloning efficiency is unclear, this technical challenge was easily overcome by empirically determining an appropriate pooling ratio for the genes being cloned. However, it should be noted that determining the correct pooling ratio for a much larger set of loci could be quite challenging.

The introduction of Taq-induced stochastic mutations while generating and cloning PCR products is a ubiquitous problem (e.g. Kobayashi et al. 1999). Although rarely observed in this study, stochastic events such as this have been observed in several other studies employing cloning as a method to resolve individual alleles in diploids (e.g. Palumbi & Baker 1994; Gaudieri et al. 1999; Beltrán et al. 2002; Hare & Weinberg 2005). However, since Taq errors are random, they are not likely to generate shared, derived characters that contribute to the topology of the tree or to the lengths of the internal branches (Beltrán et al. 2002; Hare & Weinberg 2005). Instead, the random addition of singletons to the data set will result in an increase in the number of terminal branches and inflation of the overall levels of nucleotide diversity. The influence of either accommodating or ignoring low-frequency Taq errors in the statistical analysis will largely depend on the nature of the question being asked. Some methodologies now allow for the inclusion of sequencing error rates (e.g. MIGRATE; Beerli & Felsenstein 2001), but determining the bias these errors may have on the final conclusions requires further investigation.
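The effect of a single artefactual substitution on nucleotide diversity can be seen in a toy calculation. The sequences below are invented for illustration and are not data from this study; nucleotide diversity is computed here as the mean proportion of differing sites over all pairs of aligned sequences, and the function name `nucleotide_diversity` is an assumption of this sketch.

```python
from itertools import combinations


def nucleotide_diversity(sequences):
    """Mean proportion of differing sites over all pairs of aligned sequences."""
    length = len(sequences[0])
    pairs = list(combinations(sequences, 2))
    diffs = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs)
    return diffs / (len(pairs) * length)


# Hypothetical aligned alleles (toy data, not sequences from the study).
true_alleles = ["ACGTACGTAC", "ACGTACGTAC", "ACGTTCGTAC", "ACGTTCGTAC"]

# The same sample with one artificial singleton in the first sequence.
with_error = ["ACGTACGTAG"] + true_alleles[1:]

print(nucleotide_diversity(true_alleles))
print(nucleotide_diversity(with_error))
```

The singleton raises the diversity estimate without creating any site shared by two or more sequences, which is why such errors lengthen terminal branches rather than alter the internal structure of a genealogy.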
The approach of randomly sampling one allele per locus per individual is an efficient way to survey the levels of DNA polymorphism and genealogical structure among populations or between species (Muller et al. 2005). Although this method does not provide information on levels of individual heterozygosity, it does provide suitable phylogenetic signal while minimizing both the cost and the effort of sequencing multiple clones from each individual. The use of nuclear DNA sequence data to address hypotheses of population growth history, demography, gene flow and speciation using coalescent-based methods has been clearly demonstrated (e.g. Muller et al. 2005; Kronforst et al. 2006; see review by Hare 2001). With efforts in genome sequencing on the rise, there is now a wide diversity of organisms from which to develop nuclear genetic markers, and by considering this simple technique, it may be possible to overcome both high costs and time constraints while collecting genetic data for wide-scale population or species-level analyses.
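As a rough check on the reagent savings reported in this Discussion, the number of standard-reaction equivalents can be worked through directly. The sketch below uses only figures stated in the text (118 individuals, four genes, quarter-volume reactions) and assumes exactly one pooled reaction per individual; the variable names are illustrative.

```python
N_INDIVIDUALS = 118
N_GENES = 4
VOLUME_FRACTION = 0.25  # one quarter of the manufacturer's reaction volume

# One standard reaction per gene per individual.
unpooled = N_INDIVIDUALS * N_GENES

# One pooled reaction per individual, still at full volume.
pooled = N_INDIVIDUALS

# Pooled reactions at quarter volume, in standard-reaction equivalents.
pooled_scaled = pooled * VOLUME_FRACTION

print(unpooled, pooled, pooled_scaled)
print(f"Fold reduction: {unpooled / pooled_scaled:.0f}x")
```

This idealized count gives 29.5 standard-reaction equivalents against 472, a 16-fold reduction, which is consistent with the reported use of fewer than 40 standard reactions. The paper does not itemize the difference between the two figures.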

Acknowledgements

I thank D.R. Cox for suggesting I write this manuscript, G.H. Pogson for supporting the work, and A.J. Addison, J.M. Pujolar and two anonymous referees for comments on earlier drafts of the text. This research was funded by a grant from NSF (OCE-0350443) to GHP.

References

Beerli P, Felsenstein J (2001) Maximum likelihood estimation of a migration matrix and effective population sizes in n subpopulations by using a coalescent approach. Proceedings of the National Academy of Sciences, USA, 98, 4563–4568.

Beltrán M, Jiggins CD, Bull V et al. (2002) Phylogenetic discordance at the species boundary: comparative gene genealogies among rapidly radiating Heliconius butterflies. Molecular Biology and Evolution, 19, 2176–2190.

Brownstein MJ, Carpten JD, Smith JR (1996) Modulation of nontemplated nucleotide addition by Taq DNA polymerase: primer modifications that facilitate genotyping. BioTechniques, 20, 1004–1010.

Clark AG (1990) Inference of haplotypes from PCR-amplified samples of diploid populations. Molecular Biology and Evolution, 7, 111–122.

Gaudieri S, Kulski JK, Dawkins RL, Gojobori T (1999) Extensive nucleotide variability within a 370-kb sequence from the central region of the major histocompatibility complex. Gene, 238, 157–161.

Hare MP (2001) Prospects for nuclear gene phylogeography. Trends in Ecology & Evolution, 16, 700–706.

Hare MP, Weinberg JR (2005) Phylogeography of surfclams, Spisula solidissima, in the western North Atlantic based on mitochondrial and nuclear DNA sequences. Marine Biology, 146, 707–716.

Kobayashi N, Tamura K, Aotsuka T (1999) PCR error and molecular population genetics. Biochemical Genetics, 37, 317–321.

Kronforst MR, Young LG, Blume LM, Gilbert LE (2006) Multilocus analyses of admixture and introgression among hybridizing Heliconius butterflies. Evolution, 60, 1254–1268.

Muller MH, Poncet C, Prosperi JM, Santoni S, Ronfort J (2005) Domestication history in the Medicago sativa species complex: inferences from nuclear sequence polymorphism. Molecular Ecology, 15, 1589–1602.

Palumbi SR, Baker CS (1994) Contrasting population structure from nuclear intron sequences and mtDNA of humpback whales. Molecular Biology and Evolution, 11, 426–435.

Stenvik J, Wesmajervi MS, Damsgård B, Delghandi M (2006) Genotyping of pantophysin I (Pan I) of Atlantic cod (Gadus morhua L.) by allele-specific PCR. Molecular Ecology Notes, 6, 272–275.

Stephens M, Smith NJ, Donnelly P (2001) A new statistical method for haplotype reconstruction from population data. American Journal of Human Genetics, 68, 978–989.

Zhang DX, Hewitt GM (2003) Nuclear DNA analyses in genetic studies of populations: practice, problems, and prospects. Molecular Ecology, 12, 563–584.
