A database of expressed genes from Cochliomyia hominivorax (Diptera: Calliphoridae)

Share Embed


Descripción

MOLECULAR BIOLOGY/GENOMICS

A Database of Expressed Genes From Cochliomyia hominivorax (Diptera: Calliphoridae) F. D. GUERRERO,1,2 S. E. DOWD,3 A. DJIKENG,4 G. WILEY,5 S. MACMIL,5 L. SALDIVAR,1 F. NAJAR,5 AND B. A. ROE5

J. Med. Entomol. 46(5): 1109Ð1116 (2009)

ABSTRACT We used an expressed sequence tag and 454 pyrosequencing approach to initiate a study of the genome of the screwworm, Cochliomyia hominivorax (Coquerel) (Diptera: Calliphoridae). Two normalized cDNA libraries were constructed from RNA isolated from embryos and second instar larvae from the Panama 95 strain. Approximately 5,400 clones from each library were sequenced from both the 5⬘ and 3⬘ directions using the Sanger method. In addition, double-stranded cDNA was prepared from random-primed polyA RNA puriÞed from embryos, second-instar larvae, adult males, and adult females. These four cDNA samples were used for 454 pyrosequencing that produced ⬇300,000 independent sequences. Sequences were assembled into a database of assembled contigs and singletons and used to search public protein databases and annotate the sequences. The full database consists of 6,076 contigs and 58,221 singletons assembled from both the traditional expressed sequence tag (EST) and 454 sequences. Annotation of the data led to the identiÞcation of several gene coding regions with possible roles in sex determination in the screwworm. This database will facilitate the design of microarray and other experiments to study screwworm gene expression on a larger scale than previously possible. KEY WORDS Cochliomyia hominivorax, expressed sequence tags, 454 pyrosequencing, sex-lethal, gene ontology terms

The screwworm, Cochliomyia hominivorax (Coquerel) (Diptera: Calliphoridae), is a severe pest of livestock and other warm-blooded animals, including humans, in parts of Central and South America. The female ßy deposits its eggs at a wound site and, upon hatching, the larvae feed on the hostÕs living tissue. During seasons of heavy screwworm ßy infestation, death of the host animal will often ensue if protective measures are not taken. The screwworm has been eradicated from North America and reinfestation is prevented by maintenance of a biological barrier zone formed by the ongoing mass release of sterilized male ßies in Panama (VargasTeran 1991; http://www.nal.usda.gov/speccoll/ collect/screwworm/Start.htm). Before the eradication of the screwworm from the United States, annual losses to the U.S. livestock industry topped Mention of trade names or commercial products in this publication is solely for the purpose of providing speciÞc information and does not imply recommendation or endorsement by the U.S. Department of Agriculture. 1 USDAÐARS, Knipling-Bushland U.S. Livestock Insects Research Laboratory; 2700 Fredericksburg Rd., Kerrville, TX 78028. 2 Corresponding author, e-mail: [email protected]. 3 Research and Testing Laboratory, 4321 Marsha Sharp Fwy., Lubbock, TX 79407. 4 The J. Craig Venter Institute, 9704 Medical Center Dr., Rockville, MD 20850. 5 Department of Chemistry and Biochemistry, University of Oklahoma, 620 Parrington Oval, Norman, OK 73019.

$120 million in the 1950s (Graham and Hourrigan 1977). The beneÞts of the eradication program to the United States were estimated to be ⬎$1 billion in 1974 (Goodwin 1974). Annual costs of maintaining the USDA-Regional Screwworm Program are approximately $31 million, and several avenues of increasing program efÞciency are being explored. One of these areas of research involves the potential use of transgenic female-conditional lethal strains of screwworm. The development of this type of ßy strain has been successfully demonstrated in Drosophila melanogaster (Meigen) (Thomas et al. 2000) and Ceratitus capitata (Wiedemann) (Fu et al. 2007), and their use in control programs is being actively explored. However, these two ßy species have a wealth of molecular and genetic information in public databases, whereas molecular and gene sequence data for screwworm is lacking, with only 29 sequence records available in GenBank and no expressed sequence tag (EST) entries (queried 28 October 2008). Genome sequencing and focused EST acquisition projects have greatly advanced the study of a wide range of biological organisms and can provide new opportunities to solve agricultural problems and guide searches for novel control technologies. The size of the screwworm genome has not been reported and the likelihood of obtaining whole genome sequence in the near future is low. EST-based

1110

JOURNAL OF MEDICAL ENTOMOLOGY

projects will accelerate the development of new approaches to study the genome and molecular biology of this ßy. As a component of a project directed toward development of female-conditional lethal strains of transgenic screwworm, we have developed a database of genes expressed in the screwworm. We used a traditional cloned cDNA library EST-based approach to identify genes expressed in the embryonic and larval stages of the screwworm life cycle. Subsequently, a 454 pyrosequencing approach was used to obtain sequences from expressed transcripts in screwworm embryos, larvae, adult males and adult females. Bioinformatics tools were used to assemble all the sequences into a single expressed transcript database and analyze the data for sequences related to sex determination genes of other organisms which might prove useful for developing the female-conditional lethal strains. Materials and Methods Insect Stocks. All egg and larval materials were from the Panama 95 screwworm strain reared at the Pacora screwworm production plant in Panama according to established protocols (Chaudhury and Skoda 2007). Adult ßies were obtained from the Panama 95 screwworm strain reared similarly at the screwworm production plant in Tuxtla Gutierrez, Chiapas, Mexico. One gram of eggs, second-instar larvae, and 10 g of adults of mixed sex were collected and immediately frozen on dry ice and maintained at ⫺80⬚C. Adult ßies were separated by sex on dry ice and kept frozen until pulverized during the nucleic acid isolation procedure. Nucleic Acid Manipulations. Two grams each of adult males and adult females, 0.5 g of eggs, and 1 g of second-instar larvae were used for isolation of total and poly(A)⫹ RNA. Ten milliliters of RNAlater-ICE (Ambion, Austin, TX) was added to each sample according to manufacturerÕs instructions. Total RNA was isolated using the Totally RNA kit reagents with a lithium chloride precipitation step (Ambion). RNA integrity was veriÞed by formaldehyde gel electrophoresis and staining in GelStar Nucleic Acid Gel Stain (Lonza Rockland, Inc., Rockland, ME). Total RNA was used to construct a standard normalized cDNA library under contract by Express Genomics, Inc. (Frederick, MD), by using proprietary protocols. In brief, cDNA synthesis was primed using an oligodTVN primer and double-stranded cDNA directionally cloned into the EcoRV-NotI site of pExpress1 after size selection on an agarose gel. cDNA normalization was carried out using biotinylated RNA driver transcribed from the cloned cDNA and hybridization with single-stranded cDNA circles generated by phagemid production. After hybrid removal using streptavidin, single-stranded circles were converted into doublestranded DNA and then electroporated into T1 phage resistant DH10b Escherichia coli cells. Sequencing of ESTs from the two normalized libraries was performed at the J. Craig Venter Institute

Vol. 46, no. 5

(Rockville, MD). Bacterial colonies were picked for template preparation using colony-picking robots (Genetix, Boston, MA), inoculated into 384-well plates containing liquid medium and grown overnight. A robotic work station was used to prepare sequencing grade plasmid DNA by using the alkaline lysis method (Sambrook et al. 1989) modiÞed for high-throughput processing. Beckman Multimek 96 or Biomek FX automated pipetting robot work stations (Beckman Coulter, Fullerton, CA) were used to combine prealiquoted templates and reaction mixes consisting of deoxy- and ßuorescently labeled dideoxynucleotides, TaqDNA polymerase, sequencing primers, and sequencing reaction buffer. Linear ampliÞcation steps were performed on Tetrads PTC-225 (MJ Research, Watertown, MA), and sequencing reaction products were puriÞed by ethanol precipitation and resolved on ABI 3730xl sequencing machines (Applied Biosystems, Foster City, CA). The unassembled EST sequences were submitted to GenBank dbEST with the accession numbers FG282693ÐFG291953 (eggs) and FG291954 ÐFG301340 (larvae). PolyA RNA was isolated from two 400-␮g aliquots of total RNA from egg, second-instar larvae, adult males and adult females using the MicroPoly(A) Purist kit (Ambion). Five micrograms of polyA RNA was used to synthesize blunt-ended double-stranded cDNA by using the Just cDNA Double-Stranded cDNA Synthesis kit (Stratagene, La Jolla, CA) and the supplied random primers. DNA sample preparation for sequencing on the 454/ Roche GS-FLX was performed as described by the manufacturer (Margulies et al. 2005). In brief, this entailed shearing the DNA via nebulization, end repairing as described by Roe (2004), followed by ligation of adapter sequences and a second round of end repair to yield a blunt-ended DNA library that was quantiÞed and diluted before ampliÞcation via emulsion polymerase chain reaction (emPCR) (Margulies et al. 2005). After emPCR enrichment, the DNAs were loaded onto a 454/Roche GE-FLX for massively parallel pyrosequencing. The resulting sequence data in Standard Flowgram Format (sff) were assembled using a beta version of SeqMan NGen (DNAstar, Madison, WI) at 97% homology, match size 15, match spacing 24, and minimum sequence length of 100 bp. Assemblies were evaluated for false joins visually using SeqManPro (DNAstar). The raw 454 sequence data were submitted to GenBankÕs Short Trace Archive (accession no. SRA001820). Bioinformatic Analysis of ESTs. In the database of assembled ESTs, sequences were mapped to functional classiÞcations schemes such as gene ontology (GO) terms (Ashburner et al. 2000), Kyoto Encyclopedia of Genes and Genomes pathways (Ogata et al. 1999), Swiss Prot Protein Keywords (Bairoch et al. 2005), and the Database for Annotation, Visualization, and Integrated Discovery (Dennis et al. 2003). For the GO term comparisons with the C. hominivorax developmental stage data sets, the Flybase D. melanogaster 2008_09 Release GO term annotations was used in conjunction with Fisher exact test for determination of GO term distribution differences.

September 2009 Table 1.

GUERRERO ET AL.: SCREWWORM EST DATABASE

1111

Characterization of individual libraries

ID Egg ESTs Reverse direction Forward direction Larval ESTs Reverse direction Forward direction Egg 454 contigs Larval 454 contigs Adult male 454 contigs Adult female 454 contigs Combined data set

No. sequencesa

Success rate (%)

5,049 4,212

94 78

4,884 4,503 3,091 2,255 3,801 2,432 6,076 assembled contigs 58,221 singletons

91 84 NAb NA NA NA NA NA

Mean length (bp)

%G⫹C

973 963

37 30

987 970 366 341 306 339 1,194 237

39 32 32 36 35 36 34 30

a Number of successful sequencing reactions used in the database assembly from a total of 5,376 clones for the ESTs or number of contigs assembled from each 454 cDNA sequencing sample. b Not applicable.

Results and Discussion To facilitate gene discovery and gene expression studies in embryonic and early larval stages of the screwworm, we constructed two normalized cDNA libraries from total RNA isolated from eggs and second-instar larvae. The initial nonnormalized cDNA libraries consisted of a total of 2.16 ⫻ 106 and 3.00 ⫻ 106 colony-forming units (cfu) for the egg and larval libraries, respectively. After normalization, a total of 9.6 ⫻107 cfu, 87% recombinants with average insert size of 1.8 kb, were recovered from the egg library. A total of 8.40 ⫻ 107 cfu, 87% recombinants with average insert size of 1.7 kb, were recovered from the larval library. Normalization resulted in a 40- and 50-fold reduction in clones from the egg and larval libraries, respectively, which hybridized to an actin probe when compared with the number hybridizing from the primary libraries (quality control tests performed by Express Genomics Inc.; data not shown). The pUC/ M13Reverse and pUC/M13Forward primers ßank the pExpress1 multiple cloning site so as to generate 5⬘ and 3⬘ end sequence data from the cDNA cloneÕs insert, respectively. We sequenced 14 384-well plates from each library in both the 5⬘ and 3⬘ directions, obtaining 9,261 and 9,387 sequences from the egg and larval libraries, respectively (Table 1). Sequencing of the egg cDNA library resulted in 4,212 and 5,049 successful forward and reverse direction sequence reactions, respectively. Sequencing of the larval cDNA library resulted in 4,503 and 4,884 successful forward and reverse direction sequence reactions, respectively. The sequencing success rate, mean read length, and % G ⫹ C values were similar in both libraries, although all three parameters were lower in both forward (3⬘ end) direction data sets than the corresponding reverse direction data sets (Table 1). The lower sequencing reaction success rates and mean read lengths for the forward reactions is probably due to problems presenting from polyA tracts at or near the 3⬘ end of the ESTs. In addition, 3⬘ ends of gene coding regions tend to be more A-T rich, thus lowering the % G ⫹ C for the forward data sets. The 454 sequencing approach on the randomprimed egg, larval, adult male, and adult female cDNAs

yielded a total of 46,695, 64,686, 62,853, and 49,650 sequences, respectively, that assembled into 3,091, 2,255, 3,801, and 2,432 contigs, respectively. Comparing the data from the egg and larval ESTs, which were obtained by traditional Sanger sequencing, with the data obtained from the egg and larval 454 sequencing contigs, there is an obvious difference in mean length (Table 1). The average length of the forward and reverse direction EST sequencing reads from the egg and larval normalized libraries is ⬇970 bp, whereas the mean length of the assembled contigs from the egg and larval 454 data are ⬇350 bp. The average length from the individual 454 sequencing reads was ⬇230 bp (data not shown). However, the % G ⫹ C from the two sequencing methodologies was essentially the same. The average % G ⫹ C when combining the forward and reverse direction reads from the egg ESTs was 33.8%, whereas the 454 contig data had 32% G ⫹ C. The average % G ⫹ C from the forward and reverse direction reads of the larval ESTs was 35.6%, whereas the 454 contig data had 36% G ⫹ C. Finally, the 18,648 EST and 223,884 individual 454 sequences also were assembled into a combined screwworm database of 6,076 assembled contigs and 58,221 singletons (Table 1 and Supplemental Data) and used to search public protein databases and annotate the sequences. The mean length of the contigs and singletons in the combined data set were 1,194 and 237 bp, respectively. The % G⫹C of the contigs and singletons was 34 and 30%, respectively. There were 4,059 and 3,423 BlastX hits to the 15 February 2008 UniProt database (The UniProt Consortium 2008) using cut-off e-values of 0.001 and 1 ⫻ e⫺25, respectively. See Supplemental Data for the annotations accompanying the assembled contigs with UniProt hits. The latest release of the annotated D. melanogaster genome indicates 15,140 genes located to the genome sequence (release 5.12; http://ßybase.org/static_ pages/docs/release_notes.html). Assuming a similar number of genes exist in screwworm, it is possible the combined database of 6,076 contigs represents up to 40% of the protein coding genes of the screwworm. Of course, it is possible that some of the screwworm contigs represent different transcripts from the same

1112

JOURNAL OF MEDICAL ENTOMOLOGY

gene coding region or that some contigs are incorrectly maintained as separate entries in the database when they rightfully should be assembled into a single contig. Database inaccuracies arising from these problems are correctable by manual curation and will be brought to light as the database is used. Of the 4,059 contigs with BlastX hits to UniProt entries (e-value ⬍0.001), 1,792 are to D. melanogaster (see Supplemental Data), not surprisingly the species with the most UniProt BlastX hits to the combined screwworm database contigs. The next two most frequently matched organisms were mouse and human with 472 and 461 BlastX hits to UniProt, respectively. Because C. hominivorax and D. melanogaster are both dipterans, we performed a BlastX analysis of the egg, larval, adult male, and adult female databases, by using the latest annotation of the D. melanogaster genome as the reference database. For this analysis, we assembled the combined 454- and Sanger-derived data from the individual life stages to produce combined egg, combined larval, combined adult male, and combined adult female data sets. The annotations and contigs are presented as Supplemental Data. As an overall characterization of the combined data set, we determined the frequency of occurrence of the GO terms. In the combined data set, 2,831, 3,272, and 3,546 contigs had GO terms from the cellular component, biological process, and molecular function ontologies, respectively. Table 2 lists the top 20 most commonly found GO terms in each of the three ontologies, whereas a list of the top 100 GO terms is found in the Supplemental Data. It is striking that eight of the top 20 cellular component GO terms relate to organelle, presumably the mitochondrion complement of genes as mitochondrion is noted in 23% of the genes with cellular component GO terms. Five GO terms that relate to membranes are in the cellular component top 20. The biological process GO terms top 20 list has eight terms that relate to metabolism, probably reßecting that larval-expressed genes are a signiÞcant part of the data set. Larval stages of the screwworm are voracious feeders and metabolism of the meal for energy generation and growth processes is expected to be a signiÞcant activity of that life stage. Half of the top 20 molecular function GO terms are related to binding, including protein, nucleic acid, nucleotide, purine nucleotide, ion, metal ion, cation, and adenyl nucleotide and ATP binding. Seventy percent of the combined data set genes with molecular function GO terms contained the term “binding.” Because a signiÞcant portion of the data set is derived from the egg and larval cDNA libraries that were normalized to increase the relative abundance of rare transcripts, the distribution of GO terms might be different from the distribution which would be found in a data set containing the entire set of expressed genes. To examine how the screwwormÕs combined data set GO terms might differ from that of two dipterans, D. melanogaster and Aedes aegypti (L.) and one hymenopteran, Apis mellifera L., we determined the top 10 GO terms from each of the three ontologies and noted where each of these GO terms ranked in the

Table 2. annotation

Vol. 46, no. 5 Most common GO terms in combined data set

GO term Cellular component Cell Intracellular Organelle Intracellular organelle Cytoplasm Membrane-bound organelle Intracellular membrane-bound organelle Protein complex Membrane Nucleus Mitochondrion Intracellular non-membrane-bound organelle Nonmembrane-bound organelle Organelle membrane Integral to membrane Intrinsic to membrane Envelope Organelle envelope Cytoskeleton Mitochondrial envelope Biological process Physiological process Cellular process Cellular physiological process Metabolism Cellular metabolism Primary metabolism Macromolecule metabolism Nucleoside Nucleotide and nucleic acid metabolism Localization Protein metabolism Establishment of localization Cellular macromolecule metabolism Cellular protein metabolism Transport Biosynthesis Cellular biosynthesis Development Generation of precursor metabolites and energy Nucleobase Molecular function Binding Catalytic activity Hydrolase activity Protein binding Oxidoreductase activity ATPase activity Acting on acid anhydrides Nucleic acid binding Transferase activity Nucleotide binding Purine nucleotide binding Transporter activity Ion binding Metal ion binding Cation binding Adenyl nucleotide binding ATP binding Structural molecule activity Carrier activity In phosphorus-containing anhydrides

Tallya

%b

2,643 2,408 2,041 2,040 1,749 1,650 1,649 913 840 705 645 598 598 441 429 429 358 357 318 316

93 85 72 72 62 58 58 32 30 25 23 21 21 16 15 15 13 13 11 11

2,121 2,098 2,026 1,766 1,690 1,549 1,128 790 761 727 692 690 682 652 644 640 618 611 564 554

65 64 62 54 52 47 34 24 23 22 21 21 21 20 20 20 19 19 17 17

2,484 1,950 1,888 1,517 1,442 1,044 886 872 865 669 599 573 505 505 423 421 413 412 377 375

70 55 53 43 41 29 25 25 24 19 17 16 14 14 12 12 12 12 11 11

a Number of contigs whose annotation contained the speciÞc GO term. b Derived by dividing the tally value by the total number of contigs thatcontained GO terms from the speciÞc ontology.

screwworm combined data set (Table 3). The GO terms for D. melanogaster release 11.0 (14 June 2006), A. aegypti release 5.0 (25 April 2008), and A. mellifera

September 2009 Table 3.

GUERRERO ET AL.: SCREWWORM EST DATABASE

1113

Intraspecies comparison of top 10 GO terms

D. melanogastera Cellular component Cell (1) Organelle (3) Protein complex (8) Cellular component unknown (47) Membrane-enclosed lumen (25) Extracellular region (28) Envelope (17) Synapse (162) Extracellular matrix (70) Obsolete cellular component (363) Biological process Physiological process (1) Cellular process (2) Development (18) Regulation of biological process (24) Response to stimulus (38) Biological process unknown (340) Reproduction (54) Growth (173) Interaction between organisms (493) Pigmentation (716) Molecular function Binding (1) Catalytic activity (2) Molecular function unknown (81) Signal transducer activity (49) Transcription regulator activity (35) Structural molecule activity (18) Transporter activity (12) Enzyme regulator activity (63) Obsolete molecular function (37) Motor activity (102)

Ae. aegyptia

A. melliferaa

Cell (1) Cell part (NFb) Organelle (3) Organelle part (NF) Macromolecular complex (NF) Membrane-enclosed lumen (25) Envelope (17) Extracellular region (28) Extracellular region part (NF) Synapse (162)

Cell (1) Cell part (NF) Organelle (3) Organelle part (NF) Macromolecular complex (NF) Membrane-enclosed lumen (25) Envelope (17) Extracellular region (28) Extracellular region part (NF) Synapse (162)

Cellular process (2) Metabolic process (4c) Multicellular organismal process (NF) Developmental process (NF) Biological regulation (NF) Reproduction (54) Growth (173) Gene expression (NF) Response to stimulus (38) Localization (10)

Cellular process (2) Metabolic process (4c) Multicellular organismal process (NF) Developmental process (NF) Biological regulation (NF) Reproduction (54) Growth (173) Localization (10) Response to stimulus (38) Gene expression (NF)

Binding (1) Catalytic activity (2) Structural molecule activity (18) Transporter activity (12) Transcription regulator activity (35) Enzyme regulator activity (63) Molecular transducer activity (NF) Translation regulator activity (54) Antioxidant activity (230) Motor activity (102)

Binding (1) Catalytic activity (2) Transporter activity (12) Structural molecule activity (18) Transcription regulator activity (35) Molecular transducer activity (NF) Enzyme regulator activity (63) Translation regulator activity (54) Motor activity (102) Antioxidant activity (230)

a Values in parentheses denote the rank of the speciÞc GO term in the screwworm combined data set tally of GO terms. Source of data is The Gene Index Project (http://compbio.dfci.harvard.edu/tgi/cgi-bin/tgi/). b GO term not found in screwworm Combined data set annotation. c Noted as “metabolism” in the screwworm combined data set tally of top 100 GO terms (see Supplemental Data).

release 5.0 (3 May 2008) were obtained from The Gene Index Project (http://compbio.dfci.harvard. edu/tgi/cgi-bin/tgi/). The identity and order of the top 10 Cellular Component GO Terms of A. aegypti and A. mellifera are identical, whereas the identity of the top 10 biological process and molecular function GO terms are identical in the two species, although some terms have slightly different rankings in the top 10 list. Surprisingly, the top 10 GO term list of D. melanogaster has several differences compared with Ae. aegypti and A. mellifera. Protein complex, cellular component unknown, extracellular matrix, and obsolete cellular component are D. melanogaster cellular component GO terms that are not in the top 10 list of the other two insects. Physiological process, regulation of biological process, biological process unknown, interaction between organisms, and pigmentation are D. melanogaster biological process GO terms that are not in the top 10 list of the other two insects. Molecular function unknown, signal transducer activity, and obsolete molecular function are D. melanogaster molecular function GO terms that are not in the top 10 list of the other two insects. Table 3 shows where each of the top 10 GO terms from D. melanogaster, Ae. aegypti, and A. mellifera would rank in the listing of the GO terms from the screwworm combined data set annotations. Many of the most common GO terms of Table

3 have greatly different rankings in the screwworm data set. For example, the biological process GO term reproduction ranks in the top seven for all three species of Table 3, whereas it ranks 56 in the screwworm data set. The molecular function GO term transcriptional regulator activity ranks Þve in all three species of Table 3, yet ranks 35 in the screwworm data set. Other examples can be seen in Table 3. In fact, only the top two or three ranking GO Terms from Table 3 have similar rankings in the screwworm data set. These major differences probably reßect that two of the prime sources of the screwworm data set are assembled contigs and unassembled singletons derived from two normalized cDNA libraries prepared from egg and larval life stages. These egg and larval contigs probably served as seed sequences to the collection of 454-derived sequences, allowing more contigs to be assembled from egg and larval 454 sequences, skewing the GO terms and rankings toward ontological classiÞcations from egg- and larval-expressed transcripts. The assembled contigs from The Gene Index Project that were used to obtain the D. melanogaster, Ae. aegypti, and A. mellifera GO terms were almost all derived from adult cDNAs. We also used the GO annotations to estimate how comprehensive the screwworm combined data set might be as compared with the well-annotated D. melanogaster

1114

JOURNAL OF MEDICAL ENTOMOLOGY

Vol. 46, no. 5

80

70

60

50

%

Female Adult Male Adult Egg Larvae

40

30

20

10

0

Fig. 1. GO terms overrepresented in the C. hominivorax sequence databases compared with the D. melanogaster genome annotation GO terms. The top 20 overrepresented GO terms of the female C. hominivorax 454 contig database were used to examine the differences between the contig databases from the adult female, adult male, egg, and larvae. The % (y-axis) denotes the percentage of contigs in the database which contained the speciÞc GO term noted on the x-axis.

genome. The GO consortium shows a total of 15,601, 8,424, and 2,230 biological process, molecular function, and cellular component GO Terms as of 30 October 2008 (http://www.geneontology.org/GO.downloads. ontology.shtml). The FlyBase 2008_09 D. melanogaster release was used to determine that the most recent annotation of the D. melanogaster genome makes use of 2,401, 1,759, and 532 of the total biological process, molecular function, and cellular component GO terms, respectively. Assuming the screwworm and D. melanogaster have a similar complement of genes and gene number, comparing the GO term use of our screwworm database to that of the D. melanogaster genome annotations could give an indication of what percentage of the screwworm gene count is present in our combined database. Our database annotations of the 6,076 contigs uses 1,658, 1,054, and 453 of the total biological process, molecular function, and cellular component GO terms, respectively, yielding respective percentages of the total GO terms used as 69, 60, and 85%. Although we would not contend the screwworm combined data set contains these percentages of the total gene count expected from the genome, these data indicate that our data set contains a signiÞcant fraction of the expressed genes of the screwworm. As noted, direct GO term comparisons of the D. melanogaster and screwworm combined data set are likely affected by the signiÞcant number of transcripts in the screwworm data set that are derived from embryos and larvae. To evaluate how the different life stages of C. hominivorax varied from each other and from D. melano-

gaster, we used the GO annotations associated with the four developmental stage sequence data sets and D. melanogaster. The number of times each GO term occurred in the D. melanogaster, combined egg, combined larval, combined adult male, and combined adult female data sets was tallied. The tallies were compared between D. melanogaster and each of the combined egg, combined larval, combined adult male, and combined adult female data sets to identify GO terms that were more abundant in each C. hominivorax data set compared with D. melanogaster. The entire analytical results, including P values from Fisher exact test, are presented in Supplemental Data. Figure 1 depicts the 20 most overrepresented molecular function GO terms from the combined adult female data set and noting their occurrence in the other three C. hominivorax data sets. The results from the four data sets are very consistent for 15- of the 20-most overrepresented molecular function GO terms, indicating the four life stages vary from D. melanogaster in a similar manner for these molecular functions. However, catalytic activity, cation binding, transmembrane transporter activity, and molecular transducer activity were not overrepresented in the combined egg data set. In addition, catalytic activity and transferase activity were not overrepresented in the combined larval data set. Thus, from this type of GO term analysis, the egg stage of C. hominivorax seems to differ least from D. melanogaster. A major objective of this DNA sequence acquisition project was to identify screwworm orthologs to insect sex determination pathway genes. By a search of the

September 2009

GUERRERO ET AL.: SCREWWORM EST DATABASE

1115

Fig. 2. Multiple sequence alignment of sex-lethal orthologs. ClustalX was used to produce a multiple sequence alignment from C. hominivorax (screwworm: contig 17682), L. cuprina (Lucilia; GenBank accession no. AAK20025.1), and C. rufifacies (Chrysomya; GenBank accession no. AAN74807.1). Asterisks (*) indicate amino acid identity with the screwworm amino acid sequence deduced from the contig 17682 nucleotide sequence manually adjusted as described in the text. An “X” was placed at the three amino acid positions where manual sequence adjustment was necessary. A hyphen (-) indicates a gap was introduced to optimize alignment with the screwworm sequence.

combined data set annotations, we identiÞed several sequences that had similarity to UniProt entries that are involved in arthropod sex determination pathways. Some of the best characterized sex determination genes from D. melanogaster are transformer, doublesex, and sex-lethal (Saccone et al. 2002), and we searched the gene names list from the combined data set contigs (see Supplemental Data) for these speciÞc names. Of particular interest is contig 17682 identiÞed as sexlethal and possessing high similarity to the sex-lethal ortholog in Chrysomya rufifacies (Macquart), a member of the blowßy family (Mu¨ ller-Holtkamp 1995). A directed search of GenBank also found a sex-lethal ortholog from Lucilia cuprina (Wiedemann) and a ClustalX alignment (Thompson et al. 1997) of the translated screwworm sequence and the L. cuprina and C. rufifacies sequences demonstrated the region of high amino acid identity extends over most of the protein sequence (Fig. 2). Because contig 17682 is from an assembly of ESTs that originated from singleTable 4.

pass sequencing, it is possible there are errors in the contigÕs sequence that can affect the reading frame of the sequence. In deriving the alignment for Fig. 1, we manually adjusted the sequence of contig 17682 to optimize the alignment with the other two blowßy sequences. This was done by reverse complementing and translating contig 17682 in frame two followed by deleting the resulting nucleotides 968 (T), 1003 (A), and 1017 (T). Using the GO term annotations (see Supplemental Data), we identiÞed entries in the screwworm combined data set that had biological process GO term classiÞcations containing male sex determination (GO:0030238), somatic sex determination (GO:0018993), or female sex differentiation (GO:0046660), and these are listed in Table 4. Contig 59623 has moderate similarity (e value ⫽ 1.78e⫺6) to a testis-determining gene responsible for male development in mice sry (Gubbay et al. 1990; GenBank accession no. Q05738). Contig 9469 has high similarity (e value ⫽ 3.21e⫺63) to a D. melanogaster pro-

Combined data set contigs with biological process GO terms related to sex determination and female sex differentiation Contig

Male sex determination (GO:0030238) 59623 Somatic sex determination (GO:0018993) 9469 Female aex differentiation (GO:0046660) 11235 7414 9796 6976

Length bp

Top BlastX hit ID

Accession no

e value

1072

Sex determination region of Y chromosome

Q05738

1.78e⫺6

1503

Achaete-Scute complex protein T4

P10084

3.21e⫺63

1037 1718 1297 1044

Hyperplastics discs Hyperplastics discs Hyperplastics discs Hyperplastics discs

P51592 P51592 P51592 P51592

3.2e⫺130 1.9e⫺123 6.5e⫺101 1.0e⫺52

1116

JOURNAL OF MEDICAL ENTOMOLOGY

tein Achaete-Scute complex protein T4 that interacts with the D. melanogaster sex-determination protein sexlethal (Torres and Sanchez 1989; GenBank accession no. P10084). Contigs 11235, 7414, 9796, and 6976 have high similarity to hyperplastics discs, a gene involved in germ cell development (MansÞeld et al. 1994). Unfortunately, sequences with similarity to doublesex or transformer were not found in our database and isolation of these orthologs will require either deeper sequencing of our cDNAs or use of PCR in conjunction with degenerate primers designed from orthologs from other species. Acknowledgments We thank Steve Skoda and Pamela Phillips for collection of screwworm material, Susan Tweedie for help with our questions on FlyBase data, David Hill for help from the GO ConsortiumÕs help desk, and the sequencing core personnel at the J. Craig Venter Institute Joint Technology Center.

References Cited Ashburner, M., C. A. Ball, J. A. Blake, D. Botstein, H. Butler, J. M. Cherry, A. P. Davis, K. Dolinski, J. T. Eppig, M. A. Harris, et al. 2000. Gene ontology: tool for the uniÞcation of biology. The Gene Ontology consortium. Nat. Genet. 25: 25Ð29. Bairoch, A., R. Apweiler, C. H. Wu, W. C. Barker, B. Boeckmann, S. Ferro, E. Gasteiger, H. Huang, R. Lopez, M. Magrane, et al. 2005. The universal protein resource (UniProt). Nucleic Acids Res. 33: D154ÐD159. Chaudhury, M. F., and S. R. Skoda. 2007. A cellulose Þberbased diet for screwworm (Diptera: Calliphoridae) larvae. J. Econ. Entomol. 100: 241Ð245. Dennis, G., Jr., B. T. Sherman, D. A. Hosack, J. Yang, W. Gao, H. C. Lane, and R. A. Lempicki. 2003. DAVID: Database for Annotation, Visualization, and Integrated Discovery. Genome Biol. 4: R60. Fu, G., K. C. Condon, M. J. Epton, P. Gong, L. Jin, G. C. Condon, N. I. Morrison, T. H. Dafa’alla, and L. Alphey. 2007. Female-speciÞc insect lethality engineered using alternative splicing. Nat. Biotechnol. 25: 353Ð357. Goodwin, J. W. 1974. An estimation of consumer surplus via screwworm eradication in the Southwest. Oklahoma State Department of Agriculture Report, Stillwater, OK. Graham, O. H., and J. L. Hourrigan. 1977. Eradication programs for the arthropod parasites of livestock. J. Med. Entomol. 13: 629Ð658. Gubbay, J., J. Collignon, P. Koopman, B. Capel, A. Economou, A. Mu¨ nsterberg, N. Vivian, P. Goodfellow, and R. Lovell-Badge. 1990. A gene mapping to the sex-deter-

Vol. 46, no. 5

mining region of the mouse Y chromosome is a member of a novel family of embryonically expressed genes. Nature (Lond.) 346: 245Ð250. Mansfield, E., E. Hersperger, J. Biggs, and A. Shearn. 1994. Genetic and molecular analysis of hyperplastic discs, a gene whose product is required for regulation of cell proliferation in Drosophila melanogaster imaginal discs and germ cells. Dev. Biol. 165: 507Ð526. Margulies, M., M. Egholm, W. E. Altman, S. Attiya, J. S. Bader, L. A. Bemben, J. Berka, M. S. Braverman, Y.-J. Chen, Z. Chen, et al. 2005. Genome sequencing in microfabricated high-density picolitre reactors. Nature (Lond.) 437: 376 Ð380. Mu¨ ller-Holtkamp, F. 1995. The Sex-Lethal gene homologue in Chrysomya rufifacies is highly conserved in sequence and exon-intron organization. J. Mol. Evol. 41: 467Ð 477. Ogata, H., S. Goto, K. Sato, W. Fujibuchi, H. Bono, and M. Kanehisa. 1999. KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Res. 27: 29 Ð34. Roe, B. 2004. Shotgun library construction for DNA sequencing, pp. 171Ð187. In S. Zhao and M. Stodolsky [eds.], Methods in molecular biology, vol. 255, bacterial artiÞcial chromosomes, vol. 1: library construction, physical mapping, and sequencing. Humana Press Inc, Totowa, NJ. Saccone, G., A. Pane, and L. C. Polito. 2002. Sex determination in ßies, fruitßies and butterßies. Genetica 116: 15Ð23. Sambrook, J., E. F. Fritsch, and T. Maniatis. 1989. Molecular cloning: a laboratory manual, 2nd ed. Cold Spring Harbor Laboratory, Cold Spring Harbor, NY. The UniProt Consortium. 2008. The universal protein resource (UniProt). Nucleic Acids Res. 36: D190 ÐD195. Thomas, D. D., C. A. Donnelly, R. J. Wood, and L. S. Alphey. 2000. Insect population control using a dominant, repressible, lethal genetic system. Science (Wash., D.C.) 287: 2474 Ð2476. Thompson, J. D., T. J. Gibson, F. Plewniak, F. Jeanmougin, and D. G. Higgins. 1997. The ClustalX windows interface: ßexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res. 25: 4876 Ð 4882. Torres, M., and L. Sanchez. 1989. The scute (T4) gene acts as a numerator element of the X: a signal that determines the state of activity of sex-lethal in Drosophila. EMBO J. 8: 3079 Ð3086. Vargas-Teran, M. 1991. The new world screwworm in Mexico and Central America. In R.D.S. Branckaert [ed.], New World screwworm response to an emergency. World animal review, special issue. Food and Agriculture Organization, Rome, Italy. Received 5 January 2009; accepted 1 June 2009.

Lihat lebih banyak...

Comentarios

Copyright © 2017 DATOSPDF Inc.