The Phylogenetic Analysis of Variable-Length Sequence Data: Elongation Factor-1α Introns in European Populations of the Parasitoid Wasp Genus Pauesia (Hymenoptera: Braconidae: Aphidiinae)


Ana Sanchis,*† Jose M. Michelena,* Amparo Latorre,* Donald L. J. Quicke,† Ulf Gärdenfors,‡ and Robert Belshaw†

*Institut Cavanilles de Biodiversitat i Biología Evolutiva, Universitat de València, Valencia, Spain; †Department of Biology, Imperial College at Silwood Park, Ascot, Berks, England; and ‡Department of Conservation Biology, Uppsala, Sweden

Elongation factor–1α (EF-1α) is a highly conserved nuclear coding gene that can be used to investigate recent divergences due to the presence of rapidly evolving introns. However, a universal feature of intron sequences is that even closely related species exhibit insertion and deletion events, which cause variation in the lengths of the sequences. Indels are frequently rich in evolutionary information, but most investigators ignore sites that fall within these variable regions, largely because the analytical tools and theory are not well developed. We examined this problem in the taxonomically problematic parasitoid wasp genus Pauesia (Hymenoptera: Braconidae: Aphidiinae) using congruence as a criterion for assessing a range of methods for aligning such variable-length EF-1α intron sequences. These methods included distance- and parsimony-based multiple-alignment programs (CLUSTAL W and MALIGN), direct optimization (POY), and two "by eye" alignment strategies. Furthermore, with one method (CLUSTAL W) we explored in detail the robustness of results to changes in the gap cost parameters. Phenetic-based alignments ("by eye" and CLUSTAL W) appeared, under our criterion, to perform as well as more readily defensible, but computationally more demanding, methods. In general, all of our alignment and tree-building strategies recovered the same basic topological structure, which means that an underlying phylogenetic signal remained regardless of the strategy chosen. However, several relationships between clades were sensitive both to alignment and to tree-building protocol. Further alignments, considering only sequences belonging to the same group, allowed us to infer a range of phylogenetic relationships that were highly robust to tree-building protocol. By comparing these topologies with those obtained by varying the CLUSTAL parameters, we generated the distribution area of congruence and taxonomic compatibility. Finally, we present the first robust estimate of the European Pauesia phylogeny by using two EF-1α introns and 38 taxa (plus 3 outgroups). This estimate conflicts markedly with the traditional subgeneric classification. We recommend that this classification be abandoned, and we propose a series of monophyletic species groups.

Key words: phylogenetics, variable-length sequences, alignment strategies, tree-building methods, Aphidiinae, Pauesia, EF-1α introns, elongation factor–1α.

Address for correspondence and reprints: Amparo Latorre, Institut Cavanilles de Biodiversitat i Biologia Evolutiva, Genetica Evolutiva, Universitat de València, Edificio Institutos Investigación, Campos Paterna, Apdo. Correos 22085, 46071 Valencia, Spain. E-mail: [email protected].

Mol. Biol. Evol. 18(6):1117–1131. 2001
© 2001 by the Society for Molecular Biology and Evolution. ISSN: 0737-4038

Downloaded from http://mbe.oxfordjournals.org/ by guest on June 2, 2013

Introduction

A major problem in molecular phylogenetics is the establishment of base pair homology among variable-length sequences, for example, in noncoding regions such as ribosomal genes and introns. Typically, base positions are aligned by the insertion of gaps, representing insertion or deletion events (indels), prior to tree-building. Several computer programs employing different algorithms are available for generating such multiple alignments, and Morrison and Ellis (1997) demonstrated that the choice of program could have a greater effect on the estimated phylogeny than the choice of tree-building method. As either a complementary or an alternative strategy, many authors simply exclude regions they consider difficult to align, usually on the grounds that they are unlikely to be phylogenetically informative and will merely introduce noise (Olsen and Woese 1993). This inevitably introduces an unwelcome element of subjectivity and may reflect an undue concern over the danger of noise. Wenzel and Siddall (1999) demonstrated that levels of noise, such as that caused by completely saturated third codon positions, are unlikely to have a significantly detrimental effect on the phylogenetic signal from elsewhere in the gene.

Although secondary structure can be used to improve multiple alignments by identifying homologous regions (Kjer 1995), its estimation is problematic, and many types of sequences do not produce such structure. A further problem with multiple alignments is how to treat gaps, which are informative evolutionary events, but most distance and maximum likelihood analyses would treat them as "missing data." Despite this, most discussions of molecular phylogenetic methods (e.g., Swofford et al. 1996) do not address these questions, and few studies explore the sensitivity of their results to alignment methods as thoroughly as their sensitivity to tree-building algorithms (but see Giribet and Wheeler 1999; Maddison, Baker, and Ober 1999).

Recently, a direct optimization approach (Wheeler 1996) to such data has become available. This avoids the conceptually problematic separation of alignment and tree-building and uses parsimony to infer phylogenetic relationships directly from variable-length sequences.

In this study, we compare methods of analyzing variable-length sequence data, in the present case, introns in the elongation factor–1α gene from closely related species of the parasitoid wasp genus Pauesia (Hymenoptera: Braconidae: Aphidiinae). Elongation factor–1α (EF-1α) is a nuclear coding gene involved in the GTP-dependent binding of charged tRNAs to the acceptor site of the ribosome during translation. This gene has been characterized in several animal species (Brands et al. 1986; Lenstra et al. 1986; Rao and Slobin 1986; Roth et al. 1987; Hovemann et al. 1988; Walldorf and Hovemann 1990; Danforth and Ji 1998). It has been used to infer evolutionary relationships among early eukaryotes (Hasegawa et al. 1993; Kamaishi et al. 1996) and also within insect taxa at different levels (Friedlander, Regier, and Mitter 1992; Brower and DeSalle 1994; Friedlander 1994; Belshaw and Quicke 1997; Mitchell et al. 1997; Regier and Shultz 1997). However, highly conserved protein-coding nuclear genes may also be phylogenetically informative among more recent divergences if they contain more rapidly evolving introns (Clark, Leicht, and Muse 1996; Kelchner and Clark 1997; Liss et al. 1997; Belshaw et al. 1999; Fabry, Köhler, and Coleman 1999; Oakley and Phillips 1999). Sequencing across introns from conserved primer sites in the adjacent exon (exon primed intron crossing [EPIC]) allows these variable regions to be targeted and provides an efficient method of looking at closely related taxa.

Assessing different methods of analyzing variable-length sequences is not easy. These methods optimize different criteria, and all are dependent on both the parameters chosen (especially the cost of gaps as opposed to substitutions) and the computational effort applied to optimization. Morrison and Ellis (1997) compared alignments with a preferred alignment based on secondary structure, Wheeler (1995) used congruence with existing taxonomy, and Giribet and Wheeler (1999) used congruence with morphological characters (in incongruence length difference [ILD] tests). In the present study, it was not possible to use any of these methods. First, we were suspicious of the reliability of many morphological characters, and our preliminary results were clearly incongruent with the traditional classification of the genus Pauesia. Second, we were unable to use measures of character congruence such as the ILD test because the program POY does not work via production of a multiple alignment. Third, these noncoding sequences do not give rise to a secondary structure. Therefore, we compared different alignment methods using two other measures of congruence: (1) congruence between the exon characters (which have no alignment ambiguity) and trees produced using the different alignment methods on the introns alone, and (2) congruence between pairs of trees produced using the different alignment methods on the two introns. We are assuming that the exon and the two introns have had the same phylogenetic history and hence should be fully congruent with each other. Incongruence will result from several sources of homoplasy, one of which will be the introduction of errors during alignment. Hence, the more congruent the results given with one alignment method, the more accurate they should be. Additionally, we thoroughly explored the phylogenetic consequences of changes in the parameters of one commonly used alignment method, the program CLUSTAL W, by creating and analyzing 72 alignments of the whole data set (exon plus introns). For this, we employed different measures for assessing the different sets of parameters (congruence between different tree-building protocols and robustness of each estimate by performing bootstrap tests).

Taxonomic Background

Recent advances in Aphidiinae (Hymenoptera: Braconidae) systematics pose questions about the taxonomic and evolutionary status of many populations. It has been proposed that sympatric population splitting as a consequence of host specialization has led to the formation of rapidly evolving groups of closely related species (Tremblay and Pennacchio 1988). The genus Pauesia Quilis, 1931, is a good group for testing this phenomenon. The genus has 55 species recognized at present (Medvedev 1995) and is highly specialized in its host range, with all the species restricted to Lachninae aphids (Homoptera: Aphididae) mostly on Pinaceae (Stary 1960, 1966, 1970, 1976, 1979). The genus is taxonomically problematic: the phylogenetic relationships between species are unknown and, in practice, it is often impossible to determine species boundaries using morphology (many specimens cannot be ascribed confidently to described species). Although three subgenera are currently recognized (Pauesiella, Pauesia s.s., and Paraphidius), their monophyly has not been convincingly demonstrated.

Within the subgenus Paraphidius, we can recognize one species group, here called Pauesia species group jezoensis, that in western Europe includes such described species as Pauesia jezoensis (= Paraphidius piceaecollis Stary, 1960), Pauesia pinicollis, and Pauesia cupressobii. The North American species Pauesia ahtanumensis apparently also belongs to this group. Another species group within Paraphidius, here called the Pauesia species group pini, includes published species like Pauesia pini, Pauesia silvestris, Pauesia silana, Pauesia juniperorum, Pauesia similis, and Pauesia alpina, but also the unpublished Pauesia sp. A (on Larix). The morphologically distinctive species Pauesia infulata is also included in the subgenus Paraphidius, but its phylogenetic affinities have not been established.

Within the subgenus Pauesia s.s., the species Pauesia unilachni is morphologically distinctive and has a unique host range (Schizolachnus spp.). In Europe, another two species, Pauesia laricis and Pauesia picta, are assigned to Pauesia s.s. but, although the extreme morphological characters between them are clearly distinctive, intermediate forms appear to occur in nature. Thus, we assign these two species to Pauesia species group laricis. Morphologically, there seems to be a third unpublished taxon belonging to this group, here referred to as Pauesia sp. B (on Pinus nigra nigra). To elucidate the phylogenetic relationship between the three current subgenera, we also included material of one unpublished species from Syria (Pauesia sp. C), which we placed in the subgenus Pauesiella because of its spatulate ovipositor sheaths (Sedlag and Stary 1980).

Phylogenetic Inferences Based on Variable-Length EF-1α Introns

Materials and Methods

Taxon Sampling

Parasitized aphids were collected from 20 localities, primarily in Spain (by A.S. and J.M.M.) and Sweden (by U.G.). Individual wasps were reared and placed alive into 100% ethanol. Representatives of two other closely related genera, Aphidius colemani Viereck, 1912, and Pseudopauesia prunicola Halme, 1986, were used as outgroups. The 38 specimens of Pauesia included in this study provided a good representation of the genus, including 13 of the 22 described European species. We analyzed single individuals, and when multiple individuals of the same putative species were studied, each came from a different locality (details in table 1). Specimens were identified using the keys for the group (Stary 1960, 1966, 1970, 1976, 1979; Medvedev 1995). Two EF-1α introns, separated by approximately 300 bp, were then sequenced in individual wasps. Intron 1 varied in length between 172 and 232 bp, and intron 2 was between 103 and 244 bp. In addition to the introns, we sequenced adjacent exon regions totaling 227 bp in length (intron lengths and GenBank accession numbers are given in table 1).

Table 1
Pauesia Specimens Analyzed in the Study of EF-1α Introns

Taxon | Aphid Host | Plant | Locality | First/Second Intron Lengths (EMBL/GenBank/DDBJ accession nos.)

Genus Pauesia Quilis, 1931
Subgenus Paraphidius Stary, 1960
P. infulata (Haliday, 1834) | Cinara costata | Picea abies | Sweden, Västergötland, Mariestad, Grangärde | 212/206 (AJ289924/AJ402014)
Pauesia sg. jezoensis
P. cupressobii (Stary, 1960) 1 | Cinara sp. | Juniperus oxycedrus | Spain, Tuejar | 205/210 (AJ289891/AJ401982)
P. cupressobii 2 | Cinara sp. | Juniperus communis | Spain, Pto. de San Miguel | 203/218 (AJ289892/AJ401983)
P. cupressobii 3 | Cinara juniperi | J. communis | Sweden, Skåne, Hörröd, Rallaté | 207/219 (AJ289893/AJ401984)
P. cupressobii 4 [a] | C. juniperi | J. communis | Sweden, Västergötland, Mariestad, Grangärde | 173/200 (AJ289918/AJ402009)
P. pinicollis (Stary, 1960) 1 | Cinara hyperophila | Pinus sylvestris | Sweden, Skåne, Maglehen, Svärjareboden | 206/203 (AJ289894/AJ401985)
P. pinicollis 2 | C. pinea | P. sylvestris | Sweden, Skåne, Forsakar | 206/203 (AJ289895/AJ401986)
P. pinicollis 3 | Cinara sp. | P. sylvestris | Spain, Arties | 204/203 (AJ289896/AJ401897)
P. jezoensis Watanabe, 1941 1 | Cinara piceicola | Picea abies | Sweden, Skåne, Maglehen, Svärjareboden | 206/218 (AJ289897/AJ401988)
P. jezoensis 2 | Cinara pilicornis | P. abies | Sweden, Västergötland, Mariestad, Grangärde | 206/218 (AJ289898/AJ401989)
P. ahtanumensis Pike & Stary, 1996 | Cinara ponderosae | ? | United States, Pacific NW (leg. Stary) | 206/218 (AJ289908/AJ401999)
Pauesia sg. pini
P. similis Stary, 1966 1 | Cinara pruinosa | P. abies | Sweden, Uppland, Hässelby hage | 217/202 (AJ289909/AJ402000)
P. similis 2 | Cinara laricis | Larix sp. | Sweden, Skåne, Degeberga, Bökestorp | 217/202 (AJ289910/AJ402001)
P. similis 3 [a] | C. pruinosa | P. abies | Sweden, Skåne, Degeberga, Bökestorp | 216/202 (AJ289911/AJ402002)
P. similis 4 [a] | C. piceicola | P. abies | Germany (leg. Völkl) | 217/202 (AJ289912/AJ402003)
P. alpina Stary, 1966 c | Cinara korchiana | Larix kaempferi | Sweden, Skåne, Forsakar | 217/202 (AJ289913/AJ402004)
P. alpina c | C. korchiana | L. kaempferi | Sweden, Skåne, Forsakar |
P. pini Haliday, 1834 1 | C. pinea | P. sylvestris | Sweden, Skåne, Brösarp | 211/183 (AJ289915/AJ402006)
P. pini 2 | Cinara sp. | Pinus nigra nigra | Spain, Barracas | 211/181 (AJ289916/AJ402007)
P. pini 3 | Cinara sp. | P. sylvestris | Sweden, Skåne, Forsakar | 211/183 (AJ289914/AJ402005)
Pauesia sp. A 1 | Cinara cuneomaculata | Larix sp. | Sweden, Skåne, Maglehen | 172/157 (AJ289917/AJ402008)
Pauesia sp. A 2 | C. cuneomaculata | Larix sp. | Sweden, Skåne, Brösarp | 172/200 (AJ289919/AJ402009)
Pauesia sp. A 3 | C. cuneomaculata | Larix sp. | Sweden, Skåne, Brösarp | 172/199 (AJ289920/AJ402010)
P. juniperorum (Stary, 1960) 1 | C. juniperi | J. communis | Sweden, Skåne, Hörröd, Rallaté | 210/103 [b] (AJ289921/AJ402011)
P. juniperorum 2 | C. juniperi | J. communis | Spain, Alcalá de la Selva | 210/103 [b] (AJ289922/AJ402012)
P. silvestris (Stary, 1960) c | Cinara pini | P. sylvestris | Sweden, Skåne, Veberöd | 211/189 (AJ289923/AJ402013)
P. silvestris c | C. pini | P. sylvestris | Sweden, Skåne, Forsakar |
P. silvestris c | C. pini | P. sylvestris | Sweden, Skåne, Veberöd |
P. silana Tremblay, 1975 | Cinara maritimae | Pinus halepensis | Spain, Ahillas | 210/205 (AJ289925/AJ402015)
Subgenus Pauesia
P. unilachni (Gahan, 1927) | Schizolachnus pineti | P. sylvestris | Sweden, Småland, Mullsjö | 216/218 (AJ289906/AJ401997)
Pauesia sg. laricis
P. laricis (Haliday, 1834) | Cinara sp. | P. nigra nigra | Spain, Barracas | 230/235 (AJ289901/AJ401992)
P. picta (Haliday, 1834) 1 | Cinara sp. | Pinus pinaster | Spain, Ahillas | 98 [c]/231 (AJ289902/AJ401993)
P. picta 2 | Cinara sp. | P. sylvestris | Spain, Valdelinares | 113 [c]/232 (AJ289903/AJ401994)
P. picta 3 | Cinara sp. | Pinus nigra salzmannii | Spain, Fte. San Guillén | 99 [c]/235 (AJ289904/AJ401995)
P. picta 4 [a] | Cinara sp. | P. nigra nigra | Spain, Brotos | 225/235 (AJ289900/AJ401991)
P. picta/P. laricis [a] | C. pilicornis | P. abies | Denmark, Jutland, Vejle, Eskholt | 95 [c]/244 (AJ289899/AJ401990)
Pauesia sp. B | Cinara sp. | P. nigra nigra | Spain, Olba | 211/229 (AJ289905/AJ401996)
Subgenus Pauesiella Sedlag and Stary, 1980
Pauesia sp. C [a] | Cinara sp. | Cupressus sempervirens | Syria, Qala at Al Hosn (leg. Fowler & Shaw) | 217/243 (AJ289907/AJ401998)
Outgroups
Aphidius colemani Viereck, 1912 1 | ? | ? | Spain |
A. colemani 2 | ? | ? | Spain |
Pseudopauesia prunicola Halme, 1986 | ? | ? | Germany (leg. Völkl) |

NOTE: Labels 1–4 are used to distinguish specimens of the same species; "c" indicates that the sequences of all the specimens were identical for each species, and we therefore included only one of them in the analyses.
[a] Taxa whose taxonomic status was initially misidentified or dubious. (1) The identity of P. cupressobii 4 is obscure; the molecular analysis indicates that the true identity should be Pauesia sp. A, but the morphology contradicts that result. Thus, there might have been a confusion of specimens at some stage. (2) For P. similis 3, a male dwarf specimen emerged together with a female P. jezoensis specimen, initially assigned to the latter species, but, e.g., the number of antennal segments indicates that it actually was a P. similis. (3) The identity of P. similis 4 should be mentioned. Völkl (personal communication) claims that "P. pini" specimens reared from C. pinea on Pinus can be reared on C. piceicola on Picea. Our analysis shows that P. similis 4 (reared from C. piceicola) is clearly distinct from P. pini. It remains, however, to settle whether this spruce and larch-living taxon also parasitizes aphids on Pinus. (4) Pauesia picta 4 was morphologically identified as P. picta. In contrast to other analyzed P. picta (and P. picta/P. laricis), this specimen does not have homopolymers in the first intron. If the true identity of this sample were P. laricis, the two species of the laricis group would appear to be distinctive, well-supported sister groups. (5) The P. picta/P. laricis specimen was a male that emerged from a lone mummy, initially identified to be a member of the group P. sg. jezoensis, but after molecular analysis, a reexamination of the mummy clearly showed it to belong to P. sg. laricis.
[b] Aberrant sequences.
[c] Sequences with homopolymers.

Laboratory Protocols

Total genomic DNA was isolated from individual wasps as described elsewhere (Sanchis et al. 2000) and resuspended in 10 µl of TE buffer (Tris HCl 10 mM, EDTA 1 mM [pH 8.0]). EF-1α was amplified by nested PCR. Primers used in the first reaction were 5′-AGA TGG CYA ARG GTT CCT TCA A-3′ (forward) (Belshaw and Quicke 1997) and 5′-ATG TGA GCA GTG TGG CAA TCC AA-3′ (reverse) ("INDNA35"; John Hobbs, Nucleic Acid Protein Service, University of British Columbia). PCR reactions were carried out in a Perkin-Elmer 2400 thermal cycler in a 50-µl volume containing 0.5 µl DNA extract, 1.25 U Taq polymerase (Amersham Pharmacia Biotech), 20 pmol of each primer, 10 nmol dNTPs (Amersham Pharmacia Biotech), and 5 µl buffer. PCR conditions consisted of (1) an initial denaturation for 2 min at 94°C, (2) 35 cycles of denaturation for 15 s at 96°C, annealing for 1 min at 45°C, and extension for 2 min at 72°C, and (3) a final extension for 7 min at 72°C.

Second PCR reactions were carried out with internal primers 5′-GAT GGC ACG GAG ACA ACA TG-3′ (forward) and 5′-CCA TTG CTG ATT TGT CCA GGG TGG-3′ (reverse), corresponding to positions 2741–2760 and 3735–3758, respectively, in the F2 copy of Drosophila melanogaster (Hovemann et al. 1988). These amplifications were carried out in a 100-µl volume containing 1 µl of the previous PCR reaction as a template, 2.50 U Taq polymerase, 40 pmol of each primer, 20 nmol dNTPs, and 10 µl buffer. PCR conditions consisted of (1) an initial denaturation for 2 min at 94°C, (2) 35 cycles of denaturation for 15 s at 96°C, annealing for 30 s at 55°C, and extension for 2 min at 72°C, and (3) a final extension for 7 min at 72°C.

The products were cut from the agarose gel, purified with the Sephaglas Band Prep kit (Pharmacia), and then sequenced directly on a PE/ABI 373 automated sequencer using ABI PRISM Dye Terminators (Perkin-Elmer). Two other internal primers were used for sequencing, 5′-ACA CCA GTT TCA ACA CGA CC-3′ (reverse) and 5′-ACG AAG CTC TCA CTG AAG CCG TTC C-3′ (forward).

Sequence Alignment Methods

We used five methods for aligning each intron separately (exons and outgroups excluded):

1. A multiple alignment was made using CLUSTAL W, version 1.5 (Thompson, Higgins, and Gibson 1994). This method produces an alignment from a series of progressive pairwise alignments between sequences and clusters of sequences. The pairwise alignments use the method of Needleman and Wunsch (1970), while the clustering order of progressive sequence alignment (Feng and Doolittle 1987) is determined from a neighbor-joining guide tree. For comparison with other methods, we tested values of 8 for the gap opening penalty and 0.5 for the gap extension penalty. These values represent a compromise between the more restrictive and the more permissive values proposed by some authors (Wheeler 1995; Morrison and Ellis 1997).

2. A typical "by eye" alignment was made by modification of a default CLUSTAL alignment under a compression length criterion. A minimum number of gaps were inserted manually in order to produce what looked like a reasonable alignment.

3. A second "by eye" alignment was made by modification of a default CLUSTAL alignment using a hierarchical approach to overcome the problem caused by variable regions, which can be unambiguously aligned within groups of taxa but not across all taxa. Such regions were aligned independently and also represented as binary characters in a separate data matrix. This was done in a NEXUS file by shifting blocks of bases and inserting gaps in the corresponding positions of the remaining taxa. No homoplasy was found in this second matrix, and a constraint tree was created from it. Under this constraint, the most parsimonious tree was then found as described below for all the sequences, with gaps treated as missing data. Where gaps occurred within the unambiguously aligned blocks, they were coded as additional single binary characters and included in the analysis (note that the constraint tree was written manually and taxa sharing regions of unambiguous alignment were treated as monophyletic, which cannot be demonstrated in an unrooted analysis). Regions that could not be unambiguously aligned even within groups were excluded from the analysis.

4. A multiple alignment was made using MALIGN, version 2.7, parallel version 1.5 (Wheeler and Gladstein 1994). In contrast to CLUSTAL, this method utilizes parsimony as the optimality criterion. The gap cost was set to twice the substitution cost, and 100 random additions were used with limited alignment swapping (commands "quick" and "alignswap"; smaller numbers of random additions, with more exhaustive swapping, failed to find shorter alignments).

5. A direct optimization procedure (see Wheeler 1996) implemented in POY native code, version 2.0, by David Gladstein and Ward Wheeler (obtained from anonymous ftp.amnh.org/pub/wheeler/programs/poy) was used. We performed three analyses with gap costs set at one, two, and three times that of a substitution. For each gap cost, we performed 100 random additions holding a maximum of 100 trees (branch swapping with both SPR and TBR).

All of the methods except POY produced a multiple alignment from which we built MP trees using PAUP* (Swofford 1998) (equal weighting and with 500 random additions, branch swapping using TBR on a maximum of 10 trees up to a limit of 1,000 trees). Additionally, for all methods except POY and the hierarchic "by eye" method, we treated gaps both as missing data and as fifth bases. No regions were excised except in method 3. All alignments are available on http://www.bio.ic.ac.uk/research/data/pauesia. The 27 alignment positions that appeared heterozygous were coded as missing data. No heterozygosity for length was present except for one unreadable region (around 150 bp) in some individuals (P. picta 1, 2, 3 and P. picta/P. laricis), which might be a sequencing artifact caused by homopolymers.

Comparison of Alignment Methods

Two measures of congruence were used to compare the five alignment methods:

1. Congruence between the 227 exon characters and trees produced using the different alignment methods on the introns alone. For each of the two introns, we built trees using the five alignments described above. Also using PAUP*, we then measured the lengths of these trees according to the exon characters only, which were treated as unordered characters (i.e., we counted the minimum number of changes required by mapping these characters onto the trees). The most congruent alignment methods are those that give the lowest values (i.e., require the fewest changes).

2. Comparison of the topological congruence (a measure of similarity between different topologies) between trees produced separately from the two introns using the different alignment methods. For this, we chose to use the partition metric of Penny and Hendy (1985), implemented as the symmetric distance in PAUP*, and give the modal value for ease of calculation.

Phylogenetic Analysis of Complete Data Set

After measuring the congruence of the different alignment methods, we combined in a series of data matrices the exon data plus the two introns aligned using the five methods. On the basis of the complete phylogenetic information, we reconstructed new maximum-


Constraint Analysis

To measure the support for groupings recovered in our analyses, we used the Kishino-Hasegawa (Kishino and Hasegawa 1989) and Templeton (1983) tests as implemented in the program package PHYLIP, version 3.53c. We also constrained one topology representing the monophyly of the traditional subgenus Paraphidius; we employed the previous tests in order to compare this topology with the optimal topologies obtained in our analyses.
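As a rough illustration of the paired-sites idea behind the Kishino-Hasegawa test, the sketch below takes per-site log-likelihoods for two competing trees and returns the difference in total log-likelihood, its standard error, and their ratio under the usual normal approximation. The function name and the toy values are hypothetical; this is not the PHYLIP code, and in practice the per-site values would come from an ML program.

```python
import math

def kishino_hasegawa(site_lnl_a, site_lnl_b):
    """Paired-sites comparison of two trees (illustrative sketch only).

    Returns (delta, standard_error, z), where delta is the difference in
    total log-likelihood (tree A minus tree B).
    """
    if len(site_lnl_a) != len(site_lnl_b):
        raise ValueError("both trees need one log-likelihood per site")
    n = len(site_lnl_a)
    if n < 2:
        raise ValueError("at least two sites are required")
    # Per-site differences in log-likelihood between the two trees.
    diffs = [a - b for a, b in zip(site_lnl_a, site_lnl_b)]
    delta = sum(diffs)
    mean = delta / n
    # Variance of the summed difference, estimated from site-to-site variation.
    variance = n / (n - 1) * sum((d - mean) ** 2 for d in diffs)
    se = math.sqrt(variance)
    z = delta / se if se > 0 else float("nan")
    return delta, se, z

# Hypothetical per-site log-likelihoods for an optimal and a constrained tree.
optimal = [-1.0, -2.0, -3.0, -4.0]
constrained = [-1.5, -2.0, -3.5, -4.0]
delta, se, z = kishino_hasegawa(optimal, constrained)
```

A difference that is small relative to its standard error (for example, |z| below about 1.96) would not allow the constrained topology to be rejected at the 5% level under this approximation.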

Table 2
Comparison of Different Alignment Methods by Mapping Exon Characters onto Resulting Trees (range shown) and by Topological Congruence of Trees from the Two Introns (modal value shown)

Method | Mapping Exon Characters [a], Intron 1 | Mapping Exon Characters [a], Intron 2 | Modal Value of Tree Difference Metric
CLUSTAL (GOP/GEP = 8/0.5) | 29–30 (18) [b] | 28–32 (1,000) [b] | 30 [b]
CLUSTAL (GOP/GEP = 8/0.5) | 30–35 (320) [c] | 32–36 (1,000) [c] | 26 [c]
By eye (standard) | 26–28 (30) [b] | 32 (4) [b] | 25 [b]
By eye (standard) | 26–35 (1,000) [c] | 28–33 (1,000) [c] | 25 [c]
By eye (hierarchic) | 27–28 (364) [b] | 27–31 (486) [b] | 17 [b]
MALIGN (gap cost = 2) | 32–34 (60) [b] | 30 (54) [b] | 26 [b]
MALIGN (gap cost = 2) | 25–33 (1,000) [c] | 28–31 (1,000) [c] | 24 [c]
POY gap = 1 | 31–37 (200) [b] | 31–36 (363) [b] | 28 [b]
POY gap = 2 | 31–35 (100) [b] | 32 (16) [b] | 22 [b]
POY gap = 3 | 33–37 (494) [b] | 34–41 (293) [b] | 30 [b]

[a] The numbers of MP trees are shown in parentheses.
[b] Results with gaps treated as missing data.
[c] Results with gaps treated as fifth bases.
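The tree difference metric in Table 2 is the partition metric (symmetric distance): the number of bipartitions of the taxa that occur in one unrooted tree but not in the other. The sketch below is a minimal illustration on toy trees written as nested tuples; the representation, function names, and taxon labels are assumptions made for the example and have nothing to do with how PAUP* stores trees.

```python
def leaf_set(tree):
    """Return the set of leaf labels of a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))

def splits(tree):
    """Collect the non-trivial bipartitions of a tree, ignoring the root.

    Each bipartition is stored as the side that does NOT contain a fixed
    reference leaf, so the result does not depend on where the tree is rooted.
    """
    leaves = leaf_set(tree)
    reference = min(leaves)
    found = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        clade = frozenset().union(*(visit(child) for child in node))
        side = leaves - clade if reference in clade else clade
        # Keep only splits with at least two taxa on each side.
        if 1 < len(side) < len(leaves) - 1:
            found.add(side)
        return clade

    visit(tree)
    return found

def partition_metric(tree_a, tree_b):
    """Number of bipartitions present in exactly one of the two trees."""
    if leaf_set(tree_a) != leaf_set(tree_b):
        raise ValueError("trees must contain the same taxa")
    return len(splits(tree_a) ^ splits(tree_b))

# Toy trees for five hypothetical taxa A-E.
tree_intron1 = ((("A", "B"), "C"), ("D", "E"))
tree_intron2 = ((("A", "C"), "B"), ("D", "E"))
distance = partition_metric(tree_intron1, tree_intron2)
```

Here the two toy trees share the split separating D and E from the rest and differ in one split each, so the distance is 2. When each alignment yields many equally parsimonious trees, the metric would be computed for every pair and summarized, for example by the modal value.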

Results

Comparison of Alignment Methods

The EF-1α introns had a high nucleotide bias with several runs of A's and/or T's (A+T = 72%). This made aligning the sequences by eye very difficult but, perhaps surprisingly, these alignments scored well in both measures of congruence (table 2). In fact, there was no striking difference among the different methods, and we found considerable variation in the performance of exon characters on different equally parsimonious trees produced by each method. Two of the specimens (P. juniperorum 1 and 2) had particularly aberrant second-intron sequences (table 1), so we repeated the analyses pruning these from the trees: the results obtained were similar (not shown). Thus, our analyses did not show any method as being markedly more congruent than the others.

Phylogenetic Analysis of Complete Data Set

Figure 1 shows the strict consensus of all MP trees built from the five multiple-alignment strategies. Regardless of alignment method, we found two main clades (1 and 2), which, in turn, contained six clades plus three branches represented by single specimens. The relationships among these nine branches were very sensitive to alignment protocol (see below). We focused our attention on phenetic-based criteria (less computationally demanding) for examining the sensitivity of our data to tree-building protocol. Method 2 obtained some of the best scores and produced a reasonable number of MP trees (table 2). However, in order to avoid the subjectivity of dealing with "by eye" alignments, we based the NJ, MP, and ML reconstructions on method 1 as well as method 2. Hence, we worked with 796 (after removing gap columns in the 918 initially aligned positions) and 756 aligned positions, re-

Downloaded from http://mbe.oxfordjournals.org/ by guest on June 2, 2013

parsimony (MP) trees in order to examine their node sensitivity. An additional method of examining the sequence alignment process is to choose a single alignment algorithm and vary its parameter values (e.g., Wheeler 1995; Morrison and Ellis 1997). In this way, it is possible to cover most of the phylogenetic variation concerning alignment (Wheeler 1995). Following Morrison and Ellis (1997), we chose the alignment program CLUSTAL W and examined in detail the sensitivity of the phylogenetic estimates to changes in some of the multiple-alignment parameters. We focused on the gap cost ratios, because they are recognized as one of the most important alignment parameters (Tyson 1992; Vingron and Waterman 1994; Wheeler 1995; Morrison and Ellis 1997). There is no way of determining these values a priori (Rinsma-Melchert 1993), and hence we followed Morrison and Ellis’ (1997) strategy of logarithmically varying the gap opening penalty (GOP) and the gap extension penalty (GEP). We varied the GOP (the cost of inserting a new gap into a sequence) from 0.5 to 64 times the cost of a substitution (log2 GOP 5 21, 0, 1, 2, 3, 4, 5, 6). The GEP (the cost of extending an existing gap) was varied from 0.031 to 8 times the cost of a substitution (log2 GEP 5 25, 24, 23, 22, 21, 0, 1, 2, 3). We tested all possible combinations of these values. In these analyses, we included the complete data set (introns plus exons) and the three outgroups. We also tested the sensitivity of the 72 generated alignments to tree-building protocol. For this purpose, three common methods were used: neighbor joining (NJ), MP, and maximum likelihood (ML). NJ analyses were performed using the MEGA program, version 1.01 (Kumar, Tamura, and Nei 1993); distances were estimated using the Jukes-Cantor correction (Jukes and Cantor 1969). MP analyses were performed using PAUP*. 
We employed unweighted parsimony and three weighting schemes: (1) intron versus exon positions weighted 2 to 1, (2) first plus second codon positions versus third plus intron positions weighted 2 to 1, and (3) first plus second codon positions versus third plus intron positions weighted 5 to 1. ML analyses were performed using DNAML (PHYLIP, version 3.53c; Felsenstein 1993). Bootstrapping (Efron 1982) was used to establish the relative support of each node in the trees. Gaps were treated as missing data in this set of analyses.

Phylogenetic Inferences Based on Variable-Length EF-1a Introns

spectively. In spite of their similar lengths, these two alignments differed in their phylogenetic estimates. Figure 2 shows the NJ tree obtained from the CLUSTAL alignment (method 1). The figure legend gives details of how this topology varied when other tree-building protocols were used on this alignment and when other alignment methods were employed. It is clear that the branches that were sensitive to tree-building protocol were the same as those that were sensitive to alignment method. Within clade 1, the relationship between P. unilachni, Pauesia sp. B, and Pauesia sp. C could not be established, nor could it be determined whether these species belonged to the clade ahtanumensis-jezoensispinicollis-cupressobii (AJPC) or to the clade laricis-picta (LP). Similarly, within clade 2, the relationships among the clade infulata-juniperorum (IJ), the clade sp. A, the clade silvestris-silana-pini (SSP), and the clade similis-alpina (SA) could not be inferred reliably. Sequence Gap Penalties The combination of the GOP/GEP values in CLUSTAL produced results similar to those produced by variation in both the alignment method and the tree-building protocol. The 216 trees showed the same basic structure (figs. 1 and 3) with the same rearrangements in the tips (fig. 2). Operating in this way, however, we were able to recover a greater variation in the relationships among sensitive clades than was recovered by trying several alignment methods (figs. 4 and 5). Realignment and Analyses Within Supported Clades After finding two main clades (robust both to alignment and to tree-building protocol), we realigned sepa-

rately the sequences belonging to each clade in order to find more accuracy. The new alignments were performed using the default parameters of CLUSTAL W. The tree-building—by NJ, ML, and unweighted MP— was performed as explained above. For clade 2, the resulting alignment contained 686 positions (with 113 variable and 71 informative). The three tree-building methods gave the same topology 2 (fig. 5A), but only the branch splitting the clade infulatajuniperorum (IJ) was bootstrap-supported (91% by NJ and 87% by MP); unweighted MP resulted in two trees of length 167 and a consistency index (CI) of 0.910. Regarding clade 1, the new alignment did not resolve the position of the three problematic branches. Excluding the more variable regions from the analyses did not lead to the complete resolution of these branches. Nevertheless, most of these trees reproduced the same topology 1 (fig. 4A) (data not shown), which would be quite in agreement with the current classification for the genus Pauesia (both P. unilachni and Pauesia sp. B are ascribed to the subgenus Pauesia s.s.). Hereinafter, we use the robust relationships established in topology 2 (fig. 5A) and the higher affinity of P. unilachni and Pauesia sp. B to Pauesia sg. laricis as criteria for testing the taxonomic congruence of the estimates obtained by varying the CLUSTAL parameters. Comparison of Sequence Gap Penalties and Tree-Building Protocols As expected, increasing the GOP and the GEP relative to the cost of a substitution decreased the resulting aligned sequence length, and there was a convergence for this data set to 750 nucleotide positions. We also observed a periodicity in the parameters defining both trees and alignments. This periodicity was dependent on the GOP values (fig. 6). Thus, the higher the GOP value, the higher the number of variable and informative positions and the higher the likelihood of the tree, but also the shorter the alignment length and the lower the CI. 
However, this correlation between parameters did not correspond with topology similarity, indicating that the GEP also influences the phylogenetic inference. Consequently, there are no apparent means of predicting cladogram similarity from the gap penalty values (figs. 4 and 5). Therefore, we decided to check which GOP/GEP combinations led to the most congruent and best bootstrap-supported trees. We compared the resulting topologies under the taxonomic criteria defined above. Congruence between tree-building protocols was measured as the total number of different topologies obtained by NJ, MP, and ML with each parameter combination (counting topologies for clade 1 plus those for clade 2; figs. 4 and 5, respectively). These numbers ranged from a minimum of three topologies to a maximum of six. The highest congruence (only three different topologies) was reached with the log2 GOP/log2 GEP combinations of −1/−5, 1/−5, −1/−4, 1/−4, −1/−3, 1/−3, 0/−2, and 1/−2 (corresponding to the alignments numbered 1, 3, 9, 11, 17, 19, 26, and 27 in fig. 6). All


FIG. 1. Strict consensus of all MP trees from different multiple-alignment methods. Numbers in parentheses indicate how many specimens are in each clade (table 1). Species compositions for each abbreviated clade are as follows: AJPC = Pauesia ahtanumensis, Pauesia jezoensis, Pauesia pinicollis, and Pauesia cupressobii; LP = Pauesia laricis and Pauesia picta; SA = Pauesia similis and Pauesia alpina; sp. A = Pauesia sp. A; SSP = Pauesia silana, Pauesia silvestris, and Pauesia pini; and IJ = Pauesia infulata and Pauesia juniperorum. The traditional species groups and subgenera are indicated on the right-hand side.


Sanchis et al.

these combinations reconstructed the same topologies: (1) topology 3 for clade 1 (fig. 4A), and (2) topology 2 (by ML and MP) and topology 5 (by NJ) for clade 2 (fig. 5A). These topologies matched most of the majority-rule consensus trees shown in figure 3 and exhibited some of the highest CIs. However, the parameters employed to compare phylogenetic trees (e.g., CI and likelihood [−ln L]) are usually correlated with the number of aligned positions, and hence it is not possible to use them as a criterion for choosing the best alignments. Also, in clade 1, these topologies were incongruent with the taxonomic criteria discussed above.

Regarding the robustness of the phylogenetic estimates, we performed bootstrap tests in the 72 NJ analyses (fig. 3). We found a bias in the distribution of topologies with bootstrap-supported nodes (figs. 4A, 4B, 5A, and 5B). Hence, the most restrictive gap penalties supported topology 1 within clade 1 (fig. 4A and B). The bias within clade 2 was not apparent (fig. 5A and B). Nevertheless, the GOP/GEP combinations leading to supported relationships within clade 1 did not coincide with those combinations supporting nodes within clade 2 (except for the log2 GOP/log2 GEP combination of 4/0). In general, the bootstrap-supported nodes were congruent with our taxonomic criteria. In only four GOP/GEP combinations were some incongruencies detected within clade 2. These incongruencies involved topologies 4 and 5, whose internal nodes were bootstrap-supported (fig. 5A and B). Comparing the different reconstruction methods, the log2 GOP/log2 GEP combination (4/0) leading to the most robust NJ topology (i.e., the topology with the greatest number of bootstrap-supported internal nodes) recovered topologies that were incongruent with each other and with our taxonomic criteria (e.g., topology 3 for clade 2; fig. 5A). This shows that bootstrap values of single alignments do not necessarily reflect their robustness to variation in alignment parameters.

Out of eight different topologies obtained for clade 1 (fig. 4A) and seven topologies for clade 2 (fig. 5A), some were totally incongruent with those produced with the within-clade alignments and with our taxonomic criteria. Regarding clade 2, these topologies were 1, 3, 6, and 7 (fig. 5A). Due to the low resolution gained for clade 1, we lacked an explicit criterion for discarding topologies. We tentatively discard topologies 3, 6, 7, and 8 as the most unrealistic. Taking these things into consideration, the comparison of tree-building protocols showed the ML estimation to be the most sensitive to alignment-parameter variation and to lead to many incongruent topologies. NJ recovered a large number of congruent topologies but never succeeded in recovering the preferred topology (topology 2) for clade 2, and MP showed a contradictory result, as it recovered the highest number of different topologies for clade 1 (most of them incongruent with the preferred topology) but the lowest for clade 2 (largely congruent with topology 2).

Kishino-Hasegawa and Templeton Tests

Due to the paraphyletic status of subgenus Paraphidius, found in all our topologies, we questioned the current subgeneric classification within the genus Pauesia. Hence, we performed the tests of Kishino and Hasegawa (1989) and Templeton (1983) to check whether or not this apparent paraphyly could be due to chance. In order to discard background noise, we constrained eight topologies including the main representatives of each clade: two outgroups (A. colemani 1 and 2), three specimens of Pauesia sg. jezoensis (P. ahtanumensis, P. pinicollis 1, and P. cupressobii 2), three specimens of Pauesia sg. laricis (P. laricis and P. picta 2 and 3), two specimens of the IJ clade (P. infulata and P. juniperorum 1), one specimen of the SA clade (P. similis 1), one specimen of the SSP clade (P. silvestris), and one specimen of the clade sp. A (Pauesia sp. A 1). Pauesia unilachni, Pauesia sp. B, and Pauesia sp. C were not included in these topologies because their positions were not resolved in our analyses, and these tests do not accept polytomies. Except for the eighth topology, all the topologies showed P. sg. jezoensis and P. sg. laricis as sister groups forming clade 1 (a relationship obtained in all our analyses), and for clade 2, each numbered topology in table 3 reproduced the corresponding topology in figure 5A. The eighth topology forced the subgenus Paraphidius (formed by P. sg. jezoensis plus all of the clusters forming our clade 2) and the subgenus Pauesia s.s. (formed by P. sg. laricis) to be monophyletic. Based on the CLUSTAL alignment (method 1), we compared all eight of these topologies using both tests. The eighth topology was clearly rejected by both tests, so we cannot accept the monophyly of the subgenus Paraphidius (this eighth topology was also rejected when the tests were based on the "by eye" alignment [method 2]; data not shown). The remaining topologies were not significantly different from one another according to the tests (except for topology 7, which was rejected by Templeton's test; table 3).

FIG. 2. Neighbor-joining (NJ) tree obtained with the complete data set aligned by CLUSTAL (method 1; 796 positions). Branch lengths are proportional to nucleotide divergence. Bootstrap values are given above each branch (first value, NJ; second value, maximum parsimony [MP]). For clade 1 (fig. 4A), topology 3 was obtained by unweighted MP and maximum likelihood (ML). For clade 2 (fig. 5A), topology 1 was obtained by ML and topology 2 was obtained by unweighted MP. Based on the "by eye" alignment (method 2; 756 positions), NJ recovered the same topology; in clade 1 (fig. 4A), topology 3 was obtained by unweighted MP and ML, and in clade 2 (fig. 5A), topology 3 was obtained by ML, topology 5 was obtained by unweighted MP and by weighting the first and second codon positions versus the third and intron positions 5 to 1, and topology 2 was obtained by MP both weighting exon versus intron positions 2 to 1 and weighting the first and second codon positions versus the third and intron positions 2 to 1. Bold lines indicate those branches that were also obtained by MP with all other multiple alignments. All of the reconstructions recovered similar rearrangements in the tips (those with bootstrap support >70%; Berry and Gascuel 1996). The traditional subgeneric classification, as well as our groupings, is indicated on the right-hand side.

FIG. 3. Majority-rule consensus from the three tree-building methods (maximum likelihood, neighbor joining, and maximum parsimony) using the 72 alignments obtained by varying the GOP/GEP parameters. Numbers in bold are the majority-rule consensus. Other numbers are the mean bootstrap values (estimated by neighbor joining), with their ranges in brackets. See figure 2 for relationships within each clade.

FIG. 4. A, Topologies obtained for clade 1 and their distribution in the 72 alignments under the three tree-building methods: B, neighbor joining, C, maximum likelihood, and D, maximum parsimony. Lowercase letters (a, b) indicate when the nodes in the corresponding topology (A) were supported by bootstrap values >70%.

FIG. 5. A, Topologies obtained for clade 2 and their distribution in the 72 alignments under the three tree-building methods: B, neighbor joining, C, maximum likelihood, and D, maximum parsimony. Lowercase letters (a, b) indicate when the nodes in the corresponding topology (A) were supported by bootstrap values >70%. "P" indicates topologies where Pauesia is recovered as paraphyletic with respect to Pseudopauesia prunicola.

Discussion

Variable-Length Sequence Alignment


Under the different alignment algorithms tested, all of the topologies showed considerable similarity (figs. 1 and 3). This shows that a strong phylogenetic signal is present in the EF-1α introns. Both the basic structure (fig. 1) and the minor rearrangements in the tips (fig. 2) were always recovered and were usually supported by high bootstrap values (>70%; Berry and Gascuel 1996). Testing the internal consistency of the five alignment methods revealed no striking difference between them. All of them showed considerable variation both in the performance of exon characters (on different equally parsimonious trees produced by each method) and between variants of the same method. The most striking result was that alignments made by eye and by computationally less demanding programs such as CLUSTAL appeared to perform as well as those produced by the more theoretically justifiable algorithms. With regard to "by eye" alignments, we recognize that they have the disadvantage of being unrepeatable and dependent on the investigator's criterion. However, their phylogenetic estimations deserve consideration if they are done blind (i.e., aligned without knowledge of the sequence names) and their results are contrasted with others coming from a different alignment algorithm. In addition, alignment algorithms sometimes produce misleading solutions (e.g., insertion of gap columns) that need to be manually refined. Most of the prominent differences between alignment strategies, dealing with the three problematic taxa of clade 1 and with the relationships among the four monophyletic clusters of clade 2, were also found when different tree-building protocols were used on an alignment. Hence, most of the ML and MP analyses recovered P. unilachni, Pauesia sp. C, and Pauesia sp. B in


FIG. 6. Relationships between the alignment and tree-building parameters in the 72 GOP/GEP combinations. Consistency index (CI) and numbers of variable (VP) and informative positions (IP) in each alignment are shown on the left-hand side. The likelihood (−ln L) and the length of each alignment (ca.) are on the right. In the abscissa, increasing GOP values (eight values, from 0.5 to 64) are represented for each fixed GEP value (indicated below each alignment).
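The abscissa of figure 6 can be reproduced by enumerating the gap penalty grid. The sketch below assumes that alignments are numbered with the GOP cycling fastest within each fixed GEP, an ordering inferred from this caption and from the alignment numbers quoted in the Results (log2 GOP/log2 GEP of −1/−5 as alignment 1 through 1/−2 as alignment 27). The function and variable names are illustrative only.

```python
LOG2_GOP = [-1, 0, 1, 2, 3, 4, 5, 6]         # gap opening penalties, 0.5 to 64
LOG2_GEP = [-5, -4, -3, -2, -1, 0, 1, 2, 3]  # gap extension penalties, about 0.031 to 8

def gap_penalty_grid():
    """Map alignment number -> (log2 GOP, log2 GEP, GOP, GEP)."""
    grid = {}
    number = 1
    for log2_gep in LOG2_GEP:      # outer loop: one block per fixed GEP value
        for log2_gop in LOG2_GOP:  # inner loop: GOP increases within a block
            grid[number] = (log2_gop, log2_gep, 2.0 ** log2_gop, 2.0 ** log2_gep)
            number += 1
    return grid

grid = gap_penalty_grid()

# Alignment numbers quoted in the Results as giving the highest congruence.
most_congruent = [1, 3, 9, 11, 17, 19, 26, 27]
for n in most_congruent:
    log2_gop, log2_gep, gop, gep = grid[n]
    print(n, f"{log2_gop}/{log2_gep}", gop, gep)
```

Under this assumed ordering the 72 alignments fall into nine blocks of eight, each restarting at the weakest GOP, which would account for the periodicity visible in the figure.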

the same cluster as P. sg. jezoensis (87% and 73% in the consensus tree, respectively; fig. 3), but the NJ analyses tended to recover them with P. sg. laricis (57%; fig. 3). This latter relationship was supported most of the time by high bootstrap values (figs. 3 and 4B). On the other hand, the MP analyses revealed a bias favoring topology 2 within clade 2 (fig. 5A and D). The subsequent within-group alignments reduced the number of ambiguous positions (at least for clade 2), and the topologies estimated with these new alignments gained robustness and resolution. Based on these alignments, we propose topology 2 (fig. 5A) as the most probable taxonomic hypothesis for the relationships within clade 2. If topology 2 is the best estimate, then in our study the most restrictive CLUSTAL parameters led to the most incongruent alignments. On the same basis, ML appears to be the most alignment-sensitive reconstruction method, and MP appears to be the most robust.

We also find, like Morrison and Ellis (1997), that testing alignment strategy is as important as tree-building protocol. Both trying different alignment methods and varying parameter values in the same alignment algorithm seem to be similarly effective in revealing the sensitive branches. Regarding CLUSTAL, there is apparently no such thing as a topological distribution pattern depending on the parameter values (also found by Morrison and Ellis 1997). Therefore, exhaustive searches of the parameter space seem to be necessary. However, we could have detected the strength of the phylogenetic signal contained in our data just by testing the three different tree-building protocols with six of the GOP/GEP combinations. These six combinations would include three GOP values (representing strong, medium, and weak gap penalties) versus two GEP values (paying attention to the periodicity observed in fig. 6). In the present case, we would discard the more restrictive parameters (both log2 GOP = 6 and log2 GEP = 3), as they led mostly to incongruent topologies (see figs. 4 and 5).

Phylogeny of European Pauesia Species

There are several conclusions regarding the phylogeny of Pauesia that are not sensitive to either alignment or tree-building method. First, the genus Pauesia seems to be monophyletic with respect to Pseudopauesia. Only under a few extreme alignment parameters was Pseudopauesia recovered within Pauesia (e.g., with the most restrictive GOP value but never combined with the more restrictive GEP; see fig. 5B and D). Second, we found two well-supported sister clades (fig. 2). Clade 1 includes two species groups, (1) P. sg. jezoensis and (2) P. sg. laricis, plus three other species (P. unilachni, Pauesia sp. C, and Pauesia sp. B) whose relationships could not be clarified. Both our sequences and the morphological characters (not shown) indicate that Pauesia sp. B and Pauesia sp. C represent valid species. However, the placement of Pauesia sp. B within the group P. sg. laricis and of Pauesia sp. C within the subgenus Pauesiella is at present uncertain. Clade 2 includes the species of Pauesia sg. pini as well as P. infulata (table 1). This clade contains four subclades: (1) similis-alpina, (2) sp. A, (3) silvestris-silana-pini, and (4) infulata-juniperorum. Although relationships among these subclades were sensitive both to alignment and to tree-building method, further analyses (based on alignments within clade 2) indicated that the subclade formed by the species P. infulata plus P. juniperorum forms a sister group to the clade P. sg. pini (SSP + SA + sp. A). Interestingly, P. juniperorum appears to be a well-supported sister species to P. infulata. All of the analyses showed that the current classification with three subgenera (Paraphidius, Pauesia s.s., and Pauesiella) is not reliable, at least with reference to the Paraphidius monophyly (table 3 and fig. 2). In the absence of a clear morphological distinction between clades 1 and 2, we can only recommend that the current subgeneric classification be abandoned.

Table 3
Comparison of the Different Relationships Obtained in this Study with the Current Classification for the Genus Pauesia by the Kishino-Hasegawa and Templeton Tests

| Tree | Templeton: Steps | Templeton: Diff. Steps | Templeton: Its SD | Templeton: Significantly Worse? | Kishino-Hasegawa: Ln L | Kishino-Hasegawa: Diff. Ln L | Kishino-Hasegawa: Its SD | Kishino-Hasegawa: Significantly Worse? |
|---|---|---|---|---|---|---|---|---|
| 1 | 757.0 | 6.0 | 5.2948 | No | −2,900.79518 | −8.88054 | 10.1343 | No |
| 2 | 753.0 | 2.0 | 2.4510 | No | −2,891.91464 | Best | | |
| 3 | 753.0 | 2.0 | 5.1022 | No | −2,902.00520 | −10.09055 | 9.8125 | No |
| 4 | 751.0 | Best | | | −2,892.38756 | −0.47292 | 1.0757 | No |
| 5 | 754.0 | 3.0 | 2.2375 | No | −2,892.39837 | −0.48373 | 1.0742 | No |
| 6 | 759.0 | 8.0 | 5.1022 | No | −2,901.97928 | −10.06464 | 9.7510 | No |
| 7 | 762.0 | 11.0 | 4.5854 | Yes | −2,905.17402 | −13.25937 | 9.1600 | No |
| 8 | 813.0 | 62.0 | 8.1288 | Yes | −2,951.13295 | −59.21830 | 13.7894 | Yes |

NOTE. Both tests are based on the CLUSTAL alignment (method 1). Topologies 1–7 accept Pauesia sg. jezoensis and Pauesia sg. laricis as sister groups belonging to clade 1. Regarding clade 2, each numbered topology reproduces the corresponding situation in figure 5A. Topology 8 forced the subgenera Pauesia s.s. and Paraphidius to be monophyletic. SD = standard deviation; L = likelihood.
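As a rough check on table 3, each "significantly worse" call is consistent with comparing a tree's difference from the best tree against the standard deviation of that difference. The sketch below assumes a two-tailed 5% normal threshold (|difference| > 1.96 SD); the paper does not state the threshold, so that value is an assumption, and the per-tree values are as read from table 3.

```python
CRITICAL_Z = 1.96  # assumed two-tailed 5% threshold (not stated in the paper)

# tree -> (difference from the best tree, SD of that difference), from table 3
TEMPLETON = {
    1: (6.0, 5.2948), 2: (2.0, 2.4510), 3: (2.0, 5.1022),
    5: (3.0, 2.2375), 6: (8.0, 5.1022), 7: (11.0, 4.5854),
    8: (62.0, 8.1288),
}
KISHINO_HASEGAWA = {
    1: (-8.88054, 10.1343), 3: (-10.09055, 9.8125),
    4: (-0.47292, 1.0757), 5: (-0.48373, 1.0742),
    6: (-10.06464, 9.7510), 7: (-13.25937, 9.1600),
    8: (-59.21830, 13.7894),
}

def significantly_worse(diff, sd, z=CRITICAL_Z):
    """True when the difference exceeds z standard deviations."""
    return abs(diff) / sd > z

def rejected(table):
    """Sorted list of trees flagged as significantly worse than the best."""
    return sorted(tree for tree, (diff, sd) in table.items()
                  if significantly_worse(diff, sd))

print("Templeton rejects:", rejected(TEMPLETON))
print("Kishino-Hasegawa rejects:", rejected(KISHINO_HASEGAWA))
```

Under this assumption the sketch flags topologies 7 and 8 for Templeton's test and only topology 8 for the Kishino-Hasegawa test, matching the calls reported in the table.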


Species Boundaries

Following alignment, we examined groups of similar sequences to estimate where the species boundaries lay. Although we cannot confidently draw species boundaries from only a few individuals sequenced at a single locus, we can make some inferences. Belshaw et al. (1999) sequenced EF-1α intron 2 in 12 individuals of another aphidiine species of the genus Lysiphlebus, each from a different locality across western Europe, and found no variation, indicating that the introns may be relatively conserved at the species level.

Within the P. sg. jezoensis species complex, the five P. pinicollis/P. jezoensis individuals are distinguished from the three P. cupressobii individuals by a fixed 2-bp indel, which may mark a species boundary. The status of P. pinicollis and P. jezoensis remains uncertain: individuals are distinguished by only a single substitution. The North American P. ahtanumensis is a clearly distinctive species, always appearing basal to the other members of this group. Hence, we are probably dealing with three or four species within this clade, depending on the status of P. pinicollis.

The P. sg. laricis species complex constitutes the most problematic area in our study. Although these sequences cluster together in most phylogenetic analyses (often with high bootstrap support), we found extensive variation between all of the sequences. The most similar individuals (P. picta 2 and 3) are distinguished in the second intron by a short region that contains perhaps 10 fixed substitutions or indels (it is difficult to align). Also, P. picta/P. laricis and P. picta 1, 2, and 3 are characterized by the presence of one homopolymer in intron 1. Therefore, the traditional division between P. laricis and P. picta, the two recognized species, is not supported, and we cannot ignore the possibility that each individual in this complex represents a different species (from a minimum of three to a maximum of six).

Sequences of the five individuals identified as P. similis and P. alpina (the similis-alpina clade) are identical except at one exon position in P. similis 3, and this position is clearly homoplastic with the alternate allele shared with P. sg. laricis and P. prunicola (the outgroup). Hence, all five individuals may well belong to a single species or form a species complex that would deserve the name Pauesia species group similis. The three individuals identified as Pauesia sp. A are identical, and the only difference between them and P. cupressobii 4 is an additional T in a T run (which may be an artifact). Therefore, all of them probably belong to the same species (still undescribed). Three species can be distinguished within the silvestris-silana-pini clade. Nevertheless, the present data raise the question of whether P. silana could be conspecific with P. silvestris.

In conclusion, we believe that our 38 Pauesia individuals, although morphologically referable to 13 described and 3 new species, probably represent between 15 and 21 species, which conflict with the morphologically defined species boundaries.

Acknowledgments

This work was funded by the Conselleria d'Educació i Ciència (GV-3216/95), by grant PB96-0793C04-01 from DGES, and by NERC in the U.K. We thank P. González-Funes, S. Fowler, S. Shaw, P. Stary, and W. Völkl for material; D. Swofford and W. Wheeler for their programs, PAUP and POY, respectively; and F. González-Candelas and three reviewers for their useful comments on an earlier draft. The facilities at SCSIE (Universitat de València) were used for sequencing, and the Servei de Bioinformàtica provided computer support.

LITERATURE CITED

BELSHAW, R., and D. L. J. QUICKE. 1997. A molecular phylogeny of the Aphidiinae (Hymenoptera: Braconidae). Mol. Phylogenet. Evol. 7:281–293. BELSHAW, R., D. L. J. QUICKE, W. VÖLKL, and C. J. GODFRAY. 1999. Molecular markers indicate rare sex in a predominantly asexual parasitoid wasp. Evolution 53:1189–1199. BERRY, V., and O. GASCUEL. 1996. On the interpretation of bootstrap trees: appropriate threshold of clade selection and induced gain. Mol. Biol. Evol. 13:999–1011. BRANDS, J. H. G. M., J. A. MAASSEN, F. J. VAN-HEMERT, R. AMONS, and W. MÖLLER. 1986. The primary structure of the alpha subunit of human elongation factor 1: structural aspects of guanine nucleotide binding sites. Eur. J. Biochem. 155:167–172. BROWER, A. V. Z., and R. DESALLE. 1994. Practical and theoretical considerations for choice of a DNA sequence region in insect molecular systematics, with a short review of published studies using nuclear gene regions. Ann. Entomol. Soc. Am. 87:702–716. CLARK, A. G., B. G. LEICHT, and S. V. MUSE. 1996. Length variation and secondary structure of introns in the Mlc1 gene in six species of Drosophila. Mol. Biol. Evol. 13:471–482. DANFORTH, B. N., and S. JI. 1998. Elongation factor-1α occurs as two copies in bees: implications for phylogenetic analysis of EF-1α sequences in insects. Mol. Biol. Evol. 15:225–235. EFRON, B. 1982. The jackknife, the bootstrap and other resampling plans. SIAM, Philadelphia. FABRY, S., A. KÖHLER, and A. W. COLEMAN. 1999. Intraspecies analysis: comparison of ITS sequence data and gene intron sequence data with breeding data for a worldwide collection of Gonium pectorale. J. Mol. Evol. 48:94–101. FELSENSTEIN, J. 1993. PHYLIP (phylogeny inference package). Version 3.5c. Distributed by the author, Department of Genetics, University of Washington, Seattle.


FENG, D. F., and R. F. DOOLITTLE. 1987. Progressive sequence alignment as a prerequisite to correct phylogenetic trees. J. Mol. Evol. 25:351–360.

FRIEDLANDER, T. P. 1994. Phylogenetic information content of five nuclear gene sequences in animals: initial assessment of character sets from concordance and divergence studies. Syst. Biol. 43:511–525.

FRIEDLANDER, T. P., J. C. REGIER, and C. MITTER. 1992. Nuclear gene sequences for higher level phylogenetic analysis: 14 promising candidates. Syst. Biol. 41:483–490.

GIRIBET, G., and W. C. WHEELER. 1999. On gaps. Mol. Phylogenet. Evol. 13:132–143.

HASEGAWA, M., T. HASHIMOTO, J. ADACHI, N. IWABE, and T. MIYATA. 1993. Early branchings in the evolution of eukaryotes: ancient divergence of entamoeba that lacks mitochondria revealed by protein sequence data. J. Mol. Evol. 36:380–388.

HOVEMANN, B., S. RICHTER, U. WALLDORF, and C. CZIEPLUCH. 1988. Two genes encode related cytoplasmic elongation factors 1α (EF-1α) in Drosophila melanogaster with continuous and stage specific expression. Nucleic Acids Res. 16:3175–3194.

JUKES, T. H., and C. R. CANTOR. 1969. Evolution of protein molecules. Pp. 21–132 in H. N. MUNRO, ed. Mammalian protein metabolism. Academic Press, New York.

KAMAISHI, T., T. HASHIMOTO, Y. NAKAMURA, F. NAKAMURA, S. MURATA, N. OKADA, K. I. OKAMOTO, M. SHIMIZU, and M. HASEGAWA. 1996. Protein phylogeny of translation elongation factor EF-1α suggests microsporidians are extremely ancient eukaryotes. J. Mol. Evol. 42:257–263.

KELCHNER, S. A., and L. G. CLARK. 1997. Molecular evolution and phylogenetic utility of the chloroplast rpl16 intron in Chusquea and the Bambusoideae (Poaceae). Mol. Phylogenet. Evol. 8:385–397.

KISHINO, H., and M. HASEGAWA. 1989. Evaluation of the maximum likelihood estimate of the evolutionary tree topologies from DNA sequence data, and the branching order in Hominoidea. J. Mol. Evol. 29:170–179.

KJER, K. M. 1995. Use of rRNA secondary structure in phylogenetic studies to identify homologous positions: an example of alignment and data presentation from the frogs. Mol. Phylogenet. Evol. 4:314–330.

KUMAR, S., K. TAMURA, and M. NEI. 1993. MEGA: molecular evolutionary genetic analysis. Version 1.0. Pennsylvania State University, University Park.

LENSTRA, J. A., A. VAN VLIET, A. C. ARNBERG, F. J. VAN HEMERT, and W. MÖLLER. 1986. Genes coding for the elongation factor EF-1α in Artemia. Eur. J. Biochem. 155:475–484.

LISS, M., D. L. KIRK, K. BEYSER, and S. FABRY. 1997. Intron sequences provide a tool for high-resolution phylogenetic analysis of volvocine algae. Curr. Genet. 31:214–227.

MADDISON, D. R., M. D. BAKER, and K. A. OBER. 1999. Phylogeny of carabid beetles as inferred from 18S ribosomal DNA (Coleoptera: Carabidae). Syst. Entomol. 24:103–138.

MEDVEDEV, G. S. 1995. Keys to the fauna of the URSS. Vol. III. Hymenoptera. Part V. Science Publishers, Leningrad.

MITCHELL, A., S. CHO, J. C. REGIER, C. MITTER, R. W. POOLE, and M. MATHEWS. 1997. Phylogenetic utility of elongation factor-1α in Noctuoidea (Insecta: Lepidoptera): the limits of synonymous substitution. Mol. Biol. Evol. 14:381–390.

MORRISON, D. A., and J. T. ELLIS. 1997. Effects of nucleotide sequence alignment on phylogeny estimation: a case study of 18S rDNAs of Apicomplexa. Mol. Biol. Evol. 14:428–441.

NEEDLEMAN, S. B., and C. D. WUNSCH. 1970. A general method applicable to the search for similarities in the amino acid sequences of two proteins. J. Mol. Biol. 48:444–453.

OAKLEY, T. H., and R. B. PHILLIPS. 1999. Phylogeny of salmonine fishes based on growth hormone introns: Atlantic (Salmo) and Pacific (Oncorhynchus) salmon are not sister taxa. Mol. Phylogenet. Evol. 11:381–393.

OLSEN, G. J., and C. R. WOESE. 1993. Ribosomal RNA: a key to phylogeny. FASEB J. 7:113–123.

PENNY, D., and M. D. HENDY. 1985. The use of tree comparison metrics. Syst. Zool. 34:75–82.

RAO, T. R., and L. Y. SLOBIN. 1986. Structure of the aminoterminal end of mammalian elongation factor Tu. Nucleic Acids Res. 14:2409.

REGIER, J. C., and J. W. SHULTZ. 1997. Molecular phylogeny of the major arthropod groups indicates polyphyly of the crustaceans and a new hypothesis for the origin of hexapods. Mol. Biol. Evol. 14:902–913.

RINSMA-MELCHERT, Y. 1993. The expected number of matches in optimal global sequence alignments. N. Z. J. Bot. 31:219–230.

ROTH, W. W., P. W. BRAGG, M. V. CORRIAS, N. S. REDDY, J. N. DHOLAKIA, and J. A. WAHBA. 1987. Expression of a gene for mouse eukaryotic elongation factor TU during murine erythroleukemic cell differentiation. Mol. Cell. Biol. 7:3929–3936.

SANCHIS, A., A. LATORRE, F. GONZALEZ-CANDELAS, and J. M. MICHELENA. 2000. An 18S rDNA based molecular phylogeny of Aphidiinae (Hymenoptera: Braconidae). Mol. Phylogenet. Evol. 14:180–194.

SEDLAG, U., and P. STARY. 1980. Pauesia (Pauesiella subg.n.) spatulata sp.n., a parasitoid of Cinara aphids from Central Europe (Hymenoptera: Aphidiidae; Homoptera: Lachninae). Acta Entomol. Bohem. 77:383–386.

STARY, P. 1960. A taxonomic revision of the European species of the genus Paraphidius Stary, 1958. Acta Faun. Entomol. Mus. Nat. Pragae 6:5–38.

———. 1966. The Aphidiidae of Italy (Hymenoptera: Ichneumonoidea). Boll. Ist. Entomol. Univ. Bologna 28:65–139.

———. 1970. Biology of aphid parasites (Hymenoptera: Aphidiidae) with respect to integrated control. W. Junk, B. V. Publishers, the Hague.

———. 1976. Aphid parasites (Hymenoptera: Aphidiidae) of the Mediterranean area. W. Junk, B. V. Publishers, the Hague.

———. 1979. Aphid parasites (Hymenoptera: Aphidiidae) of Central Asian area. W. Junk, B. V. Publishers, the Hague.

SWOFFORD, D. L. 1998. PAUP*. Phylogenetic analysis using parsimony (*and other methods). Version 4. Sinauer, Sunderland, Mass.

SWOFFORD, D. L., G. J. OLSEN, P. J. WADDELL, and D. M. HILLIS. 1996. Phylogenetic inference. Pp. 407–514 in D. M. HILLIS, C. MORITZ, and B. K. MABLE, eds. Molecular systematics. Sinauer, Sunderland, Mass.

TEMPLETON, A. R. 1983. Phylogenetic inference from restriction endonuclease cleavage site maps with particular reference to the evolution of human and apes. Evolution 37:221–224.

THOMPSON, J. D., D. G. HIGGINS, and T. J. GIBSON. 1994. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position specific gap penalties and weight matrix choice. Nucleic Acids Res. 22:4673–4680.

TREMBLAY, E., and F. PENNACCHIO. 1988. Speciation in aphidiine Hymenoptera (Hymenoptera: Aphidiidae). Pp. 139–146 in V. K. GUPTA, ed. Advances in parasitic Hymenoptera research. E. J. Brill, Leiden.

TYSON, H. 1992. Relationships between amino acid sequences determined through optimum alignments, clustering, and specific distance patterns: application to a group of scorpion toxins. Genome 35:360–371.

VINGRON, M., and M. S. WATERMAN. 1994. Sequence alignment and penalty choice: review of concepts, case studies and implications. J. Mol. Biol. 235:1–12.

WALLDORF, U., and B. T. HOVEMANN. 1990. Apis mellifera cytoplasmic elongation factor 1-α (EF-1α) is closely related to Drosophila melanogaster EF-1α. FEBS Lett. 267:245–249.

WENZEL, J. W., and M. E. SIDDALL. 1999. Noise. Cladistics 15:51–64.

WHEELER, W. C. 1995. Sequence alignment, parameter sensitivity, and the phylogenetic analysis of molecular data. Syst. Biol. 44:321–331.

———. 1996. Optimization alignment: the end of multiple sequence alignment in phylogenetics? Cladistics 12:1–9.

WHEELER, W. C., and D. S. GLADSTEIN. 1994. MALIGN: a multiple sequence alignment program. J. Hered. 85:417–418.

MANOLO GOUY, reviewing editor

Accepted February 22, 2001

