Multi-locus species tree for the Amazonian peacock basses (Cichlidae: Cichla): Emergent phylogenetic signal despite limited nuclear variation

Share Embed


Descripción

See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/255789766

Multi-Locus Species Tree for the Amazonian Peacock Basses (Cichlidae: Cichla): Emergent phylogenetic signal despite limited... Article in Molecular Phylogenetics and Evolution · August 2013 DOI: 10.1016/j.ympev.2013.07.031 · Source: PubMed

CITATIONS

READS

4

239

3 authors: Stuart C Willis

Izeni Pires Farias

Texas A&M University - Corpus Christi

Federal University of Amazonas

19 PUBLICATIONS 444 CITATIONS

232 PUBLICATIONS 1,814 CITATIONS

SEE PROFILE

SEE PROFILE

Guillermo Orti George Washington University 216 PUBLICATIONS 4,925 CITATIONS SEE PROFILE

Some of the authors of this publication are also working on these related projects:

'species are arbitrary' View project

Taxonomy, Biogeography and Conservation of primates from Mico and Callibella genera View project

All content following this page was uploaded by Stuart C Willis on 23 January 2017.

The user has requested enhancement of the downloaded file. All in-text references underlined in blue are added to the original document and are linked to publications on ResearchGate, letting you access and read them immediately.

Molecular Phylogenetics and Evolution xxx (2013) xxx–xxx

Contents lists available at ScienceDirect

Molecular Phylogenetics and Evolution journal homepage: www.elsevier.com/locate/ympev

Multi-locus species tree for the Amazonian peacock basses (Cichlidae: Cichla): Emergent phylogenetic signal despite limited nuclear variation Stuart C. Willis a,⇑, Izeni P. Farias b, Guillermo Ortí c a

School of Biological Sciences, University of Nebraska-Lincoln, 348 Manter Hall, Lincoln, NE 68588, United States Laboratório de Evolução e Genética Animal, ICB, Universidade Federal do Amazonas, Estrada do Contorno 3000, Manaus, AM, Brazil c Department of Biological Sciences, The George Washington University, 2023 G St. NW Suite 340, Washington, DC 20052, United States b

a r t i c l e

i n f o

Article history: Received 29 March 2013 Revised 31 July 2013 Accepted 31 July 2013 Available online xxxx Keywords: Gene tree-species tree conflict Coalescence * Beast Bayesian concordance analysis

a b s t r a c t The inference of phylogenies of closely related species is obstructed by phenomena such as porous species boundaries and deep coalescence, and is often exacerbated by low levels of nucleotide variation among most loci surveyed in phylogenetic studies. We investigated the utility of twenty-one nuclear loci that had a range of 5–40 (median of 14) variable sites per locus to estimate the phylogeny of the genus Cichla, a group of 15 Neotropical cichlid fishes that began to diverge in the early to mid Miocene. We found that under a concatenated approach, the least variable loci, while contributing less to the overall phylogenetic signal (posterior node support), nevertheless provided information that increased support for the final tree. Moreover, this was not a result of misdirection by mutational noise, as the inference from all data was far superior to those from reduced datasets (those with more variable loci) in terms of the relative precision of posterior tree space. Phylogenetic methods that allowed each locus to have a separate genealogy, including Bayesian concordance analysis and a multispecies coalescent model, provided phylogenies that were also compatible with the concatenated tree in terms of the eight recently delimited species of Cichla, albeit with somewhat diminished support for some branches. In contrast, described species that still regularly exchange genes showed unstable relationships among analyses: not a surprising result from analyses that assume that gene tree heterogeneity results from incomplete lineage sorting and not gene flow. Importantly, we also observed that the confidence intervals for node ages in the coalescent analyses were quite wide, and likely susceptible to influence of the prior on node density (e.g. birth-death). Ó 2013 Elsevier Inc. All rights reserved.

1. Introduction A major paradigm of systematics has emerged to focus on the inference of species-level phylogenies (Edwards, 2008). Species-level phylogenies or ‘species trees’ are terms used, often simultaneously, to refer to phylogenies in which the tips are closely related species in a genus or a species-complex and in which the relationships and divergence times of all species are of interest, and/or to the containing tree (the organismal phylogeny) in which unlinked and often heterogeneous gene trees must be nested. Because of their taxonomic resolution, species trees provide a powerful means to test hypotheses of evolutionary processes (e.g. Whittall and Hodges, 2007). However, species-level divergences are commonly subject to phenomena that complicate the inference of a robust phylogenetic estimate. Among these are natural processes such as the deep coalescence of gene lineages and porous

⇑ Corresponding author. Fax: +1 (402) 472 2083. E-mail address: [email protected] (S.C. Willis).

species boundaries that allow exchange of gene lineages (Maddison, 1997). While these processes can also complicate phylogenetic estimation of long-diverged taxa (Avise and Robinson, 2008), they are believed to be more pervasive for recently diverged taxa (Pamilo and Nei, 1988; Rosenberg and Nordborg, 2002). In addition, a more pragmatic challenge is that estimating the history of recent divergences may be hindered because a lack of mutational variation at most available loci prevents robust phylogenetic inference (Brown and Yang, 2010; Hillis, 1995; Huang and Knowles, 2009). Although traditional approaches to study recent divergences involve estimating phylogenetic history using cytoplasmic DNA loci that exhibit elevated rates of mutation and coalescence, there has been a renewed emphasis on the need to use multiple unlinked genetic loci due to the vagary that any single marker may present regarding phylogeny (Edwards, 2008; Maddison, 1997). Recent studies using this paradigm have demonstrated highly supported phylogenetic estimates based on the analysis of many separate loci (Edwards et al., 2007). However, it remains unclear how many genes are necessary to infer an accurate and well-supported multi-locus phylogeny (Rokas and Carroll, 2005), particularly when

1055-7903/$ - see front matter Ó 2013 Elsevier Inc. All rights reserved. http://dx.doi.org/10.1016/j.ympev.2013.07.031

Please cite this article in press as: Willis, S.C., et al. Multi-locus species tree for the Amazonian peacock basses (Cichlidae: Cichla): Emergent phylogenetic signal despite limited nuclear variation. Mol. Phylogenet. Evol. (2013), http://dx.doi.org/10.1016/j.ympev.2013.07.031

2

S.C. Willis et al. / Molecular Phylogenetics and Evolution xxx (2013) xxx–xxx

there is a limited and variable number of informative sites among loci (Carmago et al., 2012). We tested the patterns of congruence in phylogenetic signal among unlinked nuclear loci with limited variation using a genus of Neotropical cichlid fishes as a model system. Our phylogenetic reconstruction took three basic paths. First, we estimated a combined phylogeny using concatenated sequences from multiple loci. While it has been observed that more data (i.e. more loci) lead to a stronger phylogenetic estimate (Rokas and Carroll, 2005), not accounting for the heterogeneous evolutionary rate or pattern among sites in the alignment can lead to systematic errors and inconsistent phylogenetic estimates (Buckley et al., 2001; Bull et al., 2006). Moreover, it has long been recognized (e.g. Nei and Li, 1979) that unlinked loci, those inherited separately due to recombination, may exhibit incongruent gene trees, sometimes with high probability, and it has recently become apparent that analyzing concatenated loci can result in statistically inconsistent phylogenetic estimates for species trees with short internal branches (Kubatko and Degnan, 2007). So, to better estimate the level of congruent phylogenetic signal from individual loci, we also implemented a Bayesian concordance analysis (BCA; Ane et al., 2007). In this analysis, the gene tree for each locus is estimated separately, without any influence from other gene trees. A species tree (concordance tree) is then built using topologies from the independent loci while taking into account the uncertainty in each gene tree topology posterior distribution (Ane et al., 2007). This method has the advantage that data-rich partitions cannot dominate the analysis, as they can in concatenated estimates. However, it remains unclear how BCAs perform in the inference of phylogenies at the species level, where not only are gene trees likely to be heterogeneous, but incongruence could be falsely inflated by the low levels of character variation among recently-diverged species (i.e. low signal to noise ratio). Finally, we also estimated a species tree using a multi-species coalescent model (Kingman, 1982; Rannala and Yang, 2003) as implemented in the Beast package (Drummond and Rambaut, 2007; Heled and Drummond, 2010). This analysis, which estimates the probability of individual gene tree topologies and branch lengths with respect to simultaneously sampled species trees (topology, divergence times, and population sizes), should be robust to the pitfalls of trees with short internal branches (Edwards et al., 2007; Heled and Drummond, 2010; Kubatko et al., 2009). It remains uncertain, however, how well these parameter-rich models, which are known to benefit from the inclusion of more loci (Carmago et al., 2012; Huang et al., 2010), operate with many loci that are nonetheless poor in variation. We examined the advantages and constraints of multiple nuclear loci to infer the species tree for a genus of Neotropical fishes. Estimated at over 6000 species, Neotropical freshwater fishes represent the most species-dense assemblage of fishes, and perhaps 10% of all vertebrates species (Albert and Crampton, 2005; Reis et al., 2003). However, the taxonomy and evolutionary biology of this assemblage is woefully under-studied, and there has been relatively little progress in explaining the origins of this remarkable fauna (Hoorn et al., 2010; Wiens et al., 2011). Among these colorful and charismatic fishes, ‘‘peacock bass’’ cichlids of the genus Cichla are known for their voracious strike and powerful fight, and are heavily exploited commercially, artisanally, and recreationally in South America (Winemiller, 2001). While a morphological review determined the presence of 15 described species in the genus (Kullander and Ferreira, 2006), our recent study using extensive molecular data showed that several of these are better considered synonymous with other described species, and we hereafter refer to these combined units as ‘delimited species’ (Willis et al., 2012). The mtDNA of these fishes divides them into two clades: Clade A: Cichla temensis, C. melaniae, C. mirianae, C. piquiti, and C. pinima (including C. jariina, C. thyrorus, and C. vazzoleri), and Clade

B: C. orinocensis, C. intermedia, and C. ocellaris (including C. monoculus, C. nigromaculata, C. kelberi, and C. pleiozona). Within Clade A relationships are obscured by extensive polyphyly of mtDNA lineages even among the delimited species, and C. orinocensis possessed two major mtDNA lineages that were most closely related to different species within Clade B (Willis et al., 2012). A rough cytochrome b clock (1% per million years; Bermingham et al., 1997) would place the earliest divergences among extant species in the late Miocene 7 million years ago (Willis et al., 2010), but multi-locus, time-calibrated analyses place this somewhat deeper in time (17 million years ago; Lopez-Fernandez et al., 2013) due to a relatively slow rate of molecular evolution in the Cichla lineage (López-Fernández et al., 2005). Recent attempts at phylogenetic analysis of this genus using morphological characters have been largely ambiguous due to the high degree of conservation seen in Cichla morphology (Kullander and Ferreira, 2006). As morphological and mitochondrial analyses have produced inconclusive results, here we surveyed multiple, unlinked nuclear loci to estimate the phylogeny of these fishes. We hypothesized that despite the limited variation among available nuclear loci, we would be able to recover a cohesive phylogenetic signal that provided confidence in our phylogenetic estimate. We expected that through the combined use of separate and concatenated, Bayesian concordance, and multi-species coalescent analyses, we would be able to determine the extent of conflict among loci and whether it resulted from random variance (i.e. gene tree estimation error due to mutational homoplasy) or systematic variance (i.e. coalescent stochasticity/gene tree heterogeneity). We chose not to include mtDNA in the analyses presented here so that we could compare the phylogenetic patterns rendered by each data type, but we predicted that the recovered nuclear phylogeny would be consistent with the mtDNA genealogy, while perhaps resolving some of the relationships not readily apparent from the latter.

2. Materials and methods 2.1. DNA amplification and sequence assembly We obtained tissues of all described species of Cichla. Tissue collection and DNA extraction were described previously (Willis et al., 2012). For two to seven representatives of each species (Table 1, Supplemental Fig. 1), we amplified 21 putatively single-copy and unlinked nuclear loci (Table 2). While introgressive hybridization is known to be widespread among Cichla species (though numerically rare, overall), our previous extensive survey of population genetic variation at mtDNA, microsatellites, and two nuclear markers (mitf and xsrc) allowed us to choose representatives of each species which were apparently free of hybrid DNA and that came from regions not known for hybridization (Willis et al., 2012). This was important in order to eliminate sources of error from species tree methods that assume gene tree discordance results entirely from deep coalescence of gene lineages (Eckert and Carstens, 2008; Leaché, 2009) while allowing us to put more effort into surveying more loci rather than more individuals. The nuclear loci we surveyed were derived from the literature or developed specifically for this study, and included protein-coding exons and introns, microsatellite-flanking regions, and anonymous loci (Li et al., 2010; Sides and Lydeard, 1999; Won et al., 2006; Zardoya et al., 1996). One large group of loci we used was developed specifically for this study, and consisted of ‘anonymous’ loci, some of which flanked novel microsatellite regions (Brito and Edwards, 2009). To discover these loci, we obtained sequences from a library of genomic DNA enriched for microsatellite repeat motifs (Macrander et al., 2012). From those sequences that did

Please cite this article in press as: Willis, S.C., et al. Multi-locus species tree for the Amazonian peacock basses (Cichlidae: Cichla): Emergent phylogenetic signal despite limited nuclear variation. Mol. Phylogenet. Evol. (2013), http://dx.doi.org/10.1016/j.ympev.2013.07.031

3

S.C. Willis et al. / Molecular Phylogenetics and Evolution xxx (2013) xxx–xxx Table 1 Specimens used in the current study, their source localities, and collections where vouchers were deposited. Species C. C. C. C. C. C. C. C. C. C. C. C. C. C. C. C. C. C. C. C. C. C. C. C. C. C. C. C. C. C. C. C.

Locality

temensis temensis melaniaea mirianae mirianae piquitia pinima sensu stricto pinima sensu stricto pinima sensu stricto pinima sensu stricto pinima sensu stricto pinima sensu stricto pinima sensu stricto jariina (pinima)a thyrorus (pinima)a vazzoleri (pinima) vazzoleri (pinima) intermedia intermedia orinocensis orinocensis orinocensis orinocensis ocellaris sensu stricto ocellaris sensu stricto kelberi (ocellaris)a monoculus (ocellaris) monoculus (ocellaris) monoculus (ocellaris) nigromaculata (ocellaris) nigromaculata (ocellaris) pleiozona (ocellaris)a

CI AT XA AF SM SF CM AP TL IT CU PU AR JU TR JT OX CA VE CI AT XE TA PI ES SF IQ IA AR PS MR GM

Coordinates 0

Cinaruco Atabapo Xingu (Altamira) Alta Floresta Suia Missu São Felix do Araguaia Canumã Aripuanã Tapajós mouth Itaituba Curuá-Una Paru Araguari Jari (above waterfalls) Trombetas (abv. rapids) Jatapu Oriximiná Caura Ventuari Cinaruco Atabapo Xeruini Tapera Pirara (Takutu) Essequibo (Rupununi) São Felix do Araguaia Iquitos Igapo-Açu Araguari Pasiba Marauiá Guajará-Mirim

Vouchers 0

6° 32.36 N, 67° 21.84 W 3° 46.860 N, 67° 37.730 W 3° 15.70 S, 52° 4.480 W 9° 54.320 S, 56° 6.740 W 11° 44.50 S, 52° 21.360 W 11° 46.560 S, 50° 43.060 W 4° 0.490 S, 59° 6.010 W 5° 10.240 S, 60° 22.390 W 2° 30.310 S, 54° 57.130 W 4° 16.550 S, 55° 59.040 W 2° 51.520 S, 54° 20.920 W 1° 14.180 S, 53° 3.260 W 0° 56.010 N, 51° 0.220 W 0° 32.610 S, 52° 40.820 W 1° 3.250 S, 57° 3.720 W 0° 23.970 N, 59° 26.70 W 1° 45.950 S, 55° 51.950 W 6° 48.430 N, 64° 49.670 W 4° 8.030 N, 66° 37.510 W 6° 32.360 N, 67° 21.840 W 3° 46.860 N, 67° 37.730 W 0° 37.060 S, 61° 57.780 W 0° 26.940 N, 61° 27.050 W 3° 39.40 N, 59° 31.690 W 3° 42.970 N, 59° 19.10 W 11° 46.560 S, 50° 43.060 W 3° 41.50 S, 73° 15.840 W 4° 32.740 S, 60° 48.510 W 0° 56.010 N, 51° 0.220 W 2° 24.150 N, 66° 26.150 W 0° 23.10 S, 65° 12.430 W 10° 47.490 S, 65° 20.950 W

MCNG MCNG INPA CPUFMT INPA INPA INPA INPA INPA INPA INPA INPA MPEG INPA INPA INPA INPA MCNG MCNG MCNG MCNG INPA INPA AUM AUM INPA ANSP INPA MPEG MCNG INPA INPA

a Sites where both representatives were collected; Collection abbreviations: MCNG, Museo de Ciencias Naturales de Guanare (Edo. Portuguesa, Venezuela); AUM, Auburn University Museum (Auburn, Alabama, USA); ANSP, Academy of Natural Sciences (Philadelphia, USA); INPA, Instituto Nacional de Pesquisas da Amazônia (Manaus, AM, Brazil); MPEG, Museu Paraense Emilio Goeldi (Belem, PA, Brazil); CPUFMT, Coleção de peixes da Universidade Federal do Mato Grosso (Cuiaba, MT, Brazil).

Table 2 Loci examined in this study, and results from heuristic searches and likelihood ratio tests. Marker

Type

Reference

base pairs

Sa

Model

SH p-valueb direct/allele

clock p-valuec direct/allele

mitf xsrc GnRH3i1 Gpd2i1 1835e6 8680e2,3 14867e1 18049e2 35564e5 36298e1 55305e1 55378e1 TmoM27 CorE7 CorF12 Cpiorcons CinKA7 CmoME5 CmoMJ3 CteOA7 CteOI2 ATPase 8,6

ORF+intron ORF+intron Intron Intron Intron Intron Intron Intron Intron Intron Intron Intron l-Sat flanking l-Sat flanking l-Sat flanking l-Sat flanking Anonymous Anonymous Anonymous Anonymous Anonymous ORF (mtDNA)

Won et al. (2006) Sides and Lydeard (1999) Hassan et al. (2002) ‘‘ Li et al. (2010) ‘‘ ‘‘ ‘‘ ‘‘ ‘‘ ‘‘ ‘‘ Zardoya et al. (1996) This study ‘‘ ‘‘ ‘‘ ‘‘ ‘‘ ‘‘ ‘‘ Willis et al. (2010)

743 747 374 319 632 1099 870 407 766 458 592 807 337 516 380 378 797 494 504 484 765 842

24 32 5 6 20 40 30 12 29 9 11 9 7 9 15 7 9 19 14 16 38 153

HKY HKY+I F81 F81 HKY+I K2P HKY+I F81 HKY HKY HKY JC K2P HKY F81 F81 F81+I HKY F81 JC HKY+I GTR+I+C

0.146/0.092 0.176/0.086 0.065/0.046 1/1 0.094/0.071 0.051/0.143 0.036/0.122 0.045/0.223 0.024/0.024 0.157/0.157 1/0.043 1/1 0.051/0.051 0.386/0.028 0.53/0.016 1/1 0.201/0.201 0.085/0.085 1/0.06 0.46/0.155 0.007/0.001 –

0.95/0.98 0.98/0.86 0.99/0.99 0.99/0.99 0.09/0.99 0.96/0.99 0.99/0.99 0.99/0.99 0.99/0.99 0.99/0.99 0.93/0.99 0.86/0.99 0.99/0.99 0.99/0.99 0.98/0.99 0.99/0.99 0.99/0.99 0.73/0.77 0.74/0.99 0.99/0.84 0.92/0.0002 –

a

Number of variable sites. Shimodaira–Hasegawa test of constrained and unconstrained topologies for direct sequences and alleles. c Likelihood ratio test of the molecular clock hypotheses on constrained topologies of direct sequences and alleles. Bold values are those significant after Bonferroni correction. b

not contain microsatellites, or which provided a large section flanking the microsatellite region, we searched for similar sequences on the NCBI nucleotide database using the BLAST

algorithm (http://ncbi.nlm.nih.gov). To ensure that selected loci had a low degree of conservation and were unlikely to derive from an expressed or multi-copy gene, we excluded those loci that

Please cite this article in press as: Willis, S.C., et al. Multi-locus species tree for the Amazonian peacock basses (Cichlidae: Cichla): Emergent phylogenetic signal despite limited nuclear variation. Mol. Phylogenet. Evol. (2013), http://dx.doi.org/10.1016/j.ympev.2013.07.031

4

S.C. Willis et al. / Molecular Phylogenetics and Evolution xxx (2013) xxx–xxx

found significant matches from the BLAST search, matches that often included African cichlid taxa for which partial or complete genomes or large EST libraries were available. For those loci with no significant matches, we designed PCR primers to amplify a section of DNA between 300 and 1000 base pairs. Using primers that appeared to amplify their target loci, we amplified these loci in 2, 7, or all 38 individuals of Cichla sequentially, beginning with representatives of more distantly related species, and discontinuing development of those loci that exhibited properties of multi-copy genes (e.g. sites that were heterozygous in all individuals), had multiple null (non-amplifying) alleles, or were completely invariant between the more disparate Cichla taxa. The resulting ‘anonymous’ or ‘microsatellite flanking’ loci were those that amplified reliably and were clearly homologous in Cichla but were not conserved among other cichlids, and exhibited properties of singlecopy nuclear loci in these fishes. Another large class of markers were exon-primed, intron-crossing markers developed through the comparison of fish genomes (Li et al., 2010). These are markers whose primers are situated in conserved, protein-coding elements of the genome but are designed to amplify an intron whose location is conserved, but whose sequence is potentially less so. However, while these loci may be conserved in the majority of the available genomes, it requires confirmation that each marker is present in any particular taxon, and that available primers reliably amplify only that locus. Thus, as with the anonymous loci, we sequentially amplified individuals and discarded loci that were not homologous with the target locus or showed pervasive heterozygosity. Primers for loci developed here or optimized from Li et al. (2010) are listed in Supplemental Table 1 (online). Amplification for Mitf and Xsrc were described before (Willis et al., 2012). For all other loci, PCR reactions contained 20 mM Tris–HCl (pH 8.4), 50 mM KCl, 2.0 mM MgCl2, 200 lM each dNTP, 0.1 lM each primer, 1.5 lL of 20 mg/mL bovine serum albumin (Fermentas), 0.5 U of Takara ExTaq polymerase (proof-reading exonuclease activity), and 3–4 lL DNA extract (10–50 ng/lL) in 30 lL reaction volumes. Thermal cycling used a touchdown protocol of 1 min at 94 °C, 35 cycles of 30 s at 94 °C, 30 s at X °C, and 1.5 min at 72 °C, followed by 10 min at 72 °C., where X was 58 °C for 15 cycles and 54 °C for 15 cycles. Amplicons were sequenced in both directions at the University of Washington High Throughput Genomics Unit (http://htseq.org). Where necessary, such as in the case of heterozygous insertion-deletion mutations, amplicons were cloned via bacterial transformation using the pGEM-T vector (Promega), and between two and ten clones were sequenced to recover both alleles. Sequences for individual loci were assembled using CodonCode Aligner (CodonCode Corp.). Sequences have been submitted to Genbank (JQ926929-JQ926982, KF299741KF300519). Two datasets were constructed. First, sequences for each locus, inferred directly from the chromatograms with IUPAC ambiguity codes and without regard to diploid status (i.e. one per individual, hereafter ‘direct sequences’), were exported from CodonCode Aligner. When necessary (see below), we concatenated datasets using MacClade 4.08 (Maddison and Maddison, 1992), treating each locus as a character set. Second, the two separate diploid gene copies from individual loci (hereafter ‘alleles’) were estimated using Phase (Stephens et al., 2001) and a phase probability of 0.6. We did not include cloned sequences of known phase to inform the analyses (these individuals were entered as unphased), but where appropriate, we compared the posterior Phase estimates with these known sequences. Where phase probability was less than 0.6, we left these sites as ambiguous (Garrick et al., 2010); these largely corresponded to heterozygous singleton sites (mutations only seen in a single individual and which could not be ascribed to either of the two alleles shown by that individual without ambiguity).

2.2. Analysis of concatenated data To estimate the species-level phylogeny of Cichla, we performed a Bayesian analysis of the concatenated loci using MrBayes v3.1.2 (Ronquist and Huelsenbeck, 2003). We allowed each locus in our alignment to reside in a separate partition, and each partition was allowed to have its own mutation rate and parameters. Because sites in each of our loci are physically linked but do not otherwise experience an explicit constraint structure (i.e. most of our loci are introns or intergenic regions, rather than protein-coding genes), this partitioning scheme should accommodate much variance among loci while inferring a common phylogenetic history (e.g. Li et al., 2008). Models of molecular evolution for each character partition (locus) were chosen using the Akaike information criterion (AIC) in MrModeltest (Akaike, 1974; Nylander, 2004). Each partition was allowed unlinked model parameters and mutation rates. Branch lengths used the default exponential prior of 10, and all other model parameters were left as default except for the transition/transversion ratio of TmoM27, which we found difficult to estimate effectively as evidenced by high variance in the posterior distribution. Thus, we applied a Beta prior of (6,3) that approximated the observed ratio of transitions to transversions following recommendations on the MrBayes website (http://mrbayes.csit.fsu.edu/Help/prset.html). We made two simultaneous runs in MrBayes, each with 15 chains and a heating value of 0.1. Each analysis was run for 10 million generations, sampling every 1000 generations, and discarded the first 5 million generations as burn-in. We ensured that parameters and topology had reached a stable distribution using Tracer v1.4.1 (Rambaut and Drummond, 2007) and the online version of AWTY (Nylander et al., 2008), respectively. To estimate the appropriate root of this tree, we took two approaches in two additional analyses. First, we used data from another Neotropical cichlid fish, the putative sister-group of Cichla, Retroculus (López-Fernández et al., 2010), but were only able to obtain data for 5 of the 21 genes: Mitf, Xsrc, TmoM27, GnRH3i1, and Gpd2i1. This analysis was run as above. However, as this taxon is quite divergent from Cichla, we also performed analysis of the concatenated dataset using a molecular clock prior on branch lengths. It has been suggested that this method provides an adequate estimation of the phylogenetic root (Huelsenbeck et al., 2002). We tested the appropriateness of the ultrametric assumption in two ways. First, we calculated the Bayes Factor comparing the two models using the concatenated data (Nylander et al., 2004). All harmonic means and Bayes Factors in this study were calculated using Tracer. Second, we examined the enforcement of a molecular clock on genealogies for individual loci using PAUP* v4b10 (Swofford, 2002) and the optimal models and parameters estimated with MrModeltest. Topologies used for each locus were those recovered from a constrained maximum likelihood search (see below). Comparisons were performed with both direct sequences and alleles, and the significance of the differences was estimated using a likelihood ratio test of nested models (Huelsenbeck and Bull, 1996). 2.3. Partitioned phylogenetic signal in the concatenated analysis In a concatenated analysis, it is important to understand how each locus contributes to the combined phylogenetic signal, especially when there is limited variability per locus and variation in number of informative sites among partitions. One possibility is that the most variable loci dominate the analysis, while signal from the less variable loci is swamped. Alternatively, some loci may increase the variance (i.e. non-systematic error, or ‘noise’) in the information such that the addition of loci actually decreases the phylogenetic signal to noise ratio in the data, decreasing support for the optimal tree (O’Neill et al., 2013). We tested these

Please cite this article in press as: Willis, S.C., et al. Multi-locus species tree for the Amazonian peacock basses (Cichlidae: Cichla): Emergent phylogenetic signal despite limited nuclear variation. Mol. Phylogenet. Evol. (2013), http://dx.doi.org/10.1016/j.ympev.2013.07.031

S.C. Willis et al. / Molecular Phylogenetics and Evolution xxx (2013) xxx–xxx

conjectures in three ways: (1) topology tests of constrained phylogenetic searches, (2) reduced datasets of randomly chosen loci, and (3) reduced datasets with the most variable loci (described in the following three paragraphs). First, we desired to know if individual gene trees estimated without the influence of other loci would explain those individual datasets significantly better than if each locus was constrained to have a topology similar to the tree recovered with the combined, concatenated dataset. In other words: do individual gene trees disagree with the concatenated tree? If we discovered such a discrepancy, it would suggest that there was unrecognized conflict among loci that was being ignored (or averaged out) in the combined analysis, and which may have important consequences for the accuracy of our phylogenetic estimate. To answer this, we made constrained searches of genealogies for the individual loci under maximumlikelihood and compared these to genealogies from unconstrained searches using the Shimodaira–Hasegawa (SH) test (Shimodaira and Hasegawa, 1999). Constraints were designed to test the support in each individual genealogy for the phylogenetic relationships among delimited species estimated from the full, concatenated dataset (i.e. we did not require species to be monophyletic in the gene trees; see Section 3). Constrained and unconstrained heuristic searches for each locus were conducted in PAUP* using mutation parameters estimated with MrModeltest. Searches consisted of 10,000 random-addition replicates, each limited to 10,000 branch-swapping iterations. Trees for each locus were compared with SH tests in PAUP* using 1000 resampling of estimated log likelihoods (RELL) replicates. Searches were made for both direct sequences and alleles. While gene tree-species tree conflict is concerning, at the level of divergence observed in our loci, it is uncertain how much apparent conflict among individual genealogies and with the combined tree is a product of real phylogenetic signal versus estimation error due to mutational homoplasy. A more pertinent question is ‘how do these loci interact in a combined analysis to direct the final outcome?’ Traditional phylogenetic theory suggests that total evidence approaches facilitate the emergence of a common phylogenetic signal while the influence of hypothetically random noise is diminished (Baker and DeSalle, 1997; Wenzel and Siddall, 1999). If this is true, then as loci are added to the analysis, they should increase the phylogenetic signal irrespective of variability, and increase the accuracy of the analysis (e.g. as measured by posterior support or variance). However, intuitively it would seem that there must be a minimum average level of variability in the data that is necessary for phylogenetic signal to become emergent, and if the loci used are below this threshold, then adding more loci could actually cause the optimality scores of the combined tree to decrease compared to trees from less data (e.g. O’Neill et al., 2013; Yang, 1998). In order to understand whether this was true for our dataset, we evaluated the changes in phylogenetic signal among reduced combinations of loci. We created jackknife (random deletion) replicates of loci to examine the variation in phylogenetic signal for our unrooted tree. Using Microsoft Excel and PAUP*, we created 10 datasets each of 5, 10, or 15 randomly chosen loci and concatenated these together. We analyzed these jackknifed matrices in MrBayes runs of 5 million generations, sampled every 500 generations and discarded the first 2.5 million generations as burn-in. Each analysis used two simultaneous runs with 15 chains heated at 0.1. We then used topological filters in PAUP* to estimate the proportion of trees in the posterior distribution of each replicate that recovered each clade from our 21-locus concatenated tree (i.e. posterior probability for branches in the full dataset tree from each sub-matrix). As we observed posterior probabilities for branches that were slightly higher for the full dataset when the molecular clock was enforced (see Section 3), indicating steeper ‘hills’ in ‘tree-space’ with this model, we used this same prior here,

5

expecting that the resulting congruence/incongruence between analyses would be more discrete with this model. In addition to posterior probability, we also calculated the K tree score, which measures the difference in topology and scaled branch lengths between a comparison and reference tree, with larger values representing greater discrepancy (Soria-Carrasco et al., 2007). To do this, for each replicate we calculated the maximum credibility tree (the single topology with the maximum product of compatible posterior clade probabilities, hereafter MC tree) with node depths averaged over the entire posterior using TreeAnnotator v1.5.3 (Drummond and Rambaut, 2007). The MC tree of each replicate was compared to the MC tree of the full dataset, and the K score for each replicate calculated. While the MC tree provides a single point estimate of topology and branch lengths from the posterior to compare between analyses, we expected that these averages would be a better descriptor of the central tendency of the posterior than calculating K distances from a small random sample of trees. The conjecture tested above is that phylogenetic signal increases as loci are added to the analysis, implying that any conflicts between the reduced matrices and the 21-locus matrix arise from inadequate signal to noise ratios in the reduced datasets. However, an alternative possibility is that some of the reduced matrices, those with more variable loci, are recovering the ‘true’ phylogeny, while the combined matrix analysis is pulled towards the ‘wrong’ tree by mutational noise (homoplasy) introduced by the less variable loci. In essence, this is the reasoning behind choosing the most variable loci available for a species-level phylogenetic analysis, and if correct, would imply that there is an advantage to using fewer, but more variable loci rather than more, but on average less variable loci. To test this and observe how phylogenetic signal among randomly chosen loci compared to loci chosen for having aboveaverage variability, we created matrices with the 5, 10, and 15 most variable loci (hereafter, ‘reduced-but-more-variable’ matrices) and performed MrBayes runs with identical conditions and constraints. As before, we calculated posterior support in these datasets for branches from the 21-locus concatenated tree, and the K score of each reduced-but-more-variable MC tree compared to the 21-locus tree. More importantly, if the above conjecture (that the more variable matrices recovered the true tree and the larger matrix was mislead by noise) were true, one would expect phylogenetic signal from the reduced-but-more-variable matrices to be stronger (i.e. posterior tree space to be less ‘diffuse’ or ‘flat’) than for the full matrix, because there is less noise to mislead the analysis. Thus, we also examined the posterior signal in each of these reduced-but-more-variable matrices for their own optimal trees. Specifically, we observed the frequency of the most common tree in the posterior distribution for each analysis (a.k.a. the maximum a posteriori tree, or MAP tree) from the MrBayes output. If tree space is more acute for a matrix, we expect fewer trees to make up a larger proportion of posterior tree space (i.e. for the posterior to be more skewed). Similarly, we calculated the highest log clade credibility (log product of clade posteriors in the MC tree) for each of the four matrices (three reduced-but-more-variable and one full) using TreeAnnotator. Larger (less negative) log products indicate higher posterior support for clades in the MC tree, indicating less uncertainty about branching patterns in the posterior tree distribution. Both of these statistics give an estimate of how acute or flat ‘tree-space’ is for each dataset, or in other words, how well do the inferred phylogenies explain their matrices. 2.4. Bayesian concordance analysis Our Bayesian concordance analysis was implemented using BUCKy v1.4 (Ane et al., 2007). This analysis depends on examining the frequency with which the same clades are observed across

Please cite this article in press as: Willis, S.C., et al. Multi-locus species tree for the Amazonian peacock basses (Cichlidae: Cichla): Emergent phylogenetic signal despite limited nuclear variation. Mol. Phylogenet. Evol. (2013), http://dx.doi.org/10.1016/j.ympev.2013.07.031

6

S.C. Willis et al. / Molecular Phylogenetics and Evolution xxx (2013) xxx–xxx

individual genealogies, and requires summarizing the posterior tree distribution of each locus by identifying the frequency of unique topologies using mbsum. To do this, we made two simultaneous runs in MrBayes with all 38 taxa (direct sequences) for each gene, using the models implemented in the concatenated analysis. As these matrices were smaller than the previous ones, each of these analyses included 10 chains with a heating of 0.15 and was run for 30 million generations, sampling every 500 generations. These analyses converged very early, so we discarded only 10,000 of 60,000 trees from each run as burn-in, providing 100,000 trees per locus. However, in our initial tests using 38 taxa, we discovered that almost all topologies were unique within single-locus posterior tree samples. We determined that this result stemmed from the sharing of identical sequences across individuals and species and the fact that MrBayes always samples fully resolved trees, meaning that a great number of random topologies were generated for those 3+ individuals that shared alleles. We worked around this issue by reducing the number of terminals analyzed in our BCA to one representative for each of the 15 described species, and two for C. pinima (see Section 3). After pruning all but these 16 taxa from the 21 individual locus trees using PAUP*, these trees were summarized using mbsum (which determines the frequency of each topology in the posterior distribution), and the 21 tree summaries were used as input for BUCKy. This program was run for 1 million generations with 10 chains after 100,000 generations of burn-in. The parameter, the prior prediction for the level of concordance among separate genealogies, was left as the default 1.0 for these analyses. For 16 taxa and 21 loci, this prior produces a distribution of 1–8 topologies with a mode of 3 different underlying genealogies among the 21 loci. The output of BUCKy is the primary concordance tree from these loci, which includes the mean proportion of loci supporting each branch in the tree. 2.5. Coalescent species trees While a Bayesian concordance analysis builds a phylogenetic tree from the posterior distributions of separately estimated gene trees, and thus incorporates some uncertainty in those estimates into the analysis, it does not use any other model to assist in evaluating the probability of those gene trees other than the observation of common topologies among loci (Ane et al., 2007). A coalescent-based species tree model (Rannala and Yang, 2003), on the other hand, estimates the posterior distribution of the gene trees in conjunction with the species tree in which they are nested, including the divergence times and population sizes of that coalescent model. Using the coalescent species tree procedure in Beast (‘‘*Beast’’) (Heled and Drummond, 2010), we compared the species trees estimated under three dataset arrangements, each of which used allelic sequences as input. First, we estimated the phylogeny of the delimited species of Cichla using the data from one individual per species (two alleles) under both relaxed and strict molecular clock models (Drummond et al., 2006), and for those with strict clocks, we ran models using continuous with constant root in addition to the constant population sizes used in all other analytical constructions. All models were run with a birth-death prior on node density (branching rates). To test the applicability of the strict molecular clocks, in addition to the likelihood ratio tests of constrained genealogies of alleles (see above), we observed the posterior distributions of the coefficientofvariation parameters for each locus under uncorrelated lognormal relaxed clock models. This parameter represents the degree of variation in rates among branches, and when the posterior distribution of this parameter abuts zero, it implies that the data cannot reject a strict clock model. Two runs for each model were run for 500 million generations, sampling every 10,000 generations, and discarding the first 25,000 trees of each run. We observed effective sample size (ESS) values

for each run using Tracer (with 50% burn-in), calculated the maximum clade credibility tree using Treeannotator, and estimated branch support for this tree using PAUP*. A remaining question about coalescent species trees, however, is how sensitive they are to the choice of operational taxonomic units (OTUs), that is, to what are being assumed to be ‘species’. While these analyses provide for gene tree heterogeneity in the analysis, and in fact use these patterns to estimate species tree attributes, presently most of these analyses assume that all gene tree heterogeneity results from the deep coalescence of lineages (i.e. retention of ancestral polymorphism) and not interspecies gene flow (but see Meng and Kubatko, 2009). Depending on the pattern of gene sharing in the data, the inclusion of introgressed gene lineages in the analysis can produce inaccurate results by presenting an inflated pattern of deep coalescence in the data (Eckert and Carstens, 2008; see also Willis, 2011). To estimate the effects of OTU choice on species tree estimates (see Introduction), we also ran two additional analyses with all available nuclear data using strict molecular clocks and constant population sizes. The first used delimited species as OTUs (8), and the second used the described species as OTUs (15). Each analysis was run as above.

3. Results We were able to obtain a sequence for every individual for every locus except one, Cichla pinima from the Paru for locus 1835e6, for a dataset with only 0.12% missing data. Sequences from the 38 individuals showed a range of variability across loci (Table 2), but the number of variable sites was overall relatively low for molecular phylogenetic data (5–40 substitutions per locus, median 14; 1.1–5.0% of sites; total of 218 parsimony informative sites), especially in comparison to mitochondrial data for these same individuals (153 substitutions, 18% of sites; Willis et al., 2010). Not surprisingly, models of evolution chosen for these loci were relatively simple, with the most complex being the HKY model plus a proportion of invariant sites. Bayesian analysis of the concatenated 21-locus dataset with MrBayes provided a posterior distribution with a harmonic mean log likelihood of –20,286.9 (Fig. 1) (Treebase 14549). The rooting of this tree was supported by both the outgroup analysis (21022.7) and molecular clock analyses (20294.4) in MrBayes (Supplemental Fig. 2). Comparison of the clock and unrooted (non-clock) analyses using Bayes Factors (2B10 = 15) provided ‘‘very strong’’ support for rejection of the molecular clock, following the updated table in Nylander et al. (2004). However, when individual loci were tested for clock-like nature using constrained genealogies in a likelihood ratio test, only one locus showed a significant deviation, and then only for phased alleles (Table 1). We repeated our analyses without this locus (CteOI2) and obtained qualitatively similar results. Tree length for the posterior distribution was quite short overall (mean TL = 0.0399, with a 95% highest posterior density, or HPD, of 3.6–4.4  102), particularly compared to the trees including the outgroup Retroculus (mean TL = 0.13). Interestingly, the molecular clock tree was shorter than our unrooted tree (mean TL = 0.0315, with a 95% HPD of 2.8– 3.5  102). The phylogeny from the unrooted analysis of concatenated nuclear data showed great similarity to previous mtDNA genealogies of Cichla (Willis et al., 2010; Willis et al., 2012). It recovers the two main clades of Cichla (clade A and clade B) and monophyly of the eight delimited species. Unlike the mtDNA tree, however, it fully resolves the relationships of species in clade A. Importantly, the named units within C. pinima sensu lato (C. pinima, C. thyrorus, C. vazzoleri, and C. jariina) are divided into two mixed clades, not unlike the mtDNA tree or microsatellite clusters for these populations

Please cite this article in press as: Willis, S.C., et al. Multi-locus species tree for the Amazonian peacock basses (Cichlidae: Cichla): Emergent phylogenetic signal despite limited nuclear variation. Mol. Phylogenet. Evol. (2013), http://dx.doi.org/10.1016/j.ympev.2013.07.031

S.C. Willis et al. / Molecular Phylogenetics and Evolution xxx (2013) xxx–xxx

7

Fig. 1. Phylogram of the maximum credibility tree from the unrooted (no outgroup, no molecular clock) MrBayes analysis of 21 nuclear loci, with branch lengths averaged across the posterior sample. Values associated with nodes are posterior probability without/with the molecular clock prior. Asterisks denote nodes that were retained in the constraint tree for the individual locus searches (see text). ° Taxa representatives in the Bayesian concordance analyses. Taxa representatives of the species complexes in the one individual (two allele) coalescent analyses.

(Willis et al., 2012). Similarly, while the named subunits of C. ocellaris sensu lato form a clade (C. ocellaris s.s., C. pleiozona, C. kelberi, C. monoculus, C. nigromaculata), Cichla monoculus, a geographically widespread species, was paraphyletic with respect to C. ocellaris sensu stricto and C. nigromaculata, both geographically restricted species. Also, in contrast to the mtDNA genealogy, C. kelberi has taken the place as most-divergent subpopulation rather than C. pleiozona. Also noteworthy is the recovery of C. orinocensis as monophyletic. Even though it was observed to be a cohesive species using microsatellite data, it exhibits two polyphyletic groups of mtDNA, one of which is nested among the C. ocellaris s.l. populations (Willis et al., 2012; Willis et al., 2007). Trees from the outgroup and molecular clock analyses were very similar to the unrooted tree, except that the molecular clock tree recovered a different topology within C. pinima s.l. (Fig. 1, node d), and that the outgroup-rooted tree recoveredC. orinocensis as paraphyletic with respect to C. intermedia (not shown). Support for most branches was slightly to moderately higher in the molecular clock analysis than the unrooted tree. Similarly, the MAP tree (the most common tree topology in the posterior tree sample) of the unrooted analysis was encountered 1.9% of the time, while for the clock model this was 10.3% of the posterior. This suggests that the optimal tree topology island is much smaller or more acute for the clock model than its unconstrained counterpart. Topologies of direct sequences and alleles from individual loci in unconstrained ML analyses all showed several to many polytomies among individuals of delimited species, frequently deep in the tree; a majority-rule consensus of these trees was a comb, with no resolved branches (not shown). Topologies of individual loci that were constrained to match species relationships depicted by the 21-locus concatenated tree showed a range of incongruence

with their unconstrained counterparts (Table 1). Constrained trees for several loci were significantly less likely at an alpha of 0.05, but only one locus (CteOI2) was significantly incongruent after a Bonferroni correction for multiple related tests (operational p-value = 0.05/21 = 0.0024). Moreover, it is difficult to estimate the impact of noise to signal in these analyses with so few variable sites. Therefore we examined the support for our concatenated tree using different subsets of data under a molecular clock. Importantly, we observed that posterior probability for the clades in

Fig. 2. Graph of posterior support for the 21-locus concatenated molecular clock topology using reduced or full datasets (mean posterior support across nodes; dashed line) and K scores between MC trees of each replicate and the full dataset (solid line). For the random locus replicates, points are means across replicates with 95% confidence intervals indicated.

Please cite this article in press as: Willis, S.C., et al. Multi-locus species tree for the Amazonian peacock basses (Cichlidae: Cichla): Emergent phylogenetic signal despite limited nuclear variation. Mol. Phylogenet. Evol. (2013), http://dx.doi.org/10.1016/j.ympev.2013.07.031

8

S.C. Willis et al. / Molecular Phylogenetics and Evolution xxx (2013) xxx–xxx

our 21-locus molecular clock tree increased as the number of loci analyzed increased (Fig. 2). As portrayed by the K values, the MC trees from each replicate differed from the 21-locus tree, although the discrepancy decreased as more loci were added. Interestingly, the posterior probability from the 10 most variable loci was not significantly different from that from the 15 random loci (although this may have been conflated by a limited total number of loci). Nevertheless, even the 15 most variable loci (which portrayed virtually the same topology) did not exhibit the same level of posterior support as our 21-locus analysis, implying that the least variable 6 loci nevertheless added important phylogenetic signal to the dataset. We saw a similar pattern with the K scores: for each dataset size, more variable loci were superior to random loci, but even the largest and most variable replicate showed a degree of discrepancy with the full dataset. However, as discussed above, an alternative hypothesis is that the reduced-but-more-variable datasets could be recovering the true tree, while the less variable loci serve to disrupt and pull the analysis towards a tree that is less optimal for those datasets. This would imply that these reduced datasets would exhibit more posterior support for their respective optimal trees than the 21-locus dataset shows for its tree; that is, the tree space for the reducedbut-more-variable datasets should be more acute than that of the full dataset. In fact, however, not only was each set of reducedbut-more-variable loci less supportive of our 21-locus tree, but each was also less supportive of their own ‘optimal’ trees. For example, the MAP trees for the 5, 10, and 15 locus reduced-butmore-variable analyses only made up 0.1%, 0.8%, and 1.8% of each posterior distribution, respectively (versus 10.3% for the 21-locus tree), and none of these topologies matched the maximum credibility (MC) tree from each respective posterior (Supplemental Fig. 3). For both the clock and unrooted analyses of the full dataset, the MAP tree was also the MC tree. Similarly, the highest clade credibility (the log product of posterior probabilities for clades in the MC tree, where less negative values imply higher posterior branch support) was 8.59, 5.79, 4.95, and 2.37 for the 5, 10, and 15 most variable and full, 21-locus analyses, respectively, showing that posterior support for branches was more diffuse in each of the reduced data analyses compared to the full dataset. These results strongly imply that less variable loci have important phylogenetic signal to contribute, and that it is not completely obscured by mutational noise. The primary concordance tree from the Bayesian concordance analyses showed general congruence with our concatenated trees, especially with regard to the delimited species (Fig. 3). One significant incongruence was the topology within the C. ocellaris sensu

Fig. 3. Primary concordance tree from the Bayesian concordance analyses (16 OTUs). Values associated with branches are the proportion of loci supporting a branch.

lato group, where C. pleiozona replaced C. kelberi on the first branch to diverge (as in the mtDNA genealogy; Willis et al., 2010). However, as in the concatenated tree, we found that C. pinima sensu lato was still divided into two groups, with C. pinima sensu stricto polyphyletic between these. In this tree, concordance values for support among loci were variable, but overall quite low, with the lowest support associated with branches that were observed less often in the concatenated analyses also (Fig. 3). Using BEAST, we determined that strict molecular clocks were satisfactory for this dataset, as the posterior for the coefficientofvariation for each locus in the relaxed clock analysis included zero. We also saw no significant differences in topology or branch confidence between different population models (not shown). The topologies in the analyses of delimited species with one individual per species (–LnL 18,785.314), and all data (–LnL 20,707.348), showed identical topologies, and these matched the relationships among species inferred from the concatenated and concordance trees (Fig. 4). Posterior branch support for several of those nodes found in common were slightly lower than in the concatenated trees, however, and slightly lower in the BEAST analysis of all data than of one individual. On the other hand, while confidence intervals on node ages were in general quite large for both coalescent analyses, they were moderately smaller with all data versus one individual (mean width of 95% HPD interval 0.00068 versus 0.00121 mutations/site, respectively; paired t-test, p = 0.0133), although this was offset by a slightly deeper root depth in the one individual analysis (mean width of 2.08  103 mutations/site, 95% HPD 8.39  104 to 3.53  103) than the analysis with all data (mean width of 1.43  103 mutations/site, 95% HPD 7.55  104 to 2.20  103). Confidence intervals from the analysis with described species as OTUs were similarly large (mean width of 0.00064 mutations/site), and the means of node ages for nodes shared with the analysis with delimited species (all data) were not significantly different (paired t-test, p = 0.1408). Also of note, there were no major changes in posterior branch support for branches in common between the analyses with delimited or described species as OTUs, but the branches within the sensu lato

Fig. 4. Phylograms of the maximum credibility trees from the *BEAST multi-species coalescent analyses. (a) Delimited species OTUs. (b) Described species OTUs. Bars portray the 95% highest posterior densities on node ages (ultrametric branch lengths). Values associated with branches are percent posterior probability (for delimited species: support from analyses with one representative/all data).

Please cite this article in press as: Willis, S.C., et al. Multi-locus species tree for the Amazonian peacock basses (Cichlidae: Cichla): Emergent phylogenetic signal despite limited nuclear variation. Mol. Phylogenet. Evol. (2013), http://dx.doi.org/10.1016/j.ympev.2013.07.031

S.C. Willis et al. / Molecular Phylogenetics and Evolution xxx (2013) xxx–xxx

species were quite short and/or generally not well supported (Fig. 4). In this tree with described species as OTUs, C. pleiozona was once again the first to diverge within C. ocellaris s.l., as in the concordance tree and mtDNA genealogy, not C. kelberi, as in the concatenated nuclear tree.

4. Discussion 4.1. Species-level phylogenetics The inference of evolutionary relationships among closely related species is an important step in recovering the tree of life, but species-level phylogenies present special challenges. Among these are the gene tree-species tree conflicts that result from porous species boundaries and the retention of ancestral polymorphism (Maddison, 1997; Maddison and Knowles, 2006). Even when species do not exchange genes among them, short times between speciation events and large population sizes decrease the chance that ancestral variation will sort to fixation between successive speciation events (Pamilo and Nei, 1988). The result is that gene trees may not match the underlying branching patterns and divergence times between species, although more recent simulation studies have suggested that given realistic mutation rates, the estimates of gene trees from species trees with short coalescent branches are more likely to be polytomies than the incorrect topologies (Huang and Knowles, 2009). Nevertheless, it is important to understand how the phylogenetic signal from unlinked loci compare in combined and separate analyses (Bull et al., 1993). In combined analysis, we saw that although there was a degree of diminishing returns as loci were added to the analysis, these loci nevertheless contributed important phylogenetic signal to the dataset, even when they were less variable loci (Fig. 2). Moreover, the most variable loci did not dominate the analysis, as only the 15-locus reduced-but-more-variable dataset recovered the species relationships inferred from the analyses of all data, and even then with lower branch support than the full dataset. It remains to be seen if additional loci could change the recovered topology, but we observed that most nodes in the tree were very well supported, especially under a molecular clock prior. In separate analyses, we saw that concordance across loci was apparently quite low, although this result might be deceptive. When individual loci were compared to our 21-locus concatenated tree using constrained heuristic searches and topology tests, most loci showed some degree of incongruence (Table 1). However, when phylogenetic signal was compared while taking into account uncertainty in the topologies of individual loci, as in our Bayesian concordance analysis, a phylogeny very similar to our concatenated tree emerged, only differing in the topology within species complexes. Although support for individual branches in this concordance tree were relatively low, we suspect that the low concordance values result, at least in part, from the relatively low number of variable sites across loci and the retention of ancestral alleles among species. As concordance is measured as the proportion of loci exhibiting a clade in their posterior distributions, it is important to note that, in addition to conflicting clades among gene trees, lack of resolution for a group of species in a gene tree (sharing alleles, i.e. polytomies in gene trees) might also result in low concordance values. We observed that phylogenetic estimates employing coalescent models exhibited relationships among delimited species identical to those from the concatenated trees and concordance analyses (Fig. 4). Where these analyses diverged was in the relationships among populations that we recently determined to be synonyms or ESUs. In particular, placement of C. kelberi and C. pleiozona within C. ocellaris sensu lato differed between the concatenated and coa-

9

lescent analyses, with relatively high posterior branch support in each. While this may suggest that these branches fall in a zone of statistical inconsistency for concatenated analyses (Kubatko and Degnan, 2007), we interpret this incongruence to reinforce an intuitive but neglected conclusion: that the choice of OTUs for specieslevel phylogenies is fundamental. While newer phylogenetic methods accommodate gene tree conflicts resulting from the retention of ancestral polymorphism, incorporation of taxa that have experienced post-speciation gene exchange can produce unpredictable results (Eckert and Carstens, 2008). We suggest that where OTUs are subject to fluctuating taxonomy or porous species boundaries, conflicts of this sort are to be expected, and do not necessarily imply a failing of the methods but rather violations of their assumptions. Given adequate species OTUs, a likely goal of many species tree analyses will be to time calibrate these phylogenies and use them as the basis for evolutionary analyses. However, one observation from our coalescent analyses was the lack of confidence in node ages (Fig. 4). The confidence intervals on node ages were relatively large in these analyses, and improved only slightly with the addition of more individuals. Adding more individuals to a dataset has repeatedly been observed to provide increased accuracy in species tree estimation (e.g. Hird et al., 2010; Huang et al., 2010; Maddison and Knowles, 2006), though precision has been less thoroughly examined (but see below). The wide confidence intervals we observed imply that branch lengths and node ages in these analyses are susceptible to influence of the branch length prior (here, the birth-death prior) (Brown and Yang, 2010). While using strict molecular clocks in a partitioned analysis of concatenated data produced smaller confidence intervals (Suppl. Fig. 1), this procedure ignores the variation in topology and coalescence times among unlinked loci (Edwards and Beerli, 2000) and, moreover, does not guarantee that node ages are estimated without excessive influence from the prior. Indeed, in a dated analysis of concatenated data from 20 of the genes from the present dataset, we found that node ages from a constrained topology were very similar to those predicted by the branching priors (Yule or birth-death) with an empty dataset, despite using strict molecular clocks for each locus (Willis, 2011). This is in accordance with observations by a study of simulated data from recent divergences (Brown and Yang, 2010), wherein it was found that due to the restricted degree of mutational variation in most loci from such phylogenies, dated analyses were very susceptible to prior influence. These authors recommended the inclusion of genes with a relatively high mutation rate (>0.01 mutations/site/million years), whereas the estimated mutation rate for our data was much less (0.00026 mutations/site/million years; Willis, 2011). For most taxa, as Brown and Yang (2010) suggest, this probably means including an mtDNA locus in the dataset. However, due to it is proclivity for crossing species boundaries, many have now become suspicious of the accuracy of mtDNA in representing phylogenetic history (e.g. Funk and Omland, 2003). Thus, without the inclusion of fast-evolving cytoplasmic DNA, it remains to be seen whether it will be possible to infer divergence times with useful confidence intervals and without excessive influence from the priors based on nuclear loci that exhibit the levels of variation seen in our dataset. The paucity of nucleotide variation among species at nuclear markers is a fundamental challenge for the recovery of shallow phylogenies, potentially masking or artificially implying conflict among gene trees and reducing confidence in node age estimates. There has been considerable discussion in the systematic literature regarding the use of rapidly evolving versus more slowly evolving molecular markers for phylogenetic estimation, particularly with respect to how rapidly evolving loci are subject to saturation (recurrent mutation), compositional or rate heterogeneity, etc.,

Please cite this article in press as: Willis, S.C., et al. Multi-locus species tree for the Amazonian peacock basses (Cichlidae: Cichla): Emergent phylogenetic signal despite limited nuclear variation. Mol. Phylogenet. Evol. (2013), http://dx.doi.org/10.1016/j.ympev.2013.07.031

10

S.C. Willis et al. / Molecular Phylogenetics and Evolution xxx (2013) xxx–xxx

and the best strategies at extracting phylogenetic signal from these loci to reconstruct branches deeper in the phylogenetic tree (e.g. Hasegawa and Hashimoto, 1993; Steel et al., 1993). However, there has been much less examination of loci with very few variable sites other than a recognition that it takes a certain number of characters to build a phylogenetic tree and attain adequate accuracy (e.g. Yang, 1998). One strategy for dealing with this dearth of useful variation is to screen candidate nuclear loci for mutations and preferentially use those loci that exhibit more variation, which can provide greater resolution, if not higher confidence, in individual gene trees. However, according to our results, this strategy makes sense only insofar as it does not come at the cost of using more total loci. This is in keeping with results of a recent study wherein the authors examined the effects of the number of loci, number of individuals, and number of nucleotide sites on recovery of an empirical species tree using the multi-species coalescent model (Carmago et al., 2012). As expected, this study reported that more loci produced more accurate results, although they observed that accuracy (as measured by K distances) peaked after the inclusion of only 8 loci (of 20), while we observed posterior support to increase and K distances to decrease until the inclusion of all 21 loci. In our data, although analysis of the 15 most variable loci produced a tree with the same relationships for delimited species as the full-dataset tree, this reduced dataset produced lower posterior branch support than the full dataset (Fig. 2 and Supplemental Fig. 3). One significant difference between these two datasets, however, is that the former study examined these dynamics using empirical data of somewhat higher variability than were available to us (mean of 50 variable sites per locus), implying that if individual gene trees are more informative, fewer may be necessary. However, it remains to be seen if, in general, loci with a mutation rate significantly elevated among most of their cohort produce a trustworthy signal of phylogeny. If the background variation among species’ genomes is generally low even across genetic structural classes (e.g. in introns or intergenic regions), as we observed here, it forces one to wonder why a particular locus would show a level of variation higher than most non-coding loci. Optimally, this pattern would simply represent a lower degree of conservation at that locus, but a number of other cryptic patterns could also produce such elevated variation, such as unidentified paralogy, concerted evolution, or diversifying selection (Zhang and Hewitt, 2003). While initially attractive, incorporating such loci with unrecognized evolutionary patterns may contribute to the lack of accuracy and precision in the estimation of empirical species trees. 4.2. Phylogeny of Cichla Our inferred phylogeny of Cichla shows much congruence with previous phylogenetic estimates, with some important differences. Perhaps most intriguingly, the topology of clade A species was fully resolved except for the C. pinima sensu lato group, which is, nevertheless, monophyletic. In previous mtDNA genealogies (Willis et al., 2012; Willis et al., 2010), a number of soft polytomies and paraphyletic groups were observed at the base of this clade. The most striking of these, the two mtDNA clades of C. pinima s.l., were still recovered in our concatenated tree, but they were observed to be monophyletic here rather than polyphyletic with respect to other species in Clade A. In our analysis of species boundaries in Cichla using mtDNA, microsatellites, and sequences of two nuclear genes (Willis et al., 2012), we inferred that described species in this C. pinima s.l. group share ongoing gene flow and thus cannot unambiguously be considered separate species. The genetic pattern among localities was more suggestive of either one species with complex population structure or two species with extensive hybridization, but not the four species that are described. The pattern in this concatenated phylogeny is similarly complex, with

individuals of several of the described species nested among C. pinima sensu stricto (Fig. 1). Similarly, in the concatenated phylogeny, C. orinocensis is recovered as monophyletic. While microsatellites portrayed this as a single, interbreeding species, individuals of C. orinocensis exhibited two exclusive but divergent mtDNA clades that were polyphyletic with respect to other species in clade B (Willis et al., 2012). If C. orinocensis is truly more closely related to C. intermedia than the C. ocellaris sensu lato group, as portrayed in this nuclear phylogeny, the question remains regarding the origin of this polyphyletic mtDNA. While ancient mitochondrial capture seems a likely hypothesis (Willis et al., 2007), comparison with the distribution of coalescent patterns among other loci will be necessary to adequately test this hypothesis. Also in clade B, we observed that in the topology within C. ocellaris sensu lato, the subunits showed a good deal of ambiguity among different analytical methods, and between the nuclear and mtDNA data. While it is difficult to know from the current results what portion of this ambiguity results from inefficient utilization of phylogenetic signal, it is nonetheless congruent with our recent interpretation of these described species as evolutionary significant units of a more inclusive species that exchange genes over evolutionary time (Willis et al., 2012). The phylogeny of Cichla recovered here using molecular data and Bayesian analyses shows less congruence with the morphology-based phylogeny of Kullander and Ferreira (2006) inferred using parsimony. Although based on only 11 characters and with scant nodal support, one prominent result was the grouping of C. intermedia with species from our clade A. While Cichla intermedia generally show more elongate and streamlined body shape than other clade B species, this is likely a secondary adaptation for utilization of higher current velocity habitats than other species in this clade (Jepsen et al., 1997; Winemiller, 2001). Moreover, both mtDNA and nuclear DNA place C. intermedia in clade B with high support (Fig. 1). Thus, in this case it appears that the morphological characters examined by Kullander and Ferreira (2006) are subject to some degree of homoplasy. 5. Conclusions Despite having relatively few variable characters per locus and in total, we inferred a strongly supported phylogeny of Cichla. It appears that even when the overall mutation rate for a group is constrained, adding more loci can allow for the emergence of a combined phylogenetic signal. While there is some evidence from our results that preferentially utilizing nuclear loci that show a higher level of variability can improve the support for a combined posterior tree, the phylogenetic signal from additional, less-variable loci should not be underestimated. We also observed that even though apparent conflict among loci produced incongruence with the combined phylogenetic estimate, analyses accounting for uncertainty in the topologies of individual genes nevertheless provided a phylogenetic estimate that was highly congruent with our concatenated tree. While the coalescent methods produced a tree that was largely in agreement with other trees and is expected to be more statistically consistent, it remains to be seen if this method can be optimized to produce confidence intervals on node ages useful for testing temporally contingent hypotheses. Acknowledgments The authors appreciate those who contributed tissues or assisted in collection, in particular V. Machado, N. Meliciano, C. Montaña, D. Ribeiro, and P. Reiss. Tissues were collected, stored, and utilized under permits from MARN (Venezuela) and ICMBIO (Brazil: permit for collection No. 031/2003, 045/IBAMA, 148/

Please cite this article in press as: Willis, S.C., et al. Multi-locus species tree for the Amazonian peacock basses (Cichlidae: Cichla): Emergent phylogenetic signal despite limited nuclear variation. Mol. Phylogenet. Evol. (2013), http://dx.doi.org/10.1016/j.ympev.2013.07.031

S.C. Willis et al. / Molecular Phylogenetics and Evolution xxx (2013) xxx–xxx

2006-DIFAP/IBAMA, permit for access to genetic resources in Brazil No. 034/2005/IBAMA, and Permanent IBAMA License 11325-1/ 2007). We acknowledge funding from the US National Science Foundation (OISE DDEP 0937609), Society of Systematic Biologists Graduate Research Award, UNL Research Cluster, International Foundation for Science, CNPq (CNPq/PPG7 (557090/2005–2009), (554057/2006–2009)), The George Washington University, and FAPEAM. Most analyses were run on the Bioinformatics Core Research Facility computer cluster at the University of NebraskaLincoln. The authors wish to acknoweldge three anonymous reviewers who provided useful comments to earlier versions of this manuscript. Appendix A. Supplementary material Supplementary data associated with this article can be found, in the online version, at http://dx.doi.org/10.1016/j.ympev.2013. 07.031. References Akaike, H., 1974. A new look at the statistical model identification. IEEE Transactions of Automatic Control 19, 716–723. Albert, J., Crampton, W.G.R., 2005. Diversity and phylogeny of Neotropical electric fishes (Gymnotiformes). In: Bullock, T.H., Hopkins, C.D., Popper, A.N., Fay, R.R. (Eds.), Electroreception. Springer (Chapter 13). Ane, C., Larget, B., Baum, D.A., Smith, S.D., Rokas, A., 2007. Bayesian estimation of concordance among gene trees. Molecular Biology and Evolution 24, 412–426. Avise, J.C., Robinson, T.J., 2008. Hemiplasy: a new term in the lexicon of phylogenetics. Systematic Biology 57, 503–507. Baker, R., DeSalle, R., 1997. Multiple sources of character information and the phylogeny of Hawaiian drosopholids. Systematic Biology 46, 654–673. Bermingham, E., McCafferty, S., Martin, A., 1997. Fish biogeography and molecular clocks: perspectives from the Panamanian Isthmus. In: Kocher, T., Stepien, C. (Eds.), Molecular Systematics of Fishes. Academic Press, New York, pp. 113–126. Brito, P.H., Edwards, S.V., 2009. Multilocus phylogeography and phylogenetics using sequence-based markers. Genetica 135, 439–455. Brown, R.P., Yang, Z., 2010. Bayesian dating of shallow phylogenies with a relaxed clock. Systematic Biology 59, 119–131. Buckley, T.R., Simon, C., Chambers, G.K., 2001. Exploring among-site rate variation models in a maximum likelihood framework using empirical data: effects of model assumptions on estimates of topology, branch lengths, and bootstrap support. Systematic Biology 50, 67–86. Bull, J.J., Huelsenbeck, J.P., Cunningham, C.W., Swofford, D.L., Waddel, P.J., 1993. Partitioning and combining data in phylogenetic analysis. Systematic Biology 42, 384–397. Bull, V., Beltran, M., Jiggins, C.D., McMillan, W.O., Bermingham, E., Mallet, J., 2006. Polyphyly and gene flow between non-sibling Heliconius species. BMC Biology 4, 11. Carmago, A., Avila, L.J., Morando, M., Sites Jr., J.W., 2012. Accuracy and Precision of Species Trees: Effects of Locus, Individual, and Base Pair Sampling on Inference of Species Trees in Lizards of the Liolaemus darwinii Group (Squamata, Liolaemidae). Systematic Biology 61, 272–288. Drummond, A.J., Ho, S.Y., Phillips, M.J., Rambaut, A., 2006. Relaxed phylogenetics and dating with confidence. PLoS Biology 4, e88. Drummond, A.J., Rambaut, A., 2007. BEAST: Bayesian evolutionary analysis by sampling trees. BMC Evolutionary Biology 7, 214. Eckert, A.J., Carstens, B.C., 2008. Does gene flow destroy phylogenetic signal? The performance of three methods for estimating species phylogenies in the presence of gene flow. Molecular Phylogenetics and Evolution 49, 832–842. Edwards, S.V., 2008. Is a new and general theory of molecular systematics emerging? Evolution 63, 1–19. Edwards, S.V., Beerli, P., 2000. Gene divergence, population divergence, and the variance in coalescence time in phylogeographic studies. Evolution 54, 1839– 1854. Edwards, S.V., Liu, L., Pearl, D.K., 2007. High-resolution species trees without concatenation. Proc. Natl. Acad. Sci. USA 104, 5936–5941. Funk, D.J., Omland, K.E., 2003. Species-level paraphyly and polyphyly: frequency, causes, and consequences, with insights from animal mitochondrial DNA. Annual Review of Ecology Evolution and Systematics 34, 397–423. Garrick, R.C., Sunnucks, P., Dyer, R.J., 2010. Nuclear gene phylogeography using PHASE: dealing with unresolved genotypes, lost alleles, and systematic bias in parameter estimation. BMC Evolutionary Biology 10, 118. Hasegawa, M., Hashimoto, T., 1993. Ribosomal RNA trees misleading? Nature 361, 23. Hassan, M., Lemaire, C., Fauvelot, C., Bonhomme, F., 2002. Seventeen new exonprimed intron-crossing polymerase chain reaction amplifiable introns in fish. Molecular Ecology Notes 2, 334–340.

11

Heled, J., Drummond, A.J., 2010. Bayesian inference of species tree from multilocus data. Molecular Biology and Evolution 27, 570–580. Hillis, D.M., 1995. Approaches for assessing phylogenetic accuracy. Systematic Biology 44, 3–16. Hird, S., Kubatko, L.S., Carstens, B.C., 2010. Rapid and accurate species tree estimation for phylogeographic investigations using replicated subsampling. Molecular Phylogenetics and Evolution 57, 888–898. Hoorn, C., Wesselingh, F., ter Steege, H., Bermudez, M.A., Mora, A., Sevink, J., Sanmartín, I., Sanchez-Meseguer, A., Anderson, C.L., Figueiredo, J.P., Jaramillo, C., Riff, D., Negri, F.R., Hooghiemstra, H., Lundberg, J.G., Stadler, T., Säkinen, T., Antonelli, A., 2010. Amazonia through time: Andean uplift, climate change, landscape evolution, and biodiversity. Science 330, 927–931. Huang, H., He, Q., Kubatko, L.S., Knowles, L.L., 2010. Sources of error inherent in species-tree estimation: impact of mutational and coalescent effects on accuracy and implications for choosing among different methods. Systematic Biology 59, 573–583. Huang, H., Knowles, L.L., 2009. What is the danger of the anomaly zone for empirical phylogenetics? Systematic Biology 58, 527–536. Huelsenbeck, J.P., Bollback, J.P., Levine, A.M., 2002. Inferring the root of a phylogenetic tree. Systematic Biology 51, 32–43. Huelsenbeck, J.P., Bull, J.J., 1996. A likelihood ratio test for detection of conflicting phylogenetic signal. Systematic Biology 45, 92–98. Jepsen, D.B., Winemiller, K.O., Taphorn, D.C., 1997. Temporal patterns of resource partitioning among Cichla species in a Venezuelan blackwater river. Journal of Fish Biology 51, 1085–1108. Kingman, J.F.C., 1982. The coalescent. Stoch. Proc. Appl. 13, 235–248. Kubatko, L.S., Carstens, B.C., Knowles, L.L., 2009. STEM: Species tree estimation using maximum likelihood for gene trees under coalescence. Bioinformatics. http://dx.doi.org/10.1093/bioinformatics/btp079. Kubatko, L.S., Degnan, J.H., 2007. Inconsistency of phylogenetic estimates from concatenated data under coalescence. Systematic Biology 56, 17–24. Kullander, S.O., Ferreira, E.J.G., 2006. A review of the South American cichlid genus Cichla, with descriptions of nine new species (Teleostei : Cichlidae). Ichthyol. Explor. Freshwaters 17, 289–398. Leaché, A., 2009. Species tree discordance traces to phylogeographic clade boundaries in North American fence lizards (Sceloporus). Systematic Biology 58, 547–559. Li, C., Lu, G., Ortí, G., 2008. Optimal data partitioning and a test case for ray-finned fishes (Actinopterygii) based on ten nuclear loci. Systematic Biology 57, 519– 539. Li, C., Riethoven, J.-J.M., Ma, L., 2010. Exon-primed intron-crossing (EPIC) markers for non-model teleost fishes. BMC Evolutionary Biology 10, 90. Lopez-Fernandez, H., Arbour, J.H., Winemiller, K.O., Honeycutt, R.L., 2013. Testing for ancient adaptive radiations in Neotropical cichlid fishes. Evolution. http:// dx.doi.org/10.1111/evo.12038. López-Fernández, H., Honeycutt, R.L., Winemiller, K.O., 2005. Molecular phylogeny and evidence for an adaptive radiation of geophagine cichlids from South America (Perciformes : Labroidei). Molecular Phylogenetics and Evolution 34, 227–244. López-Fernández, H., Winemiller, K.O., Honeycutt, R.L., 2010. Multilocus phylogeny and rapid radiations in Neotrpical cichlid fishes (Perciformes: Cichlidae: Cichlinae). Molecular Phylogenetics and Evolution 55, 1070–1086. Macrander, J., Willis, S.C., Gibson, S., Orti, G., Hrbek, T., 2012. Polymoprhic microsatellite loci for the Amazonian Peacock Basses, Cichla orinocensis and C. temensis, and cross-species amplification in other Cichla species. Mol. Ecol. Resour. doi: http://dx.doi.org/10.1111/j.1755-0998.2012.03173.x. Maddison, W.P., 1997. Gene trees in species trees. Systematic Biology 46, 523–536. Maddison, W.P., Knowles, L.L., 2006. Inferring phylogeny despite incomplete lineage sorting. Systematic Biology 55, 21–30. Maddison, W.P., Maddison, D.R., 1992. MacClade: Analysis of phylogeny and character evolution. Sinauer Associates, Sunderland, Massachusetts. Meng, C., Kubatko, L.S., 2009. Detecting hybrid speciation in the presence of incomplete lineage sorting using gene tree incongruence: a model. Theoretical Population Biology. Nei, M., Li, W.-H., 1979. Mathematical model for studying genetic variation in terms of restriction endonucleases. Proc. Natl. Acad. Sci. USA 76, 5269–5273. Nylander, J.A., Ronquist, F., Huelsenbeck, J.P., Nieves-Aldrey, J.L., 2004. Bayesian phylogenetic analysis of combined data. Systematic Biology 53, 47–67. Nylander, J.A., Wilgenbusch, J.C., Warren, D.L., Swofford, D.L., 2008. AWTY (Are We There Yet?): a system for graphical exploration of MCMC convergence in Bayesian phylogenetics. Bioinformatics 24, 581–583. Nylander, J.A.A., 2004. MrModeltest v2. Program distributed by the author, Evolutionary Biology Centre, Uppsala University. O’Neill, E.M., Schwartz, R., Bullock, T., Williams, J.S., Shaffer, B., Aguilar-Miguel, X., Parra-Olea, G., Weisrock, D.W., 2013. Parallel tagged amplicon sequencing reveals major lineages and phylogenetic structure in the North American tiger salamander (Ambystoma tigrinum) species complex. Molecular Ecology 22, 111– 129. Pamilo, P., Nei, M., 1988. Relationships between gene trees and species trees. Molecular Biology and Evolution 5, 568–583. Rambaut, A., Drummond, A.J., 2007. Tracer v1.4. Available from http:// www.beast.bio.ed.ac.uk/Tracer. Rannala, B., Yang, Z., 2003. Bayes estimation of species divergence times and ancestral population sizes using DNA sequences from multiple loci. Genetics 164, 1645–1656.

Please cite this article in press as: Willis, S.C., et al. Multi-locus species tree for the Amazonian peacock basses (Cichlidae: Cichla): Emergent phylogenetic signal despite limited nuclear variation. Mol. Phylogenet. Evol. (2013), http://dx.doi.org/10.1016/j.ympev.2013.07.031

12

S.C. Willis et al. / Molecular Phylogenetics and Evolution xxx (2013) xxx–xxx

Reis, R.E., Kullander, S.O., Ferraris, C.J. (Eds.), 2003. Checklist of the Freshwater Fishes of South and Central America. EDIPUCRS, Porto Alegre, Brazil. Rokas, A., Carroll, S.B., 2005. More sequence or more taxa? The relative contribution of taxon number and sequences data size to phylogenetic accuracy. Molecular Biology and Evolution 22, 1337–1344. Ronquist, F., Huelsenbeck, J.P., 2003. MrBayes 3: Bayesian phylogenetic inference under mixed models. Bioinformatics 19, 1572–1574. Rosenberg, N.A., Nordborg, M., 2002. Genealogical trees, coalescent theory, and the analysis of genetic polymorphisms. Nature Reviews Genetics 3, 380–390. Shimodaira, H., Hasegawa, M., 1999. Multiple comparisons of log-likelihoods with applications to phylogenetic inference. Molecular Biology and Evolution 16, 1114–1116. Sides, J., Lydeard, C., 1999. Phylogenetic utility of the tyrosine kinase gene X-src for assessing relationships among representative cichlid fishes. Molecular Phylogenetics and Evolution 14, 51–74. Soria-Carrasco, V., Talavera, G., Igea, J., Castresana, J., 2007. The K tree score: quantification of differences in the relative branch length and topology of phylogenetic trees. Bioinformatics 23, 2954–2956. Steel, M.A., Lockhart, P.J., Penny, D., 1993. Confidence in evolutionary trees from biological sequence data. Nature 364, 440–442. Stephens, M., Smith, N.J., Donnelly, P., 2001. A new statistical method for haplotype reconstruction from population data. American Journal of Human Genetics 68, 978–989. Swofford, D.L., 2002. PAUP*. Phylogenetic Analysis Using Parsimony (*and Other Methods). Version 4. Sinauer Associates, Sunderland, Massachusetts. Wenzel, J.W., Siddall, M.E., 1999. Noise. Cladistics 15, 51–64. Whittall, J.B., Hodges, S.A., 2007. Pollinator shifts drive increasingly long nectar spirs in columbine flowers. Nature 447, 406–409. Wiens, J., Pyron, R.A., Moen, D.S., 2011. Phylogenetic origins of local-scale diversity patterns and the causes of Amazonian megadiversity. Ecology Letters 14, 643– 652.

Willis, S.C., 2011. Multilocus and parametric analyses of the evolutionary history of the Amazonian peacock cichlids, genus Cichla (Teleostei: Cichlidae). School of Biological Sciences, University of Nebraska-Lincoln, Lincoln, Nebraska, USA. Willis, S.C., Macrander, J., Farias, I.P., Orti, G., 2012. Simultaneous delimitation of species and quantification of interspecific hybridization in Amazonian peacock cichlids (genus Cichla) using multi-locus data. BMC Evolutionary Biology 12. Willis, S.C., Nunes, M.S., Montana, C.G., Farias, I.P., Lovejoy, N.R., 2007. Systematics, biogeography, and evolution of the Neotropical peacock basses Cichla (Perciformes : Cichlidae). Molecular Phylogenetics and Evolution 44, 291–307. Willis, S.C., Nunes, M.S., Montana, C.G., Farias, I.P., Orti, G., Lovejoy, N.R., 2010. The Casiquiare river acts as a corridor between the Amazonas and Orinoco river basins: biogeographic analysis of the genus Cichla. Molecular Ecology 19, 1014– 1030. Winemiller, K.O., 2001. Ecology of peacock cichlids (Cichla spp.) in Venezuela. Journal of Aquariculture & Aquatic Sciences 9, 93–112. Won, Y.J., Wang, Y., Sivasundar, A., Raincrow, J., Hey, J., 2006. Nuclear gene variation and molecular dating of the cichlid species flock of Lake Malawi. Molecular Biology and Evolution 23, 828–837. Yang, Z., 1998. On the best evolutionary rate for phylogenetic analysis. Systematic Biology 47, 125–133. Zardoya, R., Vollmer, D.M., Craddock, C., Streelman, J.T., Karl, S.A., Meyer, A., 1996. Evolutionary conservation of microsatellite flanking regions and their use in resolving the phylogeny of cichlid fishes (Pisces: Perciformes). Proce. Roy. Soc. Lond. B 263, 1589–1598. Zhang, D.X., Hewitt, G.M., 2003. Nuclear DNA analyses in genetic studies of populations: practice, problems and prospects. Molecular Ecology 12, 563–584.

Please cite this article in press as: Willis, S.C., et al. Multi-locus species tree for the Amazonian peacock basses (Cichlidae: Cichla): Emergent phylogenetic signal despite limited nuclear variation. Mol. Phylogenet. Evol. (2013), http://dx.doi.org/10.1016/j.ympev.2013.07.031 View publication stats

Lihat lebih banyak...

Comentarios

Copyright © 2017 DATOSPDF Inc.