We combined mitochondrial (cyb, control region, coi, nd4) and nuclear (irbp, ghr, sry, lcat) DNA sequence data to infer phylogenetic relationships of arvicoline voles. The concatenated supermatrix contained 72.8 % of missing data. From this dataset, Bayesian inference showed close relationships of Arvicola and Chionomys, Proedromys with Lasiopodomys and Microtus gregalis, Phaiomys with Neodon and M. clarkei. Genus Microtus formed a supported group with Blanfordimys and N. juldaschi. The gene partition taxon sets were explained in the multilocus phylogeny in such a way that the resulting Bayesian inference tree represented a unique solution on a terrace in the tree space. This means that although the supermatrix contained a large proportion of missing data, it was informative in retrieving a phylogeny with a unique optimality score, tree likelihood.
Arvicoline voles (Rodentia, Arvicolini) are a young group of small rodents distributed on the northern hemisphere. They started to diverge probably as recently as two to three million years ago, but in the short time frame, they speciated into one of the most speciose mammalian groups (Wilson & Reeder 2005). Today, the species show rapid temporal changes in genetic composition of populations (Bryja et al. 2007, Oliver et al. 2009, Rudá et al. 2010; but see Spaeth et al. 2009) and fast karyotype reorganisation (Mazurok et al. 2001, Mekada et al. 2002, Sitnikova et al. 2007, Mitsainas et al. 2010) coupled with gene reorganisation between mitochondrial and nucleotide genomes (Triant & DeWoody 2007, 2008). Populations of arvicoline voles diverge quickly when they become fragmented in refugia, leading at times to speciation in these areas, and refugia become speciation traps (Martínková & Dudich 2003, Martínková et al. 2007, Tougard et al. 2008, Kryštufek et al. 2009, Haring et al. 2011). Multiple vole species co-occur in many regions and habitat partitioning was found with sympatric occurrence (Jurdíková et al. 2000, Santos et al. 2011). Yet, they are morphologically similar with small differences between distantly related taxa (Fraguedakis-Tsolis et al. 2009).
Reconstruction of phylogenetic relationships in arvicoline voles is complicated by morphological similarities and rapid karyotype rearrangements, making the group an ideal model for molecular genetic studies. Both mitochondrial (mtDNA) and nuclear (nucDNA) markers have been extensively used to resolve relationships within the group and to study evolutionary history of different taxa. The relative merit of information value of mtDNA and nucDNA markers overlaps in arvicoline rodents. MtDNA sequences provide valuable information in phylogeny reconstruction at specific and intrageneric level, contributing to resolving phylogeographic histories of taxa (e.g. Conroy & Cook 2000a, Jaarola & Searle 2002, Fink et al. 2004, Brunhoff et al. 2006, Fan et al. 2011), identification of putative cryptic species (Kefelioğlu & Kryštufek 1999, Hellborg et al. 2005, Castiglia et al. 2008, Conroy & Neuwald 2008, Weksler et al. 2010) and phylogenetic placement of taxa with unstable position based on other data (Macholán et al. 2001, Jaarola et al. 2004, Martínková et al. 2007, Kryštufek et al. 2009, Bannikova et al. 2010). The phylogenetic signal of mtDNA and its ability to resolve relationships decrease at higher level of phylogenies that exhibits as a rapid burst of diversification (Jaarola et al. 2004). However, also nuclear markers, either DNA sequence data or AFLP markers (Galewski et al. 2006, Abramson et al. 2009, Fink et al. 2010), fail to fully resolve the signal of the rapid diversification of voles at the base of the tree. Incongruence of sampling between studies further complicates interpretation of relationships within the group (Bužan et al. 2008, Haring et al. 2011).
Here, we combine available mitochondrial and nuclear DNA sequence markers to reconstruct phylogenetic relationships of arvicoline rodents and to assess stability of the retrieved model. We use Bayesian inference analysis of a concatenated supermatrix and SuperTriplets supertree reconstruction to estimate parent trees. These we then use to establish the size of the terrace where trees will have the same likelihood with the dataset with large content of missing data.
Material and Methods
Sequences were downloaded from GenBank for 74 species belonging to genera Arvicola, Blanfordimys, Chionomys, Lasiopodomys, Microtus, Neodon, Phaiomys and Proedromys (Table 1). Herewith, we accept species designation of Wilson & Reeder (2005) with the addition of recently established species Microtus gromovi (Bannikova et al. 2010) and Proedromys liangshanensis (Liu et al. 2007). Alignments were constructed in Geneious 5.4 (Drummond et al. 2011) with sequences from mitochondrial genes for cytochrome b (cyb), control region (CR), cytochrome c oxidase subunit I (coi), NADH dehydrogenase subunit 4 (nd4) and nuclear genes for interphotoreceptor retinoid-binding protein, exon 1 (irbp), growth hormone receptor, exon 10 (ghr), sex-determining region Y (sry), lecithin: cholesterol acyl transferase, exons 2 through 5 (lcat). The alignments were reduced to contain at least three sequences at every base. In the sry gene, the microsatellite (TC)n(TG)n (Acosta et al. 2010) could not be aligned unambiguously, and the region was deleted. The data and results are available through TreeBASE ( http://purl.org/phylo/treebase/phylows/study/TB2:S12667).
Two additional loci have good taxonomic sampling. However, both loci in the avpr1a gene, its upstream region and exon 1, were previously documented to be disparate with the species tree (Fink et al. 2007, 2010), and some sequences of the avpr1a exon 1 were shared between distantly related species (Fink et al. 2007). To reduce conflict between gene trees in our analyses, we chose to omit these loci.
Optimal substitution model was estimated in MrModeltest 2.3 (Nylander 2004) with Akaike Information Criterion (AIC) and applied to the individual gene alignments and partitions of the concatenated alignment. Where the parameters of the selected model were extreme, a simpler model was used. Bayesian Inference (BI) was conducted in MrBayes 3.1 (Ronquist & Huelsenbeck 2003) with Markov chain Monte Carlo (MCMC) parameters set to 2-5 million steps sampled every 1000th, five to six chains in two runs with chain temperature 0.08- 0.12 and chain swapping attempted once every third generation. The MCMC runs were optimised to mix and ideally to finish with average standard deviation of split frequencies below 0.01, potential scale reduction factor for model parameters approaching 1.000 and proportion of successful chain stage swaps between 0.4 and 0.7. BI is robust in recovering the correct tree topology, but it might fail to establish appropriate branch lengths with default branch length prior (Marshall 2010). The 95 % credibility intervals of the tree lengths from BI were compared to the tree length obtained from maximum likelihood (ML) analysis. The ML tree was calculated in RAxML 7.2 (Stamatakis 2006). The trees were re-rooted to midpoint root to allow for uncertainty in the phylogenetic position of Arvicola (Galewski et al. 2006, Bužan et al. 2008, Abramson et al. 2009, Bannikova et al. 2009).
Divergence events between all taxa were investigated in two ways; from a combination of gene trees and directly from the concatenated supermatrix. The gene trees were combined into a SuperTriplets supertree (Ranwez et al. 2010). The method breaks down the gene trees to their smallest components containing three taxa, where any two taxa are more closely related than either is to the third. Supertree then contains medians of relationships from the triplets as they were found in the gene trees. Edge support in the SuperTriplets analysis is the proportion of triplets that support a given edge.
The supermatrix was resolved with partitioned BI ran for 6 million generations with five heated and one cold MCMC, chain temperature set to 0.09 and one chain swap attempted every third generation. Partition rates were allowed to vary.
Accession numbers of DNA sequences used in this study to reconstruct phylogenetic relationships.
Given the fractional nature of the supermatrix, multiple distinct trees were likely to display the same set of subtrees representing taxa sampled per gene. Such trees will have the same log-likelihood for a partitioned analysis (Sanderson et al. 2011). A set of trees that display the same set of subtrees for sampled loci and have the same log-likelihood is called a terrace. The trees from a terrace are derived from each other by nearest-neighbour interchange (NNI) rearrangement, and, in a dataset with considerable missing data content, the terraces might contain many trees. The size of terraces for our dataset was assessed with perl scripts from the PhyloTerraces package (Sanderson et al. 2011). SuperTriplets supertree and BI tree based on the supermatrix were used as parent trees. Terrace identification requires binary (fully resolved) trees. The BI tree was resolved by accepting all relationships resolved in the posterior sample. The SuperTriplets supertree was resolved using relationships from the cyb tree. In terrace identification analysis, the parent tree was broken to subtrees, where each subtree represented relationships of taxa sampled for the respective gene as resolved in the parent tree. All relationships in the subtrees were further characterized by triplets. From these, all alternative parent trees that contain the subtrees were constructed.
The dataset contained 1143 base-pairs (bp) long alignment with 68 taxa for cyb, CR alignment was 1025 bp long and contained 25 taxa, coi was 1545 bp long with 12 taxa, nd4 was 1378 bp long with 9 taxa, irbp was 1181 bp long with 24 taxa, ghr was 911 bp long with 27 taxa, sry was 908 bp long with 19 taxa and lcat was 590 bp long with 10 taxa. The concatenated supermatrix had 72.8 % missing data composed of missing sequences of individual genes, alignment gaps and unknown nucleotides.
Substitution models selected by AIC for each gene were GTR + Γ + I for cyb, HKY + Γ + I for CR, GTR + Γ + I for coi, GTR + Γ for nd4, HKY + Γ for irbp, GTR + Γ + I for ghr, HKY + I for sry and HKY + I for lcat. As the GTR model requires estimation of the rate matrix, a simpler HKY model was tested for coi, nd4 and ghr genes. The difference between log-likelihood based on selected and tested model was 10.8 for coi, 1.8 for nd4 and 1.1 for ghr and the HKY model was used for nd4 and ghr genes. The resulting trees were similar, and MCMC convergence was faster with the simpler model. The results from the simpler models are reported (Table 2, Figs. 1–2).
The 95 % credibility interval (CI) of the BI tree length based on the partitioned concatenated dataset was 7.89-15.12 with default rate parameter of the exponential branch length prior (λ = 10.0). This CI of the BI tree length did not contain the tree length 3.86 estimated from the ML analysis. Increasing the rate by increments of 10.0 to the final value of 50.0, the tree length decreased to 4.0-4.59, but we did not further test the branch length prior because of decrease in node support for high λ. Node support improved with optimisation of the branch length prior when λ = 20.0, and subsequently decreased (Fig. 3). We further analyse the BI tree with the highest average node support.
The BI supermatrix phylogeny re-rooted with midpoint root showed two initial groups (Fig. 4). The root separated genera Arvicola and Chionomys from Microtus, Blanfordimys, Phaiomys, Proedromys and Lasiopodomys. Proedromys liangshanensis was a sister species to a well-supported group containing Lasiopodomys and Microtus gregalis. Similarly, Phaiomys leucurus was a sister taxon to a supported group with unresolved internal relationships containing Neodon irene, N. sikimensis and M. clarkei. Remaining taxa formed a monophyletic group with high Bayesian posterior probability (BPP). It contained species currently attributed to genus Microtus not mentioned above, genus Blanfordimys and N. juldaschi. The first group that diverged at this level was the subgenus Microtus (Alexandromys) predominantly distributed in the eastern Palaearctic. Microtus fortis group was well differentiated from the basal species of Alexandromys, M. kikuchii, M. montebelli and M. oeconomus. We confirmed position of M. gromovi as a distinct taxon rather than a subspecies of M. maximowiczii.
Further notable group consisted of N. juldaschi with Blanfordimys afghanus and B. bucharensis. M. agrestis was a sister species to this group, but the relationship was unsupported. Nearctic species formed an unsupported group with M. cabrerae. Within the Nearctic group, three pairs of sister taxa had significant node support. Subgenera Microtus (Terricola) and Microtus (Microtus) were sister groups that were most derived in the BI phylogeny (Fig. 4). In the subgenus Microtus, M. arvalis group and M. socialis/guentheri group were separated, but classification of M. schelkovnikovi to the M. socialis/guentheri group had low support (BPP = 0.85). Subgenus Terricola was separated to the eastern and western clade, where the eastern clade containing M. majori, M. subterraneus and M. daghestanicus had BPP = 0.94.
Substitution models used for separate gene tree analyses and partitions of the supermatrix. Model parameters are means estimated from sampled posterior distribution of the gene trees after burn-in. κ — transition/transversion rate ratio, r — substitution rate, f — base frequency, α — shape parameter of the Γ distribution, I — proportion of invariable sites, n/a — not available.
The SuperTriplets supertree agreed with the BI tree in distinguishing groups Arvicola, Microtus (Terricola), Microtus (Microtus), Microtus (Pitymys) with M. guatemalensis, Blanfordimys with N. juldaschi and a separate group including taxa from the subgenus Microtus (Alexandromys) (data available at TreeBASE). The other groups were distorted due to different position of Chionomys nivalis, N. sikimensis and Lasiopodomys brandtii. In the supertree, C. nivalis was placed as a basal taxon after diversification of Arvicola. Neodon sikimensis formed a polyphyly with C. gud and C. roberti. Lasiopodomys brandtii was placed within the group containing Nearctic species.
Phylogenetic terraces where the trees belonged to were small. The terrace with the BI tree used as the parent tree consisted of a single tree, and the terrace with the supertree consisted of 15 trees.
Tree space of the concatenated dataset
We found that by optimising branch length prior of the Bayesian inference analysis of the multilocus phylogeny of arvicoline rodents with missing data comprising nearly 73 % of the concatenated supermatrix we were able to retrieve a phylogeny that is unique on a terrace. This means that there are no trees with alternative topology that would explain sets of taxa from individual gene partitions that would be present in the BI phylogeny. For the supertree approach, the terrace size was also small, and it contained 15 trees with alternative topology that explained the relationships of gene tree datasets as depicted on the supertree.
The topology of our phylogeny that well explained gene sets in the final trees was not reflected in similar congruence in branch length estimations. Our Bayesian phylogeny with highest node support was longer than the ML tree as is known to occur in partitioned datasets (Marshall et al. 2006, Brown et al. 2010, Marshall 2010). The cause of this discrepancy was recently identified to be a branch length prior that places too much probability density on large tree lengths (Rannala et al. 2012). The default on branch lengths in MrBayes assigns independent and identical exponential priors for individual branch lengths. As the default initial value for branch lengths is 0.1, the MCMC starts from very long trees for large datasets with many taxa, which is often unrealistic. The prior then places too much influence on the posterior. The effect is exacerbated for large datasets where there are partitions with low variability or correlations in rate variation or substitution models in the posterior (Rannala et al. 2012). This seems to be the case in our dataset. We optimised the branch length prior in our partitioned analyses, assuming that by setting the branch length exponential prior mean closer to mean branch length estimated from the ML tree length, the posterior tree length would be more similar to the ML tree length (Zhang et al. 2012). This is not a suitable approach in the Bayesian framework (c.f. Zhang et al. 2012). Our data showed that decreasing the tree length in this way lead to decrease of node support for high values of rate parameter of the exponential distribution of the uncorrelated branch length prior. Using compound Dirichlet priors on branch lengths in modified MrBayes 3.1 (Zhang et al. 2012), we were not able to obtain a tree with CI of tree length that would include the ML estimate and the average BPP remained lower than in our optimal tree.
Phylogenetic position of Arvicola within subfamily Arvicolinae is unstable in studies utilising mtDNA (Bužan et al. 2008, Bannikova et al. 2009) in comparison with studies that use nucDNA (Galewski et al. 2006, Abramson et al. 2009). We also observed this in gene trees. Our supermatrix results show that by rooting the tree of the Arvicolini tribe sensu Galewski et al. (2006) with midpoint root, Arvicola forms a supported group with Chionomys at the base of the tree.
Microtus gregalis represented a phylogenetic enigma. In early molecular phylogenies, it was placed distantly from other supposedly related species at the base of the phylogeny of Microtus, but its basal position was unsupported (Conroy & Cook 2000b, Conroy et al. 2001, Jaarola et al. 2004). It was later retrieved as a sister taxon to Chionomys based on mtDNA (Bužan & Kryštufek 2008), but nucDNA grouped M. gregalis with Lasiopodomys (Abramson et al. 2009). This grouping elucidates towards rapid karyotype rearrangements between species, as M. gregalis has 36 chromosomes (Martínková et al. 2004), whereas L. mandarinus chromosome number varies between 47 and 52 (Liu et al. 2010). If the ancestral karyotype of the group was 2n = 54, the rearrangements leading to M. gregalis that branches close to the base of the tree were extensive (c.f. Lemskaya et al. 2010). The phylogenetic position of M. gregalis in the vicinity of Lasiopodomys was confirmed once Lasiopodomys was sequenced for the mtDNA (Bannikova et al. 2010). Our combined phylogeny placed M. gregalis close to the base of the trees in a well-supported group with L. mandarinus and L. brandtii in accordance with recent studies (Abramson et al. 2009, Bannikova et al. 2010). The sister taxon of the group is Proedromys liangshanensis that was described recently (Liu et al. 2007). Based on phylogeny of complete mtDNA, Pr. liangshanensis is a sister species to Microtus (Hao et al. 2011). Our analysis included more comprehensive sampling, and we found that the species forms a supported sister relationship with Lasiopodomys and M. gregalis.
The genus Neodon was polyphyletic with N. juldaschi grouping with Blanfordimys deeper in the phylogeny than other species attributed to Neodon, N. irene and N. sikimensis. The latter two species consistently belonged to the Phaiomys/Neodon lineage (Galewski et al. 2006, Robovský et al. 2008, Bannikova et al. 2009, 2010) that included also M. clarkei in our analyses. Its relationship with Neodon was not resolved, forming a strongly supported trichotomy, where BPP for the monophyly of Neodon within this lineage was as low as 0.41.
The phylogenetic position of M. agrestis was unstable in the gene trees, and it was placed as an unsupported sister taxon to the N. juldaschi/Blanfordimys group in our multilocus phylogeny. Microtus agrestis split from the common ancestors of Microtus early in the radiation of the genus, but its closest relatives might not be identifiable today similarly as in the case of M. cabrerae. Interestingly, M. agrestis from the Iberian peninsula, where M. cabrerae is also distributed, forms a phylogenetic lineage distinct from other M. agrestis populations. This divergence is present both in mitochondrial and nuclear phylogenies and might represent a cryptic species that was not formally described to date (Jaarola & Searle 2004, Hellborg et al. 2005). The erratic placement of M. agrestis in different gene trees and unsupported position of M. cabrerae with North American Microtus indicates that these species represent relicts of a very early colonization of Arvicolini to Western Europe. Phylogenies of Arvicolidae improve with more comprehensive sampling (Bužan et al. 2008), and based on the fact that we analysed majority of species in the tribe Arvicolini, we are confident to state that the close relatives of M. agrestis and M. cabrerae are extinct today and their phylogenetic position is influenced more by stochastic processes in DNA sequence evolution such as saturation or convergence and by computational artefacts in phylogeny reconstruction.
Within Microtus, we retrieved three groups that represent subgenera recognised by Wilson & Reeder (2005) with minor changes. Subgenus Alexandromys was supported without M. clarkei as per Wilson & Reeder (2005), Microtus without M. cabrerae and Terricola with M. tatricus. The species within subgenera showed relationships established in previous studies (Jaarola et al. 2004, Martínková et al. 2007, Kryštufek et al. 2009, Bannikova et al. 2010, Haring et al. 2000, 2011).
Nearctic Microtus consistently suffer from lack of resolution of many relationships (Conroy & Cook 2000b, Conroy et al. 2001, Jaarola et al. 2004, Fink et al. 2010), although recent studies show that these taxa are also strongly influenced by rapid diversification in speciation traps. Weksler et al. (2010) found M. abbreviatus from Wrangell Mts. to be divergent and potentially merit species status, and Conroy & Neuwald (2008) distinguished two species within M. californicus. In our multilocus study, phylogenetic position of M. breweri was particularly unstable. This was probably in lieu of the fact that only the sry gene sequence was available for this species.
Species with small ranges were often part of rapidly differentiating groups. This leads to an assumption that geographic isolation in small refugia triggers diversification on both molecular and morphological levels. In Microtus, the results of phylogeography couple with species phylogenies where phylogeography nowadays indicates regions and populations that might give rise to new species in the future.
We appreciate M. Macholán's invitation to participate in the issue of Folia Zoologica in tribute to Jan Zima's anniversary. N. Martínková is thankful to J. Zima for his guidance and support at the beginning of her career. The bioinformatic analyses were conducted at the computational cluster of the Institute of Vertebrate Biology and at the MetaCentrum. The access to the MetaCentrum computing facilities was provided under the programme “Projects of Large Infrastructure for Research, Development, and Innovations” LM2010005 funded by the Ministry of Education, Youth, and Sports of the Czech Republic. This study was supported by the Grant Agency of the Academy of Sciences of the Czech Republic, grant no. IAA600930609 and with institutional support RVO: 68081766.