Understanding the biogeographic origins and temporal sequencing of groups within a region or of lineages within an ecosystem can yield important insights into evolutionary dynamics and ecological processes. Fifty years ago, Ernst Mayr generated comprehensive—if limited—inferences about the origins of the New World avifaunas, including the importance of pre-Isthmian dispersal between North and South America. Since then, methodological advances have improved our ability to address many of the same questions, but the phylogenies upon which such analyses should be based have been incompletely sampled or fragmentary. Here, we report a near-species-level phylogeny of the diverse (~832 species) New World clade Emberizoidea—the group that includes the familiar sparrows, cardinals, blackbirds, wood-warblers, tanagers, and their close relatives—to our knowledge the largest essentially complete (≥95%) phylogenetic hypothesis for any group of organisms. Biogeographic analyses based on this tree suggest initial dispersal into the New World via Beringia, with rapid subsequent diversification, including early dispersal of 1 lineage (the tanagers, Thraupidae) into South America. We found substantial dispersal between North and South America prior to closure of the Isthmus of Panama, but with a notable increase afterward, with a directional bias from north to south. With much greater detail and historical rigor, these analyses largely confirm Mayr's speculations based on taxonomy, resolving outstanding ambiguity regarding the continental origins of some groups such as the Emberizidae and Icteridae. The phylogeny reported here will be a resource of broad utility for addressing additional evolutionary and ecological questions with this diverse group.
Since the need for a knowledge of the Tertiary composition of the [North and South American] avifaunas is considerable, some method must be found to reconstruct distributions in past geological periods…. Such a method … consists in an evaluation of the present pattern of distribution … and in a study of the distribution of near relatives. Direct proof is impossible by this method, but it allows for inferences with varying degrees of probability. (Mayr, 1964:281)
In two now classic papers on the biogeography of the New World avifauna, Ernst Mayr (1946, 1964) clearly laid out the challenges and opportunities associated with reconstructing the evolutionary history of birds in this climatically dynamic, ecologically diverse region. In particular, Mayr believed that understanding the geographic origins of avian groups would shed light on current spatial diversity patterns, rules (if any) of community assembly and composition, and differential success of lineages, among other phenomena. Unfortunately, given the poor bird fossil record and the lack of rigorous quantitative methods for phylogeny reconstruction at the time, Mayr was limited in his analyses to tabulating regional endemism of families, genera, and species. On the basis of these data, Mayr classified the majority of New World birds by their continent of origin and reconstructed dispersal events and the establishment of secondary centers of endemism. Although the data available at the time were rudimentary, Mayr was able to argue that (1) there was likely an endemic North American tropical avifauna prior to land connections with South America; (2) the South American avifauna was much more successful in limiting influx from North America than its mammalian fauna had been, possibly due to an ongoing history of interchange prior to closure of the Isthmus of Panama; and (3) habitats in North America vary significantly in the prominence of autochthonous as opposed to more recent Eurasian immigrant species (e.g., grassland assemblages are dominated by species that originated in North America, and temperate forest residents—but not migrants—are dominated by immigrants). Although of great general interest and historical importance, these insights were limited by methodological barriers that have been greatly reduced during the past half century.
Contemporary ecologists and evolutionary biologists now address these same areas of inquiry, using explicitly comparative phylogenetic methods (Webb et al. 2002, Ree et al. 2005, Goldberg et al. 2011), especially when faced with a poor fossil record for the group of interest. Although they are powerful, historical methods are limited by the sampling of taxa included in the phylogeny and may be biased when sampling is incomplete or nonrandom (Pybus and Harvey 2000, Wiens et al. 2007, Bokma 2008, Cusimano and Renner 2010, Brock et al. 2011), which is especially likely when investigating continent-scale questions such as those posed by Mayr. Therefore, much research has focused on the generation of comprehensive trees for taxa of interest using synthetic methods (Bininda-Emonds 2004a, de Queiroz and Gatesy 2007, Smith et al. 2009, Pearse and Purvis 2013), even when data are absent for significant fractions of species (Bininda-Emonds et al. 2007, Jetz et al. 2012). Although incomplete phylogenetic hypotheses may prove adequate for many process-related questions in which extremely broad-scale taxonomic coverage is more important than completeness, phylogenetic uncertainty limits our ability to address some lines of inquiry—especially those of a historical nature, in which the branching structure at particular nodes in the phylogeny provides core inferences about past events. Unfortunately, completely sampled phylogenies of particularly large clades remain rare. Here, we report a near-complete species-level phylogeny of a diverse, widespread New World bird lineage, the Emberizoidea. This songbird group—also known as the New World nine-primaried oscines—comprises the widely studied and ubiquitous blackbirds (Icteridae), cardinals (Cardinalidae), sparrows (Passerellidae), tanagers (Thraupidae), and wood-warblers (Parulidae). 
To our knowledge, this is the largest, essentially completely sampled, wholly data-based phylogenetic hypothesis for any group of organisms studied to date.
The Emberizoidea, comprising some ~832 species (or 7.8% of all birds), represents the second most diverse lineage of New World birds after the South American suboscine radiation (Barker et al. 2004). Due primarily to the importance of the northern Andes for the radiation of sparrows (García-Moreno and Fjeldså 1999, Cadena et al. 2007, 2011) and tanagers (Burns and Naoki 2004, Mauck and Burns 2009, Sedano and Burns 2010), this clade is most diverse in northern South America, but it is widespread throughout the entire mainland New World, as well as in the Greater and Lesser Antilles. The high dispersal potential of this lineage is further evident in its colonization of the Old World (the buntings, Emberizidae) as well as more distant islands, including the Galápagos (Darwin's finches) and the Tristan da Cunha group in the South Atlantic (Nesospiza and Rowettia). This group is as ecologically diverse as it is widespread, with a consequently impressive array of feeding adaptations, ranging from thin decurved bills in nectarivores to massive seed-crushing bills in granivores. This morphological diversity has contributed to a long and controversial history of higher-level classification, with many genera misclassified at the family level or long classified as incertae sedis. Now that molecular phylogenetic analyses have largely resolved such controversial relationships among higher taxa (Barker et al. 2002, 2004, 2013, Ericson and Johansson 2003), as well as revealing species-level relationships within most of the major emberizoid lineages (Lovette et al. 2010, Burns et al. 2014, Klicka et al. 2014, Powell et al. 2014), it is possible to conduct a synthetic analysis of the biogeographic history of this entire radiation. We report such a phylogenetic synthesis of the Emberizoidea here and use it to generate a quantitative, probabilistic analysis of emberizoid biogeographic origins and dispersal history, as requested by Mayr 50 years ago. 
Because the degree of participation of avian lineages in the Great American Biotic Interchange has been a focus of inquiry (Vuilleumier 1985, Barker 2007, DaCosta and Klicka 2008, Burns and Racicot 2009, Weir et al. 2009, Smith and Klicka 2010), especially in comparison with the rich fossil record of interchange in nonvolant mammals (Marshall et al. 1982, Webb 1985), we pay particular attention to the timing and directionality of dispersal events between North and South America.
We pursued a hierarchical sampling scheme for phylogeny reconstruction within the Emberizoidea. This approach focused on collection of mitochondrial DNA (the protein-coding genes CYTB and ND2, or only 1 of the 2 from species without available frozen tissue) from all species, supplemented by 4 nuclear genes (the protein-coding gene RAG1 and 3 introns, ACO1-I9, FGB-I5, MB-I2) sampled from generic exemplars as well as from deeply diverging lineages within genera as determined by mitochondrial DNA. One of the important outcomes of genus-level phylogenetic analysis of emberizoids was that gene trees are in fundamental conflict regarding basal relationships within the group (Barker et al. 2013). Mitochondrial DNA data place the Old World buntings (Emberizidae sensu stricto) as sister to the New World sparrows (Passerellidae), whereas nuclear genes place Emberizidae outside of a monophyletic New World radiation. Species tree analysis of the data agrees with the former relationship, whereas concatenation favors the latter. These 2 alternative perspectives are not possible to integrate directly in a species-level phylogeny of the group because of the hierarchical sampling used here, not to mention the size of the phylogeny, which makes species tree inference from a single alignment computationally unfeasible. For these reasons, our hypotheses of emberizoid phylogeny were generated using a planned supertree approach. Specifically, we generated phylogenetic hypotheses for major subclades of Emberizoidea (i.e. the families Cardinalidae, Emberizidae, Icteridae, Parulidae, Passerellidae, and Thraupidae), then grafted these subclades onto the backbone topologies of generic relationships that were inferred using both species tree and concatenated analyses (Barker et al. 2013; Supplemental Material Figures S1 and S2). 
Because we are interested in both absolute and relative timing of evolutionary events within Emberizoidea, and because we want the opportunity to take phylogenetic uncertainty into account in downstream analyses, we have focused on Bayesian relaxed clock analyses, as implemented in BEAST version 1.7.4 (Drummond et al. 2012). Below, we describe both the subclade analyses and the procedure we used to integrate subclade and backbone trees into a posterior distribution of supertrees, which we call a “pseudoposterior.”
Phylogenetic Inference for Major Subclades
Relative- or absolute-time-calibrated tree posteriors have already been generated for comprehensive species-level samples of the Parulidae (Lovette et al. 2010), Icteridae (Powell et al. 2014), Thraupidae (Burns et al. 2014), and Passerellidae (Klicka et al. 2014), and these are not described further here. Currently, no relative-time tree posteriors are available for the Calcariidae, Emberizidae, or Cardinalidae. For these groups, we constructed the most complete data matrices possible, including mitochondrial DNA data from all species for which it was available, as well as nuclear data from the loci listed above for major lineages within each (supplemented with data from another nuclear locus, ODC, for the Emberizidae; Alström et al. 2008). This yielded matrices with 6/6 Calcariidae, 36/41 Emberizidae, and 43/45 Cardinalidae sampled (for taxa and accessions, see Supplemental Material Table S1). These matrices were analyzed individually in BEAST under an uncorrelated lognormally distributed clock model (with an exponential prior, λ = 3) with independent branch lengths for each locus, a GTR+I+G4 model of sequence evolution for each protein-coding gene, and an HKY+G4 model for each intron (Barker et al. 2013). Chains were run for 2 replicates of 1 × 10⁷ generations each. Parameter convergence, burn-in, and sampling adequacy were assessed using Tracer version 1.5 (Rambaut and Drummond 2004), and convergence of nodal posterior probabilities was assessed with AWTY (Nylander et al. 2008).
Pseudoposterior Assembly and Summary
The posterior distributions of time-calibrated backbone trees (from concatenated and species tree analyses of a genus-level sample of emberizoids, described in Barker et al. 2013), along with uncalibrated posterior distributions of trees for each subclade (see above), were imported into R (R Development Core Team 2012) using the “ape” package (Paradis et al. 2004). Outgroups for subclades were pruned as necessary. A pseudoposterior (a distribution of constructed supertrees) was generated using the following procedure: (1) Randomly select (with replacement) a backbone tree from its posterior; (2) randomly select (with replacement) 1 tree for each subclade; (3) for each subclade, graft the selected tree onto the backbone tree, such that the basal split of the subclade tree is coincident with the basal split for the corresponding clade in the backbone tree, scaling the subclade tree so that the grafted tree remains ultrametric; and (4) go to step 1 and repeat until a predetermined number of ultrametric trees have been generated. This procedure was automated by a script in R (available at the Dryad Digital Repository; Barker et al. 2014). For the present study, we assembled pseudoposterior distributions of 4,000 trees each for both the concatenated and species tree backbone analyses. These pseudoposteriors were summarized as maximum clade credibility (MCC) trees and associated posterior probabilities using BEAST's TreeAnnotator application. Where negative branch lengths occurred in the MCC trees, they were reflected and downstream branch lengths rescaled to maintain ultrametricity using a script in R.
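The four steps above can be sketched in a few lines. The following is a minimal illustration in Python, not the R script deposited by the authors: the dictionary-based tree structure, the `clade` label that marks each subclade's crown node in the backbone, the toy input, and all function names are assumptions made for this example. Subclade trees are taken to be in relative time and are rescaled so that their root matches the age of the corresponding backbone node.

```python
import random

def node(height, children=(), name=None, clade=None):
    """Hypothetical minimal tree node; height is time before present (tips are 0)."""
    return {"height": height, "children": list(children), "name": name, "clade": clade}

def rescale(tree, factor):
    """Copy a tree, multiplying every node height by factor."""
    return node(tree["height"] * factor,
                [rescale(c, factor) for c in tree["children"]], tree["name"])

def graft(backbone, chosen):
    """Copy the backbone, replacing each labelled crown node with its subclade tree."""
    sub = chosen.get(backbone["clade"])
    if sub is not None:
        # Scale so the subclade root sits at the backbone crown age; tips stay at 0.
        return rescale(sub, backbone["height"] / sub["height"])
    return node(backbone["height"],
                [graft(c, chosen) for c in backbone["children"]], backbone["name"])

def pseudoposterior(backbones, subclades, n_trees, rng):
    """Assemble n_trees supertrees by sampling with replacement and grafting."""
    trees = []
    while len(trees) < n_trees:                      # step 4: repeat until done
        backbone = rng.choice(backbones)             # step 1: sample a backbone tree
        chosen = {k: rng.choice(v) for k, v in subclades.items()}  # step 2
        trees.append(graft(backbone, chosen))        # step 3: graft and rescale
    return trees

def tip_heights(tree):
    """Heights of all tips; all zeros means the tree is ultrametric."""
    if not tree["children"]:
        return [tree["height"]]
    return [h for c in tree["children"] for h in tip_heights(c)]

def toy_backbone(root_age, crown_age):
    """Toy backbone: one labelled crown node (two exemplars) plus an outgroup."""
    crown = node(crown_age, [node(0.0, name="exemplar1"),
                             node(0.0, name="exemplar2")], clade="A")
    return node(root_age, [crown, node(0.0, name="outgroup")])

# Toy input: two backbone samples and one relative-time subclade sample.
backbones = [toy_backbone(20.0, 12.0), toy_backbone(18.0, 10.0)]
subclades = {"A": [node(1.0, [node(0.4, [node(0.0, name="a1"), node(0.0, name="a2")]),
                              node(0.0, name="a3")])]}
trees = pseudoposterior(backbones, subclades, 5, random.Random(1))
```

Because every subclade height is multiplied by the same factor, relative node ages within a grafted subclade are preserved while its crown age follows whichever backbone tree was drawn.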
We reconstructed the biogeographic history of emberizoids at a roughly continental scale, dividing the current range of the group into the Old World (emberizid buntings are widely distributed across Eurasia and Africa), North America, South America (the 2 continents demarcated by the Canal Zone in Central Panama, roughly corresponding to the area of final seaway closure between them; Coates and Obando 1996), the Caribbean (excluding continental Trinidad and Tobago), and a separate category for other offshore islands including the Galápagos, Cocos Island, and the Tristan da Cunha group in the South Atlantic. Because many emberizoids are either short- or long-distance migrants, and there is controversy regarding which part of their range represents the ancestral resident area for such species (Gauthreaux 1982, Cox 1985, Bell 2000, Salewski and Bruderer 2007), we examined species distributional coding using both the breeding and wintering distributions. At this broad-scale level of analysis, vicariance is unlikely to be a significant factor in the evolution of emberizoids, given that all the demarcated areas were isolated by water barriers for most of the group's history (ending with closure of the Isthmus of Panama some 3 mya). For this reason, we analyzed the distributional data with multistate character methods in a likelihood framework using BayesTraits (Pagel and Meade 2006), rather than using explicitly biogeographic methods such as the dispersal–extinction–cladogenesis method (implemented in Lagrange; Ree et al. 2005) or dispersal–vicariance analysis (Ronquist 1997); however, for the same reasons, we would expect results obtained with such methods to be essentially identical to those reported here. For each of the MCC trees obtained, we used BayesTraits to optimize the distributional data and obtain relative probabilities for each area at all nodes in the tree, using a partially constrained asymmetrical model of state transitions. 
In order to achieve stable parameter estimates in the face of a very small number of transitions involving offshore islands (not including the Antilles), we set the rate of dispersal from those islands to the mainland to zero. Consequently, our study cannot be considered a test of the direction of dispersal involving offshore islands (i.e. Galápagos and Tristan da Cunha group): Focused analyses of the history of these groups are necessary (e.g., Burns et al. 2002, Ryan et al. 2013).
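The constraint described above amounts to zeroing one row of the instantaneous rate matrix. The sketch below is a minimal Python illustration, assuming a standard continuous-time Markov model for the discrete areas. The rate values are arbitrary placeholders rather than estimates from this study, and the function names are illustrative and not part of BayesTraits.

```python
AREAS = ["N", "S", "C", "I", "O"]  # N/S America, Caribbean, other islands, Old World

def rate_matrix(rates, no_exit=("I",)):
    """Build Q from {(source, target): rate}; areas in no_exit get outgoing rates of 0."""
    n = len(AREAS)
    q = [[0.0] * n for _ in range(n)]
    for i, src in enumerate(AREAS):
        for j, dst in enumerate(AREAS):
            if i != j and src not in no_exit:
                q[i][j] = rates.get((src, dst), 0.0)
        q[i][i] = -sum(q[i])  # diagonal makes each row sum to zero
    return q

def matmul(a, b):
    """Product of two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def transition_probs(q, t, terms=40):
    """P(t) = exp(Qt) by truncated Taylor series; adequate when rate * time is small."""
    n = len(q)
    qt = [[x * t for x in row] for row in q]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in matmul(term, qt)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result

# Placeholder rates (per Ma), asymmetric between N and S; not values from the paper.
rates = {("N", "S"): 0.06, ("S", "N"): 0.02, ("N", "C"): 0.01,
         ("S", "I"): 0.005, ("N", "O"): 0.002,
         ("I", "S"): 0.03}  # this island-to-mainland entry is dropped by the constraint
Q = rate_matrix(rates)
P = transition_probs(Q, t=2.0)
```

With the island row of `Q` set to zero, a lineage that reaches the offshore islands stays there with probability 1 under the model, which is why the direction of island dispersal cannot be tested with this parameterization.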
In order to test the significance of observed dispersal asymmetry between North and South America during the Great American Biotic Interchange, a model constraining symmetry was also optimized, and model likelihoods were compared using the likelihood ratio. We identified isthmus-crossing lineages on the basis of relative-likelihood calculations described above. Dispersal events inferred for a branch could have occurred at any point along the branch; consequently, we treated inferred dispersal of each lineage as a uniform distribution across its corresponding time interval, then integrated these distributions across the entire history of the clade (for a similar treatment, see Cody et al. 2010). Because many inferred dispersal events involved species that were ancestrally either North or South American but that currently occur in both areas (i.e. range expansions), this treatment will necessarily spread inferred dispersal densities into the past, though most of these cases likely involve fairly recent range expansions or sequential dispersal and divergence (i.e. unrecognized speciation events within widespread species). Additional intraspecific sampling within this group is necessary to overcome this issue (Weir et al. 2009, Smith and Klicka 2010), and the results presented here should be conservative with regard to the hypothesis of a burst of post-Isthmian dispersal. Because clade diversity necessarily increases over time, we also divided this distribution by the integral of standing diversity across the clade's history to obtain a per lineage dispersal rate for comparison with the estimated dates of Isthmian closure. We assessed the symmetry of exchange by examining inferred transition rates and the frequency of individual reconstructed transitions from the discrete model analysis and by performing likelihood ratio tests of symmetry.
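The treatment of dispersal timing described above can be illustrated with a short sketch. This Python example is a hedged approximation on a discrete time grid, not the authors' procedure: the binning, the toy branch ages, and the function names are all assumptions, and every branch is assumed to have positive length. Each crossing is spread uniformly along its branch, and the summed density in each bin is divided by the lineage-time (integrated standing diversity) in that bin.

```python
def overlap(branch, lo, hi):
    """Length of the part of a branch (young_age, old_age) that falls in [lo, hi]."""
    young, old = branch
    return max(0.0, min(old, hi) - max(young, lo))

def dispersal_density(crossing_branches, edges):
    """Expected crossings per time bin, each event spread uniformly along its branch."""
    density = [0.0] * (len(edges) - 1)
    for branch in crossing_branches:
        length = branch[1] - branch[0]
        for i in range(len(density)):
            density[i] += overlap(branch, edges[i], edges[i + 1]) / length
    return density

def lineage_time(all_branches, edges):
    """Standing diversity integrated over each bin (lineage-Ma)."""
    return [sum(overlap(b, edges[i], edges[i + 1]) for b in all_branches)
            for i in range(len(edges) - 1)]

def per_lineage_rate(crossing_branches, all_branches, edges):
    """Crossings per lineage per Ma in each bin."""
    dens = dispersal_density(crossing_branches, edges)
    time = lineage_time(all_branches, edges)
    return [d / t if t > 0 else 0.0 for d, t in zip(dens, time)]

# Toy data: branches as (young_age, old_age) in Ma; two of the four carry a crossing.
edges = [0.0, 2.0, 4.0, 6.0]
crossing = [(0.0, 2.0), (1.0, 5.0)]
all_branches = crossing + [(0.0, 6.0), (0.0, 6.0)]
density = dispersal_density(crossing, edges)
rate_per_bin = per_lineage_rate(crossing, all_branches, edges)
```

In the toy data, the crossing on the long branch (1 to 5 Ma) contributes only a quarter of an event to the youngest bin, which shows how long branches smear inferred dispersal into the past.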
Phylogeny of Emberizoidea
We obtained synthetic phylogenetic hypotheses (essentially planned supertrees) for 791 of an estimated ~832 emberizoid species, for 95% sampling. These summary trees and tree pseudoposterior distributions (Figure 1; also see Supplemental Material Figures S1 and S2, deposited along with the code used in generation of the pseudoposterior in the Dryad Digital Repository; Barker et al. 2014) represent the best, most comprehensive estimate of phylogenetic relationships in this highly diverse clade, summarizing and adding to previously published higher-level and subclade analyses. These estimates are well resolved, with ≥73% of all nodes reconstructed with 95% “posterior probability” (remembering that these values are composites of individual subclade concatenated analysis numbers and backbone species tree or concatenation tree numbers). As previously discussed (Barker et al. 2013), there are some substantial disagreements regarding basal relationships among major emberizoid clades (families in our treatment); more extensive gene sampling at that level is required to resolve these relationships. For most purposes, however, we expect that this uncertainty will have little effect on analyses using these trees, as we demonstrate below through the results of our biogeographic analyses of the group.
Maximum likelihood analysis of emberizoid distributional data as a discrete character strongly favors a North American origin for this currently widespread New World group (Table 1 and Figure 2; Supplemental Material Figures S1 and S2). This result pertains regardless of which backbone tree is used (Table 1) and whether we analyze current breeding or wintering distributions (not shown). This result is also consistent across samples (N = 100) from both the species tree and concatenated tree posteriors, with all replicates yielding ≥0.95 probability for North America at the root. On the species tree backbone, constraining the ancestral state for this clade to be South American reduces the ln-likelihood value for the model by 4.6 units (5.3 for the concatenation tree), corresponding to a relative likelihood of <1%. In addition to the clade as a whole, 4 of the 5 most species-rich families of Emberizoidea were reconstructed as North American (Table 1 and Figure 2).
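The step from a difference in ln-likelihood to a relative likelihood is a single exponentiation. A minimal Python sketch, using the two differences reported above (the function name is illustrative):

```python
import math

def relative_likelihood(delta_lnl):
    """Likelihood of the constrained model relative to the unconstrained one,
    given the drop in ln-likelihood (entered as a positive number)."""
    return math.exp(-delta_lnl)

species_tree = relative_likelihood(4.6)   # about 0.010
concatenated = relative_likelihood(5.3)   # about 0.005
```

Because the reported differences are rounded to one decimal place, the species tree value sits right at the 1% threshold.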
Table 1. Likely ancestral areas for Emberizoidea (root node) and major subclades. Shown is the proportion of marginal likelihood attributable to individual areas in a maximum likelihood analysis of discrete distributional data on 2 alternative emberizoid phylogenies.
Timing and Directionality of Dispersal
Analysis of the timing of transcontinental dispersal events supports a post-Isthmian-closure increase in both absolute and relative rates of dispersal (Figure 3). The total number of lineages involved in dispersal between North and South America increases continuously throughout emberizoid history, but markedly so after final closure of the Isthmus (Figure 3B). However, this increase has to be considered in the context of the entire clade's history: Any randomly evolving discrete character would increase in the number of inferred transitions toward the present as the total amount of time sampled by the phylogeny increases. Figure 3C shows the density of crossing lineages corrected for standing diversity. Two patterns are of particular note. First, the relative rate of crossing is basically constant through most of the history of the clade, excepting a very high early rate due to the early invasion of South America by the cardinal–tanager clade (0.62 crossing lineage⁻¹; not shown on the figure because of scaling) at a time when we infer very few extant emberizoid lineages (though this is certainly an underestimate, the magnitude of which depends on extinction rates in this group). Second, we note a more modest but appreciable post-closure increase in dispersal rates. As expected on the basis of previous results in birds (Weir et al. 2009; but see Smith and Klicka 2010) and mammals (Marshall et al. 1982), we found strong evidence for asymmetry in dispersal between continental areas. Although rates in both directions were significant, we inferred rates from North to South America >3 times higher than the reverse (Table 2), similar to the pattern for mammals as a whole but in striking contrast to previous results from broader surveys of birds (Weir et al. 2009, Smith and Klicka 2010). Similarly, excluding the origin of widespread distributions, we found nearly twice as many inferred dispersal events from North to South America as in the reverse direction (Table 3). 
Constraining the model of discrete character evolution to symmetry between North and South American dispersals reduced the likelihood by 8.4 natural log units, strongly supporting the asymmetric model (likelihood ratio test on −2 ln Λ; P = 0.004).
Table 2. Transition (inferred dispersal) rates among areas (Ma⁻¹), as inferred by partially constrained (transitions from non-Caribbean islands set to zero) maximum likelihood analysis of distributional data on the species tree estimate of emberizoid phylogeny.
Table 3. Frequency of reconstructed distribution changes from discrete character analysis of emberizoid ranges based on maximum likelihood reconstruction on the species tree backbone estimate (N = North America, S = South America, C = Caribbean, I = Other Islands, and O = Old World). Changes are divided by location in the tree: external or terminal branches with a single descendant versus internal branches. X→X transitions imply intra-region divergences and are reported to reflect regional diversity.
Strategies for Construction of Large Phylogenetic Trees
Ideas on methods for construction of large phylogenetic trees have been largely focused on the debate between “supertree” and “supermatrix” approaches (Bininda-Emonds 2004a, 2004b, de Queiroz and Gatesy 2007). On one hand, supertrees—generated by quantitative analysis and integration of independent phylogenetic trees without reference to the original data—may be computationally less demanding and can incorporate different incompatible data types (e.g., distance and character data). On the other hand, supermatrix analyses—integration of all available character data into a maximally informative matrix for subsequent phylogenetic analysis—afford the opportunity for data to interact in new ways that may reveal novel hypotheses or emergent support (Gatesy and Baker 2005). Philosophically, many systematists tend to favor the supermatrix approach, but are hindered by significant computational constraints that limit our ability to appropriately analyze large multigene matrices. In particular, species tree methods are the most appropriate tool for analyzing large multigene matrices, but fully parametric approaches are computationally intensive, which can limit the numbers of both genes and taxa analyzed (Cranston et al. 2009). Although less rigorous non- or pseudo-likelihood approaches are available (Liu et al. 2009, 2010), our previous analyses of emberizoid molecular data suggest that such methods can fail to adequately reflect the information content of the available data (Barker et al. 2013). Given these problems, we favored a hierarchical approach that integrates species tree and concatenated analyses of limited sampling to construct a well-supported higher-level backbone for the phylogeny, with taxonomically extensive concatenated analyses of specific subclades. Ideally, subclade analyses would also use species tree methods, but the large size of some subclades (e.g., tanagers) makes this unfeasible for the group treated here. 
Our approach is essentially a “planned supertree” strategy, a close parallel to divide-and-conquer strategies increasingly used in phylogenetics (Roshan et al. 2004, Goloboff and Pol 2007), that reflects what we consider to be the best possible analyses at each hierarchical level of this large phylogeny. Given the ubiquity of gene tree incongruence at multiple hierarchical levels (Edwards et al. 2007, Brumfield et al. 2008, Cranston et al. 2009, Barker et al. 2013), we anticipate that this strategy will continue to be of use in the near term because of limitations on both the number of genes and the number of taxa that are computationally feasible using species tree methods.
At least 3 previous studies have presented fairly comprehensive phylogenetic hypotheses for this group based on alternative strategies for tree construction. In particular, 2 supertree studies (Jønsson and Fjeldså 2006, Davis and Page 2014) and 1 hybrid supermatrix–supertree study (Jetz et al. 2012) have included 373, 525, and 684 emberizoids (45%, 63%, and 82%), respectively. Although these studies generally do a reasonable job of reflecting their constituent trees and/or underlying molecular data, they each have serious drawbacks as representations of emberizoid evolution. First, the 2 supertree studies omit critical, deeply diverging emberizoid taxa such as Nesospingus, Spindalis, Teretistris, Calyptophilus, Mitrospingus, Lamprospiza, and Rhodinocichla. Second, the same studies reflect some relationships now known to be spurious in the light of additional data or analyses (e.g., Nesopsar as a basal member of the blackbirds, failing to find some Thraupis tanagers nested within Tangara). Third, the Davis and Page (2014) supertree, which was constructed algorithmically rather than “hand assembled” in light of expert knowledge like the Jønsson and Fjeldså (2006) tree, exhibits a number of clear artifacts, such as placement of Rhodinocichla well within the tanagers and recovery of a clade of 1 Sturnella and 4 Icterus species far outside of Icteridae. In contrast to the other 2 studies, the Jetz et al. (2012) supertree of emberizoid relationships is based directly on analyses of available GenBank data (at least prior to grafting of missing taxa). One consequence of this is that their study includes the critical taxa missing from the supertrees. Even so, the study includes relationships that can only be explained as resulting from faulty assumptions used in tree assembly. In particular, the Jetz et al. tree presents finches of the genus Chlorophonia as members of the tanager clade, a result that can only reflect the assumption that these are tanagers, given that both mitochondrial and nuclear DNA strongly support their placement outside of emberizoids (Burns 1997, Barker et al. 2013), and that the Jetz tree correctly places the closely related Euphonia with the finches. Similarly, their tree weakly places South Atlantic Nesospiza and Rowettia within tanagers as distant relatives of Phaenicophilus: Mitochondrial DNA (the only locus sampled for these taxa by Jetz et al.) strongly supports these taxa as close relatives of southern cone Melanodera and Andean Phrygilus (Barker et al. 2013, Ryan et al. 2013, present study). Other differences between our trees, such as their failure to recover Mitrospingidae as a deep lineage closely related to but outside of Thraupidae, and the lack of monophyly of Thraupidae as we recover it here, may reflect differences in gene and taxon sampling. Still others, such as the placement of Geospiza fortis far outside of Geospiza, are probably due to alignment artifacts or GenBank annotation errors. These and other issues with previous studies, along with our tree's more comprehensive taxon sampling and its improved support at multiple hierarchical levels, suggest that our hypothesis of emberizoid relationships is the best available to date.
Biogeographic Origins of Emberizoidea
Oscine passerines appear to have arisen in Sahul (the continental mass including Australia, Tasmania, New Guinea, and portions of Wallacea; Barker et al. 2002, 2004), possibly as early as the Paleocene (66–56 mya), although the age of passerines has been debated (Barker et al. 2004, Ericson et al. 2006, Brown et al. 2008, Cracraft and Barker 2009, Mayr 2013). Subsequently, 1 diverse lineage of this group (the Passerida) is thought to have dispersed into Eurasia (Barker et al. 2004, Barker 2011) or Africa (Ericson et al. 2003), with later dispersal into the New World. In terms of species numbers, the Emberizoidea represent the most successful New World lineage of Passerida, exceeded in diversity in the region only by the endemic radiation of suboscines (Barker et al. 2004). Given an estimated stem age of ~20 Ma for the Emberizoidea (Barker et al. 2004), its presence in the New World is likely the result of dispersal over a Beringian land bridge (as suggested by Mayr 1946, 1964), given that North Atlantic and Antarctic land connections were sundered by water barriers by 39 and 26 mya, respectively (reviewed in Sanmartín et al. 2001, Sanmartín and Ronquist 2004). However, given the clear ability of many Recent birds to colonize new areas after dispersing long distances across water, this hypothesis has required testing in a phylogenetic framework.
Our reconstruction of ancestral areas strongly supports a North American origin for the group and, hence, a Beringian dispersal route to the New World (Table 1 and Figure 2), in agreement with previous judgments (Mayr 1946, 1964). Subsequent to dispersal across Beringia, this group of songbirds has accumulated ~791 species in the New World (plus 41 in the Old World, resulting from back-dispersal of Emberizidae at ~11.8 mya), for a total of ~7.8% of all extant avian species diversity. Our evidence for trans-Beringian dispersal and northern ancestry of this lineage suggests that a long-distance, transoceanic dispersal event to South America did not lead to colonization of the New World by emberizoids, as has been suggested for a variety of other taxa (Simpson 1980, Renner 2004). Therefore, despite the potential for rare, long-distance dispersal events to shape patterns of avian distribution (e.g., Telfair 1994), our results suggest that gradual range expansion, mediated initially by a Beringian land bridge, was the likely route of New World colonization by this group.
Intrahemispheric Dispersal and Diversification of the Emberizoidea
The earliest history of this group was dominated by diversification within North America, as evidenced by 4 of the 5 major clades within the group appearing to be ancestrally North American (Table 1 and Figure 2). These reconstructions are largely consistent with Mayr's (1946, 1964) assessment of these groups, although our analysis resolves his ambiguous “Pan-American” families Emberizidae (classified as North American in 1946) and Icteridae (classified as “probably originally South American” in 1946) as unequivocally North American. Although North America was clearly an important early center of emberizoid diversification, at least 3 relatively ancient dispersal events led to the founding of endemic South American and Caribbean lineages (Figure 2). One dispersing lineage would have been the common ancestor of Mitrospingidae, Cardinalidae, and Thraupidae, although it is possible that Mitrospingidae and Thraupidae independently reached South America. Subsequently, the Thraupidae have gone on to become a diverse component of many South American avifaunas (Stotz et al. 1996) and have not diversified substantially in North America. The history of Caribbean emberizoid diversity is less clear because of significant uncertainty in phylogenetic relationships of these lineages (Barker et al. 2013) but involved at least 2 ancient dispersal events (at ~11.7 and ~12.6 mya) into the region (the Teretistridae and the clade including Nesospingidae, Spindalidae, Calyptophilidae, and Phaenicophilidae; Figure 2). Dispersal among these regions increased in frequency as the total diversity of emberizoid clades increased, as well as with changes in connectivity (e.g., formation of the Isthmus of Panama; see below).
As expected given the relative sizes of the landmasses involved, interchange between North and South America dominates the history of emberizoid dispersal within the New World (Table 2). Importantly, dispersal rates between these 2 landmasses do not appear to be temporally uniform. Aside from the early dispersal of thraupids and their allies into South America, much of emberizoid history is characterized by relatively constant dispersal rates. Subsequent to the closure of the Isthmus of Panama, however, there was a marked increase in per-lineage dispersal between the continents (~30% on average, but up to twofold in the most recent 300,000 yr; Figure 3). As noted above, it is likely that the analyses presented here underestimate the recent peak in dispersals, given that dispersal probabilities are effectively smoothed out by long branches. Even so, it seems clear that dispersal between North and South America has increased over the past 2 million years or so, likely as a result of closure of the Isthmus.
Notably, the second-highest overall inferred dispersal rate was from North America to the Caribbean. This is driven primarily by relatively recent dispersals of parulid and icterid lineages into the Caribbean, but also by the much more ancient origins of endemic family-level lineages (e.g., Teretistridae and Calyptophilidae; see above). These reconstructions suggest an early and ongoing importance of the Caribbean to diversification in this group. As expected, given faunal turnover as a function of island area and susceptibility to catastrophic weather events (e.g., hurricanes), all of the ancient family-level diversity in this group is found on the larger islands within the Greater Antilles, whereas generic- and species-level diversity of more recent origin is found in both the Greater and Lesser Antilles (e.g., Ptiloxena, Melopyrrha, Loxigilla, Catharopeza, and Setophaga plumbea). Clearly, both immigration and intrabasin diversification have played a role in assembly of the Caribbean avifauna, given that there are 54 species in this clade with some portion of their distribution in the Caribbean, and only 27 reconstructed dispersals or range expansions into the region (Table 3). It is clear that assembly involved multiple processes, including one-off immigration events such as Dendroica vitellina (possibly a migratory “drop-off”; Outlaw et al. 2003, Riesing et al. 2003), the origin of widespread colonizing lineages such as Coereba, formation of allopatric clusters (either by local differentiation of a widespread ancestor or by island hopping) of closely related lineages within the Caribbean basin (e.g., Loxigilla and allies), and intra-island diversification as seen in the 4 Hispaniolan species of Phaenicophilidae. Thus, repeated cycles of colonization, isolation, specialization, and extinction have shaped modern emberizoid diversity in the Caribbean basin (Ricklefs and Cox 1972, Ricklefs and Bermingham 1999, 2001, 2007, Ricklefs 2010).
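The argument that immigration alone cannot account for Caribbean diversity rests on a comparison of two numbers from Table 3. The following is a minimal sketch of that reasoning, not an analysis performed in this study. It assumes, as a simplification, that each reconstructed arrival founds at most one Caribbean lineage and that no arrivals went undetected; the function name and dictionary keys are hypothetical.

```python
def diversification_summary(species: int, arrivals: int) -> dict:
    """Compare species richness with the number of reconstructed arrivals.

    Simplifying assumptions (not from the source): each arrival founds
    at most one lineage, and every arrival has been detected.
    """
    if species < 0 or arrivals <= 0:
        raise ValueError("species must be >= 0 and arrivals must be > 0")
    # Species that cannot each be traced to a separate arrival must have
    # arisen by lineage splitting after arrival.
    excess = max(species - arrivals, 0)
    return {
        "species_per_arrival": species / arrivals,
        "min_species_from_splitting": excess,
    }

# Values quoted in the text: 54 species with some part of their range in
# the Caribbean, and 27 reconstructed dispersals or range expansions.
summary = diversification_summary(species=54, arrivals=27)
print(summary)
```

Under these assumptions, about half of the 54 species would require splitting after arrival. The true contribution of intrabasin diversification could differ if some arrivals left no living descendants or if some Caribbean populations belong to species that are also widespread on the mainland.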
We have reported here a broad-scale analysis of origins and trends in interhemispheric dispersal of a widespread passerine radiation. We note that more detailed inference of geographic history is complicated by a feature of emberizoid biology that has often confounded avian historical biogeographic analyses: Many emberizoid species are long-distance seasonal migrants between North American breeding grounds and Neotropical wintering grounds. There has been considerable debate as to whether the breeding or nonbreeding distributions of migratory species represent the true ancestral ranges of migratory lineages (Gauthreaux 1982, Cox 1985, Bell 2000, Salewski and Bruderer 2007). In particular, it has been suggested that North American breeding ranges of many Nearctic–Neotropic migratory bird species may have evolved from ancestral areas closer to present-day nonbreeding ranges during the evolution of migration (e.g., Cox 1985), which would cast doubt on the ultimate North American ancestry of such lineages. Here, we find that both the breeding and nonbreeding distributions of the group are reconstructed as North American, supporting our conclusion that the ancestral emberizoid colonized the New World via Beringia as opposed to long-distance overwater dispersal to South America. However, it remains unclear how migratory ranges evolved among emberizoids subsequent to colonization of the New World, and how the emergence of these disjunct breeding and wintering ranges influenced interhemispheric dispersal patterns in this group. A biogeographic analysis that more explicitly evaluates the reciprocal evolution of breeding and wintering range throughout lineage history is required to resolve this issue as well as determine the geographic origins of long-distance migratory species.
More than 50 years ago, Mayr (1946, 1964) envisioned—and attempted—a comprehensive analysis of the evolutionary origins of New World avifaunas and the ecosystems in which they are found. In the decades since, we have made significant methodological advances that have substantially increased the level of analytical detail as well as the rigor of inference possible. We are only now beginning to take up these questions with new data and methods (Lovette and Hochachka 2006, Weir et al. 2009, Barker et al. 2013), but the initial results, including those reported here, are encouraging. In agreement with Mayr, we reconstructed a North American origin for the Emberizoidea and most of its major clades, supporting a Beringian dispersal route for the group, as well as the existence of an endemic North American tropical avifauna. We also found that the tanagers form part of the endemic South American avifauna, derived from an early dispersal from the north. Although study of a single clade does not allow us to assess the overall asymmetry of faunal exchange, we found evidence of ongoing interchange between North and South America—both pre- and post-Isthmian—with much higher rates after closure. We also found a bias toward northern invasion of the south, consistent with a northern origin: Given the prominence of emberizoids in the New World avifauna, this suggests that Mayr may have underestimated the asymmetry of exchange. A more comprehensive assessment of the avifauna (e.g., Weir et al. 2009, Smith and Klicka 2010), using comprehensively sampled phylogenies for all relevant groups, will be necessary to assess this. Although beyond the scope of this paper, other questions addressed by Mayr offer clear avenues for future research. In particular, the phylogenetic structuring of New World avian communities is a question only now being addressed quantitatively with phylogenetic methods (Lovette and Hochachka 2006, Ricklefs 2011). 
The phylogenetic hypothesis presented here will be a critical resource for addressing these and a wide array of other evolutionary and ecological questions.
We thank the scientific collectors, collection managers, staff, and curators at the following institutions for providing the tissues without which this study would have been impossible: American Museum of Natural History; Academy of Natural Sciences, Philadelphia; University of Minnesota, Bell Museum of Natural History; Colección Ornitológica Phelps; Cornell University Museum of Vertebrates; Universidad del Valle, Colombia; Field Museum of Natural History; Instituto de Investigación de Recursos Biológicos Alexander von Humboldt; Instituto de Ciencias Naturales, Universidad Nacional de Colombia; University of Kansas Natural History Museum; Natural History Museum of Los Angeles County; Louisiana State University Museum of Natural Science Collection of Genetic Resources; Museo Argentino de Ciencias Naturales “Bernardino Rivadavia”; University of Nevada Las Vegas, Barrick Museum of Natural History; Museum of Vertebrate Zoology, University of California, Berkeley; San Diego State University Museum of Biodiversity; Smithsonian Tropical Research Institute; University of Michigan Museum of Zoology; National Museum of Natural History (Smithsonian Institution); University of Washington, Burke Museum; and Zoological Museum, University of Copenhagen. This research was supported by the National Science Foundation (DEB-0316092 to S.M.L. and F.K.B.; IBN-0217817, DEB-0315416, and DEB-1354006 to K.J.B.; DEB-0315469 to J.K.; and DEB-0315218 to I.J.L.). Some analyses were performed with resources of the Minnesota Supercomputing Institute. We thank B. Winger and two anonymous reviewers for comments on the manuscript.