Phylogeography is experiencing a revolution brought on by next-generation sequencing methods. A historical survey of the phylogeographic literature suggests that phylogeography typically incorporates new questions, expanding on its classical domain, when new technologies offer novel or increased numbers of molecular markers. A variety of methods for subsampling genomic variation, including restriction site associated DNA sequencing (Rad-seq) and other next generation approaches, are proving exceptionally useful in helping to define major phylogeographic lineages within species as well as details of historical demography. Next-generation methods are also blurring the edges of phylogeography and related fields such as association mapping of loci under selection, and the emerging paradigm is one of simultaneously inferring both population history across geography and genomic targets of selection. However, recent examples, including some from our lab on Anolis lizards and songbirds, suggest that genome subsampling methods, while extremely powerful for the classical goals of phylogeography, may fail to allow phylogeography to fully achieve the goals of this new, expanded domain. Specifically, if genome-wide linkage disequilibrium is low, as is the case in many species with large population sizes, most genome subsampling methods will not sample densely enough to detect selected variants, or variants closely linked to them. We suggest that whole-genome resequencing methods will be essential for allowing phylogeographers to robustly identify loci involved in phenotypic divergence and speciation, while at the same time allowing free choice of molecular markers and further resolution of the demographic history of species.
Like many fields in evolutionary biology, phylogeography has been consistently transformed by available technologies for assaying genetic variation. Indeed, the various approaches for measuring genetic variation have waxed and waned as available technologies come and go. A determinist viewpoint might suggest that major trends in marker use over the decades have been driven largely by available technologies, such as PCR and next-generation sequencing (reviewed in Brito & Edwards 2009). However, a more “free will” perspective might suggest that marker choice is driven also by the conceptual needs of the discipline and a rallying of the field behind core concepts that argue for one set of markers over another. What seems clear is that, as technologies available to phylogeographers have changed, the borders of the discipline - the types of questions and hypotheses that are tackled - also changed. In particular, phylogeography now appears to regularly pose questions that were traditionally the domain of its closely allied sister discipline, population genetics, and increasingly in the domain of finding the loci responsible for phenotypic variation in natural populations. These new questions themselves can also drive the shape of the field and the tools that phylogeographers adopt in order to answer them. In this essay we explore the history and future of phylogeography through this lens of changing technologies and questions. We suggest that the domains of phylogeography have expanded to include surveys of selection and covariation of genes and environment across the landscape as a result of the increasing ability to assay variation at large numbers of loci through next-generation sequencing. We suggest that trends in marker use imply both a degree of technological determinism as well as shifts in free choice of markers over time. Like other recent reviews, we foresee a day when a greatly expanded toolkit of markers and phylogeographic questions will be readily available through routine whole-genome resequencing of natural populations sampled across geographic space.
Molecular markers and the core concepts of phylogeography
In the 1990s, the polymerase chain reaction (PCR) was the primary driver, and this led to a proliferation of studies focusing on mitochondrial DNA in animals, and on chloroplast DNA in plants. The focus on organellar genomes was not necessarily prescribed by PCR, but the ease of amplification using PCR and the challenges of working routinely with nuclear genes perceived by phylogeographers made organelle genomes a practical focus. MtDNA and cpDNA were also attractive because they were polymorphic at the intraspecific level and experienced little if any recombination, making it straightforward to move from DNA sequence to gene tree without additional data manipulation. With this focus on organellar genomes came the oft-repeated caveats now familiar to every student of phylogeography (Edwards & Bensch 2009): that differences in gene flow between the sexes might cause organelle genomes to recover a biased history of a given species; that the smaller effective population size of organellar genomes might cause the genetic lineages to track population lineages more faithfully than the average marker, sometimes resulting in an overly simplistic view of population history; and, more recently, that natural selection on organelle genomes, particularly mtDNA, might yield estimates of genetic diversity, or spatial patterns in the distribution of that diversity, which do not reflect neutral processes or recent population history (Rand 2001, Excoffier & Ray 2008, Nabholz et al. 2008, 2009). Rather, lineage-specific mutation rates, which are hard to predict for a given species from first principles or life history parameters, appear to be the best predictor of variation in genetic diversity across species, at least in birds and mammals. Natural selection has also emerged as a key determinant of mitochondrial diversity within species, as well as of relationships between mtDNA haplotype or codon distributions and latitude or thermal environment, potentially compromising efforts to understand neutral diversity via neutral markers (Gerber et al. 2001, Ballard & Whitlock 2004, Ribeiro et al. 2011, Jobling 2012, Ballard & Pichaud 2014, Morales et al. 2015). Although genetic diversity within populations and species is not always a primary focus of phylogeographic studies, it is certainly a basic descriptor of population history and the forces governing it warrant our attention.
The first forays in to the nuclear genome in animal phylogeography came in the early 1990s in the form of PCR-amplified nuclear DNA sequences and via microsatellites. Diploid nuclear genes were typically amplified via PCR and sequenced directly (i.e. without cloning first), a practice that led to much hand-wringing over how to determine the phase of nuclear haplotypes comprising the PCR product from heterozygous individuals (Palumbi & Baker 1994, Hare & Palumbi 1999). The phase of nuclear alleles is important because only after determining phase is one able to coherently analyze alleles within populations or linked sites, even for PCR products of a few hundred base pairs. Even today, particularly when PCR-amplified nuclear genes are used in phylogenetics, the phase of nuclear alleles is often ignored, possibly because it is not deemed important when comparing highly divergent species. Recombination also had to be acknowledged and often this was accomplished by determining DNA tracts within which no detectable recombination was observed. Detection of recombination events was often accomplished through software focusing on phylogenetic discordances among sites within a sequence or by estimating linkage disequilibrium among sites within or between loci (Hudson & Kaplan 1988). Testing for and dealing with recombination, for example by retaining only sections of an alignment free of detectable recombination, was and continues to be important so as to ensure assumptions are not violated when building gene trees or when estimating population parameters that require either full linkage between sites within loci or complete independence of sites.
The specific ways in which sequence data were analyzed were very much constrained by the technical limits of PCR; typically data sets consisted of a few loci, each of a few hundreds base pairs, and an ecosystem of software emerged around this particular format. Discordances between nuclear and mitochondrial DNA came to the fore as researchers were able to directly observe them in phylogenetic and phylogeographic analyses (e.g. Godinho et al. 2008), and in particular in studies of hybrid zones. The effects of incomplete lineage sorting were also readily visible even in analyses employing few loci. PCR of nuclear genes was very much a brute-force operation, the number of loci being assayed directly proportional to the amount of effort and number of PCR experiments performed. At the zenith of the (PCR) nuclear age in phylogeography (Balakrishnan et al. 2010), typical studies included tens of loci, and there was mounting evidence that the uncertainty of estimates of demographic parameters decreased with increasing numbers of loci. In addition, population genetic theory suggested much the same: for example, the optimal sampling scheme for estimating genetic diversity within single populations is generally thought to maximize the number of loci at the expense of the number of individuals (alleles) or length of loci (Nei & Roychoudhury 1974, Pluzhnikov & Donnelly 1996, Felsenstein 2006, Carling & Brumfield 2007). Surprisingly, the optimal sampling scheme for estimating genetic parameters from multiple populations partially linked by gene flow is still understudied. In the PCR era, given constraints on budgets and finite ability to sample individuals, there was a trade-off between the number of individuals or populations sampled and the number of loci assayed. We suggest that this trade-off is partially removed with the advent of next-generation sequencing.
Microsatellites and other simple sequence repeats also emerged in phylogeography in 1990s, following their discovery in the previous decade (Tautz & Renz 1984, Jeffreys et al. 1985). Although first employed extensively in the study of parentage in natural populations (Burke et al. 1989, Gyllensten et al. 1990), understandably these markers swept like wildfire through phylogeography. Indeed, despite the fact that microsatellites fail to capture critical components of the original spirit of phylogeography - in particular phylogeography's focus on phylogenetic lineages - they have historically been the most extensively used molecular marker in phylogeography. Their popularity is understandable because of their hypervariability - it is easy to be seduced by markers with such a large number of alleles and such potentially high resolving power. On the positive side, microsatellites have provided unquestionable insight into the demographic histories of literally thousands of species, and have helped expand phylogeography to incorporate and synergize with sister disciplines such as population genetics, landscape genetics, and even behavioral ecology. They also carry some information on the relationships among alleles - assuming a step-wise mutation model - and in principle have the ability to distinguish between recent and more ancient timescales of population divergence (e.g. FST vs. RST comparisons). On this logic, some authors have suggested they may be useful for estimating divergence times as well (Sun et al. 2009). On the down side, several authors have called for a reappraisal of the utility and neutrality assumptions of microsatellites and have questioned the high degree of enthusiasm for these markers in phylogeography (Brumfield et al. 2003, Morin et al. 2004, Zink & Barrowclough 2008, Edwards & Bensch 2009, Zink 2010). As put by Morin et al. (2004) “… the high information content (of microsatellites), a result of high mutation rates, comes at a price…”. The challenges and deficiencies of microsatellites in phylogeography have been reviewed extensively elsewhere (Zink 2010, Albayrak et al. 2011, Perktaş et al. 2015), and include substantial homoplasy, making estimates of the number of mutations difficult; an inability to conduct robust phylogenetic analyses, and hence offering little continuity between phylogeography and phylogenetics; frequent null alleles; and difficulty comparing to sequence-based markers, including mtDNA. Less well-appreciated deficiencies of microsatellites include clear evidence that some simple sequence repeats are indeed functional - they are often involved in gene regulation in both microbes and eukaryotic genomes - and thus may be subject to selection. This last critique no doubt applies to other kinds of markers as well, including SNPs, but the frequent appeal to neutrality by users of microsatellites should be tempered by the increasing number of examples of functional roles for such markers (Liu et al. 2000, Metzgar et al. 2000, Sureshkumar et al. 2009, Tremblay et al. 2010, Grover & Sharma 2011, Gao et al. 2013).
There are no doubt still staunch defenders of microsatellites, and we do not mean to suggest that SNPs, sequence-based markers, or other alternatives to microsatellites are not above reproach. A major criticism of sequence based markers or SNPs in phylogeography has been the paucity of such markers and their low polymorphism. While these criticisms may have been valid in the PCR era. We suggest that they no longer apply meaningfully given the large number of SNPs now achievable with next-generation sequencing approaches. By contrast, although the number of microsatellite loci has been increasing in recent years, we do not know of efforts to assay variation targeted at microsatellites using nextgeneration approaches. Next-generation isolation of microsatellite loci, followed by PCR-based assays of variation, has been used with considerable success (Abdelkrim et al. 2009, Perry & Rowe 2011, Singham et al. 2012, Curto et al. 2013, Taguchi et al. 2013), but actually assaying variation and scaling up beyond PCR-based assays to our knowledge has only recently been developed for microsatellites in studies of phylogeography (but see Fordyce et al. 2011, Raposo et al. 2015, Suez et al. 2015). Although it is surely too early to tell, we suggest that this technical gap implies that the community does not place a high priority on scaling up for microsatellites, perhaps because it is thus far comfortable with the expanded power of SNPs in the next-generation sequencing era. Garrick et al. (2015) recently declared that “Compared to other classes of molecular markers, DNA sequence haplotypes and single nucleotide polymorphisms (SNPs) should be more informative about historical events and processes … operating over timescales most relevant to the discipline (of phylogeography)” (Garrick et al. 2015). While we agree wholeheartedly with this statement, we suggest that much of the community might still favor microsatellites if given the choice, particularly in comparisons of closely related, endangered or very recently diverged populations. This preference, we suspect, is due in part because some labs may not yet have ready access to next-generation sequencing methods. But it might also be due to the perception that, due to their hypervariability, microsatellites have advantages over SNPs in many contexts, especially if they can be assayed in large numbers (Becquet et al. 2007, Kwong & Pemberton 2014).
We sought to determine whether changes in methodologies and markers used in phylogeography have been driven by choice or instead more by the availability of technologies adopted primarily for increasing the number of loci. We conducted a study parallel to that of Garrick et al. (2015) by reading abstracts for 397 papers that use microsatellites in phylogeography and were published in Molecular Ecology, a major outlet for phylogeographic research (Fig. 1, see legend for methods). Garrick et al. (2015)'s survey, which comprised 370 papers reporting on 508 single-species data sets, was interesting because it somewhat unexpectedly focused on SNPs, whereas our intuition was that, among nuclear loci, microsatellites were the main driver of phylogeography until recently. Our analysis confirms this suspicion: once the studies that only used mitochondrial DNA were pulled out from their analysis (a total of 280 studies, comprising 73.5 % of all “SNP” studies in our sample; see Fig. 1), the number of phylogeographic studies using microsats is comparable to, and sometimes exceeds, that using nuclear SNPs (Fig. 1A). Intriguingly, phylogeographic studies employing only mtDNA do indeed seem to be declining since 2007, at least in the pages of Molecular Ecology. This decline could highlight a shift in preference within the field towards other types of genetic data or a shift in preference of journals against publishing studies that rely solely on mtDNA. Additionally, the year 2013 suggests a shift as studies employing nuclear SNPs begin to exceed those using microsatellites. This uptick does not seem to be driven entirely by next-generation sequencing, which only comprised eight studies in our sample, suggesting that SNPs may have risen in popularity independently of novel technologies and perhaps due to conceptual advances or available software. Examination of microsatellite studies in all years of our sample (Fig. 1B) suggest that, at best, this technology has leveled out in its popularity, particularly given the increasing number of pages in the journal over time. It will be interesting to see what the next five years brings in terms of the relative use of these various marker types in phylogeography. Given the pre-eminence of Molecular Ecology in the field of phylogeography, we suggest that the trends observed here may well reveal trends that will follow in time with the rest of the field.
Next-generation sequencing and the rise of sequence-based markers in phylogeography
Phylogeographers have appreciated for years that, despite their lower polymorphism, SNPs are much more common in the genome than microsatellites (Brumfield et al. 2003). Yet this point was almost moot because it was difficult if not impossible to take advantage of SNPs on a scale that would capitalize on their ubiquity. The advent of next-generation sequencing will likely increase the swing of the phylogeographic pendulum back in favor of SNPs and sequence-based markers once and for all. The adoption of partial genome survey methods such as Rad-seq will not only yield SNPs in sufficient numbers for phylogeography, but will prescribe the use of SNPs even more so than will whole-genome resequencing. Whole-genome phylogeographic studies are already the norm for model species such as humans (Altshuler et al. 2010, Reich et al. 2010, Hammer et al. 2011, Li & Durbin 2011, Stoneking & Krause 2011) and Drosophila (Yukilevich et al. 2010, Campo et al. 2013, Duchen et al. 2013, Reinhardt et al. 2014) and researchers will have many options for marker types once this phase is achieved. Until that time, by shifting the focus of phylogeography to sequence-based markers and SNPs, next-generation sequencing methods promise to stabilize and unify phylogeographic studies in many productive ways. To us they are a positive trend for phylogeography because of the many reasons that SNPs have previously been considered beneficial: they provide more natural comparisons to variation in organelle genomes and between studies, and, despite the challenges of recombination within loci, provide natural bridges to phylogenetic analysis.
Types and consequences of next-generation sequencing approaches in phylogeography
Aside from the use of next-generation sequencing approaches for isolating microsatellite loci, nextgeneration sequencing is making inroads into phylogeography in two main ways: through Rad-seq, which generates short (∼100 bp) markers, typically with one or a few SNPs per locus (see Puritz et al. 2014 for a review of different Rad-seq methods); and through targeted capture approaches, which can be used to target already-defined sets of loci, such as exons or ultraconserved elements (UCEs) and their polymorphic flanking regions (Faircloth et al. 2012, Smith et al. 2014). Although transcriptome and amplicon sequencing have also both proven useful in phylogeography (Hedin et al. 2012, O'Neill et al. 2013), transcriptome sequencing will likely have less direct use in purely phylogeographic investigations (as opposed to the discovery of loci under selection; see below) because of its focus on loci that are relatively conserved but more likely under selection, and we predict that amplicon sequencing will ultimately prove less attractive to phylogeographers because of the labor involved and the smaller number of loci that can be assayed (but see McCormack & Faircloth 2013).
The emerging “core” approaches of targeted enrichment and Rad-seq promise to re-orient phylogeography towards sequence-based markers in different ways because of the types of data they each produce (Lemmon & Lemmon 2012, McCormack et al. 2012, 2013). Targeted enrichment approaches yield data that can be assembled into individual sequence-based markers spanning hundreds to potentially thousands of base pairs, resulting in haplotypes or consensus sequences within which there may be several to many SNPs that can in principle be subjected to phylogenetic analysis (Lemmon & Lemmon 2013). By contrast, Rad-seq typically yields loci that are too short to analyze using traditional phylogenetic methods; instead, researchers typically extract single or multiple SNPs from such Rad-loci and then analyze them as individual SNPs. In many ways the two approaches provide contrasting bridges to phylogenetics and classical phylogeography, as well as pointing to complementary analytical approaches in the future. For example, because the loci yielded by targeted enrichment approaches to phylogeography can often be analyzed using standard phylogenetic methods for estimating gene trees, they provide a natural bridge to classical phylogeography. By contrast, although the SNPs generated by Rad-seq can be used to estimate phylogenetic relationships of populations or species (“species trees”), and indeed have been subjected to concatenation approaches in early examples (Emerson et al. 2010, Merz et al. 2013), currently these markers are used to bypass classical gene trees and instead estimate the species tree directly (Bryant et al. 2012, Chifman & Kubatko 2014, Rheindt et al. 2014). These two approaches can sometimes require different sets of analyses, and it may be that the toolkit for linked SNPs such as produced by targeted enrichment is still deeper than that available for analyzing SNPs.
Although both core methods will align phylogeography squarely on the use of SNPs, whether linked or unlinked in individual loci, these differences in continuity with classical “gene tree” phylogeography and analytical approaches are significant. Gene trees may be the lynchpin in this phylogeographic transition. Many have suggested that, despite their centrality to the origins of phylogeography (Avise et al. 1987), ultimately, gene trees are a nuisance parameter in phylogeography and, if anything, can be a distraction from the key levels of analysis and primary interests, which are populations and species, not genes. In this sense, Rad-seq may have the practical advantage of finally freeing the community conceptually from gene trees and haplotype networks, which are still a ubiquitous component of phylogeography. The ability and tendency to make and interpret gene trees has resulted in heated controversies in phylogeography, such as the conflicts between model-based approaches in phylogeography and more literal interpretations of gene trees, such promoted by nested-clade analysis (Nielsen & Beaumont 2009, Beaumont et al. 2010, Templeton 2010). It will be interesting to see how the analytical methods afforded by Rad-seq versus targeted enrichment influence the next ten years of phylogeography. It may be that the sheer number of loci generated by both methods moves the field forward in adopting the model-based approaches that are clearly appropriate for such data sets.
Power of next-generation genome subsampling methods for phylogeography
Genome subsampling methods such as Rad-seq and targeted capture re-sequencing, including UCE analysis, hold enormous promise for phylogeographic investigations of neutral processes such as population structure, species delimitation and historical demography. Phylogeography has traditionally emphasized sampling of individuals and populations over loci (Brito & Edwards 2009, Garrick et al. 2015). This bias is a natural and understandable outgrowth of one of the main motivations for phylogeography - to discover novel lineages of organismal biodiversity within species. However, it is now better appreciated that sampling robustly for loci as well as individuals is essential for increased precision of parameter estimates in phylogeography and for better accounting for stochastic variation among loci (Beerli & Felsenstein 1999, Edwards & Beerli 2000, Jennings & Edwards 2005, Felsenstein 2006, Carling & Brumfield 2007). The sheer number of loci revealed by methods such as Rad-seq, UCE analysis or targeted capture methods therefore comes as a welcomed boon over the relative dearth of loci captured in a typical PCR-based study. Considering one of the core goals of phylogeography is to describe the history of populations using genomic data, it should not be surprising that even a dozen loci (which is typical for the heyday of PCR-based phylogeography) are unlikely to capture the diverse signals of history across all chromosomes in a typical genome. In the case of Rad-seq, there is a serious issue involving the bias of the technique against older haplotypes that have experienced mutations in the restriction sites used to isolate DNA fragments - a bias that can compromise estimates of genetic variation (Arnold et al. 2013). Furthermore, the process of assembling a library from non-model species often involves grouping sequence reads by some similarity threshold. The choice of these thresholds is not necessarily a straightforward process, and highly divergent alleles may be inadvertently omitted prior to downstream analyses either because their differences exceed preset similarity thresholds, or they increase the proportion of missing data in the genotype by individual matrix (Huang & Knowles 2014, Harvey et al. 2015a). Still, for other next-gen subsampling approaches, the variation revealed by next-generation approaches - whether amplicon sequencing of ∼100 loci or the tens of thousands of SNPs that a typical Rad-seq study reveals - is likely more than adequate for understanding the basic population history of most species.
Harvey et al. (2015b) recently compared the resolving power for phylogeography of data sets varying in size in terms of number and length of sequence-based markers to estimate demographic parameters (effective population size, divergence time, migration rate) and species history in a Neotropical songbird with deep phylogeographic breaks. They found that increasing the number of loci up to 5000 provided increased resolution of the particular demographic histories that they studied, but that increasing the number of loci beyond this yielded minimal gains. Additionally, they found that increasing locus length past 500 bp did not yield additional improvements in resolution for the focal parameters of their study. This study therefore suggested that the numbers of loci revealed by genome subsampling methods such as Rad-seq are likely to be adequate for resolving population history on a variety of scales. Indeed, the initial round of empirical studies using data sets produced by Rad-seq or sequence capture, usually on the order of 2000–30000 SNPs or aligned markers appear quite satisfying in so far as they have revealed undiscovered phylogeographic lineages that significantly improve our understanding of the fit of genomic variation to environmental and topographic barriers to gene flow (e.g. Alcaide et al. 2014, Harvey & Brumfield 2015). We concur with Harvey et al. (2015b) that genome subsampling methods are likely to finally provide the appropriate level of genomic detail for the foreseeable future of phylogeography.
Natural selection and the expanding domain of phylogeography enabled by next-generation sequencing
Classically, phylogeography has focused on the neutral demographic history of species, a goal that has been facilitated by studies on organelle genomes as well as multilocus analyses of nuclear genes. However, the ability to scan genomes for thousands of loci at a time has helped expand the purview of phylogeography beyond neutral demographic histories to include the discovery of loci under selection. As we have seen, with the advent of diverse types of nuclear markers including SNPs, phylogeography has relaxed its original focus on gene trees; purists may even argue that the original definition of phylogeography included a substantial focus on mitochondrial gene trees, and that use of nuclear markers with their frequent recombination might constitute an expansion of the original definition of phylogeography (Avise et al. 1987). In the same way, the ability to examine variation on a large scale and to focus on, for example, variation in transcriptomes and exomes (e.g. Marra et al. 2014) where natural selection is likely to be more prevalent, has allowed phylogeography to embrace topics such as natural selection. This shift, although fascinating in its own right, was arguably not a part of Avise's original vision for the field (Avise et al. 1987). However, the ability to examine large numbers of loci, and to estimate robust distributions of alleles and allele frequencies across geography and the genome, immediately raises the possibility of phylogeography embracing studies of natural selection. Indeed, some of the most integrative studies in phylogeography thus far are those that combine robust geographic sampling and historical demographic inference with investigations of natural selection (Deagle et al. 2012, Jones et al. 2012a, b, Pearse et al. 2014, Schielzeth & Husby 2014, Wallberg et al. 2014). Have phylogeography and population genetics become synonymous? We suggest not. If there is any attribute that distinguishes phylogeography from population genetics it is robust geographic sampling of populations within a species; such sampling is arguably a hallmark of phylogeography, yet is often not required, or achieved, in even high quality studies of population genetics.
Lewontin's paradox and genetic variation within Species
There is a long history of using population genetics to discover loci with a history of selection, beginning with Lewontin & Krakauer's (1973) observation that FST outliers could be useful in identifying loci under selection. The use of FST outliers has become common now, and, although there are caveats to the interpretation of high FST as an unambiguous signal of selection (Turner & Hahn 2010, Cruickshank & Hahn 2014), the ability now to study distributions of loci makes it a useful tool, especially when implemented with care and a consideration of the underlying demographic history (Johnston et al. 2014, Lotterhos & Whitlock 2014). Recent studies suggest that, in fact, avoiding natural selection entirely, even in the nuclear genome, may not be possible, whether studying whole-genome or transcriptome variation, because the imprint of selection may be much greater across the genome than originally envisioned by phylogeographers. For example, the overall level of diversity in the nuclear genome displayed in a species is often taken by phylogeographers to be a neutral indication of the historical effective population size, summarized for single populations by the equation π = 4Nμ. However, the small range of nuclear genetic diversities across species - usually considered to fall in a range of two orders of magnitude and often called “Lewontin's (1974) paradox” - has been a major challenge for population geneticists and has profound implications for phylogeography as well. The paradox was a major impetus for the development of the nearly neutral theory, which placed emphasis on interactions between selection and drift and seemed to fit available data better than the strictly neutral theory (Ohta 1992, Ohta & Gillespie 1996). By postulating a nearly neutral zone in which the absolute value of the product Ns was substantially less than one, the nearly neutral theory - clearly a key departure from Kimura's strictly neutral theory (Kimura 1968, 1983) - was able to account in part for this paradox. Although there have been many estimates of the distribution of selection coefficients for key model species (Keightley & Eyre-Walker 2010), until recently there were few compelling data to really test these ideas across a wide range of species.
Two recent comparative studies of population genomics are relevant to Lewontin's paradox and have important implications for phylogeography. Corbett-Detig et al. (2015) conducted an exhaustive survey of genome-wide genetic variation in 40 eukaryotes and came to the conclusion that the small range of genetic diversities (π) among species could be explained in part by the greater ability of natural selection to reduce genetic variation in species with large population size. The implication of this paper is that for some species, particularly those with large populations, selection may depress the level of neutral variation at nearly every site in the genome, because selective sweeps are common and drift is relatively weak. This study has profound implications for phylogeographic studies of widespread species; remarkably, the frequent signals of natural selection that phylogeography has come to expect for mtDNA may also apply to the nuclear genome, particularly as it applies to overall diversity in species with large populations.
In another related study, Romiguier et al. (2014) recently surveyed population variation in transcriptomes of a variety of species across the tree of life. Like the study by Corbett-Detig et al. (2015) this study, although comprehensive, cannot be considered phylogeography because the main focus was not population history but the overall amount of variation. Whereas in some cases multiple populations per species were sampled and much of the genetic variation within each species may have been captured, in neither study was the general geographic sampling robust enough to be considered phylogeography. Romiguier et al. (2014) came to the startling conclusion that the amount of variation (π) in the transcriptome was best predicted by life history attributes and longevity, rather than by geographic range or other aspects of strictly neutral demography. Surprisingly their reasoning was largely based on a neutral argument: long-lived and other species with K-selected life history traits tend to be able to sustain smaller populations, and hence lower genetic diversities, than species with r-selected life history traits, because they live in more stable environments with fewer long-term perturbations. By contrast, r-selected species, which tend to have greater genetic diversity, possess the large populations that allow them to colonize and persist in unstable habitats. Although ecologists will likely find merit in this hypothesis, because it conceives of ecology and life history as the causal driving force behind population genetic variation, it is not very satisfying from a population genomics perspective. It is surprising that even negative selection on deleterious variation, traditionally considered a major force in regulating intraspecific genetic variation, especially in protein-coding regions, was only briefly mentioned as potentially important for explaining the positive correlation between life history and the ratio of nonsynonymous to synonymous nucleotide substitutions (dn/ds) within species. In this case, the smaller populations of long-lived K species result in higher dn/ds (driven largely by higher dn) due to increased fixation of deleterious mutations in small populations, as envisioned by the nearly neutral theory (see also Weber et al. 2014). However, Romiguier et al. (2014) suggested that overall levels of synonymous substitution were largely driven by effective population size, which in turn was seen as a neutral consequence of life history variation. It will be important to verify the hypothesis of this work through genome-wide measurements of diversity as a part of detailed phylogeograhic analyses of diverse species.
Selection, recombination and hitchhiking in phylogeography
The above two studies provide an important contrast in how phylogeographers are beginning to think about genome-wide data. In particular, the issue of linkage disequilibrium (LD) and the potential for genome-wide hitchhiking has emerged as a key factor influencing patterns of variation in the era of wholegenome phylogeography. Aside from a few key emerging models for ecology and evolution, such as sticklebacks, honeybees and other groups (Jones et al. 2012a, Wallberg et al. 2014), the phylogeography of non-model species thus far has not dealt substantially with the effects of linkage on genomic variation across geographically sampled sets of populations. In the PCR era, levels of LD were occasionally measured within or between the loci that could be assayed, often to confirm the independence of loci, but in general the data available to phylogeographers was not comprehensive enough to provide meaningful insight into the effects of hitchhiking on genome-wide variation. Levels of LD can be influenced by many factors, including neutral processes such as genetic drift and population bottlenecks; selective processes like selective sweeps and balancing selection; and genetic processes like rates of recombination and mutation (Slatkin 2008). The population recombination rate (ρ = 4Nc) has been measured in relatively few non-model species, whereas there are numerous estimates for populations of humans, mice or Drosophila (Smukowski & Noor 2011).
Population geneticists have been measuring LD for decades, and one can calculate the LD or r2 value between any pair of markers, regardless of how densely the genome is sampled or whether the markers are on the same or different chromosomes. In the pre-genomic era, when LD values were calculated between loci in studies that only sparsely sampled the genome, the motivation was often to study the action of natural selection. But this was a very hit or miss endeavor: if the candidate genes between which LD was calculated were not involved in selective processes, the result was often underwhelming. By contrast, in the era of genomics, when the genome can be sampled much more densely, in principle one need not know the candidate genes under selection: LD can be exploited to discover loci that are the actual targets of selection without a priori suspicion that the loci measured are under selection themselves (Slatkin 2008). Correlated patterns variation across the genome due to hitchhiking have been used increasingly to produce a set of candidate genes responding to selection, without those candidates having been assayed directly. The candidate genes are usually physically close to or in the same linkage blocks as, the markers directly measured, and the assumption is that hitchhiking on the actual targets is causing departures from neutrality, such as high FST, in the assayed SNPs. This protocol, variably called selection mapping or association mapping, has resulted in the generation of hundreds of candidate genes in emerging model species that may be responding to environmental or other selective pressures (Hohenlohe et al. 2012). Increasingly studies are also taking advantage of the high LD created when two species or populations hybridize: chromosomal blocks deriving from each parental population can be used to identify genomic regions that underlie phenotypic traits in those populations in hybrids. This method, often called admixture mapping, has been used in humans extensively and increasingly in non-model species (Slate & Pemberton 2007, Pallares et al. 2014).
However, at the dawn of the whole genome era, such protocols can be challenging to implement, and can get stuck between a rock and a hard place (Fig. 2). On the one hand, methods such as Rad-seq, although delivering a large number of SNPs, still sample the genome sparsely, and can fall short of this goal, particularly in species where the population recombination rate is high. This failure to identify actual targets of selection through selection mapping arises because the genome-wide levels of LD can be so low as to cause the actual targets of selection to be effectively unlinked (in low or average LD with) the nearest neutral site whose variation is interrogated. The result will be little evidence for selection among those genomic SNPs that are assayed, and many targets of selection will be missed. Clades such as Drosophila, or many bird species, likely fall into this category, and may require whole-genome resequencing to more confidently identify targets of selection through hitchhiking (Backström et al. 2006, Ellegren et al. 2012, see Fig. 3; Backström et al. 2013). The advantage to such species with high recombination rates is that, when an outlier locus is detected, one can be sure that one is relatively close to the actual target of selection, although even whole genome resequencing studies in Drosophila have sometimes yielded mixed results, especially if selection is weak, very recent, or acting on standing variation. On the other hand, levels of genome-wide LD in a given species may be substantial because of recent population history, a history of domestication or an overall low population recombination rate. Canids are good examples of this pattern (see Fig. 3 and Boyko et al. 2010, Boyko 2011). In such cases, even sparse sampling of the genome will often uncover sites that appear to be under selection, having hitchhiked with the actual targets that could be megabases away. When LD is high, larger regions of the chromosome are dragged along by hitchhiking than when LD is small, and these large regions can sometimes capture hundreds of genes. But in this situation, the list of candidate genes will be so long that it becomes less than useful. Threespine sticklebacks (Gasterosteus aculeatus) may fall in this category: Rad-seq studies routinely identify FST outliers but the list of genes within linkage blocks can be long and often provide only a vague idea of actual targets of selection (Hohenlohe et al. 2012). The timing and strength of the selection event can also be important for regulating the size of the hitchhiking chromosomal segments. Chromosome inversions will also protect blocks of the genome from recombination, causing hundreds of genes to remain in high LD and making identification of the actual targets of selection challenging or impossible without further methods development. Although genome subsampling methods such as Rad-seq will in general provide a coarser picture of hitchhiking loci, even whole genome re-sequencing will not be able to unambiguously identify the actual targets of selection if LD is high.
Examples: Rad-seq meets selection mapping in natural populations
Our thinking on the efficacy of Rad-seq to search for loci under selection issues has been influenced by recent results from our laboratory. We now use two case studies, from a lizard and a songbird, to illustrate the challenges of detecting selection and of identifying targets of selection with genome subsampling methods.
Adaptation and the evolution of cold tolerance in green anole lizards
The green anole lizard, Anolis carolinensis, is an ideal species to explore the molecular basis of climate-mediated local adaptation. The only anole native to the continental United States, this species occupies the highest latitudes of any of the nearly 400 species of the genus. The northern edge of the species' distribution is likely limited by winter temperatures (Williams 1969), but populations do not hibernate, as is common for most mid- and high- latitude reptiles. By retreating to sheltered sites and basking during sun exposure, northern populations are able to remain active and periodically feed in the winter months (Bishop & Echternacht 2004), despite regular ambient temperatures below freezing. Additionally, populations from different climates display significant differences in cold tolerance (Wilson & Echternacht 1987). The recent publication of the genome of this species (Alfoldi et al. 2011) provides a unique resource for understanding the molecular processes of evolutionary response to local environment and for identifying genes that may play a key role in physiological divergence between populations of a non-model species. Taking advantage of this opportunity, we used double-digest RAD sequencing (ddRad-seq, Peterson et al. 2012) to identify regions of the A. carolinensis genome associated with cold variation across the species' range. Using SphI and EcoRI restriction enzymes, we digested genomic DNA from 28 individuals representing six populations spanning the latitudinal extent of the natural range of A. carolinensis. We genotyped 20282 SNPs with a coverage of ≥ 10× for these individuals using the Stacks software package (Catchen et al. 2011, 2013). To search for regions of the genome associated with temperature variation across the species' range, we used georeferenced locality data for each population to retrieve estimates of the mean temperature of the coldest quarter of the year (BIO11) from the Worldclim database (Hijmans et al. 2005). We then used allele counts from the Radseq dataset to calculate Bayes factor associations and Pearson correlations using the Bayenv2 software package (Gunther & Coop 2013). Variant sites in the top 1 % of both Bayes factor associations and Pearson correlations were retained as candidate markers identifying regions of the genome that may be important for local adaptation to cold (Fig. 4). This analysis resulted in 72 candidate SNPs, all of which were noncoding: 67 % are located in intergenic regions, whereas 33 % map to introns.
Several genes in this dataset may be of interest for further study due to their close proximity to SNPs exhibiting geographic correlations with temperature and their potential involvement in oxygen regulation, which has been proposed as a major constraint for ectotherms under extreme temperature challenge (Portner et al. 2006, 2007). One of these variants is located 40.8 kb upstream from the first exon of Rhoassociated protein kinase 2 (ROCK 2), whose signaling is important for regulation of pulmonary vasculature (Riento & Ridley 2003, Noma et al. 2006, Seasholtz et al. 2006, Rankinen et al. 2008). Another SNP lies 1.46 kb upstream of the first exon of transcription factor 4 (TCF4), which is involved in regulation of breathing patterns (Zweier et al. 2007). Functional genomics studies are needed to better understand the potential role and importance of these and other physiological processes to temperature-mediated local adaptation within the green anole.
Temporal evolution of house finch populations before and after an epizootic
The house finch (Haemorhous mexicanus), one of the most common birds in both urban and rural environments in North America, is rapidly becoming a model system for avian study. It has been important in studies of rapid morphological adaptation, sexual selection, evolution of disease resistance, and invasion (Badyaev et al. 2012). The uniqueness of the house finch in studies of disease ecology is the result of its relationship with the pathogen Mycoplasma gallisepticum (MG). This poultryassociated bacterium was first documented in the house finch in 1994 in the Washington D.C. area (Ley et al. 1996, Hochachka & Dhondt 2000). MG infects the respiratory tract and causes severe conjunctivitis (Hochachka & Dhondt 2000), suppresses pathogenspecific components of the immune system (Bonneaud et al. 2011), and stimulates inflammatory responses (Gaunson et al. 2006, Mohammed et al. 2007, Adelman et al. 2013). The pathogen spread through the eastern population rapidly, and by 1998 had caused severe declines across the region, as high as 60 % in some areas (Dhondt et al. 1998). Infection experiments comparing gene expression responses of eastern individuals with 12 years of exposure and historically unexposed individuals suggested rapid evolution of gene expression, disease resistance (Bonneaud et al. 2011, Bonneaud et al. 2012) and disease tolerance (Adelman et al. 2013).
We collected a genome-wide SNP dataset using double-digest Rad-seq (Peterson et al. 2012) to identify regions of the genome with signatures of MG-mediated selection over time. As in the Anolis study, we digested the genome with SphI and EcoR1, and selected fragments from 276-324 base pairs long to recover homologous loci scattered randomly across the genome. In this preliminary study, we sampled only 11 individuals (22 chromosomes), five from pre-epizootic (1990) and six from post-epizootic (2003) populations from Alabama. We ran the Radseq libraries on a single lane of HiSeq Illumina 2500, generating a total of ∼8 million paired-end reads (∼4 million pairs), each 150 bp long. Using the Stacks pipeline (Catchen et al. 2011, 2013), we genotyped over 12000 SNPs from 2223 loci (Fig. 5). Of the 7260 comparisons of SNPs achieving our quality threshold between these time periods, we found 167 (2.3 %) significant shifts in allele frequency (Fisher's exact test p-value < 0.05) from 129 unique loci. Of these 129 loci, 68.2 % are in intergenic regions, 29.5 % are in introns, 2 loci fall within exons, and 1 within a 3′ UTR region. Although none of these loci retain significance with Bonferroni correction, we suspect that larger sample sizes of individuals from additional localities will improve detection. FST values in this collection of SNPs range from 0.208 to 1 (fixed differences). These regions with high FST are in or near genes with a variety of functions. One gene, PPP2R2C is involved in immune pathways in humans, and falls 13 kb away from a SNP with an FST of 1.
These two studies illustrate the promise but also the challenges of detecting selection and identifying candidate genes in vertebrate genomes using a genome subsampling method such as Rad-seq (Tiffin & Ross-Ibarra 2014). In both studies, most of the Rad-seq SNPs fell in noncoding regions whose relationship to nearby genes was unclear. In the example from Anolis, the candidate genes identified as being the closest to those SNPs that were correlated with environmental variables were often quite far away from the SNP used to tag them. In the example from house finches, the number of FST outliers in comparisons of preand post-epizootic populations was relatively small, perhaps because the observed level of LD in house finches is generally small, and certainly because of our small sample sizes. In the few avian species that have been studied with regard to recombination rate, rates on the autosomes are likely to be quite high, with levels of LD declining rapidly as one moves away from the focal SNP (Backström et al. 2006, Bullaughey et al. 2008, Janes et al. 2009, Li & Merila 2010, Ellegren 2014). We have found that LD in songbird populations, such as red-winged blackbirds (Agelaius phoeniceus) and house finches, falls off rapidly after a few hundred base pairs, a situation very reminiscent of populations of Drosophila (Edwards & Dillon 2004, Backström et al. 2013). In such species, LD often declines between SNPs less than 500 bp apart, which means that any SNP found to exhibit high differentiation or signatures of natural selection is unlikely to be useful in identifying candidate genes for a phenotypic trait even a few kb away. Thus it is unclear whether recent proposals to map QTL in natural populations using SNP-chips containing on the order of 10000 SNPs will be effective (Hagen et al. 2013).
Recent statistical models promise new power to estimate the distribution of effect sizes of loci underlying a phenotypic trait along whole chromosomes in natural populations. For example, using approximately 10000 SNPs, Santure et al. (2013) found that for most chromosomes in the genome of great tits studied at Whyndham Woods U.K., a model in which the effect of a given chromosome on continuous traits like wing length and clutch size was proportional to the length of that chromosome, and hence the number of genes on that chromosome. By extension, this result suggests that nearly every gene has a similar - and infinitesimally small - effect on variation in the focal trait. But it is also unclear whether the failure to reject such a null hypothesis is also due to the relatively meager sampling of the genome. Although 10000 SNPs seems like a large number, it is relatively small in terms of capturing variation in genomic blocks in high LD, especially for vertebrate genomes on the order of 1–3 Gb and in species with high population recombination rates (Edwards 2013).
Measuring hitchhiking is crucial enough to the expanding domain of phylogeography that we predict that, ultimately, the field will forsake genome subsampling approaches for phylogeographically informed whole-genome resequencing studies. We are beginning to see the first glimpses of such studies in genomically unstudied species (e.g. Ojeda et al. 2014), and the results are as exciting as they are informative about the determinants of genomic variation and structure.
From phylogeography to genotype to phenotype
Loci whose variation has been influenced by natural selection are often also loci that underlie phenotypic traits. The search for loci underlying natural variation in phenotypic traits is a major thrust of modern evolutionary biology (Hoekstra et al. 2006, Hoekstra & Coyne 2007, Ellegren & Sheldon 2008, Rebeiz et al. 2009, Hiller et al. 2012, Jones et al. 2012b). Of the many methods for identifying such loci - including QTL and linkage mapping using pedigrees and crosses, or “systems genetics”, in which multiple kinds of genomic, transcriptomic, and metabolomic data are integrated together (Feltus 2014) - association mapping is probably the method most closely allied to phylogeography. This alliance arises because association mapping uses population comparisons to find genomic loci that correlate with the distribution of a particular phenotype (Stinchcombe & Hoekstra 2008, reviewed in Kratochwil & Meyer 2015). Others have made a similar connection between landscape genetics and the search for loci underlying adaptive phenotypes (Jones et al. 2013). Association mapping has significant promise for closing the genotypephenotype gap in non-model species and in many contexts has more statistical power than does mapping with pedigrees or controlled crosses (Schielzeth & Husby 2014). Indeed, the emerging trend is toward simultaneous inference of population history and identification of loci associated with phenotypic traits (e.g. Fumagalli et al. 2011, Linnen et al. 2013). Association mapping is likely to be most powerful in situations where most of the genomic variation is shared among a group of closely related populations or species, perhaps connected by high gene flow, but there exist marked phenotypic differences among those populations (Axelsson et al. 2013, Cullingham et al. 2014, Schielzeth & Husby 2014). In such situations, association mapping should reveal similar allele frequencies across nearly the entire genome, often due to shared standing variation, in both control and comparison populations, yet these populations should differ at loci whose variation correlates with the divergent phenotypes of interest. Precisely this situation has been found in those emerging cases where association mapping or candidate gene investigation has proved useful in non-model species, including several plants (Comeault et al. 2014, Cullingham et al. 2014, Johnston et al. 2014, Pearse et al. 2014, Roesti et al. 2014). Use of candidate genes in such situations can also be highly informative (Uy et al. 2009). Indeed, when populations have experienced a moderate history of divergence, such that the compared genomes differ at many sites due to neutral demographic divergence, it becomes essential to correct for such substructuring so as not to spuriously implicate loci showing strong allele frequency differences in the origin of phenotypic traits that between those populations (Pritchard et al. 2000, Patterson et al. 2006, Price et al. 2006, 2010).
This is an exciting time for phylogeography. We have a growing number of examples of studies in which employment of next-generation sequencing methods has yielded high resolution substructuring within species at a level of detail that far exceeds that formerly yielded by single locus mtDNA or microsatellite studies. It is now clear that a few thousand loci such as is typically yielded by methods such as Rad-seq may well be adequate to discover the major phylogeographic lineages within a species. Although the finer details of the history of species can always be clarified further with increasing genomic sampling (or individual sampling), many of these details are likely lost to historical reconstruction because of their age or their mild imprint on the pattern of genome-wide variation. While we expect wholegenome resequencing to become more common in phylogeography, it is unclear whether this method will be overkill if the focus is strictly on the core purview of phylogeography, namely reconstruction of the phylogenetic lineages and neutral demography within species. What is clear is that next-generation methods are causing a resurgence in the use of SNPs as opposed to microsatellites, enabling easier comparisons among loci and species and providing a uniform framework for comparative phylogeography (Hickerson et al. 2010, Andrew et al. 2013).
But next-generation sequencing has also broken down the conceptual edges of phylogeography, resulting in an expanded purview that has blurred the lines between core phylogeographic foci and other areas of related interest, such as identifying loci with a history of natural selection and underlying variation in adaptive traits. Studies combining historical reconstruction of phylogeographic history and a search for such adaptive loci are becoming more common (Deagle et al. 2012, Jones et al. 2012a, b, Pearse et al. 2014, Wallberg et al. 2014), and the goal of identifying loci underlying adaptive traits is often as important as understanding the demographic history of a set of populations. This conceptually expanded phylogeography marks an important phase in the evolution of the field, and although always a glimmer in the eye of phylogeographers, has arguably been driven due to the recent arrival of high-throughput genomic approaches. As phylogeography expands its purview, it is becoming clear that genome subsampling methods such as Rad-seq, while extremely powerful for identifying phylogeographic clusters within species, may be inadequate for identifying loci that are targets of natural selection or linked to the true targets. Whole-genome resequencing will likely emerge as the standard tool, if not for traditional phylogeographic investigations, then certainly in the quest for loci underlying quantitative traits in natural populations and their history of divergence within species.
We thank Utku Perktaş for inviting us to contribute to this volume. We thank Tim Sackton, Fábio Raposo do Amaral, Ryan Garrick and Bryan Carstens for helpful discussion and sharing of data, and two anonymous reviewers and Ryan Garrick for helpful comments on the manuscript. We thank Emily Kay and Brant Peterson for suggestions on the Rad-Seq library preparation protocol, and Christian Daly and Jennifer Couget for help with Illumina sequencing. Work was supported by NSF grant DEBIOS 0923088 to SVE and grants from the American Museum of Natural History, American Ornithologists Union, and Society for the Study of Evolution to AJS. Members of Jonathan Losos' lab provided helpful comments in the development of ideas and analytical approaches of the Anolis project. The Harvard Museum of Comparative Zoology Putnam Expedition Grant, Miyata Award and Robert A. Chapman Memorial Scholarship funded Anolis fieldwork. Anolis Rad-Seq work was supported by a National Science Foundation Doctoral Dissertation Improvement Grant (DDIG award # 1311484) to SCC and SVE.
Supplementary online materials
Table S1. The full list of articles using microsatellites (Excel file; URL: http://www.ivb.cz/folia/download/edwards_et_al._table_s1_supplementary_material.xlsx).