Phylogenetic methods increasingly are brought to bear on questions of subspecies taxonomy, but several recent examples highlight the need for a clear and consistent philosophical approach to how genetic data are used to assess subspecies limits. Such standards are crucial conceptually, whether or not taxonomic decisions affect conservation decisions, as they might in a recent study focused on the California Gnatcatcher (Polioptila californica), a taxon currently protected under the U.S. Endangered Species Act. It is also crucial that any adopted framework allows each of a full range of alternatives to be either supported or rejected. In this spirit, in addition to recommending best practices, I propose an amendment to the phylogenetic species concept to include a subspecies category.
The Subspecies Debate
No taxonomic rank has been more maligned or misunderstood than the subspecies. Attacks on this rank's value date back to the early 1950s (e.g., Wilson and Brown 1953), the principal argument against it being that subspecies are “arbitrary,” a charge that could be leveled with equal force at the genus, family, or any other higher rank (which tend not to correspond across phyla or kingdoms; Avise and Mitchell 2007). Although there may be merit in setting aside the whole of the Linnaean hierarchy (Ereshefsky 2001), it remains the dominant way in which we classify organisms and, hence, communicate about ecological communities, phylogenetic relationships, biogeographic processes, and a host of other basic topics in ecology and evolutionary biology. Linnaeus's scheme is likely to be with us for years to come, so we ought to determine how best to standardize its use across all organisms and ensure that classification into established ranks follows a logical and repeatable procedure. This last point, repeatability, is an often neglected cornerstone of the scientific process. Many criticized Sibley and Ahlquist (1990) when they opted to use particular levels of Δt50, a measure of the difference in temperature at which DNA heteroduplexes and homoduplexes denature, to assign ranks of family, tribe, order, and the like; yet, if nothing else, the procedure and the assignment were repeatable.
Strides have been made to reduce subjectivity in other ranks, most notably that of species, a rank even Darwin refused to define despite its appearing prominently in the title of his famous book. Darwin considered species limits arbitrary, and modern debates about the relative virtues of particular species concepts have done little to address the inherent subjectivity of, say, what exactly it means to be reproductively isolated (how much hybridization is too much?) or how exactly to identify a clade that corresponds to something above an isolated population yet below a wholesale radiation (i.e. the “diagnosable clusters” of the phylogenetic species concept; Cracraft 1983). A crucial step in the direction of objectivity has been a recent emphasis on effect size (of trait variability) to determine species and subspecies limits (Patten 2010, Tobias et al. 2010, Winker 2010). Akin to Sibley and Ahlquist's (1990) Δt50 thresholds, the idea is that differences of a particular magnitude—a large effect size—for a trait being examined will indicate whether 2 populations are 2 species (Tobias et al. 2010). A similar argument can be made to assess whether 2 populations correspond to 2 subspecies (Patten 2010), provided it is clear that there are distinct thresholds that determine species limits and subspecies limits.
This last point, that different taxonomic ranks have different thresholds of distinctiveness, should go without saying, given that the whole of Linnaean hierarchy is predicated on this view; but a failure to set distinct thresholds is common, even if a particular study did not assess effect size per se. A prevalent example is when researchers bring molecular methods to bear on questions of subspecies with the (often unstated) expectation that subspecies ought to differ in the same way that (phylogenetic or other lineage-based) species would. Nowhere is this situation clearer than in assessments, implicit or explicit, of subspecies' “validity” in which one or several neutral genetic markers are found not to be reciprocally monophyletic (or yield unique haplotypes or form distinct clusters) among populations under study, as in recent papers on intraspecific variation in the green comma (Polygonia faunus; Kodandaramaiah et al. 2012), California Gnatcatcher (Polioptila californica; Zink et al. 2013; cf. McCormack and Maley 2015), and western shovel-nosed snake (Chionactis occipitalis; Wood et al. 2014). In each case, the authors drew broad conclusions about subspecies validity after they had failed to meet genetic expectations for species limits, and in no case did the authors explain how they would assess subspecies limits in relation to a particular definition of the term “subspecies.”
What Is a Subspecies?
It is incumbent on any scientist, no matter the field of inquiry, to adhere to (or at least specify) definitions. It is senseless for particle physicists to argue about whether a Higgs boson exists if they do not agree on what a Higgs boson is. The taxonomic rank of subspecies has been defined for many decades (e.g., Mayr 1942, Amadon 1949, Rand and Traylor 1950), and that definition can be summarized simply as “a collection of populations occupying a distinct breeding range and diagnosably distinct from other such populations” (Patten 2009), with the crucial caveat that these populations comprise “completely fertile individuals” (Mayr 1942:106); that is, populations are not reproductively isolated from one another. Compared with a biological species, Patten and Unitt (2002:27) noted that “Concealed by the nesting of the two categories in the Linnaean hierarchy is that they address qualitatively different aspects of biology: the species addresses reproductive and behavioral criteria, the subspecies morphological diagnosability.”
The diagnosability issue has received attention for well over a half-century (Rand and Traylor 1950, Amadon and Short 1992), and a host of practitioners have argued for a standard statistical threshold, the “75% rule” being the predominant (and most widely accepted) example (Amadon 1949, Pimentel 1959, Baker et al. 2002, Patten and Unitt 2002, Haig et al. 2006). The 75% rule means that subspecies A is recognized taxonomically if, and only if, ≥75% of the individuals in group A lie outside 99% of the range of variation of group B for the character or set of characters under consideration (Patten and Unitt 2002). Taxonomic recognition of subspecies B requires meeting the opposite criterion: 75% of its individuals outside 99% of the variation in group A. In terms of a joint probability, a more liberal 75% rule (i.e. 75% of group A from 75% of group B) is equivalent to a difference of 5% of group A from 99.9% of group B (Pimentel 1959), a level of separation akin to the accepted standard type I error rate of α = 0.05. Accordingly, the stricter 75% rule defined above conforms more than adequately to statistical (i.e. probabilistic) convention in ecology and evolutionary biology.
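As a rough illustration of how the stricter 75% rule might be checked for a single continuous character, the sketch below assumes that each group is approximately normally distributed and reads “outside 99% of the range of variation” as a one-sided bound. The function name `diagnosable`, its parameters, and the measurements are hypothetical, and the normal quantiles are a simplifying assumption; this is not the procedure of any study cited here.

```python
from statistics import NormalDist, mean, stdev


def diagnosable(x, y, frac=0.75, coverage=0.99):
    """Return True if at least `frac` of group x lies outside `coverage`
    of group y, assuming both groups are roughly normal (an assumption
    of this sketch, not a requirement stated in the text)."""
    # Flip signs if needed so that x is always the group with the larger mean.
    sign = 1.0 if mean(x) >= mean(y) else -1.0
    xs = [sign * v for v in x]
    ys = [sign * v for v in y]
    std_normal = NormalDist()
    # Value above which `frac` of group x is expected to fall.
    x_bound = mean(xs) - std_normal.inv_cdf(frac) * stdev(xs)
    # Value below which `coverage` of group y is expected to fall.
    y_bound = mean(ys) + std_normal.inv_cdf(coverage) * stdev(ys)
    return x_bound - y_bound >= 0


# Hypothetical measurements (e.g., a color score) for two populations.
group_a = [20.0, 21.0, 22.0, 21.5, 20.5]
group_b = [10.0, 11.0, 12.0, 11.5, 10.5]
print(diagnosable(group_a, group_b), diagnosable(group_b, group_a))

# The criterion is directional: a tight group and a variable group can disagree.
tight = [19.9, 20.0, 20.1, 20.0, 20.0]
wide = [4.0, 7.0, 10.0, 13.0, 16.0]
print(diagnosable(tight, wide), diagnosable(wide, tight))
```

Because the rule must be met separately for each group, the two calls in the second example return different answers: the variable group is diagnosable from the tight one, but not the reverse.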
It is unclear how “subspecies” was defined in the studies of the butterfly, bird, and snake mentioned above. It may be that a subspecies was viewed as “lower than a species,” however that would be defined, or perhaps it was viewed as an “incipient species,” even though Mayr (1942:155) admonished that, although every biological species must go through a subspecies stage under a model of geographic (as opposed to ecological, polyploid, or other) speciation, this condition does not mean that every subspecies will become a species. Regardless, it does not follow that the same criteria for assessing species limits can be used to assess subspecies limits, chiefly because a simple consequence of the definition of subspecies is that gene flow (or its inferred possibility in contact) has not been severed. The possibility of ongoing gene flow changes the equation, and so it makes no sense to expect clusters of genes or reciprocal monophyly among groups, at least with respect to neutral markers. Some nuclear genes under natural selection are expected to differ, given that subspecies characters are assumed to have a genetic basis (Remsen 2010); that is, the “typical characters of [a] group of individuals are genetically fixed” (Mayr 1942:106).
Detecting Subspecies Genetically
We might rephrase our definition of “subspecies” to emphasize that the term refers to “heritable geographic variation in phenotype.” Each of these 3 components—the heritable, the geographic, and the phenotypic—is crucial to a proper understanding of what this taxonomic rank encapsulates. A clear implication of this rephrasing is that for any given subspecies, there will be a gene or set of genes that determines phenotype of individuals in a particular geographic area. As such, any genetic assessment of subspecies ought to focus on identification and elucidation of these genes. Today, avian systematists have access to 0.5% of avian genomes (see Jarvis et al. 2014, Zhang et al. 2014), but one day soon, avian systematists will have access to whole genomes of any species they wish to study, coupled with a key of how specific phenotypic traits map to that genome. With these tools, researchers will be able to identify which genes correspond to key phenotypic traits that vary geographically; with such data in hand, basic assignment tests (Piry et al. 2004, Manel et al. 2007) could be used to classify genetically screened individuals into geographically circumscribed populations.
The foregoing assessment postulates the existence of “subspecies genes,” genes responsible for phenotypic variation associated with different geographic populations. A key implication of this view is that phenotypic variation is not merely a product of environment. If it can be shown that environmental variance rather than genetic variation (i.e. in the sense of quantitative genetics) principally shapes phenotype, then subspecies ought not be named. Little work has been devoted to this question, although there are some preliminary ventures, such as a reported common garden experiment (which failed the test of a true common garden, but it was a step in the right direction) of Swamp Sparrow (Melospiza georgiana) subspecies (Ballentine and Greenberg 2010). Another implication is that the “subspecies genes” will be nonneutral—they are postulated to be under natural selection for local adaptation to environmental conditions, particularly of plumage color and pattern (Remsen 2010), most often, I suggest, via a process of phylogenetic niche conservatism (sensu Pyron et al. 2015). Postulated genetic variation does not imply that a researcher may choose any gene(s), neutral or not, with an expectation that any gene(s) will resolve phylogenetic history or systematic relationships.
Indeed, modern genetic tools can zero in on especially fine-scaled variation among human populations (e.g., Xing et al. 2009), and there is no reason to think we could not do the same with other organisms. As a result, we are increasingly likely to identify genetic variation at a spatial extent so small that it is meaningless for subspecific identification, and genetic differentiation is in any case insufficient, by itself, to diagnose a subspecies: Morphological variation is needed as well (Mousseau and Sikes 2011). The problem, then, is to state clearly and explicitly what the expectations are for how a subspecies will be detected. On one hand, some modern methods (e.g., single-nucleotide polymorphisms) may yield a surfeit of distinction unsuitable to subspecies diagnosis. On the other hand, one cannot, for reasons of ongoing gene flow and of heritable variation in phenotype, expect implicitly that use of neutral genetic markers will demonstrate reciprocal monophyly among subspecies or that subspecies otherwise will form distinct clusters. Use of such neutral genes might reasonably yield no difference among populations, even if those populations differ markedly in phenotype, so a finding of no difference in neutral markers cannot, by itself, be construed as evidence that subspecies are invalid. (Conversely, finding a difference in neutral markers means only that sufficient time has elapsed since the sampled populations were isolated but does not necessarily mean that the subspecies are valid, and they would not be valid unless phenotype differed.) It goes against basic philosophy of science to fail to reject a null hypothesis and then conclude that the null is true. At the least, any “acceptance” of the null must be accompanied by an a priori analysis of statistical power, although such an analysis necessitates that a suitable statistical analysis was performed, and in the case of the California Gnatcatcher, Zink et al. (2013) did not conduct basic analyses of molecular variance (see McCormack and Maley 2015).
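To make the point about statistical power concrete, the following minimal sketch computes the power of a generic two-sided, two-sample z-test for a standardized difference between two populations. It is a textbook approximation chosen for illustration only; the function name `approx_power` and the example values are hypothetical, and it is not an analysis of molecular variance or a reconstruction of any analysis in the studies discussed here.

```python
from statistics import NormalDist


def approx_power(effect_size, n_per_group, alpha=0.05):
    """Power of a two-sided, two-sample z-test with equal group sizes.

    `effect_size` is the true difference between group means in units of
    the (assumed known and equal) standard deviation.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Expected value of the test statistic under the alternative.
    shift = effect_size * (n_per_group / 2) ** 0.5
    # Probability of landing in either rejection region.
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


# Hypothetical scenario: a moderate true difference (0.5 SD) between populations.
for n in (10, 64):
    print(n, round(approx_power(0.5, n), 2))
```

Under these assumptions, 10 individuals per population give a power of roughly 0.20, whereas 64 per population give roughly 0.81. A nonsignificant result from the smaller sample would therefore say little about whether a real difference exists.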
Likewise, a researcher who finds that taxa do not differ in, say, cytochrome-b sequence has a responsibility to report that this is not equivalent to finding that the taxa “do not differ genetically.” The latter (decidedly common) shorthand implies that no aspect of the genotype differs, which was not the null hypothesis tested; rather, it is attribute substitution, the answering of an easier question than what was asked (sensu Kahneman 2011). Readers rightly would scoff if a researcher found no variation in the lesser wing coverts among populations only to proclaim that phenotype did not vary geographically.
Another aspect of philosophy of science bears scrutiny, an aspect McCormack and Maley (2015) touched upon but did not explore fully: Subspecies have no place in the schema of the phylogenetic species concept, so adherents to that concept are predisposed to a finding of “no subspecies” because below the generic level a taxon is either a species or it is nothing. A basic tenet of the hypothetico-deductive framework, the bedrock of scientific inquiry, is that valid alternatives are tested, yet Zink et al. (2013) tested implicitly only 2 alternatives, both of which would have yielded the same conclusion. Had they concluded that distinct genetic clusters were present, they would have declared the gnatcatcher taxa to be species and trumpeted this example of cryptic species others had overlooked. Had they concluded (as they did with their visual inspection of haplotype networks but with no statistical tests) that no geographic pattern was present, they would have declared no taxa to be discernible (which they did). Their ideology left them no other options. In other words, if we label a conclusion of species as A, a conclusion of subspecies as B, and a conclusion of no taxa as C, then under the framework adopted only A or C could be reached; there is no suitable alternative, in the hypothetico-deductive sense, under which conclusion B could have been reached. This situation is acceptable if researchers do not intend to draw an inference about subspecies limits, but it is patently unacceptable if they do.
A Way Forward
In light of increased reliance on phylogenetic methods, it is imperative that systematists identify and adopt standards to determine species limits and subspecies limits, “to establish a standard method to determine the species–subspecies boundary in order to effectively use the subspecies classification for research and conservation purposes” (Torstrom et al. 2014). This task will mean that we must consider both phenotype and genotype (see Winker 2009), even in the face of an overwhelming push to consider only the latter. As it stands, we can assess concordance between aspects of subspecific phenotype and genotype (e.g., Pruett et al. 2008, Miller et al. 2011), especially if we use a range of genes, and it is possible to meld the two in increasingly sophisticated ways (e.g., Hawlitschek et al. 2012). Better still, we can establish an a priori set of explicit genetic predictions to assess subspecies limits (for a superb example, see Sackett et al. 2014).
In the interim, I suggest a simple solution to resolve the problem: Add alternative B to the framework of the phylogenetic species concept. In principle, this addition is simple. Because a subspecies is defined by its morphological diagnosability (Patten and Unitt 2002), a researcher needs to account for phenotype as well as genotype, and genetic differences alone are not enough to define a subspecies (Mousseau and Sikes 2011). Under the biological species concept, a diagnosably distinct, geographically circumscribed segment not reproductively isolated from other such segments would be deemed a subspecies and not a species. I propose that under the phylogenetic species concept, a (morphologically) diagnosably distinct, geographically circumscribed clade that does not form a distinct (neutral) genetic cluster or is not reciprocally monophyletic (I mention this because its assessment is common practice, not because it is a criterion inherent to the concept) in relation to other such clades be deemed a subspecies and not a species. Only a failure to achieve both phenotypic and genotypic distinctiveness, by which I mean a large effect size (Patten 2010, Tobias et al. 2010), ought to lead a researcher to conclude that a subspecies is taxonomically invalid.
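The logic of the proposed amendment can be sketched as a small decision rule. The names below (`Conclusion`, `conclude`, `reachable`) are hypothetical, and the rule is one simplified reading of the text: it reduces phenotypic diagnosability and neutral genetic distinctiveness to yes/no inputs, and it assumes that a distinct neutral genetic cluster leads to a conclusion of species under the phylogenetic species concept.

```python
from enum import Enum
from itertools import product


class Conclusion(Enum):
    SPECIES = "A"
    SUBSPECIES = "B"
    NO_TAXON = "C"


def conclude(phenotype_diagnosable, neutral_cluster, allow_subspecies=True):
    """Map two yes/no findings to a taxonomic conclusion.

    `allow_subspecies=False` mimics the unamended framework, in which
    no subspecies category exists.
    """
    if neutral_cluster:
        # Assumption of this sketch: a distinct neutral cluster means species.
        return Conclusion.SPECIES
    if phenotype_diagnosable and allow_subspecies:
        # Diagnosable phenotype without neutral genetic distinctiveness.
        return Conclusion.SUBSPECIES
    return Conclusion.NO_TAXON


def reachable(allow_subspecies):
    """Collect every conclusion reachable over all combinations of inputs."""
    return {
        conclude(phenotype, cluster, allow_subspecies)
        for phenotype, cluster in product([True, False], repeat=2)
    }


print(sorted(c.value for c in reachable(allow_subspecies=False)))
print(sorted(c.value for c in reachable(allow_subspecies=True)))
```

Enumerating the inputs shows that without the amendment only A and C can ever be reached, whatever the data show, whereas with it all three conclusions are attainable.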
As a final thought, I wish to emphasize that it is incumbent upon a researcher who wishes to make a good-faith effort to examine phenotypic variation to begin with diagnoses in the type descriptions of the subspecies. It is incorrect, for example, to assess the taxonomic status of a subspecies that was described on the basis of plumage color by measuring wing length instead, when body size was never claimed to differ. Only with rigorous analysis of phenotype, using modern methods such as colorimetry and computer-based analysis of shape, wed to rigorous analysis of genotype and a clear sense of what we expect to see in such data (e.g., Sackett et al. 2014), can we move forward in the field of subspecies systematics. Short of that, we run the risk of further imperiling Earth's already unconscionably imperiled biodiversity when we accept null hypotheses of no difference, fail to state expectations explicitly, and adopt frameworks that do not allow us to reach reasonable scientific conclusions.