There is increasing interest among conservation biologists in using DNA barcodes to document biological diversity, and in using the barcode data to replace morphologically based taxonomic data. Taxonomic data, traditionally obtained by observing the morphological and anatomical characteristics of organisms, have been the foundation of conservation biology and ecological assessments for decades, using taxa presence, their associated ecological characteristics, and assemblage-level diversity as biological indicators of ecological health (Angermeier and Karr 1994).
In the discipline of ecological monitoring and assessment, my colleagues and I have found a fairly consistent error rate associated with applied taxonomy of 10 to 15 percent (most testing has been done with freshwater benthic macroinvertebrates)—and this is at the level of genus. The number would undoubtedly be much higher for species-level identifications. Some scientists are concerned that similar error may exist with data on other assemblages (specifically, fish and periphyton), but this is as yet untested. I have heard three recommendations from within the monitoring community for dealing with this taxonomic error: (1) Base ecological assessments on DNA barcoding, as it will eliminate all taxonomic error; (2) base ecological assessments on family-level taxonomy, as it will reduce the rate of taxonomic error to acceptable levels; and (3) accept the 10 to 15 percent uncertainty, and move forward with the assessments. The first recommendation, which calls for more detailed data, represents a misunderstanding of what DNA barcoding is, of how those data relate to taxonomic diversity, and of the uncertainties inherent in translating genetic data into taxonomic data. The second, a call for more coarse data, risks losing ecological information available at finer taxonomic levels (genus or species), which is critical to interpreting ecological assessments in a way that integrates the diversity of environmental pressures on ecosystems. The third, to which I adhere, is a compromise between the first two, and also recognizes the uncertainty in all technical endeavors.
DNA barcoding relies almost strictly on gaps in genetic variance for distinguishing among taxa, whether these are putative species or higher categories (Meyer and Paulay 2005). The degree of overlap (the inverse of gaps) in genetic signature that is tolerated before a new taxon, or simply a different one, is recognized is a matter of professional judgment. This is acknowledged both by barcoding proponents (Blaxter et al. 2005) and opponents (Wheeler 2005), the former describing these judgment calls as “user-defined cutoffs.” If improved objectivity is the rationale for genetic definition of taxonomic limits, it is curious to me that best professional judgment can be so embraced. Clearly, the uncertainty associated with traditional morphological taxonomy does not just melt away with the implementation of molecular genetics techniques.
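To make the dependence on a user-defined cutoff concrete, here is a minimal sketch in Python. It is an illustration only, not the procedure used by any of the cited authors. The sequences are invented, the distance measure (proportion of differing sites) and the single-linkage grouping rule are assumptions chosen for simplicity, and the function names `p_distance` and `count_units` are hypothetical.

```python
from itertools import combinations


def p_distance(a, b):
    """Proportion of sites that differ between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def count_units(seqs, cutoff):
    """Count gap-defined units under single-linkage grouping.

    Two sequences fall in the same unit if a chain of pairwise
    distances, each no larger than `cutoff`, connects them.
    """
    parent = list(range(len(seqs)))

    def find(i):
        # Follow parent links to the representative of i's group.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(seqs)), 2):
        if p_distance(seqs[i], seqs[j]) <= cutoff:
            parent[find(i)] = find(j)

    return len({find(i) for i in range(len(seqs))})


# Invented 20-base aligned fragments, for illustration only.
SEQS = [
    "ACGTACGTACGTACGTACGT",
    "ACGTACGTACGTACGTACGA",
    "ACGTACGTACGTACGTTTGA",
    "TTTTACGTACGTACGTTTGA",
]

if __name__ == "__main__":
    for cutoff in (0.02, 0.06, 0.11, 0.16):
        print(cutoff, count_units(SEQS, cutoff))
```

With these invented data, the same four sequences resolve into four, three, two, or one unit as the cutoff is relaxed. The number of "taxa" recovered is therefore a function of the analyst's chosen threshold as much as of the organisms sampled.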
Although one could argue that subjectivity is also inherent in determining the limits of morphologically based operational taxonomic units (OTUs), multiple lines of information are used for justifying the assignment of taxa, including biogeographic distributions, phylogenetic relationships, and the structural and functional complexity of morphological features (Wheeler 2005), not just a single piece of information. Early uses of the term “OTU” signified the taxonomic level on which analyses were performed, or the end units. That there are multiple recent forms of OTU, such as molecular OTU (MOTU), evolutionarily significant unit (ESU), and least inclusive taxonomic unit (LITU)—and there are many others—is evidence of an active area of thought and research focused on formulating the theoretical bases of species concepts. The point here is that taxon definition is rarely a cut-and-dried, yes-or-no decision by research taxonomists (though many might not admit it); it requires more than a single datum. And that's the way it should be.
Biological taxonomy is of two types (in the very broad sense), research and production. The definition of morphologically based taxa relies on the efforts of research taxonomists; the timely application of their results to cataloging the content of bulk ecosystem samples (i.e., multitaxon samples) is performed by production taxonomists using dichotomous identification keys and taxon descriptions. That research and production taxonomy are both so active, and that they need each other, speaks to a synergism that many fail to recognize. Research taxonomists may not always appreciate that the information they develop in their research, and the dichotomous keys they uniquely have the ability to produce as a result, are priceless to frontline production taxonomists. Production taxonomists, in turn, may not recognize the hundreds—maybe hundreds of thousands—of solitary hours spent by research taxonomists in developing the knowledge base and the experience, writing skills, and curatorial capabilities that provided the foundation for authorship of the identification keys. Most biological monitoring and assessment programs are more directly reliant on whole-sample data, such as a list of taxa and the number of individuals of each taxon, that come from the efforts of production taxonomists.
Is there a purpose, directly relevant to assemblage-level biological assessments, in placing Linnean nomenclatural labels on gap-defined segments of bulk ecosystem DNA, as Blaxter and colleagues (2005) demonstrated was possible? Perhaps, but it depends strictly upon the ultimate uses of the data. Angermeier and Karr (1994) suggested that genetic diversity is but one component in the hierarchy of biological organization that deserves humanity's best efforts toward restoration and protection; following genetic diversity, there is taxonomic diversity and ecological diversity.
I suggest that for purposes of describing a multispecies sample, the results of which are intended to contribute to the calculation of assemblage- or community-level indicators of ecological condition, it is not necessary to label gap-delineated genetic segments. Rather, it may be more critical to use some metric of genetic diversity as a descriptor of the single sample, a process called “shotgun sequencing” in the emerging field of metagenomics (Chen and Pachter 2005). Comparison of that genetic descriptor of a sample from a location known to be exposed to environmental stressors to another sample from a location lacking the stressors would most likely be a robust indicator of overall ecological conditions (although this has not yet been confirmed). It would avoid the need to spend a substantial amount of time labeling subjectively defined barcode segments; would most likely do more to describe, and thus protect, a principal aspect of ecosystem health (genetic diversity, sensu Angermeier and Karr 1994); and would allow more focus to return to ecosystem protection and away from debates about species concepts, reproductive isolating mechanisms, and genetic drift. It is important to protect both genetic diversity and taxonomic diversity.
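As a rough sketch of what such a comparison might look like, the Python example below computes one possible descriptor for each of two samples and takes the difference. The choice of the Shannon index is an assumption made for illustration; no particular metric is specified here, and whether any such descriptor is a reliable indicator remains unconfirmed. The counts are invented, and the names `shannon_index`, `REFERENCE`, and `STRESSED` are hypothetical.

```python
import math


def shannon_index(counts):
    """Shannon diversity H = -sum(p * ln p) over nonzero counts.

    `counts` holds the number of sequence reads assigned to each
    unlabeled genetic segment in one bulk sample.
    """
    total = sum(counts)
    if total <= 0:
        raise ValueError("sample contains no reads")
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


# Invented read counts per unlabeled segment, for illustration only.
REFERENCE = [25, 25, 25, 25]  # site assumed to lack the stressors
STRESSED = [97, 1, 1, 1]      # site assumed to be exposed to stressors

if __name__ == "__main__":
    h_ref = shannon_index(REFERENCE)
    h_str = shannon_index(STRESSED)
    print(f"reference H = {h_ref:.3f}")
    print(f"stressed  H = {h_str:.3f}")
    print(f"difference  = {h_ref - h_str:.3f}")
```

Nothing in this calculation requires a Linnean name for any segment. The descriptor is computed from the segment counts alone, which is the point of the suggestion above.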
If we, as a scientific community, are to make our research and applied activities truly worthwhile, what can be better than explicitly recognizing that ecological protection directly contributes to the health and prosperity of humans...and then doing something about it? Intense, extensive debate about how much barcode gap is really a gap, and then deciding what that gap means, while interesting, may do little to advance ecological protection. All three endeavors (production taxonomy, research taxonomy, and molecular genetics) are laudable; however, definition of taxa should be left to the research taxonomists, with occasional help from molecular geneticists.
Uncertainty can never be entirely eliminated from data; that is a truism of science. The key is making the effort to characterize the uncertainty and minimize the rates of error, and not throwing out a taxonomic system that is truly integrative. I believe that reducing the support for training and hiring scientists in the discipline of applied production taxonomy, because of some perceived but false notion of enhanced data quality, would be detrimental to ecological protection.