Deoxyribonucleic acid (DNA) barcoding is an effective tool for species identification and life-stage association in a wide range of animal taxa. We developed a strategy for rapid construction of a regional DNA-barcode reference library and used the caddisflies (Trichoptera) of the Great Smoky Mountains National Park (GSMNP) as a model. Nearly 1000 cytochrome c oxidase subunit I (COI) sequences, representing 209 caddisfly species previously recorded from GSMNP, were obtained from the global Trichoptera Barcode of Life campaign. Most of these sequences were collected from outside the GSMNP area. Another 645 COI sequences, representing 80 species, were obtained from specimens collected in a 3-d bioblitz (short-term, intense sampling program) in GSMNP. The joint collections provided barcode coverage for 212 species, 91% of the GSMNP fauna. Inclusion of samples from other localities greatly expedited construction of the regional DNA-barcode reference library. This strategy increased intraspecific divergence and decreased average distances to nearest neighboring species, but the DNA-barcode library was able to differentiate 93% of the GSMNP Trichoptera species examined. Global barcoding projects will aid construction of regional DNA-barcode libraries, but local surveys make crucial contributions to progress by contributing rare or endemic species and full-length barcodes generated from high-quality DNA. DNA taxonomy is not a goal of our present work, but the investigation of COI divergence patterns in caddisflies is providing new insights into broader biodiversity patterns in this group and has directed attention to various issues, ranging from the need to re-evaluate species taxonomy with integrated morphological and molecular evidence to the necessity of an appropriate interpretation of barcode analyses and its implications in understanding species diversity (in contrast to a simple claim for barcoding failure).
Deoxyribonucleic acid (DNA) barcoding uses a short, standardized segment of the mitochondrial cytochrome c oxidase subunit I (COI) gene to identify animal species (Hebert et al. 2003). It is an effective method in varied animal lineages, including several major freshwater insect groups: mayflies (Ephemeroptera) (Ball et al. 2005), caddisflies (Trichoptera) (Hogg et al. 2009, Zhou et al. 2009, 2010), midges (Diptera: Chironomidae) (Ekrem et al. 2007), and black flies (Diptera: Simuliidae) (Rivera and Currie 2009). This approach is particularly valuable for aquatic insects because it enables identification of larval stages and females that often would otherwise remain taxonomically ambiguous (for caddisflies, see Shan et al. 2004, Graf et al. 2005, 2009a, b, Zhou et al. 2007, Waringer et al. 2008, Pauls et al. 2009, Zhou 2009). This capability has stimulated growing interest in developing barcode libraries that allow identification of regional faunas of aquatic insects.
The work involved in constructing regional barcode libraries depends on the nature of sequence variation within lineages of a species. If patterns of intraspecific variation are complex and geographic divergence is large, effective barcode-based diagnostic systems must be based on a reference barcode library for each locality. If sequence profiles for most species show little regional divergence, each reference barcode library can be created by amalgamating local data with barcode records gathered over a broader area. A major benefit of the amalgamation approach is that it overcomes the difficulty of obtaining specimens of uncommon species, which make up a substantial number of the species in any local fauna. Amalgamation of records from multiple localities aids construction of a comprehensive library because many locally rare species may be common at another site.
Few studies have examined enough taxa at a large enough geographic scale to provide a good sense of the extent of geographic variation in barcode sequences. However, analysis of >1000 species of lepidopterans across the eastern ½ of North America showed that barcode variation was very limited within species, even between collection localities that were thousands of kilometers apart (Hebert et al. 2010). Reason exists to expect a similar pattern in other terrestrial and marine groups, but freshwater species may show more regional variation because of the discontinuous nature of the habitats they occupy. Repeated extinctions of Nearctic aquatic insects over large parts of their range during the Quaternary and subsequent recolonizations from southern refugia may have enabled maintenance of substantial regional genetic diversity (Hewitt 2000). Therefore, sequence variation must be examined on a large scale for a wide range of taxa to test the level of local effort required to achieve a comprehensive regional library.
We used the Trichoptera fauna of the Great Smoky Mountains National Park (GSMNP, southeastern USA) as a model system to test the effort needed to construct a regional barcode library. The GSMNP is the target of the first All Taxa Biodiversity Inventory (ATBI) in a national park (Nichols and Langdon 2007). The GSMNP is an ideal locality in which to test the effectiveness of DNA barcoding because its biodiversity is high for a temperate region, a consequence of its highly diversified habitats and the fact that the region has never been glaciated (Flint 1971). Aquatic insect diversity has been studied extensively in the GSMNP (Stoneburner 1977, Morse et al. 1993, 1998, Parker 1998, 2000, Etnier et al. 2004, DeWalt and Heinold 2005, DeWalt et al. 2007, Parker et al. 2007). Caddisflies, one of the most diverse freshwater insect groups, are particularly species-rich in the GSMNP. Two hundred thirty-one species, representing ⅙ of the North American fauna, have been reported from this region (Parker et al. 2007). However, immature forms and females of many caddisfly species in the GSMNP remain difficult or impossible to identify. This problem is a serious roadblock to a major goal of the ATBI: compiling information on the natural history and ecological role of each species (Nichols and Langdon 2007). A DNA-barcode reference library for the GSMNP caddisflies would enable the identification of immatures and females, aid detection of rare species, and provide a first measure of genetic diversity within the focal fauna. Only 1 caddisfly species (Neophylax kolodskii Parker) is thought to be endemic to the GSMNP. Most (88%) of the resident species are widely distributed throughout the eastern US and Canada (Parker et al. 2007). This fact provided the rationale for rapid construction of a reference library for the GSMNP fauna by including barcode records for caddisflies from other sites in North America.
Our general goal is to develop a strategy for rapid construction of regional barcode libraries. Work in GSMNP began with a bioblitz (short-term, intense sampling program) in the spring of 2007. The goal was to obtain fresh samples of as many Trichoptera species as possible for barcode analysis. This brief effort yielded a limited number of the species known from the GSMNP because of the strong seasonality of many caddisflies. However, the species gathered during this effort did provide a basis for ascertaining whether a barcode reference library constructed solely from species records obtained from a global barcoding effort (Trichoptera Barcode of Life [TBoL]; www.trichopterabol.org), most of which were collected outside the focal area, would generate different identifications than a library supplemented by records from within GSMNP through local biotic surveys.
Materials and Methods
Bioblitz survey
A bioblitz survey was conducted from 16 to 18 May 2007 by a team of 12 taxonomic specialists and volunteers. Adult caddisflies were collected with UV light traps and sweep nets. Larvae and pupae were collected with kick nets and by hand. Species or morphospecies were sorted into separate vials to minimize the chance of cross contamination. As many specimens as possible were morphologically identified to species in a 2-d session after collection. When discrepancies between these morphological assignments and COI assignments were detected, specimens were re-examined by specialists.
Trichoptera samples from other regions
Barcode records from specimens of Trichoptera species known from the GSMNP, but obtained through the TBoL effort, most of which were collected from other localities in North America, were used to create a barcode library with coverage for as many species as possible. Samples from much of the eastern US and Canada were included in our analyses (Fig. 1), but samples from other regions, e.g., Ozarks, Gulf Coastal Plain, Atlantic Coastal Plain, Wichita Mountains, Great Plains, Cumberland Plateau, that might potentially have high divergence levels were unavailable to us or were represented by few specimens. Samples from these regions should be investigated further. Representative sequences for caddisfly species known from the GSMNP were selected from projects in the global TBoL campaign. Caddisfly species from the TBoL library were included based on the most recent park checklist (Parker et al. 2007), which was based on examination of >130,000 specimens and records. Representatives of each haplotype cluster showing >2% divergence from its nearest neighbors were selected to represent each taxon. Our goal was to reflect the COI sequence diversity present within each species across its distribution. If a particular COI haplotype cluster was represented by multiple individuals, a single representative from each state or province was included in the analysis. Whenever available, full-length sequences derived from male specimens were selected.
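The selection rule described above (one representative per haplotype cluster per state or province, with a preference for full-length sequences from males) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record fields (`species`, `cluster`, `region`, `sex`, `length`) are hypothetical and do not correspond to BOLD export columns, and ranking sex ahead of sequence length is an arbitrary choice made for the example, not a rule stated in the text.

```python
def select_representatives(records):
    """Keep one record per (species, haplotype cluster, state/province).

    Each record is a dict with hypothetical keys:
    'id', 'species', 'cluster', 'region', 'sex', 'length'.
    """
    best = {}
    for rec in records:
        key = (rec["species"], rec["cluster"], rec["region"])
        # Assumed preference order: male specimens first, then longer sequences.
        rank = (rec["sex"] == "M", rec["length"])
        if key not in best or rank > best[key][0]:
            best[key] = (rank, rec)
    return [rec for _, rec in best.values()]


records = [
    {"id": "s1", "species": "Hydropsyche sparna", "cluster": 1,
     "region": "TN", "sex": "F", "length": 658},
    {"id": "s2", "species": "Hydropsyche sparna", "cluster": 1,
     "region": "TN", "sex": "M", "length": 620},
    {"id": "s3", "species": "Hydropsyche sparna", "cluster": 1,
     "region": "ON", "sex": "M", "length": 658},
]
chosen_ids = sorted(rec["id"] for rec in select_representatives(records))
print(chosen_ids)
```

In this toy input, the two Tennessee records compete for a single slot and the Ontario record is retained separately, so geographic spread within a cluster is preserved.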
Species identification via DNA barcoding
A significant fraction of the caddisflies collected in the 2007 bioblitz consisted of immature individuals or females, most of which presented a challenge for species-level identification. In such cases, DNA barcodes were used for species identification. We used a strict consensus criterion for these identifications—each specimen was identified only if its sequence nested within a sequence cluster delimited by positively identified specimens of that species (Zhou et al. 2007).
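The strict criterion can be illustrated with a small Python sketch. It assumes a rooted tree written as nested tuples of leaf names and a hypothetical mapping from reference specimens to species; neither format was used in the actual analysis, which relied on NJ trees built in BOLD. A query is assigned to a species only when it falls inside the smallest clade that holds all reference specimens of that species and that clade holds no reference specimens of any other species.

```python
def leaves(node):
    """Return the set of leaf names under a nested-tuple tree node."""
    if isinstance(node, str):
        return {node}
    out = set()
    for child in node:
        out |= leaves(child)
    return out


def smallest_clade(node, targets):
    """Return the smallest subtree whose leaves include every name in targets."""
    if isinstance(node, str):
        return node
    for child in node:
        if targets <= leaves(child):
            return smallest_clade(child, targets)
    return node


def identify(tree, query, reference_species):
    """Assign the query only if it nests within one species' reference cluster."""
    for species in sorted(set(reference_species.values())):
        refs = {leaf for leaf, sp in reference_species.items() if sp == species}
        clade = leaves(smallest_clade(tree, refs))
        # Unlabeled leaves (other queries) are not counted as foreign references.
        foreign = {leaf for leaf in clade
                   if reference_species.get(leaf, species) != species}
        if query in clade and not foreign:
            return species
    return None  # left unidentified under the strict criterion


refs = {"a1": "sp_A", "a2": "sp_A", "b1": "sp_B", "b2": "sp_B"}
nested = ((("a1", "q1"), "a2"), ("b1", "b2"))
outside = ((("a1", "a2"), "q1"), ("b1", "b2"))
print(identify(nested, "q1", refs), identify(outside, "q1", refs))
```

Note that this sketch is deliberately conservative: a query that is sister to a reference cluster, rather than nested within it, is left unidentified, and so is any query belonging to a species whose reference sequences are not monophyletic.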
DNA protocols
All specimens were stored in 95% ethanol or pinned. Standard DNA-barcoding protocols (Ivanova et al. 2006, deWaard et al. 2008) were conducted at the Canadian Centre for DNA Barcoding, University of Guelph. In most cases, a single leg was removed from each individual and used for DNA extraction. A nondestructive extraction protocol was followed for some microcaddisflies (e.g., Hydroptilidae). In this protocol, the entire specimen was immersed in lysis buffer and was retained after extraction. The full-length barcode region of the COI gene was amplified with 2 sets of routine primers: LepF1 (5′-ATTCAACCAATCATAAAGATATTGG-3′)/LepR1 (5′-TAAACTTCTGGATGTCCAAAAAATCA-3′) (Hebert et al. 2004), and LCO1490 (5′-GGTCAACAAATCATAAAGATATTGG-3′)/HCO2198 (5′-TAAACTTCAGGGTGACCAAAAAATCA-3′) (Folmer et al. 1994). PCR products were visualized, cycle sequenced, purified, and bidirectionally sequenced on ABI 3730XL sequencers (Applied Biosystems, Foster City, California).
Data depository and output
All relevant voucher information, DNA sequences, and trace files are publicly accessible in projects GSMNP caddisflies (SMCAD) and GSMNP caddisflies additional samples (SMTRI), in the Barcode of Life Data Systems (BOLD systems; http://www.boldsystems.org). All COI sequences have also been deposited in GenBank (accession numbers are in Appendix 1; available online from: http://dx.doi.org/10.1899/10-010.1.s1 (10.1899_10-010.1.s1.xls)), except for 13 Churchill samples that were published in earlier papers (Zhou et al. 2009, 2010).
Sequence analysis and tree construction
Neighbor-Joining (NJ) trees were built for both GSMNP samples and combined samples with the tree-construction tools on BOLD and a Kimura-2-Parameter (K2P) distance model. The Newick tree files were subsequently imported into the web-based visualization tool, interactive Tree of Life (iTOL; http://itol.embl.de/; Letunic and Bork 2007). The terminal nodes in the tree of samples from the GSMNP were collapsed for each morphological species, and the total branch lengths to the closest and the farthest terminal were used as sides of the triangle (Fig. 2). The combined NJ tree was presented in circular format. All terminal clades with average distance to terminals <2% were collapsed, but the overall tree topology was not changed (Fig. 3). Eleven monophyletic clades were recognized and pruned from the combined tree and presented as 9 subtrees (Figs 4–12). A NJ tree with detailed sample identification code (BOLD Sample ID), distribution, and sex information also was built with the BOLD tree construction function with K2P distances (Appendix 2; available online from: http://dx.doi.org/10.1899/10-010.1.s2 (10.1899_10-010.1.s2.pdf)). Intra- and interspecific distances were calculated in BOLD with the Nearest Neighbor Summary analytical tool and a K2P distance parameter for all sequences >350 base pairs (bp).
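For readers unfamiliar with the K2P model, the distance between 2 aligned sequences is d = −½ ln[(1 − 2P − Q)√(1 − 2Q)], where P and Q are the proportions of compared sites that differ by a transition and by a transversion, respectively. The distances reported here were calculated in BOLD; the Python function below is only a minimal sketch of the standard formula. The function name and the decision to skip sites with gaps or ambiguous bases are assumptions made for this example.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Kimura 2-parameter distance between two aligned DNA sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        # Assumption: sites with gaps or ambiguity codes are ignored.
        if a not in "ACGT" or b not in "ACGT":
            continue
        compared += 1
        if a == b:
            continue
        pair = {a, b}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1      # A<->G or C<->T
        else:
            transversions += 1    # purine <-> pyrimidine
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    # math.log raises ValueError if the sequences are too divergent
    # for the model (argument not positive).
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


d_transition = k2p_distance("AAAAAAAAAA", "GAAAAAAAAA")
d_transversion = k2p_distance("AAAAAAAAAA", "CAAAAAAAAA")
print(round(d_transition, 4), round(d_transversion, 4))
```

At the low divergences typical within species, the K2P distance is close to the uncorrected proportion of differing sites; the correction matters more for the deep divergences discussed below.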
Results
DNA-barcode reference library
The GSMNP checklist was updated to ensure correspondence with current Trichoptera nomenclature (Morse 2010). Records from Parker et al. (2007) (with updates made by the authors of the present paper) and new records from our study were included in the updated checklist (Table 1). A total of 234 species, including 2 undescribed species reported by Parker et al. (2007) and 3 that have only provisional identifications, are now known from GSMNP. This total includes 2 species (Homoplectra flinti Weaver and Hydroptila coweetensis Huryn) that were collected for the first time in GSMNP during our bioblitz. COI sequences were available from the TBoL global library for 209 of these species (89%). Eighty species were collected during the bioblitz. Only 3 of these species were not already in the TBoL global library. Our coverage was 212 species (91% of the fauna), of which 208 (89%) were represented by barcode sequences >500 bp. Figure 1 shows the distribution of the 1638 caddisfly specimens examined in our study.
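The coverage percentages above follow directly from the species counts and the 234-species checklist. The short Python check below reproduces them; the dictionary labels are informal names chosen for this example.

```python
TOTAL_SPECIES = 234  # species on the updated GSMNP checklist

counts = {
    "global library": 209,
    "bioblitz": 80,
    "combined": 212,
    "combined, >500 bp": 208,
}


def percent_of_fauna(n_species, total=TOTAL_SPECIES):
    """Percentage of the checklist covered, rounded to a whole number."""
    return round(100 * n_species / total)


coverage = {label: percent_of_fauna(n) for label, n in counts.items()}
for label, pct in coverage.items():
    print(f"{label}: {pct}% of {TOTAL_SPECIES} species")
```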
Table 1
Trichoptera of Great Smoky Mountains National Park (GSMNP) and barcode distances. Divergence values were calculated for all sequences >350 base pairs, using the Nearest Neighbor Summary tool provided in the Barcode of Life Data System (BOLD). Global library = Trichoptera Barcode of Life database, bioblitz = short-term, high-intensity sampling event, N = number of sequences in the database, NN = nearest neighbor, ISD = intraspecific distance, ID = identifier, N/A = not applicable. AB = Alberta, AL = Alabama, AR = Arkansas, AZ = Arizona, CO = Colorado, FL = Florida, GA = Georgia, IA = Iowa, IL = Illinois, IN = Indiana, KY = Kentucky, LA = Louisiana, MA = Massachusetts, MB = Manitoba, MD = Maryland, ME = Maine, MI = Michigan, MN = Minnesota, MT = Montana, NB = New Brunswick, NC = North Carolina, NJ = New Jersey, NL = Newfoundland and Labrador, NS = Nova Scotia, NV = Nevada, NY = New York, OH = Ohio, OK = Oklahoma, ON = Ontario, OR = Oregon, PA = Pennsylvania, PE = Prince Edward Island, QC = Quebec, SC = South Carolina, SD = South Dakota, SK = Saskatchewan, TN = Tennessee, TX = Texas, VA = Virginia, VT = Vermont, WA = Washington, WI = Wisconsin, WV = West Virginia, WY = Wyoming.
Barcode divergences in GSMNP Trichoptera
COI sequences were obtained from 645 of the caddisflies collected in the bioblitz. All 80 species in this collection exhibited reciprocal monophyly in the NJ tree, and no species shared barcodes (Fig. 2). COI divergences within species were, on average, much lower than those between nearest neighbors (mean intraspecific distance = 0.7%, mean maximum intraspecific distance = 1.4%; Table 1). Eleven species showed maximum intraspecific divergence >2% (Fig. 2, highlighted by thickened branches), a threshold found useful in species discrimination in many insect groups (e.g., Hebert et al. 2003, 2004, Ball et al. 2005), whereas 3 species had mean intraspecific divergence >2%. Exceptionally large intraspecific variation was observed in 2 species: individuals of Dolophilodes distincta (Walker) showed as much as 14.0% divergence and those of Polycentropus cinereus Hagen reached 9.9% divergence. Levels of within-species divergence were not correlated with the number of individuals analyzed for a species. In contrast to these cases of deep intraspecific variation, 1 species pair, Agapetus walkeri (Betten and Mosely) and A. tomus Ross, showed only 3.1% divergence, a result suggesting their recent speciation.
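The per-species statistics compared above (mean and maximum intraspecific distance, and distance to the nearest neighboring species) can be derived from any table of pairwise distances. The values in Table 1 came from the Nearest Neighbor Summary tool in BOLD; the Python sketch below only illustrates the idea and is not that tool. The input format, a mapping from unordered specimen pairs to percent distances, and all specimen labels are assumptions made for the example.

```python
from itertools import combinations


def barcode_summary(species_of, dist):
    """Summarize intraspecific and nearest-neighbor distances per species.

    species_of: specimen id -> species name
    dist: frozenset({id1, id2}) -> pairwise distance (%), hypothetical format
    """
    summary = {}
    for sp in sorted(set(species_of.values())):
        members = [s for s in species_of if species_of[s] == sp]
        outsiders = [s for s in species_of if species_of[s] != sp]
        intra = [dist[frozenset(pair)] for pair in combinations(members, 2)]
        inter = [dist[frozenset((m, o))] for m in members for o in outsiders]
        summary[sp] = {
            # Singletons have no intraspecific distance.
            "mean_intra": sum(intra) / len(intra) if intra else None,
            "max_intra": max(intra) if intra else None,
            "nearest_neighbor": min(inter) if inter else None,
        }
    return summary


species_of = {"a1": "sp_A", "a2": "sp_A", "b1": "sp_B"}
dist = {
    frozenset(("a1", "a2")): 0.5,
    frozenset(("a1", "b1")): 9.0,
    frozenset(("a2", "b1")): 8.0,
}
summary = barcode_summary(species_of, dist)
print(summary["sp_A"])
```

A species is readily separated by barcodes when its maximum intraspecific distance is well below its nearest-neighbor distance, as for the hypothetical sp_A here (0.5% vs 8.0%).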
Barcode divergences in combined samples
Barcode coverage for the GSMNP fauna was extended by including 993 records from specimens collected at other localities to produce a data set with 1638 sequences (97% were >500 bp in length), representing 212 species (Fig. 1). Expanding the geographic range did not greatly increase COI divergence in every species. For example, the maximum intraspecific divergence for Hydropsyche sparna Ross remained as low as 1.5% among samples collected as far as 2600 km apart. However, maximum intraspecific sequence divergences increased in many taxa, a result that reflected geographic variation. Between-species differences decreased because of greater taxon coverage, especially the addition of species that were closely related to those already in the data set. The barcode gap decreased, but intraspecific divergences were still much lower than interspecific divergences. Mean intraspecific distance was 1.7% (range: 0–10.2%), mean maximum intraspecific distance was 3.1% (range: 0–10.6%), and average distance to the nearest neighbor was 10.1% (range: 0–23.4%). The rise in intraspecific variation reflected the fact that some species showed very high divergence across their distribution. Half of the species represented by multiple individuals showed maximum intraspecific divergence ≥2% (Figs 4–12, Table 1), whereas 34% had a mean intraspecific divergence ≥2%. However, most of the overall rise in intraspecific variation arose from a few species with deep maximum within-species divergence. For instance, ∼11% of the species showed ≥8% maximum within-species divergences (Table 1, highlighted in grey blocks with a solid line in Figs 4–7, 9–10, 12). The 8% value was not selected to imply a generic barcode gap for the caddisflies examined in our study nor to suggest that taxa with <8% divergences should not be investigated. This arbitrary divergence was used simply to point out where large intraspecific divergences were observed in the focal taxa.
Most of these taxa have not been thoroughly investigated via integrated morphological and molecular evidence, some are almost certainly species complexes, and others include multiple lineages with clear morphological differences. Despite cases of large sequence variation within species, most species (91%) defined by morphology were represented by a monophyletic assemblage of haplotypes (Figs 4–12).
The exceptions are highlighted with transparent (Category 1 in Appendix 3; available online from: http://dx.doi.org/10.1899/10-010.1.s3 (10.1899_10-010.1.s3.doc)) or grey (Category 2 in Appendix 3) blocks with a dotted line, representing instances where the barcode data could not definitively identify species or where barcode data could identify species, but species were not monophyletic, respectively. In Category 1, 14 species belonging to 6 species complexes either shared barcodes or formed clusters with representatives from >1 taxon (Figs 5, 6, 9–11, highlighted by transparent blocks with a dotted line). These cases included 5 pairs of species: Polycentropus colei Ross/Polycentropus rickeri Yamamoto (Fig. 5), Ceraclea nepha (Ross)/Ceraclea tarsipunctata (Vorhies) (Fig. 9), Triaenodes tardus Milne/Triaenodes marginatus Sibley (Fig. 9), Psilotreta rossi Wallace/Psilotreta rufa (Hagen) (Fig. 10), and Lepidostoma pictile (Banks)/Lepidostoma modestum (Banks) (Fig. 11). In addition, they included 1 group of 4 species: Cheumatopsyche campyla Ross/Cheumatopsyche speciosa (Banks)/Cheumatopsyche pasella Ross/Cheumatopsyche ela Denning (Fig. 6). In Category 2, 6 species each had multiple haplotype groups deeply diverging from each other and a closely related species nested within its species boundary but not overlapping with any of its haplotype groups: Nyctiophylax affinis (Banks), Diplectrona modesta Banks, Hydropsyche rossi Flint, Voshell, and Parker, Ceraclea flava (Banks), Oecetis inconspicua (Walker), and Psilotreta labida Ross. The potential reasons for these unusual barcode clustering patterns are briefly addressed in the discussion. Taxonomic changes have not been proposed in our paper because comprehensive taxonomic revision is not available. However, provisional opinions on the relevant issues are provided in Appendix 3.
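The distinction between monophyletic species and the Category 2 pattern (a closely related species nested within another species' boundary) can be made concrete with a small Python sketch. It assumes a rooted tree written as nested tuples of leaf names and uses hypothetical specimen and species labels; the actual assessment was made on NJ trees from BOLD, and the outcome of any such test depends on how the tree is rooted.

```python
def leaf_set(node):
    """Return the set of leaf names under a nested-tuple tree node."""
    if isinstance(node, str):
        return {node}
    return set().union(*(leaf_set(child) for child in node))


def is_monophyletic(tree, members):
    """True if some clade of the rooted tree contains exactly the given leaves."""
    node = tree
    while not isinstance(node, str):
        # Descend into the child that still holds every member, if any.
        holder = next((c for c in node if members <= leaf_set(c)), None)
        if holder is None:
            break
        node = holder
    return leaf_set(node) == members


def monophyly_report(tree, species_of):
    """Map each species to whether its specimens form a monophyletic group."""
    report = {}
    for sp in sorted(set(species_of.values())):
        members = {leaf for leaf, s in species_of.items() if s == sp}
        report[sp] = is_monophyletic(tree, members)
    return report


# sp_X is nested inside sp_M, so sp_M is paraphyletic (Category 2 pattern).
tree = ((("m1", "x1"), "m2"), ("y1", "y2"))
species_of = {"m1": "sp_M", "m2": "sp_M", "x1": "sp_X",
              "y1": "sp_Y", "y2": "sp_Y"}
report = monophyly_report(tree, species_of)
print(report)
```

In the toy tree, sp_X and sp_Y are recovered as monophyletic whereas sp_M is not, yet each haplotype group of sp_M remains distinct from sp_X, which is why such species can still be identified by barcodes.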
Discussion
This study validated the effectiveness of DNA barcoding as a tool for identifying Trichoptera species found in the GSMNP. When analysis was restricted to specimens collected in the Park, DNA-barcode results showed perfect congruence with morphological assignments, a result that also was obtained in an investigation of subarctic caddisflies (Zhou et al. 2009, 2010). We also sought to establish a protocol for expedited construction of the barcode reference library needed to aid identifications of this local fauna. A 3-d collecting effort in the GSMNP provided barcode records for 80 species (34% of the fauna), including 3 with no prior barcode data. However, coverage for an additional 57% of the GSMNP species was gained from records obtained through ongoing effort to build a comprehensive barcode reference library for North American Trichoptera. The inclusion of such records enabled rapid progress towards a comprehensive library, but the addition of samples from multiple localities did lead to a substantial increase in the levels of sequence variation within species. Only 5% of the taxa collected in the GSMNP showed >2% mean intraspecific divergence, but 34% of the species in the composite data set exceeded this threshold. Despite this large sequence variation, 91% of the species were represented by a monophyletic cluster of sequences. As a consequence, query sequences still were usually assigned to the proper species via a rigorous tree-based method. Barcoding can assign queries to the correct species, even those exhibiting paraphyly in COI, if nested lineage diversity is present (Kizirian and Donnelly 2004). For instance, if a query sample belongs to any of the paraphyletic haplogroups within Diplectrona modesta Banks (Fig. 6), the sample is identified to that species. 
Moreover, in cases where observed haplotype groups represent distinctive species yet to be described, the barcode reference library can be updated as soon as taxonomic revisions are completed for a species complex, as can be all identifications made based on sequence records.
Overall, DNA barcoding distinguished 93% of the caddisfly species in our study, which is by far the most rigorous test to date for the effectiveness of barcoding in caddisflies because it involves a wide range of taxa represented by multiple individuals from a large geographic range. Among the 77 species with coverage from within and outside the Park, 51 had all haplotypes detected in GSMNP samples nested within species boundaries delimited by outside samples, and 87% of the GSMNP individuals were placed within these boundaries (samples collected in the bioblitz were highlighted in red in Appendix 2). This observation suggests that sequence records within the global TBoL reference library can be used effectively to aid comprehensive barcode coverage for much of eastern North America. Of course, care in interpretation is critical when a query sequence falls outside the species boundary delimited by existing records because many eastern caddisfly species, including some known from the GSMNP, are currently absent from the overall library. As indicated by our study, local biotic surveys can make an important contribution to the overall DNA-barcode library by providing coverage for rare or endemic species (for hotspots of caddisfly endemism, see de Moor and Ivanov 2008) and by contributing high-quality barcode sequences through the analysis of fresh material.
Most of the Trichoptera species known from the GSMNP (91%) are now included in the DNA-barcode reference library, so immediate opportunities exist for its application. The most important of these applications lies in the capability to identify caddisflies at any life stage. This ability will facilitate studies on life history, phenology, food and habitat preferences, and larval behaviors. In addition, molecular analyses, including high-throughput sequencing technologies, can be applied to community-level biodiversity surveys and for foodweb study via diet analyses (King et al. 2008, Rokas and Abbot 2009, Valentini et al. 2009a, b). Such studies will certainly advance our understanding of ecosystem functions and enhance ecosystem management.
Complexities when using DNA barcoding
Our study has further validated the effectiveness of DNA barcoding for the identification of Trichoptera within the GSMNP, but it has revealed some complexities. These complexities fell into 3 categories: 1) barcode data could not be used to distinguish some closely related species; 2) barcode data could be used to identify species, but species were not monophyletic; 3) barcode sequences recovered monophyletic taxa, but the genetic distances were very high. A comprehensive taxonomic revision is beyond the scope of our paper, but provisional opinions and observations made during the course of our study are provided in Appendix 3. Here, we discuss the implications of the present results in relation to past concerns expressed regarding the effectiveness of DNA barcoding.
Category 1: barcodes could not be used to distinguish some closely related species
Fourteen species belonging to 6 species complexes could not be distinguished by DNA barcodes. Members of 3 of these species complexes (Ceraclea nepha (Ross)/C. tarsipunctata (Vorhies), Triaenodes tardus Milne/T. marginatus Sibley, and Psilotreta rossi Wallace/P. rufa (Hagen)) each possess distinctive morphological characters that allow their unambiguous identification. Members of the 3 remaining species complexes have very subtle diagnostic characters, and individuals with intermediate morphology are sometimes observed. For example, members of the Cheumatopsyche campyla complex are notoriously difficult to distinguish reliably using morphology. A revision to Nearctic Cheumatopsyche (Gordon 1974) is widely followed by taxonomists, but males with an admixture of diagnostic characters are common and identification of both sexes remains difficult. A single individual is often assigned to different species if examined by more than 1 taxonomist because of the lack of consistent diagnostic characters. Indeed, such is the case for a number of specimens in the barcode tree (taxonomic comments can be found in the corresponding specimen records in BOLD projects). However, none of these conflicting identifications corresponded strictly to the barcode haplotype groups (Fig. 6). The C. campyla complex might have undergone recent speciation with incomplete lineage sorting that is reflected in both morphology and COI sequences, possibly accompanied by hybridization. In several cases, the taxa in question are represented only by limited samples collected in just a few localities. Thus, revisional work is impossible at this time.
Category 2: barcodes could be used to distinguish taxa even when species showed paraphyly
The 6 species in this category can be distinguished readily by barcodes, and evidence is increasing that taxa in this category may each include multiple cryptic species. For instance, among the most sampled taxa, Diplectrona metaqui Ross resides deeply inside D. modesta (Fig. 6). The long speculation that multiple species are included within D. modesta has now gained support from studies of larval morphology and barcode data (JLR, CJG, OSF, L. Harvey, Clemson University, J. C. Morse, Clemson University, and XZ, unpublished data). Diplectrona modesta is undoubtedly a complex of several species with nearly indistinguishable adults but often diagnosable larvae. Similarly, the Oecetis inconspicua complex contains at least 21 divergent COI clusters with O. nocturna Ross nested within the defined species boundary (Fig. 9). The probable presence of multiple species within O. inconspicua is supported by earlier observations that members of this complex show pronounced genitalic variation, even among sympatric individuals. The long-standing difficulty of species diagnosis in this group is further reflected by the fact that O. inconspicua has 8 synonyms (Morse 2010). This taxonomic uncertainty is now gaining clarity. Seven distinguishable larval types were described for this species complex (Floyd 1995). The coupling of barcode results with morphology has revealed diagnostic characters among adults of some component taxa (Zhou et al. 2010). However, extensive sampling accompanied by sequencing of type material and careful morphological examination will be required to clarify species boundaries, to apply the existing names properly, and to describe any new species that result. Although the other 4 taxa in this category were sampled less comprehensively than D. modesta and O. inconspicua, diagnostic morphological differences have been observed in at least some members of these complexes, e.g., Ceraclea flava (Banks) from Florida (Appendix 3).
Category 3: barcode sequences recovered monophyletic taxa, but genetic distances were very high
The present study has revealed that many morphologically recognized caddisfly species include lineages with deep genetic divergences. In fact, 11% of the species in our study possessed COI lineages with a maximum divergence >8%, suggesting that current taxonomy has overlooked some species. Prior studies on other insect groups, including lepidopterans, hymenopterans, and dipterans, have shown that such deep COI divergences regularly reflect unrecognized species, a conclusion based on the presence of concordant differences in morphology, ecology, and host-specificity among barcode clusters with such high divergence (Hebert et al. 2004, 2010, Smith et al. 2006a, 2007, 2008). A similar pattern of morphological differentiation among deeply divergent COI haplogroups has been reported in caddisflies and mayflies (Zhou et al. 2010).
Most species showing deep COI divergence have not undergone intensive morphological scrutiny, but the hypotheses held by many taxonomists that certain caddisfly species may indeed be complexes are supported by our barcode analyses, such as in Hydropsyche rossi Flint, Rhyacophila glaberrima Ulmer, R. mycta Ross, and Helicopsyche borealis (Hagen). Furthermore, cryptic species have regularly been encountered in recent studies of Trichoptera (Jackson and Resh 1992, 1998, Whitlock and Morse 1994, Pauls et al. 2006, 2009, 2010, Smith et al. 2006b, Zhou et al. 2007, 2009, 2010, Bálint et al. 2008, Lehrian et al. 2009). DNA barcoding is playing an increasingly important role in highlighting taxa that should be investigated in more detail and in allowing morphological comparisons to focus on groups that show genetic divergence.
Taxonomy and barcodes
Critics of DNA barcoding have based their concerns largely on theoretical objections (e.g., Rubinoff et al. 2006), or on failure to gain perfect resolution in a particular taxonomic group (Whitworth et al. 2007, Wiemers and Fiedler 2007, Alexander et al. 2009). Theoretical objections ultimately must account for actual data. Our study provides yet another example of the effectiveness of DNA barcoding as a tool for species identification, so it further weakens theoretical objections. Case studies have revealed that barcoding is not a perfect technology, but neither is any other approach for species recognition. Nevertheless, some generalized conclusions of barcoding failure are often made based on case studies in which just a few species within a particular group have been examined (e.g., Whitworth et al. 2007) and on the presumption that existing taxonomic systems are static and lack the capacity to evolve or to embrace new evidence (e.g., Alexander et al. 2009). Our results suggest that it is a great oversimplification to conclude that all cases of incomplete correspondence between current taxonomic assignments and barcode clusters reflect a flaw in barcode methods. For example, the heterogeneity in COI divergence values found within and among many caddisfly “species” in our study could be interpreted as the absence of a barcoding gap or the lack of a limit on intraspecific sequence variation in caddisflies. However, a critical examination of species with such properties reveals the strong concordance of anomalous results with the need for a better taxonomic investigation in the relevant taxa. We do not claim that barcode-based identifications are superior to morphological assignments when conflicts arise. 
However, we do argue that every exceptional case (e.g., nonmonophyletic species or taxa showing >2% intraspecific divergence, following a more conservative standard for caddisflies) should be examined in detail to understand the biological origins and implications of the divergent pattern. From the taxonomic point of view, species hypotheses based on morphology have been and always will be subject to nomenclatural revision when new evidence or analytical methods become available. We anticipate that some current barcoding failures will actually prove to be successes when taxonomists use barcode data to conduct much-needed revisions. In fact, some recent nomenclatural changes have been supported by the barcode analyses in our study, e.g., the reinstatement of Cheumatopsyche enigma Ross, Morse, and Gordon as a full species instead of a subspecies of C. harwoodi Denning and Gordon (Flint et al. 2004). On the other hand, some past synonymizations may have to be revisited, e.g., the treatment of Drusinus uniformis Betten as a subspecies of Pseudostenophylax sparsus (Banks) (Schmid 1991). These 2 taxa can be readily differentiated based on color patterns of the forewings and genitalic structures and by barcodes, as revealed in our study (Fig. 12). Indeed, many taxonomists have continued to treat them as separate species in practice despite Schmid's (1991) revision. Thus, DNA barcodes (i.e., standardized mitochondrial COI sequences) can be used for identification of caddisflies and to provide a deeper understanding of species boundaries and of biodiversity at large.
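The screening step implied above, flagging taxa whose intraspecific divergence exceeds 2% so that they can be examined in detail, can be illustrated with a minimal sketch. It is not the pipeline used in this study. It assumes pre-aligned COI sequences of equal length and uses uncorrected p-distance as a simplification (a model-corrected distance could be substituted), and the function names and input layout are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def p_distance(seq_a, seq_b):
    """Uncorrected proportion of differing sites between two aligned sequences.

    Sites where either sequence has a gap or ambiguous base are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    valid = set("ACGT")
    compared = 0
    differing = 0
    for x, y in zip(seq_a.upper(), seq_b.upper()):
        if x in valid and y in valid:
            compared += 1
            if x != y:
                differing += 1
    if compared == 0:
        raise ValueError("no comparable sites between sequences")
    return differing / compared


def flag_divergent_species(records, threshold=0.02):
    """Return {species: max intraspecific p-distance} for species above threshold.

    `records` is an iterable of (species_name, aligned_sequence) pairs
    (a hypothetical input layout). Species with a single sequence are skipped
    because no pairwise distance can be computed for them.
    """
    by_species = defaultdict(list)
    for species, seq in records:
        by_species[species].append(seq)

    flagged = {}
    for species, seqs in by_species.items():
        if len(seqs) < 2:
            continue
        max_d = max(p_distance(a, b) for a, b in combinations(seqs, 2))
        if max_d > threshold:
            flagged[species] = max_d
    return flagged
```

Species returned by `flag_divergent_species` are candidates for closer morphological and molecular examination; exceeding the threshold is not in itself evidence of a species complex or of a barcoding failure.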
Conclusion
DNA barcoding is a robust tool for identifying the Trichoptera species that occur within the GSMNP. The barcode analyses have drawn attention to various interesting issues in caddisfly biodiversity, ranging from the need for a re-evaluation of taxa in a number of species groups to the necessity of an appropriate interpretation of barcode analyses and their implications for understanding species diversity. Last, both local bioblitz surveys and global DNA-barcoding projects have played critical roles in assembling a barcode library for a local site. The strategy for barcode library construction developed in our study should serve as a model for similar efforts in other geographic regions.
Acknowledgements
The 2007 bioblitz was supported by Discover Life in America (DLIA2007-08) and by the GSMNP. Sequencing costs and informatics support were provided by grants from the Natural Sciences and Engineering Research Council of Canada and by Genome Canada to PDNH. The US Geological Survey provided funding from the Natural Resources Protection Program to CRP. Funding from Environment Canada's Water Science and Technology program enabled surveys in New Brunswick. Parks Canada supported collecting efforts across Canada under permit number NAP-2008-1636. We thank other members of the bioblitz team including Ian Stocks, Lauren Harvey, Katy Hind, John Wilson, and Anne Timm, for aid with collections and identifications. We thank the following organizations for contributing crucial specimens to this study and hosting voucher specimens: National Museum of Natural History, University of Minnesota Insect Collection, Royal Ontario Museum, Florida A&M University, Illinois Natural History Survey, Environment Canada, and Rutgers University. Numerous collaborators have helped to collect specimens, maintain vouchers, and provide taxonomic advice. We particularly thank Ralph Holzenthal, Karl Kjer, Roger Blahnik, Donald Baird, Kristie Heard, Boris Kondratieff, John Morse, Luke Myers, and Andrew Rasmussen. Sheng Li of Peking University provided help in generating the sample distribution map. Last, we thank colleagues at the Canadian Centre for DNA Barcoding, and the Biodiversity Institute of Ontario, for their assistance in the laboratory and field.