Knowing the species identity of plant roots is important for several central questions in ecology including linking patterns of above- to belowground richness (Frank et al., 2010; Kesanakurti et al., 2011; Hiiesalu et al., 2012; Pärtel et al., 2012), determining drivers of belowground community composition and diversity (Frank et al., 2010), and understanding plant-microbe interactions (Saari et al., 2005; Beiler et al., 2010; Toju et al., 2013). Ecological studies of belowground processes typically generate numerous soil samples that contain mixtures of tangled roots from different species. Roots, however, are notoriously difficult to identify with traditional methods as used for flowers and plant shoots. Consequently, a variety of approaches have been developed to allow species identification of root samples (Cahill and McNickle, 2011; Rewald et al., 2012). In this study, we used a PCR-based method to identify woody species from bulk root samples. Other researchers have used near-infrared spectroscopy and plant wax markers to estimate the proportion of root biomass per species in known mixtures (Roumet et al., 2006; Lei and Bauhus, 2010). Another approach is the use of stable carbon isotopes to discriminate between C3 and C4 plants (Polley et al., 1992; Gealy et al., 2013). Generally these approaches have been successful at identifying roots in mixtures, but molecular techniques (e.g., restriction fragment length polymorphism [RFLP], fluorescent amplified fragment length polymorphism [FAFLP], next-generation sequencing) hold the most promise because unlike morphological, anatomical, and chemical characteristics, DNA-based molecular markers do not vary depending on biotic and abiotic conditions (Rewald et al., 2012).
The majority of DNA-based species identification techniques were developed for grassland species (Moore and Field, 2005; Frank et al., 2010; Cahill and McNickle, 2011; Haling et al., 2011; Kesanakurti et al., 2011; Taggart et al., 2011; Hiiesalu et al., 2012; Pärtel et al., 2012; Wallinger et al., 2012) with limited coverage of boreal forest species, especially trees, with the exception of two studies that examined Pinus and Picea species (Govindaraju et al., 1992; Jaramillo-Correa et al., 2003). This lack of information is problematic as roots in forests make up a considerable portion of terrestrial productivity (i.e., 2–5 kg·m⁻²; Jackson et al., 1996), drive community composition by competing for soil resources (Casper and Jackson, 1997; Cahill and McNickle, 2011), and contribute to forest succession from soil bud banks (Bobowski et al., 1999). Tree roots also influence the hydrology (Nepstad et al., 1994) and nutrient cycling (Nepstad et al., 1994; Ruess et al., 2003) of forest soils through the turnover and activity of roots themselves, and also by interacting with ectomycorrhizal fungi (Simard et al., 1997; Plamboeck et al., 2007). Our understanding of how woody roots are involved in these ecological processes is currently limited by the ability to identify tree species in the field (Beiler et al., 2010).
The ideal technique for identifying species from root samples would be time and cost effective, accurate, and could be applied to samples containing mixed species (i.e., multiple species in a single sample). Initial studies identifying woody plant roots using molecular methods were limited to analyzing small sections of roots to ensure high-quality data (Jackson et al., 1999; Linder et al., 2000). Since then, RFLP analysis has been used to identify woody species, but requires time-consuming downstream reactions using restriction enzymes (Bobowski et al., 1999; Brunner et al., 2001; Jaramillo-Correa et al., 2003; Yanai et al., 2008). Using species-specific primers has proven to be an accurate method for identifying species (McNickle et al., 2008; Mommer et al., 2008); however, designing primers can be difficult, time consuming, and is dependent on the quality of sequence data available in reference libraries (Mommer et al., 2008, 2011). More recently, high-throughput sequencing techniques (e.g., 454 pyrosequencing, metabarcoding) have been developed, but these technologies remain somewhat expensive and are often limited to shorter DNA fragments (i.e., <500 bp; Valentini et al., 2009; Burgess et al., 2011; Yoccoz et al., 2012). Similar to RFLP analysis, identification using FAFLPs relies on differences in amplified fragment sizes and circumvents sequencing, but has the added benefit of avoiding downstream reactions with restriction enzymes (Ridgway et al., 2003; Rewald et al., 2012). Using FAFLP analysis of the trnL intron and the trnT-trnL and trnL-trnF intergenic spacers, Taggart et al. (2011) successfully identified 80% of 95 species in a grassland community.
Building on the work of Taggart et al. (2011), we developed methods for FAFLP analysis of seven common woody species in the boreal forest of Alberta: Abies balsamea (L.) Mill., Alnus crispa (Aiton) Pursh, Betula papyrifera Marshall, Picea glauca (Moench) Voss, Picea mariana (Mill.) Britton, Sterns & Poggenb., Pinus contorta Douglas ex Loudon, and Populus tremuloides Michx. (Natural Regions Committee, 2006). We chose to focus on ectomycorrhizal host species because studies documenting ectomycorrhizal host specificity require methods to identify root tips of trees (Kennedy et al., 2003; Tedersoo et al., 2008). Yet, many researchers still rely on tracing roots to hosts or harvesting whole seedlings for identification (Sakakibara et al., 2002; Ishida et al., 2007; Bent et al., 2011). Here we outline a reliable approach to identify roots of these species by comparing FAFLP size differences of three noncoding chloroplast DNA (cpDNA) regions: the trnT-trnL intergenic spacer, the trnL intron, and the trnL-trnF intergenic spacer. First, we generated a reference FAFLP size key for each of the species studied, then tested the accuracy of this technique at identifying species singly and in mixtures of two, four, and six species using DNA isolated from leaf tissue. Next, we used the method to distinguish among species within mixed root soil samples collected from the field. Finally, we compare the efficiency and cost of this technique to other current molecular methods.
Site characteristics—Foliar and root samples were collected from 11 sites located near Grande Prairie, Alberta, Canada (Appendix 1), within the Lower Foothills subregion of the Foothills Natural Region (Natural Regions Committee, 2006). This ecoregion is characterized by pure and mixed stands of lodgepole pine (Pinus contorta), as well as mixed stands of aspen (Populus tremuloides) and white spruce (Picea glauca). White birch (Betula papyrifera), balsam fir (Abies balsamea), and black spruce (Picea mariana) are also common (Natural Regions Committee, 2006). The focal species in this study were chosen because of their dominance in the plant community, their ubiquitous distribution across western North America, and their ability to form ectomycorrhizas.
Generation of FAFLP size key using foliar DNA
Foliar sample collection and DNA isolation from leaf tissue—Leaf tissue from 43 individuals consisting of seven common woody species (Abies balsamea, Alnus crispa, Betula papyrifera, Picea glauca, Picea mariana, Pinus contorta, and Populus tremuloides) was sampled in June 2011. Leaf samples were stored with Drierite (W. A. Hammond Drierite Co. Ltd., Xenia, Ohio, USA) at room temperature. Specimen vouchers are deposited at the University of Alberta Vascular Plant Herbarium (ALTA; Appendix 2). Total genomic DNA was extracted using a QIAGEN DNeasy Plant Mini Kit (QIAGEN, Mississauga, Ontario, Canada) following manufacturer's instructions and subsequently stored at −20°C.
PCR amplification of foliar DNA—Three cpDNA noncoding regions were examined using DNA isolated from foliar samples: the trnT-trnL intergenic spacer, the trnL intron, and the trnL-trnF intergenic spacer. Each region was amplified with universal primers: (i) the trnT-trnL intergenic spacer with primers A (5′-CATTACAAATGCGATGCTCT-3′) and B (5′-TCTACCGATTTCGCCATATC-3′), (ii) the trnL intron with primers C (5′-CGAAATCGGTAGACGCTACG-3′) and D (5′-GGGGATAGAGGGACTTGAAC-3′), and (iii) the trnL-trnF intergenic spacer with primers E (5′-GGTTCAAGTCCCTCTATCCC-3′) and F (5′-ATTTGAACTGGTGACACGAG-3′; Taberlet et al., 1991). One primer per pair was labeled with a fluorescent dye (FAM_primer A: Integrated DNA Technologies, Coralville, Iowa, USA; VIC_primer C: Applied Biosystems, Life Technologies, Carlsbad, California, USA; and NED_primer E: Applied Biosystems). PCR reactions for the trnT-trnL intergenic spacer and the trnL intron totaled 25 µL: 5.0 µL 5× e2TAK buffer (Mg2+ plus; TaKaRa Bio, Otsu, Shiga, Japan), 2.0 µL dNTP Mixture (TaKaRa Bio), 0.15 µM each of the forward and reverse primers, 1–39 ng genomic DNA, and 0.25 units of e2TAK Taq polymerase (TaKaRa Bio). PCR amplification for the trnL-trnF intergenic spacer was performed in 25-µL reactions using 0.1 µM each of the forward and reverse primers, 1–39 ng of genomic DNA, 11 µL of nuclease-free water (Life Technologies), and 12.5 µL of EconoTaq PLUS 2× Master Mix (Lucigen Corporation, Middleton, Wisconsin, USA). PCR reactions were performed using an Eppendorf Mastercycler Pro gradient thermal cycler (Model 6321; Eppendorf Canada, Mississauga, Ontario, Canada). Each region had unique thermal cycler conditions (Table 1). Products were verified on a 1.5% agarose gel and visualized using ethidium bromide.
Generation of FAFLP size key—A reference size key of FAFLP amplicon sizes was generated by analyzing single species with three to eight replicates per species using foliar DNA (Table 2; Fig. 1). PCR products were diluted 200× by combining 396 µL of distilled H2O and 2 µL of PCR product from each of the regions examined. From this dilution, 2 µL were added to 8 µL of Hi-Di Formamide and 0.3 µL of GeneScan 1200 LIZ Size Standard (Applied Biosystems). The final mixture was centrifuged for 30 s at 10,000 rpm, then denatured at 95°C for 2 min and coldsnapped to maintain single-stranded fluorescently labeled DNA. Sizes of pooled PCR amplicons were first resolved using capillary electrophoresis (ABI 3730 DNA analyzer, Applied Biosystems) and then sized with GeneMapper 4.0 software (Applied Biosystems) with GeneScan 1200 LIZ Size Standard (Applied Biosystems). Fragment sizes read by the capillary sequencer were rounded to the nearest base pair (Fig. 1).
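The construction of the size key can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `build_size_key`, the input record layout, and the raw (unrounded) fragment sizes are hypothetical and are not taken from the study; only the rule of rounding to the nearest base pair comes from the text.

```python
from collections import defaultdict


def build_size_key(records):
    """Collect rounded fragment sizes for each (species, region) pair.

    records: iterable of (species, region, size_bp) tuples, where size_bp
    is a positive fragment size as reported by the sizing software.
    """
    key = defaultdict(set)
    for species, region, size_bp in records:
        # Round half up to the nearest base pair; int(x + 0.5) avoids the
        # round-half-to-even behaviour of Python's built-in round().
        key[(species, region)].add(int(size_bp + 0.5))
    # Sorted lists make the key easy to read and compare.
    return {pair: sorted(sizes) for pair, sizes in key.items()}


# Hypothetical replicate measurements (illustrative values only).
records = [
    ("Populus tremuloides", "trnL-trnF", 391.8),
    ("Populus tremuloides", "trnL-trnF", 392.3),
    ("Abies balsamea", "trnL-trnF", 494.2),
    ("Abies balsamea", "trnL-trnF", 507.6),
]
size_key = build_size_key(records)
```

Keeping every distinct rounded size per species, rather than a single consensus value, is one simple way to retain intraspecific variation in the key.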
FAFLP analysis of single and mixed species samples—When analyzing mixed samples of DNA using PCR-based methods, false negatives (failing to detect a species known to be present) can occur both preamplification and postamplification. Prior research has indicated that false negatives can occur preamplification due to difficulties in DNA extraction, species-specific differences in amplification, and competition among primers (Kesanakurti et al., 2011; Mommer et al., 2011; Taggart et al., 2011). Taggart et al. (2011) investigated whether preamplification competition among primers affects species detection by combining template DNA prior to PCR amplification to determine the maximum number of species detectable in mixtures, and thus we did not pursue this in the current study. False negatives may also occur postamplification as fluorescently labeled amplicons are injected into capillaries. For negatively charged DNA, shorter fragments have greater mobility into the capillaries during injection and therefore may be more detectable than longer fragments. We tested for competition among different sizes of fragments by combining amplicons from different species (and thus different sizes) postamplification, using DNA isolated from leaf tissue. In the case of competition, longer fragments may not be detectable in mixtures. Samples containing one (n = 40), two (n = 15), four (n = 15), and six (n = 1) species were analyzed to determine how accurately we could detect species in mixtures. We could only include a maximum of six species in combination because the congeneric Picea species are indistinguishable using the two amplifiable regions in this study (see below).
Table 1. Thermocycler conditions for amplification of the trnT-trnL intergenic spacer, the trnL intron, and the trnL-trnF intergenic spacer (Taggart et al., 2011).
In preparation for capillary electrophoresis, PCR products of one individual per species were serially diluted 400× without first being quantified. We used this approach because DNA abundance is expected to vary among species in environmental samples (Haling et al., 2011; Mommer et al., 2011). Thus, these reactions vary in DNA concentrations across samples and are consistent with the approach described above to generate the reference FAFLP size key. From the diluted products, 2 µL were added to 8 µL of Hi-Di Formamide and 0.3 µL of GeneScan 1200 LIZ Size Standard (Applied Biosystems). The final mixture was then prepared for capillary electrophoresis as previously described. Presence or absence of a species in a mixture was determined based on a set of liberal criteria as described by Taggart et al. (2011), where the presence of a peak for at least one region signifies the presence of a species in a mixture.
FAFLP analysis of mixed root samples
Soil sample collection and DNA isolation from root tissue—Thirty-two soil cores (5 cm diameter, 20 cm deep) were collected in June 2012 from a subset of sites (see Appendix 1) used for leaf tissue collection. Foliar and root samples were not taken from the same individuals. The identity of the nearest tree species within 1 m of a soil core was taken to aid in root identification. Soil samples were transported on ice, then frozen at −20°C until processed. Soil samples were thawed, and fine roots were washed and sieved. Subsamples of 250 mg of fine root tissue were placed in a prechilled freeze-dryer (VirTis Freezermobile FM25XL; SP Scientific, Warminster, Pennsylvania, USA) at −45°C, lyophilized for 24 h, and twice ground to a fine powder using a mixer mill (Retsch Type MM 301; Retsch GmbH, Haan, Germany) for 1 min at 25.0 Hz. DNA extraction from roots can be sensitive to secondary metabolites found in lignified root material (Linder et al., 2000). For this reason, genomic DNA was isolated from ground fine root tissue using a hexadecyltrimethylammonium bromide (CTAB) protocol according to Roe et al. (2010) with one modification: pellets were resuspended in 50 µL of nuclease-free water (Life Technologies) with gentle agitation.
PCR amplification and FAFLP analysis of DNA isolated from root tissue—Despite troubleshooting efforts, amplification of the trnT-trnL intergenic spacer was unreliable for mixed root samples and was discarded from further analysis. The remaining two cpDNA noncoding regions, (i) the trnL intron and (ii) the trnL-trnF intergenic spacer, were examined using DNA isolated from mixed root samples. PCR reactions of root samples were similar to those of foliar samples (see Table 1). PCR products were verified on a 1.5% agarose gel using SYBR Safe DNA gel stain (Life Technologies). The products were prepared for capillary electrophoresis as previously described for foliar DNA, with the following modifications: the final mixture was centrifuged for 15 s at 1000 rpm (rather than 10,000 rpm), denatured at 95°C for 2 min, and then coldsnapped.
Table 2. Species included in this study and number of individuals sampled for foliar tissue per species.
Several criteria following the work of Taggart et al. (2011) were used to indicate species presence in a mixed root sample. Two identification assumptions were used: (1) liberal, in which a species is identified as present in a sample when a known fragment size or peak (Fig. 1B, C) is found for one of the two chloroplast (cpDNA) noncoding regions, and (2) conservative, in which a species is identified as present in a sample only if known peaks are present for both chloroplast (cpDNA) noncoding regions (Fig. 1B, C). We further used a constrained approach on both the liberal and conservative assumption, where the species pool was limited to only those tree species found within a 1-m diameter of a soil core.
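The liberal, conservative, and constrained criteria can be expressed as a short decision rule. The Python sketch below is an illustration, not the authors' implementation: the function name `species_present` and the data layout are hypothetical, and the trnL intron sizes in the example key are placeholders that do not come from the study (the trnL-trnF sizes are those reported in the Results).

```python
def species_present(peaks, size_key, criterion="liberal", species_pool=None):
    """Return the set of species called present in one sample.

    peaks:        dict mapping region name -> set of observed peak sizes (bp)
    size_key:     dict mapping species -> {region name -> set of known sizes}
    criterion:    "liberal" (a known peak in at least one region) or
                  "conservative" (a known peak in every region of the key)
    species_pool: optional set of species recorded near the soil core; when
                  given, only these species are considered (constrained).
    """
    if criterion not in ("liberal", "conservative"):
        raise ValueError(f"unknown criterion: {criterion}")
    found = set()
    for species, regions in size_key.items():
        if species_pool is not None and species not in species_pool:
            continue
        # One True/False per region: does any known size match a peak?
        hits = [
            bool(set(sizes) & set(peaks.get(region, ())))
            for region, sizes in regions.items()
        ]
        if criterion == "liberal":
            matched = any(hits)
        else:
            matched = bool(hits) and all(hits)
        if matched:
            found.add(species)
    return found


# trnL intron sizes below are PLACEHOLDERS, not values from the study.
size_key = {
    "Populus tremuloides": {"trnL intron": {610}, "trnL-trnF": {392}},
    "Pinus contorta": {"trnL intron": {560}, "trnL-trnF": {476}},
    "Abies balsamea": {"trnL intron": {590}, "trnL-trnF": {494, 508}},
}
# Hypothetical peaks observed in one mixed root sample.
peaks = {"trnL intron": {610, 560}, "trnL-trnF": {392}}

liberal = species_present(peaks, size_key, "liberal")
conservative = species_present(peaks, size_key, "conservative")
constrained = species_present(
    peaks, size_key, "liberal", species_pool={"Populus tremuloides"}
)
```

In this hypothetical sample the liberal rule calls both Populus tremuloides and Pinus contorta, the conservative rule calls only Populus tremuloides, and constraining the pool removes any species not recorded near the core.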
Generation of FAFLP size key using foliar DNA—We were able to amplify DNA from 40 of the 43 individuals sampled. From single species samples, PCR amplicon sizes were obtained for all seven species for the trnT-trnL intergenic spacer, the trnL intron, and the trnL-trnF intergenic spacer using foliar DNA (Fig. 1). Intraspecific variation in FAFLP size was observed in one or more regions in all seven species (Fig. 1). Specifically, it was observed in Betula papyrifera, Picea mariana, Pinus contorta, and Populus tremuloides for the trnT-trnL intergenic spacer (Fig. 1A); Picea glauca, P. mariana, and P. tremuloides for the trnL intron (Fig. 1B); and in Abies balsamea, Alnus crispa, P. contorta, P. glauca, and P. mariana for the trnL-trnF intergenic spacer (Fig. 1C). Interspecific variation was observed between P. glauca and P. mariana in both the trnT-trnL intergenic spacer and the trnL intron (Fig. 1A, B), and between A. crispa, B. papyrifera, P. glauca, and P. mariana in the trnL-trnF intergenic spacer (Fig. 1C). Additionally, interspecific variation was observed between A. crispa and P. contorta in the trnL-trnF intergenic spacer (Fig. 1C). Unique fragment sizes were found for five of the seven species studied for the trnT-trnL intergenic spacer (Fig. 1A; A. crispa, A. balsamea, B. papyrifera, P. contorta, and P. tremuloides), four species for the trnL intron (Fig. 1B; A. crispa, B. papyrifera, P. contorta, and P. tremuloides), and three species for the trnL-trnF intergenic spacer (A. balsamea, P. contorta, and P. tremuloides). Five species had unique fragment lengths using two of the three regions (trnT-trnL intergenic spacer and trnL-trnF intergenic spacer); the remaining Picea species had identical sizes for all three regions studied (Fig. 1). In summary, we were able to identify all six genera studied from two regions; however, we were unable to discern between congeneric Picea species.
Using a single region (the trnT-trnL intergenic spacer), we successfully identified species in mixtures containing two, four, and six species with a 0% false-positive rate and a 2% false-negative rate.
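For readers who want to reproduce this kind of summary, the sketch below shows one way to compute false-negative and false-positive rates from known and detected species lists. The function name `error_rates`, the species labels, and the example mixtures are hypothetical, and the definitions used (missed species over species known to be present; unexpected detections over all detections) are an assumption rather than a formula stated in the text.

```python
def error_rates(expected, detected):
    """Compute (false_negative_rate, false_positive_rate) across samples.

    expected: list of sets, the species known to be in each mixture
    detected: list of sets, the species called present in each mixture
    """
    n_expected = n_missed = n_detected = n_unexpected = 0
    for exp, det in zip(expected, detected):
        n_expected += len(exp)
        n_missed += len(exp - det)       # known species not detected
        n_detected += len(det)
        n_unexpected += len(det - exp)   # detected species not in the mix
    fn_rate = n_missed / n_expected if n_expected else 0.0
    fp_rate = n_unexpected / n_detected if n_detected else 0.0
    return fn_rate, fp_rate


# Hypothetical two- and four-species mixtures (illustrative only).
expected = [{"A", "B"}, {"A", "B", "C", "D"}]
detected = [{"A", "B"}, {"A", "B", "C"}]
fn_rate, fp_rate = error_rates(expected, detected)
```

Here one of six known species occurrences is missed and nothing unexpected is detected, so the false-negative rate is 1/6 and the false-positive rate is 0.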
FAFLP analysis of mixed root samples—Across all 32 mixed root samples, we were able to detect the presence of a tree genus within a 1-m diameter in 100% of the soil cores using only one of the two amplifiable regions (trnL intron, 0% false-negative rate; Fig. 2). This number decreased to a rate of 88% when trying to discern both Picea species. When using the presence of known peaks in both the trnL intron and trnL-trnF intergenic spacer regions (i.e., conservative criteria), our detection rate for a known species within a 1-m diameter of a soil core decreased to 25% (a 75% false-negative rate; Fig. 2). However, using the liberal criteria, we were able to identify known tree species with a 100% success rate when looking only at known peaks of the trnL-trnF intergenic spacer region (P. tremuloides = 392; P. contorta = 476; A. balsamea = 494, 508) and integrating them with the peaks of the trnL intron region.
Although we targeted tree species, several mixed root soil samples contained “extra” species (n = 5) characterized by different FAFLP sizes than those included in the size key based on foliar samples. These may have been roots of shrubs and herbaceous species for which a size profile had not been created, as the goals of this project did not depend on these additional species. Alternatively, the “extra” species detected could indicate additional intraspecific variation of the species included that was not captured in this study, possibly due to the small number of foliar samples.
We were successful in developing an accurate method to detect the presence of boreal tree species from mixed DNA samples. This general protocol has now been used in both grassland and forest systems, facilitated by universal plant primers for PCR (Taberlet et al., 1991). This method has promising applications in the field and to additional plant communities beyond the boreal forest and grassland regions.
In mixed DNA samples isolated from leaf tissue, we were able to identify five out of seven species in mixtures 98% of the time (2% false-negative rate) using the trnT-trnL intergenic spacer. In mixed DNA samples isolated from root tissue, we were able to identify four out of seven species using the trnL intron and three out of seven species using the trnL-trnF intergenic spacer and both the trnL intron and the trnL-trnF intergenic spacer regions combined. The general increase in the false-negative rate when including the trnL-trnF region in analyzing mixed roots appears to be because of: (1) increased interspecific variation within the trnL-trnF intergenic spacer, and (2) the intraspecific variation found between both Picea species as well as B. papyrifera, P. contorta, and A. crispa. However, we were able to distinguish among genera with 100% success and encountered no false positives using only the trnL intron region, which contained little intraspecific variation.
We were unable to discern among Picea species, although the genus Picea was differentiated from the other five genera. Many existing molecular techniques have issues with discriminating among closely related species (Bobowski et al., 1999; Linder et al., 2000; Brunner et al., 2001; Frank et al., 2010; Burgess et al., 2011; Kesanakurti et al., 2011; Mommer et al., 2011; Taggart et al., 2011). Including another region will increase the likelihood of discerning among congeneric species and many universal plant primers have been described to facilitate this (e.g., Shaw et al., 2005, 2007).
Our method has several important advantages over other molecular techniques. Compared to RFLP, FAFLP analysis saves time and labor by avoiding restriction digests; without accounting for time spent on DNA extraction or subsequent data analysis, each mixed sample took approximately three to five minutes of handling time. Run times for sequencing range between 0.5 h and 14 d (Glenn, 2011; Shokralla et al., 2012), whereas a capillary electrophoresis run takes 1–2 h. Our method is inexpensive (approximately US$0.50 per sample) compared to sequencing, which costs between US$0.10 and US$10 per megabase (Mb), although next-generation sequencing (NGS) platforms are quickly becoming more affordable (Glenn, 2011). Other than a slight increase in cost for additional reagents needed for analyzing mixed samples, fragment sizing does not incur a large increase in cost when more regions are added because additional amplicons can be labeled with different fluorescent dyes. For a comprehensive comparison of cost and performance among different NGS platforms, see Glenn (2011) and Shokralla et al. (2012). Using a capillary sequencer, up to four regions can be analyzed simultaneously. Moreover, the use of universal primers requires no prior knowledge of sequences to design species-specific primers. Sequence information is superfluous when the objective is to determine species identity, in which case PCR-based approaches save time, labor, and resources.
A limitation of PCR-based methods is that they often require successful amplification of two or more regions for each species. This is more relevant to studies where a greater number of species are present (e.g., Taggart et al., 2011; De Barba et al., 2014). Another requirement for PCR-based methods is a complete reference database. Similar to how sequencing studies require reference databases such as BOLD or GenBank, FAFLP analysis requires a database of FAFLP sizes for all species in a community. This is important because unknown species may have identical amplicon lengths when using FAFLP analysis to identify mixed root samples from the field. Consequently, PCR-based techniques used on mixed samples often have problems with numerous false positives (detecting a species known to be absent), especially in species-rich plant communities (Taggart et al., 2011; Rewald et al., 2012). Creating a reference database by surveying plant species aboveground greatly reduces the occurrence of false positives (Taggart et al., 2011). Regardless, there is a smaller risk of encountering false positives when identifying mixed samples from boreal forests because they are relatively species-poor compared to other plant communities. Anatomical and morphological characteristics can be used to filter out roots of shrubs and herbaceous species from samples, so a reference database of those species may not be necessary in studies focusing on tree roots.
Advances in sequencing technology have made NGS faster and more affordable. As a result, NGS is beginning to have applications across a multitude of different disciplines, such as studying plant belowground richness (Hiiesalu et al., 2012), diet analysis (De Barba et al., 2014), and root-fungal associations (Toju et al., 2013). For example, Toju et al. (2013) used 454 pyrosequencing to identify fungal species and their associated plant hosts from root samples, but this approach required two PCRs per root sample, thus adding to overall costs. One recent study used a combination of metabarcoding and NGS to identify plant and animal DNA to investigate diet components of brown bears (De Barba et al., 2014). By amplifying multiple DNA markers (the trnL intron, internal transcribed spacer region 1 [ITS1], and internal transcribed spacer region 2 [ITS2]) simultaneously, they streamlined the PCR process and increased species resolution for Poaceae, Asteraceae, Cyperaceae, and Rosaceae. The ITS1 and ITS2 regions were included because species resolution using the trnL intron region was limited for these families. Ultimately, knowledge of local plant distributions aboveground was still used to identify 28 out of 60 plant species (De Barba et al., 2014).
While NGS methods will likely become more readily used and accessible, they may not be the most appropriate choice for all studies. A major challenge of NGS is data processing: these approaches generate millions of reads per run, which places substantial demands on computing resources for storage and analysis. Moreover, the researcher is required to learn a range of data pipelines that are often unique to different sequencing platforms (Harrison and Kidner, 2011; Egan et al., 2012). These challenges aside, these studies highlight the future potential of NGS technologies as a method of identifying roots, yet future work is needed to further simplify the process of data filtering and analysis. We suggest that while sequencing is critical and appropriate for some investigations, the FAFLP method remains a useful tool for ecological studies generating hundreds of samples, where specific sequence information is not required.
The FAFLP method described here is straightforward, quick, and requires minimal resources. Furthermore, the use of universal plant primers permits the application of this method to plant communities worldwide (Taberlet et al., 1991). We acknowledge that the verification procedure used does not include two important aspects for application to the field: variable abundances of species in mixtures and species-specific biases in DNA extraction. Instead, our goal was to show proof of concept of this method. Nevertheless, this protocol was effective at identifying woody boreal forest species from roots and can easily be applied to field-based studies.
 The authors thank the Natural Sciences and Engineering Research Council of Canada (NSERC) for Discovery Grants to J.F.C. and J.C.H., a Strategic Grant to J.F.C., and an NSERC Undergraduate Student Research Award to M.J.R. We also thank the Alberta Conservation Association (ACA) for a Grant in Biodiversity to G.J.P. We are grateful to Janice Cooke, the Cahill laboratory, and the Hall laboratory for their advice during the study.