The rapidly evolving and highly variable gene maturase K (matK; Hilu and Liang, 1997) has been recommended as a locus for DNA barcoding by the Consortium for the Barcode of Life (CBOL) Plant Working Group (Hollingsworth et al., 2009). Amplification and sequencing of the matK barcoding region is difficult due to high sequence variability in the primer binding sites (Hollingsworth et al., 2011). Currently, there are three popular matK primer pairs available to amplify approximately the same region of the gene: 390F and 1326R (Sun et al., 2001; Cuénoud et al., 2002), XF and 5R (Ford et al., 2009), and 1R_KIM and 3F_KIM (Hollingsworth et al., 2009; Jeanson et al., 2011). Kress et al. (2009) used these three primer pairs to amplify DNA barcodes from 296 shrub and tree species. These primer combinations showed amplification success in 85% and sequencing success in 69% of the species, demonstrating that reliable amplification is possible across a range of plants when several primer combinations are used. However, using more than one primer pair is time-consuming and costly, and it complicates the workflow of large-scale projects (e.g., Heckenhauer et al., unpublished data).
Here, we report a set of universal primers that can be multiplexed in one PCR to amplify matK successfully in angiosperms and expedite high-throughput, rapid, automated, and cost-effective species identification. We present methods that enable efficient PCR amplification and sequencing of the matK barcode region.
METHODS AND RESULTS
Sequences of the matK gene from 178 taxa belonging to 123 genera and 41 families were obtained from GenBank (www.ncbi.nlm.nih.gov/genbank; Appendix S1) and aligned using the MAFFT plugin (Katoh and Standley, 2013) in Geneious (version 8.0.5; Kearse et al., 2012). Because primers were initially developed for a barcoding project dealing primarily with the tree flora of Southeast Asia, matK sequences of the most representative genera and families of dicots and monocots were used. The target DNA region was located between positions 383 and 1343 of the matK gene (with respect to Arabidopsis thaliana (L.) Heynh.) and includes the binding sites of the three commonly used matK primer pairs. Primers were designed at the most conserved regions, resulting in a fragment between positions 383 and 1256 (positions 414–1226, excluding the primer sequences). Forward primers are at a similar position to the 390F and XF primers, whereas the reverse primers are located downstream from the above-cited reverse primers to avoid a region of up to 11 adenine bases (e.g., Sterculia tragacantha Lindl. AY321178, positions 1257–1267), which could cause PCR and sequencing problems. To minimize primer degeneracy, aligned sequences were clustered into seven groups according to their genetic similarity in the MAFFT alignment, in which sequences are sorted according to their pairwise distances. Thus, for each cluster, primers with no more than five degenerate nucleotide positions were developed. Primers were developed manually considering primer properties (annealing temperature, 3′ and 5′ end stability) and primer secondary structures (cross dimers, dimers, hairpins) with the use of NetPrimer (PREMIER Biosoft International, Palo Alto, California, USA; www.premierbiosoft.com/netprimer/netprlaunch/netprlaunch.html).
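The limit of five degenerate positions per primer can be checked programmatically. The following Python sketch is illustrative only and is not part of the authors' workflow: it counts IUPAC ambiguity codes in a primer and computes how many unambiguous sequences the primer represents. The function names and the example sequence are hypothetical; the example is not one of the published primers.

```python
# IUPAC nucleotide codes mapped to the bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Upper limit on degenerate positions per primer, as stated in the text.
MAX_DEGENERATE_POSITIONS = 5


def degenerate_positions(primer):
    """Return the 0-based indices of ambiguity codes in the primer."""
    return [i for i, base in enumerate(primer.upper()) if len(IUPAC[base]) > 1]


def degeneracy_fold(primer):
    """Return the number of distinct unambiguous sequences the primer encodes."""
    fold = 1
    for base in primer.upper():
        fold *= len(IUPAC[base])
    return fold


def within_limit(primer, limit=MAX_DEGENERATE_POSITIONS):
    """True if the primer has no more than `limit` degenerate positions."""
    return len(degenerate_positions(primer)) <= limit


if __name__ == "__main__":
    # Hypothetical sequence for illustration; NOT a published matK primer.
    example = "ACGTRYACGTNACGT"
    print(degenerate_positions(example), degeneracy_fold(example), within_limit(example))
```

A primer with few degenerate positions can still have a high fold degeneracy if those positions use three- or four-base codes, so it may be useful to report both numbers.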
Primers were designed at the same positions in the matK gene for the forward and reverse primers so that they could be multiplexed in a single PCR for each sample. Seven forward and seven reverse primers were developed. Because using more primer combinations in a multiplex PCR reduces the probability of the most appropriate primers binding to the target region, only five forward and five reverse primers for the most frequent sequences in our alignment were multiplexed (Table 1: C_MATK_F/C_MATK_R). Primers were mixed in different ratios depending on their degree of degeneracy (Table 1). The remaining two forward and two reverse primers serve as spares for taxa that fail to amplify with the five-primer cocktails. Primers were compared against the National Center for Biotechnology Information (NCBI) GenBank nucleotide reference database using the Mega BLAST algorithm (blast.ncbi.nlm.nih.gov/Blast.cgi). Table 2 shows BLAST results with no mismatches in forward or reverse primers at the family level. Thus, in studies where the species are identified to family level, primers can be combined accordingly in a multiplex PCR. To evaluate the universality of the primers, multiplex PCR was conducted on DNA of 54 species from 48 families, representing frequently occurring trees and palms (e.g., Arecaceae, Dipterocarpaceae, Euphorbiaceae) in Southeast Asia (Table 3), along with other taxa from other parts of the world to improve the coverage of angiosperms (e.g., Leontodon [Asteraceae], Tillandsia [Bromeliaceae], Helianthemum [Cistaceae], Polystachya [Orchidaceae]). Approximately 30 mg of silica gel–dried material (bark or leaves) was transferred into a 96-well plate, and genomic DNA was extracted using the DNeasy 96 Plant Kit (QIAGEN, Hilden, Germany).
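The "no mismatches" criterion used for the BLAST comparison can be illustrated locally for a single sequence. The Python sketch below is a simplified stand-in, not a replacement for Mega BLAST and not the authors' procedure: it converts a degenerate primer into a regular expression and reports where it matches a template perfectly, searching the reverse complement for reverse primers. It assumes the template contains only unambiguous bases (A, C, G, T); all function names and sequences are hypothetical.

```python
import re

# IUPAC nucleotide codes mapped to the bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Complement table that also covers ambiguity codes (e.g., R <-> Y, K <-> M).
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq):
    """Reverse complement of a (possibly degenerate) sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def primer_pattern(primer):
    """Regex source in which each ambiguity code becomes a character class."""
    return "".join(
        base if len(IUPAC[base]) == 1 else "[" + IUPAC[base] + "]"
        for base in primer.upper()
    )


def perfect_match_sites(primer, template):
    """0-based start positions where the primer matches with no mismatches.

    Assumes `template` contains only A, C, G, and T. A lookahead is used so
    that overlapping sites are also reported.
    """
    lookahead = "(?=" + primer_pattern(primer) + ")"
    return [m.start() for m in re.finditer(lookahead, template.upper())]


def reverse_primer_sites(primer, template):
    """Perfect-match sites of a reverse primer on the given (forward) strand."""
    return perfect_match_sites(reverse_complement(primer), template)


if __name__ == "__main__":
    # Hypothetical template and primers, for illustration only.
    template = "GGACGTAGCCATTT"
    print(perfect_match_sites("ACGTRG", template))
    print(reverse_primer_sites("AAAYGG", template))
```

In a family-level screen, a primer would be kept for a family only if every representative sequence yields at least one perfect-match site; that selection rule is an assumption of this sketch.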
PCRs included 5 µL of 2× ReddyMix PCR Master Mix with 1.5 mM MgCl2 (#AB-0575/DC/LD/A; Thermo Fisher Scientific, Waltham, Massachusetts, USA), 0.1 µL of forward and reverse primer cocktail each at 50 µM (final concentration 0.5 µM), 1 µL of template DNA, and H2O up to a final volume of 10 µL. Thermocycler conditions were as follows: 95°C for 2 min; five cycles of 95°C for 25 s, 46°C for 35 s, and 70°C for 1 min; 35 cycles of 95°C for 25 s, 48°C for 35 s, and 70°C for 1 min; and a final extension at 72°C for 5 min. For samples that did not amplify using the above-mentioned protocol, the 2× Phusion Green HS II Hi-Fi PCR Master Mix with 1.5 mM MgCl2 (#F-566S, Thermo Fisher Scientific) was used with the following thermocycler conditions: 98°C for 30 s; five cycles of 98°C for 10 s, 53°C for 30 s, and 72°C for 30 s; 35 cycles of 98°C for 10 s, 55°C for 30 s, and 72°C for 30 s; and a final extension at 72°C for 5 min. PCR products were visualized on a 1.5% TAE agarose gel using ethidium bromide staining. After cleaning the PCR products with 1 µL exonuclease I and FastAP thermosensitive alkaline phosphatase mixture (7 units Exo I, 0.7 units FastAP; Thermo Fisher Scientific) at 37°C for 45 min and 85°C for 15 min, barcodes were Sanger sequenced with the BigDye Terminator Kit version 3.1 (Thermo Fisher Scientific) according to the manufacturer's instructions. Sequencing was carried out using an ABI 3730xL DNA Analyzer (Applied Biosystems, Foster City, California, USA) at the Department of Botany and Biodiversity Research, University of Vienna. Bidirectional sequences were assembled in Geneious and edited.
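The reaction setup described above involves simple arithmetic that is easy to verify when scaling to a 96-well plate. The Python sketch below recomputes the final primer concentration (C1V1 = C2V2), the water volume per 10-µL reaction, a shared master mix, and the total programmed hold time of the ReddyMix protocol. The per-reaction volumes and cycling steps are taken from the text; the 10% pipetting overage and all function names are assumptions of this sketch, and ramping time between steps is ignored.

```python
REACTION_VOLUME_UL = 10.0

# Per-reaction volumes (uL) as given in the text.
COMPONENTS_UL = {
    "2x ReddyMix PCR Master Mix": 5.0,
    "forward primer cocktail (50 uM)": 0.1,
    "reverse primer cocktail (50 uM)": 0.1,
    "template DNA": 1.0,
}


def final_concentration(stock_um, volume_ul, total_ul=REACTION_VOLUME_UL):
    """Final concentration (uM) after dilution: C2 = C1 * V1 / V2."""
    return stock_um * volume_ul / total_ul


def water_volume(total_ul=REACTION_VOLUME_UL):
    """Volume of H2O (uL) needed to bring one reaction to the final volume."""
    return round(total_ul - sum(COMPONENTS_UL.values()), 3)


def master_mix(n_reactions, overage=0.10):
    """Volumes (uL) of a shared mix of everything except template DNA.

    The 10% overage for pipetting loss is an assumption, not from the text.
    """
    scale = n_reactions * (1 + overage)
    mix = {k: v for k, v in COMPONENTS_UL.items() if k != "template DNA"}
    mix["H2O"] = water_volume()
    return {name: round(vol * scale, 2) for name, vol in mix.items()}


def hold_time_minutes(initial_s, cycle_blocks, final_s):
    """Total programmed hold time in minutes, excluding ramping.

    `cycle_blocks` is a list of (number_of_cycles, [step durations in s]).
    """
    cycling = sum(n * sum(steps) for n, steps in cycle_blocks)
    return (initial_s + cycling + final_s) / 60


# ReddyMix program from the text: 2 min; 5 + 35 cycles of 25 s, 35 s, 60 s; 5 min.
REDDYMIX_PROGRAM = (120, [(5, [25, 35, 60]), (35, [25, 35, 60])], 300)

if __name__ == "__main__":
    print(final_concentration(50, 0.1))
    print(water_volume())
    print(master_mix(96))
    print(hold_time_minutes(*REDDYMIX_PROGRAM))
```

Template DNA is left out of the shared mix because it differs per well; 9 µL of mix would be dispensed per well before adding 1 µL of template.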
Table 1. Primers developed for multiplex PCR used to amplify the matK barcoding region. The forward (C_MATK_F) and reverse (C_MATK_R) primer cocktails as well as the four additional primers are given with their proportions in the primer cocktail.
Using 2× ReddyMix PCR Master Mix, all samples could be amplified except for one sample with low-quality DNA (Fig. 1, slot 30). This sample was successfully amplified in a PCR with 2× Phusion Green HS II Hi-Fi PCR Master Mix (Fig. 1, slot 31). Overall, the newly designed degenerate primer cocktails were very effective (100%) in amplifying the target matK region, with a product of 813 bp in length in Arabidopsis thaliana. By multiplexing the primers in a single PCR, barcodes were recovered from all samples.
We developed 14 universal, partly degenerate primers suitable for DNA barcoding of angiosperms that may also be suitable for multiplexed amplicon sequencing approaches on next-generation sequencing platforms (e.g., fusion primers on the Illumina system, see Elbrecht and Leese, 2015). We confirmed the effectiveness of our multiplexed primers on 53 species from 44 different plant families. Amplification success for these multiplexed primers in the cross-transferability tests with plant families outside Southeast Asia extends their potential usefulness, especially for large-scale barcoding projects with a diverse composition of plant families. Furthermore, by improving the routine amplification of the matK barcode, the establishment of our multiplex PCR approach will reduce laboratory costs as well as potential laboratory errors.
Table 2. Recommended use of primers for different families, based on BLAST matches with no mismatches.
Table 3. Taxa used for primer testing.
This research was funded by the Austrian Science Fund (Fonds zur Förderung der wissenschaftlichen Forschung [FWF]; AP26548-B22). The authors thank Anton Russell for language editing.