The plastome is heavily relied upon in plant systematics, owing to its conserved nature and orthology, particularly for the study of deeper evolutionary divergences. Moreover, discordance between the uniparentally inherited plastome and the biparentally inherited nuclear genome may provide insights into introgression events and their direction (Twyford and Ennos, 2012). However, the low rate of molecular evolution in the plastome can become a hindrance when reconstructing relationships between closely related taxa, requiring large amounts of data to resolve these relationships (Uribe-Convers et al., 2016). In an attempt to alleviate this problem, several recent studies have leveraged available high-throughput sequencing data for the development of variable taxon-specific plastid (and nuclear) regions (e.g., Uribe-Convers et al., 2016).
Castilleja L. (Orobanchaceae; “the paintbrushes”) is a taxonomically challenging clade that includes ∼200 hemiparasitic species, many of which have a complicated history of polyploidy and/or hybridization (Heckard and Chuang, 1977). Microsatellite markers have been developed in Castilleja for population genetic studies (Fant et al., 2013), and broader, genus-wide phylogenetic reconstructions within Castilleja used two chloroplast regions (trnL-F and the rps16 intron), nuclear ribosomal spacers (ITS and ETS), and a low-copy nuclear gene (waxy) (Tank and Olmstead, 2008, 2009). However, species-level relationships lacked resolution in Tank and Olmstead (2008, 2009), limiting conclusions regarding diversification and hybridization. Here, we follow Uribe-Convers et al. (2016) for primer design and validation of the most highly variable chloroplast regions in Castilleja. Because these primers were designed for the Fluidigm Access Array microfluidic PCR system (Fluidigm, South San Francisco, California, USA), annealing temperature specifications are consistent across all primer combinations; this allows for parallelization of PCR and is ideal for high-throughput sequencing platforms (see Uribe-Convers et al., 2016 for application of this approach). Although our initial focus was the development of Castilleja-specific primers, we evaluated their utility in silico in three other lineages of Orobanchaceae to obtain a subset of “core” chloroplast primers with the potential to amplify across the clade. We then assessed the performance of this set of core primers using additional sampling across Orobanchaceae. Orobanchaceae represents the largest parasitic clade of angiosperms and has well-documented modifications to the plastome, such as reduction and accelerated rates of molecular evolution; however, the most comprehensive phylogenetic investigation to date was based on only five gene regions (McNeal et al., 2013). Thus, an expanded molecular toolkit will be of great benefit for future investigations in the clade.
Table 1. All primer pair sequences designed for Castilleja (names and regions amplified), amplicon lengths, and validation results for Orobanchaceae and the outgroup taxon Paulownia. All pairs were designed for an annealing temperature of 60°C (±1°C). Combinations are listed from most variable to least variable, according to our prioritization scheme (see text). Boldfaced rows correspond to core Orobanchaceae primers, defined by successful amplification in two or more major clades in Orobanchaceae (see Fig. 1).
METHODS AND RESULTS
Three species of Castilleja were selected for genome skimming (C. cusickii Greenm., C. foliolosa Hook. & Arn., C. tenuis (A. Heller) T. I. Chuang & Heckard; Appendix 1), with taxa chosen to include both annual and perennial lineages (National Center for Biotechnology Information [NCBI] Sequence Read Archive [SRA] accession SRP100222). DNA extraction, purification, Illumina library construction, and subsequent cleaning of reads followed Uribe-Convers et al. (2016). Samples were sequenced as 100-bp single-end reads on an Illumina HiSeq 2000 (Illumina, San Diego, California, USA) at the University of Oregon, and cleaned reads were assembled against a reference genome (Sesamum indicum L. JN637766) using the Alignreads pipeline version 2.25 (Straub et al., 2011). In addition to these three low-coverage genomes, we also used existing data for 12 Castilleja plastomes generated by Uribe-Convers et al. (2014) using a long-PCR approach. Fifteen plastomes in total were aligned using MAFFT version 7.017b under the default settings (Katoh and Standley, 2013).
We used a custom R script (Uribe-Convers et al., 2016) to identify the most variable regions of the alignment spanning 400–1000 bp that were flanked by conserved regions, enabling prioritization based on predicted amplicon size and variability. Regions containing ambiguous bases were discarded, and those missing from one or more taxa in the alignment, particularly in the plastomes generated through the long-PCR method, were given lower priority. We used Primer3 (Untergasser et al., 2012) to design primer pairs for the selected regions with an annealing temperature of 60°C (±1°C) and no more than three consecutive nucleotides of the same base, following the specifications of the Fluidigm Access Array System protocol.
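The region-selection logic described above can be illustrated with a minimal Python sketch. This is not the custom R script of Uribe-Convers et al. (2016); the function names (`is_conserved`, `rank_windows`, `longest_run`), the fixed window size, and the definition of a conserved flank as a run of fully invariant columns are all assumptions made for illustration.

```python
from itertools import groupby
from typing import List, Tuple


def is_conserved(column: str) -> bool:
    """A column is conserved if every sequence has the same unambiguous base."""
    return len(set(column)) == 1 and column[0] in "ACGT"


def rank_windows(alignment: List[str], window: int, flank: int) -> List[Tuple[int, int]]:
    """Return (start, non_conserved_columns) per window, most variable first.

    Assumes all aligned sequences have equal length (hypothetical helper).
    """
    seqs = [s.upper() for s in alignment]
    length = len(seqs[0])
    conserved = [is_conserved("".join(s[i] for s in seqs)) for i in range(length)]
    ranked = []
    for start in range(flank, length - window - flank + 1):
        end = start + window
        # Discard windows containing anything other than A, C, G, T, or a gap.
        if any(set(s[start:end]) - set("ACGT-") for s in seqs):
            continue
        # Both flanks must be fully conserved to serve as candidate priming sites.
        if not (all(conserved[start - flank:start]) and all(conserved[end:end + flank])):
            continue
        ranked.append((start, window - sum(conserved[start:end])))
    return sorted(ranked, key=lambda hit: hit[1], reverse=True)


def longest_run(primer: str) -> int:
    """Length of the longest stretch of one repeated base in a primer."""
    return max(len(list(group)) for _, group in groupby(primer.upper()))
```

Under these assumptions, a candidate primer would be rejected when `longest_run(primer) > 3`, mirroring the constraint of no more than three consecutive identical bases; melting temperature is left to Primer3 and is not modeled here.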
We validated each primer combination using PCR with a negative control and three high-quality Castilleja DNA isolations chosen to represent major lineages sensu Tank and Olmstead (2008) (C. lineariloba (Benth.) T. I. Chuang & Heckard, C. lemmonii A. Gray, and C. pumila Wedd.; Appendix 1); these species differed from those selected for genome skimming and primer design. Because we followed the approach of Uribe-Convers et al. (2016), it was necessary for our validation conditions to simulate the four-primer reaction of the Fluidigm microfluidic PCR using a standard thermocycler. Therefore, our target-specific primers include a 5′ conserved sequence (CS) tag, obtained from the Fluidigm Access Array System protocol, which provides an annealing site for Illumina sequencing adapters and sample-specific barcodes. PCR amplification followed Uribe-Convers et al. (2016), and amplicons were visualized on a standard agarose gel. In total, 76 primer combinations were successfully designed and validated (Table 1).
To test the broader utility of our Castilleja-specific primers, we searched for matches in two published plastome assemblies for Lamourouxia virgata Kunth (Pedicularideae, Clade IV; Fig. 1) and Neobartsia stricta (Kunth) Uribe-Convers & Tank (Rhinantheae, Clade V) (NCBI SRA accessions SRR1023133 and SRR1023130, respectively; Uribe-Convers et al., 2014). We assembled the plastome for a third taxon, Physocalyx major Mart. (Buchnereae, Clade VI; NCBI SRA accession SRP100222), to include in our comparison. Physocalyx major was sequenced on an Illumina HiSeq 2000 at the University of Oregon as 100-bp paired-end reads. Cleaned reads for P. major were mapped to three reference plastomes with one copy of the inverted repeat region removed (Sesamum indicum JN637766, Neobartsia inaequalis (Benth.) Uribe-Convers & Tank KF922718, Castilleja paramensis F. González & Pabón-Mora KT959111) using Bowtie2 (Langmead and Salzberg, 2012). Consensus sequences of the resultant contigs were obtained and used as final references. Contigs were then imported into Geneious R7 version 7.0.6 (Kearse et al., 2012), and a consensus sequence was obtained by calling regions with less than 5× coverage as “N” and using the “Highest Quality” as a threshold.
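The coverage-masking step can be sketched as follows. This is only an illustration under stated assumptions: the input is a hypothetical per-position list of observed bases, and a simple majority call stands in for the quality-based “Highest Quality” option in Geneious, which is not reproduced here.

```python
from collections import Counter
from typing import List


def consensus_with_mask(pileup: List[str], min_depth: int = 5) -> str:
    """Call a consensus base per position, masking low-coverage sites as N.

    pileup[i] is the string of read bases observed at reference position i
    (a hypothetical input format chosen for this sketch).
    """
    consensus = []
    for bases in pileup:
        called = [b for b in bases.upper() if b in "ACGT"]
        if len(called) < min_depth:
            # Fewer than min_depth reads: report the site as unknown.
            consensus.append("N")
        else:
            # Majority base; a stand-in for a quality-weighted call.
            consensus.append(Counter(called).most_common(1)[0][0])
    return "".join(consensus)
```

With the default threshold, a position covered by four reads is reported as “N”, whereas one covered by five or more reads receives the most frequent base.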
Separate BLAST databases were created for the Lamourouxia Kunth, Neobartsia Uribe-Convers & Tank, and Physocalyx Pohl assemblies (makeblastdb), and blastn with the blastn-short task was used to search for hits matching the list of Castilleja chloroplast primers. Hits were considered further if both primers of a pair (1) occurred on the same contig and (2) had a predicted amplicon size of 350–1000 bp. Once we obtained a set of primer hits for the three taxa, they were validated with PCR using L. virgata, P. major, and Neobartsia filiformis (Wedd.) Uribe-Convers & Tank (Appendix 1), as described above. Primer pairs that amplified in at least two of these three taxa were chosen for another round of PCR validation with expanded taxon sampling that represented all major lineages of Orobanchaceae (sensu McNeal et al., 2013; Appendix 1): Lindenbergia sp. Lehm. (Clade I), Schwalbea americana L. (Cymbarieae, Clade II), Orobanche californica Cham. & Schltdl. (Orobancheae, Clade III), Pedicularis sp. L. (Pedicularideae, Clade IV), Rhinanthus alectorolophus (Scop.) Pollich (Rhinantheae, Clade V), Harveya purpurea Harv. (Buchnereae, Clade VI), and Paulownia Siebold & Zucc. (Paulowniaceae; outgroup). As a positive control, we included CS-tagged “universal” primers for the trnL-F region (“trn-c” and “trn-f” of Taberlet et al., 1991, in Tank and Olmstead, 2008).
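The two filtering criteria used in the in silico screen (both primers on the same contig, and a predicted amplicon within a size range) can be sketched in Python. Exact string matching is used here as a simplified stand-in for a BLAST search, which also tolerates mismatches; the function names and the tuple layout of the results are assumptions for illustration.

```python
from typing import Dict, List, Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def predicted_amplicons(
    contigs: Dict[str, str],
    fwd: str,
    rev: str,
    min_len: int = 350,
    max_len: int = 1000,
) -> List[Tuple[str, str, int, int]]:
    """Return (contig, strand, start, size) for every in-range primer pairing.

    Both primers must match exactly on the same contig; each contig is
    searched in both orientations.
    """
    fwd = fwd.upper()
    rev_site = revcomp(rev)  # the reverse primer anneals to the opposite strand
    hits = []
    for name, seq in contigs.items():
        for strand, s in (("+", seq.upper()), ("-", revcomp(seq))):
            f = s.find(fwd)
            while f != -1:
                r = s.find(rev_site, f + len(fwd))
                while r != -1:
                    size = r + len(rev_site) - f
                    if min_len <= size <= max_len:
                        hits.append((name, strand, f, size))
                    r = s.find(rev_site, r + 1)
                f = s.find(fwd, f + 1)
    return hits
```

A primer pair whose two sites fall on different contigs, or whose predicted product lies outside the size range, returns no hits and would not be carried forward to PCR validation in this sketch.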
Out of the 76 primer pairs designed and validated for Castilleja, we identified 38 pairs with applicability across Orobanchaceae (referred to as core Orobanchaceae primers; these are boldfaced in Table 1). These were chosen based on amplification across a large phylogenetic breadth of the clade, but allowing for some failures. For example, Orobanche, a holoparasite, failed for most primer combinations, a result that is likely due to the reduction and modification of the plastome in this lineage (see Bennett and Mathews, 2006). Higher success rates were noted for hemiparasites.
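As an illustration of how a core set could be tallied from gel scores, the following Python sketch applies the criterion given in the Table 1 legend (amplification in two or more major clades). The data structure, the primer names, and the function name are hypothetical and are not taken from the validation results.

```python
from typing import Dict, List


def core_primers(results: Dict[str, Dict[str, bool]], min_clades: int = 2) -> List[str]:
    """Return primer pairs that amplified in at least min_clades clades.

    results[pair][clade] is True when a band was scored for that clade
    (hypothetical layout for this sketch).
    """
    return [
        pair
        for pair, by_clade in results.items()
        if sum(by_clade.values()) >= min_clades
    ]


# Hypothetical scores for two made-up primer pairs.
example = {
    "pair_A": {"Clade IV": True, "Clade V": True, "Clade VI": False},
    "pair_B": {"Clade IV": True, "Clade V": False, "Clade VI": False},
}
selected = core_primers(example)
```

Here `selected` contains only `pair_A`, because `pair_B` amplified in a single clade.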
CONCLUSIONS
We report 76 primer pairs designed to target the most variable regions of the chloroplast genome in Castilleja. We further demonstrate their utility across other major clades in Orobanchaceae, particularly with hemiparasitic taxa, and present a subset of 38 core Orobanchaceae primers. Although these primer combinations target highly variable plastid regions similar to those targeted in other angiosperm-wide studies (e.g., Ebert and Peakall, 2009), few of the primers reported here overlap directly with previously published primers. Two exceptions are Cas_11589 F (trnG) and Cas_61880 F (psaI) (Table 1), which were also developed by Ebert and Peakall (2009). Notably, our primer combinations were designed with the same annealing temperature to take advantage of the Fluidigm microfluidic PCR system and high-throughput sequencing platforms, but will also be useful for traditional PCR and Sanger sequencing.
ACKNOWLEDGMENTS
This research was supported by resources at the Institute for Bioinformatics and Evolutionary Studies (IBEST; NIH/NCRR P20RR16448 and P20RR016454) and by the following awards from the National Science Foundation: DEB-1253463 (awarded to D.C.T.), DEB-1502061 (awarded to D.C.T. for S.J.J.), and DEB-1210895 (awarded to D.C.T. for S.U.C.).
LITERATURE CITED
- Bennett, J. R., and S. Mathews. 2006. Phylogeny of the parasitic plant family Orobanchaceae inferred from phytochrome A. American Journal of Botany 93: 1039–1051.
- Ebert, D., and R. Peakall. 2009. A new set of universal de novo sequencing primers for extensive coverage of noncoding chloroplast DNA: New opportunities for phylogenetic studies and cpSSR discovery. Molecular Ecology Resources 9: 777–783.
- Fant, J. B., H. Wolf-Weinberg, D. C. Tank, and K. A. Skogen. 2013. Characterization of microsatellite loci in Castilleja sessiliflora and transferability to 24 Castilleja species (Orobanchaceae). Applications in Plant Sciences 1: 1200564.
- Heckard, L. H., and T.-I. Chuang. 1977. Chromosome numbers, polyploidy, and hybridization in Castilleja (Scrophulariaceae) of the Great Basin and Rocky Mountains. Brittonia 29: 159–172.
- Katoh, K., and D. M. Standley. 2013. MAFFT Multiple Sequence Alignment Software Version 7: Improvements in performance and usability. Molecular Biology and Evolution 30: 772–780.
- Kearse, M., R. Moir, A. Wilson, S. Stones-Havas, M. Cheung, S. Sturrock, S. Buxton, et al. 2012. Geneious Basic: An integrated and extendable desktop software platform for the organization and analysis of sequence data. Bioinformatics 28: 1647–1649.
- Langmead, B., and S. Salzberg. 2012. Fast gapped-read alignment with Bowtie 2. Nature Methods 9: 357–359.
- McNeal, J. R., J. R. Bennett, A. D. Wolfe, and S. Mathews. 2013. Phylogeny and origins of holoparasitism in Orobanchaceae. American Journal of Botany 100: 971–983.
- Straub, S. C. K., M. Fishbein, T. Livshultz, Z. Foster, M. Parks, K. Weitemier, R. C. Cronn, and A. Liston. 2011. Building a model: Developing genomic resources for common milkweed (Asclepias syriaca) with low-coverage genome sequencing. BMC Genomics 12: 211.
- Tank, D. C., and R. G. Olmstead. 2008. From annuals to perennials: Phylogeny of subtribe Castillejinae (Orobanchaceae). American Journal of Botany 95: 608–625.
- Tank, D. C., and R. G. Olmstead. 2009. The evolutionary origin of a second radiation of annual Castilleja (Orobanchaceae) species in South America: The role of long distance dispersal and allopolyploidy. American Journal of Botany 96: 1907–1921.
- Twyford, A. D., and R. A. Ennos. 2012. Next-generation hybridization and introgression. Heredity 108: 179–189.
- Untergasser, A., I. Cutcutache, T. Koressaar, J. Ye, B. C. Faircloth, M. Remm, and S. G. Rozen. 2012. Primer3-New capabilities and interfaces. Nucleic Acids Research 40: e115.
- Uribe-Convers, S., J. R. Duke, M. J. Moore, and D. C. Tank. 2014. A long-PCR based method for chloroplast genome enrichment and phylogenomics in angiosperms. Applications in Plant Sciences 2: 1300063.
- Uribe-Convers, S., M. L. Settles, and D. C. Tank. 2016. A phylogenomic approach based on PCR target enrichment and high throughput sequencing: Resolving the diversity within the South American species of Bartsia L. (Orobanchaceae). PLoS ONE 11: e0148203.