The plastome is heavily relied upon in plant systematics, particularly for the study of deeper evolutionary divergences, owing to its conserved nature and orthology. Moreover, discordance between the uniparentally inherited plastome and the biparentally inherited nuclear genome may provide insights into introgression events and their direction (Twyford and Ennos, 2012). However, the low rate of molecular evolution in the plastome can hinder the reconstruction of relationships among closely related taxa, so large amounts of data are required to resolve them (Uribe-Convers et al., 2016). To alleviate this problem, several recent studies have leveraged available high-throughput sequencing data to develop markers for variable, taxon-specific plastid (and nuclear) regions (e.g., Uribe-Convers et al., 2016).
Castilleja L. (Orobanchaceae; “the paintbrushes”) is a taxonomically challenging clade of ∼200 hemiparasitic species, many of which have a complicated history of polyploidy and/or hybridization (Heckard and Chuang, 1977). Microsatellite markers have been developed in Castilleja for population genetic studies (Fant et al., 2013), and broader, genus-wide phylogenetic reconstructions have used two chloroplast regions (trnL-F and the rps16 intron), the nuclear ribosomal spacers (ITS and ETS), and a low-copy nuclear gene (waxy) (Tank and Olmstead, 2008, 2009). However, species-level relationships lacked resolution in Tank and Olmstead (2008, 2009), limiting conclusions regarding diversification and hybridization. Here, we follow Uribe-Convers et al. (2016) to design and validate primers for the most highly variable chloroplast regions in Castilleja. Because these primers were designed for the Fluidigm Access Array microfluidic PCR system (Fluidigm, South San Francisco, California, USA), all primer combinations share the same annealing temperature specifications; this allows PCRs to be run in parallel and is ideal for high-throughput sequencing platforms (see Uribe-Convers et al., 2016 for an application of this approach). Although our initial focus was the development of Castilleja-specific primers, we evaluated their utility in silico in three other lineages of Orobanchaceae to identify a subset of “core” chloroplast primers with the potential to amplify across the clade. We then assessed the performance of these core primers with additional sampling across Orobanchaceae. Orobanchaceae is the largest parasitic clade of angiosperms and has well-documented plastome modifications, such as genome reduction and accelerated rates of molecular evolution; however, the most comprehensive phylogenetic investigation of the family to date was based on only five gene regions (McNeal et al., 2013). An expanded molecular toolkit will therefore be of great benefit for future investigations in the clade.
Table 1. Primer pair sequences designed for Castilleja (names and regions amplified), amplicon lengths, and validation results for Orobanchaceae and the outgroup taxon Paulownia. All pairs were designed for an annealing temperature of 60°C (±1°C). Combinations are listed from most to least variable, according to our prioritization scheme (see text). Boldfaced rows correspond to core Orobanchaceae primers, defined by successful amplification in two or more major clades of Orobanchaceae (see Fig. 1).
METHODS AND RESULTS
Three species of Castilleja were selected for genome skimming (C. cusickii Greenm., C. foliolosa Hook. & Arn., C. tenuis (A. Heller) T. I. Chuang & Heckard; Appendix 1), with taxa chosen to include both annual and perennial lineages (National Center for Biotechnology Information [NCBI] Sequence Read Archive [SRA] accession SRP100222). DNA extraction, purification, Illumina library construction, and subsequent cleaning of reads followed Uribe-Convers et al. (2016). Samples were sequenced as 100-bp single-end reads on an Illumina HiSeq 2000 (Illumina, San Diego, California, USA) at the University of Oregon, and cleaned reads were assembled against a reference genome (Sesamum indicum L. JN637766) using the Alignreads pipeline version 2.25 (Straub et al., 2011). In addition to these three low-coverage genomes, we also used existing data for 12 Castilleja plastomes generated by Uribe-Convers et al. (2014) using a long-PCR approach. Fifteen plastomes in total were aligned using MAFFT version 7.017b under the default settings (Katoh and Standley, 2013).
We used a custom R script (Uribe-Convers et al., 2016) to identify the most variable regions of the alignment that spanned 400–1000 bp and were flanked by conserved regions, which allowed us to prioritize candidate regions by predicted amplicon size and variability. Regions containing ambiguous bases were discarded, and regions missing from one or more taxa in the alignment (particularly from the plastomes generated with the long-PCR method) were given lower priority. We then used Primer3 (Untergasser et al., 2012) to design primer pairs for the selected regions with an annealing temperature of 60°C (±1°C) and no runs of more than three identical consecutive nucleotides, following the specifications of the Fluidigm Access Array System protocol.
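The window-based prioritization described above can be illustrated with a minimal Python sketch. This is not the R script of Uribe-Convers et al. (2016). It assumes a dict of equal-length aligned sequences and a single fixed window `length`; a real scan would loop over lengths from 400 to 1000 bp. It also assumes that the first and last `flank` columns of a window stand in for the primer sites. The names `scan`, `Window`, and `max_homopolymer` are hypothetical. The homopolymer check mirrors the "no more than three identical consecutive nucleotides" rule, which Primer3 itself enforces through its `PRIMER_MAX_POLY_X` setting.

```python
from dataclasses import dataclass

# IUPAC ambiguity codes (including N); windows containing any are discarded.
AMBIGUOUS = set("RYSWKMBDHVN")


@dataclass
class Window:
    start: int           # 0-based start of the candidate amplicon
    end: int             # exclusive end
    variable_sites: int  # variable columns between the two primer sites
    missing_taxa: int    # taxa that are all gaps across the window


def is_variable(column: str) -> bool:
    """True if a column holds more than one distinct nucleotide (gaps ignored)."""
    return len({b for b in column if b in "ACGT"}) > 1


def max_homopolymer(seq: str) -> int:
    """Length of the longest run of a single repeated base."""
    longest, run, prev = 0, 0, None
    for base in seq:
        run = run + 1 if base == prev else 1
        prev = base
        longest = max(longest, run)
    return longest


def scan(alignment: dict[str, str], length: int, flank: int) -> list[Window]:
    """Score every window of `length` columns whose first and last `flank`
    columns (hypothetical primer sites) are invariant across all taxa."""
    seqs = [s.upper() for s in alignment.values()]
    n = len(seqs[0])
    columns = ["".join(s[i] for s in seqs) for i in range(n)]
    variable = [is_variable(c) for c in columns]
    windows = []
    for start in range(n - length + 1):
        end = start + length
        sites = list(range(start, start + flank)) + list(range(end - flank, end))
        if any(variable[i] for i in sites):
            continue  # primer-binding sites must be conserved
        if any(b in AMBIGUOUS for c in columns[start:end] for b in c):
            continue  # regions with ambiguous bases are discarded
        missing = sum(1 for s in seqs if set(s[start:end]) <= {"-"})
        inner = sum(variable[start + flank:end - flank])
        windows.append(Window(start, end, inner, missing))
    # Windows present in all taxa come first, then the most variable ones.
    windows.sort(key=lambda w: (w.missing_taxa, -w.variable_sites))
    return windows
```

Sorting on `(missing_taxa, -variable_sites)` encodes "lower priority for missing taxa" as a hard tiebreak. A weighted score would be an equally reasonable choice.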
We validated each primer combination by PCR using three high-quality Castilleja DNA isolations. These were chosen to represent major lineages sensu Tank and Olmstead (2008) and were distinct from those used for genome skimming and primer design (C. lineariloba (Benth.) T. I. Chuang & Heckard, C. lemmonii A. Gray, and C. pumila Wedd.; Appendix 1). A negative control was also included. Because we followed the approach of Uribe-Convers et al. (2016), our validation conditions had to simulate the four-primer reaction of the Fluidigm microfluidic PCR using a standard thermocycler. Therefore, each target-specific primer includes a 5′ conserved sequence (CS) tag, obtained from the Fluidigm Access Array System protocol, which provides an annealing site for the primers carrying the Illumina sequencing adapters and sample-specific barcodes. PCR amplification followed Uribe-Convers et al. (2016), and amplicons were visualized on a standard agarose gel. In total, 76 primer combinations were successfully designed and validated (Table 1).
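The role of the 5′ CS tag described above can be sketched in Python. The tag strings used below are arbitrary placeholders, not the tag sequences from the Fluidigm protocol. The helpers `add_cs_tags` and `tagged_amplicon` are hypothetical, assume exact primer matches, and model only the first-stage, target-specific amplification. The point is that the tags are absent from the template but are carried into every product, so both ends of each amplicon gain a site where the adapter/barcode primers can anneal.

```python
from dataclasses import dataclass

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


@dataclass(frozen=True)
class PrimerPair:
    name: str
    forward: str  # target-specific forward primer, 5'->3'
    reverse: str  # target-specific reverse primer, 5'->3'


def add_cs_tags(pair: PrimerPair, fwd_tag: str, rev_tag: str) -> PrimerPair:
    """Prepend a conserved-sequence tag to the 5' end of each primer."""
    return PrimerPair(pair.name, fwd_tag + pair.forward, rev_tag + pair.reverse)


def tagged_amplicon(template: str, pair: PrimerPair, fwd_tag: str, rev_tag: str):
    """Predicted top-strand product of a PCR with 5'-tagged primers, or None.

    The forward primer matches the template directly; the reverse primer
    matches its reverse complement further downstream. The product is
    fwd_tag + target region + reverse complement of rev_tag.
    """
    f_start = template.find(pair.forward)
    if f_start < 0:
        return None
    r_site = revcomp(pair.reverse)
    r_start = template.find(r_site, f_start + len(pair.forward))
    if r_start < 0:
        return None
    target = template[f_start:r_start + len(r_site)]
    return fwd_tag + target + revcomp(rev_tag)


# Toy example with placeholder tags "GGGG" and "TTTT".
pair = PrimerPair("demo", "ACGTAC", "GCCTAA")
template = "TTACGTACGGGCCCTTAGGCAA"
print(tagged_amplicon(template, pair, "GGGG", "TTTT"))
```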
To test the broader utility of our Castilleja-specific primers, we searched for primer matches in two published plastome assemblies, for Lamourouxia virgata Kunth (Pedicularideae, Clade IV; Fig. 1) and Neobartsia stricta (Kunth) Uribe-Convers & Tank (Rhinantheae, Clade V) (NCBI SRA accessions SRR1023133 and SRR1023130, respectively; Uribe-Convers et al., 2014). We assembled the plastome of a third taxon, Physocalyx major Mart. (Buchnereae, Clade VI; NCBI SRA accession SRP100222), to include in the comparison. Physocalyx major was sequenced as 100-bp paired-end reads on an Illumina HiSeq 2000 at the University of Oregon. Cleaned reads for P. major were mapped with Bowtie2 (Langmead and Salzberg, 2012) to three reference plastomes from which one copy of the inverted repeat had been removed (Sesamum indicum JN637766, Neobartsia inaequalis (Benth.) Uribe-Convers & Tank KF922718, Castilleja paramensis F. González & Pabón-Mora KT959111). Consensus sequences of the resulting contigs were obtained and used as final references. The contigs were then imported into Geneious R7 version 7.0.6 (Kearse et al., 2012), and a consensus sequence was generated using the “Highest Quality” threshold, with positions covered by fewer than 5 reads (less than 5× coverage) called as “N”.
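The consensus-calling rule described above can be approximated in a few lines of Python. This sketch does not reproduce Geneious's exact “Highest Quality” algorithm. It assumes a per-position pileup of `(base, phred_quality)` tuples and uses summing base qualities per nucleotide as a stand-in for that threshold. Only the 5× masking rule is taken directly from the text. The function name `consensus` is hypothetical, and ties are broken by first appearance.

```python
from collections import Counter


def consensus(pileup, min_depth=5):
    """Quality-weighted consensus with low-coverage masking.

    pileup: one list per reference position of (base, phred_quality) tuples.
    Positions covered by fewer than `min_depth` reads are called "N";
    otherwise the base with the largest summed quality is called.
    """
    calls = []
    for observations in pileup:
        if len(observations) < min_depth:
            calls.append("N")
            continue
        weight = Counter()
        for base, quality in observations:
            weight[base] += quality
        calls.append(max(weight, key=weight.get))
    return "".join(calls)


pileup = [
    [("A", 30)] * 5,                                          # 5x, unanimous
    [("C", 20)] * 4,                                          # 4x: masked as N
    [("A", 10), ("A", 10), ("G", 40), ("A", 10), ("A", 5)],  # quality favors G
]
print(consensus(pileup))
```

At the third position a simple majority vote would call A (4 of 5 reads), but the single high-quality G read outweighs the four low-quality A reads. This is the kind of behavior a quality-based threshold is meant to capture.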
Separate BLAST databases were created for the Lamourouxia Kunth, Neobartsia Uribe-Convers & Tank, and Physocalyx Pohl assemblies (makeblastdb), and the blastn-short task was used to search each database for matches to the Castilleja chloroplast primers. Hits were retained only if both primers of a pair (1) occurred on the same contig and (2) yielded a predicted amplicon size between 350 and 1000 bp. The resulting set of primer pairs was then validated by PCR, as described above, using L. virgata, P. major, and Neobartsia filiformis (Wedd.) Uribe-Convers & Tank (Appendix 1). Primer pairs that amplified in at least two of these three taxa were chosen for a second round of PCR validation. This round used expanded taxon sampling representing all major lineages of Orobanchaceae (sensu McNeal et al., 2013; Appendix 1): Lindenbergia sp. Lehm. (Clade I), Schwalbea americana L. (Cymbarieae, Clade II), Orobanche californica Cham. & Schltdl. (Orobancheae, Clade III), Pedicularis sp. L. (Pedicularideae, Clade IV), Rhinanthus alectorolophus (Scop.) Pollich (Rhinantheae, Clade V), Harveya purpurea Harv. (Buchnereae, Clade VI), and Paulownia Siebold & Zucc. (Paulowniaceae; outgroup). As a positive control, we included CS-tagged “universal” primers for the trnL-F region (“trn-c” and “trn-f” of Taberlet et al., 1991, as used in Tank and Olmstead, 2008).
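The two in silico filters and the two-of-three PCR rule described above can be sketched in Python. The sketch parses BLAST+ default 12-column tabular output (`-outfmt 6`), in which columns 9 and 10 are `sstart` and `send` (a minus-strand hit has `sstart > send`). It assumes hypothetical primer IDs of the form `<pair>_F` / `<pair>_R`. It also adds one assumption not stated in the text: the two hits must lie on opposite strands and face each other. The amplicon size is measured from the 5′ end of one primer to the 5′ end of the other. All function names are hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class Hit(NamedTuple):
    primer: str   # query id, e.g. hypothetical "Cas_11589_F"
    contig: str   # subject (contig) id
    sstart: int
    send: int


def parse_outfmt6(lines):
    """Keep query id, subject id, sstart and send from -outfmt 6 lines."""
    hits = []
    for line in lines:
        f = line.rstrip("\n").split("\t")
        hits.append(Hit(f[0], f[1], int(f[8]), int(f[9])))
    return hits


def predicted_amplicons(hits, min_len=350, max_len=1000):
    """Map pair name -> [(contig, size)] for hit pairs passing both filters."""
    by_primer = defaultdict(list)
    for h in hits:
        by_primer[h.primer].append(h)
    results = {}
    for name in list(by_primer):
        if not name.endswith("_F"):
            continue
        pair = name[:-2]
        for f in by_primer[name]:
            for r in by_primer.get(pair + "_R", []):
                if f.contig != r.contig:
                    continue  # criterion (1): same contig
                plus, minus = (f, r) if f.sstart < f.send else (r, f)
                # Assumption: hits on opposite strands, facing each other.
                if not (plus.sstart < plus.send and minus.sstart > minus.send):
                    continue
                size = minus.sstart - plus.sstart + 1
                if min_len <= size <= max_len:  # criterion (2)
                    results.setdefault(pair, []).append((f.contig, size))
    return results


def pairs_for_expanded_sampling(pcr, min_taxa=2):
    """pcr: pair -> {taxon: amplified?}; keep pairs amplifying in >= min_taxa."""
    return sorted(p for p, by_taxon in pcr.items() if sum(by_taxon.values()) >= min_taxa)
```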
Of the 76 primer pairs designed and validated for Castilleja, we identified 36 pairs with applicability across Orobanchaceae (hereafter, core Orobanchaceae primers; boldfaced in Table 1). These pairs were chosen for amplification across a broad phylogenetic range of the clade, while allowing for some failures. For example, most primer combinations failed to amplify in the holoparasite Orobanche, likely because of the reduction and modification of the plastome in this lineage (see Bennett and Mathews, 2006). Success rates were higher for hemiparasites.
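The core-primer criterion can be expressed as a small Python sketch. The criterion is amplification in two or more major clades, following the Table 1 caption. The taxon-to-clade mapping below covers only the expanded-sampling taxa named in the Methods and deliberately omits the outgroup Paulownia. The amplification results in the example are invented for illustration and are not data from this study. `core_pairs` and `success_rate_by_taxon` are hypothetical helpers. The second one computes the kind of per-taxon success rate that separates holoparasites such as Orobanche from hemiparasites.

```python
from collections import defaultdict

# Taxon -> major clade (sensu McNeal et al., 2013); outgroup intentionally absent.
CLADE_OF = {
    "Lindenbergia": "I",
    "Schwalbea": "II",
    "Orobanche": "III",
    "Pedicularis": "IV",
    "Rhinanthus": "V",
    "Harveya": "VI",
}


def core_pairs(results, clade_of=CLADE_OF, min_clades=2):
    """Pairs that amplified in at least `min_clades` distinct major clades."""
    core = []
    for pair, by_taxon in results.items():
        clades = {clade_of[t] for t, ok in by_taxon.items() if ok and t in clade_of}
        if len(clades) >= min_clades:
            core.append(pair)
    return sorted(core)


def success_rate_by_taxon(results):
    """Fraction of primer pairs that amplified in each taxon."""
    tally = defaultdict(lambda: [0, 0])  # taxon -> [successes, attempts]
    for by_taxon in results.values():
        for taxon, ok in by_taxon.items():
            tally[taxon][0] += int(ok)
            tally[taxon][1] += 1
    return {t: s / n for t, (s, n) in tally.items()}


# Invented example results (not from the study).
results = {
    "P1": {"Lindenbergia": True, "Orobanche": False, "Pedicularis": True, "Paulownia": True},
    "P2": {"Lindenbergia": False, "Orobanche": False, "Pedicularis": True, "Paulownia": True},
}
print(core_pairs(results))
print(success_rate_by_taxon(results))
```

Because the outgroup is excluded from the mapping, a pair that amplifies only in one ingroup clade plus Paulownia (P2 above) is not counted as core.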
We report 76 primer pairs designed to target the most variable regions of the chloroplast genome in Castilleja. We further demonstrate their utility across other major clades of Orobanchaceae, particularly in hemiparasitic taxa, and present a subset of 38 core Orobanchaceae primers. Although these primer combinations target highly variable plastid regions similar to those targeted in other angiosperm-wide studies (e.g., Ebert and Peakall, 2009), few of the primers reported here directly overlap with the primers from those studies. Two exceptions are Cas_11589 F (trnG) and Cas_61880 F (psaI) (Table 1), which were also developed by Ebert and Peakall (2009). Notably, because all of our primer combinations were designed with the same annealing temperature, they can take advantage of the Fluidigm microfluidic PCR system and high-throughput sequencing platforms, but they will also be useful for traditional PCR and Sanger sequencing.
This research was supported by resources at the Institute for Bioinformatics and Evolutionary Studies (IBEST; NIH/NCRR P20RR16448 and P20RR016454) and by the following awards from the National Science Foundation: DEB-1253463 (awarded to D.C.T.), DEB-1502061 (awarded to D.C.T. for S.J.J.), and DEB-1210895 (awarded to D.C.T. for S.U.C.).