Although the plastome has long been considered the workhorse of phylogenetic inference in plants, reliance on chloroplast data alone may limit the ability to identify inheritance patterns of polyploids, as well as introgression and hybridization events (Godden et al., 2012; Twyford and Ennos, 2012). The reliable estimation of an underlying species tree also depends upon the acquisition of multiple, unlinked loci—especially for recent and rapid radiations—shifting the focus toward the development of single- or low-copy nuclear gene regions for phylogenetic analyses. The process of identifying and developing these nuclear gene regions using traditional methods can be time consuming and costly, but the increasing availability of high-throughput sequencing data, as well as new bioinformatic approaches, allows for the efficient and cost-effective exploration of the nuclear genome.
Here we focus on developing a suite of putatively single-copy nuclear gene regions in Castilleja L. (Orobanchaceae; “the paintbrushes”), a clade rich with polyploid and hybrid taxa, and the product of an ongoing rapid radiation (Tank and Olmstead, 2008, 2009). Previous studies using the nuclear ribosomal ITS and ETS regions, the low-copy nuclear gene waxy, and the plastid trnL-F and rps16 intron regions hinted at cytonuclear discordance in some taxa, and most relationships among closely related taxa were unresolved (Tank and Olmstead, 2008, 2009). We recently developed primer combinations targeting the most variable regions of the plastome in Castilleja (Latvis et al., 2017a), and now present a companion set of nuclear primers with the goals of obtaining a resolved species tree for this challenging clade and aiding the detection of introgression and hybrid speciation. Primers for microsatellite markers have also been developed by Fant et al. (2013) for population-level investigations. In part, we follow the approach outlined by Blischak et al. (2014) to develop nuclear primer combinations from genome-skimming data, while following specifications for the Fluidigm Access Array microfluidic PCR system (Fluidigm, South San Francisco, California, USA) (see Latvis et al., 2017a). Thus, all primer combinations use the same annealing temperature of 60°C and may be amplified in parallel prior to high-throughput sequencing or traditional Sanger sequencing. We specifically target putatively single-copy genes from the conserved ortholog set (COSII) and pentatricopeptide repeat (PPR) domains, both of which have been highlighted for their phylogenetic utility in plants (COSII: Wu et al., 2006; PPR domain: Li et al., 2008; Yuan et al., 2009, 2010).
We also test these primers for their broader applicability across Orobanchaceae following the approach outlined in Latvis et al. (2017a), with the goal of finding a subset of nuclear gene regions that would amplify across this more inclusive clade. Previous phylogenetic studies within Orobanchaceae employed the nuclear phytochrome genes PHYA and PHYB, the nuclear ribosomal ITS region, and the plastid matK and rps2 genes. Orobanchaceae is the largest clade of parasitic angiosperms, and plastome reduction and accelerated rates of molecular evolution in retained plastid genes have been documented (see discussion in Bennett and Mathews, 2006). Additionally, phytochrome genes regulate responses to light and can be significantly modified in parasites (Bennett and Mathews, 2006). Therefore, the development of additional single-copy nuclear regions would provide a much-needed source of phylogenetic information in Orobanchaceae, and may provide a more reliable estimate of branch lengths for further studies of diversification and character evolution.
Table 1.
Nuclear primer pairs designed for Castilleja (locus and region amplified), amplicon lengths, and validation results for Orobanchaceae and outgroup taxon Paulownia. All pairs were designed for an annealing temperature of 60°C (±1°C). Boldfaced rows correspond to core Orobanchaceae primers, defined by successful amplification in two or more major clades in Orobanchaceae (see Fig. 1).
METHODS AND RESULTS
We assembled contigs from raw reads of three low-coverage Castilleja genomes, C. cusickii Greenm., C. foliolosa Hook. & Arn., and C. tenuis (A. Heller) T. I. Chuang & Heckard (Latvis et al., 2017a; National Center for Biotechnology Information [NCBI] Sequence Read Archive [SRA] accession SRP100222) using CAP3 (Huang and Madan, 1999) with the default settings. The accessions were sequenced as 100-bp single-end reads on an Illumina HiSeq 2000 (Illumina, San Diego, California, USA), yielding ~12.5 million reads per taxon (Uribe-Convers et al., 2014) and an average depth of coverage of ~0.8×. These taxa include both annual and perennial lineages of Castilleja and span the phylogenetic breadth of the clade (Tank and Olmstead, 2008, 2009). These assemblies were then culled to include only contigs of 1 kb or larger using a custom R script. The culled assemblies and script are available from the Dryad Digital Repository (http://doi.org/10.5061/dryad.52v62; Latvis et al., 2017b).
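The length-based culling step can be illustrated with a minimal sketch. This is not the custom R script deposited in Dryad; it is a Python illustration that assumes the assembly is in plain FASTA format, and the function names (`read_fasta`, `cull_contigs`) and the demonstration records are hypothetical.

```python
from typing import Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def cull_contigs(records: Iterable[Tuple[str, str]],
                 min_length: int = 1000) -> List[Tuple[str, str]]:
    """Keep only contigs of at least `min_length` bases (1 kb by default)."""
    return [(name, seq) for name, seq in records if len(seq) >= min_length]


# Hypothetical two-contig assembly: one 1200-bp contig and one 4-bp contig.
demo_lines = [">contig_1", "A" * 600, "C" * 600, ">contig_2", "ACGT"]
kept = cull_contigs(read_fasta(demo_lines))
```

With a real assembly, the lines could be read from an open file handle in place of `demo_lines`.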
To search for hits among our contigs, available COS sequences were obtained from the Sol Genomics Network (https://solgenomics.net), and PPR loci were mined from the Mimulus L. genome on Phytozome (Hellsten et al., 2013; https://phytozome.jgi.doe.gov) using the 127 PPR orthologs identified in Arabidopsis Heynh. by Yuan et al. (2009) as references. Both gene sets may be found in Uribe-Convers et al. (2016) and were used to construct local BLAST databases for the search (makeblastdb). We used TBLASTX to search each Castilleja CAP3 assembly (with contigs of 1 kb or greater) against both the COS and PPR databases, requesting tab-delimited output (-outfmt 6). Output files were filtered for alignment length >200 and a maximum E-value of 1e-10, and were culled to include only unique hits.
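The filtering of the tabular BLAST output can be sketched as follows. The column order is the default for `-outfmt 6`; everything else is an assumption made for illustration. In particular, "unique hits" is interpreted here as one retained row per query–subject pair (the first one encountered), and the function name `filter_hits` and the demonstration rows are hypothetical.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
BLAST6_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def filter_hits(handle, min_length=200, max_evalue=1e-10):
    """Keep hits with alignment length > min_length and E-value <= max_evalue.

    Assumption: "unique" means one row per (query, subject) pair,
    keeping the first row that passes both thresholds.
    """
    seen = set()
    kept = []
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < len(BLAST6_COLUMNS) or row[0].startswith("#"):
            continue  # skip blank, truncated, or comment lines
        hit = dict(zip(BLAST6_COLUMNS, row))
        if int(hit["length"]) <= min_length:
            continue
        if float(hit["evalue"]) > max_evalue:
            continue
        key = (hit["qseqid"], hit["sseqid"])
        if key in seen:
            continue
        seen.add(key)
        kept.append(hit)
    return kept


# Hypothetical rows: a good hit, a duplicate pair, a short hit, a weak hit.
demo_table = "\n".join([
    "contig1\tCOS_A\t80.0\t250\t50\t0\t1\t750\t1\t750\t1e-50\t300",
    "contig1\tCOS_A\t75.0\t210\t52\t0\t800\t1430\t10\t640\t1e-20\t150",
    "contig2\tPPR_B\t90.0\t150\t15\t0\t1\t450\t1\t450\t1e-40\t200",
    "contig3\tPPR_C\t60.0\t300\t120\t0\t1\t900\t1\t900\t1e-5\t60",
])
demo_hits = filter_hits(io.StringIO(demo_table))
```

Other readings of "unique" (for example, keeping only the best-scoring subject per contig) would require a different deduplication key.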
Hits shared among C. cusickii, C. foliolosa, and C. tenuis were placed together into individual FASTA files (data available from the Dryad repository: http://doi.org/10.5061/dryad.52v62; Latvis et al., 2017b), imported into Geneious R7 version 7.0.6 (Kearse et al., 2012), and aligned with MAFFT version 7.017b under the default settings (Katoh and Standley, 2013). We designed primer pairs in Primer3 (Untergasser et al., 2012) using the same specifications for the Fluidigm Access Array system as Latvis et al. (2017a), but with a size range of 400–525 bp and an optimal size of 500 bp. We designed 10–30 primer pairs for each identified locus and prioritized them based on desired size and the presence of multiple G or C bases at the 3′ end of the primers (GC clamp). This also allowed us to design overlapping sets of primers with the potential to combine them after sequencing to produce longer contigs for downstream phylogenetic analyses. Suitable primer pairs were validated for Castilleja with PCR, following the same amplification protocol and using the same Castilleja accessions as in Latvis et al. (2017a), and were visualized on an agarose gel. We present 87 nuclear primer combinations specifically designed and validated for Castilleja (Table 1).
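Two steps in this paragraph lend themselves to a short illustration: identifying loci with hits in all three taxa, and prioritizing candidate primer pairs by size and GC clamp. The sketch below is illustrative only and does not reproduce the Primer3 or Geneious workflow. It assumes that primers are written 5′ to 3′, that the clamp is scored over the last five bases, and that closeness to the 500-bp optimum is ranked before clamp strength; the locus names, primer sequences, and function names are hypothetical.

```python
def shared_loci(hits_by_taxon):
    """Return the loci that have a hit in every taxon."""
    sets = [set(loci) for loci in hits_by_taxon.values()]
    return set.intersection(*sets) if sets else set()


def gc_clamp(primer, window=5):
    """Count G or C bases within the last `window` bases at the 3' end."""
    return sum(base in "GC" for base in primer.upper()[-window:])


def rank_pairs(pairs, optimal_size=500):
    """Order pairs by closeness to the optimal amplicon size,
    then by the combined GC clamp of both primers (stronger first)."""
    return sorted(
        pairs,
        key=lambda p: (
            abs(p["size"] - optimal_size),
            -(gc_clamp(p["fwd"]) + gc_clamp(p["rev"])),
        ),
    )


# Hypothetical loci recovered per taxon.
demo_taxa = {
    "C. cusickii": {"COS_1", "PPR_7", "COS_9"},
    "C. foliolosa": {"COS_1", "PPR_7"},
    "C. tenuis": {"PPR_7", "COS_1", "COS_3"},
}
demo_shared = shared_loci(demo_taxa)

# Hypothetical candidate pairs for one locus (toy sequences).
demo_pairs = [
    {"name": "a", "size": 430, "fwd": "AAAGC", "rev": "AAAGG"},
    {"name": "b", "size": 500, "fwd": "AAAAT", "rev": "AAATA"},
    {"name": "c", "size": 500, "fwd": "AAAGC", "rev": "AACGC"},
]
demo_ranked = [p["name"] for p in rank_pairs(demo_pairs)]
```

The relative weighting of size and clamp is a design choice; a study that values clamp strength over amplicon length would simply swap the two elements of the sort key.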
To investigate whether any of these primer combinations would amplify successfully across Orobanchaceae, we searched our selected Castilleja primers against an assembled low-coverage genome for Lamourouxia multifida Kunth using BLAST (Altschul et al., 1990). Lamourouxia multifida was sequenced on an Illumina HiSeq 2000 at the University of Oregon as 100-bp paired-end reads, and contigs were assembled using SPAdes (Bankevich et al., 2012) under the default settings. BLAST search parameters, assessment of suitable hits, and subsequent PCR validation with Lamourouxia virgata Kunth, Physocalyx major Mart., and Neobartsia filiformis (Wedd.) Uribe-Convers & Tank are described in Latvis et al. (2017a). Primer combinations with successful amplification in Lamourouxia Kunth and at least one other taxon were selected for further PCR testing with other major lineages in Orobanchaceae (sensu McNeal et al., 2013; Fig. 1). This second round of PCR validation follows Latvis et al. (2017a), except that two of the accessions used for testing were changed: we used Neobartsia peruviana (Walp.) Uribe-Convers & Tank instead of N. filiformis, and Paulownia fortunei (Seem.) Hemsl. instead of P. elongata Siebold & Zucc. (Appendix 1). As in Latvis et al. (2017a), we also included a negative control and conserved sequence-tagged “universal” primers for the trnL-F region as a positive control for all primer pairs. Of the 87 nuclear primer combinations specifically designed for Castilleja, we identified 27 with broader applicability in Orobanchaceae, chosen if they successfully amplified in Pedicularideae (Clade IV; including Castilleja, Lamourouxia, and Pedicularis L.) and at least one of the other major clades highlighted in Fig. 1. Validation results are presented in Table 1 with these “core Orobanchaceae” combinations boldfaced.
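The criterion for designating a "core Orobanchaceae" primer combination can be expressed as a simple rule over the PCR validation results. The sketch below assumes that results are recorded as the set of major clades in which each combination amplified; the primer names and all clade labels other than Clade IV are placeholders, not values from Table 1.

```python
def is_core(clades_amplified, focal_clade="IV"):
    """True if a primer pair amplified in the focal clade
    (Pedicularideae, Clade IV) and in at least one other major clade."""
    clades = set(clades_amplified)
    return focal_clade in clades and len(clades - {focal_clade}) >= 1


def core_primers(results, focal_clade="IV"):
    """Return the sorted names of primer pairs that meet the core criterion."""
    return sorted(
        name for name, clades in results.items()
        if is_core(clades, focal_clade)
    )


# Hypothetical validation results: primer name -> clades with amplification.
demo_results = {
    "primer_01": {"IV", "placeholder_A"},
    "primer_02": {"IV"},
    "primer_03": {"placeholder_A", "placeholder_B"},
    "primer_04": {"IV", "placeholder_A", "placeholder_B"},
}
demo_core = core_primers(demo_results)
```

Recording amplification per clade rather than per accession keeps the rule independent of how many accessions were tested within each clade.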
CONCLUSIONS
We present 87 nuclear primer pairs specifically designed for Castilleja that target COSII and PPR loci. Although we target the same putative single-copy nuclear domains as previous studies (Wu et al., 2006; Li et al., 2008; Yuan et al., 2010; Blischak et al., 2014; Uribe-Convers et al., 2016), we developed primers for different loci and present unique primer combinations in this study. As with our chloroplast primers (Latvis et al., 2017a), all combinations were designed with the Fluidigm microfluidic PCR system in mind, allowing for parallelization of amplification for downstream high-throughput sequencing platforms. Of these, we identify a set of 27 primer combinations with broader utility across Orobanchaceae. The development of primers for putatively single-copy nuclear loci will greatly enhance efforts to understand evolutionary history at multiple taxonomic scales, both for Castilleja and across Orobanchaceae.
ACKNOWLEDGMENTS
This research was supported by resources at the Institute for Bioinformatics and Evolutionary Studies (IBEST; NIH/NCRR P20RR16448 and P20RR016454) and by the following awards from the National Science Foundation: DEB-1253463 (awarded to D.C.T.), DEB-1502061 (awarded to D.C.T. for S.J.J.), and DEB-1455399 (support for P.D.B.).