Resolving phylogenetic relationships among closely related angiosperm species is frequently hindered due to limited variation in currently available markers (Li et al., 2008; Zimmer and Wen, 2012). This challenge is no less problematic in the myrrh genus, Commiphora Jacq. (Burseraceae), where complete, species-level resolution has not been achieved despite the use of multiple gene regions (Weeks and Simpson, 2007). We describe the development and evaluation of four novel, exon-primed intron-crossing (EPIC) markers for Burseraceae (Sapindales) using a repository of putative, intron-flanking nuclear unigenes from 43 plant taxa and two complete reference genomes (IntrEST; Ilut and Doyle, 2012). Markers were evaluated for their phylogenetic utility at the species level using a recently radiated lineage of Commiphora and a generic-level sampling in Burseraceae. Sequence variation from these novel markers was compared to existing nuclear markers and shows promise for resolving relationships at both shallow and deeper phylogenetic scales.
METHODS AND RESULTS
Marker development—Development of EPIC markers for Burseraceae involved unigene data sets of Citrus clementina hort. ex Tanaka and C. sinensis (L.) Osbeck (Rutaceae; Sapindales) and two reference genomes available in IntrEST, Oryza sativa L. and Arabidopsis thaliana (L.) Heynh. We developed 12 primer pairs for putative introns from each of four predicted amplicon size categories (200-bp increments between 400–1200 bp), resulting in 48 total primer pairs. For each size category, six primer pairs were developed from a percent-identity criterion of either 80–89.9% or 90–100% between the unigene and the corresponding reference. We predicted that the lower percent-identity criterion (80–89.9%) might yield more informative variation among closely related species. Half of the primer pairs were generated using C. clementina and the other half from C. sinensis unigenes. Primer sequences were a consensus between unigene and the corresponding reference genome. Primers were preferentially designed using A. thaliana. Primers were designed between 18–30 bp, within 50 bp of putative intron splice regions in the reference genome, having a melting temperature (Tm) between 51–74°C, without predicted dimers, and a 35–60% G-C content. Primer characteristics were evaluated using Oligo-Evaluator (Sigma-Aldrich, St. Louis, Missouri, USA). Exceptions for Tm and %GC were made for 18 primers where it was not possible to meet all necessary criteria (Appendix 1). Each primer pair was tested by its ability to amplify a single PCR product from two species of Madagascan Commiphora (C. lamii H. Perrier and C. ankaranensis (J.-F. Leroy) Cheek & Rakot.) and a positive control (A. thaliana) and to sequence cleanly.
Taxonomic sampling and molecular methods—Markers that passed all of the above criteria were evaluated using 14 species of Burseraceae (Appendix 2), including eight Commiphora ingroup species and six outgroup species from closely (Bursera Jacq. ex L., Aucoumea Pierre) and distantly (Boswellia Roxb. ex Colebr., Protium Burm. f., and Beiselia Forman) related genera, respectively (Weeks et al., 2005; Thulin et al., 2008). All ingroup taxa are Madagascan, and seven correspond to one of two species-rich clades in Madagascar. We sampled densely from one clade to test phylogenetic utility at shallow scales. Whole genomic DNA was extracted from specimens using the FastPrep FastDNA Spin Kit (Bio101 Systems, La Jolla, California, USA). All markers were amplified in 25-µL PCR reactions including: 0.5 µL forward and reverse primers (5 µM), 0.5 µL spermidine (4 mM), 2 µL total DNA, and 5 µL GoTaq Green Master Mix (Promega Corporation, Madison, Wisconsin, USA). A ramp-up PCR thermocycler protocol followed a 4-min presoak at 94°C with 35 cycles of 30 s at 94°C (denaturation), 1 min at 48–56°C (annealing), and 50 s at 72°C (extension), followed by a 4-min postsoak at 72°C. PCR products were purified prior to sequencing reactions using ExoSAP (USB Corporation, Cleveland, Ohio, USA) and sequenced by Macrogen (Rockville, Maryland, USA) using an ABI 3730XL Analyzer with BigDye Terminator version 3.1 (Applied Biosystems, Foster City, California, USA). Sequencing reactions (10 µL) for both directions included 40 ng/µL template. Products were assembled and edited using Sequencher 4.7 (Gene Codes Corporation, Ann Arbor, Michigan, USA).
Phylogenetic analyses—Multiple sequence alignment (MSA) was performed using MUSCLE version 3.7 (Edgar, 2004). Gap regions in the MSA were treated as missing data. Markers were evaluated using maximum parsimony (MP) and Bayesian inference (BI). MP analyses were conducted using PAUP* 4.0b 10 (Swofford, 2002) with a two-step protocol modified from Plunkett et al. (2005). Branch support for internal nodes was inferred by bootstrapping 1000 replicates in PAUP*. BI analyses were performed using MrBayes version 3.1.2 (Ronquist and Huelsenbeck, 2003). Two runs were performed for each data set using the best-fitting model as determined by jModelTest (Posada, 2008) consisting of four chains each for 10 million generations sampled every 1000 generations; 10% sampling was discarded as burn-in for each run. MSA and BI analyses were performed in the CIPRES Science Gateway (Miller et al., 2010).
Marker evaluation—Fifteen of the 48 EPIC primer pairs (31%) amplified at least one species, and four pairs (8%; 10F-10R, 16F-16R, 39F-39R, and 43F-43R) produced amplicons that sequenced cleanly for multiple taxa. Provisional marker names are provided based on gene ontology categories from reference taxa (Table 1). When searched in BLAST, sequences of putative intron regions for RPT6A, BXL2, and Rab6 (Appendix 2) matched gene ontology categories predicted for the Arabidopsis and Oryza references in IntrEST. Sequence products for mtATP Synthase D (Appendix 2) did not BLAST to predicted gene ontology categories. Sequences produced by this study have been deposited into GenBank (Appendix 2). Sequence alignment files are deposited in the Dryad Digital Repository ( http://doi.org/10.5061/dryad.382p0; Gostel and Weeks, 2014). Phylogenetic statistics of new EPIC markers are presented in comparison (Table 1) with those from nuclear markers developed for previous phylogenetic studies of Burseraceae (ETS: Weeks and Simpson, 2007; ITS: Gostel and Weeks, unpublished). Phylogenetic analysis of EPIC markers developed in this study recovered well-resolved phylogenies consistent with those from previous studies (Fig. 1). The concatenated set of all four EPIC markers resulted in improved phylogenetic resolution compared to previously developed markers (Fig. 1).
Critical assessment of primer design criteria—Each of the 15 primer pairs that amplified at least one species spanned the range of melting temperatures (51–74°C), differed from their pair by less than 10°C in Tm, and were developed from both Citrus unigene data sets and both reference genomes. More than half (9/15) of these markers were designed using 80–89.9% identity criteria, yet only two (16F-16R and 43F-43R) sequenced cleanly for multiple taxa. Two of the six primer pairs developed from the 90–100% identity criterion (10F-10R and 39F-39R) sequenced cleanly for multiple taxa and yielded the most informative variation among Commiphora species. These results do not support predictions that lower percent identity would provide better shallow-scale phylogenetic resolution, which suggests mutation rates between exon and intron regions are independent.
CONCLUSIONS
The EPIC markers developed in this study may also be useful for phylogenetic reconstruction in other angiosperm taxa. Most primer pairs amplified A. thaliana (Brassicales), and they may work in other rosid or eudicot taxa. Of the four markers, RPT6A is most promising for further evaluation. This ca. 400-bp region sequenced cleanly for all Burseraceae taxa and yielded a percentage of phylogenetically informative characters on par with ITS. Our study demonstrates how genomic resources from model organisms can be leveraged to advance the phylogenetic systematics of nonmodel organisms.
Table 1.
Marker names, primer sequences, and phylogenetic statistics for the novel nuclear EPIC markers and the benchmark nuclear ribosomal DNA ETS and ITS regions.
LITERATURE CITED
Notes
[1] This material is based in part upon work supported by a dissertation research fellowship to M.R.G. from the Department of Environmental Science and Policy at George Mason University and by the National Science Foundation under grant number 0919179 to A.W. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation. Publication of this article was funded in part by the George Mason University Libraries Open Access Publishing Fund.