Simple sequence repeats (SSRs; microsatellites) have been widely used for nearly two decades to visualize intraspecific genetic variability (Avise, 1994), and SSRs have been used to infer phylogenies in some lineages (Orsini et al., 2004). Because they are highly polymorphic, SSRs are often informative for finer-scale patterns within and among populations (DeFaveri et al., 2013). The fast mutation rate that creates these high levels of polymorphism is widely recognized to increase levels of homoplasy as genetic distance increases (Estoup et al., 2002), although the use of multiple loci may mitigate the impact of homoplasy in a single locus. The genomic location (i.e., coding vs. noncoding) and nucleotide repeat pattern (mononucleotide repeats vs. two or more repeated nucleotides) of a set of SSRs are important factors that affect homoplasy and thus experimental design.
Plastid genomes provide useful markers to infer plant genetic patterns. Plastid data are especially important in Pinus L. (Pinaceae) because nuclear genetic markers have been problematic. For example, there has been slow concerted evolution in the nuclear ribosomal DNA internal transcribed spacer (Gernandt et al., 2001), greatly limiting its usefulness. Incomplete lineage sorting in Pinus low-copy nuclear loci extends deeply into the tree (Syring et al., 2005). This will likely mean that numerous low-copy nuclear loci will be required to resolve species trees in Pinus. Although plastid lineages can suffer from incomplete lineage sorting as well, their faster coalescence times may make them more useful for species-level questions. Because plastids are paternally inherited in Pinaceae (Neale and Sederoff, 1989), plastids potentially track pine pollen flow in contrast to maternal mitochondrial inheritance and biparental nuclear inheritance. Because much of the genetic variation in long-lived forest trees like Pinus is contained within rather than among populations (Petit and Hampe, 2006), our long-term goal of delimiting a species complex required a marker that could be efficiently genotyped in many individuals per population so that we could use allele frequencies rather than exemplar sampling. We investigated 15 plastid simple sequence repeat (cpSSR) loci—nine loci based on the P. thunbergii Parl. plastid genome (Wakasugi et al., 1994; Vendramin et al., 1996) and six loci designed for P. contorta Douglas (Stoehr and Newton, 2002)—that have been genotyped in numerous species and populations of pines and other members of Pinaceae (Echt et al., 1998; Walter and Epperson, 2001; Marshall et al., 2002; Richardson et al., 2002; Robledo-Arnuncio et al., 2005; Godbout et al., 2010; Feng et al., 2011; Jardón-Barbolla et al., 2011).
We clarify the genomic locations of these 15 cpSSRs in a plastome alignment of 107 pine species (Parks et al., 2012), summarize the extent of primer sequence conservation across the genus to allow selection of loci that can be used for any pine species, and investigate whether these loci are in hypervariable regions of the plastome.
To evaluate the most distant relationships for which the cpSSR loci might be useful, we investigated interspecific comparisons. Subsect. Ponderosae (sect. Trifoliae) was chosen for this test because of our ongoing study of that group. The highly variable ycf1 locus has been suggested as a useful region for species-level phylogenies (Parks et al., 2012). We compared interspecific information content of the multilocus cpSSR fragment lengths with ycf1 sequences from the same 15 individuals in subsect. Ponderosae.
A set of six nonredundant loci were chosen that can be economically and efficiently amplified in a single-tube multiplexed PCR, and we demonstrate the use of these six cpSSR loci in 911 samples. We also evaluate the impact of homoplasy on these loci using multilocus linkage disequilibrium as another criterion to evaluate their usefulness. For loci on the nonrecombining plastid genome, a finding of significant linkage disequilibrium suggests that the multilocus haplotypes are unlikely to be created by homoplasy (Angioi et al., 2009). Based on preliminary evidence in our own laboratory and on other reports, we hypothesize that two distinctive geographic regions represent divergent lineages within the P. ponderosa P. Lawson & C. Lawson species complex (Fig. 1). We hypothesize for the western region (corresponding to P. ponderosa var. ponderosa) that the Willamette Valley, Oregon, and Fort Lewis, Washington, populations (Pacific Northwest operational taxonomic unit [OTU]) are genetically distinct from the populations of western California (Benthamiana OTU) and/or inland populations (Ponderosa OTU) (Bouffier et al., 2003; Potter et al., 2013). For the eastern region (corresponding to P. ponderosa var. scopulorum Engelm. in S. Watson), we hypothesize that populations in southeastern Arizona (Sky Island OTU) are distinct from other, mostly allopatric populations of P. ponderosa var. scopulorum (Scopulorum OTU) and/or from partly sympatric P. arizonica Engelm. (Rehfeldt, 1999; Epperson et al., 2009). Patterns among these populations were observed using a method that does not require an assumption of uncorrelated alleles and allows for a priori definition of groups to emphasize among-group rather than within-group variation.
MATERIALS AND METHODS
The published nucleotide sequences for 15 cpSSR primer pairs (Table 1) were located within the aligned plastomes of 107 species of Pinus and six Pinaceae outgroups (TreeBase S12640) (Parks et al., 2012). Unique primer matches were confirmed by conducting a BLAST search for each primer sequence within the P. ponderosa var. ponderosa plastome (GenBank FJ899555). We made slight manual adjustments to improve the alignment in areas where cpSSRs were located, and then used annotations for FJ899555 to determine whether the primers, SSR regions, and flanking sequences were coding or noncoding. Using the same plastome alignment, primer conservation was determined for each taxonomic subsection (Gernandt et al., 2009). Primers were regarded as highly conserved if they had no more than one mismatched base position. Alignments are available on the Dryad Digital Repository (http://doi.org/10.5061/dryad.5nc25; Wofford et al., 2014).
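The conservation criterion above (no more than one mismatched base position) can be illustrated with a minimal Python sketch. The function names and toy sequences are hypothetical and are not part of the original analysis; the sketch assumes that each target has already been extracted from the alignment at the primer site and has the same length as the primer, and that a gap character in the target counts as a mismatch.

```python
def count_mismatches(primer: str, target: str) -> int:
    """Count positions at which an aligned target differs from the primer."""
    if len(primer) != len(target):
        raise ValueError("primer and target must have the same aligned length")
    # Assumption: any differing character, including a gap, is a mismatch.
    return sum(1 for p, t in zip(primer.upper(), target.upper()) if p != t)


def is_highly_conserved(primer: str, targets: list[str], max_mismatches: int = 1) -> bool:
    """True if every target sequence has at most max_mismatches mismatches."""
    return all(count_mismatches(primer, t) <= max_mismatches for t in targets)


if __name__ == "__main__":
    # Hypothetical primer and primer-site sequences for one subsection.
    primer = "ACGTTGCA"
    subsection_sites = ["ACGTTGCA", "ACGTTGCT", "ACGATGCA"]
    print(is_highly_conserved(primer, subsection_sites))
```

Applying such a check separately to the sequences of each taxonomic subsection would give a per-subsection conservation call for each primer.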
Table 1.
Characteristics of 15 Pinus cpSSR loci assessed in this study.
To test whether these loci were in hypervariable regions of the plastome, we measured nucleotide variation in the regions immediately surrounding each locus by extracting a 1-kb segment centered on the repeat region from the same plastome alignment. Using the script sorter.pl (Goremykin et al., 2010) on the iPlant Discovery Environment (http://www.iplantcollaborative.org), we calculated the observed variability (OV) for each base position. OV is the mean of all possible pairwise comparisons at a position, excluding gaps. Mean OV was also calculated for the full plastome alignment. For comparison, we also counted the number of unique amplicon lengths for each locus in the alignment (measuring from the outside of each primer pair) and conducted a Spearman's rank correlation test between the mean OV of the 1-kb segments and the number of unique amplicon lengths.
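As an illustration of the OV description above, the following Python sketch scores each alignment column as the proportion of pairwise comparisons among non-gap bases that differ, and then averages over columns. This is an interpretation of the verbal description only, not a port of sorter.pl; the function names, the choice of "-" as the only gap character, and the toy alignment are assumptions.

```python
from itertools import combinations


def observed_variability(column: str, gap_chars: str = "-") -> float:
    """Proportion of differing pairs among the non-gap bases in one column."""
    bases = [b.upper() for b in column if b not in gap_chars]
    pairs = list(combinations(bases, 2))
    if not pairs:
        # Fewer than two non-gap bases: no comparison is possible.
        return 0.0
    return sum(a != b for a, b in pairs) / len(pairs)


def mean_ov(alignment: list[str]) -> float:
    """Mean per-column OV for a list of equal-length aligned sequences."""
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    columns = ["".join(col) for col in zip(*alignment)]
    return sum(observed_variability(c) for c in columns) / len(columns)


if __name__ == "__main__":
    # Hypothetical three-sequence alignment.
    toy = ["AAT", "AAA", "A-A"]
    print(round(mean_ov(toy), 4))
```

For a 1-kb segment, the same function would be applied to the alignment slice centered on the repeat region.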
To evaluate interspecific information content, we selected the 14 samples that represent subsect. Ponderosae s.s. (Gernandt et al., 2009; Willyard et al., 2009) from the plastome alignment described above and P. jeffreyi Balf. (subsect. Sabinianae) to serve as the outgroup (Appendix 1). These 15 plastomes were used to compare the information content in the fragment lengths of the 15 cpSSR loci vs. the nucleotide sequences for the highly variable ycf1 region using median joining haplotype networks with star contraction preprocessing (Network version 4.6; Fluxus Engineering, Suffolk, England).
To improve the PCR multiplex, we removed loci that were monomorphic in early testing (Pt107517), had numerous failures (Pc69, Pc987, Pt1254, Pt15169, Pt36480), or amplified the same region as other cpSSR loci (PcI1A2, Pt30204; see Results). Six loci (Pc10, PcG2R1, PcL2T1, Pt100783, Pt71936, and Pt87268) amplified consistently in subsect. Ponderosae, were polymorphic, and had lengths that allowed confident four-color genotyping. For intraspecific comparisons, these six cpSSR loci were genotyped for 911 individuals from 41 populations of subsect. Ponderosae (Fig. 1; Appendix 2) using a PCR multiplex protocol that integrates fluorescent labels during PCR (Culley et al., 2008, 2013). New forward primers were purchased (Integrated DNA Technologies, Coralville, Iowa, USA) with a unique nucleotide sequence for one of four fluorescent dyes added to the 5′ end of the published primer (Table 2). The same four unique sequences (Table 2) were purchased as fluorescently labeled primers (Life Technologies, Carlsbad, California, USA). A 1-µM primer master mix was created with six forward, six reverse, and four labeled primers in a 1 : 4 : 4 (forward : reverse : labeled) volume ratio to limit forward primers as recommended by the manufacturer's protocol for the Multiplex PCR kit (part number 206143, QIAGEN, Germantown, Maryland, USA). We isolated DNA using the DNeasy Plant Mini Kit (QIAGEN, Valencia, California, USA) according to the manufacturer's protocol except that leaves dried in silica gel were homogenized in QIAGEN AP1 buffer and RNase A using the FastPrep homogenizer (MP Biomedicals, Santa Ana, California, USA) with a ceramic bead and garnet sand in FastPrep tubes, processing three times for 20 s each at 6 m/s. PCR reactions were 10 µL, using 1 µL of the 1-µM primer master mix and 1 µL of DNA eluted from the DNeasy procedure.
Thermocycler parameters were 15 min at 95°C; 35 cycles of 30 s denaturing at 94°C, 90 s annealing at 58°C, and 90 s extension at 72°C; and a final extension of 10 min at 72°C. PCR success (expecting multiple fuzzy bands because of the six-plex) was determined on 0.8% agarose gels using 2 µL of PCR product with 1 : 1000 SYBR Green loading dye (Sigma-Aldrich, St. Louis, Missouri, USA). PCR products were diluted 1 : 10 and genotyped (University of Missouri DNA Core Facility; ABI 3730xl DNA Analyzer, Life Technologies) with a GS600 LIZ (Life Technologies) size standard.
Table 2.
Fluorescently labeled primers for 15 Pinus cpSSR loci.
Linkage disequilibrium was estimated using MultiLocus (version 1.2; http://www.bio.ic.ac.uk/evolve/software/multilocus) and significance was estimated using 100 randomizations. Patterns among predefined OTUs were observed using discriminant analysis of principal components (DAPC) (Jombart et al., 2010). Two separate DAPC analyses were run, one for the western and one for the eastern region, with a priori grouping into three OTUs each (Fig. 1). DAPC and scatter plots of the first two principal components were run using adegenet (version 1.3-9.2; Jombart et al., 2010) in R (version 3.0.2; http://www.R-project.org). We used default parameters to place inertia ellipses for each OTU.
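The randomization logic behind a multilocus linkage disequilibrium test can be sketched in Python. This is not the MultiLocus implementation. It assumes that the reported statistic is the standardized index of association (often written r̄d), computed from pairwise 0/1 allele differences as (V_D − Σ var_j) / (2 Σ_{j<k} √(var_j var_k)), where V_D is the variance of the summed distances and var_j is the variance at locus j. It also assumes that alleles are shuffled among individuals independently at each locus, and that the P value is (hits + 1) / (randomizations + 1). All names and data are hypothetical.

```python
import random
from itertools import combinations
from math import sqrt


def _variance(values):
    """Population variance of a list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def rbar_d(haplotypes):
    """Standardized index of association for haploid multilocus data."""
    n_loci = len(haplotypes[0])
    pairs = list(combinations(range(len(haplotypes)), 2))
    # 0/1 distance at each locus for every pair of individuals.
    per_locus = [
        [int(haplotypes[a][j] != haplotypes[b][j]) for a, b in pairs]
        for j in range(n_loci)
    ]
    total = [sum(d) for d in zip(*per_locus)]
    locus_var = [_variance(d) for d in per_locus]
    denom = 2 * sum(sqrt(vj * vk) for vj, vk in combinations(locus_var, 2))
    if denom == 0:
        raise ValueError("need at least two polymorphic loci")
    return (_variance(total) - sum(locus_var)) / denom


def randomization_test(haplotypes, n_random=100, seed=0):
    """Return (observed statistic, one-sided P value) from locus-wise shuffling."""
    rng = random.Random(seed)
    observed = rbar_d(haplotypes)
    columns = [list(col) for col in zip(*haplotypes)]
    hits = 0
    for _ in range(n_random):
        shuffled = []
        for col in columns:
            copy = col[:]
            rng.shuffle(copy)  # breaks associations, keeps allele frequencies
            shuffled.append(copy)
        if rbar_d(list(zip(*shuffled))) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_random + 1)


if __name__ == "__main__":
    # Hypothetical two-locus fragment lengths for six individuals.
    data = [(101, 88), (101, 88), (101, 88), (103, 90), (103, 90), (103, 91)]
    print(randomization_test(data, n_random=100))
```

Because every pair of individuals is compared at every randomization, this plain-Python version is intended only for small illustrative data sets.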
RESULTS
We found single locations in the plastome for all 30 primers (Table 3; Fig. 2; http://doi.org/10.5061/dryad.5nc25; Wofford et al., 2014). Two locus pairs were redundant. Pt30204 and Pc10 had the same reverse primer sequence, and the forward primer for Pc10 was 50 bp upstream from that of Pt30204; thus, they would yield amplicons that encompass the same repeat region. Pt87268 and PcI1A2 also overlapped and included the same repeat region, despite not having matching primer sequences. Fourteen loci had both primers located in coding regions; primers for Pt71936 were within the ycf3 intron (Table 3). Fourteen loci had repeat regions located in intergenic spacers or introns. The repeat region for Pt107517 was located entirely within the rpl32 gene. Repeat motifs for the loci varied. Five were simple mononucleotide repeats, five had two adjacent segments of mononucleotide repeats, and five had complex motifs, including an 11-bp minisatellite in Pc987 and a 10-bp minisatellite in PcL2T1. We also found that indels in flanking regions contribute to the length variation in some loci. Primer conservation was high in all taxonomic subsections for 10 of the 15 primer pairs (Table 3), and opportunities exist to create nearby primers for taxonomic subsections that have diverged (data available from the Dryad Digital Repository: http://doi.org/10.5061/dryad.5nc25; Wofford et al., 2014). As expected from their design in P. thunbergii and P. contorta, primers were well conserved across subgen. Pinus except for Pt71936 in sect. Trifoliae. Subgen. Strobus had mismatches for primers of four loci: Pc987, Pt100783, Pt1254, and Pt87268.
For the 107 Pinus plastomes examined, none of the tested loci were located in hypervariable regions. The mean OV for the entire plastome was 0.0350 (0.1193 SD), while the mean OV for the 15 segments surrounding each locus was 0.0210 (0.0084 SD). For the same 107 Pinus plastomes, length variation ranged from four to 37 alleles per locus (Table 3). Pt107517 was monomorphic for the repeat region with only minor length differences in the flanking regions, as expected from its location in a coding region. Pc10, Pt30204, and Pt15169 had the greatest amount of length variation with 37, 35, and 32 unique lengths in 107 species, respectively. Spearman's rank correlation tests (rs) for the 15 loci showed no significant correlation between mean OV and number of alleles [rs(13) = 0.1323, P = 0.43], nor for only the six loci (described below) that we selected for subsect. Ponderosae [rs(4) = 0.4412, P > 0.5]. Networks based on cpSSRs and ycf1 nucleotide sequences from the same samples were different (Fig. 3).
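The rank correlation used above can be illustrated with a short, dependency-free Python sketch. It computes Spearman's rs as the Pearson correlation of average ranks, so that tied values share a mean rank, and it does not compute a P value. The function names and the example values are hypothetical and are not the data from Table 3.

```python
from math import sqrt


def average_ranks(values):
    """1-based ranks, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the block of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rs(x, y):
    """Spearman's rank correlation coefficient for two equal-length lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    if sxx == 0 or syy == 0:
        raise ValueError("rs is undefined when one variable is constant")
    return cov / sqrt(sxx * syy)


if __name__ == "__main__":
    # Hypothetical mean OV values and allele counts for five loci.
    mean_ov_values = [0.010, 0.015, 0.020, 0.025, 0.030]
    allele_counts = [5, 6, 7, 8, 7]
    print(round(spearman_rs(mean_ov_values, allele_counts), 4))
```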
Multiplex genotyping in 911 subsect. Ponderosae individuals yielded 45 total alleles, with a mean of 7.5 (2.3 SD) alleles per locus (Table 3). Pt71936 was successfully amplified despite minor primer differences. Multilocus linkage disequilibrium was significant (rd = 0.95; P < 0.01). Our DAPC analysis included 35 cpSSR alleles in 314 individuals in the western region (Fig. 1). Populations assigned to the Pacific Northwest OTU did not overlap on the scatter plot with the Benthamiana OTU or with the Ponderosa OTU (Fig. 4A). There were 37 cpSSR alleles in 597 individuals in the eastern region (Fig. 1). The inertia ellipse for the Sky Island OTU did not overlap with the ellipse for the Scopulorum OTU or with the ellipse for the P. arizonica OTU (Fig. 4B).
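Summary counts of the kind reported above (alleles per locus, with mean and SD) can be derived from a genotype table as in the following Python sketch. The locus names, fragment lengths, and function names are hypothetical. Missing calls are represented by None, and the use of the sample standard deviation is an assumption, since the text does not state which estimator was used.

```python
from statistics import mean, stdev


def alleles_per_locus(loci, genotypes):
    """Count distinct fragment lengths observed at each locus."""
    seen = {locus: set() for locus in loci}
    for genotype in genotypes:
        for locus, length in zip(loci, genotype):
            if length is not None:  # skip failed amplifications
                seen[locus].add(length)
    return {locus: len(lengths) for locus, lengths in seen.items()}


def count_haplotypes(genotypes):
    """Count distinct multilocus haplotypes among complete genotypes."""
    complete = [g for g in genotypes if None not in g]
    return len(set(complete))


if __name__ == "__main__":
    # Hypothetical fragment lengths (bp) at two loci for four individuals.
    loci = ["locusA", "locusB"]
    genotypes = [(100, 200), (101, 200), (100, None), (102, 201)]
    counts = alleles_per_locus(loci, genotypes)
    values = list(counts.values())
    print(counts, sum(values), mean(values), round(stdev(values), 2))
    print(count_haplotypes(genotypes))
```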
Table 3.
Plastid SSR locus characteristics in 107 Pinus species and in 911 subsect. Ponderosae individuals.
DISCUSSION
Our analysis of plastome alignments confirmed single locations for all 15 primer pairs but found that repeat regions were redundant for two pairs of loci: Pt30204 and Pc10; Pt87268 and PcI1A2. The primers showed generally high levels of sequence conservation across the four taxonomic sections of pine, with some exceptions in subgen. Strobus where either the forward or the reverse primer for three loci (Pc987, Pt100783, and Pt1254) had mismatches across the entire subgenus. For these, minor adjustments in primer location to more conserved adjacent regions would potentially increase cross-species transferability (data available from the Dryad Digital Repository: http://doi.org/10.5061/dryad.5nc25; Wofford et al., 2014).
Because the plastid genome is nonrecombining, the significant linkage disequilibrium that we observed in this set of six cpSSR loci in 911 samples suggests that these multilocus haplotypes are likely to be identical by descent rather than to have been derived by homoplasy.
The cpSSR haplotype network for one exemplar each of 15 species has two unresolvable cycles, and outliers are attached to the network by as many as 10 median vectors (Fig. 3A). This suggests that when using a single sample per taxonomic unit these six cpSSR loci are too saturated to make useful interspecific comparisons in subsect. Ponderosae. The ycf1 network (Fig. 3B) differs from the cpSSR network. It has 27 median nodes, seven cycles, and fails to group most of the clades that were resolved from a whole-plastome phylogeny using the same samples (Parks et al., 2012). This suggests that nucleotide sequences of ycf1 are also inadequate for interspecific comparisons across this taxonomic subsection.
DAPC scatter plots for our two intraspecific analyses each support our hypothesized OTUs. The Pacific Northwest OTU is clearly distinct from the Benthamiana and Ponderosa OTUs (Fig. 4A), with no intermingled sample points. Although some Scopulorum sample points are intermingled with Sky Island OTU samples, the inertia ellipse for the Sky Island OTU does not overlap with the inertia ellipses for the Scopulorum OTU or for P. arizonica (Fig. 4B). Using data from six loci for 911 individuals, we were not able to infer an optimal number of clusters (k) using the Bayesian Information Criterion implemented in the find.clusters algorithm of adegenet. However, this feature may be useful for assigning individuals to populations and for identifying potentially admixed populations.
As we complete our data set for all subsect. Ponderosae populations of interest, DAPC of cpSSRs will play an important role. We will test a range of nested OTU groupings to observe the relative distinctiveness of these subdivisions. An important caveat is that these cpSSR loci are all linked on the plastid genome and are uniparentally inherited. DAPC offers a way to use these cpSSR data that avoids the discriminant analysis assumption that variables are uncorrelated yet takes advantage of the a priori group assignment feature of discriminant analysis, a feature that is lacking in principal component analysis (Jombart et al., 2010) and is likely to be important in cases like ours where much of the variation is contained within populations.
Incomplete lineage sorting is an important factor in pine molecular studies (Syring et al., 2005) and can lead to incongruence among nuclear and plastid phylogenies (Willyard et al., 2009). Plastid lineages in pines might also be incongruent with nuclear lineages in areas of secondary contact via the widespread phenomenon that has been called “chloroplast capture” (Matos and Schaal, 2000; Liston et al., 2007). This plastid-nuclear conflict has been attributed to hybridization in many plant families, although other mechanisms play a role (Stegemann et al., 2012). For the P. ponderosa species complex, mitochondrial haplotypes in some cases support further subdivision of OTUs indicated by our plastid evidence, support fewer subdivisions, or suggest different geographic delineations between OTUs. For example, although data presented here show that the Willamette Valley, Oregon, and Fort Lewis, Washington, populations have similar plastids (Pacific Northwest OTU in Fig. 4A), they do not have similar mitochondria. A Fort Lewis, Washington, population shares a mitochondrial haplotype with populations represented by our Benthamiana OTU, and a Willamette Valley, Oregon, population shares a mitochondrial haplotype with populations represented by our Ponderosa OTU (Potter et al., 2013). We also have preliminary evidence from nuclear SSRs for some incongruent groupings, suggesting a genetic mosaic for the P. ponderosa species complex. Although the patterns are certainly affected by incomplete lineage sorting, we expect that pollen flow (revealed by paternal plastid inheritance) and seed dispersal (revealed by maternal mitochondrial inheritance) have shaped the genotype of divergent pine populations in contact zones. The extent of organelle transfer and nuclear introgression across contact zones of long-separated subsect. Ponderosae populations seems to be rather limited (Latta and Mitton, 1999), but emerging patterns suggest that there are other major contact zones that are yet to be explored. Thus, our taxonomic conclusions in subsect. Ponderosae will await nuclear and mitochondrial data, as well as morphological characters and ecological niche models. DAPC will be an important tool to combine these independent data sets because it can accommodate correlated variables and provides group weightings to compensate for unequal contributions from each partition.
Considering the current possible alternatives for measuring genetic diversity in wild plants, multiplex genotyping of cpSSRs in Pinus provided an efficient and relatively informative view of genomic diversity for use in estimating genetic distance in the plastid lineage. Although we demonstrated the utility of these six loci within subsect. Ponderosae, the conservation of primers across the genus suggests that many of the 13 nonredundant cpSSR loci will provide useful data for other Pinus taxonomic subsections. In conjunction with other criteria for population genetic structure and species delimitation, these fragment length characters can provide useful insights into pine relationships. We suggest that the method would be easy to extend to other plants using readily available plastome alignments to design primers specific for the target group (Angioi et al., 2009).
LITERATURE CITED
Appendices
Appendix 1.
Taxon name and sample, GenBank number, country, state, and geographic coordinates of 15 samples used in Fig. 3. NA = not available.
Appendix 2.
Operational taxonomic unit (OTU), population, collector(s), collector number or herbarium voucher, U.S. state, and GPS coordinates of 41 populations shown in Fig. 1 and used in Fig. 4. A herbarium voucher for each population has been deposited at Hendrix College Herbarium (HXC in Index Herbariorum).
Notes
[1] The authors thank A. Duina, J. Finney, D. Gernandt, V. Goremykin, D. Hoose, S. Langer, P. Lea, B. Linz, P. Marquardt, S. Meyers, R. Murray, T. Nguyen, M. Parks, D. Pouncey, C. Rand, B. Schumacher, N. Seagar, J. Smith, K. Spatz, and F. Telewski, and three reviewers. Funding is acknowledged from Hendrix College Odyssey, Arkansas Academy of Sciences (grants to A.M.W., K.F., and A.B.), Beta Beta Beta Foundation (awards to A.M.W. and A.B.), and an Arkansas Department of Education Student Undergraduate Research Fellowship to K.F.