The Cross Timbers forests of central North America contain some of the largest tracts of old-growth forest remaining in the continental United States and are dominated by two species of oak: post oak (Quercus stellata Wangenh.) and blackjack oak (Q. marilandica Münchh.; Therrell and Stahle, 1998). Originally, these forests covered approximately 79,062 km² (Kuchler, 1964), but remnant old-growth stands are now estimated to cover only about 1200 km² (Bayard, 2003; Peppers, 2004), ranging from eastern Oklahoma to North Texas. The remnant stands in eastern Kansas have never been spatially quantified. The remaining stands of old-growth forest are highly fragmented but relatively undisturbed because they grow in areas that are very rocky and difficult to navigate, farm, or develop. Post oak is a slow-growing tree, typically 9–12 m tall with a trunk diameter of 38–46 cm in the Cross Timbers ecoregion (Stransky, 1990). Because post oak grows so slowly, fire is necessary to maintain it as the dominant canopy species. In the absence of fire or under active fire-suppression regimes, canopy composition shifts from fire-tolerant post oak to faster-growing, fire-intolerant species such as softwoods and mesophytic plants (Abrams, 1992).
Little genetic or ecological research has been performed on this characteristic old-growth forest species. A genetic examination of the extant forest remnants is needed to quantify the remaining genetic diversity in post oak and to identify the populations of highest conservation priority. Here we use 454 pyrosequencing to discover microsatellites from a pooled library of post oak samples. The pool was built from reduced representational libraries based on restriction site conservation (Maughan et al., 2009), with each sample labeled independently using a unique multiplex identifier barcode. Because the barcodes are incorporated into specific DNA sequence fragments, each sequence in the pool can be assigned unambiguously to a specific DNA sample. Putatively polymorphic microsatellite loci can then be identified among samples directly from the sequence data, effectively eliminating the need for polymorphism screening and genotyping validation (an expensive and often laborious process).
METHODS AND RESULTS
DNA was extracted from bud and catkin tissue from each of three populations located across the Cross Timbers range, including a northern (Kansas; n = 29), a central (Oklahoma; n = 30), and a southern (Texas; n = 28) population (Appendix 1). Voucher specimens are archived at the Stanley L. Welsh herbarium (BRY) at Brigham Young University, Provo, Utah.
Tissue was ground in liquid nitrogen, and DNA was extracted using a modified mini-salts protocol as reported by Todd and Vodkin (1996), supplemented with 1% (w/v) polyvinylpyrrolidone-10 (Sigma-Aldrich, St. Louis, Missouri, USA). Microsatellite markers were identified from reduced representational libraries, also known as genomic reduction libraries, of two individuals collected from the northern and southern range of post oak. The reduced representational libraries were generated following Maughan et al. (2009) and were 454 pyrosequenced at the DNA Sequencing Center at Brigham Young University (DNASC), Provo, Utah. A total of 305,717 reads were obtained (approximately 150,000 reads from each individual), representing 184 Mb with an average read length of 601 bp (N50 = 623 bp). All reads were assembled into 18,270 contigs using the default parameters of the Roche Newbler assembler (version 2.3; 454 Life Sciences, a Roche Company, Branford, Connecticut, USA). A total of 1191 perfect microsatellite repeats were identified from the contigs using the computer program MISA (Thiel et al., 2003), including 809 di-, 291 tri-, and 91 tetranucleotide repeat motifs, with minimum repeat counts of 8, 6, and 5 units, respectively. The most common motifs were AG/CT, AAT/ATT, and AAAT/ATTT.
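The repeat thresholds described above can be illustrated with a minimal Python sketch. This is not MISA and does not reproduce its output format or options; it is only a regular-expression approximation of a perfect-repeat search under the stated minimums (8 di-, 6 tri-, and 5 tetranucleotide units). The function name `find_perfect_ssrs` and the `MIN_REPEATS` table are hypothetical names introduced for illustration.

```python
import re

# Minimum repeat counts per motif length, taken from the text:
# di- >= 8, tri- >= 6, tetranucleotide >= 5 repeat units.
MIN_REPEATS = {2: 8, 3: 6, 4: 5}


def find_perfect_ssrs(sequence):
    """Return sorted (start, motif, repeat_count) tuples for perfect repeats.

    Illustrative only: a regex approximation, not the MISA algorithm.
    """
    sequence = sequence.upper()
    hits = []
    for motif_len, min_units in MIN_REPEATS.items():
        # ([ACGT]{k}) captures one motif; \1{n,} requires n or more extra copies.
        pattern = re.compile(
            r"([ACGT]{%d})\1{%d,}" % (motif_len, min_units - 1)
        )
        for match in pattern.finditer(sequence):
            motif = match.group(1)
            # Skip motifs that are themselves repeats of a shorter unit
            # (e.g. AA, or AGAG, which is really the dinucleotide AG).
            if any(
                motif == motif[:k] * (motif_len // k)
                for k in range(1, motif_len)
                if motif_len % k == 0
            ):
                continue
            repeat_count = len(match.group(0)) // motif_len
            hits.append((match.start(), motif, repeat_count))
    return sorted(hits)
```

Applied to each assembled contig, a search of this kind yields candidate loci whose flanking sequence can then be passed to primer design. Compound and imperfect repeats are deliberately ignored in this sketch.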
Putative microsatellite markers were then further filtered using custom Perl scripts (Stajich et al., 2002) to identify 48 microsatellite loci that varied in repeat length between the consensus reads of the two post oak individuals. Flanking primers for all 48 loci were designed using Primer3 version 2.0 (Rozen and Skaletsky, 2000) with default parameters except for the following: product size = 150–250 bp, maximum melting temperature (Tm) difference = 1°C, and maximum poly X = 3. All 48 primer pairs were screened for strong amplification and polymorphism on an initial panel of four post oak individuals. PCR was performed with a HotStarTaq Plus Master Mix Kit (QIAGEN, Germantown, Maryland, USA) according to the manufacturer's guidelines in 10-µL reactions containing 10 ng of genomic DNA and 0.5 µM of each primer, with the following amplification parameters: 95°C for 5 min; 35 cycles of 94°C for 45 s, 55°C for 30 s, and 72°C for 60 s; and a final extension at 72°C for 10 min. The amplicons were electrophoresed for 6 h in 3% MetaPhor Agarose (Lonza Ltd., Allendale, New Jersey, USA) gels, stained with ethidium bromide, and visualized on a UV transilluminator. Twenty-four primer pairs failed to amplify or produced complex (multiple-band) banding patterns.
Twelve of the polymorphic microsatellites were then fluorescently labeled with VIC, FAM, or NED (Life Technologies, Grand Island, New York, USA; Table 1) and amplified across all individuals in each of the three populations (n = 87) using the multiplex PCR Type-it Microsatellite PCR Kit (QIAGEN) according to the manufacturer's instructions. Reactions were run in 10-µL volumes with 30 ng of genomic DNA and 0.2 µM of all primers, using the following amplification profile: 95°C for 5 min; 30 cycles of 95°C for 30 s, 55°C for 90 s, and 72°C for 30 s; and a final extension at 60°C for 30 min. Four microliters of the amplified products were processed for fragment analysis on an ABI 3730xl using GeneScan 500 ROX (Applied Biosystems, Carlsbad, California, USA) as the size standard. Fragment analysis and scoring were performed using the microsatellite plugin in Geneious 5.6.7 (Drummond et al., 2011), and the data were analyzed for genic differentiation (FST), deviation from Hardy-Weinberg equilibrium (HWE), observed heterozygosity (Ho), expected heterozygosity (He), and total number of alleles using the computer program Arlequin version 3.5 (Excoffier and Lischer, 2010). Linkage disequilibrium was tested for each pair of loci across and within populations using the log likelihood ratio statistic and default Markov chain parameters in GENEPOP (Raymond and Rousset, 1995). BLASTX searches of the RefSeq_Protein database with the microsatellite flanking sequences produced only a single significant hit (E-value = 2e-55), matching QS00766 to GenBank accession number XP_007012674.1, an RNA polymerase-associated protein CTR9 isoform 5 from Theobroma cacao L. The primer sequences, repeat motif type, and expected product size for the microsatellites not selected for fluorescent analysis are provided in Appendix S1 (apps.1400070_s1.docx).
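For readers unfamiliar with the heterozygosity statistics named above, the following Python sketch shows how observed and expected heterozygosity can be computed for a single locus from scored diploid genotypes. It is an illustration under stated assumptions, not a description of Arlequin's internals: expected heterozygosity is computed here with the commonly used small-sample correction (Nei's unbiased gene diversity), and the function name `heterozygosity` and the example fragment sizes are hypothetical.

```python
from collections import Counter


def heterozygosity(genotypes):
    """Return (Ho, He) for one locus.

    genotypes: list of (allele_a, allele_b) tuples for diploid individuals,
    e.g. fragment sizes in bp. Missing calls must be removed beforehand.
    """
    n = len(genotypes)
    if n < 2:
        raise ValueError("need at least two genotyped individuals")
    # Observed heterozygosity: fraction of individuals with two different alleles.
    ho = sum(a != b for a, b in genotypes) / n
    # Allele frequencies estimated from the 2n sampled gene copies.
    counts = Counter(allele for pair in genotypes for allele in pair)
    sum_sq = sum((c / (2 * n)) ** 2 for c in counts.values())
    # Expected heterozygosity with the small-sample correction 2n / (2n - 1).
    he = (2 * n / (2 * n - 1)) * (1 - sum_sq)
    return ho, he


# Hypothetical genotypes (fragment sizes in bp) for four individuals.
example = [(150, 150), (150, 154), (154, 158), (150, 150)]
ho, he = heterozygosity(example)
```

An Ho well below He at a locus indicates fewer heterozygotes than expected under Hardy-Weinberg proportions, which is the pattern the HWE tests are designed to detect.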
Characteristics of 12 polymorphic microsatellite primers developed for Quercus stellata.
The 12 microsatellite loci produced a total of 168 alleles across all populations, with an average of 14 alleles per locus. The total number of alleles per locus ranged from five (QS02499) to 20 (QS01386). Within individual populations, the average number of alleles ranged from eight (Oklahoma) to 9.167 (Texas). The mean Ho and He were 0.417 and 0.752, respectively, across all populations, while in individual populations the Ho and He values ranged from 0.05 to 0.833 and 0.236 to 0.893, respectively (Table 2). FST among populations was 0.066. Significant departure from HWE was identified for 11 of the 12 microsatellite markers when measured across all individuals, without respect to population (Table 2). When tested within populations, four to five markers were in HWE in each population, suggesting possible subpopulation structure due to geographic isolation and genetic drift (we note that the sample sizes are too small for definitive conclusions). Of the 66 possible pairwise comparisons, significant linkage disequilibrium (P < 0.01) was identified across populations between two pairs of loci: QS01386 and QS08928, and QS00984 and QS02499. Within populations, significant linkage disequilibrium was identified for a single pair of loci in each of the Texas (QS00984 and QS02499) and Oklahoma (QS01386 and QS08928) populations, while three pairwise comparisons showed linkage disequilibrium in the Kansas population (QS00766 and QS01904, QS00984 and QS00562, and QS00984 and QS03297).
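The summary figures in this paragraph can be checked with a few lines of Python. The first two quantities follow directly from the reported counts. The third, a simple heterozygote-deficit index F = 1 - Ho/He, is not reported in the text; it is computed here from the pooled means purely as an illustration of how large the gap between Ho and He is, and should not be read as a formal inbreeding estimate.

```python
from math import comb

n_loci = 12
total_alleles = 168
mean_ho, mean_he = 0.417, 0.752  # pooled means reported in the text

# Average number of alleles per locus across all populations.
mean_alleles_per_locus = total_alleles / n_loci

# Number of distinct locus pairs tested for linkage disequilibrium: C(12, 2).
locus_pairs = comb(n_loci, 2)

# Illustrative heterozygote-deficit index (assumption: not from the source).
deficit_index = 1 - mean_ho / mean_he

print(mean_alleles_per_locus, locus_pairs, round(deficit_index, 3))
```

A positive index of this size is consistent with the widespread HWE departures noted above, although it cannot by itself separate subpopulation structure from other causes such as null alleles.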
We report the development of 12 highly polymorphic microsatellite markers for post oak. Our preliminary analysis of three post oak populations suggests that genetic diversity within populations remains high and that only low levels of among-population differentiation (FST) exist. These observations suggest that the remnant populations we sampled have not had sufficient time to show the effects of severe population reduction (inbreeding and genetic drift), perhaps due to the long lifespan (300+ years) of post oak. The markers reported here should provide the molecular tools needed for larger investigations of population structure and genetic diversity in this key species of the Cross Timbers forest of North America.
Number of alleles detected at each microsatellite marker. Bold values indicate a significant deviation (P < 0.001) from Hardy-Weinberg equilibrium.