Liriodendron tulipifera L., commonly known as tuliptree, tulip poplar, or yellow poplar, is a pioneer tree in the family Magnoliaceae native to eastern North America. It has a wide geographic distribution in the southeastern and mid-Atlantic United States and occurs in diverse habitats. To facilitate population genetic analyses of effective population size and population structure, we developed genomic microsatellite (simple sequence repeat [SSR]) markers without the potential limitations of previously reported SSRs. Liriodendron tulipifera SSRs have been developed from expressed sequence tags (ESTs; Xu et al., 2006, 2010; Yang et al., 2012; Zhang et al., 2015) located in or near functional genes, and consequently, they are more likely to be affected by natural selection (Ellis and Burke, 2007). Liriodendron chinense (Hemsl.) Sarg. genomic (noncoding, nontranscribed) microsatellite loci have been cross-amplified in L. tulipifera (Yao et al., 2008). Cross-species amplification of microsatellite loci might result in ascertainment bias, where polymorphism is reduced when loci are transferred to related species (Ellegren et al., 1995). Preliminary tests of loci from Yao et al. (2008) carried out with 10 L. tulipifera individuals showed low polymorphism (results not shown). Nonneutral evolution or ascertainment bias can potentially impact the estimation of population genetic parameters. Therefore, we identified and characterized polymorphic genomic microsatellite loci in L. tulipifera using Illumina next-generation sequencing and a bioinformatics pipeline.
METHODS AND RESULTS
Microsatellite development—Total DNA from leaves of one L. tulipifera individual collected on the main campus of Georgetown University in Washington, D.C., USA, was extracted using the DNeasy Plant Mini Kit (QIAGEN, Valencia, California, USA). A genomic DNA library for Illumina paired-end sequencing was prepared from 4 µg of DNA following the PCR-free library prep kit from Illumina (Illumina, San Diego, California, USA). DNA was sheared to 550 bp and sequenced as 150 bp paired-end reads on an Illumina HiSeq 2500 at the Biocomplexity Institute of Virginia Tech (Blacksburg, Virginia, USA). We used PAL_FINDER_v0.02.04 (Castoe et al., 2012) to extract reads containing perfect microsatellites (uninterrupted and identical repeats). The reads were imported to PAL_FINDER and analyzed in two different ways: (1) as Illumina paired-end reads filtered to include ≥12 tri-, ≥10 tetra-, ≥8 penta-, and ≥6 hexanucleotide repeats, and (2) as FASTQ sequence files converted to FASTA format, treated as 454 single-end reads, and filtered to include ≥15 di-, ≥10 tri-, ≥8 tetra-, ≥6 penta-, and ≥4 hexanucleotide repeats. One potential advantage of using both methods is the development of loci with a broader range of amplification fragment sizes. In both cases, we identified microsatellite loci with flanking sequences suitable for PCR primer design or potentially amplifiable loci (PALs). Raw reads were deposited in the National Center for Biotechnology Information (NCBI) Short Read Archive (SRA; BioProject no. PRJNA331147, BioSample no. SAMN05417503). Summaries of reads containing microsatellite repeats and PALs (with primer sequences) detected using both methods are available in Appendices S1 (apps.1700032_s1.txt) and S2 (apps.1700032_s2.txt).
We selected a set of 77 PALs to empirically assess amplification using three individuals. We amplified each locus in 25-µL PCR reactions (1× OneTaq Standard reaction buffer, 160 µM dNTPs, 0.2 µM forward primer, 0.2 µM reverse primer, 0.625 units OneTaq DNA polymerase [New England BioLabs, Ipswich, Massachusetts, USA], 1 µL template DNA [concentration was not determined], and ddH20 to 25 µL). Thermocycling conditions were 94°C (30 s); followed by 30 cycles of denaturation at 94°C (30 s), annealing at 50–61°C (30 s, Table 1), and extension at 68°C (30 s); and a final extension of 68°C (5 min). Fifty-one primer pairs yielded products of the expected size without nonspecific amplification and were then tested for polymorphism in seven individuals, by visualizing PCR products on 3% agarose gels. Of these 51 loci, 23 were polymorphic and used to genotype 30 to 52 total individuals collected from three old-growth locations in the native range of L. tulipifera (Appendix 1). Because L. chinense, the other single species in Liriodendron L., has a restricted geographic distribution in China and Vietnam, we were not able to test for cross-species amplification. As cross-amplification of genomic SSRs has limited success in plants (Merritt et al., 2015) and success declines as genetic divergence increases (Barbará et al., 2007), we did not test for cross-amplification in other Magnoliaceae.
Characteristics of 23 polymorphic genomic microsatellite loci isolated from Liriodendron tulipifera.
Genetic properties by individual and pooled sampled locations of 23 polymorphic microsatellite markers developed in Liriodendron tulipifera.a
For fragment analyses, PCR products were fluorescently labeled either using primers tailed with a 5′ M13(−21) sequence following Schuelke (2000) or using primers with a 5′ fluorophore and amplified in multiplex (Table 1). In the tailed primer labeling method, two PCR reactions were carried out using the same reverse primer. The first PCR used an M13(−21)-tailed locus-specific forward primer, while the second used a universal fluorescently labeled M13(−21) as a forward primer. The products of the first PCR were purified using StrataPrep PCR Purification Kit (Agilent Technologies, Santa Clara, California, USA) and then used as the template for the second PCR. Fluorescent products were electrophoresed on an ABI PRISM 3100 Genetic Analyzer (Applied Biosystems, Foster City, California, USA), and amplicon sizes were estimated with either orange or red DNA size standard (MCLAB, San Francisco, California, USA) and GeneMapper software 3.7 (Applied Biosystems) using the Local Southern sizing algorithm.
Microsatellite data analysis—Genotypes appeared diploid, displaying at most two alleles per locus per individual. Data were analyzed by sampled location and as a pooled population (Table 2). For each locus, number of alleles, observed heterozygosity (Ho), expected heterozygosity under random mating (He), and polymorphism information content (; Botstein et al., 1980) for the pooled population were estimated using CERVUS 3.0.3 (Kalinowski et al., 2007). We used GENEPOP 4.2 (Rousset, 2008) to test deviation from Hardy-Weinberg expected heterozygote frequency (HWE) using default values for Markov chain parameters, and to estimate the fixation index (F = [He - Ho]/He; Hamilton, 2009) for the pooled population.
Population genetic parameters are listed for each sampled location showing loci exhibited two to 12 alleles, with almost all alleles common to each location (Table 2). Lack of population differentiation (FST = 0.077 estimated using GENEPOP) justified pooling genotypes from the three locations. In the pooled population, observed and expected heterozygosities ranged from 0.233 to 0.865 and 0.272 to 0.876, respectively. Six loci showed significant deviations from HWE (Table 2) with deficits of heterozygotes that could be attributed to numerous causes. One hypothesis is nonrandom mating, which we tested using INEST (Chybicki and Burczyk, 2009) with the individual inbreeding model (IIM) run for 50,000 burn-in and 500,000 total cycles. The estimated average coancestry coefficient over all loci (f) was 0.041, with a 95% highest posterior density interval of [0.000, 0.085], indicative of an outcrossing species. Another hypothesis for the deficit of heterozygosity is the presence of null alleles. Frequencies of null alleles estimated with INEST using IIM are listed in Table 2. The six loci with significant deficits of heterozygotes also showed evidence of null allele frequencies greater than zero.
When comparing models including null alleles (n), mating among relatives (f), and genotyping failures (b) using INEST, the full model (nfb) was found to best fit the data by the lowest deviance information criterion value (DIC = 5905.837), showing that all three parameters contributed to observed genotype frequencies. The nb model, without mating among relatives, exhibited the closest DIC value (5916.991), but the difference was greater than 10, indicating stronger support for the nfb model (Igor J. Chybicki, personal communication).
The authors thank the Smithsonian Environmental Research Center (Edgewater, Maryland, USA), James Madison's Montpelier (Orange, Virginia, USA), and Saddler's Woods Conservation Association (Haddon Township, New Jersey, USA) for allowing sampling of specimens. R.G.O. was supported by a fellowship from the Consejo Nacional de Ciencia y Tecnología (CONACYT, Mexico). The authors are grateful for funding from the Georgetown Environment Initiative, the Center for the Environment, and the Department of Biology at Georgetown University.