The Smilax hispida group is a well-supported clade including six species in Smilacaceae (Qi et al., 2013) with a disjunct distribution including eastern Asia (S. sieboldii Miq. and S. scobinicaulis C. H. Wright), western North America (S. californica (A. DC.) A. Gray), eastern North America (S. hispida Raf.), and Mexico (S. moranensis M. Martens & Galeotti and S. jalapensis Schltdl.). Smilax sieboldii is a typical element of temperate broadleaved forests that occurs widely in mainland China, Taiwan, Japan, and Korea. Previous work based on two cpDNA intergenic regions indicated that at least four biogeographic lineages exist, with each lineage containing at least one private haplotype. This phylogeographic structure is considered to be related to historical fluctuations of climate and sea level (Zhao et al., 2013). However, that work was limited by the lack of nuclear markers. Polymorphic nuclear microsatellite markers would therefore enhance our understanding of population genetic diversity and historical demography (e.g., gene flow, genetic bottlenecks) and would allow these patterns to be connected to geological and environmental changes.
Existing microsatellite markers for Smilax species (Xu et al., 2011; Martins et al., 2013) showed limited transferability and polymorphism for the S. hispida group due to phylogenetic distance. Therefore, in the current study we aimed to develop more polymorphic and transferable expressed sequence tag-simple sequence repeat (EST-SSR) markers from the transcriptome, which contains abundant ESTs, based on a high-throughput sequencing approach.
METHODS AND RESULTS
Transcriptome sequencing— Fresh young leaves of one wild accession of S. sieboldii were collected at Tianmu Mountain, Zhejiang Province, China (Appendix 1), and frozen in liquid nitrogen. RNA was extracted using TRIzol Reagent (Invitrogen Life Technologies, Carlsbad, California, USA) and treated with DNase (TaKaRa Bio, Shuzo, Kyoto, Japan) following the manufacturer's instructions. A 2 × 150-bp paired-end RNA-Seq library was prepared following the normalized eukaryote transcriptome library preparation protocol of the Beijing Genomics Institute (Shenzhen, China) and sequenced on the Illumina HiSeq 2500 platform (Illumina, San Diego, California, USA). A total of 65,863,062 raw reads were generated and uploaded to the National Center for Biotechnology Information (NCBI) Sequence Read Archive (accession SRP095761). The raw data were filtered using FASTX-TOOLKIT version 0.0.14 (Gordon and Hannon, 2010) by removing adapter sequences and low-quality reads with >5% unknown bases and/or >15% low-quality bases (quality value <20). Remaining reads were assembled into 66,482 transcripts using TRINITY version 2.3.2 (Grabherr et al., 2011), which were then clustered into 47,628 unigenes with TGICL version 2.1 (Pertea et al., 2003).
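The read-level thresholds described above can be illustrated with a minimal Python sketch. It is not the FASTX-TOOLKIT implementation used in this study; the function names `phred33` and `passes_filter` are hypothetical, and Phred+33 quality encoding is assumed.

```python
def phred33(quality_string):
    """Decode a FASTQ quality string, assuming Phred+33 encoding."""
    return [ord(ch) - 33 for ch in quality_string]


def passes_filter(seq, quals, max_n_frac=0.05, max_lowq_frac=0.15, min_q=20):
    """Return True if a read survives both thresholds from the text.

    A read is discarded when more than 5% of its bases are unknown (N)
    or when more than 15% of its bases have a quality value below 20.
    """
    if not seq or len(seq) != len(quals):
        return False
    n_frac = sum(base in "Nn" for base in seq) / len(seq)
    lowq_frac = sum(q < min_q for q in quals) / len(quals)
    return n_frac <= max_n_frac and lowq_frac <= max_lowq_frac


# Hypothetical 20-bp read with two unknown bases (10% N): rejected.
read = "ACGTACGTNNACGTACGTAC"
print(passes_filter(read, [40] * len(read)))
```

Adapter removal is a separate step and is not covered by this sketch.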
Table 1. Characteristics of 17 newly developed microsatellite loci in Smilax sieboldii.
Microsatellite development— Using the MIcroSAtellite identification tool (MISA) (Thiel et al., 2003), microsatellite regions in the unigenes were screened according to the following criteria for repeat numbers: dinucleotide repeats ≥6, trinucleotide repeats ≥5, and tetranucleotide, pentanucleotide, and hexanucleotide repeats ≥4. Primers were designed for the screened microsatellite loci using Primer3 (Untergasser et al., 2012) with the default parameter settings. A total of 9263 microsatellite sequences were obtained, from which 2252 primer pairs were designed. Of these, 122 primer pairs were randomly selected and their forward primers were synthesized with one of three different universal primers (5′-CACGACGTTGTAAAACGAC-3′, 5′-TGTGGAATTGTGAGCGG-3′, or 5′-CTATAGGGCACGCGTGGT-3′) (Boutin-Ganache et al., 2001; Sakaguchi and Ito, 2014). To prevent primer dimers, hairpin structures, and mismatches, the best matches of forward primers and universal primers were selected using OLIGO version 6.67 (Molecular Biology Insights, Cascade, Colorado, USA).
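To make the repeat-number criteria concrete, the following Python sketch scans a sequence for perfect microsatellites using the same minimum repeat counts (dinucleotide ≥6, trinucleotide ≥5, tetra- to hexanucleotide ≥4). It is a simplified stand-in for MISA, not its actual algorithm: compound and mononucleotide repeats are ignored, and the names `MIN_REPEATS` and `find_ssrs` are hypothetical.

```python
import re

# Minimum number of repeat units per motif length, as stated in the text.
MIN_REPEATS = {2: 6, 3: 5, 4: 4, 5: 4, 6: 4}


def find_ssrs(seq):
    """Return (start, motif, repeat_count) for each perfect SSR in seq."""
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in MIN_REPEATS.items():
        # One captured motif followed by at least (min_rep - 1) more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip motifs that are repeats of a shorter unit (e.g. AA, ATAT).
            if any(
                motif == motif[:k] * (unit_len // k)
                for k in range(1, unit_len)
                if unit_len % k == 0
            ):
                continue
            hits.append((match.start(), motif, len(match.group(0)) // unit_len))
    return hits


# Hypothetical unigene fragment containing an (AT)6 repeat.
print(find_ssrs("GGC" + "AT" * 6 + "CCG"))
```

Each hit could then be passed, with its flanking sequence, to a primer design tool such as Primer3.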
We selected 12 accessions from various populations (Appendix 1) to test the effectiveness of primer amplification and to preliminarily assess genetic variation. Total genomic DNA was extracted from silica-dried leaves using Plant DNAzol (Invitrogen Life Technologies). PCR amplifications were performed following the standard protocol of the Tsingke PCR kit (Tsingke Biotech Company, Beijing, China) in a final volume of 10 µL, which contained approximately 5 ng of DNA, 5 µL of 2× PCR Master Mix, 0.1 µM of forward primer, 0.4 µM of reverse primer, and 0.3 µM of fluorescently labeled universal primer (FAM, ROX, HEX, TAMRA; Table 1). The PCR thermal profile involved an initial denaturation at 95°C for 5 min; followed by 35 cycles of 94°C for 40 s, 58°C for 30 s, and 72°C for 30 s; and a final 10-min extension step at 72°C. Fragment lengths of PCR products were analyzed on a 3730xl DNA Analyzer (Applied Biosystems, Foster City, California, USA) with GeneScan 500 LIZ as an internal size standard (Applied Biosystems). Electrophoresis peaks were scored using GeneMarker version 2.2.0 (SoftGenetics, State College, Pennsylvania, USA). A total of 17 primer pairs with stable repeatability and high variation were selected for further analysis. All primer sequences obtained from this study were submitted to GenBank (Table 1).
Polymorphism assessment— To further evaluate the applicability of these primers, 68 individuals from five representative populations from China, Korea, and Japan (Appendix 1) were used to calculate genetic variation parameters. DNA extraction, PCR amplification, and length assessment of PCR products were performed following the procedures described above. The presence of null alleles and their bias on genetic diversity were evaluated based on the expectation maximization method implemented in FreeNA (Chapuis and Estoup, 2007). Deviation from Hardy–Weinberg equilibrium for each population and linkage disequilibrium for each primer pair were tested using GENEPOP version 4.0.7 (Rousset, 2008). The number of alleles, observed heterozygosity, expected heterozygosity, and polymorphism information content were calculated to assess the genetic polymorphism at each locus using CERVUS version 3.0.3 (Kalinowski et al., 2007).
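The per-locus summary statistics can be sketched in a few lines of Python. This is not the CERVUS implementation: expected heterozygosity is computed here with Nei's unbiased estimator, which is an assumption of this sketch, polymorphism information content follows the standard formula PIC = 1 − Σp_i² − Σ_{i<j} 2p_i²p_j², and the function name `locus_stats` and the genotypes are hypothetical.

```python
from collections import Counter
from itertools import combinations


def locus_stats(genotypes):
    """Summarize one diploid locus from a list of (allele, allele) tuples."""
    n = len(genotypes)
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    freqs = [count / (2 * n) for count in counts.values()]

    # Observed heterozygosity: fraction of individuals with two different alleles.
    ho = sum(a != b for a, b in genotypes) / n

    # Expected heterozygosity with a small-sample correction (assumed estimator).
    sum_sq = sum(p * p for p in freqs)
    he = (2 * n / (2 * n - 1)) * (1 - sum_sq)

    # Polymorphism information content.
    pic = 1 - sum_sq - sum(2 * p * p * q * q for p, q in combinations(freqs, 2))

    return {"alleles": len(counts), "Ho": ho, "He": he, "PIC": pic}


# Hypothetical fragment sizes (bp) for four individuals at one locus.
example = [(100, 100), (100, 102), (102, 102), (100, 102)]
print(locus_stats(example))
```

Missing genotypes would need to be dropped before calling the function, because every tuple is counted as two observed alleles.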
Two loci (SS20 and SS95) with a high occurrence of null alleles (>5%) were excluded from the following analyses. No significant deviation from Hardy–Weinberg equilibrium (P < 0.001) was observed for the remaining 15 loci except SS5 in populations CZJ and JFS; SS19 in population KMJ; and SS21, SS100, and SS109 in population JFS, which might be caused by a Wahlund effect in specific populations. There was no evidence of significant linkage disequilibrium between any pair of loci. We detected 156 alleles in total, and the number of alleles at each locus ranged from four to 18, suggesting a moderate to high level of polymorphism. The observed heterozygosity, expected heterozygosity, and polymorphism information content for each locus ranged from 0.36 to 0.97, 0.59 to 0.92, and 0.53 to 0.91, respectively (Table 2).
Transferability evaluation— Transferability of the 15 primers was examined in the accessions of the five related species, i.e., five accessions each for S. californica, S. hispida, S. moranensis, and S. jalapensis and 10 accessions for S. scobinicaulis (Appendix 1). All loci were successfully amplified except two loci (SS21 and SS100) for S. hispida and one (SS33) for S. moranensis (Table 3). Polymorphism was detected in all but two loci (SS21 and SS100) for S. californica, five (SS2, SS19, SS103, SS120, and SS122) for S. hispida, four (SS21, SS74, SS103, and SS114) for S. moranensis, and one (SS100) for S. jalapensis (Table 3). The levels of both cross-amplifiability and polymorphism largely decreased with increasing phylogenetic distance. In total, 12 loci were amplifiable across the other five species in the S. hispida group.
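The count of 12 loci follows directly from the amplification failures listed above, as the short Python check below shows. The variable names are hypothetical; the locus names and species are taken from the text.

```python
# Number of loci retained after excluding SS20 and SS95.
N_LOCI = 15

# Loci that failed to amplify in each related species, as reported in the text.
failed_amplification = {
    "S. scobinicaulis": set(),
    "S. californica": set(),
    "S. hispida": {"SS21", "SS100"},
    "S. moranensis": {"SS33"},
    "S. jalapensis": set(),
}

# A locus is transferable to the whole group only if it failed in no species.
failed_anywhere = set().union(*failed_amplification.values())
amplifiable_in_all = N_LOCI - len(failed_anywhere)

print(sorted(failed_anywhere), amplifiable_in_all)
```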
Table 2. Genetic properties of the 15 newly developed microsatellite loci for Smilax sieboldii. Loci SS20 and SS95 are not included due to a high proportion (>5%) of null alleles.
Table 3. Fragment sizes detected in cross-amplification tests of the 15 newly developed microsatellite markers in the remaining five species of the Smilax hispida group.
Using high-throughput sequencing, we sequenced and assembled the transcriptome of S. sieboldii without a reference genome. Fifteen EST-SSR markers were successfully developed to evaluate the genetic structure and demography of S. sieboldii, of which 12 are likely to be useful for all six species of the S. hispida group.
The authors thank the editor and anonymous reviewers for their constructive comments that substantially improved the manuscript. This work was supported by the National Natural Science Foundation of China (no. 31461123001, 3151101152) and the National Project for Basic Work of Science and Technology of China (no. 2015FY110200).