Catalpa Scop. (Bignoniaceae) comprises 11 species of trees, and five of the 11 species in the genus originated in China. Catalpa ovata G. Don is distributed in central and northern China, whereas C. bungei C. A. Mey. and C. fargesii Bureau are distributed in central to southwestern China; C. fargesii has a glabrous form, namely, C. duclouxii Dode (Gilmour, 1936). Catalpa tibetica Forrest is endemic to southwestern China and, like C. ovata, has creamy yellow flowers. Catalpa bungei is characterized as fast growing, having excellent wood qualities, and being highly adaptable in China (Shi et al., 2011). Due to these economic and ecological benefits, it has been introduced and cultivated in Shandong, Jiangsu, Henan, and Anhui provinces (Shi et al., 2011). Molecular genetic studies have been few in number (Li, 2008), and no simple sequence repeats (SSRs) have been reported. To optimize the conservation and utilization of C. bungei and related species, the development of expressed sequence tag (EST)–SSR markers is very useful for germplasm identification and research into the genetic diversity of C. bungei and related species.
Next-generation sequencing (NGS) technologies have emerged as powerful tools for high-throughput EST sequence determination (Clark et al., 2013). EST-SSRs derived from EST sequences are more convenient and can be isolated with higher efficiency and at lower expense than genomic sequence SSRs (Wang et al., 2012). In this study, we identified 3999 SSR loci and characterized 30 polymorphic EST-SSR markers to facilitate our further investigations of systematics and population genetics in C. bungei and related species.
METHODS AND RESULTS
ESTs are an important source for the development of SSR markers. In this study, ESTs were isolated using an NGS approach. Total RNA was extracted from the roots of one individual of C. bungei ‘YU-1’ using Trizol reagent according to the manufacturer's instructions (Invitrogen, Carlsbad, California, USA). Paired-end libraries with approximate average insert lengths of 200 bp were synthesized using a Genomic Sample Prep Kit (Illumina, San Diego, California, USA) according to the manufacturer's instructions. Libraries were sequenced (101-bp paired-end reads) on an Illumina HiSeq 2000 instrument by a custom sequencing service (Biomarker Technologies, Beijing, China). Raw reads were cleaned by removing adapter sequences, empty reads, and low-quality sequences. Clean reads were assembled into nonredundant transcripts using Trinity, which was developed specifically for de novo assembly of transcriptomes from short reads (Grabherr et al., 2011). The clean sequence data have been deposited in the Short Read Archive database of the National Center for Biotechnology Information (NCBI; accession no. SRP059272). A total of 62,955 unigenes were obtained with an N50 length of 1417 bp. Potential SSR loci in these unigenes were detected using the MISA tool (Thiel et al., 2003; http://pgrc.ipk-gatersleben.de/misa). The parameters were as follows: minimum SSR motif length of 10 bp and repeat length of 10 for mononucleotides, six for dinucleotides, and five for tri-, tetra-, penta-, and hexanucleotides (Yang et al., 2014). A total of 3999 SSR loci were identified in 14,634 unigenes from the C. bungei transcriptome. Of these unigenes, 580 contained more than one SSR locus, and 484 SSR loci were present in compound formation. The combined set of all of the EST-SSR loci revealed that, on average, one EST-SSR was found for every 7.51 kb of sequence data.
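The repeat-count thresholds described above can be illustrated with a minimal Python sketch. This is not MISA itself, only a simplified regular-expression stand-in that reports perfect repeats and ignores compound SSRs; the function name find_ssrs and the example sequence are hypothetical.

```python
import re

# Minimum repeat counts per motif length, taken from the thresholds in the text:
# 10 for mono-, 6 for di-, and 5 for tri- to hexanucleotide motifs.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def find_ssrs(seq):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs.

    Simplified stand-in for an SSR scanner: compound SSRs are not merged
    and imperfect repeats are not detected.
    """
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in MIN_REPEATS.items():
        # A motif of unit_len bases followed by at least (min_rep - 1) copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are just a shorter unit repeated (e.g. "AA", "ATAT").
            if any(
                motif == motif[:k] * (unit_len // k)
                for k in range(1, unit_len)
                if unit_len % k == 0
            ):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // unit_len))
    return sorted(hits)


# Hypothetical test sequence: A x10, AG x6, AAG x5, and a sub-threshold AG x5.
example = "TTG" + "A" * 10 + "CCT" + "AG" * 6 + "TC" + "AAG" * 5 + "CT" + "AG" * 5
for start, motif, count in find_ssrs(example):
    print(start, motif, count)
```

With these thresholds every reported repeat is at least 10 bp long, which is consistent with the minimum SSR length stated in the text; the trailing AG run of five repeats falls below the dinucleotide threshold and is not reported.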
Within the identified EST-SSR loci, mono-, di-, tri-, tetra-, and pentanucleotide repeats had two, four, 10, 14, and two types, respectively. The most frequent repeat motifs were mononucleotide repeats (1957 [48.94%]), followed by dinucleotide (1164 [29.11%]), trinucleotide (834 [20.86%]), tetranucleotide (41 [1.02%]), and pentanucleotide repeats (3 [0.07%]) (Table 1). All of the dinucleotide and trinucleotide repeat motifs were further analyzed to determine their distribution. The most common dinucleotide motif was AG/CT (730 [62.71%]), and the rarest was CG/CG (5 [0.43%]) (Table 2). Among the trinucleotide repeats, AAG/CTT (243 [29.14%]) was the most common motif, followed by ATC/ATG (132 [15.83%]); ACT/AGT (9 [1.08%]) was the rarest motif (Table 2).
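Motif classes such as AG/CT or AAG/CTT pool every rotation of a motif together with the rotations of its reverse complement. The sketch below shows one way such classes could be derived and tallied; the function names are hypothetical, and the labeling convention (smallest rotation of each strand, sorted alphabetically) is an assumption chosen because it reproduces the labels used in the text.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def _smallest_rotation(seq):
    """Alphabetically smallest cyclic rotation of seq."""
    return min(seq[i:] + seq[:i] for i in range(len(seq)))


def canonical_motif(motif):
    """Label a motif by its strand-independent class, e.g. 'GA' -> 'AG/CT'."""
    motif = motif.upper()
    forward = _smallest_rotation(motif)
    reverse = _smallest_rotation(reverse_complement(motif))
    first, second = sorted((forward, reverse))
    return f"{first}/{second}"


def tally_motifs(motifs):
    """Count motifs per canonical class."""
    return Counter(canonical_motif(m) for m in motifs)


def share(count, total):
    """Percentage of total, rounded to two decimals."""
    return round(100.0 * count / total, 2)


print(canonical_motif("TC"))   # AG/CT
print(canonical_motif("CTT"))  # AAG/CTT
print(share(730, 1164))        # 62.71
```

The last line recomputes the AG/CT share from the counts reported above (730 of 1164 dinucleotide repeats).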
Subsequently, the mononucleotide repeats were discarded because it was difficult to distinguish genuine mononucleotide repeats from polyadenylation products and some were likely generated by base mismatching or sequencing errors. Primer pairs were designed using Primer3 (Rozen and Skaletsky, 1999). The major parameters for primer pair design were set as follows: primer length of 18–22 bases (optimal 20 bases), PCR product size of 100–500 bp (optimal 200 bp), GC content of 40–70% (optimal 50%), and annealing temperatures of 52–59°C (optimal 55°C). Based on these parameters, 177 primer pairs were designed and synthesized for polymorphism detection.
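As a rough illustration of the design constraints listed above, the following Python sketch screens a candidate primer pair for length, GC content, and product size. It is not Primer3 and does not evaluate annealing temperature, which Primer3 computes with its own thermodynamic model; the function names and the example sequences are hypothetical.

```python
# Design ranges taken from the text.
PRIMER_LEN_RANGE = (18, 22)      # bases
GC_RANGE = (40.0, 70.0)          # percent
PRODUCT_SIZE_RANGE = (100, 500)  # bp


def gc_content(primer):
    """GC content of a primer as a percentage."""
    p = primer.upper()
    return 100.0 * (p.count("G") + p.count("C")) / len(p)


def primer_ok(primer):
    """Check one primer against the length and GC ranges."""
    length_ok = PRIMER_LEN_RANGE[0] <= len(primer) <= PRIMER_LEN_RANGE[1]
    gc_ok = GC_RANGE[0] <= gc_content(primer) <= GC_RANGE[1]
    return length_ok and gc_ok


def pair_ok(forward, reverse, product_size):
    """Check a primer pair and its expected product size."""
    size_ok = PRODUCT_SIZE_RANGE[0] <= product_size <= PRODUCT_SIZE_RANGE[1]
    return primer_ok(forward) and primer_ok(reverse) and size_ok


# Hypothetical sequences, for illustration only.
print(pair_ok("ATGCATGCATGCATGCATGC", "GGCATTCAGGATCCATTGCA", 200))  # True
print(pair_ok("ATATATATATATATATATAT", "GGCATTCAGGATCCATTGCA", 200))  # False
```

A screen like this only pre-filters candidates; melting temperature, hairpins, and primer-dimer checks would still need a dedicated tool.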
Genomic DNA of all accessions was extracted from the leaves using a modified version of the cetyltrimethylammonium bromide (CTAB) method (Kabelka et al., 2002). Samples of C. bungei were collected from four populations: Luoning, Henan Province (population HN: 34°24′6″N, 111°42′42″E; n = 21); Chuxian, Anhui Province (population AH: 32°50′54″N, 117°47′49″E; n = 11); Lianyungang, Jiangsu Province (population JS: 34°40′3″N, 119°19′60″E; n = 6); and Qingzhou, Shandong Province (population SD: 36°46′15″N, 118°25′56″E; n = 14). Samples of three related species were collected from three populations: C. duclouxii in Kunming, Yunnan Province (25°02′32″N, 102°38′46″E; n = 13); C. fargesii in Yishui, Shandong Province (35°48′38″N, 118°38′5″E; n = 15); and C. ovata in Yunxian, Hubei Province (32°51′33″N, 110°44′10″E; n = 12). Plants for all accessions were grown in the Catalpa germplasm repository at the Institute of Botany, Jiangsu Province and Chinese Academy of Sciences, and vouchers are deposited at the Herbarium of the Institute of Botany, Jiangsu Province and Chinese Academy of Sciences (NAS), Nanjing, China (Appendix 1). Approximately 10 g of young leaves were collected in the spring. PCR amplification was carried out in 10-µL reaction mixtures containing 30 ng of template DNA, 1× PCR buffer (Mg2+ free), 2.0 mM MgCl2, 0.2 mM dNTPs, 0.25 µM of each primer, and 1 unit of Taq polymerase (TaKaRa Biotechnology Co., Dalian, China). Cycling was performed on a T100 Thermal Cycler (Bio-Rad, Marnes-la-Coquette, France). Amplification reactions were initiated with a pre-denaturing step (95°C for 5 min), followed by 32 cycles of denaturing (95°C for 30 s), annealing (55°C for 45 s), and extension (72°C for 60 s), and a final extension at 72°C for 8 min. Amplified PCR products were separated on 8% denaturing polyacrylamide gels using a vertical electrophoresis device. EST-SSR bands were detected by silver staining.
One hundred seventy-seven EST-SSR primer pairs were synthesized in this study. Fifty-five primer pairs yielded stable, clear, and repeatable amplicons in all accessions. The other primer pairs were unstable or gave no product. The 55 primer pairs corresponded to 25 monomorphic loci (Appendix S1) and 30 polymorphic loci (Table 3). The polymorphic SSR loci were analyzed with POPGENE version 1.32 software (Yeh et al., 1999) for the number of alleles per locus (A), observed heterozygosity (Ho), expected heterozygosity (He), and fixation index (FIS). The A values ranged from two to 18 with a mean of 6.78 (Table 4). The Ho and He values were 0.05–1.00 and 0.18–0.95 with averages of 0.53 and 0.75, respectively. The FIS values ranged from −1.00 to 1.00 with an average of 0.32. Departure from Hardy–Weinberg equilibrium (HWE) and linkage disequilibrium were tested for every locus, with Bonferroni correction. Fewer than half of the loci (12, six, one, and seven loci in populations HN, AH, JS, and SD, respectively) showed significant departure from HWE (P < 0.001). Significant linkage disequilibrium was not detected between any pair of loci (P < 0.001).
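For readers unfamiliar with these statistics, the following Python sketch computes A, Ho, He, and FIS for a single locus from diploid genotypes. It uses common textbook definitions (He = 1 − Σp², FIS = 1 − Ho/He), which are an assumption here and may differ from the exact estimators applied by POPGENE; the function name and the genotype data are hypothetical.

```python
from collections import Counter


def locus_stats(genotypes):
    """Summary statistics for one locus.

    genotypes: list of (allele1, allele2) tuples, one per diploid individual.
    Returns a dict with the number of alleles (A), observed heterozygosity
    (Ho), expected heterozygosity (He = 1 - sum of squared allele
    frequencies), and fixation index (FIS = 1 - Ho/He).
    """
    n = len(genotypes)
    allele_counts = Counter(allele for pair in genotypes for allele in pair)
    total_alleles = 2 * n
    ho = sum(1 for a, b in genotypes if a != b) / n
    he = 1.0 - sum((c / total_alleles) ** 2 for c in allele_counts.values())
    fis = 1.0 - ho / he if he > 0 else float("nan")
    return {"A": len(allele_counts), "Ho": ho, "He": he, "FIS": fis}


# Hypothetical allele sizes (bp) for four individuals.
balanced = [(150, 150), (150, 154), (154, 154), (150, 154)]
all_het = [(150, 154)] * 4

print(locus_stats(balanced))  # Ho equals He, so FIS is 0
print(locus_stats(all_het))   # every individual heterozygous, so FIS is -1
```

The second example shows how a locus at which all individuals are heterozygous for the same two alleles reaches the lower bound of FIS = −1.00 reported above.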
Table 1. EST-SSRs present in the Catalpa bungei transcriptome.
Table 2. Characteristics of the di- and trinucleotide repeat motifs in the Catalpa bungei transcriptome.
Cross-amplification of 30 polymorphic loci was tested in 61 individuals of four Catalpa species under the same PCR conditions used for C. bungei. All markers showed successful amplification results in more than half of the 61 individuals tested, with the exception of three loci (comp100847, comp111793, and comp114074) (Table 5).
To identify potential functions of the 30 polymorphic SSR-associated unigenes, the sequences were used for BLAST searches and annotation against the NCBI nonredundant protein (NR) database (http://www.ncbi.nlm.nih.gov/) using an E-value cut-off of 10^-5. All sequences were found to have potential functions by BLASTX. These sequences showed significant homology to protein sequences from Sesamum indicum L., Rehmannia glutinosa (Gaertn.) Libosch. ex Fisch. & C. A. Mey., Genlisea aurea A. St.-Hil., and Erythranthe guttata (DC.) G. L. Nesom. The potential functions were mainly related to transcription factors, hormone metabolism, and carbon metabolism (Appendix S2).
In the present study, we have developed 30 novel EST-SSR polymorphic markers for C. bungei. These markers provide an efficient tool for investigating population genetic diversity in different environments and will facilitate studies on molecular breeding, genetic improvement, and conservation of C. bungei and related species.
Table 3. Characteristics of 30 polymorphic EST-SSR markers in Catalpa bungei.
Table 4. Genetic properties of 30 polymorphic EST-SSR loci in Catalpa bungei.