A chloroplast is an organelle with its own genome encoding a number of chloroplast-specific components (Sugiura et al., 1998). Owing to its tractable size and high level of conservation, the chloroplast genome can be used to characterize genetic relationships among species. Furthermore, plant taxonomists have widely adopted the sequence variability of two loci in land plants, consisting of portions of the chloroplast rbcL and matK genes, as an effective DNA barcode (Vijayan and Tsou, 2010). Chloroplast DNA contains many of the genes necessary for proper functioning of the organelle. The analysis of chloroplast DNA sequences has proven useful in studying plant evolution (Shaw et al., 2007), and the field of chloroplast genome characterization is growing rapidly (Timmis et al., 2004). The size of the genome, which has been determined for a number of plants and algae, ranges from 85 to 292 kbp. The complete DNA sequences of several different chloroplast genomes of plants and algae have been reported. Many chloroplast DNAs contain two inverted repeats (IRs), which separate a large single-copy region (LSC) from a small single-copy region (SSC) (Palmer and Thompson, 1982). The IRs vary in length from 4 to 25 kbp (Robinson et al., 2009).
Capsicum frutescens L. (Solanaceae), a name that is generally applied to all cultivated peppers in the United States, is also known as C. annuum L. (Smith and Heiser, 1951). Cultivars of C. frutescens can be annual or short-lived perennial plants. The flowers have a greenish white or greenish yellow corolla and are either insect- or self-pollinated. The fruit is usually very pungent and grows to 1.0–8.0 cm long and 0.6–3.0 cm in diameter (Smith and Heiser, 1951). It is typically pale yellow, maturing to bright red, but other colors also occur (Heiser and Smith, 1953; Stummel and Bosland, 2006). More recently, C. frutescens has been bred to produce ornamental strains with a large number of erect peppers growing in colorful ripening patterns (Stummel and Bosland, 2006). Capsicum frutescens likely originated in South or Central America (Heiser, 1979; Clement et al., 2010) and spread quickly throughout the tropical and subtropical regions of this area, where it still grows wild today (Purseglove, 1976). Capsicum frutescens is also believed to be the ancestor of C. chinense Jacq. (Bosland, 1996; Basu et al., 2003).
In this study, using Illumina technology, the complete chloroplast genome of C. frutescens was sequenced, assembled, annotated, and mined for simple sequence repeat (SSR) markers and for single-nucleotide polymorphism (SNP) and insertion/deletion (indel) variants. The resultant data have been made publicly available as a resource for genetic information for Capsicum L. species, which will facilitate investigations into genetic variation and phylogenetic relationships of closely related Capsicum species.
METHODS AND RESULTS
For this study, C. frutescens seeds (accession no. IT158639) were obtained from the National Agrobiodiversity Center, Rural Development Administration, Republic of Korea. Seeds were germinated and grown in a greenhouse, fresh leaves were collected from 40-d-old seedlings, and DNA was extracted using a DNeasy Plant Mini Kit (QIAGEN, Valencia, California, USA) according to the manufacturer's instructions to construct chloroplast DNA libraries. An Illumina paired-end DNA library (average insert size of 500 bp) was constructed using the Illumina TruSeq library preparation kit following the manufacturer's instructions (Illumina, San Diego, California, USA).
The library was sequenced as 2 × 300 bp paired-end reads on the MiSeq instrument at LabGenomics (http://www.labgenomics.co.kr/). Prior to chloroplast de novo assembly, low-quality sequences (quality score < 20; Q20) were filtered out, and the remaining high-quality reads were assembled using the CLC Genome Assembler (version beta 4.6; CLC bio, Aarhus, Denmark) with a minimum overlap size of 200 bp and a maximum bubble size of 50 bp for the de Bruijn graph. Chloroplast contigs were selected from the initial assembly by performing a BLAST (version 2.2.31) search against the reference chloroplast genome of C. annuum (GenBank accession NC_018552) using CLC software with parameters of 0.5 for fraction, 0.8 for similarity, and 200–600 bp for overlap size (Jo et al., 2011). The selected chloroplast contigs were merged into a total of four contigs, and iterative contig extensions were performed by mapping raw reads to the contigs to construct a complete C. frutescens chloroplast genome. Dual Organellar GenoMe Annotator (DOGMA; Wyman et al., 2004) and CpGAVAS (Liu et al., 2012) were used to annotate the chloroplast genome. All transfer RNA (tRNA) genes were amended with tRNAscan-SE (Lowe and Eddy, 1997). OGDRAW (Lohse et al., 2007) was used to produce a map of the genome.
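The Q20 filtering step can be illustrated with a minimal Python sketch. The text does not say whether the threshold was applied per base or per read, so the version below assumes a mean-quality-per-read rule and Phred+33 encoding; the function names and the toy records are hypothetical and are not part of the CLC workflow used in this study.

```python
def phred_scores(quality_line, offset=33):
    """Decode a FASTQ quality string into Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality_line]


def passes_q20(quality_line, threshold=20):
    """Assumed rule: keep a read only if its mean Phred score is >= threshold."""
    scores = phred_scores(quality_line)
    return bool(scores) and sum(scores) / len(scores) >= threshold


def filter_fastq_records(records, threshold=20):
    """Filter an iterable of (header, sequence, quality) tuples."""
    return [rec for rec in records if passes_q20(rec[2], threshold)]


# Toy records for illustration only.
reads = [
    ("@read1", "ACGT", "IIII"),  # 'I' decodes to Phred 40
    ("@read2", "ACGT", "####"),  # '#' decodes to Phred 2
]
kept = filter_fastq_records(reads)
print([header for header, _, _ in kept])
```

A per-base trimming rule would be an equally plausible reading of the description; the sketch shows only one of the possible interpretations.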
Table 1. SSR candidates of the Capsicum frutescens chloroplast genome.
Sputnik software (Cardle et al., 2000) was used to find the SSR markers present in the chloroplast genome of C. frutescens. It uses a recursive algorithm to search for repeats with unit lengths of two to five nucleotides, and finds perfect, compound, and imperfect repeats. Sputnik has been applied for SSR identification in many species, including Arabidopsis and barley (Cardle et al., 2000). To identify SNP and indel variants in the C. frutescens chloroplast genome, we used BWA (Li and Durbin, 2009) with ‘mem’ command line options ‘-k19 -w100 -d100 -r1.5 -y20 -c500 -D0.5 -W0 -m50’ and SAMtools (Li et al., 2009) software with ‘mpileup’ command line options ‘-uf -d250 -q0 -e20 -h100 -L250 -m1 -o40’. A more detailed method is described at http://samtools.sourceforge.net/mpileup.shtml.
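To make the repeat search concrete, the following Python sketch scans a sequence for perfect tandem repeats with unit lengths of two to five nucleotides. It is a simplified regular-expression scan, not Sputnik's recursive algorithm, and it does not detect compound or imperfect repeats. The minimum repeat count of three and the demonstration sequence are assumptions made for illustration only.

```python
import re


def find_perfect_ssrs(sequence, min_unit=2, max_unit=5, min_repeats=3):
    """Return (start, unit, repeat_count) tuples for perfect tandem repeats.

    Simplified regex scan; NOT Sputnik's algorithm. The min_repeats
    threshold is a hypothetical choice, not a value from the study.
    """
    sequence = sequence.upper()
    hits = []
    for unit_len in range(min_unit, max_unit + 1):
        # One unit followed by at least (min_repeats - 1) further copies.
        pattern = re.compile(
            r"([ACGT]{%d})\1{%d,}" % (unit_len, min_repeats - 1)
        )
        for match in pattern.finditer(sequence):
            unit = match.group(1)
            # Skip units that are repeats of a shorter unit (e.g. "ATAT").
            if any(
                unit == unit[:k] * (unit_len // k)
                for k in range(1, unit_len)
                if unit_len % k == 0
            ):
                continue
            hits.append(
                (match.start(), unit, len(match.group(0)) // unit_len)
            )
    return hits


# Made-up sequence containing (AT)4 and (AAAT)3.
demo = "GGCATATATATGGCAAATAAATAAATCC"
print(find_perfect_ssrs(demo))
```

Start positions are 0-based. A production analysis would rely on Sputnik or a comparable tool, because scoring of imperfect and compound repeats is not reproduced here.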
Table 2. SNP markers of the Capsicum frutescens chloroplast genome.
Illumina paired-end (2 × 300 bp) sequencing produced a total of 8,272,114 paired-end reads with an average fragment length of 256 bp, and analysis of these reads generated 1,796,432,923 bp of sequence. Of these, 31,772,592 nucleotides mapped to the chloroplast genome, giving an average coverage of 202×. Contig alignment and scaffolding based on paired-end data resulted in a complete circular C. frutescens chloroplast genome sequence (Fig. 1). The chloroplast genome of C. frutescens has been deposited in GenBank (accession no. KR078312; National Center for Biotechnology Information [NCBI]). It has a total length of 156,817 bp and is composed of an LSC of 87,380 bp, two IRs of 25,792 bp each, and an SSC of 17,853 bp. The overall GC content of the C. frutescens chloroplast genome is 37.7%, with the IRs having a higher GC content (43.1%) than the LSC (35.7%) and SSC (32.0%) due to the presence of GC-rich ribosomal RNA (rRNA) genes. The C. frutescens chloroplast genome encodes 132 unique genes (Appendix 1), including 87 protein-coding genes, 37 tRNA genes, and eight rRNA genes. Seven of these genes are duplicated in the IR regions. Nine genes (rps16, atpF, rpoC1, petB, petD, rpl16, rpl2(IR), ndhB(IR), ndhA) and six tRNA genes contain one intron, whereas two genes (clpP, rps12) and one ycf gene (ycf3) contain two introns (Fig. 1).
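The reported region lengths and coverage can be cross-checked with simple arithmetic. The Python sketch below uses only the figures given in the text; the variable names are illustrative. It sums the LSC, the two IR copies, and the SSC, and divides the mapped nucleotides by the genome length.

```python
# Region lengths (bp) as reported in the text.
LSC = 87_380
IR = 25_792          # length of each of the two inverted repeats
SSC = 17_853
REPORTED_TOTAL = 156_817

# A quadripartite chloroplast genome is LSC + 2 x IR + SSC.
total = LSC + 2 * IR + SSC
print(total, total == REPORTED_TOTAL)

# Average coverage = mapped nucleotides / genome length.
MAPPED_NT = 31_772_592
coverage = MAPPED_NT / REPORTED_TOTAL
print(round(coverage, 1))
```

The region lengths sum to the reported genome size, and the computed coverage falls between 202× and 203×, in line with the reported value.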
The C. frutescens chloroplast genome (156,817 bp) is larger than those reported for other Capsicum taxa such as C. annuum var. glabriusculum (Dunal) Heiser & Pickersgill (GenBank accession no. KJ619462) and C. annuum (GenBank accession no. NC_018552). The lengths of the LSC and IRs in C. frutescens differed from those in the other two taxa and contributed to the variation in chloroplast genome size. For example, the C. frutescens chloroplast genome was 36 bp longer than the reported C. annuum chloroplast genome and 205 bp longer than the C. annuum var. glabriusculum chloroplast genome. Furthermore, the SSC and IR regions of C. frutescens were 3 and 9 bp longer, respectively, and the LSC region was 14 bp shorter and 167 bp longer, respectively, than those of the previously reported chloroplast genomes. The average GC content in the C. frutescens chloroplast genome is 37.7%, similar to that of other Capsicum species.
The organization and gene order of the Capsicum chloroplast genome exhibited the general chloroplast genome structure seen in angiosperms (Sugiura, 1992). The Capsicum chloroplast genome contains 132 genes (Appendix 2), including eight rRNA genes, 37 tRNA genes, 21 ribosomal subunit genes (12 small subunit and nine large subunit), and four DNA-directed RNA polymerase genes. Forty-six genes were involved in photosynthesis, of which 11 encoded subunits of the NADH-oxidoreductase, seven encoded subunits of photosystem I, 15 of photosystem II, six of the cytochrome b6/f complex, six of ATP synthase, and one encoded the large chain of ribulose bisphosphate carboxylase/oxygenase (RuBisCO). Five genes were involved in different functions, and three genes were of unknown function. As shown in Fig. 1 and Appendix 2, genome organization appeared to be highly conserved, with unique gene sequences, as found previously in Capsicum species (Jo et al., 2011; Zeng et al., 2014; Raveendar et al., 2015a). However, in this newly determined chloroplast genome, we found 132 predicted genes, and size variations were observed in the IR and LSC regions.
Table 3. Indel markers of the Capsicum frutescens chloroplast genome.
A total of 125 potential SSR motifs were identified in the 156.8-kb sequence of the Capsicum chloroplast genome, located mostly in the noncoding regions (Table 1); of these, the majority were tetranucleotide (50%) and trinucleotide (26%) repeats. All other types of SSRs, such as di- and pentanucleotide motifs, were relatively infrequent (25%). The majority of tetranucleotide SSRs had the AAAT/AATA/ATAA motif, followed by those with the ATAA/TAAA/AAAT motif; the TTTG/TTGT/TGTT, TCTT/CTTT/TTTC, and AATT/ATTA/TTAA motifs were found with similar frequency (7.2%). Two different repeats, those with the TTTTA/TTTAT/TTATT and TTATT/TATTT/ATTTT motifs, were identified among pentanucleotide SSRs. The TTC/TCT/CTT and TTA/TAT/ATT motifs were identified among the trinucleotide SSRs, but only the TA/AT motif was identified for the dinucleotide SSRs (Table 1). Hence, the observed frequency of SSR motifs was approximately one per 1250 bp of chloroplast genome.
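The SSR frequency quoted above follows directly from the genome length and the motif count. The short Python sketch below reproduces the calculation with the values reported in the text; the variable names are illustrative.

```python
GENOME_LENGTH_BP = 156_817   # total chloroplast genome length
SSR_COUNT = 125              # potential SSR motifs identified

# Average spacing between SSR motifs, in bp per motif.
bp_per_ssr = GENOME_LENGTH_BP / SSR_COUNT

# Equivalent density expressed as motifs per kilobase.
ssr_per_kb = SSR_COUNT / (GENOME_LENGTH_BP / 1000)

print(round(bp_per_ssr), round(ssr_per_kb, 2))
```

The spacing works out to roughly 1255 bp per motif, which matches the approximate figure of one SSR per 1250 bp.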
Comparison of the C. frutescens chloroplast genome sequence with the reference chloroplast sequence of C. annuum revealed a total of 34 mutations (18 SNPs and 16 indels), with 15 of these variants involving more than one nucleotide (Tables 2 and 3). Among the detected variants, six SNPs and two indels were observed in the coding region of the chloroplast genome. Of the 34 mutations, 29 were located in the LSC region and five in the SSC region. These molecular markers will facilitate studies of genetic diversity, population genetic structure, and sustainable conservation for C. frutescens.
The size of the C. frutescens chloroplast genome identified here is more closely related to that of C. annuum var. glabriusculum reported previously (Raveendar et al., 2015b). Moreover, the C. frutescens chloroplast genome has similar genome organization, gene order, gene sizes, and GC content, with only SNP and indel variation. Capsicum annuum var. glabriusculum is considered the wild parental species of the cultivated C. annuum (Votava et al., 2002; Aguilar-Meléndez et al., 2009; González-Jara et al., 2011).
We provide here the complete chloroplast genome sequence of C. frutescens, a cultivated pepper in the United States. Availability of this sequence and the recently determined C. annuum chloroplast genome sequence (GenBank accession no. NC_018552) enables us to assess genome-wide mutational dynamics within the genus Capsicum. The chloroplast genome possesses similar genome organization, gene order, gene sizes, and GC content, with only SNP and indel variation having been revealed. It is difficult to obtain accurate phylogenies and effective species discrimination using a small number of plastid genes in evolutionarily young lineages (Ruhsam et al., 2015). Complete plastid genome sequencing therefore provides a solution to this problem. Availability of this sequence can enable researchers to design conserved primers to sequence new genomic regions that could provide useful phylogenetic information for closely related species. Moreover, the structural details of this C. frutescens chloroplast genome join the growing database of Capsicum species, which can facilitate investigations into gene expression and genetic variation of these crop species.
This study was performed with the support of the Research Program for Agricultural Science and Technology Development (Project no. PJ008623), National Institute of Agricultural Science, Rural Development Administration, Republic of Korea.