4 July 2015 Transcriptome-Facilitated Development of SNPs for the Sonoran Desert Rock Fig, Ficus petiolaris (Moraceae)
Nicholas G. Davis, Derek D. Houston, John D. Nason

The genus Ficus L. (Moraceae) is a diverse (>750 species) and ecologically important lineage of tropical woody plants. Many organisms depend on figs to carry out portions of their life cycles, particularly fig wasp pollinators and parasites, which are often host fig specific. Despite substantial interest in the coevolution of figs and fig wasps (Herre et al., 2008), genomic resources for these nonmodel organisms are largely lacking. Next-generation sequencing technologies have facilitated the development of genomic resources, such as single-nucleotide polymorphisms (SNPs), for nonmodel organisms. SNPs are biallelic markers that can yield valuable insight into ecological, genetic, and coevolutionary processes (Morin et al., 2004; Pool et al., 2010; Steiner et al., 2013).

The Sonoran Desert rock fig, F. petiolaris Kunth, is the only widespread, desert-adapted fig species in North America. It is also the northernmost naturally distributed Ficus in the New World, reaching a latitude of 31°N in the state of Sonora, Mexico. Ficus petiolaris supports a community of obligately associated fig wasps, including a pollinator (Pegoscapus) and several nonpollinators (Aepocerus, Heterandrium, Idarnes, and Physothorax). To enable ecological and evolutionary genetic studies, we sequenced the transcriptome of F. petiolaris to develop SNP markers optimized for high-throughput genotyping on the Sequenom MASSArray System (Agena Bioscience, San Diego, California, USA).


RNA was extracted from nine F. petiolaris plants grown from seeds sampled from five populations distributed across the species' range in Baja California, Mexico (Appendix 1). Five milligrams of leaf tissue was sampled per individual; the samples were then pooled and homogenized in liquid nitrogen with a mortar and pestle, and RNA was extracted using the Spectrum Plant Total RNA Kit (Sigma-Aldrich, St. Louis, Missouri, USA). Extracted RNA was quantified using a NanoDrop 1000 Spectrophotometer (Thermo Fisher Scientific Inc., Waltham, Massachusetts, USA) and then submitted to the Iowa State University (ISU) DNA Facility, where it was quantified a second time using the Agilent RNA 6000 Nano Kit (Agilent Technologies, Santa Clara, California, USA). A cDNA library was prepared from the mRNA templates using the Illumina TruSeq RNA Sample Preparation Kit V2 (Illumina, San Diego, California, USA), with library construction verified using the Agilent DNA 7500 Kit (Agilent Technologies), before transcriptome sequencing at the ISU DNA Facility on an Illumina MiSeq (Illumina) with 250-cycle paired-end reads.

Illumina sequencing produced 33,294,480 reads, with an average read length of 215 bp, for a total of 7,147,200,749 bp sequenced. Low-quality reads were removed using Sickle v.1.33 (Joshi and Fass, 2011). The F. petiolaris transcriptome was de novo assembled using Trinity release 2013-11-10 (Grabherr et al., 2011). The final assembly contained 125,493 contigs, with a mean length of 1176 bp, mean coverage depth of 48×, N50 and N90 of 2011 and 478, respectively, and a total length of 147,624,931 bp. Reads were mapped to the assembled transcriptome using the program Bowtie2 v.2.1.0 (Langmead and Salzberg, 2012). SNP calling was performed using the Genome Analysis Toolkit (GATK) v.2.7-2 (McKenna et al., 2010). GATK input files were prepared using SAMtools v.1.1 (Li et al., 2009) and Picard v.1.97 (The Broad Institute; freely available online). GATK identified 139,254 putative SNPs, which were filtered bioinformatically using customized Python scripts. Initial SNP filtering was based on the following criteria: (1) sequence depth at the SNP position was ≥10; (2) the GATK quality score was ≥30; (3) there were no ambiguous bases, indels, or other SNPs located within 100 bp flanking the SNP; and (4) the minor allele was represented in at least 1% of the reads (to minimize ascertainment bias). This initial filtering yielded a set of 21,228 putative SNPs.
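The four initial filtering criteria can be illustrated with a minimal Python sketch. This is not the authors' script, which is not reproduced in the article; the `Variant` record, its field names, and the function `passes_initial_filters` are hypothetical, and the handling of SNPs closer than 100 bp to a contig end (the flanking window is simply clipped) is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """Hypothetical record for one called variant (not a GATK data structure)."""
    contig: str
    pos: int                  # 1-based position on the contig
    kind: str                 # "SNP" or "INDEL"
    depth: int = 0            # read depth at the position
    qual: float = 0.0         # variant quality score
    minor_frac: float = 0.0   # fraction of reads carrying the minor allele


def passes_initial_filters(snp, contig_seq, contig_variants,
                           min_depth=10, min_qual=30.0,
                           flank=100, min_minor_frac=0.01):
    """Apply criteria (1)-(4) from the text to a single candidate SNP."""
    if snp.kind != "SNP":
        return False
    # (1) depth and (2) quality score
    if snp.depth < min_depth or snp.qual < min_qual:
        return False
    # (4) minor allele represented in a minimum fraction of reads
    if snp.minor_frac < min_minor_frac:
        return False
    # (3a) no ambiguous bases in the flanking window
    # Assumption: the window is clipped at the contig ends.
    start = max(1, snp.pos - flank)
    end = min(len(contig_seq), snp.pos + flank)
    window = contig_seq[start - 1:end].upper()
    if any(base not in "ACGT" for base in window):
        return False
    # (3b) no indels or other SNPs within the flanking window
    for other in contig_variants:
        if other == snp:
            continue
        if other.contig == snp.contig and abs(other.pos - snp.pos) <= flank:
            return False
    return True
```

Keeping the thresholds as keyword arguments makes it easy to rerun the filter under a stricter minor-allele cutoff without touching the logic.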

SNPs occurring in single-copy protein-coding genes were identified as follows: (1) Primary protein transcripts for Arabidopsis thaliana (L.) Heynh., Oryza sativa L., and Vitis vinifera L. were obtained from the U.S. Department of Energy database. (2) Single-copy nuclear gene variants identified by Duarte et al. (2010) as shared among a diverse sampling of seed plants were retrieved from the primary protein transcripts. (3) A local BLASTX of F. petiolaris transcripts against the single-copy nuclear gene variant database was performed. BLAST results were filtered by E-value (≤1e-100), identity score (≥70%), and having hits to two or more species. This filtering yielded 3200 putative SNPs in 927 single-copy nuclear gene contigs. For contigs containing multiple SNPs, the one with the highest coverage was selected if it was also located ≥60 bp from the contig's ends and ≥20 bp from the nearest neighboring SNP.
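The BLAST filtering and per-contig SNP selection described above could be sketched as follows. All names here (`BlastHit`, `single_copy_contigs`, `pick_snp`) are hypothetical, and the sketch makes three assumptions: the E-value threshold is treated as an upper bound, as is conventional; distance from a contig end is counted as the number of bases between the SNP and that end; and when the highest-coverage SNP fails the positional rules, the best remaining eligible SNP is chosen, which is only one possible reading of the selection rule.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class BlastHit:
    """Hypothetical summary of one BLASTX hit."""
    query: str      # F. petiolaris transcript (contig) ID
    species: str    # species of the subject protein
    evalue: float
    pident: float   # percent identity


def single_copy_contigs(hits, max_evalue=1e-100, min_pident=70.0,
                        min_species=2):
    """Return contig IDs with passing hits to at least `min_species` species."""
    species_by_query = defaultdict(set)
    for hit in hits:
        if hit.evalue <= max_evalue and hit.pident >= min_pident:
            species_by_query[hit.query].add(hit.species)
    return {q for q, sp in species_by_query.items() if len(sp) >= min_species}


def pick_snp(snps, contig_len, end_margin=60, min_gap=20):
    """Choose one SNP position per contig from (position, coverage) pairs.

    Positions are 1-based. Returns None when no SNP meets the positional
    rules. Ties in coverage are broken in favor of the larger position.
    """
    positions = [pos for pos, _ in snps]
    eligible = []
    for pos, coverage in snps:
        far_from_ends = (pos - 1 >= end_margin
                         and contig_len - pos >= end_margin)
        gaps = [abs(pos - other) for other in positions if other != pos]
        far_from_neighbors = not gaps or min(gaps) >= min_gap
        if far_from_ends and far_from_neighbors:
            eligible.append((coverage, pos))
    if not eligible:
        return None
    return max(eligible)[1]
```

For example, `pick_snp([(30, 90), (200, 40), (400, 55)], 1000)` skips the SNP at position 30 because it lies too close to the contig start and returns 400, the eligible SNP with the higher coverage.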

Table 1.

Information for the 54 SNPs validated through genotyping a panel of 96 Ficus petiolaris individuals.


SNPs in organellar genomes were identified by performing tBLASTX against the mitochondrial genomes of Malus domestica Borkh. (GenBank no. FR714868), V. vinifera (FM179380), Ricinus communis L. (HQ874649), Carica papaya L. (EU431224), and A. thaliana (Y08501), and the chloroplast genomes of A. thaliana (NC_000932), Populus trichocarpa Torr. & A. Gray (NC_009143), and V. vinifera (NC_007957). The BLAST results for organellar genomes were filtered based on E-value (≤1e-50), identity score (≥70%), and hits to three or more mtDNA genomes, or two or more cpDNA genomes. After filtering out SNPs located near contig ends, a set of 31 putative organellar SNPs was obtained. This relatively small number of SNPs is likely due to the generally low levels of polymorphism in maternally inherited plant genomes.

The 927 nuclear and 31 organellar SNPs from F. petiolaris that were submitted to the Sequenom MASSArray System software for primer design had a minimum minor allele frequency of 9%, which should minimize the likelihood of calling a false SNP instead of a true SNP, particularly given the accuracy of the Illumina sequencing platform (Ross et al., 2013). The nuclear SNPs formed 31 multiplexes ranging from 27 to 30 loci, and the organellar SNPs formed two multiplexes of 24 loci and seven loci. The Sequenom software could not effectively multiplex the nuclear and organellar SNPs together, but given that genotypes were accurately scored, it is unlikely that loci in separate multiplexes would give rise to systematic bias. For genotyping, we selected two of the 33 total multiplexes: one nuclear multiplex of 30 loci, which had the highest confidence score (78.1%), and the organellar multiplex of 24 loci (confidence score 82.6%) (Table 1; Appendix 2).

SNPs were verified by genotyping 96 F. petiolaris individuals representing the species range in Baja California, Mexico (Appendix 1). Genomic DNA was extracted from silica-dried leaf tissue using an AutoGen Prep 740 DNA extraction robot (AutoGen, Holliston, Massachusetts, USA). DNA concentration was standardized to 20–25 ng/µL, then individuals were genotyped using the Sequenom MASSArray instrument at the ISU Genomic Technologies Facility. Of the 30 nuclear SNPs, 26 (87%) amplified successfully, of which 25 were polymorphic (Table 1). The one monomorphic SNP was likely due to poor amplification on the diversity panel (16% amplification; see Table 1). Of the 24 maternally inherited SNPs, 23 (96%) amplified successfully, of which only nine were polymorphic (Table 1). The relatively low number of polymorphic mtDNA and cpDNA SNPs may be an artifact of having a number of full siblings in our diversity panel, although further testing on additional samples is needed to verify that this is the case.
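The validation counts reported above can be tallied with a few lines of Python. The dictionary layout and the helper names `pct` and `summarize` are illustrative only; the counts themselves are taken directly from the text.

```python
def pct(part, whole):
    """Percentage rounded to the nearest whole number."""
    return round(100 * part / whole)


# Counts reported in the text for the two genotyped multiplexes.
panels = {
    "nuclear": {"genotyped": 30, "amplified": 26, "polymorphic": 25},
    "organellar": {"genotyped": 24, "amplified": 23, "polymorphic": 9},
}


def summarize(panels):
    """Per-panel amplification rate plus totals across panels."""
    summary = {
        name: pct(counts["amplified"], counts["genotyped"])
        for name, counts in panels.items()
    }
    summary["total_genotyped"] = sum(c["genotyped"] for c in panels.values())
    summary["total_amplified"] = sum(c["amplified"] for c in panels.values())
    return summary


print(summarize(panels))
# {'nuclear': 87, 'organellar': 96, 'total_genotyped': 54, 'total_amplified': 49}
```

The totals correspond to the 54 SNPs listed in Table 1 and the 49 SNPs that amplified.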


We successfully developed primers for 49 SNPs that amplified reliably in F. petiolaris individuals sampled across a broad geographic range. These SNPs can be applied to future ecological, genetic, and coevolutionary studies of F. petiolaris and its associated pollinating and nonpollinating fig wasps.



J. M. Duarte, P. K. Wall, P. P. Edger, L. L. Landherr, H. Ma, J. C. Pires, J. Leebens-Mack, and C. W. dePamphilis. 2010. Identification of shared single copy nuclear genes in Arabidopsis, Populus, Vitis and Oryza and their phylogenetic utility across various taxonomic levels. BMC Evolutionary Biology 10: 61.


M. G. Grabherr, B. J. Haas, M. Yassour, J. Z. Levin, D. A. Thompson, I. Amit, X. Adiconis, et al. 2011. Full-length transcriptome assembly from RNA-seq data without a reference genome. Nature Biotechnology 29: 644–652.


E. A. Herre, K. C. Jandér, and C. A. Machado. 2008. Evolutionary ecology of figs and their associates: Recent progress and outstanding puzzles. Annual Review of Ecology Evolution and Systematics 39: 439–458.


N. A. Joshi, and J. N. Fass. 2011. Sickle: A sliding-window, adaptive, quality-based trimming tool for FastQ files (Version 1.33) [software]. Available at [accessed 29 May 2015].


B. Langmead, and S. L. Salzberg. 2012. Fast gapped-read alignment with Bowtie 2. Nature Methods 9: 357–359.


H. Li, B. Handsaker, A. Wysoker, T. Fennell, J. Ruan, N. Homer, G. Marth, et al. 2009. The sequence alignment/map (SAM) format and SAMtools. Bioinformatics (Oxford, England) 25: 2078–2079.


A. McKenna, M. Hanna, E. Banks, A. Sivachenko, K. Cibulskis, A. Kernytsky, K. Garimella, et al. 2010. The Genome Analysis Toolkit: A MapReduce framework for analyzing next-generation DNA sequencing data. Genome Research 20: 1297–1303.


P. A. Morin, G. Luikart, R. K. Wayne, and SNP Workshop Group. 2004. SNPs in ecology, evolution and conservation. Trends in Ecology & Evolution 19: 208–216.


J. E. Pool, I. Hellmann, J. D. Jensen, and R. Nielsen. 2010. Population genetic inference from genomic sequence variation. Genome Research 20: 291–300.


M. G. Ross, C. Russ, M. Costello, A. Hollinger, N. J. Lennon, R. Hegarty, C. Nusbaum, and D. B. Jaffe. 2013. Characterizing and measuring bias in sequence data. Genome Biology 14: R51.


C. C. Steiner, A. S. Putnam, P. E. A. Hoeck, and O. A. Ryder. 2013. Conservation genomics of threatened animal species. Annual Review of Animal Biosciences 1: 261–281.


Appendix 1.

Source locality information for samples included in this study.


Appendix 2.

SNP primer table including the marker's ID, GenBank accession number (NCBI ss#), polymorphism type, sequence capture primers 1 and 2, Sequenom extension primer, and cellular location.



[1] We performed all collections under United States Department of Agriculture permit no. P587-131025-011 and International Phytosanitary Certificate no. 1129486 issued to D.D.H. Finn Piatscheck, Justin Van Goor, and Ismael Romero assisted in the field. Kevin Cavallin assisted with DNA extractions, and Whitney Pike assisted with SNP genotyping. Funding was provided by the National Science Foundation in the form of a grant issued to J.D.N. (DEB-1146312).

Nicholas G. Davis, Derek D. Houston, and John D. Nason "Transcriptome-Facilitated Development of SNPs for the Sonoran Desert Rock Fig, Ficus petiolaris (Moraceae)," Applications in Plant Sciences 3(7), (4 July 2015).
Received: 19 March 2015; Accepted: 1 April 2015; Published: 4 July 2015

Ficus petiolaris
population genomics
RNA sequencing
single nucleotide polymorphism
transcriptome sequencing