Phylotranscriptomics, or using transcriptome sequences to investigate phylogenetic relationships and gene family evolution in nonmodel plants, has gained popularity in recent years due to decreases in cost and improvements in analysis pipelines (Wickett et al., 2014; Edger et al., 2015; Li et al., 2015; Yang et al., 2015; McKain et al., 2016). It is often possible to recover at least 15,000 genes from the target species using de novo–assembled transcriptome data (Yang and Smith, 2013). Among these, approximately 5000 are shared among most species within an order (Yang et al., 2015), with the rest being tissue- and/or taxon-specific. Together they provide enormously rich data both for phylogenetic reconstruction and for investigating gene family evolution that underlies lineage-specific adaptations.
Generating plant phylotranscriptomic data has become much easier over the past few years due to improvements in sequencing and extraction protocols but may still be challenging for a variety of reasons. Previous literature on phylotranscriptomic methods has focused on RNA extraction and fragment analyses of those extracted RNA samples (Johnson et al., 2012; Yockteng et al., 2013; Jordon-Thaden et al., 2015) and sequence data analyses (Yang and Smith, 2013, 2014). However, as phylotranscriptomic studies expand to nonmodel systems that often require field sampling, the logistics of obtaining fresh tissues becomes a limiting factor. Likewise, some taxa such as cacti pose special challenges due to high levels of mucilage (Jordon-Thaden et al., 2015). Moving forward, the issues of long-term preservation and curation of cryogenic genetic materials will also be of the utmost importance for laboratories seeking to pursue these studies.
From 2012 to 2015, we conducted field expeditions to remote localities in both the southwestern United States and northern Mexico to support National Science Foundation–funded projects on the evolution of Caryophyllales and gypsum-endemic plants. Together with samples from living collections, we generated a transcriptome data set of 200 species of plants (Appendix 1). During the process we have developed an optimized workflow, which is described below. In addition, we discuss alternative procedures that we tested, as well as considerations for project planning.
METHODS AND RESULTS
Taxon sampling—The Caryophyllales phylotranscriptomics project emphasized a combination of broad taxon sampling across the order and in-depth sampling of lineages with key evolutionary transitions. These key transitions include the gain and loss of plant carnivory; the gain and loss of betalain pigmentation; transitions to saline, dry, or alpine habitats, and/or to specialized soil types; and transitions to C4 and CAM photosynthesis. Of the transcriptomes we have generated for the Caryophyllales phylotranscriptomics project, half came from field-collected tissue and the remaining half from living collections (Appendix 1). Additional transcriptomes and genomes were obtained from publicly available databases such as Phytozome (Goodstein et al., 2012), the National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA), and the 1000 Plants Initiative (1KP; Matasci et al., 2014).
Field collection—We timed our field trips to coincide with the beginning of the flowering season as much as possible to optimize the chance of obtaining young flower and leaf buds. Our experience has been that mature vegetative tissue is more difficult to work with due to its low concentration of nuclear RNA (Johnson et al., 2012) and high level of chloroplast RNA and secondary compounds compared to developing tissues. It is also important to emphasize that field conditions are more difficult to control than greenhouse conditions. While this may impose limitations for researchers wishing to study differential gene expression, this is less problematic for phylotranscriptomic studies.
Compared to tissue preservation using an RNA stabilization solution (such as RNAlater; Thermo Fisher Scientific, Waltham, Massachusetts, USA), tissue frozen in the field allows for biochemical analyses such as characterization of betalain and anthocyanin pigmentation, in addition to DNA and RNA sequencing, and hence this was our primary (and recommended) means of collection (Appendix 2). For all individuals frozen in liquid nitrogen, we also collected silica-preserved tissue from the same individual as a DNA backup, as well as herbarium specimens whenever possible. Because DNA may degrade relatively quickly for some groups in silica (e.g., Onagraceae), it is important to remove silica from the leaves once dried and place them in a −20°C freezer for long-term storage (Neubig et al., 2014).
RNA extraction (less than 6 h for six samples)—We tested five alternative RNA extraction protocols. These include TRIzol option 1 from Jordon-Thaden et al. (2015), the Aurum Total RNA Mini Kit (Bio-Rad Laboratories, Hercules, California, USA) following the manufacturer's protocol, the QIAGEN RNeasy Mini Kit (QIAGEN, Hilden, Germany) following the manufacturer's protocol, the PureLink protocol (Appendix 3; Yockteng et al., 2013), and the hot acid phenol-LiCl-RNeasy Mini Kit protocol (Appendix 4, modified from Protocol 12 of Johnson et al., 2012). We had an approximately 10–30% success rate (see below for quality control) with the Bio-Rad, QIAGEN, and TRIzol protocols, whereas the PureLink protocol had a success rate close to 100% and only failed when the sample itself was degraded or highly mucilaginous. Although more time consuming, the hot acid phenol-LiCl-RNeasy Mini Kit protocol worked well for highly mucilaginous tissues such as those of cacti (Appendix 4).
Quality control and DNase digestion (less than 3 h for 12 samples)—For quality control of RNA, we used an agarose gel for an initial assessment. If RNA was evident, removal of DNA was carried out following Jordon-Thaden et al. (2015) with minor modifications (Appendix 5). After that, we followed fig. 2 of Jordon-Thaden et al. (2015) for evaluating integrity of RNA on a 2100 Bioanalyzer (Agilent, Santa Clara, California, USA) or a Fragment Analyzer (Advanced Analytical Technologies, Ankeny, Iowa, USA). RNA concentration was measured with either a NanoDrop Spectrophotometer (Thermo Fisher Scientific) or a Qubit fluorometer (Thermo Fisher Scientific). We considered an RNA integrity number (RIN) of 6 or higher and a concentration of 20 ng/µL or higher as successful. When RNA extraction failed, it was often due to either pellet loss (resulting in a completely empty gel with no DNA or RNA trace) or degradation (which shows up as smeared ribosomal RNA bands). RNA degradation can happen during collection or shipping, or during a suboptimal extraction, for example when too much starting tissue is used. For difficult tissues that are mucilaginous, we reduced the amount of starting tissue by half.
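The pass/fail criteria above can be written as a short check. The following Python sketch is illustrative only and is not part of the original workflow: the function name `passes_rna_qc` and the sample records are hypothetical, and only the two thresholds (RIN of 6 or higher, concentration of 20 ng/µL or higher) come from the text.

```python
# Thresholds taken from the text; all names and records are illustrative.
MIN_RIN = 6.0
MIN_CONC_NG_PER_UL = 20.0


def passes_rna_qc(rin, conc_ng_per_ul):
    """Return True if a sample meets both QC criteria described in the text."""
    return rin >= MIN_RIN and conc_ng_per_ul >= MIN_CONC_NG_PER_UL


# Hypothetical sample records: (sample ID, RIN, concentration in ng/uL)
samples = [
    ("sample_A", 7.2, 85.0),
    ("sample_B", 5.1, 140.0),  # fails on integrity
    ("sample_C", 6.0, 12.0),   # fails on concentration
]

for sample_id, rin, conc in samples:
    status = "pass" if passes_rna_qc(rin, conc) else "fail"
    print(f"{sample_id}: {status}")
```

Both criteria must hold, so a concentrated but degraded sample (such as the hypothetical `sample_B`) is still flagged as failed.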
RNA samples prepared at the Brockington Laboratory at the University of Cambridge, United Kingdom, were shipped on dry ice in cardboard freezer boxes to the University of Michigan for library preparation and sequencing. Dry ice shipments were sent on Monday or Tuesday to avoid delay over the weekend.
Library preparation (less than 20 h for 12 samples)—We tested four different library preparation protocols. In 2012, we started with Illumina TruSeq version 2 (Illumina, San Diego, California, USA), with and without additional strand-specific steps (see Supplementary Methods in Yang et al., 2015). In 2013, we began using the newly released TruSeq Stranded mRNA Library Prep Kit (“the Illumina kit”; Illumina), which was more streamlined and produced much higher strand specificity than the previous stranded protocol. In 2014, we switched to the KAPA Stranded mRNA-Seq kit (“the KAPA kit”; KAPA Biosystems, Wilmington, Massachusetts, USA; Appendix 6), which is considerably cheaper than the Illumina kit with indistinguishable results in terms of both success rate and strand specificity. The KAPA kit is also more streamlined with fewer bead washing steps and required roughly 15% less time. The cost is ca. US$30 per sample for the KAPA kit itself plus ca. US$20 per sample for consumables (magnetic beads, tips, tubes, and additional chemicals; we used leftover adapters from the Illumina kit, which lasted through more than 150 additional libraries from one 48-sample Illumina kit). We modified the manufacturer's protocol slightly to accommodate the increasing read length of newer Illumina platforms (125- or 150-bp paired-end; Appendix 6).
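For project planning, the per-sample figures above add up to roughly US$50 per library. The following Python sketch shows the arithmetic; the function name is hypothetical, the two cost constants are the approximate values quoted in the text, and the estimate assumes that adapters are already on hand and excludes sequencing costs.

```python
KIT_COST_USD = 30          # ca. per sample, from the text
CONSUMABLES_COST_USD = 20  # ca. per sample, from the text


def estimate_library_prep_cost(n_samples):
    """Approximate library preparation cost in US$ for n_samples libraries.

    Excludes adapters (assumed to be on hand) and sequencing itself.
    """
    if n_samples < 0:
        raise ValueError("n_samples must be non-negative")
    return n_samples * (KIT_COST_USD + CONSUMABLES_COST_USD)


print(estimate_library_prep_cost(12))  # 600
```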
Quality control of the library was done at the University of Michigan DNA Sequencing Core using an Agilent 2100 Bioanalyzer followed by confirmation using qPCR. Although the minimal concentration of the library and percentage of adapter contamination allowed differ among sequencing platforms, we followed a few general rules. First, the peak of the library fragment size distribution should be approximately the read length plus adapter size. For example, for paired-end 125-bp sequencing on Illumina platforms, the peak of the library size distribution should be approximately 60 bp (adapter) + 125 bp (read) in each direction, making a total of 370 bp for the optimum library size (see Appendix 6 for modifications in library preparation to adjust library sizes). Second, although we did not quantify the library concentration in the laboratory, we visualized the library by loading 3 µL of library mixed with GelRed fluorescent stain (Biotium, Fremont, California, USA) onto a 1.5% agarose gel. As a rule of thumb, if the libraries were visible on the gel (even if only barely visible), they were sent to the DNA Sequencing Core for further quantification. Libraries were walked to the on-campus University of Michigan DNA Sequencing Core immediately at ambient temperature, or stored at −20°C for less than a month before being walked to the sequencing core at ambient temperature.
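The library-size rule of thumb above is simple arithmetic and can be sketched in Python as follows. The function name is hypothetical; the 60-bp adapter size and the 125-bp worked example come from the text, whereas the 150-bp value is an extrapolation under the same adapter assumption and is not a figure reported here.

```python
def target_library_peak_bp(read_length_bp, adapter_bp=60):
    """Approximate optimum peak of the library fragment size distribution.

    Follows the rule of thumb in the text: (adapter + read) in each
    direction of a paired-end run.
    """
    return 2 * (adapter_bp + read_length_bp)


print(target_library_peak_bp(125))  # 370, the worked example in the text
print(target_library_peak_bp(150))  # 420, extrapolated with the same 60-bp adapter
```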
Sample curation (less than 1 h per sample)—We store all RNAs in a −80°C freezer on standard storage racks. Ideally, they would be stored long-term in liquid nitrogen vapor freezers. To prevent unnecessary freeze/thaw cycles of sensitive samples, we place samples into labeled cardboard freezer boxes and record the sample locations in a database that is properly backed up (Appendix 7).
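One way to record sample locations is a small relational table. The following Python sketch uses the standard-library `sqlite3` module. It is an assumption-laden illustration and not the database described in Appendix 7: the table name, column names, and the example entry are all hypothetical.

```python
import sqlite3

# In-memory database for illustration only; a real record would live in
# a file that is backed up regularly.
conn = sqlite3.connect(":memory:")

# Hypothetical schema: one row per tube, one tube per freezer slot.
conn.execute(
    """
    CREATE TABLE rna_samples (
        sample_id TEXT PRIMARY KEY,
        taxon     TEXT NOT NULL,
        freezer   TEXT NOT NULL,
        rack      TEXT NOT NULL,
        box       TEXT NOT NULL,
        position  TEXT NOT NULL,
        UNIQUE (freezer, rack, box, position)
    )
    """
)


def locate(sample_id):
    """Return (freezer, rack, box, position) for a sample, or None if absent."""
    return conn.execute(
        "SELECT freezer, rack, box, position FROM rna_samples WHERE sample_id = ?",
        (sample_id,),
    ).fetchone()


# Hypothetical entry
conn.execute(
    "INSERT INTO rna_samples VALUES (?, ?, ?, ?, ?, ?)",
    ("RNA_0001", "Example taxon", "minus80_A", "rack_2", "box_05", "C4"),
)
print(locate("RNA_0001"))
```

Looking up the box and position before opening the freezer keeps retrieval short, which is the point of recording locations in the first place.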
We have developed an effective phylotranscriptomics workflow involving cryogenic tissue collection in the field, RNA extraction of diverse taxa with close to 100% success rate, library preparation for Illumina platforms, and sample storage and curation. Future efforts should focus on streamlining the workflow given specific laboratory and field settings and as sequencing technologies continue to evolve. In addition, it would be ideal to collaborate with major tissue and seed banks such as the Millennium Seed Bank (Royal Botanic Gardens, Kew) and the Global Genome Initiative (Smithsonian Institution) (Gostel et al., 2016) when designing phylotranscriptomic projects.
The authors thank H. Flores Olvera, H. Ochoterena, N. Douglas, A. Clifford, S. Lavergne, T. Stoughton, N. Jensen, W. Judd, U. Eggli, G. Kadereit, R. Puente, L. Majure, D. Warmington, S. Pedersen, and K. Thiele for assisting with obtaining plant materials; the Bureau of Land Management, U.S. Forest Service, California State Parks, Missouri Botanical Garden, Rancho Santa Ana Botanic Garden, Desert Botanical Garden, The Kampong of the National Tropical Botanical Garden, Sukkulenten-Sammlung Zürich, Cairns Botanic Gardens, Botanischen Gartens–Technische Universität Dresden, Millennium Seed Bank, and Booderee National Park for granting access to their plant materials; and M. R. M. Marchán-Rivadeneira, L. Cortés Ortiz, S. Ahluwalia, J. Olivieri, V. S. Mandala, R. Mostow, M. Croley, L. Leatherman, R. Cronn, M. Parks, T. Jennings, and I. Jordon-Thaden for help with developing laboratory protocols. The molecular work of this study was conducted in part in the Genomic Diversity Laboratory at the University of Michigan. Support came from the University of Michigan, Oberlin College, the National Geographic Society, a Chateaubriand Fellowship, and the U.S. National Science Foundation (DGE 1144254, DEB 1054539, DEB 1352907, and DEB 1354048). Fieldwork by H.E.M. was supported in part by the ERA-Net BiodivERsA project “WhoIsNext,” with the national funders Agence Nationale de la Recherche (ANR; ANR-13-EBID-0004), Deutsche Forschungsgemeinschaft (DFG), and Fonds zur Förderung der wissenschaftlichen Forschung (FWF); the Joseph-Fourier Alpine Station provided lodging and logistic support.
- Blanco, M. A., W. M. Whitten, D. S. Penneys, N. H. Williams, K. M. Neubig, and L. Endara. 2006. A simple and safe method for rapid drying of plant specimens using forced-air space heaters. Selbyana 27: 83–87.
- Brockington, S. F., Y. Yang, F. Gandia-Herrero, S. Covshoff, J. M. Hibberd, R. F. Sage, G. K. S. Wong, et al. 2015. Lineage-specific gene radiations underlie the evolution of novel betalain pigmentation in Caryophyllales. New Phytologist 207: 1170–1180.
- Edger, P. P., H. M. Heidel-Fischer, M. Bekaert, J. Rota, G. Glöckner, A. E. Platts, D. G. Heckel, et al. 2015. The butterfly plant arms-race escalated by gene and genome duplications. Proceedings of the National Academy of Sciences, USA 112: 8362–8366.
- Goodstein, D. M., S. Q. Shu, R. Howson, R. Neupane, R. D. Hayes, J. Fazo, T. Mitros, et al. 2012. Phytozome: A comparative platform for green plant genomics. Nucleic Acids Research 40: D1178–D1186.
- Gostel, M. R., C. Kelloff, K. Wallick, and V. A. Funk. 2016. A workflow to preserve genome-quality tissue samples from plants in botanical gardens and arboreta. Applications in Plant Sciences 4: 1600039.
- Johnson, M. T. J., E. J. Carpenter, Z. Tian, R. Bruskiewich, J. N. Burris, C. T. Carrigan, M. W. Chase, et al. 2012. Evaluating methods for isolating total RNA and predicting the success of sequencing phylogenetically diverse plant transcriptomes. PLoS ONE 7: e50226.
- Jordon-Thaden, I. E., A. S. Chanderbali, M. A. Gitzendanner, and D. E. Soltis. 2015. Modified CTAB and TRIzol protocols improve RNA extraction from chemically complex Embryophyta. Applications in Plant Sciences 3: 1400105.
- Li, Z., A. E. Baniaga, E. B. Sessa, M. Scascitelli, S. W. Graham, L. H. Rieseberg, and M. S. Barker. 2015. Early genome duplications in conifers and other seed plants. Science Advances 1: e1501084.
- Matasci, N., L.-H. Hung, Z. Yan, E. Carpenter, N. Wickett, S. Mirarab, N. Nguyen, et al. 2014. Data access for the 1,000 Plants (1KP) project. GigaScience 3: 17.
- McKain, M. R., H. Tang, J. R. McNeal, S. Ayyampalayam, J. I. Davis, C. W. dePamphilis, T. J. Givnish, et al. 2016. A phylogenomic assessment of ancient polyploidy and genome evolution across the Poales. Genome Biology and Evolution 8: 1150–1164.
- Neubig, K. M., W. M. Whitten, J. R. Abbott, S. Elliott, D. E. Soltis, and P. S. Soltis. 2014. Variables affecting DNA preservation in archival plant specimens. In W. L. Applequist and L. M. Campbell [eds.], DNA banking for the 21st century: Proceedings of the U.S. Workshop on DNA Banking, 81–112. William L. Brown Center, Missouri Botanical Garden, St. Louis, Missouri, USA.
- Wickett, N. J., S. Mirarab, N. Nguyen, T. Warnow, E. Carpenter, N. Matasci, S. Ayyampalayam, et al. 2014. Phylotranscriptomic analysis of the origin and early diversification of land plants. Proceedings of the National Academy of Sciences, USA 111: E4859–E4868.
- Yang, Y., and S. A. Smith. 2013. Optimizing de novo assembly of short-read RNA-seq data for phylogenomics. BMC Genomics 14: 328.
- Yang, Y., and S. A. Smith. 2014. Orthology inference in non-model organisms using transcriptomes and low-coverage genomes: Improving accuracy and matrix occupancy for phylogenomics. Molecular Biology and Evolution 31: 3081–3092.
- Yang, Y., M. J. Moore, S. F. Brockington, D. E. Soltis, G. K.-S. Wong, E. J. Carpenter, Y. Zhang, et al. 2015. Dissecting molecular evolution in the highly diverse plant clade Caryophyllales using transcriptome sequencing. Molecular Biology and Evolution 32: 2001–2014.
- Yockteng, R., A. M. R. Almeida, S. Yee, T. Andre, C. Hill, and C. D. Specht. 2013. A method for extracting high-quality RNA from diverse plants for next-generation sequencing and gene expression analyses. Applications in Plant Sciences 1: 1300070.
RNA extraction for mucilage tissue using hot acid phenol-LiCl-RNeasy Mini Kit (ca. 2 days). Notes and modifications from Protocol 12 in appendix S1, Johnson et al. (2012). Prepared by Alfonso Timoneda and Tao Feng.
DNase digestion (∼1 h). Modified from the manufacturer's protocol and from Jordon-Thaden et al. (2015). Prepared by Ya Yang.
Stranded mRNA library preparation (ca. 2 d for 12 libraries and 2.5 d for 20 libraries). Prepared by Ya Yang and Michael Moore.