Despite the great importance of members of the Brassicaceae in agriculture and the extensive genomic resources available for Arabidopsis thaliana (L.) Heynh., our knowledge of phylogenetic relationships within the family is still unclear in many clades (Franzke et al., 2011). Among the hurdles to elucidating these relationships are extensive gene duplication and polyploidization, as well as past and recent hybridization (Franzke et al., 2011). The main lineages have been identified using a variety of regions (e.g., ITS, nad4, ndhF, phyA, Adh, chs, matK, rbcL, and trnLF). However, relationships at shallower levels (e.g., within some tribes or at the species level) are often poorly resolved. New genomic tools have much to contribute to our understanding of evolution in Brassicaceae, but to date, technological, analytical, and logistical limitations have slowed the wide-scale application of genomic approaches to phylogenetics (Egan et al., 2012). Thus, phylogenetic studies at the species level or of rapidly diversified groups still rely widely on primer development for individual markers.
We developed a strategy to identify and design primers for single-copy nuclear genes (SCNGs), focusing on Streptanthus Nutt. and other members of the tribe Thelypodieae, whose phylogenetic relationships and circumscription have remained a challenge (Warwick et al., 2010). Our strategy combines Illumina reads from genomic scans (low-depth-coverage sequencing of total genomic DNA) and public expressed sequence tags (ESTs) from Brassica L., a close relative of the Thelypodieae, with results from previous studies that identified putative SCNGs at wider taxonomic scales using algorithmic methods. We report several primer combinations that may be useful for resolving relationships within and across groups in the Thelypodieae.
METHODS AND RESULTS
Our approach to identifying SCNGs is outlined in Fig. 1. We cross-referenced the results of three previous studies that used algorithms to identify putative SCNGs with published ESTs as follows: for the APVO loci (file 1471-2148-10-61-S1.xls from Duarte et al., 2010), we kept only those loci reported to have introns and be SCNGs in A. thaliana; for the COSII set (file available at: http://solgenomics.net/documents/markers/cosii.xls), we included all that were single-copy in A. thaliana; for the PPR genes (file NPH_2739_sm_TableSl.xls from Yuan et al., 2009), we kept those unique in rice and Arabidopsis; and we included all ESTs between B. napus and A. thaliana (Ilut and Doyle, 2012) after removing duplicates. Our final matrix contained 10,817 entries (APVO, 5381; COSII, 2869; PPR, 90; ESTs, 2477), corresponding to 8010 unique loci. Most loci (5596; 69.86%) were represented by a single source, 25% (2025) were represented by two sources, 5% (385) by three, and only 0.05% (4) were present in all four sources. We selected loci for primer design at random and verified that the following four criteria were met for each locus (if not, we picked another locus): (1) it was identified as an SCNG by multiple sources in the matrix above; (2) it was represented by a single gene model in the A. thaliana genome (TAIR10); (3) it had an estimated length of 600–1200 bp to allow for assembly from a single pass of Sanger sequencing; and (4) it possessed 40–60% intron content to maximize potential phylogenetic utility at species-level relationships (Rodríguez et al., 2009). In addition, we chose loci to span all five A. thaliana linkage groups.
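The cross-referencing and screening steps described above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure we actually ran: the locus IDs are toy values, and the record fields (`gene_models`, `length_bp`, `intron_fraction`) are hypothetical names introduced only for this example. The final lines check that the per-source tallies reported in the text are arithmetically consistent, assuming that the 10,817 figure counts entries summed across the four sources.

```python
from collections import Counter

def count_sources(source_to_loci):
    """Map each locus ID to the number of source lists that contain it."""
    counts = Counter()
    for loci in source_to_loci.values():
        counts.update(set(loci))  # set() so a locus counts once per source
    return counts

def passes_criteria(record, n_sources):
    """Apply the four selection criteria to one locus record.

    `record` is a dict with hypothetical keys (assumptions of this sketch):
    gene_models, length_bp, intron_fraction.
    """
    return (
        n_sources >= 2                                 # (1) multiple sources
        and record["gene_models"] == 1                 # (2) single gene model
        and 600 <= record["length_bp"] <= 1200         # (3) length 600-1200 bp
        and 0.40 <= record["intron_fraction"] <= 0.60  # (4) 40-60% intron
    )

# Toy input with made-up locus IDs (not real data from the study).
sources = {
    "APVO": ["AT1G00001", "AT2G00002", "AT3G00003"],
    "COSII": ["AT1G00001", "AT3G00003"],
    "PPR": ["AT4G00004"],
    "EST": ["AT1G00001", "AT5G00005"],
}
n_sources = count_sources(sources)
tally = Counter(n_sources.values())  # number of loci found in 1, 2, 3... sources

# Consistency check on the tallies reported in the text.
reported = {1: 5596, 2: 2025, 3: 385, 4: 4}
unique_loci = sum(reported.values())
total_entries = sum(k * n for k, n in reported.items())
percentages = {k: round(100 * n / unique_loci, 2) for k, n in reported.items()}
```

With the reported tallies, `unique_loci` is 8010 and `total_entries` is 10,817, which matches the sum of the four per-source counts and explains why the percentages are computed over 8010 loci.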
We selected 15 loci for primer design. We designed primers based on alignments of genomic scans generated from low-coverage Illumina sequencing of total genomic DNA (Illumina GAIIx [Illumina Inc., San Diego, California, USA], 80 bp reads) of B. rapa L. (2× coverage) and B. oleracea L. (9× coverage; L. Comai, unpublished data) mapped onto the A. thaliana genome using the Burrows-Wheeler Alignment tool (BWA) (Li and Durbin, 2009) and visualized in IGV version 1.5 (Thorvaldsdottir et al., 2013). We located the selected regions based on their locus ID and followed standard primer design guidelines, aiming for primers with a length of 22–25 bp, 40–60% GC content, a melting temperature (Tm) of 55–62°C, and a 3′ GC clamp, and without repeats, runs, or secondary structures such as hairpins, dimers, and cross-dimers. Prior to testing in the laboratory, we tested primer performance in silico using Amplify 3X version 3.1.4 (http://engels.genetics.wisc.edu/amplify/). We designed 250 primer combinations for the 15 selected regions and chose 52 to test in the laboratory.
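The sequence-level design guidelines listed above (length, GC content, Tm, 3′ GC clamp, and absence of single-base runs) lend themselves to a simple automated screen. The sketch below is an illustration only. The Tm estimate uses a common basic approximation, Tm = 64.9 + 41 × (G + C − 16.4) / N, which is an assumption of this example and not necessarily the method used in our design; the run threshold (`max_run=4`) and the example primer are likewise hypothetical. Secondary structures (hairpins, dimers, cross-dimers) are not evaluated here and would require dedicated software.

```python
import re

def gc_fraction(seq):
    """Fraction of G and C bases in the primer."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)

def basic_tm(seq):
    """Rough Tm (deg C) from the basic GC-content formula.

    Assumption of this sketch: Tm = 64.9 + 41 * (G + C - 16.4) / N.
    """
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    return 64.9 + 41.0 * (gc - 16.4) / len(seq)

def has_long_run(seq, max_run=4):
    """True if any single base is repeated more than max_run times in a row."""
    return re.search(r"(.)\1{%d,}" % max_run, seq.upper()) is not None

def check_primer(seq):
    """Return a dict of pass/fail flags for the guidelines in the text."""
    return {
        "length_22_25": 22 <= len(seq) <= 25,
        "gc_40_60": 0.40 <= gc_fraction(seq) <= 0.60,
        "tm_55_62": 55.0 <= basic_tm(seq) <= 62.0,
        "gc_clamp_3prime": seq.upper()[-1] in "GC",
        "no_long_runs": not has_long_run(seq),
    }

# Hypothetical 24-mer used only to exercise the checks (not one of our primers).
example = "ATGCTAGCTAGGATCCATTGACGC"
report = check_primer(example)
```

A candidate would be carried forward to in silico testing only if every flag in `report` is true.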
Between one and five primer pairs for each of the 15 selected regions were tested for single-band amplification in a set of taxa spanning several genera in the Thelypodieae. Here, we report statistics on primer combinations that consistently yielded a single band and whose product generated a clean sequence in at least 70% of taxa tested (five loci), as well as sequences for a few primer sets that could be of utility with additional optimization or in a different subset of taxa (Tables 1 and 2).
Laboratory: Genomic DNA was extracted from tissue dried in silica gel using either the cetyltrimethylammonium bromide (CTAB) method (Doyle and Doyle, 1987) or the DNeasy Plant Mini Kit (QIAGEN, Valencia, California, USA). PCR reactions consisted of 5 µL of 5× Green GoTaq Reaction Buffer (M791A; Promega Corporation, Madison, Wisconsin, USA), 0.5 µL of dNTP mix (10 mM each), 0.5 µL of each primer (10 µM), and 0.2 µL (5 units/µL) of GoTaq (M3001; Promega Corporation) in a total volume of 25 µL. Cycling conditions are presented in Table 2. Bidirectional sequencing was performed at Beckman Coulter Genomics (Danvers, Massachusetts, USA). When more than one band amplified, we isolated bands, reamplified, and sequenced directly. If cloning was necessary, PCR products were gel-purified (QIAquick Gel Extraction Kit, QIAGEN), ligated into pGEM-T Vector (Promega Corporation), cloned into E. coli DHB-5α-competent cells (Invitrogen, Carlsbad, California, USA), reamplified (eight colonies per PCR product), and sequenced.
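For convenience, the PCR recipe above can be scaled to a master mix with a short script. This is a sketch under stated assumptions: the per-reaction volumes are those given in the text, each reaction is assumed to contain one forward and one reverse primer, and the 10% pipetting overage is a hypothetical default. The volume left over after the listed reagents is to be filled with template DNA and water, whose individual volumes are not specified in the text.

```python
# Per-reaction volumes (µL) as stated in the text; total reaction volume 25 µL.
PER_REACTION_UL = {
    "5x Green GoTaq Reaction Buffer": 5.0,
    "dNTP mix (10 mM each)": 0.5,
    "forward primer (10 µM)": 0.5,   # assumes one primer pair per reaction
    "reverse primer (10 µM)": 0.5,
    "GoTaq polymerase (5 units/µL)": 0.2,
}
TOTAL_UL = 25.0

def remaining_volume(per_reaction=PER_REACTION_UL, total=TOTAL_UL):
    """Volume (µL) left for template DNA plus water in one reaction."""
    return round(total - sum(per_reaction.values()), 3)

def master_mix(n_reactions, overage=0.10, per_reaction=PER_REACTION_UL):
    """Scale each reagent to n_reactions, adding a fractional overage.

    The 10% overage default is a hypothetical choice for this sketch.
    """
    scale = n_reactions * (1.0 + overage)
    return {name: round(vol * scale, 2) for name, vol in per_reaction.items()}

left_over = remaining_volume()
mix_for_8 = master_mix(8)
```

Here `left_over` is the per-reaction volume available for template and water, and `mix_for_8` lists the reagent volumes to combine for eight reactions.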
Table 1. Primer regions that amplify a single band and yield clean sequences (first five) and others that might be useful on a clade-by-clade basis.
Table 2. Summary of parsimony-informative characters for those regions for which we obtained sequence data (due to financial limitations we could only sequence a reduced number of amplicons). For those taxa where cloning (see Appendix 1) was necessary, the allele that yielded the shortest tree was selected.
Sequences were assembled and edited in Sequencher version 4.7 (Gene Codes Corporation, Ann Arbor, Michigan, USA). Potential PCR recombinants, assessed by manual examination of the sequences, were excluded. Alignment was performed manually in MacClade version 4.08 (Maddison and Maddison, 2002), and the proportion of parsimony-informative characters was calculated in PAUP* version 4.0b10 (Swofford, 2002).
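To make the summary statistic concrete, the sketch below computes the proportion of parsimony-informative characters from an alignment using the standard definition: a column is informative if at least two character states each occur in at least two taxa. It is a simplified illustration rather than a reimplementation of PAUP*; gaps and the symbols `?` and `N` are simply ignored (an assumption of this example), other ambiguity codes are not handled, and the taxon names and sequences are toy values.

```python
from collections import Counter

def is_parsimony_informative(column, ignore="-?N"):
    """True if at least two states each occur in at least two taxa."""
    counts = Counter(base for base in column.upper() if base not in ignore)
    return sum(1 for n in counts.values() if n >= 2) >= 2

def proportion_informative(alignment):
    """Proportion of parsimony-informative columns in an alignment.

    `alignment` maps taxon name -> aligned sequence (all of equal length).
    """
    seqs = list(alignment.values())
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all sequences must have the same aligned length")
    informative = sum(
        is_parsimony_informative("".join(s[i] for s in seqs))
        for i in range(length)
    )
    return informative / length

# Toy alignment with hypothetical taxon labels (not data from this study).
toy = {
    "taxon1": "ACGTAC",
    "taxon2": "ACGTTC",
    "taxon3": "ATGAAG",
    "taxon4": "ATGCT-",
}
prop = proportion_informative(toy)
```

In the toy alignment, only the second and fifth columns are informative, so `prop` is 2/6.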
We have corroborated the utility of the SCNGs reported here by using a subset to estimate phylogenies of the “Streptanthoid” complex and its allies, a group that has been subject to several substantial taxonomic revisions and whose phylogenetic relationships have remained poorly understood. While these results are beyond the scope of this paper and will be reported separately (Cacho et al., in prep.), given the level of phylogenetically informative variation that we observe (Table 2; Appendix 1), we are confident that the SCNGs we contribute here will be useful to infer species-level phylogenies in several clades of the Thelypodieae and potentially of the Brassicaceae as a whole. These improved phylogenies could be an important stepping stone to facilitate comparative evolutionary studies in these clades until technological advances allow straightforward implementation of new sequencing technologies for low-cost phylogenetic studies.
Using a strategy that combines results from previous algorithmic studies identifying putative SCNGs with genomic resources from published ESTs and Illumina genomic scans, we have identified and designed primers for several SCNGs that are of phylogenetic utility. Our primers yield sequences that are informative for phylogenies at and above the species level in most species of Thelypodieae and Sisymbrieae we tested, including, when possible, two or more species of Streptanthus, Streptanthella Rydb., Caulanthus S. Watson, Guillenia Greene, Stanleya Nutt., Sisymbrium L., Thelypodium Endl., and Thysanocarpus Hook. Given that we designed primers based on Arabidopsis and Brassica sequences, they are also likely to be useful for understanding relationships among members of Camelineae, and potentially across Brassicaceae as a whole.
We thank Luca Comai for generously sharing the Brassica BAM files and providing laboratory space. Support for this study comes from the National Science Foundation (DEB 0919559 to S.Y.S.), Plant Genome Program award DBI 0733857 “Functional genomics of plant polyploids” (L. Comai), and a Consejo Nacional de Ciencia y Tecnología (CONACyT) fellowship to N.I.C. (EPSCI no. 187083).