Polyploidy (whole genome duplication) is a very important process that has shaped flowering plant evolutionary history (Soltis et al., 2009). Much progress in the study of polyploid evolution has been made in the past two decades regarding both ancient paleopolyploidization (Doyle et al., 2008; Soltis et al., 2009) as well as very recent neopolyploidization (Buggs et al., 2009; Abbott et al., 2013). An important research gap (Soltis et al., 2009) is understanding polyploids of intermediate age that have diploid ancestors in the same genus, so-called mesopolyploids, which are characterized by diploid-like reproduction but whose parental subgenomes are still discernible (Mandáková et al., 2010).
Several mesoallopolyploid crop systems (e.g., cotton, soybean, tobacco, wheat) are becoming well understood and have excellent genetic resources; however, understanding natural systems is also important. Specifically, studying natural mesopolyploid species radiations may be key to understanding the importance of polyploidy in angiosperm diversification (Soltis et al., 2009). Recent plant species radiations are a significant contributor to generating plant biodiversity, and evidence suggests that polyploidy has played an important role in these radiations (Mayrose et al., 2011). Many fundamental and biologically interesting questions regarding polyploidy and diversification in plants are yet to be investigated in such systems (Doyle et al., 2008; Soltis et al., 2009; Mayrose et al., 2011).
The large, nearly cosmopolitan genus Veronica L. (Plantaginaceae) comprises approximately 450 species of annual and perennial herbs, shrubs, and small trees with centers of diversity in both Eurasia and New Zealand. The genus is an excellent example of a natural mesopolyploid (∼20 million years old) system comprising multiple lineages, including several recent species radiations, in which polyploidy and hybridization have accompanied diversification (Albach et al., 2008; Meudt et al., 2015). Northern Hemisphere Veronica species are diploids or polyploids with chromosome numbers ranging from 2n = 14−80 and base numbers of x = 6−9 and 17 (Albach et al., 2008). By contrast, Southern Hemisphere species—which evolved as a single lineage from Northern Hemisphere ancestors ∼10 million years ago (Wagstaff et al., 2002; Albach and Meudt, 2010; Meudt et al., 2015)—all have high chromosome numbers (2n = 40−124) with base chromosome numbers of x = 20 or 21 (Albach et al., 2008). Several studies focusing on Veronica in both hemispheres have used standard DNA sequencing and amplified fragment length polymorphism (AFLP) fingerprinting techniques to elucidate patterns of relationship from phylogeography (Meudt and Bayly, 2008) to phylogeny of the genus as a whole (Wagstaff et al., 2002; Albach and Meudt, 2010; Meudt et al., 2015) or of particular polyploid complexes (e.g., Albach, 2007), and used these to infer the evolution of chromosome number, genome size, breeding systems, and habit (Albach and Greilhuber, 2004; Albach et al., 2008; Meudt et al., 2015). Nevertheless, a lack of variable genetic markers using standard DNA sequencing and genotyping techniques, and a lack of appropriate phylogenetic analysis methodologies that can incorporate reticulate evolution and allopolyploids, have hampered further progress in studies of Veronica and polyploid evolution at the population, species, and generic levels.
It has been known for some time that low-copy nuclear (LCN) markers can be extremely useful for phylogenetic reconstruction at the genus (interspecific) level, including for elucidating the evolutionary history of polyploids, for which standard uniparental DNA sequencing markers from chloroplast DNA or the internal transcribed spacer (ITS) region are not informative (e.g., Sang, 2002). Apart from LCN markers, microsatellites or simple sequence repeat (SSR) markers are useful for closely related species when traditionally genotyped and analyzed for studies at the infrageneric level, but SSRs and their flanking regions may also be useful as phylogenetic markers when high-throughput sequenced (Chatrou et al., 2009; Germain-Aubrey et al., 2016).This, however, requires new bioinformatic tools such as the workflows MarkerMiner (Chamala et al., 2015) and QDD (Meglécz et al., 2014) for the development of LCN and SSR markers, respectively, using genomic and transcriptomic resources.
High-throughput de novo transcriptome sequencing, or RNA-Seq, has proven to be an excellent source of genetic data for gene characterization and marker development in studies of natural systems with little or no additional genetic resources available (Stickler et al., 2012; Alvarez et al., 2015), as is the case for Veronica. The benefits of RNA-Seq are simultaneous characterization of genes and gene expression, reduced representation for large, complex genomes, and the generation of large amounts of sequence data without a reference genome. RNA-Seq also presents its challenges, particularly assembly without a reference genome, and assembly of polyploid genomes. Polyploid transcriptome assembly is an active area of research. A major issue is the differentiation of homoeologs from orthologs. Some studies have tested different pipelines, such as combining multiple k-mer assemblies in polyploid wheat (Krasileva et al., 2013), or combining assemblies from different assemblers and then using a second step to cluster redundant contigs in polyploid tobacco (Nakasugi et al., 2014). However, there are few examples to date of comparisons in natural, noncrop systems with few prior genomic resources. To date, there are no clear answers regarding which assembler, combination of assemblers, or assembly pipeline is best for polyploids and their diploid progenitors or close relatives.
The aim of our study was, therefore, twofold. First, we aimed to generate transcriptomic data for Veronica; second, we aimed to use these transcriptomic resources to develop and validate phylogenetically informative sequencing markers. Specifically, in this paper, we generate the first transcriptome resources for the genus Veronica, using short-read Illumina HiSeq 2000 (Illumina, San Diego, California, USA) sequencing of eight individuals representing seven species and five different ploidy levels. We then assemble, identify, and broadly characterize and compare a large number of expressed sequences. Single-nucleotide polymorphisms (SNPs) are mined from transcriptomes of these eight individuals plus those of two additional Plantaginaceae outgroups (Plantago ovata Forssk. and Picrorhiza kurrooa Royle ex Benth., available from public databases) and compared using phylogenetic and network analyses. Secondly, we used the transcriptomic data to discover, design, and develop two types of genetic markers (i.e., LCN and SSR markers). To test the success of our approach, we then used microfluidic PCR and Illumina MiSeq to validate 48 loci in 48 individuals for both LCN and SSR markers. We provide examples of sequence alignments and downstream phylogenetic analyses for representative loci showing their potential phylogenetic utility in Veronica when resequenced with high-throughput sequencing. The resource and marker development of the current study provide new, variable markers for future evolutionary studies of the genus. Furthermore, a parallel study currently underway will further examine assembly methods and analyze the transcriptomes themselves to quantify and compare underlying interspecific gene divergence and investigate the timing and mode of polyploidy in the sampled Veronica polyploids and their close relatives (Meudt et al., unpublished data). The current study is thus a critical first step toward ultimately understanding the role of polyploidy in generating novel genetic and morphological variation that leads to adaptation and species diversification (Doyle et al., 2008; Soltis and Soltis, 2009).
MATERIALS AND METHODS
RNA extraction, cDNA library prep, and Illumina sequencing—We sampled leaf tissue from seven greenhouse-grown individuals and from one field-collected individual representing seven species of two polyploid complexes in Veronica from New Zealand and Eurasia with three ploidy levels each (Appendix 1). The field-collected material was stored at −80°C on RNAlater (Life Technologies, Carlsbad, California, USA). Because we wanted to take a broad approach to analyze polyploidy in Veronica and develop markers for the entire genus, we sampled multiple species in two divergent lineages, rather than multiple individuals per species. Cultivated plants were grown in the same greenhouse in Oldenburg, Germany. Leaf material was harvested and placed directly into tubes with liquid nitrogen, stored at −80°C until extraction, and ground to a powder with a prechilled mortar and pestle while adding liquid nitrogen. RNA was extracted using the RNeasy kit (QIAGEN GmbH, Hilden, Germany) following manufacturer's instructions using 500 µL RLC buffer with 4% PVP and 1% β-mercaptoethanol. A DNase I digest and RNase inhibitor reaction was performed using 0.5 µL (20 units) RNase inhibitor, 6.0 µL 10× DNase I buffer, and 1.0 µL DNase I to the resulting 60 µL RNA extract and incubated at 37°C for 15 min. Then, 2.6 µL EDTA (0.2 M, pH = 8; final conc. 8 mM) was added, incubated for 10 min at 75°C, and the RNA was reprecipitated by adding 1:10 3 M sodium acetate, 2.5 volume 100% ethanol, incubating on ice for 20 min, centrifuging at full speed for 5 min, washing with 100 µL 75% ethanol, centrifuging at full speed, air-drying the resultant pellet for 10–15 min, redissolving in 25 µL RNase-free water, and storing at −80°C. Small aliquots of raw RNA extract and the reprecipitated RNA extract were run on the Tecan Infinite Pro F200 (Tecan, Crailsheim, Germany) and Agilent 2100 Bioanalyzer (Agilent Technologies, Waldbronn, Germany) to measure RNA quality and quantity. RNA from eight individuals with RNA Integrity Number (RIN) of 6.8 or greater, 260:280 ratio between 1.9–2.1, and at least 50 ng/µL (Appendix 1) were sent to BGI (BGI-Hong Kong Co. Ltd, Hong Kong, China) for Illumina TruSeq cDNA library preparation on normalized RNA and high-throughput Illumina HiSeq 2000 100-bp paired-end de novo transcriptome sequencing. The transcriptomic data generated here are publicly available in the National Center for Biotechnology Information (NCBI) Sequence Read Archive for submission SPR074674 and the Trinity assemblies in the NCBI Transcriptome Shotgun Assembly Sequence Archive (Table 1; http://www.ncbi.nlm.nih.gov/sra/SRP074674).
Table 1.
Information about Illumina sequencing reads and Trinity assemblies for the eight individuals of Veronica sampled.
Quality control, preprocessing of reads, assembly, and Blast2GO analyses— The following analyses were carried out on each of the eight individuals separately. Demultiplexed Illumina sequencing results were retrieved in FASTQ format via FTP from BGI. Between 12.8 and 13.5 million paired-end reads were generated per individual in both the forward and reverse directions (Table 1), from which single reads, adapters, and reads with a quality score (QC) cutoff of less than 20 had already been removed. After testing the effect of different QC cutoffs on the resulting sequence reads and assemblies of V. trichadena Jord. & Fourr., we used QC = 40 in the bash script TrimClip.sh (De Wit et al., 2012) to remove reads QC < 40. Reads were screened for contaminant sequences from H. sapiens, E. coli, mtDNA, and cpDNA using mirabait (MIRALIB version 4.0; Chevreux et al., 1999) with default settings, the respective databases downloaded from NCBI, and then removed. We used QualityStats.sh (De Wit et al., 2012) and the Galaxy web interface (Afgan et al., 2016) to summarize quality score and nucleotide distribution data for the forward and reverse reads, CollapseDuplicateCount.sh (De Wit et al., 2012) to calculate the fraction of duplicate reads and singletons, PECombiner.sh (De Wit et al., 2012) to remove orphan reads and put remaining reads in the same order in forward and reverse files, and the Velvet helper script shuffleSequences_fastQ.pl to put those two files together in one interleaved file (necessary for Velvet/Oases assembly). The resulting clean sequence reads were assembled de novo using several different assemblers including Trinity, trans- ABySS, SOAPdenovo-Trans, and Velvet/Oases. Relative to the other assemblers, Trinity produced more hits with >80% similarity to contigs >600 bp against Arabidopsis thaliana (L.) Heynh. (data not shown; comparisons done using MarkerMiner 1.0 [Chamala et al., 2015]). Therefore, we chose the de novo assemblies produced using Trinity version r20140717 (Grabherr et al., 2011, compiled for 64-bit Ubuntu) using default settings on the resulting clean sequence reads. For the purposes of marker development, a highly accurate discrimination of homoeologs in polyploids is not necessary at the transcriptome assembly stage, as the discrimination is done in the second resequencing step. Additional comparisons of different assemblers and assembly pipelines, particularly regarding polyploid transcriptomes, were outside the scope of the current study and will be addressed in a subsequent study (Meudt et al., unpublished data). Trinity assemblies of all four New Zealand, all four European, and all eight Veronica individuals were also made. Table 1 shows information about the sequence reads and statistics from the eight different individual Trinity assemblies. Functional annotation of contigs from the different assemblies was conducted using BLAT (Kent, 2002) with default settings against the TAIR database (version 10 represented gene model from 2011-01-03; Lamesch et al., 2012) and MapMan hierarchical categories (Ath_AGI_LOCUS_TAIR10_Aug2012; http://mapman.gabipd.org/web/guest/mapmanstore). Mean contig length ranged from 635–742 bp, N50 value from 916–1133 bp, E90N50 value from 1099–1307 bp (which is computed with the contig_ExN50_statistic.pl script of the Trinity package and represents the N50 of 90% of the expressed transcripts), and number (and percentage) of contigs with positive BLAST hits from 24,583–39,915 (42–63%). To demonstrate the quality and utility of the transcriptomic resources developed here, we compared the transcriptome sequences of our eight sampled individuals relative to each other and to two outgroups, Picrorhiza kurrooa ( http://scbb.ihbt.res.in/Picro_information/; SRR392742; Gahlan et al., 2012) and Plantago ovata (SRR629688; Kotwal et al., 2016). To do this, we mined the data from these 10 individuals for SNPs using Site Identification from Short Read Sequences (SISRS) version 1.0 (Schwartz et al., 2015; https://github.com/rachelss/SISRS/releases). SISRS identifies SNPs for phylogenetic studies directly from raw high-throughput sequences without a reference genome and without a priori knowledge of potentially informative loci. Briefly, SISRS first assembles raw sequence reads into a “composite genome” using Velvet, maps the raw reads and individual contigs against this composite genome with Bowtie 2, and then calls SNPs with a Python script (Schwartz et al., 2015). SNP discovery was performed using SISRS on four different data sets: (1) all eight Veronica individuals combined plus P. kurrooa and P. ovata as outgroups, (2) all eight Veronica individuals only, (3) the four New Zealand individuals only, and (4) the four European individuals only. The SNP data were converted to NEXUS format and analyzed using NeighborNet networks (SplitsTree version 4.14.2; Huson and Bryant, 2006). In addition, GARLI version 2.01.1067 (Zwickl, 2006) was used for phylogenetic tree reconstruction under maximum likelihood. We first performed a GARLI run with 10 replicates to estimate the model parameters for the model of evolution estimated with jModelTest version 2.1.5 (012010F; Darriba et al., 2012) [setting ratematrix = a b c a b a statefrequencies = estimate]; six of the 10 replicates had the same best lnL score. These estimated model parameters were then fixed for a bootstrap analysis, which was performed with 1000 replicates [parametervaluestring = M1 r 1.00000 7.30163 1.61422 1.00000 7.30163 1.00000 e 0.27231 0.22561 0.22541 0.27667]. The resulting tree was compared to previously published phylogenetic estimates.
Marker development—Two different types of markers were developed from the Veronica transcriptome resources generated here, LCN and SSR markers.
Low-copy nuclear markers—MarkerMiner was used with default settings to identify LCN markers from a curated set of conserved ortholog set (COS) loci (De Smet et al., 2013). MarkerMiner was developed and tested using transcriptome assemblies from 77 Lamiales species (including six from Plantaginaceae; Chamala et al., 2015), and uses a reciprocal BLAST of all transcriptomes with one another and to the reference A. thaliana genome. Arabidopsis thaliana (Brassicales) is the phylogenetically closest reference available in MarkerMiner to Veronica (Lamiales). Of the 1228 loci returned, 73 were classified as being “strictly” and 1155 as “mostly” single copy. MAFFT alignments of the 330 loci found in six or more individuals, of which 15 were “strictly” and 314 “mostly” single copy, were used to develop primers in Geneious (version 8.7) with Primer3 (Untergasser et al., 2012), aiming for a melting temperature of 60°C. Loci were checked manually for large introns in Geneious by comparing the MarkerMiner alignment to A. thaliana. We chose 13 “strictly” single-copy loci with a successful primer search and 35 additional “mostly” single-copy loci with successful primer searches such that all five A. thaliana chromosomes were equally represented in this marker set. These 48 loci were validated using Fluidigm microfluidic PCR and Illumina MiSeq amplicon sequencing of 48 individuals representing 46 Veronica species (19 from the Southern Hemisphere) and all subgeneric lineages (Appendix 2). The combination of this technique with Illumina MiSeq amplicon sequencing of 300-bp paired-end reads has proven useful and highly efficient in recent studies for development of novel and effective nuclear sequencing markers and improving understanding of phylogenetic relationships in nonmodel genera (Gostel et al., 2015; Uribe-Convers et al., 2016). This method enables the amplification of 48 samples and 48 primer pairs in 4-µL reaction volumes, such that the total volume of these reactions equals, e.g., 10 standard 25-µL reaction volumes. Each reaction contained 2 ng DNA, 200 nM of each primer, 0.1 µL 5 U/µL VELOCITY DNA polymerase (Bioline, Luckenwalde, Germany), 1× buffer, 0.1 µL 10 mM dNTPs, 0.25 µL 1 M DMSO, and 0.5 µL 5 M betaine. The samples were initially denatured for 2 min at 98°C; followed by 45 cycles of denaturation for 15 s at 98°C, annealing for 30 s at 55°C, elongation for 30 s at 72°C; and finalized with 5-min elongation at 72°C. Preliminary testing showed that more cycles were necessary due to some low-quality DNA samples. Barcoding and Illumina sequencing was done by LGC Genomics (Berlin, Germany) with Illumina MiSeq v3 chemistry. For each LCN locus, resulting sequences were trimmed with BBMap tools ( https://sourceforge.net/projects/bbmap/), de novo assembled with CAP3 (99% identity; Huang and Madan, 1999), aligned to the respective locus sequence of the transcript with MAFFT (setting E-INSI; Katoh and Standley, 2013), and examined in Geneious for sequence length, similarity to original transcript, A. thaliana gene, and number of individuals successfully sequenced. In addition, the alignment was exported to GARLI, in which numbers of sequences, SNPs, and parsimony informative characters (PICs) were calculated. For one randomly chosen example LCN marker, a phylogeny was reconstructed using the same settings as described above for SNPs.
Simple sequence repeats—Numerous SSRs were identified from Trinity assembly of the New Zealand individuals only using QDD version 3.1 (Meglécz et al., 2014; Table 1). Settings for the search were a length of 250–350 bp of the locus and primer melting temperatures of 59–61°C. After filtering for quality (taking QDD categories A and B), repeats (removing dinucleotides for example), and length of predicted PCR product, 48 loci were chosen from the 1124 potential SSRs with primer sites found by QDD. These were validated using Fluidigm microfluidic PCR and Illumina MiSeq amplicon sequencing (see above) of 48 individuals representing 20 Australasian species and one interspecific hybrid (Appendix 3). For each SSR marker, which included the SSR repeat area and flanking regions, resulting sequences were analyzed in the same way as the LCN data (see above) and examined in Geneious and GARLI regarding SSR motif, sequence length, number of individuals successfully sequenced, number of alleles sequenced, and pairwise genetic distance. In addition, for one randomly chosen example SSR locus, the alignment was exported to GARLI and a phylogeny was reconstructed using the same settings as described above for SNPs.
RESULTS
Transcriptomes—Functional annotation of individual assemblies was similar for each of the eight individuals, with gene ontology (GO) terms assigned to 13,009–14,271 contigs (19–33%; Table 1). There was large overlap of annotated contigs of the different assemblies whether looking at assemblies of individuals of New Zealand species only (26,524 or 89.4% shared annotated contigs; Fig. 1A), European species only (25,456 or 87.8%; Fig. 1B), or all New Zealand vs. all European species (29,839 or 94.3%; Fig. 1C). On the other hand, individual species had 114–453 (0.4–1.6%) unique annotated contigs relative to other species from the same geographical area, and the numbers for New Zealand and European species were comparatively very similar (compare Fig. 1A and 1B). Within the New Zealand species, V. hectorii Hook. f. and V. ochracea (Ashwin) Garn.-Jones shared the most unique annotated contigs (234 or 0.8%) relative to the other five species pairs, whereas V. catarractae G. Forst. and V. ochracea shared the fewest (110 or 0.4%; Fig. 1A). Within the European species, the species pair with the most unique shared annotated contigs was V. panormitana Tineo ex Guss. (2x) and V. cymbalaria Bodard (6x) (238 or 1.1%), whereas the two diploids V. panormitana and V. trichadena shared the fewest (53 or 0.2%) (Fig. 1B).
GO term results were also very similar; of 35 GO categories, the number of unique transcripts were largely overlapping for all species pairs, as is shown for the most divergent species pair of the eight transcriptomes sequenced (i.e., New Zealand V. hectorii vs. European V. panormitana; Fig. 2). The GO categories with the largest numbers of unique transcripts (ca. 500– 3000) for these Veronica leaf transcriptomes were (from highest to lowest) “not assigned,” “protein,” “RNA,” “signaling,” “transport,” “misc,” “cell,” and “DNA” (Fig. 2A).
SNP discovery using SISRS resulted in the following number of SNPs and potential PICs: 10-individual data set including outgroups (29,738 SNPs, 8746 PICs), eight Veronica individuals only (45,751 SNPs, 40,217 PICs), four New Zealand individuals only (41,167 SNPs, 2302 PICs), and four European individuals only (65,278 SNPs, 1735 PICs). When the 10-individual data set was analyzed using SplitsTree (Fig. 3A–C), the NeighborNet network clearly showed a main split between all Veronica transcriptomes vs. the two outgroups (Fig. 3A). Although some reticulation was present among the New Zealand species (Fig. 3C), reticulation is more pronounced among the European individuals (Fig. 3B), which comprise two allopolyploids and their putative diploid parental species. The phylogenetic analysis of the same data set contained moderate to high support for all branches in the phylogeny (Fig. 3D). Among the New Zealand individuals, V. hectorii (6x) and V. ochracea (18x) are very closely related to each other; V. ochracea may be an allopolyploid of V. hectorii and another unsampled species (Wagstaff and Wardle, 1999). Within the European lineage, V. cymbalaria (4x) is positioned between both diploid parental species as expected (Albach, 2007; Fig. 3B, 3D), and we suspect V. cymbalaria (6x) to be a backcross allopolyploid of V. cymbalaria (4x) × V. panormitana (2x) based on the larger similarity with that species compared with V. trichadena (Fig. 1; 328 vs. 132 unique annotated contigs).
Marker development
Low-copy nuclear markers—A range of 3–14 (average: 23.4, median: 22) of 48 individuals were successfully sequenced for each of the 48 loci, with 22 of 48 (46%) loci successfully amplifying in at least 24 (>50%) individuals (Appendix 4). For each individual, 4–40 loci were successfully amplified (average: 23.4, median: 21.5), and again less than half of the individuals (22/48, 46%) had successful amplification of at least 25 (>50%) loci (data not shown). Only one-quarter (11/48) of the loci aligned well with the corresponding transcript; these loci had mean lengths of 327–480 bp, contained large numbers of SNPs and PICs, and BLASTed to known A. thaliana genes (Appendix 4).
Figure 4 shows an alignment and GARLI tree of sequences from 22 of the 42 individuals successfully sequenced for one randomly chosen example locus, LCN-04 (two outgroups plus 10 European and 10 New Zealand Veronica individuals/species). Nearly twice as many different sequences were generated for the 10 New Zealand individuals shown here (27 sequences; 6x or 18x, “V. townsonii E6 18” to “V. melanocaulon D5 11” in the tree) relative to the 10 European individuals (14 sequences; 2x or 4x, “V. missurica E2 13” to “V. chamaedrys C3 10”). Of the 10 New Zealand individuals, five have only one sequence and are all in the same clade (V. albiflora (Pennell) Albach, V. cupressoides Hook. f., V. densifolia F. Muell., V. lavaudiana Raoul, and V. senex (Garn.-Jones) Garn.-Jones), whereas the other five have 2–8 sequences that fall into the first clade or one of two other clades (Fig. 4). As another example highlighting the low-copy nature of the loci that were sequenced, locus LCN-38 has two orthologous copies, which is expected due to the categorization of the A. thaliana gene AT3G59380 as “mostly” single copy (data not shown; comparisons done using MarkerMiner 1.0 [Chamala et al., 2015]). Additional phylogenetic analyses of the other LCN loci are outside the scope of this study and will be performed elsewhere (Meudt et al., unpublished data).
Simple sequence repeats—Overall, 3–47 (mean: 37.7, median: 44.5) of 48 individuals were successfully sequenced for each of the 48 SSR loci, including 40 of 48 loci that were successfully sequenced for at least 26 (>50%) individuals (Appendix 3). For each individual, 0–43 loci were successfully sequenced (average: 37.7, median: 40), with all but two individuals with at least 29 (>60%) loci successfully sequenced (individuals V. catarractae B1 and V. colostylis H3 failed for all 48 and 40 loci, respectively). In general, sequences ranged from 98–851 bp in length (average: 324) and contained one or more length- and/or sequence-variable SSR motifs as well as flanking SNPs and indels within and among individuals (e.g., Fig. 5). Number of sequenced alleles (which are supported by at least 10 raw sequencing reads) per individual ranged from 1–39 (mean: 4.32, median: 3.0, n = 47), with the lower polyploids having fewer alleles than the higher polyploids (6x, mean: 3.96, n = 37; 12x; mean: 5.25, n = 5; 18x, mean: 6.09, n = 5).
As the focus of SSRs is often population genetics, we analyzed two subsets of the larger SSR data set in more detail, i.e., eight individuals of V. chionohebe Garn.-Jones (4), V. trifida Petrie (2), and their interspecific hybrid (2) (all 2n = 42) (Appendix 5), and six individuals of V. thomsonii Cheeseman (2n = 42), respectively (Appendix 6). For all loci in the two subsets, sequences were on average of 317–327 bp, with 1–26 alleles (mean: 4.0–4.3), 54–80 SNPs, and 41.7–52.5 PICs (see “Totals” rows in Appendix 5 and 6). Figure 5 shows an alignment of 54 different SSR sequences from one locus (SSR-08) of the eight-individual V. chionohebe/V. trifida subset. In locus SSR-08, the sequences ranged from 311–387 bp (average: 357 bp). The sampled individuals had on average 6.8 alleles, and individuals of V. chionohebe had half as many unique alleles (3–6 each) as individuals of V. trifida and the interspecific hybrid (8–10). The sequences of locus SSR-08 were highly variable (note the many colored bars in the alignment in Fig. 5), with 126 SNPs and 109 PICs, and 0–0.14 pairwise genetic distances (mean and median: 0.08) (Appendix 5). In the phylogenetic tree, there is support for some taxonomic clustering of sequences of V. chionohebe and V. trifida, respectively, with hybrid sequences in highly supported clades with V. chionohebe or V. trifida in three vs. four cases, respectively (see tree in Fig. 5). Additional analyses of the other SSR loci are outside the scope of this study and will be performed elsewhere (Meudt et al., unpublished data).
DISCUSSION
The development of transcriptomic and genomic resources and variable genetic markers in so-called natural “mesopolyploid” species radiations is key to addressing fundamental questions about polyploidy and diversification. For polyploids, functional genomic resources in particular are important to facilitate the study of gene evolution. Veronica is an example of a natural mesopolyploid species radiation that to date has lacked such genomic and genetic resources, and this has hindered progress in studying polyploid evolution at the population, species, and generic levels. The transcriptomic and genetic resources developed here will make further detailed studies regarding the role of polyploidy in adaptation and species diversification in Veronica possible.
In the current study, we sequenced and assembled leaf transcriptomes from eight individuals representing seven species of Veronica from polyploid species radiations in Europe and New Zealand. There was high overlap of annotated contigs (Fig. 1) and GO terms (Fig. 2) among the eight individuals, as well as good phylogenetic resolution in the network and phylogenetic analyses of SNPs generated using SISRS (Fig. 3). An outstanding challenge with de novo transcriptome assemblies of polyploids is differentiating homoeologs from orthologs; however, this was not an issue for developing markers in polyploid Veronica from our transcriptome assemblies, as phylogenetic relationships (Fig. 3) are consistent with hypothesized relationships and previous phylogenetic results. Such results demonstrate the utility of these transcriptomic resources for phylogenetic studies, functional analyses across the genus using reverse transcription PCR, or for further comparative transcriptomic analyses of the sampled natural allopolyploids and their diploid parental species in the two main centers of diversity for Veronica (i.e., Europe and New Zealand). The large number of transcripts unique to hexaploid V. cymbalaria (453) relative to other individuals representing species from which it likely derived (V. trichadena: 114 and V. panormitana: 195) is surprising and opens the door to studies of differential expression and functional differentiation of genes in polyploids. Common garden experiments are also planned, which will allow comparison of other individuals with the eight sequenced here.
Furthermore, the SSR and LCN genetic markers developed here from the transcriptomes, and validated using microfluidic PCR and high-throughput sequencing, are highly variable and will be extremely useful in future phylogenetic studies of Veronica as a whole, as well as studies at the interface of inter-and intraspecific levels of New Zealand Veronica (e.g., phylogenetic, phylogeographic, and population genetic studies). From 330 mostly or strictly “low copy” loci common to 6–8 of the sequenced transcriptomes, we developed and sequenced 48 LCN markers in 48 individuals representing all subgeneric lineages in Veronica. Of the 22 LCN markers that were successfully sequenced for >50% individuals, 11 aligned well with the corresponding transcripts, were on average 394 bp long, contained large numbers of SNPs and PICs, and BLASTed to known A. thaliana genes. These 11 LCN markers are excellent candidates for reconstructing a better-resolved phylogeny of Veronica. In addition, of the 1124 SSRs identified in the four New Zealand Veronica individuals, we validated 48 in 48 Southern Hemisphere Veronica individuals, 40 of which were successfully sequenced for >50% of individuals. Sequenced SSRs and their flanking regions were on average 324 bp long, contained numerous SNPs and PICs, and had mean pairwise genetic distances of 0.01–0.18. The variation seen, particularly in the flanking regions of the sequenced SSRs, is equal to or much greater than that from previous studies using standard DNA sequencing and genotyping markers (e.g., Wagstaff et al., 2002; Meudt and Bayly, 2008). These 40 SSRs have great potential as highly variable sequencing markers (as opposed to being genotyped) at the interface of intra- and interspecific levels regarding questions of population genetics, species limits, and relationships of closely related species in New Zealand Veronica. Additionally, challenges presented by genotyping SSRs in polyploids, such as determining allele dosage and unambiguously identifying alleles (Pfeiffer et al., 2011), are overcome by sequencing the SSRs and their flanking regions, which we would recommend for future studies. For both the LCN and the SSR markers, future sequencing projects could be conducted either using traditional methods (PCR, cloning, sequencing) or using high-throughput sequencing. Furthermore, as biparental nuclear markers, the LCN markers and SSRs will be highly effective in elucidating complex relationships in polyploid Veronica.
In addition to the potential advances for Veronica, our methodological approach may also be useful for other natural polyploid groups that lack genomic or genetic resources. Natural species that are not associated with economically important crop or other “model” species often lack genomic resources and are very limited regarding the availability of variable genetic markers. Furthermore, developing and establishing such markers using traditional methods (e.g., López-González et al., 2015) can be tedious and time-consuming, with more effort required for fewer microsatellites developed. (As an aside, we found eight of the 12 reported SSR loci from López-González et al. [2015] in our transcript sequences, none of which met the quality criteria of our QDD pipeline for our New Zealand–focused sampled species.) There are nearly 4000 plant transcriptomes in the NCBI Sequence Read Archive ( http://www.ncbi.nlm.nih.gov/sra/) and 1000 Plants (1KP) project ( www.onekp.com; Matasci et al., 2014) online resources (Hodel et al., 2016). The 1KP transcriptomes have recently been used to develop SSRs for over 1000 plant species (Hodel et al., 2016), whereas Chapman (2015) published a method for the development and validation of 10 COS LCN loci for legume crop species from transcriptomes. By combining Chapman et al.'s (2015) standard wet laboratory approach with a scalable, high-throughput microfluidic PCR strategy (Gostel et al., 2015; Uribe-Convers et al., 2016), we here show that screening of 48 SSR or LCN loci is possible in one microfluidic PCR. In fact, this approach could be scaled up from 48 loci to 480 loci, although the latter might have drawbacks, as here 10 loci are amplified in multiplexed reactions, respectively, and results of these de novo marker sequences can contain PCR chimeras. Nevertheless, combining Fluidigm microfluidic PCR and MiSeq amplicon sequencing of LCN and SSR markers, which were designed in MarkerMiner and QDD from transcriptomic data, is a relatively straightforward high-throughput marker validation method as well as an analysis pipeline that can be used on other natural (and polyploid) systems.
ACKNOWLEDGMENTS
The authors thank the Albach and Zotz AGs at Carl-von-Ossietzky Universität Oldenburg for their support; W. Lobin (Bonn), J. Seguí Colomar (Mallorca), P. J. Garnock-Jones (Wellington), and A. C. Clarke (Cambridge) for assistance collecting material; the Alexander von Humboldt Foundation for an Experienced Researcher Fellowship (to H.M.M.); New Zealand Core funding for Crown Research Institutes from the Ministry of Business, Innovation and Employment's Science and Innovation Group; and two anonymous reviewers for their constructive comments.
LITERATURE CITED
Appendix 5.
Validation of 48 SSRs on a subset of the 48 New Zealand and Australian individuals of Veronica sequenced. Shown are eight individuals of the Veronica chionohebe/V. trifida subset (A5, B5, C4, D4, E4, F4, G4, and H4; see Appendix 3).
Continued.
Continued.
Appendix 6.
Validation of 48 SSRs on a subset of the 48 New Zealand and Australian individuals of Veronica sequenced. Shown are six individuals of the V. thomsonii subset (C5, D5, E5, F5, G5, and H5; see Appendix 3).