Billions of specimens can be found in natural history museum collections around the world, holding potential molecular secrets to be unveiled. Among them are intriguing specimens of rare families of moths that, while represented in morphology-based works, are only beginning to be included in genomic studies: Pseudobistonidae, Sematuridae, and Epicopeiidae. These three families are part of the superfamily Geometroidea, which has recently been defined based on molecular data. Here we chose to focus on these three moth families to explore the suitability of a genome reduction method, target enrichment (TE), on museum specimens. Through this method, we investigated the phylogenetic relationships of these families of Lepidoptera, in particular the family Epicopeiidae. We successfully sequenced 25 samples, collected between 1892 and 2001. We use 378 nuclear genes to reconstruct a phylogenetic hypothesis from the maximum likelihood analysis of a total of 36 different species, including 19 available transcriptomes. The hypothesis that Sematuridae is the sister group of Epicopeiidae + Pseudobistonidae had strong support. This study thus adds to the growing body of work, demonstrating that museum specimens can successfully contribute to molecular phylogenetic studies.
Over 3 billion specimens are estimated to be found in natural history museum collections around the world, representing one of the most important biobanks in the world (Duckworth et al. 1993, Suarez and Tsutsui 2004, Chapman 2005). Until recently, this vast amount of biological resource was mainly used for morphological studies because the DNA from these specimens was thought to be too degraded to be used for molecular studies (Shapiro and Hofreiter 2012). Due to this, DNA work has, for a long time, been limited to species for which freshly collected samples could be obtained, while molecular work from collections was restricted to Sanger sequencing of short fragments of DNA (Hajibabaei et al. 2006, Lozier and Cameron 2009, Strutzenberger et al. 2012, Hebert et al. 2013, Cameron et al. 2016). Moreover, the methods were often destructive for the specimens (Hajibabaei et al. 2006, Strutzenberger et al. 2012, Hebert et al. 2013). Recently, high-throughput sequencing technologies have made the DNA in museum specimens more accessible, either through whole-genome sequencing (Cong et al. 2017, Sproul and Maddison 2017, Allio et al. 2019, Li et al. 2019, Zhang et al. 2019) or through genome reduction methods (Suchan et al. 2016, Breinholt et al. 2018, Toussaint et al. 2018). These advanced sequencing approaches have opened up a new field with great potential for studying the evolutionary history of taxa that are difficult to collect: museomics.
The family Epicopeiidae is a small Asian family of Lepidoptera represented by 25 species (Minet 2002, Wei and Yen 2017, Zhang et al. 2020). Many of them are large diurnal species mimicking butterflies in the families Papilionidae and Pieridae. The history of the family has been dynamic. Epicopeiidae was originally described to harbor only one genus, Epicopeia Westwood, 1841 (Laithwaite and Whalley 1975). The pierid-like moths Nossa Kirby, 1892, were previously assigned to the family Epiplemidae (now considered a subfamily of Uraniidae), but were then placed in Epicopeiidae by Fletcher (1979), a placement later confirmed by Minet (1983, 1986). In the latter studies, Minet (1983, 1986) added five genera to Epicopeiidae: Amana Walker, 1855; Chatamla Moore, 1881; Parabraxas Leech, 1897; Psychostrophia Butler, 1877; and Schistomitra Butler, 1881. In 2002, Minet described two new genera, Deuveia and Burmeia. Finally, in 2017, the number of genera increased to 10 with the description of Mimaporia by Wei and Yen. The family was thought to be related to Drepanidae and was placed in the superfamily Drepanoidea (Minet 2002), until recent molecular data suggested that it is in fact related to the superfamily Geometroidea (Regier et al. 2009, Bazinet et al. 2013, Rajaei et al. 2015). The sister group of Epicopeiidae has been suggested to be the recently described Pseudobistonidae (Rajaei et al. 2015, Wang et al. 2019). Minet (2002) studied the relationships of genera within Epicopeiidae based on 34 morphological characters obtained from the head, thorax, pregenital abdomen, and male genitalia. He found that Deuveia was sister to the rest of Epicopeiidae and that the relationships of the other genera were relatively clear (Fig. 1, left side). However, the position of Amana was not stable; it was either sister to Chatamla + Parabraxas or sister to a clade containing Chatamla, Parabraxas, Schistomitra, Nossa, and Epicopeia.
The first attempt to infer the phylogeny of the family based on genetic markers was done by Wei and Yen (2017). They used sequence data for three gene regions (COI, EF-1α, and 28S) and 14 species. Their study was mainly focused on describing a new genus, Mimaporia, but they sampled widely throughout the family. The results of their analyses are highly incongruent with those of Minet (2002), but showed poor or no support on many branches. Wei and Yen (2017) showed that Epicopeia and Nossa likely are paraphyletic, and they were not able to resolve the relationships of the new genus Mimaporia with any confidence (Fig. 1).
Recently, Zhang et al. (2020) used PCR-generated baits to infer a multilocus phylogenetic hypothesis for Epicopeiidae based on 18 species and 94 loci. Their results were highly congruent with Minet's (2002) results based on morphology and also found that Epicopeia and Nossa both were paraphyletic with regard to each other. In addition to using fresh specimens, Zhang et al. (2020) used older specimens with some degree of success, although they were able to recover a significantly smaller number of loci from the older specimens.
Epicopeiidae species are generally rare and difficult to collect, as they are mainly distributed in areas that are not easy to access; nevertheless, they can be found in natural history museums. Here we investigate the use of target enrichment (TE) methods to study the phylogenetic relationships of this family of Lepidoptera based only on museum specimens. Genome reduction methods, such as TE, aim to sequence only specific segments of the genome. In the case of highly fragmented genomes (e.g., museum specimens), such genome reduction methods might be a very useful way of gathering data for phylogenetic studies. To study phylogenetic relationships among species, one usually analyzes an a priori known set of genetic markers, e.g., a set of single-copy, protein-coding, homologous genes. By targeting specific genes of interest, the TE method can be particularly relevant for phylogenetic studies. However, it has generally been thought that such reduction methods require good-quality DNA from fresh or properly stored tissue (Lemmon and Lemmon 2013, Jones and Good 2016). Nevertheless, TE methods have been used successfully on stored DNA extractions (Faircloth et al. 2012, McCormack et al. 2013), as well as on museum specimens (Bi et al. 2013, Cruz-Dávalos et al. 2017, St Laurent et al. 2018).
Materials and Methods
Specimens were taken from the collection at the Zoological Research Museum Alexander Koenig (ZFMK, Bonn, Germany). We sampled 16 available species of Epicopeiidae, including at most four specimens per species. In addition, we sampled two species of Sematuridae (Anurapteryx interlineata and Mania empedocles) and two specimens of Pseudobistonidae (Pseudobiston pinratanai) to investigate the relationships between these three families. In total, 33 museum specimens collected between 1892 and 2001 were included (Table 1). The oldest sample is a Parabraxas davidi (Oberthür, 1885) specimen from 1892, whereas the most recent one is Parabraxas flavomarginaria (Leech, 1897) from 2001 (Table 1). We were not able to acquire samples of the genera Chatamla (Moore, 1881), Burmeia (Minet, 2002), Mimaporia (Wei and Yen, 2017), or Amana (Walker, 1855), or samples of Heracula discivitta, which was recently moved to the family Pseudobistonidae (Wang et al. 2019). Details for all the specimens included can be found on Zenodo (doi:10.5281/zenodo.3769000).
Table 1. Number of raw recovered loci and selected loci per specimen
Sample Preparation and DNA Extractions
We used a semidestructive approach, i.e., we removed the abdomen for DNA extraction without grinding the tissue, thus preserving the genitalia for future preparation (Hundsdoerfer and Kitching 2010). Genitalia dissections are routinely done for Lepidoptera by boiling abdomens in KOH to remove soft tissue, thus destroying the DNA in the process, so our approach is less destructive than what is normally done. For large specimens (like Nossa or Epicopeia), the abdomen was cut in half above the genitalia to ensure that it fit inside a 1.5-ml Eppendorf tube. Abdomens were first soaked in 180 µl of H2O for about 5 min to rehydrate the tissues. The water was removed before starting DNA extractions. Samples were lysed overnight (approximately 12–18 h) at 56°C with shaking at 350 rpm in a thermomixer. We used the DNeasy Blood & Tissue kit (Qiagen, Hilden, Germany) and followed the standard DNA extraction protocol for tissues, with the following modifications: we included an RNase-digestion step and eluted the DNA in Milli-Q water. Finally, the DNA concentration of each sample was quantified using a Quantus Fluorometer (Promega, Madison, WI), and fragment lengths were measured with a Fragment Analyzer (Advanced Analytical, now Agilent Technologies Inc., Santa Clara, CA).
Library Preparation, TE, and Sequencing
There is still no consensus on how the DNA in museum specimens is best accessed. Here we used TE, a genome reduction approach. TE methods use probes designed to target specific regions of the genome (Breinholt et al. 2018, Toussaint et al. 2018, Espeland et al. 2019). In the case of phylogenetic studies, this approach has the main advantage of recovering exactly the loci of interest, and as long as a probe kit exists for a group, no previous knowledge about the genomes of the group of interest is required. This approach follows three major steps: 1) bait design, 2) library preparation and sequencing, and 3) filtering and processing of the data.
Regarding the bait design, new genes were selected and added to the BUTTERFLY1.0 kit by Espeland et al. (2018). Mayer et al. (2021) designed hybrid enrichment baits with BaitFisher software version 1.2.8 (Mayer et al. 2016). A bait length of 120 bp was specified with a clustering threshold of 0.15, and a tiling design of 3 baits per bait region with an overlap of 60 bp between two consecutive baits, resulting in bait regions with a total length of 240 bp. Individual coding sequences (CDS) from Danaus plexippus (Linnaeus), Melitaea cinxia (Linnaeus), Heliconius melpomene (Linnaeus), Papilio glaucus (Linnaeus), Plutella xylostella (Linnaeus), Bombyx mori (Linnaeus), and Manduca sexta (Linnaeus) were used as references. The LepZFMK1.0 kit includes 2,954 probe regions in different CDS regions belonging to 1,754 genes and is compatible with BUTTERFLY1.0 (Espeland et al. 2018) and partially compatible with LEP1 (Breinholt et al. 2018). For more details on the kits, see Mayer et al. (2021). In many cases, multiple exons of single genes were targeted when they were long enough.
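The tiling arithmetic described above can be checked with a minimal sketch. The function names are hypothetical and are not part of BaitFisher; the code only reproduces the stated design (120-bp baits, 3 baits per region, 60-bp overlap between consecutive baits) and confirms that it spans 240 bp.

```python
def tile_baits(bait_length=120, baits_per_region=3, overlap=60):
    """Return (start, end) offsets of each bait within one bait region.

    Offsets are 0-based with an exclusive end. Consecutive baits are
    shifted by (bait_length - overlap) bases.
    """
    step = bait_length - overlap
    return [(i * step, i * step + bait_length) for i in range(baits_per_region)]


def region_length(bait_length=120, baits_per_region=3, overlap=60):
    """Total length covered by one tiled bait region."""
    return bait_length + (baits_per_region - 1) * (bait_length - overlap)


baits = tile_baits()
print(baits)            # [(0, 120), (60, 180), (120, 240)]
print(region_length())  # 240
```

With these values, every base in the central 120 bp of a region is covered by two baits, whereas the outer 60 bp on each side are covered by one.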
Library preparation was performed at the Zoological Research Museum Alexander Koenig (Bonn, Germany). Most of our samples contained less than 100 ng of genomic DNA, the input amount required by the standard protocol, but we included them anyway. With the Fragment Analyzer, we found that many fragments in our samples were around 140 bp; therefore, no fragmentation was necessary for these samples. Other samples with higher quality and longer fragments were fragmented with a Bioruptor PICO sonicator (Diagenode, Seraing, Belgium) to obtain DNA fragments with an approximate length of 350 bp.
We repaired the DNA with NEBNext FFPE DNA Repair Mix (NEB, Ipswich, MA), following the manufacturer's protocol. We purified the reactions with Agencourt AMPure XP beads at a ratio of 1:3. We quantified the resulting libraries with a Quantus Fluorometer (Promega) and checked their quality with a Fragment Analyzer (Advanced Analytical, now Agilent Technologies Inc.).
We carried out the enrichment and capture steps following the Agilent SureSelect XT2 protocol, with additional modifications following Bank et al. (2017). Enrichment and sequencing were done at StarSEQ GmbH (Mainz, Germany) on an Illumina NextSeq 500 system with a read length of 150 bp. Exons found in at least 20 of the 33 specimens (with an average of 254 loci per specimen) were used for downstream phylogenetic analyses (Table 1). Sequencing data are available at the NCBI under BioProject PRJNA684488.
Data Clean up and Assembly
Reads were trimmed with fastq-mcf (Aronesty 2011) using default parameters to remove adapters and low-quality regions. Data cleaning and assembly were done using the iterated bait assembly (IBA) pipeline (Breinholt et al. 2018) with default parameters, except that the paired gap length was set to 100 (-g 100). Genomic sequences of the target regions from D. plexippus were used as a reference for the IBA pipeline. In brief, reads similar to the reference sequence were identified with USEARCH (Edgar 2010) and assembled with Bridger (Chang et al. 2015). The resulting assembly was then used as a reference sequence for another run of USEARCH, and this process was repeated three times.
Table 2. List of the 19 available transcriptomes and genomes added to this study
The loci were aligned using the FFT-NS-i algorithm with two iterations in MAFFT v.7 (Katoh and Standley 2013) prior to phylogenetic analyses. Alignments were trimmed to the probe regions by using TrimAl (Capella-Gutierrez et al. 2009), with the options ‘-gapthreshold’ and ‘-conserve’. These options were used to remove gaps. The alignment cleanup was performed with HmmCleaner (Di Franco et al. 2019), which allows the detection and removal of primary sequence errors in multiple alignments. We used the options ‘-costs’ and ‘--noX’ and defined the four costs as follows: –0.15, –0.08, 0.15, and 0.45. We subsequently checked manually for frame shifts, gaps, and codon positions. Finally, alignments containing fewer than 20 samples (excluding references) were discarded from the downstream analysis. The final filtered data set consisted of 378 genes.
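The final occupancy filter (discarding alignments with fewer than 20 samples, excluding references) can be sketched as follows. This is a minimal illustration, not the script used in the study: the function names, the reference name `Dplexippus_ref`, and the rule that a sample counts only if it has at least one non-gap character are all assumptions.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a dict of {name: sequence}."""
    records, name = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records


def count_samples(records, reference_names=frozenset(), missing="-?Nn"):
    """Count non-reference sequences holding at least one real residue."""
    return sum(
        1
        for name, seq in records.items()
        if name not in reference_names and any(c not in missing for c in seq)
    )


def keep_alignment(records, reference_names=frozenset(), min_samples=20):
    """True if the alignment passes the minimum-occupancy threshold."""
    return count_samples(records, reference_names) >= min_samples


# Toy alignment: one reference, one informative sample, one all-gap sample.
demo = ">Dplexippus_ref\nATGGCC\n>S1\nATGGCT\n>S2\n------\n"
records = read_fasta(demo)
print(count_samples(records, {"Dplexippus_ref"}))   # 1
print(keep_alignment(records, {"Dplexippus_ref"}))  # False
```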
Screening Available Genomes and Transcriptomes
An additional 19 taxa were added to our data set by mining available genomes and transcriptomes, including one epicopeiid and one sematurid (Table 2). Twelve of the transcriptomes were from the superfamily Geometroidea; the remaining ones were from other macroheteroceran superfamilies. Raw reads were downloaded from the NCBI Sequence Read Archive (Leinonen et al. 2011). Reads were first processed to remove low-quality regions (Q < 30), adapters, and homopolymer stretches using Cutadapt 1.4.1 (Martin 2011; minimum read length 50 bp) and Prinseq 0.20.4 (Schmieder and Edwards 2011), respectively. De novo assembly was carried out with Trinity 2.0.6 (Grabherr et al. 2011, Haas et al. 2013), with default parameters, including a minimum contig length of 100 bp and a minimum kmer coverage of 5.
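To illustrate what trimming low-quality regions with a cutoff of Q = 30 and a minimum read length of 50 bp involves, here is a minimal sketch of 3' quality trimming. It follows the partial-sum rule described in the Cutadapt documentation (subtract the cutoff from each quality, sum from the 3' end, and cut where the sum is lowest). The function names are hypothetical, and whether this matches the exact invocation used in the study is an assumption.

```python
def quality_trim_index(qualities, cutoff=30):
    """Return how many leading bases to keep after 3' quality trimming.

    Walks from the 3' end, accumulating (quality - cutoff). The read is cut
    at the position where this running sum is lowest; the walk stops as soon
    as the sum becomes positive.
    """
    running = 0
    lowest = 0
    keep = len(qualities)
    for i in range(len(qualities) - 1, -1, -1):
        running += qualities[i] - cutoff
        if running > 0:
            break
        if running < lowest:
            lowest = running
            keep = i
    return keep


def trim_read(seq, qualities, cutoff=30, min_length=50):
    """Trim one read; return None if it ends up shorter than min_length."""
    keep = quality_trim_index(qualities, cutoff)
    if keep < min_length:
        return None
    return seq[:keep], qualities[:keep]


# Ten-base example with cutoff 10: the first four bases are kept.
print(quality_trim_index([42, 40, 26, 27, 8, 7, 11, 4, 2, 3], cutoff=10))  # 4
```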
Identification of the 378 genes was carried out with a BLAST approach. A reference sequence set was created from the TE alignments from one representative per gene. A tblastn (Gertz et al. 2006) search of the reference set against the transcriptomes (e-value threshold 10e-5) was carried out. The resulting BLAST output was used to extract the coding regions from each assembly using a set of open access python scripts from Dr. C. Peña (PyPhylogenomics, https://github.com/carlosp420/PyPhyloGenomics). The extracted sequences were aligned to the existing alignment with MAFFT 7.266 (Katoh and Standley 2013) using the ‘add fragments’ and ‘auto’ options, to preserve existing gaps in the alignment and choose the most appropriate alignment strategy, respectively. The resulting alignments were manually screened to ensure accurate alignment and frame preservation.
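The hit-filtering step can be sketched as below, assuming the tblastn results were written in the standard 12-column tabular format (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore). The function name, the rule of keeping the highest-scoring hit per query, and the demo rows are illustrative assumptions and do not reproduce the PyPhyloGenomics scripts. The threshold is written as 10e-5 in the text, which as a floating-point literal equals 1e-4, so the cutoff is left as a parameter.

```python
def best_hits(blast_tab_text, max_evalue=10e-5):
    """Return the best-scoring hit per query among hits passing the e-value cutoff."""
    best = {}
    for line in blast_tab_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        f = line.split("\t")
        query, subject = f[0], f[1]
        sstart, send = int(f[8]), int(f[9])
        evalue, bitscore = float(f[10]), float(f[11])
        if evalue > max_evalue:
            continue
        if query not in best or bitscore > best[query]["bitscore"]:
            best[query] = {
                "subject": subject,
                # tblastn reports sstart > send for minus-strand hits
                "start": min(sstart, send),
                "end": max(sstart, send),
                "strand": "+" if sstart <= send else "-",
                "evalue": evalue,
                "bitscore": bitscore,
            }
    return best


# Hypothetical rows, for illustration only.
demo_rows = "\n".join([
    "gene001\tcontig_12\t85.0\t120\t18\t0\t1\t120\t400\t759\t1e-60\t210.0",
    "gene001\tcontig_40\t60.0\t80\t32\t0\t1\t80\t900\t661\t2e-10\t65.0",
    "gene002\tcontig_07\t40.0\t50\t30\t0\t1\t50\t10\t159\t0.5\t25.0",
])
hits = best_hits(demo_rows)
print(sorted(hits))                # ['gene001']
print(hits["gene001"]["subject"])  # contig_12
```

The retained coordinates and strand are what a downstream script would need to cut the coding region out of each transcriptome contig.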
To partition our data set, we calculated the relative rates of evolution for each site in the alignment using TIGER (Cummins and McInerney 2011) and created partitions using the RatePartitions algorithm (Rota et al. 2018). We tested a range of d values (1.1, 1.5, 2.0, 3.0, and 4.0), which affect the number of partitions, and calculated the Bayesian information criterion (BIC) value for each partitioning scheme in PartitionFinder2 (Guindon et al. 2010, Frandsen et al. 2015, Lanfear et al. 2017). The best partitioning scheme, i.e., the one with the lowest BIC value, was found for d = 2.0 and resulted in 14 subsets.
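Comparing partitioning schemes by BIC can be sketched as follows, under the usual convention that BIC = -2 lnL + k ln(n) and that a lower value indicates a better trade-off between fit and number of parameters. The log-likelihoods, parameter counts, and number of sites below are made-up placeholders, not values from this study, and the function names are hypothetical.

```python
import math


def bic(log_likelihood, n_params, n_sites):
    """Bayesian information criterion: -2 lnL + k ln(n)."""
    return -2.0 * log_likelihood + n_params * math.log(n_sites)


def best_scheme(schemes, n_sites):
    """Return (best d value, {d: BIC}); the best scheme has the lowest BIC."""
    scored = {d: bic(lnl, k, n_sites) for d, (lnl, k) in schemes.items()}
    return min(scored, key=scored.get), scored


# Hypothetical schemes keyed by d value: (log-likelihood, free parameters).
toy_schemes = {
    1.5: (-5000.0, 10),
    2.0: (-4990.0, 12),
    3.0: (-4989.0, 30),
}
best_d, scores = best_scheme(toy_schemes, n_sites=1000)
print(best_d)  # 2.0
```

In this toy case, the scheme with d = 3.0 fits slightly better but is penalized for its extra parameters, so the intermediate scheme is selected.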
Using the optimal partitioning scheme, we inferred the phylogenetic relationships with IQ-TREE 1.6.10 (Nguyen et al. 2015, Chernomor et al. 2016) under the maximum likelihood (ML) criterion. We used the model finding option in IQ-TREE (Kalyaanamoorthy et al. 2017) to find the optimal model for each partition. To investigate the robustness of our inferences, we used 1,000 ultrafast bootstraps (-bb; Hoang et al. 2018) and 1,000 replicates for SH-aLRT (-alrt; Guindon et al. 2010), which is the minimum recommended number.
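For readers who want to set up a comparable analysis, the sketch below assembles an IQ-TREE 1.6-style command line as an argument list. Only `-bb` and `-alrt` are stated in the text; the use of `-s` for the alignment, `-spp` for the partition file, and `-m MFP` for ModelFinder, as well as the file names, are assumptions about how such a run could be configured, not the authors' exact command.

```python
import shlex


def iqtree_command(alignment, partitions, ufboot=1000, alrt=1000,
                   executable="iqtree"):
    """Build an argument list for a partitioned ML analysis with support values."""
    return [
        executable,
        "-s", alignment,       # concatenated alignment (hypothetical file name)
        "-spp", partitions,    # partition file (assumed option choice)
        "-m", "MFP",           # per-partition model selection (assumed)
        "-bb", str(ufboot),    # ultrafast bootstrap replicates
        "-alrt", str(alrt),    # SH-aLRT replicates
    ]


cmd = iqtree_command("concatenated.phy", "partitions.nex")
print(" ".join(shlex.quote(arg) for arg in cmd))
```

The list can be passed to `subprocess.run(cmd, check=True)` on a system where IQ-TREE is installed.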
We recovered a total of 2,131 raw loci. From our total of 33 specimens, two (6%) provided no data: Nossa nelcinna (S3) and P. davidi (S21). Six specimens provided only 1–12 raw loci (18%); for 16 specimens, we obtained between 150 and 1,000 loci (48%); finally, nine specimens gave more than 1,000 loci, with a maximum of 1,383 loci recovered (27%; Table 1, Fig. 2).
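The percentages above follow directly from the specimen counts; a short check (variable names are illustrative):

```python
# Specimen counts per recovery class, as reported in the text.
counts = {
    "no data": 2,
    "1-12 raw loci": 6,
    "150-1,000 raw loci": 16,
    ">1,000 raw loci": 9,
}

total = sum(counts.values())
shares = {label: round(100 * n / total) for label, n in counts.items()}

# Specimens yielding at least 150 raw loci.
successful = counts["150-1,000 raw loci"] + counts[">1,000 raw loci"]

print(total)       # 33
print(shares)      # {'no data': 6, '1-12 raw loci': 18, '150-1,000 raw loci': 48, '>1,000 raw loci': 27}
print(successful)  # 25
```

The 25 specimens with at least 150 raw loci correspond to roughly three quarters of the 33 specimens sampled.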
There is a positive correlation between the collection date of the specimens and the number of recovered loci (rho = 0.46, P = 0.008; Fig. 2). As expected, the younger a specimen is, the more loci we can recover from it. However, there is a lot of variation, meaning some recently collected specimens can give fewer loci than specimens collected a long time ago. This is, e.g., the case in two specimens of P. davidi. We recovered 516 raw loci from the older of the two, collected in 1892, whereas the more recent one (1957) provided only a single raw locus.
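The reported rho suggests a rank-based (Spearman) correlation; this is an inference from the notation rather than something stated in the text. A minimal pure-Python sketch of that statistic follows, computed as the Pearson correlation of average ranks. The five year/locus pairs are invented toy values chosen to show how a single outlier lowers the coefficient; they are not data from Table 1.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy values only: an old specimen with many loci weakens the trend.
toy_years = [1900, 1925, 1950, 1975, 2000]
toy_loci = [500, 40, 1, 900, 1200]
rho = spearman_rho(toy_years, toy_loci)
print(round(rho, 2))  # 0.6
```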
We obtained on average 254 loci and a median of 353 loci per specimen (Table 1). For our phylogenetic analyses, we first used all the 31 specimens that produced some data, including the 6 from Mayer et al. (2021). The samples Epicopeia philenora (S57) and Nossa palaeartica (S5) appeared to be contaminated, as their phylogenetic positions in preliminary analyses were highly doubtful, and thus they were excluded from the rest of our analyses.
Our final data set comprised 37 species, including 20 species sequenced for this study and 17 outgroup species with published transcriptomes. The data matrix included 378 nuclear loci (327 genes), for a total alignment of 134,881 base pairs. The average length of the 378 loci involved in this study is approximately 357 bp (134,881 bp / 378 loci).
Model Selection and Phylogenetic Analyses
The ML analyses for the different models tested gave the same phylogenetic relationships, and there were no conflicting nodes. The taxon data set, extended with 17 outgroup species, analyzed in IQ-TREE resulted in a highly supported ML tree (Fig. 3). We also performed the same phylogenetic analyses after excluding specimens with fewer than 10 loci and obtained the same topology (Supp Material 1 [online only]), indicating that the necessarily somewhat limited data recovered from old specimens are of sufficient quality for phylogenetic analysis. Although our data set gave strong support for many of the branches, the relationships among the Noctuoidea, Bombycoidea, and Geometroidea were weakly supported.
The monophyly of Epicopeiidae is strongly supported, and the sister group is Pseudobistonidae, with Sematuridae being sister to these two, also with strong support (SH-like = 100, UFBoot = 100). Within Epicopeiidae, almost all relationships are strongly supported, with the exception of the position of Schistomitra funeralis (SH-like = 54.8, UFBoot = 81). Relationships of genera are congruent with Minet (2002) and Zhang et al. (2020), i.e., Deuveia is sister to the rest of Epicopeiidae, with Psychostrophia branching off next, then Schistomitra, and finally Parabraxas being sister to a clade containing paraphyletic Epicopeia and Nossa (Fig. 3).
Within genera, species for which two or more individuals were included were mainly monophyletic, with the exception of Epicopeia hainseii and Epicopeia polydora, which were intermixed in a clade with very short branches (Fig. 3). The branch leading to P. davidi has weak support values (66.1/94), and this species appears to be genetically very closely related to P. flavomarginaria. In addition, Nossa moorei is not genetically differentiated from Nossa nagaensis, and the two are also morphologically very similar (Fig. 3).
Within Epicopeiidae, our results strongly support and are almost entirely congruent with the relationships suggested by Minet (2002) and Zhang et al. (2020) and thus highly incongruent with the results of Wei and Yen (2017). We find Deuveia to be sister to the rest of Epicopeiidae, with the monophyletic Psychostrophia being sister to the rest of all taxa excluding Deuveia (Fig. 3). Wei and Yen (2017) found Parabraxas to be sister to Psychostrophia, but our results place Parabraxas in a clade with Schistomitra and (Epicopeia + Nossa) with strong support.
The position of S. funeralis (which has 131 loci in our dataset) was incongruent with the hypothesis by Minet (2002), but with low support. In our study, we found Schistomitra to be the sister group of Parabraxas + (Epicopeia + Nossa) (Fig. 3), whereas Minet (2002) found it to be the sister group of Epicopeia + Nossa (Fig. 1), and Zhang et al. (2020) found it to be sister to Parabraxas + Chatamla. Wei and Yen (2017) found it to be sister to Chatamla + the newly described genus Mimaporia, and this clade to be closer to Parabraxas + Psychostrophia than to Epicopeia + Nossa (Fig. 1). However, we are not able to confidently resolve the relationships of Schistomitra, Parabraxas, and (Epicopeia + Nossa). Our study does not include the taxa Amana, Chatamla, or Mimaporia, which are all potentially related to Schistomitra and Parabraxas (Minet 2002). All four genera, Schistomitra, Amana, Chatamla, and Mimaporia, are currently considered monotypic, and their relationships based on morphology are somewhat enigmatic (Minet 2002, Wei and Yen 2017). Zhang et al. (2020) did include all four genera, and they were able to resolve their phylogenetic positions with confidence.
As in Zhang et al. (2020), we find that Nossa and Epicopeia are paraphyletic with regard to each other. Indeed, E. philenora appears to be the sister group to N. moorei and N. nagaensis, whereas N. palaeartica comes out as related to E. hainseii and E. polydora. Furthermore, these relationships are well supported. Minet (2002) also finds the two genera to be closely related and sharing six apomorphic character states, despite being superficially quite distinct with Epicopeia species tending to mimic papilionids, and Nossa species tending to mimic pierid species (Fig. 3). Clearly, these two genera need to be studied in more detail by including all 12 described species. It is possible that the genera should be synonymized, in which case Epicopeia would have priority. Also, we found E. hainseii and E. polydora to be genetically inseparable based on our dataset. In contrast, Zhang et al. (2020) find these two taxa to be completely separate, with E. polydora being sister to N. moorei, in a similar position to our E. philenora. Zhang et al. (2020) did not sample E. philenora, but E. polydora and E. philenora are morphologically very similar, suggesting that our sequences may be contaminants.
The paraphyly of Epicopeia and Nossa is surprising. These two genera are morphologically superficially very different, with Epicopeia species showing distinct tails on the hindwings, whereas Nossa species lack these tails. Indeed, Minet separated these two genera on morphological characters, including their genitalia (Minet 2002). However, one should keep in mind that Epicopeia are mimicking species of butterflies in the genera Papilio and Byasa (that have tails on the hindwings), whereas Nossa is thought to mimic species of Pieridae (that do not have tails; Wei and Yen 2017, Zhang et al. 2020). It has been considered that mimicry might be one of the causes for the rapid divergence of phenotypes (Turner 1976, Counterman et al. 2010, Kozak et al. 2015). Thus, in further work, we need to investigate this aspect by including more species and individuals of Epicopeia and Nossa.
Within Epicopeiidae, specimens with few loci explain most branches with low support (the exception being Schistomitra, described above). When we removed the four specimens with fewer than 10 loci (see Table 1) from our analyses, the relationships did not change, whereas the support greatly improved, reaching the maximum value of 100/100 on some branches, such as those for N. moorei and N. nagaensis, or for the relationships between Epicopeia hainseii and E. polydora (Supp Material 1 [online only]). This indicates that specimens with few loci affect only the support values, not the general topology.
Here we obtain strong support for the hypothesis that Sematuridae is the sister group of Epicopeiidae + Pseudobistonidae. Even with few representatives for Sematuridae and Pseudobistonidae, the support for this hypothesis is compelling (100/100) and in line with previous studies (Rajaei et al. 2015, Kawahara et al. 2019, Wang et al. 2019). Furthermore, we confirmed that Epicopeiidae is monophyletic with regard to Pseudobistonidae, strengthening the case for the latter family.
The first attempt to resolve the position of Pseudobistonidae was made when the family was described by Rajaei et al. (2015) to accommodate P. pinratanai. Rajaei et al. (2015) found the family to be the sister group of Epicopeiidae. Recently, the position of Pseudobistonidae was corroborated with the addition of another species in the family: H. discivitta (Wang et al. 2019). However, Wang et al. (2019) only included three Epicopeiidae and two Sematuridae species. Furthermore, the support for the branches leading to these three families was quite low, e.g., the branch supporting Sematuridae as the sister group of Epicopeiidae + Pseudobistonidae had a bootstrap value of 33. Zhang et al. (2020) include Heracula in their dataset and find it to be sister to Epicopeiidae with strong support; thus, it would appear that Pseudobistonidae is indeed the sister lineage to Epicopeiidae, with Sematuridae being sister to these two.
Old Material and Contamination
We see a tendency for old museum specimens to yield fewer loci than the more recently collected ones (Fig. 2). Overall, the older a specimen is, the lower the chances are of obtaining DNA from it with the TE approach. Nevertheless, some old specimens provide more loci than younger ones. For instance, for the two specimens of P. davidi, the older, collected in 1892, provided 516 raw loci, whereas the younger, collected in 1957, provided only a single raw locus. There is no clear explanation for these kinds of outliers, but they might be due to different treatments during their curation (Espeland et al. 2010, Burrell et al. 2015, Vaudo et al. 2018). Unfortunately, records of such treatments, and of how specimens were collected and curated, are usually not available, making it impossible here to infer which factors other than age affect the quality of DNA. Regardless, although older samples tend, as expected, to yield less DNA of poorer quality, this remains only a trend. Therefore, we should not discount these specimens just because they are old, as they can still turn out to be real genetic treasure troves.
Unfortunately, two specimens were definitely contaminated, E. philenora (S57) and N. palaeartica (S5), and therefore were not analyzed further. If they had been of good quality, they could have helped us to confirm the position of E. philenora in the case of S57, as well as the separation of Nossa in two groups with N. moorei + N. nagaensis on one side and N. palaeartica (S5) on the other side. In addition, our E. polydora specimens were found to be genetically identical to E. hainseii, in stark contrast to Zhang et al. (2020). One of our specimens (S47) yielded 1,270 raw loci (Table 1), suggesting large amounts of DNA in the extract. The two species cannot be confused morphologically (see doi:10.5281/zenodo.3769000). Clearly, this needs to be investigated in more detail, but for the moment, we do not have a good explanation for these results.
The Importance of Museomics
Since their creation, natural history museums have been an essential source of biological knowledge and resources for both the scientific community and the public (Duckworth et al. 1993, Suarez and Tsutsui 2004). These collections of biological specimens are vital for the study of systematics, global climate change research, biological invasion studies, as well as for many other scientific disciplines (Bi et al. 2013, Bradley et al. 2014, Bakker et al. 2020). Curated specimens in museums have several advantages compared with freshly collected specimens: they can be easy to access, most of them are identified, and they often carry information such as the date and location of collection. Moreover, nowadays, researchers in biology and ecology face many challenges before being able to sample in the field. These challenges can be monetary (e.g., lack of funding), stochastic (inaccessibility of species of interest, adverse weather conditions, pandemic, etc.), or administrative, with bureaucratic hurdles being erected at an increasing pace (Neumann et al. 2018). Natural history museums also contain extinct taxa, as well as rare and challenging-to-collect species, which can be a crucial asset to studies. However, until recently, this vast amount of biological resources was mainly used for morphological studies because the DNA from these specimens was thought to be too degraded to be used for molecular studies (Shapiro and Hofreiter 2012). Due to this, DNA work has for a long time mainly been limited to species for which freshly collected samples could be obtained, whereas DNA work from collections has been limited to sequencing short fragments of DNA (Hajibabaei et al. 2006, Lozier and Cameron 2009, Strutzenberger et al. 2012, Hebert et al. 2013, Cameron et al. 2016).
We have taken advantage of recent advances in sequencing technologies, which have opened up access to genomic data of museum specimens. Within the past few years, various studies have applied these methods to a wide variety of species: from birds (Anmarkrud and Lifjeld 2017, Cloutier et al. 2018) and mammals (Fabre et al. 2014, Hawkins et al. 2016) to insects (Kanda et al. 2015, Sproul and Maddison 2017) and plants (Zedane et al. 2016, Silva et al. 2017). Some of these studies used whole-genome sequencing (Kanda et al. 2015, Zedane et al. 2016, Sproul and Maddison 2017, Cloutier et al. 2018), whereas the others employed diverse genome reduction methods, such as exon capture (Bi et al. 2013) and TE (Hawkins et al. 2016). These studies not only used different kinds of sequencing methods but also addressed very distinct scientific questions: from systematics (Silva et al. 2017), to the origin and diversification of a taxon (Fabre et al. 2014), to population genomics (Bi et al. 2013).
Here, we used a genome reduction method, TE, on curated museum specimens of rare and challenging to collect moth species, to refine our knowledge of their phylogenetic relationships. We managed to recover on average 566 nuclear loci per species using the TE method. The present study also shows that it is possible to extract substantial amounts of DNA sequence data from specimens collected up to 127 yr ago. Hence, our study contributes to the field of museomics, demonstrating the application of this sequencing method on museum specimens, increasing the value of such specimens even further. Museomics opens a window to the past, providing possibilities for testing new hypotheses and for casting new light on old ones.
In summary, we conducted a phylogenetic analysis on small and rare families of Lepidoptera, using museum specimens. We successfully sequenced samples that were collected between 1892 and 2001. By utilizing a TE approach, we were able to recover between 150 and 1,383 loci per specimen for 75% of our samples. From all these raw loci, we used 378 genes (present in at least 20 samples) to reconstruct a phylogenetic hypothesis based on ML analysis of 37 taxa. This analysis corroborates, with strong support, the hypothesis that Sematuridae is the sister group of Epicopeiidae + Pseudobistonidae. Within Epicopeiidae, our study finds Deuveia as the sister group of the remaining Epicopeiidae genera. The position of Schistomitra is incongruent with the central hypothesis suggested by Minet (2002) for this family; however, the support for this branch is low. The low support for this branch might be explained in our study by the lack of some genera (Amana, Chatamla, and Mimaporia). Indeed, these taxa may help to clarify the phylogenetic position of Schistomitra, as seen in Zhang et al. (2020). Although we showed that Psychostrophia and Parabraxas are monophyletic, we also found that Nossa and Epicopeia are paraphyletic. Overall, the genera of Epicopeiidae require more work to reveal their phylogenetic relationships.
Museum collections represent a varied and essential biobank of samples for studying the diversity on earth. The availability of specimens, not only rare but also extinct, within worldwide museum collection is a fantastic asset. Nowadays, sequencing techniques are powerful enough to allow scientists to recover DNA from old museum specimens. This is the beginning of an exciting era for molecular studies. Our study makes its contribution to the field of museomics by successfully demonstrating that researchers can use museum samples at a molecular level for phylogenetic studies. Consequently, this study is paving the way for more molecular work using museum specimens.
We are thankful to Claudia Etzbauer for help with ordering the kit and to Sandra Kukowka for assistance in the molecular lab. We highly appreciate the effort of everyone depositing samples at the ZFMK. The study was funded by the Zoological Research Museum Alexander Koenig and received funding from the European Union's Horizon 2020 research and innovation program under the Marie Skłodowska-Curie Grant Agreement No. 642241.