The number and delimitation of species in the Phyciodes tharos species complex has puzzled lepidopterists for years. Previous analysis of mtDNA sequence data has suggested that P. cocyta is more closely related to P. cocyta than to P. tharos, in contrast to inferences from morphology and ecology. We sequenced the mitochondrial gene cytochrome oxidase I for 40 individuals of Phyciodes tharos, P. cocyta, and P. cocyta from Michigan and Ohio, a region at the boundaries of the geographic ranges of these species. Network and cladistic analyses reveal shared mtDNA lineages, indicating that limited hybridization occurs in this region between P. cocyta and the other two taxa but not between P. tharos and P. cocyta. Our evidence also supports the traditional phylogenetic assessment of P. tharos and P. cocyta as the two most closely related species in this species group. Data from nuclear genes are needed to more fully resolve this intriguing group of butterflies.
Papilionoidea, the true butterflies, are no doubt the group of invertebrates most well-known to science—thousands of studies over hundreds of years having been published, especially on their ecology and evolution (Boggs et al. 2003). The taxonomy of butterflies is especially well known, with most of the worldwide taxa already described—a huge feat considering the size of the group (Ackery et al. 1999). Yet there still remain many groups of butterflies for which species boundaries remain unclear. The genus Phyciodes , comprised of ten currently recognized species (Pelham 2008) with mainly Nearctic distributions, is one such group.
Within Phyciodes, the P. tharos species group (P. tharos, P. cocyta, P. cocyta, and P. pulchella) has been especially interesting due to the species' phenotypic similarity, variability in diagnostic characters, and (with the exception of P. cocyta and P. cocyta) largely parapatric ranges with broad swathes of sympatry (Fig. 1) (Scott 1994). Phyciodes tharos and P. cocyta are of particular interest, having long been considered conspecifics. Oliver (1980), however, based on extensive breeding experiments that revealed a degree of hybrid breakdown between various populations of Phyciodes tharos, distinguished two entities which he called tharos Type A and tharos Type B. Scott (1994) assigned the name “cocyta” to Oliver's “tharos Type B” and raised it to full species status based on a few morphological characters and sympatry with P. tharos in some areas, despite incomplete reproductive isolation in other areas. Phyciodes batesii and P. pulchella are more easily diagnosable, although extensive hybridization between P. cocyta and P. cocyta has also been observed in Colorado and Utah (Scott 1994, 1998). Recent authors have largely followed the lead of Scott (1994) in recognizing four distinct species in the P. tharos group, although agreement is by no means universal, especially as to whether P. cocyta should not be a subspecies of P. tharos (e.g. Glassberg 1999).
The advent of molecular analysis of DNA has been a great boon to butterfly systematics, augmenting knowledge of morphology and life histories (Sperling 2003). Wahlberg et al. (2003) conducted the first phylogenetic analysis of Phyciodes using DNA, with a parsimony analysis of 1450 base pairs of the mitochondrial gene cytochrome oxidase I (COI) from 140 Phyciodes specimens representing all ten species from across North American and Mexico. Wahlberg et al. (2003) attempted to sample specimens widely in order to capture as much geographic variation as possible. They found that the tharos species group formed a well-supported monophyletic clade (with the exception of two specimens). Within the clade they found that P. tharos was the most basal of the four species, which confirmed the suggestion of Scott (1994) based on genital and pupal characters. Surprisingly, though, they found that the haplotypes of P. cocyta and P. cocyta were interdigitated, grouping together to form several clades paraphyletic with respect to P. pulchella. The inference that P. cocyta is most closely related to P. cocyta and sister group to P. pulchella contradicts the morphological and ecological evidence that P. cocyta is most closely related to P. tharos.
In this paper we re-investigate the relationships between P. tharos, P. cocyta, and P. cocyta. In contrast to Wahlberg et al. (2003), our approach was to sample P. tharos, P. cocyta and P. cocyta from a relatively small geographic area: the lower peninsula of Michigan and northwest Ohio, where the ranges of all three species coincide (Fig 1). This is a region not sampled by Wahlberg et al. (2003); their closest specimens to our study area were collected from Carleton Co., Ontario. Since there is known to be incomplete reproductive isolation between these three species (Scott 1994), this area is a particularly interesting region to investigate the question of which species is most closely related to P. cocyta, and to see whether evidence of gene flow between species can be observed.
Materials and Methods
Sampling and molecular techniques. Specimens of Phyciodes were collected between 06 June and 19 June 2006 throughout the northern part of the lower peninsula of Michigan. Since only P. cocyta and P. cocyta were found in Michigan, P. tharos was collected in Ohio on 29 and 30 June 2006. All specimens were spread while fresh and are now stored at the Hillsdale College Insect Collection, Hillsdale, Michigan. Approximate collection localities are displayed in Fig 1; complete collection data are presented in Appendix 1. Identification to species level was made after examination of the color of the antennal clubs, the extent and pattern of reticulation and dark pigmentation on the dorsal wing surfaces, extent of dark coloration on the hindwings below, and overall size (Nielsen 1999; Scott 1986). In our estimation, none of the specimens from which reliable sequence data was extracted had a doubtful identification based on these characters.
Two legs were removed from each fresh specimen for DNA extraction, either immediately after capture or after several months storage in ethanol in refrigeration. Genomic DNA was extracted from the two legs of each specimen using the Qiagen DNeasy extraction kit. Two primer pairs were used to amplify 1450 base pairs of the COI gene: LCO1490-J-1514 (5 ' G G T C A A C A A A T C A T A A A G A T A T T G G ) and H C O 2 1 9 8 - N - 2 1 7 5 (5'TAACTTCAGGGTGACCAAAAAATCA) (Folmer et al. 1994), and C 1 - J - 2 1 8 3 (5'CAACAYTTATTTTGATTTTTTGG) and TL2-N-3014 (5'ATCCATTACATATAATCTGCCATA) (Simon et al. 1994). These primers were chosen based on their previous successful amplification in Phyciodes (Wahlberg et al. 2003). All COI fragments were amplified with standard polymerase chain reaction (PCR) techniques. PCR products were monitored for yield, specificity, and contamination using agarose gel electro¬phoresis and cleaned with ExoSAP-IT. PCR fragments were sequenced with a dye terminator cycle sequencing kit and an Applied Biosystems 3130 Genetic Analyzer. Only one strand of each fragment was sequenced, using the forward primer. Sequence quality was assessed by the Applied Biosystems software Sequencing Analysis 5.2 and by visual inspection of the chromatograms. Areas of poor sequence quality, such as the center of the COI gene where the fragments of the two sequencing runs overlapped, were trimmed. Sequences of entirely poor quality were discarded. Forty sequences of 1319 nucleotide characters were ultimately retained.
Data analyses. Seventy-eight COI sequences of Phyciodes tharos, P. cocyta, P. cocyta, P. pulchella, and P. phaon from Wahlberg et al. (2003 (accession nos. AF187747, AF187785, AF187789, AF187798, AF187800, AF187783, AF187807, AY156595-AY156686), were downloaded from GenBank. Those, together with our 40 novel sequences, were aligned in Mesquite v. 2.6 (Maddison & Maddison 2009) using Clustal W v. 2.0.9 (Larkin et al. 2007) with default settings. Alignments were manually adjusted using the Align package in Mesquite (Maddison et al. 2007).
Our 40 novel sequences were opened with SplitsTree4 v. 4.10 (Huson ↦ Bryant 2006) and a split network generated with Kimura-2 parameter (K2P) distance according to the Neighbor-Net method (Bryant & Moulton 2004); all other settings were set to default. Robustness of the splits was assessed with 1000 bootstrap repetitions.
In order to compare directly our data with Wahlberg et al. (2003), a phylogenetic tree was generated using the maximum-likelihood method implemented in Garli 1.0 (Zwickl 2006). We used the HKY+I+G model of evolution, which was selected by ModelTest 3.7 (Posada ↦ Crandall 1998) as the most likely to fit our data according to the AIC criterion. During the analysis, Garli was allowed to estimate rate parameters, base frequencies, and proportion of invariable sites. Fifteen search replicates were performed to find the best tree of score -4897.3347. Two hundred bootstrap replicates were performed in order to estimate branch support. The tree was rooted with the three P. phaon sequences and two aberrant P. pulchella sequences (47–6 CA3 and 49–13 CA3).
Network analysis. Our 40 Phyciodes sequences are represented by a split network in Fig. 2. A split network can be thought of as a generalized combination of many phylogenetic trees; in fact, phylogenetic trees are a sub-category within split networks. With any data set, such as DNA sequences, any given character can be used to split a set of taxa into separate groups. For example, in this matrix
the first column splits [w,x] from [y,z], the second column splits [x] from [w,y,z], the third column does not split any taxa, because it is constant, and the fourth column splits [w,y] from [x,z]. Any set of splits can be compatible or incompatible. A phylogenetic tree is a visual representation of a set of compatible splits; a split network can represent a set of splits whether compatible or incompatible. When all splits are compatible, they can each be represented graphically by a single branch: this is a phylogenetic tree. With most real data sets, however, this is not the case. Incompatible splits in a split network are represented by multiple parallel branches (Huson & Bryant, 2006). Hence a split network can incorporate a measure of uncertainty within a data set. The network in Fig. 2 was constructed by the Neighbor-Net method, in which the data matrix used to generate the split network is a matrix of the genetic distances between taxa.
The sequences in Fig. 2 separate into two clusters with a large relative distance between them, supported by 100% bootstrap support (not shown). All of the five P. tharos are in one cluster, all of the 10 P. cocyta are in the other, and the 25 P. cocyta are distributed among both (20 with the P. tharos grouping and five with the P. cocyta grouping). There is no one set of splits that encompasses one species exclusively. Of the five P. cocyta that group with the P. cocyta, two are two of the three P. cocyta from Otsego Co.—the location where most of the P. cocyta in our study were collected, and the only place we sampled where P. cocyta was the most commonly encountered Phyciodes species. Four of the five P. tharos sequences are included in a group of six haplotypes separated from the rest by a single set of splits (Fig. 2). Of the two P. cocyta sequences that complete that group, one is the lone P. cocyta specimen from Ionia Co., by far the closest geographically to the location where the P. tharos specimens were collected (Fig. 1D).
Phylogenetic tree analysis. When we constructed a maximum-likelihood tree from our 40 novel sequences in addition to the 78 Phyciodes sequences from Wahlberg et al. (2003), the branching pattern and branch support closely resemble the parsimony tree found in Wahlberg et al. (2003) (Fig. 3). All the clades marked with capital letters by Wahlberg et al. (2003) as being important for discussion in their tree are also found in our tree, and with comparable branch supports. The phylogeny appears robust to the addition of new sequences and to the method of phylogenetic inference, parsimony or maximum likelihood. The topology of the tree can be summarized as three principal clades: a “tharos” clade (clade B), a “cocyta/batesii” clade (clades C, D, E), and a “pulchella” clade (clades F – H). All five of our novel P. tharos sequences and 20 of our 25 novel P. cocyta sequences clustered within the “tharos” clade; the remaining five P. cocyta sequences and all ten of our novel P. cocyta sequences clustered within the “cocyta/batesii” clade (Table 1).
Number of haplotypes of three species of Phyciodes found in certain clades of the phylogenetic tree generated from the data of Wahlberg et al. (2003), the number of haplotypes of each species from our data added to those clades when our data was analyzed with that of Wahlberg et al. (2003), and the total number of haplotypes of each species in those clades (see Fig. 3).
Our results in Fig. 2 suggest that: 1. mitochondrial introgression may have taken place between P. cocyta and P. cocyta and between P. cocyta and P. tharos, and 2. that P. cocyta in our area of study is more closely related to P. tharos than to P. cocyta. The first inference is supported by the fact that none of the three species are separated into an exclusive cluster. It is also suggestive that of the two P. cocyta that cluster with the P. tharos sequences group A in Fig. 2, one is from Ionia Co., the closest location geographically to where all the P. tharos were collected. Similarly, two of the three P. cocyta collected from Otsego Co., where almost all of the P. cocyta were collected, cluster with the P. cocyta sequences despite three-quarters of the P. cocyta clustering at the P. tharos end of the figure. The second inference is supported because 15 of the 20 P. cocyta sequences are far closer to the P. tharos sequences than to the P. cocyta. Moreover, when our data are added to that of Wahlberg et al. (2003) (Fig. 3), the same 15 P. cocyta sequences, as well as the five P. tharos sequences, come out in the “tharos” clade (clade B; Table 1). The P. cocyta from our study area are not only more similar to the P. tharos than the P. cocyta from our study area, but from all around the continent.
Species that are recently diverged and closely related are expected to sustain some gene flow (Coyne & Orr 2004). As long as they maintain genetic integrity across their range, this does not compromise their status as good species (Sperling 2003). Gene flow between recently diverged species is not unusual and has been documented in a wide variety of animal taxa (e.g. Nosil 2008; Friar 2007). The members of the Phyciodes tharos species complex are likely just such recently diverged species. They diverged on the basis of adaptation to different ecological pressures (Oliver 1980), and differences in flight periods in response to environmental conditions was likely a key factor in the speciation (Oliver 1980; Scott 1994). At the geographic boundary between species' ranges (especially of parapatric species, like tharos and cocyta) where ecological pressures are similar on both, hybridization is particularly likely.
Hybridization and mitochondrial introgression may best explain the pattern in our data (Fig. 2). Similar patterns, where mitochondrial lineages correlate better with geographic than phylogenetic distances between populations, have been observed in other Lepidoptera (Schmidt & Sperling 2008). It may not be the best explanation for the non-monophyly of especially P. cocyta and P. cocyta in Wahlberg et al. (2003), although non-monophyly in mitochondrial gene trees has been demonstrated to be an indicator of mitochondrial introgression in various other taxa (Linnen ↦ Farrell 2007; Shaw 2002; Gompert et al. 2008). Nearly all of the P. tharos, P. cocyta and P. cocyta in Wahlberg et al. (2003) were sampled from or near areas where all three species are found and yet, unlike our data, different species collected from the same locality fell out in different parts of their tree.
Many species, especially recently diverged ones, are paraphyletic with respect to their gene trees due to gene introgression via hybridization or incomplete lineage sorting from variable ancestral populations (Maddison 1997; Funk ↦ Omland 2003). Wahlberg et al. (2003) explained the patterns of non-monophyly of the species in the P. tharos complex as largely due to incomplete lineage sorting, but with hybridization and mitochondrial gene introgression playing a role especially between P. cocyta and P. cocyta. Across the largely sympatric ranges of P. cocyta and P. cocyta (Fig. 1), there is only one place where morphology indicates hybridization may be currently occurring regularly between them: the Rocky Mountains of Utah and Colorado (Scott 1998). Elsewhere their phenotypes are quite distinct (Scott 1994, 1998). Confusion in species identification in the P. tharos species complex usually occurs when P. tharos and P. cocyta are confused, as in the case of Porter and Mueller (1998) whose conclusions in hybridization experiments between P. tharos and P. cocyta were challenged by a claim by J. Scott that they had misidentified a subspecies of P. cocyta as P. tharos (Wahlberg et al. 2003). Furthermore, hybridization and hybrid viability has been observed several times between P. tharos and P. cocyta, both in the laboratory and in the wild (Oliver 1980; Scott 1986b). Our data corroborate the conclusion of Wahlberg et al. (2003) that introgression has likely occurred between P. cocyta and both P. cocyta and P. tharos, but, in contrast to their conclusions, suggest that introgression between P. cocyta and P. tharos has been more widespread than between P. cocyta and P. cocyta, at least in our area of study.
The final word on the relationships between these butterflies has, of course, not yet been established. The keys to shedding more light on the questions in this species complex will be more intense sampling (e.g. Funk 1999) and the use of nuclear genetic markers. Numerous studies have shown the importance of sampling both mitochondrial and nuclear genes for phylogenetic analyses, since their different patterns of inheritance often lead to discordant gene trees (e.g. Berthier et al. 2006; Gomez-Zurita & Vogler 2003). Wahlberg and Freitas (2007) did analyze a mitochondrial gene and two nuclear genes for 11 species of the P. tharos species complex in a phylogeny of the Phyciodina tribe, but with ambiguous results: parsimony and Bayesian analyses of the combined genes gave very different arrangements of the P. tharos species complex. Until an analysis of nuclear genes can be incorporated into a much more intense sampling of the P. tharos species complex, we can only anticipate further elucidation of the complications in this fascinating group of butterflies.
We gratefully acknowledge and thank Dan York, for his supplying of material resources and funding and especially for his guidance in data collection and analysis; Niklas Wahlberg, for his early inspiration and support for undertaking this study; Beth Reiter, for her invaluable support and assistance in the laboratory; Owen Perkins and the Michigan Entomological Society, for providing information on collecting localities and dates; Mogens “Mo” Nielsen, for assistance with identifying specimens; and Felix Sperling, for advice and comments on drafts of this paper. This research was conducted at Hillsdale Colleges G. H. Gordon Biological Station near Luther, MI. It was funded by a Laboratory for Advanced Undergraduate Research Education Adapted for Talented and Extraordinary Students grant to BP, and Hillsdale College Biology Department funding to DCH and Dan York. This writing of this paper was supported by an NSERC Discovery grant to Felix Sperling, University of Alberta. This is paper #1 of the G.H. Gordon Bio-Station Research Series