We compared the primary and secondary structures of the V4 (helices E23-2 to E23-5) and V7 (helix 43) regions of 18S rRNAs in insects and in the other three major arthropod groups (crustaceans, myriapods, and chelicerates) for which sequences are currently available. We found that the lengths of the primary sequences and the shapes of the secondary structures of these two hypervariable regions of insect 18S rRNA are phylogenetically informative even at the infraclass level and reflect major steps in insect evolution. The long sequence insertion and bifurcated shape of helices E23-2 to E23-5 in the V4 region are unique synapomorphic characters for winged insects (Pterygota). The long sequence insertion and expanded stem length of helix 43 in the V7 region are synapomorphic characters for holometabolous insects, which undergo complete metamorphosis. The strongly conserved secondary structures suggest that these hypervariable regions may be related to important cellular functions that are as yet unknown. Comparison with the insect fossil record revealed that the pterygote synapomorphy (V4) and the holometabolous synapomorphy (V7) were established prior to the acquisition of insect wings (flight system) and prior to the development of complete metamorphosis, respectively. These synapomorphies have also been relatively stable for at least 300 Myr and 280 Myr, respectively. This implies that the expansion events of the V4 and V7 regions did not occur simultaneously but independently, at different periods during insect evolution. This in turn suggests that the V4 and V7 regions are not functionally correlated, contrary to the possibility recently raised by Crease and Colbourne.
INTRODUCTION
The slowly evolving nature of 18S rRNA sequences has been widely exploited in phylogenetic studies of remotely related animal groups, such as phyla, classes, and orders. In phylogenetic studies of the major arthropods and related groups, a number of authors have also used primary sequence information from the slowly evolving parts of the 18S rDNA (Carmean et al., 1992; Pashley et al., 1993; Campbell et al., 1994; Friedrich and Tautz, 1995; Kim et al., 1996; Chalwatzis et al., 1996; Giribet et al., 1996; Friedrich and Tautz, 1997). The fast-evolving regions (especially V4 and V7) have been excluded from those analyses because of difficulties in obtaining a reliable alignment and in constructing an unambiguous secondary structure, and because the phylogenetic information in the nucleotide sequences is saturated by multiple hits. Only recently have the fast-evolving parts of 18S rDNA been used for phylogenetic studies among close relatives at lower categories, such as the family level in tiger beetles (Vogler and Pearson, 1996; Vogler et al., 1997; Hancock and Vogler, 1998).
We previously determined the 18S rDNA sequences of a number of collembolan species (Lee et al., 1995a, b; Hwang et al., 1995). We found that the collembolan 18S rDNA is far shorter than that of dipteran insects and that the length difference is mainly caused by expansions of the V4 and V7 regions in dipteran insects (Hwang et al., 1995). This finding suggested that the primary and secondary structures of these two variable regions may provide critical information on insect phylogeny and 18S rRNA evolution. Recently, Crease and Colbourne (1998) reported that coordinated, and perhaps functionally correlated, increases occur between the V4 and V7 regions of many arthropod 18S rRNAs.
The phylogenetic relationships among the major subgroups of insects are relatively well documented on the basis of morphological and paleontological characters (Kristensen, 1991; Kukalovà-Peck, 1991). In addition, insect phylogeny has been examined and discussed on the basis of molecular data such as the alignable sequences of 18S and 28S rRNAs (Chalwatzis et al., 1996; Whiting et al., 1997). Yet the variable regions of 18S rRNA have so far never been employed in molecular phylogenetic studies at higher categorical levels (above order) of insects. In this paper, we conduct comparative analyses of the primary and secondary structures of two hypervariable regions, V4 and V7, of 18S rRNA. We suggest that these regions evolved independently during insect evolution and can provide phylogenetic information at higher categorical levels (above order) of insects.
MATERIALS AND METHODS
All arthropod 18S rRNA sequences accessible from the EMBL data bank were retrieved and examined. At least one 18S rDNA sequence has been published for most major insect orders. Because the primary and secondary structures of 18S rRNAs are similar within each insect order, one representative species was selected from each order (except for Collembola and Diptera, with 3 and 2 species, respectively), and their sequence alignments and secondary structures are presented in this paper. Two representative 18S rRNA sequences from the other major arthropod groups (crustaceans, chelicerates, and myriapods) are also shown as reference groups. However, the extreme cases of sequence expansion found in tiger beetles (Vogler et al., 1997), strepsipteran species such as Xenos vesparum, Mengenilla chobauti, and Stylops melittae (Chalwatzis et al., 1995), the pea aphid Acyrthosiphon pisum (Kwon et al., 1991), the branchiopod crustacean Daphnia pulex (Crease and Colbourne, 1998), and the isopod crustacean Armadillidium vulgare (Choe et al., 1999a) are not included in the alignment set because of the difficulty of constructing stable secondary structures for them. We will discuss these exceptional cases in detail in a subsequent paper. The classification, representative species names, EMBL accession numbers, and abbreviations of taxon names are listed in Table 1. The classification scheme follows Kristensen (1991) for the Hexapoda, Brusca and Brusca (1990) for the Crustacea, and Hickman et al. (1984) for the Chelicerata and the Myriapoda.
Table 1
List of representative arthropod species employed in this analysis and the abbreviations
The sequences of the V4 and V7 regions of the 18S rDNAs from 22 and 17 species (Table 1), respectively, were aligned with Clustal X (Thompson et al., 1997). The alignments of the primary sequences of helices E23-2 to E23-5 of the V4 region and helix 43 of the V7 region were then adjusted on the basis of compensatory substitutions observed in our predicted secondary structure model. The nomenclature of these helices follows Neefs et al. (1993), and for convenience their positions are indicated in the putative secondary structure of the 18S rRNA of Hypogastrura dolsana (order Collembola) (Fig. 1A).
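The idea of a compensatory substitution can be illustrated with a minimal sketch. It is not the procedure used in this study, where the alignments were adjusted by inspection; the function names and the toy sequences are hypothetical, and the sketch assumes that G-U wobble pairs are accepted alongside Watson-Crick pairs.

```python
# Illustrative only: classify how a putative stem pair changes between two
# aligned sequences. All names and sequences here are hypothetical.

# Pairs treated as compatible with a stem: Watson-Crick plus G-U wobble
# (accepting wobble pairs is an assumption of this sketch).
PAIRING = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def normalize(base):
    """Upper-case a base and treat DNA 'T' as RNA 'U'."""
    return base.upper().replace("T", "U")


def can_pair(a, b):
    """Return True if the two bases can form a stem pair."""
    return (normalize(a), normalize(b)) in PAIRING


def classify_column_pair(reference, other, i, j):
    """Compare aligned columns i and j (0-based) in two aligned sequences."""
    ref_pair = (normalize(reference[i]), normalize(reference[j]))
    oth_pair = (normalize(other[i]), normalize(other[j]))
    if not can_pair(*ref_pair):
        return "reference unpaired"
    if oth_pair == ref_pair:
        return "unchanged"
    if not can_pair(*oth_pair):
        return "pairing lost"
    changed = sum(a != b for a, b in zip(ref_pair, oth_pair))
    # Both sides changed and pairing is kept: a compensatory substitution.
    return "compensatory" if changed == 2 else "single change, pairing kept"


reference = "GGACUUCGGUCC"  # toy hairpin; columns 0 and 11 pair as G-C
print(classify_column_pair(reference, "AGACUUCGGUCU", 0, 11))
print(classify_column_pair(reference, "GGACUUCGGUCU", 0, 11))
print(classify_column_pair(reference, "GGACUUCGGUCA", 0, 11))
```

In this toy case the first comparison changes G-C to A-U, which is the kind of paired change that supports a proposed stem.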
In the present study, our putative secondary structure of helices E23-2 to E23-5 in D. melanogaster was predicted using helix E23-2 as an anchored pairing and was compared with two previously published secondary structure models, those of Rijk et al. (1992) and Hancock et al. (1988) (Fig. 1B). Our model is quite different from that of Rijk et al. (1992), which folds into three helices (E23-3, E23-4, and E23-5) arising from helix E23-2 (the anchored pairings). However, our model is rather similar to that of Hancock et al. (1988): both fold into two helices, and the base pairings of the stems in these helices are identical, although in the model of Hancock et al. (1988) the anchored pairings are absent and their nucleotides are involved in flanking stems or loops. Of the two helices in our secondary structure model (Fig. 1B), the right helix includes helix E23-5 of Rijk et al. (1992), and the left one corresponds to the hypervariable region discussed for the 18S rDNA of tiger beetles by Vogler et al. (1997). The model recently revised by Peer et al. (1998) agrees well with our model. The putative secondary structures of helices E23-2 to E23-5 and helix 43 were finally drawn with the loopDloop secondary structure drawing software (Gilbert, 1992).
RESULTS
Primary structure analysis
The multiple sequence alignments of arthropod V4 and V7 regions are shown in Fig. 2. The stem and loop regions are indicated according to the Kjer method in the alignments (Kjer, 1995). In this multiple alignment, the sequence positions of V4 and V7 regions correspond to positions 643 to 855 and 1421 to 1608 of 18S rDNA of Drosophila melanogaster, respectively (Tautz et al., 1988).
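As an illustration of how the coordinates quoted above translate into sequence slices, the following sketch extracts a region from a sequence given 1-based inclusive positions. The helper names are hypothetical and the short test string is a toy sequence, not the D. melanogaster 18S rDNA.

```python
# Illustrative only: 1-based inclusive coordinates, as quoted in the text
# for the D. melanogaster 18S rDNA. Helper names are hypothetical.
V4_REGION = (643, 855)
V7_REGION = (1421, 1608)


def region_length(start, end):
    """Number of positions covered by a 1-based inclusive interval."""
    return end - start + 1


def extract_region(sequence, start, end):
    """Return the subsequence for a 1-based inclusive interval."""
    if start < 1 or end > len(sequence) or start > end:
        raise ValueError("interval lies outside the sequence")
    # Python slices are 0-based and end-exclusive, hence start - 1.
    return sequence[start - 1:end]


print(region_length(*V4_REGION))  # span of the aligned V4 positions
print(region_length(*V7_REGION))  # span of the aligned V7 positions
print(extract_region("ACGTACGT", 3, 5))  # toy sequence
```

With these coordinates the aligned V4 and V7 spans cover 213 and 188 positions of the D. melanogaster sequence, respectively.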
The sequences of helices E23-2 to E23-5 of the V4 region range from 49 bp to 51 bp in apterygote insects, from 72 bp to 120 bp in pterygote insects, from 46 bp to 48 bp in myriapods, from 47 bp to 55 bp in crustaceans, and from 46 bp to 49 bp in chelicerates. The longest and the shortest sequences are those of Philaenus spumarius (Order Hemiptera, Class Insecta, Subphylum Hexapoda) and Artemia salina (Order Anostraca, Class Branchiopoda, Subphylum Crustacea), respectively. The sequences of helices E23-2 to E23-5 in pterygote insects are longer than those of apterygote insects and of the other three major arthropod groups (chelicerates, crustaceans, and myriapods). Among pterygote insects, the orders Dermaptera, Orthoptera, and Hemiptera have relatively long sequences. Among the holometabolous pterygotes, D. melanogaster has a 108 bp sequence, about 30 bp longer than those of the other holometabolous insects (Fig. 2A).
The sequences of helix 43 of the V7 region range from 52 bp to 59 bp in apterygote insects, from 55 bp to 66 bp in hemimetabolous insects, from 86 bp to 155 bp in holometabolous insects, and from 49 bp to 53 bp in crustaceans. Those of Eurypelma californica, a chelicerate, and Bothropolys asperatus, a myriapod, are 50 bp and 54 bp long, respectively. The longest and the shortest are those of D. melanogaster (155 bp) and A. salina (49 bp), respectively. Helix 43 of P. spumarius (order Hemiptera) is 66 bp long, about 10 bp longer than those of the apterygotes and the paleopterans (Ephemeroptera and Odonata). The sequences of holometabolous insects are longer than those of apterygote insects, hemimetabolous insects (Paleoptera and Hemiptera), and the other three major arthropod groups (Fig. 2B).
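The separation between groups can be checked directly from the length ranges reported above. The sketch below simply encodes those ranges and computes how far the shortest sequence of one group lies above the longest sequence of all other groups; the dictionary keys and the function name are hypothetical labels introduced for illustration.

```python
# Length ranges (bp) as reported in the text; keys are illustrative labels.
V4_LENGTHS = {
    "apterygote insects": (49, 51),
    "pterygote insects": (72, 120),
    "myriapods": (46, 48),
    "crustaceans": (47, 55),
    "chelicerates": (46, 49),
}
V7_LENGTHS = {
    "apterygote insects": (52, 59),
    "hemimetabolous insects": (55, 66),
    "holometabolous insects": (86, 155),
    "crustaceans": (49, 53),
    "chelicerate (E. californica)": (50, 50),
    "myriapod (B. asperatus)": (54, 54),
}


def gap_below(ranges, group):
    """Minimum of `group` minus the largest maximum among all other groups.

    A positive value means the group's range does not overlap any other.
    """
    lowest = ranges[group][0]
    highest_other = max(hi for name, (lo, hi) in ranges.items() if name != group)
    return lowest - highest_other


print(gap_below(V4_LENGTHS, "pterygote insects"))
print(gap_below(V7_LENGTHS, "holometabolous insects"))
```

On the reported ranges, the shortest pterygote V4 sequence is 17 bp longer than the longest non-pterygote one, and the shortest holometabolous helix 43 is 20 bp longer than the longest non-holometabolous one.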
Secondary structure analysis
The shapes of the secondary structures of helices E23-2 to E23-5 are well conserved, with taxon-specific patterns, as shown in Fig. 3 (A and B). Most of the secondary structures of helices E23-2 to E23-5 in pterygotes have a bifurcated form connected to the anchored pairings (helix E23-2) (Fig. 3B). These secondary structures indicate that pterygote insects have one more helix than apterygote insects (Fig. 3A) and the other three major arthropod groups (data not shown), both of which have a single elongated helix connected to the anchored pairings. Putative secondary structures of helix 43 (V7 region) were predicted and compared as shown in Fig. 3 (C and D). The stems of helix 43 in holometabolous insects (Fig. 3D) are much longer than those of hemimetabolous insects, apterygote insects (Fig. 3C), and the other three major arthropod groups (data not shown).
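The distinction between a single elongated helix and a bifurcated form can be made concrete with a small sketch. It assumes that a structure is written in dot-bracket notation, which is not used in this paper, and the two structures below are invented toy examples rather than the structures of Fig. 3.

```python
import re

# Illustrative only: dot-bracket notation is an assumption of this sketch,
# and both structures are toy examples, not structures from this study.


def count_hairpins(dot_bracket):
    """Count hairpin loops, i.e. '(' followed by unpaired dots and then ')'.

    Each terminal helix ends in exactly one hairpin loop, so this equals
    the number of terminal helices in a pseudoknot-free structure.
    """
    return len(re.findall(r"\(\.+\)", dot_bracket))


single_helix = "((((((....))))))"
bifurcated = "((..((((....))))..((((....))))..))"  # two helices on an anchored stem

print(count_hairpins(single_helix))
print(count_hairpins(bifurcated))
```

Under this representation the apterygote-like form has one terminal helix and the pterygote-like bifurcated form has two.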
DISCUSSION
We previously reported that two dipteran insects (D. melanogaster and Aedes albopictus) are clearly distinguished from collembolan insects by their longer sequences in the V4 region (61–76 bp) and in the V7 region (89–104 bp). We also suggested that comparison of more insect 18S rDNA sequences would establish the significance of this apparently taxon-specific pattern (Hwang et al., 1995). The present study reveals that these two expanded/deleted regions correspond to helices E23-2 to E23-5 in the V4 region and helix 43 in the V7 region, respectively. Our multiple alignments, derived from most of the major insect orders, confirm that expansions of insect 18S rRNA are taxon specific and occur mainly in these two hypervariable regions, V4 and V7.
Wheeler (1989) suggested that the ectognathous insects (or perhaps the pterygote insects) are defined by a medium-sized sequence insertion in 18S rDNA, on the basis of size variation patterns after digestion with the restriction enzymes XbaI/EcoRI. However, because his analysis did not include Thysanura and Archaeognatha, it was not clear whether the observed insertions define ectognathous insects or pterygote insects. Moreover, owing to the limitations of the experiment, it was impossible to determine in which region of the 18S rDNA the insertions lay. Our present analysis reveals that the observed length differences are caused by expansions of helices E23-2 to E23-5 in pterygote insects and of helix 43 in holometabolous insects, and not by an expansion in ectognathous insects as a whole.
When Rijk et al. (1992) and Neefs et al. (1993) listed and named helices specific to eukaryotes, they included helices E23-2, 3, 4, and 5 in the class Insecta (Fig. 1B). At that time, only a few insect 18S rRNA sequences were available, and they were limited to holometabolous insects (e.g. Apis mellifera, Tenebrio molitor, and a couple of Drosophila species). Our present secondary structure analysis, conducted with relatively abundant sequence data, shows that within the class Insecta only the infraclass Pterygota has formed two helices with expanded sequences. The Apterygota, which consist of Collembola, Thysanura, and Archaeognatha, have only one elongated helix.
The present analyses of the primary and secondary structures of 18S rRNAs are in accordance with the monophyly of the Pterygota and of the Holometabola already established by morphological evidence (Kristensen, 1991). The monophyly of Archaeognatha, Thysanura, and Pterygota (ectognathous insects) has also been suggested on the basis of morphological characters and is widely accepted (Brusca and Brusca, 1990). However, one intriguing result of our present study is that Archaeognatha and Thysanura have neither the V4 sequence expansion nor the additional helix in helices E23-2 to E23-5 of the V4 region, as is also the case for collembolans and for the other three major arthropod groups (crustaceans, myriapods, and chelicerates). In this respect they contrast with the remaining ectognathous insects, the Pterygota, which have two helices in helices E23-2 to E23-5. Among the ectognathous insects, Archaeognatha and Thysanura have been considered intermediate forms between the development of ectognathous mouthparts and the development of wings. They thus share both primitive morphological characters found in collembolans and advanced morphological characters found in pterygotes. Their possession of a collembolan-like single helix in helices E23-2 to E23-5 is a further plesiomorphic character, seen at the molecular level.
Our results imply that the bifurcated shape of helices E23-2 to E23-5 is a synapomorphy for the pterygote insects. The large sequence insertion and the bifurcated form of the secondary structure of this region are present in most of the pterygote insects examined thus far, with one exception, Hydropsyche sp. (data not shown; Trichoptera, X89483), which has only a single helix in the region of helices E23-2 to E23-5. It is most parsimonious to assume that in Hydropsyche the second helix was lost independently within the evolutionary lineage of Trichoptera. Likewise, for the holometabolous insects, the stem elongation of helix 43 is a shared synapomorphic character. The remarkably conserved secondary structures suggest that these hypervariable regions may be related to important cellular functions that are as yet unknown.
In the present analysis, we did not include the 18S rDNAs of some insect species that evolve fast and have extremely expanded lengths, because it is difficult to predict stable secondary structures for them. Recently, we reconstructed the secondary structures of strepsipteran 18S rRNAs (Choe et al., 1999b). The results showed that they have unique secondary structures that deviate strongly from the general features described here. It is likely that such deviations in the excluded taxa, including strepsipteran insects, are autapomorphic characters that arose only within each evolutionary lineage.
With regard to insect phylogeny, the cladistic exploitation of structural changes in rDNA was pioneered by Wheeler (1989). In our previous publication (Hwang et al., 1998), we also attempted to interpret the phylogenetic meaning of structural changes in the R1/R2 elements of 28S rDNA, the D3 stem of 28S rRNA, and ITS2a within 5.8S rDNA in insect phylogeny. We concluded then that the 28S and 5.8S rDNA regions yield few structural changes that are informative for higher insect systematics. Our present study, however, reveals that the primary and secondary structures of the V4 and V7 regions of 18S rRNA are phylogenetically informative at higher categorical levels of insects and clearly indicate major steps in insect evolution.
On the basis of the insect fossil record (Kukalovà-Peck, 1991), it can be deduced that the bifurcated form of helices E23-2 to E23-5 was established prior to the acquisition of insect wings (flight system) and has been relatively stable for at least 300 Myr. Thereafter, the elongation of helix 43, which has been relatively stable for at least 280 Myr, was established prior to complete metamorphosis (Fig. 4). It should be noted that these two structural changes in insects have been highly conserved over a relatively long period of time. All this seems to indicate that the expansion events of the V4 and V7 regions in insect 18S rRNA did not occur simultaneously but independently, at different periods during insect evolution. Crease and Colbourne (1998) proposed two possibilities to explain the coordinated increases of the V4 and V7 regions in Arthropoda: one is that the variable regions are functionally correlated, as was suggested for the 28S rDNA (Hancock and Dover, 1988, 1990), and the other is that expansion mechanisms operate selectively only within certain variable regions where such changes can be tolerated. Our comparative analyses based on the insect fossil record strongly support the view that the expansions of these two variable regions are not functionally correlated and evolved separately. If the times at which the expansions of the V4 and V7 regions occurred are not considered, the lengths of these two regions can easily be misinterpreted as having increased simultaneously. Our results show that there is a time gap of at least 20 Myr between the expansion events of these two variable regions. This is a relatively long gap considering that, since the first pterygote insect emerged, most of the extant insects have evolved within only ca. 50 Myr. If these variable regions were functionally correlated, the expansion events would be expected to have occurred at nearly the same time. Therefore, it is likely that the coordinated-like pattern of V4 and V7 expansions is due to the selective operation of an expansion mechanism rather than to functional correlation.
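The arithmetic behind the time gap can be written out as a short sketch. The three input values are those given in the text; the variable and function names are hypothetical, and because the stability periods are stated as lower bounds ("at least"), the result is only the nominal difference between them.

```python
# Values taken from the text (Myr); names are illustrative.
V4_STABLE_MYR = 300       # bifurcated E23-2 to E23-5, stable for at least this long
V7_STABLE_MYR = 280       # elongated helix 43, stable for at least this long
RADIATION_WINDOW_MYR = 50  # ca. period in which most extant insects evolved


def expansion_gap(earlier_myr, later_myr):
    """Nominal gap between two events dated in Myr before present."""
    return earlier_myr - later_myr


gap = expansion_gap(V4_STABLE_MYR, V7_STABLE_MYR)
fraction_of_window = gap / RADIATION_WINDOW_MYR

print(gap)                 # nominal gap in Myr
print(fraction_of_window)  # gap as a fraction of the ca. 50 Myr window
```

On these figures the nominal gap is 20 Myr, that is, about two fifths of the ca. 50 Myr window referred to above.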
In conclusion, the primary and secondary structures of the V4 and V7 regions of 18S rDNA are phylogenetically informative and reflect major steps in insect evolution. The highly conserved secondary structures of these two hypervariable regions suggest that they may serve important cellular functions that are as yet unknown. Their coordinated-like pattern of increase is not caused by a functional relationship between the two regions. Considering that hypervariable regions of 18S rDNA have so far generally been employed for phylogenetic studies at lower hierarchical levels (below family) or removed before comparative analyses, these new viewpoints and findings are quite intriguing. Our present approach shows that the secondary structures of these fast-evolving regions of 18S rRNA are remarkably conserved and can be used in phylogenetic studies at higher hierarchical levels (above order). In other animals, we have so far not found similar structural changes that are phylogenetically informative. Nevertheless, if more 18S rDNA sequence information is obtained for a wider range of taxa and the same analyses are applied to the higher taxonomic levels of animal groups other than insects, the secondary structures of the variable regions of 18S rRNA may also show distinct, phylogenetically informative patterns across the taxa examined. Furthermore, additional sequence data could help to unravel the unknown cellular functions and evolutionary mechanisms of the V4 and V7 regions.
Acknowledgments
We heartily thank Dr. Markus Friedrich, Department of Biological Science, Wayne State University, 5047 Gullen Mall, Detroit, MI 48202, for his valuable and critical comments on our manuscript. This study was partially supported by grants from KOSEF in 1995–1998 (95-0401-04-01-3), SRC (97K3-0401-03-03-1), and the Ministry of Education of Korea (Institute for Molecular Biology and Genetics) in 1998. Dr. Ui Wook Hwang was supported by a post-doctoral fellowship (Sep. 98 – Aug. 99) from KOSEF.