MicroRNAs (miRNAs) are important post-transcriptional regulators in eukaryotes. The genus Nasonia is becoming an increasingly popular model because of the genetic advantages it has over Drosophila. N. vitripennis is distributed worldwide, whereas N. longicornis and N. giraulti have restricted distributions. In this study, we use a sequential approach, BLASTing all known invertebrate miRNA genes against the Nasonia vitripennis, Nasonia longicornis and Nasonia giraulti genomes. We identify 40, 31 and 29 putative pre-miRNAs and their mature sequences in N. vitripennis, N. giraulti and N. longicornis, respectively. A cross-species comparison of the putative miRNA sequences and their statistical characteristics reveals no major differences between the species, except for a few miRNAs, which we report. We also find that the average minimal folding energy index (MFEI) of the pre-miRNAs of the three Nasonia species is about -0.85 ± 0.11. Further, we report that U is predominant at the 5' end of the mature sequences, a typical characteristic of plant miRNAs. Using MiRanda, we predict 471 potential target sites in the N. vitripennis genome. This study is a first step toward understanding the non-coding RNAs of Nasonia, which may play an important role in effective pest management in the near future.
Introduction
MicroRNAs (miRNAs) are a class of conserved, small, noncoding regulatory RNAs that regulate gene expression at the post-transcriptional level. In animals, they typically bind their target mRNAs with partial complementarity and inhibit translation, whereas in plants they pair perfectly or near-perfectly with their target sequences, usually leading to target cleavage. Apart from regulating gene expression, they are also involved in controlling organ development, stem cell differentiation and developmental timing.1 Several studies have revealed roles for miRNAs in diseases such as cancer and infections.2,3 Recent studies have begun to explore the precise effects of individual miRNAs and miRNA clusters on cellular processes.4
Preliminary surveys of miRNAs across the animal kingdom revealed a compelling feature of miRNA evolution when compared with the evolution of the protein-coding repertoire.5 Many evolutionary surveys have compared conserved miRNAs between taxa of interest.6–9 If a taxon had an increased mutation rate or had lost miRNAs, the phylogenetic history of a particular miRNA would be reconstructed inaccurately, potentially leading to spurious claims about the importance of miRNAs in the organism's evolution. Nevertheless, many studies have explored the miRNA repertoire independently of phylogenetic conservation.5,10,11 Further, miRNAs have enormous potential in disease diagnosis and find application in gene therapy.12
By March 2009, 9539 hairpin sequence entries had been registered in miRBase. Sequence analysis has shown that some mature miRNAs are phylogenetically conserved among species of the same kingdom, particularly in the first 8 residues at the 5' end.13–15 It has also been reported that some mature miRNA sequences are conserved between animals and plants; for example, miR-854 has been identified in C. elegans, mouse, human and plants.16
Currently there are three methods for identifying miRNAs: the classic cloning method, deep sequencing and computational approaches. Cloning and deep sequencing are generally used to validate miRNAs but are less efficient than computational approaches for detecting them. Computational approaches can be further classified into three types: ab initio prediction based on sequence and structural features, comparative genomic strategies based on conservation, and integrated approaches. Most currently known miRNAs, in organisms ranging from plants to higher animals, have been detected by computational approaches.17–21
Many computational algorithms for miRNA prediction have been developed to aid experimental miRNA discovery. These algorithms take different approaches, but all aim to reduce false positives and/or increase prediction specificity.22 Evolutionary conservation is considered an important feature of hairpin sequences, and conservation analysis is often used to identify, and focus comparisons on, the conserved noncoding sequence space in different genomes.23–25 Phylogenetic shadowing has been used for both selecting and filtering miRNA candidates.26 Other filtering criteria include intragenomic matching of candidate miRNAs with their potential targets,27 expression profiles, thresholds on minimal folding energy (MFE) and minimal folding energy index (MFEI), location in intergenic regions, and proximity to known miRNA clusters.28–30
Nasonia diverged from honeybees approximately 120 million years ago and lies in a second major branch of the Hymenoptera. The order Hymenoptera is diverse, and many of its members are natural enemies of a broad range of arthropod vectors of medical, veterinary and agricultural significance. Nasonia belongs to the family Pteromalidae, whose adults lay their eggs in or on various life stages of other arthropods, thereby regulating insect populations. There are three closely related species in the genus Nasonia: Nasonia vitripennis, Nasonia longicornis and Nasonia giraulti. These species differ in their host preferences: N. vitripennis parasitizes a wide range of flies (including blowflies, flesh flies and houseflies), while the other two species appear to be host specialists. Nasonia species are haplodiploid, which allows geneticists to exploit many of the advantages of haploid genetics. As a result, they are emerging as models for studies of complex genetic traits. Nasonia is well positioned phylogenetically to help identify orthologs of important insect genes and provides a genetically tractable system for functional analysis.31 More broadly, these genomes may help us understand features such as regulatory domain evolution, the frequency and type of noncoding DNA, and metabolic capabilities. Here, we identify putative miRNA homologs in N. vitripennis, N. longicornis and N. giraulti and compare the sequences among the species. We also describe the statistical sequence characteristics of the putative miRNA gene sequences and predict their possible targets in N. vitripennis using MiRanda.32,33
Materials and Methods
MicroRNA Sequence Dataset
The pre-miRNA (miRNA gene) sequences of all known invertebrates, 1742 in total, were obtained from the miRBase Sequence Database, release 13.0 (March 2009).34 They cover all species with reported miRNAs, from protozoans to echinoderms. The miRNAs in the database were obtained by direct cloning and/or confirmed by a variety of experimental approaches, including northern blotting, polymerase chain reaction and microarrays. In addition, many were identified computationally as homologs of known miRNAs in closely related species.
Detection of miRNA Homologs in the Genomes of All Nasonia Species
We performed BLAST35 searches (expect value 0.01, mismatch penalty -2) using all previously published invertebrate pre-miRNAs as queries against the N. vitripennis (6.2X), N. longicornis (1X) and N. giraulti (1X) genome assemblies.31 All hits were downloaded in FASTA format36 and used for further analysis. The hits were then iteratively filtered to remove duplicate hits for the same miRNA gene, based on optimal values of identity, alignment length and gaps between the query and hit sequences. A flowchart summarizing the method for identifying putative miRNAs in Nasonia is shown in Fig. 1. Because the genome assemblies are not yet complete, it may be too early to identify clustered miRNAs in Nasonia.
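The duplicate-removal step can be illustrated with a minimal Python sketch. It assumes the hits were saved in BLAST's standard 12-column tabular format (`-outfmt 6`), that query IDs follow the miRBase convention of a three-letter species prefix (for example `ame-` for Apis mellifera), and that the best hit is chosen by identity, then alignment length, then fewer gap openings. That ranking order and all function names are illustrative assumptions, not the exact criteria used in this study.

```python
from typing import Dict, List, NamedTuple


class Hit(NamedTuple):
    query: str      # miRBase-style query ID, e.g. "ame-mir-iab-4"
    subject: str    # genome scaffold/contig ID
    pident: float   # percent identity
    length: int     # alignment length
    gapopen: int    # number of gap openings
    evalue: float


def parse_outfmt6(lines: List[str]) -> List[Hit]:
    """Parse BLAST tabular output with the default 12 columns of -outfmt 6."""
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        f = line.rstrip("\n").split("\t")
        # columns: qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
        hits.append(Hit(f[0], f[1], float(f[2]), int(f[3]), int(f[5]), float(f[10])))
    return hits


def gene_name(query_id: str) -> str:
    """Drop the species prefix so that hits from different species' queries share a key."""
    return query_id.split("-", 1)[1] if "-" in query_id else query_id


def best_hit_per_gene(hits: List[Hit], max_evalue: float = 0.01) -> Dict[str, Hit]:
    """Keep one hit per miRNA gene (assumed ranking: identity, length, fewer gaps)."""
    best: Dict[str, Hit] = {}
    for h in hits:
        if h.evalue > max_evalue:
            continue
        key = gene_name(h.query)
        cur = best.get(key)
        if cur is None or (h.pident, h.length, -h.gapopen) > (cur.pident, cur.length, -cur.gapopen):
            best[key] = h
    return best
```

Keying on the gene name rather than the full query ID is what collapses, for example, an `ame-` and a `dme-` query that hit the same Nasonia locus into a single putative miRNA.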
Sequence Characteristic Analysis
For the sequence characteristic analysis, we computed the frequencies of A, G, C, U, A + U and G + C in the hit sequences. The same statistics were computed independently for all invertebrate miRNA gene sequences and compared with those of the iterated pre-miRNA sequences. The MFE of the secondary structure of each putative pre-miRNA gene was obtained using mfold.37,38 The adjusted minimal folding energy (AMFE) and the MFEI were calculated as previously described by Zhang.39
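For reference, the AMFE and MFEI definitions from Zhang can be written as follows, where L is the pre-miRNA length in nucleotides, MFE is in kcal/mol, and (G+C)% is the G + C content expressed as a percentage (e.g. 50, not 0.50). The worked numbers use a hypothetical 80-nt hairpin and are not taken from our data.

```latex
\begin{aligned}
\mathrm{AMFE} &= \frac{\mathrm{MFE}}{L} \times 100, &
\mathrm{MFEI} &= \frac{\mathrm{AMFE}}{(G+C)\%}, \\
\text{e.g. } \mathrm{AMFE} &= \frac{-34.0}{80} \times 100 = -42.5, &
\mathrm{MFEI} &= \frac{-42.5}{50} = -0.85.
\end{aligned}
```

Normalizing by length and by G + C content is what allows hairpins of different sizes and compositions to be compared on one scale.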
Results and Discussion
Pre- and Mature miRNAs of Nasonia
Pre-miRNA sequences of all invertebrates studied to date were downloaded from miRBase.34 These pre-miRNA sequences were used as BLAST queries against the N. vitripennis (6.2x), N. longicornis (1X) and N. giraulti (1X) genome assemblies. All hits were stored in multi-FASTA format and used for further analysis. The iteration of the raw BLAST data is summarized in Figure 1. Sequences with less than 85% identity in all three species were validated using miPred. The resulting sequences were taken as putative pre-miRNA sequences (Supplementary Tables 1, 2, 3). In N. vitripennis, iteration yielded 40 putative miRNAs, of which mir-iab-4 alone had 100% identity to its query at the pre-miRNA level, 13 had 95%–99.9% identity, 14 had 90%–94.9% and 12 had 80%–89.9% (Fig. 2). Similarly, in N. giraulti, iteration of the BLAST hits yielded 31 putative miRNAs, of which only mir-iab-4 had 100% identity, followed by 9 pre-miRNAs with 95%–99.9%, 10 with 90%–94.9% and 11 with 80%–89.9% identity. In N. longicornis we obtained 29 putative pre-miRNAs; interestingly, none had 100% identity, while 6 had 95%–99.9%, 15 had 90%–94.9% and 8 had 80%–89.9% identity to their query sequences (Fig. 2a). Notably, after complete screening most of the retained query sequences came from Apis mellifera: 95.12% (N. vitripennis), 96.77% (N. giraulti) and 100% (N. longicornis)40 (Fig. 2b), reflecting the phylogenetic closeness of Nasonia and Apis. Both belong to the order Hymenoptera and the suborder Apocrita; members of this order are characterized by membranous wings. Furthermore, nearly all of these sequences have 100% conserved mature miRNA sequences.
Table 1.
Cross-species comparison of pre- and mature miRNA sequences of N. vitripennis, N. longicornis and N. giraulti.
Table 2.
Comparison of MFE, AMFE and MFEI values of the putative pre-miRNAs of N. vitripennis, N. longicornis and N. giraulti.
Table 3.
Comparison of statistical sequence characteristics between the three Nasonia species and all invertebrates.
Interestingly, the picture at the mature miRNA level is different. In N. vitripennis, 39 of the 40 pre-miRNAs had identities varying between 80% and 99.9%, yet 38 miRNAs had 100% identity at the mature sequence level, showing that mature sequences were conserved during evolution. In N. giraulti, of the 31 miRNAs obtained, 28 had mature sequences identical to their queries, while miR-281 had 95.24% mature identity (Supplementary Table 2) and miR-100 had 81% mature identity with 4 mismatches (Supplementary Table 2, Figure 3a). N. longicornis was more interesting to analyze than the other two species: at the pre-miRNA level no sequence had 100% identity, whereas at the mature sequence level 28 of 29 sequences had 100% identity, and again miR-281 had 95.24% identity to its query sequence, suggesting that this miRNA sequence is genus-specific. Table 1 gives a cross-species comparison of percent identity at the pre- and mature sequence levels. The pre-miRNA sequence analysis clearly indicates that these sequences diverged during the speciation of Nasonia.
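Percent identity over an ungapped alignment is the fraction of matching positions. The Python sketch below assumes the two mature sequences are already aligned to the same length; it shows that one mismatch over 21 nt gives 95.24% identity and four mismatches give about 81% (17/21). The sequences are toy examples, not Nasonia miRNAs, and the function name is hypothetical.

```python
from typing import List, Tuple


def ungapped_identity(a: str, b: str) -> Tuple[float, List[int]]:
    """Percent identity (2 d.p.) and 1-based mismatch positions of two equal-length aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    a, b = a.upper().replace("T", "U"), b.upper().replace("T", "U")
    mismatches = [i + 1 for i, (x, y) in enumerate(zip(a, b)) if x != y]
    pident = 100.0 * (len(a) - len(mismatches)) / len(a)
    return round(pident, 2), mismatches


# Toy 21-nt sequences (hypothetical, not real miRNAs)
ref = "ACGUACGUACGUACGUACGUA"
print(ungapped_identity(ref, "ACGUACGUACGUACGUACGUG"))  # -> (95.24, [21])
print(ungapped_identity(ref, "AAGUAAGUAAGUAAGUACGUA"))  # -> (80.95, [2, 6, 10, 14])
```

Real comparisons with indels would first need a pairwise alignment; this sketch covers only the substitution-only case.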
Sequence Analysis
We examined how sequence differences affect the secondary structure and stability of the pre-miRNAs as the species diverged. At the pre-miRNA level, all putative miRNAs are 100% identical across the species except for a few, such as mir-100, mir-13a, mir-210, mir-252, mir-276, mir-317, mir-92a, mir-927 and mir-993 (Table 1). Between N. giraulti and N. vitripennis, putative mir-100 has only 92.3% identity at the pre-miRNA level and 75% identity at the mature level (Table 1), yet its hairpin structure is largely conserved between the species (Fig. 3a, 3b); the small structural differences are accounted for by mismatches and gaps between the sequences. mir-100 was not detected in N. longicornis; it was either lost during the divergence of N. longicornis or missed because of differences in genome sequence coverage. mir-993 is another example of divergence among the Nasonia species, in which the sequence differences affect both the secondary structure and the MFE. Another divergent example is putative mir-252, which has uracil instead of cytosine at position 46; this substitution alters the secondary structure (Fig. 4a, 4b) but changes the free energy only slightly, and the mature sequence remains unaltered. Interestingly, in mir-317 (Fig. 5a, 5b) a single-base mismatch (A-G) at position 38 affects neither the secondary structure nor the MFE (minimal folding energy).41 Other examples in which a single-base difference affects neither the secondary structure nor the MFE are mir-13a, mir-276 and mir-92a (Supplementary Figures 1, 2, 3). Despite this divergence among the species, the seed region sequences of all the miRNAs are highly conserved, maintaining their functional integrity in the genome.
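One simple way to quantify how much two hairpins differ is the base-pair distance: the number of base pairs present in one structure but not the other. The Python sketch below assumes both structures are available in dot-bracket notation over the same length (for example, converted from mfold output); it was not part of the analysis reported here, and the structures in the test are hypothetical.

```python
from typing import List, Set, Tuple


def pairs(dot_bracket: str) -> Set[Tuple[int, int]]:
    """Return the set of (i, j) base pairs (1-based) encoded by a dot-bracket string."""
    stack: List[int] = []
    result: Set[Tuple[int, int]] = set()
    for pos, ch in enumerate(dot_bracket, start=1):
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unbalanced ')' at position {pos}")
            result.add((stack.pop(), pos))
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r}")
    if stack:
        raise ValueError("unbalanced '('")
    return result


def bp_distance(s1: str, s2: str) -> int:
    """Base-pair distance: size of the symmetric difference of the two pair sets."""
    if len(s1) != len(s2):
        raise ValueError("structures must have the same length")
    return len(pairs(s1) ^ pairs(s2))
```

A distance of 0 corresponds to the mir-317-type case (sequence change, identical structure), while a small nonzero value corresponds to the minor rearrangements described for mir-252.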
It has previously been shown that pre-miRNAs have lower MFE than other noncoding RNAs.41 MFE has therefore been considered one of the important features for identifying miRNA genes.42,43 We compared the MFE, AMFE (adjusted minimal folding energy) and MFEI (minimal folding energy index) values across the species. Although the sequences of the miRNAs mentioned above have diverged among the species, their AMFE and MFEI values are remarkably conserved, except in a few cases such as mir-276 and mir-993, which may therefore be species-specific (Table 2). The average MFEI values of the pre-miRNAs of N. vitripennis, N. giraulti and N. longicornis are -0.86 ± 0.12, -0.85 ± 0.11 and -0.85 ± 0.11, respectively, suggesting that MFEI has been highly conserved between the species despite mismatches and gaps in the sequences. Together, these features could play a significant role in further understanding the evolutionary pattern of pre-miRNAs among the species.
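The per-species averages can be computed from MFE values with a sketch like the following, which applies Zhang's definitions (AMFE = MFE / length × 100; MFEI = AMFE / (G + C)%) to each pre-miRNA and reports the mean and standard deviation. The text does not state whether sample or population standard deviation was used, so `statistics.stdev` (sample) is an assumption; the sequences and MFE values in the test are hypothetical.

```python
from statistics import mean, stdev
from typing import Iterable, List, Tuple


def gc_percent(seq: str) -> float:
    """G + C content of an RNA/DNA sequence, as a percentage (0-100)."""
    s = seq.upper()
    return 100.0 * (s.count("G") + s.count("C")) / len(s)


def mfei(seq: str, mfe: float) -> float:
    """MFEI = AMFE / (G+C)%, where AMFE = MFE / length * 100 and MFE is in kcal/mol."""
    amfe = 100.0 * mfe / len(seq)
    return amfe / gc_percent(seq)


def mfei_summary(records: Iterable[Tuple[str, float]]) -> Tuple[float, float]:
    """Mean and sample standard deviation of MFEI over (sequence, MFE) records (needs >= 2 records)."""
    values: List[float] = [mfei(seq, mfe) for seq, mfe in records]
    return mean(values), stdev(values)
```

Because MFE is negative, MFEI is negative in this convention, matching the sign of the values reported above.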
Statistical Sequence Characteristics
Sequence characteristics of pre- and mature miRNAs have been reported in plants and, more recently, in animals.9,19,34,44 We performed a detailed analysis of the sequence characteristics of the putative pre-miRNA sequences of N. vitripennis, N. longicornis and N. giraulti and of all known invertebrate miRNA sequences (Table 3). Pre-miRNA lengths in the three Nasonia species range from 41 to 100 nucleotides, with an average of ∼83 ± 10 nt for the three species together, which is consistent with other known invertebrate sequences, particularly within the class Insecta. The average base composition of pre-miRNA sequences in the three Nasonia species is broadly similar to that of other invertebrates (Table 3); however, the average G + C content of Nasonia pre-miRNAs is higher than that of other invertebrates, a distinctive observation for this genus. Correspondingly, the A + U content is lower in Nasonia. A high A + U content is known to decrease the stability of pre-miRNA secondary structures,39 and the composition we observe in Nasonia differs from what has previously been observed in other animals; to our knowledge, it has not been reported before. This high G + C content could be unique to the genus, distinguishing it from its close relatives. The G/C and U/A ratios are 1.16 and 0.81 for Nasonia (average of the three species) and 1.11 and 0.79 for invertebrates, respectively. We further analyzed the frequency of nucleotides at each position of the mature miRNAs of all Nasonia species.
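The base composition percentages and the G/C and U/A ratios can be computed as in the Python sketch below. It pools nucleotide counts over all sequences of a group; averaging per-sequence ratios would give slightly different numbers, and the text does not specify which was used, so pooling is an assumption.

```python
from collections import Counter
from typing import Dict, Iterable


def composition(seqs: Iterable[str]) -> Dict[str, float]:
    """Pooled base composition (%) and G/C, U/A ratios over a group of sequences."""
    counts: Counter = Counter()
    for s in seqs:
        counts.update(s.upper().replace("T", "U"))  # treat DNA input as RNA
    total = sum(counts[b] for b in "ACGU")          # ignore N and other ambiguity codes
    stats = {b: 100.0 * counts[b] / total for b in "ACGU"}
    stats["A+U"] = stats["A"] + stats["U"]
    stats["G+C"] = stats["G"] + stats["C"]
    stats["G/C"] = counts["G"] / counts["C"]
    stats["U/A"] = counts["U"] / counts["A"]
    return stats
```

Running it once on the Nasonia pre-miRNAs and once on the invertebrate set gives the two columns being compared.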
Earlier studies have shown that U is the predominant nucleotide at the 5' end of mature miRNAs in plants. Based on this, it has been proposed that the 5' end may play an important role in the biogenesis of mature miRNAs through recognition of miRNA precursors by RISC.39 Consistent with this, we find that uracil is predominant at the 5' end of the mature miRNAs in all three Nasonia species (Fig. 6a). This suggests a degree of sequence-level similarity between plant and Nasonia mature miRNAs, in addition to their similarity to other invertebrate miRNA sequences. Apart from position 1, U is predominant at positions 9, 17 and 20 and least frequent at positions 7, 14 and 22 (Fig. 6b). In plants, cytosine is the dominant nucleotide at position 19, but this is not observed in animals.39 In Nasonia, uracil is instead dominant at this position (33.6% on average across the three species). These positional differences could affect miRNA:mRNA binding energy and make Nasonia miRNAs distinct from those of related species. We also find that G + C is more frequent than A + U at position 3. These sequence characteristics could help in further understanding the evolution of miRNAs and their potential target sequences.
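Position-specific frequencies such as those in Fig. 6 can be tabulated with a sketch like this. It assumes mature sequences are given 5' to 3' and, at each position, divides only by the number of sequences long enough to have that position; the input sequences in the test are toy examples.

```python
from collections import Counter
from typing import Dict, List


def positional_frequencies(matures: List[str], max_len: int = 22) -> List[Dict[str, float]]:
    """Percent of A/C/G/U at each position (index 0 = 5' nucleotide) of mature miRNAs."""
    seqs = [s.upper().replace("T", "U") for s in matures]
    table = []
    for i in range(max_len):
        column = [s[i] for s in seqs if len(s) > i]  # only sequences that reach this position
        n = len(column)
        counts = Counter(column)
        table.append({b: (100.0 * counts[b] / n if n else 0.0) for b in "ACGU"})
    return table
```

The 5' U bias corresponds to `table[0]["U"]` being the largest entry in the first row; position 19 is `table[18]`.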
miRNA Target Prediction
miRNA target sites are usually located in the 3' UTRs of mRNAs. These UTRs were recognized as important regulatory regions even before the discovery of miRNAs, because they contain numerous regulatory signals involved in nuclear export, subcellular localization and transcript stability, among other processes.45 Additionally, these regions frequently contain target sites for more than one miRNA.45 However, target sequences inserted into coding regions or 5' UTRs have also been shown to be functional.
Unlike plant targets, animal miRNA targets are difficult to predict, because animal miRNA:mRNA duplexes often contain several mismatches, gaps and G:U base pairs.45 However, it is increasingly recognized that near-perfect complementarity between a few bases at the 5' end of the miRNA and the 3' UTR target is instrumental in metazoan target recognition; these sites are referred to as "seed sites/sequences".23,46 In our study we used the MiRanda algorithm,32 which takes the thermodynamic stability of the miRNA:mRNA duplex into account when detecting potential binding sites in 3' UTRs. As noted above, although the pre-miRNA sequences varied considerably between the query and hit sequences, the mature sequences were almost fully conserved. We examined the conservation of the mature sequences against the well-studied invertebrate model D. melanogaster: at least 80% of the mature sequences in all three Nasonia species were 100% conserved (Fig. 7). This conservation strongly suggests that these miRNAs are functionally conserved and, to a certain extent, so are their target sites. We therefore extracted all 3' UTRs from N. vitripennis, converted them to FASTA format and used them as input to MiRanda. To reduce false-positive binding sites, we used an energy cutoff of -14 kcal/mol and a score cutoff of 80. We thus identified 471 potential target sites in 46 GIs (gene IDs) (Supplementary Table 4). Because the N. giraulti and N. longicornis sequences come from the trace archive and are incomplete, predicting their targets at this point could produce many false-positive hits. Experimental validation of these targets in vivo is required but is beyond the scope of our study. We nonetheless consider this study a starting point for understanding the integrity of Nasonia miRNAs.
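To make the seed concept concrete, the Python sketch below scans a 3' UTR for exact matches to the reverse complement of miRNA nucleotides 2–8 (a 7-nt seed match) and applies the score and energy cutoffs quoted above to prediction records. This is a deliberately simplified stand-in: MiRanda itself scores a weighted alignment and computes duplex free energy rather than requiring an exact seed match. The `Prediction` record and all function names are hypothetical and do not reflect MiRanda's output format.

```python
from typing import List, NamedTuple

COMPLEMENT = str.maketrans("ACGU", "UGCA")  # Watson-Crick complements in RNA


def seed(mature: str) -> str:
    """Seed = nucleotides 2-8 of the mature miRNA, read 5'->3'."""
    return mature.upper().replace("T", "U")[1:8]


def seed_sites(mature: str, utr: str) -> List[int]:
    """1-based start positions in the 3' UTR of exact matches to the seed's reverse complement."""
    site = seed(mature).translate(COMPLEMENT)[::-1]  # written 5'->3' on the mRNA
    target = utr.upper().replace("T", "U")
    return [i + 1 for i in range(len(target) - len(site) + 1) if target[i:i + len(site)] == site]


class Prediction(NamedTuple):
    mirna: str
    gene: str
    score: float
    energy: float  # duplex free energy, kcal/mol


def passes_cutoffs(p: Prediction, min_score: float = 80.0, max_energy: float = -14.0) -> bool:
    """Keep predictions with score >= 80 and energy <= -14 kcal/mol (the cutoffs stated in the text)."""
    return p.score >= min_score and p.energy <= max_energy
```

Exact seed matching misses sites that tolerate a G:U pair or a mismatch, which is one reason alignment-based tools such as MiRanda are used for the actual predictions.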
Conclusion
In this study, we report 40, 31 and 29 putative miRNAs in N. vitripennis, N. giraulti and N. longicornis, respectively, identified with a modified computational approach that uses all reported invertebrate miRNA genes as queries. Note that N. vitripennis (6.2x) has higher genome sequence coverage than the other two species (1x); this likely influences the number of miRNAs detected, and the counts may change as coverage improves. Sequences with less than 85% identity were validated using miPred. The statistical sequence characteristics of the putative sequences are broadly consistent with those of other invertebrate sequences. This integrated method appears reasonable: although we started with reported miRNA genes from the majority of invertebrate phyla, after iteration most of the retained query sequences came from Apis mellifera, the evolutionarily closest organism among the queries. We did not find close homologs among the miRNA genes of the other query phyla. The positional differences of bases in the sequences could make these miRNAs unique to Nasonia. Although some of the putative pre-miRNA sequences are not 100% identical to their queries, the 100% conservation of most mature sequences suggests that their functional domains probably remained unchanged during the evolution of the Nasonia genome. Comparison with the well-studied D. melanogaster further shows that the mature sequences are conserved, suggesting that their targets are conserved as well. It may be too early to determine miRNA clusters, as the assembly of the Nasonia genomes is still in progress. Because we relied on comparative analysis to detect homologous miRNAs, the genome may contain additional miRNAs, for example ones that arose after speciation, that our analysis could not identify.
We believe that miRNAs could potentially be used in pest management to control insects in agricultural fields. Such an approach could limit the use of chemical active ingredients on crops, thereby reducing the pollution and ecological imbalance they have caused in recent years. We hope that this paper will be a starting point for understanding miRNAs in the Nasonia genomes; many more interesting evolutionary features may emerge as we begin to examine miRNA target sites.
Acknowledgement
We thank the Nasonia Genome Project group at BCM (Baylor College of Medicine) and all other groups associated with the Nasonia genome sequencing project for sequencing the genomes and making them publicly available.