Basic helix-loop-helix (bHLH) proteins comprise a large superfamily of transcription factors that regulate various developmental processes. bHLH family members are widely distributed in eukaryotes including yeast, fruit fly, zebrafish, mouse, and human. In this study, we identified 55 bHLH motifs encoded in the genome sequence of the human body louse, Pediculus humanus corporis (Phthiraptera: Pediculidae). Phylogenetic analyses of the identified P. humanus corporis bHLH (PhcbHLH) motifs revealed that there are 23, 11, 9, 1, 10, and 1 member(s) in groups A, B, C, D, E, and F, respectively. Examination of GenBank annotations of the 55 PhcbHLH members indicated that 29 PhcbHLH proteins were annotated consistently with our analytical result, 8 were annotated differently from our analytical result, 12 were merely annotated as hypothetical proteins, and the remaining 6 were not deposited in GenBank. A comparison of insect bHLH gene composition revealed that the human body louse possibly has more hairy and E(spl) genes than other insect species. Because hairy and E(spl) genes have been found to negatively regulate the differentiation of insect preneural cells, we suggest that the additional hairy and E(spl) genes in the human body louse are probably a consequence of its long-term adaptation to a relatively dark and stable environment. These data provide a good reference for further studies on the regulatory functions of bHLH proteins in the growth and development of the human body louse.
The basic helix-loop-helix (bHLH) proteins form a superfamily of transcription factors involved in a wide range of eukaryotic developmental and biochemical processes including neurogenesis, myogenesis, sex determination, and environmental response (Massari and Murre 2000, Jones 2004, Castillon et al. 2007). These proteins are characterized by their bHLH motif, which is about 60 amino acids in length. The basic region is located at the N-terminal of bHLH motif. It is primarily responsible for binding to DNA with the assistance of certain basic residues such as R (arginine), K (lysine), and H (histidine). The HLH region is composed of two helices and a loop structure with variable length. It facilitates the formation of homodimeric or heterodimeric complexes between different family members through dimerization (Murre et al. 1989, Kadesh 1993, Massari and Murre 2000).
All eukaryotic bHLH transcription factors were first classified into 27 bHLH families and 4 higher order groups by means of phylogenetic analysis (Atchley and Fitch 1997). A decade later, the classification of animal bHLH proteins was expanded to 45 bHLH families and 6 higher order groups. The 6 higher order groups (A, B, C, D, E, and F) were found to have 22, 12, 7, 1, 2, and 1 bHLH families, respectively, based on evolutionary relationships and on structural and functional properties (Simionato et al. 2007). Group A proteins mainly regulate neurogenesis, myogenesis, and mesoderm formation. They recognize and bind to E-box sequences, typically CAGCTG or CACCTG. Group B proteins mainly control cell proliferation and differentiation, sterol metabolism, adipocyte formation, and expression of glucose-responsive genes. They recognize and bind to E-box sequences, typically CACGTG or CATGTTG. Group C proteins usually contain a conserved Per-Arnt-Sim homolog (PAS) domain in addition to the bHLH motif. The PAS domain promotes dimerization with another PAS-containing protein. Group C proteins are mainly involved in the regulation of midline development, tracheal development, and circadian rhythms, and in the activation of gene transcription in response to environmental toxins. They recognize and bind to the DNA core sequence ACGTG or GCGTG. Group D proteins lack the basic region and act as antagonists of group A proteins. Group E proteins bind to CACGCG or CACGAG and usually contain two particular peptides named “Orange” and “WRPW” at the carboxyl terminus. Group F corresponds to the Col/Olf-1/EBF (COE) proteins, which lack a basic domain and are characterized by the presence of a COE domain involved in both dimerization and DNA binding (Atchley and Fitch 1997, Crews 1998, Ledent and Vervoort 2001, Ledent et al. 2002).
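The recognition sequences quoted above for groups A, B, C, and E can be illustrated with a short scan of a DNA string. This is only a sketch: the function name `find_sites` and the example sequence are hypothetical, only the given strand is scanned, and a match to a core sequence does not by itself show that a protein of that group binds there.

```python
import re

# Core recognition sequences quoted in the text for each higher order group.
GROUP_SITES = {
    "A": ["CAGCTG", "CACCTG"],
    "B": ["CACGTG", "CATGTTG"],
    "C": ["ACGTG", "GCGTG"],
    "E": ["CACGCG", "CACGAG"],
}


def find_sites(dna):
    """Return sorted (position, group, site) tuples for every match (0-based)."""
    dna = dna.upper()
    hits = []
    for group, sites in GROUP_SITES.items():
        for site in sites:
            # A lookahead pattern reports overlapping occurrences as well.
            for match in re.finditer("(?=" + site + ")", dna):
                hits.append((match.start(), group, site))
    return sorted(hits)


# Hypothetical test sequence, made up for illustration only.
example = "TTCAGCTGAACACGTGTT"
for hit in find_sites(example):
    print(hit)
```

Because the group C core ACGTG is contained in the group B E-box CACGTG, a single stretch of DNA can be reported for both groups, which is why the sketch lists all hits rather than assigning one group per site.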
With the rapid expansion of publicly available nucleotide and protein databases, it is becoming increasingly convenient for researchers to survey the bHLH proteins of any organism whose genome has been sequenced and released online. Such surveys not only benefit researchers who study the structures and functions of individual bHLH proteins but also enable rapid growth of the list of organisms with an identified bHLH repertoire. Up to now, over 1,000 bHLH family members have been identified, including 8 bHLH members in Saccharomyces cerevisiae, 16 in Amphimedon queenslandica, 33 in Hydra magnipapillata, 45 in Caenorhabditis elegans, 46 in Ciona intestinalis, 50 in Strongylocentrotus purpuratus, 50 in Tribolium castaneum, 51 in Apis mellifera, 52 in Bombyx mori, 54 in Acyrthosiphon pisum, 57 in Daphnia pulex, 57 in Harpegnathos saltator, 59 in Drosophila melanogaster, 63 in Lottia gigantea, 64 in Capitella sp. I, 68 in Nematostella vectensis, 70 in Acropora digitifera, 78 in Branchiostoma floridae, 86 in Taeniopygia guttata, 87 in Tetraodon nigroviridis, 104 in Gallus gallus, 107 in Ailuropoda melanoleuca, 114 in Rattus norvegicus, 114 in Mus musculus, 117 in Homo sapiens, 139 in Brachydanio rerio, 147 in Arabidopsis, and 167 in Oryza sativa (Robinson and Lopes 2000; Bailey et al. 2002; Ledent et al. 2002; Li et al. 2003; Satou et al. 2003; Simionato et al. 2007; Wang et al. 2007, 2008, 2009; Bitra et al. 2009; Zheng et al. 2009; Pires and Dolan 2010; Dang et al. 2011a,b; Gyoja et al. 2012; Liu et al. 2012).
The human body louse, Pediculus humanus corporis (Phthiraptera: Pediculidae), causes the cutaneous disease named pediculosis vestimenti by laying their eggs in the seams of clothing. It is the primary vector of human diseases including relapsing fever, trench fever, and epidemic typhus. Human body louse diverged from human head louse (Pediculus humanus capitis) at ∼ 100,000 years ago, dovetailing with the origin of clothing (Toups et al. 2011). The body louse has a long evolutionary association with human, which has been considered in medical and healthcare practice (James et al. 2011). During its long period adaptation to human parasitism, certain physiological and biochemical features could have been remarkably changed. However, previous studies have been mainly focused on the parasitic relationship between human and body louse, and the development processes of body lice to prevent and treat pediculosis (Levot 2000, Pedra et al. 2003, Toups et al. 2011). A comprehensive identification of bHLH proteins of the human body louse would facilitate studies on the emergence and underlying mechanism of specific physiological and biochemical features in human body louse.
Therefore, in this study, we conducted a genome-wide survey of the genome sequence database of the human body louse (Kirkness et al. 2010) and identified 55 bHLH motifs encoded in its genome. Further phylogenetic analyses enabled us to define the orthology of the 55 identified P. humanus corporis bHLH (PhcbHLH) members by using known bHLH members from fruit fly and other insect species. We found that 29 of the 55 PhcbHLH proteins have been annotated consistently with our analytical result, 20 were either annotated differently from our analytical result or merely annotated as hypothetical proteins, and the remaining 6 were not found in current GenBank databases. In addition, the human body louse possibly has more hairy and E(spl) genes than other insect species, which is probably the result of its long period of parasitism on humans. Our present work establishes a good basis for further studies on the regulatory functions of bHLH proteins in the growth and development of the human body louse.
Materials and Methods
BLAST Searches and Manual Examination. First, with both the 59 D. melanogaster bHLH (DmbHLH) motifs and the 45 representative bHLH motifs obtained from the additional files of previous reports (Ledent and Vervoort 2001, Simionato et al. 2007) as query sequences, tBLASTn searches were performed against the RefSeq genomic and trace-whole-genome shotgun sequence databases of the human body louse (http://www.ncbi.nlm.nih.gov/blast/Blast.cgi?PAGE_TYPE=BlastSearch&PROG_DEF=blastn&BLAST_PROG_DEF=megaBlast&BLAST_SPEC=OGP_121224_16222) to retrieve all potential bHLH sequences. Query sequences were left unfiltered so that coding regions covering the full bHLH range could be obtained. Other search parameters were left at their default values. The retrieved sequences were manually checked to discard redundant ones having the same contig number, the same reading frame, and the same coding regions. In cases where the retrieved amino acids did not cover the full bHLH range, we retrieved the corresponding nucleotide sequences from the GenBank nucleotide database and translated them into amino acids by using the EditSeq program of the DNAStar package (version 5.01) to supply the missing amino acids. Intron splice sites, which separated bHLH coding sequences into more than one region, were assessed by NetGene2 online (http://www.cbs.dtu.dk/services/NetGene2/).
Each of the above sequences was manually examined to see how many conserved amino acids existed in the 19 highly conserved sites (Atchley et al. 1999). If more than 10 conserved amino acids were present in the bHLH motif (Toledo-Ortiz et al. 2003), it was regarded as a candidate bHLH motif and was subject to further analyses. The bHLH motifs of Emc and COE families are relatively shorter, having 35 and 50 amino acids, respectively. Therefore, if more than five and eight conserved amino acids were present in potential Emc and COE sequences, the sequences were subject to further analyses as well.
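The screening rule described above can be sketched as a simple count against a cutoff. The cutoffs (more than 10 in general, more than 5 for Emc, more than 8 for COE) are taken from the text; everything else is an assumption made for illustration. In particular, `TOY_CONSENSUS` is a made-up stand-in, not the 19 conserved sites of Atchley et al. (1999), and the function names are hypothetical.

```python
# Minimum counts that must be EXCEEDED, as stated in the text.
THRESHOLDS = {"default": 10, "Emc": 5, "COE": 8}


def count_conserved(motif, consensus):
    """Count positions where the motif carries one of the allowed residues.

    consensus maps a 0-based position in the aligned motif to a string of
    residues regarded as conserved at that position.
    """
    return sum(
        1
        for pos, allowed in consensus.items()
        if pos < len(motif) and motif[pos] in allowed
    )


def is_candidate(motif, consensus, family="default"):
    """Apply the 'more than N conserved residues' rule for the given family."""
    return count_conserved(motif, consensus) > THRESHOLDS[family]


# Toy consensus (hypothetical): 12 positions that each accept R or K.
TOY_CONSENSUS = {i: "RK" for i in range(12)}

print(is_candidate("R" * 11 + "A", TOY_CONSENSUS))   # 11 matches, above the cutoff of 10
print(is_candidate("K" * 10 + "AA", TOY_CONSENSUS))  # 10 matches, not above the cutoff
```

With the real site definitions substituted for the toy consensus, the same comparison would reproduce the manual check; the shorter Emc and COE motifs only change which cutoff is looked up.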
To check whether there are protein sequences corresponding to the candidate motifs, BLASTp searches were performed against the RefSeq protein database of the human body louse (http://www.ncbi.nlm.nih.gov/blast/Blast.cgi?PAGE_TYPE=BlastSearch&PROG_DEF=blastn&BLAST_PROG_DEF=megaBlast&BLAST_SPEC=OGP_121224_16222) using the candidate bHLH motif sequences obtained above.
Multiple Sequence Alignment. All the above bHLH motif sequences were aligned by ClustalW program implemented in MEGA 5 (Tamura et al. 2011) with default settings. We then obtained a rich text file using GeneDoc Multiple Sequence Alignment Editor and Shading Utility (version 2.6.02) (Nicholas et al. 1997), in which the conserved sites of aligned PhcbHLH motifs were shaded with different gray depths.
Phylogenetic Analysis. Evolutionary relationships among all identified PhcbHLH motifs were analyzed using three different algorithms: distance neighbor-joining (NJ), maximum parsimony (MP), and maximum likelihood (ML). NJ phylogenetic analyses (Saitou and Nei 1987) were performed online (http://www.phylogeny.fr/version2_cgi/one_task.cgi?task_type=bionj) using the BioNJ algorithm (Gascuel 1997). MP phylogenetic analyses were conducted using PAUP 4.0 Beta 10 (Swofford 1998) based on the step matrix constructed from the Dayhoff PAM 250 distance matrix by R. K. Kuzoff (http://paup.csit.fsu.edu/data/pam250.nex). The NJ distance tree was bootstrapped with 1,000 replicates to provide information about statistical reliability. The MP tree was generated using heuristic searches and bootstrapped with 500 replicates. ML trees were constructed using the PhyML program online (http://www.atgc-montpellier.fr/phyml/) (Guindon et al. 2010) with the following parameter settings: BioNJ starting tree, 500 bootstrap replicates, and the LG substitution model (Le and Gascuel 2008). Other parameters, such as the proportion of invariable sites and the gamma-shape parameter, were optimized by ProtTest (Abascal et al. 2005).
Phylogenetic analyses of PhcbHLH motifs were carried out in two steps. First, all the candidate PhcbHLH motifs were used to build ML trees with 59 DmbHLH, 114 M. musculus bHLH (MmbHLH), and 70 Acr. digitifera bHLH (AdibHLH) motif sequences (Supp Figs. S1–S6 [online only]). These trees clearly showed to which higher order group each candidate PhcbHLH sequence belonged. Then, each candidate PhcbHLH motif was used to conduct in-group phylogenetic analysis with DmbHLH motif sequences. That is, a single PhcbHLH sequence was used to construct NJ, MP, and ML phylogenetic trees with known DmbHLH members of the same group (Wang et al. 2007, 2008). When in-group phylogenetic analysis using DmbHLH members could not yield evolutionary trees with sufficient bootstrap support, bHLH sequences from Anopheles gambiae, A. mellifera, Acy. pisum, or T. castaneum were used for the in-group analysis until sufficient bootstrap support was obtained for orthology assignment. The criterion for orthology assignment was as follows: if a PhcbHLH sequence formed a monophyletic clade with one DmbHLH or other insect bHLH sequence with bootstrap support >50% in the various phylogenetic trees, the known DmbHLH or other insect bHLH member was regarded as an ortholog of the PhcbHLH sequence.
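The orthology criterion above can be written as a small decision rule over the three bootstrap values. This is a sketch under stated assumptions rather than the procedure actually used: the function name and return labels are hypothetical, `None` stands for "no monophyletic clade formed" (n/m), and treating support in fewer than two trees as unresolved is an assumption, since the text does not spell out that case.

```python
ALL_THREE = "supported in all three trees"
TWO_OF_THREE = "supported in two trees (provisional)"
UNRESOLVED = "unresolved; repeat with bHLH sequences from other insects"


def assess_orthology(nj, mp, ml, cutoff=50):
    """Classify one candidate from its NJ, MP, and ML bootstrap values.

    A value of None means the sequence formed no monophyletic clade with the
    reference sequence in that tree (n/m).
    """
    values = {"NJ": nj, "MP": mp, "ML": ml}
    supported = [name for name, v in values.items() if v is not None and v > cutoff]
    if len(supported) == 3:
        return ALL_THREE
    if len(supported) == 2:
        return TWO_OF_THREE
    # Assumption: fewer than two supporting trees is treated as unresolved.
    return UNRESOLVED


# Hypothetical bootstrap values chosen only to show the first branch.
print(assess_orthology(90, 80, 70))
# n/m in NJ with MP and ML support of 100 and 99 (the pattern reported for PhcMist2).
print(assess_orthology(None, 100, 99))
```

A bootstrap value of exactly 50 is not counted as support here because the stated criterion is ">50%".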
Protein Functional Domain Prediction. To further assess the reliability of our classification to the identified motifs and to examine whether the full-length protein sequences contain additional characteristic domains such as PAS, we carried out prediction of protein domain architectures with simple modular architecture research tool (SMART, http://smart.embl.de/) online.
Results and Discussion
Identification of PhcbHLH Members. Through BLAST searches, manual examinations, and phylogenetic analyses, we identified 55 bHLH motifs encoded in the genome of P. humanus corporis. The alignment of all 55 PhcbHLH motifs is shown in Fig. 1. We named PhcbHLH genes according to the families to which they belong, which will facilitate further structural and functional comparisons with other organisms. Where several PhcbHLHs belong to a single bHLH family, we appended “1,” “2,” “3,” etc. to their names. For example, there are two human body louse bHLH genes in family Mist, which were named PhcMist1 and PhcMist2, respectively. Detailed information on the 55 PhcbHLH genes, including names, bootstrap values from phylogenetic analyses, and GenBank annotations, is listed in Table 1. Figure 1 and Table 1 led us to conclude that there are 23, 11, 9, 1, 10, and 1 PhcbHLH members in groups A, B, C, D, E, and F, respectively.
Determination of PhcbHLH Orthology. Orthology determination tells whether similar genes in different organisms are orthologous. Although orthology determination has confronted with certain difficulty because no absolute criterion can be used to determine whether two genes are orthologous (Ledent and Vervoort 2001), the in-group phylogenetic analysis has proved to be reliable for identifying orthologous sequences in our previous studies (Wang et al. 2007, 2008). Therefore, in this study, we also used in-group phylogenetic analysis to define orthology for the identified PhcbHLH motifs.
Based on the overall ML trees constructed using amino acids of the 55 PhcbHLH motifs and bHLH motifs from D. melanogaster, M. musculus, and Acr. digitifera (Supp Figs. S1–S6 [online only]), in-group phylogenetic analysis was conducted to determine the orthology of each PhcbHLH member. For example, Supp Fig. S3 [online only] showed that PhcSim formed a large evolutionary clade with other group C bHLH members. Therefore, it was used to construct NJ, MP, and ML phylogenetic trees with 10 group C bHLH members from D. melanogaster (Fig. 2). As a result, PhcSim formed a monophyletic clade with sim (single minded) of D. melanogaster with high bootstrap values. Therefore, we considered PhcSim an ortholog of fruit fly sim. Similarly, in-group phylogenetic analysis was conducted for each of the identified PhcbHLH members. The bootstrap values of the NJ, MP, and ML trees constructed for each identified PhcbHLH member are listed in Table 1; the corresponding trees are not shown. The orthology of PhcbHLH members with D. melanogaster and other insect species could be divided into the following categories.
Table 1. A complete list of PhcbHLH genes
First, 43 PhcbHLH motifs formed monophyletic clades with DmbHLH sequences with all bootstrap values over 50 in the constructed NJ, MP, and ML trees. They are PhcE12/E47, PhcMyoD, PhcNgn, PhcMist1, PhcBeta3, PhcAtonal2, PhcNet, PhcMyoR, PhcMesp, PhcTwist, PhcPTFa, PhcPTFb1, PhcPTFb2, PhcHand, PhcSCL, PhcNSCL, PhcMnt, PhcMax, PhcMyc, PhcUSF, PhcMITF1, PhcMITF2, PhcAP4, PhcTF4, PhcMLX, PhcSREBP, PhcSRC, PhcClock1, PhcAHR1, PhcAHR2, PhcSim, PhcTrh, PhcHIF, PhcARNT, PhcBmal, PhcEmc, PhcHey1, PhcHey2, PhcHES1, PhcHES3, PhcHES4, PhcHES5, and PhcCOE. Because all bootstrap values exceed the set criterion (50), we confidently assigned these PhcbHLH motifs to their corresponding DmbHLH orthologs.
Second, two PhcbHLH motifs, namely PhcMist2, and PhcDelilah, did not form monophyletic clade with bHLH sequences of D. melanogaster in NJ phylogenetic tree (marked with n/m in Table 1). PhcMist2 motif formed monophyletic clade in MP and ML trees with bootstrap values of 100 and 99, respectively. PhcDelilah formed monophyletic clade in MP and ML trees with bootstrap values of 98 and 84, respectively. Although we did not have sufficient bootstrap supports from all three constructed phylogenetic trees, we defined orthology for them based on the two formed monophyletic clades with bootstrap values over 50. These assignments may be modified if new data demonstrate discrepancy with our current analysis.
Finally, the remaining 10 PhcbHLH motifs, namely PhcASCa1, PhcASCa2, PhcAtonal1, PhcAtonal3, PhcASCb, PhcClock2, PhcHES2, PhcHES6, PhcHES7, and PhcHES8, did not form a monophyletic clade with the corresponding DmbHLH sequence in any of the three constructed phylogenetic trees. Therefore, we defined their orthology by constructing phylogenetic trees with corresponding bHLH members from An. gambiae, A. mellifera, Acy. pisum, or T. castaneum (marked with superscript letters Ag, Am, Ap, and Tc, respectively, in Table 1). Among them, PhcASCa1, PhcASCa2, PhcASCb, PhcClock2, PhcHES2, PhcHES6, PhcHES7, and PhcHES8 were defined with sufficient confidence because all bootstrap values were over 50 in the three constructed trees, while the remaining two members, PhcAtonal1 and PhcAtonal3, had bootstrap values over 50 in only two of the three constructed trees.
It is to be noted that three additional bHLH families, i.e., pearl and amber, which belong to group A, and peridot, which belongs to group D, have been found in Acr. digitifera (Gyoja et al. 2012). We have included these three sequences in both our general phylogenetic analyses (Supp Figs. S1 and S4 [online only]) and in-group phylogenetic analysis (Table 1). However, in all the constructed phylogenetic trees, no PhcbHLH sequence formed monophyletic clade with any of the three AdibHLH sequences, providing another instance of probable loss of these three genes during insect evolution.
Identification of PhcbHLH Protein Sequences. Protein sequence accession numbers of the identified PhcbHLHs are listed in Table 1. We found that 49 PhcbHLH motifs have corresponding protein sequences deposited in GenBank (shown as “XP_” plus numbers), whereas the remaining 6 PhcbHLHs, namely PhcASCa2, PhcDelilah, PhcHES3, PhcHES4, PhcHES7, and PhcHES8, do not have corresponding protein sequences in the current database. Further examination of the 49 PhcbHLH protein sequences revealed that all of them derive from the annotation of genome sequences after completion of the human body louse genome sequencing project (Kirkness et al. 2010). Among them, 29 PhcbHLH proteins were annotated consistently with our analytical result (Table 1, shown in bold face in the last column), 8 PhcbHLH proteins were annotated differently from our analytical result (Table 1, shown in italics in the last column), and the remaining 12 were merely annotated as hypothetical proteins (Table 1, shown in normal type in the last column). Therefore, our data provide a good reference for updating the annotations of 26 PhcbHLH proteins in the current GenBank database (the 8 annotated differently, the 12 hypothetical proteins, and the 6 not yet deposited). For example, our analysis strongly supports that PhcPTFb2 is a bHLH member of the PTFb family rather than the Atonal family (Table 1).
Although amino acid sequences flanking the bHLH motif are generally divergent even in closely related proteins from the same species, certain conserved domains or motifs are often present within related bHLH protein groups (Jones 2004). To further assess the reliability of our classification of the identified PhcbHLHs, a separate phylogenetic tree annotated with predicted protein domains (Fig. 3) was constructed based on an alignment of all PhcbHLH motifs. The HLH domain was identified in all PhcbHLH protein sequences. In addition, group C PhcbHLHs are characterized by having two PAS domains and one C-terminal to PAS motif (PAC) domain, with the sole exception of PhcAHR1. Four of the six group E PhcbHLH full-length protein sequences have an Orange domain. Apart from the common domains in group C and E bHLH proteins, other structural domains were also found in individual PhcbHLH members. For example, PhcE12/E47 (group A) and PhcClock1 (group C) have a coiled coil domain; PhcUSF (group B), PhcBMAL, and PhcARNT (group C) have a transmembrane domain; PhcSRC (group B) has two PAS domains; and PhcCOE (group F) has an Immunoglobulin Plexin Transcription (IPT) domain. To sum up, our analyses indicate that protein architecture is highly conserved within specific bHLH groups, and the above data provide further support for the results of our phylogenetic analysis based on bHLH motifs (Fig. 1 and Table 1).
Genomic Coding Regions of PhcbHLH Motifs. Coding regions of the 55 PhcbHLH motifs are listed in Table 2. The coding regions of 21 PhcbHLH motifs contain one intron in the basic, helix 1, loop, or helix 2 region, and those of 4 PhcbHLH motifs (i.e., PhcMITF1, PhcSREBP, PhcHES1, and PhcHES2) have 2 introns, located in the basic and loop regions, respectively. In total, there are 29 introns in the coding regions of the 55 PhcbHLH motifs. The longest intron is 6,723 bp (base pairs), the shortest is only 66 bp, and the average length of the 29 introns is 616 bp. By comparison, in Acy. pisum, H. saltator, D. melanogaster, and A. mellifera, there are 26, 22, 18, and 9 bHLH members, respectively, with introns in the coding regions of their bHLH motifs. The total number of their introns is 34, 26, 20, and 9; the longest intron is 30,718, 7,943, 11,845, and 4,460 bp; the shortest is 62, 82, 57, and 72 bp; and the average intron length is 4,193, 1,391, 1,082, and 1,326 bp, respectively (Liu et al. 2012). In summary, more PhcbHLH motifs contain introns than in any of these insect species except the pea aphid. However, the average length of PhcbHLH introns is the smallest among these five insect species, and the shortest PhcbHLH intron is longer only than those of the pea aphid and fruit fly. Whether this has any evolutionary significance remains to be explored.
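The intron comparison above can be checked with a few lines of arithmetic. The sketch below uses only the numbers quoted in this paragraph; the variable and key names are hypothetical, and the louse figure of 25 motifs with introns is derived here as 21 + 4 rather than stated directly in the text.

```python
# Louse motifs with one and with two introns, as reported in the text.
one_intron, two_introns = 21, 4
louse_members = one_intron + two_introns        # motifs containing introns
louse_total = one_intron * 1 + two_introns * 2  # total introns

# (members with introns, total introns, longest bp, shortest bp, average bp)
INTRONS = {
    "P. humanus corporis": (louse_members, louse_total, 6723, 66, 616),
    "Acy. pisum": (26, 34, 30718, 62, 4193),
    "H. saltator": (22, 26, 7943, 82, 1391),
    "D. melanogaster": (18, 20, 11845, 57, 1082),
    "A. mellifera": (9, 9, 4460, 72, 1326),
}

# Species ranked by number of bHLH motifs containing introns (most first).
by_members = sorted(INTRONS, key=lambda s: INTRONS[s][0], reverse=True)
# Species with the smallest average intron length.
smallest_average = min(INTRONS, key=lambda s: INTRONS[s][4])
# Species ranked by shortest intron (shortest first).
by_shortest = sorted(INTRONS, key=lambda s: INTRONS[s][3])

print(louse_members, louse_total)
print(by_members)
print(smallest_average)
print(by_shortest)
```

The rankings agree with the summary in the text: the louse is second to the pea aphid in the number of motifs with introns, has the smallest average intron length, and its shortest intron is exceeded in shortness only by those of the fruit fly and pea aphid.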
It should be noted that coding regions of five PhcbHLH motifs, namely PhcASCa2, PhcHES3, PhcHES4, PhcHES7, and PhcHES8, were identified from trace-whole-genome shotgun nucleotide sequences (Table 2). We included them as bHLH members because their motif sequences differ from those of the other identified PhcbHLH motifs. Whether they are genuine novel bHLH family members awaits verification once a higher quality genome assembly is completed.
A Comparison of Insect bHLH Family Members. So far, bHLH repertoires have been established for 10 insect species, namely P. humanus corporis (Phc), Aedes aegypti (Aa), An. gambiae (Ag), Culex quinquefasciatus (Cq), H. saltator (Pa), A. mellifera (Am), Acy. pisum (Ap), B. mori (Bm), D. melanogaster (Dm), and T. castaneum (Tc). The numbers of bHLH family members in each of the 10 insect species are listed in Table 3. Table 3 shows that all insect species lack bHLH genes of the Olig, MyoRb, and Figα families. Many families have at least one gene, including E12/E47, Ngn, Mist, Beta3, Atonal, Net, MyoRa, Twist, PTFa, PTFb, Hand, SCL, NSCL, Mnt, Max, Myc, USF, AP4, TF4, SREBP, SRC, Clock, AHR, Sim, Trh, HIF, ARNT, BMAL, Emc, Hey, and H/E(spl), among which the 10 insect species have the same number of genes in 10 families, such as E12/E47, Beta3, Net, MyoRa, Hand, SCL, NSCL, Myc, SRC, HIF, and ARNT. However, we failed to identify any Paraxis family member in the human body louse. Because all other nine insect species have been found to have one Paraxis family member, its absence in the human body louse is probably due to incompleteness of the louse genome sequences. Therefore, this bHLH member is expected to be found after a new and higher quality version of the human body louse genome sequence is released. A similar situation is present in Acy. pisum, which lacks ASCa, MyoD, and Microphthalmia transcription factor (MITF) family members, and in T. castaneum, which lacks Mesp and MLX family members; both cases are probably also due to incompleteness of the genome sequences. Coding regions of these missing bHLH members are expected to be present in genome sequences of higher quality.
Table 3 also shows that the number of H/E(spl) family members varies greatly among insect species. It ranges from 4 in mosquitoes to 11 or 12 in fruit fly. The human body louse has eight H/E(spl) family members, second only to fruit fly. In insects, there are four different bHLH genes in the H/E(spl) family: H (hairy), Dpn (deadpan), Side (similar to deadpan), and E(spl) (enhancer of split). A close examination of the distribution of insect bHLH genes in the H/E(spl) family revealed that the human body louse has one or two more H genes than other insect species (Table 4). A phylogenetic tree constructed using all H/E(spl) family members of the 10 insect species demonstrated that the three P. humanus corporis H genes, i.e., PhcHES1, PhcHES3, and PhcHES4, arose from species-specific gene duplication in the louse lineage (Fig. 4). Likewise, the three louse E(spl) genes, i.e., PhcHES6, PhcHES7, and PhcHES8, originated from a species-specific gene duplication, whereas E(spl) genes in other insect species except mosquitoes were derived from various E(spl) genes that already existed in the common ancestor of insects (Fig. 4).
Taken together, it is possible that the human body louse has more hairy and E(spl) genes than other insect species. Why does the human body louse have such additional genes? It could be the consequence of a long period of adaptation to the relatively dark and stable environment on the human body. First, the hairy gene is involved in negative regulation of insect eye development. Drosophila hairy was found to negatively regulate progression of the morphogenetic furrow across the eye imaginal disc (Brown et al. 1995) and is able to restrain proneural pathways whose activation is imminent (Greenwood and Struhl 1999). Therefore, the existence of multiple hairy genes may mean that eye development of the human body louse is restrained as an adaptation to a dark environment. Second, the E(spl) gene inhibits differentiation of specific preneural cells. Drosophila E(spl)mB(g) was found to be a potent inhibitor that prevents ectodermal cells from adopting the sensory organ precursor fate (Giagtzoglou et al. 2003). Although various E(spl) genes are functionally redundant (Schrons et al. 1992), the existence of three E(spl) genes in the human body louse may indicate that specific preneural cells are inhibited from forming functional sensory apparatus, leading to deficiency in sensory functions other than photoreception.
Table 3. The bHLH family members in 10 insect species
Table 4. Distribution of insect bHLH genes in H/E(spl) family
In this study, P. humanus corporis genome sequences were searched and found to encode 55 members of the bHLH superfamily. Phylogenetic analyses revealed that the 55 PhcbHLHs are distributed in 39 bHLH families with 23, 11, 9, 1, 10, and 1 member(s) in groups A, B, C, D, E, and F, respectively. Group C and E PhcbHLH proteins were found to possess PAS/PAC and Orange domains, respectively, further supporting our classification of the identified PhcbHLH family members. Examination of GenBank annotations of the 55 PhcbHLH members indicated that 29 PhcbHLH proteins were annotated consistently with our analytical result, 8 were annotated differently from our analytical result, 12 were merely annotated as hypothetical proteins, and the remaining 6 were not deposited in GenBank. A comparison of insect bHLH gene composition revealed that the human body louse possibly has more hairy and E(spl) genes than other insect species. Because hairy and E(spl) genes have been found to negatively regulate the differentiation of insect preneural cells, we suggest that the additional hairy and E(spl) genes in the human body louse are probably a consequence of long-term adaptation to the relatively dark and stable environment on the human body. These data provide a good reference for further studies on the regulatory functions of bHLH proteins in the growth and development of the human body louse.
We are grateful to Prof. Bin Chen of Jiangsu University and two anonymous reviewers for their constructive suggestions and comments on our study. We acknowledge the National Center for Biotechnology Information (www.ncbi.nlm.nih.gov) for providing genome sequence information, and institutions and organizations providing public release of genome sequences used in our investigation. This work was supported by Scientific Research Promotion Fund for the Talents of Jiangsu University (09JDG029) and the National Basic Research Program (973) of China (2012CB114604).