A variety of bioactive proteins from medicinal leeches, like species of Hirudo, have been characterized and evaluated for their potential therapeutic biomedical properties. However, there has not previously been a comprehensive attempt to fully characterize the salivary transcriptome of a medicinal leech that would allow a clearer understanding of the suite of polypeptides employed by these sanguivorous annelids and provide insights regarding their evolutionary origins. An Expressed Sequence Tag (EST) library-based analysis of the salivary transcriptome of the North American medicinal leech, Macrobdella decora, reveals a complex cocktail of anticoagulants and other bioactive secreted proteins not previously known to exist in a single leech. Transcripts were identified that correspond to each of saratin, bdellin, destabilase, hirudin, decorsin, endoglucuronidase, antistasin, and eglin, as well as to other previously uncharacterized predicted serine protease inhibitors, lectoxin-like c-type lectins, ficolin, disintegrins and histidine-rich proteins. This work provides a lens into the richness of bioactive polypeptides that are associated with sanguivory. In the context of a well-characterized molecular phylogeny of leeches, the results allow for preliminary evaluation of the relative evolutionary origins and historical conservation of leech salivary components. The goal of identifying evolutionarily significant residues associated with biomedically significant phenomena will require continued insights from a broader sampling of blood-feeding leech salivary transcriptomes.
Notwithstanding the dubious utility of leeches for the treatment of obesity, hysteria, and other ailments in the 19th century, the European medicinal leech, Hirudo medicinalis, since then has come to play a valuable role in the postoperative treatment of venous congestion following flap and replantation surgery for which they were recently approved as a medical device by the U.S. Food and Drug Administration (Rados, 2004).
The anti-thrombin activity of the salivary protein, hirudin, has a storied medical history dating to the late 19th century (Markwardt, 1955), including U.S.-FDA approval in 1998 for heparin-induced thrombocytopenia (HIT). Yet leeches, like H. medicinalis and the giant Amazonian leech, Haementeria ghilianii, produce a pharmacological cocktail of protease inhibitors (Baskova and Zavalova, 2001; Salzet, 2001) that assist leeches both to successfully feed and to avoid the blood meal coagulating during the extended periods in which it must reside in the crop.
Besides hirudin, a variety of bioactive compounds already have been isolated (typically with high-performance liquid chromatography and peptide sequencing) from leech salivary secretions, usually from species of Hirudo and Haementeria raised in captivity. These include the original anti-Xa angiogenesis-inhibiting antistasins, other serine protease inhibitors targeting various coagulation factors or elastase, endoglucuronidases, a fibrinolytic isopeptidase, and factors that inhibit platelet aggregation (Baskova and Zavalova, 2001; Salzet, 2001). Several of these bioactive compounds have been the subject of pharmaceutical development for glaucoma (orgelase), emphysema, and inflammation (eglin) or for the reduction of tumor metastases (bdellastasin).
The evolutionary relationships of leeches (Hirudinida) have been increasingly clarified through modern molecular phylogenetic techniques. It is now well established that hirudinids evolved from a common ancestor with lumbriculid oligochaetes (Siddall et al., 2001) and that, for example, H. medicinalis and H. ghilianii are only distantly related, having diverged about 200 million years ago (Siddall et al., 2006). Species of Hirudo (Hirudinidae) feed through a cutaneous incision facilitated by 3 armed muscular jaws, whereas species of Haementeria (Glossiphoniidae) insert a muscular proboscis into vascularized tissues. This long evolutionary history and the variety of vertebrate hosts and habitats to which various leech lineages are adapted connote both common molecular mechanisms for sanguivory present in the ancestral leech as well as other mechanisms that may be lineage specific or species specific.
Different medicinal leech species are known to produce distinct suites of bioactive compounds in their salivary secretions. The North American medicinal leech, Macrobdella decora, may be unique in secreting a 39 amino acid glycoprotein (GP) IIb/IIIa disintegrin, decorsin. Mass spectrometry suggests that even closely related species of Hirudo exhibit substantial interspecific variation and that “there is only a more complicated way for gradual solution of this problem, by creation of a cDNA library of a species of medicinal leech” (Baskova et al., 2008).
Rather than continue the focus on already well-studied leech species like H. medicinalis, Hirudo verbana, and species of Haementeria, we targeted the salivary transcriptome of the North American medicinal leech, M. decora. This species typically feeds on amphibians and fish and is abundant in North American temperate freshwater environments where it is a willing annoyance to humans (Munro, Siddal et al., 1991; Munro et al., 1992). The choice of this species was driven both by phylogenetic and practical considerations. Though otherwise not well studied, the pre-existence of at least 1 salivary anticoagulant, decorsin, from M. decora should serve as a quality control check on the successful elucidation of the saliomic repertoire. More interesting is that while long thought merely to be a recently diverged North American counterpart to the otherwise old-world Hirudinidae, M. decora is in a distinct family and part of a lineage now understood to have diversified from ancestral jawed leeches over 150 million years ago (Phillips and Siddall, 2009; Phillips et al., 2010). Insofar as the phylogenetic origin of the macrobdelloid leeches lies between those of Glossiphoniidae and Hirudinidae, its salivary transcriptome could illuminate patterns associated with the evolution of blood feeding more generally across Hirudinida and create context for the known leech salivary secretions (Baskova and Zavalova, 2001; Salzet, 2001) and the transcriptome of Haementeria depressa (Faria et al., 2005).
MATERIALS AND METHODS
Leeches were collected in Tolland County, Connecticut, from exposed skin while wading in local ponds and held in isolation in distilled water for 24 hr prior to dissection. Leeches were induced to initiate feeding so as to better promote an appropriate transcriptional state. To minimize the co-purification of any surface bacteria, leeches were washed in 0.5% bleach for 1 min and rinsed in distilled water for 1 min. Salivary tissue masses lying posterior to the 3 muscular jaws were removed aseptically by dissection while immersed in RNAlater (Qiagen, Valencia, California), using sterilized tools. Tissue samples were washed in 0.5% bleach for 1 min, rinsed in distilled water for 1 min, and stored in RNAlater. Whole RNA was isolated with the RNeasy Tissue Kit (Qiagen).
Construction of cDNA libraries was facilitated with the SMARTer cDNA Library Construction Kit (Clontech, Mountain View, California), but with first-strand synthesis accomplished with 200 units of Superscript III Reverse Transcriptase (Invitrogen, Carlsbad, California), and second-strand synthesis with 5 units of Platinum Taq DNA Polymerase High Fidelity (Invitrogen), run for 25–28 cycles of 10 sec at 95 C, 20 sec at 55 C, and 4 min at 68 C. The amplicons then were cloned without cleanup using the Topo TA cloning kit (Invitrogen). Into thirty-eight 96-well plates, 3,548 white colonies were picked and dissolved in 100 μl 0.1× TE.
Thirty 96-well plates were selected for sequencing and characterization. Plates were heated to 99 C for 10 min prior to use of 2 µl from each of 96 wells in 25 µl amplification reactions using M13 primers run for 29 cycles of 10 sec at 94 C, 30 sec at 50 C, and 1.5 min at 72 C. Amplification products were purified with the AMPure Purification system (Agencourt, Danvers, Massachusetts). Expressed sequence tags for purified amplification products were sequenced in the forward direction with primer Smart-seq 5′-AAGCAGTGGTATCAACGCAGAGT-3′ corresponding to most of the SMART CDS Primer II A primer. Following ethanol precipitation, sequencing products were electrophoresed on a 3730 XL DNA Analyzer (Applied Biosystems, Carlsbad, California).
An EST library database was constructed in FileMaker Pro 5 (FileMaker, Santa Clara, California) following the removal of sequences from 3 failed plates, as well as the removal of low-signal quality sequences, as determined with Sequence Analysis Software v5.4 (Applied Biosystems), and those shorter than 150 nt in length with a polyadenylation motif. Vector and adaptor sequence removal was not necessary in light of the Smart-seq sequencing primer annealing to within 3 nt of the cloned insert; however, the first 20–30 nt were automatically trimmed so as to minimize the inclusion of 5′ sequencing errors. The 2,019 EST clone sequences were subjected to an all-against-all BLAST approach (where the ESTs were BLASTed against each other using the blastn tool from the NCBI BLAST package). Each of these EST clones was assigned 1 of 554 unique cluster identification numbers based on an E-value inclusion criterion of 1e−5. The longest sequence was then chosen from each cluster as its “reference sequence.” The reference sequence was compared against the non-redundant nucleotide and protein coding sequence databases in GenBank, using each of the blastn and blastx options in blastcl3 v2.2.22 (http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/netblast.html), keeping the top 10 best scoring matches. Blastx and blastn output was converted into database-manageable form with a Perl script (http://jura.wi.mit.edu/bio/education/bioinfo/pages/scripts/blast_parse_0.pl.html). All predictions scoring better than 1e−2 were manually checked to avoid uninformative annotations lacking a descriptor. Reference sequences were converted to putative amino acid sequences in all 3 forward reading frames with transeq for examination of cellular compartmentalization with TargetP v1.1 (Emanuelsson et al., 2007).
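The clustering step described above can be illustrated with a minimal Python sketch. It assumes single-linkage grouping (any two ESTs joined by a blastn hit with an E-value at or below 1e−5 end up in the same cluster), which the text does not state explicitly, and it takes the hits as in-memory tuples rather than parsing BLAST output. The function name `cluster_ests` and the toy sequences are hypothetical and serve only as an illustration.

```python
from collections import defaultdict

def cluster_ests(seqs, hits, max_evalue=1e-5):
    """Single-linkage clustering of ESTs from all-against-all hits.

    seqs: dict mapping EST id -> nucleotide sequence
    hits: iterable of (query_id, subject_id, evalue) tuples
    Returns a sorted list of (reference_id, sorted member ids), where the
    reference is the longest sequence in its cluster.
    """
    parent = {name: name for name in seqs}

    def find(x):
        # Follow parent links to the cluster root, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for query, subject, evalue in hits:
        # Assumed inclusion rule: merge clusters when the hit passes the cutoff.
        if query != subject and evalue <= max_evalue:
            parent[find(query)] = find(subject)

    groups = defaultdict(list)
    for name in seqs:
        groups[find(name)].append(name)

    clusters = []
    for members in groups.values():
        # Longest member becomes the reference; ties are broken by id.
        reference = max(members, key=lambda m: (len(seqs[m]), m))
        clusters.append((reference, sorted(members)))
    return sorted(clusters)

# Toy data (hypothetical): est1-est3 are chained by strong hits; est4 is a singleton.
toy_seqs = {
    "est1": "ACGT" * 50,
    "est2": "ACGT" * 60,
    "est3": "ACGT" * 40,
    "est4": "TTGACA" * 30,
}
toy_hits = [("est1", "est2", 1e-40), ("est2", "est3", 1e-12), ("est3", "est4", 0.5)]
toy_clusters = cluster_ests(toy_seqs, toy_hits)
```

Single linkage is only one possible reading of the inclusion criterion; a stricter rule (for example, requiring every pair in a cluster to pass the cutoff) would generally yield more, smaller clusters.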
In addition to global annotations predicted against GenBank nr databases, blastx comparisons were made against a locally compiled sequence database of the following accessions: Q07558 hirudin from H. medicinalis, P84590 hirudin from Poecilobdella viridis, CAA79672 thrombin inhibitor from Haemadipsa sylvestris, P09865 bdellin B-3 from H. medicinalis, AAA96144 destabilase I from H. medicinalis, AAA96143 destabilase II from H. medicinalis, 0905140A eglin c from H. medicinalis, Q01747 leech antiplatelet protein from Haementeria officinalis, P17350 decorsin from Macrobdella decora, Q9NBW4 therostasin from Theromyzon tessulatum, P16242 ghilanten from H. ghilianii, and patent 2006_US_7.049.124_B1 manillase from Hirudinaria manillensis.
EST sequence clusters with high-scoring blastx matches (E-value < 1e−10) against the local database of salivary sequences, as well as those clusters wherein a highest scoring match from GenBank nr was a known leech salivary anticoagulant, were reconciled into unigenes with CodonCode Aligner (Codoncode Corp., Dedham, Massachusetts). Additional unigenes were constructed from clusters of ESTs for which there was no significant blastx or blastn match from GenBank nr, and for those exceeding 1% of all clones, i.e., >20 clones in the cluster, and for all clusters with a predicted secretory signal peptide. All unigenes were converted to their corresponding predicted amino acid sequences with Virtual Ribosome (Wernersson, 2006). These putative polypeptide sequences then were used to retrieve orthologous sequences from GenBank nr with blastp and from GenBank EST with tblastn. Comparative amino acid sequence alignment was accomplished with MUSCLE (Edgar, 2004) and visualized with Jalview (Waterhouse et al., 2009). Signal peptide prediction was accomplished with SignalP (Emanuelsson et al., 2007). For high-copy clusters not receiving significant annotation, predicted products from reconciled unigenes were examined for conserved domains and motifs against PROSITE and Pfam with MOTIF Search (http://motif.genome.jp/). We examined the trend in accumulations of new transcripts using Newton-Raphson estimation on a non-linear general logistic equation, $\mathrm{Total}\times\left(1 - 1/e^{\,\mathrm{obs}\times\mathrm{CONSTANT}}\right)$, in order both to predict the total number of transcripts in the transcriptome and to determine how much more effort might be required for completion.
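The accumulation model described above, Total × (1 − 1/e^(obs × CONSTANT)), can be sketched in Python together with its closed-form inversion, which gives the number of clones needed to observe a chosen fraction of all transcripts. This is a minimal sketch rather than the authors' procedure: the Newton-Raphson fitting step is not reproduced, the function names are hypothetical, and the parameter values are illustrative placeholders, not the fitted estimates from this study.

```python
import math

def expected_clusters(n_clones, total, constant):
    """Expected number of distinct clusters after sequencing n_clones,
    under the model total * (1 - exp(-constant * n_clones))."""
    return total * (1.0 - math.exp(-constant * n_clones))

def clones_needed(fraction, constant):
    """Clones required to observe a given fraction of all transcripts.

    Obtained by solving 1 - exp(-constant * n) = fraction for n.
    The fraction must be below 1 because the curve only approaches
    its asymptote and never reaches it.
    """
    if not 0.0 <= fraction < 1.0:
        raise ValueError("fraction must be in [0, 1)")
    return -math.log(1.0 - fraction) / constant

# Hypothetical parameter values for illustration only (not fitted estimates).
TOTAL = 500.0
CONSTANT = 1e-3
n95 = clones_needed(0.95, CONSTANT)
```

Because the model is asymptotic, any statement about sequencing "all" transcripts needs a convention, such as the point at which fewer than one transcript is expected to remain unobserved.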
Following the removal of low-quality and short sequences, 2,019 transcripts remained in the M. decora salivary EST library. With a 1e−5 E-value cut-off, all-against-all blastn comparisons structured 554 clusters. Examination of the rate of accumulation of clusters yielded an estimate of 655 total transcripts, suggesting that fully 4,300 clones would have to be analyzed in order to sequence 95% of all transcripts at least once (and more than 11,000 clones to sequence all transcripts once). The 30 clusters with the largest numbers of clones represented 60% of all sequenced transcripts (Table I). Although there were obvious clusters with regulatory or housekeeping function (like 16S rDNA and other mitochondrial transcripts), among those 30 were 8 clusters exhibiting the strongest inferred amino acid homology to known leech anticoagulant proteins, 7 clusters representing mitochondrial genes, a lectoxin-like cluster, and fully 13 clusters without significant inferred amino acid or nucleotide homology to known leech bioactive genes in GenBank nr.
Table I. The 30 most frequently sequenced clusters in the Macrobdella decora salivary transcriptome Expressed Sequence Tag library.
Saratin was represented by 110 clones in the M. decora EST library, grouping in 4 clusters. Sequence reconciliation implied 1 major and 2 minor unigene transcripts. These 3 putative transcripts averaged 83% amino acid identity among themselves and 63% identity (average E-value of 3e−17) with saratin from H. medicinalis. Bdellin orthologs comprised 46 salivary clones in 3 clusters, corresponding to 1 major transcript following sequence reconciliation (Fig. 1). This putative transcript shared 57% amino acid identity with bdellin-B3 (E-value of 2e−6) and 51% with LDTI (E-value of 2e−4). A fourth cluster representing a single sequence showed weaker homology to bdellin (cluster 560 E-value = 1e−3). Two distinct unigenes of destabilase were found in the M. decora salivary transcriptome, representing 39 clones in 2 clusters. Both of these matched destabilase I of H. medicinalis at 7e−53. A putative thrombin-inhibiting hirudin-like cluster of 36 clones was detected in the M. decora salivary EST library. No other known sequence produced a better scoring match than the weak 2.10 E-value found in blastx matches to hirudin. All 6 cysteines involved in the 3 disulfide bonds (Dodt et al., 1985) were conserved in the M. decora inferred protein (Fig. 1), although the fifth differs in absolute alignment position. Two clusters of 21 clones of decorsin corresponding to a major and a minor transcript were present in the M. decora EST library. The translated minor transcript was identical to the available amino acid sequence for this gene (Swiss-Prot|P25512). The major transcript differed at 6 residues and included an additional glutamic acid at position 43 of the alignment (Fig. 1). Cluster 45, while not scoring significantly against curated databases, nonetheless bore considerable similarity in size and structure to decorsin. 
We detected 8 sequences in 1 cluster corresponding to the heparanase class of endoglucuronidases with a blastx score of 3e−26 and possessing a signal peptide sequence. The inferred amino acid sequence of this transcript demonstrated 66% identity to “manillase,” the patented endoglucuronidase/hyaluronidase from H. manillensis (US7049124B1). In addition, the best tblastx match against available EST libraries revealed a similarly conserved protein from H. medicinalis (133K group 6320) in GenBank, with 65% amino acid sequence identity to the M. decora inferred polypeptide. Three clones in 1 EST cluster of the M. decora transcriptome demonstrated significant blastx similarity with glossiphoniid antistasin and ghilanten (E-value of 1e−14) and less with hirudinid guamerin (E-value of 9e−6) or bdellastasin (E-value of 0.003). As well, the implied amino acid sequence for the M. decora ortholog includes all 3 antistasin domains, conserving all 20 cysteine residues. One salivary transcript from M. decora matched eglin with a blastx E-value of 2e−04, sharing 49% amino acid identity.
In addition to a priori known salivary proteins, other high-copy transcripts (Table I) included 3 predicted to be histidine-rich (clusters 18, 7, and 30), several bearing predicted similarity to protease inhibitors (clusters 16, 26, 46, and 35), and a c-type lectin (cluster 38). Other transcripts, while not necessarily of high copy, also bore sufficient similarity either to unannotated ESTs from other leeches (Table II) or to various bioactive polypeptides of leeches and other organisms that they deserved closer scrutiny (Fig. 2). A putative protein with a fibrinogen-related domain (FreD) that shared significant (2e−43) blastx homology with ficolin, and tblastx homology with an uncharacterized H. medicinalis EST, was found in 2 clusters (562 and 686), representing 4 cloned transcripts (Fig. 2). The secreted protein implied in cluster 493 appeared to share amino acid homology with a pit-viper reprolysin (blastx of 9e−5). Cluster 31, while not specifically matching any annotated records with any significance, exhibits cysteine-rich Pfam UPAR-Ly6 (CL0117) plasminogen-activating domain structure.
Table II. Nonspecifically annotated transcripts exhibiting signal peptides with potential protease inhibiting function and similar transcripts from Expressed Sequence Tag libraries of Hirudo medicinalis and Haementeria depressa.
Successful cDNA library sequencing
The repertoire of bioactive proteins encoded in the salivary transcriptome of the North American medicinal leech is considerably more expansive than previously hypothesized (Munro, Siddal et al., 1991; Munro et al., 1992). Our EST library from M. decora salivary tissue revealed a suite of putative anticoagulants and other transcripts associated with sanguivory that have not previously been found together in 1 species of leech. Whereas a transcript sharing homology with the platelet aggregation inhibitor saratin was the second-most common transcript, its frequency was exceeded by that of mitochondrial 16S rDNA (represented by 378 clones grouped in 3 clusters). The SMARTer poly-T CDS Primer IIA (Clontech) used in first-strand synthesis is designed to target polyadenylated mRNA transcripts; unfortunately, leech mitochondrial rDNA sequences are approximately 70% AT-rich, and M. decora has a 16S rDNA containing a string of 26 adenosines, allowing it to be easily included in cDNA library construction. Nonetheless, the inadvertent sequencing of 16S rDNA and other high-copy mitochondrial transcripts did not significantly impede the characterization of secreted polypeptides with roles in leech sanguivory. Fully 8 salivary proteins known to be involved in anticoagulation and other functions related to blood feeding were identified through blastx and blastn annotations. Only 1 of these, decorsin, was previously known to occur in M. decora. These polypeptides, their functions, and their relative representations in the salivary EST library are more fully evaluated in the following discussion in relation to other known homologs and in descending order of the frequency with which they were found in sequenced clones.
Like Leech Anti-Platelet Protein (Connolly et al., 1992), saratin inhibits platelet aggregation by interfering with von Willebrand factor (vWf)–mediated interactions with collagen (Barnes et al., 2001). Following injury to vascular walls, exposed collagen is proaggregatory first via a conformational change with vWf (Ruggeri et al., 1997), then leading to irreversible binding with the surface glycoprotein complex GP Ib/IX/V (Obert et al., 1999; Watson et al., 2000). These activated platelets further induce thrombus formation through secretion of thromboxane and cytokines. The 12 kDa saratin, first isolated from H. medicinalis, has 6 beta strands and 1 alpha helix in addition to C- and N-terminal unstructured regions of high mobility (Grönwald et al., 2008). Crystallography has implied (Grönwald et al., 2008) that saratin residues Thr56, Phe58, Glu83, and Glu84 (as numbered in Fig. 1) of H. medicinalis are involved in binding to collagen. Besides universal conservation of the 6 disulfide-bond–forming cysteines, the longest strings of conserved residues were FYANRKYT, LDECKKT56C, VFLED, CYYN, and ENYL. Of these, only Thr56 in the alignment ( = Thr34 in UniProtKB|DQVWW8) corresponds to a residue hypothesized to be involved in this protein's binding functionality (Grönwald et al., 2008). However, Val58 of M. decora is a conservative substitution relative to the phenylalanine, and 1 of 2 glutamic acids thought to bind a lysine of collagen is conserved (Glu84). The overall presence of several highly conserved stretches of amino acid residues corroborates suggestions of a 2-site binding mechanism of saratin with exposed collagen fibrils (Grönwald et al., 2008). Calin, another non-enzymatic collagen-binding platelet aggregation inhibitor from H. medicinalis (Munro, Jones, and Sawyer, 1991), is fast-acting, like saratin (Deckmyn et al., 1995; White et al., 2007); however, no amino acid sequence is available for this 65 kDa protein. 
It is possible that calin activity represents a pentameric form of saratin that does not polymerize in the recombinantly produced forms studied in vitro.
Bdellin, a 56 amino acid non-classical Kazal-type serine protease inhibitor from H. medicinalis (Fink et al., 1986), inhibits plasmin. Inasmuch as plasmin promotes dissolution of crosslinked fibrin, bdellin may function more to inhibit plasmin's proaggregatory influence on platelets (Watabe et al., 1997) in the absence of fibrin deposition (of which there is little at a leech bite wound during feeding). Bdellin-KL from Hirudo nipponia and bdellin-B3 from H. medicinalis share 87% amino acid sequence identity with each other and more than 50% identity with LDTI; all of these also inhibit mast-cell tryptase, suggesting a broader anti-inflammatory role (Sommerhoff et al., 1994). The longest string of conserved amino acid residues across the 3 bdellins (Fig. 1) was VCGxDGVTY. All 6 cysteines, presumably involved in 3 disulfide bonds, are evolutionarily conserved as well. Whereas the complete absence of proline from bdellin has previously been noteworthy (Fritz et al., 1971), 2 are implied in the M. decora ortholog.
The role of destabilase, a 12.3 kDa isopeptidase, remains obscure, with implications both for fibrinolytic activity (Zavalova et al., 1996) and inhibition of platelet aggregation (Baskova et al., 2000). Fibrinolytic activity is indicated by its ability to cleave (gamma-Glu)-Lys isopeptide bonds between adjacent fibrin molecules in a clot (Baskova and Nikonov, 1991). The purported ability of destabilase to solubilize established clots led to its inclusion in the Russian drug Piyavit (Panchenko et al., 1995; Baskova et al., 1995). In addition to evolutionary conservation of 14 cysteines, presumably involved in 7 disulfide bonds, there was a 20-residue string of high evolutionary conservation: CTGGRTPTCQDYARIHxGGP.
The antithrombin activity of H. medicinalis saliva, denoted “hirudin,” was the first bioactive substance isolated from an animal for pharmacological use (Jacoby, 1904) and was instrumental in the first successful human hemodialysis treatment prior to the wide availability of heparin (Haas, 1924). Hirudin was eventually isolated in a purified form (Markwardt, 1955, 1992), at which time it was thought to be solely responsible for inhibiting coagulation. With an inhibition constant in the femtomolar to picomolar range, hirudin remains the most potent natural direct thrombin inhibitor known (Greinacher and Warkentin, 2008). It is bivalent, binding irreversibly to the fibrinogen binding exosite as well as to the catalytic active site pocket of thrombin. Hirudins (including the recombinant forms lepirudin and desirudin), the only leech salivary proteins approved for human use, are indicated both for heparin-induced thrombocytopenia (HIT) and for thrombotic prophylaxis following orthopedic surgery. Treatment with hirudins during HIT dramatically decreases the risk of new thrombi by 92% and decreases the probability of heart attack by 14% (Koster et al., 2000; Greinacher and Warkentin, 2008). Moreover, hirudin, unlike heparin, requires no cofactor and is more effective in accessing clot-bound thrombin, promoting dissolution of mural thrombi, and preventing acute coronary syndrome or deep vein thrombosis. The irreversible 1∶1 binding nature of hirudin to thrombin, however, carries with it the risk of severe bleeding in patients with reduced renal function that is necessary for clearing the protein complexes (Greinacher and Warkentin, 2008). Early reports of poor immunogenicity (in light of the short 65 amino acid length) proved premature; risk of immunoglobulin G–mediated anaphylaxis is 0.16% in re-exposed patients (Greinacher et al. 2003). 
Several properties of the hirudin “core” motifs associated with hirudin's binding to the thrombin catalytic pocket are conserved in the M. decora sequence (Fig. 1), including: an N-terminal pair of hydrophobic residues (MT in M. decora; IT for H. medicinalis), DCT, and CKC, as well as a GSNV region conservatively replaced by chemically similar GGHK in M. decora. As well, in a region that corresponds to exosite binding, the putative M. decora hirudin FESFSLD bears considerable homology with FEEFSLD of H. manillensis.
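The kind of residue-by-residue comparison made above can be expressed as a short Python sketch, shown here for the two exosite-region heptapeptides quoted in the text (FESFSLD and FEEFSLD). The function name `percent_identity` is hypothetical, and the convention of skipping gapped columns is an assumption; other definitions of percent identity use different denominators and will give different values for gapped alignments.

```python
def percent_identity(a, b):
    """Percent of aligned columns with identical residues.

    Both strings must already be aligned to equal length; columns in which
    either sequence has a gap ('-') are skipped (an assumed convention).
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not compared:
        return 0.0
    matches = sum(1 for x, y in compared if x == y)
    return 100.0 * matches / len(compared)

# Exosite-region strings as quoted in the text: M. decora vs. H. manillensis.
exosite_identity = percent_identity("FESFSLD", "FEEFSLD")
```

The two strings differ only at the third position (S versus E), so 6 of 7 columns match.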
Prior to this transcriptomic study, the only known bioactive salivary protein from M. decora was decorsin (Seymour et al., 1990). This 4.5-kDa protein has an exposed RGD disintegrin-like motif and is a GP IIb/IIIa antagonist that inhibits the end stages of platelet aggregation (Ginsberg et al., 1988; Krezel et al., 1994). In this regard, it shares similarities with the antihemostatic viperid snake venoms kistrin and echistatin (Dennis et al., 1990). Ornatin-c, a GP IIb/IIIa antagonist from the distantly related glossiphoniid leech, Placobdella ornata, shares 34–42% sequence identity with decorsin, including the 6 cysteines involved in 3 disulfide bonds, but not the 2 prolines flanking the exposed RGD motif (Mazur et al., 1991). Only 1 of the 2 prolines thought to add rigidity to the RGD binding motif (Mazur et al., 1991) was found in the major transcript here; neither proline is present in the ornatin-c sequence. Of the 7 additional residues identified as potentially important for decorsin binding (Yang et al., 2004), besides the RGD triplet, only Asp10, Asn18, and Tyr37 are conserved relative to the major transcript (Fig. 1); none of these is conserved in ornatin-c. Cluster 45, while not scoring significantly against curated databases, nonetheless bears considerable similarity in size and structure to decorsin. This predicted protein contained a KGD disintegrin motif in place of decorsin's RGD, yet would still be predicted to target GP IIb/IIIa (Reiss et al., 2006).
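Screening predicted proteins for the RGD and KGD tripeptides discussed above can be sketched with a regular expression. This is an illustrative sketch only: the function name `find_disintegrin_motifs` and the example peptide are hypothetical (the peptide is not a sequence from this study), and the presence of the tripeptide alone does not establish disintegrin activity, which also depends on the structural context of the motif.

```python
import re

# Matches the RGD tripeptide and its KGD variant.
DISINTEGRIN_MOTIF = re.compile(r"[RK]GD")

def find_disintegrin_motifs(protein):
    """Return (1-based start position, motif) for each RGD or KGD tripeptide."""
    return [
        (match.start() + 1, match.group())
        for match in DISINTEGRIN_MOTIF.finditer(protein.upper())
    ]

# Hypothetical peptide used only to exercise the function.
example_hits = find_disintegrin_motifs("ACCKGDAPRGDWC")
```

In this toy peptide the scan reports a KGD starting at position 4 and an RGD starting at position 9.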
“Orgelase” was marketed by Biopharm, U.K. as a leech salivary hyaluronidase in light of its apparent beta-endoglucuronidase activity. Very little is known about this enzyme insofar as the first mentions of it were merely tangential (Sawyer, 1986, 1988), with suggestions for use in treating glaucoma. Details have not been published regarding the size, sequence, structure, or specific activities of the enzyme beyond very broad considerations in expired patent applications (e.g., US005279824A). The amino acid sequence of an apparently related endoglucuronidase from an Asian leech, H. manillensis, released through a patent application (US7049124B1), served as a target for local similarity searches in the present study. The similarly conserved protein from H. medicinalis (133K group 6320) in GenBank, with 65% amino acid sequence identity to the M. decora inferred protein, likely corresponds to the previously reported “orgelase” activity for the European medicinal leech (Fig. 1).
The interaction between hemostasis and tumor metastasis has long carried the implication that anticoagulants, especially those of the protease inhibitor type, might be useful in the treatment of a variety of cancers (Saito et al., 1980; Zacharski, 1981). That leech salivary extracts also are capable of reducing the extent of pulmonary tumor metastases (Gasic et al., 1983) led to the discovery of a new class of extremely cysteine-rich serine protease inhibitors, the antistasins (Tuszynski et al., 1987; Nutt et al., 1988). Archetypal antistasins, including antistasin from H. officinalis, ghilanten from H. ghilianii, and therostasin from T. tessulatum, are antagonistic to factor Xa, thus preventing the conversion of prothrombin to thrombin (Dunwiddie et al., 1989; Brankamp et al., 1990; Chopin et al., 2000). Moreover, these inhibitors all are from leeches in Glossiphoniidae. Other antistasin-class salivary proteins from jawed leeches in the Hirudinidae, including bdellastasin, hirustasin, and guamerins, appear to target kallikrein, plasmin, or elastases as opposed to factor Xa (Salzet, 2001). The implied amino acid sequence for the M. decora ortholog includes all 3 antistasin domains conserving all 20 cysteine residues. The longest region of high-sequence conservation is in the third domain, e.g., CSRxTNxCDC, where amino acid identity exceeds 50% in comparison to antistasin and ghilanten (Fig. 1). Taken together, these observations suggest a factor Xa–inhibiting role in the salivary secretions of M. decora. While anti-Xa activity has been reported in H. medicinalis salivary extracts (Rigbi et al., 1987), the peptide responsible has yet to be isolated and characterized. We found no significant match in available H. medicinalis EST libraries for the putative antistasin from M. decora.
Eglin, a member of the potato inhibitor I family of serine protease inhibitors, is a cysteine-free 8.1-kDa protein and a potent leukocyte elastase and cathepsin G inhibitor (Seemüller et al., 1980; Schneibli et al., 1985). At one time, eglin was under serious consideration for drug development because it variously protected against experimental models of emphysema, blocked neutrophil-mediated platelet aggregation, and inhibited arthritic-like collagen degradation at very low doses (Lai and Diamond, 1990; Renesto et al., 1990; Steinmeyer and Kalbhen, 1996). Human trials were abandoned in the face of allergenicity and anaphylaxis (Schneibli and Liersch, 1989; Metz and Peet, 1999; Schneibli, 2006). Amino acid sequence conservation with eglin was more pronounced in the C-terminal half of the molecule, especially the reactive site-loop GSPVTxDxR (Fink et al., 1986), than in the N-terminal sequence (Fig. 1).
Other high-scoring transcripts
Of course, leeches are not the only animals that avail themselves of blood as a primary source of nutrition. A variety of blood-feeding dipterans like mosquitoes, sand-flies, simuliids, tabanids, tse-tse flies, and stable flies have independently evolved mechanisms associated with sanguivory and the need to maintain blood meals in an uncoagulated state. Beyond insects, ticks, mites, vampire bats, and even hookworm nematode parasites subsist largely or exclusively on ingested blood. And, while the snakes themselves are not sanguivorous, the venoms of several viperid and elapid snakes are anticoagulative, even as others are procoagulative (Kini, 2006). While collectively, these animals have unrelated recent histories, their respective salivary and toxin repertoires are focused on common targets like the vertebrate hemostatic cascade and the need either to sequester iron or to detoxify heme. These phylogenetically diverse animals might then be expected to have convergently evolved secreted proteins, sharing various domains of predictive homology or otherwise similar compositions. We examined the M. decora EST library for a variety of high-scoring (or highest annotated scoring) matches for their ability to encode secreted proteins that may be expected to be involved in antagonizing coagulatory or inflammatory mechanisms in host blood.
Polypeptides like hirudin, bdellin, eglin, and antistasins are each serine protease inhibitors (serpins). Serpins are ubiquitous among blood-feeding animals, including the AcAP family of inhibitors in hookworms (Mieszczanek et al., 2004), Kunitz superfamily serpins from snakes (Serrano and Maroun, 2005), and the Kazal-type rhodniin in assassin bugs (van de Locht et al., 1995), among others that are well characterized from, for example, mosquitoes and ticks (Mans et al., 2002; Champagne, 2005; Ribeiro et al., 2007). The notion that the suite of known bioactive serpins from leeches is exhaustive is belied by the variety of other serine protease inhibitors we found among those M. decora EST clusters with predicted signals targeting them for secretion. Among these were putative elastase inhibitors beyond eglin detailed above, Kazal-type serpins, cystatins, and a chymotrypsin-like protease inhibitor (Table II). Cluster 46 appears to be in the CRISP family of serpins, like those implicated in cobra venom (Matsunaga et al., 2009), and is highly conserved in a salivary EST library from Haementeria depressa (HDAH06F01 at 3e−23). Three clusters (Table II), representing 3 predicted unigenes, suggested other cysteine-rich antistasin-like serpins (Fig. 2). One of these, cluster 35, was among the top 30 most frequently sequenced in the library (Table I) and shared considerable amino acid homology with an EST from H. medicinalis and with the N-terminal half of an antistasin-like cocoon protein from Theromyzon rude (Fig. 2). Each of these 3 antistasin-like unigenes encoded a predicted signal peptide and possessed 20 cysteines in positions that were relatively conserved with respect to each other, yet they shared an average of less than 25% amino acid identity amongst themselves and less than 14% identity with the predicted M. decora antistasin.
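The comparison of cysteine positions and percent identity described above can be illustrated with a minimal sketch. It assumes the sequences have already been aligned (gaps written as "-") and counts identity only over columns where neither sequence has a gap; the text does not state which denominator or alignment program was used, so that choice, the function names, and the toy sequences are all assumptions made for illustration and are not data from the library.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns in which neither sequence has a gap.

    Assumption: gap-free columns form the denominator. Other conventions
    (e.g. alignment length, shorter sequence length) give different values.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # skip columns containing a gap
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def cysteine_columns(aligned: str) -> list[int]:
    """Zero-based alignment columns occupied by cysteine (C)."""
    return [i for i, residue in enumerate(aligned) if residue == "C"]


# Hypothetical toy alignment, not sequences from the M. decora library.
toy_a = "MKC-ACDEC"
toy_b = "MRCGACD-C"

identity = percent_identity(toy_a, toy_b)
shared_cys = sorted(set(cysteine_columns(toy_a)) & set(cysteine_columns(toy_b)))
```

In this toy case the cysteines occupy the same alignment columns in both sequences even though other residues differ, which is the pattern the antistasin-like unigenes are described as showing: conserved cysteine positions against a background of low overall identity.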
Among the most frequent clones, we found a c-type lectin represented by 1 cluster of 24 copies (Fig. 2). The putative protein from this cluster best matched a water snake (Enhydris polylepis) lectoxin anticoagulant at 1e−11 with blastx and a predicted c-type lectin in available EST data for the leech Haementeria depressa (tblastx to HDAH06E05.F at 6e−13). C-type lectins, among the first proteins isolated from snake venoms, are known to bind effectively to Gla domains of factors X and XI at nanomolar concentrations. Snake venom proteins bothrojaracin and bothroalterin are c-type lectins inhibiting fibrin production by binding to thrombin exosites (Kini, 2006). While c-type lectins also are well known from the salivary transcriptomes of various mosquitoes (Valenzuela et al., 2002), it is not clear whether they exert their physiological activity as coagulation antagonists, as hemolytic agents, or in an antimicrobial capacity. An additional transcript (cluster 550) comprising a single clone showed significant similarity to an R-type lectin, some of which are known to be hemolytic (Nakano et al., 1999); however, no secretory signal peptide was detected and TargetP suggested mitochondrial localization.
In mammals, ficolins (Fig. 2) act as lectins that stimulate the complement cascade (Matsushita, 2000) in the innate immune system. Ficolin-like secreted proteins rhyncolin 1 and 2 have only recently been characterized from venom of the New Guinea bockadam colubrid snake (Ompraba et al., 2010). Whether their function is antihemostatic or procoagulative has not yet been determined. Mosquito ficolins typically are implicated in primitive metazoan immune functions (Wang et al., 2004). However, mass spectrometry has revealed female salivary-specific ficolins from Aedes aegypti, which strongly suggests a role in blood feeding (Ribeiro et al., 2007). Reprolysins are thought to cleave vWf precursor proteins and impede hemostasis (Matsui and Hamako, 2005). Cluster 31, while not specifically matching any annotated records with any significance, exhibits cysteine-rich Pfam UPAR-Ly6 (CL0117) plasminogen-activating domain structure.
Other high-copy clusters
Among the top 30 representative clusters in the M. decora salivary transcriptome (Table I) are 2 unigenes predicted to be histidine-rich. Cluster 18, with 97 clones, has a secretory signal and is predicted to be an 89 amino acid mature peptide with 10 His, 22 Asp, and 2 internally repeated elements: HxLHKRSEDSDD and DDxKD. Cluster 7, with 80 clones, is predicted to be a 53 amino acid mature secreted peptide with 8 His residues and a repeated HKxGxSxxPxxxSGH motif. Similarly, cluster 30, with 15 cloned transcripts, is predicted to encode a secreted 149 residue mature peptide that is 13% His and has a pentameric HAKHKRSEDSDVVESEKAAV repeat. Histidine-rich proteins of the unicellular Plasmodium spp. malarial parasites are involved in the sequestration and detoxification of heme and conversion to hemozoin (Pandey et al., 2003). Proteins and glycoproteins with heme-binding histidine residues are highly efficient in the sequestration and uptake of iron and other rare metals (Nriagu and Pacyna, 1988; Kotrba et al., 1999). Because porphyrin toxicity is principally an intracellular phenomenon and not an intestinal intraluminal problem, these Macrobdella decora proteins might play a physiological role in leech nutrition or, like histatins, in preventing putrefaction of the bloodmeal (Kavanagh and Dowd, 2004).
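The composition figures and repeat motifs above lend themselves to a short sketch. It assumes a motif is written as in the text, with a lowercase "x" standing for any residue, and reports residue fractions and (possibly overlapping) motif start positions. The function names and the toy peptide are hypothetical and are not sequences from the library.

```python
import re


def residue_fraction(sequence: str, residue: str) -> float:
    """Fraction of positions in `sequence` occupied by `residue`."""
    return sequence.count(residue) / len(sequence) if sequence else 0.0


def motif_starts(sequence: str, motif: str) -> list[int]:
    """Zero-based start positions of `motif`, where lowercase 'x' matches any residue.

    A lookahead is used so that overlapping occurrences are also reported.
    """
    body = "".join("." if ch == "x" else re.escape(ch) for ch in motif)
    return [m.start() for m in re.finditer(f"(?={body})", sequence)]


# Hypothetical toy peptide, used only to exercise the functions.
toy_peptide = "AHDDLKDGHDDSKDH"

his_fraction = residue_fraction(toy_peptide, "H")  # 3 of 15 residues
ddxkd_hits = motif_starts(toy_peptide, "DDxKD")    # starts at 2 and 9
```

Applied to the counts reported above, 10 His in an 89-residue mature peptide corresponds to about 11% His for cluster 18, and 8 His in 53 residues to about 15% for cluster 7, comparable to the 13% reported for cluster 30.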
Insofar as homology of amino acid sequences in related organisms indicates orthologous, or at least recently paralogous, evolutionarily related genes (descended from a common ancestor), these additional data from a macrobdellid leech permit hypotheses regarding their origin and diversification. In the context of established phylogenetic work for the Hirudinida (Fig. 3), we conclude that the ancestral leech, already corroborated as having been sanguivorous (Siddall et al., 2006), was able to target factor Xa with an antistasin, to impede platelet activation both in terms of vWf-mediated and GP IIbIIIa–mediated aggregation with ancestral decorsin/ornatin and LAPP/saratin orthologs, and to employ a c-type lectin for an as-yet unknown function. Other proteins, such as hirudin, eglin, bdellin, destabilase, and the endoglucoronidase, presently appear to be phylogenetically restricted to the arhynchobdellidan jawed-leech lineage. Additional saliomic studies encompassing a broader range of leech families should clarify this picture. In particular, the principally marine piscicolid leeches should be revealing in light of their intermediate position between glossiphoniid and jawed leeches. Moreover, those piscicolid leeches that feed on sharks and skates, like species of Pontobdella and Branchellion, do so exclusively, perhaps reflective of proteins uniquely adapted to a system in which thrombocytes aggregate independent of thrombin, fibrin, or ADP (Stokes and Firkin, 1971), or simply being able to function in relation to a more basic (pH 7.7) blood meal.
The diversity of “medicinal” leeches spans freshwater habitats in all of the continents (except Antarctica), where presumably they have evolved in isolation from each other for millions of years. Notably, the salivary transcriptome of a European medicinal leech has yet to be examined. To the extent that we were able to find high-scoring matches to available Hirudo spp. ESTs, the latter had been generated from whole body preparations focused on embryonic and neurobiological correlates. While those are characterized in public databases as having originated from H. medicinalis, the leeches are likely to have been misidentified specimens of the closely related Hirudo verbana, the only commercially available European medicinal leech and universally distributed under the wrong name (Siddall et al., 2007). An expanded examination of the salivary transcriptomes of African, Asian, South American or Australian hirudinoid leeches should allow for detailed molecular evolutionary analyses of the various selection pressures on leech bioactive proteins, perhaps even identifying those residues that are under strong purifying or Darwinian selection and of critical importance to their physiological function.
We thank A. Oceguera, A. Phillips, and S. Kvist for their reviews of earlier drafts. This research was supported by a grant (DEB-0640463) from the National Science Foundation and sabbatical stipend support for G.S.M. from the LG Corporation. M.E.S. and G.S.M. contributed to research design. G.S.M. developed and sequenced the library. M.E.S. and I.N.S. analyzed the data. While M.E.S. was principal in its writing, all authors contributed to the completion of the manuscript.