Orthoptera songs are widely used for the description and diagnosis of new species. Most of the corresponding sound recordings exist in analogue format (tapes), are widely scattered among institutions, and only a small fraction is accessible as an organized collection (‘phonothek’). Approximately 12,000 Orthoptera sound recordings, representing about 4,000 species from all biogeographic regions, were digitized and stored in a database during the DORSA project (Digital Orthoptera Specimen Access – www.dorsa.de). Together with images and collection data of voucher specimens, DORSA serves as a ‘Virtual Museum’, bringing together distributed collections and phonotheks from several German researchers and institutions. A subset of recordings was used to develop automatic sound-recognition tools, using neural networks fed with acoustic parameters. Relevant parameters, such as carrier frequency and pulse repetition rate, were determined by a dedicated software module, which could then be used to extract those features from all cricket songs hitherto available in the DORSA database. These parameters were added as annotations to the individual song recordings within the database. Analysis of the enriched database tables revealed outliers due to low-quality recordings or misidentification, permitting cleaning of the data. For recordings from a limited geographic range, pulse intervals and carrier frequency are sufficient to identify plausible matches between archived songs and new sound recordings.
Introduction
The species-specific, stereotyped songs of male Orthoptera can be used as a highly reliable feature for the recognition and description of species (Otte 1994). This requires methods for objective song description, some of which were developed long before present-day advanced technologies for sound recording and analysis. In an enlightening review, Ragge & Reynolds (1998) outline the history of song representations, including a reproduction of A. Yersin's attempt at musical notation for the songs of European Orthoptera, dating from 1854 (loc. cit., Fig. 34). During the 20th century, the development of tape recorders, oscilloscopes and spectrographs resulted in the well-known graphical representations used in modern bioacoustics. In addition, several Orthoptera species became model organisms for neuroethology, revealing the underlying physiological processes of song production, hearing and phonotaxis (Huber et al. 1989, Schildberger 1994, Helversen & Helversen 1998, Römer & Krusch 2000). Taxonomic and behavioral studies were successfully combined and resulted in vast collections of sound recordings. Ideally, such recordings were deposited in major sound archives, such as the Wildlife Section of the National Sound Archive (NSA) in Britain, as is the case for Ragge's extensive song recordings of Western European grasshoppers. In Germany, which has a strong tradition in insect bioacoustics and neuroethology, recordings were not centrally collected, but remained with the respective researcher or affiliated institution (e.g., Faber 1953).
In an attempt to safeguard these distinct tape collections and phonotheks, songs were digitized, stored in databases, and published on the Internet as a ‘Virtual phonothek’ as part of the DORSA project (DORSA: Digital Orthoptera Specimen Access, see http://www.dorsa.de and Ingrisch et al. 2004). A subset of this digitized sound repository was used to develop automated sound-classification tools using neural networks (Dietrich et al. 2004). The vision was to integrate acoustic recording into Rapid Assessment Programs as a noninvasive technique to classify and map acoustic diversity (Riede 1993). Such software tools should enable field researchers to classify sound recordings from complex acoustical environments, such as tropical rainforest canopies, by automated filtering, analysis and, possibly, identification of songs.
In the study reported here, we used the feature-extraction modules of our automatic sound-classification software (Dietrich et al. 2004) to determine fundamental acoustic parameters, such as carrier frequency and pulse rate, for all digitized cricket songs available in the DORSA database. Extracted parameters were then annotated as sound metadata. This procedure speeds up feature extraction, hitherto done manually, by orders of magnitude. We demonstrate that such automated feature extraction is reliable and feasible for crickets (Grylloidea), facilitating the generation of a look-up table that can be used to navigate and query the huge DORSA sound repository. Because the locality of the original recording is known, data subsets for particular geographical regions can be extracted for decentralized use by field researchers. Possible applications are the rapid identification and mapping of selected indicator species, or the diagnosis of unknown songs, which are strong indicators of undescribed, ‘new’ species within a given area.
Methods
Orthoptera song repository within DORSA (Digital Orthoptera Specimen Access).—The DORSA multimedia database comprises 30,000 images of type specimens and 12,000 sound files. These are linked to voucher specimens with collection data, representing 16,000 specimen records from approximately 4,000 species. The sound files were provided in digital (wav) format by the authors for the DORSA project (www.dorsa.de, see Ingrisch et al. 2004). Most sound files are fully accessible through the SYSTAX database (http://www.biologie.uni-ulm.de/systax/) and can be used noncommercially, citing the source and the recordist. In addition, DORSA specimen information is reciprocally linked to the Orthoptera Species File (OSF – http://osf2x.orthoptera.org/OSF2.3/), accessible through the Global Biodiversity Information Facility (www.gbif.org).
Toolkit for Orthoptera Song Recognition and Analysis (TOSRA).—TOSRA consists of several modules implemented in C/C++ and MATLAB. They were developed to classify Orthoptera songs, based on neural and statistical pattern-recognition algorithms described in detail by Dietrich et al. (2004) and Schwenker et al. (2003). The first step of insect-song processing is resampling the sound files to a standard sampling rate of 44.1 kHz and normalising the signal to its maximal amplitude, to suppress the influence of sound volume. Environmental noise reduction is then performed by bandpass filtering (see Fig. 1); details of the filtering procedure are given in Dietrich et al. (2004). Because temporal structure is among the most prominent features for differentiating insect songs at species level, pulse detection is a crucial part of automated song classification, and is accomplished in the next step.
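For illustration, a minimal sketch of this preprocessing chain is given below in Python (the original TOSRA modules are implemented in C/C++ and MATLAB); the filter order and corner frequencies are placeholders, not the values used by TOSRA.

```python
# Illustrative sketch of the TOSRA preprocessing chain (resample,
# normalise, bandpass-filter); parameter values are placeholders.
from math import gcd

import numpy as np
from scipy.io import wavfile
from scipy.signal import butter, resample_poly, sosfiltfilt

TARGET_RATE = 44_100                         # TOSRA standard sampling rate

def preprocess(path, low_hz=1_000, high_hz=15_000):
    """Resample to 44.1 kHz, normalise to peak amplitude, bandpass-filter."""
    rate, signal = wavfile.read(path)
    signal = signal.astype(np.float64)
    if signal.ndim > 1:                      # mix stereo recordings to mono
        signal = signal.mean(axis=1)
    if rate != TARGET_RATE:                  # resample by a rational factor
        g = gcd(TARGET_RATE, rate)
        signal = resample_poly(signal, TARGET_RATE // g, rate // g)
    peak = np.abs(signal).max()
    if peak > 0:                             # suppress influence of volume
        signal = signal / peak
    # Bandpass filtering for environmental noise reduction.
    sos = butter(4, [low_hz, high_hz], btype="bandpass",
                 fs=TARGET_RATE, output="sos")
    return sosfiltfilt(sos, signal)
```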
As shown in Fig. 1, the TOSRA pulse-detection algorithm is based on a lower and an upper threshold function derived from the local sound energy. To take into account variation of the energy within a single chirp (Figs 1, 2), the threshold functions are adaptive rather than constant. A pulse onset is detected if the signal exceeds both the lower and the upper threshold function within a short, predefined time frame; the pulse offset is determined analogously. These two functions allow determination of the exact temporal position of single pulses (a sketch of this procedure follows the feature list below), so that the following features can be calculated for each song:
Distance between consecutive pulses
Pulse length
Frequency contour of pulses
Energy contour of pulses
Time-Encoded Signal of pulses
Maximal amplitude of pulses.
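A minimal sketch of such a dual-threshold detector is given below, assuming thresholds taken as fixed fractions of a sliding local-energy maximum; the actual threshold functions of TOSRA (Dietrich et al. 2004) are not reproduced here. The helper at the end derives the first two features of the list above from the detected pulse positions.

```python
# Dual-threshold pulse detector; threshold factors and window sizes are
# assumptions for illustration, not the published TOSRA values.
import numpy as np
from scipy.ndimage import maximum_filter1d

def local_energy(signal, win=441):
    """Short-time RMS energy envelope (win = ~10 ms at 44.1 kHz)."""
    kernel = np.ones(win) / win
    return np.sqrt(np.convolve(signal ** 2, kernel, mode="same"))

def detect_pulses(signal, rate=44_100, max_gap_s=0.005):
    """Return (onset, offset) sample indices of detected pulses."""
    energy = local_energy(signal)
    # Adaptive thresholds follow a sliding local maximum of the energy,
    # so that variation within a single chirp is taken into account.
    local_max = maximum_filter1d(energy, size=rate // 2)
    lower, upper = 0.10 * local_max, 0.30 * local_max  # placeholder factors
    max_gap = int(max_gap_s * rate)                    # 'short time frame'
    pulses, i, n = [], 0, len(energy)
    while i < n:
        if energy[i] > lower[i]:             # candidate onset: lower crossed
            j = i
            while j < n and j - i <= max_gap and energy[j] < upper[j]:
                j += 1
            if j < n and j - i <= max_gap:   # upper also crossed in time
                k = j
                while k < n and energy[k] > lower[k]:
                    k += 1                   # offset: energy back below lower
                pulses.append((i, k))
                i = k
            else:
                i = j
        i += 1
    return pulses

def temporal_features(pulses, rate=44_100):
    """First two features of the list above, in milliseconds."""
    onsets = np.array([on for on, off in pulses])
    lengths = np.array([off - on for on, off in pulses])
    return {"pulse_distance_ms": 1_000 * np.diff(onsets) / rate,
            "pulse_length_ms": 1_000 * lengths / rate}
```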
The extracted features then serve as inputs to a Radial Basis Function (RBF) neural network classifier (Schwenker et al. 2001). Before the RBF net can be used for classification, it has to be trained by a supervised learning procedure, utilizing the extracted feature vectors together with the corresponding signal-class labels (the species name of the singing insect); a minimal sketch of such a classifier follows Table 1. For this training procedure, a subset of the sound data is required for each species. As a test set for automatic identification, 53 species of crickets and katydids were selected from the DORSA database. Classification results based on a statistical cross-validation testing procedure indicate very high classification accuracy (see Table 1).
Table 1.
Error rates for the automated classification of cricket and katydid songs, determined for different single features: pulse distance, pulse length, pulse frequency, energy contour, Time-Encoded Signals (TES) and maximal amplitude of pulses. The classifier performance based on single features can be improved significantly by combining the single-classifier decisions into an overall decision. In particular, the decision-template approach is highly accurate.
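For illustration, a minimal sketch of an RBF classifier of this general kind is given below: centres are placed by clustering, and output weights are then fitted in a supervised phase. This shows the two-phase principle only; the training schemes actually used for TOSRA are those described by Schwenker et al. (2001).

```python
# Minimal RBF network classifier: k-means centres, Gaussian basis
# functions, output weights by linear least squares (an illustration of
# the principle, not the published training scheme).
import numpy as np
from scipy.cluster.vq import kmeans2

class RBFClassifier:
    def __init__(self, n_centres=20, width=1.0):
        # A fixed kernel width assumes features standardized beforehand.
        self.n_centres, self.width = n_centres, width

    def _activations(self, X):
        # Gaussian response of each basis function to each sample.
        d2 = ((X[:, None, :] - self.centres[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2 * self.width ** 2))

    def fit(self, X, y):
        """X: (n_samples, n_features); y: integer class labels (species)."""
        self.centres, _ = kmeans2(X, self.n_centres, minit="++", seed=0)
        Phi = self._activations(X)
        Y = np.eye(y.max() + 1)[y]           # one-hot class-label targets
        self.W, *_ = np.linalg.lstsq(Phi, Y, rcond=None)
        return self

    def predict(self, X):
        return (self._activations(X) @ self.W).argmax(axis=1)
```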
Annotation and visualisation of song parameters.—The neural network classification shown in Table 1 is based on a comparatively small subset of only 53 species, and is not applicable to the entire DORSA sound archive. However, the TOSRA preprocessing routines could be used to extract important parameters, such as carrier frequency or pulse distance, from all songs. Therefore, the feature-extraction module was applied by a batch routine to all DORSA sound files classified as ‘Grylloidea’, calculating ‘carrier frequency’, ‘pulse distance’, ‘pulse frequency’ and ‘duty cycle’ (see Fig. 2 for an illustration of these features). Results were stored in additional parameter columns of the respective sound-file database tables.
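Such a batch routine might look as sketched below; the table and column names are illustrative and do not reflect the actual DORSA schema, and the routine reuses the preprocessing and pulse-detection sketches above.

```python
# Hypothetical batch annotation of sound-file records; schema names are
# illustrative only. Reuses preprocess() and detect_pulses() from above.
import sqlite3
import numpy as np

def annotate_all(db_path, rate=44_100):
    con = sqlite3.connect(db_path)
    rows = con.execute(
        "SELECT id, path FROM soundfiles WHERE taxon = 'Grylloidea'")
    for file_id, path in rows.fetchall():
        signal = preprocess(path)
        pulses = detect_pulses(signal, rate)
        if len(pulses) < 2:
            continue                         # too few pulses to annotate
        onsets = np.array([on for on, off in pulses])
        lengths = np.array([off - on for on, off in pulses])
        distance_ms = 1_000 * np.median(np.diff(onsets)) / rate
        pulse_rate_hz = 1_000 / distance_ms  # 'pulse frequency'
        duty = lengths[:-1].sum() / (onsets[-1] - onsets[0])  # time 'on'
        spectrum = np.abs(np.fft.rfft(signal))
        carrier_hz = np.fft.rfftfreq(len(signal), 1 / rate)[spectrum.argmax()]
        con.execute("UPDATE soundfiles SET carrier_hz = ?, "
                    "pulse_distance_ms = ?, pulse_rate_hz = ?, "
                    "duty_cycle = ? WHERE id = ?",
                    (carrier_hz, distance_ms, pulse_rate_hz, duty, file_id))
    con.commit()
```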
Once stored within the database, parameters can be tabulated together with specimen data (cf. Table 2), or visualised by graphics programs. Particularly useful was the application of a desktop Geographical Information System (GIS: ArcView 3.2 by ESRI) for ‘mapping’ the parameter space. Such GIS software is designed to connect spatial data with attribute tables. In our case, song parameters are plotted within the parameter space, and each point is linked to tabulated specimen data. This allows user-friendly exploration of the parameter space by producing distinct maps: individual species can be highlighted, legends at the level of individual species or higher taxa can be generated ‘on the fly’, and outliers can be identified and inspected by clicking on the respective data points (Figs 3–5).
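A comparable interactive ‘map’ can also be sketched outside a desktop GIS; the Python example below (an illustration, not the ArcView workflow used in this study) prints the linked record when a data point is clicked, following the hypothetical schema of the batch-annotation sketch above.

```python
# Interactive parameter-space 'map' with pick events; the database file
# and column names follow the hypothetical schema sketched above.
import sqlite3
import matplotlib.pyplot as plt

con = sqlite3.connect("dorsa.db")
recs = con.execute("SELECT species, carrier_hz, pulse_distance_ms "
                   "FROM soundfiles WHERE carrier_hz IS NOT NULL").fetchall()
species, freq, dist = zip(*recs)

fig, ax = plt.subplots()
ax.scatter(dist, freq, picker=5)             # 5-point pick radius
ax.set_xlabel("pulse distance [ms]")
ax.set_ylabel("carrier frequency [Hz]")

def on_pick(event):                          # inspect outliers by clicking
    for i in event.ind:
        print(species[i], f"{freq[i]:.0f} Hz", f"{dist[i]:.1f} ms")

fig.canvas.mpl_connect("pick_event", on_pick)
plt.show()
```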
Table 2.
Comparison of automated feature extraction using the TOSRA module with results from conventional song analysis. Parameters obtained by automated feature extraction are labelled ‘A’; conventional measurements of pulse distance and carrier frequency, taken from the literature, are labelled ‘N’ (Nischk 1999, pp. 113 ff.). Original sound files can be retrieved through the SYSTAX portal (http://www.biologie.uni-ulm.de/systax/portal/index.html) by searching for the species name or sound-file code.
Results
To investigate the reliability of the automated feature extraction, some songs were also analysed manually, using standard sound-analysis software. Subsequently, the resulting table was analysed for inconsistencies and outliers, and the identified ‘corrupt’ files were removed. Finally, the acoustic parameter space was plotted and analysed using a Geographical Information System.
Comparison of automated vs conventional feature extraction.—For a subset of Ecuadorian trigoniid (Trigonidiinae) species, automatically extracted parameters were compared directly with the extensive conventional sound analysis of trigoniid songs by Nischk (1999). Table 2 shows remarkable agreement between the two approaches. However, Nischk (1999) describes additional, idiosyncratic ‘secondary song features’, such as within-pulse frequency modulations, regularity of pulse distances, or grouping of pulses. Such groups are characterized as chirps, and their species-specific number of pulses can be an additional, highly reliable feature of a species’ song. These secondary features are ‘lost’ during the simple TOSRA feature extraction and representation within the parameter space (cf. Fig. 5), but they are evident at first sight in oscillograms of the original wav files, which can always be retrieved from the SYSTAX database.
Up to now, only a small subset of the voucher specimens collected by Nischk has been described taxonomically (Desutter-Grandcolas 2000, Nischk & Otte 2000). The majority of voucher specimens were determined to subfamily or genus level. Specimens that evidently belong to a hitherto undescribed species are labelled with a preliminary code, and the respective voucher specimens can be considered ‘types of tomorrow’, awaiting taxonomic description or revision. These specimens are stored in the database under their preliminary code name (e.g., Phalangopsinae sp. PhalOtLP2, Nischk & Otte 2000, p. 248, or the Trigonidiinae listed in Table 2). The preliminary code names are therefore indications of, but by no means equivalent to, well-defined species, until a thorough taxonomic study is completed. Associated sound recordings provide additional, important bioacoustic characters, which should be used in species descriptions. For example, individuals of TrigoSP3 are morphologically indistinguishable, but fall into two groups (n and h) according to distinct pulse distances. A closer look at their morphology might therefore reveal subtle differences justifying the description of two species, easily distinguished by their distinct songs.
Data cleaning.—Species descriptions including song parameters were published by Nischk & Otte (2000) for four new genera and 10 new species of Ecuadorian Phalangopsinae. In this case, outliers were easily detected by mapping song parameters for these species with the GIS software (Figs 3, 4). Selecting song parameters for Hattersleya clandestina (Nischk & Otte 2000) revealed two clusters with similar carrier frequency, but distinct pulse distances around 16 ms and 70 ms, respectively (Fig. 4). Closer inspection of the underlying sound recordings showed that one cluster is made up entirely of recordings contaminated by human-voice ‘announcements’, while the second cluster, around 70 ms, is in accordance with the published results (loc. cit., p. 234) and is based on ‘clear’ or filtered recordings.
Fig. 5 shows a screenshot of a zoom into the labelled parameter space, revealing the close vicinity of Eneopterinae SP2 and Aclodes chamacoru. A closer look at the respective sound files reveals that an ambiguous recording of several simultaneously singing species is the reason for this misplacement (Fig. 6). For the mixed sound track, the TOSRA modules determined a carrier frequency of 6359 Hz and a pulse distance of 16 ms, corresponding to the high-pitched species Eneop LP2. However, the track had been classified by human observers as Aclodes chamacoru, the second prominent voice in this mixed recording.
Besides misidentifications, poor recording quality and soundtracks dominated by announcements of human observers were the reasons for discrepancies or contradictions between parameter values of distinct recordings of one species. Automated feature extraction thus becomes a prerequisite for data-cleaning procedures; a simple heuristic is sketched below.
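One such heuristic, assuming the annotated table of the batch sketch above, flags species whose recordings fall into widely separated pulse-distance values, as with the contaminated Hattersleya clandestina recordings.

```python
# Data-cleaning heuristic over the hypothetical annotated table above.
import sqlite3
from collections import defaultdict

def flag_inconsistent(db_path, ratio=2.0):
    """Report species whose max/min pulse distance exceeds `ratio`."""
    con = sqlite3.connect(db_path)
    by_species = defaultdict(list)
    for sp, d in con.execute(
            "SELECT species, pulse_distance_ms FROM soundfiles "
            "WHERE pulse_distance_ms IS NOT NULL"):
        by_species[sp].append(d)
    for sp, ds in sorted(by_species.items()):
        if len(ds) > 1 and max(ds) / min(ds) > ratio:
            print(f"check recordings of {sp}: pulse distances "
                  f"{min(ds):.0f}-{max(ds):.0f} ms")
```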
The acoustic parameter space and generation of the look-up table.—Fig. 7 presents a plot of carrier frequency against pulse distance for all recordings. It shows a correlation between carrier frequency and pulse distance, resulting in an “empty triangle” at high carrier frequencies and long pulse distances. Elsewhere, the parameter space is too densely packed for individual clusters to be identified. It is therefore necessary to extract limited datasets for certain taxonomic groups or regional species assemblages (see Figs 3–5).
Discussion
This analysis demonstrates that automated calculation of simple features, such as pulse rate and carrier frequency, is feasible for stereotyped cricket songs with a ‘pure’ carrier frequency. Results are comparable to traditional, manual measurements using oscillograms and sonograms. Using batch processes, the TOSRA feature-extraction tool automatically annotated more than 2,000 recordings of cricket songs stored within the DORSA sound repository. As a first step, carrier frequency, pulse distance and duty cycle were extracted, because these parameters have been shown to be used by females of Gryllus spp. for conspecific mate recognition and phonotaxis (Schildberger 1994).
The two-dimensional frequency/pulse-distance diagrams reveal clearly recognizable clusters, comparable with results from manual analysis. Using the TOSRA software, the acoustic parameter space for a set of recordings can be generated quickly and then used to answer a variety of questions. Within Rapid Assessment Programs using acoustic recording (such as the Tropical Ecology, Assessment & Monitoring (TEAM) Initiative, http://www.teaminitiative.org), one could use the parameter database as a look-up table to identify additional ‘new’ species. The only requirements would be a notebook, an extract of the DORSA database, and a digitized sample of the ‘new’ song. Parameters of the song would then be extracted, either by TOSRA modules or by traditional methods, and compared with the available feature database. Using graphical visualisation, a selection of ‘similar’ songs can be made by searching for nearest neighbors along the frequency and pulse-distance axes. The resulting short list of similar songs from the database can then be compared with the ‘new’ song, using traditional sonograms and acoustical comparison by the human observer, which includes secondary song features. In the future, neural-network approaches could be used for complete automation of song classification.
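The look-up step itself reduces to a nearest-neighbor search in the frequency/pulse-distance plane, as sketched below; the rescaling and distance measure are illustrative assumptions, not part of a published method.

```python
# Rank archived songs by similarity to a 'new' recording; returns a
# short list of candidates for manual acoustic comparison.
import numpy as np

def shortlist(new_song, archive, k=10):
    """new_song: (carrier_hz, pulse_distance_ms);
    archive: list of (species, carrier_hz, pulse_distance_ms) tuples."""
    feats = np.array([(f, d) for _, f, d in archive], dtype=float)
    scale = feats.std(axis=0)                # put both axes on equal footing
    diffs = (feats - np.asarray(new_song, dtype=float)) / scale
    order = np.argsort((diffs ** 2).sum(axis=1))
    return [archive[i] for i in order[:k]]
```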
For the cricket fauna of one locality of lowland Amazonian rainforest, Nischk (1999) could show that clusters of acoustic parameters coincide with morphospecies that are well documented by voucher specimens. However, it is highly probable that similar songs at distinct sites are produced by distinct species, resulting in “acoustical vicariance”. Our automated feature extraction will facilitate the detection of such “acoustically equivalent species”, as well as the mapping of ranges and contact zones.
Are “acoustic communities” saturated within a certain parameter space, or are they convergent if we compare distinct faunas? These and similar questions can only be answered by comparative studies, using major databases that include voucher specimens, songs and a simple set of sound parameters. Such multimedia databases need not necessarily be centralized, because modern web technologies provide efficient protocols for access to federated databases (see GBIF, www.gbif.org). For interoperability, however, an agreed set of descriptors will be necessary for efficient characterisation. Though traditional song analysis will always be useful, it seems unrealistic that all hitherto digitized sound files could be annotated without automatic feature-extraction software, such as the program presented here.
The present study brings us back to the very first formalized approaches to annotating insect songs: the musical notations by Yersin (cf. Ragge & Reynolds 1998). To administer multimedia data such as sound files in a database, we need simple quantitative descriptors, such as numeric values for carrier frequency, pulse rate, etc. However, this works only for the highly stereotyped songs of Grylloidea. For more complex or broadband songs, such as those produced by many katydids (Tettigonioidea) and grasshoppers (Acridoidea), other descriptors will be needed. These might consist of simple descriptors of power spectra, such as frequency maxima and bandwidth, as given by the Q-value. The development of innovative descriptors, their simple automated calculation, and tools for annotation and retrieval within large databases are the challenges for the future.
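Such descriptors are straightforward to compute. The sketch below derives the frequency maximum and a Q-value, taken here as peak frequency divided by the -3 dB bandwidth; other conventions (e.g., -10 dB) are also used in bioacoustics.

```python
# Simple power-spectrum descriptors for broadband songs.
import numpy as np

def spectral_descriptors(signal, rate=44_100):
    power = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(len(signal), 1 / rate)
    peak = int(power.argmax())
    half = power[peak] / 2                   # -3 dB relative to the peak
    lo = peak
    while lo > 0 and power[lo] > half:       # walk down the lower flank
        lo -= 1
    hi = peak
    while hi < len(power) - 1 and power[hi] > half:
        hi += 1                              # walk down the upper flank
    bandwidth = freqs[hi] - freqs[lo]
    q = freqs[peak] / bandwidth if bandwidth > 0 else float("inf")
    return {"peak_hz": freqs[peak], "bandwidth_hz": bandwidth, "q": q}
```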
Acknowledgments
This work has been partially supported by the DFG (Deutsche Forschungsgemeinschaft) under SCHW 623/4-2, and a DFG travel grant to K. Riede (Ri 1525/2-1 KON1275/2005).