Open Access
20 March 2021 Individual distinctiveness across call types of the southern white rhinoceros (Ceratotherium simum simum)
Sabrina Nicolleta Linn, Sabine Schmidt, Marina Scheumann

Individual distinctiveness in the acoustic structure of vocalizations provides a basis for individual recognition in mammals and plays an important role in social behavior. Within a species, call types can differ in individual distinctiveness, which can be explained by three factors, namely differences in the social function, the distance of the caller to the receiver, and the acoustic structure of the call. We explored the variation in individual distinctiveness across three call types (Grunt, Hiss, Snort) of the southern white rhinoceros (Ceratotherium simum simum) and investigated to what extent the abovementioned factors account for individual distinctiveness. Calls were recorded from 25 adult southern white rhinoceroses in six different zoos. We used three methods to compare the level of individual distinctiveness across call types, namely discriminant function analysis (DFA), potential for individual identity coding (PIC), and the information criterion (Hs). The three call types possessed an acoustic structure capable of showing individual variation to different extents. Individual distinctiveness was lowest for Snorts, intermediate for Hisses, and highest for Grunts. The level of individual distinctiveness of all three call types was lower than that previously reported for Pant calls of this species. Calls functioning to mediate intragroup social interactions had the highest individual distinctiveness. This highlights that a given communicative function and the need for individual discrimination during a social interaction have a major influence on the degree of individual distinctiveness.

Vocal communication can be important for coordinating social interactions among animals. Acoustic signals can vary substantially in frequency-time contours and amplitude, and can thus reflect a wide variety of behavioral situations and environmental conditions. Moreover, animals living in a complex social environment have been suggested to use complex communication systems with signals carrying multiple types of information (e.g., Bouchet et al. 2013; Knörnschild et al. 2019; Peckre et al. 2019). Acoustic signals may convey information about the external environment with which the sender is confronted (e.g., Seyfarth et al. 1980; Manser 2001), about the internal state of the sender (e.g., Bastian and Schmidt 2008; Schehka and Zimmermann 2009; Scheumann et al. 2012), and also about physical characteristics of the sender (e.g., Charlton et al. 2011; Stoeger and Baotic 2016). Thus, vocalizations can encode the identity of the individual, which provides the basis for vocal individual discrimination. Individual discrimination is important for regulating social relationships to govern cohesion, attraction, and avoidance among conspecifics (August and Anderson 1987; Ehret 2006), such as mother–infant reunions, support of specific group members, or avoidance of inbreeding (e.g., Phillips and Stirling 2000; Torriani et al. 2006; Wittig et al. 2007; Müller and Manser 2008; Bouchet et al. 2012; Kessler et al. 2012; Rubow et al. 2018). It can therefore be assumed that a more complex social organization will favor individual distinctiveness in call types. We investigated the encoding of sender identity in the southern white rhinoceros (Ceratotherium simum simum), which, in contrast to all the other solitarily living rhinoceros species, has been described as semisocial (e.g., Hutchins and Kreger 2006).

It has been shown across a wide range of mammalian species (Appendix I) that even if the majority of adult call types show individual distinctiveness, the degree of distinctiveness can vary among different call types within a given species. This suggests that different selection pressures have affected the evolution of individual distinctiveness across different call types. To explain differences in individual distinctiveness related to call type, three major hypotheses have been proposed, which are not mutually exclusive (see Appendix I): the “social function hypothesis” (e.g., Snowdon et al. 1997; Charrier et al. 2001), the “distance communication hypothesis” (Mitani et al. 1996), and the “acoustic structure hypothesis” (e.g., Leliveld et al. 2011).

The “social function hypothesis” assumes that calls functioning in individualized intragroup social interactions, such as contact or aggression calls, should have a higher degree of individual distinctiveness than calls directed to the whole group, such as food, alarm, or loud calls (e.g., Snowdon et al. 1997). Lemasson and Hausberger (2011) expanded the social function hypothesis and proposed that individual distinctiveness was highest in calls related to affiliative contexts, intermediate in calls related to agonistic contexts, and lowest in calls related to general activities or directed to the whole group. Evidence for the social function hypothesis was found in several mammalian orders such as Primates (Chacma baboon—Rendall et al. 2009; rhesus monkeys—Rendall et al. 1998; red-capped mangabeys—Bouchet et al. 2012, 2013), Carnivora (dwarf mongoose—Rubow et al. 2018; domestic dog—Yin and McCowan 2004; giant otter—Mumm et al. 2014), and Rodentia (African woodland dormouse—Ancillotto and Russo 2016).

The “distance communication hypothesis” suggests that individual distinctiveness is related to the transmission distance (Mitani et al. 1996). Thus, long-distance calls emitted out of visual contact with the receiver should have a higher level of individual distinctiveness than calls uttered in close distance where visual or tactile information are additionally available (Mitani et al. 1996). Evidence for the distance communication hypothesis was found in primates (chimpanzees—Mitani et al. 1996; rhesus monkeys—Rendall et al. 1998; gray mouse lemurs—Leliveld et al. 2011), carnivorans (giant otters—Mumm et al. 2014), and rodents (Ancillotto and Russo 2016).

The “acoustic structure hypothesis” is related to call-type-specific vocal production mechanisms. In mammals, the vocal production apparatus is evolutionarily conserved and consists of the lung, the larynx with the vocal folds, and the supra-laryngeal system with the throat, mouth, and nose (e.g., Fant 1960; Lieberman and Blumstein 1988; Fitch 2010). Thus, source- and filter-related factors, namely the anatomical variation of the vocal folds defining the fundamental frequency and the anatomical variations of the supra-laryngeal vocal tract creating formants (source–filter theory; see Fitch 2010; Taylor and Reby 2010), determine individual distinctiveness (e.g., Scherer 1989; Fitch 1997; Belin et al. 2004; Pfefferle and Fischer 2006; Plotsky et al. 2013). In narrow-band tonal calls of high to ultrasonic fundamental frequencies, harmonics at the source level are widely spaced, resulting in little interharmonic energy that can be filtered by the vocal tract. Thus, individual distinctiveness in these calls is critically coded by variation in the fundamental frequency (Yin and McCowan 2004; Leliveld et al. 2011). In contrast, in broadband calls of low fundamental frequency, or without detectable harmonic structure (termed noisy calls), there is a dense energy distribution at the source level. In these calls, the filter function of the vocal tract is the predominant factor determining individual distinctiveness (e.g., Rendall et al. 1998; Taylor and Reby 2010). Even if both factors can encode individual identity, it has been hypothesized that narrow-band harmonic calls are better suited to code for sender identity than broadband noisy calls (Yin and McCowan 2004; Leliveld et al. 2011). Here, the question arises to what extent animal species that predominantly use noisy calls encode sender identity in their vocalizations. 
Thus, we investigated the encoding of sender identity in the southern white rhinoceros, a species in which noisy calls dominate the vocal repertoire and little is known about information encoded in the vocalizations.

In southern white rhinoceroses, adult bulls live solitarily, but cows occur in groups of different composition (Owen-Smith 1973). Most southern white rhinoceros groups are based on a mother–offspring bond and consist of an adult female and her offspring (Owen-Smith 1973). Adolescents often join with similar-aged companions or mother–offspring dyads. These groupings can persist for extended periods of more than a month or only a couple of days. Group sizes of over 10 individuals can occur (Owen-Smith 1973; Shrader and Owen-Smith 2002). The mating system of southern white rhinoceroses is territorial-based, with males defending their own territories and females ranging freely between male territories (Owen-Smith 1973; Kretzschmar et al. 2020). Given the poor eyesight of rhinoceroses, this more pronounced social organization may favor a more complex acoustic communication system. Indeed, acoustic signals play an essential role in the coordination of mother–infant interactions (Linn et al. 2018), during friendly encounters, during aggressive interactions (Owen-Smith 1973; Policht et al. 2008; Jenikejew et al. 2020), and during mating behavior of southern white rhinoceroses (Owen-Smith 1973; Cinková and Shrader 2020). For example, vocalizations play a very important role in coordinating male and female behavior during consortship (Owen-Smith 1973) where bulls follow a single cow for 2–3 weeks. Bulls emit Pant calls suggested to contain cues about the physical characteristics of the sender, signaling male quality (Cinková and Policht 2014; Cinková and Shrader 2020). If cows are not ready to accept precopulatory contact, they do not tolerate such approaches and usually respond with aggressive calls such as Hisses and Grunts (Owen-Smith 1973).

The southern white rhinoceros has a distinct acoustic communication system in which 10–11 different call types have been discriminated onomatopoetically (Owen-Smith 1973) or based on the acoustic structure (Policht et al. 2008). The majority of calls were described as noisy calls (e.g., Owen-Smith 1973; Policht et al. 2008; Linn et al. 2018). There is, moreover, some evidence for a strong innate component to the development of vocal usage and production in southern white rhinoceroses (Linn et al. 2018).

Only one call type, the Pant (Fig. 1), has been studied in detail. The Pant consists of bouts of repetitive noisy calls produced during inhalation or exhalation and is emitted during isolation from the group, when approaching other conspecifics, or in the mating context (e.g., Owen-Smith 1973; Policht et al. 2008; Cinková and Policht 2014, 2016; Linn et al. 2018; Cinková and Shrader 2020). It has been found that the Pant encodes information not only about the sender, such as individuality, subspecies, age class, sex, and dominance status, but also about the motivation of the sender (Cinková and Policht 2014, 2016; Cinková and Shrader 2020) and that conspecifics were able to extract sex and subspecies in playback experiments (Cinková and Policht 2016; Cinková and Shrader 2020). For the other call types, the potential for individual signatures is still unknown.

Fig. 1.

Sonograms of the common call types of the southern white rhinoceros: Grunt, Hiss, Snort, and Pant. The panel for Grunt includes a zoomed-in sonogram to show the harmonic structure of the call. F0—fundamental frequency, F1—first formant, F2—second formant.


In this study, we investigated the potential for coding sender identity in three of the most common call types of the vocal repertoire of the southern white rhinoceros (C. s. simum; Fig. 1). These three call types were emitted in different contexts, at different distances of the caller from their recipient, and differed in their level of harmonicity. Rhinoceros calls therefore are a promising model to explore the above hypotheses on call-type-related differences in distinctiveness. The Snort is uttered during general activities, such as feeding or resting. It is a noisy call, which sounds like an air blow through the nostrils or the mouth (e.g., Owen-Smith 1973; Policht et al. 2008; Cinková and Policht 2014). The Hiss and the Grunt are uttered during agonistic interactions (e.g., Owen-Smith 1973; Policht et al. 2008; Cinková and Policht 2014; in previous publications, the Hiss has been termed Threat, but we aim to be consistent in labeling all call types using onomatopoetic labels). The Hiss is suggested to serve as first warning, for example, as a reaction to the approach or presence of another individual, whereas the Grunt signals a more pronounced motivation to fight. When the recipient does not react, Hisses are often followed by Grunts in combination with agonistic displays such as horn clashing (Owen-Smith 1973; Policht et al. 2008). Hisses and Grunts are emitted commonly by females or adolescents in response to the presence of a male (Owen-Smith 1973; Policht et al. 2008; personal observations). Hisses sometimes also are emitted in interactions between females or adolescents (Owen-Smith 1973; Policht et al. 2008; personal observations). Both call types differ in their level of tonality. Thus, the Grunt is a broadband call that contains low-frequency harmonic components, whereas the Hiss is a broadband call without tonal structure. 
To compare our data with the results of Cinková and Policht (2014) for Pant calls, we calculated the information criterion (Hs), which is rather insensitive to differences in sample size (Beecher 1989). In addition, we used discriminant function analysis (DFA) and potential for individual identity coding (PIC) as reported in the literature (see Appendix I) to compare the level of individual distinctiveness between different call types.

To test the three hypotheses, we made the following predictions about how the level of individual distinctiveness should differ between call types (Table 1). For the social function hypothesis, we predict that the Pant, the Hiss, and the Grunt, uttered during specific social interactions, will have a higher level of individual distinctiveness than Snorts uttered during general activities, such as resting or feeding. Moreover, the level of individual distinctiveness should be higher for the Pant uttered during affiliative social interactions than for the Hiss and Grunt uttered during agonistic interactions. For the distance communication hypothesis, we predict that Pant and Snort uttered at variable distances will show a higher level of individual distinctiveness than Hiss and Grunt uttered during close-distance interactions. For the acoustic structure hypothesis, we predict that the Grunts in which a harmonic structure and formants are obvious will show the highest level of individual distinctiveness, Hisses and Pants containing formant-like structures will show an intermediate level, and nasal Snorts will show the lowest level of individual distinctiveness.

Materials and Methods

Subjects and study site.—Recordings were made on two juvenile and 23 adult southern white rhinoceroses ranging from 2 to 45 years of age at the following six zoological institutions (Table 2): Serengeti-Park Hodenhagen (February–March 2012, May–June 2014), Dortmund Zoo (September–October 2014), Augsburg Zoo (July–August 2014), Osnabrück Zoo (April–May 2014), Erfurt Zoo (April–May 2015), and Gelsenkirchen Zoo (August–September 2015). Because there is no evidence for seasonal trends in reproduction in female rhinoceroses in zoos (Roth 2006), and reproductive cyclicity in females occurs throughout the year (Patton et al. 1999; Brown et al. 2001), we assumed that the different recording dates had no influence on vocalizations. For five of the six institutions, the groups were observed when the adult bull was kept together with the adult females and their offspring. At Dortmund Zoo, the adult bull was separated physically during the whole observation period; however, he had visual and olfactory contact with the adult females.

Table 1.

Predictions of level of individual distinctiveness for southern white rhinoceros call types (including acoustic structure, mouth position, context in which they are given, and typical distance at which they are exchanged) and predictions for acoustic variability and individual distinctiveness based on the different hypotheses; SF = social function hypothesis, DC = distance communication hypothesis, AS = acoustic structure hypothesis; inter. = intermediate.


At Augsburg Zoo, the rhinoceros group consisted of three adult females and one adult male. The rhinoceroses were observed in a 14,000-m2 outdoor enclosure where they lived together during the day with Cameroon sheep (Ovis aries) and blesbok (Damaliscus pygargus phillipsi). At Osnabrück Zoo, we recorded three adult females and one adult bull that were kept in a 2,000-m2 outdoor enclosure together with red river hogs (Potamochoerus porcus) and Chapman's zebras (Equus quagga chapmani). At Dortmund Zoo, we observed two adult females in their 2,250-m2 outdoor enclosure. One of the females had a 5-month old calf. At Gelsenkirchen Zoo, the rhinoceros group consisted of two adult females and one adult bull. The rhinoceroses were observed in a 5,000-m2 outdoor enclosure where they lived together with several antelope species. At Erfurt Zoo, we recorded two adult females and one adult bull kept together in a 3,500-m2 outdoor enclosure during the day. At Serengeti-Park Hodenhagen, the rhinoceros group consisted of 9–11 individuals (2012: six adult females, one adult male, two infants; 2014: five adult females, one adult male, two juveniles, three infants). The adult male was occasionally separated from the herd. Data were mainly recorded in the 9-ha drive-through outdoor enclosure where the rhinoceroses lived together with several other species (e.g., Watusi cattle—Bos primigenius f. taurus; zebras—E. q. chapmani; ostriches—Struthio camelus; lechwes—Kobus leche; addax antelopes—Addax nasomaculatus; dromedaries—Camelus dromedarius). Our research followed the ASM guidelines (Sikes et al. 2016). The article contains only observational data of zoo animals during their daily routine without any manipulation of the animals.

Data collection.—Recordings took place throughout the day between 0600 and 1700 h. Audio and video data were collected using the focal animal sampling method (Altmann 1974). Each rhinoceros of a group was observed for a 10-min interval in block-randomized order. When all subjects had been observed once, the next block of focal observations started. Overall, a total of 384 h of data were recorded and analyzed. We recorded 81 h at Augsburg Zoo, 54 h at Osnabrück Zoo, 60 h at Erfurt Zoo, 95 h at Serengeti-Park Hodenhagen, 40 h at Gelsenkirchen Zoo, and 54 h at Dortmund Zoo. Recordings were mainly made in the outdoor enclosures from the visitor or keeper area. Occasionally, recordings were made in the indoor enclosures, when the rhinoceroses had to stay indoors due to weather conditions.
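The block-randomized focal sampling order described above can be illustrated with a minimal sketch. It assumes that every individual is observed exactly once per block and that the order is reshuffled for each block; the function name, the animal IDs, and the seed are hypothetical and not taken from the study.

```python
import random

def block_randomized_schedule(animal_ids, n_blocks, seed=None):
    """Return one randomly ordered observation block per round.

    Each block contains every animal exactly once (one 10-min focal
    interval each); a new block starts only after the previous one ends.
    """
    rng = random.Random(seed)  # seed is optional, for reproducibility only
    schedule = []
    for _ in range(n_blocks):
        block = list(animal_ids)
        rng.shuffle(block)
        schedule.append(block)
    return schedule

# Hypothetical group of four rhinoceroses observed over three blocks.
for block in block_randomized_schedule(["F1", "F2", "F3", "M1"], 3, seed=1):
    print(block)
```

Shuffling within blocks keeps observation time balanced across individuals while avoiding a fixed time-of-day pattern for any one animal.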

Table 2.

Demographic data of southern white rhinoceroses included in the study and number of selected high-quality calls per call type used for the acoustic analyses.


Since it has been suggested that white rhinoceroses produce infrasound vocalizations (Muggenthaler et al. 1993), acoustic data were obtained using a Sennheiser omnidirectional microphone (MKH 8020; Sennheiser, Wedemark, Germany) with a frequency response of 10–60,000 Hz (frequency response from 10 to 20,000 Hz ± 5 dB) equipped with a windshield and a boom pole. The microphone was connected to a Sound Devices 722 State Recorder (Sound Devices, LLC, Reedsburg, Wisconsin; frequency response of the recorder: 10–20,000 Hz; settings: 44.1 kHz sampling rate, 16 bit, uncompressed .wav format). Concomitant video recordings were made using a digital camcorder (Sony DCR-SR36E, Tokyo, Japan). To allocate vocalizations to individuals, the observer (SNL) noted the identity of the caller.

Acoustic analysis.—The spectrograms of all audio recordings were inspected visually using Batsound Pro (2013; settings: fast Fourier transformation [FFT] 512, Hanning window). Calls were classified visually based on previously published vocal repertoires (Policht et al. 2008; Linn et al. 2018). In these studies, call classification was validated using multivariate statistics. For further acoustic analyses, we only selected calls of high quality (no overlap with other sounds, good signal-to-noise ratio, no clipping). The recordings from different zoos were affected by different ambient noise (e.g., Baker and Logue 2007; Maciej et al. 2011) such as urban, traffic, and building construction noise. Since low-frequency signals travel over long distances, noise sources far away from the recording site inevitably affect even high-quality recordings. We used a noise reduction method as applied in other studies in which animal vocalizations were hampered by site-specific noise (e.g., Liu et al. 2003; Baker and Logue 2007; Nair et al. 2009). Namely, we preprocessed the sound files using a bandpass filter of 10–10,000 Hz followed by the Wiener Noise Suppressor with Harmonic Regeneration Noise Reduction (HRNR) algorithm (Plapous et al. 2005, 2006) in Matlab (2018) (script modified from Pascal Scalart version …). We determined a 200-ms noise segment shortly prior to or after the vocalization of interest, which was used as a statistical estimate of the ambient noise and filtered from the original recording of the vocalization to obtain an estimate of the underlying vocalizations (Wiener Filter). Since the Grunts contained a fundamental frequency with harmonics, we additionally used the HRNR method, which is suggested to reduce harmonic distortions for small signal-to-noise ratios (Plapous et al. 2005, 2006). Afterwards, the preprocessed audio files were stored as separate wave files for further acoustic analysis.

We are aware that filtering the acoustic recordings might influence the acoustic measurements and that filtering can cause harmonic distortions known as musical notes. We tried to reduce these effects as much as possible by using 1) high-quality calls, 2) the same procedure for all recordings, 3) a long noise segment directly preceding or following the respective vocalizations without any distinct sound events (e.g., bird calls, human speech) to calculate the statistical background noise, and 4) by using a noise reduction method suggested to reduce harmonic distortions. For Hisses and Snorts, we listened to all filtered vocalizations and selected only calls where musical notes could not be perceived by the experimenter. Taking a random sample of all Hisses and Snorts led to comparable statistical results as taking a sample of these call types including only filtered vocalizations without detectable musical notes. Thus, for the Grunts, for which a limited sample size was available, all calls were used. Sonograms of examples of the original and filtered calls are presented in  Supplementary Data SD1.

Because the number of calls per call type and individual varied widely, we randomly selected 5–20 calls per individual for every call type to obtain a balanced data set for acoustic analysis. Individuals with fewer than five calls per call type were not taken into account. In total, 651 calls were included in the acoustic analysis (Table 2; 60 Grunts, 286 Hisses, 305 Snorts). We also recorded Pants in the present study. However, due to their low amplitude and interference from environmental sounds in the outdoor enclosures, most of these Pants did not satisfy our quality criteria. Therefore, we referred to the results reported in Cinková and Policht (2014) for comparisons.

The spectral and temporal parameters that were measured differed depending on the call types. We described the spectral composition using Praat (2018; self-written script—Boersma 1993, 2001) by measuring the following nine acoustic parameters for all call types: call duration (DUR), time of maximum amplitude (timeMAXPEAK), percentage of voiced frames (VOI), the center of gravity (COG) of the spectrum, standard deviation of the frequency (SD) in the spectrum, the skewness (SKE) as a measure of symmetry of the spectrum, the kurtosis (KUR) describing the deviation of the spectrum from a Gaussian distribution, harmonic-to-noise ratio (HNR), and Wiener entropy (ENTR). For full definitions of all acoustic parameters, see Table 3. Since harmonic-to-noise ratio and Wiener entropy values are based on logarithmic scaling, we have converted these logarithmic values to a linear scale for all subsequent calculations.

If no fundamental frequency contour could be determined in the sonogram (noisy calls) for a time frame, the time frame was set as unvoiced for the calculation of the percentage of voiced frames (VOI). For the harmonic Grunt, we included four additional parameters characterizing the contour of the fundamental frequency (F0): minimum F0 (MINF0), maximum F0 (MAXF0), mean F0 (MEANF0), standard deviation of the F0 (SDF0). We used a semiautomatic procedure for pitch tracking. If necessary, we corrected the pitch tracking manually by matching the extracted contour with the sonogram (settings: submenu: “To pitch”; min pitch: 10 Hz; max pitch: 3,000 Hz; time steps: 0.005). However, since it has been suggested that noisy calls might be well suited for extraction of filter-related formants (e.g., Plotsky et al. 2013; Gamba 2014), we additionally measured four formant parameters using Praat sub-menu “quantify formant”: first formant (F1), bandwidth of the first formant (BDF1), second formant (F2), and bandwidth of the second formant (BDF2). For the Grunts, we estimated the expected number of formants based on the following formula (Pfefferle and Fischer 2006):

$$N = \frac{2L}{c}\, f_c$$


where N = number of formants, L = vocal tract length [m], c = speed of sound (340 m/s), and fc = cutoff frequency of the measurement range [Hz]. We based our calculation on the oral vocal tract length (0.72 m) of a cadaver measured by R. Frey (Leibniz Institute for Zoo and Wildlife Research, pers. comm.) to get an indication of how many formants we can expect. Based on the calculated values and on visual inspections of the sonogram, we used the following setting for Grunts: number of formants: 4; max formant value: 1,000 Hz; time steps: 0.05 s. For the Hiss we were not able to use the formula since the expected formant frequencies did not correspond to the dominant frequency bands in the sonogram. To track these frequency bands we based our setting on visual inspection of the sonograms and used the following settings: number of formants: 3; max formant value: 5,000 Hz; time steps: 0.05 s. For Snorts, the frequency band of high energy was reflected by the center of gravity. Further emphasized frequency bands were barely detected. Therefore, we measured no formants for Snort calls.
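As a rough worked example of the formant-number estimate, the sketch below assumes the uniform-tube form N = 2 × L × fc / c, which is inferred here from the symbol definitions in the text rather than quoted from Pfefferle and Fischer (2006). It plugs in the reported vocal tract length of 0.72 m, a speed of sound of 340 m/s, and the 1,000 Hz ceiling used for Grunts. The function name is hypothetical.

```python
def expected_formants(vocal_tract_length_m, cutoff_hz, speed_of_sound=340.0):
    """Approximate number of formants below cutoff_hz.

    Assumes N = 2 * L * fc / c (an assumption inferred from the symbol
    definitions in the text, not a verified transcription of the source).
    """
    return 2.0 * vocal_tract_length_m * cutoff_hz / speed_of_sound

n = expected_formants(0.72, 1000.0)
print(round(n, 2))  # about 4.24, in line with the setting of 4 formants
```

Under these assumptions the estimate rounds to four formants below 1,000 Hz, which agrees with the Praat setting reported for Grunts.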

Table 3.

Description of measured acoustic parameters.


In addition, we measured the minimum frequency (MIN), maximum frequency (MAX), and bandwidth (BAND), as well as the frequencies of the first, second, or third quarter of total energy in the spectrum (25%QUART, 50%QUART, 75%QUART; FFT 1024, Hanning window) for all call types using the automatic measurement routine of Avisoft (2018). Measurements were taken at the time point of maximum amplitude (max) as well as across the whole call (mean).

Statistical analysis.—In the first part of the analysis, we investigated the potential of each call type to encode sender identity using the whole data set. Using the Kolmogorov–Smirnov test, we confirmed that the majority of acoustic parameters for the majority of individuals were normally distributed (P > 0.05). We tested whether the acoustic parameters differed between individuals by calculating a linear mixed model with the acoustic parameter as the dependent variable, the sender as predictor variable, and zoo as a random variable (“nlme” package; RStudio Team 2016), and tested the effect of the sender using the “anova” function. The random variable zoo was added to account for call adaptations in response to site-specific noise, or similarities based on relatedness of individuals in a given zoo. To control for multiple testing of the same null hypothesis, we carried out the Fisher-Omnibus test (Haccou and Melis 1994). This test combines the P-values of the different ANOVAs into a single chi-square distributed variable resulting in an overall P-value and thereby in a rejection or acceptance of the null hypothesis. The degrees of freedom represent twice the number of included P-values. Based on the significant parameters in the linear mixed model, we carried out a principal component analysis (PCA) and extracted principal components (PCs) with an eigenvalue higher than 1 to reduce the number of parameters. In that manner, correlating acoustic parameters were represented by the same PC. To investigate whether calls can correctly be classified to the respective individuals, we carried out an independent DFA based on these PCs using the leave-one-out method for cross-validation. To test whether the number of correctly classified calls was significantly higher than expected by chance, we performed a binomial test for each subject and calculated the level of agreement using the kappa test (Scheumann et al. 2007). 
The level of agreement was defined as follows: Cohen's kappa < 0.00 = poor agreement, 0.00–0.20 = slight agreement, 0.21–0.40 = fair agreement, 0.41–0.60 = moderate agreement, 0.61–0.80 = substantial agreement, and 0.81–1.00 = almost perfect agreement (Landis and Koch 1977). To estimate which parameters were important for classification, we investigated the correlation between the DFA function with the PCs and afterwards the correlations of the PCs with the acoustic parameters. Parameters with a loading factor higher than 0.7 were considered as having a strong impact on the respective PC.
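Two of the steps above lend themselves to a short illustration: combining P-values with the Fisher-Omnibus test and mapping Cohen's kappa to the verbal categories of Landis and Koch (1977). The sketch below is not the authors' script. It assumes the standard Fisher statistic (chi-square = −2 × sum of ln P, with d.f. equal to twice the number of P-values) and uses the closed-form upper tail of a chi-square distribution with even degrees of freedom, so only the standard library is needed. Function names are hypothetical.

```python
import math

def fisher_omnibus(p_values):
    """Combine P-values: chi2 = -2 * sum(ln p), d.f. = 2 * k.

    Returns (chi2, df, combined_p). P-values must be greater than 0.
    """
    k = len(p_values)
    chi2 = -2.0 * sum(math.log(p) for p in p_values)
    half = chi2 / 2.0
    # Upper tail for even d.f. = 2k: exp(-x/2) * sum_{i<k} (x/2)^i / i!
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return chi2, 2 * k, math.exp(-half) * total

def kappa_label(kappa):
    """Verbal agreement category following the thresholds listed in the text."""
    if kappa < 0.0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")]:
        if kappa <= upper:
            return label
    return "almost perfect"

print(fisher_omnibus([0.5, 0.5]))
print(kappa_label(0.56))  # moderate
```

With a single P-value the combined P equals the input, which is a quick sanity check on the tail formula.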

In addition, we calculated the potential for individual identity coding (PIC) for each parameter and call type according to Robisson et al. (1993). The PIC tested whether the interindividual variation of a call type was larger than its intraindividual variation. For the PIC analysis, we calculated the mean (MEANWithin) and standard deviation (SDWithin) of each subject for each acoustic parameter as well as the mean (MEANBetween) and standard deviation (SDBetween) of the whole data set. Using these values, we obtained the within-individual (CIWi) and between-individual (CIB) coefficients of variation (CI = 100 × (1 + 1/(4n)) × SD/MEAN), where n is the number of calls. Further, we calculated the CIW by averaging the CIWi of all subjects. We determined the PIC for each parameter by calculating the ratio PIC = CIB/CIW (e.g., Ligout et al. 2004; Bouchet et al. 2012). A value of PIC > 1 indicates that this parameter is potentially capable of encoding individuality. Additionally, we calculated the PICOverall as the mean of all PIC values across the parameters (Salmi et al. 2014).
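The PIC calculation for a single acoustic parameter can be sketched as follows. This is a minimal illustration, assuming the sample standard deviation (the text does not state which estimator was used) and reading 1/4n as 1/(4n). The function names and the example values are hypothetical.

```python
import statistics

def coefficient_of_variation(values):
    """CI = 100 * (1 + 1/(4n)) * SD / MEAN (small-sample corrected)."""
    n = len(values)
    sd = statistics.stdev(values)  # sample SD: an assumption
    return 100.0 * (1.0 + 1.0 / (4.0 * n)) * sd / statistics.mean(values)

def pic(calls_by_individual):
    """PIC = CIB / CIW for one acoustic parameter.

    calls_by_individual maps an individual ID to its list of measurements.
    """
    all_values = [v for vals in calls_by_individual.values() for v in vals]
    ci_between = coefficient_of_variation(all_values)
    ci_within = statistics.mean(
        [coefficient_of_variation(vals) for vals in calls_by_individual.values()]
    )
    return ci_between / ci_within

# Hypothetical call durations (ms) for three individuals.
durations = {"A": [100, 102, 98], "B": [150, 148, 152], "C": [200, 205, 195]}
print(pic(durations) > 1)  # True: individuals differ more than calls within them
```

Averaging PIC values over all parameters of a call type would then give the overall PIC described in the text.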

In the second part of the analysis, we aimed to compare the level of individual distinctiveness across call types. Since the results of the DFA are affected by the number of individuals included in the analysis (e.g., Beecher 1989), we balanced our sample and compared the six individuals for which data on all three call types were available. Then, we performed again the DFA as described before.

The information capacity criterion (Hs) according to Beecher (1989) is based on information theory and calculated in bits. The value 2^Hs estimates the number of individuals that can be potentially discriminated based on the considered acoustic parameters of the call. We carried out a one-way ANOVA testing whether the PC scores of the above-described PCA differed between individuals. We used the mean squares (MS) of the significant PCs (e.g., Beecher 1989; Bouchet et al. 2013) to calculate the estimates for within-individual variance (S²W = MSW) and between-individual variance (S²B = (MSB – MSW)/n0) according to Lessells and Boag (1987). Here, MSB is the mean square of between-individual variance, MSW is the mean square of within-individual variance, and n0 is a coefficient related to the sample size. The value of n0 is calculated using the following formula: $n_0 = \frac{1}{a-1}\left(\sum_{i=1}^{a} n_i - \frac{\sum_{i=1}^{a} n_i^2}{\sum_{i=1}^{a} n_i}\right)$ (a = number of groups; ni = number of calls in the ith group) and represents the mean sample size per individual. Based on these estimated variances, we calculated the information criterion (Hi = log2 (S²T/S²W)). The total variance S²T was calculated as the sum S²W + S²B. To estimate the information capacity of a call, the information criteria of all significant PCs were summed (Hs = ∑Hi).
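The chain from ANOVA mean squares to Hs can be made concrete with a minimal sketch. It follows the expressions as they appear in the text and assumes the Lessells and Boag (1987) form of n0 for unequal group sizes. The mean squares and sample sizes below are hypothetical, not values from the study.

```python
import math

def n0_coefficient(group_sizes):
    """n0 = (sum(n_i) - sum(n_i^2) / sum(n_i)) / (a - 1)."""
    a = len(group_sizes)
    total = sum(group_sizes)
    return (total - sum(n * n for n in group_sizes) / total) / (a - 1)

def information_criterion(ms_between, ms_within, group_sizes):
    """Hi for one PC, using the expression given in the text."""
    s2_within = ms_within
    s2_between = (ms_between - ms_within) / n0_coefficient(group_sizes)
    s2_total = s2_within + s2_between
    return math.log2(s2_total / s2_within)

# Hypothetical mean squares (MSB, MSW) for two significant PCs,
# six individuals with 10 calls each.
sizes = [10] * 6
hs = sum(information_criterion(msb, msw, sizes)
         for msb, msw in [(12.0, 0.8), (6.0, 0.9)])
print(hs, 2 ** hs)  # Hs in bits and the implied number of discriminable individuals
```

For equal group sizes n0 reduces to the common number of calls per individual, which is why the text describes it as the mean sample size per individual.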

For the comparison of the level of individual distinctiveness, we also calculated the PICOverall and the Hs for the balanced data set.


Grunt.—The ANOVAs revealed that 21 out of 29 acoustic parameters were significantly different across individuals (F5, 54 ≥ 5, P ≤ 0.040 and for MINF0, MAXF0, and MEANF0 F5, 36 ≥ 4, P ≤ 0.010, Fisher-Omnibus test: χ2 = 281.58, d.f. = 58, P < 0.001; Table 4). A PCA based on these significant parameters (except MINF0, MAXF0, and MEANF0, which could not be obtained for all Grunt calls and the other call types) extracted five PCs with an eigenvalue higher than 1 explaining 85% of the variance. An independent DFA based on these five PCs was able to classify 65% of the calls to the respective individual (cross-validation: 57%). Significantly more calls were correctly classified than expected by chance for five out of six individuals (binomial test: P ≤ 0.036). The kappa test resulted in a moderate agreement between the results of the DFA and the observed data (0.56). The DFA calculated five DFs. DF1 and DF2 explained 75% of the variation in the calls. DF1 showed the highest correlation to PC2 (r = –0.604) and DF2 to PC1 (r = 0.732). PC1 showed the highest loading on parameters 50%QUART(mean), 25%QUART(mean), MAX(mean), BAND(mean), and COG (r ≥ 0.810). PC2 showed the highest loading on parameters 25%QUART(max) and 50%QUART(max) (r ≥ 0.796). Thus, spectral parameters play a predominant role in encoding sender identity. Twenty-four of 29 parameters showed a PIC > 1 suggesting a potential for identity coding (Table 4).
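The Fisher-Omnibus statistic quoted above combines the P-values of the single-parameter ANOVAs into one chi-square value with 2k degrees of freedom, which is why 29 parameters give d.f. = 58. A minimal sketch of that combination is given below. The function name and the input P-values are hypothetical and are not the values from Table 4. The closed-form tail probability used here holds only because the degrees of freedom are always even.

```python
import math


def fisher_omnibus(p_values):
    """Combine independent P-values with Fisher's method.

    Returns (chi2, df, p_combined). All P-values must be > 0.
    """
    k = len(p_values)
    chi2 = -2.0 * sum(math.log(p) for p in p_values)
    df = 2 * k
    # For even df = 2k the chi-square upper tail has a closed form:
    # P(X > x) = exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!
    half = chi2 / 2.0
    term = 1.0
    total = 1.0
    for j in range(1, k):
        term *= half / j
        total += term
    return chi2, df, math.exp(-half) * total


# Hypothetical P-values for 29 acoustic parameters
illustrative = [0.001] * 21 + [0.30] * 8
chi2, df, p = fisher_omnibus(illustrative)
print(f"chi2 = {chi2:.2f}, d.f. = {df}, P = {p:.3g}")
```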

Table 4.

Individual differences in the acoustic parameters of the Grunt of the southern white rhinoceros. PIC = potential for individual identity coding, CIB = between-individual coefficient of variance, CIW = within-individual coefficient of variance. Bold indicates PIC > 1.0 and P > 0.05; *F5, 36.


Hiss.—The ANOVAs revealed that 17 out of 24 parameters were significantly different across individuals (F20, 265 ≥ 2.0, P ≤ 0.030; Fisher-Omnibus test: χ2 = 248.10, d.f. = 48, P < 0.001; Table 5). A PCA based on these significant parameters extracted five PCs with an eigenvalue higher than 1 explaining 78% of the variance. An independent DFA based on these five PCs was able to classify 26% of the calls to the respective individual (cross-validation: 19%). The kappa test revealed a slight agreement (0.20) between the observed data and the classification by the DFA. For 11 out of 21 individuals significantly more calls were correctly classified than expected by chance (binomial test: P ≤ 0.047). The DFA calculated five DFs. DF1 and DF2 explained 71% of the variation in the calls. DF1 showed the highest correlation to PC2 (r = 0.627) and DF2 showed the highest correlation to PC5 (r = 0.794). PC2 showed the highest loading on factors SD, ENTR, and MIN(max) (r ≥ |0.701|). Seventeen out of these 24 parameters showed a PIC > 1 and thus could potentially be involved in the encoding of individuality (Table 5).

Table 5.

Individual differences in the acoustic parameters of the Hiss of the southern white rhinoceros. PIC = potential for individual identity coding, CIB = between-individual coefficient of variance, CIW = within-individual coefficient of variance. Bold indicates PIC > 1.0 and P > 0.05.


Snort.—The ANOVAs revealed that 16 out of 20 parameters that were measured for Snort vocalizations differed significantly across individuals (F22, 282 ≥ 2, P ≤ 0.028; Fisher-Omnibus test: χ2 = 219.20, d.f. = 40, P < 0.001; Table 6). The PCA based on these acoustic parameters extracted three PCs with an eigenvalue higher than 1 explaining 77% of the variance. An independent DFA based on these three PCs was able to classify 16% of the calls to the respective individual (cross-validation: 14%). The kappa test showed only a slight agreement (0.11). For six out of 23 individuals, significantly more calls were correctly classified than expected by chance (binomial test: P ≤ 0.039). The DFA calculated three DFs. DF1 and DF2 explained 77% of the variation in the calls. DF1 showed the highest correlation to PC2 (r = 0.840), whereas DF2 showed the highest correlation to PC1 (r = 0.923). PC1 showed the highest loading on almost all filter-related parameters (r ≥ |0.700| for all except MIN(max) and 25%Quart(max)). PC2 showed highest loading on MIN(max) (r = 0.747). All 20 parameters showed a PIC > 1 and could potentially be involved in the encoding of individuality (Table 6).

Comparison of call types.—The DFA based on a balanced sample of an identical number of individuals per call type (nind = 6, 5–20 calls per individual; Table 7) revealed a classification accuracy of 65% in Grunts (cross-validation: 57%), 44% in Hisses (cross-validation: 38%), and 30% in Snorts (cross-validation: 25%). Thus, classification accuracy decreased from Grunts to Hisses to Snorts. This was supported by the kappa values, which also decreased from 0.56 for Grunts, suggesting moderate agreement, to 0.32 for Hisses, suggesting a fair agreement, to 0.13 for Snorts, reflecting a slight agreement. In addition, the overall PIC and the Hs showed the same pattern. Based on the subject balanced data set, the PICOverall and the Hs were lowest for Snorts (PIC = 1.1; Hs = 0.59), intermediate for Hisses (PIC = 1.2; Hs = 0.91), and highest for Grunts (PIC = 1.3, Hs = 2.63; Table 7). The values obtained for the balanced data set did not vary much from the total data set for Snorts (PICOverall = 1.2, Hs = 0.50) and only slightly for Hisses (PICOverall = 1.2, Hs = 1.25).
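To make the two summary measures easier to read side by side, the sketch below maps each kappa value to the agreement categories of Landis and Koch (1977) and converts each Hs into 2^Hs, the estimated number of discriminable individuals. The kappa and Hs values are those reported for the balanced data set; the function name and the data layout are hypothetical.

```python
def kappa_label(kappa):
    """Agreement category after Landis and Koch (1977)."""
    if kappa < 0:
        return "poor"
    bands = [
        (0.20, "slight"),
        (0.40, "fair"),
        (0.60, "moderate"),
        (0.80, "substantial"),
        (1.00, "almost perfect"),
    ]
    for upper, label in bands:
        if kappa <= upper:
            return label
    raise ValueError("kappa cannot exceed 1")


# (kappa, Hs) for the balanced data set, as reported in the text
balanced = {"Grunt": (0.56, 2.63), "Hiss": (0.32, 0.91), "Snort": (0.13, 0.59)}

for call_type, (kappa, hs_bits) in balanced.items():
    n_discriminable = 2 ** hs_bits
    print(f"{call_type}: kappa {kappa} ({kappa_label(kappa)}), "
          f"2^Hs = {n_discriminable:.1f}")
```

Read this way, the Grunt carries enough information to separate roughly six individuals, whereas the Hiss and the Snort stay below two.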


All three call types, the Grunt, the Hiss, and the Snort, possessed an acoustic structure capable of encoding individual identity according to their overall PIC (larger than 1) but differed in their acoustic variability and individual distinctiveness (Table 7). Based on the calculated information criterion (Hs), the level of individual distinctiveness increased from Snort to Hiss to Grunt. The Hs for the Pant reported by Cinková and Policht (2014; Hs = 3.15) was much higher than the Hs determined for the call types analyzed in the present study. Our analysis revealed that the differences in the degree of individual distinctiveness across call types are barely explained by the distance communication hypothesis, partly by the acoustic structure hypothesis, and best by the social function hypothesis.

Table 6.

Individual differences in the acoustic parameters of the Snort of the southern white rhinoceros. PIC = potential for individual identity coding, CIB = between-individual coefficient of variance, CIW = within-individual coefficient of variance. Bold indicates PIC > 1.0 and P > 0.05.


Table 7.

Comparison of the potential for individual identity coding and classification accuracy between the call types Grunt, Hiss, and Snort of the southern white rhinoceros. PIC = potential for individual identity coding, Hs = information criterion, DFA = discriminant function analysis, Total = total data set, Bal. = subject balanced data set, n = number of individuals, PC = principal component.


The present data provide no support for the distance communication hypothesis. Individual distinctiveness was much higher in the Grunt and the Hiss used for short-distance communication than in the Snort that is uttered at variable distances to other individuals (Linn et al. 2018). However, the hypothesis is supported when taking into account the Pant with its high degree of individual distinctiveness, which has been suggested to serve for long-distance communication since this call type has been recorded in situations with conspecifics several hundred meters away (Cinková and Policht 2014). The Pant is uttered with the mouth closed (sometimes only the lip is moving due to flehmen during vocalizations; Linn et al. 2018; personal observations) as is the Snort, and sound pressure levels of nasal vocalizations in general are much lower than those of oral sounds because in most mammalian species the nasal passages are convoluted and filled with spongy absorbing tissue (Wiley and Richards 1978). Thus, in African elephants (Loxodonta africana; Stoeger et al. 2012) and sheep (Sébe et al. 2010), oral calls are considerably louder than those emitted through the nose or trunk. As vocalizations with low amplitude will not propagate as far as those with high amplitude, it is questionable whether the Pant and the Snort are used for long-distance communication. That the Pant indeed is a low-amplitude call is supported by the difficulties we had in recording high-quality Pant calls during social interactions. In the present study, we recorded 690 Pant calls. However, owing to their low amplitude and, in particular, to interference from sounds of animal locomotion or Hisses of female conspecifics, we were not able to extract a sufficiently large number of calls satisfying our quality criteria. We therefore compared our data with the data set published by Cinková and Policht (2014), who recorded Pant calls from 14 animals in an isolation context, thus obtaining better signal-to-noise ratios.

Our results partly support the acoustic structure hypothesis because the only harmonic call, the Grunt, is more individually distinctive than the Hiss, containing formant-like structures, followed by the noisy Snort. Even though we measured all parameters commonly used in the literature, other parameters may be better suited to measure individual signatures in Snorts and Hisses. Nevertheless, our findings are in accordance with the assumption that narrow-band harmonic calls are better suited for coding sender identity than broadband noisy calls (Yin and McCowan 2004; Leliveld et al. 2011). Thus, the dense energy distribution by the narrow-spaced harmonics favors the projection of formants. However, the Pant showed the highest level of individuality (Cinková and Policht 2014) although it has a broadband acoustic structure without fundamental frequency. The analysis of Cinková and Policht (2014) showed that sender identity was mainly encoded by temporal parameters such as the duration or the number of elements, whereas in our analysis temporal parameters were not important. Although individual differences based on frequency characteristics have been found in various mammals (e.g., Bastian and Schmidt 2008; Leliveld et al. 2011; Mumm et al. 2014), identity coding based on temporal features has also been described for some species (e.g., Shapiro 2006, 2010). In calls consisting of bouts of repetitive elements, the number of units per call and thereby the call duration are primarily dependent on individual lung capacity and the control of the air flow speed (Fitch and Hauser 1995). Individual-specific information based on the variance in temporal features, such as duration or temporal arrangement of frequency elements, has been found in bats (Brown 1976; Masters et al. 1995) and nonhuman primates (Lemasson et al. 2010; Bouchet et al. 2012).
Temporal variation often is related to differences in the arousal state of an animal, which affects the mammalian vocal production mechanism (Kirchhübel et al. 2011). Arousal and anxiety are known to reduce saliva production and to increase muscle tension in mammals (Kirchhübel et al. 2011). In dwarf mongooses (Helogale parvula), it was shown that calls emitted during high-arousal situations show less individual variation as compared to calls emitted during low-arousal states (Rubow et al. 2018), whereas in domestic kittens (Felis catus) no difference in the level of individual distinctiveness was found between high- and low-arousal contexts (Scheumann et al. 2012). The southern white rhinoceros uttered two call types during aggressive interactions. Hisses acted as a first warning signal, whereas Grunts were a more powerful warning signal indicating a more pronounced motivation to fight (Policht et al. 2008). The Grunts thus may signal a higher level of arousal, yet they exhibited more pronounced individual differences compared to Hisses produced at a lower arousal level.

Our findings best support the social function hypothesis, as the level of individual distinctiveness increases from Snort, to Hiss, to Grunt, to Pant. Thus, the lowest level of individual distinctiveness was found in the Snort, which often is used in nonsocial situations such as feeding, resting, or locomotion (Policht et al. 2008; Linn et al. 2018). On the other hand, calls with a strong intragroup social function have high levels of acoustic variability, potentially allowing callers to convey a range of individual-specific information. These calls play a major role in affiliative (Pant) and agonistic interactions (Grunt and Hiss) with a specific social partner (Policht et al. 2008; Linn et al. 2018). Individual distinctiveness was highest in Pants functioning as a contact call during socio-positive interactions as compared to Grunts and Hisses uttered during socio-negative interactions. Pants are produced mainly in two distinct social contexts. First, white rhinoceroses emit Pants during social cohesive interactions as a kind of “greeting” when approaching or following a conspecific or a group of individuals (Policht et al. 2008; Linn et al. 2018). Moreover, Pants play an important role in the mating behavior of white rhinoceroses as bulls emit this call during mate guarding and mating encounters (Owen-Smith 1973; Policht et al. 2008). In both contexts it may be essential for a white rhinoceros to assess the identity of the caller, providing information about physiological and morphological attributes such as body size, dominance rank, or hormonal state. There is strong male–male competition and female mate choice in white rhinoceroses (Kretzschmar et al. 2020) and males use acoustic cues to gather information about rivals (Cinková and Shrader 2020).

Our finding is in agreement with the expansion of the social function hypothesis by Lemasson and Hausberger (2011) which assumes that individual distinctiveness is higher in calls related to affiliative contexts as compared to calls related to agonistic contexts. Our results agree with other studies showing that individual distinctiveness increases with increasing affiliative social value of a call type (Appendix I; e.g., Lemasson and Hausberger 2011; Bouchet et al. 2013; Ancillotto and Russo 2016). Selection may have favored more individually distinct calls in situations such as social cohesion in which vocal recognition is useful. On the other hand, in situations where context (e.g., aggression) is of greater importance than caller identity, selection will favor the suppression of individual vocal distinctiveness to reduce signal ambiguity and facilitate a rapid response by receivers (Shapiro 2010). From this point of view, it makes sense that evolution has favored individual distinctiveness in a contact and mating call, such as the Pant, providing signalers with benefits, but less so in aggressive calls such as the Grunt or the Hiss. Nevertheless, in agonistic contexts it may be important to estimate the potential outcome of an agonistic interaction by assessing the identity of the opponent, which may account for the individual distinctiveness in Grunt calls. For example, in northern elephant seals (Mirounga angustirostris), individuals remember the fighting abilities of potential opponents based on individual acoustic signatures (Casey et al. 2015).

Previous studies have used a wide variety of statistical methods to analyze and compare individual distinctiveness in vocalizations of different mammalian species (Appendix I), and the published results thus may have been influenced by the methods used. We therefore applied the three most prominent methods in the literature to our data set to compare the level of individual distinctiveness between call types: DFA, potential for individual identity coding (PIC), and the information criterion (Hs). Comparing the results of the DFA based on subject balanced and unbalanced data sets (Table 7), we confirmed that the classification accuracy is influenced by the number of individuals included in the analyses (e.g., Beecher 1989). However, comparing the balanced and unbalanced data sets, the kappa tests led to similar values, although they resulted in two different classification levels for Hisses (total data set: slight; balanced data set: fair). The overall PIC and the Hs varied only slightly between the balanced and unbalanced data sets (Table 7). The kappa test and the Hs provided a similar interpretation for individual distinctiveness. The Hs values were below 1 for Snorts and Hisses in the balanced data set, suggesting that only a low number of individuals can be potentially discriminated (Searby et al. 2004), which is in agreement with the slight to fair agreement found by the kappa test. However, the overall PIC was above 1, suggesting a potential for identity coding. Nevertheless, the three measurements showed a comparable trend in the degree of individuality, and the information criterion Hs turned out to be a reliable method when comparing different samples across studies, as suggested by Beecher (1989) and Bouchet et al. (2013).

Observations made in this study have been carried out on southern white rhinoceroses in a zoo environment, which cannot completely reflect the natural situation. However, as individual signatures are related to the morphology of the individual, especially of the vocal tract, they should be independent from housing conditions or the social environment. Nevertheless, studies on wild southern white rhinoceroses would be important to clarify the role of vocal identity coding in social interactions under natural conditions. Moreover, studies on additional rhinoceros species are needed to clarify the impact of social system on the degree of individual distinctiveness. To date, comparative data are only available for a single species, the solitarily living black rhinoceros (Diceros bicornis). Budde and Klump (2003) showed that begging calls of captive adult black rhinoceroses carry individual signatures. Begging calls often are produced toward keepers (personal observation) but due to our limited knowledge on the vocal repertoire of black rhinoceroses, the function in conspecific communication is not yet understood. The begging call of the adult black rhinoceros corresponds to Whines produced by infants and subadults of the white rhinoceros. Further research is necessary to clarify whether the different socioecological niches, i.e., solitary, forest-dwelling versus semisocial, savanna-living (for discussion, see Linn et al. 2018), may account for these differences in vocalization behavior.

To sum up, our findings for the southern white rhinoceros suggest that the context of social interactions plays a major role in the evolution of individual distinctiveness in vocalizations. However, due to the fact that Grunts and Hisses are emitted in comparable contexts, namely during aggressive interactions, but differ in their acoustic structure and individual distinctiveness, it has to be assumed that not only the type of social interaction but also vocal production mechanisms influence the degree of individuality in different call types. Further, it still is unclear whether conspecifics use the different call types to discriminate and recognize different individuals. Cinková and Policht (2016) showed that southern white rhinoceroses are able to extract information about the sex and the species of the sender when listening to Pant calls. The present data can be used for further playback experiments, which are necessary to gain a clear understanding of the role of individual signatures in the noisy calls of the southern white rhinoceros and its capacity to discriminate between individuals.


We thank the Serengeti-Park Hodenhagen, the Dortmund Zoo, the Augsburg Zoo, the Erfurt Zoo, the Gelsenkirchen Zoo, and the Osnabrück Zoo, for the possibility to study their rhinoceroses and for their hospitality. We particularly thank M. Becker, D. Lahn, T. Lipp, S. Zech, T. Risch, W.-D. Gürtler, Prof. Dr. M. Boeer, and all the rhinoceros keepers for their patience and support during data collection. Furthermore, we acknowledge R. Frey for sending us information about vocal tract length in rhinoceroses. We dedicate this paper to Prof. Dr. E. Zimmermann: we miss the fruitful discussions with her. This research was funded by German Research Foundation (DFG; Projekt ID: SCHE 1927/2-1); Serengeti-Park-Stiftung (MS;; Studienstiftung des Deutschen Volkes (SNL;

Supplementary Data

Supplementary data are available at Journal of Mammalogy online.

 Supplementary Data SD1.—Original and filtered sonograms of the three call types Grunt, Hiss, and Snort.

Literature Cited


Altmann, J. 1974. Observational study of behavior: sampling methods. Behaviour 49:227–267. Google Scholar


Ancillotto, L., and D. Russo. 2016. Individual vs. non-individual acoustic signalling in African woodland dormice (Graphiurus murinus). Mammalian Biology 81:410–414. Google Scholar


August, P. V., and J. G. T. Anderson. 1987. Mammal sounds and motivation-structural rules: a test of the hypothesis. Journal of Mammalogy 68:1–9. Google Scholar


AVISOFT. 2018. Avisoft-SAS Lab Pro – sound analysis and synthesis laboratory. Version 5.2.12. Avisoft Bioacoustics. Glienicke, Germany. Google Scholar


Baker, M. C., and D. M. Logue. 2007. A comparison of three noise reduction procedures applied to bird vocal signals. Journal of Field Ornithology 78:240–253. Google Scholar


Bastian, A., and S. Schmidt. 2008. Affect cues in vocalizations of the bat, Megaderma lyra, during agonistic interactions. The Journal of the Acoustical Society of America 124:598–608. Google Scholar


BATSOUND PRO. 2013. Batsound – sound analysis. Version 4.1. Pettersson Elektronik AB. Uppsala, Sweden. Google Scholar


Beecher, M. D. 1989. Signaling systems for individual recognition - an information-theory approach. Animal Behaviour 38:248–261. Google Scholar


Belin, P., S. Fecteau, and C. Bédard. 2004. Thinking the voice: neural correlates of voice perception. Trends in Cognitive Sciences 8:129–135. Google Scholar


Boersma, P. 1993. Accurate short-term analysis of the fundamental frequency and the harmonics-to-noise ratio of a sampled sound. Proceedings of the Institute of Phonetic Sciences 17:97–110. Google Scholar


Boersma, P. 2001. Praat, a system for doing phonetics by computer. Glot International 5:341–345. Google Scholar


Bouchet, H., C. Blois-Heulin, and A. Lemasson. 2013. Social complexity parallels vocal complexity: a comparison of three nonhuman primate species. Frontiers in Psychology 4:390. Google Scholar


Bouchet, H., C. Blois-Heulin, A. S. Pellier, K. Zuberbühler, and A. Lemasson. 2012. Acoustic variability and individual distinctiveness in the vocal repertoire of red-capped mangabeys (Cercocebus torquatus). Journal of Comparative Psychology 126:45–56. Google Scholar


Brown, P. E. 1976. Echolocation ontogeny in pallid bat (Antrozous pallidus). Journal of the Acoustical Society of America 60:S3–S4. Google Scholar


Brown, J. L., A. C. Bellem, M. Fouraker, D. E. Wildt, and T. L. Roth. 2001. Comparative analysis of gonadal and adrenal activity in the black and white rhinoceros in North America by noninvasive endocrine monitoring. Zoo Biology 20:463–486. Google Scholar


Budde, C., and G. M. Klump. 2003. Vocal repertoire of the black rhino Diceros bicornis ssp. and possibilities of individual identification. Mammalian Biology 68:42–47. Google Scholar


Casey, C., I. Charrier, N. Mathevon, and C. Reichmuth. 2015. Rival assessment among northern elephant seals: evidence of associative learning during male-male contests. Royal Society Open Science 2:150228. Google Scholar


Charlton, B. D., et al. 2011. Cues to body size in the formant spacing of male koala (Phascolarctos cinereus) bellows: honesty in an exaggerated trait. The Journal of Experimental Biology 214:3414–3422. Google Scholar


Charrier, I., P. Jouventin, N. Mathevon, and T. Aubin. 2001. Individual identity coding depends on call type in the South Polar skua Catharacta maccormicki. Polar Biology 24:378–382. Google Scholar


Cinková, I., and R. Policht. 2014. Contact calls of the northern and southern white rhinoceros allow for individual and species identification. PLoS ONE 9:e98475. Google Scholar


Cinková, I., and R. Policht. 2016. Sex and species recognition by wild male southern white rhinoceros using contact pant calls. Animal Cognition 19:375–386. Google Scholar


Cinková, I., and A. M. Shrader. 2020. Rival assessment by territorial southern white rhinoceros males via eavesdropping on the contact and courtship calls. Animal Behaviour 166:19–31. Google Scholar


Déaux, É. C., I. Charrier, and J. A. Clarke. 2016. The bark, the howl and the bark-howl: identity cues in dingoes' multicomponent calls. Behavioural Processes 129:94–100. Google Scholar


Ehret, G. 2006. Common rules of communication sound perception. Pp. 85–114 inBehaviour and neurodynamics for auditory communication ( J. Kanwal and G. Ehret, eds.). Cambridge University Press. Cambridge, United Kingdom. Google Scholar


Fant, G. 1960. Acoustic theory of speech production. With calculations based on X-ray studies of Russian articulations. Mouton & Co. The Hague, The Netherlands. Google Scholar


Fitch, W. T. 1997. Vocal tract length and formant frequency dispersion correlate with body size in rhesus macaques. The Journal of the Acoustical Society of America 102:1213–1222. Google Scholar


Fitch, W. T. 2010. The evolution of language. Cambridge University Press. Cambridge, Massachusetts. Google Scholar


Fitch, W. T., and M. D. Hauser. 1995. Vocal production in nonhuman primates: acoustics, physiology, and functional constraints on “honest” advertisement. American Journal of Primatology 37:191–219. Google Scholar


Gamba, M. 2014. Vocal tract-related cues across human and nonhuman signals. Reti, Saperi, Linguaggi 1:49–68. Google Scholar


Haccou, P., and E. Melis. 1994. Statistical analysis of behavioural data. Oxford University Press. New York, New York. Google Scholar


Hutchins, M., and M. D. Kreger. 2006. Rhinoceros behaviour: implications for captive management and conservation. International Zoo Yearbook 40:150–173. Google Scholar


Jenikejew, J., B. Chaignon, S. Linn, and M. Scheumann. 2020. Proximity-based vocal networks reveal social relationships in the Southern white rhinoceros. Scientific Reports 10:15104. Google Scholar


Kessler, S. E., M. Scheumann, L. T. Nash, and E. Zimmermann. 2012. Paternal kin recognition in the high frequency/ultrasonic range in a solitary foraging mammal. BMC Ecology 12:26. Google Scholar


Kirchhübel, C., D. M. Howard, and A. W. Stedmon. 2011. Acoustic correlates of speech when under stress: research, methods and future directions. International Journal of Speech, Language & the Law 18:75–98. Google Scholar


Knörnschild, M., A. A. Fernandez, and M. Nagy. 2019. Vocal information and the navigation of social decisions in bats: is social complexity linked to vocal complexity? Functional Ecology 34:322–331. Google Scholar


Kretzschmar, P., et al. 2020. Mate choice, reproductive success and inbreeding in white rhinoceros: new insights for conservation management. Evolutionary Applications 13:700–715. Google Scholar


Landis, J. R., and G. G. Koch. 1977. The measurement of observer agreement for categorical data. Biometrics 33:159–174. Google Scholar


Leliveld, L. M., M. Scheumann, and E. Zimmermann. 2011. Acoustic correlates of individuality in the vocal repertoire of a nocturnal primate (Microcebus murinus). The Journal of the Acoustical Society of America 129:2278–2288. Google Scholar


Lemasson, A., and M. Hausberger. 2011. Acoustic variability and social significance of calls in female Campbell's monkeys (Cercopithecus campbelli campbelli). The Journal of the Acoustical Society of America 129:3341–3352. Google Scholar


Lemasson, A., K. Ouattara, H. Bouchet, and K. Zuberbühler. 2010. Speed of call delivery is related to context and caller identity in Campbell's monkey males. Die Naturwissenschaften 97:1023–1027. Google Scholar


Lessells, C. M., and P. T. Boag. 1987. Unrepeatable repeatabilities: a common mistake. The Auk 104:116–121. Google Scholar


Lieberman, P., and S. E. Blumstein. 1988. Speech physiology, speech perception, and acoustic phonetics. Cambridge University Press. Cambridge, Massachusetts. Google Scholar


Ligout, S., F. Sebe, and R. H. Porter. 2004. Vocal discrimination of kin and non-kin agemates among lambs. Behaviour 141:355–369. Google Scholar


Linn, S. N., M. Boeer, and M. Scheumann. 2018. First insights into the vocal repertoire of infant and juvenile Southern white rhinoceros. PLoS ONE 13:e0192166. Google Scholar


Liu, R. C., K. D. Miller, M. M. Merzenich, and C. E. Schreiner. 2003. Acoustic variability and distinguishability among mouse ultrasound vocalizations. The Journal of the Acoustical Society of America 114:3412–3422. Google Scholar


Maciej, P., J. Fischer, and K. Hammerschmidt. 2011. Transmission characteristics of primate vocalizations: implications for acoustic analyses. PLoS ONE 6:e23015. Google Scholar


Manser, M. B. 2001. The acoustic structure of suricates' alarm calls varies with predator type and the level of response urgency. Proceedings of the Royal Society of London, B. Biological Sciences 268:2315–2324. Google Scholar


Masters, W. M., K. A. S. Raver, and K. A. Kazial. 1995. Sonar signals of big brown bats, Eptesicus fuscus, contain information about individual identity, age and family affiliation. Animal Behaviour 50:1243–1260. Google Scholar


MATLAB. 2018. Matlab. Version R2018a The MathWorks, Inc. Natick, Massachusetts. Google Scholar


Mitani, J. C., J. Gros-Louis, and J. M. Macedonia. 1996. Selection for acoustic individuality within the vocal repertoire of wild chimpanzees. International Journal of Primatology 17:569–583. Google Scholar


Muggenthaler, E. K., J. W. Stoughton, and J. C. Daniel. 1993. Infrasound from the Rhinocerotidae. Pp. 136–139 inRhinoceros biology and conservation. Proceedings of an international conference on rhinoceros, 9–11 May 1991 ( O. A. Ryder, ed.). Zoological Society of San Diego. San Diego, California. Google Scholar


Müller, C. A., and M. B. Manser. 2008. Mutual recognition of pups and providers in the cooperatively breeding banded mongoose. Animal Behaviour 75:1683–1692. Google Scholar


Mumm, C. A. S., M. C. Urrutia, and M. Knörnschild. 2014. Vocal individuality in cohesion calls of giant otters, Pteronura brasiliensis. Animal Behaviour 88:243–252. Google Scholar


Nair, S., R. Balakrishnan, C. S. Seelamantula, and R. Sukumar. 2009. Vocalizations of wild Asian elephants (Elephas maximus): structural classification and social context. The Journal of the Acoustical Society of America 126:2768–2778. Google Scholar


Owen-Smith, N. O. 1973. The behavioural ecology of the white rhinoceros. Ph.D. dissertation, University of Wisconsin. Madison, Wisconsin. Google Scholar


Patton, M., et al. 1999. Reproductive cycle length and pregnancy in the southern white rhinoceros (Ceratotherium simum simum) as determined by fecal pregnane analysis and observations of mating behavior. Zoo Biology 18:111–127. Google Scholar


Peckre, L., P. M. Kappeler, and C. Fichtel. 2019. Clarifying and expanding the social complexity hypothesis for communicative complexity. Behavioral Ecology and Sociobiology 73:11. Google Scholar


Pfefferle, D., and J. Fischer. 2006. Sounds and size: identification of acoustic variables that reflect body size in hamadryas baboons, Papio hamadryas. Animal Behaviour 72:43–51. Google Scholar


Phillips, A. V., and I. Stirling. 2000. Vocal individuality in mother and pup South American fur seals, Arctocephalus australis . Marine Mammal Science 16:592–616. Google Scholar


Plapous, C., C. Marro, and P. Scalart. 2005. Speech enhancement using harmonic regeneration. Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing 1:1–157. Google Scholar


Plapous, C., C. Marro, and P. Scalart. 2006. Improved signal-to-noise ratio estimation for speech enhancement. Proceedings of the IEEE Transactions on Audio Speech and Language Processing 14:2098–2108.


Plotsky, K., D. Rendall, T. Riede, and K. Chase. 2013. Radiographic analysis of vocal tract length and its relation to overall body size in two canid species. Journal of Zoology 291:76–86.


Policht, R., K. Tomasova, D. Holeckova, and D. Frynta. 2008. The vocal repertoire in Northern white rhinoceros Ceratotherium simum cottoni as recorded in the last surviving herd. Bioacoustics 18:69–96.


PRAAT. 2018. Praat: doing phonetics by computer. Version 6.0.42. University of Amsterdam. Amsterdam, The Netherlands. Accessed 14 September 2018.


Rendall, D., H. Notman, and M. J. Owren. 2009. Asymmetries in the individual distinctiveness and maternal recognition of infant contact calls and distress screams in baboons. The Journal of the Acoustical Society of America 125:1792–1805.


Rendall, D., M. J. Owren, and P. S. Rodman. 1998. The role of vocal tract filtering in identity cueing in rhesus monkey (Macaca mulatta) vocalizations. The Journal of the Acoustical Society of America 103:602–614.


Robisson, P., T. Aubin, and J.-C. Bremond. 1993. Individuality in the voice of the emperor penguin Aptenodytes forsteri: adaptation to a noisy environment. Ethology 94:279–290.


Roth, T. L. 2006. A review of the reproductive physiology of rhinoceros species in captivity. International Zoo Yearbook 40:130–143.


RStudio Team. 2016. RStudio: integrated development environment for R. Version 1.1.447. RStudio Inc. Boston, Massachusetts. Accessed 12 November 2019.


Rubow, J., M. I. Cherry, and L. L. Sharpe. 2018. A comparison of individual distinctiveness in three vocalizations of the dwarf mongoose (Helogale parvula). Ethology 124:45–53.


Salmi, R., K. Hammerschmidt, and D. M. Doran-Sheehy. 2014. Individual distinctiveness in call types of wild western female gorillas. PLoS ONE 9:e101940.


Schehka, S., and E. Zimmermann. 2009. Acoustic features to arousal and identity in disturbance calls of tree shrews (Tupaia belangeri). Behavioural Brain Research 203:223–231.


Scherer, K. R. 1989. Vocal correlates of emotional arousal and affective disturbance. Pp. 165–197 in Handbook of psychophysiology: emotion and social behavior (H. Wagner and A. Manstead, eds.). Wiley. London, United Kingdom.


Scheumann, M., A. E. Roser, W. Konerding, E. Bleich, H. J. Hedrich, and E. Zimmermann. 2012. Vocal correlates of sender-identity and arousal in the isolation calls of domestic kitten (Felis silvestris catus). Frontiers in Zoology 9:36.


Scheumann, M., E. Zimmermann, and G. Deichsel. 2007. Context-specific calls signal infants' needs in a strepsirrhine primate, the gray mouse lemur (Microcebus murinus). Developmental Psychobiology 49:708–718.


Searby, A., P. Jouventin, and T. Aubin. 2004. Acoustic recognition in macaroni penguins: an original signature system. Animal Behaviour 67:615–625.


Sébe, F., J. Duboscq, T. Aubin, S. Ligout, and P. Poindron. 2010. Early vocal recognition of mother by lambs: contribution of low- and high-frequency vocalizations. Animal Behaviour 79:1055–1066.


Seyfarth, R. M., D. L. Cheney, and P. Marler. 1980. Vervet monkey alarm calls: semantic communication in a free-ranging primate. Animal Behaviour 28:1070–1094.


Shapiro, A. D. 2006. Preliminary evidence for signature vocalizations among free-ranging narwhals (Monodon monoceros). The Journal of the Acoustical Society of America 120:1695–1705.


Shapiro, A. D. 2010. Recognition of individuals within the social group: signature vocalizations. Pp. 495–503 in Handbook of mammalian vocalization (S. M. Brudzynski, ed.). Elsevier B.V. London, United Kingdom.


Shrader, A. M., and N. Owen-Smith. 2002. The role of companionship in the dispersal of white rhinoceroses (Ceratotherium simum). Behavioral Ecology and Sociobiology 52:255–261.


Sikes, R. S., and The Animal Care and Use Committee of the American Society of Mammalogists. 2016. 2016 Guidelines of the American Society of Mammalogists for the use of wild mammals in research and education. Journal of Mammalogy 97:663–688.


Snowdon, C. T., A. M. Elowson, and R. S. Roush. 1997. Social influences on vocal development in New World primates. Pp. 234–248 in Social influences on vocal development (C. T. Snowdon and M. Hausberger, eds.). Cambridge University Press. New York.


Stoeger, A. S., and A. Baotic. 2016. Information content and acoustic structure of male African elephant social rumbles. Scientific Reports 6:27585.


Stoeger, A. S., G. Heilmann, M. Zeppelzauer, A. Ganswindt, S. Hensman, and B. D. Charlton. 2012. Visualizing sound emission of elephant vocalizations: evidence for two rumble production types. PLoS ONE 7:e48907.


Taylor, A. M., and D. Reby. 2010. The contribution of source-filter theory to mammal vocal communication research. Journal of Zoology 280:221–236.


Torriani, M. V., E. Vannoni, and A. G. McElligott. 2006. Mother-young recognition in an ungulate hider species: a unidirectional process. The American Naturalist 168:412–420.


Wiley, R. H., and D. G. Richards. 1978. Physical constraints on acoustic communication in the atmosphere: implications for the evolution of animal vocalizations. Behavioral Ecology and Sociobiology 3:69–94.


Wittig, R. M., C. Crockford, R. M. Seyfarth, and D. L. Cheney. 2007. Vocal alliances in Chacma baboons (Papio hamadryas ursinus). Behavioral Ecology and Sociobiology 61:899–909.


Yin, S., and B. McCowan. 2004. Barking in domestic dogs: context specificity and individual identification. Animal Behaviour 68:343–355.


Appendix I

Overview of studies that investigated individual distinctiveness among different call types within a species (including information on the acoustic structure of the respective call types, the context in which they are given, and the typical distance at which they are exchanged) and results for acoustic variability and individual distinctiveness based on the different hypotheses. SF = social function hypothesis, which predicts that calls uttered in directed interactions have a higher level of individual distinctiveness than calls uttered in general contexts and that calls uttered in affiliative social contexts have a higher level of individual distinctiveness than calls uttered in agonistic social contexts; DC = distance communication hypothesis, which predicts that the level of individual distinctiveness is highest in call types uttered at far distances, intermediate in call types uttered at intermediate distances, and lowest in call types uttered at short distances; AC = acoustic structure hypothesis, which predicts that the level of individual distinctiveness decreases from tonal calls to noisy calls with harmonic components (mixed) to noisy calls. PIC = potential for individual identity coding, Hs = information criterion, DFA = discriminant function analysis, + = equal number of individuals, CV = coefficient of variation, cluster = cluster analysis, NO = results do not support the hypothesis, YES = results support the hypothesis, PARTLY = results partly support the hypothesis, – = not testable with the data set, ?? = no information available in the paper.



© The Author(s) 2021. Published by Oxford University Press on behalf of the American Society of Mammalogists. This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License, which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact
Sabrina Nicolleta Linn, Sabine Schmidt, and Marina Scheumann "Individual distinctiveness across call types of the southern white rhinoceros (Ceratotherium simum simum)," Journal of Mammalogy 102(2), 440-456, (20 March 2021).
Received: 24 February 2020; Accepted: 19 January 2021; Published: 20 March 2021
acoustic structure hypothesis
distance communication hypothesis
information criterion
nasal call
oral call