Presence or abundance of owls is frequently assessed using call-broadcast surveys to elicit responses and increase detection rates, but these surveys can draw owls in from a distance, which could affect conclusions about fine-scale habitat associations. Passive acoustic surveys with field personnel or autonomous recording units (ARUs) may be a less biased method for surveying owls. Automated recognition techniques have proven useful to process large volumes of acoustic recordings from ARUs, and we sought to test the utility of automated recognition for three owl species. We built templates or “recognizers” for the territorial calls of the Barred Owl (Strix varia), the Boreal Owl (Aegolius funereus), and the Great Horned Owl (Bubo virginianus). We assessed the performance of each recognizer by evaluating precision, processing time, and false negatives. We used ARUs to survey for owls in northeastern Alberta, Canada, and compared the results from the recognizers to results from researchers listening to a subsample of the recordings. We verified the results to filter out false positives, but verification time was substantially lower than time spent listening. We processed more recordings and obtained a larger dataset of owl detections than would have been possible with either listening to the recordings only or conducting traditional field surveys without ARUs, suggesting a significant benefit of automated recognition. Precision was quite variable, but false negatives were relatively low and did not affect results of owl habitat associations. Given the relatively low detection rates of owls by listening to recordings, an automated recognition approach is likely to be highly useful for monitoring owls.
Most owl species are difficult for human observers to detect visually due to their nocturnal habits, cryptic coloration, and occurrence at low densities. Owls are more effectively detected by their calls, and as a result monitoring and research projects frequently use acoustic surveys to determine presence or abundance of owls (Goyette et al. 2011, Rognan et al. 2012). Owls use territorial vocalizations to attract mates and defend territories from conspecifics during the breeding season in early spring (Johnsgard 2002, Odom and Mennill 2010a), so detecting these calls is a reliable indicator that a species is occupying a territory. Tracking presence of owls using their territorial vocalizations can enable research and monitoring programs to estimate patch occupancy, and obtain information on owl habitat use and distribution across a landscape.
Acoustic surveys for owls often broadcast a recorded owl call (Clark and Anderson 1997, Sater et al. 2006, Grossman et al. 2008, Kissling et al. 2010). Broadcasting owl calls can increase the probability of detecting an owl by eliciting territorial individuals to call back (Kissling et al. 2010). The rationale for using call-broadcast surveys is that owl calling rates are thought to be low. Although a broadcast call can increase detection of owls, there are drawbacks. Call-broadcast surveys can draw owls in from a distance (Zuberogoitia et al. 2011), which could affect conclusions about habitat associations of owls. Detection from call-broadcast surveys may vary with different equipment, and broadcasts can also affect detection of other owl species (Bailey et al. 2009, Wiens et al. 2011), which could be problematic if the survey is targeting multiple owl species. Depending on the study objective, passive acoustic surveys may be a less biased method for surveying owls.
Passive acoustic survey methods for owls do not broadcast calls. They can be implemented with field personnel as a traditional point count or with autonomous recording units (ARUs) that can be programmed to record on a set schedule and passively record owls calling. Traditional point counts are relatively time-consuming, and because the calling behavior of owls may be affected by a variety of environmental factors including time of season, temperature, weather, and lunar phase (Clark and Anderson 1997, Kissling et al. 2010), this can constrain the timing of field surveys. Passive acoustic surveys using ARUs are increasingly used in avian research (Shonfield and Bayne 2017) and can be useful for surveying rare and elusive species (Holmes et al. 2014, 2015, Campos-Cerqueira and Aide 2016) and for conducting nocturnal surveys for a variety of species including owls (Rognan et al. 2012). An important benefit of using ARUs for nocturnal owl surveys is that the units can be set up at any time and left out for extended periods. This reduces the challenges and constraints of planning surveys during optimal weather conditions, and eliminates many of the safety concerns for field personnel conducting nocturnal field work during the late winter/early spring.
The probability of detecting an owl is an important consideration when selecting a survey method, because false absences (i.e., failure to detect an owl when present) can lead to biased estimates and misleading inferences. At first glance, call-broadcast surveys may seem preferable because if the probability of detecting an owl is increased compared to a passive survey, then this should lead to fewer false absences. However, an ARU can increase the cumulative detection probability of owls because it can record on a set schedule for several days or weeks. Thus, an ARU can reduce the problem of lower detection probabilities of passive surveys and increase the utility of the survey data by increasing the number of sampling occasions while still only requiring two visits by field personnel. For these reasons, using ARUs for passive acoustic surveys appears to be a promising new approach for studying and monitoring owls.
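The gain from repeated sampling occasions can be illustrated with a short calculation. The sketch below assumes that detection is independent across occasions and that the per-occasion detection probability is constant; both are simplifying assumptions, and the value of 0.10 is hypothetical rather than an estimate from this study.

```python
def cumulative_detection_probability(p_single: float, n_occasions: int) -> float:
    """Probability of at least one detection over n independent occasions.

    Assumes a constant per-occasion detection probability (p_single) and
    independence among occasions; both are simplifications.
    """
    if not 0.0 <= p_single <= 1.0:
        raise ValueError("p_single must be between 0 and 1")
    if n_occasions < 0:
        raise ValueError("n_occasions must be non-negative")
    return 1.0 - (1.0 - p_single) ** n_occasions


# Hypothetical per-occasion detection probability, for illustration only.
for n in (1, 4, 13):
    print(n, round(cumulative_detection_probability(0.10, n), 3))
```

Under these assumptions, even a low per-occasion probability accumulates quickly as the number of occasions grows, which is the rationale for leaving an ARU in place for days or weeks.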
Acoustic datasets collected with ARUs over extended time periods can be large and daunting to process. Automated species recognition of animal vocalizations is changing this. This process involves matching recording segments to a template (often termed a “recognizer”) derived from training data and registering a hit when a similarity threshold is reached. A few different approaches have been developed, including band-limited energy detectors (Mills 2000), binary point matching (Katz et al. 2016), decision trees (Acevedo et al. 2009, Digby et al. 2013), random forest (Ross and Allen 2014), spectrogram cross-correlation (Katz et al. 2016), hidden Markov models (Wildlife Acoustics 2011), and most recently deep learning through convolutional neural networks (Salamon and Bello 2017). A few are easily accessible to researchers through commercial or open source software, including hidden Markov models in Song Scope (Wildlife Acoustics Inc., Maynard, MA, U.S.A.), cluster analysis in Kaleidoscope (Wildlife Acoustics Inc., Maynard, MA, U.S.A.), band-limited energy detectors in Raven Pro (Cornell Laboratory of Ornithology, Ithaca, NY, U.S.A.), and spectrogram cross-correlation in R package “monitoR” (Hafner and Katz 2017), Avisoft SASLab Pro (Avisoft Bioacoustics, Berlin, Germany), and Xbat (Cornell Laboratory of Ornithology, Ithaca, NY, U.S.A.). Automated recognition and programs that can implement this approach are likely to be useful for a variety of projects using acoustic monitoring.
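The general idea of sliding a template along a recording and registering a hit when a similarity threshold is reached can be sketched as follows. This is a deliberately simplified, cross-correlation-style illustration on a one-dimensional sequence; it is not the hidden Markov model approach used by Song Scope, and the function names and the threshold of 0.8 are hypothetical.

```python
import math


def normalized_correlation(window, template):
    """Pearson correlation between two equal-length sequences (0.0 if either is flat)."""
    n = len(template)
    mean_w = sum(window) / n
    mean_t = sum(template) / n
    dev_w = [x - mean_w for x in window]
    dev_t = [y - mean_t for y in template]
    denom = math.sqrt(sum(x * x for x in dev_w) * sum(y * y for y in dev_t))
    if denom == 0:
        return 0.0
    return sum(x * y for x, y in zip(dev_w, dev_t)) / denom


def find_hits(signal, template, threshold=0.8):
    """Return (start_index, similarity) for every window meeting the threshold."""
    m = len(template)
    hits = []
    for start in range(len(signal) - m + 1):
        r = normalized_correlation(signal[start:start + m], template)
        if r >= threshold:
            hits.append((start, r))
    return hits


# Toy example: the template occurs once in the signal, starting at index 3.
template = [0, 1, 0, -1]
signal = [0, 0, 0, 0, 1, 0, -1, 0, 0]
print(find_hits(signal, template))
```

In practice, recognizers operate on spectrogram features rather than raw sample values, but the thresholding logic is analogous: lowering the threshold admits more candidate windows, and raising it rejects more.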
Previous studies have used automated recognition techniques to process acoustic recordings more efficiently for birds and amphibians (Buxton and Jones 2012, Frommolt and Tauchert 2014, Taff et al. 2014, Colbert et al. 2015, Holmes et al. 2015, Brauer et al. 2016). Automated recognition techniques perform poorly when there are a lot of overlapping calls (Buxton and Jones 2012, Digby et al. 2013) either from conspecifics or heterospecifics, and may also perform poorly if there is a lot of abiotic noise on the audio recordings. The effect of noisy recordings on the performance of automated acoustic recognition is important to assess because natural noise is present everywhere and anthropogenic noise is becoming increasingly prevalent in natural areas.
Owl calls are potentially well-suited to automated species recognition. The calls are unlikely to overlap with conspecifics (except for some minimal overlap during male–female duets in some species), and very few other species are present or vocally active at the same time, as owls generally call nocturnally in late winter/early spring. In the acoustic data we have collected, it is rare to hear two or more owl species calling on the same audio recording (about 1% of all recordings; J. Shonfield unpubl. data). Conducting passive acoustic surveys with ARUs and combining this approach with automated species recognition may be an efficient method of increasing the probability of detecting owls during passive surveys, and subsequently increasing the statistical power to detect trends and habitat-specific differences in abundance. There is interest in using automated recognition of owls for acoustic surveys, and we are aware that this approach is being tested for surveys of Spotted Owls (Strix occidentalis; J. Higley, Hoopa Tribal Forestry, pers. comm.; M. Hane, Weyerhaeuser, pers. comm.); however, there is a gap in the literature on whether this is effective for other species of owls.
In this study, we used ARUs to conduct acoustic surveys for owls in northeastern Alberta. Our main objective was to test the utility of using automated computer recognition techniques to process acoustic data collected with ARUs to determine presence/absence of owls at survey locations. We chose three owl species found throughout Canada and the United States: the Barred Owl (Strix varia), the Boreal Owl (Aegolius funereus), and the Great Horned Owl (Bubo virginianus). We built templates (hereafter “recognizers”) for each species to scan through our large acoustic dataset and automatically identify the territorial calls of these owls. We built two different recognizers for Barred Owls using different parts of their territorial call because it is longer and more complex than the calls of the other two species. To evaluate the utility of the recognizers we did two comparisons of results; first, we compared the results of owl detections obtained from the recognizers to the results of detections obtained from listening to a subsample of the recordings. Second, we compared the results of the two Barred Owl recognizers to evaluate the effect of using different templates to identify different parts of the call. For both comparisons, our specific objectives were: (1) to assess the performance of each recognizer by evaluating the precision (probability of a recognizer match being a true match), total processing time, and false negatives; (2) to assess whether noise level on recordings affects the precision of a recognizer; and (3) to compare results of owl habitat associations based on different survey methods using occupancy models.
Study Area. We selected study sites in upland forested areas of the Lower Athabasca planning region in northeastern Alberta, located south of Fort McMurray, north of Lac la Biche and northwest of Cold Lake (Fig. 1). Forests in the study area were composed primarily of trembling aspen (Populus tremuloides), white spruce (Picea glauca), and black spruce (Picea mariana) trees. All sites were >3 km apart. We surveyed 45 sites with varying levels of industrial noise that were selected based on the type of industrial noise present. Chronic-noise sites (n = 13) had either an in situ oil processing plant facility or a compressor station present at the center of the site, both of which produced continuous noise at a loud level. Intermittent-noise sites (n = 17) were positioned with a road bisecting the site, and had intermittent traffic noise but no chronic noise present. Control sites (n = 15) did not have a road or industrial infrastructure present and thus had no industrial noise. The area surveyed at each site (256 ha) approximated the home-range size of pairs of Barred Owls and Great Horned Owls during the breeding season (Mazur et al. 1998, Bennett and Bloom 2005, Livezey 2007). The estimates for Boreal Owl home-range sizes during the breeding season vary widely between studies (Hayward et al. 1993, Santangeli et al. 2012), but are likely smaller than our sites.
Acoustic Surveys. We conducted passive acoustic surveys for owls using a commercially available ARU: the SM2+ Song Meter (Wildlife Acoustics Inc., Maynard, MA, U.S.A.). We programmed each ARU to turn on and record in stereo format for 10 min at the start of every hour at 44.1 kHz with a 16-bit resolution. Recording files were stored in .wac format, a loss-less audio compression format that is proprietary to Wildlife Acoustics. We recorded in stereo format to have a backup channel in case one of the microphones failed or was damaged in the field. We tested each ARU and both microphones prior to deployment to identify any units with non-responsive channels or degraded microphones. We used gain settings of 48 dB for both the left and right channel microphones. We installed ARUs at each site for approximately 2 wk between 21 March 2014 and 6 May 2014, which is the time period when owls are actively calling (Clark and Anderson 1997, Kissling et al. 2010). We attached ARUs at a height of approximately 1.5 m on trees with a diameter smaller than the width of the ARU (18 cm). At intermittent-noise sites and control sites, we deployed five ARUs in a square formation, with one at each corner spaced 1.6 km apart, and one in the center positioned 1.2 km from each corner. At chronic-noise sites, we deployed six ARUs per site with the sixth ARU on an adjacent or opposite side of the noise source a minimum of 200 m from the central ARU. We assumed the detection radius of a single ARU would be reduced in noisy areas, so the additional ARU was deployed to increase the area surveyed near noise sources and to increase our sample size of the number of locations we surveyed with loud noise. In total, we deployed ARUs at 238 locations across 45 sites; however, one ARU failed to record completely, so we effectively surveyed 237 locations.
For comparison to the recognizers, we listened to a subsample of recordings by randomly selecting four dates for each site, and listening to the midnight recordings (each recording was 10 min in duration) on those dates from each ARU deployed at that site. We used Adobe Audition CS6 (Adobe Systems Inc., San Jose, CA, U.S.A.) to visualize each recording as a spectrogram to help locate and identify vocalizations while listening to recordings. Four trained researchers identified owls calling on the recordings, and assessed industrial noise on each recording based on the following index: no noise (noise code 0), low and distant (noise code 1), moderate (noise code 2), and very loud and close (noise code 3). We took the modal noise index from the recordings and used that as the noise-level ranking at that ARU location. Researchers also kept track of the amount of time it took to listen and transcribe the data from each recording. Researchers were trained using sample owl clips and practiced by listening to 25 example recordings prior to data collection. Any detections that they could not confidently identify were checked by a researcher with 2 yr of experience identifying owls on recordings (JS); JS also conducted random checks of recordings to ensure accuracy.
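Taking the modal noise code across the recordings from one ARU location can be sketched as below. The text does not say how ties between equally frequent codes were resolved, so the tie-breaking rule here (take the louder code) is purely an assumption, as is the function name.

```python
from collections import Counter


def modal_noise_index(noise_codes):
    """Return the most frequent noise code (0-3) among a location's recordings.

    Ties are broken by taking the higher (louder) code; this rule is an
    assumption, because the source does not describe tie handling.
    """
    if not noise_codes:
        raise ValueError("at least one noise code is required")
    counts = Counter(noise_codes)
    highest_count = max(counts.values())
    return max(code for code, count in counts.items() if count == highest_count)


# Four midnight recordings from one hypothetical ARU location.
print(modal_noise_index([0, 1, 1, 2]))
```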
Building Recognizers. We used the program Song Scope 4.1.3A (Wildlife Acoustics Inc., Maynard, MA, U.S.A.) to build recognizers to detect the territorial calls of three owl species: the two-phrased hoot of the Barred Owl (Odom and Mennill 2010b), the trill of the Boreal Owl, and the territorial hoot of the Great Horned Owl (Kinstler 2009). For the Barred Owl call, we created two different recognizers, one for the entire two-phrased hoot and one for the terminal two notes of the two-phrased hoot (Fig. 2). We used several clips from the field recordings of good quality calls we identified from listening and annotated them in Song Scope to be used as training data to build each recognizer (Supplemental Table 1 [online]). We considered calls to be “good quality” if they were produced near the microphone (i.e., had little attenuation) and were not masked by acoustic signals from other animals or abiotic noise. We used good quality calls from as many different locations within our study area as possible, rather than using many annotations from the same recording because increasing the number of locations can have a positive effect on the precision of a recognizer (Crump and Houlahan 2017). We used 51 annotations of the entire two-phrased hoot of the Barred Owl from 17 locations, 26 annotations of the terminal two notes of the Barred Owl two-phrased hoot from nine locations, 42 annotations of the Boreal Owl trill from seven locations, and 83 annotations of Great Horned Owl territorial hoots from eight locations (Supplemental Table 1).
When building a recognizer, the user can adjust the settings in Song Scope to improve signal detection of the annotated clips. We kept the sample rate, background filter, Fast Fourier Transform (FFT) window size, and overlap settings consistent across all recognizers (Supplemental Table 1). Based on the call properties of each species' call, we adjusted the minimum and maximum frequency and timing settings (maximum syllable length, gap between syllables, and maximum song length) to constrain the program to only identify candidate signals within these settings (Supplemental Table 1). Song Scope uses hidden Markov models to match recording segments to a recognizer template derived from training data and registers a hit when a similarity threshold is met (Wildlife Acoustics 2011). For each detected vocalization, Song Scope provides two values: a quality value (between 0.0 and 99.9) that indicates where the vocalization fits within a statistical distribution of parameters from the training data used to build the recognizer, and a score value (between 0.00 and 99.99) indicating the statistical fit of the vocalization to the recognizer model (Wildlife Acoustics 2011). The user sets a minimum quality and minimum score threshold each time a recognizer is run through a set of acoustic data. Lower thresholds lead to more hits and more false positives, but higher thresholds can lead to more false negatives; thus, the choice of whether to minimize false positives or false negatives should depend on the study objective (Crump and Houlahan 2017). We aimed to use these data to determine presence/absence of owls, so we wanted to minimize false positives while still detecting owls at all locations where we detected them by listening to recordings. 
From test runs on a small subset of our acoustic dataset, we settled on using a minimum quality setting of 50 and a minimum score setting of 60, and used these values for all four recognizers when scanning all recordings collected in 2014. Though we recorded in stereo, Song Scope scans one channel at a time, so we scanned the left channel only because there was no damage in the field to any of the microphones.
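The effect of the two thresholds can be illustrated with a small filter over candidate hits. Song Scope applies these thresholds internally, so this is only a sketch of the logic: the `Hit` record, its field names, and the assumption that thresholds are inclusive are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """Hypothetical record for one candidate detection."""
    recording: str
    quality: float  # 0.0-99.9, as described for Song Scope output
    score: float    # 0.00-99.99, as described for Song Scope output


def filter_hits(hits, min_quality=50.0, min_score=60.0):
    """Keep hits that meet both thresholds (inclusive comparison is an assumption)."""
    return [h for h in hits if h.quality >= min_quality and h.score >= min_score]


# Made-up candidate hits for illustration.
candidates = [
    Hit("rec_001", quality=72.5, score=65.10),
    Hit("rec_001", quality=45.0, score=80.00),  # fails the quality threshold
    Hit("rec_002", quality=55.0, score=59.99),  # fails the score threshold
]
print(len(filter_hits(candidates)))
```

Relaxing either threshold returns more candidates to verify, which mirrors the trade-off between false positives and false negatives described above.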
Processing Recognizer Results. The results output from each recognizer had a number of false positives (i.e., hits that were not the target owl species), so five trained researchers verified all hits generated by the program before compiling the data. As with the listening data, detections that researchers could not confidently identify were checked by JS, who also conducted random checks to ensure accuracy. To address our first objective, we quantified true positives and false positives to calculate the precision of each recognizer as the proportion of true positive hits out of the total number of hits. We quantified false negatives by determining the number of locations where owls were detected by listening but missed by the recognizers. We quantified false negatives at two spatial scales: at the scale of an individual ARU location, and at the scale of a site by pooling detections from all ARUs within a site. To estimate processing time, researchers kept track of the time to verify the hits for a subset of the data processed by the recognizers (a minimum of 13 sites for each recognizer). We estimated for each recognizer how many hits can be verified per minute of researcher processing time, and used this rate to calculate the total processing time of each recognizer based on the total number of hits generated by Song Scope. We then compared the total processing time of the recognizers to the total time spent listening to recordings.
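The quantities defined in this paragraph can be written out as a short sketch. The function names, location labels, and numbers are hypothetical; only the definitions (precision as true positives over total hits, false negatives as locations detected by listening but missed by the recognizer, and verification time as total hits divided by the verification rate) follow the text.

```python
def precision(true_positives, false_positives):
    """Proportion of hits that were the target species."""
    total_hits = true_positives + false_positives
    if total_hits == 0:
        raise ValueError("precision is undefined when there are no hits")
    return true_positives / total_hits


def missed_by_recognizer(listening_detections, recognizer_detections):
    """Units (ARU locations or sites) detected by listening but not by the recognizer."""
    return set(listening_detections) - set(recognizer_detections)


def pool_to_sites(aru_locations, site_of):
    """Pool ARU-level detections to the site scale using a location-to-site lookup."""
    return {site_of[location] for location in aru_locations}


def verification_hours(total_hits, hits_per_minute):
    """Total verification time: hits divided by hits verified per minute, in hours."""
    return total_hits / hits_per_minute / 60.0


# Hypothetical example: two sites, three ARU locations.
site_of = {"A1": "A", "A2": "A", "B1": "B"}
listening = {"A1", "B1"}
recognizer = {"A2"}
print(missed_by_recognizer(listening, recognizer))
print(missed_by_recognizer(pool_to_sites(listening, site_of),
                           pool_to_sites(recognizer, site_of)))
```

In this made-up case the recognizer misses location A1 but still detects the species at site A through A2, which is why false negatives can differ between the two spatial scales.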
To address our second objective, we assigned a noise index ranking for each ARU location by using the modal noise index from our assessment while listening to recordings. For each noise-level rank, we calculated the average number of total hits per ARU location. We also calculated the proportion of true hits (i.e., the target species) weighted by the total number of hits per ARU location for each noise-level rank. This weighted average is the average precision, and we compared these values between noise levels for each recognizer to assess whether increasing noise on recordings led to a decrease in precision of a recognizer. We also checked the noise levels of the locations where owls that were detected from listening to recordings were missed by the recognizers.
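Weighting each location's proportion of true hits by its total number of hits is equivalent to dividing the summed true hits by the summed total hits within a noise level. A minimal sketch, using a hypothetical input layout and made-up counts:

```python
from collections import defaultdict


def precision_by_noise(records):
    """Hit-weighted average precision for each noise index.

    `records` is an iterable of (noise_index, true_hits, total_hits) tuples,
    one per ARU location (a hypothetical layout). Noise levels with no hits
    are omitted because precision is undefined there.
    """
    true_sum = defaultdict(int)
    total_sum = defaultdict(int)
    for noise_index, true_hits, total_hits in records:
        true_sum[noise_index] += true_hits
        total_sum[noise_index] += total_hits
    return {
        noise: true_sum[noise] / total_sum[noise]
        for noise in total_sum
        if total_sum[noise] > 0
    }


# Made-up counts: two quiet locations and one low-noise location.
example = [(0, 5, 10), (0, 0, 10), (1, 3, 4)]
print(precision_by_noise(example))
```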
Occupancy Analysis. Occupancy modeling uses repeated observations at sites to estimate detectability and account for imperfect detection when estimating the probability of species occurrence (MacKenzie et al. 2002). To address our third objective, to compare results of owl habitat associations based on different survey methods, we ran single-species single-season occupancy models (MacKenzie et al. 2002) for each owl species separately using the package “unmarked” (Fiske and Chandler 2011) in R version 3.3.3 (R Core Team 2017) using RStudio version 1.0.143 (RStudio Team 2017). For the occupancy models, we compiled detection histories from the data obtained from listening and the data from the recognizers. The listening data consisted of four recordings from each ARU at a site, so we compiled a detection history for each ARU location from these four “sampling occasions.” For the data obtained from the recognizers, we used each day as a separate “sampling occasion.” All ARUs were deployed for a minimum of 13 d, so we compiled a detection history of each ARU location for 13 sampling occasions, each consisting of a 24-hr period. ARUs with complete recording failures were not included in the dataset (n = 1). ARUs that failed at some point during the deployment (n = 12) and did not record for all 13 d were indicated in the detection history as “missed surveys” on days that they did not record. Occupancy modeling is able to deal with these “missed surveys” as long as they are indicated in the detection history. Owls are unlikely to be found consistently within the area around an ARU due to movement, so occupancy models at this scale represent use of the area as opposed to occupancy per se. Hereafter, we use the term “use” to represent the probability of an owl using the area we surveyed with an ARU during the breeding season.
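Compiling a detection history with missed surveys can be sketched as follows. The function name and the day-numbering scheme are hypothetical, and `None` stands in for the missing-value code that an occupancy package would expect (for example, `NA` in R).

```python
def detection_history(detection_days, recorded_days, n_occasions=13):
    """Build a per-day detection history for one ARU location.

    Days are numbered 1..n_occasions. Each entry is 1 (owl detected),
    0 (recorded, no detection), or None (ARU did not record: a missed survey).
    """
    history = []
    for day in range(1, n_occasions + 1):
        if day not in recorded_days:
            history.append(None)
        elif day in detection_days:
            history.append(1)
        else:
            history.append(0)
    return history


# Hypothetical ARU that failed after day 10, with detections on days 2 and 5.
print(detection_history({2, 5}, set(range(1, 11))))
```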
We ran occupancy models with forest composition as an independent continuous variable of the occupancy parameter (i.e., seasonal use) to compare results of owl habitat use across methods (data extracted from listening or using a recognizer). We ran separate models for each species and for each extraction method. We extracted data on forest composition in ArcGIS 10.3.1 (Environmental Systems Research Institute, Inc., Redlands, CA, U.S.A.) by calculating the percent of coniferous forest weighted by area from the Alberta Vegetation Inventory (AVI) within an 800-m-radius buffer around each ARU location. We used an 800-m-radius buffer because this approximated the maximum detection radius of an ARU to detect owls calling (Yip et al. 2017). For Barred Owls, we included a quadratic term for percent coniferous forest, because previous research indicates they prefer mixed wood forests (Mazur et al. 1998, Russell 2008). For Boreal Owls and Great Horned Owls, we did not include a quadratic term for percent coniferous forest, as Boreal Owls prefer coniferous forests (Hayward et al. 1993, Lane et al. 2001) and Great Horned Owls are found in a wide variety of forest types (Johnsgard 2002). We compared the estimates of use in response to forest composition between the listening and recognizer acoustic datasets. We ran occupancy models for each of the Barred Owl recognizer templates and compared them to determine if different biological inferences would be drawn about Barred Owl habitat use based on a different type of data collection. We did not include forest age as a covariate in our models, because initial analyses with forest age extracted from the AVI layer suggested it was not a good predictor of occupancy for any of the three owl species. This was likely due to limited sampling in young forest stands. 
Mean forest age around each ARU ranged from 21–153 yr (overall mean of 92 yr), but 96% of stations were surrounded by mature forest (>50 yr old), and 84% of stations were surrounded by old forest (>80 yr old).
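For readers unfamiliar with the model structure, the single-season occupancy likelihood for one detection history can be sketched in a few lines. This is not the implementation in the “unmarked” package; it assumes a constant detection probability, a logit link on the raw percent-coniferous scale, and hypothetical coefficient values chosen only to show how a negative quadratic term produces a peak at intermediate forest composition.

```python
import math


def logistic(x):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))


def use_probability(pct_conifer, b0, b1, b2=0.0):
    """Probability of use as a logit-linear (optionally quadratic) function of % conifer."""
    return logistic(b0 + b1 * pct_conifer + b2 * pct_conifer ** 2)


def history_likelihood(history, psi, p):
    """Likelihood of one detection history; None entries (missed surveys) are skipped."""
    surveyed = [y for y in history if y is not None]
    used_and_observed = psi
    for y in surveyed:
        used_and_observed *= p if y == 1 else (1.0 - p)
    if any(y == 1 for y in surveyed):
        return used_and_observed
    # No detections: either used but never detected, or not used at all.
    return used_and_observed + (1.0 - psi)


# Hypothetical coefficients giving highest use in mixed forest (peak at 50% conifer).
for pct in (0, 50, 100):
    print(pct, round(use_probability(pct, b0=-2.0, b1=0.1, b2=-0.001), 3))
```

Setting `b2` to zero gives the monotonic form used here for Boreal Owls and Great Horned Owls, while a negative `b2` gives the hump-shaped response considered for Barred Owls.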
We listened to a total of 944 recordings, approximately 157 hr of audio data. Each 10-min recording took an average of 11 min to listen and record data, and from this we estimated that listening took approximately 174 hr. Song Scope scanned 84,516 recordings (approximately 14,086 hr of audio data), and this scanning process was repeated for each of the four recognizers. The amount of processing time required for trained observers to check the output results varied among the recognizers due to differences in verification rate (the number of hits observers could check per minute) and the total number of hits (Table 1). Each of the recognizers generated a number of true positives and false positives and the precision of each recognizer varied widely. Both Barred Owl recognizers had the highest number of total hits and the lowest precision, whereas the Great Horned Owl and Boreal Owl recognizers had fewer hits and greater precision (Table 1). Total verification time for each recognizer, calculated by dividing the total number of hits by the verification rate, was lowest for the Great Horned Owl recognizer and highest for the Boreal Owl recognizer (Table 1). The total verification time summed across all four recognizers was approximately 30 hr, which was substantially less than the 174 hr spent listening to a small subset of the total recordings collected.
Table 1. Output results from Song Scope for each owl recognizer, and the time necessary to verify the output.
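The audio volumes reported above follow directly from the number of recordings and the 10-min recording length, as the short calculation below shows. The helper name is hypothetical. Note that the rounded average of 11 min per recording gives roughly 173 hr of listening, in line with the reported figure of approximately 174 hr, which presumably reflects the unrounded average.

```python
def hours(n_recordings, minutes_each):
    """Convert a count of recordings and a per-recording duration to hours."""
    return n_recordings * minutes_each / 60.0


listened_audio_hr = hours(944, 10)     # audio listened to by researchers
listening_effort_hr = hours(944, 11)   # using the rounded 11-min average
scanned_audio_hr = hours(84516, 10)    # audio scanned by each recognizer

print(round(listened_audio_hr), round(listening_effort_hr), round(scanned_audio_hr))
```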
We compared the locations where each owl species was detected by listening to where they were detected by the recognizers to determine false negatives. The recognizers increased the number of locations where Barred Owls and Great Horned Owls were detected, but detected Boreal Owls at slightly fewer locations compared to the listening data (Table 2). The recognizers for Barred Owls (full two-phrased hoot) and Great Horned Owls detected these owls at all sites where they were detected by listening to the recordings; however, the Boreal Owl recognizer failed to detect this species at four sites where they were detected by listening. All recognizers missed owls at some ARU locations, but for Barred Owls (full two-phrased hoot) and Great Horned Owls there were very few locations missed, whereas for Boreal Owls the number of locations missed was considerably higher (Table 2).
Table 2. Assessment of false negatives of owl recognizers detecting calls recorded on autonomous recording units (ARUs) deployed at 237 locations within 45 sites surveyed in the spring of 2014. Sites and ARU locations missed by the recognizer had detections from listening, and sites and ARU locations added by the recognizer were not detected from listening to a subset of recordings.
The two Barred Owl recognizers performed similarly when compared to the data from listening. The terminal note recognizer detected Barred Owls at one more site and ARU location than the full two-phrased hoot recognizer, but had lower precision (Table 2). Despite the similarity of the total number of sites and ARU locations with Barred Owl detections, when the results of the two recognizers were directly compared, it became evident that the two recognizers did not detect the target species at all the same locations (Table 3). When the results of both recognizers were pooled, this yielded the highest number of locations with Barred Owl detections (Table 3).
Table 3. Comparison of the performance of the two Barred Owl recognizers based on the number of sites and autonomous recording unit (ARU) locations where this species was detected using each recognizer.
Of the locations we surveyed with ARUs, 101 had no noise, 79 had low noise, 32 had moderate noise, and 25 had loud noise present on the recordings. For all four recognizers, there were very few hits in total at loud locations (noise index of 3; Fig. 3). For the two Barred Owl recognizers, the greatest number of hits occurred at ARU locations with low levels of noise (index of 1); however, for the Boreal Owl and Great Horned Owl recognizers, there was little difference in the number of hits regardless of whether there was no noise or moderate noise on the recordings (Fig. 3). The average precision for the two Barred Owl recognizers was slightly lower at locations with low noise compared to locations with no noise (Fig. 4). There were no detections of Barred Owls at any ARU locations with moderate or loud noise (Fig. 4). The average precision of the Boreal Owl recognizer was consistent across noise level indices 0 to 2, and there were no hits at all at locations with loud noise (Fig. 4). The Great Horned Owl recognizer also had consistent precision across noise level indices 0 to 2 (Fig. 4). Oddly, the Great Horned Owl recognizer had higher precision at locations with loud noise (Fig. 4). Closer inspection of these results revealed that there were only two ARU locations at this noise level with hits from the recognizer, and for all of them the calls of a Great Horned Owl were clearly audible despite the loud noise. Of the three locations where the recognizers missed Barred Owls and Great Horned Owls where they were detected from listening to recordings (Table 2), two had no noise and one had low noise. Of the 24 ARU locations where Boreal Owls were missed (Table 2), 16 ARU locations had no noise, four ARU locations had low noise, and four ARU locations had moderate noise. Of the four sites where the recognizer missed Boreal Owls, three were sites with a source of chronic noise and one was a site with no noise.
We compared the results of the occupancy models from the recognizer and listening data for each species with forest composition as a covariate to assess whether conclusions drawn about owl habitat use were consistent across acoustic methods. For Barred Owls, we found very similar patterns of habitat use across percent coniferous forest for the two recognizer templates (Fig. 5). From both recognizers, we found that Barred Owl habitat use is highest for forests with a mix of deciduous and coniferous trees (Fig. 5). In contrast, the results from the listening data show a less clear pattern and much lower estimates of probability of use (Fig. 5). For Boreal Owls, we found a similar pattern of increasing habitat use as forests increase in the proportion of coniferous trees for both the recognizer and listening data; however, the listening data had consistently higher overall estimates of probability of use (Fig. 5). Great Horned Owls had similar estimates of probability of use across the range of forest composition, but we saw a dramatic difference in the precision of these estimates between the recognizer data and listening data (Fig. 5). The 95% confidence intervals for the listening data are very large, and this is likely due to low detection of Great Horned Owls by listening to recordings and few repeat detections at the same ARU location.
Using ARUs to conduct passive surveys facilitates collecting acoustic data over longer time scales but leads to large volumes of data that can be very time-consuming to process. Recognizers can potentially provide a solution by scanning the data to search for and identify calls of a target species. For large monitoring projects, it would not be feasible to listen to more than a few recordings per location, leading to a small sample size. Using recognizers allows more recordings to be processed, which can increase the sample size and the number of detections of the target species. Before recognizers can be employed in monitoring or research projects, it is important to test their performance. Our overall objective was to test the utility of using recognizers for three owl species found in North America. The first step was to assess performance of recognizers based on precision, false negatives, and processing time. The Great Horned Owl and Boreal Owl recognizers had relatively high precision, similar to what has been reported for other recognizers for bird and amphibian species using the same software (Buxton and Jones 2012, Brauer et al. 2016). Both Barred Owl recognizers we built had much lower precision, but that precision was similar to that of a recognizer built to identify Wild Turkey (Meleagris gallopavo) calls using the same Song Scope software (Colbert et al. 2015). At first glance, the Barred Owl recognizers may seem less useful because of their low precision and high number of false positives. Precision is useful in comparing different recognizers (Crump and Houlahan 2017); however, it is not necessarily the best metric to assess performance of a recognizer. If recognizers are very precise, then they have few false positives but potentially more false negatives.
Quantifying the number of false negatives is important to assess recognizer performance because it indicates what the recognizer is missing. Listening and/or visually scanning recordings are currently the best options to determine what the recognizer is missing (Buxton and Jones 2012, Brauer et al. 2016, Campos-Cerqueira and Aide 2016, Crump and Houlahan 2017). However, it can be difficult to objectively assess when a recognizer has truly missed a vocalization. For example, a study on several seabird species visually scanned recordings but only considered decent-quality calls not detected by the recognizer as false negatives (Buxton and Jones 2012). Our general impression is that the signal-to-noise ratio likely has an effect on the rate of false negatives, although we did not examine each missed vocalization to assess if it was faint. Our objective was to evaluate the utility of using automated recognizers to assess owl presence or absence at each survey location, so knowing the number of locations where recognizers missed owls was more important than determining how many recordings or how many calls on the recordings were missed. Given this objective, the recognizers for Barred Owls and Great Horned Owls performed well considering a relatively small number of locations were missed. The Boreal Owl recognizer did not perform as well and detected this species at fewer locations than listening, an example of the trade-off between precision and false negatives (Crump and Houlahan 2017). If a recognizer is less precise, then it will generate more false positives but potentially fewer false negatives, which was the case for the Barred Owl recognizers. The Great Horned Owl recognizer appeared to balance this trade-off to an extent, since it had relatively high precision and few false negatives. 
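The location-level comparison described above can be sketched in Python as a set difference. This is an illustration under stated assumptions, not the procedure we used: the function name `location_false_negatives` and the ARU location labels are hypothetical, and the comparison is only meaningful for locations covered by both methods.

```python
def location_false_negatives(listening_detections, recognizer_detections):
    """Locations where listening detected the species but the recognizer did not."""
    return set(listening_detections) - set(recognizer_detections)

# Hypothetical location labels where a species was detected by each method
listening = {"ARU-01", "ARU-02", "ARU-05"}
recognizer = {"ARU-01", "ARU-03", "ARU-05", "ARU-07"}

# Locations missed by the recognizer (false negatives at the location level)
missed = location_false_negatives(listening, recognizer)

# Locations detected only by the recognizer, e.g. because it scanned more recordings
gained = recognizer - listening

print(sorted(missed), sorted(gained))
```

Counting missed locations rather than missed calls matches an objective of assessing presence or absence at each survey location, because a single verified hit is enough to record a location as occupied.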
To reduce the false negatives for the Boreal Owl recognizer, we could try decreasing the score threshold, but this would likely decrease the precision and increase processing time. Another option would be to combine the two approaches, e.g., use the recognizer first and then subsequently listen to a subset of recordings at ARU locations where Boreal Owls were not detected by the recognizer. Despite some shortcomings, the benefit of using the recognizers to determine presence/absence of owls was that we were able to process many more recordings and obtain a much larger dataset of owl detections than would have been possible with either listening to recordings only or conducting owl field surveys without using ARUs.
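The expected effect of lowering the score threshold can be illustrated with a small Python sketch. The hit scores, location labels, thresholds, and the function `evaluate_threshold` are all hypothetical; the score scale is arbitrary and is not meant to reproduce the scoring of any particular software.

```python
def evaluate_threshold(hits, threshold, listening_locations):
    """Summarize recognizer output at a given score threshold.

    hits: list of (location, score, is_true_positive) tuples.
    listening_locations: locations where the species was heard by listening.
    """
    kept = [h for h in hits if h[1] >= threshold]
    n_true = sum(1 for h in kept if h[2])
    # Precision is undefined when no hits pass the threshold
    precision = n_true / len(kept) if kept else None
    detected = {loc for loc, _, is_tp in kept if is_tp}
    return {
        "hits_to_verify": len(kept),  # proxy for verification time
        "precision": precision,
        "missed_locations": set(listening_locations) - detected,
    }

# Made-up hits: (location, score, verified as a real call)
hits = [
    ("A", 80, True), ("A", 55, False),
    ("B", 62, True), ("B", 50, False),
    ("C", 52, True), ("C", 51, False),
]
listening_locations = {"A", "B", "C"}

strict = evaluate_threshold(hits, 60, listening_locations)
lenient = evaluate_threshold(hits, 50, listening_locations)
print(strict)
print(lenient)
```

In this invented example the stricter threshold yields fewer hits to verify and higher precision but misses location C, whereas the lower threshold recovers all three locations at the cost of three times as many hits to check.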
Each recognizer differed in the estimated processing time required for trained researchers to check the output results, likely due in part to the ratio of true positives to false positives. Currently, the best way to deal with false positives is to have trained observers review the computer output to filter them out before analyzing the data (Holmes et al. 2014, Colbert et al. 2015, Celis-Murillo et al. 2016). We were able to process the output in a reasonable amount of time (4 to 12 hr per recognizer), and it took substantially less time than listening to a relatively small subset of the recordings. However, our estimate of processing time does not take into account the time required for the computer to scan the recordings. It was not always possible to know when the software finished scanning without monitoring it regularly, and the amount of time was dependent on the processing capability of the computer used. We used multiple computers with different processing capabilities, making it difficult to provide an estimate of the time spent scanning. However, it took a substantial amount of time to run the data collected through the recognizers (on the order of several weeks), and this process had to be repeated to obtain results from all recognizers. Although it took much longer to listen to a subset of recordings, it was possible to obtain data on all owl species heard calling from listening to each recording only once. Listening to recordings was also necessary to obtain good quality clips of owl calls before starting the process of building a recognizer. For small audio datasets (<40 hr of recordings), it can be more efficient to listen to recordings, but this advantage disappears once datasets become larger (E. Knight pers. comm.). Our results indicate that for such a large dataset (approximately 14,086 hr of recordings) there is a significant benefit of using recognizers in terms of processing time.
There is a general perception that noisy recordings can be problematic for automated species recognition, but few studies have attempted to address this directly. We surveyed for owls in areas with varying levels of industrial noise and found that for the Barred Owl and Great Horned Owl recognizers, no sites were missed and the ARU locations missed had low levels of industrial noise or none at all. The Boreal Owl recognizer missed some ARU locations with moderate and low noise, but the majority of ARU locations missed had no noise on the recordings. However, most of the sites missed for Boreal Owls were sites with sources of chronic noise. The precision of the recognizers did not appear to be strongly affected by the presence of industrial noise. Industrial noise on the recordings was predominantly below 1000 Hz, which overlaps substantially with the frequency range of the calls of all three owl species. There were no detections of Barred Owls from the recognizers at locations with moderate or loud noise; this could be due to difficulties of the recognizers in detecting the calls, but we did not detect them by listening to recordings either, and thus Barred Owls may not be present in noisy areas. Similarly, Boreal Owls may not be present in areas with loud noise, which could explain the lack of detections from either the recognizer or listening to recordings. Boreal Owls were missed at several locations with moderate noise, indicating that they were sometimes present and calling in moderately noisy areas and that the recognizer was not always able to detect them there, potentially due to the frequency overlap with industrial noise. Great Horned Owls were detected in very noisy areas, and the recognizer appeared to be able to detect them despite the noise overlapping their calls. Overall, our results suggest recognizers can function in non-ideal recording environments to some extent.
We found that the two different recognizer templates we tested to detect Barred Owl calls performed similarly. We initially thought that, because of the length and variability of the two-phrased hoot of the Barred Owl, using the full call as the template might be less effective for automated recognition. The recognizer using the terminal two notes of the call had lower precision, and the increased number of false positives led to a greater total processing time. Both recognizers detected Barred Owls at a similar number of locations, but interestingly these locations did not completely overlap and each recognizer detected owls at a few different locations. Nevertheless, compared to the listening data, both recognizers had few false negatives and produced nearly identical estimates of habitat use across a range of forest composition. These results suggest that biological inference on habitat use by owls is robust to changes in the template used to build a recognizer. Automated recognition approaches are still relatively new, and because there is no established methodology for building recognizers, it is important to explore potential differences in biological inferences from different methods. Other studies that have sought to identify best practices in building recognizers in Song Scope have focused on score threshold (Brauer et al. 2016), amount of training data, and temporal/spectral settings (Crump and Houlahan 2017). Our work contributes to this field and suggests that, for species with long vocalizations, using a template of the entire vocalization does not negatively influence the effectiveness of automated recognition.
Acoustic surveys are often used to determine habitat associations of vocalizing species. For owl monitoring programs, it is therefore important to determine whether results on owl habitat use from automated recognition are consistent with results based on listening to recordings. Barred Owls tend to be found most frequently in mixed wood forests (Mazur et al. 1998, Russell 2008), and our results of habitat use from both recognizers are congruent with the literature; however, the preference for mixed woods was less apparent in the results from the listening data. Boreal Owls tend to be found in more coniferous forests (Hayward et al. 1993, Lane et al. 2001), and we found a similar pattern of increasing habitat use in more coniferous forests for both the recognizer and listening data. Although the estimates of the probability of use by Boreal Owls are higher from the listening data across the range of forest composition, the trend is very similar for the recognizer data and thus we would draw similar conclusions about preferred habitat from either dataset. Great Horned Owls are habitat generalists and use a wide variety of different habitats across North America (Laidig and Dobkin 1995, Bennett and Bloom 2005, Grossman et al. 2008). It is therefore not surprising that our results suggest that Great Horned Owls are equally likely to use areas across a range of forest composition. Although the estimates of habitat use were similar for both methods for Great Horned Owls, the precision of the estimates from the occupancy models was much better using the recognizer dataset. Our results suggest that using automated recognition can lead to similar biological inferences in terms of owl habitat use, and can be preferable for obtaining more precise estimates when using occupancy models.
The recognizers we built for the different owl species worked well, and in our opinion their performance was adequate to determine presence or absence of owls within a study area. Our approach could assist in scanning recordings to assess fine-scale habitat preferences and estimate density by localizing individuals in microphone arrays, where each ARU is synchronized using the time on a Global Positioning System (GPS) attachment (Mennill et al. 2012). We used one particular software package to test the utility of automated recognition of owl calls, but there are several other software options available (e.g., the R package monitoR [Katz et al. 2016], Raven Pro by the Cornell Lab of Ornithology). New software and new techniques are likely to be developed as the field of bioacoustics progresses, so we stress that this study is not intended to demonstrate the utility of the particular software that we used. We argue that given the relatively low detection rates of owls by listening to recordings, using an automated recognition approach is likely to be highly useful for monitoring and studying owls. However, the output needs to be verified to remove false positives in the data. Despite the time needed to verify the output, we have clearly demonstrated the efficiency that can be gained by using recognizers for these owl species, and we suggest that similar increases in efficiency could be obtained with recognizers built for other owl species.
We thank members of the Bayne lab, D. Wilson, and an anonymous reviewer for helpful comments on an earlier draft of this report. We thank A. MacPhail, M. Knaggs, and S. Wilson for their assistance in the field. We thank M. Foisy, C. Charchuk, and S. Tkaczyk for their assistance developing the recognizers. We thank N. Boucher and students and volunteers who listened to recordings and checked the output of the recognizers. We thank H. Lankau for organizing the field recordings and maintaining the database. This work was supported by funding from the Natural Sciences and Engineering Research Council, the Northern Scientific Training Program, the University of Alberta North program, the Alberta Conservation Association, the Environmental Monitoring Committee of the Lower Athabasca, Nexen Energy, Canadian Oil Sands Innovation Alliance, and the Oil Sands Monitoring program operated jointly by Alberta Environment and Parks, and Environment and Climate Change Canada. Due to the passive nature of our research, we were not required to obtain animal care approval or other permits/licenses.