Staggered-Entry Analysis of Breeding Phenology and Occupancy Dynamics of Arizona Toads from Historically Occupied Habitats of New Mexico, USA

For species with variable phenology, it is often challenging to produce reliable estimates of population dynamics or changes in occupancy. The Arizona Toad (Anaxyrus microscaphus) is a southwestern USA endemic that has been petitioned for legal protection, but status assessments are limited by a lack of information on population trends. Also, timing and consistency of Arizona Toad breeding varies greatly, making it difficult to predict optimal survey times or effort required for detection. To help fill these information gaps, we conducted breeding season call surveys during 2013–2016 and 2019 at 86 historically occupied sites and 59 control sites across the species' range in New Mexico. We estimated variation in mean dates of arrival and departure from breeding sites, changes in occupancy, and site-level extinction since 1959 with recently developed multi-season staggered-entry models, which relax the within-season closure assumption common to most occupancy models. Optimal timing of surveys in our study areas was approximately 5–30 March. Averaged across years, estimated probability of occupancy was 0.58 (SE = 0.09) for historical sites and 0.19 (SE = 0.08) for control sites. Occupancy increased from 2013 through 2019. Notably, even though observer error was trivial, annual detection probabilities varied from 0.23 to 0.75 and declined during the study; this means naïve occupancy values would have been misleading, indicating apparent declines in toad occupancy. Occupancy was lowest during the first year of the study, possibly due to changes in stream flows and conditions in many waterbodies following extended drought and recent wildfires. Although within-season closure was violated by variable calling phenology, simple multi-season models provided nearly identical estimates as staggered-entry models. Surprisingly, extinction probability was unrelated to the number of years since the first or last record at historically occupied sites. Collectively, our results suggest a lack of large, recent declines in occupancy by Arizona Toads in New Mexico, but we still lack population information from most of the species' range.

For species with variable phenology, it is often challenging to produce reliable estimates of population dynamics or changes in occupancy. The Arizona Toad (Anaxyrus microscaphus) is a southwestern USA endemic that has been petitioned for legal protection, but status assessments are limited by a lack of information on population trends. Also, timing and consistency of Arizona Toad breeding varies greatly, making it difficult to predict optimal survey times or effort required for detection. To help fill these information gaps, we conducted breeding season call surveys during 2013-2016 and 2019 at 86 historically occupied sites and 59 control sites across the species' range in New Mexico. We estimated variation in mean dates of arrival and departure from breeding sites, changes in occupancy, and site-level extinction since 1959 with recently developed multi-season staggered-entry models, which relax the within-season closure assumption common to most occupancy models. Optimal timing of surveys in our study areas was approximately 5-30 March. Averaged across years, estimated probability of occupancy was 0.58 (SE ¼ 0.09) for historical sites and 0.19 (SE ¼ 0.08) for control sites. Occupancy increased from 2013 through 2019. Notably, even though observer error was trivial, annual detection probabilities varied from 0.23 to 0.75 and declined during the study; this means naïve occupancy values would have been misleading, indicating apparent declines in toad occupancy. Occupancy was lowest during the first year of the study, possibly due to changes in stream flows and conditions in many waterbodies following extended drought and recent wildfires. Although within-season closure was violated by variable calling phenology, simple multi-season models provided nearly identical estimates as staggered-entry models. Surprisingly, extinction probability was unrelated to the number of years since the first or last record at historically occupied sites. Collectively, our results suggest a lack of large, recent declines in occupancy by Arizona Toads in New Mexico, but we still lack population information from most of the species' range. N ATURAL history museums and archived data are valuable resources for determining changes in species distributions or how common they are across landscapes (Sullivan, 1993;Shaffer et al., 1998;Tingley and Beissinger, 2009;Meineke et al., 2019). Reliance on surveys of historically occupied sites based on museum records may often overstate declines, however, because assuming a non-zero patch extinction rate, occupancy of historically occupied sites can only decrease over time (Skelly et al., 2003;Fournier et al., 2019). Sampling a combination of historically occupied sites plus control sites selected without prior knowledge of a species' presence can provide reliable information on changes over time and unbiased estimates of current occupancy (Skelly et al., 2003;Weiser et al., 2020).
The challenge of producing reliable population estimates at local and regional scales is magnified for species with variable phenology, such as the Arizona Toad (Anaxyrus microscaphus). The Arizona Toad primarily breeds in slowflowing streams of the southwestern United States. Their breeding season can begin from February to April, depending on elevation, and is initiated by warming nighttime temperatures rather than rainfall (Schwaner and Sullivan, 2009). Arizona Toads breed during a four-to eight-week period in early spring when males call from streamside habitats to attract females (Blair, 1955;Sullivan, 1992;Ryan et al., 2015). Timing and consistency of Arizona Toad breeding varies greatly throughout its range (Blair, 1955;Sullivan, 1992;Degenhardt et al., 1996), however, making it difficult to predict optimal survey times and the number of surveys required to be confident that males occupying a patch of breeding habitat will be detected.
The Arizona Toad has been petitioned for protection under the U.S. Endangered Species Act because of threats from hybridization with Woodhouse's Toads (A. woodhousii), changes in hydrological regimes, and drought (Bradford et al., 2005;Schwaner and Sullivan, 2009;USFWS, 2015;Wooten et al., 2019). Suspected declines have been reported from Nevada, Arizona, and New Mexico, and the species is a Species of Greatest Conservation Need in all four states where it occurs, which includes Utah (Bradford et al., 2005;AZGFD, 2015;NVDW, 2015;UDWR, 2015;NMDGF, 2016). Due in part to the unusual natural history of Arizona Toads among amphibians in the region, conservation assessments are confounded by a lack of information on population status and insufficient evidence of presumed threats.
Accounting for variation in phenology and detection, and determining optimal timing and survey effort, is critical for producing reliable population status information for the Arizona Toad. To help fill information gaps, we conducted breeding season call surveys during 2013-2016 and 2019 across the range of the species in southwestern New Mexico. We surveyed 86 sites where museum records indicated presence of toads during opportunistic collections from 1959-2009 (hereafter, historical sites) and 59 nearby control sites where we had no knowledge of previous occurrence by toads. We used recently developed staggered-entry multiseason occupancy models to help account for variation in calling activity (Kendall et al., 2013;Chambert et al., 2015). The staggered-entry design is like other multi-season occupancy models that estimate site occupancy, colonization, and extinction while accounting for imperfect detection, but it relaxes the within-season closure assumption by incorporating additional entry and departure parameters. These entry and departure parameters allow estimation of the true state of patches that can transition to occupied (entry) or unoccupied (departure) between surveys (Kendall et al., 2013;Chambert et al., 2015).
To provide guidance for future surveys of Arizona Toads and other species with similar variation in phenology, we compared results from staggered-entry models to those from standard multi-season occupancy models (MacKenzie and Royle, 2005). With the standard multi-season model, entry and departure from patches or variation in calling activity (i.e., true within-season variation in the ability to detect a species) is considered detection error and may provide a less accurate accounting of patch dynamics than staggered-entry models. But when estimating species' conservation status or population trends is of primary interest rather than characterizing variation in phenology, the simple multi-season model-which estimates fewer parameters than the staggered-entry design-may be more desirable because it can produce more precise estimates (Chambert et al., 2015). We compared three sets of dynamic occupancy models with similar covariates to (1) estimate annual occupancy of Arizona Toads during 2013-2019, especially with regard to site type (historical vs. control sites) and habitat type (lotic vs. lentic); (2) describe temporal variation in calling activity and how it affects detection; (3) compare occupancy estimates from staggered-entry and simple multi-season models; and (4) estimate the rate of change in occurrence at historically occupied sites.

MATERIALS AND METHODS
Study system.-In New Mexico, the Arizona Toad most often occurs between 1,800-2,700 m elevation in the upper Gila River Basin, including the Gila and San Francisco Rivers, the upper Mimbres River basin of the eastern Mogollon Rim, and in a disjunct population that occurs in the Rio Grande basin east of the Continental Divide in the San Mateo Mountains (Degenhardt et al., 1996;Jennings et al., 2010;Ryan et al., 2017). Most breeding in New Mexico occurs from early March to late April, when males call from stream margins, side channels, or outflows from reservoirs (Sullivan, 1992;Degenhardt et al., 1996). Arizona Toads will also occasionally breed in lentic stock tanks or similar isolated impoundments (M. Ryan, pers. obs.).
Historical records and site selection.-To identify historical survey locations, we queried eight regional and two national museum collections (Museum of Southwestern Biology, Western New Mexico State University, New Mexico State University, University of Kansas, University of Arizona, United States National Museum, Los Angeles County Museum, Texas Cooperative Wildlife Collection, University of Texas at El Paso, and the American Museum of Natural History) with the largest Arizona Toad collections (Degenhardt et al., 1996;Ryan et al., 2015Ryan et al., , 2017. See Ryan et al. (2017) for details on museum queries and specimens. There were six instances where separate collection events were within 1.5 km of each other. In these cases, we drew a centroid around the collection locations and assigned the center point as the historical survey site. The life stage of museum specimens (egg, larva, or metamorph) confirmed the presence of toad breeding activity at 75 percent of historical sites.
For control sites, we selected suitable waterbodies near historical sites. Even small groups of calling toads can be heard from~1.0 km (M. Ryan, pers. obs.). Therefore, all but 15 control sites (5 percent) were at least 1.5 km from historical sites. For the 15 exceptions, landscape or topographic features increased the effective distance and ensured sampling independence between sites. The median distance between historical sites and the nearest control site was 4.79 km (sd ¼ 7.76).
Calling surveys.-We conducted standardized surveys during 2013-2016 and 2019 at historical and control sites throughout most of the known species' range in southwestern New Mexico (mean elevation of sampled sites: 1,879 m; Fig. 1). Night calling surveys were conducted weekly during the breeding season, beginning in early March and ending late May (only 10 surveys occurred during May). We divided the approximately 21,727 km 2 sampling area into two broad units: one encompassed the Mimbres River drainage and Middle, East, and West Forks of the Gila River; the other encompassed the San Francisco River drainage and upper Gila River. Sampling occurred throughout the species' approximately eight-week seasonal breeding period. During each survey, a single observer listened for three minutes and noted the detection or non-detection of calling toads (Ryan et al., 2015). We stopped nightly surveys no later than 0030 hours, even though toads would call as late as 0200 hours.
The sampling design used during 2013-2016 precluded distinguishing detection error from real variation in calling activity at calling sites. Therefore, in 2019 we modified the survey design by having the single observer record (with a Sony PCM-10 recorder) their 3-minute sample session using a Wildtronics parabolic microphone directed toward the potential calling site (files are archived at the Museum of Southwestern Biology). An observer with no knowledge of field results later listened to each recording and assigned detection or non-detection of calling toads, providing a second survey of the same period that we used to evaluate detection error.
Analysis.-We used PRESENCE version 12.39 to perform three analyses that estimated initial occupancy, colonization, and extinction while accounting for imperfect detection. Our primary analysis was a staggered-entry multi-season model based on all surveys from 2013 through 2019. For initial occupancy and colonization, we evaluated the importance of site type (86 historical vs. 59 control) and habitat type (21 lentic vs. 124 lotic; Table 1). Given the low initial occupancy in 2013, we did not have enough information to estimate the effect of site or habitat type on extinction. We also evaluated trends (year treated as a continuous covariate) for coloniza-tion and extinction (Table 1). To model detection, we included year of sampling as either a trend or factor variable, plus the effects of cloud cover, moon, air temperature, and wind during call surveys. For staggered-entry models, detection is the probability of detecting toads in a site that was truly occupied. We included the effects of year   Sullivan, 1992;Degenhardt et al., 1996;Johnson and Batie, 2001;Corn et al., 2011;Weir et al., 2014 (continuous) and Julian date (day of year) to describe variation in entry and departure parameters (Table 1). We evaluated models in a step-down fashion, using AIC values to select the most parsimonious model (Doherty et al., 2012). We used coefficient effect sizes and confidence intervals to identify uninformative parameters (Arnold, 2010) and assessed the stability of covariate coefficients throughout the model table to identify potential confounding variables. For supported covariates, we used estimated coefficients to predict probabilities for model parameters across the range of collected data (MacKenzie et al., 2006). Confidence intervals for each predicted estimate were estimated based on the delta method (Seber, 1982), using packages msm (Jackson, 2011) and emdbook (Bolker, 2020) in program R v3.6.3 (R Core Team, 2020).
After identifying the best-supported staggered-entry model, we fit two subsequent models with the same covariates. First, given the relative complexity of implementing and interpreting staggered-entry models, we compared our bestsupported staggered-entry model to the more common simple multi-season occupancy model. With the standard multi-season design, entry and departure from patches (i.e., true within-season variation in the ability to detect a species) is a violation of model assumptions and is absorbed into detection (Chambert et al., 2015). Therefore, we included Julian date on the detection parameter for the simple multiseason models but otherwise held the model structure the same as the top-ranked staggered-entry model. Second, to provide insight into long-term rates of patch-level extinction for Arizona Toads, we re-fit the top-ranked model using only the 62 historical sites for which we had information on years of historical observations. We evaluated two added covariates for initial occupancy derived from museum records and archived field notes: years since first historical observation for a given site (11-57, mean ¼ 36.5, sd ¼ 11.8) and years since the most recent historical observation for a given site (10-56, mean ¼ 30.2, sd ¼ 11.1).

RESULTS
Calling surveys.-We conducted 1,986 calling surveys for Arizona Toads at 145 sites during 2013-2016 and 2019. We averaged 13.7 (sd ¼ 8.33) visits per site across all years, although the number of sites and visits varied between years ( Table 2). Comparison of detections from in-person surveys and recordings of 278, 3-minute surveys during 2019 revealed almost perfect concordance. There were only two cases that differed in reported toad detection, and both discrepancies were occasions where field personnel heard toads during the site visit but the independent observer that listened to the recording did not. This result indicates observer error in the field was near zero, and thus the most likely source of false negatives (assigning a patch as vacant when it was truly occupied) was from variation in toad calling activity. Based on these low error rates, we did not use data from the recordings in subsequent analyses.
Staggered-entry multi-season model.-The top model from our staggered-entry analysis indicated that detection varied by year (factor) and percent of moon visible (Table 3). Estimates from this model showed toad detection was highest in the first year of the study (2013) and decreased in all but one of the subsequent years (Table 4). While the covariate for moon visibility was in the top-ranked and several other highly ranked detection models, the estimated effect was small and uninformative (b ¼ -0.0001, 95% CI ¼ -0.0003-0.0001; Arnold, 2010). Based on entry and departure estimates from the best-supported staggered-entry model, the probability that toads were present and calling was highest from approximately 5 March to 30 March and peaked at 0.80 on 14 March (Fig. 2). This top-ranked model included the historical vs. control site covariate to describe variation in occupancy, but not habitat. Estimated probabilities of occupancy in 2013 were similar for historical sites (0.158, 95% CI ¼ 0.080-0.288) and control sites (0.067, 95% CI ¼ 0.016-0.241), but because of higher colonization rate at historical sites, occupancy diverged thereafter (Table 5, Fig.  3A).
Simple multi-season model.-When we re-fit the calling data to a simple multi-season occupancy model, yearly estimates of detection were generally lower than when estimated with the staggered-entry model, as expected (Table 4). Estimates from the simple multi-season model indicated the probability of detection declined through the season (b ¼ -0.006, 95% CI ¼ -0.014-0.0001; Fig. 4). All other parameter estimates from the simple multi-season model were similar to those from the staggered-entry model, resulting in nearly identical occupancy estimates over time (Table 5, Fig. 3B).
Historical site models.-Re-fitting the top staggered-entry model with only data from 62 historical sites and information on years since the first record for a given site (11-57 yrs) or years since the most recent record for a given site (10-56 yrs) showed that both models provided a similar fit to the data (DAIC ¼ 0.05; Table 3). The occupancy coefficient did not differ from 0 based on years since the first observation (b ¼ 0.0006, 95% CI ¼ -0.0045-0.0057) nor years since the most recent observation (b ¼ 0.0006, 95% CI ¼ -0.0045-0.0059; Supplemental Fig. 1; see Data Accessibility). Both historical site models estimated~21% of historically occupied sites were still occupied during 2013-2016 and 2019.

DISCUSSION
Surveys for Arizona Toads during 2013-2016 and 2019 covered approximately 85% of documented historical localities for the species in New Mexico and nearly its entire known range in the state. Based on surveys of 86 historical and 59 control sites, historical sites were approximately three times more likely to be occupied, and occupancy did not vary between lentic and lotic habitats. Analysis of the data using a staggered-entry framework also revealed variation in timing of when toads started (patch entry) or stopped calling (patch departure) and allowed us to identify optimal timing for future surveys. Our results also highlight that conclusions based on naïve results unadjusted for variation in detection probability would have greatly underestimated occupancy by the end of the study, especially for historically occupied sites. Our main staggered-entry analysis suggested that estimated occupancy of historically occupied sites increased sharply from a low of~16% in 2013 to~76% in 2019, while occupancy of control sites increased from~6% to~30% during the same time (Fig. 2). Although this increase is surprising, the result was consistent across all sub-models. Based on modeled estimates and naïve results (Table 2), 2013 was the year that toads were rarest. Year 2013 was also the year with the most intensive per-site survey effort and highest probability of detection (staggered-entry model), lending credence to the low occupancy estimates. Importantly, there was also a trend of lower detection during the study (Table 4), which caused an increasing mismatch between naïve and estimated occupancy over time.
We suspect low initial occupancy in 2013 resulted from extended drought that continued through the winter of 2015, and two wildfires that burned portions of the study area in 2011 (Miller Fire,~359 km 2 ) and 2012 (Whitewater-Baldy Fire,~1200 km 2 ; Ryan and Latella, 2013;Whitney et al., 2016;Gido et al., 2019). Approximately 50% of surveyed historical toad sites were dry during spring 2013 due to extended drought that reduced stream flows in parts of the Mimbres River drainage and small streams in the western part of the study area (Ryan and Latella, 2013;Gido et al., 2019). Breeding dynamics of the closely related Arroyo Toad (A. californicus) in southern California, which has similar habitat requirements as the Arizona Toad, are also strongly linked with drought-related variation in water availability in small streams (Sweet and Sullivan, 2005;Miller et al., 2012).
In 2013, many of the inundated sites in the West and Middle Forks of the Gila River also had heavy post-fire flooding and channel scouring that eliminated the clear, slow-flowing pools needed for toad breeding and tadpole development (Ryan and Latella, 2013;Gido et al., 2019). Stream and debris flows commonly increase after large wildfires in the southwestern USA (DeGraff et al., 2015;Sedell et al., 2015), sometimes causing loss of important habitat and localized extirpations of amphibians and other aquatic fauna (Hossack and Pilliod, 2011;Whitney et al., 2016;Zylstra et al., 2019). The combined effects of drought and habitat changes from post-fire flooding and sedimentation could have caused extinctions in patches in 2013 that were later recolonized, similar to rapid recovery of other amphibians and stream fishes that have evolved in stochastic environments (Miller et al., 2012;Whitney et al., 2016;Hossack et al., 2017;Gido et al., 2019;Zylstra et al., 2019). This evidence for localized extinction resulting from environmental variability highlights the importance of long-term studies, especially when resampling historically occupied sites in stochastic environments. Had our study concluded in 2013 or 2014, patch dynamics may have confounded longterm trends in occupancy, and low rates of occupancy in Table 3. Comparison of several staggered-entry occupancy models and the covariates used in each for occupancy (w), colonization (c), extinction (), entry (E), departure (D), and detection (P) parameters for the Arizona Toad, as well as the model structures for the simple multi-season (SMS) model and models based only on historical sites. Covariates included in the table are historical vs. control sites (Hist/Con), lentic vs. lotic sites (Len/ Lot), Julian date (Jul), percent moon cover (M), wind (Wi), air temperature (C, Celsius), year as a factor (YrF), and year as a trend (YrT). Staggeredentry models are ordered by delta AIC scores (DAIC), model weights (W), and -2 log likelihood values (LogLik).  The simple multi-season model did not incorporate the entry and departure parameters from the staggered-entry model, and temporal variation in calling activity was considered detection error. As a result, estimated probability of detection averaged 23% lower across years for the simple multi-season vs. staggered-entry models, and detection from the simple multi-season model varied greatly across years even based on the same model (e.g., 0.75 in 2013 vs. 0.23 in 2016; Table 4). However, the nearly perfect concordance of detections from live surveys and recordings of the same 3minute surveys in 2019 suggests variation in detection during the study was from toad calling activity, not observer error. When planning surveys, the lower end of detection estimates provides a safer margin for ensuring reliable information compared with using the mean, especially if few repeat surveys are conducted within a season or if a study will not extend for several years. Alternatively, removal designs that stop surveys of a site once a target species has been detected might provide a compromise between oversampling (years with high detection) and under-sampling (years with low detection; MacKenzie and Royle, 2005). Removal designs also allow allocation of effort to a larger number of sites, potentially increasing precision and better reflecting environmental and spatial variation.

SE model
Despite clear violation of the within-season closure assumption that is common to simple multi-season models-the assumption that staggered-entry models relax-the simpler model framework provided nearly identical occupancy and vital rate (colonization, extinction) estimates (Table 5). This similarity suggests if scientists are not explicitly interested in describing variation in phenology, the simpler model will suffice. Still, the staggered-entry approach provided useful insight into the variable calling phenology of Arizona Toads and optimal timing to detect them. Based on surveys during 2013-2016 and 2019, the derived probability of patch availability was highest during approximately 5 March to 20 March, which corresponded to the early portion of sampling efforts during most years (Fig.  2). Sampling during this optimal date range would have maximized the likelihood of encountering toads and decreased the likelihood of recording false absences.
Broad-scale surveys focused on historically occupied sites have provided some of the most compelling evidence of severe amphibian declines, including in the western USA (Clarkson and Rorabaugh, 1989;Corn et al., 1989;Drost and Fellers, 1996;Fisher and Shaffer, 1996). For example, during 1983-1987, the (now) federally threatened Chiricahua Leopard Frog (Rana chiricahuensis) was found at only two of 36 sites in Arizona where it was documented in the 1960s and 1970s (Clarkson and Rorabaugh, 1989). We expected refitting the top-ranked staggered-entry model with information on years since the first observation (11-57 yrs) and years since the most recent observation (10-56 yrs) of Arizona Toads at historically occupied sites would result in a steep extinction curve, with older sites the least likely to be occupied. However, occupancy was unrelated to time since the first or last record, indicating a site where toads were last documented 50 years ago was just as likely to be occupied as a site where toads were last documented 15 years ago (Supplemental Fig. 1; see Data Accessibility). Imprecise location information for some museum specimens, and a lack of information on whether some historical sites hosted breeding activity historically or simply had a vagrant adult, adds uncertainty to estimates for historically occupied sites. Nevertheless, the near-zero relationship between time and occupancy for these sites suggests a lack of widespread, recent population declines in the system. The lack of a relationship between age of record and extinction is notable because surveys of historically occupied sites are biased towards documenting extinction. Whether by chance or from changes in habitat or environmental conditions, populations are prone to extinction (Skelly et al., 2003;Fournier et al., 2019).
Overall, our results indicate calling activity (and thus detectability) of Arizona Toads in southwestern New Mexico is highly variable within and among years, but our results do not indicate large declines in patch occupancy in recent decades. These occupancy trends were reinforced by multiple model types and substantiated by a flat time-occupancy Fig. 2. Estimated probabilities of Arizona Toads entering sites between subsequent surveys (orange), given that they have not previously entered, based on the estimated relationship between entry and Julian date. Estimated probabilities of Arizona Toad departing sites between subsequent surveys (blue), given that they are already present, based on the estimated relationship between departure and Julian date. Yellow points show the probabilities that toads were available to be detected for each day. Julian values range from 1 March to 30 May. Table 5. Mean probabilities of colonization, extinction, and occupancy (SE) from the staggered-entry multi-season occupancy model and the simple multi-season occupancy model. As per our best-supported model structures, we estimated separate rates of colonization for historical and control sites, but constant rates of extinction for all site types. curve at historically occupied sites. Further, large increases in occupancy of historical and control sites since 2013, the year following an extended drought and post-fire habitat changes that likely made many breeding patches temporarily unsuitable, suggest that local breeding sites can be re-occupied by adults. Yet the large differences in detection probabilities among years also suggest populations may be small, and we lack information on whether most breeding sites reliably produce tadpoles that survive to metamorphosis. Importantly, we also still lack information on the status and trends of populations over most of the species' range.

DATA ACCESSIBILITY
Supplemental material is available at https://www. ichthyologyandherpetology.org/h2020133. Unless an alternative copyright or statement noting that a figure is reprinted from a previous source is noted in a figure