Brood success of northern bobwhite is biased by incomplete detectability during flush-counts

Flush-count surveys of game bird broods are a common method of deriving estimates of brood survival, but detection of chicks during surveys is < 1 due to factors such as vegetation obscurity, adult brooding behavior and variation among observers. Radiotelemetry is an alternative method for estimating survival that circumvents such factors and allows for higher detection. However, this practice is costly and labor-intensive and therefore not readily adopted. First, we estimated detection probability of chicks during flush-counts as a function of vegetation height and adult brooding behavior. Second, we evaluated how imperfect detection compromises estimates of brood survival by comparing estimates derived from flush-counts and radiotelemetry. Lastly, we compared counts between two observers to determine whether an additional observer could increase the accuracy of counts. We radio-marked 247 northern bobwhite Colinus virginianus chicks at 10–12 days of age and conducted 46 flush-counts at 21 days of age. Vegetation height substantially decreased detection (β = –1.18; 95% CrI: –1.68 to –0.73); mean detection probability was 0.30 (95% CrI: 0.22–0.40). Observers failed to detect radio-marked chicks when adults exhibited running behavior (n = 16 chicks, n = 6 surveys), orphaning occurred (n = 11 chicks, n = 5 surveys) or brooding adults died or had transmitter failures (n = 4 chicks, n = 3 surveys). An additional observer did not affect counts; the mean difference between observers was –0.6 chicks (95% CrI: –4.0 to 2.7). Chicks were not detected during 47% of surveys when ≥ 1 radio-marked individual was known to be alive. Brood survival was 0.83 (95% CrI: 0.70–0.92) and 0.48 (95% CrI: 0.34–0.62) for radiotelemetry and naïve flush-counts, respectively. Because of the low detectability of chicks during flush-counts, future researchers should consider alternative methods.

A comprehensive understanding of survival at all life stages is imperative for understanding population dynamics (Sandercock et al. 2008, Ludwig et al. 2018). While adult survival is relatively easy to assess by radio-marking individuals, survival of game bird chicks has been more difficult to study (Dahlgren et al. 2010a). More recently, chicks of larger Galliformes such as wild turkeys Meleagris gallopavo and greater sage-grouse Centrocercus urophasianus have been fitted with radio-transmitters that allow daily monitoring (Burkepile et al. 2002, Spears et al. 2005, Dahlgren et al. 2010a). Radio-tags have also been attached to chicks of smaller species such as quails (Odontophoridae), but the method has not been widely adopted (see Lusk et al. 2005, Orange et al. 2016, Lunsford et al. 2019, Tanner et al. 2019, Terhune et al. 2020 for exceptions). More commonly, estimates of chick survival for quails have been derived from flush-counts, in which observers count the number of chicks flushed from a brood and calculate survival as the proportional loss over a given period relative to the number of eggs hatched in the brood's nest (Cantu and Everett 1982, DeMaso et al. 1997, Hubbard et al. 1999, Mueller et al. 1999, Kamps et al. 2017). Similarly, brood survival is a commonly reported metric, defined as the proportion of broods with ≥ 1 chick detected during flush-count surveys (Riley and Conway 2020). The age (days) at which chick survival or brood survival is assessed varies, but most flush-counts occur once moderate flight capabilities have been achieved, approximately 21 days post-hatch for quails (DeMaso et al. 1997, Paisley et al. 1998, Pitman et al. 2006, Orange et al. 2016). Estimates of chick survival from field studies tend to be lower than expected, and estimates from telemetry studies tend to be slightly higher than those from flush-counts (Hubbard et al. 1999, Aldridge and Boyce 2007, Dahlgren et al. 2010a, Lunsford et al. 2019, Chamberlain et al. 2020).
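The two naive flush-count metrics described above can be written as short functions. This is a minimal sketch, assuming chick survival is pooled across broods (total chicks counted divided by total chicks hatched) rather than averaged per brood; the function names and example numbers are hypothetical, and neither estimator corrects for detection or amalgamation.

```python
from typing import Sequence


def naive_chick_survival(chicks_counted: Sequence[int], chicks_hatched: Sequence[int]) -> float:
    """Pooled apparent chick survival: chicks flushed at ~21 days of age
    divided by chicks hatched from the same broods' nests.

    No correction is made for imperfect detection, adoption or orphaning,
    so adopted chicks can even push a brood's count above its hatch size.
    """
    if len(chicks_counted) != len(chicks_hatched):
        raise ValueError("need one count and one hatch size per brood")
    total_hatched = sum(chicks_hatched)
    if total_hatched == 0:
        raise ValueError("need at least one hatched chick")
    return sum(chicks_counted) / total_hatched


def naive_brood_survival(chicks_counted: Sequence[int]) -> float:
    """Apparent brood survival: proportion of broods with >= 1 chick detected."""
    if len(chicks_counted) == 0:
        raise ValueError("need at least one brood")
    return sum(1 for c in chicks_counted if c >= 1) / len(chicks_counted)


# Hypothetical example: four broods flushed at 21 days of age.
counts = [5, 0, 8, 3]
hatched = [12, 10, 14, 11]
print(naive_chick_survival(counts, hatched))  # 16 / 47, about 0.34
print(naive_brood_survival(counts))           # 3 / 4 = 0.75
```

A brood that goes undetected counts as a failure in both metrics, which is why low detection biases them downward.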
Biases in survival estimates from flush-counts may arise from multiple sources. For example, northern bobwhite Colinus virginianus broods frequently amalgamate (e.g. adopt, creche, kidnap, gang-brood, orphan) before chicks reach 21 days of age, when flush-counts are conducted (Faircloth et al. 2005, Brooks and Rollins 2007). Moreover, non-biological factors such as visual obscurity and variation in counts between observers may also affect detection (Anderson et al. 2001, Collier et al. 2007, Refsnider et al. 2011). While others have compared estimates of brood survival between flush-counts and radio-marked broods of bobwhite (Orange et al. 2016), few attempts have been made to understand what factors affect detection and, consequently, estimates of survival (Riley and Conway 2020). A comprehensive understanding of predictors that alter detection may allow researchers to conduct flush-counts more efficiently, and thereby obtain more reliable estimates of brood survival and/or chick survival.
Our objectives were to: 1) estimate detection probability of northern bobwhite chicks during flush-counts; 2) calculate differences in counts between observers; 3) evaluate the effects of vegetation height and adult behavior (e.g. running versus holding) on detection of chicks; 4) compare estimates of 21-day brood survival between samples of radio-marked chicks and conventional flush-counts; and 5) calculate the frequency of amalgamations, particularly adoption and orphaning, occurring before flushing. We predicted that brood orphaning would decrease detection of chicks, whereas adoption would not affect detection during flush-counts. We also predicted that adults running to evade observers would decrease detection. Lastly, we expected an inverse relationship between vegetation height and detection probability. We conducted counts with one open and one blind observer. Blind observers were unaware of the fate of radio-marked chicks, whereas the open observer frequently monitored radio-marked chicks and knew the number of individuals still at risk and their respective radio frequencies. As such, we predicted that insight from consistent tracking would result in more accurate counts by open observers.

Study area
This study was conducted at Tall Timbers Research Station in Leon County, FL, USA, during 2018–2020. Tall Timbers is a 1568-ha area composed of upland pine forests (66%), annually disked fallow fields (13%) and hardwood drains (21%) (Terhune et al. 2019). Shortleaf pine Pinus echinata, loblolly pine P. taeda, longleaf pine P. palustris and live oak Quercus virginiana make up the majority of the upland canopy, whereas the understory is composed of comparable proportions of grass, forb and shrub cover (Terhune et al. 2019).

Field methods
Adult bobwhite were trapped in walk-in funnel traps during January, March and November as part of a long-term population monitoring study at Tall Timbers (Palmer and Sisson 2017, Palmer et al. 2019, Terhune et al. 2019). A subsample of trapped individuals was marked with 6-g necklace-style VHF radio transmitters (American Wildlife Enterprises, Tallahassee, FL, USA) and monitored via radiotelemetry three times per week during October–April and daily during the nesting season (May–September). Locations were obtained via homing (White and Garrott 1990) and recorded on a portable document format (PDF) map (Avenza Systems, Inc., Toronto, ON, Canada) using a GPS-enabled mobile device (e.g. iPad, iPhone or Android device). When individuals were documented at the same location on two consecutive visits, we assumed they were incubating. Incubating bobwhite were monitored daily until a nest fate was declared (e.g. hatch, depredation). Once a nest hatched, the brooding adult was tracked three times per day for a concurrent study of habitat use. At 10–12 days of age, broods were captured using a corral method (Smith et al. 2003) and marked using patagial wing tags (National Band and Tag Co., Newport, Kentucky, USA) (Carver et al. 1999); a subsample was marked with 0.75-g VHF radio tags (American Wildlife Enterprises, Monticello, FL, USA) using a modified suture technique (Terhune et al. 2017, Lunsford et al. 2019).
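The incubation rule above (same location on two consecutive visits) amounts to a simple check over a bird's sequence of telemetry fixes. This is a sketch only: it assumes fixes in projected coordinates (metres) and a hypothetical 5-m tolerance for "same location", neither of which is specified in the text.

```python
import math
from typing import List, Optional, Tuple

# (easting, northing) in metres; projected coordinates are an assumption.
Fix = Tuple[float, float]


def first_incubation_visit(fixes: List[Fix], tolerance_m: float = 5.0) -> Optional[int]:
    """Return the index of the visit at which incubation is first assumed.

    Incubation is assumed when two consecutive fixes fall at the same
    location, here taken to mean within `tolerance_m` metres (a hypothetical
    threshold). Returns None if no such pair of visits exists.
    """
    for i in range(1, len(fixes)):
        (x0, y0), (x1, y1) = fixes[i - 1], fixes[i]
        if math.hypot(x1 - x0, y1 - y0) <= tolerance_m:
            return i
    return None


# Hypothetical daily fixes for one radio-marked adult during the nesting season.
daily_fixes = [(0.0, 0.0), (40.0, 10.0), (41.0, 12.0), (41.0, 12.0)]
print(first_incubation_visit(daily_fixes))  # 2
```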

Flush-counts
Flush-counts were conducted at 21 days of age by two observers. We used two observers because an observer who knows whether chicks should be present may be more (or less) thorough during flushing procedures. That is, an open observer who knew radio-marked chicks were alive and present could have increased effort to detect chicks during flush-counts. Alternatively, if all radio-marked chicks in a brood had died, an open observer may have spent less effort on a flush because of a preconceived notion that brood survival was poor. A blind observer, on the other hand, was unaware of prior brood survival and conducted counts with effort similar to that of conventional flush-counts until reaching an arbitrary level of satisfaction. In the pilot year of the study (2018), we initially sought simply to document the presence or absence of chicks, but it became apparent that counts varied between observers when chicks were present. Thus, starting in 2019, observers obtained independent counts so that counts of the blind and open observers could be compared.
During flush-counts, the blind observer acted as the telemeter and homed to within approximately 20 m of the assumed brooding adult (White and Garrott 1990). Observers then briefly stopped to deduce where the brood was most likely located. Once the observers had a reference location, they moved briskly toward this point until the adult flushed. Upon flushing, observers counted the number of adults and chicks present in the brood and noted whether chicks were all the same size and/or exhibited similar flight capabilities, to determine whether the brood may have adopted chicks from a brood of a different age. When chicks were all the same size and exhibited comparable flight capabilities, we assumed no adoption had occurred, unless more chicks were counted than were known to be with the brood at capture. Adoption of chicks of similar ages (i.e. within approximately five days of age) may be difficult to distinguish based on size and flight capabilities alone; thus, our estimates of adoption could be biased low. All flush-counts occurred during the daytime, from approximately 09:00 h to 17:00 h.
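The adoption criterion above can be written as a small decision rule. This sketch encodes the stated logic (uniform size and flight imply no adoption unless the count exceeds the number of chicks known at capture); the function and argument names are hypothetical, and the observer's size and flight judgments are reduced to booleans for illustration.

```python
def adoption_suspected(uniform_size: bool, uniform_flight: bool,
                       chicks_counted: int, chicks_at_capture: int) -> bool:
    """Flag a brood as having possibly adopted chicks at a 21-day flush-count.

    No adoption is assumed when all chicks appear the same size and fly
    comparably, unless more chicks are counted than were known to be with
    the brood at capture. Mixed sizes or flight abilities suggest chicks
    from a brood of a different age. Adoptees within ~5 days of age of the
    brood will usually pass this check, so adoption is under-detected.
    """
    if chicks_counted > chicks_at_capture:
        return True
    return not (uniform_size and uniform_flight)


print(adoption_suspected(True, True, 8, 10))   # False: uniform brood, count not inflated
print(adoption_suspected(True, True, 12, 10))  # True: more chicks than at capture
print(adoption_suspected(False, True, 5, 10))  # True: mixed chick sizes
```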
After flushing the brooding adult(s) and counting the number of chicks, observers traversed the area in the immediate vicinity of the flushing adult (~10–15 m in each cardinal direction) until the blind observer was satisfied that all chicks had flushed. Once the flush was considered complete, the open observer shared with the blind observer the frequencies of any radio-marked chicks still alive and present. Together, the observers determined how many of the radio-marked chicks had been counted by scanning the known chick frequencies.
The relative distance and direction of flushed chicks and radio-marked chicks were used to determine whether chicks had been counted or effectively flushed (i.e. whether chicks used crypsis rather than flight to evade detection). In addition to independent observer counts, the behavior of adults (e.g. running, holding) and vegetation height at the flush site were recorded for each flush starting in mid-2018. Vegetation height where the adult flushed was assessed using a meter stick and classified into five categories: 0–0.25, 0.25–0.50, 0.50–0.75, 0.75–1 and > 1 m. We binned vegetation data to account for structural heterogeneity at the flush site but treated these data as a continuous predictor (Robel et al. 1970, Hagen et al. 2005). We defined brood survival as at least one chick surviving until 21 days of age.
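The text states that the binned vegetation classes were treated as a continuous predictor but not how the classes were scored. A minimal sketch, assuming ordinal scores of 1–5 that are then centred and scaled (both choices are assumptions, not stated in the text), with missing values passed through for surveys where no adult was flushed:

```python
import statistics
from typing import List, Optional

# Vegetation-height classes (m) at the flush site, in increasing order.
VEG_CLASSES = ["0-0.25", "0.25-0.50", "0.50-0.75", "0.75-1", ">1"]


def encode_vegetation(classes: List[Optional[str]]) -> List[Optional[float]]:
    """Convert height classes to standardized ordinal scores.

    Each class is mapped to a score of 1-5 and then centred and scaled
    (mean 0, SD 1). Missing values (None), e.g. surveys where the brooding
    adult died before the flush, stay None and are ignored when computing
    the mean and SD. At least two non-missing values are required.
    """
    scores = [None if c is None else VEG_CLASSES.index(c) + 1 for c in classes]
    observed = [s for s in scores if s is not None]
    mean = statistics.mean(observed)
    sd = statistics.stdev(observed)
    return [None if s is None else (s - mean) / sd for s in scores]


print(encode_vegetation(["0-0.25", ">1", None, "0.50-0.75"]))
# [-1.0, 1.0, None, 0.0]
```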
Brooding adults that orphaned their radio-marked chicks or lost all radio-marked chicks to predation before 21 days of age were also flushed to compare apparent brood survival or failure derived from flush-counts with and without knowledge of radio-marked chicks. In other words, adults that lost radio-marked chicks to predation could continue brooding chicks not marked with radio-tags. We also documented when a brooding adult died while radio-marked chicks were still known to be alive. Vegetation data were entered as missing for counts in which the adult had died before the survey but chicks remained alive. Therefore, we reported detection for samples both inclusive and exclusive of missing vegetation data.

Statistical analysis
We estimated detection probability of chicks using a generalized linear model with a logit link in a Bayesian framework.
We modeled detection of radio-marked chicks as a binomial process, similar to the detection component of occupancy models. We specified weakly informative, normally distributed priors for the regression coefficients (β) with a mean of 0 and a precision of 0.33.
The data-generating model was
$$C_i \sim \text{Binomial}(N_i, p_i), \qquad \operatorname{logit}(p_i) = \beta_0 + \beta_1\,\text{veg}_i, \qquad \beta_0, \beta_1 \sim \text{Normal}(0, \tau = 0.33),$$
where i was the index for each flush-count conducted, C was the observed number of radio-marked chicks detected during a flush-count, p was the probability of detecting an individual chick, N was the known number of radio-marked chicks, veg was the vegetation height at the flush site and τ was the prior precision. We sampled posterior distributions using three Markov chains with 5000 iterations, 1000 adaptations and a thinning rate of one in the jagsUI package of Program R (Kellner 2017, <www.r-project.org>). We assessed convergence by visually inspecting chain mixing and assessing the Gelman–Rubin diagnostic (Gelman and Rubin 1992). We compared estimates of brood survival between conventional flush-counts and the telemetry sample using a binomial t-test in a Bayesian framework as implemented by Kéry (2010, pp. 91–101). Again, we sampled the posterior distribution using three Markov chains with 5000 iterations and a thinning rate of one. For flush-counts, broods with ≥ 1 chick heard or seen during counts were considered successful. For the telemetry method, broods with ≥ 1 radio-marked chick surviving to 21 days were considered successful regardless of detection during flush-counts.
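The binomial-logit detection model described above can be sketched without JAGS. This is not the jagsUI workflow used in the study: it is a hand-written random-walk Metropolis sampler (one chain, no adaptation phase) for the same likelihood, assuming vegetation height is the only covariate and reading the Normal(0, 0.33) prior as precision 0.33, as in JAGS `dnorm`. The data come from a hypothetical simulator, not from the study.

```python
import math
import random
from typing import List, Sequence, Tuple

# Normal(0, precision 0.33) prior on each coefficient, as in JAGS dnorm(0, 0.33).
PRIOR_SD = 1.0 / math.sqrt(0.33)  # about 1.74 on the logit scale


def log1pexp(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return x + math.log1p(math.exp(-x)) if x > 0 else math.log1p(math.exp(x))


def log_posterior(b0: float, b1: float, counts: Sequence[int],
                  known: Sequence[int], veg: Sequence[float]) -> float:
    """Unnormalized log posterior for C_i ~ Binomial(N_i, p_i),
    logit(p_i) = b0 + b1 * veg_i, with independent normal priors."""
    lp = -0.5 * ((b0 / PRIOR_SD) ** 2 + (b1 / PRIOR_SD) ** 2)
    for c, n, v in zip(counts, known, veg):
        eta = b0 + b1 * v
        # log(p) = -log(1 + exp(-eta)); log(1 - p) = -log(1 + exp(eta)).
        lp -= c * log1pexp(-eta) + (n - c) * log1pexp(eta)
    return lp


def metropolis(counts, known, veg, n_iter=5000, step=0.3, seed=1) -> List[Tuple[float, float]]:
    """Random-walk Metropolis sampler returning (b0, b1) draws."""
    rng = random.Random(seed)
    b0 = b1 = 0.0
    current = log_posterior(b0, b1, counts, known, veg)
    draws = []
    for _ in range(n_iter):
        prop0, prop1 = b0 + rng.gauss(0.0, step), b1 + rng.gauss(0.0, step)
        cand = log_posterior(prop0, prop1, counts, known, veg)
        # Accept with probability min(1, exp(cand - current)); 1 - random() is in (0, 1].
        if math.log(1.0 - rng.random()) < cand - current:
            b0, b1, current = prop0, prop1, cand
        draws.append((b0, b1))
    return draws


def simulate_surveys(n_surveys: int, b0: float, b1: float, seed: int = 42):
    """Hypothetical data: 1-6 known radio-marked chicks per survey, a
    standardized vegetation score, and a binomial count of detected chicks."""
    rng = random.Random(seed)
    counts, known, veg = [], [], []
    for _ in range(n_surveys):
        n = rng.randint(1, 6)
        v = rng.gauss(0.0, 1.0)
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * v)))
        counts.append(sum(rng.random() < p for _ in range(n)))
        known.append(n)
        veg.append(v)
    return counts, known, veg


# Hypothetical example: 46 surveys with true b0 = -0.8 and b1 = -1.2.
counts, known, veg = simulate_surveys(46, -0.8, -1.2)
draws = metropolis(counts, known, veg)[1000:]  # discard the first 1000 draws as burn-in
b1_mean = sum(d[1] for d in draws) / len(draws)
print(round(b1_mean, 2))
```

In practice one would run several chains and check the Gelman–Rubin diagnostic, as described in the text; the single chain here only illustrates how the likelihood and priors combine.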
To assess overall agreement between brood survival estimates derived from flush-counts and telemetry, we constructed an accuracy matrix and calculated the sensitivity and specificity of brood survival observed during flush-counts relative to that predicted by radio-marked chicks (Warwick-Evans et al. 2016). Sensitivity, also referred to as the true positive rate, is the proportion of broods known to be successful from the radio-marked sample that were also classified as successful during flush-counts. Specificity, also referred to as the true negative rate, is the proportion of broods in which all radio-marked chicks had died before the survey (so no chicks were predicted to be counted) that also had no chicks counted.
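The accuracy matrix, sensitivity and specificity described above can be computed from paired brood outcomes. A minimal sketch, treating the radio-marked sample as the reference classification; the function name and the seven example broods are hypothetical.

```python
from typing import Dict, Sequence


def accuracy_matrix(radio_success: Sequence[bool], flush_success: Sequence[bool]) -> Dict[str, float]:
    """Cross-classify broods by the radio-marked reference (>= 1 marked chick
    alive at 21 days) and the flush-count outcome (>= 1 chick seen or heard)."""
    if len(radio_success) != len(flush_success) or not radio_success:
        raise ValueError("need paired, non-empty outcomes")
    pairs = list(zip(radio_success, flush_success))
    tp = sum(1 for r, f in pairs if r and f)          # both say success
    fn = sum(1 for r, f in pairs if r and not f)      # alive but not detected
    tn = sum(1 for r, f in pairs if not r and not f)  # both say failure
    fp = sum(1 for r, f in pairs if not r and f)      # unmarked chicks seen
    nan = float("nan")
    return {
        "true_pos": tp, "false_neg": fn, "true_neg": tn, "false_pos": fp,
        "sensitivity": tp / (tp + fn) if tp + fn else nan,
        "specificity": tn / (tn + fp) if tn + fp else nan,
        "accuracy": (tp + tn) / len(pairs),
    }


# Hypothetical example: seven broods.
radio = [True, True, True, True, False, False, False]
flush = [True, True, True, False, False, False, True]
print(accuracy_matrix(radio, flush))
```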
To compare counts between observers we conducted a Welch's t-test, assuming unequal variances, in a Bayesian framework as implemented by Kéry (2010, pp. 91-101). Chains, thinning rates and iterations were the same as specified in previous tests. We report the variation in counts between observers as the mean difference of counts (open count minus blind count). Lastly, we report amalgamations (adoption, orphaning) as a frequency of occurrence.
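The Bayesian unequal-variance comparison of observer counts described above can be approximated without MCMC. This sketch is a swapped-in technique, not Kéry's JAGS implementation: under the reference prior p(μ, σ²) ∝ 1/σ² for each group, each group mean has a Student-t posterior centred on the sample mean, so draws of the difference (open minus blind) can be generated directly. The counts and function names are hypothetical.

```python
import math
import random
import statistics
from typing import List, Sequence, Tuple


def _t_draw(rng: random.Random, df: int) -> float:
    """One draw from a Student-t distribution with df degrees of freedom."""
    z = rng.gauss(0.0, 1.0)
    chi2 = rng.gammavariate(df / 2.0, 2.0)  # chi-square with df degrees of freedom
    return z / math.sqrt(chi2 / df)


def posterior_mean_difference(open_counts: Sequence[int], blind_counts: Sequence[int],
                              n_draws: int = 15000, seed: int = 7) -> List[float]:
    """Posterior draws of mu_open - mu_blind with unequal group variances.

    Each group mean is drawn from its t_{n-1}(ybar, s^2 / n) marginal
    posterior (the Behrens-Fisher setting). Each group needs n >= 2.
    """
    rng = random.Random(seed)
    params = []
    for y in (open_counts, blind_counts):
        n = len(y)
        params.append((statistics.mean(y), statistics.stdev(y) / math.sqrt(n), n - 1))
    (m1, se1, df1), (m2, se2, df2) = params
    return [(m1 + se1 * _t_draw(rng, df1)) - (m2 + se2 * _t_draw(rng, df2))
            for _ in range(n_draws)]


def summarize(draws: Sequence[float]) -> Tuple[float, float, float]:
    """Posterior mean and equal-tailed 95% credible interval."""
    s = sorted(draws)
    return statistics.mean(s), s[int(0.025 * len(s))], s[int(0.975 * len(s)) - 1]


# Hypothetical paired survey counts from the open and blind observers.
open_counts = [4, 6, 3, 7, 5, 0, 8]
blind_counts = [5, 6, 4, 7, 6, 1, 9]
draws = posterior_mean_difference(open_counts, blind_counts)
print(summarize(draws))
```

Like the Welch's test in the text, this treats the two observers' counts as independent groups rather than pairing counts from the same flush.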

Results
Overall accuracy of brood survival observed during flush-counts, relative to that predicted by the radio-marked sample, was 0.57 (Table 1). Sensitivity of flush-counts was particularly low at 0.53; that is, chicks were detected during 53% of flush-counts of broods known to have survived based on the radiotelemetry sample. Estimates of brood survival were 0.83 (95% CrI: 0.70–0.92) and 0.48 (95% CrI: 0.34–0.62; Fig. 2) when derived from radiotelemetry and flush-counts, respectively.
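The brood survival comparison above (a Bayesian "binomial t-test") can be approximated with conjugate Beta posteriors. This is a sketch under stated assumptions, not the JAGS model of Kéry (2010): it assumes independent Uniform(0, 1) priors on each proportion, so each posterior is Beta(y + 1, n − y + 1) and the difference is summarized by Monte Carlo. The counts in the example are hypothetical, not the study's raw data.

```python
import random
from typing import Dict


def compare_proportions(y1: int, n1: int, y2: int, n2: int,
                        n_draws: int = 15000, seed: int = 11) -> Dict[str, object]:
    """Posterior summaries for two binomial proportions under Uniform(0, 1) priors.

    With a Beta(1, 1) prior, p | y ~ Beta(y + 1, n - y + 1), so no MCMC is
    needed; the difference p1 - p2 is summarized from Monte Carlo draws.
    """
    rng = random.Random(seed)
    p1 = [rng.betavariate(y1 + 1, n1 - y1 + 1) for _ in range(n_draws)]
    p2 = [rng.betavariate(y2 + 1, n2 - y2 + 1) for _ in range(n_draws)]
    diff = sorted(a - b for a, b in zip(p1, p2))
    return {
        "mean_p1": sum(p1) / n_draws,
        "mean_p2": sum(p2) / n_draws,
        "diff_ci95": (diff[int(0.025 * n_draws)], diff[int(0.975 * n_draws) - 1]),
        "prob_p1_gt_p2": sum(d > 0 for d in diff) / n_draws,
    }


# Hypothetical: 34 of 40 broods successful by telemetry, 20 of 40 by flush-counts.
print(compare_proportions(34, 40, 20, 40))
```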
When brooding adults ran rather than holding and flushing during counts (n = 6 surveys), none of the chicks available for detection (n = 16) were observed, which precluded formal analysis because of complete separation of the data. On two occasions when adults ran, observers heard chicks calling but could not visually confirm the presence of marked individuals. In such instances, we recorded apparent brood survival for the survey as 1.0 but detection of marked individuals as 0. On one occasion, a brooding adult's transmitter failed after the brood capture and before the flush-count; three of four radio-marked chicks were known to be alive at 21 days of age. On two other occasions, we documented predation of a brooding adult after the brood capture and before the flush-count. On one of these occasions, all radio-marked chicks were killed in the same predation event as the adult; on the other, one of three radio-marked chicks was still alive.
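Why separation blocks a formal analysis can be seen from the likelihood of the running-behavior surveys alone. In a logistic model with an indicator for running, those surveys depend only on η = β0 + β_run; with 0 of 16 chicks detected (the counts reported above), their log-likelihood keeps rising toward 0 as η decreases, so the running effect has no finite maximum-likelihood estimate. Strictly, this is quasi-complete separation, because holding surveys produced both outcomes. The sketch below only evaluates that log-likelihood; the function name is hypothetical.

```python
import math


def running_loglik(eta: float, n_chicks: int = 16, detected: int = 0) -> float:
    """Binomial log-likelihood of detections in the running-behavior surveys,
    where eta = beta0 + beta_run is the logit of detection probability."""
    log_p = -math.log1p(math.exp(-eta))  # log(p)
    log_q = -math.log1p(math.exp(eta))   # log(1 - p)
    return detected * log_p + (n_chicks - detected) * log_q


# With 0 of 16 chicks detected, the log-likelihood increases monotonically
# toward 0 as eta becomes more negative; no finite eta maximizes it.
for eta in [0.0, -2.0, -5.0, -10.0, -20.0]:
    print(eta, running_loglik(eta))
```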
We observed brood orphaning between brood capture and the flush-count on five occasions. On each occasion, no chicks were recorded during the flush-count, but radio-marked chicks from four of these five broods were still known to be alive. We documented adoption of chicks on six occasions. In all instances of adoption, apparent brood survival derived from both flush-counts and telemetry of radio-marked chicks was 1.0.
We found no evidence that blind observers counted fewer chicks than open observers. On average, open observers counted 0.6 fewer chicks than blind observers, but the 95% CrI (−4.0 to 2.7; Fig. 3) overlapped zero. Detection of marked chicks was the same for both observers in all surveys except one. We identified this discrepancy by comparing the flush distances and directions of chicks counted by each observer: one observer detected chicks flushing in a direction that the other observer did not.

Discussion
Our results portend limited utility of flush-counts for estimating brood survival of northern bobwhite, and likely of other precocial birds, and indicate a need for researchers to develop robust, novel methods and to identify factors influencing detection probability. Our results supported our predictions that factors such as vegetation concealment, adult behavior and brood orphaning would influence detection. In contrast, counts and detection were largely the same between observers. Thus, a second observer is unlikely to substantially increase detection or, thereby, refine estimates of chick or brood survival.
We failed to detect chicks on every occasion when brooding adults ran from observers. This was likely the result of imprecisely identifying the location of the adult and pushing the brood by not approaching it directly, and/or of distraction behavior in which the adult attempted to lure the observer, perceived as a predator, away from the brood. In the former case, researchers should consider the bearing at which the brooding adult is approached to ensure the flush is brisk and direct. The latter is a common behavior exhibited by bobwhite and multiple other gallinaceous birds (Wiley 1974, Sonerud 1988, Hudson and Newborn 1990, Lunsford et al. 2020). In either case, researchers should consider omitting data from flush-counts when adult behavior suggests such activities. When we removed surveys in which brooding adults ran or had died before flush-counts, apparent brood survival was less biased. Even then, detection of chicks remained low (0.52) due to other extrinsic factors such as vegetation obscurity.
Apparent 21-day brood survival derived from flush-counts (0.48) and radiotelemetry (0.83) in this study closely resembles results reported by Orange et al. (2016), who documented apparent 21-day brood survival of 0.50 and 0.81 for flush-count-derived and radiotelemetry-derived estimates, respectively. Orange et al. (2016) did not report vegetation attributes; however, vegetation in the arid region of their study was presumably sparser than at our study site in northern Florida. Because we found that vegetation negatively influenced detection, and assuming this effect is similar across regions, detection should have been higher in their study than in ours; the similarity of the estimates therefore suggests that other sources of variation, such as brood abandonment and mortality of brooding adults, may have affected detection disproportionately in their study. As such, we suggest that researchers using flush-counts report the rates at which amalgamations were suspected to occur. However, distinguishing orphaning from chick mortality may prove difficult and therefore lead to tenuous inference. Brood adoption may be apparent when brood size during flush-counts exceeds hatch size. However, while larger broods tend to be the result of adoption, brood size is not necessarily a good indicator of overall brood amalgamation (Faircloth et al. 2005, Faircloth 2008). Given the rates of orphaning and adoption observed among bobwhite, modeling brood survival of unmarked chicks from radio-marked adults would require additional parameters to account for amalgamation. To account for imperfect detection, researchers have successfully estimated survival of broods from unmarked chicks by tracking marked adults and using repeated counts to empirically estimate detection (Lukacs et al. 2004, Dreitz 2009). Models estimating survival of young from marked adults have been implemented in packaged software programs such as MARK, but their parameterization assumes no brood mixing (Lukacs and Dreitz 2010).
Poor detection of chicks led to low sensitivity of 21-day brood survival estimates from flush-counts. A sensitivity of 0.53 indicates that chicks were not detected during 47% of flush-counts in which chicks were known to be alive. Conversely, we documented two instances in which unmarked chicks were observed during flush-counts even though all radio-marked chicks in the brood had died. Notably, retention of radio-tags attached to chicks was estimated to be 0.76 (Terhune et al. 2020). Thus, detection and brood survival as estimated from radiotelemetry may also be imperfect and biased low.
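To illustrate how tag loss alone could bias telemetry-based brood survival low, consider a brood in which k radio-marked chicks survive to 21 days. Assuming, for illustration only, that each tag is retained independently with the reported probability of 0.76 and that a lost tag cannot be distinguished from a dead chick, the brood would be misclassified as failed with probability

$$\Pr(\text{no tag retained} \mid k) = (1 - 0.76)^{k} = 0.24^{k},$$

which is 0.24 for k = 1, 0.0576 for k = 2 and about 0.014 for k = 3. Under these assumptions, broods with few surviving marked chicks are the most likely to be misclassified.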
Overall accuracy of conventional flush-counts was 0.57. Thus, researchers should be cautious about drawing strong conclusions from flush-count-derived brood survival and should refrain from using brood survival as a response variable in modeling. Kamps et al. (2017) adopted this modeling approach and found that hatch date and the number of chicks hatched in a clutch were significant predictors of apparent brood survival. However, later-hatching broods have smaller clutch sizes and a lower probability of brood amalgamation, which may bias brood survival low (Faircloth 2008). Conversely, vegetation growth through the season may decrease detection later in the nesting season. Thus, intrinsic correlations among these factors may lead to spurious inference when apparent brood survival is used as a response.
Repeated roost surveys using thermal cameras may provide a better means of estimating brood survival than flush-count surveys. However, amalgamation will still require attention, and we caution against excessive disturbance during brooding. Using thermal cameras to count bobwhite chicks at roosts, Andes et al. (2012) documented substantial agreement (> 95%) between known brood sizes and observer counts. Detection is likely higher during roost counts because all chicks in a brood should be located at the roost disk rather than scattered while feeding, as during diurnal flush-counts. While Andes et al. (2012) documented high detection, the effects of vegetation and brood amalgamation during roost counts remained uncertain. Andes et al. (2012) documented amalgamation in two of 43 brood capture events, a much lower rate than observed in this study (n = 5 orphaning events and n = 6 adoption events across 46 flush-counts). Moreover, vegetation at our field site was likely much denser than at the study site of Andes et al. (2012), which was in a semi-arid region during an extreme drought (Nielsen-Gammon 2012).
Recognizing factors that influence brood survival of northern bobwhite should help researchers understand population dynamics more comprehensively and promote more effective conservation of the species. To do this effectively, future researchers should adopt analytical methods that account for varying rates of brood amalgamation and for variation in detection due to vegetation and behavior.