In recent years, much progress has been made in non-invasive genetic methods for various purposes including population estimation. Previous research focused on optimising laboratory protocols and assessing genotyping errors. However, an important source of bias in population estimates still remains in the field sampling methods. The probability of animals being sampled can vary according to sex, age, social status or home-range location. In this article, we review the relevant literature to provide an overview of the occurrence of individual heterogeneity (IH) in the field, and of how it can be minimised, e.g. by adaptation of the sampling design. We surveyed 38 articles describing non-invasive population estimation for 12 mammal and two bird species. The majority of these studies discussed IH as a potential problem. The detectability of IH via goodness-of-fit testing depended on the average capture probability reported in the studies. Field tests for assessing variation in sampling probabilities or validating estimations were carried out in only 11 of the 38 studies. The results of these tests indicate that IH is a widespread problem in non-invasive population estimation, which deserves closer attention not only in the development of laboratory protocols but also concerning the sampled species' characteristics and the field methods. IH can be reduced in the field by carefully adapting the sampling design to the characteristics of the studied population. If this is not feasible, it may be better to switch to a different sampling strategy.
Reliable estimation of population size remains a major challenge in wildlife research and management. In recent years, non-invasive DNA-based population estimation methods have been widely applied in a variety of species. Several standard approaches have been modified to fit genetical implementation, among which are rarefaction (e.g. Frantz et al. 2004) and capture-recapture (e.g. Boulanger & McLellan 2001). In their conventional form, both methods presuppose capture or killing of animals or rely on direct sightings, and are challenged by the possibility of heterogeneous detection probabilities amongst the studied population (Borchers et al. 2002, Petit & Valière 2006). Being most frequently used, capture-recapture (CR) methods are especially vulnerable with respect to individual heterogeneity (IH; Pledger & Efford 1998, Link 2004, Lukacs & Burnham 2005), i.e. differences between individuals of a population in the probability of being captured (Borchers et al. 2002). Capture- and recapture probabilities may be influenced by age, sex, social status and individual experience (Baber & Coblentz 1986, Piggott & Taylor 2003), and this can generate severe bias in population estimates (White et al. 1982, Minta & Mangel 1989, Sweitzer et al. 2000). IH can be accounted for by use of different modelling approaches (e.g. see Otis et al. 1978, Chao 1987, Chao & Jeng 1992, Pledger & Efford 1998), but the power of goodness-of-fit (GOF) tests and model selection procedures to detect IH in a given data set is often low (Menkens & Anderson 1988, McKelvey & Pearson 2001). Furthermore, as Link (2003, 2004) recently stated, IH is far more difficult to model than has previously been recognised, modelling being especially problematic if the causes and extent of IH are unknown. Thus, in order to allow accurate population estimates, IH should be either minimised or quantified as far as possible (Petit & Valière 2006).
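The direction of the bias caused by IH can be illustrated with a minimal sketch of the simplest CR estimator, the two-occasion Lincoln-Petersen estimate. The sketch is not taken from any of the cited studies. It assumes that each animal has the same capture probability on both occasions, that captures are independent between occasions, and it approximates the expected estimate by a ratio of expected counts. The function name and the two hypothetical populations are illustrative assumptions.

```python
def expected_lincoln_petersen(capture_probs):
    """Approximate expected two-occasion Lincoln-Petersen estimate.

    capture_probs holds one capture probability per animal. The same value
    is assumed to apply on both occasions, independently (an assumption of
    this sketch, not a property of real data).
    """
    # Expected number of animals caught on a single occasion.
    n_expected = sum(capture_probs)
    # Expected number of animals caught on both occasions (recaptures).
    m_expected = sum(p * p for p in capture_probs)
    # Lincoln-Petersen: N = n1 * n2 / m, using expected counts.
    return n_expected * n_expected / m_expected


# Two hypothetical populations of 200 animals with the same mean p of 0.3.
homogeneous = [0.3] * 200
heterogeneous = [0.1] * 100 + [0.5] * 100

print(round(expected_lincoln_petersen(homogeneous), 1))
print(round(expected_lincoln_petersen(heterogeneous), 1))
```

Under these assumptions the homogeneous population is recovered at about 200 animals, whereas the heterogeneous one yields roughly 138, because easily caught animals are over-represented among the recaptures.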
Methods based on non-invasive genetic sampling offer solutions for estimation of population size without capturing or killing animals, making them advantageous for rare or endangered species (Kohn et al. 1999, Taberlet et al. 1999, Mills et al. 2000, Piggott & Taylor 2003). McKelvey & Schwartz (2004) and Petit & Valière (2006) suggested that the absence of handling can overcome the effects of previous capture history on subsequent catchability, and thus certain sources of IH could be reduced. The most commonly used non-invasive DNA sources are hairs and faeces for mammals as well as feathers and faeces for birds (Lukacs & Burnham 2005). Non-invasive methods have made CR approaches, which in their conventional form are more suitable for small and abundant mammals, applicable for large, elusive and/or endangered mammal and bird species (Obbard et al. 2010).
However, despite their advantages, non-invasive genetic methods are also prone to heterogeneity related to biological variability among individuals (Kohn et al. 1999, Wilson et al. 2003, Boulanger et al. 2004a). Moreover, in non-invasive methods IH can interact with bias caused by genotyping errors. Allelic dropout and false alleles can create ‘new’ false individuals; recaptures are thereby concealed, the recapture rate decreases and population size is overestimated (Creel et al. 2003, McKelvey & Schwartz 2004). Furthermore, there are some issues in non-invasive genetic CR which are not problematic in conventional CR. In genetic CR, the total number of marks in the population is not known and marks may not be unique, because only a subset of each animal's genome is used for identification (Lukacs & Burnham 2005). Therefore, the danger of misidentification is increased compared to conventional CR. Also, a ‘sampling occasion’ can be more difficult to define than a ‘capture occasion’, because the moment of deposition of a sample, e.g. hair or faeces, cannot be assessed precisely. This can compromise the concept of population closure (Lukacs & Burnham 2005). Thus, despite the high potential of non-invasive genetic techniques, there are several issues which can complicate the application of a CR framework for population estimation, in addition to the difficulties already present in the conventional approach.
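How concealed recaptures inflate an estimate can be sketched with the two-occasion Lincoln-Petersen estimator. This is a deliberately simplified illustration, not the error model of any cited study: it assumes that a fixed fraction of true recaptures is misread as new individuals and that nothing else changes. The function names, the parameter `concealed_fraction` and all counts are hypothetical.

```python
def lincoln_petersen(n1, n2, m):
    """Two-occasion Lincoln-Petersen estimate: N = n1 * n2 / m."""
    if m <= 0:
        raise ValueError("at least one recapture is needed")
    return n1 * n2 / m


def estimate_with_concealed_recaptures(n1, n2, m_true, concealed_fraction):
    """Estimate when a fraction of true recaptures is misread as new animals.

    Simplifying assumption of this sketch: misidentified recaptures still
    count as animals sampled on the second occasion, so n2 is unchanged and
    only the observed number of recaptures shrinks.
    """
    m_observed = m_true * (1.0 - concealed_fraction)
    return lincoln_petersen(n1, n2, m_observed)


# Hypothetical counts: 50 animals per occasion, 25 true recaptures.
print(lincoln_petersen(50, 50, 25))
print(estimate_with_concealed_recaptures(50, 50, 25, 0.2))
```

With these hypothetical counts, concealing 20% of the recaptures raises the estimate from 100 to about 125 animals, i.e. by a factor of 1 / (1 - 0.2).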
Until now, great progress has been made in genetic techniques. In particular, much effort has been devoted to quantifying and reducing genotyping errors (Taberlet et al. 1999, Paetkau 2003, Broquet & Petit 2004, Roon et al. 2005a, Miquel et al. 2006). In contrast, fewer attempts have been made in assessing the extent and causes of IH in the field, i.e. due to biological characteristics of the sampled species, to individual attributes or due to sampling procedures (Boulanger et al. 2006). However, information about the causes and the extent of IH is essential to improve sampling designs (Boulanger et al. 2004c). Furthermore, IH in combination with uncertainties caused by genotyping errors can cause multiplicative effects and thus lead to an increase in the overall bias. Therefore, it is crucial to address both IH and genotyping problems very carefully in order to minimise bias in population estimates.
Based on the recent peer-reviewed literature, our article aims at:
1) providing a survey of the occurrence and treatment of IH in non-invasive population estimation studies, especially with respect to different sampling strategies;
2) assessing the impact of sample size and capture probability (p) on the detectability of IH via GOF tests or model selection procedures in CR studies;
3) comparing different methods that seem suitable for assessing IH in the field, also with respect to the study species and its characteristics.
Material and methods
Our review is based on population genetic studies that involve non-invasive sampling for the purpose of population estimation in mammal and bird species. We performed a search in the Swiss Wildlife Information Service (SWIS) database for peer-reviewed publications using the following search terms: hair trap, non-invasive sampling, genotyping, population estimates, faeces sampling, hair sampling and genetic monitoring. The search yielded 104 titles and was supplemented with published lists of references. In total, we detected 142 articles, of which we focused on 38 studies (complete list of references available on request from the corresponding author). We only included papers in which the non-invasive sampling was de facto conducted in the field and applied for population estimation; we excluded literature reviews and articles dealing with single aspects in the development of sampling methods. We focused on studies using hair, faeces and feathers as those are the main sources of non-invasive tissue samples. Other sources (e.g. urine, shed skin or buccal swabs) have been much less employed for population estimation until now (Broquet et al. 2007). We also included cases in which a combination of different sampling strategies was applied. We restricted our review to studies using CR or rarefaction (also termed ‘accumulation curve’ methods; e.g. see Kohn et al. 1999 and Eggert et al. 2003) approaches for population estimation, as they are the most commonly used and more prone to be biased by IH than e.g. estimation of minimum densities or minimum number alive.
For each study, we assessed whether IH had been mentioned, i.e. considered as a factor potentially influencing the population estimate. Additionally, we recorded if IH was detected, e.g. via likelihood-ratio-tests (program CAPWIRE; Miller et al. 2005), χ2-tests (program CAPTURE; Otis et al. 1978), via Akaike's Information Criterion (AIC, e.g. in program MARK; White & Burnham 1999) or GOF testing in program U-CARE (Choquet et al. 2009). Furthermore, IH can be discerned in uneven ‘capture frequencies’ of sampled individuals (Kohn et al. 1999, Scheppers et al. 2007). The power of tests to detect IH can depend on capture probability (p; Pollock et al. 1990, Boulanger et al. 2002). Additionally, we suspected the number of sampling occasions and coverage, i.e. proportion of the population sampled, to have an effect on IH detectability. The estimated coverage is significantly correlated with p and was included because not all reviewed studies provided estimates of p. We used logistic regression to evaluate the impact of p, coverage and sampling occasions on the probability of IH being detected. In this context, we evaluated studies in which IH was detected in the capture frequency or via field test, but not in the GOF or model selection tests as ‘not detected’. We also included squared terms of p and coverage since data suggested an optimum somewhere in between the extreme values. We selected models based on AIC. For the logistic regression, we used every single population estimate reported in the 38 reviewed articles, as results of several study years or different study areas were often included in one article, resulting in different p and population estimates. As p and coverage of different analyses reported within the same study could be correlated, we included the studies as random factor. 
This worked well with coverage but not with p, because for p the sample size was low (only 39 of the 76 analyses included estimates of average p) and the number of studies reporting only one analysis was high. Therefore, in the case of p, we averaged the values for each study and conducted both a weighted and an unweighted logistic regression without random effects. All analyses were performed using program R (Ihaka & Gentleman 1996). For the mixed-effects logistic regression model, we used the function lmer of the package lme4 (Bates & Maechler 2010).
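The role of the squared term can be made explicit with a short sketch of a logistic model of the form logit(Pr[IH detected]) = b0 + b1·p + b2·p². With a negative b2, the curve has an interior optimum at p = -b1 / (2·b2). The code below is an illustration in Python rather than the R analysis described above. The function names are our own and the coefficients are hypothetical, chosen only so that the curve has an interior peak; they are not the estimates of this study.

```python
import math


def detection_probability(p, b0, b1, b2):
    """Probability of detecting IH under a logistic model with a squared term."""
    eta = b0 + b1 * p + b2 * p * p  # linear predictor on the logit scale
    return 1.0 / (1.0 + math.exp(-eta))


def optimum_p(b1, b2):
    """Capture probability at which the quadratic logit reaches its maximum."""
    if b2 >= 0:
        raise ValueError("an interior maximum requires a negative squared term")
    return -b1 / (2.0 * b2)


# Hypothetical coefficients for illustration only (NOT fitted values).
B0, B1, B2 = -3.0, 20.0, -40.0

print(optimum_p(B1, B2))
for p in (0.1, 0.25, 0.4):
    print(p, round(detection_probability(p, B0, B1, B2), 3))
```

Because the logistic function is monotonic, the detection probability peaks at the same p as the quadratic predictor, which is why a squared term suffices to represent an optimum between the extreme values.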
For studies using CR methodology, we additionally recorded if IH was included in the estimation model (which is not possible for rarefaction methods). We also searched for studies in which field tests had been carried out parallel to the non-invasive sampling for validation purposes. We put our special attention to methods and results of these studies and aimed to assess if the applied field tests hold the potential to reveal IH.
The articles we reviewed dealt with 14 different study species; 12 mammals and two birds. In 30 of the 38 studies we included in our review, CR was the sole method applied to estimate population size. Four studies used rarefaction analysis only and four used both methods (Table 1). Altogether, hair was the DNA source in 22 of the 38 studies. In one of these cases, the hair sampling was combined with harvest data, and in another one faecal sampling was carried out simultaneously. The remaining 16 studies relied on faeces as DNA source; one of them in combination with feathers. IH bias was mentioned as a potential problem in 34 (89%) of the 38 contributions, whereas it was modelled in 26 of the 34 CR studies (i.e. 76%). In 18 of the 34 CR articles, IH was detected via χ2-tests, likelihood-ratio-tests or with the help of AIC. In another five studies in which GOF tests were performed, IH was not detected in their data sets. In six of all the 38 studies, tests either failed to detect IH or were not performed, but it was nevertheless visible in the ‘capture frequencies’ (see Table 1). Altogether, in 24 of the 38 studies (63%) the data revealed the occurrence of IH amongst the studied population independent of further field tests.
Assessment of IH via GOF tests and model selection procedures
In this section, we take into account every single population estimate (N = 76) reported in the 38 reviewed articles. More than half (53.8%) of the 39 reported estimates of average p are below 0.2 (Fig. 1) and thus around or below the minimum recommended by Otis et al. (1978) for reliable model selection and population estimates. Otis et al. (1978) recommend p ≥ 0.2 for a population of 200 animals and state that p should never be below 0.1; the recommended minimum number of capture occasions is five, but 7-10 is better. Furthermore, IH was only detected for p between 0.16 and 0.4. The proportion of population estimates in which IH was detected increased with increasing p until p = 0.4 (see Fig. 1). IH was detected in none of the three studies with p ≥ 0.4. However, in one of these studies the sample size was too small to carry out tests in the program CAPTURE (see Table 1; Belant et al. 2005 sampling on Sand Island). The logistic regression showed no impact of coverage on the detectability of IH (Table 2). In the case of p and p², the results suggest that there is an effect on the detectability of IH (Tables 3 and 4). The IH detectability is highest at p values around 0.3 (Fig. 2). The most supported model does not include the number of sampling occasions, but a model including sampling occasions is ranked only marginally below it (ΔAIC < 2), indicating a potential influence (Burnham & Anderson 1998). It seems possible that the detectability of IH increases with an increasing number of sampling occasions. Models including an interaction between p and the number of sampling occasions were not supported (see Table 3).
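The ΔAIC comparison used above can be sketched as follows, with AIC = 2k - 2 ln(L), where k is the number of estimated parameters and L the maximised likelihood. The model names mirror the candidate models described in the text, but the log-likelihoods and parameter counts are hypothetical placeholders, not the values behind Table 3, and the helper names are our own.

```python
def aic(log_likelihood, n_parameters):
    """Akaike's Information Criterion: AIC = 2k - 2 ln(L)."""
    return 2.0 * n_parameters - 2.0 * log_likelihood


def delta_aic(models):
    """Map each model name to its AIC difference from the best model.

    models: dict of name -> (maximised log-likelihood, number of parameters).
    """
    scores = {name: aic(ll, k) for name, (ll, k) in models.items()}
    best = min(scores.values())
    return {name: score - best for name, score in scores.items()}


# Hypothetical candidate set (placeholder values, NOT the study's results).
candidates = {
    "p + p^2": (-20.0, 3),
    "p + p^2 + occasions": (-19.6, 4),
    "intercept only": (-26.0, 1),
}

deltas = delta_aic(candidates)
# Models within 2 AIC units of the best one are treated as competitive.
competitive = sorted(name for name, d in deltas.items() if d < 2.0)
print(deltas)
print(competitive)
```

In this hypothetical set, the model with sampling occasions gains a little likelihood but pays for one extra parameter, so it ranks just below the best model while remaining within two AIC units of it.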
Assessment of IH via field tests
Field tests suitable for assessing the occurrence of IH bias were performed in 11 of the 38 studies. In seven of these 11 cases, IH was actually detected (see Table 1). In one of the four other cases, IH was found to be present in the hair sampling part of the study, but was strongly reduced by sampling harvested animals as an additional strategy (Dreher et al. 2007). Furthermore, in eight of the 11 studies, IH was detected via GOF testing or in the ‘capture frequencies’. Thus, in two cases in which the field tests did not reveal IH, it nevertheless seemed to be present and detectable in the data set. Conversely, in two cases IH was detected through the field test but not in the data.
Field tests in detail
Using radio-telemetry, Kohn et al. (1999) found IH to be present in their population under study: 12 radio-collared coyotes Canis latrans made use of the area to different degrees, and the number of faeces deposited correlated with their relative use of the study area. This IH was also reflected in the ‘capture frequencies’ of the sampled individuals. Faeces sampling of a coyote population in Central Alaska, USA, exhibited IH with respect to age and home range, as well as resident status. This was revealed through radio-telemetry of 15 collared adult resident individuals, which showed higher survival and recapture rates than juveniles and transient or edge individuals (Prugh et al. 2005). Furthermore, the model selection process in program MARK detected IH in the data. This was also the case in a hair sampling study on grizzly bears Ursus arctos; Boulanger et al. (2004c) used location data of 12 GPS-collared bears to evaluate the potential bias. They found p to be greater for males than for females and also to be influenced by capture history (i.e. differences between collared and non-collared individuals). The latter was also detected in another study (Boulanger et al. 2004a), in which radio-telemetry data collected over three years on a total of 35 bears were compared to hair sampling data collected in the same area. Additionally, the p of females with cubs differed from that of the rest of the population. In a study carried out by Wasser et al. (2004), grizzly bear faeces sampling data were compared to hair sampling and radio-telemetry data collected simultaneously in the same area. Faeces collection was conducted with the help of trained dogs. This seemed to be an effective method and less biased than hair sampling at baited stations. In the latter, close kin (i.e. females with their offspring) were considerably less represented. 
However, the sample size of matched faeces and telemetry data was too small to allow more fine-grained comparisons, and IH was neither detected in the field test nor in the data set (Wasser et al. 2004). In a black bear Ursus americanus hair sampling study accompanied by radio-telemetry, Dreher et al. (2007) used harvested bears as an additional sample. Due to this combination, the IH, which would have been present if hair sampling or harvest data were used alone, was strongly reduced (B. Dreher, pers. comm.) and thus neither detected in the field test nor in the data set.
Wilson et al. (2003) carried out a faeces sampling study on badgers Meles meles in which they used video control of a largely marked population to validate their rarefaction estimate. They succeeded in sampling almost the entire population by collecting faeces at latrines near badger setts and did not detect sex- or age class bias. However, considerable variation existed in the numbers of samples obtained from the different individuals. Moreover, some known individuals never used the sampled latrines and thus were not identified via faeces sampling. These results suggest the incidence of IH, e.g. due to variation in individual behaviour, which in that case might not have compromised the estimation because such a high proportion of the population was actually sampled. A hair sampling study conducted by Scheppers et al. (2007) at badger setts, simultaneously surveyed via direct observations, yielded similar results. The ease with which badgers were sampled varied considerably between setts; hair traps were not visited equally often by all members of the groups. Using baited hair traps and applying direct observation as a validation method, the results of Frantz et al. (2004) show a comparable pattern. Even though obvious variation in the individual sampling frequency existed, almost all badgers present in the area were sampled. Thus, the rarefaction analysis yielded quite reliable results. However, the IH observed in the three badger studies might have been crucial in other populations or situations, e.g. when a lower proportion of the population is represented in the samples, especially when CR methods are applied (Pollock 1982). Furthermore, video control as well as direct observation in all three studies focused on obtaining an independent census of the sampled badger groups; not on observing the sampling behaviour itself. As a consequence, potential sources of IH such as dependence of latrine use or access to bait on social status may remain undetected. 
The same seems to hold true for a study on the lesser horseshoe bat Rhinolophus hipposideros; the non-invasive population estimate was validated via direct counts of bats in their day roosts (Puechmaille & Petit 2007). The direct counts did not reveal any IH in the faeces sampling. However, IH was detected in the sampling data of several of the sampled bat colonies via likelihood-ratio or simulation tests.
In a faeces sampling study on wolves Canis lupus in the Italian Alps, Marucco et al. (2009) used an evaluation system for age-dependent marking behaviour related to defecation. Due to the fact that part of the population was radio-collared or otherwise known, it was possible to discriminate between faeces deposited by adult wolves for marking purposes and ‘non-marking’ faeces. The authors detected age- and status dependent IH in the defecation behaviour and concluded that they would have missed a considerable part of the juvenile population, if they had not adapted their search pattern. However, by means of the field tests, Marucco et al. (2009) were able to apply and confirm a representative faeces sampling strategy.
Most researchers seem to be aware of IH being a major problem present in population estimation based on non-invasive genetic methods. The vast majority of articles dealing with non-invasive methods applied for such purpose mention or discuss this problem. In most studies based on CR approaches, the authors attempted to account for potential bias by employing models which incorporate IH (Chao 1987). However, as long as the different sources and the extent of IH are unknown, the results of population estimations are strongly model dependent and might not reflect reality (Link 2003, 2004). Furthermore, the different methods to test for IH in the data set may have limited power and thus often fail to detect IH (Boulanger & McLellan 2001, Miller et al. 2005). In the literature, it has been mentioned that the power of such test procedures is especially low for small ps (Menkens & Anderson 1988, Boulanger & McLellan 2001). The results of our analysis support this finding; they indicate an impact of p on the detectability of IH via GOF tests and model selection procedures. This effect does not seem to depend on the type of test and the software used (logistic regression with test type as additional covariate showed no significant effect; results not shown). In our analysis, we used a very conservative approach by averaging over the studies and using both weighted and unweighted values. Both approaches show very similar results, indicating that the results are robust to details in the analysis. In studies with a low p, IH was detected considerably less often than in studies with a higher p. The highest proportion of detected IH was attributed to studies with p between 0.2 and below 0.4. Interestingly, in none of the three studies reporting p > 0.4, tests suggested the incidence of IH. This might be due to the fact that IH bias becomes much less problematic, perhaps even negligible, when p is high, which has been shown in simulation studies (J. Boulanger, pers. comm.). 
When most animals in a population are actually captured or sampled, the differences in p between individuals have much less impact on the population estimate (Pollock 1990, Lukacs & Burnham 2005). Thus, IH might not be reflected in GOF testing or model selection when p is high. Even though the number of sampling occasions for each study was not included in the most supported model, there seems to be an indication of a certain influence on the detectability of IH, because the model including sampling occasions was ranked only marginally inferior to the best model. The more sampling occasions that are carried out, the better might be the ability to detect IH via GOF-testing. However, this point needs further investigation before a clear conclusion on sampling occasions can be drawn. In contrast to p, an impact of coverage on the detectability of IH was not supported in our analysis; despite the correlation of coverage and p.
It should be mentioned that with our analysis, we are not able to distinguish if a negative result of the testing for IH is due to lack of power and test failure or because there simply is no IH present in a given data set. However, regarding the existing literature including simulation studies and studies on populations of known size, IH seems to be almost ubiquitous in non-invasive sampling data sets like in the conventional CR (Pollock 1990, Borchers et al. 2002, Knapp et al. 2007, Lukacs & Burnham 2005). Therefore, it seems much more likely that a negative test result is caused by a low test power than that a data set is really homogeneous, particularly when p is low. In recent years, new modelling approaches, e.g. multistate and multievent models (Pradel 2005), have been developed, which might allow a more flexible handling of CR data in presence of IH (Crespin et al. 2008). In this context GOF testing using non-parametric methods, like in program U-CARE, seems to be quite promising compared to conventional methods (Choquet et al. 2009, Cubaynes et al. 2010).
In five of the articles we reviewed, IH was not detected by the data tests which were performed. However, in two of the five cases, additional field tests were carried out, and both of them revealed IH. In general, data tests and/or pronounced differences in the ‘capture frequencies’ indicate the presence of IH bias without extra field tests being carried out, but further investigations would often be required to uncover the causes of IH. Since many different IH sources exist, they can influence estimations in different ways, and this effect may also depend on the sampling design (Crespin et al. 2008). Models that are relatively robust to IH generally show reduced precision of the estimate (Boulanger et al. 2004a). This may not be tolerable in cases where an accurate estimate is particularly important, e.g. when the spread of diseases is concerned (Artois et al. 2002) or when management plans for rare or endangered species are considered (Guschanski et al. 2009). However, for endangered species, overestimating a population is much more critical than underestimating it (Meijer et al. 2008), so some underestimation bias may be tolerable in certain cases.
Assessment of IH via field tests
The choice of methods to test for IH in the field strongly depends on the observed species and its behavioural patterns, as well as its space and habitat use. Thus, e.g. for badgers which live in social groups, share setts and make rather small-scale movements, video control or direct observations at setts seem to be an adequate method to validate non-invasively obtained estimates (see e.g. Frantz et al. 2004 and Scheppers et al. 2007). Contrastingly, for highly mobile species such as bears and coyotes, radio-telemetry may be more promising. The suitability of a field method to test for IH furthermore depends on the applied sampling strategy. For example, video control or direct observations can be appropriate for surveillance of discrete sampling stations like hair traps or badger setts, but will not be suitable for large-scale sampling designs such as e.g. line transects. Radio-telemetry may be more effective to observe movements and transect- or trap-encounter rates of animals on a large scale. Furthermore, radio-telemetry is useful for obtaining information on spatial distribution and home-range sizes of animals in order to fit sampling designs and to account for closure violations and edge effects (Boulanger et al. 2004a, Dreher et al. 2007). The feasibility of a sampling method for a given species or population can depend on spatial characteristics like home-range sizes and distribution of animals in the sampled area. Settlage et al. (2008) found hair sampling of black bears via baited sampling stations impractical for the Southern Appalachian region. Due to small home-range sizes of the resident bears, sampling probabilities were low and biased. In order to yield a reliable estimate, a much higher sampling intensity would have been necessary (Settlage et al. 2008). Grizzly bears showed considerably higher p with comparable sampling intensities because of their larger average home ranges (Boulanger et al. 2004a, McLoughlin et al. 2003).
Interactions between sampling strategies and study species' characteristics
The occurrence and/or extent of IH may differ dependent on the applied sampling method. ‘Active’ sampling methods like hair sampling via baited hair traps presuppose that animals actively approach the sampling station. In many species, it has been shown that individuals show consistent or context-specific personality traits, e.g. they differ in their exploration behaviour and their reactions towards newly introduced factors, which may affect their sampling probability (Coleman & Wilson 1998, Ruis et al. 2000, Dingemanse et al. 2003, Mettke-Hofmann et al. 2005). Furthermore individual experience and life history may influence behaviour with respect to sampling stations. This could cause IH which is not necessarily related to sex, age or social status, and which might be hard to quantify and very difficult to account for in a model. Thus, in some cases, it may be reasonable to apply a different sampling method. In this context, ‘passive’ sampling strategies such as faeces sampling along transects represent an alternative, which may be less affected by individual behaviour or status differences. This may hold true particularly for group living species. Interactions between animals can increase IH, especially when sampling concentrates on defined stations such as hair traps, which require active approach. As an example, we conducted a hair sampling pilot study on wild boar Sus scrofa. Video observation at baited hair traps revealed significant behavioural differences depending on age of the animals and on their group status (Ebert et al. 2010). However, also for bears which can be considered as living mainly solitary, it has been shown that via faeces sampling, a ‘passive’ sampling strategy, a larger part of a population can be observed than at hair sampling as an ‘active’ approach (Wasser et al. 2004). Wasser et al. 
(2004) applied both methods in the same study area and time period, and via hair sampling only 46% of the individuals that were identified via faeces sampling were detected. ‘Passive’ sampling methods in most cases will not yield completely unbiased results (in fact, most of the faeces sampling studies that we reviewed reported IH in their data sets). Nevertheless, ‘passive’ sampling might rule out certain sources of IH which are not avoidable in ‘active’ approaches, and thus holds the potential to yield results with smaller overall bias. However, in some (especially social and/or territorial) species, status or age differences between individuals may cause differences in faeces deposition patterns leading to IH in detection probabilities. This has been shown e.g. for wolves (Marucco et al. 2009, Cubaynes et al. 2010). Thus, ‘passive’ sampling will not be suitable in all cases, and at any rate the appropriate sampling strategy and design have to be carefully tested for each particular species and population. Furthermore, the DNA quality of faeces in some cases has been shown to be inferior to that of hair, thus population estimates derived from faeces sampling data may be more in danger of bias due to genotyping errors (Piggott & Taylor 2003).
1) We recommend to perform a pilot study not only in the lab, but also in the field: In any case, it is most advisable that researchers who plan to establish population estimation based on non-invasive genetic sampling perform pilot studies not only to assess genotyping error rates, but also to detect sources of IH bias in the field. The fact that the majority of reviewed studies in which such field tests were performed actually detected IH highly supports this recommendation. The appropriate methods to assess IH in the field depend on the species or population under study as well as on the applied sampling method;
2) not to rely solely on GOF testing and model selection procedures: This holds true, especially when p and coverage are low! It can be reasonable to incorporate heterogeneity in an estimation model, even if tests suggest that there is no IH present in a data set, because their power is often low. It is always recommended to include biological knowledge and information about e.g. study species and habitat to validate model choice;
3) try to reduce IH by adapting the sampling design: Knowledge about the sources and extent of IH can enable researchers to adapt the sampling design to account for the bias. Among the methods to reduce IH bias in the field, the application of two or more different sampling strategies in combination seems especially promising (Dreher et al. 2007, Boulanger et al. 2008, Settlage et al. 2008). If multiple methods are used simultaneously to sample a population, the impact of IH caused by any single method can be minimised (Pollock 1982, Williams et al. 2002). The improvement of estimations based on multiple approaches increases with decreasing correlation between the applied sampling methods (Boulanger et al. 2008). In the case of hair sampling, the use of unbaited sampling stations (e.g. installed at trails or rubbing trees) and changing of sampling locations between sessions may be applied to reduce IH due to competition between individuals for resources and due to ‘trap happy’ individuals (Scheppers et al. 2007, Boulanger et al. 2008). Collection of faeces samples with the help of trained dogs seemed to increase the detection rate and efficiency of the method considerably, allowing a relatively representative and unbiased population survey compared e.g. to hair sampling (Wasser et al. 2004, Long et al. 2007). Furthermore, it is advisable to perform a sufficiently high number of sampling occasions in order to increase the overall sampling probability and thus to facilitate accurate estimates;
4) try to sample a large part of the population: As shown e.g. in the three badger studies we reviewed, one effective way to reduce the bias caused by IH is to sample a large proportion of the population (Lukacs & Burnham 2005). This is generally desirable and has been recommended in relevant literature many times before (see e.g. Otis et al. 1978, Pollock et al. 1982), but is certainly not always feasible. Furthermore, an increase in sample size can have an unfavourable impact on non-invasive genetic population estimates; the more samples are analysed, the higher the misidentification rate due to genotyping errors (McKelvey & Schwartz 2004). Thus, careful error-checking protocols for genotyping are crucial and genotyping error rates should be determined in order to avoid an increase in bias through misidentification (Maudet et al. 2004, Roon et al. 2005b);
5) consider switching to other sampling strategies: Adapting the sampling design may not always be possible or successful. Furthermore, one problem remains unsolved: even though IH and its sources may be detectable with methods such as radio-telemetry or video observation, exactly quantifying such variation, and thus incorporating it in estimation models, still seems to be very difficult. Consequently, when reduction and/or modelling of IH is not possible, it may be advisable to apply a different sampling method. The suitability of a method can depend e.g. on characteristics of the studied species, population or study area. In some cases, ‘passive’ sampling approaches may yield more representative results than ‘active’ methods. If IH cannot be reduced or avoided, a study should be designed so that it results in capture probabilities between 0.2 and 0.4, to have an ample chance of detecting existing IH.
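The effect of IH on a simple estimator, and the benefit of sampling a larger part of the population, can be illustrated with a small simulation. The following is a minimal sketch, not an analysis from any of the reviewed studies: the population size, the capture probabilities and the function names are hypothetical, and Chapman's bias-corrected Lincoln-Petersen estimator for two sampling occasions is used only because it is simple, not because the reviewed studies relied on it.

```python
import random


def chapman(n1, n2, m):
    """Chapman's bias-corrected Lincoln-Petersen estimate (two occasions).

    n1, n2: animals detected on occasion 1 and 2; m: detected on both.
    """
    return (n1 + 1) * (n2 + 1) / (m + 1) - 1


def mean_chapman(probs, replicates, rng):
    """Average the Chapman estimate over simulated two-occasion surveys.

    probs holds one capture probability per animal (an assumed input).
    """
    total = 0.0
    for _ in range(replicates):
        first = [rng.random() < p for p in probs]
        second = [rng.random() < p for p in probs]
        n1, n2 = sum(first), sum(second)
        m = sum(a and b for a, b in zip(first, second))
        total += chapman(n1, n2, m)
    return total / replicates


rng = random.Random(1)
true_n = 200  # hypothetical population size
half = true_n // 2

# Same mean capture probability (0.3), without and with heterogeneity
homogeneous = [0.3] * true_n
heterogeneous = [0.1] * half + [0.5] * half
# Heterogeneous again, but with a larger share of the population sampled
high_coverage = [0.4] * half + [0.8] * half

mean_hom = mean_chapman(homogeneous, 500, rng)
mean_het = mean_chapman(heterogeneous, 500, rng)
mean_high = mean_chapman(high_coverage, 500, rng)

print(f"true N = {true_n}")
print(f"homogeneous p:            mean estimate = {mean_hom:.1f}")
print(f"heterogeneous p:          mean estimate = {mean_het:.1f}")
print(f"heterogeneous, high p:    mean estimate = {mean_high:.1f}")
```

Under these assumptions, the heterogeneous scenario should underestimate the true size, because easily captured animals dominate the recaptures, while the high-coverage scenario should come closer to it. Real studies involve more sampling occasions and heterogeneity models such as those cited above, so the sketch only shows the direction of the bias.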
In conjunction with problems caused by genotyping errors, IH is a highly challenging issue in non-invasive population estimation. It is a well-known and explicitly discussed problem, at least with regard to its theoretical and model-based aspects. IH can be identified and strongly reduced when field sampling design and analytical approach are carefully prepared. However, more attention should be given to the evaluation of field methods in order to obtain more effective and sustainable population estimates, which is especially important for the conservation of endangered species, and even more so in fragmented habitats.
We wish to thank J. Boulanger for contributing valuable supplemental ideas and J. Arnold, J. Hofmann, W. Maurer, K. Jochum, as well as two anonymous reviewers, for providing further helpful comments on earlier drafts of this manuscript. Our study was funded by the foundation ‘Rheinland-Pfalz fuer Innovation’ and the Ministry for Environment, Forestry and Consumer Protection in Rhineland-Palatinate, Germany. C. Ebert gratefully acknowledges financial support from the FAZIT foundation.
- M. Artois, K. R. Depner, V. Guberti, J. Hars, S. Rossi, and D. Rutili. 2002. Classical swine fever (hog cholera) in wild boar in Europe. Revue Scientifique et Technique de l'Office International des Epizooties 21:287–303.
- D. W. Baber and B. E. Coblentz. 1986. Density, home range, habitat use, and reproduction in feral pigs on Santa Catalina Island. Journal of Mammalogy 67:512–525.
- S. C. Banks, M. P. Piggott, B. D. Hansen, N. A. Robinson, and A. C. Taylor. 2002. Wombat coprogenetics: enumerating a common wombat population by microsatellite analysis of faecal DNA. Australian Journal of Zoology 59:193–214.
- S. C. Banks, S. D. Hoyle, A. Horsup, P. Sunnucks, and A. C. Taylor. 2003. Demographic monitoring of an entire species (the northern hairy-nosed wombat, Lasiorhinus krefftii) by genetic analysis of non-invasively collected material. Animal Conservation 6:101–107.
- J. L. Belant, J. F. Van Stappen, and D. Paetkau. 2005. American black bear population size and genetic diversity at Apostle Islands National Lakeshore. Ursus 16:85–92.
- E. Bellemain, M. A. Nawaz, A. Valentini, J. E. Swenson, and P. Taberlet. 2007. Genetic tracking of brown bear in northern Pakistan and implications for conservation. Biological Conservation 134:537–547.
- E. Bellemain, J. E. Swenson, D. Tallmon, S. Brunberg, and P. Taberlet. 2005. Estimating population size of elusive animals with DNA from hunter-collected feces: four methods for brown bears. Conservation Biology 19:150–161.
- M. R. Boersen, J. D. Clark, and T. L. King. 2003. Estimating black bear population density and genetic diversity at Tensas River, Louisiana using microsatellite DNA markers. Wildlife Society Bulletin 31:197–207.
- D. L. Borchers, S. T. Buckland, and W. Zucchini. 2002. Estimating Animal Abundance - Closed Populations. Statistics for Biology and Health. Springer, London, UK. pp.
- J. Boulanger, S. Himmer, and C. Swan. 2004a. Monitoring of grizzly bear population trends and demography using DNA capture-recapture methods in the Owikeno Lake area of British Columbia. Canadian Journal of Zoology 82:1267–1277.
- J. Boulanger, K. C. Kendall, J. C. Stetz, D. A. Roon, L. P. Waits, and D. Paetkau. 2008. Multiple data sources improve DNA-based capture-recapture estimates of grizzly bears. Ecological Applications 18:577–589.
- J. Boulanger and B. McLellan. 2001. Closure violation in DNA-based capture-recapture estimation of grizzly bear populations. Canadian Journal of Zoology 79:642–651.
- J. Boulanger, B. N. McLellan, J. G. Woods, M. F. Proctor, and C. Strobeck. 2004b. Sampling design and bias in DNA-based capture-mark-recapture population and density estimates of grizzly bears. Journal of Wildlife Management 68:457–469.
- J. Boulanger, M. Proctor, S. Himmer, G. Stenhouse, D. Paetkau, and J. Cranston. 2006. An empirical test for DNA capture-recapture sampling strategies for grizzly bears. Ursus 17:149–158.
- J. Boulanger, G. Stenhouse, and R. Munro. 2004c. Sources of heterogeneity bias when DNA capture-recapture sampling methods are applied to grizzly bear (Ursus arctos) populations. Journal of Mammalogy 85:618–624.
- J. Boulanger, G. C. White, B. McLellan, J. Woods, M. Proctor, and S. Himmer. 2002. A meta-analysis of grizzly bear DNA capture-recapture projects in British Columbia, Canada. Ursus 13:137–152.
- T. Broquet, N. Ménard, and E. Petit. 2007. Non-invasive population genetics: a review of sample source, diet, fragment length and microsatellite motif effects on amplification success and genotyping error rates. Conservation Genetics 8:249–260.
- T. Broquet and E. Petit. 2004. Quantifying genotyping errors in non-invasive population genetics. Molecular Ecology 13:3601–3608.
- K. P. Burnham and D. R. Anderson. 1998. Model selection and multimodel inference - a practical information-theoretic approach. Springer Verlag. pp.
- A. Chao. 1987. Estimating the population size for capture-recapture data with unequal catchability. Biometrics 43:783–791.
- A. L. Chao and S. L. Jeng. 1992. Estimating population size for capture-recapture data when capture probabilities vary by time and individual animal. Biometrics 48:201–216.
- R. Choquet, J-D. Lebreton, O. Gimenez, A-M. Reboulet, and R. Pradel. 2009. U-CARE: Utilities for performing goodness of fit tests and manipulating CApture-REcapture data. Ecography 32:1071–1074.
- K. Coleman and D. S. Wilson. 1998. Shyness and boldness in pumpkinseed sunfish: individual differences are context-specific. Animal Behaviour 56:927–936.
- S. Creel, G. Spong, J. L. Sands, J. Rotella, J. Zeigle, L. Joe, K. M. Murphy, and D. Smith. 2003. Population size estimation in Yellowstone wolves with error-prone noninvasive microsatellite genotypes. Molecular Ecology 12:2003–2009.
- L. Crespin, R. Choquet, M. Lima, J. Merritt, and R. Pradel. 2008. Is heterogeneity of catchability in capture-recapture studies a mere sampling artefact or a biologically relevant feature of the population? Population Ecology 50:247–256.
- S. Cubaynes, R. Pradel, R. Choquet, C. Duchamp, J-M. Gaillard, J-D. Lebreton, E. Marboutin, C. Miquel, A-M. Reboulet, C. Poillot, P. Taberlet, and O. Gimenez. 2010. Importance of accounting for detection heterogeneity when estimating abundance: the case of French wolves. Conservation Biology 24:621–626.
- N. J. Dingemanse, C. Both, A. J. van Noordwijk, A. L. Rutten, and P. Drent. 2003. Natal dispersal and personalities in great tits (Parus major). Proceedings of the Royal Society (London) 270:741–747.
- B. P. Dreher, S. R. Winterstein, K. T. Scribner, P. M. Lukacs, D. R. Etter, G. J. M. Rosa, V. A. Lopez, S. Libants, and K. B. Filcek. 2007. Non-invasive estimation of black bear abundance incorporating genotyping errors and harvested bear. Journal of Wildlife Management 71:2684–2693.
- C. Ebert, D. Huckschlag, H. K. Schulz, and U. Hohmann. 2010. Can hair traps sample wild boar (Sus scrofa) randomly for the purpose of non-invasive population estimation? European Journal of Wildlife Research 56:583–590.
- L. S. Eggert, J. A. Eggert, and D. S. Woodruff. 2003. Estimating population size for elusive animals: the forest elephants of Kakum National Park, Ghana. Molecular Ecology 12:1389–1402.
- A. C. Frantz, L. C. Pope, P. J. Carpenter, T. J. Roper, G. J. Wilson, and R. J. Delahay. 2003. Reliable microsatellite genotyping of the Eurasian badger (Meles meles) using faecal DNA. Molecular Ecology 12:1649–1661.
- A. C. Frantz, M. Schaul, L. C. Pope, F. Fack, L. Schley, C. P. Muller, and T. J. Roper. 2004. Estimating population size by genotyping remotely plucked hair: the Eurasian badger. Journal of Applied Ecology 41:985–995.
- V. Gervasi, P. Ciucci, J. Boulanger, M. Posillico, C. Sulli, S. Focardi, E. Randi, and L. Boitani. 2008. A preliminary estimate of the Apennine brown bear population size based on hair-snag sampling and multiple data source capture-recapture Huggins models. Ursus 19:105–121.
- K. Guschanski, L. Vigilant, A. McNeilage, M. Gray, E. Kagoda, and M. M. Robbins. 2009. Counting elusive animals: Comparing field and genetic census of the entire mountain gorilla population of Bwindi Impenetrable National Park, Uganda. Biological Conservation 142:290–300.
- R. Ihaka and R. Gentleman. 1996. R: A language for data analysis and graphics. Journal of Computational and Graphical Statistics 5:299–314.
- D. Immell and R. G. Anthony. 2006. Estimation of black bear abundance using a discrete DNA sampling device. Journal of Wildlife Management 72:324–330.
- G. Jacob, R. Debrunner, F. Gugerli, B. Schmid, and K. Bollmann. 2010. Field surveys of capercaillie (Tetrao urogallus) in the Swiss Alps underestimated local abundance of the species as revealed by genetic analyses of non-invasive samples. Conservation Genetics 11:33–44.
- S. H. Knapp, B. A. Craig, and L. P. Waits. 2007. Incorporating genotyping error into non-invasive DNA-based capture-recapture population estimates. Journal of Wildlife Management 73:598–604.
- M. H. Kohn, E. C. York, D. A. Kamradt, G. Haught, R. M. Sauvajot, and R. K. Wayne. 1999. Estimating population size by genotyping faeces. Proceedings of the Royal Society (London) 266:657–663.
- W. A. Link. 2003. Nonidentifiability of population size from capture-recapture data with heterogeneous detection probabilities. Biometrics 59:1123–1130.
- W. A. Link. 2004. Individual heterogeneity and identifiability in capture-recapture models. Animal Biodiversity and Conservation 27:87–91.
- R. A. Long, T. M. Donovan, P. Mackay, W. J. Zielinski, and J. S. Buzas. 2007. Comparing faeces detection dogs, cameras, and hair snares for surveying carnivores. Journal of Wildlife Management 71:2018–2025.
- P. M. Lukacs and K. P. Burnham. 2005. Review of capture-recapture methods applicable to non-invasive genetic sampling. Molecular Ecology 14:3909–3919.
- F. Marucco, D. H. Pletscher, L. Boitani, M. K. Schwartz, K. L. Pilgrim, and J-D. Lebreton. 2009. Wolf survival and population trend using non-invasive capture-recapture techniques in the Western Alps. Journal of Applied Ecology 46:1003–1010.
- C. Maudet, G. Luikart, D. Dubray, A. von Hardenberg, and P. Taberlet. 2004. Low genotyping error rates in wild ungulate faeces sampled in winter. Molecular Ecology Notes 4:772–775.
- K. S. McKelvey and D. E. Pearson. 2001. Population estimation with sparse data: the role of estimators versus indices revisited. Canadian Journal of Zoology 79:1754–1765.
- K. S. McKelvey and M. K. Schwartz. 2004. Genetic errors associated with population estimation using non-invasive molecular tagging: problems and new solutions. Journal of Wildlife Management 68:439–448.
- P. D. McLoughlin, H. D. Cluff, R. J. Gau, R. Mulders, R. Case, and F. Messier. 2003. Effect of spatial differences in habitat on home ranges of grizzly bears. Écoscience 10:11–16.
- T. Meijer, K. Norén, P. Hellström, L. Dalén, and A. Angerbjörn. 2008. Estimating population parameters in a threatened arctic fox population using molecular tracking and traditional field methods. Animal Conservation 11:330–338.
- G. E. Menkens and S. H. Anderson. 1988. Estimation of small-mammal population size. Ecology 69:1952–1959.
- C. Mettke-Hofmann, C. Ebert, T. Schmidt, S. Steiger, and S. Stieb. 2005. Personality traits in resident and migratory warbler species. Behaviour 142:1357–1375.
- C. R. Miller, P. Joyce, and L. P. Waits. 2005. A new method for estimating the size of small populations from genetic capture-recapture data. Molecular Ecology 14:1991–2005.
- L. S. Mills, J. J. Citta, K. P. Lair, M. K. Schwartz, and D. A. Tallmon. 2000. Estimating animal abundance using non-invasive DNA sampling: promise and pitfalls. Ecological Applications 10:283–294.
- S. Minta and M. Mangel. 1989. A simple population estimate based on simulation for capture-recapture and capture-resight data. Ecology 70:1738–1751.
- C. Miquel, E. Bellemain, C. Poillot, J. Bessière, A. Durand, and P. Taberlet. 2006. Quality indexes to assess the reliability of genotypes in studies using non-invasive sampling and multiple-tube approach. Molecular Ecology Notes 6:985–988.
- G. Mowat, D. C. Heard, D. R. Seip, K. G. Poole, G. Stenhouse, and D. W. Paetkau. 2005. Grizzly Ursus arctos and black bear Ursus americanus densities in the interior mountains of North America. Wildlife Biology 11(1):31–48.
- G. Mowat and D. Paetkau. 2002. Estimating marten Martes americana population size using hair capture and genetic tagging. Wildlife Biology 8(3):201–209.
- G. Mowat and C. Strobeck. 2000. Estimating population size of Grizzly bears using hair capture, DNA profiling, and mark-recapture analysis. Journal of Wildlife Management 64:183–193.
- M. E. Obbard, E. J. Howe, and C. J. Kyle. 2010. Empirical comparison of density estimators for large carnivores. Journal of Applied Ecology 47:76–84.
- D. L. Otis, K. P. Burnham, G. C. White, and D. R. Anderson. 1978. Statistical inference for capture-recapture experiments. Wildlife Monographs 62:1–135.
- D. Paetkau. 2003. An empirical exploration of data quality in DNA-based population inventories. Molecular Ecology 12:1375–1387.
- E. Petit and N. Valière. 2006. Estimating population size with non-invasive capture-recapture data. Conservation Biology 20:1062–1073.
- M. P. Piggott and A. C. Taylor. 2003. Remote collection of animal DNA and its applications in conservation management and understanding the population biology of rare and cryptic species. Wildlife Research 30:1–13.
- S. Pledger and M. Efford. 1998. Correction of bias due to heterogeneous capture probability in capture-recapture studies in open populations. Biometrics 54:888–898.
- K. H. Pollock, J. D. Nichols, C. Brownie, and J. E. Hines. 1990. Statistical inference for capture-recapture experiments. Wildlife Monographs 107:1–97.
- K. H. Pollock, J. D. Nichols, C. Brownie, and J. E. Hines. 1982. Statistical inference for capture-recapture experiments. Wildlife Monographs 107:1–97.
- K. G. Poole, G. Mowat, and D. A. Fear. 2001. DNA-based population estimate for grizzly bears Ursus arctos in northeastern British Columbia, Canada. Wildlife Biology 7(2):105–115.
- R. Pradel. 2005. Multievent, an extension of multistate capture-recapture models to uncertain states. Biometrics 61:442–447.
- C. Prigioni, L. Remonti, A. Balestrieri, S. Sgrosso, G. Priore, N. Mucci, and E. Randi. 2006. Estimation of European otter (Lutra lutra) population size by fecal DNA typing in southern Italy. Journal of Mammalogy 87:855–858.
- L. R. Prugh, C. E. Ritland, S. M. Arthur, and C. J. Krebs. 2005. Monitoring coyote population dynamics by genotyping faeces. Molecular Ecology 14:1585–1596.
- S. J. Puechmaille and E. Petit. 2007. Empirical evaluation of non-invasive capture-mark-recapture estimation of population size based on a single sampling session. Journal of Applied Ecology 44:843–852.
- D. A. Roon, M. E. Thomas, K. C. Kendall, and L. P. Waits. 2005a. Evaluating mixed samples as a source of error in non-invasive genetic studies using microsatellites. Molecular Ecology 14:195–201.
- D. A. Roon, L. P. Waits, and K. C. Kendall. 2005b. A simulation test of the effectiveness of several methods for error-checking non-invasive genetic data. Animal Conservation 8:203–215.
- J. A. Rudnick, T. E. Katzner, E. A. Bragin, and J. A. DeWoody. 2008. A non-invasive genetic evaluation of population size, natal philopatry, and roosting behaviour of non-breeding eastern imperial eagles (Aquila heliaca) in central Asia. Conservation Genetics 9:667–676.
- M. A. W. Ruis, J. H. A. te Brake, J. A. van de Burgwal, I. C. de Jong, H. J. Blokhuis, and J. Koolhaas. 2000. Personalities in female domesticated pigs: behavioural and physiological indications. Applied Animal Behaviour Science 66:31–47.
- T. L. J. Scheppers, A. C. Frantz, M. Schaul, E. Engel, P. Breyne, L. Schley, and T. J. Roper. 2007. Estimating social group size of Eurasian badgers Meles meles by genotyping remotely plucked single hairs. Wildlife Biology 13(2):195–207.
- K. E. Settlage, F. T. Van Manen, J. D. Clark, and T. L. King. 2008. Challenges of DNA-based capture-recapture studies of American black bears. Journal of Wildlife Management 72:1035–1042.
- K. H. Solberg, E. Bellemain, O-M. Drageset, P. Taberlet, and J. E. Swenson. 2006. An evaluation of field and non-invasive genetic methods to estimate brown bear (Ursus arctos) population size. Biological Conservation 128:158–168.
- R. Sweitzer, D. van Vuren, I. A. Gardner, W. Boyce, and J. D. Waithman. 2000. Estimating sizes of wild pig populations in the north and central coast regions of California. Journal of Wildlife Management 64:531–543.
- P. Taberlet, L. P. Waits, and G. Luikart. 1999. Non-invasive genetic sampling: look before you leap. Trends in Ecology and Evolution 14:323–327.
- D. A. Triant, R. M. Pace III, and M. Stine. 2004. Abundance, genetic diversity and conservation of Louisiana black bears (Ursus americanus luteolus) as detected through non-invasive sampling. Conservation Genetics 5:647–659.
- S. K. Wasser, B. Davenport, E. R. Ramage, K. E. Hunt, M. Parker, C. Clarke, and G. Stenhouse. 2004. Faeces detection dogs in wildlife research and management: application to grizzly and black bears in the Yellowhead Ecosystem, Alberta, Canada. Canadian Journal of Zoology 82:475–492.
- G. C. White, D. R. Anderson, K. P. Burnham, and D. L. Otis. 1982. Capture-recapture and removal methods for sampling closed populations. Los Alamos National Laboratory Report LA-8787-NERP. Los Alamos, New Mexico, USA. pp.
- G. C. White and K. P. Burnham. 1999. Program MARK: survival estimation from populations of marked animals. Bird Study 46:120–138.
- B. K. Williams, J. D. Nichols, and M. J. Conroy. 2002. Analysis and management of animal populations. Academic Press. San Diego, California, USA. pp.
- G. J. Wilson, A. C. Frantz, L. C. Pope, T. J. Roper, T. A. Burke, C. L. Cheeseman, and R. J. Delahay. 2003. Estimation of badger abundance using faecal DNA typing. Journal of Applied Ecology 40:658–666.
- J. G. Woods, D. Paetkau, D. Lewis, B. N. McLellan, M. Proctor, and C. Strobeck. 1999. Genetic tagging of free-ranging black and brown bears. Wildlife Society Bulletin 27:616–627.
Table 1. Details about all the reviewed articles. For each article, every single population estimate is registered separately (e.g. in studies which were carried out over several years or in different study areas). P average is the average per session capture probability reported in the studies (n.r. = not reported). For each study, it is noted if individual heterogeneity (IH) was detected via goodness-of-fit (GOF) testing or model selection procedure.
Support of logistic regression models testing the impact of coverage (i.e. the ratio on the detection of individual heterogeneity in the reviewed studies (‘samp_occ’ = number of sampling occasions).
Support of logistic regression models testing the impact of capture probability on the detection of individual heterogeneity in the reviewed studies (‘p’ = capture probability, ‘p2’ = squared capture probability, ‘samp_occ’ = number of sampling occasions).
Results of the regression model testing for the impact of capture probability on the detectability of individual heterogeneity with best fit according to AIC ranking (model ‘p, p2’).
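The model labelled ‘p, p2’ in these captions can be written out as a logistic regression with a quadratic term in capture probability. The notation below is an assumption made for illustration: the symbols \pi, \beta_0, \beta_1 and \beta_2 do not appear in the original, and no coefficient values are implied.

```latex
\operatorname{logit}(\pi) = \ln\frac{\pi}{1-\pi} = \beta_0 + \beta_1\,p + \beta_2\,p^{2},
\qquad
p^{*} = -\frac{\beta_1}{2\,\beta_2} \quad (\beta_2 < 0)
```

Here \pi stands for the probability that IH is detected in a study with capture probability p. If the fitted \beta_2 is negative, detectability peaks at the intermediate value p^{*}, which would be in line with the recommendation to aim for capture probabilities between 0.2 and 0.4.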