A critique of the use of indicator-species scores for identifying thresholds in species responses

Thomas F. Cuffney

doi:10.1899/12-056.1

How to translate text using browser tools

26 June 2013 A critique of the use of indicator-species scores for identifying thresholds in species responses

Thomas F. Cuffney

Author Affiliations +

Freshwater Science, 32(2):471-488 (2013). https://doi.org/10.1899/12-056.1

Abstract

Identification of ecological thresholds is important both for theoretical and applied ecology. Recently, Baker and King (2010, King and Baker 2010) proposed a method, threshold indicator analysis (TITAN), to calculate species and community thresholds based on indicator species scores adapted from Dufrêne and Legendre (1997). We tested the ability of TITAN to detect thresholds using models with (broken-stick, disjointed broken-stick, dose-response, step-function, Gaussian) and without (linear) definitive thresholds. TITAN accurately and consistently detected thresholds in step-function models, but not in models characterized by abrupt changes in response slopes or response direction. Threshold detection in TITAN was very sensitive to the distribution of 0 values, which caused TITAN to identify thresholds associated with relatively small differences in the distribution of 0 values while ignoring thresholds associated with large changes in abundance. Threshold identification and tests of statistical significance were based on the same data permutations resulting in inflated estimates of statistical significance. Application of bootstrapping to the split-point problem that underlies TITAN led to underestimates of the confidence intervals of thresholds. Bias in the derivation of the z-scores used to identify TITAN thresholds and skewedness in the distribution of data along the gradient produced TITAN thresholds that were much more similar than the actual thresholds. This tendency may account for the synchronicity of thresholds reported in TITAN analyses. The thresholds identified by TITAN represented disparate characteristics of species responses that, when coupled with the inability of TITAN to identify thresholds accurately and consistently, does not support the aggregation of individual species thresholds into a community threshold.

Ecological thresholds are defined rather broadly as abrupt changes in the response of an ecological variable (e.g., species abundance) to a relatively small change in an environmental driver (e.g., urbanization) (Stringham et al. 2003, Groffman et al. 2006, Andersen et al. 2009). Thresholds can indicate the point at which an ecosystem transitions rapidly from one stable state to another with a concomitant change in ecosystem structure and function (Holling 1973, May 1977). Such changes usually represent a diminishment in ecological condition and the goods and services provided by the ecosystem. From a management perspective, threshold identification is important for establishing criteria that prevent degradation of the ecosystem and the subsequent high cost of restoration.

The ability of analytical methods to detect thresholds accurately and consistently is critical to their use as tools for establishing criteria that can protect aquatic resources. Methods that overestimate the threshold lead to degradation of the resource, whereas underestimates overprotect the resource, and inconsistent (i.e., highly variable) estimates provide unreliable protection. Many mathematical and statistical methods have been proposed for detecting thresholds (Brenden et al. 2008, Sonderegger et al. 2009, Dodds et al. 2010), but investigations of the accuracy and precision of these methods and the conditions under which they may be applied are relatively rare in the ecological literature (Daily et al. 2012).

Baker and King (2010, King and Baker 2010) proposed a threshold-detection method based on the indicator-value analysis used by Dufrêne and Legendre (1997) to identify taxa that differentiate among clusters. They describe their threshold indicator analysis (TITAN) program as a nonparametric statistical method for identifying thresholds of individual species, testing the significance of the threshold, estimating uncertainty in threshold estimates, and combining thresholds into an aggregate measure that depicts the community threshold. The TITAN program identifies thresholds by ordering samples along a disturbance gradient (e.g., urban intensity) and then successively splitting the samples into 2 clusters each time the disturbance variable changes value (Table 1). The midpoints of the disturbance variables that differentiate the clusters define a series of potential thresholds in a species' response. Indicator values (IndVals) are calculated for the clusters associated with each potential threshold using a modification of the method used by Dufrêne and Legendre (1997):

where i is the species index, j is the potential threshold that defines the 2 clusters, is the mean abundance in a cluster, O is the occurrence in a cluster, n is the number of samples in a cluster, and max is the maximum. The potential threshold associated with the cluster that has the maximum indicator value (32.5 and 72.5 for Species A and B in Table 1) is the threshold for the response of that species.

Table 1.

Indicator values (IndVal_ij) calculated for each 2-group cluster (1 and 2) associated with potential thresholds (i.e., midpoint between unique disturbance values) for a hypothetical data set. * indicates potential threshold that would not be considered in a threshold indicator analysis (TITAN) with a minimum cluster size criterion of 5.

TITAN restricts the minimum sample size for a group to 3, though a minimum group size of 5 is recommended. Consequently, the search for the threshold often starts at some distance from the low end and ends at some distance from the high end of the gradient (Table 1) with the displacement dependent upon the distribution of the data along the gradient. Permutations of the data in the clusters are used to establish the statistical significance of the TITAN threshold under the null hypothesis of no change (or clustering) in the abundance or occurrence of the species along the gradient. The mean and variance of the IndVals (μ_ind, σ_ind, respectively) estimated from the permutations are used to calculate the z-score for each species that is used in the calculation of the community threshold. Confidence limits for each species' threshold are derived by bootstrapping.

We investigated the ability of TITAN to identify accurately and consistently thresholds in simple models with known response forms, thresholds, and variation. We calculated IndVal_ijs with a direct application of the methods of Dufrêne and Legendre (1997) to assess how IndVal_ijs change in these data sets and how closely the potential thresholds associated with the maximum IndVal_ij correspond to the actual thresholds and the thresholds identified by TITAN. We also investigated the methods used to establish statistical significance and confidence intervals for thresholds in TITAN. The work presented here builds on the analyses that were broadly outlined in Cuffney et al. (2011).

Methods

We assessed the ability of the TITAN method to detect thresholds accurately based on detection of known thresholds in data sets simulating the responses of a community of 225 species to a gradient of disturbance ranging from 0 (least disturbed) to 100 (most disturbed) (Supplemental data file; available online from: http://dx.doi.org/10.1899/12-056.1.s1 (10.1899_12-056.1.s1.xlsx)). We modeled species responses as variations of 6 response models: broken-stick (BS), disjointed broken-stick (BSdj), step-function (SF), dose-response (DR), Gaussian (GA), and linear (LIN) (Fig. 1A–F). BS, BSdj, SF, and DR models (eqs 4–{ label needed for disp-formula[@id='i2161-9565-32-2-471-e05'] }{ label needed for disp-formula[@id='i2161-9565-32-2-471-e06'] }7 below) represented responses with statistical thresholds (i.e., model parameters changed at the threshold), GA models (eq. 8 below) represented responses with an ecological rather than a statistical threshold (i.e., the threshold is defined by the species optimum, μ, and model parameters do not change at the threshold), and LIN models (eq. 3) represented responses without a threshold. We based models on 201 observations evenly distributed across the gradient with no replication.

Fig.1.

Response models used to examine the performance of the Indicator Value (IndVal) and threshold indicator analysis (TITAN) methods: broken-stick (BS) (A), disjointed broken-stick (BSdj) (B), step-function (SF) (C), dose-response (DR) (D), Gaussian (GA) (E), and linear (LIN) (F) models.

The 6 response models were defined as:

{ label needed for disp-formula[@id='i2161-9565-32-2-471-e04'] }

{ label needed for disp-formula[@id='i2161-9565-32-2-471-e05'] }

{ label needed for disp-formula[@id='i2161-9565-32-2-471-e06'] }

{ label needed for disp-formula[@id='i2161-9565-32-2-471-e07'] }

where y_j is the j^th observed species abundance, x_j is the j^th observed disturbance value, a is the intercept, b is the slope, ϕ is the threshold value, ε is the error term [∼N(0, σ²)], δ is the offset value, A is the maximum abundance, μ is the species optimum, and σ is the species tolerance.

LIN models lacked a statistical or ecological threshold and were defined by a single intercept and slope (eq. 3, Fig. 1F). BS models consisted of 2 joined line segments (A–B and B–C; Fig. 1A) that differed in slopes and intercepts (eq. 4, Fig. 1A). BSdj models were similar to the broken-stick models except that the end of the 1^st line segment (B) and the beginning of the 2^nd (C) were displaced along the y-axis with the displacement determined by δ₁ and δ₂ for intercepts and slopes, respectively (eq. 5, Fig. 1B). SF models were similar to the disjointed broken-stick models in that the ends of the 2 line segments (B and C) were displaced along the y-axis (a₁ and a₂; eq. 6, Fig. 1C), but differed in having 0 slopes for both line segments. Because slopes are 0, the intercepts in the SF model are the mean abundances in each line segment (a₁ = μ₁, a₂ = μ₂). The DR models consisted of 3 joined line segments (A–B, B–C, and C–D) in which the slope of segment B–C differed from A–B and C–D and the slopes of A–B and C–D may or may not have differed from one another or from 0 (eq. 7, Fig. 1D). The 3 segments of the DR model result in thresholds at points B and C (ϕ₁ and ϕ₂, respectively). GA models were modeled as a bell-shaped curve with the species' optimum (μ) identifying the ecological threshold (i.e., disturbance value at the transition between increasing and decreasing species abundance), σ representing the species' tolerance, and A the maximum abundance (eq. 8, Fig. 1E). The tails of the GA model were truncated by setting abundances <1 to 0.

We modeled 41 species as LIN, 42 as BS, 41 as BSdj, 36 as SF, 33 as DR, and 32 as GA. We varied model parameters (e.g., location of the thresholds, intercepts, slopes, offsets, maximum abundance, means, and standard deviations) to produce models with different abundance and occurrence characteristics. Disturbance values were uniformly distributed across the gradient and coupled with a single response value (i.e., no replication). We examined the effects of variability on the derivation of thresholds by introducing an additive error term with a normal distribution of mean 0 and a standard deviation expressed as a percentage (1, 5, 10, 25, 40, 60, 80, and 100%) of the response value resulting in 2025 simulated species responses.

These species response models are relatively simplistic and cannot provide a complete representation of all possible response models, but they do represent common patterns of response and provide important insights into the ability of TITAN to detect thresholds and assess uncertainty under a variety of conditions. The simulated responses represent conditions in which the response form and thresholds are known and unambiguous (variance = 0), and conditions in which the threshold and response form, while known, are obscured by varying amounts of random variability. Our approach complements the hypothetical response curves analyzed in Baker and King (2010) and King and Baker (2010) that, while addressing realistic responses, were not able to address issues related to the conditions (e.g., response form, level of variability) under which TITAN can and cannot accurately identify thresholds. We used additional models to investigate the effects of replicated data, data distributions (uniform, right-, and left-skewed), and distributions of 0 values on threshold detection in TITAN.

We calculated thresholds using the TITAN program (Baker and King 2010) with the following default values: 250 permutations, taxa present at a minimum of 5 sites, minimum cluster size of 5, and log₁₀(y + 1)-transformation of abundance data. We repeated TITAN analyses 25 times to assess variability in the estimates of thresholds. We used a smaller set of representative data to examine issues related to bootstrapping because of the extremely long calculation times required for bootstrapping.

We calculated IndVal_ijs independently of TITAN with the method of Dufrêne and Legendre (1997). We placed samples in ascending order along the disturbance gradient and defined potential thresholds as the mid-points between successive unique disturbance values. We log₁₀(y + 1)-transformed abundances prior to calculating IndVals. Table 1 illustrates the process of identifying potential thresholds, calculating IndVals values (eqs 1, 2), and the effects of TITAN's minimum cluster size (5) on threshold detection in a small data set representing 2 species responses that follow a SF model. Two samples had replicate disturbance values (25, 25) resulting in potential thresholds at 22 ([19+25]/2) and 32.5 ([25+40]/2), but not at 25 ([25+25]/2). The clusters defined by each potential threshold are represented as 1s and 2s in Table 1. IndVals can be calculated for all potential thresholds, but TITAN requires a minimum number of samples in each cluster. For example, only 3 (32.5, 41.0, and 51.0) of the 10 potential thresholds in Table 1 would be considered in a TITAN analysis if a minimum cluster size of 5 were used. Independently calculated IndVal_ijs were used to determine the response of IndVal_ijs across the gradient and the relationship between the maximum IndVal_ij (e.g., 32.5 for Species A and 72.5 for Species B) and the TITAN threshold.

Results and Discussion

The TITAN program reported statistically significant (p < 0.05) thresholds for all species response models with <25% variability (Table 2). However, even under ideal conditions (0% introduced variability) the thresholds detected by the TITAN program were consistently accurate estimates of the actual thresholds only for SF models (Fig. 2A–F). Under less-ideal conditions (1–100% introduced variability), the deviation of TITAN thresholds from actual thresholds tended to increase when random variability was ≥40%, particularly for SF models (Fig. 3). Deviations for GA models were the largest of all the models studied and exhibited little change as variability increased. The level of variability (40%) at which statistical significance dropped off and the deviation of SF thresholds increased would not be unexpected for replicate macroinvertebrate samples depending upon the sampling methods used, the abundance of the taxon, and the spatial distribution of the taxon (Elliott 1977, Resh 1979).

Fig.2.

Correspondence between thresholds estimated by threshold indicator analysis (TITAN) and the actual thresholds for the broken-stick (BS) (A), disjointed broken-stick (BSdj) (B), step-function (SF) (C), Gaussian (GA) (D), and lower (E) and upper dose-response (DR) (F) models (variance = 0%). The dashed line indicates the 1∶1 correspondence between the actual thresholds and the thresholds estimated by TITAN.

Fig.3.

Median deviation between the threshold indicator analysis (TITAN)-estimated threshold and the actual threshold at various levels of introduced random variability.

Table 2.

Percentage of species-response models (Online Appendices) that were identified as having statistically significant thresholds (p ≤ 0.05) by the threshold indicator analysis (TITAN) program at levels of introduced variability ranging from 0 to 100% of the response. BS = broken stick, BSdj = disjointed broken stick, DR = dose response, SF = step function, GA = Gaussian, LIN = linear.

Plots of IndVal_ijs showed that the maximum value corresponded to the actual threshold only for SF models (Fig. 4A–H, Online Appendices; available online from: http://dx.doi.org/10.1899/12-056.1.s2 (10.1899_12-056.1.s2.docx)). Maximum IndVal_ij for BS, BSdj, and LIN models tended to occur at either end of the species distribution along the gradient (Fig. 4A–C, H, Online Appendices). If the species' abundance increased across the gradient, IndVal_ij decreased across the gradient. Conversely, if abundance decreased across the gradient, IndVal_ij increased across the gradient. If the species occurred over only a portion of the gradient (truncated species distributions; Fig. 5A–F), IndVal_ijs reached a peak at the point where the response curve intersected the x-axis rather than at the actual threshold. This corresponds to the point on the gradient that divided the data into a cluster consisting of all 0 values and a cluster consisting of non-0 values.

Fig.4.

Broken-stick (BS) models 2 (A) and 26 (B), disjointed broken-stick (BSdj) model 2 (C), step-function (SF) model 3 (D), dose-response (DR) model 4 (E), Gaussian (GA) models 4 (F) and 13 (G), and linear (LIN) model 4 (H) showing the response model (solid black line), the actual threshold (vertical dashed black line), the 25 thresholds estimates obtained from threshold indicator analysis (TITAN) (vertical light gray lines), the mean TITAN threshold (vertical gray dash-dot line), and the independently calculated indicator value (IndVal_ij) (dark gray response line). Model abbreviations and numbers refer to the models in Online Appendices.

Fig.5.

Effect of truncated species distributions (structural 0s) on the ability of TITAN to detect thresholds accurately in broken stick (BS) models 7 (A) and 33 (B), disjointed broken stick (BSdj) model 35 (C), Gaussian (GA) models 10 (D) and 1 (E), and linear (LIN) model 28 (F) showing the response model (solid black line denotes the response domain, black line with white center denotes 0 abundances), the actual threshold (vertical dashed black line), the 25 thresholds estimates obtained from threshold indicator analysis (TITAN) (vertical light gray lines), the mean TITAN threshold (vertical gray dash-dot line), and the independently calculated indicator value (IndVal_ij) (dark gray response line). Model abbreviations and numbers refer to the models in Online Appendices.

IndVal_ijs in GA models (Figs 4F, G, 5D, E, Online Appendices) showed responses that were similar to BS, BSdj, and LIN models when the model was asymmetrically located along the gradient resulting in an overall increasing or decreasing trend in the abundance (GA13 and GA10; Figs 4G, 5D). When GA models were symmetrically located across the gradient (GA4 and GA1; Figs 4F, 5E), peak IndVal_ijs corresponded to the ends of the distribution and reached a minimum, rather than a maximum, value at the threshold (μ). In these cases, multiple runs of the TITAN program detected one or the other end of the GA distribution resulting in a mean threshold estimate that was much closer to the actual threshold than any of the individual TITAN threshold estimates. As with BS, BSdj, and LIN models, the IndVal_ijs in GA models peaked at the point or points where the abundance curve intercepted the x-axis (GA1 and GA10, Fig. 5D, E) rather than at the threshold when the species occurred over only a portion of the disturbance gradient.

IndVal_ijs in DR models typically showed a peak that lay between the 2 thresholds (DR4; Fig. 4E), though some DR models showed a monotonic response across the gradient with a maximum value at the ends of the disturbance gradient if line segment A–B or C–D was very short (e.g., DR17; Online Appendices). Step-function models (SF3; Fig. 4D, Online Appendices) were the only response models that showed a sharp peak in the IndVal_ijs that corresponded to the actual threshold and to the threshold identified by TITAN.

Threshold identification in TITAN

Thresholds estimated by TITAN varied each time the data were run through the program (Fig. 6) even though the underlying data, potential thresholds, and IndVal_ijs did not change (Figs 4A–H, 5A–F, Online Appendices). The variability in threshold estimates occurs because TITAN uses the maximum z-score to identify the threshold rather than the maximum IndVal_ij value (Fig. 7A–C). z-scores are derived from a relatively small subset (e.g., 250–500) of all possible permutations of the data in the clusters. Consequently, the statistics used to calculate z-scores (μ_ind and σ_ind) and to identify the threshold vary each time the program is run. Increasing the number of permutations would appear to address this issue, but other problems with the derivation of z-scores cannot be addressed simply by increasing the number of permutations.

Fig.6.

Variability (coefficient of variation) in the threshold estimates produced by the threshold indicator analysis (TITAN) program for the broken-stick (BS), disjointed broken-stick (BSdj), step-function (SF), dose-response (DR), Gaussian (GA), and linear (LIN) response models. Variability is based on 25 iterations of the TITAN program using the default value of 250 permutations.

Fig.7.

Variables (abundance and indicator values [IndVal_ij]) and statistics (μ, σ, and z-scores of IndVal_ijs) used to identify thresholds in threshold indicator analysis (TITAN) based on a common broken-stick model with left-skewed (A), uniform (B), and right-skewed (C) data distributions. The solid vertical line identifies the actual threshold and the dashed line indicates the TITAN threshold.

Disparities in cluster sizes across the disturbance gradient introduce a systematic bias in the μ_ind and σ_ind statistics used to derive the z-scores and identify thresholds. This bias results in a U-shaped distribution across the gradient (Fig. 7A–C) with larger values of μ_ind and σ_ind at the ends of the gradient and smaller values in the middle. This pattern was evident for all model forms (BS, BSdj, SF, DR, GA, LIN) examined and results from disparities in cluster sizes associated with potential thresholds across the gradient. For example, if data are evenly distributed across the gradient (uniform; Fig. 7B), the larger disparity in cluster sizes at the ends (low and high) of the gradient will bias μ_ind and σ_ind toward high values and z-scores toward low values. This bias results in z-scores that can peak near, but not at, the ends of the gradient and account for why TITAN does not identify thresholds at the ends of the gradient when the IndVal_ijs show a monotonic response across the gradient. The bias in μ_ind, σ_ind, and z-scores persist even if the distribution of data across the gradient is highly skewed (Fig. 7A, C).

TITAN accurately and consistently detected thresholds in responses only for SF models. It could not accurately detect thresholds that were associated with abrupt changes in rates (slopes) or direction (GA) of response nor could it differentiate responses that lacked a threshold (LIN). A more comprehensive approach to threshold detection would be to compare multiple alternative models (Qian et al. 2003, Qian and Cuffney 2012) to determine whether the data fit a response model that is indicative of a threshold or whether the response cannot be differentiated from a model without a threshold (e.g., linear model).

Statistical significance and confidence limits

TITAN uses permutation tests to determine the statistical significance of threshold estimates. On the basis of these tests, the thresholds derived for all simulated species responses with <25% introduced variability were statistically significant (p ≤ 0.004), even those derived from LIN responses that have no thresholds (Table 2). The percentage of statistically significant thresholds tended to decrease as variability increased >25%, but most response models still had a high percentage of statistically significant thresholds at relatively high levels of variability.

TITAN assesses the statistical significance of thresholds using the same permutations that were used to calculate z-scores and identify thresholds (Table 3). The maximum z-score across the gradient (15.66) identifies the threshold (49.5), and the proportion of permuted IndVals that are larger than the IndVal_ij associated with the threshold (52.99) is used to determine the statistical significance of the threshold (p = 0.002). This approach is problematic because the permutations used to identify the threshold (i.e., find the best 2-cluster classification) also are used to test the statistical significance of the threshold (classification). Evaluation of the statistical significance of classifications using permutations requires that the classification must be derived a priori for the tests to be statistically valid, as emphasized in analysis of similarity (ANOSIM; Clarke and Gorley 2006) and multiresponse permutation procedure (MRPP; McCune and Mefford 1999) analyses. Because the permutation tests in TITAN are used both to derive and to test the classifications, the p-values derived from TITAN overestimate the statistical significance of the thresholds. This accounts for the very low p-values (1/[number of permutations + 1]) commonly observed even for LIN models.

Table 3.

Indicator values (IndVal_ijs), means (μ_ind), variances (σ_ind), and z-scores obtained from 500 permutations of the data in the clusters defined by the potential thresholds.

TITAN uses a bootstrap resampling method to calculate confidence intervals for the selected threshold. Bootstrapping uses Monte Carlo simulation to obtain an approximation of the sampling distribution of the variable of interest by substituting random samples from the existing data for random samples from the target population (Efron and Tibshirani 1993). However, bootstrapping is not appropriate for a split-point problem (Bühlmann and Yu 2002, Banerjee and McKeague 2007), such as TITAN, because the estimated standard deviation of the threshold will always be smaller than the true standard deviation, thereby leading to a narrower confidence interval. In a TITAN analysis with k unique gradient values, the number of potential thresholds (k − 1) and their values are fixed. The number of potential thresholds in a bootstrap sample is always <k − 1 and represents a subsample of the same k −1 potential thresholds in the original data. In other words, the bootstrapping process repeatedly selects the threshold from the same pool of k − 1 potential thresholds. Consequently, the bootstrapped estimated standard deviation is much smaller than it should be and the estimated confidence interval is much narrower than it should be. This problem can be illustrated by comparing the distribution of thresholds derived using bootstrapping in TITAN to IndVal_ijs (Fig. 8A–C). The displacement of the distribution toward the center of the gradient is apparent and is similar to the bias introduced in z-score calculations using statistics derived from permutations.

Fig.8.

Three response models that illustrate the changes in abundance, indicator values (IndVal_ij), and distribution of TITAN threshold estimates obtained from bootstrapping along the disturbance gradients. A.—Broken stick (BS)1. B.—BS2. C.—Disjointed broken stick (BSdj).

The uncertainty analyses incorporated into TITAN (permutation tests and bootstrapping) have statistical problems that lead to overestimates of the significance of the thresholds and underestimates of confidence intervals. A better approach to estimating uncertainty would be to use Bayesian methods to estimate model parameters (e.g., ϕ in eqs 4–{ label needed for disp-formula[@id='i2161-9565-32-2-471-e05'] }{ label needed for disp-formula[@id='i2161-9565-32-2-471-e06'] }7, μ in eq. 8) from which statistics (mean, standard deviation, and confidence limits) could be derived and tested (Qian et al. 2003, Qian and Cuffney 2012). Bayesian analysis would provide better estimates of uncertainty that could be applied to a wide range of model types and compared to identify the most appropriate representation of the response.

Abundance, occurrence, and thresholds

TITAN accurately detects thresholds in SF models because the abrupt change in abundance that defines these models (Fig. 1C, eq. 6) divides the data into 2 clusters that maximize the difference in average abundance relative to other cluster pairs. Consequently, the potential threshold associated with the maximum IndVal_ij and maximum z-value corresponds to the actual threshold. In contrast, models in which thresholds are defined by an abrupt change in the rate (slopes in BS, BSdj, or DR; Fig. 1A, B, D) or direction of response (GA; Fig. 1E) produce gradual changes in average abundance across successive clusters. For these models, neither maximum IndVal_ij nor maximum z-value corresponds to the actual threshold unless the model approximates a SF model (e.g., BSdj model with slopes close to 0).

Threshold detection in TITAN also is strongly influenced by the distribution of occurrences across the gradient. As with abundance, TITAN readily detects thresholds that correspond to an abrupt change in occurrence. Such changes are often associated with the limits of a species distribution across the gradient (i.e., species' response domain; Fig. 9A, B) or with the distribution of occurrences within a species' response domain (Fig. 9C, D). TITAN's sensitivity to the distribution of occurrences is evident when species responses include abrupt changes in both occurrence and abundance. In these situations, TITAN identifies thresholds associated with relatively small changes in occurrence while ignoring thresholds associated with large changes in abundance (Fig. 9B–D). Abrupt changes in occurrence represent interesting characteristics of a species response, but whether they should be interpreted as thresholds (i.e., an abrupt change in an ecological attribute) (Groffman et al. 2006) and combined into a community threshold is a matter of debate. For example, if the change in occurrence arises because the species is present over only a portion of the disturbance gradient, the resulting threshold will be consistent with the threshold definition if the transition to 0 constitutes an abrupt change (Fig. 9A), but not if it constitutes a gradual change (Fig. 9B). When the change in occurrence is associated with the distribution of 0 values within the response domain of the species (Fig. 9C, D), these 0 values constitute errors in sampling (failure to detect the species when it is present) or experimental design (response to environmental factors not captured in the disturbance measure). A relatively small difference in the distribution of these errors (0 values) can cause TITAN to identify a threshold associated with the errors even if other thresholds are present (Fig. 9C, D). Identifying thresholds in the distribution of errors within a species' response domain is ecologically and statistically meaningful, but this threshold should not be combined with thresholds representing other types of response characteristics (e.g., abrupt change in slopes, change in direction of response, or limits of the response domain) to form a community threshold.

Fig.9.

The effect of an abrupt change to 0 abundance (A), a gradual change to 0 abundance (B), and changes in the distribution of 0 values within the response domain of a step-function (C) and broken-stick (D) model on thresholds identified by TITAN (vertical gray dashed line).

Confusing thresholds that arise from sampling 0s (errors), structural 0s (response domains), or changes in abundance (slopes and direction) can lead to errors regarding the presence and ecological significance of thresholds. Unfortunately, distinguishing between the various causes of threshold detection to which TITAN is sensitive can be extremely difficult, particularly in real world situations where gradients may be short, experimental control may be tenuous, and levels of organism identification less than optimal (e.g., mix of taxonomic levels), all of which make differentiating responses along the disturbance gradient difficult. Only when analyses are conducted on simple, idealized responses with known forms, thresholds, and levels of random variability can the limitations of threshold methods be determined (Daily et al. 2012).

Synchronicity and data distributions

Baker and King (2010, King and Baker 2010, King et al. 2011) identify synchronicity of thresholds among species as a major finding of their work with TITAN. However, at least some of this apparent synchronicity probably results from a combination of inaccurate threshold estimates, biases in z-scores, and effects of skewed data distributions. Because of z-score bias, models (BS, BSdj, LIN, GA) in which IndVal_ijs exhibit a monotonic response over the gradient will yield TITAN thresholds near the low (increasing abundance) or high (decreasing abundance) end of the gradient regardless of the actual threshold. This result gives the impression that species have similar (synchronous) thresholds when actual thresholds may be very different or nonexistent. Strongly skewed data distributions also cause TITAN to identify species thresholds that are more similar to one another than are the actual thresholds, particularly when the minimum cluster size constraint (5) results in inclusion of the actual threshold in the first or last cluster (e.g., highest threshold in left-skewed and lowest threshold in right-skewed data). This situation causes TITAN to underestimate the actual threshold and increases the similarity among thresholds. These effects can be illustrated by comparing TITAN thresholds derived for BS, BSdj, and SF models that have thresholds at 20, 40, 60, and 80 along a gradient that ranges from 0 to 100. The ranges in TITAN thresholds for each model were strongly affected by the data distribution (uniform, left-, or right-skewed; Fig. 7A–C) and the direction of change (increasing or decreasing) in abundances across the gradient (Table 4). The thresholds derived from SF models with uniform data distributions captured essentially all of the range in the actual thresholds (>98%). However, thresholds derived from SF models with left- or right-skewed data distributions captured only 54 to 58% of the range because the low (20) and high (80) thresholds could not be represented by potential thresholds. Thresholds derived from BS and BSdj models represented 9.3 to 68.2% of the range in actual thresholds. These models were affected by both data distributions (lowest thresholds associated with left- and highest with right-skewed data) and direction of change in abundance (lower thresholds associated with increasing abundance).

Table 4.

Ranges in threshold indicator analysis (TITAN) thresholds expressed as a percentage of the range in actual thresholds (20, 40, 60, 80) for broken-stick (BS), disjointed broken-stick (BSdj), and step-function (SF) models with different data distributions (uniform, left- or right-skewed data) and directions of change in abundances (i.e., increasing or decreasing across the gradient).

These results reinforce the close association between the IndVal method and the SF model and underscore the need to use threshold detection methods that are appropriate for the underlying model and data distribution. Applying TITAN to responses that follow BS, BSdj, DR, GA, and LIN models is likely to produce thresholds that are inaccurate, strongly affected by data distributions, and that indicate a greater level of synchronicity than is supported by the actual models. Selecting an appropriate threshold analysis requires a careful understanding of the capabilities and limitations of the detection method and its applicability to the underlying data model (Daily et al. 2012). Characterization of individual taxon responses and detection of thresholds can be improved by comparing multiple alternative models (Qian and Cuffney 2012) to determine which model best represents the data and whether that model contains a threshold.

Community threshold

The community threshold is intended to provide an assessment of community responses by aggregating the standardized thresholds for the species in the community. The ecological significance of the community threshold depends on TITAN's ability to extract accurate thresholds that represent ecologically equivalent information. Unfortunately, the thresholds detected by TITAN are often imprecise and inaccurate, particularly if the underlying response model is linear or contains a threshold defined by a change in response rate (slope) or direction (BS, BSdj, DR, GA, LIN). TITAN thresholds can also represent very different characteristics of a species response, such as an abrupt change in average abundance (Fig. 9A), the limits of a species' distribution (Fig. 9A, B), or variation in the distribution of sampling errors (Fig. 9C, D) within a species response domain. The diagnostics provided by TITAN (p-values, confidence intervals) are not sufficient to assess which thresholds are statistically meaningful and which thresholds represent ecologically equivalent information about the species responses. Without this information, combining TITAN thresholds into a community threshold carries a significant risk of combining ecologically disparate information into a single index. Before a community threshold can be created, each threshold identified by TITAN must be evaluated to establish that it is a valid threshold and to determine whether it represents equivalent characteristics of the species responses. This evaluation has to occur outside of the TITAN program and should involve examining multiple alternative models to identify the proper model form against which the TITAN-derived threshold can be compared (Qian et al. 2003, Qian and Cuffney 2012). Failure to conduct such in-depth analyses can lead to erroneous conclusions regarding the presence of thresholds and the effect of disturbance on the community. This situation could be problematic if the community threshold were used to establish criteria for protecting the resource, and the resulting threshold was not protective.

Potential thresholds and replication

IndVal_ijs derived independently of TITAN (Figs 4A–F, 5A–H, Online Appendices) were identical to those produced by the TITAN program (variable obsiv in TITAN) as long as the data set did not contain replicate samples. When replicate samples were present, TITAN reported IndVal_ijs for potential thresholds that corresponded to the midpoints between all disturbance values rather than between unique values. For example, TITAN identified 8 potential thresholds (35, 45, 50, 50, 50, 50, 55, and 65) in a data set containing abundances measured at disturbance values of 30, 40, 50, 50, 50, 50, 50, 60, and 70 rather than the 4 potential thresholds (35, 45, 55, 65) that corresponded to the midpoints of the 5 unique disturbance values (30, 40, 50, 60, 70). This approach is problematic because: 1) successive clusters associated with the replicates do not represent a change in disturbance, 2) no basis exists for determining the order in which replicate responses should be incorporated into successive clusters, and 3) successive clusters contain varying numbers of responses for the same disturbance level. As an example, the 4 potential thresholds at disturbance = 50 define 4 clusters that contain 1, 2, 3, and 4 of the replicate responses in the left-hand clusters, and 4, 3, 2, and 1 in the right-hand clusters, respectively. In contrast, the potential threshold at 45 would combine all the observations at disturbance ≤40 into one cluster and all values ≥50 into the other. Treating replicate samples as independent potential thresholds apportions the variability in replicate responses among multiple clusters representing the same disturbance levels. This variability should be associated with a single potential threshold and 2-group cluster that includes all replicates in the same cluster. The TITAN program should be modified to drop IndVal_ijs that are derived from replicate samples to better quantify response variability associated with disturbance levels.

Simplistic models

Our evaluation of the IndVal method and its implementation in TITAN is based on the analysis of relatively simple models. One could argue that these models are unrealistic given the large amounts of variability normally observed in species responses. However, using simple models has enormous advantages when it comes to evaluating the performance of a threshold-detection method (Daily et al. 2012). The form of the response model can be defined in precise mathematical terms (eqs 3–{ label needed for disp-formula[@id='i2161-9565-32-2-471-e04'] }{ label needed for disp-formula[@id='i2161-9565-32-2-471-e05'] }{ label needed for disp-formula[@id='i2161-9565-32-2-471-e06'] }{ label needed for disp-formula[@id='i2161-9565-32-2-471-e07'] }8), which allows method performance to be assessed across different model types (e.g., BS, BSdj, DR, SF, GA, and LIN). The characteristics that define each model (e.g., slopes, offsets, thresholds, replication, variability, occurrences, and distribution of data across the gradient) can be manipulated individually and collectively to identify effects on threshold detection. The use of simple models allowed us to assess the accuracy and precision of thresholds detected in TITAN for various response models and to identify the effects of the response domain (sampling and structural zeros) and data distributions on threshold detection, neither of which would have been feasible with more realistic and less well understood data sets. The philosophy that underlies using simple models for method evaluation is very simple: 1) a method that cannot accurately detect a threshold in a simple model that represents the best statistical conditions for threshold detection is unlikely to detect a threshold accurately in a more complicated situation, and 2) the behavior of an analytical method can be best understood when the characteristics of the underlying models are clearly understood prior to analysis.

Conclusions

The TITAN program has been advanced as a nonparametric method of threshold detection that can be applied to a wide variety of response models (Baker and King 2010) and that can detect thresholds at very low levels of disturbance (King et al. 2011). It is an easy-to-use method that is beginning to appear in the literature (Kail et al. 2012). However, our analysis of the performance of TITAN indicates that it has important limitations that need to be considered. The problems with TITAN primarily arise from 3 sources: 1) the use of the IndVal method (Dufrêne and Legendre 1997) without consideration of its relation to the underlying response model (i.e., a discrete method applied to continuous data), 2) biases in statistics (μ_ind, σ_ind) derived from permutations of the data, and 3) the use of the same permutations to identify and test the statistical significance of thresholds. Collectively, these problems affect the accuracy, precision, and consistency of threshold identification. The concept of creating a community threshold by aggregating thresholds in TITAN is ecologically attractive, but great care must be exercised to establish that the thresholds extracted by TITAN are accurate and that they represent equivalent ecological characteristics of the species' responses. Combining disparate characteristics of species responses can easily lead to an aggregate threshold that has little ecological meaning and that is unsuitable for supporting management decisions or criteria development. Because the threshold detection method used by TITAN (IndVal) gives reliable results only for certain model forms (e.g., SF), it is imperative that the response of each species be evaluated carefully to know whether the response is appropriate for analysis with TITAN. The diagnostic tools provided with the TITAN program are not sufficient to do this evaluation. Instead, multiple alternative models should be evaluated and compared (Qian et al. 2003, Qian and Cuffney 2012) to determine which model best describes each species' response, whether that model contains a threshold, and whether it is amenable to TITAN analysis.

Acknowledgements

We thank Roxolana Kashuba, Ibrahim Alameddine, and Gerald McMahon for their insightful discussions and comments. Ian Waite (US Geological Survey [USGS], Portland, Oregon), John Van Sickle (US Environmental Protection Agency, Corvallis, Oregon), and Michael Lavine (University of Massachusetts, Amherst, Massachusetts) reviewed an early version of the manuscript and provided many valuable comments and suggestions. Comments and suggestions provided by 2 anonymous referees and Associate Editor Heikki Mykrä are also greatly appreciated. The research was completed while Song Qian was supported by the USGS through a USGS–Duke University cooperative agreement (08HQAG0121).

Literature Cited

1.

T Andersen J Carstensen E Hernandez-Garciaand C. M Duarte 2009. Ecological thresholds and regime shifts: approaches to identification. Trends in Ecology and Evolution 24:49–57. Google Scholar

2.

M. E Bakerand R. S King 2010. A new method for detecting and interpreting biodiversity and ecological community thresholds. Methods in Ecology and Evolution 1:25–37. Google Scholar

3.

M Banerjeeand I. W McKeague 2007. Confidence sets for split points in decision trees. Annals of Statistics 35:543–574. Google Scholar

4.

T. O Brenden L. Z Wangand Z. M Su 2008. Quantitative identification of disturbance thresholds in support of aquatic resource management. Environmental Management 42:821–832. Google Scholar

5.

P Bühlmannand B Yu 2002. Analyzing bagging. Annals of Statistics 30:927–961. Google Scholar

6.

K. R Clarkeand R. N Gorley 2006. PRIMER v6: user manual/tutorial. PRIMER-E Ltd, Plymouth, UK. Google Scholar

7.

T. F Cuffney S. S Qian R. A Brightbill J. T Mayand I. R Waite 2011. Response to King and Baker: limitations on threshold detection and characterization of community thresholds. Ecological Applications 21:2840–2845. Google Scholar

8.

J. P Daily N. P Hitt D. R Smithand C. D Snyder 2012. Experimental and environmental factors affect spurious detection of ecological thresholds. Ecology 93:17–23. Google Scholar

9.

W. K Dodds W. H Clements K Gido R. H Hilderbrandand R. S King 2010. Thresholds, breakpoints, and nonlinearity in freshwaters as related to management. Journal of the North American Benthological Society 29:988–997. Google Scholar

10.

M Dufrêneand P Legendre 1997. Species assemblages and indicator species: the need for a flexible asymmetrical approach. Ecological Monographs 67:345–366. Google Scholar

11.

B Efronand R. J Tibshirani 1993.An introduction to the bootstrap Chapman and Hall, New York. Google Scholar

12.

J. M Elliott 1977. Some methods for the statistical analysis of samples of benthic invertebrates. 2^nd edition. Publication 25. Freshwater Biological Association, Ambleside, UK. Google Scholar

13.

P Groffman J Baron T Blett A Gold I Goodman L Gunderson B Levinson M Palmer H Paerl G Peterson N Poff D Rejeski J Reynolds M Turner K Weathersand J Wiens 2006. Ecological thresholds: the key to successful environmental management or an important concept with no practical application? Ecosystems 9:1–13. Google Scholar

14.

C. S Holling 1973. Resilience and stability of ecological systems. Annual Review of Ecology and Systematics 4:1–23. Google Scholar

15.

J Kail J Arleand S. C Jähnig 2012. Limiting factors and thresholds for macroinvertebrate assemblages in European rivers: empirical evidence from three datasets on water quality, catchment urbanization, and river restoration. Ecological Indicators 18:63–72. Google Scholar

16.

R. S Kingand M. E Baker 2010. Considerations for analyzing ecological community thresholds in response to anthropogenic environmental gradients. Journal of the North American Benthological Society 29:998–1008. Google Scholar

17.

R. S King M. E Baker P. F Kazyakand D. E Weller 2011. How novel is too novel? Stream community thresholds at exceptionally low levels of catchment urbanization. Ecological Applications 21:1659–1678. Google Scholar

18.

R. M May 1977. Thresholds and breakpoints in ecosystems with a multiplicity of stable states. Nature 269:471–477. Google Scholar

19.

B McCuneand M. J Mefford 1999. PC-ORD. Multivariate analysis of ecological data. Version 4. MjM Software Design, Gleneden Beach, Oregon. Google Scholar

20.

S. S Qianand T. F Cuffney 2012. To threshold or not to threshold, that is the question. Ecological Applications 15:1–9. Google Scholar

21.

S. S Qian R. S Kingand C. J Richardson 2003. Two statistical methods for the detection of environmental thresholds. Ecological Modelling 166:87–97. Google Scholar

22.

V. H Resh 1979. Sampling variability and life history features: basic considerations in the design of aquatic insect studies. Canadian Journal of Fisheries and Aquatic Sciences 36:290–311. Google Scholar

23.

D. L Sonderegger H. N Wang W. H Clementsand B. R Noon 2009. Using SiZer to detect thresholds in ecological data. Frontiers in Ecology and the Environment 7:190–195. Google Scholar

24.

T. K Stringham W. C Kruegerand P. L Shaver 2003. State and transition modeling: an ecological process approach. Journal of Range Management 56:106–113. Google Scholar

The Society for Freshwater Science

Citation Download Citation

Thomas F. Cuffney "A critique of the use of indicator-species scores for identifying thresholds in species responses," Freshwater Science 32(2), 471-488, (26 June 2013). https://doi.org/10.1899/12-056.1

Received: 5 April 2012; Accepted: 16 January 2013; Published: 26 June 2013

Access the abstract

JOURNAL ARTICLE
18 PAGES

DOWNLOAD PAPER + SAVE TO MY LIBRARY