Stream indicators used to make assessments of biological condition are influenced by many possible sources of variability. To examine this issue, we used multiple-year and multiple-reach diatom, fish, and invertebrate data collected from 20 least-disturbed and 46 developed stream segments between 1993 and 2004 as part of the US Geological Survey National Water Quality Assessment Program. We used a variance-component model to summarize the relative and absolute magnitude of 4 variance components (among-site, among-year, site × year interaction, and residual) in indicator values (observed/expected ratio [O/E] and regional multimetric indices [MMI]) among assemblages and between basin types (least-disturbed and developed). We used multiple-reach samples to evaluate discordance in site assessments of biological condition caused by sampling variability. Overall, patterns in variance partitioning were similar among assemblages and basin types with one exception. Among-site variance dominated the relative contribution to the total variance (64–80% of total variance), residual variance (sampling variance) accounted for more variability (8–26%) than interaction variance (5–12%), and among-year variance was always negligible (0–0.2%). The exception to this general pattern was for invertebrates at least-disturbed sites where variability in O/E indicators was partitioned between among-site and residual (sampling) variance (among-site = 36%, residual = 64%). This pattern was not observed for fish and diatom indicators (O/E and regional MMI). We suspect that unexplained sampling variability is what largely remained after the invertebrate indicators (O/E predictive models) had accounted for environmental differences among least-disturbed sites. The influence of sampling variability on discordance of within-site assessments was assemblage or basin-type specific. 
Discordance among assessments was nearly 2× greater in developed basins (29–31%) than in least-disturbed sites (15–16%) for invertebrates and diatoms, whereas discordance among assessments based on fish did not differ between basin types (least-disturbed = 16%, developed = 17%). Assessments made using invertebrate and diatom indicators from a single reach disagreed with other samples collected within the same stream segment nearly ⅓ of the time in developed basins, compared to ⅙ for all other cases.
Indicators of biological condition, such as multimetric indices (MMIs; Karr et al. 1986) and estimates of taxonomic completeness (observed/expected ratios [O/E] derived from River InVertebrate Prediction And Classification System [RIVPACS]-type models; Wright 2000), are widely used to assess the ecological integrity of streams, but like all measurements, they are influenced by several sources of variability. Potential sources of variability include methods used for field sampling and laboratory sample processing (Kerans et al. 1992, Doberstein et al. 2000, Li et al. 2001, Cao et al. 2005), sampling variability caused by the spatial and temporal distribution of aquatic assemblages (Canton and Chadwick 1988, Linke et al. 1999, Lindstrom et al. 2004), and the indicators themselves (Ostermiller and Hawkins 2004). Regardless, all sources of variability influence our ability to make reliable assessments of biological condition.
Organized methods exist to evaluate and partition multiple sources of variability (e.g., Larsen et al. 2001, Kincaid et al. 2004). However, relatively few investigators have concurrently examined how sampling variability influences stream biological indicators and subsequent assessments (Ostermiller and Hawkins 2004, Carlisle and Meador 2007, Stribling et al. 2008). Even fewer stream investigators have examined sampling variability relative to annual and among-site variability (Carlisle and Meador 2007). Most such studies have been focused on invertebrate assemblages.
The US Geological Survey's (USGS) National Water Quality Assessment Program (NAWQA) was designed to understand specific land-use effects on aquatic ecosystems in different environmental settings across the USA (Gilliom et al. 1995). More than 1500 streams in 45 major river basins in the USA have been sampled. As part of this effort, ecological samples (diatom, fish, and invertebrate) were collected from sites consisting of 3 reaches within predetermined stream segments (site). One of the reaches at a site was designated as primary and was sampled repeatedly over multiple years. This design allowed partitioning of several sources of variability (among-site, among-year, site × year interaction, and residual variance) following a previously described framework (Urquhart et al. 1998, Larsen et al. 2001, Kincaid et al. 2004) for stream diatom, fish, and invertebrate indicators.
Our objectives were to: 1) present the partitioning of several sources of variability (among-site, among-year, site × year interaction, and residual variance) for stream diatom, fish, and invertebrate indicators calculated from NAWQA data distributed across the USA, and 2) quantify and discuss the degree to which sampling variability influenced assessments of biological condition (i.e., impaired vs unimpaired). We discuss comparisons among assemblages and between least-disturbed (reference quality) and developed (primarily dominated by agricultural or urban land use) basins to address questions such as: Does variance partition differently among assemblages? Is the influence of sampling variability on assessments of biological condition assemblage specific? Are patterns similar between least-disturbed and developed basins?
Methods
Data description
All ecological sampling (diatom, fish, and invertebrate assemblages) was done along predefined stream reaches (150–300 m or 20× stream width; Fitzpatrick et al. 1998) that were selected to be representative of a larger stream segment (Frissell et al. 1986). In this context, we selected stream segments (sites) from the NAWQA database where ecological samples had been collected from the same reach in 3 of 4 consecutive years (multiple-year samples) and had been collected from 3 separate reaches in 1 of the 3 years (multiple-reach samples). This selection process resulted in 66 sites distributed across the USA (Fig. 1), each represented by 3 multiple-year and 3 multiple-reach diatom, fish, and invertebrate samples.
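The selection rule described above can be sketched in code. The following is a minimal illustration only, under our own assumptions: the record layout (site, reach, year), the function name, and the reading of "3 of 4 consecutive years" as at least 3 sampled years within any 4-year window are all hypothetical and do not describe the actual NAWQA database query.

```python
from collections import defaultdict

def qualifying_sites(records):
    """Return sites meeting both selection criteria (illustrative sketch).

    records: iterable of (site, reach, year) tuples, one per sample
    (hypothetical layout, not the NAWQA database schema).
    """
    years_by_reach = defaultdict(set)   # (site, reach) -> years sampled
    reaches_by_year = defaultdict(set)  # (site, year)  -> reaches sampled
    for site, reach, year in records:
        years_by_reach[(site, reach)].add(year)
        reaches_by_year[(site, year)].add(reach)

    selected = set()
    for (site, reach), years in years_by_reach.items():
        for start in years:
            # Years in which this reach was sampled within a 4-year window
            window = {y for y in years if start <= y <= start + 3}
            if len(window) < 3:
                continue  # fewer than 3 of 4 consecutive years
            # At least 3 reaches sampled in one of the multiple-year years
            if any(len(reaches_by_year[(site, y)]) >= 3 for y in window):
                selected.add(site)
    return selected

# Invented example records
records = [
    ("S1", "A", 1996), ("S1", "A", 1997), ("S1", "A", 1999),
    ("S1", "B", 1997), ("S1", "C", 1997),
    ("S2", "A", 1996), ("S2", "A", 1997), ("S2", "A", 1998),
    ("S3", "A", 1993), ("S3", "A", 1997), ("S3", "A", 2001),
    ("S3", "B", 1997), ("S3", "C", 1997),
]
selected = qualifying_sites(records)
```

In this invented example only S1 qualifies: S2 lacks multiple-reach samples, and S3 lacks 3 samples from the same reach within 4 consecutive years.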
Designating least-disturbed and developed sites
We used a combination of expert judgment by local USGS biologists, riparian land-cover data, and aerial imagery to make basin designations (Carlisle and Meador 2007, Carlisle and Hawkins 2008). Criteria necessarily differed among regions because of variation in reference-site quality associated with differences in landscape alteration across the USA (Stoddard et al. 2006). Based on these criteria, we designated 20 of the 66 sites as least-disturbed (Fig. 1). The remaining 46 sites were in basins dominated by urban, agricultural, or a mixture of these land-cover types, and therefore, were designated as developed.
Fish, invertebrate, and diatom sampling
All field sampling and sample processing methods followed NAWQA Program protocols and are detailed elsewhere (Cuffney et al. 1993, Walsh and Meador 1998, Moulton et al. 2000, 2002, Charles et al. 2002). In short, biological sampling generally was conducted during low-flow periods along a predefined reach within a designated stream segment (Fitzpatrick et al. 1998). All sampling occurred during a specific seasonal index period (Moulton et al. 2002) and was done by trained USGS personnel. Fishes were collected using a combination of 2-pass electrofishing and seining as described by Moulton et al. (2002), and fish were mostly identified and counted in the field (Walsh and Meador 1998) and released back to the stream. Fish not identified in the field were retained for identification and counting in the laboratory. Invertebrates were collected from 5 discrete 0.25-m² samples taken from riffle substrates or woody snags with a Slack sampler (Cuffney et al. 1993, Moulton et al. 2002). At each site, invertebrate collections were composited into a single sample and passed through a 500-µm mesh sieve. In the laboratory, large and rare invertebrates were removed and the remaining content was subsampled until 300 individuals were extracted, identified, and counted (Moulton et al. 2000). In each reach, diatom samples were collected from the same habitat type as invertebrate samples (riffle substrates or woody snags) with methods detailed by Porter et al. (1993) and Moulton et al. (2002). Diatoms were identified and enumerated from permanent slides at 1000× magnification by personnel at the Patrick Center for Environmental Research (Academy of Natural Sciences, Philadelphia, Pennsylvania) with methods described by Charles et al. (2002).
Invertebrate, fish, and diatom indicators
Numerous indicators are used to assess biological integrity and stream condition by analyzing various attributes of biological assemblages (Davis and Simon 1995, Karr and Chu 1999, Wright et al. 2000). Two commonly used indicators are multimetric indices (MMIs) based on the Index of Biotic Integrity (IBI; Karr et al. 1986) and measures of taxonomic completeness represented by the ratio of observed (O) taxa to the taxa expected (E) to occur at a site in the absence of environmental degradation (Hawkins 2006). Descriptions of O/E-type model construction are detailed elsewhere (Moss et al. 1987, Hawkins and Carlisle 2001, Clarke et al. 2003), as are details of MMI development (Karr et al. 1986, Barbour et al. 1999).
The specific indicators (i.e., MMI or O/E model) used to address our primary goals varied by assemblage (diatom, fish, invertebrates) and region (eastern or western defined by the 100th meridian). We applied MMI or O/E models that were previously developed for NAWQA biological assessments of invertebrates (eastern O/E: Carlisle and Meador 2007, western O/E: Carlisle and Hawkins 2008), fish (eastern O/E: Meador and Carlisle 2009, western MMI: Meador et al. 2008), and diatom (eastern and western MMI: Potapova and Carlisle 2011) assemblages. We standardized indicators based on MMIs (Meador et al. 2008, Potapova and Carlisle 2011) to common nondimensional O/E units (Hawkins 2006) by dividing each site's indicator value by the mean of regional reference-site values used to develop each MMI (not the reference sites analyzed herein). Rescaling indicator values enabled us to compare variance components results directly among assemblages.
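The rescaling of MMI values to O/E-like units amounts to a single division. The sketch below illustrates it with invented numbers; the function name and data layout are hypothetical, and the reference values stand in for the regional reference-site values used to develop each MMI, which are not reproduced here.

```python
from statistics import mean

def rescale_to_oe_units(site_values, reference_values):
    """Divide each site's MMI value by the mean reference-site MMI value.

    site_values: dict mapping site name -> raw MMI value (hypothetical layout).
    reference_values: raw MMI values at the regional reference sites.
    A rescaled value near 1 indicates a site close to the reference mean.
    """
    reference_mean = mean(reference_values)
    return {site: value / reference_mean for site, value in site_values.items()}

# Invented values for illustration only
reference_mmi = [80.0, 70.0, 90.0]   # mean = 80.0
site_mmi = {"A": 72.0, "B": 40.0}
rescaled = rescale_to_oe_units(site_mmi, reference_mmi)
```

After rescaling, a value of 1 corresponds to the mean reference condition for both MMI-based and O/E-based indicators, which is what allows variance components to be compared on a common scale.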
Data analysis
Estimating variance components.—
We estimated 4 variances for each assemblage following a previously described framework (Urquhart et al. 1998, Larsen et al. 2001, Kincaid et al. 2004) based on variance components analysis (Lewis 1978, Van Sickle et al. 2005). We fitted a linear mixed-effects model in which the dependent variable was the indicator value for each assemblage and the variance components estimated were among-site, among-year, site × year interaction, and residual variance. For each assemblage, among-site variance estimates represented site-to-site variation, among-year estimates represented sources of year-to-year variability that affected all sites equally, and interaction estimates represented within-site annual variability (Kincaid et al. 2004). Residual variance estimates accounted for within-site variability from the multiple-reach samples (site replicates) plus any remaining variation unaccounted for by the other 3 components (measurement, analytical, and sample-processing error), which we collectively define hereafter as sampling variability. We justified the treatment of multiple-reach samples as replicates to estimate sampling variability for the following reasons. First, reaches were mostly consecutive and were considered representative of a larger stream segment (segment = site), which was the statistical population being characterized. Second, reaches were often inconsistently ordered among sites, minimizing systematic upstream-to-downstream differences in community structure. Last, reaches within each site were mostly sampled within a few days, minimizing temporal influences on differences in community structure.
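A model of this general form can be written as follows. The notation is ours, introduced for illustration, and assumes independent, normally distributed random effects; it is not copied from the cited framework. Here Y_ijk is the indicator value for sample k collected at site i in year j, and μ is the overall mean.

```latex
\begin{align*}
Y_{ijk} &= \mu + S_i + T_j + (ST)_{ij} + \varepsilon_{ijk},\\
S_i &\sim N\!\left(0,\ \sigma^2_{\text{site}}\right), \qquad
T_j \sim N\!\left(0,\ \sigma^2_{\text{year}}\right),\\
(ST)_{ij} &\sim N\!\left(0,\ \sigma^2_{\text{site}\times\text{year}}\right), \qquad
\varepsilon_{ijk} \sim N\!\left(0,\ \sigma^2_{\text{residual}}\right).
\end{align*}
```

Under this formulation, the 4 variances correspond to the among-site, among-year, site × year interaction, and residual components, respectively.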
Total variance equaled the sum of among-site, among-year, interaction, and residual variance. We calculated each component's contribution to the total variance by dividing its variance by the total variance and multiplying by 100. In addition, we expressed the magnitude of variability in indicator units by taking the square root of each estimated variance in the model (i.e., we re-expressed each variance as the standard deviation) for comparison among assemblages. We estimated variance components with restricted maximum-likelihood procedures. We completed all analyses with the lme4 library (Bates 2010) for R (version 2.10.1; R Project for Statistical Computing, Vienna, Austria).
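The relative (percentage of total) and absolute (standard deviation) summaries described above can be illustrated with a short sketch. The variance values below are invented for illustration and are not estimates from this study; the function and key names are hypothetical. The sketch takes variance estimates as input and does not fit the mixed-effects model itself.

```python
import math

def summarize_components(variances):
    """Express each variance component as % of total variance and as an SD.

    variances: dict mapping component name -> estimated variance
    (e.g., as extracted from a fitted mixed-effects model).
    """
    total = sum(variances.values())
    return {
        name: {
            "percent": 100.0 * value / total,  # relative contribution
            "sd": math.sqrt(value),            # magnitude in indicator units
        }
        for name, value in variances.items()
    }

# Invented variance estimates (indicator units squared)
example = {
    "site": 0.0300,
    "year": 0.0001,
    "site_x_year": 0.0040,
    "residual": 0.0059,
}
summary = summarize_components(example)
for name, stats in summary.items():
    print(f"{name}: {stats['percent']:.1f}% of total, SD = {stats['sd']:.3f}")
```

Reporting both quantities is useful because a component can make up a large share of a small total variance while still having a small standard deviation in indicator units.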
Evaluating the influence of sampling variability on site assessments.—
We sought to evaluate how sampling variability influenced site assessments of biological condition. Therefore, we designated a threshold to separate discrete condition classes (impaired or unimpaired). Several methods have been used for setting thresholds to define levels of biological impairment (Barbour et al. 1999, Hemsley-Flint 2000, Clarke et al. 2003, Van Sickle et al. 2005, Aroviita et al. 2010). We applied the 10th-percentile indicator value of the reference-site distribution from each previously developed O/E model (Carlisle and Meador 2007, Carlisle and Hawkins 2008, Meador and Carlisle 2009) or MMI (Meador et al. 2008, Potapova and Carlisle 2011) to assess whether a site was impaired or unimpaired. Using the 10th percentile of the reference distribution of each previously developed indicator tool (and not of the least-disturbed sites evaluated herein) enabled us to compare the influence of sampling variability on site assessments among assemblages. For this comparison, we used the multiple-reach samples to represent sampling variability and calculated the proportion of sites for which multiple-reach assessments disagreed. We discuss these results among assemblages and between least-disturbed and developed sites as % disagreement of within-site assessments.
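The % disagreement calculation can be sketched as follows. This is a minimal illustration under stated assumptions: the threshold is supplied as a given number (the 0.80 used here is invented), a value strictly below the threshold is treated as impaired, and a site counts as discordant if its multiple-reach samples do not all fall in the same condition class. Function names and the data layout are hypothetical.

```python
def classify(value, threshold):
    """Assign a condition class; values below the threshold are impaired (assumed rule)."""
    return "impaired" if value < threshold else "unimpaired"

def percent_disagreement(reach_values_by_site, threshold):
    """Percentage of sites whose multiple-reach assessments disagree.

    reach_values_by_site: dict mapping site -> list of indicator values,
    one per reach sampled at that site (hypothetical layout).
    """
    discordant = 0
    for values in reach_values_by_site.values():
        classes = {classify(v, threshold) for v in values}
        if len(classes) > 1:  # reaches fall in different condition classes
            discordant += 1
    return 100.0 * discordant / len(reach_values_by_site)

# Invented indicator values for 4 sites, 3 reaches each
reach_values = {
    "A": [0.95, 0.90, 1.02],  # all unimpaired
    "B": [0.82, 0.78, 0.85],  # discordant
    "C": [0.60, 0.70, 0.65],  # all impaired
    "D": [0.79, 0.81, 0.90],  # discordant
}
disagreement = percent_disagreement(reach_values, threshold=0.80)
```

In this invented example, the 2 discordant sites (B and D) have values close to the threshold, so small differences among reaches are enough to change the assigned condition class.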
Results
Variance component estimates
Among-site and among-year variance.—
In most cases, among-site variance accounted for the largest portion of total variance among assemblages and basin types (Fig. 2A, B). In developed basins, variation attributable to differences among sites accounted for 64 to 79% of the total variance (Fig. 2B). In least-disturbed basins, among-site variance was the greatest source of variation for diatoms (80%) and fish (73%) but not for invertebrates (36%) (Fig. 2A). In contrast, among-year variance was negligible (0.0–0.2% of the total) regardless of basin type or assemblage (Fig. 2A, B), indicating that no annual variation affected all sites equally. This result was somewhat expected because forces that drive annual variation in biological condition probably are inconsistent across the conterminous USA.
Site × year interaction and residual variance.—
Residual variance (sampling variability) accounted for more of the total variance than interaction variance (within-site annual variability) in all cases except diatoms at least-disturbed sites, where interaction variance was slightly higher (12%) than residual variance (8%) (Fig. 3). However, partitioning patterns were assemblage specific between basin types (Fig. 3). For fish, interaction variance accounted for 5% of the total variance regardless of basin type, whereas residual variance accounted for 16% in developed and 22% in least-disturbed basins. For diatoms, interaction variance was similar between basin types (12% least-disturbed, 10% developed), whereas residual variance accounted for a less-consistent percentage of the total (8% least-disturbed, 26% developed). Partitioning patterns were similar for invertebrates at developed sites, where interaction variance accounted for 12% and residual variance for 19%. However, the pattern was very different for invertebrates at least-disturbed sites, where residual variance accounted for 64% and the interaction accounted for 0%.
Discordance in site-condition assessments
Discordance of site-condition assessments depended on basin type and assemblage. Disagreements among within-site assessments were nearly 2× as common at developed sites (29–31%) as at least-disturbed sites (15–16%) for diatoms and invertebrates, whereas discordance was similar among fish assessments regardless of basin type (16% least-disturbed, 17% developed).
Discussion
Predominance of among-site variance
The predominance of among-site variance at least-disturbed sites is probably the result of a variety of factors, including variability in site quality and the method used to estimate biological condition. In theory, among-site variation in biological condition should be minimal at least-disturbed sites to increase the likelihood of detecting the effects of anthropogenic influence. For indicators of biological condition that are scaled by an expectation derived from reference sites (e.g., O/E, MMIs), great care generally is taken to maximize the precision with which the expectation is estimated. The precision of estimates of expected conditions often is improved by accounting for site-specific environmental settings (e.g., site and basin characteristics relatively insensitive to human activities; Hawkins 2006). Accounting for site-specific factors is the rationale for O/E models (Moss et al. 1987), which use environmental features at each site to estimate site-specific expectations of assemblage composition. Similar approaches have improved precision of estimated values of algal metrics (Cao et al. 2007).
We used only O/E models to assess invertebrates, a combination of O/E models and MMIs to assess fish assemblages, and only MMIs to assess diatoms. Among-site variance at least-disturbed sites was lower for invertebrate than for fish or diatom indicators, probably because the O/E models for invertebrates accounted for the environmental setting of each site and were constructed using common taxa (i.e., capture probability >0.5). In contrast, diatom indicator values partitioned the most among-site variance at least-disturbed sites, probably because site-specific factors were only partially accounted for in the ecoregional stratification scheme used to develop the MMIs (Cao et al. 2007). Fish indicator values partitioned among-site variance intermediately to invertebrates and diatoms, perhaps because our analysis of fish included both O/E models (Meador and Carlisle 2009) and MMIs (Meador et al. 2008). We suspect that among-site variance at least-disturbed sites could be reduced by adjusting the MMIs for factors that influence fish (Angermeier and Winston 1999) and diatoms (Stevenson 1997) at scales finer than ecoregion.
The predominance of among-site variation seems inevitable at developed sites given the variable degree and type of anthropogenic disturbance inherent to the wide variety of natural settings represented in our study. These among-site differences had a stronger influence on variance partitioning than other sources of variation regardless of assemblage. Comparing sites at smaller spatial scales, such as within environmentally homogeneous ecoregions, probably would yield smaller among-site effects because of greater similarity among sites. More investigations are needed to evaluate how the spatial scale of an assessment influences our ability to separate anthropogenic disturbance from natural environmental factors.
Invertebrate residual variance at least-disturbed sites
Residual variance (sampling variability) accounted for most of the total variance (64%) for invertebrates at least-disturbed sites. In this case, among-year and interaction variance did not account for any of the total variance, so the variance was partitioned between 2 (sampling and site) of the 4 components. The relative residual variance was high, but the absolute magnitude of this variance (SD: 0.11; Fig. 4A) was comparable to all other cases at least-disturbed (SD range: 0.09–0.14) and developed sites (SD range: 0.11–0.16; Fig. 4B). We suggest that unexplained sampling variability (sensu Van Sickle et al. 2005) is largely what remained after the invertebrate O/E models (Carlisle and Meador 2007, Carlisle and Hawkins 2008) had accounted for environmental differences among least-disturbed sites. We also suspect that the diatom and fish MMIs and O/E models collectively did not account as well for environmental differences among least-disturbed sites as the invertebrate O/E models did.
Discordance in site condition
To save costs associated with sample collection and processing (Resh et al. 1995), stream assessments are rarely replicated (i.e., multiple reaches or multiple collections/reach). A sample collected from a representative reach (Fitzpatrick et al. 1998, Barbour et al. 1999) is assumed to be representative of community attributes along a larger stream segment (Rabeni et al. 1999, Gregg and Stednick 2000, Meador and McIntyre 2003), even though others have shown that this assumption may be incorrect (Lenz and Rheaume 2000, Brigham and Sadorf 2001, Gebler 2004). Nevertheless, most segment-scale assessments are made from a single sample collected from a representative reach, and investigators rarely report estimates of uncertainty.
Our results were comparable to those of others who have evaluated % disagreement in assessments from multiple-reach or paired-type samples with invertebrate indicators. Assessments from 21 reference sites in the eastern USA showed a 16% disagreement (Carlisle and Meador 2007), which was comparable to our larger-scale findings (15%). Percent disagreement of assessments from repeated-sample pairs ranged between 15 and 23% for indicators used by the Montana Department of Environmental Quality (Stribling et al. 2008). Neither group compared % disagreement between least-disturbed and developed sites (Carlisle and Meador 2007, Stribling et al. 2008). If we average our findings for invertebrates across least-disturbed and developed sites, our results are comparable (22% disagreement) to those reported by Stribling et al. (2008). In our study, most disagreements in assessments occurred when indicator values were near the impairment threshold (i.e., 10th-percentile value of the reference distribution for each indicator), which suggests that the result was an artifact created by the choice of thresholds relative to the distribution of indicator values in these data. Unfortunately, this artifact is often unavoidable and is inherent to the distribution of indicator values. In cases where assessments are made based on an indicator value that is near the threshold, information from additional sampling is needed to understand the uncertainty of the assessment (Stribling et al. 2008).
Conclusions
Our results showed general patterns of variance partitioning among diatom, fish, and invertebrate indicators. In most cases, among-site variance dominated the relative contribution to the total variance, residual variance (sampling variance) accounted for more variability than the site × year interaction (within-site annual variance), and among-year variance was negligible. Departures from this general result appeared dependent on the ability of an indicator to account for differences among least-disturbed sites and were specific to certain basin types and assemblages. We also found that data from a single reach could potentially misclassify segment-scale biological condition nearly ⅓ of the time when using invertebrate and diatom indicators in developed basins. This result was strongly influenced by the distribution of indicator values from developed basins relative to the predetermined impairment thresholds we used. Collectively, our results suggest that variance partitioning and discordance in assessments can be assemblage- and basin-type specific. However, more assemblage-specific research is needed to account better for among-site differences inherent to large-scale assessments.
Acknowledgements
We thank the US Geological Survey personnel who spent countless hours collecting data as part of the NAWQA Program. Also, we thank James B. Stribling, Wade Bryant, and 4 anonymous referees for helpful suggestions that improved the quality of this manuscript. This study was part of an ecological synthesis of the data collected as part of the NAWQA Program of the USGS.