Stream metabolism is used to characterize the allochthonous and autochthonous basis of stream foodweb production. The metabolic rates of respiration and gross primary production often are estimated from changes in dissolved O_{2} concentration in the stream over time. An upstream–downstream O_{2} accounting method (2-station) is used commonly to estimate metabolic rates in a defined length of stream channel. Various approaches to measuring and analyzing diel O_{2} trends have been used, but a detailed comparison of different approaches (e.g., required reach length, method of measuring aeration rate [k], and use of temperature-corrected metabolic rates) is needed. We measured O_{2} upstream and downstream of various reaches in Kings Creek, Kansas. We found that 20 m was the approximate minimum reach length required to detect a significant change in O_{2}, a result that matched the prediction of a calculation method to determine minimum reach length. We assessed the ability of models based on 2-station diel O_{2} data and k measurements in various streams around Manhattan, Kansas, to predict k accurately, and we tested the importance of accounting for temperature effects on metabolic rates. We measured gas exchange directly with an inert gas and used a tracer dye to account for dilution and to measure velocity and discharge. Modeled k was significantly correlated with measured k (Kendall's τ, *p* < 0.001; regression adjusted *R*^{2} = 0.70), but 19 published equations for estimating k generally provided poor estimates of measured k (only 6 of 19 equations were significantly correlated). Temperature correction of metabolic rates allowed us to account for increases in nighttime O_{2}, and temperature-corrected metabolic rates fit the data somewhat better than uncorrected estimates. Use of temperature-correction estimates could facilitate cross-site comparisons of metabolism.

Metabolic activity in streams is driven by autochthonous production and use of allochthonous inputs. Stream metabolism indicates total biotic activity and interacts with water quality via basic ecosystem properties, such as nutrient uptake rates, C flux into the food web, and trophic status (heterotrophic and autotrophic state; Dodds 2007). Diel trends in dissolved O_{2} have been used to measure whole-system metabolism since Odum (1956) introduced the method. Gross primary production (GPP), community respiration (R), and aeration rates (k) drive changes in O_{2} concentration over time. Stream metabolic rates are estimated by measuring how each factor increases or decreases O_{2} over distance or time. Net ecosystem production (NEP) is the sum of GPP and R, and NEP, GPP, and R are fundamental indicators of organism-mediated C gain or loss in an ecosystem.

Researchers commonly use either the 1-station or the 2-station method for calculating whole-stream metabolism. The 1-station method is based on O_{2} measurements from 1 point in the stream, and the 2-station method is based on measurements at an upstream and a downstream point. Two-station methods are useful because they allow estimation of metabolism in a defined reach of stream (Bott 2006). However, researchers debate the best procedures for measuring and calculating metabolism from 2-station methods and when 2-station methods should be used (e.g., Parkhurst and Pomeroy 1972, Genereux and Hemond 1992). Procedural issues include the best way to estimate rates of gas exchange between the water column and the atmosphere, the influence of temperature on metabolic rates, and the best distance across which to apply the 2-station method.

The length of stream required to detect metabolic rates when using the 2-station method is an important consideration in application of the method and has only received modest attention. Reichert et al. (2009) defined the reach length required between sampling stations as 0.4v/k, where v is velocity and k is the aeration rate. We needed to know minimum reach length to assess responses in animal-exclusion experiments in riffle–pool segments (Bertrand et al. 2009), and we identified the minimum reach length directly instead of relying on estimates of k and v. In this paper, we verify the analysis of Reichert et al. (2009), which could be useful for other types of experiments requiring reach-specific 2-station estimates of metabolism.

k must be known to make accurate estimates of metabolic rates. Modeling k or using simple equations to estimate k would be preferable to direct measurement because direct measurement is difficult and costly. Such information is particularly important for 2-station metabolism methods. Numerous authors have estimated k based on physical properties of the stream channel (see Parker and Gay 1987 for 19 empirical equations), some have modeled k (e.g., Atkinson et al. 2008, Dodds et al. 2008, Holtgrieve et al. 2010), and others have measured k directly by adding a conservative hydrologic tracer and an inert gaseous tracer to stream channels (e.g., Grant and Skavroneck 1980, Genereux and Hemond 1990, Wanninkhof et al. 1990). Morse et al. (2007) related turbulence to sound level to estimate k. A few investigators have compared methods for estimating k (Kosinski 1984, Young and Huryn 1999, Aristegi et al. 2009), and modeled and measured k have been compared in 1 river (Dodds et al. 2008). However, we are not aware of stream studies in which k-values from modeling (nonlinear curve-fitting method) have been compared to those obtained from direct measurement and empirical equations across multiple small streams.

Temperature influences metabolic rates (Gulliver and Stefan 1984, Megard et al. 1984, Ambrose et al. 1988) and k (Elmore and West 1961, Bott 2006). Some calculation methods account for diel variation in temperature (Holtgrieve et al. 2010), but others do not (van de Bogert et al. 2007). We observed that O_{2} concentrations increased overnight as stream temperature decreased in some systems, particularly in streams that have significant temperature swings between night and day. Decreasing respiration throughout the night, probably driven by decreasing temperature, is one explanation for the O_{2} increases. Lower nighttime temperatures would increase O_{2} saturation but decrease the rate of aeration. Thus, we attempted to parse out these temperature effects and to explore the influence of temperature correction of metabolic rates on calculated rates of O_{2} flux.

We focused on a 2-station model for metabolism with measured k-values. We investigated the following questions: 1) What is the shortest reach length that can be used to estimate metabolism with a 2-station method, and does this match the predictions of Reichert et al. (2009)? 2) Is it possible to model k with sufficient accuracy that measured values of k are not necessary? 3) Does use of temperature-corrected estimates of metabolic rates vs uncorrected estimates affect modeling outcomes?

## Methods

*Site selection*

We used study reaches in Kings Creek, which is situated within Konza Prairie Biological Station (KPBS). KPBS is in the northern part of the Flint Hills region near Manhattan, Kansas. Detailed descriptions of KPBS sites were published by Gray et al. (1998) and Gray and Dodds (1998). We conducted the reach-length study in 2 subwatersheds, N04D and AL. Subwatershed N04D has an open canopy or shrub cover, is continuously grazed by the native American bison (Bos bison), and is burned every 4 y. Site AL is in the lower reaches of Kings Creek in the gallery forest. N concentration is higher in N04D than AL (Kemp and Dodds 2001).

O'Brien et al. (2007) published a detailed description of the 6 streams, from watersheds in native prairie or with various degrees of urbanization or agriculture, that we used to address questions 2 and 3. These 6 streams were a subset of 72 streams used in the Lotic Intersite Nitrogen Experiment II (LINX) and in a broad comparison of stream metabolism measures (Bernot et al. 2010). We used 1^{st}- to 3^{rd}-order streams of similar size and slope (Table 1) that varied in NO_{3}^{−} content (0.9–21,000 µg NO_{3}-N/L) and canopy cover (0 to >70% shaded). The 3 streams representing prairie/reference streams were N04D (bison grazed), Shane Creek (ungrazed), and Natalie's Creek (lightly grazed by cattle, 20 km northwest of KPBS). Ag North and Swine Creek had a small amount of urban area high in the watersheds. Ag North had extensive row-crop agriculture, and Swine Creek had row crops and animal-holding facilities. Ag North and Swine Creek had open canopies and high NO_{3}^{−} concentrations (35% and 21,000 µg NO_{3}-N/L, respectively). Campus Creek was an urban stream with a tree or shrub canopy cover along most of the study reach.

## Table 1.

Site characteristics for streams used in calculating aeration and metabolism. T = temperature, w = width, d = depth, v = velocity, and Q = discharge. * indicates sites used only for comparison of measured and modeled aeration.

*Field sampling and laboratory analyses*

In July 2005, we took water samples at the tops and bottoms of numerous reaches in N04D and AL at midday and around midnight to measure small-scale upstream–downstream changes in O_{2} and to test minimum reach length needed to estimate metabolism. We chose these times to coincide with expected maximal rates of GPP and R. Both sites contained 8 contiguous pool-and-riffle combinations ranging from 7 to 77 m in length. These reaches were generally cobble-bottomed and had slopes of ∼3 to 3.5%. We used the Winkler method with replication because the precision and accuracy of the method (APHA 1995) are better than those of typical O_{2} electrodes (based on technical specifications; data not shown). We adapted standard procedures for 300-mL biological O_{2} demand (BOD) bottles to 60-mL BOD bottles with high-precision titration procedures. We filled 6 replicate bottles at each site and added reagents. We titrated the samples within 6 h of sampling (APHA 1995) to measure O_{2}.

We used the 2-station upstream–downstream method (Marzolf et al. 1994, Young and Huryn 1998) to estimate metabolic rates at baseflow. We measured 2-station diel O_{2} curves once in each stream in May or June 2003–2005 and multiple times in 2 Kings Creek watersheds, N04D and K02A (ungrazed and burned every 2 y), in 2006 and 2007 (sites listed in Table 1). We measured dissolved O_{2} and temperature with YSI logging data sondes (Yellow Springs Instruments, Yellow Springs, Ohio) set to record values every 10 min. We calibrated the sondes together at a single stream station in the field immediately before deployment. We immersed sondes completely for 30 min to bring all sonde bodies to the same temperature as the water and each other because calibration depends on the temperatures of the sensor and sonde enclosure. We calibrated all sondes to water-saturated air and allowed them to log for 30 min. If sondes were not reading within 3% of each other, we repeated calibration until all sondes were within 3% before deployment. At the end of deployment, we placed the sondes together at 1 station for 30 min. If the sondes did not read the same value post-deployment, we corrected the data assuming a linear drift in calibration over the period of measurement, except in cases of severe probe malfunction, in which case we discarded the data.
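The linear drift correction described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used in the study: it assumes evenly spaced readings (as with 10-min logging), no offset at the start of deployment, and a hypothetical `end_offset` equal to the difference between one sonde and the reference reading when the sondes were placed together after deployment. The function name and example values are invented for illustration.

```python
def correct_linear_drift(readings, end_offset):
    """Remove a calibration drift that grows linearly over a deployment.

    readings   -- evenly spaced O2 values from one sonde (assumed spacing)
    end_offset -- sonde reading minus reference reading at the end (assumed)
    """
    n = len(readings)
    if n < 2:
        return list(readings)
    # Drift is taken as 0 at the first reading and end_offset at the last.
    return [value - end_offset * i / (n - 1) for i, value in enumerate(readings)]


# Hypothetical example: a sonde that reads 0.2 mg/L high by the end.
corrected = correct_linear_drift([8.0, 8.1, 8.4], 0.2)
```

The first reading is left unchanged and the last is lowered by the full offset; readings in between are adjusted in proportion to elapsed time.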

We measured light with a Li-Cor LI-1000 datalogger equipped with a photosynthetically active radiation (PAR) sensor (Li-Cor, Lincoln, Nebraska). We logged light measurements every hour at the sites studied in 2003–2005 and every 10 min at the sites studied in 2006–2007. We placed the PAR sensor on a level, elevated object in an area with open canopy next to the stream in full sunlight to measure daily variation in light availability for primary producers. The model requires relative PAR over the day, so we did not have to correct for canopy cover.

We assumed that physical measurements of gas-exchange rates in the field would yield the best estimate of the O_{2} k value. At all streams, we measured k under similar discharge conditions and in the same reaches where we made diel O_{2} measurements. We used a relatively inert gas (propane or acetylene) and a relatively conservative tracer dye (rhodamine WT) or ion (Br^{−}). We made subsequent measurements of k at a subset of the sites using the inert gas SF_{6} and obtained comparable rates. Thus, microbial consumption of the propane or acetylene was not significant over our typical experimental times (data not shown). We dissolved dye and ion solutions in water purified by reverse osmosis. We used an FMI laboratory pump (model QBG; Fluid Metering, Inc., Syosset, New York) to release solutions at a consistent rate as we released the gas into the stream through an airstone at a constant rate controlled by a 2-stage regulator. We positioned the airstone and the tube releasing the dye or ion inside a T-shaped polyvinyl chloride (PVC) tube placed upstream of the 1^{st} sampling point to allow gas and tracer dye or ion to mix completely with stream water before the 1^{st} sampling point (Dodds et al. 2008).

We measured rhodamine fluorescence in the field with a handheld Aquafluor fluorometer (model 8000-010; Turner Designs, Sunnyvale, California) and Br^{−} in the field with a handheld, ion-specific probe. Once measurements at the downstream station had <1% change/min (reached plateau), we assumed complete mixing and began sampling for dissolved gases and dilution of the tracer. At 5 of the LINX streams (Ag North, N04D, Campus, Swine, and Shane Creeks), we measured gas replicates at varying points along the stream reach. At Natalie's Creek and the N04D and K02A sites in 2006–2007, we measured gas in replicate samples at the top and bottom of the reach.

At each gas sampling point, we slowly drew 40 mL of water (so cavitation did not cause degassing of the solution) into a 60-mL syringe with a 3-way stopcock attached. We drew 20 mL of He (gas chromatography carrier gas) into each syringe and shook the syringe for 3 min to allow the headspace to come to equilibrium with the dissolved gas in the syringe. Then we injected the 20 mL of gas into an evacuated vial (Vacutainer®, 15 mL). We analyzed the remaining solution for tracer ion or dye concentration to account for dilution on a sample-specific basis. We discarded samples in vials that did not maintain vacuum. We analyzed gas samples within 24 h with a gas chromatograph (Shimadzu GC-14A; Shimadzu, Columbia, Maryland) equipped with a flame-ionization detector. We used the difference in average gas-peak area from points along the reach or from upstream to downstream (depending on the site) to calculate k.

We calculated standard error for estimates of k based on the measured gas values depending on how gas samples were collected. For streams in which we measured gas concentrations longitudinally, we obtained standard error from regression analysis based on the error of the slope of the log(*x*)-transformed data. For streams in which we measured gas at the top and bottom of the reach, we used a pooled Student's *t*-test to test for significant differences from upstream to downstream, and we calculated the pooled standard error of the differences between the upstream and downstream gas replicates.
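For the longitudinal case, the regression-based estimate of k and its standard error can be sketched as below. This is an illustrative reconstruction, not the authors' calculation: it assumes that gas peak areas are dilution-corrected by dividing by the tracer concentration, that the natural log is used, and that the regression is against travel time, so that k is the negative slope. All function and argument names are hypothetical.

```python
import math


def k_from_longitudinal_samples(travel_time_min, gas_peak_area, tracer_conc):
    """Estimate k (per min) and its standard error from samples along a reach."""
    if len(travel_time_min) < 3:
        raise ValueError("need at least 3 sampling points for a slope error")
    # Dilution correction (assumed form): gas signal per unit conservative tracer.
    y = [math.log(g / c) for g, c in zip(gas_peak_area, tracer_conc)]
    x = list(travel_time_min)
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares gives the standard error of the slope.
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    slope_se = math.sqrt(sse / (n - 2) / sxx)
    return -slope, slope_se


# Synthetic check: gas lost at exactly 0.05 per min with no dilution.
times = [0.0, 5.0, 10.0, 15.0]
gas = [100.0 * math.exp(-0.05 * t) for t in times]
k_est, k_se = k_from_longitudinal_samples(times, gas, [1.0] * 4)
```

With noise-free synthetic data the slope error is essentially zero; with field replicates it would reflect scatter around the fitted line.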

We used length and mean width in all reaches to calculate discharge and travel time. We made width measurements every few meters along the length of each reach and took 5 depth measurements across each width transect.
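As a rough illustration of the reach calculations, the sketch below uses two standard relationships that are assumptions here rather than equations quoted from the text: discharge as the product of mean width, mean depth, and mean velocity, and travel time as reach length divided by mean velocity. The function names and example values are hypothetical.

```python
def discharge_m3_per_min(mean_width_m, mean_depth_m, velocity_m_per_min):
    """Discharge as cross-sectional area times mean velocity (assumed form)."""
    return mean_width_m * mean_depth_m * velocity_m_per_min


def travel_time_min(reach_length_m, velocity_m_per_min):
    """Time for water to move from the upstream to the downstream station."""
    if velocity_m_per_min <= 0:
        raise ValueError("velocity must be positive")
    return reach_length_m / velocity_m_per_min


# Hypothetical reach: 2 m wide, 0.1 m deep, 6 m/min velocity, 60 m long.
q = discharge_m3_per_min(2.0, 0.1, 6.0)
tt = travel_time_min(60.0, 6.0)
# Number of 10-min logging intervals by which downstream data would be lagged.
lag_steps = round(tt / 10.0)
```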

*Modeling and calculation*

We approached the question of minimum reach length by assuming that some minimum distance exists below which a difference in O_{2} will be undetectable, but that a difference might be detectable at longer distances. Thus, this minimum distance should be a threshold, and highly significant changes should be detected only above this threshold. We used an accurate and precise replicated titrimetric technique to ensure the best possible chance of detecting significant differences between upstream and downstream O_{2} concentrations. We used *p*-values from Student's *t*-tests with Bonferroni-adjusted α = 0.0006 to assess whether O_{2} concentrations differed between upstream and downstream sampling points for each study reach and each sampling time. Then we used a 2-dimensional Kolmogorov–Smirnov test (a nonparametric method for identifying breakpoints in variance for bivariate data; Garvey et al. 1998) on the *p*-values to estimate the threshold distance below which significant differences were not detectable. We also applied the 0.4v/k-equation from Reichert et al. (2009) to the v- and k-values from reaches in N04D and AL (8 contiguous pool-and-riffle reaches) to calculate minimum reach length requirements and compared these results to the value obtained from the Kolmogorov–Smirnov test.
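The reach-length criterion of Reichert et al. (2009), 0.4v/k, is simple enough to sketch directly. The snippet below is an illustration only: the units (m/min for v, 1/min for k) and the velocity and aeration values are hypothetical, not data from N04D or AL.

```python
from statistics import median


def minimum_reach_length(velocity_m_per_min, k_per_min):
    """Minimum distance between stations, 0.4 * v / k (Reichert et al. 2009)."""
    if k_per_min <= 0:
        raise ValueError("k must be positive")
    return 0.4 * velocity_m_per_min / k_per_min


# Hypothetical (v, k) pairs for three reaches.
reaches = [(5.0, 0.10), (4.0, 0.08), (6.0, 0.08)]
lengths = [minimum_reach_length(v, k) for v, k in reaches]
typical_length = median(lengths)
```

Because the criterion scales with v/k, fast water or slow aeration lengthens the required reach, and slow water or rapid aeration shortens it.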

We used physical measurements and change in O_{2} over time between stations to parameterize a model for estimating k and the sensitivity of metabolic rates to temperature correction. Our modeling approach was to calculate O_{2} every 10 min as influenced by rates of GPP, R, and aeration. We used Solver in Microsoft Excel (version 2007; Microsoft Corporation, Redmond, Washington) to find the best fit of our modeled O_{2} to observed O_{2}. Solver is a minimization procedure that uses a Newton search method (precision = 0.0000000001, convergence = 1%, and tolerance = 0.00000001). We used Solver to minimize the sum of squares of error (SSE) between modeled and measured values by changing the basic rates (GPP, R, and k) that drove the model.

We obtained the diel temperature and O_{2} data (10-min temporal resolution) needed to run the model from the sondes. Additional data required for the model included reach characteristics (length, depth, width, average v, and discharge), barometric pressure, and light. The variables and equations used to construct the model are provided in Tables 2 and 3, respectively. The model spreadsheet is available from the authors upon request. Equations used in this model are similar to those in Holtgrieve et al. (2010).

The temperature and O_{2} values were offset by the calculated travel time (Table 3, Eq. 3). We used the equation published by Elmore and West (1961) as modified by Bott (2006) to correct k for temperature (Table 3, Eq. 4). Elmore and West (1961) used 1.0241 and Bott (2006) used 1.024 as the temperature coefficient. We used a relationship adapted from Parkhill and Gulliver (1999) to correct R for temperature (R_{T}; Table 3, Eq. 7). In model runs without correction for temperature, we used a single value of R_{T}. We modeled GPP with a hyperbolic tangent model developed by Jassby and Platt (1976) to link photosynthesis and irradiance (Table 3, Eq. 8). Photoinhibition generally is not observed in intact periphyton assemblages (Dodds et al. 1999). We did not model photoinhibition because our model was insensitive to maximum photosynthesis (P_{max}) with our data (results not shown). In model runs with correction for temperature, we used an equation published by Parkhill and Gulliver (1999) to correct P_{max} for temperature (Pmax_{T}; Table 3, Eq. 8). In model runs without correction for temperature, we used single values of P_{max} and the initial slope of the photosynthesis–irradiance curve (α).
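Two of the model components named above can be sketched in a few lines. This is a hedged illustration, not a copy of the equations in Table 3: it assumes the common form k_T = k_20 × 1.024^(T − 20) for the temperature correction of aeration and the form P = P_max tanh(αI / P_max) for the hyperbolic tangent model. The SSE helper mirrors the quantity that Solver minimized. All names are hypothetical.

```python
import math


def k_at_temperature(k20, temp_c, theta=1.024):
    """Aeration rate at temp_c from its value at 20 C (assumed Arrhenius-type form)."""
    return k20 * theta ** (temp_c - 20.0)


def gpp_tanh(p_max, alpha, par):
    """Hyperbolic tangent photosynthesis-irradiance curve (no photoinhibition)."""
    if p_max == 0:
        return 0.0
    return p_max * math.tanh(alpha * par / p_max)


def sum_squared_error(modeled, measured):
    """Sum of squared differences between modeled and measured O2."""
    return sum((m - o) ** 2 for m, o in zip(modeled, measured))


# Hypothetical values for illustration only.
k_cool = k_at_temperature(1.0, 15.0)   # lower than k20 in cooler water
gpp_dark = gpp_tanh(10.0, 0.05, 0.0)   # no light, no production
sse = sum_squared_error([8.0, 8.5], [8.0, 8.3])
```

In a full model these pieces would be evaluated at every 10-min step and combined with respiration and the saturation deficit before the SSE is minimized.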

## Table 2.

Variables used for calculations and in the model. Subscript mod = modeled, subscript meas = measured values.

## Table 3.

Equations used in the model along with a reference for the equation if taken from the literature. See Table 2 for abbreviations. Subscripts d and u refer to downstream and upstream, respectively; avg = average.

Any modeling method using nonlinear curve-fitting approaches is somewhat subjective. Iterative numerical methods can find locally stable solutions that are not globally optimal. Thus, we compared graphs of measured vs modeled values to evaluate the fit and to ensure the SSE (Table 3, Eq. 11) was minimized. When fitted curves did not match data, our first step was to rerun the model with altered initial parameters. If this step did not correct the mismatch of data or generated nonsense results, we examined the original O_{2} data for anomalies that could have thwarted modeling efforts. For example, animals occasionally entered sonde housings and caused drastic short-term dips in O_{2}. In obvious cases, we corrected the diel O_{2} trace. In some cases, we discarded the entire run because of sonde malfunction. Last, to test for bad solutions that can occur when some functions are analyzed in Solver (McCullough and Heiser 2008), we compared Solver results to results from another data set (obtained from a separate sonde deployment that resulted in similar O_{2} and temperature values) calculated with an R rate that provided solutions within 10% of rates calculated with our model in Excel.

We ran 3 general model scenarios. In the first scenario, GPP, R, and k were corrected for temperature and Solver changed Pmax_{T}, α_{T}, k, and R_{T} to minimize the SSE between measured and modeled O_{2} values. Then measured vs modeled values of k were compared to assess the model's ability to predict k while using R_{T}, GPP (Pmax_{T}, α_{T}), and k to fit the observed data. We assumed that measured k was more accurate than modeled k, and thus, all subsequent model scenarios were run with measured k. We used a paired Student's *t*-test (paired 2-sample for means) to compare predicted R and GPP between the 2 remaining scenarios: one with temperature correction of GPP, R, and measured k (fully temperature-corrected model) and another in which only measured k was corrected for temperature (k-only temperature-corrected model).

We compared measured k to the modeled k and to k calculated from the 19 empirical equations collected by Parker and Gay (1987) with a nonparametric Kendall's τ correlation analysis (STATISTICA 6.0; StatSoft, Tulsa, Oklahoma). The empirical equation that was significantly correlated with the measured k value and had the highest *R*^{2} from regression analysis was deemed the best empirical equation. We corrected all empirical equations and measured values to values expressed at 20°C (Parker and Gay 1987) for this comparison.
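For readers without access to a statistics package, Kendall's τ can be computed from pairwise comparisons. The sketch below implements the tie-corrected τ-b variant in plain Python; it is an independent illustration, not the STATISTICA routine used in the study, and it does not compute a *p*-value. The function name and example values are hypothetical.

```python
import math


def kendall_tau_b(x, y):
    """Kendall's tau-b rank correlation from concordant and discordant pairs."""
    concordant = discordant = tied_x_only = tied_y_only = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx = x[i] - x[j]
            dy = y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # tied in both variables: excluded from the denominator
            if dx == 0:
                tied_x_only += 1
            elif dy == 0:
                tied_y_only += 1
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    denom = math.sqrt((concordant + discordant + tied_x_only)
                      * (concordant + discordant + tied_y_only))
    if denom == 0:
        raise ValueError("tau is undefined when one variable is constant")
    return (concordant - discordant) / denom


# Hypothetical measured and estimated k values (not data from this study).
tau = kendall_tau_b([1.0, 2.0, 3.0], [1.1, 3.2, 2.4])
```

A rank-based statistic of this kind asks only whether larger measured values pair with larger estimated values, so it is paired in the text with regression *R*^{2} to judge how well the estimates fit.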

## Results

*Analysis of minimum reach length and characteristics*

We detected a significant breakpoint in the variance associated with *p*-values at a reach length of 20 m (Fig. 1). Below this length, *p*-values were generally >0.006, whereas above this length, *p*-values were variable, a result indicating that O_{2} may or may not differ between upstream and downstream points depending on the relative productivity in the reach. These data suggest that 20 m is the minimum distance required to detect a difference in O_{2} given the metabolic rates and k in this stream and our measurement methods. When we applied the equation from Reichert et al. (2009) to our data, the median predicted reach length across sites was 25 m, which is comparable to the 20-m threshold identified with the Kolmogorov–Smirnov test.

We identified the locations and times of maximum change in O_{2} concentration in each reach from midday and midnight Winkler measurements at the Kings Creek sites (watershed N04D and AL; Fig. 2). O_{2} was below saturation in the 1^{st} pool and had increased by the 2^{nd} pool. At both sites, O_{2} was higher at the most downstream than at the most upstream point. At both sites, the reaches were fed by low-O_{2} groundwater upstream of the uppermost sampling point. The groundwater effect was apparent in the increase in O_{2} downstream even at night, as the subsaturated groundwater equilibrated with the atmosphere. O_{2} increased more in a downstream direction during the day than at night because of O_{2} generation by photosynthesis.

*Analysis of methods to calculate aeration*

Sensitivity analyses with k fixed (uncorrected) showed that R and GPP were closely correlated with k (data not shown). That is, if k was increased by some factor, the model predicted that GPP and R were increased by the same factor. Therefore, we always corrected k for temperature.

We measured k more than once at some sites (N04D and K02A), but values were not correlated among years, so we treated these values as independent observations. Temperature-corrected modeled k and temperature-corrected measured k were correlated (Kendall's τ, *p* < 0.001; adjusted *R*^{2} = 0.70; Fig. 3A) across all 16 sampling points. Thus, modeled k could be used to predict measured k.

*Temperature correction of metabolic rate*

Temperature correction affected model predictions. Predicted values of R (*p* < 0.02) and GPP (*p* < 0.03) differed between models in which R, GPP, and k were temperature-corrected (fully temperature-corrected) and those in which only k was temperature-corrected (k-only temperature-corrected). After Bonferroni adjustment of α (0.05/2 = 0.025), predicted R differed significantly and predicted GPP differed marginally (0.05 > *p* > 0.025) between the 2 model scenarios. In 4 of the 6 cases, modeled SSEs were lower by 0.5 to 12% in the fully temperature-corrected than in the k-only temperature-corrected models. These results suggest that models fit data better when GPP and R were temperature-corrected than when they were not. Fully temperature-corrected models also explained the observed nighttime increase in O_{2}, but in the case of nighttime data, full temperature correction did not strongly influence the SSE (data not shown).

k-only temperature-corrected models yielded estimates of R that were, on average, 10% lower (less negative) than estimates from fully temperature-corrected models. The difference in estimates was smallest in Shane (3% lower) and largest in N04D (18% lower). k-only temperature-corrected models yielded estimates of GPP that were, on average, 14% lower than estimates from fully temperature-corrected models. The difference in estimates of GPP was smallest in Campus (1% lower) and largest in Natalie's Creek (50% lower).

Estimates of daily NEP from both model scenarios indicated that 3 streams were net heterotrophic and 3 streams were net autotrophic. The heterotrophic status of the streams did not change between the 2 model scenarios, but the magnitude of the metabolic rates differed. In general, estimates of R were greater (more negative) with full temperature correction than with k-only temperature correction (Table 4). GPP and R estimates were influenced similarly by temperature correction, so the temperature correction did not have as great an influence on NEP as on R or GPP (NEP is the sum of GPP and R, and the differences offset each other).
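The sign convention behind these statements can be made explicit with a small sketch. It assumes, as in the text, that R is expressed as a negative O_{2} flux so that NEP is the sum of GPP and R. The function names and the example rates are hypothetical and are not values from Table 4.

```python
def net_ecosystem_production(gpp, r):
    """NEP as the sum of GPP (positive) and R (negative), in g O2 m^-2 d^-1."""
    return gpp + r


def trophic_status(gpp, r):
    """Classify a reach from the sign of NEP."""
    nep = net_ecosystem_production(gpp, r)
    if nep > 0:
        return "net autotrophic"
    if nep < 0:
        return "net heterotrophic"
    return "balanced"


# Hypothetical daily rates for two reaches.
status_a = trophic_status(6.0, -4.0)
status_b = trophic_status(2.0, -5.0)
```

If temperature correction raises GPP and makes R more negative by similar amounts, the two changes largely cancel in the sum, which is why NEP is less sensitive to the correction than either rate alone.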

## Table 4.

Daily metabolism results (g O_{2} m^{−2} d^{−1}) from the fully temperature-corrected model (measured aeration [k] and temperature-corrected aeration, respiration [R], gross primary production [GPP]) and the k-only temperature-corrected model for 6 Kansas streams. Ag North, N04D, and Swine were net autotrophic. NEP = net ecosystem production.

## Discussion

*How typical were the streams we studied?*

Whole-stream metabolism has been measured in various ecosystems and in reaches of varying lengths. In a broad study of N metabolism, Mulholland et al. (2008) characterized 72 streams, and discharge ranged from 0.01 to 16.08 m^{3}/min. The discharge of our streams was within this range. The slope of our streams also fell within the range of slopes for the 72 streams (Bernot et al. 2010). Metabolic rates in our streams were mostly within the upper 75^{th} and lower 25^{th} percentiles of the range of values reported by Young et al. (2008), although by their criteria, values in some of our reference streams (N04D, Shane and Natalie's Creeks) would result in classification of these streams as being in “satisfactory river health” rather than in reference condition. Thus, the streams used in our study appear to be typical of other low-order streams where metabolism has been measured.

*Temperature correction and other aspects of modeling*

A comparison of modeled and actual results allows us to discuss several features of application of our model. We use Ag North and Natalie's Creek as examples and compare O_{2} change (ΔO_{2}) driven by R, GPP, and k for both model scenarios (Fig. 4A–C).

In the fully temperature-corrected model for Ag North, ΔO_{2} R was −2.7 mg O_{2} L^{−1} reach length^{−1} at night and −4.0 mg O_{2} L^{−1} reach length^{−1} during the day, whereas in the k-only temperature-corrected model, ΔO_{2} R was constant at −4.7 mg O_{2} L^{−1} reach length^{−1} (Fig. 4B). In the fully temperature-corrected model for Ag North, ΔO_{2} GPP ranged from 0 to 9.4 mg O_{2} L^{−1} reach length^{−1} (Fig. 4A). However, in the k-only temperature-corrected model, ΔO_{2} GPP was more variable and ranged from 0 to 12.0 mg O_{2} L^{−1} reach length^{−1} because temperature-corrected R was higher than uncorrected R. In the models for Ag North, ΔO_{2} k showed similar patterns to ΔO_{2} GPP, and values were more variable with the k-only temperature-corrected model (−7.5 to 4.5 mg O_{2} L^{−1} reach length^{−1}) than the fully temperature-corrected model (−4.0 to 2.4 mg O_{2} L^{−1} reach length^{−1}) (Fig. 4C). In both models, maximum ΔO_{2} k occurred at night when temperatures were the lowest. For Ag North, measured and modeled ΔO_{2} were in close agreement (Fig. 5). The saw-blade pattern in the modeled curve was caused by the low (hourly) temporal resolution of the light measurements and illustrates that the modeled O_{2} probably responded more quickly to changes in light than measured O_{2}.

In Natalie's Creek (one of the more variable data sets), the model captured the major trends in the measured values (Fig. 6). An animal probably entered the probe enclosure at about 0800 h on day 1 and caused a downward spike in O_{2}. In Ag North, O_{2} did not increase at night, whereas O_{2} did increase during the night at downstream sampling points in Natalie's Creek as temperature decreased (Fig. 7A). The fully temperature-corrected model at Natalie's Creek showed that both k and R decreased during the night (Fig. 7B). Overall, k decreased at night because R decreased and the system was not forced as far from saturation.

*The 2-station method and minimum reach length*

We focused on the 2-station method because metabolism is measured in a physically defined reach. This method is strongly recommended when O_{2} concentration is strongly influenced by upstream factors, e.g., groundwater input immediately above the reach where metabolism is measured. If the O_{2} saturation of groundwater is known, then the influence of groundwater can be corrected with the 2-station method (Hall and Tank 2005). Such correction cannot be made with the 1-station method because the upstream influence on metabolic rates is poorly defined and the importance of groundwater influences is difficult to assess.

Knowledge of the minimum reach length required for detecting differences in O_{2} is helpful when using the 2-station method. In the extreme case, k can be too high for any metabolic measurement regardless of reach length. If k and water replacement (mean v) are low, substantially shorter reaches might yield significant results. Our results suggest that a reach <20 m long probably cannot be used to assess whole-stream metabolism in streams similar to our study streams. Our data matched the predictions of Reichert et al. (2009), which suggests that their approach to determining minimum reach length could be useful for future studies.

*Estimation of k*

k can be measured, modeled, or calculated empirically from published equations and calculation methods (energy dissipation, surface exchange, nighttime regression). We did not use the nighttime regression method (Hornberger and Kelly 1975), which is a common method, because nighttime regression is based on the assumption that nighttime R is constant throughout a 24-h period. This assumption was not met at all of our sites because R was temperature dependent in several reaches.

Physical measurement of k in the field is technically difficult. Modeling or calculating k would be simpler and more cost-effective, and we have shown that modeling k could be a viable approach. We regard direct measurement of k by the 2-station method in the reach used for measuring metabolism as the best option for obtaining estimates of k because this method directly measures gas flux rates as water moves between the 2 defined points in the stream. However, such measurement comes with its own methodological errors (e.g., horizontal error bars in Fig. 3A, B). Nevertheless, modeling k in conjunction with R and GPP is probably less accurate than measuring it directly because the 3 parameters can covary, so different combinations of values can give similar results. For example, if R and GPP rates are doubled, then doubling k will lead to similar diel patterns of O_{2}. If measuring k is not possible, our data suggest that the next best way to obtain k would be to model it with a nonlinear curve-fitting method. The least reliable option for obtaining k for streams similar to ours would be to use empirical equations.
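The covariation of k with R and GPP can be illustrated with a toy simulation. The sketch below is a simplified one-box O_{2} balance, not the 2-station model used in this study. The light curve, parameter values, units, and function name are all assumptions chosen only to show the effect of doubling the 3 parameters together.

```python
import math

# Toy one-box O2 model (not the 2-station model used in the study):
#   dC/dt = GPP(t) - R + k * (Cs - C)
# All parameter values below are hypothetical.

CS = 9.0  # saturation concentration, mg O2 L^-1 (assumed constant)

def simulate_last_day(gpp_max, resp, k, days=3, steps_per_day=1440):
    """Euler-integrate the model and return O2 for the final day."""
    dt = 1.0 / steps_per_day          # time step in days
    conc = CS                         # start at saturation
    last_day = []
    for step in range(days * steps_per_day):
        t = step * dt
        # Half-sine light curve: daylight from 06:00 to 18:00, zero at night.
        gpp = gpp_max * max(0.0, math.sin(2.0 * math.pi * (t - 0.25)))
        conc += dt * (gpp - resp + k * (CS - conc))
        if step >= (days - 1) * steps_per_day:
            last_day.append(conc)
    return last_day

base = simulate_last_day(gpp_max=20.0, resp=10.0, k=30.0)
doubled = simulate_last_day(gpp_max=40.0, resp=20.0, k=60.0)

diel_range = max(base) - min(base)
max_diff = max(abs(a - b) for a, b in zip(base, doubled))
print(f"diel range = {diel_range:.2f} mg/L, max difference = {max_diff:.2f} mg/L")
```

In this toy case both parameter sets share the same quasi-steady-state concentration, Cs + (GPP − R)/k, so the two curves differ only in how quickly O_{2} tracks changes in light. This is one way to see why curve fitting alone can struggle to separate k from R and GPP.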

We compared the measured k-values for the 6 LINX Kansas streams to modeled k-values and k-values calculated from 19 published empirical aeration equations (Table 5). The measured k-values were significantly correlated with estimates of k obtained with 6 equations, and 5 of these correlations had *p* = 0.039. The equation published by Parkhurst and Pomeroy (1972) incorporated v, channel slope, stream depth, and the Froude number. It yielded estimates of k that had the greatest correlation coefficient and lowest *p*-value (*p* = 0.015) when compared to measured k, but the regression was a poor fit (*R*^{2} = 0.172; data not shown). Tsivoglou and Neal (1976) incorporated travel time and the difference in elevation from the top to bottom of the reach into their equation. This equation yielded k-values that gave the highest adjusted *R*^{2} (0.59 and 0.72, respectively, for modeled and calculated k regressed against measured k; Fig. 3A, B). The predicted slope of the relationship between modeled and measured k was not significantly different from 1 (*p* > 0.05), whereas the slope of the calculated vs measured k was significantly <1 (*p* < 0.05). Root mean square error was 1.8× greater for the comparison of measured k with calculated than with modeled k. Last, both modeled and calculated k underestimated measured k (the mean error was negative), but the mean error was 3.2× greater for measured vs calculated as opposed to measured vs modeled k. This analysis suggests greater bias is introduced when using calculated than when using modeled values to estimate k. Based on correlation and regression analysis, the aeration equation from Tsivoglou and Neal (1976) would be the best of the 19 empirical aeration equations for estimating k in streams similar to our study streams. Even though the measured k-values were greater than the values calculated from the Tsivoglou and Neal (1976) equation, the regression was significant. The Parkhurst and Pomeroy (1972) and Tsivoglou and Neal (1976) equations both included stream slope as a component and both fit moderately well (Table 5).
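The comparison statistics used above (Kendall's τ, mean error, and root mean square error) can be sketched as follows. The k-values in the sketch are invented for illustration and are not the data in Table 5 or Fig. 3. The function names are hypothetical, and the τ calculation is the simple tau-a form, which assumes no ties and does not provide a *p*-value.

```python
import math

# Hypothetical k-values (d^-1) for six streams; NOT the data in Table 5 or Fig. 3.
measured = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0]
modeled = [9.0, 19.0, 31.0, 38.0, 47.0, 58.0]
calculated = [6.0, 12.0, 15.0, 25.0, 22.0, 30.0]

def mean_error(estimate, reference):
    """Mean of (estimate - reference); negative values indicate underestimation."""
    return sum(e - r for e, r in zip(estimate, reference)) / len(reference)

def rmse(estimate, reference):
    """Root mean square error of the estimate relative to the reference."""
    squared = sum((e - r) ** 2 for e, r in zip(estimate, reference))
    return math.sqrt(squared / len(reference))

def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) / total pairs; assumes no ties."""
    concordant = discordant = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            sign = (x[i] - x[j]) * (y[i] - y[j])
            if sign > 0:
                concordant += 1
            elif sign < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)

for name, est in (("modeled", modeled), ("calculated", calculated)):
    print(name,
          "tau =", round(kendall_tau(measured, est), 3),
          "mean error =", round(mean_error(est, measured), 2),
          "RMSE =", round(rmse(est, measured), 2))
```

A negative mean error indicates that the estimate is, on average, lower than measured k, and a larger root mean square error indicates greater overall scatter around the measured values.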

## Table 5.

Kendall's τ correlation analysis of measured aeration values for 6 Kansas streams compared with values from 19 empirical aeration equations. Significant results (*p* < 0.05) are denoted by an asterisk (*).

*Potential sources of error and variations among methods*

Errors in modeling or measuring k contribute to, but are not the only source of, error in calculated metabolic rates. McCutchan et al. (1998) offered a more detailed discussion of additional sources of error (e.g., measurement of travel time and instrument calibration). Error can be estimated in modeling (Holtgrieve et al. 2010), but the error structure is poorly defined because unknown fractions of R are directly and indirectly related to GPP (e.g., primary producers respire, and GPP provides C for heterotrophic R). O_{2} isotopes can be used to parse out heterotrophic R (Tobias et al. 2007), but not all investigators have access to these methods.

Methods for calculating or modeling metabolic rates vary among researchers, and this variability is another potential source of difference among studies. For example, we used an Arrhenius coefficient of 1.045 to account for the effect of temperature on R (Gulliver and Stefan 1984, Ambrose et al. 1988), whereas Naegeli and Uehlinger (1997) used an Arrhenius coefficient of 1.07. The appropriate coefficient is uncertain. We have no way to know which value is correct for whole-system metabolism or whether coefficients vary among streams with different microbial assemblages or even within a stream across seasons. The theory of temperature correction of biological rates is still debated (del Rio 2008), and it is not known how metabolism summed across entire ecosystems responds to temperature in streams in different biomes. Isotopic work indicates that not all variation in R can be accounted for by variation in temperature (Tobias et al. 2007). We suggest that diurnal temperature dependence of whole-stream metabolism could be a fruitful area for future research.
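The practical effect of the choice of coefficient can be seen in a short sketch. It assumes the common form rate(T) = rate_ref × θ^(T − T_ref). The 20°C reference temperature, the example rate, and the function name are assumptions for illustration and are not taken from the studies cited above.

```python
# Temperature correction of a metabolic rate in the common form
#   rate(T) = rate_ref * theta ** (T - T_ref)
# The 20 degC reference temperature and the example rate are assumptions.

def temperature_corrected_rate(rate_ref, temp_c, theta=1.045, temp_ref_c=20.0):
    """Scale a rate defined at temp_ref_c to the water temperature temp_c."""
    return rate_ref * theta ** (temp_c - temp_ref_c)

R_REF = 10.0  # hypothetical respiration at 20 degC, mg O2 L^-1 d^-1

for theta in (1.045, 1.07):
    r_cool = temperature_corrected_rate(R_REF, 16.0, theta)
    r_warm = temperature_corrected_rate(R_REF, 25.0, theta)
    print(f"theta = {theta}: R(25 degC) / R(16 degC) = {r_warm / r_cool:.2f}")
```

Over a 9°C range, coefficients of 1.045 and 1.07 imply roughly 1.5-fold and 1.8-fold changes in R, respectively, so the choice of coefficient is not trivial when rates are compared across sites or times of day.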

Correcting k for temperature is extremely important because k increases by 2 to 4% per 1°C change within a temperature range of 5 to 30°C (Owens 1974). In our streams, the maximum diel temperature variation was ∼9°C, which could lead to an 18 to 36% variation in k over a diel period.
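The arithmetic behind this range is simple enough to sketch. The version below reproduces the 18 to 36% range by simple (linear) scaling and, for comparison, shows what the same per-degree change would give if it compounded. The compounded figures are an illustrative assumption and are not values reported by Owens (1974). The function names are hypothetical.

```python
# Diel change in k implied by a 2 to 4% change per degC over a 9 degC swing.
# Simple (linear) scaling reproduces the range quoted in the text; compounding
# the same per-degree change is shown for comparison only.

DIEL_RANGE_C = 9.0  # maximum diel temperature variation from the text, degC

def linear_change(pct_per_degc, delta_t):
    """Percent change in k assuming the per-degree change simply adds up."""
    return pct_per_degc * delta_t

def compounded_change(pct_per_degc, delta_t):
    """Percent change in k assuming the per-degree change compounds."""
    return ((1.0 + pct_per_degc / 100.0) ** delta_t - 1.0) * 100.0

for pct in (2.0, 4.0):
    print(f"{pct:.0f}%/degC: linear = {linear_change(pct, DIEL_RANGE_C):.0f}%, "
          f"compounded = {compounded_change(pct, DIEL_RANGE_C):.1f}%")
```

Either way, the implied diel variation in k is large relative to typical uncertainty in metabolic rates, which supports correcting k for temperature.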

Temperature correction also is important because it accounts for the observation that O_{2} increases at night in some streams and because it allows cross-site comparison of metabolic rates. R and O_{2} concentrations can change overnight (Jones et al. 1995, Tobias et al. 2007) because of variation in biological activity and temperature. Some researchers corrected R for temperature (e.g., Parkhill and Gulliver 1999), and some did not (e.g., Marzolf et al. 1994, Mulholland et al. 2001, Bott 2006, Bernot et al. 2010). We found that including a term to account for temperature-driven diel swings in metabolic rates improved our model fits.

*Conclusions*

Knowing the minimum reach length needed to detect a significant difference in O_{2} will aid in the design of future experiments. The 20-m minimum length for our study is comparable to the median of 25 m calculated with the equation by Reichert et al. (2009) for the reaches in our study. Our method of testing for a difference in upstream and downstream O_{2} does not require an estimate of k.

Measuring k enables investigators to avoid the error that comes with modeling k (although measuring k has its own sources of error) and could ultimately yield the most accurate estimates of metabolic rates. However, a nonlinear curve-fitting model can provide reasonable estimates of k if direct measurement is not possible. Temperature can influence R, GPP, and k, so correcting these parameters for temperature and including the influence of light on GPP in the model allow stronger cross-site comparisons of metabolic rates and a closer fit between observed and modeled O_{2} dynamics in streams.

## Acknowledgements

We thank Jon M. O'Brien, Kymberly C. Wilson, Justin N. Murdock, Jessica Eichmiller, Katie N. Bertrand, and Mandy Stone for assistance with data collection. K. Gido, J. Blair, and 3 anonymous referees provided helpful comments on the manuscript. The research was supported by the National Science Foundation (Konza Long Term Ecological Research, Konza Research Experience for Undergraduates, Lotic Intersite Nitrogen Experiment). This is publication #10-312-J from the Kansas Agricultural Experiment Station.

## Literature Cited

^{th} edition. American Public Health Association, American Waterworks Association, and Water Environment Federation, Washington, DC.

^{nd} edition. Academic Press, San Diego, California.

^{15}N experiments across a broad gradient of nitrate concentrations. Biogeochemistry 84:31–49.