Upper Mississippi River (UMR) resource managers need a quantitative means of evaluating the health of mussel assemblages to measure effects of management and regulatory actions, assess restoration techniques, and inform regulatory tasks. Our objective was to create a mussel community assessment tool (MCAT), consisting of a suite of metrics and scoring criteria, to consistently compare the relative health of UMR mussel assemblages. We developed an initial MCAT using quantitative data from 25 sites and 10 metrics. Metrics fell in five broad groups: conservation status and environmental sensitivity, taxonomic composition, population processes, abundance, and diversity. Metric scoring categories were based on quartile analysis: 25% scoring as good, 50% scoring as fair, and 25% scoring as poor. Scores were meant to facilitate establishing management priorities and mitigation options for the conservation of mussels. Scoring categories assumed that a healthy mussel assemblage consists of species with a variety of reproductive and life-history strategies, a low percentage of tolerant species, and a high percentage of sensitive species; shows evidence of adequate recruitment, a variety of age classes, and low mortality; and has high abundance, species richness, and species and tribe evenness. Metrics were validated using a modified Delphi technique. MCAT metrics generally reflected the professional opinions of UMR resource managers and provided a consistent evaluation technique with uniform definitions that managers could use to evaluate mussel assemblages. Additional data sets scored a priori by UMR resource managers were used to further validate metrics, resulting in data from 33 sites spanning over 980 km of the UMR. Initial and revised MCAT scores were similar, indicating that data represent the range of mussel assemblages in the UMR. Mussel assemblages could be evaluated using individual metrics or a composite score to suit management purposes. 
With additional data, metrics could be calibrated on a local scale or applied to other river systems.
Native freshwater mussels (Order Unionida) are bioindicators of riverine ecosystem health because of their sensitivity to hydrophysical conditions, disturbance, and contamination, and their strong ecological ties to other components of aquatic communities and biotic and abiotic processes (Strayer et al. 2004, Vaughn 2010). Native freshwater mussels are ecologically significant because they transfer nutrients and energy from the water column to the sediments, stimulate production across trophic levels, stabilize substrates, provide habitat for other invertebrates and fish, and provide food for fish and mammals (Howard and Cuffey 2006, Vaughn 2017).
The Upper Mississippi River (UMR) historically harbored a diverse assemblage of native freshwater mussels (Van der Schalie and Van der Schalie 1950). A navigation pool (hereafter, pool) is the river reach between two adjacent dams; pools typically range from 20 to 40 km long and from 1 to 4 km wide. Freshwater mussel surveys in the UMR have documented 50 species; however, 10 of these species have been collected only as shell material in the last 40 yr, and 28 of the 40 extant species are federally listed or listed by bordering states as threatened or endangered (Dan Kelner, U.S. Army Corps of Engineers [USACE], 2020 oral communication). Mussel-assemblage composition in many areas of the UMR appears to have changed considerably since pre-European settlement, toward less-dense and less-species-rich assemblages dominated by contamination-tolerant habitat generalists (e.g., Amblema plicata, Fusconaia flava; Van der Schalie and Van der Schalie 1950, Theler 1987). These ongoing changes in abundance, species richness, and assemblage structure are driven by factors including human alteration of hydrology and hydrophysical habitat, contamination, exotic species, and past commercial harvest (Fuller 1980, Baker and Hornbach 2000). Perhaps most important, a series of 29 dams, constructed mostly in the 1930s for commercial navigation, dramatically altered habitat and hydrology.
Mussel conservation in the UMR is of great concern to the bordering states (Minnesota, Wisconsin, Illinois, Iowa, and Missouri) and to federal agencies including the U.S. Fish and Wildlife Service (USFWS), the USACE, and the National Park Service (NPS). Natural resource managers in state and federal agencies expend considerable effort assessing the effects of management and regulatory actions (e.g., poolwide drawdowns, island construction) on mussels in the UMR system (defined as the UMR from Minneapolis, Minnesota, to Cairo, Illinois; the Illinois Waterway from Chicago to Grafton, Illinois; and navigable tributaries). These managers need a quantitative means of evaluating the relative health of mussel assemblages to characterize mussel resources, measure effects of management and regulatory actions, assess the efficacy of restoration techniques, and inform a variety of regulatory tasks.
Tools such as the Indices of Biotic Integrity exist for fish (e.g., Karr 1981) and macroinvertebrates (e.g., Blocksom and Johnson 2009), and they frequently are used to assess environmental conditions suitable for biota and to prioritize conservation actions. Metrics in fish and macroinvertebrate indices often include measures of sensitive and tolerant taxa, species richness and diversity, and taxonomic composition (Karr 1981, Lyons et al. 2001, Angradi et al. 2009, Blocksom and Johnson 2009). However, compared with freshwater mussels, most fish and invertebrates are short-lived and may respond more quickly to changes in environmental conditions, whereas mussels are likely to integrate stressors over greater spatial and temporal extents (Newton et al. 2008). Moreover, assessment of mussel responses to stressors (e.g., degraded habitat, nutrient enrichment) is hindered because life-history traits and species-specific tolerances to contaminants and disturbances are largely unknown (Haag 2012, FMCS 2016). Our objective was to develop a mussel community assessment tool (MCAT) for natural resource managers to compare the relative health of mussel assemblages in the UMR. To meet this objective, we completed two phases: (1) creation of the MCAT through development of a suite of quantitative metrics and development of cut points using quartile analysis, and (2) validation of the MCAT through professional judgment and comparison with additional data from UMR resource managers. MCAT scores were developed to facilitate establishment of management priorities and mitigation options aligned to conservation goals.
Criteria for data-set selection.—Data used to calculate metrics were from 25 sites within the UMR spanning 925 km from pools 2 to 26 (Figure 1, Table 1). Data sets came largely from Ecological Specialists, Inc. (a consulting firm specializing in freshwater mussel surveys) and from the USACE mussel database (USACE 2006). Data were collected either as part of long-term monitoring studies or for assessing potential effects of in-stream activity in support of permit applications under Section 404 of the Clean Water Act or Section 10 of the Rivers and Harbors Act. Most data were restricted to mussel beds within the UMR main stem, with few “nonbed” areas in the data sets. Thus, inferences from this study are largely limited to mussel beds. We used the mussel bed definition of Strayer et al. (2004): “aggregations of mussels where many or all of the species found co-occur at densities 10 to 100 times higher than those outside the bed.”
To ensure consistency among data used to calculate metrics, we used only those data sets with at least 20 quadrats (0.25 m² each), and only those samples with mussel-age data. All quantitative samples were collected by excavating the substrate within each 0.25-m² quadrat to a depth of ≤15 cm into either a 20-L bucket or a bag with ≤6-mm mesh. Each sample was rinsed through 6-mm and 12-mm mesh sieves, and live mussels and fresh-dead shells (shells with clean, shiny nacre; Southwick and Loftus 2018) were separated from substrate and debris. We identified all live mussels and fresh-dead shells to species, and we measured the length of most live mussels and estimated their age from external annuli counts. Although such counts may be less accurate than counts using internal annuli (Haag 2009), they can be done in the field, do not involve sacrificing animals, and are sufficiently accurate to identify younger (≤5 yr old) and older (≥15 yr old) mussels.
Data sets covered a spatial scale of ≥250 m², a scale used for many management actions (e.g., island construction, dredging) and regulatory permit requests (e.g., Clean Water Act Section 404) in the UMR. Because Dreissena polymorpha has affected many UMR mussel beds, all data sets were also collected after 2000, when D. polymorpha had become abundant in the UMR (circa 1995, Cope et al. 1997).
Metrics.—For the MCAT, we considered a suite of 46 candidate metrics often used by UMR resource managers to evaluate mussel assemblages (Table 2). Candidate metrics fell into five broad groups of ecological attributes: conservation status and environmental sensitivity, taxonomic composition, population processes, abundance, and diversity (Table 2). Metrics were computed from 25 data sets collected within the main-stem UMR using SAS (v.9.2, SAS Institute, Inc., NC, USA) and Primer-E (v.6, Plymouth Marine Laboratory, Plymouth, United Kingdom). Because data sets originally were collected for other purposes, some metrics could not be computed at all sites because of small sample size or questionable age data.
Site locations and description of data sets used in developing (Phase 1) and validating (Phase 2) the mussel community assessment tool in the Upper Mississippi River.
Our goal was to identify 10 metrics to serve as indices of the five broad groups, with one to three metrics in each group. First, we reduced the 46 metrics to 20 by prioritizing those whose values were spread widely enough to discriminate among sites and that were less sensitive to sampling methods. We used Spearman correlation analyses to identify redundancy among metrics within broad groups. We sequentially discarded metrics having strong rank correlations (P < 0.05, r > 0.6) with other metrics in the same broad group. When selecting between strongly correlated candidates, we favored the metrics least dependent on sample size or data distribution.
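The redundancy screen described above can be sketched as follows. This is a minimal illustration in Python, not the authors' SAS code. It assumes that the metrics in one broad group are supplied in a hypothetical priority order (most preferred first) and keeps a metric only if its absolute Spearman correlation with every metric already kept is ≤0.6. The P < 0.05 test is omitted here (in practice it could come from a routine such as scipy.stats.spearmanr), and treating strong negative correlations as redundant is also an assumption.

```python
def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant metric")
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def screen_redundant(metrics, priority, threshold=0.6):
    """Hypothetical greedy screen within one broad metric group.

    metrics:  dict of metric name -> list of site values (same site order)
    priority: metric names from most to least preferred (assumed ordering)
    Returns the names of retained metrics.
    """
    kept = []
    for name in priority:
        if all(abs(spearman(metrics[name], metrics[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Made-up values for five sites; "b" has exactly the same ranking as "a".
sites = {"a": [1, 2, 3, 4, 5], "b": [2, 4, 6, 8, 10], "c": [2, 5, 1, 4, 3]}
print(screen_redundant(sites, ["a", "b", "c"]))  # ['a', 'c']
```

Because the screen is greedy, the retained set depends on the priority order, which mirrors the judgment-based preference for metrics less dependent on sample size.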
Two metrics were selected within the broad group conservation status and environmental sensitivity. The percentage of species listed as threatened or endangered was selected as a measure of sensitive species. We calculated percent-listed species as the sum of individuals listed either federally or by bordering states, divided by the total number of individuals, multiplied by 100. The percentage of tolerant species was selected as a measure of a disturbed assemblage. This metric was calculated as the sum of individuals of A. plicata, Quadrula quadrula, and Obliquaria reflexa (abundant species in UMR mussel beds), divided by the total number of individuals, multiplied by 100.
List of candidate metrics explored for the mussel community assessment tool (MCAT) in the Upper Mississippi River (UMR). Broad MCAT metric groups are underlined. Metrics selected for use in the MCAT are bolded.
One metric was selected to represent taxonomic composition. The percent tribe Lampsilini measured the dominance or lack of dominance by one tribe. This was calculated as the number of individuals in the tribe Lampsilini, divided by the total number of individuals, multiplied by 100.
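The three composition percentages above (percent listed, percent tolerant, and percent Lampsilini) share one form: individuals belonging to a defined set of species, divided by all individuals, multiplied by 100. A minimal Python sketch follows. The tolerant-species set is taken from the text; the listed-species set and the tribe lookup are small illustrative examples, and the function names are hypothetical.

```python
# Tolerant species as defined for the MCAT (abundant species in UMR mussel beds).
TOLERANT = {"Amblema plicata", "Quadrula quadrula", "Obliquaria reflexa"}

# Illustrative lookups only: listing status differs among states and over time,
# so a real analysis would use current federal and state lists for the reach.
LISTED = {"Lampsilis higginsii"}
TRIBE = {
    "Amblema plicata": "Amblemini",
    "Quadrula quadrula": "Quadrulini",
    "Obliquaria reflexa": "Lampsilini",
    "Lampsilis higginsii": "Lampsilini",
    "Fusconaia flava": "Pleurobemini",
}


def percent_of(counts, members):
    """Percentage of live individuals whose species is in `members`."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no individuals in sample")
    return 100.0 * sum(n for sp, n in counts.items() if sp in members) / total


def percent_tribe(counts, tribe, tribe_of=TRIBE):
    """Percentage of individuals belonging to one tribe (e.g., Lampsilini)."""
    members = {sp for sp in counts if tribe_of.get(sp) == tribe}
    return percent_of(counts, members)


# Hypothetical site totals (live individuals summed over all quadrats).
site = {"Amblema plicata": 50, "Quadrula quadrula": 10, "Obliquaria reflexa": 15,
        "Lampsilis higginsii": 5, "Fusconaia flava": 20}
print(percent_of(site, LISTED))           # 5.0
print(percent_of(site, TOLERANT))         # 75.0
print(percent_tribe(site, "Lampsilini"))  # 20.0
```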
Three metrics were selected to represent population processes. The percentage of fresh-dead mussels was used as an index of recent mortality and was selected as a measure of recent stress on a mussel assemblage. We calculated percent fresh-dead mussels as the number of fresh-dead shells, divided by the number of fresh-dead and live individuals, multiplied by 100. The percentage of ≤5-yr-old mussels represents recruitment into an assemblage over the last 5 yr and was calculated as the number of individuals ≤5 yr old, divided by the total number of individuals, multiplied by 100. The percentage of ≥15-yr-old mussels is a measure of older individuals in the assemblage and was calculated as the number of individuals ≥15 yr old, divided by the total number of individuals, multiplied by 100.
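The three population-process metrics can be computed as below. This Python sketch assumes, for simplicity, that every live individual was aged, so the same live count serves as the denominator for all three metrics; where only some live mussels were aged, the age-based percentages would more plausibly be based on aged individuals only. The function name and inputs are hypothetical.

```python
def population_metrics(live_ages, n_fresh_dead):
    """Hypothetical helper for one site's pooled quantitative samples.

    live_ages:    ages (yr, from external annuli) of live individuals; assumes
                  every live individual was aged
    n_fresh_dead: number of fresh-dead shells collected
    Returns (percent fresh-dead, percent <=5 yr old, percent >=15 yr old).
    """
    n_live = len(live_ages)
    if n_live == 0:
        raise ValueError("no aged live individuals")
    # Fresh-dead shells relative to fresh-dead plus live individuals.
    pct_fresh_dead = 100.0 * n_fresh_dead / (n_fresh_dead + n_live)
    pct_young = 100.0 * sum(1 for a in live_ages if a <= 5) / n_live
    pct_old = 100.0 * sum(1 for a in live_ages if a >= 15) / n_live
    return pct_fresh_dead, pct_young, pct_old


ages = [1, 2, 3, 5, 6, 10, 15, 20, 4, 8]  # hypothetical site
fd, young, old = population_metrics(ages, n_fresh_dead=2)
print(round(fd, 1), young, old)  # 16.7 50.0 20.0
```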
The metric selected for abundance was abundance at the 75th percentile (Q75, 3rd quartile). Quartiles provide more information on the spread of data than simply the mean or median. This metric represents abundance in the densest part of a sample area and was calculated by ranking abundance from all samples and selecting the value that was exceeded in 25% of the samples.
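Q75 abundance can be sketched in Python as follows, converting each quadrat count to density by dividing by the 0.25-m² quadrat area. The percentile convention is an assumption: the text specifies only the value exceeded in 25% of samples, and percentile definitions differ among software packages (SAS, for example, offers several), so results may differ slightly at small sample sizes.

```python
from statistics import quantiles

QUADRAT_AREA_M2 = 0.25  # quadrat size used in the source data sets


def q75_abundance(counts_per_quadrat, area_m2=QUADRAT_AREA_M2):
    """Abundance (mussels/m^2) at the 75th percentile of quadrat densities.

    Uses linear interpolation between order statistics (the "inclusive"
    method of statistics.quantiles); this convention is an assumption.
    """
    densities = [c / area_m2 for c in counts_per_quadrat]
    if len(densities) < 2:
        raise ValueError("need at least two quadrats")
    return quantiles(densities, n=4, method="inclusive")[2]


print(q75_abundance([0, 1, 2, 3, 4, 5, 6, 7, 8]))  # 24.0
```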
Three metrics were selected to measure diversity: Pielou's evenness (J′) at the species level, evenness at the tribe level, and rarefaction richness at 100 individuals (ES_100). Evenness measures how individuals are distributed among species or tribes within an assemblage and was calculated as J′ = H′/H′max, where H′ is the Shannon diversity index and H′max is the maximum possible H′ (attained when every species or tribe is equally represented):

$$J' = \frac{H'}{H'_{\max}} = \frac{-\sum_{i=1}^{S} p_i \ln p_i}{\ln S}$$

where p_i is the proportion of individuals of the ith species (or tribe) and S is the number of species (or tribes) present (Ludwig and Reynolds 1988).
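Species and tribe evenness follow directly from the equation above. In this minimal Python sketch the same function serves both levels; only the grouping of counts changes. Taxa with zero counts are excluded from S, an assumption consistent with computing H′max from the taxa actually present, and the counts shown are hypothetical.

```python
import math


def shannon(counts):
    """Shannon diversity H' (natural log) from counts per species or tribe."""
    total = sum(counts)
    return -sum((n / total) * math.log(n / total) for n in counts if n > 0)


def pielou_evenness(counts):
    """Pielou's J' = H' / ln(S), where S counts only taxa present (assumption)."""
    s = sum(1 for n in counts if n > 0)
    if s < 2:
        raise ValueError("evenness undefined with fewer than two taxa")
    return shannon(counts) / math.log(s)


species_counts = [50, 10, 15, 5, 20]  # hypothetical site, five species
tribe_counts = [50, 10, 20, 20]       # same site pooled into four tribes
print(round(pielou_evenness(species_counts), 3))
print(round(pielou_evenness(tribe_counts), 3))
```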
Rarefaction richness at 100 individuals is the expected number of species with a sample size of 100 individuals estimated by rarefaction (Colwell et al. 2012). Because the number of species is highly related to the number of individuals collected, rarefaction richness allows richness to be compared on the basis of an equal number of individuals (Colwell et al. 2012). Rarefaction richness was calculated using EstimateS (v.9.1, Colwell 2013).
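For readers without EstimateS, the expected number of species in 100 individuals can be approximated with the classic individual-based rarefaction formula (Hurlbert's estimator), sketched below in Python. This is a swapped-in technique: EstimateS implements the estimators described by Colwell et al. (2012), including sample-based rarefaction, so the two approaches need not give identical values. The function name is hypothetical.

```python
from math import comb


def rarefied_richness(counts, n=100):
    """Expected species richness in a random subsample of n individuals.

    Individual-based (Hurlbert) rarefaction:
        E[S_n] = sum_i [1 - C(N - N_i, n) / C(N, n)]
    where N is the total number of individuals and N_i the count of species i.
    """
    counts = [c for c in counts if c > 0]
    total = sum(counts)
    if n > total:
        # Interpolation only; extrapolation requires a different model.
        raise ValueError("n exceeds the number of individuals collected")
    denom = comb(total, n)
    # comb(a, n) is 0 when n > a, so such species contribute exactly 1.
    return sum(1 - comb(total - c, n) / denom for c in counts)


site_counts = [120, 40, 30, 6, 3, 1]  # hypothetical site, 200 individuals
print(round(rarefied_richness(site_counts, n=100), 2))
```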
Frequency histograms of individual metric values were plotted, and a quartile analysis was used to determine critical values (hereafter referred to as cut points) for dividing data sets into scoring categories, with ∼25% of sites scored in the poor category, 50% in the fair category, and 25% in the good category for each individual metric. Typically, the 2nd and 3rd quartiles were combined for the fair scoring category.
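The quartile-based scoring can be sketched in Python as below for metrics where higher values (e.g., percent listed species) or lower values (e.g., percent tolerant species) are better. The interpolation method and the treatment of values exactly at a cut point (scored fair here) are assumptions, and the function names are hypothetical. Metrics whose best values lie mid-range, such as percent Lampsilini, would need a two-sided rule instead.

```python
from statistics import quantiles


def quartile_cut_points(values):
    """Return (Q1, Q3) of site metric values, by linear interpolation."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return q1, q3


def score_metric(value, q1, q3, higher_is_better=True):
    """Outer quartiles score good or poor; the middle 50% scores fair."""
    if higher_is_better:
        return "good" if value > q3 else "poor" if value < q1 else "fair"
    return "good" if value < q1 else "poor" if value > q3 else "fair"


sites = [1, 2, 3, 4, 5, 6, 7, 8, 9]  # hypothetical metric values
q1, q3 = quartile_cut_points(sites)   # (3.0, 7.0)
print(score_metric(8, q1, q3), score_metric(5, q1, q3), score_metric(2, q1, q3))
# good fair poor
print(score_metric(8, q1, q3, higher_is_better=False))  # poor
```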
Metrics were validated in three ways. First, we compared (via agreement or proximate agreement) MCAT scoring categories for a subset of the initial data sets with the professional judgment of UMR natural-resource managers. Second, we compared cut points derived from Phase 1 data sets with cut points derived from Phase 1 and Phase 2 data sets combined. Third, we compared multivariate patterns among sites from principal components analysis (PCA) with the professional judgment of UMR natural-resource managers.
We used a modified Delphi technique (i.e., a structured approach based on expert opinion; Zuboy 1981, Mukherjee et al. 2015) to compare the MCAT metrics with the resource managers' professional judgment. Independent scores from UMR natural-resource managers were compared with scores derived from MCAT metrics for a subset of Phase 1 sites and for newly identified data sets. We organized a workshop in La Crosse, Wisconsin, in February 2015 that was attended by 10 UMR natural-resource managers from state and federal agencies (Minnesota Department of Natural Resources, Wisconsin Department of Natural Resources, Iowa Department of Natural Resources, Missouri Department of Conservation, NPS, USACE, and USFWS). Before the workshop, we provided each participant with three high-scoring and three low-scoring data sets randomly selected from the Phase 1 data, but we did not provide the Phase 1 metrics or scoring categories. The six data sets were Lansing essential habitat area (EHA)-river, Capoli Slough, Prairie du Chien EHA, Cassville downstream, Burlington, and Batchtown (Table 1).
For each data set, participants were provided raw data (i.e., species, length, and age in each sample), summarized data for the site (i.e., number and relative abundance of each species), and general site information (i.e., UMR pool number, sample size, coordinates, area sampled). We asked workshop participants to use their professional judgment and the method typically used by their agency to score each site as poor, fair, or good for the site overall (overall composite score), and for each broad metric group (i.e., conservation status and environmental sensitivity, taxonomic composition, population processes, abundance, and diversity; broad metric group composite score). At the workshop, we assembled scores from participants and discussed processes used to score test data sets, as well as strengths and weaknesses of each metric and potential alternative metrics.
To match the level of scoring done by workshop participants for each site, we computed Phase 1 broad metric group composite scores and an overall site composite score. Poor, fair, and good category scores for individual metrics were converted to a numerical score of 0–6 (poor = 0; fair = 3; good = 6) on the basis of Phase 1 cut points (Table 3). Broad metric group composite scores were computed as the mean of the component metrics, and the overall site composite score was computed as the mean of broad metric group composite scores. Composite scores 0–2.0 were considered poor, 2.1–4.0 were considered fair, and 4.1–6.0 were considered good.
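The conversion and averaging steps can be sketched in Python as follows. Because the published ranges leave small gaps (e.g., between 2.0 and 2.1), this sketch assumes that any score above 2.0 and at most 4.0 is fair and any score above 4.0 is good; the helper names and the category lists for the example site are hypothetical.

```python
from statistics import mean

POINTS = {"poor": 0, "fair": 3, "good": 6}


def category(score):
    """0-2.0 poor, 2.1-4.0 fair, 4.1-6.0 good; gaps resolved upward (assumed)."""
    if score <= 2.0:
        return "poor"
    if score <= 4.0:
        return "fair"
    return "good"


def composite_scores(groups):
    """groups: broad metric group -> list of individual metric categories."""
    group_scores = {g: mean(POINTS[c] for c in cats) for g, cats in groups.items()}
    overall = mean(group_scores.values())  # mean of group composites
    return group_scores, overall


site = {
    "conservation status and environmental sensitivity": ["good", "fair"],
    "taxonomic composition": ["fair"],
    "population processes": ["poor", "fair", "good"],
    "abundance": ["good"],
    "diversity": ["fair", "fair", "poor"],
}
groups, overall = composite_scores(site)
print(groups["conservation status and environmental sensitivity"], overall, category(overall))
# 4.5 3.7 fair
```

Averaging group composites, rather than all 10 metrics directly, gives each broad group equal weight regardless of how many metrics it contains.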
We estimated the percent agreement between workshop-participant scores (i.e., professional judgment) and Phase 1 scores as the number of participant scores agreeing with the MCAT, divided by the number of participants, multiplied by 100. We also estimated the proximate agreement between participant scores and Phase 1 scores to evaluate differences across a broader continuum. For proximate agreement, a Phase 1 broad group composite score or overall site composite score was judged similar to a workshop participant's categorical score (good, fair, poor) if it fell within the trisected numerical scoring range for that category expanded by 1.0 point (i.e., 0–3.0 = poor or nearly poor, 1.6–4.5 = fair or nearly fair, 3.1–6.0 = good or nearly good).
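Percent and proximate agreement for one site and one broad metric group can be sketched as follows. The Python below is illustrative: the range boundaries come from the text, the handling of scores between published ranges (e.g., 2.0 to 2.1) is an assumption, and the participant ratings and function names are hypothetical.

```python
# Expanded ("proximate") ranges for each participant category, from the text.
PROXIMATE = {"poor": (0.0, 3.0), "fair": (1.6, 4.5), "good": (3.1, 6.0)}


def mcat_category(score):
    """0-2.0 poor, 2.1-4.0 fair, 4.1-6.0 good; gaps resolved upward (assumed)."""
    return "poor" if score <= 2.0 else "fair" if score <= 4.0 else "good"


def percent_agreement(mcat_score, participant_categories):
    """Percentage of participants whose category equals the MCAT category."""
    target = mcat_category(mcat_score)
    hits = sum(1 for c in participant_categories if c == target)
    return 100.0 * hits / len(participant_categories)


def proximate_agreement(mcat_score, participant_categories):
    """Percentage of participants whose expanded range contains the MCAT score."""
    hits = 0
    for c in participant_categories:
        lo, hi = PROXIMATE[c]
        if lo <= mcat_score <= hi:
            hits += 1
    return 100.0 * hits / len(participant_categories)


ratings = ["fair", "fair", "good", "poor", "fair"]  # hypothetical participants
print(percent_agreement(3.7, ratings), proximate_agreement(3.7, ratings))  # 60.0 80.0
```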
Workshop participants were asked to provide a list of additional data sets (Phase 2 data sets) that might be used in the Phase 2 validation effort. From these candidate data sets, we randomly selected four data sets from the upper pools (pools 1–8), three from the middle pools (pools 9–17), and three from the lower pools (pools 18–26) that met the criteria developed in Phase 1. Each contributor of a data set was asked to score the overall mussel assemblage a priori as poor, fair, or good. Individual MCAT metrics for these new sites were calculated as in Phase 1. However, four sites had too few individuals to compute ES_100 by simple rarefaction; in these cases, we applied a sample-based Bernoulli product model (Colwell et al. 2012) to extrapolate species-richness curves. Workshop-participant scores (i.e., a priori rankings of Phase 2 data sets) were compared with an overall composite score based on the MCAT.
Final metrics and individual metric cut points within scoring categories (poor, fair, good) for Phase 1, Phase 2, and combined data of the mussel community assessment tool in the Upper Mississippi River.
Data from Phases 1 and 2 were combined (combined data sets) and used to generate combined-frequency histograms for each individual metric, and quartile cut points for scoring categories were updated. Combined-data cut points were compared with Phase 1 cut points to assess their validity. The percent change in cut points was calculated as the difference between the Phase 1 and combined-data cut-point values, divided by the overall range of values for that metric, multiplied by 100.
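The percent change in cut points reduces to a one-line calculation, sketched below in Python. Taking the absolute difference (so that shifts in either direction are positive) and using the metric's observed range across the combined data are assumptions; the function name and example values are hypothetical.

```python
def percent_change(phase1_cut, combined_cut, metric_min, metric_max):
    """Shift in a cut point as a percentage of the metric's observed range."""
    span = metric_max - metric_min
    if span <= 0:
        raise ValueError("metric range must be positive")
    # Absolute difference: direction of the shift is ignored (assumption).
    return 100.0 * abs(combined_cut - phase1_cut) / span


# Hypothetical: a cut point moving from 2.0 to 2.5 on a metric spanning 0 to 10.
print(percent_change(2.0, 2.5, 0.0, 10.0))  # 5.0
```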
We used PCA of the MCAT metric values in the combined data sets to explore multivariate patterns among sites. Only sites with a full suite of metrics were analyzed, and data from sites sampled over multiple years were averaged before analysis. Percentage data were arc-sine transformed, and all data were normalized to account for differences in measurement scales before correlation-based PCA ordination. Only PCA axes with eigenvalues >1 were interpreted.
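For readers wishing to reproduce the ordination outside Primer-E, the transformation, standardization, and eigenvalue steps are sketched below in dependency-free Python. This is a swapped-in implementation, not the authors' software, and it rests on two assumptions: the arc-sine transformation is taken to be the common arcsine square-root transform of proportions, and "normalized" is taken to mean standardizing each metric to zero mean and unit standard deviation, which makes the PCA correlation-based. Only eigenvalues are computed (via cyclic Jacobi rotations); loadings would come from the corresponding eigenvectors. All function names are hypothetical.

```python
import math
from statistics import mean, stdev


def arcsine_sqrt(percent):
    """Arcsine square-root transform of a percentage (assumed form of the transform)."""
    return math.asin(math.sqrt(percent / 100.0))


def correlation_matrix(columns):
    """Correlations among metrics after standardizing each to mean 0, SD 1."""
    n = len(columns[0])
    z = []
    for col in columns:
        m, s = mean(col), stdev(col)
        if s == 0:
            raise ValueError("constant metric cannot be standardized")
        z.append([(v - m) / s for v in col])
    return [[sum(a * b for a, b in zip(zi, zj)) / (n - 1) for zj in z] for zi in z]


def jacobi_eigenvalues(matrix, tol=1e-12, max_sweeps=100):
    """Eigenvalues of a symmetric matrix by cyclic Jacobi rotations, descending."""
    a = [row[:] for row in matrix]
    size = len(a)
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(size) for j in range(size) if i != j)
        if off < tol:
            break
        for p in range(size - 1):
            for q in range(p + 1, size):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Rotation angle chosen to zero a[p][q].
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = (1.0 if theta >= 0 else -1.0) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(size):  # A <- A J (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(size):  # A <- J^T A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted((a[i][i] for i in range(size)), reverse=True)


def pca_summary(columns):
    """Eigenvalues, proportion of variance explained, and axes with eigenvalue > 1."""
    eig = jacobi_eigenvalues(correlation_matrix(columns))
    explained = [e / len(eig) for e in eig]  # trace of a correlation matrix = no. of metrics
    retained = sum(1 for e in eig if e > 1.0)
    return eig, explained, retained


# Hypothetical metric values for five sites.
pct_listed = [arcsine_sqrt(p) for p in [0.5, 1.2, 3.9, 6.0, 10.5]]
pct_tolerant = [arcsine_sqrt(p) for p in [80.0, 65.0, 45.0, 30.0, 20.0]]
q75 = [4.0, 8.0, 12.0, 20.0, 16.0]
eig, explained, retained = pca_summary([pct_listed, pct_tolerant, q75])
print(retained, [round(x, 2) for x in explained])
```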
We used 35 data sets from 33 sites that met the a priori criteria for the MCAT: 25 data sets from Phase 1 plus 10 data sets from Phase 2 (Table 1). Two of the Phase 2 data sets (Hanson's Slough EHA and Cordova EHA) were from sites also included in Phase 1; data from both phases were combined for these sites, resulting in 33 combined data sets. Data sets were collected between 2002 and 2014 in 14 pools spanning ∼980 km from Pool 2 just south of Minneapolis–St. Paul, Minnesota, to Pool 26 just north of St. Louis, Missouri (Figure 1, Table 1). Average abundance across these data sets ranged from <1 to about 41 mussels/m². Phase 1 data sets were concentrated in pools 9–14 and lacked both poor-quality sites and sites with a high percentage of ≤5-yr-old individuals. Phase 2 data sets included sites in the upper, middle, and lower pools, two poor-quality sites, and two sites with >75% of individuals ≤5 yr old. Phase 2 data values also were well distributed among the Phase 1 values for all metrics, and the similarity in distributions of values and cut points with the additional data sets added credibility to the metric cut points developed in Phase 1 (Figures 2–5; Table 3).
Ten metrics deemed useful for assessing the relative health of mussel assemblages were identified: percent listed species, percent tolerant species, percent Lampsilini, percent fresh-dead, percent ≤5 yr old, percent ≥15 yr old, Q75 abundance, species evenness, tribe evenness, and ES_100 (Table 3). The percent-listed-species metric ranged from 0 to 12% (Figure 2A). The upper quartile of sites (good scoring category) had >3.6% listed species, and the lower quartile (poor scoring category) had <1.0% listed species. The percent-tolerant-species metric ranged from 11 to 83% (Figure 2B), with the good category having <40% tolerant species and the poor category having >62% tolerant species. The percent-tribe-Lampsilini metric ranged from 11 to 78% (Figure 3A). The mid-quartile (>37 to 48%) was scored in the good category, and the low (<26%) and high (>56%) extremes were scored in the poor category. The percent-fresh-dead metric ranged from 0 to 39% (Figure 4A). The lower quartile of sites (good) had <4% fresh-dead mussels, and the upper quartile (poor) had >8%. The percent-≤5-yr-old metric ranged from 5 to 55% (Figure 4B). The upper (good) and lower (poor) quartiles of sites had >49% and <23%, respectively. The percent-≥15-yr-old metric ranged from 0 to 19% (Figure 4C). The mid-quartile (>2 to 5%) was scored as good, and the extremes (<0.6 or >9%) were scored as poor. The Q75-abundance metric ranged from 0 to 56 mussels/m² (Figure 3B). The upper (good) and lower (poor) quartiles of sites had >12/m² and <8/m², respectively. The species-evenness metric ranged from 0.5 to 0.9 (Figure 5A). The upper quartile (good) was >0.8, and the lower quartile (poor) was <0.7. The tribe-evenness metric ranged from 0.6 to 0.9 (Figure 5B). The upper (good) and lower (poor) quartiles were >0.8 and <0.7, respectively. The rarefaction-richness (ES_100) metric ranged from 8 to 18 (Figure 5C); the upper (good) and lower (poor) quartiles had >16 and <13 species, respectively.
In assessing data sets, workshop participants generally agreed with the MCAT (Table 4). Disagreements stemmed from variable interpretations of the broad metric groups, agency priorities, and expectations of scoring categories based on personal experience with specific river reaches, rather than from evaluation of the data sets. Phase 2 data sets also were scored similarly by workshop participants and the MCAT. Metric values for Phase 2 data sets were generally in the same range as those for Phase 1 data sets (Figures 2–5). Scoring cut points based on combined data were similar to cut points based on Phase 1 (Table 3).
Conservation status and environmental sensitivity.—Workshop participants used variable criteria to evaluate this broad metric group, but their scores generally agreed with MCAT scores. Most participant scores agreed with Phase 1 MCAT scores across all sites (Table 4). Proximate agreement with the Phase 1 MCAT (80 to 90% of participants) was the highest of any broad metric group. Some participants evaluated this metric group primarily on the basis of the presence or absence of threatened and endangered species, whereas others also considered the abundance or age composition of listed species. Participants varied in the focal species evaluated at sites, ranging from a focus on only federally listed species, to consideration of both state and federally listed species, to consideration of listed species as well as other species perceived to be rare in a given reach.
Taxonomic composition.—Workshop-participant scores agreed or nearly agreed with the Phase 1 MCAT scores at only two sites, and agreement with the MCAT was the lowest for any broad metric group (Table 4). Measures used by workshop participants to evaluate taxonomic composition ranged widely and included combinations of evenness, richness, presence or number of rare species, number of sensitive species, number of individuals in each tribe, presence of each tribe, richness in each tribe, and a balance between Amblemini and Lampsilini. This variability in interpretation likely contributed to the disagreement in scores.
Population processes.—Most workshop participants' scores and Phase 1 MCAT scores strongly agreed (≥50%) for four of the six sites (Table 4). Proximate agreement was >50% of participant scores for all sites. Criteria used by workshop participants to evaluate population processes generally focused on the age structure of the assemblage. Participants often used measures of recent recruitment, such as the total number of species represented by mussels ≤5 yr old and the percentage of the overall assemblage composed of mussels ≤5 yr old. Presence of older individuals also was considered. Despite the variability in defining population processes, workshop participant and MCAT scores were similar.
Abundance.—Most workshop participant scores agreed with the Phase 1 scores for five of the six sites (Table 4). Workshop participants generally scored abundance by considering the mussel density in samples with some spatial considerations. Some considered the overall density of the site compared with other sites within a given river reach, but others evaluated sites on the basis of whether samples indicated the presence of dense patches of mussels. However, these comparisons were generally qualitative (i.e., they did not compute any specific percentile of the distribution).
Diversity.—Workshop participant scores strongly agreed with each other for three of the six sites, but most disagreed with the Phase 1 MCAT scores for four of five sites (ES_100 could not be computed for Lansing EHA; Table 4). Proximate agreement between participants and Phase 1 MCAT scores was ≥50% for three of the five sites. Participants used widely differing criteria when evaluating sites for diversity, including the percentage of the assemblage composed of A. plicata, qualitative assessment of evenness, representation of all tribes, frequency of each species within samples, and degree of patchiness within a site. However, workshop-participant scores for diversity closely matched the individual-metric ES_100 scores (four of five sites), suggesting that participants may have relied on species richness rather than evenness measures when scoring sites.
Metric values and cut points.—Workshop participants agreed that Phase 1 cut points were within the range of their professional judgment. Most Phase 2 metric values fell within the range of Phase 1 metric values (Figures 2–5). Phase 2 data sets expanded the range of values slightly for four of the 10 metrics: percent Lampsilini, percent ≤5 yr old, percent ≥15 yr old, and ES_100. For most metrics, the scoring category cut points changed <10% between Phase 1 and the combined data set (Table 3). The change was slightly greater (10 to 20%) for percent Lampsilini, percent ≥15 yr old, and ES_100.
Principal components analysis.—Generally, patterns resulting from the PCA reflected site scores by workshop participants (Figure 6). Along PCA axis 1, sites ranked poor by participants plotted to the left, fair sites in the middle, and good sites to the right. The first three principal components were interpreted (eigenvalue >1) and accounted for 45, 15, and 14% of the variation in the data, respectively. Metrics with high loadings in the first principal component were percent listed species, percent tolerant species, percent Lampsilini, percent ≥15 yr old, and ES_100. Metrics with high loadings in the second principal component were percent ≤5 yr old, Q75 abundance, species evenness, and tribe evenness. Metrics with high loadings in the third principal component were percent fresh-dead mussels, Q75 abundance, and tribe evenness.
Indices of biological integrity are typically motivated by a desire to improve understanding of the ecological condition of sites or systems and to assess the degree of environmental impairment (Karr 1981, Lyons et al. 2001, Angradi et al. 2009, Blocksom and Johnson 2009). Biological integrity refers to the ability of a site or water body to support and maintain a balanced, integrated, adaptive community of organisms having a species composition, diversity, and functional organization comparable with natural habitats (Karr and Dudley 1981). The MCAT, in contrast, focuses specifically on evaluating the conservation value of native freshwater mussel assemblages rather than extrapolating scores to overall site or system ecological health.
Few other assessment tools have been developed for mussel assemblages (but see Szafoni 2002). Illinois' Freshwater Mussel Classification Index contains four metrics (species richness, presence of intolerant species, total abundance, and percent live species with individuals ≤30 mm or ≤3 yr old) that are summed to one index value that is used to identify priority areas for mussel conservation (Szafoni 2002). The strength of the MCAT lies in (1) using quantitative data to derive robust cut points that can change as information accumulates, (2) providing resource managers with 10 well-defined metrics across five assemblage characteristics that can be used individually or aggregated to one overall index value, depending on conservation objectives, and (3) providing resource managers with a consistent, quantitative means of evaluating mussel assemblages to aid decision making.
Our analysis indicates that the most robust mussel assemblages in the examined data sets have the following characteristics: >4% listed species, <38% tolerant species, 35 to 40% Lampsilini, ≤3% fresh-dead mussels, >49% mussels ≤5 yr old, 2 to 6% mussels ≥15 yr old, >13 mussels/m² at the 75th percentile of abundance (Q75), a species evenness >0.8, a tribe evenness >0.8, and >16 species in a sample of 100 individuals (ES_100). These characteristics are similar to those reported by Haag and Warren (2010) in their assessment of the traits of self-sustaining mussel assemblages in southern streams. They characterized self-sustaining mussel assemblages as having high retention of historical species richness, a gradual increase in species richness from upstream to downstream, widespread occurrence of most species, low dominance and high evenness, high abundance of many species, and frequent recruitment for all species.
The 10 selected MCAT metrics appeared to adequately reflect how UMR resource managers evaluate mussel assemblages. Overall summary scores were similar between the MCAT and UMR resource managers participating in the workshop. Principal components analysis of sites based on the MCAT metrics also ordered sites consistently with workshop-participant scores. Additionally, Lampsilis higginsii EHAs, which are sites that were selected by the L. higginsii recovery team as high-quality mussel assemblages (USFWS 2004), all plotted on the positive side of axis 1 in the PCA. Variation in scores by workshop participants stemmed largely from inconsistent group and individual metric definitions rather than from disagreements in cut points for scoring sites. Collectively, these findings indicate that the MCAT reflects the professional judgment of resource managers with respect to mussel assemblages in the UMR.
Mussel community assessment tool (MCAT) score and percentage of workshop participants independently scoring six sites as good, fair, or poor for broad MCAT metric groups. Individual metrics were converted to numerical values (poor = 0; fair = 3; good = 6). Broad metric group scores were computed as the mean numerical score of the individual metric scores. Proximate agreement is the percentage of participant scores similar to the MCAT score on a 0–6 numerical scale. Workshop participant and MCAT broad metric group scores were considered similar (proximate) if they fell within the trisected numerical scoring range expanded by 1.0 point (i.e., 0–3.0 = poor or nearly poor; 1.6–4.5 = fair or nearly fair; 3.1–6.0 = good or nearly good). Bolded participant ratings indicate the percent agreement with the Phase 1 MCAT score. Site-name descriptions as in Table 1.
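The numerical conversion and proximity rule in this table note can be expressed in a few lines. In this sketch, which uses hypothetical function names, a broad group score is the mean of the converted metric ratings. Proximate agreement is the share of participant scores that fall within the expanded range for the MCAT's category. The sketch assumes the MCAT category for the group is already known, and it uses the expanded ranges exactly as printed in the note, including their overlap.

```python
from statistics import mean

# Conversion of categorical ratings to numbers, from the table note.
NUMERIC = {"poor": 0, "fair": 3, "good": 6}

# Expanded ("proximate") ranges on the 0-6 scale, as printed in the table note.
PROXIMATE_RANGE = {"poor": (0.0, 3.0), "fair": (1.6, 4.5), "good": (3.1, 6.0)}


def group_score(metric_ratings):
    """Mean numerical score of the individual metric ratings within one broad metric group."""
    return mean(NUMERIC[rating] for rating in metric_ratings)


def proximate_agreement(mcat_category, participant_scores):
    """Percentage of participant group scores inside the expanded range for the MCAT category."""
    low, high = PROXIMATE_RANGE[mcat_category]
    hits = sum(low <= score <= high for score in participant_scores)
    return 100 * hits / len(participant_scores)


# Hypothetical example: one metric rated good and two rated fair (mean of 6, 3, and 3),
# then four hypothetical participant scores for a group the MCAT rated fair.
print(group_score(["good", "fair", "fair"]))
print(proximate_agreement("fair", [3.0, 4.5, 5.0, 1.0]))  # 50.0
```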
Metrics of species sensitivity and environmental tolerance are often included in biotic indices (Karr 1981, Lyons et al. 2001, Angradi et al. 2009, Blocksom and Johnson 2009). Percent listed species was used as a surrogate for sensitive species, similar to the Illinois index described above. Although there has been considerable progress in evaluating the sensitivity of mussels to environmental contaminants, toxicity data are available for only a fraction of species (Cope et al. 2008, FMCS 2016). Listed species are those that state or federal regulatory agencies have determined are imperiled because of sensitivity to environmental conditions (e.g., physical disturbance, poor water quality) or because they are at the edge of their natural range (IL DNR 2020, MN DNR 2020). Higher percentages of these species in an assemblage therefore suggest less-disturbed environmental conditions and, likely, a higher-quality assemblage.
Biological communities frequently show skewed species-abundance distributions, with a few numerically dominant species and many rare species (Kunte 2008). Species that dominate under degraded conditions are often the most tolerant (Karr 1981). Three species dominated mussel assemblages across the 980-km study reach: A. plicata, O. reflexa, and Q. quadrula. Dominance by a few species often indicates human effects or other stressors (Haag and Warren 2010). Stressors may affect many species simultaneously, causing decreases in rare species and subsequent increases in common species (Haag 2012). Although little information is available on tolerance of mussel species to impaired conditions, tolerant species are often more abundant in areas with silt accumulation, low velocity, and high temperature (Miller and Payne 1998, Spooner and Vaughn 2009, Bartsch et al. 2010).
A healthy assemblage should contain diverse behavioral and life-history traits, which often align with mussel tribes (Haag 2012). The percent-tribe-Lampsilini metric was selected to represent taxonomic composition for the MCAT. Twenty-one of the 50 species known from the UMR are in tribe Lampsilini (Graf and Cummings 2007), and 20 of the 21 Lampsilini are opportunistic or periodic species (Haag 2012). Opportunistic traits, such as rapid growth, early maturity, short life span, and high reproductive output, enable a species to colonize a site rapidly and to persist in unpredictable environmental settings (Winemiller 2005, Haag 2012). For example, Randklev et al. (2019) found that opportunistic species, such as Lampsilis spp., were proportionally more abundant in reaches where the adverse effects of dams were prominent. Thus, assemblages dominated by Lampsilini may indicate less-stable habitat.
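The composition metrics discussed above (percent listed, percent tolerant, and percent Lampsilini) all measure the proportion of an assemblage that belongs to a defined species group. The text does not specify here whether these percentages are computed over individuals or over species. The sketch below therefore supports both through a hypothetical `by` argument. The species labels and group memberships in the example are placeholders, not the MCAT's listed, tolerant, or tribal assignments.

```python
def percent_in_group(counts, group, by="individuals"):
    """Percentage of an assemblage that belongs to `group` (a set of species names).

    counts: dict mapping species name -> number of live individuals.
    by="individuals" weights each species by its abundance;
    by="species" counts each species present once.
    """
    present = {sp: n for sp, n in counts.items() if n > 0}
    if by == "individuals":
        total = sum(present.values())
        in_group = sum(n for sp, n in present.items() if sp in group)
    elif by == "species":
        total = len(present)
        in_group = sum(1 for sp in present if sp in group)
    else:
        raise ValueError("by must be 'individuals' or 'species'")
    return 100 * in_group / total


# Placeholder species and group (hypothetical; not the MCAT's species lists).
counts = {"species_a": 50, "species_b": 30, "species_c": 15, "species_d": 5}
tolerant = {"species_a", "species_b"}
print(percent_in_group(counts, tolerant))                # 80.0
print(percent_in_group(counts, tolerant, by="species"))  # 50.0
```

The two bases can diverge sharply when a few tolerant species are numerically dominant, which is the pattern described above for the three species that dominated the study reach.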
Self-sustaining mussel assemblages should contain multiple size and age classes and have a recruitment rate that meets or exceeds the mortality rate (Haag and Warren 2010). The metrics percent fresh-dead, percent ≤5 yr old, and percent ≥15 yr old were selected for the MCAT as indices of population processes. The percent fresh-dead mussels in an assemblage can be used as a measure of recent mortality; in our analysis, <3% fresh-dead shells typically were observed in higher-quality mussel assemblages. For most mussel species, once maturity is reached, the mortality rate is low (Haag 2012). Mean estimated annual mortality of the three most common species in a reach of the UMR was 11% in A. plicata, 19% in O. reflexa, and 18% in Cyclonaias pustulosa (Newton et al. 2011). A high percentage of fresh-dead shells may indicate relatively recent mortality from a chronic or acute water-quality event, substrate deposition or scouring, high level of D. polymorpha infestation, disease, or other factors (Southwick and Loftus 2018).
Areas that contain both young and old mussels are likely to support persistent mussel assemblages (Ries et al. 2016). The percent-≤5-yr-old metric represents recruitment into an assemblage over the last 5 yr and has commonly been used to describe recent recruitment in the UMR (e.g., Newton et al. 2011, Ries et al. 2019). Age at maturity varies among species from 0 to 11 yr, but most species mature at ≤6 yr of age, and many mature at 2 to 4 yr (Haag 2012). In our analysis, higher-quality mussel assemblages contained ∼50% mussels ≤5 yr old. Similarly, the percentage of the population consisting of juveniles ≤5 yr old ranged from 40 to 62% across three reaches of the UMR (Newton et al. 2011).
Longevity of mussels also varies considerably among species but generally ranges from 15 to 40 yr (Haag 2012). Low recruitment, coupled with a high percentage of older individuals, may indicate a nonreproducing assemblage in which conditions are no longer suitable for recruitment (Haag 2012, Ries et al. 2016). Areas with many juveniles and few older individuals may indicate newly forming areas with suitable habitat (areas where juveniles are deposited by fish or by local hydraulic conditions) or ephemeral habitats (areas that may be destroyed by the next flood or drought; Ries et al. 2016). Recent observations in the UMR indicate that assemblages with >75% juveniles may represent a transient or new assemblage (H. Dunn, personal observation). Variation in life-history strategies is important to consider when interpreting age metrics.
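The three population-process metrics can be computed from the ages of live mussels and a count of fresh-dead shells. This sketch assumes that percent fresh-dead is fresh-dead shells divided by live plus fresh-dead mussels, and that the age percentages are taken over aged live individuals. Both definitions are assumptions for illustration. The >75% juvenile flag simply encodes the observation quoted above and is not an MCAT scoring rule. The function name is hypothetical.

```python
def population_metrics(live_ages, n_fresh_dead):
    """Population-process metrics from ages (yr) of live mussels and a count of fresh-dead shells.

    Assumed definitions (for illustration): percent fresh-dead = fresh-dead / (live + fresh-dead);
    age percentages are computed over the aged live individuals.
    """
    n_live = len(live_ages)
    pct_le5 = 100 * sum(age <= 5 for age in live_ages) / n_live
    return {
        "pct_fresh_dead": 100 * n_fresh_dead / (n_live + n_fresh_dead),
        "pct_age_le5": pct_le5,
        "pct_age_ge15": 100 * sum(age >= 15 for age in live_ages) / n_live,
        # >75% juveniles (≤5 yr) may indicate a transient or newly forming assemblage.
        "possibly_transient": pct_le5 > 75,
    }


# Hypothetical sample: ten aged live mussels and one fresh-dead shell.
print(population_metrics([1, 2, 3, 4, 6, 8, 10, 12, 15, 20], n_fresh_dead=1))
```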
Often, areas with locally high abundance are considered to be of higher quality than areas with low abundance (Szafoni 2002, USFWS 2004). The results of our workshop showed that most resource managers rely on mean abundance when quantitative data are available. However, mean abundance is sensitive to nonnormal distributions (e.g., skewness, outliers) and is strongly affected by sampling design, which may or may not account for spatial patterns of mussels and may include varying proportions of bed and nonbed areas. Abundance at the 75th percentile may better reflect densities in the core of a mussel bed. Because mussels are distributed patchily across several scales (Ries et al. 2016, 2019), this metric should allow data sets that capture high-abundance patches, even if they cover only part of a good mussel area, to score higher.
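The contrast between mean and 75th-percentile abundance is easiest to see with a patchy sample. The quadrat counts below are hypothetical, as is the assumed 0.25-m² quadrat area. Percentiles are computed with the default (exclusive) method of Python's `statistics.quantiles`; other percentile conventions would give somewhat different values.

```python
from statistics import mean, quantiles

QUADRAT_AREA_M2 = 0.25  # assumed quadrat size for this illustration


def density_summary(quadrat_counts, area_m2=QUADRAT_AREA_M2):
    """Return (mean density, 75th-percentile density) in mussels/m² from per-quadrat counts."""
    densities = [count / area_m2 for count in quadrat_counts]
    p75 = quantiles(densities, n=4)[2]  # third quartile cut point
    return mean(densities), p75


# Hypothetical transect: several empty quadrats outside the bed and a few dense ones in its core.
counts = [0, 0, 0, 0, 1, 2, 3, 5, 8, 12]
print(density_summary(counts))  # (12.4, 23.0)
```

Six of the ten hypothetical quadrats hold 8 mussels/m² or fewer. As a result, the mean (12.4 mussels/m²) falls just under the >13 mussels/m² characteristic of robust assemblages noted earlier, whereas the 75th percentile (23.0 mussels/m²) exceeds it, reflecting the dense core of the patch.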
Biological diversity is composed of two components: species richness and species evenness (Bock et al. 2007). The latter reflects the degree to which an assemblage is dominated by a few species, with low evenness indicating strong dominance (Ludwig and Reynolds 1988). Several studies indicate that evenness is a useful metric in mussel-assemblage analyses (Haag and Warren 2010, Zigler et al. 2012, Hornbach et al. 2017). For example, Haag and Warren (2010) reported evenness values ranging from 0.82 to 0.88 across six high-quality mussel assemblages in the Sipsey River, Alabama. These values are similar to those observed across high-quality sites in the MCAT data (∼0.80). Thus, high evenness is often a characteristic of robust mussel assemblages. Because the number of species observed is strongly correlated with the number of individuals sampled, observed richness is often a downward-biased estimate of true richness (Colwell et al. 2012). Rarefaction curves estimate the number of species that one would expect to find, on average, after x individuals are sampled (Gotelli and Colwell 2001). ES100, the expected number of species in a random sample of 100 individuals, accounts for the effect of sample size better than raw species richness does. This advantage is especially important when evaluating data from multiple sources obtained for different purposes and with differing sampling designs, as was done in the MCAT. Rarefaction curves are increasingly common in studies of mussel assemblages (e.g., Daniel and Brown 2013, Miller et al. 2017).
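Both diversity quantities can be sketched from per-species counts. The rarefaction function below uses the classic individual-based formula, E(S_n) = Σ_i [1 − C(N − N_i, n) / C(N, n)]. Here N is the total number of individuals, N_i is the count of species i, and C is the binomial coefficient; with n = 100 the formula gives ES100. The text does not state which evenness index the MCAT uses, so Pielou's J (Shannon diversity divided by the natural log of species richness) appears here only as one common choice. Applying the same function to counts summed by tribe would give a tribe-level evenness. Function names are hypothetical.

```python
from math import comb, log


def rarefied_richness(counts, n=100):
    """Expected number of species in a random subsample of n individuals (ES_n)."""
    counts = [c for c in counts if c > 0]
    total = sum(counts)
    if n > total:
        raise ValueError("subsample size exceeds the number of individuals")
    denom = comb(total, n)
    # comb(total - c, n) is 0 when n > total - c, i.e., species i is certain to be drawn.
    return sum(1 - comb(total - c, n) / denom for c in counts)


def pielou_evenness(counts):
    """Pielou's J = H' / ln(S), where H' is the Shannon index; 1.0 means perfectly even."""
    counts = [c for c in counts if c > 0]
    if len(counts) < 2:
        raise ValueError("evenness needs at least two species")
    total = sum(counts)
    h = -sum((c / total) * log(c / total) for c in counts)
    return h / log(len(counts))


# Hypothetical counts for six species (illustrative only).
counts = [40, 30, 20, 10, 5, 1]
print(round(rarefied_richness(counts, n=100), 2), round(pielou_evenness(counts), 2))
```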
A strength of the MCAT lies in providing a series of consistent, quantitative metrics for managers to use when evaluating mussel assemblages. We view the MCAT as an important step toward developing a suite of useful metrics to assess the relative health of mussel assemblages in the UMR and elsewhere. However, the distribution of metrics and the decision points for scoring metrics need to be interpreted carefully because of limitations in the data. We attempted to apply reasonable decision points, but a sample size of 33 data sets is relatively small. Although we applied criteria to reduce sampling variability among sites and attempted to select metrics that were relatively insensitive to sampling design, concerns about sampling design cannot be dismissed.
We also recognize that our data represent a single snapshot of each site. Because some mussel species are long-lived, population and assemblage responses to environmental stressors might have substantial lag times that may complicate interpretation of metrics and their application in management decisions. Metrics can be improved adaptively by reevaluating decision points or adding or replacing metrics as new data become available. For example, inclusion of metrics describing functional guilds, such as thermal and reproductive guilds, may add considerable value to the MCAT once more species are categorized (Barnhart et al. 2008, Gates et al. 2015). Application of a standardized design for sampling mussels (see Newton et al. 2011) may improve the development of MCAT metrics. Last, metrics derived for the UMR may apply to other systems with modification and calibration. For example, a tolerant-species metric could consider those species having increased abundance over time or that overwhelmingly dominate mussel assemblages in a given river.
Multiple metrics provide more information to resource managers than a single composite score does. For example, sites with high diversity but low density might have a high conservation priority in reaches depauperate in species. Conversely, sites with high density but low diversity might merit conservation if management goals prioritize specific ecosystem functions, such as water filtration. Preserving mussel assemblages with differing attributes may enhance the ecological integrity of rivers. Individual metrics may also help managers identify potential problems. For example, although an assemblage may score “good” on most metrics, a “poor” recruitment score may be an early warning sign of a declining assemblage.
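The point about composite scores masking individual problems can be illustrated with the poor = 0, fair = 3, good = 6 conversion used in the workshop comparison. For illustration only, this sketch assumes that a composite is the mean of the converted metric scores. The metric keys and the `summarize` helper are hypothetical.

```python
from statistics import mean

# Numerical conversion of categorical ratings (poor = 0, fair = 3, good = 6).
NUMERIC = {"poor": 0, "fair": 3, "good": 6}


def summarize(metric_ratings):
    """Return (composite score on a 0-6 scale, sorted names of metrics rated poor)."""
    composite = mean(NUMERIC[rating] for rating in metric_ratings.values())
    poor = sorted(name for name, rating in metric_ratings.items() if rating == "poor")
    return composite, poor


# Hypothetical site: nine metrics rated good, recent recruitment rated poor.
ratings = {name: "good" for name in (
    "pct_listed", "pct_tolerant", "pct_lampsilini", "pct_fresh_dead",
    "pct_age_ge15", "density_p75", "species_evenness", "tribe_evenness", "es100")}
ratings["pct_age_le5"] = "poor"
print(summarize(ratings))  # (5.4, ['pct_age_le5'])
```

With nine metrics rated good and recruitment rated poor, the composite is 5.4 on the 0–6 scale, which alone would suggest a healthy assemblage. The returned list of poor metrics keeps the recruitment warning visible.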
Any ecological model constructed for conservation purposes, such as the MCAT, can provide a common framework for assessing mussel assemblages and making subsequent conservation decisions. More important, such frameworks can facilitate discussion of management decisions, especially when biologists or agencies disagree. Discussing the strengths and weaknesses of natural resource decisions using formalized models is often more beneficial than an ad hoc approach and can lead to adaptive improvements to both the model and the resulting decisions (Starfield et al. 1994).
This work was funded by the USACE's UMR Restoration Program and the U.S. Geological Survey Ecosystems Mission Area Fisheries Program, Upper Midwest Environmental Sciences Center. Any use of trade, firm, or product names is for descriptive purposes only and does not imply endorsement by the U.S. Government. We thank workshop participants Bernard Sietman, Bernie Schonhoff, Byron Karns, Dan Kelner, Joe Jordan, Jon Duyvejonck, Lisie Kitchel, Mike Davis, Scott Gritters, and Steve McMurray. We also thank Joe Jordan and two anonymous reviewers for reviewing the manuscript and offering useful suggestions.