We comment on the paper of Irons et al. (2000), which evaluated the status of several marine bird taxa in Prince William Sound, Alaska, nine years after the Exxon Valdez oil spill. We discuss concerns about the effects on the study design of inherent differences between the oiled and unoiled areas; about interpretations of results that use inconsistent criteria to define the spatial scales of analysis; and about explanations of underlying causes that are not empirically founded. These comments highlight general difficulties in assessing the effects of large-scale environmental perturbations. It is important to draw conclusions about the effects of such events, but the conclusions must be founded on accuracy in reporting study results, caution in interpreting the results, and adequate consideration of alternative causal explanations for the observed results.
On Drawing Conclusions Nine Years after the Exxon Valdez Oil Spill
The Exxon Valdez oil spill had major immediate impacts on a large number of marine bird species (Piatt et al. 1990, Fry 1993, Wiens 1995, Ford et al. 1996, Piatt and Ford 1996, Wiens et al. 1996, 2001, Day et al. 1997, Murphy et al. 1997, Lance et al. 2001). The magnitude of the spill, of public interest, and of the litigation that followed generated a considerable amount of research, some of which continues. As research continues, so do debates about the overall effects and long-term consequences of the spill on bird populations.
In a recent paper, Irons et al. (2000) presented an analysis of shoreline-transect surveys of seabirds that were repeated over 15 years, from before the spill (1984) through 9 years after the spill (1998). They reported that densities of several taxa were reduced in oiled areas of Prince William Sound (hereafter, PWS) in relation to those in unoiled areas, that these effects subsequently disappeared for some taxa, and that five taxa exhibited persistent negative effects 9 years after the spill. They attributed the lack of recovery of these taxa to oil persisting in the environment and to reduced abundance of forage fish. These conclusions contrast with our findings (Wiens et al. 1996, 2001, Day et al. 1997, Murphy et al. 1997, Day et al., unpubl. data), which also showed that densities of many species were negatively affected in oiled areas but indicated recovery of all of the impacted species by 1996 or 1998.
Rather than detail these differences here, we address concerns that emerged from an examination of the Irons et al. study. We should make it clear at the outset that we believe that the Irons et al. study was founded on a data set derived from conscientious surveys and careful and statistically sound analyses. Our concerns relate instead to aspects of the analytical design, the interpretation of results, and the explanation of the underlying causes. These concerns bear directly on the broader issue of how one evaluates the long-term consequences of an oil spill or, indeed, of any environmental perturbation.
CONCERNS ABOUT STUDY DESIGN
Irons et al. based their analysis on a before-after-control-impact (BACI) design (Stewart-Oaten et al. 1986, Wiens and Parker 1995, McDonald et al. 2000, Stewart-Oaten and Bence 2001), in which densities of species recorded in surveys from oiled and from unoiled areas for several years following the oil spill were compared with densities from the same survey locations before the spill. As Irons et al. noted, the assumption of this approach is that the relative changes in the two areas over time would be similar in the absence of the spill, not necessarily that post-spill densities in these areas would have been the same as those before the spill had the spill not occurred. Thus, if abundance were to decrease in both oiled and unoiled survey locations relative to the pre-spill baseline but were to decrease more in the oiled areas, this relative change would be taken as evidence of a negative oiling impact.
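The logic of this comparison can be illustrated with a small numerical sketch. The Python below is an illustration only: the function name `baci_relative_change`, the log-ratio form of the contrast, and all density values are hypothetical assumptions, not the statistic or the data used by Irons et al.

```python
import math

def baci_relative_change(pre_oiled, post_oiled, pre_ref, post_ref):
    """Change in the oiled area relative to the reference area, on a log scale.

    Negative values mean the oiled area fared worse than the reference area.
    Hypothetical helper; the log-ratio form is an assumption made for illustration.
    """
    change_oiled = math.log(post_oiled / pre_oiled)
    change_ref = math.log(post_ref / pre_ref)
    return change_oiled - change_ref

# Hypothetical densities (birds per km of shoreline); not real survey data.
# Both areas decline from the pre-spill baseline, but the oiled area declines more.
relative = baci_relative_change(pre_oiled=10.0, post_oiled=5.0,
                                pre_ref=10.0, post_ref=8.0)

# Equal proportional declines in both areas give a relative change of zero,
# even though post-spill densities are lower than pre-spill densities.
parallel = baci_relative_change(pre_oiled=10.0, post_oiled=5.0,
                                pre_ref=20.0, post_ref=10.0)

print(f"oiled declines more: {relative:+.3f}")
print(f"parallel declines:   {parallel:+.3f}")
```

In this sketch the second case yields no apparent effect despite a halving of density in the oiled area, which is the sense in which the design tests relative rather than absolute change.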
Although the BACI design does not require that the compared areas be similar in environmental features, it does assume that any differences between the areas do not influence the directions or rates of population change over time. Irons et al. did not deal directly with this assumption, but they did address the issue of environmental similarity between the oiled and unoiled areas by categorizing transects in terms of four shoreline substrate types and then using chi-square analysis to determine whether the proportions of these types differed between the set of oiled and unoiled transects. Although there were clear differences in the frequencies of individual shoreline types, this analysis indicated no significant overall differences in the frequencies of shoreline types between the oiled and unoiled samples. Because individual survey transects varied considerably in shoreline length, it is not possible to determine whether total lengths of the shoreline types were similar in the oiled and unoiled samples.
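The distinction between frequencies of transects and total shoreline lengths can be sketched as follows. All counts and mean transect lengths below are hypothetical, as are the helper names `chi_square_statistic` and `length_shares`; the example only shows that identical proportions of transects by shoreline type (a chi-square statistic of zero) are compatible with quite different proportions of shoreline length.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic for a rows x columns table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    return stat

def length_shares(counts, mean_km):
    """Share of total shoreline length contributed by each substrate type."""
    lengths = [c * m for c, m in zip(counts, mean_km)]
    total = sum(lengths)
    return [length / total for length in lengths]

# Hypothetical numbers of transects in four shoreline substrate types.
oiled_counts = [10, 10, 10, 10]
unoiled_counts = [20, 20, 20, 20]

# Hypothetical mean transect length (km) for each type in each sample.
oiled_mean_km = [1.0, 1.0, 1.0, 1.0]
unoiled_mean_km = [0.5, 0.5, 2.0, 1.0]

stat = chi_square_statistic([oiled_counts, unoiled_counts])
oiled_shares = length_shares(oiled_counts, oiled_mean_km)
unoiled_shares = length_shares(unoiled_counts, unoiled_mean_km)

print("chi-square on transect counts:", stat)
print("oiled length shares:  ", oiled_shares)
print("unoiled length shares:", unoiled_shares)
```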
Habitat variables other than shoreline type may be important, however. In our analyses of a 10-bay subset of the Irons et al. study sites that ranged from completely unoiled to some of the most heavily oiled locations in the spill area, we measured >20 environmental features in addition to shoreline type. These measures were included in multi-factor analyses to separate the effects of habitat variation from those of oiling intensity per se (Day et al. 1997). Shoreline-substrate variables were significant factors in slightly over half of our models of species abundance and habitat (103 of 191 models). Several habitat factors other than shoreline type also differed significantly between oiled and unoiled areas, however (Wiens et al. 2001), and habitat variation explained the patterns of species distributions and abundances better than did measures of oiling for many of the species we analyzed (Day et al. 1997; Day et al., unpubl. data). Moreover, some of these variables, such as intertidal coverage of rockweed (Fucus; De Vogelaere and Foster 1994, van Tamelen et al. 1997) or the extent of mussel (Mytilus) and eelgrass (Zostera) beds (Murphy et al., unpubl. data), showed different patterns of post-spill change in oiled versus unoiled areas (Peterson 2001). Thus, the assumption of BACI analyses that any habitat differences between oiled and unoiled areas remain constant over time is open to question.
Systematic habitat differences between samples from oiled and unoiled areas may also complicate interpretation of the results obtained by Irons et al. Figure 1 in Irons et al. shows that the unoiled transects were (of necessity) spread over a larger region than were the oiled transects. This greater geographical coverage of samples is likely to increase the habitat variation among the unoiled transects relative to that of the oiled transects. If the unoiled sites encompass a broader range of environmental variation, it is unlikely that the changes in unoiled and oiled data sets would be concordant over time in the absence of oiling (which is the hypothesis that BACI tests). If there are systematic differences in habitat between the data sets, the problem is exacerbated. For example, some of the transects assigned to the unoiled data set by Irons et al. are in mainland areas, several of which are strongly affected by nearby glaciers (i.e., Harriman Fjord, Passage Canal, Blackstone Bay, Port Nellie Juan/King's Bay, Icy Bay, Columbia Bay, and Port Valdez) or have dramatically different environments than those seen in the oiled areas (i.e., Montague Island). By our count, perhaps one-third of the individual transects or medium-scale clusters used by Irons et al. to characterize unoiled areas are in glacially affected areas, whereas none of the oiled ones is.
It is unrealistic to expect that changes in bird abundances in these areas would have paralleled those in the oiled areas, had the spill not occurred. If, for example, abundances in such unoiled areas were to increase between 1984 and the 1990s in association with changing glacial conditions while those in oiled areas remained unchanged, one would record a negative relative change in the BACI analysis, which would be interpreted as a negative oiling effect. The opposite also might occur, producing a positive result in the BACI analysis. Clearly, the inclusion of environmentally different sites in the unoiled data set may invalidate the assumption of equivalent changes in the absence of oiling. This problem does not affect the statistics of the BACI analysis, but it does compromise the interpretation of the results.
CONCERNS ABOUT INTERPRETATION
Irons et al. determined whether species were negatively affected, positively affected, or unaffected by the spill by evaluating the sign and significance of the difference between pre- and post-spill abundance in the oiled samples relative to the unoiled samples. The analyses were conducted at three scales: the fine scale of individual transects, a medium scale of clusters of 2–5 geographically adjacent transects (some of which included both oiled and unoiled transects), and a coarse scale that compared the entire oil-spill area with all transects lying outside of this area. For most of the taxa they considered, the results (given in their Appendix 4) are easy to interpret, at least at face value. For others, however, the interpretation is more problematic.
By conducting analyses at different scales, Irons et al. hoped to be able to match the scale of spill effects with the general home-range scale of a given species. Murres (Uria spp.), for example, occur as nonbreeding, summering birds in both oiled and unoiled areas of PWS but breed in PWS only at Port Etches, Hinchinbrook Island, in small numbers. Individuals may range broadly over the sound, and the coarse-level analysis is probably best suited to reveal possible spill impacts. Appropriately, Irons et al. found significant negative changes at the coarse spatial scale in all years except 1993, and they concluded that murres showed continuing negative effects of the spill through 1998. However, they reached the same conclusion for “mergansers” (Mergus spp.; a combination of species that are ecologically somewhat different and showed different spill responses in the analyses of Day et al. 1997). This conclusion was apparently based on analyses at the coarse scale, which showed a disproportionate negative change in the oiled area that increased from nonsignificant in the spill year to highly significant in 1996 and 1998. Relationships at the other scales were negative but (with the exception of a weak relationship in 1993) were all nonsignificant. Because individual mergansers occupy rather limited home ranges during the midsummer period, the results from the fine and medium scales would seem to be more relevant. At these scales, the analyses suggest that changes in abundance in the oiled sites may have differed from those in the unoiled sites, but to conclude that there was an oiling impact in the absence of statistically significant effects seems unwarranted.
Similar inconsistencies are found in the interpretation of spill effects on “goldeneyes” (Bucephala spp.). Irons et al. (2000:729) concluded that goldeneyes showed “strong evidence of negative oil spill effects nine years after the oil spill.” Goldeneyes did indeed exhibit significant negative responses (i.e., greater change in oiled than in unoiled samples) at all scales of analysis in 1989, 1990, and 1993, but not in 1991. At the fine and medium scales, the results were significantly negative only in 1996 and at the coarse scale only weakly in 1998. Because goldeneyes occupy small individual home ranges within a few meters of the shoreline, there seems to be little basis for concluding that spill effects persisted after 1996.
Irons et al. (2000:729) also concluded that Black Oystercatchers (Haematopus bachmani) and Harlequin Ducks (Histrionicus histrionicus) “displayed strong evidence of negative oil spill effects a few years after the spill and may be recovering.” In fact, oystercatchers showed significant negative effects in 1990 (all scales) and 1991 (coarse scale only) and then again in 1998 (coarse scale only). Oystercatchers occupy small to moderate-sized, well-defined territories on shorelines, so there seems to be no basis for concluding that spill effects persisted after 1990. Harlequin Ducks exhibited nonsignificant positive changes in the year of the spill and significant negative effects in 1990 (fine and medium scales) and 1991 (medium and coarse scales); thereafter there were no significant differences between oiled and unoiled data sets at any scales (even using an α-level of 0.20 to minimize Type II errors). The evidence of initial negative spill impacts is arguably clear, but it is not clear why Irons et al. consider the evidence for recovery to be equivocal.
To some extent, the apparent inconsistencies in the interpretations of their results by Irons et al. may stem from their definition of “negative effect” and “recovery.” Thus, “if the bird densities were lower in the oiled area post-spill than expected based on the pre-spill/post-spill change in the reference area, it was considered a negative oil spill effect” (p. 726), and “recovery of an injured [taxon] was defined as lack of an effect” (p. 726, following Murphy et al. 1997, who actually said that recovery was “a detected impact [that] had diminished in subsequent years”). It is not apparent from these definitions whether a relationship must be statistically significant to be considered a negative effect, but it appears that they took any disproportionate decrease in abundance in oiled sites from the 1984 baseline relative to changes in the reference sites as evidence of a negative effect. For “recovery” to occur, such negative relationships (significant or not) therefore had to disappear entirely (i.e., a relative change = 0 versus <0). In view of the environmental differences that exist between the two areas in the Irons et al. study, one would not expect changes in abundance over time to be precisely concordant, which is why statistical analyses are so important. It is not clear why Irons et al. chose to emphasize their statistical procedures if they did not intend to abide by the results: the lack of a significant change from baseline conditions in any analysis should be taken as just that.
CONCERNS ABOUT EXPLANATIONS
Having concluded that there was “no indication of recovery in the number of birds for several taxa nine years after the oil spill,” Irons et al. sought to explain these conclusions in terms of two underlying mechanisms. First, they attributed the continuing negative relationships with oiling to the persistence of “oil in a toxic state” in PWS. To support this conclusion, they cited the work of Hayes and Michel (1999), who found oil residues under boulder/cobble armor in six coarse-grained gravel beaches selected specifically because they had been heavily oiled by the spill. Hayes and Michel made no mention of the areal extent or toxicity of the remaining oil, so it is not clear how their results can be applied to the spill area as a whole. In fact, the extent of oiled shoreline in PWS declined from 782 km in 1989 to 14 km (92% of which had very light oiling) in 1993; heavily oiled shoreline declined from 140 km to 0.1 km over the same period (Neff et al. 1995). A separate survey of all of the originally oiled areas in PWS sponsored by the Exxon Valdez Oil Spill Trustee Council (cited in Hayes and Michel 1999) found subsurface oil remaining in 1993 to be scattered widely along ca. 7 km of shoreline and surface oil along 4.8 km of shoreline (Gibeaut and Piper 1997, Peterson 2001).
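The scale of the decline in oiled shoreline can be expressed as simple percentages. The short Python calculation below uses only the figures quoted above from Neff et al. (1995); the helper name `percent_decline` is ours and is introduced purely for illustration.

```python
def percent_decline(before_km, after_km):
    """Percentage reduction from before_km to after_km."""
    return 100.0 * (before_km - after_km) / before_km

# Shoreline lengths (km) in 1989 and 1993, as quoted in the text from Neff et al. (1995).
total_decline = percent_decline(782.0, 14.0)   # all oiled shoreline
heavy_decline = percent_decline(140.0, 0.1)    # heavily oiled shoreline

# Of the 14 km still oiled in 1993, 92% had very light oiling.
very_light_km = 0.92 * 14.0

print(f"decline in oiled shoreline:         {total_decline:.1f}%")
print(f"decline in heavily oiled shoreline: {heavy_decline:.2f}%")
print(f"very lightly oiled shoreline, 1993: {very_light_km:.2f} km")
```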
The linkage between residual oil in a few protected mussel beds in PWS and contamination of mussel tissues documented by Babcock et al. (1996) also does not mean that such contamination was widespread. Indeed, Boehm et al. (1996) reported that, even in mussel beds at two “worst case” sites that were heavily oiled in 1989, less than 3% of the mussels occurred in association with residual oil trapped in sediments in 1993, and polycyclic aromatic hydrocarbon (PAH) levels in mussel tissues were far less than the levels known to cause sublethal effects in surrogate bird species. The median PAH concentration in sediments at 12 “worst case” sites in PWS in 1999 (117 ng g−1) was well below the toxicity level (ca. 2600 ng g−1) established using standard amphipod bioassays (Page et al. 2001). The observation of Irons et al. that Exxon Valdez oil deposited as mousse outside of PWS in Shelikof Strait was only slightly weathered (and therefore presumably still toxic) was based on the studies of Irvine et al. (1999:578), who explicitly stated that “the situation described from our study sites is not directly comparable to Prince William Sound sites.” Overall, then, although some oil residues may remain in extremely limited areas in PWS, most of these residues are highly asphaltic, not readily bioavailable, and not toxic to marine life (Page et al. 2001). It is difficult to see how such a small amount of oil buried in sediments in a few localized sites could produce widespread and persistent effects on a large number of bird species.
Irons et al. also referred to studies that reported elevated levels of cytochrome P450 1A in tissues of two seaduck species from oiled areas of PWS (Holland-Bartels et al. 1998, Trust et al. 2000) to support their explanation based on persistent oil toxicity. Because cytochrome P450 1A is induced by exposure to PAHs, it has been suggested that it can be used as a sensitive and specific indicator of exposure to oil. Irons et al. noted that it is not possible to determine whether the source of the PAHs was from Exxon Valdez oil or from other anthropogenic or natural sources. In fact, there are many sources of PAHs in PWS (Page et al. 1996, 1999), and both the prevailing currents in PWS and the levels of commercial and recreational boat traffic make it likely that background levels of PAHs would be greater in the oiled than in the reference areas used in the cytochrome P450 1A studies. In many parts of PWS (including parts of the spill area), Exxon Valdez PAHs represent a minor component of the total PAHs that are available to induce cytochrome system responses (Page et al. 1996, 1997, 1999). Overall, the evidence to support the contention that there are significant amounts of oil remaining in a toxic state in areas affected by the oil spill in 1989 is equivocal at the very least.
Irons et al. also attributed post-spill changes in abundances to reduced forage-fish abundance, which could have affected the recovery of spill-impacted marine birds in PWS. There is indeed some evidence that the abundance of juvenile herring (Brown et al. 1996) and other high-quality prey (Kuletz et al. 1997, Golet et al. 2000) declined in PWS after the oil spill (although the mechanisms underlying declines of some species are not entirely clear; Pearson et al. 1999). There is also evidence, however, that changing oceanographic conditions over the past 20–30 years have affected the abundance and species composition of the prey base available to marine birds in this region (Piatt and Anderson 1996). In particular, a decadal-scale climatic regime shift occurred in 1976–1977 in the North Pacific Ocean, altering ocean circulation patterns, sea-surface temperatures, and the abundances of plankton, shrimp, fish, and marine birds and mammals (Ebbesmeyer et al. 1991, Hayward 1997, Francis et al. 1998). It is also apparent that oceanographic conditions and marine productivity underwent another regime shift in 1989, coincident with the oil spill (Hare and Mantua 2000). Because this ecosystem was in considerable flux, both before and after the spill occurred, any effects of the spill would be superimposed on these long-term dynamics (Agler et al. 1999, Gilfillan et al. 2001). Irons et al. specifically suggested that prey such as sand lance (Ammodytes hexapterus) were less available for Pigeon Guillemots (Cepphus columba) after the spill than before, although they also suggested that increases in murrelets and terns after the spill were related to an increase in the abundance of sand lance in the oiled area. In fact, Brown et al. (1999) reported an increase in the abundance of sand lance schools in the oiled part of PWS relative to unoiled reference areas in 1995 to 1998, supporting the latter explanation but not that for guillemots. 
Abundances of forage fish certainly have been changing in PWS, both before and after the oil spill. Whether these changes are affecting recovery from the spill, however, is unknown.
Our intent in responding to the paper of Irons et al. is not to prolong the debate about the magnitude of the effects of the Exxon Valdez oil spill on marine birds or their recovery. It is undoubtedly true that this oil spill had strong negative impacts on a number of species, and it may well be true that the spill is still affecting some birds. The broader issue is really how one designs a comparison or conducts an analysis that will unambiguously tell one so.
Irons et al. used a BACI statistical analysis, which is potentially one of the more rigorous ways to assess spill-related impacts. Because oil spills are unreplicated and nonrandomly distributed, it is difficult to control for the confounding effects of other environmental differences that may exist between the oiled areas and reference sites (Wiens and Parker 1995). One of the advantages of BACI over other analyses is that one need not assume that the environments of the “control” and “impact” samples are the same in all respects other than the impact, but only that their dynamics in time would be concordant had there been no impact. This is a critical assumption. If the two areas differ substantially, their dynamics may differ; if these differences are not recognized, interpretations of the patterns revealed in a BACI analysis may be compromised. For example, an increase in the density of a species in oiled sites might suggest a positive effect on the species, but densities might have increased even more in a reference area. Alternatively, there might be no change in an oiled area (suggesting no impact) but an increase in the reference area. In both cases, the results could be interpreted as evidence of negative spill impacts, even though the relationships might be due primarily to changes occurring in the unoiled reference area. The problem, then, is not with the BACI design or analysis itself, but with the underlying assumptions. Because environmental differences can confound the results of a BACI analysis, it is imperative that one assess those assumptions and interpret the results accordingly.
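The two cases described above can be worked through numerically. The sketch below is hypothetical throughout: the densities are invented, and the log-scale contrast computed by `relative_change` is an assumed form chosen for illustration rather than the statistic used in any of the cited studies.

```python
import math

def relative_change(pre_oiled, post_oiled, pre_ref, post_ref):
    """Log-scale change in the oiled area minus that in the reference area.

    Hypothetical helper; the log-ratio form is an assumption.
    """
    return math.log(post_oiled / pre_oiled) - math.log(post_ref / pre_ref)

# Hypothetical densities: (pre_oiled, post_oiled, pre_ref, post_ref).
scenarios = {
    "oiled increases, reference increases more": (10.0, 12.0, 10.0, 18.0),
    "oiled unchanged, reference increases": (10.0, 10.0, 10.0, 15.0),
}

results = {name: relative_change(*values) for name, values in scenarios.items()}

for name, value in results.items():
    # A negative value would be read as a negative spill effect, although in
    # both scenarios it is driven by change in the unoiled reference area.
    print(f"{name}: {value:+.3f}")
```

In neither scenario does density fall in the oiled area, yet both contrasts are negative, which is why the assumption of concordant dynamics needs to be assessed before the contrast is interpreted.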
Linking the results of a BACI (or any other) analysis to an environmental disruption such as an oil spill also requires that a reasonable causal pathway be established. It is difficult to assess cause-effect linkages in ecological systems under the best of circumstances. The usual devices that ecologists use to establish causation, such as experimentation, statistical hypothesis-testing, or rigorous forms of model selection, are generally compromised by the unreplicated nature of environmental accidents. Moreover, as time passes following such an event, more things happen, and different things happen in different places. What may have been a clear signal of oil-caused effects a few months or a year or two after a spill becomes diluted and distorted over time, as other forces influence the distribution and abundance of birds on multiple scales.
Given these difficulties, it is appropriate to ask what standards should be applied when assessing spill impacts and recovery. On the one hand, there is the specter of committing a Type II error, of failing to detect a real impact (or falsely documenting recovery) because sample size or statistical power is low or statistical criteria are too stringent. This is why it is becoming customary to use an α-level of 0.20 in statistical tests of potential oil-spill impacts (e.g., Day et al. 1997, Murphy et al. 1997, Irons et al. 2000). Even then, however, there remains the question of how blindly one should follow the results of statistical tests. If the data suggest negative spill effects but the tests are not statistically significant (even using α = 0.20), for example, should one nonetheless be conservative and conclude that there is evidence of an impact? After all, natural variability may make it difficult to document an effect with statistical rigor, so such suggestive data may be the best one can obtain.
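The trade-off behind the choice of α can be sketched with a power calculation for a two-sided z-test. This is a simplified illustration under stated assumptions: a normal approximation, a hypothetical standardized effect of 1.5 standard errors, and a helper name, `two_sided_z_power`, of our own choosing. It does not reproduce the tests used in the studies cited above.

```python
from statistics import NormalDist

def two_sided_z_power(standardized_effect, alpha):
    """Approximate power of a two-sided z-test.

    standardized_effect is the true effect divided by its standard error.
    """
    z = NormalDist()
    critical = z.inv_cdf(1.0 - alpha / 2.0)
    # Probability that the test statistic falls in either rejection region.
    return (z.cdf(standardized_effect - critical)
            + z.cdf(-standardized_effect - critical))

effect = 1.5  # hypothetical true effect, in units of its standard error

for alpha in (0.05, 0.20):
    power = two_sided_z_power(effect, alpha)
    print(f"alpha = {alpha:.2f}: power = {power:.2f}, "
          f"Type II error rate = {1.0 - power:.2f}")
```

Relaxing α from 0.05 to 0.20 raises power and so lowers the Type II error rate for the same data, at the cost of accepting more false positives.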
On the other hand, simply because one expects oil spills to have severe and long-lasting effects on marine birds (e.g., Piatt et al. 1990, Fry 1993, Heinemann 1993, Wiens 1996) does not mean that one should unduly emphasize evidence of negative spill effects. Indeed, the fact that we expect negative impacts should foster caution about too readily accepting apparent evidence of such impacts that is compromised by unmet assumptions in statistical tests or undocumented links in cause-effect pathways. Guarding against preconceptions is difficult in any study. Even in basic science, theories affect how we think about things, and a “test” of a theory is rarely completely independent of the preconceptions fostered by the theory. In applied work, such as assessing the consequences of oil spills or other environmental perturbations (e.g., forest fires, habitat fragmentation, grazing, invasion of exotic species), the effects of preconceptions can be even more pernicious.
Because human-caused environmental disruptions such as oil spills often lead to litigation and, perhaps eventually, to the formulation of environmental policies, the consequences of the analyses and interpretations of scientific studies may be especially great. In a dynamic environment in which the effects of environmental accidents cannot be investigated using a cleanly replicated study design, results and conclusions will always be tinged with uncertainty. This uncertainty should not be used as an excuse to ignore the results of scientific studies, but it does dictate that findings be reported as they are, that results be interpreted with appropriate caution, and that adequate consideration be given to alternative explanations for the observed patterns.
We thank Al Maki of ExxonMobil Corporation and David Page of Bowdoin College for their assistance. The writing of this paper was supported by ExxonMobil Corporation; the conclusions, however, are our own. David Dobkin and an anonymous reviewer provided insightful and helpful reviews. This paper was written while JAW was a Sabbatical Fellow at the National Center for Ecological Analysis and Synthesis, a Center funded by NSF (Grant #DEB-0072909), the University of California, and UC Santa Barbara.