Species-area relationships (SAR) are useful in predicting species richness for a given geographical area. Using SAR and the state of Texas as a case study, we present a model that provides a quantifiable and objective approach for identifying large scale data gaps in species inventories and museum collections by comparing documented species richness (determined by herbarium records) to predicted species richness. For Texas our results indicate that 88% of the counties have documented species richness values that are below predicted values based upon our results from the proposed model. Many biological survey and inventory programs are funded to document species occurrence and richness. Such studies help identify species of concern and enhance species conservation efforts. Future species inventories may benefit from such predictive models in identifying regions of large scale data gaps.
Classic (Isley, 1972) and recent (Turner, 1998; Ertter, 2000; Heywood, 2001; Prather et al., 2004) articles have emphasized floristic studies and the need for continued collecting and cataloging of herbarium specimens. Unfortunately, this appeal for continued collecting has mostly been based on anecdotal evidence. Few articles attempt to quantify the current stagnation in botanical collections. Prather et al. (2004) provide the most recent and compelling evidence for large-scale information gaps by presenting data that show a temporal decline in herbarium collections over the last three decades. Prather et al. (2004) also identify regions with increasing and decreasing herbarium collections in the continental U.S.A. These geographical data, however, oversimplify spatial data and assume that specimen growth in a region's herbaria indicates an increase in that region's floristic inventory.
The species-area relationship (SAR) is regarded as “one of community ecology's few laws” (Schoener, 1976). SAR simply states that as area increases, species richness increases (Brown and Lomolino, 1998). Often SAR can be used to estimate species richness (S) for a given geographical area (A). Estimations of S are based on the formula S = CA^z, where z and C are constants varying with geographic location and taxa studied (MacArthur and Wilson, 1967). Such SAR have used geographical area to predict species richness of birds (Diamond and Mayr, 1976), earthworms (Judas, 1988), arthropods (Covarrubias and Elgueta, 1991), and stream fishes (Angermeier and Schlosser, 1989). These relationships have also been useful in determining floristic richness (McNeill and Cody, 1978; Buys et al., 1994; Palmer et al., 2002; Fridley et al., 2005). Although species-area analyses are commonly used and generally accepted for predicting species richness, there is little indication of their utility in identifying large-scale data gaps in herbarium collections.
We present and discuss a model that provides a quantifiable and unbiased approach for identifying large-scale data gaps in herbarium collections. By comparing documented species richness values (determined from herbarium records) with predicted species richness values (determined from the formula S = CA^z), we address the following questions: 1) can species-area relations be used to predict plant diversity?, 2) using predicted species richness, can significant data gaps in herbarium records be geographically identified within a large-scale geographic region?, and 3) can predicted species richness be used to determine sampling effort and a threshold number of samples needed to eliminate data gaps in museum collections?
Table 1.
Known values of species richness for vascular plants and associated geographic area from published accounts.
Materials and Methods
A literature search was performed to identify published checklists and floras for regions of known area with defined boundaries within and bordering the state of Texas. In all, 17 checklists and floras were found (Table 1). Of these checklists, one represented the entire state of Texas, nine represented entire counties, and seven represented smaller inventories collected within counties. Each study provides a value for species richness and geographical area sampled (which we converted to square kilometers). Both species richness and geographical area were log transformed and entered into a database. The database was imported into SPSS® version 10.1 and a linear regression was performed to determine the statistical relationship between species richness (dependent variable) and geographical area (independent variable). This analysis provided the theoretical slope (z) and intercept (C) for the formula S = CA^z. We then predicted species richness for each individual county in Texas by applying the above determined constants z and C to the Arrhenius (1921) log-log model (log S = log C + z(log A)), with A representing the area in square kilometers for each of the 254 counties in Texas.
Next we accessed cataloged herbarium specimens through the Flora of Texas Consortium (FTC; http://csdl.tamu.edu/FLORA/ftc/ftchome.htm) database and recorded the documented species richness (determined by the number of species collected and identified from each county to date) and the number of specimens reported from each county in Texas. We then ran cubic regression analyses comparing documented and predicted species richness for each of the 254 counties with relation to area.
Lastly, using information gathered from the FTC we performed a linear regression to describe the relationship between the number of herbarium specimens (independent variable) and documented species richness (dependent variable).
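The log-log regression described above can be sketched as follows. This is a minimal illustration and not the SPSS procedure used in the study: the function names `fit_sar` and `predict_richness` are hypothetical, and the demonstration data are synthetic values generated from an assumed power law rather than the checklist values in Table 1.

```python
import math

def fit_sar(areas_km2, richness):
    """Fit log10 S = log10 C + z * log10 A by ordinary least squares.

    Returns (z, C), the slope and back-transformed intercept.
    """
    xs = [math.log10(a) for a in areas_km2]
    ys = [math.log10(s) for s in richness]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    z = sxy / sxx
    c = 10 ** (mean_y - z * mean_x)
    return z, c

def predict_richness(area_km2, z, c):
    """Evaluate S = C * A**z for an area in square kilometers."""
    return c * area_km2 ** z

# Synthetic (hypothetical) data generated exactly from S = 300 * A**0.15,
# used only to show that the fit recovers the generating constants.
demo_areas = [10.0, 100.0, 1000.0, 10000.0]
demo_richness = [300.0 * a ** 0.15 for a in demo_areas]
z_hat, c_hat = fit_sar(demo_areas, demo_richness)
```

Fitting on the log-transformed values keeps the model linear, so the slope of the regression is z and the intercept is log C.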
Results and Discussion
The constants z (0.1553) and C (266) for vascular plants in Texas were determined using linear regression (Fig. 1) of geographical area and known species richness values cited from the 17 floristic inventories listed in Table 1. The determined value of z (0.1553) is consistent with the reported and accepted range of z values (0.12–0.17) for terrestrial plants within continents (MacArthur and Wilson, 1967). For a given square-kilometer in Texas, C indicates a species richness of 266. Using z and C in the formula log S = log C + z(log A), we addressed our first question and predicted species richness for each of the 254 counties in Texas. Our approach determined a statewide z and C value by plotting data for all checklists within the state of Texas (Fig. 1). Consequently, we most likely overestimated species richness in the northern counties and underestimated species richness in the southern counties. We used this approach because the checklists used to determine z and C are randomly scattered throughout Texas (Fig. 2). However, it is possible that predicted species richness for each county could be further modified by determining unique z and C constants for each of the 11 physiognomic regions of Texas and applying these values to the area of the counties within the specific physiognomic region. In order to determine unique z and C constants for each of the physiognomic regions, a minimum of three floristic inventories (within a known boundary) within each region needs to be performed and documented. Given that there are 11 physiognomic regions a minimum of 33 inventories need to be performed. To date there are 17. Ideally, more inventories performed per region would yield more optimal results. We also view the lack of checklists and floristic inventories across these physiognomic regions as a data gap.
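As a rough illustration of how the reported constants translate into a prediction, the sketch below evaluates S = CA^z with z = 0.1553 and C = 266. The county area of 2500 square kilometers is a hypothetical value chosen only for the example, and the function name `predicted_richness` is likewise an assumption.

```python
# Constants reported in the text for vascular plants in Texas.
Z = 0.1553
C = 266.0

def predicted_richness(area_km2, z=Z, c=C):
    """Evaluate S = C * A**z for an area in square kilometers."""
    return c * area_km2 ** z

# Hypothetical county area, chosen only for illustration.
example_area_km2 = 2500.0
example_richness = predicted_richness(example_area_km2)

# Ratio of predicted richness when the area doubles; equals 2**z for any A.
doubling_ratio = predicted_richness(2 * example_area_km2) / example_richness

print(round(example_richness), round(doubling_ratio, 3))
```

Under these constants an area of 2500 square kilometers gives roughly 900 predicted species, and doubling the area raises predicted richness by only about 11%, which reflects the shallow slope of the relationship.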
To address our second question, we used cubic regression analyses to compare documented (F(1, 250) = 14.10; p < 0.001; r^2 = 0.145) and predicted species richness (F(1, 250) = 5280.55; p < 0.001; r^2 = 0.984) for each of the 254 counties with relation to geographical area (Fig. 3). Counties with documented species richness that approximates or exceeds predicted species richness fall on or above the predicted species regression line; counties with underrepresented documented species richness fall below the predicted species regression line. This cubic regression model allows one to identify counties that are well collected and those that are under collected. Our results indicate that only 29 (or 11.4%) of the 254 counties in Texas fall close to or above the predicted line and are, therefore, considered well collected (Fig. 3). The 29 well collected counties are listed in Table 2 and are presented spatially on a map of Texas (Fig. 4). Interestingly, all counties with documented species richness values approximating or exceeding predicted values have, or are neighboring, universities with systematic botany programs. A comparison between counties with and without herbaria (Fig. 5) indicates a significant difference in both the percent species representation (documented species richness/predicted species richness) (t(253) = -9.494; p < 0.001) and mean herbarium specimens (t(253) = -10.156; p < 0.001). Although the difference is not significant (t(253) = -0.492; p = 0.623), counties with herbaria have more documented species per geographical area than non-herbaria counties (Fig. 6).
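The comparison of documented and predicted richness can be sketched as a simple ratio test. The text does not state how close a county must be to the predicted line to count as "approximating" it, so the 5% tolerance below is an assumption, and the county names and values are hypothetical.

```python
def percent_representation(documented, predicted):
    """Documented species richness as a percentage of predicted richness."""
    return 100.0 * documented / predicted

def is_well_collected(documented, predicted, tolerance=0.05):
    """True if documented richness is within `tolerance` of, or above, predicted.

    The tolerance is an assumed cutoff, not a value taken from the study.
    """
    return documented >= (1.0 - tolerance) * predicted

# Hypothetical counties: (name, documented richness, predicted richness).
demo_counties = [
    ("County A", 950, 900.0),
    ("County B", 870, 900.0),
    ("County C", 310, 880.0),
]

well_collected = [
    name for name, documented, predicted in demo_counties
    if is_well_collected(documented, predicted)
]
```

In this sketch County A exceeds its predicted value, County B falls within the assumed tolerance, and County C would be flagged as a data gap.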
Table 2.
Counties with documented species richness that approximate or exceed predicted species richness.
The implications for the above model may have broad interest. Apart from isolating geographical areas with paucity in collection, the model identifies and defines geographical areas with limited data on documented species richness and distribution. Detailed specimen collections are important for future conservation efforts and provide a historical perspective for increasing or decreasing species richness in a given area. Accurate records of species richness prior to disturbance events will also allow for an accurate evaluation of the disturbance and appropriate conservation measures.
Finally, we address our third question: can predicted species richness be used to determine sampling effort and a threshold number of samples needed to eliminate data gaps in herbarium collections? If sampling effort in a certain region nets little increase in documented species richness, sampling in different and new localities may prove more productive. The relation between the number of individual specimens sampled and the number of taxa documented was first suggested by Preston (1948) and has been referred to as the “collector's curve” (Colwell and Coddington, 1995). In addition, Miller and Wiegert (1989) utilized SAR to determine completeness in botanical exploration. However, their application was for only rare plants and relied heavily on hypothetical data.
Here we present a simple statistical method for potentially determining optimal collection effort for documented species richness. A linear regression (F(1,252) = 21.73; p < 0.001; r^2 = 0.079) was used to describe the relationship between the number of herbarium specimens and documented species richness (Fig. 7, curve a). Although this relationship is statistically significant, only 7.9% of the variation in documented species richness is explained by the number of herbarium specimens. This is because “collector's curves” follow a logarithmic relationship where the rate at which new taxa are documented decreases with the number of specimens collected. Thus, the likelihood of finding a new taxon during the first 1000 specimens collected is much greater than while collecting the second 1000 specimens. A logarithmic regression (F(1,252) = 1366.24; p < 0.001; r^2 = 0.844) demonstrates this relationship where over 84% of the variation in documented species richness is explained by the number of herbarium specimens (Fig. 7, curve b). Figure 7 also illustrates that documented species richness intersects with predicted species richness near 3000 herbarium records. However, for the next 3000 specimens added to the collection, there is only a net gain of 100 new documented species above the predicted species richness. Our methodology suggests that the intercept between the linear and logarithmic regressions indicates optimal sampling effort (barring bias) and the threshold number of samples needed to be collected in order to reach predicted species richness values for a locality. In the statistical model presented here, 3000 samples should be collected to approximate predicted species richness for each Texas county. Once 3000 specimens are collected within a county, additional sampling effort beyond this point will result in minimal gain in additional documented species richness.
This methodology may aid in eliminating collecting redundancy in over sampled counties and increase sampling efforts in under sampled counties.
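One way to illustrate the logarithmic collector's curve is to fit S = a + b ln(N) and solve for the specimen count at which the fitted curve reaches a target richness. This is a simplified sketch: it solves for a target value directly rather than intersecting the linear and logarithmic regressions as described above, the function names are hypothetical, and the coefficients and data are invented for illustration, not taken from the FTC records.

```python
import math

def fit_log_curve(specimens, richness):
    """Fit S = a + b * ln(N) by ordinary least squares; returns (a, b)."""
    xs = [math.log(n) for n in specimens]
    n_obs = len(xs)
    mean_x = sum(xs) / n_obs
    mean_y = sum(richness) / n_obs
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, richness))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

def specimens_needed(target_richness, a, b):
    """Invert S = a + b * ln(N) to get the N at which S reaches the target."""
    return math.exp((target_richness - a) / b)

# Invented data generated exactly from S = 100 + 100 * ln(N).
demo_specimens = [50, 200, 800, 3200]
demo_richness = [100.0 + 100.0 * math.log(n) for n in demo_specimens]
a_hat, b_hat = fit_log_curve(demo_specimens, demo_richness)

# Specimen count at which the fitted curve reaches a hypothetical target of 800.
threshold = specimens_needed(800.0, a_hat, b_hat)
```

Because the curve flattens as N grows, each additional block of specimens beyond the threshold adds fewer new species than the block before it.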
Apart from the obvious explanation of under collecting, several other contributing factors may lead to low documented species richness values per county. These include collector bias and the fact that not all collections are inventoried and data-based in the FTC. Collector bias is difficult to test and is an innate aspect of collecting. Incomplete data-basing, however, reflects another growing example of data gaps and can be rectified through inter-herbaria cooperation and increased funding. Nevertheless, those counties identified as having documented species values greater than or equal to predicted species richness values are indeed well collected counties.
We welcome the application and testing of this approach to other biological collections. The further development of such models may aid in identifying data gaps within collections and may benefit future collecting efforts for species inventory.
Acknowledgements
We thank Bruce Hoagland (University of Oklahoma), Michael W. Palmer (Oklahoma State University), Jake Schaefer (University of Southern Mississippi) and Beryl Simpson for comments and improvements to the paper. This research was partially funded through USDA grant 321-20-A164.