Ordinary least-square (OLS) regression is fundamental to quantitative research in many ecological disciplines. However, spatially explicit methods have recently been proposed that allow the incorporation of spatial autocorrelation into ecological models. We compared the spatial error simultaneous autoregressive model (SARerr) and generalized least squares regression (GLS) with the results of simple and multiple OLS regressions, to analyze the relationship between white-tailed deer (Odocoileus virginianus) population density and environmental conditions in two regions dominated by tropical dry forests in central Mexico. The spatially explicit methods presented better goodness of fit than the OLS regression; we also observed a miscalculation in the probabilities obtained with the OLS regression, which in this method led to an incorrect interpretation. In general, we suggest the application of spatially explicit methods to analyze species-habitat relationships when SAC is observed in model residuals. We also discuss the management implications of these results.
Introduction
Ordinary least-square (OLS) regression is fundamental to quantitative research in many ecological disciplines [1]. However, spatial autocorrelation structure in the data may invalidate the OLS regression assumption that model residuals are uncorrelated [2]. Spatial autocorrelation of an ecological response occurs when nearby locations in general have more similar values than distant locations, due to the relationship between distance and biological processes such as speciation, extinction, dispersion, or species interactions [3]. Spatial autocorrelation can explain these processes but also is a challenge for statistical analysis, since it violates the assumption of independence required for most statistical models, resulting in incorrect error probabilities and seriously flawed coefficient estimates [2, 4].
In the last two decades, a wide variety of methods (known as spatially explicit) have been proposed to correct for the effects of spatial autocorrelation. These methods, such as the wavelet-revised model [5], eliminate this autocorrelation in the response variable, while others [6] incorporate it into the predictor variables. However, according to Beale et al. [2], the methods that perform best are those that correct spatial autocorrelation in model residuals, such as generalized least-squares regression [7] and spatial error simultaneous autoregressive models (SARerr) [8].
We used the spatially explicit models SARerr and GLS to analyze habitat variables that correlated to white-tailed deer (Odocoileus virginianus) population density in two regions dominated by tropical dry forest in central Mexico. We compared the results of these models with those obtained with OLS regression, which is the most commonly used technique for analyzing species-habitat relationships [9]. We generated various models and evaluated their goodness of fit using the Akaike Information Criterion corrected for small samples [10]. In addition, we analyzed whether differences in deer density and habitat variables existed between the two study regions.
This evaluation is important because studies conducted in Mexico on the effects of different habitat variables on the density or abundance of the white-tailed deer [11121314151617–18] have employed conventional techniques such as simple regression or multiple regression by ordinary least-squares, multivariate techniques such as principal component analysis that do not take into account spatial autocorrelation [19]. Understanding and including spatial autocorrelation can reveal more accurately the relationship between habitat and white-tailed deer population, with important repercussions for conservation and management of the white-tailed deer in Mexico.
Methods
Study area
The study areas were the Bajo Balsas and the Tehuacán-Cuicatlán Valley, both located in central Mexico (Fig. 1). The Bajo Balsas is in the western portion of the state of Michoacán, México (19° 11′ N, 101° 42′ W) and covers 6,904 km2. Altitude ranges from 200 to 1,800 m.a.s.l., with three climatic zones: hot sub-humid, semiarid warm, and warm sub-humid with summer rains [20]. The annual mean temperature varies from 18 to 29°C and annual precipitation is from 533 to 1,347 mm. Main vegetation types include tropical dry forest in the lowlands and oak and mixed oak–pine forests at higher elevations [21]. The Tehuacán-Cuicatlán is located in the southern part of the state of Puebla and northern Oaxaca (18° 53′ N, 97° 44′ W) and covers almost 10,000 km2. Altitude ranges from 34 to 1,829 m.a.s.l., the annual mean temperature is from 12 to 45°C, and annual precipitation varies from 260 to 3,011 mm. Main vegetation types include tropical dry forest and crassicaule scrub in the lowlands and temperate forests at higher elevations [22].
Population density estimations
We sampled a total of 11 locations in Bajo Balsas from August 2007 to August 2008, and 11 locations in the Tehuacán-Cuicatlán valley from March 2010 to May 2011. In Bajo Balsas, we sampled a total of 25 strip-transects (500 × 2 m) for pellet-group counts in five locations, and 30 transects (500 × 2 m) for track counts in six locations. In Tehuacán-Cuicatlán, we sampled 88 strip-transects (500 × 2 m). To sample pellet-groups, we followed the fecal standing crop count method according to Camargo-Sanabria and Mandujano [23, 24], while for track counts, we followed the procedure proposed by Mandujano [25]. Both of these methods generate similar results for relative deer density and both may therefore be used for detecting temporal changes in a population [26]. To estimate population density through pellet-group count, we used the equation proposed by Eberhardt & Van Etten [27]. We employed PELLET version 2.1 which is a semi-automated procedure in Excel ® [28]. For the track count method, we used the equation proposed by Mandujano [25].
Environmental variables
We obtained a set of predictor variables for each studied site. These were: annual mean temperature (AMT), temperature mean diurnal range (MDR), annual precipitation (AP), precipitation seasonality (PSEAS), precipitation in the driest month of the year (PDRM), and precipitation in the coldest quarter of the year (PCOQ). However, because some of these represented redundant information, we screened for collinearity by examining pairwise correlations between variables. When a pair had a Pearson product-moment correlation coefficient >0.7, one of the two variables was removed [29]. This procedure produced a reduced set of variables, of which six represented average (1950-2000) climatic conditions (Worldclim database [30]). In addition, we derived slope from the SRTM elevation model (< http://srtm.csi.cgiar.org>) and human population density from the population statistics reports of Michoacán and Oaxaca states [31]. With these eight variables the analyses described below were performed (final resolution of variables was 30 arc-seconds).
Statistical models
To analyze the relationship between deer density and environmental variables, we used OLS regression and the spatially explicit methods SARerr [8] and GLS [7]. For a more extensive and detailed review of SARerr and GLS, as well as other spatial regression techniques, see Dormann [8] and Perez et al. [32].
We first generated univariate models to evaluate the relationship between deer density and each habitat variable. In both SARerr and GLS, an iterative protocol of “trial-and-error” was conducted for each model, generating proposals with different parameters to isolate spatial autocorrelation as much as possible. In GLS, the residual autocorrelation in the OLS was modelled by semi-variograms using different spatial structures (spherical, exponential, Gaussian) and coefficients (‘sill’, ‘null’ and ‘range’) [see 19]. An OLS fit between the expected (defined by the modelling process) and observed semi-variances was used to select the most appropriate parameters. In SARerr, various models were generated by testing with different alpha values (between 1 and 2). This parameter (alpha) controls the weighting given to the closeness between pairs of neighboring observations [33].
At the end of the iterative protocol, the minimum residual autocorrelation (minRSA) and the overall explanation of the model (R2) were used to select the best parameterization for each variable. It is important to highlight that R2 values are not directly provided for the GLS and SARerr models, and maximum model fit was therefore estimated with a pseudo-R2 value (hereafter referred to simply as R2), which was calculated as the squared Pearson correlation between the predicted and observed values [34].
Multivariate regression models were subsequently generated, taking all of the variables into account. Different models were constructed to determine the set of variables that best explained deer density. The final model for each region was chosen to minimize AICc (the Akaike Information Criterion corrected for small samples) and minRSA based on backward selection [10]. Parameters for each multivariate model were determined using the iterative protocol “trial-and-error”. All spatial analyses were performed using the program SAM 4.0 (Spatial Analysis in Macroecology) [33].
Finally, we used a Z test [35] to determine whether differences in deer density existed between the two study regions. Comparisons among habitat variables were subsequently conducted using t-Student tests where variables fulfilled the assumptions of normality and homogeneity of variance, and Wilcoxon tests where they did not [36]. These analyses were conducted using the program R ver. 2.11.0 [37].
Results
According to the OLS regression, there were four habitat variables significantly related to the density of the white-tailed deer in Bajo Balsas and three in Tehuacán-Cuicatlán (Appendix 1). However, according to the spatially explicit regression methods (SARerr and GLS), only two variables were related in each region: precipitation in the driest month of the year and precipitation in the coldest quarter of the year in Bajo Balsas (Fig. 2a); and annual mean temperature and precipitation seasonality in Tehuacán-Cuicatlán (Fig. 2b). Based on backward selection, we chose the models with the lowest AICc and minRSA, which selected the same variables (Appendix 2). In general, the spatially explicit methods (GLS and SARerr) presented better goodness of fit than the OLS regression and also reduced spatial autocorrelation in the residuals (see minsRSA values). In particular, GLS regression best modeled the spatial autocorrelation and obtained the lowest AICc (Appendix 2; Fig. 3).
According to the Z test (P < 0.05), population density was significantly higher in Bajo Balsas (8.75 ± 4.03 ind/km2) than in Tehuacán-Cuicatlán (2.03 ± 1.05 ind/km2). Regarding habitat variables, no significant differences were found between the two study regions for temperature mean diurnal range, precipitation in the coldest quarter of the year, and slope (Fig. 3). Significant differences were observed in the remaining habitat variables between the study regions (Fig. 3). In Bajo Balsas, annual mean temperature, annual precipitation, and precipitation seasonality were all higher than in Tehuacán-Cuicatlán. In contrast, precipitation in the driest month and human population density were lower in Bajo Balsas than in Tehuacán-Cuicatlán (Fig. 3).
Discussion
Comparison among methods
Analysis of the residuals through correlograms in the OLS regression consistently presented a positive spatial autocorrelation at short distances. This could mainly be due to the absence in the analysis of spatially structured explicative variables [8] that reflect biological processes in the deer populations, such as dispersion movements of individuals, interaction with other species (predators and food), anthropogenic effects, and other demographic factors. The SARerr and GLS methods eliminated, or considerably diminished, the effects of SAC in the residuals of the models.
When spatial autocorrelation exists in the residuals and the methods do not incorporate it, as is the case in the OLS regression, there is an increased probability of type I error (rejecting the null hypothesis when it is in fact true). It is thus possible to erroneously conclude that there is a relationship between variables when this is in fact untrue, since the coefficients of determination are inflated while the P values decrease [2, 4, 38]. Our univariate analysis results confirm this, since more habitat variables were significantly correlated with deer density in the OLS regression, and the P values were always lower than those of the spatially explicit methods SARerr and GLS (in both Bajo Balsas and Tehuacán-Cuicatlán).
Regarding the comparison of the measurement of goodness of fit among the three methods, the spatially explicit methods performed better (lower AICc and higher R2) than the OLS regression, a common pattern that has been reported in other studies involving data with spatial autocorrelation [2, 8, 39]. On the other hand, the results of the SARerr and GLS methods do not differ greatly since both control spatial autocorrelation of the model residuals and are mathematically similar [40, 41]. Nevertheless, consistently lower AICc values were found in GLS, which is possibly due to the fact that GLS is more flexible in the form in which SAC is incorporated into the models [8]. In GLS, the spatial structure of the covariance is modeled using a parametric function that is usually a semivariogram model [8, 42], while in SARerr, a weights matrix is generated that specifies the force of influence between neighboring observations and requires an iterative process in order to achieve optimum parametrization [33].
Explicative environmental variables
The best models in both regions had at least one climatic variable. This is logical from a biological perspective, since the importance of climatic factors and the manner in which they affect ungulate populations is widely documented in the literature [43444546–47]. High temperatures (> 30 °C) in semi-arid and tropical dry forests could have a negative effect on deer density, since this may lead to dehydration in the animals [48]. In the two study regions, a clear trend was observed in which densities began to decrease as temperatures increased, although significantly so only in Tehuacán-Cuicatlán. Annual precipitation between 400 and 1,800 mm has a generally positive effect on this deer species since it increases the quantity of available food [46, 49]. For example, in Bajo Balsas, high precipitation in the driest month and the coldest quarter of the year were positively associated with deer density, while in Tehuacán-Cuicatlán seasonality of precipitation correlated negatively with density, since rain throughout the year is less than 500 mm.
The only two non-climatic variables used in this analysis were slope and human population density. We considered it important to include the former since it is related to deer strategy for escaping from predators [13, 50, 51]. Regarding the latter, it is well known that human presence and activity have a negative effect on many species, which has been described for the white-tailed deer in Mexico by some authors [17, 18, 52]. However, while it is possible to note certain trends in the relationship between these variables and deer density, this was not found to be significant in either of the two study regions.
Comparison between regions
The exact causes for the differences in deer density between the two regions are difficult to determine from our analyses. However, we can make some assumptions based on the environmental comparison between the zones. Climatically, four of the six analyzed variables differed significantly between the regions: annual mean temperature, annual precipitation, precipitation in the driest month, and precipitation seasonality. In Bajo Balsas, deer density was significantly higher because, despite having higher temperatures than Tehuacán-Cuicatlán, it rains more and over a longer period of the year. This can be directly related to the availability of food for the deer and thus to the carrying capacity of the region [46, 49].
The other variable that differed significantly between regions was human population density, which was greater in Tehuacán-Cuicatlán. It is therefore likely that the low deer densities observed in this zone were the result of human pressure. In most of the sites of this region, we note high levels of illegal hunting and also that livestock production competes with the deer for space and food, with direct consequences for the deer populations [11, 18, 53, 54]. Another important consideration is that certain variables related to the structure of vegetation strongly affect the density of the white-tailed deer at the local scale in both regions (Bajo Balsas [17] and Tehuacán-Cuicatlán [18]). However, in order to characterize variables of this type it would be necessary to conduct time-consuming vegetation sampling in every transect where deer excrement was found.
Implications for conservation
Our results provide insights into methodological, ecological, management and conservation issues of white-tailed deer inhabiting tropical dry forests in Mexico. From methodological and ecological perspectives, studies of deer-habitat relationships have traditionally been conducted at small scales and have provided valuable information regarding the variables that affect local populations of this species [see 11–18, 47]. However, unless these studies are repeated in various habitats and different regions, it is impossible to know which results are specific to each location and which can be generalized. Large-scale (regional, landscape, ecoregion) studies are therefore very important to better understand natural systems, to make inferences in non-sampled areas, and to propose improved management strategies [16]. However, such studies require statistical tools that take the spatial aspect into account when analyzing the data. The SARerr and GLS models are a good option for evaluation of species-habitat relationships where spatial autocorrelation is observed in the model residuals.
For management and conservation of white-tailed deer populations, habitat management is a key strategy [55]. Intensive habitat management involves manipulating food quality and availability, vegetation cover to protect against climatic conditions and predators, and supplementary water sources during the driest months, which have been the principal practices in temperate and semi-arid habitats [5657–58] and in some tropical dry forests [51, 52, 59]. These habitat variables are affected principally by precipitation and temperature [56]. According to our results, the main variables related to the density of deer in Bajo Balsas region were the precipitation in the driest month of the year and precipitation in the coldest quarter of the year, and in Tehuacán-Cuicatlán the annual mean temperature and precipitation seasonality. Therefore, we suggest integrating spatially explicit models that incorporate different environmental and vegetation variables [17, 18] to implement appropriate habitat management actions. Since management practices imply economic cost for local people, it is important to adequately define the key variables that affect local deer populations [55].
Acknowledgements
We thank J. M. Lobo, S. Gallina and R. Guevara for comments. Field work was assisted by L. A. Escobedo Morales, A. González Zamora, Á. Méndez, J. C. Castillo, N. Corona, A. Vázquez, C. Ríos, K. P. Rodríguez Medina, D. Ponce and local guides. We appreciate the collaboration of A. Pérez Arteaga, J. M.Salazar, M. Romero Tinoco and V. Tapia, This study was partially supported by the government of the State of Michoacán, COECYT, FOMIX, CONACYT, Universidad Michoacana de San Nicolás de Hidalgo and Conservación y Manejo Ambiental de la Costa de Michoacán A. C. in Bajo Balsas. In Tehuacán-Cuicatlán Valley, the study was supported by the Biosphere Reserve directive (CONANP), the local authorities of Oaxaca and Puebla states, CONACYT (CB-2009-01 No. 130702), and the Red de Biología y Conservación de Vertebrados del Instituto de Ecología, A. C. Valuable scientific comments input were provided during the ‘Academic Retreat 2014’ organized by INECOL.
References
Appendices
Appendix 2
Summary characteristics of the three models with lowest minRSA and AICc after the backward selection procedure. A measure of model fit is also given (R2). PDRM = precipitation in the driest month of the year, PCOQ = precipitation in the coldest quarter of the year, AMT = annual mean temperature, PSEAS = precipitation seasonality.