Logistic Multiple Regression (LMR), Principal Component Regression (PCR) and Classification and Regression Tree Analysis (CART), all commonly used in GIS-based ecological modelling, are compared with a relatively new statistical technique, Multivariate Adaptive Regression Splines (MARS), in terms of accuracy, reliability, implementation within GIS and ease of use. All four techniques were applied to the same two data sets, which cover a wide range of conditions common in predictive modelling: geographical range, scale, nature of the predictors and sampling method.
We ran two series of analyses to determine whether model validation with an independent data set was required or whether cross-validation on a learning data set sufficed. The results show that validation with independent data sets is needed. Model accuracy was evaluated using the area under the Receiver Operating Characteristics curve (AUC). This measure was chosen because it summarizes performance across all possible thresholds and is independent of the balance between classes.
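As an illustration of why AUC is threshold-free, the following minimal sketch computes it through the rank-sum (Mann-Whitney) identity: AUC equals the probability that a randomly chosen presence record receives a higher score than a randomly chosen absence record. The function name `auc` and the example data are hypothetical and are not taken from the study.

```python
def auc(labels, scores):
    """Area under the ROC curve via the rank-sum (Mann-Whitney) identity."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one presence and one absence record")
    # Count presence/absence pairs that are ranked correctly; ties count one half.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    # Dividing by the number of pairs removes any dependence on class sizes.
    return wins / (len(pos) * len(neg))


# Hypothetical presence (1) / absence (0) records with model scores.
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
print(auc(labels, scores))  # 11 of 12 pairs are ranked correctly
```

No classification threshold appears anywhere in the computation, and the normalization by the number of presence/absence pairs is what makes the value insensitive to class balance.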
MARS and CART achieved the best prediction success, although the CART model was difficult to use for cartographic purposes because of its high complexity.
Abbreviations: AUC = Area under the ROC curve; CART = Classification and Regression Trees; FN = False negative; FP = False positive; GAM = Generalized Additive Model; GIS = Geographic Information System; GLM = Generalized Linear Model; LMR = Logistic Multiple Regression; MARS = Multivariate Adaptive Regression Splines; NDVI = Normalized Difference Vegetation Index; PCR = Principal Component Regression; ROC = Receiver Operating Characteristics.