Multivariate regression analysis, combined with residuals correction, was carried out to develop a precipitation prediction model for the Daqing Mountains of Inner Mongolia in northern China. Precipitation data collected at 56 stations between 1955 and 1990 were used: data from 48 stations for model development and data from 8 stations for additional tests. Five topographic factors—altitude, slope, aspect, longitude, and latitude—were taken into account for model development. These topographic variables were acquired from a 100-m resolution digital elevation model (DEM) of the study region, and the mean values of the sub-basin in which a precipitation station is located were used as the values of the respective variables of that station. The multivariate regression model can explain 72.6% of the spatial variability of precipitation over the whole year and 74.4% of variability in the wet season (June–September). Precipitation in the dry season (October–May) is hard to model owing to little rainfall (21.78% of annual rainfall) and a different synoptic system. Interpolation-based residuals correction did not significantly improve the accuracy of our model, which shows that our model is quite effective. The model, as presented in this paper, could potentially be applied to other mountains and in mountain climate research.
Environmental research, management, and planning (such as agricultural and forest management), hydro-ecological modeling, and environmental assessments usually require spatially continuous precipitation data. However, precipitation is usually measured at a very limited number of stations, especially in mountain areas. As a result, accurate estimation and prediction of precipitation is a major challenge because observation data are lacking or unreliable (Daly et al 1994; McGuffie and Henderson-Sellers 2001; Alijani et al 2007). The development of geographic information systems (GIS) in recent years provides increased opportunities for precipitation modeling.
Interpolation methods have been developed for rainfall modeling. Most of them are based mainly on the similarity and topological relations of nearby sample points and on the value of the variable to be measured (Beek et al 1992; Gemmer et al 2004; Chang et al 2005). Interpolation can be achieved using simple mathematical models (inverse distance weighting, trend surface analysis, splines and Thiessen polygons, etc) or more complex models (geostatistical methods, such as kriging). Geostatistical interpolation has become an important tool in climatology because it is based on the spatial variability of the variables of interest and makes it possible to quantify the estimation uncertainty (Martinez-Cob 1996; Holawe and Dutter 1999). However, interpolation methods only consider spatial relationships among sampling points, and do not take into account other important topographic variations. Consequently, the usual interpolation methods cannot provide researchers with adequate precision of precipitation estimation, especially in complicated terrains of mountainous regions (Marquínez et al 2003; Sharples et al 2005).
In recent years, geographic and topographic factors have been integrated into the modeling of precipitation (Johansson and Chen 2005). Some authors have attempted to incorporate local topographic factors, such as elevation, into geostatistical approaches (Goovaerts 2000), and others have developed models relating climate to site position and elevation (Goodale et al 1998). Relationships between topography and the spatial distribution of precipitation have been analyzed for mountainous regions (Basist et al 1994; Cheval et al 2003). The Precipitation–elevation Regressions on Independent Slopes Model (PRISM) brings a combination of climatological and statistical concepts to the analysis of orographic precipitation (Daly et al 1994), and recently PRISM has used weighting functions to incorporate gauge data of neighboring topographic facets for regressions (Daly et al 2002). More recent studies have considered more refined topographic factors by using higher resolution digital elevation models (DEM) to predict the physical influence of topographic variables on precipitation patterns. Precipitation models integrating statistical and GIS techniques have become widespread and common (Ollinger et al 1995; Goovaerts 2000). Multiple linear (Ninyerola et al 2000, 2007; Brown and Comrie 2002; Guan et al 2005) and non-linear (Marquínez et al 2003) regression models have proved to be rather effective.
The present study, focusing on the Daqing Mountains in Inner Mongolia of northern China, was conducted to develop a multivariate regression model for estimation of precipitation in mountain regions, based on five topographic factors (altitude, slope, aspect, longitude, and latitude).
Materials and methods
In northern China, many mountain ranges, among them the Da (Great) Hinggan, Taihang, Yinshan, Helan, and Mazong mountains, block the summer monsoon from the Pacific Ocean and prevent it from penetrating deep into the heartland of Asia; as a result, they usually constitute significant moisture and temperature boundaries. The Daqing Mountains are located roughly in the center of the Inner Mongolia Autonomous Region, at 40°20′–41°20′N and 110°30′–111°30′E, covering an area of 9336 km² (Figure 1). Their elevation ranges from 990 m to nearly 2340 m asl within only a few kilometers of planimetric distance. The local climate is dominated by warm-moist summers and cold-dry winters. The Daqing Mountains form the dividing line between the warm-temperate and temperate zones in northern China. As a climatic and geographic transition belt, this region is particularly suited to the study of relationships between precipitation and topographic factors.
Data collecting and processing
Precipitation data from 56 stations were obtained from the observation archives of the Yellow River Conservancy Commission (1955–1990). All of these records cover over ten years, 38 of them more than twenty years. For each station, the annual average amount of precipitation was used. Density was one station for every 167 km2, with most of these stations located at medium altitude, between 1060 m and 1954 m asl (Table 1 and Figure 1). The stations were quite evenly distributed both horizontally and vertically. The majority of annual rainfall, averaging 78.22%, occurs in the four months from June to September. This period is the wet season, and the other months make up the dry season.
Table 1. Vertical distribution of precipitation stations in the Daqing Mountains.
Many factors affect precipitation and its spatial distribution. Usually, precipitation increases with elevation; it varies with slope and aspect; the amount and intensity of precipitation are greater on windward than on leeward slopes; and these differences increase with the steepness of the slope (Fu 1992). Factors closely related to precipitation in the Daqing Mountains include not only rugged orography, but also geographical location, as rainfall in the study region is mainly due to moisture-laden air masses from the southeast (Pacific summer monsoon). Therefore, latitude and longitude, understood in terms of their relative distance to the Pacific Ocean, should be considered in the development of a precipitation model for this area.
The introduction of GIS and the corresponding powerful software have brought new capacities to the world of geography (Longley 2000). Using ArcGIS 9.0 software, we were able to obtain topographic variables from the DEM based on 1:250,000 contour maps. The DEM had a 100-m resolution, with a projected coordinate system of WGS_1984_Transverse_Mercator. Smoothing processing of predictors—eg smoothing of neighboring grids (Daly et al 1994; Wotling et al 2000; Sharples et al 2005)—or using mean values of the sub-basins (Marquínez et al 2003) is usually applied to increase the representativeness of the stations. For this purpose, we divided the study region into more than 1900 sub-basins with the help of the ArcGIS hydrology module. These sub-basins covered areas between 2.96 and 26.67 km2, with an average of 4.9 km2. The mean values of the topographic factors (elevation, slope, and aspect) in the sub-basin in which a station was located were used as the values of the respective variables of that station. In addition, we calculated the topographic variables for the five layers (altitude, slope, aspect, longitude, and latitude) as raster models using 100-m resolution grids. We used a 100-m grid because the study region is not large but varies greatly (the elevation ranges from 990 m to nearly 2340 m asl within only a few kilometers of planimetric distance on the southern flank of the Daqing Mountains). We also tried smoothing values (eg with 5000 neighboring cells), and results showed that virtually no detailed information was missing in our model.
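The sub-basin averaging step described above can be illustrated with a minimal sketch. It assumes the DEM-derived layers and the sub-basin ids are available as same-shape NumPy arrays; the function names `zonal_mean` and `zonal_circular_mean_deg` and the toy grids are hypothetical and are not taken from the ArcGIS workflow. The circular mean for aspect is an assumption added here, since a plain arithmetic mean of angles such as 350° and 10° would give 180°; the text does not say how aspect was averaged.

```python
import numpy as np

def zonal_mean(values, zones):
    """Arithmetic mean of `values` inside each zone id of `zones` (same-shape arrays)."""
    ids, inverse = np.unique(np.asarray(zones).ravel(), return_inverse=True)
    flat = np.asarray(values, dtype=float).ravel()
    sums = np.bincount(inverse, weights=flat)
    counts = np.bincount(inverse)
    return dict(zip(ids.tolist(), (sums / counts).tolist()))

def zonal_circular_mean_deg(aspect_deg, zones):
    """Circular mean of aspect (degrees) per zone, so 350 and 10 average to about 0."""
    rad = np.deg2rad(np.asarray(aspect_deg, dtype=float))
    sin_mean = zonal_mean(np.sin(rad), zones)
    cos_mean = zonal_mean(np.cos(rad), zones)
    return {z: float(np.rad2deg(np.arctan2(sin_mean[z], cos_mean[z])) % 360.0)
            for z in sin_mean}

# Toy 3 x 4 grids standing in for 100-m cells (hypothetical values).
elevation = np.array([[1000., 1100., 1500., 1600.],
                      [1050., 1150., 1550., 1650.],
                      [1020., 1120., 1520., 1620.]])
aspect = np.array([[350., 10., 90., 90.],
                   [350., 10., 90., 90.],
                   [350., 10., 90., 90.]])
basins = np.array([[1, 1, 2, 2],
                   [1, 1, 2, 2],
                   [1, 1, 2, 2]])

elev_mean = zonal_mean(elevation, basins)
aspect_mean = zonal_circular_mean_deg(aspect, basins)

# A station located in sub-basin 2 would take that basin's mean values.
station_basin = 2
station_elevation = elev_mean[station_basin]
station_aspect = aspect_mean[station_basin]
```

Each station then inherits the mean values of the sub-basin that contains it, which is the smoothing idea the paragraph describes.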
The combination of statistical modeling (multivariate regression) and spatial interpolation (ordinary kriging) has been demonstrated to be effective in modeling precipitation (Ninyerola et al 2007). Topographic variables, if considered on their own, are poorly related to rainfall statistics, as shown in Table 2. In other words, none of the topographic variables could be used by itself to appropriately explain the precipitation pattern. Considering that linear equations are not well suited to depicting the relationships between precipitation and topographic variables, we modeled precipitation with multivariate nonlinear regression, as follows:
Table 2. Pearson correlation coefficient matrix for independent variables and precipitation data.
From the 56 precipitation stations, a total of 48 were selected for modeling by random sampling using SPSS 13 software, and the remaining 8 stations were used for validation. The constants were acquired by means of the least squares method in the regression module of SPSS, and all the regression statistics were also created by the software (Table 3).
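The random split and least-squares fit can be sketched as follows. This is only an illustration under stated assumptions: the study used SPSS, whereas the sketch uses NumPy; the predictor and precipitation values are synthetic; and the second-order polynomial form is a hypothetical stand-in, because the regression equation itself is not reproduced in this text.

```python
import numpy as np

rng = np.random.default_rng(0)
n_stations = 56

# Synthetic, unitless stand-ins for the five predictors
# (altitude, slope, aspect, longitude, latitude); not real station data.
X = rng.normal(size=(n_stations, 5))
precip = 340.0 + 25.0 * X[:, 0] - 15.0 * X[:, 4] + rng.normal(0.0, 20.0, n_stations)

# Random split: 48 stations for fitting, 8 held out for validation.
order = rng.permutation(n_stations)
fit_idx, test_idx = order[:48], order[48:]

def design_matrix(X, mean, std):
    """Intercept, linear, and squared terms of the standardized predictors.

    The quadratic form is an assumption made for illustration only.
    """
    Z = (X - mean) / std
    return np.column_stack([np.ones(len(Z)), Z, Z ** 2])

# Standardize with statistics from the fit set only, to keep the test set independent.
mean = X[fit_idx].mean(axis=0)
std = X[fit_idx].std(axis=0)

A_fit = design_matrix(X[fit_idx], mean, std)
coef, *_ = np.linalg.lstsq(A_fit, precip[fit_idx], rcond=None)

# Predictions for the 8 held-out stations.
pred_test = design_matrix(X[test_idx], mean, std) @ coef
```

Because the model is linear in its coefficients even though it is nonlinear in the predictors, ordinary least squares is sufficient for a form like this one; a regression that is nonlinear in its parameters would need an iterative solver instead.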
Table 3. Regression statistics and evaluation of the multivariate regression model (including 48 stations used for modeling). R2 is the determination coefficient, which serves as a measure of the goodness-of-fit of the model; adj_R2 is the adjusted determination coefficient, which compensates for the limitation of the determination coefficient by taking into account the size of the sample and the number of prediction variables, and which represents the proportion of variation of the dependent variable (ie annual and seasonal mean precipitation) explained by the multivariate regression model; RMSE is the root mean squared error, which describes the error of prediction in the modeling of precipitation; F is the value of the F test statistic; DW is the value of the Durbin-Watson statistic, a test statistic used to detect the presence of autocorrelation in the residuals of a regression analysis.
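For readers who want to reproduce statistics of the kind listed in Table 3, the standard textbook definitions can be sketched as below. The function name `regression_stats` and the toy numbers are hypothetical, and the relative RMSE (RMSE divided by mean observed precipitation) is an assumption about how the percentage values were obtained. The F formula assumes an ordinary least-squares fit with an intercept.

```python
import numpy as np

def regression_stats(y, y_hat, n_predictors):
    """R2, adjusted R2, RMSE, relative RMSE (%), and the overall F statistic."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    n = y.size
    resid = y - y_hat
    ss_res = np.sum(resid ** 2)              # residual sum of squares
    ss_tot = np.sum((y - y.mean()) ** 2)     # total sum of squares
    dof_resid = n - n_predictors - 1         # residual degrees of freedom
    r2 = 1.0 - ss_res / ss_tot
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / dof_resid
    rmse = np.sqrt(ss_res / n)
    rmse_pct = 100.0 * rmse / y.mean()
    f_value = (r2 / n_predictors) / ((1.0 - r2) / dof_resid)
    return {"r2": r2, "adj_r2": adj_r2, "rmse": rmse,
            "rmse_pct": rmse_pct, "f": f_value}

# Toy check with one predictor and five observations (hypothetical values).
stats = regression_stats([1.0, 2.0, 3.0, 4.0, 5.0],
                         [1.1, 1.9, 3.2, 3.8, 5.0],
                         n_predictors=1)
```

The adjustment in `adj_r2` grows with the number of predictors relative to the sample size, which matters for a fit set of only 48 stations.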
Results and discussion
About the regression model
It has been shown that when the fit set accounts for more than 80% of the whole set, the adjusted determination coefficient (adj_R2) tends to remain stable. Hence, it is reasonable to select about 86% of the whole set (48 of the 56 stations) as the fit set (Figure 2).
The equations for both annual and seasonal precipitation pass the F tests at 0.01 significance level, and the high determination coefficient (R2) and adj_R2 show the goodness-of-fit of the equations (Table 3).
The capability of the model to explain the spatial variability of precipitation varies depending on the period: it explains 74.4% of the variability for the wet season and 72.6% for the whole year, but only 41.5% for the dry season. The root mean squared error (RMSE), expressed here as a relative error, is 8.02%, 8.38%, and 18.78% for the three periods, respectively. This shows that our multivariate regression model works well for both the whole year and the wet season.
A Durbin-Watson test was conducted on the residuals. The values in Table 3 show little autocorrelation among the residuals, meaning that our model is of good quality and able to predict most of the spatial variability of precipitation in the Daqing Mountains.
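The Durbin-Watson statistic is simple enough to compute directly, as the following sketch shows. The function name and the two toy residual series are hypothetical. The statistic lies between 0 and 4, with values near 2 indicating little first-order autocorrelation. It also depends on the order in which the residuals are listed; how the stations were ordered for the test is not stated in the text, so any ordering used to reproduce it is an assumption.

```python
import numpy as np

def durbin_watson(residuals):
    """Sum of squared successive differences divided by the sum of squared residuals."""
    e = np.asarray(residuals, dtype=float)
    return float(np.sum(np.diff(e) ** 2) / np.sum(e ** 2))

# Alternating signs give negative autocorrelation (statistic above 2).
alternating = np.array([1.0, -1.0, 1.0, -1.0, 1.0, -1.0])
# Runs of equal sign give positive autocorrelation (statistic below 2).
clustered = np.array([1.0, 1.0, 1.0, -1.0, -1.0, -1.0])

dw_alternating = durbin_watson(alternating)
dw_clustered = durbin_watson(clustered)
```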
In order to improve the results of the regression model, we adopted ordinary kriging to obtain a residuals surface. An exponential model was used for predicting the residuals of the 48 precipitation stations. After some trials, lag size was finally fixed at 8 km, with a total of 8 lags. The search neighborhood was elliptical and divided into 4 sectors, with a maximum of 5 and a minimum of 2 neighbors included per sector. Ultimately, the residual correction layers were produced in ArcGIS. After overlaying them onto the regression surface, we obtained the spatial distribution of precipitation (Figures 3A to 3C).
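The residuals-correction step can be sketched as ordinary kriging with an exponential semivariogram, followed by adding the kriged residuals to the regression prediction. This is a simplified illustration and not the ArcGIS procedure: it uses all stations as neighbors instead of a four-sector elliptical search, and the nugget, partial sill, and practical range are hypothetical values, since the fitted variogram parameters are not given in the text. Coordinates and residuals are toy numbers.

```python
import numpy as np

def exponential_variogram(h, nugget, partial_sill, practical_range):
    """Exponential semivariogram (one common parameterization); gamma(0) = 0."""
    h = np.asarray(h, dtype=float)
    gamma = nugget + partial_sill * (1.0 - np.exp(-3.0 * h / practical_range))
    return np.where(h > 0.0, gamma, 0.0)

def ordinary_kriging(xy, z, targets, nugget=0.0, partial_sill=1.0, practical_range=30.0):
    """Predict z at `targets` from samples (xy, z), using every sample as a neighbor."""
    xy = np.asarray(xy, dtype=float)
    z = np.asarray(z, dtype=float)
    targets = np.asarray(targets, dtype=float)
    n = len(z)
    # Left-hand side: semivariances between samples, bordered by the
    # unbiasedness constraint (weights sum to 1).
    pair_dist = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=2)
    lhs = np.ones((n + 1, n + 1))
    lhs[:n, :n] = exponential_variogram(pair_dist, nugget, partial_sill, practical_range)
    lhs[n, n] = 0.0
    # Right-hand side: semivariances between samples and each target.
    target_dist = np.linalg.norm(xy[:, None, :] - targets[None, :, :], axis=2)
    rhs = np.ones((n + 1, len(targets)))
    rhs[:n, :] = exponential_variogram(target_dist, nugget, partial_sill, practical_range)
    weights = np.linalg.solve(lhs, rhs)[:n]   # drop the Lagrange multiplier row
    return weights.T @ z

# Toy stations (coordinates in km) with regression residuals in mm.
station_xy = np.array([[0.0, 0.0], [8.0, 0.0], [0.0, 8.0],
                       [8.0, 8.0], [4.0, 12.0], [16.0, 4.0]])
station_resid = np.array([5.0, -3.0, 2.0, -1.0, 4.0, -6.0])

# Two grid cells with hypothetical regression predictions in mm.
grid_xy = np.array([[4.0, 4.0], [12.0, 6.0]])
regression_pred = np.array([350.0, 320.0])

resid_surface = ordinary_kriging(station_xy, station_resid, grid_xy)
corrected = regression_pred + resid_surface
```

With a zero nugget, this interpolator reproduces the residual exactly at each station, so the corrected surface passes through the observed values there.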
Spatial pattern of precipitation
In the wet season, precipitation clearly decreases from southeast to northwest. In the dry season, precipitation is almost randomly distributed, with high values in some highlands only, such as in the center and south of the study region. The precipitation pattern for the whole year is similar to that of the wet season, with less precipitation in the northwest and on the lee sides and relatively high precipitation on the southeast slopes. We also found that precipitation in the Great Bend of the Yellow River (the Hetao Plain) is higher than on the Inner Mongolian Plateau, which is situated at much higher altitude.
Validation of the multivariate regression model with 8 test stations
We calculated the mean absolute error (MAE) and RMSE of modeled precipitation for the 8 test stations (Table 4). The RMSE for the whole year, wet season, and dry season was 44.08 mm, 36.71 mm, and 12.42 mm, respectively, corresponding to 12.77%, 11.94%, and 32.96% of measured precipitation. When the kriged residual grids were added, the predictions improved only slightly: MAE and RMSE changed by less than 3 mm. This suggests that the residuals are largely random errors, and that our regression model works quite well for interpreting the spatial variability of precipitation.
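The validation measures used here follow directly from their definitions, as in the sketch below. The function name and the two-station toy data are hypothetical; the relative RMSE is taken as RMSE divided by the mean of the measured values, which is an assumption consistent with the percentages being described as shares of measured precipitation.

```python
import numpy as np

def validation_errors(observed, predicted):
    """Return MAE, RMSE, and RMSE as a percentage of mean observed precipitation."""
    obs = np.asarray(observed, dtype=float)
    pred = np.asarray(predicted, dtype=float)
    err = pred - obs
    mae = float(np.mean(np.abs(err)))
    rmse = float(np.sqrt(np.mean(err ** 2)))
    rmse_pct = 100.0 * rmse / float(obs.mean())
    return mae, rmse, rmse_pct

# Toy example: two stations, values in mm (hypothetical).
mae, rmse, rmse_pct = validation_errors([100.0, 200.0], [110.0, 180.0])
```

Under that assumption, an annual RMSE of 44.08 mm at 12.77% would imply a mean measured annual precipitation of roughly 345 mm at the test stations.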
Table 4. Validation of the multivariate regression model with 8 test stations.
The multivariate regression model developed for the Daqing Mountains is able to describe 72.6% of the spatial variability of annual precipitation in the study region, and it is much more effective for the wet season than for the dry season.
Multivariate regression models are general statistical tools, and they can depict most of the spatial variability of precipitation in mountain regions. Our model uses only a DEM and precipitation data from a limited number of stations, and it could readily be applied to other mountain areas and in mountain climate research. We also consider local prevailing winds, especially their direction and force, to be an important variable for rainfall patterns. Integrating a wind variable into the model would thus probably increase its accuracy.
The authors are grateful to the two anonymous reviewers, who made constructive comments and suggestions regarding our manuscript. Funding from the Knowledge Innovation Project (No. kzcx2-yw-308) of the Chinese Academy of Sciences and the National Natural Science Foundation (No. 40571010) of China is gratefully acknowledged.