Successful predictions of population fluctuations are valuable in game management, as population estimates are instrumental in increasing the time available for management decisions. However, finding a population model which produces predictions accurate enough to be used for management purposes is often precluded by the scarcity and noisiness of population data. Using two long-term population data sets, 1964–1984 data on Finnish grouse (*Tetrao urogallus, T. tetrix* and *Bonasa bonasia*) and 1914–1950 data on coloured fox *Vulpes fulva* from Canada, we demonstrate the use and power of an artificial neural network in predicting population fluctuations. The performance of an artificial neural network model is compared to two benchmark forecasts: time series mean and the previous data value. Unfortunately, in practice management decisions often have to be made with limited data. Therefore, a notable advantage of neural network modelling is its forecast accuracy even in cases when the time series available are short and noisy, and the processes underlying population fluctuations are not fully understood.

The essence of the theory on applied population dynamics can be rephrased as: given the past, what can be said about the future? Forecasting future events is of special importance in many branches of natural sciences aiming at sustainable use of natural resources. For instance, in Finland, there are long traditions in monitoring game populations (e.g. Lindén, Helle, Helle & Wikman 1996). Consequently, various management organisations are accustomed to utilising long-term monitoring data to derive predictions of the population numbers in the next hunting season to guarantee timely decisions in hunting regulations. In Finland, the grouse populations of the autumn have already been predicted in the spring for several years (Lindén, Laurila & Wikman 1990). Besides benefiting game harvesting, sufficiently precise forecasts also provide time for decision making, e.g. in pest control (Entwistle & Dixon 1986, Paton 1986) and fisheries biology (Crecco, Savoy & Whitworth 1986, Cochrane & Hutchings 1995). Here we restrict the discussion of forecasting to point forecasts one step ahead, and do not attempt to identify and predict long-term trends (see Ives 1995, Link & Sauer 1996). Thus, our approach here applies to tasks such as developing annual game management recommendations more than to conservation issues, where long-term predictions are of interest.

In the simplest cases the future population size, i.e. the forecast, can be obtained by projecting the present state of the population into the future according to a specified model. This can be done, e.g. by multiplying the current population size by its estimated growth rate, or by building a projection matrix with the survival and fecundity values of the individuals in the population (May 1976, Caswell 1989). In reality, however, this approach often poses several difficulties. Selecting an appropriate population model is not necessarily a straightforward task (Pascual, Kareiva & Hilborn 1997), and parameter estimation in practice may also prove very difficult, especially with a large set of parameters and scarce data in the presence of noise. Therefore, in order to attain a more general and flexible forecasting system, population managers may make use of forecasting methodology derived from time series statistics.
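The two projection approaches mentioned above can be illustrated with a minimal sketch. The function names and all numerical values below are hypothetical and chosen only for illustration; they are not estimates from any data set discussed here.

```python
def project_scalar(n_now, growth_rate):
    """One-step forecast: current population size times its growth rate."""
    return n_now * growth_rate


def project_matrix(stage_counts, projection_matrix):
    """One-step forecast for a stage-structured population.

    projection_matrix[i][j] is the per-capita contribution of stage j
    this year to stage i next year (fecundities in row 0, survival below).
    """
    return [
        sum(a_ij * n_j for a_ij, n_j in zip(row, stage_counts))
        for row in projection_matrix
    ]


# Hypothetical numbers, for illustration only.
juveniles_adults = [100.0, 50.0]
A = [
    [0.0, 1.5],  # each adult produces 1.5 juveniles
    [0.4, 0.8],  # 40% of juveniles and 80% of adults survive as adults
]
next_year = project_matrix(juveniles_adults, A)
```

Both functions require the growth rate or the full set of survival and fecundity values to be known, which is exactly the estimation problem described above.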

Time series models are of special interest in forecasting, since their implementation is clear: the only information they use is the past time series of population size observations (Box, Jenkins & Reinsel 1994). Thus, despite not readily offering biologically relevant interpretations for each parameter value, they have sometimes been used in population studies (e.g. Moran 1953, Mendelssohn 1980, Mendelssohn & Cury 1987, Jeffries, Keller & Hale 1989, Meltzer & Norval 1992). In some cases population data have been used together with external variables (Howard & Dixon 1990, Bautista, Alonso & Alonso 1992).

However, a common problem for traditional Box-Jenkins type forecasting approaches, as well as for econometric modelling (e.g. Pindyck & Rubinfeld 1991), is that ecological time series do not generally match the theoretical preconditions of time series analysis. For instance, they seldom have time-invariant mean and variance (second-order stationarity; Diggle 1990, Chatfield 1996), and often cannot be conveniently transformed to meet this assumption. The inherent features of population records, viz. short and noisy time series, often further hamper the use of conventional time series analysis. Additionally, though Box-Jenkins forecasting does not build on any specific population growth model, the range of population processes that can be successfully modelled with time series models is limited (Tong 1990). Clearly, a more flexible tool would be of interest.

Notable progress in time series analysis has been made in the domain of non-linear time series models, which provide a flexible tool for describing complex processes. However, these model types, such as threshold autoregressive models (TAR), autoregressive conditionally heteroscedastic models (ARCH) and their generalised version (GARCH), do not generally improve the accuracy of the actual point forecast over the older methods. Instead, their merit in this task lies more in producing better prediction intervals and thus a better assessment of the risk involved in the forecast (Davies, Pemberton & Petrucelli 1988, Chatfield 1996).

In this paper, our aim is to focus on the possibilities and capability of one recent alternative tool, the artificial neural network technique (e.g. Carling 1992, Haykin 1994), in forecasting ecological time series. The power of neural networks in predicting time series is based on their flexible function approximation. We shall demonstrate the performance of the neural network approach with long-term grouse data from Finland (Lindén 1989, Lindström, Ranta, Kaitala & Lindén 1995) and compare it to two simple benchmarks, time series mean and the previous data value (e.g. Casti 1993). Additionally, a different data set on coloured fox *Vulpes fulva* fur returns in Canada (Keith 1963) is used to show that a neural network algorithm like the one developed for grouse can also be generalised to broader use.

## Material and methods

### Material

The Finnish data are population records of capercaillie *Tetrao urogallus*, black grouse *Tetrao tetrix* and hazel grouse *Bonasa bonasia* dynamics in 11 provinces in Finland (Lindén 1989, Lindström et al. 1995) collected during a 21-year period from 1964 to 1984. The coloured fox data are fur returns before 1951 from the following seven Canadian provinces: British Columbia (32 years), Alberta (32 years), Saskatchewan (37 years), Manitoba (32 years), Ontario (32 years), New Brunswick (27 years) and Nova Scotia (32 years; Keith 1963).

To enhance the forecasting possibilities of the neural network system, the Finnish grouse data were detrended by regressing all the log-transformed time series against time and scoring the residuals. This was done to make the data more stationary, since the grouse fluctuations in Finland show a clear decreasing trend over the time span covered (Lindén 1989, Lindström et al. 1995). The coloured fox data did not show any profound trend, but the average densities and variances differed considerably. Therefore, they were standardised to zero mean and unit variance (Sokal & Rohlf 1995). Pre-processing the data is often a necessity in ordinary time series modelling (Diggle 1990, Box et al. 1994, Chatfield 1996), and vital in many cases in neural network modelling as well (Smith 1993, Azoff 1994). In our initial explorations, training the neural network without pre-processing the data also resulted in worse performance. Note, however, that despite the pre-processing the forecasted values can easily be transformed back to the original scale.
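The two pre-processing steps, and the back-transformation of a forecast to the original scale, can be sketched as follows. This is a minimal illustration under our own assumptions: time is indexed 0, 1, 2, …, the regression is ordinary least squares, natural logarithms are used, and the standardisation uses the sample standard deviation. The function names are hypothetical.

```python
import math
import statistics


def detrend_log(series):
    """Regress log(series) on time; return residuals, intercept and slope."""
    logs = [math.log(x) for x in series]
    times = list(range(len(logs)))
    t_mean = statistics.fmean(times)
    y_mean = statistics.fmean(logs)
    sxx = sum((t - t_mean) ** 2 for t in times)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, logs))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    residuals = [y - (intercept + slope * t) for t, y in zip(times, logs)]
    return residuals, intercept, slope


def back_transform(residual, t, intercept, slope):
    """Map a (forecasted) residual at time t back to the original scale."""
    return math.exp(residual + intercept + slope * t)


def standardise(series):
    """Rescale a series to zero mean and unit (sample) variance."""
    mean = statistics.fmean(series)
    sd = statistics.stdev(series)
    return [(x - mean) / sd for x in series]
```

A forecast made on the residual scale for time t is returned to the original scale with `back_transform`, using the intercept and slope stored when the series was detrended.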

### Artificial neural network

An artificial neural network transforms the values of an input matrix **P** to values of an output matrix **T**. The actual neural network architecture between these two matrices largely depends on the specific problem. In time series forecasting an efficient design is a two-layer network, where the transfer function, **F**, is sigmoidal in the first layer and linear in the second (Smith 1993, Azoff 1994). In the network architecture, each element of the input matrix **P** is connected to each input neuron. Elements of **P** are weighted with layer-specific values, **W**_{i}, and shifted by a layer-specific bias value, **b**_{i}. The results are then transformed by the transfer function of the corresponding layer, in which form they reach the second layer. After the corresponding procedure in the second layer the output vector, **T**, is reached as:

$$\mathbf{T} = \mathbf{F}_2\left(\mathbf{W}_2\,\mathbf{F}_1\left(\mathbf{W}_1\mathbf{P} + \mathbf{b}_1\right) + \mathbf{b}_2\right) \tag{1}$$

Training the network consists of adjusting the weights, **W**, and comparing the training output matrix **T**_{t} to the target output matrix **T**.
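The forward pass of a two-layer network of the kind described above, with a sigmoidal first layer and a linear second layer, can be sketched as follows. The text does not state which sigmoidal function was used; the logistic function below is an assumption, and the function and argument names are hypothetical.

```python
import math


def logistic(x):
    """Logistic sigmoid; an assumed choice of sigmoidal transfer function."""
    return 1.0 / (1.0 + math.exp(-x))


def forward(p, W1, b1, W2, b2):
    """Output of a two-layer network for one input column p.

    Layer 1 applies a sigmoidal transfer function to W1 p + b1;
    layer 2 applies a linear (identity) transfer function to W2 h + b2.
    W1 and W2 are lists of rows, b1 and b2 are lists of biases.
    """
    hidden = [
        logistic(sum(w * x for w, x in zip(row, p)) + b)
        for row, b in zip(W1, b1)
    ]
    return [
        sum(w * h for w, h in zip(row, hidden)) + b
        for row, b in zip(W2, b2)
    ]
```

For a one-step point forecast, `W2` has a single row, so the network returns a single value for each input column.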

Using as efficient a tool as a two-layer non-linear neural network, finding an excellent fit for the model poses no problem. Too precise learning in the presence of noise, however, prevents the generalisation of the network, i.e. using the trained network for forecasting the future values of new, independent data sets. This phenomenon is called overfitting (e.g. Smith 1993). Consequently, the art of building a successful network for forecasting purposes is to find a balance between finding a fit which is sufficiently accurate and simultaneously retaining the capacity for general solutions.

When the network is trained, increasing the number of learning epochs improves both the fit and forecast of the network to a certain limit, after which the fit gets still closer but the forecast deteriorates. This is where overfitting begins (Smith 1993). Overfitting originates from the network attempting to model noise present in the data, and is thus linked to a general rule of forecasting stating that the model providing the best fit is seldom the best one in forecasting (Pindyck & Rubinfeld 1991). This corresponds to a situation where one can find a perfect match between any data and an adequately high order polynomial only to have no success whatsoever in predicting future values.
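The point at which overfitting begins can be located by recording the forecast error on held-out data after each learning epoch and picking the epoch where it is smallest. The sketch below shows only this selection step; the error values are hypothetical, and the rule is an illustration of the principle rather than the exact procedure used in this study.

```python
def best_epoch(forecast_errors):
    """Return the 1-based epoch with the smallest held-out forecast error."""
    best = min(range(len(forecast_errors)), key=forecast_errors.__getitem__)
    return best + 1


# Hypothetical error curves: the fit keeps improving with more epochs,
# while the forecast improves first and then deteriorates.
fit_sse = [6.0, 4.0, 2.5, 1.5, 0.8, 0.3]
forecast_sse = [5.0, 3.0, 2.2, 2.0, 2.4, 3.1]
stop_at = best_epoch(forecast_sse)
```

In this hypothetical example training beyond `stop_at` epochs would continue to reduce `fit_sse` while the forecast error grows.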

One notable advantage in using a neural network in forecasting is that it can form the forecasting rules in an extremely flexible and efficient way. When there are data from the same process, i.e. population data on the same species from different locations, as is the case for all species mentioned in this article, the input matrix, **P**, can be constructed so that it includes all the information available on the process. This is achieved by splitting the time series data according to the desired time lag used in the model, and combining all the original time series as follows. Suppose we have two time series {X(1),…,X(k)} and {Y(1),…,Y(n)}, and the desired maximum time lag used for forecasting is d. Then the first column, **p**_{X}(t), of the input matrix, **P**, and its corresponding counterpart, **t**_{X}(t), of the target vector, **T**, are:

$$\mathbf{p}_X(t) = \begin{bmatrix} X(t-d) \\ X(t-d+1) \\ \vdots \\ X(t-1) \end{bmatrix} \tag{2a}$$

$$\mathbf{t}_X(t) = X(t) \tag{2b}$$

with t = d + 1 for the first column. **p**_{Y}(t) and **t**_{Y}(t) are formed correspondingly. The whole input matrix, **P**, and target vector, **T**, will comprise a combination of inputs **p**_{X} and **p**_{Y}, and targets **t**_{X} and **t**_{Y}:

$$\mathbf{P} = \begin{bmatrix} \mathbf{p}_X(d+1) & \cdots & \mathbf{p}_X(k) & \mathbf{p}_Y(d+1) & \cdots & \mathbf{p}_Y(n) \end{bmatrix} \tag{3a}$$

$$\mathbf{T} = \begin{bmatrix} \mathbf{t}_X(d+1) & \cdots & \mathbf{t}_X(k) & \mathbf{t}_Y(d+1) & \cdots & \mathbf{t}_Y(n) \end{bmatrix} \tag{3b}$$

This procedure, called cross-sectioning (Chakraborty, Mehrotra, Mohan & Ranka 1992, Smith 1993), was done for all species to achieve a maximal set of learning examples for training the neural network.
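Cross-sectioning can be sketched in a few lines. The sketch assumes that each input column holds the d observations immediately preceding its target value; the function names are hypothetical, and **P** is stored as a list of columns.

```python
def lagged_examples(series, d):
    """Split one series into input columns of length d and their targets."""
    inputs = [series[t - d:t] for t in range(d, len(series))]
    targets = [series[t] for t in range(d, len(series))]
    return inputs, targets


def cross_section(all_series, d):
    """Pool the lagged examples of several series into one P and one T."""
    P, T = [], []
    for series in all_series:
        inputs, targets = lagged_examples(series, d)
        P.extend(inputs)
        T.extend(targets)
    return P, T


# Two short hypothetical series of unequal length, maximum time lag d = 2.
X = [1, 2, 3, 4, 5]
Y = [10, 20, 30, 40]
P, T = cross_section([X, Y], d=2)
```

A series of length k contributes k − d learning examples, so pooling several locations multiplies the amount of training data available for a single species.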

### Neural network performance

Forecasting performance of the neural network was compared to two traditional benchmarks: 1) predicting the future value to equal the present, and 2) setting the prediction equal to the mean of the observed values so far. These benchmarks are suitable here since realisations of most stochastic processes, such as observed population dynamics in time, tend to aggregate near the mean and they are also temporally correlated (Turchin 1990, Royama 1992). Thus, these benchmarks show the performance of simple and extremely conservative forecasting methods.
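The two benchmarks and the error measure used to compare them are simple enough to state directly in code. The function names below are hypothetical.

```python
def previous_value_forecast(history):
    """Benchmark 1: the next value is predicted to equal the present one."""
    return history[-1]


def mean_forecast(history):
    """Benchmark 2: the next value is predicted to equal the mean so far."""
    return sum(history) / len(history)


def sse(forecasts, observations):
    """Sum of squared forecast errors."""
    return sum((f - o) ** 2 for f, o in zip(forecasts, observations))
```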

The comparison was done so that for the grouse data, we initially optimised the time lag and learning rate of the neural network using cross-sectioned data. The optimisation was performed by seeking the time lag and learning rate which minimise the forecast error (sum of squared errors, SSE) for 25 randomly chosen data points for each parameter combination and species. Then we tested the performance of this optimised network against the benchmarks. The forecasting was done both in the optimisation and test phases by randomly removing one column from the **P** matrix (equation 3a) and its corresponding target value from the **T** vector (equation 2b), and letting the network start learning from the beginning with the earlier defined time lag, learning rate and newly constructed **P** and **T**. The obtained network parameters were then used to forecast the excluded target value by entering the corresponding input vector, **p**, without the target vector, **t**, into the network. This was repeated 100 times for each species, capercaillie, black grouse and hazel grouse. In each repeat, the SSE of the forecast error was scored, as well as the SSE of the forecast error using the benchmark forecasts.
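The test procedure described above (remove one example, retrain from the beginning, forecast the removed value, repeat) can be sketched as a generic loop. The network training itself is not reproduced here: `fit` is a hypothetical callable standing in for it, and `fit_mean` is a deliberately trivial placeholder model used only to make the sketch runnable.

```python
import random


def holdout_errors(P, T, fit, repeats, seed=0):
    """Repeat: remove one random example, refit from scratch, forecast it.

    P is a list of input columns and T the list of matching targets.
    fit(P_train, T_train) must return a function mapping one input
    column to a forecast. Returns the squared error of each repeat.
    """
    rng = random.Random(seed)
    errors = []
    for _ in range(repeats):
        k = rng.randrange(len(T))
        P_train = P[:k] + P[k + 1:]
        T_train = T[:k] + T[k + 1:]
        predict = fit(P_train, T_train)
        errors.append((predict(P[k]) - T[k]) ** 2)
    return errors


def fit_mean(P_train, T_train):
    """Placeholder model (not a neural network): forecast the target mean."""
    mean = sum(T_train) / len(T_train)
    return lambda p: mean
```

Summing the returned values gives the SSE for the model; running the same loop with the benchmark forecasts gives the figures it is compared against.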

## Results

Optimisation of the artificial neural network resulted in a learning rate of 0.007 for both data sets, and the optimal time lag for forecasting was 10 for the grouse species and 7 for the coloured fox. With these choices, the forecast error (SSE) was found to be smallest for 50 test runs. After this optimisation was completed, the network forecasting performance was tested against the two benchmarks.

Both benchmark forecasts and neural network forecast matched the observed grouse population size to a high degree (Fig. 1) as would be expected of a good forecasting method. However, the generalised neural network outperformed the benchmarks in every species (see below, binomial test: event E_{i} = ‘forecasting method i has lower SSE than the corresponding benchmark method’, H_{0}: P(E_{i}) = 0.5, H_{1}: P(E_{i}) > 0.5).
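The one-sided binomial test stated above can be computed exactly from the binomial distribution. In the sketch below, `successes` is the number of repeats in which one method had a lower SSE than the method it is compared with; the function name is hypothetical.

```python
from math import comb


def binomial_p_value(successes, trials, p0=0.5):
    """One-sided P(X >= successes) for X ~ Binomial(trials, p0)."""
    return sum(
        comb(trials, k) * p0 ** k * (1 - p0) ** (trials - k)
        for k in range(successes, trials + 1)
    )
```

A small P-value indicates that the method wins more often than would be expected if both methods were equally likely to produce the lower SSE.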

The benchmark using the previous data value turned out to yield slightly smaller forecast errors than the time series mean (Table 1).

Likewise, by optimising and training the same network structure in a similar manner with data from the coloured fox, we were able to produce a neural network that generalised rather well, resulting in successful forecast performance for the coloured fox data as well (Fig. 2).

**Table 1.** P-values of binomial tests for comparing SSE of neural network performance to two benchmark forecasts (see text for details).

## Discussion

Artificial neural networks provide a very powerful tool in cases where forecasts are required but the underlying process is unknown or only partially understood. An obvious drawback is the complexity of their usage. The most demanding phase in using artificial neural networks is the optimisation of the network training parameters: learning rate and the number of learning epochs. This is due to the non-linearity of the network, which on the other hand makes it possible to find a signal even when it is hidden in a rather noisy data set. This is why neural network forecasting is always a data-driven process without much possibility of giving general rules for optimal parameter choice (Smith 1993, Azoff 1994).

As to practical applications, the neural network methodology, like any other forecasting method, works better the more data are available. Especially important are extreme values: if they are missing in the training data, they cannot be reliably forecasted either. The coloured fox results also show that although there is a considerable amount of synchrony in the Finnish grouse data (Ranta, Lindström & Lindén 1995, Lindström, Ranta & Lindén 1996), this is not necessary in order to obtain good results with cross-sectioning of the data (see Fig. 2).

A neural network does not assume anything about the underlying process; alas, neither does it provide information about it. Herein lies its strength as well as its limitation: forecasts will not be ruined by an unfortunate choice of population model, but validation of ecological hypotheses about the population processes underlying the data is also excluded. The sole aim of the method described here has been to forecast future population numbers. However, this is a task of utmost importance in making management decisions based on limited knowledge. In those cases, an artificial neural network provides a potential tool. We would like to emphasise, however, that in any particular situation the choice of the ‘best’ method for forecasting depends on the data and the objectives of the study. In most cases it may be advisable to have a range of models at hand and choose an appropriate one among them.

## Acknowledgements

This study was funded by the Academy of Finland and the Oskar Öflund Foundation. We thank Chris Chatfield for statistical advice, Peter Hudson for comments and Nigel Yoccoz and two anonymous referees for their constructive criticism.