While the number of human cases of mosquito-borne diseases has increased in North America in the last decade, accurate modeling of mosquito population density has remained a challenge. Longitudinal mosquito trap data over the many years needed for model calibration, and validation is relatively rare. In particular, capturing the relative changes in mosquito abundance across seasons is necessary for predicting the risk of disease spread as it varies from year to year. We developed a discrete, semi-stochastic, mechanistic process-based mosquito population model that captures life-cycle egg, larva, pupa, adult stages, and diapause for Culex pipiens (Diptera, Culicidae) and Culex restuans (Diptera, Culicidae) mosquito populations. This model combines known models for development and survival into a fully connected age-structured model that can reproduce mosquito population dynamics. Mosquito development through these stages is a function of time, temperature, daylight hours, and aquatic habitat availability. The time-dependent parameters are informed by both laboratory studies and mosquito trap data from the Greater Toronto Area. The model incorporates citywide water-body gauge and precipitation data as a proxy for aquatic habitat. This approach accounts for the nonlinear interaction of temperature and aquatic habitat variability on the mosquito life stages. We demonstrate that the full model predicts the yearly variations in mosquito populations better than a statistical model using the same data sources. This improvement in modeling mosquito abundance can help guide interventions for reducing mosquito abundance in mitigating mosquito-borne diseases like West Nile virus.
Culex mosquitoes are a primary vector for West Nile virus (WNV) in the United States and Canada (Turell et al. 2005; Russell and Hunter 2012; Public Health Ontario 2013; Giordano et al. 2017, 2018). First introduced to the United States in 1999 and to Canada in 2001, WNV is a potentially fatal mosquito-borne disease (Pepperell et al. 2003, Public Health Ontario 2013, Giordano et al. 2017). Culex pipiens and Culex restuans are known to transmit WNV in North America (Ebel et al. 2005); therefore, being able to predict their abundance could provide public health professionals with a system to help anticipate and mitigate disease outbreaks.
Methods of capturing mosquito population dynamics vary greatly—including statistical, mechanistic (process-based), and hybrid approaches and various combinations of the mosquito life cycle. Ewing et al. (2016) examined the effects of temperature on mosquito populations using four delay-differential equations, which represent each stage of the mosquito life cycle. However, this study did not consider the effects of the availability of standing water on aquatic life stage progression. Similarly, Cailly et al. (2012) developed a model using two systems of ordinary differential equations based on the time of year and 10 compartments to comprise the four stages of the mosquito life cycle (Cailly et al., 2012). Other mechanistic models that use a series of ordinary or delay-differential equations were developed by Gong et al. (2007), Wang and Zou (2018), and Lou et al. (2019). Some include diapause explicitly (Gong et al. 2011, Cailly et al. 2012, Yu et al. 2018), while others model only within-season dynamics. Statistical approaches to model mosquito populations include, but are not limited to, generalized linear models (GLM) (Wang et al. 2011), site-specific generalized linear mixed models (Yoo 2014), harmonic analysis, and mixed-effects models (Yoo et al. 2016). The predictors in these studies include temperature, precipitation, elevation, remote-sensing indices, and land use.
Culex mosquito development is dependent on surrounding temperature (Tachiiri et al. 2006; Ruiz et al. 2010; Chuang et al. 2012; Yoo 2014; Danforth et al. 2016). Therefore, many mosquito prediction studies have focused on temperature-dependent approaches (Otero et al. 2006, Lana et al. 2011, Ewing et al. 2016, Wang and Zou 2018, Yu et al. 2018). While temperature can capture seasonal trends well, studies have concluded that additional factors must be considered to accurately capture the fluctuation in mosquito abundance over the seasonal trend (Lana et al. 2011, Ewing et al. 2016). The availability of still water has a more significant impact on the development of eggs, larva, and pupa than on adults (Hamer et al. 2011, Yoo et al. 2016). Recent studies have addressed this by including temperature and precipitation to better capture the year-to-year variation in mosquito abundance (Ahumada et al. 2004, Morin and Comrie 2010, Wang et al. 2011, Yoo 2014, Yoo et al. 2016, Valdez et al. 2017). However, these studies have alluded to the fact that assessment of the influence of different rainfall regimens on mosquito populations needs further examination as the rainfall linkage to mosquito habitats, in particular the larval habitat, could depend on factors such as slope, river routing, and availability of potential habitats for mosquitoes. Some have tackled this by using a lag in precipitation measurements or simulated rainfall to produce more accurate results (Shone et al. 2014, Valdez et al. 2017, Ratti and Wallace 2020, Ratti et al. 2022). We hypothesize that including water gauge measurements in addition to temperature will improve predictions over temperature and precipitation (see Supp Fig. S1 (tjac127_suppl_supplementary_material.docx) [online only]).
Models such as CIMSiM (Focks et al. 1993) and Skeeter-Buster (Magori et al. 2009) tackle the variation in mosquito abundance through detailed modeling of each life stage, using all available data on mosquito biology and environmental conditions to minutely calculate the numbers of mosquitoes at each age stage. In the case of Skeeter-Buster, spatial heterogeneity, stochastic effects, and the genetic makeup of the population add additional complexity in order to capture mosquito dynamics. Similar to these two models, we aim to model each life stage using available information on development and survival. However, we aim to simplify this process to reduce run time and parameter space in order to use this model at larger spatial scales, and treating each life stage in aggregate so that the model can be adapted to multiple locations and species with only slight alterations. In other words, our model falls between the very detailed agent-based models and the lower fidelity differential equation population-level models in order to capture local impacts on mosquito life stages while retaining the ability to scale up and run the model quickly.
Since we are using data from the Greater Toronto Area (GTA) to parameterize and validate our model, we briefly summarize current work specific to the GTA. The most recent studies examining mosquito abundance within the GTA have focused on the Peel Region within the GTA (Wang et al. 2011, Yoo et al. 2016, Yu et al. 2018). Wang et al. (2011) analyzed the association between a gamma-distribution model of mosquito populations with temperature and precipitation in a generalized linear model. They concluded that dynamical equations should investigate other meteorological factors plus all phases of the mosquito life cycle to capture interactive effects between environment and mosquito abundance (Wang et al. 2011). Yoo et al. (2016) combined a mixed-effects model with a harmonic analysis of temperature and precipitation to examine the association among land use, population density, elevation, spatial patterns, and mosquito abundance (Yoo et al. 2016). The authors concluded that this approach fails to capture dynamic interactions between the mosquito life cycle and environmental variables over time. Recently, Yu et al. (2018) exploited a temperature-dependent response function for aquatic and adult life stages over a single season (Yu et al. 2018). Their model used temperature alone to predict mosquito life cycle, and the authors concluded that ‘additional variables needed to be considered to account for the year to year variability in weather’.
Studies in the GTA have also focused on precipitation as a proxy for water habitat availability (Trawinski and Mackay 2008, Wang et al. 2011, Yoo et al. 2016). They concluded that inclusion of precipitation better informs mosquito abundance model predictions as impacted by available aquatic habitat (Shone et al. 2006, Valdez et al. 2017). However, standing water does not necessarily correlate linearly with precipitation since the amount of flooding caused by a given rainfall volume depends on city infrastructure (Butler et al. 2018), terrain conditions, and watershed characteristics (James 1972).
We developed a mechanistic discrete, semi-stochastic, process-based modeling approach for predicting mosquito abundance that incorporates daily municipal water station measurements, daylight hours, and temperature to predict Culex pipiens/restuans mosquito populations (Fig. 2). The process-based model (PBM) model accounts directly for the mosquito life cycle and development through life stages is influenced by environmental variables. This approach extends previous models by combining water gauge levels with laboratory and field data in a mechanistic model. This mechanistic model is a synthesis of various models that describe the relationship between environmental conditions and development/survival of mosquitoes at each life stage. To appropriately parameterize the model for Culex pipiens/restuans mosquitoes, we used location and species specific data on lifespan and development rates as dependent on temperature for the entire life cycle. By explicitly accounting for the life stages of mosquitoes, the PBM can assess the impact of temperature, precipitation, and water resource management approaches on seasonal mosquito populations. As a mechanistic model, it can also be adapted to include and compare mitigation scenarios. It can be generalized to other locales and mosquito species with appropriate parameterization. To investigate the accuracy of the process-based model, we ran a linear regression model with Gaussian errors to a log-transformed response to compare a statistical model with our process-based model and underscore the need to include the mosquito developmental processes driven by environmental conditions.
Methods
Modeling Approach
We model mosquito abundance in the GTA using two approaches: a mechanistic process-based model (PBM) and a statistical model. The PBM is a simulation model that incorporates the dependencies of mosquito abundance on exogenous variables mechanistically using time- and temperature-dependent equations. It explicitly tracks the number and age of mosquitoes in each life stage through time (see Fig. 1). The ‘statistical’ model is based exclusively on the correlation between mosquito trap data and the environmental data feeds using a generalized linear model. We then compare our PBM predictions with the observed mosquito trap data for the GTA and with the linear statistical model. We also compare PBM predictions when using stage gauge (water-level) data versus precipitation data to evaluate the efficacy of the two different data sources as indicators of water availability in the model. All analysis and modeling was done using R (v4.0.0; R Core Team 2020).
To compare the models quantitatively, we computed standard error metrics, the root mean squared error (RMSE), the mean absolute error (MAE), and the R Pearson correlation coefficient, over the training set, testing set, and the entire dataset as a whole. When trap data are used to evaluate the predictions made by a model, accuracy is most frequently measured using root mean square error (Yu et al. 2018), or correlation coefficients (Shaman et al. 2002), thus motivating our inclusion of those metrics in our analysis. Any error metric selected will be biased based on its underlying objective function, thus using multiple error metrics to evaluate performance allows multiple views. Multiple metrics also allow for researchers to more easily compare across models, many of which use a subset of our chosen metrics. We also evaluated the differences between predicted and observed peak number of mosquitoes since our new model was motivated by better capturing year-to-year differences in the mosquito population. As our trap data are observational, comparing alternative measures like the peak number of mosquitoes, can add insight into performance and quality of the model output.
Data Sources The mosquito trap data for the GTA were obtained from Public Health Ontario's WNV mosquito database, where public health units trap female mosquitoes every week in order to train and test both models. These data are available upon request and approval from the Public Health Ontario. A total of 115,338 records were collected over 16 yr between 2002 and 2017 from 2,722 trap sites. The dates for the observation data start on Jun 6, 2004, and end on Sept 27, 2017. In the study region, the mosquito season lasts about 17 wk between late May and October (weeks 24–40). Our mosquito trap observation data include not only mosquitoes commonly known for carrying WNV such as C. pipiens and C. restuans but also mosquitoes from other genera such as Aedes and Ochlerotatus. We only considered total counts of combined C. pipiens and C. restuans for our model. Public Health Ontario identifiers grouped both species as a single entry, C. pipiens/restuans to speed up identification, thus reducing cost and effort. Similarly, we filter the data set further to only include those trap sites with lat/long coordinates within the bounds of the GTA. While mosquitoes are not explicitly sorted by sex in the data, we assume that mosquitoes counted in trap data are active female mosquitoes, as most traps (85%) used in our data set are light traps (LT). We computed the average number of C. pipiens/restuans mosquitoes per trap per day. The average number of mosquitoes was calculated as the sum of the total number of mosquito counts recorded on a particular day and then divided by the number of traps sites that had observations for that day.
Both models also employ three types of temporally varying exogenous data to link mosquito abundance to the concurrent environmental conditions, 1) daylight hours, 2) water availability (precipitation or water levels), and 3) temperature. Daylight hours were calculated based on the day of the year and the latitude of Toronto using the CBM model with the US government standard day length definition (Forsythe et al. 1995). Temperature and precipitation data were collected using the R package ‘rclimateca’ (Dunnington 2019), which fetches data from the Environment Canada climate archives. Water levels for the riverways and lakes of the GTA were collected from the Rpackage ‘tidyhydat’ (Albers 2017), which accesses historical and real-time national ‘hydrometric’ stage gauge data from Water Survey of Canada data sources. Information about the boundaries and definitions of the Ontario watersheds were gathered from Land Information Ontario website, last revised on April 1, 2010. Figure 2 shows a map overlaying the mosquito trap sites and the water station locations.
The Process-based Model
Research within laboratory settings has informed understanding of mosquito development and how it depends on temperature and environmental factors (see Fig. 1). While these are performed in controlled rather than natural settings, we use the mathematical relationships between temperature and mosquito life cycle parameters determined by lab studies within the PBM and adjust for the time-varying environmental inputs in the wild. The PBM incorporates different development rates and death rates for eggs, larvae and pupae, and adult mosquitoes. We describe in the subsequent subsections the algorithm for calculating development progression in and out of the life stages. The final output of the algorithm is a daily prediction for the abundance of the average number of host-seeking female mosquitoes found in a single trap over 13 yr of data. The model is inspired by a partial differential equation approach where mosquitoes develop through life stages via lab-informed kinetics equations that describe the effect of environmental impacts of water availability and temperature on mortality and developmental velocity. These differential equations are discretized and so that the previous day's population distribution can first be affected by environmental mortality effects and then adjusted to update the age distribution of all stages according to the developmental velocity. Throughout we will refer to daylight hours as DL, average daily temperature in Celsius as Tc and normalized average water availability as H2O.
Egg Development We use the Eyring equation (Eyring, 1935) to model the developmental progression of mosquito eggs to larva. We assume that eggs do not compete for nutrients but that the development rate, vE(t), is based on the ambient temperature and time. Thus we use the daily average temperature as the input value for the Eyring equation:
where Ψegg and –AEegg are fit using non-linear least squares of the Erying equation from laboratory and field studies (Madder et al. 1983). This fit is shown in Fig. 3. R is the ideal gas constant of proportionality that relates the energy scale in physics to the temperature scale. We impose the restriction that if the daily temperature feed is below 10.02°C, then vE(t) = 0 (Madder et al. 1983, Tachiiri et al. 2006). Additionally, if the temperature drops below freezing, the environment becomes too inhospitable for the aquatic stages, and all eggs in the model die.
The rate of increase in the egg population depends on the number of adult female mosquitoes calculated to be transitioning from one gonotrophic cycle to the next (and not in diapause).
Larva and Pupa Development We combine the larva and pupa stages into one group and model the rate of development through this stage via the Briere equation (Briere et al. 1999). This model does not consider these two stages separately, nor do we consider instars, as we found additional age stages did not significantly improve the fit to justify the additional complexity and subsequent parameters. We fit this function to identified laboratory and field observations in larva and pupa development in order to extrapolate them to depend on environmental real-time data:
where the values of ΘT, Tmin, and Tmax are derived based on laboratory and field data (Madder et al. 1983) fit to the Briere equation. Tmin and Tmax are understood to be lower and upper bounds on temperature for which larva and pupa can develop, based on the fit to data. Additional data and observations would alter these values. This fit is shown in Fig. 3. The daily temperature data feed, Tc, is in degrees Celsius. We impose the same restriction, halting development, as we did for the eggs when the temperature drops below 10.02°C.
We diverge from our previous assumption of neglecting competition to include a density-dependent death rate for the larva/pupa stage, modeled by applying an approximation to a quadratic loss differential equation as in Ratti and Wallace (2020) and Wallace et al. (2017) to the combined larvae/pupae population (denoted as LP), dLP/dt = –δ * (LP)2 and delta is calculated as
The parameters α1 and α2 are fitted to the observed trap data. Note that the rate of change, δ, used in the quadratic loss function depends on aquatic habitat availability H2O, incorporating either precipitation or water stage gauge levels as a proxy. This observation underscores the fact that it is in standing water that Culex pipiens/ restuans larva and pupa thrive. This model has an age-structured population so we diverge from this simple differential equation for density-dependent death, by approximating density dependence based on the sum of all individuals in the larvae/pupae age stage. The precise discrete approximation can be found in Supplementary material (tjac127_suppl_supplementary_material.docx). In addition to accounting for competition for resources, it has been established that prey density can serve as a proxy for predation (Edwards and Brindley 1996, 1999). As this part of the model is fit to our trap data rather than laboratory/field data, the fitted parameters will necessarily account for mortality impacts beyond competition. As the larval and pupal stages are also aquatic, freezing events cause all members of this stage to die in the model. Outside of this condition, we do not incorporate additional temperature-dependent environmental mortality as a simplifying assumption, although it has been shown that pupal and larval survival rates decline significantly below 15°C (Shelton 1973).
Adult Development The adult mosquito life stages are identified by the number of times the average mosquito goes through a gonotrophic cycle. At the end of each gonotrophic cycle, the female mosquitoes seek to lay eggs at or near water. Thus the number of newly laid eggs depends upon the number of female mosquitoes ovipositing at any given time. We will first focus on the method for which the developmental rate of adult mosquitoes, vAD(t), is calculated. Adult mosquitoes may not transition linearly through all four gonotrophic cycles (Samarawickrema 1967). There is a significant variation in wild adult mosquito lifespan, so we incorporate this variability through a random variable, M. We follow closely the method of age distributions used to inform age progression as described in Goodsman et al. (2018). We diverge slightly from Goodsman et al. in that a Gamma distribution is used for the rate of development of the adult mosquito population, i.e., M ∼ Γ(vAD(t), 1). The gamma distribution, Γ(α, β)has an expected value of α/β. Thus, the expected value of the random variable used to model the rate of development M is vAD(t). It is the calculation of the value of vAD(t) which exploits the rate dependence upon temperature
where AD1 and AD2 are derived from fits to laboratory and field data (Ciota et al. 2014) (see Fig. 3), and Tc is the data feed of the daily average temperature in degrees Celsius. This will adjust the shape of the gamma distribution based on temperature while at the same time maintaining the observed behavior that extremely high or low temperatures yield slower development rates or may result in early death. We impose the same restriction, halting adult development, as for the eggs and larvae/pupae, when the temperature drops below 10.02°C.
Adult mosquitoes are assumed to have a temperature-dependent death rate. We model the death rate using an exponential decay differential equation dA/dt = –δADA, where the rate of change within this equation is the Eyring equation (Sharpe and DeMichele 1977). For the adult application of the Eyring equation, we use the following form:
where Ψadult and –AEadult are derived using non-linear least squares of the Erying equation from laboratory and field studies (Madder et al. 1983) (see Fig. 3); R is the ideal gas constant; and Tc is the average daily temperature data feed in Celsius.
Diapause is modeled through a logistic regression, which relates the probability of a mosquito being in diapause to the number of daylight hours. Studies have shown that diapause induction and termination depend on a combination of temperature and daylight hours, although the precise dynamics are difficult to measure (Nelms et al. 2013). As the cycles of temperature and daylight hours mirror each other, we will assume the proportion of mosquitoes in diapause is dependent only on daylight hours, to limit the number of fitted parameters. We use the logistic form to impose a cyclical diapause behavior dependent on daylight hours in lieu of a laboratory-derived function. The following formula yields the resulting proportion of adult mosquitoes in a given time step which are now in diapause:
Where β0 and β1 are fitted from the observed trap data, and DL is the number of daylight hours per day-step as recorded in the data feed for the GTA. We impose β0 < 0 in order to force a larger proportion (at least 50%) of mosquitoes not in diapause when the days are longer. Functionally, the mosquitoes that are in diapause do not lay eggs or progress developmentally in the model.
Eggs are added to the first age of the egg domain at each time step by adult oviposition using the following formula:
where Ovirate is fitted to the observed trap data and gi is the total number of adult female mosquitoes transitioning from adult gonotrophic stage i to i + 1for i = 1, . . ., 4. The calculation of diapprob thus forces the oviposition totals to be informed by the maximum average number of daylight hours per day. Note that Ovirate will diverge from observed oviposition rates (Madder et al. 1983), as the model considers these the eggs that will progress to the larval/pupal stage barring a freezing event, not the total eggs or egg rafts laid by each adult. The mosquito abundance prediction per day is the sum of the total calculated active mosquitoes. We define active mosquitoes as female mosquitoes that have transitioned from one gonotrophic cycle to the next. These active mosquitoes are actively looking for an oviposition location, e.g., (g1 + g2 + g3 + g4) for each day.
For detailed tracking of the total number of mosquitoes on any given day, each stage (eggs, larva-pupa, and four adult female stages) is split into 100 evenly spaced compartments through which individuals move based on development rates as described above (a discretization of time and development which is heuristically similar to solving a partial differential equation across time and developmental age). Individuals at the end of the compartments are moved up to the next stage when development rates push them past the end of their current stage. See Fig. 1 for a visual overview of the simulation process. Additionally, the precise discrete time/age equations used in the simulation process can be found in Supplementary material (tjac127_suppl_supplementary_material.docx).
Parameters We divided the PBM model's 15 parameters into two groups: parameters that we treat as fixed (9) and those that we fit during the process of training the model (6). The nine fixed parameters are derived from two laboratory and field studies on life history traits of C. pipiens/restuans mosquitoes (Madder et al. 1983, Ciota et al. 2014). Each set of data on developmental velocities and longevity were fit to a linear, Eyring (Eyring 1935) and Briere (Briere et al. 1999) model and the best one selected based on the Akaike information criterion (Sakamoto et al. 1986). These fixed parameter values can be found in Table 1 and the fits to the selected model can be seen in Fig. 3. The remaining six parameters are fit as a part of the training process of the model. These parameters did not have clear biological and data analogs as they incorporated various overlapping mechanisms observed in nature into a single parameter or a single function. The model is trained on the first nine years (405 observations) of observed trap data and tested on the final five years of the time series (179 observations). We employ a nonlinear least squares approach using the Nelder–Mead minimization algorithm (Nelder and Mead 1965) in the optim function included in base R to find the parameter estimates that minimize the sum of squared differences between the modeled active (females seeking a blood meal) population trajectory and the observed trap data. The parameters that result from the fitting process using both the average normalized precipitation and water station level data can be found in Table 2.
Table 2.
Description of parameters within the PBM model which were fitted using the Nelder–Mead Algorithm, constraining β0 < 0, to enforce biologically reasonable diapause behavior either using the Water Station Level Data Feed, Water Level or using the Precipitation Data Feed, Precip.
The Statistical Model
In light of the importance of environmental variables in calculating mosquito abundance, we investigated these data streams without a mechanistic model to see whether the data are sufficient to determine the fluctuations in the abundance of mosquitoes. Using a linear model, we aimed to describe the relationships between the mosquito trap data and our environmental variables (temperature, daylight hours, and precipitation/water levels).
This method required setting any observed mosquito averages that were equal to zero to some small value E > 0 as the analysis was performed on a log scale. For this data set, there was only one such observation recorded. In the event of a higher proportion of observed zeroes zero-inflated regression methods could be implemented (Zuur and Ieno 2016). This linear model was trained in R using glm and the same training data as the PBM.
Results
The output of both models using the parameters given in Tables 1 and 2 for the PBM and Table 3 for the statistical models is displayed in Figs. 4 and 5, respectively. The PBM with either water or precipitation inputs uniformly outperformed its statistical counterpart along all error metrics. For example, the PBM trained on normalized averaged water station levels had an RMSE of 4.706 as compared to 6.574 for the statistical model with the same data feed across the complete data set. The PBM with normalized averaged precipitation did not perform nearly as well, as evidenced by lower RMSE for the PBM with water stations across all data splits, but with an RMSE of 5.278 over the complete data set it still outperformed the analogous statistical model which had an RMSE of 6.527. This was true of the training and testing data splits. The results from MAE, which does not penalize for large errors nearly as much as RMSE, displays similar results. Finally, the R Pearson correlation values for the PBM are all above 0.6 while all correlation values for the statistical model are under 0.4. This comparison supports the observation that a process-based model capturing mosquito population dynamics provides predictive value above and beyond a standard exclusively data-driven approach. Table 4 summarizes the three standard error metrics for the two model types and the two water availability inputs.
Table 1.
Description of parameters within the PBM model which were derived from published laboratory findings on the effect of temperature on life history of Culex pipiens/restuans mosquitoes (Madder et al., 1983; Ciota et al., 2014). Plots of the fits can be found in Fig. 3.
Table 3.
Linear model coefficients for the two statistical models trained using daylight hours, temperature, and the two water feeds Precipitation and Water Station Levels.
In terms of the error metrics, the performance of the PBM with water station levels and precipitation is similar, although the errors for the water station data are lower across all three data splits. The only metric for which the precipitation data out-performed the water station data is the correlation for the test data set, 0.701 versus 0.687. The year with the best performance for both data feeds in training is 2005, with root mean squared error, relative to the number of observed mosquitoes that year, of 0.00679 for water station data and 0.00740 for the precipiation data. The worst performing testing year is 2009 for both data feeds (RRMSE: 0.01708 [Water Levels], 0.02126 [Precip]), where there are clearly two outlier observations that the model did not capture. For the testing years, 2016 is the best performing (RRMSE: 0.01163 [Water Levels], 0.01003 [Precip]), while 2015 and 2013 are the worst performing years for water station precipitation data, respectively (RRMSE: 0.01537 [Water Levels], 0.01710 [Precip]). Of note is that 2017, which has the highest peak of all years tested is among the best of the testing years for the model driven by water station data, while it is among the worst performing years for the model based on precipitation data. While these differences exist, numerically the differences between the values are quiet small year to year, making it difficult to determine the best overall model. For further distinction between the two data feeds and their predictive power in the PBM, we can turn to more alternative measures like the difference between the modeled peak and the observed peak. Table 5 contains the differences in the peak predicted by the models for the 5 yr of testing data.
The PBM with water station-level data reproduces the inter-annual variation as observed in the trap data. In particular, we see in Fig. 4 that it better captures the peak behavior of four of the 5 yr in the testing set. It averages a difference of 3.16 mosquitoes per trap at the peak compared to 12.09 for the PBM with precipitation data. This is reflected quantitatively in Table 5 where the PBM with water-level data has the smallest difference in magnitude of the peaks from the data for all testing years besides 2016. In fact, the only instance of the statistical model out-performing the PBM is in 2016, where the precipitation informed statistical model had the lowest magnitude difference in the peak. The predicted abundance of the PBM with water station-level data indicates that it has a demonstrable effect in capturing the magnitude of the observed mosquito abundance data, especially in flooding years, e.g., 2013 and 2017.
Table 4.
Comparison of the errors for the PBM and the statistical GLM built using either water level or precipitation measurements. For the purposes of comparing to other published models, relative errors can be obtained by dividing the above values of RMSE and MAE by 5,409.224.
Figure 6 shows the total egg, larvae/pupae, and adult populations alongside the active mosquito population that is used to train the model. The ranges for the total egg, total larvae/pupae, and adult population are significantly larger, as the active mosquitoes is a small subset of the adult population that is transitioning between gonotrophic stages. The eggs and larvae/pupae stages closely mirror the dynamics of the active (egg laying) population; however, the dynamics of the total population reflects more complex behavior, with adult mosquitoes coming to an equilibrium at the beginning of winter, and dying off as temperatures drop.
Discussion and Conclusion
Our model was parameterized for and fit to C. pipiens/restuans trap data from the GTA, and shows considerable predictive value in the test data set over a standard statistical approach, as it has lower errors and is better able to capture the inter-annual variation in peak mosquito observations. Our model has further advantage over the basic statistical framework because we are able to generate population estimates for all the life stages modeled using our mechanistic approach. The PBM can use various environmental data streams to drive the dynamics of the model and in particular we investigated the efficacy of two different sources of hydrological data, precipitation and water gauge data. We found that both can capture the seasonal dynamics of mosquito abundance; however, the water station gauge measurements capture the amplitudes of the peak number of mosquitoes better than precipitation measurements.
Table 5.
The difference between observed peak mosquito average and the predicted peak values of the corresponding season based on the use of the water data feed versus the precipitation data feed in the PBM and the Statistical GLM are listed as absolute values for the test data years. The smallest difference between the four models is highlighted in green. The PBM model with water gauge data out-performs both the statistical models as well as the PBM with precipitation data 4 out of 5 yr.
Though temperature and precipitation levels, when used as drivers or predictors in mosquito population models, provide an excellent correlation to observed populations, we observed that model accuracy for peak mosquito abundance can be improved by adding water station levels as a proxy for water habitat availability. Because they are a function of absorption and runoff processes on the ground, water levels measured at hydrological stations may be more reliable representations of standing water availability than precipitation measurements, which are known to be highly spatially variable (Smith et al. 2004). Over-bank flow, as measured by river water gauges, is a direct quantification of flooding. Combining flood-level information with temperature and daylight hours drivers better predicted the abnormally high mosquito population years than a predictive model that incorporated just the latter two variables. The water station levels also appear to incorporate the lag between mosquito populations and aquatic habitat availability. This may be because precipitation measures are taken during rainfall. In contrast, the water station levels are taken at locations where rainwater flows and collects in the following hours and days. Thus water stage gauge data could better inform standing water needed for C. pipiens/restuans aquatic life stages.
The optimal combination of life cycle attributes, data sources, and methods that best captures changes in the year-to-year abundance of mosquito populations is still unclear. For this study, we used measurements of water in natural habitats (precipitation and water gauge data) as a proxy for aquatic habitat availability to approximate the dynamics of C. pipiens/restuans mosquitoes in an urban/ suburban environment. Some mosquito species breed primarily in human created aquatic sites. Applications of this model to other mosquito species or locations may require more explicit modeling of this behavior to accurately capture mosquito aquatic dynamics. The complexity of the inter-dependence of the mosquito life cycle with environmental factors has led to the wide variety of models with variation in accuracy and assessment of the usefulness of different data streams cited here. In terms of data streams used, data processing, and methods for calibration and validation, our models are comparable to existing research. Temperature is used as the dominant climatic variable in most models, and precipitation appears almost as frequently. When climate data are collected from multiple stations within the region of interest, daily averages are often used for both the environmental data and mosquito counts, which we mirrored in our study. Some existing models include non-climatic factors such as diapause or varying forms of density-dependent competition (Ahumada et al. 2004, Watanabe et al. 2017, Lou et al. 2019). As outlined in the introduction and above, it is difficult to capture the high year-to-year variation in abundance of C. pipiens/restuans mosquito populations with temperature alone. We agree with Otero et al. (2006) that incorporating age structure in a mechanistic model is an important aspect of capturing the year-to-year variation.
Our model controls the progression from one life stage to the next independently to reflect actual mosquito development and can guide control measures and account for weather conditions that differ from the previously observed input. It has the advantage of tracking every stage of the mosquito life cycle from eggs through host-seeking females, which is important in determining how many adult females are active in a given period for pathogen spread. Although statistical models can identify the significance of environmental factors related to mosquito abundance, they do not include mechanistic determinants of how the factors influence abundance biologically. Mechanistic models can account for the interactions among the environmental parameters and mosquito abundance. Furthermore, the mechanistic approach generates additional information outside of a quality of fit. The PBM can generate daily population estimates for all modeled age stages, not just the population that is matched to data as we saw in Fig. 6. Thus, the PBM model provides more insight into these populations that could lead to targeted mitigation and interventions, broader modeling applications, and the ability to flexibly fit to multiple types of data if available. Somewhat unusually, the PBM performs better than a standard statistical model in predicting mosquito abundance. Qualitatively, all four models match the timing of the seasons present in the data. However, the clearest difference between the models can be seen in the magnitude of the modeled peaks. The mechanistic PBM better captures year-to-year variation in the peak of the mosquito season compared to the purely statistical model. This is likely because there are nonlinear and lagged interactions between mosquito population dynamics and the exogenous variables we use to predict them that are difficult for statistical models to capture.
The PBM model does not require new initial conditions every year, instead explicitly modeling diapause triggered by daylight hours and weather inputs to determine emergence and densities early in the year. This reduces the number of parameters that need to be fitted to data and allows the model to be run in perpetuity. We ran the model for 13 yr across the entire GTA with no re-initialization, instead of being restricted to a single year at a time. As a result of this, the PBM does produce more noisy behavior at the beginning of the season than the statistical model does. To account for stochasticity, particularly at the beginning and end of seasons, the models uses a gamma distribution to stochastically model the rate of development of the adult mosquito population. This stochastic behavior interacts with the changing diapause behavior as the days lengthen, causing the model to output the low levels of active mosquitoes early in the season. We currently have no way to validate this behavior as mosquito trapping in Ontario begins in June while this behavior is observed in the model in April and May each year.
Our PBM could be particularly useful when applied to the effects of climate change on mosquito-borne disease risk. Since the availability of standing water in cities is directly impacted by precipitation, it is essential for local agencies to be able to anticipate how increases in precipitation and flooding, along with temperature change, will impact risk of WNV in the future effectively. Climate change has already been shown to cause more extreme weather patterns, which could lead to increased rainfall, flooding, and soil moisture in temperate regions like Toronto (Pyke et al. 2011, Kienzle et al. 2012). Knowing the fluctuation in the size of mosquito populations during peak seasons of activity will establish a basis for the severity of public risk for contraction of the WNV and other mosquito-borne diseases.
Water levels within an urban area will almost certainly depend not only on weather but also on water management strategies. Municipalities often have complex systems for managing stormwater as well as infrastructure for modulating water levels in municipal rivers and waterways (Villarreal et al. 2004, Perez-Pedini et al. 2005, Pitt and Clark 2008, Tingsanchali 2012). Stormwater management affects the availability of breeding habitats for mosquitoes in urban settings and can impact the flushing of mosquito populations (Water of 2005, Bélanger 2008). The GTA focuses its water management strategy around local water management, working to divert water within regional sub-basins (Bélanger 2008, Di Gironimo et al. 2013, Toronto and Authority 2019). Previous stormwater management reports have noted that this strategy can cause localized flooding events (Di Gironimo et al. 2013). 2013 was an extreme flood year in Toronto. Based on our results, the process-based model does better than the linear model of forecasting mosquito responses to this ‘extreme’ event. In future work, it will be important to consider other areas with varying water management strategies to see whether the importance of water station data holds. Thus, water levels may not be a better predictor than precipitation in some regions. That was a motivation for developing the model to use either precipitation or water-level data.
One limitation of this model is that as a result of being trained on averages per trap per day, all populations will necessarily be average population sizes per trap area per day. To obtain total population sizes for the entire study area, one must scale by the number of traps. Many traps go in and out of commission, and it is not clear from the data which traps are live on a given day besides the ones that report mosquitoes that day. We plan in future work to develop processes by which population sizes over a region or study area can be adjusted to account for numbers of traps on a given day.
Our process-based discrete, semi-stochastic model is also limited in that we do not have the mathematical theory to rigorously identify equilibria or quantify uncertainty outside of parameter sensitivity as we would with a differential equations model. The focus of this model, however, was not a mathematical analysis but a novel method for fusing environmental data along with a laboratory-based understanding of the progression of C. pipiens/restuans life stages for creating a model which could replicate field observations of mosquito abundance. In particular, we wanted to capture better the year-to-year variation in abundance resulting from flooding or other environmental drivers. Our model has demonstrated that data alone are not as informative as the fusion of data and developmental dynamics. We have also highlighted that the type of data streams used matters. The use of water stage gauge measurements resulted in a more accurate prediction of the magnitude of population size throughout the years, particularly in flood years. We plan to do future research on testing the model at additional locations across North America, adapting to other mosquito species, and coupling the mosquito dynamics explicit to models for human case counts of mosquito-borne diseases.
Acknowledgments
Support for DS, DG, KM, CM, CX, and JC provided by LDRD grants at Los Alamos National Laboratory. Support for JMH was from NSF Award #1563531. We thank the 36 local public health units and the service providers that trapped, counted, and recorded mosquitoes in the GTA and insightful conversations with them. Research presented in this article was supported by the Laboratory Directed Research and Development program of Los Alamos National Laboratory under project number 20190581ECR and 20210062DR. The U.S. Department of Energy supported this work through the Los Alamos National Laboratory. Los Alamos National Laboratory is operated by Triad National Security, LLC, for the National Nuclear Security Administration of U.S. Department of Energy (Contract No. 89233218CNA000001).