Analysing the recolonisation of a highly fragmented landscape by wild boar using a landscape genetic approach

Wild boar are currently one of the most widespread mammals of the world and in many regions populations keep expanding. In Flanders (Belgium), the wild boar has returned since 2006 after almost half a century of absence and numbers are increasing fast. The Flemish landscape is severely fragmented and is one of the most densely populated areas in the world. Understanding the relationship between landscape structures and species biology is the basis of landscape ecology and increases the understanding of factors driving habitat use, recolonisation and expansion. We conducted a landscape genetics study to identify factors driving wild boar expansion in Flanders. A total of 838 DNA-samples collected from the wild boar hunting bag between 2007 and 2016 were genotyped for 140 single nucleotide polymorphisms (SNPs). We show that the wild boar population expansion started from two local gene pools while staying relatively genetically distinct, though with some admixture. A third gene pool emerged around 2013 in the northwest coming from the Netherlands and Germany. The landscape genetic analysis revealed that the main factors explaining the spatial genetic pattern are isolation by distance and forest cover which influenced gene flow positively. Forest fragmentation had no significant effect on genetic distances. As human–wildlife conflicts are increasing in line with wild boars' expanding distribution range, understanding factors driving expansion during recolonisation is essential for assessing the future dispersal of wild boar in Flanders. With a better insight in future dispersal, it will be possible to conduct risk assessments which target more efficient management actions to limit human–wildlife conflicts.

The relationship between landscape structures and the biology of a species forms the basis of landscape ecology (Coulon et al. 2006). In this matter, landscape genetics has become an important tool to increase our understanding of landscape elements that affect diffusion. How a landscape is perceived by a certain species does not necessarily match human presumptions of landscape connectivity or habitat quality (Cushman et al. 2006). Understanding the interaction between geographical/environmental features and genetic variation can reveal discontinuities in the landscape such as barriers, the extent of landscape connectivity (Balkenhol et al. 2013, Parks et al. 2015, Kierepa and Latch 2016, Norman et al. 2016, Villemey et al. 2016, Cox et al. 2017, aid understanding of colonisation and expansion processes of invasive species (Fischer et al. 2017) and delineate management units for species management (e.g. ungulates; Coulon et al. 2006, Frantz et al. 2009). These applications have all used landscape genetic approaches (Manel et al. 2003, Holderegger andWagner 2008).
Knowledge about landscape elements that affect dispersal is essential for developing effective management strategies of fast expanding species (Scandura et al. 2008, Nikolov et al. 2009, Frantz et al. 2010, Fischer et al. 2017. One such fast expanding species is the wild boar Sus scrofa L. Since the 1960s wild boar populations have been expanding throughout Europe (Saez-Royuela and Telleria 1986, Acevedo et al. 2007, Massei et al. 2015) and much of their invasive distribution range (New Zealand: Bengsen et al. 2018;Australia: Choquenot et al. 1996;North America: Mayer 2018, McClure et al. 2015South America: Salvador and Fernandez 2018). This makes the wild boar currently one of the most widespread mammals in the world found in all continents apart from Antarctica (Keuling et al. 2018).
Flanders (northern Belgium) is one of the most densely populated areas in Europe, with a human population density of 462 persons per km 2 (report Linell et al. 2001, FOD Economie 2011). An intense interweaving of small natural areas, forest remnants and agricultural areas, interspersed by a dense road network (5.2 km km −2 ) results in a highly fragmented landscape. Wild boar populations have been present and/or expanding their distribution area during the last decades in all neighbouring regions and countries around Flanders (Sodeikat and Pohlmeyer 2003, Widar 2011, Carnis and Facchini 2012, Jansman et al. 2013, Massei et al. 2015, Morelle et al. 2016. After an absence of more than fifty years, wild boar reappeared in Flanders in 2006 at two geographically distinct locations from where range expansion within Flanders began. There is no official information on where these population originated from and these populations were geographically not connected to neighbouring populations. Despite this highly fragmented landscape, wild boar numbers in Flanders are increasing rapidly (Scheppers et al. 2014). At the same time, the anthropogenic landscape causes frequent contacts between wild boar and human activities resulting in wildlife-human impacts and emerging human-wildlife conflicts (Young et al. 2010).
As wild boar numbers and human-wildlife conflicts increase in Flanders (Rutten et al. 2018), a risk assessment is urgently needed. Essential in a risk assessment are future dispersal predictions, its consequences and effective management strategies. Effective strategies to prevent wild boar impacts should be based on the understanding of factors driving population increase and colonisation of new areas by wild boar (Massei et al. 2015, Veličković et al. 2016. Reforestation, agricultural intensification and climate change have been found to be the main drivers of wild boar population growth (Saez-Royuela andTelleria 1986, Massei et al. 2015). However, patterns related to range expansions are poorly understood (Morelle et al. 2016). Forest coverage has recently been found to influence wild boar dispersal (Morelle et al. 2016) but the influence of landscape fragmentation, characterizing the Flemish landscape, is currently unknown.
In this study we assess the recolonisation and expansion processes of wild boar in Flanders using a landscape genetic analysis. We aim to increase our understanding of genetic connectivity in a severely fragmented landscape. We analysed a set of DNA samples, collected from the Flemish hunting bag since the beginning of the reappearance of wild boars, to 1) assess the evolution in genetic population structure during recolonisation and 2) understand the role of landscape connectivity throughout recolonisation. We expected to find that fragmentation has affected wild boar dispersal during the recolonisation in the past decade as its spatiotemporal behaviour and occurrence patterns have been found to be affected by fragmentation (Virgós 2002, Podgórski et al. 2013. This information about factors driving recolonisation is crucial to be able to predict future dispersal of wild boar in a fragmented landscape like Flanders and will be essential for the development of a risk assessment to guide effective management strategies.

Study area
The study area largely encompasses the current distribution area of wild boar in Flanders. Presently, wild boars are almost exclusively found in the eastern part of Flanders (Fig. 1). Hunting records show that recolonisation started at two distinct geographical areas in the eastern province of Limburg (Scheppers et al. 2014).

Sample collection
Tissue samples of the lower jaw of wild boar have been collected since the early beginning of the recolonisation from hunting bag and road kills. No animals were killed for this study. From the available set of approximately 2000 tissue samples a set of 838 samples was selected for the period from 2007 until 2016 (Table 1, Fig. 1). The selection aimed at an even spread of samples across the distribution area per year. Samples are not evenly distributed over the years given the limited number of wild boar shot in the beginning of the recolonisation. The exact coordinates of these samples were not available as hunters are not obliged to report the exact coordinates of hunted wild boars. However all samples could be assigned to a specific municipality and a specific game management unit (GMU). The geographic unit used in the landscape genetic analysis is the part of a municipality located within a certain GMU.

DNA extraction and genotyping
DNA was extracted from the 838 tissue-samples using the Qiagen DNAEasy Blood and Tissue kit (QIAGEN Inc). Samples were genotyped for 150 single nucleotide polymorphisms (SNPs) using the Illumina porcine SNP60 genotyping beadchip (Ramos et al. 2009). These SNPs were selected out of the set of 351 SNPs used by Goedbloed et al. (2013) in an analysis of population structure of northwest European wild boar and that are known to be polymorphic for wild boar in this area (Goedbloed et al. 2013). The set of 150 SNPs was selected based on observed heterozygosity (<0.8) and highest minor allele frequency (MAF). SNPs were genotyped by LGC Genomics (LGC Group, Hoddesdon, UK) using their KASP (Kompetitive Allele Specific PCR) genotyping assay (He et al. 2014). A maximum limit of 5% missing values was used to exclude SNPs because of low genotyping quality.

Hardy-Weinberg equilibrium and linkage disequilibrium
As deviation from Hardy-Weinberg equilibrium (HWE) or linkage disequilibrium (LD) can have a substantial impact on population structure analysis (Waples 2015), HWE and LD were assessed while taking into account potential population structure. Clusters were defined using discriminant analysis of principal components (DAPC) with the adegenet package (ver. 2.0.0, Jombart and Collins 2015) in R ver. 3.5.2 (< www.r-project.org >). The absence of assumptions concerning HWE or LD make DAPC a useful method to detect a potential structure (Jombart et al. 2010). Tests were performed for each identified genetic cluster. HWE was tested using the adegenet R package, LD was tested using the genetics R package (ver. 1.3.8.1, Warnes et al. 2015) and results were corrected for multiple testing using the Bonferroni correction.

Population structure during expansion
Using the Bayesian clustering approach in Structure  (Table 2). For each temporal section, we used 10 independent runs with a burn-in of 100 000 Markov chain Monte Carlo (MCMC) iterations and 1 000 000 sampling iterations for each of 1-8 potential clusters (K) assuming correlated allele frequencies and admixture with a variable alpha value of 1/K. Variable alpha values were used to correct for small sample sizes (Wang 2017). The optimal value of K for each section was determined using the method of Evanno et al. (2005) embedded in Structure Harvester We assigned individuals to a cluster based on their highest average q-value to a certain population K to determine the evolution of clusters throughout the years.
We included samples genotyped for the same set of SNPs as used by Goedbloed et al. (2013), which were collected between 2008 and 2010 in the Netherlands and western Germany. For this comparison all Dutch and German samples were selected within a buffer of 75 km around Flanders. A limit of 75 km was set to assess potential geographic connections with the Flemish population clusters. The selection resulted in 202 samples of Goedbloed et al. (2013) and was analysed together with the corresponding 2009-2011 section of our data. A separate Structure analysis was run including all 838 Flemish + 202 Dutch and German samples of Goedbloed et al. (2013) to assess connections of clusters over the years.
Of the final genetic clusters identified, observed heterozygosity (H o ) and expected heterozygosity (H e ) per section were calculated using the adegenet R package. Between inferred clusters, pairwise fixation index (F st ) as a measure for differentiation among populations, was calculated using the hierfstat R package (ver. 0.04-22, Goudet and Jombart 2015). Temporal changes were tested using simple linear models (lm function in R) in which year is used as response variable and H o , H e or F st are used as explanatory variables.

Resolution
To identify landscape elements that could potentially influence wild boar dispersal, a landscape genetic analysis was conducted. As each sample was assigned to a specific area (the part of a municipality within a specific GMU) with different sizes (median = 43 km 2 , min. size = 1 km 2 and max. size = 114 km 2 ), a subset of samples with a maximum size of 40 km 2 was selected. Segelbacher et al. (2010) showed that the effects of landscape features in gene flow should be studied at the scale as large as the movement distance of the studied species. Dispersal capacity of wild boars in Wallonia (southern Belgium) is found to vary between 2.49 ± 3.74 km (mean ± standard deviation) for juveniles and females and 4.90 ± 5.65 km for males (Prévot and Licoppe 2013), resulting in a potential movement area of 20-75 km 2 (when supposing a circle buffer with a radius of 2.49-4.9 km). However, taking into account the large variation in home ranges which have been reported ranging between 0.68 and 48.3 km 2 (Garza et al. 2017), we had to find a balance between a useful scale to conduct a landscape genetic analysis taking into account dispersal-and home ranges and geographic representation of our samples. The selection of maximum 40 km 2 area size was therefore found the optimal balance taking all these aspects into account. This resulted in a subset of 286 samples out of the 838 Flemish samples used in this landscape genetic analysis (Table 1).
Spatial genetic patterns respond to changes in the landscape structure (Landgut et al. 2010) but as the Flemish landscape did not alter considerably in the last decade, we do not consider the time span of 10 years of these genetic data long enough to be influenced by recent landscape changes. Therefore, we did not use temporal restrictions for sample selection (Landgut et al. 2010).

Individual pairwise genetic distances
We used Rousset's a genetic distance (GD) (Rousset 2000) as it was shown to be among the most accurate metrics for landscape genetic approaches (Shirk et al. 2017a). Individual pairwise Rousset's a genetic distances between 286 samples were calculated using Spagedi (ver. 1.5, Hardy and Vekemans 2002).
As hunting records showed geographically distinct founder populations, and we want to focus this analysis on mechanisms driving recolonisation, the pairwise genetic distance of these 286 individuals which show a high ancestry to different population clusters, and therefore individuals who are potentially from different historical founder populations and are thus not related, are not calculated. This way we get a better insight in the mechanisms driving recolonisation within clusters and drivers of potential admixture between the source-populations.
To define which individual pairwise genetic distances were not calculated, Structure results were used: these represented pairwise combinations of which both individuals show a q-value higher than 0.8 to different populations clusters (for example: pairwise genetic distance between individual 1 with a q-value of 0.90 to population 1 and individual 2 with a q-value of 0.90 to population 2 is not calculated while the pairwise genetic distance between this individual 1 and an admixed individual 3 with a q-value of 0.45 to population 2 is calculated).

Interpatch pairwise genetic distances
We calculated the mean interpatch genetic distances between all 33 municipality-GMU areas arising from the individual pairwise combinations. This resulted in a patch-based landscape genetic approach with a total of 501 mean interpatch pairwise genetic distances (it should be noted that in theory there are 528 possible interpatch combinations, but this number decreased due to reduction in the number of pairwise combinations in step b). The patch-based approach accounts for potential dependency among individuals from the same patch.

Random point buffers
As the exact geographical coordinates of samples within each area are unknown, we randomly selected a point location in each area. After connecting the random points by straight lines a buffer with a radius of 3.5 km was drawn around each line (Supplementary material Appendix 1). A buffer radius of 3.5 km results in a buffer surface of 40 km 2 around each location, corresponding to the threshold of a maximum surface area of 40 km 2 used to select samples for the landscape genetic analysis (step a). In total, 501 buffers corresponding to the 501 interpatch pairwise genetic distances were drawn.
To account for variability of the landscape within each area, we repeated the random selection of a point location 100 times after which the buffers were drawn for each of the 100 times and landscape variables (next step e) were calculated for each of the 100 sets of corresponding buffers (Supplementary material Appendix 1).

Landscape variables
Within the buffers, a set of 10 landscape variables (Table  3) were calculated using the CORINE land cover classes (inventory of 2012 (EEA 2012); Supplementary material Appendix 2). Forest cover was calculated by merging all forest classes in CORINE and was included in this analysis as wild boar is a forest-dwelling species (Briedermann 1990) and forest coverage has been shown to influence wild boar dispersal (Morelle et al. 2016). The CORINE classes of grasslands, heathlands, wetlands etc. were grouped into low natural cover and were incorporated as these landscape types are also part of the natural habitat of wild boar (Thurfjell et al. 2009). Because agricultural crops are an important food source for wild boar (Schley and Roper 2003), the percentage of arable land use (agricultural coverage) was included as a third variable. Finally, the percentage of urbanized land was used as a forth land cover type. Isolation by distance (IBD, the increase in genetic differentiation among individuals with geographic distance) can affect genetic distances due to limited dispersal when distance between individuals increase (Frantz et al. 2010). Therefore, the Euclidian distance (km) between the random chosen locations of each polygon was calculated.
A set of fragmentation measures for each buffer was calculated: road density (the length of primary, secondary roads and motorways (OpenStreetMap 2018) divided by buffer area), forest patch density (number of forest patches divided by buffer area), mean forest patch size (forest cover area divided by number of forest patches), mean forest patch edge-area ratio (forest patch perimeter divided by patch area) and mean nearest neighbour forest patch distance (distance from each forest patch to the nearest other forest patch).

Model
For each of the 100 generated datasets including the 10 landscape variables and genetic distances, we set up linear regression models with the 501 interpatch pairwise genetic distances as response variable, using maximum-likelihood population effects parameterization (MLPE) (Clarke et al. 2002) as it was shown that these models perform well in landscape genetic regression methods (Shirk et al. 2017b). The R package nlme was used to fit all models (ver. 3.1-137, Pinheiro et al. 2016). In order to define the correlation structure that accounts for non-independence of pairwise distances, the R package corMLPE (ver. 0.0.2, Pope 2018) was used.
The 10 landscape variables were standardized around the mean and were screened for multicollinearity using variance inflation factors (VIF) applying a threshold of VIF < 3 to remove variables (Fox and Monette 1992). Model optimisation using backwards selection was performed based on Akaike's information criterion (AIC) with a threshold of AIC-difference of 2 (Burnham and Anderson 2004).
Using the AIC-values of each run, weighted AIC-scores were calculated according to Wagenmakers and Farrell (2004) using the qpcR R package (ver. 1.4-1, Spiess 2018). Coefficient results were averaged over the 100 model runs using these weighted AIC-scores and the number of significant runs per coefficient was calculated.

Preliminary analysis
Ten out of 150 SNPs were excluded because of too high percentage of missing values, resulting in 140 SNPs for data analysis. Table 3. Overview of landscape variables used in the landscape genetic analysis. Mean values and standard deviation (SD) of these variables have been calculated over all 501 buffers and over the 100 different sets of buffers. These variables were selection based on knowledge of habitat use of wild boar and to assess potential effects of forest fragmentation on wild boar dispersal.

Variable
Mean SD Using the Bayesian information criterion (BIC) in function of the number of clusters (k-value) (Supplementary material Appendix 3), DAPC analysis resulted in 4 clusters. Twelve out of 140 SNPs did show deviations from HWE in at least 3 out of 4 clusters. Structure and landscape genetic analysis were run with the total set of 140 SNPs and were compared with the reduced set of 128 SNPs. Deviations from LD and HWE were found to have a substantial impact on population structure analysis as outcomes differed between both SNP sets (results not shown). We therefore used the reduced 128 SNP set for further analysis (Supplementary material Appendix 4).

Population structure during expansion
The Structure analysis of the three-year sections detected two distinct clusters in the beginning of the recolonisation in 2007-2009 ( Fig. 2A): a more eastern population (EP) and a more western population (WP), although there was evidence of some admixture between the two clusters (Supplementary material Appendix 5). In the following years (2009-2011, 2011-2013 and 2013-2015) the two clusters stay largely geographically delineated (Fig. 2B-E) although there remains a certain degree of admixture (Supplementary material Appendix 5). The analysis of the section 2015-2016, revealed a new third cluster in the north-west (NWP). This NWP cluster was found to be related to the samples of the Netherlands and Germany as shown in the separate Structure analysis including all samples of Flanders, the Netherlands and Germany (Fig. 2F). Moreover, the comparison with the samples of the Netherlands and Germany shows a connection of the western population (WP) to southern samples from Germany in the 2009-2011 section, however this connection is not clear when all samples were analysed together.
Changes in H e , H o and F st over the years were not found to be significant (Table 4). Differentiation between NWP and WP was lower than between NWP and EP. DAPC clustering results (4 clusters) were compared with Structure clustering results (3 clusters: Supplementary material Appendix 3): clustering patterns are mainly similar, the fourth DAPC-clusters is largely a subdivision of the NWP cluster.

Landscape genetics analysis
In total, 27 316 individual pairwise genetic distances were used to calculate 501 mean interpatch genetic distances. Of all 10 landscape variables that were considered, agricultural coverage, mean forest patch size and road density were excluded because of high VIF, removing multicollinearity between variables. Backwards model selection did not result in further exclusion of variables resulting in a final model including forest coverage, urbanised coverage, low nature coverage, Euclidean distance, forest patch density, mean forest patch edge-area ratio and mean nearest neighbour forest patch distance.
Both forest coverage and Euclidean distance were found to influence genetic distance as all 100 runs showed a significant effect of these variables. Forest coverage has a negative effect on genetic distance and Euclidean distance was found to have a positive relation towards genetic distance (Table 5, Supplementary material Appendix 6). Although model optimization resulted in the inclusion of urban coverage, low nature coverage, forest patch density, mean forest patch edge-area ratio and mean nearest neighbour forest patch distance, these were mainly non-significant in the full model (few or no runs were significant for these variables).

Discussion
Knowledge about the influence of landscape composition and fragmentation on species distribution is essential for the design of effective risk assessments to reduce human-wildlife conflicts, particularly in anthropogenically disturbed landscapes such as Flanders. The population structure analysis gave us an insight in the first 10 years of recolonisation: dispersal did not happen at random and a specific population structure was found. Flemish wild boar populations expanded from two separated local gene pools, confirming the presence of the two geographically distinct areas identified at the start of the expansion (Scheppers et al. 2014). Although there was quite some admixture between populations and little genetic differentiation was found, population differentiation was significant indicating that gene flow was influenced by environmental factors, slowing down the formation of one panmictic population.
As there is no official information on where the Flemish populations originated from, the finding that some clusters were related to populations from neighbouring regions gives insight into the potential origin of the Flemish population clusters. By assessing the Flemish population structure together with samples from the Netherlands and Germany, we found that a third gene pool which emerged in 2015 in northwest Flanders is related to samples from the Netherlands and Germany suggesting that natural migration from these neighbouring countries has occurred since 2015. Additionally, the western Flemish population shows some similarities to the samples from southern Germany in the yearly section, however, as this connection was not found in the population structure of all samples together we cannot say with certainty that southern Germany would indeed be a source of this population in Flanders. Moreover, no connection between the East Flemish population and any of the other sampled populations was found. Previous analysis of wild boar ancestry in the Netherlands, Belgium and Germany using microsatellites (Jansman et al. 2013) did not show any relatedness between the Flemish population and other neighbouring populations (including eastern Walloon populations). The research of Breyne et al. (2014) on genetic profile of Flemish wild boar also failed to find a clustering connection between Flemish and the sampled Walloon populations. Possible explanations for the absence of clear origin of all Flemish populations in neighbouring populations can be the lack of samples from other potential neighbouring origin-clusters as we did not have access to samples from southern Belgium, potential reintroductions from non-neighbouring populations or a lack of clear relatedness with other clusters due to founder effects (Broders et al. 1999). Further research including DNA-samples from other populations would be needed to identify origin populations.
The landscape genetic analysis gave us insight in the connectivity of the highly fragmented landscape in Flanders. The results showed that increasing forest cover is linked to decreasing genetic distance, which is not surprising as even in urban areas it has been shown that wild boar has a strong preference for natural landscapes (Stillfried et al. 2017). The importance of forest during expansion was also found in the studies of Saito et al. (2012) and Morelle et al. (2016). Clear patterns of isolation by distance (IBD) were also found as Euclidean distance was positively related to genetic distance, confirming findings of Frantz et al. (2012) and Renner et al. (2016). We did not find any significant effects of fragmentation in contrast to what we expected, as wild boar spatiotemporal behaviour and number of occurrences have been found to be affected by fragmentation (Virgós 2002, Podgórski et al. 2013). However, it has also been shown that wild boar show substantial plasticity regarding adjustment to human-dominated environments i.e. landscape of fear (Stillfried et al. 2017); wild boar tolerate human presence by modulating their risk perception and even use humanassociated habitat classes. Wild boar dispersal is thus not limited by human-dominated environments and they may not experience a negative effect of habitat fragmentation during expansion. Another explanation could be that the lack of effects on fragmentation is the consequence of hunting-induced dispersal. It has been shown that hunting can influence spatial utilisation, increase flight distances or home ranges although effects differ with different hunting methods and sometimes no change or decreasing homeranges are found (Boitani et al. 1994, Calenge et al. 2002, Keuling et al. 2008). We could not test this potential effect of hunting-induced dispersal as we did not have detailed information on variation in hunting pressure through the study time-span and study area.
Landscape genetic studies benefit from an individual-based approach, often using datasets with specific coordinates of sample locations (Cushman et al. 2006, Holderegger and Wagner 2008, Segelbacher et al. 2010, Parks et al. 2015, Kristensen et al. 2018). As we did not have the exact coordinates from each sample, we opted to conduct the landscape genetic analysis using a subset of samples which could be allocated to an area of a municipality within a GMU of less than 40 km 2 and used repeated random point locations in these areas to determine the influence of landscape variables. Working without exact coordinates, we expected that this would significantly reduce statistical power to connect landscape variability to genetic distances compared to working with specific coordinates. However, the landscape analysis resulted in interesting findings showing that with minimal time and cost effort (working with data which is available), it is possible to gain essential knowledge on factors driving expansion during recolonisation. We are therefore convinced that the presented method is robust and results are solid.

Management implications
Wildlife-human impacts are the main limiting factor in stakeholders' tolerance towards wildlife (Carpenter et al. 2013). The lack of an effect of habitat fragmentation on wild boar dispersal is likely to lead to future dispersal and range expansion in the highly fragmented landscape of Flanders. Conducting a risk assessment, in which the extent of future dispersal is crucial, can lead to developing effective management strategies to limit wildlife-human impacts. The knowledge gained through this landscape genetic analysis will now allow us to incorporate the influence of landscape elements in species distribution modelling (SDM) approaches and thereby improving predictions for the future wild boar distribution in Flanders.