The following questions were addressed in this study: (1) If a suite of 12–15 microsatellites were used in the genetic stock identification (GSI) of Chinook salmon Oncorhynchus tshawytscha, which microsatellites should be in the suite? (2) How many microsatellites are required to provide stock identification resolution equivalent to that of 72 single-nucleotide polymorphisms (SNPs)? (3) How many SNPs are required to replace the current microsatellite baselines used in GSI applications? (4) If additional GSI power is required for microsatellite baselines, what is the incremental increase provided by SNPs and microsatellites? The variation at 29 microsatellite loci and 73 SNP loci was surveyed in 60 populations of Chinook salmon in 16 regions in British Columbia. Microsatellites with more observed alleles provided more accurate estimates of stock composition than those with fewer alleles. The options available for improving the accuracy and precision of stock composition estimates for a 12-locus Fisheries and Oceans Canada (DFO) microsatellite suite include adding either 4 microsatellites or 25 SNPs to the existing suite to achieve an overall population-specific accuracy of 86% across 60 populations. For the 13-locus Genetic Analysis of Pacific Salmon (GAPS) microsatellites, either 2 microsatellites or 20–25 SNPs can be added to the existing suite to achieve approximately 86% population-specific accuracy in estimated stock composition. The enhanced DFO (16 loci) and GAPS (15 loci) microsatellite baselines were projected to require 179 and 166 SNPs, respectively, for equivalent precision of the population-specific estimates. The level of regional accuracy of individual assignment available from the enhanced DFO and GAPS suites of microsatellites was projected to require 90 and 82 SNPs, respectively. 
The level of individual assignment to specific populations available from the enhanced DFO and GAPS suites of microsatellites was projected to require 137 and 121 SNPs, respectively.
One key aspect of the management of Pacific salmon Oncorhynchus spp. fisheries is estimating the stock composition of mixed-stock fishery samples with enough resolution for effective management decisions, with the constraint that the estimates be timely and cost-effective. Prior to 2002, microsatellites had been successfully applied in sockeye salmon O. nerka fishery management in British Columbia, where the twin management objectives were to restrict exploitation of populations of conservation concern while enabling the harvest of abundant populations (Beacham et al. 2004). In 2002, this was precisely the dilemma confronting Canadian fishery managers in the management of the Chinook salmon O. tshawytscha fisheries off the Queen Charlotte Islands in northern British Columbia and the west coast of Vancouver Island in southern British Columbia, as only a small portion of the quota available to Canadian fishermen from 1995 to 2001 was harvested due to conservation concerns about specific Chinook salmon stocks. Microsatellite variation had been surveyed previously for stocks likely to be present in these two fisheries (Beacham et al. 2003, 2006). Beginning in 2002, with managers' knowledge of the timing and locations of specific stocks of Chinook salmon derived from previous or in-season microsatellite-based stock composition analysis, fisheries were managed with the intent of allowing Canadian fishermen to harvest the quota to which they were entitled under the Pacific Salmon Treaty (PST) while providing protection to stocks of conservation concern in Canada (Beacham et al. 2008c). This change in the management of Canadian fisheries led to increased emphasis on microsatellites for the stock identification of Chinook salmon. Subsequently, staff from U.S. 
agencies and Fisheries and Oceans Canada (DFO) formed a research group known as the Genetic Analysis of Pacific Salmon (GAPS), which agreed to develop a 13-locus microsatellite baseline to be used for the stock composition estimation of Chinook salmon in fisheries subject to the PST (Seeb et al. 2007). There was no formal evaluation of the genetic stock identification (GSI) power of the 13 microsatellites prior to their inclusion in the baseline. The set of microsatellites used by Beacham et al. (2006a) for previous stock composition estimation became known as the DFO loci; four of the loci in the DFO suite were also in the GAPS suite. Other than the comparison outlined by Beacham et al. (2008b) for 19 populations of Yukon River Chinook salmon, there has been no formal comparison of the relative power of the microsatellites in the GAPS and DFO baselines, nor has there been any comparison of the power of adding microsatellites currently not in either baseline.
While the GAPS microsatellite baseline was being developed, single-nucleotide polymorphisms (SNPs) were also being developed and promoted by some fishery management agencies as an alternative to microsatellites for salmon stock identification. The benefits of using SNPs were suggested to be ease of data standardization among laboratories, high throughput, high among-population diversity, lower genotyping errors, and lower cost of analysis per individual (Smith et al. 2005b, 2005d). However, there was no consensus among agency laboratories as to the preferred technique to use for GSI in Chinook salmon. In an evaluation of the two techniques with respect to Chinook salmon stock identification, it was concluded that comparisons between microsatellites and SNPs were necessary before conclusions could be drawn as to their relative efficacy (PSC 2008). Comparisons between microsatellites and SNPs have been conducted for other species and applications (Beacham et al. 2010; Glover et al. 2010; Haasl and Payseur 2010) and would be useful for Chinook salmon. The focus of any evaluation for Chinook salmon would be the resolution of the stock composition and individual identification estimates provided by the two techniques and the cost per individual required to obtain the observed resolution. An initial comparison of stock identification resolution was conducted for 19 Yukon River populations incorporating 30 microsatellites and 9 SNPs (Beacham et al. 2008b). In comparisons of population-specific estimates, a 9-SNP baseline was approximately equivalent to a single microsatellite locus with 17–22 alleles. In a subsequent study comparing the ability of the 13 GAPS microsatellites and 37 SNPs to differentiate 29 broadly distributed populations, closely related populations were better differentiated by microsatellites than by SNPs but a combination of microsatellites and SNPs provided the most effective suite of loci for individual assignment to population (Narum et al. 2008). 
More comprehensive baselines, both in terms of the numbers of microsatellites and SNPs and the number of populations included in the analysis, need to be developed before definitive conclusions can be drawn about the resolution of stock composition estimates derived from the two techniques for Chinook salmon.
If a single approach to Chinook salmon GSI is to be implemented, the key question is how many SNPs must be used to provide stock composition and individual identification estimates of equivalent quality (in terms of both accuracy and precision) to that of the estimates obtained from a high-resolution microsatellite baseline. Kalinowski (2002, 2004) had previously suggested that equivalency in stock identification estimates could be obtained by using either a limited number of loci with many alleles or more loci with fewer alleles. Empirical evidence from microsatellites has indicated that loci with greater numbers of alleles generally provided more accurate and precise estimates of stock composition than loci with fewer alleles (Beacham et al. 2005, 2006, 2008b). Single-nucleotide polymorphisms generally display only two alleles, and thus individual SNPs will generally be less powerful than individual microsatellites in stock identification applications. The lesser power of individual SNPs can be compensated for by simply adding more SNPs to a GSI application, so that equivalency in the accuracy and precision of the estimated stock compositions is obtained. Once the number of SNPs is determined, evaluations of which technique produces the more cost-effective method of stock identification can be conducted within individual laboratories.
The genetic structure of Chinook salmon is generally regionally based, with populations in the same geographic area being more similar to each other than to populations in more distant areas (Waples et al. 2004; Beacham et al. 2006b). A regional genetic structure is the basis for defining reporting groups in GSI applications. In more complex applications, such as when fishery management actions are to be directed toward specific populations, differentiation among populations within a reporting group may be required (Parken et al. 2008). The final level of resolution required is the identification of individuals to specific populations in a reporting group or to specific reporting groups, and this is the most demanding aspect of GSI applications. Thus, the effectiveness of GSI techniques needs to be evaluated with respect to the accuracy and precision of their identification of regional reporting groups, specific populations, and individuals.
In the current study, 29 microsatellites were surveyed in 60 populations of Chinook salmon in British Columbia. These microsatellites included the 13 loci of the GAPS baseline, the 12 loci of the DFO baseline (4 of which are also in the GAPS baseline), and 8 additional microsatellites. We also evaluated the accuracy and precision of estimates of stock composition derived from a suite of 73 SNPs from the same 60-population baseline. The key questions investigated were as follows: (1) If a suite of 12–15 microsatellites were used in Chinook salmon GSI applications, which microsatellites should be in the suite? (2) How many microsatellites are required to provide stock identification resolution equivalent to that of 72 SNPs? (3) How many SNPs are required to replace the current GAPS and DFO microsatellite baselines used in GSI applications? (4) If additional GSI power is required for either the GAPS or DFO microsatellite baselines, what is the increase in power provided by adding SNPs and microsatellites?
METHODS
Collection of DNA samples and laboratory analysis.—The sampling sites or populations surveyed in each geographic region are outlined in Table 1. The geographic locations of the populations listed in Table 1 are indicated in Figure 1. Tissue samples were collected from mature Chinook salmon in these populations, preserved in 95% ethanol, and sent to the Molecular Genetics Laboratory at the Pacific Biological Station. DNA was extracted from the tissue samples using a variety of methods, including the Chelex resin protocol outlined by Small et al. (1998), the Qiagen 96-well DNeasy procedure, and the Promega Wizard SV96 Genomic DNA Purification system. Once extracted DNA was available, surveys of the variation at the following 29 microsatellite loci were conducted: Ots100, Ots101, Ots104, Ots107 (Nelson and Beacham 1999), Ssa197 (O'Reilly et al. 1996), Ogo2, Ogo4 (Olsen et al. 1998), Oke4 (Buchholz et al. 2001), Omy325 (O'Connell et al. 1997), Oki100 (Beacham et al. 2008a), Omm1009, Omm1037 (Rexroad et al. 2002), Omm1080 (Rexroad et al. 2001), Ots201b, Ots208b, Ots211, Ots212, Ots213 (Grieg et al. 2003), Omm1076, Bhms417 (Danzmann et al. 2005), Ots2, Ots9 (Banks et al. 1999), Ots3M (Grieg and Banks 1999), OH10 (Smith et al. 1998), OtsG474, OtsG68 (Williamson et al. 2002), OmyRGT3TUF, OmyRGT30TUF (Sakamoto et al. 2000), and Ssa408 (Cairney et al. 2000).
In general, polymerase chain reaction (PCR) DNA amplifications were conducted using a DNA Engine Cycler Tetrad2 (BioRad, Hercules, California) in 6-µL volumes consisting of 0.15 units of Taq polymerase, 1 µL of extracted DNA, 1× PCR buffer (Qiagen, Mississauga, Ontario), 60 µM of each nucleotide, 0.40 µM of each primer, and deionized H2O. The thermal cycling profile involved one cycle of 15 min at 95°C followed by 30–40 cycles of 20 s at 94°C, 30–60 s at 47–65°C, and 30–60 s at 68–72°C (depending on the locus). The PCR conditions for particular loci could vary from this general outline. The PCR fragments were initially size-fractionated in denaturing polyacrylamide gels using an ABI 377 automated DNA sequencer, and genotypes were scored by Genotyper 2.5 software (Applied Biosystems, Foster City, California) using an internal lane sizing standard. Later in the study, microsatellites were size-fractionated in an ABI 3730 capillary DNA sequencer, and genotypes were scored by GeneMapper software 3.0 (Applied Biosystems) using an internal lane sizing standard. Allele identification between the two sequencers was standardized by analyzing approximately 600 individuals on both platforms and converting the sizing in the gel-based data set so that it would match that obtained from the capillary-based set for the 12 original DFO loci and 13 GAPS loci. The repeatability of genotyping was evaluated by repeat PCR analysis and scoring of the genotypes of individual fish. Discrepancies in scoring were observed in 8 of the 15,360 genotypes scored across all loci, for a genotyping error rate of 0.05%.
TABLE 1.
Regions, populations within regions, and sample sizes available in the survey of single-nucleotide polymorphisms (SNPs) and microsatellites for 60 Chinook salmon populations in British Columbia. The locations of the populations are shown in Figure 1. Region codes are as follows: East Coast Vancouver Island (ECVI), West Coast Vancouver Island (WCVI), upper Fraser River (UPFR), middle Fraser River (MUFR), lower Fraser River (LWFR), South Thompson River (SOTH), lower Thompson River (LWTH), southern British Columbia mainland (SOMN), and northern British Columbia mainland (NOMN).
Variation was analyzed at 72 nuclear and 1 mitochondrial SNP with the primer and probe sequences outlined by Smith et al. (2005a, 2005b, 2005c, 2007), Campbell and Narum (2008), and Miller et al. (2008). A listing of all the SNPs surveyed is given in Table 2. After PCR amplification, the plates (384 well) were read on an ABI Prism 7900HT Sequence Detection System by one individual using sequence detection software from ABI. One SNP (Ots_CI_A1) required analysis on the automated sequencer, as it was an insertion or deletion and as such a size variant. The repeatability of genotyping was evaluated by repeat PCR analysis and scoring of genotypes. Discrepancies in scoring were observed in 3 of the 21,408 genotypes scored across all loci evaluated, for a genotyping error rate of 0.01%.
Estimating stock composition in single-population samples.—Two software packages were utilized in estimating the stock composition of single-population mixtures: the Statistical Package for the Analysis of Mixtures (SPAM version 3.7; Debevec et al. 2000) and ONCOR (Kalinowski et al. 2007). A mitochondrial DNA SNP (mtSNP) was analyzed in the survey (Ots_C3N3), and as ONCOR is currently unable to analyze variations incorporating mitochondrial haplotypes, SPAM was used exclusively for those analyses. Genotypic frequencies were determined for each locus in each population and were used to estimate the stock composition of simulated single-population samples. The Rannala and Mountain (1997) correction to baseline allele frequencies was used in SPAM analyses to avoid the occurrence of fish in the mixed sample from a specific population with an allele not observed in the baseline samples from that population. This correction incorporated Bayesian modeling of baseline allele frequency distributions. All loci were considered to be in Hardy-Weinberg equilibrium (Beacham et al. 2006b); the expected genotypic frequencies were determined from the observed allele frequencies. The reported stock compositions for the simulated single-population samples are the bootstrap mean estimates for each mixture of 200 fish analyzed, where the mean and variance estimates are derived from 1,000 bootstrap simulations. Each baseline population and simulated single-population sample was sampled with replacement in order to simulate the random variation involved in the collection of the baseline and fishery samples. When ONCOR was used to estimate stock composition, the Rannala and Mountain (1997) correction to baseline allele frequencies was again implemented, the precision of the stock compositions being calculated by bootstrapping (100 simulations) over the observed baseline population sample sizes and a mixture size of 200 fish. 
For both SPAM and ONCOR, the allocations to individual baseline populations were summed to provide estimates of stock composition for regional stock groups (Table 1). Additionally, ONCOR was used to provide estimates of the accuracy of identification of individuals to specific populations or regional stock groups through leave-one-out assignment testing.
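The frequency calculations described above can be illustrated with a minimal sketch. It assumes, as a simplification, that the baseline correction amounts to adding a prior weight of 1/k to each of the k alleles known at a locus; the exact formulas implemented in SPAM and ONCOR are not given in the text and may differ. Expected genotype frequencies are then derived under Hardy-Weinberg equilibrium. The function names and allele counts are hypothetical.

```python
from itertools import combinations_with_replacement


def smoothed_allele_freqs(allele_counts):
    """Posterior-mean allele frequencies with an assumed prior weight of 1/k per allele.

    allele_counts maps every allele known at the locus (across all baseline
    populations) to its count in one population; unobserved alleles get count 0.
    """
    k = len(allele_counts)
    n = sum(allele_counts.values())  # number of gene copies sampled
    return {a: (c + 1.0 / k) / (n + 1.0) for a, c in allele_counts.items()}


def hwe_genotype_freqs(allele_freqs):
    """Expected genotype frequencies under Hardy-Weinberg equilibrium."""
    genotype_freqs = {}
    for a, b in combinations_with_replacement(sorted(allele_freqs), 2):
        if a == b:
            genotype_freqs[(a, b)] = allele_freqs[a] ** 2
        else:
            genotype_freqs[(a, b)] = 2.0 * allele_freqs[a] * allele_freqs[b]
    return genotype_freqs


# Hypothetical counts: allele "A" was never observed in this baseline sample.
counts = {"A": 0, "B": 30, "C": 10}
freqs = smoothed_allele_freqs(counts)
genotypes = hwe_genotype_freqs(freqs)
```

Because the smoothed frequency of allele A is small but nonzero, a mixture fish carrying A still has a nonzero likelihood of originating from this population.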
TABLE 2.
Ranking of 102 markers (29 microsatellites and 73 single-nucleotide polymorphisms [SNPs]) in terms of the average estimated composition (population and regional accuracy determined with SPAM) of single-population samples over 60 populations of Chinook salmon, as well as the number of alleles observed at the locus and Fst value. Types are as follows: M = microsatellites, S = SNPs, and SM = mtDNA SNPs. He is expected heterozygosity.
The sample sizes for the microsatellites were variable among populations and loci (Table 1). To control for the effect of varying sample size, construction of the microsatellite baseline for the analysis proceeded on the basis of capping population sample size at 200 individuals. The population sample size for the microsatellite analysis ranged from 74 to 200 individuals. The population sample size for the SNP analysis ranged from 79 to 190 individuals but was typically set at 95 individuals. Allele frequencies for all of the populations surveyed in this study are available at the Molecular Genetics Laboratory's Web site (http://www.pac.dfo-mpo.gc.ca/science/facilitiesinstallations/pbs-sbp/mgl-lgm/data-donnees/index-eng.htm).
Relative ranking of the loci.—The power of individual loci for stock composition estimation was initially evaluated by incorporating only a single locus in the estimation of the stock composition of simulated single-population samples. As an mtSNP was included in the analysis, only SPAM was used to provide estimates of stock composition for all 60 single-population samples. Mean accuracy was determined as the average estimate across all 60 populations, with loci then being ordered from the most accurate to the least accurate relative to population-specific accuracy (Table 2). Heterozygosity for each locus over all populations was calculated with FSTAT version 2.9.3.2 (Goudet 1995).
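As a point of reference for the heterozygosity values, the following sketch computes expected heterozygosity from allele frequencies as one minus the sum of squared frequencies. This is the uncorrected textbook form; FSTAT applies its own estimators, which may include sample-size corrections not reproduced here. The allele frequencies are hypothetical.

```python
def expected_heterozygosity(allele_freqs):
    """Uncorrected expected heterozygosity: 1 - sum of squared allele frequencies."""
    total = sum(allele_freqs)
    if abs(total - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    return 1.0 - sum(p * p for p in allele_freqs)


# A biallelic SNP cannot exceed 0.5, reached when both alleles are equally common.
snp_he = expected_heterozygosity([0.5, 0.5])

# A hypothetical microsatellite with 30 equally frequent alleles gives 1 - 1/30.
msat_he = expected_heterozygosity([1.0 / 30] * 30)
```

The ceiling of 0.5 for a two-allele locus is one way to see why a single SNP carries less information than a highly polymorphic microsatellite.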
How many microsatellites for SNP equivalency?—ONCOR was used exclusively for this analysis, with the proviso that the mtSNP was eliminated from the analysis. The microsatellite locus with the highest average population accuracy was initially incorporated into the analysis; lower-accuracy microsatellites were then added sequentially until the average population accuracy and precision of the stock composition estimates provided by the suite of microsatellites matched that provided by the SNPs. Single-population samples were analyzed with the best 3–14 microsatellites. Accuracy and standard deviation were determined for the population- and region-specific estimates of stock composition and then averaged over the 60 single-population simulations for each set of loci. The percent correct assignment of individuals to specific populations and regions was also determined for all 60 populations and then averaged over all populations for each set of microsatellites evaluated.
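The sequential procedure can be sketched as a loop that adds loci in rank order and stops once a target accuracy is reached. The `evaluate` callback stands in for a full ONCOR simulation run and is hypothetical, as are the locus names and the toy scoring function used to exercise the loop.

```python
def loci_needed(ranked_loci, evaluate, target_accuracy):
    """Return the shortest top-ranked prefix whose accuracy meets the target.

    ranked_loci: loci ordered from most to least accurate.
    evaluate: callable taking a list of loci and returning mean accuracy (%).
    Returns None if even the full set falls short of the target.
    """
    for n in range(1, len(ranked_loci) + 1):
        suite = ranked_loci[:n]
        if evaluate(suite) >= target_accuracy:
            return suite
    return None


# Toy stand-in for a simulation: accuracy rises with diminishing returns.
# These numbers are illustrative only and are not results from the study.
def toy_evaluate(suite):
    return 100.0 * (1.0 - 0.5 ** len(suite))


ranked = ["locus_%d" % i for i in range(1, 11)]  # hypothetical locus names
suite = loci_needed(ranked, toy_evaluate, target_accuracy=90.0)
```

In practice each call to `evaluate` would be a set of single-population simulations, so the loop is expensive and the number of suites tested is kept small.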
Comparing DFO and GAPS microsatellites.—For each run (injection) of the ABI 3730 sequencer, analysis of microsatellite variation can accommodate loci marked with up to four different tags in addition to the size standard necessary for estimating allele size. As the number of injections increases, the cost of the analysis rises, and thus efficient laboratory operation requires that the number of injections be minimized in surveys of microsatellite variation. The DFO suite of microsatellites is comprised of 12 loci surveyed with two injections on the automated DNA sequencer, and the GAPS suite is comprised of 13 loci surveyed with three injections (two dyes unutilized) (Table 3). If additional accuracy is required for either set of loci, either microsatellites or SNPs can be added to the existing baselines to provide increased stock identification power. If microsatellites are added to either suite and the number of injections on the automated sequencer is capped at three, four microsatellites can be added to the DFO suite (4 dyes are available for each injection) and two microsatellites can be added to the GAPS suite (2 unutilized dyes). The microsatellites added to the DFO suite would include the top-ranked non-DFO microsatellites Ots201b, Ots213, Omm1080, and Ots212. Similarly, the microsatellites added to the GAPS suite would include Ots107 and Ots100. Single-population samples were analyzed with the DFO (regular and enhanced suites) and GAPS microsatellites (regular and enhanced suites). Additionally, the original suites of DFO and GAPS microsatellites were enhanced with the addition of SNPs to the baselines used for stock composition analysis, the number of SNPs being added in increments of five, starting with the highest-rated ones. The addition of groups of SNPs continued until the average estimated population accuracy derived from the enhanced DFO and GAPS microsatellite baselines was achieved. 
Accuracy and standard deviation for population- and region-specific estimates of stock composition were determined for each group of genetic markers examined and then averaged over the 60 single-population simulations. The percent correct assignment of individuals to specific populations and regions was also determined for all 60 populations and then averaged over all populations for each set of microsatellites evaluated.
TABLE 3.
Numbers of injections on the automated sequencer, primer tags employed, and loci surveyed for the Fisheries and Oceans Canada (DFO) and Genetic Analysis of Pacific Salmon (GAPS) surveys of microsatellite variation.
How many SNPs for microsatellite equivalency?—Projections of the number of SNPs required for equivalency with the current microsatellite baseline were made by ranking the SNPs according to the average accuracy observed in estimating stock composition for single-population samples over the 60 populations surveyed. The mtSNP was eliminated in these analyses. Subsequent analyses were conducted exclusively with ONCOR. Single-population samples were analyzed with 10–72 SNPs in increments of 5 SNPs. The SNPs with the highest average accuracy were initially incorporated into the analyses of the single-population mixtures, with the less accurate SNPs being added sequentially. Additionally, the SNPs with the lowest average accuracy values were initially incorporated into the analyses, with progressively more accurate SNPs being added sequentially, again with average accuracy and precision being recorded. The overall mean accuracy and precision of each specified number of SNPs were determined by averaging the results from both processes; this was considered indicative of the average trend in estimating accuracy and precision when the number of SNPs employed in the analysis was increased. The average regional, population, and individual accuracy and precision over all 60 populations were recorded for each set of SNPs. A hyperbola function of the form $Y = a/X + b$ was fitted with Labfit curve-fitting software (Pereira da Silva and Pereira da Silva 2007), in which $Y$ is the mean observed accuracy for the population and regional estimates and $X$ is the number of SNPs incorporated into the analysis. Estimates of standard deviations were derived from the power function $Y = aX^{b}$, where $Y$ and $X$ are defined as before. Individual assignment accuracy for populations and regions was fitted with the modified geometric function $Y = aX^{b/X}$. 
Projections were then made with these regression models to estimate the number of SNPs required to provide estimates of comparable resolution to that provided by the microsatellites with respect to estimated stock composition at both the regional and population levels as well as individual identification at the regional and population levels.
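The projection step for the hyperbola Y = a/X + b can be sketched as a fit followed by an inversion. The sketch below is an illustration under stated assumptions rather than the Labfit procedure: it uses ordinary least squares on 1/X (the model is linear in a and b, so no iterative solver is needed), and the data points are synthetic values generated from known parameters, not results from this study. The function names are hypothetical.

```python
def fit_hyperbola(xs, ys):
    """Least-squares fit of Y = a/X + b, which is linear in u = 1/X."""
    us = [1.0 / x for x in xs]
    n = len(us)
    mean_u = sum(us) / n
    mean_y = sum(ys) / n
    suu = sum((u - mean_u) ** 2 for u in us)
    suy = sum((u - mean_u) * (y - mean_y) for u, y in zip(us, ys))
    a = suy / suu
    b = mean_y - a * mean_u
    return a, b


def snps_needed(a, b, target):
    """Solve Y = a/X + b for X at a target accuracy below the asymptote b."""
    if a >= 0:
        raise ValueError("expected a < 0 for accuracy that rises with marker count")
    if target >= b:
        raise ValueError("target is at or above the fitted asymptote b")
    return a / (target - b)


# Synthetic points generated from a = -200, b = 90 (illustrative only).
xs = [10, 20, 40, 72]
ys = [-200.0 / x + 90.0 for x in xs]

a, b = fit_hyperbola(xs, ys)
projected = snps_needed(a, b, target=88.0)
```

Solving the fitted curve for X beyond the largest observed marker count is an extrapolation, so projections of this kind depend on the assumed functional form.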
RESULTS
Relative Rankings of the Markers
The number of alleles observed at a microsatellite locus was important in determining the value of the locus for stock identification applications. The number of alleles observed at a locus varied from 7 to 62 for the populations and loci surveyed in our study (Table 2). The number of alleles observed at a microsatellite locus was related to the accuracy of the estimated stock composition of the simulated single-population samples (Figure 2a). As the number of alleles increased up to about 30 per locus, the accuracy of population-specific estimated stock composition increased. However, larger numbers of alleles provided only a minor increase in the accuracy of estimated stock composition. The mean accuracies of the estimated stock compositions of the single-population samples (correct = 100%) were 40.2% for simulations with single loci having 10 or fewer alleles, 56.9% for loci with 11–20 alleles, 71.6% for loci with 21–30 alleles, 75.7% for loci with 31–40 alleles, 76.7% for loci with 41–50 alleles, 78.6% for loci with 51–60 alleles, and 79.3% for a locus with more than 60 alleles (Table 2). The precision of the estimated stock composition for a locus was influenced by the number of alleles observed at a microsatellite locus, with more precise estimates being derived from loci with larger numbers of alleles (Figure 2b). The mean standard deviations of the estimated stock compositions were 22.9% for loci with 10 or fewer alleles, 19.4% for loci with 11–20 alleles, 13.5% for loci with 21–30 alleles, 9.4% for loci with 31–40 alleles, 8.8% for loci with 41–50 alleles, 7.7% for loci with 51–60 alleles, and 7.1% for a locus with more than 60 alleles (Table 2). In general, microsatellites with more alleles provided more accurate and precise estimates of the stock compositions of the single-population samples than did loci with fewer alleles, but the results varied depending on the specific loci.
Average population-specific accuracy for the 102 markers evaluated ranged from 2.4% to 81.1%, with a clear break between SNPs and microsatellites. The average accuracy in the stock identification analysis ranged from 2.4% to 13.9% for individual SNPs and from 34.2% to 81.1% for individual microsatellites. The top 29 of the 102 markers evaluated were microsatellites, with the accuracy of the estimated population-specific stock compositions produced by incorporating the lowest-ranked microsatellite (OmyRGT30TUF, which has seven alleles) being 2.5 times as high as that of the highest-ranked SNP (Ots_FARSLA-220) (34.2% versus 13.9%; Table 2). The differential between these two loci was less for region-specific accuracy (56.5% versus 39.7%).
Heterozygosity was a good predictor of the relative power of microsatellites for stock composition analysis (r = 0.85, P < 0.01). For microsatellites, loci with fewer alleles also tended to have lower heterozygosity (Table 2), and just as loci with more alleles were more powerful for stock identification analysis, more heterozygous loci were more powerful for stock identification analysis (Figure 3a). However, there was little increase in the accuracy of estimated stock composition for microsatellites once heterozygosity values reached 0.80, which corresponded to approximately 30 alleles. All of the SNPs evaluated displayed only two alleles, and there were only minor differences in stock identification accuracy among SNPs for loci with a heterozygosity value greater than 0.15 (Figure 3b). In this case, heterozygosity at SNP loci was a modest predictor of the relative power of the locus for stock identification analysis (r = 0.55, P < 0.01).
How Many Microsatellites for SNP Equivalency?
The application of 72 SNPs in estimating the stock composition of 60 single-population samples resulted in an average accuracy of 97.5% to reporting region, with a standard deviation of 0.8% (Table 4). These levels of accuracy and precision, along with the levels associated with the population-level estimates of stock composition and individual assignment to both region and population, provided the reference points for estimating the number of microsatellites required to provide equivalent results in stock identification applications. The regional accuracy estimates for individual populations ranged from 79% for the Louis Creek population to 100% for the middle Shuswap River population. Starting with the microsatellite with the highest observed population-specific accuracy (Ots107), at least 14 microsatellites were required to achieve comparable levels of accuracy (97.3%) and precision (SD = 1.0%) in the regional estimates of stock composition (Table 4).
The application of 72 SNPs in estimating stock composition resulted in an average population-specific accuracy of 83.1% (SD = 3.7%; Table 4). Individual population estimates ranged from 6.3% for the Puntledge River fall-run population to 99.9% for the middle Shuswap River population (Figure 4). Some populations, such as those of the Quinsam, upper Chilcotin, and Bulkley rivers, were clearly more differentiated than others, with high levels of population-specific accuracy (>97%) regardless of the suite of markers used to estimate stock composition. Similarly, the estimated stock compositions for the populations that displayed lower levels of accuracy, such as those of Shakes Creek, the Nahlin River, and the Nakina River, were consistently lower than those for most other populations with both microsatellites and SNPs. The standard deviations of the estimated stock compositions were generally higher with SNPs than with either of the DFO or GAPS suites of microsatellites (Figure A.1 in the appendix). The standard deviations of the estimated stock compositions derived from the SNPs were greater than those of the compositions derived from the DFO microsatellites for 39 of the populations evaluated (sign test analysis; P < 0.05) and greater than those of the compositions from the GAPS microsatellites for 43 of the populations (P < 0.01). On average, application of the best microsatellites resulted in comparable levels of accuracy (83.4%) and precision (SD = 3.4%) with nine and six microsatellites, respectively.
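The sign test comparisons can be checked with an exact binomial calculation. The sketch below assumes a two-sided test and that all 60 populations contributed an untied comparison; neither detail is stated in the text, and the function name is hypothetical.

```python
from math import comb


def sign_test_p(successes, n):
    """Two-sided exact sign test p-value under a null probability of 0.5."""
    k = max(successes, n - successes)
    upper_tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2.0 * upper_tail)


p_dfo = sign_test_p(39, 60)   # SNP SD greater than DFO SD in 39 of 60 populations
p_gaps = sign_test_p(43, 60)  # SNP SD greater than GAPS SD in 43 of 60 populations
```

Under these assumptions, the two p-values fall below 0.05 and below 0.01, respectively, in line with the thresholds reported.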
TABLE 4.
Accuracy (means) and precision (SDs) of the estimated population and regional stock compositions of single-population samples determined with ONCOR from suites of markers incorporating the best 3–14 microsatellites from Table 2 as well as the best 72 single-nucleotide polymorphisms (SNPs). The percent correct assignment of individuals to specific populations and regions is also shown.
Assignment of individuals to the correct region of origin with the 72 SNPs was attained with an accuracy of 87.3%, and assignment to specific populations was attained with an accuracy of 56.7% (Table 4). Regional assignment accuracy ranged from 15% for the Bear River population to 100.0% for several populations, such as the Burman River population. Population assignment accuracy ranged from 8.7% for the Shakes Creek population to 98.6% for the middle Shuswap River population (Figure A.2). Comparable region-specific levels of assignment accuracy (87.5%) were achieved with the use of 12 microsatellites, and comparable levels of population-specific assignment accuracy (58.2%) were achieved with 8 microsatellites (Table 4). In summary, the number of microsatellites required to produce stock identification results equivalent to those from applying 72 SNPs depended on the specific task. Regional estimates of stock composition required at least 14 microsatellites, population-specific estimates of stock composition required 6–9 microsatellites, and individual assignment to population and region required 8–12 microsatellites.
Comparing DFO and GAPS Microsatellites
The average estimates of population-specific stock composition for single-population samples were 83.0% for the DFO microsatellites and 83.1% for the GAPS microsatellites (Table 5). Both suites of loci provided essentially equal average population-specific accuracy over the 60 populations of Chinook salmon surveyed in British Columbia. The addition of the four most powerful non-DFO microsatellites (Ots201b, Ots213, Omm1080, and Ots212) to the suite of DFO microsatellites improved the estimated accuracy of the single-population samples to 85.6% (Table 5). Improvement of the level of accuracy of estimated stock compositions to this level could also be achieved by the addition of about 25 of the top-ranked SNPs to the 12 DFO microsatellites (Table 5). The addition of the 2 most powerful non-GAPS microsatellites (Ots107 and Ots100) to the GAPS suite improved the estimated accuracy to an average of 85.5% (Table 5), essentially the same as that of the enhanced DFO suite. Approximately 20–25 of the top-ranked SNPs would be required to be added to the suite of GAPS microsatellites to provide the same population resolution as that provided by the addition of the 2 most powerful non-GAPS microsatellites. In summary, at least two options are available for improving the accuracy and precision of stock composition estimates for the DFO and GAPS suites of microsatellites if the number of injections on the automated sequencer is capped at three. For the DFO microsatellites, either 4 microsatellites or 25 SNPs can be added to the existing suite. For the GAPS microsatellites, either 2 microsatellites or 20–25 SNPs can be added to the existing suite. The addition of either set of loci to both sets of microsatellites would provide generally equivalent results.
In stock identification applications, different levels of resolution may be required for populations in different reporting regions. For example, population-specific estimates of stock composition may be required in a specific region, while regional-level estimates are satisfactory in other regions. If an existing microsatellite baseline is to be enhanced to improve accuracy, the choice of loci to add may depend to some degree on the specific application. The levels of accuracy of estimated stock composition at the population level clearly differed among the reporting regions. For example, in applications utilizing the DFO loci and centered on providing higher resolution of East Coast Vancouver Island populations (a region of conservation concern), enhancing the DFO baseline with four additional microsatellites provided the most effective means of increasing the resolution among populations (Table 6). Alternatively, enhancing the accuracy of identification of Stikine River populations was most effectively achieved by adding 25 SNPs. For the GAPS microsatellite baseline, enhancement with two microsatellites provided the best resolution of upper Fraser River populations, but for West Coast of Vancouver Island populations a combined microsatellite-SNP approach provided the best population resolution (Table 6).
TABLE 5.
Accuracy (means) and precision (SDs) of the estimated population and regional stock compositions of single-population samples determined with ONCOR from suites of markers incorporating the best 13 microsatellites (micros), the best 72 single-nucleotide polymorphisms (SNPs), the Fisheries and Oceans Canada (DFO) microsatellites, the DFO microsatellites plus the best 20–25 SNPs, the DFO microsatellites plus 4 additional microsatellites (Ots201b, Ots213, Omm1080, and Ots212), the Genetic Analysis of Pacific Salmon (GAPS) microsatellites, the GAPS microsatellites plus the best 20–25 SNPs, and the GAPS microsatellites plus 2 additional microsatellites (Ots107 and Ots100) over 60 Chinook salmon populations in British Columbia. The percent correct assignment of individuals to specific populations and regions is also shown.
TABLE 6.
Average accuracy of estimated population stock compositions by region for the Fisheries and Oceans Canada (DFO), Genetic Analysis of Pacific Salmon (GAPS), single-nucleotide polymorphism (SNP), DFO plus 4 microsatellites (micros), GAPS plus 2 microsatellites, DFO plus 25 SNPs, and GAPS plus 25 SNPs baselines incorporating the regions and populations indicated in Table 1. See Table 5 for more information about the loci utilized.
In some applications, individual Chinook salmon are required to be identified to either their region or population of origin. The regional origin of individuals from lower Thompson River populations was identified with a high degree of accuracy regardless of the suite of loci used in the procedure (Table 7). With enhanced baselines, assignment of individuals to the correct region was typically accomplished with an accuracy greater than 90% for regions in southern British Columbia. In northern British Columbia, identification of individuals to the Stikine River, Taku River, and regional groups in the Skeena River displayed the lowest assignment accuracy (69–78% for the Stikine and Taku River regions and 55–88% for the Skeena River regions) when enhanced baselines were considered.
TABLE 7.
Average accuracy of regional assignment for individual Chinook salmon for the Fisheries and Oceans Canada (DFO), Genetic Analysis of Pacific Salmon (GAPS), single-nucleotide polymorphism (SNP), DFO plus 4 microsatellites (micros), GAPS plus 2 microsatellites, DFO plus 25 SNPs, and GAPS plus 25 SNPs baselines incorporating the regions and populations indicated in Table 1. See Table 5 for more information about the loci utilized.
TABLE 8.
Average accuracy of population assignment for individual Chinook salmon for the Fisheries and Oceans Canada (DFO), Genetic Analysis of Pacific Salmon (GAPS), single-nucleotide polymorphism (SNP), DFO plus 4 microsatellites (micros), GAPS plus 2 microsatellites, DFO plus 25 SNPs, and GAPS plus 25 SNPs baselines incorporating the regions and populations indicated in Table 1. See Table 5 for more information about the loci utilized.
The most difficult problem encountered in stock identification applications is the correct assignment of individuals to specific populations. The highest accuracy of assignment was typically observed for the Fraser River populations, the lowest accuracy for the Stikine and Taku River populations (Table 8). There was no consistent ranking within regions of the relative assignment accuracy provided by the DFO, GAPS, or SNP baselines (Table 8). Enhancing the GAPS baseline with 2 microsatellites provided the least relative increase in assignment accuracy, whereas enhancing the GAPS baseline with 25 SNPs provided the greatest increase.
TABLE 9.
Accuracy (means) and precision (SDs) of the estimated population and regional stock compositions of single-population samples determined with ONCOR from suites of markers incorporating either the best or worst 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, and 72 single-nucleotide polymorphisms (SNPs) from Table 2. The percent correct assignment of individuals to specific populations and regions is also shown.
How Many SNPs for Microsatellite Equivalency?
The average accuracy of stock composition estimates to geographic region for the 60 single-population samples was 96.9% for the 12 DFO microsatellites, 96.7% for the 13 GAPS microsatellites (Table 5), and 97.2% for the 72 SNPs (Table 9). In essence, the 72 SNPs employed for stock composition estimation provided an accuracy in the estimation of stock composition to geographic region equivalent to or better than that of the DFO and GAPS microsatellites. The average precision of the regional estimates of stock composition from the 72 SNPs was again essentially equivalent to that of the DFO microsatellites, and 80 SNPs of the average quality evaluated in our study were projected to be required to produce regional estimates of stock composition equivalent to those available from the GAPS microsatellites. When the enhanced microsatellite baselines were evaluated, 79–88 SNPs were projected to be required for the equivalency of regional accuracy and precision for the enhanced DFO microsatellites, and 68–88 SNPs were projected to be required for the GAPS baseline (Table 10).
The average population-specific accuracy was 83.0% for the DFO microsatellites, 83.1% for the GAPS microsatellites, 85.6% for the enhanced DFO microsatellites, and 85.5% for the enhanced GAPS microsatellites (Table 5). The 72 SNPs evaluated already provided population-specific accuracy equivalent to that available from the existing DFO and GAPS microsatellites for most populations (Figure A.2). The accuracy available from the enhanced DFO or GAPS suites of microsatellites was projected to require between 118 and 122 SNPs (Table 10). The variability of the population-specific estimates derived from the existing DFO and GAPS microsatellites was less than that available from the 72 SNPs for most populations (Figure A.1). The precision of the population-specific estimates available from the existing microsatellite baselines was projected to require 122 SNPs for equivalency with that of the DFO microsatellites and 118 SNPs for equivalency with that of the GAPS microsatellites. The enhanced microsatellite baselines were projected to require 179 SNPs and 166 SNPs, respectively, for equivalency of the precision of population-specific estimates (Figure 5; Table 10).
TABLE 10.
Estimated number of single-nucleotide polymorphisms (SNPs) required to equal the performance of different suites of microsatellites with respect to accuracy and precision (SD) of population and regional estimates of stock composition, as well as the percent correct individual assignment to population and region, based on fitting functions to the results in Table 9. The mean accuracy and precision and the accuracy of individual assignments obtained from the different suites of microsatellites are given in Table 5.
The average accuracy of the assignment of individuals to specific regions was 87.0% for the DFO microsatellites and 87.3% for the GAPS microsatellites, and 73–74 SNPs of the average quality evaluated in the study were estimated to be required to provide equivalent accuracy in the assignment of individuals to region. The SNPs surveyed in the study provided equivalent levels of regional assignment accuracy for individuals. The level of regional accuracy of individual assignment available from the enhanced DFO and GAPS suites of microsatellites was projected to require 90 and 82 SNPs, respectively (Figure 5; Table 10). Assignment of individuals to specific populations was achieved with an average accuracy of 61.5% for the DFO microsatellites and 61.8% for the GAPS microsatellites. These levels of accuracy were projected to be achieved with 93–94 SNPs of the average quality surveyed. The levels of accuracy of individual assignment available from the enhanced DFO and GAPS suites of microsatellites were projected to require 137 and 121 SNPs, respectively (Table 10).
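The projections in this section rest on fitting functions to performance as the number of SNPs increases (Table 9) and then solving for the number of loci at which the fitted curve reaches a microsatellite benchmark. The Python sketch below illustrates that general procedure only. The functional form (performance linear in the logarithm of the number of SNPs) is an assumption, as the functions actually fitted are not specified here; the function names are hypothetical; and the demonstration values are synthetic points generated from a known curve, not results from Table 9. Only the panel sizes are taken from the Table 9 caption.

```python
import math


def fit_log_curve(n_snps, values):
    """Least-squares fit of value = a + b * ln(n). Assumed form, for illustration."""
    xs = [math.log(n) for n in n_snps]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(values) / len(values)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, values))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def snps_required(a, b, target):
    """Whole number of loci at which the fitted curve reaches the target value."""
    n = math.exp((target - a) / b)
    # Small tolerance so floating-point noise does not push 100.0 up to 101.
    return math.ceil(n - 1e-9)


# Panel sizes evaluated in Table 9.
panel_sizes = [10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 72]

# SYNTHETIC demonstration data from a known curve; NOT values from the study.
true_a, true_b = 20.0, 10.0
synthetic = [true_a + true_b * math.log(n) for n in panel_sizes]

a, b = fit_log_curve(panel_sizes, synthetic)
benchmark = true_a + true_b * math.log(100)  # curve value at 100 loci
print(snps_required(a, b, benchmark))
```

Because the synthetic points lie exactly on the assumed curve, the fit recovers the generating parameters and the inversion returns the panel size used to define the benchmark. With real data the projection extrapolates beyond the largest panel evaluated (72 SNPs), so it is sensitive to the functional form chosen.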
DISCUSSION
Sample Size
Two alleles were observed at the SNP loci, but up to 62 alleles were observed at the microsatellite loci. Therefore, microsatellites require more fish to be sampled in a population than do SNPs to obtain estimates of allele frequencies with similar accuracy and precision. Beacham et al. (2011) demonstrated that microsatellites required larger baseline samples than SNPs to reduce the sampling variation in the estimation of allele frequencies and thus increase the accuracy of the estimated stock compositions. The population sample size was required to be about two to three times larger in the microsatellite baselines than in the SNP baselines before equivalent levels of accuracy relative to the asymptotic value were obtained. As most population sample sizes were approximately 100 individuals for the survey of SNP variation in our study, microsatellite sample size was capped at a maximum of 200 individuals per population to estimate microsatellite allele frequencies. The accuracy of the estimated stock composition of single-population samples of salmon derived from microsatellites can be limited by the baseline population sample size, with sample sizes of approximately 200 individuals being required before there is little effect of sample size on the accuracy of population-specific estimates (Beacham et al. 2006a). For sockeye salmon O. nerka, Beacham et al. (2010) showed that once approximately 95 individuals within a population had been sampled at SNP loci, there was virtually no increase in the accuracy of estimated stock compositions or individual assignments. Thus, the accuracy of the estimated stock compositions and assignments of individuals derived from SNPs was not limited by population sample size in our study. Comparison of the utility of SNPs and microsatellites for stock identification requires that adequate sample sizes be available for the populations included in the analyses for both classes of markers.
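The link between allele number and required sample size can be illustrated with the binomial sampling error of an allele frequency estimate. The Python sketch below is a simplified illustration, not the analysis used in the studies cited: it assumes random sampling of 2N gene copies from N diploid fish, and the allele frequencies (0.5 for a SNP allele, 0.05 for one of many microsatellite alleles) are hypothetical values chosen only to show the contrast. The function names are likewise hypothetical.

```python
from math import sqrt


def relative_se(p: float, n_fish: int) -> float:
    """Binomial SE of an allele frequency estimate, divided by p (2N gene copies)."""
    return sqrt(p * (1 - p) / (2 * n_fish)) / p


def fish_needed(p: float, target_rel_se: float) -> float:
    """Number of diploid fish at which an allele of frequency p reaches the target relative SE."""
    return (1 - p) / (2 * p * target_rel_se ** 2)


# Hypothetical frequencies: a SNP allele at 0.5, a microsatellite allele at 0.05.
print(f"SNP allele, 100 fish:            {relative_se(0.5, 100):.3f}")
print(f"Microsatellite allele, 200 fish: {relative_se(0.05, 200):.3f}")
print(f"Fish for 10% relative SE: SNP {fish_needed(0.5, 0.1):.0f}, "
      f"microsatellite {fish_needed(0.05, 0.1):.0f}")
```

In this simplified setting, a low-frequency allele is estimated with a larger relative error than a common SNP allele even when twice as many fish are sampled, which is consistent with the need for larger baseline samples at highly polymorphic loci.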
Relative Ranking of the Markers
In the current study, the survey of microsatellite variation included loci with 7–62 alleles. The number of alleles observed at a microsatellite locus was related to the accuracy and precision of the estimated stock compositions. Microsatellites with larger numbers of alleles provided more accurate and precise estimates than did microsatellites with few alleles. Similar empirical results have previously been reported for Chinook salmon in the Yukon River drainage (Beacham et al. 2008b) and on a Pacific Rim basis (Beacham et al. 2006a). Of the top 10 microsatellites surveyed in the current study, 5 were also in the top 10 evaluated in the Yukon River; 2 of the remaining loci were not surveyed in the previous study and 3 were of lesser value. The 5 loci in the top 10 in both studies were Ots100, Ots107, Oki100, Omm1080, and Ots211, which, given the widely divergent geographical settings of the two studies, is indicative of the general power of these loci for stock identification applications.
Anderson (2010) reported that the accuracy of population assignment for a set of genetic markers can be overestimated in routine applications if the suite of markers has been chosen specifically for the observed differentiation in allele frequency among existing population samples. Sampling error in the estimation of baseline allele frequencies may lead to an optimistic assessment of their accuracy in routine applications. Accuracy would ideally be tested with individuals not included in the baseline samples. In our study, all of the samples available for a population were used to determine SNP allele frequencies, and all of the samples were used for 62% of the populations surveyed for microsatellites. Although the accuracy of population assignment was not tested with new individuals, the trends in accuracy among the different suites of microsatellites and SNPs would probably not have been affected.
Although other methods for evaluating the relative power of loci for population differentiation and individual assignment have been applied (Rosenberg et al. 2003; Hedrick 2005; Narum et al. 2008), power for stock identification was the sole basis of evaluating the loci surveyed in the study. In a previous study evaluating the effectiveness of 13 microsatellites and 37 SNPs for assigning individuals to 29 specific populations of Chinook salmon, Narum et al. (2008) reported that the best 10 loci for correct individual assignment were microsatellites and that the best 15 loci included 12 microsatellites and 3 SNPs. Assignment accuracy with the combined markers was higher than with either SNPs or microsatellites alone. In our study, which centered on the accuracy of population-specific estimates of stock composition as a measure of power of the locus for stock identification and which incorporated 29 microsatellites and 73 SNPs across 60 populations in British Columbia, the top 29 of the 102 markers evaluated were all microsatellites, with the accuracy of the estimated population-specific stock compositions produced by incorporating the least-informative microsatellite being approximately 2.5 times that of the highest-ranked SNP. The 12 highest-ranked SNPs in the study of Narum et al. (2008) were incorporated in our survey, and all of these SNPs were found to be less valuable in estimating stock composition than the least-powerful microsatellite.
DFO and GAPS Microsatellites
The accuracy and precision of the stock composition estimates provided by the 12-locus DFO and the 13-locus GAPS sets of microsatellites were essentially equivalent, although both were lower than that provided by an optimum set of 13 microsatellites. If increased accuracy and precision of stock composition estimates is required for either set of loci, there are three potential solutions. The first potential solution is to increase the number of microsatellites incorporated in the suites of loci, the second is to augment the current suite of microsatellites with higher-resolution SNPs, and the third is to replace the microsatellites entirely with a set of SNPs. Increasing the resolution of stock composition estimates in the real world is constrained by the cost of producing the estimates for a particular sample.
On a practical basis, when microsatellites are used for stock composition estimation, constraining the cost typically centers on limiting the number of injections on the automated sequencer. If the number of injections is arbitrarily capped at three for analysis of Chinook salmon microsatellite variation, then four microsatellites can be added to the DFO suite and two to the GAPS suite. When these additional microsatellites were incorporated into our analysis, the improvements in the accuracy and precision of the estimated stock compositions were similar with both suites of loci, with an average population-specific accuracy of 85.5% over all populations and an average standard deviation of 2.5%. When the second option (augmenting the microsatellites with a suite of high-resolution SNPs) was employed, the addition of 20–25 SNPs achieved the accuracy and precision of the population-specific stock composition estimates achieved with the augmented suite of microsatellites. If the 85.5% average population-specific accuracy and 2.5% standard deviation observed in our survey are considered acceptable for management applications, the choices are clear as to how to improve the accuracy of estimated stock compositions. For the GAPS microsatellites, adding two microsatellites to the suite will increase the cost of analysis by the expense of conducting two polymerase chain reactions and the time required to analyze two additional loci. The resulting cost would be compared with that of surveying an additional 20–25 SNPs. These cost comparisons can be made by the individual laboratories that apply the GAPS microsatellites; in our laboratory, it is more efficient to add two microsatellites to the survey than it is to add 20–25 SNPs.
Microsatellites or SNPs?
Accurate and precise regional estimates of stock composition are generally the easiest to produce, followed by population-specific estimates, with the assignment of individuals to specific populations being the most difficult. Assessment of the accuracy and precision of the regional estimates of stock composition produced by the application of 72 SNPs indicated that they were equivalent to the accuracy and precision available from the existing DFO and GAPS microsatellites. If that level of resolution is all that is required, either the existing sets of microsatellites or the SNPs could be utilized. However, if the existing DFO and GAPS microsatellites are enhanced by either four or two microsatellites, respectively, the SNP baseline evaluated would require some enhancement. If population-level estimates of stock composition are required for some regions of the baseline utilized, an additional 46–50 SNPs of the average quality evaluated in the study would be required to provide population-specific accuracy and precision comparable to those obtained with the microsatellites. An additional 94–107 SNPs were projected to be required if population-specific results comparable to those available from the enhanced microsatellite baselines for British Columbia Chinook salmon were required. Single-nucleotide polymorphism arrays containing 100–200 loci were also estimated to be required to meet management standards for fine-scale resolution of Columbia River Chinook salmon (Hess et al. 2011). If individual assignment is part of the stock identification application, then no enhancement of the existing SNP baseline is required if the regional levels of accuracy provided by the DFO and GAPS microsatellites are acceptable, and only about 20 additional SNPs would be required if the enhanced microsatellite baselines were utilized. An additional 49–65 SNPs were projected to be required if population-specific assignment results comparable to those available from the enhanced microsatellite baselines were required. However, if SNPs are developed that provide higher resolution than the average of the 72 SNPs used in our projections, fewer additional SNPs will be required to produce stock composition results of a quality equivalent to that of the microsatellites.
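The numbers of additional SNPs quoted above follow from subtracting the 72 SNPs already in the panel from the projected totals reported in the Results. The short Python sketch below makes that arithmetic explicit; the projected totals are those given in the text, whereas the dictionary layout and variable names are illustrative only.

```python
BASE_PANEL = 72  # SNPs in the panel used for the projections

# Projected total SNPs required for equivalency, as (DFO, GAPS), from the Results.
projected = {
    "population composition, existing suites": (122, 118),
    "population composition, enhanced suites": (179, 166),
    "regional assignment, enhanced suites": (90, 82),
    "population assignment, enhanced suites": (137, 121),
}

# Additional SNPs beyond the existing 72-SNP panel.
additional = {
    task: tuple(total - BASE_PANEL for total in totals)
    for task, totals in projected.items()
}

for task, (dfo, gaps) in additional.items():
    print(f"{task}: DFO +{dfo}, GAPS +{gaps}")
```

The differences reproduce the ranges given in the text: 46–50 and 94–107 additional SNPs for population-specific composition estimates, 10–18 (about 20) for regional assignment, and 49–65 for population-specific assignment.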
The cost of laboratory analysis for individual fish is a key factor in deciding the appropriate technology to use in a particular laboratory. Technologies are replaced when one provides a clear advantage over another, such as substituting microsatellites or SNPs for allozymes in analyses of the genetic variation in Pacific salmon. Although a number of techniques are available to survey SNP variation, we are aware of none that will allow well over 100 SNPs to be analyzed in our laboratory at a cost comparable to that of analyzing up to 16 microsatellites with three injections on an automated DNA sequencer. For the present, the combined microsatellite-SNP approach outlined by Narum et al. (2008) and Hess et al. (2011) may be a practical approach to incorporating the power of both classes of markers.
ACKNOWLEDGMENTS
We thank various DFO field staff for collection of the samples used in the development of the baseline. Funding for the SNP portion of the study was provided by the Northern and Southern Endowment Funds of the Pacific Salmon Commission (PSC); funding for the GAPS microsatellite survey was provided by the two endowment funds as well as the Chinook Technical Committee of the PSC to support the technical work required for the U.S. section of the Pacific Salmon Treaty in implementation of the Chinook Salmon Agreement; and funding for the DFO microsatellite survey was provided by the DFO. L. Fitzpatrick drafted the map of population locations.