Forward genetics is a powerful tool to uncover the genetic mechanisms regulating biological processes. Moving from mutants to genes, or cloning, can be accomplished by a variety of methods. However, the ease and popularity of chemical mutagenesis, particularly with ethyl methanesulfonate (EMS), have generated a large collection of mutants in diverse plant species for which a map-based or positional cloning approach is appropriate.
Map-based cloning (Fig. 1) uses the techniques of genetic mapping to define a progressively narrower chromosomal interval containing the mutant locus until the lesion causing the mutant phenotype is identified (Lukowitz et al., 2000). The strength of positional cloning is that it can be applied to mutations derived from almost any source (e.g., transposon, chemical, radiation, natural variation), as long as that mutation defines a locus that can be mapped.
Maize (Zea mays L.) has a long history of genetic research, and consequently a diverse collection of mutations affecting a broad range of biological processes has accumulated (Neuffer et al., 1997). Traditional cloning approaches in maize (e.g., candidate gene approaches, or transposon tagging) are labor intensive and risky. The large genome of maize and the lack of sequence information effectively discouraged any attempt at positional cloning, as it was assumed that extremely large mapping populations would be necessary to provide sufficient resolution. Undeterred, several laboratories have demonstrated that positional cloning in maize is generally no more difficult than in rice or even Arabidopsis (DC.) Heynh. (see, e.g., Wang et al., 2005; Bortiri et al., 2006; Gallavotti et al., 2008; Whipple et al., 2010). Although the maize genome is approximately 20 times larger than that of Arabidopsis, the majority of this size increase is the result of retrotransposon amplification, creating large non-recombining heterochromatic seas interspersed with highly recombining, gene-rich euchromatic islands (SanMiguel et al., 1996; Fu et al., 2002). These poorly recombining tracts of retrotransposons effectively disappear during mapping, leaving a much smaller, gene-rich portion of the genome that is “seen” in a genetic map. With the recent completion of the maize genome sequence (Schnable et al., 2009), positional cloning is rapidly becoming routine, opening up the large and diverse collection of maize mutants to cloning. While positional cloning is appropriate for cloning any trait that can be genetically mapped, there are situations for which a positional cloning approach may not be the best option. The high resolution needed for fine mapping requires a reasonable recombination frequency, which may be suppressed for loci that map to centromeres or other poorly recombining regions. Similarly, mutations associated with chromosomal rearrangements can prove difficult to map.
A universal protocol for positional cloning in maize would be unwieldy because the steps taken will depend upon the particularities of the locus to be cloned. Consequently, we will focus here on the most likely case, a single-locus recessive or dominant mutant, while at the same time describing some of the more common alternatives that may need to be considered. Recently, high-throughput sequencing technology has opened up the possibility of mutant mapping and cloning with a single sequencing run, dramatically reducing the time typically spent on map-based cloning (Williams-Carrier et al., 2010; Schneeberger and Weigel, 2011; Abe et al., 2012; Liu et al., 2012). These approaches and the rapidly decreasing cost of sequencing will likely open many maize mutants to cloning by sequencing in the near future. However, even with these advances some loci will be recalcitrant to a sequencing approach, and will thus require the traditional approach described here.
METHODS AND RESULTS
1. Genetic characterization of mutant —Poorly studied or recently isolated mutants will need to be genetically characterized, as the mapping strategy ultimately employed will depend on the nature of the mutation. The steps below outline a basic genetic characterization.
a. Determine the mode of inheritance. Cross the mutant to wild-type maize inbreds to create multiple F1s (Appendix 1, Note 1). If F1 progenies from these crosses display the mutant phenotype, the mutation is either dominant or semidominant. If all F1 progenies are phenotypically normal (hereafter, “wild type” will also be used to indicate phenotypically normal plants that can, however, carry polymorphisms compared to the B73 reference genome sequence), then the mutation is likely recessive. Self-pollinate the F1 plants from each distinct inbred cross to generate F2 populations. Recessive mutations should be seen segregating in these populations.
b. Determine the number of loci involved. Examine the frequency of the mutant phenotype in an F2 population. A population of approximately 100 F2 individuals from each F1 should be sufficient to determine the segregation ratio. A single recessive locus should conform to the expected 3:1 (normal:mutant) ratio. This can be confirmed by a χ² goodness-of-fit test to the expected 3:1 ratio. Any significant deviation from this ratio could indicate that multiple loci contribute to the mutant phenotype (Appendix 1, Note 2).
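The goodness-of-fit test in step 1b can be sketched as follows. This is a minimal illustration using only the Python standard library; the function name and the plant counts are hypothetical and are not part of the published protocol. With two phenotypic classes the test has one degree of freedom, for which the upper-tail probability can be written with the complementary error function.

```python
import math


def chi_square_3_to_1(n_normal, n_mutant):
    """Chi-square goodness-of-fit test against a 3:1 (normal:mutant) ratio.

    Returns (chi2, p_value). The function name is illustrative only.
    """
    total = n_normal + n_mutant
    if total == 0:
        raise ValueError("no plants were scored")
    expected_normal = 0.75 * total
    expected_mutant = 0.25 * total
    chi2 = ((n_normal - expected_normal) ** 2 / expected_normal
            + (n_mutant - expected_mutant) ** 2 / expected_mutant)
    # Two classes give 1 degree of freedom; for df = 1 the upper-tail
    # probability of the chi-square distribution is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


if __name__ == "__main__":
    # Hypothetical counts from a ~100-plant F2 family.
    chi2, p = chi_square_3_to_1(n_normal=78, n_mutant=22)
    print(f"chi2 = {chi2:.2f}, p = {p:.3f}")
```

A p-value above the chosen threshold (commonly 0.05) means the counts are compatible with a single recessive locus; it does not by itself exclude more complex inheritance.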
c. Identify inbred lines that show consistent and strong expressivity (phenotype severity) and penetrance (i.e., plant phenotype consistently matches the genotype) of the mutant phenotype. Because the genetic background in maize can dramatically affect the severity, or even presence, of the mutant phenotype, F2 populations from crosses to inbreds where the mutant phenotype is consistent and easy to score will be very helpful. This will avoid problems when mapping, where it is imperative that the genotype at the mutant locus can be confidently determined from the plant phenotype (Appendix 1, Note 3).
2. Create a mapping population—After initial genetic characterization, an appropriate mapping strategy should be chosen. Because the creation of the mapping population is often the rate-limiting step in positional cloning, it is advisable to initiate multiple mapping populations as soon as possible. Ultimately, only polymorphic populations that are well-suited to your mutant of interest need to be used. The two most common and useful mapping strategies involve creation of F2 or backcross (BC) mapping populations. The steps in creating each mapping population are described below.
a. Cross the mutant to many diverse inbred lines (Appendix 1, Note 4). This is the first step for both BC and F2 populations. If possible, choose inbreds that are likely to maximize the number of polymorphic markers. Fig. 2A shows the relationships of common inbreds and can provide a rough guide for choosing lines that are more likely to be polymorphic. The crosses initiated during the genetic characterization can be used for this step.
b. Generate BC populations (Appendix 1, Note 5) by crossing the F1 back to one of the parental lines originally used to generate the F1. This backcross can be to either the mutant parent or to the wild-type inbred. The first case will generate progeny that are 50% homozygous mutant and 50% heterozygous, while a backcross to the inbred parent will create 50% heterozygous and 50% homozygous wild-type progeny. As a general rule, a backcross to the wild-type parent is appropriate for dominant/semidominant mutants, while recessive mutants must be backcrossed to the mutant parent to see the phenotype in the next generation (see Fig. 2B).
c. Generate F2 mapping populations (Appendix 1, Note 6) by selfing the F1 individuals (step 2a) to create segregating F2 populations. The phenotypic segregation of the mutant phenotype will depend on the nature of the mutation: recessive, dominant, or semidominant (Fig. 2C). In an F2 population for a recessive mutant, only the 25% mutant individuals will have a known genotype at the mutant locus, because the remaining wild-type plants can be either homozygous wild type or heterozygous. For a dominant mutant the situation is reversed, and only the 25% wild-type individuals will have a known genotype (homozygous wild type) because mutant individuals can be either homozygous or heterozygous for the dominant mutant allele. A semidominant mutant will segregate three phenotypic classes, and each class will have a known genotype.
3. Bulked segregant analysis—rough mapping —In this section, we describe a general method for bulked segregant analysis (BSA), used to quickly identify molecular markers linked to any trait of interest (Michelmore et al., 1991). BSA involves creation of two pools by bulking individuals identical in phenotype (e.g., wild type and mutant) from a segregating population. The pooled DNA samples are then genotyped with molecular markers that span the entire genome. Markers that are not linked to the mutant will show no difference in the wild-type and mutant pools, while linked markers will show an overrepresentation of the mutant parent allele only in the mutant pool. Although a variety of markers have traditionally been used for BSA, new methodologies have been developed for the rapid detection of single nucleotide polymorphisms (SNPs) that provide a more efficient alternative for this preliminary mapping step. One example is the MassARRAY System (Sequenom, San Diego, California, USA), which has been optimized at Iowa State University for the simultaneous detection of approximately 1000 SNPs polymorphic between B73 and Mo17 inbreds (Liu et al., 2010). The same set of markers can also be used for mapping populations created with different genetic backgrounds, with the caveat that only a subset of the B73-Mo17 SNPs will be polymorphic. In our experience, MassARRAY has also succeeded with the following genetic backgrounds: A619 × B73, A632 × Mo17, and A632 × Oh43. The steps below are appropriate for BSA using either MassARRAY or more traditional simple sequence repeat (SSR) markers (Taramino and Tingey, 1996). Regardless of the marker type used in BSA, the preparation of the pooled samples is identical.
a. Score your mapping population for both mutant and wild-type individuals. Flag and number the mutant and wild-type plants for collecting tissue.
b. Collect tissue from each of the mutant and wild-type plants scored in step 3a for DNA extraction. Pool an equal amount of tissue (leaf disks or equivalent) from at least 10–20 individual plants for both wild-type and mutant pools (Appendix 1, Note 7). We generally use a standard single-hole paper punch modified so that a 15-mL Falcon tube can be attached for collection of the leaf disks (punches) of uniform size (Fig. 3; Appendix 1, Note 8). Punch the same leaf three times into the collection tube to ensure that enough tissue is obtained from each sample in the pool. Be consistent with all the samples collected, to minimize sample-to-sample variation in the amount of tissue pooled.
c. Independently, collect individual leaf samples from the same individuals used to prepare the pools, and store them at −80°C. These samples will be used to confirm linkage in step 3g.
d. Extract genomic DNA from each pool. General protocols for extracting genomic DNA are usually sufficient. If samples are to be prepared for the MassARRAY System, we use a protocol (Appendix 2) that would also be appropriate for a standard BSA using SSR markers.
e. This and the following step (3f) can be skipped if BSA is performed using the MassARRAY System. If not using MassARRAY, analyze the mutant and wild-type pooled samples with SSR markers that are distributed across the whole genome and are polymorphic between the two parental lines used to generate the mapping population (Appendix 1, Note 9). It is also advisable to analyze, alongside the two pools, DNA from the parental lines and the corresponding F1 as controls. PCR-based SSR markers are highly polymorphic, inexpensive, and easy to use (Weber and May, 1989; Taramino and Tingey, 1996), and more than 1300 maize SSR loci have been mapped. To increase the likelihood of identifying a linked marker, select a polymorphic SSR every 1–2 bins (bins are arbitrary genetic units of maize chromosomes and correspond approximately to 20 cM). Mapped, polymorphic SSRs can be selected from those catalogued at the Maize Genetics and Genomics Database (MaizeGDB, http://www.maizegdb.org [Appendix 1, Note 9]). A list of polymorphic SSRs appropriate for BSA between A619 and B73 is provided in Table 1, and others have been identified for W22 × B73 (or Mo17) (Martin et al., 2010). Arrange the SSR set in a 96-well plate format (with forward and reverse primers for each SSR combined in a single well) to facilitate a high-throughput analysis of the DNA pools by PCR. These primers will then be used to set up five separate PCR reactions in a 96-well format using the following DNA templates: wild-type parent (1), mutant parent (2), wild-type pool from mapping population (3), mutant pool from mapping population (4), and F1 of mutant and wild-type parent (5). For PCR, use a touchdown protocol with the annealing temperature stepping down (−1°C/cycle) from 65°C to 55°C, followed by 30 cycles at 55°C, and a 45 s extension time for all cycles.
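The touchdown program described in step 3e can be written out cycle by cycle, which is convenient when entering it into a thermal cycler. The sketch below is illustrative only: it assumes that the touchdown phase covers 65°C down to 56°C (one cycle per degree) and that the 30 plateau cycles at 55°C follow, which is one reasonable reading of the protocol. The function name and default parameters are hypothetical.

```python
def touchdown_schedule(start_c=65, end_c=55, step_c=1, plateau_cycles=30):
    """Return the annealing temperature (in degrees C) for every PCR cycle.

    Assumption: one cycle is run at each temperature above end_c, and the
    plateau cycles at end_c follow the touchdown phase.
    """
    temps = []
    temp = start_c
    # Touchdown phase: lower the annealing temperature by step_c each cycle.
    while temp > end_c:
        temps.append(temp)
        temp -= step_c
    # Plateau phase: repeat the final annealing temperature.
    temps.extend([end_c] * plateau_cycles)
    return temps


if __name__ == "__main__":
    schedule = touchdown_schedule()
    for cycle, temp in enumerate(schedule, start=1):
        # The 45 s extension time stated in the protocol applies to all cycles.
        print(f"cycle {cycle:2d}: anneal at {temp} C, extend 45 s")
```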
f. Analyze the PCR reactions on Agarose 3:1 HRB (AMRESCO, Solon, Ohio, USA) or MetaPhor Agarose (3–4%; Lonza, Basel, Switzerland) gels in 1× TBE, electrophoresed for 1–2 h at 100–120 V, or until all bands resolve. For each SSR marker, load each of the five PCR reactions set up in step 3e into adjacent lanes. Potentially linked SSRs will be different between the mutant and wild-type DNA pools (Fig. 4). For a recessive mutation, any linked SSR marker should show only the band corresponding to the mutant parent allele in the mutant pool sample, while the wild-type pool should carry both the wild-type and mutant parental alleles (Fig. 4), although there is often a faint wild-type band in the mutant pool (Appendix 1, Note 10).
g. Confirm the linkage of the polymorphic SSRs from step 3f by genotyping DNA extracted from the individual leaf samples of each plant that contributed to the pools (collected in step 3c). A linked marker should show an overrepresentation of the mutant parental allele in the mutant samples. Verify the significance of any potential linkage with a χ² test. If using the MassARRAY System, select a few SSRs in the same chromosomal region for confirmation.
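One way to carry out the significance test in step 3g is sketched below for the case of a recessive mutant scored in an F2 population with a codominant marker. Under the null hypothesis of no linkage, mutant individuals are expected to show the three marker genotypes in a 1:2:1 ratio. The function name and the counts are hypothetical, and other population types (e.g., BC) would need different expected ratios.

```python
import math


def linkage_chi_square_f2(n_mut_hom, n_het, n_wt_hom):
    """Test marker genotypes of F2 mutant plants against 1:2:1 (no linkage).

    n_mut_hom: plants homozygous for the mutant-parent marker allele
    n_het:     heterozygous plants
    n_wt_hom:  plants homozygous for the wild-type-parent marker allele
    Returns (chi2, p_value).
    """
    total = n_mut_hom + n_het + n_wt_hom
    if total == 0:
        raise ValueError("no plants were genotyped")
    observed = (n_mut_hom, n_het, n_wt_hom)
    expected = (0.25 * total, 0.50 * total, 0.25 * total)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Three classes give 2 degrees of freedom; for df = 2 the upper-tail
    # probability of the chi-square distribution is exp(-chi2 / 2).
    p_value = math.exp(-chi2 / 2.0)
    return chi2, p_value


if __name__ == "__main__":
    # Hypothetical genotypes of 20 mutant plants at one SSR marker.
    chi2, p = linkage_chi_square_f2(n_mut_hom=18, n_het=2, n_wt_hom=0)
    print(f"chi2 = {chi2:.1f}, p = {p:.2e}")
```

A very small p-value together with an excess of the mutant-parent allele supports linkage; a small p-value caused by an excess of the other allele would instead suggest a scoring or sample-tracking problem.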
4. Fine mapping —While BSA (step 3) provides a rough chromosomal location, it is still necessary to substantially refine the map position of the mutant of interest for the eventual cloning of the gene. This will require developing new molecular markers and transitioning from a genetic map to a physical map. Keep in mind that the resources available for fine mapping are quickly evolving and constantly updated (a list of useful databases is provided in Table 2). Furthermore, with the release of the first assembled genomic sequence for maize (Schnable et al., 2009), a wealth of additional information became available. For these reasons, we recognize that there are multiple, yet equally valid, ways to proceed. Here we describe a procedure that has worked successfully for us in the past.
Before proceeding, it is important to understand that the maize physical map is constructed of contigs and bacterial artificial chromosomes (BACs). A contig is a set of overlapping (contiguous) DNA fragments. BACs are bacterial plasmids carrying large fragments (∼100–150 kb) of genomic DNA that have been used to assemble the sequence of the maize genome. Large contigs are usually assembled by overlapping sequences of BAC clones. Genetic markers often define a unique sequence that has been localized to a specific BAC, and thus to a specific contig. These markers are said to be “anchored” and make it possible to transition from the genetic map to a physical map location. Anchoring information for individual markers is usually available at MaizeGDB (http://www.maizegdb.org; Table 2).
a. Identify polymorphic markers flanking the mutation (i.e., markers located proximal and distal to the mutation). These markers will define the interval on a genetic map containing the mutant locus. Using the map location determined in step 3g, select additional polymorphic SSRs in flanking bins. Genotype each individual from the mutant pool with these new markers (if dealing with a BC population, both mutant and wild-type individuals should be genotyped). Identify individuals that are recombinant between these SSRs and the mutant locus. Recombinants are individuals whose SSR genotype does not match the genotype at the mutant locus. These recombinant individuals make it possible to order the SSR loci relative to the mutant. Any markers that are on the same side of the mutation should have recombination in the same F2 or BC individuals, with closer markers showing recombination in a progressively smaller subset of the F2 or BC individuals. Markers that are on the opposite sides of the mutation should show recombination in different F2 or BC individuals (Appendix 1, Note 11). Identify two SSR markers located on the opposite sides of the mutant locus (Fig. 5) and calculate the recombination frequency between the mutant locus and the SSR markers to establish an ordered genetic map. If possible, find flanking markers that are each ∼5 cM or less from the mutant to reduce double recombination events that will complicate later mapping.
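The recombination frequency calculation and the marker-ordering logic in step 4a can be sketched as follows. The example assumes a recessive mutant scored in F2 mutant plants, where every plant contributes two chromosomes and each wild-type-parent marker allele counts as one recombinant chromosome. The function names, plant identifiers, and counts are hypothetical, and the percentage only approximates the distance in cM when the marker is close to the mutant locus.

```python
def recombination_frequency_f2_mutants(n_mut_hom, n_het, n_wt_hom):
    """Percent recombination between a marker and a recessive mutant locus.

    Counts are marker genotypes of F2 mutant plants: homozygous for the
    mutant-parent allele, heterozygous, homozygous for the wild-type allele.
    """
    n_plants = n_mut_hom + n_het + n_wt_hom
    if n_plants == 0:
        raise ValueError("no plants were genotyped")
    chromosomes = 2 * n_plants
    # A heterozygote carries one recombinant chromosome, a wild-type
    # homozygote carries two.
    recombinant_chromosomes = n_het + 2 * n_wt_hom
    return 100.0 * recombinant_chromosomes / chromosomes


def relative_position(recombinants_a, recombinants_b):
    """Compare the sets of recombinant plants found with two markers."""
    if not recombinants_a or not recombinants_b:
        return "uninformative"
    if recombinants_a <= recombinants_b or recombinants_b <= recombinants_a:
        # Nested recombinants: the marker with fewer recombinants is closer.
        return "same side"
    if recombinants_a.isdisjoint(recombinants_b):
        return "opposite sides"
    return "ambiguous"


if __name__ == "__main__":
    print(recombination_frequency_f2_mutants(95, 5, 0))  # hypothetical counts
    print(relative_position({"p3", "p7", "p12"}, {"p7"}))
    print(relative_position({"p3"}, {"p9"}))
```

An "ambiguous" result (partly overlapping sets) may reflect double recombinants or genotyping errors and is worth rechecking before the map order is fixed.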
b. Extract genomic DNA from all the mutant samples of a large (approximately 500 chromosomes, see Appendix 1, Note 12) F2 mapping population. If using a BC population with either a recessive or a dominant mutant, also extract all wild-type samples (see Fig. 2B). A “quick-and-dirty” protocol (Appendix 2) for DNA extraction of the entire mapping population is suggested to reduce time. Most of these samples will not be used, as only the recombinants are informative for fine mapping. Store numbered leaf samples from each individual at −80°C, as these will provide a backup source of DNA and allow for a cleaner DNA extraction from just the recombinant individuals.
c. Genotype each sample from step 4b with the flanking SSR markers determined in step 4a. Identify recombinant individuals with each flanking marker; these should comprise a largely nonoverlapping set (Fig. 5). If there are more available polymorphic SSRs within the region defined by the flanking markers, use these to genotype only the recombinants and further narrow the interval. Ultimately, a set of recombinant individuals for each of your closest flanking markers will be identified (Appendix 1, Note 13). Because these recombinants will be genotyped with several more markers in subsequent steps, we suggest extracting DNA from the leaf samples set aside earlier at −80°C (step 4b).
d. Analyze the corresponding physical map of the genetic interval defined by the flanking SSR markers. Anchor these markers by identifying BACs, or sequence contigs containing them, at MaizeGDB (http://www.maizegdb.org) or Gramene (http://www.gramene.org), taking care to search the most recent genome assembly; these will define the physical interval containing your gene. Determine the ratio of the genetic:physical distances for your interval by dividing the genetic distance (in cM) by the physical distance (in megabase pairs [Mb]). This ratio (cM/Mb) provides an estimate of the recombination rate in your interval, and can help determine whether the recombinants remaining will narrow the region to a reasonable number of genes to evaluate in step 5 (Appendix 1, Note 14).
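The cM/Mb calculation in step 4d, and a rough estimate of the physical resolution a mapping population can provide, can be sketched as follows. The second function is an illustration rather than part of the protocol: it assumes that recombination is uniform across the interval and that one recombinant among N chromosomes corresponds to 100/N cM. All names and numbers are hypothetical.

```python
def recombination_rate(genetic_cm, physical_mb):
    """Return the recombination rate of an interval in cM per Mb."""
    if physical_mb <= 0:
        raise ValueError("physical distance must be positive")
    return genetic_cm / physical_mb


def expected_resolution_kb(n_chromosomes, cm_per_mb):
    """Average physical distance (kb) represented by a single recombinant.

    Assumes uniform recombination across the interval, which is a
    simplification for the maize genome.
    """
    if n_chromosomes <= 0 or cm_per_mb <= 0:
        raise ValueError("inputs must be positive")
    cm_per_recombinant = 100.0 / n_chromosomes
    return 1000.0 * cm_per_recombinant / cm_per_mb


if __name__ == "__main__":
    # Hypothetical interval: 4 cM between flanking markers spanning 2 Mb.
    rate = recombination_rate(genetic_cm=4.0, physical_mb=2.0)
    print(f"{rate:.1f} cM/Mb")
    print(f"{expected_resolution_kb(500, rate):.0f} kb per recombinant")
```

If the estimated resolution is much larger than the spacing of genes in the interval, a larger mapping population is likely to be needed before candidate genes can be evaluated.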
e. After exhausting all of the polymorphic SSR markers in the region, if the physical region is still too large (see step 4f below) it will be necessary to develop new markers to further restrict the interval. There are multiple ways to proceed as there are different types of molecular markers. This step requires the use of information from different databases (Table 2). Depending on the background of the mapping population, there may already be information available on polymorphic SNPs, SSRs, and insertion-deletions (indels) (Qu and Liu, 2013; Xu et al., 2013; Settles et al., 2014) (Appendix 1, Note 15). If using backgrounds for which SNP information is not available, new markers must then be developed using sequence polymorphisms found in genes within the interval. Identify several predicted unique genes in the mapping interval. To avoid retrotransposons or other potentially repetitive sequences, we generally select genes that have been annotated with a predicted function, excluding all repetitive elements and genes of unknown function. Design primers to amplify 1–2-kb portions of these predicted genes, and preferably design primers that bind to exons and amplify across introns to maximize the likelihood of finding polymorphisms. Amplify this region from both parents of the mapping population, isolate the bands, and send these samples to be sequenced. Align the parental sequences with blast2seq (at the National Center for Biotechnology Information [NCBI]) to identify useful polymorphisms, including indels or SNPs. If indels of at least 8 bp are present, these can be resolved on 3–4% Agarose 3:1 HRB (or MetaPhor) gels after designing a new set of primers to amplify a smaller region (∼100–300 bp) across the indel (Fig. 6). If only SNPs are identified, these are often easy to develop as cleaved amplified polymorphic sequence (CAPS) markers or as derived cleaved amplified polymorphic sequence (dCAPS) markers (Konieczny and Ausubel, 1993; Neff et al., 1998).
If the SNP creates a polymorphic restriction site, a CAPS marker can be created by PCR amplification of a short fragment containing the polymorphic site, followed by restriction digest of the PCR product (Fig. 6). If SNPs do not generate a restriction site polymorphism, it is often still possible to generate a dCAPS marker. One of the dCAPS primers artificially introduces one or more point mutations to create a restriction enzyme site that distinguishes the mutant from the wild-type allele (Fig. 6; Appendix 1, Note 16). The dCAPS Finder website (http://helix.wustl.edu/dcaps/dcaps.html) can be used to design primers. Following a standard PCR reaction, the samples are digested with the restriction enzyme. We use 5–8 µL of the PCR reaction in a 20-µL restriction digest with 10–20 units of enzyme for 1–2 h (buffer and incubation temperatures vary by enzyme). It is necessary to amplify a small fragment (150–250 bp) because the restriction polymorphism will often result in only small length differences following digest that must then be resolved on 3–4% agarose.
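The idea behind a CAPS marker can be illustrated with a short script that checks whether a SNP changes the number of restriction sites between two parental alleles. This is a minimal sketch and not a replacement for dCAPS Finder: it searches only three well-known palindromic recognition sites (EcoRI, BamHI, HindIII), so scanning one strand is sufficient, and the example sequences are invented for illustration.

```python
# Recognition sites of three common palindromic restriction enzymes.
# A real screen would use a much larger enzyme list.
ENZYMES = {"EcoRI": "GAATTC", "BamHI": "GGATCC", "HindIII": "AAGCTT"}


def count_sites(sequence, site):
    """Count (possibly overlapping) occurrences of a recognition site."""
    sequence = sequence.upper()
    count = 0
    start = 0
    while True:
        index = sequence.find(site, start)
        if index == -1:
            break
        count += 1
        start = index + 1
    return count


def caps_candidates(allele_a, allele_b, enzymes=ENZYMES):
    """Return enzymes whose number of sites differs between two alleles.

    The result maps enzyme name to (sites in allele_a, sites in allele_b).
    """
    candidates = {}
    for name, site in enzymes.items():
        sites_a = count_sites(allele_a, site)
        sites_b = count_sites(allele_b, site)
        if sites_a != sites_b:
            candidates[name] = (sites_a, sites_b)
    return candidates


if __name__ == "__main__":
    # Invented amplicon fragments differing by a single C-to-T change.
    wild_type = "ACGTGAATTCTTGA"
    mutant = "ACGTGAATTTTTGA"
    print(caps_candidates(wild_type, mutant))
```

An enzyme reported here cuts one parental amplicon and not the other, so digestion of the PCR product should give different banding patterns for the two alleles.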
f. If the estimated distance between the SSR markers and the mutant locus is sufficiently small (<10 cM), proceed to analyze only the recombinant samples from step 4c with the newly developed indel, CAPS, or dCAPS markers. If the interval is >10 cM, it is advisable to genotype the whole population to avoid mis-scoring double recombinants as nonrecombinants, and thus losing informative recombinant samples.
g. Reiterate steps 4e–f to narrow the interval until all recombinants are exhausted or the interval is small enough to proceed to step 5 (see criteria below). These iterations are often laborious, with progress limited by the number of recombinants available (the more, the better), and by the frequency of polymorphisms among the parental lines (the more diverse the parents, the better; see step 2a).
5. Evaluating candidate genes in the interval and confirmation that the correct gene has been identified—At this point, the mutation should be localized to a minimal interval flanked by markers, each showing recombination with at least one individual from the mapping population. When the physical distance delineated by your closest flanking markers is localized to a single contig and narrow enough to contain a reasonable number of genes, it is possible to begin evaluating genes within this interval.
a. Anchor your closest flanking markers to the genome and determine the number of predicted genes within the minimal interval. This is simply a matter of examining the genes (as annotated by MaizeGDB [http://www.maizegdb.org] or Phytozome [http://www.phytozome.org]) that lie within this interval. If the number of genes is sufficiently small (Appendix 1, Note 17), evaluate each gene as a possible candidate (step 5b). If the region proves too large, identify new recombinants to narrow it by increasing the mapping population (Appendix 1, Note 18).
b. Evaluate each of the genes in the interval, and make an ordered list of good candidates (Appendix 1, Note 19).
c. Analyze candidate gene sequences for lesions. Design primers and PCR amplify candidate genes from both mutant and wild type with high-fidelity Taq (Appendix 1, Note 20). Sequence these PCR products and compare wild-type and mutant sequences, looking for sequence differences that are consistent with the nature of the mutation. EMS tends to produce point mutations (more than 99% are G/C to A/T transitions) (Greene et al., 2003), although deletions are also possible, and recessive mutants should result from deleterious mutations such as premature stop codons, splice-site mutations, or changes in conserved amino acid positions. Mutants from other sources (e.g., transposon active populations or naturally occurring mutations) could be caused by a variety of possible lesions including point mutations, insertions, deletions, and chromosomal rearrangements (Appendix 1, Note 21).
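The comparison of wild-type and mutant sequences in step 5c can be partly automated. The sketch below lists single-base substitutions between two sequences and flags those that match the G/C to A/T transitions typical of EMS. It assumes the two sequences are already aligned, of equal length, and free of indels; real sequencing reads would first need to be aligned with a dedicated tool. The function name and example sequences are hypothetical.

```python
# Substitutions expected from EMS, written as (wild type, mutant).
# G-to-A on one strand reads as C-to-T on the complementary strand.
EMS_TRANSITIONS = {("G", "A"), ("C", "T")}


def find_substitutions(wild_type, mutant):
    """List substitutions between two pre-aligned, equal-length sequences.

    Returns tuples of (1-based position, wild-type base, mutant base,
    True if the change is an EMS-type transition).
    """
    if len(wild_type) != len(mutant):
        raise ValueError("sequences must be aligned and of equal length")
    substitutions = []
    pairs = zip(wild_type.upper(), mutant.upper())
    for position, (wt_base, mut_base) in enumerate(pairs, start=1):
        if wt_base != mut_base:
            is_ems_type = (wt_base, mut_base) in EMS_TRANSITIONS
            substitutions.append((position, wt_base, mut_base, is_ems_type))
    return substitutions


if __name__ == "__main__":
    # Invented example: a single G-to-A change at position 4.
    for hit in find_substitutions("ATGGCC", "ATGACC"):
        print(hit)
```

A flagged transition is only consistent with EMS mutagenesis; whether it is deleterious (premature stop codon, splice-site change, altered conserved residue) still has to be assessed against the gene model.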
d. Confirm the candidate gene. If sequencing identifies a change relative to the reference B73 sequence in a candidate gene, this does not guarantee that this change causes the mutant phenotype. It could be a naturally occurring polymorphism or a linked mutation. Naturally occurring polymorphisms can be ruled out by sequencing the known progenitor of the mutant, if available (Appendix 1, Note 22). Ruling out a linked mutation is more difficult, and it is common practice in maize to sequence multiple independent alleles, which should each have distinct lesions in the same gene. Complementation of the mutant phenotype by transgenic insertion of the wild-type copy is an alternative approach, although less common given the time and expense of generating transgenic plants and performing the complementation crosses (Appendix 1, Note 23).
Table 1. Simple sequence repeat (SSR) markers polymorphic between A619 and B73 for bulked segregant analysis (BSA). These SSR markers, ordered by chromosome (chr.) location, were selected solely on the basis of gel patterns available at MaizeGDB. Note that for some bins no polymorphic SSR marker was available. By analyzing the bulk DNA samples with these markers, there is a good chance of obtaining a rough map location for the mutant of interest. We successfully mapped all five mutants that we attempted with this set. A similar list of polymorphic SSR markers useful for BSA in W22 × Mo17 and W22 × B73 mapping populations has been described (Martin et al., 2010).
Table 2. Available online resources.
The preceding protocol has proven successful for the positional cloning of a number of maize mutants, and should be effective for a large number of the maize mutants that currently exist. As the genomes of more plant species become available, we anticipate that this protocol can be broadly applied with appropriate modifications, opening up a large number of novel species to positional cloning approaches.