Writing about a biological subject based on one species can limit the scope of the discussion, but in the case of DNA methylation and Arabidopsis, the restriction is entirely appropriate. Unlike many other popular model organisms, Arabidopsis has retained and embellished a multi-layered methylation system that contributes to gene and transposon silencing, imprinting, and genome stability. Many of the findings from Arabidopsis are applicable to other eukaryotes. The field has relied in large part on forward and reverse genetic screens and the study of methylation and its consequences at one or a few loci. These sorts of experiments are still providing novel insights, especially when combined with the natural genetic variation present within Arabidopsis thaliana. Additionally, the field is increasingly moving towards genomics to understand the interplay between methylation, demethylation, chromatin state, and gene expression.
Cytosine can be methylated at the carbon five position, and in plants this can occur on any cytosine regardless of the sequence context (Figure 1). In general, 5-methylcytosine is associated with transcriptional silencing. How this is achieved is still not well understood. Methylation can block transcription factor binding and prevent transcription or it can recruit chromatin-modifying complexes that mediate silencing through changes to the underlying histones. In this chapter we provide an introduction to the mechanisms of DNA methylation and demethylation and consider the biological relevance of these activities at different types of sequences. First, we provide a brief background on how methylation status is empirically determined, then address in more detail the enzymes and systems that add and remove it, and the interaction of DNA methylation with chromatin. Finally, we consider possible regulatory roles of DNA methylation in Arabidopsis.
If you're interested in methylation at a certain sequence, how do you go about measuring it? Several methods of varying utility have been devised to determine the methylation status of specific sequences and entire genomes; most rely on chemical differences between 5-methylcytosine and cytosine or on the ability of restriction enzymes and antibodies to discriminate between the two.
Several restriction enzymes are inhibited by methylation in CG, CHG, or CHH (H = A, T, or C) sequence contexts. Genomic DNA can be digested with a methylation-sensitive restriction enzyme and the sequence of interest probed by Southern blot, with the amount of methylation determined by the completeness of digestion. The major disadvantage of this method is that only a few nucleotides, those within the enzyme recognition sites, can be queried at once. Methylation is often found in clusters in Arabidopsis. Although clusters of methylation are highly heritable, differences do arise at individual nucleotides within those regions, even in closely related lines (Tran et al., 2005a). Given this site-to-site variability, conclusions about methylation drawn from one or a few enzyme sites can be tenuous.
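To make this limitation concrete, the sketch below asks which cytosines a single recognition sequence actually interrogates. It is a simplified model under stated assumptions: CCGG is used only as an example site, only the top strand is scanned, and a site is treated as blocked if any cytosine inside it is methylated. Real enzymes differ in which methylated positions block them, so the blocking rule in the hypothetical `digest_outcome` function is an illustration, not a description of any particular enzyme.

```python
def find_sites(seq: str, site: str) -> list[int]:
    """Return 0-based start positions of every occurrence of `site` (overlaps allowed)."""
    k = len(site)
    return [i for i in range(len(seq) - k + 1) if seq[i:i + k] == site]


def digest_outcome(seq: str, site: str, methylated: set[int]) -> dict[int, bool]:
    """Map each site start to True (cut) or False (blocked).

    Simplifying assumption: a methylated cytosine anywhere inside the
    recognition sequence blocks cleavage.
    """
    outcome = {}
    for start in find_sites(seq, site):
        inside = range(start, start + len(site))
        outcome[start] = not any(pos in methylated for pos in inside)
    return outcome


def fraction_of_cytosines_queried(seq: str, site: str) -> float:
    """Fraction of top-strand cytosines that fall inside any recognition site."""
    cytosines = [i for i, base in enumerate(seq) if base == "C"]
    if not cytosines:
        return 0.0
    covered = set()
    for start in find_sites(seq, site):
        covered.update(range(start, start + len(site)))
    return sum(1 for i in cytosines if i in covered) / len(cytosines)


seq = "ACCGGTACAGTCCGGACTCAC"
print(find_sites(seq, "CCGG"))                     # [1, 11]
print(digest_outcome(seq, "CCGG", {12}))           # {1: True, 11: False}
print(fraction_of_cytosines_queried(seq, "CCGG"))  # 0.5
```

In this toy sequence, half of the cytosines lie outside any CCGG site, so their methylation state is invisible to the digest regardless of the outcome at the two sites.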
The McrBC enzyme is also useful. It cleaves methylated DNA imprecisely, and only when two half-sites, each consisting of a methylcytosine preceded by a purine, lie between 40 bp and 3 kb of one another (Sutherland et al., 1992). After genomic DNA is digested with McrBC, regions of interest can be amplified by PCR; the more highly methylated a sequence is, the less it will be amplified (Vaughn et al., 2007). An advantage of both enzyme-based methods is that all DNA molecules in the sample are assayed simultaneously.
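The McrBC requirement can be expressed as a simple check over a sequence with known methylated positions. The following is a minimal sketch, assuming that only the top strand is considered, that each half-site is a purine followed by a methylated C (as described above), and that the spacing is measured between the two methylated cytosines. The default bounds come from the 40 bp to 3 kb range in the text; the function names are hypothetical. Because cleavage is imprecise, the sketch predicts only whether a molecule is susceptible, not where it is cut.

```python
def rmc_half_sites(seq: str, methylated: set[int]) -> list[int]:
    """Sorted positions of methylated cytosines preceded by a purine (A or G)."""
    return sorted(
        i for i in methylated
        if 0 < i < len(seq) and seq[i] == "C" and seq[i - 1] in "AG"
    )


def mcrbc_susceptible(seq: str, methylated: set[int],
                      min_gap: int = 40, max_gap: int = 3000) -> bool:
    """True if any two half-sites are spaced between min_gap and max_gap bp."""
    sites = rmc_half_sites(seq, methylated)
    for idx, first in enumerate(sites):
        for second in sites[idx + 1:]:
            gap = second - first
            if gap > max_gap:
                break  # sites are sorted, so later ones are even farther away
            if gap >= min_gap:
                return True
    return False


# Two purine-mC half-sites 50 bp apart: susceptible.
seq = "GC" + "T" * 48 + "AC" + "T" * 10
print(mcrbc_susceptible(seq, {1, 51}))  # True
# A single half-site is not enough.
print(mcrbc_susceptible(seq, {1}))      # False
```

A sequence with many methylated half-sites is more likely to contain a qualifying pair, which is why amplification after McrBC digestion falls as methylation rises.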
Affinity purification is an attractive route for assaying methylation. A commercially available 5-methylcytosine antibody can be used to pull down the methylated fraction of the genome. Particular sequences can then be analyzed by comparing PCR amplification between input and pull-down fractions or between genotypes. The methyl-binding domain (MBD) from mammals has been used in a similar manner, but it only binds methylated CG sites.
Bisulfite sequencing provides the most detailed data about the methylation of particular cytosines in a sequence. Treatment of restriction-digested or sheared DNA, denatured to single strands, with sodium bisulfite followed by desulfonation converts cytosine to uracil but leaves 5-methylcytosine intact (Figure 1) (Frommer et al., 1992; Clark et al., 1994). A region of interest, usually less than 700 base pairs, is then amplified in a strand-specific manner by PCR; after the PCR products are cloned and sequenced, unmethylated cytosines are read as Ts and 5-methylcytosines as Cs. Comparison to the reference sequence allows determination of the methylation status of each cytosine. Although bisulfite sequencing is the gold standard for methylation analysis, biases can be introduced at several points in the procedure, including PCR amplification and cloning of the PCR products (Warnecke et al., 2002). Sequencing the PCR product directly using pyrosequencing can eliminate one aspect of this bias, as long as information about individual molecules is not required (Tost and Gut, 2007). Because bisulfite treatment produces DNA strands that are no longer complementary, two separate PCR reactions must be performed if methylation information for each strand is desired. This information might be particularly relevant in plants because asymmetric cytosines are methylated, so the number of potentially methylated sites differs between the two strands. Hairpin bisulfite PCR cleverly solves this problem by joining the complementary strands of DNA with a hairpin linker, allowing them to be assessed simultaneously (Laird et al., 2004), but it has not yet been applied in plant studies.
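The read-out logic of bisulfite sequencing can be sketched in a few lines. This is an illustration under simplifying assumptions, not a production caller: only the top strand is modeled, conversion is assumed to be complete, reads are assumed to be ungapped and full length, and a C/T polymorphism at a reference C would be miscalled as a methylation difference. The function names are hypothetical.

```python
from typing import Optional


def bisulfite_convert(seq: str, methylated: set[int]) -> str:
    """Top-strand sequence as read after bisulfite treatment and PCR:
    unmethylated C becomes T (C -> U -> T), 5-methylcytosine stays C."""
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(seq)
    )


def call_methylation(reference: str, read: str) -> dict[int, Optional[bool]]:
    """Compare an ungapped read to the reference at every reference C.

    True = read has C (methylated), False = read has T (unmethylated),
    None = any other base (e.g. a SNP or sequencing error; not callable).
    """
    if len(reference) != len(read):
        raise ValueError("this sketch assumes an ungapped, full-length read")
    calls = {}
    for i, (ref_base, read_base) in enumerate(zip(reference, read)):
        if ref_base == "C":
            calls[i] = {"C": True, "T": False}.get(read_base)
    return calls


ref = "ACGTCAGCTT"
read = bisulfite_convert(ref, {1})
print(read)                         # ACGTTAGTTT
print(call_methylation(ref, read))  # {1: True, 4: False, 7: False}
```

Each cloned molecule yields one such set of calls, which is why clone-based bisulfite sequencing preserves information about individual molecules.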
Many of the above methods can be, or have already been, extended to whole-genome methylation analysis when combined with microarrays or high-throughput sequencing (Yazaki et al., 2007; Zilberman and Henikoff, 2007). As the technology has improved, the definition of what constitutes “genome-wide” and “high-resolution” methylation mapping has also changed. Initial efforts to globally map methylation in Arabidopsis relied on methylation-sensitive enzymes and microarrays of limited genomic coverage (Tompa et al., 2002; Lippman et al., 2004; Tran et al., 2005a; Tran et al., 2005b), but provided important information about methylation location at a gross level. Methylation has also been mapped by immunoprecipitating the methylated fraction of the genome with a 5-methylcytosine antibody or MBD and hybridizing the bound or unbound DNA to tiling microarrays. This has allowed detailed examination of methylation patterns across the genome and within specific regions (Zhang et al., 2006; Penterman et al., 2007c; Zilberman et al., 2007). Methylation mapping in methyltransferase, demethylase, and RNA interference mutants is giving us a broader understanding of how methylation patterns are established, maintained, and potentially changed. Recently, methylation has been mapped genome-wide at single-base resolution by bisulfite-treating DNA, sequencing it using Solexa high-throughput sequencing technology, and aligning the sequences back to the genome (Cokus et al., 2008). This has provided a level of detail unprecedented for any organism. Another new method, which requires only small amounts of DNA, involves hybridizing bisulfite-treated DNA to tiling microarrays (Reinders et al., 2008). The use of high-throughput sequencing or microarrays in combination with affinity purification or bisulfite treatment is expected to further refine the Arabidopsis methylation map in different genotypes, tissues, and conditions.
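With high-throughput bisulfite sequencing, each cytosine is covered by many reads, and a common per-site summary is the fraction of reads that still show a C. The sketch below computes that level under stated assumptions: reads are already aligned to the top strand without gaps, each is given as a hypothetical (start, sequence) pair, and sites below an arbitrary `min_coverage` of 3 reads are dropped. It illustrates the arithmetic only and is not the pipeline used by Cokus et al. (2008).

```python
from collections import defaultdict


def methylation_levels(reference: str, reads: list[tuple[int, str]],
                       min_coverage: int = 3) -> dict[int, float]:
    """Per-cytosine methylation level C / (C + T) from ungapped aligned reads.

    reads: (0-based start on the reference, read sequence) pairs, all assumed
    to come from the bisulfite-converted top strand.
    """
    meth = defaultdict(int)
    unmeth = defaultdict(int)
    for start, read in reads:
        for offset, base in enumerate(read):
            pos = start + offset
            if pos >= len(reference) or reference[pos] != "C":
                continue  # only reference cytosines are informative
            if base == "C":
                meth[pos] += 1
            elif base == "T":
                unmeth[pos] += 1
    levels = {}
    for pos in set(meth) | set(unmeth):
        depth = meth[pos] + unmeth[pos]
        if depth >= min_coverage:
            levels[pos] = meth[pos] / depth
    return levels


ref = "ACGTC"
reads = [(0, "ACGTC"), (0, "ATGTC"), (1, "CGTT"), (2, "GTC")]
levels = methylation_levels(ref, reads)
print({pos: round(v, 3) for pos, v in sorted(levels.items())})  # {1: 0.667, 4: 0.75}
```

Raising `min_coverage` trades genome coverage for more precise per-site estimates; the threshold here is illustrative only.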
Once a methylation pattern is determined, deriving meaning from it can be challenging. Often differences are compared between wild type and mutant plants or between different conditions. Although differences might be statistically significant, biological significance is harder to assess. As will become clear in the following sections, large questions still remain as to how methylation exerts the influence that it does, or if methylation at particular sequences is relevant to biological function.
The Arabidopsis genome contains methylation at 24% of CG sites, 6.7% of CHG sites, and 1.7% of CHH sites (Cokus et al., 2008). How is this methylation distributed? Staining an Arabidopsis interphase nucleus with the DNA dye DAPI reveals several bright spots referred to as chromocenters. Immunostaining with a 5-methylcytosine antibody results in a strikingly similar distribution (Soppe et al., 2002; Fransz et al., 2003; Tariq et al., 2003). On each chromosome, chromocenters contain repetitive DNA consisting of the 178-bp centromeric tandem repeat flanked by pericentromeric heterochromatin, which is largely composed of repetitive elements, including DNA transposons and retroelements. Each chromocenter encompasses several megabases of DNA. On chromosomes 2 and 4, additional chromocenters are formed from the nucleolus organizer regions (NORs), which consist of arrays of repeated 45S rDNA genes (Pikaard, 2002). It is clear that the preferred location for DNA methylation is at repetitive DNA sequences (Figure 2). This holds true outside of chromocenters, as most euchromatic transposons are also highly methylated, and within chromocenters, where genes surrounded by methylated transposons can themselves remain free of methylation (Lippman et al., 2004; Zhang et al., 2006; Zilberman et al., 2007).
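Percentages such as 24% of CG sites depend on classifying every cytosine by its sequence context first. The minimal sketch below shows that classification and the per-context arithmetic for one strand and a hypothetical set of methylated positions. Genome-wide figures are computed over both strands and many reads, which this sketch deliberately omits; the sequence is assumed to contain only A, C, G, and T.

```python
from typing import Optional


def cytosine_context(seq: str, i: int) -> Optional[str]:
    """Classify the top-strand cytosine at i as CG, CHG, or CHH (H = A, C, or T).

    Returns None if position i is not a C or the context runs off the end.
    """
    if seq[i] != "C" or i + 1 >= len(seq):
        return None
    if seq[i + 1] == "G":
        return "CG"
    if i + 2 >= len(seq):
        return None
    return "CHG" if seq[i + 2] == "G" else "CHH"


def percent_methylated_by_context(seq: str, methylated: set[int]) -> dict[str, float]:
    """Percentage of cytosines in each context that are methylated (top strand only)."""
    totals = {"CG": 0, "CHG": 0, "CHH": 0}
    hits = {"CG": 0, "CHG": 0, "CHH": 0}
    for i in range(len(seq)):
        ctx = cytosine_context(seq, i)
        if ctx is None:
            continue
        totals[ctx] += 1
        if i in methylated:
            hits[ctx] += 1
    return {ctx: 100.0 * hits[ctx] / totals[ctx] if totals[ctx] else 0.0
            for ctx in totals}


seq = "CGACAGCTT"  # C at 0 is CG, C at 3 is CHG, C at 6 is CHH
print(percent_methylated_by_context(seq, {0}))
# {'CG': 100.0, 'CHG': 0.0, 'CHH': 0.0}
```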
Transposable elements (TEs) invade genomes and increase in copy number, with strong potential for detriment to the host. All organisms have adopted mechanisms to keep transposable elements silent, including RNA-based chromatin silencing, histone modifications, DNA methylation, or a combination thereof (Slotkin and Martienssen, 2007). The bias of methylation toward repetitive DNA sequences suggests that one of methylation's primary functions is to silence the transcription of transposable elements (Zilberman and Henikoff, 2004; Gehring and Henikoff, 2007). In Arabidopsis, transposons are generally methylated throughout their length at cytosines in all sequence contexts, although distinct patterns do emerge at individual loci (Lippman et al., 2003; Lippman et al., 2004; Zhang et al., 2006; Zilberman et al., 2007).
Studies of individual genes and of methylation patterns genome-wide have revealed that genic sequences also contain considerable amounts of methylation, although mainly in the CG sequence context (Tran et al., 2005a; Zhang et al., 2006; Zilberman et al., 2006; Vaughn et al., 2007; Cokus et al., 2008). It is estimated that at least a third of genes contain some methylation within their coding sequences. Although coding region methylation alone can lead to gene silencing (Hohn et al., 1996; Chawla et al., 2007), most of these genes are well expressed and are distributed across all gene ontology categories. Clearly, methylation is not acting to silence these genes, so what is its impact there? There is an intimate relationship between methylation and transcription. Unlike in transposons, methylation shows a distinct distribution within genes (Figure 2): it is depleted at both the 5′ and 3′ ends of coding sequences (Zhang et al., 2006; Zilberman et al., 2006; Vaughn et al., 2007). The distribution of methylation within genes is the inverse of that of RNA polymerase II (Zilberman et al., 2007). This suggests that methylation at the 5′ and 3′ ends of genes could be inhibitory to transcription, potentially interfering with initiation or termination. Indeed, genes that are methylated in their promoter and 5′ coding sequences have some of the lowest transcription rates in the genome. Moderately transcribed genes are more likely to be methylated than those with low or high expression (Zhang et al., 2006; Zilberman et al., 2007). This suggests that the process of transcription itself, provided it is neither too slow nor too fast, can promote methylation. One interesting but unexplained observation from these studies is that genes are more likely to be methylated the closer they are to centromeres, and this trend does not depend on how close a gene is to a transposable element (Zilberman et al., 2007). This suggests that there might be a chromosome-level organization of DNA methylation.
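Depletion of methylation at the 5′ and 3′ ends of genes is typically seen by averaging methylation across many genes after rescaling each gene to a common length (a "metagene" profile). The sketch below assumes per-cytosine methylation levels have already been computed, that gene coordinates are 0-based and end-exclusive, and that equal-width relative bins are adequate; the function name and binning are hypothetical and not necessarily those used in the cited studies.

```python
from typing import Optional


def metagene_profile(genes: list[tuple[int, int, str]],
                     levels: dict[int, float],
                     n_bins: int = 10) -> list[Optional[float]]:
    """Average methylation in n_bins equal-width bins from 5' to 3' across genes.

    genes: (start, end, strand) with 0-based, end-exclusive coordinates.
    levels: per-cytosine methylation level keyed by genomic position.
    Bins with no cytosines are reported as None.
    """
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for start, end, strand in genes:
        length = end - start
        if length <= 0:
            continue
        for pos, level in levels.items():  # simple scan; fine for a sketch
            if not start <= pos < end:
                continue
            b = min((pos - start) * n_bins // length, n_bins - 1)
            if strand == "-":
                b = n_bins - 1 - b  # orient minus-strand genes 5' to 3'
            sums[b] += level
            counts[b] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]


# A toy gene with unmethylated ends and a methylated middle.
print(metagene_profile([(0, 10, "+")], {0: 0.0, 5: 1.0, 9: 0.0}, n_bins=3))
# [0.0, 1.0, 0.0]
```

Flipping bins for minus-strand genes matters here because the biological question is about 5′ versus 3′ ends, not left versus right genomic coordinates.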
DNA Methyltransferases: Adding Methylation
The Arabidopsis enzymes that add a methyl group to cytosine were discovered through genetic screens for mutations that reduced methylation at centromeres or relieved silencing of endogenous genes or transgenes, or by sequence similarity to methyltransferases discovered in other organisms. These genes have been known for about a decade, and several excellent reviews are available (Rangwala and Richards, 2004; Chan et al., 2005; Goll and Bestor, 2005). All known cytosine 5-methyltransferases belong to a single family with several subfamilies. There are three subfamilies of DNA methyltransferases in Arabidopsis: CG maintenance methyltransferases, chromomethylases, and de novo methyltransferases. Multiple genes exist for each enzyme class, but only one enzyme of each type appears to be active; the other genes are either expressed at low levels or contain stop codons in various backgrounds, and none have been recovered in mutant screens. The crystal structures of bacterial cytosine methyltransferases in complex with DNA have been solved (Suck, 1994). The cytosine base is flipped out of the DNA helix into the enzyme active site, where an S-adenosyl-L-methionine (SAM) cofactor donates a methyl group to the carbon 5 position. Disrupting enzymes of the SAM cycle reduces DNA methylation in Arabidopsis (Tanaka et al., 1997; Rocha et al., 2005; Mull et al., 2006; Jordan et al., 2007).
METHYLTRANSFERASE1 (MET1) is the CG maintenance methyltransferase in Arabidopsis. This designation is based on sequence similarity to Dnmt1, the orthologous mammalian maintenance methyltransferase, and on the effect mutations in the gene have on DNA methylation. Like Dnmt1, MET1 contains a long N-terminal domain and a C-terminal catalytic domain. The N-terminal region contains two BAH domains (bromo-adjacent homology), which might interact with other proteins (Finnegan and Kovac, 2000; Kankel et al., 2003; Goll and Bestor, 2005). Dnmt1 methylates newly synthesized DNA strands after DNA replication, using the CG methylation pattern of the parent strand as a template (Goll and Bestor, 2005).
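What "using the CG methylation pattern of the parent strand as a template" means can be shown with a toy model of one round of replication. This is a conceptual sketch under explicit assumptions, not a model of MET1 or Dnmt1 biochemistry: each CG site is represented as a pair of strand states, sites behave independently, and a hypothetical `efficiency` parameter sets the chance that a hemimethylated site is restored. Setting it to 0.0 loosely mimics the absence of maintenance activity and shows passive loss of methylation over successive replications.

```python
import random
from typing import Optional

# Each CG site is a pair (top_strand_methylated, bottom_strand_methylated).
Site = tuple[bool, bool]


def replicate(sites: list[Site]) -> tuple[list[Site], list[Site]]:
    """Semiconservative replication: each daughter duplex keeps one parental
    strand, and the newly synthesized strand starts unmethylated."""
    keeps_top = [(top, False) for top, _ in sites]
    keeps_bottom = [(False, bottom) for _, bottom in sites]
    return keeps_top, keeps_bottom


def maintain(sites: list[Site], efficiency: float = 1.0,
             rng: Optional[random.Random] = None) -> list[Site]:
    """At each hemimethylated CG, methylate the new strand with probability
    `efficiency`, copying the state of the parental strand."""
    rng = rng if rng is not None else random.Random(0)
    result = []
    for top, bottom in sites:
        if top != bottom and rng.random() < efficiency:
            result.append((True, True))
        else:
            result.append((top, bottom))
    return result


parent = [(True, True), (False, False), (True, True)]
daughter, _ = replicate(parent)
print(daughter)                          # [(True, False), (False, False), (True, False)]
print(maintain(daughter, efficiency=1.0))  # [(True, True), (False, False), (True, True)]
# Without maintenance, the next replication yields a fully unmethylated daughter:
print(replicate(maintain(daughter, efficiency=0.0))[1])
# [(False, False), (False, False), (False, False)]
```

Note that unmethylated CG sites stay unmethylated in this model: maintenance copies an existing pattern but never creates new CG methylation.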
Methylation has been mapped globally in met1 mutants (Zhang et al., 2006; Cokus et al., 2008). More than half of the regions that are methylated in wild type lose methylation in met1. The effect is strongest at genes, which makes sense because they are primarily methylated in the CG context. At some repetitive sequences, significant amounts of CHG and CHH methylation are also lost. Additionally, new CHG and CHH methylation (hypermethylation) appears at previously unmethylated sequences and increases as met1 plants are inbred (Jacobsen et al., 2000; Mathieu et al., 2007). Many TEs become transcriptionally active in met1 mutants (Kato et al., 2003; Lippman et al., 2003; Zhang et al., 2006; Zilberman et al., 2006), as do currently unannotated sequences, which could represent either non-coding RNAs or as-yet unidentified TEs (Zhang et al., 2006).
Mutations in MET1 and antisense-directed MET1 silencing cause varied and dramatic phenotypes from gametogenesis onwards. MET1 is required to maintain methylation patterns during the haploid gametophyte stage of the plant life cycle (Saze et al., 2003). In the sporophyte, met1 phenotypes include abnormal embryo patterning, narrow leaves, homeotic transformations of floral organs, altered flowering time, and reduced fertility (Finnegan et al., 1996; Kankel et al., 2003; Saze et al., 2003; Xiao et al., 2006; Mathieu et al., 2007). Some of the affected genes underlying these phenotypes have been identified. For example, met1 plants are late flowering in many backgrounds because the homeodomain gene FWA, which is normally promoter-methylated and silent during sporophyte development, becomes hypomethylated and expressed (Soppe et al., 2000; Kankel et al., 2003). Hypermethylation of the SUPERMAN and AGAMOUS genes contributes to some of the floral phenotypes (Jacobsen et al., 2000). The reasons for many of the other phenotypes remain unknown.
The other maintenance methyltransferase in Arabidopsis is CMT3 (CHROMOMETHYLASE3), a methyltransferase containing a chromodomain. This family of enzymes is unique to plants (Henikoff and Comai, 1998). Chromodomains bind to methylated lysines in histone tails. CMT3 also has a single BAH domain in the N-terminal region. CMT3 maintains methylation in the CHG sequence context, and genetic and molecular evidence indicates that it does so in close concert with histone H3 lysine 9 (H3K9) methyltransferases (Bartee et al., 2001; Lindroth et al., 2001; Jackson et al., 2002). In vitro, CMT3 binds the N-terminal tail of histone H3 when it is trimethylated at both lysine 9 and lysine 27, suggesting a dependence of CHG methylation on histone methylation (Lindroth et al., 2004). However, H3K9 trimethylation is not a common histone modification in Arabidopsis (Jackson et al., 2004; Johnson et al., 2004), so it remains to be seen whether this finding is relevant in vivo. cmt3 single mutants do not have any morphological phenotypes, but when cmt3 is combined with a null met1 allele the outcome is severe (Xiao et al., 2006; Zhang and Jacobsen, 2006). The additional loss of CHG methylation in met1 cmt3 mutants might push the genome over a methylation threshold such that the remaining methylation, in whatever sequence context, is insufficient to accomplish its functions.
The de novo methyltransferases, DRMs (DOMAINS REARRANGED METHYLTRANSFERASES), were identified based on homology to the mammalian de novo methyltransferases Dnmt3a and Dnmt3b, although their catalytic motifs are arranged in a different order along the protein sequence (Cao and Jacobsen, 2002). The enzymes also have an N-terminal ubiquitin-associated domain. DRM2 appears to be the only functional enzyme in Arabidopsis. De novo methylation can be monitored by assaying CHH methylation, which must be re-established by constant targeting after every round of DNA replication because it is not symmetric between complementary DNA strands. A genetic distinction can be made between establishing methylation at previously unmethylated sequences and actively maintaining asymmetric methylation. DRM2 is required for establishing methylation at all loci examined. For maintenance, DRM2 has locus-specific effects on asymmetric methylation: at some loci it is the only enzyme required, whereas at others CMT3 also acts as a de novo methyltransferase (Cao and Jacobsen, 2002). In in vitro assays, tobacco DRM preferentially methylates CHH and CHG sites, with far less activity at CG sites, and prefers unmethylated DNA over hemimethylated DNA (Wada et al., 2003). Loss of DRM2 and CMT3 has little overall effect on the distribution of methylation genome-wide (Zhang et al., 2006). This is because non-CG methylation is always found in the vicinity of CG methylation (unless CG sites are depleted from the sequence), and methylated CG, the most abundant context for cytosine methylation, does not change in drm2 cmt3. drm2 cmt3 double mutant plants, but not drm2 single mutants, also display multiple phenotypes including small size, twisted leaves, and reduced fertility (Cao and Jacobsen, 2002). The genes underlying these phenotypes have not been identified.
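The strand asymmetry that distinguishes CHH from CG and CHG sites can be checked directly on a sequence. In the sketch below (hypothetical function names, ACGT-only input assumed), a top-strand CG or CHG cytosine has a partner cytosine on the bottom strand, located opposite the G at i+1 or i+2, so a hemimethylated copy of either site can in principle be restored from the parental strand. A CHH cytosine has no such partner, so its methylation cannot be copied after replication and must be targeted again. Counting contexts on each strand also shows why the number of potentially methylated sites can differ between strands.

```python
from typing import Optional

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def cytosine_context(seq: str, i: int) -> Optional[str]:
    """CG, CHG, or CHH for a C at position i (H = A, C, or T); None otherwise."""
    if seq[i] != "C" or i + 1 >= len(seq):
        return None
    if seq[i + 1] == "G":
        return "CG"
    if i + 2 >= len(seq):
        return None
    return "CHG" if seq[i + 2] == "G" else "CHH"


def partner_position(seq: str, i: int) -> Optional[int]:
    """Top-strand coordinate of the G opposite the symmetric bottom-strand C
    (CG -> i+1, CHG -> i+2). CHH cytosines have no partner: returns None."""
    ctx = cytosine_context(seq, i)
    if ctx == "CG":
        return i + 1
    if ctx == "CHG":
        return i + 2
    return None


def context_counts(seq: str) -> dict[str, int]:
    counts = {"CG": 0, "CHG": 0, "CHH": 0}
    for i in range(len(seq)):
        ctx = cytosine_context(seq, i)
        if ctx is not None:
            counts[ctx] += 1
    return counts


seq = "CAGTTCTT"
print(context_counts(seq))                      # {'CG': 0, 'CHG': 1, 'CHH': 1}
print(context_counts(reverse_complement(seq)))  # {'CG': 0, 'CHG': 1, 'CHH': 0}
print(partner_position(seq, 0), partner_position(seq, 5))  # 2 None
```

The CHG site at position 0 appears on both strands, whereas the CHH site at position 5 exists only on the top strand.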
MET1 might also have de novo methyltransferase activity. Dnmt1, the mammalian MET1 homolog, was originally isolated based on de novo methyltransferase activity (Goll and Bestor, 2005). De novo methylation of CG sites is impaired in met1 mutants (Aufsatz et al., 2004). Additionally, drm2 cmt3 mutants retain some CHG and CHH methylation, particularly in pericentromeric heterochromatin (Cokus et al., 2008). This suggests that another enzyme(s) has de novo methylation activity.
Targeting Methylation to Specific Sequences
Now that we've introduced the enzymes that methylate DNA, we'll turn to how these enzymes are targeted to their substrates. De novo DNA methylation is guided by small interfering RNAs (siRNAs) in a process termed RNA-directed DNA methylation (RdDM). This topic has been extensively reviewed and is the subject of much current research (Bender, 2004; Matzke and Birchler, 2005). Here we discuss only the basics.
Arabidopsis contains multiple small RNA pathways that are involved in pathogen defense, development, stress response, and the establishment of DNA methylation (Vaucheret, 2006; Chapman and Carrington, 2007). Each small RNA pathway relies on the function of an Argonaute (AGO), Dicer (DCL) and, in some instances, an RNA-dependent RNA polymerase (RdRP). The molecular functions of these enzymes have been elucidated in a number of organisms. Dicer RNA endonuclease enzymes cleave double-stranded RNAs (dsRNAs) into 21–25 nt pieces. One strand of the RNA is loaded into Argonaute complexes and then, depending on the pathway, can serve as a guide for transcript cleavage via Argonaute slicer activity or for de novo methylation. If dsRNAs do not arise naturally (e.g. from miRNA foldback precursors or overlapping sense-antisense transcripts) they can be synthesized by RNA-dependent RNA polymerases from single stranded RNA templates. Arabidopsis contains multiple genes for each of these enzymes. Many have discrete functions in particular small RNA pathways, although overlaps and substitutions between pathways do occur.
For most de novo DNA methylation at repetitive sequences, the primary players are AGO4, RDR2 (the RdRP), and DCL3. Also essential are RNA polymerase IV, a plant-specific addition to the familiar DNA-dependent RNA polymerase I, II, and III family, and DRD1, a SNF2-like chromatin remodeling enzyme (Kanno et al., 2004; Herr et al., 2005; Kanno et al., 2005; Onodera et al., 2005; Pontier et al., 2005). Mutation in any of these genes prevents establishment of de novo methylation at a naïve FWA transgene and causes loss of non-CG methylation at some loci where it is being actively maintained (Chan et al., 2004; Chan et al., 2006; Huettel et al., 2006). There are two large subunits for Pol IV, NRPD1a and NRPD1b, and one small subunit, NRPD2. Genetic and cell biology evidence indicates that NRPD1a and NRPD1b have distinct functions in siRNA biogenesis and RdDM. The current model for how methylation is established is as follows: aberrant transcripts are produced from endogenous repeat loci, including transposons, in a process dependent on NRPD1a and NRPD2. RDR2 converts these transcripts into dsRNA, which DCL3 processes into 24 nt siRNAs. The siRNA is then loaded into a complex containing AGO4 and NRPD1b, which acts with DRD1 to target the cognate DNA for methylation by DRM2 (Figure 3). NRPD1b has been shown to bind AGO4 directly via an extended C-terminal domain distinct from that found in Pol I, II, III, or NRPD1a (Li et al., 2006; El-Shami et al., 2007). AGO4 can also slice transcripts generated from the target locus, producing secondary siRNAs that continue to feed into the pathway, although this catalytic activity is not always required to maintain methylation (Qi et al., 2006).
Most, but not all, of the heterochromatic small RNAs generated from repeats are of the 24 nt size class and are dependent on RDR2 and NRPD1a (Xie et al., 2004; Lu et al., 2006; Zhang et al., 2007). For a transgene that produces a hairpin transcript, neither NRPD1a nor RDR2 is required (Kanno et al., 2005). Deep sequencing of small RNA populations in nrpd1a/1b and rdr2 mutants demonstrated that a class of siRNAs corresponding to hairpin RNAs persists, presumably because the hairpin generates a dsRNA substrate that can be directly diced (Zhang et al., 2007). While DCL3 is responsible for processing many of the methylation-directing siRNAs, DCL2 and DCL4, which generate slightly smaller siRNAs, act redundantly with DCL3 in targeting methylation (Henderson et al., 2006). For example, dcl2 dcl3 dcl4 triple mutants establish de novo methylation at naïve sequences more poorly than dcl3 single mutants, and at some loci all three DCLs are required to maintain a full level of non-CG methylation (Henderson et al., 2006). Additionally, AGO6 seems to act redundantly with AGO4 at transgene and endogenous loci (Zheng et al., 2007). Ultimately, the mechanics of the process seem to depend on the type of locus being examined. Not all repeats are created equal: inverted repeats, tandem repeats, and dispersed repeats can have different genetic requirements for establishing and maintaining methylation.
Recently, progress has been made in understanding where these processes take place within the cell (Figure 3). Unlike small RNAs that direct post-transcriptional gene silencing, which are found in the cytoplasm, heterochromatic 24 nt siRNAs reside in the nucleus. Many concentrate in the nucleolus (Pontes et al., 2006), the nuclear compartment responsible for rRNA production and ribosome assembly. Also found in the nucleolus are RDR2, DCL3, AGO4, NRPD1b, and DRM2, suggesting that siRNA processing and complex loading take place there (Li et al., 2006; Pontes et al., 2006; Li et al., 2008a). Disruption of NRPD1a causes mislocalization of RDR2, DCL3, and NRPD1b, supporting its upstream role in the RdDM pathway (Pontes et al., 2006). AGO4 colocalizes with Cajal bodies, which are distinct nucleolar bodies involved in ribonucleoprotein (RNP) complex processing (Li et al., 2006), and with smaller AB bodies, which lie adjacent to the NORs (Li et al., 2008a). NRPD1b and DRM2 localize with AB bodies, but not Cajal bodies (Li et al., 2008a). In contrast to AGO4, NRPD1b is observed in the nucleolus in only a small percentage of nuclei (6%) (Li et al., 2006; Li et al., 2008a). It would be interesting to determine under what conditions this occurs, or whether these nuclei arise from a particular cell type. AGO4, NRPD1b, and DRM2 are also found more diffusely throughout the nucleus, presumably at target DNA sequences (Li et al., 2006; Pontes et al., 2006; Li et al., 2008a).
It has been proposed that Pol IVa transcribes heterochromatic methylated DNA into RNA, which is then made into dsRNA by RDR2. However, polymerase activity has not yet been demonstrated for RNA Pol IV. No DNA-dependent RNA polymerase activity is associated with the smaller subunit NRPD2 (Onodera et al., 2005). An RNA template has also been suggested as a substrate for Pol IV (Vaughn and Martienssen, 2005; Pontes et al., 2006). In this scenario, a conventional DNA-dependent RNA polymerase might transcribe methylated DNA into RNA. Any aberrant transcripts arising from this transcription could then become templates for Pol IVa, and the resulting RNA a template for RDR2 (Pontes et al., 2006). Transcripts from methylated DNA might be more likely to be aberrant because methylated DNA can impede transcript elongation (Lorincz et al., 2004), thus reinforcing methylation at sequences that are already methylated. This raises the question of what makes an RNA aberrant. Another RdRP, RDR6, makes dsRNA from transgene RNA templates that are truncated and lack poly(A) tails (Luo and Chen, 2007) or are uncapped (Gazzani et al., 2004). The template for RDR2 is unknown.
An open question remains as to how siRNAs actually guide methylation. Does the siRNA bind to complementary DNA, or does it bind to a nascent RNA transcript? Data from an instance of microRNA (miRNA)-directed DNA methylation suggests that the latter is the more likely possibility (Bao et al., 2004). PHABULOSA (PHB) expression is down-regulated by a miRNA that is complementary to the spliced PHB transcript. This is correlated with methylation of the PHB gene downstream of the miRNA binding site, which only occurs when the miRNA binding site remains intact. This suggests that the miRNA base-pairs to the spliced nascent PHB transcript and directs DNA methylation, through unknown means, to the template chromosome (Bao et al., 2004).
The mechanisms of methylation targeting described so far apply to CG and non-CG methylation at repeated sequences. What about the methylation in genes? Genes can contain repeats, which can also be methylated efficiently by these systems (Chan et al., 2004). However, genes are in general depleted in siRNAs (Kasschau et al., 2007), and even genes with methylation in their bodies are not enriched in siRNAs (Zhang et al., 2006). Accordingly, most of the methylation found within gene bodies consists of clusters of CG methylation with little non-CG methylation. CG methylation without non-CG methylation indicates that the sequence is not being actively targeted by siRNAs and de novo methyltransferases, but that its methylation is simply being maintained by MET1. The genic meCG bias is consistent with analysis of di- and trinucleotide content within the Arabidopsis genome (Tran et al., 2005a). 5-methylcytosine can be deaminated to thymine, and if the resulting T:G mismatch is left unrepaired, a C:G to T:A transition occurs (Figure 1). CG dinucleotides are depleted in introns compared to intergenic regions, as expected if the methylation within genes is of the CG type (selection on amino acid sequence confounds this analysis in exons). Additionally, C(A/T)G trinucleotides have the highest intronic/intergenic ratio and C(C/G)G the lowest. Because CCG and CGG each contain a CG dinucleotide whereas CAG and CTG do not, this indicates that the methylation present at genes for long enough periods of evolutionary time to produce sequence biases is of the CG type only. Originally, this methylation probably existed as actively maintained CG and non-CG methylation that was targeted by RdDM due to the production of RNAs that could serve as templates for RdRPs. As the active signal is lost, only CG methylation remains. Active signals could be lost in particular tissues or from entire plants.
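The depletion argument rests on comparing dinucleotide and trinucleotide frequencies between sequence classes. The sketch below computes a standard CpG observed/expected ratio, (CG count × length) / (C count × G count), and an intronic/intergenic frequency ratio for any k-mer. It is a generic illustration under stated assumptions (one strand only; inputs would in practice be concatenated intronic and intergenic sequences) and not necessarily the exact statistic used by Tran et al. (2005a); the function names are hypothetical.

```python
from typing import Optional


def count_kmer(seq: str, kmer: str) -> int:
    """Count occurrences of kmer, allowing overlaps."""
    k = len(kmer)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == kmer)


def cg_observed_expected(seq: str) -> Optional[float]:
    """CpG observed/expected ratio: (CG count * length) / (C count * G count)."""
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return None
    return count_kmer(seq, "CG") * len(seq) / (c * g)


def kmer_frequency(seq: str, kmer: str) -> float:
    """Occurrences per possible window of length len(kmer)."""
    windows = len(seq) - len(kmer) + 1
    return count_kmer(seq, kmer) / windows if windows > 0 else 0.0


def intronic_intergenic_ratio(intronic: str, intergenic: str,
                              kmer: str) -> Optional[float]:
    """Frequency of kmer in intronic sequence relative to intergenic sequence."""
    background = kmer_frequency(intergenic, kmer)
    if background == 0:
        return None
    return kmer_frequency(intronic, kmer) / background


print(cg_observed_expected("CGCG"))  # 2.0
print(cg_observed_expected("CCGG"))  # 1.0
print(intronic_intergenic_ratio("CAGCAG", "CAGTTT", "CAG"))  # 2.0
```

Under the hypothesis in the text, applying these functions to real sequence would be expected to give intronic/intergenic ratios below 1 for CG-containing trinucleotides such as CCG and CGG, but not for CAG or CTG.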
A comparison of methylation between the Ler and Col accessions made by hybridizing McrBC-digested DNA to 1 kb tiling arrays found that half the genes with methylation in one accession were not methylated in the other (Vaughn et al., 2007). This speaks to the stochastic nature of the processes producing methylation in the bodies of genes. A brief period of aberrant RNA production and subsequent siRNA biogenesis could lead to CG methylation that is maintained for thousands of years.
Methylation in the Context of Chromatin
All of the processes we have discussed take place not on naked DNA, but on DNA wrapped around nucleosomes and compacted into chromatin. Changes in DNA methylation often parallel changes in histone modifications. In organisms that lack DNA methylation, like S. pombe and Drosophila, complexes containing siRNAs direct histone modifications and repressive chromatin structure (Matzke and Birchler, 2005; Slotkin and Martienssen, 2007). What impact does DNA methylation have on chromatin structure and vice versa?
In mammals, methyl-binding domain (MBD) proteins bind to methylated CpG sites and can recruit other chromatin-modifying enzymes, such as histone deacetylases or histone methyltransferases (Wade, 2001). The Arabidopsis genome encodes 12 MBD proteins, but so far little evidence links them to methylation-dependent functions (Grafi et al., 2007). Only a few can bind methylated CpG in vitro, and although some are required for normal plant development, it is not clear that this has anything to do with DNA methylation per se (Zemach and Grafi, 2003). AtMBD7 is the most likely candidate for a role in CG methylation; it binds methylated CpG in vitro and localizes to chromocenters in the nucleus (Zemach et al., 2005). We await further elucidation of the functions of these genes.
The RdDM pathway is not required to maintain methylation at centromere repeats. A screen for naturally hypomethylated centromere repeats among Arabidopsis accessions led to the discovery of a novel methylcytosine-binding protein, VIM1 (Woo et al., 2007). The Bor-4 accession of Arabidopsis contains centromere repeats that are hypomethylated and decondensed compared to the standard laboratory accession Columbia (Col). This phenotype is caused by a deletion in the VIM1 gene, which encodes a protein with a PHD domain, RING finger domains, and an SRA domain. In Col-0, VIM1 is found throughout the nucleus but is concentrated at chromocenters. Through its SRA domain and some additional amino acids, VIM1 binds double-stranded oligonucleotides methylated at CG or CHG sites, but not those methylated at CHH sites. In in vitro pull-down assays, VIM1 interacts with the centromeric H3 variant (CenH3) and with all of the core histones except H2A. This led the authors to propose a model whereby VIM1 binds 5-methylcytosine and then alters chromatin structure by modifying histones, although the nature of this modification remains speculative. The mammalian homologue, Np95, has a RING finger ubiquitin ligase domain and can ubiquitinate histones in vitro (Citterio et al., 2004).
Another mechanism for VIM1's effect on methylation has been suggested by studies of Np95 in mouse ES cells. It was recently shown that Np95, which also binds histones and double-stranded oligonucleotides with hemimethylated CG sites (Citterio et al., 2004; Unoki et al., 2004), recruits the Dnmt1 methyltransferase to replication foci and is required to maintain DNA methylation patterns throughout the genome (Bostick et al., 2007; Sharif et al., 2007). It remains to be seen whether VIM1 recruits MET1 or other methyltransferases to replicating methylated DNA and, if so, whether this recruitment is general or specific to centromeric sequences. In contrast to the global methylation defect observed in np95 mutant mouse ES cells, Arabidopsis Col-0 vim1 mutants lose methylation at centromeres but not at FWA or pericentromeric repeats (Woo et al., 2007).
There are 12 additional SRA domain genes in Arabidopsis, three of which are part of the VIM1 family. Nine SUVH histone methyltransferase genes also contain the domain (Johnson et al., 2007). A connection between histone methylation and DNA methylation has been apparent for many years. In Neurospora, DNA methylation is lost if the histone methyltransferase DIM5 is mutated (Tamaru and Selker, 2001). In repetitive regions of the Arabidopsis genome, the distribution of H3 lysine 9 dimethylation (H3K9me2) closely parallels that of DNA methylation. Mutations in the SWI2/SNF2 chromatin remodeling enzyme DDM1 cause loss of DNA methylation and H3K9 methylation in heterochromatic sequences (Jeddeloh et al., 1999; Lippmann et al., 2004). met1 mutants also lose H3K9me2 at repetitive sequences (Soppe et al., 2002; Tariq et al., 2003; Mathieu et al., 2007). This has been interpreted to mean that CG methylation directs H3K9me2. Another possibility is that the change in histone methylation status is a byproduct of the increased transcription of repetitive elements observed in met1. For example, transcription might disrupt chromatin and evict nucleosomes, which could result in the replacement of H3 methylated at K9 with unmethylated H3. Alternatively, transcription might prevent histone methyltransferases from gaining access to their substrates.
Three histone lysine methyltransferases, KYP/SUVH4, SUVH5, and SUVH6, are required for maintenance of non-CG methylation at CMT3 target loci. The effect of mutations in each of these genes on DNA methylation differs depending on the locus (Ebbs and Bender, 2006). The SRA domains of KYP/SUVH4 and SUVH6 bind methylated double-stranded oligonucleotides, with a preference for methylated CHG and CHH sites (Johnson et al., 2007). Because these histone methyltransferases can thus be recruited by non-CG methylation, this suggests that they and CMT3 act in a positive feedback loop to maintain non-CG DNA methylation and histone H3K9 dimethylation (Johnson et al., 2007).
Most of the attention on histone modifications and DNA methylation has focused on lysine methylation of histone H3. A screen for suppressors of transgene silencing associated with DNA hypermethylation and siRNA accumulation has suggested that histone H2B monoubiquitination is negatively correlated with DNA methylation (Sridhar et al., 2007). The suppressor encodes a protein, UBP26, similar to ubiquitin-specific proteases, and is required for maintaining non-CG methylation and H3K9me2 at the target locus.
DNA methylation, although an excellent tool for keeping transposons transcriptionally silent, can be harmful if it invades gene space, particularly promoter regions. As will be discussed in the next section, one mechanism to prevent this is genic demethylation by 5-methylcytosine DNA glycosylases. Analysis of the bonsai epimutant has suggested that histone demethylases might also function in this process. bonsai originally arose in a ddm1 mutant background due to silencing of the APC13 gene (Saze and Kakutani, 2007). Gene silencing is associated with accumulation of siRNAs and hypermethylation of the 3′ transcribed sequence, which extends to the entire gene over successive generations of ddm1 inbreeding. APC13 sits about 1 kb upstream of a heavily methylated LINE retrotransposon element that seems to be responsible for the epigenetic effects. Hypermethylation is not induced in accessions in which the LINE element is not inserted downstream of the gene. A screen for other mutants that would cause hypermethylation at bonsai within one generation (unlike ddm1, which takes several generations) identified the IBM1 gene (Saze et al., 2008). Methylation accumulated at the 3′ end of the gene, particularly at non-CG sites, and was associated with increased H3K9 methylation. IBM1 encodes a Jumonji-C domain-containing protein (Saze et al., 2008). Jumonji domains are found in H3K9 demethylases from diverse organisms (Klose et al., 2006). Although evidence that IBM1 is a histone demethylase is not yet complete, this work suggests a mechanism for “protecting” genes from nearby TEs.
DNA Demethylases: Removing Methylation
Another mechanism to protect genes from methylation targeted to repeats is to remove 5-methylcytosine directly from the DNA. Arabidopsis is the first organism in which DNA demethylases have been positively identified, so in this plant a given methylation pattern can be the outcome of both methyltransferase and demethylase activities.
The plant DNA demethylases are HhH-GPD (helix-hairpin-helix-Gly/Pro/Asp) DNA glycosylases. DNA glycosylases are base excision repair proteins that typically recognize and remove damaged or mispaired bases from DNA (Fromme and Verdine, 2004). They cleave the glycosidic bond between the base and the sugar of the DNA backbone, leaving an abasic site. Other enzymes then complete DNA repair: an AP endonuclease nicks the DNA backbone, DNA polymerase inserts the correct base, and DNA ligase seals the nick. DNA glycosylases are found in all organisms and repair many different types of damage, including oxidation, alkylation, and deamination. Crystal structures of several classes of glycosylases in complex with DNA have been solved. Like methyltransferases, these enzymes work by flipping the target base out of the helix and into the active site (Scharer and Jiricny, 2001; Fromme et al., 2004).
There are four members of the DNA glycosylase demethylase family in Arabidopsis: DEMETER (DME), REPRESSOR OF SILENCING1 (ROS1), DEMETER-LIKE2 (DML2), and DEMETER-LIKE3 (DML3) (Choi et al., 2002; Gong et al., 2002). Each of these proteins contains an HhH-GPD DNA glycosylase domain embedded within a much larger protein. The proteins share two other conserved domains that are of unknown function and unique to this class of proteins (Morales-Ruiz et al., 2006). DME and ROS1 were identified in genetic screens, while DML2 and DML3 were identified based on sequence homology. Mutations in DME lead to endosperm overgrowth followed by seed abortion when inherited maternally, and mutations in ROS1 cause transcriptional silencing of a previously stably expressed promoter:reporter gene fusion (Choi et al., 2002; Gong et al., 2002).
DME and ROS1 have been the most extensively characterized biochemically, although the details reported by different groups are not always identical. Purified, recombinant DME excises 5-methylcytosine, but not cytosine, from double-stranded oligonucleotides, releasing 5-methylcytosine as a free base (Morales-Ruiz et al., 2006). Excision occurs in CG, CHG, and CHH sequence contexts, on either fully methylated or hemimethylated substrates, with the strongest activity observed on CG sites (Gehring et al., 2006; Morales-Ruiz et al., 2006). HhH-GPD DNA glycosylases are characterized by a conserved aspartic acid residue and an invariant lysine. When either of these residues is mutated in recombinant DME, 5-methylcytosine DNA glycosylase activity is lost (Gehring et al., 2006; Morales-Ruiz et al., 2006).
Similar biochemical results were obtained for purified recombinant ROS1. One group found that ROS1 had an in vitro preference for methylated CHG sites, whereas another found the highest activity against CG (Agius et al., 2006; Morales-Ruiz et al., 2006). This discrepancy might be due to the different sequences of the methylated oligonucleotides used in these studies. A comparison of the kinetics of glycosylase action on 5-methylcytosine in different sequence contexts found that for both DME and ROS1, meC in the CAG context is removed much more efficiently than meC at the outer cytosine in the CCG context (Morales-Ruiz et al., 2006). CCG is also the least likely CHG site to be methylated in vivo (Cokus et al., 2008). DML2 and DML3 activities have also been investigated. DML3 removes methylcytosine in all sequence contexts from methylated oligonucleotides (Penterman et al., 2007c). DML2 does so as well, but far more weakly than the other glycosylases and not in all sequence contexts (Penterman et al., 2007c).
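The sequence-context terminology used above can be made concrete with a small classifier. The sketch below is a hypothetical helper written for illustration, not a published tool; it assumes an A/C/G/T-only sequence and examines a single strand, so cytosines on the complementary strand would have to be classified from the reverse complement. It shows why the outer cytosine of CCG counts as a CHG site, while the inner cytosine of the same motif is itself a CG site.

```python
def cytosine_contexts(seq: str) -> list[tuple[int, str, str]]:
    """Classify every cytosine on one strand by sequence context (hypothetical helper).

    Returns (0-based position, context, motif) tuples, where context is
    CG (C followed by G), CHG (C, a non-G base, then G) or CHH (C followed
    by two non-G bases). Assumes an A/C/G/T-only sequence; cytosines too
    close to the 3' end to classify are skipped.
    """
    seq = seq.upper()
    contexts = []
    for i, base in enumerate(seq):
        if base != "C" or i + 1 >= len(seq):
            continue
        if seq[i + 1] == "G":
            contexts.append((i, "CG", seq[i:i + 2]))
        elif i + 2 < len(seq):
            # Position i+1 is H (A, C or T); the base at i+2 decides CHG vs CHH.
            context = "CHG" if seq[i + 2] == "G" else "CHH"
            contexts.append((i, context, seq[i:i + 3]))
    return contexts


# Classify the cytosines in a short example containing CAG, CCG and CTT motifs.
for pos, ctx, motif in cytosine_contexts("CAGCCGCTT"):
    print(pos, ctx, motif)
```

For CAGCCGCTT, the function reports the cytosine of CAG and the outer cytosine of CCG as CHG sites, the inner cytosine of CCG as a CG site, and the cytosine of CTT as a CHH site.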
The idea that DNA glycosylases serve as demethylases by recognizing and removing 5-methylcytosine is not a new one and has some precedent in the animal literature (Cortazar et al., 2007). One persistent criticism of this idea has been that removal of symmetrically methylated cytosines on opposite strands of the DNA could lead to double-stranded DNA breaks, a side effect that would be disastrous. It appears, however, that inhibition of the enzyme by nearby abasic sites solves this problem. DME activity on meCG is greatly reduced when it lies opposite an abasic site, and moving the abasic site farther away from the methylated CG relieves the inhibition (Gehring et al., 2006).
HhH-GPD DNA glycosylases can be monofunctional or bifunctional (Fromme and Verdine, 2004). Monofunctional glycosylases attack the base using water and leave the DNA backbone intact. Bifunctional DNA glycosylases (also known as glycosylases/lyases) use nucleophilic attack to remove the damaged base and then nick the DNA 3′ of the resulting abasic site. The bifunctional glycosylase pathway acts only on the single base being repaired, while the monofunctional glycosylase pathway can lead to long-patch repair, in which the DNA strand is displaced and several bases are replaced by polymerase. Monofunctional and bifunctional DNA glycosylases are distinguished by the different elimination products generated by the two reactions and by the fact that bifunctional DNA glycosylases form a covalent intermediate with the DNA that can be trapped with sodium borohydride. All of the Arabidopsis enzymes are bifunctional DNA glycosylases that release characteristic β- and δ-elimination products (Agius et al., 2006; Gehring et al., 2006; Morales-Ruiz et al., 2006; Penterman et al., 2007c), and DME and ROS1 can be trapped by sodium borohydride (Agius et al., 2006; Gehring et al., 2006; Morales-Ruiz et al., 2006). This means that the enzymes act on a single base at a time, which reduces the chance that they could cause double-strand breaks during 5-methylcytosine removal.
Photosynthetic organisms are subject to continual assault on their DNA from sunlight and from the byproducts of photosynthesis. Plants contain a large number of DNA glycosylases, which might have allowed additional functional specialization of these enzymes (Britt, 2002). While HhH-GPD DNA glycosylases are the largest class of glycosylases among all organisms, the DME family appears to be unique to the plant lineage. The chromatin database (www.chromdb.org) identifies members in all of the sequenced or partially sequenced plant genomes, down to the unicellular alga Ostreococcus lucimarinus. In most organisms, HhH-GPD glycosylases range in size from 18 to 40 kDa, with the glycosylase domain itself being about 200 amino acids. In contrast, the Arabidopsis DME family enzymes are between 1100 and 1800 amino acids long. The only other glycosylase that approaches this size is the Drosophila MUG (mismatch uracil glycosylase) protein Thd1, which is 191 kDa and processes U:G mismatches (Cortazar et al., 2007).
It is possible that the DEMETER family of enzymes also plays a role in repair of spontaneous DNA damage. Although many organisms repair the damage caused by spontaneous deamination of methylated cytosines to thymines with a thymine-specific MUG DNA glycosylase, no MUG homologs are found in the Arabidopsis genome (Britt, 2002). This is especially surprising given the large amounts of 5-methylcytosine found in plant genomes and the fact that spontaneous deamination is not uncommon. The DEMETER family of enzymes might serve this function in Arabidopsis. DME and ROS1 have substantial activity in vitro against T:G mismatches (results for DML2 and DML3 have not been reported) (Gehring et al., 2006; Morales-Ruiz et al., 2006). ros1 mutants are also more sensitive than wild type plants to the DNA alkylating agent MMS (methyl methanesulfonate) and the oxidizing agent hydrogen peroxide (Gong et al., 2002). It is unknown whether this is a direct or indirect effect, and it would be unusual for a single glycosylase to repair so many diverse types of DNA damage.
How 5-methylcytosine DNA glycosylases recognize their substrate is a mystery, because 5-methylcytosine is a “normal” base. Thymine DNA glycosylases, which recognize the normal base thymine, do so by also querying the complementary strand of DNA, so that only thymines paired with guanines are removed, not thymines paired with adenines (Cortazar et al., 2007). This sort of mechanism does not seem sufficient for 5-methylcytosine DNA glycosylase activity: 5-methylcytosine pairs with guanine just as cytosine does, so the opposite strand alone cannot distinguish the two. Solving the crystal structure of these enzymes in complex with DNA will be vital for understanding how they recognize and remove 5-methylcytosine.
In vivo evidence also supports the designation of these enzymes as DNA demethylases and indicates that their glycosylase activity is required for function in planta. When a DME transgene carrying a missense mutation in the critical aspartic acid residue is introduced into plants, it no longer complements the dme phenotype (Choi et al., 2004). The MEA Polycomb group gene, defined genetically as a target of DME in the female gametophyte, becomes hypermethylated when dme is mutated (Gehring et al., 2006). Similarly, the endogenous RD29A promoter and an RD29A promoter transgene become hypermethylated and transcriptionally silent in a ros1 background (Gong et al., 2002).
Two different approaches have been undertaken to search for endogenous targets of ROS1 and the other DEMETER-LIKE genes. One group compared gene expression in wild type and ros1 plants, reasoning that hypermethylation in ros1 could lead to gene silencing (Zhu et al., 2007). Three genes whose expression decreased in the ros1 mutant showed increased DNA methylation in their promoter regions. The authors also examined several well-studied transposons and identified increases in non-CG methylation at a subset of these (CG methylation was already near 100% in wild type and could not be expected to go much higher).
Targets of ROS1 and the DML2 and DML3 enzymes have been directly mapped using genome-wide methylation profiling. Methylation patterns in wild type and ros1 dml2 dml3 triple mutant whole plants were compared by isolating methylated DNA from each genotype and hybridizing it to tiling microarrays (Penterman et al., 2007c). By subtracting one methylation pattern from the other, about 200 regions that were hypermethylated in the mutant were identified. Detailed bisulfite analysis at a subset of these sequences indicated that hypermethylation occurred both at sites where there was no methylation in wild type and at sites where some was already present. Overall, hypermethylation was found in all sequence contexts, although this varied depending on the locus. Most of the hypermethylation occurred in genic sequences. Among genes that were not methylated in wild type but became methylated in the triple mutant, methylation accumulated predominantly at the 5′ and 3′ ends of genes. This is the opposite of the genic methylation pattern found in wild type, and these are the regions where methylation is potentially most deleterious to gene function (Figure 2). However, of the 13 targets tested, very few showed an expression change between wild type and mutant, indicating that hypermethylation is not necessarily immediately deleterious to gene expression. The methylation removed by the DEMETER family of enzymes is deposited by RdDM, and targets of the DEMETER family are enriched in siRNAs (Penterman et al., 2007a). This suggests that one of the functions of the demethylases is to “clean up” after a methylation system that robustly recognizes and silences parasitic transposable elements. In this way, genes would be protected from methylation that can potentially silence normal gene expression.
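The subtraction strategy described above can be sketched as a difference-and-threshold scan along the genome. This is a hypothetical illustration, not the analysis pipeline of Penterman et al.: it assumes each genotype's profile is a list of per-window methylation scores (for example, enrichment of methylated DNA on consecutive tiling-array probes), and the threshold and minimum run length are arbitrary placeholders rather than published parameters.

```python
from collections.abc import Sequence


def hypermethylated_regions(
    wild_type: Sequence[float],
    mutant: Sequence[float],
    threshold: float = 1.0,
    min_windows: int = 2,
) -> list[tuple[int, int]]:
    """Find runs of consecutive windows where mutant minus wild-type signal exceeds threshold.

    Returns half-open (start, end) window-index ranges; runs shorter than
    min_windows are discarded as likely noise. threshold and min_windows are
    illustrative placeholders, not published parameters.
    """
    if len(wild_type) != len(mutant):
        raise ValueError("both profiles must cover the same windows")
    regions = []
    start = None  # index where the current run of hypermethylated windows began
    for i, (wt, mut) in enumerate(zip(wild_type, mutant)):
        if mut - wt > threshold:
            if start is None:
                start = i
        else:
            # The run (if any) ended at the previous window; keep it if long enough.
            if start is not None and i - start >= min_windows:
                regions.append((start, i))
            start = None
    # Close a run that extends to the last window.
    if start is not None and len(mutant) - start >= min_windows:
        regions.append((start, len(mutant)))
    return regions


wild_type_profile = [0.1, 0.2, 0.1, 0.3, 0.2, 0.1]
triple_mutant_profile = [0.2, 1.5, 1.8, 0.4, 1.6, 0.1]
print(hypermethylated_regions(wild_type_profile, triple_mutant_profile))
```

Here windows 1 and 2 form a called region, while the single elevated window at index 4 is discarded because it is shorter than min_windows. Requiring a run of adjacent windows is one simple way to favor contiguous hypermethylated regions over isolated noisy probes.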
At present, nothing is known about how the glycosylases are targeted to their substrates. How, for example, are the ends of genes distinguished? One possibility is that chromatin differences mark the ends of genes and are recognized by ROS1, DML2, and DML3. How any glycosylase finds its target among millions of base pairs of DNA condensed into chromatin is still an open question. One possibility is that glycosylases look for their substrates by constantly scanning the DNA. The endogenous RD29A gene promoter is not a target of ROS1 in wild type plants, but becomes one when an siRNA-producing RD29A transgene is introduced (Gong et al., 2002). This suggests that ROS1 can rapidly recognize new genic targets and remove their methylation. The mechanism of 5-methylcytosine DNA glycosylase targeting is likely to be a fruitful area of future research, as will be the identification of interacting factors.
Finally, evidence for cross-talk and potential coordination between the methylation, RNA silencing, and demethylation pathways is emerging. Expression of ROS1 is reduced in mutants of the RdDM pathway and in met1 (Huettel et al., 2006; Mathieu et al., 2007; Penterman et al., 2007a) and ROS1 target loci become hypermethylated in rdr2 and drm2 mutants (Penterman et al., 2007a). Reduction of DNA demethylase function in RdDM and methyltransferase mutants complicates the analysis of methylation in these backgrounds.
Methylation, Demethylation, and Regulation
Methylation has been ascribed many roles in Arabidopsis; it is widely reported to regulate transposons, regulate genes, and regulate development. The word regulate tends to be bandied about rather loosely in scientific discourse and applied to any situation in which a molecular or morphological phenotype emerges: mutation of factor X causes phenotype Y to emerge, therefore factor X regulates process Y. Mutations in HUELLENLOS (HLL) lead to a striking developmental phenotype: ovule growth is arrested and the pistil does not elongate. HLL encodes a mitochondrial ribosomal protein (Skinner et al., 2001). Does this mean that ribosomal proteins regulate development? Clearly not. With regard to methylation, it might be helpful to be more precise in our use of the word regulate. Methylation and demethylation activities appear to function primarily in genome housekeeping, with regulation limited, at this point, to the expression of imprinted genes (Figure 4).
For the majority of methylated sequences found in the Arabidopsis genome, like TEs and centromeres, “methylated” is the default state and methylation is propagated indefinitely. Unlike the mammalian methylome, the plant methylome does not appear to be reset during gametogenesis. There is little evidence that the status of methylated sequences changes during growth and development of the organism. Methylation clearly affects repeated sequences, since removing it has dramatic effects, but it does not appear to regulate them dynamically. Similarly, DNA demethylation serves a housekeeping function by removing genic RNA-directed DNA methylation (Gong et al., 2002; Penterman et al., 2007a). In these ways transposable elements are kept silent and genes are protected from methylation-induced silencing (Figure 4A).
A similar housekeeping scenario may apply to the clusters of CG methylation found in genes. In met1 mutants, genes that are body-methylated in wild type have a slight overall increase in expression compared to genes that are not methylated in wild type (Zilberman et al., 2007). This suggests that methylation in the body of genes dampens their expression, but does not suggest that it does so in a regulatory manner.
The fact that drm2 cmt3 plants exhibit reproducible recessive phenotypes has been interpreted to mean that non-CG methylation regulates development (Chan et al., 2006). The molecular basis of the phenotypes has not been elucidated and no target genes are known. As of yet, there is no evidence that non-CG methylation actively regulates genes.
Regulation by methylation describes a change in the methylation status of a sequence associated with a change in function. In mammals, tissue-specific differences in methylation have been discovered. There is evidence that germline-specific genes become methylated in differentiated tissues, where they are no longer expressed (Weber et al., 2007). These questions have not been fully addressed experimentally in plants. Several plant studies have found differences in methylation between tissue types, although for the most part the sequences involved have not been identified (Gehring and Henikoff, 2007). These results must be viewed with a degree of caution, as most rely on detecting differences with methylation-sensitive restriction enzymes, and methylation can be lost at individual cytosines even though the region remains methylated overall (Tran et al., 2005a). A recent study profiled DNA methylation and histone modifications in young rice shoots and cultured cells using McrBC digestion followed by hybridization to high-density tiling microarrays. DNA methylation differences were observed at several genes and transposons (Li et al., 2008b). The question of whether there are tissue-specific patterns of DNA methylation in Arabidopsis is now ready to be addressed with the new methods available for isolating DNA from specific tissues and for determining high-resolution methylation patterns. It might be particularly interesting to compare undifferentiated and differentiated tissue types (e.g., meristems and leaves). For example, the PHB gene is less methylated and more highly expressed in inflorescence meristems than in mature tissues. However, PHB expression levels are under miRNA control, and methylation appears to play no role in regulating PHB mRNA. Determining whether tissue-specific methylation differences are biologically significant will be a challenge.
Despite caveats concerning the possible regulatory roles of most plant DNA methylation, imprinted genes are clear examples in which DNA demethylation has been harnessed to regulate gene expression. Alleles of imprinted genes are expressed differently depending on whether they are inherited from the male or female parent. There is good evidence that the methylation of expressed alleles of imprinted genes changes during female gametophyte development (Figure 4) (Gehring and Henikoff, 2007; Huh et al., 2007; Penterman et al., 2007b). In Arabidopsis we know of four imprinted genes: MEDEA, FWA, FIS2, and PHERES. The first three are expressed maternally and silenced paternally in the endosperm. The endosperm is one of the products of double fertilization (the embryo being the other) and arises when a haploid sperm fertilizes the diploid central cell of the female gametophyte. The triploid endosperm nourishes the embryo during seed development. Expression of maternal MEDEA and FIS2 alleles is essential for normal endosperm development and seed viability (Grossniklaus et al., 1998; Kiyosue et al., 1999; Luo et al., 1999). These genes encode components of Polycomb group complexes, which generally function to maintain gene repression. Maternal inheritance of mutant mea or fis2 alleles causes endosperm overproliferation and embryo arrest. The role of FWA during endosperm development is unknown.
MEA, FWA, and FIS2 are all associated with DNA methylation that is maintained by MET1 (Soppe et al., 2000; Xiao et al., 2003; Kinoshita et al., 2004; Jullien et al., 2006a). Maternal expression of MEA, FWA, and FIS2 in the female gametophyte and endosperm depends on inheritance of a wild type maternal allele of the 5-methylcytosine DNA glycosylase DME (Choi et al., 2002; Kinoshita et al., 2004; Jullien et al., 2006a). DME exhibits a restricted pattern of expression during reproductive development; it is expressed in the polar nuclei (which fuse to form the central cell nucleus) and central cell nucleus, but not in the egg cell, which gives rise to the embryo (Choi et al., 2002).
The simplest imprinting scenario (or so it seems at present) is represented by FWA (Figure 4B). In the sporophyte, FWA is highly methylated at tandem repeats in the promoter and first exon and is silenced. When methylation is lost, as in met1 mutants, the gene is expressed (Soppe et al., 2000). In wild type endosperm, but not embryo, FWA is relatively hypomethylated and is expressed. Expression depends on DME, suggesting that DME removes 5-methylcytosine from the 5′ region of FWA (Kinoshita et al., 2004). Since the endosperm is a terminally differentiated tissue, the hypomethylated allele is never inherited, obviating the need for methylation resetting. Regulation of MEA expression is slightly more complicated, but the same basic features are present. MEA is methylated at tandem direct repeats 3′ of the gene and, in some ecotypes, at a CG cluster 5′ of the gene (Xiao et al., 2003; Gehring et al., 2006). The expressed maternal MEA allele is hypomethylated in the endosperm compared to the silent paternal allele and the expressed embryo alleles. If a mutant maternal dme allele is inherited, the maternal MEA allele is hypermethylated in the endosperm (Gehring et al., 2006). Unlike the case for FWA, loss of methylation on the paternal MEA allele in a met1 mutant does not lead to paternal allele expression in the endosperm. Polycomb group (PcG) genes, including maternal MEA, repress paternal MEA expression (Baroux et al., 2006; Gehring et al., 2006; Jullien et al., 2006b). For both FWA and MEA (and FIS2, although endosperm methylation data have not yet been reported), the presumptive site of demethylation is the central cell of the female gametophyte, where DME is expressed (Choi et al., 2002). Work on maize imprinted genes also supports this conclusion: a PcG gene that is maternally expressed and paternally silent in the endosperm is hypomethylated in the central cell compared to the egg cell and sperm cells (Gutierrez-Marcos et al., 2006).
The restricted expression pattern of the DME demethylase during reproductive development allows imprinting to be established. Thus far, the methylation changes at the FWA and MEA genes are the only examples of methylation changing during the growth and development of the plant and actively regulating gene expression.
The pace of methylation and demethylation research in Arabidopsis is expected to remain brisk. A little over a year after the first high-resolution methylation map for Arabidopsis (or any organism) was published (Zhang et al., 2006), a map with single-base resolution is now available (Cokus et al., 2008). The challenge now is to make sense of all this information. Both genetic and genomic experiments will be required to tease apart the impact of methylation and demethylation on the genome. How significant is methylation at a particular sequence? How dynamic is it? Our understanding of DNA demethylation is still relatively rudimentary, and significant questions remain unanswered. With all of the molecular, genetic, and genomic tools available for studying these problems in Arabidopsis, we expect that this chapter will need a significant update within just a few years.
We thank Roger Deal for comments on the manuscript. M.G. is a Howard Hughes Medical Institute Fellow of the Life Sciences Research Foundation.