Fundamental questions remain to be answered on how lineages split and new species form. The Arabidopsis genus, with several increasingly well characterized species closely related to the model system A. thaliana, provides a rare opportunity to address key questions in speciation research. Arabidopsis species, and in some cases populations within a species, vary considerably in their habitat preferences, adaptations to local environments, mating system, life history strategy, genome structure and chromosome number. These differences provide numerous open doors for understanding the role these factors play in population divergence and how they may cause barriers to arise among nascent species. Molecular tools available in A. thaliana are widely applicable to its relatives, and together with modern comparative genomic approaches they will provide new and increasingly mechanistic insights into the processes underpinning lineage divergence and speciation. We will discuss recent progress in understanding the molecular basis of local adaptation, reproductive isolation and genetic incompatibility, focusing on work utilizing the Arabidopsis genus, and will highlight several areas in which additional research will provide meaningful insights into adaptation and speciation processes in this genus.
Explaining how the diversity of organisms populating Earth arises remains a fascinating challenge. Extensive progress has been made in recent years, and numerous informative reviews have been written on this topic (e.g. Coyne and Orr, 2004; Lowry et al. 2008a; Rieseberg and Willis, 2007; Rieseberg and Blackman, 2010; Schluter, 2009; Schluter and Conte, 2009; Sobel et al. 2009). A number of conclusions can be drawn from work on speciation in recent decades. For example, it is increasingly clear that pre-zygotic barriers (factors that prevent mating), such as adaptation to divergent habitats or distinct pollinator preferences, likely play starring roles in diversification and speciation (Lowry et al., 2008a; Rieseberg and Blackman, 2010; Rieseberg and Willis, 2007; Schluter, 2009; Schluter and Conte, 2009; Sobel et al. 2009). Though they may contribute less to overall isolation, post-zygotic barriers (e.g. genetic incompatibilities) clearly also play an important role in restricting gene flow among species (Lowry et al., 2008a), and recent years have seen the identification of numerous causal genes, particularly in plants (Rieseberg and Blackman, 2010). It has also been pointed out that the majority of “speciation genes” identified to date are post-zygotic, and comparatively little is known about the molecular basis of pre-zygotic barriers (with the exception of pollinator preferences; Rieseberg and Blackman, 2010).
Implicit in contemporary views of speciation is the notion that ecological and genetic factors play important, and often inextricably intertwined roles in population divergence and isolation (e.g. Lowry et al., 2008a; Rieseberg and Willis, 2007; Schluter, 2009; Schluter and Conte, 2009). Relevant factors in speciation thus include causes of population fragmentation and divergence (such as geographic barriers, temporal isolation or adaptations to different habitats), genes that mediate environmental adaptation, and the genetic barriers that accumulate between diverging populations, whether they result from selection or drift. This underscores the importance of gaining a detailed understanding of multiple aspects of species biology, including ecology, population genetic patterns, demographic history and the genetics of extrinsic or intrinsic reproductive isolating barriers. Perhaps most important (and in some ways most difficult) is to clearly assess and quantify current, and where possible historic impediments to gene flow. Such studies are still rare, but those that have been done provide insights into the relative importance of different barriers to gene flow in natural populations (Lowry et al., 2008a).
Molecular tools available in model systems can be extended with increasing ease to relatives, providing us with “model genera” or even “model families” (Bomblies and Weigel, 2007a, 2010; Crosby et al. 2007) that can help get around some of the biases inherent in studying just a single model species, while the relatedness to traditional models keeps open the door to a wealth of molecular tools. The Arabidopsis genus, with a manageable number of closely related, genetically tractable and sometimes inter-fertile species, provides excellent opportunities to study processes relevant to speciation in molecular detail (Bomblies and Weigel, 2007a, 2010). In this review, we will do two things: First, we will discuss work relevant to speciation and population divergence that has utilized the Arabidopsis genus either as a way of gaining molecular insight into barriers known from other species, or to understand barriers active in this genus. Second, we will point to ongoing and future opportunities for more fully capitalizing on the power of this genus for studies in speciation. Since previous reviews have covered additional aspects of speciation research in Arabidopsis, we will not cover all areas here (see Bomblies and Weigel 2007a, 2010).
Species and Speciation History in the Arabidopsis Genus
The taxonomy, evolution and ecology of the Arabidopsis genus have been extensively reviewed (see e.g. Al-Shehbaz and O'Kane, 2002; Clauss and Koch, 2006; Hoffmann, 2005; Koch et al., 2008; Koch and Matschinger, 2007; Mitchell-Olds, 2001; Shimizu, 2002) and we will thus include only a brief summary here to provide context relevant for the remainder of the paper. Aside from several rare or locally endemic species, the Arabidopsis genus consists of four major lineages: A. thaliana, A. lyrata, A. arenosa, and A. halleri, with the latter three each divided into two or more sub-species (Figure 1). Species in these Arabidopsis lineages generally have eight chromosomes, or sixteen in tetraploids, with the exception of A. thaliana, which has five chromosomes, and A. suecica, which has 13 chromosomes (Al-Shehbaz and O'Kane, 2002). Most species are self-incompatible, though A. thaliana, A. suecica and A. kamchatika are self-fertile.
The earliest diverging lineage is A. thaliana, which split from the rest of the genus about 3.8–5 million years ago (MYA). The remaining species radiated approximately 2 MYA (Clauss and Koch, 2006; Hoffmann, 2005; Koch and Matschinger, 2007). Arabidopsis thaliana is primarily self-fertilizing and inhabits a variety of low competition habitats, primarily human-associated cultivated or waste areas, but also sparse meadows, rubble slopes and riverbanks (Al-Shehbaz and O'Kane, 2002). Being the model system of choice for many molecular biologists means that a genome sequence, a wealth of molecular information and extensive resources are available. Large-scale genome sequencing and characterization efforts are adding to the store of knowledge about variation in genome sequences and organization, and opening the door to understanding how this correlates with phenotypic variation, poising A. thaliana to become an increasingly powerful tool for understanding the molecular basis of evolutionary processes (e.g. Clark et al. 2007; Weigel and Mott, 2009; Zhang et al. 2006).
Arabidopsis lyrata is perennial and is made up of two recognized geographically isolated subspecies: A. lyrata ssp. petraea occurs in Eurasia, while A. lyrata ssp. lyrata occurs in North America (Al-Shehbaz and O'Kane, 2002; Clauss and Koch, 2006). There are populations in temperate regions of Europe and North America, but A. lyrata is primarily an arctic species (Al-Shehbaz and O'Kane, 2002; Hoffmann, 2005). It is diploid (n=8) throughout most of its range, but tetraploid (n=16) populations are known from Austria (Dart et al., 2004). The A. lyrata genome has recently been sequenced, greatly facilitating molecular studies and opening numerous doors for detailed investigation of mechanisms relevant to speciation and adaptation (http://genome.jgi-psf.org/Araly1/Araly1.info.html).
Arabidopsis arenosa is currently divided into two recognized subspecies that both occur in Europe: ssp. borbasii tends to be perennial and inhabits mountainous or rocky habitats, while ssp. arenosa is annual or biennial and more often inhabits ruderal sites (Al-Shehbaz and O'Kane, 2002; Scholz, 1962). Arabidopsis arenosa is tetraploid (n=16) through much of its range, but diploid populations (n=8) are known from the Balkans and Carpathian mountains (MA Koch et al. personal communication; Měsíček, 1967; Schmickl and Koch in preparation). The genome of a diploid A. arenosa is also being sequenced, which will greatly expand the opportunities for comparative molecular approaches with multiple members of the Arabidopsis genus (Comai and Rokhsar et al.; http://www.jgi.doe.gov/sequencing/statusreporter/psr.php?projectid=402430).
Arabidopsis halleri is currently divided into five recognized sub-species (Koch et al., 2008; Kolník and Marhold, 2006). It is diploid (n = 8) and populations are patchily distributed throughout Eurasia (Hoffmann, 2005; Kolník and Marhold, 2006). Unlike its congeners, A. halleri is tolerant of shading and competition and occurs primarily in mesic meadow habitats (Clauss and Koch, 2006; Pauwels et al., 2005). This species is also tolerant of high levels of heavy metals in soil, which has allowed it to invade many habitats polluted by human mining activities (Clauss and Koch, 2006).
Hybrid zones can be very important models for understanding processes of introgression, adaptation, speciation, and gene flow barriers (e.g. Barton and Hewitt, 1989; Rieseberg et al., 1999). In Eastern Austria, tetraploid (n = 16) populations of A. arenosa and A. lyrata co-occur and a hybrid zone has recently been found (Schmickl and Koch in preparation; MA Koch et al. personal communication), promising exciting opportunities to study the molecular and phenotypic consequences of gene flow and the mechanisms that maintain species barriers. There is molecular evidence of hybridization in several other Arabidopsis species as well, though no current hybrid zones have been identified. For example, there is evidence of historical gene flow between A. halleri and A. lyrata (Ramos-Onsins et al., 2004). Crosses between these species are possible in the lab, but hybrid fitness is lower than that of the parents. This suggests that intrinsic incompatibilities have arisen since their divergence, which may explain why there are no modern hybrid zones known (Ramos-Onsins et al., 2004). Gene flow also seems to have occurred historically between A. arenosa and A. croatica (Koch and Matschinger, 2007).
Barriers relevant to plant speciation
Hybridization can also give rise to new species. This mechanism is thought to be a very important factor in plant speciation (Buerkle et al. 2000; Hegarty and Hiscock, 2005). The Arabidopsis genus contains two hybrid species that provide interesting models (Figure 1): A single hybridization event between A. thaliana and A. arenosa spawned a natural allopolyploid hybrid species, A. suecica, with 13 chromosomes, eight of which come from A. arenosa, and five from A. thaliana (Hylander, 1957; Jakobsson et al., 2006; Měsíček, 1967; Säll et al., 2003). Arabidopsis kamchatika is a more recently recognized hybrid Arabidopsis species that seems to have arisen via several independent hybridization events between A. halleri ssp. gemmifera and A. lyrata in Eastern Asia, though some lineages recognized as A. kamchatika appear to be autopolyploid derivatives of A. lyrata (Clauss and Koch, 2006; Koch and Matschinger, 2007; Schmickl et al., 2010; Shimizu-Inatsugi et al., 2009; Watanabe et al., 2005).
Overall, the species in the Arabidopsis genus provide many opportunities for studying processes relevant to adaptation, divergence and speciation. In this review we will discuss how they have already contributed to our understanding of these processes, as well as where more research is needed. Table 1 provides a list of gene flow barriers relevant to plants, primarily following previously utilized classification schemes (e.g. Coyne and Orr, 2004; Rieseberg and Blackman, 2010), and provides a brief summary of what is known in the Arabidopsis genus, which will be discussed in more detail below, as well as pointing out areas where more research is needed.
Edaphic adaptation: Adaptations to habitat or soil type differences (edaphic adaptation) may be very important in divergence and ecological speciation in plants (Schluter, 2009; Schluter and Conte, 2009; Rajakaruna, 2004; Rieseberg and Willis, 2007). While several genes important for adaptation are known, it has yet to be demonstrated whether these adaptations lead to reproductive isolation; thus genes that might truly be considered “edaphic isolation genes” remain as yet unknown (Rieseberg and Blackman, 2010). The Arabidopsis genus provides excellent opportunities in this area. Despite extensive range overlap and broad similarities in habitat preferences, botanical surveys rarely name multiple Arabidopsis species co-occurring in the same plant communities. This suggests that at least some habitat isolation exists among these species. It will be crucial for future research in speciation in this genus to investigate which local adaptations lead to gene flow barriers among species or populations in nature.
Arabidopsis species have colonized a wide range of primarily low-competition habitats. Arabidopsis lyrata, for example, occurs in sand dunes, tundra, stream banks, lakeshores, and rocky slopes, and can be found on calcareous/alkaline substrates (e.g. limestone, dolomite or gypsum), silicaceous/acidic substrates (e.g. sandstone and granite), and even serpentine (Al-Shehbaz and O'Kane, 2002; Černý et al., 2006; Clauss and Koch, 2006; Stelfox, 1970; Turner et al., 2008). It occurs from temperate regions into the high arctic (Hoffmann, 2005). These habitats likely vary in numerous important factors, such as nutrient content and availability, rhizosphere communities, temperature, light incidence, growing season length, competition, and pathogen and herbivore prevalence. The closely related A. arenosa is also found in a diverse array of habitats (Boublík et al., 2007; Clauss and Koch, 2006; Husová, 1967; Justin, 1993; Polatschek, 1965), but has also adapted to more extreme substrates than those reported for A. lyrata, such as strongly acidic soils (Lawesson and Mark, 2000) and heavy metal contaminated sites (Kucharski et al., 2005; Przedpełska and Wierzbicka, 2007). Arabidopsis halleri is unusual in that it is tolerant of shading and can fare well in high competition habitats such as mesic meadow sites (Clauss and Koch, 2006; Pauwels et al., 2005). There are many adaptations that are no doubt relevant to gene flow in and among Arabidopsis species - for example, adaptation from ancestral alpine habitats to lower altitude habitats was likely important in the diversification of A. arenosa and A. halleri subspecies (Al-Shehbaz and O'Kane, 2002).
Little is known in this genus about the role of edaphic factors in isolating gene pools. However, Arabidopsis species have already provided important insights into the molecular mechanisms of adaptation, which provide valuable information for filling out future studies of edaphic isolation. Adaptation to challenging substrates has been particularly well studied. As an illustrative example, we will discuss heavy metal tolerance and accumulation in A. halleri, which is an adaptation that is especially well understood at the molecular level (Bert et al., 2002; Macnair, 2002; Macnair et al., 1999). In A. halleri, adaptation to metal-contaminated sites has occurred multiple times, indicating a predisposition to achieving this ecological transition (e.g. Bert et al., 2002; Pauwels et al., 2005; Punz and Mucina, 1997). Several genes important in adaptation to heavy metal-containing soils have been identified. For example, HMA4 is a metal transporter that contributes to Zn and Cd tolerance differences between A. lyrata and A. halleri (Courbot et al., 2007; Hanikenne et al., 2008; Willems et al., 2007); its ortholog is also implicated in heavy metal tolerance and hyperaccumulation in T. caerulescens (Bernard et al., 2004; Papoyan and Kochian, 2004). Non-tolerant species and A. halleri differ in HMA4 expression, with the higher level in A. halleri attributable to a combination of gene copy number increase and cis-regulatory changes (Hanikenne et al., 2008). Another zinc transporter, MTP1, and at least two additional genes implicated in hypertolerance also show evidence of copy number and expression level increases (Dräger et al., 2004; Talke et al., 2006). Adaptation in A. lyrata to another challenging substrate, serpentine soil, also showed evidence of copy number expansion in several substrate-associated genomic regions, though causal genes remain to be characterized (Turner et al., 2008; Turner et al., 2010). Salt tolerance in A. thaliana accessions was similarly traced to a natural variant with altered expression in root tissues of the sodium transporter HKT1 (Rus et al., 2006). These studies suggest that expression changes of causal genes already found in ancestral populations will be a common theme at least in substrate adaptation. To what degree these confer gene flow barriers, and whether the effects are direct or indirect, provides interesting questions for follow-up studies.
In the future, traditional QTL mapping (where crosses are possible) and high-throughput genomic approaches will likely provide additional insights into adaptation to the diverse habitats in which Arabidopsis species are found. Tiling-array analysis and high-throughput sequencing of genotypes inhabiting alternate habitats can be used to identify genomic regions differentiated by habitat, providing a list of candidate loci that can be tested for involvement in local adaptation and habitat isolation (e.g. Turner et al., 2008; Turner et al., 2010). One important example likely to contribute significantly to our understanding of substrate adaptation in particular is “ionomics”, a strategy designed specifically to accelerate the identification of the genetic basis of nutrient acquisition (Baxter, 2009; Salt et al., 2008). While these approaches have so far been applied primarily to soil adaptations, there is no reason that modified strategies could not be employed for studying adaptations to other environmental factors, such as shading, temperature, altitude, moisture availability, and pathogen or herbivore repertoires. Large-scale approaches hold promise for speeding up the identification of the molecular basis of habitat adaptations. When combined with studies of how these adaptations in turn affect gene flow, such studies will greatly expand our understanding of how edaphic adaptation and gene flow barriers may interconnect.
Plasticity and trade-offs in temporal isolation: Gene flow barriers due to temporal non-overlap of reproduction may be very important in speciation and maintenance of species boundaries in both animals and plants (Coyne and Orr, 2004). There is abundant evidence in the plant literature of differences in flowering time among populations or related species leading to non-overlapping flowering times. In numerous examples flowering time differences are associated with edaphic adaptations (e.g. examples in Hall and Willis, 2006; Levin, 2009; Lowry et al., 2008b; Savolainen et al., 2006; Silvertown et al., 2005; Snaydon and Davies, 1982). For example, recent speciation of palms on Lord Howe Island seems to have involved both adaptation to distinct substrates and associated flowering time differences (Savolainen et al., 2006), and a long-term study on Anthoxanthum odoratum populations growing under distinct artificially-applied edaphic regimes showed significant local adaptation within 150 years (Snaydon and Davies, 1982) and sometimes also associated differences in flowering time (Silvertown et al., 2005). The two Lord Howe Island palm species are especially intriguing. They have strong reproductive isolation with almost no overlap in flowering times, but reproductive isolation in this case arises from a plastic response to the environment: if the now-diverged palms happen to co-occur on the same substrate, the flowering time difference disappears (Savolainen et al., 2006). Plastic responses to the environment can promote divergence if growth on different substrates is inextricably (e.g. physiologically) linked to divergence in flowering time. This may be a very important factor in plant speciation (Levin, 2009). A mathematical model built on the Lord Howe Island palm scenario suggested that speciation indeed becomes particularly likely when there is a direct environmental effect on flowering time and a small number of loci involved in adaptation (Gavrilets and Vose, 2007).
The Arabidopsis genus provides opportunities to test the predictions of these ideas. While we are not aware of direct evidence of temporal isolation restricting gene flow among Arabidopsis species or populations, the raw material, genetic variation for flowering time, is widely available, at least in A. thaliana (Gazzani et al., 2003; Koornneef et al., 1998; Lempe et al., 2005) and A. lyrata (Riihimäki and Savolainen, 2004; Sandring and Agren, 2009; Sandring et al., 2007). In A. thaliana the genetic basis of flowering time regulation has been intensely investigated with mutants as well as natural variants (see for reviews e.g. Koornneef et al., 1998; Simpson and Dean, 2002; Yant et al., 2009), and it is clear that a small number of mutations can have a large impact on flowering time, as is predicted to facilitate speciation in the model of Gavrilets and Vose (2007).
Given that temporal isolation is also greatly aided if there is developmental plasticity of flowering time in response to environmental cues that differ among habitats, it is intriguing that a number of different abiotic factors (e.g. soil type, nutrient availability, pH, moisture, and temperature) can alter maturation and flowering times in a wide range of species (see Levin, 2009; Wielgolaski, 2001). In A. thaliana the environment can also directly affect flowering time, and this species provides a rare opportunity to understand the molecular basis of this plasticity (e.g. Juenger et al., 2005; Lempe et al., 2005; Pigliucci and Schlichting, 2002; Stinchcombe and Schmitt, 2004; Wilczek et al., 2009). Several studies have highlighted the importance of hormones in linking environment and flowering time. For example, salicylic acid (SA), which is induced by environmental or pathogen stress, can accelerate flowering (Korves and Bergelson, 2003; Martínez et al., 2004). Elevated SA also increases tolerance of some abiotic stresses (e.g. high heavy metal content in soil or drought), but can decrease tolerance of other stresses (e.g. temperature and salinity; Chini et al., 2004; Freeman et al., 2005). Natural accessions differ in their level of response to SA (van Leeuwen et al., 2007), suggesting that the utility of SA signaling for adaptation, and thus the magnitude of any effects of edaphic factors on flowering time, will likely vary among strains. In contrast to SA, nitric oxide (NO), whose production is also induced by various biotic and abiotic stresses, delays flowering (He et al., 2004). This knowledge gives a good framework for understanding and interpreting patterns observed in the wild in other species.
Alternative strategies for adaptation to similar environments could also conceivably lead to reproductive isolation during population divergence (“non-ecological speciation”; Schluter, 2009). In A. thaliana, experimental evidence exists supporting the plausibility of this in flowering time differentiation: A trade-off between drought escape via early flowering and drought resistance via water use efficiency is caused by pleiotropic effects of flowering-time genes (McKay et al., 2003). This means that two viable alternate adaptation strategies (escape by early flowering versus tolerance due to increased water use efficiency) to the same environmental challenge (drought) could result in temporal isolation as a pleiotropic effect of environmental adaptation. Whether these differences isolate populations in nature remains to be tested.
Given how much is known about the mechanisms of flowering time regulation and plasticity in Arabidopsis, it will be interesting to investigate whether temporal isolation occurs in this genus in nature.
Mating system: Mating system differences may affect gene flow among diverging populations and could thus isolate nascent lineages (Grant, 1971; Jain, 1976; Rieseberg and Willis, 2007). In the related genus Capsella, speciation of C. rubella is attributable to isolation due to self-fertilization arising in a single individual (Foxe et al. 2009; Guo et al. 2009). In the Arabidopsis genus, there are three self-fertilizing species: A. thaliana, A. suecica and A. kamchatika. In the case of A. thaliana, the transition to self-fertilizing is estimated to have occurred between 400,000 and 1 million years ago (Bechsgaard et al., 2006; Tang et al., 2007), and multiple alleles of the self-incompatibility locus still exist, speaking against a selective sweep as observed in Capsella rubella (Boggs et al., 2009; Sherman-Broyles et al., 2007; Tang et al., 2007). These findings argue that the loss of self-incompatibility is unlikely to have been the primary cause of speciation of A. thaliana, though it may have contributed to strengthening isolation at later stages. For A. kamchatika and A. suecica the timing of the origin of selfing relative to speciation is unknown, but it is possible that, as has been previously observed for hybrids in Arabidopsis and Capsella, selfing may have arisen in these species as a direct consequence of polyploidization and/or hybridization via epigenetic silencing of the self-incompatibility locus (Nasrallah et al., 2007). In these instances, the transition to self-fertilization may indeed contribute directly to speciation. Transitions from self-incompatibility to self-fertility are commonly observed in association with polyploid speciation (Comai, 2005).
Species-specificity of pollination: Fertilization is often highly species-specific and rapid evolution of species-specificity in gamete recognition and fertilization is observed in virtually all taxa that have been examined (Clark et al., 2006; Swanson et al., 2004; Swanson and Vacquier, 2002). Several hypotheses have been put forth to explain the extremely rapid evolution of proteins involved in reproduction, including gamete competition, sexual selection, and sexual conflict (Brandvain and Haig, 2005; Clark et al., 2006; Swanson et al., 2004; Swanson and Vacquier, 2002). Under these models, reproductive isolation arises as a by-product.
In plants, species-specificity of gamete recognition is mediated at numerous stages during pollination, including adhesion, germination, pollen tube growth, and guidance to the ovules (Swanson et al., 2004). While any of these barriers individually may not provide complete isolation, acting together they can confer strong species-specificity. In the absence of strong barriers, there is still commonly an advantage of conspecific pollen over heterospecific pollen in mixed matings (conspecific pollen precedence), which may also be a very important factor in isolation and speciation (Howard, 1999).
Arabidopsis species, especially A. thaliana, have been used to investigate the molecular mechanisms resulting in species-specificity of pollen recognition. Pollen of other Arabidopsis species can germinate successfully on A. thaliana stigmas (Palanivelu and Preuss, 2006), suggesting that primary recognition events are unlikely to have contributed to speciation. Pollen tube growth and guidance to the ovules, however, do decline with phylogenetic distance (Palanivelu and Preuss, 2006). The observation that immature pistils are much less able to discriminate among pollen than mature pistils suggests that factors important for favoring conspecific pollen or inhibiting heterospecific pollination are actively produced during pistil maturation (Kandasamy et al., 1994).
Ovules are known to emit signals that attract pollen tubes, and the ability to respond to these signals declines so rapidly with phylogenetic distance that even A. arenosa ovules are noticeably less capable of attracting A. thaliana pollen tubes than are A. thaliana ovules (Palanivelu and Preuss, 2006). Recent work in Torenia fournieri has shed some light on the molecular identity of a highly species-specific pollen tube attractant produced by the synergid cells of the ovules (Higashiyama et al., 2006). This signal consists of a cocktail of secreted cysteine-rich proteins named LUREs, which are related to a diverse class of proteins called defensins (Okuda et al., 2009). Female gametophytes of Arabidopsis and maize also express large numbers of diverse defensin-related proteins (Cordts et al., 2001; Jones-Rhoades et al., 2007; Punwani et al., 2007; Yang et al., 2006), suggesting that this type of signal is widely utilized, and that the particular mixture produced may confer high specificity. In the Arabidopsis genus the ease of genetic manipulation will provide ample opportunity to test the roles that genetic diversity of these defensin-like molecules may play in mediating the specificity of pollination.
Another gene that has been suggested to confer species-specificity in the final stages of pollination of A. thaliana is a receptor-like kinase, FERONIA (FER). Plants mutant for FER show a phenotype of pollen tube overgrowth and fertilization failure that is phenotypically similar to that observed in interspecies crosses onto A. thaliana (Escobar-Restrepo et al., 2007). The extracellular domain of FER appears to be rapidly evolving; it is highly divergent between A. thaliana and A. lyrata (Escobar-Restrepo et al., 2007) as well as among populations of A. lyrata (Gos and Wright, 2008). That sequence divergence of FER plays a role in interspecies barriers is certainly plausible from these findings, but remains to be formally demonstrated.
These patterns suggest that pollination barriers may evolve sufficiently rapidly to play a crucial role in species-barriers in the Arabidopsis genus. The degree to which the multiple steps of the pollen recognition process play a role in gene flow in Arabidopsis in the wild has not, to our knowledge, been quantified.
The Role of Chromosomal Differences in Hybrid Sterility and Inviability
Polyploidy: Polyploidy is thought to be one of the major mechanisms of speciation in plants, and one of the few by which reproductive isolation may be essentially instant, since interploidy crosses often fail or produce hybrids that are sterile due to meiotic aberrations (Comai, 2005; Coyne and Orr, 2004; Levin, 2002). Though meiotic problems associated with interploidy crosses do present a strong barrier to gene flow, many species vary in ploidy without being considered separate. This is true for at least three species within the Arabidopsis genus — A. thaliana, A. lyrata and A. arenosa all show natural variation in ploidy without taxonomic recognition as separate species (Al-Shehbaz and O'Kane, 2002). Arabidopsis thaliana and its relatives have provided excellent systems for studying the immediate and longer-term consequences of polyploidy and interploidy crosses both within and between species (Comai et al., 2000; Comai, 2005).
Tetraploid A. thaliana plants are fully viable, fertile and barely distinguishable from diploids (Koornneef et al., 2003) as is also true of A. lyrata and A. arenosa tetraploids. Diploid and tetraploid A. thaliana can be crossed to produce viable triploids (Scott et al., 1998), allowing quantification of the degree to which ploidy changes can cause gene flow barriers via triploid sterility or other mechanisms. Triploids in A. thaliana have reduced fertility, as predicted, but they are not completely sterile (Henry et al., 2005). When selfed, they spawn aneuploid swarms with a wide range of phenotypes, variable fertility and genomic instabilities (Henry et al., 2005; Huettel et al., 2008). After several generations, descendant populations resolve into stable diploid and tetraploid groups (Henry et al., 2005). Discrete genetic loci are associated with aneuploid propagation success, implying that some genotypes are more likely than others to contribute to gene flow across ploidy barriers (Dilkes et al., 2008; Henry et al., 2007; Henry et al., 2009). This work demonstrates that ploidy barriers are not impenetrable to gene flow.
Much is known about why interploidy crosses sometimes fail to produce viable progeny in Arabidopsis. Reciprocal interploidy crosses have different effects on endosperm development: An excess of the maternal genome results in an underdeveloped endosperm in the hybrid seed, while an excess of the paternal genome causes the endosperm to over-proliferate, the severity of which correlates with relative parental genomic doses (Scott et al., 1998). DNA methylation has been implicated as a major factor in this type of hybrid failure - the effect of paternal genome excess can be phenocopied if the maternal (but not the paternal) genome is hypomethylated (Adams et al., 2000).
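The parental genome doses referred to above can be worked out with simple arithmetic. The following is a minimal sketch, not taken from the cited studies, which assumes regular meiosis (each gamete carries half the parental ploidy), a central cell that contributes two maternal gametic genomes, and a single sperm that contributes one paternal gametic genome. The function names are hypothetical and chosen only for illustration.

```python
from fractions import Fraction


def endosperm_dosage(maternal_ploidy, paternal_ploidy):
    """Return (maternal, paternal) genome copies in the endosperm.

    Assumption: regular meiosis, so gametes carry half the parental ploidy;
    the central cell supplies two maternal gametic genomes and one sperm
    supplies one paternal gametic genome.
    """
    if maternal_ploidy % 2 or paternal_ploidy % 2:
        raise ValueError("this sketch handles even ploidy levels only")
    maternal = 2 * (maternal_ploidy // 2)
    paternal = paternal_ploidy // 2
    return maternal, paternal


def classify_cross(maternal_ploidy, paternal_ploidy):
    """Label a cross relative to the usual 2 maternal : 1 paternal ratio."""
    maternal, paternal = endosperm_dosage(maternal_ploidy, paternal_ploidy)
    ratio = Fraction(maternal, paternal)
    if ratio == 2:
        return "balanced"
    return "maternal excess" if ratio > 2 else "paternal excess"


if __name__ == "__main__":
    # (seed parent ploidy, pollen parent ploidy)
    for cross in [(2, 2), (4, 2), (2, 4)]:
        m, p = endosperm_dosage(*cross)
        print(f"{cross[0]}x x {cross[1]}x: {m}m:{p}p, {classify_cross(*cross)}")
```

Under these assumptions a tetraploid seed parent pollinated by a diploid gives a 4:1 endosperm (maternal excess), while the reciprocal cross gives 2:2 (paternal excess).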
Barriers to allopolyploid formation: In interspecies crosses between parents that differ in chromosome number (due either to differences in ploidy or to a different base chromosome number), very similar patterns of seed failure due to endosperm imbalance are often observed (Haig and Westoby, 1991). Work in the Arabidopsis genus has again provided many insights into the underlying mechanisms, and has focused on hybrids between A. thaliana (n=5) and A. arenosa (n=8 or 16). Though diploid A. thaliana can be readily fertilized by tetraploid A. arenosa, most of the resulting seeds abort (Comai et al., 2000). As was observed within A. thaliana, hybrid seed lethality in these interspecies crosses worsens if the maternal (A. thaliana) genome is hypomethylated, suggesting a role for epigenetic regulation in this type of hybrid failure (Bushell et al., 2003).
In crosses between A. thaliana and A. arenosa, an increase in expression of maternal (A. thaliana) genes that are normally expressed only from paternal copies is correlated with an increase in relative paternal (A. arenosa) genome dosage (Josefsson et al., 2006; Walia et al., 2009). Seed viability is increased when the A. thaliana parent is homozygous for a mutation in one of these genes, PHERES1 (PHE1), demonstrating that this mis-regulation of imprinted genes plays a causal role in this type of incompatibility (Josefsson et al., 2006). Normally, maternal PHE1 expression is suppressed in A. thaliana by a maternally expressed polycomb group protein MEDEA (MEA) (Gagliardini et al., 2005; Köhler et al., 2003; Makarevich et al., 2006). It has been proposed that in hybrids excess paternal genome sequesters much of the maternally expressed MEA with the result that an insufficient amount remains to suppress maternal PHE1 expression. This is consistent with the “Dosage-Dependent Induction (DDI)” model, which suggests that regulatory molecules are produced in proportion to genome size or copy number and that hybrids can be compromised if one parental genome cannot suppress deleterious expression of genes from the other parent due to insufficient production of regulatory factors (Erilova et al., 2009; Josefsson et al., 2006). A recent genome-wide study of DNA methylation in the endosperm of A. thaliana helped uncover five additional imprinted genes and about 50 candidates (Gehring et al., 2009). It will be interesting to see whether any of these genes are also misregulated to ill effect during hybrid seed development. Consistent with a possible role for epigenetic factors regulating seed development in hybridization barriers, MEA appears to be evolving under diversifying selection (Miyake et al., 2009). This fits with the idea that imprinting can arise from parental conflicts, which drive rapid gene evolution (Haig and Westoby, 1991; Wilkins and Haig, 2003). 
Self-fertilization is expected to affect selection for imprinting (Brandvain and Haig, 2005), thus it would be especially interesting in this context to examine whether imprinting-related incompatibility affects hybrids involving A. thaliana differently from hybrids among outcrossing members of the genus.
Short interfering RNAs (siRNAs) may also play an important role in hybrid seed failure. Many siRNAs produced from transposons and other repetitive sequences are important in their silencing (Matzke et al., 2009). Recently, it has been demonstrated that many siRNAs are expressed only from the maternal genome in the endosperm (Mosher et al., 2009). Plants defective in the production of these siRNAs show transcription of some transposons, though within A. thaliana this leads to no obvious defects (Mosher et al., 2009). Paternally derived transposon siRNAs of a different size class are produced in pollen grains (Slotkin et al., 2009) and it is thought that they may function in coordination with maternally derived siRNAs to suppress transposon expression (Martienssen, 2010). The combination of maternal and paternal transposon-derived siRNAs may not function together correctly in interspecies hybrids, or may not be produced in adequate quantities to silence transposons from both parental genomes. These possibilities are intriguing, as improper silencing of transposons has been shown to negatively impact interspecies hybrids. For example, in Drosophila, highly deleterious activation of paternal transposons occurs in hybrids among strains in which only the father carries P-element transposons, because the mother lacks the transposon siRNAs necessary to silence them in the fertilized egg and developing embryo (Blumenstiel and Hartl, 2005). Interspecies hybrids in rice and evening primrose similarly have high levels of transposon activity, which leads to a high mutation rate and varied morphological defects in subsequent progeny (Wang et al., 2009). In interspecies hybrids of both banana and petunia, inadequate silencing can lead to activation of integrated pararetroviruses, which can generate active infections (Harper et al., 1999; Ndowora et al., 1999; Richert-Poggeler et al., 2003). In A. thaliana × A. arenosa hybrid seeds, expression of paternal ATHILA transposons is elevated in proportion to the severity of genome imbalance and correlates with seed failure (Josefsson et al., 2006; Madlung et al., 2005). The A. arenosa genome contains more ATHILA elements than the A. thaliana genome, and thus the amount of maternally produced ATHILA-repressing elements (probably siRNAs) may not suffice to silence the A. arenosa elements in hybrid seeds (Josefsson et al., 2006).
Problems can also arise later in allopolyploid hybrid plants that have survived the initial seed development hurdle. Hybrid A. thaliana × A. arenosa plants often exhibit reduced fitness, low fertility and developmental abnormalities. Differences in gene and siRNA expression among hybrids, and between hybrids and the parent plants, help to explain the reduced fitness as well as phenotypic variation (Ha et al., 2009; Wang et al., 2006b). Multiple chromosomal rearrangements and segmental losses observed in A. thaliana × A. arenosa hybrids may be a main cause of meiotic abnormalities, the prevalence of which in turn is inversely proportional to pollen viability (Madlung et al., 2005; Pontes et al., 2004). It has been suggested that increased transposon activity in allopolyploids causes chromosomal breaks, which lead to joining of homeologous chromosome segments (Pontes et al., 2004). Homeologous recombination could also be an explanation for the observed fusion of homeologous chromosomes, but this remains to be demonstrated (Madlung et al., 2005).
Allopolyploidy as an opportunity: In some cases, crosses between species with different chromosome numbers can give rise to stable allopolyploid hybrids, with a full genome complement from each parent (Hegarty and Hiscock, 2005; Mallet, 2007; Rieseberg and Willis, 2007). Allopolyploidy is commonly observed in nature and may be a common mode of speciation in plants (Leitch and Leitch, 2008). Arabidopsis thaliana × A. arenosa hybrids, replicating the cross that gave rise to A. suecica, can easily be produced in the laboratory. Though first generation hybrids often have reduced viability and low fertility, stable, fertile lines can be generated and have provided useful models for studying hybridization and early events following allopolyploidy (Comai et al., 2000).
Nonadditive gene expression observed in resynthesized A. suecica can confer novel traits that may open new habitats to the nascent allopolyploid species and help isolate it from its parental lineages. For example, interaction of a flowering time regulator from an A. thaliana accession with one from an A. arenosa accession causes significantly delayed flowering in interspecies hybrid plants relative to either parent (Wang et al., 2006a, 2006b). Such a change in flowering time could impose a significant prezygotic gene flow barrier. The allotetraploid species A. kamchatica also occupies a far broader range of climates than its parents (Hoffmann, 2005), allowing it to exploit more habitats and develop geographic isolation from its parents. That hybrid species may have novel adaptive potential, which can provide the nascent species with a novel niche and an easy route to limiting gene flow with parental populations, is well known, especially from wild sunflower species (Rieseberg et al., 2003; Schwarzbach et al., 2001; Ungerer et al., 1998). The Arabidopsis genus is already providing opportunities to uncover the molecular details of this speciation mechanism.
The potential role of centromeres in hybrid sterility: Evolutionary divergence of centromere sequences and associated proteins could cause aberrant chromosome segregation in hybrids, and thus meiotic instability and hybrid sterility. The centromeric repeats and associated proteins make up part of the kinetochore (a protein-based structure that connects the centromere to the spindle), which is essential for proper meiotic and mitotic cell division. A complex interaction exists between the sequence of the repeats, the chromatin state, and the kinetochore proteins of the centromere (Dawe and Henikoff, 2006). Hybrids of species with diverged centromeres and centromeric proteins may exhibit reduced fitness due to unequal behavior of centromeres during cell division, which could lead to nondisjunction (Henikoff et al., 2001). In Drosophila species, rapid evolution of heterochromatic repeats and associated proteins can contribute to hybrid sterility by disrupting chromosome segregation (Bayes and Malik, 2009; Ferree and Barbash, 2009). The role that centromeres and associated, perhaps adaptively evolving, proteins might play in speciation is an intriguing question and one that certainly merits further exploration.
A main component of most plant and animal centromeres is a repeated sequence usually around 180 base pairs in length. Analysis of the centromeric repeat sequences of different Arabidopsis species demonstrated that each species has its own unique repeat sequence or a characteristic combination of different repeats (Heslop-Harrison et al., 2003; Kamm et al., 1995; Kawabe and Nasuda, 2005). Proteins specifically associated with the centromeric repeats are surprisingly divergent between A. thaliana and A. arenosa and there is evidence that they are rapidly evolving (Talbert et al., 2002; Talbert et al., 2004). Furthermore, there has been a burst in the number of LTR retrotransposons in A. lyrata and A. halleri compared to A. thaliana and most of these are found in the centromeres (Tsukahara et al., 2009). Centromere transposon complements have been shown to vary considerably in abundance and type even between closely related species (Hawkins et al., 2006; Vitte and Bennetzen, 2006). Thus the idea that centromere evolution can contribute to gene flow barriers among species in the Arabidopsis genus is enticing, though a formal demonstration that this contributes to speciation in this genus is still lacking.
Genic Causes of Hybrid Sterility or Inviability
Segregation distortion: Segregation distortion (SD), also known as transmission ratio distortion, has been used throughout the speciation literature as a way of identifying genomic regions that might contribute to genetic isolation between populations or species (e.g. Bradshaw et al., 1998; Fishman et al., 2008; Fishman et al., 2001; Hall and Willis, 2005; McDaniel et al., 2007; Myburg et al., 2004; Orr and Irving, 2005; Yin et al., 2004). SD is conceptually simple - a deviation from expected Mendelian segregation ratios in a particular region of the genome - but in practice, interpreting SD and its causes can be complex. SD may be caused by diverse factors, including segregation of genetic incompatibilities, chromosomal rearrangements, nuclear cytoplasmic incompatibility, meiotic drive, fitness differences that lead to differential survival, and/or inadvertent human selection during line generation (especially in approaches such as recombinant inbred line construction where multiple generations are successively propagated). SD can arise due to selection during the haploid (gametic) phase (e.g. through differences in fertilization success) or in the diploid (zygote) stage (e.g. through genetic incompatibilities or maladaptation).
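As an illustration of how a deviation from Mendelian expectations is typically detected at a single marker, the sketch below applies a chi-square goodness-of-fit test to F2 genotype counts against the expected 1:2:1 ratio. It is a generic textbook test rather than the method of any particular study cited here, the function name is hypothetical, and the counts in the usage example are invented for illustration. For two degrees of freedom the chi-square tail probability is exactly exp(-x/2), so no statistics library is needed.

```python
import math


def segregation_distortion_test(observed, expected_ratio=(1, 2, 1)):
    """Chi-square goodness-of-fit test for one codominant F2 marker.

    observed: counts of the three genotype classes (AA, AB, BB).
    Returns (chi-square statistic, p-value) with 2 degrees of freedom.
    """
    if len(observed) != 3 or len(expected_ratio) != 3:
        raise ValueError("this sketch handles three genotype classes only")
    total = sum(observed)
    ratio_sum = sum(expected_ratio)
    expected = [total * r / ratio_sum for r in expected_ratio]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of the chi-square distribution with df = 2
    p_value = math.exp(-chi2 / 2)
    return chi2, p_value


if __name__ == "__main__":
    # Hypothetical counts: an excess of AA and a deficit of BB genotypes
    chi2, p = segregation_distortion_test([40, 50, 10])
    print(f"chi2 = {chi2:.2f}, p = {p:.2e}")
```

A significant result only flags distortion at the marker. Distinguishing gametic from zygotic causes requires additional information, such as reciprocal crosses or the pattern of heterozygote deficit.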
Progeny of crosses within A. thaliana commonly show SD (e.g. Balasubramanian et al., 2009; Clerkx et al., 2004; el-Lithy et al., 2006; Liu et al., 1996; McKay et al., 2008; Törjek et al., 2006; Wilson et al., 2001). This has been primarily a nuisance factor in mapping experiments and thus only a few studies addressed the underlying causes. However, SD among strains within a species may be of interest in speciation research, because it could indicate early-arising incompatibilities. To date all SD cases within A. thaliana for which causes have been identified result from genetic incompatibilities. In crosses between the Col and C24 accessions, for example, two unlinked loci interact epistatically to cause severely reduced male fertility in complementary homozygotes (Törjek et al., 2006). The causal genes remain to be identified, but will provide an interesting insight into how male sterility, a common and probably early-arising type of between-species incompatibility (Coyne and Orr, 2004), may originate as within-species variation. In another case, segregation distortion results from the loss of alternate copies of a duplicated essential gene leading to embryo lethality in complementary double homozygotes (Bikard et al., 2009), supporting the idea that loss of alternate copies of a duplicated essential gene in different lineages could lead to incompatibility (Lynch and Force, 2000). Some cases of segregation distortion in A. lyrata appear to arise from linkage of deleterious mutations that affect pollen performance to the self-incompatibility (SI) locus. Since at the SI locus rare alleles have a mating advantage, haplotypes carrying linked deleterious alleles are difficult to purge from the population and can result in gamete-level SD and long-term balancing selection (Bechsgaard et al., 2004; Leppälä et al., 2008). 
Similar hitchhiking of deleterious mutations possibly leading to balancing selection has been suggested to occur at MHC locus alleles in humans — the high diversity, heterozygosity and low recombination, all features shared with the S-locus, render purifying selection inefficient for purging deleterious alleles (van Oosterhout, 2009).
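The duplicated essential gene case described above lends itself to a short worked example. The sketch below enumerates F2 genotypes at two loci, assuming (these are simplifications for illustration, not details from the cited study) that the loci are unlinked, that each parent carries a functional copy at only one of the two loci, that a single functional copy at either locus suffices for viability, and that lethality is complete. All names are hypothetical.

```python
from fractions import Fraction
from itertools import product

# F2 frequency of carrying 2, 1 or 0 functional alleles at one locus
LOCUS_FREQ = {2: Fraction(1, 4), 1: Fraction(1, 2), 0: Fraction(1, 4)}


def f2_duplicate_gene_loss():
    """Return (lethal fraction, genotype frequencies at locus A among survivors)."""
    lethal = Fraction(0)
    survivors = {}
    for (a, freq_a), (b, freq_b) in product(LOCUS_FREQ.items(), repeat=2):
        freq = freq_a * freq_b  # independent assortment (unlinked loci assumed)
        if a == 0 and b == 0:
            lethal += freq  # no functional copy at either locus
        else:
            survivors[(a, b)] = freq
    surviving = 1 - lethal
    locus_a = {
        a: sum(f for (x, _), f in survivors.items() if x == a) / surviving
        for a in LOCUS_FREQ
    }
    return lethal, locus_a


if __name__ == "__main__":
    lethal, locus_a = f2_duplicate_gene_loss()
    print("lethal fraction:", lethal)
    for copies, freq in locus_a.items():
        print(f"{copies} functional alleles at locus A: {freq}")
```

Under these assumptions 1/16 of F2 embryos die, and among survivors the genotype classes at either locus shift from 1:2:1 to 4:8:3, a modest deficit of one homozygous class of the kind that appears as segregation distortion in mapping populations.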
As populations are genetically isolated by geography, edaphic adaptations, pollinator fidelity, incompatibility or other factors, genetic incompatibilities are expected to accumulate and further impede gene flow (Rieseberg and Willis, 2007). This indeed appears to be the pattern in numerous taxa (Coyne and Orr, 2004; Orr and Turelli, 2001; Rieseberg and Willis, 2007), but we still know very little about the degree to which incompatibility accumulates during divergence in the Arabidopsis genus. However, patterns are beginning to emerge: For example, within a population of A. lyrata ssp. petraea from Iceland, SD was observed at only about 10% of markers in crosses. Most of this could be attributed to gametic-level selection, due in part to deleterious mutations linked to the SI locus (Bechsgaard et al., 2004; Leppälä et al., 2008). Reciprocal crosses between a Swedish and a Russian population, on the other hand, showed about twice as much SD, with much more of it attributable to zygotic-level selection (Kuittinen et al., 2004; Leppälä et al., 2008). Crosses between species show, as predicted, much more extensive distortion: hybrids between A. lyrata ssp. petraea and A. halleri ssp. halleri have reduced vigor (Ramos-Onsins et al., 2004), and marker distortion in different intercrossing strategies ranges from 40% (Willems et al., 2007) to 76% of markers (Filatov et al., 2007). These studies indicate a general trend of increasing accumulation of genetic incompatibilities with divergence time, but more studies are needed to understand the underlying molecular causes and how they might contribute to gene flow barriers.
Hybrid necrosis: Hybrid necrosis, a phenotype involving dwarfism and extensive cell death, is very common in the plant kingdom (Bomblies and Weigel, 2007b), and is also a repeated cause of failure of hybrids among A. thaliana accessions. Hybrid necrosis occurs in progeny of about 2% of crosses among accessions and can cause strong segregation distortion in recombinant inbred lines or F2 populations (Alcazar et al., 2009; Bomblies et al., 2007). While in the Arabidopsis genus hybrid necrosis is rare, in some taxa it is common and may play an important role in gene flow restriction (Brieger, 1929; Jiang et al., 2000; Kostoff, 1930; Lee, 1981; Phillips, 1977). Thus the Arabidopsis system provides a model for a phenomenon relevant to divergence and speciation in plants more generally. Several causal genes have been cloned in A. thaliana, showing that this type of hybrid failure is caused by deleterious epistatic interactions among diverged components of the plant immune system that together trigger a highly detrimental induction of pathogen responses (Alcazar et al., 2009; Bomblies et al., 2007). Hybrid necrosis has been attributed to immune induction and resistance-related genes in other species as well (Hannah et al., 2007; Jeuken et al., 2009; Khanna-Chopra et al., 1998; Krüger et al., 2002; Yamamoto et al., 2010). Defense-related genes are among the most diverse in the A. thaliana genome, and often show patterns of nucleotide polymorphism suggestive of non-neutral evolution, including patterns of diversifying and balancing selection (Bakker et al., 2006; Borevitz et al., 2007; Clark et al., 2007; Mondragón-Palomino et al., 2002). This implicates adaptation to rapidly diversifying pathogens in the accumulation of genetic incompatibilities among populations or species (Bomblies, 2009). Modeling has recently shown that under some circumstances, hybrid necrosis could contribute to the evolution of reproductively isolated populations (Ispolatov and Doebeli, 2009).
Though we know quite a lot about the molecular basis of a number of evolutionary processes in the Arabidopsis genus, we still know comparatively little about speciation, gene flow barriers and genetic divergence in this tractable group of plants. What is known so far about potential and realized gene flow, as well as barriers among species, is summarized in figure 2. Though many barriers and gene flow possibilities have been uncovered, many gaps remain in our understanding of what drove the divergence of these species, and what barriers maintain their distinctiveness.
The Arabidopsis species, with extensive variation in habitats, evidence of local adaptation, variation in ploidy, at least two allopolyploid hybrid species, a newly discovered hybrid zone and well-established molecular models and tools, provide exciting opportunities for using this genus more fully to study speciation-related processes in molecular detail. To fully capitalize on these opportunities, we still need some groundwork: The need for a detailed, molecularly informed re-evaluation of morphology and taxonomy in the Arabidopsis genus has already been emphasized (Koch et al., 2008), but we also need to greatly improve our understanding of the present and historical patterns of local adaptation and divergence (i.e. with population genetics and evolutionary modeling approaches), especially as they might relate to gene flow barriers. We need to understand more fully why Arabidopsis species so rarely occur in the same areas. What are their specific adaptations and to what degree do these limit contact? Attempting to correlate this information with genome sequence divergence will be informative for understanding which barriers may have been particularly significant over the long term. While studies that attempt to characterize all gene flow barriers acting between two species or populations still have some limitations (for example, they do not necessarily allow us to infer the temporal order in which isolating barriers arose, or whether distinct barriers are genetically independent), they can provide an extremely valuable starting point for investigating what factors might have been important in speciation. Such studies of gene flow barriers (for examples see Lowry et al., 2008a) would be very useful in Arabidopsis and will allow us to more effectively capitalize on molecular and genomic tools to ask ecologically informed questions and benefit from the power of this genus to uncover mechanisms relevant to speciation and adaptation.
In addition, it will be very informative to begin including the outcrossing species in more detailed studies of species barriers — for example, why do diploid A. arenosa and A. halleri, whose ranges overlap, not seem to hybridize in the wild, or why do A. lyrata and A. halleri, whose hybrid is viable in the lab, and whose ranges also overlap in Europe, not form hybrid zones? Are hybrid zones such as that observed with A. lyrata and A. arenosa tetraploids in Austria also observed among diploids? There are many remaining questions about the causes and consequences of divergence in the Arabidopsis genus, and the molecular tools available for a growing number of species provide exciting opportunities for understanding the molecular basis of speciation in this model genus.
We are grateful to Levi Yant, Marcus Koch, Roswitha Schmickl, Jesse Hollister and Brian Arnold for discussions and helpful comments on the manuscript. The authors were supported by setup funds from Harvard University and a fellowship from the John D. and Catherine T. MacArthur Foundation (KB).