The genetic basis of weedy and invasive traits and their evolution remain poorly understood, but genomic approaches offer tremendous promise for elucidating these important features of weed biology. However, the genomic tools and resources available for weed research are currently meager compared with those available for many crops. Because genomic methodologies are becoming increasingly accessible and less expensive, the time is ripe for weed scientists to incorporate these methods into their research programs. One example is next-generation sequencing technology, which has the advantage of enhancing the sequencing output from the transcriptome of a weedy plant at a reduced cost. Successful implementation of these approaches will require collaborative efforts that focus resources on common goals and bring together expertise in weed science, molecular biology, plant physiology, and bioinformatics. We outline how these large-scale genomic programs can aid both our understanding of the biology of weedy and invasive plants and our success at managing these species in agriculture. The judicious selection of species for developing weed genomics programs is needed, and we offer up choices, but no Arabidopsis-like model species exists in the world of weeds. We outline the roadmap for creating a powerful synergy of weed science and genomics, given well-placed effort and resources.
Weedy and invasive species cause up to $100 billion in damage annually in crop and ecosystem function loss (Pimentel et al. 2005), but the biological mechanisms responsible for their success remain poorly understood. Genomics is an approach to understanding biology that involves global analysis of gene organization, expression, and function at the whole-genome level (Hieter and Boguski 1997). Genomic tools offer unparalleled opportunities to dissect the genetic basis and evolution of traits associated with the success of weedy and invasive plants. Although many weed scientists already study complex features of biology that arise from gene activity across the genome, most researchers have not been able to make use of these genomic resources.
Genomics has language, assumptions, and conventions of its own, and these can pose a barrier to the uninitiated researcher. The broad scope of genomic research necessitates the use of high-throughput technologies, and these generate large data sets that are comprehensible only with the aid of computer analyses. This situation can be daunting to researchers who are not versed in the computational tools of bioinformatics (see Box 1), particularly because most genomics software developed to date is not user-friendly. Moreover, the high cost of incorporating genomics into research projects has historically prohibited the application of these technologies to nonmodel systems. To put the powerful tools of genomics to work for weed science, we must explore the nature of these information and technology gaps and how they might be closed.
Bridging the gap from genomics to weed science is not without precedent. A similar situation occurred with the emergence of molecular biology, which, in its early years, was viewed by many weed scientists as a separate and foreign discipline. Of course, today the techniques of molecular biology pervade all types of biological research and have provided tremendous insights into the biology of weeds, including their origin, dispersal, and mechanisms of control. In much the same way, genomic approaches promise to extend our insights again, this time beyond individual genes to the nature and evolution of complex traits and genomes as a whole (Yuan et al. 2008).
Although a number of reviews have been published on the use of genomics, molecular genetics, and biochemistry in weed science (Basu et al. 2004; Chao et al. 2005; Inderjit et al. 2006; Stewart 2009; Yuan et al. 2007), the development of genomic tools and resources for weedy and invasive species lags far behind that for crops and model species. The question is how to focus the expertise and resources of the scientific community on achieving a set of common goals for weed genomics. A recent workshop was held to tackle the major issues related to developing a weed genomics research plan (Table 1) and to chart the course of a research agenda. The product of that workshop is provided here as a proposed roadmap for using modern research tools in weedy and invasive plant biology, especially to better understand the evolution of these traits. This article will review (1) key strategies for using genomic approaches to achieve the goals of weed science, (2) examples of successful research programs in this area, (3) candidate species for efficient leveraging of genomic resources, and (4) how weed scientists can move toward implementing this agenda in their research.
Table 1. Questions that should be addressed to develop a strategic and comprehensive weed genomics research plan.
Genomic Approaches to Weed Science
Genomic technologies already have a proven record of advancing our understanding of basic animal and plant biology. For example, ecological genomics has tackled issues of organismal response to environment, genetic variation, and adaptation (Karrenberg and Widmer 2008; Thomas and Klaper 2004; Wu et al. 2008). The challenge for the weed science community is how to maximize the use of genomic approaches to answer questions that are important to weed biology. Genomics must help us understand the traits that have made weeds successful colonizers and troublesome pests, as well as how these features evolve, because we know that weeds can adapt quickly (Barrett 1983; VanGessel 2001). There are two main genomic approaches to understanding the genetic basis of the enhanced performance of weedy and invasive species: genome analyses and trait analyses. These two approaches complement one another, with genome analyses generating insights into loci and traits of interest, their evolutionary context, and interactions with other loci, and trait-based analyses providing insights into the nature and function of the genes underlying focal weedy traits (Figure 1). These approaches share an interrelated set of genomic tools (Figure 2; Box 1).
Genome Analyses: Population Genomics
Analyses of the genome itself include both population genomics and functional genomics (Figure 1). Population genomics refers to the assessment of genetic variation and differentiation within loci across the genome (Stinchcombe and Hoekstra 2008). This requires gathering genomic data from multiple individuals. Although whole-genome sequencing of model organisms will be essential for providing frameworks for assembly and annotation of related individuals, population genomics must make use of a host of different methods for determining genome-wide patterns of sequence variation. These methods range from indirect assays with molecular markers (Kane and Rieseberg 2007; Neale and Ingvarsson 2008; Wood et al. 2008) to direct sequencing of expressed sequence tags (ESTs; see also, gene-space sequencing) using next-generation sequencing technologies (Mardis 2008). Patterns of genetic similarity among weedy and related populations can then be used to reveal important aspects of population history, including the origin of weedy and invasive populations, their history of expansion, their propensity for gene flow, and their tendency to hybridize with related taxa (Kane and Rieseberg 2007; Zayed and Whitfield 2008). Genome-wide scans are particularly sensitive methods for detecting gene flow and hybridization, critical issues in weed science, where the acquisition of locally adapted traits or resistance to chemical and biological controls might be rare in occurrence but high in importance (Dlugosch and Whitton 2008; Ellstrand and Schierenbeck 2000; Whitney et al. 2006).
An especially powerful application of genomics is in the identification of targets of selection, including both artificial selection, imposed by control efforts, and natural selection, for colonizing ability and adaptation to local environments. Detection of putative selection from molecular-marker scans relies on outlier analyses, in which loci that show the greatest reduction in diversity (a selective sweep), or greatest genetic distance (diversifying selection), or both, are viewed as possible targets of selection. However, marker-based scans appear to have a high false-positive rate (Wiehe et al. 2007), so these studies are best viewed as providing a ranked list of candidate loci. Also, although marker-based approaches offer an inexpensive means of identifying candidate loci, they fail to detect the actual sites targeted by selection (although see Wood et al. 2008).
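The outlier logic described above can be sketched in a few lines. This is a minimal illustration, not the procedure used in any of the cited studies: it assumes each locus has already been summarized by a diversity estimate (for example, expected heterozygosity) in wild and weedy samples, and it ranks loci by the log-ratio of the two. The function name, the pseudocount, and the example values are all hypothetical.

```python
import math

def rank_sweep_candidates(diversity, pseudocount=1e-6):
    """Rank loci by loss of diversity in weedy relative to wild populations.

    diversity maps a locus name to (wild_diversity, weedy_diversity),
    e.g. expected heterozygosity at each marker. The most negative
    log-ratio (strongest reduction in the weedy sample) comes first.
    """
    ranked = []
    for locus, (wild, weedy) in diversity.items():
        # The pseudocount keeps the ratio finite when diversity is zero.
        score = math.log((weedy + pseudocount) / (wild + pseudocount))
        ranked.append((locus, score))
    ranked.sort(key=lambda item: item[1])
    return ranked

# Hypothetical marker data: (wild, weedy) expected heterozygosity.
example = {
    "locusA": (0.70, 0.65),
    "locusB": (0.60, 0.62),
    "locusC": (0.75, 0.05),  # strong reduction: a sweep candidate
    "locusD": (0.50, 0.40),
}
candidates = rank_sweep_candidates(example)
```

Returning the full ranking rather than a thresholded set reflects the point made in the text: because false positives are likely, the output is best treated as a ranked list of candidates for follow-up.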
A broader and more powerful array of methods is available for detecting signs of selection in sequence data (Wright and Gaut 2005). These include methods of testing for selective sweeps via reduced variability (Hudson-Kreitman-Aguadé [HKA] test; Hudson et al. 1987) or mutation frequency distribution shifts (Tajima's D test; Tajima 1989) and testing for protein evolution via increased nonsynonymous substitution rates (nonsynonymous [Ka] to synonymous [Ks] ratio test, Yang 1998; McDonald-Kreitman test, McDonald and Kreitman 1991). Although these methods are less prone to false positives than marker-based approaches, again, it is probably best to employ them for providing ranked lists of candidate genes. It is the observation of the same genetic changes in invasive populations that have independent origins that will provide the strongest evidence for identifying specific genes or mutations as being responsible for weedy and invasive traits. Parallel evolution of functional groups of genes might also reveal consistent trade-offs that contribute to invasion success, even if particular evolutionary pathways differ among populations or species (as observed in weedy sunflowers, Kane and Rieseberg 2008; Lai et al. 2008). Once loci under selection have been identified in species that are polymorphic for weedy and invasive behaviors, changes in these genes can be analyzed at a broader phylogenetic scale to better understand why weeds are concentrated in some groups of plants but not in other seemingly similar taxa.
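As one concrete instance of these sequence-based tests, the sketch below computes Tajima's D from a small alignment using the standard formula of Tajima (1989). It is illustrative only: it assumes equal-length, gap-free sequences, treats any column with more than one character state as a segregating site, and makes no attempt to assess significance. The function name is hypothetical.

```python
import math
from itertools import combinations

def tajimas_d(sequences):
    """Tajima's D for a list of aligned, equal-length sequences.

    Returns None when there are no segregating sites.
    """
    n = len(sequences)
    if n < 4:
        # The variance term is zero for n < 4, so D is undefined.
        raise ValueError("need at least 4 sequences")
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("sequences must be aligned to equal length")

    # S: number of alignment columns with more than one state.
    seg_sites = sum(1 for col in zip(*sequences) if len(set(col)) > 1)
    if seg_sites == 0:
        return None

    # pi: mean number of pairwise differences.
    pairs = list(combinations(sequences, 2))
    total = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    pi = total / len(pairs)

    # Constants from Tajima (1989).
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)

    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    return (pi - seg_sites / a1) / math.sqrt(variance)
```

An excess of rare variants, as expected after a recent selective sweep, drives D negative, whereas an excess of intermediate-frequency variants drives it positive. Demographic events such as population expansion can produce the same signal, which is another reason to treat the results as a ranking of candidates.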
Genome Analyses: Functional Genomics
Functional genomics includes the study of genome-wide patterns of gene expression (Hieter and Boguski 1997). It is possible to make quantitative comparisons of genomic expression patterns across species and populations by printing complementary DNAs (cDNAs) or oligonucleotides onto microarrays and probing them with the transcriptomes (messenger RNA [mRNA] extractions) of different plants. Microarrays can be made for model species and used to survey expression in related species because heterologous microarray-hybridization experiments have been successful in species with divergence times as great as 65 million years (Renn et al. 2004; Taji et al. 2004). However, expression data are most easily interpreted if nucleic acid hybridizations are conducted using a microarray developed from the same species. Gene expression and sequence data can be obtained simultaneously using next-generation sequencing technologies. The comparisons made with these techniques can identify loci that are differentially expressed by weedy genotypes, suggest trade-offs in physiological responses to different environments or control measures, and reveal correlated responses among networks of interacting loci (Yuan et al. 2008).
Detecting the selection for weediness genes from expression data is more challenging than from sequence data. If weedy and nonweedy populations are exchanging genes, then significant expression differences (measured in a uniform environment) are probably a consequence of selection, although maternal environments, particularly temperature differences, could also affect gene expression (Blödner et al. 2007; Johnsen et al. 2005). As with sequence data, the strongest evidence of selection comes from parallel expression shifts in weedy populations that have independent origins (Lai et al. 2008). A major issue in the interpretation of expression data is whether a significant expression change is a direct target of selection or a side-product of selection on other genes (pleiotropy). In principle, it should be feasible to distinguish between these alternatives by determining the regulatory basis of the expression changes: cis-regulated changes are more likely to be the direct product of selection, whereas trans-regulated changes are more likely to result from pleiotropy (Landry et al. 2007).
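One common way to separate cis from trans effects, offered here as an assumption rather than as the method of the studies cited above, is to compare expression between two parental genotypes with allele-specific expression in their F1 hybrid. Both alleles share the same trans environment in the hybrid, so any allelic imbalance there is attributed to cis differences, and the remainder of the parental difference is attributed to trans. The function and argument names are hypothetical, and all inputs are assumed to be positive, normalized expression values.

```python
import math

def cis_trans_split(parent_a, parent_b, f1_allele_a, f1_allele_b):
    """Split a parental expression difference into cis and trans parts.

    All values are positive expression levels for one gene. Results are
    log2 ratios (A relative to B).
    """
    # Total divergence between the two parental genotypes.
    total = math.log2(parent_a / parent_b)
    # Allelic imbalance in the hybrid reflects cis-regulatory differences.
    cis = math.log2(f1_allele_a / f1_allele_b)
    # Whatever the hybrid does not reproduce is attributed to trans.
    trans = total - cis
    return {"total": total, "cis": cis, "trans": trans}

# Hypothetical gene: 4-fold parental difference, 2-fold imbalance in the F1.
result = cis_trans_split(parent_a=8.0, parent_b=2.0,
                         f1_allele_a=4.0, f1_allele_b=2.0)
```

In this made-up case the 4-fold parental difference splits evenly on the log scale into a cis and a trans component. Real data would require replicates and a statistical test before assigning a gene to either class.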
Expression data are well-suited to identifying physiological trade-offs experienced by weeds as they invade different environments or face various control measures. Microarray experiments have already provided valuable insight into physiological processes related to weediness (Horvath and Clay 2007; Horvath et al. 2003, 2006a, 2008). By surveying genomic expression in plants grown under various conditions, we can understand which loci or classes of genes are up-regulated or down-regulated in different environments. The latest generation of arrays even allows identification of individual genes within larger gene families, permitting assessments of how divergent functions of gene paralogs might contribute to the broad ecological tolerances of many weedy species (Chao et al. 2005; Kim et al. 2008).
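The idea of calling genes up-regulated or down-regulated between environments can be illustrated with a simple fold-change filter. This is a minimal sketch under stated assumptions: expression values are already normalized, the two-fold threshold and pseudocount are arbitrary choices, and no statistical test is applied, which a real microarray analysis would need. All names and values are hypothetical.

```python
import math
from statistics import mean

def classify_expression(control, treated, threshold=1.0, pseudocount=1.0):
    """Label each gene 'up', 'down', or 'unchanged' by log2 fold change.

    control and treated map gene names to lists of replicate values.
    threshold is in log2 units (1.0 corresponds to a two-fold change).
    """
    calls = {}
    for gene, control_values in control.items():
        if gene not in treated:
            continue  # skip genes measured in only one condition
        # The pseudocount damps ratios for weakly expressed genes.
        ratio = (mean(treated[gene]) + pseudocount) / (mean(control_values) + pseudocount)
        log_fc = math.log2(ratio)
        if log_fc >= threshold:
            label = "up"
        elif log_fc <= -threshold:
            label = "down"
        else:
            label = "unchanged"
        calls[gene] = (label, log_fc)
    return calls

# Hypothetical replicate measurements in two growth conditions.
control = {"geneX": [10, 12], "geneY": [100, 100], "geneZ": [30, 30]}
treated = {"geneX": [50, 46], "geneY": [20, 24], "geneZ": [32, 28]}
calls = classify_expression(control, treated)
```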
Using genomic expression data in combination with genome sequences, as previously described, also provides opportunities for detecting short transcription-factor binding sites shared between clusters of coordinately regulated genes (Tatematsu et al. 2005). These clusters could play important roles in regulating various weedy traits, and such characterization is an important complement to trait-based analyses (see below). Importantly, characterization of transcription factors could also provide molecular targets for novel herbicide development.
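A very simple version of the search for shared binding sites is to look for short sequence words present in the promoter of every gene in a coexpressed cluster. The sketch below does exactly that and nothing more: it assumes promoter sequences have already been extracted, considers only exact matches on the forward strand, and does not compare against background promoters, all of which dedicated motif-finding tools handle more carefully. The function name and the sequences are hypothetical.

```python
def shared_kmers(promoters, k=6):
    """Return the set of k-mers found in every promoter sequence."""
    if not promoters:
        return set()

    def kmers(seq):
        seq = seq.upper()
        return {seq[i:i + k] for i in range(len(seq) - k + 1)}

    shared = kmers(promoters[0])
    for seq in promoters[1:]:
        # Keep only words that also occur in this promoter.
        shared &= kmers(seq)
    return shared

# Hypothetical promoters from three coordinately regulated genes.
cluster = ["TTACGTGGAA", "GGACGTGGTT", "CCACGTGGCC"]
motifs = shared_kmers(cluster, k=6)
```

Candidate words recovered this way would still need to be tested for enrichment relative to promoters of genes outside the cluster before being interpreted as regulatory elements.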
Trait analyses, the second major genomic approach (Figure 1), focus on the genetics of traits that are hypothesized a priori to contribute to weediness or invasiveness, such as competitiveness, high fecundity, delayed germination, the ability to reproduce vegetatively, and herbicide tolerance or resistance (Gressel 2002). Trait-based analyses include both forward genetics, which starts with the phenotype and moves toward gene identification, and reverse genetics, which starts with a gene and moves toward identifying the phenotype it affects. These approaches would be greatly facilitated by the creation of model weed systems, ranging from full genome sequencing to the development of permanent mapping populations to transgenics (Figure 2). The fact that 80.6% of the genes in Arabidopsis are also found in rice (Oryza sativa L.; Yu et al. 2002) underscores the potential for many of the genes and physiological processes controlling weedy and invasive traits to be shared among model and nonmodel species. However, a 20% (or even 5%) difference is sizable. There is a need to develop the genomics of species that are diverse with respect to life history and phylogeny.
Whole-genome sequencing and the development of population genomic markers can be used to perform forward genetics, including mapping of quantitative trait loci (QTL) in controlled crosses and association mapping of loci to phenotypes in natural populations. These techniques are critical for identifying the genetic basis of key traits, such as herbicide resistance and plant parasitism. By understanding their genetic basis, we will be able to track the evolution of these traits, as well as their occurrence, inheritance, and dispersal. This information, in turn, provides the ability to predict responses to different control measures, to genetically tailor management to weeds, and to modify crops genetically for resistance to parasites. Focal genes for trait analyses will also prompt further genome-level analyses to understand the evolutionary context and interactions of these key genes.
The connection between particular loci and phenotypes cannot be confirmed without reverse genetics, where the effects of genes are demonstrated directly by genetic transformation. Manipulation of genes in plants can be done by transgenic overexpression, gene knockdown analysis, or mutagenesis. For example, a putative herbicide-resistance gene cloned from a resistant genotype might be overexpressed in an otherwise susceptible genotype, followed by subsequent herbicide challenge. Or that same gene's expression could be knocked down in the resistant genotype, challenged with herbicide, and tested for conversion to herbicide susceptibility. The combination of transformation and susceptible- and resistant-biotypes would be valuable for screening putative, nontarget, herbicide-resistance targets from other species in overexpression assays. In fact, genomic approaches have already been used in the search for herbicide target sites by high-throughput knockout of genes (Lein et al. 2004). Efficient transformation systems will be a necessary component of any genomic analysis because the biological significance of identified putative weediness genes must be verified by investigating their effect in the species of interest.
Benefits to Weed Management
Ultimately, the practical goal of weed genomics is to aid in weed management. Support for genomic research is dependent upon its application to the needs of end users and its benefits to agriculture and the environment. Genomics and related molecular techniques have the potential to provide these practical benefits by increasing our ability to identify traits that contribute to weediness, to find new effective and environmentally sound control measures, and to predict evolutionary responses to our management practices (Anderson 2008).
Historically, it has been difficult to precisely define the traits and genes that make a species particularly weedy and invasive. Invasiveness in a particular environment depends on the genomic constitution of the weed species and on the environment at the site of introduction. For example, an agronomic weed might have succeeded by accumulating domestication traits, such as the loss of dormancy and shattering that mimic a crop (Warwick and Stewart 2005). In contrast, for invasive weeds of wild or natural areas, success may be based on the retention of those traits (Lai et al. 2006). Understanding the sources of genetic variation for these traits and their rapid adaptation to different environments could lead to the ability to predict whether and where a weed will become invasive (Prentis et al. 2008). Genome scans that compare gene-sequence diversity across populations can be used to reveal which loci are associated with success in different environments and to determine the sources of variation in those traits (i.e., standing variation vs. new mutations).
Herbicide resistance is undoubtedly the most important trait affecting long-term control of weedy populations. Genomics provides powerful opportunities to elucidate the action of herbicides (Eckes et al. 2004), the evolution of herbicide resistance, and the occurrence, inheritance, and dispersal of herbicide-resistance genes. Extensive information, mainly using DNA sequencing and single nucleotide polymorphism (SNP) analysis, has already been used in research examining the molecular mechanisms of target-site herbicide resistance (Devine and Shukla 2000; Tranel and Wright 2002). However, fewer nontarget-site resistance mechanisms have been elucidated at the molecular level because of the more complicated basis of this type of resistance and the limited genomic information available for weedy species (Yuan et al. 2007). Global gene-expression profiling techniques, such as microarrays, are a powerful tool for studying the molecular responses to herbicide application (Lee and Tranel 2008; Raghavan et al. 2005, 2006) and can be especially valuable in identifying nontarget herbicide-resistance mechanisms (Yuan et al. 2007). Molecular markers have been used to investigate single vs. multiple origins of herbicide resistance, gene flow, and the frequency of resistant alleles in weed populations, which are factors that strongly influence weed management strategies (Bodo Slotta 2008). Genomics approaches might be able to finally provide a mechanistic understanding of the utility of herbicide rotations vs. herbicide mixtures for prevention of resistance and of the effect of low doses or high doses on the evolution of resistance. A mechanistic understanding would allow us to predict when and where a particular practice (e.g., low dose vs. high dose) would be correct (Gardner et al. 1998). The identification of pathways involved in herbicide response may also suggest novel molecular targets for herbicide development.
Parasitic weeds are among the most difficult weeds to manage because of the physical and physiological interactions of these species with their host plants. Genomic techniques can aid in identifying host genes that naturally provide resistance to parasitic weeds or genetic pathways critical for parasitic infection. Metabolomics and proteomics could be used to identify the unique features of plants that are naturally resistant to parasites, which could lead to identification of the genes responsible for resistance (Gressel 2008). Additionally, such studies are likely to suggest the pathways and genes in the parasite that are required for infection, offering targets for new control measures.
We know that these weediness traits can evolve in response to control measures, and genomics can help us to identify sources of variation in weedy populations and to predict their evolutionary responses to control. Genome-scale surveys of molecular markers can quantify gene flow and the frequency of hybridization, which can affect weed management practices (Bodo Slotta 2008; Tranel and Wright 2002). In particular, gene flow and hybridization have recently become popular areas of study because of the movement of herbicide-resistance genes, both naturally evolved resistance genes and transgenes. However, to our knowledge, the effects of gene flow on traits that could increase a weed's fitness, such as salt or drought tolerance, have not yet been addressed (Mallory-Smith and Zapiola 2008).
Finally, coupling estimates of gene flow with rates of adaptation in weediness traits would be particularly powerful for guiding management. For example, comparing selective pressures on genes in weeds sampled from different cropping systems would provide information on the roles that agricultural fields, fallow fields, and natural areas play in the maintenance of heritable adaptive traits. This information aids in the design of weed management systems: A high migration rate with a low adaptation rate would require different management than if both migration and adaptation rates were high. In the latter case, it would be important to change management practices more quickly to minimize opportunities for the weed to adapt.
Successful Examples of Weed Genomics Research
Evolutionary Population Genomics in the Compositae Family
As far as we are aware, evolutionary population genomic methods have thus far only been applied to weeds in the sunflower (Compositae syn. Asteraceae) family (Stevens 2007). Evolutionary genomic studies have been feasible in this group because of the development of EST libraries and microarrays for several weeds in the family (Barker et al. 2008; Broz et al. 2007; Church et al. 2007; Lai et al. 2006). Most of this work has been done through the Compositae Genome Project (http://compgenomics.ucdavis.edu/compositae_index.php), which has been funded by the now defunct U.S. Department of Agriculture (USDA) Initiative for Future Agriculture and Food Systems (IFAFS) program and more recently by the National Science Foundation (NSF) Plant Genome Program, with the goal of developing genomic tools and resources for this large and economically important family.
Three studies from the Compositae Genome Project illustrate both the promise and challenges of evolutionary population genomics. A scan of 106 microsatellite (simple sequence repeat [SSR]) loci for evidence of selection in wild and weedy sunflower (Helianthus annuus L.) populations identified several loci that have swept through one or more weedy populations. The scans employed SSRs located within ESTs, which have the advantage of providing candidate genes that are known to be expressed and tightly linked to each locus. Although most of the putative sweeps appear to represent examples of local or regional adaptation, rather than selection for weediness per se, one gene (a heat shock protein) exhibited independent sweeps across weed populations from across the United States and, thus, appears to represent a “weedy gene” (Kane and Rieseberg 2008). Likewise, microarray experiments using a first-generation cDNA array (3,100 unique genes; Lai et al. 2006) identified 165 genes, representing about 5% of total genes on the array, which showed differential expression in one or more weed populations (Lai et al. 2008). Two functional categories of genes were significantly overrepresented: response to stress and response to biotic or abiotic stimulus. However, the most intriguing finding was that genes with consistent expression differences across all four weed populations assayed were mostly down-regulated, implying trade-offs with other functions and potential adaptation to more benign conditions.
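Whether a functional category is overrepresented among differentially expressed genes is commonly assessed with a hypergeometric (one-sided Fisher) test. The sketch below shows the calculation; it is not necessarily the test used by Lai et al. (2008). The array size (3,100) and the number of differentially expressed genes (165) come from the text above, whereas the category counts in the example are invented for illustration.

```python
from math import comb

def enrichment_p_value(array_genes, category_genes, de_genes, de_in_category):
    """One-sided hypergeometric p-value for category overrepresentation.

    array_genes: genes on the array
    category_genes: genes on the array annotated to the category
    de_genes: differentially expressed genes
    de_in_category: differentially expressed genes in the category
    Returns P(X >= de_in_category) under random sampling.
    """
    total = comb(array_genes, de_genes)
    max_k = min(category_genes, de_genes)
    tail = sum(
        comb(category_genes, k) * comb(array_genes - category_genes, de_genes - k)
        for k in range(de_in_category, max_k + 1)
    )
    return tail / total

# 3,100 genes on the array and 165 differentially expressed (from the text);
# the category size (200) and overlap (25) are hypothetical.
p = enrichment_p_value(array_genes=3100, category_genes=200,
                       de_genes=165, de_in_category=25)
```

With many categories tested at once, the resulting p-values would also need a multiple-testing correction.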
More recently, the Roche GS-FLX (454) next-generation sequencing platform has been employed to sequence normalized cDNAs from 10 native and 10 invasive yellow star-thistle (Centaurea solstitialis L.) genotypes (K. Dlugosch, M. Barker, Z. Lai, and L. Rieseberg, unpublished data). An average of 89,000 200-bp reads and 32,000 unigenes were obtained per genotype or about 71 Mbp/plate. Despite fairly low redundancy, preliminary assemblies and analyses indicate that approximately 2,000 unigenes can be scanned for evidence of selection. As in sunflower, genes involved in stress responses predominate among those showing evidence of selection, a result consistent with a trade-off hypothesis for weed evolution, which posits that plants are unable to be highly stress tolerant and highly competitive (or reproductive) simultaneously (Grime 1977). Should this observation prove to be general, it would provide one of the first mechanistic explanations for the evolution of weediness. Evolutionary genomics, within and among species, thus offers powerful new opportunities to identify common mechanisms facilitating the success of weedy and invasive plants, with important implications for the management of these species.
Comparative and Functional Genomics in Leafy Spurge
Leafy spurge (Euphorbia esula L.) is a member of the Euphorbiaceae family that contains important agronomic crops, such as cassava (Manihot esculenta Crantz), castorbean (Ricinus communis L.), and rubber tree [Hevea brasiliensis (Willd. ex A. Juss.) Müll. Arg.], as well as horticultural species, such as poinsettia (Poinsettia pulcherrima Willd. ex Klotzsch). Leafy spurge has been considered as a model to study seed and adventitious root bud dormancy in perennial dicot weeds (Chao et al. 2005). However, early attempts to garner support and funding to initiate a genomic program for leafy spurge met with little success. To overcome some of the financial hurdles, potential collaborators working on related species were identified. Several research groups realized that a significant understanding of the conservation and diversity of genes between members of Euphorbiaceae was lacking but saw the potential for identifying economically important genes and physiological/developmental processes common to multiple members of this plant family; this initiated the pursuit of a coordinated large-scale effort for developing genomics resources in multiple Euphorbiaceae species, including cassava, rubber tree, and leafy spurge.
Preliminary collaborations demonstrated good cross-species utility of genomic resources (Anderson et al. 2004). Thus, based on a common goal of generating a Euphorbiaceae-specific microarray, various in-house resources, collaborative agreements, and small, competitive grants were used to develop a low-cost program that resulted in the production of about 23,000 unique leafy spurge sequences (Anderson et al. 2007) and about 9,000 unique cassava sequences (Lokko et al. 2007). These ESTs have been annotated and organized for the construction of Euphorbiaceae-specific DNA microarrays, which represent a set of more than 23,000 unigenes, including 19,015 leafy spurge unigenes and 4,129 unigenes from cassava. The development and use of these high-density microarrays are enhancing our understanding of genes and genetic networks associated with traits that make perennial weeds, such as leafy spurge, so invasive and difficult to control (Horvath et al. 2008). The success of these initial collaborations has resulted in further success stories, which include (1) grants, through the U.S. Department of Energy–Joint Genome Institute (DOE-JGI), to sequence the genome of cassava; (2) development of two sets of 96 SSR markers from cassava ESTs that are being used in breeding programs in Africa (interestingly, 80% of these SSRs work in amplifying leafy spurge DNA); and (3) construction of cassava-specific oligo arrays through Agilent Technologies. It is still too early to know the full agricultural benefit from the original collaborative initiative, but many research groups are currently using these valuable genomic tools. It is evident that pooling resources and developing collaborative projects are essential for developing programs in weed genomics.
Weed Candidates for Genomics
The question of which or how many candidate weeds should be chosen for answering the fundamental and practical questions of interest to weed scientists is critical for developing a roadmap for genomic exploration of weed biology and ecology. A single weed that lends itself to genomic manipulation and that can be used to answer most of the questions of interest to weed scientists would be ideal for focusing funding and intellectual efforts, a strategy proven by the mouseear cress [Arabidopsis thaliana (L.) Heynh.] plant model. However, no single species can encompass all weedy traits. Instead of posing one or two model weeds that might be prescribed to the community of researchers and stakeholders, we will discuss the characteristics needed for weed candidates. It is clear that even if such a single-weed model existed, it is unlikely that mechanisms imparting weediness to any single species will have analogous mechanisms in all other weeds, given the diversity in weedy species. Thus, it is inevitable that more than one candidate weed is needed for developing a robust weed genomics program. In choosing candidate species for large-scale genomics research, several factors are important (Basu et al. 2004; Chao et al. 2005):
(1) Candidate weeds considered for development of a weed genomics program must have a foundation of previous research, providing critical preliminary data and demonstrating their feasibility as study systems. Recent research efforts by the weed science and invasive plant community provide an indicator of weeds amenable to further study (Figure 3).
(2) Candidate weeds must pose a large threat. Weeds that infest a broad range of habitats and that are troublesome over geopolitical boundaries are likely to inspire support for funding from multiple governmental and nongovernmental agencies. Fortunately for the quest to identify a limited number of candidate species, the world's most troublesome and well-studied weeds (Holm et al. 1997) also display a large number of classic weedy characteristics, with most exhibiting more than 70% of the 14 weediness traits described by Baker (1974).
(3) Candidate weeds must be amenable to genome-scale studies. Ideally, model species should have small genomes (see Figure 4) or genomes with significant synteny to sequenced genomes, permitting detailed comparative studies and inferences across study systems (Basu et al. 2004). Advances in genomic technologies have made it feasible to study plants with complex genetics, such as wheat (Triticum aestivum L.), but model species with small genomes will remain the most tractable and affordable systems for concerted research efforts.
(4) Candidate species should be easily manipulated through genetic transformation. Genetic transformation is a very useful tool for elucidating the links between genotype and phenotype (Figures 1 and 2) and has already proven useful in advancing weed science (Halfhill et al. 2007).
As stated above, any given model is unlikely to encompass all weedy traits in a manner perfect for genomic analysis. Numerous potential model species have previously been suggested based on the above or similar criteria (Basu et al. 2004; Chao et al. 2005). More recently, nearly 100 national and international weed scientists interested in exploring genomics of weeds were tasked with making a short list of candidates (WSSA 2008). Among the weeds considered were pigweed (Amaranthus L. spp.), Johnsongrass [Sorghum halepense (L.) Pers.], leafy spurge, jointed goatgrass (Aegilops cylindrica Host), purple (Cyperus rotundus L.) and yellow nutsedge (Cyperus esculentus L.), common ragweed (Ambrosia artemisiifolia L.), nightshades (Solanum L. spp.), and many others. The group came to consensus that a diverse suite of species is worth pursuing further. What follows are some examples.
Ryegrass (Lolium L. spp.; Poaceae) is among the best-studied weed genera (Figure 3). An extensive EST database (Sawbridge et al. 2003) and cDNA microarrays (Ciannamea et al. 2006) already exist for these weeds. Ryegrasses have widespread distributions (Charmet et al. 1996) and are problematic in numerous habitats, including agricultural, range, and recreational settings (Bossard et al. 2000). Ryegrasses have numerous weedy characteristics, such as varying levels of seed dormancy, high fecundity, ability to propagate by seed and tillers, potential for cross-species hybridization, and herbicide resistance (Basu et al. 2004). Lolium spp. have genome sizes of about 4,067 Mbp (Figure 4; Evans et al. 1972), making them poor candidates for full genome sequencing, but synteny with other Poaceae, and the genomic resources developed for them, make gene-space sequencing a possibility. Also, ryegrasses are generally small enough to grow and study in a limited laboratory facility. Finally, transformation systems have been developed for ryegrass, although the transformation is inefficient (Wu et al. 2005).
Canada thistle [Cirsium arvense (L.) Scop.; Compositae] is generally dioecious; however, some true hermaphroditic plants have been observed (Heimann and Cussans 1996 and references therein), and thus it presents an excellent model to study the effect of mating-system variation on invasiveness (Barrett et al. 2008). This species also reproduces vegetatively, offering another mode of reproduction for comparison and the opportunity to propagate clonal experimental plants. Various genomic resources, including extensive EST collections and microarrays, have been developed for related species in the Compositae (Barker et al. 2008) and should enable efficient development of genomic-based tools for Canada thistle. Canada thistle is a diploid with a genome size of about 1,519 Mbp (Figure 4; Bennett and Leitch 2003). Again, its large genome precludes it from being rapidly sequenced. Several members of the Compositae have been transformed (Malone-Schoneberg et al. 1994; Michelmore et al. 1987; Narumi et al. 2005; among others), but some species, such as sunflower, are very recalcitrant to transformation. Thistle (Cirsium Mill. spp.) transformation is unknown.
Canadian horseweed [Conyza canadensis (L.) Cronquist; Compositae] is a nuisance weed that is highly selfing. It was the first dicot weed known to evolve glyphosate resistance and has the most widespread distribution of glyphosate-resistant biotypes of all weeds. Although the genus has not been the focus of intensive research (Figure 3), it is becoming more of an agricultural concern because of its rapid and widespread resistance evolution. It is the most attractive weed for whole-genome sequencing because it has the smallest genome of all surveyed weeds: 335 Mbp (Figure 4; Peng, Yuan, Tranel, and Stewart, unpublished data). It is very amenable to genetic transformation (Halfhill et al. 2007) and would be amenable to reverse-genetics approaches. Its transcriptome has recently been sequenced using GS-FLX (454) technology, which produced 411,962 raw reads, averaging 233 bp, yielding a total data size of 95.8 Mb (Peng, Yuan, Tranel, and Stewart, unpublished data).
Pigweeds (Amaranthus spp.; Amaranthaceae) are the most cited (Figure 3) and, arguably, the most troublesome weed pests in many agricultural settings. In addition, they are rapidly evolving herbicide resistance, in some cases becoming resistant to multiple herbicides in the same plant (Patzoldt et al. 2005). Most pigweeds are monoecious (bisexual individuals with unisexual flowers), but some species are dioecious. Agrobacterium-mediated transformation has been performed in Prince-of-Wales feather (Amaranthus hypochondriacus L.; Jofre-Garfias et al. 1997), but the most important weed species have not been transformed. The genomes of weedy Amaranthus spp. are moderately sized, ranging from approximately 900 Mbp for Palmer amaranth (Amaranthus palmeri S. Wats.) to 1,400 Mbp for tall waterhemp [Amaranthus tuberculatus (Moq.) Sauer] (Figure 4; Rayburn et al. 2005). Waterhemp genomic resources have expanded rapidly in the past 6 mo. A GS-FLX (454) genomic DNA run produced 160,000 sequencing reads with an average read length of about 270 nucleotides, yielding a total of about 43 Mbp (P. J. Tranel, unpublished data). A 454-transcriptome run yielded 483,225 raw reads, with an average length of 232 bp and a total data size of 114.8 Mbp (P. J. Tranel, unpublished data).
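The run statistics quoted for these species can be related to genome size with simple arithmetic. The following is a minimal sketch rather than part of the original analyses: the function names are hypothetical, the read count, mean read length, and genome size are those given in the text, and approximating yield as reads multiplied by mean read length ignores rounding and quality trimming.

```python
def run_yield_mbp(n_reads, mean_read_len_bp):
    """Approximate total bases produced by a sequencing run, in Mbp."""
    return n_reads * mean_read_len_bp / 1e6


def fold_coverage(yield_mbp, genome_size_mbp):
    """Average number of times each base of the genome is sampled."""
    return yield_mbp / genome_size_mbp


# Waterhemp genomic DNA run: 160,000 reads of about 270 nt (from the text).
waterhemp_yield = run_yield_mbp(160_000, 270)
# Tall waterhemp genome size of about 1,400 Mbp (Rayburn et al. 2005).
waterhemp_cov = fold_coverage(waterhemp_yield, 1_400)

print(f"yield: {waterhemp_yield:.1f} Mbp")
print(f"coverage: {waterhemp_cov:.3f}x")
```

Under these assumptions, a single run of this size samples only about 3% of the waterhemp genome, which helps explain why genome size weighs so heavily in the choice of candidate species.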
As described above, successful inroads into weed genomics are already being made, and they clearly demonstrate that implementation of such programs will require the leveraging of resources from related species, especially crops, as well as extensive collaborations among researchers. Work on related species can provide useful genomic tools directly (Horvath and Clay 2007; Horvath et al. 2006b), as well as biological mechanisms that might apply across taxa (e.g., mechanisms regulating perennial dormancy in model plant species could be extended to the study of dormancy regulation in weeds). Importantly, both the leveraging of existing agricultural model species and the direct funding of weed genomics will require significant consumer and stakeholder input and support at the political level. Thus, it will be critical to raise awareness about the benefits of genomics for weed science.
A key factor that will influence the perceived benefits of weed genomics is economic: the value of the information gained compared with the cost of the work. Pooling resources from crop, weed, and other agricultural communities maximizes return on investment. Weed scientists have much to offer in the form of the compelling biological problems posed by weeds. Weeds encompass a wide range of biological traits that are both scientifically interesting and economically important. Moreover, although the study of traits such as herbicide resistance has obvious profitability within weed science alone, work on broader genomic analyses and other traits such as dormancy and allelopathy may be easier to justify for weed science if they illuminate the biology of other species of value. Reductions in sequencing costs are also improving the economics of weed genomics. Most sequencing conducted to date has used the relatively expensive Sanger technology (dideoxynucleotide sequencing), such that genome sequencing of a plant species required massive financial inputs. As next-generation sequencing (Box 1) becomes routine and increasingly efficient, it will be easier to bridge the technology gap between genomics and weed science.
Collaborations among groups with different research foci will capitalize on a broad array of resources, generate the large body of research needed to establish model species, and allow weed scientists to build teams that can synergize different types of expertise. We can expect molecular weed science laboratories to find synergies that make rapid progress toward well-defined goals.
Small inroads are already being made in weed genomics; however, a critical mass of scientists is needed to fully realize its potential. Weed scientists must initiate collaborations with genomics-oriented researchers and bioinformaticians, bringing together disparate areas of expertise and leveraging a broader array of financial resources for these large projects. It is also imperative that funding agencies recognize that the use of genomic approaches focused on major weed species, although not a panacea, will offer novel solutions and provide tangible benefits to science. For basic biology, the study of weed genomics offers a window into the world of rapid plant evolution and stress physiology. For weed management, herbicide resistance (nontarget-site resistance in particular), mechanisms of parasitism and allelopathy, and the evolution of invasiveness are weed science issues ripe for genomic-level analyses. Fundamental knowledge of the genetic underpinnings of what makes a plant a weed will provide for new management strategies to mitigate the negative effects of weedy and invasive plants on food production and habitat destruction.
No single species will encompass all of the myriad weediness traits, nor could it serve to answer all of the weed management questions. Thus, we argue that weed genomics should not be limited to a single species. In fact, the highest return on investment could be realized by initiating parallel projects that span the range of weediness traits and plant genera that are important to agriculture and the environment. Subsequent comparative genomics approaches could lead to strong inferences of important weed mechanisms, such as herbicide resistance and dormancy.
Possibly the most significant mile marker on the road map to weed genomics involves the next generation of scientists: We must ensure that graduate students and postdoctoral researchers are instilled with enthusiasm for weed science and receive the breadth of training needed to conduct their own genomic analyses. Therefore, the onus is on current weed scientists to develop collaborative weed-genomics research projects that will serve as training grounds and on funding agencies to provide the needed financial support. Only after scientists are both comprehensively trained in genomics and knowledgeable of weed management issues will we fully realize the immense benefits weed genomics has to offer.
We are grateful for funding from the USDA-NRI to hold the workshop on the evolution of weedy and invasive plant genomics as part of the 2008 annual meeting of the Weed Science Society of America and to the WSSA for also providing financial support for this workshop. We express our appreciation to members of our laboratories who not only generated the data that make this new line of investigation possible but also enabled us to have the time and inspiration to write such an article. Thanks to Mat Halter for rendering Figure 4.
 GS-FLX (454) next-generation sequencing platform, Roche, Grenzacherstrasse 124, CH-4070, Basel, Switzerland.
 Cassava oligo arrays, Agilent Technologies, Inc., 5301 Stevens Creek Blvd., Santa Clara, CA 95051.
Box 1. Tools in the Weed Genomics Toolbox
Bioinformatics refers to the handling and analysis of biological information using computers. It is the central tool in the genomics toolbox (Figure 2) because it is the means by which the large data sets inherent to genomic research are translated into biologically meaningful information. Numerous software packages are freely available or can be purchased for these purposes; however, some custom programming is almost always required to handle the data. Weed scientists entering genomics must either become comfortable with bioinformatics themselves or find ways to work closely with bioinformaticians or computer scientists (Larrinua and Belmar 2008). Good communication is imperative in these collaborations because the questions of interest to the researcher will guide how the data are acquired, organized, and analyzed.
Bioinformatics can include searching and acquisition of publicly available data. Extensive databases of gene sequences and expression patterns (such as GenBank and the Gene Expression Omnibus, both hosted by the National Center for Biotechnology Information [NCBI; http://www.ncbi.nlm.nih.gov/]) already exist and could be a starting point for studying weed-related issues. Data obtained from genomic databases have been used by some weed scientists to design DNA microarrays for identifying global patterns of gene expression (Lai et al. 2008), for developing molecular markers for exploring evolution of biotypes (Kane and Rieseberg 2008), for mapping to identify genes responsible for specific phenotypes, and as resources for annotating sequences based upon similarity to model species and the burgeoning data from nonmodel species being deposited in databases.
Molecular markers are making a wide variety of contributions to weed science (Bodo Slotta 2008), and some types of these molecular markers are particularly suited to the high-throughput tracking of variation across a genome.
Amplified Fragment Length Polymorphisms (AFLPs). AFLPs reveal variation in the length of randomly amplified regions of the genome after restriction fragmentation. Variation at a large number of loci is scored simultaneously by electrophoresis of a subset of fragments.
Single Nucleotide Polymorphisms (SNPs). SNPs are variations in the nucleotide present at a single base-pair location. After these loci are identified through sequencing of multiple individuals, a wide variety of techniques are available for rapidly screening large numbers of loci and individuals.
Simple Sequence Repeats (SSRs). SSRs (also known as microsatellites) are regions of the genome where motifs of a few bases (two or three) are repeated a variable number of times. Polymorphism is apparent as variation of the length of the region, easily scored by electrophoresis after polymerase chain reaction (PCR) amplification of the SSRs.
Single Feature Polymorphisms (SFPs). Sequence variation is revealed by variation among individuals in the ability of their DNA to hybridize to a particular location on a tiling array (see below).
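Two of the marker types described above can be illustrated directly from sequence data. The sketch below is illustrative only and is not drawn from any cited study: the function names, the repeat threshold, and the example sequences are hypothetical, the default motif lengths of two to three bases follow the description in this box, and dedicated marker-discovery software would normally be used on real data.

```python
import re


def find_ssrs(seq, min_motif=2, max_motif=3, min_repeats=4):
    """Return (start, motif, repeat_count) for each tandem repeat in seq."""
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_motif, max_motif, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(seq.upper()):
        motif = match.group(1)
        if len(set(motif)) == 1:
            # Skip runs of a single base reported as longer motifs.
            continue
        hits.append((match.start(), motif, len(match.group(0)) // len(motif)))
    return hits


def find_snps(aligned_seqs):
    """Return (position, alleles) for alignment columns with more than one base."""
    snps = []
    for pos, column in enumerate(zip(*aligned_seqs)):
        # Ignore gaps and ambiguous calls when listing alleles.
        alleles = sorted(set(column) - {"-", "N"})
        if len(alleles) > 1:
            snps.append((pos, alleles))
    return snps


# Hypothetical sequences, used only to exercise the two functions.
ssr_hits = find_ssrs("TTGAGAGAGAGACC")
snp_hits = find_snps(["ACGTTGCA", "ACGTCGCA", "ACGTTGCA"])
print(ssr_hits)
print(snp_hits)
```

In this toy input, the SSR is a GA motif repeated five times, and the aligned sequences differ at a single column. Positions are zero-based.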
Molecular maps describe how molecular markers are positioned relative to one another in the genome and how individual markers or sets of markers relate to phenotypes. These allow us to understand the genetic basis of weediness traits and how genome structure and its evolution have affected weed biology.
Genetic Maps. Genetic maps show the relative positions (order) of genetic markers in the genome, as determined by their linkage (recombination distances) to one another.
Quantitative Trait Loci (QTLs). QTLs are genomic regions associated with a particular phenotypic effect, as defined by a set of marker loci, which correlate with the phenotype. QTL mapping relies upon controlled crosses to generate known relationships among individuals, which yield known probabilities that markers are identical by descent and which permit their association with shared phenotypes among individuals.
Association Mapping. Association mapping allows identification of QTLs in individuals with unknown pedigree by using genetic similarities to infer probability of identity by descent. Because genotypes used for association mapping are less closely related than those used for QTL mapping, much finer-scale mapping is typically possible.
Physical Maps. The physical distances (bp) between genetic markers are shown in physical maps. These may include full sequence information for a region, for instance through the use of bacterial artificial chromosomes (BACs), where regions of several thousands of base pairs are cloned into bacteria and fully sequenced.
Map-Based Cloning. Map-based cloning allows identification of a candidate gene of interest through physical mapping and sequencing within a region defined by markers in a genetic map.
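The recombination distances that underlie genetic maps are usually derived from the fraction of recombinant progeny in a cross. As a minimal sketch, the code below converts a recombination fraction into map distance in centimorgans (cM) with the standard Haldane and Kosambi mapping functions. The text does not name a mapping function, so this choice is an assumption, and the progeny counts are invented for illustration.

```python
import math


def recombination_fraction(n_recombinant, n_total):
    """Fraction of progeny that are recombinant between two markers."""
    return n_recombinant / n_total


def haldane_cm(r):
    """Map distance (cM) assuming no crossover interference (Haldane)."""
    if not 0 <= r < 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5)")
    return -50.0 * math.log(1.0 - 2.0 * r)


def kosambi_cm(r):
    """Map distance (cM) allowing for crossover interference (Kosambi)."""
    if not 0 <= r < 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5)")
    return 25.0 * math.log((1.0 + 2.0 * r) / (1.0 - 2.0 * r))


# Hypothetical cross: 18 recombinants among 200 progeny.
r = recombination_fraction(18, 200)
print(f"r = {r:.3f}")
print(f"Haldane: {haldane_cm(r):.2f} cM")
print(f"Kosambi: {kosambi_cm(r):.2f} cM")
```

For tightly linked markers the two functions give similar distances; they diverge as the recombination fraction approaches 0.5.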
DNA sequences generate direct insights into the genetic makeup and evolution of weeds, as well as indirect information useful for most of the genomic tools listed here.
Whole-Genome Sequencing. The sequencing of the genome in its entirety, including noncoding regions as well as transcribed and untranscribed coding regions, provides a level of detail that facilitates assembly of resequencing data for population genomics, discovery and annotation of coding regions, identification of promoter sequences for particular genes of interest, map-based cloning, and development of molecular maps and markers. These data will be particularly integral to the identification of common regulatory elements from promoters associated with clusters of coordinately expressed genes.
Expressed Sequence Tags (ESTs). ESTs are sequences of transcribed genes (mRNAs) present in a particular sample or pool of samples. ESTs are an efficient way to focus in on the transcribed portion of the genome and can be used to study gene evolution, to obtain molecular markers and variation for population genomics, and to generate probes for microarray development.
Gene Space Sequencing. Gene space sequencing is sequencing of the low-copy, gene-rich portion of the genome (Barbazuk et al. 2005). A filtration approach is used to enrich a sample for gene space and then the sample is sequenced (Emberton et al. 2005; Whitelaw et al. 2003). Gene space sequencing is similar to expressed sequence tagging but with a higher proportion of genes discovered, particularly large genes and those that are weakly or rarely expressed (e.g., transcription factors and disease resistance genes). Gene space sequencing also yields information on promoters, introns, and other nonexon sequences that are critical to analyses of gene structure and evolution.
Next-Generation Sequencing (NGS). NGS methods are recently developed, high-throughput alternatives to traditional sequencing, offering simultaneous sequencing of hundreds of thousands of short regions of a DNA sample (Mardis 2008). Traditional (Sanger) sequencing returns a single sequence of up to about 1,000 bp/sample. NGS methods involve finer fragmentation of a large sample of DNA, distribution of those fragments across a slide or a plate of microscopic wells, and simultaneous sequencing of the fragments. These techniques can be used for any of the above genomic-sequencing approaches, at a significantly reduced cost per project relative to Sanger sequencing, although shorter read lengths limit their ability to provide complete de novo sequencing of repetitive (low complexity) regions of the genome. NGS methods can also be performed such that the number of reads for a particular sequence correlates with its frequency in the sample; sequencing of a library of expressed sequences in this way provides both sequence and a measure of expression levels.
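The idea that read counts can serve as a measure of expression can be sketched in a few lines. The example below is a simplified illustration, not a published pipeline: it assumes each read has already been assigned to a gene, the gene names are hypothetical, and it scales counts to reads per million (RPM) without correcting for transcript length.

```python
from collections import Counter


def reads_per_million(read_gene_assignments):
    """Convert per-read gene assignments into reads per million (RPM)."""
    counts = Counter(read_gene_assignments)
    total = sum(counts.values())
    return {gene: n * 1e6 / total for gene, n in counts.items()}


# Hypothetical assignments of eight reads to three genes.
reads = ["geneA"] * 5 + ["geneB"] * 2 + ["geneC"]
rpm = reads_per_million(reads)

for gene, value in sorted(rpm.items()):
    print(f"{gene}: {value:.0f} RPM")
```

Scaling to a common total makes libraries of different sizes comparable, which matters when runs differ in the number of reads they produce.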
Microarrays are slides printed with a set of cDNA or oligonucleotide probes, to which corresponding sequences in a sample of expressed sequences (a transcriptome) will anneal (hybridize). Samples are fluorescently labeled, such that hybridization intensity correlates with expression level for a single sample or color distinguishes relative expression levels when two samples are labeled with different dyes and pooled to compete for the same sites on an array. Microarrays allow efficient comparisons of transcriptomes across species, populations, tissues, or plants grown under various conditions (see Lee and Tranel 2008).
It has been suggested that NGS (above) might render microarrays obsolete (Shendure 2008). However, because of library construction costs for sequencing, microarrays are still the most cost-effective method. High-throughput companies (NimbleGen, Agilent, Affymetrix, etc.) now offer to generate the arrays at little or no cost and provide a fixed-fee schedule to run the experiments and supply the scientist with transcriptome expression data.
cDNA Arrays. cDNA array probes are PCR-amplified clones from a cDNA library, with probes up to a few thousand base pairs in length. These long probes offer the potential for hybridization across different species with divergent homologous sequences, making the arrays useful for multiple species and for direct cross-species, competitive hybridization experiments.
Oligo Arrays. Oligo array probes are synthesized oligonucleotides. Long probes (about 70 bp) are similar to cDNA arrays but offer increased accuracy and reproducibility. Short-probe arrays (about 20 bp) are available for single-sample experiments and offer higher specificity, with the potential to differentiate among members of gene families.
Tiling Arrays. Tiling array probes are short oligonucleotides designed to cover the entire genome or contigs of interest. Depending on the experiment and the length and overlap of the probes, tiling arrays can be used to examine details of expression variation, transcription factor binding sites, copy number variation, or DNA methylation, or to map transcriptomes in sequenced genomes (Liu 2007).
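For the two-sample competitive hybridizations described above, relative expression is commonly summarized as a log ratio of the two dye intensities for each probe. The following sketch shows that calculation under simplifying assumptions: the probe names and intensities are hypothetical, the pseudocount is an arbitrary choice to avoid division by zero, and the background correction and normalization steps required for real arrays are omitted.

```python
import math


def log2_ratios(sample, reference, pseudocount=1.0):
    """Per-probe log2(sample/reference) from two-channel intensities."""
    return {
        probe: math.log2(
            (sample[probe] + pseudocount) / (reference[probe] + pseudocount)
        )
        for probe in sample
        if probe in reference
    }


# Hypothetical fluorescence intensities for three probes.
sample_channel = {"probe1": 799, "probe2": 99, "probe3": 199}
reference_channel = {"probe1": 199, "probe2": 399, "probe3": 199}

ratios = log2_ratios(sample_channel, reference_channel)
for probe, value in sorted(ratios.items()):
    print(f"{probe}: {value:+.1f}")
```

A value of +2 indicates roughly fourfold higher signal in the sample channel, -2 indicates fourfold lower, and 0 indicates no difference.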
Functional characterization involves a set of tools designed to help us understand the function, regulation, and phenotypic effects of particular loci and alleles.
Real-Time Reverse-Transcriptase PCR (real-time RT-PCR). Real-time RT-PCR, also known as quantitative PCR, quantifies PCR products by fluorescence after each amplification cycle. This permits direct assessment of the quantity of a particular mRNA transcript in the tissue of interest (see Chao 2008; Yuan et al. 2006).
Transgenic Overexpression. Transgenic insertion of a gene and its promoter into an individual induces increased expression of that gene, demonstrating the phenotypic effects of its up-regulation.
Gene Knockdowns. Gene knockdowns are reductions in gene expression through either genetic modification or treatment with an oligonucleotide that interferes with gene or mRNA function. These demonstrate the effects of down-regulation or loss of function of the gene of interest.
Mutagenesis. Induction of mutations demonstrates the effects of allelic variation or loss of function in a gene.
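Real-time RT-PCR results, described earlier in this section, are often reported as the expression of a target gene relative to a reference gene. One widely used calculation is the comparative threshold-cycle method (2 to the power of minus delta-delta Ct). The text does not specify a quantification method, so the sketch below is an assumption. It also assumes that amplification efficiency is close to 100% for both genes, and the Ct values are invented.

```python
def relative_expression(ct_target_treated, ct_ref_treated,
                        ct_target_control, ct_ref_control):
    """Fold change in target expression, treated versus control."""
    # Normalize the target gene to the reference gene in each sample.
    d_ct_treated = ct_target_treated - ct_ref_treated
    d_ct_control = ct_target_control - ct_ref_control
    # Compare the normalized values between samples.
    dd_ct = d_ct_treated - d_ct_control
    # Each cycle is assumed to correspond to a doubling of product.
    return 2.0 ** (-dd_ct)


# Hypothetical threshold cycles for a target gene and a reference gene.
fold_change = relative_expression(
    ct_target_treated=22.0, ct_ref_treated=18.0,
    ct_target_control=25.0, ct_ref_control=18.0,
)
print(f"fold change: {fold_change:.1f}")
```

With these made-up values the target transcript crosses the threshold three cycles earlier in the treated sample after normalization, corresponding to an eightfold increase.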