Genome sequences and genome-wide transcript profiles are becoming increasingly available, making it possible to analyze how groups of genes are connected in pathways or “regulons” and, in turn, how organisms integrate gene activity at the organismal level. We have begun to explore the large transcript datasets available for the best-characterized plant model, Arabidopsis thaliana, and to set up a gene network using clustering methods. The network, based on the Graphical Gaussian Model (GGM), describes coregulation of genes under a variety of external factors: abiotic, biotic, and chemical treatments. In its present structure, the network reveals coregulation for more than 7,000 genes in the Arabidopsis genome. The network appears particularly well suited to revealing the regulatory structure of biochemical pathways and environmental stress responses. Examples describe network predictions centered on a trehalose-6-phosphate phosphatase, an Arabidopsis response regulator, and 5-enolpyruvylshikimate-3-phosphate synthase (EPSPS). Results from the statistical analysis and bioinformatics of large datasets provide hypotheses that must be tested in additional studies. However, networks, which should be expanded from transcripts to include proteins and metabolites, can be expected not only to explain how the Arabidopsis gene network is structured but also to provide insight into how similar networks in weed species deviate from it, correspond to it, or overlap with it.
Nomenclature: Arabidopsis, Arabidopsis thaliana (L.) Heynh.
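
As an illustration of the GGM idea mentioned in the abstract, the following minimal Python sketch derives network edges from partial correlations, which measure the association between two genes after the influence of all other genes has been removed. It is not the pipeline used in this work: the function names, the gene labels, and the threshold of 0.3 are hypothetical, and the plain (pseudo-)inverse of the sample covariance matrix is only reasonable when there are many more samples than genes. For genome-scale data, where genes far outnumber samples, a regularized or subsampling-based estimator would be needed instead.

```python
import numpy as np


def partial_correlations(expr):
    """Return the gene-by-gene partial correlation matrix.

    expr: 2-D array with one row per sample and one column per gene.
    Assumes far more samples than genes (illustrative setting only).
    """
    cov = np.cov(expr, rowvar=False)      # genes x genes sample covariance
    precision = np.linalg.pinv(cov)       # (pseudo-)inverse covariance
    d = np.sqrt(np.diag(precision))
    pcor = -precision / np.outer(d, d)    # pcor_ij = -p_ij / sqrt(p_ii * p_jj)
    np.fill_diagonal(pcor, 1.0)
    return pcor


def ggm_edges(expr, gene_names, threshold=0.3):
    """List gene pairs whose absolute partial correlation meets the threshold.

    The threshold is an arbitrary illustrative cutoff, not a published value.
    """
    pcor = partial_correlations(expr)
    edges = []
    for i in range(len(gene_names)):
        for j in range(i + 1, len(gene_names)):
            if abs(pcor[i, j]) >= threshold:
                edges.append((gene_names[i], gene_names[j], float(pcor[i, j])))
    return edges
```

For three simulated genes arranged in a chain (A drives B, B drives C), A and C are strongly correlated, yet their partial correlation given B is close to zero, so a network built this way keeps the direct links A-B and B-C and drops the indirect link A-C. This is the property that distinguishes a GGM from a network built on simple pairwise correlation.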