We have sequenced the complete plastid genome of the fern Angiopteris evecta. This taxon belongs to a major lineage (marattioid ferns) that, in most recent phylogenetic analyses, emerges near the base of the monilophytes. We used fluorescence-activated cell sorting (FACS) to isolate organelles and rolling circle amplification (RCA) to amplify the plastid genome, followed by shotgun sequencing to 8X depth of coverage, and then assembled the reads to obtain the plastid genome sequence. The circular genome is 153,901 bp long and contains inverted repeats of 21,053 bp each, a large single-copy region of 89,709 bp, and a small single-copy region of 22,086 bp. Gene order is similar to that of Psilotum. We make structural comparisons to the Psilotum and Adiantum plastid genomes. Several unique characters are observed in the Angiopteris plastid genome, such as repeat structure in a pseudogene. However, the overall structural similarity to Psilotum indicates either wholesale conservation of genome organization, or (less likely) repeated convergence to a stable structure. The results are discussed in relation to a growing comparative database of genomic and morphological characters across the green plants.
Vascular plants first appear in the fossil record during the Silurian (Kenrick and Crane, 1997; Pryer et al., 2004a; Stewart and Rothwell, 1993). Although many major lineages are extinct, recent phylogenetic studies (Pryer et al., 2001) indicate that an early split resulted in two extant lineages: seed plants and monilophytes. The latter includes the leptosporangiate ferns, marattioid ferns, horsetails, and a clade that includes eusporangiate ferns and whisk ferns. How these four lineages are related to each other is still poorly understood (Pryer et al., 2001; Pryer et al., 2004b; Wilkström and Pryer, 2005). Resolving these phylogenetic nodes is important for understanding the evolution of morphological, genetic, and developmental systems in monilophytes. As part of an effort to provide data for addressing this issue, we sequenced the complete plastid genome of Angiopteris evecta (Marattiaceae). Complete plastid genome sequences are currently available from only one leptosporangiate fern, Adiantum capillus-veneris L. (Wolf et al., 2003), and from only one other monilophyte, Psilotum nudum (L.) P. Beauv., whereas about 50 seed plant plastid genomes are in GenBank. Complete genome sequences can provide information on many levels, including genome structure, gene content, intron content, and nucleotide sequences of targeted regions. We chose Angiopteris evecta for our study because it is an easily available representative of an ancient lineage for which no plastid genome has been sequenced. Extant marattioid ferns include about 240 species (see Pryer et al., 2004b) typically treated in four genera and one family (Smith et al., 2006). Marattioid ferns first appeared in the Middle Carboniferous, and fossils assignable to the extant genus Marattia date to the Late Triassic (Hill and Camus, 1986). Thus marattioid ferns represent a clade as significant as seed plants or leptosporangiate ferns in terms of age, though not in terms of extant diversity.
Although the plastid genome is generally conserved in overall structure among land plants (Palmer, 1985), there is often sufficient variation for comparative analysis at both the structural and sequence levels. Large rearrangements, spanning several genes, are likely to be rare events that can be used as phylogenetic markers (Raubeson and Jansen, 1992). Early studies of fern chloroplast genomes uncovered a wealth of phylogenetic data and insights into the evolution of the genome (Hasebe and Iwatsuki, 1992; Raubeson and Stein, 1995; Stein et al., 1992; Stein et al., 1989). One significant finding from these studies was that a large portion of the plastid genome has been rearranged in ferns, but the exact series of events has not yet been fully characterized. Subsequently, there was a shift to more focused studies on DNA sequences of a few genes from large numbers of taxa (Hasebe et al., 1994; Hasebe et al., 1995; Pryer et al., 2004b). Thus, our understanding of structural evolution of fern plastid genomes remains limited. This study represents part of a broader investigation into plastid genome evolution by sequencing complete genomes or large portions thereof. Because Angiopteris represents a major lineage, details of its plastid genome can provide baseline data for this and other studies. Our objective here is to present the plastid genome sequence of Angiopteris evecta and compare it structurally to those of other monilophytes.
Materials and Methods
Preparation and DNA Sequencing
Pinnules from an immature crozier of A. evecta were collected from a plant growing at the University of Washington, Seattle, WA, USA (original source unknown). Voucher specimens (UC 1794629, 1794630, and 1794631) are deposited at the University of California Herbarium at Berkeley (UC). We collected purified fractions of intact chloroplasts from A. evecta by fluorescence-activated cell sorting (FACS). One hundred milligrams of fresh frond tissue was sliced into 0.25–1 mm segments in a sterile plastic Petri dish (on ice) in 1.0 mL of an organelle isolation solution containing 0.33 M sorbitol, 50 mM HEPES at pH 7.6, 2 mM EDTA, 1 mM MgCl2, 0.1% BSA, 1% PVP-40, 1.5 M NaCl and 5 mM β-mercaptoethanol, adjusted to pH 7.6 with KOH. Suspended organelles (chloroplasts, mitochondria, and nuclei) were withdrawn using a wide-bore pipette and then filtered through 30 µm nylon mesh. Organelles were then stained with DAPI (Sigma-Aldrich, St. Louis, MO, USA) and Mitotracker Green (Molecular Probes Inc., Eugene, OR, USA) at final concentrations of 2 µg/mL and 100 nM, respectively. The organelle suspension was incubated on ice for 15 min and then analyzed on a FACS DiVa using sterile phosphate buffered solution (Invitrogen Inc., Carlsbad, CA, USA) as sheath fluid. We used a Coherent INNOVA Enterprise Ion laser (Coherent, Inc., Santa Paula, CA, USA) emitting a 488 nm beam at 275 mW to excite chlorophyll and Mitotracker Green, and a UV beam at 30 mW to excite DAPI. Red fluorescence from chlorophyll was passed through a 675±20 nm filter held within the FL3 photomultiplier tube (PMT). Green fluorescence from Mitotracker Green was passed through a 530±30 nm filter held within the FL1 PMT. DAPI fluorescence from DNA was passed through a 424±44 nm filter held within the FL4 PMT. Organelles were collected into separate sterile 15 mL centrifuge tubes by flow cytometric sorting based on the respective sorting gates.
Sorted organelles were pelleted at 3000 g for 15 min, flash frozen in liquid nitrogen, and shipped frozen for DNA isolation and amplification.
The DNA preparation was processed for sequencing by the Production Genomics Facility of the DOE Joint Genome Institute (JGI). Template was first amplified via rolling circle amplification (RCA) with random hexamers (Dean et al., 2001). The RCA product was mechanically sheared into random fragments of about 3 kb by repeated passage through a Hydroshear device (Genemachines, San Carlos, CA, USA). These fragments were then enzymatically repaired to ensure blunt ends, then purified by gel electrophoresis to select for a narrow distribution of fragment sizes. Fragments were ligated into dephosphorylated pUC18 vector and transformed into E. coli to create plasmid libraries, using standard techniques (Sambrook et al., 1989). Automated colony pickers were used to select colonies into 384-well plates containing LB medium. After overnight incubation, a small aliquot was processed robotically by RCA of plasmids (Dean et al., 2001), then used as a template for DNA sequencing using Big-Dye chemistry (Applied Biosystems, Foster City, CA, USA). Sequencing reactions were cleaned using SPRI (Elkin et al., 2001) and separated electrophoretically on ABI 3730XL or Megabace 4000 automated DNA sequencing machines to produce a sequencing read from each end of each plasmid.
Assembly and Annotation
Sequences were processed using Phred (Ewing and Green, 1998; Ewing et al., 1998), trimmed for quality, screened for vector sequences, and assembled using Phrap. Quality scores were assigned automatically, and the electropherograms and assembly were viewed and verified for accuracy using Consed 12 (Gordon et al., 1998). As is typical, manual input was required to reconstruct part of one of the inverted repeat regions, because automated assembly methods cannot distinguish the two identical copies. Regions of low quality or coverage and several gaps were reamplified by PCR and then sequenced. We designed primers from the ends of the longest contigs and used Proofstart long-PCR (QIAGEN, Valencia, CA, USA) to amplify the missing regions. Reagent concentrations and amplification conditions followed the manufacturer's instructions, and we used PCR extension times of 1 min/kb of estimated PCR product. PCR products were digested with Tsp509I (compatible overhang with EcoRI) and Sau3AI (compatible overhang with BamHI). The fragments were separated in agarose, visualized, and cut from the gels. These fragments were then cloned into pUC19, end-sequenced, and added to the previous assembly. If assembly of a gap was incomplete at this stage, primers were designed from the subclone fragment sequences above and used to sequence the appropriate region using the earlier long-PCR product as a template. In this way we closed all 12 gaps. The final assembly has an average depth of coverage of 8X. We assembled the sequence as a circular genome with two copies of the inverted repeat. We annotated the genome using DOGMA (Dual Organellar GenoMe Annotator) (Wyman et al., 2004). Genes were located by BLAST searches (Altschul et al., 1997) against a database of previously published chloroplast genomes, which identify approximate gene positions.
From this initial annotation, we located hypothetical starts, stops, and intron positions based on comparisons to homologous genes in other chloroplast genomes. We also took into account the possibility of RNA editing, which can modify the start and stop positions (Kugita et al., 2003). We examined the plastid genome sequence for repeat structure using the program REPuter (Kurtz et al., 2001). We set the minimum repeat size to 20 and analyzed the sequence with only one copy of the inverted repeat.
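The repeat search described above can be illustrated with a naive sketch. This is not REPuter's algorithm (REPuter is considerably more efficient and also handles approximate repeats); the code below only indexes exact 20-mers and reports pairs of positions that share a seed either directly or as a reverse complement. The function names and the toy sequences in the usage note are hypothetical and are not taken from the Angiopteris sequence.

```python
from collections import defaultdict

# Translation table for complementing an upper-case DNA string.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_exact_repeats(seq, min_len=20):
    """Return (direct, inverted) lists of 0-based start-position pairs.

    direct:   pairs (i, j), i < j, where the min_len-mers at i and j are identical.
    inverted: pairs (i, j), i < j, where the min_len-mer at j is the reverse
              complement of the min_len-mer at i.
    Only exact seed matches are reported; seeds are not extended or merged.
    """
    index = defaultdict(list)
    for i in range(len(seq) - min_len + 1):
        index[seq[i:i + min_len]].append(i)

    direct, inverted = [], []
    for kmer, positions in index.items():
        # Every pair of occurrences of the same k-mer is a direct repeat seed.
        for a in range(len(positions)):
            for b in range(a + 1, len(positions)):
                direct.append((positions[a], positions[b]))
        # Occurrences of the reverse complement are inverted repeat seeds.
        rc = reverse_complement(kmer)
        if rc in index:
            for i in positions:
                for j in index[rc]:
                    if i < j:
                        inverted.append((i, j))
    return sorted(direct), sorted(inverted)
```

For example, `find_exact_repeats("GATTACAGG" + "TTTTT" + "GATTACAGG", min_len=9)` reports the seed pair `(0, 14)` among the direct repeats. On a real genome, the sequence passed in would contain only one copy of the inverted repeat, as in the analysis described above, so that the IR itself is not reported.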
Results and Discussion
The plastid genome of Angiopteris evecta has 153,901 bp, with inverted repeats (IRA and IRB) of 21,053 bp each, a large single-copy (LSC) region of 89,709 bp, and a small single-copy (SSC) region of 22,086 bp (Fig. 1). The sequence and annotation are deposited in GenBank as accession number DQ821119. During annotation of the genome, we located the repertoire of genes that is typical of land plant plastid genomes. The overall organization of the Angiopteris plastid genome is typical of other vascular plants and most similar to that of Psilotum nudum among plastid genomes sequenced to date. Some of the differences between Angiopteris and Psilotum are possibly a function of autapomorphies in either lineage, but this cannot be determined until more plastid genomes are examined. For example, Psilotum lacks three genes (chlL, chlN, and chlB) for subunits of protochlorophyllide reductase, an enzyme involved in the light-independent formation of chlorophyll. These three genes are found in most other plastid genomes, including Angiopteris. The ends of the IR also vary considerably among vascular plants. Psilotum differs from Angiopteris in that the SSC-IR boundary in the former is near trnL-UAG and the SSC extends from ccsA to ycf1, whereas in Angiopteris the SSC is longer and extends from ndhF to chlL (Fig. 2). Gene order at the LSC-IR boundary of Angiopteris is very similar to that of Psilotum, differing only in the sizes of intergenic regions rather than gene positions (Fig. 3). The overall gene order within the IR is similar to that of seed plants and Psilotum, consistent with the hypothesis that this region sustained several rearrangements at some time during the diversification of leptosporangiate ferns (Hasebe and Iwatsuki, 1992; Stein et al., 1992). An inversion of about 3 kb, involving psbD, psbC, and psbZ, was previously detected in Psilotum and Adiantum relative to other land plants, and more recently documented in the plastid genome of Equisetum (K. Karol, personal communication). This inversion is also seen in Angiopteris, thus providing a potential phylogenetic marker for the monilophyte clade.
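As a quick consistency check on the region sizes reported above, the following sketch confirms that the LSC, the SSC, and two IR copies sum to the full genome length, and derives region coordinates. The coordinate layout (position 1 at the start of the LSC, followed by IRB, SSC, and IRA) is an assumption made for illustration; the text does not state where the deposited sequence begins.

```python
# Region lengths in bp, as reported in the text.
LSC = 89_709
SSC = 22_086
IR = 21_053
TOTAL = 153_901


def region_boundaries(lsc, ir, ssc):
    """Return 1-based inclusive (start, end) coordinates for each region.

    Assumes the linearized order LSC, IRB, SSC, IRA starting at position 1.
    This layout is an illustrative assumption, not a stated fact.
    """
    regions = {}
    start = 1
    for name, length in (("LSC", lsc), ("IRB", ir), ("SSC", ssc), ("IRA", ir)):
        regions[name] = (start, start + length - 1)
        start += length
    return regions


# The four regions must account for every base in the circular genome.
assert LSC + SSC + 2 * IR == TOTAL

bounds = region_boundaries(LSC, IR, SSC)
for name, (start, end) in bounds.items():
    print(f"{name}: {start}-{end} ({end - start + 1} bp)")
```

Under this assumed layout the last region ends at position 153,901, matching the reported genome size.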
Another region of interest is in the LSC between rpoB and psbZ. This region has the same gene order in Psilotum and Angiopteris, so it is likely to be an ancestral monilophyte organization. The Adiantum gene order differs from that of Angiopteris and Psilotum in this region. However, the gene order difference cannot be explained by a single inversion. Instead, at least two overlapping inversions are required to explain the variation. Fig. 4 presents two alternative most-parsimonious pathways from a putative ancestral monilophyte gene order to that of Adiantum. Analysis of this region from several clades of leptosporangiate ferns may help determine which sequence of events occurred.
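The reasoning about overlapping inversions can be made concrete with a small sketch that models a gene order as a list of signed integers, where the sign records strand orientation. The gene blocks 1 to 6 below are hypothetical placeholders, not the actual genes between rpoB and psbZ, and the helper names are illustrative. The sketch shows how to test whether a derived order is reachable by one inversion and how to enumerate every two-inversion pathway between two orders.

```python
def invert(order, i, j):
    """Invert the segment order[i..j] (inclusive): reverse it and flip each sign."""
    segment = [-gene for gene in reversed(order[i:j + 1])]
    return order[:i] + segment + order[j + 1:]


def single_inversion_neighbors(order):
    """Return the set of all gene orders reachable by exactly one inversion."""
    n = len(order)
    return {tuple(invert(order, i, j)) for i in range(n) for j in range(i, n)}


def two_inversion_pathways(start, target):
    """Return every ((i1, j1), (i2, j2)) pair of inversions turning start into target."""
    n = len(start)
    spans = [(i, j) for i in range(n) for j in range(i, n)]
    pathways = []
    for first in spans:
        intermediate = invert(start, *first)
        for second in spans:
            if invert(intermediate, *second) == target:
                pathways.append((first, second))
    return pathways


# Hypothetical ancestral order of six gene blocks, all on the same strand.
ancestral = [1, 2, 3, 4, 5, 6]
# Apply two overlapping inversions to obtain a hypothetical derived order.
derived = invert(invert(ancestral, 1, 3), 2, 4)

print(derived)
print(tuple(derived) in single_inversion_neighbors(ancestral))
print(two_inversion_pathways(ancestral, derived))
```

When more than one pathway is returned, the alternatives pass through different intermediate gene orders, which is why sampling additional lineages could help distinguish between competing scenarios.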
One gene that we have not annotated is that for the hypothetical protein ycf68. Although found in several land plant plastid genomes, this gene is usually not annotated, perhaps because it is a relatively short reading frame (approximately 600 bp) and its function is unknown. In Angiopteris it is located in the IR at positions 104265–104639 and 139346–138972. However, there are at least three frameshifts, suggesting that ycf68 is a pseudogene.
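The two coordinate ranges given for ycf68 can be checked for internal consistency. In an inverted repeat, a position in one copy and its counterpart in the other copy sum to a constant, so both ends of the feature should give the same sum. The sketch below uses only the coordinates quoted above; the function names are illustrative.

```python
# ycf68 coordinates as quoted in the text (1-based, inclusive).
YCF68_COPY_1 = (104_265, 104_639)
YCF68_COPY_2 = (139_346, 138_972)  # listed high-to-low, i.e. on the opposite strand


def span_length(start, end):
    """Length in bp of a 1-based inclusive span, regardless of orientation."""
    return abs(end - start) + 1


def mirror_constant(copy_a, copy_b):
    """Return the shared coordinate sum if the two spans mirror each other, else None."""
    sums = {a + b for a, b in zip(copy_a, copy_b)}
    return sums.pop() if len(sums) == 1 else None


print(span_length(*YCF68_COPY_1), span_length(*YCF68_COPY_2))
print(mirror_constant(YCF68_COPY_1, YCF68_COPY_2))
```

Both spans are the same length and both ends give the same coordinate sum, as expected for two copies of a feature lying within the inverted repeat.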
The Angiopteris plastid genome contains several regions with repeat structure. Results from the analysis by REPuter revealed two main regions of long repeats (more than 20 bp). We found an 817 bp direct repeat within the region annotated as the pseudogene for the hypothetical protein ycf1 in the SSC, as well as a 352 bp string with a 95% similarity to the reverse complement also in the same region. Either several duplications or inversions resulted in ycf1 becoming a pseudogene, or its loss of function lifted selective constraints against such structural rearrangements. The remaining repeat regions were all at the beginning of the IR between trnI and trnL. This region is highly variable in several plastid genomes, probably due to the creation of partial genes during expansion and contraction of the IR (Goulding et al., 1996; Palmer, 1991).
We found no stop codons within otherwise open reading frames and no other obvious indications that RNA editing would be required. The ycf1 pseudogene was too drastically different from heterologous ycf1 sequences to explain the differences by RNA editing. However, absence of evidence is not evidence of absence; RNA editing can only be tested by sequencing cDNAs. This has only been done systematically for two chloroplast genes (ndhB and rbcL) in all major lineages of land plants (Freyer et al., 1997). The complete set of transcripts from chloroplast genes has only been examined in one hornwort, Anthoceros (Kugita et al., 2003), and one leptosporangiate fern, Adiantum capillus-veneris (Wolf et al., 2004), for which 350 RNA edited sites were detected. Thus, it remains unclear whether high levels of RNA editing are derived or ancestral within monilophytes.
Why have the plastid genome structures of Angiopteris and Psilotum remained so constant over such a long period of evolutionary time? The most-recent common ancestor of Angiopteris and Psilotum probably lived over 400 million years ago (Pryer et al., 2004a). Plastid genome structure has evolved rapidly in several younger clades such as Geranium (Palmer et al., 1987a) and Campanulaceae (Cosner et al., 2004). Some events have been correlated with loss of structural stability, such as loss of the inverted repeat (Palmer et al., 1987b). Clearly, plastid genome structure does not evolve in a clock-like manner. In fact, it is for this reason that structural changes can provide useful phylogenetic markers. Gene order can take on many possible states whereas DNA sequences have only four states. Thus, structural changes are more complex than nucleotide substitutions: reversion to an ancestral gene order is unlikely compared to reversion to an ancestral base in the DNA sequence. Long evolutionary branches have, on average, more opportunity to accumulate changes. However, the non-clock-like nature of structural changes provides a chance for them to become phylogenetic markers on short branches where signal is weak in DNA sequence data. The conservation of plastid genome structure between Angiopteris and Psilotum is either a function of long term stability in both lineages, or independent (and perhaps repeated) convergence to a more stable structure and gene order. Distinguishing these hypotheses can only be achieved with more genome structural data from additional clades, and such information is needed if we are to understand more about the levels of homoplasy for structural genomic characters.
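The contrast between the number of possible gene orders and the four states of a nucleotide site can be illustrated with a short calculation. As a simplifying assumption, the sketch counts linear arrangements of n gene blocks in which each block can lie on either strand, giving n! × 2^n orders; it ignores the equivalences introduced by circularity and by reading the molecule from the other strand, so it should be taken only as an order-of-magnitude illustration. The block counts used are hypothetical.

```python
from math import factorial

NUCLEOTIDE_STATES = 4  # A, C, G, T


def signed_linear_orders(n_blocks):
    """Count linear arrangements of n_blocks genes, each in one of two orientations."""
    return factorial(n_blocks) * 2 ** n_blocks


for n in (3, 5, 10):
    orders = signed_linear_orders(n)
    print(f"{n} blocks: {orders} gene orders vs {NUCLEOTIDE_STATES} nucleotide states")
```

Even for a handful of gene blocks the state space is orders of magnitude larger than that of a nucleotide site, which is the basis for expecting exact reversion to an ancestral gene order to be unlikely.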
If the plastid genome structure of Angiopteris has indeed remained constant since the origin of monilophytes, this would correlate with other evolutionary trends in the Marattiaceae. Analysis of rates of molecular evolution for nuclear and plastid genes revealed reduced substitution rates in both the Marattiaceae and in the tree fern clade (Soltis et al., 2002). If genome structure has also been evolving slowly in Marattiaceae this would imply a correlation of evolutionary rates for morphology, DNA sequences, and genome structure. Testing for such a correlation would require more data on genome structure, including nuclear genomes, from more taxa.
Angiopteris represents an ancient lineage whose affinities to other monilophytes are currently unresolved. Most analyses of DNA sequence data suggest a sister relationship of Marattiaceae to Equisetaceae, the horsetails (Wilkström and Pryer, 2005). If so, this would be an ancient clade, with little signal remaining. Data are forthcoming on the plastid genome of Equisetum, in addition to the mitochondrial genomes of Angiopteris and Equisetum (K. Karol, personal communication), and it is hoped that additional phylogenetic information will soon be provided.
The circular diagram depicted in Fig. 1 is, like all such genome maps, a visual representation of something far more complex. One unusual feature of plastid genomes is that the LSC and SSC have alternative orientations relative to the IR within a single organelle (Palmer, 1983). This so-called flip-flop recombination has also been documented for plastid genomes in the fern Osmunda (Stein et al., 1986). Thus, the relative orientations of the LSC and SSC in any map are arbitrary. Furthermore, experiments with native chloroplast genomes indicate that, at least in some situations, most molecules are linear and some even branched, with few displaying the more familiar circular structure depicted in most maps (Oldenburg and Bendich, 2004).
We provide here the first complete plastid genome sequence from the marattioid clade of plants. Availability of this sequence can enable researchers to design conserved primers to PCR-amplify and sequence new genomic regions that could provide useful phylogenetic information not available from the array of regions usually studied in ferns (Small et al., 2005). In addition, the structural details of the Angiopteris plastid genome join a growing database from other green plants. Ultimately such data can be used to infer phylogeny as well as help understand evolutionary process at both the sequence and genome structural levels.
Thanks to Aru Arumagnathan for FACS expertise, Tiffany Sorrentino for lab assistance, Norm Wickett for help with drawing the genome map, and M. Winston Ellis and Carol Rowe for comments on the manuscript. Aaron Duffy assisted with the structural analysis and two anonymous reviewers made helpful suggestions to improve the manuscript. This research was supported by the Green Tree of Life Project, funded by the National Science Foundation: http://ucjeps.berkeley.edu/TreeofLife/. Work at the Joint Genome Institute was performed under the auspices of the U.S. Department of Energy's Office of Science, Biological and Environmental Research Program, and by the University of California, Lawrence Berkeley National Laboratory under Contract No. DE-AC02-05CH11231.