The California Phenology Thematic Collections Network (CAP TCN) is a collaborative project that seeks to maximize the value of herbarium specimens and their data, especially for understanding changes in plant phenology due to anthropogenic climate change. The project unites personnel in herbaria at California universities, research stations, natural history museums, and botanic gardens with the goal of capturing images, transcribing label data, and producing georeferenced coordinates of nearly one million preserved plant specimens collected over the past 150+ years. Each digitized specimen will also be scored for its phenological status—the stage of growth and reproduction of the specimen such as flowering or fruiting. The CAP TCN is developing efficient workflows and data standards necessary to collect, store, and analyze trait data from specimens to ensure their utility for research and other applications. These novel resources and data will enable powerful research in phenology and other topics in the California Floristic Province biodiversity hotspot and beyond.
Dried, pressed plant specimens preserved in herbaria have been essential sources of biodiversity data for centuries (Lavoie 2013). In the modern information era, data from specimens are being used for an increasing number of purposes, for example, documenting plant distributions in time and space and allowing for comparisons of plant morphology, biochemistry, and genetic variation among individuals, taxonomic groups, and lineages (Pyke and Ehrlich 2010; Lavoie 2013; Thornhill et al. 2017; Meineke et al. 2018a; Lang et al. 2018). Herbarium specimens have also proven critical in areas of major societal concern, such as understanding the effects of anthropogenic change due to pollution (Peñuelas and Filella 2002; Zschau et al. 2003), land-use change (Case et al. 2007), and climate change (Calinger et al. 2013; Wolf et al. 2016; González-Orozco et al. 2016), among other topics. For example, spatiotemporal specimen data have enabled researchers to track the spread of economically important invasive species (Chauvel et al. 2006), identify changes in the distributions of native species (Farnsworth and Ogurcak 2006; Case et al. 2007), and establish conservation priorities for vulnerable taxa (Kling et al. 2018).
Herbarium digitization—capturing standardized label data and high-resolution images of herbarium specimens—makes large amounts of high-quality data readily and globally available online, which accelerates research and facilitates the development of new research tools and methods (Elith et al. 2006; James et al. 2018; Pearson 2018). Specimen images, in particular, have opened doors to new avenues of research, including automated species identification of herbarium specimens (Carranza-Rojas et al. 2017) and the detection of shifts in plant phenological events in response to climate change (Willis et al. 2017). Digital specimen images may also accelerate characterization of other plant features such as evidence of herbivory (see Meineke et al. 2018b) or disease (see Antonovics et al. 2003).
Despite its importance for advancing research, herbarium specimen digitization remains an enormous task for the world's herbaria. In California, herbarium digitization has been underway since the early 1990s and was accelerated with the establishment of the Consortium of California Herbaria in 2003 (CCH; http://ucjeps.berkeley.edu/consortium/about.html), which has since grown to include 2.2 million specimen records from 40 institutions. Even with these efforts, digitization is far from complete in the state's herbaria; hundreds of thousands of specimens remain in analog format only, and only 7% of digitized California specimens had been imaged as of March 2019, according to the national data aggregator iDigBio (idigbio.org). While label data alone can be used to address certain scientific questions, high-resolution images of herbarium specimens are necessary for verifying taxon identifications and for providing data on plant traits that can be scored upon visual inspection, including plant size, vegetative or floral herbivory, evidence of pathogens, morphology, and precise reproductive status.
The California Phenology Thematic Collections Network (CAP TCN; https://www.capturingcaliforniasflowers.org/) was established by a grant from the Advancing Digitization of Biodiversity Collections (ADBC) program of the United States National Science Foundation. The CAP TCN aims to generate nearly one million high-resolution images of herbarium specimens from 22 California institutions. Each specimen record will consist of an image, transcribed label data, georeferenced coordinates when possible, and phenological data. To share these data widely, the CAP TCN has established a new web-accessible database system (cch2.org) available for use by researchers, land managers, educators, and any member of the public. Along with specimen collection dates, the reproductive status (e.g., the presence or number of unopened or open flowers, inflorescences, or mature or dehiscing fruits) of all imaged specimens will be scored and comprise the basis of a significant phenological dataset that can be used to study the effects of climate change on the seasonal cycles of plants in California, a biodiversity hotspot.
California has the most diverse native flora of any state in the U.S., containing more than one third of all U.S. plant species. The state is considered a biodiversity hotspot due to its high diversity, its high number of endemic taxa, and the major threats it faces (Raven and Axelrod 1978; Myers et al. 2000; Baldwin et al. 2012, 2017). The state's flora includes nearly 7,700 minimum-rank taxa (including species and infraspecific taxa), of which 6,572 (85%) are native and 2,303 (30%) are endemic (Jepson Flora Project 2019). This diverse and highly endemic flora is also highly endangered; the California Native Plant Society classifies about one third of the state's native taxa as taxa of special conservation concern (CNPS 2018), and nearly 4% of taxa are state or federally listed as endangered, threatened, or rare (CDFW 2019). In this context, rapid land-use changes and anthropogenic climate change pose a heightened threat to the California flora. The rising temperatures predicted by climate models are already being observed in California (Parmesan and Yohe 2003; Kelly and Goulden 2008), and this change is impacting the state's plants (Rapacciuolo et al. 2014). Understanding how plant species, populations, and communities change with time and space across the state is critical to directing conservation efforts, land management, and future scientific inquiry. By producing thorough, high-quality data records, georeferenced coordinates, and images of nearly one million herbarium specimens, the CAP TCN digitization project will greatly advance our understanding of the state's flora.
Investigating changes in plant phenology, the timing of plant growth and reproduction, is a key application for the data produced by the CAP TCN. Phenological change is one of the most significant and widely recognized effects of climate change (Walther et al. 2002; Parmesan and Yohe 2003; Calinger et al. 2013; Willis et al. 2017), and such change may pose a heightened threat to the California flora (Loarie et al. 2008). Numerous ecological functions depend on plant phenology at multiple levels of biological organization, from individuals to ecosystems. Phenology not only affects the individual fitness of plants, but also the fitness of organisms that depend on plants, such as mutualistic pollinators and seed-dispersers or antagonistic herbivores and parasites (Visser and Both 2005; Both et al. 2006). This in turn can affect population-level processes such as population growth, mating patterns, gene flow, and evolution (Franks and Weis 2009; Ozgul et al. 2010; Anderson et al. 2012). For example, temporal mismatches between plants and pollinators can extirpate local populations of both members of a mutualistic pair, cause rapid evolutionary shifts, and result in billions of dollars of agricultural losses (Memmott et al. 2007; McKinney et al. 2012; Kudo and Ida 2013; Miller-Struttmann et al. 2015). As climate change progresses, phenology-dependent interactions between plants and their mutualists and antagonists will likely change with unknown consequences for biodiversity or agricultural systems (Encinas-Viso et al. 2012; Matthews and Mazer 2016). Understanding changes in plant phenology is important not only to improve our understanding of—and our ability to forecast—ecological change but also to address practical environmental problems in both agricultural and natural settings.
Herbarium specimens preserve invaluable phenological data for thousands of plants across time and space. Although most herbarium specimens are not generally collected with the purpose of conducting phenological research, the phenological status of a specimen can usually be ascertained from reproductive structures visible on the sheet. The use of herbarium specimens to track the relationship between local climatic conditions and the collection dates of flowering specimens has a relatively short history (Willis et al. 2017). Nevertheless, several herbarium-based studies have corroborated the link between phenological events and climate change that was first observed in long-term, place-based studies (Primack et al. 2004; Lavoie and Lachance 2006; Davis et al. 2015; Willis et al. 2017), despite known geographic, temporal, and taxonomic biases of herbarium records (Daru et al. 2017). These studies have improved our understanding of narrow- and broad-scale phenological shifts among many taxa and in many regions across the globe (Willis et al. 2017). They have also elucidated the specific advantages of herbarium specimens for phenological research, such as filling gaps in long-term or observational data sets for a period of time (Meyer et al. 2016; Willis et al. 2017), underrepresented regions (Li et al. 2013), and threatened or rare taxa (Robbirt et al. 2011).
Digitizing California's herbaria will unlock long-term phenological records for nearly a million specimens, some of which date back to the late 1800s. With these data, researchers will be able to generate an unprecedented picture of the relationship between phenology and climate change in California and how this relationship varies with different taxa and phenophases (phases within a phenological event, such as full flowering or end of flowering). The phenological data generated by this project will help answer questions such as: (1) To what extent can observed phenological sensitivities to climate conditions be generalized among, for example, congeneric species, confamilial genera, or distinct families? (2) Are the flowering times of individual taxa more strongly associated with climate normal values (mean climate values over 30 years), suggesting that flowering time has evolved in response to long-term climatic conditions, or with climate conditions in the single year or season preceding the collection date? (3) For which species, genera, families, vegetation types, or functional groups do multivariate phenoclimate models best predict flowering times? (4) Which habitats and vegetation types are most phenologically sensitive to changes in precipitation and temperature? (5) Do different functional groups (e.g., evergreen versus deciduous taxa, annual herbs, geophytes) differ in their responses to long-term changes in temperature and precipitation? (6) Are rare species more (or less) phenologically sensitive to climate than widespread species? (7) Do the phenological sensitivities of species occupying highly water-limited habitats (e.g., deserts, serpentine outcrops, south-facing slopes) differ from those of species occupying more mesic habitats? and (8) In systems for which we have phenological data on pollinators, pathogens, and pests, where might phenological mismatches occur between flowering plant species, including agricultural plants, and these interacting taxa? In addition, with more robust assessments of the phenological status of herbarium specimens, along with historical and contemporary climatic data available online through PRISM (Daly et al. 1994, 2008) and ClimateNA (Wang et al. 2016), researchers will be poised to generate novel predictions concerning the effects of upcoming climate change on the seasonal cycles of individual California plant species and the communities that they constitute (cf. Park et al. 2019; Park and Mazer 2018, 2019).
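Question (2) contrasts two kinds of climate predictor, and several of the other questions rest on the idea of a phenological sensitivity. The following is a minimal sketch of how these quantities might be computed, not a description of any analysis pipeline used by the project. It assumes a hypothetical table of yearly mean temperatures and treats sensitivity as the ordinary least-squares slope of collection day of year on temperature; all function names and data values are illustrative.

```python
from statistics import mean


def climate_normal(values_by_year, end_year, window=30):
    """Mean of a climate variable over the `window` years ending at `end_year` (inclusive)."""
    years = range(end_year - window + 1, end_year + 1)
    missing = [y for y in years if y not in values_by_year]
    if missing:
        raise ValueError(f"missing climate values for years: {missing}")
    return mean(values_by_year[y] for y in years)


def anomaly(values_by_year, year, window=30):
    """Value in `year` minus the normal of the `window` years that precede it."""
    return values_by_year[year] - climate_normal(values_by_year, year - 1, window)


def sensitivity(temps, days_of_year):
    """Least-squares slope: change in flowering day of year per degree of temperature."""
    t_bar, d_bar = mean(temps), mean(days_of_year)
    sxx = sum((t - t_bar) ** 2 for t in temps)
    if sxx == 0:
        raise ValueError("temperatures must vary to estimate a slope")
    sxy = sum((t - t_bar) * (d - d_bar) for t, d in zip(temps, days_of_year))
    return sxy / sxx


# Hypothetical data: 30 years at 10.0 degrees C, then one warm year.
temps_by_year = {year: 10.0 for year in range(1961, 1991)}
temps_by_year[1991] = 12.0

normal_1961_1990 = climate_normal(temps_by_year, 1990)  # long-term predictor
warm_year_anomaly = anomaly(temps_by_year, 1991)        # single-year predictor

# Hypothetical specimens collected in flower at sites of differing temperature.
slope = sensitivity([10, 11, 12], [120, 116, 112])
```

In this toy example the slope is negative, meaning that specimens from warmer conditions were collected in flower earlier in the year. A real analysis would use many specimens per taxon and would need to account for the collection biases discussed above.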
Phenological research—though important in itself—is only the beginning of the many applications for digital specimen images, and the potential for these specimens and their associated data to enhance our understanding of ecology, systematics, evolution, biodiversity, and the effects of anthropogenic change is rapidly increasing. The CAP TCN empowers this research, unites and supports existing data-providers, and makes new resources available for the exploration of natural history collections.
The CAP TCN currently comprises 22 California institutions: 11 California State University campuses, seven University of California campuses, two botanic gardens, one natural history museum, and one California Department of Parks and Recreation research station (Fig. 1). Over the duration of this project, hundreds of undergraduates and members of the public will engage with California's herbaria and learn about the importance of natural history collections. The project is directed by California Polytechnic State University, San Luis Obispo (OBI) with additional leadership from UC Berkeley (UC/JEPS) and UC Santa Barbara (UCSB). Despite ambitious digitizing efforts among California herbaria in the past, prior to the CAP TCN, only a few herbaria had generated images of their specimens (RSA, SD, SDSU, SJSU, UC/JEPS, and UCSB). The long-term hope is that every specimen in California herbaria will be accompanied by a high-resolution image available online.
The sheer number of plant specimens in California herbaria precludes imaging all of them in a single project. Therefore, to build the most robust dataset for detecting phenological shifts, the CAP TCN is targeting the oldest records, the most diverse families, and the families with the most endemic and threatened taxa (Fig. 2). These families also include many species that represent model systems for evolutionary research; the families Asteraceae, Brassicaceae, Onagraceae, Phrymaceae, and Polemoniaceae have been the focus of years of research in systematics and evolutionary biology. In addition, the CAP TCN is targeting the Adoxaceae, Agavaceae, Sapindaceae, Zygophyllaceae, and an additional 250 taxa that are currently monitored by the USA National Phenology Network (usanpn.org) and the California Phenology Project (cpp.usanpn.org) to provide robust comparisons between the historical record of plant phenology (based on herbarium specimens; Fig. 3) and the contemporary record (based on in situ observations of living plants; Haggerty et al. 2013; Matthews et al. 2014). In total, the CAP TCN aims to make available 904,200 imaged, fully databased, and georeferenced herbarium specimen records. The project will also produce phenological scores (e.g., annotations of unopened flowers, open flowers, and fruit) of all 904,200 specimens according to cooperatively developed phenological standards and protocols.
Equipment and Workflows
Producing such a large number of specimen images within four years requires new equipment, efficient protocols, training, and, most importantly, strong collaboration. Fortunately, the CAP TCN operates within the collaborative network of other ADBC-funded digitization projects, such as the Mid-Atlantic Megalopolis TCN and Southeastern Collaborative Network of Expertise and Collections (SERNEC), both of which have provided instrumental guidance in establishing CAP TCN protocols and best practices. Representatives of most CAP TCN participating institutions (Fig. 1) attended the ADBC Summit in October 2018 to kickstart active communication and receive initial training. Institutions have purchased and assembled new imaging stations, each including a high-resolution camera, lighting equipment, and necessary software for capturing and processing images. CAP leadership is continually developing workflows and protocols for use across the network, drawing heavily from existing resources and best practices disseminated by iDigBio (idigbio.org), other Thematic Collections Networks, and other herbaria. While each herbarium in the CAP TCN adapts these workflows and protocols to fit its own institution, they all begin with an understanding of digitization best practices, and, as a result, the images and data meet a specified standard of quality. Site visits, webinars, and conference calls with the project manager build capacity at each herbarium and ensure that project goals are met. The developed resources, including training manuals, videos, protocols, and quarterly reports, are publicly available on the project website: capturingcaliforniasflowers.org. A simplified diagram of the CAP TCN digitization workflow is shown in Figure 4.
Once specimen images are produced and processed, they are made available online through a new specimen data portal, CCH2 (cch2.org). The CCH2 portal contains all specimen data—regardless of taxon or collection location—from all collaborating institutions. This portal greatly expands institutions' abilities to curate and digitize specimens because it leverages Symbiota, an open source content management system that enables storage, sharing, and active curation of specimen data (symbiota.org; Gries et al. 2014). Symbiota software, and thus the CCH2 portal, contains numerous data curation and management tools, such as data cleaning and georeferencing modules. The CAP TCN is developing new tools to capture phenological data from specimen images (see Phenological Scoring), and several of these tools can be co-opted to capture other trait data (e.g., leaf measurements, herbivore damage).
Previously, many herbaria in the CAP TCN lacked interoperable databases that allowed efficient curation, cleaning, and sharing of specimen data. Now, using a web browser, each herbarium can actively manage its own data in the CCH2 portal, and data managers, users, and the public can view images and data, including phenological data, from these collections as soon as information is added. Institutions with managers who prefer to manage specimen data in a local database can still share their data by regularly uploading a “snapshot” of their dataset to the portal, which also becomes publicly accessible upon upload. Data entered in the CCH2 portal, either “live” or through a “snapshot” upload, are automatically mapped to biodiversity data standards provided by Darwin Core (Wieczorek et al. 2012b), promoting interoperability of data between institutions worldwide. Also because of these standards, CCH2 data are easily parsed and distributed via global data aggregators iDigBio (idigbio.org) and GBIF (gbif.org). The earlier Consortium of California Herbaria public interface, now known as CCH1 (http://ucjeps.berkeley.edu/consortium/), will also draw from the CCH2 portal and display only data from vascular plant specimens collected in California, integrating closely with the Jepson eFlora (http://ucjeps.berkeley.edu/eflora/).
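To make the idea of mapping to Darwin Core concrete, the sketch below renames the columns of a local database record to Darwin Core terms. It is only an illustration and does not reproduce how the CCH2 portal or Symbiota performs the mapping. The local column names and the example record are hypothetical; the terms on the right-hand side are standard Darwin Core terms.

```python
# Hypothetical local column names (left) mapped to Darwin Core terms (right).
LOCAL_TO_DWC = {
    "accession_no": "catalogNumber",
    "herbarium": "institutionCode",
    "taxon": "scientificName",
    "collector": "recordedBy",
    "coll_date": "eventDate",
    "lat": "decimalLatitude",
    "lon": "decimalLongitude",
    "phenology": "reproductiveCondition",
}


def to_darwin_core(local_record):
    """Split a local record into Darwin Core fields and fields with no known mapping."""
    mapped, unmapped = {}, {}
    for key, value in local_record.items():
        term = LOCAL_TO_DWC.get(key)
        if term is None:
            unmapped[key] = value      # keep for manual review rather than drop
        elif value not in (None, ""):
            mapped[term] = value       # skip empty values
    return mapped, unmapped


# Hypothetical record from a local herbarium database.
record = {
    "accession_no": "12345",
    "herbarium": "OBI",
    "taxon": "Eschscholzia californica",
    "coll_date": "1935-04-12",
    "lat": "",
    "shelf": "B-14",
}
dwc_record, leftovers = to_darwin_core(record)
```

Fields without a mapping (here the hypothetical `shelf` column) are returned separately so that a data manager can decide how to handle them.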
The new CCH2 portal will be used to georeference all targeted specimens that have not already been assigned latitude and longitude coordinates in previous projects (e.g., Baldwin et al. 2017). The CCH2 portal, like most Symbiota instances, is equipped with GEOLocate (Rios and Bart 2010), batch georeferencing tools, and botanical duplicate identification tools, each of which will facilitate efficient georeferencing of herbarium specimen records according to existing community-established protocols (Fig. 4; Wieczorek et al. 2012a).
To capture phenological data, the CAP TCN is developing workflows for scoring phenology in a number of ways (Fig. 4). Once phenological data standards are established and appropriate tools are developed in the data portal, some institutions may capture phenological traits concurrently with other digitization steps (e.g., label transcription). The new phenological data fields on the occurrence record page of the portal, developed for this project, will greatly facilitate this process. The project is also expanding the Image Scoring Tool, a Symbiota module developed for the New England Vascular Plants Thematic Collections Network (nevp.org). The Image Scoring Tool presents users with images of specimens and allows them to apply a score (e.g., “unopened flowers absent, open flowers present, and fruit present”) to each record. These scores are recorded as annotations to the specimen record according to Darwin Core-compatible standards (see Yost et al. 2018 for proposed scoring schema), which will be fully interoperable with phenological data produced by other means via mapping to the Plant Phenology Ontology (Stucky et al. 2018).
The project will also use and build upon another newly developed Symbiota function, the Attribute Mining Tool. This tool allows editors to search any database field for certain words that refer to reproductive states and apply a phenological score according to set definitions. For example, an editor can search the “Notes” field of the database for the word “flower” and all unique text strings containing “flower” will be shown. The editor can then select all records that are suitable and apply the same phenological score to all selected records at once. This tool greatly facilitates adding trait attributes to records and could be expanded for any number of other traits.
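The search, select, and apply steps just described can be sketched as follows. This is an illustrative approximation and not the Symbiota implementation; the record structure, field names, and score labels are assumptions made for the example.

```python
from collections import defaultdict


def mine_field(records, field, keyword):
    """Group record ids by each unique text string in `field` that contains `keyword`."""
    matches = defaultdict(list)
    for rec in records:
        text = (rec.get(field) or "").strip()
        if keyword.lower() in text.lower():   # case-insensitive substring search
            matches[text].append(rec["id"])
    return dict(matches)


def apply_score(records, selected_strings, field, score):
    """Add one phenological score to every record whose text the editor selected."""
    selected = set(selected_strings)
    updated = 0
    for rec in records:
        if (rec.get(field) or "").strip() in selected:
            rec.setdefault("phenology", set()).add(score)
            updated += 1
    return updated


# Hypothetical specimen records.
records = [
    {"id": 1, "notes": "In full flower"},
    {"id": 2, "notes": "Flowers white"},
    {"id": 3, "notes": "No flowers seen; in fruit"},
    {"id": 4, "notes": "Sterile"},
    {"id": 5, "notes": "In full flower"},
]

found = mine_field(records, "notes", "flower")
# The editor reviews the unique strings and selects only the suitable ones.
n_updated = apply_score(
    records, ["In full flower", "Flowers white"], "notes", "open flowers present"
)
```

The third record shows why the editor's selection step matters: its notes contain the word “flower” but describe a specimen without flowers, so a keyword match alone would assign the wrong score.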
Some phenological scores will be produced via crowdsourcing using the citizen science platform Notes from Nature (notesfromnature.org; see below). These scores will represent the consensus scores offered by at least three independent volunteer scorers, requiring expert review only when results are ambiguous. Finally, there are many opportunities to explore automated phenological scoring of herbarium specimen images. Investigators at several collaborating institutions, as well as the CAP TCN project manager, are exploring the use of machine learning (e.g., neural networks) to automatically score the phenology of individual specimens, a workflow that has shown much promise (Lorieul et al. 2019). All phenological scores will be associated with their specimen records and therefore be accessible through the CCH2 portal.
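One simple way to derive a consensus from volunteer scores is a majority rule, sketched below. The rule is an assumption made for illustration: the text specifies only that at least three independent scorers contribute and that ambiguous results go to expert review, so the strict-majority threshold and the function name are hypothetical and do not describe how Notes from Nature reconciles responses.

```python
from collections import Counter


def consensus_score(votes, min_scorers=3):
    """Return (score, needs_review) for one specimen's volunteer scores.

    A score is accepted only if at least `min_scorers` volunteers responded
    and a strict majority of them agree; otherwise the specimen is flagged
    for expert review.
    """
    if len(votes) < min_scorers:
        return None, True
    top, count = Counter(votes).most_common(1)[0]
    if count * 2 > len(votes):   # strict majority, using integers only
        return top, False
    return None, True


# Hypothetical volunteer responses for three specimens.
clear = consensus_score(
    ["open flowers present", "open flowers present", "open flowers absent"]
)
split = consensus_score(["flowers", "fruit", "neither"])
too_few = consensus_score(["flowers", "flowers"])
```

Under this rule, two of three agreeing volunteers yield an accepted score, while a three-way split or too few responses sends the specimen to an expert.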
Specimen digitization entails a number of activities, such as pre-curation, barcoding, photographing, and image processing, that require a great deal of human participation. Each institution relies on the dedication of faculty, staff, paid and unpaid students (some receiving research credit for their work), and community volunteers to conduct digitization. Many institutions are building partnerships with naturalist and environmentalist groups, such as local chapters of the California Native Plant Society, to accelerate the rate of digitization and engage the broader community. For instance, several institutions are crowdsourcing label capture using Notes from Nature, an online platform that engages citizen scientists to transcribe label data from images into designated text fields (notesfromnature.org). The CAP TCN will also collaborate with Notes from Nature to create phenology-themed online “expeditions” in which volunteers use specimen images to score specimens’ phenological statuses.
All interested groups and individuals are invited to contribute via online crowdsourcing opportunities and to participate in on-site citizen science events (e.g., the Worldwide Engagement for Digitizing Biodiversity Collections event; wedigbio.org) as they become available. Interested parties also have the opportunity to participate in one of several phenology workshops that will be conducted at UC Santa Barbara, Santa Barbara Botanic Garden, Rancho Santa Ana Botanic Garden, and/or UC Berkeley in 2020–2022. These workshops will introduce participants to the importance of phenological observations, the resources available in the CCH2 portal, and native pollinators and native plants, and they will encourage the continuation of current phenological monitoring programs. More information about each of these ways to get involved (Notes from Nature expeditions, on-site digitization events, and phenology workshops) will be posted on the project website (capturingcaliforniasflowers.org).
The CAP TCN further invites all herbaria that are either located in California or can share specimen data representing the California Floristic Province to collaborate with the CAP TCN. Posting and managing data on the CCH2 data portal is an efficient and effective way to improve data quality, share specimen data, and enhance institution visibility, and all are welcome to engage in the CAP TCN project.
Capturing images of herbarium specimens is a critical task for mobilizing herbarium specimen data for research and education. Specimen images can provide a wealth of data that characterize the morphology, pathology, and phenology of the plants they represent. The California Phenology Thematic Collections Network will add nearly one million images from the United States' most diverse floristic province to the public sphere. Accompanying these images will be full label data, georeferenced coordinates, and the phenological statuses of the specimens. These data will rapidly enable and inspire research on California's changing flora through integration with the California Phenology Project, the Plant Phenology Ontology, and data generated through other digitization projects. We invite all interested individuals, groups, and herbarium collections to contribute to and benefit from this growing resource.
The authors thank the phenological standards advising committee for their consultation regarding data standardization: Kjell Bolmgren, Katharine Gerst, Ed Gilbert (coauthor), James Macklin, Liz Matthews, Gil Nelson, Patrick Sweeney, Ramona Walls, and John Wieczorek. We also thank all the volunteers, staff, and students who have contributed and continue to work toward the goals of this project.
This project was made possible by National Science Foundation Awards (herbarium acronyms according to Index Herbariorum; Thiers 2019): 1802301 (OBI), 1802163 (UC/JEPS), 1802181 (UCSB), 1802182 (CHSC), 1802191 (CSLA), 1802203 (DAV), 1802178 (FSC), 1802185 (HSC), 1802312 (IRVC), 1802199 (LA), 1802192 (LOB), 1802200 (MACF), 1802183 (RSA), 1802176 (SBBG), 1802186 (SD), 1802180 (SDSU), 1802194 (SFV), 1802177 (SJSU), and 1802188 (UCR). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.