In 2017, a minimum of 8.5 million mollusk lots representing some 100 million specimens were held by 86 natural history collections in the U.S. (81) and Canada (5). Of these, 6.2 million lots representing 70 million specimens were cataloged (73%), another 2.3 million lots were considered quality backlog awaiting cataloguing, and 4.5 million lots (53% of the total) had undergone some form of data digitization. About 1.1 million (25%) of the digitized lots have been georeferenced, albeit with different approaches to accuracy and uncertainty. Fewer than 25% of collections, mainly larger ones, claim to be fully Darwin Core compliant. There are 35,000 primary type lots and 66,000 secondary type lots, representing 1.6% of cataloged lots. About 87% of lots are dry and 13% are fluid preserved, with less than 0.3% frozen. The majority of lots are gastropods (71%) and bivalves (26%). By habitat, 54% of lots are marine, 26% terrestrial, 19% freshwater, and 1% brackish. About 43% of marine and 57% of non-marine holdings are from North America including the Caribbean.
Solem (1975), in a previous survey of U.S. and Canadian malacological collections, reported 3.74 million lots of which 775,000 (21%) were uncataloged backlog, and suggested that backlog was growing at a faster rate than specimens were being cataloged. Since then the overall size of mollusk collections has grown by 227% and cataloged lots by 208%, but quality backlog has grown by 300%, confirming Solem's extrapolation. Solem noted that the eight largest collections held 78% of the lots, but in 2017 the eight largest (now with a slightly different composition) held only 63.5% of the lots, reflecting substantial growth of small and mid-sized collections, and the larger number of institutions that we surveyed. Solem reported a substantial gap between large collections (≥160,000 lots; AMNH, ANSP, BPBM, DMNH, FMNH, LACM, MCZ, UF, UMMZ, USNM) and mid-sized ones (35,000-75,000 lots; ChM, FWRI, Hefner, HMNS, SDNH, NCSM, SIOBIC, UCM, UWBM, YPM), but seven collections now fall in the range of 76,000 to 160,000 (CM, BMSM, CASIZ, CMNML, INHS, OSUM, and SBMNH), and two have jumped to the large category (UF and DMNH).
Often overlooked is Solem's conclusion that mollusk collections in the United States and Canada are second only to insect collections for number of specimens, which is still true. Because there are far fewer species of mollusks than insects, mollusks have more specimens per species, averaging 1,100 in our survey, almost ten times what Solem reported for insects and approaching what he reported for fish. Bivalvia may have as many as 2,400 specimens/species, which makes them among the best-sampled classes of metazoans. The high number of specimens/species among mollusk and fish collections makes them well-suited for environmental studies that track faunal change over time and space.
Mollusks represent the second largest phylum in the animal kingdom, one that contains extraordinary ecological diversity, spanning terrestrial, freshwater, and marine environments, and has a fossil record dating back to the Cambrian. Formally and permanently accessioned mollusks in institutional collections constitute a rich library of morphological and genetic diversity and provide baseline data of the group's distribution in time and space. As such, they contribute to an enormous range of research fields, from evolutionary history of life forms, to the occurrence and abundance and management needs of species, shifting of distribution ranges (including fisheries and pest species), and changing attributes (e.g. body size) over time. High quality molluscan specimen data contained in natural history collections provide a foundation for environmental monitoring of all human-impacted habitats.
The ecological and economic importance of North American specimen collections can only be fully assessed and harnessed if the data are accessible in meaningful and comparable ways. Traditionally, taxon-specific publications have reported on the scope of individual museum collections or type material (e.g. Bieler and Bradford 1991) and that practice continues today across taxa (e.g. Ciubuc 2017). Individual publications are an important way to annotate collections and holdings, but inefficient for providing wide access to collections' information. Ariño (2010) estimated 3% of the possible 2.1 billion natural history collection lots of all taxa were available through GBIF. GBIF (accessed in June 2018) lists 156,000,000 specimen records (11 million of which are mollusks), so the figure has grown to perhaps 8%, but clearly there is still far to go in digitizing collections. Having a realistic sense of the scope of collections is essential to planning for efficient digitization and for data and specimen management, and to promoting data and specimen usage. Digitizing metadata about collections may be a global first step (Berendsohn and Seltmann 2010, Scoble 2010, Schindel et al. 2016), but local collections must publish their holdings as quickly as time and resources allow.
The curators and collections managers responsible for mollusk collections have made several significant attempts to understand and document the size and scope of their holdings. In response to the Association of Systematics Collections' National Plan (Irwin et al. 1973), Field Museum curator Alan Solem published a seminal work on the state of the U.S. and Canadian mollusk collections (Solem 1975). He surveyed 125 institutions and 100 private collectors and provided a synopsis of the data from 45 mollusk collections and 50 collectors from the U.S. and Canada. Nineteen mollusk collections with fewer than 5,000 lots were excluded from his analysis. He calculated that 78% of all molluscan holdings were contained within eight institutions and argued that supporting these collections would have the maximum benefit to molluscan research. Since then, additional compilations of institutions with type specimens (Kabat and Boss 1992, 1997) or important holdings (Sturm 2006) have been published, and an extensive list of worldwide mollusk collections with contact information and summary collections data has been maintained online by Cummings et al. (last updated 2009). An additional comprehensive resource was provided by Coan and Kabat (2018), who compiled biographical and bibliographical publications for more than 10,000 malacologists and other individuals with an interest in and relevant contributions to mollusks' natural history and distribution. However, there has not been another comprehensive survey of mollusk collection holdings and their scope in over 40 years.
Collection management of natural history collections has changed fundamentally over the past decades, incorporating advances in archival storage materials and techniques, digitization of text data and images, and global gathering and sharing of specimen and metadata information via the Internet. The rapid development and adoption of such approaches in mollusk collections is demonstrated in the “Standards for Malacological Collections,” developed and published by Solem et al. (1981) for the North American Council of Systematic Malacologists. Focus therein was on the physical well-being of the collections (proper storage of material for future morphological study) and the local availability of specimen and collecting event data.
Online accessibility of specimen records now allows harvesting locality data that can be used in a growing and ever-developing range of research fields, such as biogeography, species range shifts, niche modeling, environmental monitoring, and conservation research, as well as documenting spatial, temporal, and taxonomic collecting gaps. Such data mining is greatly enhanced by data aggregators (e.g. GBIF) and unified collection portals, such as iDigBio (idigbio.org), InvertEBase (invertebase.org), and SCAN (scan-bugs.org). Increasing data quality (e.g. through improved georeferencing), data scope (e.g. by adding 2D and 3D images), and specimen attributes (e.g. documenting host-parasite associations) forms the foundation of a new range of specimen-based research activities (e.g. see Digital Data in Biodiversity Research Conference series, organized by iDigBio).
Here, we report on the results of a new survey of United States and Canadian mollusk collections that was conceived and initiated prior to the Molluscan Digitization Workshop at the 2017 American Malacological Society meeting (Shea et al. 2018, this volume). This survey revisited some of the same questions that Solem (1975) addressed and investigated new issues including georeferencing and moving collections data onto the web. Importantly, this survey also focused on finding and including smaller, lesser-known, and “hidden” collections to get a more complete understanding of the scope of molluscan holdings in the United States and Canada (documented in Appendix 2). The institutions surveyed are listed in Table 1. The results provide new insights into the complex landscape of natural history holdings and will help prioritize and maximize limited resources to improve the care of, access to, and research use of mollusk collections.
BACKGROUND ON MOLLUSK COLLECTIONS
The structure and nature of molluscan (malacological) collections reflect the specific physical attributes of the phylum Mollusca, the species-richness (Table 2) and unique characteristics of each included group, their collection-forming history, the advancement of preservation techniques (Appendix 4), and the ever-increasing research use and research techniques applied to these collections. Perhaps more so than most other groups of organisms in collections, mollusk collections have a history of contributions by amateur collectors. In addition to major collecting efforts by researchers and government agencies, Solem (1975:223) estimated that 85% of the mollusks in major institutional collections were collected by amateurs. These specimens often were (and are) of very high quality and with good locality data but may be biased towards large and attractive shells. In addition, such material from private collections consists predominantly of dry shells, without tissues suitable for anatomical and molecular study. An outstanding example of a private collection absorbed by a U.S. museum is Leslie Hubricht's collection of about 500,000 specimens in 43,000 lots of eastern U.S. land snails that forms the backbone of FMNH’s North American land snail collection (Solem 1986, Gerber 2010). Molluscan collections cover a wide range of specimen sizes, from microscopic snails to giant squid, and preservation types including dry shells, fluid preserved bodies, fossil material, and other derivative materials, e.g. dissected specimens and histological preparations on microscope slides. With increasing focus toward modern research applications, the diversity of preservation techniques and concomitant storage needs have evolved since the 1970s to include cryogenic facilities and electron microscopy mounts.
Surveyed mollusk collections. List of U.S. and Canadian mollusk collections, in alphabetical order of museum or collection identifier. All contacted collections are listed. Museum identifiers are those the institutions currently prefer and may differ from acronyms or identifiers used in other listings. Column 1975: collections surveyed by Solem (1975). Column 1996/2009: collections included in Cummings et al. 2009 (latest partial updates are from 2009). Museum identifiers used by Solem and Cummings et al. 2009, if different from this list, are given in the respective columns. Column 2017: the current survey. Notations: ENA = data limited to eastern North America; LD = limited data provided (these collections are included in subsequent tables and appendices only when sufficient data are available); NC = no mollusk collection present; [F] = fossil holdings; [R] = Recent holdings in largely paleontological mollusk collections.
The dry shell collections form the largest and oldest component of mollusk collections; they consist predominantly of gastropods and bivalves but also include scaphopods, polyplacophorans, and the occasional shelled cephalopod. Most collections hold predominantly dry material (Appendix 4), which is arranged in systematic order, according to one or more higher-level taxon treatments (e.g. WoRMS and MolluscaBase). Within each family, organization generally is alphabetical or geographic, but this may vary by size or local needs and interest.
Material of shell-less or largely soft-bodied groups (e.g. cephalopods, aplacophorans, nudibranchs, terrestrial slugs) is usually fluid preserved and often fixed in formalin. Some specialized collections (e.g. ARC, UNM(MSB) [Parasites], SIO-PIC) are essentially entirely wet-preserved. The final storage solution usually is 70-80% ethanol. Fluid-preserved specimens are often stored in numerical order to save space but can be stored in systematic order. Various protocols have been followed in tissue fixation (Roper and Sweeney 1983, see papers cited in Sturm et al. 2006). Material intended for anatomical study, especially of marine mollusks, often underwent fixation in buffered formalin or Bouin's solution (especially for histological investigations) before transfer to alcohol. Specialized histological techniques introduced additional fixatives (Howard et al. 2004). A formal record of such fixation becomes an important part of specimen metadata as it will influence the tissue selection for successful anatomical and molecular approaches. With the advent of molecular component extraction and analyses, preservation of soft tissue increased significantly, with storage in high-percentage ethanol without prior chemical fixation, in nucleic acid preservation buffer fluids, or direct freezing in ultracold freezers or liquid nitrogen. All glass- and plasticware as well as labels need to be of archival quality, selected to handle chemical and/or low-temperature exposure to ensure long-lasting preservation of the material.
Primary types (holotypes, syntypes, lectotypes, neotypes) are concentrated in the larger, older collections, especially USNM, ANSP, and MCZ (documented in Appendix 3). These collections house material from the early phase of documenting North American molluscan diversity, going back to authors such as William Dall (1845-1927) and Henry Pilsbry (1862-1957). Large numbers of paratypes exist in other collections (e.g. DMNH) where acquiring specimens has been emphasized over describing new species. Accumulation of secondary types may also result from more recent international collecting practices whereby primary types are deposited in the host country and secondary types deposited in additional museums (e.g. Solem's extensive land snail work in Australasia, with many secondary types deposited at FMNH). Type collections are often housed separately from the main collection (e.g. USNM), but can be integrated with it in systematic order (e.g. ANSP).
Fossil and Recent mollusks are traditionally separated into different organizational units (e.g. invertebrate paleontology vs. zoology) within collection-holding institutions. Pleistocene and subfossil Holocene (e.g. loess) material, particularly of species also known from the Recent, is often included as part of the “Recent” collection unit.
A multitude of additional preparation types exist in mollusk collections. Preserved egg masses and radula slides have a long history in the field, while scanning electron microscope mounts and frozen tissue samples are relatively new. Some institutions preserve associated parasites (e.g. UNM(MSB)), and others preserve hosts parasitized by mollusks (e.g. ANSP). Extensive holdings of field photographs of living animals (e.g. of deep-sea cephalopods, DMNH), digital specimen photography, x-ray, and CT scanning have added new layers of virtual collections and will likely grow in the future.
Average number of lots per molluscan species by class. The number of lots held for each class is divided by the number of species in that class as a reflection of taxonomic coverage across the surveyed collections. Data on number of marine species are from MolluscaBase; data on non-marine species are from Rosenberg (2014), updated with recently described species from MolluscaBase.
Recent decades have seen substantive improvements in standards for the archival care necessary to assure the long-term integrity of calcareous shells and associated soft bodies and tissues. Molluscan shells are known to be susceptible to so-called Bynesian decay, an efflorescence triggered by acid vapors from wood or paper materials that can destroy shell surfaces (e.g. Tennent and Baird 1985). Acidic wood material, specimen boxes, label paper, and organic cotton should be replaced by acid-free archival-quality materials (or nonarchival materials should be isolated from direct specimen contact). Most North American collections are in various stages of this shift (e.g. from wooden to metal drawers and cabinets) as staffing and funding allow, and many collection staff members indicated a pressing need for such re-curation in the survey questionnaires and at the 2017 Mollusk Digitization Workshop.
Mollusk collections today are moving rapidly from a hand-written “ledger and label” system to being digitized in a variety of data management systems. The term “digitization” as used here encompasses any specimen data capture in digital form regardless of software platform: from word processor and spreadsheet flat files to relational databases. Most mollusk collections started their transition by entering ledger and label information (specimen identification and locality data) into a spreadsheet or database. Collections that began digitizing in the 1980s (e.g. DMNH) may have a digital record for most of their collection, but the amount and scope of data captured varied over time. Thus these records may now be considered incomplete (skeleton data) or are non-normalized due to file-size or field-length constraints of early databases. Although labor-intensive, digitization makes day-to-day operations such as loan transactions, printing labels, and updating taxonomy more efficient, and broadens the availability of specimen data for research use. A few collections (e.g. DMF) digitize accessions, be they lots or specimens, rather than cataloguing lots. The type of captured data in such accessions may be rather variable and not comparable to digitization of cataloged lots. Recently, significant attention has been paid to mining online specimen data for occurrence records and traits, opening up a new field of data analysis based on the online accessibility of specimen data (Beaman and Cellinese 2012, Ball-Damerow et al. 2014).
The development of the Internet provided an opportunity for serving data online and exposing individual databases to much larger potential userbases through institutional websites and data aggregators such as iDigBio and GBIF. Adding images to specimen records represented another milestone, as did the inclusion of mappable geodetic coordinates, known as georeferencing. Over time, georeferencing has evolved from adding rough map coordinates to detailed point data aided by GPS units in the field and modern online tools such as Google Maps, GEOLocate and other specialized georeferencing software.
A survey questionnaire (see Appendix 1) was distributed (by RB) in March 2017 as a Microsoft Excel worksheet to 80 known or expected institutional collections of extant mollusks in the U.S. and Canada. We targeted collections listed by Solem (1975), those identified in Kabat and Boss (1992, 1997) as having molluscan type holdings, and those in the online listing by Cummings et al. (2009). The survey focused on extant holdings of formal institutional collections; exclusively paleontological collections or private collections were not included. After the initial survey, some additional collections came to our attention and were sent the survey individually. For some of these mostly smaller collections, data were collected and added as late as June 2018. In total, 60 (70%) of the 88 collections contacted provided full or partial data (Table 1); 27 collections provided limited data, and one collection (Marshall University) has been closed (V. Fet in lit.).
Intensive efforts were made to obtain comparable data for collection sizes measured in cataloged lots, digitized lots, and quality backlog. We often sent follow-up questions to individual respondents to clarify ambiguities, such as reporting specimen numbers rather than lot numbers or accession numbers rather than cataloged lots; including backlog in estimates of proportion of holdings by taxonomy, geography, habitat or preservation; or providing such proportions only for digitized parts of the collection rather than for all cataloged lots. Some data in the Tables and Appendices are supplemented from other sources, especially from Cummings et al. (2009), and institutional websites. Nonetheless, some inconsistencies remain across the figures reported in the tables and are flagged with superscripts in Appendices 3-10.
Although we made every effort to find and include all known mollusk collections in the United States and Canada, we have undoubtedly missed some collections, misinterpreted some freestyle responses, and were not able to obtain detailed records from some collections in time for this analysis. We hope that we will be able to include these additional, under-documented, or undiscovered collections in future treatments.
The collections surveyed were in various stages of curation and digitization (see definition of digitization above), and respondents often replied with educated guesses and estimates, frequently in narrative form (e.g. “at least 20% of our holdings are marine”, see Appendices 3-10). To standardize the data across collections, some interpretation and recalculation were necessary to turn narrative responses and estimates into comparable numbers. While necessary, this approach may have under- or overestimated collections' holdings and there are surely errors in the tables. We maintained the data in Microsoft Excel, with two of us (PS and GR) independently compiling data in the tables. Totals and other statistics were calculated in versions of the tables that had non-numerical characters stripped out, which means that indications such as greater than or less than signs were omitted and assumed to average out across collections. Where a range was given, we calculated based on the average of that range. We have done our best to be conservative in our estimates and minimize errors; however, we apologize for any misinterpretations of institutional data that we might have introduced and will be happy to update our dataset in response to comments. Given the large number of institutions surveyed, such errors should average out, so our overall conclusions should be reliable. The presence of backlog material and the difference between counting individuals, specimens, and lots may further affect the size estimates.
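The two normalization rules stated above (qualifiers such as greater-than signs are dropped, and a range is replaced by its average) can be illustrated with a short Python sketch. This is only an illustration under stated assumptions, not the procedure that was carried out in Excel: the function name `parse_count` and the sample responses are hypothetical, and wording such as "1.5 million" is deliberately not handled.

```python
import re


def parse_count(response: str) -> float:
    """Convert a free-text count such as '>50', '100+' or '4,000-6,000' to a number.

    Hypothetical helper: qualifiers (>, <, +, ~) are ignored, and a range
    is replaced by the mean of its two endpoints.
    """
    # Remove thousands separators so that '4,000' is read as 4000.
    cleaned = response.replace(",", "")
    # Collect every run of digits, with an optional decimal part.
    numbers = [float(n) for n in re.findall(r"\d+(?:\.\d+)?", cleaned)]
    if not numbers:
        raise ValueError(f"no numeric content in {response!r}")
    if len(numbers) == 1:
        return numbers[0]
    # Two or more numbers: treat the first and last as the ends of a range.
    return (numbers[0] + numbers[-1]) / 2


# Hypothetical survey responses, not taken from the appendices.
responses = [">50", "100+", "4,000-6,000"]
total = sum(parse_count(r) for r in responses)
print(total)  # 5150.0
```

Because ">50" contributes exactly 50 and "100+" exactly 100, totals computed this way are lower bounds wherever such qualifiers dominate.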
Specimens that have not been formally evaluated or added to a collection are often referred to as backlog. Depending on individual collection management practices, this might include anything from a small quantity of research specimens of a recognized authority, to an orphaned institutional collection in need of specialist taxonomic review, to containers of mixed shells collected during a sampling expedition. These scenarios are separated by the degree to which they have been physically and academically curated.
We adopt the term “quality backlog” to describe specimen lots that have good locality data and confident identifications that are ready for digitization with minimal physical curation necessary.
We adopt the term “deep backlog” to describe materials that have locality data but are either not sorted into lots or are not (or poorly) identified, and require considerable physical curation prior to digitization.
The numbers reported here are for quality backlog; however, the deep backlog in some collections approaches or even greatly exceeds the number of previously cataloged lots in the respective collection (e.g. CASIZ, FWRI, SBMNH, UNSM, YPM; see Appendix 3).
Individuals, specimens, lots, and records: In principle, it is desirable to count the total number of individuals in a collection; however, the variety of preparation types employed to preserve soft bodies and hard shells makes this goal difficult to achieve. Mollusk collections manage four different unit concepts: an individual, a specimen, a lot, and a record.
An individual. An individual is a single, whole organism. In mollusk collections, the individual may be represented by a single body (e.g. cephalopods, aplacophorans); a single shell with or without a body (gastropods, scaphopods); by two or more articulated shell valves, with or without a body (bivalves, chitons); or by disarticulated shell valves (bivalves, chitons). Some collections count valves rather than individuals, even for live-collected specimens. When soft bodies are removed and preserved in ethanol separate from the dry shell, the individuals may be counted twice in some collections.
A specimen. A specimen is not equivalent to an individual because derived objects (e.g. microscope slides of a radula, frozen tissue samples) may also be counted as specimens. The individual that originally contained the derived object may be preserved separately and may even reside at a different institution. The number of specimens therefore is an estimate of the objects managed in a collection, not the number of individual organisms preserved.
A lot. The commonly accepted definition of a lot is a group of individuals (n=1 to many) of the same species that were collected during a single collecting event (same locality, same date), but it is context dependent. If a lot is split and a part is sent to another institution, it then becomes two lots. Individuals of one species from a particular collecting event are generally counted as different lots if they have different preservation (dry versus alcohol), but those lots often have the same catalogue number and are tracked as a single database record. Specimens might also be cataloged as individuals if it is necessary to track information at the individual level, for example, a holotype split from paratypes, or an individual from which a DNA sequence is available. The number of lots therefore is an estimate of the number of samples (containing one or more specimens) managed by a collection.
A record. A record in a database generally corresponds to a line in a hand-written ledger of a collection, and some collection databases originated in part from digitizing such ledgers. Like a lot, a record refers to a group of individuals of the same species (or taxon if sorting is incomplete) that came from a single collecting event. Samples with different preservation (alcohol or dry) or derived objects (slides or SEM stubs) may or may not be managed through a single database record, depending on institutional convention. The record is the usual level from which point specimen occurrences are mapped.
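The four unit concepts defined above can be made concrete with a small data model. The Python sketch below is one possible reading of those definitions, not a schema used by any of the surveyed collections; all class names, field names, the catalogue number, and the locality are hypothetical. It assumes the convention in which dry and alcohol lots from one collecting event share a single record, and in which a derived object counts as a specimen but not as an individual.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Specimen:
    """One managed object: a whole individual or a derived object (e.g. a slide)."""
    preparation: str      # e.g. "dry shell", "ethanol", "radula slide"
    individuals: int = 1  # whole organisms represented; 0 for derived objects


@dataclass
class Lot:
    """Specimens of one taxon from one collecting event, in one preservation type."""
    taxon: str
    locality: str
    date: str
    preservation: str
    specimens: List[Specimen] = field(default_factory=list)


@dataclass
class Record:
    """A catalogue entry; here it may cover several lots (an assumed convention)."""
    catalogue_number: str
    lots: List[Lot] = field(default_factory=list)

    def specimen_count(self) -> int:
        # Every managed object counts, including derived objects.
        return sum(len(lot.specimens) for lot in self.lots)

    def individual_count(self) -> int:
        # Only whole organisms count; derived objects contribute 0.
        return sum(s.individuals for lot in self.lots for s in lot.specimens)


# Hypothetical example: three dry shells plus one ethanol-preserved body
# and a radula slide derived from that body, all under one catalogue number.
dry = Lot("Anguispira alternata", "hypothetical locality", "1975-06-01", "dry",
          [Specimen("dry shell") for _ in range(3)])
wet = Lot("Anguispira alternata", "hypothetical locality", "1975-06-01", "ethanol",
          [Specimen("ethanol"), Specimen("radula slide", individuals=0)])
record = Record("EX-0001", [dry, wet])

print(len(record.lots), record.specimen_count(), record.individual_count())  # 2 5 4
```

The same material thus yields one record, two lots, five specimens, and four individuals, which is why the unit being counted has to be stated when collection sizes are compared.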
Specimen counting and collections size estimates
Institutional estimates of the number of individuals from the number of counted (or estimated) lots vary widely due to the nature of the specimens in a given collection (e.g. large-bodied marine species usually contain fewer specimens per lot than microscopic land snails), but also due to individual collection conventions for using multipliers to estimate individuals. For example, to streamline cataloguing, large lots with many specimens may be recorded in ledgers or on labels as “many”, “> 100” or even “∞”. Counts of individuals are frequently estimated assuming that each lot contains an average of 4–20 specimens depending on the collection.
Finally, collection-specific workflows may affect how easy it is to count specimens.
Accessioning is the formal process of transferring ownership of an object(s) to a museum for inclusion in a permanent, managed collection, with associated legal and ethical obligations to care for those objects (Simmons 2006). Traditionally in natural history collections, a single accession (or acquisition) number is given to an entire incoming collection (e.g. the Smith Collection of land snails) regardless of whether it contains one object or many. In some institutions or collections, an object is not considered accessioned until it is cataloged.
Furthermore, if a collection uses consecutive catalogue numbers, the size of the collection (in lots) is roughly the same as the latest number assigned. However, this number is often impacted by historic breaks or duplications in the catalogue numbering sequence, deaccessioning of material, the inclusion of non-molluscan taxa (e.g. brachiopods) or fossil taxa in the same numbering sequence, or by different practices of assigning single or multiple numbers to sublots (e.g. those stored in different media or sorted into age classes).
Size of collections
The mollusk collections of the United States and Canada are diverse in size and specialization (Tables 1, 4; Appendices 3, 5, and 6). They range from the Smithsonian's National Museum of Natural History, which is the largest general molluscan collection in the world with more than one million cataloged lots, to small collections with regional or topical holdings. It should be noted that molluscan collection size is often, but not necessarily, a reflection of overall institutional size, and holdings included here are, for instance, smaller extant molluscan holdings in a much larger predominantly fossil-oriented collection (e.g. PRI).
The current listing of these collections encompasses 86 institutions (Table 1). Of these, 30 were included in Solem's (1975) survey. We identified five size-categories of collections, four of which are directly comparable to Solem (1975) with a new category of large collections that occupies a space between Solem's large and medium groups (Table 3, Appendix 3; Figure 3). Solem categorized institutional collections in the following size classes: large (>160,000 lots), middle-sized (30,000 – 75,000 lots), and small (9,000 – 29,000 lots).
Cataloged lots totaled 6.2 million across the collections, with 2.3 million backlog lots for a total of 8.5 million lots in U.S. and Canadian mollusk collections (Appendix 3). Some institutions also provided estimates of the number of specimens, allowing calculation of the number of specimens per lot, which ranged from 1 to 35. The weighted average is 10.5 and the straight average is 8.3 specimens/lot (excluding three institutions with only one per lot because of their specialty). Using 10 specimens/lot as a reasonable average, we estimated the number of specimens when institutions did not provide this number (marked with a double asterisk in Appendix 3; this was also used in a few cases to indicate the reverse calculation where an institution provided specimen but not lot numbers). For USNM, we used 16.2 specimens/lot based on Solem (1975), since using 10/lot would have resulted in estimating fewer specimens than reported more than 40 years ago. The total estimate of the number of mollusk specimens in the collections surveyed here is 70,500,000, of which 4,590,000 resulted from estimation (10/lot). If the straight average rather than the weighted average were used, the total would be reduced by 770,000, which still yields 70 million as an estimate for the number of cataloged specimens.
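The difference between the straight and the weighted average, and the way a fixed multiplier fills in missing specimen counts, can be sketched as follows. The institution figures in the example are hypothetical and chosen only to show why the two averages diverge when large collections have more specimens per lot; the function names are likewise illustrative. The only value taken from the text is that 4,590,000 specimens estimated at 10/lot corresponds to 459,000 lots.

```python
def specimens_per_lot_averages(collections):
    """Return (straight, weighted) averages of specimens per lot.

    collections: list of (lots, specimens) pairs for institutions
    that reported both numbers.
    """
    ratios = [specimens / lots for lots, specimens in collections]
    # Straight average: every institution has equal influence.
    straight = sum(ratios) / len(ratios)
    # Weighted average: total specimens over total lots, so large
    # collections dominate.
    weighted = (sum(specimens for _, specimens in collections)
                / sum(lots for lots, _ in collections))
    return straight, weighted


def estimate_specimens(lots, reported=None, per_lot=10):
    """Use the reported specimen count if available, else lots * per_lot."""
    return reported if reported is not None else lots * per_lot


# Hypothetical institutions: (lots, specimens).
sample = [(1_000, 5_000), (10_000, 120_000), (500, 4_000)]
straight, weighted = specimens_per_lot_averages(sample)
print(round(straight, 1), round(weighted, 1))  # 8.3 11.2

# 459,000 lots without specimen counts, estimated at 10 specimens/lot.
print(estimate_specimens(459_000))  # 4590000
```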
A backlog of 2.3 million lots implies 23 million backlog specimens, or a total of 93 million specimens, but that estimate is surely low. Some institutions did not provide estimates of the size of their backlog, and we asked for estimates of quality backlog, i.e. material ready to catalogue, not deep backlog, which would include unsorted and unidentified material. Also, the estimate of the number of cataloged specimens is probably low. Specimen counts for large lots in mollusk collections are often estimates, e.g. “>50” or “100+”. Only the numeric portion can be summed, resulting in an underestimate of number of specimens. We therefore regard 100 million specimens in mollusk collections in the U.S. and Canada as a minimum estimate.
Our survey showed about 35,000 primary type lots (holotypes, lectotypes, syntypes, neotypes) among our surveyed collections, and 66,000 secondary type lots (paratypes, paralectotypes) (Appendix 3). The number of primary type lots is likely to decline: there can be multiple lots of syntypes for a given name, but a lectotype designation renders all but one specimen paralectotypes. The ten largest collections in terms of cataloged lots hold 84% of the Recent type material (USNM, ANSP, LACM, UF, FMNH, MCZ, AMNH, BPBM, UMMZ, DMNH).
Across institutions, there were 4,677,000 cataloged dry lots and about 742,000 wet lots (Appendix 4), the total of which (5,420,000) is 771,000 lots less than the total cataloged lots reported. This difference is partly because some institutions, generally those with fewer than 40,000 lots, did not report dry versus wet lots, and partly because some institutions reported numbers only for digitized material, rather than for the whole collection. Assuming that institutions that did not report wet holdings have essentially entirely dry collections, 194,000 should be added to the total for dry lots, which means that about 13% of lots are fluid preserved. Backlog was more than 90% dry preserved, 1,130,000 versus 106,000 lots (Appendix 4), but only 53% of the total backlog of 2.3 million is accounted for in this figure.
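The preservation percentages follow directly from the rounded figures quoted in this paragraph, as the following Python sketch shows. The variable names are ours, and small differences from the published percentages can arise because the paper works from unrounded appendix totals.

```python
# Rounded figures quoted in the text (Appendix 4).
dry_reported = 4_677_000
wet_reported = 742_000
dry_assumed = 194_000        # institutions assumed to be entirely dry
backlog_dry = 1_130_000
backlog_wet = 106_000

# Share of cataloged lots that are fluid preserved, after adding the
# assumed dry lots to the denominator.
wet_share = wet_reported / (dry_reported + dry_assumed + wet_reported)

# Share of the backlog with known preservation type that is dry.
backlog_dry_share = backlog_dry / (backlog_dry + backlog_wet)

print(f"fluid preserved: {wet_share:.1%}")
print(f"backlog dry:     {backlog_dry_share:.1%}")
```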
Comparison of mollusk collection sizes – Categories are a combination of the Large, Medium, Small, and Very Small sizes presented in Solem (1975) plus a new, Medium-Large category based on 2017 data. Footnote 1: In Solem (1975), the Small category ranged from 9,000 (EKY) to 27,700 (ChM), but it was expanded to 9,000–29,000 to exclude gaps between collection categories. Footnote 2: Only 8 collections are listed here because the SU collection was transferred to CAS.
Only 14 institutions reported frozen holdings (tissues and whole animals), and only one (UCMP) included frozen lots in its count of cataloged lots. Frozen lots (cataloged and backlog, not counting DNA extracts) total about 15,000 (Appendix 4) and so represent less than 0.3% of holdings. Other holdings included slides (radular and histology), SEM stubs, egg masses, hosts and parasites, DNA extracts, and images (Appendix 4).
About 71% of lots are gastropods, 26% bivalves, 1.2% cephalopods, 1% chitons, 0.4% scaphopods, and 0.1% aplacophorans (Appendix 5). Some institutions reported on only the digitized parts of their collection, whereas others included backlog material. Despite inclusion of backlog, total lots reported across classes was only 5.2 million, 1 million less than total cataloged lots. Although some institutions might have prioritized cataloguing or digitizing certain groups of mollusks, there is no particular reason to expect percentage by class to differ substantially on average between cataloged and backlog material across institutions, so we accept these percentages as representative.
Based on these percentages, we calculated number of lots per species by class (Table 2), based on data from MolluscaBase and Rosenberg (2014) on currently accepted species of mollusks. Average across the Mollusca was 79 cataloged lots per species and 108 total lots per species. Highest coverage is for bivalves, with 230 lots per species (based on total lots), next cephalopods at 128 lots per species, gastropods and chitons roughly even, with 91 and 82 lots per species, scaphopods at 65 lots per species, and aplacophorans substantially lower at 19 lots per species.
Across the surveyed collections, 54% of lots were marine, 19% freshwater, 26% terrestrial, and 1% brackish (Appendices 6–7). In Solem's (1975) survey, 50% were marine, 21% freshwater, and 29% terrestrial. These percentages are probably not significantly different. In Solem's survey, seven out of 21 institutions that provided a habitat breakdown of their collection holdings assumed an even split between two habitat types; for example, a predominantly marine collection like USNM reported 60% marine and 20% for both freshwater and terrestrial. In the 2017 survey, while the percentages may still be estimates, only three of 58 institutions made such an assumption (Appendix 7) (two others were our own assumption, PRI 50:50 freshwater and terrestrial and FWM 50:50 freshwater and marine).
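The overall averages can be cross-checked from rounded figures given in the text, as in the following Python sketch. It assumes about 78,800 accepted mollusk species (the Table 2 figure cited in the discussion) together with the rounded lot totals from the survey; the per-class values would require the per-class species counts of Table 2, which are not reproduced here.

```python
ACCEPTED_SPECIES = 78_800    # currently accepted mollusk species (Table 2)
CATALOGED_LOTS = 6_200_000   # rounded survey total
TOTAL_LOTS = 8_500_000       # cataloged lots plus quality backlog


def lots_per_species(lots: int, species: int) -> float:
    """Average number of lots available per accepted species."""
    return lots / species


cataloged_per_species = lots_per_species(CATALOGED_LOTS, ACCEPTED_SPECIES)
total_per_species = lots_per_species(TOTAL_LOTS, ACCEPTED_SPECIES)

print(round(cataloged_per_species), round(total_per_species))
```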
Despite inclusion of backlog by some institutions (* in Appendix 6), total lots reported across habitats was only 5.6 million, 600,000 less than total cataloged lots. Some institutions did not have a mechanism for determining habitat in their database, such as a taxonomic dictionary with habitats coded for family, genus, or species. Even with this capability, if a specimen lot is not identified in the database, the habitat cannot be determined. Determining brackish status in particular was problematic as some institutions do not distinguish such specimen lots from those from marine environments, and the survey did not give a definition of the difference between these habitats. Some institutional respondents reported percentages as high as 10% brackish, which may require substantiation.
Not surprisingly, North American material was the largest component of institutional molluscan holdings at 43%. Caribbean was 6% and South American was 4%. These figures combine marine and non-marine taxa (Appendices 9–10). For marine material from the Americas, about 64% was from the western Atlantic and 36% was from the eastern Pacific. As with taxon and habitat coverage, some institutions included backlog in their figures for geographic coverage, but total lots reported across regions was 1 million less than total cataloged lots. Unlike taxonomic and habitat categories, the holdings in geographic categories do not sum to 100%, since our survey did not ask about areas outside the Americas, so the percentages for geographic coverage were calculated without data that included backlog (* in Appendices 9–10).
Of the 6.2 million cataloged lots, 4.5 million (73%) have undergone some form of data digitization (which includes all forms of digitization, e.g. ledger records entered, transcribed, or imported into word processor, spreadsheet, or relational database formats). About 1.1 million (25%) of digitized records have been georeferenced, which represents 18% of all cataloged lots. Only 20 collections (<25%) claim to be fully Darwin Core compliant; however, 34 of the 66 collections with some form of digitization are searchable online (Appendix 8) through iDigBio, Arctos, or other portals, or directly through institutional websites.
Mollusk Collections in the U.S. and Canada in 2017
In the United States, collections included herein were from 37 states, the District of Columbia, and the Territory of Puerto Rico; Canada was represented by five collections in four provinces (Figure 1 – Map). Collections are concentrated along both coasts of North America, with five of the ten largest in the Boston-Washington corridor of the east coast, two in the Midwest, and one each in Florida, California, and Hawaii. States with the largest populations have multiple collections (e.g. California, Florida, Illinois, Massachusetts, New York, Ohio, and Texas), whereas large parts of the central and northwestern United States do not have any identified mollusk collections.
Mollusk collections are maintained by a wide variety of institutions and were developed for a similarly wide variety of reasons. Collections are developed, acquired, and grown due to research, educational, avocational, or monitoring activities that are part of the overall mission of the institution. The largest mollusk collections in the U.S. and in Canada are both federally funded. Their collections are geographically global in scope, although North American materials dominate (Table 3 – Collection Size Categories, Appendices 9, 10). Many of the largest mollusk collections (>160,000 lots) are “private” museums, which, in the U.S., are tax-exempt, not-for-profit, 501(c)(3) institutions (e.g. AMNH, DMNH, FMNH). The formal reports on 990 tax forms (https://irs.gov/forms-pubs/about-form-990) show a wide variety in what it means to be “private,” including public-private partnerships. Large collections are also found at university-affiliated and supported museums (e.g. ANSP at Drexel University, MCZ at Harvard University, OSUM at Ohio State University, UF at the University of Florida), many of which have a long history of mollusk-centered research. In contrast, smaller collections are often found at institutions with a regional audience or research focus. For example, the Natural History Society of Maryland in Baltimore has a land snail collection, and state wildlife agencies such as the Florida Fish and Wildlife Conservation Commission, through its Fish and Wildlife Research Institute in St. Petersburg, maintain specimen collections as a result of their mission to understand and manage local wildlife.
In addition to full-time, permanent collection staff, survey respondents included emeritus faculty, graduate students, and volunteers. The wide range of individuals who answered the survey is both a testament to the depths of dedication to the care of these important collections, and simultaneously a sign of an ongoing staffing crisis.
The number and kind of collection-related staff positions reported were highly variable (Table 4). In larger collections, there was generally one or more senior research biologists with a mollusk-specific research interest plus administrative responsibilities for the collection (e.g. strategic oversight, growth, funding) plus one or more collection professionals with day-to-day operational responsibility for managing and caring for the collection. Job titles for the research/administrator position vary from Curator to Professor to Research Biologist, and often this person has significant additional responsibilities elsewhere such as teaching and mentoring undergraduate and graduate students. In smaller collections, this senior position may be held by a volunteer. In larger collections, operations staff may be a full- or part-time Collection Manager, Collection Assistant, data entry specialist, archivist, or someone completely outside the collections community (e.g. an aquarist). Especially at some of the larger university-supported institutions, Collection Managers often have advanced degrees, some with PhDs, and are expected to participate in grant writing and collection-based research (e.g. UMMZ).
Collection operations staff may be generalized and pooled across disciplines so that a few people are responsible for multiple extant and paleontological invertebrate collections. In these scenarios, it is not uncommon for collections to be without relevant molluscan curatorial expertise, sometimes for many years (e.g. AMNH, DMNH-P, SDNH, UNSM). Our data demonstrate that collections exposed to prolonged periods without dedicated staff with expertise in the field become increasingly inaccessible and lag in technological and digital data management advances (see Table 1, where collections marked LD provided only limited data in our survey, and Appendix 3, where collections such as CMC and UAZ report little or no change in their collection data between 1975 and 2017).
Digital data in mollusk collections of the U.S. and Canada
Most North American collections nowadays have reached some level of digitization, with about 73% of total cataloged lots digitized (Appendix 8). Many collections (ANSP, ARC, CASPNNM, DMNH-P, FWM, FWRI, INHS, JFBM, LSUMG-I, MCZ, MMNHC, MMNS, OSUM, RBCM-INVZ, SMNC, SUI, UCM, UCMP, UF, VMNH) have reached complete or near-complete data entry of their cataloged collections.
Software platforms for the digitization efforts of these molluscan collections are remarkably diverse and a reflection of both the different directions and levels of institutional information technology development and of the individual initiatives and preferences of collections staff (Figures 2, 4). Eight collections reported no past or present digitization efforts. Of the collections with digitized data, many described using the dedicated commercial or open-source collections software solutions Specify (11 collections), EMu (8), Arctos (5), PastPerfect (3), Mimsy XG (2), and Proficio (1); others are based on generic current or legacy database (Access, FileMaker Pro, Paradox) and spreadsheet (Excel) applications, sometimes combining more than one software solution. Several collections currently on generic database or spreadsheet systems indicated plans to switch to either Specify or EMu in the near future. Although most of the collections use database systems, it became clear during communications about the survey questions that many collections were not in a position to run basic queries on their database holdings. While some of this reflected a lack of authority files (e.g. to link to higher taxa or geographic hierarchy), in other cases it was clearly a lack of institutional staffing support. Particular data types (e.g. habitat coverage data) are currently unavailable for eleven collections (see Appendices 5, 6). Extraction of geographic (regional) data is currently problematic for 29 collections (Appendices 9, 10).
Collection staffers are routinely asked the deceptively simple question – how big is your collection? Most collections have settled on a standard answer to this question that is both defensible and approximately correct using available tools. As digitization continues, collections will be able to report on their size and scope with increasing rigor and nuance. However, nationwide, collections are mid-way through this transition and are grappling individually with challenges of rapid technological changes, enormous workloads and backlogs, all contained within an environment of declining resources and reduction in staffing (Table 4).
Mollusk Collection Metadata – Collection management metadata about mollusk collections with >85K cataloged lots in 2017. Abbreviations: staffing c = curator, which includes the senior administrative staff responsible for research and oversight of the collection; staffing s = staff, which includes operational staff, volunteers, and students responsible for the day-to-day activities within the collection. Footnote 1: Curators, Collection Managers, and Collections Assistants are counted as 1 FTE (full-time) or 0.5 FTE (part-time or emeritus). Volunteers or casual to part-time employees are counted as 0.2 FTE per day of volunteering.
Overall, we estimate that in 2017 there were at least 100 million mollusks in 8.5 million lots held by about 90 mollusk collections across the U.S. and Canada. The Smithsonian's Natural History Museum has the single largest cataloged collection in North America with more than 1 million cataloged lots; however, there are also more than 20 mollusk collections with 1,000–8,000 cataloged mollusk lots and more than 20 with 9,000–30,000 lots (Appendix 3). Some of the smaller collections have important regional and taxonomic holdings. For example, SIO-PIC has the fourth largest cephalopod collection, and RBCM has the sixth largest chiton collection among surveyed institutions (Appendix 5). The number of specimens per lot reported in collections varies widely, from 1 to 35, depending on the focus of an institution. OGL is a collection of DNA extracts from frozen tissues wherein every extract is a specimen. UNM is a parasite collection, with individual mollusks preserved as the source of parasites, so it also has a 1:1 ratio of specimens to lots. Brazosport has a synoptic display collection, usually with only one specimen per lot. On the other end of the scale, institutions that have largely expedition and survey material tend to have larger numbers of specimens per lot.
U.S. and Canadian collections hold about 101,000 type lots (Appendix 3), but the actual number of taxon names typified by these is unknown (e.g. multiple institutions may hold syntype lots for a single nominal species). The number of names typified can eventually be determined once most collections are represented in online portals such as GBIF and iDigBio. Unrecognized type material might also be found through these portals, by searching for specimens collected, donated, or formerly owned by the authors of taxa. Images of primary type specimens collected through these general natural history portals, and taxon-specific ones such as MolluscaBase, will be an enormous resource for the scientific community, as they are the standards for authoritative identification.
Type specimens represent less than 2% of the holdings of the surveyed institutions, and so present an obvious, achievable target for digitization and imaging efforts. However, mobilizing data on a much larger scale is needed to understand changes in distribution patterns of species over time. Across the surveyed collections, 73% of cataloged material is digitized in the broadest sense (Appendix 8), but when quality backlog material is included, this drops below 53%.
Computerization or digitization of collections, then referred to as electronic data processing (EDP), was in its infancy when Solem prepared his 1975 report, which was preparatory to large scale efforts to digitize mollusk collections. Digitization of 4.5 million molluscan lots since then is an impressive achievement, but, including backlog, 4 million more already in collections remain to be digitized, in addition to new acquisitions resulting from ongoing field collecting, donations of collections from private individuals, and acceptance of orphaned collections from other institutions (e.g. universities that reduce their organismal research programs).
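The digitization shares and the remaining workload can be recomputed from the rounded survey totals, as in the following Python sketch. The variable names are ours; percentages derived from unrounded appendix data may differ slightly.

```python
DIGITIZED_LOTS = 4_500_000
CATALOGED_LOTS = 6_200_000
BACKLOG_LOTS = 2_300_000     # quality backlog only

all_lots = CATALOGED_LOTS + BACKLOG_LOTS

share_of_cataloged = DIGITIZED_LOTS / CATALOGED_LOTS  # about 73%
share_of_all = DIGITIZED_LOTS / all_lots              # just under 53%
remaining_lots = all_lots - DIGITIZED_LOTS            # still to be digitized

print(f"{share_of_cataloged:.1%} of cataloged lots digitized")
print(f"{share_of_all:.1%} of all lots digitized")
print(f"{remaining_lots:,} lots remain")
```

The remaining figure excludes deep backlog and future acquisitions, so it is a lower bound on the work outstanding.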
The greatest need for digitization in mollusk collections is probably georeferencing, which is essential for mapping and visualizing distributions of species. Only 25% of digitized mollusk lots and only 18% of cataloged lots are georeferenced (Appendix 8), which means that most institutions are not yet in a position to automate sharing of their spatial data through standard portals. Our survey asked only if coordinate data were available, not whether the georeferencing met modern standards. For example, fields for source of coordinates and error radius (Shea et al. 2018), which are currently part of Darwin Core (see http://rs.tdwg.org/dwc/terms/, Biodiversity Information Standards, TDWG), were not initially part of the Darwin Core standard (Wieczorek et al. 2012). Only 20 collections claimed that all their records were fully Darwin Core compliant, but 66 collections reported having some level of digitization. Of these, 34 have their data online through searchable databases. This suggests that more than 20 collections have databases that are compliant.
Our survey asked for the number of records captured with skeleton data (minimal or incomplete label data, e.g. only identification and country) or with fields not standardized, versus records that were Darwin Core compliant (Appendix 1, question 5). At least one institution replied that all of its records were skeleton data only, but that all were Darwin Core compliant. This is technically correct: the database is compliant, and the data that are present are compliant, but a large amount of data remains to be input or retrofitted. Clearly, the responses to this part of our survey were heterogeneous.
Solem (1975) emphasized that mollusk collections are exceeded in number of specimens only by entomological collections. Several other collection communities produced surveys similar to Solem's in the 1970s, from which he noted 120 million insects, 72 million mollusks, and 46 million plants. The U.S. National Herbarium currently reports that its 5 million specimens represent about 8% of the holdings in the United States, implying that there are about 63 million total specimens (http://botany.si.edu/colls/collections_overview.htm), so the relative rankings of insects, mollusks and plants are unchanged since the 1970s.
Solem also estimated that mollusks, with about 850 specimens per species across the surveyed collections, were second only to fish, with about 1,750 collection specimens per species in U.S. and Canadian collections. These high numbers of specimens available per species for study and comparison make these two groups extremely well-suited to environmental monitoring, allowing measurement of attribute changes (e.g. body size) through space and time. No other taxon group passed 500 specimens per species in the nation’s collections (Solem 1975: 231). Since Solem's time, the estimated number of described mollusk species has declined from 85,000 to about 78,800 (Table 2), whereas the number of fish species has increased from the 20,000 Solem used for calculations to 34,852 (Eschmeyer and Fong 2018). We found an average of 108 lots per species for mollusks (Table 2), or about 1,100 specimens per species, using the weighted average of 10.5 specimens per lot (Appendix 3). This is almost 10 times what Solem reported for insects, 120 specimens per species, a figure that is probably high, given the explosive increase in estimates for diversity of insects since that time (Stork et al. 2015).
The estimate of 1,100 specimens per species might also be considered high, since mollusk collections contain undescribed species and unidentified specimens. Many mollusk species are rare, however, being known only from the type locality, and so may not even be present in North American collections. We expect that many common species will be represented by hundreds of samples and thousands of specimens that will allow construction of time series for studying changes in distribution patterns.
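The distinction between having coordinates and having a documented georeference can be sketched in Python as follows. The field names are Darwin Core terms; the grouping into "coordinate" and "documentation" terms, the status labels, and the sample records are assumptions made for this illustration and are not a statement of what any portal or standard requires.

```python
# Darwin Core terms holding the coordinates themselves.
COORDINATE_TERMS = ("decimalLatitude", "decimalLongitude")
# Darwin Core terms documenting error radius and source of coordinates.
DOCUMENTATION_TERMS = ("coordinateUncertaintyInMeters", "georeferenceSources")


def has_value(record: dict, term: str) -> bool:
    """True if the term is present and not blank."""
    value = record.get(term)
    return value is not None and str(value).strip() != ""


def georeference_status(record: dict) -> str:
    """Classify a record by how fully its georeference is documented."""
    if not all(has_value(record, term) for term in COORDINATE_TERMS):
        return "not georeferenced"
    if all(has_value(record, term) for term in DOCUMENTATION_TERMS):
        return "documented"
    return "coordinates only"


# Hypothetical records, not taken from any collection.
skeleton = {"scientificName": "Example species", "country": "United States"}
coords_only = {"decimalLatitude": 26.45, "decimalLongitude": -82.03}
documented = {
    "decimalLatitude": 26.45,
    "decimalLongitude": -82.03,
    "coordinateUncertaintyInMeters": 500,
    "georeferenceSources": "hypothetical gazetteer",
}

statuses = [georeference_status(r) for r in (skeleton, coords_only, documented)]
```

A survey question that asks only whether coordinates are available would count both the second and the third record as georeferenced, although only the third carries the information needed to judge accuracy.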
The pattern of lots per species among the molluscan classes in U.S. and Canadian collections is interesting. (We switch here to using lots rather than specimens per species since it is not clear that the average number of specimens per lot will be the same across classes and our survey data do not allow us to calculate it.) Bivalves (230 lots per species) and cephalopods (128 lots per species) have the highest values, which perhaps reflects that some species have high enough abundance that they can sustain fisheries, but they are collected with different methods, since virtually all bivalves are benthic whereas many cephalopods are pelagic. Gastropods and chitons have similar, intermediate values, 91 and 82 lots per species, which is probably coincidental since gastropods occupy terrestrial and freshwater habitats in addition to the marine habitat of chitons. Scaphopods are somewhat lower at 65 lots per species, which might reflect their exclusively infaunal habitat. Aplacophorans are undersampled relative to other mollusks at 19 lots per species, reflecting that they lack shells and therefore are not collected post-mortem like shell-bearing mollusks; also, their small worm-like bodies may not be recognized as molluscan by the non-specialist. If the average value of specimens per lot (10.5, Appendix 3) is applied across the molluscan classes, the aplacophorans end up in Solem's “low” category for specimens per species, whereas the bivalves have the highest value, at 2,400 specimens per species, which suggests that bivalves are one of the best sampled classes of metazoans. Only among microorganisms such as diatoms might higher numbers of specimens per species be expected in natural history collections.
Fish collections have grown more rapidly than mollusk collections, increasing from 35 million to 64 million specimens by the early 1990s (Poss and Collette 1995). Much of this growth, however, has been in larval fish, which are difficult to identify by morphological means (Ko et al. 2013). Also, unlike mollusk collections, where all the large collections reported an increased number of lots between Solem's 1975 survey and our 2017 survey, some fish collections show a decrease in number of lots from 1995 to today, judging from collection websites (LACM, 7 million to 4 million; CAS, 2.16 million to 1.2 million; USNM, 5 million to 4 million; MPM, 1.5 million to 685,000). It is therefore difficult to judge whether mollusk or fish collections have more material per species that is relevant for assessing environmental change. What is more important is the similarity of fish and mollusk collections in being lot based, which means they can be more rapidly and effectively digitized than taxa in which cataloguing and labelling is individual based. A single lot in a mollusk or fish collection can contain hundreds of specimens, which allows studies of environmentally mediated change in morphological and genetic variation over time, in addition to changes in distribution patterns.
Over the past 15 years, the value of natural history collections (National Science and Technology Council 2009), their use in formal and informal education (Cook et al. 2014, Ellwood et al. 2015, Hiller et al. 2017), and their increased use in research due to digitization efforts have been the subject of much discussion. There are many challenges to digitization that still must be addressed, including a need for coordination of activities across all natural history collections, and means of finding efficiencies in similar tasks (Vollmar et al. 2010).
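The conversion from lots per species to specimens per species can be written out as a short Python sketch. It applies the weighted average of 10.5 specimens per lot uniformly to the per-class values quoted above, which, as noted, is an assumption that may not hold for individual classes; the dictionary and function names are ours.

```python
SPECIMENS_PER_LOT = 10.5  # weighted average across institutions

# Lots per species as quoted in the text (based on total lots).
LOTS_PER_SPECIES = {
    "Bivalvia": 230,
    "Cephalopoda": 128,
    "Gastropoda": 91,
    "Polyplacophora": 82,
    "Scaphopoda": 65,
    "Aplacophora": 19,
    "All Mollusca": 108,
}


def specimens_per_species(lots_per_species: float) -> float:
    """Scale lots per species by the assumed uniform specimens-per-lot value."""
    return lots_per_species * SPECIMENS_PER_LOT


estimates = {
    taxon: specimens_per_species(value)
    for taxon, value in LOTS_PER_SPECIES.items()
}

for taxon, value in estimates.items():
    print(f"{taxon:15s} {value:8.1f}")
```

Rounded to the nearest hundred, the bivalve and all-Mollusca values reproduce the 2,400 and 1,100 specimens per species given in the text.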
The scope of the challenge is staggering, and global efforts to consider the best approach have focused on digitizing metadata from all collections across the world as a starting point (Berendsohn and Seltmann 2010, Scoble 2010, Page et al. 2015). As part of the U.S. response to this enormous task, the National Science Foundation Program “Advancing Digitization in Biological Collections” has been funding digitization and web-publishing of data in non-federal collections since 2011. The iDigBio program (Integrated Digitized Biocollections), headquartered at the Florida Museum of Natural History and currently supported by NSF, has become an important collaborator and coordinator of collections information and resources for collections trying to improve their digitization efforts.
Sustained funding and coordinated collaboration have brought millions of new specimen records online, and natural history collections are slowly being recognized as the important source of biodiversity data that they are; however, the usefulness of natural history collections in the 21st century is directly tied to their availability online and the ability of researchers to use those data quickly and with confidence. The rate at which mollusk collection data are annotated, contextualized with provenance data, and published online needs to be accelerated. Planning for this challenge requires an understanding of the amount of data available and its state of preparedness for digitizing.
A big thank you goes, posthumously, to Alan Solem (FMNH) who had the foresight and stamina to request and compile the original collection survey results in the early 1970s. He likely suspected how valuable the baseline that he generated would become. For this new round of questionnaires, we had the support of a very large number of colleagues at numerous collections across North America who responded to our (RB and PS) requests for survey data and/or helped answer various associated questions (here arranged by U.S. states and Canadian provinces):
J. Andrés Lopez, UAM (AK); Jason Bond, Melissa Callahan, AUMNH (AL); Nancy G. McCartney, ARK (AR); Peter N. Reinthal, UAZ (AZ); Margaret Dykens, SDNH (CA); Christina Piotrowski, Elizabeth Kools, CASIZ (CA); Erica Clites, UCMP (CA); Lindsey T. Groves, Jann Vendetti, LACM (CA); Charlotte Seid, SIO-BIC (CA); Linsey Sala, SIO-PIC (CA); Paul Valentich-Scott, SBMNH (CA); CalCOFI (CA); Jingchun Li, Kelly Martin, UCM (CO); Paula Cushing, Phyllis Sharp, DMNS (CO); Eric A. Lazo-Wasem, YPM (CT); Ellen Strong, USNM (DC); Alex Kittle, DMNH (DE); Dennis Hanisak, HBOM (FL); Paul Larson, Laura Wiggins, FWRI (FL); José Leal, BMSM (FL); Nancy A. Voss, RSMAS (FL); John Slapcinsky, Gustav Paulay, UF (FL); Byron J. Freeman, GTMC-GMNH (GA); Norine W. Yeung, Richard Pyle, Jaynee R. Kim, BPBM (HI); Cindy Opitz, SUI (IA); Dawn Roberts, CASPNNM (IL); Kevin S. Cummings, INHS (IL); Meredith Mahoney, ISM (IL); Jochen Gerber, Janeen Jones, FMNH (IL); Ronald L. Richards, Ryan Rokicki, Randy and Deborah Patrick, INSM and WMI (IN); David M. Hayes, EKY (KY); Lorene E. Smith, LSUMG-I (LA); Adam Baldinger, MCZ (MA); Scott Jervas, Berkshire Museum (MA); Akiko Okusu, UMass Amherst (MA); Hayley Singleton, Beneski Museum, Amherst College (MA); Hannah Appiah-Madson, OGL (MA); James Young, NHSM (MD); Anthony L. Swinehart, DM F (MI); Taehwan Lee, UMMZ (MI); Sean Keogh, Andrew M. Simons, JFBM (MN); Richard J. Oehlenschlager, SMM (MN); Robert L. Jones, MMNS (MS); Lenny Lampel, MCPR (NC); Arthur E. Bogan, Jamie Smith, NCMNS (NC); Denise Furr, SMNC (NC); Patricia W. Freeman, Thomas E. Labedz, UNSM (NE); David Parris, NJSM (NJ); Sara V. Brant, Sandra L. Brantley, MSB (NM); Christine Johnson, AMNH (NY); Isabel P. Hannes, Kathryn Leacock, BMS (NY); Denise A. Mayer, NYSM (NY); Greg Dietl, Leslie L. Skibinski, PRI (NY); Nicole Gunter, Gavin Svenson, CLEV (OH); Francisco Borrero, Emily Imhoff, CMC (OH); Steven Sullivan, Hefner (OH); G. Thomas Watters, OSUM (OH); Katrina Menard, OMNH (OK); Christopher Marshall, OSAC (OR); Nezka Pfeifer, Everhart (PA); Timothy Pearce, CM (PA); Paul Callomon, ANSP (PA); David Robinson, USDA (PA); Alex Van Dam, UPRM (PR); Dave Cicimurri, SCSM (SC); Matthew Gibson, ChM (SC); Gerald R. Dinkins, MMNHC (TN); Teresa Mayfield, UTEP (TX); Melissa Casarez, TNHC (Austin, TX); Karen Morton, DMNH-P (TX); Lacie Ballinger, FWM (TX); Tina Petway, HMNS (TX); Christy Bills, UMNH (UT); Wesley Skidmore, MLBeanLSM (UT); Jennifer C. Dreyer, VIMS (VA); Haley Cartmell, VMNH (VA); Melissa Frey, UWBM (WA); Emily Halverson, Laura A. Monahan, UWZM (WI); Daniel J. Meinhardt, Richter (WI); Julia Colby, MPM (WI); Claire Goodwin, Rebecca Milne, ARC (New Brunswick, Canada); Henry Choong, RBCM-INVZ (British Columbia, Canada); Jean-Marc Gagnon, CMNML (Ontario, Canada); Sebastian Kvist, Maureen Zubowski, ROM (Ontario, Canada).
Many other colleagues helped with additional data, among them Jay Codeiro, Victor Fet, Daniel L. Graf, and Timothy Rawlings, and we apologize for any inadvertent omissions from this list. The work on this project was partly supported by NSF award EF-Digitization TCN 14-02667 to P. Sierwald and R. Bieler, and by a mollusk collection digitization workshop grant through iDigBio to P. Sierwald and E. Shea (https://idigbio.org/wiki/index.php/Digitizing_the_2nd_largest_Invertebrate_Phylum:_Mollusks).
Questionnaire – In the questionnaire, requested data were grouped by topics. The questionnaire was organized in a spreadsheet format, prompting entries into cells next to the item in question. # = the questionnaire requested entering a number; several items requested a write-in reply or write-in narrative, here indicated by a colon; comments in () or  indicate explanations of the requested data type, e.g. a) skeleton (= partial) data only (e.g. name and basic locale only).
Collection details and finding aids to mollusk collections. Collections were asked to identify significant holdings and donations and provide additional narratives to document the scope of holdings. These responses are excerpted here, arranged in alphabetical order.
Collection Sizes – U.S. and Canadian mollusk collections listed by size, based on number of cataloged lots reported in the 2017 survey. Numbers of cataloged lots reported by Solem (1975) and Cummings et al. (2009) provided for comparison. Some of Cummings et al.'s numbers come from 1996; “listed” means they included the collection but did not provide its size. Collection acronyms as in Table 1. Superscript notations: sp = number of specimens reported (instead of lots); 1 = no catalogue numbers assigned; bl = quality backlog included; d = digitized; U = Unionida only; LD = limited data; R&F = Recent and fossil; F = fossil only; ENA = only Eastern North American holdings shown; unknown = ‘unknown’ reported by the collection or data not available; * = rounded calculation assuming same ratio of lots to specimens as in Solem (1975); ** = rounded calculation assuming 10 specimens / lot on average across institutions, usually to convert lots to specimens, but sometimes vice versa. The figure under total in the “Specimens per lot” column is the weighted average. + = collection incorporated by institution listed above. PRI's number for total types included fossils and so was excluded in the column total.
Preservation Type. – Arranged by size as in Appendix 3. Main collection types are dry, wet (fluid preserved), and frozen tissue. Others include eggs, microscope slides of radula or histology, SEM stubs, DNA extracts and images. Abbreviations: bl = figure includes backlog lots; D = digitized lots; sp = number of specimens reported (instead of lots); “0” = collection stated that none are presently in the collection; blank: no data provided by institution.
Taxonomic composition. – Arranged by collection size as in Appendix 3. “0” means an institution reported not having (cataloged) specimens of the taxon; “-” and “” mean the respondent left the field blank but the latter means that the count could be inferred to be zero as other columns sum to the expected number of lots; “?” means we could not fit the response to the table. Institutions with fewer than 40,000 lots that did not report a taxonomic breakdown are omitted from the table.
Number of lots by habitat. – Arranged by collection size as in Appendix 3. For Solem (1975), lots by habitat were determined by applying the percentages in his tables 1–3. This revealed a few discrepancies with the numbers he used for rankings on p. 229: FMNH was 27,000 not 29,000 for freshwater, BPBM was 112,000 not 120,000 for terrestrial, and OSUM was 18,000 not 39,000 for freshwater. * indicates that the institution included backlog in its calculations for 2017.
Percentage of lots by habitat. – Under “Total of habitat”, the 2017 column shows the sum of the marine, freshwater and terrestrial columns from App. 6, whereas the 1975 column repeats the cataloged lots column from App. 6.
Digitization of collections. – Arranged by collection size as in Appendix 3. If an institution reported only a percentage for georeferenced lots, a rounded number of lots calculated from this percentage is shown in square brackets. “Darwin Core compliance” is reported only as a percentage since some institutions answered in terms of compliance by field rather than by lot or record.
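As an illustration of the bracketed values, the following sketch derives a rounded lot count from a reported percentage. The caption does not state which count the percentage was applied to or the rounding precision, so both the base count and the `round_to` parameter are assumptions, and the function name is hypothetical.

```python
# Hypothetical helper; base count and rounding step are assumptions.

def lots_from_percentage(base_lots, percent, round_to=100):
    """Convert a reported percentage into a rounded number of lots."""
    estimate = base_lots * percent / 100
    # Round to the nearest multiple of `round_to`.
    return int(round(estimate / round_to) * round_to)


if __name__ == "__main__":
    # Made-up example: 25% of 40,000 lots georeferenced.
    print(f"[{lots_from_percentage(40000, 25):,}]")
```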
Marine holdings by geographic regions. – Several collections could not provide regional data in the form requested for the survey. These collections, some of which undoubtedly hold material from these regions, are omitted from this table. * means backlog included; percentages were calculated excluding backlog.
Non-marine holdings by geographic regions. – Several collections could not provide regional data in the form requested for the survey. These collections, some of which undoubtedly hold material from these regions, are omitted from this table. * means backlog included; percentages were calculated excluding backlog.