Research report

Beyond the repository: integrating local preservation systems with national distribution services

4 Jan 2018

The “Beyond the Repository” planning grant investigated how local digital preservation practices and repository systems interoperate with distributed digital preservation (DDP) services. The grant team conducted a survey followed by in-depth interviews with selected survey respondents. The survey and interviews revealed both great diversity in how digital preservation is practiced and common challenges at the intersection of local repositories and DDP services.

The survey received 170 complete responses from a variety of organizations. Seventy-seven percent of respondents self-identified as academic institutions, but representatives from archives, government organizations, museums, non-profit organizations, and public libraries also responded to the survey. Survey respondents were nearly evenly split between those who identified as an administrator or department/unit head and those who identified as staff. The vast majority of survey respondents (90%) reported that their institution had collected more than a terabyte of unique digital content, and 63% reported having collected fifty terabytes or less. Survey respondents reported using a variety of digital preservation and repository systems to manage their content; no one system was used by a clear majority of respondents.

The survey data reveal that most respondents (84%) are storing copies of their unique content in multiple locations. However, the number of copies stored varied among respondents: keeping two or three copies were the most common responses, but ten respondents reported keeping seven or more copies. With regard to where these copies are stored, survey respondents frequently indicated that their organizations pursue more than one storage strategy. Sixty-six percent keep copies in multiple locations onsite, but the cloud and DDP services are also common storage mechanisms. Of the survey respondents who use a DDP service, nearly half are members of the Digital Preservation Network (DPN), though several of these use DPN in conjunction with other services.

When asked about curation, almost half of the survey respondents indicated that they sent a subset of their data to a distributed repository (or offsite, or to the cloud). When these respondents were asked to rank the importance of criteria used to select the subset of materials sent offsite, the majority chose Mandate as the most important, followed closely by Intrinsic value and Content type. Sixty percent indicated that they have policies in place to guide selection of locally held materials, but only 47% have similar policies for materials being sent to distributed systems. The interviews reflected this trend, with many interviewees commenting that they have criteria for selecting materials to be sent to offsite storage or DDP systems, but that these are not necessarily articulated in policies.

Survey respondents and interviewees frequently cited lack of interoperability between tools and systems as a challenge. Many identified overspecialization of systems as a contributor to interoperability issues. Others described their systems as separate units with little integration between them, requiring manual processes and workarounds. A common symptom of this was the difficulty many respondents and interviewees had in tracking their content between systems.

This research uncovered a number of organizational challenges as well. A common theme in both survey responses and interviews was the lack of required funding or staffing for a robust digital preservation program. These factors were cited as the main reasons why respondents did not keep multiple copies of content in multiple locations, and as significant reasons why their organizations did not have digital preservation policies. Many interviewees also mentioned staff turnover as a challenge. Several mentioned struggles in retaining technical staff, and others noted that it was difficult to convince administrators to replace staff members who had left. In addition to the above challenges in integrating tools and systems, funding and staffing emerged as significant barriers to building robust digital preservation programs.

The grant team and the advisory board have coalesced around three recommendations after reflecting on the survey and interview results. The first recommendation is to create a decision-making toolkit for choosing materials to send to DDP systems, which would help users with curation decisions and streamline digital preservation workflows. The second recommendation is to determine a shared BagIt profile for DDP systems, which would improve interoperability between systems. The third recommendation is to develop a dashboard or similar tool that could be used to track content between systems. We hope that these recommendations will be considered for any follow-on work from this project, with the aim of improving DDP workflows and interoperability.

