Descriptive metadata for web archiving: review of harvesting tools

The OCLC Research Library Partnership Web Archiving Metadata Working Group (WAM) was formed to recommend descriptive metadata best practices for archived web content. When the group began its work early in 2016, we discovered that metadata practitioners had high hopes that it would be possible to extract descriptive metadata from harvested content.

We reviewed 11 selected web harvesting tools to determine their descriptive metadata functionalities, seeking to answer one question: Can web harvesting tools automatically generate descriptive metadata that supports the discoverability of archived web resources? This report offers our objective analysis of those tools. Auto-generating descriptive metadata for archived web resources could make data entry significantly more efficient and thus help enable metadata production at scale.

Our intent was twofold:

(1) provide the web archiving community with a description of each relevant tool’s overall purpose and metadata-related capabilities, and

(2) inform WAM’s overarching objective of preparing best practice recommendations for web archiving descriptive metadata based on an understanding of user needs.

This report is one of a complementary trio being issued simultaneously to document the work of the WAM Working Group. Its siblings are Descriptive Metadata for Web Archiving: Recommendations of the OCLC Research Library Partnership Web Archiving Metadata Working Group and Descriptive Metadata for Web Archiving: Literature Review of User Needs.

Publication Details