HathiTrust US Federal Documents Registry now available as a beta release

The HathiTrust US Federal Documents Registry is now available as a beta release.

The Registry is intended to be a comprehensive source of metadata for the US federal documents corpus: material produced at government expense since 1789. While many potential use cases exist, an important one will be identifying materials that have not yet been digitised and/or deposited into the HathiTrust repository.

The Registry was conceived in 2012 as a mechanism for measuring how far HathiTrust had progressed toward its goal of a comprehensive digital corpus, as outlined in a ballot initiative from the 2011 HathiTrust Constitutional Convention. In the fall of 2013, a broad call for records was issued, and the more than 40 libraries that responded submitted over 25 million records. With such a large aggregation of records, the project team needed to develop multiple approaches for detecting and grouping duplicate records (records describing the same work).

The Registry was launched as a public alpha in June 2015, and since then the team has worked to steadily improve duplicate detection, Registry infrastructure, and the accessibility of the interface.

Now that the Registry has reached this milestone, the project team's focus will shift to gap detection in the HathiTrust digital collection. The team will continue working to reduce the number of duplicate records in the Registry while also identifying and filling gaps in Registry metadata. Staff will also use Registry data and HathiTrust members’ holdings data to identify materials to be digitised.

Brought to you by Scope e-Knowledge Center, a world-leading provider of metadata services, abstraction, indexing, entity extraction and knowledge organisation models (Taxonomies, Thesauri and Ontologies).

Click here to read the original press release.
