OCLC is using AI to improve the quality and efficiency of WorldCat, its global library resource-sharing network. To tackle the persistent problem of duplicate bibliographic records, OCLC Metadata Quality teams have introduced a machine learning model designed to detect and merge duplicates across WorldCat’s vast database.
The initiative, launched in August 2023, drew on feedback from more than 300 cataloging professionals, whose data labeling exercises helped train the AI model. With the model in place, approximately 5.4 million duplicate printed book records in multiple languages, including English, French, German, Italian, and Spanish, were removed from WorldCat.
Building on this success, OCLC has enhanced the AI model to handle de-duplication across all formats, languages, and scripts within WorldCat. On February 11, 2025, OCLC will conduct a test run, merging 500,000 duplicate records for printed English books, the largest category of duplicates. The results will be evaluated before de-duplication is extended to other materials.
This project leverages both human expertise and AI to refine metadata, ensuring WorldCat continues to support the global library community with accurate and streamlined data. Libraries are encouraged to enable WorldCat updates in WorldShare Collection Manager to benefit from the latest de-duplication efforts, further enhancing their cataloging and discovery services.