The National Library of Medicine (NLM), a component of the US National Institutes of Health, has launched a new digital repository, Digital Collections, at http://collections.nlm.nih.gov. The new resource complements the PubMed Central digital archive of electronic journal articles (http://www.ncbi.nlm.nih.gov/pmc/). The repository is intended to allow rich searching, browsing and retrieval of monographs and films from NLM's History of Medicine Division. Additional content and other format types will be added over time. Users can perform full-text and keyword searching within each collection or across the entire repository.
This first release of Digital Collections includes a newly expanded set of Cholera Online monographs, a portion of which NLM first published online in PDF format in 2007. The version of Cholera Online now available via Digital Collections includes 518 books (dating from 1817 to 1900) about cholera pandemics of that period. More information about the selection of the books and the subject of cholera may be found on the original Cholera Online web page at: http://www.nlm.nih.gov/exhibition/cholera/. Each book was scanned into high-quality TIFF images, which underwent optical character recognition to generate corresponding text files. Finally, a JPEG2000 derivative was created for each page for presentation through the integrated book viewer, which includes a Flash-based zooming feature for resizing and rotating a page on demand.
The second collection is a selection of 11 historical films, all created by the US government and in the public domain. The films have been digitised in a variety of video formats, to accommodate a wide range of playback devices, including mobile devices. Digital Collections also includes an integrated, Flash-based video player which allows full-text search of a film's transcript and graphically displays where the searched word or phrase occurs within the timeline of the film.
Every page of each book and every video is stored as a discrete object in Digital Collections, with an XML 'glue' describing each object and relationships between objects. To ensure long-term integrity of these digital files, checksums (number strings which act like mathematical "fingerprints") are calculated and written into the objects as the objects are ingested into Digital Collections. These checksums will be re-calculated periodically and compared with the original values. Additionally, all ingested files are versioned, so that any changes do not overwrite the original but instead create a new, second file which is stored along with the first.
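The fixity and versioning scheme described above can be illustrated with a minimal sketch. It is not NLM's or Fedora's actual implementation: the `StoredObject` class and its method names are hypothetical, objects are held in memory rather than in a repository, and SHA-256 is an assumed choice, since the announcement does not say which checksum algorithm is used.

```python
import hashlib
from dataclasses import dataclass, field


def checksum(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string (algorithm is an assumption)."""
    return hashlib.sha256(data).hexdigest()


@dataclass
class StoredObject:
    """Hypothetical stand-in for one repository object (a book page or a video)."""
    object_id: str
    versions: list = field(default_factory=list)   # bytes of every version, oldest first
    checksums: list = field(default_factory=list)  # digest recorded when each version was ingested

    def ingest(self, data: bytes) -> int:
        """Store data as a new version, record its checksum, and return the version index."""
        # Appending rather than replacing keeps earlier versions intact.
        self.versions.append(data)
        self.checksums.append(checksum(data))
        return len(self.versions) - 1

    def verify(self) -> list:
        """Recompute every checksum and return the indexes of versions that no longer match."""
        return [
            index
            for index, (data, recorded) in enumerate(zip(self.versions, self.checksums))
            if checksum(data) != recorded
        ]


if __name__ == "__main__":
    page = StoredObject("example-page-1")
    page.ingest(b"original scan")
    page.ingest(b"corrected scan")  # stored alongside the original, not in place of it
    print(page.verify())            # an empty list means every version still matches
```

Running `verify()` on a schedule corresponds to the periodic re-calculation mentioned above: any index it returns identifies a stored version whose bytes have changed since ingest.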
Digital Collections was built using several open-source components, with the Fedora Commons Repository Software providing the foundation. The primary browse and search interface has been adapted from the Muradora 'front-end' for Fedora, created by Macquarie University in Sydney.
In 2009, NLM began a pilot project to build the repository, develop appropriate workflows for ingesting and managing the content, and provide a core set of end-user services suitable for general public access. Information on the year-long evaluation process leading to the selection of Fedora can be found at http://www.nlm.nih.gov/digitalrepository/index.html.