
IBM develops full-text digitisation system for Japan's National Diet Library

Technology firm IBM, US, has announced that it is helping the National Diet Library of Japan - the country's only national library - digitise its literary artefacts on a massive scale. The aim is to make them widely available and searchable online by all information seekers.

The prototype technology, created by IBM Research, speeds up full-text digitisation of Japanese literature by recognising a broad range of Japanese characters and by enabling users to collaboratively review and correct characters, script and structure. Additionally, the full-text digitisation system is designed to promote future international collaboration and standardisation among libraries around the world.

Compared with languages that rely on just a few dozen alphabetic characters, Japanese is extremely diverse in terms of script. Besides the hiragana and katakana syllabaries, Japanese includes about 10,000 kanji characters (including old characters, variants and 2,136 commonly used characters), as well as ruby (small syllabary characters printed right next to a kanji as a reading aid) and mixed vertical and horizontal text.
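To give a sense of how the scripts mix within even a short phrase, the sketch below counts characters by script using standard Unicode block ranges. It is only an illustration and not part of the IBM system: the function names `script_of` and `script_counts` are hypothetical, and the ranges are simplified (the kanji range covers only the basic CJK Unified Ideographs block, so rarer variants in extension blocks would be reported as "other").

```python
from typing import Dict


def script_of(ch: str) -> str:
    """Classify one character by Unicode block (simplified, illustrative ranges)."""
    cp = ord(ch)
    if 0x3040 <= cp <= 0x309F:   # Hiragana block
        return "hiragana"
    if 0x30A0 <= cp <= 0x30FF:   # Katakana block
        return "katakana"
    if 0x4E00 <= cp <= 0x9FFF:   # CJK Unified Ideographs (basic block only)
        return "kanji"
    return "other"


def script_counts(text: str) -> Dict[str, int]:
    """Count the non-whitespace characters of each script in the text."""
    counts: Dict[str, int] = {}
    for ch in text:
        if ch.isspace():
            continue
        kind = script_of(ch)
        counts[kind] = counts.get(kind, 0) + 1
    return counts


# "Digitisation of the National Diet Library": three scripts in one phrase.
print(script_counts("国立国会図書館のデジタル化"))
```

For this phrase the count comes to eight kanji, one hiragana and four katakana characters, which is the kind of mixture a recognition system for Japanese text has to handle.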

Aside from ensuring high-quality recognition of Japanese characters, IBM researchers aimed to minimise the time needed to review and verify the accuracy of the digitised texts. By introducing collaborative, crowdsourced tools, the technology allows many users to quickly pore over the texts and make corrections far more productively and efficiently.

The architecture of the full-text digitisation prototype system provides two key collaborative features - Collaborative Correction and Collaborative Data Structuring.

The full-text digitisation prototype system builds on two streams of technology. IBM researchers in Tokyo applied an approach called Social Accessibility, which allows large groups of reviewers to work collaboratively via Web browsers regardless of location. In addition, the COoperative eNgine for Correction of ExtRacted Text (CONCERT) technology - developed by IBM researchers in Haifa, Israel - was used to significantly improve productivity through the repetition of simple operations.
