The European research project IMPACT (Improving Access to Text) recently entered its second phase by welcoming 11 new partners from Southern and Eastern Europe into the consortium. These new partners will contribute to the project's goals of optimising OCR (Optical Character Recognition) software and language technology for historical material, and of sharing institutional knowledge and expertise on digitisation. They will also help build the IMPACT Centre of Competence. Slated for launch in early 2011, the Centre of Competence will provide a central point of entry to services for all libraries, archives and museums involved in digitising text material.
IMPACT now brings together 26 national and regional libraries, research institutions and commercial suppliers. The project is coordinated by the National Library of the Netherlands and runs from 2008 to 2011. In early 2010 the EC funded a proposal to extend the project's reach, adding nearly EUR 1 million to the original budget of EUR 15.5 million. In the current second phase (2010-2011), four national libraries, four research centres and three universities from France, Spain, Poland, Bulgaria, Slovenia and the Czech Republic have joined. The new research partners will build historical lexica for their languages, while the libraries will provide datasets.
In the first two years of the IMPACT project (2008-2009), initial versions of the tools were completed, including a set of tools for efficient lexicon building. An important insight was that the work on individual languages (Dutch, German and English) had informed the design of these language-independent tools, and that adaptations might be needed to apply the tools to new situations. A key aim of this extension of IMPACT is therefore to arrive at a cross-language view of the accessibility and enhancement of digitised text.
IMPACT's objective is to significantly improve the accessibility of historical printed text. Users and institutions currently face several problems. Material dated before 1900 is difficult to access in digital form, because even the latest OCR software does not produce satisfactory results for old books, magazines and newspapers. Libraries, archives and other content-holding institutions across Europe also lack experience and know-how in digitisation. In addition, historical language forms a barrier of its own. Together, these problems cause inefficiency and slow down the process of making Europe's cultural heritage available on the Internet.
To overcome these barriers to digitisation, IMPACT plans to advance the technology for text recognition and text enrichment. Two industry partners, IBM and ABBYY, are helping to build a text recognition system based on an adaptive model that automatically tunes itself to each new book being digitised.
IMPACT will also improve the process of large-scale digitisation by sharing expertise and best practice. To this end, the project is developing a number of strategic tools, such as a website, a help desk, decision-support tools and a training programme, as well as the sustainable Centre of Competence, where the requirements of content holders across Europe can be matched with the interests of research partners inside and outside the project. The 11 new partners will demonstrate and disseminate IMPACT's results and help build digitisation capacity in their countries, considerably widening IMPACT's reach across Europe.