Cornell University's arXiv project, which includes an e-print archive of scientific papers, is looking to convert its existing simple database into a more interactive one. It is envisioned as a place where authors, articles, databases and readers talk to each other to help users identify a work's main concepts, see research reports in context and easily find related work. The project is funded by a three-year, $883,000 grant from the National Science Foundation, with federal stimulus money from the American Recovery and Reinvestment Act (ARRA).
The arXiv currently contains close to 600,000 papers in physics, mathematics, computer science, quantitative biology, quantitative finance and statistics, with some 5,000 new papers submitted each month. Researchers submit their work as ‘preprints’ before formal publication. New tools will link papers by concepts, not just by the citations they contain. This is expected to help users without advanced expertise, including some outside the scientific community, understand the significance of new research. The system will also identify related databases and commentaries.
Computers usually search documents by looking for specific words or phrases, but concepts are not always described with the same exact words, and some words mean different things in different places. New algorithms will use a ‘fuzzier’ approach, inferring concepts from the ways terms are used, and will track related documents over a five- or 10-year time scale. Users will therefore be able to see the ‘genealogy’ of ideas. New documents will be linked to data such as definitions and rules for reasoning about them, which will enable machines to infer relationships.
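As a rough illustration of what a ‘fuzzier’ approach can mean, the sketch below compares terms by the company they keep rather than by exact spelling. It is not the arXiv project's algorithm, which the announcement does not describe; the toy documents and the helper names `context_vectors` and `cosine` are assumptions made purely for this example.

```python
from collections import Counter
from math import sqrt

def context_vectors(docs):
    """Map each term to a count of the other terms it shares a document with."""
    vectors = {}
    for doc in docs:
        terms = set(doc.lower().split())
        for term in terms:
            vec = vectors.setdefault(term, Counter())
            for other in terms:
                if other != term:
                    vec[other] += 1
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical toy corpus: "boson" and "particle" never appear together,
# but they are used in the same way, so their context vectors match.
docs = [
    "boson mass measured collider",
    "particle mass measured collider",
    "option price volatility market",
]
vectors = context_vectors(docs)
similar = cosine(vectors["boson"], vectors["particle"])
unrelated = cosine(vectors["boson"], vectors["option"])
print(similar > unrelated)
```

A keyword search for "boson" would miss the second document entirely; comparing usage patterns is one simple way such a link could be recovered.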
Other enhancements will provide interoperability with research sites such as PubMed Central and will allow scientists to contribute in newer, more flexible text formats.