As part of the efforts of the National Library of Medicine (NLM) to transform and accelerate biomedical discovery and improve health and healthcare, NLM is transitioning to automated MeSH indexing of MEDLINE citations in PubMed. Automated indexing will provide users with timely access to MeSH-indexed metadata and allow NLM to scale MeSH indexing for MEDLINE to the volume of published biomedical literature. Human indexers have been and will continue to be involved in the refinement of automated indexing algorithms and will play a significant role in the quality assurance approaches for automated indexing.
In 2018, NLM launched the MEDLINE 2022 initiative, a five-year development plan that aims to ensure that MEDLINE continues to evolve to meet the needs of users in an age of data-driven discovery. A key goal of this initiative was to implement a range of indexing methods to ensure the timely assignment of MeSH to MEDLINE citations. Based on the successful pilot of automated indexing on a limited scale since 2016, it was determined that fully automated MEDLINE indexing should be implemented with quality control, and that human curation and automation should be specifically applied to improve the discoverability of chemical and gene information in MEDLINE.
Automated MeSH indexing has been under development at NLM for many years, and the most significant outcome is the Medical Text Indexer (MTI), developed by researchers in the Lister Hill National Center for Biomedical Communications. MTI is not new; it has been used to provide indexing suggestions for human indexers since 2002 and was incorporated as the "first line" of indexing with subsequent human curation for a set of journals starting in 2011. Automated indexing with a version of MTI has been used for OLDMEDLINE citations since 2015, for comments since 2016, and for processing an experimental batch of backlogged citations in 2016. Since 2018, the method of indexing has been identified in the XML of all completed citations.
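For readers who work with PubMed XML directly, the following is a minimal sketch of how the indexing method might be read from citation records. It assumes the method is carried in an `IndexingMethod` attribute on the `MedlineCitation` element, with values such as "Automated" and "Curated", and that the attribute is absent for other citations; check the current PubMed DTD before relying on this. The sample records, the PMIDs, and the `indexing_methods` helper are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Made-up sample records; real PubMed XML contains many more elements.
# Assumption: the indexing method is an attribute on <MedlineCitation>.
SAMPLE_XML = """\
<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation Status="MEDLINE" IndexingMethod="Automated" Owner="NLM">
      <PMID Version="1">1</PMID>
    </MedlineCitation>
  </PubmedArticle>
  <PubmedArticle>
    <MedlineCitation Status="MEDLINE" IndexingMethod="Curated" Owner="NLM">
      <PMID Version="1">2</PMID>
    </MedlineCitation>
  </PubmedArticle>
  <PubmedArticle>
    <MedlineCitation Status="MEDLINE" Owner="NLM">
      <PMID Version="1">3</PMID>
    </MedlineCitation>
  </PubmedArticle>
</PubmedArticleSet>
"""


def indexing_methods(xml_text):
    """Map each PMID to its IndexingMethod attribute value.

    Citations without the attribute are reported as "not specified".
    """
    root = ET.fromstring(xml_text)
    methods = {}
    for citation in root.iter("MedlineCitation"):
        pmid = citation.findtext("PMID")
        methods[pmid] = citation.get("IndexingMethod", "not specified")
    return methods


if __name__ == "__main__":
    for pmid, method in indexing_methods(SAMPLE_XML).items():
        print(pmid, method)
```

A lookup like this could be used to separate automatically indexed citations from others when evaluating search results, subject to the assumptions above.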
The MTI algorithm has been undergoing refinements in recent years as we move towards automation, including incorporation of deep learning approaches to improve the application of MeSH subheadings, the incorporation of rules and triggers for the indexing of Publication Types, and the application of IM designation. The version of MTI used for current automated indexing is called MTIA, and it is being applied to citations from a variety of journals. Human curation of MTIA-indexed citations originally involved a scan of all citations indexed by MTIA but has been modified to focus curation on specific sets of citations (e.g., those involving genes and proteins) to scale curation and to ensure that indexed terms are correct and irrelevant terms are not indexed.
Recognizing that chemicals and genes are among the most searched data points in PubMed, NLM is working to improve recognition of these entities by MTIA and is evaluating the incorporation of chemicals identified by the NLM-Chem identification tool. NLM is also evaluating NLM-Gene as a tool to support curation at scale for the creation of GeneRIFs (the links made between PubMed and the Gene database).
It is expected that by mid-2022 all citations indexed for MEDLINE will be indexed by MTIA, with human curation applied as indicated. Beyond achievement of this major milestone, the MTIA algorithm will continue to be refined and improved.