Science and Research Content

Collaboration between IBM Research Europe and Thieme Chemistry brings together ML and expert human-curated data with unprecedented results in synthesis reaction planning

The collaboration between IBM Research Europe and Thieme Chemistry, announced this summer, combines high-quality data (Science of Synthesis and Synfacts by Thieme) with cutting-edge machine learning models for organic synthesis prediction (RXN for Chemistry by IBM) to create an unprecedented user experience. RXN for Chemistry, a cloud-based platform powered by artificial intelligence (AI), was recently trained on high-quality, human-curated datasets from Thieme's Science of Synthesis and Synfacts. IBM Research Europe and Thieme Chemistry have now announced the preliminary findings of their collaboration, which were assessed by seven renowned synthetic chemistry specialists and their research groups from China, Germany, Switzerland, New Zealand, and the United States of America.

Organic compounds can react with one another in hundreds of thousands of distinct ways. Organic chemists rely on experiential knowledge to avoid spending endless hours in the laboratory on trial and error. To enhance synthesis planning, IBM Research and Thieme Chemistry combined expert-curated datasets from Science of Synthesis, Thieme's full-text resource for synthetic organic chemistry methods, and peer-reviewed content from the journal Synfacts with an IBM AI model called Molecular Transformer in RXN for Chemistry.

Originally created to predict the outcome of chemical reactions, the Molecular Transformer was later extended to enable retrosynthetic analysis, that is, determining the chemicals required to make a specific target molecule. The model has shown strong performance in learning chemical reactivity from datasets of chemical reactions.

Science of Synthesis and Synfacts cover a broad range of reaction space. Models trained on publicly accessible patent datasets typically underperform on a large number of these reactions. The chemical records in Science of Synthesis and Synfacts are of higher quality, as evidenced by a higher percentage of usable records. The consistency of Thieme's datasets improves the AI models' learning process, resulting in more consistent predictions: Thieme-trained models on the RXN for Chemistry platform improve prediction accuracy by a factor of three for forward predictions and by a factor of nine for retrosynthesis.

The collaboration between Thieme and IBM Research Europe demonstrates the influence that high-quality chemical reaction data can have on future AI tools for chemical synthesis. Integrating high-quality, curated data from Science of Synthesis and Synfacts creates an unparalleled opportunity to raise the performance of RXN for Chemistry by unlocking the full information contained in hundreds of thousands of chemical reaction records.

Seven internationally renowned specialists in organic synthesis and their groups agreed to evaluate selected retrained models. The experts will continue to provide feedback to IBM Research Europe and Thieme during this collaboration, enabling improvements to the models and their usage, as well as creating a unique forum for exchange between machine learning experts and the synthetic organic chemistry community.

IBM Research Europe and Thieme Chemistry will host a free web conference on December 1, 2021, to discuss the outcomes of their partnership. The teams will compare the performance of language models trained on the best commercially available datasets (Science of Synthesis and Synfacts) against that of models trained on publicly available patent reaction records, with a particular emphasis on retrosynthesis and forward reaction prediction tasks.

Those interested in participating may register here: Web seminar “Powering Molecular Transformers with High Quality Data”

