Chemists describe compounds with structural formulae, which show how the atoms are arranged and connected. These formulae tell chemists, for example, which molecules can react with each other and how complex compounds can be synthesized. Translating structural formulae into machine-readable code, however, has long been a challenge. That’s where the artificial intelligence (AI) tool “DECIMER” comes in.
An Image Becomes a Code
DECIMER, which stands for “Deep Learning for Chemical Image Recognition,” is an open-source platform developed by Prof. Christoph Steinbeck and Prof. Achim Zielesny. It allows users to upload scientific articles containing chemical structural formulae, which are then processed by the AI tool.
DECIMER’s algorithm searches the document for images and decides whether each one is a chemical structural formula or some other kind of image. The recognized structural formulae are then translated into machine-readable structure codes or displayed in a structure editor for further processing. This translation step is the project’s crucial breakthrough.
For example, the caffeine molecule’s structural formula becomes the machine-readable structure code CN1C=NC2=C1C(=O)N(C(=O)N2C)C, which can be directly uploaded into a database and linked with additional information.
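The workflow described above (find the images, keep only the structural formulae, translate each into a code, store the result) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not DECIMER's actual implementation: the function names `process_document` and `count_heavy_atoms` are hypothetical, the two models are replaced by trivial stand-ins, and the atom counter handles only simple codes without bracket atoms. The caffeine code shown above is written in SMILES notation, which is why the sketch uses that name.

```python
import re
from collections import Counter
from typing import Callable, Dict, Iterable, List, Tuple

# Two-letter halogens first, then one-letter organic-subset atoms
# (upper case = aliphatic, lower case = aromatic).
ATOM_PATTERN = re.compile(r"Cl|Br|[BCNOPSFI]|[bcnops]")


def count_heavy_atoms(smiles: str) -> Dict[str, int]:
    """Count non-hydrogen atoms in a simple SMILES string (no bracket atoms)."""
    if "[" in smiles:
        raise ValueError("bracket atoms are outside the scope of this sketch")
    return dict(Counter(tok.capitalize() for tok in ATOM_PATTERN.findall(smiles)))


def process_document(
    images: Iterable[Tuple[str, bytes]],
    is_structure: Callable[[bytes], bool],
    to_smiles: Callable[[bytes], str],
) -> List[dict]:
    """Keep only structure images and turn each into a database-style record."""
    records = []
    for image_id, image in images:
        if not is_structure(image):
            continue  # skip photos, plots, and other non-structure images
        smiles = to_smiles(image)
        records.append(
            {
                "image_id": image_id,
                "smiles": smiles,
                "heavy_atoms": count_heavy_atoms(smiles),
            }
        )
    return records


# Stand-ins for the trained models (hypothetical, for illustration only):
# a real classifier and a real image-to-code translator would go here.
CAFFEINE = "CN1C=NC2=C1C(=O)N(C(=O)N2C)C"
images = [("fig1", b"structure"), ("fig2", b"photo")]
records = process_document(
    images,
    is_structure=lambda image: image == b"structure",
    to_smiles=lambda image: CAFFEINE,
)
print(records)
```

For the caffeine code, the record lists 8 carbon, 4 nitrogen, and 2 oxygen atoms, which agrees with caffeine's molecular formula C8H10N4O2. Passing the classifier and the translator in as functions keeps the sketch independent of any particular model.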
The Power of AI
The development of DECIMER was inspired by the astonishing performance of AI in the game of Go. Prof. Steinbeck and Prof. Zielesny watched the program “AlphaGo” defeat the best human player, which opened their eyes to the potential of AI. They realized that AI could solve complex problems given sufficient training data.
Making Scientific Information Accessible
With DECIMER, Prof. Steinbeck and his team aim to have machines read chemical literature dating back to the 1950s and translate it into open databases. This sustainable approach preserves existing knowledge and makes it readily available to the global scientific community.
Discover the DECIMER AI tool at https://decimer.ai.