Discovering new materials and drugs is a time-consuming and expensive process. Scientists often use machine learning to predict molecular properties and narrow down the molecules they need to synthesize and test in the lab. Now, researchers from MIT and the MIT-IBM Watson AI Lab have developed a new framework that can predict molecular properties and generate new molecules more efficiently than existing deep-learning approaches.
Traditionally, training a machine learning model to predict a molecule’s properties requires a large dataset of labeled molecular structures. However, these datasets are often hard to come by and expensive to create. In contrast, the system developed by the MIT researchers can effectively predict molecular properties using only a small amount of data. It captures the rules that govern how building blocks combine to produce valid molecules, which allows it to generate new molecules and predict their properties more efficiently.
The researchers found that their system outperformed other machine learning approaches, even when given small datasets with fewer than 100 samples. Their goal is to speed up the discovery of new molecules by using data-driven methods, reducing the need for costly experiments.
To achieve these results, the MIT team created a machine learning system that learns the “language” of molecules, also known as a molecular grammar. This grammar allows the system to generate viable molecules and predict their properties. Similar molecules share similar structures and grammar rules, and the system learns to understand these similarities.
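To make the idea of a grammar concrete, here is a minimal sketch in Python. It is not the researchers' grammar, which the article does not describe in detail; the rule set, the symbol names `<mol>` and `<chain>`, and the `generate` function are all hypothetical. The toy rules only produce short carbon chains that may end in an oxygen or nitrogen atom, written as SMILES-like strings.

```python
import random

# Hypothetical toy grammar (an assumption for illustration only).
# Keys are nonterminal symbols; each maps to a list of production rules,
# and each rule is a list of symbols to substitute in.
RULES = {
    "<mol>": [["C", "<chain>"]],
    "<chain>": [["C", "<chain>"], ["O"], ["N"], []],
}

def generate(rng, symbol="<mol>", max_depth=6):
    """Expand `symbol` into a string by applying randomly chosen rules."""
    if symbol not in RULES:
        return symbol  # terminal symbol: an atom, emitted as-is
    options = RULES[symbol]
    if max_depth <= 0:
        # Out of depth budget: prefer rules that contain no nonterminals,
        # so that the expansion is guaranteed to stop.
        terminal_only = [o for o in options if all(s not in RULES for s in o)]
        options = terminal_only or options
    rule = rng.choice(options)
    return "".join(generate(rng, s, max_depth - 1) for s in rule)

rng = random.Random(0)
print([generate(rng) for _ in range(5)])
```

Every string this sketch emits follows the rules by construction, which is the property a grammar offers: candidates are valid before any property prediction is attempted.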
The system learns the production rules for molecular grammar using reinforcement learning, a trial-and-error process in which the model is rewarded for behavior that gets it closer to achieving a goal. To make learning faster, the researchers took a hierarchical approach, dividing the molecular grammar into two parts: a general grammar and a molecule-specific grammar learned from a smaller dataset.
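The trial-and-error idea can be sketched with a gradient-bandit update, one of the simplest forms of reinforcement learning. This is an illustration under stated assumptions, not the researchers' training procedure: the three rule names and their reward values are made up, and the real system's reward signal is not described in the article.

```python
import math
import random

# Hypothetical candidate rules with made-up rewards (illustration only).
REWARDS = {"add_chain": 0.2, "add_branch": 0.5, "add_ring": 1.0}

def softmax(prefs):
    """Turn preference scores into selection probabilities."""
    m = max(prefs.values())
    exps = {k: math.exp(v - m) for k, v in prefs.items()}
    total = sum(exps.values())
    return {k: e / total for k, e in exps.items()}

def train(steps=2000, lr=0.1, seed=0):
    """Learn which rule to prefer by sampling rules and observing rewards."""
    rng = random.Random(seed)
    prefs = {rule: 0.0 for rule in REWARDS}
    baseline = 0.0  # running average of the rewards seen so far
    for t in range(1, steps + 1):
        probs = softmax(prefs)
        rules = list(probs)
        choice = rng.choices(rules, weights=[probs[r] for r in rules])[0]
        reward = REWARDS[choice]
        advantage = reward - baseline
        # Raise the chosen rule's preference when the reward beats the
        # baseline, and lower the others proportionally (and vice versa).
        for rule in prefs:
            indicator = 1.0 if rule == choice else 0.0
            prefs[rule] += lr * advantage * (indicator - probs[rule])
        baseline += (reward - baseline) / t
    return softmax(prefs)

probs = train()
print(probs)
```

Rules that earn above-average reward are chosen more often over time, which is the sense in which the model is "rewarded for behavior that gets it closer to achieving a goal."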
In experiments, the researchers’ system generated viable molecules and predicted their properties more accurately than popular machine learning approaches, even with small datasets. It was particularly effective at predicting physical properties of polymers. In future work, the researchers aim to extend their molecular grammar to include 3D geometry and to develop an interface for user feedback.
This research, funded by the MIT-IBM Watson AI Lab and Evonik, has promising implications for accelerating the discovery of new molecules and expanding the applications of molecular grammar beyond chemistry and materials science.