In recent years, diffusion generative models like DALL-E 2 and Midjourney have gained popularity for their ability to create striking images from text prompts. However, researchers at MIT’s Abdul Latif Jameel Clinic for Machine Learning in Health (Jameel Clinic) believe that these models can do more than produce surreal images. They have developed a new molecular docking model called DiffDock, which they say could transform the development of new drugs and reduce the risk of adverse side effects.
DiffDock, which will be presented at the 11th International Conference on Learning Representations, approaches computational drug design differently from the tools pharmaceutical companies currently use, presenting an opportunity for a major overhaul of the traditional drug development pipeline.
Molecular docking is a technique used to predict how a drug molecule and a protein bind together. It has been instrumental in the development of drugs for diseases like HIV and cancer. However, drug development remains time-consuming and expensive, with many drug candidates failing clinical trials. Researchers are eager to find faster and more efficient ways to sift through the vast number of potential drug molecules.
Most current molecular docking tools use a “sampling and scoring” approach, generating different poses of the drug molecule in relation to the protein and scoring each one. DiffDock instead treats the problem as a generative modeling task. Rather than aiming for a single solution, DiffDock predicts multiple poses, each with its own probability. Keeping several candidate poses reduces the likelihood of missing the correct one and accounts for uncertainties in the process.
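The idea of keeping several candidate poses, each with a probability, can be illustrated with a minimal Python sketch. This is not DiffDock's code: the `Pose` class, the `top_k` helper, the pose names, and the probability values are all hypothetical and exist only to show how ranked candidates differ from a single answer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pose:
    name: str           # hypothetical label for a candidate pose
    probability: float  # model-assigned probability (made-up values below)


def top_k(poses, k):
    """Return the k most probable poses, most probable first."""
    return sorted(poses, key=lambda p: p.probability, reverse=True)[:k]


# Made-up candidates standing in for the output of a generative docking model.
candidates = [
    Pose("pose_a", 0.10),
    Pose("pose_b", 0.55),
    Pose("pose_c", 0.05),
    Pose("pose_d", 0.30),
]

# A single-answer method would keep only the first entry; a ranked list
# keeps the alternatives available for follow-up.
for pose in top_k(candidates, 2):
    print(pose.name, pose.probability)
```

With `k=1` the sketch reduces to a single-answer prediction; larger values of `k` keep lower-probability alternatives in play.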
Diffusion generative models work by gradually adding random noise to an image and then training a neural network to recover the original image. In the case of DiffDock, the model is trained on various ligand and protein poses to identify multiple binding sites on proteins. Instead of generating new image data, DiffDock generates new 3D coordinates that describe how the ligand could be positioned and angled for binding.
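The noising half of that process can be sketched in a few lines of Python. This is a toy illustration that assumes plain Gaussian noise added directly to atom coordinates; DiffDock's actual noise process and network are not described here, and the function names, the three-atom "ligand", and the noise levels are all made up. The learned denoising step is omitted.

```python
import math
import random


def add_noise(coords, sigma, rng):
    """Return a copy of coords with independent Gaussian noise of scale sigma."""
    return [tuple(c + rng.gauss(0.0, sigma) for c in atom) for atom in coords]


def forward_trajectory(coords, sigmas, seed=0):
    """Noise the clean coordinates once per noise level, in the order given."""
    rng = random.Random(seed)
    return [add_noise(coords, s, rng) for s in sigmas]


def rmsd(a, b):
    """Root-mean-square deviation between two equally sized coordinate lists."""
    sq = sum((x - y) ** 2 for p, q in zip(a, b) for x, y in zip(p, q))
    return math.sqrt(sq / len(a))


# Hypothetical three-atom "ligand"; the coordinates are invented for the demo.
ligand = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
sigmas = [0.0, 0.5, 2.0, 8.0]

for sigma, noisy in zip(sigmas, forward_trajectory(ligand, sigmas)):
    print(f"sigma={sigma:4.1f}  RMSD from clean pose={rmsd(ligand, noisy):.2f}")
```

At a noise level of zero the pose is unchanged, and larger noise levels typically push it further from the original. A diffusion model is trained to reverse that corruption, so that at generation time it can start from a random pose and refine it step by step.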
DiffDock’s approach opens up possibilities for leveraging other AI models like AlphaFold 2, which predicts protein structures. Previous molecular docking tools have struggled to perform better than chance when docking ligands to computationally predicted protein structures. DiffDock, on the other hand, outperforms existing docking models and maintains high accuracy even with computationally generated unbound protein structures.
These advancements bring exciting opportunities for biological research and drug discovery. They offer the potential to accelerate the process of identifying drugs’ mechanisms of action, which is critical for understanding their effects and potential side effects. DiffDock, in combination with protein folding techniques, could greatly simplify the process of identifying off-target effects, reducing the cost and time involved in clinical trials.
Experts in the field, like Tim Peterson from the Washington University in St. Louis School of Medicine, have praised DiffDock for its ability to expedite the drug target identification process. With DiffDock, researchers can virtually screen multiple proteins and triage potential drug candidates in a matter of days, eliminating the need for time-consuming experiments.
The work on DiffDock was conducted by MIT PhD students Gabriele Corso, Hannes Stärk, and Bowen Jing, under the guidance of Professor Regina Barzilay and Professor Tommi Jaakkola. The research was supported by various organizations, including the Machine Learning for Pharmaceutical Discovery and Synthesis consortium, the Jameel Clinic, and the DARPA Accelerated Molecular Discovery program.