New AI Tool Classifies the Effects of 71 Million ‘Missense’ Mutations
Uncovering the root causes of disease is a major challenge in human genetics. With millions of possible mutations and limited experimental data, it is still largely unknown which ones can lead to disease. This knowledge is crucial for faster diagnosis and the development of life-saving treatments. That’s why we’re introducing the AlphaMissense catalogue, built with our new AI model, AlphaMissense, which classifies missense variants.
Missense variants are genetic mutations that can affect the function of human proteins and potentially lead to diseases like cystic fibrosis, sickle-cell anemia, or cancer. Our AI model, AlphaMissense, categorized 89% of all 71 million possible missense variants as either likely pathogenic or likely benign, compared to only 0.1% confirmed by human experts.
AI tools that accurately predict the effects of genetic variants have the power to accelerate research across fields from molecular biology to clinical and statistical genetics. Traditional experiments to identify disease-causing mutations are expensive and time-consuming. By using AI predictions, researchers can get preliminary results for thousands of proteins at once, which helps them prioritize resources and speed up more complex studies.
We have made all our predictions freely available to the research community and open sourced the model code for AlphaMissense. Our AI model predicted the pathogenicity of all 71 million missense variants, classifying 89% of them. It determined that 57% were likely benign and 32% were likely pathogenic.
What is a missense variant?
A missense variant is a substitution of a single letter in DNA that changes an amino acid within a protein. Just as switching one letter can change a word and alter the meaning of a sentence, a single substitution can affect protein function. The average person carries more than 9,000 missense variants. Most are benign, but some are pathogenic and can disrupt protein function.
Classifying missense variants is crucial to understanding which protein changes can lead to disease. Of the 4 million missense variants already observed in humans, only 2% have been annotated as pathogenic or benign by experts, which is roughly 0.1% of all 71 million possible missense variants. The rest are considered “variants of unknown significance” due to the lack of experimental or clinical data. AlphaMissense provides the clearest picture to date by classifying 89% of variants using a threshold that yielded 90% precision on a database of known disease variants.
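The headline figures can be cross-checked with a few lines of arithmetic. The snippet below is only a back-of-the-envelope check that uses the rounded numbers quoted in this article; the variable names are ours and carry no special meaning.

```python
# Rounded figures quoted in the article.
POSSIBLE_VARIANTS = 71_000_000   # all possible missense variants
OBSERVED_VARIANTS = 4_000_000    # missense variants already seen in humans
ANNOTATED_PERCENT = 2            # share of observed variants annotated by experts
CLASSIFIED_PERCENT = 89          # share of possible variants AlphaMissense classifies

# Integer arithmetic keeps the counts exact.
annotated = OBSERVED_VARIANTS * ANNOTATED_PERCENT // 100
classified = POSSIBLE_VARIANTS * CLASSIFIED_PERCENT // 100

# Expert-annotated variants as a fraction of everything that is possible.
annotated_share = annotated / POSSIBLE_VARIANTS

print(f"Expert-annotated variants: {annotated:,}")              # 80,000
print(f"Share of all possible variants: {annotated_share:.2%}")  # 0.11%
print(f"Variants classified by AlphaMissense: {classified:,}")  # 63,190,000
```

Roughly 80,000 expert-annotated variants set against 71 million possible ones is where the 0.1% figure comes from.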
AlphaMissense is based on our AlphaFold model, which accurately predicted protein structures from amino acid sequences. We fine-tuned AlphaFold to predict the pathogenicity of missense variants. The fine-tuned model leverages databases of related protein sequences and the structural context of variants to produce a score between 0 and 1, indicating the likelihood of a variant being pathogenic. The continuous score allows users to choose a threshold for classifying variants as pathogenic or benign according to their accuracy requirements.
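To make the idea of choosing a threshold concrete, here is a minimal sketch of how a user might pick a cutoff that reaches a target precision on a set of variants with known labels, and then sort scores into classes. Everything in it is an assumption made for illustration: the function names, the toy scores and labels, the "unclassified" label, and the benign cutoff of 0.3 are hypothetical and are not the values or code used by AlphaMissense.

```python
from typing import Optional, Sequence


def precision_at(scores: Sequence[float], labels: Sequence[int],
                 threshold: float) -> Optional[float]:
    """Fraction of variants scored >= threshold that are truly pathogenic."""
    called = [label for score, label in zip(scores, labels) if score >= threshold]
    if not called:
        return None  # nothing was called pathogenic at this threshold
    return sum(called) / len(called)


def lowest_threshold_for(scores: Sequence[float], labels: Sequence[int],
                         target_precision: float) -> Optional[float]:
    """Smallest observed score that, used as a cutoff, meets the target precision."""
    for candidate in sorted(set(scores)):
        precision = precision_at(scores, labels, candidate)
        if precision is not None and precision >= target_precision:
            return candidate
    return None  # no cutoff reaches the target on this data


def classify(score: float, benign_below: float, pathogenic_from: float) -> str:
    """Three-way call; scores between the two cutoffs stay unclassified."""
    if score >= pathogenic_from:
        return "likely pathogenic"
    if score < benign_below:
        return "likely benign"
    return "unclassified"


# Toy data (made up): 1 = known pathogenic, 0 = known benign.
toy_scores = [0.05, 0.10, 0.20, 0.35, 0.40, 0.60, 0.70, 0.80, 0.90, 0.95]
toy_labels = [0, 0, 0, 1, 0, 0, 1, 1, 1, 1]

cutoff = lowest_threshold_for(toy_scores, toy_labels, target_precision=0.9)
print(cutoff)  # 0.7 on this toy data

for score in (0.12, 0.45, 0.88):
    print(score, classify(score, benign_below=0.3, pathogenic_from=cutoff))
```

A stricter precision target pushes the cutoff up and leaves more variants unclassified, which is the trade-off the continuous score lets each user make for themselves.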
AlphaMissense outperforms other computational methods for predicting variant effects. It achieved state-of-the-art predictions in genetic and experimental benchmarks without explicit training on such data. Our tool surpassed other methods when classifying variants from the ClinVar database and predicting results from lab experiments.
AlphaMissense is part of our ongoing efforts to advance protein research. We have made its predictions freely available to the scientific community and partnered with EMBL-EBI to make them more accessible through the Ensembl Variant Effect Predictor. In addition to the missense mutations lookup table, we have shared expanded predictions for over 216 million single amino acid sequence substitutions across human proteins.
We have also included the average prediction for each gene, indicating how essential it is for the organism’s survival. These predictions have the potential to improve the diagnosis of rare genetic disorders and aid in the discovery of new disease-causing genes.
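As a rough illustration of what a per-gene average means, the sketch below groups variant scores by gene and takes the mean of each group. The row layout, the function name, and the gene names and scores are placeholders invented for this example; they do not reflect the format of the released files.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple


def mean_score_per_gene(rows: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Average the pathogenicity scores of all variants that fall in each gene."""
    scores_by_gene: Dict[str, List[float]] = defaultdict(list)
    for gene, score in rows:
        scores_by_gene[gene].append(score)
    return {gene: mean(scores) for gene, scores in scores_by_gene.items()}


# Hypothetical (gene, score) pairs; real tables hold many variants per gene.
toy_rows = [
    ("GENE_A", 0.875),
    ("GENE_A", 0.625),
    ("GENE_B", 0.125),
    ("GENE_B", 0.25),
    ("GENE_B", 0.375),
]

per_gene = mean_score_per_gene(toy_rows)
for gene, average in sorted(per_gene.items()):
    print(f"{gene}: {average:.2f}")
```

In this toy example GENE_A averages higher than GENE_B, meaning substitutions in it are predicted to be damaging more often, which is the sense in which a high average points to a gene that tolerates little change.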
While our predictions are not intended for clinical use and should be interpreted alongside other evidence, they can accelerate research into genetic diseases. We have collaborated with Genomics England to study the genetics of rare diseases, and their evaluation confirmed the accuracy and consistency of our predictions.
Ultimately, we hope that AlphaMissense, along with other tools, will contribute to a better understanding of diseases and the development of life-saving treatments.