New Research Proposes a System to Determine the Accuracy of Predictive AI in Medical Settings
Artificial intelligence (AI) has the potential to enhance work in various industries, including healthcare. However, to ensure the safe and responsible integration of AI tools into the workplace, there is a need for more robust methods to understand when and how they can be most useful.
In healthcare, the question of when an AI tool is more accurate than a human clinician is particularly important in high-stakes tasks. To address this, a paper by Google Research published in Nature Medicine introduces CoDoC (Complementarity-driven Deferral-to-Clinical Workflow), an AI system that learns when to rely on predictive AI tools and when to defer to a human clinician for the most accurate interpretation of medical images.
CoDoC: An Add-on Tool for Human-AI Collaboration
CoDoC is designed to improve the reliability of AI models without requiring any redesign of the underlying AI tool itself. This means that healthcare providers, who may not be machine learning experts, can deploy and run the system on a single computer. Training also requires only a small amount of data, typically just a few hundred examples. Additionally, CoDoC is compatible with any proprietary AI model and does not require access to the model's inner workings or the data it was trained on.
Determining AI Accuracy in Comparison to Clinicians
CoDoC is a simple and usable system that improves reliability by allowing predictive AI tools to recognize their own limitations. It focuses on scenarios where a clinician uses an AI tool to interpret a medical image, such as examining a chest X-ray to decide whether a tuberculosis test is needed.
CoDoC requires three inputs for each case in the training dataset: the predictive AI’s confidence score, the clinician’s interpretation of the image, and the ground truth of whether the disease was present. Notably, CoDoC does not require access to any medical images.
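To make the three inputs concrete, the sketch below shows how a deferral rule could in principle be learned from such a dataset. This is a simplified illustration, not the published CoDoC algorithm: the data-class fields, the fixed-width confidence bins, and the 0.5 decision threshold are all assumptions made for the example.

```python
# Illustrative sketch only: a simplified stand-in for confidence-based
# deferral, not the actual CoDoC method. Field names, the binning scheme,
# and the 0.5 threshold are hypothetical choices for this example.
from dataclasses import dataclass
from typing import List

@dataclass
class TrainingCase:
    ai_confidence: float      # predictive AI's confidence score in [0, 1]
    clinician_positive: bool  # clinician's interpretation of the image
    disease_present: bool     # ground truth

def learn_deferral_bins(cases: List[TrainingCase], n_bins: int = 10) -> List[bool]:
    """For each confidence bin, record whether the clinician was at least as
    accurate as the AI on the training cases falling in that bin (True = defer)."""
    defer = []
    for b in range(n_bins):
        lo, hi = b / n_bins, (b + 1) / n_bins
        in_bin = [c for c in cases
                  if lo <= c.ai_confidence < hi
                  or (b == n_bins - 1 and c.ai_confidence == 1.0)]
        if not in_bin:
            defer.append(True)  # no evidence in this bin: fall back to the clinician
            continue
        ai_correct = sum((c.ai_confidence >= 0.5) == c.disease_present for c in in_bin)
        clin_correct = sum(c.clinician_positive == c.disease_present for c in in_bin)
        defer.append(clin_correct >= ai_correct)
    return defer

def decide(ai_confidence: float, clinician_positive: bool, defer: List[bool]) -> bool:
    """Final call for a new case: the AI's prediction, or the clinician's
    interpretation when the learned rule says to defer in this confidence bin."""
    b = min(int(ai_confidence * len(defer)), len(defer) - 1)
    return clinician_positive if defer[b] else ai_confidence >= 0.5
```

Note that, as the article says, nothing here touches the medical images themselves: the rule is learned purely from confidence scores, clinician opinions, and ground-truth labels.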
Increased Accuracy and Efficiency
Tests with real-world datasets have demonstrated that CoDoC's combination of human expertise and predictive AI yields greater accuracy than either alone. In one test on a mammography dataset, CoDoC reduced false positives by 25%. In hypothetical simulations, it also reduced the number of cases requiring clinician review by two thirds.
Responsible Development of AI for Healthcare
CoDoC shows potential for adapting and improving performance across different demographic populations, clinical settings, medical imaging equipment, and disease types. To safely bring CoDoC and similar technologies to real-world medical settings, collaboration between healthcare providers, manufacturers, and AI researchers is crucial. Rigorous evaluation and validation of these systems are necessary to ensure their effectiveness and benefits.
Learn more about CoDoC: http://github.com/deepmind/codoc