Defining and Measuring Distance from Calibration in Probabilistic Predictors
This study addresses a fundamental question: how should the distance from calibration of a probabilistic predictor be defined and measured? While perfect calibration is well understood, there is no consensus on how to quantify the distance from perfect calibration. Many calibration measures have been proposed, but it is unclear how they compare to one another, and some widely used measures, such as the expected calibration error (ECE), fail basic properties like continuity. In this article, we present a rigorous framework for analyzing calibration measures, inspired by the property-testing literature, and we propose a ground-truth notion of distance from calibration.
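A standard illustration of the continuity problem uses the ECE. The minimal sketch below computes the ECE of an empirical sample by grouping identical predictions; the helper name ece and the perturbation size eps are illustrative choices, not taken from the study. A constant predictor of 0.5 on balanced labels is perfectly calibrated, yet nudging each prediction by eps toward its own label moves the ECE from 0 to nearly 0.5, even though the perturbed predictor is within ℓ1 distance eps of the calibrated one.

```python
from collections import defaultdict


def ece(preds, labels):
    """Empirical ECE: sum over distinct prediction values v of
    |sum of (y_i - v) over samples predicted v|, divided by n."""
    n = len(preds)
    resid = defaultdict(float)
    for v, y in zip(preds, labels):
        resid[v] += y - v
    return sum(abs(r) for r in resid.values()) / n


labels = [0, 1] * 50                 # balanced binary outcomes
f = [0.5] * len(labels)              # constant predictor: perfectly calibrated here
eps = 1e-3
# Tiny label-dependent perturbation of the predictions.
f_eps = [0.5 + eps if y == 1 else 0.5 - eps for y in labels]

# Average l1 distance between the two predictors is eps.
dist = sum(abs(a - b) for a, b in zip(f, f_eps)) / len(f)

print(ece(f, labels))      # 0.0
print(ece(f_eps, labels))  # approximately 0.499, i.e. 0.5 - eps
print(dist)                # approximately 0.001
```

Because f_eps is within ℓ1 distance eps of the perfectly calibrated f, its distance from calibration is at most eps, so the jump in ECE reflects a discontinuity of the measure rather than a real change in calibration.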
Measuring Distance from Calibration:
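We define the distance from calibration of a predictor f as its ℓ1 distance to the nearest perfectly calibrated predictor, that is, the infimum of E|f(x) - g(x)| over all predictors g satisfying E[y | g(x)] = g(x). This distance serves as the ground truth for measuring calibration, and we call a calibration measure consistent if it is polynomially related to it. Within this framework, we identify three calibration measures that are consistent and can be estimated efficiently from samples: smooth calibration, interval calibration, and Laplace kernel calibration.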
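One way to make the ground-truth definition concrete: recalibrating f by replacing each prediction v with the average label among samples predicted v yields a perfectly calibrated predictor g, so E|f(x) - g(x)|, which equals the ECE, is an upper bound on the distance from calibration. The sketch below treats a finite sample as the distribution; the function names are hypothetical helpers for illustration only, and the bound it computes is generally not tight.

```python
from collections import defaultdict


def _group_stats(preds, labels):
    """Sum of labels and number of samples for each distinct prediction value."""
    totals, counts = defaultdict(float), defaultdict(int)
    for v, y in zip(preds, labels):
        totals[v] += y
        counts[v] += 1
    return totals, counts


def is_perfectly_calibrated(preds, labels, tol=1e-12):
    """Check E[y | f(x) = v] == v for every prediction value v on the sample."""
    totals, counts = _group_stats(preds, labels)
    return all(abs(totals[v] / counts[v] - v) <= tol for v in counts)


def recalibrate(preds, labels):
    """g(x) = E[y | f(x)] on the sample; g is perfectly calibrated."""
    totals, counts = _group_stats(preds, labels)
    return [totals[v] / counts[v] for v in preds]


def l1_distance(p, q):
    """Empirical l1 distance E|p(x) - q(x)| between two predictors."""
    return sum(abs(a - b) for a, b in zip(p, q)) / len(p)


preds = [0.2, 0.2, 0.8, 0.8]
labels = [1, 0, 0, 0]
g = recalibrate(preds, labels)
print(is_perfectly_calibrated(preds, labels))  # False
print(is_perfectly_calibrated(g, labels))      # True
upper = l1_distance(preds, g)  # upper bound on the distance from calibration
print(upper)                   # approximately 0.55
```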
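1. Smooth calibration: the supremum, over 1-Lipschitz weight functions w: [0,1] → [-1,1], of E[w(f(x)) (y - f(x))]. Smooth calibration provides a quadratic approximation to the ground-truth distance, and we show that a quadratic approximation is information-theoretically optimal in a certain model.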
2. Interval calibration: a binning-based measure that also accounts for the widths of the intervals used to bucket the predictions. Like smooth calibration, it provides a quadratic approximation to the ground-truth distance, which is likewise information-theoretically optimal in that model.
3. Laplace kernel calibration: the kernel calibration error computed with the Laplace kernel. It is also consistent and efficiently estimable. Although its guarantee is not a quadratic approximation, it is simple to estimate, and our framework provides theoretical justification for preferring it in practice.
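Smooth calibration on a finite sample is an optimization over Lipschitz weight functions. The sketch below is not the study's estimator; it is a simple dynamic program, chosen here for illustration, that restricts w to a grid of levels + 1 values in [-1, 1] at the observed prediction values. Any grid choice respecting the Lipschitz constraint between neighbouring prediction values extends by linear interpolation to a feasible w, so the result is a lower bound on the empirical smooth calibration error that tightens as levels grows. The name smooth_calibration_lower and the default levels=200 are hypothetical.

```python
from collections import defaultdict


def smooth_calibration_lower(preds, labels, levels=200):
    """Grid lower bound on sup_w mean(w(f(x)) * (y - f(x))), where w ranges
    over 1-Lipschitz functions [0,1] -> [-1,1] (hypothetical helper)."""
    n = len(preds)
    # Aggregate residuals y - f(x) by distinct prediction value.
    resid = defaultdict(float)
    for v, y in zip(preds, labels):
        resid[v] += y - v
    values = sorted(resid)
    step = 2.0 / levels  # spacing of the w-grid in [-1, 1]
    grid = [-1.0 + 2.0 * a / levels for a in range(levels + 1)]
    # best[a]: best partial objective with w(current value) = grid[a].
    best = [resid[values[0]] * w for w in grid]
    for prev, cur in zip(values, values[1:]):
        # Largest number of grid steps w may move while staying 1-Lipschitz.
        k = int((cur - prev) / step)
        r = resid[cur]
        best = [
            r * grid[b] + max(best[max(0, b - k):b + k + 1])
            for b in range(levels + 1)
        ]
    return max(best) / n


print(smooth_calibration_lower([0.5, 0.5], [0, 1]))  # 0 (perfectly calibrated)
print(smooth_calibration_lower([0.2, 0.8], [1, 0]))  # approximately 0.24
```

In the second call the best weight function has w(0.2) - w(0.8) = 0.6, the largest gap the Lipschitz constraint allows between those two points. Runtime grows with both the number of distinct predictions and levels, so this is meant for small illustrative samples.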
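Interval calibration combines binning with a penalty on bin widths. The sketch below assumes that interval calibration is the infimum, over partitions of [0,1] into intervals, of the binned calibration error plus the expected width of the interval containing f(x); treat that form as an assumption rather than the study's exact definition. Searching only equal-width partitions, where every interval has width 1/B, gives an upper bound on that infimum. The helper names and max_bins=64 are hypothetical.

```python
def binned_ece(preds, labels, num_bins):
    """Sum over equal-width bins I of |E[(y - f(x)) * 1{f(x) in I}]|."""
    n = len(preds)
    sums = [0.0] * num_bins
    for v, y in zip(preds, labels):
        j = min(int(v * num_bins), num_bins - 1)  # v = 1.0 falls in the last bin
        sums[j] += y - v
    return sum(abs(s) for s in sums) / n


def interval_calibration_upper(preds, labels, max_bins=64):
    """Upper bound on the assumed interval calibration: minimize binned ECE
    plus average bin width (1/B) over equal-width partitions with B bins."""
    return min(
        binned_ece(preds, labels, b) + 1.0 / b for b in range(1, max_bins + 1)
    )


print(binned_ece([0.2, 0.8], [1, 0], 2))              # approximately 0.8
print(interval_calibration_upper([0.5, 0.5], [0, 1]))  # 0.015625, i.e. 1/64
```

The width penalty is what distinguishes this from plain binned ECE: without it, ever finer bins would let the measure inherit ECE's discontinuity.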
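Kernel calibration error replaces the Lipschitz weight class with the unit ball of a reproducing kernel Hilbert space, which turns the supremum into a closed form: the square root of E[(y - f(x)) (y' - f(x')) K(f(x), f(x'))] over independent pairs. A minimal sketch of the plug-in (V-statistic) estimate with the Laplace kernel K(u, v) = exp(-|u - v| / bandwidth) follows; the bandwidth parameter and its default of 1.0 are assumptions, and the O(n^2) double loop is chosen for clarity, not speed.

```python
import math


def laplace_kernel_calibration(preds, labels, bandwidth=1.0):
    """Plug-in estimate of sqrt(E[(y - f)(y' - f') * exp(-|f - f'| / bandwidth)])."""
    n = len(preds)
    resid = [y - v for v, y in zip(preds, labels)]
    total = 0.0
    for i in range(n):
        for j in range(n):
            k = math.exp(-abs(preds[i] - preds[j]) / bandwidth)
            total += resid[i] * resid[j] * k
    # The Laplace kernel is positive definite, so total >= 0 up to rounding.
    return math.sqrt(max(total, 0.0) / (n * n))


print(laplace_kernel_calibration([0.5, 0.5], [0, 1]))  # 0.0
print(laplace_kernel_calibration([0.2, 0.8], [1, 0]))  # sqrt(0.32 * (1 - e^-0.6))
```

Including the diagonal terms i = j keeps the estimate nonnegative but biases it upward on small samples; dropping them gives an unbiased U-statistic for the squared quantity that can be slightly negative.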
This study establishes fundamental lower and upper bounds on measuring the distance from calibration of probabilistic predictors. By identifying smooth calibration, interval calibration, and Laplace kernel calibration as consistent calibration measures, it offers practical, efficient methods for estimating the distance from perfect calibration, and its framework provides a solid foundation for evaluating and comparing calibration measures.