Article:
Introduction to Optimal Transport (OT) Theory in AI
Optimal transport (OT) theory is a valuable tool in the field of machine learning. It allows us to study and understand maps that efficiently transfer a probability measure from one space to another. One of the key theorems in this field is Brenier’s theorem, which tells us that when the ground cost is the squared-Euclidean distance, the most effective map to transform a continuous measure into another is the gradient of a convex function.
Inspired by Brenier’s theorem, recent research by Makkuva et al. (2020) and Korotin et al. (2020) has focused on using maps defined by input convex neural networks (ICNNs) to exploit this result. ICNNs, as defined by Amos et al. in 2017, are then fitted using stochastic gradient descent (SGD) with samples. However, fitting OT maps with ICNNs presents several challenges. These challenges include the constraints imposed on ICNNs, the need to approximate the conjugate, and the limitation to squared-Euclidean cost.
At this point, it is worth questioning the validity of using Brenier’s result, which is based on densities, to constrain the architecture of candidate maps fitted on samples. To address these limitations, we propose a novel approach to estimating OT maps. Our approach introduces a regularizer called the Monge gap of a map, which measures how far a map deviates from the desired properties of an OT map. By dropping all architecture requirements for ICNNs, we can simply minimize the distance (e.g., Sinkhorn divergence) between the reference measure and the map, regularized by the Monge gap.
Through our research, we have demonstrated that our simplified pipeline outperforms other baseline methods. We have observed significant improvements in practical applications.
Using this new approach, we can better understand and utilize optimal transport theory in the context of artificial intelligence. Our method offers a simpler and more effective way to estimate OT maps, overcoming the challenges associated with traditional ICNN-based approaches.