Score-based generative models (SBGMs), which include diffusion models, are a powerful family of models for representing complex distributions in high-dimensional spaces. They simulate a stochastic differential equation (SDE) that transforms a source density, typically assumed to be Gaussian, into the target density. That Gaussian-source assumption is also their main limitation: it makes SBGMs a poor fit for modeling the dynamics of real-world systems, where the starting distribution is usually not Gaussian.
Continuous normalizing flows (CNFs), also known as flow-based generative models, take a different route. They assume a deterministic continuous-time generative process and transform the source density into the target density with an ordinary differential equation (ODE). CNFs were traditionally held back by inefficient simulation-based training objectives, which require computationally expensive integration of the ODE during training. More recent work has introduced simulation-free training objectives that let CNFs compete with SBGMs even without a Gaussian source.
These simulation-free objectives still fall short in one respect: they do not learn stochastic dynamics, which are important for modeling real systems.
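The difference between the deterministic ODE sampling used by CNFs and the stochastic SDE sampling used by SBGMs can be illustrated with a toy integrator. The sketch below is only an illustration: `integrate` and `toy_drift` are hypothetical names, and the constant drift stands in for what would be a learned neural network in practice. With `sigma=0` it performs plain Euler integration of an ODE; with `sigma>0` it performs Euler-Maruyama integration of an SDE with constant diffusion.

```python
import numpy as np

def integrate(x0, drift, sigma=0.0, n_steps=100, rng=None):
    """Integrate dx = drift(t, x) dt + sigma dW from t=0 to t=1.

    sigma == 0 gives deterministic Euler steps (ODE, as in a CNF);
    sigma > 0 gives Euler-Maruyama steps (SDE, as in an SBGM).
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.array(x0, dtype=float)
    dt = 1.0 / n_steps
    for k in range(n_steps):
        t = k * dt
        x = x + drift(t, x) * dt
        if sigma > 0:
            # Brownian increment with variance dt per coordinate.
            x = x + sigma * np.sqrt(dt) * rng.standard_normal(x.shape)
    return x

def toy_drift(t, x):
    # Hypothetical stand-in for a learned vector field:
    # a constant velocity that shifts every point by +2 over unit time.
    return np.full_like(x, 2.0)

rng = np.random.default_rng(0)
start = np.zeros((10000, 1))
ode_samples = integrate(start, toy_drift)                      # all points land at 2
sde_samples = integrate(start, toy_drift, sigma=1.0, rng=rng)  # points scatter around 2
```

In the ODE case every sample with the same starting point follows the same path, while in the SDE case identical starting points spread out, which is the kind of behavior a purely deterministic model cannot represent.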
To overcome these challenges, the researchers propose simulation-free score and flow matching ([SF]²M), a simulation-free objective for the Schrödinger bridge (SB) problem. The SB problem asks for the most likely stochastic evolution between a source and a target probability distribution, and it has applications in modeling stochastic dynamical systems, mean field games, and generative modeling. [SF]²M generalizes both the simulation-free objectives for CNFs and the denoising training objective for diffusion models so that they handle stochastic dynamics and arbitrary source distributions.
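For readers who want the formal statement, one standard way to write the Schrödinger bridge problem is as a KL minimization over path measures. The notation here is an assumption introduced for illustration and is not taken from the article: q_0 and q_1 are the source and target distributions, Q is a reference process (commonly a Brownian motion with diffusion scale sigma), and P ranges over stochastic processes whose marginals at times 0 and 1 are p_0 and p_1.

```latex
\[
  \mathbb{P}^{\star}
  \;=\;
  \operatorname*{arg\,min}_{\mathbb{P} \,:\, p_0 = q_0,\; p_1 = q_1}
  \mathrm{KL}\!\left( \mathbb{P} \,\|\, \mathbb{Q} \right)
\]
```

"Most likely evolution" then means the process that matches both endpoint distributions while staying as close as possible, in KL divergence, to the reference dynamics.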
By characterizing the Schrödinger bridge as the Markovization of a mixture of Brownian bridges, the researchers exploit the relationship between the SB problem and entropic optimal transport (OT). [SF]²M relies on static entropic OT maps between the source and target distributions, which can be approximated efficiently with the Sinkhorn algorithm or with stochastic algorithms. This avoids simulating an SDE on every training iteration, which makes the method more practical.
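The ingredients named above (a static entropic OT plan, pairs drawn from it, and Brownian bridges between the paired points) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the cost is squared Euclidean distance, and the regularization is set to `2 * sigma**2`, which is the usual correspondence between entropic OT and a Schrödinger bridge with Brownian reference of scale `sigma`. The regression targets follow from the Gaussian marginal of a Brownian bridge, N((1-t) x0 + t x1, sigma² t (1-t) I): the score is the Gaussian score, and the flow is the velocity that transports that Gaussian path.

```python
import numpy as np

def sinkhorn_plan(x0, x1, eps, n_iters=200):
    """Entropic OT plan between two uniformly weighted point clouds."""
    n, m = len(x0), len(x1)
    a, b = np.full(n, 1.0 / n), np.full(m, 1.0 / m)
    # Squared Euclidean cost between every source/target pair.
    cost = ((x0[:, None, :] - x1[None, :, :]) ** 2).sum(axis=-1)
    K = np.exp(-cost / eps)  # Gibbs kernel; can underflow for very small eps
    u, v = np.ones(n), np.ones(m)
    for _ in range(n_iters):
        u = a / (K @ v)    # rescale rows toward marginal a
        v = b / (K.T @ u)  # rescale columns to match marginal b
    return u[:, None] * K * v[None, :]

def sample_pairs(plan, batch_size, rng):
    """Draw index pairs (i, j) with probability proportional to plan[i, j]."""
    flat = plan.ravel() / plan.sum()
    idx = rng.choice(flat.size, size=batch_size, p=flat)
    return np.unravel_index(idx, plan.shape)

def bridge_targets(x0, x1, t, sigma, rng):
    """Sample x_t on Brownian bridges and return flow and score targets."""
    t = t[:, None]
    mean = (1.0 - t) * x0 + t * x1
    var = sigma**2 * t * (1.0 - t)
    xt = mean + np.sqrt(var) * rng.standard_normal(x0.shape)
    score = (mean - xt) / var  # gradient of the Gaussian log-density at xt
    flow = (1.0 - 2.0 * t) / (2.0 * t * (1.0 - t)) * (xt - mean) + (x1 - x0)
    return xt, flow, score

rng = np.random.default_rng(0)
sigma = 1.0
source = rng.standard_normal((64, 2))        # toy source samples
target = rng.standard_normal((64, 2)) + 3.0  # toy target samples
plan = sinkhorn_plan(source, target, eps=2 * sigma**2)
i, j = sample_pairs(plan, batch_size=16, rng=rng)
t = rng.uniform(0.05, 0.95, size=16)  # avoid the endpoints, where var -> 0
xt, flow, score = bridge_targets(source[i], target[j], t, sigma, rng)
```

In a training loop, two networks would be regressed onto `flow` and `score` at the inputs `(t, xt)`. No SDE is integrated at any point, which is what "simulation-free" refers to.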
The researchers demonstrate the effectiveness of [SF]²M on both synthetic and real-world datasets. On synthetic data, [SF]²M outperforms previous approaches on generative modeling metrics and provides a more accurate approximation of the true Schrödinger bridge. On real data, they apply [SF]²M to sequences of cross-sectional measurements, that is, unpaired time series observations. The researchers present this as a significant advance, since previous methods could only handle static or low-dimensional dynamic settings.
The researchers also show that [SF]²M scales to thousands of gene dimensions, which makes it suitable for modeling cell dynamics, and that it does so without requiring simulation during training. They use a static manifold geodesic map for cell interpolation in the dynamic setting, which demonstrates a practical use of Schrödinger bridge approximations with non-Euclidean costs. Finally, they show that [SF]²M can directly model and reconstruct the gene-gene interaction network, providing insight into the dynamics of cells.
The researchers have made their code and examples available on GitHub; see their paper and repository for more details.