Humans are remarkably good at processing multiple sound sources at once, whether they are composing or analyzing music. Our brains can separate individual sound sources from a mixture and can also synthesize multiple sources into a coherent combination. Mathematically, this knowledge can be expressed through the joint probability density of the sources. Until now, however, no deep learning model has been able to perform both source separation and music generation.
In the field of musical composition or generation, existing models focus on learning the distribution over the mixtures, which accurately models the mixture but loses information about the individual sources. On the other hand, models designed for source separation learn a single model for each source distribution and rely on the mixture during inference. This approach lacks crucial details about the interdependence of the sources and makes it challenging to generate mixtures.
To tackle this limitation, researchers from the GLADIA Research Lab at Sapienza University of Rome have developed the Multi-Source Diffusion Model (MSDM). The model is trained to learn the joint probability density of sources that share a context, referred to as the prior distribution. It performs generation by sampling from the prior, and separation by conditioning the prior on the mixture and sampling from the resulting posterior distribution. A single model therefore covers both tasks, which is a significant advance for audio models.
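The relationship between the two tasks can be illustrated on a toy problem. The following is a minimal sketch, not the MSDM implementation: it assumes a two-source, zero-mean Gaussian prior in place of a learned diffusion model, because the posterior given the mixture is then available in closed form. The function names and the covariance values are hypothetical.

```python
import numpy as np


def sample_prior(cov, n, rng):
    """Generation: draw n joint samples (x1, x2) from the toy Gaussian prior."""
    return rng.multivariate_normal(np.zeros(2), cov, size=n)


def sample_posterior(cov, y, n, rng):
    """Separation: draw (x1, x2) from the prior conditioned on x1 + x2 = y."""
    # x1 and the mixture y = x1 + x2 are jointly Gaussian, so x1 | y is Gaussian.
    var_y = cov.sum()                    # Var(x1 + x2)
    cov_x1_y = cov[0, 0] + cov[0, 1]     # Cov(x1, x1 + x2)
    mean = cov_x1_y / var_y * y          # E[x1 | y]
    var = cov[0, 0] - cov_x1_y ** 2 / var_y  # Var(x1 | y)
    x1 = mean + np.sqrt(var) * rng.standard_normal(n)
    x2 = y - x1                          # the mixture constraint fixes x2
    return np.stack([x1, x2], axis=1)


rng = np.random.default_rng(0)
cov = np.array([[1.0, 0.3], [0.3, 2.0]])  # assumed toy covariance between sources

generated = sample_prior(cov, 50_000, rng)          # unconditional samples
y = 1.5                                             # an observed mixture value
separated = sample_posterior(cov, y, 50_000, rng)   # samples consistent with y
```

Every row of `separated` sums to the observed mixture, while `generated` is unconstrained. In the real model the prior is not Gaussian and is accessed only through a learned score function, so both kinds of sampling have to be done iteratively.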
The researchers conducted experiments using the Slakh2100 dataset, which consists of 2100 tracks and is widely used for source separation tasks. The model’s foundation lies in estimating the joint distribution of the sources, which lets it handle several inference tasks: the classical total inference tasks of generation and separation, and also source imputation, in which a subset of the sources is generated given the others.
The model is a diffusion-based generative model trained with a technique called “denoising score matching.” Rather than approximating the target distribution itself, this technique approximates its “score” function, that is, the gradient of the log-density with respect to the data. Additionally, the researchers introduced a novel sampling method based on Dirac delta functions to improve source separation performance.
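Denoising score matching can be sketched in a few lines. This is an illustrative example under strong assumptions, not the paper's training code: the data are two correlated Gaussian “sources,” the score model is a hypothetical linear map fitted by least squares instead of a neural network, and a single noise level `sigma` is used. The data are perturbed with Gaussian noise, and the model is regressed onto the score of the perturbation kernel, which is `-eps / sigma`.

```python
import numpy as np


def fit_linear_score_dsm(clean, sigma, rng):
    """Fit s(x) = x @ W by minimizing the denoising score matching loss."""
    eps = rng.standard_normal(clean.shape)
    noisy = clean + sigma * eps
    # Score of the Gaussian perturbation kernel q(noisy | clean):
    # -(noisy - clean) / sigma**2 == -eps / sigma
    target = -eps / sigma
    # Least squares solves min_W || noisy @ W - target ||^2 (the DSM objective
    # restricted to linear score models).
    W, *_ = np.linalg.lstsq(noisy, target, rcond=None)
    return W


rng = np.random.default_rng(0)
cov = np.array([[1.0, 0.8], [0.8, 1.0]])  # assumed toy covariance between sources
clean = rng.multivariate_normal(np.zeros(2), cov, size=200_000)
sigma = 0.5

W = fit_linear_score_dsm(clean, sigma, rng)

# For Gaussian data the score of the noised distribution is known exactly:
# grad log p_sigma(x) = -(cov + sigma^2 I)^{-1} x
expected = -np.linalg.inv(cov + sigma ** 2 * np.eye(2))
print(np.round(W, 2))
print(np.round(expected, 2))
```

The fitted matrix should land close to the analytic score even though the regression target is built only from the injected noise, which is the point of the technique: the score of the data distribution is never needed during training.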
The model’s performance on separation tasks was comparable to other state-of-the-art regressor models. However, the researchers noted that the limited availability of contextual data currently hinders the algorithm’s performance. To address this, they considered pre-separating mixtures and using them as a dataset. Overall, the Multi-Source Diffusion Model developed by the GLADIA Research Lab is a novel paradigm in the field of separation and generation of music. The researchers hope that their work will inspire further research in this domain.
To learn more about this research, you can read the full paper and visit the project website. All credit for this research goes to the researchers involved in the project.