Advancements in Self-Supervised Learning to Enhance Machine-Learning Models

MIT researchers, in collaboration with the MIT-IBM Watson AI Lab and IBM Research, have developed a new technique for learning from unlabeled audio and visual data that could improve machine learning models used in applications such as speech recognition and object detection. The technique, called the Contrastive Audio-Visual Masked Autoencoder (CAV-MAE), combines two self-supervised learning approaches, contrastive learning and masked data modeling, so that models can scale to large datasets without manual annotation. By training on large collections of audio and video clips from YouTube, CAV-MAE learns to extract meaningful latent representations and map corresponding audio and visual inputs close together in a shared high-dimensional space. The researchers believe this technique could have applications in action recognition for sports, education, entertainment, motor vehicles, and public safety, and could potentially extend to modalities beyond audio and visual data.
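To make the joint objective concrete, here is a minimal NumPy sketch of the two loss components described above: a symmetric contrastive loss that pulls paired audio and video embeddings together, and a masked-autoencoder reconstruction loss computed only on masked patches. This is an illustrative toy, not the actual CAV-MAE implementation (which uses transformer encoders and decoders); all function names, dimensions, and the loss weighting here are assumptions for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

def l2_normalize(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def contrastive_loss(audio_emb, video_emb, temperature=0.07):
    """Symmetric InfoNCE-style loss: each audio clip should match
    its own video clip (the diagonal of the similarity matrix)."""
    a = l2_normalize(audio_emb)
    v = l2_normalize(video_emb)
    logits = a @ v.T / temperature                      # (B, B) similarities
    idx = np.arange(len(a))                             # true pairs on diagonal
    log_sm_av = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    log_sm_va = logits.T - np.log(np.exp(logits.T).sum(axis=1, keepdims=True))
    return (-log_sm_av[idx, idx].mean() - log_sm_va[idx, idx].mean()) / 2

def masked_reconstruction_loss(patches, reconstructed, mask):
    """Mean-squared error on masked patches only, as in a masked autoencoder."""
    per_patch_mse = ((patches - reconstructed) ** 2).mean(axis=-1)  # (B, P)
    return (per_patch_mse * mask).sum() / mask.sum()

# Toy batch: 8 paired clips, 128-dim embeddings, 16 patches of dim 32.
B, D, P, pd = 8, 128, 16, 32
audio_emb = rng.standard_normal((B, D))
video_emb = rng.standard_normal((B, D))
patches = rng.standard_normal((B, P, pd))
recon = rng.standard_normal((B, P, pd))          # stand-in for decoder output
mask = (rng.random((B, P)) < 0.75).astype(float) # ~75% of patches masked

# Joint objective: sum of both losses (real models weight these terms).
total = contrastive_loss(audio_emb, video_emb) + \
        masked_reconstruction_loss(patches, recon, mask)
print(float(total))
```

In a real training loop, the embeddings and reconstructions would come from learned encoders and a decoder, and gradients of this combined loss would update both; the sketch only shows how the two self-supervised signals are combined into one objective.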

