Multimodal Interactive Agent (MIA) is a fascinating project developed by researchers to help Artificial Intelligence (AI) better interact with humans and their environment. MIA blends visual perception, language comprehension, navigation, and manipulation to engage in realistic interactions with humans.
The researchers created a 3D virtual environment called Playhouse, in which humans and agents can interact. They collected recordings of real-time human interactions in this environment to train the AI.
The AI was trained using a combination of supervised prediction of human actions (imitation learning) and self-supervised learning. MIA achieved a success rate of over 70% in human-rated online interactions, and its performance improved significantly as the dataset and model size were scaled up.
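The supervised component amounts to fitting a policy to predict the action a human took given the current observation. The sketch below illustrates that idea in miniature with a linear softmax policy trained on synthetic "demonstration" data; the dimensions, the linear model, and all variable names are illustrative assumptions, not MIA's actual architecture, and the self-supervised component is omitted entirely.

```python
import numpy as np

# Hypothetical sketch of behavioral cloning (supervised prediction of human
# actions). A linear softmax policy is fit to (observation, action) pairs.
# Sizes and the data-generating process are made up for illustration.

rng = np.random.default_rng(0)
OBS_DIM, N_ACTIONS, N_STEPS = 8, 4, 200

# Synthetic stand-in for logged human demonstrations.
true_W = rng.normal(size=(OBS_DIM, N_ACTIONS))
obs = rng.normal(size=(N_STEPS, OBS_DIM))
actions = np.argmax(obs @ true_W, axis=1)  # the "human's" chosen actions

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# Train the policy with cross-entropy gradient descent on the demonstrations.
W = np.zeros((OBS_DIM, N_ACTIONS))
lr = 0.5
for _ in range(300):
    probs = softmax(obs @ W)                    # policy's action distribution
    onehot = np.eye(N_ACTIONS)[actions]
    grad = obs.T @ (probs - onehot) / N_STEPS   # cross-entropy gradient
    W -= lr * grad

# Fraction of demonstration steps where the policy picks the human's action.
accuracy = (np.argmax(obs @ W, axis=1) == actions).mean()
print(f"imitation accuracy: {accuracy:.2f}")
```

The real agent replaces the linear map with a large multimodal network over images and language, but the training signal is the same: match the recorded human action at each step.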
The researchers also found that as little as 12 hours of human interaction data was enough for the AI to quickly reach ceiling performance when learning new commands or objects.
To further explore MIA’s capabilities, the researchers are now focusing on developing methodologies to capture and analyze its open-ended behavior in human-agent interactions.
For more information, check out their paper on this exciting research.