The NTU Singapore team has developed a computer program, DIverse yet Realistic Facial Animations (DIRFA), that uses artificial intelligence (AI) to create lifelike talking-face videos from just a photo and an audio clip of a speaker. The program captures realistic facial expressions and head movements, outperforming earlier models that struggled to convey emotion and head poses.
The team trained DIRFA on more than one million audiovisual clips of over 6,000 people, which taught the program to predict cues from speech and associate them with matching facial expressions and head movements. The program could change how multimedia communication is conducted across a range of industries and domains, from building more sophisticated virtual assistants to improving user experiences. DIRFA could also serve individuals with speech or facial disabilities, helping them convey their thoughts and emotions through expressive avatars.
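At its core, an audio-driven animation system like DIRFA learns a mapping from per-frame audio features to facial animation parameters. The sketch below is a toy illustration of that data flow only, not DIRFA's actual architecture: the feature dimensions, parameter names, and the linear "model" are all assumptions standing in for a network learned from large audiovisual corpora.

```python
import numpy as np

# Hypothetical dimensions (assumptions, not from the paper):
N_AUDIO_FEATURES = 13   # e.g. MFCC coefficients per audio frame
N_FACE_PARAMS = 6       # e.g. jaw, lips, brows, head yaw/pitch/roll

rng = np.random.default_rng(0)
# Random linear projection standing in for learned model weights.
W = rng.standard_normal((N_AUDIO_FEATURES, N_FACE_PARAMS)) * 0.1

def animate(audio_frames: np.ndarray) -> np.ndarray:
    """Map a (T, N_AUDIO_FEATURES) audio feature sequence to a
    (T, N_FACE_PARAMS) sequence of per-frame animation parameters."""
    raw = audio_frames @ W
    # Smooth each parameter over time so expression and head motion
    # vary gradually rather than jittering frame to frame.
    kernel = np.ones(5) / 5.0
    return np.apply_along_axis(
        lambda col: np.convolve(col, kernel, mode="same"), 0, raw
    )

# Usage: 100 frames of synthetic audio features -> animation parameters.
audio = rng.standard_normal((100, N_AUDIO_FEATURES))
params = animate(audio)
print(params.shape)  # (100, 6)
```

A real system would replace the random projection with a deep network trained on aligned audio and video, and feed the predicted parameters into a renderer that deforms the input photo.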
The team published its findings in the scientific journal Pattern Recognition, exploring the challenges of creating lifelike facial expressions driven by audio. Its goal was to generate talking faces with accurate lip synchronization, rich facial expressions, and natural head movements corresponding to the input audio. The team is now working to improve the program's interface and to train it on a wider range of datasets covering more varied facial expressions and voice audio clips.