Aneesh Tickoo, a consulting intern at MarktechPost, reported on research into the use of large language models (LLMs) for long-term action anticipation (LTA). LTA is important for human-machine interaction, especially in settings such as self-driving cars and household tasks. The study he covers explores whether LLMs can improve LTA.
The research focused on the challenges that the ambiguity and unpredictability of human behavior pose for anticipating actions from video. Bottom-up modeling, a popular LTA strategy, models human behavior directly from visual representations or action labels. The researchers proposed complementing it with a top-down framework, which uses the actor's goal to guide action prediction.
To investigate the use of LLMs in LTA, the researchers developed a two-stage system called AntGPT. The first stage uses supervised action recognition algorithms to label the actions observed in a video; the second passes those recognized actions to OpenAI GPT models, which predict the actions that follow. The researchers tested AntGPT on several LTA benchmarks and reported promising results.
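The two-stage idea can be illustrated with a minimal Python sketch. This is not the AntGPT code: the prompt wording, the function names (`build_prompt`, `parse_actions`, `anticipate`), the verb-noun action format, and the `fake_llm` stub that stands in for a real GPT call are all assumptions made for illustration. The sketch assumes that stage one (action recognition) has already produced a list of recognized actions.

```python
from typing import Callable, List, Optional, Tuple

Action = Tuple[str, str]  # hypothetical (verb, noun) action label


def build_prompt(observed: List[Action], num_future: int,
                 goal: Optional[str] = None) -> str:
    """Turn recognized actions (and an optional inferred goal) into a text prompt."""
    history = ", ".join(f"{verb} {noun}" for verb, noun in observed)
    lines = []
    if goal:
        # Top-down signal: condition the prediction on the actor's goal.
        lines.append(f"Goal: {goal}")
    lines.append(f"Observed actions: {history}")
    lines.append(
        f"Predict the next {num_future} actions as comma-separated 'verb noun' pairs:"
    )
    return "\n".join(lines)


def parse_actions(text: str, num_future: int) -> List[Action]:
    """Parse 'verb noun, verb noun, ...' into action tuples, skipping malformed items."""
    actions: List[Action] = []
    for chunk in text.split(","):
        parts = chunk.strip().lower().split()
        if len(parts) >= 2:
            actions.append((parts[0], " ".join(parts[1:])))
    return actions[:num_future]


def anticipate(observed: List[Action], llm: Callable[[str], str],
               num_future: int = 3, goal: Optional[str] = None) -> List[Action]:
    """Stage two: ask a language model to continue the recognized action sequence."""
    return parse_actions(llm(build_prompt(observed, num_future, goal)), num_future)


def fake_llm(prompt: str) -> str:
    """Stand-in for a real LLM call; always returns a canned continuation."""
    return "crack egg, whisk egg, pour egg, cook egg"


# Pretend stage one recognized these actions in the observed video segment.
observed = [("open", "fridge"), ("take", "egg"), ("close", "fridge")]
predicted = anticipate(observed, fake_llm, num_future=3, goal="make an omelette")
print(predicted)
```

Passing the language model in as a plain callable keeps the sketch independent of any particular API; a real system would replace `fake_llm` with a call to a hosted or fine-tuned model.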
The study makes three contributions. First, it proposes using LLMs to infer an actor's goals and to model temporal dynamics in LTA. Second, it introduces the AntGPT framework, which combines LLMs with computer vision algorithms to achieve state-of-the-art performance in long-term action prediction. Finally, it provides comprehensive evaluations of LLMs in LTA, outlining their benefits and limitations.
The researchers plan to release the code for their project.
In conclusion, the study explored the potential of LLMs to improve long-term action anticipation. It proposed the AntGPT framework as a promising approach and evaluated where LLMs help in this task and where they fall short.