LLMs in Action: Enhancing Task Completion and Computer Control
In recent work, large language models (LLMs) such as SayCan, ReAct, Toolformer, and SwiftSage have shown promise in various live settings such as ALFWorld and AlphaCode. These LLMs are used to follow expert traces, track environmental changes, plan future actions, and compose API calls. Studies such as Reflexion and Self-Refine have demonstrated that repeatedly attempting a task with self-reflection can greatly improve the completion rate: the LLM is asked to revise its previous execution plan based on environmental feedback, and that revision is folded into the action generator's prompt for the next round.
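This reflect-and-retry loop can be sketched in a few lines. The sketch below is a minimal illustration, not the papers' implementations: `llm` and `run_episode` are hypothetical stand-ins for a model call and an environment rollout.

```python
def reflect_and_retry(llm, run_episode, task, max_rounds=3):
    """Retry a task, folding self-reflections into the next prompt.

    `llm(prompt)` returns text; `run_episode(task, plan)` returns a
    (success, feedback) pair. Both are assumed interfaces for this sketch.
    """
    reflections = []
    plan = llm(f"Plan actions for the task: {task}")
    for _ in range(max_rounds):
        success, feedback = run_episode(task, plan)
        if success:
            return plan
        # Ask the model what went wrong, Reflexion-style.
        reflections.append(llm(
            f"Task: {task}\nPlan: {plan}\nFeedback: {feedback}\n"
            "In one sentence, why did this fail and what should change?"
        ))
        # The reflections are prepended to the action generator's prompt.
        plan = llm(
            "Lessons from earlier attempts:\n- " + "\n- ".join(reflections)
            + f"\nPlan actions for the task: {task}"
        )
    return None  # budget exhausted without success
```

The key design point is that nothing is fine-tuned: improvement comes purely from text accumulated in the prompt between rounds.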
The Challenge of Learning New Tasks
MiniWoB++ has served as a testbed for evaluating LLM performance on modular computer-control tasks. Learning a task from comprehensive trace examples (WebGUM), through self-supervision, or via few/many-shot prompting (Synapse) is the standard approach, and such methods have achieved task completion rates above 90% in computer control. However, the reliance on expert traces limits an agent's ability to learn new tasks. To address this, researchers from Google Research and the University of Toronto propose a zero-shot agent that learns and improves its control of a computer on its own, without relying on expert guidance.
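The contrast between these settings comes down to how the prompt is assembled. The sketch below is an illustrative assumption about that structure, not either system's actual prompt format; `expert_traces` is a hypothetical list of recorded (screen, action) demonstrations.

```python
def few_shot_prompt(instruction, expert_traces, screen, k=3):
    """Few-shot: prepend k expert (screen, action) demonstrations
    of the *same* task before asking for the next action."""
    demos = "\n\n".join(
        f"Screen:\n{s}\nAction: {a}" for s, a in expert_traces[:k]
    )
    return f"{instruction}\n\n{demos}\n\nScreen:\n{screen}\nAction:"

def zero_shot_prompt(instruction, screen):
    """Zero-shot: one task-agnostic instruction, no demonstrations."""
    return f"{instruction}\n\nScreen:\n{screen}\nAction:"
```

A few-shot agent needs fresh traces for every new task; the zero-shot prompt contains nothing task-specific, which is exactly what makes transfer to unseen tasks possible.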
The Zero-Shot Agent
The researchers' agent is built on PaLM 2, a recent LLM, and uses a single set of instruction prompts for all tasks rather than task-specific prompts. Unlike contemporary approaches that rely on screen representations enriched with additional data, this agent works from a condensed screen depiction, which makes the test environment more realistic. The researchers also provide a simple but effective action planner that can precisely lay out executable operations on a given state in a single pass. With the latest LLM capacity, the agent completes most simple tasks on the MiniWoB++ benchmark.
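To make these two ideas concrete, here is a minimal sketch of a condensed screen depiction and of parsing a single-pass plan into executable steps. Everything here is an assumption for illustration: the element fields, the compact markup, and the `click(...)`/`type(...)` action syntax are hypothetical, not the paper's exact formats.

```python
import re

def condense_screen(elements):
    """Render each visible, interactable element as one compact line.

    `elements` is a hypothetical list of dicts with id/tag/text keys,
    standing in for a parsed DOM of a MiniWoB++ screen.
    """
    return "\n".join(
        f'<{e["tag"]} id={e["id"]}>{e.get("text", "")}</{e["tag"]}>'
        for e in elements if e.get("visible", True)
    )

def parse_plan(plan_text):
    """Turn a one-pass plan like 'click(1); type(2, "hi")' into a
    list of executable (action, args) tuples."""
    actions = []
    for name, raw_args in re.findall(r'(\w+)\(([^)]*)\)', plan_text):
        args = [a.strip().strip('"') for a in raw_args.split(",") if a.strip()]
        actions.append((name, args))
    return actions
```

Because the whole plan for a state is emitted in one pass, the agent pays one model call per screen rather than one call per action, and each parsed step can be executed and checked in order.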
Furthermore, the researchers introduce a systematic thought-management technique, inspired by Reflexion, that helps the agent learn from exploratory failures and make progress on more difficult tasks. After a few rounds of trial and error, the agent matches the performance of previous few/many-shot state-of-the-art methods. According to the research, this zero-shot design for computer-control tasks is the first of its kind.
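One way to picture managing thoughts across rounds is a bounded reflection memory that the agent carries between trials. This is a loose sketch of the idea only; the cap and the deduplication policy are assumptions, not the paper's actual rules.

```python
class ThoughtMemory:
    """Carry a bounded set of self-reflections across trial rounds.

    Duplicates are skipped and only the newest thoughts are kept,
    so the prompt stays short as rounds accumulate.
    """
    def __init__(self, max_thoughts=5):
        self.max_thoughts = max_thoughts
        self.thoughts = []

    def add(self, thought):
        if thought not in self.thoughts:  # skip exact duplicates
            self.thoughts.append(thought)
        self.thoughts = self.thoughts[-self.max_thoughts:]  # keep newest

    def as_prompt_block(self):
        """Format stored thoughts as a block to prepend to the prompt."""
        if not self.thoughts:
            return ""
        return "Lessons from earlier rounds:\n" + "\n".join(
            f"- {t}" for t in self.thoughts
        )
```

Keeping this memory structured (rather than naively appending every reflection) is what prevents the prompt from growing without bound over many trial rounds.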
Conclusion and Future Directions
The use of LLMs for action generation has shown promising results across various live settings. A zero-shot agent that independently learns and improves at computer-control tasks opens up new possibilities for artificial intelligence: by reducing the reliance on expert traces and working from a condensed screen depiction, the agent completes tasks with a high success rate. Further research and advances in this area could lead to more efficient and capable AI systems.
Check out the Paper. All credit for this research goes to the researchers on this project.