Object navigation, also known as ObjNav, is a task in which an embodied agent must find and navigate to a specified object in an unfamiliar environment. Reaching the object is a prerequisite for interacting with it, which makes ObjNav a building block for many navigation-based tasks.
Successfully navigating to a target object requires two key skills: semantic scene understanding and commonsense reasoning. However, existing zero-shot object navigation approaches often lack the ability to reason using commonsense knowledge. These approaches either rely on simple exploration heuristics or require training on other navigation tasks.
To address this issue, researchers from the University of California, Santa Cruz, and Samsung Research have developed a framework called Exploration with Soft Commonsense constraints (ESC). The framework draws on large pre-trained models so that it can handle new environments and object types in a zero-shot setting.
The ESC framework consists of two main components. The first is GLIP, a vision-and-language grounding model that infers object and room information from the agent's current view. Because GLIP has been pre-trained on a large number of image-text pairs, it can generalize to novel objects. The second is a pre-trained commonsense reasoning language model, which takes the inferred rooms and objects and reasons about how they are associated with one another and with the target object.
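The data flow between the two components can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the detection format, the function names strongest_detections and associate_with_goal, the confidence threshold, and the toy scoring table are all hypothetical, and the real system would call GLIP and a language model where the stand-ins appear.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical detection format: (label, confidence in [0, 1]).
# In the real system these would come from the grounding model.
Detection = Tuple[str, float]


def strongest_detections(
    detections: List[Detection], threshold: float = 0.5
) -> Dict[str, float]:
    """Keep the highest confidence per label and drop weak detections."""
    best: Dict[str, float] = {}
    for label, conf in detections:
        if conf >= threshold and conf > best.get(label, 0.0):
            best[label] = conf
    return best


def associate_with_goal(
    observed: Dict[str, float],
    goal: str,
    commonsense_score: Callable[[str, str], float],
) -> Dict[str, float]:
    """Ask a commonsense scorer how strongly each observed item relates to the goal."""
    return {label: commonsense_score(label, goal) for label in observed}


def toy_commonsense_score(item: str, goal: str) -> float:
    """Stand-in for a language model query; the values are made up for illustration."""
    table = {("sofa", "tv"): 0.8, ("living room", "tv"): 0.9, ("sink", "tv"): 0.1}
    return table.get((item, goal), 0.0)


if __name__ == "__main__":
    view = [("sofa", 0.9), ("sofa", 0.6), ("sink", 0.3), ("living room", 0.7)]
    observed = strongest_detections(view)
    print(associate_with_goal(observed, "tv", toy_commonsense_score))
```

Passing the scorer in as a function keeps the sketch independent of any particular language model, which is the point of the example: perception produces labels, and a separate reasoning step turns those labels into goal-relevance scores.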
However, translating the commonsense knowledge deduced from language models into actionable steps can be challenging. In addition, the relationships between objects are often uncertain rather than strictly true or false. To overcome these challenges, the ESC approach uses Probabilistic Soft Logic (PSL), a declarative templating language. PSL models “soft” commonsense constraints that guide the agent’s exploration strategy. Because PSL expresses knowledge as continuous truth values between 0 and 1 instead of hard true/false rules, the agent can weigh uncertain evidence and explore more efficiently.
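To make the idea of a soft constraint concrete, here is a minimal Python sketch of how one such rule could score exploration frontiers. It uses the Lukasiewicz conjunction and the "distance to satisfaction" that PSL is built on, but it is a simplification: real PSL jointly optimizes all weighted rules, whereas this sketch only sums rule bodies per frontier. The rule itself, the Frontier class, and every truth value below are assumptions made up for illustration and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Dict, List


def luk_and(a: float, b: float) -> float:
    """Lukasiewicz conjunction of two truth values in [0, 1]."""
    return max(0.0, a + b - 1.0)


def distance_to_satisfaction(body: float, head: float) -> float:
    """How far the soft rule 'body -> head' is from being satisfied."""
    return max(0.0, body - head)


@dataclass
class Frontier:
    name: str
    # Hypothetical input: object label -> truth of "object is near this frontier".
    near_object: Dict[str, float]


def frontier_score(
    frontier: Frontier, cooccur: Dict[str, float], weight: float = 1.0
) -> float:
    """Sum the body truth of an illustrative rule over all nearby objects:

        near(frontier, obj) AND cooccur(obj, goal) -> choose(frontier)

    A larger body truth means the rule pushes harder toward this frontier.
    """
    return weight * sum(
        luk_and(near, cooccur.get(obj, 0.0))
        for obj, near in frontier.near_object.items()
    )


def best_frontier(frontiers: List[Frontier], cooccur: Dict[str, float]) -> Frontier:
    """Pick the frontier with the highest soft-rule score."""
    return max(frontiers, key=lambda f: frontier_score(f, cooccur))


if __name__ == "__main__":
    # Made-up truth values for how often each object co-occurs with a TV.
    cooccur_with_tv = {"sofa": 0.8, "sink": 0.2}
    frontiers = [
        Frontier("hallway_left", {"sofa": 0.9}),
        Frontier("hallway_right", {"sink": 0.9}),
    ]
    print(best_frontier(frontiers, cooccur_with_tv).name)
```

With these numbers the frontier near the sofa wins, because a sofa that is probably nearby and probably co-occurs with a TV yields a higher rule body than a nearby sink does. Neither fact has to be certain for the preference to emerge, which is what distinguishes a soft constraint from a hard one.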
The researchers evaluated ESC on three object goal navigation benchmarks that differ in their environmental characteristics. The approach outperformed the methods it was compared against in both success rate and efficiency. On one of the datasets, the zero-shot approach achieved the highest success rate among the state-of-the-art algorithms in the comparison.
To learn more about this research, you can check out the paper and project page. Credit goes to the researchers involved in this project.
(Note: This article has been written by Dhanshree Shenwai, a Computer Science Engineer with experience in the FinTech industry and a keen interest in AI applications.)