RGB-Stacking: A New Benchmark for Vision-Based Robotic Manipulation

Picking up and balancing objects may be simple tasks for humans, but for robots they are far more challenging. Robots struggle to handle multiple objects at once, such as when balancing a stick or stacking stones. Before robots can perform these complex tasks, they need to learn how to interact with a wide range of objects. At DeepMind, we’re focused on developing more generalizable and useful robots by enabling them to better understand object interactions.

In our latest research, presented at CoRL 2021 and available as a preprint on OpenReview, we introduce RGB-Stacking as a benchmark for vision-based robotic manipulation. This benchmark requires a robot to learn how to grasp different objects and stack them on top of each other. What sets our research apart is the diversity of objects used and the extensive evaluations conducted to validate our findings. Our results show that a combination of simulation and real-world data can be used to learn complex multi-object manipulation, providing a solid foundation for addressing the challenge of generalizing to new objects. To support other researchers, we’re open-sourcing our simulated environment, sharing the designs for building our real-robot RGB-stacking environment, and providing the RGB-object models for 3D printing. Additionally, we’re releasing a collection of libraries and tools used in our robotics research.

The RGB-Stacking Benchmark

The goal of RGB-Stacking is to train a robotic arm to stack objects of different shapes using reinforcement learning. The robot arm is equipped with a parallel gripper, and there are three objects in a basket: a red, a green, and a blue object. The task is to stack the red object on top of the blue object within 20 seconds, while the green object serves as an obstacle. The learning process involves training the agent on multiple object sets to acquire generalized skills. The grasp and stack affordances of the objects are intentionally varied, forcing the agent to exhibit behaviors that go beyond simple pick-and-place strategies.
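To make the task concrete, a success check for "red stacked on blue" might look like the minimal sketch below. All function names, thresholds, and the position convention are illustrative assumptions for this article, not DeepMind's released API.

```python
# Hypothetical sketch of a stacking success criterion: red counts as
# stacked on blue if it rests just above blue, is roughly aligned in
# the horizontal plane, and is no longer held by the gripper.
# Thresholds (metres) are illustrative, not the benchmark's values.
import numpy as np

def is_stacked(red_pos, blue_pos, gripper_open,
               xy_tol=0.03, z_min=0.01, z_max=0.05):
    red_pos, blue_pos = np.asarray(red_pos), np.asarray(blue_pos)
    horizontal_offset = np.linalg.norm(red_pos[:2] - blue_pos[:2])
    height_gap = red_pos[2] - blue_pos[2]  # red should sit above blue
    return bool(gripper_open
                and horizontal_offset < xy_tol
                and z_min < height_gap < z_max)

# A stacked configuration: red 4 cm above blue, aligned, released.
print(is_stacked([0.0, 0.0, 0.09], [0.0, 0.0, 0.05], gripper_open=True))  # → True
```

Because the green object only acts as an obstacle, it does not appear in the success check; a full reward function would also need to penalize knocking objects out of the basket.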

Each triplet of objects presents unique challenges to the agent. For example, one triplet requires a precise grasp, another requires using the top object as a tool to flip the bottom object, and another requires balancing. In assessing the difficulty of the task, we found that a hand-coded scripted baseline achieved only a 51% stacking success rate.

We’ve designed two versions of the RGB-Stacking benchmark with different levels of difficulty: “Skill Mastery” and “Skill Generalisation.” “Skill Mastery” focuses on training a single agent to stack a predefined set of five triplets. “Skill Generalisation” evaluates the agent’s generalization abilities by training it on a large set of training objects (over a million possible triplets) and testing it on a different set of test triplets. Our learning pipeline is divided into three stages: simulation training using an RL algorithm, training with realistic observations in simulation, and collecting real robot data to train an improved policy offline.

Decoupling the learning pipeline in this way is crucial for solving the problem efficiently and increasing research velocity. This approach allows different team members to work on different parts of the pipeline before combining their changes for overall improvement.
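The decoupling described above can be pictured as three independent stages composed into one pipeline. The sketch below uses stub functions as stand-ins for the real training steps; every name here is a hypothetical placeholder, not code from the released libraries.

```python
# Illustrative sketch of the decoupled three-stage pipeline: each stage
# is a separate, swappable function, so team members can improve one
# stage without touching the others. Stubs record which stage ran.

def train_in_sim_with_state(policy):
    """Stage 1: RL in simulation (e.g. on privileged state observations)."""
    return policy + ["sim_rl"]

def retrain_on_realistic_observations(policy):
    """Stage 2: retraining in simulation on realistic observations."""
    return policy + ["sim_vision"]

def improve_offline_from_real(policy, real_data):
    """Stage 3: one-step offline policy improvement from real-robot data."""
    return policy + [f"offline({real_data})"]

def run_pipeline(real_data="robot_logs"):
    policy = []  # start from scratch
    policy = train_in_sim_with_state(policy)
    policy = retrain_on_realistic_observations(policy)
    policy = improve_offline_from_real(policy, real_data)
    return policy

print(run_pipeline())  # stages contribute in order: sim RL, vision, offline
```

The design choice is that each stage consumes only the previous stage's output (a policy), so the interfaces between stages stay stable while their internals evolve independently.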

Our agent has demonstrated novel stacking behaviors across the five triplets. The Skill Mastery agent achieved an average success rate of 79% in simulation, 68% zero-shot success on real robots, and 82% success after one-step policy improvement using real data. The Skill Generalisation agent achieved a 54% success rate on real robots. However, closing the gap between Skill Mastery and Skill Generalisation remains an open challenge.

While there has been significant progress in applying learning algorithms to real-robot manipulation problems, the focus has mainly been on single-object tasks like grasping and pushing. Our RGB-Stacking approach, accompanied by our open-sourced resources, yields surprising stacking strategies and mastery of stacking a subset of these objects. But there is still much more to explore in terms of generalization. We hope that this new benchmark and the resources we have released will inspire new ideas and methods that make manipulation easier and robots more capable.
