First-Explore: Sample-Efficient Meta-RL for Hard-Exploration Domains

Reinforcement learning (RL) has been applied successfully to plasma control, molecular design, game playing, and robot control. However, standard RL is sample-inefficient: it can require hundreds of thousands of episodes to learn a task that a human picks up in a few attempts.

Researchers from the University of British Columbia and the Vector Institute, including a Canada CIFAR AI Chair, have introduced First-Explore, a lightweight meta-RL framework aimed at this sample inefficiency. Instead of a single policy, it learns two: an intelligent explore policy and an intelligent exploit policy. Because the two are separate, the explore policy is not penalized for the reward it gives up while gathering information. This lets the framework learn sample-efficiently even on hard-exploration domains where investigating effectively requires sacrificing reward.
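To make the two-policy split concrete, here is a minimal sketch of how an explore policy and an exploit policy might be used together on a new task. It assumes a multi-armed bandit in which each episode is a single arm pull, and that only explore results are added to the context the policies condition on. In First-Explore both policies are learned by a neural network that conditions on earlier episodes in context. Here `explore_policy` and `exploit_policy` are hypothetical hand-coded stand-ins, and `Context` and `first_explore_rollout` are illustrative names, not the authors' code.

```python
import random
from typing import Callable, List, Tuple

# Hypothetical context format: (arm, reward) results of earlier explore episodes.
Context = List[Tuple[int, float]]


def explore_policy(context: Context, n_arms: int) -> int:
    """Hand-coded stand-in for a learned explore policy: pull the least-tried arm.

    It ignores how much reward the pull is likely to earn; its only job is to
    gather information that the exploit policy can use later.
    """
    counts = [0] * n_arms
    for arm, _ in context:
        counts[arm] += 1
    return min(range(n_arms), key=lambda a: counts[a])


def exploit_policy(context: Context, n_arms: int) -> int:
    """Hand-coded stand-in for a learned exploit policy: pick the best observed arm."""
    sums = [0.0] * n_arms
    counts = [0] * n_arms
    for arm, reward in context:
        sums[arm] += reward
        counts[arm] += 1
    means = [sums[a] / counts[a] if counts[a] else float("-inf") for a in range(n_arms)]
    return max(range(n_arms), key=lambda a: means[a])


def first_explore_rollout(
    pull: Callable[[int], float], n_arms: int, n_explore: int, n_exploit: int
) -> Tuple[Context, List[float]]:
    """Run n_explore explore episodes, then n_exploit exploit episodes.

    Assumption: exploit results are returned for evaluation but are not
    added back into the context.
    """
    context: Context = []
    for _ in range(n_explore):
        arm = explore_policy(context, n_arms)
        context.append((arm, pull(arm)))
    exploit_rewards = [pull(exploit_policy(context, n_arms)) for _ in range(n_exploit)]
    return context, exploit_rewards


if __name__ == "__main__":
    rng = random.Random(0)
    true_means = [rng.gauss(0.0, 1.0) for _ in range(5)]  # one sampled bandit task
    _, rewards = first_explore_rollout(lambda a: rng.gauss(true_means[a], 1.0), 5, 10, 5)
    print(rewards)
```

The design choice the sketch mirrors is that each policy gets its own objective: the explore step is free to pull arms that look poor, and only the exploit step is judged on the reward it collects.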

One of the primary obstacles to developing artificial general intelligence (AGI) is achieving human-level performance on hard-exploration domains. The team suggests that combining First-Explore with a curriculum, such as the one used to train DeepMind's Adaptive Agent (AdA), could be a step in that direction. Such progress could help unlock the potential benefits of AGI, provided the accompanying safety concerns are also addressed.

First-Explore spends computation during meta-training to learn how to explore intelligently, so that the learned exploration strategy is sample-efficient when the agent faces a new task. On domains such as a multi-armed Gaussian bandit and the Dark Prize Room environment, this approach outperforms existing meta-RL methods that optimize cumulative reward. The findings highlight that treating exploration and exploitation as distinct objectives is important for effective in-context learning.
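The following toy example illustrates why optimizing reward at every step can block exploration. It uses a noise-free bandit, a simplification of the Gaussian bandit, in which unpulled arms are estimated at a prior value of 0. A myopic policy that always takes the arm with the highest current estimate locks onto the first arm that pays more than the prior. A policy that accepts a low reward to try every arm first finds the better arm. Both functions, `greedy_run` and `explore_then_exploit_run`, are hypothetical hand-coded illustrations rather than First-Explore's learned policies or the baselines from the paper.

```python
from typing import List, Tuple


def greedy_run(arm_values: List[float], n_pulls: int, prior: float = 0.0) -> Tuple[List[int], float]:
    """One myopic policy: each pull takes the arm with the highest current estimate.

    Unpulled arms are estimated at `prior`; rewards are noise-free here.
    Ties go to the lowest-index arm.
    """
    estimates = [prior] * len(arm_values)
    pulls: List[int] = []
    total = 0.0
    for _ in range(n_pulls):
        arm = max(range(len(arm_values)), key=lambda a: estimates[a])
        estimates[arm] = arm_values[arm]  # exact value, since there is no noise
        pulls.append(arm)
        total += arm_values[arm]
    return pulls, total


def explore_then_exploit_run(arm_values: List[float], n_pulls: int) -> Tuple[List[int], float]:
    """Separate phases: try every arm once (accepting low rewards), then exploit the best."""
    n_arms = len(arm_values)
    if n_pulls < n_arms:
        raise ValueError("need at least one pull per arm for the explore phase")
    pulls = list(range(n_arms))  # explore phase: one pull per arm
    best = max(pulls, key=lambda a: arm_values[a])
    pulls += [best] * (n_pulls - n_arms)  # exploit phase
    return pulls, sum(arm_values[a] for a in pulls)


values = [0.5, 0.2, 1.0]
print(greedy_run(values, 10))                # stays on arm 0 for every pull
print(explore_then_exploit_run(values, 10))  # pays 0.2 once, then finds arm 2
```

With these values the myopic policy never learns that arm 2 is better, because trying it would mean giving up a known 0.5 for an unknown arm estimated at 0. An optimistic prior would change this particular outcome. The general point is that exploration sometimes has to be valued for the information it yields rather than the reward it earns.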

