Scene Understanding: A Dataset for Human Egocentric Navigation and AI
Humans perceive their surroundings from a unique perspective, receiving sensory input that is completely “egocentric” or “from the self.” This perspective is essential for navigation and understanding the world around us. Technologies like robots, AR glasses, and assistive devices also require a human-like perspective to function effectively.
Computer vision, and scene understanding in particular, focuses on how visible objects relate to a scene's overall structure and layout. Self-driving cars, for example, need to understand the 3D structure of the road, identify street signs, and recognize stop lights. A robot navigating a park needs to find paths and avoid obstacles, and AR glasses need to understand the wearer's surroundings to help guide them along a route.
Most scene understanding datasets available today do not focus solely on human egocentric data, making them less useful for human-centered navigation tasks. To address this, we introduce the Scene understanding, Accessibility, Navigation, Pathfinding, Obstacle avoidance dataset (SANPO). SANPO is a multi-attribute video dataset for outdoor human egocentric scene understanding. It includes real and synthetic data with depth maps and video panoptic masks.
SANPO-Real is a multiview video dataset captured with two stereo cameras, covering a variety of environments. It comprises 701 sessions of approximately 30 seconds each, with high-level attribute annotations, camera pose trajectories, dense depth maps, and panoptic segmentation annotations.
SANPO-Synthetic is a high-quality synthetic dataset created in partnership with Parallel Domain to match real-world capture conditions. It contains 1961 sessions recorded with virtualized cameras, each with precise camera pose trajectories, dense depth maps, and panoptic segmentation masks.
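To make the per-frame annotations concrete, here is a minimal sketch of how depth maps and panoptic masks are often decoded. Note the specific encodings here (millimeter-scaled 16-bit depth, and a combined `semantic * 1000 + instance` panoptic label) are common conventions assumed for illustration, not SANPO's documented format:

```python
import numpy as np

OFFSET = 1000  # assumed panoptic encoding stride (hypothetical)

def decode_panoptic(panoptic: np.ndarray, offset: int = OFFSET):
    """Split a combined panoptic map into semantic and instance maps."""
    semantic = panoptic // offset
    instance = panoptic % offset
    return semantic, instance

def decode_depth(depth_mm: np.ndarray) -> np.ndarray:
    """Convert 16-bit millimeter depth to float meters; 0 marks invalid pixels."""
    depth_m = depth_mm.astype(np.float32) / 1000.0
    return np.where(depth_mm == 0, np.nan, depth_m)

# Tiny synthetic frame: a "person" (class 12, instance 3) on "road" (class 1).
panoptic = np.full((4, 4), 1 * OFFSET, dtype=np.int32)
panoptic[1:3, 1:3] = 12 * OFFSET + 3
semantic, instance = decode_panoptic(panoptic)

depth = decode_depth(np.array([[0, 1500]], dtype=np.uint16))
```

The semantic map answers "what is this pixel?" while the instance map distinguishes individual objects of the same class, which is exactly what a navigation system needs to track a specific pedestrian or obstacle over time.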
Researchers can use SANPO to develop systems for human-centered navigation and benchmark their scene understanding models. The dataset supports a wide range of dense prediction tasks and provides challenging scenarios for current models.
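As one example of benchmarking a dense prediction task, monocular depth estimates are commonly scored with the mean absolute relative error (AbsRel). A small sketch of that metric, assuming non-positive ground-truth values mark missing depth:

```python
import numpy as np

def abs_rel_error(pred: np.ndarray, gt: np.ndarray) -> float:
    """Mean absolute relative depth error over valid ground-truth pixels."""
    valid = gt > 0  # treat non-positive ground truth as missing
    return float(np.mean(np.abs(pred[valid] - gt[valid]) / gt[valid]))

gt = np.array([2.0, 4.0, 0.0, 8.0])    # 0.0 marks an invalid pixel
pred = np.array([2.2, 3.6, 5.0, 8.8])
score = abs_rel_error(pred, gt)        # each valid pixel is off by 10%, so ≈ 0.1
```

Masking out invalid pixels matters in practice: stereo-derived ground truth is typically sparse, and scoring the holes would dominate the metric.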
SANPO-Synthetic and SANPO-Real can be used interchangeably, allowing researchers to study domain transfer tasks or use synthetic data during training. This flexibility enables the development of robust models that can handle the complexity of real-world scenes.
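One simple way to use synthetic data during training is to sample each example from the real or synthetic pool with a fixed mixing probability. A minimal, framework-agnostic sketch (the 30% synthetic fraction is an arbitrary illustrative choice, not a recommendation from the dataset authors):

```python
import random

def mixed_stream(real, synthetic, synthetic_fraction=0.3, seed=0):
    """Yield training samples, drawing from the synthetic pool with the
    given probability and from the real pool otherwise."""
    rng = random.Random(seed)
    while True:
        pool = synthetic if rng.random() < synthetic_fraction else real
        yield rng.choice(pool)

# Stand-in sample identifiers for illustration.
real_samples = [f"real_{i}" for i in range(5)]
syn_samples = [f"syn_{i}" for i in range(5)]

stream = mixed_stream(real_samples, syn_samples, synthetic_fraction=0.5)
batch = [next(stream) for _ in range(8)]
```

The same generator, run with `synthetic_fraction=0.0` or `1.0`, gives the pure-real and pure-synthetic baselines needed to measure domain-transfer effects.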
By providing a comprehensive human egocentric dataset, SANPO fills a gap in the scene understanding community and facilitates advancements in AI-driven navigation and perception. Researchers can download the dataset and start exploring new possibilities in human-centered AI.