Predicting human attention has long been a topic of interest in neuroscience, psychology, human-computer interaction (HCI), and computer vision. Being able to determine which regions of an image attract attention has practical applications in areas such as graphics, photography, image compression, and visual quality measurement. Google Research has been using machine learning and smartphone-based gaze estimation to accelerate eye movement research, an approach that removes the need for expensive specialized hardware.
In this blog post, we will discuss two papers that highlight recent research on human attention modeling: “Deep Saliency Prior for Reducing Visual Distraction” and “Learning from Unique Perspectives: User-aware Saliency Modeling.” We will also look at progressive image decoding as a further application. Together, these examples show how predictive models of human attention can improve user experiences.
First, let’s explore attention-guided image editing. Traditionally, attention models have relied on handcrafted features such as color contrast, edges, and shape. More recent approaches use deep neural networks to learn discriminative features automatically. The “Deep Saliency Prior for Reducing Visual Distraction” paper introduces an optimization framework in which a deep saliency model guides the edit: the image is adjusted so that the model's predicted attention shifts away from distracting regions. This framework can produce powerful effects such as recoloring, inpainting, camouflage, and object editing. The goal is to reduce visual clutter and distracting artifacts in photos, leading to increased user satisfaction.
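To make the idea of saliency-guided optimization concrete, here is a minimal sketch. It is not the paper's method: the deep saliency model is replaced by a hypothetical stand-in (squared contrast against the background mean), the image is a flat list of grayscale values, and the names `toy_saliency` and `reduce_distraction` are invented for illustration. What it does show is the general shape of the loop: only pixels inside a mask are edited, and each step follows the gradient that lowers their predicted saliency.

```python
from typing import List


def toy_saliency(pixels: List[float], background_mean: float) -> List[float]:
    """Stand-in saliency (an assumption): squared contrast with the background."""
    return [(p - background_mean) ** 2 for p in pixels]


def reduce_distraction(pixels: List[float], mask: List[bool],
                       steps: int = 50, lr: float = 0.1) -> List[float]:
    """Gradient descent on masked pixels to lower their toy saliency.

    Pixels outside the mask are never modified.
    """
    background = [p for p, m in zip(pixels, mask) if not m]
    if not background:
        raise ValueError("mask must leave at least one background pixel")
    mu = sum(background) / len(background)

    edited = list(pixels)
    for _ in range(steps):
        for i, inside in enumerate(mask):
            if inside:
                # d/dx of (x - mu)^2, with mu held fixed
                grad = 2.0 * (edited[i] - mu)
                edited[i] -= lr * grad
    return edited


# One bright distractor (0.9) on a uniform background (0.2).
pixels = [0.2, 0.2, 0.9, 0.2]
mask = [False, False, True, False]
edited = reduce_distraction(pixels, mask)

mu = 0.2
before = toy_saliency(pixels, mu)[2]
after = toy_saliency(edited, mu)[2]
```

In this toy setting the distractor is simply pulled toward the background value. A real system would backpropagate through a learned saliency network and would constrain the edit so the result still looks realistic.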
Next, we will discuss user-aware saliency modeling. Most previous research has assumed a single saliency model for everyone, but human attention varies between individuals. The “Learning from Unique Perspectives: User-aware Saliency Modeling” paper introduces a user-aware saliency model that can predict attention for an individual user, a group of users, or the general population. The model combines each participant’s visual preferences with a per-user attention map and adaptive user masks. By personalizing attention predictions, it allows for a more tailored user experience.
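The role of a user mask can be illustrated with a small sketch. This is a simplification under stated assumptions, not the paper's architecture: the per-user attention maps are given as plain nested lists rather than predicted by a network, the mask is a dictionary of weights, and the function name `masked_saliency` is hypothetical. The same function yields an individual, group, or population prediction depending only on which users the mask switches on.

```python
from typing import Dict, List

SaliencyMap = List[List[float]]


def masked_saliency(per_user: Dict[str, SaliencyMap],
                    mask: Dict[str, float]) -> SaliencyMap:
    """Weighted average of per-user maps; users missing from the mask get weight 0."""
    weights = {user: mask.get(user, 0.0) for user in per_user}
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("mask must select at least one known user")

    first = next(iter(per_user.values()))
    rows, cols = len(first), len(first[0])
    return [
        [sum(weights[u] * per_user[u][r][c] for u in per_user) / total
         for c in range(cols)]
        for r in range(rows)
    ]


# Hypothetical per-user attention maps over a 1x2 image.
per_user = {
    "a": [[1.0, 0.0]],
    "b": [[0.0, 1.0]],
    "c": [[1.0, 1.0]],
}

individual = masked_saliency(per_user, {"a": 1.0})
group = masked_saliency(per_user, {"a": 1.0, "b": 1.0})
population = masked_saliency(per_user, {u: 1.0 for u in per_user})
```

Here the individual prediction reproduces user `a`'s map, the group prediction averages `a` and `b`, and the population prediction averages all three users.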
Finally, let’s talk about progressive image decoding. One common frustration while browsing is waiting for web pages with images to load. Progressive decoding improves this experience by displaying increasingly higher-resolution image sections as data is downloaded. With a predictive attention model, the decoding order can follow saliency, so the regions most likely to be looked at are sharpened first. The image then looks better earlier in the download, and the wait feels shorter to the user.
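A minimal sketch of saliency-prioritized ordering follows. It assumes the image is split into fixed-size square tiles and that a saliency map is already available as a nested list; the function name `tile_decode_order` and the tiling scheme are illustrative choices, not a description of any particular image codec. Each tile is scored by its mean saliency and tiles are then scheduled from most to least salient.

```python
from typing import List, Tuple

Tile = Tuple[int, int]  # (tile_row, tile_col)


def tile_decode_order(saliency: List[List[float]], tile: int) -> List[Tile]:
    """Return tile coordinates sorted from most to least salient."""
    rows, cols = len(saliency), len(saliency[0])
    scored = []
    for r0 in range(0, rows, tile):
        for c0 in range(0, cols, tile):
            # Collect the saliency values covered by this tile (edge tiles may be smaller).
            block = [saliency[r][c]
                     for r in range(r0, min(r0 + tile, rows))
                     for c in range(c0, min(c0 + tile, cols))]
            mean = sum(block) / len(block)
            scored.append((mean, (r0 // tile, c0 // tile)))
    # Highest mean saliency first; ties fall back to raster order.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [coords for _, coords in scored]


# Hypothetical 4x4 saliency map; the bottom-left quadrant attracts the most attention.
saliency_map = [
    [0.1, 0.1, 0.2, 0.2],
    [0.1, 0.1, 0.2, 0.2],
    [0.9, 0.9, 0.0, 0.0],
    [0.9, 0.9, 0.0, 0.0],
]
order = tile_decode_order(saliency_map, tile=2)
```

With this map, the bottom-left tile is scheduled first and the bottom-right tile last. The total amount of data transferred is unchanged; only the order in which regions are refined differs.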
In conclusion, predictive attention models have the potential to enhance user experiences in a range of applications. From image editing to image compression and web browsing, these models can reduce distractions, personalize experiences, and make images feel faster to load. Google Research’s work in this area showcases the power of human attention modeling in creating delightful user experiences.