The Amazing Ability of Our Brain to Process Visual Information
The human brain has a remarkable capacity for processing visual information. With just a glance at a complex scene, it can break the scene down into its components, such as objects and their colors and sizes, well enough for us to describe it in simple language. Behind this seemingly effortless process is a complex computation performed by our visual cortex.
To fully comprehend how this works, we need to understand how semantically meaningful information is represented in the firing of neurons in the visual cortex. We also need to figure out how this representation is learned from unsupervised experience, that is, without explicit labels or teaching.
In recent research with our collaborators from Caltech and the Chinese Academy of Sciences, we focused on studying face perception. Faces are well studied in neuroscience and serve as a microcosm of object recognition. We compared the responses of single cortical neurons in the face patches to a class of neural networks called “disentangling” models. These models aim to be interpretable to humans, unlike traditional black-box systems.
Disentangling models learn to represent complex images in a lower-dimensional form using latent units. Each unit represents a specific attribute of the scene, like color or size. Unlike black-box classifiers, these models are trained without external supervision, using a self-supervised objective of reconstructing input images from their learned latent representation.
For years, the machine learning community has been fascinated by disentangling as a way to build more efficient and imaginative AI systems. However, developing practical disentangling models has been a challenge. The first successful model to achieve this, β-VAE, took inspiration from neuroscience and mirrors the properties of the visual brain.
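As a rough illustration of what β-VAE optimizes, the sketch below computes its per-image loss: a reconstruction error plus a KL-divergence penalty on the latent units, weighted by β. Setting β to 1 gives a standard variational autoencoder, while larger values put more pressure on the latent units to disentangle. The function name, the squared-error reconstruction term, and the default β of 4.0 are assumptions made for this example, not details taken from the study.

```python
import math

def beta_vae_loss(x, x_recon, mu, log_var, beta=4.0):
    """Per-image beta-VAE loss (illustrative sketch).

    x, x_recon : flattened input image and its reconstruction
    mu, log_var: mean and log-variance of the approximate posterior q(z|x),
                 one entry per latent unit
    beta       : weight on the KL term (4.0 is an arbitrary example value)
    """
    # Reconstruction term: squared error (assumes a Gaussian likelihood).
    recon = sum((a - b) ** 2 for a, b in zip(x, x_recon))
    # Closed-form KL divergence between N(mu, exp(log_var)) and N(0, 1),
    # summed over latent units.
    kl = 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv
                   for m, lv in zip(mu, log_var))
    return recon + beta * kl
```

A perfect reconstruction with a posterior equal to the prior gives a loss of zero; any shift of a latent mean away from zero is penalized in proportion to β.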
In our research, we measured the similarity between the disentangled units discovered by β-VAE and the responses of real neurons in the visual cortex. We found a strong one-to-one mapping between them, suggesting that β-VAE units encode semantically meaningful information just like real neurons.
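One way to picture such a one-to-one comparison is sketched below: compute the absolute Pearson correlation between every latent unit and every neuron across the same set of images, then pick the pairing that maximizes the total correlation. This is only a minimal sketch under stated assumptions; the similarity measure, the function names, and the brute-force search are choices made for this example and are not necessarily the procedure used in the study. Brute force is only practical for a handful of units; for larger sets, an assignment solver such as SciPy's `linear_sum_assignment` would be the usual choice.

```python
import math
from itertools import permutations

def pearson(a, b):
    """Pearson correlation of two equal-length response vectors."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb) if va > 0 and vb > 0 else 0.0

def match_units_to_neurons(units, neurons):
    """Pair each latent unit with a distinct neuron (hypothetical helper).

    units[i] and neurons[j] hold responses to the same sequence of images.
    Requires len(neurons) >= len(units).
    Returns (assignment, mean_abs_r), where assignment[i] is the index of
    the neuron paired with unit i.
    """
    # Absolute correlation: a unit and a neuron may encode the same
    # attribute with opposite sign.
    sim = [[abs(pearson(u, v)) for v in neurons] for u in units]
    best, best_score = None, -1.0
    # Exhaustive search over one-to-one pairings (small inputs only).
    for perm in permutations(range(len(neurons)), len(units)):
        score = sum(sim[i][j] for i, j in enumerate(perm))
        if score > best_score:
            best, best_score = perm, score
    return list(best), best_score / len(units)
```

A mean absolute correlation near 1 would indicate that each unit is closely tracked by its own neuron, while a value near 0 would indicate no clean pairing.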
To further test this, we translated the activity of real neurons into their matched artificial counterparts and used the β-VAE generator to visualize the faces represented by the real neurons. Surprisingly, as few as 12 neurons were sufficient to generate accurate and high-quality reconstructions of the original faces.
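The translation step can be imagined as follows, assuming each neuron has already been paired with a latent unit: fit a simple linear map from the neuron's firing rate to the unit's value over a set of training images, then apply those maps to the firing rates recorded for a new image. The resulting latent vector is what would be passed to the trained β-VAE generator, which is not shown here. The per-pair linear fit and both function names are assumptions for illustration; the source does not specify how the translation was performed.

```python
def fit_linear_map(neuron, unit):
    """Least-squares fit of unit ~= slope * neuron + intercept.

    neuron, unit: responses of one matched pair to the same images.
    """
    n = len(neuron)
    mx, my = sum(neuron) / n, sum(unit) / n
    var = sum((x - mx) ** 2 for x in neuron)
    cov = sum((x - mx) * (y - my) for x, y in zip(neuron, unit))
    slope = cov / var if var > 0 else 0.0
    return slope, my - slope * mx

def neurons_to_latents(rates, maps):
    """Translate firing rates for one image into latent values.

    rates[k]: firing rate of the neuron matched to latent unit k
    maps[k] : (slope, intercept) returned by fit_linear_map for that pair
    """
    return [slope * r + intercept for r, (slope, intercept) in zip(rates, maps)]
```

With 12 matched pairs, `neurons_to_latents` would return a 12-element latent vector per image.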
These findings challenge the common belief that semantically meaningful information is multiplexed across a large number of neurons in the brain. Instead, they suggest that the brain may be optimizing a disentanglement objective to support our effortless visual perception.
While β-VAE was inspired by neuroscience principles, its utility for intelligent behavior has primarily been demonstrated in machine learning. We believe that these insights could now benefit the neuroscience community and help explore the use of disentangled representations for supporting intelligence in biological systems.
By understanding how the brain achieves disentanglement and applies it to abstract reasoning and efficient learning, we can unlock new possibilities for artificial and biological intelligence.