An Efficient and Fast Solution for Semantic Segmentation in Autonomous Vehicles
Autonomous vehicles need to quickly and accurately identify objects they encounter, from parked delivery trucks to approaching cyclists. This task, known as semantic segmentation, assigns a category label to every pixel in an image, and it has traditionally required complex computer vision models that consume significant computational resources, especially when processing high-resolution images.
Researchers from MIT, the MIT-IBM Watson AI Lab, and other institutions, however, have developed a far more efficient computer vision model. It can perform semantic segmentation in real time on limited hardware, making it well suited to the on-board computers of autonomous vehicles.
The Challenge of Semantic Segmentation
Existing state-of-the-art semantic segmentation models become dramatically more computationally intensive as image resolution rises: in vision transformers, the cost of the standard self-attention operation grows quadratically with the number of image tokens. This makes them too slow to process high-resolution images on edge devices like sensors or mobile phones. To address this challenge, the MIT researchers designed a new building block for semantic segmentation models with linear computational complexity and hardware-efficient operations.
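To make the scaling difference concrete, here is a rough back-of-the-envelope cost model (an illustrative sketch with invented multiply-add counts, not measured numbers from the paper): standard self-attention over n tokens of dimension d costs roughly n² × d operations for the score matrix, while a linear-complexity block costs roughly n × d², and n grows with the square of the image's side length.

```python
def attention_cost(n_tokens: int, dim: int) -> int:
    """Approximate multiply-adds for the n x n score matrix in
    standard self-attention: quadratic in the token count."""
    return n_tokens ** 2 * dim

def linear_cost(n_tokens: int, dim: int) -> int:
    """Approximate multiply-adds for a linear-attention-style block:
    linear in the token count."""
    return n_tokens * dim ** 2

d = 64  # illustrative embedding dimension
for side in (32, 64, 128):     # feature-map side length (grows with resolution)
    n = side * side            # token count scales with pixel count
    ratio = attention_cost(n, d) / linear_cost(n, d)
    print(f"{side:>4}x{side}: quadratic attention is {ratio:.0f}x costlier")
```

The ratio works out to n/d, so doubling the image's side length quadruples the token count and quadruples the relative cost of quadratic attention, which is exactly why high resolutions are where linear-complexity designs pay off.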
The result is a new model series, called EfficientViT, which runs up to nine times faster than previous models when deployed on mobile devices while maintaining the same or better accuracy.
EfficientViT and Beyond
This efficient model series has potential applications beyond autonomous vehicles. It can also improve the efficiency of other high-resolution computer vision tasks, like medical image segmentation.
The researchers emphasize the importance of balancing performance and efficiency. EfficientViT achieves this balance by replacing the computationally expensive similarity function in attention with a linear one, and by incorporating additional components that capture local feature interactions and enable multiscale learning.
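The key trick behind a linear similarity function can be sketched in a few lines. The snippet below is a minimal illustration, not the authors' implementation: it uses a ReLU feature map (one common choice for linear attention) and exploits associativity, computing the small d × d matrix KᵀV first instead of the large n × n score matrix, so the cost is linear rather than quadratic in the number of tokens n.

```python
import numpy as np

def softmax_attention(Q, K, V):
    """Standard attention: materializes an n x n score matrix, so
    cost grows quadratically with the token count n."""
    scores = Q @ K.T / np.sqrt(Q.shape[1])
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ V

def relu_linear_attention(Q, K, V, eps=1e-6):
    """Linear attention sketch with a ReLU feature map (an assumed,
    simplified stand-in for the paper's formulation). Reordering the
    matrix products avoids the n x n matrix: K'^T V is only d x d."""
    Qp, Kp = np.maximum(Q, 0), np.maximum(K, 0)   # feature map
    kv = Kp.T @ V                 # (d, d): independent of token count
    z = Kp.sum(axis=0)            # (d,): normalization statistics
    return (Qp @ kv) / ((Qp @ z)[:, None] + eps)

# Usage: 8 tokens of dimension 4; both variants return an (8, 4) output.
rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(8, 4)) for _ in range(3))
print(relu_linear_attention(Q, K, V).shape)
```

Because the linear form smooths away some local detail that softmax attention captures sharply, designs in this family typically add convolution-like components for local feature interactions, which matches the compensating components the researchers describe.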
Realizing the Potential of Efficient Computer Vision
The EfficientViT model series is designed with a hardware-friendly architecture, enabling it to run smoothly on various devices, such as virtual reality headsets and edge computers in autonomous vehicles. It can also be applied to image classification and other computer vision tasks.
Testing on semantic segmentation datasets demonstrated that EfficientViT is up to nine times faster, with the same or better accuracy, than other popular vision transformer models. This opens up possibilities for running efficient and accurate computer vision models on mobile devices as well as in the cloud.
Next, the researchers plan to apply this technique to speed up generative machine learning models and continue scaling EfficientViT for various vision tasks.
Industry experts recognize the significance of this research. They believe that efficient transformer models, like EfficientViT, offer immense potential for real-world applications, such as enhancing image quality in video games and driving efficient and green AI computing.