(Article Title: Semantic-SAM: A Universal Image Segmentation Model)
Artificial intelligence has advanced rapidly in recent years. Among the most prominent developments are large language models, which have drawn attention for their striking ability to mimic human-like reasoning and dialogue. Progress has not been confined to language: comparable strides have been made in computer vision. Yet despite remarkable success in natural language processing and controllable image generation, pixel-level image understanding, and universal image segmentation in particular, remains limited.
Image segmentation is the task of partitioning an image into meaningful regions. Although the technique has improved steadily, building a universal image segmentation model that can handle diverse images at multiple levels of detail remains an open problem. The main obstacles are twofold: sufficient training data is hard to obtain, and current model designs lack flexibility. Most existing methods follow a single-input, single-output pipeline and therefore cannot predict segmentation masks at different granularities. In addition, scaling up segmentation datasets that carry both semantic and granularity annotations is expensive.
To address these limitations, a team of researchers has introduced Semantic-SAM, a universal image segmentation model that can segment and recognize objects at any desired level of detail based on user input. The model provides semantic labels for both whole objects and their parts, and predicts masks at multiple granularities when a user clicks on the image. Its decoder adopts a multi-choice learning scheme to handle these granularities: each click is represented by several queries, each carrying a distinct level embedding, and the queries are trained against ground-truth masks of different granularities.
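The multi-choice idea can be sketched in toy form: one click embedding is expanded into several queries via level embeddings, and each ground-truth granularity is matched to the query that fits it best. Everything below (the query count, dimensions, and the squared-error stand-in for the real mask losses) is an illustrative assumption, not the paper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

K = 6          # queries per click (hypothetical count)
D = 16         # embedding dimension (toy size)
H = W = 8      # toy mask resolution

# A single user click is encoded once, then expanded into K queries,
# each offset by a distinct learned "level embedding".
click_embedding = rng.normal(size=D)
level_embeddings = rng.normal(size=(K, D))            # one per granularity level
queries = click_embedding[None, :] + level_embeddings  # (K, D)

# Toy mask head: dot each query against per-pixel features.
pixel_features = rng.normal(size=(H * W, D))
mask_logits = queries @ pixel_features.T               # (K, H*W)
pred_masks = (mask_logits > 0).astype(float)

# Many-to-many matching: each ground-truth mask (a different granularity
# of the same clicked region) trains the query with the lowest loss.
gt_masks = (rng.random(size=(3, H * W)) > 0.5).astype(float)  # 3 granularities

def mask_loss(pred, gt):
    return float(np.mean((pred - gt) ** 2))  # stand-in for dice/CE losses

assignments = [
    int(np.argmin([mask_loss(pred_masks[k], g) for k in range(K)]))
    for g in gt_masks
]
print(assignments)  # index of the query assigned to each granularity
```

The key design point this illustrates is that one prompt deliberately yields multiple candidate masks, so ambiguity in what the user meant (part vs. whole) is resolved by training rather than forced into a single output.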
To achieve semantic awareness, the team uses a decoupled classification strategy for objects and parts. The model encodes object and part labels with a shared text encoder, allowing distinct classification procedures while the loss is adjusted according to the input type. This design lets the model train on the SAM (SA-1B) dataset, whose masks lack category labels, alongside general segmentation datasets that provide them.
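A minimal sketch of that decoupling, under loose assumptions: a shared text-embedding table scores queries against object and part vocabularies separately, and the classification loss is applied only for whichever label types a given dataset supplies. The vocabularies, the lookup-table "text encoder", and the function names are all illustrative, not the paper's API.

```python
import numpy as np

rng = np.random.default_rng(1)
D = 16

# Hypothetical vocabularies; a real system would use a learned text encoder,
# here a fixed random lookup table stands in for it.
object_labels = ["dog", "car", "person"]
part_labels = ["head", "wheel", "arm"]
text_embed = {name: rng.normal(size=D) for name in object_labels + part_labels}

def classify(query, names):
    # Softmax over dot-product similarity against the shared text embeddings.
    sims = np.array([query @ text_embed[n] for n in names])
    e = np.exp(sims - sims.max())
    return e / e.sum()

def decoupled_loss(query, obj_target=None, part_target=None):
    """Apply cross-entropy only for the label types the dataset provides."""
    loss = 0.0
    if obj_target is not None:   # e.g. Objects365 supplies object labels
        loss -= np.log(classify(query, object_labels)[object_labels.index(obj_target)])
    if part_target is not None:  # e.g. PACO supplies part labels
        loss -= np.log(classify(query, part_labels)[part_labels.index(part_target)])
    return loss                  # SA-1B samples: both None, so no class loss

q = rng.normal(size=D)
print(decoupled_loss(q, obj_target="dog"), decoupled_loss(q))
```

Because the object head and part head are scored independently, a dataset that only labels parts never penalizes the object predictions, and class-agnostic SA-1B masks contribute no classification loss at all.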
To enrich both semantics and granularity, the team combined seven datasets spanning different levels of detail: the SA-1B dataset, part segmentation datasets such as PASCAL Part, PACO, and PartImageNet, and generic segmentation datasets such as MSCOCO and Objects365. The data formats were unified to fit Semantic-SAM's training objectives.
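Unifying those formats might look like the following sketch, where each source's annotations are mapped into one record schema and absent label types are left empty. The field names and source groupings here are illustrative assumptions, not the project's actual data schema.

```python
# Hypothetical unified training record; field names are illustrative.
def to_unified(sample, source):
    record = {
        "image": sample["image"],
        "mask": sample["mask"],
        "object_label": None,
        "part_label": None,
        "click": sample.get("click"),  # interactive samples may supply clicks
    }
    if source in {"MSCOCO", "Objects365"}:                    # object-level labels
        record["object_label"] = sample["category"]
    elif source in {"PASCAL Part", "PACO", "PartImageNet"}:   # part-level labels
        record["part_label"] = sample["category"]
    # SA-1B: class-agnostic masks, so both labels stay None
    return record

r = to_unified({"image": "img.jpg", "mask": [[0, 1]], "category": "wheel"}, "PACO")
print(r["part_label"], r["object_label"])  # wheel None
```

A shared record like this is what lets one training loop consume object-labeled, part-labeled, and unlabeled masks interchangeably, with the decoupled loss deciding which fields to supervise.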
Evaluations show that Semantic-SAM outperforms existing models. Jointly training on SA-1B promptable segmentation and COCO panoptic segmentation improves performance significantly, yielding a 2.3-point box AP gain and a 1.2-point mask AP gain. The model also surpasses SAM by more than 3.4 1-IoU in granularity completeness.
Semantic-SAM is a notable advancement in image segmentation. By combining universal representation, semantic awareness, and rich granularity, it opens new opportunities for pixel-level image analysis.
Check out the Paper and GitHub link for more information.
(Article by Tanya Malhotra, a final year undergrad from the University of Petroleum & Energy Studies, Dehradun, specializing in Artificial Intelligence and Machine Learning.)