Image Retrieval in AI: Mapping Pictures to Words
Image retrieval is the task of finding the images that best match a query, which requires representing each image accurately and with minimal loss of information. To address this challenge, researchers have explored using text embeddings to represent images, so that images and text can be searched in the same space. However, representing an image this way has led to significant information loss and reduced retrieval accuracy. The work belongs to computer vision and builds on neural image encoders such as convolutional neural networks.
To overcome the limitations of text embeddings, Google AI researchers introduced a method called Pic2Word. The method maps a picture to a word token with minimal loss of information. Unlike other methods, Pic2Word does not require labeled data: it can be trained on unlabeled images and captioned images, which are easier to collect. The image and caption information is passed through a neural network whose hidden layers learn this mapping, and the resulting token is used to retrieve matching images rather than to generate a new one.
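Learning from captioned images without manual labels is commonly done with a contrastive objective: the embedding of each image is pulled toward the embedding of its own caption and pushed away from the other captions in the batch. The sketch below is a minimal, pure-Python illustration of such a loss on toy vectors. The function names, the temperature value, and the data are assumptions made for this example, and it is not the Pic2Word training code.

```python
import math

def normalize(v):
    # Scale a vector to unit length so dot products become cosine similarities.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cross_entropy_diagonal(logits):
    # Average cross-entropy where the correct "class" for row k is column k.
    total = 0.0
    for k, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[k]
    return total / len(logits)

def contrastive_loss(image_embs, text_embs, temperature=0.07):
    # Symmetric contrastive loss over a batch of (image, text) pairs.
    # temperature=0.07 is an assumed value chosen for illustration.
    imgs = [normalize(v) for v in image_embs]
    txts = [normalize(v) for v in text_embs]
    logits = [[sum(a * b for a, b in zip(i, t)) / temperature for t in txts]
              for i in imgs]
    transposed = [list(col) for col in zip(*logits)]
    # Average the image-to-text and text-to-image directions.
    return 0.5 * (cross_entropy_diagonal(logits) + cross_entropy_diagonal(transposed))

# Toy batch: two images and two caption embeddings (made-up numbers).
images = [[1.0, 0.0], [0.0, 1.0]]
matched_captions = [[0.9, 0.1], [0.2, 0.8]]   # caption k describes image k
swapped_captions = [[0.2, 0.8], [0.9, 0.1]]   # captions paired with the wrong image

matched_loss = contrastive_loss(images, matched_captions)
swapped_loss = contrastive_loss(images, swapped_captions)
print(matched_loss < swapped_loss)
```

The loss is small when each image lines up with its own caption and large when the pairs are swapped, which is the signal a model can learn from without any extra labels.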
The method builds on a contrastive language-image pre-trained model (CLIP), which generates embeddings for both text and images. The image is first passed through the visual encoder to obtain a visual embedding. That embedding is then mapped to a token and passed through the text encoder to obtain a text embedding. The resulting embedding is used to search for similar images, so that the retrieved images match the query with minimal loss. In fashion attribute composition, for example, the retrieved image is expected to keep the same color as the input image.
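The search step can be illustrated with a small sketch. Assuming every image has already been turned into an embedding vector, retrieval reduces to ranking the gallery by cosine similarity to the query embedding. The vectors, the image names, and the `retrieve` helper below are made up for illustration and are not taken from the Pic2Word code.

```python
import math

def cosine_similarity(a, b):
    # Cosine of the angle between two vectors: 1.0 means same direction.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def retrieve(query, gallery, top_k=2):
    # Score every gallery image against the query and return the best matches.
    scored = [(name, cosine_similarity(query, emb)) for name, emb in gallery.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

# Hypothetical 3-dimensional embeddings; real encoders produce far larger vectors.
gallery = {
    "red_dress": [0.9, 0.1, 0.0],
    "blue_dress": [0.1, 0.9, 0.0],
    "red_shoes": [0.7, 0.0, 0.7],
}
query = [1.0, 0.0, 0.1]

results = retrieve(query, gallery)
for name, score in results:
    print(name, round(score, 3))
```

With these toy numbers the two "red" items rank above the blue one, because their embeddings point in nearly the same direction as the query.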
These methods have proven effective at mapping images to word tokens. The researchers suggest using a pre-trained CLIP model and treating an image as a text token, which allows image features and text descriptions to be composed flexibly in a single query. Pic2Word has been shown to perform well across a range of tasks.
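To make the idea of treating an image as a text token more concrete, here is a rough sketch. It assumes a linear mapping from the image embedding to the word-embedding space, a placeholder symbol `*` in the prompt that gets replaced by the mapped token, and a toy text encoder that simply averages token embeddings. All three are simplifications chosen for this example. A real CLIP text encoder is a transformer, and the names below are hypothetical.

```python
def map_image_to_token(image_embedding, weights):
    # Hypothetical linear mapping from image-embedding space to word-token space.
    return [sum(w * x for w, x in zip(row, image_embedding)) for row in weights]

def compose_query(prompt_words, word_embeddings, image_token, placeholder="*"):
    # Build the token sequence, dropping the image token in at the placeholder.
    return [image_token if word == placeholder else word_embeddings[word]
            for word in prompt_words]

def toy_text_encoder(token_sequence):
    # Stand-in for a text encoder: mean-pool the token embeddings.
    dim = len(token_sequence[0])
    count = len(token_sequence)
    return [sum(token[i] for token in token_sequence) / count for i in range(dim)]

# Made-up 2-dimensional word embeddings and a 3-dimensional image embedding.
word_embeddings = {"a": [0.0, 0.0], "sketch": [0.0, 1.0], "of": [0.0, 0.0]}
weights = [[1.0, 0.0, 0.0],
           [0.0, 1.0, 0.0]]
image_embedding = [0.5, 0.25, 0.9]

image_token = map_image_to_token(image_embedding, weights)
tokens = compose_query(["a", "sketch", "of", "*"], word_embeddings, image_token)
query_embedding = toy_text_encoder(tokens)
print(image_token, query_embedding)
```

Because the image enters the prompt as just another token, the surrounding words can be changed freely ("a sketch of *", "a photo of *") without retraining anything in this sketch, which is the flexibility the composition idea relies on.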
Note: This article was written by Bhoumik Mhatre, a third-year undergraduate student at IIT Kharagpur pursuing a B.Tech + M.Tech program in Mining Engineering with a minor in economics. He is currently a research intern at the National University of Singapore and a partner at Digiaxx Company. Bhoumik is passionate about the field of Data Science and its recent developments.