The Significance of Contrastive Language-Image Pretraining
The UniReps Workshop at NeurIPS 2023 has accepted a paper on contrastive language-image pretraining (CLIP), a method that has become the standard approach for training vision-language models. CLIP learns to align images and captions in a shared embedding space, and its visual features are widely used as global representations of images. However, while these features transfer well to many tasks, they fall short on tasks that demand finer-grained understanding, such as object localization, pixel-level prediction, and 3D perception.
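For readers unfamiliar with the mechanics, the sketch below shows the symmetric contrastive (InfoNCE) objective at the heart of CLIP training. This is a minimal PyTorch illustration under standard assumptions; the temperature value and function name are placeholders, not details from the paper:

```python
import torch
import torch.nn.functional as F

def clip_contrastive_loss(image_features, text_features, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired image/text embeddings.

    image_features, text_features: (batch, dim) tensors from the two encoders.
    Matching pairs share the same row index; all other rows act as negatives.
    """
    # Normalize so that dot products are cosine similarities.
    image_features = F.normalize(image_features, dim=-1)
    text_features = F.normalize(text_features, dim=-1)

    # (batch, batch) similarity matrix, scaled by temperature.
    logits = image_features @ text_features.t() / temperature

    # The i-th image matches the i-th caption.
    targets = torch.arange(logits.size(0), device=logits.device)

    # Average the image-to-text and text-to-image cross-entropy losses.
    loss_i2t = F.cross_entropy(logits, targets)
    loss_t2i = F.cross_entropy(logits.t(), targets)
    return (loss_i2t + loss_t2i) / 2
```

Pulling the two embedding spaces together this way yields strong global image features, which is precisely why they struggle on the localized, dense tasks mentioned above.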
The Challenge of Multi-Task Training
To address these limitations, one popular solution is multi-task training. However, building a large-scale annotated multi-task dataset is costly, and training on separate task-specific datasets brings its own challenges, such as reconciling gradients and knowledge that come from different input distributions and tasks.
Improving CLIP Features with Pseudo-Labeling
To overcome these challenges, the paper explores pseudo-labeling with task-specific experts to enhance CLIP features for more demanding downstream tasks. Multiple existing pretrained models, the experts, are used to pseudo-label an uncurated web-scale image-caption dataset. CLIP is then trained with its usual contrastive loss plus task-specific losses computed on the pseudo-labels through lightweight heads attached to the vision backbone, with the goal of improving performance on these challenging tasks.
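A rough PyTorch sketch of what one step of this training recipe might look like is given below. The head architecture, expert interfaces, loss choices, and loss weights are illustrative assumptions, not the paper's actual design:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LightweightHead(nn.Module):
    """Small task-specific head attached to the shared vision backbone."""
    def __init__(self, feat_dim, out_dim):
        super().__init__()
        self.proj = nn.Linear(feat_dim, out_dim)

    def forward(self, features):
        return self.proj(features)

def training_step(vision_backbone, text_encoder, heads, experts,
                  images, captions, task_weights, temperature=0.07):
    """One training step combining contrastive and pseudo-label losses.

    heads:   dict of task name -> LightweightHead on backbone features
    experts: dict of task name -> frozen pretrained expert model
    """
    image_features = vision_backbone(images)   # (B, D) shared features
    text_features = text_encoder(captions)     # (B, D)

    # Standard symmetric CLIP contrastive loss on the image-caption pairs.
    img = F.normalize(image_features, dim=-1)
    txt = F.normalize(text_features, dim=-1)
    logits = img @ txt.t() / temperature
    targets = torch.arange(logits.size(0), device=logits.device)
    loss = (F.cross_entropy(logits, targets)
            + F.cross_entropy(logits.t(), targets)) / 2

    # Task-specific losses against pseudo-labels from the frozen experts.
    for task, head in heads.items():
        with torch.no_grad():                  # experts only label, never train
            pseudo_labels = experts[task](images)
        predictions = head(image_features)
        # MSE is a placeholder; each task would use an appropriate loss
        # (e.g. cross-entropy for segmentation, a scale-invariant loss for depth).
        loss = loss + task_weights[task] * F.mse_loss(predictions, pseudo_labels)

    return loss
```

Because the experts supply labels on the same web-scale image-caption corpus used for contrastive training, this setup sidesteps the cost of a curated multi-task dataset while still exposing the backbone to multiple supervision signals at once.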