Visual Object Tracking: The Backbone of Computer Vision
Visual object tracking plays a crucial role in various subfields of computer vision, such as robot vision and autonomous driving. Its goal is to estimate the position and extent of a target object throughout a video sequence, given its location in the first frame. The Visual Object Tracking (VOT) challenge is the field's flagship benchmark competition, where state-of-the-art algorithms are compared to drive progress in tracking techniques.
The Visual Object Tracking and Segmentation Competition (VOTS2023)
The Visual Object Tracking and Segmentation competition (VOTS2023) expands the scope of the VOT challenge by removing certain restrictions, allowing participants to explore object tracking in a broader context. VOTS2023 merges short-term and long-term tracking of single and multiple targets, with target segmentation masks as the sole position specification. This introduces new challenges, such as precise mask estimation, multi-target trajectory tracking, and recognizing relationships between objects. A sketch of the resulting protocol follows.
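To make the protocol concrete, the following Python sketch (with a purely hypothetical tracker interface, not part of VOTS or HQTrack) shows the kind of per-frame, per-target mask output such a tracker has to produce:

```python
import numpy as np

def track_sequence(tracker, frames, init_masks):
    """Hypothetical VOTS-style loop: the tracker is initialized with one
    segmentation mask per target on the first frame and must report a
    (possibly empty) mask for every target on every later frame."""
    tracker.initialize(frames[0], init_masks)      # masks are the only position specification
    results = []
    for frame in frames[1:]:
        masks = tracker.update(frame)              # dict: target_id -> HxW boolean mask
        # An all-False mask signals "target not visible", which is how the
        # long-term aspect (disappearance and reappearance) is handled.
        results.append({tid: np.asarray(m, dtype=bool) for tid, m in masks.items()})
    return results
```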
The HQTrack System: High-Quality Tracking
A research team from Dalian University of Technology, China, and DAMO Academy, Alibaba Group, introduces HQTrack, short for High-Quality Tracking. HQTrack consists primarily of a video multi-object segmenter (VMOS) and a mask refiner (MR). VMOS is an improved variant of DeAOT: a gated propagation module (GPM) operating at 1/8 scale is cascaded into it so that tiny objects in complex scenes can be perceived, and Intern-T is employed as the feature extractor to strengthen object discrimination. To limit memory usage, VMOS retains only the most recent frame in its long-term memory and discards older frames. The masks VMOS produces can still be imprecise for objects with complex structures, so a larger segmentation model is used to refine them.
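The overall pipeline can be pictured with a minimal structural sketch in PyTorch; the class and argument names below are placeholders chosen for illustration, not the authors' implementation:

```python
import torch
import torch.nn as nn

class HQTrackSketch(nn.Module):
    """Structural sketch of the two-stage design: a VMOS-style propagation
    segmenter produces coarse masks, and a mask refiner (MR) polishes them."""
    def __init__(self, backbone, vmos, refiner):
        super().__init__()
        self.backbone = backbone  # feature extractor (Intern-T in the paper)
        self.vmos = vmos          # DeAOT-style segmenter with a 1/8-scale GPM cascaded in
        self.refiner = refiner    # larger pre-trained segmentation model used as MR

    @torch.no_grad()
    def forward(self, frame, memory):
        feats = self.backbone(frame)
        # Propagate masks from memory; only the most recent frame is kept
        # in long-term memory to bound memory growth.
        coarse_masks, memory = self.vmos(feats, memory)
        refined_masks = self.refiner(frame, coarse_masks)
        return coarse_masks, refined_masks, memory
```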
Improving Tracking Masks with HQ-SAM Model
To further enhance the quality of tracking masks, a pre-trained HQ-SAM model serves as the mask refiner. The outer enclosing boxes of the masks predicted by VMOS are used as box prompts and fed into HQ-SAM together with the original images to obtain refined masks; the final tracking result for each object is then selected from the VMOS and MR outputs. HQTrack achieved second place in the VOTS2023 competition with a quality score of 0.615 on the test set.
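A rough sketch of this refinement step is shown below, assuming the HQ-SAM code keeps the original Segment Anything predictor interface; the package, model type, and checkpoint names are assumptions for illustration:

```python
import numpy as np
# Assumption: the HQ-SAM fork mirrors the original Segment Anything API.
from segment_anything_hq import sam_model_registry, SamPredictor

def mask_to_box(mask):
    """Outer enclosing box (x0, y0, x1, y1) of a binary mask, or None if empty."""
    ys, xs = np.nonzero(mask)
    if xs.size == 0:
        return None
    return np.array([xs.min(), ys.min(), xs.max(), ys.max()])

def refine_with_hqsam(image, vmos_mask, checkpoint="sam_hq_vit_h.pth"):
    """Use the VMOS mask's bounding box as a box prompt for HQ-SAM."""
    box = mask_to_box(vmos_mask)
    if box is None:                       # target absent: nothing to refine
        return vmos_mask
    sam = sam_model_registry["vit_h"](checkpoint=checkpoint)
    predictor = SamPredictor(sam)
    predictor.set_image(image)            # original RGB image, HxWx3 uint8
    refined, _, _ = predictor.predict(box=box, multimask_output=False)
    return refined[0].astype(bool)        # refined mask at image resolution
```

The final mask per object is then chosen between the VMOS output and this refined mask; a simple heuristic (an assumption here, not necessarily HQTrack's rule) would be to accept the refined mask only when it overlaps the VMOS mask sufficiently.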
Check out the paper and GitHub for more details. All credit for this research goes to the researchers involved in the project.