SwinTransTrack: Multi-object Tracking Using Shifted Window Transformers
Tianci Zhao, Changwen Zheng, Qingmeng Zhu, Hao He
TLDR
This work proposes SwinTransTrack, a novel shifted-window encoder and decoder model based on Swin Transformer. It fuses low-rank adaptation to achieve feature dimension enhancement and introduces a new shifted-window decoder network that obtains accurate displacements for associating trajectories.
Abstract
With the great popularity of Transformers, many works have used them to explore the temporal association of objects across video frames. However, because visual entities vary widely in scale and images have a high pixel resolution, the original Transformers are slow in both training and inference. Based on Swin Transformer, we propose SwinTransTrack, a novel shifted-window encoder and decoder model. Unlike the original model, we fuse low-rank adaptation to achieve feature dimension enhancement and propose a new shifted-window decoder network that obtains accurate displacements for associating trajectories. Finally, we conduct extensive quantitative experiments on two MOT datasets, MOT17 and MOT20. The experimental results show that SwinTransTrack achieves 75.5 MOTA on MOT17 and 67.5 MOTA on MOT20, leading both MOT competitions.
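For readers unfamiliar with the shifted-window idea that the model builds on, the following is a minimal sketch of the general Swin Transformer mechanism, not the authors' implementation. It partitions a small feature grid into non-overlapping windows, then repeats the partition after a cyclic shift so that the new windows straddle the previous window boundaries. The function names `cyclic_shift` and `window_partition`, the 4x4 grid, and the window size of 2 are assumptions chosen for illustration; attention itself, and the masking Swin applies to windows that wrap around the border, are omitted.

```python
def cyclic_shift(grid, shift):
    """Roll a 2D grid up and left by `shift` cells, wrapping around the border."""
    h, w = len(grid), len(grid[0])
    return [[grid[(r + shift) % h][(c + shift) % w] for c in range(w)]
            for r in range(h)]


def window_partition(grid, window):
    """Split a 2D grid into non-overlapping window x window blocks.

    Each block is returned as a flat list in row-major order.
    """
    h, w = len(grid), len(grid[0])
    if h % window or w % window:
        raise ValueError("grid size must be divisible by the window size")
    windows = []
    for top in range(0, h, window):
        for left in range(0, w, window):
            windows.append([grid[r][c]
                            for r in range(top, top + window)
                            for c in range(left, left + window)])
    return windows


# Hypothetical 4x4 "feature map" whose cells are labelled 0..15.
grid = [[4 * r + c for c in range(4)] for r in range(4)]

# Regular partition: attention would be restricted to each 2x2 block.
regular = window_partition(grid, 2)

# Shifted partition: shift by half the window size, then partition again.
shifted = window_partition(cyclic_shift(grid, 1), 2)

print(regular[0])  # [0, 1, 4, 5]
print(shifted[0])  # [5, 6, 9, 10]
```

In the regular partition, cells 5, 6, 9, and 10 fall into four different windows; after the shift they share one window. Alternating the two partitions is how window-local attention can exchange information across window boundaries while keeping the cost of each attention step bounded by the window size.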
