Abstract:
The SiamRPN tracker relies on a shallow network that extracts only first-order features, which makes it difficult to capture comprehensive feature information. It also lacks an occlusion discrimination mechanism, which often leads to target drift or loss. To address these issues, this paper proposes a Siamese (twin) network tracking method that combines bilinear feature fusion with adaptive re-detection. We use an improved ResNet50 network for sequential feature extraction and fuse the feature vectors from the final three residual blocks in a bilinear cascade, which provides second-order feature information. The region proposal network then generates the target box. To detect occlusion, we compute the average peak-to-correlation energy (APCE) corresponding to the target box. When occlusion is detected, neighboring detection windows are established around the tracking result from the previous frame, and the window used for target re-detection is chosen by a combination of weighted sequential selection and random selection. Experimental results on the OTB100 and UAV123 datasets demonstrate the effectiveness of the proposed method, which achieves tracking success rates of 89.4% and 80.0% and tracking accuracies of 66.9% and 60.5%, respectively. The method also maintains good real-time tracking performance.
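The occlusion check described above can be illustrated with a minimal sketch. It assumes the commonly used definition of average peak-to-correlation energy, APCE = |F_max - F_min|^2 / mean((F - F_min)^2), computed over a 2-D response map F. The function names `apce` and `is_occluded`, the `history` list, and the `ratio` threshold are hypothetical and chosen only for illustration; the abstract does not specify the decision rule used in the paper.

```python
import numpy as np


def apce(response: np.ndarray) -> float:
    """Average peak-to-correlation energy of a 2-D response map.

    A single sharp peak gives a high value; a flat or multi-peak map
    (typical under occlusion) gives a low value.
    """
    f_max = response.max()
    f_min = response.min()
    # Mean squared deviation of every response value from the minimum.
    energy = np.mean((response - f_min) ** 2)
    if energy == 0:
        return 0.0  # completely flat map: no distinguishable peak
    return float((f_max - f_min) ** 2 / energy)


def is_occluded(response: np.ndarray, history: list, ratio: float = 0.5) -> bool:
    """Hypothetical rule (an assumption, not the paper's stated criterion):
    flag occlusion when the current APCE drops below `ratio` times the
    mean APCE of earlier, trusted frames."""
    if not history:
        return False  # nothing to compare against yet
    return apce(response) < ratio * float(np.mean(history))


# Usage: a response map with one clear peak versus a flat one.
sharp = np.zeros((5, 5))
sharp[2, 2] = 1.0
flat = np.ones((5, 5))

trusted = [apce(sharp)]
occluded_now = is_occluded(flat, trusted)
```

In this sketch a tracker would run re-detection only on frames where `is_occluded` returns `True`, and would append the APCE of confidently tracked frames to `history`.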