Paper Title
Accurate Bounding-box Regression with Distance-IoU Loss for Visual Tracking
Paper Authors
Paper Abstract
Most existing trackers estimate the target state using a classifier together with multi-scale estimation. Consequently, and as expected, trackers have become more stable while tracking accuracy has stagnated. Although trackers adopt a maximum-overlap method based on an intersection-over-union (IoU) loss to mitigate this problem, the IoU loss itself has a defect: it cannot continue to optimize the objective function when one bounding box completely contains, or does not overlap with, another. This makes it very challenging to estimate the target state accurately. Accordingly, in this paper, we address the above problem by proposing a novel tracking method based on a Distance-IoU (DIoU) loss, such that the proposed tracker consists of a target-estimation part and a target-classification part. The target-estimation part is trained to predict the DIoU score between the ground-truth bounding box and the estimated bounding box. The DIoU loss retains the advantage provided by the IoU loss while additionally minimizing the distance between the center points of the two bounding boxes, thereby making target estimation more accurate. Moreover, we introduce a classification part that is trained online and optimized with a conjugate-gradient-based strategy to guarantee real-time tracking speed. Comprehensive experimental results demonstrate that the proposed method achieves competitive tracking accuracy compared with state-of-the-art trackers while running at a real-time tracking speed.
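As a concrete illustration of the penalty the abstract describes, below is a minimal PyTorch-style sketch of the standard DIoU formulation (DIoU = IoU - d²/c², where d is the distance between the two box centers and c is the diagonal of the smallest box enclosing both). This is a generic sketch under the assumption of axis-aligned boxes in (x1, y1, x2, y2) format, not the authors' actual training code; the function name `diou_loss` and the `eps` parameter are illustrative choices.

```python
import torch

def diou_loss(pred, target, eps=1e-7):
    """Sketch of a Distance-IoU loss: returns 1 - DIoU per box pair.

    pred, target: tensors of shape (N, 4) in (x1, y1, x2, y2) format.
    """
    # Intersection area
    ix1 = torch.max(pred[:, 0], target[:, 0])
    iy1 = torch.max(pred[:, 1], target[:, 1])
    ix2 = torch.min(pred[:, 2], target[:, 2])
    iy2 = torch.min(pred[:, 3], target[:, 3])
    inter = (ix2 - ix1).clamp(min=0) * (iy2 - iy1).clamp(min=0)

    # Union area and plain IoU
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    union = area_p + area_t - inter + eps
    iou = inter / union

    # Squared distance between the two box centers (the extra penalty term)
    cx_p = (pred[:, 0] + pred[:, 2]) / 2
    cy_p = (pred[:, 1] + pred[:, 3]) / 2
    cx_t = (target[:, 0] + target[:, 2]) / 2
    cy_t = (target[:, 1] + target[:, 3]) / 2
    center_dist2 = (cx_p - cx_t) ** 2 + (cy_p - cy_t) ** 2

    # Squared diagonal of the smallest enclosing box
    ex1 = torch.min(pred[:, 0], target[:, 0])
    ey1 = torch.min(pred[:, 1], target[:, 1])
    ex2 = torch.max(pred[:, 2], target[:, 2])
    ey2 = torch.max(pred[:, 3], target[:, 3])
    diag2 = (ex2 - ex1) ** 2 + (ey2 - ey1) ** 2 + eps

    diou = iou - center_dist2 / diag2
    return 1.0 - diou
```

Because the center-distance term stays non-zero and differentiable even when one box fully contains, or does not overlap, the other, the gradient does not vanish in the cases where a plain IoU loss stalls, which is the motivation stated in the abstract.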