Paper Title
Adaptive Label Smoothing
Paper Authors
Paper Abstract
This paper concerns the use of objectness measures to improve the calibration performance of Convolutional Neural Networks (CNNs). CNNs have proven to be very good classifiers and generally localize objects well; however, the loss functions typically used to train classification CNNs do not penalize the inability to localize an object, nor do they take into account an object's relative size in the given image. During training on ImageNet-1K, almost all approaches use random crops of the images, and this transformation sometimes provides the CNN with background-only samples, causing the classifier to depend on context. Context dependence is harmful in safety-critical applications. We present a novel approach to classification that combines the ideas of objectness and label smoothing during training. Unlike previous methods, we compute a smoothing factor that is \emph{adaptive}, based on the relative object size within an image. As a result, our approach produces confidences that are grounded in the size of the object being classified, rather than relying on context to make correct predictions. We present extensive results on ImageNet demonstrating that CNNs trained with adaptive label smoothing are much less likely to be overconfident in their predictions. We show qualitative results using class activation maps and quantitative results on classification and transfer learning tasks. Compared to baselines, our approach yields an order-of-magnitude reduction in confidence when predicting on context-only images. Using transfer learning, we gain 2.1 mAP on MS COCO compared to the hard-label approach.
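
The abstract does not give the exact smoothing formula, but the core idea (a per-sample smoothing factor that grows as the object's relative size shrinks) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the assumed rule alpha = 1 - object_fraction, and the names adaptive_smoothing_loss and object_fraction, are hypothetical.

import torch
import torch.nn.functional as F

def adaptive_smoothing_loss(logits, targets, object_fraction, num_classes):
    # logits:          (B, K) raw class scores from the CNN
    # targets:         (B,)   ground-truth class indices
    # object_fraction: (B,)   relative object size in [0, 1], e.g. the
    #                         annotated box area divided by the crop area
    # Assumed rule: smooth more when the object covers less of the crop,
    # so alpha = 1 - object_fraction (the paper's exact formula may differ).
    alpha = (1.0 - object_fraction).clamp(0.0, 1.0).unsqueeze(1)   # (B, 1)

    one_hot = F.one_hot(targets, num_classes).float()              # (B, K)
    uniform = torch.full_like(one_hot, 1.0 / num_classes)

    # Per-sample soft targets: confident for large objects,
    # near-uniform for background-dominated crops.
    soft_targets = (1.0 - alpha) * one_hot + alpha * uniform

    log_probs = F.log_softmax(logits, dim=1)
    return -(soft_targets * log_probs).sum(dim=1).mean()

Under this assumed rule, a crop containing no object at all (object_fraction = 0) receives a fully uniform target, so the network is explicitly trained to be uncertain on background-only samples, consistent with the reduced confidence on context-only images described above.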