Location-Aware Feature Selection Text Detection Network

Paper Title

Location-Aware Feature Selection Text Detection Network

Authors

Guo, Zengyuan; Wang, Zilin; Wang, Zhihui; Ouyang, Wanli; Li, Haojie; Gao, Wen

Abstract

Regression-based text detection methods have already achieved promising performance with simple network structures and high efficiency. However, they lag behind recent segmentation-based text detectors in accuracy. In this work, we find that one important reason for this gap is that regression-based methods usually use a fixed feature selection scheme, i.e., selecting features at a single location or in neighboring regions, to predict the components of the bounding box, such as the distances to the boundaries or the rotation angle. Features selected in this way are sometimes not the best choice for predicting every component of a text bounding box, which degrades accuracy. To address this issue, we propose a novel Location-Aware feature Selection text detection Network (LASNet). LASNet selects suitable features from different locations to separately predict the five components of a bounding box and obtains the final bounding box by combining these components. Specifically, instead of using the classification score map to select one feature for predicting the whole bounding box, as most existing methods do, LASNet first learns five new confidence score maps that indicate the prediction accuracy of the respective bounding box components. Then, a Location-Aware Feature Selection mechanism (LAFS) is designed to fuse the top-$K$ prediction results for each component, weighted by their confidence scores, and to combine all five fused components into a final bounding box. As a result, LASNet predicts more accurate bounding boxes by using a learnable feature selection scheme. Experimental results demonstrate that LASNet achieves state-of-the-art performance with single-model and single-scale testing, outperforming all existing regression-based detectors.
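
The per-component top-$K$ fusion described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not give the exact weighting formula, so the sketch assumes a confidence-weighted average with weights normalized to sum to one. The component names (`top`, `right`, `bottom`, `left`, `angle`) and the function names `fuse_top_k` and `fuse_box` are hypothetical. The sketch also assumes that the candidate values for each component have already been converted to a common reference frame, so that predictions from different locations can be averaged directly.

```python
from typing import Dict, Sequence

# Assumed naming for the five bounding-box components (four boundary
# distances plus a rotation angle); the abstract does not name them.
COMPONENTS = ("top", "right", "bottom", "left", "angle")


def fuse_top_k(values: Sequence[float], scores: Sequence[float], k: int) -> float:
    """Confidence-weighted average of the k highest-scoring candidates.

    values: candidate predictions of ONE component from different locations,
            assumed to be expressed in a common reference frame.
    scores: confidence score of each candidate (higher means more reliable).
    """
    if len(values) != len(scores) or len(values) == 0:
        raise ValueError("values and scores must be non-empty and equal in length")
    k = max(1, min(k, len(values)))
    # Rank candidates by confidence and keep the top k.
    ranked = sorted(zip(scores, values), key=lambda pair: pair[0], reverse=True)[:k]
    total = sum(score for score, _ in ranked)
    if total <= 0:
        # Degenerate case: fall back to a plain average of the kept candidates.
        return sum(value for _, value in ranked) / len(ranked)
    # Assumed weighting: scores normalized to sum to one.
    return sum(score * value for score, value in ranked) / total


def fuse_box(
    preds: Dict[str, Sequence[float]],
    confs: Dict[str, Sequence[float]],
    k: int,
) -> Dict[str, float]:
    """Fuse each component independently, then combine them into one box."""
    return {name: fuse_top_k(preds[name], confs[name], k) for name in COMPONENTS}


if __name__ == "__main__":
    # Toy data: three candidate locations per component.
    preds = {name: [10.0, 20.0, 100.0] for name in COMPONENTS}
    confs = {name: [0.75, 0.25, 0.1] for name in COMPONENTS}
    print(fuse_box(preds, confs, k=2))
```

Because every component has its own confidence scores, the candidates kept for one boundary can come from different locations than those kept for another, which is the contrast with selecting a single location for the whole box. A plain weighted average of angles is only reasonable when the candidate angles are close together and do not wrap around.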
