Optimal Boxes: Boosting End-to-End Scene Text Recognition by Adjusting Annotated Bounding Boxes via Reinforcement Learning

Paper title

Optimal Boxes: Boosting End-to-End Scene Text Recognition by Adjusting Annotated Bounding Boxes via Reinforcement Learning

Authors

Jingqun Tang, Wenming Qian, Luchuan Song, Xiena Dong, Lan Li, Xiang Bai

Abstract

Text detection and recognition are essential components of a modern OCR system. Most OCR approaches attempt to obtain accurate text bounding boxes at the detection stage, which are then used as the input to the text recognition stage. We observe that when using tight text bounding boxes as input, a text recognizer frequently fails to achieve optimal performance due to the inconsistency between bounding boxes and deep representations of text recognition. In this paper, we propose Box Adjuster, a reinforcement learning-based method for adjusting the shape of each text bounding box to make it more compatible with text recognition models. Additionally, when dealing with cross-domain problems such as synthetic-to-real, the proposed method significantly reduces mismatches in domain distribution between the source and target domains. Experiments demonstrate that the performance of end-to-end text recognition systems can be improved when using the adjusted bounding boxes as the ground truths for training. Specifically, on several benchmark datasets for scene text understanding, the proposed method outperforms state-of-the-art text spotters by an average of 2.0% F-Score on end-to-end text recognition tasks and 4.6% F-Score on domain adaptation tasks.
