Paper Title
Mask TextSpotter v3: Segmentation Proposal Network for Robust Scene Text Spotting
Paper Authors
Paper Abstract
Recent end-to-end trainable methods for scene text spotting, integrating detection and recognition, showed much progress. However, most of the current arbitrary-shape scene text spotters use region proposal networks (RPN) to produce proposals. RPN relies heavily on manually designed anchors and its proposals are represented with axis-aligned rectangles. The former presents difficulties in handling text instances of extreme aspect ratios or irregular shapes, and the latter often includes multiple neighboring instances into a single proposal, in cases of densely oriented text. To tackle these problems, we propose Mask TextSpotter v3, an end-to-end trainable scene text spotter that adopts a Segmentation Proposal Network (SPN) instead of an RPN. Our SPN is anchor-free and gives accurate representations of arbitrary-shape proposals. It is therefore superior to RPN in detecting text instances of extreme aspect ratios or irregular shapes. Furthermore, the accurate proposals produced by SPN allow masked RoI features to be used for decoupling neighboring text instances. As a result, our Mask TextSpotter v3 can handle text instances of extreme aspect ratios or irregular shapes, and its recognition accuracy won't be affected by nearby text or background noise. Specifically, we outperform state-of-the-art methods by 21.9 percent on the Rotated ICDAR 2013 dataset (rotation robustness), 5.9 percent on the Total-Text dataset (shape robustness), and achieve state-of-the-art performance on the MSRA-TD500 dataset (aspect ratio robustness). Code is available at: https://github.com/MhLiao/MaskTextSpotterV3
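The abstract's key mechanism, using the segmentation proposal to mask the RoI features so that neighboring text instances are decoupled, can be illustrated with a minimal NumPy sketch. The function name and shapes below are hypothetical illustrations, not the paper's actual implementation (which is available at the linked repository):

```python
import numpy as np

def mask_roi_features(roi_features, proposal_mask):
    """Zero out feature positions that fall outside the proposal's
    segmentation mask, so features from neighboring instances or
    background do not leak into recognition.

    roi_features:  (C, H, W) feature crop for one region of interest.
    proposal_mask: (H, W) binary mask of the text instance in that RoI.
    """
    # Broadcast the (H, W) mask across all C feature channels.
    return roi_features * proposal_mask[np.newaxis, :, :]

# Toy example: a 2-channel 4x4 RoI whose proposal covers the left half.
feats = np.ones((2, 4, 4))
mask = np.zeros((4, 4))
mask[:, :2] = 1.0

masked = mask_roi_features(feats, mask)
# Features inside the mask are kept; those outside are suppressed to zero.
```

Because the SPN proposal is an arbitrary-shape mask rather than an axis-aligned box, this masking step can remove a neighboring instance that an RPN-style rectangular proposal would have included in the same crop.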