ReADS: A Rectified Attentional Double Supervised Network for Scene Text Recognition

Paper Title

ReADS: A Rectified Attentional Double Supervised Network for Scene Text Recognition

Authors

Qi Song, Qianyi Jiang, Nan Li, Rui Zhang, Xiaolin Wei

Abstract

In recent years, scene text recognition has been widely regarded as a sequence-to-sequence problem. Connectionist Temporal Classification (CTC) and attentional sequence recognition (Attn) are the two prevailing approaches to this problem, but each can fail in some scenarios. CTC concentrates on each individual character but is weak at modeling semantic dependencies in text. Attn-based methods model contextual semantics better but tend to overfit on limited training data. In this paper, we design a Rectified Attentional Double Supervised Network (ReADS) for general scene text recognition. To overcome the weaknesses of CTC and Attn, our method applies both, using different modules in two supervised branches that complement each other. Moreover, spatial and channel attention mechanisms are introduced to suppress background noise and extract valid foreground information. Finally, a simple rectification network is implemented to rectify irregular text. ReADS can be trained end-to-end and requires only word-level annotations. Extensive experiments on various benchmarks verify the effectiveness of ReADS, which achieves state-of-the-art performance.
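
The per-character focus of CTC mentioned in the abstract can be illustrated with a minimal sketch of CTC best-path (greedy) decoding, which merges consecutive repeated frame labels and then drops blanks. This is a generic illustration of CTC, not the ReADS implementation; the function name `ctc_greedy_collapse` and the blank symbol `-` are assumptions chosen for this example.

```python
from itertools import groupby

BLANK = "-"  # hypothetical blank symbol chosen for this sketch


def ctc_greedy_collapse(frame_labels, blank=BLANK):
    """Collapse per-frame labels as CTC best-path decoding does:
    merge consecutive repeats first, then drop blank symbols."""
    # groupby yields one key per run of identical consecutive labels
    merged = [label for label, _ in groupby(frame_labels)]
    return "".join(label for label in merged if label != blank)


# Each character stands for the most likely label at one time step.
print(ctc_greedy_collapse("hh-e-ll-lo--"))  # prints "hello"
```

Because every frame is labeled independently before collapsing, this decoding step uses no dependency between output characters, which is consistent with the weakness in semantic dependency modeling that the abstract attributes to CTC.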
