Separating Content from Style Using Adversarial Learning for Recognizing Text in the Wild

Paper Title

Separating Content from Style Using Adversarial Learning for Recognizing Text in the Wild

Authors

Canjie Luo, Qingxiang Lin, Yuliang Liu, Lianwen Jin, Chunhua Shen

Abstract

We propose to improve text recognition from a new perspective by separating the text content from complex backgrounds. As vanilla GANs are not sufficiently robust to generate sequence-like characters in natural images, we propose an adversarial learning framework for the generation and recognition of multiple characters in an image. The proposed framework consists of an attention-based recognizer and a generative adversarial architecture. Furthermore, to tackle the issue of lacking paired training samples, we design an interactive joint training scheme, which shares attention masks from the recognizer to the discriminator, and enables the discriminator to extract the features of each character for further adversarial training. Benefiting from the character-level adversarial training, our framework requires only unpaired simple data for style supervision. Each target style sample containing only one randomly chosen character can be simply synthesized online during the training. This is significant as the training does not require costly paired samples or character-level annotations. Thus, only the input images and corresponding text labels are needed. In addition to the style normalization of the backgrounds, we refine character patterns to ease the recognition task. A feedback mechanism is proposed to bridge the gap between the discriminator and the recognizer. Therefore, the discriminator can guide the generator according to the confusion of the recognizer, so that the generated patterns are clearer for recognition. Experiments on various benchmarks, including both regular and irregular text, demonstrate that our method significantly reduces the difficulty of recognition. Our framework can be integrated into recent recognition methods to achieve new state-of-the-art recognition accuracy.
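The abstract says that attention masks are shared from the recognizer to the discriminator so that the discriminator can extract the features of each character. One plausible reading of that step is mask-weighted pooling over a feature map, sketched below in plain Python. This is an illustration under stated assumptions, not the authors' implementation: the function name `pool_character_features`, the list-based tensor layout, and the normalization by mask sum are all hypothetical choices made for this example.

```python
from typing import List

Vector = List[float]
FeatureMap = List[List[Vector]]      # H x W x C (assumed layout)
AttentionMask = List[List[float]]    # H x W, one mask per decoded character


def pool_character_features(
    feature_map: FeatureMap,
    attention_masks: List[AttentionMask],
    eps: float = 1e-8,
) -> List[Vector]:
    """Return one C-dimensional feature vector per attention mask.

    Each vector is the attention-weighted average of the feature map,
    so it summarizes the image region the recognizer attended to
    while decoding that character.
    """
    channels = len(feature_map[0][0])
    pooled = []
    for mask in attention_masks:
        total_weight = sum(sum(row) for row in mask)
        feature = [0.0] * channels
        for h, mask_row in enumerate(mask):
            for w, weight in enumerate(mask_row):
                for c in range(channels):
                    feature[c] += weight * feature_map[h][w][c]
        # eps guards against division by zero for an all-zero mask.
        pooled.append([value / (total_weight + eps) for value in feature])
    return pooled


# Toy example: a 1 x 2 feature map with 2 channels and two characters.
feature_map = [[[1.0, 0.0], [0.0, 1.0]]]
attention_masks = [
    [[1.0, 0.0]],  # decoding step 1 attends to the left position
    [[0.0, 1.0]],  # decoding step 2 attends to the right position
]
character_features = pool_character_features(feature_map, attention_masks)
```

In a setup like this, each pooled vector could be scored by the discriminator against features of a single synthesized target-style character, which is consistent with the abstract's claim that only unpaired single-character style samples are needed.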
