Paper Title

Adaptive Text Recognition through Visual Matching

Authors

Chuhan Zhang, Ankush Gupta, Andrew Zisserman

Abstract

In this work, our objective is to address the problems of generalization and flexibility for text recognition in documents. We introduce a new model that exploits the repetitive nature of characters in languages, and decouples the visual representation learning and linguistic modelling stages. By doing this, we turn text recognition into a shape matching problem, and thereby achieve generalization in appearance and flexibility in classes. We evaluate the new model on both synthetic and real datasets across different alphabets and show that it can handle challenges that traditional architectures are not able to solve without expensive retraining, including: (i) it can generalize to unseen fonts without new exemplars from them; (ii) it can flexibly change the number of classes, simply by changing the exemplars provided; and (iii) it can generalize to new languages and new characters that it has not been trained for by providing a new glyph set. We show significant improvements over state-of-the-art models for all these cases.
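The core idea of "recognition as shape matching" can be sketched as follows. This is an illustrative toy only, not the paper's architecture: in the actual model, glyph exemplars and text-line columns are embedded by a learned visual encoder and decoded through a similarity map; here the embeddings are simulated with random vectors to show how changing the exemplar set changes the recognizable classes without retraining.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: one embedding per exemplar glyph (the "alphabet")
# and one embedding per column of the rendered text line. In the paper
# these come from a learned visual encoder; here we simulate them.
alphabet = list("abcdefgh")
dim = 32
exemplars = rng.normal(size=(len(alphabet), dim))
exemplars /= np.linalg.norm(exemplars, axis=1, keepdims=True)

text = "badge"
indices = [alphabet.index(c) for c in text]

# Columns of the text line = noisy copies of the matching exemplars,
# standing in for a different font rendering of the same glyphs.
line = exemplars[indices] + 0.1 * rng.normal(size=(len(text), dim))
line /= np.linalg.norm(line, axis=1, keepdims=True)

# Similarity map: (line columns) x (alphabet exemplars), cosine similarity.
sim = line @ exemplars.T

# Decode each column by its nearest exemplar -- recognition as matching.
decoded = "".join(alphabet[i] for i in sim.argmax(axis=1))
print(decoded)
```

Note that swapping in a different `exemplars` matrix (e.g. glyphs of a new language) changes the output classes with no retraining, which is the flexibility the abstract claims.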
