Paper Title
One-shot Text Field Labeling using Attention and Belief Propagation for Structure Information Extraction
Paper Authors
Paper Abstract
Structured information extraction from document images usually consists of three steps: text detection, text recognition, and text field labeling. While text detection and text recognition have been heavily studied and greatly improved in the literature, text field labeling is less explored and still faces many challenges. Existing learning-based methods for the text field labeling task usually require a large number of labeled examples to train a specific model for each type of document. However, collecting large numbers of document images and labeling them is difficult, and sometimes impossible, due to privacy issues. Deploying a separate model for each type of document also consumes considerable resources. Facing these challenges, we explore one-shot learning for the text field labeling task. Existing one-shot learning methods for this task are mostly rule-based and have difficulty labeling fields in crowded regions with few landmarks, as well as fields consisting of multiple separate text regions. To alleviate these problems, we propose a novel deep end-to-end trainable approach for one-shot text field labeling, which uses an attention mechanism to transfer layout information between document images. We further apply a conditional random field on the transferred layout information to refine the field labeling. We collected and annotated a real-world one-shot field labeling dataset with a large variety of document types and conducted extensive experiments to examine the effectiveness of the proposed model. To stimulate research in this direction, the collected dataset and the one-shot model will be released.
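To illustrate the idea summarized in the abstract, the sketch below shows how attention could propagate field labels from a single labeled support document to the text boxes of an unlabeled query document. The class name, feature dimension, and box encoding are hypothetical assumptions for illustration only, and the paper's CRF-based refinement stage is omitted; this is a minimal sketch of the general mechanism, not the authors' released model.

```python
import torch
import torch.nn as nn

class OneShotFieldLabeler(nn.Module):
    """Minimal sketch: attention transfers field labels from one labeled
    (support) document to an unlabeled (query) document by matching
    text-box features. The box encoding and dimensions are assumptions;
    the CRF refinement step described in the paper is omitted."""

    def __init__(self, feat_dim=128, num_labels=10):
        super().__init__()
        # Encode each text box from its normalized coordinates (x1, y1, x2, y2).
        self.box_encoder = nn.Linear(4, feat_dim)
        self.num_labels = num_labels

    def forward(self, support_boxes, support_labels, query_boxes):
        # support_boxes: (Ns, 4), support_labels: (Ns,), query_boxes: (Nq, 4)
        s = self.box_encoder(support_boxes)   # (Ns, feat_dim)
        q = self.box_encoder(query_boxes)     # (Nq, feat_dim)
        # Scaled dot-product attention between query and support boxes.
        attn = torch.softmax(q @ s.t() / s.shape[-1] ** 0.5, dim=-1)  # (Nq, Ns)
        # Transfer one-hot support labels through the attention weights.
        support_onehot = torch.eye(self.num_labels)[support_labels]   # (Ns, L)
        return attn @ support_onehot                                   # (Nq, L)

# Toy usage: labels of the support document are propagated to the most
# similar boxes of the query document.
model = OneShotFieldLabeler()
support_boxes = torch.rand(5, 4)
support_labels = torch.tensor([0, 1, 2, 3, 4])
query_boxes = torch.rand(6, 4)
scores = model(support_boxes, support_labels, query_boxes)
print(scores.argmax(dim=-1))  # predicted field label per query text box
```

In a full system, the scores per query box would then be smoothed jointly, for example with a conditional random field over neighboring boxes as the abstract describes, rather than taken independently as in this toy example.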