Paper Title


PICK: Processing Key Information Extraction from Documents using Improved Graph Learning-Convolutional Networks

Paper Authors

Wenwen Yu, Ning Lu, Xianbiao Qi, Ping Gong, Rong Xiao

Paper Abstract


Computer vision with state-of-the-art deep learning models has recently achieved huge success in the field of Optical Character Recognition (OCR), including text detection and recognition tasks. However, Key Information Extraction (KIE) from documents, the downstream task of OCR with a large number of real-world use scenarios, remains a challenge, because documents not only have textual features extracted from OCR systems but also have semantic visual features that play a critical role in KIE and are not fully exploited. Too little work has been devoted to efficiently making full use of both the textual and visual features of documents. In this paper, we introduce PICK, a framework that is effective and robust in handling complex document layouts for KIE by combining graph learning with graph convolution operations, yielding a richer semantic representation that contains the textual and visual features and the global layout without ambiguity. Extensive experiments on real-world datasets show that our method outperforms baseline methods by significant margins. Our code is available at https://github.com/wenwenyu/PICK-pytorch.
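
The abstract names the core mechanism, combining graph learning with a graph convolution operation over document text segments, but gives no code. Below is a minimal PyTorch sketch of that general idea, not the authors' PICK implementation: the class name GraphLearningConv, all tensor shapes, and the way pairwise node affinities are scored are illustrative assumptions; the real model and its fusion of textual and visual features live in the PICK-pytorch repository linked above.

```python
# Minimal sketch (assumption, not the PICK implementation): learn a soft
# adjacency matrix over document text segments ("nodes"), then apply a graph
# convolution so each node's representation mixes with its learned neighbors.
import torch
import torch.nn as nn
import torch.nn.functional as F


class GraphLearningConv(nn.Module):
    def __init__(self, node_dim: int, hidden_dim: int):
        super().__init__()
        # Scores pairwise node affinities -> learned (soft) adjacency matrix.
        self.affinity = nn.Linear(2 * node_dim, 1)
        # Projection applied after neighborhood aggregation (graph convolution).
        self.proj = nn.Linear(node_dim, hidden_dim)

    def forward(self, nodes: torch.Tensor) -> torch.Tensor:
        # nodes: (N, D), one fused textual+visual embedding per text segment.
        n = nodes.size(0)
        # Build all ordered pairs (i, j) of node embeddings and score them.
        pairs = torch.cat(
            [nodes.unsqueeze(1).expand(n, n, -1),
             nodes.unsqueeze(0).expand(n, n, -1)], dim=-1)         # (N, N, 2D)
        adj = F.softmax(self.affinity(pairs).squeeze(-1), dim=-1)  # (N, N)
        # Graph convolution: aggregate neighbors with learned weights, project.
        return F.relu(self.proj(adj @ nodes))                      # (N, H)


if __name__ == "__main__":
    # Toy usage: 5 text segments, each with a 64-dim fused embedding.
    layer = GraphLearningConv(node_dim=64, hidden_dim=32)
    out = layer(torch.randn(5, 64))
    print(out.shape)  # torch.Size([5, 32])
```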
