DATa: Domain Adaptation-Aided Deep Table Detection Using Visual-Lexical Representations

Paper Title

DATa: Domain Adaptation-Aided Deep Table Detection Using Visual-Lexical Representations

Authors

Kwon, Hyebin; An, Joungbin; Lee, Dongwoo; Shin, Won-Yong

Abstract

Considerable research attention has been paid to table detection, through both rule-based approaches reliant on hand-crafted heuristics and deep learning approaches. Although recent studies successfully perform table detection with enhanced results, they often suffer performance degradation when applied to transferred domains whose table layout features might differ from those of the source domain in which the underlying model was trained. To overcome this problem, we present DATa, a novel Domain Adaptation-aided deep Table detection method that guarantees satisfactory performance in a specific target domain where few trusted labels are available. To this end, we newly design lexical features and an augmented model used for re-training. More specifically, after pre-training one of the state-of-the-art vision-based models as our backbone network, we re-train our augmented model, which consists of the vision-based model and a multilayer perceptron (MLP) architecture. Using new confidence scores obtained from the trained MLP architecture, together with the initial bounding box predictions and their confidence scores, we calculate each confidence score more accurately. To validate the superiority of DATa, we perform experimental evaluations using a real-world benchmark dataset as the source domain and another dataset, consisting of materials science articles, as the target domain. Experimental results demonstrate that the proposed DATa method substantially outperforms competing methods that only utilize visual representations in the target domain. Such gains are possible owing to the capability of eliminating high false positives or false negatives according to the setting of a confidence score threshold.
