Paper title
Data augmentation on graphs for table type classification
Authors
Abstract
Tables are widely used in documents because they represent information compactly and in a structured way. In scientific papers in particular, tables summarize novel findings and experimental results, making research comparable and easy for scholars to understand. Because table layouts are highly variable, it is useful to interpret their content and classify them into categories. Such a classification could help extract information directly from scientific papers, for instance to compare the performance of models based on the result tables in their papers. In this work, we address table classification with a Graph Neural Network, exploiting the table structure in the message passing algorithm. We evaluate our model on a subset of the Tab2Know dataset. Since it contains few manually annotated examples, we propose data augmentation techniques that operate directly on the table graph structures. We achieve promising preliminary results, proposing a data augmentation method suitable for graph-based table representations.
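The abstract does not say how the table graph is built or which augmentations are applied, so the following is only an illustrative sketch under our own assumptions, not the authors' method. It assumes one node per cell, edges between horizontally and vertically adjacent cells, and two hypothetical augmentations: shuffling body rows below the header, and randomly dropping edges. All function names (`table_to_graph`, `shuffle_body_rows`, `drop_edges`) and the toy table are invented for illustration.

```python
import random


def table_to_graph(table):
    """Assumed representation: one node per cell, with edges linking
    each cell to its right-hand and lower neighbour (a grid graph)."""
    n_rows, n_cols = len(table), len(table[0])
    nodes = {(r, c): table[r][c] for r in range(n_rows) for c in range(n_cols)}
    edges = set()
    for r in range(n_rows):
        for c in range(n_cols):
            if c + 1 < n_cols:
                edges.add(((r, c), (r, c + 1)))  # horizontal neighbour
            if r + 1 < n_rows:
                edges.add(((r, c), (r + 1, c)))  # vertical neighbour
    return nodes, edges


def shuffle_body_rows(table, rng, header_rows=1):
    """Hypothetical augmentation: permute the body rows, keep the header fixed."""
    header, body = table[:header_rows], list(table[header_rows:])
    rng.shuffle(body)
    return header + body


def drop_edges(edges, rng, p=0.1):
    """Hypothetical augmentation: remove each edge independently with probability p."""
    return {e for e in edges if rng.random() >= p}


# Toy result table (made-up values, for illustration only).
table = [
    ["Model", "Accuracy"],
    ["A", "0.90"],
    ["B", "0.80"],
    ["C", "0.70"],
]

rng = random.Random(0)  # seeded for reproducibility
augmented = shuffle_body_rows(table, rng)
nodes, edges = table_to_graph(augmented)
sparser_edges = drop_edges(edges, rng, p=0.1)
```

Each augmented table yields a new graph with the same class label, which is one plausible way to enlarge a small annotated set. Whether row order or edge removal actually preserves the table type would need to be checked against the dataset's categories.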