DEArt: Dataset of European Art

Paper title

DEArt: Dataset of European Art

Authors

Reshetnikov, Artem; Marinescu, Maria-Cristina; Lopez, Joaquim More

Abstract

Large datasets made publicly available to the research community over the last 20 years have been a key enabling factor for advances in deep learning algorithms for NLP and computer vision. These datasets generally consist of images aligned with manually annotated metadata, where the images are photographs of everyday life. Scholarly and historical content, on the other hand, treats subjects that are not necessarily popular with a general audience, may not always contain a large number of data points, and may be difficult or impossible to extend with new data. Some exceptions do exist, for instance scientific or health data, but this is not the case for cultural heritage (CH). The poor performance of the best computer vision models when tested on artworks, coupled with the lack of extensively annotated datasets for CH and the fact that artwork images depict objects and actions not captured by photographs, indicates that a CH-specific dataset would be highly valuable for this community. We propose DEArt, at this point primarily an object detection and pose classification dataset meant to be a reference for paintings between the XIIth and the XVIIIth centuries. It contains more than 15,000 images, about 80% non-iconic, aligned with manual annotations for the bounding boxes identifying all instances of 69 classes as well as 12 possible poses for boxes identifying human-like objects. Of these classes, more than 50 are CH-specific and thus do not appear in other datasets; they reflect imaginary beings, symbolic entities, and other categories related to art. Additionally, existing datasets do not include pose annotations. Our results show that object detectors for the cultural heritage domain can achieve a level of precision comparable to state-of-the-art models for generic images via transfer learning.
