Paper Title
ArtELingo: A Million Emotion Annotations of WikiArt with Emphasis on Diversity over Language and Culture
Authors
Abstract
This paper introduces ArtELingo, a new benchmark and dataset designed to encourage work on diversity across languages and cultures. Building on ArtEmis, a collection of 80K artworks from WikiArt with 0.45M emotion labels and English-only captions, ArtELingo adds another 0.79M annotations in Arabic and Chinese, plus 4.8K in Spanish to evaluate "cultural-transfer" performance. More than 51K artworks have 5 or more annotations in 3 languages. This diversity makes it possible to study similarities and differences across languages and cultures. Further, we investigate captioning tasks and find that diversity improves the performance of baseline models. ArtELingo is publicly available at https://www.artelingo.org/ with standard splits and baseline models. We hope our work will help ease future research on multilinguality and culturally-aware AI.