Title
How to Describe Images in a More Funny Way? Towards a Modular Approach to Cross-Modal Sarcasm Generation
Authors
Abstract
Previous studies have treated sarcasm generation as a text-to-text generation problem, i.e., generating a sarcastic sentence from an input sentence. In this paper, we study a new problem, cross-modal sarcasm generation (CMSG), i.e., generating a sarcastic description for a given image. CMSG is challenging because models must capture the characteristics of sarcasm while also maintaining the correlation between the two modalities. At the same time, there should be some inconsistency between the two modalities, which requires imagination. Moreover, high-quality training data is scarce. To address these problems, we take a step toward generating sarcastic descriptions from images without paired training data and propose an Extraction-Generation-Ranking based Modular method (EGRM) for cross-modal sarcasm generation. Specifically, EGRM first extracts diverse information from an image at different levels and uses the obtained image tags, a sentimental descriptive caption, and a commonsense-based consequence to generate candidate sarcastic texts. Then, a comprehensive ranking algorithm that considers image-text relation, sarcasticness, and grammaticality selects the final text from the candidates. Human evaluation on five criteria over a total of 1,200 generated image-text pairs from eight systems, together with auxiliary automatic evaluation, shows the superiority of our method.
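The abstract names the three ranking criteria (image-text relation, sarcasticness, grammaticality) but not how they are combined. The following is a minimal sketch of one plausible way to do the final selection step, assuming each criterion has already been scored by some separate model. The `Candidate` fields, the min-max normalization, and the weighted sum are all illustrative assumptions, not the paper's actual ranking algorithm.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Candidate:
    """A candidate sarcastic text with hypothetical per-criterion scores.

    The three scores stand in for the outputs of separate scorers
    (image-text relation, sarcasticness, grammaticality); how they are
    computed is outside this sketch.
    """
    text: str
    relation: float
    sarcasticness: float
    grammaticality: float


def _min_max(values: Sequence[float]) -> List[float]:
    """Rescale values to [0, 1] so criteria on different scales are comparable."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All candidates tie on this criterion, so it cannot discriminate.
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def rank_candidates(
    candidates: Sequence[Candidate],
    weights: Tuple[float, float, float] = (1.0, 1.0, 1.0),
) -> List[Tuple[Candidate, float]]:
    """Return (candidate, combined score) pairs, best first.

    Assumption: each criterion is min-max normalized over the candidate
    set, and the normalized scores are combined by a weighted sum.
    """
    if not candidates:
        raise ValueError("need at least one candidate")
    rel = _min_max([c.relation for c in candidates])
    sarc = _min_max([c.sarcasticness for c in candidates])
    gram = _min_max([c.grammaticality for c in candidates])
    w_rel, w_sarc, w_gram = weights
    scored = [
        (c, w_rel * r + w_sarc * s + w_gram * g)
        for c, r, s, g in zip(candidates, rel, sarc, gram)
    ]
    # sorted() is stable, so tied candidates keep their original order.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def select_final(
    candidates: Sequence[Candidate],
    weights: Tuple[float, float, float] = (1.0, 1.0, 1.0),
) -> Candidate:
    """Pick the top-ranked candidate as the final description."""
    return rank_candidates(candidates, weights)[0][0]


if __name__ == "__main__":
    # Made-up scores for illustration only; not results from the paper.
    candidates = [
        Candidate("What a lovely sunny day for a picnic.", 0.9, 0.2, 0.95),
        Candidate("Great, another rainy picnic. Just what I wanted.", 0.8, 0.9, 0.9),
        Candidate("picnic rain great wanted", 0.7, 0.6, 0.3),
    ]
    for cand, score in rank_candidates(candidates):
        print(f"{score:.3f}  {cand.text}")
```

With equal weights, the second candidate ranks first: it is not the most image-faithful or the most fluent, but it is strong on all three criteria at once, whereas the literal caption scores zero on sarcasticness and the keyword string scores zero on grammaticality. Normalizing per criterion keeps one scorer's larger numeric range from dominating the sum; the weights are a hypothetical knob for trading the criteria off against each other.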