MAST: Multimodal Abstractive Summarization with Trimodal Hierarchical Attention

Paper title

MAST: Multimodal Abstractive Summarization with Trimodal Hierarchical Attention

Authors

Aman Khullar, Udit Arora

Abstract

This paper presents MAST, a new model for Multimodal Abstractive Text Summarization that utilizes information from all three modalities -- text, audio and video -- in a multimodal video. Prior work on multimodal abstractive text summarization utilized information only from the text and video modalities. We examine the usefulness and challenges of deriving information from the audio modality and present a sequence-to-sequence trimodal hierarchical attention-based model that overcomes these challenges by letting the model pay more attention to the text modality. MAST outperforms the current state-of-the-art model (video-text) by 2.51 points in terms of Content F1 score and 1.00 points in terms of Rouge-L score on the How2 dataset for multimodal language understanding.
