Paper Title

Beyond Just Vision: A Review on Self-Supervised Representation Learning on Multimodal and Temporal Data

Paper Authors

Shohreh Deldari, Hao Xue, Aaqib Saeed, Jiayuan He, Daniel V. Smith, Flora D. Salim

Paper Abstract

Recently, Self-Supervised Representation Learning (SSRL) has attracted much attention in the fields of computer vision, speech, natural language processing (NLP), and, more recently, other types of modalities, including time series from sensors. The popularity of self-supervised learning is driven by the fact that traditional models typically require a huge amount of well-annotated data for training. Acquiring annotated data can be a difficult and costly process. Self-supervised methods have been introduced to improve the efficiency of training data through discriminative pre-training of models using supervisory signals that have been freely obtained from the raw data. Unlike existing reviews of SSRL that have predominantly focused upon methods in the fields of CV or NLP for a single modality, we aim to provide the first comprehensive review of multimodal self-supervised learning methods for temporal data. To this end, we 1) provide a comprehensive categorization of existing SSRL methods, 2) introduce a generic pipeline by defining the key components of an SSRL framework, 3) compare existing models in terms of their objective function, network architecture and potential applications, and 4) review existing multimodal techniques in each category and various modalities. Finally, we present existing weaknesses and future opportunities. We believe our work develops a perspective on the requirements of SSRL in domains that utilise multimodal and/or temporal data.
