Paper Title
Investigating the Effectiveness of Representations Based on Pretrained Transformer-based Language Models in Active Learning for Labelling Text Datasets
Paper Authors
Paper Abstract
Active learning has been shown to be an effective way to alleviate some of the effort required in utilising large collections of unlabelled data for machine learning tasks without needing to fully label them. The representation mechanism used to represent text documents when performing active learning, however, has a significant influence on how effective the process will be. While simple vector representations such as bag-of-words, and embedding-based representations based on techniques such as word2vec, have been shown to be effective ways to represent documents during active learning, the emergence of representation mechanisms based on the pre-trained transformer-based neural network models popular in natural language processing research (e.g. BERT) offers a promising, and as yet not fully explored, alternative. This paper describes a comprehensive evaluation of the effectiveness of representations based on pre-trained transformer-based language models for active learning. This evaluation shows that transformer-based models, especially BERT-like models, which have not yet been widely used in active learning, achieve a significant improvement over more commonly used vector representations such as bag-of-words or classical word embeddings such as word2vec. This paper also investigates the effectiveness of representations based on variants of BERT, such as RoBERTa and ALBERT, and compares the effectiveness of the [CLS] token representation and the aggregated representation that can be generated using BERT-like models. Finally, we propose an approach, Adaptive Tuning Active Learning. Our experiments show that the limited label information acquired in active learning can be used not only to train a classifier but also to adaptively improve the embeddings generated by BERT-like language models.
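To make the comparison between the [CLS] token representation and the aggregated representation concrete, the sketch below extracts both kinds of document vectors from a BERT-like encoder via the Hugging Face transformers library and fits a simple classifier on a tiny labelled pool. This is a minimal illustration, not the paper's released code; the model checkpoint, the mean-pooling choice for the aggregated representation, and the logistic-regression classifier are assumptions made for the example.

```python
# Minimal sketch (assumptions: checkpoint, pooling, classifier) of the two
# document representations discussed in the abstract: the [CLS] token embedding
# and a mean-pooled ("aggregated") embedding from a BERT-like encoder.
import torch
from transformers import AutoModel, AutoTokenizer
from sklearn.linear_model import LogisticRegression

MODEL_NAME = "bert-base-uncased"  # could equally be a RoBERTa or ALBERT checkpoint
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
encoder = AutoModel.from_pretrained(MODEL_NAME)
encoder.eval()

@torch.no_grad()
def embed(texts, use_cls=True):
    """Return document vectors: the [CLS] token or the mean over non-padding tokens."""
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    hidden = encoder(**batch).last_hidden_state            # (batch, seq_len, dim)
    if use_cls:
        return hidden[:, 0, :]                              # [CLS] representation
    mask = batch["attention_mask"].unsqueeze(-1).float()    # ignore padding tokens
    return (hidden * mask).sum(1) / mask.sum(1)             # aggregated (mean) representation

# Toy active-learning-style usage: train a classifier on the few labelled documents.
labelled_texts = ["good movie", "terrible plot"]
labels = [1, 0]
X = embed(labelled_texts, use_cls=True).numpy()
clf = LogisticRegression().fit(X, labels)
```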
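The adaptive tuning idea can be read as fine-tuning the BERT-like encoder on the small labelled pool gathered during active learning, and then re-embedding the remaining unlabelled documents with the updated encoder. The sketch below follows that reading; the hyperparameters, single-batch update schedule, and helper names are illustrative and may differ from the paper's actual procedure.

```python
# Hedged sketch of adaptive tuning: the labelled pool trains both the task
# classifier (here the sequence-classification head) and the encoder itself,
# so later embeddings of the unlabelled pool improve. Settings are illustrative.
import torch
from torch.optim import AdamW
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL_NAME = "bert-base-uncased"  # assumption; any BERT-like checkpoint could be used
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)
optimizer = AdamW(model.parameters(), lr=2e-5)

def adaptive_tune(labelled_texts, labels, epochs=2):
    """Fine-tune the encoder and classification head on the small labelled pool."""
    batch = tokenizer(labelled_texts, padding=True, truncation=True, return_tensors="pt")
    targets = torch.tensor(labels)
    model.train()
    for _ in range(epochs):
        optimizer.zero_grad()
        out = model(**batch, labels=targets)  # cross-entropy loss from the classification head
        out.loss.backward()
        optimizer.step()

@torch.no_grad()
def refreshed_embeddings(texts):
    """Re-embed documents with the adapted encoder (mean-pooled hidden states)."""
    model.eval()
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    hidden = model.base_model(**batch).last_hidden_state
    mask = batch["attention_mask"].unsqueeze(-1).float()
    return (hidden * mask).sum(1) / mask.sum(1)
```

In an active learning loop, `adaptive_tune` would be called after each round of labelling, and `refreshed_embeddings` would then be used to recompute the representations from which the next batch of documents to label is selected.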