Paper Title

Input-independent Attention Weights Are Expressive Enough: A Study of Attention in Self-supervised Audio Transformers

Paper Authors

Tsung-Han Wu, Chun-Chen Hsieh, Yen-Hao Chen, Po-Han Chi, Hung-yi Lee

Paper Abstract

In this paper, we seek solutions for reducing the computation complexity of transformer-based models for speech representation learning. We evaluate 10 attention algorithms; then, we pre-train the transformer-based model with those attention algorithms in a self-supervised fashion and treat them as feature extractors on downstream tasks, including phoneme classification and speaker classification. With the assistance of t-SNE, PCA and some observation, the attention weights in self-supervised audio transformers can be categorized into four general cases. Based on these cases and some analyses, we are able to use a specific set of attention weights to initialize the model. Our approach shows comparable performance to the typical self-attention yet requires 20% less time in both training and inference.
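
The central idea (attention weights that are learned per head and position but do not depend on the input, so the query/key projections and their dot products can be dropped) can be sketched in a few lines of PyTorch. This is an illustrative sketch rather than the authors' released implementation; the class name, max_len, and the chosen dimensions are assumptions made for the example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class InputIndependentAttention(nn.Module):
    """Sketch of input-independent attention: one trainable attention map per
    head, shared across all inputs. Only the value and output projections
    remain input-dependent; there are no query/key projections."""

    def __init__(self, d_model: int, n_heads: int, max_len: int):
        super().__init__()
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        # Fixed (but trainable) per-head attention logits over positions.
        self.attn_logits = nn.Parameter(torch.zeros(n_heads, max_len, max_len))
        self.v_proj = nn.Linear(d_model, d_model)
        self.out_proj = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        b, t, _ = x.shape
        v = self.v_proj(x).view(b, t, self.n_heads, self.d_head).transpose(1, 2)
        # Crop the learned attention map to the current sequence length;
        # the same weights are applied to every example in the batch.
        attn = F.softmax(self.attn_logits[:, :t, :t], dim=-1)  # (heads, t, t)
        ctx = torch.einsum("hts,bhsd->bhtd", attn, v)
        ctx = ctx.transpose(1, 2).reshape(b, t, -1)
        return self.out_proj(ctx)


if __name__ == "__main__":
    layer = InputIndependentAttention(d_model=768, n_heads=12, max_len=512)
    dummy = torch.randn(2, 400, 768)  # (batch, audio frames, feature dim)
    print(layer(dummy).shape)  # torch.Size([2, 400, 768])
```

Because the query/key projections and the QK^T matrix product are removed, each layer does less work per step, which is consistent with the roughly 20% reduction in training and inference time reported in the abstract (the exact savings depend on model size and sequence length).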
