Paper Title
Time-aware Large Kernel Convolutions
Paper Authors
Paper Abstract
To date, most state-of-the-art sequence modeling architectures use attention to build generative models for language-based tasks. Some of these models use all the available sequence tokens to generate an attention distribution, which results in a time complexity of $O(n^2)$. Alternatively, they utilize depthwise convolutions with softmax-normalized kernels of size $k$ acting as a limited-window self-attention, resulting in a time complexity of $O(k{\cdot}n)$. In this paper, we introduce Time-aware Large Kernel (TaLK) Convolutions, a novel adaptive convolution operation that learns to predict the size of a summation kernel instead of using a fixed-sized kernel matrix. This method yields a time complexity of $O(n)$, effectively making the sequence encoding process linear in the number of tokens. We evaluate the proposed method on large-scale standard machine translation, abstractive summarization and language modeling datasets and show that TaLK Convolutions constitute an efficient improvement over other attention/convolution based approaches.
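To make the core idea concrete, the following is a minimal PyTorch sketch of a summation kernel whose extent is predicted per token and evaluated in $O(n)$ time via prefix sums. It is not the authors' released implementation: the class and parameter names (`TaLKSketch`, `max_left`, `max_right`) are illustrative assumptions, the predicted offsets are rounded to integers here for simplicity (the actual method would need a differentiable relaxation), and normalization details differ from the paper.

```python
# Illustrative sketch of an adaptive summation kernel (assumption: not the
# authors' code). Each token predicts how far left/right its window extends,
# and the windowed sum is computed in O(n) using cumulative sums.
import torch
import torch.nn as nn


class TaLKSketch(nn.Module):  # hypothetical name
    def __init__(self, dim, max_left=3, max_right=3):
        super().__init__()
        self.max_left = max_left
        self.max_right = max_right
        # Predict two relative offsets (left, right) per token.
        self.offsets = nn.Linear(dim, 2)

    def forward(self, x):
        # x: (batch, seq_len, dim)
        b, n, d = x.shape
        # Sigmoid-scaled offsets in [0, max_left] / [0, max_right];
        # rounding is a simplification and is not differentiable.
        rel = torch.sigmoid(self.offsets(x))                        # (b, n, 2)
        left = (rel[..., 0] * self.max_left).round().long()
        right = (rel[..., 1] * self.max_right).round().long()

        positions = torch.arange(n, device=x.device).unsqueeze(0)   # (1, n)
        lo = (positions - left).clamp(min=0)                        # (b, n)
        hi = (positions + right).clamp(max=n - 1)                   # (b, n)

        # Prefix sums over time, with a leading zero row: S[i] = sum(x[:i]).
        csum = torch.cumsum(x, dim=1)
        csum = torch.cat([torch.zeros(b, 1, d, device=x.device), csum], dim=1)

        # Window sum x[lo..hi] = S[hi+1] - S[lo]; normalize by window length.
        upper = torch.gather(csum, 1, (hi + 1).unsqueeze(-1).expand(-1, -1, d))
        lower = torch.gather(csum, 1, lo.unsqueeze(-1).expand(-1, -1, d))
        win = (hi - lo + 1).unsqueeze(-1).float()
        return (upper - lower) / win


# Usage: encode a batch of 2 sequences of length 10 with dimension 16.
out = TaLKSketch(16)(torch.randn(2, 10, 16))
print(out.shape)  # torch.Size([2, 10, 16])
```

Because the window boundaries are gathered from a single cumulative-sum pass, the cost per token is constant regardless of the predicted window size, which is what gives the $O(n)$ encoding cost described in the abstract; a causal (left-only) variant would simply fix `max_right` to zero.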