Paper Title
Why Attentions May Not Be Interpretable?
Paper Authors
Paper Abstract
Attention-based methods have played important roles in model interpretation, where the calculated attention weights are expected to highlight the critical parts of inputs (e.g., keywords in sentences). However, recent research has found that attention-as-importance interpretations often do not work as expected. For example, learned attention weights sometimes highlight less meaningful tokens like "[SEP]", ",", and ".", and are frequently uncorrelated with other feature-importance indicators such as gradient-based measures. A recent debate over whether attention is an explanation has drawn considerable interest. In this paper, we demonstrate that one root cause of this phenomenon is combinatorial shortcuts: in addition to the highlighted parts, the attention weights themselves may carry extra information that downstream models after the attention layer can exploit. As a result, attention weights are no longer pure importance indicators. We theoretically analyze combinatorial shortcuts, design an intuitive experiment to show their existence, and propose two methods to mitigate the issue. Empirical studies on attention-based interpretation models show that the proposed methods can effectively improve the interpretability of attention mechanisms.
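To make the combinatorial-shortcut idea concrete, here is a minimal, hypothetical sketch (not the authors' code; the function name `attended_output` and the toy weight vectors are illustrative assumptions). It shows that standard attention pooling mixes the weight pattern itself into the attended output, so a downstream model can read information out of the weights even when the value vectors carry no label-relevant content:

```python
# A minimal sketch of a combinatorial shortcut: the attended output depends on
# the attention weights themselves, so a downstream classifier can recover
# label information from the weight pattern even when the attended values are
# label-independent.

import numpy as np

rng = np.random.default_rng(0)

seq_len, d = 4, 8
# Fixed, label-independent value vectors (e.g., pure position embeddings):
# they contain no information about the input content.
V = rng.normal(size=(seq_len, d))

def attended_output(weights: np.ndarray) -> np.ndarray:
    """Standard attention pooling: a convex combination of the value vectors."""
    return weights @ V  # shape (d,)

# Two inputs that the attention layer happens to score differently ...
a_class0 = np.array([0.7, 0.1, 0.1, 0.1])  # attends mostly to token 0
a_class1 = np.array([0.1, 0.1, 0.1, 0.7])  # attends mostly to token 3

out0 = attended_output(a_class0)
out1 = attended_output(a_class1)

# ... yield clearly separable outputs even though V is label-independent.
# A downstream model can therefore fit the labels from the weight pattern
# alone; the weights act as a shortcut rather than a pure importance score.
print(np.linalg.norm(out0 - out1))  # noticeably > 0
```

Under this toy assumption, the separation between `out0` and `out1` comes entirely from the weight vectors, which is exactly why attention weights can stop being pure importance indicators.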