Paper Title
Word Equations: Inherently Interpretable Sparse Word Embeddings through Sparse Coding
Paper Authors
Paper Abstract
Word embeddings are a powerful natural language processing technique, but they are extremely difficult to interpret. To enable interpretable NLP models, we create vectors where each dimension is inherently interpretable. By inherently interpretable, we mean a system where each dimension is associated with some human understandable hint that can describe the meaning of that dimension. In order to create more interpretable word embeddings, we transform pretrained dense word embeddings into sparse embeddings. These new embeddings are inherently interpretable: each of their dimensions is created from and represents a natural language word or specific grammatical concept. We construct these embeddings through sparse coding, where each vector in the basis set is itself a word embedding. Therefore, each dimension of our sparse vectors corresponds to a natural language word. We also show that models trained using these sparse embeddings can achieve good performance and are more interpretable in practice, including through human evaluations.
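The sparse-coding step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the dictionary and dense vector below are random stand-ins (in the paper, each dictionary row would be a pretrained word embedding), and greedy matching pursuit is just one simple sparse-coding solver.

```python
import numpy as np

def sparse_code(x, dictionary, k=3):
    """Greedily approximate x as a k-sparse combination of dictionary rows.

    Simple matching pursuit: at each step, add the basis vector most
    correlated with the current residual. Assumes rows are unit-norm.
    """
    code = np.zeros(dictionary.shape[0])
    residual = x.astype(float).copy()
    for _ in range(k):
        scores = dictionary @ residual        # correlation with each basis word
        j = int(np.argmax(np.abs(scores)))    # best-matching basis word
        code[j] += scores[j]
        residual -= scores[j] * dictionary[j]
    return code

rng = np.random.default_rng(0)

# Hypothetical stand-ins: each row plays the role of a basis-word embedding.
basis = rng.normal(size=(50, 16))
basis /= np.linalg.norm(basis, axis=1, keepdims=True)

dense = rng.normal(size=16)                   # a dense embedding to re-express

sparse = sparse_code(dense, basis, k=3)
print(np.count_nonzero(sparse))               # at most 3 active dimensions
```

Because each dictionary row corresponds to a word, the few nonzero entries of `sparse` name the basis words that describe the input embedding, which is what makes each dimension interpretable.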