Title

How LSTM Encodes Syntax: Exploring Context Vectors and Semi-Quantization on Natural Text

Authors

Chihiro Shibata, Kei Uchiumi, Daichi Mochihashi

Abstract

The Long Short-Term Memory recurrent neural network (LSTM) is widely used and known to capture informative long-term syntactic dependencies. However, how such information is reflected in its internal vectors for natural text has not yet been sufficiently investigated. We analyze them by learning a language model where syntactic structures are implicitly given. We empirically show that the context update vectors, i.e. the outputs of internal gates, are approximately quantized to binary or ternary values to help the language model count the depth of nesting accurately, as Suzgun et al. (2019) recently showed for synthetic Dyck languages. For some dimensions of the context vector, we show that their activations are highly correlated with the depth of phrase structures such as VP and NP. Moreover, with $L_1$ regularization, we also find that whether a word is inside a phrase structure or not can be predicted accurately from a small number of components of the context vector. Even when learning from raw text, the context vectors are shown to still correlate well with phrase structures. Finally, we show that natural clusters of function words and of the parts of speech that trigger phrases are represented in a small but principal subspace of the context-update vector of the LSTM.
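
To make the kind of analysis described above concrete, the following is a minimal sketch, not the authors' code, of how one might extract per-step context-update values from a trained PyTorch LSTMCell and probe them with an L1-regularized classifier for phrase membership. Interpreting the "context update" as the cell-state increment $i_t \odot g_t$ is an assumption, and the helper names `context_updates` and `probe_phrase_membership` are hypothetical.

```python
# Sketch only: recompute LSTM gates by hand (nn.LSTMCell does not expose them)
# and probe the resulting context-update features with a sparse classifier.
import torch
import numpy as np
from sklearn.linear_model import LogisticRegression


def context_updates(cell: torch.nn.LSTMCell, xs: torch.Tensor) -> np.ndarray:
    """Run an LSTMCell step by step and collect i_t * g_t at each step.

    xs: (seq_len, input_size) word embeddings for one sentence.
    Returns: (seq_len, hidden_size) array of per-step cell-state increments.
    """
    h = torch.zeros(1, cell.hidden_size)
    c = torch.zeros(1, cell.hidden_size)
    updates = []
    for x in xs:
        x = x.unsqueeze(0)
        # PyTorch packs the gate weights in the order (input, forget, cell, output).
        gates = (x @ cell.weight_ih.T + cell.bias_ih
                 + h @ cell.weight_hh.T + cell.bias_hh)
        i, f, g, o = gates.chunk(4, dim=1)
        i, f, o = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o)
        g = torch.tanh(g)
        c = f * c + i * g              # standard LSTM cell-state update
        h = o * torch.tanh(c)
        updates.append((i * g).squeeze(0).detach())
    return torch.stack(updates).numpy()


def probe_phrase_membership(features: np.ndarray, labels: np.ndarray):
    """L1-regularized logistic regression: predict an 'inside phrase?' label
    (e.g. inside an NP or VP) from context-update components. The sparsity
    penalty exposes the few dimensions that carry the signal."""
    clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
    clf.fit(features, labels)
    return clf
```

The liblinear solver supports the L1 penalty directly; inspecting `clf.coef_` for its nonzero entries would show which context-update dimensions the probe relies on, mirroring the paper's observation that only a small number of components are needed.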
