Paper Title
What they do when in doubt: a study of inductive biases in seq2seq learners
Paper Authors
Paper Abstract
Sequence-to-sequence (seq2seq) learners are widely used, but we still have only limited knowledge about which inductive biases shape the way they generalize. We address this by investigating how popular seq2seq learners generalize in tasks that have high ambiguity in the training data. We use SCAN and three new tasks to study learners' preferences for memorization, arithmetic, hierarchical, and compositional reasoning. Further, we connect to Solomonoff's theory of induction and propose to use description length as a principled and sensitive measure of inductive biases. In our experimental study, we find that LSTM-based learners can learn to perform counting, addition, and multiplication by a constant from a single training example. Furthermore, Transformer- and LSTM-based learners show a bias toward hierarchical induction over linear induction, while CNN-based learners prefer the opposite. On the SCAN dataset, we find that CNN-based learners, and, to a lesser degree, Transformer- and LSTM-based learners, prefer compositional generalization over memorization. Finally, across all our experiments, description length proved to be a sensitive measure of inductive biases.
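The abstract does not spell out how description length is computed, but one common way to operationalize it is online (prequential) coding: the training data are encoded block by block, each block with a model trained only on the blocks already transmitted, and the accumulated code length is taken as the description length. The sketch below illustrates that idea under assumed interfaces; `make_model`, `fit`, and `log_prob` are hypothetical placeholders for an arbitrary seq2seq learner and are not part of the paper.

```python
# A minimal sketch of estimating description length with online (prequential)
# coding. The model interface here (`fit`, `log_prob`) is hypothetical; any
# seq2seq learner exposing these two methods would fit this scheme.

import math
from typing import Callable, List, Sequence, Tuple

Example = Tuple[Sequence[str], Sequence[str]]  # (source tokens, target tokens)


def prequential_description_length(
    data: List[Example],
    make_model: Callable[[], "Seq2SeqModel"],  # hypothetical learner factory
    block_size: int = 1,
) -> float:
    """Return the description length of `data` in bits under online coding.

    The data are transmitted block by block: each block is encoded with a
    model trained only on previously transmitted blocks, so the total code
    length reflects both how quickly the learner generalizes and how well it
    eventually fits the data.
    """
    total_bits = 0.0
    for start in range(0, len(data), block_size):
        block = data[start:start + block_size]
        seen = data[:start]
        model = make_model()
        if seen:                   # the first block is encoded under the untrained model
            model.fit(seen)        # retrain on everything transmitted so far
        for src, tgt in block:
            # log_prob is assumed to return the natural-log likelihood of the
            # target sequence given the source; convert nats to bits.
            total_bits += -model.log_prob(src, tgt) / math.log(2.0)
    return total_bits
```

Under this kind of scheme, comparing two learners on the same ambiguous training set reduces to comparing the total number of bits each needs to transmit it: the learner whose inductive bias better matches the data yields the shorter code.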