Paper Title
An Algorithm for Routing Vectors in Sequences
Paper Authors
Paper Abstract
We propose a routing algorithm that takes a sequence of vectors and computes a new sequence with specified length and vector size. Each output vector maximizes "bang per bit," the difference between a net benefit to use and net cost to ignore data, by better predicting the input vectors. We describe output vectors as geometric objects, as latent variables that assign credit, as query states in a model of associative memory, and as agents in a model of a Society of Mind. We implement the algorithm with optimizations that reduce parameter count, computation, and memory use by orders of magnitude, enabling us to route sequences of greater length than previously possible. We evaluate our implementation on natural language and visual classification tasks, obtaining competitive or state-of-the-art accuracy and end-to-end credit assignments that are interpretable.
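To make the input/output contract in the abstract concrete, the following is a minimal sketch of a generic routing-by-agreement loop. It is an illustration only and not the paper's method: it does not implement the "bang per bit" objective, and the function name `route`, the random projection matrices `W`, the squared-distance agreement score, and the parameters `n_iters` and `seed` are all assumptions introduced here. It shows only how a sequence of `n_inp` vectors of size `d_inp` might be mapped to `n_out` vectors of size `d_out`, together with a per-input credit assignment.

```python
import math
import random


def route(x_inp, n_out, d_out, n_iters=3, seed=0):
    """Map n_inp vectors of size d_inp to n_out vectors of size d_out.

    Generic routing-by-agreement sketch (an assumption for illustration),
    NOT the paper's bang-per-bit objective.
    """
    rng = random.Random(seed)
    n_inp, d_inp = len(x_inp), len(x_inp[0])
    # Hypothetical fixed random projections: one (d_out x d_inp) matrix per output.
    W = [[[rng.gauss(0.0, 1.0 / math.sqrt(d_inp)) for _ in range(d_inp)]
          for _ in range(d_out)] for _ in range(n_out)]
    # votes[i][j] is input i's proposal for output vector j.
    votes = [[[sum(w * x for w, x in zip(row, x_i)) for row in W[j]]
              for j in range(n_out)] for x_i in x_inp]
    # Start with every input spreading its credit evenly over the outputs.
    R = [[1.0 / n_out] * n_out for _ in range(n_inp)]
    y_out = []
    for _ in range(n_iters):
        # Each output is the credit-weighted mean of the votes it receives.
        y_out = []
        for j in range(n_out):
            total = sum(R[i][j] for i in range(n_inp)) + 1e-9
            y_out.append([
                sum(R[i][j] * votes[i][j][h] for i in range(n_inp)) / total
                for h in range(d_out)
            ])
        # Reassign credit: an input favors outputs that agree with its vote.
        for i in range(n_inp):
            scores = [-sum((v - y) ** 2 for v, y in zip(votes[i][j], y_out[j]))
                      for j in range(n_out)]
            top = max(scores)
            exps = [math.exp(s - top) for s in scores]
            norm = sum(exps)
            R[i] = [e / norm for e in exps]
    # R[i][j] is the credit input i assigns to output j given the final outputs.
    return y_out, R


if __name__ == "__main__":
    x = [[1.0, 0.0, 2.0], [0.5, 1.0, 0.0], [0.0, 1.0, 1.0],
         [2.0, 0.0, 0.0], [1.0, 1.0, 1.0]]
    y, R = route(x, n_out=2, d_out=4)
    print(len(y), len(y[0]))  # output length and vector size chosen by the caller
```

In this sketch the caller chooses the output length and vector size independently of the input, and each row of `R` sums to one, so it can be read as how one input distributes its credit across the outputs.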