Paper Title
Attention-based Neural Cellular Automata
Paper Authors
Paper Abstract
Recent extensions of Cellular Automata (CA) have incorporated key ideas from modern deep learning, dramatically extending their capabilities and catalyzing a new family of Neural Cellular Automata (NCA) techniques. Inspired by Transformer-based architectures, our work presents a new class of attention-based NCAs formed using a spatially localized, yet globally organized, self-attention scheme. We introduce an instance of this class named Vision Transformer Cellular Automata (ViTCA). We present quantitative and qualitative results on denoising autoencoding across six benchmark datasets, comparing ViTCA to a U-Net, a U-Net-based CA baseline (UNetCA), and a Vision Transformer (ViT). When comparing across architectures configured to similar parameter complexity, ViTCA architectures yield superior performance across all benchmarks and for nearly every evaluation metric. We present an ablation study on various architectural configurations of ViTCA, an analysis of its effect on cell states, and an investigation of its inductive biases. Finally, we examine its learned representations via linear probes on its converged cell state hidden representations, yielding, on average, superior results when compared to our U-Net, ViT, and UNetCA baselines.
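To make the core idea concrete, the sketch below illustrates a "spatially localized, yet globally organized" self-attention cell update: each cell attends only to its immediate neighborhood in a single step, and iterating the rule lets information propagate and organize across the whole grid. This is a minimal PyTorch sketch, not the authors' ViTCA implementation; the class name LocalAttentionCA, the 3x3 neighborhood, the single residual attention block, and all hyperparameters are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LocalAttentionCA(nn.Module):
    """Hypothetical sketch of a spatially localized self-attention cell
    update: each cell attends only to its k*k neighborhood, and repeated
    application of the rule organizes information globally across the grid."""

    def __init__(self, dim: int, heads: int = 4, neighborhood: int = 3):
        super().__init__()
        assert dim % heads == 0
        self.heads, self.k = heads, neighborhood
        self.qkv = nn.Linear(dim, 3 * dim)   # per-cell query/key/value
        self.proj = nn.Linear(dim, dim)      # per-cell output projection

    def forward(self, cells: torch.Tensor) -> torch.Tensor:
        # cells: (B, C, H, W) grid of per-cell state vectors
        B, C, H, W = cells.shape
        h, d, k = self.heads, C // self.heads, self.k
        q, key, val = self.qkv(cells.permute(0, 2, 3, 1)).chunk(3, dim=-1)

        def gather(t):  # collect each cell's k*k neighborhood
            t = F.unfold(t.permute(0, 3, 1, 2), k, padding=k // 2)  # (B, C*k*k, H*W)
            return t.reshape(B, h, d, k * k, H * W)

        key_n, val_n = gather(key), gather(val)
        q = q.reshape(B, H * W, h, d).permute(0, 2, 3, 1).unsqueeze(3)  # (B, h, d, 1, HW)
        # Scaled dot-product attention over the k*k neighborhood positions.
        attn = ((q * key_n).sum(2, keepdim=True) / d ** 0.5).softmax(dim=3)
        out = (attn * val_n).sum(3)                                 # (B, h, d, HW)
        out = self.proj(out.reshape(B, C, H, W).permute(0, 2, 3, 1))
        return cells + out.permute(0, 3, 1, 2)                      # residual cell update

cells = torch.randn(1, 32, 16, 16)   # toy cell-state grid
rule = LocalAttentionCA(dim=32)
for _ in range(8):                   # iterating the local rule spreads information globally
    cells = rule(cells)
```

Note the design point the abstract emphasizes: unlike a standard ViT, whose attention is global in a single layer, here the receptive field of one update is strictly local, so global structure can only emerge through recurrent application, which is the characteristic inductive bias of cellular automata.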