Paper title
Successes and critical failures of neural networks in capturing human-like speech recognition
Paper authors
Paper abstract
Natural and artificial audition can in principle acquire different solutions to a given problem. The constraints of the task, however, can nudge the cognitive science and engineering of audition to qualitatively converge, suggesting that a closer mutual examination would potentially enrich artificial hearing systems and process models of the mind and brain. Speech recognition - an area ripe for such exploration - is inherently robust in humans to a number of transformations at various spectrotemporal granularities. To what extent are these robustness profiles accounted for by high-performing neural network systems? We bring together experiments in speech recognition under a single synthesis framework to evaluate state-of-the-art neural networks as stimulus-computable, optimized observers. In a series of experiments, we (1) clarify how influential speech manipulations in the literature relate to each other and to natural speech, (2) show the granularities at which machines exhibit out-of-distribution robustness, reproducing classical perceptual phenomena in humans, (3) identify the specific conditions where model predictions of human performance differ, and (4) demonstrate a crucial failure of all artificial systems to perceptually recover where humans do, suggesting alternative directions for theory and model building. These findings encourage a tighter synergy between the cognitive science and engineering of audition.