Paper Title
Low-dimensional representation of infant and adult vocalization acoustics
Paper Authors
Paper Abstract
During the first years of life, infant vocalizations change considerably as infants develop the vocalization skills that enable them to produce speech sounds. Characterizations based on specific acoustic features, protophone categories, or phonetic transcription can provide a representation of the sounds infants make at different ages and in different contexts, but they do not fully describe how sounds are perceived by listeners, can be inefficient to obtain at large scales, and are difficult to visualize in two dimensions without additional statistical processing. Machine-learning-based approaches provide the opportunity to complement these characterizations with purely data-driven representations of infant sounds. Here, we use spectral feature extraction and unsupervised machine learning, specifically Uniform Manifold Approximation and Projection (UMAP), to obtain a novel two-dimensional (2-D) spatial representation of infant and caregiver vocalizations extracted from day-long home recordings. UMAP yields a continuous and well-distributed space conducive to certain analyses of infant vocal development. For instance, we found that the dispersion of infant vocalization acoustics within the 2-D space over a day increased from 3 to 9 months and then decreased from 9 to 18 months. The method also permits analysis of the similarity between infant and adult vocalizations, which likewise changes with infant age.
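
The sketch below illustrates, in Python, the kind of pipeline the abstract describes; it is not the authors' implementation. It assumes each vocalization is available as a short audio clip, summarizes it with MFCC-based spectral features via librosa, embeds all clips in 2-D with the umap-learn package, and measures within-day dispersion as the mean distance to the embedding centroid. The feature set, UMAP settings, and dispersion metric are assumptions made purely for illustration.

    # Illustrative sketch only: the paper's exact spectral features, UMAP
    # parameters, and dispersion measure are not specified here and are assumed.
    import numpy as np
    import librosa   # audio loading and spectral features
    import umap      # umap-learn package

    def clip_features(path, sr=16000, n_mfcc=13):
        """Summarize one vocalization clip as a fixed-length spectral feature vector."""
        y, sr = librosa.load(path, sr=sr)
        mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)  # (n_mfcc, frames)
        # Mean and standard deviation over time frames give a fixed-length summary.
        return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

    def embed_and_disperse(paths):
        """Embed all clips in 2-D with UMAP and compute dispersion about the centroid."""
        X = np.stack([clip_features(p) for p in paths])
        reducer = umap.UMAP(n_components=2, n_neighbors=15, min_dist=0.1,
                            random_state=0)
        emb = reducer.fit_transform(X)  # (n_clips, 2)
        centroid = emb.mean(axis=0)
        dispersion = np.linalg.norm(emb - centroid, axis=1).mean()
        return emb, dispersion

Given a list of clip paths for one recording day, embed_and_disperse returns the 2-D coordinates for plotting and a single dispersion value that could be compared across recording ages, in the spirit of the 3-, 9-, and 18-month comparison described in the abstract.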