Paper Title

EEG2Mel: Reconstructing Sound from Brain Responses to Music

Paper Authors

Ramirez-Aristizabal, Adolfo G., Kello, Chris

Paper Abstract

Information retrieval from brain responses to auditory and visual stimuli has shown success through classification of song names and image classes presented to participants while recording EEG signals. Information retrieval in the form of reconstructing auditory stimuli has also shown some success, but here we improve on previous methods by reconstructing music stimuli well enough to be perceived and identified independently. Furthermore, deep learning models were trained on time-aligned music stimulus spectra for each corresponding one-second window of EEG recording, which greatly reduces the feature extraction steps needed compared to prior studies. The NMED-Tempo and NMED-Hindi datasets of participants passively listening to full-length songs were used to train and validate Convolutional Neural Network (CNN) regressors. The efficacy of raw voltage versus power spectrum inputs and linear versus mel spectrogram outputs was tested, and all inputs and outputs were converted into 2D images. The quality of reconstructed spectrograms was assessed by training classifiers, which showed 81% accuracy for mel spectrograms and 72% for linear spectrograms (10% chance accuracy). Lastly, reconstructions of auditory music stimuli were discriminated by listeners at an 85% success rate (50% chance) in a two-alternative match-to-sample task.
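The abstract's pipeline maps each one-second EEG window to the mel spectrogram of the time-aligned audio, treating both as 2D images. As background for that target representation, the sketch below computes a mel spectrogram for a one-second audio window using only NumPy. All parameters (sample rate, FFT size, hop length, number of mel bands) are illustrative assumptions, not the paper's actual preprocessing settings.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr):
    # Triangular filters with centers spaced evenly on the mel scale
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):
            fb[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fb[i - 1, k] = (right - k) / max(right - center, 1)
    return fb

def mel_spectrogram(signal, sr=22050, n_fft=512, hop=256, n_mels=64):
    # Frame the signal, apply a Hann window, take the power STFT
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop: i * hop + n_fft] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2
    # Project onto mel filters, then compress to decibels
    mel = power @ mel_filterbank(n_mels, n_fft, sr).T
    return 10.0 * np.log10(mel + 1e-10).T  # shape: (n_mels, n_frames)

# One-second window (a 440 Hz test tone standing in for music audio)
sr = 22050
t = np.arange(sr) / sr
img = mel_spectrogram(np.sin(2 * np.pi * 440.0 * t), sr=sr)
print(img.shape)  # one "image" per one-second window
```

In the study's setup, each such spectrogram image is the regression target for the CNN given the corresponding one-second EEG window (raw voltages or power spectra, also arranged as a 2D image).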
