Paper title
PickNet: Real-Time Channel Selection for Ad Hoc Microphone Arrays
Paper authors
Paper abstract
This paper proposes PickNet, a neural network model for real-time channel selection for an ad hoc microphone array consisting of multiple recording devices like cell phones. Assuming at most one person to be vocally active at each time point, PickNet identifies the device that is spatially closest to the active person for each time frame by using a short spectral patch of just hundreds of milliseconds. The model is applied to every time frame, and the short time frame signals from the selected microphones are concatenated across the frames to produce an output signal. As the personal devices are usually held close to their owners, the output signal is expected to have higher signal-to-noise and direct-to-reverberation ratios on average than the input signals. Since PickNet utilizes only limited acoustic context at each time frame, the system using the proposed model works in real time and is robust to changes in acoustic conditions. Speech recognition-based evaluation was carried out by using real conversational recordings obtained with various smartphones. The proposed model yielded significant gains in word error rate with limited computational cost over systems using a block-online beamformer and a single distant microphone.
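The select-and-concatenate step described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the neural network is replaced by a hypothetical stand-in, `energy_score`, which simply prefers the channel with the highest frame energy, and the sketch scores only the current frame rather than a spectral patch of several hundred milliseconds. It also assumes the device signals are already time-aligned and sampled at the same rate. The names `select_and_concatenate`, `energy_score`, and `frame_len` are illustrative only.

```python
from typing import Callable, List, Sequence, Tuple


def energy_score(frame: Sequence[float]) -> float:
    """Hypothetical stand-in for a learned per-channel score: frame energy."""
    return sum(x * x for x in frame)


def select_and_concatenate(
    channels: Sequence[Sequence[float]],
    frame_len: int,
    score: Callable[[Sequence[float]], float] = energy_score,
) -> Tuple[List[float], List[int]]:
    """Pick one channel per frame and concatenate the picked frames.

    Assumes all channels are time-aligned; any samples beyond the
    shortest channel are ignored.
    """
    if not channels:
        raise ValueError("at least one channel is required")
    if frame_len <= 0:
        raise ValueError("frame_len must be positive")

    n_samples = min(len(c) for c in channels)
    output: List[float] = []
    picks: List[int] = []

    for start in range(0, n_samples, frame_len):
        end = min(start + frame_len, n_samples)
        frames = [c[start:end] for c in channels]
        # Index of the best-scoring channel; ties go to the lowest index.
        best = max(range(len(frames)), key=lambda i: score(frames[i]))
        picks.append(best)
        output.extend(frames[best])

    return output, picks


# Two toy "devices": the first is louder in frame 0, the second in frame 1.
channels = [
    [0.9, 0.8, 0.1, 0.1],
    [0.1, 0.2, 0.7, 0.6],
]
out, picks = select_and_concatenate(channels, frame_len=2)
```

Because each decision in this sketch depends only on the current frame, the loop can run as audio arrives, which is the property the abstract relies on for real-time operation. A learned model would replace `energy_score` through the `score` argument.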