Impact of dataset size and long-term ECoG-based BCI usage on deep learning decoders performance

Paper title

Impact of dataset size and long-term ECoG-based BCI usage on deep learning decoders performance

Authors

Maciej Śliwowski, Matthieu Martin, Antoine Souloumiac, Pierre Blanchart, Tetiana Aksenova

Abstract

In brain-computer interface (BCI) research, recording data is time-consuming and expensive, which limits access to big datasets. This may influence the BCI system performance as machine learning methods depend strongly on the training dataset size. Important questions arise: taking into account neuronal signal characteristics (e.g., non-stationarity), can we achieve higher decoding performance with more data to train decoders? What is the perspective for further improvement with time in the case of long-term BCI studies? In this study, we investigated the impact of long-term recordings on motor imagery decoding from two main perspectives: model requirements regarding dataset size and potential for patient adaptation. We evaluated the multilinear model and two deep learning (DL) models on a long-term BCI and Tetraplegia NCT02550522 clinical trial dataset containing 43 sessions of ECoG recordings performed with a tetraplegic patient. In the experiment, a participant executed 3D virtual hand translation using motor imagery patterns. We designed multiple computational experiments in which training datasets were increased or translated to investigate the relationship between models' performance and different factors influencing recordings. Our analysis showed that adding more data to the training dataset may not instantly increase performance for datasets already containing 40 minutes of the signal. DL decoders showed similar requirements regarding the dataset size compared to the multilinear model while demonstrating higher decoding performance. Moreover, high decoding performance was obtained with relatively small datasets recorded later in the experiment, suggesting improvement in motor imagery patterns and patient adaptation. Finally, we proposed UMAP embeddings and local intrinsic dimensionality as a way to visualize the data and potentially evaluate data quality.
