Paper Title

Effective Class-Imbalance Learning Based on SMOTE and Convolutional Neural Networks

Authors

Javad Hassannataj Joloudari, Abdolreza Marefat, Mohammad Ali Nematollahi, Solomon Sunday Oyelere, Sadiq Hussain

Abstract

Imbalanced Data (ID) is a problem that deters Machine Learning (ML) models from achieving satisfactory results. ID refers to a situation in which the number of samples belonging to one class outnumbers that of the other by a wide margin, making such models' learning process biased towards the majority class. In recent years, several solutions have been put forward to address this issue, which opt for either synthetically generating new data for the minority class or reducing the number of majority-class samples to balance the data. Hence, in this paper, we investigate the effectiveness of methods based on Deep Neural Networks (DNNs) and Convolutional Neural Networks (CNNs), combined with a variety of well-known imbalanced-data solutions, namely oversampling and undersampling. To evaluate our methods, we used the KEEL, breast cancer, and Z-Alizadeh Sani datasets. To obtain reliable results, we conducted our experiments 100 times with randomly shuffled data distributions. The classification results demonstrate that the mixed Synthetic Minority Oversampling Technique (SMOTE)-Normalization-CNN approach outperforms the other methodologies, achieving 99.08% accuracy on the 24 imbalanced datasets. Therefore, the proposed mixed model can be applied to imbalanced binary classification problems on other real datasets.
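The pipeline described in the abstract chains minority-class oversampling (SMOTE), feature normalization, and a CNN classifier. Below is a minimal, illustrative Python sketch of such a SMOTE-Normalization-1D-CNN pipeline on a synthetic imbalanced binary dataset; the network architecture, hyperparameters, and the use of imbalanced-learn, scikit-learn, and Keras are assumptions for demonstration, not the authors' exact configuration.

```python
# Illustrative SMOTE -> normalization -> 1D-CNN pipeline for imbalanced binary
# classification. Layer sizes and hyperparameters are assumed for demonstration.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from imblearn.over_sampling import SMOTE
from tensorflow.keras import layers, models

# Synthetic imbalanced binary dataset standing in for a KEEL or medical dataset.
X, y = make_classification(n_samples=2000, n_features=20,
                           weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, test_size=0.2, random_state=0)

# 1) Oversample the minority class with SMOTE (training split only).
X_train, y_train = SMOTE(random_state=0).fit_resample(X_train, y_train)

# 2) Normalize features to [0, 1]; fit the scaler on the resampled training set.
scaler = MinMaxScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

# 3) Treat each feature vector as a length-20 "sequence" with one channel
#    so a 1D convolution can be applied to tabular data.
X_train = X_train[..., np.newaxis]
X_test = X_test[..., np.newaxis]

model = models.Sequential([
    layers.Conv1D(32, kernel_size=3, activation="relu",
                  input_shape=X_train.shape[1:]),
    layers.MaxPooling1D(2),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X_train, y_train, epochs=20, batch_size=32, verbose=0)
print("test accuracy:", model.evaluate(X_test, y_test, verbose=0)[1])
```

Note that SMOTE is applied only to the training split, so that the reported test accuracy reflects the original class distribution rather than synthetically balanced data.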
