Paper Title

UGRWO-Sampling for COVID-19 dataset: A modified random walk under-sampling approach based on graphs to imbalanced data classification

Authors

Roshanfekr, Saeideh; Esmaeili, Shahriar; Ataeian, Hassan; Amiri, Ali

Abstract

This paper proposes a new RWO-Sampling (Random Walk Over-Sampling) method based on graphs for imbalanced datasets. In this method, two schemes based on under-sampling and over-sampling are introduced to keep the proximity information robust to noise and outliers. After the first graph is constructed on the minority class, RWO-Sampling is applied to selected samples, and the rest remain unchanged. The second graph is constructed for the majority class, and the samples in low-density areas (outliers) are removed. Finally, in the proposed method, samples of the majority class in high-density areas are selected, and the rest are eliminated. Furthermore, by using RWO-Sampling, the boundary of the minority class is expanded without amplifying the outliers. The method is tested and compared with previous methods on a number of evaluation measures, using nine continuous-attribute datasets with different over-sampling rates and one dataset for the diagnosis of COVID-19. The experimental results indicate the high efficiency and flexibility of the proposed method for the classification of imbalanced data.
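The two schemes described in the abstract can be illustrated with a rough sketch. This is not the authors' implementation: the abstract does not specify how the graphs are built or how samples are selected, so the k-nearest-neighbour density proxy, the median threshold for choosing minority samples, the `keep_frac` parameter, and all function names below are assumptions made for illustration. The random-walk step follows the commonly cited RWO-Sampling form, in which each attribute is perturbed by Gaussian noise scaled by the attribute's standard deviation divided by the square root of the sample count.

```python
import numpy as np

def knn_density(X, k=5):
    """Density proxy (assumption): inverse mean distance to the k nearest
    neighbours. Expects at least two samples in X."""
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)          # ignore self-distance
    k = min(k, len(X) - 1)
    nearest = np.sort(dist, axis=1)[:, :k]
    return 1.0 / (nearest.mean(axis=1) + 1e-12)

def rwo_oversample(pool, X_ref, n_new, rng):
    """Random-walk step: x' = x - (sigma / sqrt(n)) * r with r ~ N(0, 1).
    sigma and n are taken from X_ref; base points are drawn from pool."""
    sigma = X_ref.std(axis=0)
    base = pool[rng.integers(0, len(pool), size=n_new)]
    steps = rng.standard_normal(base.shape)
    return base - (sigma / np.sqrt(len(X_ref))) * steps

def ugrwo_sketch(X_min, X_maj, n_new, k=5, keep_frac=0.8, seed=0):
    """Hypothetical combination of the two schemes from the abstract."""
    rng = np.random.default_rng(seed)

    # Scheme 1 (minority class): over-sample only from selected samples.
    # Selection rule here (density at or above the median) is an assumption.
    dens_min = knn_density(X_min, k)
    selected = X_min[dens_min >= np.median(dens_min)]
    synthetic = rwo_oversample(selected, X_min, n_new, rng)

    # Scheme 2 (majority class): keep the densest fraction, drop the rest,
    # which removes low-density samples (outliers) first.
    dens_maj = knn_density(X_maj, k)
    n_keep = max(1, int(round(keep_frac * len(X_maj))))
    keep_idx = np.argsort(dens_maj)[::-1][:n_keep]

    return np.vstack([X_min, synthetic]), X_maj[keep_idx]
```

Called as `ugrwo_sketch(X_min, X_maj, n_new=20)`, the sketch returns the original minority samples plus 20 synthetic ones, together with a reduced majority set. Because synthetic points are generated only around the denser minority samples, isolated minority points are kept but not multiplied, which is one way to read the abstract's statement that the minority boundary grows while outliers do not.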
