Impact of the Composition of Feature Extraction and Class Sampling in Medicare Fraud Detection

Paper title

Impact of the composition of feature extraction and class sampling in Medicare fraud detection

Authors

Kumari, Akrity; Punn, Narinder Singh; Sonbhadra, Sanjay Kumar; Agarwal, Sonali

Abstract

With healthcare being a critical aspect of life, health insurance has become an important scheme for minimizing medical expenses. As insurance coverage has grown, the healthcare industry has seen a significant increase in fraudulent activity, and fraud has become a significant contributor to rising medical care expenses, although its impact can be mitigated using fraud detection techniques. Machine learning techniques are used to detect fraud. This study uses the "Medicare Part D" insurance claims released by the Centers for Medicare & Medicaid Services (CMS) of the United States federal government to develop a fraud detection system. Employing machine learning algorithms on a class-imbalanced, high-dimensional Medicare dataset is a challenging task. To combat these challenges, the present work performs feature extraction followed by data sampling and then applies various classification algorithms to obtain better performance. Feature extraction is a dimensionality reduction approach that converts attributes into linear or non-linear combinations of the original attributes, generating a smaller and more diversified set of attributes and thus reducing the dimensions. Data sampling is commonly used to address class imbalance, either by increasing the frequency of the minority class or by reducing the frequency of the majority class, so that both classes have approximately equal numbers of occurrences. The proposed approach is evaluated with standard performance metrics. To detect fraud efficiently, this study applies an autoencoder as the feature extraction technique, the synthetic minority oversampling technique (SMOTE) as the data sampling technique, and various gradient boosted decision tree-based classifiers as the classification algorithms. The experimental results show that the combination of autoencoders followed by SMOTE with the LightGBM classifier achieved the best results.
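
The oversampling step described in the abstract can be illustrated with a minimal sketch of the core SMOTE idea: each synthetic minority sample is placed at a random point on the line segment between a minority sample and one of its k nearest minority neighbours. This is not the authors' implementation. The function name `smote`, its parameters, and the toy two-feature data are assumptions made for illustration, and the autoencoder and LightGBM stages of the study's pipeline are not shown.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points interpolated between minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (the base itself excluded)
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


# Hypothetical toy data: 20 majority-class rows and 4 minority-class rows.
data_rng = random.Random(42)
majority = [(data_rng.random(), data_rng.random()) for _ in range(20)]
minority = [(2.0, 2.0), (2.5, 2.1), (2.2, 2.8), (3.0, 2.4)]

# Generate enough synthetic rows to give both classes the same count.
new_points = smote(minority, n_new=len(majority) - len(minority), k=3)
balanced_minority = minority + new_points
print(len(majority), len(balanced_minority))
```

Because every synthetic point is a convex combination of two real minority samples, the new points stay inside the region already occupied by the minority class and never copy an existing row exactly. In practice a library implementation would normally be used in place of this sketch.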

