Explainable Artificial Intelligence and Causal Inference based ATM Fraud Detection

Title

Explainable Artificial Intelligence and Causal Inference based ATM Fraud Detection

Authors

Vivek, Yelleti; Ravi, Vadlamani; Mane, Abhay Anand; Naidu, Laveti Ramesh

Abstract

Gaining the trust of customers and showing them empathy are critical in the financial domain. Frequent fraudulent activity undermines both. Hence, financial organizations and banks must take utmost care to mitigate such activity. Among these activities, fraudulent ATM transactions are a common problem faced by banks. Fraud datasets pose the following critical challenges: the data are highly imbalanced, fraud patterns change over time, etc. Owing to the rarity of fraudulent activities, fraud detection can be formulated either as a binary classification problem or as one-class classification (OCC). In this study, we applied both formulations to an ATM transactions dataset collected in India. In binary classification, we investigated the effectiveness of various oversampling techniques, such as the Synthetic Minority Oversampling Technique (SMOTE) and its variants, and Generative Adversarial Networks (GANs). Further, we employed various machine learning techniques, viz., Naive Bayes (NB), Logistic Regression (LR), Support Vector Machine (SVM), Decision Tree (DT), Random Forest (RF), Gradient Boosting Tree (GBT), and Multi-layer Perceptron (MLP). GBT outperformed the rest of the models with an area under the ROC curve (AUC) of 0.963, and DT came second with 0.958 AUC. DT is the winner if complexity and interpretability are taken into account. Among all the oversampling approaches, SMOTE and its variants were observed to perform better. In OCC, Isolation Forest (IForest) attained 0.959 CR, and One-Class SVM (OCSVM) came second with 0.947 CR. Further, we incorporated explainable artificial intelligence (XAI) and causal inference (CI) into the fraud detection framework and studied it through various analyses.
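SMOTE, the oversampling family that worked best here, creates synthetic minority-class (fraud) samples by interpolating between a minority sample and one of its k nearest minority-class neighbours. The following is a minimal NumPy sketch of that core interpolation step, for illustration only. It is not the authors' implementation, and the function name `smote_oversample` and its parameters (`n_new`, `k`, `seed`) are hypothetical.

```python
import numpy as np


def smote_oversample(X_min, n_new, k=5, seed=0):
    """Hypothetical helper: generate n_new synthetic minority samples
    by SMOTE-style interpolation (illustrative sketch, not the paper's code).

    X_min: array of shape (n_min, n_features) containing only minority rows.
    """
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n_min = X_min.shape[0]
    if n_min < 2:
        raise ValueError("need at least two minority samples")
    k = min(k, n_min - 1)

    # Pairwise Euclidean distances between minority samples.
    diff = X_min[:, None, :] - X_min[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbour
    # Indices of the k nearest minority neighbours of each sample.
    neighbours = np.argsort(dist, axis=1)[:, :k]

    synthetic = np.empty((n_new, X_min.shape[1]))
    for i in range(n_new):
        base = rng.integers(n_min)               # random minority seed point
        nb = neighbours[base, rng.integers(k)]   # one of its k neighbours
        gap = rng.random()                       # uniform draw in [0, 1)
        # The new point lies on the segment from base towards its neighbour.
        synthetic[i] = X_min[base] + gap * (X_min[nb] - X_min[base])
    return synthetic


if __name__ == "__main__":
    X_fraud = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
    X_syn = smote_oversample(X_fraud, n_new=6, k=2)
    print(X_syn.shape)  # (6, 2)
```

In practice, the imbalanced-learn library provides `SMOTE` and variants such as `BorderlineSMOTE`, applied through `fit_resample(X, y)`. Oversampling is normally applied to the training split only, so that synthetic points do not leak into the evaluation data used to compute AUC.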
