Title
FROTE: Feedback Rule-Driven Oversampling for Editing Models
Authors
Abstract
Machine learning models may involve decision boundaries that change over time due to updates to rules and regulations, such as in loan approvals or claims management. However, in such scenarios, it may take time for sufficient training data to accumulate in order to retrain the model to reflect the new decision boundaries. While work has been done to reinforce existing decision boundaries, very little has addressed scenarios where the decision boundaries of ML models should change to reflect new rules. In this paper, we focus on user-provided feedback rules as a way to expedite the process of updating ML models, and we formally introduce the problem of pre-processing training data to edit an ML model in response to feedback rules, such that once the model is retrained on the pre-processed data, its decision boundaries align more closely with the rules. To solve this problem, we propose a novel data augmentation method, the Feedback Rule-Based Oversampling Technique (FROTE). Extensive experiments using different ML models and real-world datasets demonstrate the effectiveness of the method, in particular the benefit of augmentation and the ability to handle many feedback rules.
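The core idea described above can be illustrated with a minimal sketch. This is not the paper's actual algorithm: the rule representation, the `oversample_by_rule` helper, and the relabeling strategy are all hypothetical simplifications, assuming a feedback rule that maps a feature condition to a desired label (e.g., "if income ≥ 50, approve"). Rows satisfying the condition are duplicated with the rule's label, so a model retrained on the augmented data shifts its decision boundary toward the rule.

```python
# Illustrative sketch only (hypothetical helper, not the FROTE algorithm):
# oversample rows matching a feedback rule's condition and force the
# rule's label on the copies, producing an augmented training set.
import random

def oversample_by_rule(rows, labels, condition, rule_label, n_copies=3, seed=0):
    """Duplicate rows matching `condition`, labeling the copies `rule_label`."""
    rng = random.Random(seed)
    matching = [r for r in rows if condition(r)]
    aug_rows, aug_labels = list(rows), list(labels)
    for _ in range(n_copies * len(matching)):
        base = rng.choice(matching)
        aug_rows.append(dict(base))      # copy of a row covered by the rule
        aug_labels.append(rule_label)    # label dictated by the feedback rule
    return aug_rows, aug_labels

# Usage: a toy loan dataset and one feedback rule "income >= 50 -> approve".
rows = [{"income": 30}, {"income": 60}, {"income": 80}]
labels = [0, 0, 1]
aug_rows, aug_labels = oversample_by_rule(
    rows, labels, condition=lambda r: r["income"] >= 50, rule_label=1
)
```

A classifier retrained on `aug_rows`/`aug_labels` would see the rule's region of feature space dominated by the rule's label, nudging its boundary toward the rule without waiting for new real-world data to accumulate.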