Paper Title
BeCAPTCHA-Type: Biometric Keystroke Data Generation for Improved Bot Detection
Paper Authors
Paper Abstract
This work proposes a data-driven learning model for the synthesis of keystroke biometric data. The proposed method is compared with two statistical approaches based on Universal and User-dependent models. All three approaches are validated on the bot detection task, where the synthetic keystroke data is used to improve the training of keystroke-based bot detection systems. Our experimental framework considers a dataset with 136 million keystroke events from 168 thousand subjects. We analyze the performance of the three synthesis approaches through qualitative and quantitative experiments. Several bot detectors are considered, built on supervised classifiers (Support Vector Machine, Random Forest, Gaussian Naive Bayes, and a Long Short-Term Memory network) and a learning framework that includes both human and synthetic samples. The experiments demonstrate the realism of the synthetic samples. The classification results suggest that, when large amounts of labeled data are available, these synthetic samples can be detected with high accuracy; in few-shot learning scenarios, however, detecting them remains an important challenge. Furthermore, these results show the great potential of the presented models.
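To make the distinction between the Universal and User-dependent statistical baselines concrete, the following minimal Python sketch fits timing distributions either to data pooled across all subjects or to a single subject, then samples synthetic keystroke timings from the fit. The Gaussian assumption, the feature names (`hold`, `flight`), the function names, and the toy data are all assumptions made here for illustration; the abstract does not specify how the statistical models are built.

```python
import random
import statistics

# Toy timings in milliseconds (hypothetical values, not from the paper's dataset).
human = {
    "user_a": {"hold": [95, 102, 88, 110, 99], "flight": [140, 155, 132, 160, 149]},
    "user_b": {"hold": [70, 65, 78, 72, 69], "flight": [210, 190, 230, 205, 220]},
}

def fit_gaussian(values):
    """Return (mean, sample standard deviation) of a list of timings."""
    return statistics.mean(values), statistics.stdev(values)

def universal_params(data):
    """Assumed 'Universal' model: one fit over timings pooled from all subjects."""
    holds = [h for user in data.values() for h in user["hold"]]
    flights = [f for user in data.values() for f in user["flight"]]
    return fit_gaussian(holds), fit_gaussian(flights)

def user_params(data, user_id):
    """Assumed 'User-dependent' model: one fit per subject."""
    user = data[user_id]
    return fit_gaussian(user["hold"]), fit_gaussian(user["flight"])

def synthesize(params, n_keys, rng):
    """Sample n_keys (hold, flight) pairs from the fitted Gaussians.

    Hold times are clamped to at least 1 ms; flight times are left
    unclamped because overlapping key presses can make them negative.
    """
    (h_mu, h_sd), (f_mu, f_sd) = params
    return [
        (max(1.0, rng.gauss(h_mu, h_sd)), rng.gauss(f_mu, f_sd))
        for _ in range(n_keys)
    ]

rng = random.Random(0)  # fixed seed so the sketch is reproducible
universal_sample = synthesize(universal_params(human), 5, rng)
user_sample = synthesize(user_params(human, "user_a"), 5, rng)
```

In this sketch the pooled fit blurs differences between typists, while the per-subject fit preserves one typist's rhythm; samples of either kind could then be labeled as synthetic and mixed with human samples when training a detector.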