Paper Title


Can lies be faked? Comparing low-stakes and high-stakes deception video datasets from a Machine Learning perspective

Authors

Mateus Karvat Camara, Adriana Postal, Tomas Henrique Maul, Gustavo Paetzold

Abstract


Despite the great impact of lies on human societies and a meager 54% human accuracy for Deception Detection (DD), Machine Learning systems that perform automated DD are still not viable for proper application in real-life settings due to data scarcity. Few publicly available DD datasets exist, and the creation of new datasets is hindered by the conceptual distinction between low-stakes and high-stakes lies. Theoretically, the two kinds of lies are so distinct that a dataset of one kind could not be used for applications targeting the other kind. Even though it is easier to acquire data on low-stakes deception, since it can be simulated (faked) in controlled settings, these lies do not hold the same significance or depth as genuine high-stakes lies, which are much harder to obtain and are the real practical interest of automated DD systems. To investigate whether this distinction holds true from a practical perspective, we design several experiments comparing a high-stakes DD dataset and a low-stakes DD dataset, evaluating their results on a Deep Learning classifier working exclusively from video data. In our experiments, a network trained on low-stakes lies had better accuracy classifying high-stakes deception than low-stakes deception, although using low-stakes lies as an augmentation strategy for the high-stakes dataset decreased its accuracy.
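The two experimental protocols the abstract mentions, training on one dataset while testing on the other, and pooling low-stakes data into the high-stakes training set as augmentation, can be sketched as follows. This is a minimal illustrative toy in plain Python with synthetic feature vectors and a nearest-centroid classifier; the paper itself uses a Deep Learning classifier on video data, and all names and numbers here are hypothetical.

```python
import random

def make_dataset(n, truth_mean, lie_mean, spread, seed):
    # Synthetic stand-in for per-video feature vectors; label 1 = lie, 0 = truth.
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        mean = lie_mean if label else truth_mean
        x = [rng.gauss(mean, spread) for _ in range(4)]
        data.append((x, label))
    return data

def train_centroids(dataset):
    # Toy "classifier": the mean feature vector of each class.
    sums = {0: [0.0] * 4, 1: [0.0] * 4}
    counts = {0: 0, 1: 0}
    for x, y in dataset:
        counts[y] += 1
        sums[y] = [s + v for s, v in zip(sums[y], x)]
    return {y: [s / max(counts[y], 1) for s in sums[y]] for y in (0, 1)}

def accuracy(model, dataset):
    # Predict the class whose centroid is nearest (squared Euclidean distance).
    def dist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))
    correct = sum(
        1 for x, y in dataset
        if min((0, 1), key=lambda c: dist(x, model[c])) == y
    )
    return correct / len(dataset)

# Hypothetical low-stakes and high-stakes datasets with slightly shifted feature
# distributions, mimicking a domain gap between the two kinds of lies.
low = make_dataset(200, truth_mean=0.0, lie_mean=1.0, spread=0.8, seed=1)
high = make_dataset(200, truth_mean=0.2, lie_mean=1.2, spread=0.8, seed=2)

baseline = accuracy(train_centroids(high), high)        # train and test high-stakes
cross = accuracy(train_centroids(low), high)            # train low-stakes, test high-stakes
augmented = accuracy(train_centroids(high + low), high) # low-stakes as augmentation
print(f"baseline={baseline:.2f} cross={cross:.2f} augmented={augmented:.2f}")
```

The comparison of `cross` and `augmented` against `baseline` mirrors the questions the paper asks; the synthetic numbers produced here carry no relation to the paper's reported results.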
