Paper title
Video Face Manipulation Detection Through Ensemble of CNNs
Paper authors
Paper abstract
In the last few years, several techniques for facial manipulation in videos have been successfully developed and made available to the masses (e.g., FaceSwap, deepfake). These methods enable anyone to easily edit faces in video sequences with incredibly realistic results and very little effort. Despite the usefulness of these tools in many fields, if used maliciously they can have a seriously harmful impact on society (e.g., the spread of fake news, or cyberbullying through fake revenge porn). The ability to objectively detect whether a face has been manipulated in a video sequence is therefore a task of utmost importance. In this paper, we tackle the problem of face manipulation detection in video sequences, targeting modern facial manipulation techniques. In particular, we study the ensembling of differently trained Convolutional Neural Network (CNN) models. In the proposed solution, different models are obtained starting from a base network (i.e., EfficientNetB4) by making use of two different concepts: (i) attention layers; (ii) siamese training. We show that combining these networks leads to promising face manipulation detection results on two publicly available datasets with more than 119,000 videos.
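The abstract does not say how the individual networks' outputs are combined. As a rough illustration of what ensembling can look like, the sketch below assumes that each model emits one raw score (logit) per extracted face frame, and that scores are merged by plain averaging, first over frames and then over models. The function names and the averaging rule are assumptions made for this example, not the authors' stated method.

```python
import math
from statistics import fmean
from typing import Sequence


def sigmoid(x: float) -> float:
    """Map a raw logit to a probability-like score in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def ensemble_video_score(frame_logits: Sequence[Sequence[float]]) -> float:
    """Combine per-frame logits from several models into one video score.

    frame_logits[m][f] is the (hypothetical) logit produced by model m
    on face frame f. Assumed rule: average the sigmoid scores over
    frames for each model, then average over models.
    """
    if not frame_logits or any(len(logits) == 0 for logits in frame_logits):
        raise ValueError("need at least one model and one frame per model")
    per_model = [fmean([sigmoid(z) for z in logits]) for logits in frame_logits]
    return fmean(per_model)


def is_manipulated(frame_logits: Sequence[Sequence[float]],
                   threshold: float = 0.5) -> bool:
    """Flag a video as manipulated when the ensemble score exceeds threshold."""
    return ensemble_video_score(frame_logits) > threshold
```

In this sketch a video would be flagged by calling `is_manipulated` on the logits collected from every model; the 0.5 threshold is an arbitrary default and would need to be tuned on validation data.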