In-depth Exploration of Face Forgery Detection Based on Fine-Grained Clues and Noise Inconsistency
Background Introduction
With the rapid advancement of artificial intelligence (AI), generative models have made remarkable progress, making it increasingly easy to produce highly realistic "deepfake" face images. Such hyper-realistic face forgeries have legitimate uses in fields like entertainment and film production, but they are also misused for malicious purposes, such as spreading misinformation, manipulating public opinion, and even threatening social and national security. Moreover, mainstream media platforms typically apply lossy compression when distributing images, and the compression process dilutes forgery traces, making detection even more challenging. Developing effective face forgery detection methods has therefore become a core need in multimedia information security.
At present, most forgery detection methods rely solely on spatial-domain or frequency-domain features, and few studies examine the correlation and complementarity between the two. In addition, their performance often degrades severely on low-quality or heavily compressed images. To address these challenges, the paper "Face Forgery Detection Based on Fine-grained Clues and Noise Inconsistency" introduces a two-stream network (TSN) that improves the accuracy and generalizability of forgery detection by exploiting fine-grained clues and noise inconsistency.
Paper Origin
This paper is authored by Dengyong Zhang, Ruiyi He, Xin Liao, Feng Li, Jiaxin Chen, and Gaobo Yang. It was published in the January 2025 issue of IEEE Transactions on Artificial Intelligence. The research was funded by the National Natural Science Foundation of China (Grants 62172059, 62402062, and U22A2030) and supported by related funding projects in Hunan Province. The authors are primarily affiliated with Changsha University of Science and Technology and Hunan University, focusing on intelligent processing of big data and multimedia information security.
Research Process and Methodology
1. Two-Stream Network Design
The proposed forgery detection framework primarily leverages spatial features while integrating high-frequency noise features for forgery recognition. Specifically, the framework consists of two primary modules (a minimal architectural sketch follows the list):
- Double-Frequency Transformer Module (DFTM): This module extracts high-frequency features from frequency domain signals and guides the learning of spatial features, helping to capture local forgery traces in manipulated images.
- Dual-Domain Attention Fusion Module (DDAFM): This module fuses information from the spatial and noise domains, effectively combining their inputs to further boost forgery detection performance.
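The paper does not include reference code, but the overall layout of the two-stream design can be illustrated with a minimal PyTorch sketch. All names below (`TwoStreamNet`, `spatial_backbone`, `noise_backbone`, `high_pass_residual`, `fuse`) are illustrative placeholders: the simple high-pass residual stands in for the paper's noise extraction, and the linear fusion head stands in for DDAFM.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoStreamNet(nn.Module):
    """Illustrative two-stream layout: a spatial stream plus a noise stream,
    fused before classification. Names and internals are placeholders,
    not the paper's implementation."""

    def __init__(self, feat_dim=256):
        super().__init__()
        # Stand-ins for the EfficientNet backbones used in the paper.
        self.spatial_backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.noise_backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        # Simple fusion head in place of the paper's DDAFM.
        self.fuse = nn.Linear(2 * feat_dim, 2)

    @staticmethod
    def high_pass_residual(x):
        # Crude high-frequency residual: image minus its blurred version.
        blurred = F.avg_pool2d(x, kernel_size=3, stride=1, padding=1)
        return x - blurred

    def forward(self, x):
        spatial_feat = self.spatial_backbone(x).flatten(1)
        noise_feat = self.noise_backbone(self.high_pass_residual(x)).flatten(1)
        return self.fuse(torch.cat([spatial_feat, noise_feat], dim=1))

logits = TwoStreamNet()(torch.randn(2, 3, 224, 224))  # -> shape (2, 2)
```

The point of the sketch is the data flow: the spatial stream sees the RGB image, the noise stream sees a high-frequency residual, and the two feature vectors are fused before classification.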
2. Data Preprocessing and Training Strategy
To comprehensively evaluate the performance of the proposed method, experiments utilized several large-scale public datasets, including FaceForensics++ (FF++), Celeb-DF, DFDC, WildDeepfake, and FaceShifter. The FaceForensics++ dataset, for instance, provides an uncompressed version (RAW) and two compressed versions (C23 and C40), allowing evaluation under image compression.
The study employed EfficientNet as the backbone network and adopted a two-stage training strategy:
- Stage 1: Classification training using the cross-entropy loss.
- Stage 2: Further optimization incorporating an improved Local Relationship Constraint Loss.
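A schematic of such a two-stage schedule, assuming a model that returns both logits and intermediate features, might look as follows; the optimizer, loss weight, and epoch counts are placeholders rather than the paper's settings.

```python
import torch
import torch.nn as nn

def train_two_stage(model, train_loader, local_relation_loss,
                    epochs_stage1=10, epochs_stage2=10, lambda_lrc=0.5):
    """Illustrative two-stage schedule: plain cross-entropy first,
    then cross-entropy plus a local relationship constraint term.
    Hyperparameters here are placeholders, not the paper's values."""
    ce = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

    # Stage 1: classification training with cross-entropy only.
    for _ in range(epochs_stage1):
        for images, labels in train_loader:
            logits, features = model(images)  # assumes model also returns features
            loss = ce(logits, labels)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

    # Stage 2: add the local relationship constraint on intermediate features.
    for _ in range(epochs_stage2):
        for images, labels in train_loader:
            logits, features = model(images)
            loss = ce(logits, labels) + lambda_lrc * local_relation_loss(features, labels)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
```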
3. Local Relationship Constraint Loss
To distinguish forgery features introduced by different manipulation methods, the study improves the Local Relationship Constraint Loss proposed by Li et al. In the patch-partitioning step, the model computes cosine similarities among feature patches extracted with different strides and patch sizes, which captures forgery traces more precisely while reducing the influence of edge noise. Furthermore, by applying the constraint at shallow, middle, and deep layers and integrating multi-scale feature information, the method strengthens the discrimination between forged regions and genuine regions.
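The patch-similarity idea can be sketched as below: the feature map is split into overlapping patches with a chosen patch size and stride, and pairwise cosine similarities between patch embeddings are computed. How these similarities enter the paper's loss (the weighting, the layers used, the handling of labels) is not reproduced here; the function and its parameters are illustrative.

```python
import torch
import torch.nn.functional as F

def patch_cosine_similarity(feature_map, patch_size=4, stride=2):
    """Split a (B, C, H, W) feature map into patches and return the
    pairwise cosine similarity matrix between patch embeddings.
    Patch size and stride are illustrative, not the paper's settings."""
    # unfold -> (B, C * patch_size * patch_size, num_patches)
    patches = F.unfold(feature_map, kernel_size=patch_size, stride=stride)
    patches = patches.transpose(1, 2)           # (B, num_patches, C*p*p)
    patches = F.normalize(patches, dim=-1)      # unit-length patch vectors
    return patches @ patches.transpose(1, 2)    # (B, num_patches, num_patches)

# Intuition: similarities within genuine regions are expected to be higher
# than similarities between genuine and forged regions.
sim = patch_cosine_similarity(torch.randn(2, 64, 16, 16))
print(sim.shape)  # torch.Size([2, 49, 49]) for patch_size=4, stride=2
```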
Core Research Findings
1. Significant Improvement in Accuracy and Robustness
Experimental results demonstrate notable gains in detection accuracy and robustness across multiple datasets. For example, on the heavily compressed C40 version of the FF++ dataset, the model achieved an AUC (Area Under the Curve) of 89.98%, outperforming most other state-of-the-art methods. The model also remained robust when handling low-quality forgery images subjected to JPEG compression.
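This kind of compression robustness can be probed by re-encoding test images at decreasing JPEG quality factors and tracking the detector's score. The sketch below uses Pillow for the re-encoding; the `detector` callable is an assumed stand-in for any trained model wrapped with its preprocessing, not part of the paper's protocol.

```python
import io
from PIL import Image

def jpeg_compress(image: Image.Image, quality: int) -> Image.Image:
    """Re-encode a PIL image as JPEG at the given quality factor."""
    buffer = io.BytesIO()
    image.convert("RGB").save(buffer, format="JPEG", quality=quality)
    buffer.seek(0)
    return Image.open(buffer)

def robustness_sweep(detector, image, qualities=(90, 70, 50, 30)):
    """Return the detector's fake-probability at several JPEG qualities.
    `detector` is an assumed callable mapping a PIL image to a score."""
    return {q: detector(jpeg_compress(image, q)) for q in qualities}
```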
2. Generalization Across Diverse Tasks
The study validates the method’s extensive applicability in various forgery scenarios via cross-dataset testing. On the Celeb-DF real-world deepfake dataset, the proposed method achieved an AUC of 72.76%, significantly surpassing many conventional methods. This approach addresses the generalization challenges posed by differences in data distribution, providing a potential solution for real-world applications.
3. Verification Through Visualization
Using Grad-CAM visualizations, the paper examines the regions each stream attends to. The results show that the DFTM module focuses precisely on high-frequency features in forged regions, while the noise stream captures global noise inconsistencies; these complementary roles significantly strengthen the detection of forgery traces.
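For readers who want to reproduce this kind of inspection on their own detector, a minimal hook-based Grad-CAM can be written directly, without extra libraries. The sketch below follows the standard Grad-CAM recipe (gradient-weighted activations, ReLU, normalization) and assumes the model maps an image batch to class logits; nothing here is specific to the paper's implementation.

```python
import torch
import torch.nn.functional as F

def grad_cam(model, image, target_layer, class_idx=1):
    """Standard Grad-CAM: weight the target layer's activations by the
    spatially averaged gradients of the chosen class score, then ReLU
    and normalize. `target_layer` is any conv module inside `model`."""
    activations, gradients = {}, {}

    def fwd_hook(_, __, output):
        activations["value"] = output

    def bwd_hook(_, grad_input, grad_output):
        gradients["value"] = grad_output[0]

    h1 = target_layer.register_forward_hook(fwd_hook)
    h2 = target_layer.register_full_backward_hook(bwd_hook)

    model.zero_grad()
    score = model(image)[:, class_idx].sum()   # assumes model returns class logits
    score.backward()
    h1.remove()
    h2.remove()

    weights = gradients["value"].mean(dim=(2, 3), keepdim=True)   # (B, C, 1, 1)
    cam = F.relu((weights * activations["value"]).sum(dim=1))     # (B, H, W)
    cam = cam / (cam.amax(dim=(1, 2), keepdim=True) + 1e-8)       # normalize to [0, 1]
    return cam
```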
4. Lightweight Design and Efficiency Enhancement
Compared to existing models such as F3-Net and GFFD, the proposed method significantly reduces computational complexity and parameter count, achieving only 2.13 GFLOPs and using 7.92M parameters. This makes the approach highly suitable for deployment in resource-constrained scenarios.
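Parameter counts like these are easy to verify directly in PyTorch (FLOP counts additionally require a profiler such as fvcore or thop); a minimal check might look like this:

```python
import torch.nn as nn

def count_parameters(model: nn.Module) -> float:
    """Total trainable parameters, reported in millions."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad) / 1e6

# Example: a reported figure of 7.92M should match count_parameters(model)
# when run on the released or re-implemented network.
```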
Significance and Implications
- Scientific Value: The paper introduces the first two-stream network that uses frequency domain features to guide spatial feature learning, supplemented by noise clues, providing a novel approach to face forgery detection.
- Practical Potential: The method exhibits strong robustness in complex real-world scenarios, such as data compression, making it highly valuable for audiovisual media security.
- Methodological Innovation: The design of the DFTM module and the enhanced Local Relationship Constraint Loss show strong potential in forgery detection tasks and can be extended to related applications such as deepfake video detection.
Outlook and Directions for Improvement
Although the method performs strongly on benchmark datasets, there is still room to improve generalization and to make the design even lighter. The authors plan to incorporate more unseen generative models during training and to optimize the network architecture for faster, real-time detection.
This paper moves beyond the limitations of traditional forgery detection methods and brings fresh ideas to multimedia information security. Its lightweight, efficient design makes the proposed model a practical tool for real-world deployment.