Rethinking Feature Conditioning for Robust Forged Media Detection in Edge AI Sensing Systems
Abstract
Generalization under manipulation and dataset shift remains a core challenge in forged media detection for AI-driven edge sensing systems. Frozen vision foundation models with linear probes are strong baselines, but most pipelines use the default backbone outputs without testing how features are conditioned at the frozen feature interface. We present the first controlled probing study on DINOv3 ConvNeXt and show that, without task-specific fine-tuning, linear probing alone yields competitive forged-media detection performance, indicating that ViT-7B self-supervised distillation transfers to security-critical vision workloads at edge-compatible inference cost. Backbone, head, data, and optimization are held fixed while only the conditioning is varied; LN-Affine, the default ConvNeXt head output, serves as the natural baseline. On FaceForensics++ c23, five conditioning variants are evaluated under in-distribution (ID) testing, leave-one-manipulation-out (LOMO), and cross-dataset transfer to Celeb-DF v2 and DeepFakeDetection. With ConvNeXt-Tiny, conditioning alone changes LOMO mean AUC by 6.1 points and reverses the ranking between ID and out-of-distribution (OOD) evaluation: LN-Affine is strongest on external datasets, while LayerNorm is strongest in-distribution. In the ConvNeXt-Base replication, the OOD winner becomes protocol-dependent, and ID-optimal selection still fails as a robust deployment rule. The results show that feature conditioning is a first-order design variable and should be selected with robustness-oriented validation, not ID accuracy alone.
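As a rough illustration of what varying the conditioning at a frozen feature interface and running a LOMO protocol can look like, the following is a minimal NumPy sketch, not the authors' implementation. It assumes that "LayerNorm" means a parameter-free normalization over channels and that "LN-Affine" adds a per-channel scale and shift; the abstract does not define the five variants, so only these two are shown. The function names, the `eps` value, and the four manipulation names used in the usage lines are hypothetical choices made for illustration.

```python
import numpy as np


def layer_norm(features, eps=1e-6):
    """Parameter-free LayerNorm over the channel (last) axis.

    Assumption: this is what the 'LayerNorm' variant refers to.
    """
    mean = features.mean(axis=-1, keepdims=True)
    var = features.var(axis=-1, keepdims=True)
    return (features - mean) / np.sqrt(var + eps)


def ln_affine(features, gamma, beta, eps=1e-6):
    """LayerNorm followed by a per-channel scale (gamma) and shift (beta).

    Assumption: gamma and beta would come from the frozen backbone's
    final normalization layer; here they are plain arrays.
    """
    return layer_norm(features, eps) * gamma + beta


def lomo_splits(manipulations):
    """Yield (train_manipulations, held_out) pairs for leave-one-manipulation-out."""
    for held_out in manipulations:
        train = [m for m in manipulations if m != held_out]
        yield train, held_out


# Usage with synthetic stand-ins for pooled frozen features (8 samples, 16 channels).
rng = np.random.default_rng(0)
pooled = rng.normal(loc=2.0, scale=3.0, size=(8, 16))
gamma = np.ones(16)   # hypothetical affine parameters
beta = np.zeros(16)

normed = layer_norm(pooled)
affine = ln_affine(pooled, gamma, beta)

# Hypothetical manipulation list; the abstract does not enumerate the types used.
splits = list(lomo_splits(["Deepfakes", "Face2Face", "FaceSwap", "NeuralTextures"]))
```

A linear probe would then be fit on the conditioned features of the training manipulations in each split and scored on the held-out one, with the backbone, probe, data, and optimizer kept identical across conditioning variants.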
Source: arXiv:2603.26387v1 - http://arxiv.org/abs/2603.26387v1 PDF: https://arxiv.org/pdf/2603.26387v1