Detecting Concept Drift in Evolving Malware Families Using Rule-Based Classifier Representations
Description / Details
This work proposes a structural approach to concept drift detection in malware classification using decision tree rulesets. Classifiers are trained across temporal windows on the EMBER2024 dataset, and drift is quantified by comparing the extracted rule representations using feature importance, prediction agreement, activation stability, and coverage metrics. These metrics are then correlated with both accuracy degradation and data distribution shift, which serve as complementary drift indicators. The approach is evaluated across six malware families using fixed-interval and clustering-based windowing, in family-vs-benign and family-vs-family settings, and is compared against RIPPER and Transcendent baselines. Results show that fixed two-month windowing with feature-level Pearson correlation is the most reliable configuration: it is the only one in which all family pairs produce positive drift-accuracy correlations. The methods are complementary, and no single approach dominates across all pairs.
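As a rough illustration of the feature-level comparison described above, the sketch below scores drift between consecutive temporal windows from their feature-importance vectors and then relates that drift series to accuracy drops. It is a minimal sketch under stated assumptions, not the paper's implementation: the definition of drift as 1 minus Pearson's r, the agreement measure, and all function names (`pearson`, `feature_drift`, `prediction_agreement`, `drift_series`, `drift_accuracy_correlation`) are hypothetical choices made for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def feature_drift(importances_a, importances_b):
    """Assumed drift score: 1 - r between two windows' feature importances.

    0 means the importance profiles are perfectly correlated; larger values
    mean the two classifiers rely on features in increasingly different ways.
    """
    return 1.0 - pearson(importances_a, importances_b)


def prediction_agreement(preds_a, preds_b):
    """Fraction of samples on which two window classifiers predict the same label."""
    if len(preds_a) != len(preds_b) or not preds_a:
        raise ValueError("need two non-empty prediction lists of equal length")
    same = sum(1 for a, b in zip(preds_a, preds_b) if a == b)
    return same / len(preds_a)


def drift_series(window_importances):
    """Drift score for each pair of consecutive windows (length n - 1)."""
    return [
        feature_drift(a, b)
        for a, b in zip(window_importances, window_importances[1:])
    ]


def drift_accuracy_correlation(drifts, accuracy_drops):
    """Pearson r between per-step drift scores and per-step accuracy drops.

    A positive value means larger structural drift coincides with larger
    accuracy degradation.
    """
    return pearson(drifts, accuracy_drops)
```

In practice the importance vectors would come from the per-window classifiers (for example, the `feature_importances_` attribute of a fitted scikit-learn `DecisionTreeClassifier`), with one vector per fixed two-month window in the same feature order.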
Source: arXiv:2604.22629v1 - http://arxiv.org/abs/2604.22629v1
PDF: https://arxiv.org/pdf/2604.22629v1
Apr 27, 2026
Computer Science
Cybersecurity