Research Paper | Researchia:202604.16067

CLAD: Efficient Log Anomaly Detection Directly on Compressed Representations

Benzhao Tang

Submitted: April 16, 2026
Subjects: Machine Learning; Data Science

Description / Details

The explosive growth of system logs makes streaming compression essential, yet existing log anomaly detection (LAD) methods incur severe pre-processing overhead by requiring full decompression and parsing. We introduce CLAD, the first deep learning framework to perform LAD directly on compressed byte streams. CLAD bypasses these bottlenecks by exploiting a key insight: normal logs compress into regular byte patterns, while anomalies systematically disrupt them. To extract these multi-scale deviations from opaque bytes, we propose a purpose-built architecture integrating a dilated convolutional byte encoder, a hybrid Transformer-mLSTM, and four-way aggregation pooling. This is coupled with a two-stage training strategy of masked pre-training and focal-contrastive fine-tuning to effectively handle severe class imbalance. Evaluated across five datasets, CLAD achieves a state-of-the-art average F1-score of 0.9909 and outperforms the best baseline by 2.72 percentage points. It delivers superior accuracy while completely eliminating decompression and parsing overheads, offering a robust solution that generalizes to structured streaming compressors.
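
The abstract names focal-contrastive fine-tuning as the remedy for severe class imbalance but does not give the loss itself. As a rough illustration of the focal half only, the sketch below implements the standard binary focal loss, FL(p_t) = -alpha_t (1 - p_t)^gamma log(p_t), in plain Python. The function name `focal_loss`, the defaults `gamma=2.0` and `alpha=0.25`, and the convention that label 1 means "anomalous" are assumptions made for this example, not details taken from the paper; the contrastive term is not shown.

```python
import math

def focal_loss(p, y, gamma=2.0, alpha=0.25, eps=1e-12):
    """Standard binary focal loss for one prediction (illustrative sketch).

    p     : predicted probability of the positive class (assumed: anomaly)
    y     : ground-truth label, 1 for positive, 0 for negative
    gamma : focusing parameter; larger values down-weight easy examples more
    alpha : weight on the positive class (1 - alpha goes to the negative class)
    """
    # p_t is the probability the model assigned to the true class.
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    # Clamp to avoid log(0) when the model is confidently wrong.
    p_t = min(max(p_t, eps), 1.0)
    # (1 - p_t)^gamma shrinks the loss of well-classified examples.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

def mean_focal_loss(probs, labels, gamma=2.0, alpha=0.25):
    """Average focal loss over a batch of (probability, label) pairs."""
    losses = [focal_loss(p, y, gamma, alpha) for p, y in zip(probs, labels)]
    return sum(losses) / len(losses)

if __name__ == "__main__":
    # A confidently correct prediction contributes far less than a missed one.
    print("easy positive:", focal_loss(0.95, 1))
    print("hard positive:", focal_loss(0.05, 1))
    print("batch mean   :", mean_focal_loss([0.95, 0.05, 0.10], [1, 1, 0]))
```

With `gamma=0` and `alpha=0.5` the expression reduces to half the ordinary cross-entropy, which makes the role of the focusing term easy to see: raising `gamma` leaves hard examples nearly untouched while suppressing the many easy normal windows that would otherwise dominate training.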


Source: arXiv:2604.13024v1 (http://arxiv.org/abs/2604.13024v1)
PDF: https://arxiv.org/pdf/2604.13024v1
