Research Paper
Researchia:202606.25026

Hierarchical Reinforcement Learning for Neural Network Compression (HiReLC): Pruning and Quantization

Kamar Hibatallah Baghdadi

Submitted: June 25, 2026
Subjects: Mathematics

Description / Details

We present HiReLC, a hierarchical ensemble-reinforcement learning framework for automated joint quantization and structured pruning of deep neural networks. The framework decomposes the compression search across two levels of abstraction: low-level agents (LLAs) operate independently per block, selecting per-kernel configurations over a multi-discrete action space spanning bitwidth, pruning keep-ratio, quantization type, and granularity, while high-level agents (HLAs) coordinate global budget allocation via ensemble voting guided by Fisher Information-based sensitivity estimates. To mitigate the computational cost of policy evaluation, an iterative active learning loop interleaves surrogate-guided RL optimization with post-compression fine-tuning, using a lightweight MLP surrogate to amortize expensive evaluations and a logit-MSE proxy during cold-start. The surrogate is used for reward shaping rather than as a replacement for final post-compression evaluation. The controller is architecture-agnostic by design, with a modular layer abstraction decoupling the RL environment from the underlying network topology. Experiments across Vision Transformer and CNN benchmarks demonstrate effective parameter-storage compression ratios of 5.99× to 6.72×, with a 3.83% gain in one setting and accuracy drops of 0.55% to 5.62% elsewhere, supporting hierarchical policy decomposition and sensitivity-aware guidance as practical design choices for joint neural network compression.
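
To make the reported parameter-storage compression ratios concrete, the following is a minimal Python sketch of how such a ratio could be computed from per-block bitwidth and keep-ratio choices. It is not taken from the paper: the FP32 baseline, the `BlockConfig` fields, and the decision to ignore quantization metadata (scales, zero-points, sparsity indices) are all assumptions made for illustration, and the paper's exact definition may differ.

```python
from dataclasses import dataclass

BASELINE_BITS = 32  # assumption: uncompressed weights are stored as FP32


@dataclass
class BlockConfig:
    """Hypothetical per-block compression choice (names are illustrative)."""
    num_params: int    # weights in the block before compression
    bitwidth: int      # bits per retained weight after quantization
    keep_ratio: float  # fraction of weights kept by structured pruning


def compression_ratio(blocks):
    """Baseline storage bits divided by compressed storage bits.

    Assumption: metadata overhead (scales, zero-points) is ignored.
    """
    baseline_bits = sum(b.num_params * BASELINE_BITS for b in blocks)
    compressed_bits = sum(
        b.num_params * b.keep_ratio * b.bitwidth for b in blocks
    )
    return baseline_bits / compressed_bits


# Two equally sized blocks with made-up configurations.
blocks = [
    BlockConfig(num_params=1_000_000, bitwidth=8, keep_ratio=0.75),
    BlockConfig(num_params=1_000_000, bitwidth=4, keep_ratio=0.5),
]
ratio = compression_ratio(blocks)
print(f"{ratio:.2f}x")  # prints "8.00x"
```

Under these assumptions, pruning and quantization compound multiplicatively within a block: 8-bit weights at a 0.75 keep-ratio give 32 / (8 × 0.75) ≈ 5.33× for that block alone, so a network-wide ratio near 6× is consistent with moderate settings in both dimensions.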


Source: arXiv:2606.26002v1 (http://arxiv.org/abs/2606.26002v1)
PDF: https://arxiv.org/pdf/2606.26002v1
