Research Paper | Researchia:202601.06869449 [Engineering > Engineering]

Post-Decision State-Based Online Learning for Delay-Energy-Aware Flow Allocation in Wireless Systems

Mahesh Ganesh Bhat

Abstract

We develop a structure-aware reinforcement learning (RL) approach for delay- and energy-aware flow allocation in 5G User Plane Functions (UPFs). We consider a dynamic system with K heterogeneous UPFs of varying capacities that handle stochastic arrivals of M flow types, each with distinct rate requirements. We model the system as a Markov decision process (MDP) to capture the stochastic nature of flow arrivals and departures, whose statistics may be unknown, as well as the impact of flow allocation on the system. To solve this problem, we propose a post-decision state (PDS) based value iteration algorithm that exploits the underlying structure of the MDP. By separating action-controlled dynamics from exogenous factors, PDS enables faster convergence and efficient adaptive flow allocation, even in the absence of statistical knowledge about exogenous variables. Simulation results demonstrate that the proposed method converges faster and achieves lower long-term cost than standard Q-learning, highlighting the effectiveness of PDS-based RL for resource allocation in wireless networks.
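To make the PDS idea concrete, the following is a minimal Python sketch of post-decision state learning on a toy flow-allocation problem. It is not the paper's algorithm or model: the capacities, rate requirements, cost weights, rejection penalty, departure probability, discount factor, and all function names are hypothetical choices made for illustration. The sketch shows the separation the abstract describes: the action moves the system deterministically to a post-decision state (the UPF loads after assignment), while departures and the next arrival are exogenous and are only sampled, never modeled.

```python
import random
from collections import defaultdict

# All constants below are illustrative assumptions, not values from the paper.
CAPS = (2, 3)        # capacities of K = 2 hypothetical UPFs (rate units)
RATES = (1, 2)       # rate requirements of M = 2 hypothetical flow types
GAMMA = 0.9          # discount factor (assumed; the paper may use another criterion)
REJECT = len(CAPS)   # extra action index: reject the arriving flow

def actions(loads, m):
    """UPFs that can fit a type-m flow, plus the always-available reject action."""
    feasible = [k for k, c in enumerate(CAPS) if loads[k] + RATES[m] <= c]
    return feasible + [REJECT]

def post_decision(loads, m, a):
    """Known, action-controlled transition: loads right after the assignment."""
    if a == REJECT:
        return loads
    new = list(loads)
    new[a] += RATES[m]
    return tuple(new)

def cost(loads, m, a):
    """Toy cost: active UPFs (energy) + utilisation (delay proxy) + reject penalty."""
    pds = post_decision(loads, m, a)
    energy = sum(1 for l in pds if l > 0)
    delay = sum(l / c for l, c in zip(pds, CAPS))
    return energy + delay + (5.0 if a == REJECT else 0.0)

def exogenous(pds, rng):
    """Unknown-to-the-learner dynamics: random departures and the next arrival type."""
    loads = tuple(l - 1 if l > 0 and rng.random() < 0.3 else l for l in pds)
    return loads, rng.randrange(len(RATES))

def greedy(V, loads, m):
    """Pick the action minimising immediate cost plus discounted PDS value."""
    return min(actions(loads, m),
               key=lambda a: cost(loads, m, a) + GAMMA * V[post_decision(loads, m, a)])

def train(steps=20000, seed=0):
    rng = random.Random(seed)
    V = defaultdict(float)      # value estimate for each post-decision state
    visits = defaultdict(int)
    loads, m = (0,) * len(CAPS), 0
    for _ in range(steps):
        pds = post_decision(loads, m, greedy(V, loads, m))
        nxt_loads, nxt_m = exogenous(pds, rng)       # sample exogenous events
        a2 = greedy(V, nxt_loads, nxt_m)
        # Sampled value of the next pre-decision state under the greedy action.
        target = (cost(nxt_loads, nxt_m, a2)
                  + GAMMA * V[post_decision(nxt_loads, nxt_m, a2)])
        visits[pds] += 1
        V[pds] += (target - V[pds]) / visits[pds]    # running-average update
        loads, m = nxt_loads, nxt_m
    return V
```

Because the value function is indexed by post-decision states only, a single sampled transition updates an estimate that is shared by every state-action pair leading to the same post-decision state, which is one common explanation for why PDS methods can converge faster than Q-learning over state-action pairs. Calling `train()` returns the learned table, and `greedy(V, loads, m)` then gives the allocation for a given load vector and arriving flow type in this toy model.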

Submission: 1/6/2026
Subjects: Engineering