Conference papers

Multi-Agent Reinforcement Learning for Network Load Balancing in Data Center

Abstract : This paper presents the network load balancing problem, a challenging real-world task for multi-agent reinforcement learning (MARL) methods. Conventional heuristic solutions such as Weighted-Cost Multi-Path (WCMP) and Local Shortest Queue (LSQ) adapt poorly to changing workload distributions and arrival rates, and they balance load poorly across multiple load balancers. The cooperative network load balancing task is formulated as a Dec-POMDP problem, which lends itself naturally to MARL methods. To bridge the reality gap in applying learning-based methods, all models are trained and evaluated directly on a real-world system, in moderate- to large-scale setups. Experimental evaluations show that independent and "selfish" load balancing strategies are not necessarily globally optimal, while the proposed MARL solution achieves superior performance across different realistic settings. Additionally, the potential difficulties of applying and deploying MARL methods for network load balancing are analysed, which helps draw the attention of the learning and network communities to these challenges.

https://hal-polytechnique.archives-ouvertes.fr/hal-03753203
Contributor : Zhiyuan Yao
Submitted on : Thursday, August 18, 2022 - 5:17:06 PM
Last modification on : Saturday, August 20, 2022 - 3:42:53 AM

Files

cikm22.pdf
Files produced by the author(s)

Citation

Zhiyuan Yao, Zihan Ding, Thomas Clausen. Multi-Agent Reinforcement Learning for Network Load Balancing in Data Center. 31st ACM International Conference on Information and Knowledge Management (CIKM '22), Oct 2022, Atlanta, GA, United States. ⟨10.1145/3511808.3557133⟩. ⟨hal-03753203⟩
