Optimal Scheduling of Integrated Energy Multi-Microgrid System Based on Hierarchical Constraint Reinforcement Learning
Dong Lei1, Yang Zimin1, Qiao Ji2, Chen Sheng2, Wang Xinying2, Pu Tianjiao2
1. School of Electrical and Electronics Engineering, North China Electric Power University, Beijing 102206, China;
2. China Electric Power Research Institute, Beijing 100192, China
Optimizing an integrated energy multi-microgrid system is a complex task: it involves numerous variables, data privacy protection requirements, and uncertainties in power generation and load, all of which make it difficult to apply traditional mathematical optimization methods efficiently. Recently, many scholars have turned to deep reinforcement learning (DRL) methods, which are data-driven and adapt well to uncertainties in power generation and load. Nevertheless, convergence becomes increasingly difficult as the system scale grows. Moreover, traditional DRL methods handle constraints by adding penalty terms to the reward function, which can blur the boundary between objectives and constraints, makes it difficult to ensure that constraints are fully satisfied, and can lead to excessively conservative strategies or suboptimal solutions. To address these issues, this paper proposes a hierarchical constraint reinforcement learning optimization method.
First, this paper proposes a hierarchical DRL optimization framework for multi-microgrid systems. The framework divides the optimization problem into an upper layer and a lower layer. The upper layer does not need the full operating status information of each microgrid; instead, it uses net load prediction information and energy storage state information to provide energy storage optimization strategies and power interaction strategies. In the lower layer, each microgrid autonomously optimizes the output of its internal devices through mathematical programming, based on its own status information and with the upper-layer strategy as a constraint. Cooperation between the two layers achieves overall optimization of the multi-microgrid system. The framework thus exploits the data-driven advantages of DRL while retaining the solution accuracy of mathematical programming. Second, based on this hierarchical framework, a constrained DRL method is proposed that combines DRL with the Lagrange multiplier method. This method transforms the constrained optimization problem into an unconstrained one, enabling the agent to find the optimal strategy while strictly satisfying the constraints. Compared with traditional centralized optimization methods, the proposed method responds dynamically to fluctuations in power generation and load, which meets online optimization requirements, and it protects microgrid data privacy because it does not require aggregating the status information of all microgrids. Compared with general DRL methods, the proposed approach effectively resolves constraint violations and significantly improves both convergence speed and accuracy.
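The idea of replacing a hand-tuned penalty coefficient with a learned Lagrange multiplier can be illustrated on a toy problem. The following is a minimal sketch, not the agent used in this paper: it assumes a scalar action a, a hypothetical reward -(a - 3)^2, and a hypothetical constraint a <= 2, and it alternates a gradient step on the action with a projected ascent step on the multiplier. The function name and step sizes are illustrative assumptions.

```python
def primal_dual_toy(steps=2000, lr_action=0.05, lr_lambda=0.05, limit=2.0):
    """Toy primal-dual iteration on L(a, lam) = -(a - 3)^2 - lam * (a - limit).

    Hypothetical stand-in for a constrained policy update: 'a' plays the role
    of the policy parameters and 'lam' is the Lagrange multiplier.
    """
    a, lam = 0.0, 0.0
    for _ in range(steps):
        # Primal step: gradient ascent on the Lagrangian with respect to a.
        grad_a = -2.0 * (a - 3.0) - lam
        a += lr_action * grad_a
        # Dual step: raise lam while the constraint a <= limit is violated,
        # lower it otherwise, and project onto lam >= 0.
        lam = max(0.0, lam + lr_lambda * (a - limit))
    return a, lam


a_opt, lam_opt = primal_dual_toy()
print(f"action = {a_opt:.4f}, multiplier = {lam_opt:.4f}")
```

In this toy case the unconstrained optimum (a = 3) violates the limit, so the multiplier grows until the action settles on the constraint boundary. No penalty coefficient has to be chosen in advance, which is the property the proposed method relies on.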
The following conclusions can be drawn from the case studies: (1) The hierarchical design simplifies the optimization of multi-microgrid systems. It requires no information exchange between microgrids; each microgrid only uploads its net load and energy storage state information, and the microgrids solve their optimization problems independently and in parallel based on their own status information. When local status information is available, the approach provides real-time scheduling results that are consistent with the optimal solution. (2) The proposed approach combines data-driven principles with traditional methods, which reduces the complexity of action space and reward design. It effectively balances the rapid solving ability of DRL with the solution accuracy of mathematical programming. Compared with traditional DRL methods, the proposed approach significantly improves both convergence speed and accuracy. (3) The approach combines DRL with the Lagrange multiplier method to transform the constrained optimization problem into an unconstrained one, so that the agent can find the optimal strategy while strictly satisfying the constraints. This avoids the convergence difficulties and constraint violations caused by manually setting the penalty coefficient in traditional DRL methods. (4) The model is robust and adapts effectively to fluctuations in power generation and load, making rapid decisions on the power interactions of each microgrid.
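The lower layer described above, in which each microgrid optimizes its own devices with the upper-layer strategy as a constraint, can be sketched in a heavily simplified form. The example below is an assumption-laden illustration rather than the paper's mathematical programming model: it considers electricity only, treats the upper-layer storage and exchange powers as fixed setpoints, and dispatches hypothetical units in order of marginal cost, which is optimal for a single power balance with linear costs and capacity bounds. All names and numbers are hypothetical.

```python
def dispatch_microgrid(net_load, storage_power, exchange_power, units):
    """Meet the residual demand at least cost, given upper-layer setpoints.

    units: list of (name, capacity, marginal_cost) tuples (hypothetical devices).
    storage_power and exchange_power are the power supplied by storage
    discharge and by import, as fixed by the upper layer.
    """
    residual = net_load - storage_power - exchange_power
    if residual < 0:
        raise ValueError("upper-layer setpoints exceed the net load")
    outputs = {}
    # Merit order: fill the residual demand with the cheapest units first.
    for name, capacity, _cost in sorted(units, key=lambda u: u[2]):
        outputs[name] = min(capacity, residual)
        residual -= outputs[name]
    if residual > 1e-9:
        raise ValueError("local units cannot cover the residual demand")
    return outputs


units = [("gas_turbine", 60.0, 0.5), ("fuel_cell", 40.0, 0.8)]
result = dispatch_microgrid(
    net_load=100.0, storage_power=10.0, exchange_power=20.0, units=units
)
print(result)
```

Because the function needs only the microgrid's own data and the two upper-layer setpoints, one such call per microgrid can run in parallel without sharing device-level information.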
董雷, 杨子民, 乔骥, 陈盛, 王新迎, 蒲天骄. 基于分层约束强化学习的综合能源多微网系统优化调度[J]. 电工技术学报, 2024, 39(5): 1436-1453.
Dong Lei, Yang Zimin, Qiao Ji, Chen Sheng, Wang Xinying, Pu Tianjiao. Optimal Scheduling of Integrated Energy Multi-Microgrid System Based on Hierarchical Constraint Reinforcement Learning. Transactions of China Electrotechnical Society, 2024, 39(5): 1436-1453.