Optimized Scheduling of Multi Photovoltaics and Energy Storage Integrated Flexible Direct Current Distribution Systems Based on Delay-Aware Multi-Agent Deep Reinforcement Learning
Peng Chunhua, Zhang Haoqi, Sun Huijuan, Lu Hengyu
School of Electrical and Automation Engineering, East China Jiaotong University, Nanchang 330013, China
Abstract: In the context of developing a new power system dominated by new energy, large-scale integration of distributed photovoltaics (PV) has exacerbated the spatiotemporal mismatch between PV output and load demand. DC-AC conversion losses in prevalent AC distribution systems have also grown more prominent. These issues have driven the rapid advancement of the photovoltaic, energy storage, direct current, and flexibility (PEDF) system. Most existing studies, however, focus on single-system optimization and neglect collaborative scheduling among multiple PEDF (MPEDF) systems. To enhance power complementarity among MPEDF systems and cut operating costs, this study proposes a collaborative optimal scheduling strategy based on multi-agent deep reinforcement learning. First, the study establishes an interconnected operation framework for the MPEDF system. Multiple independent PEDF systems integrate PV, grid access, battery energy storage (BES), electric vehicles (EVs), and flexible loads via DC-DC converters and interconnection lines. The power response of flexible loads, BES, and EVs varies linearly with the DC bus voltage deviation. EV charging behavior is simulated using the Monte Carlo method. Next, the study develops a power control method that dynamically guides the charging and discharging power of flexible devices and EVs based on DC bus voltage deviation signals, thereby enabling power scheduling. By optimizing the voltage gain coefficient at each scheduling moment, the study obtains a power adjustment plan for the flexible devices in each PEDF system to achieve economic operation. Finally, the study constructs a complete optimal scheduling model of the MPEDF system with a power interaction strategy, whose power transmission mechanism coordinates inter-system power differences to minimize operating costs. The model accounts for power transmission costs, device scheduling costs, and curtailment penalties, aiming to maximize economic and environmental benefits.
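The linear relationship between device power and DC bus voltage deviation described above can be illustrated with a minimal sketch. This is not the authors' model: the function name `power_response`, the sign convention (positive power means the device absorbs power when the bus voltage is above its reference), the clipping to device limits, and all numeric values are assumptions made for illustration only.

```python
def power_response(v_bus, v_ref, gain, p_min, p_max):
    """Hypothetical linear power response to DC bus voltage deviation.

    Assumed convention: a positive result means the device absorbs power
    (e.g. BES or EV charging); a negative result means it releases power.
    """
    deviation = v_bus - v_ref          # DC bus voltage deviation (V)
    power = gain * deviation           # linear response, gain in kW/V
    # Assumed device limits: keep the response inside [p_min, p_max].
    return max(p_min, min(p_max, power))


# Illustrative values only (not taken from the paper).
print(power_response(754.0, 750.0, gain=0.5, p_min=-5.0, p_max=5.0))
print(power_response(780.0, 750.0, gain=0.5, p_min=-5.0, p_max=5.0))
```

In a scheduling loop of this kind, `gain` would be the quantity chosen at each scheduling moment, so that adjusting one coefficient per device shifts its power without sending an explicit power setpoint.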
To address data privacy and system-specific operational traits, the study employs multi-agent deep reinforcement learning to solve the model. To handle information interaction delays among agents, it adopts a delay-aware multi-agent proximal policy optimization (DA-MAPPO) algorithm. The algorithm introduces a delay-aware Markov process and expands the state space, effectively mitigating the impact of interaction delays and improving solving efficiency. Simulations on three interconnected PEDF systems show that enhanced power complementarity increases photovoltaic absorption and cuts operating costs. Compared with scenarios without electric vehicle participation or inter-system power interaction, electricity purchase costs drop by 42.7% and curtailment decreases by 38.5%. DA-MAPPO yields a total operating cost of 2 640.21 yuan, which is 17.3% lower than that of CPLEX, 13.2% lower than PSO, 9.9% lower than SAC, and 6.1% lower than MAPPO. It also has the shortest solution time (1 second), making it suitable for real-time scheduling. In conclusion, this study develops an operation framework for the PEDF system based on a DC bus voltage control strategy. The framework adjusts the power of flexible devices and facilitates inter-system power interaction through voltage signals. The strategy helps balance renewable energy supply and demand, supports the stable and low-carbon operation of urban power grids, and provides new insights for future optimal scheduling models. Future research will explore the integration of MPEDF with shared energy storage to address spatial and temporal differences in PV output and load demand, and to resolve energy storage redundancy in some systems and insufficient capacity in others.
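The state-space expansion used by delay-aware methods, mentioned above, can be sketched as follows. This is a generic illustration of one common construction (appending the most recent actions to a delayed observation) and is not confirmed to be the exact formulation used in DA-MAPPO. The class name `DelayAugmentedState`, its methods, and the zero-initialized action history are hypothetical.

```python
from collections import deque


class DelayAugmentedState:
    """Hypothetical helper: builds an expanded state from a delayed
    observation plus the last `delay_steps` actions of the agent."""

    def __init__(self, delay_steps, action_dim):
        # Assumed initialization: history starts as all-zero actions.
        self.actions = deque(
            [[0.0] * action_dim for _ in range(delay_steps)],
            maxlen=delay_steps,
        )

    def record_action(self, action):
        # The oldest action is dropped automatically once the buffer is full.
        self.actions.append(list(action))

    def augment(self, delayed_obs):
        # Expanded state = delayed observation followed by recent actions,
        # ordered from oldest to newest.
        state = list(delayed_obs)
        for action in self.actions:
            state.extend(action)
        return state


buf = DelayAugmentedState(delay_steps=2, action_dim=1)
buf.record_action([0.3])
buf.record_action([0.5])
print(buf.augment([1.0, 2.0]))
```

The idea behind such a construction is that the expanded state carries the information the agent has not yet seen reflected in its delayed observation, so the decision process remains Markov despite the delay.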
Peng Chunhua, Zhang Haoqi, Sun Huijuan, Lu Hengyu. Optimized Scheduling of Multi Photovoltaics and Energy Storage Integrated Flexible Direct Current Distribution Systems Based on Delay-Aware Multi-Agent Deep Reinforcement Learning[J]. Transactions of China Electrotechnical Society, 2026, 41(11): 3742-3754.