Abstract: With the increasing penetration of distributed generation and the growing demand for power system flexibility, issues such as voltage rise at the edge of distribution networks and network congestion under bidirectional power flow are becoming more prominent. Integrating and coordinating user-side flexible resources through the Distributed Smart Grid (DSG) is of great significance for enhancing the accommodation of distributed generation and the real-time supply-demand balancing capability of distribution systems. Given the large number and high dispersion of flexible resource devices and the distinct characteristics of different prosumers, traditional centralized optimization and dispatch schemes, as well as distributed computing methods, face growing challenges in solution efficiency and the timeliness of decision delivery. Against this background, this paper develops a DSG collaborative optimization and dispatch method that jointly accounts for operational economy, energy network security, and decision timeliness. First, by mapping real-world prosumers that own and control flexible resources to intelligent agents in reinforcement learning, the optimization and dispatch of flexible resources in the DSG is formulated as a multi-agent collaborative optimization model. The existing edge-cloud collaborative framework is extended to flexible resource optimization under energy network security constraints, and a hierarchical flexible resource-prosumer-DSG optimization and dispatch framework is established. Second, to reflect the differentiated characteristics of prosumers, such as the types of flexible resource devices they own, photovoltaics (PV) is taken as distributed generation, while electric vehicles (EV), the heating, ventilation and air conditioning (HVAC) systems of buildings, and energy storage systems (ESS) are taken as demand-side flexible resources.
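The prosumer-to-agent mapping described above can be sketched minimally as follows. The class and device names, power bounds, and the idea of exposing one set-point per device are illustrative assumptions for exposition, not the paper's actual implementation.

```python
from dataclasses import dataclass, field

@dataclass
class Device:
    name: str          # "PV", "EV", "HVAC", or "ESS" (assumed device labels)
    p_min: float       # minimum power set-point (kW); negative = injection
    p_max: float       # maximum power set-point (kW)

@dataclass
class ProsumerAgent:
    """One real-world prosumer mapped to one RL agent controlling its devices."""
    agent_id: int
    devices: list = field(default_factory=list)

    def action_space(self):
        # Each agent keeps an independent decision space:
        # one continuous power set-point interval per device it owns.
        return [(d.p_min, d.p_max) for d in self.devices]

# Example: a prosumer with rooftop PV, an EV charger, building HVAC, and a battery.
agent = ProsumerAgent(0, [Device("PV", -5.0, 0.0),
                          Device("EV", 0.0, 7.0),
                          Device("HVAC", 0.0, 3.0),
                          Device("ESS", -2.5, 2.5)])
```

Heterogeneity then amounts to agents having different device lists and hence different action spaces, which is what motivates the heterogeneous training scheme described below.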
A heterogeneous intelligent agent interactive environment model is built from the operational characteristics of the different flexible resources. To balance flexible resource operational requirements, overall economic efficiency, and energy network security of the DSG system, user satisfaction evaluations of EV and HVAC operation and the ESS operation cost are taken as local rewards, while the system energy cost and an energy network security evaluation are taken as global rewards, yielding a combined global-local reward mechanism for heterogeneous intelligent agents. Finally, to suit the collaborative training of the heterogeneous agent system, an improved multi-agent proximal policy optimization (MAPPO) algorithm is proposed that updates agent policies asynchronously in random order. Case studies are conducted on the IEEE 33-bus system. First, the proposed improved MAPPO algorithm is compared with existing multi-agent collaborative training schemes in the offline training stage. Second, the differences in prosumers' power decisions with and without energy network constraints are analyzed in the online dispatch stage. Finally, the proposed method is compared with traditional mathematical programming and particle swarm optimization in terms of real-time dispatch performance. The main conclusions are: (1) The established edge-cloud collaborative hierarchical optimization and dispatch framework for DSG systems obtains dispatch decisions faster in real-time dispatch than traditional centralized optimization, improving the timeliness of DSG power dispatch decisions. (2) The combined global-local reward mechanism for heterogeneous intelligent agents achieves the overall DSG system optimization and collaborative training objectives of balancing user comfort, economic efficiency, and energy network security.
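The combined global-local reward mechanism can be illustrated with a minimal sketch. The weighting form and the example numbers are assumptions; the paper's actual reward shaping may differ.

```python
def combined_reward(local_reward, global_reward, w_local=1.0, w_global=1.0):
    """Combined reward for one agent: a weighted sum of its private (local)
    term and a system-wide (global) term shared by all agents.

    local_reward  : e.g. EV/HVAC user-satisfaction score, minus ESS operation cost
    global_reward : e.g. -(system energy cost + network-security penalty)
    """
    return w_local * local_reward + w_global * global_reward

# Hypothetical EV agent: good local satisfaction (0.8) but the system as a
# whole incurs cost and a security penalty (-1.5), down-weighted by 0.5.
r_ev = combined_reward(local_reward=0.8, global_reward=-1.5, w_global=0.5)
```

Because every agent receives the same global term, agents are pushed toward system-level objectives (economy and network security) even while optimizing their own comfort and cost terms.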
(3) The proposed improved MAPPO algorithm, adapted for heterogeneous intelligent agent training, maintains an independent decision space for each agent while preserving environment state stability during collaborative training through asynchronous policy updates in random order.
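The asynchronous random-order update at the core of the improved MAPPO can be sketched as the following training-loop skeleton. The `update_policy` callback and the epoch structure are assumptions for illustration; only the shuffling-then-sequential-update pattern is taken from the abstract.

```python
import random

def train_epoch(agents, update_policy):
    """One collaborative-training epoch with asynchronous policy updates
    in random order. Only one agent's policy changes at a time while the
    others are held fixed, so the environment stays quasi-stationary from
    the perspective of the agent currently being updated."""
    order = list(range(len(agents)))
    random.shuffle(order)            # fresh random update order every epoch
    for i in order:
        update_policy(agents[i])     # sequential (asynchronous) policy step
    return order                     # returned for inspection/logging
```

Randomizing the order each epoch avoids systematically favoring agents updated early in a fixed sequence, while the sequential updates avoid the instability of all heterogeneous agents changing their policies simultaneously.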