Single/Multi Agent Simplified Deep Reinforcement Learning Based Volt-Var Control of Power System

doi:10.19595/j.cnki.1000-6753.tces.222195


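Summary: To quickly suppress the reactive power and voltage fluctuations caused by the integration of distributed energy resources, machine learning methods represented by reinforcement learning and imitation learning are increasingly being applied to volt-var control. Although existing methods achieve very fast online solution, shortcomings such as slow offline training and insufficient universality still hinder their practical application. This paper first proposes a single-agent simplified reinforcement learning method suited to centralized control of transmission networks. Based on the "Actor-Critic" architecture, the method simplifies and improves reinforcement learning: it retains the advantages of needing no labeled data and of strong universality, while eliminating the computation wasted on the agent's random search in the early training stage, which greatly increases training speed. The paper then proposes a multi-agent simplified reinforcement learning method suited to decentralized, zero-communication control of distribution networks. This method extends the simplified reinforcement learning idea to a multi-agent version and uses imitation learning for initialization, injecting the global optimization idea into each agent in advance and improving the local coordinated control among reactive power devices. Finally, simulation results on a modified IEEE 118-bus case verify the correctness and speed of the proposed methods.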


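Keywords: volt-var control, centralized control, single-agent simplified reinforcement learning, decentralized control, multi-agent simplified reinforcement learning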

Abstract: To quickly suppress the rapid reactive power and voltage fluctuations caused by the random output of distributed energy resources, machine learning (ML) methods, represented by deep reinforcement learning (DRL) and imitation learning (IL), have recently been applied to volt-var control (VVC) as replacements for traditional methods that require many iterations. Although the ML methods in the existing literature achieve fast online VVC optimization, shortcomings such as slow offline training and insufficient universality still hinder their practical application.
First, this paper proposes a single-agent simplified DRL (SASDRL) method suited to centralized control of transmission networks. The method builds on the classic "Actor-Critic" architecture and on the observation that the quality of the control strategies generated by the Actor network depends heavily on how accurately the Critic network evaluates them. It simplifies and improves the offline training of DRL-based VVC through two core ideas: simplifying the training of the Critic network, and changing how the Actor and Critic networks are updated. The sequential decision problem posed in traditional DRL-based VVC is reduced to a single-point decision problem, and the output of the Critic network changes from the original sequential action value to the reward value of the current control strategy. In addition, training the Critic network in advance accelerates the convergence of the Actor network. This eliminates the computation wasted on the agent's random search in the early training stage and greatly improves offline training speed, while retaining DRL's advantages of strong universality and no need for massive labeled data.
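The single-point idea can be illustrated with a deliberately small sketch. It is not the paper's implementation: the one-bus reward model, the quadratic Critic, the linear Actor, and all names (`reward`, `features`, `train_critic`, `train_actor`, `GRID`) are assumptions made for illustration. The Critic is first fitted by supervised regression to predict the immediate reward of a (state, action) pair, with no bootstrapped sequential value, and is then frozen while the Actor climbs the Critic's predicted reward.

```python
# Toy sketch of the "train the Critic first, then the Actor" idea.
# Everything here (the one-bus model, the quadratic Critic, the linear Actor)
# is an illustrative assumption, not the paper's networks or test system.

GRID = [-1.0, -0.5, 0.0, 0.5, 1.0]  # hypothetical per-unit reactive demand levels


def reward(demand, action):
    """Single-point reward: negative squared voltage deviation on a toy bus."""
    deviation = action - demand  # assumed unit voltage sensitivity
    return -deviation ** 2


def features(demand, action):
    """Quadratic features so a linear-in-parameters Critic can fit the reward."""
    return [action * action, action * demand, demand * demand, action, demand, 1.0]


def train_critic(samples, epochs=1000, lr=0.5):
    """Supervised regression of reward on (state, action); no bootstrapping."""
    theta = [0.0] * 6
    for _ in range(epochs):
        grad = [0.0] * 6
        for demand, action, target in samples:
            phi = features(demand, action)
            error = sum(t * p for t, p in zip(theta, phi)) - target
            for k in range(6):
                grad[k] += error * phi[k] / len(samples)
        theta = [t - lr * g for t, g in zip(theta, grad)]
    return theta


def train_actor(theta, states, steps=200, lr=0.5):
    """Gradient ascent on the frozen Critic's predicted reward for a = w*s + b."""
    w, b = 0.0, 0.0
    for _ in range(steps):
        grad_w = grad_b = 0.0
        for s in states:
            a = w * s + b
            # Derivative of the Critic's prediction with respect to the action.
            d_reward_d_action = 2.0 * theta[0] * a + theta[1] * s + theta[3]
            grad_w += d_reward_d_action * s / len(states)
            grad_b += d_reward_d_action / len(states)
        w, b = w + lr * grad_w, b + lr * grad_b
    return w, b


# Labeled samples: every (demand, action) pair on the grid with its reward.
samples = [(d, a, reward(d, a)) for d in GRID for a in GRID]
critic = train_critic(samples)
w, b = train_actor(critic, GRID)
worst = max(abs((w * d + b) - d) for d in GRID)
print(f"actor: a = {w:.3f} * s + {b:.3f}, worst deviation = {worst:.4f}")
```

Because the Critic is already accurate when Actor training starts, the Actor never explores at random in this sketch; it simply follows the Critic's gradient, which is the source of the speed-up the abstract describes.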
Second, a multi-agent simplified DRL (MASDRL) method is proposed for decentralized, zero-communication control of active distribution networks. It generalizes the core idea of SASDRL to a multi-agent version and, building on a unified Critic network trained in advance, further accelerates the convergence of each agent's Actor network. Each agent corresponds to a different VVC device in the system. During online application, each agent uses only the local information of the node to which its VVC device is connected and generates its control strategy independently through its own Actor network. In addition, IL is used for initialization to inject the global optimization idea into each agent in advance, which improves the local collaborative control among the VVC devices.
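A minimal sketch of the IL initialization step, under stated assumptions, may help. The two-device system, the `expert_actions` rule standing in for a centralized optimizer, and the linear local policies are all hypothetical and chosen only so the example is self-contained. Each agent clones the expert using its own local observation alone, so the fitted policy can later run with zero communication; the subsequent fine-tuning against the unified Critic is omitted here.

```python
# Toy sketch of imitation-learning initialization for decentralized actors.
# The two-device system, the "expert" rule, and the linear local policies
# are illustrative assumptions, not the paper's model.

GRID = [-1.0, -0.5, 0.0, 0.5, 1.0]  # hypothetical local measurements (p.u.)


def expert_actions(obs1, obs2):
    """Stand-in for a centralized optimizer that sees both nodes."""
    return 0.8 * obs1 + 0.2 * obs2, 0.3 * obs1 + 0.7 * obs2


def clone_local_policy(local_obs, expert_acts):
    """Least-squares fit of a = w * o + b using only the agent's own observation."""
    n = len(local_obs)
    mean_o = sum(local_obs) / n
    mean_a = sum(expert_acts) / n
    cov = sum((o - mean_o) * (a - mean_a) for o, a in zip(local_obs, expert_acts))
    var = sum((o - mean_o) ** 2 for o in local_obs)
    w = cov / var
    return w, mean_a - w * mean_o


# Offline: collect expert demonstrations over all global states.
scenarios = [(o1, o2) for o1 in GRID for o2 in GRID]
demos = [expert_actions(o1, o2) for o1, o2 in scenarios]

# Each agent is initialized from its own column of the demonstrations.
policy1 = clone_local_policy([s[0] for s in scenarios], [d[0] for d in demos])
policy2 = clone_local_policy([s[1] for s in scenarios], [d[1] for d in demos])


def act(policy, local_obs):
    """Online: an agent decides from its local measurement, with no communication."""
    w, b = policy
    return w * local_obs + b


print(policy1, policy2, act(policy1, 0.5), act(policy2, -1.0))
```

In this toy setting each agent recovers the part of the expert rule that depends on its own measurement, which is the sense in which a global optimization idea is injected before any reinforcement learning takes place.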
Simulation results on the modified IEEE 118-bus system show that SASDRL and MASDRL both achieve the best VVC results among all compared methods. In offline training, SASDRL takes the least time: it is 4.47 times faster than traditional DRL and 50.76 times faster than IL. Of SASDRL's training time, 87.1% is spent generating the expert samples required for supervised training of the Critic network, while only 12.9% is spent training the Actor and Critic networks. MASDRL reduces offline training time by 82.77% compared with traditional multi-agent DRL (MADRL).
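To put the 82.77% figure on the same footing as the speed-up factors quoted for SASDRL, the reduction can be converted into a ratio of training times. The symbols T_MADRL and T_MASDRL are introduced here only for illustration and denote the offline training times of the two methods.

```latex
\frac{T_{\mathrm{MASDRL}}}{T_{\mathrm{MADRL}}} = 1 - 0.8277 = 0.1723
\quad\Longrightarrow\quad
\frac{T_{\mathrm{MADRL}}}{T_{\mathrm{MASDRL}}} = \frac{1}{0.1723} \approx 5.80
```

Read this way, traditional MADRL takes roughly 5.8 times as long to train as MASDRL.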
The following conclusions can be drawn from the simulation analysis: (1) Compared with traditional mathematical methods and existing ML methods, SASDRL obtains control results close to those of mathematical methods while greatly accelerating the offline training of DRL-based VVC. (2) Compared with traditional MADRL, the proposed MASDRL+IL method, by inheriting the core ideas of SASDRL and introducing IL into the initialization of the Actor networks, significantly improves both the local collaborative control among VVC devices and the offline training speed.

Key words: volt-var control, centralized control, single-agent simplified deep reinforcement learning, decentralized control, multi-agent simplified deep reinforcement learning

Received: 2022-11-22

PACS: TM76

Funding: Supported by the National Key Research and Development Program of China (2017YFB0903705)

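Corresponding author: Deng Changhong, female, born in 1963, professor and doctoral supervisor. Research interests: power system security and stability analysis, and optimal control of renewable energy integration into the grid. E-mail: dengch@whu.edu.cn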

About the author: Ma Qing, male, born in 1990, PhD candidate. Research interest: volt-var control of power systems. E-mail: 747942466@qq.com

Cite this article:

Ma Qing, Deng Changhong. Single/Multi Agent Simplified Deep Reinforcement Learning Based Volt-Var Control of Power System[J]. Transactions of China Electrotechnical Society, 2024, 39(5): 1300-1312.

Link to this article:

https://dgjsxb.ces-transaction.com/CN/10.19595/j.cnki.1000-6753.tces.222195
https://dgjsxb.ces-transaction.com/CN/Y2024/V39/I5/1300
