Automatic Generation Control Based on the Taylor Twin Delayed Deep Deterministic Policy Gradient Algorithm

doi:10.19595/j.cnki.1000-6753.tces.241673



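Abstract: Large-scale grid integration of clean energy in new power systems introduces strong random disturbances that degrade control performance and grid frequency stability. To address this problem from the perspective of automatic generation control, this paper proposes the Taylor twin delayed deep deterministic policy gradient algorithm to obtain the multi-area coordinated optimal solution, thereby improving the control performance and frequency stability of power systems after large-scale clean energy integration. The proposed algorithm updates the value network using a Taylor series expansion, which mitigates the action-value overestimation found in reinforcement learning and helps improve control accuracy. It also introduces an experience replay strategy based on the reducible loss of training samples in place of random sampling, which raises the accuracy of the algorithm's optimization and thus reduces the impact of random disturbances on control performance. An improved IEEE standard two-area load frequency control model and a three-area interconnected load frequency control model integrating wind, solar, hydro, thermal, and storage resources are built and simulated to verify the effectiveness of the proposed algorithm. Compared with other reinforcement learning algorithms, the proposed algorithm achieves better control performance and a more stable frequency response.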

Authors: Xi Lei, Wang Wentao, Quan Yue, Liu Zhihong, Ren Jianyu

Keywords: automatic generation control, reinforcement learning, multi-area coordination, Taylor twin delayed

Abstract: The proportion of clean energy in modern power systems is steadily increasing. Large-scale integration of clean energy, which is highly random and intermittent, introduces significant stochastic disturbances that degrade the control performance and frequency stability of power systems. This paper addresses that degradation from the perspective of automatic generation control (AGC). AGC methods currently used in engineering practice are mainly centralized. Because centralized control prioritizes performance within its own area, coordinated control across areas is difficult to achieve. Communication delays and geographical distance further limit the coordination and consistency of centralized control methods.
New power systems based on a distributed model are divided into interconnected subsystems. Load frequency control (LFC) regulates the output of each generator so that the areas of the grid operate in coordination. Reinforcement learning algorithms based on Markov decision processes are well suited to stochastic problems and to multi-area coordinated control, so they are gradually being introduced into AGC to improve control performance and frequency stability in multi-area grids. However, reinforcement learning algorithms often overestimate action values, which can lead to larger frequency deviations in power systems.
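The overestimation problem mentioned above can be seen in a toy setting. The sketch below is an illustration only, under stated assumptions: every action has a true value of zero, two hypothetical critics observe those values through independent Gaussian noise, and the function name `estimate_bias` and all parameters are made up for this example. It compares the greedy target from one critic with the clipped double-Q target (the minimum of two critics) used by the standard TD3 baseline. It does not implement the Taylor-series update that the paper proposes.

```python
import random
import statistics


def estimate_bias(num_actions=5, noise_std=1.0, trials=20000, seed=0):
    """Return the mean target value for a single critic and for clipped double-Q.

    All true action values are zero, so any nonzero mean is pure estimation bias.
    """
    rng = random.Random(seed)
    single, clipped = [], []
    for _ in range(trials):
        # Each critic sees the true values (all zero) through independent noise.
        q1 = [rng.gauss(0.0, noise_std) for _ in range(num_actions)]
        q2 = [rng.gauss(0.0, noise_std) for _ in range(num_actions)]
        # Greedy action according to the first critic.
        best = max(range(num_actions), key=q1.__getitem__)
        # Single-critic target: the maximum of noisy estimates is biased upward.
        single.append(q1[best])
        # Clipped double-Q target: take the smaller of the two critics' values.
        clipped.append(min(q1[best], q2[best]))
    return statistics.fmean(single), statistics.fmean(clipped)


single_bias, clipped_bias = estimate_bias()
print(f"single-critic bias:     {single_bias:+.3f}")
print(f"clipped double-Q bias:  {clipped_bias:+.3f}")
```

In this toy case the single-critic target is biased upward even though no action is actually better than another, while the clipped target trades that for a pessimistic estimate.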
To address this issue in distributed power systems, we propose the Taylor twin delayed deep deterministic policy gradient (TaTD3) algorithm to obtain the optimal multi-area coordinated solution. The aim is to improve control performance and frequency stability in power systems with large-scale clean energy integration. The proposed algorithm uses a Taylor series expansion to update the value network, which mitigates the action-value overestimation commonly found in reinforcement learning. This improves the control accuracy of the algorithm and, in turn, the frequency stability of the power system. In addition, an experience replay strategy replaces random sampling of training data. The strategy assigns lower priority to samples affected by random disturbances and noise, which tend to reduce learning capability, and higher priority to samples with greater learning potential. This increases the accuracy of optimization and reduces the impact of random disturbances on control performance.
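The prioritization idea described above can be sketched in a few lines. This is a minimal sketch under assumptions, not the authors' implementation: it assumes each stored sample has a loss under the online network and a loss under the target network, and that priority is the part of the loss that is still reducible, `max(online - target, 0)`, plus a small constant. The function names, the `eps` value, and the loss numbers are all hypothetical.

```python
import random


def relo_priorities(online_losses, target_losses, eps=1e-3):
    """Priority = reducible part of the loss (floored at zero) plus a small eps.

    A sample whose loss is high under both networks is treated as noise-dominated
    and receives a low priority.
    """
    return [max(lo - lt, 0.0) + eps for lo, lt in zip(online_losses, target_losses)]


def sample_indices(priorities, batch_size, rng):
    """Draw indices with probability proportional to priority (with replacement)."""
    return rng.choices(range(len(priorities)), weights=priorities, k=batch_size)


# Hypothetical per-sample losses for four stored transitions.
online = [0.90, 0.85, 0.10, 0.50]   # loss under the online network
target = [0.88, 0.20, 0.09, 0.10]   # loss under the target network

priorities = relo_priorities(online, target)
rng = random.Random(0)
batch = sample_indices(priorities, 1000, rng)
counts = [batch.count(i) for i in range(len(priorities))]
print("priorities:", [round(p, 3) for p in priorities])
print("draw counts:", counts)
```

Sample 0 has a large loss that barely differs between the two networks, so it is drawn rarely; samples 1 and 3, whose loss can still be reduced, dominate the batch.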
To validate the effectiveness of the proposed algorithm, we developed an improved IEEE standard two-area LFC model and a wind-solar-hydro-thermal-storage integrated three-area interconnected LFC model. Simulations were conducted by introducing step disturbances, random square wave disturbances, and other load variations. The control performance of the TaTD3-ReLo, TaTD3, TD3, DDPG, and DQN algorithms was analyzed under different operating conditions. The simulation results show that the TaTD3-ReLo algorithm is robust and learns effectively. Compared with the other reinforcement learning algorithms, the proposed algorithm shows superior control performance and more stable frequency responses. It also enables effective coordination among distributed multi-area interconnected grids in the new power system model, addressing the decline in control performance and frequency stability caused by the large-scale integration of clean energy.
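For readers unfamiliar with LFC test setups, the following is a rough sketch of a step-disturbance experiment on a generic linearized two-area model. It is not the improved IEEE model or the three-area model used in the paper. All parameter values are hypothetical per-unit numbers, the function name `simulate_two_area_lfc` is made up, and a plain integral controller acting on the area control error (ACE) stands in where a reinforcement learning agent would issue commands.

```python
def simulate_two_area_lfc(step_load=0.1, t_end=60.0, dt=0.01):
    """Forward-Euler simulation of a linearized two-area LFC model.

    Returns a list of (df1, df2, ptie) tuples, one per time step.
    """
    # Hypothetical per-unit parameters: droop R, damping D, inertia H,
    # governor time constant Tg, turbine time constant Tt.
    areas = [
        {"R": 0.05, "D": 0.6, "H": 5.0, "Tg": 0.2, "Tt": 0.5},
        {"R": 0.0625, "D": 0.9, "H": 4.0, "Tg": 0.3, "Tt": 0.6},
    ]
    t12 = 2.0  # tie-line synchronizing coefficient (assumed)
    ki = 0.3   # integral gain on ACE (assumed)
    bias = [1.0 / a["R"] + a["D"] for a in areas]  # frequency bias factors

    df = [0.0, 0.0]  # frequency deviation of each area
    pm = [0.0, 0.0]  # turbine mechanical power deviation
    pg = [0.0, 0.0]  # governor output deviation
    pc = [0.0, 0.0]  # AGC command
    ptie = 0.0       # tie-line power deviation, area 1 to area 2
    load = [step_load, 0.0]  # step load disturbance applied to area 1 only

    history = []
    for _ in range(int(round(t_end / dt))):
        tie = [ptie, -ptie]  # tie-line export as seen from each area
        ace = [tie[i] + bias[i] * df[i] for i in range(2)]
        d_df, d_pm, d_pg, d_pc = [], [], [], []
        for i, a in enumerate(areas):
            d_df.append((pm[i] - load[i] - tie[i] - a["D"] * df[i]) / (2.0 * a["H"]))
            d_pm.append((pg[i] - pm[i]) / a["Tt"])
            d_pg.append((pc[i] - df[i] / a["R"] - pg[i]) / a["Tg"])
            d_pc.append(-ki * ace[i])  # integral action drives ACE toward zero
        d_ptie = t12 * (df[0] - df[1])
        for i in range(2):
            df[i] += dt * d_df[i]
            pm[i] += dt * d_pm[i]
            pg[i] += dt * d_pg[i]
            pc[i] += dt * d_pc[i]
        ptie += dt * d_ptie
        history.append((df[0], df[1], ptie))
    return history


history = simulate_two_area_lfc()
nadir = min(h[0] for h in history)
print(f"area 1 frequency nadir: {nadir:.5f} p.u.")
print(f"area 1 final deviation: {history[-1][0]:.2e} p.u.")
```

Replacing the integral law in `d_pc` with the action of a trained agent, and the constant `load` with a random square wave, would give the kind of comparison the abstract describes, though the reward design and models in the paper are not reproduced here.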

Key words: automatic generation control, reinforcement learning, multi-area collaboration, Taylor twin delayed

Received: 2024-09-25

PACS: TM73

Funding: Supported by the National Natural Science Foundation of China (52277108, 52477104)

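Corresponding author: Xi Lei, male, born in 1982, doctoral supervisor. Research interests: power system operation and control, automatic generation control, cyber-attack and defense for cyber-physical systems, and intelligent control methods. E-mail: xilei2014@163.com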

About the author: Wang Wentao, male, born in 1999, master's student. Research interest: automatic generation control. E-mail: antony322@163.com

Cite this article:

Xi Lei, Wang Wentao, Quan Yue, Liu Zhihong, Ren Jianyu. Automatic Generation Control Based on the Taylor Twin Delayed Deep Deterministic Policy Gradient Algorithm[J]. Transactions of China Electrotechnical Society, 2025, 40(17): 5501-5513.

Link to this article:

https://dgjsxb.ces-transaction.com/CN/10.19595/j.cnki.1000-6753.tces.241673
https://dgjsxb.ces-transaction.com/CN/Y2025/V40/I17/5501
