Transactions of China Electrotechnical Society  2023, Vol. 38 Issue (8): 2162-2177    DOI: 10.19595/j.cnki.1000-6753.tces.221073
Power Systems and Integrated Energy
Active Power Correction Control of Power Grid Based on Improved Twin Delayed Deep Deterministic Policy Gradient Algorithm
Gu Xueping1, Liu Tong1, Li Shaoyan1, Wang Tieqiang2, Yang Xiaodong2
1. School of Electrical & Electronic Engineering, North China Electric Power University, Baoding 071003, China;
2. State Grid Hebei Electric Power Company, Shijiazhuang 050021, China
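Summary

In the novel power system, source-load uncertainty increases the risk of line overload, and traditional active power safety correction methods cannot effectively balance calculation speed and correction effect. To address this, the paper proposes an active power safety correction control method for power grids based on an improved twin delayed deep deterministic policy gradient algorithm. First, subject to the static security constraints of the system, an active power safety correction control model is established with two objectives: minimizing the output adjustment of the adjustable components and maximizing the overall operating security of the system. Second, a deep reinforcement learning framework for active power safety correction is constructed. It defines a reward function that accounts for the objectives and constraints, an observation state that reflects power system operation, regulating actions that can change the system state, and an agent based on the improved twin delayed deep deterministic policy gradient algorithm. Finally, historical system overload scenarios that account for source-load uncertainty are constructed, and the agent is trained through continuous interaction with the deep reinforcement learning model to obtain good decision-making performance. In online application, possible future values of sources and loads are taken into account so that the optimal component adjustment scheme is obtained quickly and the line overload is eliminated. Case studies on the IEEE 39-bus and IEEE 118-bus systems show that the proposed method effectively eliminates line overloads and avoids renewed limit violations within a short time, and that it has clear advantages over traditional methods in calculation speed and correction effect.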

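Keywords: novel power system; active power safety correction; deep reinforcement learning; improved twin delayed deep deterministic policy; optimal adjustment scheme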
Abstract

With the construction and development of the novel power system, the probability of line overload caused by component faults or source-load fluctuations has increased significantly. If the overload is not corrected in a timely and effective manner, cascading faults may propagate faster and over a wider range and lead to a blackout. Timely and effective safety correction measures that eliminate power flow limit violations are therefore of great significance for the safe operation of the system.
An active power safety correction control method is proposed based on the improved twin delayed deep deterministic policy gradient (TD3) algorithm. Firstly, an active power safety correction model is established. One objective is to minimize the sum of the absolute values of the adjustments of the adjustable components, and the other is to maximize the safety of the system.
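As a hedged illustration only, the two objectives could be written in the following form. All symbols are assumptions introduced here rather than notation from the paper: \mathcal{A} is the set of adjustable components, \Delta P_i the adjustment of component i, \mathcal{L} the set of lines, P_l the active power flow on line l, and P_l^{\max} its limit. Using the maximum line load rate as the safety measure is one possible choice, not necessarily the one used by the authors.

```latex
\min F_1 = \sum_{i \in \mathcal{A}} \lvert \Delta P_i \rvert, \qquad
\min F_2 = \max_{l \in \mathcal{L}} \frac{\lvert P_l \rvert}{P_l^{\max}}, \qquad
\text{s.t. } \lvert P_l \rvert \le P_l^{\max} \quad \forall\, l \in \mathcal{L}
```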
Secondly, a deep reinforcement learning framework for active power safety correction is established, as shown in Fig.A1. The state expresses the characteristics of the power system. The action is the output of the adjustable components. The reward function comprises the objective function and the constraint conditions of the active power safety correction model. The agent is implemented with the improved TD3 algorithm.
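A reward that combines the objective and the constraints might be sketched as below. This is a minimal sketch under stated assumptions, not the reward used in the paper: the function name `correction_reward`, the weights `w_adjust` and `w_overload`, and the linear overload penalty are all hypothetical.

```python
import numpy as np

def correction_reward(delta_p, line_flow, line_limit, w_adjust=1.0, w_overload=10.0):
    """Hypothetical reward: penalise total adjustment and any remaining overload."""
    delta_p = np.asarray(delta_p, dtype=float)
    # Load rate of each line: absolute flow divided by the line limit.
    load_rate = np.abs(np.asarray(line_flow, dtype=float)) / np.asarray(line_limit, dtype=float)
    adjust_cost = np.sum(np.abs(delta_p))                 # objective term
    overload = np.sum(np.maximum(load_rate - 1.0, 0.0))   # constraint violation term
    return -(w_adjust * adjust_cost + w_overload * overload)

# Toy values (assumed): a small adjustment with no overload, then an overloaded line.
r_secure = correction_reward([0.1, -0.1], [50.0], [100.0])
r_overloaded = correction_reward([0.0, 0.0], [120.0], [100.0])
```

With these assumed weights, a remaining overload is penalised more heavily than a small adjustment, so the agent is pushed to clear the overload first.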

Fig.A1 Interaction process between agent and environment
Finally, active power safety correction control is carried out with the improved TD3 algorithm. Historical overload scenarios are constructed to pre-train the active power safety correction model based on the improved TD3 algorithm. Because source-load fluctuation during the correction process affects the correction result, the possible fluctuation of the source and load output is calculated for each operating condition. During online application, the predicted source and load values for the next 5 minutes plus the prediction error values are used as the new energy output and the load value at the current time, and are input into the actor network together with the states of the other system components. After sufficient pre-training, the improved TD3 algorithm can quickly make the optimal decision according to the system state.
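The construction of the actor input described above can be sketched as follows. The function name `build_actor_input`, the argument names, and the flat concatenated layout of the state vector are assumptions made for illustration; the paper's actual state definition is not given in this abstract.

```python
import numpy as np

def build_actor_input(forecast, pred_error, other_states):
    """Use (forecast + prediction error) as the current source/load values,
    then append the states of the other system components."""
    forecast = np.asarray(forecast, dtype=float)
    pred_error = np.asarray(pred_error, dtype=float)
    if forecast.shape != pred_error.shape:
        raise ValueError("forecast and pred_error must have the same shape")
    source_load = forecast + pred_error
    return np.concatenate([source_load, np.asarray(other_states, dtype=float)])

# Toy values (assumed): two source/load forecasts and one other component state.
state = build_actor_input([1.0, 2.0], [0.1, -0.2], [5.0])
```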
An operating state of the IEEE 39-bus system is used to verify the effectiveness and feasibility of the proposed method. In this state, line 23 is suddenly disconnected, which overloads line 13. The correction result is shown in Fig.A2.

Fig.A2 Load rate of each line before and after correction
One hundred groups of source and load prediction error values are selected and added to the predicted values to evaluate the security of the system after correction by the proposed method. The correction results obtained without considering changes in new energy output and load are used for comparison. The results show that the proposed method, which considers source-load fluctuation, gives the best correction effect in terms of system uniformity and the highest line load rate. The corrected system can withstand relatively more uncertainty, so it will not become overloaded again within a short time.
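An evaluation of this kind can be sketched as a sampling check. The sketch below is not the authors' procedure: it assumes a linear flow model with a hypothetical sensitivity matrix, Gaussian prediction errors, and a toy system with two lines and two injections.

```python
import numpy as np

def max_load_rates(injection_forecast, error_samples, sensitivity, line_limit):
    """Return the highest line load rate for each sampled prediction error,
    using an assumed linear model: flows = sensitivity @ injections."""
    inj = np.asarray(injection_forecast, dtype=float) + np.asarray(error_samples, dtype=float)
    flows = inj @ np.asarray(sensitivity, dtype=float).T   # shape (n_samples, n_lines)
    return np.max(np.abs(flows) / np.asarray(line_limit, dtype=float), axis=1)

rng = np.random.default_rng(0)
sensitivity = np.array([[0.6, -0.4], [0.4, 0.4]])   # hypothetical values
forecast = np.array([80.0, 20.0])                   # hypothetical corrected operating point
limits = np.array([60.0, 60.0])
errors = rng.normal(0.0, 2.0, size=(100, 2))        # 100 groups of prediction errors

rates = max_load_rates(forecast, errors, sensitivity, limits)
secure_share = float(np.mean(rates <= 1.0))         # share of samples without overload
base_rate = max_load_rates(forecast, np.zeros((1, 2)), sensitivity, limits)
```

The share of samples with a maximum load rate at or below 1 gives a simple indication of how much uncertainty a corrected operating point can withstand.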
The deep deterministic policy gradient (DDPG), TD3, and improved TD3 deep reinforcement learning algorithms are trained and tested on the same historical overload scenarios. The results show that the proposed improved TD3 method is better than the other two algorithms in training time, testing time, and calculation results.
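For context, the critic target of the standard TD3 algorithm (target policy smoothing plus the minimum of two target critics) can be sketched as below. This shows standard TD3 only; the authors' improvement is not described in this abstract and is not reproduced here. The function name, hyperparameter values, and toy target networks are assumptions.

```python
import numpy as np

def td3_target(reward, done, next_state, actor_target, critic1_target, critic2_target,
               rng, gamma=0.99, noise_std=0.2, noise_clip=0.5, act_limit=1.0):
    """Standard TD3 critic target with a smoothed target action."""
    next_action = actor_target(next_state)
    # Target policy smoothing: add clipped Gaussian noise to the target action.
    noise = np.clip(rng.normal(0.0, noise_std, size=next_action.shape), -noise_clip, noise_clip)
    next_action = np.clip(next_action + noise, -act_limit, act_limit)
    # Clipped double Q-learning: take the smaller of the two target critic values.
    q_min = np.minimum(critic1_target(next_state, next_action),
                       critic2_target(next_state, next_action))
    return reward + gamma * (1.0 - done) * q_min

# Toy stand-ins for the target networks (assumed, constant outputs).
toy_actor = lambda s: np.zeros((len(s), 1))
toy_critic1 = lambda s, a: np.full(len(s), 2.0)
toy_critic2 = lambda s, a: np.full(len(s), 1.0)

y = td3_target(np.array([1.0, 1.0]), np.array([0.0, 1.0]), np.zeros((2, 3)),
               toy_actor, toy_critic1, toy_critic2, np.random.default_rng(0))
```

Taking the minimum of the two critics counteracts the value overestimation that affects DDPG, which is the main difference between the two baseline algorithms compared above.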
Compared with the traditional sensitivity method and optimization method, the proposed method has a shorter calculation time but a larger total adjustment. The optimization method gives the smallest adjustment, but its calculation time is about 10 times that of the sensitivity method. The proposed method adjusts few components and takes little time, and the system uniformity after correction is the highest, but its total adjustment is slightly greater than that of the optimization method.
In conclusion, the results of the active power safety correction model established by the proposed method are more consistent with the actual operating scenarios of the power grid. In addition, the proposed method has certain advantages over the traditional methods and is better suited to the current novel power system.

Key words: novel power systems; active power security correction; deep reinforcement learning; improved twin delayed deep deterministic policy gradient; optimal adjustment scheme
Received: 2022-06-08
PACS: TM732  
Funding:

Science and Technology Project of State Grid Corporation of China (SGTYHT/17-JS-199)

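Corresponding author: Li Shaoyan, male, born in 1989, associate professor. His research interests include power system security defense and restoration control, and artificial intelligence technology and its application in power systems. E-mail: shaoyan.li@ncepu.edu.cn
About the author: Gu Xueping, male, born in 1964, professor and doctoral supervisor. His research interests include power system security and stability assessment and control, power system security defense and restoration control, and artificial intelligence technology and its application in power systems. E-mail: xpgu@ncepu.edu.cn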
Cite this article:
Gu Xueping, Liu Tong, Li Shaoyan, Wang Tieqiang, Yang Xiaodong. Active Power Correction Control of Power Grid Based on Improved Twin Delayed Deep Deterministic Policy Gradient Algorithm. Transactions of China Electrotechnical Society, 2023, 38(8): 2162-2177.
Link to this article:
https://dgjsxb.ces-transaction.com/CN/10.19595/j.cnki.1000-6753.tces.221073          https://dgjsxb.ces-transaction.com/CN/Y2023/V38/I8/2162