Transactions of China Electrotechnical Society  2023, Vol. 38 Issue (8): 2162-2177    DOI: 10.19595/j.cnki.1000-6753.tces.221073
Active Power Correction Control of Power Grid Based on Improved Twin Delayed Deep Deterministic Policy Gradient Algorithm
Gu Xueping1, Liu Tong1, Li Shaoyan1, Wang Tieqiang2, Yang Xiaodong2
1. School of Electrical & Electronic Engineering, North China Electric Power University, Baoding 071003, China;
2. State Grid Hebei Electric Power Company, Shijiazhuang 050021, China

Abstract  

With the construction and development of the novel power system, the probability of line overload caused by component faults or source-load fluctuations has increased significantly. If overloads are not corrected in a timely and effective manner, cascading faults may propagate faster and more widely, ultimately leading to a blackout. Therefore, implementing safety correction measures promptly to eliminate power-flow limit violations is of great significance for ensuring the safe operation of the system.
An active power safety correction control method is proposed based on the twin delayed deep deterministic policy gradient algorithm (TD3) algorithm. Firstly, an active power safety correction model is established. One of the objectives is to minimize the sum of the absolute values of the adjustments of the adjustable components, and the other is to ensure the maximum safety of the system.
Secondly, a deep reinforcement learning framework for active power safety correction is established, as shown in Fig.A1. The state describes the characteristics of the power system; the action is the output adjustment of the adjustable components; and the reward function incorporates the objective function and the constraints of the active power safety correction model. The TD3 algorithm is adopted as the agent.
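The agent-environment interaction of Fig.A1 follows the standard reinforcement learning loop; a minimal gym-style sketch, in which the environment and all its internals are toy stand-ins and not the paper's implementation:

```python
import numpy as np

class GridCorrectionEnv:
    """Toy stand-in for the grid environment: the state is a vector of
    line load rates, the action adjusts the controllable outputs."""

    def __init__(self, n_lines=3):
        self.rng = np.random.default_rng(0)
        self.n_lines = n_lines

    def reset(self):
        # Start from a possibly overloaded condition (load rates near 1.0)
        self.state = self.rng.uniform(0.8, 1.2, self.n_lines)
        return self.state

    def step(self, action):
        # A real environment would re-run the power flow here; this toy
        # version assumes each action component directly relieves one line.
        self.state = self.state - np.abs(action)
        overload = np.maximum(self.state - 1.0, 0.0).sum()
        adjustment = np.abs(action).sum()
        reward = -(adjustment + 10.0 * overload)  # mirrors the model's objectives
        done = overload == 0.0
        return self.state, reward, done

env = GridCorrectionEnv()
state = env.reset()
action = np.full(env.n_lines, 0.3)   # would come from the TD3 actor network
state, reward, done = env.step(action)
```

The TD3 agent observes `state`, outputs `action`, and is trained to maximize the cumulative `reward` over such interactions.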
Fig.A1 Interaction process between agent and environment

Finally, active power safety correction control is carried out based on the improved TD3 algorithm. Historical overload scenarios are constructed to pre-train the correction model. Because source-load fluctuations during the correction process influence the correction results, the possible fluctuation value of the source-load output is calculated for each operating condition. In online application, the predicted source and load values for the next 5 minutes, plus the prediction error value, are taken as the new-energy output and load values at the current time and are input into the actor network together with the states of the other system components. With sufficient pre-learning, the improved TD3 algorithm can quickly make the optimal decision according to the system state.
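The online preprocessing step described above can be sketched as follows; the function name, argument layout, and scaling are illustrative assumptions rather than the paper's code:

```python
import numpy as np

def build_actor_input(forecast_5min, prediction_error, component_states):
    """Form the state vector fed to the pre-trained actor network.

    forecast_5min    : predicted new-energy output and load for the next 5 min
    prediction_error : possible fluctuation value for this operating condition
    component_states : states of the other system components
    """
    # Use forecast plus error margin as the source/load value at the current time
    source_load = np.asarray(forecast_5min) + np.asarray(prediction_error)
    return np.concatenate([source_load, np.asarray(component_states)])

x = build_actor_input([100.0, 50.0], [5.0, -2.0], [0.7, 0.9, 0.4])
# x is then passed to the actor network, which outputs the adjustments
```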
An operating state of the IEEE 39-bus system is used to verify the effectiveness and feasibility of the proposed method. In this state, line 23 is suddenly disconnected, which overloads line 13. The correction result is shown in Fig.A2.

Fig.A2 Load rate of each line before and after correction
One hundred groups of source and load prediction-error values are sampled and added to the predicted values to evaluate the system's security after correction by the proposed method. The correction results obtained without considering changes in new-energy output and load are used for comparison. The results show that the proposed method, which accounts for source-load fluctuation, achieves the best system uniformity and the lowest maximum line load rate; it can withstand relatively larger uncertainties and ensures that the system does not become overloaded in the short term.
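A security check of that kind can be sketched as a simple Monte Carlo loop over sampled prediction errors; the linearized sensitivity model, names, and sample values below are illustrative assumptions:

```python
import numpy as np

def security_rate(corrected_flow, sensitivity, error_samples, limit=1.0):
    """Fraction of sampled source-load error scenarios in which no line
    exceeds its limit after the correction has been applied.

    corrected_flow : per-line load rates after correction
    sensitivity    : (n_lines, n_injections) linearized effect of errors on lines
    error_samples  : (n_samples, n_injections) sampled prediction errors
    """
    flows = corrected_flow + error_samples @ sensitivity.T
    secure = np.all(flows <= limit, axis=1)
    return secure.mean()

rng = np.random.default_rng(1)
flow = np.array([0.90, 0.85])                  # post-correction load rates
sens = np.array([[0.02, 0.01], [0.01, 0.03]])  # hypothetical sensitivities
errors = rng.normal(0.0, 1.0, size=(100, 2))   # 100 groups of error values
rate = security_rate(flow, sens, errors)
```

A correction scheme that leaves more margin below the limit yields a higher security rate under the same error distribution, which is the comparison the paper performs.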
The Deep Deterministic Policy Gradient (DDPG), TD3, and improved TD3 deep reinforcement learning algorithms are trained and tested on the same historical overload scenarios. The results show that the proposed improved TD3 method outperforms the other two algorithms in training time, testing time, and calculation results.
Compared with the traditional sensitivity method and the optimization method, the proposed method has a shorter calculation time but a larger total adjustment amount. The optimization method yields the smallest total adjustment, but its calculation time is about ten times that of the sensitivity method. The proposed method adjusts few components, computes quickly, and achieves the highest post-correction system uniformity, although its total adjustment is slightly greater than that of the optimization method.
In conclusion, the calculation results of the active power safety correction model established by the proposed method are more consistent with actual grid operation scenarios. Compared with traditional methods, the proposed method has clear advantages and is better suited to the current novel power system.

Key words: Novel power systems; active power security correction; deep reinforcement learning; improved twin delayed deep deterministic policy gradient; optimal adjustment scheme
Received: 08 June 2022     
PACS: TM732  
Cite this article:   
Gu Xueping, Liu Tong, Li Shaoyan, et al. Active Power Correction Control of Power Grid Based on Improved Twin Delayed Deep Deterministic Policy Gradient Algorithm[J]. Transactions of China Electrotechnical Society, 2023, 38(8): 2162-2177.
URL:  
https://dgjsxb.ces-transaction.com/EN/10.19595/j.cnki.1000-6753.tces.221073     OR     https://dgjsxb.ces-transaction.com/EN/Y2023/V38/I8/2162
Copyright © Transactions of China Electrotechnical Society