Transactions of China Electrotechnical Society
Active Power Correction Control of Power Grid Based on Improved Twin Delayed Deep Deterministic Policy Gradient Algorithm
Gu Xueping1, Liu Tong1, Li Shaoyan1, Wang Tieqiang2, Yang Xiaodong2
1. School of Electrical & Electronic Engineering, North China Electric Power University, Baoding 071003, China;
2. State Grid Hebei Electric Power Company, Shijiazhuang 050021, China

Abstract  

With the construction and development of the novel power system, the probability of line overload caused by component faults or source-load fluctuations has increased significantly. If the system is not corrected in a timely and effective manner, cascading faults may propagate faster and more widely, ultimately leading to blackouts. Therefore, the timely and effective implementation of safety correction measures to eliminate power-flow limit violations is of great significance for ensuring secure system operation.
An active power safety correction control method based on twin delayed deep deterministic policy gradient algorithm (TD3) algorithm is proposed. Firstly, an active power safety correction model is established, and one of the objectives is to minimize the sum of the absolute values of the adjustable elements, and the other is to ensure the maximum safety of the system.
Second, a deep reinforcement learning framework for active power safety correction is established, as shown in Fig.A1. The state describes the operating characteristics of the power system; the action is the output of the adjustable components; and the reward function is composed of the objective function and the constraints of the active power safety correction model. The TD3 algorithm is selected as the agent.
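The state/action/reward interface just described can be sketched as a minimal environment. Everything here is an assumption for illustration: the class name, the linearized sensitivity model, the dimensions, and the penalty weight are hypothetical stand-ins, not the paper's implementation.

```python
import numpy as np

class CorrectionEnv:
    """Minimal sketch of the correction environment: state = system features,
    action = outputs of the adjustable components, reward = objective value
    plus a constraint-violation penalty (all parameters hypothetical)."""

    def __init__(self, n_lines=5, n_units=3, limit=1.0):
        self.limit = limit
        self.loading = np.random.default_rng(0).uniform(0.5, 1.2, n_lines)
        # linearized (DC-style) sensitivity of line loadings to unit outputs
        self.sensitivity = np.random.default_rng(1).uniform(-0.1, 0.1, (n_lines, n_units))
        self.output = np.zeros(n_units)

    def state(self):
        # features the agent observes: line loadings and current unit outputs
        return np.concatenate([self.loading, self.output])

    def step(self, action):
        # apply the active power adjustment through the sensitivity model
        delta = action - self.output
        self.loading = self.loading + self.sensitivity @ delta
        self.output = action
        overload = np.maximum(self.loading - self.limit, 0.0).sum()
        # reward: negative adjustment effort minus a security penalty
        reward = -np.abs(delta).sum() - 10.0 * overload
        done = overload == 0.0
        return self.state(), reward, done
```

A TD3 agent would interact with such an environment through `state()`/`step()` during pre-training on the historical overload scenarios.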
Finally, active power safety correction control is carried out based on the improved TD3 algorithm. Historical overload scenarios are constructed to pre-train the active power safety correction model under the improved TD3 algorithm. To account for the influence of source-load fluctuations on the correction results, the possible fluctuation value of the source-load output is calculated for each operating condition. In online application, the predicted values of the sources and loads over the next 5 minutes, plus the prediction error, are used as the current renewable output and load values, and are fed into the actor network together with the states of the other system components. With sufficient pre-training, the improved TD3 agent can make the optimal decision quickly from the system state.
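The online decision step above can be sketched as follows. The function name, the fixed linear actor, and all dimensions are hypothetical; in the paper the actor is the pre-trained TD3 policy network, shown here only as a bounded stand-in.

```python
import numpy as np

def online_decision(actor, forecast_5min, pred_error, other_states):
    """Sketch of the online step: the 5-minute source/load forecast plus its
    prediction error margin stands in for the current renewable output and
    load values, and is fed to the pre-trained actor together with the
    remaining system states."""
    source_load = forecast_5min + pred_error          # conservative input value
    state = np.concatenate([source_load, other_states])
    return actor(state)                               # correction action

# hypothetical stand-in for a pre-trained TD3 actor: a fixed linear policy
rng = np.random.default_rng(42)
W = rng.normal(size=(3, 6))
actor = lambda s: np.tanh(W @ s)   # tanh keeps actions bounded, as TD3 actors typically do
```

Because the actor is a single forward pass, this online step is fast enough for the 5-minute decision horizon described above.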


An operating state of the IEEE 39-bus system is taken as an example to verify the effectiveness and feasibility of the proposed method. In this state, line 23 is suddenly disconnected, which overloads line 13. The correction result is shown in Fig.A2.


100 groups of source and load prediction error values are selected and added to the predicted values to evaluate system security after correction by the proposed method. The correction results obtained without considering changes in renewable output and load are used for comparison. The results show that the proposed method, which accounts for source-load fluctuations, achieves the best correction effect in terms of system uniformity and peak line loading rate, and it can withstand relatively more uncertainty, ensuring that the system does not become overloaded in the short term.
The same historical overload scenarios are used to train and test the deep deterministic policy gradient (DDPG), TD3, and improved TD3 deep reinforcement learning algorithms. The results show that the proposed improved TD3 method outperforms the other two algorithms in training time, testing time, and calculation results.
The results of the proposed method are also compared with those of a traditional sensitivity method and an optimization method. The sensitivity method computes faster but requires a larger total adjustment. The optimization method requires the smallest total adjustment, but its calculation time is the longest, about 10 times that of the sensitivity method. The proposed method adjusts few components, computes quickly, and yields the highest post-correction system uniformity, although its total adjustment is slightly greater than that of the optimization method.
In conclusion, the active power safety correction model established by the proposed method produces results that are more consistent with actual grid operating scenarios. Moreover, compared with the traditional methods, the proposed method has clear advantages and is better suited to the current novel power system.

Key words: Novel power systems; active power security correction; deep reinforcement learning; improved twin delayed deep deterministic policy gradient; optimal adjustment scheme
Received: 08 June 2022     
PACS: TM732  
Cite this article:   
Gu Xueping, Liu Tong, Li Shaoyan, et al. Active Power Correction Control of Power Grid Based on Improved Twin Delayed Deep Deterministic Policy Gradient Algorithm[J]. Transactions of China Electrotechnical Society, 0, (): 32-32.
URL:  
https://dgjsxb.ces-transaction.com/EN/10.19595/j.cnki.1000-6753.tces.221073     OR     https://dgjsxb.ces-transaction.com/EN/Y0/V/I/32