Transactions of China Electrotechnical Society  2024, Vol. 39, Issue (14): 4547-4556    DOI: 10.19595/j.cnki.1000-6753.tces.230694
Electrification Control of High-Speed Trains
Active Pantograph Control of Deep Reinforcement Learning Based on Double Delay Depth Deterministic Strategy Gradient
Wu Yanbo, Han Zhiwei, Wang Hui, Liu Zhigang, Zhang Yujing
School of Electrical Engineering, Southwest Jiaotong University, Chengdu 611756, China
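Abstract: The coupling performance of the pantograph-catenary system plays a crucial role in the current collection quality of high-speed trains. One effective way to improve this coupling is active control of the pantograph. Especially when speeds are raised on low-speed lines and when trains run across multiple lines, active control improves the adaptability of the pantograph to the catenary, which effectively reduces line upgrade costs and improves current collection quality. To address the pantograph active control problem, this paper proposes a deep reinforcement learning algorithm for active pantograph control based on the twin delayed deep deterministic policy gradient (TD3). A pantograph-catenary coupling model is built to serve as the environment module of the deep reinforcement learning system, TD3 is used as the behavior control policy of the pantograph, and an effective pantograph control policy is obtained by training the controller model. Experimental results show that the proposed method effectively improves pantograph-catenary coupling performance when trains run at high speed on low-speed lines, as well as the adaptability of the pantograph when operating on multiple lines, offering a new approach to raising railway line speeds and to cross-line train operation.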
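The abstract names TD3 as the control policy without restating its update rule. As a reminder of what distinguishes TD3 from DDPG, the hedged sketch below computes the critic target for a single transition using clipped double-Q learning and target policy smoothing, plus helpers for delayed actor updates and Polyak-averaged target networks. The scalar lambdas stand in for neural networks, and the hyperparameter defaults (gamma = 0.99, noise sigma = 0.2, clip 0.5, delay 2, tau = 0.005) are commonly used TD3 values assumed here, not settings reported by the authors.

```python
import random


def td3_critic_target(reward, next_obs, done, actor_target, q1_target, q2_target,
                      gamma=0.99, noise_std=0.2, noise_clip=0.5, max_action=1.0,
                      rng=random):
    """Critic regression target y for one transition, using TD3's two target tricks."""
    # Target policy smoothing: add clipped Gaussian noise to the target actor's action.
    noise = max(-noise_clip, min(noise_clip, rng.gauss(0.0, noise_std)))
    next_action = max(-max_action, min(max_action, actor_target(next_obs) + noise))
    # Clipped double-Q learning: use the smaller of the two target critics.
    q_next = min(q1_target(next_obs, next_action), q2_target(next_obs, next_action))
    # Terminal transitions (done = 1) do not bootstrap.
    return reward + gamma * (1.0 - done) * q_next


def actor_update_due(step, policy_delay=2):
    """Delayed policy updates: update actor and targets once per `policy_delay` critic steps."""
    return step % policy_delay == 0


def soft_update(target_params, online_params, tau=0.005):
    """Polyak averaging that slowly moves target parameters toward the online ones."""
    return [(1.0 - tau) * t + tau * o for t, o in zip(target_params, online_params)]


if __name__ == "__main__":
    # Hypothetical scalar stand-ins for the target actor and the twin target critics.
    actor = lambda s: 0.5 * s
    q1 = lambda s, a: s + a
    q2 = lambda s, a: s + 2.0 * a
    y = td3_critic_target(1.0, 0.4, 0.0, actor, q1, q2, rng=random.Random(0))
    print(round(y, 4))
```

In a full agent, y would serve as the regression target for both online critics, and the actor would be updated by ascending the first critic only on steps where `actor_update_due` returns True.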
Keywords: low-speed line; mixed running; twin delayed deep deterministic policy gradient (TD3); active pantograph control
Abstract: Stable coupling between the pantograph and the catenary is the foundation for the safe operation of high-speed trains. As speed increases, pantograph-catenary contact loss and arcing degrade this coupling and lower the current collection quality of the train. At present, the primary method of improving current collection quality is active control of the pantograph. Existing work on the self-adaptability of control algorithms mainly addresses the adaptive selection of algorithm parameters, while few studies consider the impact of changing line conditions and external disturbances. This paper constructs a pantograph active control system based on deep reinforcement learning, which can effectively cope with the complex time-varying characteristics of the pantograph-catenary system and reduce fluctuations of the pantograph-catenary contact force.
First, the deep reinforcement learning algorithm is introduced. Then, a pantograph-catenary coupling model is constructed as the environment module, which generates data for deep reinforcement learning training and provides feedback on control strategies. The pantograph adopts a three-mass model, the catenary is modeled with nonlinear pole/cable finite elements, and the two are coupled through a penalty function. The objectives and constraints of the pantograph active control system are analyzed and expressed in terms of the state space, observation space, action space, and reward function required by the deep reinforcement learning framework. Finally, the controller training and testing procedure is described, and the effectiveness and robustness of the pantograph active control system are verified.
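The environment module is described only in outline. The sketch below is a much-simplified, hypothetical stand-in that shows how such a module can expose a reset/step interface to an RL agent: a three-mass pantograph (head, upper frame, lower frame) driven by a static uplift plus an airbag control force on the lower frame, pressed against a catenary reduced to a lumped stiffness that varies periodically along each span, with contact handled by a penalty spring in series. The class name `ToyPantographCatenaryEnv`, every parameter value, the observation tuple, and the reward weights are assumptions for illustration, not the authors' model; in particular, the paper uses a nonlinear finite-element catenary.

```python
import math


class ToyPantographCatenaryEnv:
    """Three-mass pantograph pressed against a span-periodic catenary stiffness.

    All parameter values are hypothetical placeholders, and the catenary is a
    lumped stiffness k(x) rather than a nonlinear finite-element model.
    """

    def __init__(self, speed_kmh=300.0, span=50.0, dt=1e-3, f_static=70.0,
                 k0=5000.0, alpha=0.3, k_penalty=5e4, max_force=50.0):
        self.m = (7.12, 6.0, 5.8)          # head, upper frame, lower frame [kg]
        self.k = (9430.0, 14100.0, 0.1)    # head-upper, upper-lower, lower-ground [N/m]
        self.c = (10.0, 10.0, 70.0)        # matching dampers [N*s/m]
        self.v = speed_kmh / 3.6           # train speed [m/s]
        self.span, self.dt, self.f_static = span, dt, f_static
        self.k0, self.alpha, self.kp = k0, alpha, k_penalty
        self.max_force = max_force         # hypothetical airbag force limit [N]
        self.reset()

    def catenary_stiffness(self, x):
        # Stiffest at the supports (x = 0, span, ...), softest at mid-span.
        return self.k0 * (1.0 + self.alpha * math.cos(2.0 * math.pi * x / self.span))

    def contact_force(self):
        # Penalty coupling: penalty spring and catenary stiffness act in series;
        # a head below the contact line (y1 < 0) means contact loss and zero force.
        kc = self.catenary_stiffness(self.x)
        k_eff = kc * self.kp / (kc + self.kp)
        return k_eff * max(self.y[0], 0.0)

    def reset(self):
        self.t, self.x = 0.0, 0.0
        kc = self.catenary_stiffness(0.0)
        y1 = self.f_static * (kc + self.kp) / (kc * self.kp)  # static head displacement
        y2 = y1 + self.f_static / self.k[0]
        y3 = y2 + self.f_static / self.k[1]
        self.y, self.yd = [y1, y2, y3], [0.0, 0.0, 0.0]
        return self._observe(self.contact_force())

    def _observe(self, fc):
        # Hypothetical observation: contact force, head displacement, head velocity.
        return (fc, self.y[0], self.yd[0])

    def step(self, u):
        u = max(-self.max_force, min(self.max_force, u))  # actuator saturation
        (m1, m2, m3), (k1, k2, k3), (c1, c2, c3) = self.m, self.k, self.c
        (y1, y2, y3), (v1, v2, v3) = self.y, self.yd
        fc = self.contact_force()
        f12 = k1 * (y1 - y2) + c1 * (v1 - v2)  # link force: +f12 on upper frame, -f12 on head
        f23 = k2 * (y2 - y3) + c2 * (v2 - v3)  # link force: +f23 on lower frame, -f23 on upper
        acc = ((-f12 - fc) / m1,
               (f12 - f23) / m2,
               (f23 - k3 * y3 - c3 * v3 + self.f_static + u) / m3)
        # Semi-implicit Euler: update velocities first, then positions.
        self.yd = [v + a * self.dt for v, a in zip(self.yd, acc)]
        self.y = [y + v * self.dt for y, v in zip(self.y, self.yd)]
        self.t += self.dt
        self.x += self.v * self.dt
        fc = self.contact_force()
        # Hypothetical reward: penalize contact-force deviation and control effort.
        reward = -((fc - self.f_static) / self.f_static) ** 2 - 1e-4 * u ** 2
        return self._observe(fc), reward


if __name__ == "__main__":
    env = ToyPantographCatenaryEnv()
    forces = [env.step(0.0)[0][0] for _ in range(5000)]  # 5 s with the airbag idle
    tail = forces[2000:]                                 # skip the start-up transient
    mean = sum(tail) / len(tail)
    std = math.sqrt(sum((f - mean) ** 2 for f in tail) / len(tail))
    print(f"passive: mean {mean:.1f} N, std {std:.1f} N")
```

With u = 0 the record is a passive baseline whose ripple recurs at the span passing frequency; an agent such as TD3 would instead choose u at each step from the observation. Semi-implicit Euler is used only because it is simple and stable at this step size for so small a model.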
The experimental results show that reinforcement learning active control reduces contact force fluctuations at different speeds while leaving the mean contact force almost unchanged. Compared with finite-frequency H∞ control, twin delayed deep deterministic policy gradient (TD3) control reduces the standard deviation of the contact force by 21.8%. Analysis at the span passing frequency (SPF) shows that TD3 control reduces the power spectral density (PSD) of the contact force at the SPF by nearly 80%. Because the SPF component accounts for a large share of the contact-force fluctuation energy, reducing its energy effectively decreases the overall fluctuation. TD3 control also requires a smaller control-force amplitude than H∞ control, which places less load on the airbag. In terms of the frequency content of the control force, TD3 control leaves the high-frequency band essentially unadjusted, which matches the slow response of the pneumatic airbag mechanism of the pantograph. Under different pantograph-catenary conditions, TD3 control reduces the standard deviation of the contact force more effectively than H∞ control, indicating that the TD3 algorithm is robust.
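The metrics in the results can be made concrete with a short sketch. The SPF is the train speed divided by the span length, and each reported reduction is a relative change with respect to a baseline. The helper names, the single-frequency periodogram estimator, and the synthetic 300 km/h, 50 m span signals below are assumptions for illustration; they are not the paper's data or processing chain.

```python
import cmath
import math


def span_passing_frequency(speed_kmh, span_m):
    """SPF [Hz]: spans passed per second at the given speed."""
    return (speed_kmh / 3.6) / span_m


def mean_and_std(samples):
    """Mean and (population) standard deviation of a contact-force record."""
    n = len(samples)
    mu = sum(samples) / n
    return mu, math.sqrt(sum((s - mu) ** 2 for s in samples) / n)


def percent_reduction(baseline, controlled):
    """Relative reduction of a metric, in percent of the baseline value."""
    return 100.0 * (baseline - controlled) / baseline


def psd_at(samples, fs, freq):
    """One-sided periodogram estimate at a single frequency, with the mean removed."""
    n = len(samples)
    mu = sum(samples) / n
    acc = sum((s - mu) * cmath.exp(-2j * math.pi * freq * k / fs)
              for k, s in enumerate(samples))
    return 2.0 * abs(acc) ** 2 / (fs * n)


if __name__ == "__main__":
    # Synthetic records (not the paper's data): a 20 N versus a 10 N ripple at the SPF.
    fs, spf = 200.0, span_passing_frequency(300.0, 50.0)
    t = [k / fs for k in range(1200)]  # 6 s, i.e. 10 whole span passages
    passive = [70.0 + 20.0 * math.sin(2 * math.pi * spf * tk) for tk in t]
    active = [70.0 + 10.0 * math.sin(2 * math.pi * spf * tk) for tk in t]
    print(percent_reduction(mean_and_std(passive)[1], mean_and_std(active)[1]))
    print(percent_reduction(psd_at(passive, fs, spf), psd_at(active, fs, spf)))
```

Halving a pure SPF ripple halves the standard deviation (a 50% reduction) but cuts the PSD at the SPF by 75%, because power scales with amplitude squared. By the same arithmetic, a PSD drop of nearly 80% corresponds to the ripple amplitude at the SPF falling by roughly 55%.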
Compared with traditional control methods, (1) the active pantograph control algorithm based on deep reinforcement learning is an end-to-end, data-driven algorithm that does not need an accurate model of the pantograph-catenary system; the control model is learned from readily available operating data and adapts well. (2) Through exploration and trial and error, the deep reinforcement learning algorithm learns how the observation space and the action space relate to the reward function. When the environment changes, the observations change accordingly, and the controller can quickly adjust its action to maximize the reward. (3) Under external constraints, such as the available pantograph actuators and pantograph observers, different control strategies can be obtained by adjusting the observation space and the reward function.
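Point (3) can be illustrated with a hypothetical configuration object: the same agent code serves different hardware by changing which measured signals form the observation and how the reward trades contact-force deviation against control effort. The names `ControlSpec`, `observe`, and `reward`, the signal keys, and the weights are assumptions for this sketch, not part of the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ControlSpec:
    """Hypothetical knobs: which signals the agent sees and how the reward is weighted."""
    observed: tuple = ("contact_force", "head_velocity")
    target_force: float = 70.0  # desired mean contact force [N]
    w_force: float = 1.0        # weight on normalized contact-force deviation
    w_effort: float = 1e-4      # weight on control-force magnitude (actuator limits)


def observe(measurements, spec):
    # Only signals provided by the installed sensors/observers enter the observation.
    return tuple(float(measurements[name]) for name in spec.observed)


def reward(measurements, u, spec):
    dev = (measurements["contact_force"] - spec.target_force) / spec.target_force
    return -spec.w_force * dev ** 2 - spec.w_effort * u ** 2


if __name__ == "__main__":
    m = {"contact_force": 84.0, "head_velocity": 0.02, "head_displacement": 0.03}
    force_only = ControlSpec(observed=("contact_force",))  # only a force sensor fitted
    gentle = ControlSpec(w_effort=1e-2)  # heavier effort penalty for a slow airbag
    print(observe(m, force_only), reward(m, 10.0, force_only), reward(m, 10.0, gentle))
```

Raising `w_effort` is one plausible way to bias a learned policy toward smaller airbag commands, while trimming `observed` mirrors the case where fewer observers are available.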
Key words: Low speed network; mixed running; TD3; active pantograph control
Received: 2023-05-18
CLC number: TM571
Funding: Supported by the National Natural Science Foundation of China (U1734202, 51977182)
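Corresponding author: Han Zhiwei, male, born in 1981, associate professor and master's supervisor. His research interests include modern signal processing, computer vision, and their applications in railway and power systems. E-mail: zw.han@my.swjtu.edu.cn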
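About the author: Wu Yanbo, male, born in 1998, master's student. His research interests include deep reinforcement learning and active pantograph control. E-mail: 17355206572@my.swjtu.edu.cn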
Cite this article:
Wu Yanbo, Han Zhiwei, Wang Hui, Liu Zhigang, Zhang Yujing. Active Pantograph Control of Deep Reinforcement Learning Based on Double Delay Depth Deterministic Strategy Gradient[J]. Transactions of China Electrotechnical Society, 2024, 39(14): 4547-4556.
Link to this article:
https://dgjsxb.ces-transaction.com/CN/10.19595/j.cnki.1000-6753.tces.230694
https://dgjsxb.ces-transaction.com/CN/Y2024/V39/I14/4547