Active Pantograph Control Based on Twin Delayed Deep Deterministic Policy Gradient

doi:10.19595/j.cnki.1000-6753.tces.230694


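Abstract (Chinese version): The coupling performance of the pantograph-catenary system is critical to the current collection quality of high-speed trains. One effective way to improve it is active control of the pantograph. Particularly when low-speed lines are upgraded for higher speeds or when trains run across multiple lines, active control improves the adaptive matching between pantograph and catenary, which reduces line reconstruction costs and improves current collection quality. To address the active pantograph control problem, this paper proposes a deep reinforcement learning control algorithm based on the twin delayed deep deterministic policy gradient (TD3). A pantograph-catenary coupling model serves as the environment module of the deep reinforcement learning system, TD3 serves as the pantograph action control policy, and an effective control strategy is obtained by training the controller model. Experimental results show that the method effectively improves pantograph-catenary coupling performance when trains run at high speed on low-speed lines, as well as the adaptability of the pantograph across multiple lines, offering a new approach to raising line speeds and to cross-line train operation.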
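For readers unfamiliar with TD3, the following is a minimal NumPy sketch of its three characteristic mechanisms: clipped double-Q learning, target policy smoothing, and delayed policy updates. It is not the authors' implementation. The function names (`td3_target`, `should_update_actor`, `soft_update`), the callable interfaces of the target networks, and the default hyperparameters (values commonly quoted for TD3) are assumptions made for illustration.

```python
import numpy as np


def td3_target(reward, done, next_state, actor_target, critic1_target,
               critic2_target, rng, gamma=0.99, noise_std=0.2,
               noise_clip=0.5, max_action=1.0):
    """Compute the TD3 critic regression target for a batch of transitions.

    actor_target(state) -> actions of shape (batch, action_dim)
    critic*_target(state, action) -> Q-values of shape (batch,)
    All three callables are hypothetical stand-ins for target networks.
    """
    next_action = actor_target(next_state)
    # Target policy smoothing: add clipped Gaussian noise to the target action.
    noise = rng.normal(0.0, noise_std, size=next_action.shape)
    noise = np.clip(noise, -noise_clip, noise_clip)
    next_action = np.clip(next_action + noise, -max_action, max_action)
    # Clipped double-Q: take the smaller of the two target critic estimates.
    q_next = np.minimum(critic1_target(next_state, next_action),
                        critic2_target(next_state, next_action))
    # Terminal transitions (done == 1) do not bootstrap.
    return reward + gamma * (1.0 - done) * q_next


def should_update_actor(step, policy_delay=2):
    """Delayed policy update: refresh the actor every `policy_delay` critic steps."""
    return step % policy_delay == 0


def soft_update(target_params, online_params, tau=0.005):
    """Polyak averaging of target-network parameters toward the online ones."""
    return (1.0 - tau) * target_params + tau * online_params
```

In a pantograph setting, the action would be the control force applied by the actuator and the state would come from the pantograph-catenary environment; those mappings are specific to the paper and are not reproduced here.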

Authors: Wu Yanbo, Han Zhiwei, Wang Hui, Liu Zhigang, Zhang Yujing

Keywords: low-speed lines, mixed running, twin delayed deep deterministic policy gradient (TD3), active pantograph control

Abstract: Stable coupling between the pantograph and the catenary is the foundation for the safe operation of high-speed trains. As speed increases, contact loss and arcing between the pantograph and the catenary degrade the current collection quality of the train. At present, the primary way to improve current collection quality is active control of the pantograph. The adaptivity of existing control algorithms mainly addresses the adaptive selection of algorithm parameters, and few studies consider the impact of changing line conditions and external disturbances. This paper constructs an active pantograph control system based on deep reinforcement learning, which can effectively cope with the complex time-varying characteristics of the pantograph-catenary system and reduce fluctuations of the pantograph-catenary contact force.
First, the deep reinforcement learning algorithm is introduced. Then, a pantograph-catenary coupling model is constructed as the environment module, which generates data for deep reinforcement learning training and provides feedback on control strategies. The pantograph is represented by a three-mass model, and the catenary by a nonlinear truss/cable finite element model; the two are coupled through a penalty function. The objectives and constraints of the active pantograph control system are analyzed in terms of the state space, observation space, action space, and reward function required by the deep reinforcement learning framework. The controller training and testing procedure is then described, and the effectiveness and robustness of the active pantograph control system are verified.
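To make the environment description concrete, here is a minimal sketch of a three-mass pantograph coupled to a contact wire through a penalty function. It is a simplification under stated assumptions: the wire is a fixed rigid height rather than a finite element catenary, gravity is taken as absorbed into the static uplift force, and every numeric parameter is an illustrative placeholder rather than a value from the paper. The names `penalty_contact_force`, `step`, and `simulate` are hypothetical.

```python
import numpy as np

# Illustrative placeholder parameters (not the paper's values).
M = np.array([7.0, 6.0, 6.0])          # masses: head, upper frame, lower frame [kg]
K = np.array([9000.0, 14000.0, 0.0])   # springs: head-upper, upper-lower, lower-base [N/m]
C = np.array([10.0, 10.0, 70.0])       # dampers in the same order [N*s/m]
K_PENALTY = 50000.0                    # penalty (contact) stiffness [N/m]


def penalty_contact_force(y_head, y_wire, k_penalty=K_PENALTY):
    """Contact force proportional to penetration; zero when head and wire separate."""
    return k_penalty * max(0.0, y_head - y_wire)


def step(y, v, y_wire, uplift, u, dt=1e-4):
    """Advance the three-mass model by one semi-implicit Euler step.

    y, v   : displacements and velocities of the three masses (head first)
    uplift : static uplift force on the lower mass [N]
    u      : active control force on the lower mass [N]
    """
    fc = penalty_contact_force(y[0], y_wire)
    s1 = K[0] * (y[0] - y[1]) + C[0] * (v[0] - v[1])   # head-upper link force
    s2 = K[1] * (y[1] - y[2]) + C[1] * (v[1] - v[2])   # upper-lower link force
    s3 = K[2] * y[2] + C[2] * v[2]                     # lower-base link force
    force = np.array([-s1 - fc, s1 - s2, s2 - s3 + uplift + u])
    v = v + dt * force / M      # update velocities first ...
    y = y + dt * v              # ... then positions with the new velocities
    return y, v, fc


def simulate(duration, y_wire=0.0, uplift=70.0, u=0.0, dt=1e-4):
    """Run from rest and return the contact force history [N]."""
    y, v = np.zeros(3), np.zeros(3)
    history = []
    for _ in range(int(round(duration / dt))):
        y, v, fc = step(y, v, y_wire, uplift, u, dt)
        history.append(fc)
    return np.array(history)
```

With a rigid wire and no base spring, the contact force in this sketch settles toward the sum of the static uplift and the control force. In a reinforcement learning environment, `u` would be the agent's action and the contact force history would feed the reward.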
The experimental results show that the reinforcement learning active control reduces contact force fluctuations at different speeds while leaving the mean contact force almost unchanged. Compared with finite-frequency H_∞ control, twin delayed deep deterministic policy gradient (TD3) control reduces the standard deviation of the contact force by 21.8%. Analysis of the contact force spectrum at the span passing frequency (SPF) shows that TD3 control reduces the power spectral density (PSD) of the contact force by nearly 80%. Because the SPF component accounts for a large share of the contact force fluctuation energy, reducing it effectively decreases overall contact force fluctuations. TD3 control also requires a smaller control force amplitude than H_∞ control and therefore has a smaller impact on the airbag. In terms of control force frequency content, TD3 control does not act on the high-frequency range, which matches the slow response of the pneumatic airbag mechanism of the pantograph. Under different pantograph-catenary conditions, TD3 control reduces the standard deviation of the contact force more effectively than H_∞ control, which indicates that the TD3 algorithm has good robustness.
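The two kinds of metric quoted above, the standard deviation of the contact force and its PSD at the span passing frequency, can be computed along the following lines. This is a sketch on synthetic data, not the paper's evaluation code: the helper names, the 300 km/h speed, the 50 m span, the sampling rate, and the sinusoidal signals are all assumptions chosen for illustration, and the PSD estimate is a plain periodogram.

```python
import numpy as np


def span_passing_frequency(speed_kmh, span_m):
    """SPF in Hz: number of spans the train passes per second."""
    return speed_kmh / 3.6 / span_m


def psd_at(signal, fs, freq):
    """One-sided periodogram PSD of the mean-removed signal at the bin nearest freq."""
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()
    spectrum = np.fft.rfft(x)
    psd = 2.0 * np.abs(spectrum) ** 2 / (fs * x.size)
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    return psd[np.argmin(np.abs(freqs - freq))]


def reduction_percent(before, after):
    """Relative reduction from `before` to `after`, in percent."""
    return 100.0 * (before - after) / before


# Synthetic contact forces [N]: a mean value plus one SPF component.
fs, duration = 200.0, 6.0                       # hypothetical sampling setup
t = np.arange(int(fs * duration)) / fs
f_spf = span_passing_frequency(300.0, 50.0)     # hypothetical speed and span
baseline = 100.0 + 20.0 * np.sin(2 * np.pi * f_spf * t)
controlled = 100.0 + 10.0 * np.sin(2 * np.pi * f_spf * t)

std_cut = reduction_percent(baseline.std(), controlled.std())
psd_cut = reduction_percent(psd_at(baseline, fs, f_spf),
                            psd_at(controlled, fs, f_spf))
```

Because PSD scales with the square of amplitude, halving the SPF amplitude in this synthetic case halves the standard deviation but cuts the PSD at the SPF by three quarters, which is why a PSD reduction is typically much larger than the corresponding standard deviation reduction.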
Compared with traditional control methods: (1) The active pantograph control algorithm based on deep reinforcement learning is an end-to-end, data-driven algorithm that does not need an accurate model of the pantograph-catenary system. The control model is generated from readily available operating data and adapts well. (2) The deep reinforcement learning algorithm learns the relationship from the observation space and action space to the reward function through exploration and trial and error. When environmental changes alter the observed values, the controller can quickly adjust its action to maximize the reward. (3) Under external constraints, such as those of pantograph actuators and observers, different control strategies can be achieved by adjusting the observation space and the reward function.

Key words: low-speed lines, mixed running, TD3, active pantograph control

Received: 2023-05-18

PACS: TM571

Funding: National Natural Science Foundation of China (U1734202, 51977182)

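Corresponding author: Han Zhiwei, male, born 1981, associate professor and master's supervisor. Research interests: modern signal processing, computer vision, and their applications in railway and power systems. E-mail: zw.han@my.swjtu.edu.cn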

About the author: Wu Yanbo, male, born 1998, master's student. Research interests: deep reinforcement learning and active pantograph control. E-mail: 17355206572@my.swjtu.edu.cn

Cite this article:

Wu Yanbo, Han Zhiwei, Wang Hui, Liu Zhigang, Zhang Yujing. Active Pantograph Control of Deep Reinforcement Learning Based on Double Delay Depth Deterministic Strategy Gradient[J]. Transactions of China Electrotechnical Society (电工技术学报), 2024, 39(14): 4547-4556.

Link to this article:

https://dgjsxb.ces-transaction.com/CN/10.19595/j.cnki.1000-6753.tces.230694
https://dgjsxb.ces-transaction.com/CN/Y2024/V39/I14/4547
