Deep Reinforcement Learning Active Pantograph Control Based on Twin Delayed Deep Deterministic Policy Gradient
Wu Yanbo, Han Zhiwei, Wang Hui, Liu Zhigang, Zhang Yujing
School of Electrical Engineering, Southwest Jiaotong University, Chengdu 611756, China
Abstract Stable coupling between the pantograph and the catenary is the foundation for the safe operation of high-speed trains. As train speed increases, contact loss and arcing between the pantograph and the catenary become more frequent and degrade the train's current collection quality. At present, active pantograph control is the primary means of improving current collection quality. The adaptability of existing control algorithms is mostly limited to the adaptive selection of algorithm parameters, and few studies address changes in line conditions and external disturbances. This paper builds an active pantograph control system based on deep reinforcement learning, which copes with the complex time-varying characteristics of the pantograph-catenary system and reduces fluctuations in the pantograph-catenary contact force. First, the deep reinforcement learning algorithm is introduced. Then, a pantograph-catenary coupling model is constructed as the environment module to generate training data and to provide feedback on control strategies. The pantograph is represented by a three-mass lumped model, and the catenary by a nonlinear cable and truss finite element model; the two are coupled through a penalty function. The control objectives and practical constraints of the active pantograph control system are analyzed in terms of the state space, observation space, action space, and reward function required by the deep reinforcement learning framework. The controller training and testing procedures are described, and the effectiveness and robustness of the control system are verified. The experimental results show that reinforcement learning active control reduces contact force fluctuations at different speeds while leaving the mean contact force almost unchanged.
Compared with finite-frequency H∞ control, twin delayed deep deterministic policy gradient (TD3) control reduces the standard deviation of the contact force by 21.8%. Analysis of the contact force at the span passing frequency (SPF) shows that TD3 control reduces its power spectral density (PSD) there by nearly 80%. Because the SPF component accounts for a large share of the energy in contact force fluctuations, reducing it effectively decreases the overall fluctuation. TD3 control also requires a smaller control force amplitude than H∞ control, and therefore has a smaller impact on the airbag. In the frequency domain, the TD3 control force contains no high-frequency adjustment, which matches the slow response of the pneumatic mechanism of the pantograph airbag. Under different pantograph-catenary conditions, TD3 control reduces the standard deviation of the contact force more effectively than H∞ control, which indicates that the TD3 algorithm is robust. Compared with traditional control methods, the proposed approach has three advantages. (1) The active pantograph control algorithm based on deep reinforcement learning is an end-to-end, data-driven algorithm that does not need an accurate model of the pantograph-catenary system. The control model is generated from readily available operating data and has strong adaptability. (2) The deep reinforcement learning algorithm learns how observations and actions relate to the reward through exploration and trial and error. Therefore, when environmental changes alter the observations, the controller can quickly adjust its action to maximize the reward. (3) Under external constraints, such as those imposed by pantograph actuators and pantograph observers, different control strategies can be obtained by adjusting the observation space and the reward function.
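The two evaluation quantities named in the abstract, the standard deviation of the contact force and its PSD energy around the SPF, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration rather than the authors' procedure: the function name `contact_force_metrics`, the 50 m span, the 300 km/h speed, the ±0.2 Hz integration band, the plain periodogram used as the PSD estimator, and the synthetic signals, which are not the paper's data.

```python
import numpy as np


def contact_force_metrics(force, fs, speed_mps, span_m, band_hz=0.2):
    """Return (std, spf_hz, band_power) for a contact-force time series.

    Hypothetical helper: force in N, fs in Hz, speed in m/s, span in m.
    """
    force = np.asarray(force, dtype=float)
    detrended = force - force.mean()      # remove the mean contact force
    std = detrended.std()                 # fluctuation measure
    spf_hz = speed_mps / span_m           # spans passed per second

    # One-sided periodogram PSD in N^2/Hz (stand-in for any PSD estimator).
    n = detrended.size
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    psd = np.abs(np.fft.rfft(detrended)) ** 2 / (fs * n)
    if n % 2 == 0:
        psd[1:-1] *= 2.0                  # DC and Nyquist bins are not doubled
    else:
        psd[1:] *= 2.0                    # only the DC bin is not doubled

    # Integrate the PSD over a narrow band around the SPF.
    in_band = np.abs(freqs - spf_hz) <= band_hz
    band_power = psd[in_band].sum() * (freqs[1] - freqs[0])
    return std, spf_hz, band_power


# Synthetic demonstration with assumed values (not measured data).
fs, duration = 200.0, 30.0                # sampling rate in Hz, length in s
speed_mps, span_m = 300.0 / 3.6, 50.0     # 300 km/h, 50 m span
t = np.arange(int(fs * duration)) / fs
rng = np.random.default_rng(0)


def synthetic_force(amplitude):
    """Mean force plus a ripple at the SPF plus broadband noise."""
    ripple = amplitude * np.sin(2.0 * np.pi * (speed_mps / span_m) * t)
    return 150.0 + ripple + rng.normal(0.0, 5.0, t.size)


std_base, spf_hz, power_base = contact_force_metrics(
    synthetic_force(30.0), fs, speed_mps, span_m)
std_ctrl, _, power_ctrl = contact_force_metrics(
    synthetic_force(15.0), fs, speed_mps, span_m)
spf_reduction = 1.0 - power_ctrl / power_base

print(f"SPF = {spf_hz:.3f} Hz")
print(f"std: {std_base:.2f} N -> {std_ctrl:.2f} N")
print(f"SPF band power reduction: {100.0 * spf_reduction:.1f} %")
```

Halving the ripple amplitude in this toy signal cuts the band power near the SPF by roughly three quarters, which shows why suppressing the SPF component has a large effect on the overall standard deviation when that component dominates the fluctuation.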
Received: 18 May 2023