Context-Based Deep Meta Reinforcement Learning for Active Pantograph Control in High-Speed Railway
Wang Hui1, Peng Yuxiang1, Chu Wenping1,2, Song Yang1,3, Liu Zhigang1
1. School of Electrical Engineering, Southwest Jiaotong University, Chengdu 611756, China; 2. China Railway Construction Electrification Bureau Group Co., Ltd., Beijing 100043, China; 3. Leeds-SWJTU Joint School, Southwest Jiaotong University, Chengdu 611756, China
Abstract: Active pantograph control is the most promising technique for reducing contact force (CF) fluctuation and improving the train's current collection quality. High-speed operation induces wave propagation and nonlinear dynamics in the pantograph-catenary system (PCS), making it challenging to maintain a suitable and stable contact force. Scholars have proposed numerous control strategies for the PCS in recent years, including proportional-integral-derivative (PID) control, sliding mode control, feedback control, optimal control, and robust control. These control strategies often achieve good results in a single simulation scene. Existing solutions, however, suffer from three significant limitations: (1) they cannot cope well with varied pantograph types, catenary line operating conditions, changing operating speeds, and contingencies; (2) they are difficult to implement in practical systems because they lack rapid adaptability to new PCS operating conditions and environmental disturbances; (3) sensor accuracy, actuator uncertainty, railway line parameters, and external excitations are particularly difficult to characterize because all of these factors can drift over time. High-speed railway systems with widely varying operating conditions further increase this uncertainty. In this paper, we improve the current collection quality by developing and applying a context-based deep meta-reinforcement learning (CB-DMRL) approach to learn and fine-tune the control strategy. It combines an improved distributional soft actor-critic algorithm with an environment-sensitive task encoder to train a meta-policy that can quickly adapt to different PCS operating conditions and environmental disturbances. Firstly, an improved distributional soft actor-critic algorithm, including a distributional state-action value function and dual-value distribution learning, is proposed to solve the overestimation problem in value estimation and stabilize the training process.
Secondly, the environment-sensitive task encoder allows the well-trained agent to adapt to new tasks quickly and efficiently, even in unseen tasks and non-stationary environments. Finally, a validated nonlinear pantograph-catenary system model is established, based on finite element and multi-body dynamics theory, as the simulation environment for DRL. The state space, action space, and reward function are redesigned to train the meta-agent. We evaluated the CB-DMRL algorithm's performance on a proven PCS model and an active-pantograph hardware-in-the-loop (HIL) experiment platform. The experimental results demonstrate that the meta-trained DRL policy with a latent task space swiftly adapts to new operating conditions and unknown perturbations. The meta-agent adapts after only two iterations, requiring just 10 steps, and attains a high reward. The standard deviation of the contact force is reduced by 14.71%, 16.93%, 21.22%, and 35.69% at 320 km/h, 340 km/h, 360 km/h, and 380 km/h, respectively. The faster the train runs, the better the control effect: because high-speed operation excites strong pantograph-catenary vibration, our method can effectively improve the current collection quality. Even in unknown scenarios, the proposed approach can adapt the well-trained behavior policy to the new task using the environment-sensitive task encoder. Given rapidly changing PCS operating conditions and unknown environmental disturbances, we believe this research is a significant advance in applying DRL to PCS control. The following conclusions can be drawn from the simulation analysis: (1) Among the algorithm components of this paper, the improved distributional SAC algorithm effectively improves value estimation accuracy and stabilizes the training process, alleviating exploding and vanishing gradients. The task encoder, sensitive to environmental changes, can quickly generate the optimal task encoding from the context of the interaction samples.
(2) Compared with state-of-the-art active pantograph control methods, the method proposed in this paper achieves the lowest contact force variance and can quickly adapt to new control tasks and environmental disturbances. The deep meta-reinforcement learning algorithm used in this paper achieves good results in multi-scenario active pantograph control. However, how to update the meta-policy parameters scientifically and effectively under limited computing power, such as on board a running vehicle, remains to be explored in future work.
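The dual-value distribution learning used to curb overestimation is only named in the abstract. The sketch below is a minimal, hedged illustration, not the authors' implementation: it assumes two Gaussian return-distribution critics and forms a conservative soft TD target from the smaller of their means, in the spirit of clipped double-Q combined with distributional SAC. All function and variable names are illustrative.

```python
import numpy as np

def soft_distributional_target(reward, done, gamma, alpha,
                               next_logpi, q1_next, q2_next):
    """Conservative soft TD target from two Gaussian return critics.

    q1_next, q2_next: (mean, std) of each critic's predicted return
    distribution at the next state-action. Taking the smaller mean
    mitigates the overestimation bias a single critic would incur.
    """
    mu1, _ = q1_next
    mu2, _ = q2_next
    q_min = np.minimum(mu1, mu2)          # clipped double-Q on the means
    soft_v = q_min - alpha * next_logpi   # soft value: Q - alpha * log pi
    return reward + gamma * (1.0 - done) * soft_v

# toy check: a terminal transition reduces to the immediate reward
y = soft_distributional_target(reward=1.0, done=1.0, gamma=0.99,
                               alpha=0.2, next_logpi=-1.3,
                               q1_next=(5.0, 1.0), q2_next=(4.0, 2.0))
```

In a full training loop each critic would be fit to this target by minimizing the negative log-likelihood of its predicted return distribution, which is where the standard deviation (discarded above) comes into play.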
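The environment-sensitive task encoder is described only in terms of its role. As a hedged sketch of one common context-encoding scheme that the described encoder resembles (a PEARL-style Gaussian-product posterior; this is an assumption, and all names are illustrative), per-transition Gaussian factors can be aggregated into a single task posterior from which a latent encoding is sampled to condition the meta-policy:

```python
import numpy as np

def aggregate_context(factor_means, factor_vars):
    """Combine per-transition Gaussian factors q(z|c_i) into one
    posterior q(z|c) via a product of Gaussians.

    factor_means, factor_vars: arrays of shape (N, latent_dim),
    one factor per context transition (s, a, r, s').
    """
    precisions = 1.0 / np.asarray(factor_vars)
    post_var = 1.0 / precisions.sum(axis=0)
    post_mean = post_var * (precisions * np.asarray(factor_means)).sum(axis=0)
    return post_mean, post_var

def sample_task_latent(rng, post_mean, post_var):
    """Sample a task encoding z ~ q(z|c) to condition the meta-policy."""
    return post_mean + np.sqrt(post_var) * rng.standard_normal(post_mean.shape)
```

Because each extra context transition adds precision, the posterior variance shrinks as the agent observes more of a new operating condition, which is one mechanism by which a few interaction steps can suffice for adaptation.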
Received: 15 January 2025