Control of Fuel Cell Temperature Based on Classified Replay Twin Delayed Bayesian Deep Deterministic Policy Gradient

doi:10.19595/j.cnki.1000-6753.tces.230699

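Abstract: The proton exchange membrane fuel cell (PEMFC) is a nonlinear system that is difficult to model accurately, so controlling the PEMFC stack temperature requires a controller with strong robustness and high adaptability. This paper proposes a data-driven controller based on deep reinforcement learning to control the stack temperature. Taking into account the characteristics of the PEMFC system, including its nonlinearity, its uncertainty, and the influence of environmental conditions, a new deep reinforcement learning algorithm is proposed: the classified replay twin delayed Bayesian deep deterministic policy gradient (CTDB-DDPG) algorithm. Its design introduces techniques such as Bayesian neural networks and classified experience replay, which improve controller performance. Simulation results and results from the RT-Lab experimental platform show that, owing to its high adaptability and strong robustness, the CTDB-DDPG algorithm controls the PEMFC stack temperature more effectively and has a certain practical significance.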

Keywords: fuel cell, joint control, deep deterministic, Bayesian network

Abstract: Proton exchange membrane fuel cells (PEMFCs) are strongly nonlinear and difficult to model accurately. In addition, the radiator and the circulating water pump in the water and thermal management system of the fuel cell are strongly coupled, which makes it difficult for model-based control algorithms to control the fuel cell temperature accurately. This paper therefore proposes a data-driven, model-free algorithm, the classified replay twin delayed Bayesian deep deterministic policy gradient (CTDB-DDPG), to control the fuel cell temperature system.
First, deep deterministic policy gradient is used to address the difficulty of modeling fuel cells. Then a classified experience replay strategy is added to the algorithm: CTDB-DDPG uses two experience buffer pools to store experience data. When the network model is constructed, the average TD error of all samples in the two pools is initialized to 0. Whenever a new experience sample is generated, the average TD error over all experience data is updated first. If the sample's TD error exceeds this mean, the sample is stored in experience buffer pool I; otherwise, it is stored in experience buffer pool II. Classifying each sample by its TD error makes better use of the experience data when training the network model. CTDB-DDPG also accounts for the uncertainty of the neural network by incorporating a Bayesian neural network, and the proposed Bootstrap with random initialization yields a reasonable uncertainty estimate. At the beginning of each episode, or at fixed intervals during learning, unbiased hypotheses are drawn from the posterior distribution of the MDP parameters and estimated with a multi-head, shared-network Bootstrap value function, which does not require additional computational resources.
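The two-pool classification rule described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the class name `ClassifiedReplayBuffer`, the use of the absolute TD error, the running mean over every sample seen so far, and the `high_fraction` sampling ratio are all assumptions introduced here, since the abstract does not specify them.

```python
import random
from collections import deque


class ClassifiedReplayBuffer:
    """Hypothetical two-pool replay buffer split by TD error against a running mean."""

    def __init__(self, capacity_per_pool=10000, high_fraction=0.5, seed=None):
        self.pool_high = deque(maxlen=capacity_per_pool)  # pool I: TD error above the mean
        self.pool_low = deque(maxlen=capacity_per_pool)   # pool II: TD error at or below the mean
        self.mean_td_error = 0.0  # average TD error starts at 0, as the abstract states
        self.count = 0
        self.high_fraction = high_fraction  # assumed sampling ratio, not taken from the paper
        self.rng = random.Random(seed)

    def add(self, transition, td_error):
        magnitude = abs(td_error)  # assumption: classify by absolute TD error
        # Update the running mean first, then classify the new sample against it.
        self.count += 1
        self.mean_td_error += (magnitude - self.mean_td_error) / self.count
        if magnitude > self.mean_td_error:
            self.pool_high.append(transition)
        else:
            self.pool_low.append(transition)

    def sample(self, batch_size):
        # Draw from both pools, limited by how many samples each pool holds.
        n_high = min(len(self.pool_high), int(round(batch_size * self.high_fraction)))
        n_low = min(len(self.pool_low), batch_size - n_high)
        batch = self.rng.sample(list(self.pool_high), n_high)
        batch += self.rng.sample(list(self.pool_low), n_low)
        return batch
```

In use, each transition would be added together with the TD error computed by the critic, and `sample` would supply a mixed mini-batch for the network update; how the paper actually mixes the two pools during training is not stated in the abstract.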
Moreover, using Q-learning preserves the uncertainty of the cumulative discounted return, which is more effective in environments that require deep exploration. Randomly selecting a head network to simulate Thompson sampling effectively avoids ineffective exploration by the agent under the noise strategy and accelerates the convergence of the CTDB-DDPG algorithm. In addition, because the fuel cell thermal management system has a large inertia, the algorithm adds OU noise to the action to improve exploration efficiency. OU noise is temporally correlated noise drawn from the Ornstein-Uhlenbeck process; by generating temporally correlated perturbations, it helps the algorithm explore different strategies and find potentially better ones, which improves the performance and efficiency of the algorithm. Although adding noise can degrade the algorithm's performance in the short term, in the long term it helps the algorithm avoid falling into a local optimum and may lead to a better strategy.
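To make the role of OU noise concrete, the sketch below discretizes the Ornstein-Uhlenbeck process with an Euler-Maruyama step and adds the result to an action. The class `OUNoise`, the helper `noisy_action`, and the parameter values (`theta=0.15`, `sigma=0.2`, `dt=1.0`) are illustrative assumptions, not values reported by the paper.

```python
import numpy as np


class OUNoise:
    """Hypothetical temporally correlated noise source (Ornstein-Uhlenbeck process)."""

    def __init__(self, size, mu=0.0, theta=0.15, sigma=0.2, dt=1.0, seed=None):
        self.mu = mu * np.ones(size)  # long-run mean the noise reverts to
        self.theta = theta            # mean-reversion rate (assumed value)
        self.sigma = sigma            # noise scale (assumed value)
        self.dt = dt
        self.rng = np.random.default_rng(seed)
        self.reset()

    def reset(self):
        # Restart the process at its mean, e.g. at the start of an episode.
        self.state = self.mu.copy()

    def sample(self):
        # Euler-Maruyama step: dx = theta * (mu - x) * dt + sigma * sqrt(dt) * N(0, 1)
        dx = (self.theta * (self.mu - self.state) * self.dt
              + self.sigma * np.sqrt(self.dt) * self.rng.standard_normal(self.state.shape))
        self.state = self.state + dx
        return self.state.copy()


def noisy_action(action, noise, low, high):
    """Add OU noise to a deterministic action and clip it to the actuator limits."""
    return np.clip(action + noise.sample(), low, high)
```

Because each sample depends on the previous state, consecutive perturbations push the action in a similar direction for several steps, which is the property the abstract relies on for a plant with large thermal inertia.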
Finally, the algorithm is verified on the Simulink simulation platform and the RT-Lab experimental platform, and similar conclusions are obtained on both, confirming its effectiveness. Although the CTDB-DDPG temperature control strategy has been validated on simulation and hardware-in-the-loop test platforms, future studies will consider more complex real-world operating conditions, such as variations in ambient temperature and humidity and equipment aging, to test and improve the adaptability and robustness of the algorithm across a broader range of situations.

Key words: Fuel cell, joint control, deep reinforcement learning, Bayesian network

Received: 2023-05-17

PACS:	TM911.4
	U264

Corresponding author: Pan Sichao, male, born in 1998, master's student; research interests: fuel cell modeling and temperature control. E-mail: pansc_ncepu@126.com

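About the author: Zhao Hongshan, male, born in 1965, professor and doctoral supervisor; research interests: power system dynamic analysis and control, power load forecasting, and fuel cell thermal management. E-mail: zhaohshcn@126.com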

Cite this article:

Zhao Hongshan, Pan Sichao, Ma Libo, Wu Yuchen, Lü Tingyan. Control of Fuel Cell Temperature Based on Classified Replay Twin Delayed Bayesian Deep Deterministic Policy Gradient[J]. Transactions of China Electrotechnical Society, 2024, 39(13): 4240-4256.

Link to this article:

https://dgjsxb.ces-transaction.com/CN/10.19595/j.cnki.1000-6753.tces.230699
https://dgjsxb.ces-transaction.com/CN/Y2024/V39/I13/4240
