This title appears in the Scientific Report 2023.
Please use the identifier: http://dx.doi.org/10.34734/FZJ-2024-00909 in citations.
Please use the identifier: http://dx.doi.org/10.48550/ARXIV.2308.01674 in citations.
End-to-End Reinforcement Learning of Koopman Models for Economic Nonlinear Model Predictive Control
Personal Name(s): Mayfrank, Daniel; Mitsos, Alexander; Dahmen, Manuel (Corresponding author)
Contributing Institute: Modellierung von Energiesystemen; IEK-10
Imprint: arXiv, 2023
DOI: 10.34734/FZJ-2024-00909
DOI: 10.48550/ARXIV.2308.01674
Document Type: Preprint
Research Program: Helmholtz School for Data Science in Life, Earth and Energy (HDS LEE); Digitalization and Systems Technology for Flexibility Solutions
Link: OpenAccess
Publikationsportal JuSER
(Economic) nonlinear model predictive control ((e)NMPC) requires dynamic system models that are sufficiently accurate in all relevant state-space regions. These models must also be computationally cheap enough to ensure real-time tractability. Data-driven surrogate models for mechanistic models can be used to reduce the computational burden of (e)NMPC; however, such models are typically trained by system identification for maximum average prediction accuracy on simulation samples and therefore perform suboptimally as part of actual (e)NMPC. We present a method for end-to-end reinforcement learning of dynamic surrogate models for optimal performance in (e)NMPC applications, resulting in predictive controllers that strike a favorable balance between control performance and computational demand. We validate our method on two applications derived from an established nonlinear continuous stirred-tank reactor model. We compare the controller performance to that of MPCs utilizing models trained under the prevailing maximum-prediction-accuracy paradigm, and to that of model-free neural network controllers trained using reinforcement learning. We show that our method matches the performance of the model-free neural network controllers while consistently outperforming models derived from system identification. Additionally, we show that the MPC policies can react to changes in the control setting without retraining.
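The abstract centers on Koopman surrogate models: the system state is lifted into a higher-dimensional space in which the dynamics are approximately linear, so the resulting (e)NMPC problem remains computationally cheap. The following minimal sketch illustrates that idea only; the dimensions, the lifting function, and the (randomly initialized) matrices are hypothetical placeholders and not the paper's model, which learns these components end-to-end via reinforcement learning.

```python
import numpy as np

# Illustrative Koopman surrogate: lift the state x to z = psi(x), propagate
# linearly as z' = A z + B u, and decode back to the state via x' = C z'.
# In the paper's setting, A, B, C and the lifting would be learned; here they
# are placeholder values chosen purely for demonstration.

rng = np.random.default_rng(0)
nx, nu, nz = 2, 1, 5          # state, input, and lifted-space dimensions (hypothetical)

def lift(x):
    """Hypothetical lifting: the state plus simple polynomial observables."""
    return np.concatenate([x, [x[0] * x[1], x[0] ** 2, 1.0]])

A = rng.standard_normal((nz, nz)) * 0.1   # lifted-space dynamics (would be learned)
B = rng.standard_normal((nz, nu)) * 0.1   # input matrix (would be learned)
C = np.eye(nx, nz)                        # decoder: read the state off the lifted vector

def predict(x, u):
    """One-step surrogate prediction: lift, apply linear dynamics, decode."""
    z_next = A @ lift(x) + B @ u
    return C @ z_next

x0 = np.array([1.0, -0.5])
u0 = np.array([0.2])
x1 = predict(x0, u0)          # next-state prediction, shape (nx,)
```

Because the lifted dynamics are linear in z and u, a predictive controller built on such a surrogate can solve a cheap (often convex) optimization at each step, which is what makes Koopman models attractive for real-time (e)NMPC.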