Description: Ensemble Kalman Filter Optimizing Deep Neural Networks: An Alternative Approach to Non-performing Gradient Descent

This title appears in the Scientific Report: 2020

Personal Name(s):	Yegenoglu, Alper (Corresponding author)
	Krajsek, Kai / Diaz, Sandra / Herty, Michael
Contributing Institute:	Jülich Supercomputing Center; JSC
Published in:	Machine Learning, Optimization, and Data Science
Imprint:	Cham: Springer, 2020
Physical Description:	78-92
DOI:	10.1007/978-3-030-64580-9_7
Conference:	The Sixth International Conference on Machine Learning, Optimization, and Data Science, Siena (Italy), 2020-07-19 - 2020-07-22
Document Type:	Contribution to a book; Contribution to a conference proceedings
Research Program:	Helmholtz School for Data Science in Life, Earth and Energy (HDS LEE); SimLab Neuroscience; Center for Simulation and Data Science (CSD) - School for Simulation and Data Science (SSD); Supercomputing and Modelling for the Human Brain; Computational Science and Mathematical Methods; Helmholtz Analytics Framework; Doktorand ohne besondere Förderung
Edition:	5th ed.
Series Title:	Lecture Notes in Computer Science 12566
Link:	OpenAccess
	Publikationsportal JuSER

Please use the identifier: http://dx.doi.org/10.1007/978-3-030-64580-9_7 in citations.
Please use the identifier: http://hdl.handle.net/2128/26777 in citations.

The successful training of deep neural networks depends on the initialization scheme and the choice of activation functions. Poorly chosen parameter settings lead to the well-known problem of exploding or vanishing gradients, which arises when gradient descent and backpropagation are applied. In this setting, the Ensemble Kalman Filter (EnKF) can be used as an alternative optimizer for training neural networks. The EnKF does not require the explicit calculation of gradients or adjoints, and we show that this resolves the exploding and vanishing gradient problem. We analyze different parameter initializations, propose a dynamic change in ensembles, and compare the results to established methods.
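
To make the gradient-free idea in the abstract concrete, here is a minimal NumPy sketch of a generic ensemble Kalman update applied to the weights of a small network. It is not taken from the paper: the update rule is the standard ensemble Kalman inversion form, and the names `enkf_step`, `make_mlp`, and `mean_misfit`, the network size, the ensemble size, and the noise level `gamma` are all assumptions chosen for illustration. The dynamic change in ensembles mentioned above is not modeled.

```python
import numpy as np


def enkf_step(ensemble, forward, y, gamma):
    """One ensemble Kalman update of all members; no gradients of `forward` are used.

    ensemble: (J, d) array, one parameter vector per row
    forward:  callable mapping a (d,) parameter vector to a (k,) prediction
    y:        (k,) target vector
    gamma:    (k, k) observation-noise covariance (acts as a regularizer)
    """
    preds = np.stack([forward(u) for u in ensemble])   # (J, k) predictions
    du = ensemble - ensemble.mean(axis=0)              # parameter anomalies
    dg = preds - preds.mean(axis=0)                    # prediction anomalies
    n_members = ensemble.shape[0]
    c_up = du.T @ dg / n_members                       # (d, k) cross-covariance
    c_pp = dg.T @ dg / n_members                       # (k, k) prediction covariance
    # Kalman-type update: u_j <- u_j + C_up (C_pp + gamma)^{-1} (y - G(u_j))
    innovation = (y - preds).T                         # (k, J)
    return ensemble + (c_up @ np.linalg.solve(c_pp + gamma, innovation)).T


def make_mlp(x, hidden=8):
    """Hypothetical 1-hidden-layer tanh network evaluated on fixed inputs x."""
    n_params = 3 * hidden + 1

    def forward(u):
        w1, b1 = u[:hidden], u[hidden:2 * hidden]
        w2, b2 = u[2 * hidden:3 * hidden], u[-1]
        return np.tanh(np.outer(x, w1) + b1) @ w2 + b2

    return forward, n_params


def mean_misfit(ensemble, forward, y):
    """Average squared data misfit over all ensemble members."""
    return float(np.mean([np.sum((forward(u) - y) ** 2) for u in ensemble]))


# Toy regression problem (assumed for illustration): fit sin(pi * x).
rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 20)
y = np.sin(np.pi * x)
forward, n_params = make_mlp(x)
ensemble = rng.normal(size=(50, n_params))   # random initial weights per member
gamma = 1e-2 * np.eye(y.size)

print("misfit before:", mean_misfit(ensemble, forward, y))
for _ in range(20):
    ensemble = enkf_step(ensemble, forward, y, gamma)
print("misfit after: ", mean_misfit(ensemble, forward, y))
```

Only forward evaluations of the network enter the update; the ensemble covariances take the place of the derivative information that backpropagation would supply, which is why the activation function never needs to be differentiated.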