This title appears in the Scientific Report 2019.
Towards Neural Architectures for Large Scale Continual Learning
| Personal Name(s): | Jitsev, Jenia (Corresponding author) |
|---|---|
| Contributing Institute: | Jülich Supercomputing Center; JSC |
| Imprint: | 2019 |
| Conference: | Workshop on Brain-Inspired Computing - Towards the Digital Transformation of Neuroscience by HPC, Cetraro (Italy), 2019-07-15 - 2019-07-19 |
| Document Type: | Talk (non-conference) |
| Research Program: | Data-Intensive Science and Federated Computing |
In recent years, artificial intelligence and machine learning have witnessed a radical transformation driven by methods based on deep neural networks. Deep neural networks span a broad class of learning algorithms that are responsible for several remarkable breakthroughs in such different, hard-to-solve domains as computer vision, natural language understanding, and complex closed-loop control. In only a few years, deep learning surpassed the previous state-of-the-art methods by margins that usually require many decades of research, demonstrated performance comparable to or even better than human expert level on some tasks, such as recognizing objects in complex natural images, playing the game of Go, or medical diagnostics, and gained rapid, widespread adoption in various technological applications in industry. Still, some fundamental restrictions in the learning capability of deep artificial neural networks clearly remain, as compared to their biological counterparts. To train a neural network, data sets related to the task have to be carefully gathered and well prepared in advance, and the training procedure itself has to be carefully supervised and tuned. Once the network is able to cope with the task successfully, training is considered finished and the network has to be frozen to avoid degradation and decay of the obtained skills if it is exposed to data other than the well-prepared training set - a phenomenon widely known as catastrophic forgetting. If confronted with a multitude of different tasks to learn during training, performance on each single task is severely compromised or learning breaks down completely. In this talk I will outline the opportunities and challenges in removing those restrictions and progressing towards neural architectures that are capable of continual learning. Continual learning posits a set of abilities to receive streams of incoming, unlabeled data without any clear task boundaries and digest them into a progressively growing generic model. Using this generic model, a network should be able to deal with a variety of tasks and a diversity of specific domains without any additional external supervision or the need to freeze or otherwise manually tune learning, showing increasingly better learning performance across tasks and domains as learning progresses (learning to learn). As learning such a versatile model will require the generation of highly variable, large amounts of data and will also place high computational demands on the training of networks, HPC facilities will become indispensable for growing such general AI.
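The catastrophic forgetting phenomenon mentioned in the abstract can be illustrated with a very small experiment. The sketch below is not from the talk; it is a minimal, self-contained demonstration using a plain logistic-regression classifier trained with gradient descent on two synthetic, conflicting tasks. All task definitions, shapes, and hyperparameters are illustrative assumptions: after training on task B alone, accuracy on the previously mastered task A collapses because the shared parameters are overwritten.

```python
# Minimal sketch of catastrophic forgetting (illustrative assumptions throughout):
# a linear classifier is trained on task A, then continues training on task B only,
# and its task A accuracy degrades because nothing protects the old solution.
import numpy as np

rng = np.random.default_rng(0)

def make_task(shift):
    """Two Gaussian blobs per class; `shift` moves the class means to define a new task."""
    X0 = rng.normal(loc=[-2.0 + shift, 0.0], scale=1.0, size=(200, 2))
    X1 = rng.normal(loc=[ 2.0 + shift, 0.0], scale=1.0, size=(200, 2))
    X = np.vstack([X0, X1])
    y = np.array([0] * 200 + [1] * 200)
    return X, y

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train(w, b, X, y, lr=0.1, epochs=200):
    """Plain full-batch gradient descent; no replay or regularization against forgetting."""
    for _ in range(epochs):
        p = sigmoid(X @ w + b)
        w = w - lr * (X.T @ (p - y)) / len(y)
        b = b - lr * np.mean(p - y)
    return w, b

def accuracy(w, b, X, y):
    return np.mean((sigmoid(X @ w + b) > 0.5) == y)

# Task A, and a task B whose class means are shifted so the old decision boundary is wrong.
XA, yA = make_task(shift=0.0)
XB, yB = make_task(shift=6.0)

w, b = np.zeros(2), 0.0
w, b = train(w, b, XA, yA)
print("Task A accuracy after learning A:", accuracy(w, b, XA, yA))  # near 1.0

w, b = train(w, b, XB, yB)  # continue training on task B data only
print("Task A accuracy after learning B:", accuracy(w, b, XA, yA))  # drops to chance level
print("Task B accuracy after learning B:", accuracy(w, b, XB, yB))  # near 1.0
```

Continual learning methods of the kind discussed in the talk aim to avoid exactly this degradation, so that performance on earlier tasks is retained, or even improves, while new tasks and domains are absorbed from the incoming data stream.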