This title appears in the Scientific Report 2022.
Please use the identifier http://dx.doi.org/10.34732/XDVBLG-QSBTYX in citations.
A mathematician's introduction to transformers and large language models
| Personal Name(s): | Penke, Carolin (Corresponding author) |
|---|---|
| Contributing Institute: | Jülich Supercomputing Centre (JSC) |
| Imprint: | 2022 |
| DOI: | 10.34732/XDVBLG-QSBTYX |
| Document Type: | Communication |
| Research Program: | Cross-Domain Algorithms, Tools, Methods Labs (ATMLs) and Research Groups |
| Subject (ZB): | |
The field of Natural Language Processing (NLP) has been undergoing a revolution in recent years. Large-scale language models (LLMs), most notably a series of Generative Pre-trained Transformers (GPTs), have exceeded all expectations in benchmark scenarios and real-life applications such as text generation, translation, question answering, and summarization. The engine of the NLP revolution is the so-called attention mechanism, which allows models to process long sentences without 'forgetting' important words. This mechanism is implemented as a series of matrix products and therefore lends itself to massive parallelization. Pre-training a transformer requires great computational resources and is one example of the increasing AI workload at large High Performance Computing (HPC) facilities. OpenGPT-X is a joint effort of ten partners from science and industry to train and provide access to an open LLM based in Europe, in order to guarantee digital and economic sovereignty. Within the project, the pre-training of the LLM is performed at the Jülich Supercomputing Centre. This blog post aims to give an introduction to the current state of large language models, the OpenGPT-X project, and the transformer neural network architecture for readers who are unfamiliar with the subject but have a working knowledge of linear algebra.
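To make the claim that attention reduces to matrix products concrete, here is a minimal NumPy sketch of scaled dot-product attention (Vaswani et al., 2017). It is an illustrative example, not the exact formulation from the blog post; the variable names, dimensions, and random projection matrices are chosen purely for demonstration.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Scaled dot-product attention over a sequence.

    Q, K have shape (sequence_length, d_k), V has shape
    (sequence_length, d_v). Every step is a dense matrix
    operation, which is why the mechanism parallelizes so
    well on GPU hardware.
    """
    d_k = Q.shape[-1]
    # Similarity scores between all pairs of positions: one matmul.
    scores = Q @ K.T / np.sqrt(d_k)
    # Row-wise softmax turns scores into attention weights.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Weighted combination of the value vectors: a second matmul.
    return weights @ V

# Toy example: a "sentence" of 4 tokens with 8-dimensional embeddings.
rng = np.random.default_rng(0)
X = rng.standard_normal((4, 8))
# Hypothetical learned projections (random here for illustration).
W_q, W_k, W_v = (rng.standard_normal((8, 8)) for _ in range(3))
out = scaled_dot_product_attention(X @ W_q, X @ W_k, X @ W_v)
print(out.shape)  # (4, 8): one attention-weighted vector per token
```

Because the whole computation is a pair of matrix products around a softmax, each output position is computed independently of the others, which is what lets HPC systems like those at JSC keep thousands of accelerators busy during pre-training.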