This title appears in the Scientific Report 2022.
Please use the identifier http://dx.doi.org/10.34732/XDVBLG-QSBTYX in citations.
A mathematician's introduction to transformers and large language models
| Personal Name(s): | Penke, Carolin (Corresponding author) |
|---|---|
| Contributing Institute: | Jülich Supercomputing Centre (JSC) |
| Imprint: | 2022 |
| DOI: | 10.34732/XDVBLG-QSBTYX |
| Document Type: | Communication |
| Research Program: | Cross-Domain Algorithms, Tools, Methods Labs (ATMLs) and Research Groups |
| Subject (ZB): | |
The field of Natural Language Processing (NLP) has been undergoing a revolution in recent years. Large-scale language models (LLMs), most notably a series of Generative Pre-trained Transformers (GPTs), have exceeded all expectations in benchmark scenarios and real-life applications such as text generation, translation, question answering, and summarization. The engine of the NLP revolution is the so-called attention mechanism, which allows models to process long sentences without 'forgetting' important words. This mechanism is implemented as a series of matrix products and therefore lends itself to massive parallelization. Pre-training a transformer requires great computational resources and is one example of the increasing AI workload at large High Performance Computing (HPC) facilities. OpenGPT-X is a joint effort of ten partners from science and industry to train and provide access to an open LLM based in Europe, in order to guarantee digital and economic sovereignty. Within the project, the pre-training of the LLM is performed at the Jülich Supercomputing Centre. This blog post aims to give an introduction to the current state of large language models, the OpenGPT-X project, and the transformer neural network architecture for readers who are unfamiliar with the subject but have a working knowledge of linear algebra.
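To make the claim that attention reduces to matrix products concrete, here is a minimal NumPy sketch of scaled dot-product attention (Vaswani et al., 2017). It is an illustrative example, not the exact formulation from the blog post; the variable names, dimensions, and random projection matrices are chosen purely for demonstration.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Scaled dot-product attention over a sequence.

    Q, K have shape (sequence_length, d_k), V has shape
    (sequence_length, d_v). Every step is a dense matrix
    operation, which is why the mechanism parallelizes so
    well on GPU hardware.
    """
    d_k = Q.shape[-1]
    # Similarity scores between all pairs of positions: one matmul.
    scores = Q @ K.T / np.sqrt(d_k)
    # Row-wise softmax turns scores into attention weights.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Weighted combination of the value vectors: a second matmul.
    return weights @ V

# Toy example: a "sentence" of 4 tokens with 8-dimensional embeddings.
rng = np.random.default_rng(0)
X = rng.standard_normal((4, 8))
# Hypothetical learned projections (random here for illustration).
W_q, W_k, W_v = (rng.standard_normal((8, 8)) for _ in range(3))
out = scaled_dot_product_attention(X @ W_q, X @ W_k, X @ W_v)
print(out.shape)  # (4, 8): one attention-weighted vector per token
```

Because the whole computation is a pair of matrix products around a softmax, each output position is computed independently of the others, which is what lets HPC systems like those at JSC keep thousands of accelerators busy during pre-training.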