Analyzing Transformers and Implementing Multi-Head Attention

Overview

  • When I was doing my semester abroad at the University of Edinburgh, I worked on a course project about machine translation.
  • The task was to improve an existing baseline machine translation model, raising its BLEU score and other evaluation metrics.
  • To do so, I implemented and tested two approaches from the research literature: the lexical attention model described by Nguyen and Chiang (2017) and the multi-head attention mechanism presented by Vaswani et al. (2017). Both were implemented in PyTorch (a simplified sketch of multi-head attention follows this list), and I also spent considerable time analyzing and optimizing the training data.
  • As a result, the performance of the machine translation model improved considerably. I also learned how different model architectures affect training and inference time.
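
As an illustration of the second approach, below is a minimal PyTorch sketch of multi-head attention in the spirit of Vaswani et al. (2017). The class, parameter, and variable names are my own choices for this example and are not taken from the project's codebase.

```python
# Minimal multi-head attention sketch (illustrative names, not the project's code).
import math
import torch
import torch.nn as nn


class MultiHeadAttention(nn.Module):
    def __init__(self, d_model: int, num_heads: int, dropout: float = 0.1):
        super().__init__()
        assert d_model % num_heads == 0, "d_model must be divisible by num_heads"
        self.d_head = d_model // num_heads
        self.num_heads = num_heads
        # Separate linear projections for queries, keys, values, and the output.
        self.w_q = nn.Linear(d_model, d_model)
        self.w_k = nn.Linear(d_model, d_model)
        self.w_v = nn.Linear(d_model, d_model)
        self.w_o = nn.Linear(d_model, d_model)
        self.dropout = nn.Dropout(dropout)

    def forward(self, query, key, value, mask=None):
        # query/key/value: (batch, seq_len, d_model)
        batch_size = query.size(0)

        def split_heads(x):
            # (batch, seq_len, d_model) -> (batch, num_heads, seq_len, d_head)
            return x.view(batch_size, -1, self.num_heads, self.d_head).transpose(1, 2)

        q = split_heads(self.w_q(query))
        k = split_heads(self.w_k(key))
        v = split_heads(self.w_v(value))

        # Scaled dot-product attention: softmax(QK^T / sqrt(d_head)) V
        scores = torch.matmul(q, k.transpose(-2, -1)) / math.sqrt(self.d_head)
        if mask is not None:
            scores = scores.masked_fill(mask == 0, float("-inf"))
        attn = self.dropout(torch.softmax(scores, dim=-1))
        context = torch.matmul(attn, v)

        # Concatenate the heads and apply the output projection.
        context = context.transpose(1, 2).contiguous()
        context = context.view(batch_size, -1, self.num_heads * self.d_head)
        return self.w_o(context)
```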

Context

  • 🗓️ Timeline: 01/2022 — 05/2022
  • 🛠️ Project type: Course project in Natural Language Understanding, Generation and Machine Translation @ University of Edinburgh, UK

Technologies/Keywords

  • Python
  • PyTorch
  • Machine Translation (MT)
  • Transformers
  • Lexical attention
  • Multi-head attention
  • Paper implementation

Main Learnings

  • I learned to turn machine-learning papers into working implementations. This requires understanding the papers in detail and translating their formulas and diagrams into code.
  • In addition, I learned to explore and extend an existing codebase and gained a deep understanding of transformers, multi-head attention, and several related topics (e.g., beam search; a small illustrative sketch follows this list).
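
Beam search was one of those related topics; the following is a minimal, framework-agnostic sketch of the idea. The `step_fn`, `bos_id`, and `eos_id` names are hypothetical placeholders introduced for this example, not identifiers from the project's codebase.

```python
# Minimal beam search sketch: step_fn maps a token prefix to log-probabilities
# of the next token (assumed non-empty). Names here are illustrative only.
from typing import Callable, Dict, List, Tuple


def beam_search(
    step_fn: Callable[[List[int]], Dict[int, float]],
    bos_id: int,
    eos_id: int,
    beam_size: int = 5,
    max_len: int = 50,
) -> List[int]:
    """Return the highest-scoring token sequence found by beam search."""
    # Each hypothesis is a (tokens, cumulative log-probability) pair.
    beams: List[Tuple[List[int], float]] = [([bos_id], 0.0)]
    finished: List[Tuple[List[int], float]] = []

    for _ in range(max_len):
        candidates: List[Tuple[List[int], float]] = []
        for tokens, score in beams:
            # Expand every live hypothesis by every possible next token.
            for token_id, log_prob in step_fn(tokens).items():
                candidates.append((tokens + [token_id], score + log_prob))
        # Keep only the beam_size best partial hypotheses.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for tokens, score in candidates[:beam_size]:
            if tokens[-1] == eos_id:
                finished.append((tokens, score))
            else:
                beams.append((tokens, score))
        if not beams:
            break

    finished.extend(beams)
    # Length-normalize so shorter hypotheses are not unfairly favored.
    best, _ = max(finished, key=lambda c: c[1] / max(len(c[0]) - 1, 1))
    return best
```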