Overview
- During my semester abroad at the University of Edinburgh, I worked on a course project on machine translation.
- The task was to improve an existing baseline machine translation model, as measured by BLEU and other metrics.
- To do so, I implemented and tested two approaches from the research literature: the lexical attention model described by Nguyen and Chiang (2017) and the multi-head attention mechanism presented by Vaswani et al. (2017) (see the sketch after this list). I did this in PyTorch and also spent a lot of time analyzing and optimizing the training data.
- As a result, the translation performance of the model improved considerably. I also learned how different model architectures affect training and inference time.
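For illustration, below is a minimal sketch of the multi-head attention mechanism from Vaswani et al. (2017) in PyTorch. The class name, tensor shapes, and the omission of masking and dropout are simplifying assumptions for this write-up, not the actual course implementation.

```python
import math
import torch
import torch.nn as nn

class MultiHeadAttention(nn.Module):
    """Minimal multi-head attention (no masking/dropout), after Vaswani et al. (2017)."""

    def __init__(self, d_model: int, num_heads: int):
        super().__init__()
        assert d_model % num_heads == 0
        self.num_heads = num_heads
        self.d_head = d_model // num_heads
        # Learned projections for queries, keys, values, and the output.
        self.w_q = nn.Linear(d_model, d_model)
        self.w_k = nn.Linear(d_model, d_model)
        self.w_v = nn.Linear(d_model, d_model)
        self.w_o = nn.Linear(d_model, d_model)

    def forward(self, query, key, value):
        # query/key/value: (batch, seq_len, d_model)
        batch = query.shape[0]

        def split_heads(x):
            # (batch, seq, d_model) -> (batch, heads, seq, d_head)
            return x.view(batch, -1, self.num_heads, self.d_head).transpose(1, 2)

        q = split_heads(self.w_q(query))
        k = split_heads(self.w_k(key))
        v = split_heads(self.w_v(value))

        # Scaled dot-product attention, computed per head.
        scores = q @ k.transpose(-2, -1) / math.sqrt(self.d_head)
        weights = scores.softmax(dim=-1)
        context = weights @ v  # (batch, heads, seq_q, d_head)

        # Merge the heads back together and apply the output projection.
        context = context.transpose(1, 2).contiguous().view(batch, -1, self.num_heads * self.d_head)
        return self.w_o(context)
```

Roughly speaking, the lexical component of Nguyen and Chiang (2017) reuses such attention weights to average the source word embeddings and feeds the result through a small feed-forward layer that is added to the decoder's output scores, biasing the model toward lexically faithful translations.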
Context
- 🗓️ Timeline: 01/2022 — 05/2022
- 🛠️ Project type: Course project in Natural Language Understanding, Generation and Machine Translation @ University of Edinburgh, UK
Technologies/Keywords
- Python
- PyTorch
- Machine Translation (MT)
- Transformers
- Lexical attention
- Multi-head attention
- Paper implementation
Main Learnings
- I learned to turn machine-learning papers into working implementations. This requires understanding the papers in detail and translating their formulas and diagrams into code.
- In addition, I learned to explore and extend an existing codebase and gained a deep understanding of transformers, multi-head attention, and related topics such as beam search (see the sketch below).
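As a small illustration of one of those related topics, here is a hedged sketch of beam search over an autoregressive decoder. The `step` callback, assumed to return log-probabilities for possible next tokens given a prefix, and the special token IDs are hypothetical placeholders rather than the interface of the course codebase.

```python
from typing import Callable, List, Tuple

def beam_search(
    step: Callable[[List[int]], List[Tuple[int, float]]],
    bos_id: int,
    eos_id: int,
    beam_size: int = 4,
    max_len: int = 50,
) -> List[int]:
    """Return the highest-scoring token sequence found by beam search.

    `step(prefix)` is assumed to return (token_id, log_prob) pairs for the
    possible next tokens of `prefix` -- a hypothetical interface for this sketch.
    """
    # Each hypothesis is a (token sequence, cumulative log-probability) pair.
    beams: List[Tuple[List[int], float]] = [([bos_id], 0.0)]
    finished: List[Tuple[List[int], float]] = []

    for _ in range(max_len):
        candidates: List[Tuple[List[int], float]] = []
        for seq, score in beams:
            for token, logp in step(seq):
                candidates.append((seq + [token], score + logp))
        # Keep only the `beam_size` best partial hypotheses.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for seq, score in candidates[:beam_size]:
            if seq[-1] == eos_id:
                finished.append((seq, score))
            else:
                beams.append((seq, score))
        if not beams:
            break

    finished.extend(beams)  # Fall back to unfinished hypotheses if needed.
    return max(finished, key=lambda c: c[1])[0]
```

In practice, hypothesis scores are usually length-normalized before picking the final output so that the search does not systematically favor short translations.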