Machine Translation Post-Editing

Industrial organisations involved
Luisa Tõlkebüroo OÜ is the biggest translation agency in Estonia. The company offers more than 50 services – including sworn translation, simultaneous and consecutive interpretation, layout work, machine translation and post-editing, subtitling and localisation.
Technical/scientific challenge
The company needed a custom-made machine translation system to shorten translation times. As the company had no previous experience in either natural language processing or machine learning, it collaborated with the TartuNLP team.
Solution
The machine translation model was trained on the University of Tartu HPC Centre’s Rocket cluster.
Business impact
Thanks to rapid advances in the technology and extensive translation memory, the company is able to offer machine translations with post-editing in a range of language combinations and on a range of topics.
Benefits
- The innovative translation tool helps to save valuable time and human resources
An accurate AI-based Cloud Mask Processor for Sentinel-2

Industrial organisations involved
KappaZeta is a science-driven remote sensing company aiming to make space a valuable asset for everyone. KappaZeta’s expertise is in using SAR (radar) satellite data, incorporating it with optical satellite data and providing some of the most accurate AI models on the market. The key area of focus is agriculture.
Technical/scientific challenge
Cloud masking is an essential step in the pre-processing of optical satellite imagery. KappaZeta addresses the problem with KappaMask, an AI-based cloud and cloud shadow masking processor for Sentinel-2, whose optical instrument payload samples 13 spectral bands. As its cloud detector, KappaMask uses a large convolutional segmentation model. Training converges faster with larger batch sizes, which in turn require more GPU memory. Faster CPUs are also needed to shorten data loading times, so that the GPUs are not left waiting for input and training speeds up even further.
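As a minimal sketch of what the masking step does once the model has run, assuming a model that outputs one score per class for every pixel, the code below takes the per-pixel argmax and blanks out pixels predicted as cloud or cloud shadow. The class names and their order are hypothetical, not KappaMask's documented label set, and plain nested lists stand in for the arrays a real pipeline would apply across all 13 bands.

```python
# Hypothetical post-processing sketch: per-pixel class scores -> cloud mask.
# The class names and their order are assumptions, not KappaMask's exact output.
from typing import Optional

CLASSES = ("clear", "cloud_shadow", "semi_transparent_cloud", "cloud")
UNUSABLE = {"cloud_shadow", "semi_transparent_cloud", "cloud"}


def scores_to_labels(scores: list[list[list[float]]]) -> list[list[str]]:
    """Pick the highest-scoring class for every pixel (per-pixel argmax)."""
    return [
        [CLASSES[max(range(len(px)), key=px.__getitem__)] for px in row]
        for row in scores
    ]


def mask_band(
    band: list[list[float]], labels: list[list[str]]
) -> list[list[Optional[float]]]:
    """Replace values of one spectral band with None wherever clouds or shadows were predicted."""
    return [
        [None if label in UNUSABLE else value for value, label in zip(band_row, label_row)]
        for band_row, label_row in zip(band, labels)
    ]
```

In practice the same mask would be applied to every band of the scene, so that downstream analyses such as crop monitoring only see clear-sky pixels.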
Solution
KappaMask was trained on an open-source dataset and fine-tuned on a Northern European terrestrial dataset that was labelled manually using an active learning approach. The training was performed on the University of Tartu HPC Centre’s high-performance compute nodes, whose powerful GPUs and CPUs substantially sped up the training of the model.
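The source does not say which selection rule the active learning loop used. One common choice is uncertainty sampling, sketched below under that assumption: the current model predicts class probabilities for unlabelled tiles, and the tiles with the highest mean per-pixel entropy are sent to annotators first. All function names and the tile format are hypothetical.

```python
# Hypothetical sketch of one active-learning selection round using
# uncertainty (entropy) sampling; not KappaZeta's documented method.
import math


def pixel_entropy(probs: list[float]) -> float:
    """Shannon entropy (in nats) of one pixel's predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def tile_uncertainty(tile: list[list[list[float]]]) -> float:
    """Mean per-pixel entropy: high when the model is unsure about the tile."""
    pixels = [px for row in tile for px in row]
    return sum(pixel_entropy(px) for px in pixels) / len(pixels)


def select_for_labelling(
    predictions: dict[str, list[list[list[float]]]], k: int
) -> list[str]:
    """Return the ids of the k unlabelled tiles the current model is least sure about.

    These tiles go to human annotators; the model is then fine-tuned on the
    enlarged labelled set and the loop repeats.
    """
    return sorted(
        predictions, key=lambda tid: tile_uncertainty(predictions[tid]), reverse=True
    )[:k]
```

Labelling the tiles the model finds hardest tends to add more information per annotated tile than labelling randomly chosen ones, which matters when every mask is drawn by hand.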
Business impact
KappaMask is an open-source project. All results, the final software and the source code will be freely and openly distributed on GitHub. The openness and accessibility of the software should translate directly into greater usage.
Benefits
- Reliable cloud mask processor for the Northern European region, compatible with the ESA Sentinel-2 Level-2 (L2) processing chain.
- Creation of a high-quality reference dataset for future developments.
- Innovative application of deep learning techniques in cloud masking.
Self-driving technology for a Level 4 autonomous car

Industrial organisations involved
Bolt is an Estonian mobility company headquartered in Tallinn that offers vehicle-for-hire, micromobility, car-sharing and food delivery services in over 400 cities in more than 45 countries. In partnership with the University of Tartu, the company is developing self-driving technology for a Level 4 autonomous car.
Technical/scientific challenge
Autonomous cars generate up to 357 GB of data per hour during test drives. The engineers needed a system to store these test logs and access them easily.
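To put the quoted logging rate in perspective, the short calculation below converts 357 GB/hour into storage and bandwidth figures. It assumes decimal units and treats the "up to" figure as a constant rate, so the results are upper-bound estimates; the eight-hour drive used in the note is a hypothetical example.

```python
# Back-of-the-envelope storage arithmetic for the quoted logging rate.
# Assumes decimal units (1 GB = 1000 MB, 1 PB = 10**6 GB).
RATE_GB_PER_HOUR = 357.0


def log_volume_gb(drive_hours: float, rate: float = RATE_GB_PER_HOUR) -> float:
    """Raw log volume produced by a test drive of the given length."""
    return drive_hours * rate


def sustained_write_mb_per_s(rate: float = RATE_GB_PER_HOUR) -> float:
    """Average bandwidth the on-board logger must sustain (GB/h -> MB/s)."""
    return rate * 1000 / 3600


def drive_hours_per_petabyte(rate: float = RATE_GB_PER_HOUR) -> float:
    """How many hours of driving fit into one petabyte of storage."""
    return 1_000_000 / rate
```

For an eight-hour drive this gives 2,856 GB (about 2.9 TB), an average write rate of roughly 99 MB/s, and about 2,800 hours of driving per petabyte of storage.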
Solution
The acquired test logs are copied to HPC storage, into a directory with appropriate access restrictions. A regularly scheduled cron job processes these log files and stores the extracted metadata in a MongoDB database. The processing is distributed over the cluster and runs in parallel: the longest logs can take up to 24 hours to process, so processing them sequentially would be very time-consuming. On top of MongoDB sits a custom-made application that allows test sessions to be filtered and browsed with the Webviz visualization tool, which reads the raw sensor data directly from HPC storage.
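The indexing pass described above might look roughly like the following sketch. It is hypothetical throughout: the `*.bag` file extension (assuming ROS bag files, a format Webviz can open), the metadata fields and every function name are assumptions, a thread pool stands in for distribution over the cluster, and a real extractor would parse sensor messages rather than read file sizes. `wall_clock_hours` estimates how long a batch takes when logs are spread over several nodes, using the longest-processing-time-first greedy rule.

```python
# Hypothetical sketch of a cron-triggered log-indexing pass; not the actual
# Bolt / HPC Centre code. File extension, fields and names are assumptions.
import heapq
from collections.abc import Callable, Iterable
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def find_new_logs(log_dir: Path, indexed: set[str]) -> list[Path]:
    """List log files in log_dir that have no metadata document yet."""
    return sorted(p for p in log_dir.glob("*.bag") if str(p) not in indexed)


def extract_metadata(log_path: Path) -> dict:
    """Stand-in for the expensive parsing step.

    A real extractor would read every sensor message in the log; this one only
    builds a MongoDB-style document from the file name and size.
    """
    return {
        "session_id": log_path.stem,
        "path": str(log_path),
        "size_bytes": log_path.stat().st_size,
    }


def index_logs(
    paths: Iterable[Path],
    extract: Callable[[Path], dict] = extract_metadata,
    workers: int = 4,
) -> list[dict]:
    """Run extract on every log concurrently; results keep the input order.

    Logs are independent, so they parallelise trivially. The thread pool stands
    in for the cluster, where each log would more likely be its own batch job.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(extract, paths))


def wall_clock_hours(durations: list[float], workers: int) -> float:
    """Estimate total run time when jobs are spread over `workers` nodes.

    Longest-processing-time-first: each job, longest first, is assigned to the
    currently least-loaded worker (tracked with a min-heap of worker loads).
    """
    loads = [0.0] * workers  # a list of equal values is already a valid heap
    for d in sorted(durations, reverse=True):
        heapq.heappush(loads, heapq.heappop(loads) + d)
    return max(loads)
```

With `durations=[24, 6, 6, 6, 6]` and four workers the estimate is 24 hours, against 48 hours sequentially: once enough nodes are available, the single longest log sets the wall-clock time. The returned documents could then be written to MongoDB, for example with PyMongo's `Collection.insert_many`, and the script scheduled with a crontab entry such as `0 1 * * * python3 index_logs.py` (hypothetical schedule and script name).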
Business impact
With the growing demand for ride-hailing services, autonomous vehicle technology will provide a solution for transportation problems on an increasingly broader scale.
Benefits
- The custom database application and visualization tool enable easy analysis of the logs
- Thanks to distributed processing on the cluster, the metadata about the drives is usually available by the next morning
- Thanks to petabytes of storage at the HPC Centre, the company can keep all the data it needs
Large collections of European HPC success stories are available on the FF4EuroHPC and EuroCC webpages.