Machine Translation Post-Editing

Industrial organisations involved

Luisa Tõlkebüroo OÜ is the biggest translation agency in Estonia. The company offers more than 50 services – including sworn translation, simultaneous and consecutive interpretation, layout work, machine translation and post-editing, subtitling and localisation.

Technical/scientific challenge

The company needed a custom-made machine translation system to reduce the time needed to produce translations. As the company had no previous experience in either natural language processing or machine learning, it collaborated with the TartuNLP team.

Solution

The machine translation model was trained on the University of Tartu HPC Centre’s Rocket cluster.

Business impact

Thanks to rapid advances in machine translation technology and its extensive translation memory, the company can offer machine translation with post-editing in a range of language combinations and subject areas.

Benefits

An accurate AI-based Cloud Mask Processor for Sentinel-2

Industrial organisations involved

KappaZeta is a science-driven remote sensing company that aims to make space a valuable asset for everyone. KappaZeta’s expertise lies in using SAR (radar) satellite data, combining it with optical satellite data, and providing some of the most accurate AI models on the market. Its key area of focus is agriculture.

Technical/scientific challenge

Cloud masking is an essential pre-processing step for optical satellite imagery. KappaZeta addresses this problem with KappaMask, an AI-based cloud and cloud shadow masking processor for Sentinel-2, a mission whose optical instrument payload samples 13 spectral bands. KappaMask uses a large convolutional segmentation model as its cloud detector. Training with larger batch sizes can make the model converge faster, but larger batches require more GPU memory. Faster CPUs are also needed to shorten data loading times and increase training speed further.
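The text does not describe KappaMask’s internals beyond a large convolutional segmentation model. As a hedged sketch of what a cloud mask does downstream, the Python below assumes the model outputs one score per class for every pixel, using a hypothetical three-class label set (clear, cloud shadow, cloud) that may differ from KappaMask’s real one. It turns the scores into labels with a per-pixel argmax, blanks out non-clear pixels in one spectral band, and reports the clear-sky fraction. The function names are illustrative, not KappaMask’s API.

```python
from typing import List, Optional, Sequence

# Hypothetical, simplified label set for illustration only;
# KappaMask's actual classes may differ.
CLEAR, CLOUD_SHADOW, CLOUD = 0, 1, 2

Grid = List[List[int]]


def scores_to_mask(scores: Sequence[Sequence[Sequence[float]]]) -> Grid:
    """Convert per-pixel class scores (height x width x classes) to labels via argmax."""
    return [
        [max(range(len(pixel)), key=lambda c: pixel[c]) for pixel in row]
        for row in scores
    ]


def mask_band(band: Sequence[Sequence[float]], mask: Grid,
              nodata: Optional[float] = None) -> List[list]:
    """Keep a band's values only where the pixel is labelled CLEAR; use nodata elsewhere."""
    return [
        [value if label == CLEAR else nodata for value, label in zip(band_row, mask_row)]
        for band_row, mask_row in zip(band, mask)
    ]


def clear_fraction(mask: Grid) -> float:
    """Share of pixels labelled CLEAR (0.0 for an empty mask)."""
    total = sum(len(row) for row in mask)
    if total == 0:
        return 0.0
    return sum(label == CLEAR for row in mask for label in row) / total


# Tiny 2 x 2 example with made-up scores.
scores = [[[0.9, 0.05, 0.05], [0.1, 0.2, 0.7]],
          [[0.2, 0.6, 0.2], [0.8, 0.1, 0.1]]]
mask = scores_to_mask(scores)              # [[0, 2], [1, 0]]
print(mask_band([[100, 200], [300, 400]], mask))  # [[100, None], [None, 400]]
print(clear_fraction(mask))                # 0.5
```

In a real pipeline this would run on full-resolution arrays (typically with NumPy), but the per-pixel logic is the same.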

Solution

KappaMask was trained on an open-source dataset and fine-tuned on a Northern European terrestrial dataset that was labelled manually using an active learning methodology. Training was performed on the high-performance compute nodes of the University of Tartu HPC Centre, whose powerful GPUs and CPUs substantially sped up model training.
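The text says the fine-tuning dataset was labelled using active learning but does not say how tiles were chosen for labelling. One common criterion is uncertainty sampling; the sketch below assumes it, ranking unlabelled tiles by their mean per-pixel prediction entropy and sending the most uncertain ones to annotators. The tile IDs, the probability layout and all function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def tile_uncertainty(pixel_probs: Sequence[Sequence[float]]) -> float:
    """Mean per-pixel entropy over a tile; higher means the model is less sure."""
    if not pixel_probs:
        return 0.0
    return sum(entropy(p) for p in pixel_probs) / len(pixel_probs)


def select_for_labelling(predictions: Dict[str, Sequence[Sequence[float]]],
                         budget: int) -> List[str]:
    """Return the IDs of the `budget` most uncertain tiles, most uncertain first."""
    ranked = sorted(predictions,
                    key=lambda tile_id: tile_uncertainty(predictions[tile_id]),
                    reverse=True)
    return ranked[:budget]


# Made-up class probabilities for three unlabelled tiles (a few pixels each).
predictions = {
    "tile_a": [[0.98, 0.01, 0.01], [0.97, 0.02, 0.01]],  # confident
    "tile_b": [[0.40, 0.30, 0.30], [0.34, 0.33, 0.33]],  # very uncertain
    "tile_c": [[0.70, 0.20, 0.10]],                      # in between
}
print(select_for_labelling(predictions, budget=2))  # ['tile_b', 'tile_c']
```

In an active learning loop, the selected tiles are labelled by hand, the model is fine-tuned on the enlarged dataset, and predictions are recomputed before the next selection round.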

Business impact

KappaMask is an open-source project. All results, the final software and the source code will be freely and openly distributed on GitHub. The openness and accessibility of the software should translate directly into wider use.

Benefits

Self-driving technology for a Level 4 autonomous car

Industrial organisations involved

Bolt is an Estonian mobility company headquartered in Tallinn that offers vehicle-for-hire, micromobility, car-sharing, and food delivery services in over 400 cities in more than 45 countries. In partnership with the University of Tartu, the company is developing self-driving technology for a Level 4 autonomous car.

Technical/scientific challenge

Autonomous cars record up to 357 GB of data per hour during test drives. The engineers needed a system to store these test logs and access them easily.
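To make the storage requirement concrete, the short calculation below scales the 357 GB/hour figure from the text. Because that figure is a maximum, the results are upper bounds; the test-drive schedules in the example are hypothetical.

```python
# Upper-bound acquisition rate quoted in the text.
MAX_GB_PER_HOUR = 357


def max_storage_tb(driving_hours: float,
                   rate_gb_per_hour: float = MAX_GB_PER_HOUR) -> float:
    """Worst-case raw log volume in decimal terabytes (1 TB = 1000 GB)."""
    return driving_hours * rate_gb_per_hour / 1000


# Hypothetical schedules: one 8-hour test day, and a 5-day week of such days.
print(max_storage_tb(8))      # 2.856
print(max_storage_tb(8 * 5))  # 14.28
```

Even a single week of testing can therefore approach the tens-of-terabytes range, which is beyond what individual workstations comfortably hold.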

Solution

Recorded test logs are copied to HPC storage, into an appropriately access-restricted directory. A cron job regularly processes these log files, extracting metadata that is stored in a MongoDB database. Processing is distributed across the cluster and runs in parallel: the longest logs can take up to 24 hours to process, so processing them sequentially would be very time-consuming. A custom-made application built on top of MongoDB lets engineers filter test sessions and browse them with the Webviz visualization tool, which reads the raw sensor data directly from HPC storage.
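The processing pipeline can be sketched in Python under several assumptions. The logs are taken to be ROS bag files (hypothetical: Webviz can open that format, but the text does not name the log format); the metadata extraction step is a placeholder that only records file-level facts; and a local process pool stands in for distribution across the cluster. In the real system the resulting records would be written to MongoDB (for example with pymongo’s `insert_many`) and queried there; here an in-memory filter mimics a simple query so the sketch runs without a database. All function names are illustrative.

```python
import os
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
from typing import Dict, Iterable, List, Set

LOG_SUFFIX = ".bag"  # assumption: logs are ROS bag files


def find_new_logs(log_dir: str, processed: Set[str]) -> List[str]:
    """List log files under log_dir that a previous cron run has not processed yet."""
    found = []
    for root, _dirs, files in os.walk(log_dir):
        for name in files:
            path = os.path.join(root, name)
            if name.endswith(LOG_SUFFIX) and path not in processed:
                found.append(path)
    return sorted(found)


def extract_metadata(path: str) -> Dict:
    """Placeholder for the expensive per-log step (up to 24 h for the longest logs).

    A real implementation would parse the sensor streams; this one only records
    file-level facts so the sketch stays runnable.
    """
    info = os.stat(path)
    return {
        "session": os.path.splitext(os.path.basename(path))[0],
        "path": path,
        "size_bytes": info.st_size,
        "modified": info.st_mtime,
    }


def process_in_parallel(paths: List[str], workers: int = 4,
                        use_processes: bool = True) -> List[Dict]:
    """Process logs concurrently instead of one after another (results keep input order)."""
    pool_cls = ProcessPoolExecutor if use_processes else ThreadPoolExecutor
    with pool_cls(max_workers=workers) as pool:
        return list(pool.map(extract_metadata, paths))


def filter_sessions(records: Iterable[Dict], min_size_bytes: int = 0,
                    name_contains: str = "") -> List[Dict]:
    """In-memory stand-in for a MongoDB query such as {"size_bytes": {"$gte": n}}."""
    return [
        r for r in records
        if r["size_bytes"] >= min_size_bytes and name_contains in r["session"]
    ]
```

A cron-triggered script would call `find_new_logs` with the paths already present in the database, pass the result to `process_in_parallel`, and insert the new records. On a cluster, each log would more likely become its own batch job, so that one 24-hour log does not block the others.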

Business impact

With growing demand for ride-hailing services, autonomous vehicle technology is expected to help solve transportation problems on an increasingly broad scale.

Benefits

Large collections of European HPC success stories are available on the FF4EuroHPC and EuroCC webpages.