Success stories – Eurocc-Estonia

What is the impact of high-performance computing (HPC) on companies and business?

On this page we introduce Estonian companies and academic users who have already benefited from the use of HPC.

Larger collections of inspiring HPC success stories from Europe are available on the FF4EuroHPC and EuroCC webpages.

A DNA test to detect counterfeit honey

Industrial organisations involved

Celvia is an Estonian company and research institution owned by the University of Tartu, the University of Tartu Hospital and the Nova Vita Clinic, among others. Celvia’s mission is to develop and deliver innovative science-based products and services for patients, clinicians and businesses. The company combines new knowledge with state-of-the-art technology and unique skills, resulting in competitive genetic tests that meet high standards. They collaborate with researchers and clinicians from Estonia and abroad to ensure the high quality of basic research.

Technical/scientific challenge

Honey contains approximately 0.01% DNA, which can be extracted and sequenced to produce up to 20 million DNA sequencing reads. These sequences can be computationally analysed using a taxonomic classifier to get insights into the biological composition of honey, aid in monitoring honeybee pathogens, and assess authenticity. However, this computational analysis is resource-intensive as the taxonomic classifier alone relies on a database approximately 400 GB in size, and the analysis involves multiple steps. The company needs to process a high volume of samples on a weekly basis.

Solution

Using high-performance computing (HPC), the company processes 32 samples in parallel through a Singularity-containerized Nextflow workflow comprising seven subtasks. The most resource-intensive subtask demands 400 GB of RAM and 8 CPUs per sample. To streamline operations, they have automated the detection and downloading of new data and the initiation of analysis using crontab. Additionally, HPC resources are used to securely store raw data and analysis results, ensuring data integrity and confidentiality with backup systems in place.
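The automated detection step described above can be sketched as follows. This is a minimal illustration, not Celvia's actual script: the directory layout, the `.done` marker files and all function names are assumptions made for the example. A script like this could be run periodically from crontab, with each batch handed on to the workflow.

```python
from pathlib import Path

# Figures taken from the text; everything else here is hypothetical.
SAMPLES_IN_PARALLEL = 32
RAM_PER_SAMPLE_GB = 400  # heaviest subtask only


def find_new_samples(incoming: Path, done_dir: Path) -> list[Path]:
    """Return sample directories that have no '<name>.done' marker yet."""
    processed = {marker.stem for marker in done_dir.glob("*.done")}
    return sorted(
        d for d in incoming.iterdir() if d.is_dir() and d.name not in processed
    )


def mark_done(sample: Path, done_dir: Path) -> None:
    """Record that a sample has been submitted, so later runs skip it."""
    (done_dir / f"{sample.name}.done").touch()


def batches(items: list, size: int = SAMPLES_IN_PARALLEL):
    """Yield consecutive chunks of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def peak_ram_gb(n_samples: int, ram_per_sample_gb: int = RAM_PER_SAMPLE_GB) -> int:
    """Upper bound on RAM, assuming every sample runs the heaviest subtask at once."""
    return n_samples * ram_per_sample_gb
```

Under the assumption that all 32 samples reach the heaviest subtask simultaneously, `peak_ram_gb(32)` gives 12,800 GB, which illustrates why the workload is placed on a cluster rather than a single workstation.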

Business impact

Honey is one of the most commonly counterfeited foods, creating a significant global challenge. Counterfeit honey is produced at a fraction of the cost of authentic honey, driving down market prices and threatening the livelihoods of honest beekeepers. To address this issue, HPC solutions offer the computational power necessary to analyse large volumes of honey samples on a weekly basis. These analyses help combat honey fraud by verifying authenticity and providing valuable taxonomic insights, enabling beekeepers to better understand and protect the quality of their honey and monitor the health of honeybees.

Benefits

HPC computational resources enable rapid processing of large volumes of weekly samples for the honey analysis service, reducing the time needed to report results
Automation of the analysis process eliminates the need for specialists, allowing office staff to simply download results when notified
With petabytes of secure storage, HPC ensures safe data storage, including backups and access restrictions

Driving Efficiency in Radiology Workflows with Intelligent Insights

Industrial organisations involved

Better Medicine is an Estonian medical technology company founded in 2020 that focuses on applying artificial intelligence in radiology to detect oncological findings. Unlike traditional single-organ AI solutions, Better Medicine adopts a holistic, multi-organ approach, streamlining cancer diagnostics to aid in detecting primary tumours, metastases, and unidentified disease. The company’s mission is to transform the landscape of cancer diagnostics and radiology, reducing the burden on professionals and improving outcomes of cancer patients.

Technical/scientific challenge

Better Medicine faces the challenge of efficiently training advanced AI models for lesion detection, measurement, and segmentation from CT scans. This requires substantial computational power to handle large datasets, optimize model performance, and reduce training time.

Solution

Using the HPC Centre at the University of Tartu, Better Medicine leverages high-performance GPU-enabled nodes and scalable infrastructure to train deep learning models on large imaging datasets. The computational resources enable faster experimentation, precise parameter optimization, and the development of robust AI models ready for clinical use.

Business impact

HPC resources significantly enhance the speed and accuracy of Better Medicine’s AI models, enabling rapid iteration cycles and model improvements. These models accelerate radiological workflows by automating lesion segmentation and measurement, addressing the global radiologist shortage, and expediting patient care. The adoption of HPC supports scalability, ensuring efficient handling of increasing data volumes as Better Medicine expands its solutions to broader clinical applications.

Benefits

Machine Translation for Low-resource Finno-Ugric Languages

Scientific partners involved

The University of Tartu is the leading university in the Baltics in the field of information technology and is ranked among the 250 best universities in the world for Computer Science by Times Higher Education. The research directions of the University’s Chair of Natural Language Processing include machine learning and data mining on text data, machine translation from one natural language into another, summarization of long text documents, and automatic analysis of grammatical correctness, word meanings and sentence structure.

Technical/scientific challenge

Neural networks have brought rapid improvements in output quality for many natural language processing tasks, including neural machine translation (NMT). However, the output quality crucially depends on the availability of large amounts of parallel and monolingual data for the covered languages. Due to a lack of training material, several lower-resource Finno-Ugric languages are not included in the existing massively multilingual models. In terms of the number of speakers, they range from 20 near-native speakers of Livonian to several hundred thousand speakers of Mordvinic languages.

Solution

An NMT system was developed between 20 low-resource Finno-Ugric languages and 7 high-resource languages. Monolingual corpora were collected mainly by crawling texts from the web and combining them with pre-existing corpora. Three main categories of texts can be distinguished: news, Wikipedia, and biblical. All the NMT systems were trained on the LUMI supercomputer. All models were fine-tuned with the Fairseq framework implementation of M2M-100 for 350k updates with a batch size of 3840 tokens (the number was chosen to match earlier versions of models trained with the Hugging Face implementation of M2M-100). The models were fine-tuned on 4 AMD MI250X GPUs.
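To give a feel for the scale of the fine-tuning run, the figures above can be turned into a rough token count. This is back-of-the-envelope arithmetic only: the text does not say whether the 3840-token batch is per GPU or across all GPUs, so both readings are shown, and the function name is made up for the example.

```python
UPDATES = 350_000          # "350k updates" from the text
TOKENS_PER_BATCH = 3840    # batch size in tokens from the text
GPUS = 4                   # number of GPUs from the text


def tokens_seen(updates: int, tokens_per_batch: int, workers: int = 1) -> int:
    """Total tokens processed, ignoring padding and gradient accumulation."""
    return updates * tokens_per_batch * workers


# Reading 1 (assumption): 3840 tokens is the global batch per update.
global_batch_total = tokens_seen(UPDATES, TOKENS_PER_BATCH)

# Reading 2 (assumption): 3840 tokens is the batch on each of the 4 GPUs.
per_gpu_batch_total = tokens_seen(UPDATES, TOKENS_PER_BATCH, workers=GPUS)

print(f"{global_batch_total:,} tokens if the batch is global")
print(f"{per_gpu_batch_total:,} tokens if the batch is per GPU")
```

Either way, each model sees on the order of billions of tokens during fine-tuning, which is consistent with the need for a system such as LUMI.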

Scientific impact

More than 20 Finno-Ugric languages can now be translated using the machine translation engine of the University of Tartu. Most of these languages were added to a public translation engine for the first time. The translation engine allows researchers to translate materials that would otherwise be incomprehensible to them. It provides an opportunity to better study the history of languages and regions without knowing the respective language. Moreover, since most Finno-Ugric languages are not widely spoken today, a translation engine is necessary to preserve these languages.

Benefits

Collection of parallel and monolingual corpora that can be used for training NMT systems for 20 low-resource Finno-Ugric languages
Expansion of the 200-language translation benchmark FLORES-200 with manual translations into nine new languages (Komi, Udmurt, Hill and Meadow Mari, Erzya, Livonian, Mansi, Moksha and Livvi Karelian)
The collected data can be used to create NMT systems for the included languages and investigate the impact of back-translation data on the NMT performance for low-resource languages

Cosmic Ray-based Solutions for 3D Imaging

Industrial organisations involved

GScan was founded in 2018 to revolutionise the inspection, security and medical scanning markets using Muon Flux Technology (MFT). As the pioneer of MFT, with unique IP and technical and sales know-how in the field, GScan is developing a new generation of Non-Destructive Testing (NDT) scanners and tomography systems for infrastructure management applications.

Technical/scientific challenge

To keep the surrounding environment safe and ensure its longevity, careful assessment, maintenance and investment plans are required. However, currently there is no efficient way of obtaining the information required for more efficient use of assets and reducing risks for critical infrastructure.

Solution

Capitalising on the power of natural cosmic ray tomography, the technology tracks the trajectory changes or absorption of particles (muons, electrons, positrons) as they pass through the object of interest, thereby extracting crucial statistics about its material and shape. These insights are then translated into 2D and 3D visualisations of both internal and external geometries, along with data on chemical composition. The comprehensive output we deliver provides in-depth insights into the objects and materials under scrutiny – all meticulously tailored to fulfil our customers’ unique requirements. HPC plays an important role in translating the collected data into visualisations.

Business impact

With time- and space-related digital data measured in terabytes, the detailed reconstruction process enables us to see inside structures, which was not possible before.

Benefits

With HPC we can process the data and do our reconstructions faster
With faster reconstruction it is possible to apply a wider range of algorithms during post-processing
With a wider range of algorithms, the capability and efficiency of the technology grow, and with better muon flux technology the world can become safer thanks to more reliable data about critical infrastructure

Machine Translation Post-Editing

Industrial organisations involved

Luisa Tõlkebüroo OÜ is the biggest translation agency in Estonia. The company offers more than 50 services – including sworn translation, simultaneous and consecutive interpretation, layout work, machine translation and post-editing, subtitling and localisation.

Technical/scientific challenge

The company needed a custom-made machine translation system to reduce translation time. As the company had no previous experience in either natural language processing or machine learning, they collaborated with the TartuNLP team.

Solution

The machine translation model was trained on the Rocket cluster of the University of Tartu HPC Centre.

Business impact

Thanks to rapid advances in the technology and extensive translation memory, the company is able to offer machine translations with post-editing in a range of language combinations and on a range of topics.

Benefits

An accurate AI-based Cloud Mask Processor for Sentinel-2

Industrial organisations involved

KappaZeta is a science-driven remote sensing company aiming to make space a valuable asset for everyone. KappaZeta’s expertise is in using SAR (radar) satellite data, incorporating it with optical satellite data and providing some of the most accurate AI models on the market. The key area of focus is agriculture.

Technical/scientific challenge

Cloud masking is an essential step for the pre-processing of optical satellite imagery. KappaZeta addresses the problem by introducing KappaMask, an AI-based cloud and cloud shadow masking processor for Sentinel-2, which carries an optical instrument payload that samples 13 spectral bands. As a cloud detector, KappaMask uses a large convolutional segmentation model. Faster model convergence during training can be achieved by using larger batch sizes of the training data, which means more GPU memory is needed. Additionally, faster CPUs are required for shorter data loading times to increase the training speed even further.

Solution

KappaMask was trained on an open-source dataset and fine-tuned on a Northern European terrestrial dataset which was labelled manually using the active learning methodology. The training was performed on the high-performance compute nodes of the University of Tartu HPC Centre. Powerful GPUs and CPUs were applied to substantially speed up the training of the model.

Business impact

KappaMask is an open-source project. All the results, final software and source code will be freely and openly distributed on GitHub. Openness and accessibility of the software should directly translate into greater usage.

Benefits

Self-driving technology for a Level 4 autonomous car

Industrial organisations involved

Bolt is an Estonian mobility company headquartered in Tallinn that offers vehicle-for-hire, micromobility, car-sharing, and food delivery services in over 400 cities across more than 45 countries. In partnership with the University of Tartu, the company develops self-driving technology for a Level 4 autonomous car.

Technical/scientific challenge

Autonomous cars acquire up to 357 GB/hour of data during test drives. Autonomous car engineers needed a system to store and easily access those test logs.
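The 357 GB/hour figure can be converted into a sustained write rate and a storage estimate. The sketch below is plain arithmetic using decimal units (1 GB = 1000 MB); the eight-hour test day in the usage line is a hypothetical value chosen only for illustration.

```python
PEAK_GB_PER_HOUR = 357  # upper bound quoted in the text


def sustained_rate_mb_s(gb_per_hour: float) -> float:
    """Average data rate in MB/s needed to keep up with the given GB/hour."""
    return gb_per_hour * 1000 / 3600


def storage_tb(gb_per_hour: float, hours: float) -> float:
    """Storage in TB consumed by `hours` of driving at the given rate."""
    return gb_per_hour * hours / 1000


rate = sustained_rate_mb_s(PEAK_GB_PER_HOUR)
day = storage_tb(PEAK_GB_PER_HOUR, hours=8)  # hypothetical 8-hour test day
print(f"about {rate:.1f} MB/s sustained, {day:.3f} TB per 8 hours of driving")
```

At the peak rate this works out to roughly 99 MB/s, so a single vehicle can fill several terabytes within a few days of testing.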

Solution

Acquired test logs are copied to HPC storage, into an appropriately guarded directory. A cron job regularly processes those log files into metadata stored in a MongoDB database. Processing is distributed over the cluster and happens in parallel. The longest logs can take up to 24 hours to process, so processing them sequentially would be very time-consuming. On top of MongoDB sits a custom-made application that allows test sessions to be filtered and browsed using the Webviz visualization tool. The visualization tool accesses the raw sensor data from HPC storage.
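The pattern of turning log files into filterable metadata in parallel can be sketched as follows. This is a simplified stand-in, not the project's pipeline: a thread pool on one machine replaces distribution over the cluster, the metadata fields and the `*.log` file pattern are invented for the example, and the resulting records are kept in a list rather than written to MongoDB.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def extract_metadata(log_path: Path) -> dict:
    """Build one metadata record per log file (fields are hypothetical)."""
    stat = log_path.stat()
    return {
        "session": log_path.stem,
        "path": str(log_path),      # lets a viewer locate the raw data later
        "size_bytes": stat.st_size,
        "modified": stat.st_mtime,
    }


def process_logs(log_dir: Path, pattern: str = "*.log", workers: int = 4) -> list[dict]:
    """Process all matching logs concurrently; results follow sorted path order."""
    paths = sorted(log_dir.glob(pattern))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(extract_metadata, paths))


def filter_sessions(records: list[dict], min_size_bytes: int = 0) -> list[dict]:
    """Example filter of the kind a browsing application might offer."""
    return [r for r in records if r["size_bytes"] >= min_size_bytes]
```

In a real deployment the records returned by `process_logs` would be stored in a database so that the browsing application can query them, and each log would be processed as a separate cluster job, since a single log can take many hours.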

Business impact

With the growing demand for ride-hailing services, autonomous vehicle technology will provide a solution for transportation problems on an increasingly broader scale.