HSE University Develops Tool for Assessing Text Complexity in Low-Resource Languages

An installation at the National Library of the Republic of Tatarstan celebrating the history of Tatar writing, featuring symbols from various alphabets

© Wikimedia Commons

Researchers at the HSE Centre for Language and Brain have developed a tool for assessing text complexity in low-resource languages. The first version supports several of Russia’s minority languages, including Adyghe, Bashkir, Buryat, Tatar, Ossetian, and Udmurt. This is the first tool of its kind designed specifically for these languages, taking into account their unique morphological and lexical features.

According to the Institute of Linguistics of the Russian Academy of Sciences, 155 languages are spoken in Russia. Some of them are used by relatively small communities: around 80,000 people speak Adyghe, while Buryat, Ossetian, and Udmurt each have between 250,000 and 350,000 speakers. Other languages, such as Bashkir and Tatar, have more than one million native speakers. All of these languages hold official status in various republics of Russia, making it essential not only to preserve them but also to create conditions for their development, including opportunities for learning and use in education and science.

In 2025, a Presidential Decree approving the Fundamentals of the State Language Policy of the Russian Federation was adopted. It affirms linguistic diversity and outlines a strategy for the development and practical use of the languages spoken by the peoples of Russia. One way to advance these goals is to create digital tools that make working with low-resource languages easier and more accessible.

A team of scientists at the HSE Centre for Language and Brain has developed an online text complexity calculator for quick and easy assessment of text difficulty in several minority languages, taking into account their linguistic features. The calculator is based on Textometr, a tool created by Antonina Laposhina and Maria Lebedeva for evaluating the complexity of Russian-language texts.

The calculator developed by psycholinguists at HSE University evaluates texts across several parameters: word length and frequency based on data from language corpora; the percentage of vocabulary covered by the frequency list (i.e. the share of words in the text that appear among the 5,000 most frequent words in the respective language); and the distribution of parts of speech within the text. In addition, the calculator considers factors such as lexical density and diversity, as well as the text's narrativity and descriptiveness.
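To make some of these parameters concrete, here is a minimal Python sketch of three of them: mean word length, frequency-list coverage, and lexical diversity. It is an illustration only, not the calculator's actual code. The function names, the naive letter-based tokenizer, and the use of a type-token ratio as the diversity measure are all assumptions, and the frequency list is passed in as a plain set rather than loaded from a real corpus.

```python
import re


def tokenize(text):
    """Naive tokenizer (assumption): lowercase runs of Unicode letters."""
    return re.findall(r"[^\W\d_]+", text.lower())


def mean_word_length(tokens):
    """Average number of characters per token."""
    if not tokens:
        return 0.0
    return sum(len(t) for t in tokens) / len(tokens)


def frequency_coverage(tokens, frequent_words):
    """Share of tokens found in a frequency list.

    `frequent_words` stands in for a list such as the 5,000 most
    frequent words of the language; here it is just a set of strings.
    """
    if not tokens:
        return 0.0
    covered = sum(1 for t in tokens if t in frequent_words)
    return covered / len(tokens)


def type_token_ratio(tokens):
    """Lexical diversity as unique tokens divided by all tokens."""
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


if __name__ == "__main__":
    sample = tokenize("The cat saw the dog")
    toy_frequency_list = {"the", "cat"}  # toy data, not a real frequency list
    print(mean_word_length(sample))
    print(frequency_coverage(sample, toy_frequency_list))
    print(type_token_ratio(sample))
```

A real implementation would also need lemmatization or morphological analysis before the lookup, since in morphologically rich languages the inflected forms found in a text rarely match frequency-list entries directly.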

The key innovation is the use of the Flesch Reading Ease formula, adapted separately for each language, making it possible to assess text complexity and readability more accurately. 

The Flesch score is based on the number of words, sentences, and syllables, but the original coefficients were developed for English and do not work well for structurally different languages—such as the polysynthetic Adyghe language, in which the average word is much longer. In a 2025 study, Uliana Petrunina and Nina Zdorova recalculated the formula’s coefficients specifically for Adyghe, which significantly improved the accuracy of the readability assessment.
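For reference, the original English formula is 206.835 − 1.015 × (words / sentences) − 84.6 × (syllables / words). The Python sketch below shows one way such a formula could be made configurable per language. The class and function names are hypothetical, the defaults are the original English coefficients, and the recalculated Adyghe coefficients are not reproduced here because the article does not state them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FleschCoefficients:
    """Coefficients of a Flesch-style formula.

    The defaults are the original English values; a language-specific
    adaptation would supply its own (not given in the article).
    """
    intercept: float = 206.835
    sentence_weight: float = 1.015   # weight of average sentence length in words
    syllable_weight: float = 84.6    # weight of average syllables per word


def flesch_reading_ease(n_words, n_sentences, n_syllables,
                        coef=FleschCoefficients()):
    """Compute a Flesch-style score from raw counts (higher means easier)."""
    if n_words <= 0 or n_sentences <= 0:
        raise ValueError("text must contain at least one word and one sentence")
    words_per_sentence = n_words / n_sentences
    syllables_per_word = n_syllables / n_words
    return (coef.intercept
            - coef.sentence_weight * words_per_sentence
            - coef.syllable_weight * syllables_per_word)


if __name__ == "__main__":
    # 100 words, 5 sentences, 150 syllables, English coefficients
    print(round(flesch_reading_ease(100, 5, 150), 3))
```

With the English defaults, a language whose average word has many syllables is penalised heavily by the syllable term, which is why the coefficients need to be refitted for a language such as Adyghe. In this sketch, that would amount to passing a different `FleschCoefficients` instance.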

'The parameters of our calculator are adapted to the structural features of each of the six low-resource languages of Russia, using text corpora as well as frequency and morphological analyses. We also adapted the classic Flesch Reading Ease score. As a result, the algorithm can be easily reconfigured for other low-resource languages, regardless of their typological characteristics,' explains Uliana Petrunina, Research Fellow at the HSE Centre for Language and Brain and one of the developers of the tool.

The tool will help create comparable stimulus materials for linguistic experiments and provide teachers with a resource for selecting high-quality educational materials by difficulty level. This solution represents an important contribution to the preservation and development of Russia’s minority languages and to supporting the country’s linguistic diversity. 

'Our tool allows researchers and teachers to select materials based on their linguistic complexity, which is particularly important for research and education in languages with limited resources,' says Nina Zdorova, one of the creators of the tool.

Future versions are expected to include additional low-resource languages that are underrepresented in linguistics, both in Russia and beyond.
