New Clustering Method Simplifies Analysis of Large Data Sets

Researchers from HSE University and the Institute of Control Sciences of the Russian Academy of Sciences have proposed a new method of data analysis: tunnel clustering. It allows for the rapid identification of groups of similar objects and requires fewer computational resources than traditional methods. Depending on the data configuration, the algorithm can operate dozens of times faster than its counterparts. The study was published in the journal Doklady Rossijskoj Akademii Nauk. Mathematika, Informatika, Processy Upravlenia.
Each year, the volume of information requiring processing continues to grow. Data comes from a variety of sources: scientific research, financial reports, medical examinations, and many others. Clustering methods—which group data based on similar characteristics—are used to detect patterns and organise information within such large datasets. These groupings are known as clusters.
One of the most widely used clustering methods is the k-means algorithm. It divides data into a predetermined number of clusters, initially selecting their centres (centroids). However, this method has a limitation: the number of clusters must be known beforehand, which is not always possible when dealing with complex data. Scientists from HSE University and the V.A. Trapeznikov Institute of Control Sciences have proposed a new approach to simplify this process—tunnel clustering. Unlike the k-means method, this algorithm does not require the number of clusters to be set in advance; it determines the necessary number itself by analysing the data structure.
‘The algorithm forms “tunnels” in the data—regions in multidimensional space where objects with similar characteristics group together,’ explained Fuad Aleskerov, Head of the Department of Mathematics at the HSE Faculty of Economic Sciences. ‘Users can choose from three modes of operation: with fixed cluster boundaries, with adaptive boundaries that adjust to the data structure, or a combined approach. This makes the method flexible and suitable for various types of tasks.’
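The fixed-boundary mode described above can be pictured with a small Python sketch. This is an illustration under stated assumptions, not the published algorithm: it assumes each characteristic is cut at user-supplied thresholds and that objects falling into the same combination of intervals share one "tunnel". The function name `fixed_boundary_clusters` and the sample data are hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict


def fixed_boundary_clusters(objects, boundaries):
    """Group objects by the tuple of intervals their features fall into.

    Assumption (not taken from the paper): boundaries[j] is a sorted list
    of cut points for feature j, and two objects belong to the same
    cluster when every feature lands in the same interval.
    """
    clusters = defaultdict(list)
    for index, features in enumerate(objects):
        # Interval index of each feature value among that feature's cut points.
        key = tuple(
            bisect_right(cuts, value)
            for cuts, value in zip(boundaries, features)
        )
        clusters[key].append(index)
    return dict(clusters)


# Hypothetical data: five objects with two characteristics each.
objects = [(0.1, 0.2), (0.15, 0.25), (0.9, 0.8), (0.95, 0.85), (0.1, 0.9)]
# One cut point per characteristic, chosen by hand for the illustration.
boundaries = [[0.5], [0.5]]

clusters = fixed_boundary_clusters(objects, boundaries)
for key, members in clusters.items():
    print(key, members)
```

In this sketch the number of clusters is not passed in. It emerges from how many interval combinations the data actually occupy, and the grouping takes a single pass over the objects.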
The method was tested on a synthetic (artificially generated) dataset of 100,000 objects, as well as on real-world tasks in public administration and the banking sector.

The main advantage of the new method is its speed. Unlike classical algorithms that demand significant computational resources, tunnel clustering can, depending on the data configuration, perform the analysis dozens of times faster.
In addition, the researchers introduced the concept of the ‘transition degree’—a parameter indicating how many characteristics of an object must change for it to be classified into a different cluster. This helps assess the clarity of cluster boundaries and identify objects situated at the intersection of different groups.
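One way to make the idea of a transition degree concrete is the sketch below. It rests on an assumption that is not confirmed by the article: each cluster is labelled by a tuple of interval indices, one per characteristic, so the number of characteristics that must change to reach another cluster is the number of positions where the two labels differ. The function name `transition_degree` and the labels are hypothetical.

```python
def transition_degree(key, cluster_keys):
    """Smallest number of characteristics that must change to leave `key`.

    Assumption (illustrative only): clusters are identified by tuples of
    interval indices, so moving between clusters means changing every
    position where the two tuples disagree.
    """
    others = [other for other in cluster_keys if other != key]
    if not others:
        return None  # only one cluster exists, so no transition is possible
    return min(
        sum(a != b for a, b in zip(key, other))
        for other in others
    )


# Hypothetical cluster labels over three characteristics.
cluster_keys = [(0, 0, 0), (0, 0, 1), (2, 1, 1)]

degrees = {key: transition_degree(key, cluster_keys) for key in cluster_keys}
for key, degree in degrees.items():
    print(key, degree)
```

Under these assumptions, a low value marks a cluster whose objects sit close to a neighbouring group, while a higher value suggests a more clearly separated one.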
‘People are generating more and more data, and the pace is only accelerating. According to the latest Digital 2025: Global Overview Report, as of early 2025, there were 5.56 billion internet users—nearly 68% of the global population. Adults spend an average of 6 hours and 38 minutes online each day, communicating, working, watching videos, and consuming content,’ said Alexey Myachin, Senior Research Fellow at the HSE International Centre for Decision Choice and Analysis. ‘Companies that ignore data analysis are losing vast sums of money.’
The authors continue to refine the algorithm. Their current work includes research into dimensionality reduction, which is expected to further decrease the time required to identify patterns in data.
The study was carried out with partial support from the Russian Science Foundation.


