Facepalm: Yandex, the fourth-largest search engine in the world, is a genuine technology giant offering a wide range of digital and digitally enhanced services. The company has been hit by a recent security incident that should yield interesting results, at least for the SEO market.
Almost 50 gigabytes of data stolen from Yandex services have recently been posted online. The company is trying to downplay the leak, but the source code distributed via torrent can reveal a lot of useful information about how its services actually work, the search engine in particular.
The leak occurred on January 25 and involves a set of files that were apparently stolen in July 2022 from a repository dating back to February 2022, the month when Russia launched its full-scale invasion of Ukraine. The torrent does not seem to contain any data or pre-built binaries, only the source code of all the main Yandex services, including the search engine with its indexing bot, Maps (the Russian counterpart to Google Maps and Street View), Taxi (an Uber-like service), Mail, Market (an Amazon alternative), the cloud platform and much more.
According to software engineer Arseny Shestakov, the leak is a big deal. “Imagine one company” capable of replacing Google, Uber, Amazon, Netflix and Spotify at once, the coder said. The leak also appears to be genuine: Shestakov spoke with various people who work or have worked at the company, and said that some of the archives contain “modern source code” for Yandex services and documentation pointing to real intranet URLs.
One of the most interesting, and potentially dangerous, aspects of the leak is the source code of the Yandex search engine, specifically the ranking factors the algorithm uses to order results for users’ search queries. The leak lists 1,922 unique ranking factors, most of which are marked as “obsolete” and have probably been replaced in more recent versions of the Yandex code.
The first ranking factor used by the Russian search engine is “PAGE_RANK”, an explicit reference to the most important algorithm Google uses to rank web pages. As for Yandex’s own web search, the leaked algorithm seems to favor pages that are not too old, have a lot of organic traffic (i.e. unique visitors), have optimized code, are hosted on reliable servers, or are Wikipedia pages.
The Yandex leak certainly gives search engine optimization specialists a lot of information about how a world-class search engine actually works, although its security implications are unlikely to be as significant. Shestakov said that no personal data was involved, and that the several API keys in the leak were probably used only for testing.
Yandex’s official press release about the incident states that the leaked code fragments are “outdated and differ from the version currently used” by its services, and some of the published fragments “have never actually been used in work.”
The company is still investigating the seemingly politically motivated incident and says it will take all possible measures to strengthen its management controls so that there are no more leaks in the future.