The Google Algorithm Leak: The TL;DR on what you need to know…


On May 31st, just after we went to press with the weekly AxisOfEasy, the news broke of a massive leak of internal Google API documentation that exposed the inner workings of the search giant’s ranking algorithms.

The leak took place through an errant commit of the files to a public repository, which was then cloned on GitHub. Google later confirmed the authenticity of the materials.

What this means is that after years of speculation, and an entire industry forming around pondering “the mind of Google” as it pertains to ranking in the search results, the world finally caught a glimpse into exactly how that coveted front page and top placement gets determined (or at least how it was determined in the past; the leaked materials may be out of date).

The Main Takeaways:

Over the weekend I listened to quite a few podcasts, read a lot of post-mortems, and talked to a few SEO experts. The loose consensus is that several things Google has said for years do not count as ranking factors actually do count, according to code snippets and comments in this dump.

Here are a few examples:

Domain rank exists

In the olden days, there was such a thing as Google PageRank, which was a score on a scale of 10. The name of the game was to achieve a high PageRank, and also to secure backlinks from other sites with high PageRank.

Then at some point, PageRank “went away”, and Google said there was no domain ranking attribute that affected search engine results. Sites like Ahrefs and Semrush continued to assign domain ranking scores under various labels, and despite Google’s protestations that it didn’t matter any more, the suspicion was that it did.

The leak describes something called siteAuthority, which appears to be a factor in returning search results.

`siteAuthority` (*type:* `integer()`, *default:* `nil`) - site_authority: converted from quality_nsr.SiteAuthority, applied in Qstar.
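If you want to sift through the dump yourself, the attribute entries quoted in this post all follow the same shape: a name, a type, a default, and a description. Here is a minimal Python sketch that pulls those four fields out of one such line. The `Attribute` class, the `parse_attribute` function and the regular expression are my own illustration, not anything from the leak, and the pattern only assumes the line format shown in this post; entries formatted differently will simply return `None`.

```python
import re
from typing import NamedTuple, Optional


class Attribute(NamedTuple):
    """One documented attribute (hypothetical container, not from the leak)."""
    name: str
    type: str
    default: str
    description: str


# Assumed line shape: `name` (*type:* `T`, *default:* `D`) - description
ATTRIBUTE_LINE = re.compile(
    r"`(?P<name>[^`]+)`\s*"
    r"\(\*type:\*\s*`(?P<type>[^`]+)`,\s*"
    r"\*default:\*\s*`(?P<default>[^`]+)`\)\s*-\s*"
    r"(?P<description>.*)"
)


def parse_attribute(line: str) -> Optional[Attribute]:
    """Return the parsed fields, or None if the line does not fit the pattern."""
    match = ATTRIBUTE_LINE.match(line.strip())
    if match is None:
        return None
    return Attribute(
        name=match["name"],
        type=match["type"],
        default=match["default"],
        description=match["description"].strip(),
    )


if __name__ == "__main__":
    # Sample line copied from the entry quoted above.
    sample = (
        "`siteAuthority` (*type:* `integer()`, *default:* `nil`) - "
        "site_authority: converted from quality_nsr.SiteAuthority, applied in Qstar."
    )
    print(parse_attribute(sample))
```

Run over a folder of documentation files, something like this would give you a searchable list of attribute names and descriptions, which is a reasonable starting point before drawing any conclusions about what each one does.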

Small Personal Sites may be penalized vs big brands

Google has an attribute called smallPersonalSite:

`smallPersonalSite` (*type:* `number()`, *default:* `nil`) - Score of small personal site promotion go/promoting-personal-blogs-v1

…and according to the people looking at this, such sites seem to be demoted vs big brand sites. A huge headwind for small, mom-and-pop, indie businesses if this is true.

Yes Virginia, There Is a Sandbox

Whether brand new websites (on freshly registered domains) were included in the main index or “sandboxed” for a period of time, running into months, has long been debated. Google said there was no “sandbox”, but the dump indicates the contrary, as laid out in the file sandbox_config.ex

Clicks count

Google has long said that clicks on search results do not weigh into the subsequent ranking algorithm – in fact they testified under oath to that effect to the DoJ.

Well, apparently they do. The dump shows that clicks were tracked and mapped down to length of time on site.

So it will be interesting to see how this plays out in the legal arena now that Google has basically been outed as having perjured themselves. Probably nothing.

With over 40,000 files to comb through, fresh revelations and discoveries will probably be coming for a long time.

However, despite the headlines, the take from Clint Butler of SEO Intel is more circumspect:

“The recent leak of Google’s API data has provided a fascinating glimpse into the inner workings of their search algorithms, but it’s crucial that we approach these findings with a healthy dose of skepticism.

Before jumping to conclusions, more rigorous testing and validation are needed to truly understand the implications. Relying on leaked data without thorough verification could lead to misguided strategies and missed opportunities.” 

One of the reasons I asked Clint for his take on this is that I’ve been following his work in the SEO field for a little over a year and have always been impressed by the intellectual rigour he brings to the craft – primarily his penchant for testing everything.

So we need to see how this shakes out and whether Google alters their algos as a result of this leak.

Where to Get More Information

The first to receive the leak was Rand Fishkin over on SparkToro. Among the first to dive in and sift through it all was Mike King, from iPullRank, who wrote it up here.

Some good distillations of the findings can be had via the Niche Pursuits: Google Lied! episode and this interview with Rand Fishkin on The Near Memo.

Clint Butler can be reached via SEOIntel

