How Google fought search spam using AI in 2020

By Tilly Kenyon

May 04, 2021


Google has said that Artificial Intelligence (AI) offers ‘unprecedented potential to revolutionise’ spam fighting.

Last year Google built its own spam-fighting AI, which can catch both known and emerging spam trends.

Hacked spam was still widespread in 2020 because the number of vulnerable websites remained large. Even so, Google says it improved its ability to detect hacked spam by more than 50% and removed most of it from search results. It also says it has cut the number of sites with auto-generated and scraped content in search results by more than 80% compared with a couple of years ago.

What is search engine spam?

Search engine spam refers to techniques that try to manipulate the position a website has in search results, often for pages that contain little or no relevant content.

How Google prevents spam from reaching you

A lot happens before Google delivers a set of search results. Every day, Google discovers, crawls and indexes billions of web pages, and it discovers around 40 billion spammy pages each day.

This diagram shows how Google defends against spam.

First, Google has systems that can detect spam when it crawls pages or other content. Crawling is when Google's automated systems visit content and consider it for inclusion in the index used to provide search results.

These systems also work on content Google discovers through sitemaps and Search Console. For example, Search Console has a 'request indexing' feature so that creators can tell Google about new pages that should be added quickly. Google has previously seen spammers hack into vulnerable sites, pose as the owners, verify themselves in Search Console, and then use the tool to ask Google to crawl and index the many spammy pages they had created. Using AI, Google was able to pinpoint suspicious verifications and prevent spam URLs from getting into the index this way.

Next, Google has systems that analyse the content included in the index. When you issue a search, these systems double-check whether the matching content might be spam. If it is, that content won’t appear in the top search results.

As a result, very little spam makes it into the top results anyone sees for a search, thanks to these AI-aided automated systems. Google estimates that they help keep more than 99% of visits from Search completely spam-free. For the small percentage that remains, Google's teams take manual action and use what they learn to further improve the automated systems.

(Image: Google)
