The Google Penguin algorithm, first unleashed in April 2012, is a webspam algorithm designed to target websites that use low-quality link schemes to rank high in Google SERPs. Its impact was immediate.

Penguin Impact Was Widely Felt

The original Penguin update in 2012 targeted ‘webspam’ and impacted many websites and businesses that were ignorant of the risks of web spam.

There were numerous complaints in the comments on Google’s announcement post about the first Penguin update.

Here’s one:

I built a public benefit website that for 8 years has helped thousands and thousands of addicts find addiction treatment for free… We were able to provide the service through the industry paying for featured status for their centers in our directory of treatment centers… Harvard’s addiction hospital links to us, as well as a number of super picky super high-quality websites… A few years back I started to notice crap links pointing at the site… Then Panda came and the site lost half its traffic… We submitted reinclusion requests, we sent notices for sites to remove links to us and informed Google, and now this! Now our site is nowhere to be found, and competitors (who are no doubt paying link companies to take down certain competitors thanks to the algorithms changes) have won! Well done Google… You took a true public benefit site out of the rankings (one with a database of treatment centers more complete than the U.S. government’s), and replaced it with referral sites (of course, the exact keyword match .com) and individual treatment centers that charge people in dire need of help (and usually broke) scores of thousands of dollars for treatment, because they can afford to bring down the real good guys that have been helping people for years by pointing links at them. Well done Google.

The Relationship between Penguin & Panda

Google’s algorithms seem focused on quality, as Google defines it, and Google has published advice on creating a high-quality website that will rank high in SERPs:

What counts as a high-quality site?

Our site quality algorithms are aimed at helping people find “high-quality” sites by reducing the rankings of low-quality content. The recent “Panda” change tackles the difficult task of algorithmically assessing website quality. Taking a step back, we wanted to explain some of the ideas and research that drive the development of our algorithms.

Below are some questions that one could use to assess the “quality” of a page or an article. These are the kinds of questions we ask ourselves as we write algorithms that attempt to assess site quality. Think of it as our take at encoding what we think our users want.

Of course, we aren’t disclosing the actual ranking signals used in our algorithms because we don’t want folks to game our search results; but if you want to step into Google’s mindset, the questions below provide some guidance on how we’ve been looking at the issue:

  • Would you trust the information presented in this article?
  • Is this article written by an expert or enthusiast who knows the topic well, or is it more shallow in nature?
  • Does the site have duplicate, overlapping, or redundant articles on the same or similar topics with slightly different keyword variations?
  • Would you be comfortable giving your credit card information to this site?
  • Does this article have spelling, stylistic, or factual errors?
  • Are the topics driven by genuine interests of readers of the site, or does the site generate content by attempting to guess what might rank well in search engines?
  • Does the article provide original content or information, original reporting, original research, or original analysis?
  • Does the page provide substantial value when compared to other pages in search results?
  • How much quality control is done on content?
  • Does the article describe both sides of a story?
  • Is the site a recognized authority on its topic?
  • Is the content mass-produced by or outsourced to a large number of creators, or spread across a large network of sites, so that individual pages or sites don’t get as much attention or care?
  • Was the article edited well, or does it appear sloppy or hastily produced?
  • For a health related query, would you trust information from this site?
  • Would you recognize this site as an authoritative source when mentioned by name?
  • Does this article provide a complete or comprehensive description of the topic?
  • Does this article contain insightful analysis or interesting information that is beyond obvious?
  • Is this the sort of page you’d want to bookmark, share with a friend, or recommend?
  • Does this article have an excessive amount of ads that distract from or interfere with the main content?
  • Would you expect to see this article in a printed magazine, encyclopedia or book?
  • Are the articles short, unsubstantial, or otherwise lacking in helpful specifics?
  • Are the pages produced with great care and attention to detail vs. less attention to detail?
  • Would users complain when they see pages from this site?

It’s time for Penguin 4.0

Google updates Penguin, says it now runs in real time within the core search algorithm

After a nearly two-year wait, Google’s Penguin algorithm has finally been updated again. It’s the fourth major release, making this Penguin 4.0. It’s also the last release of this type, as Google now says Penguin is a real-time signal processed within its core search algorithm.

Penguin goes real-time

Penguin is a filter designed to catch sites that are spamming Google’s search results in ways that Google’s regular spam-detection systems might not pick up. Introduced in 2012, it has operated on a periodic basis.

In other words, the Penguin filter would run and catch sites deemed spammy. Those sites remained penalized, even if they improved and changed, until the next time the filter ran, which could take months.

The last Penguin update, Penguin 3.0, happened on October 17, 2014. Any sites hit by it have waited nearly two years for the chance to be free.

Those long delays are now a thing of the past, according to Google. With this latest release, Penguin becomes real-time. As Google recrawls and reindexes pages, which happens constantly, those pages will be assessed by the Penguin filter. Pages will be caught and/or freed by Penguin as part of this regular process.

As Google said in its post:

With this change, Penguin’s data is refreshed in real time, so changes will be visible much faster, typically taking effect shortly after we recrawl and reindex a page.


Penguin Update History:

  • Penguin 1.0 – April 24, 2012 (affected 3.1% of searches)
  • Penguin 1.1 – May 26, 2012 (0.1% of searches)
  • Penguin 1.2 – October 5, 2012 (0.3% of searches)
  • Penguin 2.0 – May 22, 2013 (2.3% of searches)
  • Penguin 2.1 – October 4, 2013 (1% of searches)
  • Penguin 3.0 – October 17, 2014 (less than 1% of searches)
  • Penguin 4.0 – September 23, 2016

For Penguin 4.0, Google would not give a specific percentage of queries impacted, mostly because the update is now happening constantly and that percentage will constantly be changing.