
Google On The Percentage Of Duplicate Content

15-minute read · 26 Sep 2022

Google’s John Mueller responds to a question about whether there is a percentage threshold at which Google identifies something as duplicate content.


How much of the content is duplicated?

Google’s John Mueller recently responded to a question about whether Google uses a percentage threshold of content duplication to identify and filter out duplicate content.

What percentage of content is duplicated?

Duane Forrester (@DuaneForrester) asked on Facebook if anyone knew if any search engine had published a percentage of content overlap at which content is considered a duplicate.

Bill Hartzer (@bhartzer) asked John Mueller on Twitter and received an almost immediate response.

Bill tweeted:

“Hey @johnmu is there a percentage that represents duplicate content?

For example, should we be trying to make sure pages are at least 72.6 percent unique to other pages on our site?

Does Google even measure it?”

John Mueller of Google responded:

Bill Hartzer’s tweet on Sep 23, 2022 (Source: @bhartzer) 

Google’s Method for Detecting Duplicate Content

For many years, Google’s methodology for detecting duplicate content has remained remarkably consistent.

Matt Cutts (@mattcutts), a Google software engineer at the time, published an official Google video in 2013 describing how Google detects duplicate content.

He began the video by stating that a large amount of Internet content is duplicated and that this is normal.

“It’s important to realize that if you look at content on the web, something like 25% or 30% of all the web’s content is duplicate content.

…People will quote a paragraph of a blog and then link to the blog, that sort of thing.”

He went on to say that Google will not penalize duplicate content because so much of it is innocent and without spammy intent.

Penalizing websites for having some duplicate content, he said, would harm the quality of search results.

When Google discovers duplicate content, it does the following:

“…try to group it all together and treat it as if it’s just one piece of content.”

Matt went on:

“It’s just treated as something that we need to cluster appropriately. And we need to make sure that it ranks correctly.”

He went on to say that Google then decides which page to show in the search results and filters out duplicate pages to improve the user experience.
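To make that last step concrete, here is a minimal sketch, assuming a duplicate cluster has already been built, of how one URL might be chosen to show in results. The URLs and the tie-break heuristics (prefer HTTPS, prefer shorter URLs) are invented for illustration and are not Google's actual signals.

```python
# Illustrative only: simple heuristics for picking one URL from a cluster
# of duplicate pages. Google's real selection uses many more signals.

def pick_representative(cluster: list[str]) -> str:
    """Return the URL to show in results, preferring HTTPS and shorter URLs."""
    return min(
        cluster,
        key=lambda url: (
            0 if url.startswith("https://") else 1,  # prefer secure pages
            len(url),                                # prefer shorter, cleaner URLs
        ),
    )

duplicates = [
    "http://example.com/post?ref=feed",
    "https://example.com/post",
    "https://example.com/post/print",
]
print(pick_representative(duplicates))  # https://example.com/post
```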

How Google Handled Duplicate Content in 2020

Google is essentially comparing checksums rather than percentages (Source: Internet)

In 2020, Google released a Search Off the Record podcast episode in which the same topic is described in eerily similar language.

Here is the relevant section of that podcast, starting at the 06:44 mark:

“Gary Illyes: And now we ended up with the next step, which is actually canonicalization and dupe detection.

Martin Splitt: Isn’t that the same, dupe detection and canonicalization, kind of?

Gary Illyes: [00:06:56] Well, it’s not, right? Because first, you have to detect the dupes, basically cluster them together, saying that all of these pages are dupes of each other,

and then you have to basically find a leader page for all of them.

…And that is canonicalization.

So, you have the duplication, which is the whole term, but within that, you have cluster building, like dupe cluster building, and canonicalization.”

Gary then goes into technical detail about how they do it. Essentially, Google is comparing checksums rather than percentages.

A checksum is a compact representation of content as a sequence of numbers and letters. If two pieces of content are identical, their checksums will be identical as well.

Gary explained it this way:

“So, for dupe detection what we do is, well, we try to detect dupes.

And how we do that is perhaps how most people at other search engines do it, which is, basically, reducing the content into a hash or checksum and then comparing the checksums.”

Gary stated that Google does it this way because it is simpler and more accurate.
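As a rough sketch of the approach Gary describes (reduce the content to a checksum, then compare checksums), the snippet below normalizes each page's text, hashes it, and groups URLs whose hashes match into dupe clusters. SHA-256 and the normalization steps here are stand-ins chosen for illustration; Google has not published its exact details.

```python
import hashlib
from collections import defaultdict

def checksum(text: str) -> str:
    """Reduce page content to a checksum (SHA-256 used as a stand-in)."""
    normalized = " ".join(text.lower().split())  # collapse whitespace, ignore case
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

def build_dupe_clusters(pages: dict[str, str]) -> list[list[str]]:
    """Group URLs whose content produces the same checksum."""
    clusters = defaultdict(list)
    for url, content in pages.items():
        clusters[checksum(content)].append(url)
    return [urls for urls in clusters.values() if len(urls) > 1]

pages = {
    "https://example.com/a": "Ducks are birds.  ",
    "https://example.com/b": "ducks are birds.",
    "https://example.com/c": "Geese are also birds.",
}
print(build_dupe_clusters(pages))
# [['https://example.com/a', 'https://example.com/b']]
```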

Google uses checksums to detect duplicate content

So, when it comes to duplicate content, it is probably not a matter of a percentage threshold, with some number above which content counts as duplicated.

Instead, duplicate content is detected using a checksum representation of the content, and then those checksums are compared.

Another thing to note is that there appears to be a distinction between when part of the content is duplicated and when all of the content is duplicated.
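That distinction can also be illustrated with checksums: a checksum of the whole document only catches exact duplicates, while checksumming smaller chunks (paragraphs, in this invented example) gives a rough picture of partial overlap. This is a simplified illustration, not a description of Google's pipeline.

```python
import hashlib

def para_checksums(text: str) -> set[str]:
    """Checksum each paragraph separately so partial overlap becomes visible."""
    paragraphs = [p.strip().lower() for p in text.split("\n\n") if p.strip()]
    return {hashlib.sha256(p.encode("utf-8")).hexdigest() for p in paragraphs}

def overlap(text_a: str, text_b: str) -> float:
    """Share of paragraphs in A that also appear verbatim in B."""
    a, b = para_checksums(text_a), para_checksums(text_b)
    return len(a & b) / len(a) if a else 0.0

original = "Intro paragraph.\n\nA quoted paragraph.\n\nClosing thoughts."
quoting_page = "My own intro.\n\nA quoted paragraph.\n\nMy own conclusion."
print(f"{overlap(original, quoting_page):.0%}")  # 33% of the original reused
```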

With the information EverRanks has compiled here on how Google handles the percentage of duplicate content, you should be able to plan more relevant and effective content for your site. Follow our SEO company for more valuable information.
