Duplicate content checker / Plagiarism detection.

Updates: 1. The duplicate content checker can now process plain text input in addition to URL input. 2. By clicking the advanced options box, you can choose to search for duplicate content based on multiple data points (text selections). 3. I also tweaked the way the returned results are presented.

Use the duplicate content checker to find internal and external duplicate content for a specific webpage. Duplicate content is an important SEO issue, because search engines try to filter out as many duplicates as possible to offer the best search experience. This tool is able to detect two types of (text based) duplicate content.

Duplicate content types:

  1. Internal duplicate content. This means the same text is found on multiple pages on the same domain.
  2. External duplicate content. In this case the same text is found on multiple domains.
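
The distinction between the two types can be sketched in a few lines of Python. This is only an illustration, not the tool's actual implementation: the function names `hostname` and `classify` are hypothetical, and comparing hostnames (ignoring a leading "www.") is a simplifying assumption, so subdomains are treated as separate sites here.

```python
from urllib.parse import urlsplit


def hostname(url: str) -> str:
    """Return the lowercase hostname of a URL, without a leading 'www.'."""
    host = (urlsplit(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def classify(input_url: str, found_url: str) -> str:
    """Label a found URL relative to the page that was checked.

    Assumption: two URLs are 'internal' to each other when their
    hostnames are equal after dropping a leading 'www.'.
    """
    if found_url == input_url:
        return "Input URL"
    if hostname(found_url) == hostname(input_url):
        return "Internal duplicate"
    return "External duplicate"
```

For example, `classify("http://www.example.com/a", "http://example.com/b")` labels the second URL as an internal duplicate, while a URL on any other hostname is labeled as an external duplicate.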

Why is it important to prevent duplicate content?

As mentioned above, search engines don’t like duplicate content / plagiarism, because users aren’t interested in a search results page containing multiple URLs that all show more or less the same content. To prevent this from happening, search engines try to determine the original source, so they can show this URL for a relevant search query and filter out all the duplicates. Search engines do a pretty good job at filtering duplicates, but determining the original webpage is still difficult. When the same block of text appears on multiple websites, the algorithm may decide to show the page with the highest authority / highest trust in the search results, even though this isn’t the original source.

If Google detects duplicate content that is intended to manipulate rankings or deceive users, Google will make ranking adjustments (Panda filter), or may remove the site entirely from the Google index and search results.

How does the duplicate content checker work?

  • Find indexed duplicate content, using URL or TEXT input.
    • Use URL input to extract the main article content / text found in the body of a web page. Navigational elements are removed, to reduce noise (otherwise a lot of pages would be falsely identified as internal duplicates.)
    • Use text input to get more control over the input.
  • Select advanced options to choose one or multiple data points, used to detect duplicate pages. Selecting multiple data points will get you more specific, better matching results.
    (These data points are automatically extracted from the page content or text input.)
  • Similar content is extracted, returned and marked as: Input URL, Internal duplicate, External duplicate.
  • Export the results to .CSV and use Excel / OpenOffice spreadsheet to view, edit or report your results.
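
To make the idea of data points more concrete, here is a rough Python sketch. It is not how the tool itself works (the extraction method is not documented on this page). It assumes, purely for illustration, that a data point is one of the longest sentences in the input and that a candidate page counts as a match only when it contains every selected data point. The function names are hypothetical.

```python
import re


def extract_data_points(text: str, count: int = 2) -> list[str]:
    """Pick the `count` longest sentences as query phrases (assumed heuristic)."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    return sorted(sentences, key=len, reverse=True)[:count]


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so layout differences do not matter."""
    return " ".join(text.lower().split())


def is_duplicate(candidate_text: str, data_points: list[str]) -> bool:
    """A candidate matches only if it contains every data point."""
    haystack = normalize(candidate_text)
    return all(normalize(point) in haystack for point in data_points)
```

Under these assumptions, a page that copies only one of two selected sentences matches a 1 data point query but not a 2 data point query, which is why adding data points narrows the results.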

How to use these results?

Internal duplicates

In most cases you’ll start by solving internal duplicate issues, because these problems exist in your own controlled environment (your website). Different methods can be used to remove internal duplicates, depending on the nature of the problem.

Some examples:

  1. Minimize boilerplate repetition
  2. Use a 301 permanent redirect
  3. Use a canonical tag
  4. Use Parameter Handling in Google Webmaster Tools
  5. Prevent a URL from being indexed.
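
Methods 3 and 5 are usually implemented with a `<link rel="canonical" href="...">` element and a `<meta name="robots" content="noindex">` element in the page head. As a small illustration, the Python sketch below checks whether a page already carries these signals. It uses only the standard library `html.parser` module; the class name `DuplicateSignals` and the helper `inspect_page` are hypothetical names chosen for this example.

```python
from html.parser import HTMLParser


class DuplicateSignals(HTMLParser):
    """Collect the canonical URL and the robots noindex directive from HTML."""

    def __init__(self) -> None:
        super().__init__()
        self.canonical = None   # href of <link rel="canonical">, if present
        self.noindex = False    # True if <meta name="robots"> contains noindex

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        a = {key: (value or "") for key, value in attrs}
        if tag == "link" and "canonical" in a.get("rel", "").lower().split():
            self.canonical = a.get("href")
        elif tag == "meta" and a.get("name", "").lower() == "robots":
            self.noindex = "noindex" in a.get("content", "").lower()


def inspect_page(html: str) -> DuplicateSignals:
    """Parse an HTML string and return the collected signals."""
    parser = DuplicateSignals()
    parser.feed(html)
    return parser
```

Running `inspect_page` on the HTML of each internal duplicate shows at a glance which pages still lack a canonical URL or a noindex directive.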

External duplicates

External duplicates can be a whole other story, because you can’t just make adjustments to your own site to solve the problem.

Some examples of how you can remove external duplicates:

  1. Contact webmasters, and ask them to remove the copies of your content.
  2. If another site is duplicating your content in violation of copyright law and contacting them doesn’t solve the problem, you can use this form to notify Google: https://support.google.com/legal/troubleshooter/1114905 .

Tool limitations

  1. This tool automatically extracts the text from a web page to use as input to detect duplicate content. This is not always the exact block of text you would like to check for duplicates. In that case it’s better to use the text input field.
  2. New content needs to be indexed before it can be returned by this tool. If the page / content is less than 2 days old, chances are slim you will get any results.
  3. Not all duplicates found online are returned by this tool, but compared to other tools it returns a fairly large number.

External resources:

  1. Google, https://support.google.com/webmasters/answer/66359?hl=en
  2. Search Engine Land, http://searchengineland.com/library/google/google-panda-update

15 Responses to “Duplicate content checker”

  1. Max K.

    “Sorry, problems connecting to the API, please try again tomorrow…”

    3 Days running for me. Anyone else running into this issue?

    • Jasja ter Horst (admin)

      Hey Max,

      Thanks for mentioning it.
      I just solved the problem, so the tool should be working like a charm again ;-)

  2. jai

    Hi, I tried to find the duplicate content for a link: http://www.eurocarparts.com/fr_fr/filtre-habitacle

    and it gives 50 results. Among them, it considers this an internal link: http://www.eurocarparts.com/fr_fr/informations-sur-la-livraison

    and the query is : Pour vous offrir la meilleure expérience possible, ce site utilise des cookies. Continuer à utiliser eurocarparts.com signifie que vous acceptez notre utilisation

    I tried to find the query in both URLs but I can’t. Could you please explain what the issue is?

    • Jasja ter Horst (admin)

      Hi Jai,

      You get this message because the tool detects this text as duplicate for multiple pages (50 in your example).
      The text you refer to is the default cookie text you are showing on all your pages. This isn’t a problem, because it’s just a very small piece of content.
      To check other content sections, try the “2 data point” query match or add text manually using the text input option you can find over here: http://www.seoreviewtools.com/duplicate-content-checker/?text-input
      Good luck!

  3. Jasja ter Horst (admin)

    Updates:

    • Expanded the number of API queries.
    • Made some small adjustments to improve the feedback provided by the tool.
  4. sahar

    Does your website use an independent testing procedure?
    Or is it supported by Copyscape? Because you show different results.

    • Jasja ter Horst (admin)

      Hi Sahar,

      Great question! The duplicate content checker uses its own technology. This explains the differences you’ll encounter when comparing the results from this tool with, for example, Copyscape.

      Cheers!

  5. Dilip Sharma

    I am testing My Blog (buzznix.com/) but it is showing “Sorry, problems connecting to the API, please try again tomorrow…”…what to do admin ? #help

  6. yuli agustiani

    Hi!! The Duplicate content checker is very helpful to me in improving the content on my blog.

