Duplicate content checker / Plagiarism detection.
Use the duplicate content checker to find internal and external duplicate content for a specific webpage. Duplicate content is an important SEO issue, because search engines try to filter out as many duplicates as possible to offer the best search experience. This tool can detect two types of (text-based) duplicate content.
Tip: Need to check duplicate content for multiple websites on a daily basis? Try the API.
Duplicate content types
- Internal duplicate content. This means the same text is found on multiple pages of the same website (domain).
- External duplicate content. In this case the same text is found on multiple domains.
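As a rough illustration of the two types above, the distinction comes down to whether a matching page lives on the same site as the page you checked. The Python sketch below is an assumption about how such labeling could work, not the tool’s actual code: `classify_result` is a hypothetical helper, and it treats "www.example.com" and "example.com" as the same site but any other subdomain as a different one.

```python
from urllib.parse import urlsplit


def _host(url: str) -> str:
    """Return the lowercase hostname of a URL without a leading 'www.'."""
    host = (urlsplit(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def classify_result(input_url: str, result_url: str) -> str:
    """Label a matching page relative to the checked page (hypothetical helper)."""
    # The checked page itself (ignoring a trailing slash).
    if result_url.rstrip("/") == input_url.rstrip("/"):
        return "Input URL"
    # Another page on the same site: internal duplicate content.
    if _host(result_url) == _host(input_url):
        return "Internal duplicate"
    # Any other domain: external duplicate content.
    return "External duplicate"


if __name__ == "__main__":
    page = "https://www.example.com/blog/post"
    for url in (
        "https://www.example.com/blog/post/",
        "https://example.com/tag/seo",
        "https://copycat.example.org/post",
    ):
        print(url, "->", classify_result(page, url))
```

Running it prints the labels Input URL, Internal duplicate and External duplicate, in that order.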
Why is it important to prevent duplicate content?
As mentioned above, search engines don’t like duplicate content / plagiarism, because users aren’t interested in a search results page with multiple URLs that all contain more or less the same content. To prevent this, search engines try to determine the original source, so they can show that URL for a relevant search query and filter out the duplicates. Search engines do a pretty good job at filtering duplicates, but determining the original webpage is still difficult. When the same block of text appears on multiple websites, the algorithm may decide to show the page with the highest authority / trust in the search results, even if it isn’t the original source. If Google detects duplicate content intended to manipulate rankings or deceive users, it will make ranking adjustments (Panda filter), or the site may be removed entirely from the Google index and search results.
How does the duplicate content checker work?
- Find indexed duplicate content using URL or text input.
- Use URL input to extract the main article content / text found in the body of a web page. Navigational elements are removed to reduce noise (otherwise a lot of pages would be falsely identified as internal duplicates).
- Use text input to get more control over the input.
- Similar content is extracted, returned and labeled as Input URL, Internal duplicate or External duplicate.
- Export the results to .CSV and use Excel / OpenOffice spreadsheet software to view, edit or report your results.
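The URL-input step (keep the main text, drop navigation) can be approximated with Python’s standard-library HTML parser. This is a minimal sketch, not the tool’s actual extractor: it assumes that everything inside nav, header, footer, aside, script, style and noscript elements is noise and keeps the remaining visible text.

```python
from html.parser import HTMLParser

# Elements whose text is treated as noise rather than main content
# (an assumption; the real tool's rules are not documented).
SKIP_TAGS = {"nav", "header", "footer", "aside", "script", "style", "noscript"}


class MainTextExtractor(HTMLParser):
    """Collect visible text, ignoring anything nested inside SKIP_TAGS."""

    def __init__(self) -> None:
        super().__init__()
        self.skip_depth = 0  # greater than 0 while inside a skipped element
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self.skip_depth == 0:
            self.chunks.append(text)


def extract_main_text(html: str) -> str:
    """Return the page text found outside navigation and script-like elements."""
    parser = MainTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


if __name__ == "__main__":
    page = """<html><body>
    <nav><a href="/">Home</a> <a href="/blog">Blog</a></nav>
    <noscript>Please turn on JavaScript</noscript>
    <article><h1>Title</h1><p>Unique article text.</p></article>
    <footer>Copyright</footer>
    </body></html>"""
    print(extract_main_text(page))  # Title Unique article text.
```

Here the navigation links, the noscript notice and the footer are dropped, leaving only the article heading and paragraph. Many real pages mark up navigation with generic div elements instead, so a fixed tag list like this is only a starting point.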
How to use these results?
Internal duplicates
In most cases you’ll start by solving internal duplicate issues, because these problems exist in an environment you control (your website). Different methods can be used to remove internal duplicates, depending on the nature of the problem. Some examples:
- Minimize boilerplate repetition
- Use a 301 permanent redirect
- Use a canonical tag
- Use Parameter Handling in Google Webmaster Tools
- Prevent a URL from being indexed.
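Several of the fixes above (a canonical tag, a 301 redirect, parameter handling) depend on first deciding which URL variant should be the canonical one. The Python sketch below assumes that tracking and session parameters (the hypothetical `IGNORED_PARAMS` set) never change what a page shows; it maps such variants to one URL and builds the matching canonical tag. The function names are illustrative, not part of any tool.

```python
import html
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical set of query parameters that never change the page content.
IGNORED_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "sessionid"}


def canonical_url(url: str) -> str:
    """Map a URL variant to one canonical form.

    Drops ignored parameters and the fragment, sorts the remaining parameters
    and lowercases the host so that equivalent variants compare equal.
    """
    parts = urlsplit(url)
    kept = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in IGNORED_PARAMS
    )
    return urlunsplit(
        (parts.scheme, parts.netloc.lower(), parts.path or "/", urlencode(kept), "")
    )


def canonical_link_tag(url: str) -> str:
    """Build the <link rel="canonical"> element to place in the page's <head>."""
    return f'<link rel="canonical" href="{html.escape(canonical_url(url))}">'


if __name__ == "__main__":
    variant = "https://WWW.Example.com/shoes?utm_source=news&color=red&sessionid=42"
    print(canonical_url(variant))  # https://www.example.com/shoes?color=red
    print(canonical_link_tag(variant))
```

A 301 redirect from each variant to `canonical_url(variant)` would consolidate the same duplicates at the server level instead of in the page markup.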
External duplicates
External duplicates can be a whole other story, because you can’t solve the problem just by adjusting your own site. Some examples of how you can remove external duplicates:
- Contact webmasters, and ask them to remove the copies of your content.
- If another site is duplicating your content in violation of copyright law and contacting them doesn’t solve the problem, you can use this form to notify Google: https://support.google.com/legal/troubleshooter/1114905
Tool limitations
- This tool automatically extracts the text from a web page to use as input for detecting duplicate content. This is not always the exact block of text you want to check. In that case, it’s better to use the text input field.
- New content needs to be indexed before it can be returned by this tool. If the page / content is less than 2 days old, chances are slim you will get any results.
- Not all duplicates found online are returned by this tool, but compared to other tools it returns a fairly large number.
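When you already know where a copy lives, you can also compare two blocks of text directly. The checker’s own matching technology isn’t documented here, so the Python sketch below uses a common, generic technique instead: word shingles (overlapping 5-word sequences, a size chosen as an assumption) compared with Jaccard similarity.

```python
import re


def shingles(text: str, size: int = 5) -> set[tuple[str, ...]]:
    """Return the set of overlapping word n-grams ("shingles") in a text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < size:
        # Short texts become a single shingle so they can still be compared.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard_similarity(a: str, b: str, size: int = 5) -> float:
    """Shared shingles divided by all distinct shingles (0.0 to 1.0)."""
    sa, sb = shingles(a, size), shingles(b, size)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


if __name__ == "__main__":
    original = "Duplicate content is an important SEO issue because search engines filter duplicates"
    copy = original + " from results"
    print(jaccard_similarity(original, copy))  # 0.8
```

A score close to 1.0 means the two blocks are near-identical copies; where to draw the line between "similar" and "duplicate" is a judgment call.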
Update:
Why is it showing Input URL in the result? Does it mean there is duplicate content within the same website?
Hi Samel, good question!
This tool performs a Google based search query and labels the results:
Does this answer your question?
Hi. How many URLs can this tool check at a time?
Hi Mandy, thank you for your question.
This tool takes 1 URL as input.
Hi,
I was wondering about confidentiality: does the tool store the input text?
Thanks a lot,
Hi Anais,
The Duplicate Content Checker and all other tools that you can use on SEO Review Tools don’t store input data (text, URLs etc.)
Hope this answers your question!
Cheers,
Jasja ter Horst
What is meant by “No results found”? Does it mean no duplicate content was found on the website?
Hi Karan,
The “No results found” message means that we’re unable to return any results.
Update / bug fix:
Hello Jasja, there is still a problem: the duplicate content tool always analyzes the home page, whatever URL we are on when launching the tool.
Hi, I did a quick check and I don’t see any issues with the duplicate content checker.
The case you’re referring to is probably caused by content that exists on both your homepage and your input page.
You can post your input URL and I’ll have a closer look.
Cheers,
Nice work! Thanks for providing this great plagiarism tool. Especially because it returns and splits internal and external duplicates.
I have an eCommerce site, “chalktalksports.com”. It runs on Salesforce Commerce Cloud (formerly Demandware). When we use the duplicate content checker, our site is matched to many other eCommerce sites on the same platform. How do I find the code that is matching to make this connection?
Not every eCommerce site on this platform has this problem, but many of us do.
Thanks,
Bobby
Hi Bobby,
Just above the “Summary”, after checking your URL, you can see the Query (on the left side). This is the text the tool automatically extracts from your page to perform the duplicate content check.
In your case this is the following text: “Your browser s Javascript functionality is turned off Please turn it on so that you can experience”. This is exactly why other e-commerce sites running on the same platform show up as duplicates. For the record, these are not the type of duplicates that should worry you, since it’s just a very small content section.
Because of this, I would suggest using the text input http://www.seoreviewtools.com/duplicate-content-checker/?text-input to get an accurate duplicate content check.
Success!
You should add the ability to check regular text, so that I don’t have to create a page and publish it before checking it.
Hi Rasmus,
You can use the text input http://www.seoreviewtools.com/duplicate-content-checker/?text-input to check unpublished text or when the tool isn’t able to do an automatic content extraction.
Nice Tool!
Great Copyscape alternative :)
“Sorry, problems connecting to the API, please try again tomorrow…”
3 days running for me. Anyone else running into this issue?
Hey Max,
Thanks for mentioning it.
I just solved the problem, so the tool should be working like a charm again ;-)
Absolutely great tool to check for plagiarism and duplicate content.
Thanks
Hi, I tried to find the duplicate content for this link: http://www.eurocarparts.com/fr_fr/filtre-habitacle
It gave 50 results, and among them it flagged an internal link as a duplicate: http://www.eurocarparts.com/fr_fr/informations-sur-la-livraison
The query is: Pour vous offrir la meilleure expérience possible, ce site utilise des cookies. Continuer à utiliser eurocarparts.com signifie que vous acceptez notre utilisation (French for “To give you the best possible experience, this site uses cookies. Continuing to use eurocarparts.com means you accept our use”)
I tried to find the query text on both URLs, but I couldn’t. Could you please explain what the issue is?
Hi Jai,
You get this message because the tool detects this text as duplicate for multiple pages (50 in your example).
The text you refer to is the default cookie text you show on all your pages. This isn’t a problem, because it’s just a very small piece of content.
To check other content sections try the “2 data point” query match or add text manually using the text input option you can find over here: http://www.seoreviewtools.com/duplicate-content-checker/?text-input
Success!
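Site-wide snippets like this cookie notice can also be spotted programmatically if you have the text blocks of several pages. A minimal Python sketch, assuming each page has already been split into text blocks (for example paragraphs) and that a block counts as boilerplate when it appears on at least 80% of the pages; both the helper name `find_boilerplate` and the threshold are illustrative choices, not part of the tool:

```python
from collections import Counter


def find_boilerplate(pages: dict[str, list[str]], min_share: float = 0.8) -> set[str]:
    """Return text blocks that appear on at least min_share of all pages."""
    counts: Counter[str] = Counter()
    for blocks in pages.values():
        counts.update(set(blocks))  # count each block at most once per page
    threshold = min_share * len(pages)
    return {block for block, seen in counts.items() if seen >= threshold}


if __name__ == "__main__":
    cookie = "Cookie notice shown on every page."
    site = {
        "/page-a": [cookie, "Text that only appears on page A."],
        "/page-b": [cookie, "Text that only appears on page B."],
        "/page-c": [cookie, "Text that only appears on page C."],
    }
    print(find_boilerplate(site))  # a set containing only the cookie notice
```

Leaving such blocks out before picking the text to check keeps the comparison focused on content that is unique to each page.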
Updates:
Does your website use an independent testing procedure?
Or is it powered by Copyscape? I ask because you show different results.
Hi Sahar,
Great question! The duplicate content checker uses its own technology. This explains the differences you’ll encounter when comparing the results from this tool with, for example, Copyscape.
Cheers!
I am testing my blog (buzznix.com/), but it is showing “Sorry, problems connecting to the API, please try again tomorrow…”. What should I do, admin? #help
Hi Dilip,
API problem solved so you can use this tool again.
Thank You :)
Update: Just fixed some API issues affecting this tool. Works like a charm again ;-)
Hi!! The Duplicate content checker is very helpful to me in improving the content on my blog.
Updated the “No results” response.