Duplicate content checker / Plagiarism detection.
Updates:
- The duplicate content checker can now process plain text input as well as URL input.
- By clicking the advanced options box, you can choose to search for duplicate content based on multiple data points (text selections).
- I also tweaked the way the returned results are presented.
Use the duplicate content checker to find internal and external duplicate content for a specific webpage. Duplicate content is an important SEO issue, because search engines try to filter out as many duplicates as possible to offer the best search experience. This tool is able to detect two types of (text-based) duplicate content. Duplicate content types:
- Internal duplicate content. This means the same text is found on multiple pages within the same domain.
- External duplicate content. In this case the same text is found on multiple domains.
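The distinction between the two types comes down to comparing hosts. The following is a minimal sketch, not the tool's actual implementation: the function names `host_of` and `classify` are hypothetical, and comparing hostnames (ignoring a leading `www.`) is a simplifying assumption that treats other subdomains as separate sites.

```python
from urllib.parse import urlsplit


def host_of(url):
    """Return the lowercased hostname of a URL, without a leading 'www.'."""
    host = (urlsplit(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def classify(input_url, found_url):
    """Label a found URL relative to the input URL (hypothetical helper)."""
    if found_url == input_url:
        return "Input URL"
    if host_of(found_url) == host_of(input_url):
        # Same site, different page: an internal duplicate.
        return "Internal duplicate"
    # Same text on another domain: an external duplicate.
    return "External duplicate"


if __name__ == "__main__":
    page = "https://www.example.com/article"
    for found in (page, "https://example.com/print/article", "https://other.org/copy"):
        print(found, "->", classify(page, found))
```

A stricter version might compare registered domains rather than hostnames, so that `blog.example.com` and `www.example.com` count as the same site.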
Why is it important to prevent duplicate content?
As mentioned above, search engines don’t like duplicate content / plagiarism, because users aren’t interested in a search results page with multiple URLs that all contain more or less the same content. To prevent this, search engines try to determine the original source, so they can show that URL for a relevant search query and filter out the duplicates. Search engines do a pretty good job of filtering duplicates, but determining the original webpage is still difficult. When the same block of text appears on multiple websites, the algorithm may decide to show the page with the highest authority / trust in the search results, even though it isn’t the original source. If Google detects duplicate content that is intended to manipulate rankings or deceive users, it will make ranking adjustments (Panda filter), or the site may be removed entirely from the Google index and search results.
How does the duplicate content checker work?
- Find indexed duplicate content, using URL or TEXT input.
- Use URL input to extract the main article content / text found in the body of a web page. Navigational elements are removed, to reduce noise (otherwise a lot of pages would be falsely identified as internal duplicates.)
- Use text input to get more control over the input.
- Select advanced options to choose one or more data points to use for detecting duplicate pages. Selecting multiple data points gives you more specific, better matching results. (These data points are automatically extracted from the page content or text input.)
- Similar content is extracted, returned and marked as: Input URL, Internal duplicate, External duplicate.
- Export the results to .CSV and use Excel or an Open Office spreadsheet to view, edit or report your results.
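The extraction steps above can be illustrated with a rough sketch. It is not the tool's real algorithm: the list of tags treated as navigational noise, the names `MainTextExtractor`, `extract_main_text` and `pick_data_points`, and the rule of using the longest sentences as data points are all assumptions made for illustration.

```python
import re
from html.parser import HTMLParser

# Assumption: text inside these tags is navigation or other noise.
SKIPPED_TAGS = {"nav", "header", "footer", "aside", "script", "style"}


class MainTextExtractor(HTMLParser):
    """Collect text that is not nested inside any of SKIPPED_TAGS."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def extract_main_text(html):
    """Return the page text with navigational elements removed."""
    parser = MainTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def pick_data_points(text, count=3, min_words=6):
    """Pick up to `count` of the longest sentences as search phrases."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    candidates = [s for s in sentences if len(s.split()) >= min_words]
    return sorted(candidates, key=lambda s: len(s.split()), reverse=True)[:count]


if __name__ == "__main__":
    page = (
        "<html><body><nav><a href='/'>Home</a></nav>"
        "<p>Duplicate content is an important SEO issue. Short one.</p>"
        "<footer>Copyright</footer></body></html>"
    )
    text = extract_main_text(page)
    print(text)
    print(pick_data_points(text, count=1))
```

Each selected phrase could then be submitted as an exact-match search query. Using several phrases at once narrows the results to pages that share more of the text, which is the idea behind selecting multiple data points.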
How to use these results?
Internal duplicates
In most cases you’ll start by solving internal duplicate issues, because these problems exist in an environment you control (your own website). Different methods can be used to remove internal duplicates, depending on the nature of the problem. Some examples:
- Minimize boilerplate repetition
- Use a 301 permanent redirect
- Use a canonical tag
- Use Parameter Handling in Google Webmaster Tools
- Prevent a URL from being indexed.
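Three of the methods listed above translate into small, standard pieces of output: a 301 status with a `Location` header, a `rel="canonical"` link element, and a robots `noindex` meta tag. The sketch below only builds those pieces as plain values; the function names are hypothetical, and how you emit them depends on your server or CMS.

```python
from html import escape


def permanent_redirect(target_url):
    """Return the status code and headers for a 301 permanent redirect."""
    return 301, {"Location": target_url}


def canonical_tag(preferred_url):
    """Return a canonical link element for the <head> of a duplicate page."""
    return '<link rel="canonical" href="{}">'.format(escape(preferred_url, quote=True))


def noindex_tag():
    """Return a robots meta tag asking search engines not to index the page."""
    return '<meta name="robots" content="noindex">'


if __name__ == "__main__":
    print(permanent_redirect("https://example.com/new-page"))
    print(canonical_tag("https://example.com/page?a=1&b=2"))
    print(noindex_tag())
```

As a rule of thumb, a redirect suits pages that should no longer exist on their own, a canonical tag suits variants that must stay reachable, and `noindex` suits pages that should stay online but out of the search results.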
External duplicates
External duplicates can be a whole other story, because you can’t solve the problem just by making adjustments to your own site. Some examples of how you can remove external duplicates:
- Contact webmasters, and ask them to remove the copies of your content.
- If another site is duplicating your content in violation of copyright law and contacting them doesn’t solve the problem, you can use this form to notify Google: https://support.google.com/legal/troubleshooter/1114905
- This tool automatically extracts the text from a web page to use as input for detecting duplicate content. This is not always the exact block of text you would like to check for duplicates. In that case it’s better to use the text input field.
- New content needs to be indexed before it can be returned by this tool. If the page / content is less than 2 days old, chances are slim you will get any results.
- Not all duplicates found online are returned by this tool, but compared to other tools it returns a fairly large number.
- Google, https://support.google.com/webmasters/answer/66359?hl=en
- Search Engine Land, http://searchengineland.com/library/google/google-panda-update