What is Duplicate Content?
Plenty of people have this and that to say about duplicate content, and I could give you my own definition, but I think it is best to understand what Google classifies as duplicate content.
“Duplicate content generally refers to substantive blocks of content within or across domains that either completely match other content or are appreciably similar. Mostly, this is not deceptive in origin. Examples of non-malicious duplicate content could include:
- Discussion forums that can generate both regular and stripped-down pages targeted at mobile devices
- Store items shown or linked via multiple distinct URLs
- Printer-only versions of web pages”
Google goes on to state that you should “Understand your content management system: Make sure you’re familiar with how content is displayed on your web site. Blogs, forums, and related systems often show the same content in multiple formats. For example, a blog entry may appear on the home page of a blog, in an archive page, and in a page of other entries with the same label.”
Understanding How WordPress Duplicates Content
WordPress is an awesome and powerful content management system. Compared with other open source content management systems such as Joomla or Drupal, WordPress is a lot easier to use (I banged my head against the wall using Joomla a few times and quickly started looking for a different solution). WordPress is not only easier to use, it is also more automated than either Joomla or Drupal. For example, WordPress has one-click updates for both the core and plugins. Furthermore, WordPress lets you easily schedule content to be published so that you can set it and forget it (this can also be done in Joomla, but the WordPress interface is much easier to use).
Even though WordPress is both powerful and easy to use, it does have some downsides. For instance, with its default, out-of-the-box settings, WordPress publishes and organizes content in a way that creates duplicate content issues. These can confuse the search engines, which is something we want to avoid at all costs. Remember, Google is your best friend if you give it what it wants, which frankly is great content and user-friendly websites.
Duplicate content confuses the search engines like this: if the same content is published and organized on your website in several different ways, as WordPress does, then Googlebot (Google’s search engine spider/robot) will not know which page to index. That can make your site appear to be of low value, or make it look as though you are trying to trick Googlebot (something that Google doesn’t like). Let’s take a closer look at how WordPress creates duplicate content.
Here’s how duplicate content is generated in a typical WordPress setup. Say you or someone else creates a Post titled How to Find the Best Roofing Company in Edmonton, selects two categories (Edmonton Roofer and Edmonton Roofing) for the Post and one Tag for the Post (finding a roofer). When the Post is published, WordPress then does the following:
The Post is published as www.edmontonroofingpros.com/how-to-find-the-best-roofing-company-in-edmonton.
Category pages are created. An entry for the Post containing its title, a link to the Post, and a 55-word summary (called the excerpt in WordPress parlance) is created on two category pages: www.edmontonroofingpros.com/category/edmonton-roofer and www.edmontonroofingpros.com/category/edmonton-roofing.
A Tag page is created. Another duplicate entry of the Post, with a title, link, and 55-word excerpt, is created on a tag page here: www.edmontonroofingpros.com/tags/finding-a-roofer
If that wasn’t enough repetition, WordPress also creates yearly and monthly archives that contain another copy of the title, a link to the Post, and the excerpt. The archives display in this format: www.edmontonroofingpros.com/2011 and www.edmontonroofingpros.com/2011/07.
Furthermore, WordPress also creates an author archive, publishing the same Post again under the author’s name.
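To make the repetition concrete, here is a minimal Python sketch that lists every address on which the example Post shows up. The function name, the author slug and the author/ path are my own assumptions for illustration; the other path prefixes simply mirror the example URLs above and will differ if your permalink settings differ.

```python
from datetime import date


def duplicate_urls(site, slug, categories, tags, author, published):
    """List every URL on which a post's title and excerpt appear.

    Path prefixes (category/, tags/) mirror the article's example;
    the author/ prefix is an assumption for illustration.
    """
    urls = [f"{site}/{slug}"]  # the Post itself
    urls += [f"{site}/category/{c}" for c in categories]  # one per category
    urls += [f"{site}/tags/{t}" for t in tags]  # one per tag
    urls.append(f"{site}/{published.year}")  # yearly archive
    urls.append(f"{site}/{published.year}/{published.month:02d}")  # monthly archive
    urls.append(f"{site}/author/{author}")  # author archive
    return urls


urls = duplicate_urls(
    site="www.edmontonroofingpros.com",
    slug="how-to-find-the-best-roofing-company-in-edmonton",
    categories=["edmonton-roofer", "edmonton-roofing"],
    tags=["finding-a-roofer"],
    author="admin",  # hypothetical author slug
    published=date(2011, 7, 1),  # the day is arbitrary; only year and month are used
)
for url in urls:
    print(url)
```

One Post with two categories and one tag ends up on seven different addresses, and every extra category or tag adds one more.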
We don’t want to disable these pages because they help users find content but we have to find a way to control duplicate content to maximize our SEO efforts. There is one other wrinkle at work in this scenario. The 55-word excerpt that WordPress generates automatically for each Post (unless we create an excerpt manually for each Post) is generated from the first 55 words of the Post! So, the excerpt itself is a partial duplicate of the actual content! So, we have two problems:
The excerpt is a partial duplicate of existing content
Multiple archive pages with duplicate content
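The first problem is easy to see in a few lines of Python. This is only a rough sketch of the idea, not WordPress’s actual code: the function name is made up, and the trailing "[...]" marker is just an illustrative stand-in for whatever the theme appends.

```python
def auto_excerpt(post_text, word_limit=55):
    """Build an automatic excerpt from the first `word_limit` words of a post."""
    words = post_text.split()
    if len(words) <= word_limit:
        return " ".join(words)  # short posts are used whole
    # Illustrative "more" marker; the real marker depends on the theme.
    return " ".join(words[:word_limit]) + " [...]"


# A dummy 100-word post, purely for demonstration.
post = " ".join(f"word{i}" for i in range(100))
excerpt = auto_excerpt(post)

# The excerpt is word-for-word the opening of the post itself.
print(excerpt.split()[:55] == post.split()[:55])
```

Because the excerpt is cut straight from the top of the Post, every category, tag and archive page that shows it repeats the Post’s opening words. Writing a manual excerpt for each Post avoids that.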
How to Solve Duplicate Content Issues with WordPress
So what is the answer to the WordPress duplicate content problem? Read on to discover five ways to solve WordPress duplicate content issues.
1. Use Canonical URLs
This is something that you should do not only with WordPress but with all of your other websites as well, whether they are powered by WordPress or not. Perhaps you’re reading this and wondering what I mean by canonical URLs (don’t feel that you’re alone, I had no idea what they were either). Setting a canonical URL basically tells the search engines and everyone else which URL structure you want to use. For instance, you may not know this, but Google sees www.edmontonroofingpros.com and edmontonroofingpros.com as two different websites (the www and non-www versions). Therefore you need to tell Google, other search engines, and users which URL structure you prefer. Some people claim that it is easier for users not to have to type www, but I prefer to use it. It is just personal preference.
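If you like to see the idea in code, here is a small Python sketch, assuming the www version is the preferred one. The function names are hypothetical; the rel="canonical" link tag itself is the standard way to point search engines at the preferred address. In practice you would also redirect the version you don’t want to the one you do.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit

PREFERRED_HOST = "www.edmontonroofingpros.com"  # the article's example domain


def canonical_url(url, preferred_host=PREFERRED_HOST):
    """Rewrite the www or non-www form of our host to the preferred form."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if preferred_host.startswith("www."):
        bare = preferred_host[len("www."):]
    else:
        bare = preferred_host
    if host in (bare, "www." + bare):
        host = preferred_host  # both variants collapse to one
    # The fragment (#...) is dropped; it never reaches the server anyway.
    return urlunsplit((parts.scheme, host, parts.path, parts.query, ""))


def canonical_link_tag(url):
    """Return the <link> tag that belongs in the page's head section."""
    return f'<link rel="canonical" href="{escape(canonical_url(url))}" />'


print(canonical_link_tag("http://edmontonroofingpros.com/about"))
```

Both the www and the non-www address produce the same tag, so the search engines are told about only one version of each page.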
2. Structure Your Website Using Categories
This is really the ideal way to structure your WordPress website for SEO and usability. Using categories to organize your posts will create keyword silos, that is, areas of your site whose content is related by keyword theme. When the content on your site is related to a specific category, it appears more relevant to the search engines and to your users.
3. Have Page Excerpts on Your Home Page Rather Than Full Posts
If you have WordPress configured so that your Home page is where posts are published, or you have set WordPress up with a dedicated page for blog posts, you will want to show excerpts on that page rather than full posts. That way the search engine spiders will follow the link to each post and index it in its proper place. You can read more about this in my post, How to Easily Have Page Excerpts with the Headway Theme.
4. Use Only One Category per Post
This should go without saying but I’ll mention it anyway: assign only one category to each post. If you assign multiple categories, you will further confuse the search engines and create even more duplicate content issues to resolve. Assigning one category per post will make it easier on you, your readers and the search engines. Coming up with the proper categories should, again, be a part of your keyword research. Also focus on ranking for one keyword at a time.
5. Use “noindex” to Tell the Search Engines Not to Index Pages
There is a tag, known as the Robots Meta Tag, that you can use to inform the search engines that you do not want them to index a certain page, which helps you avoid duplicate content. For instance, you can tell the search engine robots not to index a page or follow the links on it by putting “noindex, nofollow” in a robots meta tag in the head section of that page. To accomplish this with WordPress, however, it is again best to use either the Headway theme or Yoast’s WordPress SEO Plugin. I prefer using Yoast’s plugin, so I’ve shown you the settings to change in a screen capture below.
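For the curious, this is roughly what such a setting boils down to, as a Python sketch. The page type names and the choice of which ones to keep out of the index are my own assumptions, not the plugin’s actual logic. Note that this sketch uses “noindex, follow” rather than “noindex, nofollow”, so the spiders can still follow the links through to the posts themselves.

```python
# Hypothetical policy: archive-style pages that only repeat excerpts.
# Category pages are left indexable because the site is organized around them.
NOINDEX_PAGE_TYPES = {"tag", "date_archive", "author_archive"}


def robots_meta(page_type):
    """Return the robots meta tag for a given kind of page."""
    if page_type in NOINDEX_PAGE_TYPES:
        content = "noindex, follow"  # keep out of the index, still crawl links
    else:
        content = "index, follow"  # normal pages: index and crawl
    return f'<meta name="robots" content="{content}" />'


for page_type in ("post", "category", "tag", "date_archive", "author_archive"):
    print(page_type, robots_meta(page_type))
```

Swap "noindex, follow" for "noindex, nofollow" if you also want the robots to ignore the links on those pages.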
Tools To Solve WordPress Duplicate Content Issues
1. Duplicate Content Cure Plugin
There is a simple yet powerful plugin that you can install and simply set and forget. It will tell the search engines not to follow or index your archive, paged and category pages. The benefit of using this plugin is that you can simply install it and it will do all the work for you. The downside is that you have no control over what gets indexed. For instance, I wrote earlier that I organize my websites according to categories, and the Duplicate Content Cure Plugin would not allow me to keep my category pages indexed, which is the reason why I don’t use it.
2. Use an SEO Optimized Theme Such as Headway
I make no secret of the fact that I love Headway and use it to build and power a majority of my sites. Headway is so powerful and flexible that you can build almost any type of site with it. For the reasons why I use Headway, check out my post on 7 Reasons Why I Use Headway (and you should too).
One of those reasons is that Headway has SEO options built into it. I can choose which pages to tell the search engines not to index, I can add my own custom titles and meta descriptions, and Headway will remove a list of predefined words such as “the”, “of” and “a” from my WordPress URL slugs.
Headway is a great theme, not only for its built in SEO options but also for its powerful capabilities (such as the Visual Editor and its Drag and Drop Framework).
3. WordPress SEO Plugin by Yoast
This is my method of choice to optimize my themes for the search engines. I used to simply use Headway’s built-in options until Joost released his WordPress SEO plugin. It has so many features and options that I now use it on all my sites.
If you’re familiar with WordPress and SEO then you’ve more than likely heard of Joost de Valk, a WordPress developer and SEO consultant from the Netherlands. Joost has worked with companies such as KLM, eBay and Salesforce. He is also currently ranked #1 for the keyword “Wordpress SEO”, and I don’t mind sharing that I learn a lot from him and listen to what he says (and you should too).
That being said, while there are numerous WordPress SEO plugins to choose from, in my humble opinion, Yoast’s WordPress SEO Plugin is the best. While I’ll be doing a full review on the plugin in a future post I’ll share three reasons why I like it.
i. It has a lot of options for me to customize how I want the Search Engine Spiders to crawl my site
ii. It has an XML sitemap built into it, which removes the need for another plugin.
iii. It has the option to remove the word “category” from my URLs, which results in better SEO (who needs the word category in the URL? It’s annoying in my opinion, and I’m glad that he included this feature).
I could go on about the plugin but I’ll save that for another post.
So I hope this blog post has been both helpful and useful to you. If you have any questions or comments about duplicate content please leave a comment below.