WordPress Duplicate Content Prevention with robots.txt

To prevent the duplicate content that WordPress can produce in Google, whichever version of WordPress you use, here is typical content for the robots.txt file:


User-agent: *
Disallow: /comments/feed/
Disallow: /feed/
Disallow: /feed/atom/
Disallow: /feed/rss/
Disallow: /rss/
Disallow: /trackback/
Disallow: /wp/
Disallow: /*/comments/feed/$
Disallow: /*/feed/$
Disallow: /*/feed/atom/$
Disallow: /*/feed/rss/$
Disallow: /*/rss/$
Disallow: /*/trackback/$

The above code is based on the following assumptions:

  • The WordPress address (URL) is in the form http://www.yoursite.com/wp
  • The robots.txt file is located at http://www.yoursite.com/robots.txt
  • The above robots.txt file content is to be used with the AdFlex WordPress theme
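
To see which URLs these rules cover, here is a minimal Python sketch. It is a hand-written matcher, not an official parser, and it assumes the wildcard conventions that Google documents for robots.txt: `*` matches any run of characters, a trailing `$` anchors the end of the URL path, and a pattern without `$` is a prefix match. The function names and the sample paths are hypothetical, chosen only for illustration.

```python
import re

# Disallow patterns copied from the robots.txt above.
DISALLOW = [
    "/comments/feed/",
    "/feed/",
    "/feed/atom/",
    "/feed/rss/",
    "/rss/",
    "/trackback/",
    "/wp/",
    "/*/comments/feed/$",
    "/*/feed/$",
    "/*/feed/atom/$",
    "/*/feed/rss/$",
    "/*/rss/$",
    "/*/trackback/$",
]


def pattern_to_regex(pattern):
    """Translate one Disallow pattern into a compiled regular expression."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and let each '*' match any run of characters.
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    # Without '$' the pattern is a prefix match; with '$' it must reach the end.
    return re.compile(body + (r"\Z" if anchored else ""))


def is_blocked(path, patterns=DISALLOW):
    """Return True if the URL path matches at least one Disallow pattern."""
    # re.match anchors at the start of the path, as robots.txt rules do.
    return any(pattern_to_regex(p).match(path) for p in patterns)


if __name__ == "__main__":
    # Hypothetical paths; replace them with paths from your own site.
    for sample in ["/feed/", "/2007/04/my-post/", "/2007/04/my-post/feed/"]:
        print(sample, "blocked" if is_blocked(sample) else "allowed")
```

Running it against a few of your own permalinks is a quick way to confirm that posts and pages stay crawlable while their feed and trackback variants do not.
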
Posted in WordPress on Apr 2nd, 2007, 7:51 am by VK   

25 Responses

  1. Mike
    April 8th, 2007 | 8:53 pm

    Hey VK,

    Another question. I added the lines to my robots.txt a few days ago, but I haven’t seen any changes yet.

    How long does it take to be effective?

    Do I need to add other things somewhere else?

    Mike

  2. VK
    April 12th, 2007 | 5:55 pm

    Hi Mike,

    You asked: “I added the lines to my robots.txt a few days ago, but I haven’t seen any changes yet. How long does it take to be effective? Do I need to add other things somewhere else?”

    Google and the other search engines take it into account on their next web spider visit.

    Now, if items you want to remove are already in your SERPs, you’ll have to use the URL removal tools provided by Google, Yahoo and MSN.

    I suppose you’re mainly interested in Google, as it’s currently the only search engine that may give you duplicate content problems if similar items are in your SERPs.

    For Google, you need to use the Google URL Remover tool (also referred to as the Google Automatic URL Removal System). It’s located at URL: http://services.google.com/urlconsole/controller

    This Google tool is not as robust as other Google tools you may know, but it does the job in most cases. It is down from time to time. It’s also not as secure as other Google tools, as it is accessible over HTTP and not HTTPS.

    The process is simple:
    1) After you’ve added the content to exclude to your robots.txt file, connect to the URL:
    http://services.google.com/urlconsole/controller
    2) There, just create an account to use the tool. You’ll get an e-mail that you’ll need to confirm.
    3) After confirming the e-mail for the account creation, just connect to the Google URL Remover service.
    4) Select the option “Remove pages, subdirectories or images using a robots.txt file.”
    5) In the field “URL to your robots.txt”, type the URL to your robots.txt. For instance: http://www.yoursite.com/robots.txt
    6) Click on the button “Remove Pages”

    In about 2 days, you can check whether Google has processed the URL removal request by running a Google site command. The Google site command means you do a Google search for the following string:
    site:www.yoursite.com
    Of course, you have to replace www.yoursite.com with your site name, but don’t put http:// there.
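
As a small illustration of the site command described above, the following Python sketch builds the search string from either a full URL or a bare host name, dropping the http:// part. The function name `site_query` is hypothetical and exists only for this example.

```python
from urllib.parse import urlsplit


def site_query(url_or_host):
    """Build a Google 'site:' search string without the scheme."""
    # urlsplit only fills in the host when a scheme or a leading '//' is
    # present, so a bare host name gets '//' prepended first.
    if "://" not in url_or_host:
        url_or_host = "//" + url_or_host
    return "site:" + urlsplit(url_or_host).netloc


if __name__ == "__main__":
    print(site_query("http://www.yoursite.com/robots.txt"))
```
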

    You’ll see that the unwanted URLs that were in your SERPs have been removed, and you’ll get much nicer SERPs when running the Google site command.

    Now some notes:
    a) – Be really sure to put the right content in the robots.txt file, because when you use the Google Automatic URL Removal System, the URLs to be removed will stay out of the Google index for 6 months!

    b) – As said previously, the Google Automatic URL Removal System at URL http://services.google.com/urlconsole/controller is often down. If that’s the case, you can also use the other URL, http://services.google.com:8882/urlconsole/controller, which is up more often.
    The second URL uses port 8882, which means you may not be able to access it from behind a company firewall that blocks such non-standard port numbers.
    Of course, a direct connection to the Internet will work fine most of the time.

    Hope it helps.

    Cheers,
    VK

  3. July 30th, 2007 | 10:42 am

    How big is the problem with duplicate content in WordPress? I’ve never seen any huge problems with it.

  4. August 10th, 2007 | 8:34 pm

    It’s really not so much a problem as it is just holding back your site from getting higher rankings, more targeted visitors, and fewer supplementals.

    Check this new updated version:

    http://www.askapache.com/seo/updated-robotstxt-for-wordpress.html

  5. August 17th, 2007 | 8:56 pm

    [...] prevent the Wordpress Duplicate Content in Google that may arise when you use any versions of Wordpress, here is the typical content of the [...]

  6. February 27th, 2008 | 11:52 pm

    Nice wordpress tips. I’d never thought about the duplicates like that

  7. February 16th, 2009 | 11:24 am

    Normally I don’t comment on sites but your article was good.

  8. rss
    February 20th, 2009 | 8:40 pm

    I really liked your blog!

  9. March 25th, 2009 | 5:09 am

    He is spot on. His html is spot on. Follow this article exactly! You will have to wait for google spiders to re-crawl/update your website to see the changes. I give 65 thumbs up to this article and the others. Great content blog here. Add it to your favs if you want to make it in the “Internet” industry.

  10. May 7th, 2009 | 7:50 am

    Great post. I was actually looking for info on Joomla robots.txt, but this really gave me some ideas about a few things. As a result, I switched off the pdf’s and print options on the site. I doubt anyone uses them anyway.

    Cheers,

  11. June 8th, 2009 | 6:04 pm

    You have a great blog here and it is Nice to read some well written posts that have some relevancy…keep up the good work ;)

  12. July 14th, 2009 | 10:05 am

    Wow, what a find. This post is definitely one of the best WP tips that I have seen. I also write a different excerpt for each post because it shows on the category page. This prevents duplicate content on the category pages. Keep up the good work!

  13. August 30th, 2009 | 11:17 pm

    I never ever post, but this time I will. Thanks a lot for the great blog.

  14. September 13th, 2009 | 5:58 pm

    Good read, thanks.

  15. September 22nd, 2009 | 8:55 pm

    Hi! Thank you for the nice post.

  16. October 7th, 2009 | 6:58 pm

    Great post and very informative. Google is putting more and more emphasis on the uniqueness of content, so avoiding duplication could be of great help for SERPs.

  17. October 21st, 2009 | 11:48 am

    Long time waiting for something like this!

  18. October 30th, 2009 | 5:43 am

    I am still looking for a good SEO plugin for WordPress. My blog is not ranking high enough for the keywords that I want to rank for.

  19. November 26th, 2009 | 6:08 pm

    Hi VK
    Still trying to get my head around this duplicate content thing with Wordpress, posts, archives, categories etc.
    I can handle a static site, you just don’t repeat yourself, but with a dynamic site!
    This certainly helps.

  20. December 5th, 2009 | 11:27 am

    Hi VK
    I’m trying to put together a robots.txt file to prevent duplicate content and I notice that Archives and Categories are not excluded in your robots.txt file.

    If they are not excluded, won’t they produce duplicate content?

  21. VK
    December 7th, 2009 | 8:48 pm

    Why not give a try to my SEO WordPress theme called AdFlex Niche/Blog. More info at URL:
    http://www.vklabs.com/wordpress/

    My SEO WordPress theme provides you with tons of SEO options and other settings that may help you get a higher on-page SEO score and so a better ranking in search engines. Of course, on-page SEO is one important thing, but you also need to dedicate some time to the off-page SEO part.

    Hope it helps.

  22. VK
    December 7th, 2009 | 8:50 pm

    I have to get back to work on that topic and add other helpful information. I hope I can do that very soon.

  23. VK
    December 7th, 2009 | 8:58 pm

    I understand what you mean. Sometimes using the robots.txt file alone is not enough for a smarter, more dynamic web site. I was in that situation too in the past, and that’s why I finally created my SEO WordPress theme to help me add more anti-duplicate content features directly via the WordPress Dashboard.

    Please, when you have time, just quickly install my SEO WordPress theme located at URL:
    http://www.vklabs.com/wordpress/

    And you’ll see all the SEO anti-duplicate settings that you can play with. At first, the number of settings may seem overwhelming, but you’ll understand the logic behind them.

  24. VK
    December 7th, 2009 | 9:04 pm

    In fact, the assumption is that the Archives and Categories listings show just the excerpt and not the full post content. The full post content is only displayed when you are viewing a page or a post.

    If you need even more control over the SEO anti-duplicate settings, you’ll have to not only use the robots.txt file but also change other things in WordPress itself, for instance by using an SEO plug-in or an SEO WordPress theme.

    I have created an SEO WordPress theme that comes with about 200 options/settings, many of them SEO-related, to give the user better control over the SEO aspect and the anti-duplicate content issue. If you want to look at my SEO WordPress theme, just go to URL:
    http://www.vklabs.com/wordpress/

    Then just read the PHP code provided with my SEO WordPress theme and, if needed, copy any code that may be of interest to you and add it to your WordPress site.

  25. January 6th, 2010 | 6:03 pm

    As a graphic web designer, I’m truly glad to see that another individual brought up this topic.

    Quite a few people don’t understand all that is involved in this industry, and I think we are also many times not appreciated enough
    or taken for granted. Nevertheless, I’m glad to see that you feel the same way I do. Thanks so much for your blog!
