How to use a robots.txt file

October 07 | Category: Blog, SEO

As with many of my posts on this blog, this one was inspired by a conversation with one of my clients.

What is a robots.txt file?

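A robots.txt file lives in the root of a website, for example http://www.rutley.co.uk/robots.txt. Crawlers only look for it there, not in subdirectories. It tells search engine crawlers which parts of the site they may visit and which they should stay out of. Strictly speaking it controls crawling rather than indexing: a blocked page can still appear in search results if other pages link to it.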
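Because crawlers only look in the root, you can work out where the robots.txt file for any page lives. The sketch below uses Python's standard `urllib.parse.urljoin` for this. The function name `robots_url` and the blog page URL are hypothetical and are only there for illustration.

```python
from urllib.parse import urljoin


def robots_url(page_url: str) -> str:
    """Return the robots.txt URL that governs page_url (hypothetical helper).

    Joining the absolute path "/robots.txt" onto any URL keeps the scheme and
    host (including any port) and replaces the path, query and fragment.
    """
    return urljoin(page_url, "/robots.txt")


# Hypothetical page URL on the example site.
print(robots_url("http://www.rutley.co.uk/blog/how-to-use-a-robots-txt-file/"))
# http://www.rutley.co.uk/robots.txt
```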

What does a robots.txt file look like?

A robots.txt file must follow a specific format. Rules are written in groups. Each group starts with a User-agent line naming the search engine crawler it applies to, and then lists what that crawler may not access.

Because there are many search engines, you can set a general policy for all of them and also set policies for individual crawlers.

To set a general policy:

User-agent: *
Disallow:

The * means the group applies to all search engines. The empty Disallow: line means that no directories or files are disallowed, so the whole site may be crawled.
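You can check what a rule set means by running it through a parser. This is a sketch only. It uses Python's standard-library `urllib.robotparser` to stand in for a well-behaved crawler. Real search engines use their own parsers, and those support extras such as * wildcards inside paths. The crawler name ExampleBot and the paths are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# General policy: the group applies to every crawler and nothing is disallowed.
RULES = """\
User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

# "ExampleBot" and these paths are hypothetical, used only to test the rules.
for path in ["/", "/admin/", "/articles/robots.html"]:
    print(path, parser.can_fetch("ExampleBot", path))
# / True
# /admin/ True
# /articles/robots.html True
```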

To block all search engines from the entire site, use:

User-agent: *
Disallow: /
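The same kind of check shows that "Disallow: /" blocks every URL on the site. The sketch below again assumes Python's `urllib.robotparser` as a stand-in crawler, and ExampleBot is a hypothetical crawler name. You can pass either a path or a full URL, because only the path part is compared with the rules.

```python
from urllib.robotparser import RobotFileParser

# Block the entire site for every crawler.
RULES = """\
User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

# Every path starts with "/", so every URL matches the rule and is blocked.
for url in ["/", "/index.html", "http://www.rutley.co.uk/articles/robots.html"]:
    print(url, parser.can_fetch("ExampleBot", url))
# / False
# /index.html False
# http://www.rutley.co.uk/articles/robots.html False
```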

To block all search engines from a specific directory, use:

User-agent: *
Disallow: /admin/
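Disallow values are matched as path prefixes. In the sketch below, Python's `urllib.robotparser` stands in for a crawler (an assumption). The path /admin/users.php and the crawler name ExampleBot are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Block one directory for every crawler.
RULES = """\
User-agent: *
Disallow: /admin/
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

for path in ["/admin/", "/admin/users.php", "/admin", "/articles/"]:
    print(path, parser.can_fetch("ExampleBot", path))
# /admin/ False
# /admin/users.php False
# /admin True
# /articles/ True
```

Anything under /admin/ is blocked. The bare path /admin is not blocked, because it does not start with the full prefix "/admin/".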

To set a rule for one particular search engine, add a group with its name:

User-agent: googlebot
Disallow: /articles/

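This stops Googlebot from crawling the /articles/ folder. Bear in mind that a crawler follows only the most specific group that matches its name. Once a googlebot group exists, Googlebot ignores the User-agent: * rules. If any general rules (such as Disallow: /admin/) should still apply to Googlebot, repeat them in its group.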
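This sketch combines a general group and a Googlebot group to show which rules each crawler ends up following. It assumes Python's `urllib.robotparser` as a stand-in for the search engines' own parsers. Bingbot is used as an example of a crawler with no group of its own, and the article path is hypothetical.

```python
from urllib.robotparser import RobotFileParser

RULES = """\
User-agent: *
Disallow: /admin/

User-agent: googlebot
Disallow: /articles/
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

checks = [
    ("Googlebot", "/articles/robots.html"),  # blocked by its own group
    ("Googlebot", "/admin/"),                # allowed: the * group is ignored
    ("Bingbot", "/articles/robots.html"),    # allowed: falls back to the * group
    ("Bingbot", "/admin/"),                  # blocked by the * group
]
for agent, path in checks:
    print(agent, path, parser.can_fetch(agent, path))
# Googlebot /articles/robots.html False
# Googlebot /admin/ True
# Bingbot /articles/robots.html True
# Bingbot /admin/ False
```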

You can also specify the location of your site's XML sitemap. To do this, add:

Sitemap: http://www.rutley.co.uk/sitemap.xml

This helps with SEO. It tells search engines where your sitemap is, and the sitemap lists the pages that are available for them to crawl and index.
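A crawler can read the Sitemap line straight from robots.txt. This is a minimal sketch that assumes Python 3.8 or later, where `urllib.robotparser.RobotFileParser` has a `site_maps()` method. The Disallow line is only there to show that the Sitemap line is independent of the user-agent groups.

```python
from urllib.robotparser import RobotFileParser

RULES = """\
User-agent: *
Disallow: /admin/

Sitemap: http://www.rutley.co.uk/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

# site_maps() returns the Sitemap URLs found in the file (None if there are none).
print(parser.site_maps())
# ['http://www.rutley.co.uk/sitemap.xml']
```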

Summary

Robots.txt files are useful for several reasons. They let you keep crawlers out of specific directories and files. By pointing search engines to your sitemap, they can also help more of your pages get indexed.
