The robots.txt file is a powerful SEO tool, but if used incorrectly it can lead to your website or blog being completely ignored by search engines!
The robots.txt file tells a search engine spider which directories on a server it can crawl. Pages in those directories can then be added to the search engine's database. If you do not want a directory to be crawled, you can stop it from happening by adding the relevant command to the robots.txt file.
Why would you want to exclude a directory from a search engine?
Say you have a folder/directory on the server which contains a bunch of PDF files for internal use or for download by members. If this directory were exposed to search engine spiders, it could be added to the search database and links to these files could end up in the search engine results pages. Another example is a directory where clients can see work in progress or where files are stored for private use. Anything you do not want a search engine to crawl should be excluded in your robots.txt file. Keep in mind that this only asks well-behaved spiders to stay away; it does not password-protect the files, so truly private material still needs proper access control.
What does the robots.txt file look like?
This little but very powerful file takes on many guises, and its appearance will change from server to server depending on what you want it to do. It is best created in a plain-text editor such as Notepad; just save it as “robots.txt” in the root directory of your site.
If you are happy for the spiders to crawl every page on your website, and most people are, then the format is very straightforward:
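User-agent: *
Disallow:
That’s it! The empty Disallow line blocks nothing, so this will tell all search engine spiders (Googlebot, Yahoo Slurp, MSNbot etc.) that they may crawl every page on the server. Be careful with the slash: “Disallow: /” does the opposite and shuts every spider out of the entire site.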
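If you want to check what a set of rules actually permits before uploading it, Python's standard urllib.robotparser module can evaluate them locally. The following is only a minimal sketch: the helper name can_crawl, the example.com URL and the "Googlebot" agent string are placeholders chosen for illustration, and real search engines may interpret edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser


def can_crawl(rules: str, url: str, agent: str = "Googlebot") -> bool:
    """Return True if `rules` (robots.txt text) let `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())  # parse the rules without any network access
    return parser.can_fetch(agent, url)


# An empty Disallow value blocks nothing.
allow_all = "User-agent: *\nDisallow:\n"
# A lone slash blocks the whole site.
block_all = "User-agent: *\nDisallow: /\n"

page = "https://example.com/page.html"  # placeholder URL
print("empty Disallow:", can_crawl(allow_all, page))
print("Disallow: /   :", can_crawl(block_all, page))
```

The two rule sets differ by a single character, which is why it is worth testing the file rather than trusting it by eye.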
However, say you have a folder which contains information you don’t want search engines to index. What is the command for excluding it from the crawl?
Simple: just add the name of the folder to the Disallow line:
User-agent: *
Disallow: /pdf
This tells the spiders to crawl every directory except the pdf directory as this has been disallowed. If you want to exclude multiple folders from being crawled, just add a line for each one:
User-agent: *
Disallow: /pdf
Disallow: /photos
Disallow: /internal-files
The list can be as long as you want it to be.
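The multi-folder rules above can be tried out the same way with Python's standard urllib.robotparser. This is a rough sketch rather than a definitive test: the helper name allowed, the "MyCrawler" agent, the example.com host and the sample paths are all made up for illustration.

```python
from urllib.robotparser import RobotFileParser

RULES = """\
User-agent: *
Disallow: /pdf
Disallow: /photos
Disallow: /internal-files
"""


def allowed(path: str, agent: str = "MyCrawler") -> bool:
    """Check a site-relative path against RULES for a hypothetical crawler."""
    parser = RobotFileParser()
    parser.parse(RULES.splitlines())
    return parser.can_fetch(agent, "https://example.com" + path)


# Hypothetical paths used only to exercise the rules.
for path in ["/pdf/report.pdf", "/photos/team.jpg", "/about.html", "/pdf-guide.html"]:
    print(path, "->", "allowed" if allowed(path) else "blocked")
```

Note that each Disallow value is matched as a prefix of the path, so “Disallow: /pdf” also catches a page such as /pdf-guide.html. If you only want to block the directory itself, writing the rule with a trailing slash (“Disallow: /pdf/”) is the safer form.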
Almost every website will have a robots.txt file of one sort or another, and anyone can see it; all you have to do is enter the domain name and add “robots.txt” after the forward slash, for example example.com/robots.txt.
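Because the file always lives at the root of the domain, its address can be worked out from any page on the site. The sketch below shows one way to do that with Python's standard urllib.parse; the function name robots_url is hypothetical, and defaulting to https when no scheme is given is an assumption made for the example.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(site: str) -> str:
    """Build the robots.txt address for the domain that `site` belongs to."""
    if "://" not in site:
        site = "https://" + site  # assumption: default to https for bare domains
    parts = urlsplit(site)
    # Keep only the scheme and host; drop any path, query string or fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url("example.com"))
print(robots_url("http://www.example.com/blog/post?id=3"))
```

Pasting the resulting address into a browser shows the same file the spiders read.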




