
How To Robots Txt
by Jkhjuk, Jkh
The robots.txt protocol is used by web site administrators when there are sections or files that they would rather not be accessed by the rest of the world. This could include employee lists, or files that they are circulating internally. For example, the White House website uses robots.txt to block any inquiries on speeches by the Vice President, a photo essay of the First Lady, and profiles of the 9/11 victims.

How does the protocol work? The administrator lists the files that shouldn't be scanned in a plain text file and places that file in the top-level directory of the website. The robots.txt protocol was created by consensus in June 1994 by members of the robots mailing list (robots-request@nexor.co.uk). For most of its history there was no official standards body or RFC behind it (it was only written up as RFC 9309 in 2022), and compliance has always been voluntary, so it's difficult to legislate or mandate that the protocol be followed. In fact, the file is treated as strictly advisory, and there is no absolute guarantee that those contents won't be read.

In effect, robots.txt requires cooperation by the web spider and even the reader, since anything that is uploaded to the internet becomes publicly available. You aren't locking them out of those pages; you are only asking them to stay out, and it takes very little for them to ignore these instructions. The file itself is public, so anyone, including computer hackers, can read it and go straight to the pages it lists. So the rule of thumb is: if it's that sensitive, it shouldn't be on your website to begin with.

Care, however, should be taken to ensure that the robots.txt file doesn't block search engine robots from other areas of the website. Doing so will dramatically affect your search engine ranking, as the search engines rely on their robots to count the keywords, review meta tags, titles and crossheads, and register the hyperlinks.

One misplaced character can have catastrophic effects. For example, robots.txt patterns are matched by simple prefix comparison, so care should be taken to make sure that patterns matching directories have the final '/' character appended: otherwise all files with names starting with that string will match, rather than just those in the directory intended.
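
The effect of the missing '/' is easy to see in a few lines of Python. This is only a simplified model of the matching rule described above, not any search engine's actual code; the function name and the sample paths are made up for illustration.

```python
def is_disallowed(path, pattern):
    """Simplified model of a Disallow rule: the pattern blocks every
    path that starts with it. An empty pattern blocks nothing."""
    return bool(pattern) and path.startswith(pattern)


# Hypothetical paths on a hypothetical site.
paths = ["/private/report.html", "/private-offers.html", "/index.html"]

for pattern in ("/private", "/private/"):
    blocked = [p for p in paths if is_disallowed(p, pattern)]
    print(f"Disallow: {pattern:<10} blocks {blocked}")
```

Without the trailing '/', the pattern also catches /private-offers.html, a page that was probably meant to stay visible.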

To avoid these problems, consider submitting your site to a search engine spider simulator, also called a search engine robot simulator. These simulators, which can be bought or downloaded from the internet, use the same processes and strategies as the different search engines and give you a "dry run" of how they will read your site. They will tell you which pages are skipped, which links are ignored, and which errors are encountered. Since the simulators will also reenact how the bots will follow your hyperlinks, you'll see if your robots.txt file is interfering with the search engine's ability to read through all the necessary pages.

It's also important to review your robots.txt file yourself, which will enable you to spot any problems and correct them before you submit your site to real search engines.
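
A very small "dry run" can also be done with Python's standard urllib.robotparser module. The sketch below is not a full spider simulator: it only reports which paths a given spider would be allowed to fetch. The sample file contents and paths are hypothetical, and Python's parser may differ in details from any particular search engine.

```python
from urllib.robotparser import RobotFileParser


def dry_run(robots_txt, user_agent, paths):
    """Return {path: allowed} for one spider, without touching the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {path: parser.can_fetch(user_agent, path) for path in paths}


# Hypothetical robots.txt contents and site paths.
SAMPLE = """\
User-agent: *
Disallow: /internal/
"""

report = dry_run(SAMPLE, "Googlebot", ["/", "/products.html", "/internal/staff.html"])
for path, allowed in report.items():
    print(("crawl " if allowed else "SKIP  ") + path)
```

Running it against your own file and a list of your important pages shows at a glance whether any of them would be skipped.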

Sometimes we rank well on one engine for a particular keyphrase and assume that all search engines will like our pages, and hence that we will rank well for that keyphrase on a number of engines. Unfortunately this is rarely the case. All the major search engines differ somewhat, so what gets you ranked high on one engine may actually help to lower your ranking on another engine.

It is for this reason that some people like to optimize pages for each particular search engine. Usually these pages would only be slightly different but this slight difference could make all the difference when it comes to ranking high.

However, because a search engine spider crawls through a site indexing every page it can find, it might come across your search engine specific optimized pages. Because they are very similar, the spider may think you are spamming it and will do one of two things: ban your site altogether, or severely punish you in the form of lower rankings.

The solution in this case is to stop specific search engine spiders from indexing some of your web pages. This is done using a robots.txt file, which resides on your webspace.

A robots.txt file is a vital part of any webmaster's battle against getting banned or punished by the search engines if he or she designs different pages for different search engines.

The robots.txt file is just a simple text file, as the file extension suggests. It's created using a simple text editor like Notepad, or WordPad if you save as plain text. Complicated word processors such as Microsoft Word add their own formatting and will only corrupt the file.

The file works by listing records in a fixed format. Each record looks like this:

User-Agent: (Spider Name)
Disallow: (File Name)

The User-Agent is the name of the search engine's spider, and Disallow is the name of the file that you don't want that spider to index.

You have to start a new batch of code for each engine, but if you want to list multiple disallowed files you can put them one under another. For example:

# Slurp is Inktomi's spider
User-Agent: Slurp
Disallow: /xyz-gg.html
Disallow: /xyz-al.html
Disallow: /xxyyzz-gg.html
Disallow: /xxyyzz-al.html

The above code stops Inktomi from spidering two pages optimized for Google (gg) and two pages optimized for AltaVista (al). If Inktomi were allowed to spider these pages as well as the pages specifically made for Inktomi, you may run the risk of being banned or penalized. Hence, it's always a good idea to use a robots.txt file.
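
A record like this can be checked with Python's standard urllib.robotparser before it goes live. The sketch assumes the rules are written the way that parser expects: the spider name alone on the User-agent line, no blank line between User-agent and its Disallow lines, and each path starting with '/'. The helper name is made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# The Inktomi record from the article, in the form the parser expects.
RULES = """\
User-agent: Slurp
Disallow: /xyz-gg.html
Disallow: /xyz-al.html
Disallow: /xxyyzz-gg.html
Disallow: /xxyyzz-al.html
"""


def allowed(user_agent, path, rules=RULES):
    """True if the named spider may fetch the path under these rules."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(user_agent, path)


for spider in ("Slurp", "Googlebot"):
    print(spider, "may fetch /xyz-gg.html:", allowed(spider, "/xyz-gg.html"))
```

Slurp is refused the Google-optimized page while Googlebot, which has no record of its own here, is still free to fetch it.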

The robots.txt file resides on your webspace, but where on your webspace? The root directory! If you upload your file to a sub-directory it will not work. If you want to disallow all engines from indexing a file, simply use the "*" character where the engine's name would usually be. However, beware that under the original standard the "*" character won't work as a wildcard on the Disallow line; some engines accept it there as an extension, but you shouldn't rely on that for every spider.
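
Both points can be sketched in Python. The first function works out the one address where spiders look for the file, given any page on the site; the second shows a "*" record applying to every spider. The function names, example.com addresses and /drafts.html are hypothetical, and the check uses Python's own urllib.robotparser rather than any search engine's code.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.robotparser import RobotFileParser


def robots_url(page_url):
    """Return the root-level robots.txt address for the site serving page_url."""
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# Hypothetical record: "*" on the User-agent line addresses every spider.
ALL_SPIDERS = """\
User-agent: *
Disallow: /drafts.html
"""


def can_crawl(user_agent, path, rules=ALL_SPIDERS):
    """True if the named spider may fetch the path under these rules."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(user_agent, path)


print(robots_url("http://www.example.com/shop/page.html"))
for spider in ("Googlebot", "Scooter", "Slurp"):
    print(spider, "may fetch /drafts.html:", can_crawl(spider, "/drafts.html"))
```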

Here are the names of a few of the big engines:

Excite - ArchitextSpider
AltaVista - Scooter
Lycos - Lycos_Spider_(T-Rex)
Google - Googlebot
Alltheweb - FAST-WebCrawler

Be sure to check over the file before uploading it, as a simple mistake could mean your pages are indexed by engines you don't want indexing them or, even worse, that none of your pages are indexed.
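
Part of that check can be automated. The following Python sketch looks for a few common slips: a blank line that cuts a User-agent line off from its rules, paths without a leading '/', misspelled field names, and a record that shuts every spider out of the whole site. The function name and the list of accepted fields are assumptions made for this illustration, and the checks are deliberately strict; a clean result does not guarantee that every search engine will read the file the same way.

```python
KNOWN_FIELDS = {"user-agent", "disallow", "allow", "sitemap", "crawl-delay"}


def check_robots_txt(text):
    """Return a list of warnings; an empty list means no slips were found."""
    warnings = []
    agents = []        # spiders named in the current record
    has_rules = False  # True once the current record has an Allow/Disallow line

    for number, raw in enumerate(text.splitlines(), start=1):
        if not raw.strip():
            # A blank line ends the current record.
            if agents and not has_rules:
                warnings.append(f"line {number}: blank line cuts User-agent off from its rules")
            agents, has_rules = [], False
            continue

        line = raw.split("#", 1)[0].strip()  # drop any trailing comment
        if not line:
            continue  # comment-only line
        if ":" not in line:
            warnings.append(f"line {number}: expected 'Field: value'")
            continue

        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()

        if field == "user-agent":
            if has_rules:  # a new record started without a blank line
                agents, has_rules = [], False
            agents.append(value)
        elif field in ("disallow", "allow"):
            has_rules = True
            if not agents:
                warnings.append(f"line {number}: rule appears before any User-agent line")
            if value and not value.startswith("/"):
                warnings.append(f"line {number}: path '{value}' should start with '/'")
            if field == "disallow" and value == "/" and "*" in agents:
                warnings.append(f"line {number}: blocks every spider from the whole site")
        elif field not in KNOWN_FIELDS:
            warnings.append(f"line {number}: unknown field '{field}'")

    return warnings


for message in check_robots_txt("User-Agent: Slurp\n\nDisallow: xyz-gg.html\n"):
    print(message)
```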

About Author
Both Jkhjuk & Michael Sherriff are contributors for EditorialToday.
