Webmasters of small websites often wrongly assume that they do not need a robots.txt file, but they do. Before defining the robots.txt file, we need to define what a web robot is. A web robot, also called a spider or crawler, is a program that visits web pages automatically. It should not be confused with a normal web browser, which is not a robot.
The main use of the robots.txt file is to tell robots what they may crawl and what they should not crawl. This gives you a little more control over the robots: you can issue indexing instructions to various search engines.
A robots.txt file invites the search engines in. Some well-behaved bots may even stay away from your website if you have not created a robots.txt file at its top level. Sometimes you need to exclude some pages from search engines, such as pages that are still under construction or directories that you do not want indexed. You may also want to exclude robots whose main aim is to collect email addresses.
A robots.txt file is a simple text file that can be created in Notepad. It must be saved to the root directory of your website, that is, the directory where your home page or index page is stored. To create a simple robots.txt file that allows all robots to spider your website, write the following:
User-agent: *
Disallow:
This will allow all robots to index your pages.
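The effect of an empty Disallow line can be checked with Python's standard urllib.robotparser module. This is only a sketch: the helper name is_allowed, the robot name "AnyBot" and the host www.example.com are placeholders that do not come from the article, and real search engines may interpret rules slightly differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# Rules from the text: every robot, nothing disallowed.
ALLOW_ALL_RULES = """\
User-agent: *
Disallow:
"""


def is_allowed(rules: str, robot: str, url: str) -> bool:
    """Parse robots.txt text and report whether `robot` may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(robot, url)


# "AnyBot" and www.example.com are placeholders, not names from the article.
print(is_allowed(ALLOW_ALL_RULES, "AnyBot", "http://www.example.com/index.html"))
```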
If you do not want a specific robot to have access to any of your website's pages, do the following:
User-agent: specificbot
Disallow: /
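A quick way to see that this record affects only the named robot is to parse it with Python's standard urllib.robotparser. In this sketch, "someotherbot" and www.example.com are made-up placeholders, and the result reflects this parser's behaviour rather than any particular search engine's.

```python
from urllib.robotparser import RobotFileParser

# The record from the text: one named robot, whole site disallowed.
BLOCK_ONE_ROBOT = """\
User-agent: specificbot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(BLOCK_ONE_ROBOT.splitlines())

# Placeholder URL; any path on the site gives the same answer here.
page = "http://www.example.com/index.html"

named_robot_allowed = parser.can_fetch("specificbot", page)
other_robot_allowed = parser.can_fetch("someotherbot", page)

print("specificbot allowed:", named_robot_allowed)
print("someotherbot allowed:", other_robot_allowed)
```

Robots that are not named in any record, and for which no "*" record exists, are treated as unrestricted by this parser.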
Suppose you do not want Googlebot to index a page named "abc" in a directory named newdir. In the Disallow line you would put:
User-Agent: Googlebot
Disallow: /newdir/abc.html
If you do not want the complete directory indexed, you would put:
User-Agent: Googlebot
Disallow: /newdir/
The forward slashes at the beginning and the end tell search engines not to include anything inside that directory.
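The difference between blocking one page and blocking a whole directory can be compared with Python's standard urllib.robotparser. This is a sketch under stated assumptions: www.example.com is a placeholder host, other.html is a hypothetical second file, and the directory rule is assumed to be written as /newdir/.

```python
from urllib.robotparser import RobotFileParser


def googlebot_may_fetch(disallow_value: str, path: str) -> bool:
    """Apply a single Disallow value to Googlebot and test one URL path."""
    parser = RobotFileParser()
    parser.parse(["User-agent: Googlebot", f"Disallow: {disallow_value}"])
    # www.example.com is a placeholder host.
    return parser.can_fetch("Googlebot", "http://www.example.com" + path)


# Page rule: only abc.html is off limits.
page_rule = "/newdir/abc.html"
# Directory rule (assumed form): everything under /newdir/ is off limits.
directory_rule = "/newdir/"

for path in ("/newdir/abc.html", "/newdir/other.html", "/index.html"):
    print(
        path,
        "| page rule allows:", googlebot_may_fetch(page_rule, path),
        "| directory rule allows:", googlebot_may_fetch(directory_rule, path),
    )
```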
Thus, creating a robots.txt file is an important part of SEO services and should not be ignored.
When a search engine spider accesses your website, it will usually look first for a file in the root directory of your site (where your website begins) called "robots.txt." The robots.txt file tells the spider what it may spider (index/parse). The standard for all of this is called "The Robots Exclusion Standard."
The format for this standard is very simple. It consists of records in a text file, each record consisting of two fields: a user-agent line and one or more disallow lines. These fields are formatted in a specific way so that the spider program can read them. You'll see examples of this formatting later in this article.
The first field is the "User-agent" field, which is used to specify which robot the "Disallow" lines in the next field apply to. Usually, this contains the wildcard character "*" to specify all robots. In some cases, however, you may wish to exclude only specific robots, such as Googlebot.
The second field is the "Disallow" field, which can contain several lines. You can specify that robots are to ignore specific files, whole directories, or combinations of these. Password-protected directories (such as those on a Unix system using .htaccess files) are usually out of reach for robots, but it's a good idea to include them in the "Disallow" lines anyway.
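To make the record structure concrete, here is a rough sketch of how a program might split a robots.txt file into records, each with its User-agent values and Disallow values. The function parse_records, the sample rules and the /private/ directory are illustrative assumptions, not part of the standard or of any real spider.

```python
def parse_records(text: str) -> list[dict[str, list[str]]]:
    """Group robots.txt lines into records of User-agent and Disallow values."""
    records: list[dict[str, list[str]]] = []
    current = None
    for raw in text.splitlines():
        # Drop anything after "#" and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        if not line or ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            # A User-agent line that follows Disallow lines starts a new record.
            if current is None or current["disallow"]:
                current = {"user-agent": [], "disallow": []}
                records.append(current)
            current["user-agent"].append(value)
        elif field == "disallow" and current is not None:
            current["disallow"].append(value)
    return records


# Hypothetical sample: two records, the first with two Disallow lines.
SAMPLE = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /private/

User-agent: Googlebot
Disallow: /newdir/
"""

for record in parse_records(SAMPLE):
    print(record)
```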
To create or edit your robots.txt file, you'll need a text editor such as Notepad. Whatever you use, just make sure it saves in pure text and in no other format. Your HTML editor usually has this function.
Comments can be done using the "#" character to specify that a comment follows. Since the file's contents are pretty self-explanatory, comments are rarely used. The first line of your robots.txt file is the User-agent line, so the first line will probably look like this:
User-agent: *
You can replace the "*" with any robot's name, if you wish. For a complete and up-to-date list of spider names, visit http://www.searchenginedictionary.com/spider-names.shtml.
The next line or lines list the files and directories that the spider or spiders named in the User-agent line may not access. Each path starts with a forward slash:
Disallow: /dontindexthis.html
This would block spiders from indexing the file "dontindexthis.html" in your root directory. To disallow a whole directory, just use the same format:
Disallow: /cgi-bin/
To disallow specific files in sub-directories, you would use a combination of these:
Disallow: /cgi-bin/dontindexthis.html
Disallow values are matched against the beginning of each path, which works somewhat like a wildcard. You can block a file AND a directory of the same name in the same line like this:
Disallow: /notthisone
This blocks both the directory /notthisone/ and any files whose names begin with "notthisone" (such as "notthisone.html" or "notthisone.cgi"). You can also block all files on the site by putting just a "/" in the Disallow line:
Disallow: /
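The matching behind these examples can be pictured as a simple "starts with" test. The function below is a simplified model of that idea only; is_blocked is a hypothetical helper, and real spiders may add extensions that this sketch ignores.

```python
def is_blocked(path: str, disallow_values: list[str]) -> bool:
    """Return True if `path` begins with any non-empty Disallow value."""
    # An empty Disallow value blocks nothing, so it is skipped.
    return any(value and path.startswith(value) for value in disallow_values)


# One value covers the directory and same-named files.
for path in ("/notthisone/", "/notthisone.html", "/notthisone.cgi", "/other.html"):
    print(path, is_blocked(path, ["/notthisone"]))

# A lone "/" is a prefix of every path, so it covers the whole site.
print("/anything.html", is_blocked("/anything.html", ["/"]))
```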
A completed robots.txt file will look something like this:
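As one possible illustration, the directives covered in this article can be assembled into a complete file and checked with Python's standard urllib.robotparser. The file contents below are an assumed example, not a recommendation for any real site; "AnyBot", script.cgi and www.example.com are placeholders.

```python
from urllib.robotparser import RobotFileParser

# A hypothetical complete robots.txt built from directives shown in the text.
COMPLETE_FILE = """\
# Keep all robots out of the scripts directory and one page
User-agent: *
Disallow: /cgi-bin/
Disallow: /dontindexthis.html

# Keep one named robot out of the whole site
User-agent: specificbot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(COMPLETE_FILE.splitlines())

SITE = "http://www.example.com"  # placeholder host

checks = [
    ("AnyBot", "/index.html"),
    ("AnyBot", "/cgi-bin/script.cgi"),
    ("AnyBot", "/dontindexthis.html"),
    ("specificbot", "/index.html"),
]
for robot, path in checks:
    print(robot, path, "allowed:", parser.can_fetch(robot, SITE + path))
```

A blank line separates the two records, and the lines beginning with "#" are comments that the parser skips.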
If you want to get really complicated with your robots.txt file, I'd suggest you look at some of the robots.txt files of the big boys of the Internet like Amazon.com or eBay. You can find these by simply typing in the URL followed by "/robots.txt" (as in: http://www.amazon.com/robots.txt). These files are universally accessible via the Web as a rule.
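The rule of appending "/robots.txt" to a site's address can be sketched with Python's standard urllib.parse. The helper name robots_url is made up for this example, and the code only builds the address; it does not download anything.

```python
from urllib.parse import urljoin


def robots_url(site_url: str) -> str:
    """Build the address of a site's robots.txt from any URL on that site."""
    # A leading "/" makes urljoin resolve against the site root.
    return urljoin(site_url, "/robots.txt")


print(robots_url("http://www.amazon.com"))
# A deeper page (hypothetical path) resolves to the same root-level file.
print(robots_url("http://www.amazon.com/some/page.html"))
```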
The absence of a robots.txt file and a blank robots.txt file have the same effect: the spider indexes everything on your site, whether you want it to or not. So implementing a robots.txt file is important to your site's success.
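The blank-file case can be reproduced with Python's standard urllib.robotparser by parsing no lines at all. This sketch shows only how that one parser treats an empty file; the helper name, "AnyBot" and www.example.com are placeholders.

```python
from urllib.robotparser import RobotFileParser


def blank_file_allows(robot: str, url: str) -> bool:
    """Report whether a blank robots.txt lets `robot` fetch `url`."""
    parser = RobotFileParser()
    parser.parse([])  # a blank robots.txt: no records at all
    return parser.can_fetch(robot, url)


# Placeholder robot name and URLs.
print(blank_file_allows("AnyBot", "http://www.example.com/index.html"))
print(blank_file_allows("AnyBot", "http://www.example.com/cgi-bin/script.cgi"))
```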
Both Rajeev Guglani & Aaron Turpen are contributors for EditorialToday. The above articles have been edited for relevancy and timeliness. All write-ups, reviews, tips and guides published by EditorialToday.com and its partners or affiliates are for informational purposes only. They should not be used for any legal or any other type of advice. We do not endorse any author, contributor, writer or article posted by our team.
Rajeev Guglani has since written articles on various topics, including Internet Marketing and SEO (Search Engine Optimization). Rajeev Guglani writes articles for SEO. He has vast exposure in writing for Web Promotions. He is working for NDDW.
Aaron Turpen has since written articles on various topics, including Networking, Software and SEO (Search Engine Optimization). Aaron Turpen is the proprietor of Aaronz WebWorkz, a web services company providing consultation, development, and more to small businesses online. Aaron publishes several newsletters regularly and is the author of many ebooks, including "The Layman's Gui.