[drupal-devel] [feature] Introduce crawl delay in robots.txt

Uwe Hermann drupal-devel at drupal.org
Thu Mar 31 04:02:33 UTC 2005


Issue status update for http://drupal.org/node/14177

 Project:      Drupal
 Version:      cvs
 Component:    other
 Category:     feature requests
 Priority:     minor
 Assigned to:  bertboerland at www.drop.org
 Reported by:  bertboerland at www.drop.org
 Updated by:   Uwe Hermann
 Status:       patch

There's no indication of /why/ it was removed; that would be interesting to know.
+1 for adding a default robots.txt file, if you ask me.


Uwe Hermann



Previous comments:
------------------------------------------------------------------------

December 10, 2004 - 12:29 : bertboerland at www.drop.org

Though it is not "a standard" within the "non standard" robots.txt, many
bots obey the "Crawl-delay:" parameter. Since Drupal sites seem to be
popular with search engines, and lots of people have more aggressive bots
than visitors at their site, it might be wise to slow down the robots by
adding robots.txt lines like:
User-Agent: *
Crawl-Delay: 10

(time in seconds between page requests)
Slurp (Yahoo/AV) and MSFT bots obey this parameter; Googlebot does not yet,
but most likely will in 2.1+.
Does it make sense to ship Drupal with a default robots.txt with this
parameter? If so, then there should be something in the documentation
about moving this file to the docroot in case Drupal is installed in a
subdirectory.
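
To check how a parser reads the two lines proposed above, here is a
minimal sketch using Python's standard urllib.robotparser module (its
crawl_delay() method needs Python 3.6 or later). The bot name
"polite-bot" is only a placeholder, and this shows one parser's reading
of the non-standard directive, not how any particular search engine
behaves.

```python
from urllib.robotparser import RobotFileParser

# The two lines proposed above, fed to the parser as a list of lines.
robots_lines = [
    "User-Agent: *",
    "Crawl-Delay: 10",
]

parser = RobotFileParser()
parser.parse(robots_lines)

# "polite-bot" is a made-up name; it falls under the "*" record.
delay = parser.crawl_delay("polite-bot")
print(delay)
```

A well-behaved crawler would then sleep for that many seconds between
page requests.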


------------------------------------------------------------------------

March 30, 2005 - 21:41 : bertboerland at www.drop.org

Seems like there is no robots.txt anymore in cvs?
The old one was something like this (delay added):
# small robots.txt
# more information about this file can be found at
# http://www.robotstxt.org/wc/robots.html
# in case your drupal site is in a directory
# lower than your docroot (e.g. /drupal)
# please add this before the /-es below
# to stop a polite robot indexing an exampledir
# add a line like
# user-agent: polite-bot
# Disallow: /exampledir/
# a list of known bots can be found at
# http://www.robotstxt.org/wc/active/html/index.html
# see http://www.sxw.org.uk/computing/robots/check.html
# for syntax checking
User-agent: *
Crawl-Delay: 10
Disallow: /?q=admin
Disallow: /admin/
Disallow: /cron.php
Disallow: /xmlrpc.php
Disallow: /database/
Disallow: /includes/
Disallow: /modules/
Disallow: /scripts/
Disallow: /themes/
Disallow: */add/
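
A rough way to sanity-check rules like these is to load them into
Python's standard urllib.robotparser and ask which paths a bot may
fetch. This sketch uses only a subset of the rules above; the bot name
and the test paths (/node/1, /admin/settings) are made up for
illustration. The "*/add/" line is left out because wildcards in paths
are not part of the original robots.txt convention and parsers differ
on them.

```python
from urllib.robotparser import RobotFileParser

# A subset of the rules quoted above.
rules = """\
User-agent: *
Crawl-Delay: 10
Disallow: /admin/
Disallow: /cron.php
Disallow: /modules/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())


def allowed(path):
    """Return True if the hypothetical bot may fetch the given path."""
    return parser.can_fetch("polite-bot", path)


# Hypothetical paths: one ordinary page and two that the rules block.
results = {p: allowed(p) for p in ["/node/1", "/admin/settings", "/cron.php"]}
print(results)
```

If Drupal lives under a subdirectory such as /drupal, the same check can
be repeated with that prefix added to each Disallow path.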


------------------------------------------------------------------------

March 30, 2005 - 23:26 : Morbus Iff

It appears that robots.txt was removed in 2002 [1].
[1] http://cvs.drupal.org/viewcvs/drupal/drupal/Attic/robots.txt?rev=1.2&hideattic=0&view=log




