ACADEMIC COMPUTING and COMMUNICATIONS CENTER

Which Files Get Indexed?

How the Robot Finds Files

There are several avenues the catalog server uses in finding files to index:

Robots - Keep Off!

What if you don't want your files indexed? By convention,
indexing robots normally download a file called robots.txt from a web
server before crawling it, and skip the URLs that file tells them to avoid.
How do you get listed in
Here are details on the robots.txt conventions, but the file can be pretty simple.
Just list the directory subtrees that you want the robot to avoid,
one to a line. For example, I might put the following in my robots.txt file:

    disallow: /~bobg/restricted
    disallow: /~bobg/good_stuff/dontlookhere

Then any robot should avoid asking for URLs that begin with these strings.

Notes:

Example

Assume you have a tigger directory and want to keep robots out of these two subtrees:

    http://www.uic.edu/~bobg/block1
    http://www.uic.edu/~bobg/ok/block2

Then prepare a robots.txt file like this, and place it in the public_html directory:

    disallow: /~bobg/block1
    disallow: /~bobg/ok/block2

NOTE: You must include the full prefix of the path part of the URL (in this case,
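
The prefix rule can be made concrete with a short Python sketch. It is only an illustration, not the catalog server's actual code: the function names (disallow_line, parse_rules, is_blocked) are hypothetical, and it assumes that a robot simply compares the path part of each URL against every disallow string and skips the URL when the path begins with one of them.

```python
from urllib.parse import urlparse

def disallow_line(url):
    """Build a disallow line from a full URL, keeping only the path part."""
    return "disallow: " + urlparse(url).path

def parse_rules(text):
    """Collect the path prefixes from 'disallow:' lines, one rule per line."""
    prefixes = []
    for line in text.splitlines():
        name, _, value = line.partition(":")
        if name.strip().lower() == "disallow" and value.strip():
            prefixes.append(value.strip())
    return prefixes

def is_blocked(url, prefixes):
    """Return True if the URL's path begins with any disallowed prefix."""
    path = urlparse(url).path
    return any(path.startswith(prefix) for prefix in prefixes)

# The two subtrees from the example above.
urls = [
    "http://www.uic.edu/~bobg/block1",
    "http://www.uic.edu/~bobg/ok/block2",
]
robots_txt = "\n".join(disallow_line(u) for u in urls)
print(robots_txt)

rules = parse_rules(robots_txt)
# A file under a blocked subtree is skipped ...
print(is_blocked("http://www.uic.edu/~bobg/block1/notes.html", rules))  # True
# ... but a sibling file outside the listed prefixes is still fetched.
print(is_blocked("http://www.uic.edu/~bobg/ok/index.html", rules))      # False
```

Note that the host name is dropped and everything from the leading slash onward is kept, which is why each disallow string carries the full path prefix rather than just the directory name.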

2005-6-18 wwwtech@uic.edu