ACADEMIC COMPUTING and COMMUNICATIONS CENTER
Web Searching / Indexing

Fields & Queries

What Fields Can You Search On?
 

Netscape Catalog creates a summary of each document; it is this summary that is actually indexed and returned by the search engine. Each summary consists of several named fields and associated values. All of the values are combined and indexed, and in addition, some of the fields are indexed separately. In order to formulate intelligent queries, it's important to understand where the values come from, and which ones are indexed.

For example, the title field is derived from the <title> HTML element, and it is indexed. Therefore one could execute the query: find all documents where the word finance occurs in the title field. (The actual query syntax is different; it is described below under Construction of Queries.)

The author field is also indexed, but it is derived from a <meta> tag. That means the author of the page must add a tag like the following to the <head> element of the HTML file:

     <meta name=author content="Paul Dirac">
After that, it would be possible to execute the query: find all documents where the word Paul occurs in the author field.
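To show where the title and author values come from, here is a minimal sketch using Python's standard html.parser module. The names MetaTitleExtractor and extract_fields are hypothetical, and the sketch only mirrors the behavior described above; it is not Netscape Catalog's own extraction code.

```python
from html.parser import HTMLParser


class MetaTitleExtractor(HTMLParser):
    """Hypothetical sketch: collect the <title> text and <meta name/content> pairs."""

    def __init__(self):
        super().__init__()
        self.fields = {}
        self._in_title = False
        self._title_parts = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # html.parser lowercases tag and attribute names
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            name, content = attrs.get("name"), attrs.get("content")
            if name and content is not None:
                # <meta name=author content="..."> becomes the "author" field.
                self.fields[name.lower()] = content

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
            self.fields["title"] = "".join(self._title_parts).strip()

    def handle_data(self, data):
        if self._in_title:
            self._title_parts.append(data)


def extract_fields(html):
    parser = MetaTitleExtractor()
    parser.feed(html)
    parser.close()
    return parser.fields


page = """<html><head>
<title>Notes on Finance</title>
<meta name=author content="Paul Dirac">
</head><body>...</body></html>"""

print(extract_fields(page))  # {'title': 'Notes on Finance', 'author': 'Paul Dirac'}
```

A page without the <meta> tag simply has no author field, so an author-restricted query can never match it.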

The partial-text field is not indexed separately; it is generated from approximately the first 1 KB of text in the HTML document. If the word finance appears in that extract, the query "find all documents with the word finance in any field" would match the document. Note, however, that the search cannot be restricted to the partial-text field, because that field has no separate index.
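The difference between the combined index and the separate field indexes can be modeled in a few lines. This is a simplified sketch, assuming a plain word-to-document mapping; the field names follow the table in this section, while the function names (build_indexes, search_any_field, search_field) are hypothetical and do not describe how Netscape Catalog or Verity store their indexes.

```python
import re
from collections import defaultdict

# Fields that get their own index, per the table in this section.
SEPARATELY_INDEXED = {"title", "author", "keywords", "date", "url", "last-modified"}


def words(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def build_indexes(summaries):
    """summaries: dict of doc_id -> {field_name: value}."""
    combined = defaultdict(set)                       # word -> doc ids (all fields)
    by_field = defaultdict(lambda: defaultdict(set))  # field -> word -> doc ids
    for doc_id, summary in summaries.items():
        for field, value in summary.items():
            for w in words(value):
                combined[w].add(doc_id)               # every value goes here
                if field in SEPARATELY_INDEXED:
                    by_field[field][w].add(doc_id)    # only some fields go here
    return combined, by_field


def search_any_field(combined, word):
    return combined.get(word.lower(), set())


def search_field(by_field, field, word):
    if field not in SEPARATELY_INDEXED:
        raise ValueError(f"{field!r} has no separate index")
    return by_field[field].get(word.lower(), set())


docs = {
    "a.html": {"title": "Campus Finance Office", "partial-text": "Budget forms..."},
    "b.html": {"title": "Printing Help", "partial-text": "See the finance office for billing."},
}
combined, by_field = build_indexes(docs)
print(sorted(search_any_field(combined, "finance")))       # ['a.html', 'b.html']
print(sorted(search_field(by_field, "title", "finance")))  # ['a.html']
# search_field(by_field, "partial-text", "finance") raises ValueError.
```

The word finance in b.html's partial text is found by an "any field" search, but no query can target the partial-text field by itself.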

Some of the fields you might find useful are listed in the following table. Some of these fields are generated automatically, but others must be added explicitly with the <meta> tag. Some are indexed and can therefore be searched separately, while others are not specially indexed and can only be searched as part of the overall document index.
Field              Source                     Separately indexed
Author             <meta>                     yes
Keywords           <meta>                     yes
Date               <meta>                     yes
Title              <title> element            yes
URL                URL of the page            yes
Last-Modified      last update of the file    yes
Table-of-contents  <h1>..<h3> tags            no
Description        <meta>                     no
Partial-text       first 1 KB of text         no
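The two automatically generated fields at the bottom of the table can be approximated as follows. This is a sketch under stated assumptions: the 1024-character cutoff stands in for "approximately the first 1 KB", element boundaries are treated as word breaks, and the names SummaryBuilder and build_summary are hypothetical rather than part of Netscape Catalog.

```python
from html.parser import HTMLParser

PARTIAL_TEXT_LIMIT = 1024  # assumed cutoff for "about the first 1 KB of text"


class SummaryBuilder(HTMLParser):
    """Hypothetical sketch of the table-of-contents and partial-text fields."""

    HEADINGS = {"h1", "h2", "h3"}        # headings that feed table-of-contents
    SKIP = {"script", "style", "title"}  # text not counted as body text here

    def __init__(self):
        super().__init__()
        self.toc = []
        self.text = []
        self._heading = None  # text pieces while inside h1..h3, else None
        self._skip_depth = 0  # > 0 while inside a skipped element

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADINGS:
            self._heading = []
        elif tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.HEADINGS and self._heading is not None:
            self.toc.append(" ".join(" ".join(self._heading).split()))
            self._heading = None
        elif tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth:
            return
        if self._heading is not None:
            self._heading.append(data)
        self.text.append(data)  # heading text is also part of the body text


def build_summary(html):
    parser = SummaryBuilder()
    parser.feed(html)
    parser.close()
    # Collapse all whitespace runs, then keep only the leading characters.
    body_text = " ".join(" ".join(parser.text).split())
    return {
        "table-of-contents": parser.toc,
        "partial-text": body_text[:PARTIAL_TEXT_LIMIT],
    }


page = ("<html><head><title>T</title></head><body>"
        "<h1>Printers</h1><p>How to print.</p>"
        "<h2>Fees</h2><p>Costs vary.</p><h4>Minor</h4></body></html>")
summary = build_summary(page)
print(summary["table-of-contents"])  # ['Printers', 'Fees']
print(summary["partial-text"])       # Printers How to print. Fees Costs vary. Minor
```

The <h4> heading is left out of the table of contents but still counts toward the partial text, matching the table's <h1>..<h3> rule.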

Construction of Queries
 

The Verity search engine will accept a wide variety of queries. It can search for documents based on word stem, exact word, wildcards, relevance ranking, and so forth. For full details, please consult the Verity Site.

This matters mainly if you want to write your own advanced queries. Otherwise, the CGI scripts at UIC translate the words you type into simple queries. Still, knowing a little of the Verity syntax makes it easier to see how the web form input is translated into an actual search. (In most searches, stemmed variations of the search words count as matches, so a search for printer also finds printing and printers.)
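The idea of stem matching can be illustrated with a deliberately crude suffix-stripping stemmer. Verity's actual stemming rules are not described here, so crude_stem, stem_matches, and the suffix list are hypothetical; the sketch only shows why a search for printer also matches printing and printers.

```python
# Hypothetical, crude stemmer: NOT Verity's algorithm, only an illustration.
SUFFIXES = ("ers", "ing", "er", "s")  # checked longest first


def crude_stem(word):
    word = word.lower()
    for suffix in SUFFIXES:
        # Keep at least three letters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def stem_matches(query_word, text):
    """Return the words in text whose stem equals the query word's stem."""
    target = crude_stem(query_word)
    tokens = [w.strip(".,;:!?") for w in text.split()]
    return [t for t in tokens if crude_stem(t) == target]


print(stem_matches("printer", "Printers and printing: see the printer FAQ."))
# ['Printers', 'printing', 'printer']
```

Real stemmers use much more careful rules; this one would, for example, wrongly strip words such as "offer" down to "off".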

This table contains a small sampling of the most important search operators.
Operator          Description                                                          Example
AND               Both items must match.                                               trucks AND cars
<CONTAINS>        Finds words only in the named field.                                 title <CONTAINS> trucks
> (greater than)  Dates later than a given value.                                      date > 5-30-97
< (less than)     Dates earlier than a given value.                                    date < 5-30-97
<NEAR>            Ranks a document higher when the words appear near each other.       cars <NEAR> trucks
OR                At least one item must match; more frequent matches rank higher.     trucks OR cars
<WORD>            Matches the word exactly, without considering stem variations.       <WORD> trucks
quotes            Matches the whole phrase, not just the individual words.             "red cars"
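To make the meaning of these operators concrete, here is a toy in-memory model. It is not Verity: real queries are strings parsed by the search engine, whereas here each operator is a hypothetical Python function (word, phrase, contains, and_, or_, date_after, date_before, near_score) that returns a test over one document summary. Stemming is left out, dates are assumed to be month-day-year as in 5-30-97, and the 1/distance scoring in near_score is a made-up rule used only to show that closer words score higher.

```python
import re
from datetime import date


def tokens(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def all_text(doc):
    # The combined text of every string-valued field in the summary.
    return " ".join(v for v in doc.values() if isinstance(v, str))


def word(w):             # <WORD> trucks: exact word only, no stem variants
    return lambda doc: w.lower() in tokens(all_text(doc))


def phrase(p):           # "red cars": the words must appear consecutively
    want = tokens(p)

    def pred(doc):
        have = tokens(all_text(doc))
        n = len(want)
        return any(have[i:i + n] == want for i in range(len(have) - n + 1))
    return pred


def contains(field, w):  # title <CONTAINS> trucks: look in one field only
    return lambda doc: w.lower() in tokens(doc.get(field, ""))


def and_(*preds):        # trucks AND cars
    return lambda doc: all(p(doc) for p in preds)


def or_(*preds):         # trucks OR cars
    return lambda doc: any(p(doc) for p in preds)


def date_after(d):       # date > 5-30-97
    return lambda doc: "date" in doc and doc["date"] > d


def date_before(d):      # date < 5-30-97
    return lambda doc: "date" in doc and doc["date"] < d


def near_score(a, b):    # cars <NEAR> trucks: a ranking score, not a filter
    def score(doc):
        toks = tokens(all_text(doc))
        pa = [i for i, t in enumerate(toks) if t == a]
        pb = [i for i, t in enumerate(toks) if t == b]
        if not pa or not pb:
            return 0.0
        gap = min(abs(i - j) for i in pa for j in pb)
        return 1.0 / max(gap, 1)  # made-up rule: smaller gap, higher score
    return score


docs = {
    "lot.html": {"title": "Used Trucks", "text": "red cars and trucks for sale",
                 "date": date(1997, 6, 2)},
    "bus.html": {"title": "Campus Buses", "text": "no cars allowed",
                 "date": date(1997, 5, 1)},
}


def run(query):
    return sorted(name for name, doc in docs.items() if query(doc))


print(run(and_(word("trucks"), word("cars"))))  # ['lot.html']
print(run(or_(word("trucks"), word("cars"))))   # ['bus.html', 'lot.html']
print(run(contains("title", "trucks")))         # ['lot.html']
print(run(phrase("red cars")))                  # ['lot.html']
print(run(date_after(date(1997, 5, 30))))       # ['lot.html']
print(run(date_before(date(1997, 5, 30))))      # ['bus.html']
print(run(word("truck")))                       # [] (exact word, no stemming)
```

Keeping <NEAR> as a score rather than a yes/no test reflects the table's description: it changes the ranking of documents rather than which ones match.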



2002-6-29  wwwtech@uic.edu