ACCC Home Page Academic Computing and Communications Center  
 
Web Searching / Indexing

Custom Output Formats and SGML Config Files

     
 
     
Background
 

SearchUIC is a generic CGI script that accepts information from an HTML form, optionally reads a config file from disk, runs a query on a search engine (which previously indexed Web pages at UIC), and presents the output. SearchUIC can be used by anyone publishing documents on one of the ADN Web servers (icarus and tigger, as of this writing), without requiring prior authorization from the Academic Computing and Communications Center.

 
     
Basic Usage Instructions
 

You must prepare at least one of two files:

  1. The original HTML form
  2. A configuration file, to be read by the SearchUIC CGI script, which tells the script how to behave when your HTML form is submitted.

In more detail:

  1. Prepare the HTML form, put it on the Web, and make sure the HTML is valid.
  2. Prepare a configuration file for your form. This config file must also be placed on the Web on tigger or icarus, although it is not intended to be publicly viewed. The name of the config file is arbitrary, and each HTML form should have its own associated config file. For purposes of this documentation, I'll assume the config file is named config.sgml.
  3. You will need to know the URL of this config file; more precisely, you will need the path part of the URL, that is, the URL without the "http://www.uic.edu" scheme and host name part. For example, I have a file whose URL is http://www.uic.edu/~bobg/config.sgml, so the config file path is simply /~bobg/config.sgml. Make sure your config file is publicly readable, using the command chmod a+r config.sgml if necessary. If you can actually retrieve the config file with a Web browser using the proper URL, the permissions are fine.
  4. Inside the HTML form itself, you will, of course, need a <form action="..."> tag. If your config file is on tigger (it doesn't matter where the form itself is), then this tag should be:
    
    <form method="POST" action="http://www.uic.edu/htbin/search/SearchUIC/path_of_config_file"> 
    
    
    So, for the above example, it would be:
    
    <form method="POST" action="http://www.uic.edu/htbin/search/SearchUIC/~bobg/config.sgml">
    
    
    (If the config file were on icarus, you would use www2.uic.edu instead of www.uic.edu.)
  5. The format of the config file can be complicated, and will be discussed in the next section. For now, you can construct a minimal config file for testing. It should look like:
           <!DOCTYPE search SYSTEM 'Search-1.0.dtd'>
           <search name="mysearch">
           </search>
           
That's enough to get you started. You can then extend the functionality by changing the config file and possibly the HTML form.

Note: If you use both an HTML form and a config file, and happen to have the same variable set in both, the value in the HTML form takes precedence.
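As a rough illustration of steps 3 and 4, the following Python sketch derives the config file path from its URL and builds the matching action value. The helper names config_path and form_action are made up for this example. The sketch also assumes that the script lives at /htbin/search/SearchUIC on the same host that serves the config file (www.uic.edu for tigger, www2.uic.edu for icarus).

```python
from urllib.parse import urlsplit

# Location of the SearchUIC script on the server, taken from the form examples above.
SCRIPT_PATH = "/htbin/search/SearchUIC"


def config_path(config_url: str) -> str:
    """Return the path part of the config file URL (hypothetical helper)."""
    return urlsplit(config_url).path


def form_action(config_url: str) -> str:
    """Build the <form action="..."> value for a given config file URL.

    Assumption: the script is reached on the same host as the config file.
    """
    parts = urlsplit(config_url)
    return f"{parts.scheme}://{parts.netloc}{SCRIPT_PATH}{parts.path}"


if __name__ == "__main__":
    url = "http://www.uic.edu/~bobg/config.sgml"
    print(config_path(url))
    print(form_action(url))
```

For the ~bobg example this reproduces the path and the action URL shown in steps 3 and 4.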


 
     
Config File Format
 

The config file is defined by a set of tags, similar in spirit to HTML. But the actual tags differ because the meaning is different. If you leave most of the file blank, defaults will be assumed, but you can override these defaults to a large degree.

The tags are described below, but you may (i.e., will) be interested in examples. You might bring one up in a separate Web window to compare it to the tag descriptions below.

(For the technically interested, I invented a small tag set using SGML, a meta-language for defining tag sets. HTML is another example of an SGML-defined tag set. And, since you already know how to specify HTML tags, the SearchUIC tags won't be too surprising.)

The SGML tags for the config file are of three general types.

Initial tags:        <!DOCTYPE>, <search>
Input tags:          <set>, <eval>
HTML response tags:  <response>

Putting the search results into HTML responses

Before describing the tags in detail, you should be aware of how to refer to the value of an HTML input field or other internal variable when setting up the tags. Suppose you have an input field in your original HTML form:
<input name="myname" type="text" >
You may later refer to the value that the user types in as $myname or ${myname}. In general, almost everything in the config file starting with a dollar sign will be interpreted as a field value. If you really want just a dollar sign, escape it with a backslash: \$

You probably know that the quotes in the above <input> tag are not necessary unless the values of name or type have embedded spaces. Although it might be possible to use embedded spaces in the name of an input field, don't try this unless you have very strong proofreading abilities and/or like pain. If you do try this, make sure to use the ${myname} format to refer to variables.
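To make the reference rules concrete, here is a small Python sketch of how $name, ${name}, and \$ could be expanded in a piece of text. It is not the SearchUIC implementation. The function name expand is invented, and replacing unknown variables with an empty string is an assumption.

```python
import re

# Matches, in order: an escaped dollar sign, a ${name} reference, a $name reference.
_REFERENCE = re.compile(r"\\\$|\$\{([^}]+)\}|\$(\w+)")


def expand(text: str, values: dict) -> str:
    """Replace $name and ${name} with their values; turn \\$ into a literal $."""

    def substitute(match: re.Match) -> str:
        if match.group(0) == "\\$":
            return "$"  # escaped dollar sign
        name = match.group(1) or match.group(2)
        # Assumption: a variable that was never set expands to an empty string.
        return values.get(name, "")

    return _REFERENCE.sub(substitute, text)


if __name__ == "__main__":
    fields = {"myname": "Bob", "my name": "Robert"}
    print(expand("Hello $myname (${my name}), that will be \\$5", fields))
```

The braced form is the only one that can handle a name with an embedded space, which is why the text above recommends it for such names.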

There are a number of variables pre-defined for you, all of which start with $s_. You may need to reset some of these variables, either in the HTML form or in the config file. In general, your variables should be of the following form:

$s_date      Current date
$s_ip        Name or IP address of client
$s_config    Full path to the config file
$myname      Value of the myname HTML input field.
${myname}    Value of the myname HTML input field. Note: Use this form if myname contains colons or other special characters.

Internal Variables

The following variables are available for use in evaluating expressions. Some of the variables can also be set, using HTML forms, or the <SET> or <EVAL> tags in the config file.

These variables are read-only:

Variable          Possible Values         Meaning
$s_date           string                  current date
$s_ip             dotted number or name   name or IP address of client
$s_config         string                  full path to the config file
$s_status         OK                      if the search is normal, and has produced results
                  NOHITS                  if the search is normal, but no hits were returned
                  NOSEARCH                if no search occurred, because the search string was blank
                  ERROR                   if an internal error occurred
$s_stophits       integer                 number of the last document returned
$s_nres           integer                 number of hits returned
$s_results        string                  formatted results, provided $s_status is OK
$s_more           + or blank              + if more documents would have been returned in a larger search, otherwise blank
$s_next           integer                 number of the next document to be returned, if a larger search were done
$s_qstring        string                  the search string, in Verity Query Language
$s_qstring_html   string                  an HTML'ized version of $s_qstring, to be displayed in the results
$s_qstring_url    string                  a URL'ized version of $s_qstring, to be used inside a URL

These variables can be read or set:

Variable          Possible Values          Meaning
$s_starthits      integer                  document number to start with
$s_maxhits        integer                  max hits to return
$s_sortby         ALPHA                    sort results alphabetically by title
                  DATE                     sort by last modified date
                  SCORE                    sort by relevancy score
$s_resultsby      ALL                      Return all fields.
                  fields                   Comma-delimited list of fields to return. If one or two fields are specified, starting with title, return one line per hit. Otherwise use multi-line format. Example: title,description
User defined      string                   You can set your own variables in the HTML form or by <set> or <eval> tags in the config file. You can later use these values in the body of a <response> tag, or in the if attribute to control the actual output.
query*            See Query Field Format   Variables that start with the string query are like user-defined variables, but also control the generation of the internal query string, manifested in $s_qstring
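The one-line versus multi-line rule for $s_resultsby can be written down as a short Python sketch. This only restates the rule in the table above. The function name result_layout is invented, and treating ALL as the multi-line case is an assumption, since the table does not say so explicitly.

```python
def result_layout(resultsby: str) -> str:
    """Guess the layout implied by an $s_resultsby value (illustrative only)."""
    if resultsby == "ALL":
        # Assumption: returning all fields falls under the "otherwise" case.
        return "multi-line"
    fields = [field.strip() for field in resultsby.split(",") if field.strip()]
    # One or two fields, starting with title, give one line per hit.
    if 1 <= len(fields) <= 2 and fields[0] == "title":
        return "one-line"
    return "multi-line"


if __name__ == "__main__":
    for value in ("title,description", "description,title", "ALL"):
        print(value, "->", result_layout(value))
```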

The <!DOCTYPE> tag

The first line of your config file must be:
		<!DOCTYPE search SYSTEM 'Search-1.0.dtd'> 
Don't worry about the details. This just tells the SearchUIC script which version of the SearchUIC tag set you've used.

I've defined 'Search-1.0.dtd' to point to 'Search-1.0b1.dtd' as of this writing. So you have a choice:

  1. Use 'Search-1.0.dtd', which will always point to the current beta release, and to the production release later on. This might cause your config files to break if I have to change the DTD in incompatible ways. (This is possible, although unlikely, during beta testing.)
  2. Use 'Search-1.0b1.dtd' which points to a specific beta release. This should be stable (or stably broken, depending). But when I release the production version, I'll remove all beta copies, so you'll have to change your tags at that point.

The <SEARCH> tag

The config file contains a (possibly large) element in the form: <SEARCH> ... </SEARCH> and all the other tags will sit inside. The <SEARCH> tag takes some attributes:

	<SEARCH DEBUG NAME="search_name">
DEBUG                Optional   Just put in the word DEBUG. This will send lots of diagnostics to your browser when you submit your search. Highly encouraged for diagnosing why your search behaves as it does. (Note that if your config file contains syntax errors, they are reported to your Web browser whether or not you have DEBUG turned on. DEBUG is used to analyze the behavior of a syntactically correct config file that may or may not do what you want.)
NAME="search_name"   Required   Pick any search_name you want. It is best if different searches have different names.

The <SET> tag

The <set> tag is pretty simple. It lets you set variables before the search is conducted. The exact same function could be done with (possibly hidden) input fields in the HTML form. In many cases, it is just a matter of preference whether to set the variables in one place or the other.

Warning: The <SET> tag takes an if attribute. If the if attribute is absent, the SET action takes place before the HTML form values are read. But if the if attribute is present, the action takes place after the HTML form values are read. This means that values set by the HTML form will override those SET statements that don't contain ifs, but may be overridden by SET statements that do contain ifs. It also means that the if attribute may depend on values contained in the HTML form, but not on values generated by the search itself. The format of this tag is:

<SET NAME="variable_name" VALUE="variable_value" >

NAME="variable_name"     Required   variable_name might match the name of one of the existing internal variables (those starting with s_) or simply be your own variable that you want to reference later. Don't use the leading $ before the variable name here.
VALUE="variable_value"   Required   variable_value is the value of the variable. Should be a simple string or number, depending on the variable.
IF="if_cond"             Optional   If absent, NAME is set to VALUE before the HTML form values are read. If present and if if_cond evaluates to true (possibly depending on values from the HTML form but not the search results), then NAME is set to VALUE, overriding the HTML form values.
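The ordering described in the warning above can be sketched in Python as three passes: unconditional SETs, then the submitted form values, then conditional SETs. This models only the documented precedence and is not the script's code. SetTag and resolve_variables are invented names, and the IF condition is represented by a plain Python callable rather than by the real if_cond syntax.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Values = Dict[str, str]


@dataclass
class SetTag:
    """Stand-in for one <SET> tag; if_cond is None when the IF attribute is absent."""
    name: str
    value: str
    if_cond: Optional[Callable[[Values], bool]] = None


def resolve_variables(set_tags: List[SetTag], form_values: Values) -> Values:
    values: Values = {}
    # Pass 1: SET tags without IF are applied before the form values are read.
    for tag in set_tags:
        if tag.if_cond is None:
            values[tag.name] = tag.value
    # Pass 2: form values override the unconditional SETs.
    values.update(form_values)
    # Pass 3: SET tags with IF run last and may override form values.
    for tag in set_tags:
        if tag.if_cond is not None and tag.if_cond(values):
            values[tag.name] = tag.value
    return values


if __name__ == "__main__":
    tags = [
        SetTag("s_maxhits", "10"),
        SetTag("s_sortby", "DATE", if_cond=lambda v: v.get("recent") == "yes"),
    ]
    form = {"s_maxhits": "25", "s_sortby": "SCORE", "recent": "yes"}
    print(resolve_variables(tags, form))
```

In this example the form's s_maxhits wins over the unconditional SET, while the conditional SET replaces the form's s_sortby.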

The <EVAL> tag

This tag is similar to the <SET> tag in that it is used to set variables. But it is more complicated in two ways. Firstly, it takes an optional IF attribute, so that the setting of the variable may or may not take place, depending on the evaluation of the IF clause. And secondly, the value may itself be an expression to be evaluated, the ultimate result of which may depend on the outcome of the search performed.

Yes, it seems complicated. But proper use of this tag will let you present, as part of the result of a search, an HTML query form whose details depend on the original query form and on the outcome of the search. This is probably clearer in the examples.

The format is:

<EVAL NAME="variable_name" VALUE="expression" IF="if_cond">

NAME="variable_name"   Required   variable_name must match one of the internal variables (starting with s_) or one of your own variables. Don't use the leading $ before the variable name here.
VALUE="expression"     Required   expression, once evaluated, is the value that variable_name is set to. Do use the leading $ to refer to the value of a variable, if necessary.
IF="if_cond"           Optional   The variable will be set only if if_cond evaluates to true.

The <RESPONSE> tag

The <RESPONSE> ... </RESPONSE> sections control the HTML generated in response to a search. There may be many such sections, and they are evaluated in the order presented in the config file. The syntax is:
	<RESPONSE NAME="mname" IF="if_cond" URL="url">
The attributes are:
NAME="mname"   Optional   mname is used to identify the response section for debugging.
IF="if_cond"   Optional   Response will be printed only if if_cond evaluates to true.
URL="url"      Optional   Output will be redirected to the given full url. Do not use a relative url; make sure it starts with http, ftp, or another valid protocol. No check for validity of the redirected url is made. No other response sections will be evaluated if the redirection is made.

The sections are evaluated in order. If the first to evaluate to true contains a URL redirection, the output is redirected. Otherwise, all sections that evaluate to true are sent back to the client browser.
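The selection rule in the previous paragraph can be modeled with a few lines of Python. This is a sketch under stated assumptions, not the script's logic. Response and render are invented names, and each section's IF outcome is passed in as a ready-made boolean. The text does not say what happens to a URL section that matches but is not the first match, so the sketch simply skips it, which is a guess.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Response:
    """Stand-in for one <RESPONSE> section."""
    name: str
    matches: bool               # outcome of the IF condition (True when IF is absent)
    body: str = ""
    url: Optional[str] = None   # set when the section has a URL attribute


def render(sections: List[Response]) -> Tuple[str, str]:
    """Return ("redirect", url) or ("html", combined body) for the given sections."""
    matching = [section for section in sections if section.matches]
    # If the first matching section redirects, nothing else is evaluated.
    if matching and matching[0].url is not None:
        return ("redirect", matching[0].url)
    # Otherwise every matching section is sent back, in config file order.
    # Guess: later matching sections that carry a URL are skipped.
    return ("html", "".join(s.body for s in matching if s.url is None))


if __name__ == "__main__":
    sections = [
        Response("nohits", False, url="http://www.uic.edu/"),
        Response("header", True, body="<html><body>"),
        Response("footer", True, body="</body></html>"),
    ]
    print(render(sections))
```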

The body of a <RESPONSE> section should contain the HTML code to print. HOWEVER, the enclosed HTML tags must be protected from the SGML parser for this to work. This means that a response section should look like:

<RESPONSE NAME="resp1">
<![CDATA[
here is some <b>HTML</b> code
]]>
</RESPONSE>

Important Note: If you choose to use the <RESPONSE> tags to generate any part of the HTML response, you are entirely responsible for generating the whole response page, from the beginning <html> tag to the ending </html> tag. Don't presume that SearchUIC will do any of this for you, unless one of the following is true:

  • You omit the <RESPONSE> tags, and let SearchUIC generate its default response.
  • You use URL redirection within the <RESPONSE> tag.
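If you generate config files with a script, the CDATA wrapping required for response bodies is easy to get wrong by hand. The Python sketch below builds one response section around a block of HTML. The function name response_section is invented for this example. The check for "]]>" is there because that sequence ends a CDATA section, so it cannot appear inside the protected HTML.

```python
def response_section(name: str, html: str) -> str:
    """Wrap html in a <RESPONSE> section, protected from the SGML parser by CDATA."""
    if "]]>" in html:
        # "]]>" would close the CDATA section early and expose the rest to the parser.
        raise ValueError("HTML must not contain ']]>'")
    return f'<RESPONSE NAME="{name}">\n<![CDATA[\n{html}\n]]>\n</RESPONSE>\n'


if __name__ == "__main__":
    print(response_section("resp1", "here is some <b>HTML</b> code"))
```

Remember that a real response body would still need to be a complete page, from <html> to </html>, as noted above.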

The "if_cond" conditions for <EVAL> and <RESPONSE>

The optional if conditions specified in the <EVAL> and <RESPONSE> sections are expressions separated by Boolean ANDs and ORs. But the conditions are not totally general, because parentheses are not allowed, and the ANDs bind tighter than the ORs. Therefore

exp1 AND exp2 OR exp3 OR exp4 AND exp5
is parsed as:
(exp1 AND exp2) OR (exp3) OR (exp4 AND exp5)
I hope this is sufficient for most needs.

Each expression is evaluated as a string comparison or numeric comparison, depending on the operator used. Each expression should take one of the following forms. Note that each comparison operator must be delimited with blanks.

Expression             True if:                                  Type
string                 string evaluates to non-null              string
NOT string             string evaluates to null                  string
string1 eq string2     string1 equals string2                    string
string1 ne string2     string1 does not equal string2            string
string1 pre string2    string1 begins with string2               string
string1 npre string2   string1 does not begin with string2       string
num1 == num2           num1 equals num2                          numeric
num1 != num2           num1 does not equal num2                  numeric
num1 >= num2           num1 is greater than or equal to num2     numeric
num1 <= num2           num1 is less than or equal to num2        numeric
num1 > num2            num1 is greater than num2                 numeric
num1 < num2            num1 is less than num2                    numeric
where string is either a hard-coded string (case-sensitive), or a dollar-variable, such as $var or $s_name. The string should not contain internal blanks. For numeric comparisons, num should evaluate to a number, where null evaluates to 0. Using numeric comparisons with non-numeric strings is undefined, and should not be relied on to produce either true or false.
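The grammar described in this section is small enough to model in Python, which may help when checking how a condition will be grouped. The sketch below is not the SearchUIC parser. All function names are invented. It assumes the keywords AND, OR, and NOT are written in upper case, and that an unset variable evaluates to the null string. Non-numeric operands in a numeric comparison raise an error here, whereas the real behavior is documented as undefined.

```python
import operator

STRING_OPS = {
    "eq": lambda a, b: a == b,
    "ne": lambda a, b: a != b,
    "pre": lambda a, b: a.startswith(b),
    "npre": lambda a, b: not a.startswith(b),
}
NUMERIC_OPS = {
    "==": operator.eq, "!=": operator.ne,
    ">=": operator.ge, "<=": operator.le,
    ">": operator.gt, "<": operator.lt,
}


def value_of(token, variables):
    """Resolve $name or ${name}; anything else is a hard-coded string."""
    if token.startswith("${") and token.endswith("}"):
        return variables.get(token[2:-1], "")
    if token.startswith("$"):
        return variables.get(token[1:], "")
    return token


def to_number(text):
    """Null evaluates to 0; non-numeric text raises ValueError in this sketch."""
    return float(text) if text else 0.0


def eval_expression(tokens, variables):
    """Evaluate one expression from the table above."""
    if len(tokens) == 1:
        return value_of(tokens[0], variables) != ""
    if len(tokens) == 2 and tokens[0] == "NOT":
        return value_of(tokens[1], variables) == ""
    if len(tokens) == 3:
        left, op, right = tokens
        a, b = value_of(left, variables), value_of(right, variables)
        if op in STRING_OPS:
            return STRING_OPS[op](a, b)
        if op in NUMERIC_OPS:
            return NUMERIC_OPS[op](to_number(a), to_number(b))
    raise ValueError("unrecognized expression: " + " ".join(tokens))


def split_on(tokens, keyword):
    """Split a token list into groups separated by keyword."""
    groups, current = [], []
    for token in tokens:
        if token == keyword:
            groups.append(current)
            current = []
        else:
            current.append(token)
    groups.append(current)
    return groups


def eval_condition(condition, variables):
    """AND binds tighter than OR, so split on OR first, then on AND."""
    tokens = condition.split()  # operators must be delimited with blanks
    return any(
        all(eval_expression(expr, variables) for expr in split_on(group, "AND"))
        for group in split_on(tokens, "OR")
    )


if __name__ == "__main__":
    variables = {"s_status": "OK", "s_nres": "7"}
    print(eval_condition("$s_status eq OK AND $s_nres > 5 OR $debug", variables))
```

Splitting on OR before AND is what gives the grouping (exp1 AND exp2) OR (exp3) OR (exp4 AND exp5) without needing parentheses.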
 


2008-2-11  wwwtech@uic.edu