
The Search Process

Four questions must be addressed when conducting an online Internet search:

WHAT:

Before we can begin searching for information, we must know what we are trying to research. Searching is essentially a problem-solving activity, so we must first isolate the problem and then devise enough approaches to solve it satisfactorily. The problem may take the form of a specific question, or of a wide-ranging topic drawn from one or more general subjects. Specific questions, such as finding out who wrote a paper or locating the citation for a published work when the author or title is known, are relatively simple and straightforward. Our primary interest is the second type of problem, usually a subject search, which allows many variations in possible queries and retrieval outcomes.

With this second type, how we choose to define the subject search influences every ensuing step. The process involves changing, elaborating, and refining the search terms and their relationships, which together make up a query. In essence, we are transforming abstractions into words whose relationships can be manipulated and redefined, and then posing those words as a query to a database or search engine. By defining the subject, we create an interrelated organization of concepts with a finite scope that may be temporal, geographical, hierarchical, or based on some other important consideration. The overall methodology involves making choices and decisions that focus a general topic into a manageable number of more specific resources, which can be examined, evaluated, retained or discarded, and used.

WHERE:

We will be conducting searches for journal citations in bibliographic databases that can be accessed on the World Wide Web, and for web pages or web sites (groups of related web pages) on the World Wide Web itself.


Although our efforts in this learning exercise are limited to the World Wide Web and to bibliographic databases accessible through the Web, the principles in this module can be applied to searching almost any database or information source that is generally available to the public. In some cases, a specific database query language must be used to construct queries according to its own rules of syntax. Structured Query Language (SQL) and Query-by-Example (QBE) are examples of such database-specific languages.
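As a rough illustration of how the same search logic carries over to a database query language, the following Python sketch uses the standard-library sqlite3 module to express "cats AND dogs" in SQL. The table name `articles`, its columns, the helper function, and the sample titles are all hypothetical. Note that LIKE performs simple substring matching (in SQLite it ignores case for ASCII letters by default), so "cat" would also match words such as "education".

```python
import sqlite3

def titles_matching_all(conn, terms):
    """Return titles containing every term (a Boolean AND expressed in SQL)."""
    # One "title LIKE ?" condition per term, joined with AND.
    where = " AND ".join("title LIKE ?" for _ in terms)
    # Wrap each term in % wildcards so it can match anywhere in the title.
    params = [f"%{t}%" for t in terms]
    sql = f"SELECT title FROM articles WHERE {where} ORDER BY id"
    return [row[0] for row in conn.execute(sql, params)]

# Build a small in-memory table of hypothetical article titles.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO articles (title) VALUES (?)",
    [("Caring for cats",), ("Cats and dogs in the home",), ("Training dogs",)],
)

print(titles_matching_all(conn, ["cat", "dog"]))
# ['Cats and dogs in the home']
```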

HOW:

This step constitutes the actual construction of a search query from the search terms that encompass the topic that you have selected.

Basic Search Logic and Syntax Concepts

Advanced Search Concepts

This stage consists of three steps: generating search terms, constructing a query, and limiting the search retrieval. We begin by breaking the subject into its component parts, which become search terms. For each component, the search terms are usually synonyms or closely related terms. The most effective searches start out very broad and are narrowed as we further define the concept, so at the beginning we want as many synonyms as possible in order to generate a large amount of retrieval. One of the most common problems at this stage is mistyping or misspelling the search terms.

By defining relationships between these search terms, we further define the concept, which focuses the search and reduces the amount of retrieval. The relationships between the terms are governed by logic and expressed in the syntax appropriate to the database or search tool being used.

Once the terms and their relationships have been decided, we often can further focus the search by limiting the retrieval to a certain time period, type of publication, language, or source.
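Generating search terms and constructing a query can be sketched in Python: synonyms for each concept are combined with OR to broaden retrieval, and the concept groups are then combined with AND to narrow it. The function name and sample terms are hypothetical, and every search tool has its own syntax, so the generated string is only a generic illustration.

```python
def build_query(concepts):
    """Combine synonym lists into a generic Boolean query string.

    Synonyms within a concept are joined with OR (broadening retrieval);
    the concept groups are joined with AND (narrowing it).
    """
    groups = []
    for synonyms in concepts:
        # Quote multi-word synonyms so they are treated as phrases.
        terms = [f'"{s}"' if " " in s else s for s in synonyms]
        groups.append("(" + " OR ".join(terms) + ")")
    return " AND ".join(groups)

query = build_query([["cats", "felines"], ["health", "veterinary care"]])
print(query)
# (cats OR felines) AND (health OR "veterinary care")
```

Limits such as date or language would then be applied to the results of this query.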

The terms and concepts described below can be used to further define searches and increase the specificity of the retrieval.


Since many WWW search engines and WWW database search interfaces have their own search syntax, it is useful to read all Help screens and print them out for future reference. The terms and concepts introduced here are found in most advanced search tools, and they can save much time in defining a search that answers specific user needs.

  • Synonyms & Relationships

    Prior to searching online, it is useful to make a list of terms that encompass the subject you wish to search. Try to think of synonyms for the terms, and map the relationships between them graphically. Examples of relationships between terms include hierarchical, causal, temporal, and spatial relationships.

  • Controlled language

    Many databases index their content according to hierarchical subject headings, which are cross-referenced and frequently updated to introduce new headings. These subject headings are stored in a thesaurus, which explains the use of each heading, its position in the hierarchy, and the history of its use. Controlled language searches yield very high precision when the correct subject headings are used. Some search interfaces, such as Ovid, map natural language terms to the correct subject heading or present a list of near matches to select from. MEDLINE is an example of a controlled language database; the controlled language it uses is called MeSH (Medical Subject Headings).

    • MeSH

      MeSH is the National Library of Medicine's controlled vocabulary thesaurus. Thesauri are carefully constructed sets of terms, often connected by "broader-than," "narrower-than," and "related" links. Synonyms for the term thesaurus include "classification structure," "controlled vocabulary," and "ordering system." A well-constructed thesaurus links related terms and provides a hierarchical structure that permits generic searches to be conducted. MeSH consists of a set of terms, or subject headings, arranged in both an alphabetic and a hierarchical structure. At the most general level of the hierarchy, headings include, for example, "anatomical terms," "diseases," and "chemicals and drugs." At more detailed levels are the names of specific neoplastic, immunologic, and viral diseases. There are more than 18,000 main headings in the primary structure of MeSH, plus an additional 80,000 headings within a special chemical thesaurus. Thousands of cross-references help the user find the appropriate MeSH heading.

    • UMLS

      The NLM's Unified Medical Language System (UMLS) attempts to translate natural language terms into the correct MeSH terms. The purpose of the UMLS is to aid the development of systems that help health professionals and researchers retrieve and integrate electronic biomedical information from a variety of sources. The UMLS project develops machine-readable "Knowledge Sources" that can be used by a wide variety of application programs to overcome retrieval problems caused by differences in terminology and by the scattering of relevant information across many databases. The goal is to make it easy for users to link disparate information systems, including computer-based patient records, bibliographic databases, factual databases, and expert systems.

  • Keywords

    Databases that do not use controlled language searching often index articles less thoroughly, by selecting several key terms that describe the content of the source. These keywords may occur in the title, author, or abstract fields.

    In general, each database search interface or web search engine has its own rules (called search syntax) for specifying the relationships between search terms and for limiting the outcome of the search. If you are unfamiliar with the commands, access and print the Help screens. However, some generic syntax commands are common to many search tools. These are used to shape a more exact search outcome, and they are generally found in the Advanced Search interface.

    Boolean Searching

    By specifying Boolean relationships between search terms, we can shape the results of a search to be the union or the intersection of the search sets, or to exclude one set from another. This is illustrated graphically below.

    Each pair of circles represents the use of a Boolean search operator to shape the outcome of the search. For example, suppose we are searching a bibliographic database for articles on pets, and we are particularly interested in articles about "cats" and articles about "dogs." We will therefore conduct a search using those two terms. The pairs of circles below represent the different relationships we can specify between the terms.

    AND
    If the relationship between our two search terms is specified by "AND" (i.e., cats AND dogs), the result is shown by the pair of circles labeled AND, where one circle represents a set of articles about cats and the other represents a set of articles about dogs. The red area represents the intersection of the two sets: articles mentioning both cats and dogs.

     
    OR
    If the relationship between our two search terms is specified by "OR" (i.e., cats OR dogs), the output of the search consists of articles about cats, articles about dogs, and articles about both. This is shown by the pair of circles labeled OR, where one circle represents articles about cats and the other represents articles about dogs. The red area represents the union of the two sets: articles mentioning cats, dogs, or both.
     
    NOT
    If the relationship between our two search terms is specified by "NOT" (i.e., cats NOT dogs), the output of the search consists of articles about cats, and any articles about dogs are eliminated from the search set by the NOT operator. The result set is shown by the pair of circles labeled NOT, where one circle represents articles about cats and the other represents articles about dogs. The red area is the cat set with the dog set subtracted; articles about both cats and dogs (the intersection) are removed as well, leaving articles about cats only.
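    The three pairs of circles correspond directly to set operations. The following Python sketch models each search set as a set of article IDs; the IDs are invented purely for illustration.

```python
# Hypothetical result sets: IDs of articles retrieved for each term.
cats = {1, 2, 3, 4}
dogs = {3, 4, 5, 6}

cats_and_dogs = cats & dogs   # AND: intersection, articles about both
cats_or_dogs = cats | dogs    # OR: union, articles about either or both
cats_not_dogs = cats - dogs   # NOT: difference, cat articles minus any dog articles

print(sorted(cats_and_dogs))  # [3, 4]
print(sorted(cats_or_dogs))   # [1, 2, 3, 4, 5, 6]
print(sorted(cats_not_dogs))  # [1, 2]
```

    AND and NOT can only shrink the result set relative to the cat set, while OR can only enlarge it.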

    Nesting

    Nesting allows search terms to be combined with each other to give greater control over the characteristics of the set of retrieved items. Parentheses are used to group the search terms and fix the order in which the operators are applied. For example, the nested statement "((cats OR dogs) AND pets) AND health" would retrieve items about the health of cats and/or dogs that are pets.
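    Parenthesized grouping maps onto ordinary nested set expressions. This sketch evaluates the nested statement above against a handful of hypothetical documents, each represented by the set of index terms it contains; the document IDs and terms are invented for illustration.

```python
# Hypothetical documents, each represented by the set of terms it is indexed under.
docs = {
    "A": {"cats", "pets", "health"},
    "B": {"dogs", "pets"},
    "C": {"dogs", "health"},          # dogs and health, but not pets
    "D": {"cats", "dogs", "pets", "health"},
}

def having(term):
    """Return the IDs of documents indexed under a term."""
    return {doc_id for doc_id, terms in docs.items() if term in terms}

# ((cats OR dogs) AND pets) AND health, evaluated innermost group first.
result = ((having("cats") | having("dogs")) & having("pets")) & having("health")
print(sorted(result))  # ['A', 'D']
```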

    Phrases

    While Boolean AND specifies only that two terms occur in the same item, phrase searching looks for an exact string of text. This increases precision and reduces retrieval. For example, a search for "triple AND play" might locate the terms in unrelated contexts, such as "triple your salary" and "play it by ear" occurring in the same item, whereas a phrase search for "triple play" would locate the baseball term. Surrounding the phrase with double quotes, as shown, is one way of delimiting the terms; another is the search operator (w), e.g. triple (w) play.
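    The difference between AND and phrase matching can be illustrated with plain string processing. In this minimal sketch, AND requires only that both words occur somewhere in the text, while the phrase search requires them to be adjacent and in order. The function names and the simple regular-expression tokenizer are assumptions for illustration, not how any particular search engine works.

```python
import re

def tokens(text):
    """Split text into lower-case words (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())

def matches_and(text, *words):
    """Boolean AND: every word appears somewhere in the text."""
    present = set(tokens(text))
    return all(w in present for w in words)

def matches_phrase(text, phrase):
    """Phrase search: the words appear consecutively, in the given order."""
    words, target = tokens(text), tokens(phrase)
    n = len(target)
    return any(words[i:i + n] == target for i in range(len(words) - n + 1))

salary = "Triple your salary, then play it by ear."
baseball = "The shortstop turned a rare triple play."

print(matches_and(salary, "triple", "play"))    # True
print(matches_phrase(salary, "triple play"))    # False
print(matches_phrase(baseball, "triple play"))  # True
```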

    Proximity

    Proximity searching specifies the relationship between terms by their physical closeness in the text. If we are searching for items about healthcare workers, we could search on the phrase itself, but this might eliminate a number of useful items that use different wording, such as "workers in healthcare" or "workers employed by healthcare providers." A proximity search operator lets you search for terms that occur near each other in the item. For example, AltaVista uses the NEAR operator (healthcare NEAR worker); other search engines or databases may use (n), e.g. healthcare (n) worker. More powerful search engines can specify the number of words that may occur between the terms, or the order in which they must occur.
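    A NEAR-style operator can be approximated by comparing word positions. The sketch below assumes a simple letters-only tokenizer and a hypothetical max_gap parameter giving the number of intervening words allowed; it matches the terms in either order, and an engine that also enforces order would add a position comparison.

```python
import re

def near(text, term_a, term_b, max_gap=5):
    """True if term_a and term_b occur with at most max_gap words between them.

    Word order is ignored, as with an unordered NEAR operator.
    """
    words = re.findall(r"[a-z]+", text.lower())
    pos_a = [i for i, w in enumerate(words) if w == term_a]
    pos_b = [i for i, w in enumerate(words) if w == term_b]
    # abs(i - j) - 1 is the number of words lying between the two positions.
    return any(abs(i - j) - 1 <= max_gap for i in pos_a for j in pos_b)

text = "Many workers employed by healthcare providers report stress."
print(near(text, "healthcare", "workers"))             # True
print(near(text, "healthcare", "workers", max_gap=1))  # False
```

    Here two words ("employed by") separate the terms, so the match succeeds with the default gap but fails when only one intervening word is allowed.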

    Case

    In general, most web search engines are not case sensitive. Except for proper nouns, this is usually not a major hindrance. It is often better to search in lower case, since the retrieval then includes both lower- and upper-case occurrences, and valuable items are not eliminated early.
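    Case-insensitive matching is usually achieved by normalizing both the query and the text before comparing them. This short sketch uses Python's built-in str.casefold(); the helper name and sample titles are hypothetical.

```python
def case_insensitive_hits(term, titles):
    """Return titles containing term, ignoring differences in letter case."""
    needle = term.casefold()
    return [t for t in titles if needle in t.casefold()]

titles = ["Apple pie recipes", "APPLE unveils new laptop", "Orchard pests"]
print(case_insensitive_hits("apple", titles))
# ['Apple pie recipes', 'APPLE unveils new laptop']
```

    The output also shows the proper-noun drawback: ignoring case merges the company name with the fruit.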

    Truncation

    Truncation is a useful search operator when several forms of a term are to be searched, or when the exact spelling of the term is uncertain. There are two types of truncation, right-hand (at the end of the term) and left-hand (at the beginning); most search engines offer right-hand truncation. The specific operator varies, but the ?, #, or * character is often used. For example, the search term "librar*" would retrieve items containing "library," "libraries," "library's," "librarian," "librarians," and "librarianship."
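    Right-hand truncation behaves like a prefix match. The following sketch translates a trailing * into a regular expression; using * as the truncation symbol is just one of the conventions mentioned above, and the helper name and sample text are hypothetical.

```python
import re

def truncation_pattern(term):
    """Compile a right-hand truncated term such as 'librar*' into a regex."""
    stem = term.rstrip("*")
    # \b anchors the stem at the start of a word; [\w']* then accepts any
    # letters, digits, underscores, or apostrophes after the stem.
    # re.escape keeps any other characters in the stem literal.
    return re.compile(r"\b" + re.escape(stem) + r"[\w']*", re.IGNORECASE)

text = "The library's staff includes librarians; librarianship is a field. Liberty is not."
print(truncation_pattern("librar*").findall(text))
# ["library's", 'librarians', 'librarianship']
```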

    Limiting Search Sets

    Once the search has produced a satisfactory set of retrieved items, the specificity of the set can be increased further by limiting it. This is a common feature of database searching, for example in MEDLINE. You can limit a set to items in a particular language, of a particular publication type, or from a particular date or range of dates. Many web search engines offer limits by date range and language.
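    Limiting amounts to filtering an existing result set on record fields. This sketch assumes hypothetical records with language, publication type, and year fields, and a hypothetical limit() helper; real databases such as MEDLINE expose limits through their own interfaces and field names.

```python
from dataclasses import dataclass

@dataclass
class Record:
    title: str
    language: str
    pub_type: str
    year: int

def limit(records, language=None, pub_type=None, years=None):
    """Keep only records that satisfy every limit that was supplied.

    years is an inclusive (start, end) tuple; None means "no limit".
    """
    kept = []
    for r in records:
        if language is not None and r.language != language:
            continue
        if pub_type is not None and r.pub_type != pub_type:
            continue
        if years is not None and not (years[0] <= r.year <= years[1]):
            continue
        kept.append(r)
    return kept

results = [
    Record("Feline nutrition", "English", "review", 1998),
    Record("Nutrition des chats", "French", "article", 1999),
    Record("Cat diets revisited", "English", "article", 1994),
]
for r in limit(results, language="English", years=(1995, 2000)):
    print(r.title)  # Feline nutrition
```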

HOW WELL:

Critical Evaluation

Although much of the published literature undergoes some form of pre-publication review, such as peer review, it is important for researchers to develop critical thinking skills. We need not only criteria that apply specifically to the relevancy of the retrieval, but also a general set of criteria for measuring the value of everything we read. Eventually, one must sift through the content retrieved by the search queries to separate the wheat from the chaff, a final filtration. If the earlier steps have been performed well, this step is not onerous. Critical evaluation actually begins during the search itself, acting as a feedback step that prompts us to refine or redefine some or all of the three preceding steps. Critical evaluation encompasses a broad spectrum, and it basically answers the question of how well a source of information meets our requirements and information needs. At the top level, we begin by critically evaluating the relevancy and legitimacy of the retrieval's authority, content, source, and any other appropriate parameters. If successful searches can be characterized by their degree of precision, timeliness, and relevancy, then critical evaluation is the final test.

 

Date last modified: 08/30/00