II. A Digital Library of Field Research

 

The second element of the infrastructure is the construction of a "digital library" as a central archive and organizer of the process of field research. As with the other components in this proposal, we have already begun work on building a modest version of what we propose. We candidly admit that we do not yet know what a "digital library of field research" will finally look like. But we have the right people, with the right methodological and technological skills, in the right physical location (Chicago). Most importantly, we know the three central questions that need to be answered:

 

1. How can qualitative datasets be archived and made useful and accessible on the web?

 

To answer this question we need to a. compile a sufficient quantity of important qualitative data sets; and b. create an electronic interface through which researchers can use and compare these data sets.

a. Compiling Qualitative Data Sets. The University of Illinois-Chicago currently has the rights to the entire "Chicago School" textual and photographic archives. These are the founding texts of U.S. field research. The PI for this proposal is co-PI on a National Endowment for the Humanities proposal to further explore these vast data sets through a history of Hull House, which is physically located on UIC's campus. The great Chicago School works, like Zorbaugh's The Gold Coast and the Slum (1929), have been digitized and are awaiting use by hyper-text minded scholars. More importantly, unpublished manuscripts are being scanned and digitized, including interview transcripts and other contextual material. This is an unparalleled opportunity for social science.

The seminars run by the Clark Center described above will result in a steady stream of the top field researchers in the country coming through UIC. One topic of conversation will be how to mine the field notes, transcripts, and contextual material of current research. We expect these discussions to lead to the inclusion in the digital archive of major contemporary data sets. We note, however, that confidentiality is a serious issue. A researcher will make a data set available only if it is either in the public domain already, or if s/he sees advantages in how the data set can be analyzed. This leads us to how qualitative data sets can be more easily analyzed.

b. Creating an Electronic Interface. There are two ways to make qualitative data sets comparable and available. One way is to license off-the-shelf software, such as FolioViews™ or Nudist™, load transcripts and pictures into an "info-base" (FolioViews' term), and make the data sets available on the web for downloading, on-line analysis, or both. This would mean converting existing textual, photographic, audio, and video data into the commercial format. Remote clients could then read a single data base or, more importantly, aggregate several for comparative analysis. Comparing Zorbaugh's and Thrasher's texts, for example, might yield new insights impossible to conceive using traditional techniques.

The second way to make qualitative data sets comparable and available is to design software that uses natural language queries rather than the hypertext Boolean searches used by commercial software. This approach makes searches more powerful by letting the researcher pose a "natural language" question to a data base.

For example, in FolioViews, to look for instances of drug dealing by African American gangs, one might search an info-base of transcript text for all mentions of "Gangster Disciple" or "Vicelord" and "drugs," then narrow the results to cases where drugs are sold and not just used, and then narrow them further to cases where the respondent sold them rather than bought them. A natural language query would simply command: "Display all mentions by respondents from Chicago African American gangs who sell illegal drugs of any type." The "natural language query" would return more specific information and present textual material in a more usable context.

But natural language approaches are problematic since, after the conversion of the data set to a specialized text format, the "natural language" concepts need to be written. The different terms which might describe "drug dealer" or "drug deals" and which gangs are African American would need to be entered into the program. While this is time-consuming, it may be of major advantage to remote field researchers who can discover new ideas in old data sets through the skillful formulation of natural language concepts and subsequent queries.
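The contrast between the two search styles can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the design of FolioViews or of any software the project would build: the passages, the `CONCEPTS` table, and the function names are all hypothetical, and genuine natural language querying would require far more than term matching.

```python
from dataclasses import dataclass


@dataclass
class Passage:
    respondent_group: str  # hypothetical metadata attached to each transcript passage
    text: str


# Placeholder passages invented for illustration; not drawn from any real transcript.
PASSAGES = [
    Passage("Gangster Disciple", "I was selling drugs on that corner."),
    Passage("Vicelord", "My cousin used drugs back then."),
    Passage("Vicelord", "We dealt dope near the school."),
    Passage("Other", "I sold drugs for a year."),
]

# Hand-entered concept definitions: the time-consuming step described above.
CONCEPTS = {
    "sells_drugs": ["selling drugs", "sold drugs", "dealt dope"],
    "chicago_african_american_gang": ["Gangster Disciple", "Vicelord"],
}


def boolean_search(passages, any_groups, keyword):
    """Boolean style: (group A OR group B) AND keyword appears in the text."""
    return [p for p in passages
            if p.respondent_group in any_groups and keyword in p.text.lower()]


def concept_search(passages, group_concept, action_concept):
    """Concept style: expand each named concept into its hand-entered terms."""
    groups = CONCEPTS[group_concept]
    phrases = [term.lower() for term in CONCEPTS[action_concept]]
    return [p for p in passages
            if p.respondent_group in groups
            and any(phrase in p.text.lower() for phrase in phrases)]


broad = boolean_search(PASSAGES, ["Gangster Disciple", "Vicelord"], "drugs")
narrow = concept_search(PASSAGES, "chicago_african_american_gang", "sells_drugs")
```

In this toy data the Boolean search returns a passage about drug use and misses the passage that uses slang, while the concept search returns only the two passages about selling. The quality of the result depends entirely on how carefully the concept terms are entered.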

John Shuler of the UIC Library will supervise consultants who will weigh a commercial solution against designing our own software. Negotiations with software companies for the licensing of their products may result in technical cooperation and additional resources.

As with so much in this proposal, we do not yet know the best way to organize qualitative data sets. We intend to draw on consultants and advice from software companies, and to solicit input from field researchers through the seminars, our web page, and conferences. We expect to solve the problem of format in the first two years and to spend the next three years setting up a national archive of qualitative data bases, starting with the Chicago School documents.

 

2. What software programs can best integrate qualitative and quantitative data?

 

We are going to address this rather broad question in a narrow way. Rather than look at all kinds of quantitative and qualitative data sets and come up with a general solution which is perhaps not quite suitable for anyone, we propose to focus on data which can be located in space, and thus utilize the advanced practice of GIS software.

The most exciting developments in software for spatial data analysis lie in data visualization and the use of powerful GIS software like ArcView™ and ArcInfo. GIS techniques have been routinely used to organize survey and demographic data with visual data from locales. We intend to expand their use by adapting GIS software to the merger of qualitative and quantitative analysis. This may be a major part of the future for field research.

UIC is the ideal venue for this effort due to the existence of the Data Visualization Center (http://www.uic.edu/cuppa/udv/), a national leader in creating innovative applications for urban planning and policy making. The Center integrates multiple media, including Virtual Reality, GIS, and the world wide web to simulate future alterations and scenarios of cities and neighborhoods.

By superimposing layers of data on maps, neighborhoods in different cities can be easily viewed and compared with one another. By using photos and other graphical representations of key neighborhood locations and displaying them on the web, researchers, students, and stakeholders can "see" the neighborhoods being studied (http://www.uic.edu/~kheir). The use of online GIS systems, specifically ESRI’s Internet Map Server, allows any participant access to a large collection of demographic and spatial data about a locale. In a recent project, ESRI’s ArcIMS was used to present real-time visualization of geo-referenced data in close relation to narratives that include published reports and project descriptions (http://e036.cuppa.uic.edu/ims/north-lawndale/index.html).

This technique has revolutionary implications for qualitative analysis, which are just arriving on the academic agenda. For example, if researchers want to compare neighborhoods adjacent to economic development with more socially isolated neighborhoods, as we intend to do in this proposal (see part III below), data visualization can integrate quantitative and qualitative data in vastly superior ways.

GIS systems can load census and other quantitative data onto maps and display them in various visual forms, making the data more understandable. Police data can be loaded to reveal locations of drug sales and "hot spots" of criminal activity. Demographic variables, like median income, property value, percent homeowner, ethnicity, and other standard variables, can be used for comparisons.

Qualitative data can also be loaded on GIS systems, and this is where it gets more interesting. Photographs of neighborhoods can be loaded to facilitate direct comparisons. In areas where photographs from the past can be retrieved, a historical photographic exhibit can parallel longitudinal demographic data. Further, links to written texts and audio tapes about a neighborhood at various times can be added to the photographs, yielding a five-dimensional description of an area: textual, visual, audio, digital, and statistical. All this, of course, can be placed on the web.
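One way to picture such a layered record is a single map feature that carries both statistical attributes and links to qualitative material. The following Python sketch uses GeoJSON purely for illustration; the proposal does not name a storage format, and the field names, file paths, coordinates, and helper functions are hypothetical placeholders.

```python
import json


def make_site_feature(name, lon, lat, statistics, attachments):
    """Build a GeoJSON Feature carrying both statistics and qualitative links."""
    return {
        "type": "Feature",
        # GeoJSON stores coordinates in longitude, latitude order.
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
        "properties": {
            "name": name,
            "statistics": statistics,
            "attachments": attachments,
        },
    }


feature = make_site_feature(
    name="Example corner",
    lon=-87.65, lat=41.87,  # placeholder coordinates
    # Values left empty on purpose; they would come from census or similar sources.
    statistics={"median_income": None, "percent_homeowner": None},
    attachments=[
        {"kind": "photo", "year": 2000, "path": "photos/example_2000.jpg"},
        {"kind": "photo", "year": 1929, "path": "photos/example_1929.jpg"},
        {"kind": "transcript", "year": 2000, "path": "text/example_interview.txt"},
        {"kind": "audio", "year": 2000, "path": "audio/example_interview.wav"},
    ],
)


def attachments_of_kind(site, kind):
    """Return one site's attachments of a given kind, oldest first."""
    items = [a for a in site["properties"]["attachments"] if a["kind"] == kind]
    return sorted(items, key=lambda a: a["year"])


photo_years = [a["year"] for a in attachments_of_kind(feature, "photo")]
geojson_text = json.dumps({"type": "FeatureCollection", "features": [feature]})
```

Sorting the photographs by year gives the historical sequence for a site, which could then be displayed alongside demographic figures for the same years.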

More directly, field work in a neighborhood can make use of the mapping of demographic and other spatial variables. Official data, such as locations of drug selling, can be compared with information from direct observation and interviews with gang members, and the results compared on digital maps. Classifying drug-selling sites by type, and comparing such qualitative data to official quantitative data, such as the location of arrests for violent crime, can contribute to understanding the adequacy (or inadequacy) of official crime reports.
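A comparison of this kind can be sketched without any GIS package. The Python example below flags field-observed sites that have no official record nearby, using the standard haversine formula for distance. The coordinates, the 0.25 km radius, and the function names are assumptions made for illustration; a real analysis would be carried out in GIS software with carefully geocoded data.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two latitude/longitude points."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))


def unmatched_sites(observed, official, radius_km):
    """Return names of observed sites with no official record within radius_km."""
    missing = []
    for name, lat, lon in observed:
        nearest = min(
            (haversine_km(lat, lon, olat, olon) for olat, olon in official),
            default=math.inf,  # no official records at all: every site is unmatched
        )
        if nearest > radius_km:
            missing.append(name)
    return missing


# Placeholder coordinates, not real observations or arrest locations.
observed_sites = [("site A", 41.870, -87.650), ("site B", 41.900, -87.700)]
official_records = [(41.8705, -87.6505), (41.860, -87.640)]

not_in_official = unmatched_sites(observed_sites, official_records, radius_km=0.25)
```

Sites that appear in field notes but not in official records (or the reverse) are the cases most likely to shed light on the adequacy of official reports.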

In brief, we propose to begin to address the integration of quantitative and qualitative data through GIS technology. As with other aspects of this proposal, we do not have all the answers. The UIC Data Visualization Center, History Department, and the Clark Center have begun to address these issues in a preliminary manner. NSF support would vastly streamline our efforts and bring them more quickly to the attention of scholars nationally.

3. How can we use the web to increase interaction between the university and communities?

 

The final component of the "digital library" is its capacity to increase interaction with various publics and stakeholders, promoting social responsibility in research. There are two ways this proposal seeks to address these concerns: transforming the historic category of "personal documents" and organizing web-based feedback to neighborhoods.

"Personal documents" is the term first formulated by Robert Park and his Chicago School colleagues to describe documents written in the voice of the research "subject." Clifford Shaw's The Jackroller: A Delinquent Boy's Own Story (1930) is perhaps the seminal example (and is now available to UIC for digitizing). In its original conception, the sociologist wrote down the words of the "subject" and presented them in a case study format, often with a "sociological" introduction explaining the voice of the book. The world wide web dramatically changes the concept of personal documents, making them more potent by vastly reducing the voice of the sociologist and amplifying the voice of those being studied.

Some respondents want their unabridged words to be made public. In the print era, some research subjects had their "stories" published, but not many, and those were heavily edited and placed in a proper "sociological" context. The web allows entire interviews to be transcribed and posted online, or respondents to be videotaped and their actual words archived. This raises substantial confidentiality concerns, which should not be underestimated. However, confidentiality should not be used to silence those who are seldom heard in their own voice, such as gang members, the homeless, or drug dealers.

The solution may lie in two levels of data collection, one confidential and the other made explicitly for publication, with the final web-publication decision in the hands of the respondent. One example of on-going work in this area is UIC’s City Design Center, which has been working on a project called Chicago Imagebase that employs some of these techniques (http://www.uic.edu/depts/ahaa/imagebase/). The web site allows users to access, index, store, retrieve, compare, and analyze images, maps, data, literature, and other geographically-based materials.
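The two-level idea can be expressed as a simple record structure in which confidential material and publication material are stored separately, and nothing is released without the respondent's approval. The Python sketch below is a hypothetical illustration, not a description of Chicago Imagebase or of any existing system; the field names and sample records are invented.

```python
from dataclasses import dataclass


@dataclass
class InterviewRecord:
    respondent_id: str
    confidential_text: str             # level one: stays with the research team
    publication_text: str = ""         # level two: recorded explicitly for publication
    respondent_approved: bool = False  # final web-publication decision rests with the respondent


def publishable(records):
    """Return only publication-level text that the respondent has approved."""
    return {
        r.respondent_id: r.publication_text
        for r in records
        if r.respondent_approved and r.publication_text
    }


# Placeholder records invented for illustration.
records = [
    InterviewRecord("R1", "confidential notes", "public statement", True),
    InterviewRecord("R2", "confidential notes", "draft statement", False),
    InterviewRecord("R3", "confidential notes"),
]

web_ready = publishable(records)
```

Because the release function never reads the confidential field, confidential material cannot reach the web by accident, and an unapproved or empty publication text is simply skipped.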

As studies are completed, this process allows for video, audio, and textual comment by those studied before the work goes to print. Such practices would have had a substantial effect on studies like Vidich and Bensman's Small Town in Mass Society, which caused an uproar after its release. The web has the potential to establish a new norm for social research: providing for comment and different voices on the research findings, and a digital space where respondents can be heard and seen.

Second, the web allows for interaction between communities and the academy. For example, the bi-directional capacity of the web permitted urban planning surveys of residents to be taken on-line, with the results compiled and fed back to the neighborhood (Al-Kodmany 2000). In this project JavaScript was used to create the interactive web interface. A Web Server/Data Servlet and an Oracle database were used to collect and sort the data and then to create immediate feedback in the form of composite maps. Later, these maps were integrated into ArcView GIS for spatial analysis.
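The compile-and-feed-back step can be illustrated with a short tally. The project described above used JavaScript, a servlet, and an Oracle database; the Python sketch below does not reproduce that system and shows only the general idea of summarizing responses by area so they can be returned to residents. The area labels, answers, and function name are hypothetical.

```python
from collections import Counter, defaultdict


def composite_by_area(responses):
    """Tally answers per area and report each area's most common answer."""
    tallies = defaultdict(Counter)
    for area, answer in responses:
        tallies[area][answer] += 1
    return {
        area: {
            "responses": sum(counts.values()),
            "top_answer": counts.most_common(1)[0][0],
        }
        for area, counts in tallies.items()
    }


# Placeholder survey responses invented for illustration.
survey_responses = [
    ("block 1", "more lighting"),
    ("block 1", "more lighting"),
    ("block 1", "a park"),
    ("block 2", "a park"),
]

composite = composite_by_area(survey_responses)
```

Each area's summary could then be drawn on a map layer, which is roughly the role the composite maps played in the project cited above.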

The web can also be used in various phases of research, as we have suggested above. Most importantly, the web's use in a neighborhood requires that the neighborhood be "wired" to a certain level, and the lack of such access is one of the greatest deficiencies in poor neighborhoods today. However, "wiring" of libraries and schools is taking place everywhere, and one task of research can be to help residents who otherwise might not use the new technology to understand its value and make use of it. We anticipate that substantial training in information technology will be made available to residents of the study neighborhoods, and that this work will encourage increased investment in "unwired" neighborhoods.

In sum, we envision the digital library as a virtual archive where important qualitative data sets can be accessed, analyzed, and compared, linked to quantitative data, and used to give voice to those who are too often voiceless.

 
