University of Illinois at Chicago, School of Public Health
Environmental and Occupational Health Sciences Division
DR. PETER SCHEFF: I want to welcome everybody viewing this to the first of a series of lectures on environmental statistics. Our opening lecture will be Interpreting Your Monitoring Data. These lectures, along with files and examples of our problems, will be on-line for you to consult and work with. One thing that would be very useful to us, and would make these lectures more successful, would be for you to look at the materials on-line and send us your questions, your suggestions for new examples and improvements, or things you just don't understand.
We hope these lectures don't remain static but actually develop over time, and so hearing back from you is the most important thing we are looking for.
I also want to acknowledge at this time the United States Environmental Protection Agency for their continuing support, and all of the students who sat through various versions of these lectures to help make them work.
I also want to acknowledge my colleagues Sal Cali and Justin Ford, who were instrumental in organizing and developing these lectures.
So, overall, the objectives of this program are to provide the context and the background for air pollution sampling and sampling design. In this series of lectures, we're going to give you an introduction to environmental statistics and assessment as they relate to air pollution data.
Sampling is a necessity. When we make environmental measurements, we are sampling a relatively small amount of the environment. In the case of air pollution sampling, we're sipping a little bit of air at a monitoring site. What we need to do is figure out what that tells us about the surrounding environment. To make that judgment, to make that leap from the small sample that we collect to the large environment, we must use statistics to help us guide our judgments so we don't make incorrect assessments.
Now, throughout the course of these lectures, we'll be looking at pollutants, both criteria pollutants and toxic pollutants. These are very different materials being collected for very different reasons. The criteria pollutants are listed here on this slide: carbon monoxide, lead, particulate matter, sulfur dioxide, nitrogen dioxide and ozone. These pollutants are defined as criteria pollutants specifically because of their known human health effects, established by epidemiological studies.
The National Ambient Air Quality Standards, or NAAQS, are based on these health studies, very specifically. Monitoring for the NAAQS is primarily to determine attainment or nonattainment.
In contrast, the air toxics program is much newer. The air toxics themselves were defined by the Clean Air Act Amendments of 1990, and each is one of the 188 listed materials, which were at that time believed to be toxic.
In the case of toxic pollutants, the health effects are not as well defined. They tend to be more from animal studies than human studies. And, in general, the toxics program is more aimed at a risk program or risk management structure rather than an attainment program.
The criteria pollutants are summarized on this slide, and I think it's useful to take a little bit of a look at what they are. I think everybody is familiar with these. For example, carbon monoxide has both an eight-hour and a one-hour average standard; lead has a three-month average; nitrogen dioxide has an annual average. It's important to recognize that the standards themselves have very specific averaging periods built in: one hour, three hours, eight hours, 24 hours, a quarterly average and an annual average.
These averaging times have very profound statistical implications that we will be discussing throughout this series of lectures.
It's very important to look at the way the standards are structured, because they're structured to minimize instability in the ambient measurement record to give us a number that we can use to make an objective evaluation of air quality at a monitoring site.
We also look at these different pollutants in terms of their sources, and we like to think of pollutants as either primary or secondary. Primary pollutants are those materials which are basically the same in the ambient air as they are at the point of emission. A good example is carbon monoxide. Carbon monoxide primarily comes from motor vehicles; it moves through the environment but stays relatively unchanged. Primary pollutants can travel very long distances if they don't deposit; large particles, on the other hand, deposit quickly and travel only a very short distance.
In contrast, secondary pollutants are those materials which are formed by chemical reactions in the atmosphere. A good example would be formaldehyde. It is primarily formed from organic precursors emitted by motor vehicles, and sunlight and high afternoon temperatures drive its formation.
So as we use these pollutants as examples in these lectures, you'll see that sources affect how we look at and evaluate air pollution. Now, we're not going to talk much about actual sampling methods. For criteria pollutants, sampling and analytical methods are well defined under the National Ambient Air Quality Standards. So if you want to look up the appropriate methods for nitrogen dioxide, the best place to look would be the standard and its criteria document, which will give you a great deal of detail on the analytical method.
Air toxics are not as well developed. The EPA website has guidance on air toxics. We'll provide for you a number of references and links where you can get information about air toxics methods.
What we'll primarily do in this series of lectures is look at the statistical implications of the measurements, not so much the analytical methods.
So where do we start? We start by defining why we make measurements. What's the purpose of air quality monitoring? There's a whole series of reasons for air quality monitoring. And it's very important in the design and interpretation of the data set to know what the purpose underlying the measurement program was.
The first is demonstrating compliance with the national standards. Most criteria monitoring throughout the United States is done for this single purpose. For example, the map on this slide shows the counties in Illinois that are currently not in attainment for ozone. It shows a large area near Chicago and an area surrounding St. Louis where ambient levels of ozone are elevated. But by no means is this the only reason for monitoring.
Another very important purpose for ambient monitoring is to establish baselines for future reference. For example, if a large development is proposed for a geographical area, it may be required to monitor from one to three years before the facility is built to determine what the baseline is, and then impacts of the new facility can be judged based on the measurement record.
Another area that's very important to university researchers is health and exposure studies. Much of the special purpose monitoring we do is specifically designed for health and exposure studies. We use data from the ambient monitoring programs for the national standards in these studies as well, but when we actually go to do a detailed health study, we usually use different monitoring strategies and different kinds of monitors. We'll talk about some of the implications of this throughout the lectures.
Another very important purpose of air monitoring is to provide information on spatial variability. This particular slide comes from the Air Now program. The Air Now program is designed to not only look at the spatial variability of a pollutant but to communicate the interpretation and the meaning of these concentrations to the public.
Air Now is primarily focused on ozone, although there have been recent attempts to include PM2.5 on the Air Now website. I encourage you to visit that website for information about air quality in your region.
What Air Now does is display air pollution concentrations in terms of the Air Quality Index. This brief slide shows the Air Quality Index: green and yellow indicate good or moderate air quality. When you switch over to orange and red, air quality becomes questionable with respect to the standard and, certainly, you run into known health effects.
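The index itself is a piecewise linear mapping: a concentration is located within a breakpoint range and interpolated onto the matching index range. The sketch below shows that interpolation for 8-hour ozone. The breakpoint values are my assumption, taken from EPA's current published table as I understand it (they have been revised since these Air Now maps were made), so check them against EPA's AQI technical guidance before relying on the numbers; the name `ozone_8hr_aqi` is hypothetical.

```python
import math

# ASSUMPTION: 8-hour ozone breakpoints (ppb) and AQI ranges, as in EPA's
# current published table; verify against EPA guidance before real use.
OZONE_8HR_BREAKPOINTS = [
    # (C_lo, C_hi, I_lo, I_hi, category)
    (0, 54, 0, 50, "Good (green)"),
    (55, 70, 51, 100, "Moderate (yellow)"),
    (71, 85, 101, 150, "Unhealthy for Sensitive Groups (orange)"),
    (86, 105, 151, 200, "Unhealthy (red)"),
    (106, 200, 201, 300, "Very Unhealthy (purple)"),
]


def ozone_8hr_aqi(conc_ppb):
    """Return (AQI, category) for an 8-hour ozone average in ppb.

    The concentration is truncated to a whole ppb, then interpolated:
        I = (I_hi - I_lo) / (C_hi - C_lo) * (C - C_lo) + I_lo
    Returns None above the last breakpoint handled here.
    """
    c = math.floor(conc_ppb)
    for c_lo, c_hi, i_lo, i_hi, category in OZONE_8HR_BREAKPOINTS:
        if c_lo <= c <= c_hi:
            index = (i_hi - i_lo) / (c_hi - c_lo) * (c - c_lo) + i_lo
            return int(index + 0.5), category  # round half up
    return None


print(ozone_8hr_aqi(60))  # (67, 'Moderate (yellow)')
```

The color of each map cell is then just the category of the computed index.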
So if you study this Air Now image from August 2003, you'll see that much of the Region 5 area is in compliance with the standard. It's good or moderate. And there's a little bit of area downwind of Chicago near Grand Rapids which is a potential problem, because the color switches over to orange, and a little bit of area around St. Louis is also in orange. And it's a very effective way to communicate air quality. And these maps actually are animated. And if you go to the site you can watch the maps change in time.
Another very important purpose of air quality monitoring is to look at temporal change, or temporal variability. We plan to have an entire lecture on this in the future. This slide shows the distribution of peak one-hour ozone concentrations across all the monitors in Region 5 over a ten-year period: the median monitor, the tenth percentile monitor and the 90th percentile monitor. Because each value is a peak, this is an extreme-value measure at every monitor, and you can see how the extreme one-hour ozone values change over time throughout the region.
We also superimposed on this graph a dashed line showing the number of days in each year that exceeded 90 degrees. For ozone, the driving variable that controls peak ozone values in a region is just that: the number of high-temperature days during the year. In this graph, we very cleverly placed the Air Quality Index on the left-hand axis, so you can see in one image the short-term trend and its relationship with temperature. We will present much more about trends, mapping and graphics in subsequent lectures.
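The three curves in that graph are percentiles taken across monitors, one set per year. A minimal sketch of that summary, assuming the annual peak one-hour value at each monitor has already been extracted; the function names and the linear-interpolation percentile convention are my choices, not necessarily what was used for the slide.

```python
def percentile(values, p):
    """Percentile by linear interpolation between order statistics
    (the same convention as NumPy's default)."""
    xs = sorted(values)
    if not xs:
        raise ValueError("percentile of empty data")
    pos = (len(xs) - 1) * p / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def network_percentiles(peaks_by_year, pcts=(10, 50, 90)):
    """peaks_by_year maps year -> list of annual peak 1-hour ozone values,
    one per monitor. Returns year -> tuple of the requested percentiles."""
    return {
        year: tuple(percentile(peaks, p) for p in pcts)
        for year, peaks in sorted(peaks_by_year.items())
    }


# Hypothetical annual peaks (ppb) at five monitors in two years.
example = {1995: [100, 110, 120, 130, 140], 1996: [90, 95, 100, 105, 110]}
print(network_percentiles(example))
```

Plotting each tuple position against year gives the tenth percentile, median and 90th percentile lines.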
Another good example of trends in the Chicago area is to look at lead concentrations over time. This particular graph shows the distribution of lead levels in the ambient air in Chicago during three different periods: The late '60s, early '80s and early '90s. And you can see what happens over this period of time is that the distribution of lead levels is steadily dropping.
In the late '60s, the median, or 50th percentile, lead level was well above one microgram per cubic meter, close to the 1.5 microgram per cubic meter standard. By the early '90s, shown by the bottom line, the median was below 0.2 micrograms per cubic meter, around 0.15.
This particular kind of graph is called a log probability plot. It shows the distribution of these three data sets. And I'll talk a lot more about how to build these plots and how to interpret them in subsequent lectures. But you can see, over these three decades, a dramatic drop in ambient lead levels occurred.
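A log probability plot puts the logarithm of each concentration against the standard normal quantile of its rank, so lognormally distributed data fall on a straight line. A minimal sketch under my own assumptions: the Weibull plotting position i/(n+1) (other conventions exist) and a least-squares line whose intercept and slope give the median and geometric standard deviation. The function names and data are hypothetical.

```python
import math
from statistics import NormalDist


def log_probability_points(concs):
    """Return (z, log10(conc)) pairs for a log probability plot.

    Uses the Weibull plotting position i / (n + 1); concentrations must be > 0.
    """
    xs = sorted(concs)
    n = len(xs)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return [
        (std_normal.inv_cdf(i / (n + 1)), math.log10(x))
        for i, x in enumerate(xs, start=1)
    ]


def fit_lognormal_line(points):
    """Least-squares line log10(c) = a + b * z through the plot points.

    10**a estimates the median (z = 0) and 10**b the geometric standard
    deviation, if the data really are lognormal.
    """
    zs = [z for z, _ in points]
    ys = [y for _, y in points]
    z_bar, y_bar = sum(zs) / len(zs), sum(ys) / len(ys)
    b = sum((z - z_bar) * (y - y_bar) for z, y in points) / sum(
        (z - z_bar) ** 2 for z in zs
    )
    a = y_bar - b * z_bar
    return 10 ** a, 10 ** b


# Hypothetical lead values (micrograms per cubic meter).
pts = log_probability_points([0.12, 0.15, 0.09, 0.22, 0.18, 0.14, 0.30])
median, gsd = fit_lognormal_line(pts)
```

Comparing the fitted medians of several periods is one numeric way to express the shift between the lines on such a plot.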
One very nice public health result of this is what happened to blood lead levels in children. This graph simply shows the change in the amount of lead we added to gasoline for motor vehicles: as we decreased the lead in gasoline, blood lead levels dropped. That drop in gasoline lead was also seen in the previous slide as a dramatic decrease in ambient exposure levels.
Public health success!
Another important use for ambient monitoring is to compare models and our conceptual understanding with reality. This graph is a trellis plot, and it shows observed ammonium ion concentrations against modeled ammonium ion concentrations. We have an idea of what the ammonium ion concentration should be based on our chemical understanding of air pollution. The graph shows that our computed or estimated levels, our understanding of ammonium, agree really nicely with our ability to measure it. The trellis plot lets us separate this out by city throughout the region.
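A trellis plot shows the model comparison graphically; a numeric companion is to compute bias statistics separately for each city, one per panel. The sketch below uses mean bias and normalized mean bias, two common model-evaluation metrics that I have swapped in for illustration; the lecture does not say which statistics were used, and the record layout is hypothetical.

```python
from collections import defaultdict


def bias_by_city(records):
    """records: iterable of (city, observed, modeled) tuples.

    Returns {city: (mean_bias, normalized_mean_bias)} where
        MB  = mean(modeled - observed)
        NMB = sum(modeled - observed) / sum(observed)
    """
    groups = defaultdict(list)
    for city, observed, modeled in records:
        groups[city].append((observed, modeled))
    result = {}
    for city, pairs in groups.items():
        diff = sum(m - o for o, m in pairs)
        total_observed = sum(o for o, _ in pairs)
        result[city] = (diff / len(pairs), diff / total_observed)
    return result
```

A normalized mean bias near zero in every city corresponds to the points in every panel scattering around the one-to-one line.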
One of my plans is to have an entire lecture on the language of graphing, and we'll tell you much more about trellis plots in that lecture.
So what are the steps in the design and analysis process? When you're designing an air quality study, and ultimately the data analysis, it's very important to put the analysis into the context of the study design. This first lecture will now focus primarily on these issues and use a number of examples to illustrate these points. Your primary objective is to look at the data as it was designed to be looked at. Exploratory analysis is always dangerous, statistically, so it's very important to stay grounded. So what are the basic steps? Step one: What's the purpose of the study? Step two: Who do you want to study? Step three: What is the physical environment in which you'll conduct the study? Step four: What kind of samples do you collect? Are you looking at personal exposure or long-term area concentrations?
Step five: What is the quality assurance plan? You are required to have a well-developed quality assurance plan, including measurement objectives, data quality objectives, sampling procedures, lab procedures and data handling procedures. Even statistical analysis procedures need to be in your quality plan.
The sixth step is to look at data from previous studies. What do you know about the environment? Before you make a measurement it's always very useful to get a very good idea about what you're going to find before you actually start collecting the samples.
Step seven: At this point it's useful to begin to develop the actual field sampling procedures. You could define where to specifically sample. Where can you get power? Where can you get security? What's too close or what's not close enough to the particular people you want to study? Step eight: You need to look at what kind of statistical procedures you're going to apply to the data at this point. It's useful to define them. And then step nine: Ultimately conduct the study and as much as possible conduct it according to written plans. When you get to the real world and get to the field, it's always tempting to make changes. But it's very important to stick to your plans.
If you don't stick to your plans, many of the statistical inferences you try to draw from the data may not be valid. Step ten: when you're done collecting the data, look at the results. Step eleven: determine whether you've met your objectives.
As we step through these design steps, I'll be looking at the measurement record for ozone and PM2.5, as well as a few toxics, for my examples.
As much as possible, I will illustrate with real data examples. So, step one: define the purpose of the study. There are a whole variety of reasons why we're out there. At the top of the list is determining compliance with the NAAQS. Much of our monitoring to date throughout the United States is just that: are we in compliance with the national standards for, say, carbon monoxide, or not? But we may be out there for different reasons. Another important reason, for example, would be to understand more about what's going on with a specific toxic pollutant.
So one question may be: Is there a pattern or not? You may want to just determine the diurnal pattern for the pollutant. And that's a very different study design. So it's very important to be very clear, up front with what you are trying to accomplish.
Now, one of the pollutants we'll be referring to is ozone, so here is a little bit of background. Ozone is a secondary pollutant. It's formed in the atmosphere from chemical reactions involving volatile organic compounds and oxides of nitrogen in the presence of UV energy, and high ambient temperatures are important.
Ozone is formed from these precursors; it's not directly emitted, and this has implications for its behavior in the environment. The best conceptual model for ozone is in the National Academy book "Rethinking the Ozone Problem in Urban and Regional Air Pollution." This slide roughly shows the kind of conceptual understanding that we now believe is appropriate for ozone. I strongly recommend this book for background information on ozone and its chemical and physical properties.
One very large field campaign in the early '90s was the Lake Michigan Ozone Study, a field monitoring program designed to support the development of a control strategy for the region. Here's a little bit of data from that study showing the typical diurnal ozone pattern you see in urban areas, with a peak in the middle to late afternoon. That peak is due to the chemical reactions coming to completion during the time of day with maximum sunlight energy.
This ozone pattern is very similar across different areas. In contrast, a toxic pollutant like benzene is a conserved pollutant. It's a primary pollutant, the vast majority of which, I would say, comes from motor vehicles. Benzene patterns in urban areas reflect the pattern of emissions from the sources as well as the pattern of atmospheric stability. This graph shows a box plot of benzene concentrations for each hour of the day.
You can see that in the morning drive time, 6:00, 7:00 and 8:00 a.m., benzene levels are higher. Then, as you move into the middle of the day, the concentrations drop. What's happening in the atmosphere is that, as the day progresses and the sun heats the ground, you get increased atmospheric instability and increased mixing, so in general concentrations go down in the afternoon.
To some extent, benzene is also chemically reactive, and photochemical processes tend to destroy it.
So this pattern of high concentrations during the time of emission and low concentrations during the afternoon when there is more instability is very typical of a conserved pollutant from a source like motor vehicles. The box plot is a very important tool that we use to display information. We'll talk quite a bit about it in future lectures.
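The numbers behind an hour-of-day box plot are just a five-number summary computed separately for each hour. A minimal sketch, assuming hourly (hour, concentration) pairs; I use the minimum and maximum for the whiskers and the "inclusive" quartile method, though box plot software often uses other whisker and quartile conventions. The function names are hypothetical.

```python
from collections import defaultdict
from statistics import quantiles


def hourly_box_stats(observations):
    """observations: iterable of (hour_of_day, concentration) pairs.

    Returns {hour: (min, Q1, median, Q3, max)} for each hour present.
    Each hour needs at least two values for the quartile calculation.
    """
    by_hour = defaultdict(list)
    for hour, conc in observations:
        by_hour[hour].append(conc)
    stats = {}
    for hour, values in sorted(by_hour.items()):
        q1, med, q3 = quantiles(values, n=4, method="inclusive")
        stats[hour] = (min(values), q1, med, q3, max(values))
    return stats


def peak_hour(stats):
    """Hour of day with the highest median concentration."""
    return max(stats, key=lambda hour: stats[hour][2])
```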
Here is a little bit of data from New York City showing hourly benzene values. It shows the morning concentration spike quite clearly: every morning on these three days there's a peak in benzene concentration due to motor vehicles. There's also some evidence of a peak in the late afternoon, which may have any of a number of causes. One could be the evening rush hour; another could be the development of atmospheric stability as the sun sets. This kind of short-term variability is very typical in the environment, and one of our challenges is to characterize it.
Once you know the purpose, the second step in the design process is to define the population of interest: that is, the temporal scale and the spatial scale of your data. Are you looking for annual averages, or for a short-term peak? Are you looking at a whole city, or at a particular neighborhood influenced by a source? You need to be very clear about what the underlying objective is.
Now, spatial scale is very important. As I mentioned before, when we make measurements at a fixed site, we're looking at the air that's passing by the monitor. What's that telling us? It's basically telling us about the emissions that have been injected into the air on its way to the monitor. What we're actually getting, if you like, is a glimpse of where the air has been over the past few minutes, few hours or few days.
We then take that sample of air and we compute the concentration from it. What does that concentration tell us? It tells us about the air that passed that monitor, but it also tells us something about the air quality in the area surrounding the monitor. It's very important to get a handle on the spatial scale, how representative is that number. The EPA defines this based on the mean concentration at the monitoring site. So if a monitoring site has a mean concentration of 100, the spatial scale of the monitor is the area surrounding the monitor where the mean concentration is within 25 percent, or between 75 and 125. There's a very specific spatial scale associated with the monitor. The spatial scale may be very small, next to a big source; or maybe very large if it's in center of the city, not close to any sources.
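The 25 percent criterion can be expressed directly. A minimal sketch, assuming you have mean concentrations at increasing distances along a single transect from the monitor (a hypothetical data layout; a real assessment would look in all directions): it returns how far the site mean stays representative before the first point outside the band.

```python
def representative_extent(site_mean, transect, tolerance=0.25):
    """transect: list of (distance_km, mean_conc) pairs along one direction.

    Returns the farthest distance reached before the first point whose mean
    falls outside site_mean * (1 - tolerance) .. site_mean * (1 + tolerance).
    """
    lo = site_mean * (1 - tolerance)
    hi = site_mean * (1 + tolerance)
    extent = 0.0
    for distance, conc in sorted(transect):
        if not lo <= conc <= hi:
            break
        extent = distance
    return extent


# Hypothetical transect for a site mean of 100: the band is 75 to 125.
print(representative_extent(100, [(0.5, 110), (1.0, 95), (2.0, 80), (3.0, 70)]))
```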
We define these in four scale ranges: urban, neighborhood, middle scale and micro scale. The urban scale is a citywide measurement; it represents distances from four to 50 kilometers, and its primary purpose is to reflect long-term trends. The neighborhood scale is the basic scale for exposure monitoring or compliance monitoring, with dimensions of about 0.5 to 4 kilometers. That's sort of the workhorse monitor for NAAQS assessment.
Middle scale and micro scale represent very small areas and only represent localized problems. If you are concerned about a particular air pollution source affecting a particular neighborhood, you might want to put a monitor right up next to it and look at the micro scale, or very short-term, impacts of that source on the neighborhood. That doesn't tell you anything about exposure throughout the neighborhood; it represents just that one small area. The micro scale covers distances of up to about a hundred meters; a hundred meters away, you could have more than a 25 percent difference in mean concentration.
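A representative distance can then be mapped onto a scale name. A sketch, assuming the ranges as I understand them from EPA's monitor siting criteria (micro up to about 100 meters, middle about 100 meters to 0.5 kilometers, neighborhood 0.5 to 4 kilometers, urban 4 to 50 kilometers); the middle-scale range is not stated in the lecture, and the function name is hypothetical.

```python
# Upper limits (km) of each monitoring scale, smallest first. Assumed from
# EPA siting guidance; which scale owns a value exactly on a boundary is an
# arbitrary choice here.
SCALE_LIMITS_KM = [
    (0.1, "micro"),
    (0.5, "middle"),
    (4.0, "neighborhood"),
    (50.0, "urban"),
]


def spatial_scale(extent_km):
    """Classify a representative distance (km) into a monitoring scale."""
    if extent_km < 0:
        raise ValueError("distance must be non-negative")
    for upper_km, name in SCALE_LIMITS_KM:
        if extent_km <= upper_km:
            return name
    return "larger than urban (regional)"
```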
This has other implications. This graph shows the peak expected concentration as a function of scale. On the left side of the graph is the microscale, right next to the factory, and the right hand side of the graph is the urban scale. And you can see that the expected range of concentrations will increase as the scale length decreases.
This is true for spatial scale as well as for the length of the averaging time. The shorter the averaging time and the smaller the spatial scale, the higher the potential concentrations, and you should expect this pattern when you're out there monitoring.
The third step in this process is to collect information about the physical environment. It's very important to know the history of the site and where the major sources are. What are the landscape issues? Are there barriers or topographical issues? What's the meteorology: the temperatures, wind patterns and precipitation patterns at the site? These will all ultimately control the air quality you're going to measure, so understanding them up front is critical. We'll look at some examples.
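The averaging-time half of that statement is easy to check on an hourly record: when the longer window is a whole multiple of the shorter one, as with 1, 8 and 24 hours, the peak of the longer running averages can never exceed the peak of the shorter ones. A minimal sketch with a hypothetical hourly series containing one spike; the function names are my own.

```python
def rolling_means(values, window):
    """Running means over complete windows of `window` consecutive values."""
    if window > len(values):
        return []
    total = sum(values[:window])
    means = [total / window]
    for i in range(window, len(values)):
        total += values[i] - values[i - window]  # slide the window one step
        means.append(total / window)
    return means


def peak_by_averaging_time(hourly, windows=(1, 8, 24)):
    """Maximum running average for each averaging time (in hours)."""
    return {w: max(rolling_means(hourly, w)) for w in windows}


# Hypothetical 48-hour record: steady 10 units with a single 100-unit hour.
hourly = [10.0] * 48
hourly[30] = 100.0
print(peak_by_averaging_time(hourly))
```

Here the single spike dominates the 1-hour peak but is diluted to a modest bump in the 24-hour peak.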
One monitoring program we have in the Lake Michigan region is PAMS, the Photochemical Assessment Monitoring Stations. PAMS sites are located specifically in those regions of the country that exceed the ozone standard. If you look at our PAMS monitoring network, it's pretty clear that it's strongly influenced by the geography of the Lake Michigan region.
All of the PAMS monitors are located right along the lake, except for one slightly upwind and inland: the upwind site, Braidwood, is a little way from the lake. All the other sites, whether urban, downwind or transport sites, are at different distances from the urban core along the lakefront. It's clear from this map that the cold surface of Lake Michigan has a profound influence on ozone in our area in the summer.
And the monitoring network reflects this major geographical feature of our region. Now, meteorological data is readily available. You can get it from your meteorologists who work for your local agency. The National Weather Bureau has a number of sites where you can purchase, for very small amounts of money, CD-ROMs with meteorological data.
The questions you need to ask yourself are: What are the prevailing winds? The wind direction controls which sources are upwind of a monitoring site, and how often a particular source has the potential to impact a particular monitoring site. Windroses are the way we display this information. The monitoring website shown here has windroses for a variety of areas throughout the United States, which you can download. Here are some examples. These windroses, for St. Louis and San Francisco, show profoundly different patterns. The windrose for San Francisco shows winds that come predominantly from the west and northwest; these are winds from the ocean toward the land.
In contrast, the windrose for St. Louis shows a much more balanced pattern, with winds coming from all directions. The windrose is a joint frequency graphical representation of wind direction and wind speed: it shows the fraction of time the wind comes from a particular direction at a particular speed. The thicker part of each bar represents the fraction of time at higher wind speeds, and the thinner part of the bar the fraction of time at lower wind speeds. This is one of the governing parameters that determines air quality at a monitoring site, and it is very important to understand it as you proceed.
Wind is also important for other reasons. Wind speed is an important parameter in determining peak concentration. For a non-reactive, conserved pollutant, as shown in this graph, concentration is inversely proportional to wind speed: as you increase wind speed, peak concentrations decrease, a very typical pattern. This pattern may not be seen for a pollutant like windblown dust or coarse particles, because at very high wind speeds coarse particle concentrations can actually increase. But for most conserved pollutants, the pattern is as illustrated here.
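The joint frequency table behind a windrose can be built by binning each hourly observation by direction sector and speed class and dividing by the number of hours. A minimal sketch, assuming 16 direction sectors and speed-class edges of 2, 5 and 8 m/s (both arbitrary choices here) and ignoring the separate treatment of calms that real windrose software usually applies; the function names are hypothetical.

```python
from collections import Counter


def sector_index(direction_deg, n_sectors=16):
    """Sector number for a direction the wind blows FROM; sector 0 is
    centered on north, and numbering runs clockwise."""
    width = 360.0 / n_sectors
    return int(((direction_deg + width / 2) % 360) // width)


def windrose_table(obs, speed_edges=(2.0, 5.0, 8.0), n_sectors=16):
    """obs: list of (direction_deg, speed_m_s) hourly observations.

    Returns {(sector, speed_class): fraction of all hours}, where
    speed_class counts how many edges the speed meets or exceeds.
    """
    counts = Counter()
    for direction, speed in obs:
        speed_class = sum(speed >= edge for edge in speed_edges)
        counts[(sector_index(direction, n_sectors), speed_class)] += 1
    n = len(obs)
    return {key: count / n for key, count in counts.items()}
```

Drawing one bar per sector, thickened by speed class, turns this table into the familiar windrose.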
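If concentration really is inversely proportional to wind speed, the proportionality constant can be estimated from paired observations. A minimal sketch, assuming the no-intercept model C = a/u (the form a Gaussian plume gives for a continuous point source, swapped in here as an illustration) and excluding calm hours, where the model breaks down; the function name is hypothetical.

```python
def fit_inverse_wind_speed(speeds, concs):
    """Least-squares fit of C = a / u with no intercept.

    Minimizing sum((C_i - a / u_i) ** 2) over a gives
        a = sum(C_i / u_i) / sum(1 / u_i ** 2)
    """
    if any(u <= 0 for u in speeds):
        raise ValueError("wind speeds must be positive; handle calms separately")
    numerator = sum(c / u for u, c in zip(speeds, concs))
    denominator = sum(1.0 / u ** 2 for u in speeds)
    return numerator / denominator
```

Plotting the residuals C - a/u against wind speed is a quick check on whether the inverse model suits a given pollutant; for coarse particles it would not.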
Temperature is obviously an important variable for photochemical pollutants. This graph shows that the photochemical pollutant formaldehyde increases as temperature increases. Quite simply, the chemical reactions take place at higher rates at higher temperatures, so the potential for high formaldehyde increases as temperature increases.
In addition to looking at wind speed and wind direction, it's possible to look at pollutant concentration as a function of wind direction. This example is from a graduate of ours who works in Idaho as an air pollution enforcement official. It is a graph of PM10 concentrations as a function of wind direction. You can see that, although the wind most of the time comes from the southeast, the concentrations associated with these winds are quite low. During the small fraction of the time the wind came from the northwest, PM10 concentrations were much higher. So this is an example of graphing pollution concentration instead of wind speed on a windrose, which is called a pollution rose.
As another example, he was investigating hydrogen sulfide odor incidents in Paul, Idaho. He set up an H2S monitor that collected hourly data. His pollution rose showed that the only time he ever saw H2S, the wind came from the west, slightly north of west. When you overlay this pollution rose on a map of Paul, Idaho, it points right at the suspected source. He presented this information to the source operators; they could see that they were, in fact, the likely culprit, and they installed odor control technology on the source. So this was a very successful application of wind and air pollution concentration data in an enforcement action.
Temperature is also important in understanding monitoring technology. Here in this region, we've spent a lot of resources trying to understand ammonium and ammonia concentrations. This is a project done by our regional planning group, LADCO, comparing two different monitoring technologies, the ion chromatograph and the Pranalytica techniques. When you graph the difference between what these two instruments measure against temperature, you see a strong influence of temperature: as temperature goes up, the difference increases systematically. It's not a random effect; one of the two instruments is unfortunately sensitive to temperature. They're trying to understand what it is about the sampling technology that causes this, because you don't want to see this kind of pattern. You'd like to see a random relationship.
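A pollution rose replaces the speed classes of a windrose with a concentration statistic for each direction sector. A minimal sketch, assuming paired hourly (direction, concentration) observations, eight sectors and the mean as the statistic (a maximum or a high percentile would work the same way); the names are hypothetical.

```python
from collections import defaultdict

SECTORS = ["N", "NE", "E", "SE", "S", "SW", "W", "NW"]


def sector(direction_deg):
    """Map a wind direction (degrees the wind blows FROM) to an 8-point sector."""
    return SECTORS[int(((direction_deg + 22.5) % 360) // 45)]


def pollution_rose(obs):
    """obs: iterable of (direction_deg, concentration) for paired hours.

    Returns {sector: (fraction_of_hours, mean_concentration)}.
    """
    groups = defaultdict(list)
    for direction, conc in obs:
        groups[sector(direction)].append(conc)
    n = sum(len(values) for values in groups.values())
    return {s: (len(v) / n, sum(v) / len(v)) for s, v in groups.items()}
```

A sector that is rarely observed but carries a high mean, like the northwest winds in the PM10 example, is exactly what this table makes stand out.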
In contrast, when you look at the same difference between the two monitoring technologies versus humidity, there is no pattern. The average difference is zero across all humidities. So humidity is not an important variable. But something about one of the instruments does appear to be sensitive to temperature.
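One way to put numbers on those two graphs is to regress the paired instrument difference on each meteorological variable and look at the slope and correlation. A minimal sketch with hypothetical data; the lecture does not say which statistics LADCO used, so ordinary least squares and Pearson correlation are my substitutions.

```python
import math


def slope_and_r(x, y):
    """Ordinary least-squares slope of y on x, and Pearson correlation r."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx, sxy / math.sqrt(sxx * syy)


# Hypothetical differences (instrument A minus instrument B).
temp_slope, temp_r = slope_and_r([10, 20, 30, 40], [0.5, 1.0, 1.5, 2.0])
rh_slope, rh_r = slope_and_r([40, 55, 70, 85], [0.3, -0.2, -0.1, 0.1])
```

A systematic temperature effect shows up as a clearly nonzero slope with a large r, while a variable that does not matter, like humidity here, gives a slope and r near zero.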
And finally, topography: topography can have profound effects on your air quality. It's not a big issue in Region 5; we're one of the flattest parts of the country. But if you live in an area with significant surface features, you want to be very aware of topography.
The Donora, Pennsylvania, incident in 1948 is a good example. Donora is a town with a steel mill in a valley, and the topography caused extremely high concentrations over a weekend, with important health consequences.
We also have a picture here of a river valley in Italy that again shows the profound effect of complex terrain on air masses. So if your area has important topographic features, topography can be a controlling factor.
Step four in this process is to define the kind of samples you want to collect. Do you want to look at long-term versus short-term versus personal impacts? Are you looking at all of a metropolitan area like Chicago versus a neighborhood? What's your averaging time? Is your ultimate goal a three-year average? Or are you looking at a short-term peak concentration?
If you're looking at a NAAQS, there are excellent references on-line in terms of monitoring and types of samples required. But for other kinds of studies, things are not as well defined.
Now, long-term monitoring can be used for a number of reasons: One is, your ultimate goal may be to determine the highest concentration during a particular length of time over a three-year period. Or to determine the representative concentrations over a long period of time, long-term average.
Long-term monitoring may be to look for and identify impacts from particular sources. Or just to define the general background levels present.
So long-term monitoring can ultimately give you many different answers. In contrast, short-term or special purpose monitoring is much more focused on a specific end point. What's the geographic variability at this location? Are there environmental concerns caused by a particular facility in a neighborhood? Personal exposure monitoring is specifically for health studies, and these will have very different study designs than long-term monitoring for compliance.
Once you've determined the purpose and how you're going to monitor, you then need a quality assurance plan. The quality assurance plan covers a whole range of issues from measurement objectives all the way to data analysis techniques. And it will primarily be the focus of the third module of these lectures. So I will move on at this point.
Step six is to learn about the study area from previous studies. What do you know about the monitoring area from measurements made in the past? The National-Scale Air Toxics Assessment, NATA, did a fantastic assessment of a whole list of targeted urban air toxics. Shown on this slide are their estimates of the background concentrations you should expect to see in urban areas for a variety of pollutants: benzene, et cetera.
Now, background concentrations are the amount of mass you'd expect to see before the addition of any local sources. In the case of benzene, there are tremendous local sources. In the case of carbon tetrachloride, hopefully not.
So if you look at the distribution of expected concentrations, you can see that for benzene, about a third comes from background and about two-thirds from local sources, mostly mobile sources.
In contrast, below is the picture for carbon tetrachloride. Carbon tetrachloride is essentially 100 percent background. We no longer manufacture or use it in industry, and what's left in the atmosphere is left over from past uses over many decades.
So whenever you measure carbon tetrachloride, no matter where you go, you see essentially the same concentration. It's all background.
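The background and local contributions described above are related by simple subtraction. The sketch below is a minimal illustration, assuming you already have a total concentration and a background estimate (for example, from NATA) in the same units; the function name `background_share` is hypothetical and not part of any NATA tool.

```python
def background_share(total: float, background: float) -> tuple[float, float]:
    """Return (fraction of the total that is background, local increment).

    Both arguments must be in the same units, e.g. micrograms per cubic meter.
    """
    if total <= 0:
        raise ValueError("total concentration must be positive")
    # Clip at zero: measurement noise can put a total slightly below the
    # background estimate, which would otherwise give a negative increment.
    local_increment = max(total - background, 0.0)
    return min(background / total, 1.0), local_increment
```

For a pollutant like benzene the first value would be roughly one third; for carbon tetrachloride it would be close to 1.0 with a local increment near zero.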
The seventh step is to develop the specific field sampling program. How many sampling locations do you need? How many monitors do you need at each sampling location? How many samples do you collect? Do you need to collect samples over a one-month, a one-year or ten-year period? How important is background going to be in your field study, and do you need to measure background? These very specific questions need to be answered as you develop your sampling program.
At this point it's also useful to think about what you're going to do with the data once you collect it: trends analysis, spatial patterns, averaging times, different percentiles of the data, even isopleth graphing. It's a good idea to define the techniques you're going to use now, so you're ready to go when you get to that point in the program. At this point, if you've done your homework correctly, it's time to go do the study. Do the monitoring.
Here's an example: monitoring PM 2.5 in a rural area using three fixed sites and approved-type monitors. We're monitoring with Teflon filters for gravimetric analysis, and we're taking a 24-hour sample once every three days. This gives us about 120 samples per year over a three-year study. These are some of the elements a monitoring program may entail, but again, these are just examples.
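The sample count in this example follows directly from the calendar. A minimal sketch, assuming a fixed one-in-three-day schedule that starts on the first day of the study with no make-up samples; `sampling_schedule` is a hypothetical helper. A 365-day year holds 122 scheduled days, which is where the figure of roughly 120 samples per year comes from.

```python
from datetime import date, timedelta


def sampling_schedule(start: date, end: date, every_n_days: int = 3) -> list[date]:
    """Return every scheduled sample date from start through end, inclusive."""
    if every_n_days < 1:
        raise ValueError("every_n_days must be at least 1")
    dates = []
    current = start
    while current <= end:
        dates.append(current)
        current += timedelta(days=every_n_days)
    return dates


# One hypothetical calendar year of a one-in-three-day schedule.
year = sampling_schedule(date(2001, 1, 1), date(2001, 12, 31))
print(len(year))  # 122 scheduled 24-hour samples
```

Comparing the number of valid samples you actually collect against this list gives the data completeness for the year.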
And it's very important to conduct the monitoring study according to the written plans and not deviate from them. When you're ultimately interpreting the data, after the study is finished and all the data sets have been compiled, it's important to go back to the purpose of the study and the representativeness of the monitors. It's very important not to overreach with the data.
In evaluating the results, we're going to look at ozone and PM as our specific examples. Guidance on both of these pollutants, associated with the National Ambient Air Quality Standards, is available at the references provided.
So our first example involves interpreting eight-hour ozone monitoring data for compliance. The first step in this process is to assemble your data into a spreadsheet. As shown here, ozone is measured on an hourly basis, so for each day you would assemble the 24 one-hour values. The next step is to compute the eight-hour averages, since in this example we're specifically looking at the eight-hour standard.
So you compute all the possible eight-hour averages available for each day and select the highest value for each day. In the case of this first day, the highest eight-hour period was 0.07 parts per million. Now, sometimes the measurement record is incomplete. Missing hours are okay, as long as there aren't too many of them. But remember, when you have missing hours, you need to use the correct denominators in your averages. You don't want to divide by eight when only seven hours of data are available for the average.
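The running eight-hour averages and the daily maximum can be sketched as follows. This assumes the hourly record starts at midnight of the first day, uses `None` for missing hours, and applies the six-valid-hour minimum described in this lecture; the function names are hypothetical. Missing hours are dropped, so each average is divided by the number of valid hours rather than by eight.

```python
from statistics import mean
from typing import Optional, Sequence

Hourly = Sequence[Optional[float]]  # one value per hour, None marks a missing hour


def eight_hour_averages(hourly: Hourly, min_valid: int = 6) -> list[Optional[float]]:
    """Running 8-hour averages; the window starting at hour i covers hours i..i+7.

    The divisor is the number of valid hours in the window, never a fixed 8.
    Windows with fewer than min_valid valid hours are returned as None.
    """
    averages = []
    for start in range(len(hourly) - 7):
        valid = [v for v in hourly[start:start + 8] if v is not None]
        averages.append(mean(valid) if len(valid) >= min_valid else None)
    return averages


def daily_max_8hr(hourly: Hourly, min_valid: int = 6) -> list[Optional[float]]:
    """Highest valid 8-hour average for each day, grouping windows by start hour."""
    averages = eight_hour_averages(hourly, min_valid)
    maxima = []
    for day_start in range(0, len(averages), 24):
        valid = [a for a in averages[day_start:day_start + 24] if a is not None]
        maxima.append(max(valid) if valid else None)
    return maxima
```

Windows are grouped by the day on which they start, so a window beginning late in the evening borrows hours from the next day; for the last day in the record, only windows that fit inside the record are computed.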
The criterion for missing values is that an eight-hour average must have at least six valid data points. And for the monitor to be used for NAAQS purposes, the annual data must be at least 75% complete.
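The annual completeness check reduces to a ratio. A minimal sketch, assuming `daily_values` holds one entry per day you were required to monitor, with `None` where no valid daily value exists; which days count as required is set by EPA guidance and is not modeled here, and the function names are hypothetical.

```python
from typing import Optional, Sequence


def percent_complete(daily_values: Sequence[Optional[float]]) -> float:
    """Percentage of required days that have a valid daily value (None = missing)."""
    if not daily_values:
        raise ValueError("no required days supplied")
    valid = sum(1 for v in daily_values if v is not None)
    return 100.0 * valid / len(daily_values)


def complete_enough(daily_values: Sequence[Optional[float]], threshold_pct: float = 75.0) -> bool:
    """True when the year meets the completeness threshold (75% in this lecture)."""
    return percent_complete(daily_values) >= threshold_pct
```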
Now, there are exceptions: EPA will allow you to use incomplete data if they support a conclusion of noncompliance. So if you're clearly over the standard on a day for which you have only five hours of data, you can use those five hours. But you can't demonstrate compliance with only five hours of valid data; you need valid data to demonstrate compliance. This general theme shows up in many of EPA's interpretations of monitoring data.
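One way to make the "clearly over" idea concrete is a worst-case test: treat every missing hour as zero and see whether the window still exceeds the standard. This is an assumption for illustration, not EPA's published procedure, and `classify_window` and `WindowStatus` are hypothetical names. The asymmetry the lecture describes shows up directly: an incomplete window can only ever support an exceedance, never compliance.

```python
from enum import Enum
from typing import Optional, Sequence


class WindowStatus(Enum):
    VALID = "valid"                 # enough valid hours for any use
    EXCEEDANCE_ONLY = "exceedance"  # incomplete, but over the standard even in the worst case
    INVALID = "invalid"             # incomplete and cannot support either conclusion


def classify_window(hours: Sequence[Optional[float]], standard: float,
                    min_valid: int = 6) -> tuple[WindowStatus, Optional[float]]:
    """Classify one 8-hour window of hourly values (None = missing hour)."""
    if not hours:
        raise ValueError("empty window")
    valid = [v for v in hours if v is not None]
    if len(valid) >= min_valid:
        return WindowStatus.VALID, sum(valid) / len(valid)
    # Assumption: a worst-case lower bound that treats every missing hour as zero.
    # If even this bound is over the standard, the window is "clearly over".
    lower_bound = sum(valid) / len(hours)
    if lower_bound > standard:
        return WindowStatus.EXCEEDANCE_ONLY, lower_bound
    return WindowStatus.INVALID, None
```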
So for each year, you compute the highest value for each day and then sort the daily values within the year from highest to lowest. What's shown here are the four highest eight-hour values for three years: 2001, 2002 and 2003. Now, the standard is based on the fourth-highest value in each year, averaged over three years.
So you simply take those three fourth-highest values, one per year, and compute their arithmetic average. You end up with 0.0846. The specification for how this is done is that you round to two decimal places. That gives 0.08 ppm, which just squeaks in as meeting the 0.08 ppm standard for this particular example.
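The three-year calculation above can be sketched in a few lines. It assumes you already have each year's daily maximum eight-hour values in ppm, and it applies the lecture's rounding to two decimal places with halves rounded up; the function names are hypothetical, and the exact rounding language in the regulation should be checked before using this for a real design value.

```python
from decimal import Decimal, ROUND_HALF_UP
from typing import Mapping, Sequence


def fourth_highest(daily_maxima: Sequence[float]) -> float:
    """Fourth-highest daily maximum 8-hour value in one year."""
    if len(daily_maxima) < 4:
        raise ValueError("need at least four daily values")
    return sorted(daily_maxima, reverse=True)[3]


def ozone_8hr_design_value(daily_maxima_by_year: Mapping[int, Sequence[float]]) -> Decimal:
    """Average the yearly fourth-highest values and round to two decimal places."""
    fourths = [fourth_highest(values) for values in daily_maxima_by_year.values()]
    if not fourths:
        raise ValueError("no years supplied")
    average = sum(fourths) / len(fourths)
    # repr() gives the shortest decimal string for the float, so Decimal rounds
    # the value as printed rather than its full binary expansion.
    return Decimal(repr(average)).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
```

Returning a `Decimal` keeps the comparison with the 0.08 ppm standard exact, so `design_value <= Decimal("0.08")` is not affected by binary floating-point error.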
My second example is data interpretation for compliance purposes for PM 2.5. Now, a little bit about PM 2.5. This pollutant has both primary and secondary contributors. A substantial fraction of PM 2.5 is formed in the atmosphere: sulfates, nitrates and some organic carbon come from gaseous precursors.
It's important to understand conceptually where this material comes from as you set up your study design. The chemical framework we use to sort out the composition is summarized here: the major components of this pollutant are soil or crustal material, sulfate, nitrate, elemental carbon, organic carbon and everything else.
As you delve into the criteria document and the PM 2.5 literature, this is the model that we generally use.
Now, the standard is structured with two different averaging times. One is the annual average; the other is the short-term 24-hour average. The annual standard is met when the three-year average of the annual means, using either spatially averaged data or data from a single monitor, is less than or equal to 15.0 micrograms per cubic meter.
So with the annual average, you're allowed to designate, if you choose, specific monitors that will go into a spatial average. The spatial average of those monitors is then used for compliance.
The short-term standard is based on 24-hour peaks: it uses the three-year average of the 98th percentile at each monitor. I'll go through calculated examples of both standards using data from the Chicago area.
Now, for the annual average, the first thing you do is compute the quarterly averages. You take all the measurements within each calendar quarter (January through March, April through June, et cetera) and compute the average.
The purpose is that if you have different numbers of observations in different quarters, the influence of each quarter on the annual average will not depend on the number of observations taken in that quarter.
So in this case you compute the four quarterly averages, and the annual average is the average of these four quarterly averages, as shown here.
The answer here, 15.85, rounds to 15.9, which is potential evidence of an exceedance of the 15.0 standard. However, you're allowed to do spatial averaging if you designate particular monitoring sites in your community to be used in the spatial average, as shown here with sites one, two and three.
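Quarterly weighting and rounding can be sketched as follows, assuming each sample is a (date, concentration) pair in micrograms per cubic meter and every calendar quarter has at least one sample; the helper names are hypothetical. Because the annual value is the mean of the four quarterly means, a heavily sampled quarter does not outweigh the others. The rounding helper rounds to one decimal place with halves rounded up, which turns 15.85 into 15.9.

```python
from datetime import date
from decimal import Decimal, ROUND_HALF_UP
from statistics import mean
from typing import Iterable, Tuple


def quarter_of(day: date) -> int:
    """Calendar quarter 1-4 (January-March is quarter 1)."""
    return (day.month - 1) // 3 + 1


def annual_mean_from_quarters(samples: Iterable[Tuple[date, float]]) -> float:
    """Annual mean as the average of the four quarterly means."""
    by_quarter: dict[int, list[float]] = {1: [], 2: [], 3: [], 4: []}
    for day, concentration in samples:
        by_quarter[quarter_of(day)].append(concentration)
    if any(not values for values in by_quarter.values()):
        raise ValueError("every calendar quarter needs at least one sample")
    return mean(mean(values) for values in by_quarter.values())


def round_to_tenth(value: float) -> Decimal:
    """Round to one decimal place, rounding halves up."""
    return Decimal(repr(value)).quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)
```

If a quarter had ten samples at 10.0 and the other three quarters one sample each at 20.0, the quarterly method gives 17.5, while pooling all thirteen samples would give about 12.3.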
You need to acquire three years of data from these three monitoring sites and, for each year (year one, year two, year three), compute the average across the sites. Once you have the spatial average for each year, compute the grand average over the three years. In this case it is 15.3, still slightly above 15.0, so there is potential here for a NAAQS violation.
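The order of the spatial averaging steps can be sketched as follows: first across sites within each year, then across years. A minimal sketch, assuming every designated site has an annual mean for the same three years; the function name and the site labels are hypothetical.

```python
from statistics import mean
from typing import Mapping, Sequence


def spatial_annual_design_value(
    annual_means: Mapping[str, Sequence[float]],
) -> tuple[list[float], float]:
    """Return (spatial average for each year, grand average over the years)."""
    if not annual_means:
        raise ValueError("no designated sites")
    sites = list(annual_means.values())
    n_years = len(sites[0])
    if n_years == 0 or any(len(site) != n_years for site in sites):
        raise ValueError("every site needs an annual mean for the same years")
    # First average across sites within each year, then across years.
    yearly = [mean(site[year] for site in sites) for year in range(n_years)]
    return yearly, mean(yearly)
```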
Now, the 98th percentile is a bit different, because here you're looking at a peak concentration. You simply sort the data from lowest to highest. In this case there are 118 observations, which is pretty typical of a monitoring program designed to collect samples once every three days: you would expect about 120 samples if you were 100% successful, so 118 is pretty good data completeness. The 98th percentile is the point in the distribution that 98% of the observations fall at or below, and you find that data point by multiplying the total number of observations by 0.98. In this case we get 115.6. But there isn't a sample 115.6; there is a sample 115 and a sample 116. We don't want you to interpolate here. We just want you to go up to the next sample, in this case sample 116, and define that as the 98th percentile. Looking it up on the chart, that's 31.9.
You do this for each of the three years, take the average of the three yearly values, and compare that average to the standard. In this case it is 32.5, much less than 65 micrograms per cubic meter, so this is not evidence of a standard violation.
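The 98th percentile and its three-year average can be sketched as below. The lecture says not to interpolate but to move up to the next sample; this sketch does that by taking the whole-number part of 0.98 * n and adding one, which picks sample 116 for 118 observations. How to handle the case where 0.98 * n is already a whole number is an assumption here, so check EPA's data-handling guidance before relying on it. The function names are hypothetical.

```python
from statistics import mean
from typing import Sequence


def percentile_98(values: Sequence[float]) -> float:
    """98th percentile by rank, without interpolation."""
    n = len(values)
    if n == 0:
        raise ValueError("no observations")
    ordered = sorted(values)  # lowest to highest
    # Integer arithmetic gives the whole-number part of 0.98 * n exactly;
    # adding one moves up to the next sample (118 samples -> rank 116).
    rank = (98 * n) // 100 + 1
    return ordered[rank - 1]  # ranks are 1-based, list indices 0-based


def pm25_24h_design_value(values_by_year: Sequence[Sequence[float]]) -> float:
    """Average of the yearly 98th percentiles, to compare with the 65 ug/m3 standard."""
    return mean(percentile_98(year) for year in values_by_year)
```

Because the rank never exceeds n for any non-empty year, no extra bounds check is needed before indexing.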
And I think that in general, though not everywhere, most parts of the country will have problems with the long-term annual PM 2.5 standard and not with the short-term 98th percentile part of the standard. These data, taken from EPA Region 5, are consistent with that general observation.
And finally, step 11: Evaluate whether or not you've met the objectives of the study. Was the data sufficient to meet the requirements of compliance? Did you meet your data quality objectives? Do things make sense or not? It’s very important to step back and take a look and ask yourself this question.
Just a little bit of a preview: in lectures two and three, we'll learn more about data quality objectives, study design and detection limits, and I'll delve deeper into a number of these concepts.
So what have we covered? We've covered the background for air quality sampling, the purpose of sampling and the 11-step process that will help you come up with good study design and good data interpretation. The examples showed how to interpret the NAAQS for ozone and PM 2.5.
I want to thank you for making it through this first lecture and encourage you to complete the course evaluation and to send us e-mails with your questions, concerns and comments. Suggestions for new data sets are greatly appreciated; we'd like to keep our examples current, real and living. We look forward to hearing from you, and we'll be back shortly with future lectures. Thank you.