UNIVERSITY OF ILLINOIS AT CHICAGO
COLLEGE OF BUSINESS ADMINISTRATION
MBA 503 (Statistics Module for MBA Students):
Data Analysis for Managerial Decisions
PROF. STAN SCLOVE
NOTES ON ANALYSIS OF VARIANCE
These notes Copyright © 1997 Stanley Louis Sclove
(These course notes will sometimes be a summary of what is in the text but often will amplify and extend it.)
The essay "Preliminary Evaluation of a New Food Product" (Street and Carroll 1972) is a very readable introduction to the design of experiments and analysis of variance.
The LBS textbook discusses single-factor ("one-way") ANOVA. Often one needs higher-way ANOVA. How would you assess the effects of Temperature and Pressure during manufacture on the quality of the resulting product? Or, suppose you own ThreadBare, a garment-sewing company. You have four machines which do the same job. They are run by several operators. How do you compare the machines? How do you compare the operators? How would MachCo, the company that makes the machines, answer these questions? How would the union? Why should these three parties view the questions differently? These questions require higher-way ANOVA and different sampling models.
Another example is
which is still called "additive" because the term in X1X2, though nonlinear, is still added on.
Combining the preceding, we have, say, $Y = A + B_1X_1 + B_2X_2 + B'_3X_1X_2 + e$. [An example of a nonadditive response function would be $f(X_1,X_2) = A\,e^{-B_1X_1}\cos(B_2X_2)$.]
This gives nine treatment combinations: {(Ti, Pj), i = 1,2,3; j = 1,2,3}.
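The response Y is observed n times at each treatment combination; the results at the (i,j)-th treatment combination are denoted by $Y_{ijk}$, $k = 1,2,\ldots,n$. The value of the response function at $(T_i,P_j)$ is $f(T_i,P_j) = \mu_{ij}$, say. This function is modeled in terms of the effect of T, the effect of P and their combined effect. This is customarily written as

$$\mu_{ij} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij},$$

where $\mu$ is the overall mean, $\alpha_i$ is the effect of the i-th level of T, $\beta_j$ is the effect of the j-th level of P, and $(\alpha\beta)_{ij}$ is their combined effect (interaction).

In this context, X1 and X2 are often called "Factor A" and "Factor B".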
TABLE 1. Stacked Data
# MnPr0701 Dat
# Temp: Lo/Med/Hi; Press: Lo/Med/Hi; n = 2 reps per cell
#   y    Temp  Press
   ----  ----  -----
   90.4    1     1
   90.7    1     2
   90.2    1     3
   90.2    1     1
   90.6    1     2
   90.4    1     3
   90.1    2     1
   90.5    2     2
   89.9    2     3
   90.3    2     1
   90.6    2     2
   90.1    2     3
   90.5    3     1
   90.8    3     2
   90.4    3     3
   90.7    3     1
   90.9    3     2
   90.1    3     3
   ----------------
SOURCE. Montgomery (1991), Chapter 7, Problem 1
These "stacked" data can then be tabulated and edited to give a table such as Table 2.
TABLE 2. Data for Montgomery, Problem 1, Chapter 7
                     PRESSURE
           |   Lo     Med     Hi
     ______|____________________
           |
      Lo   |  90.4   90.7   90.2
           |  90.2   90.6   90.4
           |
TEMP  Med  |  90.1   90.5   89.9
           |  90.3   90.6   90.1
           |
      Hi   |  90.5   90.8   90.4
           |  90.7   90.9   90.1
     ______|____________________
SOURCE. Table 1
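The step from the stacked form of Table 1 to the cell layout of Table 2 can be sketched in a few lines of Python. This is only an illustration: the names STACKED and tabulate are made up for this example, and the level codes 1, 2, 3 are taken to mean Lo, Med, Hi as in the Table 1 header.

```python
from collections import defaultdict

# Stacked records from Table 1: (y, temp, press); levels coded 1=Lo, 2=Med, 3=Hi.
STACKED = [
    (90.4, 1, 1), (90.7, 1, 2), (90.2, 1, 3),
    (90.2, 1, 1), (90.6, 1, 2), (90.4, 1, 3),
    (90.1, 2, 1), (90.5, 2, 2), (89.9, 2, 3),
    (90.3, 2, 1), (90.6, 2, 2), (90.1, 2, 3),
    (90.5, 3, 1), (90.8, 3, 2), (90.4, 3, 3),
    (90.7, 3, 1), (90.9, 3, 2), (90.1, 3, 3),
]

def tabulate(records):
    """Group responses by (temp, press) cell, keeping replicate order."""
    cells = defaultdict(list)
    for y, temp, press in records:
        cells[(temp, press)].append(y)
    return dict(cells)

cells = tabulate(STACKED)
labels = {1: "Lo", 2: "Med", 3: "Hi"}

# Print one line per (temperature level, replicate), pressure across the columns.
for temp in (1, 2, 3):
    for rep in (0, 1):
        row = "  ".join(f"{cells[(temp, press)][rep]:.1f}" for press in (1, 2, 3))
        print(f"{labels[temp]:>4} | {row}")
```

Each printed line corresponds to one data row of Table 2.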
Notation:                 PRESSURE
                  j=1     j=2     j=3
               |  Lo      Med     Hi
         ______|______________________
               |
      i=1 Lo   |  y111    y121    y131
               |  y112    y122    y132
               |
TEMP  i=2 Med  |  y211    y221    y231
               |  y212    y222    y232
               |
      i=3 Hi   |  y311    y321    y331
               |  y312    y322    y332
         ______|______________________
SOURCE. Table 1
The notation is yijk, where i = level of Temperature (1, 2
or 3), j = level of Pressure (1, 2 or 3), and k = replicate (1 or 2).
In general, yijk denotes the value for the k-th replicate (k
= 1,2,...,n) in the cell for the i-th level of Factor A (i =
1,2,...,a) and the j-th level of Factor B (j = 1,2,...,b).
Sums and Means
When a subscript is replaced by a plus sign (+), it means summing over
the values of that subscript. For example, yij+ denotes the sum
of the replicates in the (i,j)-th cell. When the plus is replaced by a dot (.), it denotes the corresponding mean; for example, yij. = yij+/n is the mean of the n replicates in the (i,j)-th cell.
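As a small illustration of the plus and dot notation, the following Python sketch computes a cell sum yij+, a cell mean yij. and a row mean yi.. for the data of Table 2. The names CELLS, cell_sum, cell_mean and row_mean are chosen for this example only.

```python
# Cell data from Table 2, keyed by (i, j) = (Temperature level, Pressure level).
CELLS = {
    (1, 1): [90.4, 90.2], (1, 2): [90.7, 90.6], (1, 3): [90.2, 90.4],
    (2, 1): [90.1, 90.3], (2, 2): [90.5, 90.6], (2, 3): [89.9, 90.1],
    (3, 1): [90.5, 90.7], (3, 2): [90.8, 90.9], (3, 3): [90.4, 90.1],
}

def cell_sum(cells, i, j):
    """y_ij+ : sum over the replicates k in cell (i, j)."""
    return sum(cells[(i, j)])

def cell_mean(cells, i, j):
    """y_ij. : mean over the replicates k in cell (i, j)."""
    return cell_sum(cells, i, j) / len(cells[(i, j)])

def row_mean(cells, i):
    """y_i.. : mean over all Pressure levels j and replicates k at level i."""
    values = [y for (ii, _), ys in cells.items() if ii == i for y in ys]
    return sum(values) / len(values)

print(round(cell_sum(CELLS, 1, 1), 1))   # y11+ for the Lo/Lo cell
print(round(cell_mean(CELLS, 1, 1), 2))  # y11. for the Lo/Lo cell
print(round(row_mean(CELLS, 1), 4))      # y1.. for Temperature = Lo
```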
TABLE 3. One-at-a-time design with two observations per cell
FACTOR B
| B1 | B2 |
====|==============|==============|
A1 | y y | y y |
| 111 112 | 121 122 |
FACTOR A ----------------------------------|
A2 | y y | |
| 211 212 | |
===================================
TABLE 4. Factorial design with one observation per cell.
FACTOR B
| B1 | B2 |
===|==============|==============|
A1 | y | y |
| 111 | 121 |
FACTOR A ---|--------------|--------------|
A2 | y | y |
| 211 | 221 |
===================================
The estimates of the main effects from Table 4 are based on only 4 observations but are just as precise as
those from Table 3, based on 6 observations.
The factorial design is thus more efficient. Further, the
interaction is estimable from Table 4 but not from Table 3.
With Table 4, the effect of B can be estimated for each level of A.
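One way to check the precision claim is to compare the variance of each main-effect estimate under the two designs. The sketch below assumes (the notes do not say so explicitly) that the observations are independent with a common variance sigma^2 and that each effect is estimated as a difference of two means; then the variance of the estimate is sigma^2 times the sum of the squared coefficients. The function and variable names are made up for this illustration.

```python
def variance_factor(coefficients):
    """Var(sum of c_i * y_i) / sigma^2 for independent y_i with common variance."""
    return sum(c * c for c in coefficients.values())

# Table 3 (one-at-a-time, 6 observations).
# Effect of A, estimated at level B1: mean(y211, y212) - mean(y111, y112).
a_effect_table3 = {"y211": 0.5, "y212": 0.5, "y111": -0.5, "y112": -0.5}
# Effect of B, estimated at level A1: mean(y121, y122) - mean(y111, y112).
b_effect_table3 = {"y121": 0.5, "y122": 0.5, "y111": -0.5, "y112": -0.5}

# Table 4 (factorial, 4 observations).
# Effect of A, averaged over B: mean(y211, y221) - mean(y111, y121).
a_effect_table4 = {"y211": 0.5, "y221": 0.5, "y111": -0.5, "y121": -0.5}
# Effect of B, averaged over A: mean(y121, y221) - mean(y111, y211).
b_effect_table4 = {"y121": 0.5, "y221": 0.5, "y111": -0.5, "y211": -0.5}

for name, coefs in [
    ("A, Table 3", a_effect_table3), ("B, Table 3", b_effect_table3),
    ("A, Table 4", a_effect_table4), ("B, Table 4", b_effect_table4),
]:
    print(name, variance_factor(coefs))
```

Under these assumptions every estimate has the same variance factor, although Table 4 uses four observations and Table 3 uses six: in the factorial design each observation contributes to both main-effect estimates.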
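As discussed above, the observation $y_{ijk}$ is written in terms of parts due to the i-th level of Factor A, the j-th level of Factor B and the interaction of the i-th level of A and j-th level of B, and a residual. The decomposition into these parts can be written as follows.

$$y_{ijk} = \bar{y}_{\cdot\cdot\cdot} + (\bar{y}_{i\cdot\cdot} - \bar{y}_{\cdot\cdot\cdot}) + (\bar{y}_{\cdot j\cdot} - \bar{y}_{\cdot\cdot\cdot}) + (\bar{y}_{ij\cdot} - \bar{y}_{i\cdot\cdot} - \bar{y}_{\cdot j\cdot} + \bar{y}_{\cdot\cdot\cdot}) + (y_{ijk} - \bar{y}_{ij\cdot})$$

Here a dot denotes the mean over the subscript it replaces, so the five terms are the grand mean, the A effect, the B effect, the interaction, and the residual.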
This decomposition also holds for the sums of squares (provided there are equal cell numbers).
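The sums-of-squares version of the decomposition can be checked numerically on the balanced data of Table 2. The sketch below uses the usual balanced-design formulas (for example, SS(A) is bn times the sum of squared deviations of the row means from the grand mean); the names CELLS and sums_of_squares are hypothetical and chosen for this example.

```python
# Cell data from Table 2, keyed by (i, j) = (Temperature level, Pressure level).
CELLS = {
    (1, 1): [90.4, 90.2], (1, 2): [90.7, 90.6], (1, 3): [90.2, 90.4],
    (2, 1): [90.1, 90.3], (2, 2): [90.5, 90.6], (2, 3): [89.9, 90.1],
    (3, 1): [90.5, 90.7], (3, 2): [90.8, 90.9], (3, 3): [90.4, 90.1],
}
A_LEVELS = (1, 2, 3)
B_LEVELS = (1, 2, 3)

def mean(values):
    values = list(values)
    return sum(values) / len(values)

def sums_of_squares(cells, a_levels, b_levels):
    """Sums of squares for a balanced two-way layout with replication."""
    a, b = len(a_levels), len(b_levels)
    n = len(cells[(a_levels[0], b_levels[0])])  # replicates per cell (balanced)
    grand = mean(y for ys in cells.values() for y in ys)
    row = {i: mean(y for j in b_levels for y in cells[(i, j)]) for i in a_levels}
    col = {j: mean(y for i in a_levels for y in cells[(i, j)]) for j in b_levels}
    cell = {ij: mean(ys) for ij, ys in cells.items()}
    return {
        "A": b * n * sum((row[i] - grand) ** 2 for i in a_levels),
        "B": a * n * sum((col[j] - grand) ** 2 for j in b_levels),
        "AB": n * sum((cell[(i, j)] - row[i] - col[j] + grand) ** 2
                      for i in a_levels for j in b_levels),
        "E": sum((y - cell[ij]) ** 2 for ij, ys in cells.items() for y in ys),
        "Total": sum((y - grand) ** 2 for ys in cells.values() for y in ys),
    }

ss = sums_of_squares(CELLS, A_LEVELS, B_LEVELS)
parts = ss["A"] + ss["B"] + ss["AB"] + ss["E"]
print({k: round(v, 4) for k, v in ss.items()})
print("SS(A) + SS(B) + SS(AB) + SS(E) =", round(parts, 4))
```

With equal cell numbers the four parts add up to the total sum of squares (up to floating-point rounding); with unequal cell numbers this simple additivity generally fails.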
Model I. From ThreadBare's viewpoint, they are stuck with those 4 machines and those 3 operators. Inferences are to be made about those specific 4 machines and those three operators. This is a fixed effects model.
Model II. From the viewpoint of MachCo, which manufactures the machines, the 4 machines are only a sample from their total production. And the three operators are a sample from a population of potential operators. This is a random effects model.
Model III. The company that has the machines may take another viewpoint. Namely, they may regard the operators as a sample from the population of potentially available operators. Then, Operators is a random effect and Machines is a fixed effect: this is a mixed model.
TABLE 5. ANOVA Table, assuming n = 2 replicates per cell
                                                   F-ratios
                                    ----------------------------------------
Source       SS         DF  MS      I             II            III
---------------------------------------------------------------------------
A            SS(A)       3  MS(A)   MS(A)/MS(E)   MS(A)/MS(AB)  MS(A)/MS(AB)
B            SS(B)       2  MS(B)   MS(B)/MS(E)   MS(B)/MS(AB)  MS(B)/MS(E)
AB           SS(AB)      6  MS(AB)  MS(AB)/MS(E)  MS(AB)/MS(E)  MS(AB)/MS(E)
Error        SS(E)      12  MS(E)
---------------------------------------------------------------------------
Total        SS(Tot)    23
---------------------------------------------------------------------------
Mean         SS(Mean)    1
Grand Total  SS(GrTot)  24
---------------------------------------------------------------------------
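The choice of denominator in Table 5 can be written down as a small lookup. In the sketch below, the mapping DENOMINATORS simply transcribes the F-ratio columns of Table 5 (A = Machines, B = Operators); the function name f_ratios and the mean squares used in the example are hypothetical and are not results from any data in these notes.

```python
# Denominator mean square for each effect under each sampling model (Table 5).
DENOMINATORS = {
    "I":   {"A": "E",  "B": "E",  "AB": "E"},   # fixed effects
    "II":  {"A": "AB", "B": "AB", "AB": "E"},   # random effects
    "III": {"A": "AB", "B": "E",  "AB": "E"},   # mixed: A fixed, B random
}

def f_ratios(mean_squares, model):
    """Return the F-ratio for A, B and AB under the given model (I, II or III)."""
    rule = DENOMINATORS[model]
    return {effect: mean_squares[effect] / mean_squares[denom]
            for effect, denom in rule.items()}

# Hypothetical mean squares, for illustration only.
ms = {"A": 12.0, "B": 8.0, "AB": 4.0, "E": 2.0}

for model in ("I", "II", "III"):
    print(model, f_ratios(ms, model))
```

The same mean squares can therefore lead to different F-ratios, and possibly different conclusions, depending on which sampling model is adopted.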
The F-ratio for testing a given hypothesis is determined by the expected mean square for the corresponding effects. Let's illustrate with Model II. In this model,

$$Y = \mu + A + B + AB + e,$$

where here A, B and AB are random variables representing the A effects, B effects and AB effects. Consider the null hypothesis $H_0$ concerning the Factor A effects. The test is a test of whether $V(A) = 0$. In general, $E[MS(A)] = V(e) + c_1V(AB) + c_2V(A)$, where the c's are constants related to the sample sizes. When $H_0$ is true, this is $V(e) + c_1V(AB)$.
It is also true that this equals $E[MS(AB)]$. So $H_0$ is equivalent to $E[MS(A)] = E[MS(AB)]$, or $E[MS(A)]/E[MS(AB)] = 1$. When $H_0$ is not true, this ratio is bigger than 1. This ratio is estimated by MS(A)/MS(AB), the F-ratio for this test. The hypothesis $H_0$ is rejected when this ratio is large.
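The argument can be illustrated numerically. The sketch below uses the standard expected mean squares for a balanced two-way random effects model, in which the constants take the values c1 = n and c2 = bn; these specific values are textbook results and are not stated in these notes. The function name and the variance components in the example are hypothetical.

```python
def expected_mean_squares(a, b, n, var_a, var_b, var_ab, var_e):
    """Expected mean squares for the balanced two-way random effects model.

    a, b are the numbers of levels of Factors A and B; n is replicates per cell.
    Assumes the standard constants c1 = n and c2 = b*n (a*n for Factor B).
    """
    return {
        "A": var_e + n * var_ab + b * n * var_a,
        "B": var_e + n * var_ab + a * n * var_b,
        "AB": var_e + n * var_ab,
        "E": var_e,
    }

# Layout of Table 5: a = 4 levels of A, b = 3 levels of B, n = 2 replicates.
# The variance components below are made-up values for illustration.
under_h0 = expected_mean_squares(4, 3, 2, var_a=0.0, var_b=1.0, var_ab=0.5, var_e=1.0)
under_h1 = expected_mean_squares(4, 3, 2, var_a=0.25, var_b=1.0, var_ab=0.5, var_e=1.0)

print("H0 true: ", under_h0["A"] / under_h0["AB"])
print("H0 false:", under_h1["A"] / under_h1["AB"])
```

When V(A) = 0 the ratio of expected mean squares is exactly 1; any positive V(A) pushes it above 1, which is why MS(AB) rather than MS(E) is the denominator for testing Factor A in Model II.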
Montgomery, Douglas C. (1991). Design and Analysis of Experiments. 3rd ed. Wiley, New York.
Street, Elisabeth, & Carroll, Mavis B. (1972). "Preliminary Evaluation of a New Food Product." In Tanur, J., et al. (eds.). Statistics: A Guide to the Unknown. Holden-Day, San Francisco. 3rd ed., Wadsworth-Brooks/Cole, Pacific Grove, Cal., 1989.