UNIVERSITY OF ILLINOIS AT CHICAGO
COLLEGE OF BUSINESS ADMINISTRATION

MBA 503 (Statistics Module for MBA Students):
Data Analysis for Managerial Decisions
PROF. STAN SCLOVE

NOTES ON ANALYSIS OF VARIANCE
These notes Copyright © 1997 Stanley Louis Sclove

(These course notes will sometimes be a summary of what is in the text but often will amplify and extend it.)

INTRODUCTION

The essay "Preliminary Evaluation of a New Food Product" (Street and Carroll 1972) is a very readable introduction to the design of experiments and analysis of variance.

The LBS textbook discusses single-factor ("one-way") ANOVA. Often one needs higher-way ANOVA. How would you assess the effects of Temperature and Pressure during manufacture on the quality of the resulting product? Or, suppose you own ThreadBare, a garment-sewing company. You have three machines which do the same job. They are run by several operators. How do you compare the machines? How do you compare the operators? How would MachCo, the company that makes the machines, answer these questions? How would the union? Why should these three parties view the questions differently? These questions require higher-way ANOVA and different sampling models.

MODELING

LBS considers the case of a single factor ("one-way ANOVA"). Now consider two factors. The response Y is a function of variables X1, X2. The expected value of the response when the inputs are at X1 and X2 is denoted by f(X1,X2), say. Besides being a function of X1 and X2, the response Y contains random variation ("error"), e. This "error" is not really error in the sense of a mistake; it is just the deviation of the response Y from its mean f(X1, X2).
ADDITIVE MODELS
We write Y = f(X1,X2) + e. Such a model is called "additive" because the term e is added on.
LINEAR MODELS
Often f(X1,X2) is taken to be linear, e.g., f(X1,X2) = A + B1X1 + B2X2 .

Another example is

f(X1,X2) = A + B1X1 + B2X2 + B3X1X2,

which is still called "additive" because the term in X1X2, though nonlinear, is still added on.

Combining the preceding, we have, say, Y = A + B1X1 + B2X2 + B3X1X2 + e.

[An example of a nonadditive response function would be f(X1,X2) = A*exp(-B1*X1)*cos(B2*X2).]
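
To see what the product term does, the linear response function with the X1X2 term can be evaluated at a few points. The following is a minimal sketch in Python; the coefficient values and the helper name slope_in_x1 are made up for illustration and do not come from the text.

```python
# Illustrative coefficients (hypothetical values, not from the notes).
A, B1, B2, B3 = 10.0, 2.0, -1.0, 0.5


def f(x1, x2):
    """Mean response f(X1, X2) = A + B1*X1 + B2*X2 + B3*X1*X2."""
    return A + B1 * x1 + B2 * x2 + B3 * x1 * x2


def slope_in_x1(x2):
    """Change in the mean response per unit increase in X1, holding X2 fixed."""
    return f(1.0, x2) - f(0.0, x2)


# With B3 != 0, the effect of X1 depends on the level of X2.
for x2 in (0.0, 1.0, 2.0):
    print(f"X2 = {x2}: effect of one unit of X1 = {slope_in_x1(x2)}")
```

The effect of one unit of X1 works out to B1 + B3*X2, so it changes with X2 whenever B3 is not zero.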

DESIGNED EXPERIMENTS
In designed experiments, X1 and X2 are set at a few convenient or practical values, or at values that prior experience suggests might be good. Let's discuss this in terms of an example where X1 = Temperature, T, and X2 = Pressure, P. Temperature might be set at low, medium and high values T1, T2, T3, and Pressure at low, medium and high values P1, P2, P3 .

This gives nine treatment combinations: {(Ti, Pj), i = 1,2,3; j = 1,2,3}.
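
The set of treatment combinations is simply every level of one factor paired with every level of the other. A short Python sketch of that enumeration (the list names are chosen here for illustration):

```python
from itertools import product

temperatures = ["T1", "T2", "T3"]  # low, medium, high
pressures = ["P1", "P2", "P3"]     # low, medium, high

# Every level of Temperature paired with every level of Pressure.
treatments = list(product(temperatures, pressures))

print(len(treatments))
for t, p in treatments:
    print(t, p)
```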

The response Y is observed n times at each treatment combination; the results at the (i,j)-th treatment combination are denoted by Yijk, k = 1,2,...,n. The value of the response function at (Ti,Pj) is f(Ti,Pj) = µij, say. This function is modeled in terms of the effect of T, the effect of P and their combined effect. This is customarily written as
f(Ti,Pj) = µ + Ai + Bj + (AB)ij .

In this context, X1 and X2 are often called "Factor A" and "Factor B".

DATA ENTRY

It is best to build a database by entering only one observation per record, together with all details about it. See Table 1. This facilitates organizing, summarizing and displaying the data in any number of ways.
TABLE 1.  Stacked Data

# MnPr0701 Dat
# Temp: Lo/Med/Hi; Press: Lo/Med/Hi;  n = 2 reps per cell
#  y  Temp Press
  --- ---- -----
  90.4   1     1
  90.7   1     2
  90.2   1     3
  90.2   1     1
  90.6   1     2
  90.4   1     3
  90.1   2     1
  90.5   2     2
  89.9   2     3
  90.3   2     1
  90.6   2     2
  90.1   2     3
  90.5   3     1
  90.8   3     2
  90.4   3     3
  90.7   3     1
  90.9   3     2
  90.1   3     3
 ----------------
 SOURCE.  Montgomery (1991), Chapter 7, Problem 1
These "stacked" data can then be tabulated and edited to give a table such as Table 2.
TABLE 2.  Data for Montgomery, Problem 1, Chapter 7

                         PRESSURE
                   |    Lo   Med    Hi
             ___________________________
                   |
               Lo  |  90.4   90.7   90.2
                   |  90.2   90.6   90.4
                   |
      TEMP     Med |  90.1   90.5   89.9
                   |  90.3   90.6   90.1
                   |
               Hi  |  90.5   90.8   90.4
                   |  90.7   90.9   90.1
             ______|____________________

             SOURCE.  Table 1
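
One way to go from stacked records to a two-way layout is to group the observations by cell. The sketch below does this in plain Python with the Table 1 values; the variable names (records, cells, labels) are illustrative, and a statistics package would normally do this step for you.

```python
from collections import defaultdict

# Stacked records (y, Temp, Press) copied from Table 1.
records = [
    (90.4, 1, 1), (90.7, 1, 2), (90.2, 1, 3),
    (90.2, 1, 1), (90.6, 1, 2), (90.4, 1, 3),
    (90.1, 2, 1), (90.5, 2, 2), (89.9, 2, 3),
    (90.3, 2, 1), (90.6, 2, 2), (90.1, 2, 3),
    (90.5, 3, 1), (90.8, 3, 2), (90.4, 3, 3),
    (90.7, 3, 1), (90.9, 3, 2), (90.1, 3, 3),
]
labels = {1: "Lo", 2: "Med", 3: "Hi"}

# Group the replicates by cell (Temp, Press), keeping file order.
cells = defaultdict(list)
for y, temp, press in records:
    cells[(temp, press)].append(y)

# Print one row per replicate within each Temperature level.
print("Temp  " + "".join(f"{labels[j]:>7}" for j in (1, 2, 3)))
for i in (1, 2, 3):
    for k in range(2):
        label = labels[i] if k == 0 else ""
        row = "".join(f"{cells[(i, j)][k]:7.1f}" for j in (1, 2, 3))
        print(f"{label:<6}{row}")
```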

Notation:                 PRESSURE
                      j=1    j=2   j=3
                   |    Lo   Med    Hi
             ___________________________
                   |
          i=1  Lo  |  y111   y121   y131
                   |  y112   y122   y132
                   |
     TEMP i=2  Med |  y211   y221   y231
                   |  y212   y222   y232
                   |
          i=3  Hi  |  y311   y321   y331
                   |  y312   y322   y332
             ______|____________________

             SOURCE.  Table 1
The notation is yijk, where i = level of Temperature (1, 2 or 3), j = level of Pressure (1, 2 or 3), and k = replicate (1 or 2). In general, yijk denotes the value for the k-th replicate (k = 1,2,...,n) in the cell for the i-th level of Factor A (i = 1,2,...,a) and the j-th level of Factor B (j = 1,2,...,b).

SUMS AND MEANS
When a subscript is replaced by a plus sign (+), it means summing over the values of that subscript. For example, yij+ denotes the sum of the replicates in the (i,j)-th cell. When the plus is replaced by a dot (.), it denotes the corresponding mean: yij. is the mean of the replicates in the (i,j)-th cell.
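
As an illustration of the plus and dot notation, the cell sums yij+, the cell means yij., the marginal means yi.. and y.j., and the grand mean y... can be computed from the Table 2 values. This is a sketch in plain Python; the dictionary layout and variable names are choices made here, not part of the notes.

```python
from statistics import mean

# y[(i, j)] = replicates in the cell for Temperature i, Pressure j (Table 2).
y = {
    (1, 1): [90.4, 90.2], (1, 2): [90.7, 90.6], (1, 3): [90.2, 90.4],
    (2, 1): [90.1, 90.3], (2, 2): [90.5, 90.6], (2, 3): [89.9, 90.1],
    (3, 1): [90.5, 90.7], (3, 2): [90.8, 90.9], (3, 3): [90.4, 90.1],
}
levels = (1, 2, 3)

cell_sum = {ij: sum(v) for ij, v in y.items()}    # yij+
cell_mean = {ij: mean(v) for ij, v in y.items()}  # yij.

# Marginal means average over the other factor and over the replicates.
temp_mean = {i: mean([x for j in levels for x in y[(i, j)]]) for i in levels}   # yi..
press_mean = {j: mean([x for i in levels for x in y[(i, j)]]) for j in levels}  # y.j.
grand_mean = mean([x for v in y.values() for x in v])                            # y...

for ij in sorted(y):
    print(ij, round(cell_sum[ij], 1), round(cell_mean[ij], 2))
print("grand mean:", round(grand_mean, 4))
```
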
ADVANTAGES OF FACTORIAL DESIGNS
TABLE 3.  One-at-a-time design with two observations per cell

                             FACTOR B
                  |    B1        |     B2       |
              ====|==============|==============|
              A1  |  y     y     |  y     y     |   
                  |   111   112  |   121   122  |      
   FACTOR A   ----------------------------------|
              A2  |  y     y     |              |
                  |   211   212  |              |
              ===================================

TABLE 4.  Factorial design with one observation per cell.

                              FACTOR B
                   |    B1        |     B2       |
                ===|==============|==============|
               A1  |   y          |   y          |
                   |    111       |    121       |
    FACTOR A    ---|--------------|--------------|
               A2  |   y          |   y          |
                   |    211       |    221       |
               ===================================

The estimates of the main effects from Table 4 are based on only 4 observations but are just as precise as those from Table 3, based on 6 observations. The factorial design is thus more efficient. Further, the interaction is estimable from Table 4 but not from Table 3. With Table 4, the effect of B can be estimated for each level of A.
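
The precision claim can be checked by writing each estimate of the A effect as a weighted sum of observations. The sketch below assumes independent observations with a common variance sigma^2 (and, for the two estimates to target the same quantity, no interaction); under those assumptions the variance of a weighted sum is sigma^2 times the sum of the squared weights. The names are illustrative.

```python
from fractions import Fraction

half = Fraction(1, 2)

# Weight of each observation in the estimate of the A effect.
# Table 3 (one-at-a-time): mean of (y211, y212) minus mean of (y111, y112).
one_at_a_time = {"y111": -half, "y112": -half, "y211": half, "y212": half}
# Table 4 (factorial): average of (y211 - y111) and (y221 - y121).
factorial = {"y111": -half, "y121": -half, "y211": half, "y221": half}


def variance_multiplier(weights):
    """Var(sum of w*y) / sigma^2 for independent y's with common variance."""
    return sum(w * w for w in weights.values())


print(variance_multiplier(one_at_a_time))  # in units of sigma^2
print(variance_multiplier(factorial))      # in units of sigma^2
```

Both multipliers are equal, yet the one-at-a-time design needs two further observations (y121, y122) to estimate the B effect, while the factorial design reuses the same four observations for both effects.
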
THE TWO-FACTOR FACTORIAL DESIGN

As discussed above, the observation yijk is written in terms of parts due to the i-th level of Factor A, the j-th level of Factor B and the interaction of the i-th level of A and j-th level of B, and a residual. The decomposition into these parts can be written as follows.

[yijk - y...]   =   [yi.. - y...] + [y.j. - y...] + [yij. - yi.. - y.j. + y...] + [yijk - yij.].

This decomposition also holds for the sums of squares (provided there are equal cell numbers).

SS(Total) = SS(A) + SS(B) + SS(AB) + SS(Error)
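
As a numerical check, the sums of squares can be computed for the Temperature and Pressure data of Table 2 (a = 3, b = 3, n = 2). This is a sketch in plain Python using the usual balanced-design formulas; the variable names are chosen here for illustration.

```python
from statistics import mean

# Cell replicates from Table 2: y[(i, j)], i = Temperature, j = Pressure.
y = {
    (1, 1): [90.4, 90.2], (1, 2): [90.7, 90.6], (1, 3): [90.2, 90.4],
    (2, 1): [90.1, 90.3], (2, 2): [90.5, 90.6], (2, 3): [89.9, 90.1],
    (3, 1): [90.5, 90.7], (3, 2): [90.8, 90.9], (3, 3): [90.4, 90.1],
}
a, b, n = 3, 3, 2
a_levels, b_levels = range(1, a + 1), range(1, b + 1)

grand = mean([x for v in y.values() for x in v])                            # y...
row = {i: mean([x for j in b_levels for x in y[(i, j)]]) for i in a_levels}  # yi..
col = {j: mean([x for i in a_levels for x in y[(i, j)]]) for j in b_levels}  # y.j.
cell = {ij: mean(v) for ij, v in y.items()}                                  # yij.

ss_a = b * n * sum((row[i] - grand) ** 2 for i in a_levels)
ss_b = a * n * sum((col[j] - grand) ** 2 for j in b_levels)
ss_ab = n * sum((cell[(i, j)] - row[i] - col[j] + grand) ** 2
                for i in a_levels for j in b_levels)
ss_e = sum((x - cell[ij]) ** 2 for ij, v in y.items() for x in v)
ss_total = sum((x - grand) ** 2 for v in y.values() for x in v)

print(round(ss_a, 4), round(ss_b, 4), round(ss_ab, 4), round(ss_e, 4))
print(round(ss_total, 4), round(ss_a + ss_b + ss_ab + ss_e, 4))
```

The last line prints SS(Total) next to the sum of the four components, which agree up to rounding because the cell numbers are equal.
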
RANDOM AND MIXED EFFECTS
Your company, ThreadBare, owns four MachCo synthetic thread spinning machines. You employ three machine operators. Each operator will use each machine, and the tear resistance of the resulting thread will be measured. This is a two-factor problem. Factor A is Machines and Factor B is Operators. There are two replicates: Each operator uses each machine twice.

Model I. From ThreadBare's viewpoint, they are stuck with those 4 machines and those 3 operators. Inferences are to be made about those specific 4 machines and those three operators. This is a fixed effects model.

Model II. From the viewpoint of MachCo, who manufactures the machines, the 4 machines are only a sample from their total production, and the three operators are a sample from a population of potential operators. This is a random effects model.

Model III. The company that has the machines may take another viewpoint. Namely, they may regard the operators as a sample from the population of potentially available operators. Then, Operators is a random effect and Machines is a fixed effect: this is a mixed model.

TABLE 5.  ANOVA Table, assuming n = 2 replicates per cell

                                          F-ratios
                             ----------------------------------------
Source  SS      DF    MS         I             II             III
-----------------------------------------------------------------------
A     SS(A)      3   MS(A)   MS(A)/MS(E)   MS(A)/MS(AB)  MS(A)/MS(AB)
B     SS(B)      2   MS(B)   MS(B)/MS(E)   MS(B)/MS(AB)  MS(B)/MS(E)
AB    SS(AB)     6   MS(AB)  MS(AB)/MS(E)  MS(AB)/MS(E)  MS(AB)/MS(E)
Error SS(E)     12   MS(E)
---------------------------------------------------------------------
Total SS(Tot)   23
---------------------------------------------------------------------
Mean  SS(Mean)   1
Grand Total
      SS(GrTot) 24
---------------------------------------------------------------------
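
The degrees of freedom and the choice of denominator in Table 5 can be written out as a small lookup. The sketch below is in Python; the function names and the mean-square values are hypothetical, chosen only to show the arithmetic.

```python
def degrees_of_freedom(a, b, n):
    """Degrees of freedom for a balanced two-factor design with n replicates."""
    return {
        "A": a - 1,
        "B": b - 1,
        "AB": (a - 1) * (b - 1),
        "Error": a * b * (n - 1),
        "Total": a * b * n - 1,
    }


# Denominator mean square for each F-ratio, as laid out in Table 5.
DENOMINATOR = {
    "I":   {"A": "Error", "B": "Error", "AB": "Error"},  # fixed effects
    "II":  {"A": "AB", "B": "AB", "AB": "Error"},        # random effects
    "III": {"A": "AB", "B": "Error", "AB": "Error"},     # A fixed, B random
}


def f_ratios(ms, model):
    """F-ratio for each source under the given model ('I', 'II' or 'III')."""
    return {src: ms[src] / ms[den] for src, den in DENOMINATOR[model].items()}


print(degrees_of_freedom(a=4, b=3, n=2))

# Hypothetical mean squares, not computed from any data in these notes.
ms = {"A": 12.0, "B": 8.0, "AB": 4.0, "Error": 2.0}
for model in ("I", "II", "III"):
    print(model, f_ratios(ms, model))
```

The same mean squares give different F-ratios for A and B depending on which model is adopted, which is why the three parties can reach different conclusions from the same data.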

EXPECTED MEAN SQUARES AND F-RATIOS

The F-ratio for testing a given hypothesis is determined by the expected mean square for the corresponding effects. Let's illustrate with Model II. In this model,

V(y) = V(A) + V(B) + V(AB) + V(e),

where here A, B and AB are random variables representing the A effects, B effects and AB effects. Consider the null hypothesis Ho concerning the Factor A effects, namely that V(A) = 0. In general, E[MS(A)] = V(e) + c1V(AB) + c2V(A), where the c's are constants related to the sample sizes (for a balanced design, c1 = n and c2 = bn). When Ho is true, this is

V(e) + c1V(AB)

It is also true that this equals E[MS(AB)]. So Ho is equivalent to E[MS(A)] = E[MS(AB)], or E[MS(A)]/E[MS(AB)] = 1. When Ho is not true, this ratio is bigger than 1. This ratio is estimated by MS(A)/MS(AB), the F-ratio for this test. The hypothesis Ho is rejected when this ratio is large.
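
The argument can be illustrated by simulation. The sketch below, in plain Python, generates data under Model II with normally distributed effects (an assumption made here for convenience), using a = 4, b = 3, n = 2 and unit variances for B, AB and e. It then compares the average of MS(A) with the average of MS(AB) over many simulated data sets. The function names, seed and number of repetitions are arbitrary choices.

```python
import random


def avg(values):
    values = list(values)
    return sum(values) / len(values)


def simulate_ms(a, b, n, var_a, var_b, var_ab, var_e, rng):
    """Return (MS(A), MS(AB)) for one simulated data set under Model II."""
    eff_a = [rng.gauss(0, var_a ** 0.5) for _ in range(a)]
    eff_b = [rng.gauss(0, var_b ** 0.5) for _ in range(b)]
    cell = {}
    for i in range(a):
        for j in range(b):
            eff_ab = rng.gauss(0, var_ab ** 0.5)
            obs = [eff_a[i] + eff_b[j] + eff_ab + rng.gauss(0, var_e ** 0.5)
                   for _ in range(n)]
            cell[(i, j)] = avg(obs)  # cell mean yij.
    row = [avg(cell[(i, j)] for j in range(b)) for i in range(a)]  # yi..
    col = [avg(cell[(i, j)] for i in range(a)) for j in range(b)]  # y.j.
    grand = avg(row)                                               # y...
    ms_a = b * n * sum((r - grand) ** 2 for r in row) / (a - 1)
    ms_ab = n * sum((cell[(i, j)] - row[i] - col[j] + grand) ** 2
                    for i in range(a) for j in range(b)) / ((a - 1) * (b - 1))
    return ms_a, ms_ab


def mean_square_ratio(var_a, reps=2000, seed=1):
    """Average MS(A) divided by average MS(AB) over simulated data sets."""
    rng = random.Random(seed)
    sims = [simulate_ms(4, 3, 2, var_a, 1.0, 1.0, 1.0, rng) for _ in range(reps)]
    return avg(s[0] for s in sims) / avg(s[1] for s in sims)


print(mean_square_ratio(var_a=0.0))  # near 1 when V(A) = 0
print(mean_square_ratio(var_a=1.0))  # clearly above 1 when V(A) > 0
```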

REFERENCES

Montgomery, Douglas C. (1991). Design and Analysis of Experiments. 3rd ed. Wiley, New York.

Street, Elisabeth, & Carroll, Mavis B. (1972). "Preliminary Evaluation of a New Food Product." In Tanur, J., et al. (eds.).   Statistics: A Guide to the Unknown. Holden-Day, San Francisco.   3rd ed., Wadsworth-Brooks/Cole, Pacific Grove, Cal., 1989.


latest revision   30-July-97