UIC University of Illinois at Chicago
College of Business Administration
Department of Information & Decision Sciences


IDS 470     Multivariate Statistical Analysis
Instructor               Sclove
Textbook   Hair et al., 5th ed.


Notes on Chapter 6     Multivariate Analysis of Variance
Part B Section-by-Section Commentary

Required: Only Sections 6.1-4 (pp. 326-339) and Repeated Measures (p. 347).

HyperTable of Contents

6.0.   LEARNING OBJECTIVES . CHAPTER PREVIEW . KEY TERMS
6.1.   What is Multivariate Analysis of Variance?
6.1.1. Univariate Procedures for Assessing Group Differences
6.1.2. Multivariate Analysis of Variance (MANOVA)
6.2.   Differences between MANOVA and Discriminant Analysis   p. 336
6.3.   A Hypothetical Illustration of MANOVA   p. 336
6.4.   When Should We Use MANOVA?   p. 339
6.4.1. Control of Experimentwide Error Rate
6.4.2. Differences Among a Combination of Dependent Variables
6.5.   A Decision Process for MANOVA
6.6.   Stage One: Objectives of MANOVA
6.6.1. Types of Multivariate Questions Suitable for MANOVA
6.6.2. Selecting the Dependent Measures
6.7.   Stage Two: Issues in the Research Design of MANOVA
6.7.1. Sample Size Requirements--Overall and by Group
6.7.2. Factorial Designs--Two or More Treatments
6.7.3. Using Covariates--ANCOVA and MANCOVA
6.7.4. A Special Case of MANOVA: Repeated Measures
6.8.   Stage Three: Assumptions of ANOVA and MANOVA
6.8.1. Independence
6.8.2. Equality of Variance-Covariance Matrices
6.8.3. Normality
6.8.4. Linearity and Multicollinearity Among the Dependent Variables
6.8.5. Sensitivity to Outliers
6.9.   Stage Four: Estimation of the MANOVA Model and Assessing Overall Fit
6.9.1. Criteria for Significance Testing
6.9.2. Statistical Power of the Multivariate Tests
6.10.   Stage Five: Interpretation of the MANOVA Results
6.10.1. Evaluating Covariates
6.10.2. Assessing the Dependent Variate
6.10.3. Identifying Differences Between Individual Groups
6.11. Stage Six: Validation of the Results
6.12. Summary
6.13. Example 1: Difference Between Two Independent Groups
6.13.1. A Univariate Approach: The t test
6.13.2. A Multivariate Approach: Hotelling's T-square
6.14. Example 2: Difference Between k Independent Groups
6.14.1. A Univariate Approach: k-Groups ANOVA
6.14.2. A Multivariate Approach: k-Groups MANOVA
6.15. Example 3: A Factorial Design for MANOVA with Two Independent Variables
6.15.1. Stage One: Objectives of the MANOVA
6.15.2. Stage Two: Research Design of the MANOVA
6.15.3. Stage Three: Assumptions in MANOVA
6.15.4. Stage Four: Estimation of the MANOVA Model and Assessing Overall Fit
6.15.5. Stage Five: Interpretation of the Results
6.16. Summary . Questions . References

Addendum (not in Textbook):   CART/AID


6.0.   LEARNING OBJECTIVES . CHAPTER PREVIEW . KEY TERMS

6.1.   What is Multivariate Analysis of Variance?

6.1.1. Univariate Procedures for Assessing Group Differences

6.1.1.1. The t Test

The simplest ANOVA situation is that of the two-sample problem. It can be analyzed by a t test.

6.1.1.2. Analysis of Variance
When there are more than two groups, an F test is used.

6.1.2. Multivariate Analysis of Variance

In the multivariate situation the response variable is a vector of m dependent variables Y1, Y2, . . . , Ym . The hypothesis tested is that of equality of vectors of group means.
6.1.2.1. The Two-Group Case:   Hotelling's T-square
One-way layout (single-factor ANOVA). With two groups, the multivariate case is handled by a test statistic called Hotelling's T-square. Directly analogous to the square of the univariate two-sample t, it is the statistical (Mahalanobis) D-squared between the two sample mean vectors, in the metric of the covariance matrix of their difference. Hotelling's T-square is in fact the square of the two-sample t for the most significant variate, i.e., the most significant linear combination of the variables.
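
The relation between T-square and the "most significant linear combination" can be checked numerically. The following is a minimal sketch, assuming NumPy is available and a pooled covariance matrix is appropriate; the function names and the simulated data are illustrative assumptions, not part of the text.

```python
import numpy as np

def hotelling_t2(x1, x2):
    """Two-sample Hotelling's T-square using a pooled covariance matrix."""
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    n1, n2 = len(x1), len(x2)
    d = x1.mean(axis=0) - x2.mean(axis=0)      # difference of the mean vectors
    # Pooled within-group covariance matrix, n1 + n2 - 2 d.f.
    s_pooled = ((n1 - 1) * np.cov(x1, rowvar=False)
                + (n2 - 1) * np.cov(x2, rowvar=False)) / (n1 + n2 - 2)
    # Covariance matrix of the difference of the two mean vectors
    s_diff = s_pooled * (1.0 / n1 + 1.0 / n2)
    a = np.linalg.solve(s_diff, d)             # weights of the most significant variate
    t2 = float(d @ a)                          # D-squared in the metric of s_diff
    return t2, a

def pooled_t(y1, y2):
    """Ordinary univariate two-sample t statistic with pooled variance."""
    n1, n2 = len(y1), len(y2)
    sp2 = ((n1 - 1) * y1.var(ddof=1) + (n2 - 1) * y2.var(ddof=1)) / (n1 + n2 - 2)
    return (y1.mean() - y2.mean()) / np.sqrt(sp2 * (1.0 / n1 + 1.0 / n2))

# Simulated (hypothetical) data: two groups, two dependent variables
rng = np.random.default_rng(0)
g1 = rng.normal(loc=[0.0, 0.0], size=(12, 2))
g2 = rng.normal(loc=[1.0, 0.5], size=(15, 2))

t2, a = hotelling_t2(g1, g2)
t_best = pooled_t(g1 @ a, g2 @ a)              # t for the linear combination a'x
t_first = pooled_t(g1[:, 0], g2[:, 0])         # t for the first variable alone
```

Here `t_best**2` agrees with `t2` up to rounding, while the squared t for any single variable (such as `t_first**2`) cannot exceed it.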

6.1.2.2. The k-Group Case:   MANOVA

Factorial designs. MANOVA is applied in any of the experimental designs encountered in univariate statistics; for a review see Against all Odds #12 (Design of Experiments) and #13 (Blocking & Sampling).

ANCOVA/MANCOVA. This is a combination of ANOVA/MANOVA and Multiple Regression.

Repeated measures. This design is used when the individuals (subjects, patients, cases, firms) are observed on several successive occasions. The results on any given individual are correlated, and this must be taken into account.

6.2.   Differences between MANOVA and Discriminant Analysis   p. 336

Before Multiple Discriminant Analysis (MDA) is applied (see Ch. 5), MANOVA is done as a preliminary test to see if there are actually differences between/among the groups.

6.3.   A Hypothetical Illustration of MANOVA

6.4.   When Should We Use MANOVA?

6.4.1. Control of Experimentwide Error Rate

6.4.2. Differences Among a Combination of Dependent Variables


Required: Only Sections 6.1-4 (pp. 326-339) and Repeated Measures (p. 347)

6.5.   A Decision Process for MANOVA

6.6.   Stage One: Objectives of MANOVA

6.6.1. Types of Multivariate Questions Suitable for MANOVA

6.6.2. Selecting the Dependent Measures

6.7.   Stage Two: Issues in the Research Design of MANOVA

6.7.1. Sample Size Requirements--Overall and by Group

6.7.2. Factorial Designs--Two or More Treatments

6.7.3. Using Covariates--ANCOVA and MANCOVA


Minitab commands

MTB > ANCOva Y1 Y2 = A B A*B;
SUBC > COVAriates X1 X2.
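
Since ANCOVA combines ANOVA with regression, it can be fitted as a regression on indicator variables for the factor plus the covariates. The sketch below is deliberately simpler than the Minitab command above: it assumes one response, one two-level factor, and one covariate, and the data are hypothetical and noise-free so that the fit recovers the coefficients exactly. NumPy is assumed to be available.

```python
import numpy as np

# Hypothetical data: two-level factor a (coded 0/1), covariate x, response y.
a = np.array([0, 0, 0, 0, 1, 1, 1, 1], dtype=float)
x = np.array([1.0, 2.0, 3.0, 4.0, 2.0, 3.0, 4.0, 5.0])
y = 2.0 + 1.5 * a + 0.8 * x        # noise-free by construction

# Design matrix: intercept, factor indicator (ANOVA part), covariate (regression part)
design = np.column_stack([np.ones_like(x), a, x])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
intercept, group_effect, slope = coef

# The unadjusted difference of group means mixes the factor effect
# with the difference in covariate levels between the groups.
raw_diff = y[a == 1].mean() - y[a == 0].mean()
```

The covariate-adjusted group effect is 1.5, whereas the raw difference in means is 2.3, because the second group also has higher covariate values.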

6.7.4. A Special Case of MANOVA: Repeated Measures   p. 347

"Repeated Measures" occur when we have observed individuals on separate occasions, resulting in a time series of observations for each. In the HATCO data, variables 9 and 10 are Usage level and Satisfaction level. They might be assessed at, say, three different times, maybe 6 months apart. This gives a vector of six variables. This response vector might be analyzed as a function of the categorical variables 8 (size of firm) and 13 (type of industry). This would be a two-way ANOVA, where the response is the vector of six variables.

6.8.   Stage Three: Assumptions of ANOVA and MANOVA

6.8.1. Independence

6.8.2. Equality of Variance-Covariance Matrices

6.8.3. Normality

6.8.4. Linearity and Multicollinearity Among the Dependent Variables

6.8.5. Sensitivity to Outliers

6.9.   Stage Four: Estimation of the MANOVA Model and Assessing Overall Fit

6.9.1. Criteria for Significance Testing

The F criterion for univariate ANOVA is proportional to A/W, where W = within-groups sum of squares and A = among-groups sum of squares. Analogously, the test criteria for MANOVA are measures of the size of the matrix A W^-1 (A times the inverse of W), where W = the within-groups sum of squares and cross-products matrix and A = the among-groups sum of squares and cross-products matrix. These measures of matrix size are functions of the characteristic roots (eigenvalues) of the matrix.
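
As an illustration, the four criteria most commonly reported by statistical packages (Wilks' lambda, Pillai's trace, the Hotelling-Lawley trace, and Roy's largest root) can all be computed from the characteristic roots. The sketch below assumes NumPy; the function names and the simulated data are assumptions made for illustration only.

```python
import numpy as np

def sscp_matrices(groups):
    """Return (A, W): among-groups and within-groups SSCP matrices."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    grand_mean = np.vstack(groups).mean(axis=0)
    p = grand_mean.size
    A = np.zeros((p, p))
    W = np.zeros((p, p))
    for g in groups:
        dev = (g.mean(axis=0) - grand_mean)[:, None]
        A += len(g) * (dev @ dev.T)            # group mean vs. grand mean
        centered = g - g.mean(axis=0)
        W += centered.T @ centered             # observations vs. group mean
    return A, W

def manova_criteria(A, W):
    # Roots of W^-1 A, which are the same as the roots of A W^-1
    roots = np.linalg.eigvals(np.linalg.solve(W, A)).real
    roots = np.clip(roots, 0.0, None)          # guard against tiny negative rounding
    return {
        "wilks_lambda": float(np.prod(1.0 / (1.0 + roots))),
        "pillai_trace": float(np.sum(roots / (1.0 + roots))),
        "hotelling_lawley_trace": float(np.sum(roots)),
        # Some packages report max_root / (1 + max_root) instead
        "roy_largest_root": float(np.max(roots)),
    }

# Simulated (hypothetical) data: k = 3 groups, p = 2 dependent variables
rng = np.random.default_rng(1)
groups = [rng.normal(loc=m, size=(10, 2)) for m in ([0, 0], [1, 0], [0, 1])]
A, W = sscp_matrices(groups)
crit = manova_criteria(A, W)
```

Wilks' lambda computed this way equals det(W) / det(W + A), which gives a quick check on the roots.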

Note that elsewhere in the text the number of dependent variables is denoted by m (rather than p as here) and the number of groups by NG (rather than k as here).

6.9.2. Statistical Power of the Multivariate Tests

6.10.   Stage Five: Interpretation of the MANOVA Results

6.10.1. Evaluating Covariates

6.10.2. Assessing the Dependent Variate

6.10.3. Identifying Differences Between Individual Groups

6.11. Stage Six: Validation of the Results

6.12. Summary

6.13. Example 1: Difference Between Two Independent Groups

6.13.1. A Univariate Approach: The t test

6.13.2. A Multivariate Approach: Hotelling's T-square

6.14. Example 2: Difference Between k Independent Groups

6.14.1. A Univariate Approach: k-Groups ANOVA

6.14.2. A Multivariate Approach: k-Groups MANOVA

6.14.2.1. FROM UNIVARIATE TO MULTIVARIATE
In the univariate case, there is a decomposition of each observation into the group mean and the deviation from it, and a corresponding decomposition of the sums of squares. There is a directly analogous decomposition in the multivariate case.
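
Written out, the multivariate decomposition takes the following form. The notation is an assumption introduced here: x_ij is the observation vector for individual j in group i, n_i is the size of group i, and T is the total sum of squares and cross-products matrix; A, W, and k are as used elsewhere in these notes.

```latex
\[
x_{ij} - \bar{x} \;=\; (\bar{x}_i - \bar{x}) \;+\; (x_{ij} - \bar{x}_i)
\]
\[
\underbrace{\sum_{i=1}^{k}\sum_{j=1}^{n_i} (x_{ij}-\bar{x})(x_{ij}-\bar{x})'}_{T}
\;=\;
\underbrace{\sum_{i=1}^{k} n_i\,(\bar{x}_i-\bar{x})(\bar{x}_i-\bar{x})'}_{A}
\;+\;
\underbrace{\sum_{i=1}^{k}\sum_{j=1}^{n_i} (x_{ij}-\bar{x}_i)(x_{ij}-\bar{x}_i)'}_{W}
\]
```

The cross-product terms vanish because the deviations within each group sum to the zero vector. When there is a single dependent variable, each matrix reduces to the familiar scalar sum of squares.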
6.14.2.2. FROM ONE-WAY TO HIGHER-WAY ANOVA (not in text)
Different designs are generated by different relationships among the groups.
6.14.2.2.1. Factorial Designs
Suppose there are two factors, A and B, A at two levels, and B at three. Then there are six groups, with the following structure.


Group:        1     2    3   4   5   6
Level of A:   1     1    1   2   2   2
Level of B:   1     2    3   1   2   3

The 6-1 = 5 d.f. for Groups are broken down as follows.

Source  d.f.
------  ----
Groups   5
     A     1
     B     2
   A*B     2
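
The breakdown generalizes to any two-factor crossed design. A small sketch follows; the function name is hypothetical.

```python
def factorial_df(levels_a, levels_b):
    """Degrees of freedom for a two-factor crossed (factorial) design."""
    df_a = levels_a - 1
    df_b = levels_b - 1
    df_ab = df_a * df_b                    # interaction
    df_groups = levels_a * levels_b - 1    # one group per combination of levels
    return {"Groups": df_groups, "A": df_a, "B": df_b, "A*B": df_ab}

df = factorial_df(2, 3)
print(df)   # {'Groups': 5, 'A': 1, 'B': 2, 'A*B': 2}
```

The d.f. for A, B, and A*B always add up to the d.f. for Groups.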

Minitab commands

Versions 10 and later of MINITAB have a MANOVA subcommand in the ANOVA command.

MTB > ANOVA Y1 Y2 = A B A*B;
SUBC > MANOVA.
6.14.2.2.2. Hierarchical (nested) design
Suppose a survey is taken at 2 schools and at 3 classrooms in each.

Group:        1     2    3   4   5   6
School:       1     1    1   2   2   2
Classroom:    1     2    3   4   5   6

The 6-1 = 5 d.f. for Groups are broken down as follows.

Source  d.f.
------  ----
Groups   5
   Between 2 Schools  1
   Between Classrooms
      within Schools  4

Minitab commands

MTB > ANOVA Y1 Y2 = S C(S);
SUBC > MANOVA.
You then try to see how much the schools differ, and how much individual classrooms within a given school differ from one another. You see which effects are larger, those between schools, or those between classrooms within a given school. Then you could tell planners of a future study how many schools to visit, relative to how many classrooms within a school. E.g., maybe it would be better to have visited 3 schools, and only 2 classrooms within each.
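
The comparison of school effects with classroom-within-school effects can be sketched for a single response variable as follows. This is a univariate illustration only, with simulated (hypothetical) data and five students per classroom assumed; NumPy is assumed to be available.

```python
import numpy as np

rng = np.random.default_rng(2)
n_per_class = 5                                       # assumed class size
school = np.repeat([1, 1, 1, 2, 2, 2], n_per_class)
classroom = np.repeat([1, 2, 3, 4, 5, 6], n_per_class)
y = rng.normal(size=school.size) + 0.5 * school       # hypothetical responses

grand = y.mean()
ss_schools = 0.0        # between schools
ss_class_within = 0.0   # between classrooms within schools
ss_groups = 0.0         # among all six groups
for s in np.unique(school):
    in_school = school == s
    school_mean = y[in_school].mean()
    ss_schools += in_school.sum() * (school_mean - grand) ** 2
    for c in np.unique(classroom[in_school]):
        in_class = classroom == c
        class_mean = y[in_class].mean()
        ss_class_within += in_class.sum() * (class_mean - school_mean) ** 2
        ss_groups += in_class.sum() * (class_mean - grand) ** 2

df_schools = 2 - 1
df_class_within = 2 * (3 - 1)
ms_schools = ss_schools / df_schools
ms_class_within = ss_class_within / df_class_within
```

The two sums of squares add up to the Groups sum of squares, just as 1 + 4 = 5 for the d.f. Comparing `ms_schools` with `ms_class_within` indicates which level contributes more variation.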

6.15. Example 3: A Factorial Design for MANOVA with Two Independent Variables

6.15.1. Stage One: Objectives of the MANOVA

6.15.2. Stage Two: Research Design of the MANOVA

6.15.3. Stage Three: Assumptions in MANOVA

6.15.4. Stage Four: Estimation of the MANOVA Model and Assessing Overall Fit

6.15.5. Stage Five: Interpretation of the Results

6.16. Summary . Questions . References

Problems for Chapter 6

1. Contrasts in Repeated Measures MANOVA:

Usage Level and Satisfaction Level were obtained from a sample of 10 of HATCO's customers in 1995, again in 1996, and again in 1997. Satisfaction is on a scale of 0 to 10, and Usage Level, being the percentage of business given to HATCO, is on a scale of 0 to 100 percent.

You will retrieve the data from the Web (see below) and enter the data in columns A-F of an Excel worksheet (alternatively, you may use a statistical computer package).

   Satisfaction            Usage
 ------------------   ------------------
  1995  1996  1997     1995  1996  1997
    A     B     C        D     E     F
For both Satisfaction and Usage Level, form the contrasts

initial level vs. average level after 1995
and
1996 vs. 1997.

More specifically, create four columns as follows:

    G              H            I            J
  -----------     ---      -----------      ---
  (B+C)/2 - A     C-B      (E+F)/2 - D      F-E
 

Using Excel or a statistical computer package, do t-tests of H0: µ = 0 for G, H, I, and J.
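
For those using a programming language rather than Excel, the contrast columns and t statistics can be formed as in the following sketch, which uses only the Python standard library. The scores shown are made up for illustration and are NOT the HATCO data; only the Satisfaction contrasts (G and H) are shown, since I and J are formed the same way from columns D-F.

```python
import math
import statistics

def one_sample_t(values, mu0=0.0):
    """t statistic for H0: mean = mu0, with n - 1 degrees of freedom."""
    n = len(values)
    mean = statistics.fmean(values)
    se = statistics.stdev(values) / math.sqrt(n)
    return (mean - mu0) / se

# Hypothetical satisfaction scores for 10 customers (not the HATCO data)
sat_1995 = [5, 6, 4, 7, 5, 6, 5, 4, 6, 5]   # column A
sat_1996 = [6, 7, 5, 7, 6, 7, 6, 5, 7, 6]   # column B
sat_1997 = [6, 8, 5, 8, 6, 7, 7, 5, 7, 7]   # column C

# Column G: average level after 1995 minus initial level
col_g = [(b + c) / 2 - a for a, b, c in zip(sat_1995, sat_1996, sat_1997)]
# Column H: 1997 minus 1996
col_h = [c - b for b, c in zip(sat_1996, sat_1997)]

t_g = one_sample_t(col_g)
t_h = one_sample_t(col_h)

T_CRIT = 2.262   # two-sided 5% critical value of t with 9 d.f.
significant_g = abs(t_g) > T_CRIT
significant_h = abs(t_h) > T_CRIT
```

The sign of each t statistic gives the direction of the change.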

Hand in your printout with a written summary of the results, including answers to the following questions.

Question S1 (answer using G): Was there a significant change in Satisfaction after 1995? If so, which direction was it?

Question S2 (answer using H): Was there a significant change in Satisfaction between 1996 and 1997? If so, which direction was it?

Question U1 (answer using I): Was there a significant change in Usage Level after 1995? If so, which direction was it?

Question U2 (answer using J): Was there a significant change in Usage Level between 1996 and 1997? If so, which direction was it?

The data are provided on the Web in two versions: an Excel version (for browsers configured to load into Excel) and an ASCII version.


Addendum (not in Textbook):   CART/AID

CART (Classification and Regression Trees) and the earlier, closely related AID (Automatic Interaction Detection): These methods split the sample according to values of the explanatory variables.

AID is used to find combinations of levels of factors that relate to a value of the dependent variable. E.g., it could be used to find sets of consumers who are most likely to buy a particular product that is being developed.

The dependent variable can be on any scale, dichotomous, categorical, or metric. The explanatory variables may also be dichotomous, categorical or metric.

When the dependent variable is numerical, the splitting can be done on the basis of a t- or F-test.

AID separates units of the initial group into two subgroups contingent upon the value of one of the predictors. All possible splits of this type are considered and the one which best separates the data into groups homogeneous in class is chosen. A chi-square or F statistic is used to measure the separation. This process then continues recursively (hence the name "recursive partitioning").
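
The splitting step can be sketched as follows for a numerical dependent variable and a single numerical predictor. This is an illustration under simplifying assumptions, not the algorithm of any particular package: a real AID/CART program searches over all predictors and stops splitting when the separation is no longer significant, whereas this sketch stops at a fixed depth. The function names, the stopping parameters, and the data are hypothetical.

```python
def best_binary_split(x, y):
    """Return (threshold, F) for the split x <= threshold that maximizes
    the two-group F statistic (the square of the pooled two-sample t)."""
    n = len(y)
    grand = sum(y) / n
    best = None
    for threshold in sorted(set(x))[:-1]:      # every split leaving both sides non-empty
        left = [yi for xi, yi in zip(x, y) if xi <= threshold]
        right = [yi for xi, yi in zip(x, y) if xi > threshold]
        mean_l = sum(left) / len(left)
        mean_r = sum(right) / len(right)
        ss_between = (len(left) * (mean_l - grand) ** 2
                      + len(right) * (mean_r - grand) ** 2)
        ss_within = (sum((v - mean_l) ** 2 for v in left)
                     + sum((v - mean_r) ** 2 for v in right))
        f_stat = float("inf") if ss_within == 0 else ss_between / (ss_within / (n - 2))
        if best is None or f_stat > best[1]:
            best = (threshold, f_stat)
    return best

def grow(x, y, depth=0, max_depth=2, min_size=4):
    """Recursive partitioning: split, then split each subgroup again."""
    if depth >= max_depth or len(y) < min_size or len(set(x)) < 2:
        return {"n": len(y), "mean": sum(y) / len(y)}      # terminal group
    threshold, f_stat = best_binary_split(x, y)
    left = [(xi, yi) for xi, yi in zip(x, y) if xi <= threshold]
    right = [(xi, yi) for xi, yi in zip(x, y) if xi > threshold]
    return {
        "threshold": threshold,
        "F": f_stat,
        "left": grow([p[0] for p in left], [p[1] for p in left],
                     depth + 1, max_depth, min_size),
        "right": grow([p[0] for p in right], [p[1] for p in right],
                      depth + 1, max_depth, min_size),
    }

# Hypothetical data: the response jumps when the predictor exceeds 4
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [1.0, 1.2, 0.9, 1.1, 3.0, 3.2, 2.9, 3.1]
threshold, f_stat = best_binary_split(x, y)
tree = grow(x, y)
```

For these data the first split falls at the threshold 4, separating the low-response units from the high-response units.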

Nature of the Dependent Variable

Software for AID includes SPSS's CHAID (AID using chi-square for a categorical dependent variable).

When there are multiple dependent variables, in theory AID could still be used. The F of ANOVA would be replaced by a suitable MANOVA test statistic.


Created: 1998 Sept 9. Latest revision: 2005 Oct 18.