UIC | University of Illinois at Chicago
College of Business Administration
Department of Information & Decision Sciences
IDS 470 | Multivariate Statistical Analysis
Instructor | Sclove
Textbook | Hair et al., 5th ed.
Notes on Chapter 6 | Multivariate Analysis of Variance
Part B | Section-by-Section Commentary
Required:
Only Sections 6.1-4 (pp. 326-339) and Repeated Measures (p. 347).
HyperTable of Contents
- 6.0. LEARNING OBJECTIVES . CHAPTER PREVIEW . KEY TERMS
- 6.1. What is Multivariate Analysis of Variance?
- 6.1.1. Univariate Procedures for Assessing Group Differences
- 6.1.2. Multivariate Analysis of Variance (MANOVA)
- 6.2. Differences between MANOVA and Discriminant Analysis p. 336
- 6.3. A Hypothetical Illustration of MANOVA p. 336
- 6.4. When Should We Use MANOVA? p. 339
- 6.4.1. Control of Experimentwide Error Rate
- 6.4.2. Differences Among a Combination of Dependent Variables
- 6.5. A Decision Process for MANOVA
- 6.6. Stage One: Objectives of MANOVA
- 6.6.1. Types of Multivariate Questions Suitable for MANOVA
- 6.6.2. Selecting the Dependent Measures
- 6.7. Stage Two: Issues in the Research Design of MANOVA
- 6.7.1. Sample Size Requirements--Overall and by Group
- 6.7.2. Factorial Designs--Two or More Treatments
- 6.7.3. Using Covariates--ANCOVA and MANCOVA
- 6.7.4. A Special Case of MANOVA: Repeated Measures
- 6.8. Stage Three: Assumptions of ANOVA and MANOVA
- 6.8.1. Independence
- 6.8.2. Equality of Variance-Covariance Matrices
- 6.8.3. Normality
- 6.8.4. Linearity and Multicollinearity Among the Dependent Variables
- 6.8.5. Sensitivity to Outliers
- 6.9. Stage Four: Estimation of the MANOVA Model and Assessing Overall Fit
- 6.9.1. Criteria for Significance Testing
- 6.9.2. Statistical Power of the Multivariate Tests
- 6.10. Stage Five: Interpretation of the MANOVA Results
- 6.10.1. Evaluating Covariates
- 6.10.2. Assessing the Dependent Variate
- 6.10.3. Identifying Differences Between Individual Groups
- 6.11. Stage Six: Validation of the Results
- 6.12. Summary
- 6.13. Example 1: Difference Between Two Independent Groups
- 6.13.1. A Univariate Approach: The t test
- 6.13.2. A Multivariate Approach: Hotelling's T-square
- 6.14. Example 2: Difference Between k Independent Groups
- 6.14.1. A Univariate Approach: k-Groups ANOVA
- 6.14.2. A Multivariate Approach: k-Groups MANOVA
- 6.15. Example 3: A Factorial Design for MANOVA with Two Independent Variables
- 6.15.1. Stage One: Objectives of the MANOVA
- 6.15.2. Stage Two: Research Design of the MANOVA
- 6.15.3. Stage Three: Assumptions in MANOVA
- 6.15.4. Stage Four: Estimation of the MANOVA Model and Assessing Overall Fit
- 6.15.5. Stage Five: Interpretation of the Results
- 6.16. Summary . Questions . References
- Addendum (not in Textbook): CART/AID
6.0. LEARNING OBJECTIVES . CHAPTER PREVIEW . KEY TERMS
6.1. What is Multivariate Analysis of Variance?
6.1.1. Univariate Procedures for Assessing Group Differences
6.1.1.1. The t Test
The simplest ANOVA situation is that of the two-sample problem.
It can be analyzed by a t test.
6.1.1.2. Analysis of Variance
When there are more than two groups, an F test is used.
6.1.2. Multivariate Analysis of Variance
In the multivariate situation the response variable is a vector of m dependent variables Y1, Y2, . . . , Ym .
The hypothesis tested is that of equality of vectors of group means.
6.1.2.1. The Two-Group Case: Hotelling's T-square
One-way layout (single-factor ANOVA) with two groups. The multivariate case is handled by a test statistic called Hotelling's T-square.
Directly analogous to the univariate two-sample t, it is the statistical (Mahalanobis) distance D-squared between the two sample mean vectors, in the metric of the estimated covariance matrix of their difference.
Hotelling's T-square is in fact the square of the two-sample t for the most significant variate, i.e., the linear combination of the variables that gives the largest squared t.
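The two descriptions of T-square above can be checked numerically. The following is a minimal sketch, not part of the textbook: it assumes NumPy, a pooled covariance matrix (equal covariance matrices in the two groups), and simulated data. The function names hotelling_t2 and t_squared are illustrative only.

```python
import numpy as np

def hotelling_t2(x1, x2):
    """Two-sample Hotelling's T-square; rows are cases, columns are variables."""
    n1, n2 = len(x1), len(x2)
    d = x1.mean(axis=0) - x2.mean(axis=0)        # difference of the mean vectors
    # Pooled within-group covariance matrix (assumes equal covariance matrices).
    s_pooled = ((n1 - 1) * np.cov(x1, rowvar=False)
                + (n2 - 1) * np.cov(x2, rowvar=False)) / (n1 + n2 - 2)
    s_diff = s_pooled * (1.0 / n1 + 1.0 / n2)    # covariance matrix of the difference
    a = np.linalg.solve(s_diff, d)               # weights of the most significant variate
    return float(d @ a), a                       # T-square = d' S_diff^(-1) d

def t_squared(y1, y2):
    """Square of the pooled-variance two-sample t for a single variable."""
    n1, n2 = len(y1), len(y2)
    sp2 = ((n1 - 1) * y1.var(ddof=1) + (n2 - 1) * y2.var(ddof=1)) / (n1 + n2 - 2)
    return float((y1.mean() - y2.mean()) ** 2 / (sp2 * (1.0 / n1 + 1.0 / n2)))

# Simulated data, for illustration only (not from the textbook).
rng = np.random.default_rng(0)
x1 = rng.normal(size=(12, 3))
x2 = rng.normal(size=(15, 3)) + np.array([0.5, 0.0, -0.5])

t2, a = hotelling_t2(x1, x2)
t2_best_variate = t_squared(x1 @ a, x2 @ a)      # t-square of the combination a'x
t2_single = [t_squared(x1[:, j], x2[:, j]) for j in range(x1.shape[1])]
print(t2, t2_best_variate, t2_single)
```

In this sketch the squared t of the combination a'x reproduces T-square, and no single variable's squared t exceeds it.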
6.1.2.2. The k-Group Case: MANOVA
Factorial designs. MANOVA is applied in any of the experimental designs
encountered in univariate statistics; for a review see Against all Odds #12
(Design of Experiments) and #13 (Blocking & Sampling).
ANCOVA/MANCOVA. This is a combination of ANOVA/MANOVA and Multiple
Regression.
Repeated measures. This design is used when the individuals (subjects, patients, cases, firms) are observed on several successive occasions. The results on any given
individual are correlated, and this must be taken into account.
6.2. Differences between MANOVA and Discriminant Analysis p. 336
Before Multiple Discriminant Analysis (MDA) is applied (see Ch. 5), MANOVA is done as a preliminary test to see if there are actually differences between/among the groups.
6.3. A Hypothetical Illustration of MANOVA
6.4. When Should We Use MANOVA?
6.4.1. Control of Experimentwide Error Rate
6.4.2. Differences Among a Combination of Dependent Variables
6.5. A Decision Process for MANOVA
6.6. Stage One: Objectives of MANOVA
6.6.1. Types of Multivariate Questions Suitable for MANOVA
6.6.2. Selecting the Dependent Measures
6.7. Stage Two: Issues in the Research Design of MANOVA
6.7.1. Sample Size Requirements--Overall and by Group
6.7.2. Factorial Designs--Two or More Treatments
6.7.3. Using Covariates--ANCOVA and MANCOVA
Minitab commands
MTB > ANCOva Y1 Y2 = A B A*B;
SUBC > COVAriates X1 X2.
6.7.4. A Special Case of MANOVA: Repeated Measures p. 347
"Repeated Measures" occur when we have observed the same individuals on separate occasions, resulting in a short time series of observations for each. In the HATCO data, variables 9 and 10 are Usage level and Satisfaction level. They might be assessed at, say, three different times, maybe 6 months apart. This gives a vector of six variables (two measures at each of three times). This response vector might be analyzed as a function of the categorical variables 8 (size of firm) and 13 (type of industry). This would be a two-way MANOVA, since the response is the vector of six variables.
6.8. Stage Three: Assumptions of ANOVA and MANOVA
6.8.1. Independence
6.8.2. Equality of Variance-Covariance Matrices
6.8.3. Normality
6.8.4. Linearity and Multicollinearity Among the Dependent Variables
6.8.5. Sensitivity to Outliers
6.9. Stage Four: Estimation of the MANOVA Model and Assessing Overall Fit
6.9.1. Criteria for Significance Testing
The F criterion for univariate ANOVA is proportional to A/W, where W = within-groups sum of squares and A = among-groups sum of squares. Analogously, the test criteria for MANOVA are measures of the size of the matrix A W^(-1) (A times the inverse of W), where W = the within-groups sum-of-squares-and-cross-products matrix and A = the among-groups sum-of-squares-and-cross-products matrix. These measures of matrix size are functions of the characteristic roots (eigenvalues) of the matrix.
Note that elsewhere in the text the number of dependent variables is denoted by m (rather than p as here) and the number of groups by NG (rather than k as here).
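As a rough illustration of how the usual criteria depend on the eigenvalues, here is a minimal sketch assuming NumPy and simulated data with k = 3 groups and p = 2 dependent variables. The four measures shown (Wilks' lambda, Pillai's trace, the Hotelling-Lawley trace, Roy's largest root) are the standard ones; the function names sscp_matrices and manova_criteria are illustrative, and no significance levels are computed.

```python
import numpy as np

def sscp_matrices(groups):
    """Return (A, W): among-groups and within-groups SSCP matrices."""
    grand_mean = np.vstack(groups).mean(axis=0)
    p = grand_mean.size
    A = np.zeros((p, p))
    W = np.zeros((p, p))
    for g in groups:
        m = g.mean(axis=0)
        dev = (m - grand_mean)[:, None]      # column vector: group mean minus grand mean
        A += len(g) * (dev @ dev.T)
        resid = g - m                        # deviations from the group's own mean
        W += resid.T @ resid
    return A, W

def manova_criteria(A, W):
    """Four common measures of the size of A W^(-1), from its eigenvalues."""
    # W^(-1) A has the same eigenvalues as A W^(-1).
    roots = np.linalg.eigvals(np.linalg.solve(W, A)).real
    roots = np.clip(roots, 0.0, None)        # guard against tiny negative round-off
    return {
        "wilks_lambda": float(np.prod(1.0 / (1.0 + roots))),
        "pillai_trace": float(np.sum(roots / (1.0 + roots))),
        "hotelling_lawley_trace": float(np.sum(roots)),
        # Some packages report root / (1 + root) instead of the root itself.
        "roy_largest_root": float(np.max(roots)),
    }

# Simulated data, for illustration only: k = 3 groups, p = 2 variables.
rng = np.random.default_rng(1)
groups = [rng.normal(size=(10, 2)),
          rng.normal(size=(12, 2)) + np.array([1.0, 0.0]),
          rng.normal(size=(8, 2)) + np.array([0.0, 1.0])]
A, W = sscp_matrices(groups)
crit = manova_criteria(A, W)
print(crit)
```

Wilks' lambda computed this way equals det(W)/det(A+W), which gives a quick check on the eigenvalue computation.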
6.9.2. Statistical Power of the Multivariate Tests
6.10. Stage Five: Interpretation of the MANOVA Results
6.10.1. Evaluating Covariates
6.10.2. Assessing the Dependent Variate
6.10.3. Identifying Differences Between Individual Groups
6.11. Stage Six: Validation of the Results
6.12. Summary
6.13. Example 1: Difference Between Two Independent Groups
6.13.1. A Univariate Approach: The t test
6.13.2. A Multivariate Approach: Hotelling's T-square
6.14. Example 2: Difference Between k Independent Groups
6.14.1. A Univariate Approach: k-Groups ANOVA
6.14.2. A Multivariate Approach: k-Groups MANOVA
6.14.2.1. FROM UNIVARIATE TO MULTIVARIATE
In the univariate case, there is a decomposition of the observation into groups means and deviations from them, and a corresponding decomposition of the sums of squares. There is a directly analogous decomposition in the multivariate case.
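The two decompositions can be written out as follows. The notation is an assumption made for this sketch: y_ij is observation j in group i, n_i is the size of group i, k is the number of groups, and T denotes the total sum-of-squares-and-cross-products matrix; A and W are the among-groups and within-groups matrices.

```latex
% Univariate: each observation splits into grand mean, group effect, and residual.
\begin{align*}
y_{ij} &= \bar{y} + (\bar{y}_i - \bar{y}) + (y_{ij} - \bar{y}_i), \\
\sum_{i=1}^{k}\sum_{j=1}^{n_i} (y_{ij} - \bar{y})^2
  &= \sum_{i=1}^{k} n_i (\bar{y}_i - \bar{y})^2
   + \sum_{i=1}^{k}\sum_{j=1}^{n_i} (y_{ij} - \bar{y}_i)^2 .
\end{align*}

% Multivariate: the same identity with vectors; squares become outer products.
\begin{align*}
\mathbf{y}_{ij} &= \bar{\mathbf{y}} + (\bar{\mathbf{y}}_i - \bar{\mathbf{y}})
                 + (\mathbf{y}_{ij} - \bar{\mathbf{y}}_i), \\
\underbrace{\sum_{i=1}^{k}\sum_{j=1}^{n_i}
  (\mathbf{y}_{ij} - \bar{\mathbf{y}})(\mathbf{y}_{ij} - \bar{\mathbf{y}})'}_{\mathbf{T}}
  &= \underbrace{\sum_{i=1}^{k} n_i
  (\bar{\mathbf{y}}_i - \bar{\mathbf{y}})(\bar{\mathbf{y}}_i - \bar{\mathbf{y}})'}_{\mathbf{A}}
   + \underbrace{\sum_{i=1}^{k}\sum_{j=1}^{n_i}
  (\mathbf{y}_{ij} - \bar{\mathbf{y}}_i)(\mathbf{y}_{ij} - \bar{\mathbf{y}}_i)'}_{\mathbf{W}} .
\end{align*}
```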
6.14.2.2. FROM ONE-WAY TO HIGHER-WAY ANOVA (not in text)
Different designs are generated by different relationships among the groups.
6.14.2.2.1. Factorial Designs
Suppose there are two factors, A and B, A at two levels, and B at three.
Then there are six groups, with the following structure.
Group:       1  2  3  4  5  6
Level of A:  1  1  1  2  2  2
Level of B:  1  2  3  1  2  3
The 6-1 = 5 d.f. for Groups are broken down as follows.
Source    d.f.
------    ----
Groups      5
  A         1
  B         2
  A*B       2
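The breakdown follows the usual rule for a two-factor crossed design: a-1 d.f. for A, b-1 for B, and (a-1)(b-1) for the interaction. A minimal sketch, in which the function name factorial_df is illustrative only:

```python
def factorial_df(a_levels, b_levels):
    """Degrees of freedom for a two-factor crossed design with a*b groups."""
    df = {
        "Groups": a_levels * b_levels - 1,
        "A": a_levels - 1,
        "B": b_levels - 1,
        "A*B": (a_levels - 1) * (b_levels - 1),
    }
    # The three components must add up to the d.f. for Groups.
    assert df["A"] + df["B"] + df["A*B"] == df["Groups"]
    return df

df_2x3 = factorial_df(2, 3)
print(df_2x3)   # {'Groups': 5, 'A': 1, 'B': 2, 'A*B': 2}
```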
Minitab commands
Versions 10 and later of MINITAB have a MANOVA subcommand
in the ANOVA command.
MTB > ANOVA Y1 Y2 = A B A*B;
SUBC > MANOVA.
6.14.2.2.2. Hierarchical (nested) design
Suppose a survey is taken at 2 schools and at 3 classrooms in each.
Group:      1  2  3  4  5  6
School:     1  1  1  2  2  2
Classroom:  1  2  3  4  5  6
The 6-1 = 5 d.f. for Groups are broken down as follows.
Source                                d.f.
------                                ----
Groups                                  5
  Between 2 Schools                     1
  Between Classrooms within Schools     4
Minitab commands
MTB > ANOVA Y1 Y2 = S C(S);
SUBC > MANOVA.
You then try to see how much the schools differ, and how much individual classrooms within a given school differ from one another.
You see which effects are larger, those between schools, or those
between classrooms within a given school.
Then you could tell planners of a future study how many schools to visit, relative to how many classrooms within a school.
E.g., maybe it would be better to have visited 3 schools, and only 2 classrooms within each.
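The comparison of school effects with classroom-within-school effects can be sketched for a single response variable as follows. This is an illustration under assumptions, not the Minitab computation: it uses NumPy, simulated data with 5 pupils per classroom, and classroom labels that are unique across schools (1-6, as in the layout above). The function name nested_anova is illustrative. For MANOVA the squared deviations would be replaced by outer products of deviation vectors.

```python
import numpy as np

def nested_anova(y, school, classroom):
    """Sums of squares, d.f., and mean squares for classrooms nested in schools."""
    y, school, classroom = (np.asarray(v) for v in (y, school, classroom))
    grand = y.mean()
    ss_school = ss_class = ss_error = 0.0
    for s in np.unique(school):
        in_s = school == s
        s_mean = y[in_s].mean()
        ss_school += in_s.sum() * (s_mean - grand) ** 2        # school vs. grand mean
        for c in np.unique(classroom[in_s]):
            in_c = in_s & (classroom == c)
            c_mean = y[in_c].mean()
            ss_class += in_c.sum() * (c_mean - s_mean) ** 2    # classroom vs. its school
            ss_error += ((y[in_c] - c_mean) ** 2).sum()        # pupil vs. classroom
    n_school = len(np.unique(school))
    n_class = len(np.unique(classroom))     # assumes labels are unique across schools
    rows = {
        "schools": (float(ss_school), n_school - 1),
        "classrooms within schools": (float(ss_class), n_class - n_school),
        "within classrooms": (float(ss_error), len(y) - n_class),
    }
    # Each entry: (sum of squares, d.f., mean square).
    return {name: (ss, df, ss / df) for name, (ss, df) in rows.items()}

# Simulated data, for illustration only: 2 schools, 3 classrooms each, 5 pupils each.
rng = np.random.default_rng(2)
school = np.repeat([1, 2], 15)
classroom = np.repeat([1, 2, 3, 4, 5, 6], 5)
y = rng.normal(size=30) + 0.8 * (school == 2) + 0.3 * (classroom % 3)
table = nested_anova(y, school, classroom)
for name, (ss, df, ms) in table.items():
    print(f"{name:28s} SS={ss:8.3f}  d.f.={df:2d}  MS={ms:7.3f}")
```

Comparing the mean squares for schools and for classrooms within schools gives a first indication of which source of variation is larger.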
6.15. Example 3: A Factorial Design for MANOVA with Two Independent Variables
6.15.1. Stage One: Objectives of the MANOVA
6.15.2. Stage Two: Research Design of the MANOVA
6.15.3. Stage Three: Assumptions in MANOVA
6.15.4. Stage Four: Estimation of the MANOVA Model and Assessing Overall Fit
6.15.5. Stage Five: Interpretation of the Results
6.16. Summary . Questions . References
Problems for Chapter 6
1. Contrasts in Repeated Measures MANOVA:
Usage Level and Satisfaction Level were obtained from a sample of 10 of HATCO's customers in 1995, again in 1996, and again in 1997. Satisfaction is on a scale of 0 to 10, and Usage Level, being the percentage of business given to HATCO, is on a scale of 0 to 100 percent.
You will retrieve the data from the Web (see below) and enter the data in columns A-F of an Excel worksheet (alternatively, you may use a statistical computer package).
   Satisfaction             Usage
------------------    ------------------
1995   1996   1997    1995   1996   1997
  A      B      C       D      E      F
For both Satisfaction and Usage Level, form the contrasts
initial level vs. average level after 1995
and
1996 vs. 1997.
More specifically, create four columns as follows:
     G           H         I           J
-----------     ---   -----------     ---
(B+C)/2 - A     C-B   (E+F)/2 - D     F-E
Using Excel or a statistical computer package, do t-tests of H0: µ = 0 for G, H, I, and J.
Hand in your printout with a written summary of the results, including answers to the following questions.
Question S1 (answer using G): Was there a significant change in Satisfaction after 1995? If so, which direction was it?
Question S2 (answer using H): Was there a significant change in Satisfaction between 1996 and 1997? If so, which direction was it?
Question U1 (answer using I): Was there a significant change in Usage Level after 1995? If so, which direction was it?
Question U2 (answer using J): Was there a significant change in Usage Level between 1996 and 1997? If so, which direction was it?
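For those using a programming language rather than Excel, the contrast columns and the one-sample t statistics might be formed along the following lines. This is only a sketch using the Python standard library. The three short lists are made-up placeholder numbers, NOT the HATCO data, and the function names contrasts and one_sample_t are illustrative.

```python
import math
import statistics

def contrasts(first, second, third):
    """Per-case contrasts: (average of later two years - first year, third - second)."""
    later_vs_initial = [(b + c) / 2 - a for a, b, c in zip(first, second, third)]
    third_vs_second = [c - b for b, c in zip(second, third)]
    return later_vs_initial, third_vs_second

def one_sample_t(values, mu0=0.0):
    """t statistic and d.f. for H0: mu = mu0."""
    n = len(values)
    std_error = statistics.stdev(values) / math.sqrt(n)   # sample s.d. / sqrt(n)
    return (statistics.mean(values) - mu0) / std_error, n - 1

# Placeholder numbers for five hypothetical cases (NOT the HATCO data).
y1995 = [4.0, 5.0, 6.0, 5.0, 4.0]
y1996 = [5.0, 5.5, 6.5, 6.0, 4.5]
y1997 = [5.5, 6.5, 6.5, 7.0, 5.5]

g, h = contrasts(y1995, y1996, y1997)     # columns G and H of the worksheet
t_g, df_g = one_sample_t(g)
t_h, df_h = one_sample_t(h)
print(g, t_g, df_g)
print(h, t_h, df_h)
```

The sign of each t gives the direction of the change. With the 10 customers in the problem there are 9 d.f., for which the two-sided 5% critical value of t is 2.262.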
If your browser is configured to load into Excel, click here for Excel version of the data.
Otherwise, click here for ASCII version of the data.
Addendum (not in Textbook): CART/AID
CART (Classification and Regression Trees) and its forerunner AID (Automatic Interaction Detection) split the sample according to values of the explanatory variables.
AID is used to find combinations of levels of factors that relate to a value of the dependent variable. E.g., it could be used to find sets of consumers who are most likely to buy a particular product that is being developed.
The dependent variable can be on any scale, dichotomous, categorical, or metric.
The explanatory variables may also be dichotomous, categorical or metric.
When the dependent variable is numerical, the splitting can be done on the basis of a t- or F-test.
AID separates units of the initial group into two subgroups contingent upon the value of one of the predictors.
All possible splits of this type are considered and the one which best separates the data into groups homogeneous in class is chosen.
A chi-square or F statistic is used to measure the separation.
This process then continues recursively (hence the name "recursive partitioning").
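One step of this search can be sketched as follows for a numerical dependent variable and a single metric predictor. This is an illustration under assumptions, not the algorithm of any particular package: it uses NumPy, tries every cut point midway between adjacent observed values, and uses the two-group F statistic as the measure of separation. The function name best_binary_split and the small data set are made up.

```python
import numpy as np

def best_binary_split(x, y):
    """Try every cut on predictor x; return (cut, F) with the largest two-group F."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    best_cut, best_f = None, -np.inf
    values = np.unique(x)                       # sorted distinct predictor values
    for lo, hi in zip(values[:-1], values[1:]):
        cut = (lo + hi) / 2.0                   # midpoint between adjacent values
        left, right = y[x <= cut], y[x > cut]
        among = (len(left) * (left.mean() - y.mean()) ** 2
                 + len(right) * (right.mean() - y.mean()) ** 2)
        within = (((left - left.mean()) ** 2).sum()
                  + ((right - right.mean()) ** 2).sum())
        # F with 1 and n-2 d.f.; a perfect split has no within-group variation.
        f = np.inf if within == 0 else among / (within / (len(y) - 2))
        if f > best_f:
            best_cut, best_f = cut, f
    return best_cut, float(best_f)

# Made-up illustrative data: the response jumps between x = 4 and x = 5.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [1.0, 1.2, 0.9, 1.1, 5.0, 5.2, 4.9, 5.1]
cut, f_stat = best_binary_split(x, y)
print(cut, f_stat)
```

With several predictors the same search would be run on each one and the best split overall chosen; recursion then applies the search again within each of the two resulting subgroups, usually with a minimum subgroup size or a significance threshold as a stopping rule.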
Nature of the Dependent Variable
Software for AID includes SPSS's CHAID (AID using chi-square for a categorical dependent variable).
When there are multiple dependent variables, in theory AID could still be used.
The F of ANOVA would be replaced by a suitable MANOVA test statistic.
Created 1998: Sept 9
latest revision 2005: Oct 18