| BSTT 580 | Applied Multivariate Statistical Analysis |
| Professor | Stan Sclove |
| Textbook | Johnson & Wichern, 4th ed. |
Canonical Correlation Analysis (CCA) deals with situations in which the variables fall into two subsets, say q Y's and r Z's. Often the Y's are to be predicted from the Z's. Thus, CCA is a dependence technique rather than an interdependence technique.
                              VARIABLE
        ----------------------------------------------------------
                          X1   X2   . . .   Xp
        ----------------------------------------------------------
        Y1   Y2  ... Yk  ... Yq      Z1   Z2  ... Zm  ... Zr      (q+r = p)
        ----------------------------------------------------------
     1  y11  y12 ... y1k ... y1q     z11  z12 ... z1m ... z1r
     2  y21  y22 ... y2k ... y2q     z21  z22 ... z2m ... z2r
        . . .
CASE j  yj1  yj2 ... yjk ... yjq     zj1  zj2 ... zjm ... zjr
        . . .
     n  yn1  yn2 ... ynk ... ynq     zn1  zn2 ... znm ... znr
        ----------------------------------------------------------
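Canonical correlation analysis proceeds further by finding successive
pairs (CVZ2,CVY2), (CVZ3,CVY3), etc., of linear combinations of the Z's and of the Y's. Each new pair is chosen to be as highly correlated as possible while remaining uncorrelated with the pairs already found.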
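In symbols, the successive pairs can be written as below. This is a sketch of the standard formulation rather than a quotation from the notes: the coefficient vectors a_k and b_k and the symbol R_k for the k-th canonical correlation are notation assumed here for illustration.

```latex
\[
\mathrm{CVZ}_k = a_k' Z, \qquad \mathrm{CVY}_k = b_k' Y, \qquad k = 1, \dots, \min(q, r),
\]
\[
R_k = \max_{a,\,b}\ \operatorname{Corr}(a'Z,\; b'Y)
\quad \text{subject to} \quad
\operatorname{Var}(a'Z) = \operatorname{Var}(b'Y) = 1,
\]
\[
\operatorname{Corr}(a'Z,\ \mathrm{CVZ}_j) = \operatorname{Corr}(b'Y,\ \mathrm{CVY}_j) = 0
\quad \text{for all } j < k .
\]
```

For k = 1 the last condition is empty, so the first pair is simply the most highly correlated pair of linear combinations.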
BMDP6M - CANONICAL CORRELATION ANALYSIS
VERSION: 1990 (IBM/CMS) DATE: MARCH 3, 1998 AT 15:46:31
PROGRAM INSTRUCTIONS
/PROBLEM TITLE IS 'Credit Card Data (like Hair, Table 3.3)'.
/INPUT VARIABLES ARE 4.
FORMAT IS FREE.
/VARIABLE NAMES ARE CrdtCrds, CCexp, FamSize, FamInc.
/CANONICAL FIRST = FamSize, FamInc.
SECOND = CrdtCrds, CCexp.
/PRINT MATR=CORR, LOAD, COEF.
LINEsize=69.
/PLOT XVAR = CNVRS1,CNVRS2.
YVAR = CNVRF1,CNVRF2.
/END
PROBLEM TITLE IS Credit Card Data (like Hair, Table 3.3)
NUMBER OF VARIABLES TO READ . . . . . . . . . . 4
VARIABLES TO BE USED
1 CrdtCrds 2 CCexp 3 FamSize 4 FamInc
FIRST SET OF VARIABLES
----------------------
3 FamSize 4 FamInc
SECOND SET OF VARIABLES
-----------------------
1 CrdtCrds 2 CCexp
NUMBER OF VARIABLES IN FIRST SET. . . . . . . . 2 This is n.
NUMBER OF VARIABLES IN SECOND SET . . . . . . . 2 This is m.
TOTAL NUMBER OF VARIABLES USED. . . . . . . . . 4
MAXIMUM NUMBER OF CANONICAL VARIABLES . . . . . 2 This is min{m,n}
NUMBER OF CASES READ. . . . . . . . . . . . . . 48 This is N.
UNIVARIATE SUMMARY STATISTICS
-----------------------------
                                                     SMALLEST   LARGEST
                         STANDARD  SMALLEST  LARGEST  STANDARD  STANDARD
  VARIABLE        MEAN  DEVIATION     VALUE    VALUE     SCORE     SCORE
  3 FamSize     4.2500     1.4947    2.0000   6.0000     -1.51      1.17
  4 FamInc     33.5417    15.4589   14.0000  75.0000     -1.26      2.68
  1 CrdtCrds    6.8542     1.5709    4.0000  12.0000     -1.82      3.28
  2 CCexp      14.6875     6.2642    5.0000  34.0000     -1.55      3.08
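The two standard-score columns are the smallest and largest values expressed in standard deviations from the mean. A minimal sketch for the FamInc row, using only numbers from the table (the function name `standard_score` is an illustrative choice, not something from BMDP):

```python
def standard_score(value, mean, sd):
    """Number of standard deviations by which value lies above the mean."""
    return (value - mean) / sd


# FamInc row of the summary table: mean 33.5417, s.d. 15.4589, range 14 to 75.
smallest = standard_score(14.0, 33.5417, 15.4589)
largest = standard_score(75.0, 33.5417, 15.4589)
print(round(smallest, 2), round(largest, 2))
```

The same check can be run on any row as a quick test that the mean, standard deviation, and extremes are mutually consistent.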
CORRELATIONS
------------
FamSize FamInc CrdtCrds CCexp
3 4 1 2
FamSize 3 1.000
FamInc 4 0.290 1.000
CrdtCrds 1 0.813 0.340 1.000
CCexp 2 0.370 0.930 0.490 1.000
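The eigenvalues reported in the next table are the squared canonical correlations, which can be obtained as the eigenvalues of inv(R11) R12 inv(R22) R21, where R11 and R22 are the within-set correlation matrices and R12 the between-set correlations. The sketch below assumes only the three-decimal correlations printed above, so its results should agree with the program's eigenvalues only up to that rounding. The helper names are illustrative, and the closed-form eigenvalue step works only because each set has two variables.

```python
import math

# Correlations copied from the printout (rounded to three decimals there).
R11 = [[1.000, 0.290],   # FamSize, FamInc (first set)
       [0.290, 1.000]]
R22 = [[1.000, 0.490],   # CrdtCrds, CCexp (second set)
       [0.490, 1.000]]
R12 = [[0.813, 0.370],   # rows: FamSize, FamInc; columns: CrdtCrds, CCexp
       [0.340, 0.930]]


def matmul(a, b):
    """Product of two small matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def inv2(m):
    """Inverse of a 2x2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def transpose(m):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def squared_canonical_correlations():
    """Eigenvalues of inv(R11) R12 inv(R22) R21, largest first (2x2 case)."""
    m = matmul(matmul(inv2(R11), R12), matmul(inv2(R22), transpose(R12)))
    trace = m[0][0] + m[1][1]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    root = math.sqrt(trace * trace - 4.0 * det)
    return [(trace + root) / 2.0, (trace - root) / 2.0]


eigenvalues = squared_canonical_correlations()
canonical_correlations = [math.sqrt(v) for v in eigenvalues]
print(eigenvalues, canonical_correlations)
```

For sets with more than two variables a general eigenvalue routine would be needed in place of the quadratic formula.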
                      CANONICAL     NUMBER OF     BARTLETT'S TEST FOR
   EIGENVALUE        CORRELATION   EIGENVALUES   REMAINING EIGENVALUES
                                                   CHI-            TAIL
                                                  SQUARE   D.F.   PROB.
                                                  142.12     4    0.0000
lambda1= 0.88365    R1= 0.94003         1          46.40     1    0.0000
lambda2= 0.64747    R2= 0.80466
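Bartlett's chi-square statistics can be recomputed from the eigenvalues. The sketch below assumes the usual form of the statistic, -(N - 1 - (p + q + 1)/2) times the sum of ln(1 - lambda) over the remaining eigenvalues, with (p - k)(q - k) degrees of freedom after k eigenvalues have been removed. The function name and argument names are illustrative. With the printed eigenvalues and N = 48 it reproduces the first chi-square, 142.12, on 4 degrees of freedom.

```python
import math


def bartlett_chi_square(eigenvalues, n_cases, p, q):
    """Bartlett's chi-square statistics for the remaining eigenvalues.

    Entry k tests that canonical correlations k+1, k+2, ... are all zero.
    Returns a list of (chi_square, degrees_of_freedom) pairs.
    """
    scale = n_cases - 1 - (p + q + 1) / 2.0
    results = []
    for k in range(len(eigenvalues)):
        log_sum = sum(math.log(1.0 - lam) for lam in eigenvalues[k:])
        results.append((-scale * log_sum, (p - k) * (q - k)))
    return results


# Values taken from the printout: two eigenvalues, 48 cases, 2 + 2 variables.
tests = bartlett_chi_square([0.88365, 0.64747], n_cases=48, p=2, q=2)
for chi_square, df in tests:
    print(f"chi-square = {chi_square:.2f}, d.f. = {df}")
```

The tail probabilities would come from the chi-square distribution with the stated degrees of freedom; they are not computed here.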
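COEFFICIENTS FOR CANONICAL VARIABLES FOR FIRST SET OF VARIABLES
---------------------------------------------------------------
                  CNVRF1       CNVRF2
                     1            2
FamSize    3    -0.026753     0.698485
FamInc     4     0.065389    -0.017082
STANDARDIZED COEFFICIENTS FOR CANONICAL VARIABLES FOR FIRST SET OF VARIABLES
(coefficients for the variables standardized to mean 0, standard deviation 1)
                  CNVRF1       CNVRF2
                     1            2
FamSize    3      -0.040        1.044
FamInc     4       1.011       -0.264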
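The three-decimal coefficients are the six-decimal coefficients rescaled to apply to standardized variables: each raw coefficient is multiplied by the standard deviation of its variable. A short check for the first set, using the standard deviations from the univariate summary (the dictionary layout is just an illustrative choice):

```python
# Raw coefficients for the first set, as printed (CNVRF1, CNVRF2).
raw = {"FamSize": (-0.026753, 0.698485), "FamInc": (0.065389, -0.017082)}
# Sample standard deviations from the univariate summary.
sd = {"FamSize": 1.4947, "FamInc": 15.4589}

# Standardized coefficient = raw coefficient * standard deviation.
standardized = {
    name: tuple(round(coef * sd[name], 3) for coef in coefs)
    for name, coefs in raw.items()
}
print(standardized)
```

Because the standardized coefficients are free of the units of measurement, they are the ones to compare when judging which variable dominates a canonical variable.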
COEFFICIENTS FOR CANONICAL VARIABLES FOR SECOND SET OF VARIABLES
----------------------------------------------------------------
CNVRS1 CNVRS2
1 2
CrdtCrds 1 -0.127497 0.719246
CCexp 2 0.172865 -0.060589
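STANDARDIZED COEFFICIENTS FOR CANONICAL VARIABLES FOR SECOND SET OF VARIABLES
(coefficients for the variables standardized to mean 0, standard deviation 1)
                  CNVRS1       CNVRS2
                     1            2
CrdtCrds   1      -0.200        1.130
CCexp      2       1.083       -0.380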
CANONICAL VARIABLE LOADINGS
---------------------------
(CORRELATIONS OF CANONICAL VARIABLES WITH ORIGINAL VARIABLES)
FOR FIRST SET OF VARIABLES
CNVRF1 CNVRF2
1 2
FamSize 3 0.253 0.968
FamInc 4 0.999 0.038
-----------------------------
CANONICAL VARIABLE LOADINGS
---------------------------
(CORRELATIONS OF CANONICAL VARIABLES WITH ORIGINAL VARIABLES)
FOR SECOND SET OF VARIABLES
CNVRS1 CNVRS2
1 2
CrdtCrds 1 0.331 0.944
CCexp 2 0.985 0.175
------------------------------
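The loadings can be recovered from quantities already printed. Since each canonical variable has unit variance, the correlation of an original variable with a canonical variable is the corresponding row of the within-set correlation matrix times the column of standardized (three-decimal) coefficients. The sketch below assumes exactly that relationship and uses only the rounded printed values, so agreement is to about two or three decimals. The function name `loadings` is illustrative.

```python
def loadings(corr, std_coefs):
    """Correlations of each original variable with each canonical variable.

    corr is the within-set correlation matrix; std_coefs has one row per
    variable and one column per canonical variable.
    """
    return [[sum(r * c for r, c in zip(corr_row, coef_col))
             for coef_col in zip(*std_coefs)]
            for corr_row in corr]


# First set (FamSize, FamInc): within-set correlation 0.290.
first = loadings([[1.0, 0.290], [0.290, 1.0]],
                 [[-0.040, 1.044], [1.011, -0.264]])
# Second set (CrdtCrds, CCexp): within-set correlation 0.490.
second = loadings([[1.0, 0.490], [0.490, 1.0]],
                  [[-0.200, 1.130], [1.083, -0.380]])
print(first)
print(second)
```

Read this way, the first canonical variable in each set is essentially income and expenditure (FamInc, CCexp), and the second is essentially family size and number of cards (FamSize, CrdtCrds).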
2 PLOTS ARE TO BE MADE
NO. NAME NO. NAME NUMBER
7 CNVRS1 5 CNVRF1 9
8 CNVRS2 6 CNVRF2 10
[Scatter plot: CNVRF1 (variable 5) on the vertical axis, ticks from -.90 to 2.7, against CNVRS1 (variable 7) on the horizontal axis, ticks from -1.0 to 2.5.]
[Scatter plot: CNVRF2 (variable 6) on the vertical axis, ticks from -1.5 to 1.5, against CNVRS2 (variable 8) on the horizontal axis, ticks from -2.0 to 2.5.]