@heading number right $$ REALEST BMDPKMOP p.$$
------------------------------------------------------------------------------------------------------------------------
@end

PAGE 1 BMDPKM

BMDPKM - K MEANS CLUSTERING OF CASES
Copyright (C) Regents of University of California.

BMDP Statistical Software, Inc.
1440 Sepulveda Boulevard             Phone (213) 479-7799
Los Angeles, California 90025        Telex 4972934

Program Version: 1987 (VM/CMS)
Manual Edition: 1983, 1985 reprint.
State NEWS in the PRINT paragraph for a summary of new features.

DECEMBER 17, 1989 AT 15:58:12

PROGRAM INSTRUCTIONS

/PROBLEM  TITLE IS ' REALESTATE DATA'.
/INPUT    VARIABLES ARE 6. FORMAT IS FREE.
/VARIABLE NAMES ARE STYLE,SFLA,GRADE,ASSESSED,MARKET, ID.
          USE = 1 to 6.
          LABEL = ID.
/CLUSTER  NUMBER = 3. STANDARD = WCOV. CENTER=55,6,7.
/PRINT    CASE = 4. MEMBER.
/PLOT     XVAR = ASSESSED. YVAR = MARKET.
/COMMENT 'CMS DSN = REALEST DAT .'
/COMMENT 'Real estate market data from p. 273 of Robt. B. Miller, .'
/COMMENT 'MINITAB HANDBOOK FOR BUSINESS AND ECONOMICS .'
/COMMENT 'C1=a "style" code assigned by the assessors STYLE .'
/COMMENT 'C2=square feet of living area (in hundreds) SFLA .'
/COMMENT 'C3=a "grade" code assigned by the assessors GRADE .'
/COMMENT 'C4=assessed value of the parcel at time of sale ASSESSED.'
/COMMENT 'C5=market value of the parcel MARKET .'
/COMMENT 'C6=number of parcel (1 to 60) ID '.
/END

PROBLEM TITLE IS
 REALESTATE DATA

NUMBER OF VARIABLES TO READ IN. . . . . . . . .        6
NUMBER OF VARIABLES ADDED BY TRANSFORMATIONS. .        0
TOTAL NUMBER OF VARIABLES . . . . . . . . . . .        6
NUMBER OF CASES TO READ IN. . . . . . . . . . .   TO END
CASE LABELING VARIABLES . . . . . . . . . . . .ID
MISSING VALUES CHECKED BEFORE OR AFTER TRANS. .  NEITHER
BLANKS ARE. . . . . . . . . . . . . . . . . . .  MISSING
NUMBER OF WORDS OF DYNAMIC STORAGE. . . . . . .   246454

VARIABLES TO BE USED
    1 STYLE     2 SFLA      3 GRADE     4 ASSESSED  5 MARKET

PAGE 2 BMDPKM REALESTATE DATA

INPUT FORMAT IS
 FREE

MAXIMUM LENGTH DATA RECORD IS 80 CHARACTERS.
PAGE 3 BMDPKM REALESTATE DATA

NUMBER(S) OF CLUSTER(S) TO REPORT . . . . . . .        3
DISTANCES ARE STANDARDIZED BY . . . . . . . . .     WCOV
MAXIMUM NUMBER OF MISSING VALUES PER CASE . . .        0
TOLERANCE . . . . . . . . . . . . . . . . . . . 0.001000
THE FOLLOWING CASES ARE THE INITIAL CLUSTER CENTERS
    55 6 7
MAXIMUM NUMBER OF ITERATIONS. . . . . . . . . .       30
NUMBER OF CASES PRINTED . . . . . . . . . . . .        4
MAXIMUM NUMBER OF CASES THAT CAN BE PROCESSED      18876

PAGE 4 BMDPKM REALESTATE DATA

DATA POINTS

 C A S E                1          2          3          4          5
  NO. LABEL         STYLE       SFLA      GRADE   ASSESSED     MARKET
----- -------- ---------- ---------- ---------- ---------- ----------
    1 01            1.250     10.110       .900         39     57.600
    2 02                1      7.210       .900     32.400     49.200
    3 03                1      7.600       .900     39.900     53.700
    4 04                1      6.620       .900     29.400     51.900

NUMBER OF CASES READ. . . . . . . . . . . . . .       60

PAGE 5 BMDPKM REALESTATE DATA

REALLOCATION IS COMPLETE AFTER 4 ITERATIONS FOR 3 CLUSTERS

CLUSTER 1 OF 3 CONTAINS 8 CASES
====================================
STATISTICS ARE COMPUTED FROM THE STANDARDIZED DATA

[HISTOGRAM: DISTANCE FROM CENTER TO CASES IN THIS CLUSTER. The 8 cases are plotted with the symbol 1; axis labeled at 5.000 and 10.00.]
[HISTOGRAM: DISTANCE FROM CENTER TO CASES IN OTHER CLUSTERS. Cases are plotted with their cluster numbers (2 and 3); axis labeled at 5.000 and 10.00.]

C A S E    WEIGHT  DISTANCE
-------------------------
40           1.00    1.2764
44           1.00    1.4029
48           1.00    1.3976
55           1.00    3.6447
58           1.00    1.6593
59           1.00    1.5329
06           1.00    3.6606
37           1.00    1.8000
----------------------------
AVERAGE DISTANCE     2.0468

CLUSTER 2 OF 3 CONTAINS 10 CASES
====================================
STATISTICS ARE COMPUTED FROM THE STANDARDIZED DATA

[HISTOGRAM: DISTANCE FROM CENTER TO CASES IN THIS CLUSTER. The 10 cases are plotted with the symbol 2; axis labeled at 5.000 and 10.00.]
[HISTOGRAM: DISTANCE FROM CENTER TO CASES IN OTHER CLUSTERS. Cases are plotted with their cluster numbers (1 and 3); axis labeled at 5.000 and 10.00.]

C A S E    WEIGHT  DISTANCE
-------------------------
16           1.00    1.1593
17           1.00    4.0127
28           1.00    2.1654
29           1.00    0.9568
30           1.00    1.1671
33           1.00    1.0697
38           1.00    2.1664
41           1.00    1.7526
49           1.00    3.6548
13           1.00    3.5316
----------------------------
AVERAGE DISTANCE     2.1636

CLUSTER 3 OF 3 CONTAINS 42 CASES
====================================
STATISTICS ARE COMPUTED FROM THE STANDARDIZED DATA

[HISTOGRAM: DISTANCE FROM CENTER TO CASES IN THIS CLUSTER. The 42 cases are plotted with the symbol 3; axis labeled at 4.000 and 8.000.]
[HISTOGRAM: DISTANCE FROM CENTER TO CASES IN OTHER CLUSTERS. Cases are plotted with their cluster numbers (1 and 2); axis labeled at 4.000 and 8.000.]

C A S E    WEIGHT  DISTANCE
-------------------------
01           1.00    0.6745
03           1.00    1.2842
05           1.00    0.5989
07           1.00    3.7189
08           1.00    1.5787
09           1.00    1.4714
10           1.00    1.2536
11           1.00    2.2780
12           1.00    1.4845
14           1.00    2.5859
15           1.00    1.3962
18           1.00    0.6353
19           1.00    0.9787
20           1.00    0.4555
21           1.00    1.4017
22           1.00    1.5439
23           1.00    2.2660
24           1.00    1.4852
26           1.00    1.1994
27           1.00    3.5182
31           1.00    1.2713
32           1.00    0.7455
34           1.00    1.9349
35           1.00    1.6866
36           1.00    2.0810
39           1.00    4.3048
42           1.00    2.9250
43           1.00    0.9303
45           1.00    3.1497
46           1.00    3.4608
47           1.00    3.6216
50           1.00    1.3658
51           1.00    0.5466
52           1.00    2.7022
53           1.00    1.7881
54           1.00    1.7043
56           1.00    1.8881
57           1.00    1.4235
60           1.00    1.1129
04           1.00    2.0518
02           1.00    1.8825
25           1.00    3.6660
----------------------------
AVERAGE DISTANCE     1.8584

PAGE 6 BMDPKM REALESTATE DATA

REPORT ON CASES WITH POSITIVE WEIGHT
------------------------------------

[SCATTER PLOT: MARKET (vertical axis, ticks from 30 to 70 in steps of 5) versus ASSESSED (horizontal axis, ticks from 9 to 48 in steps of 3). Each case is plotted with its cluster number (1, 2, or 3).]

PAGE 7 BMDPKM REALESTATE DATA

CLUSTER MEANS
--------------
              SIZE     STYLE      SFLA     GRADE  ASSESSED    MARKET
1                8    1.0775    9.1150    0.8625   20.8537   51.4125
2               10    1.1000    7.1470    0.7500   31.5900   45.2700
3               42    1.2678   10.0155    0.9095   38.9927   60.4284
GRAND MEAN            1.2145    9.4173    0.8767   35.3403   56.6998

PAGE 8 BMDPKM REALESTATE DATA

CLUSTER STANDARD DEVIATIONS
----------------------------
                 STYLE      SFLA     GRADE  ASSESSED    MARKET
1               0.1758    1.5323    0.0694    7.9827    6.7843
2               0.2415    1.3947    0.0000    7.9511    7.4612
3               0.3191    2.4529    0.0297    5.1972    6.5717

                 STYLE      SFLA     GRADE  ASSESSED    MARKET
MEAN SQUARES
  BETWEEN        0.200    33.650     0.104  1189.914  1056.991
  WITHIN         0.086     4.923     0.001    37.237    45.507
D.F.-S           2, 57     2, 57     2, 57     2, 57     2, 57
F-RATIO          2.324     6.835    84.510    31.955    23.227
P-VALUE          0.085     0.001     0.000     0.000     0.000

PAGE 9 BMDPKM REALESTATE DATA

CLUSTER PROFILES - VARIABLES ARE ORDERED BY F-RATIO SIZE
--------------------------------------------------------
           *                      *                             *
GRADE      -----1-----            2                             ---3--
ASSESSED   ------1-----           -----2------                  ----3---
MARKET     -----1-----            -----2------                  -----3-----
SFLA       -------1------         -----2------                  -----------3----------
STYLE      ----------1---------   -------------2-------------   ------------------3-----------------
           *                      *                             *

EACH COLUMN DESCRIBES A CLUSTER.
THE CLUSTER NUMBER IS PRINTED AT THE MEAN OF EACH VARIABLE
DASHES INDICATE ONE STANDARD DEVIATION ABOVE AND BELOW

PAGE 10 BMDPKM REALESTATE DATA

POOLED WITHIN CLUSTER COVARIANCES
---------------------------------
                  STYLE      SFLA     GRADE  ASSESSED    MARKET
                      1         2         3         4         5
STYLE     1         0.1
SFLA      2         0.5       4.9
GRADE     3         0.0       0.0       0.0
ASSESSED  4         0.2       3.9       0.1      37.2
MARKET    5         1.0      10.4       0.1      15.7      45.5

PAGE 11 BMDPKM REALESTATE DATA

POOLED WITHIN CLUSTER CORRELATIONS
----------------------------------
                  STYLE      SFLA     GRADE  ASSESSED    MARKET
                      1         2         3         4         5
STYLE     1      1.0000
SFLA      2      0.7669    1.0000
GRADE     3      0.4112    0.3126    1.0000
ASSESSED  4      0.0944    0.2879    0.4054    1.0000
MARKET    5      0.5084    0.6943    0.4174    0.3820    1.0000

PAGE 12 BMDPKM REALESTATE DATA

ACCURACY CHECK OF STANDARDIZING TRANSFORMATION
----------------------------------------------
YOUR STATEMENT 'STAND=WCOV.' REQUESTED THAT THE DATA, X, BE STANDARDIZED,
Z=(T-INVERSE)*X, SO THAT THE POOLED WITHIN CLUSTER COVARIANCE MATRIX OF Z
BE THE IDENTITY MATRIX. THE MATRIX T HAS BEEN OBTAINED ITERATIVELY.
THE FOLLOWING OUTPUT CHECKS THE ACCURACY OF (T), BY COMPARING THE POOLED
WITHIN CLUSTER COVARIANCE MATRIX COMPUTED FROM THE DATA, WITH (T)*(T-TRANSPOSE).

RELATIVE ERRORS OF THE DIAGONALS
--------------------------------
    0.0000    0.0000    0.0000    0.0000    0.0000

ACCURACY CHECK MATRIX
---------------------
DIFFERENCES BETWEEN CORRELATIONS COMPUTED FROM (W) AND (T)*(T-TRANSPOSE)

                  STYLE      SFLA     GRADE  ASSESSED    MARKET
                      1         2         3         4         5
STYLE     1      0.0000
SFLA      2     -0.0000    0.0000
GRADE     3      0.0000   -0.0000    0.0000
ASSESSED  4      0.0000    0.0000    0.0000    0.0000
MARKET    5      0.0000   -0.0000   -0.0000   -0.0000   -0.0000

IF ANY OF THESE CHECKS FAIL (I.E. VALUE GREATER THAN 0.001), THEN THE
'STAND=WCOV.' REQUEST HAS NOT BEEN FULFILLED, AND THE FOLLOWING FOLLOWUP
IS RECOMMENDED

(1) RERUN THE PROBLEM AND SAVE THE CLUSTERING RESULTS ON BMDP-FILE.
(2) RERUN THE PROBLEM WITH THE BMDP-FILE SAVED IN (1) AS INPUT, AND THE
    CLUSTER MEMBERSHIPS OBTAINED IN (1) AS CLUSTER MEMBERSHIP INDICATOR.

THE SKELETON DECK SETUP TO DO THIS IS

// EXEC BIMED,PROG=BMDPKM
//GO.SYSIN DD *
/ PROBLEM TITLE IS 'SAVE BMDPKM RESULTS'.
/ INPUT . . .
/ VARIABLE . . .
/ CLUSTER . . .
/ SAVE UNIT IS 3. CODE IS MYDATA. NEW.
/ END
- D A T A -
/ PROBLEM TITLE IS 'RERUN'.
/ INPUT UNIT IS 3. CODE IS MYDATA.
/ CLUSTER . . . MEMBER=CLUSTER.
/ END
/*
//

PAGE 13 BMDPKM REALESTATE DATA

CPU TIME USED 0.556 SECONDS

PAGE 14 BMDPKM

BMDPKM - K MEANS CLUSTERING OF CASES
DECEMBER 17, 1989 AT 15:58:13

NO MORE CONTROL LANGUAGE.

PROGRAM TERMINATED
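
The 'STAND=WCOV.' standardization described in the accuracy check of this listing (Z=(T-INVERSE)*X, with the pooled within cluster covariance matrix of Z equal to the identity) can be illustrated with a minimal Python sketch. This is not the BMDPKM implementation. The listing states only that T was obtained iteratively; the sketch assumes fixed cluster memberships and takes T to be the lower-triangular Cholesky factor of W, which is one matrix satisfying W = (T)*(T-TRANSPOSE). The function names and the toy data are hypothetical and are not taken from the real estate file.

```python
import math

def pooled_within_cov(points, labels):
    """Pooled within-cluster covariance W: within-cluster cross-products / (n - k)."""
    dim = len(points[0])
    clusters = sorted(set(labels))
    w = [[0.0] * dim for _ in range(dim)]
    for c in clusters:
        members = [p for p, lab in zip(points, labels) if lab == c]
        mean = [sum(col) / len(members) for col in zip(*members)]
        for p in members:
            d = [x - m for x, m in zip(p, mean)]
            for i in range(dim):
                for j in range(dim):
                    w[i][j] += d[i] * d[j]
    dof = len(points) - len(clusters)
    return [[v / dof for v in row] for row in w]

def cholesky(a):
    """Lower-triangular T with T * T-transpose = a (a must be positive definite)."""
    n = len(a)
    t = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(t[i][k] * t[j][k] for k in range(j))
            if i == j:
                t[i][j] = math.sqrt(a[i][i] - s)
            else:
                t[i][j] = (a[i][j] - s) / t[j][j]
    return t

def standardize(t, x):
    """Solve T z = x by forward substitution, i.e. z = (T-inverse) * x."""
    z = []
    for i in range(len(x)):
        s = sum(t[i][k] * z[k] for k in range(i))
        z.append((x[i] - s) / t[i][i])
    return z

# Hypothetical toy data: six 2-variable cases in two fixed clusters.
points = [(1.0, 2.0), (2.0, 4.5), (3.0, 5.0), (8.0, 1.0), (9.0, 3.5), (10.0, 3.0)]
labels = [1, 1, 1, 2, 2, 2]

w = pooled_within_cov(points, labels)
t = cholesky(w)
z_points = [standardize(t, p) for p in points]

# Accuracy check in the spirit of the listing: the pooled within-cluster
# covariance of the standardized data should be the identity matrix.
w_z = pooled_within_cov(z_points, labels)
max_error = max(
    abs(w_z[i][j] - (1.0 if i == j else 0.0))
    for i in range(len(w_z))
    for j in range(len(w_z))
)
print("check passes:", max_error < 0.001)
```

The 0.001 threshold mirrors the tolerance quoted in the accuracy check message. In the actual program the cluster memberships, and therefore W and T, change during reallocation, which is presumably why the listing describes T as obtained iteratively.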