CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
C
C
C   CMS DSN = MIX1CM CLUSPAC
C
C   "MIX1CM CLUSPAC" IS A PROGRAM FOR CLUSTERING UNIVARIATE DATA
C   (DATA ON THE LINE) BY ITERATIVE MAXIMIZATION OF THE
C   MIXTURE-MODEL LIKELIHOOD
C
C                     N     K
C                    ---    --
C               L =  | |    >   P(C)*F(X(I)|C)
C                    | |    --
C                    I=1   C=1
C
C
C   REFERENCE:
C
C      WOLFE (1970).
C
C   MANUAL MODE:  NUMBER OF CLUSTERS AND INITIAL MEANS ARE
C   INPUT.  (USE PROGRAM MIX1CMA FOR AUTOMATIC SETTING OF
C   NUMBERS OF CLUSTERS AND INITIAL MEANS.)
C
C
C   PROGRAMMED BY
C      DR. STANLEY L. SCLOVE             312/996-2681
C      DEPARTMENT OF
C         INFORMATION & DECISION SCIENCES     M/C 294
C      COLLEGE OF BUSINESS ADMINISTRATION
C      UNIVERSITY OF ILLINOIS AT CHICAGO
C      BOX 4348
C      CHICAGO, IL  60680-4348
C
C
C   VERSION 3.2    30-OCT-89
C
C   COPYRIGHT (C) 1988  STANLEY L. SCLOVE
C
C
C
C   RESTRICTIONS (CAN BE MODIFIED):
C      N, SAMPLE SIZE, AT MOST 999;
C      K, NUMBER OF CLUSTERS, AT MOST 29;
C      ITER, MAXIMUM NUMBER OF ITERATIONS, 20.
C
C
C
C   CONTROL CARDS:
C
C   (1) DATASET TITLE
C   (2) N, IN FORMAT (2X,I4)
C   (3) FMT, IN FORMAT (18A4), E.G., (1X,F4.1).
C       ALLOW AT LEAST ONE BLANK IN FMT:  IT WILL ALSO BE USED
C       FOR OUTPUT, WHERE CC1 IS FOR CARRIAGE CONTROL.
C       ALLOW A CC FOR THE DECIMAL POINT ON OUTPUT,
C       WHETHER OR NOT THERE IS ONE ON INPUT.
C   (4) DATA, IN FORMAT SPECIFIED BY FMT
C   (5) K, NUMBER OF CLUSTERS, IN FORMAT (2X,I1)
C   (6) K INITIAL VALUES OF PRIOR PROBABILITIES AND MEANS,
C       IN FORMAT (5X,F3.2,2X,F8.2).
C   (7) INITIAL VALUE OF THE VARIANCE.  (THE RESULTS DO NOT
C       DEPEND UPON THIS VALUE IF THE INITIAL PRIOR PROBABILITIES
C       ARE EQUAL.)
C
C
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
C
C
C     WRITE PROGRAM INFORMATION.
C
C     READ SAMPLE SIZE, N.
C
C     READ DATA FORMAT.
C
C     READ DATA AND
C     COMPUTE STATISTICS OF WHOLE SAMPLE:
C
C     THE NEXT STMT WAS FOR USE WITH THE WATFOR COMPILER:
C     READ (5,*) X(I)
C
C     WRITE DATA:
C     WRITE SUMMARY STATISTICS FOR WHOLE SAMPLE:
C
C
C     READ K, NUMBER OF CLUSTERS.
C
C     READ INITIAL PRIOR PROBS AND MEANS:
C     READ INITIAL VARIANCE:
C
C
C     WRITE INITIAL PRIOR PROBS, MEANS AND VARIANCE:
C
C     SET CONSTANTS.
C
C     IF THE INITIAL PRIOR PROBABILITIES ARE EQUAL, THEN THE
C     FIRST ITERATION IS EQUIVALENT TO MINIMUM DISTANCE CLUSTERING
C     TO INITIAL MEANS (I.E., "ISODATA").
C     IN GENERAL, THE CLUSTERING IS BY MAXIMUM POSTERIOR
C     PROBABILITY CLUSTERING.
C
C
C     STORE OLD CLUSTERING:
C
C     COMMENCE DISTANCE COMPUTATIONS.
C     NOTE THAT A PROB. DENSITY FUNCTION OTHER THAN THE GAUSSIAN
C     COULD BE USED HERE:
C     XMNDSQ(I) = MIN SQ. DISTANCE FROM X(I) TO ANY MEAN
C
C
C     COMPUTE POSTERIOR PROBABILITIES OF GROUP MEMBERSHIP:
C     IF ( DENOM(I) .EQ. 0.0 ) DENOM(I)=0.0001
C
C     COMPUTE NEW LABELS BY MAX POSTERIOR PROBABILITY:
C
C     WRITE NEW LABELS:
C
C     UPDATE CLUSTER PRIOR PROBABILITIES P(IC), MEANS XMEAN(IC) AND
C     VARIANCE VARHAT:
C     XNC(IC) WILL BE THE SUM OVER ALL N OBSERVATIONS OF THEIR
C     POSTERIOR PROBABILITIES OF MEMBERSHIP IN CLUSTER IC.
C     IF ( VAR(IC) .LE. 0.0 ) VAR(IC) = 0.0001
C
C     COUNT NUMBERS IN CLUSTERS:
C
C
C     B(IC) IS THE BOUNDARY BETWEEN THE IC-TH AND (IC+1)-ST CLASSES.
C
C
C     VARHAT IS MLE OF VARIANCE.
C
C
C     COMPUTE MODEL-SELECTION CRITERIA:
C     NO. PARAMETERS = K MEANS + 1 VARIANCE + (K-1) PROBS.
C
C     SCHWARZ' CRITERION IS FIRST-DEGREE EXPANSION OF
C     LOG POSTERIOR PROBABILITY OF THE MODEL.
C     KASHYAP'S CRITERION IS SECOND-DEGREE EXPANSION OF SAME.
C
C
C     WITH THE WATFOR COMPILER, USE THE FOLLOWING STMT AFTER END
C     INSTEAD OF //GO.SYSIN DD * .
C     $DATA

Sample deck set-up:

TRYPANOSOMES DATA:  LENGTHS OF 500 ROUNDWORMS
N=0500
(1X,F2.0)
 15
 15
 15
  .
  .
  .
 35
K=2
C1   .50       15.
C2   .50       29.
VAR 25.00
/*
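
Illustrative sketch of the iteration:

The listing above preserves only the comments of the program, so the following is a minimal, hypothetical sketch of the steps those comments describe: posterior probabilities of group membership, new labels by maximum posterior probability, and the update of P(IC), XMEAN(IC) and VARHAT from the sums XNC(IC). It is not the original MIX1CM source. The program name EMSKCH, the statement function PHI, the work arrays SX and POST, the eight data values and the fixed count of 20 passes (no test for an unchanged clustering) are all assumptions made for illustration. The criterion printed as BIC uses the common form -2 LOG L + M LOG N with M = 2K parameters; the scaling used by the original program may differ.

```fortran
      PROGRAM EMSKCH
C     HYPOTHETICAL SKETCH, NOT THE ORIGINAL MIX1CM SOURCE.
C     NORMAL MIXTURE ON THE LINE WITH ONE COMMON VARIANCE.
      INTEGER NMAX, KMAX
      PARAMETER (NMAX=999, KMAX=29)
      INTEGER N, K, ITER, I, IC, IT, LABEL(NMAX)
      DOUBLE PRECISION X(NMAX), P(KMAX), XMEAN(KMAX)
      DOUBLE PRECISION XNC(KMAX), SX(KMAX), POST(KMAX)
      DOUBLE PRECISION VARHAT, DENOM, SUMSQ, XLOGL, BIC
      DOUBLE PRECISION TWOPI, PHI, XX, XM, V
      PARAMETER (TWOPI=6.283185307179586D0)
C     GAUSSIAN DENSITY AS A STATEMENT FUNCTION.
      PHI(XX,XM,V) = EXP(-0.5D0*(XX-XM)**2/V)/SQRT(TWOPI*V)
C     MADE-UP DATA AND STARTING VALUES (ASSUMPTIONS).
      DATA N, K, ITER /8, 2, 20/
      DATA (X(I), I=1,8) /14.D0, 15.D0, 15.D0, 16.D0,
     &                    28.D0, 29.D0, 30.D0, 29.D0/
      DATA P(1), P(2) /0.5D0, 0.5D0/
      DATA XMEAN(1), XMEAN(2) /15.D0, 29.D0/
      DATA VARHAT /25.D0/
C     SUM OF SQUARES OF THE DATA, USED IN THE VARIANCE UPDATE.
      SUMSQ = 0.0D0
      DO 5 I = 1, N
         SUMSQ = SUMSQ + X(I)**2
    5 CONTINUE
      DO 50 IT = 1, ITER
         DO 10 IC = 1, K
            XNC(IC) = 0.0D0
            SX(IC) = 0.0D0
   10    CONTINUE
         DO 40 I = 1, N
C           POSTERIOR PROBABILITIES OF GROUP MEMBERSHIP.
            DENOM = 0.0D0
            DO 20 IC = 1, K
               POST(IC) = P(IC)*PHI(X(I),XMEAN(IC),VARHAT)
               DENOM = DENOM + POST(IC)
   20       CONTINUE
            IF (DENOM .LE. 0.0D0) DENOM = 1.0D-30
C           NEW LABEL BY MAXIMUM POSTERIOR PROBABILITY.
            LABEL(I) = 1
            DO 30 IC = 1, K
               POST(IC) = POST(IC)/DENOM
               XNC(IC) = XNC(IC) + POST(IC)
               SX(IC) = SX(IC) + POST(IC)*X(I)
               IF (POST(IC) .GT. POST(LABEL(I))) LABEL(I) = IC
   30       CONTINUE
   40    CONTINUE
C        UPDATE P(IC), XMEAN(IC) AND THE COMMON VARIANCE VARHAT.
C        VARHAT = (SUM X**2 - SUM XNC(IC)*XMEAN(IC)**2) / N.
         VARHAT = SUMSQ
         DO 45 IC = 1, K
            P(IC) = XNC(IC)/N
            IF (XNC(IC) .GT. 0.0D0) XMEAN(IC) = SX(IC)/XNC(IC)
            VARHAT = VARHAT - XNC(IC)*XMEAN(IC)**2
   45    CONTINUE
         VARHAT = VARHAT/N
         IF (VARHAT .LE. 0.0D0) VARHAT = 1.0D-4
   50 CONTINUE
C     LOG LIKELIHOOD AT THE FINAL ESTIMATES; M = 2K PARAMETERS.
      XLOGL = 0.0D0
      DO 70 I = 1, N
         DENOM = 0.0D0
         DO 60 IC = 1, K
            DENOM = DENOM + P(IC)*PHI(X(I),XMEAN(IC),VARHAT)
   60    CONTINUE
         XLOGL = XLOGL + LOG(DENOM)
   70 CONTINUE
      BIC = -2.0D0*XLOGL + 2*K*LOG(DBLE(N))
      WRITE (6,900) (IC, P(IC), XMEAN(IC), IC = 1, K)
      WRITE (6,910) VARHAT, XLOGL, BIC
      WRITE (6,920) (LABEL(I), I = 1, N)
  900 FORMAT (' CLUSTER', I3, '  P =', F7.4, '  MEAN =', F9.4)
  910 FORMAT (' VARHAT =', F9.4, ' LOGL =', F11.4, ' BIC =', F11.4)
  920 FORMAT (' LABELS:', 30I3)
      END
```

The sketch is written in fixed form and should compile with a Fortran 77 compatible compiler. On the made-up data the labels should separate the first four values from the last four. Because the posterior probabilities of each observation sum to one, the common variance can be updated from the fixed sum of squares SUMSQ without a second pass over the data; double precision is used here to limit cancellation in that subtraction.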