CLUSPAC: Computer Programs for Mixture-Model Cluster Analysis

USER'S MANUAL
1999 edition

Stanley L. Sclove, Ph.D.
Professor
Information & Decision Sciences Dept. (MC 294)
College of Business Administration
University of Illinois at Chicago (UIC)
601 S. Morgan St.
Chicago, IL 60607-7124

Copyright 1991 Stanley Louis Sclove

13-Jan-92

Stanley L. Sclove                                                    i
CLUSPAC: Computer Programs for Mixture-Model Cluster Analysis--USER'S MANUAL
--------------------------------------------------------------------

CONTENTS

Abstract
1. What is CLUSPAC?
2. Output of a Program
3. Table: Directory of CLUSPAC
4. Using CLUSPAC through VS FORTRAN
5. Lists of Program Descriptions and Control Statements
      MIX*** series:    MIX1CM MIX1CMA MIX1DT MIX1DTA
                        MIXPCM MIXPCMA MIXPDT MIXPDTA
      ISDT*** series:   ISDT1CM ISDT1CMA ISDT1CMX ISDT1DT ISDT1DTA
                        ISDTPCM
      CLASS*** series:  CLASSPDT


ABSTRACT

This document describes CLUSPAC, a set of FORTRAN computer programs
for mixture-model cluster analysis (clustering of individuals).  These
programs have been developed by the author in connection with his
teaching, research and consulting activities at UIC and elsewhere.
Object code for the programs is available to the UIC user community on
the mainframe "tigger" via http://www.uic.edu/~slsclove/ids472.  This
document is a User's Manual for those programs.

........................................................................


1. What is CLUSPAC?
--------------------

CLUSPAC is a package of FORTRAN programs for clustering and
classification of (univariate and) multivariate data.  Object code for
the programs is available to the UIC user community via
http://www.uic.edu/~slsclove/ids472.  This document is a User's Manual
for those programs.  The source code is the property of the author
(Copyright 1991 Stanley Louis Sclove).

The following are some of the references giving the methodology on
which the programs are based.

   Ball, G.H., and Hall, D.J. (1967).  A clustering technique for
      summarizing multivariate data.  Behavioral Science 12, 153-155.
      The basic reference for ISODATA.

   Johari, Shyam, and Sclove, Stanley L. (1976).  Partitioning a
      distribution.  Communications in Statistics (A) 5, 133-147.
      Gives optimal class probabilities for the normal (and other)
      distributions.

   Sclove, Stanley L. (1977).  Population mixture models and
      clustering algorithms.  Communications in Statistics (A) 6,
      417-434.
      Gives a probability interpretation for and certain modifications
      to ISODATA.

   Sclove, Stanley L. (1991).  Mixture-Model Cluster Analysis.  CRIM
      Working Paper No. 98-1, December, 1991.  Center for Research in
      Information Management, University of Illinois at Chicago.
      ISODATA.

   Selim, S.Z., and Ismail, M.A. (1984).  K-means type algorithms: a
      generalized convergence theorem and characterization of local
      optimality.  IEEE Trans. PAMI, vol. PAMI-6, no. 1, pp. 81-87.

   Wolfe, J. H. (1970).  Pattern clustering by multivariate mixture
      analysis.  Multivariate Behavioral Research 5, 329-350.
      A basic reference on normal mixture-model clustering.

There are three sets of programs in CLUSPAC, denoted by CLASS***,
ISDT****, and MIX*****.

   ISDT**** series:  clustering by ISODATA (Ball and Hall 1967) as
                     modified by Sclove (1977)

   MIX***** series:  clustering based on a mixture of normal
                     distributions (Wolfe 1970)

   CLASS*** series:  classification (i.e., assignment or allocation)
                     of given observations into k multivariate normal
                     distributions with specified parameters and
                     prior probabilities

The programs have restrictions, which may be modified by relatively
minor changes in some of the FORTRAN statements.  Typical values are
as follows (the limits differ somewhat from program to program; see
the listings in Section 5):

   N, sample size, at most 999;
   K, number of clusters, at most 29;
   ITER, maximum number of iterations, 20.

The programs require various control statements, such as the
following:

   (1) dataset title
   (2) N, in format (2X,I4)
   (3) FMT, in format (18A4), e.g., (1X,F4.1).
       Allow at least one blank at the beginning of FMT: the same
       format is also used for output, where card column 1 (cc1) is
       reserved for carriage control.  Allow a card column (cc) for
       the decimal point on output, whether or not there is one on
       input.
   (4) data, in format specified by FMT

Thus, some knowledge of FORTRAN is helpful for running the programs.


2. Output of a Program
-----------------------

Here is output of a run of one of the programs (ISDTPCM).

........................................................................

PROGRAM ISDTPCM CLUSPAC
FOR CLUSTERING MULTIVARIATE DATA
USING DISTANCE IN THE METRIC OF THE COVARIANCE MATRIX

DEVELOPED AND PROGRAMMED BY DR. STANLEY L. SCLOVE

VERSION 5.4   19-SEP-91
PREVIOUS UPDATE:  VERSION 5.3   15-MAR-88

COPYRIGHT (C) 1991 STANLEY LOUIS SCLOVE.  ALL RIGHTS RESERVED.

CMS DSN = IQ DAT; X's: Language IQ; Nonlanguage IQ

N =   23
NUMBER OF VARIABLES =  2

MINIMUM OF EACH VARIABLE:    80.0   80.0
MAXIMUM OF EACH VARIABLE:   119.0  124.0

K =  3 CLUSTERS

INITIAL MEANS
     90.0   90.0
    110.0  110.0
    100.0  120.0

CLUSTERING: CASES AND LABELS:--
    1 1    2 1    3 1    4 1    5 1    6 1    7 1    8 1
    9 1   10 1   11 1   12 1   13 3   14 3   15 2   16 1
   17 3   18 2   19 2   20 2   21 2   22 2   23 2

COMMON COVARIANCE MATRIX (MLE):
     37.03615   23.81796
     23.81796   42.21500

DET =  996.18582   IDET = 0   ACTUAL DET. = DET*10**IDET

INVERSE COVARIANCE MATRIX:
      0.04238   -0.02391
     -0.02391    0.03718

MINUS 2 LOG LIKELIHOOD =  289.33276

ITERATION  1

MEAN VECTOR FOR CLUSTER  1:    90.23077   92.53846
MEAN VECTOR FOR CLUSTER  2:   109.14286  107.57143
MEAN VECTOR FOR CLUSTER  3:   102.33333  117.00000

CLUSTERING: CASES AND LABELS:--
    1 1    2 1    3 1    4 1    5 1    6 1    7 1    8 1
    9 1   10 1   11 1   12 1   13 3   14 3   15 2   16 2
   17 3   18 2   19 2   20 2   21 3   22 2   23 2

COMMON COVARIANCE MATRIX (MLE):
     34.52640   26.27640
     26.27640   41.68944

DET =  748.93715   IDET = 0   ACTUAL DET. = DET*10**IDET

INVERSE COVARIANCE MATRIX:
      0.05566   -0.03508
     -0.03508    0.04610

MINUS 2 LOG LIKELIHOOD =  282.77124

ITERATION  2

MEAN VECTOR FOR CLUSTER  1:    89.25000   92.50000
MEAN VECTOR FOR CLUSTER  2:   107.85714  103.85714
MEAN VECTOR FOR CLUSTER  3:   104.50000  117.50000

CLUSTERING: CASES AND LABELS:--
    1 1    2 1    3 1    4 1    5 1    6 1    7 1    8 1
    9 1   10 1   11 1   12 1   13 3   14 3   15 2   16 2
   17 3   18 2   19 2   20 2   21 3   22 2   23 2

COMMON COVARIANCE MATRIX (MLE):
     34.52640   26.27640
     26.27640   41.68944

DET =  748.93715   IDET = 0   ACTUAL DET. = DET*10**IDET

INVERSE COVARIANCE MATRIX:
      0.05566   -0.03508
     -0.03508    0.04610

MINUS 2 LOG LIKELIHOOD =  282.77124

ITERATION  3

MEAN VECTOR FOR CLUSTER  1:    89.25000   92.50000
MEAN VECTOR FOR CLUSTER  2:   107.85714  103.85714
MEAN VECTOR FOR CLUSTER  3:   104.50000  117.50000

CONVERGENCE: NO CASE CHANGED CLUSTERS AFTER ITERATION  3.
RESULTS ARE PRINTED BELOW.

NUMBERS:   12    7    4

COMMON COVARIANCE MATRIX (DIVISOR IS N-K):
     39.70535   30.21785
     30.21785   47.94286

CASE, LABEL / DATA
    1  1      80.0   93.0
    2  1      82.0   91.0
    3  1      84.0   80.0
    4  1      86.0   94.0
    5  1      86.0   92.0
    6  1      89.0   94.0
    7  1      90.0   87.0
    8  1      94.0   87.0
    9  1      94.0   93.0
   10  1      95.0   97.0
   11  1      95.0  100.0
   12  1      96.0  102.0
   13  3      99.0  117.0
   14  3      99.0  110.0
   15  2     102.0  104.0
   16  2     102.0   93.0
   17  3     109.0  124.0
   18  2     104.0  103.0
   19  2     105.0   96.0
   20  2     105.0  100.0
   21  3     111.0  119.0
   22  2     118.0  115.0
   23  2     119.0  116.0

NUMBER OF PARAMETERS =   9
AIC =  300.77124
SCHWARZ CRITERION =  310.99048

PROGRAM ENDED NORMALLY.

........................................................................


3. Table: Directory of CLUSPAC
--------------------------------

The acronyms used to name the programs are combinations of letters
denoting the likelihood function used, the number of variables, the
metric, and the mode.  The likelihood is either conditional mixture
model (ISDT), standard mixture model (MIX), or classification (CLASS).
The number of variables is indicated as either 1 or P.  The metric is
either common (CM), different (DF), determinant (DT) or Euclidean (EU).
The mode is either automatic (A) or not.  This is illustrated further
in the table.

TABLE: Key to naming of the programs--
       Combination of likelihood, no. of variables, metric and mode

   Likelihood:        classification or mixture
   No. of variables:  1 or P
   Metric:            common, different, determinant or Euclidean
   Mode:              automatic or not

--------------------------------------------------------------------
CLUSTERING PROGRAMS

LIKELIHOOD                            MODEL
                -----------------------------------------------------
                                                              Multi-
                Common    Determinant  Different  Euclidean   nomial*
                ------    -----------  ---------  ---------   -------
Classification
(conditional
mixture)
likelihood      ISDTxCM   ISDTxDT      ISDTxDF    ISDTxEU

Mixture
(standard
mixture)
likelihood      MIXxCM    MIXxDT       MIXxDF     MIXxEU      MIXxMUL
---------------------------------------------------------------------
*Not yet programmed.

x = P for multivariate versions
  = 1 for univariate versions

CLASSIFICATION PROGRAMS

Classification  CLASSxCM  CLASSxDT     CLASSxDF   CLASSxEU
--------------------------------------------------------------------


4. Using CLUSPAC through VS FORTRAN
------------------------------------

To run the programs in an IBM CMS environment under VS FORTRAN:--

The CLUSPAC object programs have filetype TEXT.  If you want to run
the program with dataset name fn TEXT fm, type

     FORTVLG fn TEXT fm (DATA fn2

where the data to run the program have been collected into a dataset
with filename fn2 and filetype DATA, according to instructions given
in the next section.

The above command will send the output to your terminal screen.  If
you want the output to go to your disk, use the OUTPUT option, as
follows.

     FORTVLG fn TEXT fm (DATA fn2 OUTPUT

The output will go to a file called fn FT06F001.

For example, if you want to run ISDTPCM, and the ISOPAC disk has been
accessed with filemode M, and the data are in INISPCM DATA, and you
want the output to go to your disk, then the command is as follows.

     FORTVLG ISDTPCM TEXT M (DATA INISPCM OUTPUT

The output will go to a file called ISDTPCM FT06F001 on your A-disk.
You can then rename the file with a more descriptive name, for example
as follows.

     RENAME ISDTPCM FT06F001 A INISPCM ISDTPCM A

Here is sample input for the program ISDTPCM.  (To use the above
FORTVLG command, these data would have been saved into the dataset
INISPCM DATA referred to after the parenthesis in the DATA option.)

TABLE. Sample input for the program ISDTPCM

CMS DSN = IQ DAT; X's: Language IQ; Nonlanguage IQ
N=0023
IP=02
( 8X,F6.1, F6.1)
          80.0  93.0
          82.0  91.0
          84.0  80.0
          86.0  94.0
          86.0  92.0
          89.0  94.0
          90.0  87.0
          94.0  87.0
          94.0  93.0
          95.0  97.0
          95.0 100.0
          96.0 102.0
          99.0 117.0
          99.0 110.0
         102.0 104.0
         102.0  93.0
         109.0 124.0
         104.0 103.0
         105.0  96.0
         105.0 100.0
         111.0 119.0
         118.0 115.0
         119.0 116.0
K=03
          90.0  90.0
         100.0 100.0
         110.0 110.0


5. Lists of Program Descriptions and Control Statements
--------------------------------------------------------

Listings of some of the comment statements for the existing programs
follow.  These include program descriptions and lists of control
statements required to run the programs.

Two subroutines (DMATEQ and DMATDT) from the old UICC Subroutine
Library are called.  (UICC--University of Illinois at Chicago
Circle--was the former name of UIC--University of Illinois at
Chicago.)  The UICC Subroutine Library is a public-domain FORTRAN
subroutine library.  The subroutines are compiled with the object-code
versions of the programs so that they are self-contained and ready to
run.
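
The ISDTPCM sample input shown in Section 4 can also be assembled by a
short script rather than typed by hand.  The Python sketch below is
only an illustration and is not part of CLUSPAC.  The function name
isdtpcm_deck is hypothetical, and two things are assumed that this
manual does not state explicitly for ISDTPCM: that each control
statement occupies its own record, and that the initial means are read
with the same format as the data.

```python
def isdtpcm_deck(title, data, initial_means):
    """Return the records of an ISDTPCM-style input dataset as strings.

    Assumed layout (one record per control statement): title, N, IP,
    the FORTRAN format, the data, K, and the K initial mean vectors.
    """
    n, ip, k = len(data), len(data[0]), len(initial_means)

    def record(values):
        # Matches the format (8X,F6.1,...): 8 blanks, then 6-column fields.
        return " " * 8 + "".join(f"{v:6.1f}" for v in values)

    deck = [
        title,
        f"N={n:04d}",                             # read as (2X,I4)
        f"IP={ip:02d}",                           # read as (3X,I2)
        "(8X," + ",".join(["F6.1"] * ip) + ")",   # format of the data records
    ]
    deck += [record(case) for case in data]
    deck.append(f"K={k:02d}")                     # two-digit K, as in the sample
    deck += [record(mean) for mean in initial_means]
    return deck


if __name__ == "__main__":
    cases = [(80.0, 93.0), (82.0, 91.0), (84.0, 80.0)]  # first three IQ cases
    means = [(90.0, 90.0), (100.0, 100.0)]
    print("\n".join(isdtpcm_deck("IQ subset", cases, means)))
```

Writing the records with fixed-width fields, rather than separating
values with arbitrary blanks, keeps every value in the columns that
the FORTRAN format expects.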

MIX1CM:

CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
C
C     CMS DSN = MIX1CM CLUSPAC
C
C     "MIX1CM CLUSPAC" IS A PROGRAM FOR CLUSTERING UNIVARIATE DATA
C     (DATA ON THE LINE) BY ITERATIVE MAXIMIZATION OF THE
C     MIXTURE-MODEL LIKELIHOOD
C
C                  N     K
C                 ---    --
C            L =  | |    >   P(C)*F(X(I)|C)
C                 | |    --
C                 I=1   C=1
C
C     REFERENCE:
C
C        WOLFE (1970).
C
C     MANUAL MODE:  NUMBER OF CLUSTERS AND INITIAL MEANS ARE
C     INPUT.  (USE PROGRAM MIX1CMA FOR AUTOMATIC SETTING OF
C     NUMBERS OF CLUSTERS AND INITIAL MEANS.)
C
C     PROGRAMMED BY
C        DR. STANLEY L. SCLOVE            312/996-2681
C        DEPARTMENT OF
C        INFORMATION & DECISION SCIENCES  M/C 294
C        COLLEGE OF BUSINESS ADMINISTRATION
C        UNIVERSITY OF ILLINOIS AT CHICAGO
C        BOX 4348
C        CHICAGO, IL  60680-4348
C
C     VERSION 3.3   31-MAR-91
C
C     COPYRIGHT 1991 STANLEY LOUIS SCLOVE.
C
C     RESTRICTIONS (CAN BE MODIFIED):
C        N, SAMPLE SIZE, AT MOST 999;
C        K, NUMBER OF CLUSTERS, AT MOST 29;
C        ITER, MAXIMUM NUMBER OF ITERATIONS, 20.
C
C     CONTROL CARDS:
C
C     (1) DATASET TITLE
C     (2) N, IN FORMAT (2X,I4)
C     (3) FMT, IN FORMAT (18A4), E.G., (1X,F4.1).
C         ALLOW AT LEAST ONE BLANK IN FMT: IT WILL ALSO BE USED
C         FOR OUTPUT, WHERE CC1 IS FOR CARRIAGE CONTROL.
C         ALLOW A CC FOR THE DECIMAL POINT ON OUTPUT,
C         WHETHER OR NOT THERE IS ONE ON INPUT.
C     (4) DATA, IN FORMAT SPECIFIED BY FMT
C     (5) K, NUMBER OF CLUSTERS, IN FORMAT (2X,I1)
C     (6) K INITIAL VALUES OF PRIOR PROBABILITIES AND MEANS,
C         IN FORMAT (5X,F3.2,2X,F8.2).
C     (7) INITIAL VALUE OF THE VARIANCE.  (THE RESULTS DO NOT
C         DEPEND UPON THIS VALUE IF THE INITIAL PRIOR PROBABILITIES
C         ARE EQUAL.)
C         FORMAT IS (5X,F8.2).
C
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC

Sample input for MIX1CM:

3 on 2: Starting with 3 groups when there are really 2 clusters
N=0040
(1X,F2.0)
  1
  1
  1
  1
  1
  2
  2
  2
  2
  2
  2
  2
  2
  2
  2
  3
  3
  3
  3
  3
  5
  5
  5
  5
  5
  6
  6
  6
  6
  6
  6
  6
  6
  6
  6
  7
  7
  7
  7
  7
K=3
     .27      2.00   initial values of P(1), MEAN1
     .46      6.00   initial values of P(2), MEAN2
     .27     10.00   initial values of P(3), MEAN3
         1.00        initial value of variance
/*


MIX1CMA:

CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
C
C     CMS DSN = MIX1CMA CLUSPAC
C
C     THE PROGRAMS MIX* CLUSPAC IN ISOPAC ARE FOR CLUSTERING DATA
C     BY ITERATIVE MAXIMIZATION OF THE MIXTURE-MODEL LIKELIHOOD
C
C                  N     K
C                 ---    --
C            L =  | |    >   P(C)*F(X(I)|C),
C                 | |    --
C                 I=1   C=1
C
C     WHERE
C
C        N      = NUMBER OF OBSERVATIONS ("SAMPLE SIZE"),
C        K      = NUMBER OF CLUSTERS,
C        X(I)   = I-TH OBSERVATION, I = 1,2,...,N,
C        F(X|C) = VALUE AT X OF THE C-TH CLASS-CONDITIONAL
C                 DENSITY FUNCTION (C=1,2,...,K)
C     AND
C        P(C)   = PRIOR PROBABILITY OF CLASS C.
C
C     REFERENCE FOR CLUSTERING BY MIXTURE MODEL:
C
C        Wolfe, J. H. (1970).  Pattern clustering by multivariate
C        mixture analysis.  Multivariate Behavioral Research 5, 329-350.
C
C     THE "1" IN THE PROGRAM NAME "MIX1CMA ISOPAC" MEANS THAT
C     THE PROGRAM IS FOR UNIVARIATE (1-DIMENSIONAL) DATA
C     (DATA ON THE LINE); THE "CM" MEANS THAT A COMMON VARIANCE IS
C     ASSUMED ACROSS CLUSTERS; AND THE "A" MEANS THAT THERE IS
C     AUTOMATIC SETTING OF NUMBERS OF CLUSTERS AND INITIAL MEANS.
C
C     VERSION 1.2   31-MAR-91
C
C     COPYRIGHT 1991 STANLEY LOUIS SCLOVE.
C
C     RESTRICTIONS (CAN BE MODIFIED):
C        N, SAMPLE SIZE, AT MOST 999;
C        K, NUMBER OF CLUSTERS, AT MOST 29;
C        ITER, MAXIMUM NUMBER OF ITERATIONS, 20.
C
C     CONTROL CARDS:
C
C     (1) DATASET TITLE
C     (2) N, IN FORMAT (2X,I4)
C     (3) FMT, IN FORMAT (18A4), E.G., (1X,F4.1).
C         ALLOW AT LEAST ONE BLANK IN FMT: IT WILL ALSO BE USED
C         FOR OUTPUT, WHERE CC1 IS FOR CARRIAGE CONTROL.
C         ALLOW A CC FOR THE DECIMAL POINT ON OUTPUT,
C         WHETHER OR NOT THERE IS ONE ON INPUT.
C     (4) DATA, IN FORMAT SPECIFIED BY FMT
C
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC


MIX1DT:

CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
C
C     CMS DSN = MIX1DT ISOPAC
C
C     "MIX1DT ISOPAC" IS A PROGRAM FOR CLUSTERING UNIVARIATE DATA
C     (DATA ON THE LINE) BY ITERATIVE MAXIMIZATION OF THE
C     MIXTURE-MODEL LIKELIHOOD
C
C                  N     K
C                 ---    --
C            L =  | |    >   P(C)*F(X(I)|C)
C                 | |    --
C                 I=1   C=1
C
C     REFERENCE:
C
C        Wolfe, J. H. (1970).  Pattern clustering by multivariate
C        mixture analysis.  Multivariate Behavioral Research 5, 329-350.
C
C     MANUAL MODE:  NUMBER OF CLUSTERS AND INITIAL MEANS ARE
C     INPUT.  (USE PROGRAM MIX1DTA FOR AUTOMATIC SETTING OF
C     NUMBERS OF CLUSTERS AND INITIAL MEANS.)
C
C     VERSION 1.0   3-NOV-89
C
C     COPYRIGHT 1991 STANLEY LOUIS SCLOVE.
C
C     RESTRICTIONS (CAN BE MODIFIED):
C        N, SAMPLE SIZE, AT MOST 999;
C        K, NUMBER OF CLUSTERS, AT MOST 29;
C        ITER, MAXIMUM NUMBER OF ITERATIONS, 20.
C
C     CONTROL CARDS:
C
C     (1) DATASET TITLE
C     (2) N, IN FORMAT (2X,I4)
C     (3) FMT, IN FORMAT (18A4), E.G., (1X,F4.1).
C         ALLOW AT LEAST ONE BLANK IN FMT: IT WILL ALSO BE USED
C         FOR OUTPUT, WHERE CC1 IS FOR CARRIAGE CONTROL.
C         ALLOW A CC FOR THE DECIMAL POINT ON OUTPUT,
C         WHETHER OR NOT THERE IS ONE ON INPUT.
C     (4) DATA, IN FORMAT SPECIFIED BY FMT
C     (5) K, NUMBER OF CLUSTERS, IN FORMAT (2X,I1)
C     (6) K INITIAL VALUES OF PRIOR PROBABILITIES AND MEANS,
C         IN FORMAT (5X,F3.2,2X,F8.2).
C     (7) INITIAL VALUE OF THE VARIANCE.  (THE RESULTS DO NOT
C         DEPEND UPON THIS VALUE IF THE INITIAL PRIOR PROBABILITIES
C         ARE EQUAL.)
C
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC


MIX1DTA:

CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
C
C     CMS DSN = MIX1DTA CLUSPAC
C     VERSION 1.1   05-NOV-89
C     COPYRIGHT 1991 STANLEY LOUIS SCLOVE.
C
C     THE PROGRAMS MIX* ISOPAC ARE FOR CLUSTERING DATA BY
C     ITERATIVE MAXIMIZATION OF THE MIXTURE-MODEL LIKELIHOOD
C
C                  N     K
C                 ---    --
C            L =  | |    >   P(C)*F(X(I)|C),
C                 | |    --
C                 I=1   C=1
C     WHERE
C
C        N      = NUMBER OF OBSERVATIONS ("SAMPLE SIZE"),
C        K      = NUMBER OF CLUSTERS,
C        X(I)   = I-TH OBSERVATION, I = 1,2,...,N,
C        F(X|C) = VALUE AT X OF THE C-TH CLASS-CONDITIONAL
C                 DENSITY FUNCTION (C=1,2,...,K)
C     AND
C        P(C)   = PRIOR PROBABILITY OF CLASS C.
C
C     REFERENCE FOR CLUSTERING BY MIXTURE MODEL:
C
C        Wolfe, J. H. (1970).  Pattern clustering by multivariate
C        mixture analysis.  Multivariate Behavioral Research 5, 329-350.
C
C     IN THE PROGRAM NAME
C
C        MIX1DTA
C
C     THE "1" MEANS THAT THE PROGRAM IS FOR UNIVARIATE
C     (1-DIMENSIONAL) DATA;
C
C     THE "DT" MEANS THE VARIANCES ARE ALLOWED TO VARY ACROSS
C     CLUSTERS;
C
C     AND THE "A" MEANS THAT THERE IS AUTOMATIC SETTING OF NUMBERS
C     OF CLUSTERS AND INITIAL MEANS.
C
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
C
C     RESTRICTIONS (CAN BE MODIFIED):
C        N, SAMPLE SIZE, AT MOST 999;
C        K, NUMBER OF CLUSTERS, AT MOST 29;
C        ITER, MAXIMUM NUMBER OF ITERATIONS, 20.
C
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
C
C     CONTROL CARDS:
C
C     (1) DATASET TITLE
C     (2) N, IN FORMAT (2X,I4)
C     (3) FMT, IN FORMAT (18A4), E.G., (1X,F4.1).
C         ALLOW AT LEAST ONE BLANK IN FMT: IT WILL ALSO BE USED
C         FOR OUTPUT, WHERE CC1 IS FOR CARRIAGE CONTROL.
C         ALLOW A CC FOR THE DECIMAL POINT ON OUTPUT,
C         WHETHER OR NOT THERE IS ONE ON INPUT.
C     (4) DATA, IN FORMAT SPECIFIED BY FMT
C     (5) K, NUMBER OF CLUSTERS, IN FORMAT (2X,I1)
C     (6) K INITIAL VALUES OF PRIOR PROBABILITIES AND MEANS,
C         IN FORMAT (5X,F3.2,2X,F8.2).
C     (7) INITIAL VALUE OF THE VARIANCE.  (THE RESULTS DO NOT
C         DEPEND UPON THIS VALUE IF THE INITIAL PRIOR PROBABILITIES
C         ARE EQUAL.)
C
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC


MIXPCM:

CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
C
C     CMS NAME OF PROGRAM:  MIXPCM ISOPAC
C     VERSION 7.4   07-DEC-89
C
C     COPYRIGHT 1991 STANLEY LOUIS SCLOVE.
C
C     ISOPAC is a set of programs implementing clustering
C     algorithms derived under the assumption of Gaussian
C     class-conditional distributions.  The ISDT* programs in
C     ISOPAC are based on the so-called "classification"
C     likelihood.  The MIX* programs are based on the
C     mixture-model likelihood.
C
C     Program MIXPCM (MIXture model, P-variate data, CoMmon
C                     ***            *               * *
C     covariance matrix) in the ISOPAC package is one of the
C     mixture-model programs for clustering multivariate data.
C     (For univariate data the "MIX1" programs may be used.)
C     MIXPCM assumes a common covariance matrix across
C     distributions.  MIXPDT allows different covariance
C     matrices.
C
C     Input:
C     -----
C     Number of clusters (K), initial values of means, prior
C     probabilities, and common covariance matrix.  If desired,
C     program ISDTPCM.ISOPAC can be used to obtain these initial
C     values.  Use program MIXPCMA.ISOPAC to try a range of
C     numbers of clusters (values of K), with automatic setting
C     of initial values.
C
C     Program restrictions (can be modified):
C     --------------------------------------
C        N, sample size, at most 1000;
C        IP, number of variables, at most 20;
C        K, number of clusters, at most 29;
C        ITER, maximum number of iterations, 20.
C
C     Subroutines called:
C        MATEQ, which calls MATDT
C
C     IV is a work array for subroutine MATEQ.
C
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
C
C     CONTROL CARDS:
C
C     DATASET TITLE
C     N, IN FORMAT (2X,I4)
C     IP, IN FORMAT (3X,I2)
C     DATFMT, IN FORMAT (18A4), E.G., (4F4.1)
C        "DATFMT" WILL ALSO BE USED FOR OUTPUT, SO ALLOW AT LEAST
C        ONE BLANK AT THE BEGINNING FOR CARRIAGE CONTROL.
C     DATA, ONE CASE AT A TIME, IN FORMAT SPECIFIED BY DATFMT
C     K, NUMBER OF CLUSTERS, IN FORMAT (2X,I2)
C     MEANFT, in format (18A4)
C     K INITIAL MEANS, IN FORMAT SPECIFIED BY MEANFT
C
C     K INITIAL VALUES OF PRIOR PROBABILITIES,
C        IN FORMAT (5X,F3.2).
C
C     COVFMT, in format (18A4).
C     Initial value of VARHAT, the common covariance matrix.
C
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC


MIXPCMA:

CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
C
C     CMS FILE NAME OF PROGRAM:  MIXPCMA CLUSPAC
C     VERSION 4.5   31-MAR-91
C
C     COPYRIGHT 1991 STANLEY LOUIS SCLOVE.
C
C     ISOPAC consists of the CLUSPAC clustering programs,
C     the TSPAC time-series segmentation programs and the
C     IMPAC image-segmentation programs.
C
C     CLUSPAC is a set of programs implementing clustering
C     algorithms derived under the assumption of Gaussian
C     class-conditional distributions.  The ISDT* programs in
C     CLUSPAC are based on the so-called "classification"
C     likelihood.  The MIX* programs are based on the
C     mixture-model likelihood.
C
C     Program MIXPCMA in the CLUSPAC package is one of the
C     mixture-model programs for clustering multivariate data.
C     (For univariate data the "MIX1" programs may be used.)
C     MIXPCM and MIXPCMA assume a common covariance matrix
C     across distributions.  MIXPDT and MIXPDTA allow different
C     covariance matrices.
C     MIXPCMA tries a range of numbers of clusters, with
C     automatic setting of initial values of the parameters.
C
C     ..............................................................
C
C     RESTRICTIONS (CAN BE MODIFIED):
C        N, SAMPLE SIZE, AT MOST 1000;
C        IP, NUMBER OF VARIABLES, AT MOST 20;
C        K, NUMBER OF CLUSTERS, AT MOST 29;
C        ITER, MAXIMUM NUMBER OF ITERATIONS, 20.
C
C     SUBROUTINE(S) CALLED:
C        MATEQ, WHICH CALLS MATDT
C     ..............................................................
C
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
C
C     CONTROL CARDS:
C
C     DATASET TITLE
C     N, IN FORMAT (2X,I4)
C     IP, IN FORMAT (3X,I2)
C     INPFMT, IN FORMAT (18A4), E.G., (4F4.1,1X,A6)
C     DATFMT, IN FORMAT (18A4), E.G., (4F4.1)
C        "DATFMT" WILL ALSO BE USED FOR OUTPUT, SO ALLOW AT LEAST
C        ONE BLANK AT THE BEGINNING FOR CARRIAGE CONTROL.
C     DATA, ONE CASE AT A TIME, IN FORMAT SPECIFIED BY DATFMT
C
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
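
The MIX* listings refer to the mixture-model likelihood L, and the
ISDTPCM run in Section 2 reports MINUS 2 LOG LIKELIHOOD together with
AIC and the Schwarz criterion.  As a rough illustration of how these
quantities relate, the Python sketch below (not part of CLUSPAC; the
function names are hypothetical) evaluates minus twice the log of L
for univariate data with a common variance, the case treated by
MIX1CM, and computes the two criteria from their usual definitions,
AIC = -2 log L + 2m and Schwarz = -2 log L + m log N, where m is the
number of parameters.  That the programs use exactly these definitions
is an assumption; the AIC printed in Section 2 is consistent with it.

```python
import math


def minus2_log_likelihood(data, priors, means, variance):
    """-2 log L for a univariate normal mixture with a common variance."""
    total = 0.0
    for x in data:
        # Mixture density at x: sum over clusters C of P(C) * F(x | C).
        density = sum(
            p * math.exp(-((x - m) ** 2) / (2.0 * variance))
            / math.sqrt(2.0 * math.pi * variance)
            for p, m in zip(priors, means)
        )
        total += math.log(density)
    return -2.0 * total


def information_criteria(minus2_log_lik, n_params, n_cases):
    """Return (AIC, Schwarz criterion) under the standard definitions."""
    aic = minus2_log_lik + 2.0 * n_params
    schwarz = minus2_log_lik + n_params * math.log(n_cases)
    return aic, schwarz


if __name__ == "__main__":
    # Values printed by the ISDTPCM run in Section 2.
    aic, schwarz = information_criteria(282.77124, n_params=9, n_cases=23)
    print(f"AIC = {aic:.5f}, Schwarz criterion = {schwarz:.5f}")
```

With the Section 2 values (minus 2 log likelihood 282.77124, 9
parameters, N = 23) this gives AIC = 300.77124, as printed, and a
Schwarz criterion of approximately 310.99, close to the printed
310.99048.  When several numbers of clusters are tried, as the
automatic programs do, such criteria give one way to compare the runs;
smaller values are preferred.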

MIXPDT:

CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
C
C     CMS FILE:  MIXPDT CLUSPAC
C     VERSION 2.2   15-NOV-89
C
C     CLUSPAC is a set of programs implementing clustering
C     algorithms derived under the assumption of Gaussian
C     class-conditional distributions.  The ISDT* programs in
C     CLUSPAC are based on the so-called "classification"
C     likelihood.  The MIX* programs are based on the
C     mixture-model likelihood.
C
C     Program MIXPDT in CLUSPAC is one of the
C     mixture-model programs for clustering multivariate data.
C     (For univariate data the "MIX1" programs may be used.)
C     MIXPDT allows different covariance matrices;
C     MIXPCM assumes a common covariance matrix across
C     distributions.
C
C     Input:
C     -----
C     Number of clusters (K), initial values of means, prior
C     probabilities, and covariance matrices.  If desired,
C     program ISDTPCM.CLUSPAC can be used to obtain these initial
C     values.  Use program MIXPDTA.CLUSPAC to try a range of
C     numbers of clusters (values of K), with automatic setting
C     of initial values.
C
C     Program restrictions (can be modified):
C     --------------------------------------
C        N, sample size, at most 1000;
C        IP, number of variables, at most 20;
C        K, number of clusters, at most 29;
C        ITER, maximum number of iterations, 20.
C
C     Subroutines called:
C        MATEQ, which calls MATDT
C
C     IV is a work array for subroutine MATEQ.
C
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
C
C     CONTROL CARDS:
C
C     DATASET TITLE
C     N, IN FORMAT (2X,I4)
C     IP, IN FORMAT (3X,I2)
C     DATFMT, IN FORMAT (18A4), E.G., (4F4.1)
C        "DATFMT" WILL ALSO BE USED FOR OUTPUT, SO ALLOW AT LEAST
C        ONE BLANK AT THE BEGINNING FOR CARRIAGE CONTROL.
C     DATA, ONE CASE AT A TIME, IN FORMAT SPECIFIED BY DATFMT
C     K, NUMBER OF CLUSTERS, IN FORMAT (2X,I2)
C     MEANFT, in format (18A4)
C     K INITIAL MEANS, IN FORMAT SPECIFIED BY MEANFT
C
C     K INITIAL VALUES OF MIXING PROBABILITIES,
C        IN FORMAT (5X,F3.2).
C
C     COVFMT, in format (18A4).
C     Initial value of SIGMA(1,JV,JW), JV,JW=1,IP, cov mx of Pop 1
C     Initial value of SIGMA(2,JV,JW), JV,JW=1,IP, cov mx of Pop 2
C        .
C        .
C        .
C     Initial value of SIGMA(K,JV,JW), JV,JW=1,IP, cov mx of Pop K
C
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC


MIXPDTA:

CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
C
C     CLUSPAC:  CLUSTER ANALYSIS OF CASES BY MIXTURE MODEL
C
C     PROGRAM MIXPDTA CLUSPAC:
C
C        VARYING COVARIANCE MATRICES
C        AUTOMATIC SETTING OF INITIAL PARAMETER ESTIMATES
C
C     Programs are organized as
C     programs within packages within libraries.
C     PROGRAM: MIXPDTA   PACKAGE: CLUSPAC   LIBRARY: ISOPAC
C
C     DOCUMENTATION (on my public disk for clustering programs):
C        DIRETABL CLUSPAC, USERMAN CLUSPAC
C
C     CMS FILE:  MIXPDTA CLUSPAC
C     VERSION 2.92   21-Apr-91
C
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
C
C     ISOPAC consists of the CLUSPAC clustering programs,
C     the TSPAC time-series segmentation programs and the
C     IMPAC image-segmentation programs.
C
C     CLUSPAC is a set of programs implementing clustering
C     algorithms derived under the assumption of Gaussian
C     class-conditional distributions.  The ISDT* programs in
C     CLUSPAC are based on the so-called "classification"
C     likelihood.  The MIX* programs are based on the
C     mixture-model likelihood.
C
C     Programs MIXP** in the CLUSPAC package are
C     mixture-model programs for clustering multivariate data.
C     (For univariate data the "MIX1**" programs may be used.)
C     MIXPCM and MIXPCMA assume a common covariance matrix
C     across distributions.  MIXPDT and MIXPDTA allow different
C     covariance matrices.
C     MIXPCMA and MIXPDTA try a range of numbers of clusters,
C     with automatic setting of initial values of the parameters.
C
C     RESTRICTIONS (CAN BE MODIFIED):
C
C        N, SAMPLE SIZE, AT MOST 1000;
C        IP, NUMBER OF VARIABLES, AT MOST 20;
C        K, NUMBER OF CLUSTERS, AT MOST 29;
C        ITER, MAXIMUM NUMBER OF ITERATIONS, 30.
C
C     SUBROUTINE(S) CALLED:
C        MATEQ, WHICH CALLS MATDT
C
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
C
C     CONTROL CARDS:
C
C     DATASET TITLE
C     N, IN FORMAT (2X,I4)
C     IP, IN FORMAT (3X,I2)
C     INPFMT, IN FORMAT (18A4), E.G., (4F4.1,1X,A6)
C     DATFMT, IN FORMAT (18A4), E.G., (4F4.1)
C        "DATFMT" WILL ALSO BE USED FOR OUTPUT, SO ALLOW
C        AT LEAST ONE BLANK AT THE BEGINNING FOR CARRIAGE CONTROL.
C     DATA, ONE CASE AT A TIME, IN FORMAT SPECIFIED BY DATFMT
C     ..............................................................
C
C     INPFMT allows for a last variable which is an alphanumeric
C     case label.
C
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC


End MIX*** series.


Begin ISDT*** series.


ISDT1CM:

CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
C
C     CMS DSN = ISDT1 ISOPAC
C
C     PROGRAM ISDT1
C     FOR CLUSTERING UNIVARIATE DATA (CLUSTERING ON THE LINE).
C     MANUAL MODE:  NUMBER OF CLUSTERS AND INITIAL MEANS ARE
C     INPUT.
C     USE PROGRAM ISDT1A FOR AUTOMATIC SETTING OF
C     NUMBER OF CLUSTERS AND INITIAL MEANS.
C
C
C     LAST UPDATE: VERSION 2.5   9-JUN-87
C     NEXT-TO-LAST UPDATE: VERSION 2.4   1-AUG-83
C
C     COPYRIGHT 1991 STANLEY LOUIS SCLOVE.
C
C
C     RESEARCH SUPPORTED IN PART BY:
C        ONR CONTRACT N00014-80-C-0408, TASK NR042-443
C        ARO CONTRACT DAAG29-82-K-0155
C
C     RESTRICTIONS (CAN BE MODIFIED):
C        N, SAMPLE SIZE, AT MOST 9999;
C        K, NUMBER OF CLUSTERS, AT MOST 9;
C        ITER, MAXIMUM NUMBER OF ITERATIONS, 20.
C
C
C     CONTROL CARDS:
C
C     DATASET TITLE
C     N, IN FORMAT (2X,I4)
C     FMT, IN FORMAT (18A4), E.G., (1X,F4.1)
C        ALLOW AT LEAST ONE LEADING BLANK IN FMT: IT WILL ALSO BE USED
C        FOR OUTPUT.
C     DATA, IN FORMAT SPECIFIED BY FMT
C     K, NUMBER OF CLUSTERS, IN FORMAT (2X,I1)
C     K INITIAL MEANS, IN FORMAT SPECIFIED BY FMT
C
C
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC

ISDT1CMA:
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
C
C
C     PROGRAM ISDT1A
C     FOR CLUSTERING UNIVARIATE DATA (CLUSTERING ON THE LINE).
C     AUTOMATIC MODE: NUMBER OF CLUSTERS AND INITIAL MEANS ARE
C     SET AUTOMATICALLY.
C
C
C     VERSION 2.3   1-FEB-83
C
C     RESEARCH SUPPORTED IN PART BY:
C        ONR CONTRACT N00014-80-C-0408, TASK NR042-443
C        ARO CONTRACT DAAG29-82-K-0155
C
C     RESTRICTIONS (CAN BE MODIFIED):
C        N, SAMPLE SIZE, AT MOST 9999;
C        K, NUMBER OF CLUSTERS, AT MOST 29;
C        ITER, MAXIMUM NUMBER OF ITERATIONS, 20.
C
C
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
C
C
C
      DIMENSION X(9999),D(29),C(29),F(9999),ICLUS(9999),IOTA(9999)
      DIMENSION SUM(29)
      DIMENSION TITLE(18)
      DIMENSION B(29),NG(29),XMEAN(29)
      DIMENSION FMT(18)
      DIMENSION SS(29),SSD(29)
      DIMENSION SD(29)
      DIMENSION VAR(29)
      DIMENSION ICLSOL(9999)
      DOUBLE PRECISION SUM,SS,SSD,SD
C
C
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
C
C
C     CONTROL CARDS:
C
C     DATASET TITLE, IN FORMAT (18A4)
C     N, IN FORMAT (2X,I4)
C     FMT, IN FORMAT (18A4), E.G., (F4.1)
C     DATA, IN FORMAT SPECIFIED BY FMT
C
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC

ISDT1CMX:
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
C
C
C     CMS DSN = ISDT1 ISOPAC
C
C
C     PROGRAM ISDT1
C     FOR CLUSTERING UNIVARIATE DATA (CLUSTERING ON THE LINE).
C     MANUAL MODE: NUMBER OF CLUSTERS AND INITIAL MEANS ARE
C     INPUT.  USE PROGRAM ISDT1A FOR AUTOMATIC SETTING OF
C     NUMBER OF CLUSTERS AND INITIAL MEANS.
C
C
C     LAST UPDATE: VERSION 2.5   9-JUN-87
C     NEXT-TO-LAST UPDATE: VERSION 2.4   1-AUG-83
C
C     COPYRIGHT 1991 STANLEY LOUIS SCLOVE.
C
C     RESEARCH SUPPORTED IN PART BY:
C        ONR CONTRACT N00014-80-C-0408, TASK NR042-443
C        ARO CONTRACT DAAG29-82-K-0155
C
C     RESTRICTIONS (CAN BE MODIFIED):
C        N, SAMPLE SIZE, AT MOST 9999;
C        K, NUMBER OF CLUSTERS, AT MOST 9;
C        ITER, MAXIMUM NUMBER OF ITERATIONS, 20.
C
C
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
C
      DIMENSION X(9999),F(9999),ICLUS(9999),IOTA(9999)
      DIMENSION D(29),C(29),SUM(29)
      DIMENSION TITLE(18)
      DIMENSION B(29),NG(29),XMEAN(29)
      DIMENSION SS(29),SSD(29)
      DIMENSION SD(29)
      DIMENSION VAR(29)
      DIMENSION ICLSOL(9999)
      DOUBLE PRECISION SUM,SS
C
C
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
C
C     CONTROL CARDS:
C
C     DATASET TITLE
C     N, IN FORMAT (2X,I4)
C
C
C     FOR OUTPUT.
C     DATA
C     K, NUMBER OF CLUSTERS, IN FORMAT (2X,I1)
C     K INITIAL MEANS
C
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC

ISDT1DT:
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
C
C
C     CMS DSN = ISDT1DT CLUSPAC
C
C
C     PROGRAM ISDT1DT OF ISOPAC IS
C     FOR CLUSTERING UNIVARIATE DATA (CLUSTERING ON THE LINE).
C     IT ALLOWS DIFFERENT VARIANCES IN CLUSTERS; THE
C     OBSERVATION X IS ASSIGNED TO THAT CLUSTER FOR WHICH
C     LOG VAR + (X-MEAN)**2/VAR IS MINIMAL.
C     MANUAL MODE: NUMBER OF CLUSTERS AND INITIAL MEANS ARE
C     INPUT; USE PROGRAM ISDT1DTA FOR AUTOMATIC SETTING OF
C     NUMBER OF CLUSTERS AND INITIAL MEANS.
C
C
C
C     LAST UPDATE: VERSION 1.0   23-OCT-89
C
C     COPYRIGHT 1991 STANLEY LOUIS SCLOVE.
C
C
C
C     RESTRICTIONS (CAN BE MODIFIED):
C        N, SAMPLE SIZE, AT MOST 9999;
C        K, NUMBER OF CLUSTERS, AT MOST 9;
C        ITER, MAXIMUM NUMBER OF ITERATIONS, 20.
C
C
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
C
C
      DIMENSION X(9999),F(9999),ICLUS(9999),IOTA(9999)
      DIMENSION D(29),C(29),SUM(29)
      DIMENSION TITLE(18)
      DIMENSION BDY1(29),BDY2(29),NG(29),XMEAN(29)
      DIMENSION ROOT1(29),ROOT2(29)
      DIMENSION A(29),B(29)
      DIMENSION SS(29),SSD(29)
      DIMENSION SD(29)
      DIMENSION VAR(29)
      DIMENSION ICLSOL(9999)
      DOUBLE PRECISION SUM,SS
C
C
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
C
C     CONTROL CARDS:
C
C     DATASET TITLE
C     N, IN FORMAT (2X,I4)
C
C
C     FOR OUTPUT.
C     DATA
C     K, NUMBER OF CLUSTERS, IN FORMAT (2X,I1)
C     K INITIAL MEANS
C
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC

ISDT1DTA:
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
C
C
C     CMS DSN = ISDT1DTA ISOPAC
C
C
C     PROGRAM ISDT1DTA OF ISOPAC IS FOR
C     CLUSTERING UNIVARIATE DATA (CLUSTERING ON THE LINE).
C     IT ALLOWS DIFFERENT VARIANCES IN CLUSTERS; THE
C     OBSERVATION X IS ASSIGNED TO THAT CLUSTER FOR WHICH
C     LOG VAR + (X-MEAN)**2/VAR IS MINIMAL.
C     AUTOMATIC MODE: VARYING NUMBER OF CLUSTERS; INITIAL MEANS
C     ARE SET AUTOMATICALLY.
C
C
C     VERSION 1.2   24-OCT-89
C
C
C     RESTRICTIONS (CAN BE MODIFIED):
C        N, SAMPLE SIZE, AT MOST 9999;
C        K, NUMBER OF CLUSTERS, AT MOST 29;
C        ITER, MAXIMUM NUMBER OF ITERATIONS, 20.
C
C
C     CONTROL CARDS:
C     DATASET TITLE, IN FORMAT (18A4)
C     N, IN FORMAT (2X,I4)
C     FMT, IN FORMAT (18A4), E.G., (F4.1)
C     DATA, IN FORMAT SPECIFIED BY FMT
C
C
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC

ISDTPCM:
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
C
C     PROGRAM ISDTPCM CLUSPAC
C     VERSION 5.4   19-SEP-91
C     PREVIOUS UPDATE: VERSION 5.3   15-MAR-88
C     COPYRIGHT 1991 STANLEY LOUIS SCLOVE.
C
C     PROGRAM ISDTPCM (ISoDaTa, P-variate data, CoMmon
C                      ** * *   *               * *
C     covariance matrix) IN THE ISOPAC PACKAGE IS ONE OF THE
C     "ISODATA" PROGRAMS FOR CLUSTERING MULTIVARIATE DATA.  (FOR
C     UNIVARIATE DATA THE "ISDT1" PROGRAMS MAY BE USED.)
C     USE ISDTPCM FOR DISTANCE IN THE METRIC OF THE COMMON
C     COVARIANCE MATRIX AND ISDTPDF FOR DIFFERENT COVARIANCE
C     MATRICES.  ISDTPDT USES DIFFERENT COVARIANCE MATRICES, WITH
C     ADJUSTMENT BY THE DETERMINANTS, I.E., IT IS BASED ON THE
C     GAUSSIAN LIKELIHOOD.
C
C     MANUAL MODE: NUMBER OF CLUSTERS AND INITIAL MEANS ARE
C     INPUT.  USE PROGRAM ISDTPCMA TO TRY A RANGE OF NUMBERS OF
C     CLUSTERS, WITH AUTOMATIC SETTING OF INITIAL MEANS.
C
C     RESTRICTIONS (CAN BE MODIFIED):
C        N, SAMPLE SIZE, AT MOST 1000;
C        IP, NUMBER OF VARIABLES, AT MOST 20;
C        K, NUMBER OF CLUSTERS, AT MOST 29;
C        ITER, MAXIMUM NUMBER OF ITERATIONS, 20.
C
C     SUBROUTINE(S) CALLED:
C        MATEQ, WHICH CALLS MATDT
C
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
C
      DIMENSION X(1000,20),SUM(29,20)
      DIMENSION D(29),ICLUS(1000)
      DIMENSION TITLE(18)
      DIMENSION NG(29),XMEAN(29,20)
      DIMENSION FMT(18)
      DIMENSION SS(29,20,20),SSD(29,20,20)
      DIMENSION WGSS(20,20)
      DIMENSION VARHAT(20,20),WGMS(20,20)
      DIMENSION ICLSOL(1000)
      DIMENSION XMIN(20),XMAX(20)
      DIMENSION IV(20,20)
C     IV IS A WORK ARRAY FOR SUBROUTINE MATEQ.
C
      DIMENSION P(20,20)
      DOUBLE PRECISION SS,SUM
      DOUBLE PRECISION WGSS,SSD
      DOUBLE PRECISION VARHAT
      DOUBLE PRECISION P
      DOUBLE PRECISION DET
      DOUBLE PRECISION D
      DOUBLE PRECISION XMEAN
      DOUBLE PRECISION TEMPIV,TEMPJV
      DOUBLE PRECISION F
      DOUBLE PRECISION CF
C
C
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
C
C     CONTROL CARDS:
C
C     DATASET TITLE
C     N, IN FORMAT (2X,I4)
C     IP, IN FORMAT (3X,I2)
C     FMT, IN FORMAT (18A4), E.G., (4F4.1)
C        "FMT" WILL ALSO BE USED FOR OUTPUT, SO ALLOW AT LEAST ONE
C        BLANK AT THE BEGINNING FOR CARRIAGE CONTROL.
C     DATA, ONE CASE AT A TIME, IN FORMAT SPECIFIED BY FMT
C     K, NUMBER OF CLUSTERS, IN FORMAT (2X,I2)
C     K INITIAL MEANS, IN FORMAT SPECIFIED BY FMT
C
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC

"End ISDT*** series."

"Begin CLASS*** series."

CLASSPDT:
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
C
C
C     CLUSPAC: Computer Programs for Mixture-Model Clustering
C
C     COPYRIGHT 1991 STANLEY LOUIS SCLOVE.
C
C
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
C
C     CMS DSN OF PROGRAM: CLASSPDT ISOPAC
C     VERSION 1.0   03-JUN-88
C
C     PROGRAM CLASSPDT (CLASSification, using DeTerminants
C     of group covariance matrices) IS ONE OF THE "ISOPAC"
C     PROGRAMS FOR CLUSTERING AND CLASSIFICATION OF MULTIVARIATE DATA.
C     (FOR UNIVARIATE DATA THE PROGRAMS DESIGNATED "1" INSTEAD OF "P"
C     MAY BE USED.)
C     CLASSPEU.ISOPAC WILL USE EUCLIDEAN DISTANCE (FOR RESEARCH
C     PURPOSES, NOT RECOMMENDED FOR DATA ANALYSIS).  CLASSPCM WILL
C     USE DISTANCE IN THE METRIC OF THE ESTIMATED COMMON COVARIANCE
C     MATRIX.  CLASSPDF WILL USE DIFFERENT COVARIANCE MATRICES FOR THE
C     CLUSTERS.
C     CLASSPDT USES DIFFERENT COVARIANCE MATRICES, WITH
C     ADJUSTMENT BY THE DETERMINANTS, I.E., IT USES THE ESTIMATED LOG
C     LIKELIHOOD FOR THE GAUSSIAN MODEL WITH DIFFERENT COVARIANCE
C     MATRICES.
C
C
C     PROGRAMMED BY
C        DR. STANLEY L. SCLOVE                 312/996-2681
C        DEPARTMENT                            312/996-2676
C        OF INFORMATION & DECISION SCIENCES    M/C 294
C        COLLEGE OF BUSINESS ADMINISTRATION
C        UNIVERSITY OF ILLINOIS AT CHICAGO
C        BOX 4348, CHICAGO, IL 60680
C
C
C     COPYRIGHT 1991 STANLEY LOUIS SCLOVE.
C
C     Input:
C     -----
C     Format for means, means, format for prior probabilities,
C     prior probabilities; format for covariance matrices,
C     covariance matrices.
C
C
C     MENFMT (input) - is an 18A4 variable which specifies the
C        input format for the means.
C
C     XMEAN(IC) (input) - is a real vector variable which is the
C        mean for the c-th class.
C
C
C
C     RESTRICTIONS (CAN BE MODIFIED):
C        N, SAMPLE SIZE, AT MOST 1000;
C        IP, NUMBER OF VARIABLES, AT MOST 20;
C        K, NUMBER OF CLUSTERS, AT MOST 29;
C
C     SUBROUTINE(S) CALLED:
C        MATEQ, WHICH CALLS MATDT
C
C     IV IS A WORK ARRAY FOR SUBROUTINE MATEQ.
C
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
C
C
C     CONTROL CARDS:
C
C     DATASET TITLE
C     N, IN FORMAT (2X,I4)
C     IP, IN FORMAT (3X,I2)
C     FMT, IN FORMAT (18A4), E.G., (4F4.1)
C        "FMT" WILL ALSO BE USED FOR OUTPUT, SO ALLOW AT LEAST ONE
C        BLANK AT THE BEGINNING FOR CARRIAGE CONTROL.
C     DATA, ONE CASE AT A TIME, IN FORMAT SPECIFIED BY FMT
C     K, NUMBER OF CLUSTERS, IN FORMAT (2X,I2)
C     K MEANS, IN FORMAT SPECIFIED BY FMT
C
C     K COVARIANCE MATRICES
C
C
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC

....................................................................

Prof. Stanley L. Sclove, Ph.D.
Information & Decision Sciences Dept. (MC 294)
College of Business Administration
University of Illinois at Chicago
601 S. Morgan St.
Chicago, IL 60607-7124
Ofc phone 312/996-2681
Dept phone 312/996-2676
e-mail: slsclove@uic.edu
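
To illustrate the control-card layouts listed in this section, here is a hypothetical input deck for ISDT1CM (univariate data, manual mode). The title, the eight data values, and the two initial means are invented for the example. The sketch also assumes one observation per record, which the card list does not state explicitly. Column positions follow the formats given for ISDT1CM: N in (2X,I4), FMT in (18A4), K in (2X,I1), with the data and the initial means both read under FMT.

```text
SAMPLE UNIVARIATE DATA (HYPOTHETICAL)
     8
(1X,F4.1)
  1.2
  0.8
  1.5
  1.1
  9.7
 10.3
  9.9
 10.6
  2
  1.0
 10.0
```

The leading 1X in FMT supplies the blank that the ISDT1CM card list asks for, since the same format is reused for output. Each value is right-justified in its F4.1 field so that no trailing blanks fall inside a numeric field.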