CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC C C C C C CLUSPAC: Software for Mixture-Model Clustering C C C C COPYRIGHT (C) 1991, 1992, 1993 STANLEY L. SCLOVE C C C C C CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC C C C ALAk7DT CLUSPAC C C VERSION 1.0 5-JAN-94 C C C C PROGRAMMED BY: C C C C Prof. Stanley L. Sclove, Ph.D. 312/996-2681 C C Information & Decision Sciences Dept. M/C 294 C C University of Illinois at Chicago C C 601 So. Morgan Street C C Chicago, IL 60607-7124 C C C C C CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC C C C C C CLUSPAC programs implement clustering algorithms which are C C derived under the assumption of Gaussian class-conditional C C distributions. The ISDT* programs in CLUSPAC are based on C C the so-called "classification" likelihood. The MIX* programs C C are based on the mixture-model likelihood. Program C C MIXPDT in CLUSPAC is one of the mixture-model programs for C C clustering multivariate data. (For univariate data the C C "MIX1" programs may be used.) C C C C MIXPDT allows different covariance matrices; MIXPCM C C assumes a common covariance matrix across distributions. C C C C AMNOPDT, for clustering dihedral angles of molecules C C in polypeptide chains, is adapted from MIXPDT. C C C C FISIDT, for clustering (phi,psi), is adapted from C C AMNOPDT, which is for clustering (phi,psi,chi1). C C C C Input: C C ----- C C Number of clusters (K), initial values of means, prior C C probabilities, and covariance matrices. If desired, C C program ISDTPCM.CLUSPAC can be used to obtain these initial C C values. Use program MIXPDTA.CLUSPAC to try a range of C C numbers of clusters (values of K), with automatic setting C C of initial values. C C C C Program restrictions (can be modified): C C -------------------------------------- C C N, sample size, at most 9999; C C IP, number of variables, at most 20; C C K, number of clusters, at most 29; C C MAXITER, maximum number of iterations, 99 C C C C C C Subroutines called: C C MATEQ, which calls MATDT C C C C IV is a work array for subroutine MATEQ. C C C C C CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC C C C C C CONTROL CARDS: C C C C DATASET TITLE C C N, IN FORMAT (2X,I4) C C IP, IN FORMAT (3X,I2) C C DATFMT, IN FORMAT (18A4), E.G., (4F4.1) C C "DATFMT" WILL ALSO BE USED FOR OUTPUT, SO ALLOW AT LEAST C C ONE BLANK AT THE BEGINNING FOR CARRIAGE CONTROL. C C DATA, ONE CASE AT A TIME, IN FORMAT SPECIFIED BY DATFMT C C K, NUMBER OF CLUSTERS, IN FORMAT (2X,I2) C C MEANFT, in format (18A4) C C K INITIAL MEANS, IN FORMAT SPECIFIED BY MEANFT C C C C K INITIAL VALUES OF MIXING PROBABILITIES, C C IN FORMAT (5X,F3.2). C C C CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC C C DIMENSION X(9999,20),SUM(29,20),SD(29,3) C DIMENSION AA(9999,3) C DIMENSION F(9999,29),PP(29,9999),FMAX(9999),DSQMIN(9999) C The matrix F is an N x K matrix of values of p.d.f.'s. DIMENSION ICLUS(9999),ICLSOL(9999) DIMENSION DENOM(9999),XMXPR(9999) DIMENSION DSQ(9999,29) DIMENSION TITLE(18) C DIMENSION NG(29),XBAR(29,20),XBAROLD(29,20),SEED(29,20) DIMENSION DATFMT(18) DIMENSION MEANFT(18) C DIMENSION COVFMT(18) DIMENSION SSD(29,20,20),SIGMA(29,20,20) DIMENSION WGSS(20,20) DIMENSION VARHAT(20,20) DIMENSION XMIN(20),XMAX(20) DIMENSION IV(20,20) DIMENSION P(20,20),PREC(29,20,20) DIMENSION XIDET(29),XLGDET(29),TRUDET(29) DIMENSION PR(29) C DOUBLE PRECISION SUM,SUMPXF DOUBLE PRECISION WGSS,SSD DOUBLE PRECISION VARHAT DOUBLE PRECISION P DOUBLE PRECISION DET,TRUDET DOUBLE PRECISION DSQ DOUBLE PRECISION XBAR,XBAROLD DOUBLE PRECISION DEVV,DEVW DOUBLE PRECISION F,BOTTOM C C C C FLOW OF PROGRAM: C C C READ DATA AND INITIAL PARAMETER ESTIMATES. C INVERT COVARIANCE MATRICES. C PRINT INITIAL PARAMETER ESTIMATES. C C ITERATION: C COMPUTE ESTIMATED VALUES OF PROBABILITY DENSITY FUNCTIONS. C COMPUTE LIKELIHOOD AND VALUES OF MODEL-SELECTION CRITERIA. C COMPUTE CLUSTER-MEMBERSHIP PROBABILITIES. C UPDATE PARAMETER ESTIMATES (INCLUDING C INVERSE COVARIANCE MATRICES). C CLUSTER BY MAXIMUM PROBABILITY OF CLUSTER MEMBERSHIP. C IF CLUSTERING HASN'T CHANGED, STOP AND PRINT RESULTS. C OTHERWISE DO ANOTHER ITERATION (UNLESS TOO MANY HAVE C ALREADY BEEN DONE). C C C DATA PI/3.141593/ C READ (5,36000) TITLE C C WRITE PROGRAM INFORMATION. WRITE (6,10000) WRITE (6,40000) TITLE C C READ SAMPLE SIZE, N. READ (5,12000) N XN = N WRITE (6,15000) N C READ NUMBER OF VARIABLES, IP. READ (5,54000) IP WRITE (6,13000) IP C C READ DATA FORMAT. READ (5,36000) DATFMT C C READ DATA. DO 500 I = 1,N C READ (5,DATFMT) ( AA(I,KV),KV=1,3 ), ( X(I,JV),JV=1,IP ) C was: READ (5,DATFMT) (X(I,JV), JV = 1,IP) C IF (I .EQ. 1) GO TO 100 GO TO 300 100 CONTINUE C COMPUTE MINIMA AND MAXIMA: DO 200 JV = 1,IP XMAX(JV) = X(1,JV) XMIN(JV) = X(1,JV) 200 CONTINUE 300 CONTINUE DO 400 JV = 1,IP IF (X(I,JV) .LT. XMIN(JV)) XMIN(JV)=X(I,JV) IF (X(I,JV) .GT. XMAX(JV)) XMAX(JV)=X(I,JV) 400 CONTINUE 500 CONTINUE C WRITE MINIMA AND MAXIMA: WRITE (6,64000) (XMIN(JV),JV=1,IP) WRITE (6,66000) (XMAX(JV),JV=1,IP) C C READ K, NUMBER OF CLUSTERS. READ (5,11000) K WRITE (6,26000) K C C READ INITIAL MEANS C READ INPUT FORMAT FOR MEANS: READ (5,36000) MEANFT DO 600 IC=1,K READ (5,MEANFT) (XBAR(IC,JV), JV=1,IP) C 600 CONTINUE C Store initial means into SEED: DO 602 IC = 1,K DO 602 IVAR=1,IP SEED(IC,IVAR) = XBAR(IC,IVAR) 602 CONTINUE C C READ INITIAL VALUES OF MIXING PROBABILITIES: DO 650 IC=1,K READ (5,25000) PR(IC) C 650 CONTINUE C C C SET INITIAL VALUES OF COVARIANCE MATRICES: C C READ(5,36000) COVFMT DO 660 IC = 1,K DO 660 JV = 1,IP DO 660 JW = 1,IP SIGMA(IC,JV,JW) = 0.0 660 CONTINUE DO 665 IC = 1,K DO 665 JV = 1,IP SIGMA(IC,JV,JV) = 300.00 665 CONTINUE C C WRITE(6,22000) MAXITER = 99 ITER = 1 C 700 CONTINUE C C Store old means: DO 810 IC = 1,K DO 810 JV = 1,IP XBAROLD(IC,JV) = XBAR(IC,JV) 810 CONTINUE C C WRITE (6,32000) ITER C WRITE CURRENT ESTIMATES OF PARAMETERS: WRITE (6,23000) C WRITE CURRENT ESTIMATES OF MEAN VECTORS:-- WRITE (6,20000) DO 710 IC = 1,K WRITE (6,33000) ( XBAR(IC,JV), JV=1,IP ) 710 CONTINUE WRITE (6,35000) (PR(IC), IC = 1,K) C WRITE CURRENT ESTIMATES OF COVARIANCE MATRICES:- IF (K .GE. 4) GO TO 671 DO 670 IC = 1,K WRITE(6,42500) IC DO 670 JV = 1,IP WRITE (6,48000) (SIGMA(IC,JV,JW),JW=1,IP) 670 CONTINUE C 671 CONTINUE C C Copy SIGMA into VARHAT for input to Subroutine MATEQ: DO 680 IC = 1,K DO 675 JV = 1,IP DO 675 JW = 1,IP VARHAT(JV,JW) = SIGMA(IC,JV,JW) 675 CONTINUE C C CALL SUBROUTINE TO COMPUTE INVERSE COVARIANCE MATRIX: C SET PARAMETERS OF SUBROUTINE CALL: IDET = 1 NRS1 = 0 C C CALL MATEQ(VARHAT,IP,20,JFLG,DET,IDET,IV,NRS1,P,20) C ON RETURN, P CONTAINS THE INVERSE MATRIX. C C C GENERAL FORM OF CALL IS: C CALL MATEQ(A,M,N,JFLG,DET,IDET,IV,NRS1,P,LL) C SEE SUBROUTINE LISTING FOR FULLER EXPLANATION. C DET(VARHAT) = DET*10.0**IDET C C ON RETURN, P CONTAINS THE INVERSE MATRIX. C C DO 676 JV = 1,IP DO 676 JW = 1,IP PREC(IC,JV,JW) = P(JV,JW) 676 CONTINUE C IF(DET .LE. 0.0) IDET = 0 XIDET(IC) = IDET IF(DET .LE. 0.0) DET=VARHAT(1,1)*VARHAT(2,2)*VARHAT(3,3) IF (DET .LE. 0.0) DET = -DET XLGDET(IC) = DLOG(DET) + XIDET(IC)*ALOG(10.0) TRUDET(IC) = DET*(10.0**IDET) 680 CONTINUE C C C C C COMMENCE COMPUTATION OF F(I,IC), I=1,...,N, IC=1,...,K: C C COMPUTE MAHALANOBIS D-SQUARE BETWEEN THE I-TH OBSERVATION AND C THE L-TH MEAN, L=1,2,...,K, I=1,2,...,N:-- C DO 1200 I = 1,N C DO 1000 L=1,K DSQ(I,L) = 0.0 DO 1310 JV=1,IP DEVV = XBAR(L,JV) - X(I,JV) DO 1310 JW=1,IP DEVW = XBAR(L,JW) - X(I,JW) DSQ(I,L) = DSQ(I,L) + DEVV*PREC(L,JV,JW)*DEVW 1310 CONTINUE IF ( DSQ(I,L) .LT. 0.0 ) DSQ(I,L)=0.0 ZSQ = DSQ(I,L) C IF(TRUDET(L) .LE. 0.0) XTRUDET(L)=SIGMA(L,1,1)*SIGMA(L,2,2)*SIGMA(L,3,3) IF ( TRUDET(L) .LE. 0.0 ) TRUDET(L) = -TRUDET(L) C C IF D-SQ IS INORDINATELY LARGE, SET VALUE OF PDF TO ZERO C (IT IS EXTREMELY SMALL ANYWAY, AND THIS AVOIDS UNDERFLOW): IF ( ZSQ/2.0 .LE. 174.673 ) GO TO 1090 F(I, L) = 0.0 GO TO 1100 C 1090 CONTINUE F(I, L) = EXP(-ZSQ/2.0) BOTTOM = DSQRT( TRUDET(L) ) F(I, L) = F(I,L)/BOTTOM 1100 CONTINUE C 1000 CONTINUE 1200 CONTINUE C C COMPUTE LOG LIKELIHOOD AND VALUES OF MODEL-SELECTION CRITERIA: C SUMLNF = 0.0 DO 2200 I=1,N SUMPXF = 0.0 DO 2100 IC=1,K SUMPXF = SUMPXF + PR(IC)*F(I,IC) 2100 CONTINUE C IF ( SUMPXF .LE. 0.0 ) GO TO 2200 C SUMLNF = SUMLNF + DLOG(SUMPXF) 2200 CONTINUE C XMN2LL = -2.0*SUMLNF C WRITE (6,30000) XMN2LL C C C COMPUTE MODEL SELECTION CRITERIA C C PARAMETERS: C K MEAN VECTORS OF DIMENSION P AND K P-BY-P COVARIANCE MATRICES, C WHERE P IS THE NUMBER OF VARIABLES, AND K-1 INDEPENDENT MIXING C PROBABILITIES. NOPARM = K*IP + K*IP*(IP+1)/2 + (K-1) WRITE (6,72000) NOPARM AIC = XMN2LL + 2.0*NOPARM SCH = XMN2LL + ALOG(XN)*NOPARM WRITE (6,70000) AIC WRITE (6,71000) SCH C C C COMPUTE POSTERIOR PROBABILITIES OF GROUP MEMBERSHIP, C PP(IC,I), THE CONDITIONAL PROBABILITY OF POP'N IC, GIVEN X(I), C AS PR(IC)*F(I,IC)/DENOM(I): DO 1350 I = 1,N DENOM(I) = 0.0 DO 1350 IC=1,K DENOM(I) = DENOM(I) + PR(IC)*F(I,IC) 1350 CONTINUE DO 1400 I = 1,N DO 1400 IC=1,K IF ( DENOM(I) .GT. 0.0 ) GO TO 1410 PP(IC,I) = 0.0 GO TO 1400 1410 CONTINUE PP(IC,I)= PR(IC)*F(I,IC)/DENOM(I) 1400 CONTINUE C WRITE(6,76000) DO 1420 I = 1,4 WRITE(6,82000) ( PP(IC,I), IC = 1,K ) 1420 CONTINUE C C C UPDATE PARAMETER ESTIMATES: C C C UPDATE CLUSTER MIXING PROBABILITIES PR(IC):-- DO 3800 IC = 1,K PR(IC) = 0.0 DO 3800 I = 1,N PR(IC) = PR(IC) + PP(IC,I) 3800 CONTINUE DO 3900 IC = 1,K PR(IC) = PR(IC)/N 3900 CONTINUE DO 1750 IG = 1,K DO 1750 JV = 1,IP SUM(IG,JV) = 0.0 DO 1750 JW = 1,IP SSD(IG,JV,JW) = 0.0 1750 CONTINUE C C UPDATE MEANS:-- DO 1875 IC = 1,K DO 1875 JV = 1,IP DO 1875 I = 1,N SUM(IC,JV) = SUM(IC,JV) + PP(IC,I)*X(I,JV) 1875 CONTINUE DO 1900 IG = 1,K XMINOBS = 0.5 IF (N*PR(IG) .LT. XMINOBS) GO TO 2050 GO TO 2150 2050 WRITE (6,74000) IG, XMINOBS GO TO 4000 2150 CONTINUE DO 1900 JV = 1,IP XBAR(IG,JV) = SUM(IG,JV)/(N*PR(IG)) 1900 CONTINUE C C DO 3600 IC=1,K DO 3600 JV=1,IP DO 3600 JW=JV,IP DO 3600 I=1,N DEVV=X(I,JV)-XBAR(IC,JV) DEVW=X(I,JW)-XBAR(IC,JW) TERM=PP(IC,I)*DEVV*DEVW SSD(IC,JV,JW)=SSD(IC,JV,JW)+TERM 3600 CONTINUE C DO 3700 IC=1,K DO 3700 JV=2,IP DO 3700 JW=1,JV-1 SSD(IC,JV,JW)=SSD(IC,JW,JV) 3700 CONTINUE C DO 3710 IC=1,K DO 3710 JV=1,IP DO 3710 JW=1,IP SIGMA(IC,JV,JW)=SSD(IC,JW,JV)/(N*PR(IC)) 3710 CONTINUE C C POOL: C DO 1950 JV = 1,IP DO 1950 JW = 1,IP WGSS(JV,JW) = 0.0 1950 CONTINUE C DO 2300 JV = 1,IP DO 2300 JW = 1,IP DO 2300 IC = 1,K WGSS(JV,JW) = WGSS(JV,JW) + PR(IC)*SSD(IC,JV,JW) 2300 CONTINUE C COMPUTE VARHAT, MLE OF COMMON COVARIANCE MATRIX: DO 2400 JV = 1,IP DO 2400 JW = 1,IP VARHAT(JV,JW) = WGSS(JV,JW)/N 2400 CONTINUE C (END PARAMETER-ESTIMATE UPDATE SEQUENCE) C IF (ITER .EQ. 1) GO TO 900 C STORE OLD LABELS: DO 800 I = 1,N ICLSOL(I) = ICLUS(I) 800 CONTINUE C C 900 CONTINUE C C COMPUTE NEW LABELS BY MAX POSTERIOR PROBABILITY: DO 1600 I = 1,N XMXPR(I) = PP(1,I) ICLUS(I) = 1 DO 1600 IC = 2,K IF ( PP(IC,I) .GT. XMXPR(I) ) GO TO 1500 GO TO 1600 1500 XMXPR(I) = PP(IC,I) ICLUS(I) = IC 1600 CONTINUE C C IF (N .GE. 31) GO TO 250 WRITE (6,15500) C WRITE (6,16000) WRITE (6,18000) (I, ICLUS(I), I=1,N) 250 CONTINUE C C C IF (ITER .EQ. 1) GO TO 3000 DO 2900 I = 1,N IF (ICLUS(I) .EQ. ICLSOL(I)) GO TO 2900 C If any case changed clusters, another iteration will be done: GO TO 3000 2900 CONTINUE C If no case changed cluster, a test of convergence of C estimates of the means will be done: TOL = 0.5 DO 2901 IC = 1,K DO 2901 JV = 1,IP TEST = ABS( XBAR(IC,JV)-XBAROLD(IC,JV) ) IF ( TEST .LT. TOL ) GO TO 2901 C If any mean changed much, another iteration will be done: GO TO 3000 2901 CONTINUE C If no mean has changed much, stop iterating: GO TO 3300 C 3000 CONTINUE C Maximum number of iterations is set at beginning of program C (currently it is 99). IF (ITER.GE.MAXITER) GO TO 3100 ITER = ITER + 1 GO TO 700 3100 WRITE (6,68000) GO TO 3300 C C 3300 CONTINUE C C C COUNT NUMBERS IN CLUSTERS: DO 3310 IC = 1,K NG(IC) = 0 3310 CONTINUE C DO 3320 I = 1,N IC = ICLUS(I) NG(IC) = NG(IC) + 1 3320 CONTINUE C C C PRINT FINAL RESULTS C C WRITE (6,34000) (NG(IC),IC=1,K) WRITE (6,35000) (PR(IC),IC=1,K) C DO 2800 IC = 1,K WRITE (6,28000) IC, (XBAR(IC,JV),JV=1,IP) 2800 CONTINUE C C DO 2501 IC = 1,K WRITE(6,42500) IC C WRITE (6,28000) IC, (XBAR(IC,JV),JV=1,IP) C WRITE (6,42000) DO 2500 JV=1,IP WRITE (6,48000) (SIGMA(IC,JV,JW),JW=1,IP) 2500 CONTINUE C C Compute and print standard deviations: WRITE (6,48600) C SD(IC,1) = SQRT(SIGMA(IC, 1,1 )) SD(IC,2) = SQRT(SIGMA(IC, 2,2 )) C SD1 = SD(IC,1) SD2 = SD(IC,2) C WRITE (6,48500) SD1,SD2 C C C Compute correlation matrix: DO 2503 JV = 1,IP JVN1 = JV+1 DO 2503 JW = JVN1,IP SD1 = SQRT(SIGMA(IC,JV,JV)) SD2 = SQRT(SIGMA(IC,JW,JW)) SIGMA(IC,JV,JW) = SIGMA(IC,JV,JW)/(SD1*SD2) 2503 CONTINUE DO 2505 JV = 2,IP JVM1 = IP-1 DO 2505 JW = 1,JVM1 SIGMA(IC,JV,JW) = SIGMA(IC,JW,JV) 2505 CONTINUE DO 2506 JV = 1,IP SIGMA(IC,JV,JV) = 1.0 2506 CONTINUE C C Write correlation matrix: WRITE (6,48800) C WRITE (6,48700) SIGMA(IC, 1, 2) C C 2501 CONTINUE C WRITE (6,10000) WRITE (6,48955) WRITE (6,10000) WRITE (6,40000) TITLE WRITE (6,15000) N WRITE (6,30000) XMN2LL WRITE (6,48952) DO 2508 IC = 1,K C WRITE (6,48901) IC,NG(IC), SEED(IC,1),SEED(IC,2), 1XBAR(IC,1),XBAR(IC,2), SD(IC,1),SD(IC,2) C 2508 CONTINUE WRITE (6,49000) DO 2509 IC = 1,K WRITE (6,48900) IC, PR(IC),NG(IC),XBAR(IC,1),XBAR(IC,2), 1 SD(IC,1),SD(IC,2), 2SIGMA(IC,1,2) C 2509 CONTINUE C C IF (ITER .GE. MAXITER + 1 ) WRITE (6,68000) C DO 3525 I = 1,N FMAX(I) = 0.0 DSQMIN(I) = 10.0**10 DO 3525 IC = 1,K IF ( F(I,IC) .GT. FMAX(I) ) FMAX(I) = F(I,IC) IF ( DSQ(I,IC) .LT. DSQMIN(I) ) DSQMIN(I) = DSQ(I,IC) 3525 CONTINUE C WRITE (6,52000) WRITE (6,52100) C C C Seed points: C C B -139 +135 1 beta-poly(L-alanine) C P -80 +150 2 polyglycine II C T -95 -10 3 C R -57 -47 4 right alpha-helix C L +57 +47 5 left alpha-helix C B2 -180 -180 1 C B3 +180 +180 1 C C Cluster 6, with a center near (-180,-180), will be C moved to near (-180,+180): DO 3510 I = 1,N IF(ICLUS(I) .EQ. 6 ) X(I,1) = X(I,1) IF(ICLUS(I) .EQ. 6 ) X(I,2) = 360 + X(I,2) C Cluster 7, with a center near (+180,+180), will be C moved to near (-180,+180): IF(ICLUS(I) .EQ. 7 ) X(I,1) = X(I,1)-360 IF(ICLUS(I) .EQ. 7 ) X(I,2) = X(I,2) 3510 CONTINUE C DO 3500 IC = 1,K WRITE (6,53000) IC DO 3500 I = 1,N C C Cases with min D-sq > 5.99 (upper 5% point) will be deleted C in next run: IF( DSQMIN(I) .GT. 5.99) GO TO 3505 C IF(ICLUS(I) .EQ. IC) WRITE (6,50000) I,ICLUS(I), X ( AA(I,KV),KV=1,3 ), X( X(I,JV), JV = 1,IP ), PP(IC,I), F(I,IC), FMAX(I),DSQMIN(I) 3505 CONTINUE C 3500 CONTINUE C WRITE (6,84000) C IF (ITER .GE. MAXITER + 1 ) WRITE (6,68000) C 4000 STOP C C 10000 FORMAT('1',//,' ..........................................'// X1X, ' Program ALAk7DT CLUSPAC ' / X1X, ' MIXTURE MODEL CLUSTERING FOR AMINO ACIDS',/, X1X, ' MODEL WITH VARYING COVARIANCE MATRICES '// X' Developed and programmed by: '/ X' Prof. Stanley L. Sclove, Ph.D. '/ X' Information & Decision Sciences Dept. M/C 294 '/ X' College of Business Administration '/ X' University of Illinois at Chicago '/ X' 601 South Morgan Street '// X' Chicago, IL 60607-7124 '// X' 312/996-2681 '// X' FISIDT CLUSPAC (from AMNOPDT CLUSPAC) '/ X' ALAk7dt CLUSPAC Version 1.0 5-Jan-94 '/ X' COPYRIGHT (C) 1991-1993 STANLEY L. SCLOVE', X' ALL RIGHTS RESERVED.'//) 11000 FORMAT(2X,I2) 12000 FORMAT(2X,I4) 13000 FORMAT(1X,'Number of variables used ...................',I3 ) 15000 FORMAT(1X,'Number of observations (sample size), n ...', I4 ) 15500 FORMAT(/ 1X,'CLUSTERING:' ) 16000 FORMAT(/,1X,'CASES AND LABELS:--' ) 18000 FORMAT(15(I5,I3)) 20000 FORMAT(/1X,' Means: '/) 22000 FORMAT(' FIRST ITERATION USES THE INITIAL PARAMETER ESTIMATES', X' PROVIDED BY THE USER.' ) 25000 FORMAT(5X, F3.2) 23000 FORMAT(/1X, 'CURRENT ESTIMATES OF PARAMETERS: ' ) 26000 FORMAT('1',/, 1X,'K = ',I2,' CLUSTERS'/ ) 28000 FORMAT(1X,'Mean vector for Cluster ',I2,': ',(8F12.0/)) 30000 FORMAT( 1X,' MINUS 2 LOG LIKELIHOOD = ', F12.0/) 32000 FORMAT(/, 1X,'ITERATION ', I2,/ ) 34000 FORMAT(/,1X,'Frequencies:',1X,9(I10,3X)/) 35000 FORMAT(/,1X,' Mixing probabilities: ', 3X,17(F5.3,1X)/) 33000 FORMAT( 1X, 2F7.0) 36000 FORMAT(18A4) 40000 FORMAT(//,1X,18A4//) 42000 FORMAT(/, ' Covariance matrix of phi, psi: ') 42500 FORMAT( ' Distribution ',I3 ) 48000 FORMAT(1X,8F12.1/) 48500 FORMAT(1X,2F6.1) 48600 FORMAT(1X,'Standard deviations of phi, psi: ') 48700 FORMAT(1X, F13.2/) 48800 FORMAT(1X,'Correlation between phi and psi: ') C C WRITE (6,48900) IC, PR(IC),NG(IC),XBAR(IC,1),XBAR(IC,2), C 1 SD(IC,1),SD(IC,2), C 2SIGMA(IC,1,2) 48900 FORMAT(1X,I2,1X,F5.3,I5,2F7.0,1X,2F5.0,1X, F5.1) C 48901 FORMAT(1X,I2,1X, I4, 2F6.0,3X,2F6.0,3X,2F6.1) C 48955 FORMAT(1X,'SUMMARY OF RESULTS '//) 48952 FORMAT(' Clus- Seeds',17X,'Final Means Std. Devs.'/ A' ter Freq phi psi phi psi phi psi '/ B' ------------------------------------------------------ ') C 49000 FORMAT(1X,'SUMMARY OF RESULTS'/,40X,' Correlation '/ 1' Clus- Means Std.Devs. between ,'/ 2' ter Prob Freq phi psi phi psi phi and psi '/ 5' ------------------------------------------------ ' ) C C SUMMARY OF RESULTS Correlations C Clus- Means Std. Devs. phi, phi, psi, C ter Prob Freq phi psi chi1 phi psi chi1 psi chi1 chi1 C --- ---- ---- ------------------- ------------- --------------- C C IF(ICLUS(I) .EQ. IC) WRITE (6,50000) I,ICLUS(I), C X ( AA(I,KV),KV=1,3 ), C X( X(I,JV), JV = 1,IP ), PP(IC,I), F(I,IC), FMAX(I),DSQMIN(I) C 50000 FORMAT(1X,I4,1X,I2, 1X, 3A1, 2F7.0, F7.2,2E11.2, F7.1) C 52000 FORMAT( 1X,'CASE, LABEL / DATA' ) 52100 FORMAT(/, 1X, 30X, 'F(I) MAX F(I|C) min DSQ(I,C) '/) 53000 FORMAT( 1X,' CLUSTER ', I2 ) 54000 FORMAT(3X,I2) 58000 FORMAT(/1X,'CONVERGENCE: NO CASE CHANGED CLUSTERS AFTER ', X'ITERATION ',I2,'. RESULTS ARE PRINTED BELOW.'//) 60000 FORMAT(/,1X,'JFLG = ',I2,'. IF JFLG=0, COMPUTATION OF DET', X' WENT WELL; OTHERWISE, THERE WAS TROUBLE OR MATRIX WAS ', X'ILL-CONDITIONED.'//) 62000 FORMAT(/1X,'DET = ',F13.1,' IDET = ',I3,5X, X'ACTUAL DET. = DET*10**IDET',//) 64000 FORMAT(/1X,'Minimum of each variable: ',1X,2F9.1) 66000 FORMAT(/1X,'Maximum of each variable: ',1X,2F9.1) 68000 FORMAT(1X,'PROGRAM HAS NOT CONVERGED IN 99 ITERATIONS. STOP') 70000 FORMAT(1X,' AIC = ', F15.4 ) 71000 FORMAT(1X,'SCHWARZ CRITERION = ', F15.4 ) 72000 FORMAT( 1X,'NUMBER OF PARAMETERS = ',I4 ) 74000 FORMAT(1X,'EXPECTED NUMBER OF OBSERVATIONS IN CLUSTER ',I3, X' IS LESS THAN ',F7.2,'. STOP.'/) 76000 FORMAT(//' POSTERIOR PROBS OF GROUP MEMBERSHIP FOR 1ST 4 CASES:') 82000 FORMAT(1X,29F5.2) 84000 FORMAT(1X,'PROGRAM ENDED NORMALLY.') END C C SUBROUTINE MATEQ(A,M,N,JFLG,DET,IDET,IV,NRS1,P,LL) C SUBROUTINE MATEQ IS DMATEQ FROM THE UICC SUBROUTINE LIBRARY. C PAGE 1 C C SUBROUTINE DMATEQ C ***************** C THIS ROUTINE WILL SOLVE A REAL*8 SYSTEM OF LINEAR EQUATIONS,COMPUTE C THE DETERMINANT, WITHOUT UNDERFLOW OR OVERFLOW, OF A REAL*8 MATRIX, C AND/OR INVERT A REAL*8 MATRIX. C CALLING SEQUENCE: C CALL DMATEQ(A,N,IA,JFLG,DET,IDET,IV,NRS,P,IP) WHERE; C A (INPUT) - IS THE REAL*8 MATRIX ON WHICH THE ROUTINE IS C TO WORK. IN THE PROCESS OF COMPUTATION THE C CONTENTS OF THIS MATRIX ARE DESTROYED. C N (INPUT) - IS AN INTEGER*4 VARIABLE WHICH SPECIFIES THE C ORDER OF THE A MATRIX. C IA (INPUT) - IS AN INTEGER*4 VARIABLE WHICH SPECIFIES THE C ACTUAL ROW DIMENSION OF A AS DIMENSIONED IN C THE CALLING PROGRAM. IA MUST BE GREATER THAN C OR EQUAL TO N. C JFLG (OUTPUT) - IS AN INTEGER*4 RETURN CODE VARIABLE. UPON C RETURN FROM DMATEQ IF; C JFLG=0, ALL WENT WELL. C JFLG=1, THE A MATRIX WAS SINGULAR OR NEAR C SINGULAR AND THE COMPUTATIONS COULD NOT BE C COMPLETED. THE CONTENTS OF THE VARIABLES C A, DET, IDET AND P ARE MEANINGLESS. C DET (OUTPUT) - IS A REAL*8 VARIABLE WHICH CONTAINS THE C DETERMINANT OF A. (SEE IDET) C IDET (INPUT) - IS AN INTEGER*4 VARIABLE. ON INPUT IF; C IDET=0, NO DETERMINANT IS CALCULATED. C IDET NOT 0, THE DETERMINANT OF A IS COMPUTED. C ON OUTPUT IDET CONTAINS THE POWER OF 10 C THAT DET SHOULD BE MULTIPLIED BY TO GIVE THE C CORRECT VALUE OF THE DETERMINANT. I.E. C DET(A)=DET*10.0D0**IDET. C IF DET(A) CAN BE COMPUTED WITHOUT UNDER OR C OVERFLOW, THEN IDET=0 OTHERWISE IDET IS SET C TO THE PROPER VALUE SO THAT NO UNDER OR OVER- C FLOW WILL OCCUR IN COMPUTING DET. C IV (INPUT) - IS AN INTEGER*4 WORK ARRAY WHICH SHOULD BE C DIMENSIONED AT LEAST IV(N). C PAGE 2 C C NRS (INPUT) - IS AN INTEGER*4 VARIABLE WITH THE FOLLOWING C INTERPRETATION: C NRS>0, SOLVE A SYSTEM OF LINEAR EQUATIONS C WITH NRS RIGHT HAND SIDES. C NRS=0, INVERT THE A MATRIX. C NRS<0, ONLY COMPUTE THE DETERMINANT OF A. C IN THIS CASE IDET MUST BE DIFFERENT FROM 0. C P (INPUT) - IS A REAL*8 ARRAY WITH THE FOLLOWING INTER- C PRETATION: C IF NRS>0, THEN P CONTAINS THE NRS RIGHT HAND C SIDES STORED BY COLUMNS. IN THIS CASE P MUST C BE DIMENSIONED AT LEAST P(N,NRS). ON RETURN C THE COLUMNS OF P ARE REPLACED BY THE RESPEC- C TIVE SOLUTIONS. C IF NRS=0, THEN P MUST BE DIMENSIONED AT LEAST C P(N,N). ON RETURN P WILL CONTAIN THE INVERSE C OF A. C IF NRS<0,THEN P NEED ONLY BE A DUMMY VARIABLE C IN THIS CASE P IS NEVER ACCESSED BY DMATEQ. C IP (INPUT) - IS AN INTEGER*4 VARIABLE WHICH CONTAINS THE C ACTUAL ROW DIMENSION OF P AS DIMENSIONED IN C THE CALLING PROGRAM. IP MUST BE GREATER THAN C OR EQUAL TO N. C NOTE: IMMEDIATELY ON RETURN FROM DMATEQ THE CONDITION CODE FLAG, C JFLG, SHOULD BE INTERROGATED. IF JFLG=1, THEN THE ROUTINE C COULD NOT COMPUTE A SOLUTION. C METHOD - THE ALGORITHM USED IS GAUSSIAN ELIMINATION WITH PARTIAL C -1 C PIVOTING. IN ESSENCE THE ROUTINE GENERATES A MATRIX L SUCH C -1 C THAT L *A = U, WHERE U IS AN UPPER TRIANGULAR MATRIX. THEN IT C SOLVES THE SYSTEM A*X = P BY MEANS OF THE EQUIVALENT SYSTEM C -1 -1 C U*X = L *A*X = L *P BY BACK SUBSTITUTION. C -1 C THE L MATRIX CAN BE WRITTEN AS A PRODUCT OF THE FORM C -1 C L = L *P *....*L *P WHERE EACH P IS A PERMUTATION C N-1 N-1 1 1 K C MATRIX OBTAINED BY INTERCHANGING AT MOST TWO ROWS OF THE C IDENTITY MATRIX. ( THIS REPRESENTS THE INTERCHANGING OF TWO C ROWS). THE L MATRICES ARE ELIMINATION MATRICES WHICH ARE C K C CHOSEN TO INTRODUCE ZEROS IN THE LAST N-K ENTRIES OF THE K-TH C COLUMN OF THE MATRIX. C PAGE 3 C C -1 -1 C THE CALCULATIONS OF L *A AND L *P ARE DONE BY PERFORMING C THE PERMUTATIONS ON A AND P RESPECTIVELY. THE ACTUAL L AND P C K K C ARE NOT COMPUTED. C SUBROUTINES CALLED: DMATDT C REFERENCE: C G. W. STEWART, INTRODUCTION TO MATRIX COMPUTATIONS, C ACADEMIC PRESS, 1973. REAL*8 A(N,1),DET,P(LL,1) REAL*8 DNORM,DEN,DMULT,DSUM,DISIGN DIMENSION IV(1) NRS=NRS1 IF (NRS.EQ.0) IDET=1 DISIGN=1.0D+00 DET=0.0D+00 JFLG=0 C C JFLG IS A TROUBLE FLAG.UPON EXIT IF JFLG=0 THEN THE MATRIX WAS PROCESS C WITHOUT TROUBLE.IF JFLG=1 EITHER THE MATRIX IS SINGULAR OR TROUBLE C OCCURED.ISIGN=-ISIGN EVERY TIME A ROW IS INTERCHANGED.THIS IS USED TO C INSURE THAT THE DETERMINANT HAS THE PROPER SIGN. C M1=M-1 DO 100 I=1,M 100 IV(I)=I IF (NRS) 500,200,500 200 DO 300 I=1,M DO 300 J=1,M 300 P(I,J)=0.0D+00 DO 400 I=1,M 400 P(I,I)=1.0D+00 NRS=M C C INSTEAD OF ACTUALLY INTERCHANGING ROWS A POINTER ARRAY IS USED TO KEEP C TRACK OF THE ROW POSITIONS. C C BEGIN ELIMINATION LOOP. C 500 DO 1200 K=1,M1 ICOL=K IPCOL=K C C SEARCHING FOR LARGEST ELEMENT IN ABSOLUTE VALUE IN COLUMN K. C DNORM=A(IV(K),K) IFLG=0 KK=K+1 DO 600 J=KK,M IF (DABS(A(IV(J),K)).LE.DABS(DNORM)) GO TO 600 IFLG=1 IPCOL=IV(J) DNORM=A(IPCOL,K) 600 CONTINUE C C IF IFLG=0 NO ROW INTERCHANGE TOOK PLACE.IF IFLG=1 A ROW INTERCHANGE C TOOK PLACE AND THE POINTER ARRAY IV MUST BE UPDATED. C IF (IFLG.EQ.0) GO TO 800 ISAVE=IV(ICOL) IV(ICOL)=IPCOL ICOL1=ICOL+1 DO 700 L=ICOL1,M IF (IV(L).EQ.IPCOL) IV(L)=ISAVE 700 CONTINUE DISIGN=-DISIGN 800 IF (DNORM.EQ.0.0D+00) GO TO 1900 C C BEGIN ELIMINATION OF ROW BELOW IV(K).DEN IS THE PIVOT ELEMENT. C K1=K+1 DO 1100 IM=K1,M C C BEFORE ACTUALLY ELIMINATING WE CHECK TO SEE IF A(IV(IM),K) HAS C ALREADY BEEN ANNIHILATED. C IF (A(IV(IM),K).EQ.0.0D+00) GO TO 1100 C C CALCULATE ELIMINATION FACTOR. C DMULT=-A(IV(IM),K) C C WE NOW CALCULATE VALUE OF OTHER ELEMENTS IN ROW IV(IM). C DO 900 NN=K1,M 900 A(IV(IM),NN)=(DMULT*A(IV(K),NN))/DNORM+A(IV(IM),NN) IF (NRS.LE.0) GO TO 1100 DO 1000 IN=1,NRS 1000 P(IV(IM),IN)=(DMULT*P(IV(K),IN))/DNORM+P(IV(IM),IN) 1100 CONTINUE 1200 CONTINUE C C CALCULATE VALUE OF DETERMINANT. C IF (A(IV(M),M).EQ.0.0D0) GO TO 1900 DET=DISIGN IF (IDET.NE.0) CALL DMATDT(A,N,M,DET,IV,IDET) IF (DET.EQ.0.0D+00) GO TO 1900 IF (NRS.LE.0) GO TO 2000 C C WE START SOLVING RIGHT HAND SIDES.THE SOLUTION REPLACES THE RIGHT HAND C VECTOR. C 1300 N1=M-1 DO 1600 JJ=1,NRS C C BEGIN BACK SUBSTITUTION. C P(IV(M),JJ)=P(IV(M),JJ)/A(IV(M),M) DO 1500 I=1,N1 DSUM=0.0D+00 DO 1400 J=1,I 1400 DSUM=DSUM-A(IV(M-I),M-J+1)*P(IV(M-J+1),JJ) 1500 P(IV(M-I),JJ)=(P(IV(M-I),JJ)+DSUM)/A(IV(M-I),M-I) 1600 CONTINUE DO 1800 JJ=1,NRS DO 1700 IND=1,M 1700 A(IND,1)=P(IV(IND),JJ) DO 1800 IND=1,M 1800 P(IND,JJ)=A(IND,1) RETURN 1900 JFLG=1 IDET=0 2000 RETURN END SUBROUTINE MATDT(A,IA,N,DET,IV,IDET) C SUBROUTINE MATDT IS DMATDT FROM THE UICC SUBROUTINE LIBRARY. REAL*8 A(IA,1),DET,B,LOG16 INTEGER*4 IV(1),K EQUIVALENCE (B,K) NUM=16777216 LOG16=.120411998265592457D+01 IF (A(IV(N),N).EQ.0.0D+00) GO TO 300 L=0 DO 100 I=1,N B=DABS(A(IV(I),I)) K=K/NUM-64 L=L+K 100 DET=DET*(A(IV(I),I)/16.0D+00**K) B=DABS(DET) K=K/NUM-64 IW=L+K IF ((IW.LT.-64).OR.(IW.GT.63)) GO TO 200 DET=DET*16.0D+00**L IDET=0 GO TO 400 200 DET=DET*16.0D+00**(-K) IDET=L+K B=IDET*LOG16 IDET=B B=B-DFLOAT(IDET) DET=DET*1.0D+01**B GO TO 400 300 DET=0.0D+00 IDET=0 400 RETURN END