Documentation for ISODATA Computer Programs (ISDT****)

The "isodata" scheme proceeds as follows. One starts with tentative
estimates of the K cluster means and assigns each observation to the
mean to which it is closest. The cluster means are then re-estimated,
and one loops through the data again, reassigning the observations.
These two steps are then repeated.

ISODATA is similar to K-MEANS, except that in K-MEANS there is "drift":
the centroids are updated as each case is assigned. ISODATA waits for a
complete pass through the data before updating the centroids.

Here ISODATA is modified to use statistical distance (Mahalanobis
distance) rather than just Euclidean distance.

The number of clusters can be decided by means of model-selection
criteria. The criteria computed are Akaike's, Schwarz's and Kashyap's.
(I recommend use of the latter two.) The model with the lowest value of
a given criterion is best.

There are several ISODATA programs:

ISDT1CM  -- univariate, common variance, one value of K
ISDT1CMA -- univariate, common variance, a range of values of K
ISDTPCM  -- multivariate, common covariance matrix, one value of K
ISDTPCMA -- multivariate, common covariance matrix, a range of values of K
ISDT1DT  -- univariate, different variances, one value of K
ISDT1DTA -- univariate, different variances, a range of values of K
ISDTPDT  -- multivariate, different covariance matrices, one value of K
ISDTPDTA -- multivariate, different covariance matrices, a range of values of K

When the variances (or covariance matrices) are different, the
classification rule is not simply minimum distance: the distance is
modified by a term that is a function of the variance (or of the
determinant of the covariance matrix).

REFERENCES

Ball, G.H., and Hall, D.J. (1967). A clustering technique for
summarizing multivariate data. Behavioral Science 12, 153-155.

Kashyap, R. (1978). Paper in IEEE PAMI.

MacQueen, James (1967). K-means paper in Berkeley Symposium.

Schwarz, Gideon (1978). Estimating the dimension of a model.
Annals of Statistics 6, 461-464.

Sclove, Stanley L. (1977). Population mixture models and clustering
algorithms. Communications in Statistics A6, 417-434.

-------------------------------------------------------------------------
created 2002: July 30
latest update: 2003: Jan 23
-------------------------------------------------------------------------
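
ILLUSTRATIVE SKETCH

The batch updating described above, together with the modified distance
used when variances differ, can be illustrated with a short Python
sketch for the univariate, different-variances case. This is not the
ISDT1DT program itself. The function name isodata_1d, its arguments,
the starting variances and the empty-cluster handling are all
assumptions made for illustration. The modifying term is assumed to be
log(variance), which is what the Gaussian log-likelihood suggests; the
actual programs may use a different form.

```python
import math


def isodata_1d(data, means, n_iter=20):
    """Batch (ISODATA-style) clustering of univariate data with
    cluster-specific variances. Hypothetical helper for illustration."""
    k = len(means)
    means = list(means)

    # Assumption: every cluster starts from the overall variance of the data.
    grand = sum(data) / len(data)
    overall = sum((x - grand) ** 2 for x in data) / len(data)
    variances = [max(overall, 1e-12)] * k

    def score(x, j):
        # Squared statistical distance plus an assumed log-variance term.
        return (x - means[j]) ** 2 / variances[j] + math.log(variances[j])

    labels = None
    for _ in range(n_iter):
        # Complete pass: assign every observation before any update,
        # so there is no K-MEANS-style "drift" within the pass.
        new_labels = [min(range(k), key=lambda j: score(x, j)) for x in data]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels

        # Only now re-estimate the mean and variance of each cluster.
        for j in range(k):
            members = [x for x, lab in zip(data, labels) if lab == j]
            if len(members) < 2:
                continue  # assumption: keep old estimates for tiny clusters
            means[j] = sum(members) / len(members)
            var = sum((x - means[j]) ** 2 for x in members) / len(members)
            variances[j] = max(var, 1e-12)  # guard against zero variance

    return means, variances, labels
```

Calling isodata_1d(data, [0.0, 6.0]) on a list of numbers returns the
final means, the final variances and one cluster label per observation.
Setting all variances equal and holding them fixed reduces the rule to
plain minimum distance, as in the common-variance programs.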