. TSPAC A Package of Programs for Time-Series Segmentation Date of this documentation: 25-Feb-1993 Prof. Stanley L. Sclove, Ph.D. Information & Decision Sciences Dept. University of Illinois at Chicago Contents -------- Description of TSPAC APPENDIX A: DIRECTORY OF TSPAC PROGRAMS APPENDIX B: RUNNING THE PROGRAMS IN CMS WITH VS FORTRAN .................................................................... Description of TSPAC TSPAC is part of the ISOPAC library, which also contains CLUSPAC for cluster analysis and IMPAC for segmentation of numerical images. The input to programs in TSPAC is a time series {x(t), t = 1,2,...,n}, where x(t) may be a vector. The algorithm used in TSPAC goes as follows. (More formal presentations are found in Sclove (1983a,b,c). ) The elements of the segmentation model are the class-conditional time series or distributions, with their parameters; the labels; and the transition probabilities between the labels. Correspondingly, the algorithm alternates between estimation of the distributional parameters, estimation of the labels, and estimation of the transition probabilities. That is, given a tentative labeling, one can obtain tentative estimates of the parameters of the class-conditional distributions and of the transition probabilities. One then relabels the observations, using these updated parameter estimates. The relabeling is done as follows: If x(t) is currently labeled as class c (i.e., if time period t is tentatively assigned to class c) then x(t+1) is labeled as that class d for which the product p(c to d)f(x(t+1)|d) is maximal, where p(c to d) denotes the current estimate of the probability of a transition from c to d and f(x|d) denotes the tentatively estimated class-d probability density, evaluated at x. This makes sense because, under the assumptions of the model, the likelihood is the product of such terms. Example. Consider this short, artificial time series. t: 1 2 3 4 5 6 7 8 9 10 11 12 x(t): 1 1 3 1 2 1 2 6 7 1 1 1 Suppose it is specified that there are two classes and that the class-conditional distributions are exponential. Suppose the initial guesses of the parameters are equal prior probabilities for the two classes and means of 2 and 3. Then initially the class-conditional densities are taken as f(x|a) = (1/2)exp(-x/2) and f(x|b) = (1/3)exp(-x/3). In the first iteration, using equal prior probabilities of .5, one labels x as having come from Class A if .5f(x|a) > .5f(x|b), which simplifies to x < 2.43. This gives the following estimated labels at the end of the first iteration. t: 1 2 3 4 5 6 7 8 9 10 11 12 x(t): 1 1 3 1 2 1 2 6 7 1 1 1 label: a a b a a a a b b a a a Now the transition probabilities can be estimated from the sequence of estimated labels. The 11 transitions are: a to a, a to b, b to a, a to a, a to a, a to a, a to b, b to b, b to a, a to a, a to a. These give the following estimates of the transition probabilities. p(a to a) = 3/4 p(a to b) = 1/4 p(b to a) = 2/3 p(b to b) = 1/3 The class means are estimated as follows Mean of a's: (1 + 1 + 1 + 2 + 1 + 2 + 1 + 1 + 1)/9 = 11/9 = 1.22 Mean of b's: (3 + 6 + 7)/3 = 16/3 = 5.33 Now the condition for labeling the current x as "a", given that the preceding x has been labeled as "a", is p(a to b)f(x|b) < p(a to a)f(x|a), or (1/4)(1/5.33)exp(-x/5.33) < (3/4)(1/1.22)exp(-x/1.22), which simplifies to x < 4.075. Similarly, the condition for labeling x as "a", given that the preceding observation has been labeled as "b", is p(b to b)f(x|b) < p(b to a)f(x|a), which simplifies to x < 1.24. These second-iteration classification rules change only the label of x(3) from "b" to "a". References Sclove, Stanley L. (1983a). Time-series segmentation: a model and a method. Information Sciences 29, 7-25. Sclove, Stanley L. (1983b). Application of the conditional population-mixture model to image segmentation. IEEE Trans. Pattern Analysis and Machine Intelligence 5, 428-433. Sclove, Stanley L. (1983c). On segmentation of time series. In Studies in Econometrics, Time Series and Multivariate Statistics (S. Karlin, T. Amemiya and L. Goodman, eds.), Academic Press, pp. 311-330. -------------------------------------------------------------------- APPENDIX A: DIRECTORY OF TSPAC PROGRAMS TSSG1CMA TSPAC - univariate series, common variance, automatic mode TSSG1CAS TSPAC - univariate series, common variance, automatic mode, sparse transition probability matrix TSSGPDTA TSPAC - multiple time-series, automatic mode APPENDIX B: RUNNING THE PROGRAMS Until 31-Dec-1999, one of the operating systems maintained by UIC was IBM CMS. Consequently, this Appendix concerns using TSPAC through VS FORTRAN on CMS. Assume that the ISOPAC program library is online on a public disk called ISOPAC. Then access the programs by entering GETDISK ISOPAC in CMS. The ISOPAC disk contains object programs for TSPAC. The TSPAC programs have filetype TEXT. If you want to run the program with dataset name fn TEXT fm, type FORTVLG fn TEXT fm (DATA fn2 where the data to run the program have been collected into a dataset with filename fn2 and filetype DATA, according to instructions given in the next section. The above command will send the output to your terminal screen. If you want the output to go to your disk, use the OUTPUT option, as follows. FORTVLG fn TEXT fm (DATA fn2 OUTPUT The output will go to a file called fn FT06F001. For example, if you want to run TSSG1CMA, and the ISOPAC disk has been accessed with filemode M, and the data are in MYDATA DATA, and you want the output to go to your disk, then the command is as follows. FORTVLG TSSG1CMA TEXT M (DATA MYDATA OUTPUT The output will go to a file called TSSG1CMA FT06F001 on your A-disk. You can then rename the file with a more descriptive name, for example as follows. RENAME TSSG1CMA FT06F001 A MYDATA TSSG1CMA A (To use the above FORTVLG command, the input data would have been saved into the dataset MYDATA DATA referred to after the parenthesis in the data option.) -------------------------------------------------------------------- Original date of this documentation: 1993: Feb 25 latest revision: 2002: July 26 --------------------------------------------------------------------