A time series is a set of numerical observations taken at regular intervals of time. Time series analysis is itself the subject of entire courses. (At UIC, these courses are IDS 476: Business Forecasting using Time Series Methods, cross-listed as ECON 450, and, on the graduate level, ECON 537-538: Research and Forecasting I-II, cross-listed as IDS 582-583.)
(Program 6, "Time Series," from the series Against All Odds: Inside Statistics, can be viewed as part of an introduction to time series analysis.)
We begin with an example of a short time series.
# CMS DSN = KnEx20x5 DAT
# Kenkel, 2nd ed., Expl 20.5
# ANNUAL GNP OF THE U.S. ($BILLIONS) FOR 1968 TO 1977
# GNP (Y)
868.5
935.5
982.4
1063.4
1171.1
1306.6
1412.9
1528.8
1700.1
1887.2
MTB > TSPLOT C1
[Character plot: GNP (vertical axis, ticks at 1050, 1400 and 1750) against time index 0-10 (horizontal axis); each point is labeled with the last digit of its index, and the series rises steadily.]
MTB > NAME C2 'GNP(t-1)'
MTB > LAG C1, PUT RESULTS INTO C2
MTB > PRINT C1 C2
ROW GNP GNP(t-1)
1 868.5 *
2 935.5 868.5
3 982.4 935.5
4 1063.4 982.4
5 1171.1 1063.4
6 1306.6 1171.1
7 1412.9 1306.6
8 1528.8 1412.9
9 1700.1 1528.8
10 1887.2 1700.1
MTB > PLOT C1 VS C2
[Character scatter plot: GNP(t) (vertical axis, ticks at 900 to 1800) against GNP(t-1) (horizontal axis, ticks at 960 to 1600); the points rise from lower left to upper right.]
N* = 1
The correlation between observations k intervals apart is measured numerically by the lag-k sample autocorrelation coefficient. This is much like a correlation between an X and a Y, where Y is Y(t) and X is Y(t-k).
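The lag-k coefficient can be sketched in a few lines of Python. This sketch assumes the usual textbook definition, r(k) = sum of (Y(t) - Ybar)(Y(t-k) - Ybar) divided by the sum of (Y(t) - Ybar)^2, where Ybar is the mean of the whole series. Unlike an ordinary X-Y correlation, it uses a single mean and a single sum of squares for both "variables." For the GNP data it should agree with the Minitab ACF output below (0.685 at lag 1, 0.400 at lag 2). The function name acf is only illustrative.

```python
from statistics import fmean

# Annual U.S. GNP ($billions), 1968-1977, from the listing above.
GNP = [868.5, 935.5, 982.4, 1063.4, 1171.1, 1306.6, 1412.9, 1528.8, 1700.1, 1887.2]


def acf(y, k):
    """Lag-k sample autocorrelation r_k of the series y.

    Deviations are taken from the mean of the whole series, and the sum of
    cross-products is divided by the total sum of squares of the series.
    """
    n = len(y)
    ybar = fmean(y)
    denom = sum((v - ybar) ** 2 for v in y)
    num = sum((y[t] - ybar) * (y[t - k] - ybar) for t in range(k, n))
    return num / denom


for k in range(1, 4):
    print(k, round(acf(GNP, k), 3))
```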
ACF of GNP
-1.0 -0.8 -0.6 -0.4 -0.2 0.0 0.2 0.4 0.6 0.8 1.0
+----+----+----+----+----+----+----+----+----+----+
1 0.685 XXXXXXXXXXXXXXXXXX
2 0.400 XXXXXXXXXXX
3 0.147 XXXXX
4 -0.082 XXX
5 -0.272 XXXXXXXX
6 -0.375 XXXXXXXXXX
7 -0.405 XXXXXXXXXXX
8 -0.362 XXXXXXXXXX
9 -0.237 XXXXXXX
PACF of GNP
-1.0 -0.8 -0.6 -0.4 -0.2 0.0 0.2 0.4 0.6 0.8 1.0
+----+----+----+----+----+----+----+----+----+----+
1 0.685 XXXXXXXXXXXXXXXXXX
2 -0.131 XXXX
3 -0.141 XXXXX
4 -0.172 XXXXX
5 -0.173 XXXXX
6 -0.099 XXX
7 -0.087 XXX
8 -0.047 XX
9 0.036 XX
If one or more autocorrelation coefficients are high, as is the case above, we consider fitting an "autoregressive" model. Here we fit a first-order autoregressive model, AR(1), to the GNP data, since the PACF shows a single spike at lag 1.
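The PACF values can be obtained from the sample autocorrelations. The sketch below assumes the standard Durbin-Levinson recursion, which is the usual way of computing the PACF from the ACF; that Minitab uses exactly this recursion is an assumption. For the GNP data it should give about 0.685 at lag 1 and -0.131 at lag 2, as in the PACF output above.

```python
from statistics import fmean

# Annual U.S. GNP ($billions), 1968-1977.
GNP = [868.5, 935.5, 982.4, 1063.4, 1171.1, 1306.6, 1412.9, 1528.8, 1700.1, 1887.2]


def acf(y, k):
    """Lag-k sample autocorrelation (deviations from the overall mean)."""
    ybar = fmean(y)
    denom = sum((v - ybar) ** 2 for v in y)
    return sum((y[t] - ybar) * (y[t - k] - ybar) for t in range(k, len(y))) / denom


def pacf(y, max_lag):
    """Partial autocorrelations phi_kk, k = 1..max_lag, by the Durbin-Levinson recursion."""
    r = [acf(y, k) for k in range(max_lag + 1)]
    prev = []  # prev[j-1] holds phi_{k-1,j}, the previous-order coefficients
    result = []
    for k in range(1, max_lag + 1):
        num = r[k] - sum(prev[j - 1] * r[k - j] for j in range(1, k))
        den = 1.0 - sum(prev[j - 1] * r[j] for j in range(1, k))
        phi_kk = num / den
        # phi_{k,j} = phi_{k-1,j} - phi_kk * phi_{k-1,k-j} for j < k, then append phi_kk.
        prev = [prev[j - 1] - phi_kk * prev[k - j - 1] for j in range(1, k)] + [phi_kk]
        result.append(phi_kk)
    return result


print([round(p, 3) for p in pacf(GNP, 4)])
```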
MTB > REGRess 'GNP(t)' on the 1 variable 'GNP(t-1)' &
MTB > put standardized residuals into C11 and fits into C12;
SUBC> DW.
The regression equation is
GNP(t) = - 64.3 + 1.15 GNP(t-1)
9 cases used 1 cases contain missing values
Predictor       Coef     Stdev   t-ratio
Constant      -64.27     32.00     -2.01
GNP(t-1)     1.14560   0.02563     44.70
s = 20.80   R-sq = 99.7%   R-sq(adj) = 99.6%
Durbin-Watson statistic = 1.98
A value of the DW statistic near 2, like this one, indicates no first-order autocorrelation of the residuals; in this case, however, there is a high lag-2 residual autocorrelation (see the ACF of RES below).
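The regression of GNP(t) on GNP(t-1) is ordinary least squares on the nine (GNP(t-1), GNP(t)) pairs. A minimal Python sketch follows (the helper names are hypothetical). The Durbin-Watson statistic is computed as the sum of squared successive differences of the residuals divided by the residual sum of squares. The sketch should give roughly a = -64.27, b = 1.1456, s = 20.80 and DW = 1.98, matching the Minitab output.

```python
from statistics import fmean

# Annual U.S. GNP ($billions), 1968-1977.
GNP = [868.5, 935.5, 982.4, 1063.4, 1171.1, 1306.6, 1412.9, 1528.8, 1700.1, 1887.2]


def ols_line(x, y):
    """Least-squares intercept a and slope b for the line y = a + b*x."""
    xbar, ybar = fmean(x), fmean(y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b = sxy / sxx
    return ybar - b * xbar, b


def durbin_watson(e):
    """Sum of squared successive differences of the residuals over their sum of squares."""
    num = sum((e[t] - e[t - 1]) ** 2 for t in range(1, len(e)))
    return num / sum(v * v for v in e)


# Pair each GNP(t) with GNP(t-1); the first year has no lagged value, so 9 cases remain.
x, y = GNP[:-1], GNP[1:]
a, b = ols_line(x, y)
resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
s = (sum(e * e for e in resid) / (len(resid) - 2)) ** 0.5  # residual standard deviation
print(f"GNP(t) = {a:.2f} + {b:.5f} GNP(t-1)   s = {s:.2f}   DW = {durbin_watson(resid):.2f}")
```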
It is possible to do the LAG and REGRESS automatically using the ARIMA command, which in this case would look like the following.
MTB > ARIMA p=1 d=0 q=0 for data in 'GNP', put residuals into C11
MTB > NAME C12 'FITS'
MTB > NAME C13 'RES'
MTB > LET 'RES' = 'GNP' - 'FITS'
MTB > TSPLOT 'RES'
[Character time-series plot of RES (vertical axis, ticks at -20.0, 0.0 and 20.0) against time index 0-10; each point is labeled with the last digit of its index.]
N* = 1
MTB > ACF 'RES'
ACF of RES
-1.0 -0.8 -0.6 -0.4 -0.2 0.0 0.2 0.4 0.6 0.8 1.0
+----+----+----+----+----+----+----+----+----+----+
1 0.003 X
2 -0.592 XXXXXXXXXXXXXXXX
3 -0.273 XXXXXXXX
4 0.301 XXXXXXXXX
5 0.211 XXXXXX
6 -0.145 XXXXX
7 -0.011 X
8 0.006 X
There is a high lag-2 residual autocorrelation (-0.592), which indicates that the model is inadequate. Something more has to be done, such as working with first or second differences and/or taking logs. A very short segment of a time series was chosen here for purposes of illustration; it is too short to give sufficiently accurate estimates of additional parameters.
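The residual check can be reproduced by refitting the AR(1) regression and applying the sample autocorrelation to its residuals. The Python sketch below assumes the same ACF definition as before (deviations from the overall mean of the residuals); it should give about 0.003 at lag 1 and -0.592 at lag 2, as in the ACF of RES.

```python
from statistics import fmean

# Annual U.S. GNP ($billions), 1968-1977.
GNP = [868.5, 935.5, 982.4, 1063.4, 1171.1, 1306.6, 1412.9, 1528.8, 1700.1, 1887.2]


def acf(y, k):
    """Lag-k sample autocorrelation (deviations from the overall mean)."""
    ybar = fmean(y)
    denom = sum((v - ybar) ** 2 for v in y)
    return sum((y[t] - ybar) * (y[t - k] - ybar) for t in range(k, len(y))) / denom


# Residuals of the least-squares fit of GNP(t) on GNP(t-1); the first year has no lag.
x, y = GNP[:-1], GNP[1:]
xbar, ybar = fmean(x), fmean(y)
b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sum((xi - xbar) ** 2 for xi in x)
a = ybar - b * xbar
res = [yi - (a + b * xi) for xi, yi in zip(x, y)]

for k in range(1, 5):
    print(k, round(acf(res, k), 3))
```

A large value at lag 2 together with a near-zero value at lag 1 is exactly the situation the Durbin-Watson statistic cannot detect, since DW looks only at lag-1 structure.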
With a longer series, one could fit an ARIMA (Box-Jenkins) model to the data. ARIMA = AR + I + MA: AR refers to autoregression; I refers to differencing (the observed series may be an "integrated," i.e., cumulatively summed, version of a stationary series that is recovered by differencing); and MA refers to "moving average" terms in the model, i.e., terms in current and past values of the random error (white-noise) sequence.
In fitting ARIMA models, one uses the patterns of the ACF and PACF to identify an appropriate model. For AR(p) models, the ACF tails off gradually, while the PACF has spikes through lag p and is essentially zero after that; this is the pattern seen for the GNP data above, with p = 1.
MA(q) models exhibit the reverse behavior: the ACF has spikes through lag q and is essentially zero after that, while the PACF tails off.
We consider in detail only the MA(1) model. Suppose that u(1), u(2),..., u(t), ... is a "white noise" sequence; this means that
(i) the means of the u(t)'s are zero;
(ii) the u(t)'s all have the same variance, say v;
(iii) the u(t)'s are uncorrelated.
Let
$$Y_t = u_t - \theta\, u_{t-1},$$
where $\theta$ is a constant. Then $Y_t$ and $Y_{t-1}$ are correlated because both contain $u_{t-1}$, but $Y_t$ and $Y_{t-2}$ are uncorrelated because they contain no u's in common and the u's are uncorrelated. The series $\{Y_t\}$ is an MA(1) series.
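For the MA(1) model written as Y(t) = u(t) - θu(t-1) (the minus sign follows the D(t) = u(t) - F u(t-1) convention used later; θ is just a label for the constant), Var Y(t) = (1 + θ²)v and Cov(Y(t), Y(t-1)) = -θv. So the lag-1 autocorrelation is -θ/(1 + θ²), and the autocorrelations at lags 2, 3, ... are zero, which is the "spike" pattern described above. The simulation sketch below checks this numerically with an arbitrary θ = 0.6 and Gaussian white noise of variance 1. The sample values only approximate the theoretical ones.

```python
import random
from statistics import fmean


def simulate_ma1(theta, n, seed=0):
    """n values of Y(t) = u(t) - theta*u(t-1), where u is Gaussian white noise with variance 1."""
    rng = random.Random(seed)
    u = [rng.gauss(0.0, 1.0) for _ in range(n + 1)]
    return [u[t] - theta * u[t - 1] for t in range(1, n + 1)]


def acf(y, k):
    """Lag-k sample autocorrelation (deviations from the overall mean)."""
    ybar = fmean(y)
    denom = sum((v - ybar) ** 2 for v in y)
    return sum((y[t] - ybar) * (y[t - k] - ybar) for t in range(k, len(y))) / denom


theta = 0.6  # illustrative value, not taken from any data set
y = simulate_ma1(theta, 20000, seed=1)
rho1 = -theta / (1 + theta ** 2)  # theoretical lag-1 autocorrelation
print("theoretical rho1:", round(rho1, 3))
print("sample r1, r2, r3:", [round(acf(y, k), 3) for k in (1, 2, 3)])
```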
The dataset MBTB1604 DAT contains sales data. Sales series often do not appear to be stationary; in fact, they often do not appear to vary about a fixed mean. However, the first differences, D(t) = Y(t) - Y(t-1), i.e., the changes in sales, often do appear to be stationary and to vary about a fixed mean close to zero. We therefore model the original series {Y(t)} indirectly by modeling the stationary series {D(t)}. It is often more interesting and natural to look at the sales increases rather than at the sales figures themselves.
If the ACF of {D(t)} shows a significant spike at lag 1 and is otherwise essentially zero, it is appropriate to model {D(t)} as a first-order moving average, MA(1); the original series {Y(t)} is then ARIMA(0,1,1).
Exponential smoothing is a method of forecasting included in many business software packages. However, it is optimal only when the best model is ARIMA p=0, d=1, q=1, also denoted IMA(1,1). Further, exponential smoothing involves choosing the value of a weight, denoted by W in LBS, called the "smoothing constant." It seems preferable to run ARIMA(0,1,1), which estimates this constant from the data.
More precisely, the exponential smoothing scheme is
$$\hat{Y}_t = W\,Y_{t-1} + (1-W)\,\hat{Y}_{t-1}, \qquad t = 2, 3, \ldots,$$
where the starting value $\hat{Y}_1$ is taken to be $Y_1$. The value of W, the "smoothing constant," can be taken to be 1 - f, where f is the estimate of the parameter F in the IMA(1,1) model; this model is
$$D_t = u_t - F\,u_{t-1},$$
where $D_t = Y_t - Y_{t-1}$ and $\{u_t\}$ is white noise. It can be shown that
$$\hat{Y}_t = (1-f)\left[\,Y_{t-1} + f\,Y_{t-2} + f^2\,Y_{t-3} + \cdots\right].$$
Thus $\hat{Y}_t$ turns out to be a weighted average of the past values of the series. The weights decrease in geometric (exponential) progression, and the series giving the $\hat{Y}_t$ is called an exponentially weighted (moving) average.
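A minimal Python sketch of the smoothing recursion follows. It assumes the form ^Y(t) = W Y(t-1) + (1 - W) ^Y(t-1) with ^Y(1) = Y(1), and it checks numerically that the recursion agrees with the exponentially weighted sum (1 - f)[Y(t-1) + f Y(t-2) + f² Y(t-3) + ...] with f = 1 - W. Because the observed series is finite, the leftover weight f^(t-1) falls on the starting value Y(1). The GNP series is reused only as test data, and W = 0.3 is an arbitrary choice.

```python
# Annual GNP series from the first example, reused here only as test data.
GNP = [868.5, 935.5, 982.4, 1063.4, 1171.1, 1306.6, 1412.9, 1528.8, 1700.1, 1887.2]


def exp_smooth(y, w):
    """One-step-ahead forecasts yhat[t] of y[t] (0-based indices).

    yhat[0] = y[0]; yhat[t] = w * y[t-1] + (1 - w) * yhat[t-1].
    """
    yhat = [y[0]]
    for t in range(1, len(y)):
        yhat.append(w * y[t - 1] + (1 - w) * yhat[t - 1])
    return yhat


def weighted_form(y, f, t):
    """The same forecast written as (1-f)[y[t-1] + f*y[t-2] + ...] plus f**t * y[0]."""
    total = sum((1 - f) * f ** j * y[t - 1 - j] for j in range(t))
    return total + f ** t * y[0]


w = 0.3
yhat = exp_smooth(GNP, w)
gap = max(abs(yhat[t] - weighted_form(GNP, 1 - w, t)) for t in range(len(GNP)))
next_forecast = w * GNP[-1] + (1 - w) * yhat[-1]  # forecast for the year after the data
print("largest discrepancy:", gap, " next forecast:", round(next_forecast, 1))
```

In practice, as the text suggests, one would take W = 1 - f from a fitted ARIMA(0,1,1) model rather than picking it by hand.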
Above we alluded to the differencing operation as a pre-processing step in time-series analysis. Let us examine this operation further.
MTB > DIFFerence the data in C1, put the results into C2
MTB > name c2 'dGNP'
MTB > diff c2 c3
MTB > name c3 'd2GNP'
MTB > tsplot c2
[Character time-series plot of dGNP (vertical axis, ticks at 50 to 200) against time index 0-10; each point is labeled with the last digit of its index.]
N* = 1
The first differences still exhibit an upward trend, i.e., they are not stationary. Let's look at the second differences.
MTB > TSPLOT C3
[Character time-series plot of d2GNP (vertical axis, ticks at -30.0 to 60.0) against time index 0-10; each point is labeled with the last digit of its index.]
N* = 2
First differences can be thought of as velocities, and second differences as accelerations. A constant first difference indicates a constant rate of change; a constant second difference indicates constant acceleration.
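The DIFFERENCE command can be mimicked with a one-line list comprehension. In the Python sketch below, the helper name diff is hypothetical, and its lag argument plays the role of Minitab's "DIFFERENCES OF LAG ..." option. The sketch reproduces dGNP and d2GNP for the GNP data. It also illustrates the velocity/acceleration remark: for a quadratic trend a + bt + ct², the second differences are constant and equal to 2c.

```python
# Annual GNP series ($billions), 1968-1977.
GNP = [868.5, 935.5, 982.4, 1063.4, 1171.1, 1306.6, 1412.9, 1528.8, 1700.1, 1887.2]


def diff(y, lag=1):
    """Differences y[t] - y[t-lag]; the first `lag` values have no predecessor and are dropped."""
    return [y[t] - y[t - lag] for t in range(lag, len(y))]


dGNP = diff(GNP)    # first differences: "velocities" (9 values, N* = 1)
d2GNP = diff(dGNP)  # second differences: "accelerations" (8 values, N* = 2)
print([round(v, 1) for v in dGNP])
print([round(v, 1) for v in d2GNP])

# A quadratic trend has constant acceleration: its second differences are constant.
quad = [3 + 2 * t + 5 * t * t for t in range(8)]
print(diff(diff(quad)))  # every entry equals 2 * 5 = 10
```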
# CMS DSN = EXPL2013 DATAKNKL D
# SHIPMENTS OF PORTLAND CEMENT (MILLIONS OF BARRELS)
#
# C1: Y, bbls Portland cement (millions)
# C2: MONTH
# C3: YEAR
#
# C1 C2 C3
19 1 75
17 2 75
22 3 75
29 4 75
34 5 75
36 6 75
39 7 75
39 8 75
39 9 75
42 10 75
28 11 75
23 12 75
18 1 76
20 2 76
. . .
50 10 79
38 11 79
29 12 79
MTB > NAME C1 'Y' C2 'MONTH' C3 'YEAR'
MTB > TABLE C2 C3;
SUBC> MEAN C1.
ROWS: MONTH COLUMNS: YEAR
| 75 76 77 78 79 | ALL
---|-------------------------------------------|------
1 | 19. 18. 14. 15. 17. | 16.6
2 | 17. 20. 21. 19. 19. | 19.2
3 | 22. 28. 31. 31. 32. | 28.8
4 | 29. 33. 36. 37. 36. | 34.2
5 | 34. 34. 40. 45. 45. | 39.6
6 | 36. 40. 45. 50. 48. | 43.8
7 | 39. 39. 41. 44. 45. | 41.6
8 | 39. 42. 46. 50. 50. | 45.4
9 | 39. 39. 42. 45. 43. | 41.6
10 | 42. 37. 43. 48. 50. | 44.
11 | 28. 32. 35. 38. 38. | 34.2
12 | 23. 23. 26. 29. 29. | 26.
---|-------------------------------------------|------
ALL| 30.583 32.083 35. 37.583 37.667| 34.583
CELL CONTENTS --Y:MEAN
MTB > TSPLot with period of 12 'Y'
[Character time-series plot of Y (vertical axis, ticks at 12.5 to 50.0) against time index 0-60, period 12; each point is plotted with the symbol for its month (1-9, then 0, A, B).]
MTB > DIFFERENCES OF LAG 12 FOR DATA IN 'Y' PUT INTO C12
MTB > NAME C12 'SEASDIFF'
MTB > TSPLOT 12 'SEASDIFF'
[Character time-series plot of SEASDIFF (vertical axis, ticks at -4.00, 0.00 and 4.00) against time index 0-60, period 12; each point is plotted with the symbol for its month.]
N* = 12
MTB > ACF 'SEASDIFF'
ACF of SEASDIFF
-1.0 -0.8 -0.6 -0.4 -0.2 0.0 0.2 0.4 0.6 0.8 1.0
+----+----+----+----+----+----+----+----+----+----+
1 0.333 XXXXXXXXX
2 0.190 XXXXXX
3 0.259 XXXXXXX
4 -0.029 XX
5 0.016 X
6 -0.085 XXX
7 -0.207 XXXXXX
8 -0.216 XXXXXX
9 -0.184 XXXXXX
10 -0.298 XXXXXXXX
11 -0.100 XXXX
12 0.083 XXX
13 0.112 XXXX
14 0.212 XXXXXX
15 0.073 XXX
16 0.065 XXX
MTB > PACF 'SEASDIFF'
PACF of SEASDIFF
-1.0 -0.8 -0.6 -0.4 -0.2 0.0 0.2 0.4 0.6 0.8 1.0
+----+----+----+----+----+----+----+----+----+----+
1 0.333 XXXXXXXXX
2 0.090 XXX
3 0.194 XXXXXX
4 -0.203 XXXXXX
5 0.044 XX
6 -0.157 XXXXX
7 -0.102 XXXX
8 -0.161 XXXXX
9 0.007 X
10 -0.221 XXXXXXX
11 0.153 XXXXX
12 0.135 XXXX
13 0.204 XXXXXX
14 0.041 XX
15 -0.120 XXXX
16 -0.107 XXXX
MTB > ARIMA 0 0 0 1 1 0 12 'Y' RESIDUALS INTO C41
Final Estimates of Parameters
Type Estimate St. Dev. t-ratio
SAR 12 0.4091 0.1365 3.00
Differencing: 0 regular, 1 seasonal of order 12
No. of obs.: Original series 60, after differencing 48
Residuals: SS = 385.360 (backforecasts excluded)
MS = 8.199 DF = 47
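Each cell of the TABLE output above holds exactly one observation (one per month and year), so the full 60-month cement series can be read back from that table. Under that reading, the Python sketch below forms the lag-12 seasonal differences (48 values, as in "after differencing 48") and their sample autocorrelations; the lag-1 value should be close to the 0.333 reported for SEASDIFF. It does not try to reproduce the SAR 12 estimate of 0.4091, which comes from Minitab's own ARIMA estimation procedure.

```python
from statistics import fmean

# Monthly shipments of Portland cement (millions of barrels), January 1975 to
# December 1979, read back from the TABLE output above (one value per cell).
Y = [
    19, 17, 22, 29, 34, 36, 39, 39, 39, 42, 28, 23,  # 1975
    18, 20, 28, 33, 34, 40, 39, 42, 39, 37, 32, 23,  # 1976
    14, 21, 31, 36, 40, 45, 41, 46, 42, 43, 35, 26,  # 1977
    15, 19, 31, 37, 45, 50, 44, 50, 45, 48, 38, 29,  # 1978
    17, 19, 32, 36, 45, 48, 45, 50, 43, 50, 38, 29,  # 1979
]


def seasonal_diff(y, period=12):
    """Y(t) - Y(t - period): the change from the same month one year earlier."""
    return [y[t] - y[t - period] for t in range(period, len(y))]


def acf(y, k):
    """Lag-k sample autocorrelation (deviations from the overall mean)."""
    ybar = fmean(y)
    denom = sum((v - ybar) ** 2 for v in y)
    return sum((y[t] - ybar) * (y[t - k] - ybar) for t in range(k, len(y))) / denom


seasdiff = seasonal_diff(Y)  # 60 - 12 = 48 values
print(len(seasdiff), [round(acf(seasdiff, k), 3) for k in (1, 2, 3)])
```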
MTB > NAME C1 'GNP' C2 'FYGI3'
MTB > READ 'mleh1221 dat m' C1-c2
60 ROWS READ
ROW C1 C2
1 972.0 7.599
2 986.3 6.742
3 1003.6 6.541
4 1009.0 5.820
. . .
These series exhibit a strong trend (upward); therefore, they are differenced to achieve stationarity, as follows.
MTB > diff 'GNP' put into C11
MTB > DIFFerence 'FYGI3' put into C12
MTB > name C11 'DIFFGNP' C12 'DIFFTBIL'
MTB > save 'MLEH1221 MTW M'
MTB > CCF 10 lags of C12 and C11
CCF - correlates DIFFTBIL(t) and DIFFGNP(t+k)
-1.0 -0.8 -0.6 -0.4 -0.2 0.0 0.2 0.4 0.6 0.8 1.0
+----+----+----+----+----+----+----+----+----+----+
-10 0.158 XXXXX
-9 -0.126 XXXX
-8 -0.063 XXX
-7 -0.212 XXXXXX
-6 0.038 XX
-5 -0.015 X
-4 -0.056 XX
-3 0.050 XX
-2 0.084 XXX
-1 0.359 XXXXXXXXXX <=== The largest values are here
0 0.386 XXXXXXXXXXX <=== and here.
1 0.117 XXXX
2 -0.143 XXXXX
3 0.137 XXXX
4 -0.167 XXXXX
5 -0.095 XXX
6 -0.114 XXXX
7 -0.087 XXX
8 -0.029 XX
9 0.047 XX
10 0.061 XXX
The value of k ranges from -10 to +10. For negative values of k, the CCF gives the correlation between current DIFFTBIL and past DIFFGNP; for positive values of k, it gives the correlation between current DIFFGNP and past DIFFTBIL. For k = 0, the CCF gives the correlation between contemporaneous values of the two series. The largest correlations are at k = 0 and k = -1, indicating some relationship of current DIFFTBIL with current and past DIFFGNP.
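Only the first few rows of the GNP/T-bill data are listed, so the Python sketch below uses a hypothetical pair of simulated series in which the second series is the first one delayed by one period. The function assumes the usual sample cross-correlation: the sum of (x(t) - xbar)(y(t+k) - ybar) divided by the square root of the product of the two total sums of squares. That Minitab normalizes in exactly this way is an assumption. With x playing the role of DIFFTBIL and y that of DIFFGNP, the spike should appear at k = +1, meaning current y is related to x one period earlier.

```python
import random
from statistics import fmean


def ccf(x, y, k):
    """Sample cross-correlation between x(t) and y(t+k); k may be negative."""
    n = len(x)
    mx, my = fmean(x), fmean(y)
    sx = sum((v - mx) ** 2 for v in x)
    sy = sum((v - my) ** 2 for v in y)
    if k >= 0:
        pairs = zip(x[: n - k], y[k:])  # (x(t), y(t+k)) for t = 0 .. n-k-1
    else:
        pairs = zip(x[-k:], y[: n + k])  # (x(t), y(t+k)) for t = -k .. n-1
    return sum((a - mx) * (b - my) for a, b in pairs) / (sx * sy) ** 0.5


# Hypothetical data: y(t) equals x(t-1), so y lags x by one period.
rng = random.Random(3)
x = [rng.gauss(0.0, 1.0) for _ in range(200)]
y = [0.0] + x[:-1]

for k in range(-3, 4):
    print(f"{k:3d} {ccf(x, y, k):7.3f}")
```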
Thus, in trying to predict DIFFTBIL, one might consider a model in which DIFFTBIL is regressed on current and past DIFFGNP, possibly with autoregressive terms in DIFFTBIL as well.
Since steel is made from iron and coal, it makes sense to try to predict steel prices from iron and coal prices.
Let
St = this quarter's price of steel,
It = this quarter's price of iron, and
Ct = this quarter's price of coal.
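One might consider a transfer-function model, in which St is expressed in terms of current and past values of It and Ct plus an error term.
However, one would not really want to say that I and C determine or "cause" the price of steel (S) unless one took account of the possibility of predicting St from St-1. Hence one would consider a model that also includes the lagged steel price St-1 as a predictor, along with the values of I and C.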
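A least-squares sketch of a model with a lagged dependent variable follows. It assumes the specific form S(t) = b0 + b1 I(t) + b2 C(t) + b3 S(t-1) + error, which is only one of many possible specifications. The quarterly prices are simulated, and the coefficients 10, 0.8, 1.2 and 0.5 are hypothetical. The normal-equations solver is written out only to keep the example free of third-party libraries.

```python
import random


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols(rows, y):
    """Least-squares coefficients from the normal equations (X'X) beta = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical quarterly prices, simulated for illustration only.
rng = random.Random(7)
n = 40
iron = [50 + rng.gauss(0, 5) for _ in range(n)]
coal = [30 + rng.gauss(0, 3) for _ in range(n)]
steel = [170.0]
for t in range(1, n):
    steel.append(10 + 0.8 * iron[t] + 1.2 * coal[t] + 0.5 * steel[t - 1] + rng.gauss(0, 1))

# S(t) = b0 + b1*I(t) + b2*C(t) + b3*S(t-1) + error; the first quarter has no S(t-1).
rows = [[1.0, iron[t], coal[t], steel[t - 1]] for t in range(1, n)]
b0, b1, b2, b3 = ols(rows, steel[1:])
print(f"S(t) = {b0:.2f} + {b1:.2f} I(t) + {b2:.2f} C(t) + {b3:.2f} S(t-1)")
```

Comparing this fit with one that omits S(t-1) is one informal way to see how much of the apparent "effect" of I and C survives once the steel price's own past is taken into account.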
An interesting example is Sales vs Advertising Expenditure.
1. Discuss the ARIMA(0,1,0) model.
2. Discuss the ARIMA(0,2,0) model.
3. Discuss the ARIMA(0,1,1) model.
4. If Yt = 0.5Yt-1 - 0.5Yt-2, t = 3, 4, 5, ..., 12, and Y1 = 24 and Y2 = 24, compute Yt, t = 3, 4, 5, ..., 12. Describe the behavior of this series.
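Hand computations for Exercise 4 can be checked with a short loop. The function name below is hypothetical. Scanning the printed values, or plotting them, is a convenient way to see the behavior the exercise asks about.

```python
def exercise4(y1, y2, last=12):
    """Values Y(1..last) of Y(t) = 0.5*Y(t-1) - 0.5*Y(t-2), starting from Y(1) and Y(2)."""
    y = {1: y1, 2: y2}
    for t in range(3, last + 1):
        y[t] = 0.5 * y[t - 1] - 0.5 * y[t - 2]
    return y


for t, value in exercise4(24, 24).items():
    print(t, value)
```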