Against All Odds: Inside Statistics, program #6 (Time Series)
A time series is a set of time-ordered data obtained at regular, equally spaced time points.
Example. Stock prices. The series of prices of a stock every time it is traded is time-ordered data. The series of closing prices of that stock, consisting of the price of the stock at the close of each trading day, is a time series.
Typical notation for a time series is
y_1, y_2, ..., y_n.
Example. Demand-driven Production/Inventory system. Manufacturing firms produce goods and maintain inventory to meet demand. Production, Demand, and Inventory are closely related. Production and Inventory must be balanced with Demand.
Here the role of statistics is to predict ("forecast") demand, which may be growing, shrinking, or seasonal. Statistical methods are used to organize and summarize the demand data and to develop forecasts, and thus help set ideal production and inventory levels.
How might you forecast Demand? First, you need an accounting system which keeps track not only of Sales, but also of Demand. (Demand equals Sales plus unfilled orders.) The symbol t denotes the current time period. Then t+1 is the next time period. One forecast would be to guess that demand next period, D_{t+1}, will be equal to demand this period, D_t. Using F_{t+1} to denote the forecast of the value at time t+1, we write F_{t+1} = D_t. This simplest of all predictions, namely, that the value next period will be the same as this period, is called the "naive" prediction. Another forecast would be to guess that demand next period, D_{t+1}, will be equal to demand this period plus a change equal to the change just preceding:
F_{t+1} = D_t + (D_t - D_{t-1}) = 2D_t - D_{t-1}.
You can see that there could be many reasonable schemes for predicting next period's demand, based on the stream of past demands. We need some theory to suggest what might be best. And after we see what might be best, we need to be able to do the required computation. Except in the simplest cases, the computations are heavy, as with regression analysis. Excel has only limited capability for time series analysis, so statistical computing packages (such as Minitab) are usually used for all but the simplest time series calculations.
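The two forecasting rules above are simple enough to compute by hand, but a short sketch may make the notation concrete. The function names and the demand figures below are made up for illustration; they are not from the program or from any real accounting system.

```python
def naive_forecast(demand):
    """Naive prediction: F_{t+1} = D_t (next period equals this period)."""
    if len(demand) < 1:
        raise ValueError("need at least one past demand value")
    return demand[-1]


def trend_forecast(demand):
    """Last value plus the most recent change: F_{t+1} = 2*D_t - D_{t-1}."""
    if len(demand) < 2:
        raise ValueError("need at least two past demand values")
    return 2 * demand[-1] - demand[-2]


# Hypothetical demand history, oldest first; the last entry is D_t.
demand = [100, 104, 110]

print(naive_forecast(demand))  # 110
print(trend_forecast(demand))  # 2*110 - 104 = 116
```

Both rules use only the most recent one or two demands; the schemes discussed later use weighted averages of past values instead.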
Example. Income statements.
An income statement showing comparisons with preceding years can be viewed
as an example of multiple time series. Each item (row) on the income
statement can be considered as a variable. There is a
time series across the years shown for each of these variables.
Each observed value can be thought of as a mean value plus an error. The mean values are the "true" values that would still be present if we could observe the variable more than once at each time point. The error is random and would differ on each occasion of observation.
We would like to estimate the series of "true" mean values. They contain the true pattern of the series. Smoothing means the averaging of adjacent observations. One reasonable way of estimating the mean values is by smoothing. The reason smoothing works is that the trend, the underlying true values, usually moves slowly, so that its adjacent values, and consequently those of the observed series, are not far apart. When we average adjacent values, the errors tend to cancel out and the trend is well estimated. The smoothed series consists of these averages. In the smoothed series, the smoothed value at time t replaces the observed value at time t; S_t replaces y_t. The resulting series is called "smoothed" because usually the sizes of any short-term ups and downs are decreased by the averaging. It is usually easier to spot a trend in the smoothed series than in the original series.
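To see the cancellation of errors in a small case, here is a sketch under strong simplifying assumptions: the "true" mean values follow a made-up straight line, and the errors are a fixed alternating pattern rather than genuinely random. All numbers are hypothetical and chosen only so the arithmetic is easy to follow.

```python
# Hypothetical slowly moving "true" mean values (a straight line).
trend = [10 + 0.5 * t for t in range(9)]

# Hypothetical errors: fixed and alternating, standing in for random error.
errors = [1, -1, 1, -1, 1, -1, 1, -1, 1]

# Observed series: y_t = mean value + error.
observed = [m + e for m, e in zip(trend, errors)]

# Smooth by averaging each observation with its two neighbours.
# The first and last time points have no smoothed value.
smoothed = [
    (observed[t - 1] + observed[t] + observed[t + 1]) / 3
    for t in range(1, len(observed) - 1)
]

# Largest distance from the true mean values, before and after smoothing.
max_err_observed = max(abs(o - m) for o, m in zip(observed, trend))
max_err_smoothed = max(abs(s - m) for s, m in zip(smoothed, trend[1:-1]))

print(max_err_observed)            # 1.0
print(round(max_err_smoothed, 3))  # 0.333
```

In this contrived case the smoothed series stays within one third of a unit of the true values, while the observed series is off by a full unit. With real random errors the improvement is less tidy, but the same reasoning applies.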
Forming a moving average is a way of smoothing.
A moving average of width three (three-point moving average)
is the result of averaging the observations in a moving window of
width three. First there is the average of observations at times
1, 2 and 3, then the average of observations at times 2, 3 and 4, then
the average of the observations at times 3, 4 and 5, etc.
The analogous definition applies for windows of widths other than three.
Moving averages are also sometimes called running averages.
Example. In AAO Pgm 6, the running
median of Boston Marathon winning times is computed.
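A running median of width three can be sketched as follows. The first three values are the winning times (in minutes) for 1972 to 1974 quoted later in these notes; the last two values are invented so that the window has something to move over. The function name is our own.

```python
from statistics import median


def running_median(y, width=3):
    """Median of each window of `width` adjacent observations.

    The result has len(y) - width + 1 values; the first one belongs to
    the time point at the centre of the first window.
    """
    if width < 1 or width % 2 == 0:
        raise ValueError("width must be a positive odd integer")
    half = width // 2
    return [median(y[i - half:i + half + 1]) for i in range(half, len(y) - half)]


# 190, 186, 167 are from the notes; 175 and 170 are hypothetical.
times = [190, 186, 167, 175, 170]

print(running_median(times))  # [186, 175, 170]
```

Because the median ignores how far the extreme value in each window lies from the others, a single unusual observation has little effect on the smoothed series.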
Moving Averages
Of course, in forming a moving average we could use a mean instead of a median. The resulting moving mean is usually called simply a moving average. The centered moving mean of three is the smoothed value
S_t = mean of y_{t-1}, y_t, y_{t+1} = (y_{t-1} + y_t + y_{t+1})/3.
For example, with the winning Marathon times,
S_{1973} = mean of y_{1972}, y_{1973}, y_{1974} = mean of 190, 186, 167 = (190 + 186 + 167)/3 = 181.0.
A three-point moving average is one choice; another common choice is a five-point moving average.
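The centered moving mean can be sketched in a few lines. The function name and the `width` parameter are our own choices; the three winning times are the ones used in the example above.

```python
def centered_moving_mean(y, width=3):
    """Mean of each window of `width` adjacent observations.

    For width three, the value at position t is
    (y[t-1] + y[t] + y[t+1]) / 3. Time points too close to either
    end of the series to have a full window get no smoothed value.
    """
    if width < 1 or width % 2 == 0:
        raise ValueError("width must be a positive odd integer")
    half = width // 2
    return [
        sum(y[i - half:i + half + 1]) / width
        for i in range(half, len(y) - half)
    ]


# Winning times for 1972, 1973, 1974 as given in the notes.
times = [190, 186, 167]

print(centered_moving_mean(times))  # [181.0], the smoothed value for 1973
```

Calling the same function with `width=5` gives a five-point moving average, which needs at least five observations to produce a value.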
Suppose we want to forecast using just a two-point average. We could take F_{t+1} = S_t, where S_t = .5y_t + .5y_{t-1}. The next value of the smoothed series would be S_{t+1} = .5y_{t+1} + .5y_t. It would be more forward-looking if we could take F_{t+1} to be S_{t+1} rather than S_t. We can't do this, because we haven't yet observed y_{t+1}. But we could predict it, using a preliminary predictor F'_{t+1} = .5y_t + .5y_{t-1}, and then substitute this prediction for y_{t+1} in S_{t+1}, taking our final predictor to be
F_{t+1} = .5F'_{t+1} + .5y_t.
This is
F_{t+1} = .5(.5y_t + .5y_{t-1}) + .5y_t = .75y_t + .25y_{t-1}.
This "look ahead" method is called a predictor-corrector method because we first predict the missing future observations and then use the predictions to correct our initial forecast. Note that, although we started with equal weights, the final weights are not equal, and the more recent observation is weighted more heavily.
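The two-point predictor-corrector forecast with equal weights can be checked numerically. This is only a sketch of the equal-weight case worked above; the function name is ours, and the trick of feeding in 1 and 0 to read off the final weights is an illustration, not part of the method.

```python
def predictor_corrector_forecast(y_t, y_prev):
    """Two-point predictor-corrector forecast with equal weights.

    Step 1 (predict): F'_{t+1} = 0.5*y_t + 0.5*y_{t-1}
    Step 2 (correct): F_{t+1}  = 0.5*F'_{t+1} + 0.5*y_t
    """
    preliminary = 0.5 * y_t + 0.5 * y_prev  # stands in for the unobserved y_{t+1}
    return 0.5 * preliminary + 0.5 * y_t


# Read off the final weights by forecasting from (1, 0) and (0, 1).
weight_on_y_t = predictor_corrector_forecast(1, 0)
weight_on_y_prev = predictor_corrector_forecast(0, 1)
print(weight_on_y_t, weight_on_y_prev)  # 0.75 0.25

# With y_{t-1} = 190 and y_t = 186 (two of the Marathon times used earlier):
print(predictor_corrector_forecast(186, 190))  # 187.0
```

The forecast 187.0 agrees with .75(186) + .25(190), and it lies closer to the more recent observation than the plain two-point average of 188 does.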
Form a two-point predictor-corrector forecast using weights 0.9, 0.1.
Form a three-point predictor-corrector forecast using initial equal weights.