This content is no longer maintained. Please visit our new website.

ACCC Home Page Academic Computing and Communications Center  
ISO 8601 Dates: What They Are and How to Use Them

Identifying and Correcting Dates with Two-Digit Years


The year 2000 is over now, but you might still have Y2K problems to consider: dates with two-digit years in your programs and data, which are the subject of this article. Another very valuable resource is IBM's The Year 2000 and 2-Digit Dates: A Guide for Planning and Implementation, a 250-page book that's available online. "Chapter 4. Identifying 2-Digit-Year Exposures" and "Chapter 5. Reformatting Year-Date Notation" cover how to find two-digit date problems and what to do with the affected data and the programs you use to process it.

Pinpointing the Problems
  Consider all your processes, projects, and studies, and ask yourself, "Does this process, project, or study have anything to do with the passage of time?" Does it, for example, record the date on which something occurred, and then compare it with today's date? Subtract one from the other, and you might have a problem.

Now you've got to go looking for those dates.

The examples in this section are fragments of SPSS code. SPSS is neither better nor worse than other packages with regard to Year 2000 problems; the problems and solutions here apply equally to SAS, BMDP, C, FORTRAN, UNIX's S program, Rexx, and so on. SPSS is commonly used on all computing platforms, from mainframes to UNIX, PCs, and Macs, so it seemed a good choice for examples.

(But beware, dates can hide! You might, for example, solve all the problems in your data and programs, but forget about the data's containers -- file names themselves might contain dates. For instance, your data from July 17, 1998 might be stored in a file named DT980717.DAT. That's a date, a date with a two-digit year that has a year 2000 problem!)

-- Finding Year Problems in Programs
  A reasonable way to start finding dates is to use a file-searching program, such as grep on UNIX or CMS, or Advanced mode Find in Windows 95, and look for the string "date" in all files whose extensions suggest that they are SPSS programs, SAS programs, or the like. You might also look for the strings "yy", "year", or "yr". (This will help you find dates in your programs and, assuming that you know which data files are or were used as input to those programs, it will also identify data files that need fixing.)
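Where grep is not available, the same search can be sketched in a few lines of Python. This is a minimal sketch, assuming that SPSS syntax files end in .sps and SAS programs in .sas, and that the files are plain text; adjust the extension list and the search strings for your site. Like grep, it reports every matching line, so expect some false positives (a comment containing "yearly", for instance).

```python
import os
import re

# Assumed extensions for SPSS (.sps) and SAS (.sas) program files; adjust as needed.
PROGRAM_EXTENSIONS = (".sps", ".sas")

# The search strings suggested above, matched without regard to case.
DATE_HINTS = re.compile(r"date|yy|year|yr", re.IGNORECASE)


def find_date_hints(root):
    """Yield (path, line_number, text) for each program line that mentions a date hint."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if not name.lower().endswith(PROGRAM_EXTENSIONS):
                continue
            path = os.path.join(dirpath, name)
            # latin-1 accepts every byte value, so odd characters in old files never raise.
            with open(path, encoding="latin-1") as program:
                for number, text in enumerate(program, start=1):
                    if DATE_HINTS.search(text):
                        yield path, number, text.rstrip("\n")
```

A loop such as `for hit in find_date_hints("."): print(*hit)` lists every candidate line under the current directory.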

Here's an obvious example of the kind of thing you're looking for, one that this method will point out. This SPSS code fragment reads dates, and it is easy to tell from the variable names what is going on, and that there is a Y2K problem -- it will report that babies born in 2000 are 100 years old:

DATA LIST / datemo 1-2 datedy 3-4 dateyr 5-6          /* read mmddyy data */
COMPUTE thisyr = XDATE.YEAR($TIME)            /* Get this year - 4 digits */
COMPUTE age = thisyr - (dateyr+1900)   /* calculate age by subtracting    */
The year part of the date is stored in a variable named dateyr and it's only two digits wide (columns 5-6) in the input. (It is interesting to notice in this example how the decisions made two decades ago were compensated for one decade ago -- by simply adding 1900 to the year. It was assumed by most until just recently that the twentieth century would last forever.)
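For readers working outside SPSS, here is the same arithmetic as a small Python sketch (the function names are invented for illustration). It parses a six-character mmddyy field exactly as the fragment does, by adding 1900 to the two-digit year, so a baby born on July 17, 2000 comes out 100 years old when the age is computed in 2000.

```python
def parse_mmddyy_1900(field):
    """Parse an mmddyy field the way the SPSS fragment does: year = yy + 1900."""
    month = int(field[0:2])
    day = int(field[2:4])
    year = int(field[4:6]) + 1900   # the Y2K bug: every year is forced into the 1900s
    return year, month, day


def age_in_years(field, this_year):
    """Age by subtracting years, as in the fragment (month and day are ignored)."""
    year, _month, _day = parse_mmddyy_1900(field)
    return this_year - year


print(age_in_years("071798", 2000))  # born July 17, 1998: prints 2
print(age_in_years("071700", 2000))  # born July 17, 2000: prints 100
```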

If only all Y2K problems were this easy to spot!

Now let's look at another SPSS fragment with the same underlying problem, but this one will be nearly impossible to spot by mechanical searching. Non-descriptive variable names like these are actually encountered more often in the real world than the nice, descriptive ones in the example above.

DATA LIST / v1 1-2 v2 3-4 v3 5-6 v4 7-11 v5 12-16
SORT CASES v3 v1 v2
...and now you go looking for those recent babies down at the bottom of the file, and you find to your alarm that they've been kidnapped! (They're still there of course, just before the oldest people in the study, at the top of the file.) Look at it carefully -- would this be easily spotted as a year 2000 problem? There are no clues. The clue comes from your knowledge of what this program does: "Oh, that's the one we run to print a list of subjects arranged by age."
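The kidnapping is easy to reproduce. In this Python sketch, the three records stand in for v1 (month), v2 (day), and v3 (two-digit year), and the sort key mirrors SORT CASES v3 v1 v2; the birth dates are made up for illustration. The baby born in 2000 sorts first, ahead of the oldest subject, because 0 is less than 73.

```python
# Each record is (v1, v2, v3) = (month, day, two-digit year); the dates are invented.
records = [
    (7, 17, 98),   # born July 17, 1998
    (2, 29, 0),    # born February 29, 2000, stored with year 00
    (3, 5, 73),    # born March 5, 1973, the oldest subject
]

# Equivalent of SORT CASES v3 v1 v2: year first, then month, then day.
by_age = sorted(records, key=lambda r: (r[2], r[0], r[1]))

for month, day, yy in by_age:
    print("%02d/%02d/%02d" % (month, day, yy))
# Prints 02/29/00, then 03/05/73, then 07/17/98: the newest baby is at the top.
```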

-- What Might Dates Be Called?

  • as-of, asof
  • begin, beg, bgn
  • cc, yy, mm, dd, and any combination of them
  • current
  • date, dat, dte, dd
  • dob (date of birth)
  • end
  • expire
  • julian
  • month, mon, mo, mm, mmm
  • start
  • term
  • time, timestamp, time-stamp
  • this, thisdate
  • today, tod, t-o-d (today and also time-of-day)
  • week, weekday, weekend
  • year, yr, yy
  Remediation: a fancy word for repair, which you'll see in some of the literature about the Year 2000.  
-- Fixing File Names that Are Dates
  It is extremely common to encounter schemes for storing data in computer files where the file name contains the date when the data was collected. For instance, the data for July 17, 1998 might be stored in a file called DT980717.DAT. If that file naming scheme contains a two-digit year, raise The Proverbial Red Flag. When those two digits become 00, you may no longer be able to find your newest data. Worse, in further data analysis, you could reprocess the data for December 31, 1999 as the data for every day thereafter, because its file name would have a greater value. You might not spot such a problem until your research assistant came in to tell you that, since New Year's Eve, the data for every day seems to look strikingly similar, despite the care you took to make sure your lab instruments were ready for 2000. By then, the raw data files for some days might have been lost, especially if you automatically erase the "oldest" files (judged by file name) to conserve disk space. (This is essentially the same problem as a warehouse where the stock stopped rotating, resulting in unexpected spoilage.)

If you can, you should expand the year part of the file name to four digits. However, that might not be an option due to limits on file name length. Another option might be to shrink the month down to a single alphabetic character, where A=January, B=February, and so on. Then use the space you gained for a century key, where 1900s=0, 2000s=1, 2100s=2 and so on. The file name DT980717.DAT would become DT098G17.DAT, and the data collected on February 29, 2000 would be in file DT100B29.DAT. Note that these two file names would sort correctly. (As you devise file name schemes, do not depend on whether numbers will sort before letters, or after them -- this differs on different computers.)
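The renaming scheme can be sketched in Python. This is a minimal sketch, assuming the eight-character layout of the example (a two-letter prefix, the century key, the two-digit year, the month letter, and the day) and a single-digit century key; the function name is hypothetical.

```python
import datetime

MONTH_LETTERS = "ABCDEFGHIJKL"   # A = January, B = February, ..., L = December


def century_key_name(prefix, date, extension=".DAT"):
    """Build a name such as DT098G17.DAT from a prefix and a date."""
    century_key = date.year // 100 - 19   # 1900s = 0, 2000s = 1, 2100s = 2, ...
    if not 0 <= century_key <= 9:
        raise ValueError("century key must fit in one digit")
    return "%s%d%02d%s%02d%s" % (
        prefix,
        century_key,
        date.year % 100,
        MONTH_LETTERS[date.month - 1],
        date.day,
        extension,
    )


old = century_key_name("DT", datetime.date(1998, 7, 17))   # 'DT098G17.DAT'
new = century_key_name("DT", datetime.date(2000, 2, 29))   # 'DT100B29.DAT'
assert sorted([new, old]) == [old, new]   # the 1998 file sorts before the 2000 file
```

Because each character position holds either only digits or only letters, the sort order never depends on whether digits sort before letters.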

-- Fixing Two-Digit Years at the Data Source
  The best way to fix problems involving two-digit years is at the source. It is also the hardest. Change your data entry procedures to record all four digits of the year. While you're at it, examine whether you can ship the data file around in a database-type format instead of as raw data. That can save a lot of effort. For instance, let's say that your study's data entry has evolved from the keypunch machine of 25 years ago to a form-filling-out thing based on Microsoft Access on a PC. After the data is collected, you have Access format it with everything in the same columns as on your 25-year-old punched cards, write it out into a text file, and then use that text file as input to SPSS. Complete with two-digit year.

A remediation strategy here might be to have Access require the user to enter all four digits of the year (Access can do that), then to save the input file it creates in a database format such as .dbf, which SPSS can read directly. If variable names do not match anymore, they can be renamed easily.

Data acquired from another source requires investigation of that source and review of its year 2000 compliance. You also need to review the format you are receiving the data in. If possible, receive data in "database" format, such as a dBase .dbf file, an SPSS System File, or a SAS dataset. These database formats store date data in internal formats which will carry on well into the next century without any problem. Such formats are also preferable to text formats in that they contain data dictionary information as a part of the file.

(Note that it's not enough to make sure that data is in database format; you should also examine the assigned format widths to ensure that they are sufficient for a four-digit year. For instance, if the SPSS DISPLAY DICTIONARY command shows a format type of ADATE8, you will get only a two-digit year, even though the full year is stored internally. It must be at least ADATE10 for a four-digit year to be displayed.)

Using a Century Marker

If you must keep data in a plain text (ASCII) file, another strategy might be to add a new variable at the end of each record that contains the missing digits of the year. (To help prevent future confusion, avoid the temptation to call this variable "century"; the 1900s are the 20th century! "Century marker" is better.) From now on, include this new field in every new file you create. If you use this method, you will not need to convert your old files; they can still be read correctly. When the program reads a file that does not contain the century marker, it can fall back on the old assumption that all the dates are in the 1900s. SPSS, like most programs, assigns a standard "missing value" to variables that are read past the right edge of the actual data records. SPSS calls this "sysmis"; other programs might use the number 0. This method does the least damage to existing layouts. Most likely, your data layout was expanded years ago well beyond the width of an 80-column card to accommodate additional information, and we're just extending it further.

So, now that first example becomes:

DATA LIST / datemo 1-2 datedy 3-4 dateyr 5-6 centmark 81-82 /* read it    */
IF (SYSMIS(centmark)) centmark = 19          /* Not there? It's old data. */
COMPUTE dateyr = (centmark * 100) + dateyr          /* Combine year parts */
COMPUTE thisyr = XDATE.YEAR($TIME)            /* Get this year - 4 digits */
COMPUTE age = thisyr - dateyr             /* calculate age by subtracting */
...and the recently born babies are young again.
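The same century-marker logic, sketched in Python for comparison. It assumes, as in the SPSS example, that the marker occupies columns 81-82 and that old records simply end before column 81; a short or blank field plays the role of sysmis and defaults to 19. The function name is illustrative.

```python
def read_with_century_marker(record):
    """Return (year, month, day) from an mmddyy record with an optional century marker."""
    month = int(record[0:2])
    day = int(record[2:4])
    yy = int(record[4:6])
    marker = record[80:82].strip()      # columns 81-82; empty if the record is short
    century_marker = int(marker) if marker else 19   # old data: assume the 1900s
    return century_marker * 100 + yy, month, day


old_record = "071798"                      # written before the marker existed
new_record = "022900" + " " * 74 + "20"    # marker 20 in columns 81-82
print(read_with_century_marker(old_record))  # (1998, 7, 17)
print(read_with_century_marker(new_record))  # (2000, 2, 29)
```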

Some other methods:

IBM discusses century markers and suggests several additional methods of treating date data with two-digit years in chapter 5 of its The Year 2000 and 2-Digit Dates: A Guide for Planning and Implementation.

The methods discussed in the IBM book (including the implications, both pro and con, examples, and how-to information) are:

Conversion to full four-digit years
Adding century markers
Compression --
Change input files so that a four-digit year fits in the space that formerly held a two-digit year. This requires changing both your data files and the programs you use to process them.
Two-digit encoding --
This is a variant of windowing that encodes the years so that you can represent a wider range of dates. For example, using two hexadecimal digits with a base year of 1900 allows you to represent 1900 through 2155. (Hex FF is decimal 255.) 
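The two-digit encoding example can be checked with a short Python sketch. BASE_YEAR and the function names are assumptions for illustration; the range check keeps every encoded year to exactly two hexadecimal digits.

```python
BASE_YEAR = 1900   # base year from the example above


def encode_year(year):
    """Encode a year from 1900 through 2155 as two hexadecimal digits."""
    offset = year - BASE_YEAR
    if not 0 <= offset <= 0xFF:
        raise ValueError("year does not fit in two hexadecimal digits")
    return format(offset, "02X")


def decode_year(code):
    """Turn two hexadecimal digits back into a four-digit year."""
    return BASE_YEAR + int(code, 16)


print(encode_year(1998))   # 62
print(encode_year(2000))   # 64
print(decode_year("FF"))   # 2155
```

Encoded years such as 9F (2059) and A0 (2060) sort correctly as text only on computers where digits sort before letters, which, as noted for file names, differs between machines.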

And there's one more method, not specifically discussed by IBM, but related to the "two-digit encoding" scheme:

Day count conversion approach --
Change the date data in input files to an integer number of days since a specific date. This also requires changing both your data file and the programs you use to process them, but it doesn't increase the size of data files and it leaves you with dates in the input files that you can still understand.
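A day count is simple to produce with Python's datetime module. The base date of January 1, 1900 is an assumption chosen for this sketch; whatever base you choose, document it alongside the data, because the integers mean nothing without it.

```python
import datetime

EPOCH = datetime.date(1900, 1, 1)   # assumed base date; record it in your data dictionary


def to_day_count(date):
    """Number of days from EPOCH to date."""
    return (date - EPOCH).days


def from_day_count(count):
    """Inverse of to_day_count."""
    return EPOCH + datetime.timedelta(days=count)


print(to_day_count(datetime.date(2000, 1, 1)))   # 36524
print(from_day_count(36524))                     # 2000-01-01
```

Five digits hold counts up to 99999, about 273 years from the base date, which fits in the six columns an mmddyy field used to occupy.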
And What If That Cannot Be Done?


At the end, you may still be stuck with data containing two-digit year fields. You have tried to get it expanded to four digits at the source, but it simply cannot be done.

In this case, you need to adopt a technique called "windowing". Windowing means taking the two-digit year and applying common sense to determine the century that it belongs in.

Obviously, windowing cannot be used for data that might span a period greater than 100 years, such as the birth years of people in the general population. There have always been a number of people living beyond the age of 100, and as health care improves that number can only increase. If you code a birth year as "96", that could be either 1996, which would indicate a two-year-old, or 1896, which would indicate a 102-year-old. For a short time after the year 2000, there will be people alive who were born in three different centuries!

Example 1: 100 Year Fixed Window, 1973 to 2072

In the 100-year-old babies study, we're studying children, and the oldest ones in our study were born in 1973. So we don't have to worry about data spanning more than 100 years until 2073. In this case, we can establish our "window of time" as 1973-2072, since we already know we have no data from before 1973. This simplest form of windowing is called a "fixed window".

DATA LIST / datemo 1-2 datedy 3-4 dateyr 5-6          /* read mmddyy data */
COMPUTE dateyr = dateyr + 1900
IF (dateyr < 1973) dateyr = dateyr + 100               /* Years 2000-2072 */
COMPUTE thisyr = XDATE.YEAR($TIME)            /* Get this year - 4 digits */
COMPUTE age = thisyr - dateyr             /* calculate age by subtracting */

Once again -- young babies.
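The same fixed window in Python, for comparison. WINDOW_START (1973, the oldest birth year in the study) is the one number that defines the window; the sketch is written so that changing it moves the whole 100-year window.

```python
WINDOW_START = 1973   # earliest year the data can contain; the window is 1973-2072


def fixed_window(yy, start=WINDOW_START):
    """Map a two-digit year into the 100 years beginning with start."""
    year = start // 100 * 100 + yy   # first guess: same century as start
    if year < start:                 # too early, so it belongs to the next century
        year += 100
    return year


for yy in (73, 98, 0, 72):
    print(yy, fixed_window(yy))
# 73 1973, 98 1998, 0 2000, 72 2072
```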

Example 2: One Year Window

Consider data output by a lab instrument that contains a two-digit year. You have verified that it will correctly change from 99 to 00 in the year 2000, and that it will consider 2000 to be a leap year. (In fact, you discover that the processor at its heart is a now-obsolete IBM PC running DOS, so you know you will need to reset its date on January 1, 2000; see "IBM/Intel/Windows PC BIOS Tick-Over Bug".) Your computer programs process the data output from this machine within a month of the time the machine emitted it. All archives of data from this machine are stored with a four-digit year calculated by the program that receives the raw data.

The time period you need to worry about is only one month, so your window need only be for the previous year. If the two-digit year from the machine is ever greater than the present two-digit year, the data is from the previous century.
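The rule in that paragraph translates directly into code. In this Python sketch the current date is passed in (defaulting to today) so the behavior around January 1, 2000 can be tested; the function name is illustrative.

```python
import datetime


def one_year_window(yy, today=None):
    """Expand a two-digit year from data that is never more than a year old."""
    today = today or datetime.date.today()
    year = today.year // 100 * 100 + yy   # assume the current century first
    if yy > today.year % 100:             # a "future" year is really last century
        year -= 100
    return year


new_year = datetime.date(2000, 1, 3)
print(one_year_window(99, new_year))   # 1999
print(one_year_window(0, new_year))    # 2000
```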

Example 3: 50-Year Sliding Window

Consider home mortgages. These typically last up to 30 years. Since you must deal both with mortgages made 30 years ago which are about to be paid off, and with mortgages made today which will not be paid off until 30 years hence, the total span is 60 years. None last for anywhere near 100 years. This makes it a good candidate for the most ordinary and generally useful form of windowing -- the 50-year sliding window, based on today's date.
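A 50-year sliding window can be sketched the same way. This Python version assumes the window runs from 49 years before the current year through 50 years after it, which covers exactly 100 years; other conventions for where the boundary falls are equally valid, as long as you apply one consistently.

```python
import datetime


def sliding_window(yy, today=None, span=50):
    """Expand yy into the 100-year window (today.year - span, today.year + span]."""
    today = today or datetime.date.today()
    year = today.year // 100 * 100 + yy    # first guess: the current century
    if year > today.year + span:           # too far in the future
        year -= 100
    elif year <= today.year - span:        # too far in the past
        year += 100
    return year


as_of = datetime.date(1998, 10, 8)
print(sliding_window(70, as_of))   # mortgage taken out in 1970: 1970
print(sliding_window(28, as_of))   # mortgage maturing in 2028: 2028
```

Because the window is computed from today's date, it moves forward one year every year without anyone editing the program.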

Warning About Windowing

Windowing is not free from risks! If you try to develop a single windowing rule, and apply it to all situations, it will be wrong some of the time. You must develop your windowing formula according to your knowledge of the data involved.

Be particularly careful of windowing more than once. Most real-world applications consist of chains of applications strung together by intermediate files, like so many beads. You can actually corrupt your data if you use different windows at different places in the chain. The two-digit year 55 might be understood to be 1955 at one point, but as 2055 at another.

Windowing, in particular, must be tested. This is an area where your inventiveness can get the best of you and cause errors. In the literature on Year 2000 Time Machine testing, a great many of the errors that were found and fixed involved doing windowing incorrectly.


Bridge program (or data bridge)

A bridge program is used to convert the date data output by one program (presumably one that has a Y2K problem) to another format (say, one with a four-digit year) before it is input into another program. For more information about bridge programs, see chapter 5 of IBM's The Year 2000 and 2-Digit Dates: A Guide for Planning and Implementation.
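As a sketch of the idea, here is a tiny Python bridge. It assumes each record begins with an mmddyy date, applies a single fixed window (1973-2072, borrowed from Example 1; substitute the window that fits your data), and rewrites the date as an ISO 8601 yyyy-mm-dd field. Because the expanded year travels in the output, downstream programs never need to window again. The function names are hypothetical, and the downstream program must be changed to read the new layout, which is four characters wider.

```python
WINDOW_START = 1973   # assumed window, 1973-2072; choose one that fits your data


def bridge_record(record, start=WINDOW_START):
    """Rewrite a record that begins with mmddyy so it begins with yyyy-mm-dd."""
    month, day, yy = record[0:2], record[2:4], int(record[4:6])
    year = start // 100 * 100 + yy   # same century as start...
    if year < start:                 # ...unless that falls before the window
        year += 100
    return "%04d-%s-%s%s" % (year, month, day, record[6:])


def bridge_stream(infile, outfile, start=WINDOW_START):
    """Copy records from infile to outfile, expanding the leading date of each one."""
    for record in infile:
        outfile.write(bridge_record(record, start))
```

Called as `bridge_stream(sys.stdin, sys.stdout)`, it can sit in a pipeline between the old program and the new one.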
External Windowing: How Statistical Packages Handle Two-Digit Years
  Statistical and database packages may or may not include features for automatically windowing two-digit years into four-digit years. We will call this "external windowing". Its use is discouraged, because it might be applied across the board without regard to the nature of the data set itself. External windowing might also conflict with any internal windowing formula that somebody has coded into the procedure itself.


SPSS

SPSS does it plain and simple. The SPSS manuals have always stated that two-digit years will be considered to be in the 1900s. Period. This is actually a form of external windowing, with the window fixed at 1900-1999. Relying on this behavior is just as risky as relying on any other form of external windowing.

SPSS version 8 for Windows (only) contains a feature for adjusting SPSS's external windowing to be something other than 1900-1999. Its use is discouraged, especially because it could break older SPSS programs that depended on the previous 1900-1999 window.


SAS

SAS has extensive facilities for applying external windowing. In my opinion, these facilities are so widespread that the risk of unintended consequences in SAS is extreme.

You will probably be unable to avoid having your windowing formulas conflict with these facilities, so you should arrange not to read any two-digit years at all in SAS. Instead, read them as regular numeric variables, and do the windowing calculations yourself, based on your knowledge of the data.

Microsoft Access

Access handles two-digit years differently depending on the exact version of Access and on the version of a system library called OLEAUT32.DLL. Depending on the combination of versions, you might have a fixed window consisting of the current century, or a fixed window from 1930-2029. Check the documentation that came with Access to find out.

This page last updated 1998-10-08. Please send comments and reports of broken links to the author: Roger Deschner


2001-12-21  ACCC Documentation