DANGEROUS DATES

FOR SOFTWARE APPLICATIONS

Version 2 - March 24, 1998

Abstract

The year 2000 date problem is not the only calendar problem causing trouble for software applications. This article highlights some of the known date problems that are likely to affect software applications over the next 50 years. Other date problems that might affect software include the date at which the global positioning satellite (GPS) week counter rolls over, the dates at which commodities switch to the Euro, the dates at which the UNIX and C libraries roll over, and some hazardous date patterns which have been used for non-date purposes in software applications. In addition, during the first half of the next century the numbers of digits assigned to social security numbers and telephone numbers will run out of capacity.

Over the next 50 years at least 60,000,000 software applications will need modification because of various date problems. The total cost of these modifications could top $5 trillion. The report concludes that because date problems with computers and software are so widespread, serious, and expensive, a new international standard for dates should be developed for computer purposes. The proposed date format includes a "key" field which identifies the specific date format that follows. This method would allow older date formats to be used, and would support multiple calendars.

Capers Jones, Chairman

Software Productivity Research, Inc.

1 New England Executive Park

Burlington, MA 01803-5005

Phone 781 273-0140 X-102

FAX 781 273-5176

Email Capers@SPR.com

CompuServe 75430,231

Web http://www.spr.com

INTRODUCTION

Because the earth takes 365 and roughly 1/4 days to travel around the sun, keeping accurate time has been a problem for the human species ever since history began. Adding to the problem, the lunar month does not divide evenly into the solar year, and the two cycles come back into alignment only about every 19 years. Because the solar year is not a whole number of days, calendars have needed periodic adjustments to bring them back into synchronization with it.
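The Gregorian calendar is one such adjustment: it approximates the solar year by adding a leap day every fourth year, except in century years not divisible by 400. A short sketch of that rule (the function names are illustrative) shows the resulting average year of 365.2425 days; the century exception is why 1900 was not a leap year but 2000 is.

```python
from fractions import Fraction


def is_gregorian_leap_year(year: int) -> bool:
    """Gregorian rule: every 4th year is a leap year, except century
    years, which are leap years only when divisible by 400."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


def average_year_length(cycle_years: int = 400) -> Fraction:
    """Average calendar-year length in days over one 400-year cycle."""
    leap_days = sum(is_gregorian_leap_year(y) for y in range(1, cycle_years + 1))
    return Fraction(365 * cycle_years + leap_days, cycle_years)


print(is_gregorian_leap_year(1900), is_gregorian_leap_year(2000))  # False True
print(float(average_year_length()))  # 365.2425
```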

The invention of atomic clocks and the advent of the coordinated universal time system (abbreviated UTC) have made the measurement of time the most accurate form of measurement known to the human species. While time at the level of seconds can be measured with an accuracy of 1 second in several million years, the coarser measures of dates and calendar intervals continue to be troublesome, and are particularly troublesome in the context of computers and software applications.

With the advent of computers and software, calendar and date problems have suddenly taken on a new importance. Now that so many important human activities are being performed by computers and software, date errors can disrupt interest rates, funds transfers, air traffic control, telephone systems, and electric power generation and cause a host of serious and often unexpected problems.

In theory, computers and software should be able to keep track of time and dates with greater accuracy than any other human artifact. Unfortunately, their theoretical capabilities for both date and time keeping have not been achieved because of a historical stumbling block: storage costs for date information were high enough that insufficient space was allotted for full precision. Whenever possible, storage space was conserved.

Two common methods of recording dates are used in computers and software applications: 1) Storing dates using conventional date representation formats such as the year, month, and day; 2) Accumulating the number of seconds from an arbitrary starting point. Both of these methods have been troubled by insufficient storage space.

For conventional date formats, the traditional way to conserve space was to truncate four-digit year fields to two digits, so that a year such as 1998 would be stored as 98.
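A minimal sketch, with hypothetical function names, of why the truncation breaks interval arithmetic once the century changes: the same three-year span comes out negative when computed from two-digit years.

```python
def years_elapsed_two_digit(start_yy: int, end_yy: int) -> int:
    """Naive interval on two-digit years, as many legacy programs compute it."""
    return end_yy - start_yy


def years_elapsed_four_digit(start_yyyy: int, end_yyyy: int) -> int:
    """The same interval computed on full four-digit years."""
    return end_yyyy - start_yyyy


# A loan taken out in 1998 and examined in 2001:
print(years_elapsed_two_digit(98, 1))         # -97
print(years_elapsed_four_digit(1998, 2001))   # 3
```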

When dates are kept by accumulating seconds from an arbitrary starting point, the counter is usually limited to no more than 4 bytes (32 bits) of storage. When its capacity is exceeded, the counter "overflows" and wraps around, either back to zero (the initial starting point) or, for signed counters, to a large negative value.
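To make the overflow concrete, the sketch below assumes a UNIX-style signed 32-bit count of seconds from January 1, 1970 UTC (the helper names are illustrative). The largest value such a field can hold, 2^31 - 1, is reached early on January 19, 2038; one more second wraps a signed counter to its most negative value, which decodes as a date in December 1901 rather than as zero.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)  # assumed UNIX-style epoch


def wrap_int32(value: int) -> int:
    """Simulate storing an integer in a signed 32-bit (two's complement) field."""
    return (value + 2**31) % 2**32 - 2**31


def counter_to_datetime(seconds: int) -> datetime:
    """Decode a seconds counter relative to the epoch."""
    return EPOCH + timedelta(seconds=seconds)


last = 2**31 - 1
print(counter_to_datetime(last))                  # 2038-01-19 03:14:07+00:00
print(counter_to_datetime(wrap_int32(last + 1)))  # 1901-12-13 20:45:52+00:00
```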

For example, the date "buckets" used by global positioning satellites (GPS) count weeks 0 through 1023, reset to week 0 when the 1024th week has elapsed, and repeat this cycle roughly every 20 years.

Starting in August of 1999 and continuing at intervals over the next 50 years both methods used by computers and software for dealing with dates will experience problems because of the historical practice of attempting to conserve storage space.

Both of these methods have worked reasonably well up until now, but both will run into serious problems when their storage boundaries are exceeded. What happens when computer date storage limits are exceeded is now a very serious issue, which can cause untold economic damage and perhaps physical damage too, by shutting down electric power plants, stopping assembly lines, or grounding aircraft. Let us examine some of the known date problems that are going to affect computers and software over the next 50 years.

Several other problems associated with insufficient numbers of digits will occur during the next 50 years too, but at moments in time that are somewhat unpredictable in 1998. An interesting report by Dr. Clifford Kurtzman (Kurtzman 1997) notes that the population of the United States will exceed the capacity of 7-digit phone numbers around the year 2025. We are already experiencing frequent problems with the need to reassign area codes. The capacity of U.S. social security numbers (9 digits) will be exceeded by about the middle of the century, say 2050.

The year 2000 problem is a specific instance of a general problem which will trigger massive expenditures unless it is solved once and for all. The general problem is the assignment of an insufficient number of digits to key numerical information. This problem had already manifested itself circa 1980, when some applications had to be modified because their salary and overtime fields did not have enough digits once compensation levels for many professions began to top $100,000.
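The underlying arithmetic is simple: a fixed-width field of n decimal digits holds at most 10^n - 1. The sketch below uses a hypothetical five-digit, whole-dollar salary field; the `store` helper assumes one possible failure mode, a legacy routine that silently keeps only the low-order digits.

```python
def field_capacity(digits: int) -> int:
    """Largest value an unsigned fixed-width decimal field can hold."""
    return 10 ** digits - 1


def fits(value: int, digits: int) -> bool:
    """True if the value can be stored without losing leading digits."""
    return 0 <= value <= field_capacity(digits)


def store(value: int, digits: int) -> int:
    """Hypothetical legacy behavior: keep only the low-order digits."""
    return value % 10 ** digits


print(field_capacity(5))     # 99999
print(fits(99_999, 5))       # True
print(fits(100_000, 5))      # False: a $100,000 salary no longer fits
print(store(100_000, 5))     # 0
print(10 ** 7, 10 ** 9)      # distinct codes in 7-digit and 9-digit fields
```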

Between now and roughly the year 2050, a huge amount of effort and hundreds of billions of dollars will be spent on expanding numeric fields in software applications:

• Financial fields starting circa 1980

• Zip codes starting circa 1985

• Date fields starting circa 1999

• Telephone numbers starting circa 2025

• Social security numbers starting circa 2050

The cumulative costs of expanding numeric fields as their capacity is exceeded will erode many of the economic advantages of the use of computers and software. It is obvious that a more permanent general schema must be developed before the maintenance expenses trigger bankruptcy and litigation for hundreds of corporations and even for some governments.

Incompatibilities Among International Date Formats

For centuries, the way dates are printed has varied from country to country. These variations presented no real problem until the advent of the computer era. Even with computers the problems were fairly minor, but it was obviously necessary to know which date format was used to ensure correct date calculations.

For example, in the United States we normally use a format of "month, day, year" such as 10/6/98 for October 6, 1998. In much of Europe the same date would be printed using the format of "day, month, year" or 6/10/98 for the same day. Obviously the European form might be misinterpreted as June 10th in the United States, or the U.S. format might be misinterpreted as June 10th in Europe if the software assumed the wrong alternative.
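Nothing in the string itself says which convention applies. In Python, for example, the same text parses to two different days depending on the format the program assumes. Note also that `%y` quietly maps 69-99 to the 1900s and 00-68 to the 2000s, which is itself a two-digit-year assumption.

```python
from datetime import datetime

text = "10/6/98"
us = datetime.strptime(text, "%m/%d/%y").date()      # month/day/year
europe = datetime.strptime(text, "%d/%m/%y").date()  # day/month/year

print(us)      # 1998-10-06
print(europe)  # 1998-06-10
```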

To facilitate international trade and commerce using computers and software, the International Organization for Standardization (abbreviated to ISO) has defined a standard date format that expands the number of year digits from two to four. This is the well-known ISO standard 8601:1988(E). This same format is supported by the American National Standards Institute (ANSI) and also by the National Institute of Standards and Technology (NIST).

The ISO date format puts the year first, then the month, and then the day using the format yyyy/mm/dd. Thus the date of October 6, 1998 would be represented as 1998/10/06 using the proposed ISO standard. (Note that the slash symbols "/" are not part of the date standard but are simply used here to enhance legibility on the printed page.)

Unfortunately, the most common date format used in the United States works in the opposite direction and puts the year last. This is the default representation on various Microsoft products, although Microsoft's products can support the ISO format too. Thus for many personal computer applications in the United States, dates are represented in the sequence of the month first, followed by the day, followed by the year: mm/dd/yyyy. Thus October 6, 1998 would be 10/06/1998 using Microsoft's U.S. default date format.
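One practical consequence of year-first ordering, shown here in Python as an illustration: dates written year-first with fixed-width fields (the ISO basic form yyyymmdd) sort chronologically as plain text, whereas mm/dd/yyyy strings do not.

```python
from datetime import date

days = [date(1998, 10, 6), date(1997, 12, 25), date(1998, 2, 1)]

iso = sorted(d.strftime("%Y%m%d") for d in days)
us = sorted(d.strftime("%m/%d/%Y") for d in days)

print(iso)  # ['19971225', '19980201', '19981006']  chronological
print(us)   # ['02/01/1998', '10/06/1998', '12/25/1997']  not chronological
```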

Unfortunately, the four-digit ISO standard for date formats is not fully adequate either. Both the ISO standard and the normal U.S. date representation share a common failing when dealing with dates in computers: both reflect an unconscious attempt to conserve storage space, and that economy causes unnecessary problems.

By adding at least one extra digit to the ISO date format, any date representation could be accommodated by using the extra digit as a key (shown as "x" in the examples) to identify whether the ISO date format (x-yyyy-mm-dd) or the U.S. default date format (x-mm-dd-yyyy) was intended. The key could also identify other alternatives, such as the normal European date format (x-dd-mm-yyyy) or even Julian dates, which record the number of days from the beginning of a year starting with 1 and running to 365 or 366. Even the traditional Japanese dates based on Imperial reigns could be accommodated.
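A minimal sketch of how software might dispatch on such a key. The key values below are hypothetical and chosen only for illustration; they are not the article's own assignments. Each input denotes October 6, 1998, written in a different layout.

```python
from datetime import date, timedelta

# Hypothetical key assignments, for illustration only.
FORMATS = {
    "1": "ISO",       # x-yyyy-mm-dd
    "2": "US",        # x-mm-dd-yyyy
    "3": "EUROPEAN",  # x-dd-mm-yyyy
    "4": "ORDINAL",   # x-yyyy-ddd (day of year, 1 to 366)
}


def decode_keyed_date(value: str) -> date:
    """Decode a hyphen-separated date whose first field is a one-digit format key."""
    key, *parts = value.split("-")
    kind = FORMATS.get(key)
    if kind is None:
        raise ValueError(f"unknown date key {key!r}")
    fields = [int(p) for p in parts]
    if kind == "ISO":
        year, month, day = fields
    elif kind == "US":
        month, day, year = fields
    elif kind == "EUROPEAN":
        day, month, year = fields
    else:  # ORDINAL: count days forward from January 1
        year, day_of_year = fields
        return date(year, 1, 1) + timedelta(days=day_of_year - 1)
    return date(year, month, day)


for text in ("1-1998-10-06", "2-10-06-1998", "3-06-10-1998", "4-1998-279"):
    print(text, "->", decode_keyed_date(text))  # all four decode to 1998-10-06
```

Because the key travels with the value, no outside knowledge of the sender's conventions is needed to decode it.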

Using an extra digit (or digits) as a key with the following meanings would make identifying which date format is intended a lot less messy than the current situation. Today ascertaining which of the many possible date formats might be used in software applications either requires advance notification to programmers and users, or extraordinarily complicated algorithms for deriving dates, with no absolute way of knowing if the date format selected is the right one without inspection or testing. Consider how versatile date logic would be if one or more extra digits were utilized:

Possible Date Format Key Using One Additional Digit

The example shown above illustrates what might be done using only a single extra digit. For many date and time-keeping purposes, it might be desirable to include not only century, year, month, and day information but also weeks, hours, minutes, and seconds. Thus if a date key is used to identify which format is being utilized, even the following format, with one key digit followed by 16 digits of date and time, could be used if needed:

x-yyyy-MM-ww-dd-hh-mm-ss

In this format, x is the one-digit date key; yyyy represents years; MM represents months; ww represents weeks; dd represents days; hh represents hours; mm represents minutes; and ss represents seconds, for 16 digits of date and time in all. Even 16 digits are not enough precision for some uses, so the schema could be extended down to the nanosecond level. If that takes 20 digits or more, but allows any known date format to be incorporated into the schema, then conserving space is irrelevant.
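A sketch of packing and unpacking the x-yyyy-MM-ww-dd-hh-mm-ss layout as a digit string (separators omitted). Counting the key, the layout occupies 17 digits. The function names are illustrative, and reading "ww" as the ISO 8601 week number is an assumption, since the layout itself does not define how weeks are counted.

```python
from datetime import datetime

# Field names and widths for the layout x-yyyy-MM-ww-dd-hh-mm-ss.
LAYOUT = [("key", 1), ("year", 4), ("month", 2), ("week", 2),
          ("day", 2), ("hour", 2), ("minute", 2), ("second", 2)]


def pack(dt: datetime, key: int = 1) -> str:
    """Pack a datetime into one key digit plus 16 digits of date and time.

    Assumption: 'week' is the ISO 8601 week number.
    """
    if not 0 <= key <= 9:
        raise ValueError("the key must be a single digit")
    values = {"key": key, "year": dt.year, "month": dt.month,
              "week": dt.isocalendar()[1], "day": dt.day,
              "hour": dt.hour, "minute": dt.minute, "second": dt.second}
    return "".join(f"{values[name]:0{width}d}" for name, width in LAYOUT)


def unpack(digits: str) -> dict:
    """Split a packed string back into named integer fields."""
    fields, pos = {}, 0
    for name, width in LAYOUT:
        fields[name] = int(digits[pos:pos + width])
        pos += width
    return fields


packed = pack(datetime(1998, 10, 6, 14, 30, 0))
print(packed, len(packed))
print(unpack(packed))
```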

For a universal date format there may be hundreds or even thousands of date variants which would need specific keys. Therefore a 4-digit key followed by 20 digits of date information should be able to accommodate any known calendar, and operate over arbitrarily long time periods.

Incidentally, the ISO standard date format is not adequate for scientific purposes. For dealing with geological time periods, spans of millions of years must be accommodated and most of this time would be in the BC era and hence require negative numbers. For astronomical time, billions of years must be accommodated. Indeed, for astronomical purposes the calendars of other planets such as Mars may eventually need to be accommodated.

The main feature of the author's proposed date format is that it is aimed at storing dates in computers in an era where unlimited optical storage is the rule. Therefore it is unwise to continue to develop date formats whose utility is compromised by an unconscious need to conserve storage space. Let us design a computerized date format that can last indefinitely, support scientific as well as business dates and time, and support all of the older date format variants. As the situation now stands, there are no current or proposed date standards by ISO or anyone else that are fully adequate even for business if it is transacted by computers, to say nothing of scientific purposes.

Under current date formats, it is almost impossible to utilize technologies such as data mining and on-line analytical processing (OLAP) for scientific data associated with geology, archeology, astronomy, etc. because the dates involved exceed the ranges of standard date formats, and in many cases, exceed the date handling ability of normal business software applications such as spreadsheets and data base packages.
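As a concrete illustration, Python's standard datetime module accepts only years 1 through 9999 (the module constants MINYEAR and MAXYEAR), so neither BC dates nor geologic time fit in its date type. The helper name below is illustrative.

```python
from datetime import date, MINYEAR, MAXYEAR

print(MINYEAR, MAXYEAR)  # 1 9999


def representable(year: int) -> bool:
    """True if the standard date type can hold January 1 of this year."""
    try:
        date(year, 1, 1)
        return True
    except ValueError:
        return False


print(representable(1998))          # True
print(representable(-65_000_000))   # False: geologic time
print(representable(10_000))        # False
```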

Adding extra "key" digits to date formats when used in computers would allow any conceivable date format to be included in the general schema, so that geologic and astronomical time, Julian dates, the Chinese calendar, the Jewish calendar, or even the Aztec calendar could be utilized as needed.

The date key would not have to be printed or appear on screen, but the presence of a date key would enable software applications to handle calendar calculations with far greater ease and flexibility than has ever been possible since computers became business and scientific tools.

It should be noted that the general solution of using a key field to identify which specific numeric or alphanumeric format follows can be used to deal with other problems besides dates. This same method might be used to handle the international variations in zip code formats, or the international variations used for social security numbers, or their equivalent in other countries.

Because so many messy date and numeric field-length problems for computers and software are going to occur over the next 50 years, it would be highly desirable if we could create a long-range standard that enabled us to handle dates in computers that supported all calendars and all date representations. We can never accomplish this unless we get over the unconscious need to conserve storage. Storage is now cheap, and if assigning 20 digits can simplify date logic and date calculations, now is the time to do it.

An expanded date format would require changes to software applications and databases and would be expensive to implement. But between the year 2000 problem, the UNIX date rollover, and other date problems, we are already going to spend several trillion dollars on software date changes. Right now, the new replacement dates will have the same kind of problem as the current dates: they don't have enough digits to handle scientific date purposes, they too will overflow, and eventually they will have to be changed again.

Make no mistake about it: computer and software dates and numeric data are far more important than printed dates. It would be enormously valuable if a truly effective date standard could be developed. Right now, none of the current date standard formats will accomplish anything but cause more long-range problems for software and computer vendors, and a continuing need for tricky and error-prone date calculations.

The author recommends that an international symposium be held on computer date problems which would deal not only with business date recording but also with the need for recording scientific dates and other numeric data such as telephone numbers. The purpose of this symposium would be to develop a date strategy which would eliminate the kinds of problems which we are about to experience between 1999 and 2038 when the Euro kicks in, the GPS date roll-over occurs, the year 2000 event happens, and the UNIX date roll-over occurs in 2038. Let us now consider some of the major date problems that the computer and software industry will be facing over the next 50 years.

January 1, 1999 (The Beginning of the Euro Currency era)

The European Union is moving toward a unified currency (the Euro), which is scheduled to be introduced during the period starting on January 1, 1999 and running through calendar year 2002. There are two significant date problems associated with the Euro:

1. The timing of the Euro introduction.

2. The impact of the Euro cutover on software and data mining.

The timing of the Euro is one of the worst public policy decisions in human history because it pits the world's second largest software project (the Euro) against the world's largest software project (the year 2000). There are not enough software personnel available to complete either of these massive efforts in time, and trying to accomplish both on approximately the same schedule is going to cause major economic problems.

For additional information, refer to the author's book "The Year 2000 Software Problem - Quantifying the Costs and Assessing the Consequences" (Addison Wesley Longman, 1997). See also the author's short articles on "Resource Conflicts Between the Year 2000 and Euro Currency Software Problems" (Year 2000 Journal, January/February 1998) and "Rules of Thumb for Year 2000 and Euro-Currency Software Repairs" (SPR technical report, February 1998).

The timing of the Euro is going to teach politicians a very painful lesson: they are no longer in control of many national events. In the pre-computer era when political directives were implemented by human beings, politicians could set arbitrary dates and expect to have their decrees implemented more or less when they wished.

In the computer era, political decisions which trigger massive software updates can no longer be scheduled using arbitrary end points determined by political processes, treaties, or government decrees. If massive software updates are involved, then the timing must be derived from the ability to accomplish software and database updates. Thus the Euro will not be implemented as planned no matter what politicians say or think, because the necessary software updates will not be ready in time.

Over and above bad timing, the software implications of the Euro are horrendous and may cause unexpected problems with stock market trading, banking, and other financial software applications. In addition, the Euro may interfere with data mining, data warehousing, on-line analytical processing (OLAP) and all other forms of analysis which look for trends over time.

The harmful date implications of the Euro are due to the fact that thousands of commodities such as stocks, bonds, manufactured products, services, etc. will begin to switch from local currencies to the Euro when the Euro rolls out.

Many software applications can handle the appearance of new currencies since they have occurred from time to time throughout history. The problem with the Euro is that no software applications were prepared to deal with the situation that products which have been on the market for many years will abruptly start being priced using the Euro rather than their present currencies in the same countries.

This means that any software application which does long-range trend analysis before and after the Euro introduction will have to include complex currency conversion algorithms. Either the older cost data will have to be converted to the Euro, or the newer Euro costs will have to be backfitted to the older currency. Sometimes it may be necessary to do both: show prices in terms of both the Euro and the old currency.

Suppose you have been following the long-range stock prices of a European company such as Siemens-Nixdorf. After the Euro is introduced, the stock values will be displayed in Euros rather than in Deutschmarks, so all of your long-range trend analysis must handle the currency conversion as of the date when the stock makes the transition to the Euro.
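A minimal sketch of the bridging step, converting older data forward into Euros. It assumes each series has a single known changeover date, which will differ by commodity. It uses the Deutsche Mark's fixed conversion rate of 1.95583 DEM per Euro. The function name, the sample prices, and the rounding to two decimal places are illustrative assumptions.

```python
from datetime import date
from decimal import Decimal, ROUND_HALF_UP

DEM_PER_EURO = Decimal("1.95583")  # fixed Deutsche Mark conversion rate


def to_euro_series(prices, changeover: date):
    """Express a series of (date, price) pairs in Euros.

    Prices quoted before `changeover` are assumed to be in Deutschmarks;
    prices on or after it are assumed to be in Euros already.
    """
    result = []
    for day, price in prices:
        amount = Decimal(price)
        if day < changeover:
            amount = amount / DEM_PER_EURO
        result.append((day, amount.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)))
    return result


series = [(date(1998, 12, 30), "195.58"), (date(1999, 1, 4), "100.50")]  # hypothetical
for day, euros in to_euro_series(series, changeover=date(1999, 1, 1)):
    print(day, euros)
# 1998-12-30 100.00
# 1999-01-04 100.50
```

Converting the old data forward is only one choice; backfitting newer Euro prices into the old currency would use the same rate in the other direction.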

Suppose you are interested in long-range trends concerning the price of a basic commodity such as the cost of wheat in Germany. The introduction of the Euro will cause an abrupt discontinuity in historical data analysis, and all long-range trend analysis will have to bridge the pre-Euro and post-Euro changeover.

Thus many thousands of software applications must be modified to handle the change from older currencies to the Euro. Databases, data warehouses, and data mining operations must also deal with the currency changeover.

Making the situation still more difficult, the Euro is being introduced over a sliding time period which will vary by country, industry, and commodity. Thus the dates for the conversion of specific products and services from local currencies to the Euro will take place at almost random intervals between 1999 and 2002, or longer since it is obvious that the Euro is going to run late.

The changes needed to fully support all of the implications of the Euro are not trivial. It is easy enough to add a new currency to software applications, but to support historical analysis of products and services which have been marketed for 50 years in one currency and abruptly switch to being marketed in another currency is a very serious issue for long-range trend analysis.

As can be imagined, the Euro is going to cause major problems for data mining, on-line analytical processing (OLAP) and any other form of long-range trend analysis which spans the pre-Euro and post-Euro periods.

The politicians of the European Union appear to be blissfully unconcerned about the damages which the Euro will cause to software applications and business operations, and about the huge expenses needed to recover from the Euro introduction.

Indeed, as of 1998 many European politicians are still naively saying that the Euro will be introduced on schedule when it is painfully obvious that politicians have failed to estimate the schedules and costs of the necessary software updates. So far as can be determined, neither the European Union nor any of the national governments involved have produced a reasonable cost, schedule, quality, risk, and damage estimate for the Euro introduction. Indeed, the very real risks and damages have been ignored by most European political leaders.

The Euro may commence on schedule in 1999, but it will not be completed by 2002 regardless of assurances by European politicians. As the Euro begins to enter circulation and financial software lurches into the Euro era, we can anticipate unexpected failures and delays in many business operations. We can also anticipate several forms of litigation, including but not limited to the following:

1. Lawsuits against the European Union for damages and cost recovery

2. Lawsuits against national governments for damages and cost recovery

3. Lawsuits against companies whose Euro updates are imperfect and damage clients

4. Lawsuits against corporate officers by shareholders

It is interesting to ask whether anyone has performed a cost/value analysis of the Euro. There may be some substantial long-range economic benefits from a unified European currency, but the ROI from 1999 through about 2005 appears to be distressingly negative. Indeed, the Euro software update costs will be so high that the long-range value of the Euro may not exceed the damage costs until perhaps the year 2020 or 2025, if even then.

Now that computers and software are the dominant tools of commerce, industry, and business, arbitrary government decrees such as the Euro which trigger massive software updates can be viewed as a form of hidden taxation. At the very least, private business is being forced to act as an unpaid agent of national governments.

In the future, government mandates such as the Euro which trigger immense business costs should include some form of relief for the affected companies, either in the form of reduced taxes, subsidies, or some other method of compensation for the work involved in software modifications.

Incidentally there is a minor but annoying software problem which will affect many thousands of software applications which display currencies, even if they don't do any complicated form of currency calculation. As this is written in 1998 there is no easy way to display the Euro currency symbol on a computer screen or print it out!

Unfortunately, the new Euro symbol is not part of many older character sets. Under Windows 95 it is very difficult to print the Euro symbol, and currently only Microsoft Word can accomplish this without excessive difficulty. Under Windows NT, and presumably under Windows 98, the symbol will be included and easier to access.

Other computer hardware and software platforms are also not quite ready for printing or displaying the Euro, since the symbol has only started being included circa 1996. This means that many older computers, or older software applications, will not be able to display the Euro currency symbol even if they can handle the currency conversion calculations.
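The character-set gap can be checked directly. In this Python sketch, ISO 8859-1 (Latin-1), a widely used Western European character set, has no code for the Euro sign (U+20AC). ISO 8859-15 places it at 0xA4, and the updated Windows code page 1252 places it at 0x80.

```python
EURO = "\u20ac"  # the Euro sign, U+20AC

for codec in ("cp1252", "iso-8859-15", "iso-8859-1"):
    try:
        print(codec, EURO.encode(codec))
    except UnicodeEncodeError:
        print(codec, "cannot represent the Euro sign")
# cp1252 b'\x80'
# iso-8859-15 b'\xa4'
# iso-8859-1 cannot represent the Euro sign
```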

GPS End of Week Rollover (August 21/22 1999)

The current network of 24 global positioning satellites (GPS) keeps track of dates by recording the number of weeks since midnight between January 5 and 6, 1980, using a modulo 1024 approach. That is, the week counter will reset to 0000 after week 1023, which ends at midnight between August 21 and 22, 1999. The GPS week counter will roll over to 0000 every 1024 weeks from then on, or roughly every 19.6 years, so long as this method is utilized.
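A sketch of the arithmetic, assuming a full (untruncated) week count; the function name and the `leap_seconds` parameter are illustrative. GPS time starts at 00:00 on January 6, 1980 and has no leap seconds, so converting to UTC means subtracting whatever GPS-UTC offset is in force, which the caller must supply.

```python
from datetime import datetime, timedelta

GPS_EPOCH = datetime(1980, 1, 6)  # 00:00 on January 6, 1980 (midnight Jan 5/6)
WEEKS_PER_CYCLE = 1024            # a 10-bit counter holds weeks 0..1023


def gps_to_datetime(week: int, seconds_of_week: float, leap_seconds: int = 0) -> datetime:
    """Convert a full GPS week and seconds-of-week to a calendar time.

    With leap_seconds=0 the result is GPS time; pass the GPS-UTC offset
    in force to obtain UTC.
    """
    return GPS_EPOCH + timedelta(weeks=week, seconds=seconds_of_week - leap_seconds)


print(gps_to_datetime(WEEKS_PER_CYCLE, 0))      # 1999-08-22 00:00:00
print(gps_to_datetime(2 * WEEKS_PER_CYCLE, 0))  # 2019-04-07 00:00:00
```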

The roll-over of GPS dates has been clearly documented in the original standards (ICD-GPS-200), and all of the major GPS receivers and ground support units should have planned for the roll over and be ready to deal with the situation. Therefore on the face of it, the GPS rollover should not be a major technical problem.

However, we are dealing with applications that were designed and coded by human analysts and by human programmers. Because humans are involved, it is very likely that the GPS end of week 1023 rollover may have been missed or not handled properly in some ground stations or in some software applications which use the GPS system. Therefore many companies and government agencies which use the GPS system are now starting to perform tests of ground units and software applications to ensure that the roll-over will be handled correctly.
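This sketch shows one way, offered as an assumption rather than any particular vendor's method, that a receiver or ground application can cope with the 10-bit counter. It keeps a trusted full reference week, for example one fixed when its software was built. It then chooses the first full week number, on or after that reference, that matches the broadcast value. The function name is illustrative.

```python
WEEKS_PER_CYCLE = 1024


def resolve_gps_week(truncated_week: int, reference_week: int) -> int:
    """Recover a full GPS week number from the 10-bit broadcast value.

    Returns the smallest week >= reference_week whose value modulo 1024
    equals truncated_week.
    """
    if not 0 <= truncated_week < WEEKS_PER_CYCLE:
        raise ValueError("broadcast week must be in 0..1023")
    base = reference_week - reference_week % WEEKS_PER_CYCLE
    full = base + truncated_week
    if full < reference_week:
        full += WEEKS_PER_CYCLE
    return full


# Software built in week 1000 receives week 2 after the rollover:
print(resolve_gps_week(2, 1000))  # 1026, not week 2 of January 1980
```

The approach only works for 1024 weeks past the reference; a receiver left in service longer than that will misdate again.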

For navigational purposes, missing the roll-over may or may not have a major impact: probably not. But the GPS system is also used for purposes where a missed roll-over can have a major impact indeed. The GPS dates are used to synchronize major international funds transfers to ensure that interest payments are calculated to the second. For these applications, an error in missing the end of week roll-over can be very troublesome indeed.

Unfortunately, the GPS roll-over date of August of 1999 is putting this possibly minor date problem in the path of the much more serious year 2000 problem, which occurs only a few months later. This means that applications which use GPS time signals must be tested for both the GPS roll-over and the year 2000 problem more or less simultaneously.

The March 1998 issue of the journal IEEE Spectrum contains a useful article by David Allen, Neil Ashby, and Cliff Hodge which gives an overview of the GPS time-keeping methodology and equipment.

Because time kept by atomic clocks does not exactly match time derived from the earth's rotation, UTC inserts occasional leap seconds to keep atomically measured time roughly aligned with the somewhat less regular rotation of the earth. GPS time does not include leap seconds, so small differences accumulate between UTC and GPS time. Eventually these differences will have to be dealt with.

The "Nines" End of File Date Problems (September 9, 1999 and September 8, 2001)

Software applications need some way of indicating the ends of files, so that they can be shut down safely. One technique used for this has been to use a number such as 9999 as a file termination code. The problem with this method is that although 9999 is not intended to be a normal date, it might well be misinterpreted by some software applications and be considered as the date September 9, 1999. If the 9999 sequence is misinterpreted as a normal date, then obviously calculations will be in error.

A similar problem has been noted in UNIX applications, where 999,999,999 has sometimes been used to indicate an end-of-file situation. As it happens, when read as a UNIX time (seconds since January 1, 1970 UTC), 999,999,999 is the date of September 8, 2001 in U.S. time zones (01:46:39 on September 9, 2001 in UTC).
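Both sentinels decode cleanly as real dates, which is exactly the danger. The sketch below uses a hypothetical legacy parser for a compact four-character M D YY field, a layout assumed for illustration. It also uses Python's standard conversion of a UNIX time.

```python
from datetime import date, datetime, timezone


def legacy_mdyy(field: str) -> date:
    """Hypothetical legacy parser for a 4-character M D YY field."""
    month, day, yy = int(field[0]), int(field[1]), int(field[2:])
    return date(1900 + yy, month, day)


print(legacy_mdyy("9999"))  # 1999-09-09
print(datetime.fromtimestamp(999_999_999, tz=timezone.utc))
# 2001-09-09 01:46:39+00:00, the evening of September 8 in U.S. time zones
```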

The implications of this are that although the normal expiration of the UNIX clock is not until 2038 AD, some applications written in the C programming language and running under UNIX may experience date problems in the Autumn of 2001, or 37 years before the main UNIX date problems occur.

Thus the aphorism that "UNIX does not have a year 2000 problem" may be true, but UNIX may well have some year 2001 problems in specific applications. UNIX does have a major 2038 date problem that will prove perhaps as troublesome as the year 2000 problem is proving to be.

The root cause of the problem is using end of file symbology which is potentially ambiguous. It would be useful to define a standard end of file representation that worked under all operating systems and on all hardware platforms and would not be misinterpreted as a date.

One possibility would be to use the infinity symbol "∞" rather than a numeric sequence; another would be to create a specific end-of-file symbol used only for that purpose, which would have no possible ambiguity at all. Whatever solution is adopted, creating an unambiguous method to indicate file termination points is a problem that should be solved once and for all.
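In a modern language, a dedicated end-of-file symbol can be expressed as a marker value that belongs to no date type at all. In this minimal Python sketch, with illustrative names, a genuine date of September 9, 1999 passes through untouched, and only the dedicated marker ends the read.

```python
from datetime import date
from typing import Iterable, Iterator, Union


class EndOfFile:
    """A marker type used only to signal the end of data; it is never a date."""

    def __repr__(self) -> str:
        return "END_OF_FILE"


END_OF_FILE = EndOfFile()
Record = Union[date, EndOfFile]


def read_dates(records: Iterable[Record]) -> Iterator[date]:
    """Yield dates until the dedicated end-of-file marker is reached."""
    for record in records:
        if record is END_OF_FILE:
            return
        yield record


stream = [date(1999, 9, 9), date(2001, 9, 8), END_OF_FILE, date(1998, 1, 1)]
print(list(read_dates(stream)))  # both real dates kept; the record after the marker is ignored
```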

Although this problem has been solved on many operating systems, it is not clear how many older applications still use strings of nines with ambiguous meanings.

The Year 2000 Date Problem (December 31, 1999)

Many articles have been written on why the year 2000 problem will occur, so it is only necessary to include a short background discussion here. The root cause can be traced back to the early days of computers, when information was stored on punched cards or paper tape, or on magnetic tapes and the early disk drives used for mainframe computers.

Data storage was so limited and so expensive that any method that could save storage was readily adopted. Since no one in the 1950's or 1960's had any idea how long software would last, it seemed natural to store dates in two-digit form; i.e. 1965 would be stored simply as 65. This method was convenient and seemingly effective.

It is unfair to blame computer programmers for the year 2000 problem, even though it is the presence of the problem in computer programs that is troublesome. The year 2000 problem actually originated as an explicit requirement by clients of custom software applications and the executives responsible for data centers as a proven and seemingly effective way of saving money.

When the current two-digit date requirement actually became a U.S. military and government standard, programmers who knew that problems might occur were constrained by the standards to use the two-digit form. It is plainly not very effective to raise concerns about problems that won't occur until long after all of the executives and clients who might repair the situation have retired.

By the late 1970's and early 1980's it started to be noted that software applications were sometimes having remarkably long lives. For example, IBM's MVS operating system was approaching 20 years of age, as were a number of other widely used applications. Some tremors of alarm about date limits began to show up, but there was still no immediate serious alarm since the end of the century was 20 years away.

Some applications began their date field expansions in the 1970's, such as those of mortgage companies, which had to deal with 30-year mortgages. Also, many insurance companies have always used four-digit year fields, since their actuarial tables spanned more than a hundred years.

Unfortunately, applications which continued to use two-digit years outnumbered applications which used four-digit years by a ratio of roughly 20 to 1. The bulk of the two-digit years were in software applications which used dates for immediate purposes, rather than long-range trend projections. Thus two-digit dates were used in operating systems, embedded applications, and many government software applications as well as business software.

The year 2000 date problem began to attract significant public and professional notice in 1995 when credit cards with five-year expiration dates hit the year 2000 barrier and stopped being accepted. The temporary solution adopted was to cut back the expiration dates from five years to three years to allow credit card companies time to make repairs.

From that point on, year 2000 problems have occurred with increasing frequency as applications which look forward begin to run into the year 2000 barrier. The worst problems will occur at or near the end of the century when electric power plants, telephone networks, and specialized equipment with embedded software begin to encounter hidden date problems and malfunction. There will also be problems occurring before the actual end of the century for companies and government agencies whose fiscal years are decoupled from the calendar year.

There are a number of strategies for fixing the year 2000 problem. But it is now 1998 and there is no longer time available for changing date fields in software and data bases from two-digit to four-digit formats. A variety of "masking" approaches are being used which do not actually change the date format, but utilize external, outboard software tools to convert dates entering and leaving the application. Following are brief discussions of major year 2000 repair strategies:

Date field expansion: Expanding date fields from two-digit to four-digit form is the "classic" method for solving the year 2000 problem. This method provides a permanent solution, but has proven to be costly and difficult for many applications. Sometimes dates are indirect and hidden. For example, dates may be embedded in product serial numbers such as "1234984321" where digits five and six show the year 1998 embedded in a 10 digit serial number. If date field expansion is the primary method used, it takes between 36 and 40 calendar months to completely convert all of the dates for a major company, so unless you started in 1996 there is no longer time to use this approach except for a few key applications.
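Here is a short sketch of why hidden dates resist expansion, using the serial-number layout from the example above (the year in digits five and six of a 10-digit number). The helper names and the pivot of 50 used to choose a century are hypothetical assumptions. Until the field is expanded, a product made in 2000 sorts ahead of one made in 1998. Expanding the embedded year to four digits fixes the order but lengthens every serial number, so every program and file that reads the serial must change as well.

```python
def embedded_year_digits(serial: str) -> str:
    """Return the two-digit year hidden in digits five and six (1-based) of the serial."""
    return serial[4:6]


def expand_embedded_year(serial: str, pivot: int = 50) -> str:
    """Rewrite the serial with a four-digit year; the pivot of 50 is an assumed rule."""
    yy = int(embedded_year_digits(serial))
    century = "20" if yy < pivot else "19"
    return serial[:4] + century + serial[4:]


made_1998 = "1234984321"
made_2000 = "1234004321"

# Sorting on the two-digit field puts the year-2000 product first, as if it were older.
by_year_2 = sorted([made_1998, made_2000], key=embedded_year_digits)

# After expansion the order is correct, but every serial grows from 10 to 12 digits.
expanded = [expand_embedded_year(s) for s in (made_1998, made_2000)]
by_year_4 = sorted(expanded, key=lambda s: s[4:8])
```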

Windowing: The windowing method establishes a fixed 100-year interval, such as 1915 to 2014, and uses external program logic to assign every two-digit year to a year within that interval. Assume that your window runs from 1915 to 2014. Two-digit years below the pivot value of 15, such as 03 or 10, are assigned to the 21st century as 2003 or 2010, while years at or above it, such as 97 or 98, are assigned to the 20th century as 1997 and 1998. Windowing for a portfolio can be finished in roughly 18 calendar months, so this has become one of the popular methods with late starters. However, windowing exacts a performance penalty and assumes that everyone using the data or the application knows about the existence of the windowing routines.
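A minimal sketch of fixed windowing, assuming the 1915 to 2014 window used in the example. In that window the first two digits of the starting year (15) act as the pivot: stored values from 15 to 99 belong to the 1900's, and values from 00 to 14 belong to the 2000's. The function name is illustrative.

```python
def expand_fixed_window(yy: int, window_start: int = 1915) -> int:
    """Map a two-digit year into the 100-year window window_start..window_start + 99."""
    if not 0 <= yy <= 99:
        raise ValueError(f"expected a two-digit year, got {yy}")
    century = window_start - window_start % 100  # 1900 for a 1915 start
    pivot = window_start % 100                   # 15 for a 1915 start
    # Values at or above the pivot stay in the window's first century;
    # values below it belong to the next century.
    return century + yy if yy >= pivot else century + 100 + yy


# With the default 1915-2014 window:
# 3 -> 2003, 10 -> 2010, 14 -> 2014, 15 -> 1915, 97 -> 1997, 98 -> 1998
```

Every program that reads the data must use the same window_start. A program with a different pivot silently assigns the same stored digits to a different century, which is the shared-knowledge risk described above.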

Compression: It is obviously possible to use some form of encoding within the allotted two-digit date space to represent any conceivable date. By using a binary or hexadecimal representation rather than a decimal representation, the available two-digit date field can handle dates over almost any period. Here too the work could be accomplished in less than two years. However, compression requires that every application accessing the data know the specific compression technique used. There will also be performance reductions, but not as severe as with windowing.
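Here is one possible compression scheme. It is sketched under the assumption that each of the two date positions occupies one byte, and it stores the full year as an unsigned 16-bit binary number in the space that used to hold two decimal characters. This is only one of many encodings the text allows for, and the function names are illustrative.

```python
import struct

YEAR_FORMAT = ">H"  # unsigned 16-bit big-endian: two bytes, years 0 through 65535


def pack_year(year: int) -> bytes:
    """Encode a full year into the two bytes formerly used for a two-digit year."""
    return struct.pack(YEAR_FORMAT, year)  # raises struct.error outside 0..65535


def unpack_year(raw: bytes) -> int:
    """Decode the two-byte field back into a full year."""
    (year,) = struct.unpack(YEAR_FORMAT, raw)
    return year


field = pack_year(2000)
# field == b"\x07\xd0": still two bytes, but no longer readable as the digits "00"
```

The stored bytes are meaningless to any program that still expects decimal characters. That is why every application touching the data must know the encoding.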

Encapsulation: This method uses an external tool and simply shifts all dates downward by 28 years, so that the year 2000 would be represented as 1972. The rationale for using 28 years is that a 28-year shift will bring the days of the week (i.e. Monday, Tuesday, Wednesday, etc.) and the calendar dates (i.e. October 6, 7, 8, etc.) into correct synchronization. The encapsulation method has the advantages of being fairly easy to implement and of being achievable before the end of the century. However, here too there is a performance penalty. Also, some dates are subtle or derived by indirect means, such as dates hidden in serial numbers, and can be missed by the shift.
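The 28-year figure can be checked with a little arithmetic. Between 1901 and 2099 every fourth year is a leap year, so any 28-year span in that range contains exactly 7 leap days: 28 × 365 + 7 = 10,227 days, which is exactly 1,461 weeks. Below is a minimal sketch of the shift using Python's datetime module. The function names are hypothetical, and the weekday guarantee no longer holds once a shifted span crosses 1900 or 2100, which are not leap years.

```python
from datetime import date

SHIFT_YEARS = 28
# 28 years with 7 leap days is a whole number of weeks (valid for 1901-2099).
assert (SHIFT_YEARS * 365 + SHIFT_YEARS // 4) % 7 == 0


def encapsulate(d: date) -> date:
    """Shift a date back 28 years before handing it to unconverted code."""
    return d.replace(year=d.year - SHIFT_YEARS)


def decapsulate(d: date) -> date:
    """Undo the shift on dates coming back out of the unconverted code."""
    return d.replace(year=d.year + SHIFT_YEARS)


new_year = date(2000, 1, 1)
shifted = encapsulate(new_year)  # date(1972, 1, 1)
# Both days are Saturdays, so day-of-week logic in the old code still works.
```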

Bridging: This is a hybrid method used for data base applications in which the software itself is converted from two-digit to four-digit form, but the underlying data base is not, because of the excessive difficulties associated with data base date field expansion. A fixed or sliding window, or encapsulation, is used with the data base itself. Here too, a performance penalty is exacted. Bridging is also popular among late starters because it offers the chance of finishing in less than two years.
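Below is a minimal sketch of a bridge. It assumes a data base that stores dates as YYMMDD text and a sliding window that always begins 80 years before the current year; the 80-year figure and all function names are arbitrary assumptions for illustration. Converted programs see four-digit years, and only the bridge functions know that the data base still holds two.

```python
from datetime import date

YEARS_BACK = 80  # assumed: the window starts 80 years before the current year


def window_start(today: date) -> int:
    """First year of a 100-year window that slides forward with the current date."""
    return today.year - YEARS_BACK


def expand_sliding(yy: int, today: date) -> int:
    """Resolve a two-digit year into the sliding window."""
    start = window_start(today)
    year = start - start % 100 + yy
    return year if year >= start else year + 100


def db_to_program(yymmdd: str, today: date) -> date:
    """Bridge: expand a YYMMDD field read from the unconverted data base."""
    yy, mm, dd = int(yymmdd[:2]), int(yymmdd[2:4]), int(yymmdd[4:6])
    return date(expand_sliding(yy, today), mm, dd)


def program_to_db(d: date, today: date) -> str:
    """Bridge: truncate a four-digit year for storage, refusing dates outside the window."""
    start = window_start(today)
    if not start <= d.year < start + 100:
        raise ValueError(f"{d} cannot be stored unambiguously as YYMMDD")
    return d.strftime("%y%m%d")
```

Because the window slides, the same stored digits can change meaning over time: in 1998 the field "450301" means 1945, but in 2030 it would mean 2045. That is acceptable for short-lived data and hazardous for long-lived records.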

Data duplexing: This is a specialized method for dealing with data base year 2000 problems without changing the date fields of all of the applications which reference the data. Two versions of a data base are created: one version contains the original two-digit date fields, and the second or "cloned" version contains the same information, but with the date fields expanded to four-digit form. Unfortunately, doing this for a large portfolio is about a 36-month undertaking, so the optimal time for data duplexing has passed.

Obviously data duplexing requires a lot of work in keeping both versions of the data base synchronized. Data duplexing is a rather complex and expensive strategy, but actually expanding the date fields in data bases is one of the most troublesome and expensive aspects of the year 2000 problem.
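Here is a minimal sketch of the synchronization burden. It assumes a hypothetical store in which both copies are keyed by record ID and two-digit writes from legacy programs are expanded with an assumed pivot of 50. Every write, from either side, has to update both copies.

```python
class DuplexedDates:
    """Hypothetical duplexed store: a two-digit original and a four-digit clone."""

    def __init__(self, pivot: int = 50) -> None:
        self.pivot = pivot                   # assumed windowing rule for legacy writes
        self.original: dict[str, str] = {}   # key -> "YYMMDD", read by legacy programs
        self.clone: dict[str, str] = {}      # key -> "YYYYMMDD", read by converted programs

    def write_legacy(self, key: str, yymmdd: str) -> None:
        """A legacy program writes YYMMDD; the clone receives the expanded form."""
        yy = int(yymmdd[:2])
        century = "20" if yy < self.pivot else "19"
        self.original[key] = yymmdd
        self.clone[key] = century + yymmdd

    def write_converted(self, key: str, yyyymmdd: str) -> None:
        """A converted program writes YYYYMMDD; the original receives the truncation."""
        self.clone[key] = yyyymmdd
        self.original[key] = yyyymmdd[2:]

    def in_sync(self) -> bool:
        """Every key must be present in both copies with matching digits."""
        return self.original.keys() == self.clone.keys() and all(
            self.clone[k][2:] == v for k, v in self.original.items()
        )
```

Even this small sketch has a gap. A four-digit write outside the legacy window (for example, a date in 1899) truncates cleanly and passes the check, yet the two copies no longer mean the same date. Catching that requires still more synchronization logic.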

Object-code date interception: Experimental methods for intercepting dates in executable object code are being researched and are now entering the market. The object-code interception method is just being demonstrated in IBM mainframe environments, but does not yet have commercial tools for other platforms. However, object-code date interception only works for explicit dates, and not for hidden or obscure dates such as the example of a date embedded in a product serial number. This method is the quickest, of course, and might be deployed in less than a calendar year. It is the last hope of the laggards, but it will probably not turn out to be a "silver bullet" since so many obscure dates would be missed.

Other alternatives for dealing with the year 2000 problem are also running out of time. Replacement with commercial packages is an option, but this approach does not work for custom software. It is far too late to build major replacement applications. Therefore, now is the time to start contingency planning on how to deal with date problems that won't be fixed in time.

Incidentally, in the entire 50 years of the software industry there has almost never been a major software application released to users with 100% of its latent errors found prior to deployment. The current U.S. average is that about 85% of defects are removed before deployment and 15% are delivered to users.

There are roughly 36,000,000 applications running in the world which have year 2000 date problems in them. It is very naïve to think that 100% of these will be repaired in time. It is also naïve to think that 100% of the year 2000 date references in any specific application will be found and repaired.

The "best in class" removal efficiency for coding errors is less than 99% circa 1998, and the average is below 95%. While year 2000 specialists with automated search engines might achieve 99.9% defect removal efficiency, there is no reason to believe that ordinary programmers will exceed historical average results of roughly 95%. There will be year 2000 problems present at the end of the century, and hence contingency planning is needed right now.
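The arithmetic behind this warning is simple. The sketch below assumes a hypothetical application with 10,000 date references and shows how many would be left unrepaired at each removal efficiency mentioned above.

```python
DATE_REFERENCES = 10_000  # hypothetical size of one application's date inventory


def missed(references: int, efficiency_pct: float) -> int:
    """Date references expected to survive repair at a given removal efficiency."""
    return round(references * (100 - efficiency_pct) / 100)


for label, pct in [("U.S. average defect removal", 85), ("ordinary programmers", 95),
                   ("best in class", 99), ("Y2K specialists with search engines", 99.9)]:
    print(f"{label:38} {pct:5}%  -> {missed(DATE_REFERENCES, pct):5} missed")
# missed: 1500, 500, 100, 10
```

Even at specialist levels of efficiency, a large portfolio is left with residual date errors, which is the case for contingency planning.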

The Year 2000 Leap Year Problem (February 29, 2000)

The 365-1/4 day orbit of the earth around the sun means that it is not possible to develop a calendar with the same fixed number of days in every year. Roughly every four years another day has to be added. Of course the situation is really much more complex, because the solar year is slightly shorter than 365-1/4 days, so just adding one day every four years only works for a century or two.

There are three general rules for determining a leap year, but one of these rules applies so rarely that few people understand it. Because of this third rule, the year 2000 is a leap year, and this aspect of the year 2000 problem will cause trouble on February 29th, 2000 AD.

Rule 1: Years divisible by 4 are leap years.

Rule 2: Years divisible by 100 are not leap years.

Rule 3: Years divisible by 400 are leap years.

Thus the year 2000 is going to be a leap year based on rules 1 and 3. It would not be a leap year based on rule 2 alone, but the year 2000 is one of those rare years where rule 3 applies. Because the solar year is slightly shorter than 365-1/4 days, rule 2 drops a leap day every century, and rule 3 restores one every 400 years because dropping a day every century overcorrects.
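The rules only work when they are applied in order, with rule 3 overriding rule 2 and rule 2 overriding rule 1. Below is a minimal Python sketch of the full rule, next to an incomplete two-rule version of the kind that causes the failure described here. The standard library's calendar.isleap is printed alongside as a cross-check.

```python
import calendar


def is_leap_gregorian(year: int) -> bool:
    """Apply the three rules, with each later rule overriding the one before it."""
    if year % 400 == 0:   # Rule 3
        return True
    if year % 100 == 0:   # Rule 2
        return False
    return year % 4 == 0  # Rule 1


def is_leap_two_rules(year: int) -> bool:
    """A common incomplete version: rules 1 and 2 only, so 2000 is wrongly rejected."""
    return year % 4 == 0 and year % 100 != 0


for y in (1900, 1996, 2000, 2100):
    print(y, is_leap_gregorian(y), is_leap_two_rules(y), calendar.isleap(y))
# 1900 False False False
# 1996 True True True
# 2000 True False True
# 2100 False False False
```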

The implications of missing a leap year can be quite disruptive for computerized software applications. The year 1988 was a leap year which was accidentally omitted by a software vendor of mainframe security systems. Starting at midnight on the 28th of February 1988, customers began to be locked out of their computers because February 29th was not considered to be a valid date. By the time the company opened in the morning, hundreds of frantic telephone calls and faxes were arriving from clients all over the world.

The failure mode of missing a leap year is either to shut the application down completely, or to cause calculations to be double posted. In any case, this problem is quite troubling and needs to be dealt with.

Because the year 2000 leap year is not a normal leap year but one determined by the "400 rule", it will probably be missed by more than a few software applications. Unfortunately the much more visible year 2000 problem has obscured the leap year problem, but it will be troublesome too.

As of 1998 the leap year status of 2000 is probably better known than that of any other year in history. The problem is that when software applications were being constructed in the 1970's and 1980's, the fact that 2000 is a leap year often escaped notice. Therefore many legacy applications may fail in February of 2000 even if they make it past December of 1999.

The 10-Digit Telephone Number Problem (Circa 2025)

The author's business telephone area code changed from 617 to 781 in January of 1998. This change of course necessitated reordering business cards, office stationery, and all other documents and brochures which contained our phone number.

The change also necessitated notifying all of our customers, suppliers, etc. At least 300 companies had to be notified, and this area code change probably affected at least 2000 data bases for the author's company alone. It is hard to reach clients and suppliers with 100% efficiency, and indeed a magazine attempted to fax page proofs to our old area code. Because it was a fax instead of a voice line, the operator did not hear the recorded message about the area code change and the page proofs did not arrive.

This incident is a foretaste of more serious problems which may occur circa 2025, when the demand for telephone numbers begins to exceed the capacity of the digits available. In the United States three digits are assigned to geographic area codes, and seven digits are currently assigned to the telephone number itself.

By the first quarter of the next century, we will begin to exceed the capacities of both the three digit area code fields and the seven digit telephone number fields. What may be needed for long-range stability might be five-digit area codes and perhaps nine digits for telephone numbers.
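Some rough capacity arithmetic helps put these digit counts in perspective. The sketch counts raw digit combinations. As a second, explicitly simplified assumption, it then applies only the North American Numbering Plan rule that area codes and exchange codes may not begin with 0 or 1. Real plans reserve further codes, so actual capacity is lower still.

```python
def raw_capacity(area_digits: int, number_digits: int) -> int:
    """Every combination of digits, with no reserved patterns."""
    return 10 ** (area_digits + number_digits)


def restricted_capacity(area_digits: int, exchange_digits: int, line_digits: int) -> int:
    """Assume only that the area code and exchange code may not begin with 0 or 1."""
    area = 8 * 10 ** (area_digits - 1)
    exchange = 8 * 10 ** (exchange_digits - 1)
    return area * exchange * 10 ** line_digits


current_plan = restricted_capacity(3, 3, 4)  # 3-digit area code + 7-digit number (NXX-XXXX)
print(f"{raw_capacity(3, 7):,}")             # 10,000,000,000
print(f"{current_plan:,}")                   # 6,400,000,000
print(f"{raw_capacity(5, 9):,}")             # 100,000,000,000,000
```

Five-digit area codes with nine-digit numbers would raise the raw capacity ten-thousand-fold over today's ten digits.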

When the saturation point for telephone numbers is reached, massive software update costs will be incurred. Also, millions of hand-held personal information managers will become obsolete because they cannot handle expanded formats for telephone information.

This problem will not shut down computers and damage national infrastructures as the year 2000 problem will, but it will still trigger billions of dollars in software upgrade expenses and will make telephone communication less reliable than desirable.

Other countries besides the United States are also approaching the need both to lengthen area codes and to add digits to basic telephone numbers as populations grow and new businesses are created.

The UNIX and C Library Problem (January 19, 2038)

As another example of field size causing date problems, on January 19th of the year 2038 yet another date crisis will occur when the internal date representations of the UNIX operating system and the C programming language expire. UNIX stores dates as the number of seconds elapsed since 00:00:00 UTC on January 1, 1970, using a four-byte (32-bit) signed integer.

Using normal 32-bit storage, this method works until UNIX time reaches 2,147,483,647 accumulated seconds, when a roll-over occurs. Thus the UNIX clock will roll over on January 19, 2038 at 03:14:07 UTC, after which the counter overflows and can no longer represent the current time.

Some applications may then revert to January 1, 1970 as the current date, but others may revert to December 13, 1901, depending on implementation logic.
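The two dates can be reproduced by simulating a signed 32-bit counter. The sketch uses Python's ctypes.c_int32 to mimic the two's-complement wraparound of a 32-bit time value. It converts seconds to calendar dates with timedelta arithmetic rather than the platform's own time functions. It is an illustration only, not a model of any particular UNIX implementation.

```python
import ctypes
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def as_date(seconds: int) -> datetime:
    """Convert seconds since the UNIX epoch to a UTC date and time."""
    return EPOCH + timedelta(seconds=seconds)


last_good = 2**31 - 1                          # 2,147,483,647
wrapped = ctypes.c_int32(last_good + 1).value  # one second later, kept in 32 bits

print(as_date(last_good))  # 2038-01-19 03:14:07+00:00
print(wrapped)             # -2147483648
print(as_date(wrapped))    # 1901-12-13 20:45:52+00:00
```

A signed counter that wraps lands on December 13, 1901. Code that instead treats the overflow as an error and resets the count to zero would report January 1, 1970.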

Here too, the root cause of the problem is conservation of storage by using only four bytes. Using six bytes instead of four would have extended the useful UNIX and C date life by millions of years, but the use of four bytes will cause another mass migration of dates in less than 40 years.

The C runtime library has a time function which reports time as a signed 32-bit integer, of which only 31 bits hold the value. Jan Huffman of Software Productivity Research (SPR) has suggested creating a new data type which would be an unsigned integer, and hence allow the 32nd bit to be used for dates. This would provide an additional 68 years before roll-over occurs.
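The limits of wider or unsigned counters can be checked with the same kind of arithmetic. The sketch below treats the unsigned 32-bit variant and a hypothetical 48-bit (six-byte) signed counter as assumptions for comparison. It uses a mean Gregorian year of 365.2425 days for the rough year count.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
SECONDS_PER_YEAR = 365.2425 * 86_400  # mean Gregorian year

# Unsigned 32 bits: all 32 bits count forward, but dates before 1970 can no longer be stored.
unsigned_limit = EPOCH + timedelta(seconds=2**32 - 1)
print(unsigned_limit)  # 2106-02-07 06:28:15+00:00

# Signed 48 bits: too far out for datetime, so express the range in years instead.
years_48 = (2**47 - 1) / SECONDS_PER_YEAR
print(f"about {years_48 / 1e6:.1f} million years in each direction")  # about 4.5 million years
```

The unsigned limit falls 68 years after 2038, matching the figure above. The trade-off is that an unsigned counter gives up every date before 1970.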

As of 1998 almost no press coverage is being given to the UNIX and C library problem because it is about 40 years away, even though this problem will be of the same magnitude as the current year 2000 problem.

If the UNIX problem follows the same pattern as the year 2000 problem, it will not be covered in the press until about 2033 and major repairs won't begin until 2036, when it is almost too late to get the affected applications updated before the roll-over occurs.

Date Expiration in Microsoft Products (2019 - 2078)

Because software from Microsoft is used in more computers than software from all other vendors combined, how Microsoft handles dates is a very serious issue. Microsoft's congressional testimony in 1996 and the Microsoft year 2000 web site make information on all Microsoft products available for review and analysis. Readers can start at the basic Microsoft web site, http://www.Microsoft.com, and branch to the relevant sections.

Microsoft's internal standard is to record dates using four digits for the year, regardless of how the dates are displayed on screen. Users can select a variety of screen representation methods based on their own preferences. However, if users select two-digit year formats, they are responsible for any problems this might cause for dates that run into the next century.

Although Microsoft states that all of their software is year 2000 compliant, there are still a number of date expirations which users should know about. For example the very popular Excel 95 spreadsheet package handles dates only up to 2019 using two-digit dates, or up to 2078 using four-digit dates. The Microsoft Project planning tool can handle dates only up to 2049. The Microsoft Access database product will stop at 1999 using two-digit date formats, but goes all the way to 9999 using four-digit dates.

According to Microsoft's congressional testimony, these rather close-in dates are going to be stretched out in future versions with the year 9999 being the stated end point for many new Microsoft product releases.

Microsoft was somewhat late in realizing the seriousness of the year 2000 problem, but is finally alert to its significance. However it would be useful to the scientific community if Microsoft adopted a more far-reaching date format which could deal with geologic and astronomical date processing.

The Social Security Number Problem (Circa 2050)

The use of unique national identity numbers such as the social security number in the United States poses a very difficult long-range challenge for software applications. The reason for the problem is that, unlike many other numbers (e.g., telephone numbers), social security numbers are "retired" once they are used and cannot be reassigned. Currently even social security numbers assigned to people who have died cannot be reassigned.

The current format for the U.S. social security number is nine digits long in the format nnn-nn-nnnn. This format uses the first three digits for state identification. The capacity of the social security number system is about 1 billion unique numbers. A report by Dr. Clifford Kurtzman (Kurtzman 1997) indicates that about 383,000,000 social security numbers have been assigned to date, and about 6,000,000 more are being assigned each year. Thus in theory the number of digits should serve for another 75 years.
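The runway implied by these figures can be sketched as follows. At a constant 6,000,000 assignments a year, the remaining numbers last roughly a century. The shorter 75-year figure therefore implies a smaller usable capacity, a rising assignment rate, or both. The growth parameter here is a hypothetical knob for exploring that, not a figure from the cited report.

```python
CAPACITY = 10**9          # nine digits, ignoring any reserved number ranges
ASSIGNED = 383_000_000    # figure cited from Kurtzman (1997)
PER_YEAR = 6_000_000      # current assignment rate cited in the text


def years_remaining(capacity: int, assigned: int, per_year: float, growth: float = 0.0) -> int:
    """Count whole years until cumulative assignments exhaust the remaining numbers."""
    remaining = capacity - assigned
    years = 0
    while remaining > 0:
        remaining -= per_year
        per_year *= 1 + growth  # hypothetical annual growth in the assignment rate
        years += 1
    return years


print(years_remaining(CAPACITY, ASSIGNED, PER_YEAR))        # 103 at a constant rate
print(years_remaining(CAPACITY, ASSIGNED, PER_YEAR, 0.01))  # fewer years if assignments grow
```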

But the consequences of losing the integrity of the social security numbering system are very profound and will affect all forms of financial and government applications. It would therefore be folly to wait until the last minute before taking remedial action, as we have done with the year 2000 and Euro-currency problems. The author recommends a target date of 2050 for creating an expanded schema for handling social security numbers.

Some form of universal personal identity code is clearly in the wind, and when planning the expansion or replacement of social security numbers, other issues also need to be addressed. It would be a serious mistake just to add one more digit when a more thoughtful solution is needed.

Whatever solution is adopted, a change involving social security numbers will have a major effect on millions of software applications and hence trigger expenses of billions of dollars.

Incidentally, although the U.S. social security number is cited in this article, the same form of problem occurs in virtually every nation in the world. The general problem is the assignment of too few digits to handle unique citizen identifications given normal population growth for more than 50 or 100 years.

Costs Associated With Date Repairs and Damages

The economic justification for developing a new date and numeric information standard is the high cost that will accrue between 1999 and 2050 for software date changes due to the limitations of the current standard date and numeric formats. Let us consider the implications of the failure of the current date formats.

Table 1 is a hypothetical projection of the numbers of existing software applications on a global basis which contain various kinds of date and numeric problems that will require changes because of format problems.

Because dates and numeric data need to be fixed before their formats expire, and this will probably not happen in every case, Table 1 also gives an estimate of the number of applications with date problems that will miss their deadlines and not be fixed in time.

Table 1: Projected Numbers of Applications With Date Problems

Date Problem	Applications With Problem	Repaired in Time	Unrepaired Format Errors	Years of Main Impact
Year 2000	36,000,000	80.00%	7,200,000	1999-2001
Phone numbers	25,000,000	85.00%	3,750,000	2000-2025
Euro-currency	10,000,000	75.00%	2,500,000	1999-2005
Social security	15,000,000	90.00%	1,500,000	2050-2099
UNIX rollover	12,000,000	90.00%	1,200,000	2036-2038
End of file	4,000,000	90.00%	400,000	1999-2001
Leap year 2000	2,000,000	90.00%	200,000	1999-2000
GPS Rollover	250,000	98.00%	5,000	1999-2000
TOTAL	104,250,000	83.93%	16,755,000	1999-2099

Table 1 is sorted by the column labeled "Unrepaired Format Errors" on the grounds that the greatest volume of unrepaired date and number formats is likely to have the greatest damage potential. Table 1 has a large margin of error, but its underlying message is valid: date and numeric format problems in software are plentiful and troublesome.
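The derived columns of Table 1 can be recomputed from the first two. The sketch also shows that the overall repair rate depends on whether the individual percentages are averaged directly (87.25%) or weighted by the number of applications in each row (83.93%). Only the weighted figure is consistent with the table's totals of 104,250,000 applications and 16,755,000 unrepaired.

```python
# (applications with problem, percent repaired in time), copied from Table 1
TABLE_1 = {
    "Year 2000": (36_000_000, 80),
    "Phone numbers": (25_000_000, 85),
    "Euro-currency": (10_000_000, 75),
    "Social security": (15_000_000, 90),
    "UNIX rollover": (12_000_000, 90),
    "End of file": (4_000_000, 90),
    "Leap year 2000": (2_000_000, 90),
    "GPS Rollover": (250_000, 98),
}

unrepaired = {name: apps * (100 - pct) // 100 for name, (apps, pct) in TABLE_1.items()}
total_apps = sum(apps for apps, _ in TABLE_1.values())
total_unrepaired = sum(unrepaired.values())
weighted_pct = 100 * (total_apps - total_unrepaired) / total_apps
unweighted_pct = sum(pct for _, pct in TABLE_1.values()) / len(TABLE_1)

print(f"{total_apps:,} applications, {total_unrepaired:,} unrepaired")
# 104,250,000 applications, 16,755,000 unrepaired
print(f"repaired in time: {weighted_pct:.2f}% weighted, {unweighted_pct:.2f}% unweighted")
# repaired in time: 83.93% weighted, 87.25% unweighted
```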

As of 1998 it is obvious that it is far too late to approach 100% readiness for either the year 2000 or the Euro currency problems. At a global level, we will be lucky if even 80% of applications with date problems are ready for the year 2000. The Euro-currency situation is even stickier, and my projection here is that no more than 75% of the world's financial applications which deal with currencies will be ready in time.

Starting in 1999 and running on through 2050, computer and software date and numeric problems are going to absorb huge numbers of scarce software personnel who really should be doing more positive things. Indeed, fixing date and numeric problems is such tedious work that it is hard to find people to do it without substantial pay and benefits packages.

Unfortunately a significant percentage of software personnel, possibly more than 50% of the entire software work force, will begin to spend more and more time on date and format repairs and will not be available for new applications, new functional enhancements, or work that adds positive value to business and government software.

Since the software industry has a bad track record of finishing anything on time there are strong reasons for assuming that many applications with date problems won't be updated in time.

The costs for all of these date problems will not be known with certainty until they occur, but the projections are very alarming:

•More than a trillion dollars will be spent on the year 2000 problem before it occurs, and more than two trillion dollars may be spent on damages, recovery costs, and litigation afterwards.

•More than four hundred billion dollars may be spent on Euro-currency updates, and more than six hundred billion dollars on damages, recovery costs, and litigation. Overall, the total costs associated with the Euro-currency situation could easily top a trillion dollars on a global basis if both pre-Euro costs and post-Euro damages and litigation are included.

•Given the pervasive nature of telephone communication, about a billion dollars a year is already being spent due to the frequent changes of area codes. These expenses will begin to escalate as the capacity of current telephone numbering schemes begins to approach the saturation point. The total costs can top $250 billion for software, and will probably cause the premature disposal of several billion dollars' worth of hand-held personal information managers (PIMs) which cannot make the transition to an expanded telephone number.

•Software cost estimates for social security numbers, the GPS date roll-over, the "nines" end of file problems, the year 2000 leap year, and for the UNIX and C library date roll-over have not been published. But all of these are likely to be in the multi-billion dollar range with the possible exception of the GPS date roll-over which may be smaller. All of these other date problems added together might top yet another trillion dollars.

The bottom line is that date and numeric format problems are becoming a black hole of software costs which will absorb far too much money and too many scarce resources. There is a strong economic justification for wanting to develop permanent date and numeric format solutions for software and computer purposes.

Summary and Conclusions

Computers and software are the major tools of business, commerce, science, and industry. Accurate date and time recording is important for both business and science. Also important is the accurate storage of other numeric information such as telephone numbers and social security numbers.

It is obvious that computers and software are now the primary tools for numbers and for date and time recording, but unfortunately all of the current methods for handling date and time storage within computers are inadequate.

Neither the ISO date standard nor other default date standards are fully adequate for business purposes, and none of them is adequate for scientific dates at all.

Neither the current GPS nor the current UNIX date mechanism is fully adequate, since roll-overs due to storage limits will cause problems in 1999 and 2038, respectively.

Since computer and software date storage, date calculations, and date representation are far more important than printed dates, it would be valuable to have an international date symposium which would develop a new international standard for computerized date and time storage, and for other kinds of important numeric data such as telephone and social security numbers.

Several forms of date storage need to be included, and the standard should support both normal calendar dates and also the method of recording seconds from arbitrary starting points. This new standard should accommodate the needs of science as well as the needs of business, which means it must work over spans of billions of years in both future and past directions.
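A keyed date format of the kind this report proposes can be sketched in a few lines. Everything in the sketch is hypothetical: the key values, the "KK:payload" text layout, the pivot of 15 for legacy two-digit years, and the choice of whole days since January 1, 1970 as the common scale. It only illustrates how a key field lets old and new date formats coexist with second counts spanning billions of years, and how all of them can be converted to one another.

```python
from dataclasses import dataclass
from datetime import date
from enum import IntEnum

EPOCH_ORDINAL = date(1970, 1, 1).toordinal()
SECONDS_PER_DAY = 86_400


class DateKey(IntEnum):
    """Hypothetical key values; a real standard would assign its own."""
    YYMMDD = 1         # legacy two-digit year, resolved with a fixed pivot
    YYYYMMDD = 2       # four-digit Gregorian calendar date
    EPOCH_SECONDS = 3  # signed seconds from 1970-01-01 00:00:00 UTC, unbounded


@dataclass(frozen=True)
class KeyedDate:
    key: DateKey
    payload: str  # the digits that follow the key field, exactly as stored

    @classmethod
    def parse(cls, text: str) -> "KeyedDate":
        """Parse an assumed "KK:payload" layout, e.g. "02:19980324"."""
        key_text, payload = text.split(":", 1)
        return cls(DateKey(int(key_text)), payload)

    def days_since_epoch(self, pivot: int = 15) -> int:
        """Convert any supported format to one common scale: whole days since 1970-01-01."""
        if self.key is DateKey.EPOCH_SECONDS:
            # Python integers have no fixed width, so spans of billions of years fit.
            return int(self.payload) // SECONDS_PER_DAY
        if self.key is DateKey.YYMMDD:
            yy = int(self.payload[:2])
            year = 2000 + yy if yy < pivot else 1900 + yy
            rest = self.payload[2:]
        else:  # DateKey.YYYYMMDD
            year = int(self.payload[:4])
            rest = self.payload[4:]
        month, day = int(rest[:2]), int(rest[2:4])
        return date(year, month, day).toordinal() - EPOCH_ORDINAL


legacy = KeyedDate.parse("01:980324")
modern = KeyedDate.parse("02:19980324")
# Both records resolve to the same day count, even though their stored formats differ.
```

The design choice worth noting is that the key travels with the value, so a reader never has to guess which format or window applies. That is the knowledge the windowing and compression methods leave implicit.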

The justification for developing such a standard can be seen in the expenses that are already accumulating for the GPS roll-over and the year 2000 date problem, and that will also accumulate for the UNIX 2038 problem. It would be hazardous to go forward into the 21st century without an adequate standard for dealing with dates and time storage in computers, and as of 1998 we do not have one.

The enormous expenses which the software industry and all other industries are now facing due to date problems provide a strong economic reason for developing a new "super date" standard which can accommodate older date representation methods and facilitate date conversion logic among all date recording methods and all calendars.

References and Readings

Allan, David W., Ashby, Neil, and Hodge, Cliff; "Fine Tuning Time in the Space Age;" IEEE Spectrum, March 1998; pp. 42-51.

DeJager, Peter and Bergeon, Richard; Managing 00 - Surviving the Year 2000 Computing Crisis; John Wiley & Sons, 1997.

Jones, Capers; Assessment and Control of Software Risks; Prentice Hall, 1994; ISBN 0-13-741406-4; 711 pages.

Jones, Capers; Patterns of Software System Failure and Success; International Thomson Computer Press, Boston, MA; December 1995; 250 pages; ISBN 1-85032-804-8; 292 pages.

Jones, Capers; Applied Software Measurement; McGraw Hill, 2nd edition 1996; ISBN 0-07-032826-9; 618 pages.

Jones, Capers; The Year 2000 Software Problem - Quantifying the Costs and Assessing the Consequences; Addison Wesley, Reading, MA; 1998; ISBN 0-201-30964-5; 303 pages.

Jones, Capers; Software Quality - Analysis and Guidelines for Success; International Thomson Computer Press, Boston, MA; 1997; ISBN 1-85032-876-6; 492 pages.

Jones, Capers; "Estimating Rules of Thumb for the Year 2000 and Euro-Currency Projects"; SPR Technical Report; Software Productivity Research, Inc.; Burlington, MA; January 1998.

Jones, Capers; "Resource Conflicts Between the Year 2000 and Euro-Currency Software Problems"; Year 2000 Journal, January/February 1998; Vol. 2 No. 1; pp. 63-70.

Jones, Keith; Year 2000 Software Crisis Solutions; International Thomson Computer Press, 1997.

Kappelman, Leon (editor); Solving the Year 2000 Problem; International Thomson Computer Press, 1997.

Kurtzman, Dr. Clifford; "Frequently Asked Questions about the Year 2000 and Similar Problems;" Tenagra Corporation, Houston, TX; December 1997; http://www.tenagra.com or http://www.year2000.com.

Lefkon, Dr. Dick (editor); Year 2000 Best Practices for Y2K Millennium Computing: Panic in Year Zero; Mainframe Special Interest Group (SIG) of the Association of Information Technologies (AITP); New York, NY.

Murray, Jerome T. and Murray, Marilyn M.; The Year 2000 Computing Crisis - A Millennium Date Conversion Plan; McGraw Hill, New York, NY; 1996.

Ragland, Bryce; The Year 2000 Problem Solver; McGraw Hill, New York, NY; 1997.

Robbins, Brian and Rubin, Dr. Howard; The Year 2000 Planning Guide; Rubin Systems, Inc.; Pound Ridge, NY; 1997.

Rubin, Dr. Howard; Survey of Year 2000 Preparations in Fortune 500 Companies; Meta Group, Stamford, CT; January 1998.

Ulrich, William and Hayes, Ian S.; The Year 2000 Software Crisis - Challenge of the Century; Prentice Hall, Yourdon Press; 1997.

Yourdon, Ed and Yourdon, Jennifer; Time Bomb 2000: What the Year 2000 Computer Crisis Means to You; Prentice Hall PTR, Upper Saddle River, New Jersey; 1998.