Millennium Madness???

 

By William J. Cook, Jr.

 

------------------------------------------------------------------------

 

Living and Breathing Technology:

 

Put bluntly, the Year 2000 computer problem should concern all of us. Even if you are not directly involved with programming computers, your life will still be affected by the resulting failures in the technological fabric that binds modern society. Computers control many of the routine aspects of our daily lives. Things we have grown accustomed to always working are managed by computers.

 

For instance, consider how different your life would be if you could not trust traffic lights or elevators. What if every time you wanted to buy something of significance, you had to sit down with a banker, fill out a loan form, and withstand a credit check? Instant credit, building security, and public safety systems are now largely, if not exclusively, controlled by computers. These are three immediately obvious examples. There are thousands of others that, like some giant alarm clock, are poised to "go off" in about three years' time.

 

Computers are a relatively young phenomenon. While they have been used in large businesses for almost 40 years, most automation is less than 20 years old. As recently as 10 years ago, computers were unattainable for the majority of the population.

 

There have been upwards of 100 million computers sold since 1980 in the United States alone. We must all recognize our reliance on the very enablers of our current standard of living. It is our combined responsibility to bring together the resources to ensure their continued reliability.

 

------------------------------------------------------------------------

 

History:

 

I remember when the phone bill came on an IBM punch card. On that card were recorded your name, your account information, and the amount of your bill. The programmers of that era had exactly 80 characters of data to represent this information. They needed to be constantly thinking of efficiency and economy of data. Computer memory would have been one of the most expensive commodities on the planet, except that it was not a commodity at all: it was rare and proprietary to each computer. Magnetic storage was limited to tape drives, which were slow and serial in nature. Everything was batch.

 

In the mid-1970s, when on-line systems became more widespread, the traditional format of 80 characters of information was carried forward from the punch card. Character-based display technology supported 24 lines per screen. Combined with display screen technology was an increase in the availability of magnetic disk storage. As with most new things, the "disk pack" was extremely expensive. It was incumbent on the professionalism of programmers to make their usage of memory and storage as frugal as possible. Storing redundant and irrelevant information not only wasted valuable resources, it reduced the efficiency of "key punch" operators, who were hired for their typing skills rather than their computer literacy. Storing century indicators on dates was considered irrelevant.

 

As computers gained in performance and economy, it never occurred to anyone that it was time to trade this efficiency for accuracy. They were finally seeing a return on the investment. The same reasons for making systems efficient in the 1960s and 1970s were still considered valid.

 

------------------------------------------------------------------------

 

How We Got Here:

 

If you have ever written a computer program, you know that programs are rarely written from scratch. In most business cases, there is already an existing model of data and a library of code that can be cloned as a form of reuse. These assets of data and process did not appear overnight. They were developed carefully and thoughtfully over decades.

 

"Backward compatibility" is not only a requirement in systems, it is a mark of flexibility and quality. Everything new needs to work with what came before it or else it does not pass acceptance. Without this smooth transition, change occurs in fits and starts combined with high risk and cultural upheaval. I would establish the premise that building flexibility into systems today is not only to enable the future but to accomodate the past. So, because nearly everything that exists today is a result of what came before it, the problems and limitations of those past systems are carried forward with "new" systems as well.

 

------------------------------------------------------------------------

 

Date Arithmetic:

 

In a typical business calculation, you might need to know the age of someone on a particular date. To do this, you subtract the person's date of birth from the current date. This effectively gives you the difference in years between two dates. Here's an example using the YYMMDD format.

 

961115 - 601225 = 35 years.

 

The trailing digits in the result are irrelevant: 961115 - 601225 = 355890, and truncating the trailing four digits leaves 35. Because the month and day of the birth date (1225) are greater than the month and day of the current date (1115), the subtraction borrows from the year digits. So even though 96 minus 60 is 36, the calculated age is still only 35, which is correct for someone whose birthday has not yet arrived that year.

 

If you continue to use the YYMMDD format for date representation, you will eventually hit the following calculation.

 

000101 - 601225 = -60 years Youch!
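
To make this concrete, here is a minimal sketch in C of the two-digit-year age calculation (the systems in question are mostly COBOL, but the arithmetic is identical; the function name and integer encoding here are illustrative, not any particular shop's code):

    #include <stdio.h>

    /* Dates held as six-digit integers in YYMMDD format. Truncating
       the trailing MMDD digits of the difference yields the age; the
       borrow from MMDD handles "birthday not yet reached" for free. */
    int age_in_years(long current, long birth)
    {
        return (int)((current - birth) / 10000);
    }

    int main(void)
    {
        printf("%d\n", age_in_years(961115L, 601225L)); /* 35  */
        /* 101L is 000101: January 1, 2000, with the century lost */
        printf("%d\n", age_in_years(101L, 601225L));    /* -60 */
        return 0;
    }

The first call reproduces the example above; the second is the calculation every unconverted system will attempt on January 1, 2000.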

 

Many arrows are being shot in the direction of programmers, with people wondering what they were thinking when they chose this format for dates. As explained above, it was the most efficient way to store data, using the fewest characters while fully representing the context of the date. For now, most systems are working correctly without incident. The first day of the last year of this century will change all of that. The consequences of this cannot be overstated.

 

------------------------------------------------------------------------

 

The Hype is Ripe:

 

Now that you understand the parameters of the problem, it is time to discuss the increasing attention this issue is getting.

 

Make no mistake, this problem is very, very real. For the historical reasons I cited above, the majority of the data in use in business and government is stored in the YYMMDD format. It is primarily in a numeric format so math can be performed on it easily. For the most part, it is not standardized by any body that has enforcement authority over its use. In the absence of such authority, creativity and culture have rushed in to fill the vacuum. The result is an environment where no single solution can be applied to all cases. This is referred to as the lack of a "Silver Bullet."

 

Date-related logic is estimated to be present in 80% of the computer programs in existence. Within those programs, between 5% and 10% of the source lines of code (LOC) involve date processing. There are an estimated 3 trillion lines of COBOL code in existence and 900,000 COBOL programmers world-wide. The replacement value of COBOL code makes it the world's single largest asset. For assets and resources of this magnitude, the cost to fix the problem is estimated at between US$400 billion and US$600 billion. While most of these statistics relate to COBOL, the problem is ubiquitous across all platforms, programming languages, operating systems, and networks.

 

The Gartner Group has estimated that less than 50% of the organizations that attempt to eradicate the Year 2000 problem from their systems will be totally successful. Total success may not be required to prevent loss of business, but the remaining defects will be an expensive, confidence-reducing problem inhabiting systems until they can eventually be evolved to a higher existence. This crisis of confidence will pervade our society as trusted systems begin to fail.

 

The main reason to believe the hype surrounding this issue is that there is nothing to be gained by it. For the end-user community, there is nothing here that will add value to their organizations. Systems professionals are sealing their futures into the present, with even less opportunity to try new techniques and build new skills. After investing US$600 billion to reinvigorate the "legacy" environment, very few companies will be able to solicit new development funds to replace this rejuvenated asset. Anyone predicting the demise of legacy mainframe systems should probably switch to another area of interest. The mainframe's survival is assured.

 

Some people have accused the growing legions of consultants and Year 2000 service providers of fanning the flames of fear. But the cost of repairing this problem is nowhere near the cost to organizations that will cease to function if it is not addressed. Money made in Y2k conversions will be made on sheer volume, not price gouging. In my opinion, there is nothing to be gained by hyping this problem except to force humanity to violate one of its basic survival skills: denial.

 

As we move closer to the last day of 1999, the cost to correct the problem will go up dramatically (geometrically?). This will be caused by a lack of human and technology resources to adequately perform the work.

 

In a flurry of desperation, programmers will be lured away from whatever projects they are currently working on to stabilize the companies in the most pain. It will be a continuing challenge to retain employees before they are lured away by the potential for more salary than they will ever be offered again. This is a classic example of natural selection.

 

On the hype-o-meter, Year 2000 (Y2k) will replace "the internet", data warehousing, client / server, and even Elvis.

 

------------------------------------------------------------------------

 

What are the approaches to fixing the problem?:

 

In our existence, there are certain universal properties. One is our position in space, another is our position in time. Denying the reality of either of these is currently not possible within our understanding of quantum physics. All physics aside, it's just not practical.

 

The three approaches to dealing with the problem are:

 

•Full Date Expansion

•Windowing

•Century Indicator

 

I'll take these in reverse order.

 

Century Indicator

 

The idea of using a century indicator is that you can save space on your direct access storage devices (DASD), also known as hard disk space, by adding only a single byte (character) to each of your date fields. Then, when you read the data into your programs, you match the century indicator to either 19 or 20 and process the data accordingly. This requires a lot of procedure-level coding to interpret and convert the data. You do save disk space with this method, but in exchange for avoiding the expense of more storage, you assume the risk and incompatibility of having your data in what is essentially a coded format. Every utility program, and anything else that uses your data, has to have logic added to it to process these codes. This is a hardware-centric approach, and hardware is really not where the expense and the problem space are in Y2k conversions. So this is bad idea number one.
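
To illustrate, here is a minimal sketch in C of what the indicator logic might look like (the one-byte scheme shown, '0' for 19xx and '1' for 20xx, is hypothetical; every shop would invent its own variation, which is exactly the problem):

    #include <stdio.h>
    #include <string.h>

    /* Expand a coded date "CYYMMDD" (C = century indicator) into
       the full "YYYYMMDD" form before any date logic can run.    */
    void decode_date(const char *coded, char expanded[9])
    {
        strcpy(expanded, coded[0] == '0' ? "19" : "20");
        strcat(expanded, coded + 1);        /* append the YYMMDD  */
    }

    int main(void)
    {
        char out[9];
        decode_date("0961115", out);
        printf("%s\n", out);                /* 19961115           */
        decode_date("1000101", out);
        printf("%s\n", out);                /* 20000101           */
        return 0;
    }

Every program, utility, sort, and report that touches the field needs this routine, or its own private version of it.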

 

Windowing

 

What this means is that you pick a "window" of years as a conversion factor. For example:

 

•If year is less than 50 then it must be 2000-whatever.

•If year is greater than or equal to 50 then it must be 1900-whatever.

 

You have to make this decision at some point during your conversion: either when you are bulk-converting your data, or in real time as you are processing it. Depending on the span of years represented in your data, this may work. But again, there is overhead added to your processing for every record and every date field in the record. The positive aspect of this approach is that your data records can remain in YYMMDD format; you expand the format to YYYYMMDD as a result of your windowing logic. It supports programs that have been expanded with an explicit century as well as programs that have not been converted yet, so it is a better approach than the first one, which effectively encrypts both your data and your programs. My personal problem with the windowing approach is that you are adding logic and overhead to your target system. The thing you are going to have to live with now carries this additional processing overhead just to support something you are moving away from. All of your processing and utility programs have to include this logic, raising your conversion and testing costs. Bad idea number two.
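
Here is a minimal sketch in C of the window described above, with a pivot of 50 (the pivot value is a per-shop decision, not any standard):

    /* Expand a YYMMDD integer to YYYYMMDD using a fixed window:
       two-digit years below 50 become 20xx, 50 and above 19xx. */
    long expand_date(long yymmdd)
    {
        long yy   = yymmdd / 10000;   /* leading two digits      */
        long mmdd = yymmdd % 10000;   /* trailing month and day  */
        long yyyy = (yy < 50) ? 2000 + yy : 1900 + yy;
        return yyyy * 10000 + mmdd;   /* 961115 -> 19961115      */
    }

Harmless enough in isolation; the trouble is that every program and utility touching the data must carry it, with the same pivot, forever.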

 

Date Expansion

 

This is the least objectionable of the three approaches. While it is initially the most expensive to implement, I think that over time it will be the most cost-effective way to process your systems. As I mentioned above about accepting the reality of your current position in space and time, this is the only approach that truly presents the information locked up in your data. It also removes any ambiguity in the usage of your data. In the windowing and century-indicator approaches, it is recommended that you only interrogate and expand the fields that you actually use for calculations and comparisons. That may work for one program, but another program may use the record format in a completely different way. This implies that the indicator and windowing techniques are unique to each program. I don't know about you, but I think these conversions are already difficult enough.

 

What I am recommending is that you fully expand all date fields to represent the proper context of the data. You can do this in a bulk operation to materialize the data into a target format. Then, as you convert your programs, you "refoot" them on the converted data files for processing. To maintain concurrency, you use replication between the target and legacy formats of the data, with your windowing logic confined to the replication process.
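
As a sketch of what the bulk operation amounts to, here is a minimal example in C (the one-record-per-line layout with a leading six-character YYMMDD date, the file names, and the pivot of 50 are all hypothetical; a real conversion would be driven by your actual record layouts):

    #include <stdio.h>

    #define PIVOT 50   /* windowing pivot, applied exactly once here */

    int main(void)
    {
        FILE *in  = fopen("legacy.dat", "r");   /* YYMMDD records   */
        FILE *out = fopen("target.dat", "w");   /* YYYYMMDD records */
        char rec[256];

        if (in == NULL || out == NULL)
            return 1;

        while (fgets(rec, sizeof rec, in) != NULL) {
            /* Read the two-digit year, then prefix the century
               while copying the record through unchanged.       */
            int yy = (rec[0] - '0') * 10 + (rec[1] - '0');
            fputs(yy < PIVOT ? "20" : "19", out);
            fputs(rec, out);
        }
        fclose(in);
        fclose(out);
        return 0;
    }

After this pass, the windowing decision lives in exactly one place, and the converted programs see nothing but unambiguous YYYYMMDD data.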

 

In this method, you absorb the overhead of the translation in your replication processing. Your native programs and utilities work perfectly with this approach because they don't know or care about the date-related windowing and conversion logic, and all of their date-related rules still apply because the context of the data hasn't changed. You can therefore reduce the amount of analysis you do per program, and you can have teams of programmers working on systems that share the same record format, because everyone is working from the same set of assumptions.

 

But that's not all....

 

What full date expansion and replication strategies really enable is project management. By providing an environment that supports the converted programs, and keeping it in sync with the rest of the production data, you remove the dependencies between systems and data, and you can stage programs back into production as soon as they are completed. This removes the risk of maintaining duplicate libraries of source and also exposes your converted code to real-world processing environments, with time to tune and correct your mistakes.

 

Managing risk in an environment where you are converting almost all of your source code and practically all of your data files is a very difficult task. Imagine you checked out 1,000 programs and 50 data files, converted them all, and then tried to stage them back into production. Now imagine that your balancing routines indicate a data corruption. Where in the 1,000 programs and 50 data files would you look first? And what happens when you find the problem? The answer is very expensive regression testing of the whole system, its inputs, outputs, and reports. It is just wacky to consider the windowing or indicator approach.

 

------------------------------------------------------------------------

 

What is the best way to proceed?:

 

To remove all ambiguity in your systems, you have to maintain the context of the information in the data. You have to allow for every on-line transaction, batch job, utility, data export, and data exchange in your system. If you want to go through the laborious task of devising routines to translate your data for all of the above functions just to save a little temporary disk space, then I must be missing something critical here.

 

Continuing with this point... Over time, change is inevitable. If you have embedded routines to decrypt your date-related information, then you must carry this capability forward into all of your future systems.

 

Wrapping up this point....

 

•Accept the reality of the timeframe in which you process data.

•Move quickly to your target data format to manage risk.

•Protect your legacy data from your converted processes until you can eliminate the use of legacy formats.

•Build your new systems on your converted data formats so you can obsolete the old formats.