A Mainframe Migration Disaster Story

A few years ago, I was speaking at a conference on the west coast. The attendees were mostly IT executives from startup companies, and it was a lively group, though some IT folks from more established companies were there as well. Afterwards, I sat beside one such chap at a sponsored evening at a nice restaurant. He had attended the conference because he wanted to build a new call center on the west coast for a hotel chain that was looking to expand into other regions, and we got talking. As it turned out, he had an interesting mainframe migration story to tell, and after a few drinks he gave me all the juicy details.

It started with me telling him what my company did: mainframe optimization. He replied, “Where were you guys three years ago?” He proceeded to tell me that they had recently moved off the mainframe: they had a 480 MSU mainframe system that managed their reservations, property management, and payroll activities, with the reservation activity accounting for about 80% of their MSU usage. Their reservation application was a custom app integrated into their property management system, and they could do some really intensive analysis on booking patterns within the year, room-night rates, all kinds of things.

Their biggest concern was that the cost per reservation on their current mainframe was becoming a problem for a company of their size. He told me that they spent about $2500 per MIPS, but he suspected that the real cost was actually closer to $2800. Before I go on, I must be fair and state that this was at least a year before IBM came out with the zEnterprise BC12 and its improved pricing metrics. So at that time, he felt that increasing costs were forcing his hand, and that he had no option but to move to Unix boxes to help mitigate those costs. Another concern was that (at that time) mobile reservations were increasing sharply, and they didn’t have a mainframe solution to handle them. Of course, that’s all changed now, but it was an important deciding factor for migration at the time.

Before the migration, they were handling about 1.1M reservations per year, so each reservation cost them about $1.25. When you factor in cancellations and such, it was probably closer to $1.30. They saw growth ahead and really needed to upgrade their mainframe to keep pace. But with the pricing available at the time, coupled with their specific growth pattern, their costs were going to increase to about $1.50 or $1.55 per reservation, just based on the system upgrade.
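
As a rough sanity check on those numbers, here is a back-of-envelope sketch of what they imply (a minimal illustration only; it assumes the quoted per-reservation figures are fully loaded annual costs and that volume stays at roughly 1.1M reservations per year, neither of which he spelled out for me):

```python
# Back-of-envelope check of the figures quoted above (illustrative only).
RESERVATIONS_PER_YEAR = 1_100_000      # ~1.1M reservations per year

cost_per_res_before = 1.25             # $/reservation on the existing mainframe
cost_per_res_after_upgrade = 1.50      # $/reservation projected after the needed upgrade

annual_before = RESERVATIONS_PER_YEAR * cost_per_res_before
annual_after = RESERVATIONS_PER_YEAR * cost_per_res_after_upgrade

print(f"Implied annual cost today:        ${annual_before:,.0f}")                  # ~$1,375,000
print(f"Implied annual cost post-upgrade: ${annual_after:,.0f}")                   # ~$1,650,000
print(f"Projected increase:               ${annual_after - annual_before:,.0f}")   # ~$275,000
```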

It is important to note that he is an accomplished IT professional who really appreciated what his mainframe system could do in terms of throughput and reliability, but he also knew that (at that time) you needed to be running at least 2500 MIPS to run a mainframe effectively. He also knew that a machine of that capacity would have been serious overkill for their needs, even accounting for their growth projections.

So they made the decision to move to a distributed systems solution from a well-known reservation systems solutions provider, and began the migration process. The vendor’s solution was fairly close to what they needed to do, but not 100%, and did not provide all the custom room-night analysis capabilities that their mainframe application had. The vendor insisted that they could obtain the same data out of their solution, and use a spreadsheet to do the analysis.

When they did the math initially, it looked like the cost per reservation would actually go down to about $0.65, and likely even lower than that. The assumption was that they could enter into an agreement with a company that would supply them with servers during peak periods, and that the cost would decline as they built up the number of room-nights resulting from their expected growth. The idea was that the cost would drop to about $0.50 per room-night at about 2M reservations per year, though because of some clustering issues it would jump back up to about $0.65 before declining again.
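
To picture the cost curve he described, here is a very rough sketch; the $0.65 starting point, the ~$0.50 floor near 2M reservations per year, and the clustering-related jump back up come from his account, but the exact breakpoints and slopes are my own assumptions purely for illustration:

```python
def projected_cost_per_reservation(annual_volume: int) -> float:
    """Rough sketch of the vendor's projected cost curve as it was described to me.

    The $0.65 starting cost, the ~$0.50 floor near 2M reservations/year, and the
    clustering-related jump back to ~$0.65 come from the story; the breakpoints
    and slopes below are assumptions for illustration only.
    """
    if annual_volume <= 1_100_000:
        return 0.65                                   # expected cost right after migration
    if annual_volume <= 2_000_000:
        # cost declines toward ~$0.50 as volume approaches ~2M/year
        return 0.65 - 0.15 * (annual_volume - 1_100_000) / 900_000
    if annual_volume <= 2_500_000:
        return 0.65                                   # clustering issues push the cost back up
    # beyond that, the cost was expected to decline again (slope assumed)
    return max(0.50, 0.65 - 0.15 * (annual_volume - 2_500_000) / 1_000_000)


for volume in (1_100_000, 1_500_000, 2_000_000, 2_200_000, 3_000_000):
    print(f"{volume:>9,} reservations/year -> ${projected_cost_per_reservation(volume):.2f} each")
```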

It all sounded pretty exciting and impressive, that is, until things started going wrong. The initial phase of the project took two years longer than planned. They spent $160K more with the vendor in professional services to fix “crap that the guys promised us would work anyway,” and they still could not get the reports to do what they were hoping for (and had running on their mainframe before), so for the most part, they had no analytics.

That was just the start. There were several times throughout the project when he was certain that he was going to be fired. In one incident, the new system lost all their reservations for a six-week period during an upgrade, and they had to go back through reports manually to rebuild them, a process that took three IT people, working 65-hour weeks, more than three weeks to complete. But the worst incident occurred during Thanksgiving, an extremely busy time for the hotel chain, since families often book their Thanksgiving reservations soon after Labour Day, wanting to plan the next big get-together. The new system went down at the worst possible time, again losing reservation information, resulting in an unmitigated business disaster. Their results for the Thanksgiving period that year were only 8% of the previous year’s: the system reported to potential customers that the hotels were full when they were not, so all those reservations were lost. This impacted the corporate bottom line, and as a result 15 people were laid off at their head office, their largest single layoff in company history.

To justify the project, he felt compelled to promise the corporate management team that he could reduce the reservation cost from $1.50 to $0.65, an annual savings of nearly $950K. This was something the vendor had promised him, but the new system failed to deliver. Even so, the story had a reasonably happy ending; he didn’t lose his job, and he eventually solved most of his problems.
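
The arithmetic behind that promise roughly checks out; here is a minimal check, assuming the ~1.1M reservations-per-year volume mentioned earlier (the volume he actually used may have been a little higher, given their growth plans):

```python
# Rough check of the promised savings figure (illustrative; volume assumed from earlier).
reservations_per_year = 1_100_000
cost_mainframe_upgraded = 1.50     # $/reservation projected after a mainframe upgrade
cost_distributed_promised = 0.65   # $/reservation promised by the vendor

annual_savings = reservations_per_year * (cost_mainframe_upgraded - cost_distributed_promised)
print(f"Promised annual savings: ${annual_savings:,.0f}")   # ~$935,000, i.e. "nearly $950K"
```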

Ultimately, this story is about bad planning, perhaps on the part of the hotel tech team, but certainly on the part of the vendor. While it may be true that a hotel chain of this size is better served by distributed systems (it certainly was true at the time), had the zEnterprise BC12 and today’s more favorable pricing been available then, they would still be running on the mainframe today, and would have suffered none of the pain they endured over the migration process.

At the time, I asked my colleague if I could write about this, and he said that, given the issues he was having with the vendor, the serious problems experienced with the migration, and his personal involvement, he would have to say no. I have respected that; now, years later, I write about it without naming him, his employer, or the vendor.

This is one of the reasons that you rarely read detailed accounts about failed IT projects—particularly mainframe migration projects. And as mainframe migration disaster stories go, this one isn’t really that bad. I’m sure that some of you have heard of the $100M motor vehicle licensing agency disaster, the $50M oil company disaster, the $100M retail chain disaster, and so on, but nobody really wants to put their name to them. And who can really blame them?

Regular Planet Mainframe Blog Contributor
Allan Zander is the CEO of DataKinetics – the global leader in Data Performance and Optimization. As a “Friend of the Mainframe”, Allan’s experience addressing both the technical and business needs of Global Fortune 500 customers has provided him with great insight into the industry’s opportunities and challenges – making him a sought-after writer and speaker on the topic of databases and mainframes.

2 thoughts on “A Mainframe Migration Disaster Story”
  1. I recall reading stories in the WSJ in the late 80s about a data conversion effort at (the original) Bank of America that almost brought down the bank. I have not been able to find proof that such an effort existed... I even talked to current BOA employees, and they are unaware of it, so I'm not sure if I imagined it or if it's just not that well known. The point is, software upgrade or conversion efforts can kill even the strongest companies, so they must be done with that in mind. Also, the current mainframe is a workhorse that has proven it delivers, and yet more and more companies are leaving it for the promise of web/cloud/distributed systems that have yet to prove anything, especially when the actual proof of the effort is in the bottom-line costs of operation and maintenance. In this story, the hero was finally able to bring costs down to his promised $0.65 per transaction. Many times, that is not the case.

    1. Thanks, Larry, for a great point. I wonder if the application at the bank was, once again, a highly complex application designed to help the bank make revenue rather than reduce cost. This subtle issue is easy to overlook, but it is often the difference that turns these projects into disasters. At the end of the day, if we don't get the cost we want, life will continue. But if revenue suddenly stops, then the company is instantly on life support. I liken it to seeing your doctor and being given a band-aid that is too small: likely not a big deal. Maybe your cut doesn't heal the right way, but you move on. If your doctor makes a mistake serious enough that it almost kills you, well, that's a different story.
