Probably not, because that’s not the way we work, or at least not the way we worked for decades, until companies started adopting agile development methodology across the entire IT organization. How many are agile? A fair number. How many are agile all the way through? Considerably fewer.
A personal disconnect story
I’ll give you an example of how this works. A few years ago, I wanted to ask someone at a big Wall Street investment bank what my client’s product was helping them to do better. My client and I were pretty sure that it had something to do with high-frequency trading or program trading, but we weren’t absolutely certain.
I called a CICS programmer who had been involved in their programming projects, and when I asked him what the product was doing, his answer was both disappointing and surprising. He told me that my client’s product was being used for something inconsequential – loading data into tables, or some such thing – and that it was quite unimportant. Not central to the company’s efforts at all.
This explanation didn’t quite ring true to me. Why would an IT organization use a small vendor’s product to handle one unimportant thing? Surely they could swap out the small vendor’s product for a larger vendor’s product, to make the vendor list smaller? You’d think.
Initially, I thought that he was lying to me; or rather, being a responsible employee, not telling an outsider his employer’s best-kept secrets. But in actuality, the programmer was probably being truthful. From his perspective, without knowledge of the actual use case, the product was playing a very minor role. It just so happened that this ‘minor role’ was of huge importance to the business. But he did not know what it actually did for the business; or rather, he did not need to know.
And that can end up being a major problem.
As the business leaders were gradually replaced over time (retired, fired, promoted, moved on, etc.), nobody really knew the details anymore. New IT managers saw a single product from a small vendor doing one seemingly inconsequential thing. So they tried to pry it out – one less vendor is a good idea, right?
Eventually they learned that they couldn’t take it out. They tried to remove it – replacing what it did using similar (but less effective) techniques with a different product supplied by one of their larger vendors. But no matter what they did, things ran slower – a lot slower – so they had to put the small vendor’s product back in place. As it turns out, they’d been trying to get rid of it for years.
More recently, they have replaced much of the software and infrastructure with newer technology. The problem is that the performance of this one little piece cannot be duplicated using the newer technology, at least not without a lot of work – work that will not deliver any new value. So for now, they’re still running the mainframe to run a small number of key applications, along with their new server networks. They will be able to get rid of it, of course, but it will cost them a lot more than they thought it would.
Disconnect in Flash Boys
Have you read Flash Boys by Michael Lewis? In it he talks about the importance of a millisecond, a microsecond, and even a nanosecond, and the importance of ‘fast’ to a business. He also talks about the disconnect between business and IT. But he assumes that “legacy” technologies are “old and slow” and that newer technologies are better and faster. I think he missed a bit on that last point, which is fair, since most people do, including a large number of IT professionals.
Lewis describes how everything revolved around the speed of the SIP (Securities Information Processor), and how high-frequency traders had newer and faster SIPs than the exchanges (this is still true, btw). He also describes how organizations obsessed with speed (for good reason) would locate their SIPs as close as possible to the NYSE and the other exchanges, and would measure and reduce cable runs everywhere they could, including within the server room. All this to shave microseconds off each trade, the idea being that high-frequency traders need to see the market before anyone else does to be successful. This is where they are defeating the disconnect: when people understand a problem completely, from business need to IT implementation, they will probably succeed.
But the great disconnect is also hurting them, when the players assume that their “new” technology platform is inherently more suitable or better than their “legacy” systems.
The real mainframe
The truth is that mainframe architecture (the ‘old and slow’ legacy technology) is better suited to lightning-fast trade processing than any other platform; in fact, it was designed for it. On the other hand, the ‘newer, faster’ technology is distributed computing, which is, well, distributed.
For one thing, the internal structure of a commodity server is such that I/O, processing, bus management, etc., are all handled by one processor (or stack of processors). Meanwhile, the internal structure of the mainframe is different: separate processors for I/O, separate processors for bus management, separate processors for memory management, and more importantly, separate processors for program processing. And that is the mainframe’s secret sauce – dedicated processors for running fast processing.
Further, mainframe systems are fully integrated, while distributed systems use commodity components. For example, Intel makes processors, Microsoft and Red Hat make operating systems, Oracle makes databases, etc. On the mainframe side, one vendor makes all of the components: CPUs, memory, I/O subsystems, firmware, databases and transaction subsystems. They’re designed specifically to work together.
More than that, mainframe parallelism is more robust – imagine 170 cores sharing 32 TB of memory in a single footprint, in the same box. Not fiber running through cable trays and along the floor. Finally, mainframe processors generally run at higher clock speeds.
But newer is better, yes?
One might counter with, “Come on – newer technologies are always better, aren’t they?” That is a fair question, but it illustrates a fundamental misunderstanding of mainframe technology. You see, mainframe technology is not actually old. The newest mainframe, IBM’s z14, came out last year, and the z14 model ZR1 is more recent still.
Further, parallel processing on distributed servers solves many performance problems, but not all of them. Parallel processing has to be managed, and that management is overhead. That overhead comes at a cost, and the cost can be great when microsecond real-time events are mission critical, as in high-frequency trading or program trading.
And then there’s reliability. Mainframe assets are still being used by most of the largest banks on the planet, and that is because they are performance rock stars, they’re highly secure, they’re actually more cost-effective, and they’re reliable. And for that very reason, they have tremendous staying power.
It is true that many industries, and many small and medium-sized businesses, have moved off the mainframe. For the most part, these businesses are okay with the risks of 99% reliability instead of 99.999% reliability. And most people are.
For example, I have never experienced a failed transaction on eBay or Amazon; and if I did, I’d probably know it, and reinitiate the transaction – that is, after I’m sure that the failed transaction didn’t actually go through. On the other hand, I’d really prefer it if I NEVER experience a failed transaction at my bank – like when I deposit my pay, or pay my mortgage. I really, really need that to go off without a hitch. It is far more important to me than my once-a-week or once-a-month online purchase.
Beyond that, the bank really needs that 99.999% reliability as well. Just imagine a handful of people every week discovering that their paychecks didn’t make it into their accounts, or that their mortgages didn’t get paid. What kind of reputation would that garner? Just imagine – “My bank lost my pay again…”, or “My mortgage didn’t get paid, and the sheriff is at my front door.”
The banks need that extra 0.999% reliability, and they’re not giving it up. And that extra 0.999% is very expensive, no matter what technology you’re using.
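To put rough numbers on that gap, here is a minimal sketch of the arithmetic. It assumes that "reliability" means the share of transactions that complete successfully, and the volume of one million transactions is a made-up figure for illustration, not a number from any real bank. The function and constant names are hypothetical too.

```python
from fractions import Fraction


def expected_failures(transactions: int, reliability_percent: str) -> Fraction:
    """Expected number of failed transactions at a given success rate.

    The percentage is passed as a string so that Fraction can parse it
    exactly, avoiding floating-point rounding on values like 99.999.
    """
    failure_rate = 1 - Fraction(reliability_percent) / 100
    return transactions * failure_rate


# Hypothetical volume, chosen only to make the comparison easy to read.
ASSUMED_TRANSACTIONS = 1_000_000

for pct in ("99", "99.999"):
    failures = expected_failures(ASSUMED_TRANSACTIONS, pct)
    print(f"{pct}% reliable: about {float(failures):,.0f} failures "
          f"per {ASSUMED_TRANSACTIONS:,} transactions")
```

Under those assumptions, the difference is about 10,000 failed transactions versus about 10. For an online shop, that might be tolerable. For a bank handling paychecks and mortgage payments, it probably is not.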
The best of both worlds?
So, if the mainframe is so great, then why is it not being used by the newest and latest concerns (Amazon, eBay, etc.)? One word: bias. Whether intentional or through ignorance, there is a great deal of bias against the mainframe. It’s too expensive! (No it’s not.) It’s old and dusty! (No it’s not.) It’s hopelessly outdated! (No it’s not.) I don’t know very much about it! (Now we’re getting somewhere…)
But there are some reasonable concerns about the mainframe. One is that it is not suited to smaller operations. In fact, it is too expensive for most small companies running smaller workloads. And it might actually be old and dusty and outdated if you don’t upgrade for years.
Another legitimate concern is outdated code. In some cases, the original COBOL programmers are no longer around, and the original IT managers in charge of developing those assets are long gone. And for the most part, the business managers initiating these projects are also gone. Running code that nobody understands is a very serious disconnect; worse, it is a giant risk to the business.
In other cases (like in the biggest banks), that code is fine; it runs all day, all night, efficiently and cost-effectively, and it is supported by IT resources. That’s why they keep it. In still other cases, like the case of the high-frequency traders, it’s better to dump the code and start over.
Now, if you’re so desperate for speed that you string fiber cables through the center of your datacenter to save a microsecond, why wouldn’t you use a computing platform designed specifically for transaction processing? Like, for instance, a platform that has dedicated processors for workload processing. Why not run your new, tight code on that platform? Well, if you want to save a microsecond here and another there, then you really should also be using the mainframe to process your transactions.
How do you reconnect?
So how does an IT organization avoid the great disconnect? The answer is surprisingly straightforward: consider very carefully a couple of things we’ve been talking about for some time now, DevOps and Bi-Modal-IT. One is the answer, and one is the bane of IT connectedness.
While DevOps (or OpsDev if you prefer) has become a standard in the most productive and forward-thinking IT organizations, it could also make all the difference for organizations suffering from the disconnect malady. Think about that for a moment – if your mainframe development and support teams are brought into your DevOps practices (albeit, probably kicking and screaming), would it be likely that each side of the IT shop (mainframe and distributed) would learn much about the other? Would they gain greater understanding of the challenges involved in their interworking? Rhetorical questions – clearly, involvement in DevOps culture is better than working in the dark.
Involvement in DevOps begets visibility, and visibility begets understanding. In a DevOps environment, would it be possible to arrive in a situation where business-critical code is running without suitable IT resources to support it? These kinds of scenarios only happen in a disconnected IT environment, where not all of the interested parties are at the table. If all interested parties have a say, it’s very unlikely that such a situation could occur.
On the other hand, any organization that is entrenched in Bi-Modal-IT is its own worst enemy. By promoting a divide between the legacy and new-development sides of an IT organization, it is in effect promoting the big disconnect. In many cases, the legacy side of the shop is not involved in a DevOps culture at all, by design, despite the fact that its code base is business-critical. Sound like a good idea? Not to me, either.
Want to avoid a potentially destructive disconnect? The answer is simple: dump any semblance of Bi-Modal-IT, and embrace DevOps – and include your entire IT organization. Not just the ‘cool kids’…