Legacy Modernization or Chasing Rainbows

Mainframe applications, or legacy applications, often limit an organization’s ability to quickly respond to changing data and operational conditions. This is sharply evident in the banking and financial services industry, where legacy systems have thus far slowed down the pace of end-to-end digitalization. Besides limiting financial institutions’ ability to grab new business opportunities, the absence of complete digitalization has also compromised customer experience — a key differentiator in the digital era.

Faster, better, and cost-effective innovation is the name of the game to stay ahead in the highly competitive financial services arena. For financial institutions, digitization is a crucial prerequisite to driving business growth, building customer loyalty, creating new revenue sources, and expanding market share. However, traditional banks and finance companies are constrained by legacy systems: they cannot respond quickly to changing trends and market shifts, which compromises their ability to compete with fintechs and other new entrants.

In the last few years, it’s become quite a trend to migrate legacy applications to microservices in all kinds of enterprises. Microservices are the architecture du jour. Many corporations struggle to define the right approach for exposing their legacy applications while keeping the monolith stable and secure, with business running as usual in the interim.

What is “Legacy”?

Everyone has their own view on what classifies as legacy, so let’s define it here. Legacy can range anywhere from a mainframe to a monolith application. What sets legacy applications apart is having one (or several) of the following attributes:

  • Lack of modularity: Legacy applications are not modular or layered. There is no clear-cut separation of concerns; the result is often called a Big Ball of Mud (BBOM).
  • Lost expertise: The developers and business stakeholders who built the legacy application have long since moved on to other pastures. As a result, nobody clearly understands what’s going on in the application.
  • Unsustainable bug fixing: Most bug-fixes are handled via reverse-engineering. There are enough landmines here and there that explode a few times a year.
  • Opaque data models: The data model has grown far beyond its original intent, to the point that nobody can say which columns (or tables) serve what purpose (if any at all).
  • No usage tracking: There is no knowledge of which applications access the database and grab the data for their business processing.
  • Redundant processes: Occasional batches run in the background, performing updates, yet nobody knows for what purpose.

Add to this mix table, column, and variable names in the codebase written in a foreign language, and you have a ready recipe for disaster.

Given this context, enterprises face many common struggles when they start the modernization of the monolith to enable digital transformation. As Bill Doerrfeld observed on Nordic APIs, “Breaking the monolith by exposing legacy applications as micro-services is trendy. Yet, as enterprises are quickly realizing, API-fueled micro-services bring unintended caveats”.

Migration Approach

There are various approaches to digitization or mainframe migration; let’s talk about some of them.

Re-Interface

This approach advises keeping business logic on the mainframe in its present form but unlocking it through REST APIs and web services. While this may get you started fast in terms of business outcomes and initial deliverables, it does not get you very far, because it makes only a half-hearted effort at providing consumer-friendly APIs. The way mainframe interfaces are usually exposed is fundamentally different from what a consumer-friendly API looks like. It’s not a stretch to say that this approach is nothing more than the proverbial “putting lipstick on a pig.”
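To make the re-interface idea concrete, here is a minimal sketch, assuming a Spring Web application sitting in front of a hypothetical mainframe connector; the gateway, record, and DTO names are illustrative, not a real product API. Notice that the endpoint merely mirrors the shape of the existing transaction, which is exactly why such APIs rarely feel consumer-friendly.

```java
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

/** Hypothetical fixed-format record returned by the mainframe transaction. */
record LegacyAccountRecord(String accountNumber, String balance, String status) {}

/** Hypothetical adapter over whatever mainframe connector (CICS gateway, MQ, etc.) is in use. */
interface LegacyAccountGateway {
    LegacyAccountRecord inquireAccount(String accountId);
}

/** JSON-friendly shape exposed to API consumers. */
record AccountDto(String accountNumber, String balance, String status) {}

@RestController
@RequestMapping("/accounts")
public class AccountReInterfaceController {

    private final LegacyAccountGateway legacyGateway;

    public AccountReInterfaceController(LegacyAccountGateway legacyGateway) {
        this.legacyGateway = legacyGateway;
    }

    @GetMapping("/{accountId}")
    public AccountDto getAccount(@PathVariable String accountId) {
        // The business logic still runs on the mainframe; this layer only translates
        // the existing transaction's output into a JSON-friendly DTO.
        LegacyAccountRecord record = legacyGateway.inquireAccount(accountId);
        return new AccountDto(record.accountNumber(), record.balance(), record.status());
    }
}
```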

Rewrite

A rewrite would involve migrating the data off the mainframe and completely rewriting the business processes in modern frameworks. You would change the original application’s source language, for example, from COBOL or Natural to Java or C#. However, modernizing existing core systems through a big bang approach may not be feasible given its potential to disrupt business-as-usual. A less-risky option is to adopt an incremental approach to improving infrastructure and adding capabilities. While, in theory, that sounds like a noble idea, it comes with its own set of problems.

Monolith or Microservice

In any legacy modernization exercise, the first and foremost question is: do we need a distributed design like microservices, or will a monolith suffice?

Monolithic applications are usually a much cheaper proposition, as you do not need to deal with a whole raft of distributed-systems problems. The biggest disadvantage of a distributed architecture is the complexity of designing and developing it. It takes a lot more time and effort to build and deploy a series of microservices than a single monolithic application. From a reliability point of view, there is a higher chance of failure in the communication between the different services when we use microservices. Additional pain points are the cost of running microservices, as well as orchestrating and monitoring them.

To be clear, I am not saying that you should never use microservices, but I am worried that microservices are becoming the new default. “You aren’t using microservices? Well then you’re clearly not serious about software engineering.” People aren’t making decisions anymore — they just assume microservices are the way to go. This doesn’t always lead in the right direction.

In my opinion, a well-designed modular-monolith architecture is just enough to start with; a Ports & Adapters architecture built as a multi-module Maven application is a good starting point. You don’t need to introduce a network boundary as an excuse to write better code. Neither microservices, nor any other approach for modeling a technical stack, is required to write cleaner or more maintainable code. A strong monolith structure will allow you to replace any application segment with a microservice when needed. You probably need to focus first on a better monolith. As Simon Brown once said, “if you can’t build a well-structured monolith, what makes you think microservices is the answer?”
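As a rough illustration of the Ports & Adapters idea in a multi-module Maven build, the following sketch (module, package, and class names are all hypothetical) keeps the port in the domain module and the adapter in an infrastructure module, so the dependency direction is enforced at compile time rather than by a network boundary.

```java
// claims-domain module -- owns the business model and the outbound port.
// File: claims-domain/src/main/java/com/example/claims/Claim.java
package com.example.claims;

public record Claim(String claimNumber, long amountInCents) {}

// File: claims-domain/src/main/java/com/example/claims/ClaimRepository.java
package com.example.claims;

import java.util.Optional;

public interface ClaimRepository {                       // outbound port
    Optional<Claim> findByNumber(String claimNumber);
    void save(Claim claim);
}

// claims-persistence module -- an adapter that implements the port against the
// current data store; it can later be swapped (or extracted into a service)
// without touching the domain code.
// File: claims-persistence/src/main/java/com/example/claims/jdbc/JdbcClaimRepository.java
package com.example.claims.jdbc;

import java.util.Optional;
import javax.sql.DataSource;
import com.example.claims.Claim;
import com.example.claims.ClaimRepository;

public class JdbcClaimRepository implements ClaimRepository {

    private final DataSource dataSource;

    public JdbcClaimRepository(DataSource dataSource) {
        this.dataSource = dataSource;
    }

    @Override
    public Optional<Claim> findByNumber(String claimNumber) {
        // Plain JDBC (or JPA/jOOQ) lookup against the existing schema goes here.
        return Optional.empty();
    }

    @Override
    public void save(Claim claim) {
        // Insert/update against the existing schema goes here.
    }
}
```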

Once you decide you should move to a microservice approach, you’ve already done a good deal of the design work upfront. You likely already understand your domain well enough to be able to extract it. A solid SOA approach begins in the code itself and moves out into the stack’s physical topology as time moves on.

You Are Not Amazon or Twitter or Facebook or Netflix

As Christian Posta pointed out in his blog-post, the journey to microservices is just that: a journey. It will be different for each company. There are no hard and fast rules, only trade-offs. Copying what works for one company just because it appears to work at this one instant is an attempt to skip the entire journey, and it will likely fail.

The point here is that your enterprise is probably not Netflix. In fact, I’d argue that for however complex the domain is at Netflix, it’s not as complicated as a legacy enterprise. Searching for and showing movies, posting tweets, updating a LinkedIn profile, etc., are likely all a lot simpler than Insurance Claims Processing systems, for example. These internet companies migrated to microservices because of speed to market and sheer volume and scale. Enterprises today have to confront complexity in both the domain and the scale. This is a journey that balances domain, scale, and organizational changes, and will be different for each organization.

Conventional business organizations usually have a similar focus: creating reliable and stable systems that deliver value for the business — all with tight engineering resources. The problems you will deal with within these organizations are not at the scale of Google, Facebook, or Uber. Quoting Justin Etheredge from his article, “working within a huge engineering environment like Google or Netflix means problem-solving at a large scale with a nearly unfathomable amount of engineering resources. They generally have access to a huge suite of internal tools and libraries to lean on that allow them to write software at scale.”

Many innovative tools like Chaos Monkey, Kafka, and Envoy were born out of companies like Netflix, LinkedIn, and Uber. Yet, have you ever seen these kinds of tools emerge from a financial institution? We need to recognize more clearly that ordinary IT departments do not have the same skill set as Netflix’s engineering team.

You Can’t Migrate What You Don’t Know

I know it sounds very intuitive, but the number of times people fail to grasp this simple concept makes me believe that this deserves attention. The first significant road-block people usually hit when they embark on this journey towards digitization is the lack of knowledge about the application that they are migrating.

Many IT organizations “don’t know what they don’t know” when it comes to their legacy mainframe applications. In the financial services industry, the mainframe carries a legacy of applications going back over 50 years. The original programmers, and the knowledge they carried, are often long gone. Many organizations need an x-ray into their existing application code to identify the hidden complexities that will bite them as they build their shiny new infrastructure. Information about the core business processes, and the prerequisites or invariants for those processes, is largely missing. Reverse-engineering the entire application to understand it is time- and resource-intensive work, practically impossible for any non-trivial application.

The need for this information cannot be over-emphasized, since you cannot migrate something that you don’t know. And here arises the need for a domain expert: someone who understands the application from a business or functional point of view. These folks are typically in short supply. Indeed, lots of companies try this many times because, after all, the definition of sanity is to repeat a failing strategy until it succeeds.

DDD Expert & Over-Ambitious Domain-Model

The next challenge that teams often face is defining the right approach to decompose their legacy monolith into microservices. Here we usually define a scope for each microservice using top-down, analysis-driven approaches like Event Storming or Domain-Driven Design. This is where DDD experts are needed to facilitate the discussion between technical and domain experts. At this crucial juncture of the project, decisions are made around events, bounded contexts, and domain objects — all of which could profoundly affect your microservice architecture. This exercise will determine your migration effort’s success or failure; get this decomposition wrong, and nothing can save the project. If you are doing DDD, you really do need domain experts.

Furthermore, it’s rather easy to get carried away in the DDD exercise and develop a charming, clean, picture-perfect domain model of how your application must behave after the migration. In the enthusiasm for the new architecture, it’s easy to forget to take an incremental approach. Until the migration is complete, both the old and the new systems must run simultaneously and be kept in sync to keep the business running as usual. This is where a realization hits like a rock — ambitious domain models may work great for greenfield projects, but may not sit well when part of your application is still running on the older system.

When designing a brand new domain model, people often overlook the fact that the new model must still work with existing data. You can introduce new attributes, but those attributes cannot be mandatory, because you don’t have that information for the records already in the older system. For example, if your legacy application does not store your customers’ gender, you cannot make gender a mandatory attribute in the new model.
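A minimal sketch of that constraint, assuming a hypothetical Customer entity in the new model: the newly introduced attribute has to be optional, because rows migrated from the legacy system simply do not carry it.

```java
import java.util.Optional;

public class Customer {

    public enum Gender { FEMALE, MALE, OTHER }

    private final String customerId;   // mandatory: exists in both the old and the new system
    private final Gender gender;       // new attribute: null for records migrated from legacy

    public Customer(String customerId, Gender gender) {
        this.customerId = customerId;
        this.gender = gender;
    }

    public Optional<Gender> gender() {
        // Callers must handle the "unknown" case explicitly instead of assuming
        // every record carries the new attribute.
        return Optional.ofNullable(gender);
    }
}
```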

A good test is to check that the existing database tables can cater to your new domain model. You will need this to work if you want to migrate the data anyway, and the insights gained will be useful. Additionally, this exercise will prevent you from making incorrect associations. For example, if in the older system you have associations like Family has Family-Members and Family has Vehicles, then in the new system you cannot have associations like Family has Family-Members and Family-Members have Vehicles. After all, the legacy data links Vehicles only to a Family, so there is nothing to tell you which Family-Member owns which existing Vehicle, as the sketch below illustrates.
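Here is that association mismatch in a tiny, purely hypothetical sketch of the two models:

```java
import java.util.List;

// Legacy model -- what the existing tables can actually support.
class Family {
    List<FamilyMember> members;
    List<Vehicle> vehicles;      // vehicles are attached to the family as a whole
}

class FamilyMember {
    String name;
}

class Vehicle {
    String registrationNumber;
}

// Over-ambitious new model -- which member owns each existing vehicle? The
// legacy data does not say, so this association cannot be populated on migration.
class NewFamilyMember {
    String name;
    List<Vehicle> vehicles;
}
```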

Data-Sync between Old and New System

As Tim Berners-Lee said, “data is a precious thing and will last longer than the systems themselves”. In your migration efforts, you will reach a point where you will have to keep the data between both your old and new systems in sync, and this is where a new host of problems creep up.

CDC/Change-Data-Capture

CDC (Change-Data-Capture) is often suggested as a solution for keeping the old and the new data-stores in sync. For the benefit of the uninitiated, CDC is the streaming of database changes captured by tailing the transaction log. But what people often fail to appreciate is that CDC works best when you have the same data model (think data replication across multiple instances) or just a few tables and columns (think another microservice caching your data for easy access).

Trying to use CDC to sync data across two data models (in different data-stores) that differ wildly in table names, column names, foreign keys, and relationships is a non-trivial task. This is made worse if you have different identities across the old and the new data-stores, for instance, a UUID on the new system and some alphanumeric id on the older one. Now you have the non-trivial job of maintaining mapping tables to identify which id on the new system corresponds to which id on the old system. If the data models are significantly different, there may not even be a one-to-one relationship between these ids.
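As a rough sketch of what that mapping layer looks like (the event, repository, and handler names here are hypothetical), every change event coming off the legacy transaction log has to be routed through an id-mapping table before it can be applied to the new model:

```java
import java.util.Optional;
import java.util.UUID;

/** Hypothetical change event emitted by the CDC pipeline for a legacy CUSTOMER table. */
record LegacyCustomerEvent(String legacyId, String name) {}

/** Hypothetical mapping table: which new UUID corresponds to which legacy alphanumeric id. */
interface IdMappingRepository {
    Optional<UUID> findNewId(String legacyId);
    UUID register(String legacyId);       // allocate and persist a new mapping
}

/** Hypothetical repository over the new data model. */
interface NewCustomerRepository {
    void upsert(UUID id, String name);
}

public class LegacyCustomerChangeHandler {

    private final IdMappingRepository idMappings;
    private final NewCustomerRepository customers;

    public LegacyCustomerChangeHandler(IdMappingRepository idMappings,
                                       NewCustomerRepository customers) {
        this.idMappings = idMappings;
        this.customers = customers;
    }

    /** Applies a single legacy change event to the new data store. */
    public void onChange(LegacyCustomerEvent event) {
        // Translate the legacy id into the new system's UUID; when the two models do
        // not map one-to-one, this lookup is where the real complexity lives.
        UUID newId = idMappings.findNewId(event.legacyId())
                               .orElseGet(() -> idMappings.register(event.legacyId()));

        // Map the legacy column values onto the new model and upsert.
        customers.upsert(newId, event.name());
    }
}
```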

Add to these concerns a lack of knowledge regarding the purpose these systems serve, and you have a recipe for disaster. Technologies such as change data capture, ETL, messaging, and data mirroring often fall short of the real-time integration with core mainframe applications that today’s fast-paced businesses and customer expectations demand.

Dual-Writes

Another solution often suggested in such scenarios is dual-writes: from the migrated application, we write to both the legacy and the new data model. Simply issuing these two requests may lead to inconsistencies, though, because we cannot have one shared transaction that spans both applications. In unfortunate circumstances, we may end up persisting the new record in the new database but failing to send the corresponding record to the legacy system (e.g., due to a networking issue). Or, the other way around — we might have sent the record to legacy but failed to persist the data in the local database. Compensating transactions (Sagas) will not save you either: what if legacy never returned an id? What if someone has already seen that data? All these situations are undesirable. Of course, I can use the Outbox pattern here (sketched below), but again that’s another moving part that adds significant engineering complexity and overhead.
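For completeness, here is a minimal sketch of that Outbox pattern, assuming Spring’s TransactionTemplate and hypothetical entities and repositories. The point is that both writes go to the same local database in one transaction, and a separate relay later pushes the outbox rows to the legacy system.

```java
import org.springframework.transaction.support.TransactionTemplate;

/** Hypothetical new-model entity and stores; placeholders for illustration only. */
record Policy(String id, String holderName) {}
record OutboxEvent(String type, String payload) {
    static OutboxEvent policyCreated(Policy p) {
        return new OutboxEvent("PolicyCreated", p.id());
    }
}
interface PolicyRepository { void save(Policy policy); }
interface OutboxRepository { void save(OutboxEvent event); }

public class PolicyService {

    private final TransactionTemplate tx;     // programmatic local transaction boundary
    private final PolicyRepository policies;
    private final OutboxRepository outbox;

    public PolicyService(TransactionTemplate tx,
                         PolicyRepository policies,
                         OutboxRepository outbox) {
        this.tx = tx;
        this.policies = policies;
        this.outbox = outbox;
    }

    public void createPolicy(Policy policy) {
        // Both writes target the SAME database, so one local transaction covers them;
        // there is no in-request call to the legacy system that can fail halfway through.
        tx.executeWithoutResult(status -> {
            policies.save(policy);
            outbox.save(OutboxEvent.policyCreated(policy));
        });
        // A separate relay (a poller, or CDC on the outbox table itself) then forwards
        // the outbox entries to the legacy system at least once.
    }
}
```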

In a similar vein, suppose after a while you realize that both of your data-stores have become inconsistent with each other. At that point, who decides which one is correct? Was there a bug in your persistence code for the legacy or in your persistence code for the new data-store?

To add further, for audit requirements, what was the timestamp when the entity was created? Is it the timestamp of its insertion in the legacy data-store or the time of its insertion in the new data-store?

The moral of the story is that data, data integration, data boundaries, enterprise usage patterns, distributed-systems theory, and timing are all the hard parts of microservices (since microservices are really just distributed systems!). With these in mind, my suggestion is that the data model should be the last thing to evolve. Once we have our functionality, modularity, and integration sorted out, we will most likely have more wisdom to help decide how to approach the legacy data model. A good modular codebase, especially the Repository and DAO classes, can give you a smooth transition if you do eventually decide to migrate to a new data model.

Conclusion

There is no denying that legacy systems require modernization; otherwise, they remain vulnerable to failures at any time. As corroborated by The Washington Post, that’s what happened on Tax Day 2018. Facing technical problems, the Internal Revenue Service couldn’t process electronically filed tax returns. Although the IRS did not specify what went wrong, the fact that many of their IT systems were outdated at the time — two of them being nearly six decades old — might well have contributed to the glitch.

Legacy software is usually difficult (or impossible) to maintain, support, improve, or integrate with new systems because of its architecture, underlying technology, or design. As reported by The Business of Federal Technology (FCW), in 2019 the US Federal government spent 80 percent of its IT budget on Operations and Maintenance. This spending went mainly to aging legacy systems, which posed efficiency, cybersecurity, and mission-risk issues. To put that into context, only 20 percent of the IT budget was assigned to Development, Modernization, and Enhancement. So, we can conclude that legacy technology is a significant barrier to digital transformation, and legacy modernization is the natural response. Still, we need to think long and hard about how to approach the problem and arrive at a reasonably good solution.