Everyone wants software that does what it’s supposed to do, runs fast without downtime, and can be changed quickly without causing problems. Who doesn’t want this? For more details on software goals, see this.
Everyone claims that their methods are great at achieving those goals. Sadly, such assertions are mostly baseless and in fact the touted methods do a terrible job. But they’re standard practice!
I’ve written extensively about the way to meet these goals in dozens of blog posts and a couple of books. Here is a specific, step-by-step way, in simple terms, to move from a standard architecture to one that meets the goals. It’s a modified version of a path taken by a small, rapidly growing software company that I worked with. Incremental steps of this kind, with value delivered at each step, are usually superior to the massive rewrite approach that some software people are tempted to take.
When the new tech leader took over, there was no QA. So he first put in simple, click-script-based UI QA and a full build process. Having done that, he wanted to move from the existing code organization to components to make change cleaner. But he had a reliability problem – reliability was good, but because of the nature of his customer base, it needed to be near-perfect. He started by thinking about adding a second data center, putting in database replication, and then somehow doing a switch-over when the primary database went down.
Replication
He first focused on replication. All the transactions would arrive at a primary data center, with results stored on disk. The disk contents would be replicated to a backup data center, so that when it needed to take over, everything would be in place. It’s easy to set up a storage system to do this.
The trouble is that replication & availability functions have been moving up the stack, for good reason. While it may be trickier to set up database replication, the results are usually much better.
DBMS replication
Even when replication is handled at the DBMS level, the DBMS transaction is guaranteed, but the user’s task may be only half done – a single user action often spans several transactions plus application state. The database will think everything is cool, but unless the application is 100% lined up with it, the results could be bogus. Yuck.
Application-level replication
Serious applications tend to have application-level replication. This basically means recording the field-level changes a user makes during an application session, from start through confirmation. This can and should be done in a generic way, so that when the application adds or changes a field, the recording, transfer and replay at the other site doesn’t need to change. This has the same net effect as database replication and avoids the issues.
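To make that concrete, here’s a minimal sketch (in Python) of what generic, field-level recording and replay could look like. The names – SessionLog, record_change, replay – are purely illustrative, not from any particular system:

```python
# Minimal, illustrative sketch of generic field-level session recording and
# replay. SessionLog, record_change and replay are hypothetical names.
import json
import time
from dataclasses import dataclass, field
from typing import Any

@dataclass
class SessionLog:
    session_id: str
    user_id: str
    entries: list = field(default_factory=list)

    def record_change(self, entity: str, key: str, field_name: str, value: Any) -> None:
        # Generic: any entity or field can be recorded; adding a field to the
        # application needs no change here.
        self.entries.append({
            "ts": time.time(),
            "entity": entity,     # e.g. "order"
            "key": key,           # e.g. the order id
            "field": field_name,  # e.g. "quantity"
            "value": value,
        })

    def to_json(self) -> str:
        return json.dumps({
            "session": self.session_id,
            "user": self.user_id,
            "entries": self.entries,
        })

def replay(log_json: str, store: dict) -> None:
    # Apply the recorded changes, in order, at the other site.
    log = json.loads(log_json)
    for e in log["entries"]:
        store.setdefault((e["entity"], e["key"]), {})[e["field"]] = e["value"]
```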
Application log
Building an app-level log for replication is VERY close to building a … user transaction log! Which you’d like to have anyway! So make sure the extra information is there. This has many uses, including being able to show the user what they did and when they did it.
Instead of needing a big, intense, fast connection between the sites like you need for storage or database replication, you just need to ship and apply the session transaction log, which you can do at the end of each session.
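A rough sketch of that shipping step, assuming the session log is serialized as JSON; the backup-site URL is a placeholder:

```python
# Hypothetical sketch of shipping a completed session log to the backup site.
# The URL is a placeholder; the log is assumed to be JSON as sketched above.
import urllib.request

BACKUP_SITE = "https://backup.example.com/replay"  # placeholder

def ship_session_log(log_json: str) -> None:
    # One small POST per finished session, instead of a continuous
    # block- or transaction-level replication stream between sites.
    req = urllib.request.Request(
        BACKUP_SITE,
        data=log_json.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        resp.read()  # the backup site applies the log with its replay function
```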
Replaying application transaction logs
Application transaction logs solve the reliability problem and give the user full access to their history with the application. They are also the crucial foundation of an incredibly important architectural advance: champion/challenger QA. This is how you can dump those QA scripts and enable rapid, problem-free testing and deployment of new features.
In addition to having two identical copies of the application running in the two data centers, you bring up the proposed new version. Once you’ve got that, you can replay logs against a new copy of the application and make sure you get the same results as last time. Guess what – that’s real-life QA! The only extra thing you need is the comparison, which again can be built once, regardless of whether there are 10 functions or 10,000. See this for much more.
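Here’s a minimal sketch of the replay-and-compare step, assuming both copies of the application expose the same replay entry point; the function names are illustrative:

```python
# Illustrative sketch of champion/challenger replay. Assumes each copy of the
# application exposes a replay(entries) entry point; names are hypothetical.
import json

def replay_against(app, session_logs):
    # Replay every recorded session against one copy of the application
    # and collect its responses.
    return [app.replay(json.loads(log)) for log in session_logs]

def champion_challenger(champion_app, challenger_app, session_logs):
    # Same real-life traffic against the current (champion) and proposed
    # (challenger) versions, then one generic comparison for all functions.
    champion = replay_against(champion_app, session_logs)
    challenger = replay_against(challenger_app, session_logs)
    return [
        (i, old, new)
        for i, (old, new) in enumerate(zip(champion, challenger))
        if old != new
    ]  # an empty list means the challenger matches the champion
```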
Live parallel test
Once you’ve got that, you can implement live parallel test, which takes all the strain and risk out of releasing. It means doing the replication to another copy of the stack, just like you do for DR, except it’s the new code. If everything works well for however long you feel is necessary, all you do is switch which copy of the application sends results back to the user.
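A bare-bones sketch of that switch; the copy names and URLs are placeholders:

```python
# Bare-bones sketch of the cut-over in a live parallel test.
# Copy names and URLs are placeholders.
copies = {
    "current": "https://dc1.example.com",  # existing production code
    "new": "https://dc2.example.com",      # new code, fed the same session logs
}
active = "current"  # which copy's results go back to users

def result_for_user(results_by_copy: dict):
    # Both copies process the replicated traffic; only the active one
    # answers the user.
    return results_by_copy[active]

def cut_over():
    # After the new code has run cleanly in parallel for as long as you
    # like, the "release" is just this switch.
    global active
    active = "new"
```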
What goes away when you do this – something a small number of successful organizations already do, to their great benefit? Among other things: QA scripting. Double coding. Fixing QA bugs. Changing the scripts when you change the app. Unit testing. Test-driven development. Etc.
Moving to components
Yes, components and layers and objects and microservices are what experts are in favor of. Beware of experts! And above all, even though you may not give a talk at a conference about it, migrate your code to ... yes, a monolithic architecture.
Can’t you use queuing when it seems to make sense? Of course you can! But it should be something in-memory and simple like redis queues.
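For instance, a plain redis list does the job, assuming the redis-py client; the queue name is illustrative:

```python
# Simple in-memory queuing with a redis list, assuming the redis-py client.
# The queue name "work" is illustrative.
import json
import redis

r = redis.Redis(host="localhost", port=6379)

def enqueue(task: dict) -> None:
    # Producer: push a task onto the list.
    r.lpush("work", json.dumps(task))

def run_worker() -> None:
    # Consumer: block until a task arrives, then handle it.
    while True:
        _, raw = r.brpop("work")
        handle(json.loads(raw))

def handle(task: dict) -> None:
    print("processing", task)  # stand-in for your application's handler
```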
Active data migration
It’s standard practice to store all user and transaction data in a DBMS, and to call on the data from the DBMS when the user wants to do something. Sadly, this standard approach has long since become obsolete, as I explain in this post from more than a decade ago. It is a killer of application speed, flexibility and everything else. And there are proven in-memory databases like redis that can do the vast majority of jobs. If you’re dying to use a DBMS, they continue to be useful for archive and reporting!
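As a rough illustration, live user data can sit in a redis hash, with the DBMS fed afterwards for archive and reporting; the key layout and field names here are just examples:

```python
# Illustrative sketch: live user data in a redis hash, DBMS used only for
# archive/reporting. Key layout and field names are examples, not a standard.
import redis

r = redis.Redis(host="localhost", port=6379)

def save_user(user_id: str, fields: dict) -> None:
    # The live, in-memory copy the application works against.
    r.hset(f"user:{user_id}", mapping=fields)

def load_user(user_id: str) -> dict:
    # Fast reads while the user works; no DBMS round trip on the critical path.
    return {k.decode(): v.decode() for k, v in r.hgetall(f"user:{user_id}").items()}

def archive_user(user_id: str, dbms_insert) -> None:
    # If you keep a DBMS, feed it from the in-memory data for archive and
    # reporting, off the critical path. dbms_insert is whatever you use.
    dbms_insert("users_archive", load_user(user_id))
```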
Doing this will also tremendously simplify your champion/challenger testing.
The final push to speed of application change
Doing the foregoing things to your application will help on multiple dimensions. The final push, which can be step by step, is the one that will do the most to enable unanticipated changes to be made to your application. Conceptually, it’s pretty simple. What makes change hard? Finding all the places that need to be changed. What if the number of places you need to go to make a change shrinks, approaching over time a single place needing to be changed? You’re in the winner’s circle! Bonus: as you do this, the number of lines of code in your application will actually decrease! There’s lots to be said about this, but the core concept and value really are as simple as this.
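One hypothetical way that can look: a single declarative field definition that drives validation and display, so adding or changing a field means touching one place. The registry below is purely illustrative:

```python
# Purely illustrative: one declarative definition per field, so adding or
# changing a field means editing a single place.
FIELDS = {
    "email":  {"label": "E-mail address", "type": str, "required": True},
    "credit": {"label": "Credit limit",   "type": int, "required": False},
    # A new field is one new line here; validation and display below
    # pick it up automatically.
}

def validate(record: dict) -> list:
    errors = []
    for name, spec in FIELDS.items():
        value = record.get(name)
        if value is None:
            if spec["required"]:
                errors.append(f"{spec['label']} is required")
        elif not isinstance(value, spec["type"]):
            errors.append(f"{spec['label']} has the wrong type")
    return errors

def display(record: dict) -> None:
    for name, spec in FIELDS.items():
        print(f"{spec['label']}: {record.get(name, '')}")
```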
What are you waiting for? Get started! High quality, fast-to-change software is yours for the taking ... err ... fast for the building.