Having issues with software quality? Have you tried test-driven development? Do you have test scripts based on test requirements? Do you have a rich sandbox with good test data? What’s your code coverage? How often is a fully tested version released into production with problems that were missed? How often does the new feature work adequately in production but with destructive side effects that integration testing missed? How long and expensive is the pipeline from programmer’s release to working production? Are you proud of being on the forefront of development methods with Agile, Scrum and the rest but still avoid releasing what comes out of each Sprint to production because you can’t risk another disaster?
If any of these apply to you, you may want to consider a decades-old method of software QA, widely proven in production, that enables frequent, error-free releases to production with almost no overhead or traditional QA work. It's not taught, there are no certifications, and it's ignored by mainstream software experts. But it works.
I used to call it “comparison-based QA.” The term is appropriate because the core of the method is comparing the results of the production version of the code with a test version. A CTO with a data science background suggested a better term, which I will use henceforth: champion/challenger software QA. It’s like when you’ve got a good model, the champion, and you want to see if a new model, the challenger, yields better results. Or it’s like A/B testing of a consumer UI, where you want to see whether a proposed variation works better. In both cases, you’ve done something new and you want to do two things: (1) see if the new thing works like it should, and (2) make sure nothing that used to work is broken. Sounds like feature testing and regression testing, doesn’t it? Yup!
Nobody writes test requirements or feature tests for champion/challenger. You just feed the new thing the same data you fed the old thing and compare the results. You expect differences. If something is better and nothing is worse, you could have a winner – you test with a wider range of data, and if the results hold up, the challenger becomes the new champion and you move on.
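To make that concrete, here is a minimal sketch in Python of the core comparison step. It assumes the champion and challenger are simply callables that take a saved real-world input record and return a dict of output fields; the function and field names are hypothetical, for illustration only.

```python
# Minimal sketch of champion/challenger output comparison, assuming each
# version is a callable that takes a saved real-world input record (a dict)
# and returns a dict of output fields. All names here are illustrative.

def compare_outputs(records, champion, challenger):
    """Run both versions on the same saved inputs and collect every difference."""
    diffs = []
    for record in records:
        old = champion(record)
        new = challenger(record)
        # Gather every field whose value changed, was added, or was removed.
        changed = {
            key: (old.get(key), new.get(key))
            for key in old.keys() | new.keys()
            if old.get(key) != new.get(key)
        }
        if changed:
            diffs.append({"record_id": record.get("id"), "fields": changed})
    return diffs
```

Every entry in `diffs` is either an expected change (the feature you built) or an unexpected one (a regression to investigate). No test scripts, just real inputs and a diff.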
Here are the main beauties of champion/challenger:
- The tests are nothing but inputs and saved outputs of real-world processing. Nothing artificial, no carefully crafted "test data."
- You create the code for comparison of output data once. The code/model could get endlessly complex, but the output comparison identifies differences whether it has 10KB or 10GB to compare. No test scripts!
- You can do the comparison for a UI with some extra work, but again it's one-and-done.
- The comparison pulls out all the differences. You check whether each difference is something you expected: whether everything new you expected is there and, crucially, whether there are any unexpected differences. If there are, you’ve got a regression failure to fix.
- You can do the comparison in batch with large samples of old data, giving you great coverage.
- You can do the comparison live, in a production environment, using the challenger code/model output for comparison and sending only the champion’s output to the user.
- You can test your changes on the challenger code alone, looking directly at its output, to see if you're happy with the change you made. While you're doing that, the challenger will simultaneously be running production data, and the comparison with the champion makes sure everything else is still OK.
- After you’ve done enough live parallel production testing, you can be confident that the challenger will do everything the champion did, with real-world data in the production environment, so you simply flip the switch: challenger output goes live and the old champion is retired.
- You can do this in stages to crank the risk way down: the challenger becomes champion for just 10% of the inputs, and then you gradually turn up the fraction. This guarantees problem-free production releases. (See the sketch after this list for the live comparison and staged rollout.)
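As a rough illustration of the live comparison and staged rollout items above, here is a sketch in Python. It assumes a request handler that can invoke both versions on the same payload; the hashing scheme, the logger, and the 10% rollout figure are my own illustrative choices, not part of any prescribed recipe.

```python
# Sketch of a live shadow run with a staged rollout. Both versions run on
# every request; divergences are logged, and only a configurable slice of
# traffic actually receives the challenger's output.
import hashlib
import logging

logger = logging.getLogger("champion_challenger")

ROLLOUT_PERCENT = 10  # share of traffic whose response comes from the challenger

def in_rollout(request_id: str, percent: int) -> bool:
    """Deterministically bucket a request so the same ID always routes the same way."""
    bucket = int(hashlib.sha256(request_id.encode()).hexdigest(), 16) % 100
    return bucket < percent

def handle(request_id: str, payload: dict, champion, challenger) -> dict:
    champion_out = champion(payload)
    challenger_out = challenger(payload)

    # Record any divergence for later review; the comparison never blocks the response.
    if champion_out != challenger_out:
        logger.warning("divergence on %s: %r vs %r",
                       request_id, champion_out, challenger_out)

    # Serve the challenger only to the rollout slice; everyone else still gets the champion.
    if in_rollout(request_id, ROLLOUT_PERCENT):
        return challenger_out
    return champion_out
```

Turning `ROLLOUT_PERCENT` up to 100 is the "flip the switch" step: the challenger is then serving all traffic and the old champion can be retired.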
I've written about the details of how to get this done in a book.
I've written a post explaining the two main aspects of QA, how they differ, and why each is best served by a different method.
I've written about the path that led me to write this and related books, along with summaries of those books.
I've written about how this method provides crucial fuel that enables startups to surpass tech groups hundreds of times their size, along with the specific strategies those groups use, here and here.
The method is a "secret" -- because the vast majority of industry elites choose to ignore it. Want to succeed against industry giants? Use secret methods like champion/challenger to enable rapid release of quickly evolving code with zero errors in production and minimal QA effort. You'll run circles around the lumbering, stumbling giants and then leave them in your dust.