Simple Solution to Embarrassing Collapses of Census Website due to Poor Load Testing
The sad truth is that a good load testing solution can be so simple and cost so little
(This article was initially published on LinkedIn, 2016-08-10)
On 9 August 2016, to no one's surprise, the Australian Census website collapsed due to poor load testing. Now the government (including the Prime Minister), the ABS chief (who earns $705K a year), and many executives are busy shifting blame.
Some news headlines:
- Australia’s 2016 census website shutdown to cost $30m (the Guardian)
- Census 2016: Government, IBM settle over website crash (ABC News)
That night, I asked my wife to submit the census early (before dinner) because I knew the site would crash. The Census site had advised the public to submit between 6 and 9 PM, which later proved disastrous. I simply knew that very few companies can do load testing against AJAX web applications. This just proved my point again.
What I didn’t know is how much we taxpayers paid for the load testing of this quite simple web app, as reported by the news article: Census website collapses despite millions spent on IT contracts. Some numbers:
- Melbourne-based Revolution IT has been awarded more than $1 million in contracts from the ABS since December 2015
- $325,000 for “licenses for Census load testing”
- $280,000 on licenses for Hewlett Packard Performance Centre
- a $9.6 million contract to IBM in 2014 to host the eCensus.
- and various testing-related contracts
‘Revolution IT is a software testing specialist’. I often saw this company’s advertisement at software testing conferences or events. This company changed its name to Ampion in 2020.
As we have seen, big names (such as IBM), expensive tools, and specialized testing consultancy services were behind the 2016 Census website app. Apparently, money and time were not the constraints on this project. And since nearly every household in Australia was required to hit the website in the same timeslot, it was clear to any IT professional that load testing beforehand was critical. So why did it fail?
1. No idea of how to test dynamic websites
What ABS management failed to care about is that those companies do not know how to load test web applications, especially dynamic ones.
For traditional web applications, load testing is performed by sending HTTP requests from many threads. Each request carries parameters, plus cookie and session data, to simulate a user request. Rich companies can afford ultra-expensive HP testing tools while other companies use open-source JMeter. The idea is simple, and the approach works for traditional websites. It is much harder for dynamic (AJAX) applications, where many requests are issued by JavaScript running in the browser.
Though ABS emphasized that “we have load tested 150% of the volume”, it turned out to be a lie or at least an incorrect statement. Their testing team might have used expensive testing tools only to load test ‘visiting the homepage’ or ‘user login’.
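A minimal sketch of this thread-based HTTP approach, using only the Python standard library. The `DemoHandler` server is a hypothetical stand-in for the site under test, so the example runs without touching a real website. The names `timed_get`, `run_load` and `demo` are illustrative and are not taken from JMeter or any HP tool.

```python
import threading
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class DemoHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for the site under test: answers every GET with 200."""

    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


# Opener with no proxies, so the local demo bypasses any system proxy settings.
OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def timed_get(url):
    """Send one GET request and return (status_code, elapsed_seconds)."""
    start = time.perf_counter()
    with OPENER.open(url, timeout=10) as response:
        response.read()
        status = response.status
    return status, time.perf_counter() - start


def run_load(url, total_requests, threads):
    """Send total_requests GETs through a pool of worker threads."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return list(pool.map(timed_get, [url] * total_requests))


def demo(total_requests=20, threads=5):
    """Start the local stand-in server, load it, and return the results."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), DemoHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        url = f"http://127.0.0.1:{server.server_address[1]}/"
        return run_load(url, total_requests, threads)
    finally:
        server.shutdown()
        server.server_close()


results = demo()
ok = sum(1 for status, _ in results if status == 200)
print(f"{ok}/{len(results)} requests returned 200")
```

Note what this kind of script never does: it fetches a URL but does not execute the page's JavaScript, so any request the browser would fire through AJAX is simply absent from the generated load.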
2. Lack of efficiency to update load test scripts
As with functional automated tests, load test script development and maintenance need to be very efficient. Let’s not forget that the application changes rapidly during development. If the load testing team is unable to provide early and quick feedback, the dev team is very unlikely to respond, for a simple reason: “it is too late to change”.
To make load testing work effectively, the test team must be able to update load test scripts in a matter of minutes, just like functional UI tests. In my experience, load test script development and execution were rarely efficient with those ultra-expensive and GUI-heavy load testing tools.
My Practical and Simple Solution: real-browser load testing via CT server
Some might ask: it is easy to ridicule from the sidelines, but do you have a solution for load testing AJAX apps? Yes, I do. It is the reason I wrote this article. The solution is simple: run functional (Selenium WebDriver) test scripts on multiple VMs in parallel. I have implemented this solution with a BuildWise server + build agents for a load testing project (at a large company with 500+ IT staff) on which the load testing team had been struggling for a year. The issue was resolved within a week using BuildWise + 10 agents.
I developed BuildWise for executing functional tests. How did it come to be used for load testing? This particular load testing problem was known to be a high-priority challenge, and a software architect suggested the idea after seeing our team’s showcase, which demonstrated parallel execution of functional tests across multiple build agents.
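To make the idea concrete, here is a minimal sketch of running the same user journey from several agents in parallel. It is not how BuildWise works internally: a local thread pool stands in for the separate build agents/VMs, and `fake_journey`, `browser_journey`, `run_agents` and the URL are hypothetical names chosen for illustration. The sketch runs with `fake_journey`; `browser_journey` shows what a real-browser version might look like and needs Selenium plus Chrome installed.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_journey(agent_id):
    """Stand-in for a functional test script; sleeps instead of driving a browser."""
    time.sleep(0.05)
    return True


def browser_journey(agent_id, url="https://example.test/census"):
    """The same journey with a real browser. The URL is a placeholder."""
    from selenium import webdriver  # lazy import: the sketch runs without Selenium

    driver = webdriver.Chrome()
    try:
        driver.get(url)  # a real browser runs the page's JavaScript, so AJAX calls fire
        return bool(driver.title)
    finally:
        driver.quit()


def run_agents(journey, agents):
    """Run one journey per agent in parallel; return (agent_id, passed, seconds) tuples."""

    def timed(agent_id):
        start = time.perf_counter()
        try:
            passed = bool(journey(agent_id))
        except Exception:
            passed = False  # a crashed journey counts as a failed submission
        return agent_id, passed, time.perf_counter() - start

    with ThreadPoolExecutor(max_workers=agents) as pool:
        return list(pool.map(timed, range(agents)))


results = run_agents(fake_journey, agents=10)
passed_count = sum(1 for _, passed, _ in results if passed)
print(passed_count, "of", len(results), "journeys passed")
```

The design point is that the journey is an ordinary functional test script. Whatever keeps it up to date for functional testing keeps the load test up to date too.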
Some might say that it won’t be correct to use functional test scripts and tools to do load testing for two reasons:
- unable to generate that traffic
- hard to set up VM labs
These objections sound reasonable, but both are solvable in practice. I will use a realistic load (that of the Census) to show that the real-browser load testing approach can work at an extremely low cost.
1. “unable to generate that traffic”
IT managers/architects often make statements without any concrete experience, particularly in test automation (functional or load) and Continuous Integration. For the project I mentioned earlier, I used BuildWise agents to achieve the target load (something the performance team had been attempting for a year).
Let’s say 5 million households need to lodge the form online between 6 PM and 11 PM. That is 1 million per hour, i.e. about 278 per second. In other words, 500 VMs that each submit one application per second would more than cover that average rate. There are many techniques that can be applied here. For example, to test final submissions, pre-created ready-to-submit applications may be used (see my API Testing Recipes in Ruby for unique data generation).
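The arithmetic above can be checked with a few lines. This is only a back-of-the-envelope sketch that assumes submissions are spread evenly across the window, which real traffic will not be. The function name `required_agents` and its parameters are my own, for illustration.

```python
import math


def required_agents(households, window_hours, seconds_per_submission):
    """Return (submissions_per_second, agents_needed) for an evenly spread load."""
    per_second = households / (window_hours * 3600)
    agents = math.ceil(per_second * seconds_per_submission)
    return per_second, agents


rate, agents = required_agents(5_000_000, window_hours=5, seconds_per_submission=1)
print(f"{rate:.1f} submissions/s, {agents} agents")  # 277.8 submissions/s, 278 agents
```

So about 278 agents, each completing one submission per second, match the average rate. Provisioning 500 leaves headroom for peaks and for journeys that take longer than a second.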
2. “hard to set up VM labs”
In recent years, VM and deployment automation have advanced dramatically. You might have heard of Chef and Docker. AWS is a good place to start. It should not be hard to create an image of BuildAgent and launch hundreds of instances of it. VMware also offers software for managing VMs locally.
Unfortunately, management and tech leads (at all levels) do not want to know or understand test automation. Their interest in these topics stays at the talking level. Compare how seriously Facebook takes test automation: “At Facebook, We have some of our top engineers working on development infrastructure”.
A day later, the Census website was still down. Evidently, there was either no recovery plan or no confidence in handling the much smaller remaining load.
I bet the “Census load test script” still passes, though :-)
Lessons learned for IT Executives
Before engaging a software testing and Continuous Testing service company, get them to prove their skills within 1 or 2 days. Test Automation and Continuous Integration skills/experience are universally applicable to almost all software projects. If a test automation coach/company is really capable, there is no reason why he/she could not replicate previous success on this one.
Practical advice: find a good test automation coach (ask him/her to showcase previous work during the interview) and pay him/her a good rate to work for a maximum of 3 days to get the core load testing scripts implemented and executing well. Engage him/her for ad-hoc coaching if needed.
- my upcoming eBook: “Practical Performance and Load Testing” (register your interest; a discount code will be emailed when it is released)
- eBook: “Practical Continuous Testing” by me
- Set Up a Continuous Testing Server to Run Selenium Tests in Minutes