The company engaged Planit on the recommendation of an existing software partner. They were impressed by the structured simplicity of our performance engineering framework, which takes a “deliver once, deliver well” approach to achieve maximum quality, reduce duplication of effort, and save costs.
Our engagement began with discovery and planning. This consisted of an assessment of their test assets, tools, environments, applications, test data, and more, allowing us to tailor a solution that met their needs and addressed their challenges.
During this phase, we uncovered several performance risks that needed to be addressed. One key consideration was the service level agreements (SLAs) for availability and response times with third-party providers, since these would determine how well the platform was placed for future scaling.
Other key risk areas we initially identified were:
- No agreed solution giving the company control over a synthetic end-to-end transaction to check the health of all systems involved, from the ear tag through to software provider endpoints (a sketch of such a check follows this list).
- Variance in response times for business processes, due to the inability to control latency from IoT Edge locations.
- How the performance of the data platform would be impacted if any of the external connections were down or slow.
- How the components controlled by the IaaS cloud service, such as the serverless compute engine and auto scaling, were configured.
- How well positioned the company was to manage growth in data, software providers, and end users.
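To illustrate the first risk above, a synthetic end-to-end health check can probe each hop in the chain and record availability and response time. The sketch below is a minimal Python example; the hop names and URLs are hypothetical stand-ins, not the company’s actual systems.

```python
import time
import requests

# Hypothetical endpoints for each hop in the tag-to-provider chain;
# the real systems and URLs were specific to the engagement.
CHECKPOINTS = {
    "tag-ingest": "https://ingest.example.com/health",
    "data-platform": "https://platform.example.com/health",
    "provider-api": "https://provider.example.com/health",
}

def run_synthetic_check(timeout_s: float = 5.0) -> dict:
    """Probe each hop and record availability plus response time."""
    results = {}
    for name, url in CHECKPOINTS.items():
        start = time.monotonic()
        try:
            resp = requests.get(url, timeout=timeout_s)
            ok = resp.status_code == 200
        except requests.RequestException:
            ok = False
        results[name] = {"ok": ok, "seconds": round(time.monotonic() - start, 3)}
    return results

if __name__ == "__main__":
    for hop, result in run_synthetic_check().items():
        print(f"{hop}: {'UP' if result['ok'] else 'DOWN'} ({result['seconds']}s)")
```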
With these considerations in mind, we constructed and implemented a customised performance testing solution. Our framework ensured it was scalable and maintainable, met their requirements cost-effectively, and provided immediate results.
As part of this step, we assisted in selecting the right load generation tool based on their needs. A proof of concept was conducted to identify their requirements, understand the protocols involved, and shortlist tools to test.
Apache JMeter was selected for its strong API and web UI functionality. Not only did it meet all their requirements, it also had the added benefit of being free and open source.
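JMeter load tests of this kind are typically run in non-GUI mode so they can be scripted and repeated. The sketch below shows one way to drive such a run from Python; the test plan path, output locations, and property names are hypothetical, and the `-J` properties only take effect if the .jmx plan references them.

```python
import subprocess
from pathlib import Path

# Hypothetical test plan and output paths; thread count and duration
# are passed as JMeter properties referenced inside the .jmx plan.
PLAN = Path("plans/api_peak_load.jmx")
RESULTS = Path("results/api_peak_load.jtl")
REPORT_DIR = Path("results/dashboard")  # must be empty or absent

def run_jmeter(threads: int, duration_s: int) -> None:
    """Run JMeter in non-GUI mode and generate the HTML dashboard."""
    cmd = [
        "jmeter",
        "-n",                         # non-GUI mode for load generation
        "-t", str(PLAN),              # test plan to execute
        "-l", str(RESULTS),           # raw sample log (.jtl)
        "-e", "-o", str(REPORT_DIR),  # HTML report after the run
        f"-Jthreads={threads}",
        f"-Jduration={duration_s}",
    ]
    subprocess.run(cmd, check=True)

if __name__ == "__main__":
    run_jmeter(threads=50, duration_s=3600)
```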
Since the company and its software partners used Azure DevOps, we implemented an automated performance test execution and reporting framework that harnessed it. This enabled faster performance testing of code and helped uncover performance issues as quickly as possible. The framework was also designed to intelligently use secure and scalable IaaS virtual servers to support planned growth in data and global markets of up to 700%.
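One way such a framework can trigger runs is through the Azure DevOps Pipelines REST API. The sketch below is a minimal Python example assuming a personal access token in an `AZDO_PAT` environment variable; the organisation, project, and pipeline ID are hypothetical placeholders, not the company’s actual configuration.

```python
import os
import requests

# Hypothetical organisation, project, and pipeline ID; in the real
# framework these pointed at the performance test pipeline.
ORG = "example-org"
PROJECT = "data-platform"
PIPELINE_ID = 42

def queue_perf_run(branch: str = "main") -> int:
    """Queue a performance test pipeline run and return its run ID."""
    url = (f"https://dev.azure.com/{ORG}/{PROJECT}"
           f"/_apis/pipelines/{PIPELINE_ID}/runs?api-version=7.1")
    body = {"resources": {"repositories": {"self": {"refName": f"refs/heads/{branch}"}}}}
    # Azure DevOps accepts a personal access token via basic auth
    # with an empty username.
    resp = requests.post(url, json=body, auth=("", os.environ["AZDO_PAT"]))
    resp.raise_for_status()
    return resp.json()["id"]

if __name__ == "__main__":
    print(f"Queued run {queue_perf_run()}")
```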
The following key areas were tested and evaluated for performance over a variety of scenarios:
- User interface usage by end users and administration staff.
- Internal (e.g. UI business processes) and external (e.g. outside user) API usage.
- Individual components, with external services excluded via stubs or mocks.
- Consumption of monitoring metrics, where available, for analysis.
Through our performance testing, we aimed to determine:
- Optimum baseline performance of the system, using single business process profiling.
- Peak performance, to see how the system coped with 125% of the traffic of the busiest hour of the busiest day of the year, or of the peak total regional volume in a 24-hour period combined with billing batches (see the load-model sketch after this list).
- Whether the web servers were already a bottleneck, by reducing their number and then running peak traffic through them.
- Whether the system could handle a regional business day load, via a steady 24 to 48-hour endurance run at 80% of peak load.
- The operating capacity of the system.
- The breaking point of the system.
- System elasticity and response during planned high-activity events, for example simulating 50 customers purchasing tags at the same time, with minor variance in user entry time, across the four key markets (Australia, New Zealand, United Kingdom, and United States).
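The peak and endurance targets above reduce to simple arithmetic on observed volumes. The sketch below shows how an hourly peak can be converted into test throughput targets; the transaction volume is a hypothetical figure, as the real numbers came from the company’s production data.

```python
# Hypothetical busiest-hour volume; real figures came from production data.
BUSIEST_HOUR_TRANSACTIONS = 40_000
PEAK_FACTOR = 1.25        # peak test: 125% of the busiest hour
ENDURANCE_FACTOR = 0.80   # endurance run: 80% of peak load

def load_targets(hourly_volume: int) -> dict:
    """Convert an observed hourly volume into test throughput targets."""
    peak_per_hour = hourly_volume * PEAK_FACTOR
    return {
        "peak_per_hour": peak_per_hour,
        "peak_per_second": peak_per_hour / 3600,
        "endurance_per_hour": peak_per_hour * ENDURANCE_FACTOR,
    }

if __name__ == "__main__":
    for name, value in load_targets(BUSIEST_HOUR_TRANSACTIONS).items():
        print(f"{name}: {value:,.1f}")
```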
Non-functional baselines that we set for the data platform included:
- 95% of UI submits taking less than five seconds, 95% of API calls taking less than three seconds, 95% of navigation steps taking less than two seconds, and less than a 10% response time increase over the baseline test run (see the threshold-check sketch after this list).
- Error rates less than 1% with the root cause of all errors determined and accepted by management.
- No system component consuming more than 80% of the CPU and available RAM, with disk, NIC output, and processor queue lengths remaining under five.
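Baselines like these can be checked automatically against JMeter’s raw sample log. The sketch below is a minimal Python example that computes 95th percentiles and the error rate from a .jtl file written in CSV format; the label prefixes are a hypothetical naming convention, and the 10% regression check against a prior baseline is omitted for brevity.

```python
import csv
import math
from collections import defaultdict

# Thresholds from the baselines above; the label prefixes are a
# hypothetical naming convention for samples in the JMeter .jtl file.
P95_LIMITS_MS = {"UI": 5000, "API": 3000, "NAV": 2000}
MAX_ERROR_RATE = 0.01

def percentile(values, pct):
    """Nearest-rank percentile over a sorted copy of the values."""
    ordered = sorted(values)
    rank = math.ceil(pct / 100 * len(ordered))
    return ordered[rank - 1]

def check_results(jtl_path: str) -> bool:
    """Return True if the run meets the p95 and error-rate baselines."""
    elapsed_by_prefix = defaultdict(list)
    errors = total = 0
    with open(jtl_path, newline="") as f:
        for row in csv.DictReader(f):
            total += 1
            if row["success"] != "true":
                errors += 1
            prefix = row["label"].split("_", 1)[0]  # e.g. "API_GetTag" -> "API"
            elapsed_by_prefix[prefix].append(int(row["elapsed"]))
    passed = total > 0 and (errors / total) <= MAX_ERROR_RATE
    for prefix, limit in P95_LIMITS_MS.items():
        if elapsed_by_prefix[prefix]:
            p95 = percentile(elapsed_by_prefix[prefix], 95)
            passed = passed and p95 <= limit
            print(f"{prefix}: p95={p95}ms (limit {limit}ms)")
    return passed
```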
Performance testing was securely implemented and optimised across the company’s virtual machines within our Continuous Performance Testing Framework. Two rounds of testing were executed, with a report automatically generated after each round and presented to the company as an interim report.
At the end of the testing, a final performance summary report was presented. It outlined how the data platform performed against the targets we set for it, as well as the actual risks we uncovered, mitigations implemented, and recommendations to further mitigate risk.