Client: Global developer of IoT devices for healthcare
Project Duration: 1 month
Goal: Load test HTTP APIs, project the number of concurrent users the current infrastructure can handle, and build a scaling roadmap.
Tech: Ranger/Berserker, Gatling
The client needed help with the performance of a Cassandra cluster used as the datastore for an IoT platform with complex ETL and analytics; all the data originated from mobile devices. The primary question was whether database performance could be improved; the secondary one was how many concurrent users the platform as a whole could support. Testing the database in isolation was straightforward with the cassandra-stress tool. However, testing a single component of a complex system says little about how the whole platform will behave in production. Our main goal was therefore to simulate the behavior of a real-world user and implement that behavior as part of the load test. A user receives their medical data on a mobile device while, at the same time, the device sends data to the platform. So, to simulate production-like load, we had to:
- Describe how the user uses the device (which API endpoints are targeted, when, how often, and so on).
- Describe the load that the device is sending to the API.
- Implement that in the most appropriate tool and answer how many concurrent users the platform can support.
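Each simulated user plays a dual role: the app pulls results while the device pushes telemetry. One way to express this is as a per-user request timeline that the load tool then replays. A minimal Python sketch, assuming illustrative endpoint names, intervals, and session length (not the client's actual API):

```python
import heapq

def user_schedule(session_s=300, telemetry_every_s=10, poll_every_s=30):
    """Build one user's timeline of API calls for a session.

    The device POSTs telemetry on a fixed interval while the mobile app
    GETs processed results less frequently. Endpoint paths and timings
    here are hypothetical placeholders.
    """
    events = []
    for t in range(0, session_s, telemetry_every_s):
        heapq.heappush(events, (t, "POST /api/v1/telemetry"))
    for t in range(0, session_s, poll_every_s):
        heapq.heappush(events, (t, "GET /api/v1/results"))
    # Pop in time order to get a merged, chronological schedule.
    return [heapq.heappop(events) for _ in range(len(events))]

schedule = user_schedule()  # 30 telemetry pushes + 10 result polls
```

In the real test these timelines were expressed directly in the load tool's scenario DSL rather than generated by hand, but the model is the same: interleaved request streams per user.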
The first step was to check the database for low-hanging fruit in terms of performance with the cassandra-stress tool. Once that was done, the client wanted a more detailed report on the entire platform. The main challenge was replicating real user behavior, since accurate results require production-like load. In cooperation with the client, we created several scenarios describing the user's actions: the time spent on the landing page, how requests were distributed in order and time, and so on.
To implement this complex user behavior, we used the Gatling load testing tool, which let us describe the creation of virtual users and the scenarios each user acts out, in our case a series of API calls. Gatling was chosen because it supports complex flows and per-user scenarios.
The test answered how many concurrent users the platform can support. The report also exposed hotspots in the API and identified the endpoints that took the longest to respond. The result was a baseline, a detailed performance report, and actionable improvement points for the platform, which the client subsequently implemented. The last, and very important, step was to hand the client all the scripts and documentation needed to replicate the results and rerun the tests on their own after the improvements were in place. Enabling clients to take full ownership of the solution is a core part of our company values.
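Projecting a concurrent-user ceiling from load test measurements is commonly done with Little's law: the number of users in the system equals sustainable throughput times the time each user spends per request cycle (response time plus think time). A small sketch with purely illustrative figures, not the client's measured numbers:

```python
def max_concurrent_users(throughput_rps, avg_response_s, think_time_s):
    """Little's law: N = X * (R + Z).

    X = sustainable request throughput (req/s) found by the load test,
    R = mean response time at that throughput (s),
    Z = user think time between requests (s).
    All inputs below are hypothetical examples.
    """
    return throughput_rps * (avg_response_s + think_time_s)

# If the platform sustains 500 req/s at 0.2 s mean response time, and a
# user issues a request roughly every 10 s (9.8 s think time), the
# projection is 500 * 10 = 5000 concurrent users.
projected = max_concurrent_users(500, 0.2, 9.8)
```

A projection like this is only as good as the scenario mix behind the throughput figure, which is why the production-like user scenarios above mattered more than raw database numbers.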