Performance testing from zero to hero
The objective of this blog post is to help people who are getting started with performance testing. You will find some best practices, advice, common mistakes to avoid and some tools.
Methodology
A common belief is that the main difficulty with performance tests is choosing the right framework and writing the tests. In my opinion, the main difficulty is that you need a lot of information before writing testing scenarios. My advice is to write specifications for the performance tests, with a structure that I will detail below. These performance testing specifications should be shared with and validated by everyone involved in the project before coding the testing scenarios. That way, all the people involved in the project share the performance tests' scope, hypotheses and related risks.
In all our work, we often need to estimate task durations. For performance testing it is much harder than for development tasks. My favorite reply to people asking for details about testing task durations is “We know when we begin, but we never know when it will end”: the reason is that the objective of this kind of test is to find issues, so the resolution time depends on problems we know nothing about before the testing phase.
At the beginning of performance tests, you should at a minimum:
- write and test a few scenarios;
- cover only a few services;
- always begin with a small load, then increase it progressively to reach the target load;
- use a very simple validation condition (“no timeouts”, for example).
In my experience, time goes by way too fast when doing performance tests, so the best approach is to proceed step by step and increase the project coverage progressively.
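To make this concrete, here is a minimal sketch of such a first test, written for Gatling (the tool I present at the end of this post). Everything in it is an assumption to adapt to your project: the URL, the scenario, the user counts and the durations.

import scala.concurrent.duration._
import io.gatling.core.Predef._
import io.gatling.http.Predef._

class FirstSmokeSimulation extends Simulation {

  // One very simple scenario, covering a single service (placeholder URL)
  val scn = scenario("Home page")
    .exec(http("home").get("https://myapp.example.com/"))

  setUp(
    scn.inject(
      atOnceUsers(2),                  // begin with a small load...
      rampUsers(50).during(5.minutes)  // ...then increase it progressively
    )
  ).assertions(
    global.failedRequests.count.is(0)  // very simple validation condition
  )
}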
To give you some numbers: for a fairly simple performance test (with 1 or 2 scenarios), it will take you around 2 weeks to complete the tests if everything works fine.
We will continue this discussion with the performance testing specifications. They should follow this structure:
1 Performance testing overview
1.1 Objective
1.2 Validation conditions
1.3 Platform
1.4 Hypotheses and load
2 Scenarios used
2.1 Scenario [Scenario name]
2.1.1 Functional description of the scenario
2.1.2 Input data
2.1.3 Checks at the end of the test
3 Load test and results
3.1 Load test of dd/mm/yyyy
3.1.1 Test context
3.1.2 Results and metrics
3.1.3 Conclusion
Each chapter is detailed below.
1 Performance testing overview
1.1 Objective
To begin with, you have to know why you are running those performance tests. Most of the time you want to validate one of these aspects:
- Will my application withstand the load of the future production delivery? In this case your objective is to validate that everything works fine under the target load.
- When the load on my application increases (because of its success), which part of it will be the point of failure? Which part will we have to duplicate or modify, and what is the associated cost?
- What are the limits of my application? What is the maximum load it can process with the current architecture?
- I want to reproduce the production load in a separate environment, in order to investigate an issue that only occurs when the application is under stress.
1.2 Validation conditions
The idea is that everyone on the project agrees, before launching any load test, on the results that will validate it. In many cases, you will have conditions that match the service level agreement in the contract. Once again, try to stick to the minimum necessary, because getting a full set of validated test results is sometimes very hard.
Some examples:
- All HTTP responses are 20X: this is the easiest result to check. If you do not have any specific condition to check, this one is a sensible default.
- A specific page XXX is displayed in less than N seconds for 99% of the calls, and all other pages respond with 20X: this kind of validation is used when you have a specific maximum response time to respect according to the service's contract.
- A batch processing a file of N lines completes in less than X minutes, and at the end there are N records in the database.
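With a tool like Gatling, validation conditions like the first two above translate almost directly into assertions. Here is a sketch, assuming a request named “page XXX” and a 3-second contractual limit; the URL, scenario and load figures are placeholders:

import scala.concurrent.duration._
import io.gatling.core.Predef._
import io.gatling.http.Predef._

class ValidationSimulation extends Simulation {

  val scn = scenario("Browse")
    .exec(http("page XXX").get("https://myapp.example.com/xxx"))

  setUp(scn.inject(rampUsers(100).during(10.minutes)))
    .assertions(
      // page XXX answers in less than 3 s for 99% of the calls...
      details("page XXX").responseTime.percentile(99).lt(3000),
      // ...and every request responds with a success code
      global.successfulRequests.percent.is(100)
    )
}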
1.3 Platform
You need to have your application deployed somewhere to launch performance tests.
In this part, you can determine precisely the list of services that need to be installed, and those that are mocked or simulated. You should start by performance testing the part of the application you are responsible for; then, when everything is correct on your side and you still have time, you can test the communication of your part with other services, if and only if the teams behind those services have agreed to undergo your tests.
You should detail as much as possible how the mocks are built, for example:
- some data is hard-coded;
- some fake services return “200 OK” responses to every call, without taking the input data into account.
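A mock like the second one can be very small. As an illustration, here is a sketch in Scala built on the JDK's embedded HTTP server; the port and response body are arbitrary:

import com.sun.net.httpserver.{HttpExchange, HttpHandler, HttpServer}
import java.net.InetSocketAddress

// A fake service answering "200 OK" to every call, ignoring the input data
object FakeService extends App {
  val server = HttpServer.create(new InetSocketAddress(8080), 0)
  server.createContext("/", new HttpHandler {
    override def handle(exchange: HttpExchange): Unit = {
      val body = "OK".getBytes("UTF-8")
      exchange.sendResponseHeaders(200, body.length)
      exchange.getResponseBody.write(body)
      exchange.close()
    }
  })
  server.start()
  println("Fake service listening on http://localhost:8080")
}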
Here are some hints to help you choose your environment:
- The application is not yet open in production, and the production environment is ready: you are in the best conditions, you can really load test the target platform. Just keep in mind that you will need some time to clean the platform before the real production opening.
- Otherwise, you have to find an environment with capacities similar to your production environment, and sometimes do some projections. For example, if you have an acceptance environment with half the power of your production platform, you should test with only half the expected load, and hope that production will process twice as much. Keep in mind that nothing guarantees that the load impact is linear: testing half the load on half of the target platform does not validate the production platform. If you can, you should test with more than half of the load.
- In the worst case, you test on your development machine. It is far from reality, but you can still find obvious points of failure or concurrency issues. If you run this kind of load test, state in the specifications that the tests are far from reproducing production conditions.
Please keep in mind that only the production environment truly behaves like the production environment. All the other situations are fallbacks for when you cannot do better and the production platform is not available.
To illustrate this, here is the mechanical allegory by Valery Brasseur:
“If we use a down-scaled environment, it will work!” Actually, no. Why not?
Let's imagine a car and its engine. Let's say the car runs at 100 km/h in production. Now we scale our performance test down to ¼: the car will be driving at 25 km/h, but nothing else has changed! If the issue occurs at 30 km/h, we will never see it.
Moreover, an application's structure is more complex than that, because the ecosystem is composed of several elements and the scaling cannot be applied in the same way to every component. You also have to pay attention to the parts that cannot be scaled down (for example firewalls, network throughput, load balancers, etc.). To be clear, it is very important to test at real scale, and to do it at least once before the first production delivery.
In the same way, if you are forced to use a down-scaled environment, you have to pay attention to the scaling factor. For example: if you downscale the middleware part while the database stays the same, you end up with a “2CV (a popular French minicar) with a Ferrari engine”, and you will never detect database issues (the engine overheats ;-)
1.4 Hypotheses and load
If you do not know where to start, begin with the load per minute or per second. This is one of the easiest measures to check, whatever the test duration. For example, if you begin with a 10-minute test to validate your scenario, you can quickly know whether you are near your target load. At the first launch of performance tests, it is very common to be far from the target load: either the requests are too frequent, or there is too much time between two requests.
For an application that is already in production, you can measure the real load, for example:
- the number of requests or pages served per minute, using the Apache or Nginx logs;
- the most used requests or pages of your application;
- the size of the input or output files for batches;
- the user count in the database.
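For instance, here is a rough sketch that extracts a requests-per-minute figure from an Apache or Nginx access log in the common log format; the file name access.log is an assumption:

import scala.io.Source

// Count requests per minute, relying on timestamps that look like
// [10/Oct/2023:13:55:36 +0200] in the common log format
object RequestsPerMinute extends App {
  val timestamp = """\[([^\]]+)\]""".r
  val counts = Source.fromFile("access.log").getLines()
    .flatMap(line => timestamp.findFirstMatchIn(line).map(_.group(1)))
    .map(_.take(17)) // keep "10/Oct/2023:13:55", i.e. day down to the minute
    .toSeq
    .groupBy(identity)
    .map { case (minute, hits) => (minute, hits.size) }

  counts.toSeq.sortBy(_._1)
    .foreach { case (minute, n) => println(s"$minute -> $n requests") }
}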
For a new project, it is harder to guess the target load. If you are lucky, a specific load is written into your application's contract, expressed in requests per minute or per second; otherwise it is more complicated. You have to get this information from the people in charge of the functionalities you are about to test: the number of concurrent users, the peak load they have in mind, the human execution time for a task done on a website, or the equivalent for a connected device in machine-to-machine applications…
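If all you have are those functional inputs, a back-of-the-envelope conversion gives you a first target. All the figures in this sketch are made-up hypotheses to replace with your own:

object TargetLoad extends App {
  val concurrentUsers = 500   // peak concurrent users announced by the business
  val secondsPerTask  = 60.0  // average human time to complete one task
  val requestsPerTask = 8     // pages or API calls triggered by one task

  val requestsPerSecond = concurrentUsers / secondsPerTask * requestsPerTask
  println(f"Target load: $requestsPerSecond%.1f requests/s") // ≈ 66.7 requests/s
}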
Whatever your conditions, you need to share all the load hypotheses with all the project's stakeholders.
You can also add in this part the hypotheses common to all scenarios, for example: 20% of the users who put products in their basket will complete the payment successfully.
2 Scenarios used
The goal of this chapter is to detail the scenarios used to perform the performance test. You have to use realistic stories, and you can mix several stories when you prepare your load tests.
The main advantage of using stories is that you can test several conditions, for example:
- One load test with 20% of users using service X, and 80% using the Y simulator.
- One load test with 50% of users using service X, and 50% using the Y simulator and some back-office agents.
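In Gatling, for example, mixing stories comes down to injecting several scenario populations in the same setUp. A sketch of the first mix above, assuming 1,000 users in total; the two placeholder scenarios stand in for the real stories:

import scala.concurrent.duration._
import io.gatling.core.Predef._
import io.gatling.http.Predef._

class MixedLoadSimulation extends Simulation {

  val serviceX   = scenario("Service X").exec(http("x").get("https://myapp.example.com/x"))
  val simulatorY = scenario("Simulator Y").exec(http("y").get("https://myapp.example.com/y"))

  setUp(
    serviceX.inject(rampUsers(200).during(10.minutes)),   // 20% of 1,000 users
    simulatorY.inject(rampUsers(800).during(10.minutes))  // 80% of 1,000 users
  )
}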
2.1 Scenario [Scenario name]
You can give the scenario a short reference name, to be used in the load test descriptions, in the load testing tool, or in cross-references within the project specifications.
2.1.1 Functional description of the scenario
The detailed story, for example:
- The user goes to the login page;
- He enters his login and password;
- He accesses his account;
- He uses the left menu to go to his messaging service.
2.1.2 Input data
This section describes the data set used: which data is randomly generated, which data is fixed or taken from a CSV file, etc. For example:
- User logins and passwords are randomly taken from a CSV file listing 200 active user accounts;
- Phone location is simulated with a random point taken inside the rectangle with corners [(46.188060, 6.229955), (46.188060, 6.243109), (46.198028, 6.243109), (46.198028, 6.229955)];
- 50% of iPhone user agents and 50% of Android user agents.
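As a sketch, here is how such a data set can be declared inside a Gatling Simulation class. The file name users.csv, the session attribute names and the user agent strings are assumptions; in Gatling, a custom feeder is simply an Iterator of Maps:

import scala.util.Random
import io.gatling.core.Predef._

// Logins and passwords picked at random among the 200 active accounts
val users = csv("users.csv").random

// Random phone position inside the rectangle given above
val positions = Iterator.continually(Map(
  "lat" -> (46.188060 + Random.nextDouble() * (46.198028 - 46.188060)),
  "lon" -> (6.229955  + Random.nextDouble() * (6.243109  - 6.229955))
))

// 50/50 split between iPhone and Android user agents
val userAgents = Iterator.continually(Map(
  "userAgent" -> (if (Random.nextBoolean()) "iPhone UA string" else "Android UA string")
))

These values are then injected into the scenario with .feed(users), .feed(positions), and so on.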
2.1.3 Checks at the end of the test
The goal of this part is for everybody to agree on the metrics that will validate the tests. Depending on the chosen metrics, you will need to set up more or fewer tools to measure them.
For example, if the HTTP response code is the only measure to capture, most test tools give that information out of the box. If you need to validate system metrics like CPU, RAM or I/O counts, you have to check that you can collect all that data, in order to easily validate it at the end of the stress test.
The objective is also to share the detailed list of performance indicators that will be monitored during the test.
3 Load test and results
This part aims at keeping track of load test results with the date and the context of the tests.
3.1 Load test of dd/mm/yyyy
3.1.1 Test context
For every load test, give the details of:
- The scenarios used;
- The test duration;
- The ramp-up (progressive or not);
- The infrastructure: how many servers, with configuration information such as RAM, CPU, database, data volume at the beginning of the test, and anything else you think can impact the test results.
My advice is to go progressively when working on load tests. You should begin with one test per scenario, and when all the scenarios you need have succeeded, you can mix several scenarios to simulate user behavior in a context that is as close to reality as possible.
3.1.2 Results and metrics
This part's objective is to validate that the load test corresponds to the targeted load. You should really check that the injected load is compliant with the desired load; sometimes it can be difficult to reproduce a specific load exactly.
For example:
7,042 users were simulated during one hour, with:
- 2,359 users who proceeded to payment with articles in their basket, giving around 39 payment transactions per minute;
- 2,355 users who added products to their basket without completing their purchase;
- 2,328 users who did not put any product in their basket.
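Part of this check can be automated. With Gatling, for instance, an assertion can make the run fail when the injected load misses the target. A sketch matching the payment figures above; the request name and URL are placeholders:

import scala.concurrent.duration._
import io.gatling.core.Predef._
import io.gatling.http.Predef._

class PaymentLoadCheck extends Simulation {

  val payment = scenario("Payment")
    .exec(http("payment").post("https://myapp.example.com/pay"))

  setUp(payment.inject(rampUsers(2359).during(1.hour)))
    .assertions(
      // the target was ~39 payment transactions per minute, i.e. ~0.65/s
      details("payment").requestsPerSec.gte(0.6)
    )
}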
In this part you can also give some general results. For example:
- Requests in error: 0
- Responses in less than 3 s: 15,320, representing 97.3% of the total number of requests
- Responses in more than 3 s: 423
3.1.3 Conclusion
The objective of this chapter is to state whether:
- The load test is validated, with all results compliant with the target;
- The results are not compliant, which means that the application needs to be fixed, and you will have to rerun exactly the same stress test to validate your fixes;
- Or, possibly, the results are compliant but with some minor issues.
When a test is non-compliant, it is important to keep all the information about the failed test, in order to verify that future fixes really solve the detected issue.
Some words on tools
Gatling
I will not pretend to know every performance testing tool out there, but nowadays my favorite tool for web testing is Gatling.
Here are the main reasons:
- It is an open source tool.
- It can be launched with Maven or Gradle:
mvn gatling:test
- Default results are generated as a good-looking HTML report that you can share with managers or clients.
- You can quickly launch your first test with a few lines of code, using the default options (see the sketch after this list).
- There is a recorder that you can use as a transparent proxy to your website, and that generates the code for the scenarios you record.
- Gatling's methods and classes are designed for stress testing: scenarios, simulations, CSV feeders, etc.
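For example, here is what a first test with a few lines of code can look like; the URL is a placeholder, everything else uses the default options:

import io.gatling.core.Predef._
import io.gatling.http.Predef._

class QuickStartSimulation extends Simulation {

  val scn = scenario("Quick start")
    .exec(http("home").get("https://myapp.example.com/"))

  // 10 users at once, default protocol and report settings
  setUp(scn.inject(atOnceUsers(10)))
}

Running mvn gatling:test with this class on the classpath produces the HTML report mentioned above.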
My biggest problem with Gatling is that it uses Scala. I had never used Scala before, and few people around me knew the language.
Apache bench
If you only need to stress test a single URL, you can have a look at Apache Bench. Its usage is very simple:
ab -n 100 -c 10 "http://mypage.com"
The above command launches a total of 100 requests with a concurrency of 10 clients (users) against http://mypage.com.
Data set
One of the difficulties with performance tests is building a consistent data set for the tests.
When possible, the easiest way is to use random data. For example, if you need a cardinal point, you can pick one randomly from {South, North, East, West}. But sometimes you need more complicated data, like names or addresses.
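A minimal sketch of that cardinal point example:

import scala.util.Random

object RandomData extends App {
  val cardinalPoints = Vector("North", "South", "East", "West")

  def randomCardinalPoint(): String =
    cardinalPoints(Random.nextInt(cardinalPoints.length))

  println(randomCardinalPoint()) // e.g. "East"
}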
For “classical” data like names, postal addresses, etc., you can take some from open data repositories.
For example, for French names I used the CSV file gathering the names of French deputies. Many different kinds of data are published on open data repositories and can be used to build test data sets with a bit of imagination.
You can also have a look at data generator websites such as Fake Name Generator.
Conclusion
If you can only keep two things in mind, I would like you to remember these:
Stress tests are everybody’s business!
AND
Stress test preparation begins at the beginning of the project
Have fun :)
Image credits: Pixabay