What kind of test do I need for my software?

Perfect Testing in Software

Introduction

Most software companies run into the same dilemma. To write software that works, you want to write some tests. The question for many of these companies is: what kind of tests are most efficient for my business?

Here we talk about seven common test practices in the software industry.

1. Unit Testing

Probably the most used form of testing is Unit Testing. Unit Tests are easy to write and very effective at discovering bugs. If you want to verify correctness, this is the first kind of test you need.

Why is it easy to write?

In most cases, Unit Tests are written against a fairly small amount of code. This makes it easy for a programmer to know exactly what is happening and to fix any potential issue as it arises.

Such tests are also likely to be small since they verify a small amount of code.
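As a minimal sketch of how small such a test can be, here is a hypothetical function, clamp_percent(), with its unit test. The function and its name are assumptions made up for this illustration, and the test uses plain assert() rather than any specific test framework.

```cpp
#include <cassert>

// Hypothetical function under test: clamp a value to the range 0..100.
int clamp_percent(int value)
{
    if(value < 0)
    {
        return 0;
    }
    if(value > 100)
    {
        return 100;
    }
    return value;
}

// The unit test: one assertion per branch, plus the two boundaries.
void test_clamp_percent()
{
    assert(clamp_percent(-5) == 0);     // below the range
    assert(clamp_percent(0) == 0);      // lower boundary
    assert(clamp_percent(42) == 42);    // inside the range
    assert(clamp_percent(100) == 100);  // upper boundary
    assert(clamp_percent(250) == 100);  // above the range
}
```

The test is barely longer than the function it verifies, and every branch of the function is executed at least once.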

Full Coverage

At Made to Order Software Corp., we like to write enough unit tests to cover 100% of our code. Full Coverage doesn't mean the code works exactly as expected, but it greatly increases the chance that it does.

Verify All Error Cases

Except for a few obscure system errors which are difficult to trigger with simple data, we are most often able to test all the possible error cases in our software. Running a coverage test lets you see whether all potential errors were checked. If not, continue to write unit tests until they are.

In our case, we use the LCOV_EXCL_LINE marker on a few error cases so our tests report 100% coverage. If such an error case ever triggers, we will know because our tests do not expect it (especially if it is a C++ exception). Such an error could be an I/O error. Our hard drives work just fine, so receiving an I/O error during a test is very unlikely. The only way we could trigger one would be by mocking many of the system functions, which is not very productive.
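The sketch below shows where such a marker could go. The read_first_line() helper is hypothetical and only serves this illustration. The "cannot open" error is easy to reach from a unit test, while the I/O error line carries the LCOV_EXCL_LINE comment so lcov leaves it out of the coverage report.

```cpp
#include <cassert>
#include <cstdio>
#include <stdexcept>
#include <string>

// Hypothetical helper: return the first line of a text file
// (including its trailing newline, if any).
std::string read_first_line(std::string const & filename)
{
    std::FILE * f(std::fopen(filename.c_str(), "r"));
    if(f == nullptr)
    {
        // easy to trigger from a unit test: use a path that does not exist
        throw std::runtime_error("cannot open \"" + filename + "\"");
    }

    char buf[256];
    char const * line(std::fgets(buf, sizeof(buf), f));
    bool const io_error(std::ferror(f) != 0);
    std::fclose(f);

    if(io_error)
    {
        // would need a failing device or mocked system calls to trigger
        throw std::runtime_error("I/O error reading \"" + filename + "\""); // LCOV_EXCL_LINE
    }

    return line == nullptr ? std::string() : std::string(line);
}
```

A unit test can cover the first throw by passing a file name that does not exist. The second throw stays untested, and if it ever fires, the unexpected exception makes the failure visible.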

2. Microbenchmark

To verify how fast your code is, you most often write microbenchmark tests. Such tests give you the current execution time and let you see whether various optimizations actually reduce that execution time. If not, it's not a useful optimization. If yes, then keep that change, for sure.

A microbenchmark is so called because it verifies a very small amount of code. In that sense, it is very similar to a Unit Test.
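As a rough sketch, a microbenchmark can be as simple as timing one function several times with std::chrono::steady_clock and keeping the fastest run. The sum_values() function and the best_time_ns() helper are hypothetical names chosen for this example. Dedicated benchmarking libraries do considerably more (warm-up, statistics, protection against compiler optimizations).

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical function being measured.
std::int64_t sum_values(std::vector<int> const & values)
{
    return std::accumulate(values.begin(), values.end(), std::int64_t(0));
}

// Run sum_values() `runs` times and return the fastest run in nanoseconds.
// Returns -1 when `runs` is zero or negative.
std::int64_t best_time_ns(std::vector<int> const & values, int runs)
{
    // storing the result in a volatile makes it less likely that the
    // compiler optimizes the measured call away
    volatile std::int64_t sink(0);
    std::int64_t best(-1);
    for(int i(0); i < runs; ++i)
    {
        auto const start(std::chrono::steady_clock::now());
        sink = sum_values(values);
        auto const stop(std::chrono::steady_clock::now());
        std::int64_t const elapsed(
            std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count());
        if(best < 0 || elapsed < best)
        {
            best = elapsed;
        }
    }
    (void) sink;
    return best;
}
```

Keeping the fastest of several runs is one simple way to reduce the influence of whatever else is running on the machine. It does not remove that influence.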

There are, however, issues with continuously running microbenchmarks. Each time you change the hardware, you are likely to get different results. Also, depending on what is running alongside your software (e.g. a browser, a drawing application, 3D rendering, a video game, etc.), you may end up with quite different results. So verifying that a function continues to run fast with a microbenchmark test is a complex matter when the timings are very small (tenths of a millisecond).

A better way if you want to time things is to write a macrobenchmark (see below).

3. Integration Testing in Production

With the advent of the Internet, we started to see testing happening in production. After all, some issues really only present themselves when running the application in production so you kind of have to test the system that way too.

The production system is always ever so slightly different from the development or even the test environment, so some errors can happen on it that are not reproducible on any other machine.

At Made to Order Software Corp. we created the libexcept library to help in that matter. The library captures the stack when an exception occurs. All our software uses exceptions only when something is really wrong and returning a simple true/false error would not be sufficient. This means we know that the application is in a bad place and needs to stop. Having a full stack trace, even in production, helps us debug the issue quickly since we know exactly where the bug was discovered.

Note that most of the languages used on the Internet offer a stack trace on an error (Python, PHP, Go, Ruby, JavaScript, etc.). In languages such as C++, however, this is not the default behavior of the environment. We change that with our library. Note that the library also captures signals such as the infamous SEGV.
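To give an idea of the principle, here is a minimal sketch of an exception that records the stack at the point where it is constructed. This is not the libexcept API: the traced_error class is a hypothetical name, and the code assumes a Linux system with glibc, which provides backtrace() and backtrace_symbols() in <execinfo.h>.

```cpp
#include <cassert>
#include <cstdlib>
#include <execinfo.h>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical exception type, for illustration only.
class traced_error : public std::runtime_error
{
public:
    explicit traced_error(std::string const & what)
        : std::runtime_error(what)
    {
        // capture up to 64 return addresses of the current call stack
        void * frames[64];
        int const count(backtrace(frames, 64));

        // convert the addresses to printable strings
        char ** symbols(backtrace_symbols(frames, count));
        if(symbols != nullptr)
        {
            for(int i(0); i < count; ++i)
            {
                f_stack.push_back(symbols[i]);
            }
            std::free(symbols);  // one free() releases the whole array
        }
    }

    std::vector<std::string> const & stack() const
    {
        return f_stack;
    }

private:
    std::vector<std::string> f_stack;
};
```

A top level catch block can then write e.what() and each entry of e.stack() to the logs before the application stops. Function names only appear in the output when the symbols are available, for example when linking with -rdynamic.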

4. Benchmarking in Production

Although I don't personally recommend Integration Testing in Production (3) above, benchmarking in production is a good idea and it can give you a lot of information on how your system can be improved.

In this case, you want to use a tool that gives you execution timings, such as a profiler. In most cases, running a profiler on your code slows it down, but today's CPUs have many features that make profiling very efficient, so it does not bog down your software by much at all. So using it is a good idea.

The main reason why benchmarking in production is a good idea is that production is where your software runs under real conditions, so it is the ultimate place to run such tests.

One possible solution to avoid a slowdown for the end customer is to mirror the incoming traffic. This means sending the client's requests to two systems: (1) the real production system and (2) a fake production system (a form of replica) which also runs the profiler. That mirror system can also be used for the Integration Testing in Production (3) described above. The reply from the mirror can be dropped so it does not affect the client. The one issue with a mirror system is that all the dependencies need to be replicated. For example, if you use microservices, you must make sure the mirror does not use the production microservices, otherwise it will affect your production environment. Imagine you hit an SQL database: now you have a production and a mirror system that both hit the same SQL database, so it is much more likely that the database will be negatively impacted.
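The mirroring idea can be sketched in a few lines. Everything here is a simplifying assumption: handler_t and handle_with_mirror() are hypothetical names, and both systems are modeled as in-process functions, whereas a real setup would usually duplicate the traffic at the proxy or load balancer level and send it to the mirror asynchronously.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Hypothetical handler type: takes a request, returns a reply.
using handler_t = std::function<std::string(std::string const &)>;

// Send the request to both systems; only the production reply
// is returned to the client.
std::string handle_with_mirror(
      std::string const & request
    , handler_t const & production
    , handler_t const & mirror)
{
    std::string const reply(production(request));
    try
    {
        // the mirror processes the same request (e.g. under a profiler);
        // its reply is dropped on purpose
        mirror(request);
    }
    catch(...)
    {
        // a failure in the mirror must never affect the client
    }
    return reply;
}
```

In this sketch the mirror is called after production and in the same thread, so it still adds latency. That is the part a real deployment would move out of the client's path.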

5. End to End Integration Testing

Above we mentioned the Production version of Integration Testing. Nowadays, since we have applications on the Internet, Production Testing is one of the most efficient ways to verify that your software is fast and correct. However, it has all sorts of drawbacks: it can bog down your application, it may affect your data, etc.

To avoid potential issues with your users, or if your application is related to finance, health, aviation, military equipment, etc., using a production environment is not going to be possible. In those cases, and when you prefer to have more coverage of the code in your testing, having a full End to End integration test is preferable.

This means building a complete environment with all your services running as if you were in full production. The idea is simple: send requests as if you were a client and see that the results are exactly what is expected. So just like a Unit Test, only with your entire software stack.

The huge cost of running such tests is the maintenance, especially in today's environments which evolve very quickly. Often we have had 2 or 3 iterations testing this or that library, switching very quickly. This breaks 10% of the Integration Tests, and that's a lot of work to make the tests pass again.

Overall, though, this is the only way to verify the correctness of your entire software. Such a test can also be used to benchmark the application (see (6) below).

6. End to End Benchmark Testing

An interesting aspect to having an End to End Test is that it can be used for three purposes:

Functionality, make sure that the entire application does what it is expected to do
Correctness, like Unit Tests, but for the entire application
Benchmarking, to make sure the application is running at the expected speed

So the cost of building an End to End Test can, in the end, be very small compared to the other methods. However, you should still not skip Unit Tests, and you should at least do some Microbenchmarking from time to time.

The main issue with the End to End Test is to keep it running. One aspect of that is giving all your developers and support staff full access to that test so they can verify whatever issue has been reported on the production application. This way, you really have a test which is useful to the entire company. The End to End Test should also run on similar hardware and in a similar third party software environment (e.g. don't use MySQL in production and PostgreSQL in your test; if your software is expected to support both database systems, then you need two test environments).

7. Macrobenchmark Testing

Above, we mentioned Microbenchmarks, which apply to a low level function to make sure that this one function performs at a decent speed.

On the other hand, a Macrobenchmark runs against your application as a whole and verifies that each step, and the total amount of time it takes the application to reply, remains relatively constant. However, you have to keep in mind that the results of a Macrobenchmark have much more variance (they fluctuate more) than those of a Microbenchmark.

One thing you need to measure in your Macrobenchmark (which can be difficult to accomplish) is the time spent waiting on various I/O or Network calls. For example, reading a file from an HDD may be fast enough in your situation. In others, it may require having the file in memory. Similarly, generating a log message may be really fast when you just need to write it to your local SSD but take much longer if you want to send it to a third party computer via the network. In both cases, though, it may be useful to use a separate thread to manage the logs so the main application does not need to wait on any I/O or Network call, which reduces the latency of managing a log message to a minimum. However, having a separate thread may not be enough if you generate more logs than the thread can handle. In that case, throttling can be used, assuming the larger amount is only temporary; otherwise sampling must be used. Sampling means some of the log messages are dropped to avoid overloading the network or filling your storage device.
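The dropping of log messages can be sketched with a bounded queue sitting between the main application and the logging thread. The log_queue class below is hypothetical and uses the simplest possible policy (drop the new message when the queue is full). It leaves out the thread and the locking a real implementation would need, to keep the example short.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>

// Hypothetical bounded log queue; a logging thread would pop from it.
// Not thread safe as written: a real version needs a mutex around
// push() and pop().
class log_queue
{
public:
    explicit log_queue(std::size_t capacity)
        : f_capacity(capacity)
    {
    }

    // Called by the application; returns false when the message
    // was dropped because the queue is full.
    bool push(std::string const & message)
    {
        if(f_messages.size() >= f_capacity)
        {
            ++f_dropped;
            return false;
        }
        f_messages.push_back(message);
        return true;
    }

    // Called by the consumer; returns false when the queue is empty.
    bool pop(std::string & message)
    {
        if(f_messages.empty())
        {
            return false;
        }
        message = f_messages.front();
        f_messages.pop_front();
        return true;
    }

    // Number of messages dropped so far.
    std::size_t dropped() const
    {
        return f_dropped;
    }

private:
    std::size_t             f_capacity;
    std::size_t             f_dropped = 0;
    std::deque<std::string> f_messages;
};
```

The application never waits on the queue: push() either stores the message or drops it right away. Reporting the dropped() counter in your Macrobenchmark tells you how often the logging side could not keep up.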

So what is the best?

To answer this question you really need to know what is important to your business.

In terms of costs, it will be cheaper to start with Unit Tests and Microbenchmark Tests.

In terms of correctness, you want at least full coverage from your Unit Tests.

In terms of usability, Production Testing by your users is generally enough, assuming your users are accessible and don't mind filling out surveys.

In terms of effectiveness, it is best to use Production Testing, with the drawbacks mentioned above.

In terms of overall correctness and effectiveness, an End to End Test is best, but it is one of the most costly (the Macrobenchmark Tests are usually harder to write and thus cost more).

As we have seen with the advent of Open Source, the more people use your code, the more likely you are to quickly hear back about bugs. This is very much the same as Production Testing. It is really cheap for an Open Source project to be available for people to test on their own machines without you having to spend any time writing any tests at all.