Efficient Test Data Management

One of the cornerstones of high software quality is testing. But how is the quality of the tests themselves ensured? High-quality tests start with the test design, i.e. the description of the test cases that serves as the basis for test execution. A clearly defined initial state of the system under test is also essential. Only in combination with the right test data can test cases be described and executed realistically, and thus with high quality.

What Is Test Data And Why Do You Have To Manage It?

Test data includes all data that must be available before a test case can be executed. It consists of the data that must exist in the system as a prerequisite, the input values of the test steps and the expected results. The completeness and availability of the test data are essential for efficient testing: if there is no suitable data in the system, the test case cannot be executed at all. A key requirement for test data is that it is as realistic as possible. The task of test data management is to ensure that all required test data is available each time a test case is executed. The activities required for this are planning, control, specification, conception, provision, reporting and archiving.

Strategies For Test Data Generation

For the activities of specification, creation and provision of test data, there are two basic strategies: using production data or using synthesized data. The main advantage of production data is that real test data is available with little effort. The main problem is finding suitable data sets that have the characteristics required by the test cases. The obvious prerequisite for using production data is that a production version of the system already exists.

For a new development, at best production data from similar systems can be used. To make production data more usable, it can be prepared before use: masking, pseudonymization, shifting dates and adapting individual data records to the characteristics required by test cases all help to counteract the problems described above. Synthesized data, by contrast, consists of artificially generated data sets. The options for generating such data sets range from random generation through the use of technical algorithms to completely manual specification. Because synthesized data contains no real personal data, it is also unproblematic in terms of data protection.
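The preparation steps named above (masking, pseudonymization and changing dates) can be sketched roughly as follows. This is only a minimal illustration: the record fields, the function names and the key are assumptions made for the example, not part of any particular system.

```python
import hashlib
import hmac
from datetime import date, timedelta

# Hypothetical secret; in practice it would be kept outside the test data stock.
SECRET_KEY = b"example-key-not-for-real-use"


def pseudonymize(value: str, prefix: str) -> str:
    """Replace a value with a stable pseudonym derived from a keyed hash."""
    digest = hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"{prefix}-{digest[:10]}"


def mask_iban(iban: str) -> str:
    """Keep the country code and the last four characters, mask the rest."""
    return iban[:2] + "*" * (len(iban) - 6) + iban[-4:]


def shift_date(d: date, days: int) -> date:
    """Move a date by a fixed offset so that relative intervals are preserved."""
    return d + timedelta(days=days)


def prepare_record(record: dict, day_offset: int = 30) -> dict:
    """Return a copy of a production record that is prepared for use in tests."""
    return {
        "customer_id": pseudonymize(record["customer_id"], "CUST"),
        "name": pseudonymize(record["name"], "NAME"),
        "iban": mask_iban(record["iban"]),
        "contract_start": shift_date(record["contract_start"], day_offset),
    }
```

Because the pseudonym is derived from a keyed hash, the same input always maps to the same pseudonym, so references between records remain consistent after preparation.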

The Best Of Both Worlds

The process presented here focuses on making the specification, creation and provision of test data as easy as possible. A test case specifies the test data it requires using only the characteristics that are relevant to it. All other data that the test execution does not need in any specific form is set automatically in the background using technically sensible default values.
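One possible way to express "defaults in the background, relevant characteristics in the test case" is sketched below. The `Customer` entity, its fields and the helper `a_customer` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, replace
from datetime import date


@dataclass(frozen=True)
class Customer:
    """Hypothetical entity; every field has a sensible default value."""
    name: str = "Maria Example"
    birth_date: date = date(1985, 6, 15)
    country: str = "DE"
    credit_limit: int = 5000


DEFAULT_CUSTOMER = Customer()


def a_customer(**characteristics) -> Customer:
    """Create a customer, overriding only the characteristics a test names."""
    return replace(DEFAULT_CUSTOMER, **characteristics)


# A test about credit limits states only the value it cares about.
customer = a_customer(credit_limit=0)
```

In this sketch, a misspelled or unknown characteristic raises a `TypeError` instead of being silently ignored, which helps keep test case specifications and the data model in sync.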

The form of specification is chosen so that everyone involved speaks a uniform, technical language. The process is based on a mixture of synthesized and production data. Specifically, where relevant production data already exists in existing systems, it is transferred once to the test data stock, taking data protection aspects into account. This procedure is suitable for data that test cases do not specify with concrete characteristics, or that forms a closed set. Part of the test data is therefore not only as realistic as possible, but reflects reality exactly.

Generation Of Test Data

Generating the test data that test cases require with specific characteristics is much more complex. Its specification takes place in two steps. In the first step, exemplary, realistic data sets are specified together with the customer and analysts. The best results are achieved when the customer interacts intensively with the system. In the second step, the characteristics required in the test data are specified for each test case. When the test data is created, these characteristics override the previously set default values.

Since test cases also check boundary conditions, these characteristics do not always have to be realistic. They are, however, embedded in an otherwise realistic data set. What is still missing is a way to specify the test data characteristics in a uniform, technical language. To do this, the test data must be reduced to the essentials. In summary, the process is as follows: First, it is determined which part of the required data can be taken from production data and which part has to be generated synthetically. Technically meaningful default values are then specified and stored for the test data to be generated synthetically.

Technical entities are made available to the user to generate the test data. Finally, relevant characteristics are specified for each test case, which override the default values.

The Next Step

In the second step of the synthesis, all data relevant to the test case is specified. If the name of a customer is relevant for a test case, for example to check the display of special characters, the test case explicitly specifies this name and thus overrides the stored default value. Implementing the described test data generation requires an intensive examination of the technical domain model, the analysis and the technical data schema of the development.

The implementation of the presented test data generation is made available to users in the form of an API. Technologically, this API is integrated into development and test automation. Using the same programming language for both makes it possible to program the test data generation and the test procedure consistently.
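How such an API might sit inside an automated test, written in the same language as the test itself, is sketched below. `CustomerDataApi`, `render_greeting` and the default values are assumptions made for this example; the in-memory store stands in for whatever system actually holds the data.

```python
import unittest

# Hypothetical default values for a customer entity.
CUSTOMER_DEFAULTS = {"name": "Maria Example", "country": "DE", "active": True}


class CustomerDataApi:
    """Hypothetical test data API backed by an in-memory store."""

    def __init__(self):
        self._customers = {}
        self._next_id = 1

    def create_customer(self, **characteristics) -> int:
        """Store a customer built from defaults plus overrides, return its id."""
        unknown = set(characteristics) - set(CUSTOMER_DEFAULTS)
        if unknown:
            raise ValueError(f"unknown characteristics: {sorted(unknown)}")
        customer_id = self._next_id
        self._next_id += 1
        self._customers[customer_id] = {**CUSTOMER_DEFAULTS, **characteristics}
        return customer_id

    def get_customer(self, customer_id: int) -> dict:
        """Return a copy of the stored customer."""
        return dict(self._customers[customer_id])


def render_greeting(customer: dict) -> str:
    """Stand-in for the system under test."""
    return f"Hello, {customer['name']}!"


class GreetingTest(unittest.TestCase):
    def setUp(self):
        self.data = CustomerDataApi()

    def test_special_characters_are_displayed(self):
        # Only the name matters here, so only the name is specified.
        customer_id = self.data.create_customer(name="Zoë Müller-O'Brien")
        customer = self.data.get_customer(customer_id)
        self.assertEqual(render_greeting(customer), "Hello, Zoë Müller-O'Brien!")
        self.assertEqual(customer["country"], "DE")  # default stays untouched
```

The test reads as a statement of what matters for the case (a name with special characters), while the API fills in everything else.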

The implementation of the test data generation belongs in source code management, next to the production code. This ensures that its versioning matches the respective production status. At the same time, archiving test databases becomes unnecessary, since all test data can be reproduced at any time.
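Reproducibility of this kind presupposes that generation is deterministic. Where random generation is involved, one common way to achieve this is a fixed seed, as in the following sketch; the name lists and the function `generate_customers` are illustrative assumptions.

```python
import random

FIRST_NAMES = ["Anna", "Ben", "Clara", "David"]
LAST_NAMES = ["Meyer", "Schmidt", "Fischer", "Weber"]


def generate_customers(count: int, seed: int) -> list[dict]:
    """Generate the same list of customers for the same seed."""
    rng = random.Random(seed)  # local generator, independent of global state
    return [
        {
            "id": index + 1,
            "name": f"{rng.choice(FIRST_NAMES)} {rng.choice(LAST_NAMES)}",
            "credit_limit": rng.randrange(0, 10001, 500),
        }
        for index in range(count)
    ]
```

With the seed stored in version control alongside the generator, the same data set can be recreated on demand. Note that this holds for a given Python version; the output of methods such as `choice` is not guaranteed to stay identical across versions.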
