What Is Test Data Management? | Importance & Best Practices
Test data is important, but test data management is even more important. Do you know why? We have figures for you.
The IBM Big Data and Analytics Hub website cited a case study in which a US insurance company estimated that 15% of its testing effort went into test data collection alone, for both the back-end and the front-end systems.
To quote the study, “For every USD 14 million delivery by the software development and QA team, a hidden USD 3 million was being spent on data management. All data management tasks included moving data from back-end systems to identifying test data, data masking of sensitive data, skipped production defects due to unavailability of correct test data, manipulation of data for different scenarios, and storage of test data.” Test data management had become a major problem for the company and had to be solved. The complete process was reviewed and evaluated, and a new test data management process was finally implemented. This helped the insurance company save USD 400,000 a year in testing costs.
The example above clearly shows the importance of, and the need for, proper Test Data Management (TDM), also known as Software Test Data Management.
Table of Contents
- 1 What is Test Data?
- 2 What is Test Data Management?
- 3 Create Test Data
- 4 Steps for Test Data Management
- 5 Why is TDM Important?
- 6 What are the Benefits of Test Data Management?
- 7 Best Practices for Test Data Management
- 8 Common Test Data Challenges
- 9 Conclusion
- 10 Suggested Reading
What is Test Data?
According to Wikipedia, “Test data is data which has been specifically identified for use in tests, typically of a computer program.” The test data a testing team needs to test an application can be of two types:
1. Static data: data that does not change after it is recorded. It usually consists of non-sensitive values such as city names, PIN codes, etc.
2. Dynamic (transactional) data: data that can change after it is recorded. It often includes sensitive values such as a client's medical history, the number of employees, etc.
For testing purposes, a mix of static and dynamic data is usually necessary. Data can come in different formats and types and can live in different databases. Testing may require data from different sources, depending on the specific requirements of the Application Under Test (AUT).
Most of the data used for testing is production data, because it covers the full range of data an application may encounter in a live environment.
Now, imagine a scenario where transactional data containing credit card numbers, mobile numbers, and bank login credentials is used for testing purposes.
If such critical, high-risk data is misused, legal action by customers is almost certain. The breach would cost the bank not only money but also its customers' trust, and could eventually cause catastrophic damage to its business.
So how do you test a business-critical banking application in such a case, when you cannot use raw production data, yet unrealistic test data will lead to serious production defects?
The answer is data masking.
We use the production data only after masking (hiding) the sensitive information. Masking is part of TDM (Test Data Management), whose aim here is to keep sensitive production data separate from the test data.
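As a rough illustration, here is a minimal Python sketch of masking one production record before it is copied into a test environment. The field names (`customer_id`, `card_number`, `mobile`, `bank_password`) and the keep-the-last-four-digits rule are assumptions made for illustration, not a prescribed masking policy.

```python
from typing import Dict

# Hypothetical field names; a real schema and masking policy will differ.
KEEP_LAST_FOUR = ("card_number", "mobile")
FULLY_REDACT = ("bank_password",)


def mask_keep_last_four(value: str) -> str:
    """Replace every digit except the last four with 'X', keeping separators."""
    total_digits = sum(ch.isdigit() for ch in value)
    seen = 0
    masked = []
    for ch in value:
        if ch.isdigit():
            seen += 1
            masked.append(ch if seen > total_digits - 4 else "X")
        else:
            masked.append(ch)
    return "".join(masked)


def mask_record(record: Dict[str, str]) -> Dict[str, str]:
    """Return a masked copy of a production record; the input is left untouched."""
    masked = dict(record)
    for field in KEEP_LAST_FOUR:
        if field in masked:
            masked[field] = mask_keep_last_four(masked[field])
    for field in FULLY_REDACT:
        if field in masked:
            masked[field] = "********"
    return masked


prod_row = {
    "customer_id": "C-1001",
    "card_number": "4111 1111 1111 1234",
    "mobile": "9876543210",
    "bank_password": "s3cret",
}
print(mask_record(prod_row))
```

Keeping the last four digits lets testers tell records apart while hiding the rest; whether even that much is acceptable depends on the regulations that apply to your data.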
Let us understand a bit more about test data management (TDM).
What is Test Data Management?
Informatica defines TDM as “the creation of non-production data sets that reliably mimic an organization’s actual data so that system and application developers can perform rigorous and valid system tests.”
In simple terms, test data management (TDM) is the process of planning, designing, storing and retrieving test data. TDM ensures that test data is of high quality, in the appropriate quantity and the proper format, and available in time to meet testing needs.
Create Test Data
There are three approaches to creating test data:
1. Copy Production Data
i. The actual production databases are copied or cloned in this approach.
ii. Due to the large size of the production database, it is a time-consuming process.
iii. It creates a dependency on the production environment: the testing and development teams cannot create the test data themselves.
iv. It is a high-risk process because customers' sensitive data is at stake. If a data breach happens, legal proceedings can badly hurt the business.
2. Synthetic Test Data Generation
i. A database administrator (DBA) writes and runs SQL queries against the database tables to create the required test data.
ii. The DBA's expertise is crucial: extensive knowledge of the database, its schema and its relationships is required.
iii. It is time-consuming, because writing the queries and running them on the database takes time.
iv. The DBA also needs to add all the negative and boundary-value conditions to the test data.
3. Data Subset Creation
i. Unlike the cloning approach, only selected subsets of the production database are copied, not the whole database.
ii. This approach saves time, because only a subset of the data is copied.
iii. Skilled people are required to decide what data should be copied.
iv. Data masking is an important step in data subset creation. Sensitive data is masked to rule out any mishandling.
v. Data subset creation is the most widely used data creation approach in the test data management process. The other two approaches are usually avoided because of the cost involved and the sensitivity of the data.
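The negative and boundary-value conditions mentioned under synthetic test data generation (point 2.iv) can be sketched in Python. This example assumes a hypothetical `transfers` table whose `amount` must lie between 1 and 10,000, and it uses an in-memory SQLite database (Python's built-in `sqlite3` module) in place of the real DBMS. The `CHECK` constraint stands in for the application's validation rule.

```python
import sqlite3
from typing import List, Tuple


def boundary_values(lo: int, hi: int) -> List[Tuple[int, bool]]:
    """Values at, just inside and just outside each edge of the inclusive
    range [lo, hi], paired with whether the system should accept them."""
    candidates = [lo - 1, lo, lo + 1, hi - 1, hi, hi + 1]
    return [(value, lo <= value <= hi) for value in candidates]


# Hypothetical rule: a transfer amount must be between 1 and 10,000.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transfers (id INTEGER PRIMARY KEY, "
    "amount INTEGER NOT NULL CHECK (amount BETWEEN 1 AND 10000))"
)

results = []
for value, should_accept in boundary_values(1, 10_000):
    try:
        conn.execute("INSERT INTO transfers (amount) VALUES (?)", (value,))
        accepted = True
    except sqlite3.IntegrityError:  # raised when the CHECK constraint fails
        accepted = False
    results.append((value, should_accept, accepted))
    print(f"amount={value:>6} expected={'accept' if should_accept else 'reject'} "
          f"actual={'accept' if accepted else 'reject'}")
conn.commit()
```

The rows the database rejects are exactly the negative cases. In a real project the same values would more often be pushed through the application's own interfaces than inserted straight into a table.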
Steps for Test Data Management
1. Analysis of Data Requirement
Test data may be needed on different interfaces of the application, and the format and type of data may differ between these interfaces.
So, the first step is to understand the organization's data requirements based on the test cases that will be run. This requires knowledge of the domain, the business, and all the applications involved in the end-to-end process.
For example, a banking system will have CRM software and a financial application for transactions, coupled with messaging systems for SMS and OTP. The person analyzing the test data requirements should have expertise in the banking domain, in CRM and financial applications, and in messaging systems.
2. Data Subset Creation
As seen above, this is the most widely used data creation technique. Subsets of the real production data are copied so that, together, they cover all the test data requirements.
The accuracy, uniqueness, consistency and referential integrity of the test data must all be preserved while copying. Data for boundary-value and negative testing is also created, by modifying the subsets or adding data to them.
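Preserving referential integrity during subsetting mainly means following foreign keys: if a customer is copied, its accounts and their transactions must come along, and no copied row may point at a parent that was left behind. The Python sketch below shows that idea on hypothetical in-memory tables (`customers`, `accounts`, `transactions`) and an assumed selection rule (one branch); a database-backed version would follow the same foreign keys.

```python
from typing import Dict, List

Row = Dict[str, object]

# Hypothetical production tables: customers -> accounts -> transactions.
customers: List[Row] = [
    {"customer_id": 1, "branch": "North"},
    {"customer_id": 2, "branch": "South"},
    {"customer_id": 3, "branch": "North"},
]
accounts: List[Row] = [
    {"account_id": 10, "customer_id": 1},
    {"account_id": 11, "customer_id": 2},
    {"account_id": 12, "customer_id": 3},
    {"account_id": 13, "customer_id": 3},
]
transactions: List[Row] = [
    {"txn_id": 100, "account_id": 10},
    {"txn_id": 101, "account_id": 11},
    {"txn_id": 102, "account_id": 13},
]


def subset_by_branch(branch: str) -> Dict[str, List[Row]]:
    """Copy one branch's customers plus every row that references them,
    walking the foreign keys from parent to child."""
    kept_customers = [c for c in customers if c["branch"] == branch]
    customer_ids = {c["customer_id"] for c in kept_customers}
    kept_accounts = [a for a in accounts if a["customer_id"] in customer_ids]
    account_ids = {a["account_id"] for a in kept_accounts}
    kept_txns = [t for t in transactions if t["account_id"] in account_ids]
    return {"customers": kept_customers, "accounts": kept_accounts,
            "transactions": kept_txns}


def orphans(subset: Dict[str, List[Row]]) -> List[str]:
    """List child rows whose parent is missing from the subset (should be empty)."""
    customer_ids = {c["customer_id"] for c in subset["customers"]}
    account_ids = {a["account_id"] for a in subset["accounts"]}
    problems = [f"account {a['account_id']}" for a in subset["accounts"]
                if a["customer_id"] not in customer_ids]
    problems += [f"transaction {t['txn_id']}" for t in subset["transactions"]
                 if t["account_id"] not in account_ids]
    return problems


north = subset_by_branch("North")
print({table: len(rows) for table, rows in north.items()}, orphans(north))
```

Running an orphan check like this after every subset copy is a cheap way to catch a missed relationship before testers hit it as a confusing failure.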
3. Data Masking
Because we are dealing with sensitive production data, it is essential to hide customer data such as medical history, bank login information, phone numbers, credit/debit card details, etc. Any failure to protect sensitive data can lead to compliance and regulatory issues.
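One common masking technique (named plainly here; the article does not prescribe it) is deterministic pseudonymization with a keyed hash: the same original value always maps to the same token, so tables that share that value still line up after masking. The sketch below uses Python's standard `hmac` and `hashlib` modules; the key, the table layouts, the `PHONE` prefix and the 12-character token length are assumptions.

```python
import hashlib
import hmac

# Hypothetical key: in practice it comes from a secrets store and is never
# copied into the test environment together with the masked data.
MASKING_KEY = b"replace-with-a-secret-key"


def pseudonymize(value: str, prefix: str = "ID") -> str:
    """Deterministically replace a sensitive value with an opaque token.

    The same input always gives the same token, so rows in different tables
    that share the value still match after masking."""
    digest = hmac.new(MASKING_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"{prefix}-{digest[:12]}"


customers = [{"customer_no": 1, "phone": "+1-555-0100"}]
sms_log = [{"message_id": 7, "phone": "+1-555-0100"}]

masked_customers = [{**row, "phone": pseudonymize(row["phone"], "PHONE")} for row in customers]
masked_sms_log = [{**row, "phone": pseudonymize(row["phone"], "PHONE")} for row in sms_log]
print(masked_customers, masked_sms_log)
```

The key matters because phone numbers are easy to enumerate: with a plain unkeyed hash, anyone could rebuild the mapping by hashing candidate numbers. Truncating the digest to 12 hex characters keeps tokens short but makes collisions possible in very large data sets; a longer slice reduces that risk.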
4. Automation and Tools
In TDM, automation can be used for the tasks above: data cloning, data generation and data masking. Done manually, all these steps are time-consuming and error-prone, because of the large volumes of data involved.
Teams can write their own automation scripts or use licensed test data management tools such as Informatica, Delphix, DATPROF, etc. Advanced tools also provide reporting that helps the organization make better decisions about test data.
5. Maintenance and Refresh
Test data is kept in a central repository with rules for access and privileges. It needs a periodic refresh so that it reflects the latest and most relevant data. If multiple modules in a project use the same test data repository, a properly managed refresh cycle is a necessity.
Along with data refresh, the maintenance of the repository is also very important. Over a period of time, the test data may become obsolete or redundant. There has to be proper maintenance of the test data to keep it consistent, correct and available over time.
Otherwise, stale data occupies unnecessary storage space in the repository, and finding relevant test data may take longer than expected.
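A refresh cycle and an obsolescence rule can be expressed as a small maintenance check over the repository's catalogue. In this Python sketch the `DataSetEntry` record, the 30-day refresh interval and the 180-day retirement rule are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Tuple

# Assumed policies; real refresh and retention periods are project-specific.
REFRESH_EVERY = timedelta(days=30)
RETIRE_IF_UNUSED_FOR = timedelta(days=180)


@dataclass
class DataSetEntry:
    """One test data set registered in the central repository."""
    name: str
    last_refreshed: date
    last_used: date


def maintenance_plan(entries: List[DataSetEntry], today: date) -> Tuple[List[str], List[str]]:
    """Split entries into (to_refresh, to_retire).

    Data sets unused for longer than the retention period are retired rather
    than refreshed; the rest are refreshed once their refresh cycle elapses."""
    to_retire = [e.name for e in entries if today - e.last_used >= RETIRE_IF_UNUSED_FOR]
    to_refresh = [e.name for e in entries
                  if e.name not in to_retire and today - e.last_refreshed >= REFRESH_EVERY]
    return to_refresh, to_retire


repo = [
    DataSetEntry("payments_regression", date(2024, 5, 20), date(2024, 5, 31)),
    DataSetEntry("loans_smoke", date(2024, 3, 1), date(2024, 5, 15)),
    DataSetEntry("legacy_cards", date(2023, 9, 1), date(2023, 10, 1)),
]
print(maintenance_plan(repo, today=date(2024, 6, 1)))
```

Retiring unused data sets before refreshing avoids spending refresh time on data that nobody will look at again.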
Why is TDM Important?
Having a dedicated test data management team and a systematic TDM process in place has immense benefits for the organization and the customer.
Below are the points which depict the importance of TDM.
1. Increased Test Data Coverage:
TDM provides traceability from test data to test cases, and from test cases to requirements. This gives a bird's-eye view of test data coverage and defect patterns.
2. Cost Reduction by Finding the Bugs Early:
With better test data coverage, and the clearer picture that traceability provides, bugs are found earlier and the cost of production fixes is reduced.
3. Data Provision Based on Testing Type:
A key feature of the TDM process is that data is managed in one place. You can extract the appropriate data from the same repository for different testing types (functional, integration, performance, etc.). This reduces data redundancy and storage costs.
4. Data Compliance and Security:
Governments and regulators impose strict regulations and compliance rules that everyone must follow. Data masking is an integral part of a TDM process, in which data security and compliance are top priorities.
5. Reusability of Data:
Reusability is the most valuable feature of TDM, because it reduces cost even further. Reusable data is identified and archived in a central repository for future use, and testers can draw on the archived data whenever the need arises.
6. To Reduce Copies of the Data:
In a project, multiple teams may each make their own copies of the same production data, resulting in redundant copies and wasted storage space. With TDM, all teams use the same repository, so storage space is used efficiently.
7. Customer’s Trust:
The key advantages of the TDM process are high-quality data and good data coverage. With these in place during the testing phase, bugs are uncovered early. The result is a stable, high-quality application with minimal production defects, and customers' trust in the organization grows when they see these results of adopting a TDM process.
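The traceability in point 1 and the single-repository provisioning in point 3 can be pictured with a small Python sketch. The requirement IDs, test case IDs, data set names and dictionary layout are all hypothetical. The idea is that once data sets are linked to test cases, and test cases to requirements, coverage gaps and per-testing-type lookups become simple queries.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical traceability records held in a central TDM repository.
requirements = ["REQ-1", "REQ-2", "REQ-3"]
test_case_to_requirement = {"TC-1": "REQ-1", "TC-2": "REQ-1", "TC-3": "REQ-2"}
# data set -> (test cases it feeds, testing types it is suitable for)
data_sets: Dict[str, Tuple[Set[str], Set[str]]] = {
    "ds_accounts_small": ({"TC-1", "TC-3"}, {"functional", "integration"}),
    "ds_accounts_bulk": ({"TC-2"}, {"performance"}),
}


def requirements_with_data() -> Set[str]:
    """Requirements reachable from at least one data set via a test case."""
    fed_cases = set().union(*(cases for cases, _ in data_sets.values()))
    return {test_case_to_requirement[tc] for tc in fed_cases
            if tc in test_case_to_requirement}


def uncovered_requirements() -> List[str]:
    """Requirements that no data set currently serves."""
    return sorted(set(requirements) - requirements_with_data())


def data_for(testing_type: str) -> List[str]:
    """Provision data sets for one testing type from the shared repository."""
    return sorted(name for name, (_, types) in data_sets.items()
                  if testing_type in types)


coverage = len(requirements_with_data()) / len(requirements)
print(f"coverage {coverage:.0%}, uncovered {uncovered_requirements()}")
print("performance data:", data_for("performance"))
```

Here REQ-3 has no test case and therefore no data, which is exactly the kind of gap the bird's-eye view is meant to expose.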
What are the Benefits of Test Data Management?
Testing alone is not enough if your test data is unreliable or poor. You need a planned approach to test data management to get the full benefit of testing at every stage.
- Improves quality: The better the quality of your test data, the better your test results. Your final product is only as good as the test data you use.
- Eliminates security problems: Test data management ensures that your test data is safe, separate from the production environment, and easy to access by the right people.
- Decreases redundant work: Managing test data properly helps ensure that it closely resembles production data, so you don't spend time building an application that might fail because realistic data was not available for testing.
- Promotes agility: Handling test data with care increases agility by reducing test data creation time, which helps reduce production delays and execution time.
- Keeps data-related issues in check: A good, well-managed data set helps prevent data-related bugs in the product.
- Reduces time-to-market: A faster development process with fewer issues helps avoid delivery delays.
Best Practices for Test Data Management
Managing test data in an agile environment can often become complicated. Here are some effective test data management best practices you should know about:
- Focus on the security of the data
- Keep the real and test data isolated from each other
- Keep a focus on application security
- Automate data management and usage
- Refresh data using a central repository
- Perform continuous data analysis to update test data as and when necessary
Common Test Data Challenges
Every management process comes with its fair share of challenges. Here are some common test data challenges you can expect to come across while managing your test data:
Lack of Test Data Security
Your test data may not be secure from external or internal breaches. And if it cannot be kept separate from the real data, it becomes difficult to tell test data and real data apart.
Development teams often have access to large numbers of data sets that lack purpose: the test data is not fit for the task. For instance, a new feature release may require different input test data to verify the update, but developers and QA engineers fall back on old data sets because setting up and refreshing the test environment to support new data is complex.
If you don't back up your test data for future reuse, you also risk losing all the progress you have made so far.
Poor Data Quality
Besides inappropriate test data, another test data management challenge is poor-quality test data. It does not help catch production or data-related issues, and it increases time-to-delivery.
A lack of sufficient data for testing is yet another challenge many companies face. One solution is to use dependable automated tools to create production-like test data.
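A minimal sketch of generating production-like data, using only Python's standard library: a seeded `random.Random` makes the data set reproducible, and a Luhn check digit makes the generated card numbers pass the same format validation real ones would. The value pools, the `999999` card prefix and the `is_test_data` marker column are assumptions; the marker also helps with the earlier challenge of telling test data and real data apart.

```python
import random
from typing import Dict, List

# Illustrative value pools; a real generator would mirror production distributions.
FIRST_NAMES = ["Asha", "Ben", "Chen", "Dana", "Eli"]
CITIES = ["Pune", "Austin", "Leeds", "Osaka"]
# Hypothetical card prefix for generated numbers; check it against your own rules.
CARD_PREFIX = "999999"


def luhn_check_digit(partial: str) -> str:
    """Luhn check digit for a digit string (doubling starts at its rightmost digit)."""
    total = 0
    for i, ch in enumerate(reversed(partial)):
        d = int(ch)
        if i % 2 == 0:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return str((10 - total % 10) % 10)


def luhn_valid(number: str) -> bool:
    """True if a full number (including its check digit) passes the Luhn check."""
    return luhn_check_digit(number[:-1]) == number[-1]


def generate_customers(count: int, seed: int = 42) -> List[Dict[str, object]]:
    """Generate reproducible, production-like customer rows."""
    rng = random.Random(seed)  # fixed seed: every run yields the same data set
    rows = []
    for i in range(1, count + 1):
        partial = CARD_PREFIX + "".join(rng.choice("0123456789") for _ in range(9))
        rows.append({
            "customer_id": i,
            "name": f"{rng.choice(FIRST_NAMES)} Test{i}",
            "city": rng.choice(CITIES),
            "card_number": partial + luhn_check_digit(partial),  # 16 digits, Luhn-valid
            "balance": round(rng.uniform(0, 50_000), 2),
            "is_test_data": True,  # explicit marker so rows are never mistaken for real data
        })
    return rows


for row in generate_customers(3):
    print(row)
```

Seeding the generator means a failing test can be rerun against exactly the same data, which plain random generation would not allow.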
Test data is created by the testing team, which usually does not have direct access to production data. Even when production data is provided, it is a large chunk of raw data that cannot be used directly for testing; considerable effort is needed to sort, manage and tailor it.
High-quality data is a basic requirement for high-quality software testing. Average data quality produces mediocre test results, and no one wants that. Test data management is the best way to resolve all these problems.
With Agile and DevOps, testing cycles are getting shorter, and creating quality data within a cycle while also performing the testing can get really complex. Test data management is an effective way to reduce the cost, time and effort of the testing cycle, with visible results. This builds customer satisfaction and trust, and better business is the outcome.
Testsigma is one such test automation tool that enables continuous testing along with comprehensive test data management. Test cases can be automated in simple English via NLP and maintained easily using its built-in self-help feature.
Testsigma recognizes the need for an efficient test data management system and has a built-in test data generation facility. It also supports multiple data sources, such as JSON, Excel and built-in data tables, so you can choose the format that suits you best.
Manage your data and simplify your test automation with Testsigma