Pragmatic Approach to Test Data Management

Test data is the input given to a software application during the execution phase of testing. Testers either reuse existing test data or generate new data as part of their testing activity.

A study by IBM indicates that testers spend between 30% and 60% of their time searching for, maintaining, and generating test data.

Whatever the exact figure, the study makes clear that test data preparation and maintenance is in fact a time-consuming activity. As a QA community, we therefore need an effective approach to the collection, generation, automation, and management of test data for both functional and non-functional requirements.

Challenges in Test Data

Some of the key challenges teams experience when sourcing test data:

  • Test data coverage is often inadequate
  • Large volumes of data are required in a short period of time
  • Test data is often duplicated across teams
  • The lack of a centralized test data management approach results in longer testing timelines
  • Test data in lower environments does not mimic production data scenarios, so critical defects leak into production
  • The lack of a mechanism to ring-fence test data leads to erroneous test results

Test Data Management - Hybrid Approach

A hybrid approach to test data management combines tools and strategies to generate new test data and to reuse existing masked data from production, and it implements automated processes that support centralized management of that data.

Data from multiple sources can be aggregated and inserted into a TDM database. The test data can originate from synthetic test data generators, from automation scripts that are executed to create test data within the applications, and from production data brought down into the lower environments through a subsetting and masking process.
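
To make the subsetting and masking route more concrete, here is a minimal Python sketch. It is not the tooling used in this approach: the function names, the salt value, and the choice of a salted SHA-256 hash are all assumptions made for illustration, and only the field names (customerId, regionId, surname, firstNames) are borrowed from the sample response in this article. A real masking tool would need stronger guarantees than this.

```python
import hashlib


def mask_record(record, salt="tdm-demo-salt"):
    """Return a copy of a record with identifying fields replaced.

    Hypothetical helper: the salted hash is deterministic, so the same
    customerId always maps to the same masked value and links between
    tables can survive masking.
    """
    masked = dict(record)  # copy, so the source record is left untouched
    digest = hashlib.sha256((salt + record["customerId"]).encode("utf-8")).hexdigest()
    # Derive a 14-digit numeric ID from the hash.
    masked["customerId"] = str(int(digest, 16) % 10**14).zfill(14)
    masked["surname"] = "SURNAME_" + digest[:6].upper()
    masked["firstNames"] = "NAME_" + digest[6:12].upper()
    return masked


def subset_and_mask(records, region_id, limit):
    """Keep at most `limit` records for one region, then mask each of them."""
    subset = [r for r in records if r["regionId"] == region_id][:limit]
    return [mask_record(r) for r in subset]
```

Non-identifying fields such as regionId pass through unchanged, so the masked records can still be filtered by the criteria a tester selects.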

The other critical component of the TDM approach is a web-based front-end interface that testers use to select and filter conditions based on the test cases or test scenarios for which they need data. Based on the criteria selected on the webpage, an API request is sent to the TDM database.

Sample Request payload to the API

{
	"request": {
		"DataCount": "3",
		"existingData": "0",
		"username": "tdm",
		"password": "FD*8DE",
		"minBal": "1000",
		"accountType": "D",
		"openDate": "10012004",
		"yearOfBirth": "1981",
		"regionId": "451"
	}
}

The API response contains the test data the tester requested. The data can come either from existing masked data in the production subset or from a new data record created by the automation scripts and data generators. Which of the two is returned depends on the existingData parameter in the request JSON payload.
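
The branching on existingData could be sketched roughly as follows in Python. Several things here are assumptions rather than facts from the article: that "1" means existing masked data and "0" means newly generated data, that yearOfBirth is matched against the start of dateOfBirth, and the hypothetical helpers serve_request and generate_record themselves.

```python
def generate_record(req, index):
    """Hypothetical stand-in for the data generators and automation scripts."""
    return {
        "dataId": f"{index:02d}",
        "regionId": req["regionId"],
        "dateOfBirth": f"{req['yearOfBirth']}-01-01",
        "status": "A",
    }


def serve_request(payload, masked_pool, generator=generate_record):
    """Build a response for a TDM request payload."""
    req = payload["request"]
    count = int(req["DataCount"])  # the sample payload sends numbers as strings
    if req["existingData"] == "1":
        # Assumption: "1" selects existing masked production data.
        matches = [
            r for r in masked_pool
            if r["regionId"] == req["regionId"]
            and r["dateOfBirth"].startswith(req["yearOfBirth"])
        ]
        data = matches[:count]
    else:
        # Assumption: any other value asks for newly created records.
        data = [generator(req, i) for i in range(1, count + 1)]
    return {"response": {"data": data}}
```

With the sample request shown earlier (DataCount "3", existingData "0"), this sketch would return three newly generated records for region 451.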

Sample Response from the TDM DB


{
	"response": {
		"data": [
			{
				"dataId": "01",
				"regionId": "451",
				"customerId": "54083102489085",
				"dateOfBirth": "1981-08-31",
				"initials": "S",
				"surname": "John",
				"firstNames": "Patrick",
				"gender": "M",
				"status": "A",
				"issueYear": "2015"
			},
			{
				"dataId": "02",
				"regionId": "451",
				"customerId": "670831882489085",
				"dateOfBirth": "1981-08-31",
				"initials": "V",
				"surname": "Danny",
				"firstNames": "Johnson",
				"gender": "M",
				"status": "A",
				"issueYear": "2017"
			},
			{
				"dataId": "03",
				"regionId": "451",
				"customerId": "22083102488285",
				"dateOfBirth": "1981-08-31",
				"initials": "B",
				"surname": "Anita",
				"firstNames": "Newman",
				"gender": "F",
				"status": "A",
				"issueYear": "2017"
			}
		]
	}
}

The test data is served back to the front-end website via the API response. The webpage lets the tester either select or reject the data that is returned. Once a data record is selected, it is locked for that user in the DB, which ensures the record is available exclusively to that tester for their testing requirements.

A release parameter, set globally in a config file, defines a timeout that controls how long a data record stays exclusively locked. Once the timeout is reached, a locked data record is automatically released back into the pool for reuse. This mechanism allows test data to be ring-fenced for a single tester, while the timeout ensures the data can be reused by others once testing is complete.

The front-end interface also has a feature to manually release the lock on a particular data record. This is useful when a tester finishes testing before the timeout window ends and wants to make the record available to others. The manual release flag overrides the global timeout setting.
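
The locking behaviour described above can be illustrated with a small in-memory Python sketch. The approach in this article keeps locks in the TDM database, so the class below is only an assumption-laden stand-in: the RecordLocks name, its methods, and the injectable clock are hypothetical and exist purely to show the timeout and manual-release rules.

```python
import time


class RecordLocks:
    """Exclusive per-record locks that expire after a global timeout."""

    def __init__(self, timeout_seconds, clock=time.monotonic):
        self.timeout_seconds = timeout_seconds  # the global release parameter
        self._clock = clock
        self._locks = {}  # dataId -> (user, expires_at)

    def _expire(self, data_id):
        # Drop the lock if its timeout has been reached.
        entry = self._locks.get(data_id)
        if entry and self._clock() >= entry[1]:
            del self._locks[data_id]

    def acquire(self, data_id, user):
        """Lock a record for a user; returns False if it is already locked."""
        self._expire(data_id)
        if data_id in self._locks:
            return False
        self._locks[data_id] = (user, self._clock() + self.timeout_seconds)
        return True

    def release(self, data_id, user):
        """Manual release before the timeout; only the holder may release."""
        self._expire(data_id)
        entry = self._locks.get(data_id)
        if entry and entry[0] == user:
            del self._locks[data_id]
            return True
        return False

    def holder(self, data_id):
        """Return the user currently holding the record, or None."""
        self._expire(data_id)
        entry = self._locks.get(data_id)
        return entry[0] if entry else None
```

Expiry is checked lazily whenever a record is touched, which avoids a background job in this sketch; a database-backed version might instead store the expiry timestamp in a column and filter on it in the query.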

When a tester rejects a data record from the front end, the workflow restarts and prompts the user to select an alternate data record. The tester can also request multiple data records at once, which supports bulk test data retrieval for use cases such as performance testing. The data in the TDM database can be refreshed on regular cycles, with new data injected through the same three routes described above: synthetic data generators, automation scripts, and masked production subsets.

Centralized test data management gives teams an organized, shared approach to managing test data requirements. Automating TDM makes the process repeatable, which improves the accuracy of the data returned and makes the testing process much more predictable.

Arindam Mukherjee

Head of Data & Analytics Engineering ◉ Cloud & AI Transformation Leader ◉ Data Strategy Expert ◉ PB Scale Migrations Expert ◉ $80M+ Value Delivered ◉ MBA, B.Tech.

4y

Test Data Management is an often overlooked aspect and is not made part of new data ingestion efforts. A reusable framework to create and maintain test data (both facts and dimensions), and to make it available as needed, is an invaluable investment.
