It’s likely a coincidence, but I’ve recently noticed a slew of issues arising in my work around test data management. As I dug into this, I found a disappointing lack of pertinent information on how to manage it in an Agile environment. It’s going to be a quick one this week, but here’s my take on agile test data management.
What is agile test data management?
Test data management is the process by which we create and manage realistic data for use in the creation of software. Agile test data management is how we manage this in an Agile environment. Simple.
Why is it important?
If you’ve read any of my previous articles, you might have noticed that I have a thing against dependencies. Poor test data management introduces dependencies in our system, which slows learning and value creation.
Without effective test data management, our QA engineers are left waiting for appropriate data to be created, or need to spend significant time creating bespoke data in a hard-to-use system. All of this makes it harder for our organisations to adapt to market or customer changes.
Another aspect to consider is quality. Sometimes, when we test using poor data, we may as well not bother at all. It’s a bit like testing on a snowflake environment: if it doesn’t represent live, then what are we actually learning?
Principles for good agile test data management
As with most things, at least as far as I’m concerned, principles trump frameworks. Here are some principles that can help guide our approach.
Data can either be synthesised to mirror a live-like data set, or cut as a subset of actual live data once it has been sanitised
Either option is fine, to be honest. We’d prefer a real cut of live data, but not at the expense of long lead times or unnecessary complexity. It’s good to have both options available.
Test data is refreshed on demand to ensure integrity and maximise the flow of work
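As a rough illustration of the synthesised option, here’s a minimal Python sketch that generates live-like customer records. The `Customer` shape, the name lists and the country weighting are all hypothetical, invented for the example. Seeding the generator means the same request always returns the same data, which helps avoid tests breaking simply because the data shifted underneath them.

```python
import random
from dataclasses import dataclass

# Hypothetical record shape; swap in whatever your live schema actually looks like.
@dataclass(frozen=True)
class Customer:
    customer_id: int
    name: str
    country: str
    orders: int

FIRST_NAMES = ["Alex", "Sam", "Priya", "Tomasz", "Aisha", "Chen"]
SURNAMES = ["Jones", "Patel", "Nowak", "Okafor", "Li", "Smith"]
# Assumed weighting so the synthetic set loosely resembles a live distribution.
COUNTRIES = {"GB": 0.6, "IE": 0.2, "FR": 0.1, "DE": 0.1}

def synthesise_customers(count: int, seed: int = 42) -> list[Customer]:
    """Generate `count` live-like customers; the same seed always yields the same data."""
    rng = random.Random(seed)  # isolated generator, so global random state is untouched
    countries = list(COUNTRIES)
    weights = list(COUNTRIES.values())
    return [
        Customer(
            customer_id=i + 1,
            name=f"{rng.choice(FIRST_NAMES)} {rng.choice(SURNAMES)}",
            country=rng.choices(countries, weights=weights, k=1)[0],
            # Most customers order rarely and a few order a lot: a long-tailed count.
            orders=int(rng.expovariate(1 / 3)),
        )
        for i in range(count)
    ]

if __name__ == "__main__":
    for customer in synthesise_customers(3):
        print(customer)
```

Because the output is a pure function of `count` and `seed`, the data set itself can be recreated on demand rather than stored and maintained.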
The brutal and continuous march to eliminate dependencies in our system applies everywhere, even in agile test data management. Ideally, there should be zero wait time between requesting and receiving appropriate live-like test data.
The people who will be using the test data should own and drive the creation and management of the test data
If we need someone else to push the magic test data creation button then we’ve introduced a dependency, and an unnecessary one at that. Provide self-service functionality if you need to protect resource creation or sensitive data. By providing our skilled colleagues with the tools they need we build trust, promote ownership and encourage mastery.
Test data is created with as much care and attention to quality as production code
Much as we are now starting to treat our tests as first-class citizens, it is important to make sure that the test data management scripts they depend on are treated in the same way. The data are part of the tests, so get your scripts in version control and make sure they pass any checks you’d normally run for production code.
Test data is cattle, not pets
Like environments, we should be able to discard and recreate test data at will. Data should never be cared for and nurtured; once they are, they become a drain on our resources. If the tools we build to create and manage the data are well executed, then we should be able to throw away all of our test data and rebuild it whenever we like. This has all of the benefits we’d expect from good environment management.
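To make “cattle, not pets” a little more concrete, here’s a minimal Python sketch using the standard library’s `sqlite3` module: each run gets a brand-new in-memory database, seeded from a script, which is thrown away when the run finishes. The schema, the seed rows and the `disposable_test_db` helper are hypothetical; in practice you’d point this at whatever version-controlled scripts build your real test data.

```python
import sqlite3
from contextlib import contextmanager
from typing import Iterator

# Hypothetical schema and seed data; in a real setup these would live in version control.
SCHEMA = """
CREATE TABLE customer (
    id      INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    country TEXT NOT NULL
);
"""
SEED_ROWS = [(1, "Alex Jones", "GB"), (2, "Priya Patel", "IE"), (3, "Chen Li", "FR")]

@contextmanager
def disposable_test_db() -> Iterator[sqlite3.Connection]:
    """Build a fresh database, hand it to the caller, then discard it entirely."""
    conn = sqlite3.connect(":memory:")  # nothing is written to disk
    try:
        conn.executescript(SCHEMA)
        conn.executemany("INSERT INTO customer VALUES (?, ?, ?)", SEED_ROWS)
        conn.commit()
        yield conn
    finally:
        conn.close()  # closing an in-memory database destroys it

if __name__ == "__main__":
    with disposable_test_db() as db:
        db.execute("DELETE FROM customer WHERE country = 'GB'")  # tests can trash it freely
        print(db.execute("SELECT COUNT(*) FROM customer").fetchone()[0])
    with disposable_test_db() as db:
        print(db.execute("SELECT COUNT(*) FROM customer").fetchone()[0])
```

The second block sees all three rows again, because nothing the first block did survives. Nobody has to nurse a shared database back to health after a destructive test.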
Data are automatically sanitised
We should never be at risk of accidentally releasing or deploying sensitive data. Any process or tool that creates or maintains test data should sanitise it automatically; introducing the possibility of human error where PII is concerned is just asking for trouble.
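Here’s a minimal sketch of what automatic sanitisation might look like in a Python pipeline, assuming rows arrive as dictionaries. The column names, the two column sets and `SECRET_KEY` are all hypothetical. Two design choices are worth calling out. First, columns are allowlisted rather than blocklisted, so a newly added column is dropped until someone decides how to treat it. Second, names and emails are replaced with keyed (HMAC) hashes, so the same live value always maps to the same token and relationships between tables survive the cut.

```python
import hashlib
import hmac

# Hypothetical column classification. Anything not listed here is dropped,
# so a newly added PII column can't slip through unnoticed.
PASS_THROUGH = {"id", "country", "orders"}
PSEUDONYMISE = {"name", "email"}

# Assumed to come from a secret store in practice; hard-coded only for this sketch.
SECRET_KEY = b"replace-me"

def pseudonym(value: str, field: str) -> str:
    """Deterministic token: the same input always maps to the same output,
    so joins across tables still line up after sanitisation."""
    digest = hmac.new(SECRET_KEY, f"{field}:{value}".encode(), hashlib.sha256)
    return f"{field}_{digest.hexdigest()[:12]}"

def sanitise_row(row: dict) -> dict:
    clean = {}
    for field, value in row.items():
        if field in PASS_THROUGH:
            clean[field] = value
        elif field in PSEUDONYMISE:
            token = pseudonym(str(value), field)
            # Keep email-shaped values email-shaped so validation logic still runs.
            clean[field] = f"{token}@example.invalid" if field == "email" else token
        # Any other field is dropped (allowlist, not blocklist).
    return clean

if __name__ == "__main__":
    live_row = {"id": 7, "name": "Jane Doe", "email": "jane@corp.com",
                "country": "GB", "orders": 4, "phone": "+44 7700 900123"}
    print(sanitise_row(live_row))
```

Keyed hashing is pseudonymisation rather than full anonymisation, so the key deserves the same protection as the live data it came from.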
Signs you might need agile test data management
If you find yourself or someone else saying the following, then that might be a smell that you’ve got test data management issues.
- “We need to keep updating the tests because the test data keeps changing”
- “We’ve been waiting days for a new cut of live”
- “It’s taking longer to get test data because our production database is so large”
- “I can’t get the data I need to test this and dev have moved on to another ticket”
- “The live database is owned by another team”
- “It’s taking longer and longer to run the automated packs as the test data is so large”
One of my colleagues from DevOpsGroup, Bob Larkin from https://www.larkinaboutinazure.com, has pointed out that large test data sets are actually one of the most common issues people need to tackle. He adds:
“Historically, maybe due to resource restrictions in bricks-and-mortar data centres and the use of monolithic apps, people put data into large databases, or grouped data that wasn’t necessarily related or dependent together on the same DB server. One of the challenges in making test data management possible is revisiting, and often redesigning, the data layer for the new cloud world so that the data sits in smaller, independent sets. This makes it easier to reproduce in a test environment as discrete units with smaller data sets.”
The ideal situation for agile test data management
Here’s what I want to hear when people are describing test data management in their organisations. I want to hear that it is an enabler of value and collaboration, not a blocker. I’m idealistic, but I’m alright with it.
Since development and QA started collaborating on test data, the new APIs they produced have put power back in the hands of QA. QA engineers are now able to create and refresh any data they need whenever they need it. This has eased a key bottleneck, so work now spends less time in test.
We got a bonus when we noticed that QA now had more time available as they didn’t need to constantly chase people. Our automated coverage has increased as a result of having more time to pay down some test debt.
As QA and dev are now working more closely together on test data, we’ve also noticed a reduction in bugs being raised. We think this is because dev has a clearer picture of what’s going to be tested before they start work on the features. Somebody said they were planning a joint demo next week.
How to get started
Here are the top points I always encourage when an organisation starts to think more about its test data management:
- Document and share your strategy
- Update your policies to include test data management requirements, specifically in the definition of done
- Document your work capacity profiles, providing a protected percentage for QA innovation and debt pay-down
- Identify test data requirements as early as possible, get them documented on the tickets when you refine them
- Devolve accountability for test data to the teams who will be actually using the data to do their job
Agile test data management is a little niche, but it’s very important. Any area that is hampered by dependencies or fragility deserves our attention if we are truly interested in succeeding.
What are your hallmarks of good or bad test data management?