Quality Engineering Discussions: 5 Questions with James Espie
In this series, real (and really good) QA practitioners use their experience to support, or debunk, what you might know about software quality.
James Espie is a test specialist, a quality engineering proponent, and a continuous learner from Auckland, New Zealand. He shares his insights and sporadic bursts of inspiration in a hilarious newsletter called Pie-mail. If you haven’t seen it, you should check it out.
In this quick Q&A, Espie makes the point that software quality begins with people, that more testing doesn’t necessarily mean better quality, and that context matters in metrics.
Q1: What’s the most common organizational bug that you’ve seen?
An “organizational bug” I see often is teams that are ‘agile’ until it comes to testing.
Design, product, and development functions all collaborate and work together. But when it’s time to test, the team switches to ‘waterfall mode’. They pass the ticket to the tester and move on to something else.
Agile teams often don’t have a clear picture of where a test specialist should engage. The fix is to encourage teams to include testing at every stage of the development process, but this is easier said than done.
One thing that’s worked for me is encouraging developers and testers to test together.
When a developer has a change ready to merge, they pair up with a tester. The developer demonstrates to the tester that it “works on my machine”. We call this process “WOMMing”. I’ve also called it “show and tell”.
(The term WOMMing was coined by one of my former colleagues at Pushpay, though I don’t know exactly who to attribute it to.)
During WOMMing, the role of the tester is to tease out edge cases, ask questions, and be critical of the change. It’s useful because it helps find bugs early. Additionally, it encourages the developer to learn some testing skills.
Of course, it’s not a silver bullet. Testers need to be pulled into other phases of the development life cycle. But if a team is at a point where testers aren’t included at all, pairing with developers could be a great way to start.
Q2: What are some testing anti-patterns you wish never existed?
I wish massive end-to-end UI testing suites did not exist.
I see this often. Organizations use automation to test every screen, function, and feature in their product. In my experience, you only need a few E2E UI tests. As a test suite starts to get bigger, the value from each extra test starts to diminish.
Some things that have stood out to me include:
- Tests that check the same screen over and over, with only slight variations in data.
- Tests that check things already covered by tests at other layers, for example unit or API tests.
- Tests that check features that are of little to no business value.
These don’t seem like a good use of time. Also, each extra test will make the test suite take longer – and, potentially, more brittle.
A lot of organizations fall into the trap of thinking that everything needs to be automated. Testers become test automation engineers and end up building more and more automated tests. I’m not convinced; I think a tester’s time could be better spent elsewhere.
Q3: Metrics question. What are some QA or testing metrics that you ALWAYS want to track? Why?
First, I’ll say that I am really careful with metrics.
If care is not taken, and teams don’t understand why metrics are important, they can be tempted to ‘game the system’. There’s a really interesting (and shocking) example from Volkswagen a few years back, where the company ‘cheated’ to meet emissions targets.
That isn’t to say metrics aren’t important, but they do need to be treated with respect. Personally, I like DORA metrics:
- Deployment Frequency (how often a deployment happens)
- Mean Lead Time for Changes (how long it takes to get a change to production)
- Mean Time To Recovery (how long it takes to recover from an incident)
- Change Failure Rate (what proportion of changes cause a failure in production)
My approach to metrics like these is simple: find out what the numbers are and then work to improve them. For example, it is useful to know the Mean Time To Recovery. If it’s half a day, then that’s a useful and interesting piece of information. Once we know that, then we can look at ways to improve that time and measure it again.
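To make the “find out what the numbers are” step concrete, here is a minimal Python sketch that computes the four DORA metrics from a list of deployment records. The `Deployment` record, its fields, the `dora_summary` helper, and the sample data are all hypothetical. Real numbers would come from your CI/CD pipeline and incident tooling, and teams define details such as when an incident “starts” differently. Measuring recovery from the failing deployment to the restore is a simplifying assumption made here.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, List, Optional


@dataclass
class Deployment:
    """Hypothetical deployment record (not the format of any specific tool)."""
    committed_at: datetime                   # when the change was committed
    deployed_at: datetime                    # when it reached production
    failed: bool = False                     # did it cause a production failure?
    recovered_at: Optional[datetime] = None  # when service was restored, if it failed


def mean_delta(deltas: Iterable[timedelta]) -> Optional[timedelta]:
    """Average a collection of timedeltas; None if there is nothing to average."""
    items = list(deltas)
    if not items:
        return None
    return sum(items, timedelta(0)) / len(items)


def dora_summary(deploys: List[Deployment], period_days: float) -> dict:
    failures = [d for d in deploys if d.failed]
    return {
        # Deployments per day over the observed period.
        "deployment_frequency_per_day": len(deploys) / period_days,
        # Commit-to-production time, averaged over all deployments.
        "mean_lead_time": mean_delta(d.deployed_at - d.committed_at for d in deploys),
        # Simplification: recovery is measured from the failing deployment.
        "mean_time_to_recovery": mean_delta(
            d.recovered_at - d.deployed_at
            for d in failures
            if d.recovered_at is not None
        ),
        # Fraction of deployments that caused a failure in production.
        "change_failure_rate": len(failures) / len(deploys) if deploys else 0.0,
    }


# Hypothetical week of data: two deployments, one of which failed.
t0 = datetime(2024, 1, 1, 9, 0)
history = [
    Deployment(committed_at=t0, deployed_at=t0 + timedelta(hours=4)),
    Deployment(committed_at=t0, deployed_at=t0 + timedelta(hours=8),
               failed=True, recovered_at=t0 + timedelta(hours=20)),
]
summary = dora_summary(history, period_days=7)
print(summary["mean_time_to_recovery"])  # 12:00:00, i.e. half a day
```

Running the same function over the next period’s records gives the “measure again” comparison. The sketch reports raw numbers rather than pass/fail against a fixed target, in keeping with the advice below.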
My suggestion with metrics though is:
- Be clear about what action is being taken to improve a certain metric, and make sure that it makes sense.
- Measure, and measure again after improvements have been made. But don’t set arbitrary values as targets. “Make our time to recovery faster” is far better than “make our time to recovery less than ten minutes”.
- Make sure everyone has an understanding of why you want to measure a specific thing. Context matters!
Hopefully, these suggestions will help to avoid becoming like the team at Volkswagen 🙂
Q4: What are some Severity 1 bugs that ALMOST slipped past you, and how did you make sure they never did so again?
This is a story about one that did slip past us. But we got away with it.
My team worked on a billing product for a SaaS company. We had a seemingly simple task: replacing a couple of our products with new ones. The products also had new prices.
Now, the data structure was overcomplicated. There were two tables: product and price. Each product had a start and an end date. It was related to a price, which also had its own start and end date. (You might see a problem here already!)
To make matters worse, the product dates were stored in NZT, but the price dates were stored in the local time zone for that region. We missed this in testing.
When we went to production, we ended up with a product that ‘ended’ at midnight NZT. Its associated price didn’t ‘start’ until midnight GMT. So, for a window of twelve hours, nobody in the UK could buy our product. Yikes!
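The gap is easy to reproduce. The Python sketch below assumes both tables stored naive (zone-less) timestamps, with each column’s zone implied only by convention. The `coverage_gap` helper and the changeover date are hypothetical. It also uses fixed offsets of UTC+12 for NZT and UTC+0 for GMT, ignoring daylight saving, which is enough to show the twelve-hour window.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets for illustration only; real NZ and UK time observe daylight
# saving, so production code would use zoneinfo.ZoneInfo("Pacific/Auckland")
# and zoneinfo.ZoneInfo("Europe/London") instead.
NZT = timezone(timedelta(hours=12), "NZT")
GMT = timezone(timedelta(0), "GMT")


def coverage_gap(ends_at: datetime, ends_tz: timezone,
                 starts_at: datetime, starts_tz: timezone) -> timedelta:
    """Return the window between one row ending and the next row starting.

    The inputs are naive timestamps as read from the database; the time zone
    each column means is passed in separately, because the schema does not say.
    """
    end = ends_at.replace(tzinfo=ends_tz)
    start = starts_at.replace(tzinfo=starts_tz)
    # Aware datetimes in different zones are compared on the same UTC timeline.
    return max(start - end, timedelta(0))


midnight = datetime(2022, 10, 9)  # hypothetical changeover date

# Compared naively, the two rows look perfectly contiguous...
naive_gap = midnight - midnight
# ...but once each column's real time zone is applied, a window opens up.
real_gap = coverage_gap(midnight, NZT, midnight, GMT)
print(naive_gap, real_gap)  # 0:00:00 12:00:00
```

Storing every timestamp in one zone (UTC is a common choice) makes the naive comparison and the real one agree, which is the kind of refactor the post-mortem below considered.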
We got lucky. Our product was a B2B product, and the 12-hour period fell on a Sunday. So it was very unlikely that anyone was trying to make a purchase. After the fact, we checked the usage statistics – zero hits. Pretty thankful we made the call to change the price over a weekend!
We ran a post-mortem to figure out what could prevent this from happening again. We decided that the fix for it was to refactor our database so that everything used the same time zone. But, this was going to be a hefty piece of work – that one change had significant downstream effects.
There was other information to take into account:
- Price changes are a rare occurrence for this organization.
- Product changes are even rarer.
- There was an ongoing piece of work to replace this billing system – in a couple of years, we expected it to be gone.
We opted not to apply any mitigations. That’s kind of counter-cultural, but it was the right thing to do. I guess my point is that sometimes, doing nothing is entirely reasonable!
Q5: What’s your most memorable release day? (Good OR bad, just as long as no one dies at the end…)
The world of CI/CD highlights the difference between deploying and releasing. We deploy software all the time, silently, behind feature flags or otherwise inaccessible to customers. Then, when it’s ready for customers, all we need to do is flip a switch, and it’s released.
My team did this recently for a new integration feature. All our code was sitting in production, behind a feature flag. Once we were confident everything was in place, all we had to do was switch it on. Very unglamorous, and no ceremony.
We returned to the office the next day. Our lead engineer went to check the logs first thing that morning, to see if anyone had used the newly ‘released’ feature. And sure enough, one person had!
Now, I know it doesn’t sound like much. But for us, it was incredible. After months of work on that feature, we had our first real live user!
The joy comes not from releasing the feature, but from seeing people use it and get value from it.
This is one reason why having good monitoring and logging is so important. You can see what your users are doing, and celebrate the success!
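The deploy-versus-release distinction, together with the kind of log line that revealed that first user, can be sketched in a few lines of Python. The `FeatureFlags` class, the `new-integration` flag name, and `sync_account` are hypothetical. Real teams typically use a flag service or config store so a flag can be flipped without redeploying, but an in-memory set is enough to show the idea.

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("integration")


class FeatureFlags:
    """Toy in-memory flag store (hypothetical; real systems read flags from config)."""

    def __init__(self) -> None:
        self._enabled = set()  # names of flags that are switched on

    def enable(self, name: str) -> None:
        self._enabled.add(name)

    def is_enabled(self, name: str) -> bool:
        return name in self._enabled


flags = FeatureFlags()


def sync_account(account_id: str) -> str:
    # The new code path is already deployed; the flag decides if it is released.
    if flags.is_enabled("new-integration"):
        # Logging each use is what lets the team spot their first real user.
        log.info("new-integration used by account %s", account_id)
        return "new integration"
    return "existing flow"


before = sync_account("acct-42")  # flag off: customers see no change
flags.enable("new-integration")   # the release: flip the switch, no redeploy
after = sync_account("acct-42")   # flag on: new path runs and is logged
print(before, "->", after)        # existing flow -> new integration
```

Because the flag is checked at request time, switching the feature off again is just as cheap as switching it on, which is part of what makes a release like this so unceremonious.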
For more test insights by James Espie, check out JPie.nz