A webinar about resolving the root causes of intermittent automated execution.
Replay
You can watch the live webinar replay on the Eurostar Huddle site:
The official recording does not appear to have worked properly - no slides, robot voice, etc.
Fortunately I recorded it.
I have uploaded the recording to my Evil Tester Talks Archive, which also includes the full 1.5 hour version with more detail and (at the time of writing) 7 other talks.
And I have released it on YouTube.
Watch the Full 1.5 Hour Version in Evil Tester Talks
Slides
Your Automated Execution Does Not Have to be Flaky
Eurostar Webinars Feb 2018
Alan Richardson
Have you experienced flaky test automation? In this webinar, Alan Richardson plans to convince you that you haven't. Instead you have experienced the result of not resolving the causes of intermittent execution. Alan will explore some common causes of intermittent behaviour, and solutions to them. Why? So you never say the phrase, “flaky tests”, ever again.
Flaky Test Automation is Normal
Have you experienced flaky test automation?
Particularly GUI Automation
Yeah. Of course. It's normal.
“Our Tests are Flaky”, “Some tests fail Randomly”
Flaky Test Automation is Normal
Because we have normalized Flaky Test Automation
- “Our Tests are Flaky”
- “Some tests fail Randomly”
“There is nothing so absurd that it has not been said by some philosopher.”
Cicero, On Divination, Book II chapter LVIII, section 119 (44 BC)
“Truth happens to an idea. It becomes true, is made true by events.”
William James, Lecture VI, Pragmatism’s Conception of Truth, Pragmatism: A New Name for Some Old Ways of Thinking (1907)
It's a nonsense idea, but people say it. Constantly. Experts. People who know what they are doing. It's normal.
Flaky tests 'become' true because we write them. We don't learn the strategies to fix them, so flaky is how we describe them.
How to Normalize Flaky Test Automation
We all know these Test Automation Truths
- “GUI Automation is Flaky”
- “We have to live with ‘flakiness’”
- “We shouldn’t automate at the GUI”
- “We can only remove flakiness under the GUI”
“Flaky Tests” blames the tests. Not good enough.
Have you accepted flakiness as the normal state of affairs?
Not flaky execution, not flaky environment, not flaky data. Flaky Tests.
Flaky tests is too high a level. Too high an abstraction to deal with the problem.
It isn’t even the “Tests”
- We don't automate Tests
- We automate the execution of steps in a workflow or process
- We automate the execution of a System
- We add condition assertions during that execution
- We call that a "Test"
We don’t have flaky Tests - we have automated execution that fails. Sometimes on the steps, sometimes on the assertions.
‘Flakiness’ does not reside at a ‘level’
I have seen ‘flakiness’
- in Unit Tests
- in API Tests
- in Integration Tests
- in GUI Tests
It is too easy to say ‘flaky’
- and then blame ‘GUI execution’
- and then blame ‘the tool’
Living with flakiness is a choice.
Choose a different approach.
It is too easy to say ‘flaky’ and then blame ‘GUI execution’. This is too easy an excuse.
It is too easy to say we have to live with ‘flakiness’ because we don’t have an API.
I am not the only person saying this.
- see references at the end and name drops throughout
- try to cover something different in this talk
“We designed that flakiness. We are allowing that to happen. We engineered it to be that way. And it's our fault that that exists.”
Richard Bradshaw, “Your Tests aren’t Flaky, You Are!” Selenium Conference 2017
https://www.youtube.com/watch?v=XnkWkrbzMh0
Take it more seriously. Describe it differently.
Intermittent
- Occurring at irregular intervals; not continuous or steady.
https://en.oxforddictionaries.com/definition/intermittent
something is happening, but not all the time
that makes me think there is a cause that we haven’t identified
there is some factor in the system that I don’t know about
“Flaky” doesn’t do that
Take it more seriously. Describe it differently.
Nondeterministic Algorithm
“a nondeterministic algorithm is an algorithm that, even for the same input, can exhibit different behaviors on different runs”
That doesn’t sound like how we describe ‘automation’.
That’s not what we want.
Flaky is not serious enough.
We do not want to use nondeterministic algorithms for continuous assertions that we are relying on
We are trying to rely on this. Continuous Integration. Continuous Deployment. We need this to run reliably. We need to trust that pass means everything ran as expected.
remove the word ‘flakiness’ from your vocabulary
Your Test Automation is not Flaky
Your automated execution fails intermittently
You have experienced intermittent failures. You have chosen to live with them, you haven’t fixed the causes of intermittent failure.
Don’t Blame Tests. Look For Root Causes.
watch Alister Scott’s GTAC 2015 talk
I have removed ‘flakiness’
- from Unit Tests
- from API Tests
- from Integration Tests
- from GUI Tests
Automated execution does not have to fail intermittently.
How to remove Intermittent Failure from your Automated Execution
- Care
- Investigate
- Do something about it
How to remove Intermittent Failure from your Automated Execution
- Decide Intermittent Failure is unacceptable
- Investigate the cause of Intermittent Failure
- Mitigate
  - Remove the cause
    - actually fix it
  - Implement a retry strategy
    - might obscure bugs
  - Accept Intermittent Results
    - might provide hints at solutions
- Remove the cause
Obvious thing to do is remove the cause. But this can be hard.
You might try a retry strategy. This can hide problems. But if your system exhibits intermittent failure to users in live then you might have to do this, because that is what users do in live. i.e. the problem might not be your execution, it might be your system.
Alister Scott talks about retry strategies obscuring bugs.
Accept intermittent results. You might move everything that is failing intermittently into a test run of its own. This creates a ‘good’ execution pack, and a ‘bad’ intermittent pack. You isolate the failures until you can work on them. If you find that you keep moving things into the intermittent pack, then you need to rethink the mitigation strategy because eventually the pack will get larger and you haven’t done anything about the root causes.
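One way to accept intermittent results while keeping a trustworthy main run is to quarantine the intermittently failing tests behind a tag and run them as their own pack. A minimal sketch, assuming JUnit 5 with Maven Surefire; the class, test names, and tag name are illustrative, not from the webinar.

```java
import org.junit.jupiter.api.Tag;
import org.junit.jupiter.api.Test;

import static org.junit.jupiter.api.Assertions.assertTrue;

public class CheckoutJourneyTest {

    @Test
    void reliableJourneyRunsInTheMainPack() {
        // stays in the 'good' execution pack that CI relies on
        assertTrue(true);
    }

    @Test
    @Tag("intermittent")
    void knownIntermittentJourneyIsQuarantined() {
        // excluded from the main pack until the root cause is investigated, e.g. with Surefire:
        //   mvn test -DexcludedGroups=intermittent   (the trusted pack)
        //   mvn test -Dgroups=intermittent           (the quarantined pack)
        assertTrue(true);
    }
}
```

If the quarantined pack only ever grows, that is the signal to revisit the mitigation strategy.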
Take it seriously
We write automated assertion checking because we care that those assertions are true for each build of the system.
Determinism is important.
High Level Grouping of Common Causes of Intermittency
- Synchronisation - lack of or poor
- Parallel Execution - interference
- Long Running Tests - too long, too risky
- Automatability - hard to automate system
- Tools - inappropriate or out of date
- State Preconditions - not controlled
- Assertions - wrong or incorrect assumptions
- Data - not controlled
see also Richard Bradshaw and Mark Winteringham's "SACRED" mnemonic.
Splat Sad - a mnemonic for the groupings above.
Will cover Top 3 for each Grouping
Top 3 Common Causes - Synchronisation
- None
- Time Based Synchronisation
- Incorrect App State Synchronisation
- None
Like a Rhino in a Chocolate Shop. Charging through everything.
Remote Cloud Execution might improve things because of latency.
- Time Based Synchronisation
implicit waits - everything waits, which can slow down error reporting
worse, Thread.sleep()
Wait for 10 seconds.
But all sync is time based… timeouts. But these can be domain specific and can assert SLAs, adjusted for remote latency or environment conditions.
Ideally we want state and domain synchronisation.
- Incorrect App State Synchronisation
e.g. wait for ‘ajax’ image to go away, rather than Ajax call to complete processing
Naive app state identification e.g. “Page footer present means page is ready to work with”
Common Solutions - Synchronisation
- Synchronise on States
- do not rely on framework Synchronisation
- Multiple Intermediate States
- Consider Latency
- Synchronise in Abstractions not the @Test methods - unless @Test specific
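To make "Synchronise on States" and "Synchronise in Abstractions not the @Test methods" concrete, here is a minimal sketch assuming Selenium 4 WebDriver; the page class, locator, and timeout value are illustrative assumptions.

```java
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.support.ui.ExpectedConditions;
import org.openqa.selenium.support.ui.WebDriverWait;

import java.time.Duration;

// A page abstraction that owns its own synchronisation,
// so @Test methods never need Thread.sleep() or ad hoc waits.
public class SearchResultsPage {

    private final WebDriver driver;
    private final WebDriverWait wait;

    public SearchResultsPage(WebDriver driver) {
        this.driver = driver;
        // the timeout is a domain decision, not a magic sleep - it can encode an SLA
        this.wait = new WebDriverWait(driver, Duration.ofSeconds(10));
    }

    // synchronise on an application state: "results have rendered",
    // not on "a fixed amount of time has passed"
    public SearchResultsPage waitUntilResultsAreDisplayed() {
        wait.until(ExpectedConditions.visibilityOfElementLocated(
                By.cssSelector(".search-result")));   // locator is illustrative
        return this;
    }

    public int numberOfResults() {
        return driver.findElements(By.cssSelector(".search-result")).size();
    }
}
```

The @Test method then calls the abstraction, which owns the synchronisation, rather than sprinkling waits through the test code.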
Top 3 Common Causes - Parallel Execution
- Framework not thread safe
- Tests Interfere
- Shared Test Environment
People often jump into Parallel Execution too soon.
- Framework not thread safe
People build Frameworks, which control how they work. Often heavy on inheritance, single instantiated control objects, shared global state and variables. All stuff that makes parallelism hard.
Test Frameworks might be thread safe, but our abstractions often are not.
May need to fork rather than thread e.g. different suites
- Tests Interfere
System state preconditions changed by other tests running in parallel
Shared data can cause interference. Browsers. System processes. System responding to one request.
Might even be bugs in the system.
Shared browser usage.
- Shared Test Environment
Not just shared for automated execution - data controls can help with this.
But shared across different teams, Exploratory testing going on in the same environment as automated execution, CI, performance testing
Common Solutions - Parallel Execution
- Independent environments
- Independent Data
- Separate Suites rather than threaded execution
- Create Threadsafe, reusable code
- Create reusable library abstractions rather than Frameworks
- Avoid ‘static’ singleton objects
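As an illustration of "Avoid 'static' singleton objects" and "Create Threadsafe, reusable code": a single shared static WebDriver is a common source of interference when tests run in parallel threads. A minimal sketch of one alternative, assuming Selenium WebDriver with ChromeDriver; the class name is illustrative.

```java
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;

// Instead of a static singleton WebDriver shared by every thread,
// give each test thread its own browser instance.
public class Drivers {

    private static final ThreadLocal<WebDriver> DRIVER =
            ThreadLocal.withInitial(ChromeDriver::new);

    public static WebDriver get() {
        return DRIVER.get();
    }

    public static void quit() {
        // quit and remove so a reused thread starts with a fresh browser
        DRIVER.get().quit();
        DRIVER.remove();
    }
}
```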
Top 3 Common Causes - Long Running Tests
- Sequential rather than Model Based
- Not delineating between: preconditions, process, assertion
- Component tests in flow rather than isolation
- Sequential rather than Model Based
Model based execution has a lot of synchronisation built in, particularly if it is state transition based, because state transitions have guard conditions or entry conditions. Sync should be baked into model based execution pretty well.
Model based usually has some variety handling.
Sequential doesn’t have a lot of variety. It can’t handle pop-ups, A/B tests.
Short flows are usually easier to make reliable.
- preconditions, process, assertion
We want to:
- setup the preconditions we need
- carry out a process
- assert on the conditions we want to check
Make ‘precondition’ setup part of the test.
Too long a ‘process’ (i.e. the stuff you do in the execution) might leave you at risk of overlap, or take too long to set up.
Too many assertions can slow things down, overlap with other processes, timing of assertions might be hard. Sync on assertion checking.
- Component tests in flow rather than isolation
Very often we think we need to assert on all conditions in a full executable flow - login through the GUI, navigate here, set up data in the GUI, do something, check in the GUI.
It is possible to assert on GUI rendering and some functionality in isolated playgrounds. Particularly if we design the system to operate as components, e.g. good for React and other JavaScript MVC frameworks.
Common Solutions - Long Running Tests
- Understand that more actions == more risk
- Synchronise prior to each step
- Consider Model Based Testing
- Create component test and automated execution playgrounds
- Minimum assertions
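A minimal sketch of delineating preconditions, process and assertion within one short test; the helper methods are hypothetical stand-ins for whatever API and page abstractions your own project uses, stubbed here so the example compiles.

```java
import org.junit.jupiter.api.Test;

import static org.junit.jupiter.api.Assertions.assertEquals;

public class DeleteTodoTest {

    @Test
    void canDeleteATodoFromTheList() {
        // precondition: create the state we need, outside the GUI, as part of the test
        String todoId = createTodoViaApi("buy milk");

        // process: the short GUI workflow under test
        deleteTodoInGui(todoId);

        // assertion: the minimum check that matters for this test
        assertEquals(0, countTodosShownInGui());
    }

    // --- hypothetical abstractions, sketched as stubs ---
    private String createTodoViaApi(String title) { return "todo-1"; }
    private void deleteTodoInGui(String todoId) { /* drive the GUI here */ }
    private int countTodosShownInGui() { return 0; }
}
```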
Top 3 Common Causes - Automatability, Automatizability
Not Testability:
- Application has non-deterministic behaviour
- Hard to Synchronise
- Application fails non-deterministically in live
- Not Testability
Some people refer to this as ’testability’, but it isn’t. I can often easily test an app that I find hard to automate.
- Application has nondeterministic behaviour
JavaScript Callbacks trigger ‘random’ popups
Async and out of order processing
A/B Testing in app
External ‘stuff’ not related to core functionality e.g. ads, libraries, rendering
Alister Scott covers this in his GTAC 2015 talk.
- Hard to Synchronise
JavaScript frameworks. Many DOM updates
e.g. Queued messages (guaranteed), buffered processes, promises.
- Application fails non-deterministically in live
Saw this at a client recently - every team was reporting flaky tests - GUI, API, Components, Backend, Low level stuff. Every team.
Intermittent Application Architecture - the application is intermittent in live, by design.
Not even ‘bugs’, just known to be a ‘flaky’ app at times, and the user just presses the button again.
You can't expect to have deterministic automated execution if your underlying application is non-deterministic.
Common Solutions - Automatability, Automatizability
- Build apps that can be automated
- Non-Deterministic apps need step retry strategies rather than test retry strategies
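A minimal sketch of a step retry (as opposed to re-running the whole test), in plain Java; the method name and retry settings are illustrative. As noted earlier, retries can obscure bugs, so every retry is logged.

```java
import java.time.Duration;
import java.util.function.Supplier;

// Retry a single step rather than the whole test - and log every retry,
// because silent retries hide the intermittency that should be investigated.
public class StepRetry {

    public static <T> T retry(String description, int maxAttempts,
                              Duration pause, Supplier<T> step) {
        RuntimeException lastFailure = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return step.get();
            } catch (RuntimeException failure) {
                lastFailure = failure;
                System.out.printf("retry %d/%d of '%s' after: %s%n",
                        attempt, maxAttempts, description, failure.getMessage());
                try {
                    Thread.sleep(pause.toMillis());
                } catch (InterruptedException interrupted) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while retrying", interrupted);
                }
            }
        }
        throw (lastFailure != null)
                ? lastFailure
                : new IllegalStateException("no attempts made for: " + description);
    }
}
```

Usage might look like `StepRetry.retry("click checkout", 3, Duration.ofMillis(500), () -> { checkoutButton.click(); return true; });` - the retried step is small and specific, and the test still fails if the step never succeeds.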
Top 3 Common Causes - Tools
- Out of Date
- Inappropriate
- Local Tool Infrastructure
- Out of Date
out of date language bindings
out of date drivers and APIs
- Inappropriate
using the wrong tool for the job, e.g. using WebDriver to automate a GUI that sends HTTP REST API requests, instead of automating the HTTP REST API directly
- Local Tool Infrastructure
environment performing automated updates
e.g. browsers automatically update rather than controlled to match the driver versions
e.g. maintaining a local grid or tool environment
environment not maintained
environment not up all the time
environment out of date
Common Solutions - Tools
- Use the right tool for the job
- Keep your tooling environment controlled and up to date
- Change your approach to take latency into account
- process on server return results
- return source, process on execution client
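To illustrate "return source, process on execution client": every WebDriver call to a remote or cloud grid pays the latency cost, so many findElement calls multiply it. A minimal sketch that fetches the page source once and parses it locally, assuming Selenium WebDriver plus the jsoup HTML parser; the selector is illustrative.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.openqa.selenium.WebDriver;

public class SearchResultCounter {

    // Many findElement calls against a remote grid multiply the latency cost.
    // Pulling the source once and parsing it locally needs a single round trip.
    public static int countSearchResults(WebDriver remoteDriver) {
        String html = remoteDriver.getPageSource();   // one network call
        Document page = Jsoup.parse(html);            // local processing
        return page.select(".search-result").size();  // selector is illustrative
    }
}
```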
Top 3 Common Causes - State Preconditions
- Not Checking State Preconditions at start of test
- Not controlling state preconditions prior to test
- Precondition setup using same tool
- Not Checking State Preconditions at start of test
We want to fail fast and at the appropriate point.
We don’t want to ‘do something’ that may or may not pass. We want to make sure it can pass. And report if it can’t.
- Not controlling state preconditions prior to test
To the best of your ability make it so that your execution can pass. Setup what you need. Lock what you need. Book what you need.
- Precondition setup using same tool
e.g. GUI automated execution has state setup using GUI automated execution
See Mark Winteringham's Selenium Conf talk.
Common Solutions - State Preconditions
- control data
- precondition state setup - whatever works
- http, db, api - ‘hack it in’
- avoid dependencies between execution unless a long running test
Use abstraction layers to create the dependencies rather than rely on other tests.
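As an illustration of "precondition state setup - whatever works", here is a minimal sketch that creates test data over HTTP in an abstraction layer before the GUI execution starts, and fails fast if it cannot; it uses java.net.http (Java 11+), and the endpoint, payload, and expected 201 status are hypothetical.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Set up precondition state through an abstraction layer (here, a thin HTTP call)
// rather than by chaining off another test or driving the GUI to create data.
public class TodoPreconditions {

    private final HttpClient client = HttpClient.newHttpClient();

    public void createTodo(String title) throws Exception {
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:8080/api/todos"))  // hypothetical endpoint
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(
                        "{\"title\":\"" + title + "\"}"))
                .build();

        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());

        if (response.statusCode() != 201) {
            // fail fast, at the appropriate point, if the precondition cannot be met
            throw new IllegalStateException(
                    "could not create precondition data: " + response.statusCode());
        }
    }
}
```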
Top 3 Common Causes - Assumptions Encoded in Assertions
- Assert on an Ordered Set
- Assert on Uncontrolled Data
- Assertion Tolerances
- Assert on an Ordered Set
multiple data items, asserted in non-guaranteed order e.g. {1,2,3}
sometimes {3,2,1}, sometimes {2, 1, 3} but we always assert on an ordered set {1,2,3}
instead assert on ‘includes’ and ‘length’
S ∩ {1} == {1}, S ∩ {2} == {2}, S ∩ {3} == {3}, |S| == 3 (i.e. S includes 1, 2 and 3, and has exactly 3 elements)
Need to make sure assertions don’t allow duplicates or extras to slip through.
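A minimal sketch of the 'includes and length' style of assertion, assuming JUnit 5; the {1,2,3} values are the example from the slide.

```java
import org.junit.jupiter.api.Test;

import java.util.List;

import static org.junit.jupiter.api.Assertions.assertEquals;
import static org.junit.jupiter.api.Assertions.assertTrue;

public class UnorderedResultTest {

    @Test
    void resultContainsExpectedItemsRegardlessOfOrder() {
        // imagine this came back from the system in a non-guaranteed order
        List<Integer> result = List.of(3, 1, 2);

        // assert on 'length' - together with the membership check below,
        // this catches extras and duplicates
        assertEquals(3, result.size());

        // assert on 'includes' rather than on position
        assertTrue(result.containsAll(List.of(1, 2, 3)));
    }
}
```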
- Assert on Uncontrolled Data
assert on ‘stuff’ not important to test
This might also arise from duplicated assertions.
- Assertion Tolerances
assertion tolerances not expansive enough e.g. amended time not within 3 seconds (operation took longer than expected), amended time == created time (because the scale used is milliseconds instead of nanoseconds, i.e. not enough resolution to make the distinction)
Common Solutions - Assumptions Encoded in Assertions
- Logging so you can interrogate failure afterwards
- Ability to re-run tests with same data and setup
Top 3 Common Causes - Data
- Missing Data
- Externally controlled data
- Uncontrolled Data
Data issues are pretty easy to spot, and pretty easy to avoid. But are all too common, particularly when we view the data as too hard to setup, because of complicated conditions, or dates, or amount of transaction data.
- Missing Data
- Externally controlled data
  - Static data
  - Hard-coded data
- Uncontrolled Data
  - Live Data
  - Randomly Generated Data
A lot of this comes down to the system design and programming notion of “Avoid Global State”, “Avoid Shared State”, “Avoid Global Variables”.
Common Solutions - Data
- Create data for each test
- Avoid test dependencies
- Avoid re-using data between tests
- Check data as a precondition
- Data synchronisation on all precondition data
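A minimal sketch of "Create data for each test" and "Avoid re-using data between tests": give each test its own uniquely named data so parallel runs and re-runs cannot collide; the naming scheme is illustrative.

```java
import java.util.UUID;

// Each test creates its own data, uniquely named, so tests never
// depend on - or trample over - data created by another test or run.
public class TestData {

    public static String uniqueUsername(String testName) {
        // e.g. "deleteTodo_9f1c2a7e" - readable in logs, unique per run
        return testName + "_" + UUID.randomUUID().toString().substring(0, 8);
    }
}
```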
Summary
- Your Test Execution is not ‘flaky’, it is failing intermittently
- It is possible to remove intermittent failures, even when automating through a GUI
- Common solutions: synchronisation, data control, environmental isolation
Other talks to watch
- Alister Scott, GTAC 2015: Your Tests Aren't Flaky
- Richard Bradshaw, “Your Tests aren't Flaky, You Are!” Selenium Conference 2017
Other talks to watch
- Craig Schwarzwald, SAY GOODBYE TO THE “F” WORD … FLAKY NO MORE!
- Mark Winteringham - REST APIs and WebDriver: In Perfect Harmony
Search also for: Flaky Selenium, Flaky Automation, Flaky Test Automation
End
Alan Richardson www.compendiumdev.co.uk
- Linkedin - @eviltester
- Twitter - @eviltester
- Instagram - @eviltester
- Facebook - @eviltester
- Youtube - EvilTesterVideos
- Pinterest - @eviltester
- Github - @eviltester
- Slideshare - @eviltester
BIO
Alan is a Software Development and Testing Coach/Consultant who enjoys testing at a technical level using techniques from psychotherapy and computer science. In his spare time Alan is currently programming a Twitter client called ChatterScan, and a multi-user text adventure game. Alan is the author of the books “Dear Evil Tester”, “Java For Testers” and “Automating and Testing a REST API”. Alan's main website is compendiumdev.co.uk and he blogs at eviltester.com