A webinar about resolving the root causes of intermittent automated execution.
Replay
You can watch the live webinar replay on the Eurostar Huddle site:
The official recording does not appear to have worked properly - no slides, robot voice, etc.
Fortunately I recorded it.
I have uploaded the recording to my Evil Tester Talks Archive, which also includes the full 1.5 hour version with more detail and (at the time of writing) 7 other talks.
And I have released it on YouTube.
Watch the Full 1.5 Hour Version in Evil Tester Talks
Slides
Your Automated Execution Does Not Have to be Flaky
Eurostar Webinars Feb 2018
Alan Richardson
Have you experienced flaky test automation? In this webinar, Alan Richardson plans to convince you that you haven't. Instead you have experienced the result of not resolving the causes of intermittent execution. Alan will explore some common causes of intermittent behaviour, and solutions to them. Why? So you never say the phrase, “flaky tests”, ever again.
Flaky Test Automation is Normal
Have you experienced flaky test automation?
Particularly GUI Automation
Yeah. Of course. It's normal.
“Our Tests are Flaky”, “Some tests fail Randomly”
Flaky Test Automation is Normal
Because we have normalized Flaky Test Automation
- “Our Tests are Flaky”
- “Some tests fail Randomly”
“There is nothing so absurd that it has not been said by some philosopher.”
Cicero, On Divination, Book II chapter LVIII, section 119 (44 BC)
“Truth happens to an idea. It becomes true, is made true by events.”
William James, Lecture VI, Pragmatism’s Conception of Truth, Pragmatism: A New Name for Some Old Ways of Thinking (1907)
It's a nonsense idea, but people say it. Constantly. Experts. People who know what they are doing. It's normal.
Flaky tests 'become' true because we write them. We don't learn the strategies to fix them, so flaky is how we describe them.
How to Normalize Flaky Test Automation
We all know these Test Automation Truths
- “GUI Automation is Flaky”
- “We have to live with ‘flakiness’”
- “We shouldn’t automate at the GUI”
- “We can only remove flakiness under the GUI”
“Flaky Tests” blames the tests. Not good enough.
Have you accepted flakiness as the normal state of affairs?
Not flaky execution, not flaky environment, not flaky data. Flaky Tests.
Flaky tests is too high a level. Too high an abstraction to deal with the problem.
It isn’t even the “Tests”
- We don't automate Tests
- We automate the execution of steps in a workflow or process
- We automate the execution of a System
- We add condition assertions during that execution
- We call that a "Test"
We don’t have flaky Tests - we have automated execution that fails. Sometimes on the steps, sometimes on the assertions.
‘Flakiness’ does not reside at a ‘level’
I have seen ‘flakiness’
- in Unit Tests
- in API Tests
- in Integration Tests
- in GUI Tests
It is too easy to say ‘flaky’
- and then blame ‘GUI execution’
- and then blame ‘the tool’
Living with flakiness is a choice.
Choose a different approach.
It is too easy to say ‘flaky’ and then blame ‘GUI execution’. This is too easy an excuse.
It is too easy to say we have to live with ‘flakiness’ because we don’t have an API.
I am not the only person saying this.
- see references at the end and name drops throughout
- try to cover something different in this talk
“We designed that flakiness. We are allowing that to happen. We engineered it to be that way. And it's our fault that that exists.”
Richard Bradshaw, “Your Tests aren’t Flaky, You Are!” Selenium Conference 2017
https://www.youtube.com/watch?v=XnkWkrbzMh0
Take it more seriously. Describe it differently.
Intermittent
- Occurring at irregular intervals; not continuous or steady.
https://en.oxforddictionaries.com/definition/intermittent
something is happening, but not all the time
that makes me think there is a cause that we haven’t identified
there is some factor in the system that I don’t know about
“Flaky” doesn’t do that
Take it more seriously. Describe it differently.
Nondeterministic Algorithm
“a nondeterministic algorithm is an algorithm that, even for the same input, can exhibit different behaviors on different runs”
That doesn’t sound like how we describe ‘automation’.
That’s not what we want.
Flaky is not serious enough.
We do not want to use nondeterministic algorithms for continuous assertions that we are relying on
We are trying to rely on this. Continuous Integration. Continuous Deployment. We need this to run reliably. We need to trust that pass means everything ran as expected.
remove the word ‘flakiness’ from your vocabulary
Your Test Automation is not Flaky
Your automated execution fails intermittently
You have experienced intermittent failures. You have chosen to live with them, you haven’t fixed the causes of intermittent failure.
Don’t Blame Tests. Look For Root Causes.
watch Alister Scott’s GTAC 2015 talk
I have removed ‘flakiness’
- from Unit Tests
- from API Tests
- from Integration Tests
- from GUI Tests
Automated execution does not have to fail intermittently.
How to remove Intermittent Failure from your Automated Execution
- Care
- Investigate
- Do something about it
How to remove Intermittent Failure from your Automated Execution
- Decide Intermittent Failure is unacceptable
- Investigate the cause of Intermittent Failure
- Mitigate
  - Remove the cause
    - actually fix it
  - Implement a retry strategy
    - might obscure bugs
  - Accept Intermittent Results
    - might provide hints at solutions
- Remove the cause
Obvious thing to do is remove the cause. But this can be hard.
You might try a retry strategy. This can hide problems. But if your system exhibits intermittent failure to users in live then you might have to do this, because that is what users do in live. i.e. the problem might not be your execution, it might be your system.
Alister Scott talks about retry strategies obscuring bugs.
Accept intermittent results. You might move everything that is failing intermittently into a test run of its own. This creates a ‘good’ execution pack, and a ‘bad’ intermittent pack. You isolate the failures until you can work on them. If you find that you keep moving things into the intermittent pack, then you need to rethink the mitigation strategy because eventually the pack will get larger and you haven’t done anything about the root causes.
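One way to accept intermittent results while keeping a trustworthy main run is to quarantine the intermittently failing tests behind a tag and run them as their own pack. A minimal sketch, assuming JUnit 5 with Maven Surefire; the class, test names, and tag name are illustrative, not from the webinar.

```java
import org.junit.jupiter.api.Tag;
import org.junit.jupiter.api.Test;

import static org.junit.jupiter.api.Assertions.assertTrue;

public class CheckoutJourneyTest {

    @Test
    void reliableJourneyRunsInTheMainPack() {
        // stays in the 'good' execution pack that CI relies on
        assertTrue(true);
    }

    @Test
    @Tag("intermittent")
    void knownIntermittentJourneyIsQuarantined() {
        // excluded from the main pack until the root cause is investigated, e.g. with Surefire:
        //   mvn test -DexcludedGroups=intermittent   (the trusted pack)
        //   mvn test -Dgroups=intermittent           (the quarantined pack)
        assertTrue(true);
    }
}
```

If the quarantined pack only ever grows, that is the signal to revisit the mitigation strategy.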
Take it seriously
We write automated assertion checking because we care that those assertions are true for each build of the system.
Determinism is important.
High Level Grouping of Common Causes of Intermittency
- Synchronisation - lack of or poor
- Parallel Execution - interference
- Long Running Tests - too long, too risky
- Automatability - hard to automate system
- Tools - inappropriate or out of date
- State Preconditions - not controlled
- Assertions - wrong or incorrect assumptions
- Data - not controlled
see also Richard Bradshaw and Mark Winteringham's "SACRED" mnemonic.
Splat Sad - a mnemonic for the groupings above.
Will cover Top 3 for each Grouping
Top 3 Common Causes - Synchronisation
- None
- Time Based Synchronisation
- Incorrect App State Synchronisation
- None
Like a Rhino in a Chocolate Shop. Charging through everything.
Remote Cloud Execution might improve things because of latency.
- Time Based Synchronisation
implicit waits - everything waits, which can slow down error reporting
worse, Thread.sleep()
Wait for 10 seconds.
But all sync is time based… timeouts. But these can be domain specific and can assert SLAs, adjusted for remote latency or environment conditions.
Ideally we want state and domain synchronisation.
- Incorrect App State Synchronisation
e.g. wait for ‘ajax’ image to go away, rather than Ajax call to complete processing
Naive app state identification e.g. “Page footer present means page is ready to work with”
Common Solutions - Synchronisation
- Synchronise on States
- do not rely on framework Synchronisation
- Multiple Intermediate States
- Consider Latency
- Synchronise in Abstractions not the @Test methods - unless @Test specific
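To make "Synchronise on States" and "Synchronise in Abstractions not the @Test methods" concrete, here is a minimal sketch assuming Selenium 4 WebDriver; the page class, locator, and timeout value are illustrative assumptions.

```java
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.support.ui.ExpectedConditions;
import org.openqa.selenium.support.ui.WebDriverWait;

import java.time.Duration;

// A page abstraction that owns its own synchronisation,
// so @Test methods never need Thread.sleep() or ad hoc waits.
public class SearchResultsPage {

    private final WebDriver driver;
    private final WebDriverWait wait;

    public SearchResultsPage(WebDriver driver) {
        this.driver = driver;
        // the timeout is a domain decision, not a magic sleep - it can encode an SLA
        this.wait = new WebDriverWait(driver, Duration.ofSeconds(10));
    }

    // synchronise on an application state: "results have rendered",
    // not on "a fixed amount of time has passed"
    public SearchResultsPage waitUntilResultsAreDisplayed() {
        wait.until(ExpectedConditions.visibilityOfElementLocated(
                By.cssSelector(".search-result")));   // locator is illustrative
        return this;
    }

    public int numberOfResults() {
        return driver.findElements(By.cssSelector(".search-result")).size();
    }
}
```

The @Test method then calls the abstraction, which owns the synchronisation, rather than sprinkling waits through the test code.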
Top 3 Common Causes - Parallel Execution
- Framework not thread safe
- Tests Interfere
- Shared Test Environment
People often jump into Parallel Execution too soon.
- Framework not thread safe
People build Frameworks, which control how they work. Often heavy on inheritance, single instantiated control objects, shared global state and variables. All stuff that makes parallelism hard.
Test Frameworks might be thread safe, but our abstractions often are not.
May need to fork rather than thread e.g. different suites
- Tests Interfere
System state preconditions changed by other tests running in parallel
Shared data can cause interference. Browsers. System processes. System responding to one request.
Might even be bugs in the system.
Shared browser usage.
- Shared Test Environment
Not just shared for automated execution - data controls can help with this.
But shared across different teams, Exploratory testing going on in the same environment as automated execution, CI, performance testing
Common Solutions - Parallel Execution
- Independent environments
- Independent Data
- Separate Suites rather than threaded execution
- Create Threadsafe, reusable code
- Create reusable library abstractions rather than Frameworks
- Avoid ‘static’ singleton objects
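As an illustration of "Avoid 'static' singleton objects" and "Create Threadsafe, reusable code": a single shared static WebDriver is a common source of interference when tests run in parallel threads. A minimal sketch of one alternative, assuming Selenium WebDriver with ChromeDriver; the class name is illustrative.

```java
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;

// Instead of a static singleton WebDriver shared by every thread,
// give each test thread its own browser instance.
public class Drivers {

    private static final ThreadLocal<WebDriver> DRIVER =
            ThreadLocal.withInitial(ChromeDriver::new);

    public static WebDriver get() {
        return DRIVER.get();
    }

    public static void quit() {
        // quit and remove so a reused thread starts with a fresh browser
        DRIVER.get().quit();
        DRIVER.remove();
    }
}
```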
Top 3 Common Causes - Long Running Tests
- Sequential rather than Model Based
- Not delineating between: preconditions, process, assertion
- Component tests in flow rather than isolation
- Sequential rather than Model Based
Model based execution has a lot of synchronisation built in, particularly if it is state transition based, because state transitions have guard conditions or entry conditions. Sync should be baked into model based execution pretty well.
Model based usually has some variety handling.
Sequential doesn’t have a lot of variety. It can’t handle pop-ups, A/B tests.
Short flows are usually easier to make reliable.
- preconditions, process, assertion
We want to:
- setup the preconditions we need
- carry out a process
- assert on the conditions we want to check
Make ‘precondition’ setup part of the test.
Too long a ‘process’ (i.e. the stuff you do in the execution) might leave you at risk of overlap, or take too long to set up.
Too many assertions can slow things down, overlap with other processes, timing of assertions might be hard. Sync on assertion checking.
- Component tests in flow rather than isolation
Very often we think we need to assert on all conditions in a full executable flow - login through the GUI, navigate here, set up data in the GUI, do something, check in the GUI.
It is possible to assert on GUI rendering and some functionality in isolated playgrounds. Particularly if we design the system to operate as components, e.g. good for React and other JavaScript MVC frameworks.
Common Solutions - Long Running Tests
- Understand that more actions == more risk
- Synchronise prior to each step
- Consider Model Based Testing
- Create component test and automated execution playgrounds
- Minimum assertions
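A minimal sketch of delineating preconditions, process and assertion within one short test; the helper methods are hypothetical stand-ins for whatever API and page abstractions your own project uses, stubbed here so the example compiles.

```java
import org.junit.jupiter.api.Test;

import static org.junit.jupiter.api.Assertions.assertEquals;

public class DeleteTodoTest {

    @Test
    void canDeleteATodoFromTheList() {
        // precondition: create the state we need, outside the GUI, as part of the test
        String todoId = createTodoViaApi("buy milk");

        // process: the short GUI workflow under test
        deleteTodoInGui(todoId);

        // assertion: the minimum check that matters for this test
        assertEquals(0, countTodosShownInGui());
    }

    // --- hypothetical abstractions, sketched as stubs ---
    private String createTodoViaApi(String title) { return "todo-1"; }
    private void deleteTodoInGui(String todoId) { /* drive the GUI here */ }
    private int countTodosShownInGui() { return 0; }
}
```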
Top 3 Common Causes - Automatability, Automatizability
Not Testability:
- Application has non-deterministic behaviour
- Hard to Synchronise
- Application fails non-deterministically in live
- Not Testability
Some people refer to this as ’testability’, but it isn’t. I can often easily test an app that I find hard to automate.
- Application has nondeterministic behaviour
JavaScript Callbacks trigger ‘random’ popups
Async and out of order processing
A/B Testing in app
External ‘stuff’ not related to core functionality e.g. ads, libraries, rendering
Alister Scott covers this in his GTAC 2015 talk.
- Hard to Synchronise
JavaScript frameworks. Many DOM updates
e.g. Queued messages (guaranteed), buffered processes, promises.
- Application fails non-deterministically in live
Saw this at a client recently - every team was reporting flaky tests - GUI, API, Components, Backend, Low level stuff. Every team.
Intermittent Application Architecture - the application is intermittent in live, by design.
Not even ‘bugs’, just known to be a ‘flaky’ app at times, and the user just presses the button again.
You can't expect to have deterministic automated execution if your underlying application is non-deterministic.
Common Solutions - Automatability, Automatizability
- Build apps that can be automated
- Non-Deterministic apps need step retry strategies rather than test retry strategies
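A minimal sketch of a step retry (as opposed to re-running the whole test), in plain Java; the method name and retry settings are illustrative. As noted earlier, retries can obscure bugs, so every retry is logged.

```java
import java.time.Duration;
import java.util.function.Supplier;

// Retry a single step rather than the whole test - and log every retry,
// because silent retries hide the intermittency that should be investigated.
public class StepRetry {

    public static <T> T retry(String description, int maxAttempts,
                              Duration pause, Supplier<T> step) {
        RuntimeException lastFailure = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return step.get();
            } catch (RuntimeException failure) {
                lastFailure = failure;
                System.out.printf("retry %d/%d of '%s' after: %s%n",
                        attempt, maxAttempts, description, failure.getMessage());
                try {
                    Thread.sleep(pause.toMillis());
                } catch (InterruptedException interrupted) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while retrying", interrupted);
                }
            }
        }
        throw (lastFailure != null)
                ? lastFailure
                : new IllegalStateException("no attempts made for: " + description);
    }
}
```

Usage might look like `StepRetry.retry("click checkout", 3, Duration.ofMillis(500), () -> { checkoutButton.click(); return true; });` - the retried step is small and specific, and the test still fails if the step never succeeds.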
Top 3 Common Causes - Tools
- Out of Date
- Inappropriate
- Local Tool Infrastructure
- Out of Date
out of date language bindings
out of date drivers and APIs
- Inappropriate
using the wrong tool for the job, e.g. using WebDriver to automate a GUI that sends HTTP REST API requests, instead of automating the HTTP REST API directly
- Local Tool Infrastructure
environment performing automated updates
e.g. browsers automatically update rather than controlled to match the driver versions
e.g. maintaining a local grid or tool environment
environment not maintained
environment not up all the time
environment out of date
Common Solutions - Tools
- Use the right tool for the job
- Keep your tooling environment controlled and up to date
- Change your approach to take latency into account
- process on server return results
- return source, process on execution client
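To illustrate "return source, process on execution client": every WebDriver call to a remote or cloud grid pays the latency cost, so many findElement calls multiply it. A minimal sketch that fetches the page source once and parses it locally, assuming Selenium WebDriver plus the jsoup HTML parser; the selector is illustrative.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.openqa.selenium.WebDriver;

public class SearchResultCounter {

    // Many findElement calls against a remote grid multiply the latency cost.
    // Pulling the source once and parsing it locally needs a single round trip.
    public static int countSearchResults(WebDriver remoteDriver) {
        String html = remoteDriver.getPageSource();   // one network call
        Document page = Jsoup.parse(html);            // local processing
        return page.select(".search-result").size();  // selector is illustrative
    }
}
```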
Top 3 Common Causes - State Preconditions
- Not Checking State Preconditions at start of test
- Not controlling state preconditions prior to test
- Precondition setup using same tool
- Not Checking State Preconditions at start of test
We want to fail fast and at the appropriate point.
We don’t want to ‘do something’ that may or may not pass. We want to make sure it can pass. And report if it can’t.
- Not controlling state preconditions prior to test
To the best of your ability make it so that your execution can pass. Setup what you need. Lock what you need. Book what you need.
- Precondition setup using same tool
e.g. GUI automated execution has state setup using GUI automated execution
See Mark Winteringham's Selenium Conf talk.
Common Solutions - State Preconditions
- control data
- precondition state setup - whatever works
- http, db, api - ‘hack it in’
- avoid dependencies between execution unless a long running test
Use abstraction layers to create the dependencies rather than rely on other tests.
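As an illustration of "precondition state setup - whatever works", here is a minimal sketch that creates test data over HTTP in an abstraction layer before the GUI execution starts, and fails fast if it cannot; it uses java.net.http (Java 11+), and the endpoint, payload, and expected 201 status are hypothetical.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Set up precondition state through an abstraction layer (here, a thin HTTP call)
// rather than by chaining off another test or driving the GUI to create data.
public class TodoPreconditions {

    private final HttpClient client = HttpClient.newHttpClient();

    public void createTodo(String title) throws Exception {
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:8080/api/todos"))  // hypothetical endpoint
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(
                        "{\"title\":\"" + title + "\"}"))
                .build();

        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());

        if (response.statusCode() != 201) {
            // fail fast, at the appropriate point, if the precondition cannot be met
            throw new IllegalStateException(
                    "could not create precondition data: " + response.statusCode());
        }
    }
}
```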
Top 3 Common Causes - Assumptions Encoded in Assertions
- Assert on an Ordered Set
- Assert on Uncontrolled Data
- Assertion Tolerances
- Assert on an Ordered Set
multiple data items, asserted in non-guaranteed order e.g. {1,2,3}
sometimes {3,2,1}, sometimes {2, 1, 3} but we always assert on an ordered set {1,2,3}
instead assert on ‘includes’ and ‘length’
S ∩ {1} == {1}, S ∩ {2} == {2}, S ∩ {3} == {3}, |S| == 3 (i.e. S includes 1, 2 and 3, and has exactly 3 elements)
Need to make sure assertions don’t allow duplicates or extras to slip through.
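A minimal sketch of the 'includes and length' style of assertion, assuming JUnit 5; the {1,2,3} values are the example from the slide.

```java
import org.junit.jupiter.api.Test;

import java.util.List;

import static org.junit.jupiter.api.Assertions.assertEquals;
import static org.junit.jupiter.api.Assertions.assertTrue;

public class UnorderedResultTest {

    @Test
    void resultContainsExpectedItemsRegardlessOfOrder() {
        // imagine this came back from the system in a non-guaranteed order
        List<Integer> result = List.of(3, 1, 2);

        // assert on 'length' - together with the membership check below,
        // this catches extras and duplicates
        assertEquals(3, result.size());

        // assert on 'includes' rather than on position
        assertTrue(result.containsAll(List.of(1, 2, 3)));
    }
}
```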
- Assert on Uncontrolled Data
assert on ‘stuff’ not important to test
This might also arise from duplicated assertions.
- Assertion Tolerances
assertion tolerances not expansive enough e.g. amended time not within 3 seconds (operation took longer than expected), amended time == created time (because the scale used is milliseconds instead of nanoseconds, i.e. not enough resolution to make the distinction)
Common Solutions - Assumptions Encoded in Assertions
- Logging so you can interrogate failure afterwards
- Ability to re-run tests with same data and setup
Top 3 Common Causes - Data
- Missing Data
- Externally controlled data
- Uncontrolled Data
Data issues are pretty easy to spot, and pretty easy to avoid. But are all too common, particularly when we view the data as too hard to setup, because of complicated conditions, or dates, or amount of transaction data.
- Missing Data
- Externally controlled data
  - Static data
  - Hard-coded data
- Uncontrolled Data
  - Live Data
  - Randomly Generated Data
A lot of this comes down to the system design and programming notion of “Avoid Global State”, “Avoid Shared State”, “Avoid Global Variables”.
Common Solutions - Data
- Create data for each test
- Avoid test dependencies
- Avoid re-using data between tests
- Check data as a precondition
- Data synchronisation on all precondition data
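A minimal sketch of "Create data for each test" and "Avoid re-using data between tests": give each test its own uniquely named data so parallel runs and re-runs cannot collide; the naming scheme is illustrative.

```java
import java.util.UUID;

// Each test creates its own data, uniquely named, so tests never
// depend on - or trample over - data created by another test or run.
public class TestData {

    public static String uniqueUsername(String testName) {
        // e.g. "deleteTodo_9f1c2a7e" - readable in logs, unique per run
        return testName + "_" + UUID.randomUUID().toString().substring(0, 8);
    }
}
```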
Summary
- Your Test Execution is not ‘flaky’, it is failing intermittently
- It is possible to remove intermittent failures, even when automating through a GUI
- Common solutions: synchronisation, data control, environmental isolation
Other talks to watch
- Alister Scott, GTAC 2015: Your Tests Aren't Flaky
- Richard Bradshaw, “Your Tests aren't Flaky, You Are!” Selenium Conference 2017
Other talks to watch
- Craig Schwarzwald, SAY GOODBYE TO THE “F” WORD … FLAKY NO MORE!
- Mark Winteringham - REST APIs and WebDriver: In Perfect Harmony
Search also for: Flaky Selenium, Flaky Automation, Flaky Test Automation
End
Alan Richardson www.compendiumdev.co.uk
- Linkedin - @eviltester
- Twitter - @eviltester
- Instagram - @eviltester
- Facebook - @eviltester
- Youtube - EvilTesterVideos
- Pinterest - @eviltester
- Github - @eviltester
- Slideshare - @eviltester
BIO
Alan is a Software Development and Testing Coach/Consultant who enjoys testing at a technical level using techniques from psychotherapy and computer science. In his spare time Alan is currently programming a Twitter client called ChatterScan, and a multi-user text adventure game. Alan is the author of the books “Dear Evil Tester”, “Java For Testers” and “Automating and Testing a REST API”. Alan's main website is compendiumdev.co.uk and he blogs at eviltester.com