Flaky tests in software testing are tests that exhibit inconsistent behavior across different execution runs. This unpredictability leads to unreliable results, making it difficult to assess the true state of the software. These tests can produce both false positives and false negatives in reports, potentially causing unnecessary escalations. False positives occur when a test reports a defect even though the software is functioning correctly, which wastes time and effort investigating issues that do not exist.
Addressing flaky tests requires identifying and understanding their underlying causes. By pinpointing the root causes of flakiness, testers and developers can fix these tests and make the test automation process more stable and reliable.
Table Of Contents
- 1 What is a Flaky Test?
- 2 Characteristics of Flaky Tests
- 3 Why are Flaky Tests a Problem?
- 4 What Causes Flaky Tests?
- 5 Reasons for Flaky Tests
- 6 Flaky Test Examples
- 7 How to Detect Flaky Tests?
- 8 How to Fix Flaky Tests?
- 9 How to Reduce Flaky Test Failures?
- 10 Flaky Test Cases
- 11 How to Manage Flaky Tests Using Testsigma and Automating Flaky Test Detection?
- 12 Best Practices for Identifying and Reducing Flaky Tests
- 13 Leveraging Tools and Techniques to Mitigate Flaky Tests
- 14 Addressing the Root Cause
- 15 Conclusion
What is a Flaky Test?
A flaky test is an automated test that produces inconsistent results when executed multiple times under the same conditions. Flaky tests may intermittently pass or fail because of timing issues, race conditions, environmental dependencies, or underlying system instability, even though neither the code nor the test has changed. Flaky tests are also called non-deterministic tests.
The ‘brittle’ nature of these tests makes them difficult to debug and makes it hard to pinpoint the issues that need fixing. Because they produce false positives and false negatives, flaky tests make it harder to identify genuine defects or regressions in the code, which is why flaky test detection is a necessary part of the testing process.
Characteristics of Flaky Tests
Flaky tests are unreliable, inconsistent, and dependent on external factors. But these are not the only characteristics that define these non-deterministic tests. Some of their characteristics are:
- Intermittent failures: Flaky tests may fail intermittently, producing different outcomes when executed multiple times under the same conditions. These intermittent failures can be unpredictable and occur sporadically.
- Inconsistent results: These tests may produce inconsistent results across different test executions which makes it challenging to determine the true state of the system under test. The test may pass one time and fail another time without any changes to the code or test environment.
- False positives and false negatives: Flaky tests can lead to false positives (incorrectly reporting failures) or false negatives (incorrectly reporting successes), undermining the trustworthiness of test results and hindering defect identification.
- Environment sensitivity: Flaky tests can be sensitive to changes in the test environment, such as different operating systems, browsers, or hardware configurations. Variations in the test environment can affect the test outcome and contribute to flakiness.
Why are Flaky Tests a Problem?
By definition, flaky tests produce false positives and false negatives: a test can fail even when the software works correctly, or pass even when a defect is present. Such results can lead testers and developers to believe the system is malfunctioning when it is not. And even when a real problem exists, it is hard to locate the affected area and fix it, because a flaky test cannot be reproduced reliably enough to debug.
This inconsistency undermines efforts to build a quality system and to resolve problems that may (or may not) exist.
Flaky tests waste time and human resources, and they push back delivery without any solution in sight.
What Causes Flaky Tests?
For companies that practice continuous testing in their SDLC, dependence on test automation has increased, and test execution speed has improved significantly as a result. Yet flaky tests can creep in when automation suites break and produce unreliable results. If they are not addressed, these tests can catch developers off guard and prolong the entire development and testing process, while bugs may still remain in the system.
To reduce or eliminate flaky tests, you first need to know what causes them. The most common causes are:
- No wait statements
- External dependencies
- Race conditions
- Fluctuating test conditions
Each of these is covered in more detail below.
Reasons for Flaky Tests
1. Timing Issues
Tests that interact with UI elements or API calls may fail intermittently because response times vary, or because asynchronous operations run without proper synchronization.
Most modern applications make multiple AJAX calls, so the application may respond at slightly different times on each run. In these cases, tests that assume a fixed response time become flaky.
If a test waits for a specific UI element using a hard-coded wait time, it can become flaky, because the time the element takes to appear varies with network speed and many other factors. It is always better to use dynamic waits, which wait for a specific condition, wherever necessary.
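The difference between a hard-coded wait and a dynamic wait can be sketched in a few lines of Python. The helper below polls a condition until it holds or a timeout expires, which is the same idea behind explicit waits such as Selenium's `WebDriverWait`. The names `wait_until`, `WaitTimeout`, and `find_submit_button` are hypothetical, and the "element" is simulated with a timer rather than a real browser.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class WaitTimeout(Exception):
    """Raised when a condition is still not met after the timeout."""


def wait_until(condition: Callable[[], Optional[T]], timeout: float = 10.0, poll: float = 0.1) -> T:
    """Poll `condition` until it returns a truthy value or `timeout` seconds have passed.

    Unlike a hard-coded sleep, this returns as soon as the condition holds and only
    waits the full timeout when something is actually wrong.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise WaitTimeout(f"condition not met within {timeout} seconds")
        time.sleep(poll)


# Simulated element that only "appears" once a background AJAX call has finished
ready_at = time.monotonic() + 0.5


def find_submit_button() -> Optional[str]:
    return "submit-button" if time.monotonic() >= ready_at else None


# Brittle alternative: time.sleep(0.3) followed by a lookup fails whenever the call takes longer
button = wait_until(find_submit_button, timeout=5)
print(button)  # submit-button
```

The timeout still matters: it bounds how long a genuinely broken page can stall the suite, while a fast response no longer pays for the worst case.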
2. Test Dependency
Test dependency refers to a situation where the outcome of one test is influenced by other tests. It can cause false positives or false negatives, which makes the test flaky and compromises the stability and reliability of the suite.
When multiple test cases share the same data set, dependency issues can arise. If one test modifies the data in a way the next test case does not expect, the second test will fail.
Test cases that rely on network services or third-party application integrations are more prone to failure. If these external dependencies are not consistently available, or if they change, the reliability of the test results suffers.
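For third-party dependencies, one common approach is to replace the real client with a stub during the test. The sketch below uses Python's standard `unittest.mock`; `convert` and the `get_rate` method are hypothetical stand-ins for your own code and API client.

```python
from unittest import mock


def convert(amount: float, currency: str, rates_client) -> float:
    """Hypothetical code under test: converts using a rate fetched from a third-party API."""
    rate = rates_client.get_rate(currency)  # a network call in production
    return round(amount * rate, 2)


def test_convert_with_stubbed_api() -> None:
    # The stub answers instantly with a fixed rate, so the test no longer
    # depends on network availability or live exchange rates
    fake_client = mock.Mock()
    fake_client.get_rate.return_value = 1.25
    assert convert(10, "EUR", fake_client) == 12.5
    fake_client.get_rate.assert_called_once_with("EUR")


def test_convert_when_api_times_out() -> None:
    # Failure modes can be simulated deterministically as well
    fake_client = mock.Mock()
    fake_client.get_rate.side_effect = TimeoutError("rates service timed out")
    try:
        convert(10, "EUR", fake_client)
    except TimeoutError:
        pass
    else:
        raise AssertionError("expected TimeoutError")
```

Stubs only cover your side of the contract, so a separate, less frequent check against the real service is still useful for catching API changes.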
3. Multithreaded Scenarios
A critical section is a portion of code where shared resources are accessed and modified. When multiple threads or processes enter the same critical section simultaneously without synchronization, a race condition can occur, and the result depends on the order in which the threads happen to run.
Non-atomic operations, where a test performs multiple steps that other tests or processes can interleave with, may also lead to inconsistent test results.
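The following sketch, using Python's `threading` module, shows a non-atomic read-modify-write on a shared counter next to the same operation guarded by a lock. The `SharedCounter` class and `run_concurrently` helper are hypothetical, and `time.sleep(0)` is only there to make a thread switch between the read and the write more likely.

```python
import threading
import time


class SharedCounter:
    """Hypothetical shared resource touched by several threads."""

    def __init__(self) -> None:
        self.value = 0
        self._lock = threading.Lock()

    def unsafe_increment(self) -> None:
        # Non-atomic read-modify-write: another thread may run between the read and the write
        current = self.value
        time.sleep(0)  # encourage a thread switch to widen the race window
        self.value = current + 1

    def safe_increment(self) -> None:
        # The lock turns the read-modify-write into a critical section only one thread can enter
        with self._lock:
            current = self.value
            time.sleep(0)
            self.value = current + 1


def run_concurrently(increment, threads: int = 8, calls_per_thread: int = 200) -> None:
    """Call `increment` from several threads at once and wait for all of them to finish."""
    def worker() -> None:
        for _ in range(calls_per_thread):
            increment()

    workers = [threading.Thread(target=worker) for _ in range(threads)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()


unsafe = SharedCounter()
run_concurrently(unsafe.unsafe_increment)
# Can end up below 1600, and the exact value may change from run to run
print("unsafe:", unsafe.value)

safe = SharedCounter()
run_concurrently(safe.safe_increment)
print("safe:", safe.value)  # always 1600 (8 threads x 200 calls)
```

A test asserting `unsafe.value == 1600` would pass or fail depending on thread scheduling, which is exactly the intermittent behavior described above.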
4. Environment Dependency
Test results also depend on the environment in which the code is executed. Running a test on a different operating system, with a different configuration, or under other external conditions can change the behavior of the application under test, leading to inconsistent test results and potentially causing flaky tests.
The software environment can vary with different operating systems, browsers, and versions of software dependencies. Each configuration may have its peculiarities or bugs, which could lead to different behavior in the application.
Test automation may run on various hardware configurations, such as different processors, memory sizes, or graphics cards. These hardware differences can affect the application’s performance and behavior, leading to varying test outcomes. Also, in virtualized or containerized environments, differences in host configurations or resource allocations can impact the behavior of the application and tests.
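Environment differences are not limited to hardware. As a small, hypothetical illustration, the Python functions below format a timestamp as a date: `order_date_local` silently depends on the timezone configured on the machine running the test, while `order_date` takes the timezone as an explicit argument. The timestamp value is arbitrary.

```python
from datetime import datetime, timedelta, timezone

ORDER_TIMESTAMP = 1_700_000_000  # 2023-11-14 22:13:20 UTC (arbitrary example value)


def order_date_local(ts: float) -> str:
    # Environment-dependent: uses whatever timezone the machine running the test is set to
    return datetime.fromtimestamp(ts).strftime("%Y-%m-%d")


def order_date(ts: float, tz: timezone = timezone.utc) -> str:
    # Deterministic: the timezone is an explicit input rather than a machine setting
    return datetime.fromtimestamp(ts, tz).strftime("%Y-%m-%d")


print(order_date(ORDER_TIMESTAMP))                                # 2023-11-14
print(order_date(ORDER_TIMESTAMP, timezone(timedelta(hours=5))))  # 2023-11-15
```

An assertion that `order_date_local(ORDER_TIMESTAMP) == "2023-11-14"` would pass on a build agent configured for UTC and fail on one configured for UTC+5, because 22:13 UTC is already the next day there.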
5. Unreliable Test Frameworks
Unreliable test frameworks may produce varying outcomes for the same test scenarios, even when the application’s behavior remains unchanged. Such inconsistencies make it challenging to rely on test results and undermine the confidence in the testing process.
Some test frameworks have limited compatibility with certain operating systems, browsers, or devices, making it difficult to ensure comprehensive test coverage across environments. Insufficient or unclear documentation can also hinder adoption and usage: testers may struggle to understand the framework’s features, configuration options, and best practices.
Unreliable frameworks may have poor support, infrequent updates, or an inactive development community. This lack of maintenance can result in outdated dependencies, security vulnerabilities, and compatibility issues.
Test frameworks that suffer from performance issues or do not scale well to handle large test suites can impact productivity and efficiency in the testing process.
If a test framework lacks integrations with other testing tools or continuous integration systems, it may hinder the overall test automation workflow.
6. Non-Deterministic Assertions
Non-deterministic assertions in software testing refer to assertions or checks that are not consistently true or false when applied to the same test case across multiple test runs. In other words, the outcome of these assertions may vary based on factors such as timing, external dependencies, or environment conditions. This variability can lead to flaky tests, where the same test may produce different results on different test executions.
One common example of non-deterministic assertions is when tests involve interactions with dynamic web elements or asynchronous operations. These tests might assert the presence or absence of an element on the page, but due to timing issues, the element’s state might change between test runs, causing the assertion to fail unpredictably. Similarly, when tests involve interactions with external APIs or databases, non-deterministic response times or data changes can affect the outcome of assertions.
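One non-deterministic assertion that is easy to miss is comparing a collection whose order is not guaranteed. In this Python sketch, `fetch_tags` is a hypothetical stand-in for an API or database query that returns the right items in arbitrary order; asserting on an exact list passes in only some runs, while an order-insensitive comparison is stable.

```python
import random
from collections import Counter
from typing import List


def fetch_tags() -> List[str]:
    """Hypothetical query that returns the right tags in no guaranteed order."""
    tags = ["api", "ui", "db"]
    random.shuffle(tags)
    return tags


def assert_same_items(actual: List[str], expected: List[str]) -> None:
    """Order-insensitive comparison that still catches missing or duplicated items."""
    assert Counter(actual) == Counter(expected), f"{actual!r} != {expected!r}"


# Non-deterministic: passes only when the shuffle happens to produce this exact order
# assert fetch_tags() == ["api", "ui", "db"]

# Deterministic: the same items in any order always pass
assert_same_items(fetch_tags(), ["db", "api", "ui"])
```

With three items shuffled at random, the exact-order assertion holds in roughly one run out of six. Only drop the ordering check when the order is genuinely unspecified; if the API promises a sort order, assert on it.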
Flaky Test Examples
Any test can be flaky if its results differ between executions. For example, you and a colleague might run the same test at the same time and get different outcomes.
Imagine a login test where occasionally, due to network latency or server load, the authentication request takes longer than usual to process. As a result, the test may intermittently fail due to a timeout error, despite the login functionality working correctly. This inconsistency in test outcomes characterizes flaky testing.
Similarly, a checkout process of an e-commerce website failing or passing on different test runs due to server-side caching or database synchronization issues also qualifies as a flaky test example.
How to Detect Flaky Tests?
Flaky tests have a serious impact on the test execution and product quality. Additionally, they undermine the confidence of the team in their development and testing efforts. All this makes it necessary to detect flaky tests and do so accurately.
Here’s how flaky test detection is possible:
- Rerun the tests: If a test fails, rerun it without changing the code or the test. If the two runs produce different results, the test is flaky.
- Run tests in parallel: A test that passes (or fails) when executed on its own but returns a different result when run alongside other tests is flaky.
- Logging and monitoring: Carefully assess the outcome of each test execution to identify the inconsistencies in the results. If the test provides conflicting results, it is likely a flaky test.
- Execute tests in different environments: A test failing in another environment after displaying a pass result in one environment indicates that an unstable test environment can lead to test flakiness.
- Use tools and frameworks: Use a reliable automation testing tool, such as Testsigma, that offers comprehensive test analysis for reviewing and analyzing test outcomes, including pass/fail statuses, error messages, and execution logs. Also consider test automation frameworks or tools that align with your test automation needs.
How to Fix Flaky Tests?
After detecting flaky tests, the next step is to fix them and prevent future occurrences. The steps below will guide you in fixing flaky tests in your test suite:
- Isolate flaky tests: Separate the identified flaky test cases and run them independently. This can help identify specific conditions or dependencies that trigger flakiness and facilitate targeted debugging and troubleshooting.
- Configure the environment: Ensure consistent and stable test environments by standardizing configurations, dependencies, and external factors such as browsers, devices, or network settings. Minimize variability in the test environment to reduce the likelihood of flakiness.
- Improve error handling: Focus on making the test cases robust by writing better error handling statements in the code. For scriptless testing, include wait and retry statements during the execution.
- Continuously monitor the feedback: Continuously monitor test execution and collect feedback from testers, developers, and stakeholders to identify recurring flaky tests and prioritize them for resolution.
- Execute collaborative debugging: Foster collaboration between testers, developers, and automation engineers to debug and troubleshoot flaky tests effectively.
How to Reduce Flaky Test Failures?
You might not be able to eliminate flaky tests, but you can definitely reduce the failure instances.
Start by avoiding hard-coded waits and fixed delays in the test script, and wait for specific conditions instead. Keep a close eye on test environments, server load, and external dependencies, which often add to test flakiness.
Moreover, when running tests in parallel, make sure each test runs under the same conditions so that concurrent execution does not cause failures.
Flaky Test Cases
Flaky test cases refer to automated tests that produce inconsistent results, often exhibiting intermittent failures or unpredictable behavior. Here are examples of flaky test cases:
- Login Test: Occasionally, due to network latency or server load, the authentication request may take longer than usual to process, leading to intermittent failures or timeouts.
- Checkout Process: Fluctuations in inventory availability or server-side caching may cause inconsistencies in the checkout process, which may result in failures when adding products to the cart or proceeding to checkout.
- Search Functionality: Variations in search index updates or database synchronization may lead to inconsistent search results when validating search functionality. Comparing search test results across several runs helps reveal which of these tests may be flaky.
- File Upload Test: Network congestion or server delays may impact the upload process.
- API Integration Test: Dependency on external services or third-party APIs can introduce variability in test outcomes, causing intermittent failures due to network issues or changes in API responses.
- UI Rendering Test: Changes in browser versions or rendering engines may affect UI elements’ appearance or layout, resulting in failures in UI tests.
- Concurrency Test: Any test involving concurrent execution or shared resources may encounter race conditions, which can lead to unpredictable outcomes and failures.
- Data-Driven Test: Fluctuations in test data or database state may cause inconsistencies in test results when validating data-driven functionality.
- Cross-Browser Test: Variations in browser behavior or rendering may result in inconsistencies in test outcomes across different browsers.
- End-to-End Workflow Test: Complex end-to-end workflows involving multiple steps or dependencies may end up introducing timing issues or synchronization problems during test execution.
How to Manage Flaky Tests Using Testsigma and Automating Flaky Test Detection?
Before you onboard a platform that can handle flaky tests on your behalf, you will have to automate flaky test detection. How do you do that?
Start with investigating if the test failed because of an issue in the application or because of problems in test creation or execution. This will help determine the root cause of the problem and figure out a solution.
Next, make use of test execution data, screenshots, and result logs. Testsigma comes equipped with a comprehensive test result analysis feature that allows review and analysis of test outcomes, including pass/fail statuses, error messages, and execution logs.
[Image: Testsigma results log showing the test execution summary]
Moreover, if you need to rerun test cases to confirm that a test is flaky, use Testsigma’s retry mechanisms to automatically rerun failed tests and determine whether they pass on subsequent executions. This helps you monitor how often tests are retried and analyze the outcomes.
[Image: Rerun the entire test script, or choose Run till here to pinpoint the failed test step]
You can also configure the test environment and test lab for each execution to reduce instances of flaky tests.
[Image: Select the desired test lab and test environment during test execution]
Best Practices for Identifying and Reducing Flaky Tests
Identifying and reducing flaky tests is essential for maintaining a reliable and efficient test automation process. Here are 15 best practices to help you achieve this goal:
1. Monitor Test Execution History: Over a period of time, extract the test execution history and track the results to find patterns of flakiness. Analyze test failures and inconsistencies to identify the flaky tests.
2. Tag Flaky Tests: Tag known flaky tests in your test suite so they can be reported and handled separately. This helps differentiate flaky test failures from genuine defects and gives you a reliable report for the consistent test cases.
3. Run Tests in Isolation: Make sure each test case can work on its own and doesn’t need the data or conditions from other tests. Keep tests separate to prevent problems with shared information. Try to divide and organize tests as much as you can.
4. Use Stable Test Data: Maintain clean and consistent test data to minimize data-related flakiness. Reset data to a known state before each test execution. In automation environments, prefer production-like data or a mocked copy of production data, and refresh your automation data set regularly.
5. Implement Retry Mechanisms: If the test is not critical and sometimes produces inconsistent results, you can try adding a retry mechanism. This will automatically rerun the test, improving the likelihood of getting a reliable outcome.
6. Automate Cleanup Tasks: Ensure that tests are designed to undo any modifications made during test execution, leaving a clean environment for the next test run. Additionally, consider implementing automatic data clean-up or cache removal after each test execution to maintain a consistent testing environment.
7. Regularly Update Test Dependencies: To ensure a stable and dependable test automation process, it is essential to regularly update your test automation tools and libraries. Also, keep a vigilant watch on dependencies and third-party libraries used in your tests. Updating them as required will prevent flakiness and promote smoother test execution.
8. Synchronize Test Actions: Use dynamic waits and synchronization techniques to ensure tests interact with elements and resources only when they are ready and available.
9. Mock External Dependencies: Mocking or stubbing external services or APIs creates controlled and predictable test environments. This removes the need to call or modify the real API and leads to more effective test execution.
10. Leverage Parallel Testing: Run tests in parallel to speed up execution and detect flakiness faster. However, design tests to handle concurrent execution without interference.
11. Investigate Flaky Tests Promptly: Debug flaky tests as soon as they appear. Enhanced logging and tracing of test execution help identify potential timing or environment issues, ultimately reducing flakiness.
12. Continuous Integration and Reporting: By integrating test automation with a CI/CD pipeline, tests can be automatically run, and timely feedback on flaky tests can be received. Frequent test runs aid in quickly identifying flaky tests.
13. Document Test Environments: It is crucial to maintain clear and detailed documentation of test environments, configurations, and dependencies to ensure consistency. Documentation plays a vital role in establishing a robust framework.
14. Continuous Test Maintenance: Regularly review and update tests as the application evolves to maintain relevancy and stability.
15. Collaboration between Teams: Foster effective communication and collaboration between testers, developers, and stakeholders to collectively address flaky tests and improve overall test reliability.
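Practices 1 and 2 (monitoring execution history and tagging flaky tests) can be combined in a simple metric: how often a test's outcome flips between consecutive runs. The Python sketch below computes that flip rate from a made-up history and tags tests above an assumed threshold of 0.3; the function names, data, and threshold are all illustrative, not recommendations.

```python
from typing import Dict, List


def flip_rate(history: List[str]) -> float:
    """Fraction of consecutive run pairs whose outcome changed (pass -> fail or fail -> pass)."""
    if len(history) < 2:
        return 0.0
    flips = sum(1 for prev, cur in zip(history, history[1:]) if prev != cur)
    return flips / (len(history) - 1)


def tag_flaky(histories: Dict[str, List[str]], threshold: float = 0.3) -> List[str]:
    """Return names of tests whose flip rate exceeds `threshold`, most unstable first."""
    rates = {name: flip_rate(runs) for name, runs in histories.items()}
    flaky = [name for name, rate in rates.items() if rate > threshold]
    return sorted(flaky, key=lambda name: rates[name], reverse=True)


# Made-up execution history, oldest run first
history = {
    "test_login": ["pass", "fail", "pass", "pass", "fail", "pass"],
    "test_checkout": ["pass"] * 6,
    "test_search": ["pass", "pass", "fail", "fail", "fail", "fail"],
}
print(tag_flaky(history))  # ['test_login']
```

`test_search` fails on every run after a certain point, so its flip rate stays low. That pattern points to a genuine regression rather than flakiness, which is why a flip rate separates the two better than a raw failure count.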
Leveraging Tools and Techniques to Mitigate Flaky Tests
You can effectively reduce flaky tests by using different tools and techniques that improve the stability and reliability of your test automation.
Here’s one pointer for each tool:
1. Testsigma (Test Automation Platform): Testsigma is a no-code test automation platform that lets you automate your tests in simple English, intelligently handles dynamic web applications and provides self-healing mechanisms to reduce flakiness.
2. Selenium: Selenium is a popular automation framework that supports various programming languages and browsers for cross-browser testing. Leveraging implicit and explicit waits in Selenium helps ensure tests wait for elements to be ready before interacting with them, reducing timing-related flakiness.
3. TestNG (Test Next Generation): TestNG is a testing framework that supports test dependency management, allowing you to manage test execution order and reduce dependencies between tests.
4. JUnit: JUnit is a widely used unit testing framework for Java that helps in writing and executing unit tests. Using JUnit’s built-in assertion methods keeps the validation of test outcomes explicit and consistent.
5. Cucumber (BDD Framework): Cucumber enables writing tests in a human-readable format (Gherkin) and encourages collaboration between stakeholders. Clear and descriptive scenarios in Cucumber tests make it easier to identify flaky test cases and their behavior.
Addressing the Root Cause
Addressing the root cause of flaky tests is vital for a robust test automation process. Thoroughly analyzing test failures helps identify underlying issues causing inconsistency. Mitigating flaky tests involves handling timing problems, synchronizing test actions, and using stable test data. Employing proper waits, dynamic locators, and mock services also stabilizes tests. Regularly updating dependencies, maintaining documentation, and collaborating with team members resolve flakiness. Continuous monitoring, prompt investigation, and integrating with CI/CD pipelines aid in early detection and resolution, ensuring a reliable and efficient test suite.
Conclusion
In conclusion, flaky tests present a challenge in test automation with their inconsistent behavior. They can lead to wasted efforts and reduced confidence in the testing process. However, by identifying root causes, using stable data, and leveraging appropriate tools, we can mitigate flakiness and establish a more reliable test suite. Proactive measures and collaboration between testers and developers are crucial for achieving consistent and high-quality test automation.