Preempting Flaky Tests via Non-Idempotent-Outcome Tests
Anjiang Wei, Pu Yi, Zhengxi Li, Tao Xie, Darko Marinov, Wing Lam
anjiang@stanford.edu
Funding acknowledgments: CCF-1763788, CCF-1956374, 62161146003
Developer Anecdote
[Figure: animated timeline. A developer submits the change below; servers build the code and run tests test0, test1, test2, …, testn. A pass leads to "Merge Changes"; a fail leads to "Debug Changes". Timeline marks: 4:15 PM, 5:00 PM, 5:30 PM, 6:15 PM.]
…
- static int add() {
+ static int add(r) {
- ts.addRow("");
+ ts.addRow(r);
return ts.size();
…
Developer wastes time debugging & running tests and goes home 1 hour and 15 min later
Flaky Test: a test that can non-deterministically pass and fail when run on the same version of the code
Public Outcry About Flaky Tests
What are Flaky Tests?
• A test is flaky if it passes and fails for the same code version
• Misleads developers into debugging nonexistent faults in recent changes
• Reduces trust in tests
• Order-dependent tests are a prominent category of flaky tests
• An order-dependent test deterministically passes or fails in any given test order; it passes in at least one order and fails in at least one other
Background: Victim and Polluter
• Victim 𝑡1 fails when run after polluter 𝑡2
• Polluter has modified some shared state
• Victim’s test assertion depends on some shared state
• The same shared state (the variable 𝑥 in the code)
// shared variable x is initialized to 0
void t1() { assert x == 0; } // victim
void t2() { x = 1; } // polluter
TestOrder1: t1, t2 (t1 passes)
TestOrder2: t2, t1 (t1 fails)
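The victim/polluter relationship above can be reproduced in a few lines. The following is a minimal sketch, assuming a dictionary `state` as a stand-in for the shared variable x and a hypothetical helper `run_order` that resets the state and runs tests in a given order; neither name comes from the talk.

```python
# Hypothetical shared state standing in for the variable x on the slide.
state = {"x": 0}


def t1():
    # Victim: its assertion reads the shared state.
    assert state["x"] == 0


def t2():
    # Polluter: writes the shared state and never restores it.
    state["x"] = 1


def run_order(tests):
    """Reset the shared state, run the tests in order, return name -> outcome."""
    state["x"] = 0
    outcomes = {}
    for test in tests:
        try:
            test()
            outcomes[test.__name__] = "pass"
        except AssertionError:
            outcomes[test.__name__] = "fail"
    return outcomes


order1 = run_order([t1, t2])  # TestOrder1: victim runs before the polluter
order2 = run_order([t2, t1])  # TestOrder2: victim runs after the polluter
print(order1)
print(order2)
```

The same code version yields different outcomes for t1 depending only on the order, which is what makes t1 order-dependent.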
Background: Latent-Victim, Latent-Polluter
• Latent-Victim 𝑡3:
• Assertion depends on shared state; currently no tests modify 𝑦
• victims ⊂ latent-victims
• Latent-Polluter 𝑡4:
• Shared state modification; currently no tests put assertions on 𝑧
• polluters ⊂ latent-polluters
// shared variables x, y, z are initialized to 0
void t1() { assert x == 0; } // victim
void t2() { x = 1; } // polluter
void t3() { assert y == 0; } // latent-victim
void t4() { z = 1; } // latent-polluter
Non-Idempotent-Outcome (NIO) Test
• A test is non-idempotent-outcome (NIO):
• t5(); t5() → pass; fail
• Passes in the first run but fails in the second when run twice consecutively
• An NIO test self-pollutes the state that its own assertions depend on
• NIO ⊂ latent-polluter ∧ NIO ⊂ latent-victim
// shared variables x, y, z, w are initialized to 0
void t1() { assert x == 0; } // victim
void t2() { x = 1; } // polluter
void t3() { assert y == 0; } // latent-victim
void t4() { z = 1; } // latent-polluter
void t5() { assert w == 0; w = 1; } // NIO
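The definition suggests a direct check: run a test twice in the same process and compare the outcomes. The sketch below assumes a dictionary `shared` in place of the shared variable w; the helper `is_nio` and the variant `t5_fixed` (which restores the state it modifies) are illustrative names, not part of the talk.

```python
# Hypothetical shared state; "w" mirrors the slide, "v" is used by the fixed variant.
shared = {"w": 0, "v": 0}


def t5():
    # NIO: asserts on state that it then pollutes.
    assert shared["w"] == 0
    shared["w"] = 1


def t5_fixed():
    # Same check, but the test cleans up the state it touched.
    try:
        assert shared["v"] == 0
        shared["v"] = 1
    finally:
        shared["v"] = 0


def outcome(test):
    """Run one test and report 'pass' or 'fail'."""
    try:
        test()
        return "pass"
    except Exception:
        return "fail"


def is_nio(test):
    """True if the test passes on the first run and fails on an immediate rerun."""
    first = outcome(test)
    second = outcome(test)
    return first == "pass" and second == "fail"


t5_is_nio = is_nio(t5)
t5_fixed_is_nio = is_nio(t5_fixed)
print(t5_is_nio, t5_fixed_is_nio)
```

Because the check only needs two consecutive runs of one test, it requires no knowledge of which other tests share the state.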
Why should we detect NIOs?
• Typically, tests are not run twice
• To preempt/prevent flaky tests
• Why not fix latent-polluter?
• Why not fix latent-victim?
• Prior work
• Gyori et al. [1] detect 575 latent-polluters
• Manually filter 381 (66%) false positives (cannot reasonably become polluters)
• Huo and Clause [2] detect latent-victims with dynamic taint analysis
• Do not report how many can reasonably become victims
• They do NOT fix any tests
• NIOs are more worth fixing
• Both latent-victims and latent-polluters at the same time
• Easy to detect, no false positives
• Well-accepted fixes
[1] Gyori et al., "Reliable testing: Detecting state-polluting tests to prevent test dependency". ISSTA 2015
[2] Huo and Clause, "Improving oracle quality by detecting brittle assertions and unused inputs in tests". FSE 2014
Contributions
• Definition of NIO tests
• Deterministically change from pass to fail when run twice
• Effective detection & empirical evaluation
• Propose 3 modes for detection
• 127 Java test suites → 223 NIO tests
• 1006 Python projects → 138 NIO tests
• Well-accepted fixes
• Inspect every NIO test (no false positive)
• Open pull requests for 268 tests
• 192 accepted, 70 pending, only 6 rejected
Real Example of NIO
Buggy Cleaning Code
def cmd_mock():
    def _cmd_mock(name: str):
        cmd.__overrides__[name] = ['/bin/true']
    yield _cmd_mock
-   cmd.__overrides__ = []
+   cmd.__overrides__ = {}

def test_slurm_command(tmp_path, cmd_mock):
    cmd_mock('srun')

TypeError: list indices must be integers or slices, not str
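To see why this cleanup code makes the test NIO, the fixture can be driven by hand without a test framework. This is a sketch under assumptions: `cmd` is a `SimpleNamespace` stand-in for the real module, and `make_fixture`, `run_test_once`, and `run_twice` are hypothetical helpers that mimic a setup/test/teardown cycle.

```python
from types import SimpleNamespace

# Hypothetical stand-in for the module whose __overrides__ the fixture edits.
cmd = SimpleNamespace(__overrides__={})


def make_fixture(buggy):
    """Build the generator fixture with either the buggy or the fixed teardown."""
    def cmd_mock():
        def _cmd_mock(name):
            cmd.__overrides__[name] = ["/bin/true"]
        yield _cmd_mock
        # Teardown: the buggy version resets to a list instead of a dict.
        cmd.__overrides__ = [] if buggy else {}
    return cmd_mock


def run_test_once(fixture):
    """Mimic one setup / test body / teardown cycle."""
    gen = fixture()
    mock = next(gen)            # setup: run the fixture up to its yield
    try:
        mock("srun")            # test body
        return "pass"
    except TypeError as exc:
        return f"fail: {exc}"
    finally:
        next(gen, None)         # teardown: resume the fixture after its yield


def run_twice(buggy):
    cmd.__overrides__ = {}      # fresh state before the first run
    fixture = make_fixture(buggy)
    return [run_test_once(fixture), run_test_once(fixture)]


buggy_results = run_twice(buggy=True)
fixed_results = run_twice(buggy=False)
print(buggy_results)
print(fixed_results)
```

With the buggy teardown, the first run leaves a list behind, so the second run fails when it indexes that list with a string.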
Real Example of NIO
Fix: Avoid Function Side Effect
def to_zero(tvd, northing, easting,
            surface_northing, surface_easting):
    # perform some checking
-   northing -= surface_northing
-   easting -= surface_easting
+   northing = northing - surface_northing
+   easting = easting - surface_easting
    return tvd, northing, easting

# initialization for global variables: g1,…,g5
g1 = ...
def test_zero():
    # global variables passed in as arguments
    v1, v2, v3 = to_zero(g1, g2, g3, g4, g5)
    np.testing.assert_equal(...) # assertion

AssertionError: Mismatched elements: 121 / 121 (100%)
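The difference between the removed and added lines is whether the caller's object is mutated. The sketch below illustrates this with a tiny hypothetical `Vec` class whose `-=` works in place (as it does for NumPy arrays); the simplified two-argument `to_zero_buggy` and `to_zero_fixed` are assumptions made for illustration, not the project's code.

```python
class Vec:
    """Tiny illustrative vector type whose `-=` mutates the object in place."""

    def __init__(self, values):
        self.values = list(values)

    def __sub__(self, other):
        # Returns a new object; the operands are left untouched.
        return Vec(a - b for a, b in zip(self.values, other.values))

    def __isub__(self, other):
        # Modifies self in place, so every reference to it sees the change.
        for i, b in enumerate(other.values):
            self.values[i] -= b
        return self


def to_zero_buggy(northing, surface_northing):
    northing -= surface_northing            # mutates the caller's argument
    return northing


def to_zero_fixed(northing, surface_northing):
    northing = northing - surface_northing  # rebinds a local name only
    return northing


def run_twice(to_zero):
    """Call to_zero twice on the same 'global' inputs, as a rerun of the test would."""
    g_northing = Vec([10, 20])
    g_surface = Vec([1, 2])
    results = []
    for _ in range(2):
        results.append(to_zero(g_northing, g_surface).values[:])
    return results


buggy_results = run_twice(to_zero_buggy)
fixed_results = run_twice(to_zero_fixed)
print(buggy_results)
print(fixed_results)
```

In the buggy variant the second call starts from already-shifted inputs, so an assertion against the expected values passes once and then fails.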
Prevalence of NIO Tests
                         Java   Python
# Test Suites (total)     127     1006
# Test Suites w/ NIO       34      138
% Test Suites w/ NIO      26%       9%
# NIO Tests               223      138
Conclusion:
• NIO tests are prevalent enough that every project should run NIO detection at least once
Different Detection Modes
• Three Different Modes
• Isolated-method
• Run1: t1, t1
• Run2: t2, t2
• Run3: t3, t3
• Isolated-class
• Run1: t1, t1, t2, t2
• Run2: t3, t3
• Entire-suite
• Run1: t1, t1, t2, t2, t3, t3
• Conclusion
• All three modes detect similar tests
• Isolated-method (223) > Isolated-class (212) > Entire-suite (210)
• Entire-suite has the lowest overhead
• Why differ? See paper for details
Example test suite for the runs above: TestClass A contains t1 and t2; TestClass B contains t3
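The three modes differ only in how the doubled tests are grouped into runs. A minimal sketch of that grouping follows, assuming the suite is given as a mapping from class name to test names; `plan_runs` and `doubled` are hypothetical helpers, and each inner list is assumed to execute in its own fresh process.

```python
# Example suite from the slide: class name -> tests in that class.
suite = {"A": ["t1", "t2"], "B": ["t3"]}


def doubled(tests):
    """Repeat every test twice, back to back."""
    schedule = []
    for test in tests:
        schedule += [test, test]
    return schedule


def plan_runs(suite, mode):
    """Return a list of runs; each run is the ordered list of tests it executes."""
    if mode == "isolated-method":
        # One run per test method.
        return [[t, t] for tests in suite.values() for t in tests]
    if mode == "isolated-class":
        # One run per test class.
        return [doubled(tests) for tests in suite.values()]
    if mode == "entire-suite":
        # A single run for the whole suite.
        all_tests = [t for tests in suite.values() for t in tests]
        return [doubled(all_tests)]
    raise ValueError(f"unknown mode: {mode}")


for mode in ("isolated-method", "isolated-class", "entire-suite"):
    print(mode, plan_runs(suite, mode))
```

Fewer runs mean fewer process start-ups, which is consistent with entire-suite having the lowest overhead, while more isolation reduces interference from earlier tests in the same run.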
Experience with Fixing NIO Tests
• We detect 361 (223 Java + 138 Python) NIO tests
• We fix 268 NIO tests by opening pull requests
• 192 tests accepted
• 70 tests pending
• 6 tests rejected
• We do not fix 51 NIO tests
• Cannot localize pollution
• Difficult to clean the pollution
• 42 tests are N/A
• Not NIO in the latest version (fixed/deleted/etc.)
• Conclusion
• Developers are generally positive about fixes for NIO tests
• Providing reproducing steps and explaining the motivation help
NIO vs. Polluter vs. Victim
• NIO tests are related to but not subsumed by polluters and victims
• Detecting NIO tests can be an effective way to preempt polluters and victims
Conclusions
• We focus on Non-Idempotent-Outcome (NIO) tests
• Deterministically change from pass to fail when run twice
• Detect and fix NIO tests
• Preempt order-dependent flaky tests
• Importance: in the intersection of latent-polluters and latent-victims
• Detect 361 NIO tests (223 Java + 138 Python)
• Opened pull requests for 268 tests, with 192 accepted
• Dataset publicly available:
• https://sites.google.com/view/nio-tests
• IDoFT dataset (all flaky tests): https://github.com/TestingResearchIllinois/idoft
Questions? Email: Anjiang Wei <anjiang@stanford.edu>

NIO-ICSE2022.pptx
