# Addressing Flaky GTests

## Understanding builder results

[LUCI Analysis](https://luci-analysis.appspot.com/p/chromium/clusters) lists the
top flake clusters of tests along with any associated bug and failure counts in
different contexts.

## Reproducing the flaky test

If debugging via bot is too slow or you otherwise need to drill further into the
cause of the flake, you can try to reproduce the flake locally. Reproducing the
flake can be difficult, so it helps to replicate the test environment as closely
as possible.

Copy the gn args from one of the bots where the flake occurs, and try to choose
a bot close to your system, e.g. linux-rel if you're building on Linux. To get
the gn args, click on the timestamp in the flake portal to view the bot run
details, and search for the "lookup GN args" build step to copy the args.

![bot_gn_args]

Build and run the test locally. Depending on the frequency of the flake, it may
take some time to reproduce. Some helpful flags:
 - `--gtest_repeat=100`
 - `--gtest_also_run_disabled_tests` (if the flaky test(s) you're looking at have
been disabled)
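
As a rough guide for picking a `--gtest_repeat` value: if a test fails
independently with probability p on each run, the chance of seeing at least one
failure in n runs is 1 - (1 - p)^n. The sketch below solves that for n.
`RepeatsNeeded` is a hypothetical helper written for this illustration, not part
of GoogleTest or Chromium, and the independence assumption often does not hold
for real flakes, so treat the result as an estimate only.

```cpp
#include <cmath>

// Hypothetical helper, not part of GoogleTest or Chromium.
// Returns the smallest n such that 1 - (1 - failure_rate)^n >= confidence,
// assuming every run fails independently with probability `failure_rate`.
// Both arguments are expected to lie strictly between 0 and 1.
int RepeatsNeeded(double failure_rate, double confidence) {
  return static_cast<int>(
      std::ceil(std::log(1.0 - confidence) / std::log(1.0 - failure_rate)));
}
```

Under these assumptions, a test that flakes about 1% of the time needs roughly
299 repeats for a 95% chance of at least one failure, while 100 repeats give
only about a 63% chance.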

If you're unable to reproduce the flake locally, you can also upload a patch
with debug logging added and the flaky test enabled, then run it on the bots to
reproduce the flake with more information.

Another good option is to use *Swarming*, which lets you mimic bot conditions to
better reproduce flakes that actually occur on CQ bots.

### Swarming
For a more detailed dive into Swarming, see
[Debugging with Swarming](https://chromium.googlesource.com/chromium/src/+/master/docs/workflow/debugging-with-swarming.md#authenticating).

As an example, suppose we have built Chrome using the GN args from above into
the directory `out/linux-rel`. We can then run this command from within the
`chromium/src` directory:

```
tools/run-swarmed.py out/linux-rel browser_tests -- --gtest_filter="*<YOUR_TEST_NAME_HERE>*" --gtest_repeat=20 --gtest_also_run_disabled_tests
```

This lets us iterate quickly, using the logs from each run to reproduce flakes
and even fix them!

>TODO: Add more tips for reproducing flaky tests

## Debugging the flaky test

If the test is flakily timing out, consider any asynchronous code that may cause
race conditions, where the test subject may exit early and miss a callback, or
return faster than the test can start waiting for it (i.e. make sure event
listeners are registered before invoking the event). Make sure event listeners are
for the proper event instead of a proxy (e.g. [Wait for the correct event in
test](https://chromium.googlesource.com/chromium/src/+/6da09f7510e94d2aebbbed13b038d71c511d6cbc)).
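
The ordering problem described above can be sketched in a few lines of
standalone C++. `EventSource` here is a hypothetical stand-in invented for this
illustration, not a Chromium or GoogleTest API; it only shows why a listener
registered after the event has fired never hears about it.

```cpp
#include <functional>
#include <utility>
#include <vector>

// Hypothetical event source: listeners are only notified of events that fire
// after they were added. Past events are not replayed.
class EventSource {
 public:
  void AddListener(std::function<void()> listener) {
    listeners_.push_back(std::move(listener));
  }
  void Fire() {
    for (auto& listener : listeners_) listener();
  }

 private:
  std::vector<std::function<void()>> listeners_;
};

// Correct order: start listening, then trigger the event.
bool ListenerRegisteredFirst() {
  EventSource source;
  bool seen = false;
  source.AddListener([&seen] { seen = true; });
  source.Fire();
  return seen;
}

// Racy order: the event fires before the listener exists, so it is missed.
// A real test in this state would keep waiting until it times out.
bool ListenerRegisteredLate() {
  EventSource source;
  bool seen = false;
  source.Fire();
  source.AddListener([&seen] { seen = true; });
  return seen;
}
```

In a real test the two orders are usually decided by thread or process timing
rather than by source order, which is why the failure only shows up some of the
time.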

Consider possible bugs in the system or test infrastructure (e.g. [races in
glibc](https://bugs.chromium.org/p/chromium/issues/detail?id=1010318)).

For browsertest flakes, consider possible inter-process issues, such as the
renderer taking too long or returning something unexpected (e.g. [flaky
RenderFrameHostImplBrowserTest](https://bugs.chromium.org/p/chromium/issues/detail?id=1120305)).

For browsertest flakes that check EvalJs results, make sure test objects are not
destroyed before JS may read their values (e.g. [flaky
PaymentAppBrowserTest](https://chromium.googlesource.com/chromium/src/+/6089f3480c5036c73464661b3b1b6b82807b56a3)).
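
The lifetime hazard can be illustrated with a minimal standalone sketch. The
names `Widget`, `TaskQueue`, and `DeferredReadSucceeds` are hypothetical and are
not Chromium APIs; the queue merely stands in for any work that runs later, such
as script that reads a value after the test body has moved on.

```cpp
#include <functional>
#include <memory>
#include <queue>

// Hypothetical test object whose value a deferred reader wants to inspect.
struct Widget {
  int value = 42;
};

// Stand-in for work that runs at some later point.
using TaskQueue = std::queue<std::function<void()>>;

void RunAll(TaskQueue& tasks) {
  while (!tasks.empty()) {
    tasks.front()();
    tasks.pop();
  }
}

// Queues a deferred read of a Widget, optionally destroys the Widget before
// the queue runs, and reports whether the reader still found the object.
bool DeferredReadSucceeds(bool destroy_before_read) {
  TaskQueue tasks;
  auto widget = std::make_shared<Widget>();
  std::weak_ptr<Widget> weak = widget;
  bool read_ok = false;
  tasks.push([weak, &read_ok] {
    // lock() yields an empty pointer once the Widget has been destroyed.
    if (auto alive = weak.lock()) read_ok = (alive->value == 42);
  });
  if (destroy_before_read) widget.reset();
  RunAll(tasks);
  return read_ok;
}
```

The sketch uses `std::weak_ptr` only to make the missing object observable. In
a real test the usual fix is to keep the object alive until the deferred read
has completed.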

For browsertest flakes that involve dialogs or widgets, make sure that test
objects are not destroyed because focus is lost on the dialog (e.g. [flaky
AccessCodeCastHandlerBrowserTest](https://chromium-review.googlesource.com/c/chromium/src/+/3951132)).

## Preventing similar flakes

Once you understand the problem and have a fix for the test, think about how the
fix may apply to other tests, or whether documentation can be improved, either
in the relevant code or in this flaky test documentation.


[bot_gn_args]: images/bot_gn_args.png