# Addressing Flaky GTests

## Understanding builder results

[LUCI Analysis](https://luci-analysis.appspot.com/p/chromium/clusters) lists the
top flake clusters of tests along with any associated bugs and failure counts in
different contexts.

## Reproducing the flaky test

If debugging via bot is too slow or you otherwise need to drill further into the
cause of the flake, you can try to reproduce the flake locally. Reproducing the
flake can be difficult, so it helps to replicate the test environment as closely
as possible.

Copy the gn args from one of the bots where the flake occurs, and try to choose
a bot close to your system, e.g. linux-rel if you're building on linux. To get
the gn args, click on the timestamp in the flake portal to view the bot run
details, and search for the "lookup GN args" build step to copy the args.

![bot_gn_args]

Build and run the test locally. Depending on the frequency of the flake, it may
take some time to reproduce. Some helpful flags:
- `--gtest_repeat=100`
- `--gtest_also_run_disabled_tests` (if the flaky test(s) you're looking at have
  been disabled)

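As a rough sketch of how these flags can be combined when hunting a flake
locally, the Python helper below assembles a gtest command line and counts how
many of several runs exit with a nonzero status. The function names, the binary
path `out/linux-rel/browser_tests`, and the test name are illustrative
assumptions and are not part of any Chromium tooling.

```python
import subprocess
from typing import Callable, List, Sequence


def build_gtest_command(binary: str, test_filter: str, repeat: int = 1,
                        run_disabled: bool = False) -> List[str]:
    """Assemble a gtest command line from the flags listed above."""
    cmd = [binary, f"--gtest_filter={test_filter}", f"--gtest_repeat={repeat}"]
    if run_disabled:
        cmd.append("--gtest_also_run_disabled_tests")
    return cmd


def count_failing_runs(cmd: Sequence[str], attempts: int,
                       run: Callable[[Sequence[str]], int] = subprocess.call
                       ) -> int:
    """Run `cmd` `attempts` times and count the runs with a nonzero exit code.

    `run` is injectable so the counting logic can be exercised without a
    real test binary; by default it invokes the command via subprocess.call.
    """
    return sum(1 for _ in range(attempts) if run(cmd) != 0)


# Example (hypothetical test name; requires a locally built test binary):
#   cmd = build_gtest_command("out/linux-rel/browser_tests",
#                             "FooBrowserTest.Bar", repeat=100,
#                             run_disabled=True)
#   print(count_failing_runs(cmd, attempts=5))
```
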
If you're unable to reproduce the flake locally, you can also try uploading a
patch with debug logging added and the flaky test enabled, then running it on
the bots to reproduce the flake with more information.

Another good option is Swarming, which lets you mimic bot conditions to better
reproduce flakes that actually occur on CQ bots.

### Swarming
For a more detailed look at Swarming, see the
[Swarming debugging documentation](https://chromium.googlesource.com/chromium/src/+/master/docs/workflow/debugging-with-swarming.md#authenticating).

As an example, suppose we have built Chrome into the directory `out/linux-rel`
using the GN args from above. We can then run this command from the
`chromium/src` directory:

```
tools/run-swarmed.py out/linux-rel browser_tests -- --gtest_filter="<YOUR_TEST_NAME_HERE>" --gtest_repeat=20 --gtest_also_run_disabled_tests
```

This lets us iterate quickly, using the logs from each run to reproduce flakes
and, ideally, fix them.

>TODO: Add more tips for reproducing flaky tests

## Debugging the flaky test

If the test is flakily timing out, consider any asynchronous code that may cause
race conditions, where the test subject may exit early and miss a callback, or
return faster than the test can start waiting for it (i.e. make sure event
listeners are set up before invoking the event). Make sure event listeners are
for the proper event instead of a proxy (e.g. [Wait for the correct event in
test](https://chromium.googlesource.com/chromium/src/+/6da09f7510e94d2aebbbed13b038d71c511d6cbc)).

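The ordering problem described above can be illustrated outside of Chromium
with a small sketch. `EventSource` and `wait_for_event` are hypothetical names
used only for illustration and are not Chromium or GTest APIs. The sketch is
single-threaded and deterministic; in a real test the two orderings would be
decided by timing, and the late registration would show up as an occasional
timeout.

```python
import threading


class EventSource:
    """Hypothetical stand-in for the object under test."""

    def __init__(self):
        self._listeners = []

    def add_listener(self, callback):
        self._listeners.append(callback)

    def fire(self):
        # Only listeners registered before this call are notified.
        for callback in list(self._listeners):
            callback()


def wait_for_event(source, register_first, timeout=0.1):
    """Return True if the listener saw the event within `timeout` seconds."""
    seen = threading.Event()
    if register_first:
        source.add_listener(seen.set)  # listener is in place before the event
        source.fire()
    else:
        source.fire()                  # event fires before anyone is listening
        source.add_listener(seen.set)  # too late: the notification is gone
    return seen.wait(timeout)


# Registering first observes the event; registering late times out.
#   wait_for_event(EventSource(), register_first=True)
#   wait_for_event(EventSource(), register_first=False)
```
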
Consider possible bugs in the system or test infrastructure (e.g. [races in
glibc](https://bugs.chromium.org/p/chromium/issues/detail?id=1010318)).

For browsertest flakes, consider possible inter-process issues, such as the
renderer taking too long or returning something unexpected (e.g. [flaky
RenderFrameHostImplBrowserTest](https://bugs.chromium.org/p/chromium/issues/detail?id=1120305)).

For browsertest flakes that check EvalJs results, make sure test objects are not
destroyed before JS may read their values (e.g. [flaky
PaymentAppBrowserTest](https://chromium.googlesource.com/chromium/src/+/6089f3480c5036c73464661b3b1b6b82807b56a3)).

For browsertest flakes that involve dialogs or widgets, make sure that test
objects are not destroyed because focus is lost on the dialog (e.g. [flaky
AccessCodeCastHandlerBrowserTest](https://chromium-review.googlesource.com/c/chromium/src/+/3951132)).

## Preventing similar flakes

Once you understand the problem and have a fix for the test, think about how the
fix may apply to other tests, or whether documentation can be improved either in
the relevant code or in this flaky test documentation.

[bot_gn_args]: images/bot_gn_args.png