# GPU Testing

This set of pages documents the setup and operation of the GPU bots and try
servers, which verify the correctness of Chrome's graphically accelerated
rendering pipeline.

[TOC]

## Overview

The GPU bots run a different set of tests than the majority of the Chromium
test machines. The GPU bots specifically focus on tests which exercise the
graphics processor, and whose results are likely to vary between graphics card
vendors.

Most of the tests on the GPU bots are run via the [Telemetry framework].
Telemetry was originally conceived as a performance testing framework, but has
proven valuable for correctness testing as well. Telemetry directs the browser
to perform various operations, like page navigation and test execution, from
external scripts written in Python. The GPU bots launch the full Chromium
browser via Telemetry for the majority of the tests. Using the full browser to
execute tests, rather than smaller test harnesses, has yielded several
advantages: testing what is shipped, improved reliability, and improved
performance.

[Telemetry framework]: https://github.com/catapult-project/catapult/tree/master/telemetry

A subset of the tests, called "pixel tests", grab screen snapshots of the web
page in order to validate Chromium's rendering architecture end-to-end. Where
necessary, GPU-specific results are maintained for these tests. Some of these
tests verify just a few pixels, using handwritten code, in order to use the
same validation for all brands of GPUs.
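The few-pixel validation idea can be illustrated with a minimal sketch. This is a hypothetical helper, not the actual test code; the real checks live in the Telemetry-based pixel tests. The point is that sampling a handful of coordinates with a tolerance lets one expectation hold across GPU brands whose output differs slightly:

```python
# Minimal illustration of few-pixel validation: sample a handful of known
# coordinates in a rendered frame and compare against expected colors with
# a tolerance, so the same check passes on all GPU brands. Hypothetical
# helper, not the actual pixel test code.
Color = tuple[int, int, int]

def pixels_match(frame: dict[tuple[int, int], Color],
                 expectations: dict[tuple[int, int], Color],
                 tolerance: int = 8) -> bool:
    for coord, expected in expectations.items():
        actual = frame[coord]
        # Any channel differing by more than the tolerance fails the test.
        if any(abs(a - e) > tolerance for a, e in zip(actual, expected)):
            return False
    return True

# A solid red 2x2 "frame" with slight per-GPU variation at one pixel.
frame = {(0, 0): (255, 0, 0), (1, 0): (250, 3, 2),
         (0, 1): (255, 0, 0), (1, 1): (255, 0, 0)}
print(pixels_match(frame, {(0, 0): (255, 0, 0), (1, 0): (255, 0, 0)}))  # True
```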
The GPU bots use the Chrome infrastructure team's [recipe framework], and
specifically the [`chromium`][recipes/chromium] and
[`chromium_trybot`][recipes/chromium_trybot] recipes, to describe what tests to
execute. Compared to the legacy master-side buildbot scripts, recipes make it
easy to add new steps to the bots, change the bots' configuration, and run the
tests locally in the same way that they are run on the bots. Additionally, the
`chromium` and `chromium_trybot` recipes make it possible to send try jobs which
add new steps to the bots. This single capability is a huge step forward from
the previous configuration, where new steps were added blindly and could cause
failures on the tryservers. For more details about the configuration of the
bots, see the [GPU bot details].

[recipe framework]: https://chromium.googlesource.com/external/github.com/luci/recipes-py/+/main/doc/user_guide.md
[recipes/chromium]: https://chromium.googlesource.com/chromium/tools/build/+/main/scripts/slave/recipes/chromium.py
[recipes/chromium_trybot]: https://chromium.googlesource.com/chromium/tools/build/+/main/scripts/slave/recipes/chromium_trybot.py
[GPU bot details]: gpu_testing_bot_details.md

The physical hardware for the GPU bots lives in the Swarming pool\*. The
Swarming infrastructure ([new docs][new-testing-infra], [older but currently
more complete docs][isolated-testing-infra]) provides many benefits:

* Increased parallelism for the tests; all steps for a given tryjob or
  waterfall build run in parallel.
* Simpler scaling: just add more hardware in order to get more capacity. No
  manual configuration or distribution of hardware needed.
* Easier to run certain tests only on certain operating systems or types of
  GPUs.
* Easier to add new operating systems or types of GPUs.
* Clearer description of the binary and data dependencies of the tests. If
  they run successfully locally, they'll run successfully on the bots.

(\* All but a few one-off GPU bots are in the Swarming pool. The exceptions to
the rule are described in the [GPU bot details].)

The bots on the [chromium.gpu.fyi] waterfall are configured to always test
top-of-tree ANGLE. This setup is done with a few lines of code in the
[tools/build workspace]; search the code for "angle".

These aspects of the bots are described in more detail below, and in linked
pages. There is a [presentation][bots-presentation] which gives a brief
overview of this documentation and links back to various portions.

<!-- XXX: broken link -->
[new-testing-infra]: https://github.com/luci/luci-py/wiki
[isolated-testing-infra]: https://www.chromium.org/developers/testing/isolated-testing/infrastructure
[chromium.gpu]: https://ci.chromium.org/p/chromium/g/chromium.gpu/console
[chromium.gpu.fyi]: https://ci.chromium.org/p/chromium/g/chromium.gpu.fyi/console
[tools/build workspace]: https://source.chromium.org/chromium/chromium/tools/build/+/HEAD:recipes/recipe_modules/chromium_tests/builders/chromium_gpu_fyi.py
[bots-presentation]: https://docs.google.com/presentation/d/1BC6T7pndSqPFnituR7ceG7fMY7WaGqYHhx5i9ECa8EI/edit?usp=sharing

## Fleet Status

Please see the [GPU Pixel Wrangling instructions] for links to dashboards
showing the status of various bots in the GPU fleet.

[GPU Pixel Wrangling instructions]: http://go/gpu-pixel-wrangler#fleet-status

## Using the GPU Bots

Most Chromium developers interact with the GPU bots in two ways:

1. Observing the bots on the waterfalls.
2. Sending try jobs to them.

The GPU bots are grouped on the [chromium.gpu] and [chromium.gpu.fyi]
waterfalls. Their current status can be easily observed there.
To send try jobs, you must first upload your CL to the codereview server. Then
either click the "CQ dry run" link, or run the following from the command line:

```sh
git cl try
```

Either action sends your job to the default set of try servers.
The GPU tests are part of the default set for Chromium CLs, and are run as part
of the following tryservers' jobs:

* [linux-rel], formerly on the `tryserver.chromium.linux` waterfall
* [mac-rel], formerly on the `tryserver.chromium.mac` waterfall
* [win-rel], formerly on the `tryserver.chromium.win` waterfall

[linux-rel]: https://ci.chromium.org/p/chromium/builders/luci.chromium.try/linux-rel?limit=100
[mac-rel]: https://ci.chromium.org/p/chromium/builders/luci.chromium.try/mac-rel?limit=100
[win-rel]: https://ci.chromium.org/p/chromium/builders/luci.chromium.try/win-rel?limit=100

Scan down through the steps looking for the text "GPU"; that identifies the
tests run on the GPU bots. For each test, the "trigger" step can be ignored; the
step further down for the test of the same name contains the results.
It's usually not necessary to explicitly send try jobs just for verifying GPU
tests. If you want to, you must invoke "git cl try" separately for each
tryserver you want to use, for example:

```sh
git cl try -b linux-rel
git cl try -b mac-rel
git cl try -b win-rel
```
Alternatively, the Gerrit UI can be used to send a patch set to these try
servers.
Three optional tryservers are also available which run additional tests. As of
this writing, they run longer-running tests that can't be run against all
Chromium CLs due to lack of hardware capacity. They are added as part of the
included tryservers for code changes to certain sub-directories.
* [linux_optional_gpu_tests_rel] on the [luci.chromium.try] waterfall
* [mac_optional_gpu_tests_rel] on the [luci.chromium.try] waterfall
* [win_optional_gpu_tests_rel] on the [luci.chromium.try] waterfall

[linux_optional_gpu_tests_rel]: https://ci.chromium.org/p/chromium/builders/luci.chromium.try/linux_optional_gpu_tests_rel
[mac_optional_gpu_tests_rel]: https://ci.chromium.org/p/chromium/builders/luci.chromium.try/mac_optional_gpu_tests_rel
[win_optional_gpu_tests_rel]: https://ci.chromium.org/p/chromium/builders/luci.chromium.try/win_optional_gpu_tests_rel
[luci.chromium.try]: https://ci.chromium.org/p/chromium/g/luci.chromium.try/builders

Tryservers for the [ANGLE project] are also present on the
[tryserver.chromium.angle] waterfall. These are invoked from the Gerrit user
interface. They are configured similarly to the tryservers for regular Chromium
patches, and run the same tests that are run on the [chromium.gpu.fyi]
waterfall, in the same way (e.g., against ToT ANGLE).

If you find it necessary to try patches against other sub-repositories than
Chromium (`src/`) and ANGLE (`src/third_party/angle/`), please
[file a bug](http://crbug.com/new) with component Internals\>GPU\>Testing.

[ANGLE project]: https://chromium.googlesource.com/angle/angle/+/main/README.md
[tryserver.chromium.angle]: https://build.chromium.org/p/tryserver.chromium.angle/waterfall
[file a bug]: http://crbug.com/new

## Running the GPU Tests Locally

All of the GPU tests running on the bots can be run locally from a Chromium
build. Many of the tests are simple executables:

* `angle_unittests`
* `gl_tests`
* `gl_unittests`
* `tab_capture_end2end_tests`

Some run only on the chromium.gpu.fyi waterfall, either because there isn't
enough machine capacity at the moment, or because they're closed-source tests
which aren't allowed to run on the regular Chromium waterfalls:

* `angle_deqp_gles2_tests`
* `angle_deqp_gles3_tests`
* `angle_end2end_tests`
* `audio_unittests`

The remaining GPU tests are run via Telemetry. In order to run them, just
build the `telemetry_gpu_integration_test` target (or
`telemetry_gpu_integration_test_android_chrome` for Android) and then
invoke `src/content/test/gpu/run_gpu_integration_test.py` with the appropriate
argument. The tests this script can invoke are
in `src/content/test/gpu/gpu_tests/`. For example:

* `run_gpu_integration_test.py context_lost --browser=release`
* `run_gpu_integration_test.py webgl1_conformance --browser=release`
* `run_gpu_integration_test.py webgl2_conformance --browser=release --webgl-conformance-version=2.0.1`
* `run_gpu_integration_test.py maps --browser=release`
* `run_gpu_integration_test.py screenshot_sync --browser=release`
* `run_gpu_integration_test.py trace_test --browser=release`

The pixel tests are a bit special. See
[the section on running them locally](#Running-the-pixel-tests-locally) for
details.
The `--browser=release` argument can be changed to `--browser=debug` if you
built in a directory such as `out/Debug`. If you built in some non-standard
directory such as `out/my_special_gn_config`, you can instead specify
`--browser=exact --browser-executable=out/my_special_gn_config/chrome`.
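The browser-selection rule above can be sketched as a small helper. This is a hypothetical illustration; the real `run_gpu_integration_test.py` handles browser selection internally:

```python
from pathlib import PurePosixPath

# Hypothetical helper mapping a build output directory to the --browser
# arguments described above; not part of the real harness.
def browser_args(out_dir: str) -> list[str]:
    name = PurePosixPath(out_dir).name
    if name in ("Release", "Debug"):
        # Standard output directories map to the release/debug browser types.
        return [f"--browser={name.lower()}"]
    # Non-standard directories need an explicit path to the chrome binary.
    return ["--browser=exact", f"--browser-executable={out_dir}/chrome"]

print(browser_args("out/Debug"))                 # ['--browser=debug']
print(browser_args("out/my_special_gn_config"))
```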
If you're testing on Android, use `--browser=android-chromium` instead of
`--browser=release/debug` to invoke it. Additionally, Telemetry will likely
complain about being unable to find the browser binary on Android if you build
in a non-standard output directory. Thus, `out/Release` or `out/Debug` are
suggested when testing on Android.

If you are running on a platform that does not support multiple browser
instances at a time (Android or ChromeOS), it is also recommended that you pass
in `--jobs=1`. This only has an effect on test suites that have parallel test
support, but failure to pass in the argument for those tests on these platforms
will result in weird failures due to multiple test processes stepping on each
other. On other platforms, you are still free to specify `--jobs` to get more
or less parallelization instead of relying on the default of one test process
per logical core.

**Note:** The tests require some third-party Python packages. Obtaining these
packages is handled automatically by `vpython3`, and the script's shebang should
use vpython if running the script directly. Since shebangs are not used on
Windows, you will need to manually specify the executable if you are on a
Windows machine. If you're used to invoking `python3` to run a script, simply
use `vpython3` instead, e.g. `vpython3 run_gpu_integration_test.py ...`.
You can run a subset of tests with this harness:

* `run_gpu_integration_test.py webgl1_conformance --browser=release
  --test-filter=conformance_attribs`

The exact command used to invoke the test on the bots can be found in one of
two ways:

1. Looking at the [json.input][trigger_input] of the trigger step under
   `requests[task_slices][command]`. The arguments after the last `--` are
   used to actually run the test.
1. Looking at the top of a [swarming task][sample_swarming_task].

In both cases, the following can be omitted when running locally since they're
only necessary on swarming:
* `testing/test_env.py`
* `testing/scripts/run_gpu_integration_test_as_googletest.py`
* `--isolated-script-test-output`
* `--isolated-script-test-perf-output`
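The "arguments after the last `--`" rule can be sketched as follows. The sample command below is a hypothetical stand-in modeled on a trigger step's `json.input`, not a real task definition:

```python
def args_after_last_separator(command: list[str]) -> list[str]:
    """Return the arguments after the last bare `--` in a trigger command
    (found under requests[task_slices][command]); these are the arguments
    that actually run the test."""
    # Find the index of the last bare "--" by scanning from the end.
    idx = len(command) - 1 - command[::-1].index("--")
    return command[idx + 1:]

# Hypothetical trigger command, abbreviated for illustration.
command = [
    "cipd_bin/vpython3", "rdb", "stream", "--",
    "vpython3", "../../content/test/gpu/run_gpu_integration_test.py",
    "webgl1_conformance", "--browser=release",
]
print(args_after_last_separator(command))
```

Note that flags like `--browser=release` are not bare `--` tokens, so only the separator itself is matched.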
[trigger_input]: https://logs.chromium.org/logs/chromium/buildbucket/cr-buildbucket.appspot.com/8849851608240828544/+/u/test_pre_run__14_/l_trigger__webgl2_conformance_d3d11_passthrough_tests_on_NVIDIA_GPU_on_Windows_on_Windows-10-18363/json.input
[sample_swarming_task]: https://chromium-swarm.appspot.com/task?id=52f06058bfb31b10
The Maps test requires you to authenticate to cloud storage in order to access
the Web Page Replay archive containing the test. See [Cloud Storage Credentials]
for documentation on setting this up.

[Cloud Storage Credentials]: gpu_testing_bot_details.md#Cloud-storage-credentials
### Bisecting ChromeOS Failures Locally

Failures that occur on the ChromeOS amd64-generic configuration are easy to
reproduce due to the VM being readily available for use, but doing so requires
some additional steps in the bisect process. The following steps can be
followed using two terminals and the [Simple Chrome SDK] to bisect a ChromeOS
failure.

1. Terminal 1: Start the bisect as normal `git bisect start`
   `git bisect good <good_revision>` `git bisect bad <bad_revision>`
1. Terminal 1: Sync to the revision that git spits out
   `gclient sync -r src@<revision>`
1. Terminal 2: Enter the Simple Chrome SDK
   `cros chrome-sdk --board amd64-generic-vm --log-level info --download-vm --clear-sdk-cache`
1. Terminal 2: Compile the relevant target (probably the GPU integration tests)
   `autoninja -C out_amd64-generic-vm/Release/ telemetry_gpu_integration_test`
1. Terminal 2: Start the VM `cros_vm --start`
1. Terminal 2: Deploy the Chrome binary to the VM
   `deploy_chrome --build-dir out_amd64-generic-vm/Release/ --device 127.0.0.1:9222`
   This will require you to accept a prompt twice, once because of a board
   mismatch and once because the VM still has rootfs verification enabled.
1. Terminal 1: Run your test on the VM. For GPU integration tests, this involves
   specifying `--browser cros-chrome --remote 127.0.0.1 --remote-ssh-port 9222`
1. Terminal 2: After determining whether the revision is good or bad, shut down
   the VM `cros_vm --stop`
1. Terminal 2: Exit the SDK `exit`
1. Terminal 1: Let git know whether the revision was good or bad
   `git bisect good`/`git bisect bad`
1. Repeat from step 2 with the new revision git spits out.

The repeated entry/exit from the SDK between revisions is to ensure that the
VM image is in sync with the Chromium revision, as it is possible for
regressions to be caused by an update to the image itself rather than a Chromium
change.

[Simple Chrome SDK]: https://chromium.googlesource.com/chromiumos/docs/+/HEAD/simple_chrome_workflow.md
### Telemetry Test Suites

The Telemetry-based tests are all technically the same target,
`telemetry_gpu_integration_test`, just run with different runtime arguments. The
first positional argument passed determines which suite will run, and additional
runtime arguments may cause the step name to change on the bots. Here is a list
of all suites and resulting step names as of April 15th 2021:

* `context_lost`
    * `context_lost_passthrough_tests`
    * `context_lost_tests`
    * `context_lost_validating_tests`
* `hardware_accelerated_feature`
    * `hardware_accelerated_feature_tests`
* `gpu_process`
    * `gpu_process_launch_tests`
* `info_collection`
    * `info_collection_tests`
* `maps`
    * `maps_pixel_passthrough_test`
    * `maps_pixel_test`
    * `maps_pixel_validating_test`
    * `maps_tests`
* `pixel`
    * `android_webview_pixel_skia_gold_test`
    * `egl_pixel_skia_gold_test`
    * `pixel_skia_gold_passthrough_test`
    * `pixel_skia_gold_validating_test`
    * `pixel_tests`
    * `vulkan_pixel_skia_gold_test`
* `power`
    * `power_measurement_test`
* `screenshot_sync`
    * `screenshot_sync_passthrough_tests`
    * `screenshot_sync_tests`
    * `screenshot_sync_validating_tests`
* `trace_test`
    * `trace_test`
* `webgl_conformance`
    * `webgl2_conformance_d3d11_passthrough_tests`
    * `webgl2_conformance_gl_passthrough_tests`
    * `webgl2_conformance_gles_passthrough_tests`
    * `webgl2_conformance_tests`
    * `webgl2_conformance_validating_tests`
    * `webgl_conformance_d3d11_passthrough_tests`
    * `webgl_conformance_d3d9_passthrough_tests`
    * `webgl_conformance_fast_call_tests`
    * `webgl_conformance_gl_passthrough_tests`
    * `webgl_conformance_gles_passthrough_tests`
    * `webgl_conformance_metal_passthrough_tests`
    * `webgl_conformance_swangle_passthrough_tests`
    * `webgl_conformance_tests`
    * `webgl_conformance_validating_tests`
    * `webgl_conformance_vulkan_passthrough_tests`
### Running the pixel tests locally

The pixel tests are a special case because they use an external Skia service
called Gold to handle image approval and storage. See
[GPU Pixel Testing With Gold] for specifics.

[GPU Pixel Testing With Gold]: gpu_pixel_testing_with_gold.md

The TL;DR is that the pixel tests use a binary called `goldctl` to download and
upload data when running pixel tests.

Normally, `goldctl` uploads images and image metadata to the Gold server when
used. This is not desirable when running locally, for a couple of reasons:

1. Uploading requires the user to be whitelisted on the server, and whitelisting
everyone who wants to run the tests locally is not a viable solution.
2. Images produced during local runs are usually slightly different from those
that are produced on the bots due to hardware/software differences. Thus, most
images uploaded to Gold from local runs would likely only ever actually be used
by tests run on the machine that initially generated those images, which just
adds noise to the list of approved images.

Additionally, the tests normally rely on the Gold server for viewing images
produced by a test run. This does not work if the data is not actually uploaded.

The pixel tests contain logic to automatically determine whether they are
running on a workstation or not, as well as to determine what git revision is
being tested. This *should* mean that the pixel tests will automatically work
when run locally. However, if the local run detection code fails for some
reason, you can manually pass some flags to force the same behavior.

In order to get around the local run issues, simply pass the
`--local-pixel-tests` flag to the tests. This will disable uploading, but
otherwise go through the same steps as a test normally would. Each test will
also print out `file://` URLs to the produced image, the closest image for the
test known to Gold, and the diff between the two.

Because the image produced by the test locally is likely slightly different from
any of the approved images in Gold, local test runs are likely to fail during
the comparison step. In order to cut down on the amount of noise, you can also
pass the `--no-skia-gold-failure` flag to not fail the test on a failed image
comparison. When using `--no-skia-gold-failure`, you'll also need to pass the
`--passthrough` flag in order to actually see the link output.

Example usage:
`run_gpu_integration_test.py pixel --no-skia-gold-failure --local-pixel-tests
--passthrough`

If, for some reason, the local run code is unable to determine what the git
revision is, simply pass `--git-revision aabbccdd`. Note that `aabbccdd` must
be replaced with an actual Chromium src revision (typically whatever revision
origin/main is currently synced to) in order for the tests to work. This can
be done automatically using:
``run_gpu_integration_test.py pixel --no-skia-gold-failure --local-pixel-tests
--passthrough --git-revision `git rev-parse origin/main` ``
## Running Binaries from the Bots Locally

Any binary run remotely on a bot can also be run locally, assuming the local
machine loosely matches the architecture and OS of the bot.

The easiest way to do this is to find the ID of the swarming task and use
`swarming reproduce` to re-run it:

* `./src/tools/luci-go/swarming reproduce -S https://chromium-swarm.appspot.com [task ID]`

The task ID can be found in the stdio for the "trigger" step for the test. For
example, look at a recent build from the [Mac Release (Intel)] bot, and
look at the `gl_unittests` step. You will see something like:

[Mac Release (Intel)]: https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/Mac%20Release%20%28Intel%29/

```
Triggered task: gl_unittests on Intel GPU on Mac/Mac-10.12.6/[TRUNCATED_ISOLATE_HASH]/Mac Release (Intel)/83664
To collect results, use:
  swarming.py collect -S https://chromium-swarm.appspot.com --json /var/folders/[PATH_TO_TEMP_FILE].json
Or visit:
  https://chromium-swarm.appspot.com/user/task/[TASK_ID]
```

There is a difference between the isolate's hash and Swarming's task ID. Make
sure you use the task ID and not the isolate's hash.
As of this writing, there seems to be a
[bug](https://github.com/luci/luci-py/issues/250)
when attempting to re-run the Telemetry based GPU tests in this way. For the
time being, this can be worked around by instead downloading the contents of
the isolate. To do so, look at the "Reproducing the task locally" section on
a swarming task, which contains something like:

```
Download inputs files into directory foo:
# (if needed, use "\${platform}" as-is) cipd install "infra/tools/luci/cas/\${platform}" -root bar
# (if needed) ./bar/cas login
./bar/cas download -cas-instance projects/chromium-swarm/instances/default_instance -digest 68ae1d6b22673b0ab7b4427ca1fc2a4761c9ee53474105b9076a23a67e97a18a/647 -dir foo
```
Before attempting to download an isolate, you must ensure you have permission
to access the isolate server. Full instructions can be [found
here][isolate-server-credentials]. For most cases, you can simply run:

* `./src/tools/luci-go/isolate login`

The above link requires that you log in with your @google.com credentials. It's
not known at the present time whether this works with @chromium.org accounts.
Email kbr@ if you try this and find it doesn't work.

[isolate-server-credentials]: gpu_testing_bot_details.md#Isolate-server-credentials
## Debugging a Specific Subset of Tests on a Specific GPU Bot

When a test exhibits flake on the bots, it can be convenient to run it
repeatedly with local code modifications on the bot where it is exhibiting
flake. One way of doing this is via swarming (see the below section). However, a
lower-overhead alternative that also works in the case where you are looking to
run on a bot for which you cannot locally build is to locally alter the
configuration of the bot in question to specify that it should run only the
tests desired, repeating as many times as desired. Instructions for doing this
are as follows (see the [example CL] for a concrete instantiation of these
instructions):

1. In testsuite_exceptions.pyl, find the section for the test suite in question
   (creating it if it doesn't exist).
2. Add modifications for the bot in question and specify arguments such that
   your desired tests are run for the desired number of iterations.
3. Run testing/buildbot/generate_buildbot_json.py and verify that the JSON file
   for the bot in question was modified as you would expect.
4. Upload and run tryjobs on that specific bot via "Choose Tryjobs."
5. Examine the test results. (You can verify that the tests run were as you
   expected by examining the test results for individual shards of the run
   of the test suite in question.)
6. Add logging/code modifications/etc. as desired and go back to step 4,
   repeating the process until you've uncovered the underlying issue.
7. Remove the changes to testsuite_exceptions.pyl and the JSON file if
   turning the CL into one intended for submission!
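A modification of this sort might look roughly like the following. The suite, bot, and argument names here are purely illustrative; the actual schema is defined by generate_buildbot_json.py, so consult testsuite_exceptions.pyl and the [example CL] for the real structure:

```
# Hypothetical testsuite_exceptions.pyl fragment; names are examples only.
{
  'pixel_skia_gold_test': {
    'modifications': {
      'Win10 x64 Release (NVIDIA)': {
        'args': [
          '--test-filter=Pixel_Canvas2DRedBox',  # run only the flaky test
          '--repeat=100',                        # repeat it many times
        ],
      },
    },
  },
}
```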
Here is an [example CL] that does this.

[example CL]: https://chromium-review.googlesource.com/c/chromium/src/+/3898592/4
## Running Locally Built Binaries on the GPU Bots

See the [Swarming documentation] for instructions on how to upload your binaries to the isolate server and trigger execution on Swarming.

Be sure to use the correct swarming dimensions for your desired GPU, e.g. "1002:6613" instead of "AMD Radeon R7 240 (1002:6613)", which is how it appears on the swarming task page. You can query bots in the chromium.tests.gpu pool to find the correct dimensions:

* `tools\luci-go\swarming bots -S chromium-swarm.appspot.com -d pool=chromium.tests.gpu`
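Pulling the bare PCI vendor:device ID out of the display string shown on the task page can be sketched as below. The display format is illustrative, modeled on the "AMD Radeon R7 240 (1002:6613)" example above:

```python
import re

def gpu_dimension(display_name: str) -> str:
    """Extract the PCI vendor:device ID (e.g. "1002:6613") from a GPU
    display string like "AMD Radeon R7 240 (1002:6613)"."""
    match = re.search(r"\(([0-9a-f]{4}:[0-9a-f]{4})\)", display_name, re.I)
    if not match:
        raise ValueError(f"no vendor:device ID in {display_name!r}")
    return match.group(1)

print(gpu_dimension("AMD Radeon R7 240 (1002:6613)"))  # 1002:6613
```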
[Swarming documentation]: https://www.chromium.org/developers/testing/isolated-testing/for-swes#TOC-Run-a-test-built-locally-on-Swarming
Kenneth Russell42732952018-06-27 02:08:42501## Moving Test Binaries from Machine to Machine
502
503To create a zip archive of your personal Chromium build plus all of
504the Telemetry-based GPU tests' dependencies, which you can then move
505to another machine for testing:
506
5071. Build Chrome (into `out/Release` in this example).
Fabrice de Gans7820a772022-09-16 00:10:305081. `vpython3 tools/mb/mb.py zip out/Release/ telemetry_gpu_integration_test out/telemetry_gpu_integration_test.zip`
Kenneth Russell42732952018-06-27 02:08:42509
510Then copy telemetry_gpu_integration_test.zip to another machine. Unzip
511it, and cd into the resulting directory. Invoke
512`content/test/gpu/run_gpu_integration_test.py` as above.

This workflow has been tested successfully on Windows with a
statically-linked Release build of Chrome.

Note: on one macOS machine, this command failed because of a broken
`strip-json-comments` symlink in
`src/third_party/catapult/common/node_runner/node_runner/node_modules/.bin`. Deleting
that symlink allowed it to proceed.

Note also: on the same macOS machine, with a component build, this
command failed to zip up a working Chromium binary. The browser failed
to start with the following error:

`[0626/180440.571670:FATAL:chrome_main_delegate.cc(1057)] Check failed: service_manifest_data_pack_.`

In a pinch, this command can be used to bundle up everything, after
which the "out" directory can be deleted from the resulting zip
archive and the Chromium binaries moved over to the target machine
separately. Then the command line arguments
`--browser=exact --browser-executable=[path]` can be used to launch
that specific browser.

See the [user guide for mb](../../tools/mb/docs/user_guide.md#mb-zip), the
meta-build system, for more details.

## Adding New Tests to the GPU Bots

The goal of the GPU bots is to avoid regressions in Chrome's rendering stack.
To that end, let's add as many tests as possible that will help catch
regressions in the product. If you see a crazy bug in Chrome's rendering which
would be easy to catch with a pixel test running in Chrome and hard to catch in
any of the other test harnesses, please, invest the time to add a test!

There are a couple of different ways to add new tests to the bots:

1. Adding a new test to one of the existing harnesses.
2. Adding an entire new test step to the bots.

### Adding a new test to one of the existing test harnesses

Adding new tests to the GTest-based harnesses is straightforward and
essentially requires no explanation.

As of this writing it isn't as easy as desired to add a new test to one of the
Telemetry based harnesses. See [Issue 352807](http://crbug.com/352807). Let's
collectively work to address that issue. It would be great to reduce the number
of steps on the GPU bots, or at least to avoid significantly increasing the
number of steps on the bots. The WebGL conformance tests should probably remain
a separate step, but some of the smaller Telemetry based tests
(`context_lost_tests`, `memory_test`, etc.) should probably be combined into a
single step.

If you are adding a new test to one of the existing tests (e.g., `pixel_test`),
all you need to do is make sure that your new test runs correctly via isolates.
See the documentation from the GPU bot details on [adding new isolated
tests][new-isolates] for the gn args and authentication needed to upload
isolates to the isolate server. Most likely the new test will be Telemetry
based, and included in the `telemetry_gpu_test_run` isolate.

[new-isolates]: gpu_testing_bot_details.md#Adding-a-new-isolated-test-to-the-bots

### Adding new steps to the GPU Bots

The tests that are run by the GPU bots are described by a couple of JSON files
in the Chromium workspace:

* [`chromium.gpu.json`](https://chromium.googlesource.com/chromium/src/+/main/testing/buildbot/chromium.gpu.json)
* [`chromium.gpu.fyi.json`](https://chromium.googlesource.com/chromium/src/+/main/testing/buildbot/chromium.gpu.fyi.json)

These files are autogenerated by the following script:

* [`generate_buildbot_json.py`](https://chromium.googlesource.com/chromium/src/+/main/testing/buildbot/generate_buildbot_json.py)

This script is documented in
[`testing/buildbot/README.md`](https://chromium.googlesource.com/chromium/src/+/main/testing/buildbot/README.md). The
JSON files are parsed by the chromium and chromium_trybot recipes, and describe
two basic types of tests:

* GTests: those which use the Googletest and Chromium's `base/test/launcher/`
  frameworks.
* Isolated scripts: tests whose initial entry point is a Python script which
  follows a simple convention of command line argument parsing.

The majority of the GPU tests, however, are:

* Telemetry based tests: isolated script tests which are built on the
  Telemetry framework and which launch the entire browser.
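
As a concrete illustration of the isolated-script convention, a minimal entry
point only needs to honor a couple of standard flags and write a JSON results
file. This is a simplified sketch, not the real harness; the real scripts
accept more flags and emit a richer results format:

```python
import argparse
import json


def main():
    parser = argparse.ArgumentParser()
    # Path where the recipe expects a JSON summary of the results.
    parser.add_argument('--isolated-script-test-output')
    # Optional '::'-separated list of test names to run.
    parser.add_argument('--isolated-script-test-filter')
    args, _ = parser.parse_known_args()

    failures = []  # Run the tests here, collecting names of failed tests.

    if args.isolated_script_test_output:
        with open(args.isolated_script_test_output, 'w') as f:
            json.dump({'valid': True, 'failures': failures}, f)
    return 1 if failures else 0
```

In a real script this would end with
`if __name__ == '__main__': sys.exit(main())` so the process exit code
reflects the test results.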

A prerequisite of adding a new test to the bots is that the test [run via
isolates][new-isolates]. Once that is done, modify `test_suites.pyl` to add the
test to the appropriate set of bots. Be careful when adding large new test steps
to all of the bots, because the GPU bots are a limited resource and do not
currently have the capacity to absorb large new test suites. It is safer to get
new tests running on the chromium.gpu.fyi waterfall first, and expand from there
to the chromium.gpu waterfall (which will also make them run against every
Chromium CL by virtue of the `linux-rel`, `mac-rel`, `win7-rel` and
`android-marshmallow-arm64-rel` tryservers' mirroring of the bots on this
waterfall – so be careful!).

Tryjobs which add new test steps to the chromium.gpu.json file will run those
new steps during the tryjob, which helps ensure that the new test won't break
once it starts running on the waterfall.

Tryjobs which modify chromium.gpu.fyi.json can be sent to the
`win_optional_gpu_tests_rel`, `mac_optional_gpu_tests_rel` and
`linux_optional_gpu_tests_rel` tryservers to help ensure that they won't
break the FYI bots.

## Debugging Pixel Test Failures on the GPU Bots

If pixel tests fail on the bots, the build step will contain either one or more
links titled `gold_triage_link for <test name>` or a single link titled
`Too many artifacts produced to link individually, click for links`, which
itself will contain links. In either case, these links will direct to Gold
pages showing the image produced by the test and the approved image that most
closely matches it.

Note that for the tests which programmatically check colors in certain regions of
the image (tests with `expected_colors` fields in [pixel_test_pages]), there
likely won't be a closest approved image since those tests only upload data to
Gold in the event of a failure.

[pixel_test_pages]: https://cs.chromium.org/chromium/src/content/test/gpu/gpu_tests/pixel_test_pages.py
Kenneth Russellfa3ffde2018-10-24 21:24:38635
Kai Ninomiyaa6429fb32018-03-30 01:30:56636## Updating and Adding New Pixel Tests to the GPU Bots
637
Brian Sheedyc4650ad02019-07-29 17:31:38638If your CL adds a new pixel test or modifies existing ones, it's likely that
639you will have to approve new images. Simply run your CL through the CQ and
640follow the steps outline [here][pixel wrangling triage] under the "Check if any
641pixel test failures are actual failures or need to be rebaselined." step.
Kai Ninomiyaa6429fb32018-03-30 01:30:56642
Brian Sheedy5a4c0a392021-09-22 21:28:35643[pixel wrangling triage]: http://go/gpu-pixel-wrangler-info#how-to-keep-the-bots-green

If you are adding a new pixel test, it is beneficial to set the
`grace_period_end` argument in the test's definition. This will allow the test
to run for a period without actually failing on the waterfall bots, giving you
some time to triage any additional images that show up on them. This helps
prevent new tests from making the bots red because they're producing slightly
different but valid images from the ones triaged while the CL was in review.
Example:

```
from datetime import date

...

PixelTestPage(
    'foo_pixel_test.html',
    ...
    grace_period_end=date(2020, 1, 1)
)
```
664
You should typically set the grace period to end 1-2 days after the CL will
land.
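
For example, if the CL is expected to land about a week after being uploaded,
the grace period can be computed relative to today rather than hardcoded (the
one-week landing estimate here is purely an assumption; adjust to taste):

```python
from datetime import date, timedelta

# Assumed: the CL lands roughly a week from now; pad by 2 days so that
# images seen only on the waterfall bots can be triaged without redness.
expected_landing = date.today() + timedelta(days=7)
grace_period_end = expected_landing + timedelta(days=2)
```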

Once your CL passes the CQ, you should be mostly good to go, although you should
keep an eye on the waterfall bots for a short period after your CL lands in case
any configurations not covered by the CQ need to have images approved, as well.
All untriaged images for your test can be found by substituting your test name
into:

`https://chrome-gpu-gold.skia.org/search?query=name%3D<test name>`
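
If the test name contains characters that are not URL-safe, the whole query
value must be percent-encoded. A small sketch of building such a link (the URL
structure is taken from the example above):

```python
from urllib.parse import quote


def gold_untriaged_url(test_name):
    """Builds the Gold search URL for a test's images.

    The entire 'name=<test name>' query value is percent-encoded, which
    is why '=' appears as '%3D' in the URL.
    """
    return ('https://chrome-gpu-gold.skia.org/search?query=' +
            quote('name=' + test_name, safe=''))
```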

**NOTE** If you have a grace period active for your test, then Gold will be told
to ignore results for the test. This is so that it does not comment on unrelated
CLs about untriaged images if your test is noisy. Images will still be uploaded
to Gold and can be triaged, but will not show up on the main page's untriaged
image list, and you will need to enable the "Ignored" toggle at the top of the
page when looking at the triage page specific to your test.

## Stamping out Flakiness

It's critically important to aggressively investigate and eliminate the root
cause of any flakiness seen on the GPU bots. The bots have been known to run
reliably for days at a time, and any flaky failures that are tolerated on the
bots translate directly into instability of the browser experienced by
customers. Critical bugs in subsystems like WebGL, affecting high-profile
products like Google Maps, have escaped notice in the past because the bots
were unreliable. After much re-work, the GPU bots are now among the most
reliable automated test machines in the Chromium project. Let's keep them that
way.

Flakiness affecting the GPU tests can come in from highly unexpected sources.
Here are some examples:

* Intermittent pixel_test failures on Linux where the captured pixels were
  black, caused by the Display Power Management System (DPMS) kicking in.
  Disabled the X server's built-in screen saver on the GPU bots in response.
* GNOME dbus-related deadlocks causing intermittent timeouts ([Issue
  309093](http://crbug.com/309093) and related bugs).
* Windows Audio system changes causing intermittent assertion failures in the
  browser ([Issue 310838](http://crbug.com/310838)).
* Enabling assertion failures in the C++ standard library on Linux causing
  random assertion failures ([Issue 328249](http://crbug.com/328249)).
* V8 bugs causing random crashes of the Maps pixel test (V8 issues
  [3022](https://code.google.com/p/v8/issues/detail?id=3022),
  [3174](https://code.google.com/p/v8/issues/detail?id=3174)).
* TLS changes causing random browser process crashes ([Issue
  264406](http://crbug.com/264406)).
* Isolated test execution flakiness caused by failures to reliably clean up
  temporary directories ([Issue 340415](http://crbug.com/340415)).
* The Telemetry-based WebGL conformance suite caught a bug in the memory
  allocator on Android not caught by any other bot ([Issue
  347919](http://crbug.com/347919)).
* context_lost test failures caused by the compositor's retry logic ([Issue
  356453](http://crbug.com/356453)).
* Multiple bugs in Chromium's support for lost contexts causing flakiness of
  the context_lost tests ([Issue 365904](http://crbug.com/365904)).
* Maps test timeouts caused by Content Security Policy changes in Blink
  ([Issue 395914](http://crbug.com/395914)).
* Weak pointer assertion failures in various webgl\_conformance\_tests caused
  by changes to the media pipeline ([Issue 399417](http://crbug.com/399417)).
* A change to a default WebSocket timeout in Telemetry causing intermittent
  failures to run all WebGL conformance tests on the Mac bots ([Issue
  403981](http://crbug.com/403981)).
* Chrome leaking suspended sub-processes on Windows, apparently a preexisting
  race condition that suddenly showed up ([Issue
  424024](http://crbug.com/424024)).
* Changes to Chrome's cross-context synchronization primitives causing the
  wrong tiles to be rendered ([Issue 584381](http://crbug.com/584381)).
* A bug in V8's handling of array literals causing flaky failures of
  texture-related WebGL 2.0 tests ([Issue 606021](http://crbug.com/606021)).
* Assertion failures in sync point management related to lost contexts that
  exposed a real correctness bug ([Issue 606112](http://crbug.com/606112)).
* A bug in glibc's `sem_post`/`sem_wait` primitives breaking V8's parallel
  garbage collection ([Issue 609249](http://crbug.com/609249)).
* A change to Blink's memory purging primitive which caused intermittent
  timeouts of WebGL conformance tests on all platforms ([Issue
  840988](http://crbug.com/840988)).
* Screen DPI being inconsistent across seemingly identical Linux machines,
  causing the Maps pixel test to flakily produce incorrectly sized images
  ([Issue 1091410](https://crbug.com/1091410)).
Kai Ninomiyaa6429fb32018-03-30 01:30:56745
746If you notice flaky test failures either on the GPU waterfalls or try servers,
747please file bugs right away with the component Internals>GPU>Testing and
748include links to the failing builds and copies of the logs, since the logs
749expire after a few days. [GPU pixel wranglers] should give the highest priority
750to eliminating flakiness on the tree.
751
Brian Sheedy5a4c0a392021-09-22 21:28:35752[GPU pixel wranglers]: http://go/gpu-pixel-wrangler