![In-GPU-we-Rust](https://hackmd.io/_uploads/B1y03o3gJe.jpg)

---

## `whoami`

![rust-projects](https://hackmd.io/_uploads/rkw__tEbyx.png)

<!-- 12 years of experience -->

---

## Agenda

1. Landscape of GPU abstractions
2. History of *wgpu*
3. *Blade* of difference

---

## GPU abstractions

![safe/lightweight/portable](https://hackmd.io/_uploads/rJRR3QT6C.png)

----

### Map of Portability

![platform availability](https://github.com/kvark/slides/raw/b3cbeaa4704dd6090cf633e5b559390e744f6c1c/md/ProcessingShaders/PlatformsMap.jpeg)

----

![gpu-safety](https://hackmd.io/_uploads/HkSnocSZkg.jpg)

<!-- is it safe to access memory? Browser definition -->
<!-- undefined behavior vs undefined data -->

----

<!-- .slide: style="text-align: left; font-size: 24px; margin-left: 60px; " -->

### case: glow

Purity: :heavy_check_mark:
Safety: :grey_question:
- OpenGL is safe, but the Rust API is not

Backends: *GL/GLES/WebGL*
- no compute on Apple platforms

Overhead: :question:
- API itself is close to zero overhead
- but actual platforms may involve translation

Ergonomics: *AA+*
- relatively small API
- boilerplate related to bindings and framebuffers

Downloads: every *8 seconds*

----

<!-- .slide: style="text-align: left; font-size: 24px; margin-left: 60px;" -->

### case: Ash

Purity: :heavy_check_mark: (no shader solution)
Safety: :x:
Backends: *Vulkan*
Overhead: :heavy_check_mark:
Ergonomics: *A*
Downloads: every *9 seconds*
- is a dependency of many others

----

<!-- .slide: style="text-align: left; font-size: 24px; margin-left: 60px;" -->

### case: Vulkano

![vulkano-logo](https://github.com/vulkano-rs/vulkano/blob/master/logo.png?raw=true =15%x)

Purity: :heavy_check_mark: host, :x: shader processing (3rd party C++)
Safety: :heavy_check_mark: host, :x: shaders, relies on
robust buffer/image access
Backends: *Vulkan*
Overhead: :zzz:
- every draw/dispatch iterates over all the used resources
- actual commands are recorded at the end of the pass

Ergonomics: *AA*
- automatic barriers, a bit of type sugar

Downloads: every *2.5 minutes*

----

<!-- .slide: style="text-align: left; font-size: 24px; margin-left: 60px;" -->

### case: wgpu

![wgpu-logo](https://github.com/gfx-rs/wgpu/blob/trunk/logo.png?raw=true =15%x)

Purity: :heavy_check_mark: (includes shader solution via `naga`)
Safety: :heavy_check_mark: (includes shader instrumentation)
Backends: *Vulkan*, *D3D12*, *Metal*, *GL*, *WebGPU*, *WebGL2*
Overhead: :zzz:
- tracking every bind group setup
- actual commands are recorded at the end of the pass

Ergonomics: *AAA*
- simple specification
- automatic state tracking

Downloads: every *12 seconds*

----

<!-- .slide: style="text-align: left; font-size: 24px; margin-left: 60px;" -->

### case: wgpu-hal

Purity: :heavy_check_mark: (includes shader solution via `naga`)
Safety: :x:
Backends: *Vulkan*, *D3D12*, *Metal*, *GL/GLES/WebGL2*, *WebGPU*
Overhead: :heavy_check_mark: (directly mapped)
Ergonomics: *A+*
- a bit simpler than Vulkan

Downloads: every *12 seconds* (same as wgpu)

----

<!-- .slide: style="text-align: left; font-size: 24px; margin-left: 60px;" -->

### case: Blade

![blade-logo](https://github.com/kvark/blade/blob/main/docs/logo.png?raw=true =15%x)

Purity: :heavy_check_mark: (includes shader solution via `naga`)
Safety: :x:
Backends: *Vulkan*, *Metal*, *GLES/WebGL2*
Overhead: :heavy_check_mark: (directly mapped)
GPU penalty: :question: (to be discussed)
Ergonomics: *AAA+*
- doesn't involve any bind group layout business
- no resource states or barriers
- but requires manual resource destruction

Downloads: every *15 minutes*

----

### Ergonomics scale

![ergonomics](https://hackmd.io/_uploads/SkOtDBaaA.png)

<!-- drops
portability and overhead -->

---

![logo](https://github.com/gfx-rs/wgpu/blob/trunk/logo.png?raw=true =50%x)

----

### wgpu: Implementation of WebGPU

![webgpu-problem](https://hackmd.io/_uploads/HyPBTKne1l.png)

----

### WebGPU: Targets

![wgpu-intersection](https://hackmd.io/_uploads/HJuEyjHbye.png)

----

### wgpu: History

![wgpu-history](https://hackmd.io/_uploads/SJDB4U3lyx.png)

----

### wgpu: Architecture

![wgpu-graph](https://hackmd.io/_uploads/r1lZA9rWJl.png)

----

### wgpu: Safety

Core idea: *validating correctness takes as much computation as providing it*.

<!-- not obvious, needed to be experimentally proven -->

----

### wgpu: Synchronization

![wgpu-usages](https://hackmd.io/_uploads/ryPf0Fneyx.png)![wgpu-sync](https://hackmd.io/_uploads/Hk3mAFhlye.png)

----

### WebGPU Shading Language

![webgpu-shading-language2](https://hackmd.io/_uploads/ByZw6_hlJx.jpg)

----

#### WGSL: Motivation

- one of the drivers behind the early Web was the ability to _inspect/edit/write_ pages directly.
- no existing shading language was designed for safety and the absence of UB.
- GLSL is outdated, the SPIR-V spec is difficult, everything else is poorly specified...

Naga does GLSL -> SPIR-V in just 1.5ms per shader.
<!-- in any case, a SPIR-V fork would require a spec -->

----

#### naga: Architecture

![naga-architecture](https://hackmd.io/_uploads/r1ooC5Bbkx.png)

----

### wgpu: Conclusion

- most mature, portable, well specified
- pretty fast, and the only truly safe one

![vangers debug](https://github.com/kvark/slides/raw/b3cbeaa4704dd6090cf633e5b559390e744f6c1c/md/WgpuChallenges/vangers-raymax-debug.png)

---

## blade

Lean and mean graphics API

![](https://i.imgur.com/uXB8rPj.png)

----

### blade: Motivation

- it's not always worth it to provide the driver with all the info ahead of time.
- lots of workflows are leaning towards *compute-only*, e.g. 2D graphics rendering, ray tracing, neural networks.
- most API complexity comes from rasterization.
- modern APIs are too verbose.

----

![Screenshot 2024-10-28 222030](https://hackmd.io/_uploads/rJqCex0gJg.png)

----

### blade: Principles

1. hacking graphics should be fun!
    - we can live without resource barriers
    - shader resource layouts can be simpler
    - uniforms are just data
2. simplicity >> safety
    - no runtime validation
    - copyable handles

<!-- user-facing abstraction should be safe, the question is - at what level is this enforced? Arguably, GPU API level isn't the best -->

----

![validation](https://hackmd.io/_uploads/r1ggZ0SWkg.jpg)

----

### blade: Look, ma, no bindings!
Shader:
```rust
var<storage,read_write> particles: array<Particle>;
var<uniform> parameters: Parameters;
```
Host:
```rust
pc.bind(0, &MainData {
    particles: particle_buffer.into(),
    parameters: Parameters {
        my_uniform: [1,2,3,4],
    },
});
pc.dispatch([group_count, 1, 1]);
```

----

### blade: Synchronization

```rust
if let mut pass = command_encoder.compute("fill-gbuf") {
    let mut pc = pass.with(&self.fill_pipeline);
    pc.bind(0, &FillData {...});
    pc.dispatch(groups);
}
// implicit barrier between passes
if let mut pass = command_encoder.compute("ray-trace") {
    let mut pc = pass.with(&self.main_pipeline);
    pc.bind(0, &MainData {...});
    pc.dispatch(groups);
}
```

----

![blade-zed](https://hackmd.io/_uploads/Bk8r-Z0xye.png)

---

### blade: Performance

API translation and command recording: :zap:

Rasterization:

| GPU | blade | wgpu-hal |
| --- | ----- | -------- |
| Ryzen 3500U | 20K | 20K |
| Ryzen 6850U | 70K | 70K |
| GeForce 3050 | 100K | 100K |

----

<!-- .slide: style="text-align: left; font-size: 24px; margin-left: 60px; " -->

### blade: GPU Penalty

[@krOoze on Khronos forums](https://community.khronos.org/t/which-vulkan-implementations-really-care-about-image-layouts/6885/4):
>Supplying GENERAL everywhere sure is state-of-the-art weapons-grade laziness…

Drivers:
* NVIDIA: [irrelevant](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/gameworks%2FVulkanDevDaypdaniel.pdf)
  >Just leave images in the VK_IMAGE_LAYOUT_GENERAL layout
* AMD: comes down to [ac_surface_supports_dcc_image_stores](https://gitlab.freedesktop.org/mesa/mesa/-/blob/e18733300e65f97757150c6a670f80d032a2615d/src/amd/common/ac_surface.c#L149)
  * roughly starts with RDNA
  * experiments show no penalty on Vega
* Intel: unclear

Easy to mitigate by inserting transitions around render passes.
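
----

### blade: Mitigation sketch

One way to picture the mitigation above: images stay in the GENERAL layout for compute work, and each render pass is wrapped in a pair of layout transitions. This is a toy model in plain Rust, not the API of Blade, wgpu-hal, or Vulkan; `Layout`, `Command`, and `Encoder` are hypothetical names made up for illustration.

```rust
// Toy model only: `Layout`, `Command` and `Encoder` are hypothetical names,
// not part of Blade, wgpu-hal or any Vulkan binding.

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Layout {
    General,
    ColorAttachment,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Command {
    Transition { from: Layout, to: Layout },
    BeginRenderPass,
    Draw,
    EndRenderPass,
    Dispatch,
}

#[derive(Default)]
struct Encoder {
    commands: Vec<Command>,
}

impl Encoder {
    /// Compute work leaves the image in `General`: no transitions recorded.
    fn compute_pass(&mut self, dispatches: usize) {
        for _ in 0..dispatches {
            self.commands.push(Command::Dispatch);
        }
    }

    /// A render pass is bracketed by two transitions, so the image is in an
    /// attachment layout only while rasterizing.
    fn render_pass(&mut self, draws: usize) {
        self.commands.push(Command::Transition {
            from: Layout::General,
            to: Layout::ColorAttachment,
        });
        self.commands.push(Command::BeginRenderPass);
        for _ in 0..draws {
            self.commands.push(Command::Draw);
        }
        self.commands.push(Command::EndRenderPass);
        self.commands.push(Command::Transition {
            from: Layout::ColorAttachment,
            to: Layout::General,
        });
    }

    /// Number of layout transitions recorded so far.
    fn transition_count(&self) -> usize {
        self.commands
            .iter()
            .filter(|c| matches!(c, Command::Transition { .. }))
            .count()
    }
}

fn main() {
    let mut encoder = Encoder::default();
    encoder.compute_pass(2);
    encoder.render_pass(3);
    encoder.compute_pass(1);
    println!(
        "{} commands, {} transitions",
        encoder.commands.len(),
        encoder.transition_count()
    );
}
```

In this model the cost is two extra commands per render pass, while compute passes record no layout changes at all.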

----

### blade: Conclusion

- easy to use, hackable
- very fast and portable

![game](https://github.com/kvark/blade/blob/main/docs/vehicle-colliders.jpg?raw=true)

---

## Thank you!

:crab: :crab: :crab:

![torus](https://github.com/kvark/blade/raw/d99fd709b8d0b415197eee0b71b1cac9cee84aa2/docs/ray-query.gif =50%x)