Cg is a high-level shading language that is used to program the vertex and fragment shaders in Unity. Cg shaders allow for complex pixel processing and GPU calculations. In Unity, shaders are written using the Cg language and compiled for the target graphics API. Shaders have inputs like vertex attributes and uniforms, and outputs like varying values passed between stages. Common shader types include surface, vertex, and fragment shaders. Advanced techniques like multi-pass rendering are also possible in Unity using Cg shaders.
The document discusses SPU shaders, which are fragments of code used in larger systems on the SPU. SPU shaders are like scripts or callbacks and are used to customize system data and provide feedback outside the current system. They provide advantages like improved performance by offloading work to the SPU and allowing new functionality without modifying core systems. Implementing SPU shaders involves identifying where in systems to inject shader code fragments and setting up common functions and configurations to manage the shaders.
The document summarizes the key features and capabilities of Direct3D 10, which was designed to maximize GPU performance by reducing CPU overhead and enabling more work to be done on the GPU. Some of the main features discussed include constant buffers, geometry shaders, texture arrays, and other capabilities that reduce draw calls and state changes. Direct3D 10 also provides a standardized, consistent API and enables new visual effects by exposing more of the GPU's programmability and functionality to developers.
Graphics hardware has evolved from fixed-function pipelines to programmable shaders. Early shaders had limited instruction sets but modern shading languages can be compiled and support branching, loops, and complex math operations. Shaders overcome limits of fixed-function graphics by implementing operations like lighting and texturing through programs. The Nvidia GeForce 8800 introduced a unified shader architecture with 128 stream processors for high parallelism and performance.
In this AMD technology presentation from the 2014 Game Developers Conference (San Francisco, March 17-21), Bill explains some of the ways the Vertex Shader can be used to improve performance by taking a fast path through the Vertex Shader rather than generating vertices with other parts of the pipeline. Check out more technical presentations at http://developer.amd.com/resources/documentation-articles/conference-presentations/
Far Cry uses many DirectX 9 features like shader models 2.x/3.0, geometry instancing, and floating-point render targets. To consolidate multiple lights into one pass, the developers wanted to use dynamic flow control in shaders but this was not possible in DX9. Instead, they used loop unrolling and precompiled shaders for different light combinations to avoid dynamic branching penalties. Geometry instancing was used to reduce vegetation rendering costs by submitting multiple instances in one draw call.
This document provides information about Minko, an open-source 3D engine built with ActionScript. It includes links to Minko's resources like documentation and examples. It discusses how Minko allows developing shaders using ActionScript instead of low-level AGAL, and provides shader examples in ActionScript. It also discusses how Minko enables hardware-accelerated particles using shaders, and previews an upcoming particles editor.
Beginning direct3d gameprogramming09_shaderprogramming_20160505_jintaeks (JinTaek Seo)
This document discusses shader programming in Direct3D. It covers vertex and pixel shaders, shader models, and using effects to integrate shaders with the graphics pipeline. Key points include:
- Vertex shaders process vertex data and convert it from model to projection space. Pixel shaders blend per-pixel data into output colors.
- Effects allow integrating shaders with pipeline state and simplify writing shaders for different hardware. Effects contain techniques and passes.
- Parameters can be accessed by semantic or annotation. Semantics specify the purpose and annotations add custom data. Effects are applied by selecting a technique and rendering within Begin/End passes.
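As an illustration of the first key point, the following is a minimal CPU-side sketch of what a vertex shader does when it converts a position from model space to projection space. It is not taken from the slides. It assumes the row-vector, left-handed convention common in Direct3D material, with depth mapped to the 0..1 range; all function names are hypothetical.

```python
import math

def perspective_fov_lh(fov_y, aspect, zn, zf):
    """Left-handed perspective matrix for row vectors (assumed convention)."""
    y_scale = 1.0 / math.tan(fov_y / 2.0)
    x_scale = y_scale / aspect
    q = zf / (zf - zn)
    return [
        [x_scale, 0.0, 0.0, 0.0],
        [0.0, y_scale, 0.0, 0.0],
        [0.0, 0.0, q, 1.0],
        [0.0, 0.0, -zn * q, 0.0],
    ]

def translation(tx, ty, tz):
    """Translation matrix for row vectors (offset lives in the last row)."""
    return [
        [1.0, 0.0, 0.0, 0.0],
        [0.0, 1.0, 0.0, 0.0],
        [0.0, 0.0, 1.0, 0.0],
        [tx, ty, tz, 1.0],
    ]

def mat_mul(a, b):
    """4x4 matrix product a * b."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def vec_mul(v, m):
    """Row vector times 4x4 matrix."""
    return [sum(v[i] * m[i][j] for i in range(4)) for j in range(4)]

def vertex_shader(position, world, view, proj):
    """Transform a model-space position to clip space: v * World * View * Proj."""
    wvp = mat_mul(mat_mul(world, view), proj)
    x, y, z = position
    return vec_mul([x, y, z, 1.0], wvp)

def to_ndc(clip):
    """Perspective divide, which the hardware performs after the vertex shader."""
    x, y, z, w = clip
    return (x / w, y / w, z / w)
```

In a real effect the combined matrix would typically be supplied as a shader parameter, and the multiplication would run once per vertex on the GPU.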
This document provides an introduction to shaders and the programmable shading pipeline. It discusses key shader concepts like vertex and fragment shaders, uniforms, and semantics. It also covers topics like debugging shaders, multi-compile variants, and includes. Unity shader code examples are provided to illustrate concepts.
This document discusses a lecture on GPU architecture given by Mark Kilgard at the University of Texas on March 6, 2012. The lecture covers the architecture of graphics processing units and how they have evolved over the past six years. It also includes an in-class quiz, information about homework and projects, and the professor's office hours.
Direct3D 12 aims to reduce CPU overhead and increase scalability across CPU cores by allowing developers greater control over the graphics pipeline. It optimizes pipeline state handling through pipeline state objects and reduces redundant resource binding by introducing descriptor heaps and tables. Command lists and bundles further improve performance by enabling parallel command list generation and reuse of draw commands.
This document provides an introduction to shader programming. It explains that shaders are programs that run on the GPU and include vertex shaders, fragment shaders, and surface shaders. It also covers the basic structure of shaders written in Cg and how they operate in parallel on multiple vertices and fragments. The document provides references for further reading on shader programming in Unity.
Bringing AAA graphics to mobile platforms
This document discusses techniques for bringing console-level graphics to mobile platforms using tile-based deferred rendering GPUs common in smartphones and tablets. It provides an overview of the architecture of tile-based mobile GPUs like ImgTec SGX and how they process vertices and pixels in tiles. It then discusses optimizations for mobile like using multi-sample anti-aliasing to reduce memory usage, form-fitting alpha blended geometry, and avoiding buffer restores and resolves. Specific rendering techniques like god rays and character shadows are explained.
4,000 Adams at 90 Frames Per Second | Yi Fei Boon (Jessica Tams)
Delivered at Casual Connect Asia 2017. This session will offer the steps and explanation of techniques used to create, manage and render a large crowd in a game without killing the performance. It will cover instancing, baking of animations into textures, and skinning on GPU in the vertex shaders.
The document discusses challenges and approaches for real-time 3D architectural visualization and virtual reality using WebGL. Some key challenges mentioned include the clean aesthetic required, complex lighting, and accurate material representation. The approaches discussed are physically based shading, deferred rendering using a g-buffer to store scene information, and integer packing to store g-buffer data in a single texture given WebGL limitations. Unit testing of the packing functions is also emphasized.
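The integer-packing idea can be sketched as follows. This is only an illustration of the general technique, not the presenters' code: it assumes a hypothetical layout in which two normalized values (here called depth and roughness) are each quantized to 16 bits and split across the four 8-bit channels of one RGBA texel.

```python
def pack_unorm16(value):
    """Quantize a value in [0, 1] to 16 bits and split it into two bytes."""
    clamped = min(max(value, 0.0), 1.0)
    q = int(round(clamped * 65535))
    return (q >> 8, q & 0xFF)

def unpack_unorm16(hi, lo):
    """Rebuild the normalized value from its high and low bytes."""
    return ((hi << 8) | lo) / 65535.0

def pack_gbuffer_texel(depth, roughness):
    """Pack two 16-bit values into one RGBA8 texel (hypothetical layout)."""
    d_hi, d_lo = pack_unorm16(depth)
    r_hi, r_lo = pack_unorm16(roughness)
    return (d_hi, d_lo, r_hi, r_lo)

def unpack_gbuffer_texel(texel):
    """Inverse of pack_gbuffer_texel; returns (depth, roughness)."""
    d_hi, d_lo, r_hi, r_lo = texel
    return (unpack_unorm16(d_hi, d_lo), unpack_unorm16(r_hi, r_lo))
```

A round-trip check of this kind (pack, unpack, compare within the quantization step of 1/65535) is the sort of unit test the summary refers to.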
This document describes a primitive processing and advanced shading architecture for embedded systems. It features a vertex cache and programmable primitive engine that can process fixed and variable size primitives with reduced memory bandwidth requirements. The architecture includes a configurable per-fragment shader that supports various shading models using dot products and lookup tables stored on-chip. This hybrid design aims to bring appealing shading to embedded applications while meeting limitations on gate size, power consumption, and memory traffic growth.
This document discusses new graphics APIs like DX12 and Vulkan that aim to provide lower overhead and more direct hardware access compared to earlier APIs. It covers topics like increased parallelism, explicit memory management using descriptor sets and pipelines, and best practices like batching draw calls and using multiple asynchronous queues. Overall, the new APIs allow more explicit control over GPU hardware for improved performance but require following optimization best practices around areas like parallelism, memory usage, and command batching.
The document provides an overview of next-generation graphics on the Xbox 360, including details about the Xbox 360 system architecture, GPU, and graphics APIs like Direct3D. The Xbox 360 GPU was custom designed by ATI Technologies and includes features like 10MB of embedded DRAM, 48 shader ALUs, and hardware support for tessellation, textures, and multi-sampling anti-aliasing. Direct3D on Xbox 360 exposes low-level access to the GPU while supporting familiar features and adding extensions for the custom hardware.
The document provides an overview of iOS visual effects capabilities and limitations. It discusses Cocoa Touch's reliance on CALayers for 3D capabilities despite UIViews living in 2D space. It introduces OpenGL and the power of shaders for advanced visual effects. Code examples are provided of various GLSL shaders for effects like color remapping, hue shifting, and edge detection. The document emphasizes the parallel processing capabilities of shaders for complex effects.
Practical Spherical Harmonics Based PRT Methods (Naughty Dog)
The document summarizes methods for compressing precomputed radiance transfer (PRT) coefficients using spherical harmonics. It presents 4 methods with progressively higher compression ratios: Method 1 uses 9 bytes by removing a factor and scaling, Method 2 uses 6 bytes with a bit field allocation, Method 3 uses 6 bytes with a Lloyd-Max non-uniform quantizer, and Method 4 achieves 4 bytes with a different bit allocation. The methods are evaluated based on storage size, reconstruction quality, and rendering performance.
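For readers unfamiliar with the Lloyd-Max quantizer named in Method 3, the following is a generic one-dimensional sketch of the algorithm. It is not the presentation's implementation and says nothing about its bit allocation; the function names and the choice of a uniform starting codebook are assumptions made for illustration.

```python
import bisect

def uniform_levels(lo, hi, num_levels):
    """Evenly spaced reconstruction levels, used as the starting codebook."""
    return [lo + (hi - lo) * (i + 0.5) / num_levels for i in range(num_levels)]

def lloyd_max(samples, num_levels, iterations=50):
    """Fit non-uniform reconstruction levels to the samples (Lloyd-Max iteration)."""
    levels = uniform_levels(min(samples), max(samples), num_levels)
    for _ in range(iterations):
        # Decision boundaries sit halfway between neighbouring levels.
        bounds = [(levels[i] + levels[i + 1]) / 2.0 for i in range(num_levels - 1)]
        sums = [0.0] * num_levels
        counts = [0] * num_levels
        for x in samples:
            idx = bisect.bisect_right(bounds, x)
            sums[idx] += x
            counts[idx] += 1
        # Each level moves to the mean of its cell; empty cells keep their level.
        levels = [sums[i] / counts[i] if counts[i] else levels[i]
                  for i in range(num_levels)]
    return levels

def quantize(x, levels):
    """Index of the nearest reconstruction level."""
    return min(range(len(levels)), key=lambda i: abs(x - levels[i]))

def mse(samples, levels):
    """Mean squared reconstruction error of the codebook on the samples."""
    total = sum((x - levels[quantize(x, levels)]) ** 2 for x in samples)
    return total / len(samples)
```

Because each iteration cannot increase the error on the training samples, the fitted levels are never worse than the uniform codebook they start from, which is why a non-uniform quantizer can give better reconstruction at the same byte count.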
Getting the Best out of D3D12
AMD GDC2015 material
from: http://developer.amd.com/resources/documentation-articles/conference-presentations/
Talk by Yuriy O’Donnell at GDC 2017.
This talk describes how Frostbite handles rendering architecture challenges that come with having to support a wide variety of games on a single engine. Yuriy describes their new rendering abstraction design, which is based on a graph of all render passes and resources. This approach allows implementation of rendering features in a decoupled and modular way, while still maintaining efficiency.
A graph of all rendering operations for the entire frame is a useful abstraction. The industry can move away from “immediate mode” DX11 style APIs to a higher level system that allows simpler code and efficient GPU utilization. Attendees will learn how it worked out for Frostbite.
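The frame-graph idea can be illustrated with a small sketch. This is not Frostbite's implementation: it assumes passes are declared in a valid execution order, that each pass lists the resources it reads and writes, and that any pass not contributing to the requested final outputs may be culled. All names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class RenderPass:
    name: str
    reads: tuple = ()
    writes: tuple = ()

def compile_frame_graph(passes, final_outputs):
    """Return the names of the passes to execute, in order, with unused passes culled.

    Walks the declared passes backwards, keeping a pass only if it writes a
    resource that is still needed, then marking its inputs as needed too.
    Earlier writers of a needed resource are kept conservatively.
    """
    needed = set(final_outputs)
    kept = []
    for render_pass in reversed(passes):
        if any(resource in needed for resource in render_pass.writes):
            kept.append(render_pass)
            needed.update(render_pass.reads)
    kept.reverse()
    return [render_pass.name for render_pass in kept]

# Example frame: the debug view is declared but nothing consumes its output.
EXAMPLE_PASSES = [
    RenderPass("gbuffer", (), ("gbuffer", "depth")),
    RenderPass("shadow", (), ("shadowmap",)),
    RenderPass("debug_view", ("gbuffer",), ("debug",)),
    RenderPass("lighting", ("gbuffer", "shadowmap"), ("hdr",)),
    RenderPass("tonemap", ("hdr",), ("backbuffer",)),
]
```

Calling `compile_frame_graph(EXAMPLE_PASSES, ["backbuffer"])` keeps the four passes that feed the back buffer and drops `debug_view`. Features can therefore be declared independently, and the graph decides what actually runs.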
The document provides an overview of graphics pipelines. It discusses the basic graphics pipeline which includes 3D scene, vertex fetching, vertex processing, scan conversion, pixel processing, and raster operations. It then discusses modern graphics pipelines which use programmable shaders and unified shader architectures. It also discusses moving beyond traditional pipelining through parallelism approaches like SIMD, SIMT, and MIMD. Future trends may involve more MIMD approaches and programming models similar to SPU programming. This could enable more complex data structures, algorithms, lighting approaches, and rasterization techniques.
This document summarizes DirectX 10/11 visual effects and the compute shader capabilities. It introduces volumetric particle shadowing and horizon based ambient occlusion effects that can be achieved with DirectX 10. It then discusses how compute shaders on DirectX 10 hardware enable new effects by allowing general purpose computation on the GPU. Examples of particle systems, n-body simulations, and image processing are provided.
Geometry shaders operate on primitives like points, lines and triangles to modify or generate new geometry directly on the GPU. They provide benefits like reducing vertex data and computations by generating geometry from a limited number of inputs. However, generating too many new vertices can negatively impact performance due to increased memory and bandwidth usage. Geometry shaders are well suited for tasks like instancing, displacement mapping and outlining but care needs to be taken to optimize output size.
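As a rough CPU-side analogy for generating geometry from a limited number of inputs, the sketch below expands each input point into a four-vertex quad, the way a geometry shader might expand point sprites. The function names and the axis-aligned quad are assumptions made for illustration; a real geometry shader would run per primitive on the GPU.

```python
def expand_point_to_quad(center, half_size):
    """Emit four vertices in triangle-strip order around one input point."""
    cx, cy, cz = center
    return [
        (cx - half_size, cy - half_size, cz),  # bottom-left
        (cx + half_size, cy - half_size, cz),  # bottom-right
        (cx - half_size, cy + half_size, cz),  # top-left
        (cx + half_size, cy + half_size, cz),  # top-right
    ]

def geometry_stage(points, half_size):
    """Run the expansion over every input point and collect the output vertices."""
    output = []
    for point in points:
        output.extend(expand_point_to_quad(point, half_size))
    return output
```

The output here is four times the size of the input, which is the amplification the summary warns about: the more vertices each input produces, the more memory and bandwidth the stage consumes.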
Contributing to WordPress With & Without Code.pptx (Patrick Lumumba)
Contributing to WordPress: Making an Impact on the Test Team—With or Without Coding Skills
WordPress survives on collaboration, and the Test Team plays a very important role in ensuring the CMS is stable, user-friendly, and accessible to everyone.
This talk aims to deconstruct the myth that one has to be a developer to contribute to WordPress. In this session, I will share with the audience how to get involved with the WordPress Team, whether a coder or not.
We’ll explore practical ways to contribute, from testing new features, and patches, to reporting bugs. By the end of this talk, the audience will have the tools and confidence to make a meaningful impact on WordPress—no matter the skill set.
Introducing FME Realize: A New Era of Spatial Computing and AR (Safe Software)
A new era for the FME Platform has arrived – and it’s taking data into the real world.
Meet FME Realize: marking a new chapter in how organizations connect digital information with the physical environment around them. With the addition of FME Realize, FME has evolved into an All-data, Any-AI Spatial Computing Platform.
FME Realize brings spatial computing, augmented reality (AR), and the full power of FME to mobile teams: making it easy to visualize, interact with, and update data right in the field. From infrastructure management to asset inspections, you can put any data into real-world context, instantly.
Join us to discover how spatial computing, powered by FME, enables digital twins, AI-driven insights, and real-time field interactions: all through an intuitive no-code experience.
In this one-hour webinar, you’ll:
- Explore what FME Realize includes and how it fits into the FME Platform
- Learn how to deliver real-time AR experiences, fast
- See how FME enables live, contextual interactions with enterprise data across systems
- See demos, including ones you can try yourself
- Get tutorials and downloadable resources to help you start right away
Whether you’re exploring spatial computing for the first time or looking to scale AR across your organization, this session will give you the tools and insights to get started with confidence.
AI Emotional Actors: “When Machines Learn to Feel and Perform” (AkashKumar809858)
Welcome to the era of AI Emotional Actors.
The entertainment landscape is undergoing a seismic transformation. What started as motion capture and CGI enhancements has evolved into a full-blown revolution: synthetic beings not only perform but express, emote, and adapt in real time.
For reading further follow this link -
https://akash97.gumroad.com/l/meioex
Dev Dives: System-to-system integration with UiPath API Workflows (UiPathCommunity)
Join the next Dev Dives webinar on May 29 for a first contact with UiPath API Workflows, a powerful tool purpose-fit for API integration and data manipulation!
This session will guide you through the technical aspects of automating communication between applications, systems and data sources using API workflows.
📕 We'll delve into:
- How this feature delivers API integration as a first-party concept of the UiPath Platform.
- How to design, implement, and debug API workflows to integrate with your existing systems seamlessly and securely.
- How to optimize your API integrations with runtime built for speed and scalability.
This session is ideal for developers looking to solve API integration use cases with the power of the UiPath Platform.
👨🏫 Speakers:
Gunter De Souter, Sr. Director, Product Manager @UiPath
Ramsay Grove, Product Manager @UiPath
This session streamed live on May 29, 2025, 16:00 CET.
Check out all our upcoming UiPath Dev Dives sessions:
👉 https://community.uipath.com/dev-dives-automation-developer-2025/
Supercharge Your AI Development with Local LLMs (Francesco Corti)
In today's AI development landscape, developers face significant challenges when building applications that leverage powerful large language models (LLMs) through SaaS platforms like ChatGPT, Gemini, and others. While these services offer impressive capabilities, they come with substantial costs that can quickly escalate, especially during the development lifecycle. Additionally, the inherent latency of web-based APIs creates bottlenecks during the critical testing and iteration phases of development, slowing down innovation and frustrating developers.
This talk will introduce developers to the approach of integrating local LLMs directly into their development environments. By bringing these models closer to where the code lives, developers can dramatically accelerate development lifecycles while maintaining complete control over model selection and configuration. This methodology effectively reduces costs to zero by eliminating dependency on pay-per-use SaaS services, while opening new possibilities for comprehensive integration testing, rapid prototyping, and specialized use cases.
Co-Constructing Explanations for AI Systems using Provenance (Paul Groth)
Explanation is not a one-off: it is a process in which people and systems work together to gain understanding. This idea of co-constructing explanations, or explanation by exploration, is a powerful way to frame the problem of explanation. In this talk, I discuss our first experiments with this approach for explaining complex AI systems by using provenance. Importantly, I discuss the difficulty of evaluation and some of our first approaches to evaluating these systems at scale. Finally, I touch on the importance of explanation to the comprehensive evaluation of AI systems.
Introduction and Background:
Study Overview and Methodology: The study analyzes the IT market in Israel, covering over 160 markets and 760 companies/products/services. It includes vendor rankings, IT budgets, and trends from 2025-2029. Vendors participate in detailed briefings and surveys.
Vendor Listings: The presentation lists numerous vendors across various pages, detailing their names and services. These vendors are ranked based on their participation and market presence.
Market Insights and Trends: Key insights include IT market forecasts, economic factors affecting IT budgets, and the impact of AI on enterprise IT. The study highlights the importance of AI integration and the concept of creative destruction.
Agentic AI and Future Predictions: Agentic AI is expected to transform human-agent collaboration, with AI systems understanding context and orchestrating complex processes. Future predictions include AI's role in shopping and enterprise IT.
Offshore IT Support: Balancing In-House and Offshore Help Desk Technicians (john823664)
In today's always-on digital environment, businesses must deliver seamless IT support across time zones, devices, and departments. This SlideShare explores how companies can strategically combine in-house expertise with offshore talent to build a high-performing, cost-efficient help desk operation.
From the benefits and challenges of offshore support to practical models for integrating global teams, this presentation offers insights, real-world examples, and key metrics for success. Whether you're scaling a startup or optimizing enterprise support, discover how to balance cost, quality, and responsiveness with a hybrid IT support strategy.
Perfect for IT managers, operations leads, and business owners considering global help desk solutions.
Protecting Your Sensitive Data with Microsoft Purview - IRMS 2025 (Nikki Chapple)
Session | Protecting Your Sensitive Data with Microsoft Purview: Practical Information Protection and DLP Strategies
Presenter | Nikki Chapple (MVP| Principal Cloud Architect CloudWay) & Ryan John Murphy (Microsoft)
Event | IRMS Conference 2025
Format | Birmingham UK
Date | 18-20 May 2025
In this closing keynote session from the IRMS Conference 2025, Nikki Chapple and Ryan John Murphy deliver a compelling and practical guide to data protection, compliance, and information governance using Microsoft Purview. As organizations generate over 2 billion pieces of content daily in Microsoft 365, the need for robust data classification, sensitivity labeling, and Data Loss Prevention (DLP) has never been more urgent.
This session addresses the growing challenge of managing unstructured data, with 73% of sensitive content remaining undiscovered and unclassified. Using a mountaineering metaphor, the speakers introduce the “Secure by Default” blueprint—a four-phase maturity model designed to help organizations scale their data security journey with confidence, clarity, and control.
🔐 Key Topics and Microsoft 365 Security Features Covered:
Microsoft Purview Information Protection and DLP
Sensitivity labels, auto-labeling, and adaptive protection
Data discovery, classification, and content labeling
DLP for both labeled and unlabeled content
SharePoint Advanced Management for workspace governance
Microsoft 365 compliance center best practices
Real-world case study: reducing 42 sensitivity labels to 4 parent labels
Empowering users through training, change management, and adoption strategies
🧭 The Secure by Default Path – Microsoft Purview Maturity Model:
Foundational – Apply default sensitivity labels at content creation; train users to manage exceptions; implement DLP for labeled content.
Managed – Focus on crown jewel data; use client-side auto-labeling; apply DLP to unlabeled content; enable adaptive protection.
Optimized – Auto-label historical content; simulate and test policies; use advanced classifiers to identify sensitive data at scale.
Strategic – Conduct operational reviews; identify new labeling scenarios; implement workspace governance using SharePoint Advanced Management.
🎒 Top Takeaways for Information Management Professionals:
Start secure. Stay protected. Expand with purpose.
Simplify your sensitivity label taxonomy for better adoption.
Train your users—they are your first line of defense.
Don’t wait for perfection—start small and iterate fast.
Align your data protection strategy with business goals and regulatory requirements.
💡 Who Should Watch This Presentation?
This session is ideal for compliance officers, IT administrators, records managers, data protection officers (DPOs), security architects, and Microsoft 365 governance leads. Whether you're in the public sector, financial services, healthcare, or education.
🔗 Read the blog: https://nikkichapple.com/irms-conference-2025/
Introducing the OSA 3200 SP and OSA 3250 ePRC (Adtran)
Adtran's latest Oscilloquartz solutions make optical pumping cesium timing more accessible than ever. Discover how the new OSA 3200 SP and OSA 3250 ePRC deliver superior stability, simplified deployment and lower total cost of ownership. Built on a shared platform and engineered for scalable, future-ready networks, these models are ideal for telecom, defense, metrology and more.
Evaluation Challenges in Using Generative AI for Science & Technical ContentPaul Groth
Evaluation Challenges in Using Generative AI for Science & Technical Content.
Foundation Models show impressive results in a wide-range of tasks on scientific and legal content from information extraction to question answering and even literature synthesis. However, standard evaluation approaches (e.g. comparing to ground truth) often don't seem to work. Qualitatively the results look great but quantitive scores do not align with these observations. In this talk, I discuss the challenges we've face in our lab in evaluation. I then outline potential routes forward.
As data privacy regulations become more pervasive across the globe and organizations increasingly handle and transfer (including across borders) meaningful volumes of personal and confidential information, the need for robust contracts to be in place is more important than ever.
This webinar will provide a deep dive into privacy contracting, covering essential terms and concepts, negotiation strategies, and key practices for managing data privacy risks.
Whether you're in legal, privacy, security, compliance, GRC, procurement, or otherwise, this session will include actionable insights and practical strategies to help you enhance your agreements, reduce risk, and enable your business to move fast while protecting itself.
This webinar will review key aspects and considerations in privacy contracting, including:
- Data processing addenda, cross-border transfer terms including EU Model Clauses/Standard Contractual Clauses, etc.
- Certain legally-required provisions (as well as how to ensure compliance with those provisions)
- Negotiation tactics and common issues
- Recent lessons from recent regulatory actions and disputes
nnual (33 years) study of the Israeli Enterprise / public IT market. Covering sections on Israeli Economy, IT trends 2026-28, several surveys (AI, CDOs, OCIO, CTO, staffing cyber, operations and infra) plus rankings of 760 vendors on 160 markets (market sizes and trends) and comparison of products according to support and market penetration.
4. Agenda
• What is this talk about?
• Why port a cloth simulation to the GPU?
• The first attempts
• A new approach
• The shader – Easy parts – Complex parts
• Optimizing the shader
• The PS4 version
• What you can do & cannot do in compute shader
• Tips & tricks
6. What is this talk about?
• Cloth simulation ported to the GPU
• For PC DirectX 11, Xbox One and PS4
• This talk is about all that we have learned during this adventure
15. Why port a cloth simulation to the GPU?
# of dancers with 5 ms of CPU time:
• Xbox 360: 34
• PS3: 105 (SPUs rock!)
Now let’s switch to next gen!
• PS4: 98 (WTF?)
• Xbox One: 113
PS3: 5 SPUs @ 3.2 GHz. PS4: 6 cores @ 1.6 GHz.
21. The first attempts
• Easy to use
• Close to C++
• DirectCompute
• Not available on all platforms
• Black box: no way to know what’s going on
23. The first attempts
Each step is a separate compute shader:
• Resolve some constraints
• Integrate velocity
• Resolve collisions
• Resolve some more constraints
• Do some other funny stuff
• …
26. The first attempts
• Merge several cloth items to get better performance
• All cloth items must have the same properties
[Bar chart: CPU vs. GPU; value shown: 27%]
29. A new approach
• A single huge compute shader to simulate the entire cloth
• Synchronization points inside the shader
• A single “Dispatch” instead of 50+
• Simulate several cloth items (up to 32) using a single “Dispatch”
[Bar chart: CPU vs. GPU; value shown: 160%]
31. The shader
• 41 .hlsl files
• 3,100 lines of code (+ 800 lines for unit tests & benchmarks)
• Compiled shader code size = 69 KB
32. The shader – Easy parts
• Thread group: threads 0, 1, 2, 3, 4, 5, …, 63
• We do the same operation on 64 vertices at a time
• There must be no dependency between the threads
34. The shader – Easy parts
• Read some global properties to apply (e.g. gravity, wind)
• Each thread reads the position of its vertex (vertex 0, vertex 1, …, vertex 63)
• Each thread computes
• Each thread writes the position of its vertex (vertex 0, vertex 1, …, vertex 63)
35. The shader – Easy parts
• Then the same steps for vertices 64, 65, …, 127
37. The shader – Easy parts
• Each thread reads a property for its vertex (vertex 0, vertex 1, …, vertex 63)
• Each thread reads the position of its vertex
• Each thread computes
• Each thread writes the position of its vertex
39. The shader – Easy parts
• Threads read the property for vertex 0, vertex 1, …, vertex 63
• Ensure contiguous reads to get good performance
• Coalescing = 1 read instead of 16
• i.e. use Structure of Arrays (SoA) instead of Array of Structures (AoS)
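To make the SoA advice concrete, here is a minimal C++ sketch of the two layouts as they might be prepared on the CPU before upload. The property names (`invMass`, `friction`) and the helper `ToSoA` are illustrative assumptions; the talk does not show its actual data layout.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Array of Structures: all properties of one vertex are stored together.
// Adjacent threads reading `friction` touch addresses sizeof(VertexAoS) bytes apart.
struct VertexAoS {
    float x, y, z;
    float invMass;   // hypothetical per-vertex property
    float friction;  // hypothetical per-vertex property
};

// Structure of Arrays: one contiguous array per property.
// Adjacent threads reading friction[i] touch addresses sizeof(float) bytes apart.
struct ClothSoA {
    std::vector<float> x, y, z;
    std::vector<float> invMass;
    std::vector<float> friction;
};

// Distance in bytes between the values read by two adjacent threads.
constexpr std::size_t kAoSStride = sizeof(VertexAoS);
constexpr std::size_t kSoAStride = sizeof(float);

// Convert an AoS vertex list into the SoA layout.
inline ClothSoA ToSoA(const std::vector<VertexAoS>& vertices) {
    ClothSoA soa;
    for (const VertexAoS& v : vertices) {
        soa.x.push_back(v.x);
        soa.y.push_back(v.y);
        soa.z.push_back(v.z);
        soa.invMass.push_back(v.invMass);
        soa.friction.push_back(v.friction);
    }
    assert(soa.friction.size() == vertices.size());
    return soa;
}
```

With the SoA layout, 64 threads reading one property touch 64 consecutive floats, which is the contiguous pattern the slide asks for.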
40. The shader – Complex parts
• Binary constraints: a constraint links Vertex A and Vertex B
43. The shader – Complex parts
• Binary constraints: [diagram annotated with “? ? ?”]
48. The shader – Complex parts
• Binary constraints: split into Group 1, Group 2, Group 3 and Group 4
• A GroupMemoryBarrierWithGroupSync() separates each group from the next
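The talk does not say how the constraint groups are built. One common way, sketched here as an assumption, is a greedy pass on the CPU that puts each constraint into the first group where neither of its vertices is already used, so that all constraints of a group can be solved by different threads without touching the same vertex. `Constraint` and `BuildConstraintGroups` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Constraint {
    int vertexA;
    int vertexB;
};

// Greedy grouping: within a group, no vertex is shared by two constraints.
inline std::vector<std::vector<Constraint>> BuildConstraintGroups(
    const std::vector<Constraint>& constraints, int vertexCount) {
    assert(vertexCount > 0);
    std::vector<std::vector<Constraint>> groups;
    std::vector<std::vector<bool>> used;  // used[g][v]: group g already touches vertex v
    for (const Constraint& c : constraints) {
        assert(c.vertexA >= 0 && c.vertexA < vertexCount);
        assert(c.vertexB >= 0 && c.vertexB < vertexCount);
        const std::size_t a = static_cast<std::size_t>(c.vertexA);
        const std::size_t b = static_cast<std::size_t>(c.vertexB);
        std::size_t g = 0;
        while (g < groups.size() && (used[g][a] || used[g][b])) {
            ++g;
        }
        if (g == groups.size()) {
            // No existing group can take this constraint: open a new one.
            groups.emplace_back();
            used.emplace_back(static_cast<std::size_t>(vertexCount), false);
        }
        groups[g].push_back(c);
        used[g][a] = true;
        used[g][b] = true;
    }
    return groups;
}
```

In the shader, each group would then be processed in turn, with a GroupMemoryBarrierWithGroupSync() between two groups as on the slide.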
50. The shader – Complex parts
• Collisions: easy or not?
• Collisions with vertices: easy
• Collisions with triangles: each thread will modify the position of 3 vertices, so you have to create groups and add synchronization
55. Optimizing the shader
• General rule: bottleneck = memory bandwidth
• Data compression:
• Vertex: 128 bits (4 floats) on the CPU, 64 bits (21:21:21:1) on the GPU
• Normal: 128 bits (4 floats) on the CPU, 32 bits (10:10:10) on the GPU
[Bar chart: GPU without compression vs. GPU with compression; value shown: x2.3]
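The slide gives only the bit layouts, so the following C++ sketch is one possible way to implement them, under stated assumptions: positions are already normalized to [0, 1] (for example relative to a bounding box), normal components are in [-1, 1], and the spare bit of the 21:21:21:1 format is treated as a generic flag. All function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// Quantize a value in [0, 1] to an unsigned integer of `bits` bits.
inline std::uint64_t Quantize(float value, int bits) {
    assert(bits > 0 && bits < 64);
    const double maxValue = static_cast<double>((std::uint64_t{1} << bits) - 1);
    const double clamped = std::min(1.0, std::max(0.0, static_cast<double>(value)));
    return static_cast<std::uint64_t>(std::llround(clamped * maxValue));
}

inline float Dequantize(std::uint64_t q, int bits) {
    const double maxValue = static_cast<double>((std::uint64_t{1} << bits) - 1);
    return static_cast<float>(static_cast<double>(q) / maxValue);
}

struct Position {
    float x, y, z;
    bool flag;
};

// 21:21:21:1 layout (assumed bit order): x in bits 0-20, y in 21-41, z in 42-62, flag in bit 63.
inline std::uint64_t PackPosition(float x, float y, float z, bool flag) {
    return Quantize(x, 21) | (Quantize(y, 21) << 21) | (Quantize(z, 21) << 42) |
           (static_cast<std::uint64_t>(flag) << 63);
}

inline Position UnpackPosition(std::uint64_t packed) {
    const std::uint64_t mask = (std::uint64_t{1} << 21) - 1;
    return {Dequantize(packed & mask, 21), Dequantize((packed >> 21) & mask, 21),
            Dequantize((packed >> 42) & mask, 21), (packed >> 63) != 0};
}

// 10:10:10 layout: each component is remapped from [-1, 1] to [0, 1], then quantized.
inline std::uint32_t PackNormal(float nx, float ny, float nz) {
    auto q = [](float n) { return static_cast<std::uint32_t>(Quantize(n * 0.5f + 0.5f, 10)); };
    return q(nx) | (q(ny) << 10) | (q(nz) << 20);
}

// index: 0 = x, 1 = y, 2 = z.
inline float UnpackNormalComponent(std::uint32_t packed, int index) {
    assert(index >= 0 && index < 3);
    return Dequantize((packed >> (10 * index)) & 0x3FFu, 10) * 2.0f - 1.0f;
}
```

The vertex shrinks from 16 to 8 bytes and the normal from 16 to 4 bytes, at the price of a quantization error of about half a step per component.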
56. Optimizing the shader
• Use Local Data Storage (aka Local Shared Memory)
• All Compute Units (CU) access the VRAM; each Compute Unit has 64 KB of LDS
• Compute Units: 12 on Xbox One, 18 on PS4
58. Optimizing the shader
• Store vertices in Local Data Storage:
• Copy vertices from VRAM to LDS
• Step 1 – Update vertices
• Step 2 – Update vertices
• …
• Step n – Update vertices
• Copy vertices from LDS to VRAM
[Bar chart: VRAM vs. LDS; value shown: x1.9]
60. Optimizing the shader
• Use bigger thread groups
• With threads 0, 1, …, 63 only: Load, Wait, Compute, then Load, Wait, Compute
62. Optimizing the shader
• Use bigger thread groups
• With threads 0, 1, …, 63 and 64, …, 127: Load, Load, Compute, Compute
• With 256 or 512 threads, we hide most of the latency!
71. The PS4 version
• Port from HLSL to PSSL
#ifdef __PSSL__
#define numthreads NUM_THREADS
#define SV_GroupIndex S_GROUP_INDEX
#define SV_GroupID S_GROUP_ID
#define StructuredBuffer RegularBuffer
#define RWStructuredBuffer RW_RegularBuffer
#define ByteAddressBuffer ByteBuffer
#define RWByteAddressBuffer RW_ByteBuffer
#define GroupMemoryBarrierWithGroupSync ThreadGroupMemoryBarrierSync
#define groupshared thread_group_memory
#endif
73. The PS4 version
• On DirectX 11:
[Diagram: compute shader (1) → buffer → compute shader (2), and buffer → CopyResource (3), with a synchronization before (2) and before (3); slide 73 adds a copy of the buffer]
74. The PS4 version
• On PS4:
• No implicit synchronization, no implicit buffer duplication
• You have to manage everything by yourself
• Potentially better performance because you know when you have to sync or not
75. The PS4 version
• We use labels to know if a buffer is still in use by the GPU
• Still used → automatically allocate a new buffer
• “Used” means used by a compute shader or a copy
• We also use labels to know when a compute shader has finished, to copy the results
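The label mechanism can be illustrated with a small CPU-side sketch. It does not use any real PS4 API: a label is modeled here as a monotonically increasing 64-bit counter that the GPU would write when a piece of work finishes, and `BufferPool`, `Acquire` and `MarkUsed` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct PooledBuffer {
    std::vector<std::uint8_t> storage;
    std::uint64_t lastUseLabel;  // label the GPU writes once it is done with this buffer
};

class BufferPool {
public:
    explicit BufferPool(std::size_t bufferSize) : m_bufferSize(bufferSize) {
        assert(bufferSize > 0);
    }

    // Returns a buffer the GPU no longer uses; allocates a new one if all are still in use.
    // `completedLabel` is the latest label value the GPU has written so far.
    std::size_t Acquire(std::uint64_t completedLabel) {
        for (std::size_t i = 0; i < m_buffers.size(); ++i) {
            if (m_buffers[i].lastUseLabel <= completedLabel) {
                return i;
            }
        }
        PooledBuffer fresh;
        fresh.storage.resize(m_bufferSize);
        fresh.lastUseLabel = 0;
        m_buffers.push_back(fresh);
        return m_buffers.size() - 1;
    }

    // Records that GPU work tagged with `label` (a compute shader or a copy) uses the buffer.
    void MarkUsed(std::size_t index, std::uint64_t label) {
        m_buffers.at(index).lastUseLabel = label;
    }

    std::size_t Count() const { return m_buffers.size(); }

private:
    std::size_t m_bufferSize;
    std::vector<PooledBuffer> m_buffers;
};
```

The same comparison against the completed label can tell the CPU when a compute shader has finished and its results are safe to copy.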
77. What you can do in compute shader
• Peak power:
[Bar charts: CPU vs. GPU peak Gflops, one chart for Xbox One and one for PS4; axis from 0 to 1800 Gflops]
78. What you can do in compute shader
• Using DirectCompute, you can do almost everything in compute shader
• The difficulty is to get good performance
80. What you can do in compute shader
• Efficient code = you work on 64+ data at a time
if (threadIndex < 32)
{
    …
}
if (threadIndex == 0)
{
    …
}
// Read the same data on all threads
…
This is likely to be the bottleneck
82. What you can do in compute shader
• Example: collisions
• On the CPU:
• Compute a bounding volume (e.g. Axis-Aligned Bounding Box)
• Use it for an early rejection test
• Use an acceleration structure (e.g. AABB Tree) to improve performance
83. What you can do in compute shader
• Example: collisions
• On the GPU:
• Compute a bounding volume (e.g. Axis-Aligned Bounding Box)
• Just doing this can be more costly than computing the collision with all vertices!!!
92. What you can do in compute shader
• Compute 64 sub-AABoxes (threads 0, 1, …, 63)
• Reduce down to 32 sub-AABoxes (we use only 32 threads for that)
• Reduce down to 16 sub-AABoxes (we use only 16 threads for that)
• Reduce down to 8 sub-AABoxes (we use only 8 threads for that)
• Reduce down to 4 sub-AABoxes (we use only 4 threads for that)
• Reduce down to 2 sub-AABoxes (we use only 2 threads for that)
• Reduce down to 1 AABox (we use a single thread for that)
• This is ~ as costly as computing the collision with 7 x 64 = 448 vertices!!
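The reduction can be emulated on the CPU to see where the cost comes from. The sketch below is an assumption-laden illustration, not the talk's shader: `Aabb`, `Merge` and `ReduceAabbs` are hypothetical names, and the inner loop over `lane` stands in for the threads that would run in parallel, with a barrier after each step.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>

struct Aabb {
    float min[3];
    float max[3];
};

inline Aabb Merge(const Aabb& a, const Aabb& b) {
    Aabb r;
    for (int i = 0; i < 3; ++i) {
        r.min[i] = std::min(a.min[i], b.min[i]);
        r.max[i] = std::max(a.max[i], b.max[i]);
    }
    return r;
}

struct ReductionResult {
    Aabb box;
    int steps;        // number of reduction steps (each costs a full 64-thread pass)
    int usefulMerges; // number of threads that actually did work, summed over all steps
};

// Reduces 64 sub-boxes to one. At each step only the first `active` lanes work.
inline ReductionResult ReduceAabbs(std::array<Aabb, 64> boxes) {
    ReductionResult result = {boxes[0], 0, 0};
    for (std::size_t active = 32; active >= 1; active /= 2) {
        for (std::size_t lane = 0; lane < active; ++lane) {
            boxes[lane] = Merge(boxes[lane], boxes[lane + active]);
        }
        ++result.steps;
        result.usefulMerges += static_cast<int>(active);
    }
    assert(result.steps == 6);
    result.box = boxes[0];
    return result;
}
```

Six reduction steps plus the initial sub-box pass give the 7 passes of 64 threads counted on the slide, although only 63 merges are useful work.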
95. What you can do in compute shader
• Atomic functions are available
• You can write lock-free thread-safe containers
• Too costly in practice
• The brute-force approach is almost always the fastest one
• Bandwidth usage
• Data compression
• Memory coalescing
• LDS usage
96. What you can do in compute shader
• Port an algorithm to the GPU only if you find a way to handle 64+ data at a time 95+% of the time
101. Debug buffer
struct DebugBuffer
{
    float3 m_Velocity;
    float m_Weight;
    …
};
// Uncomment the following line
// to use the debug buffer
#define USE_DEBUG_BUFFER
#ifdef USE_DEBUG_BUFFER
RWStructuredBuffer<DebugBuffer> g_DebugBuffer : register(u1);
#endif
// In the shader:
WRITE_IN_DEBUG_BUFFER(m_Velocity, threadIndex, value);
// On the CPU:
DebugBuffer *debugBuffer = GetDebugBuffer();
103. What to put in LDS?
• Random access? Yes → LDS
• Random access? No (contiguous access) → Accessed several times?
• Accessed several times? Yes → LDS
• Accessed several times? No → VRAM
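Read as a decision procedure, the flowchart fits in a few lines. This sketch assumes the reading of the diagram in which the random-access question comes first; `MemoryKind` and `ChooseMemory` are hypothetical names.

```cpp
#include <cassert>

enum class MemoryKind { Lds, Vram };

// Randomly accessed data goes to LDS. Contiguously accessed data goes to LDS
// only if it is accessed several times; otherwise it stays in VRAM.
inline MemoryKind ChooseMemory(bool randomAccess, bool accessedSeveralTimes) {
    if (randomAccess) {
        return MemoryKind::Lds;
    }
    return accessedSeveralTimes ? MemoryKind::Lds : MemoryKind::Vram;
}
```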
108. Memory consumption in LDS
• LDS = 64 KB per compute unit
• 1 thread group can access 32 KB → 2 thread groups can run simultaneously on the same compute unit
• Less memory used in LDS → more thread groups can run in parallel
• 32 KB per thread group: 2 thread groups
• 21 KB per thread group: 3 thread groups
• 16 KB per thread group: 4 thread groups
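The figures 32, 21 and 16 follow from dividing the 64 KB of LDS by the amount each thread group uses. A minimal sketch of that arithmetic, considering LDS only (other resources can also limit how many thread groups are resident); `MaxGroupsPerComputeUnit` is a hypothetical helper:

```cpp
#include <cassert>
#include <cstddef>

// From the slides: LDS = 64 KB per compute unit, at most 32 KB per thread group.
constexpr std::size_t kLdsBytesPerComputeUnit = 64 * 1024;
constexpr std::size_t kMaxLdsBytesPerGroup = 32 * 1024;

// Upper bound on the thread groups that fit on one compute unit, LDS-wise.
inline std::size_t MaxGroupsPerComputeUnit(std::size_t ldsBytesPerGroup) {
    assert(ldsBytesPerGroup > 0 && ldsBytesPerGroup <= kMaxLdsBytesPerGroup);
    return kLdsBytesPerComputeUnit / ldsBytesPerGroup;
}
```

For example, trimming a thread group from 32 KB to 21 KB of LDS raises the bound from 2 to 3 groups per compute unit.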
110. Optimizing bank access in LDS?
• LDS is divided into several banks (16 or 32)
• 2 threads accessing the same bank → conflict
• Visible impact on performance on older PC hardware
• Negligible on Xbox One, PS4 and newer PC hardware
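A small model helps to see when conflicts appear. The sketch assumes the usual mapping in which consecutive 32-bit words fall into consecutive banks (bank = word index modulo bank count); the talk does not state the mapping, and `WorstBankConflict` is a hypothetical helper.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Thread t accesses the 32-bit word t * strideInWords.
// Returns the largest number of threads that land in the same bank.
inline int WorstBankConflict(int threadCount, int strideInWords, int bankCount) {
    assert(threadCount > 0 && strideInWords > 0 && bankCount > 0);
    std::vector<int> hits(static_cast<std::size_t>(bankCount), 0);
    for (int t = 0; t < threadCount; ++t) {
        const int bank = (t * strideInWords) % bankCount;  // assumed word-to-bank mapping
        ++hits[static_cast<std::size_t>(bank)];
    }
    return *std::max_element(hits.begin(), hits.end());
}
```

Under this model, a stride of 1 word is conflict-free, a stride equal to the bank count sends every thread to the same bank, and padding that stride by one word removes the conflicts again.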
119. Iteration time
• It’s really hard to know which code will run the fastest.
• The “best” method:
• Write 10 versions of your feature.
• Test them.
• Keep the fastest one.
• A fast iteration time really helps
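The "write several versions and keep the fastest" loop is easy to automate. The following sketch times CPU callables with `std::chrono`; timing real shaders would need GPU timestamps instead, which is outside this example. `FindFastestVariant` is a hypothetical helper, not something from the talk.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <vector>

// Runs every variant `repetitions` times and returns the index of the variant
// with the lowest best-case wall-clock time.
inline std::size_t FindFastestVariant(const std::vector<std::function<void()>>& variants,
                                      int repetitions) {
    assert(!variants.empty() && repetitions > 0);
    using Clock = std::chrono::steady_clock;
    std::size_t fastest = 0;
    Clock::duration bestTime = Clock::duration::max();
    for (std::size_t i = 0; i < variants.size(); ++i) {
        Clock::duration variantBest = Clock::duration::max();
        for (int r = 0; r < repetitions; ++r) {
            const Clock::time_point start = Clock::now();
            variants[i]();
            const Clock::duration elapsed = Clock::now() - start;
            if (elapsed < variantBest) {
                variantBest = elapsed;
            }
        }
        if (variantBest < bestTime) {
            bestTime = variantBest;
            fastest = i;
        }
    }
    return fastest;
}
```

Taking the best of several repetitions reduces the influence of one-off noise; the result is still a measurement, so it can vary between runs and machines.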