How we optimized our Game – Jake & Tess’
Finding Monsters Adventure
Phil Lira
Sr. Staff Engineer (Graphics)
@phi_lira
RELEASE TRAILER
https://www.youtube.com/watch?v=STzdj04n7dc
TECHNICAL CHALLENGES
Technical challenges
Many custom shaders and effects
Technical challenges
Multiple characters with complex skinning
Our budget is the limit
• Push as much content as possible with smooth gameplay and no overheating
– Can we get the same quality
with a similar approach?
– Are we doing something we
don’t need to?
What if we hit our budget?
• What happens when we fail?
– Either gameplay or visual quality will be impacted
• When it comes to removing effects, trust is important
OPTIMIZATION PROCESS
Optimization Process
• Do not make any assumptions.
• A profiler will tell you where the bottleneck is.
Profile → Optimize → Test (highlighted: Profile)
Optimization Process
• Rewrite code to use resources more efficiently
• Often we can fake or simplify effects
• Experience comes into play here.
Profile → Optimize → Test (highlighted: Optimize)
Optimization Process
• Make sure your tests run under the same conditions
• Did your work reduce overall GPU ms?
Profile → Optimize → Test (highlighted: Test)
How to find our bottleneck?
• Unity comes with a built-in profiler
that does most of the work
• We wanted to have more
detailed GPU info
– Adreno Profiler – Snapdragon GPUs
– Mali Graphics Debugger (MGD) and
DS-5 Streamline – Mali GPUs
Adreno GPU Profiler
How to find our bottleneck?
Disable GL calls: did the frame rate increase?
– No: CPU bound
– Yes: GPU bound (Vertex, Fragment, or Memory)
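The decision flow on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Capture` fields, the 1.2x speed-up threshold, and the counter names are hypothetical stand-ins for values read off a profiler, not part of any profiler API.

```python
from dataclasses import dataclass


@dataclass
class Capture:
    """Hypothetical numbers read off a profiler session."""
    fps_normal: float         # frame rate with rendering enabled
    fps_gl_disabled: float    # frame rate with all GL calls overridden off
    pct_time_vertex: float    # assumed counter, 0..100
    pct_time_fragment: float  # assumed counter, 0..100
    pct_memory_stall: float   # assumed counter, 0..100


def classify(c: Capture, speedup_threshold: float = 1.2) -> str:
    """Follow the slide's flow: disable GL, then check whether the frame rate rose."""
    # If removing all GPU work barely helps, the CPU is the limit.
    if c.fps_gl_disabled < c.fps_normal * speedup_threshold:
        return "CPU bound"
    # Otherwise pick the pipeline stage with the largest counter.
    stages = {
        "vertex": c.pct_time_vertex,
        "fragment": c.pct_time_fragment,
        "memory": c.pct_memory_stall,
    }
    worst = max(stages, key=stages.get)
    return f"GPU bound ({worst})"


print(classify(Capture(30.0, 60.0, 10.0, 80.0, 5.0)))
```

The threshold is a judgment call; what counts as a "great" frame rate improvement depends on the game and the device.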
How to find our bottleneck?
• Vertex
– #triangles
– Vertex shader
– Per-vertex lighting
• Fragment
– Fragment Shader (instruc. / sample)
– Blend Ops
– Per-Pixel light (forward rendering)
• Bandwidth
– Large textures
– Dependent Texture Reads
– Block Resolve (ReadPixels)
CASE STUDY – ROYAL MOON
Case Study – Royal Moon
• Triangles 106k
• Drawcalls 87
• Overdraw 2.51x
• Shader Stats:
– Up to 160 ALU/Frag
– Up to 7 texture samples
• Adreno %Time Shading Fragment - max
– Fragment bound
Overdraw Debug
Case Study – Royal Moon
• Early Z-Test Discards occluded fragments
• Render Order Matters
• Optimized Render Order
– Opaques – Front to Back
– Skybox
– Transparent – Back to Front
– Overlay (UI / HUD)
We need to improve this
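As a rough illustration of the render order listed on this slide, the sketch below sorts a list of draws into the four phases, with opaques nearest-first and transparents farthest-first. The `Draw` type, the `kind` labels, and the sample scene are hypothetical; this is not how Unity implements its sorting.

```python
from dataclasses import dataclass


@dataclass
class Draw:
    name: str
    kind: str     # "opaque", "skybox", "transparent" or "overlay"
    depth: float  # distance from the camera


# Phase order taken from the slide.
PHASE = {"opaque": 0, "skybox": 1, "transparent": 2, "overlay": 3}


def render_order(draws):
    """Opaques front to back, then skybox, transparents back to front, overlay last."""
    def key(d):
        # Negating depth sorts transparent draws farthest-first.
        depth_key = -d.depth if d.kind == "transparent" else d.depth
        return (PHASE[d.kind], depth_key)
    return sorted(draws, key=key)


scene = [
    Draw("glass_far", "transparent", 50.0),
    Draw("hud", "overlay", 0.0),
    Draw("island", "opaque", 30.0),
    Draw("sky", "skybox", 1000.0),
    Draw("hero", "opaque", 5.0),
    Draw("glass_near", "transparent", 10.0),
]
print([d.name for d in render_order(scene)])
```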
How to assign objects to sorting layers?
• Per Shader
– Have to duplicate shader files. Hard to maintain because we have to make changes individually to each duplicate.
• Per Mesh
– Not scalable, requires a lot of work.
– Risky! May break batches by mistake.
• Per Material
– YES!
– In that case, do not use the same material in different scenes
• Fixing the sort for one scene might break it for the other.
Custom Material Inspector
• Created an editor script, BRSMaterialEditor, to set Material.renderQueue
• Add CustomEditor "BRSMaterialEditor" to the end of the shader file.
Character and Props
Camera Island Top
Outer Islands
Planets
Skydome
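One way to picture the five opaque layers is as consecutive render queue values, one per material group. The sketch below assumes the layers are placed at Unity's built-in Geometry queue value (2000) plus a small offset; the talk does not state which values were actually used, so the offsets and the `queue_for` helper are hypothetical.

```python
# Unity's built-in queue value for opaque geometry; lower values draw first.
GEOMETRY_QUEUE = 2000

# Layer order as listed in the speaker notes, nearest to farthest.
OPAQUE_LAYERS = [
    "Character and Props",
    "Camera Island Top",
    "Outer Islands",
    "Planets",
    "Skydome",
]


def queue_for(layer: str) -> int:
    """Hypothetical mapping: each layer gets the Geometry queue plus its index."""
    return GEOMETRY_QUEUE + OPAQUE_LAYERS.index(layer)


for layer in OPAQUE_LAYERS:
    print(queue_for(layer), layer)
```

A custom material inspector would then only need to write the chosen value to the material's render queue.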
Before and After Improving Sort
Overdraw reduced from 2.51x to 1.91x
Z-Reject
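To see why draw order changes the amount of Z-reject, here is a toy model of a single pixel covered by several opaque surfaces. It assumes a "less than" depth test applied before shading; real GPUs work on tiles and blocks of pixels, so treat this as a simplification.

```python
def shaded_fragments(depths):
    """Count fragments that pass the depth test for one pixel.

    `depths` is the submit order of the opaque surfaces covering the pixel.
    A fragment is shaded only if it is nearer than everything drawn so far.
    """
    nearest = float("inf")
    shaded = 0
    for z in depths:
        if z < nearest:
            shaded += 1
            nearest = z
    return shaded


front_to_back = [1.0, 2.0, 3.0, 4.0]
back_to_front = [4.0, 3.0, 2.0, 1.0]
print(shaded_fragments(front_to_back), shaded_fragments(back_to_front))
```

With four stacked surfaces, front-to-back submission shades one fragment while back-to-front shades all four, which is the overdraw the sorting layers aim to avoid.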
FRAGMENT SHADER
Shader hotzone (% time shading)
Shader hotzone (ALU per frag)
How to optimize fragment shader
• Improving Shader Instructions
– Model: ops that can be done once per draw call
    • Use scripts to compute and pass values to the shader
    • Input vector normalization (e.g. rim light)
    • Scroll offset
– Vertex: ops that can be done per vertex
    • Uniform texture tile & offset
– Fragment: ops that need to be done per pixel
    • Equation simplification
    • Half & fixed precision for better thermals
    • saturate vs max(0.0, dot)
[Diagram: Model, Vertex, and Fragment levels along a COMPLEXITY axis]
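The speaker notes give an example of a "Model" level operation: a monster's inner light flickers using a sum of sines, and the intensity is computed in a script and passed to the shader as a uniform. A minimal sketch of that idea follows; the amplitudes, frequencies, phases, and the `_LightIntensity` uniform name are all hypothetical.

```python
import math

# Hypothetical (amplitude, frequency in Hz, phase) terms for the flicker.
TERMS = [(0.5, 1.0, 0.0), (0.25, 3.0, 1.3), (0.125, 7.0, 2.1)]


def flicker_intensity(t: float, base: float = 1.0, terms=TERMS) -> float:
    """Sum of sines around a base intensity, evaluated at time t (seconds)."""
    return base + sum(a * math.sin(2.0 * math.pi * f * t + p) for a, f, p in terms)


def frame_uniforms(t: float) -> dict:
    """Evaluated once per frame on the CPU.

    Every vertex and fragment of the draw call then reads the same value,
    instead of each fragment re-evaluating the sines.
    """
    return {"_LightIntensity": flicker_intensity(t)}


print(frame_uniforms(0.25))
```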
Optimizing Shaders
• Many custom shaders done in ShaderForge
– ShaderForge does heavy work on fragment
• Many variants and not exactly the same code
structure
• How to optimize them all?
– 1st pass optimizing in ShaderForge
– 2nd pass optimizing in Code
1st Pass: ShaderForge
• Identify core changes to lighting model
– BlinnPhongWrapped
– BlinnPhongRamp
• Created custom code node
– Artists helped replace the existing nodes with this code
– This made shader code common and more organized
1st Pass: ShaderForge
Custom Lightmap in ShaderForge
• One major complaint from art was the lack of support for lightmaps in custom lighting
• Created a Lightmap node for them
• Problem 1: Need to enable lightmap in the shader's config header.
• Problem 2: ShaderForge does not expose interpolated data.
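The speaker notes say the lightmap tile/offset was first reapplied in the fragment stage and later moved to the vertex stage. That move is safe because tile/offset is an affine transform, so transforming before or after linear interpolation gives the same result. The sketch below checks this numerically; the function names and sample values are made up for illustration, and perspective correction is ignored.

```python
import math


def apply_tile_offset(uv, st):
    """uv * tile + offset, with st = (tile_x, tile_y, offset_x, offset_y)."""
    u, v = uv
    tile_x, tile_y, off_x, off_y = st
    return (u * tile_x + off_x, v * tile_y + off_y)


def lerp2(a, b, t):
    """Linear interpolation between two 2D points."""
    return (a[0] + (b[0] - a[0]) * t, a[1] + (b[1] - a[1]) * t)


def interpolation_commutes(uv_a, uv_b, st, t):
    """Compare per-fragment transform with per-vertex transform."""
    per_fragment = apply_tile_offset(lerp2(uv_a, uv_b, t), st)
    per_vertex = lerp2(apply_tile_offset(uv_a, st), apply_tile_offset(uv_b, st), t)
    return all(math.isclose(x, y, abs_tol=1e-9) for x, y in zip(per_fragment, per_vertex))


print(interpolation_commutes((0.0, 0.0), (1.0, 1.0), (2.0, 2.0, 0.5, 0.25), 0.3))
```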
2nd Pass: Shader Code
Created a cginc file with macros for optimized code
• ShaderForge follows a naming convention for input data
The results - Ground Shader
[Images: before optimization / after optimization]
• Avg ALU/Frag – ~21% reduction
• Fragments Shaded – ~45% reduction
• Fragment Instructions – ~64% reduction
Overall Improvement: ~7ms
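The speaker notes give the underlying ALU numbers for the ground shader: 90.50 ALU/Frag before and 71 after. The short calculation below shows how those figures relate to the "~21% reduction" on the slide; the helper name is just for illustration.

```python
def relative_reduction(before: float, after: float) -> float:
    """Fraction saved relative to the starting value."""
    return (before - after) / before


alu_saved = relative_reduction(90.50, 71.0)
print(f"Avg ALU/Frag reduction: {alu_saved:.1%}")  # 21.5%
```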
Further Improvements
• Fallback Shader
– We came across some problems
with shaders not being supported
for some configurations
– Vertex animation with a noise texture (tex2Dlod) is not supported on OpenGL ES 2.0 profiles
– A fallback shader that stands out in those cases
– Makes it easy to differentiate from
other errors
ASTC TEXTURE COMPRESSION
ASTC
• Optimal performance with high quality
• Improves bandwidth and power consumption
• Galaxy Note 4, Galaxy S6 and above support it
• Supported with Unity's OpenGL ES 3 profile
ASTC
[Image comparison: ASTC 4x4, ASTC 6x6, ETC2]
ASTC
Recommended Settings:
Format                 RGB        RGBA       Normal Map
Codec                  ASTC 6x6   ASTC 4x4   ASTC 4x4
BPP                    3.56       8          8
Size vs Uncompressed   14.8%      50%        50%
Size vs ETC2           89%        100%       100%
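The bits-per-pixel figures in the table follow from the fact that every ASTC block occupies 128 bits regardless of its footprint. The sketch below reproduces the RGB column, assuming the baselines are 24 bpp for uncompressed RGB and 4 bpp for ETC2 RGB; the slide does not state which baselines were used, so those two values are assumptions.

```python
ASTC_BLOCK_BITS = 128  # fixed size of one ASTC block


def astc_bpp(block_w: int, block_h: int) -> float:
    """Bits per pixel for an ASTC block footprint of block_w x block_h texels."""
    return ASTC_BLOCK_BITS / (block_w * block_h)


def size_ratio(bpp: float, baseline_bpp: float) -> float:
    """Compressed size as a fraction of a baseline format."""
    return bpp / baseline_bpp


rgb = astc_bpp(6, 6)
print(f"ASTC 6x6: {rgb:.2f} bpp")
print(f"vs uncompressed RGB (assumed 24 bpp): {size_ratio(rgb, 24):.1%}")
print(f"vs ETC2 RGB (assumed 4 bpp): {size_ratio(rgb, 4):.1%}")
print(f"ASTC 4x4: {astc_bpp(4, 4):.0f} bpp")
```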
Review
• Do not make assumptions, use a profiler.
• GPU profilers will give you in-depth data per
drawcall
• One can assign objects to sorting layers at material
level for best workflow
• Reduce the work needed to optimize shaders by creating ways to reuse optimized code.
• ASTC texture compression is the best option available for quality, but it is only supported on a few devices.
Phil Lira
f.lira@samsung.com
@phi_lira
Q&A
CONTACTS
www.blackriverstudios.net
@BlackRvrStudios
/blackrivergames
THANKS!


Editor's Notes

  • #3: We will play the Finding Monsters release trailer here.
  • #9: Optimization allows us to push more content at higher frame rates. We want to push as much content as possible without impacting gameplay. On mobile we are also concerned with overheating and battery life. Optimizing for thermals gives players more gameplay time. While optimizing, we frequently ask ourselves the following:
      – Can we get the same quality with a similar effect? For instance, if you want to take a screenshot, a RenderTexture is faster in most cases than doing a ReadPixels. Or sometimes we can make some simplifications in the shader to achieve a similar effect. However, there's no free lunch. I often go to the technical artists and say: "Hey, we can achieve a very similar effect, but we'll have to change some material properties and/or maps."
      – Are we doing something we don't need to? For instance, creating and destroying game objects when you could be pre-allocating and caching them.
  • #10: If we fail at further optimizing our game and still consume more resources than the GPU can offer, either gameplay (lower frame rates) or visual quality will be impacted. Usually we favor smooth gameplay over visual quality and end up removing effects. When it comes to that, trust plays an important role. At Black River Studios we built a team upon trust. We look out for each other. I know the effort, dedication and passion the art team puts into our games, and I do my best to optimize them. When it comes to the point where I say we must make adjustments that will impact visuals, they know that we really must.
  • #12: We need to find what is responsible for consuming those precious milliseconds of your game. Engineers often make assumptions about what might be slowing down the game, and of course those assumptions get better with experience. However, one golden rule of optimization is to never assume anything. Sometimes the culprit is something that looks fairly simple, like a blob shadow. Use a profiler to tell where your bottleneck is. If you're optimizing something that's not your bottleneck, then you're wasting time.
  • #13: Once you find your bottleneck, it's time to get hands-on and optimize the hot zone. Experience plays an important role here and will give you a hint of what to do.
  • #14: Finally, we want to test whether we actually got some improvement. It is very important that the test scenario has exactly the same conditions as the scenario we profiled, or you might get wrong results. That can sometimes be a little tricky, though. At the end, we see how many ms we saved and repeat the whole cycle.
  • #15: How to find the bottleneck? Profilers will timestamp your game to tell what the hot zones are. Unity comes with a built-in profiler that can do most of the work. However, we want more detailed information on what's going on in the GPU. We used GPU profilers for that. They come with specific counters that easily tell you the graphics pipeline hot zones, and they even allow you to replace a few resources while running to speed up your tests.
      – Adreno Profiler is an all-in-one solution to profile Qualcomm's Snapdragon GPUs.
      – Mali Graphics Debugger and DS-5 Streamline are tools provided by ARM to debug and profile Mali GPUs.
  • #16: Throughout this talk we will show how we profiled our game using Adreno GPU Profiler.
  • #17: Our optimization workflow goes like this: we fire up Adreno Profiler, which has an override to disable all OpenGL calls submitted to the GPU. We disable the OpenGL calls and check whether that greatly improves fps.
      – No: we're CPU bound. Go for the Unity profiler. (You might get Render data there; in that case you have too much driver overhead.)
      – Yes: we're GPU bound. Adreno also has many counters to tell which stage of the pipeline is stalled:
          • % time vertex (draw calls and triangles, index vs triangle ratio, vertex shader)
          • % time fragment (fragment shader instructions, blend, overdraw, texture sampling & filtering)
          • memory stalls (blocking resolves, texture bandwidth)
  • #18: One can break down the graphics pipeline into 3 macro stages: Vertex, Fragment, and Bandwidth.
      Vertex bound:
      – Improve index locality for better cache use. (Unity does this for you if you toggle Optimize Mesh in the import settings.)
      – Use as few vertex attributes as possible (normals, color, tangent, etc.). Each additional attribute might split your vertices.
      – Decrease the number of triangles sent to the GPU by performing frustum and occlusion culling, and by using mesh LODs and impostors to render distant meshes.
      – Simplify vertex shaders: per-vertex lights, GPU skinning, vertex offset.
      Fragment bound:
      – Simplify the fragment shader: number of instructions and texture samples, dependent texture reads, blending.
      – Decrease the number of per-pixel lights (forward rendering).
      Bandwidth:
      – Use compression and mipmaps.
      – Avoid operations that stall the GPU (blocking resolves), ReadPixels for instance.
  • #19: Royal Moon is one of the stages in our game. We'll show a few techniques we used to optimize it.
  • #20: This is the breakdown of our scene. We’re clearly Fragment Bound.
  • #21: Here's an overdraw debug view captured with Adreno Profiler. Brighter pixels are hot zones and show how many times they have been written. For opaque meshes, every time we redraw a pixel we're wasting time. We need to sort our scene for optimal performance. This image shows that we can do fewer fragment operations by reducing the overdraw ratio.
  • #22: Whenever you process a fragment in the fragment shader, the GPU already knows its depth (z value). The GPU can then test whether the current fragment is occluded by a previously computed one and discard it, as it will have no effect on the final image. That is called the depth test. Thus, the order in which you render your objects matters for performance, as you can try to maximize the number of fragments that get discarded. The best way to render your scene is:
      1. Render opaque objects from front to back.
      2. Render the skybox.
      3. Render transparent objects from back to front. The reason is that, to blend correctly, alpha objects must have the pixels behind them already computed.
      4. Render overlays (HUD/UI).
    To improve overdraw we need to improve the render order of our opaque objects. Unity already does this for you based on the game object pivot. However, there are some special cases where that doesn't work (as we can see in our image). What we can do is group objects into different sorting layers to improve those specific cases.
  • #23: We have a few options when it comes to assigning objects to different sorting layers.
      – Per shader: in Unity you can set a RenderQueue in the ShaderLab file. The problem is that you have to duplicate a shader file just to assign it to a different sorting layer. That increases shader compilation and warm-up time and is not easily managed. Plus, when a change is required in the shader, we have to propagate it to all variants manually.
      – Per mesh: this is not scalable and requires a lot of work tweaking per-mesh settings. It is also risky, as assigning objects to different layers will break batches and one might do it by mistake.
      – Per material: this seems a balanced approach. It's easy to group and create materials. One can create per-scene materials to make sure the work done in one scene doesn't affect another.
  • #24: Unity allows you to extend the material inspector by creating a custom MaterialEditor script. We created one, called BRSMaterialEditor, that exposes the render order and layer so they are easy to tweak. To use it, one just needs to add the following line to the end of the ShaderLab file: CustomEditor "BRSMaterialEditor"
  • #25: We ended up having five opaque render layers for this scene: 1) Character and Props, 2) the island top that the camera is on, 3) Outer Islands, 4) Planets, 5) Skydome.
  • #30: This is a comparison of before and after improving the sort. You can see that characters now have much better overdraw and the bottom islands don't appear anymore. Note: the ground appears darker in the first image because I captured that frame without rendering shadows by mistake (shadows add an additional render pass for the ground).
  • #31: This is a highlight of depth test discards. You can think of this image as a negative of the previous one, where the more red, the better. You can see that characters and planets now have many more discards too.
  • #33: Adreno Profiler provides a nice and fast way to see your fragment shader hot zones. You can query per-draw-call stats like Fragment Instructions, Textures / Fragment and Math Ops / Frag, and Adreno will colorize each one of them. This picture sorts draw calls by the percentage of time spent shading fragments, which is our bottleneck. This counter takes into account the fragments shaded multiplied by the complexity of shading them. The picture shows that the ground is the draw call that spends the most time shading fragments, so it is a good candidate for improvement.
  • #34: Here's another interesting shader we can look at: characters. They have the most ALU/frag and textures/frag in the scene. So why isn't this the draw call that spends the most time shading fragments? Simply because its fragments shaded are about 1/3 of the ground's. Remember that the ground was rendered before the characters until we optimized for overdraw.
  • #35: One thing to keep in mind when optimizing fragment shaders is to do as few operations as possible in them. If there's something we can do at the vertex or even at the model level, that is best. For instance, one of our monsters has a point light inside of him. This light flickers by adjusting the light intensity using a Fourier sum of sines. We don't need to compute this light intensity per fragment or per vertex. We do it at script level and then pass the light intensity to the shader as a uniform. Another example: if we know that all of our textures that use uv0 have the same tile & offset, we can apply it in the vertex stage instead of the fragment stage. This not only saves a few instructions in the fragment shader but also makes texture sampling easier for the GPU. At the fragment level we can also do some micro-optimizations.
  • #36: One of the challenges we faced in optimizing the shaders of this game was that most shaders were authored by tech artists using a visual node tool called ShaderForge. Although ShaderForge is a nice tool to create and prototype shaders, it does heavy work in the fragment stage, which is far from ideal in our case. Also, because the shaders are written with a visual node tool, there are frequently tons of mini variations that don't produce the same code. At this point we came to the question of how to optimize all these shaders. We did it in 2 passes: a first pass in ShaderForge, and a second pass in the shader code to optimize for things ShaderForge doesn't account for.
  • #37: In the first pass we identified all the core lighting model functions. Most of the shaders were using variations of BlinnPhong with diffuse wrap and ramps. There were some other variations outside the core of the lighting, like rim lights and custom fog. ShaderForge allows one to create code nodes, so we created a few code nodes to implement these core lighting functions uniformly and, with the help of the artists, replicated them in the shaders we had. We also came up with a solution to save and load these code nodes. If we ever needed to change these core code nodes, we could just change them once and replicate the change to the other shaders, as opposed to making changes individually to each one. This made the shader code more uniform and easier to work on later.
  • #38: This is an example of a code node we wrote. As you can see, ambient color is not applied in it, because not all of our shaders use it.
  • #39: While optimizing the shaders, one major complaint from art was the lack of lightmap support for custom lighting shaders in ShaderForge. I sat down with our artists and we discussed how they wanted it to be implemented. We came up with a code node solution that required minimal changes. We found out the following: although ShaderForge doesn't support lightmaps with custom lighting, one can open the shader file and change the variable lmpd:False to lmpd:True. ShaderForge does not rewrite the shader header when the shader gets compiled, so we only needed to do this once per new shader. Another problem we found was that a code node has no way to access the interpolated fragment shader input data. We had to pass it in as an input through a Node property and reapply tile/offset to the lightmap uv in the fragment shader. Later, when we optimized in shader code, we moved this to the vertex shader.
  • #40: In the second pass we optimized the shaders in code. First we created a cginc file to hold all of our functions and macros. Because the code is uniform, i.e. all vertex and fragment shaders follow the same naming conventions and the core functions all have the same parameter names, we can easily swap the ShaderForge generated code for our optimized version by replacing it with macros. In the picture you can see that the macros and functions we made keep the shader code really clean and lean. Plus, if we come up with an improvement, it is replicated to all shaders. You can also see the custom material editor and our error fallback shader, which I will discuss later in this presentation.
  • #41: Here's a rough comparison of the results we got for the ground shader. We went from 90.50 ALU/Frag down to 71. The fragments shaded were reduced by 45%, for a total reduction of 64% in fragment instructions. Considering all the shaders optimized for this scene, the total improvement was roughly 7ms.
  • #42: As a further improvement to the shaders we came up with a fallback error shader. We ran into a few errors in the shaders. Some of them were related to features not supported in OpenGL ES 2.0 profiles, like doing a vertex offset by sampling a texture in the vertex shader (with tex2Dlod). In those cases, the shader was falling back to plain diffuse, which was hard to spot right away. We then created a fallback error shader that stands out clearly when a shader is not supported in our current configuration. That also makes it easy to tell apart from other shader problems.
  • #44: Texture compression is important to reduce bandwidth and power consumption. Block-based texture compression formats like ASTC, ETCn, DXTn, ATC and PVRTC are straightforward for GPUs, which don't need to decompress the texture before reading it. However, these algorithms lose information when compressing the texture (they are called lossy compression). ASTC is a texture compression format developed by ARM that has the advantage of block texture compression speed without losing much of the texture quality. At the moment, Samsung's Galaxy Note 4, Galaxy S6 and above support it. It is supported in Unity's OpenGL ES 3 profile and on devices running Android Lollipop.
  • #45: Unlike other texture compression formats, ASTC supports different block compression configurations, allowing one to tweak the tradeoff between performance and quality. Here's a comparison of ASTC4x4, ASTC6x6 and ETC2. It is important to notice that even though ASTC 6x6 has a larger block size than ETC2, its compression quality is still much better.
  • #46: This table shows the texture configuration we have for our most common assets. It's also interesting to notice that ASTC4x4 gives the same size as ETC2, but with greater quality.
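The size relationship between the formats follows from the block layouts, and can be sketched in a few lines of Python. Every ASTC block takes 128 bits whatever its footprint; ETC2 uses 4x4 blocks of 64 bits (RGB) or 128 bits (RGBA8). The 1024x1024 texture size and the helper names are assumptions for illustration, mip levels are ignored, and which ETC2 variant the table refers to is not stated in the talk (the "same size" remark matches the RGBA8 variant).

```python
import math

ASTC_BLOCK_BITS = 128  # every ASTC block is 128 bits, whatever its footprint


def astc_bits_per_pixel(block_w, block_h):
    """Bits per pixel for an ASTC block footprint such as 4x4 or 6x6."""
    return ASTC_BLOCK_BITS / (block_w * block_h)


def compressed_size_bytes(width, height, block_w, block_h, block_bits):
    """Size of one mip level; partial blocks at the edges still cost a full block."""
    blocks_x = math.ceil(width / block_w)
    blocks_y = math.ceil(height / block_h)
    return blocks_x * blocks_y * block_bits // 8


# Hypothetical 1024x1024 texture, top mip level only.
size_astc_4x4 = compressed_size_bytes(1024, 1024, 4, 4, ASTC_BLOCK_BITS)
size_astc_6x6 = compressed_size_bytes(1024, 1024, 6, 6, ASTC_BLOCK_BITS)
size_etc2_rgba = compressed_size_bytes(1024, 1024, 4, 4, 128)
size_etc2_rgb = compressed_size_bytes(1024, 1024, 4, 4, 64)
```

Under these assumptions ASTC 4x4 and ETC2 RGBA8 both come out at 8 bits per pixel, while ASTC 6x6 drops to about 3.56 bits per pixel, a little under ETC2 RGB at 4.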