Investigating the Effects of
Overcommitting YARN Resources
Jason Lowe
jlowe@yahoo-inc.com
Problem: Underutilized Cluster Resources
Optimize The Jobs!
● Internal Downsizer tool quantifies job waste
● Application framework limitations
● Optimally tuned container can still have opportunities
[Chart: container utilization over time, highlighting the underutilized resources]
What about Static Overcommit?
● Configure YARN to advertise more memory than the node physically provides (sketched below)
● Tried with some success
● Performs very poorly when node fully utilized
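For concreteness, a minimal sketch of what static overcommit amounts to, using the standard yarn.nodemanager.resource.memory-mb property; the node size and the 1.25 factor are example values, not recommendations from the talk:

import org.apache.hadoop.conf.Configuration;

public class StaticOvercommitSketch {
    public static void main(String[] args) {
        // Physical memory actually present on the node (example value).
        int physicalMb = 128 * 1024;
        // Static overcommit: advertise a fixed, larger amount to the ResourceManager.
        int advertisedMb = (int) (physicalMb * 1.25);

        Configuration conf = new Configuration();
        conf.setInt("yarn.nodemanager.resource.memory-mb", advertisedMb);

        // The RM now schedules against 160 GB on a 128 GB node regardless of
        // actual utilization, which is why this degrades badly once the node
        // is genuinely full.
        System.out.println("Advertised node memory: "
            + conf.getInt("yarn.nodemanager.resource.memory-mb", physicalMb) + " MB");
    }
}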
Overcommit Prototype Design Goals
● No changes to applications
● Minimize changes to YARN protocols
● Minimize changes to scheduler internals
● Overcommit on memory only
● Conservative growth
● Rapid correction
Overcommit Overview
● NodeManager
○ Reports node utilization to the ResourceManager in its heartbeat (sketched below)
○ Unaware of the overcommit amount
○ Performs self-preservation preemption
● ResourceManager
○ Adjusts its internal view of the node size
○ Assigns containers based on the new size
● Application Masters run unmodified (no application changes)
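As a rough illustration of the utilization report carried in the heartbeat, the sketch below uses YARN's ResourceUtilization record; the reportNodeUtilization helper and the RM-side arithmetic are assumptions for illustration, not the prototype's actual plumbing:

import org.apache.hadoop.yarn.api.records.ResourceUtilization;

public class UtilizationReportSketch {
    // Hypothetical helper: what the NodeManager would piggyback on its
    // heartbeat. The real NodeStatus plumbing is omitted for brevity.
    static ResourceUtilization reportNodeUtilization(int usedPmemMb,
                                                     int usedVmemMb,
                                                     float cpuFraction) {
        return ResourceUtilization.newInstance(usedPmemMb, usedVmemMb, cpuFraction);
    }

    public static void main(String[] args) {
        ResourceUtilization report = reportNodeUtilization(96 * 1024, 120 * 1024, 0.55f);
        int physicalNodeMb = 128 * 1024;
        // RM-side view: how full is the node really?
        double memUtilization = (double) report.getPhysicalMemory() / physicalNodeMb;
        System.out.printf("Node memory utilization: %.0f%%%n", memUtilization * 100);
    }
}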
ResourceManager Node Scaling
[Charts: Node Memory and Node Utilization vs. Time, with lines for Allocated Node Mem, Total Node Mem, and Original Node Mem across the No Overcommit, Reduced Overcommit, and Full Overcommit regions]
ResourceManager Overcommit Tunables
Parameter              | Description                                                                  | Value
memory.max-factor      | Maximum amount a node will be overcommitted                                  | 1.5
memory.low-water-mark  | Maximum overcommit below this node utilization                               | 0.6
memory.high-water-mark | No overcommit above this node utilization                                    | 0.8
memory.increment-mb    | Maximum increment above node allocation                                      | 16384
increment-period-ms    | Delay between overcommit increments if node container state does not change | 0
Parameters use the yarn.resourcemanager.scheduler.overcommit. prefix
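A minimal sketch of how these tunables might combine into a target node size, following the behavior shown on the scaling slide; the linear taper between the water marks and the names used here are one reading of the slides, not the prototype's scheduler code:

public class OvercommitTargetSketch {
    // Values from the table above (yarn.resourcemanager.scheduler.overcommit.*).
    static final double MAX_FACTOR = 1.5;
    static final double LOW_WATER_MARK = 0.6;
    static final double HIGH_WATER_MARK = 0.8;
    static final int INCREMENT_MB = 16384;

    // Target total node memory given the node's original size, the memory
    // currently allocated to containers on it, and its measured memory
    // utilization (0..1). increment-period-ms (throttling of growth steps)
    // is not modeled here.
    static int targetNodeMemoryMb(int originalMb, int allocatedMb, double utilization) {
        int ceiling;
        if (utilization >= HIGH_WATER_MARK) {
            // No overcommit above the high water mark.
            ceiling = originalMb;
        } else if (utilization <= LOW_WATER_MARK) {
            // Maximum overcommit below the low water mark.
            ceiling = (int) (originalMb * MAX_FACTOR);
        } else {
            // Reduced overcommit in between: taper linearly toward zero.
            double t = (HIGH_WATER_MARK - utilization) / (HIGH_WATER_MARK - LOW_WATER_MARK);
            ceiling = originalMb + (int) (originalMb * (MAX_FACTOR - 1.0) * t);
        }
        // increment-mb: never advertise more than this much beyond what is
        // already allocated, but never shrink below the original node size.
        return Math.max(originalMb, Math.min(ceiling, allocatedMb + INCREMENT_MB));
    }

    public static void main(String[] args) {
        int original = 128 * 1024;
        // Lightly used, fully allocated node: grows, capped 16 GB above allocation.
        System.out.println(targetNodeMemoryMb(original, original, 0.50) + " MB");
        // Heavily used node: rapid correction back to the original size.
        System.out.println(targetNodeMemoryMb(original, original, 0.85) + " MB");
    }
}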
NodeManager Self-Preservation Preemption
[Gauge: node utilization from 0% to 100%, marking the high and low water marks]
● Utilization above high mark triggers preemption
● Preempts enough to reach low mark utilization
● Does not preempt containers below original node size
● Containers preempted in group order
○ Tasks from preemptable queue
○ ApplicationMasters from preemptable queue
○ Tasks from non-preemptable queue
○ ApplicationMasters from non-preemptable queue
● Youngest containers preempted first within a group (ordering sketched below)
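A compact sketch of the ordering above as a Java Comparator over a simplified container record; ContainerInfo and its fields are illustrative stand-ins, not the NodeManager's actual data structures:

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class PreemptionOrderSketch {
    // Illustrative stand-in for what the NM knows about a running container.
    static class ContainerInfo {
        final boolean isAppMaster;
        final boolean fromPreemptableQueue;
        final long startTimeMs;
        ContainerInfo(boolean isAppMaster, boolean fromPreemptableQueue, long startTimeMs) {
            this.isAppMaster = isAppMaster;
            this.fromPreemptableQueue = fromPreemptableQueue;
            this.startTimeMs = startTimeMs;
        }
        // Group order from the slide: preemptable tasks, preemptable AMs,
        // non-preemptable tasks, non-preemptable AMs.
        int group() {
            if (fromPreemptableQueue) {
                return isAppMaster ? 1 : 0;
            }
            return isAppMaster ? 3 : 2;
        }
    }

    // Lower positions are preempted first: group order, then youngest first
    // (latest start time) within a group.
    static final Comparator<ContainerInfo> PREEMPTION_ORDER =
        Comparator.comparingInt(ContainerInfo::group)
                  .thenComparing(Comparator.comparingLong(
                      (ContainerInfo c) -> c.startTimeMs).reversed());

    public static void main(String[] args) {
        List<ContainerInfo> containers = new ArrayList<>(Arrays.asList(
            new ContainerInfo(false, true, 1_000),    // old preemptable task
            new ContainerInfo(false, true, 9_000),    // young preemptable task: first victim
            new ContainerInfo(true, true, 2_000),     // preemptable AM
            new ContainerInfo(false, false, 5_000))); // non-preemptable task
        containers.sort(PREEMPTION_ORDER);
        containers.forEach(c ->
            System.out.println("group " + c.group() + ", started " + c.startTimeMs));
    }
}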
NodeManager Overcommit Tunables
Parameter              | Description                             | Value
memory.high-water-mark | Preemption when above this utilization  | 0.95
memory.low-water-mark  | Target utilization after preemption     | 0.92
Parameters use the yarn.nodemanager.resource-monitor.overcommit. prefix
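To make the thresholds concrete, a small sketch of the check the NodeManager's resource monitor could run each interval, using the 0.95/0.92 values above; the class and method names are illustrative, not the prototype's monitor code:

public class SelfPreservationSketch {
    // Values from the table above (yarn.nodemanager.resource-monitor.overcommit.*).
    static final double HIGH_WATER_MARK = 0.95;
    static final double LOW_WATER_MARK = 0.92;

    // Memory (MB) to reclaim by preempting containers, given how much physical
    // memory the node has and how much is currently in use. Returns 0 when
    // utilization is at or below the high water mark.
    static long memoryToReclaimMb(long physicalMb, long usedMb) {
        double utilization = (double) usedMb / physicalMb;
        if (utilization <= HIGH_WATER_MARK) {
            return 0;
        }
        // Preempt just enough to bring the node back to the low water mark.
        return usedMb - (long) (physicalMb * LOW_WATER_MARK);
    }

    public static void main(String[] args) {
        long physical = 128 * 1024;           // 128 GB node
        long used = (long) (physical * 0.97); // 97% utilized: above the high mark
        System.out.println(memoryToReclaimMb(physical, used) + " MB to preempt");
    }
}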
Results
Results: Capacity Gained vs. Work Lost
Lessons Learned
● Significant overcommit achievable on real workloads
● Far less preemption than expected
● Container reservations can drive overcommit growth
● Coordinated reducers can be a problem
● Cluster totals over time can be a bit confusing at first
Future Work
● YARN-5202
● Only grows the cluster as a whole, not individual queues
● Some nodes can overcommit while others sit relatively idle
● CPU overcommit
● Predict growth based on past behavior
● Relinquish nodes during quiet periods
● Integration with YARN-1011
YARN-1011
● Explicit GUARANTEED vs. OPPORTUNISTIC distinction
● Promotion of containers once resources are available (sketched below)
● SLA guarantees along with best-effort load
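As a rough illustration of the YARN-1011 model, this sketch uses YARN's real ExecutionType enum, but the container record and promotion check around it are hypothetical and not the YARN-1011 design itself:

import org.apache.hadoop.yarn.api.records.ExecutionType;

public class ExecutionTypeSketch {
    // Hypothetical container record; YARN-1011's actual bookkeeping differs.
    static class RunningContainer {
        ExecutionType type;
        final int memoryMb;
        RunningContainer(ExecutionType type, int memoryMb) {
            this.type = type;
            this.memoryMb = memoryMb;
        }
    }

    // Promote an OPPORTUNISTIC container once guaranteed resources free up,
    // so best-effort load can coexist with SLA-backed GUARANTEED containers.
    static boolean tryPromote(RunningContainer c, int freeGuaranteedMb) {
        if (c.type == ExecutionType.OPPORTUNISTIC && freeGuaranteedMb >= c.memoryMb) {
            c.type = ExecutionType.GUARANTEED;
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        RunningContainer c = new RunningContainer(ExecutionType.OPPORTUNISTIC, 4096);
        System.out.println("Promoted: " + tryPromote(c, 8192) + ", now " + c.type);
    }
}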
Acknowledgements
● Nathan Roberts for co-developing overcommit POC
● Inigo Goiri for nodemanager utilization collection and reporting
● Giovanni Matteo Fumarola for nodemanager AM container detection
● YARN-1011 contributors for helping to shape the long-term solution
Questions?
Jason Lowe
jlowe@yahoo-inc.com

Editor's Notes

  • Slide 13 (Lessons Learned): 3.3 million GB-hours of capacity gained and only 502 GB-hours of work lost. Less than 0.01%.