Gossip-based resource allocation with performance and energy-savings objectives for large clouds

Rerngvit Yanggratoke, Fetahi Wuhib, Rolf Stadler
LCN Seminar, KTH Royal Institute of Technology
April 7, 2011
Motivation

“Datacenters in the US consumed 1.5% of total US power consumption, resulting in an energy cost of $4.5 billion.”
–U.S. Environmental Protection Agency, 2007.

“The per-server power cost of a datacenter, over its lifetime, is now more than the cost of the server itself.”
–Christian L. Belady, 2007.
Server Consolidation

Minimize the number of active servers; idle servers can be shut down.
Why?
The average utilization level of servers in datacenters is just 15% (EPA, 2007).
An idle server typically consumes at least 60% of its full-load power (VMware DPM, 2010).

VMware Distributed Power Management (DPM).
Existing works

Products: VMware DPM, Ubuntu Enterprise Cloud Power Management.
Research: (G. Jung et al., 2010), (V. Petrucci et al., 2010), (C. Subramanian et al., 2010), (M. Cardosa et al., 2010), ...
All of them are based on centralized solutions:
Bottleneck.
Single point of failure.
Design goals and design principles
Design goals
Server consolidation in case of underload.
Fair resource allocation in case of overload.
Dynamic adaptation to changes in load patterns.
Scalable operation.
Design principles
A distributed middleware architecture.
Distributed protocols: epidemic or gossip-based algorithms.
Generic protocol for resource management (GRMP).
Instantiation for solving the goals above (GRMP-Q).
The problem setting

The cloud service provider operates
the physical infrastructure.
The cloud hosts sites belonging to
its clients.
Users access sites through the
Internet.
A site is composed of modules.
Our focus: allocating CPU and
memory resources to sites.

The stakeholders.
Middleware architecture

Key components: machine
manager and site manager.

The middleware runs on all machines in the cloud.
Middleware architecture (Cont.)

Standby: ACPI G2 (Soft-off).
Activate: wake-on-LAN (WoL) packet.
The machine pool service.
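As a side note, a standby machine can be activated with a standard wake-on-LAN magic packet: 6 bytes of 0xFF followed by the target MAC address repeated 16 times, sent over UDP broadcast. A minimal Python sketch, with a placeholder MAC address that is not from the slides:

```python
import socket

def send_wol(mac: str, broadcast: str = "255.255.255.255", port: int = 9) -> None:
    """Send a wake-on-LAN magic packet: 6 bytes of 0xFF followed by the
    target MAC address repeated 16 times, broadcast over UDP."""
    mac_bytes = bytes.fromhex(mac.replace(":", "").replace("-", ""))
    packet = b"\xff" * 6 + mac_bytes * 16
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(packet, (broadcast, port))

# Example (placeholder MAC address):
# send_wol("00:11:22:33:44:55")
```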
Modeling resource allocation

Demand and capacity
M , N : set of modules and machines (servers) respectively.
ωm (t), γm : CPU and memory demands of module m ∈ M .
Ω, Γ: CPU and memory capacity of a machine in the cloud.
Resource allocation
ωn,m (t) = αn,m (t)ωm (t): demand of module m on machine n.
A(t) = (αn,m(t))n,m: the configuration matrix.
Machine n allocates ω̂n,m(t) CPU and γm memory to module m.
ω̂n,m(t) = Ω ωn,m(t) / Σi ωn,i(t): local resource allocation policy; Ω̂ denotes the matrix (ω̂n,m(t))n,m of allocated CPU.
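To make the local policy concrete, here is a minimal Python sketch (function and variable names are illustrative, not from the slides): a machine divides its CPU capacity Ω among its modules in proportion to their demands, so each module receives its demand scaled by Ω / Σi ωn,i.

```python
def local_cpu_allocation(demands: dict[str, float], capacity: float) -> dict[str, float]:
    """Local resource allocation policy on one machine: CPU capacity (Omega) is
    divided among the modules on the machine in proportion to their demands."""
    total = sum(demands.values())
    if total == 0:
        return {m: 0.0 for m in demands}
    return {m: capacity * w / total for m, w in demands.items()}

# Example with hypothetical numbers: 4.0 GHz capacity, 6.0 GHz total demand,
# so every module receives 2/3 of its demand.
print(local_cpu_allocation({"m1": 3.0, "m2": 2.0, "m3": 1.0}, capacity=4.0))
```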
Utility and power consumption

Utility
un,m(t) = ω̂n,m(t) / ωn,m(t): utility of module m on machine n.
u(s, t) = min over machines n and modules m ∈ Ms of un,m(t): site utility.
U^c(t) = min over sites s with u(s, t) ≤ 1 of u(s, t): cloud utility.

Power consumption
Assuming homogeneous machines:
Pn(t) = 0 if rown(A(t)) · 1 = 0, and Pn(t) = 1 otherwise.
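A small Python sketch of these definitions (helper names are illustrative; the convention that a fully satisfied cloud has utility 1 is an assumption added for the example):

```python
def module_utility(allocated_cpu: float, demanded_cpu: float) -> float:
    """u_{n,m} = allocated CPU / demanded CPU of one module instance."""
    return allocated_cpu / demanded_cpu

def site_utility(instance_utilities: list[float]) -> float:
    """Site utility: the minimum utility over the site's module instances."""
    return min(instance_utilities)

def cloud_utility(site_utilities: list[float]) -> float:
    """Cloud utility: the minimum over sites whose demand is not satisfied
    (u <= 1); returning 1.0 when all sites are satisfied is an assumption."""
    unsatisfied = [u for u in site_utilities if u <= 1]
    return min(unsatisfied) if unsatisfied else 1.0

def machine_power(row_a: list[float]) -> int:
    """P_n: 0 if no module is placed on the machine (its row of A is all
    zeros), 1 otherwise."""
    return 0 if all(a == 0 for a in row_a) else 1

# Example (hypothetical): one site fully satisfied, one at 80% -> cloud utility 0.8.
print(cloud_utility([site_utility([1.2, 1.0]), site_utility([0.8, 0.9])]))
```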
The resource allocation problem
Resource allocation as a utility maximization problem:

maximize   U^c(t + 1)
minimize   P^c(t + 1)
minimize   c*(A(t), A(t + 1))
subject to A(t + 1) ≥ 0,  1ᵀ A(t + 1) = 1ᵀ,
           Ω̂(A(t + 1), ω(t + 1)) 1 ≤ Ω 1,
           sign(A(t + 1)) γ ≤ Γ 1.

The last two constraints state that the CPU allocated on each machine and the memory of the modules placed on it fit within the machine capacities Ω and Γ.

Cost of reconfiguration: c* is the number of module instances that are started to reconfigure the system.
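Reading c* as the number of entries of A that change from zero to nonzero (one newly started module instance per such entry) suggests the following small sketch; the matrix encoding is an assumption for illustration:

```python
def reconfiguration_cost(a_old: list[list[float]], a_new: list[list[float]]) -> int:
    """c*: number of module instances started during reconfiguration, counted
    here as entries alpha_{n,m} that are zero in A(t) and positive in A(t+1)."""
    return sum(
        1
        for row_old, row_new in zip(a_old, a_new)
        for old, new in zip(row_old, row_new)
        if old == 0 and new > 0
    )

# Example (2 machines x 2 modules, hypothetical): one new instance is started.
print(reconfiguration_cost([[0.5, 0.0], [0.5, 1.0]], [[0.5, 0.3], [0.5, 0.7]]))
```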
Protocol GRMP: pseudocode for machine n

initialization
1: read ω, γ, Ω, Γ, rown(A);
2: initInstance();
3: start passive and active threads;

active thread
1: for r = 1 to rmax do
2:   n' = choosePeer();
3:   send(n', rown(A));
4:   rown'(A) = receive(n');
5:   updatePlacement(n', rown'(A));
6:   sleep until end of round;
7: write rown(A);

passive thread
1: while true do
2:   rown'(A) = receive(n');
3:   send(n', rown(A));
4:   updatePlacement(n', rown'(A));

Three abstract methods:
initInstance();
choosePeer();
updatePlacement(n', rown'(A));
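A schematic Python rendering of this skeleton may help; message passing is replaced by direct object access and threading is omitted, so this illustrates the round structure rather than the authors' implementation:

```python
from abc import ABC, abstractmethod

class GossipResourceManager(ABC):
    """Sketch of GRMP on one machine n: each round the active side picks a
    peer, the two machines exchange their rows of A, and both sides run
    updatePlacement on the received row."""

    def __init__(self, name: str, row_a: dict, rounds: int):
        self.name = name      # machine identifier n
        self.row_a = row_a    # row_n(A): this machine's allocation row
        self.rounds = rounds  # r_max

    # The three abstract methods an instantiation (e.g. GRMP-Q) must provide.
    @abstractmethod
    def init_instance(self) -> None: ...

    @abstractmethod
    def choose_peer(self) -> "GossipResourceManager": ...

    @abstractmethod
    def update_placement(self, peer: "GossipResourceManager", peer_row: dict) -> None: ...

    def run_active_thread(self) -> dict:
        """Active thread: r_max gossip rounds, then return the final row of A."""
        self.init_instance()
        for _ in range(self.rounds):
            peer = self.choose_peer()
            my_row, peer_row = dict(self.row_a), dict(peer.row_a)  # exchange rows
            self.update_placement(peer, peer_row)  # active side updates
            peer.update_placement(self, my_row)    # passive side updates
        return self.row_a
```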
Protocol GRMP-Q: pseudocode for machine n
Principles:
Prefer a gossiping peer with common modules: Nn = {j ∈ N : j shares at least one module with n}.
Equalize if the aggregate load is at least the aggregate capacity.
Pick the destination machine to pack: the higher-loaded machine if both are underloaded; the underloaded machine if one is overloaded.
Use both CPU and memory demand during the packing process.

(ωn = Σm ωn,m denotes the aggregate CPU demand on machine n.)

initInstance()
1: read Nn;

choosePeer()
1: if rand(0..1) < p then
2:   n' = unif_rand(Nn);
3: else
4:   n' = unif_rand(N − Nn);

updatePlacement(j, rowj(A))
1: if ωn + ωj ≥ 2Ω then
2:   equalize(j, rowj(A));
3: else
4:   if j ∈ Nn then
5:     packShared(j);
6:   packNonShared(j);

pickSrcDest(j)
1: dest = arg max(ωn, ωj);
2: src = arg min(ωn, ωj);
3: if ωdest > Ω then swap dest and src;
4: return (src, dest);
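A short Python sketch of the peer selection and source/destination choice (machine identifiers, loads, and the return convention are illustrative assumptions):

```python
import random

def choose_peer(n: str, shared_peers: set[str], all_machines: set[str], p: float) -> str:
    """choosePeer(): with probability p pick a uniform-random machine from N_n
    (machines sharing a module with n); otherwise pick from the rest."""
    pool = shared_peers if random.random() < p else (all_machines - shared_peers - {n})
    return random.choice(sorted(pool))

def pick_src_dest(load_n: float, load_j: float, capacity: float) -> tuple[str, str]:
    """pickSrcDest(): the destination is the higher-loaded machine, unless that
    machine is already overloaded (load > capacity), in which case src and dest swap."""
    dest, src = ("n", "j") if load_n >= load_j else ("j", "n")
    dest_load = load_n if dest == "n" else load_j
    if dest_load > capacity:
        src, dest = dest, src
    return src, dest

# Example (hypothetical loads): both machines underloaded, so demand is packed
# onto the higher-loaded machine n.
print(pick_src_dest(load_n=3.0, load_j=1.0, capacity=4.0))  # ('j', 'n')
```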
GRMP-Q: method updatePlacement(j, rowj(A))
Protocol GRMP-Q (Cont.)

(vn = Σm ωn,m / Ω and gn = Σm γn,m / Γ denote the relative CPU and memory load of machine n.)

packShared(j)
1: (s, d) = pickSrcDest(j);
2: Δωd = Ω − Σm ωd,m;
3: if ωs > Ω then Δωs = Σm ωs,m − Ω else Δωs = Σm ωs,m;
4: let mod be the list of modules shared by s and d, sorted by decreasing γs,m/ωs,m;
5: while mod ≠ ∅ ∧ Δωs > 0 ∧ Δωd > 0 do
6:   m = popFront(mod);
7:   δω = min(Δωd, Δωs, ωs,m);
8:   Δωd −= δω; Δωs −= δω; δα = αs,m · δω / ωs,m;
9:   αd,m += δα; αs,m −= δα;

packNonShared(j)
1: (s, d) = pickSrcDest(j);
2: Δγd = Γ − Σm γd,m; Δωd = Ω − Σm ωd,m;
3: if ωs > Ω then Δωs = Σm ωs,m − Ω else Δωs = Σm ωs,m;
4: if vd ≥ gd then sortCri = γs,m/ωs,m else sortCri = ωs,m/γs,m;
5: let nmod be the list of modules on s not shared with d, sorted by decreasing sortCri;
6: while nmod ≠ ∅ ∧ Δγd > 0 ∧ Δωd > 0 ∧ Δωs > 0 do
7:   m = popFront(nmod);
8:   δω = min(Δωs, Δωd, ωs,m); δγ = γs,m;
9:   if Δγd ≥ δγ then
10:    δα = αs,m · δω / ωs,m; αd,m += δα;
11:    αs,m −= δα; Δγd −= δγ;
12:    Δωd −= δω; Δωs −= δω;
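A Python sketch of packShared under an assumed data layout (each machine is a dict mapping module name to its local CPU demand ωs,m, memory demand γs,m, and fraction αs,m; none of these names come from the slides):

```python
def pack_shared(src: dict, dst: dict, capacity: float) -> None:
    """Sketch of packShared: shift demand of modules shared by src and dst onto
    dst, memory-heavy modules first (large gamma/omega), until dst is full or
    src has shed its excess (or, if underloaded, all of its) demand."""
    delta_d = capacity - sum(m["omega"] for m in dst.values())
    src_total = sum(m["omega"] for m in src.values())
    delta_s = src_total - capacity if src_total > capacity else src_total

    shared = sorted(
        (m for m in src if m in dst and src[m]["omega"] > 0),
        key=lambda m: src[m]["gamma"] / src[m]["omega"],
        reverse=True,
    )
    for m in shared:
        if delta_s <= 0 or delta_d <= 0:
            break
        d_omega = min(delta_d, delta_s, src[m]["omega"])
        d_alpha = src[m]["alpha"] * d_omega / src[m]["omega"]
        delta_d -= d_omega
        delta_s -= d_omega
        dst[m]["alpha"] += d_alpha
        dst[m]["omega"] += d_omega
        src[m]["alpha"] -= d_alpha
        src[m]["omega"] -= d_omega
```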
Properties of GRMP-Q

Overload scenario
Fair allocation of resources across sites.

Sufficient memory scenario
Optimal solution: the protocol converges to a configuration where |N|·CLF machines are fully packed (with respect to CPU) and |N| − |N|·CLF machines are empty.

General case
As long as there is a free machine in the cloud, the protocol guarantees that the demand of all sites is satisfied.
Simulation: demand, capacity and evaluation metrics
Demand ω changes at discrete points in time at which
GRMP-Q recomputes A.
Demand: CPU demand of sites is Zipf distributed; memory demand of modules is selected from {128MB, 256MB, 512MB, 1GB, 2GB} (a generation sketch follows below).
Capacity: CPU and memory capacities are fixed at 34.513 GHz and 36.409 GB respectively.
Evaluation scenarios
Different CPU and memory load factors (CLF, MLF): CLF ∈ {0.1, 0.4, 0.7, 1.0, 1.3} and MLF ∈ {0.1, 0.3, 0.5, 0.7, 0.9}.
Different system sizes.

Evaluation metrics: power reduction, fairness, satisfied demand, cost of reconfiguration.
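The demand model can be sketched in Python as follows; the Zipf exponent, the number of modules per site, and the even CPU split across a site's modules are assumptions made only for illustration:

```python
import random
import numpy as np

def generate_demands(num_sites: int, modules_per_site: int, seed: int = 0):
    """Sketch of the simulated demand: site CPU demand is Zipf distributed,
    module memory demand is drawn from a fixed set of sizes."""
    rng = random.Random(seed)
    np_rng = np.random.default_rng(seed)
    memory_choices_gb = [0.125, 0.25, 0.5, 1.0, 2.0]  # {128MB, 256MB, 512MB, 1GB, 2GB}

    sites = []
    for s in range(num_sites):
        site_cpu = float(np_rng.zipf(a=2.0))  # assumed exponent
        modules = [
            {"cpu": site_cpu / modules_per_site,  # assumed even split across modules
             "memory_gb": rng.choice(memory_choices_gb)}
            for _ in range(modules_per_site)
        ]
        sites.append({"site": s, "modules": modules})
    return sites

print(generate_demands(num_sites=2, modules_per_site=3)[0])
```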
Measurement Results

(a) Fraction of machines that can be shut down.
(b) Fraction of sites with satisfied demand.
(c) Cost of change in configuration.
(d) Fairness among sites.
Measurement Results(Cont.)

Scalability with respect to the number of machines and sites, evaluated for two (CLF, MLF) settings: (0.5, 0.5) and (0.25, 0.25).
Conclusion

We introduce and formalize the problem of server consolidation in a site-hosting cloud environment.
We develop GRMP, a generic gossip-based protocol for resource management that can be instantiated with different objectives.
We develop an instance of GRMP, which we call GRMP-Q, that provides a heuristic solution to the server consolidation problem.
We perform a simulation study of the performance of GRMP-Q, which indicates that the protocol qualitatively behaves as expected from its design. For all parameter ranges investigated, the protocol uses at most 30% more machines than an optimal solution.
Future work

Work related to the resource allocation protocol GRMP-Q:
Its convergence properties for large CLF values.
Its support for heterogeneous machines.
Its robustness to failures.
Regarding the middleware architecture:
Design a mechanism for deploying new sites.
Extend the design to span multiple clusters.
