
AcOS: an Autonomic Management Layer Enhancing Commodity Operating Systems
Davide Basilio Bartolini ∗ , Matteo Carminati ∗ , Riccardo Cattaneo ∗ , Jacopo Panerati ∗ ,
Filippo Sironi ∗ † , Donatella Sciuto ∗
∗ Politecnico di Milano, Dipartimento di Elettronica e Informazione
{bartolini, carminati, sironi, sciuto}@elet.polimi.it,
{riccardo1.cattaneo, jacopo.panerati}@mail.polimi.it
† Massachusetts Institute of Technology, Computer Science and Artificial Intelligence Laboratory
[email protected]

Abstract— Traditionally, operating systems have been in charge of serving as a convenient layer between applications and the bare-metal hardware, both by providing an abstraction of the hardware itself and by allocating the available resources to the applications. Both of these roles are becoming ever more important due to the increasing complexity of modern computer architectures. The rise of chip multi-processors brought the consolidation of multiple applications and systems (i.e., through hardware-supported virtualization) onto a single piece of silicon. Heterogeneous Systems-on-Chip (SoCs) are becoming ubiquitous, while deep memory hierarchies (i.e., multiple levels of caches) have been around since the beginning of this millennium. Moreover, modern systems have to face the challenge of meeting a growing set of functional and non-functional requirements (e.g., performance, temperature, power consumption, etc.); a system-wide strategy is needed and operating systems may become the right actors for this role.

This paper describes an approach to enhance commodity operating systems with an autonomic layer, introducing smart, automatic resource allocation through self-management capabilities, which in turn leverage the availability of user- and system-specified goals and constraints. The methodology for realizing such an Autonomic Operating System (AcOS) is illustrated through its building blocks: monitors, actuators, and adaptation policies. The applicability and usefulness of this approach are demonstrated by providing experimental evidence gathered from different case studies involving two different operating systems (i.e., GNU/Linux and FreeBSD) and dealing with diverse goals and constraints: performance and temperature requirements.

I. BACKGROUND AND INTRODUCTION

The turn of computer architectures from the well-established single-core structure to multiple (and, possibly, heterogeneous) processing elements is a years-long trend. This paradigm shift has been dictated by physical and architectural motivations [1]. Physical causes depend on the so-called power wall (i.e., the inability to increase the clock frequency without hurting power consumption), while architectural reasons relate to the well-known ILP wall (i.e., the diminishing returns of further micro-architectural optimizations). These difficulties made it infeasible to keep up with Joy's performance law and, to survive its commitment to performance improvements, the computer industry changed strategy, opening the multicore era.

In the single-core era, software performance improvements were provided by beefier single-core processors, and applications experienced the so-called "free lunch", with free-of-charge speedups obtainable by just upgrading to the latest processor. The new parallel course in computer architectures, despite being due to architectural reasons, carries the side effect of ending the free lunch, moving a good part of the burden of improving performance onto software developers' shoulders. The demanding task of producing efficient and reliable parallel software adds to the already considerable expertise needed to cope with the requirements in terms of computing performance, functionality, reliability, and constraint satisfaction imposed by today's IT. This situation leads to an increased need to push part of the system management effort into computing systems themselves, which can be achieved by leveraging autonomic computing techniques [2].

Autonomic computing proposes embedding self-* properties into computing systems with the objective of maintaining "good-enough" working conditions. The term autonomic strongly recalls the initial link between autonomic computing and the biological world; in fact, autonomic computing systems were initially supposed to mimic the behavior of the autonomic nervous system of human beings. In detail, computing systems should be able to monitor themselves and their environment, detect significant changes and decide how to react, and act to execute such decisions towards user-specified goals in terms of non-functional requirements (e.g., desired QoS, system health, etc.) [3].

Within this context, realizing support for autonomic management at the operating system level is crucial. The operating system is the central system layer, serving as the bridge between hardware components and applications by both offering abstractions for more convenient access to hardware devices and managing the allocation of system resources (e.g., processor or memory bandwidth). Hence, the operating system has both full access to the bare hardware and an overall view of the software being executed. The operating system is where the most information regarding the status of the computing system as a whole and the most freedom of action over its behavior are available. For these reasons, we claim that an appropriate infrastructure at the operating system level is fundamental to realize an autonomic apparatus within a computing system, able to integrate the diverse self-management techniques proposed in the literature [4, 5, 6, 7, 8, 9, 10, 11]. This paper presents the proposed methodology and an evaluation, through case studies, of different implementations of this approach to the realization of an Autonomic Operating System (AcOS) layer.

In the remainder of this paper, Section II describes the proposed methodology, Section III presents case studies regarding the implementation of this methodology over two well-known commodity operating systems and reports experimental results based on these case studies. Finally, Section IV gives an overview of related works and Section V concludes the paper.
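Before moving to the methodology, the monitor-decide-act behavior described above can be made concrete with a minimal sketch. The fragment below is illustrative only and all names are hypothetical (it is not AcOS code): the "system" is a single knob that is steered into a user-specified goal range, one observe-decide-act iteration at a time.

```c
#include <assert.h>

/* Illustrative Observe-Decide-Act loop; names are hypothetical and do
 * not correspond to actual AcOS interfaces. */
typedef struct {
    int knob;               /* parameter controlled by an actuator */
    int goal_min, goal_max; /* user-specified goal on the monitor  */
} system_t;

/* One ODA iteration: observe the knob, decide a correction, act. */
void oda_iterate(system_t *s)
{
    int observed = s->knob;          /* observe: read the monitor     */
    int delta = 0;                   /* decide: adaptation policy     */
    if (observed < s->goal_min)
        delta = +1;                  /* below goal: push upwards      */
    else if (observed > s->goal_max)
        delta = -1;                  /* above goal: push downwards    */
    s->knob += delta;                /* act: apply via the actuator   */
}
```

Iterating from any starting point drives the knob into the [goal_min, goal_max] range and keeps it there; real adaptation policies replace the unit step with heuristic, control-theoretical, or machine-learning decision mechanisms.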

II. AUTONOMIC OPERATING SYSTEM INFRASTRUCTURE

The creation of an infrastructure for autonomic computing in commodity operating systems opens the possibility of supporting self-management capabilities in any system supported by the OS of choice. For instance, enhancing GNU/Linux, and, more specifically, the Linux kernel, with an autonomic layer enables a wide variety of systems and devices (ranging from mobiles to supercomputers) to take advantage of the benefits carried by runtime self-adaptation. The Autonomic Operating System (AcOS) is aimed at this target, i.e., the definition of a unified infrastructure for autonomic computing and its implementation over commodity operating systems. The methodology at the base of AcOS leverages the Observe-Decide-Act (ODA) autonomic loop [3]; Figure 1 represents an overview of a computing system enhanced with the autonomic layer. The AcOS infrastructure defines a model for hosting an array of ODA autonomic loops within the autonomic layer. These self-management loops gather information from running applications, the system, and the hardware components (observation), analyze the collected data by comparing it against the desired system status and determining how to intervene (decision), and adjust the available knobs (action).

[Figure 1: applications running on top of the operating system (scheduler, file system, device drivers) and the hardware components (CPUs, memory, devices), with ODA loops hosted in the autonomic layer.]
Fig. 1. Overall view of the AcOS infrastructure model. The autonomic layer enhances the operating system implementing different autonomic loops, which evaluate the system status and act within the operating system boundary to meet user- and system-specified goals and constraints.

A. AcOS System Components

AcOS defines common interfaces and templates, standardizing these phases through the definition of three kinds of autonomic components:
• Actuators are wrappers around a parameter of the system and serve as the actual knobs which can be adjusted by the autonomic layer.
• Adaptation policies represent the core autonomic component, as they actually take decisions on how to adapt the system behavior. Adaptation policies implement a decision mechanism to decide how to act on the knobs provided by actuators.
• Monitors provide access to information about the current status of the system or of its environment and to user-defined goals. This information is accessed by adaptation policies to apply their decision mechanism.

The AcOS way of realizing the three phases of the ODA autonomic loop through these autonomic components, along with more details about the overall approach, is further illustrated in the remainder of this section.

B. Acting on the System

Autonomic action on the computing system corresponds to the modification of one or more parameters which affect its runtime behavior. An actuator is a wrapper for one or more parameters, exporting a well-defined action in the form of an Application Programming Interface (API) which can be used by adaptation policies. For instance, an actuator may export the action of assigning a certain number of cores to the tasks of a specific application. The implementation of the calls exported through an actuator's API must take care of performing all the operations required to reliably perform the action. Since the AcOS model places the autonomic layer within the operating system, actuators can affect all the system parameters managed by the operating system through its subsystems (e.g., processor scheduler, device drivers, etc.).

Note that not all the possible actions affect a system parameter: a different class of actuators can be defined at the level of a single application. For instance, an application may be able to vary the number of its working threads at runtime. This kind of actuator, however, still requires support at the operating system level to provide a standardized API to be used by adaptation policies with a broader view on the system than a tight adaptation loop within the application itself. For instance, to support the runtime adaptation of the number of threads in an application, the autonomic layer would define an actuator API for adaptation policies and also another interface for applications capable of this kind of self-adaptation to register and create a communication channel (e.g., based on shared memory) through which they are informed of calls to the actuator by adaptation policies. In brief, an application-level actuator is just the same as a system-level one, but its backend is not fully implemented within the operating system; rather, it triggers an action in registered applications. Thanks to this mechanism, AcOS is able to make use of both system- and application-level actuators, enabling more powerful adaptation; for instance, when changing the number of cores assigned to an application, an adaptation policy could also change the number of its working threads to match the number of cores, obtaining the best possible scalability.
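The actuator concept can be sketched as a thin, range-checked wrapper around one tunable parameter. This is an illustrative sketch with hypothetical names; the backend performing the real operation (e.g., the scheduler calls that actually assign cores) is elided.

```c
#include <assert.h>

/* Illustrative actuator wrapping one system parameter (hypothetical
 * names, backend elided). */
typedef struct {
    const char *name; /* e.g., "cores" for a core-allocation actuator */
    int value;        /* current value of the wrapped parameter       */
    int min, max;     /* legal range enforced by the actuator         */
} actuator_t;

/* The exported action: the actuator, not the calling adaptation
 * policy, is responsible for keeping the parameter in a valid state,
 * so out-of-range requests are clamped before being applied. */
int actuator_set(actuator_t *a, int requested)
{
    if (requested < a->min)
        requested = a->min;
    if (requested > a->max)
        requested = a->max;
    a->value = requested;  /* a real backend would act on the system */
    return a->value;
}
```

An application-level actuator would keep the same API towards adaptation policies but forward the call to registered applications instead of acting on an operating system subsystem.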

C. Making Decisions

In the AcOS model, decisions are made by adaptation policies, which are defined as a distributed decision-making infrastructure. Each adaptation policy chooses which monitors and actuators to use. To do so, adaptation policies embed a decision mechanism able to evaluate information about the system status and user-specified goals, which is made available by monitors (see Section II-D). The decision mechanism within an adaptation policy can vary from a simple heuristic based on empirical observation to more complex techniques based on control theory or machine learning. Adaptation policies can work at either application-specific (e.g., performance) or system-wide (e.g., system temperature) level, or even at both levels. For instance, an adaptation policy seeking the goal of keeping the temperature of the processor below a certain threshold by randomly injecting idle cycles into the CPU(s) would work at system level; but if the idle cycle injection is selectively performed taking into consideration the performance of the running applications, it would operate at application level, while using system-level information and goals. Clearly, in each case an appropriate actuator to allow the action is needed.

This definition of adaptation policy leads to interaction problems in terms of control stability, for instance, when two adaptation policies want to use the same actuator in opposite directions. Currently, this problem is solved in AcOS by providing the user with the possibility of enabling or disabling each policy. One of the ongoing works is a study of the application of distributed decision theory to this context, to automate the activation and deactivation of conflicting adaptation policies with a system-wide policies coordinator. Even if this is currently an open problem, the AcOS infrastructure has the major advantage of keeping all the components at the operating system level, thus simplifying the connection of the policies coordinator with the autonomic layer.

D. Gathering Information

The autonomic layer needs constantly updated information regarding the system status in order to be able to take informed decisions through the adaptation policies. In the AcOS model, this information is made available by monitors. Just like actuators, monitors can simply be wrappers around an information source already available to the operating system (e.g., the temperature of each core in the processor). In this case, a monitor is said to be passive [12], since it simply wraps already available information. On the other hand, active [12] monitors process the information in some way to synthesize a metric characterizing a certain runtime property (e.g., throughput). Besides making runtime information available through APIs, monitors also manage the specification of goals by the user and expose this additional data to adaptation policies. Hence, a monitor is characterized by the metric of the measurements it takes, and it must expose at least two APIs: one towards adaptation policies, allowing them to get the data on both the current status and the associated goal, and one towards users, allowing them to define goals on the monitor's measurement.

E. Autonomic Components Interoperation

In order to have a working autonomic layer, the three classes of autonomic components must be able to reliably and efficiently communicate. The AcOS model has been devised as an enhancement layer for contemporary commodity operating systems, which, in most cases, feature a monolithic kernel. In particular, our current design is strongly biased towards UNIX-like operating systems, with a strong separation between system code, which usually runs in kernel mode, and application code, which runs in user mode [13]¹. One of the challenges in the design of the autonomic infrastructure was to allow efficient communication between user and kernel space, since diverse, interacting components can be placed in different spaces or a single component may need to be partitioned. To avoid the use of system calls, which are time-consuming and possibly harmful [14, 15], on hot code paths, the AcOS infrastructure leverages shared memory, mapping portions of physical memory into both user and kernel space, allowing for fast, low-latency communication without unnecessary overheads [10]. This is done considering both security (i.e., managing permissions on the shared memory) and efficiency (i.e., laying out data in a cache-friendly way to avoid false sharing [16] and other subtle issues). Thanks to this approach, the most common operations are managed in the fastest possible way, while system calls (or alternative interfaces, such as Linux's sysfs and procfs) are employed to handle uncommon operations [10]. According to this model, an autonomic component can be implemented across user and kernel space, making use of specially mapped shared memory to inexpensively pass information across address spaces. Moreover, we foresee similar optimizations between the different agents involved in the global system optimization.

III. CASES FOR AUTONOMIC COMPONENTS

We employed the methodology described in Section II to extend two well-known commodity operating systems (i.e., Linux and FreeBSD) with an autonomic layer. This section reports case studies involving the creation of autonomic components integrated into either Linux or FreeBSD and aimed at tackling interesting runtime management problems in the context of server systems. In more detail, one of the interesting problems in server systems is the management of the quality of service (QoS) yielded by the running applications. A typical class of workloads in this scenario is made of throughput-based applications processing a stream of data (e.g., a video encoder for online streaming). To enable autonomic QoS management within this context, we designed and implemented an active monitor for throughput, called Heart Rate Monitor (HRM), and different adaptation policies and actuators exporting actions over the task scheduler. Another compelling problem is that of thermal management, which can be a significant issue in server farms and data centers. To tackle this problem, we implemented a passive temperature monitor and an adaptation policy exploiting the available actuators. The remainder of this section gives a brief overview of these autonomic components and presents an experimental characterization of what is possible to realize by applying the proposed methodology. For the experimental evaluation, we employed workstations equipped with current quad-core Intel processors (Xeon and Core i7) and applications from the PARSEC 2.1 [17] parallel benchmark suite.

¹A different design that exploits a micro-kernel, message-passing based, distributed operating system is foreseeable and maybe even more suitable for the distributed, agent-like environment we describe.

A. Throughput Monitoring: the Heart Rate Monitor

The Heart Rate Monitor (HRM) is the AcOS solution for measuring throughput and it has been employed to characterize the performance of parallel applications. As a motivating example, think of a set of parallel applications executing on a server system (e.g., video encoders for web streaming). When different applications run concurrently, they contend for the available resources and this may lead to performance degradation below a desired QoS threshold (i.e., a minimum frame rate). In this case, a throughput measure can be used to indicate the frame rate at which each encoder is producing the video to be streamed, and the QoS goal could be easily set by referring to this meaningful metric. Having this information available at runtime enables dynamic adaptation (e.g., by acting on scheduling) to drive applications towards matching their QoS goal.

[Figure 2: producers (P1-P6), organized in groups (G1-G3), emit heartbeats to the Heart Rate Monitor; consumers read heart rates and goals and take corrective actions, closing the ODA loop.]
Fig. 2. Black box view of the Heart Rate Monitor. The inputs are heartbeats emitted by instrumented producers, organized in groups, and goals set by the users. HRM makes heart rate measures available to consumers, enabling the creation of an ODA loop.

Within this context, HRM lets software developers instrument the resource-demanding section (called the kernel, or hotspot) of the application to emit a heartbeat for each encoded frame, and provides throughput measures in terms of a heart rate [10]. Moreover, according to the AcOS model, HRM permits expressing application-level goals in terms of a desired heart rate (which maps, in the example, to a desired frame rate).

A black-box view of HRM can be given according to a producer/consumer model similar to that used in PEM [18]: producers emit heartbeats to signal events and consumers access the heart rates computed by the monitor. Going back to the example proposed above, the threads doing the frame compression work within the video encoder are the producers, while the component acting on the scheduler to modify the system behavior is a consumer. HRM acts as an interface between producers and consumers, elaborating the heartbeats emitted by the former into a metric used by the latter to evaluate the system status with respect to the goals and take corrective actions as needed. In this way, HRM permits a goal-oriented approach towards the control of any phenomenon within a computing system which can be characterized by a throughput measure.

Figure 2 represents this black box view, highlighting the flow of information from producers to consumers through HRM, which enables the realization of the ODA control loop. Flexibility and expressiveness are achieved in HRM through the definition of monitoring groups (marked as Gi in Figure 2), by providing measures as multiple moving averages, and by letting the goal be tied to one of the measures. These concepts are further illustrated in the next subsubsections.

1) Groups: To provide flexibility and be useful in current and future parallel systems, HRM must support monitoring any kind of parallel workload (i.e., multithreaded, multiprocessed, or any feasible mix of the two); this is attained by defining monitoring groups. A group is a set of tasks (i.e., either processes or threads) constituting the atomic monitoring entity. Referring to the video encoders example, the working threads of each encoder would be grouped together, contributing to the frame rate of that encoder. In general, each group represents an activity whose throughput is to be measured.

2) Moving averages: Throughput measures are computed by HRM as heart rates, i.e., for each group, the count of the heartbeats emitted by all its producers over the elapsed time. Clearly, for such a measure, the considered time horizon matters: considering the whole execution time provides a smoothed average, while considering only the heartbeats emitted within a shorter time window discards the old history and better highlights short-term trends. For this reason, HRM provides both a global heart rate and window heart rates, allowing the focus to be tuned on longer- or shorter-term trends as required by the specific monitoring context. Initially [10], HRM was able to provide, apart from the global measure, only one window heart rate at a time. A recent advancement allows outputting throughput measures on multiple moving averages at the same time, highlighting different trends and providing richer information to consumers.

3) Goal Specification: HRM aims at being a general-purpose throughput monitor, serving as an infrastructure to be used by a variety of consumers to get information regarding any phenomenon, within a computing system, measurable as a throughput. For this reason, it must allow for a simple yet generic way of setting a desired value for the heart rate of a group. This is achieved by allowing the definition of a desired heart rate range between a minimum and a maximum heart rate; moreover, it is possible to tie the goal to a specific window heart rate. For instance, in the familiar example of the video encoder, the minimum heart rate could be set to the minimum frame rate that guarantees the desired QoS (e.g., 30 frames/s), the maximum could be set to a value over which no sensible benefit would be achieved, and the goal could be tied to a certain time horizon according to how much buffering space is available for the encoded video.

The availability of different window heart rates enables catching different trends in the execution of an application. To provide an example, we implemented an ad-hoc microbenchmark which attaches four producers to a monitoring group and starts emitting heartbeats as fast as possible in an infinite loop, and deployed it on a workstation equipped with an AcOS-enhanced Linux 3.3. Clearly, this simple application is very regular and it reaches a peak throughput of about 40 × 10⁶ heartbeats/s on the target workstation.
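The global and window heart rates just described can be sketched as follows. This is an illustrative reimplementation with hypothetical names, not the actual HRM code: heartbeats are accumulated into one-second samples, the global rate averages the whole history, and a window rate averages only the last few samples; as in HRM, when fewer samples than the window size are available, the measure is computed on the available data.

```c
#include <assert.h>

#define MAX_WINDOW 60  /* largest supported window, in samples */

/* Illustrative HRM-like heart rate bookkeeping (hypothetical names). */
typedef struct {
    long samples[MAX_WINDOW]; /* heartbeats counted in each second   */
    int  n;                   /* number of one-second samples so far */
    long total;               /* heartbeats since monitoring began   */
} hrm_group_t;

/* Record the heartbeats emitted by the group in the last second. */
void hrm_tick(hrm_group_t *g, long beats)
{
    g->samples[g->n % MAX_WINDOW] = beats;
    g->n++;
    g->total += beats;
}

/* Global heart rate: smoothed average over the whole execution. */
double hrm_global_rate(const hrm_group_t *g)
{
    return g->n ? (double)g->total / g->n : 0.0;
}

/* Window heart rate: average over the last `window` seconds only,
 * falling back to the available samples when the history is shorter. */
double hrm_window_rate(const hrm_group_t *g, int window)
{
    int avail = g->n < window ? g->n : window;
    long sum = 0;
    for (int i = 1; i <= avail; i++)
        sum += g->samples[(g->n - i) % MAX_WINDOW];
    return avail ? (double)sum / avail : 0.0;
}
```

A short window reacts promptly to load changes but is noisy, while a long window smooths transients away; this is the trade-off Figure 3 illustrates.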

Different performance trends have been artificially created by running a variable number of instances of another CPU-bound application (i.e., the cpuburn stress test), to simulate collateral system load. Figure 3 shows a plot of the microbenchmark's throughput, representing the global heart rate and six additional window heart rates over moving averages of sizes in the set {1, 5, 10, 15, 30, 60} seconds².

[Figure 3: throughput (heartbeats/s) over time, showing the global heart rate and the window heart rates over 1, 5, 10, 15, 30, and 60 s windows, with five marked events: (1) initial load ends, (2) short load begins, (3) short load ends, (4) long heavy load begins, (5) long heavy load ends.]
Fig. 3. Global and six different window heart rates of an ad-hoc application showing different performance trends.

The execution presents six different phases: initially, up to the point marked (1), there is a light additional load which makes the heart rates over shorter time windows quite noisy. Then, this load terminates and the application reaches its peak performance up to point (2), when another external load is started. It is apparent from the figure how this disturbance is clearly visible looking at the heart rate measured on the short term, while it could go almost unnoticed looking at heart rates taken over longer periods. At point (3) the second external load terminates and the microbenchmark goes back to its peak throughput but, at point (4), a heavier and longer-lasting load is applied. Again, it can be noticed how heart rates measured on shorter time windows give prompt feedback when a change in performance happens, but tend to become noisy when the execution becomes more regular. Finally, at point (5), the final load terminates and, after some more time, the experiment is concluded.

Using the right moving average can help identify execution phases of a workload; for instance, Figure 4 shows a plot of an instance of the x264 video encoder application from the PARSEC 2.1 suite working on the reference input and instrumented to emit one heartbeat per encoded frame.

[Figure 4: throughput (frames/s) of the x264 encoder over time, showing the global and window heart rates.]
Fig. 4. Execution phases of the x264 video encoder working on the native input of the PARSEC 2.1 suite.

The figure shows both the global and window heart rate, which helps highlight a much lighter phase in the execution, due to the characteristics of the input. These two examples give evidence of how proper instrumentation with HRM can yield accurate runtime information, helping characterize applications' performance. Since these data are available at runtime, in the AcOS layer, in both kernel and user space, adaptation policies can use them to pursue user-specified goals. This is what is presented in the remainder of this section.

B. Performance-Aware Scheduling

The information provided by HRM has been employed by different adaptation policies affecting the task scheduler in different ways. These case studies are based on Linux 3.3, and appropriate actuators were implemented to give the autonomic layer access to two different scheduling parameters: task priority (obtained by scaling the tasks' virtual runtime [10]) and CPU affinity. The first actuator is used by an adaptation policy built for the Metronome framework [10] and called Performance-Aware Fair Scheduler (PAFS), while the second actuator is employed by another adaptation policy named Performance-Aware Processor Allocator ((PA)²).

1) Performance-Aware Fair Scheduling: The rationale behind PAFS is adapting the priority of the tasks belonging to HRM-instrumented applications according to whether their user-specified throughput goal is being attained or not. If an application is running too slowly, i.e., under its minimum desired heart rate, its priority is increased, and conversely if it is running over its maximum desired heart rate. This simple scheme has been implemented in an adaptation policy based on a heuristic decision mechanism, which affects the priority of the tasks belonging to instrumented applications (i.e., comprised in an HRM group) by scaling their virtual runtime, which is the metric used by Linux's Completely Fair Scheduler (CFS) to choose, at each context switch, which task to execute next. To evaluate this adaptation policy, two 4-threaded instances of the x264 video encoder have been run, both instrumented with HRM and attached to different groups, on a quad-core workstation equipped with an AcOS-enhanced Linux 3.3. This time, the workload consists of encoding a copy of the full-length Big Buck Bunny full HD video [19]. Figure 5a shows the two encoders managed by the Linux Completely Fair Scheduler (CFS), which is perfectly fair in assigning the bandwidth of the cores to the two instances of the same application, resulting in equal performance. The CFS features a sophisticated mechanism to assign different CPU bandwidths to different objects (e.g., applications); however, this mechanism is difficult to use when systems and administrators are faced with high-level performance goals (e.g., frames/s).

²Note that, when there are not enough data to compute a window heart rate over its full size, the measure is still provided using the available data.
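The PAFS heuristic described above can be sketched as a pure decision function. This is an illustrative sketch, not the Metronome code: the step size is a hypothetical constant, and the result would be written back through the virtual-runtime actuator rather than returned.

```c
#include <assert.h>

/* Illustrative PAFS-like rule (hypothetical step size): scale a task's
 * virtual runtime down when its group misses the minimum heart rate
 * (CFS then schedules it sooner) and up when it exceeds the maximum. */
double pafs_scale_vruntime(double vruntime, double heart_rate,
                           double goal_min, double goal_max)
{
    const double step = 0.1;            /* hypothetical step size    */
    if (heart_rate < goal_min)
        return vruntime * (1.0 - step); /* boost: appears "behind"   */
    if (heart_rate > goal_max)
        return vruntime * (1.0 + step); /* demote: appears "ahead"   */
    return vruntime;                    /* goal met: fair scheduling */
}
```

CFS always picks the runnable task with the smallest virtual runtime, so shrinking a task's virtual runtime effectively raises its priority; within the goal range, the policy leaves the task to plain fair scheduling.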

150 150
throughput [frames/s]

throughput [frames/s]
100 100

50 50

x2641 x2642 x2641 x2642


0 0
0 50 100 150 200 250 0 50 100 150 200 250
time [s] time [s]

(a) Unmanaged instances of x264. (b) Managed instances of x264; the performance goals, in frames/s, are [30,60]
and [70,100] for the two instances.
Fig. 5. Window heart rate and its LOESS interpolation for each instance of x264.
30
by users, administrators, and systems and effective adaptation

throghput [frames/s]
global2 heart rate
global3 heart rate
policies can be designed and developed to exploit these data. In global4 heart rate
20 global heart rate
Figure 5b, two instances of x264 are executed with different
window heart rate
performance goals (i.e., the red and green areas, which are
respectively 30 to 60 frames/s and 70 to 100 frames/s) and are 10
driven by the adaptive scheduler towards their performance
goal, successfully exploiting the information HRM provides 0
to adjust their virtual runtimes (which boils down to assign- 4
workload
workload and cores

ing them different processor bandwidth). In Figure 5b, the cores


3
throughput of the slower instance of x264 receives a sudden
speedup when the faster instance of x264 teminates. This is 2
due to the fact that the adaptive scheduler from Metronome
1
is designed to account for performance goals whenever there
is resource contention within the system, and the maximum 0
heart rate is considered as a soft bound on the QoS, and not as 0 10 20 30 40 50 60
time [s]
a performance cap; when there is no contention applications
run as they were unmanaged and thus all the processor time Fig. 6. 4-threaded instance of x264 managed by the (PA)2 adaptation policy
towards a throughput goal of [9, 11] frames/second.
is given to the only running application.
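As a rough, user-space illustration of the measures discussed above (not the kernel-level HRM implementation), the window and global heart rates can be sketched from a stream of heartbeat timestamps. The class and helper names below are hypothetical, introduced only for this example; the fallback in `window_heart_rate` mirrors the behavior noted in the footnote, where a partial window is still measured with the available data.

```python
from collections import deque

class HeartRateMonitorSketch:
    """Toy HRM-style monitor: an application emits a heartbeat per unit of
    progress (e.g., one per encoded frame); observers read beats/second."""

    def __init__(self, window_size):
        self.beats = deque(maxlen=window_size)  # most recent heartbeat timestamps
        self.first_beat = None
        self.total_beats = 0

    def heartbeat(self, now):
        """Called by the instrumented application at each unit of progress."""
        if self.first_beat is None:
            self.first_beat = now
        self.total_beats += 1
        self.beats.append(now)

    def global_heart_rate(self, now):
        """Average throughput since the first heartbeat."""
        if self.first_beat is None or now <= self.first_beat:
            return 0.0
        return self.total_beats / (now - self.first_beat)

    def window_heart_rate(self, now):
        """Throughput over the most recent window; with fewer beats than the
        window size, the measure falls back to the available data."""
        if len(self.beats) < 2:
            return self.global_heart_rate(now)
        span = self.beats[-1] - self.beats[0]
        return (len(self.beats) - 1) / span if span > 0 else 0.0

def within_goal(rate, lo, hi):
    """Check a [lo, hi] heart-rate goal, e.g., [30, 60] frames/s."""
    return lo <= rate <= hi
```

A policy such as the adaptive scheduler discussed above would poll `window_heart_rate` periodically and raise or lower a task's share of processor bandwidth depending on which side of the goal band the measure falls.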
A different interpretation of the maximum heart rate can be given by an adaptation policy employing the CPU affinity actuator, which makes it possible to decide which cores a thread can run on. In fact, with this kind of action it is possible to both boost and limit the performance of a single application being executed. (PA)² is an adaptation policy implementing a core-allocation mechanism based on throughput goals as provided by HRM. This adaptation policy features an autoregressive (AR) model to decide the number of cores to assign to each thread of a managed application, together with an adaptive filter for workload estimation. (PA)² was also implemented on Linux 3.3 and evaluated on a quad-core workstation to manage a 4-threaded instance of the x264 application. Figure 6 presents the results of this evaluation. The black solid and dotted lines represent respectively the global and window heart rate of a managed run of the encoder, while the red dashed lines show the throughput of the application when run with a fixed number of cores assigned to its four threads. The bottom part of the plot shows the action of the adaptation policy on the CPU affinity actuator (i.e., the number of cores to be assigned) and the estimation of the current workload, used to weigh the decision on the number of cores to be assigned. The experiment shows that, despite the changing workload, the autonomic layer is able to keep the performance of the application close to its goal.

C. Adding Thermal-Awareness

An interesting case study extending those regarding QoS management with performance goals comes from adding system temperature constraints. This example was implemented on FreeBSD 7.2, to which the HRM monitor and the task priority actuator were ported, together with a CPU temperature monitor (simply a wrapper around the information available in the model-specific registers (MSRs)) and an idle cycle injection actuator. In this case, an adaptation policy was implemented with the goal of managing performance under the constraint of a maximum working temperature. The decision mechanism is based on control theory, and the adaptation policy is split into two parts: one prioritizes tasks according to performance goals, while the other selectively injects idle cycles to cool down the processor. This adaptation policy has been evaluated on a quad-core workstation running a patched version of FreeBSD 7.2 by executing four different 4-threaded instances of the swaptions application from the PARSEC 2.1 suite. Figure 7 presents the results of this experiment. The black lines represent the average temperature and the heart rate of the four applications (which are very similar, since the scheduler tries to be fair) in the base case. The red lines represent the temperature and the heart rate of one of the four applications, which was privileged with a higher performance goal (between the orange and green lines), with the maximum desired temperature set to 60 °C. This experiment shows how it is possible, through coordinating the use of different monitors and actuators, to selectively boost the performance of one application while maintaining the desired border conditions in terms of maximum temperature.

Fig. 7. Four instances of swaptions, each one running with four threads, managed with performance and temperature goals. One of the instances is run with a performance goal set between 37000 and 42000 Monte Carlo simulations. The maximum temperature constraint is set at 60 °C.

Fig. 8. Learning of an adaptation policy working on frequency scaling and core allocation through Adaptive Dynamic Programming (ADP).
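The two-part decision mechanism of the thermal-aware policy can be sketched in a few lines. This is an illustrative approximation, not the paper's kernel-level, control-theoretic implementation: `thermal_policy_step`, its proportional gain, and the `(window_heart_rate, goal_min)` task tuples are assumptions of this sketch, not AcOS interfaces.

```python
def thermal_policy_step(apps, temp, temp_max, idle_frac, gain=0.05):
    """One iteration of a two-part thermal-aware policy sketch.

    Part 1 ranks tasks so that those furthest below their minimum heart-rate
    goal come first (candidates for a priority boost). Part 2 proportionally
    adjusts the fraction of injected idle cycles so the measured temperature
    tracks the constraint; the fraction is clamped to [0, 1].

    `apps` maps a task name to (window_heart_rate, goal_min)."""
    # Part 1: prioritize tasks by how far they lag behind their goal.
    priorities = sorted(apps, key=lambda a: apps[a][0] - apps[a][1])
    # Part 2: inject more idle cycles when above the cap, fewer when below.
    idle_frac = min(1.0, max(0.0, idle_frac + gain * (temp - temp_max)))
    return priorities, idle_frac
```

In the real system the idle fraction would drive the idle cycle injection actuator and the ranking would feed the task priority actuator; a proportional rule like the one above leaves a small steady-state error, which is one reason a proper control-theoretic design is preferable.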
D. Learning Adaptation Policies

An alternative approach to the creation of an adaptation policy is leveraging machine learning techniques to let the autonomic layer learn how to drive the actuators in order to achieve the desired goals. The AcOS methodology supports this kind of approach, as actuators can be controlled by reinforcement learning policies harnessing the autonomic infrastructure. As an example, consider the test case represented in Figure 8, where an adaptation policy is learned through Adaptive Dynamic Programming (ADP) to drive two actuators working on frequency scaling and core allocation, letting a 4-threaded instance of the swaptions benchmark achieve a desired throughput setpoint. The application is run 10 consecutive times, and the plot shows both the application's heart rate (in red) and the actions taken by the policy. The first two runs are used for exploration (i.e., learning), while in the remaining eight the learnt policy is applied, showing how the machine learning engine is mostly capable of satisfying the performance goal, set to 50000 Monte Carlo simulations per second.
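The explore-then-exploit schedule of this experiment can be sketched as follows. This is a much-simplified stand-in for ADP (a stateless value table over actuator settings rather than a learned system model), and every name in it, including `learn_policy` and the `throughput_of` callback standing in for a measured heart rate, is an assumption of this sketch.

```python
import random

def learn_policy(throughput_of, configs, setpoint,
                 explore_runs=2, total_runs=10, seed=0):
    """Learn-then-exploit sketch over actuator settings.

    `configs` enumerates (cores, frequency_step) settings; early runs pick a
    setting at random (exploration), later runs greedily reuse the setting
    whose observed throughput was closest to the setpoint (exploitation).
    Returns the per-run choices and the best setting found."""
    rng = random.Random(seed)
    value = {}    # cfg -> negative distance of observed throughput from goal
    history = []
    for run in range(total_runs):
        if run < explore_runs or not value:
            cfg = rng.choice(configs)          # exploration phase
        else:
            cfg = max(value, key=value.get)    # exploit the best setting so far
        reward = -abs(throughput_of(cfg) - setpoint)
        value[cfg] = reward
        history.append(cfg)
    return history, max(value, key=value.get)
```

Full ADP would additionally learn a model of how throughput responds to each actuator and propagate value estimates across states; with a deterministic toy workload, the sketch simply locks onto the best explored setting after the learning runs.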
IV. RELATED WORKS

This section provides an overview of works related to the topic of this paper. Some of these works deal with an overall approach to adaptivity at the OS level, while others focus on a specific topic such as monitoring or decision making.

The need for a framework for measuring performance and specifying applications' goals, which is the problem tackled in AcOS with the Heart Rate Monitor, is acknowledged within the Tessellation OS [20, 11]. The authors notice that using low-level information for determining applications' performance is labor-intensive and error-prone, and that higher-level measures (e.g., frame rate) are more suitable for letting users and software developers state performance goals. A work going in this direction is Application Heartbeats [4, 21], a performance monitoring infrastructure for self-adaptive systems which shares its basic ideas with HRM: developers are provided with a method for instrumenting applications to signal progress (by issuing signals called heartbeats) and expressing performance goals.

Tesauro, Kephart, et al. [22, 23, 24, 6, 7] designed and developed Unity, a framework to build self-managing distributed systems borrowing from the artificial intelligence community. Unity relies on a multi-agent approach to control the interaction among autonomic elements, and exploits utility functions to specify objectives and (hybrid) reinforcement learning to acquire an effective amount of domain knowledge. Heo et al. [25] designed, implemented, and evaluated AdaptGuard, a framework for guarding autonomic systems from instability caused by software anomalies and faults. This approach is somehow complementary to reinforcement learning, which is leveraged in AcOS to learn adaptation policies at runtime; in fact, AdaptGuard may be employed during the exploration phase, in which the reinforcement learning efforts are likely to result in a policy that is far from optimal.

The problem of maximizing performance and/or providing (global or per-application) quality of service (QoS) in chip multi-processors (CMPs), for both multi-threaded applications and multi-programmed mixes, which is tackled by some of the proposed autonomic components, has received quite a lot of attention in the latest years. Many approaches focused on the management and partitioning of the cache hierarchy [26, 27, 28, 29, 30].
Other researchers addressed the problem at the very end of the memory hierarchy, altering the behavior of memory controllers [31, 32, 33, 34]. Other works addressed the problem through dynamic assignment of processors [35, 4, 36] and CPU bandwidth [10]. Researchers have also tackled the problem with more comprehensive frameworks capable of managing more than one resource. Bitirgen et al. [37] exploited machine learning to distribute shared resources on a CMP. Srikantaiah et al. [27] devised a strategy to partition both processors and caches. Hoffmann et al. [5] proposed SElf-awarE Computing (SEEC), harnessing both control theory and machine learning to meet user-defined performance goals by allocating cores and scaling frequencies. Sharifi et al. [9] presented METE, a framework for meeting QoS goals through control theory.

V. CONCLUSIONS AND FUTURE WORKS

This paper presents AcOS, a methodology for enhancing commodity operating systems with an autonomic management layer. This approach is based on three basic blocks, collectively called autonomic components, which ensure the availability of information and goals, manage adaptation decisions, and make it possible to modify system parameters through appropriate knobs. This methodology has been applied towards the enhancement of two widespread open-source kernels (i.e., Linux and FreeBSD) with autonomic management of performance and temperature requirements. The case studies show that applications managed with the proposed methodology are able to achieve their different goals.

This work on AcOS opens the way to many developments; in particular, we are working towards the implementation of parts of the model which have not been extensively experimentally evaluated yet (e.g., the use of application-level actuators in concert with system-level ones). Another open issue is investigating the possibility of applying distributed decision theory to automate the activation and deactivation of possibly conflicting adaptation policies with a system-wide coordinator.

REFERENCES

[1] Samuel H. Fuller and Lynette I. Millett. Computing Performance: Game Over or Next Level? Computer, 44(1), 2011.
[2] Jeffrey O. Kephart and David M. Chess. The Vision of Autonomic Computing. Computer, 36(1):41–50, 2003.
[3] Mazeiar Salehie and Ladan Tahvildari. Self-Adaptive Software: Landscape and Research Challenges. ACM Trans. Auton. Adapt. Syst., 4(2), 2009.
[4] Henry Hoffmann, Jonathan Eastep, Marco D. Santambrogio, Jason E. Miller, and Anant Agarwal. Application Heartbeats: A Generic Interface for Specifying Program Performance and Goals in Autonomous Computing Environments. In Proceedings of the 7th International Conference on Autonomic Computing, pages 79–88, 2010.
[5] Henry Hoffmann, Martina Maggio, Marco D. Santambrogio, Alberto Leva, and Anant Agarwal. SEEC: A Framework for Self-aware Management of Multicore Resources. Technical Report MIT-CSAIL-TR-2011-016, Massachusetts Institute of Technology, Computer Science and Artificial Intelligence Laboratory, 2011.
[6] Jeffrey O. Kephart and Rajarshi Das. Achieving Self-Management via Utility Functions. IEEE Internet Computing, 11(1):40–48, 2007.
[7] Gerald Tesauro. Reinforcement Learning in Autonomic Computing: A Manifesto and Case Studies. IEEE Internet Computing, 11(1):22–30, 2007.
[8] Robert W. Wisniewski, Dilma Da Silva, Marc A. Auslander, Orran Krieger, Michal Ostrowski, and Bryan S. Rosenburg. K42: Lessons for the OS Community. SIGOPS Oper. Syst. Rev., 42(1), 2008.
[9] Akbar Sharifi, Shekhar Srikantaiah, Asit K. Mishra, Mahmut Kandemir, and Chita R. Das. METE: Meeting End-to-End QoS in Multicores through System-Wide Resource Management. In Proceedings of the ACM SIGMETRICS Joint International Conference on Measurement and Modeling of Computer Systems, 2011.
[10] Filippo Sironi, Davide B. Bartolini, Simone Campanoni, Fabio Cancare, Henry Hoffmann, Donatella Sciuto, and Marco D. Santambrogio. Metronome: Operating System Level Performance Management via Self-Adaptive Computing. In Proceedings of the 49th Design Automation Conference, 2012.
[11] Juan A. Colmenares, Sarah Bird, Henry Cook, Paul Pearce, David Zhu, John Shalf, Steven Hofmeyr, Krste Asanovic, and John Kubiatowicz. Resource Management in the Tessellation Manycore OS. In Proceedings of the 2nd Workshop on Hot Topics in Parallelism, 2010.
[12] Markus C. Huebscher and Julie A. McCann. A Survey of Autonomic Computing – Degrees, Models, and Applications. ACM Comput. Surv., 40(3), 2008.
[13] Andrew S. Tanenbaum. Modern Operating Systems. Prentice Hall PTR, Upper Saddle River, NJ, USA, 2nd edition, 2001.
[14] Livio Soares and Michael Stumm. FlexSC: Flexible System Call Scheduling with Exception-Less System Calls. In Proceedings of the 9th USENIX Conference on Operating Systems Design and Implementation, 2010.
[15] Livio Soares and Michael Stumm. Exception-Less System Calls for Event-Driven Servers. In Proceedings of the 2011 USENIX Annual Technical Conference, 2011.
[16] Eddy Z. Zhang, Yunlian Jiang, and Xipeng Shen. Does Cache Sharing on Modern CMP Matter to the Performance of Contemporary Multithreaded Programs? In Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2010.
[17] Princeton University. PARSEC Benchmark Suite website, November 2011.
[18] Calin Cascaval, Evelyn Duesterwald, Peter F. Sweeney, and Robert W. Wisniewski. Performance and Environment Monitoring for Continuous Program Optimization. IBM Journal of Research and Development, 50(2.3), 2006.
[19] Big Buck Bunny. http://www.bigbuckbunny.org/.
[20] Rose Liu, Kevin Klues, Sarah Bird, Steven Hofmeyr, Krste Asanović, and John Kubiatowicz. Tessellation: Space-Time Partitioning in a Manycore Client OS. In Proceedings of the 1st Workshop on Hot Topics in Parallelism, 2009.
[21] Henry Hoffmann, Jonathan Eastep, Marco D. Santambrogio, Jason E. Miller, and Anant Agarwal. Application Heartbeats for Software Performance and Health. In Proceedings of the 15th Symposium on Principles and Practice of Parallel Programming, pages 347–348, 2010.
[22] Gerald Tesauro, David M. Chess, William E. Walsh, Rajarshi Das, Alla Segal, Ian Whalley, Jeffrey O. Kephart, and Steve R. White. A Multi-Agent Systems Approach to Autonomic Computing. In Proceedings of the Third International Joint Conference on Autonomous Agents and Multiagent Systems, pages 464–471, 2004.
[23] G. Tesauro, R. Das, W. E. Walsh, and J. O. Kephart. Utility-Function-Driven Resource Allocation in Autonomic Systems. In Proceedings of the Second International Conference on Autonomic Computing, pages 342–343, 2005.
[24] G. Tesauro, N. K. Jong, R. Das, and M. N. Bennani. A Hybrid Reinforcement Learning Approach to Autonomic Resource Allocation. In Proceedings of the Third International Conference on Autonomic Computing, pages 65–73, 2006.
[25] Jin Heo and Tarek Abdelzaher. AdaptGuard: Guarding Adaptive Systems from Instability. In Proceedings of the 6th International Conference on Autonomic Computing, pages 77–86, 2009.
[26] Jichuan Chang and Gurindar S. Sohi. Cooperative Cache Partitioning for Chip Multiprocessors. In Proceedings of the 21st Annual International Conference on Supercomputing, 2007.
[27] Shekhar Srikantaiah, Reetuparna Das, Asit K. Mishra, Chita R. Das, and Mahmut Kandemir. A Case for Integrated Processor-Cache Partitioning in Chip Multiprocessors. In Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, 2009.
[28] S. Srikantaiah, E. Kultursay, Tao Zhang, M. Kandemir, M. J. Irwin, and Yuan Xie. MorphCache: A Reconfigurable Adaptive Multi-Level Cache Hierarchy. In 2011 IEEE 17th International Symposium on High Performance Computer Architecture, 2011.
[29] Mahmut Kandemir, Taylan Yemliha, and Emre Kultursay. A Helper Thread Based Dynamic Cache Partitioning Scheme for Multithreaded Applications. In Proceedings of the 48th Design Automation Conference, 2011.
[30] Akbar Sharifi, Shekhar Srikantaiah, Mahmut Kandemir, and Mary Jane Irwin. Courteous Cache Sharing: Being Nice to Others in Capacity Management. In Proceedings of the 49th Annual Design Automation Conference, 2012 (to appear).
[31] Nauman Rafique, Won-Taek Lim, and Mithuna Thottethodi. Effective Management of DRAM Bandwidth in Multicore Processors. In Proceedings of the 16th International Conference on Parallel Architecture and Compilation Techniques, 2007.
[32] Onur Mutlu and Thomas Moscibroda. Stall-Time Fair Memory Access Scheduling for Chip Multiprocessors. In Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture, 2007.
[33] Engin Ipek, Onur Mutlu, José F. Martínez, and Rich Caruana. Self-Optimizing Memory Controllers: A Reinforcement Learning Approach. In Proceedings of the 35th Annual International Symposium on Computer Architecture, 2008.
[34] Fang Liu and Yan Solihin. Studying the Impact of Hardware Prefetching and Bandwidth Partitioning in Chip-Multiprocessors. In Proceedings of the ACM SIGMETRICS Joint International Conference on Measurement and Modeling of Computer Systems, 2011.
[35] Julita Corbalán, Xavier Martorell, and Jesús Labarta. Performance-Driven Processor Allocation. In Proceedings of the 4th Symposium on Operating System Design and Implementation, 2000.
[36] M. Maggio, H. Hoffmann, M. D. Santambrogio, A. Agarwal, and A. Leva. Controlling Software Applications via Resource Allocation within the Heartbeats Framework. In Proceedings of the 49th Conference on Decision and Control, pages 3736–3741, 2010.
[37] Ramazan Bitirgen, Engin Ipek, and Jose F. Martinez. Coordinated Management of Multiple Interacting Resources in Chip Multiprocessors: A Machine Learning Approach. In Proceedings of the 41st Annual IEEE/ACM International Symposium on Microarchitecture, 2008.
