Cloud Computing All Unit Notes
DEPARTMENT OF CSE
Panimalar Institute of Technology CS8791 CLOUD COMPUTING
UNIT I INTRODUCTION
Introduction to Cloud Computing – Definition of Cloud – Evolution of Cloud Computing –
Underlying Principles of Parallel and Distributed Computing – Cloud Characteristics – Elasticity
in Cloud – On-demand Provisioning.
1.1 INTRODUCTION
EVOLUTION OF DISTRIBUTED COMPUTING
Grids enable access to shared computing power and storage capacity from your desktop.
Clouds enable access to leased computing power and storage capacity from your desktop.
• Grids are an open source technology. Resource users and providers alike can understand
and contribute to the management of their grid
• Clouds are a proprietary technology. Only the resource provider knows exactly how
their cloud manages data, job queues, security requirements and so on.
• The concept of grids was proposed in 1995. The Open Science Grid (OSG) started in 1995,
and the EDG (European Data Grid) project began in 2001.
• In the late 1990s, Oracle and EMC offered early private cloud solutions. However, the
term cloud computing didn't gain prominence until 2007.
SCALABLE COMPUTING OVER THE INTERNET
Instead of using a centralized computer to solve computational problems, a parallel and
distributed computing system uses multiple computers to solve large-scale problems over the
Internet. Thus, distributed computing becomes data-intensive and network-centric.
The Age of Internet Computing
o High-performance computing (HPC) applications are no longer optimal for measuring
system performance
o The emergence of computing clouds instead demands high-throughput computing (HTC)
systems built with parallel and distributed computing technologies
o We have to upgrade data centers using fast servers, storage systems, and high-bandwidth
networks.
The Platform Evolution
o From 1950 to 1970, a handful of mainframes, including the IBM 360 and CDC 6400, were
built to satisfy the demands of large businesses and government organizations
o From 1960 to 1980, lower-cost minicomputers such as the DEC PDP 11 and VAX series
became popular among small businesses and on college campuses
o From 1970 to 1990, we saw widespread use of personal computers built with VLSI
microprocessors.
o From 1980 to 2000, massive numbers of portable computers and pervasive devices
appeared in both wired and wireless applications
o Since 1990, the use of both HPC and HTC systems hidden in clusters, grids, or
Internet clouds has proliferated
The trend is to place more emphasis on HTC applications than on HPC applications. Clustering
and P2P technologies lead to the development of computational grids or data grids.
For many years, HPC systems have emphasized raw speed performance. The speed of
HPC systems increased from Gflops in the early 1990s to Pflops in 2010.
The development of market-oriented high-end computing systems is undergoing a
strategic change from an HPC paradigm to an HTC paradigm. This HTC paradigm
pays more attention to high-flux computing. The main application for high-flux
computing is in Internet searches and web services by millions or more users
simultaneously. The performance goal thus shifts to measure high throughput or the
number of tasks completed per unit of time. HTC technology needs to not only
improve in terms of batch processing speed, but also address the acute problems of
cost, energy savings, security, and reliability at many data and enterprise computing
centers.
Advances in virtualization make it possible to see the growth of Internet clouds as a
new computing paradigm. The maturity of radio-frequency identification (RFID),
Global Positioning System (GPS), and sensor technologies has triggered the
development of the Internet of Things (IoT). These new paradigms are only briefly
introduced here.
The high-technology community has argued for many years about the precise
definitions of centralized computing, parallel computing, distributed computing, and
cloud computing. In general, distributed computing is the opposite of centralized
computing. The field of parallel computing overlaps with distributed computing to a
great extent, and cloud computing overlaps with distributed, centralized, and parallel
computing.
Terms
Centralized computing
This is a computing paradigm by which all computer resources are centralized in
one physical system. All resources (processors, memory, and storage) are fully shared and
tightly coupled within one integrated OS. Many data centers and supercomputers are
centralized systems, but they are used in parallel, distributed, and cloud computing
applications.
• Parallel computing
In parallel computing, all processors are either tightly coupled with centralized
shared memory or loosely coupled with distributed memory. Inter processor
communication is accomplished through shared memory or via message passing.
A computer system capable of parallel computing is commonly known as a parallel
computer. Programs running in a parallel computer are called parallel programs. The
process of writing parallel programs is often referred to as parallel programming.
• Distributed computing This is a field of computer science/engineering that studies
distributed systems. A distributed system consists of multiple autonomous computers,
each having its own private memory, communicating through a computer network.
Information exchange in a distributed system is accomplished through message passing.
A computer program that runs in a distributed system is known as a distributed program.
The process of writing distributed programs is referred to as distributed programming.
• Cloud computing An Internet cloud of resources can be either a centralized or a
distributed computing system. The cloud applies parallel or distributed computing, or
both. Clouds can be built with physical or virtualized resources over large data centers
that are centralized or distributed. Some authors consider cloud computing to be a form
of utility computing or service computing . As an alternative to the preceding terms,
some in the high-tech community prefer the term concurrent computing or concurrent
programming. These terms typically refer to the union of parallel computing and
distributed computing, although biased practitioners may interpret them differently.
• Ubiquitous computing refers to computing with pervasive devices at any place and time
using wired or wireless communication. The Internet of Things (IoT) is a networked
connection of everyday objects including computers, sensors, humans, etc. The IoT is
supported by Internet clouds to achieve ubiquitous computing with any object at any
place and time. Finally, the term Internet computing is even broader and covers all
computing paradigms over the Internet. This book covers all the aforementioned
computing paradigms, placing more emphasis on distributed and cloud computing and
their working systems, including the clusters, grids, P2P, and cloud systems.
Internet of Things
• The traditional Internet connects machines to machines or web pages to web pages. The
concept of the IoT was introduced in 1999 at MIT .
• The IoT refers to the networked interconnection of everyday objects, tools, devices, or
computers. One can view the IoT as a wireless network of sensors that interconnect all
things in our daily life.
• It allows objects to be sensed and controlled remotely across existing network
infrastructure
Figure 1.2 shows the architecture of a typical server cluster built around a low-latency,
high-bandwidth interconnection network. This network can be as simple as a SAN (e.g., Myrinet)
or a LAN (e.g., Ethernet).
• To build a larger cluster with more nodes, the interconnection network can be built with
multiple levels of Gigabit Ethernet, or InfiniBand switches.
• Through hierarchical construction using a SAN, LAN, or WAN, one can build scalable
clusters with an increasing number of nodes. The cluster is connected to the Internet via a
virtual private network (VPN) gateway.
• The gateway IP address locates the cluster. The system image of a computer is decided
by the way the OS manages the shared cluster resources.
Most clusters have loosely coupled node computers. All resources of a server node are
managed by their own OS. Thus, most clusters have multiple system images as a result of having
many autonomous nodes under different OS control.
1.3.1.2 Single-System Image (SSI)
• Ideal cluster should merge multiple system images into a single-system image (SSI).
• Cluster designers desire a cluster operating system or some middleware to support SSI at
various levels, including the sharing of CPUs, memory, and I/O across all cluster nodes.
An SSI is an illusion created by software or hardware that presents a collection of resources as
one integrated, powerful resource. SSI makes the cluster appear like a single machine to the user.
A cluster with multiple system images is nothing but a collection of independent computers.
1.3.1.3 Hardware, Software, and Middleware Support
• Clusters exploring massive parallelism are commonly known as MPPs. Almost all HPC
clusters in the Top 500 list are also MPPs.
• The building blocks are computer nodes (PCs, workstations, servers, or SMP), special
communication software such as PVM, and a network interface card in each computer
node.
Most clusters run under the Linux OS. The computer nodes are interconnected by a high-
bandwidth network (such as Gigabit Ethernet, Myrinet, InfiniBand, etc.). Special cluster
middleware supports are needed to create SSI or high availability (HA). Both sequential and
parallel applications can run on the cluster, and special parallel environments are needed to
facilitate use of the cluster resources. For example, distributed memory has multiple images.
Users may want all distributed memory to be shared by all servers by forming distributed shared
memory (DSM). Many SSI features are expensive or difficult to achieve at various cluster
operational levels. Instead of achieving SSI, many clusters are loosely coupled machines. Using
virtualization, one can build many virtual clusters dynamically, upon user demand.
Reasons to adopt the cloud for upgraded Internet applications and web services:
1. Desired location in areas with protected space and higher energy efficiency
2. Sharing of peak-load capacity among a large pool of users, improving overall utilization
3. Separation of infrastructure maintenance duties from domain-specific application development
4. Significant reduction in cloud computing cost, compared with traditional computing
paradigms
5. Cloud computing programming and application development
6. Service and data discovery and content/service distribution
7. Privacy, security, copyright, and reliability issues
8. Service agreements, business models, and pricing policies
Cloud computing is using the Internet to access someone else's software running on
someone else's hardware in someone else's data center.
The user sees only one resource (hardware, OS) but virtually uses multiple OS and
hardware resources.
Cloud architecture effectively uses virtualization
A model of computation and data storage based on “pay as you go” access to “unlimited”
remote data center capabilities
A cloud infrastructure provides a framework to manage scalable, reliable, on-demand
access to applications
Cloud services provide the “invisible” backend to many of our mobile applications
High level of elasticity in consumption
Historical roots in today’s Internet apps
Search, email, social networks, e-com sites
File storage (Live Mesh, Mobile Me)
1.2 Definition
Essential Characteristics 3
Resource pooling.
◦ The provider’s computing resources are pooled to serve multiple consumers using
a multi-tenant model, with different physical and virtual resources dynamically
assigned and reassigned according to consumer demand.
Essential Characteristics 4
Rapid elasticity.
◦ Capabilities can be rapidly and elastically provisioned - in some cases
automatically - to quickly scale out; and rapidly released to quickly scale in.
◦ To the consumer, the capabilities available for provisioning often appear to be
unlimited and can be purchased in any quantity at any time.
Essential Characteristics 5
Measured service.
◦ Cloud systems automatically control and optimize resource usage by leveraging a
metering capability at some level of abstraction appropriate to the type of service.
◦ Resource usage can be monitored, controlled, and reported - providing
transparency for both the provider and consumer of the service.
The consumer does not manage or control the underlying cloud infrastructure including
network, servers, operating systems, storage, data or even individual application
capabilities, with the possible exception of limited user-specific application configuration
settings.
SaaS providers
Google's Gmail, Docs, Talk, etc.
Microsoft's Hotmail, SharePoint
Salesforce
Yahoo, Facebook
Infrastructure as a Service (IaaS)
IaaS is the delivery of technology infrastructure (mostly hardware) as an on-demand,
scalable service
◦ Usually billed based on usage
◦ Usually a multi-tenant virtualized environment
◦ Can be coupled with Managed Services for OS and application support
◦ The user can choose the OS, storage, deployed applications, and networking components
Consumer is able to deploy and run arbitrary software, which may include operating
systems and applications.
The consumer does not manage or control the underlying cloud infrastructure but has
control over operating systems, storage, deployed applications, and possibly limited
control of select networking components (e.g., host firewalls).
IaaS providers
Amazon Elastic Compute Cloud (EC2)
◦ Each instance provides 1-20 processors, up to 16 GB RAM, and 1.69 TB storage
RackSpace Hosting
◦ Each instance provides a 4-core CPU, up to 8 GB RAM, and 480 GB storage
Joyent Cloud
◦ Each instance provides 8 CPUs, up to 32 GB RAM, and 48 GB storage
Go Grid
◦ Each instance provides 1-6 processors, up to 15 GB RAM, and 1.69 TB storage
PaaS providers
Google App Engine
◦ Python, Java, Eclipse
Microsoft Azure
◦ .Net, Visual Studio
Sales Force
◦ Apex, Web wizard
TIBCO,
VMware,
Zoho
Can be slow:
◦ Even with a fast connection, web-based applications can sometimes be slower
than accessing a similar software program on your desktop PC.
Disparate Protocols:
◦ Each cloud system uses different protocols and different APIs; standards are yet to
evolve.
II Hardware Evolution
• In 1930, binary arithmetic was developed
➢ It laid the foundation for computer processing technology, terminology, and programming languages.
• In 1939, the electronic computer was developed
➢ Computations were performed using vacuum-tube technology.
• In 1941, Konrad Zuse's Z3 was developed
➢ It supported both floating-point and binary arithmetic.
There are four generations
• First Generation Computers
• Second Generation Computers
• Third Generation Computers
• Fourth Generation Computers
a. First Generation Computers
Time Period : 1942 to 1955
Technology : Vacuum Tubes
Size : Very Large System
Processing : Very Slow
Examples:
1. ENIAC (Electronic Numerical Integrator and Computer)
2. EDVAC (Electronic Discrete Variable Automatic Computer)
Advantages:
• It made use of vacuum tubes, which were the advanced technology at that time
• Computations were performed in milliseconds.
Disadvantages:
• Very big in size; the weight was about 30 tons.
• Very costly.
• Required high power consumption.
• Generated a large amount of heat.
Advantages:
• Faster in computation, and the size was reduced as compared to the previous generation of
computers. The heat generated was small.
• Less maintenance is required.
Disadvantages:
• The Microprocessor design and fabrication are very complex.
• Air Conditioning is required in many cases
• NLS was designed to cross-reference research papers for sharing among geographically
distributed researchers.
• In the late 1980s and early 1990s, the Web was developed in Europe by Tim Berners-Lee and Robert Cailliau
d. Building a Common Interface to the Internet
• Berners-Lee developed the first web browser featuring an integrated editor that could
create hypertext documents.
• Following this initial success, Berners-Lee enhanced the server and browser by adding
support for FTP (File Transfer Protocol).
• The Globus Toolkit is an open source software toolkit used for building grid systems and
applications
• Early examples of MPP systems were the Distributed Array Processor, the Goodyear
MPP, the Connection Machine, and the Ultracomputer
• MPP machines are not easy to program, but for certain applications, such as data mining,
they are the best solution
• The computing era started with development in hardware architectures, which actually
enabled the creation of system software – particularly in the area of compilers and
operating systems – which support the management of such systems and the
development of applications
• The terms parallel computing and distributed computing are often used interchangeably,
even though they mean slightly different things.
• The term parallel implies a tightly coupled system, whereas distributed refers to a
wider class of systems, including those that are tightly coupled.
• More precisely, the term parallel computing refers to a model in which the
computation is divided among several processors sharing the same memory.
• The architecture of a parallel computing system is often characterized by the
homogeneity of components: each processor is of the same type and has the same
capability as the others.
• The shared memory has a single address space, which is accessible to all the processors.
• Parallel programs are then broken down into several units of execution that can be
allocated to different processors and can communicate with each other by means of
shared memory.
• Originally, parallel systems were considered to be those architectures that featured multiple
processors sharing the same physical memory and that were considered a single
computer.
– Over time, these restrictions have been relaxed, and parallel systems now include
all architectures that are based on the concept of shared memory, whether this is
physically present or created with the support of libraries, specific hardware, and
a highly efficient networking infrastructure.
– For example, a cluster whose nodes are connected through an InfiniBand
network and configured with a distributed shared memory system can be considered
a parallel system.
• The term distributed computing encompasses any architecture or system that allows the
computation to be broken down into units and executed concurrently on different
computing elements, whether these are processors on different nodes, processors on the
same computer, or cores within the same processor.
• Distributed computing includes a wider range of systems and applications than parallel
computing and is often considered a more general term.
• Even though it is not a rule, the term distributed often implies that the locations of the
computing elements are not the same and such elements might be heterogeneous in terms
of hardware and software features.
• Classic examples of distributed computing systems are
– Computing Grids
– Internet Computing Systems
a. Parallel Processing
• Processing of multiple tasks simultaneously on multiple processors is called parallel
processing.
• A parallel program consists of multiple active processes (tasks) simultaneously solving
a given problem.
• A given task is divided into multiple subtasks using a divide-and-conquer technique, and
each subtask is processed on a different central processing unit (CPU).
• Programming on a multiprocessor system using the divide-and-conquer technique is called
parallel programming.
• Many applications today require more computing power than a traditional sequential
computer can offer.
• Parallel Processing provides a cost effective solution to this problem by increasing the
number of CPUs in a computer and by adding an efficient communication system
between them.
• The workload can then be shared between different processors. This setup results in
higher computing power and performance than a single-processor system offers (a small
sketch of the idea follows).
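As an illustration of the divide-and-conquer idea described above, here is a minimal sketch in Python (not part of the original notes; the data, chunk layout, and worker count are made up for illustration). Each subtask is handed to a separate worker process and the partial results are then combined.

```python
# Minimal divide-and-conquer sketch: split a task, run subtasks on several CPUs,
# then combine the partial results (illustrative values only).
from multiprocessing import Pool

def partial_sum(chunk):
    # Each subtask runs in a separate worker process (on a different CPU core).
    return sum(chunk)

if __name__ == "__main__":
    data = list(range(1_000_000))
    n_workers = 4
    # Divide: split the work into subtasks of roughly equal size.
    size = len(data) // n_workers
    chunks = [data[i * size:(i + 1) * size] for i in range(n_workers - 1)]
    chunks.append(data[(n_workers - 1) * size:])
    # Conquer: process each subtask in parallel, then combine the results.
    with Pool(n_workers) as pool:
        total = sum(pool.map(partial_sum, chunks))
    print(total)  # same result as sum(data), computed in parallel
```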
• heterogeneous computing.
In the master-slave approach, the master assigns the jobs to slave processing elements (PEs)
and, on completion, they inform the master, which in turn collects the results.
• These approaches can be utilized in different levels of parallelism.
d. Levels of Parallelism
• Levels of parallelism are decided based on the lumps of code (grain size) that can be a
potential candidate for parallelism.
• The table shows the levels of parallelism.
• All these approaches have a common goal
– To boost processor efficiency by hiding latency.
– To conceal latency, there must be another thread ready to run whenever a lengthy
operation occurs.
• The idea is to execute concurrently two or more single-threaded applications. Such as
compiling, text formatting, database searching, and device simulation.
Levels of Parallelism
e. Laws of Caution
• These laws study how much an application or a software system can gain from parallelism.
• In particular, what we need to keep in mind is that parallelism is used to perform multiple
activities together so that the system can increase its throughput or its speed.
• But the relations that control the increment of speed are not linear.
• For example, with n processors, the user expects the speed to increase by n times.
This is an ideal situation, but it rarely happens because of the communication overhead.
• Here are two important guidelines to take into account (a small sketch follows).
– Speed of computation is proportional to the square root of system cost; it never
increases linearly. Therefore, the faster a system becomes, the more
expensive it is to increase its speed.
– The speedup achieved by a parallel computer increases roughly as the logarithm of
the number of processors, not linearly with the processor count.
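To see why speedup does not grow linearly with the number of processors, the sketch below uses Amdahl's law, a standard formula that the notes do not state explicitly; the parallel fraction of 0.95 is an assumed value chosen only for illustration.

```python
# Illustrative sketch (not from the notes): Amdahl's law shows how speedup
# flattens out even when most of the work can be parallelized.
def amdahl_speedup(parallel_fraction: float, n_processors: int) -> float:
    """Speedup = 1 / ((1 - p) + p / n), where p is the parallelizable fraction."""
    p = parallel_fraction
    return 1.0 / ((1.0 - p) + p / n_processors)

for n in (2, 4, 16, 64):
    # Even with 95% of the work parallelizable, 64 processors give far less
    # than a 64-times speedup, because of the serial part and overheads.
    print(n, "processors ->", round(amdahl_speedup(0.95, n), 2), "x speedup")
```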
• The operating system layer provides the basic services for interprocess communication (IPC),
process scheduling and management, and resource management in terms of the file system
and local devices.
• Taken together these two layers become the platform on top of which specialized
software is deployed to turn a set of networked computers into a distributed system
• Although a distributed system comprises the interaction of several layers, the middleware
layer is the one that enables distributed computing, because it provides a coherent and
uniform runtime environment for applications.
• There are many different ways to organize the components that, taken together, constitute
such an environment.
• The interactions among these components and their responsibilities give structure to the
middleware and characterize its type or, in other words, define its architecture.
• Architectural styles aid in understanding and classifying the organization of software
systems in general and distributed computing in particular.
• The use of well-known standards at the operating system level and even more at the
hardware and network levels allows easy harnessing of heterogeneous components and
their organization into a coherent and uniform system.
• For example, network connectivity between different devices is controlled by standards,
which allow them to interact seamlessly.
• Design patterns help in creating a common knowledge within the community of software
engineers and developers as to how to structure the relevant components within an
application and understand the internal organization of software applications.
• Architectural styles do the same for the overall architecture of software systems.
• The architectural styles are classified into two major classes
• Software Architectural styles : Relates to the logical organization of the software.
• System Architectural styles: styles that describe the physical organization of
distributed software systems in terms of their major components.
Software Architectural Styles
• Software architectural styles are based on the logical arrangement of software
components.
• They are helpful because they provide an intuitive view of the whole system, despite its
physical deployment.
• They also identify the main abstractions that are used to shape the components of the
system and the expected interaction patterns between them.
Data Centered Architectures
• These architectures identify the data as the fundamental element of the software
system, and access to shared data is the core characteristics of the data-centered
architectures.
• Within the context of distributed and parallel computing systems, the integrity of data is
the overall goal for such systems.
• The repository architectural style is the most relevant reference model in this category.
It is characterized by two main components: the central data structure, which represents
the current state of the system, and a collection of independent components, which operate
on the central data.
• The ways in which the independent components interact with the central data structure
can be very heterogeneous.
• In particular, repository-based architectures differentiate and specialize further into
subcategories according to the choice of control discipline to apply to the shared data
structure. Of particular interest are databases and blackboard systems.
Blackboard Architectural Style
• The blackboard architectural style is characterized by three main components (a minimal
sketch is given after the list):
– Knowledge sources: These are entities that update the knowledge base that is
maintained in the blackboard.
– Blackboard: This represents the data structure that is shared among the knowledge
sources and stores the knowledge base of the application.
– Control: The control is the collection of triggers and procedures that govern the
interaction with the blackboard and update the status of the knowledge base.
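A minimal, hypothetical sketch of the blackboard style in Python is given below; the knowledge sources, their trigger conditions, and the simple control loop are all invented for illustration and are not part of the original notes.

```python
# Blackboard style sketch: knowledge sources update a shared data structure,
# and a control loop decides which source runs next.
class Blackboard:
    def __init__(self):
        self.knowledge = {}              # shared knowledge base

class KnowledgeSource:
    def __init__(self, name, condition, action):
        self.name = name
        self.condition = condition       # trigger: when should this source run?
        self.action = action             # the update it applies to the blackboard

def control_loop(blackboard, sources, steps=10):
    # Control: repeatedly pick a source whose trigger condition holds.
    for _ in range(steps):
        runnable = [s for s in sources if s.condition(blackboard.knowledge)]
        if not runnable:
            break
        runnable[0].action(blackboard.knowledge)

bb = Blackboard()
sources = [
    KnowledgeSource("seed", lambda k: "x" not in k, lambda k: k.update(x=1)),
    KnowledgeSource("double", lambda k: k.get("x", 0) < 8, lambda k: k.update(x=k["x"] * 2)),
]
control_loop(bb, sources)
print(bb.knowledge)   # {'x': 8}
```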
Data Flow Architectures
• Access to data is the core feature; data-flow styles explicitly incorporate the pattern of
data-flow, since their design is determined by an orderly motion of data from component
to component, which is the form of communication between them.
• Styles within this category differ in one of the following ways: how the control is exerted,
the degree of concurrency among components, and the topology that describes the flow
of data.
- Symmetric architectures in which all the components, called peers, play the same role
and incorporate both client and server capabilities of the client/server model.
- More precisely, each peer acts as a server when it processes requests from other peers
and as a client when it issues requests to other peers.
Peer-to-Peer architectural Style
Message-based communication
• The abstraction of message has played an important role in the evolution of the model
and technologies enabling distributed computing.
• The definition of distributed computing is the one in which components located at
networked computers communicate and coordinate their actions only by passing
messages. The term message, in this case, identifies any discrete amount of information
that is passed from one entity to another. It encompasses any form of data representation
that is limited in size and time, whether this is an invocation of a remote procedure, a
serialized object instance, or a generic message.
• The term message-based communication model can be used to refer to any model for
IPC.
• Several distributed programming paradigms eventually use message-based
communication despite the abstractions that are presented to developers for programming
the interactions of distributed components.
• Here are some of the most popular and important:
• Message Passing: This paradigm introduces the concept of a message as the main
abstraction of the model. The entities exchanging information explicitly encode in the form
of a message the data to be exchanged. The structure and the content of a message vary
according to the model. Examples of this model are the Message Passing Interface (MPI)
and OpenMP. (A small Python sketch of the idea appears after this list.)
• Remote Procedure Call (RPC): This paradigm extends the concept of procedure call
beyond the boundaries of a single process, thus triggering the execution of code in remote
processes.
• Distributed Objects: This is an implementation of the RPC model for the object-
oriented paradigm and contextualizes this feature for the remote invocation of methods
exposed by objects. Examples of distributed object infrastructures are Common Object
Request Broker Architecture (CORBA), Component Object Model (COM, DCOM, and
COM+), Java Remote Method Invocation (RMI), and .NET Remoting.
• Distributed agents and active objects: Programming paradigms based on agents and
active objects involve by definition the presence of instances, whether they are agents or
objects, despite the existence of requests.
• Web Service: An implementation of the RPC concept over HTTP; thus allowing the
interaction of components that are developed with different technologies. A Web service
is exposed as a remote object hosted on a Web server, and method invocations are
transformed into HTTP requests, using specific protocols such as Simple Object Access
Protocol (SOAP) or Representational State Transfer (REST).
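The sketch below illustrates the message-passing idea from the list above using Python's standard multiprocessing module; it is an analogue of the model (not MPI itself), and the message contents are made up for illustration.

```python
# Message-passing sketch: two processes exchange explicitly encoded messages
# over a pipe; no memory is shared between them.
from multiprocessing import Process, Pipe

def worker(conn):
    msg = conn.recv()                      # receive a message from the other process
    conn.send({"reply": msg["payload"] * 2})
    conn.close()

if __name__ == "__main__":
    parent, child = Pipe()
    p = Process(target=worker, args=(child,))
    p.start()
    parent.send({"payload": 21})           # the data to exchange is packed into a message
    print(parent.recv())                   # {'reply': 42}
    p.join()
```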
Classification
Elasticity solutions can be arranged in different classes based on
Scope
Policy
Purpose
Method
a.Scope
Elasticity can be implemented on any of the cloud layers.
Most commonly, elasticity is achieved on the IaaS level, where the resources to be
provisioned are virtual machine instances.
Other infrastructure services can also be scaled
On the PaaS level, elasticity consists in scaling containers or databases for instance.
Finally, both PaaS and IaaS elasticity can be used to implement elastic applications, be it
for private use or in order to be provided as a SaaS
The elasticity actions can be applied either at the infrastructure or application/platform
level.
The elasticity actions perform the decisions made by the elasticity strategy or
management system to scale the resources.
Google App Engine and Azure elastic pool are examples of elastic Platform as a Service
(PaaS).
Elasticity actions can be performed at the infrastructure level where the elasticity
controller monitors the system and takes decisions.
The cloud infrastructures are based on the virtualization technology, which can be VMs
or containers.
In the embedded elasticity, elastic applications are able to adjust their own resources
according to runtime requirements or due to changes in the execution flow.
There must be knowledge of the source code of the applications.
Application Map: The elasticity controller must have a complete map of the application
components and instances.
Code embedded: The elasticity controller is embedded in the application source code.
The elasticity actions are performed by the application itself.
While moving the elasticity controller into the application source code eliminates the use of
external monitoring systems, there must be a specialized controller for each application.
b.Policy
Elastic solutions can be either manual or automatic.
A manual elastic solution provides its users with tools to monitor their systems
and to add or remove resources, but leaves the scaling decision to them.
Automatic mode: All the actions are done automatically, and this could be classified into
reactive and proactive modes.
Elastic solutions can be either reactive or predictive
Reactive mode: The elasticity actions are triggered based on certain thresholds or rules; the
system reacts to the load (workload or resource utilization) and triggers actions to adapt to
changes accordingly.
An elastic solution is reactive when it scales a posteriori, based on a monitored change in
the system.
These are generally implemented by a set of Event-Condition-Action rules.
Proactive mode: This approach implements forecasting techniques, anticipates the future
needs and triggers actions based on this anticipation.
A predictive or proactive elasticity solution uses its knowledge of either recent history or
load patterns inferred from longer periods of time in order to predict the upcoming load
of the system and scale according to it.
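The following is a hypothetical sketch of a reactive (threshold-based) elasticity policy written as simple Event-Condition-Action rules; the thresholds, VM limits, and monitored utilization values are assumed for illustration. A proactive solution would feed a forecast of the load into the same kind of decision instead of the last measured value.

```python
# Reactive elasticity sketch: threshold rules decide whether to scale a VM pool.
def reactive_scaler(cpu_utilization: float, vm_count: int,
                    scale_out_at: float = 0.80, scale_in_at: float = 0.30,
                    min_vms: int = 1, max_vms: int = 10) -> int:
    """Return the new VM count after applying simple Event-Condition-Action rules."""
    if cpu_utilization > scale_out_at and vm_count < max_vms:
        return vm_count + 1          # Action: provision one more instance
    if cpu_utilization < scale_in_at and vm_count > min_vms:
        return vm_count - 1          # Action: release one instance
    return vm_count                  # Conditions not met: no action

# Example: monitored utilization values drive the scaling decisions a posteriori.
vms = 2
for load in (0.85, 0.90, 0.50, 0.20, 0.15):
    vms = reactive_scaler(load, vms)
    print(f"load {load:.2f} -> {vms} VMs")
```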
c.Purpose
An elastic solution can have many purposes.
The first one that comes to mind is naturally performance, in which case the focus should be
put on speed.
Another purpose for elasticity can also be energy efficiency, where using the minimum
amount of resources is the dominating factor.
Other solutions intend to reduce the cost by multiplexing either resource providers or
elasticity methods
Elasticity has different purposes such as improving performance, increasing resource
capacity, saving energy, reducing cost and ensuring availability.
Once we look to the elasticity objectives, there are different perspectives.
Cloud IaaS providers try to maximize their profit by minimizing the resources used while
offering a good Quality of Service (QoS), whereas PaaS providers seek to minimize the cost
they pay to the cloud.
The customers (end-users) seek to increase their Quality of Experience (QoE) and to
minimize their payments.
QoE is the degree of delight or annoyance of the user of an application or service
d.Method
Vertical elasticity changes the amount of resources linked to existing instances on the fly.
This can be done in two manners.
The first method consists in explicitly redimensioning a virtual machine instance, i.e.,
changing the quota of physical resources allocated to it.
This is however poorly supported by common operating systems as they fail to take into
account changes in CPU or memory without rebooting, thus resulting in service
interruption.
The second vertical scaling method involves VM migration: moving a virtual machine
instance to another physical machine with a different overall load changes its available
resources
Migration
Migration can also be considered a needed action to further allow vertical scaling
when there are not enough resources on the host machine.
It is also used for other purposes such as migrating a VM to a less loaded physical
machine just to guarantee its performance.
Several types of migration are deployed, such as live migration and non-live migration.
Live migration has two main approaches
post-copy
pre-copy
Post-copy migration suspends the migrating VM, copies minimal processor state to the
target host, resumes the VM and then begins fetching memory pages from the source.
In pre-copy approach, the memory pages are copied while the VM is running on the
source.
If some pages are changed (called dirty pages) during the memory copy process, they will
be recopied in successive rounds; when the pages are being dirtied faster than they can be
recopied, or the remaining set of dirty pages is small enough, the source VM is stopped.
The remaining dirty pages will then be copied to the destination VM.
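The following sketch (not a real hypervisor API; the page counts and dirtying behaviour are invented) illustrates the pre-copy rounds described above: pages dirtied while a round is being copied are recopied in the next round until the remaining set is small, after which the VM is stopped for the final copy.

```python
# Pre-copy live migration sketch: iterative copy rounds followed by stop-and-copy.
def pre_copy_migration(total_pages=1000, dirty_fraction=0.3,
                       stop_threshold=20, max_rounds=10):
    to_copy = total_pages                       # first round copies all memory pages
    for round_no in range(1, max_rounds + 1):
        # While this round's pages are transferred, the VM keeps running, so a
        # fraction of the copied pages is written to again ("dirty pages").
        dirtied = int(to_copy * dirty_fraction)
        print(f"round {round_no}: copied {to_copy} pages, {dirtied} dirtied again")
        to_copy = dirtied                        # only the dirty pages need recopying
        if to_copy <= stop_threshold:
            break
    # Stop-and-copy phase: pause the VM and transfer the last dirty pages.
    print(f"stop VM, copy the final {to_copy} pages, resume on the target host")

pre_copy_migration()
```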
Architecture
The architecture of the elasticity management solutions can be either centralized or
decentralized.
Centralized architecture has only one elasticity controller, i.e., the auto scaling system
that provisions and deprovisions resources.
Provider
Elastic solutions can be applied to a single or multiple cloud providers.
A single cloud provider can be either public or private with one or multiple regions or
datacenters.
Multiple clouds in this context means more than one cloud provider.
It includes hybrid clouds that can be private or public, in addition to the federated clouds
and cloud bursting.
Most of the elasticity solutions support only a single cloud provider
In order to achieve the goal, the cloud user has to request the cloud service provider to
provision the resources either statically or dynamically,
so that the cloud service provider will know how many instances of the resources and
what resources are required for a particular application.
By provisioning the resources, the QoS parameters like availability, throughput, security,
response time, reliability, performance, etc. must be achieved without violating the SLA.
There are two types
• Static Provisioning
• Dynamic Provisioning
Static Provisioning
For applications that have predictable and generally unchanging demands/workloads, it is
possible to use "static provisioning" effectively.
With advance provisioning, the customer contracts with the provider for services.
The provider prepares the appropriate resources in advance of start of service.
The customer is charged a flat fee or is billed on a monthly basis.
Dynamic Provisioning
In cases where demand by applications may change or vary, "dynamic provisioning"
techniques have been suggested whereby VMs may be migrated on-the-fly to new
compute nodes within the cloud.
The provider allocates more resources as they are needed and removes them when they
are not.
The customer is billed on a pay-per-use basis.
When dynamic provisioning is used to create a hybrid cloud, it is sometimes referred to
as cloud bursting.
Parameters for Resource Provisioning
Response time
Minimize Cost
Revenue Maximization
Fault tolerant
Reduced SLA Violation
Reduced Power Consumption
Response time: The resource provisioning algorithm designed must take minimal time to
respond when executing the task.
Minimize Cost: From the Cloud user point of view cost should be minimized.
Revenue Maximization: This is to be achieved from the Cloud Service Provider’s view.
Fault tolerant: The algorithm should continue to provide service in spite of failure of nodes.
Reduced SLA Violation: The algorithm designed must be able to reduce SLA violation.
Reduced Power Consumption: VM placement & migration techniques must lower power
consumption
Dynamic Provisioning Types
1. Local On-demand Resource Provisioning
2. Remote On-demand Resource Provisioning
Local On-demand Resource Provisioning
1. The Engine for the Virtual Infrastructure
The OpenNebula Virtual Infrastructure Engine
• OpenNebula creates a distributed virtualization layer
• Extend the benefits of VM Monitors from one to multiple resources
• Decouple the VM (service) from the physical location
• Transform a distributed physical infrastructure into a flexible and elastic virtual
infrastructure, which adapts to the changing demands of the VM (service) workloads
Cluster Partitioning
• Dynamic partition of the infrastructure
• Isolate workloads (several computing clusters)
• Dedicated HA partitions
Other Tools for VM Management
• VMware DRS, Platform Orchestrator, IBM Director, Novell ZENworks, Enomalism,
Xenoserver
• Advantages:
• Open-source (Apache license v2.0)
• Open and flexible architecture to integrate new virtualization technologies
• Support for the definition of any scheduling policy (consolidation, workload
balance, affinity, SLA)
• LRM-like CLI and API for the integration of third-party tools
UNIT II CLOUD ENABLING TECHNOLOGIES
2.1 Introduction
Web Service
Generic definition
• Web services are self-contained, modular business applications that have open, Internet-
oriented, standards-based interfaces.
• Other systems interact with the Web service using SOAP messages.
• Loosely-coupled
• Distributed Architecture
A web service interface generally consists of a collection of operations that can be used
by a client over the Internet.
The operations in a web service may be provided by a variety of different resources, for
example, programs, objects, or databases.
The key characteristic of (most) web services is that they can process XML-formatted
SOAP messages. An alternative is the REST approach.
Each web service uses its own service description to deal with the service-specific
characteristics of the messages it receives. Commercial examples include Amazon,
Yahoo, Google and eBay.
Remote Access
The service requester finds the service (on the service broker) and dynamically binds to the service
Enables ad-hoc collaboration and Enterprise Application Integration (EAI) within web-
based information systems
SOA is about how to design a software system that makes use of services of new or
legacy applications through their published or discoverable interfaces.
It promotes architectural styles such as loose coupling, published interfaces, and a standard
communication model in order to support this goal.
Properties of SOA
Logical view
Message orientation
Description orientation
Logical view
The SOA is an abstracted, logical view of actual programs, databases, and business processes.
The service is formally defined in terms of the messages exchanged between provider
agents and requester agents.
Message Orientation
The internal structure of providers and requesters includes the implementation language,
process structure, and even database structure.
Using the SOA discipline one does not and should not need to know how an agent
implementing a service is constructed.
By avoiding any knowledge of the internal structure of an agent, one can incorporate any
software component or application to adhere to the formal service definition.
Description orientation
Only those details that are exposed to the public and are important for the use of the
service should be included in the description.
Used in combination with SOAP and an XML Schema to provide Web services over
the Internet.
APIs are for software components; a way for software to interact with other software.
Web Services are a set of rules and technologies that enable two or more components on
the web to talk to each other.
HTTP is an application layer protocol for sending and receiving messages over a
network.
REST is an architectural style that dictates how distributed systems on the web should
communicate.
SOA is related to early efforts on the architecture style of large scale distributed systems,
particularly Representational State Transfer (REST).
Applications:
Advantage:
Simplicity
Figure 2.5 A simple REST interaction between user and server in HTTP specification
REST Principles
Self-Descriptive Message
Stateless Interactions
The RESTful web service exposes a set of resources which identify targets of interaction
with its clients.
Any information that can be named can be a resource, such as a document or image or a
temporal service.
URI is of type URL, providing a global addressing space for resources involved in an
interaction between components as well as facilitating service discovery.
Interaction with RESTful web services is done via the HTTP standard, a client/server
cacheable protocol, using its uniform set of methods (a small sketch follows the list):
PUT
GET
POST
DELETE
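The sketch below applies these four HTTP methods to a hypothetical resource using the third-party requests library; the URL, resource identifiers, and payloads are made up for illustration.

```python
# RESTful interaction sketch: each operation is an HTTP method on a resource URI.
import requests

base = "http://example.com/api/books"          # hypothetical resource collection

created = requests.post(base, json={"title": "Cloud Computing"})    # create a resource
print(created.status_code)
book_url = base + "/1"                          # the URI names an individual resource
print(requests.get(book_url).status_code)       # retrieve a representation
requests.put(book_url, json={"title": "Cloud Computing, 2nd ed."})  # replace/update it
requests.delete(book_url)                       # remove the resource
```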
Self-Descriptive Message
A REST message includes enough information to describe how to process the message.
This enables intermediaries to do more with the message without parsing the message
contents.
In REST, resources are decoupled from their representation so that their content can be
accessed in a variety of standard formats
Eg:- HTML, XML, MIME, plain text, PDF, JPEG, JSON, etc.
Metadata about the resource is available and can be used for various purposes.
❖ Cache control
❖ Authentication or authorization
❖ Access control.
Stateless Interactions
REST - Advantages
RESTful web services can be considered an alternative to the SOAP stack or "big web
services".
Simplicity
Lightweight nature
This allows client software to dynamically determine what a service does, the data types
that a service uses, how to invoke operations on the service, and the responses that the
service may return.
Once a web service is deployed, other applications and other web services can discover
and invoke the deployed service.
Since web services are remotely executed, they do not depend on resources residing on the
client system that calls them.
Other systems interact with the web service in a manner prescribed by its description
using SOAP messages, typically conveyed using HTTP with an XML serialization
SOAP provides a standard packaging structure for transmission of XML documents over
various Internet protocols, such as SMTP, HTTP, and FTP.
A SOAP message consists of a root element called Envelope. The Envelope contains a Header: a
container that can be extended by intermediaries.
The content of the payload will be marshaled by the sender’s SOAP engine and
unmarshaled at the receiver side, based on the XML schema that describes the structure
of the SOAP message.
WSDL describes the interface, a set of operations supported by a web service in a standard
format.
It standardizes the representation of input and output parameters of its operations as well as
the service’s protocol binding, the way in which the messages will be transferred on the wire.
Using WSDL enables disparate clients to automatically understand how to interact with a
web service.
UDDI provides a global registry for advertising and discovery of web services.
A simple and effective remote procedure call protocol which uses XML for encoding its
calls and HTTP as a transport mechanism.
A procedure is executed on the server and the value it returns is formatted in XML.
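A small, hedged sketch of the XML-RPC idea using Python's built-in xmlrpc modules is shown below; the function name, port, and arguments are chosen only for illustration.

```python
# XML-RPC sketch: the call is encoded in XML and carried over HTTP; the
# procedure runs in the server process, not in the caller.
from xmlrpc.server import SimpleXMLRPCServer
import threading
import xmlrpc.client

def add(x, y):
    return x + y                              # executed on the server side

server = SimpleXMLRPCServer(("localhost", 8000), logRequests=False)
server.register_function(add, "add")
threading.Thread(target=server.serve_forever, daemon=True).start()

proxy = xmlrpc.client.ServerProxy("http://localhost:8000/")
print(proxy.add(2, 3))                        # looks like a local call, runs remotely
```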
Web services can be composed together to make more complex web services and
workflows.
That goal can be the completion of a business transaction, or fulfillment of the job
of a service.
Web Service WS-Notification enables web services to use the publish and subscribe
messaging pattern.
Web Services Security (WS-Security) are set of protocols that ensure security for
SOAP-based messages by implementing the principles of confidentiality, integrity and
authentication.
The transaction specification is divided into two parts - short atomic transactions (AT)
and long business activity (BA).
The Java Message Service (JMS) API is a messaging standard that allows application
components based on the Java Platform Enterprise Edition (Java EE) to create, send,
receive, and read messages.
IIOP is used to enhance Internet and intranet communication for applications and
services.
The encodingStyle attribute refers to the URI address of an XML schema for encoding
elements of the message.
Each element of a SOAP message may have a different encoding, but unless specified,
the encoding of the whole message is as defined in the XML schema of the root element.
The header is an optional part of a SOAP message that may contain auxiliary
information.
The body of a SOAP request-response message contains the main information of the
conversation, formatted in one or more XML blocks.
In the example, the client is calling CreateBucket of the Amazon S3 web service interface.
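Since the figure with the example message is not reproduced here, the following is a schematic sketch of a SOAP envelope with a Header and a Body, built and posted with Python's standard library; the endpoint URL and namespace are hypothetical, and the message is not the exact Amazon S3 request.

```python
# SOAP envelope sketch: root Envelope element with an optional Header and a Body.
import urllib.request

envelope = """<?xml version="1.0"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Header/>
  <soap:Body>
    <CreateBucket xmlns="http://example.com/service">
      <Bucket>my-bucket</Bucket>
    </CreateBucket>
  </soap:Body>
</soap:Envelope>"""

req = urllib.request.Request(
    "http://example.com/soap",                # hypothetical SOAP endpoint
    data=envelope.encode("utf-8"),
    headers={"Content-Type": "text/xml; charset=utf-8"})
# response = urllib.request.urlopen(req)      # would convey the message over HTTP
```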
Subscribers can declare their interest in a subset of the whole information by issuing
subscriptions.
Publisher
Subscriber
The former provides facilities for the latter to register its interest in a specific topic or
event.
Specific conditions holding true on the publisher side can trigger the creation of messages
that are attached to a specific event.
Message will be available to all the subscribers that registered for the corresponding
event.
There are two major strategies for dispatching the event to the subscribers.
Push strategy:
It is the responsibility of the publisher to notify all the subscribers. Eg: Method
invocation.
Pull strategy :
The publisher simply makes available the message for a specific event.
It is the responsibility of the subscribers to check whether there are messages on the
events that are registered.
Subscriptions are used to filter out part of the events produced by publishers.
The publish/subscribe pattern describes how two different parts of a message-passing system
connect and communicate with each other.
Publishers
Eventbus/broker
Subscribers
Publishers:
Subscribers:
They ‘listen’ out for messages regarding topic/categories that they are interested in
without any knowledge of who the publishers are.
Event Bus:
Each subscriber only receives a subset of the messages that have been sent by the
Publisher.
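A minimal, hypothetical topic-based event bus in Python is sketched below: publishers and subscribers never reference each other directly, and the bus delivers each message only to the subscribers of its topic (push strategy). The topics and messages are made up for illustration.

```python
# Publish/subscribe sketch: the event bus decouples publishers from subscribers.
from collections import defaultdict

class EventBus:
    def __init__(self):
        self._subscribers = defaultdict(list)    # topic -> list of callbacks

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def unsubscribe(self, topic, callback):
        self._subscribers[topic].remove(callback)

    def publish(self, topic, message):
        # Push strategy: the bus notifies every subscriber registered on the topic.
        for callback in self._subscribers[topic]:
            callback(message)

bus = EventBus()
bus.subscribe("/cats", lambda m: print("cat fan got:", m))
bus.subscribe("/stocks", lambda m: print("trader got:", m))
bus.publish("/cats", "new cat video")            # only the /cats subscriber is notified
```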
The clients of this system are divided according to their role into publishers and
subscribers.
The interaction takes place through the nodes of the pub/sub system.
The type is generally one of the common primitive data types defined in programming
languages or query languages (e.g. integer, real, string, etc.).
A subscription is a filter over a portion of the event content (or the whole of it).
A subscriber installs and removes a subscription from the pub/sub system by executing
the subscribe() and unsubscribe() operations respectively.
This time encompasses the update of the internal data structures of the pub/sub system and the
network delay due to the routing of the subscription.
Three properties:
Safety (Validity): A subscriber cannot be notified for an event that has not been
previously published.
Liveness: The delivery of a notification for an event is guaranteed only for those
subscribers that subscribed at a time at least Tsub before the event was published.
Reliable delivery
Timeliness
Reliable delivery
Reliable delivery of an event means determining the subscribers that have to receive a
published event, as stated by the liveness property and delivering the event to all of them.
Timeliness
Real-time applications often require strict control over the time elapsed by a piece of
information to reach all its consumers.
A subscriber wants to trust the authenticity of the events it receives from the system:
they should be generated by a trusted publisher, and the information they contain should not
have been corrupted.
Subscribers also have to be trusted for what concerns the subscriptions they issue.
Subscription Models
Topic-based Model
A subscriber declares its interest for a particular topic to receive all events pertaining to
that topic.
Each topic corresponds to a logical channel ideally connecting each possible publisher to
all interested subscribers.
Subscribers only receive messages from logic channels they care about (and have
subscribed to).
In the type-based pub/sub variant, events are actually objects belonging to a specific type, which
can thus encapsulate attributes as well as methods.
This enforces type safety at the pub/sub system, rather than inside the application.
In the content-based model, the system allows subscribers to receive messages based on the
content of the messages.
Subscribers themselves must sort out junk messages from the ones they want.
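For comparison with the topic-based model, the following hypothetical sketch shows a content-based subscription, where the filter is a predicate over the event content rather than a topic name; the event fields and values are invented for illustration.

```python
# Content-based subscription sketch: delivery is decided by a filter over
# the event content, not by a topic/channel name.
subscription = lambda event: event.get("symbol") == "ACME" and event["price"] > 100

events = [
    {"symbol": "ACME", "price": 105},
    {"symbol": "ACME", "price": 95},
    {"symbol": "OTHER", "price": 300},
]
for e in events:
    if subscription(e):                 # only matching events are delivered
        print("notify subscriber:", e)
```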
Benefits
Loose coupling
The publisher is not aware of the number of subscribers, of the identities of the
subscribers, or of the message types that the subscribers are subscribed to.
Improved security
Specific applications can exchange messages directly, excluding other applications from
the message exchange.
Improved testability.
Topics usually reduce the number of messages that are required for testing.
Separation of concerns
Due to the simple nature of the architecture, developers can exercise fine-grained
separation of concerns by dividing up message types so that each serves a single, simple
purpose.
E.g., data with the topic “/cats” should only contain information about cats.
Subscribers need not concern themselves with the inner workings of a publisher.
Subscribers only interact with the publisher through the public API exposed by the
publisher.
Drawbacks
Increased complexity.
Organizations that maintain many topics usually have formal procedures for their use.
Decreased performance
This overhead increases the latency of message exchange, and this latency decreases
performance.
The publish/subscribe model introduces high semantic coupling in the messages passed
by the publishers to the subscribers.
In order to change the structure of the messages, all of the subscribers must be altered to
accept the changed format.
Instability of Delivery
The publisher does not have perfect knowledge of the status of the systems listening to
the messages.
If a logger subscribing to the ‘Critical’ message type crashes or gets stuck in an error
state, then the ‘Critical’ messages may be lost!
Then any services depending on the error messages will be unaware of the problems with
the publisher.
Applications
➢ Software Distribution
➢ Internet TV
➢ Audio or Video-conferencing
➢ Virtual Classroom
It can also be used in even larger size group communication applications, such as broadcasting
and content distribution.
➢ Market Tracker
2.5 VIRTUALIZATION
• Virtualization hides the physical characteristics of computing resources from their users,
applications, or end users.
• This includes making a single physical resource (such as a server, an operating system, an
application, or storage device) appear to function as multiple virtual resources.
• It can also include making multiple physical resources (such as storage devices or
servers) appear as a single virtual resource.
• In computing, virtualization refers to the act of creating a virtual (rather than actual)
version of something, like computer hardware platforms, operating systems, storage
devices, and computer network resources
Advantages of Virtualization:
1. Reduced Costs.
5. Increase Availability
6. Save energy
Disadvantages of Virtualization:
1. Extra Costs.
2. Software Licensing.
Figure 2.11 The architecture of a computer system before and after Virtualization
Figure 2.12 Virtualization ranging from hardware to applications in five abstraction levels.
Instruction Set Architecture (ISA) Level:
At the ISA level, virtualization is performed by emulating a given ISA using the ISA of the host
machine. This makes it possible to run legacy binary code written for various processors on any
given new hardware host machine. Instruction set emulation leads to virtual ISAs created on any
hardware machine.
The basic emulation method is through code interpretation. An interpreter program
interprets the source instructions to target instructions one by one. One source instruction may
require tens or hundreds of native target instructions to perform its function. Obviously, this
process is relatively slow. For better performance, dynamic binary translation is desired.
This approach translates basic blocks of dynamic source instructions to target
instructions. The basic blocks can also be extended to program traces or super blocks to increase
translation efficiency. Instruction set emulation requires binary translation and optimization. A
virtual instruction set architecture (V-ISA) thus requires adding a processor-specific software
translation layer to the compiler.
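The difference between one-by-one interpretation and basic-block translation can be seen in a
toy sketch. The two-instruction "source ISA" and the translation cache below are invented purely
for illustration; a real binary translator operates on machine code, not Python.

SOURCE_PROGRAM = ["INC", "INC", "DEC"]          # one basic block (assumed input)

def interpret(program):
    # Interpret source instructions one by one (the slow path).
    acc = 0
    for op in program:
        acc += 1 if op == "INC" else -1
    return acc

TRANSLATION_CACHE = {}

def translate_block(block):
    # "Translate" a whole basic block once into a native Python function,
    # then reuse the cached translation on later executions.
    key = tuple(block)
    if key not in TRANSLATION_CACHE:
        delta = sum(1 if op == "INC" else -1 for op in block)
        TRANSLATION_CACHE[key] = lambda acc: acc + delta
    return TRANSLATION_CACHE[key]

print(interpret(SOURCE_PROGRAM))            # 1, re-decoded on every run
print(translate_block(SOURCE_PROGRAM)(0))   # 1, decoded once and cached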
Hardware Abstraction Level:
Hardware-level virtualization is performed right on top of the bare hardware. The idea is
to virtualize a computer’s resources, such as its processors, memory, and I/O devices. The
intention is to upgrade the hardware utilization rate by multiple users concurrently.
Operating System Level:
This refers to an abstraction layer between traditional OS and user applications. OS-level
virtualization creates isolated containers on a single physical server and the OS instances to
utilize the hardware and software in datacenters.
The containers behave like real servers. OS-level virtualization is commonly used in
creating virtual hosting environments to allocate hardware resources among a large number of
mutually distrusting users. It is also used, to a lesser extent, in consolidating server hardware by
moving services on separate hosts into containers or VMs on one server.
Library Support Level:
Most applications use APIs exported by user level libraries rather than using lengthy
system calls by the OS. Since most systems provide well documented APIs, such an interface
becomes another candidate for virtualization.
Virtualization with library interfaces is possible by controlling the communication link
between applications and the rest of a system through API hooks. The software tool WINE has
implemented this approach to support Windows applications on top of UNIX hosts. Another
example is the vCUDA which allows applications executing within VMs to leverage GPU
hardware acceleration.
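As a loose analogy (not WINE's or vCUDA's actual mechanism), the Python sketch below hooks a
library call so that the application receives a virtualized answer through the same API; the
wrapper and its behaviour are assumptions made only for illustration.

import os

_original_getcwd = os.getcwd   # keep a handle on the real library call

def hooked_getcwd():
    # A virtualization layer could remap, log, or forward the request here.
    return "/virtual-root" + _original_getcwd()

os.getcwd = hooked_getcwd      # install the API hook
print(os.getcwd())             # the application sees the virtualized answer
os.getcwd = _original_getcwd   # remove the hook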
User-Application Level:
Virtualization at the application level virtualizes an application as a VM. On a traditional
OS, an application often runs as a process. Therefore, application-level virtualization is also
known as process-level virtualization. The most popular approach is to deploy high-level
language (HLL) VMs.
Advantages of OS Extensions
(1) VMs at the operating system level have minimal startup/shutdown costs, low resource
requirements, and high scalability.
(2) For an OS-level VM, it is possible for a VM and its host environment to synchronize
state changes when necessary.
These benefits can be achieved via two mechanisms of OS-level virtualization:
(1) All OS-level VMs on the same physical machine share a single operating system kernel
(2) The virtualization layer can be designed in a way that allows processes in VMs to access as
many resources of the host machine as possible, but never to modify them.
A micro-kernel hypervisor includes only the basic and unchanging functions (such as
physical memory management and processor scheduling). The device drivers and other
changeable components are outside the hypervisor. A monolithic hypervisor implements all the
aforementioned functions, including those of the device drivers.
Therefore, the size of the hypervisor code of a micro-kernel hypervisor is smaller than
that of a monolithic hypervisor. Essentially, a hypervisor must be able to convert physical
devices into virtual resources dedicated for the deployed VM to use.
The Xen Architecture:
The core components of a Xen system are the hypervisor, kernel, and applications. The
organization of the three components is important. Like other virtualization systems, many guest
OSes can run on top of the hypervisor. However, not all guest OSes are created equal, and one in
particular controls the others.
The guest OS, which has control ability, is called Domain 0, and the others are called
Domain U. Domain 0 is a privileged guest OS of Xen. It is first loaded when Xen boots without
any file system drivers being available. Domain 0 is designed to access hardware directly and
manage devices. Therefore, one of the responsibilities of Domain 0 is to allocate and map
hardware resources for the guest domains (the Domain U domains).
Full Virtualization:
With full virtualization, noncritical instructions run on the hardware directly while critical
instructions are discovered and replaced with traps into the VMM to be emulated by software.
Both the hypervisor and VMM approaches are considered full virtualization.
Figure 2.13 Indirect execution of complex instructions via binary translation of guest OS
requests using the VMM plus direct execution of simple instructions on the same host.
The method used in this emulation is called binary translation. Therefore, full
virtualization combines binary translation and direct execution. The guest OS is completely
decoupled from the underlying hardware. Consequently, the guest OS is unaware that it is being
virtualized. Binary translation employs a code cache to store translated hot instructions to
improve performance, but it increases the cost of memory usage.
Host-Based Virtualization:
An alternative VM architecture is to install a virtualization layer on top of the host OS.
This host OS is still responsible for managing the hardware. The guest OSes are installed and run
on top of the virtualization layer. Dedicated applications may run on the VMs. Certainly, some
other applications can also run with the host OS directly. This host-based architecture has some
distinct advantages, as enumerated next. First, the user can install this VM architecture without
modifying the host OS. The virtualizing software can rely on the host OS to provide device
drivers and other low level services. This will simplify the VM design and ease its deployment.
Second, the host-based approach appeals to many host machine configurations.
Compared to the hypervisor/VMM architecture, the performance of the host based architecture
may also be low. When an application requests hardware access, it involves four layers of
mapping which downgrades performance significantly.
Para-Virtualization Architecture:
When the x86 processor is virtualized, a virtualization layer is inserted between the
hardware and the OS. According to the x86 ring definitions, the virtualization layer should also
be installed at Ring 0. Para-virtualization replaces non-virtualizable instructions with
hypercalls that communicate directly with the hypervisor or VMM. However, when the guest OS
kernel is modified for virtualization, it can no longer run on the hardware directly.
Although para-virtualization reduces the overhead, it has incurred other problems. First,
its compatibility and portability may be in doubt, because it must support the unmodified OS as
well. Second, the cost of maintaining para-virtualized OSes is high, because they may require
deep OS kernel modifications. Finally, the performance advantage of para virtualization varies
greatly due to workload variations.
Unlike the full virtualization architecture, which intercepts and emulates privileged and
sensitive instructions at runtime, para-virtualization handles these instructions at compile time.
The guest OS kernel is modified to replace the privileged and sensitive instructions with
hypercalls to the hypervisor or VMM. Xen assumes such a para-virtualization architecture. The
guest OS running in a guest domain may run at Ring 1 instead of at Ring 0. This implies that the
guest OS may not be able to execute some privileged and sensitive instructions. The privileged
instructions are implemented by hypercalls to the hypervisor. After replacing the instructions
with hypercalls, the modified guest OS emulates the behavior of the original guest OS.
Memory Virtualization:
Virtual memory virtualization involves sharing the physical system memory and dynamically
allocating it to the memory of the VMs. That means a two-stage mapping process should be
maintained by the guest OS and the VMM, respectively: virtual memory to physical memory and physical memory to
machine memory. Furthermore, MMU virtualization should be supported, which is transparent to
the guest OS. The guest OS continues to control the mapping of virtual addresses to the physical
memory addresses of VMs. But the guest OS cannot directly access the actual machine memory.
The VMM is responsible for mapping the guest physical memory to the actual machine memory.
Figure 2.16 shows the two-level memory mapping procedure.
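A minimal sketch of this two-stage mapping, with made-up page numbers: the guest OS owns the
first table, the VMM owns the second, and only their composition reaches real machine memory.

guest_page_table = {0: 2, 1: 5}    # stage 1, maintained by the guest OS
vmm_page_table = {2: 9, 5: 7}      # stage 2, maintained by the VMM (invisible to the guest)

def translate(guest_virtual_page):
    guest_physical = guest_page_table[guest_virtual_page]   # virtual -> guest physical
    machine_page = vmm_page_table[guest_physical]           # guest physical -> machine
    return machine_page

print(translate(0))   # guest virtual page 0 ends up on machine page 9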
I/O Virtualization:
I/O virtualization involves managing the routing of I/O requests between virtual devices and the
shared physical hardware. There are three ways to implement I/O virtualization:
• Full device emulation
• Para virtualization
• Direct I/O
Full device emulation is the first approach for I/O virtualization. Generally, this
approach emulates well known, real-world devices. All the functions of a device or bus
infrastructure, such as device enumeration, identification, interrupts, and DMA, are replicated in
software. This software is located in the VMM and acts as a virtual device. The I/O access
requests of the guest OS are trapped in the VMM which interacts with the I/O devices.
A single hardware device can be shared by multiple VMs that run concurrently.
However, software emulation runs much slower than the hardware it emulates. The
para-virtualization method of I/O virtualization is typically used in Xen. It is also known as the split
driver model consisting of a frontend driver and a backend driver. The frontend driver is running
in Domain U and the backend driver is running in Domain 0. They interact with each other via a
block of shared memory. The frontend driver manages the I/O requests of the guest OSes and the
backend driver is responsible for managing the real I/O devices and multiplexing the I/O data of
different VMs. Although para I/O-virtualization achieves better device performance than full
device emulation, it comes with a higher CPU overhead.
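A rough sketch of the split driver idea described above (illustrative only; real Xen uses
shared-memory rings and event channels, for which a Python queue merely stands in).

import queue

shared_ring = queue.Queue()          # stand-in for the shared block of memory

def frontend_submit(request):
    # Frontend driver in Domain U: forwards the guest OS I/O request.
    shared_ring.put(request)

def backend_process():
    # Backend driver in Domain 0: multiplexes requests onto the real device.
    while not shared_ring.empty():
        req = shared_ring.get()
        print(f"backend issuing {req['op']} for VM {req['vm']} on the real device")

frontend_submit({"vm": "domU-1", "op": "read block 42"})
frontend_submit({"vm": "domU-2", "op": "write block 7"})
backend_process()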
Figure 2.17 Device emulation for I/O virtualization implemented inside the middle layer
that maps real I/O devices into the virtual devices for the guest device driver to use.
Using VMs in a cloud computing platform ensures extreme flexibility for users.
As the computing resources are shared by many users, a method is required to maximize the
users’ privileges and still keep them separated safely. Traditional sharing of cluster resources
depends on the user and group mechanism on a system. Such sharing is not flexible. Users
cannot customize the system for their special purposes. Operating systems cannot be changed.
The separation is not complete.
An environment that meets one user’s requirements often cannot satisfy another
user. Virtualization allows users to have full privileges while keeping them separate. Users have
full access to their own VMs, which are completely separate from other users’ VMs. Multiple
VMs can be mounted on the same physical server. Different VMs may run with different OSes.
We also need to establish the virtual disk storage and virtual networks needed by the VMs. The
virtualized resources form a resource pool.
The virtualization is carried out by special servers dedicated to generating the
virtualized resource pool. The virtualized infrastructure (black box in the middle) is built with
many virtualizing integration managers. These managers handle loads, resources, security, data,
and provisioning functions.
(3) VMs can be used to improve security through creation of sandboxes for running
applications with questionable reliability;
(4) Virtualized cloud platforms can apply performance isolation, letting providers offer
some guarantees and better QoS to customer applications.
The cloud computing resources are built into the data centers.
Data centers are typically owned and operated by a third-party provider.
Consumers do not need to know the underlying technologies
In a cloud, software becomes a service.
Cloud demands a high degree of trust of massive amounts of data retrieved from large
data centers.
The software infrastructure of a cloud platform must handle all resource management
and maintenance automatically.
Software must detect the status of each node server joining and leaving.
Cloud computing providers such as Google and Microsoft, have built a large number
of data centers.
Each data center may have thousands of servers.
The location of the data center is chosen to reduce power and cooling costs.
Layered Cloud Architectural Development
"pay-per-use model for enabling available, convenient and on-demand network access to a
shared pool of configurable computing resources (e.g., networks, servers, storage,
applications and services) that can be rapidly provisioned and released with minimal
management effort or service provider interaction."
Architecture
Architecture consists of 3 tiers
◦ Cloud Deployment Model
◦ Cloud Service Model
◦ Essential Characteristics of Cloud Computing.
Essential Characteristics 1
On-demand self-service.
◦ A consumer can unilaterally provision computing capabilities such as server
time and network storage as needed automatically, without requiring human
interaction with a service provider.
Essential Characteristics 2
Broad network access.
◦ Capabilities are available over the network and accessed through standard
mechanisms that promote use by heterogeneous thin or thick client platforms
(e.g., mobile phones, laptops, and PDAs) as well as other traditional or
cloud-based software services.
Essential Characteristics 3
Resource pooling.
◦ The provider’s computing resources are pooled to serve multiple consumers
using a multi-tenant model, with different physical and virtual resources
dynamically assigned and reassigned according to consumer demand.
Essential Characteristics 4
Rapid elasticity.
◦ Capabilities can be rapidly and elastically provisioned - in some cases
automatically - to quickly scale out; and rapidly released to quickly scale in.
◦ To the consumer, the capabilities available for provisioning often appear to be
unlimited and can be purchased in any quantity at any time.
Essential Characteristics 5
Measured service.
◦ Cloud systems automatically control and optimize resource usage by
leveraging a metering capability at some level of abstraction appropriate to the
type of service.
Resource usage can be monitored, controlled, and reported - providing transparency for both
the provider and consumer of the service.
• The audit may involve interactions with both the Cloud Consumer and the Cloud
Provider.
Cloud Consumer
The cloud consumer is the principal stakeholder for the cloud computing service.
A cloud consumer represents a person or organization that maintains a business
relationship with, and uses the service from a cloud provider.
The cloud consumer may be billed for the service provisioned, and needs to arrange
payments accordingly.
Example Services Available to a Cloud Consumer
The consumers of SaaS can be organizations that provide their members with access
to software applications, end users or software application administrators.
SaaS consumers can be billed based on the number of end users, the time of use, the
network bandwidth consumed, the amount of data stored or duration of stored data.
Cloud consumers of PaaS can employ the tools and execution resources provided by
cloud providers to develop, test, deploy and manage the applications.
PaaS consumers can be application developers or application testers who run and test
applications in cloud-based environments.
PaaS consumers can be billed according to processing, database storage and network
resources consumed.
Consumers of IaaS have access to virtual computers, network-accessible storage &
network infrastructure components.
The consumers of IaaS can be system developers, system administrators and IT
managers.
IaaS consumers are billed according to the amount or duration of the resources
consumed, such as CPU hours used by virtual computers, volume and duration of data
stored.
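A small illustrative calculation of such usage-based IaaS billing; the rates and the usage
figures below are assumptions, not any provider's actual prices.

def iaas_bill(cpu_hours, storage_gb_months, cpu_rate=0.05, storage_rate=0.02):
    # charge = CPU hours consumed plus volume/duration of data stored
    return cpu_hours * cpu_rate + storage_gb_months * storage_rate

# 720 CPU hours (one VM running for a month) plus 100 GB stored for that month.
print(f"Monthly charge: ${iaas_bill(720, 100):.2f}")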
Cloud Provider
A cloud provider is a person or an organization;
it is the entity responsible for making a service available to interested parties.
A Cloud Provider acquires and manages the computing infrastructure required for
providing the services.
It runs the cloud software that provides the services.
It makes arrangements to deliver the cloud services to the Cloud Consumers through network
access.
Cloud Auditor
A cloud auditor is a party that can perform an independent examination of cloud
service controls.
Audits are performed to verify conformance to standards through review of objective
evidence.
A cloud auditor can evaluate the services provided by a cloud provider in terms of
security controls, privacy impact, performance, etc.
Cloud Broker
Integration of cloud services can be too complex for cloud consumers to manage.
A cloud consumer may request cloud services from a cloud broker, instead of
contacting a cloud provider directly.
A cloud broker is an entity that manages the use, performance and delivery of cloud
services, and negotiates relationships between cloud providers and cloud consumers.
Services of cloud broker
Service Intermediation:
A cloud broker enhances a given service by improving some specific capability and
providing value-added services to cloud consumers.
Service Aggregation:
A cloud broker combines and integrates multiple services into one or more new
services.
The broker provides data integration and ensures the secure data movement between
the cloud consumer and multiple cloud providers.
Services of cloud broker
Service Arbitrage:
Service arbitrage is similar to service aggregation except that the services being
aggregated are not fixed.
Service arbitrage means a broker has the flexibility to choose services from multiple
agencies.
Eg: The cloud broker can use a credit-scoring service to measure and select an agency with
the best score.
Cloud Carrier
A cloud carrier acts as an intermediary that provides connectivity and transport of
cloud services between cloud consumers and cloud providers.
A public cloud is one in which the cloud infrastructure and computing resources are
made available to the general public over a public network.
A public cloud is meant to serve a multitude (huge number) of users, not a single
customer.
A fundamental characteristic of public clouds is multitenancy.
Multitenancy allows multiple users to work in a software environment at the same
time, each with their own resources.
Built over the Internet (i.e., the service provider offers resources, applications and storage to
the customers over the Internet) and can be accessed by any user.
Owned by service providers and are accessible through a subscription.
Best option for small enterprises, which are able to start their businesses without
large up-front (initial) investment.
By renting the services, customers are able to dynamically upsize or downsize their
IT according to the demands of their business.
Services are offered on a price-per-use basis.
Promotes standardization and preserves capital investment
Public clouds have geographically dispersed datacenters to share the load of users and
better serve them according to their locations
Provider is in control of the infrastructure
Examples:
o Amazon EC2 is a public cloud that provides Infrastructure as a Service
o Google AppEngine is a public cloud that provides Platform as a Service
o SalesForce.com is a public cloud that provides software as a service.
Advantage
Offers unlimited scalability – on demand resources are available to meet your
business needs.
Lower costs—no need to purchase hardware or software and you pay only for the
service you use.
No maintenance - Service provider provides the maintenance.
Offers reliability: Vast number of resources are available so failure of a system will
not interrupt service.
Services like SaaS, PaaS, IaaS are easily available on Public Cloud platform as it can
be accessed from anywhere through any Internet enabled devices.
Location independent – the services can be accessed from any location
Disadvantage
No control over privacy or security
Cannot be used for sensitive applications (government and military agencies
will not consider the public cloud)
Lacks complete flexibility (since dependent on the provider)
No stringent (strict) protocols regarding data management
3.3.2 Private Cloud
Cloud services are used by a single organization, which are not exposed to the public
Services are always maintained on a private network and the hardware and software
are dedicated only to single organization
Private cloud is physically located at
• Organization’s premises [on-site private clouds] (or)
• Outsourced (given) to a third party [outsourced private clouds]
It may be managed either by
• The cloud consumer organization (or)
• A third party
Private clouds are used by
• government agencies
• financial institutions
• Mid size to large-size organisations.
On-site private clouds
3.3.3 Hybrid Cloud
Built with both public and private clouds
It is a heterogeneous cloud resulting from combining private and public clouds.
Private clouds are used for
• sensitive applications, which are kept inside the organization’s network
• business-critical operations like financial reporting
Public clouds are used for
• other services, which are kept outside the organization’s network
• high volumes of data
• lower-security needs such as web-based email (Gmail, Yahoo! Mail, etc.)
The resources or services are temporarily leased for the time required and then
released. This practice is also known as cloud bursting.
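A minimal sketch of the cloud-bursting decision, assuming a fixed private capacity and an
overflow that is temporarily leased from the public cloud (all numbers are illustrative).

PRIVATE_CAPACITY = 50      # VMs available on-premises

def place_workload(demand_vms):
    private = min(demand_vms, PRIVATE_CAPACITY)
    burst_to_public = max(0, demand_vms - PRIVATE_CAPACITY)
    return private, burst_to_public

for demand in (30, 80):
    private, public = place_workload(demand)
    print(f"demand={demand}: {private} VMs private, {public} VMs burst to public")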
Fig:Hybrid Cloud
Advantage
It is scalable
Offers better security
Flexible - additional resources are availed from the public cloud when needed
Cost-effectiveness - we have to pay for extra resources only when needed.
Control - Organisation can maintain a private infrastructure for sensitive application
Disadvantage
Infrastructure Dependency
Possibility of a security breach (violation) through the public cloud
These models are offered based on various SLAs between providers and users
SLA of cloud computing covers
o service availability
o performance
o data protection
o security
3.4.1 Software as a Service (SaaS) (Complete software offering on the cloud)
SaaS is a licensed software offering on the cloud, billed on a pay-per-use basis
SaaS is a software delivery methodology that provides licensed multi-tenant access to
software and its functions remotely as a Web-based service.
Usually billed based on usage
◦ Usually a multi-tenant environment
IaaS providers
Amazon Elastic Compute Cloud (EC2)
◦ Each instance provides 1-20 processors, up to 16 GB RAM, 1.69 TB storage
RackSpace Hosting
◦ Each instance provides a 4-core CPU, up to 8 GB RAM, 480 GB storage
Joyent Cloud
◦ Each instance provides 8 CPUs, up to 32 GB RAM, 48 GB storage
GoGrid
◦ Each instance provides 1-6 processors, up to 15 GB RAM, 1.69 TB storage
Characteristics of PaaS
Runtime framework: Executes end-user code according to the policies set by the user and
the provider.
Abstraction: PaaS helps to deploy (install) and manage applications on the cloud.
Solution:
o Some SaaS providers provide the opportunity to defend against DDoS attacks by using
quick scale-ups.
Customers cannot easily extract their data and programs from one site to run on another.
Solution:
o Have standardization among service providers so that customers can deploy (install)
services and data across multiple cloud providers.
Data Lock-in
It is a situation in which a customer using the services of one provider cannot move to another
service provider because the technologies used by that provider are incompatible with those of
other providers.
This makes the customer dependent on one vendor for services and unable to
use the services of another vendor.
Solution:
o Have standardization (in technologies) among service providers so that customers can
easily move from a service provider to another.
o cost-per-data-transferred
The end user doesn’t have to pay for infrastructure (resources); they pay only for
how much they transfer and store on the provider’s storage.
5.2 Providers
Google Docs allows users to upload documents, spreadsheets, and presentations to
Google’s data servers.
Those files can then be edited using a Google application.
Web email providers like Gmail, Hotmail, and Yahoo! Mail, store email messages on
their own servers.
Users can access their email from computers and other devices connected to the Internet.
Flickr and Picasa host millions of digital photographs; users can create their own online
photo albums.
YouTube hosts millions of user-uploaded video files.
Hostmonster and GoDaddy store files and data for many client web sites.
Facebook and MySpace are social networking sites and allow members to post pictures
and other content. That content is stored on the company’s servers.
MediaMax and Strongspace offer storage space for any kind of digital data.
Encryption
o Algorithms are used to encode information. To decode the information keys are required.
Authentication processes
o This requires a user to create a name and password.
Authorization practices
o The client lists the people who are authorized to access information stored on the cloud
system.
If information is stored on the cloud, the head of the IT department might have complete and
free access to everything.
Reliability
Service Providers gives reliability for data through redundancy (maintaining multiple
copies of data).
Reputation is important to cloud storage providers. If there is a perception that the provider is
unreliable, they won’t have many clients.
Advantages
Cloud storage providers balance server loads.
Move data among various datacenters, ensuring that information is stored close to where it is
used and is thereby available quickly.
It helps protect the data in case there is a disaster.
Some products are agent-based and the application automatically transfers
information to the cloud via FTP
Cautions
Don’t commit everything to the cloud, but use it for a few, noncritical purposes.
Large enterprises might have difficulty with vendors like Google or Amazon.
They may be forced to rewrite solutions for their applications.
Lack of portability.
Theft (Disadvantage)
User data could be stolen or viewed by those who are not authorized to see it.
Whenever user data leaves the user’s own datacenter, security risks arise.
If users store data on the cloud, they should make sure they encrypt the data and secure data
transit with technologies like SSL.
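A hedged sketch of encrypting data on the client side before it is handed to a cloud provider,
using the third-party Python 'cryptography' package (pip install cryptography). Key management
is deliberately simplified; the key must stay with the user, not with the provider.

from cryptography.fernet import Fernet

key = Fernet.generate_key()       # keep this secret, outside the cloud
cipher = Fernet(key)

plaintext = b"customer record 42"
ciphertext = cipher.encrypt(plaintext)     # this is what actually gets uploaded
# ... upload `ciphertext` over an SSL/TLS connection to the provider ...
assert cipher.decrypt(ciphertext) == plaintext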
Design Requirements
Amazon built S3 to fulfill the following design requirements:
• Scalable Amazon S3 can scale in terms of storage, request rate, and users to support an
unlimited number of web-scale applications.
• Reliable Store data durably, with 99.99 percent availability. Amazon says it does not
allow any downtime.
Design Principles
Amazon used the following principles of distributed system design to meet Amazon S3
requirements:
• Decentralization It uses fully decentralized techniques to remove scaling bottlenecks and
single points of failure.
• Autonomy The system is designed such that individual components can make decisions
based on local information.
• Local responsibility Each individual component is responsible for achieving its
consistency; this is never the burden of its peers.
• Controlled concurrency Operations are designed such that no or limited concurrency
control is required.
• Failure toleration The system considers the failure of components to be a normal mode of
operation and continues operation with no or minimal interruption.
• Controlled parallelism Abstractions used in the system are of such granularity that
parallelism can be used to improve performance and robustness of recovery or the
introduction of new nodes.
• Small, well-understood building blocks Do not try to provide a single service that does
everything for everyone, but instead build small components that can be used as building
blocks for other services.
• Symmetry Nodes in the system are identical in terms of functionality, and require no or
minimal node-specific configuration to function.
• Simplicity The system should be made as simple as possible, but no simpler.
How S3 Works
Amazon keeps its lips pretty tight about how S3 works, but according to Amazon, S3’s
design aims to provide scalability, high availability, and low latency at commodity costs. S3
stores arbitrary objects of up to 5 GB in size, and each is accompanied by up to 2 KB of
metadata. Objects are organized by buckets. Each bucket is owned by an AWS account and
the buckets are identified by a unique, user-assigned key.
Buckets and objects are created, listed, and retrieved using either a REST-style or
SOAP interface.
Objects can also be retrieved using the HTTP GET interface or via BitTorrent. An
access control list restricts who can access the data in each bucket. Bucket names and keys
are formulated so that they can be accessed using HTTP. Requests are authorized using an
access control list associated with each bucket and object, for instance:
http://s3.amazonaws.com/examplebucket/examplekey
http://examplebucket.s3.amazonaws.com/examplekey
The Amazon AWS Authentication tools allow the bucket owner to create an authenticated
URL with a set amount of time that the URL will be valid.
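A sketch of these operations using the boto3 SDK for Python (pip install boto3). The bucket
name, key, and credentials/region settings are assumptions; the presigned URL corresponds to the
authenticated, time-limited URL mentioned above.

import boto3

s3 = boto3.client("s3")

s3.create_bucket(Bucket="examplebucket")
s3.put_object(Bucket="examplebucket", Key="examplekey",
              Body=b"hello S3", Metadata={"owner": "demo"})

url = s3.generate_presigned_url(
    "get_object",
    Params={"Bucket": "examplebucket", "Key": "examplekey"},
    ExpiresIn=3600,           # the URL stays valid for one hour
)
print(url)   # anyone holding this URL can GET the object until it expires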
• Platform as a Service(PaaS)
• Infrastructure as a Service(IaaS)
➢ Location as a Service (LaaS) provides security to all the physical hardware and network
resources. This service is also called Security as a Service.
➢ The cloud infrastructure layer can be further subdivided as
• Data as a Service (DaaS)
• Communication as a Service (CaaS)
• Infrastructure as a Service(IaaS)
➢ Cloud players are divided into three classes:
• Cloud service providers and IT administrators
• Software developers or vendors
• End users or business users
o Power management
o Conflict in signed SLAs between consumers and service providers.
case (b)
Under-provisioning of resources results in losses for both user and provider. Users have paid for
demand (the shaded area above the capacity) that is not served.
case (c)
Constant provisioning
Providing a fixed capacity for a declining user demand could result in even worse resource waste.
The user may give up the service by canceling the demand, resulting in reduced revenue
for the provider.
Both the user and provider may be losers in resource provisioning without elasticity.
Resource-provisioning methods are
• Demand-driven method - Adds or removes computing instances based on the current utilization
level of the allocated resources (see the sketch after this list).
• Event-driven method - Based on predicted workload by time (e.g., a seasonal or expected event).
• Popularity-driven method - Based on the Internet traffic monitored.
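A minimal sketch of the demand-driven method, with utilization thresholds and instance limits
chosen purely for illustration.

def scale(current_instances, utilization,
          scale_out_at=0.80, scale_in_at=0.30,
          min_instances=1, max_instances=10):
    if utilization > scale_out_at and current_instances < max_instances:
        return current_instances + 1          # demand is high: provision more
    if utilization < scale_in_at and current_instances > min_instances:
        return current_instances - 1          # demand is low: release resources
    return current_instances

print(scale(3, 0.92))   # 4 -> scale out
print(scale(3, 0.10))   # 2 -> scale in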
• This scheme has a minimal loss of QoS, if the predicted popularity is correct.
• Resources may be wasted if traffic does not occur as expected.
Fig: Cloud resource deployment using an IGG (intergrid gateway) to allocate the VMs
from a Local cluster to interact with the IGG of a public cloud provider.
Under peak demand, this IGG interacts with another IGG that can allocate resources from
a cloud computing provider.
A grid has predefined peering arrangements with other grids, which the IGG manages.
Through multiple IGGs, the system coordinates the use of InterGrid resources.
An IGG is aware of the peering terms with other grids, selects suitable grids that can
provide the required resources, and replies to requests from other IGGs.
Request redirection policies determine which peering grid InterGrid selects to process a
request and a price for which that grid will perform the task.
An IGG can also allocate resources from a cloud provider.
The InterGrid allocates and provides a distributed virtual environment (DVE).
This is a virtual cluster of VMs that runs isolated from other virtual clusters.
A component called the DVE manager performs resource allocation and management on
behalf of specific user applications.
The core component of the IGG is a scheduler for implementing provisioning policies
and peering with other gateways.
The communication component provides an asynchronous message-passing mechanism.
The managers provide a public API for users to submit and control the VMs.
Distributed VM Management
• A distributed VM manager makes requests for VMs and queries their status.
• This manager requests VMs from the gateway on behalf of the user application.
• The manager obtains the list of requested VMs from the gateway.
• This list contains a tuple of public IP/private IP addresses for each VM with Secure
Shell (SSH) tunnels.
• Cloud infrastructure providers (i.e., IaaS providers) have established data centers in
multiple geographical locations to provide redundancy and ensure reliability in case of
site failures.
• Amazon does not provide seamless/automatic mechanisms for scaling its hosted services
across multiple geographically distributed data centers.
• This approach has many shortcomings
• First, it is difficult for cloud customers to determine in advance the best location for
hosting their services as they may not know the origin of consumers of their services.
• Second, SaaS providers may not be able to meet the QoS expectations of their service
consumers originating from multiple geographical locations.
• The figure shows the high-level components of the Melbourne group’s proposed InterCloud
architecture.
• It is not possible for a cloud infrastructure provider to establish its data centers at all
possible locations throughout the world.
• This results in difficulty in meeting the QOS expectations of their customers.
4.2 Security
• Virtual machines from multiple organizations have to be co-located on the same physical
server in order to maximize the efficiencies of virtualization.
• Cloud service providers must learn from the managed service provider (MSP) model and
ensure that their customers' applications and data are secure if they hope to retain their
customer base and competitiveness.
• Cloud environment should be free from abuses, cheating, hacking, viruses, rumors, and
privacy and copyright violations.
Example: Amazon’s “Simple Storage Service” [S3] is incompatible with IBM’s Blue Cloud, or
Google, or Dell.
• Customers want their data encrypted while data is at rest (data stored) in the cloud
vendor’s storage pool.
• Data integrity means ensuring that data is identically maintained during any operation
(such as transfer, storage, or retrieval).
• Data integrity is assurance that the data is consistent and correct.
• One of the key challenges in cloud computing is data-level security.
• It is difficult for a customer to find where its data resides on a network controlled by its
provider.
• Some countries have strict limits on what data about its citizens can be stored and for
how long.
• Banking regulators require that customers’ financial data remain in their home country.
• Security managers will need to pay particular attention to systems that contain critical
data such as corporate financial information.
• Outsourcing (giving rights to a third party) means losing control over data, and is not a good
idea from a security perspective.
• Security managers have to interact with company’s legal staff to ensure that appropriate
contract terms are in place to protect corporate data.
• Cloud-based services will result in many mobile IT users accessing business data and
services without traversing the corporate network.
• This will increase the need for enterprises to place security controls between mobile users
and cloud-based services.
• Placing large amounts of sensitive data in a globally accessible cloud leaves
organizations open to large distributed threats—attackers no longer have to come onto the
premises to steal data, and they can find it all in the one "virtual" location.
• Virtualization efficiencies in the cloud require virtual machines from multiple
organizations to be collocated on the same physical resources.
• Although traditional data center security still applies in the cloud environment, physical
segregation and hardware-based security cannot protect against attacks between virtual
machines on the same server.
• The dynamic and fluid nature of virtual machines will make it difficult to maintain the
consistency of security and ensure the auditability of records.
• The ease of cloning and distribution between physical servers could result in the
propagation of configuration errors and other vulnerabilities.
• Localized virtual machines and physical servers use the same operating systems as well
as enterprise and web applications in a cloud server environment, increasing the threat of
an attacker or malware exploiting vulnerabilities in these systems and applications
remotely.
• Virtual machines are vulnerable as they move between the private cloud and the public
cloud.
• Operating system and application files are on a shared physical infrastructure in a
virtualized cloud environment and require system, file, and activity monitoring to provide
confidence and auditable proof to enterprise customers that their resources have not been
compromised or tampered with.
• Intrusion Detection Systems (IDS) and Intrusion Prevention Systems (IPS) detect
malicious activity at the virtual machine level.
• The co-location of multiple virtual machines increases the threat from attackers.
• If virtual machines and physical machines use the same operating systems in a cloud
environment, the threat from an attacker increases.
• A fully or partially shared cloud environment is expected to have a greater attack surface
than a dedicated-resources environment.
• Virtual machines must be self-defending.
• The cloud computing provider is in charge of customer data security and privacy.
4.2.2 Software as a Service Security (Or) Data Security (Or) Application Security (Or)
Virtual Machine Security.
Cloud computing models of the future will likely combine the use of SaaS (and other
XaaS's as appropriate), utility computing, and Web 2.0 collaboration technologies to leverage the
Internet to satisfy their customers' needs. New business models being developed as a result of the
move to cloud computing are creating not only new technologies and business operational
processes but also new security requirements and challenges.
SaaS is the dominant cloud service model, and it is the area where the most critical need for
security practices exists.
Security issues that are discussed with cloud-computing vendor:
1. Privileged user access—Inquire about who has specialized access to data, and about the
hiring and management of such administrators.
2. Regulatory compliance—Make sure that the vendor is willing to undergo external audits
and/or security certifications.
3. Data location—Does the provider allow for any control over the location of data?
4. Data segregation—Make sure that encryption is available at all stages, and that these
encryption schemes were designed and tested by experienced professionals.
5. Recovery—Find out what will happen to data in the case of a disaster. Do they offer complete
restoration? If so, how long would that take?
6. Investigative support—Does the vendor have the ability to investigate any inappropriate or
illegal activity?
7. Long-term viability—What will happen to data if the company goes out of business? How
will data be returned, and in what format?
The security practices for the SaaS environment are as follows:
Security Management (People)
• One of the most important actions for a security team is to develop a formal charter for
the security organization and program.
• This will foster a shared vision among the team of what security leadership is driving
toward and expects, and will also foster "ownership" in the success of the collective team.
• The charter should be aligned with the strategic plan of the organization or company the
security team works for.
4.2.3 Security Governance
• A security committee should be developed whose objective is to focus on providing
guidance about security initiatives with business and IT strategies.
• A charter for the security team is typically one of the first deliverables from the steering
committee.
• This charter must clearly define the roles and responsibilities of the security team and
other groups involved in performing information security functions.
• Lack of a formalized strategy can lead to an unsustainable operating model and
security level as it evolves.
• In addition, lack of attention to security governance can result in key needs of the
business not being met, including but not limited to, risk management, security
monitoring, application security, and sales support.
• Lack of proper governance and management of duties can also result in potential security
risks being left unaddressed and opportunities to improve the business being missed.
• The security team is not focused on the key security functions and activities that are
critical to the business.
Cloud security governance refers to the management model that facilitates effective and
efficient security management and operations in the cloud environment so that an enterprise’s
business targets are achieved. This model incorporates a hierarchy of executive mandates,
performance expectations, operational practices, structures, and metrics that, when implemented,
result in the optimization of business value for an enterprise. Cloud security governance helps
answer leadership questions such as:
The lack of a senior management influenced and endorsed security policy is one of the
common challenges facing cloud customers. An enterprise security policy is intended to set the
executive tone, principles and expectations for security management and operations in the cloud.
However, many enterprises tend to author security policies that are often laden with tactical
content, and lack executive input or influence. The result of this situation is the ineffective
definition and communication of executive tone and expectations for security in the cloud.
Lack of embedded management operational controls
Another common cloud security governance challenge is lack of embedded management
controls into cloud security operational processes and procedures. Controls are often interpreted
as an auditor’s checklist or repackaged as procedures, and as a result, are not effectively
embedded into security operational processes and procedures as they should be, for purposes of
optimizing value and reducing day-to-day operational risks. This lack of embedded controls may
result in operational risks that may not be apparent to the enterprise. For example, the security
configuration of a device may be modified (change event) by a staffer without proper analysis of
the business impact (control) of the modification. The net result could be the introduction of
exploitable security weaknesses that may not have been apparent with this modification.
Lack of operating model, roles, and responsibilities
Many enterprises moving into the cloud environment tend to lack a formal operating
model for security, or do not have strategic and tactical roles and responsibilities properly
defined and operationalized. This situation stifles the effectiveness of a security management and
operational function/organization to support security in the cloud. Simply, establishing a
hierarchy that includes designating an accountable official at the top, supported by a stakeholder
committee, management team, operational staff, and third-party provider support (in that order)
can help an enterprise to better manage and control security in the cloud, and protect associated
investments in accordance with enterprise business goals.
Lack of metrics for measuring performance and risk
Another major challenge for cloud customers is the lack of defined metrics to measure
security performance and risks – a problem that also stifles executive visibility into the real
security risks in the cloud. This challenge is directly attributable to the combination of other
challenges discussed above. For example, a metric that quantitatively measures the number of
exploitable security vulnerabilities on host devices in the cloud over time can be leveraged as an
indicator of risk in the host device environment. Similarly, a metric that measures the number of
user-reported security incidents over a given period can be leveraged as a performance indicator
of staff awareness and training efforts. Metrics enable executive visibility into the extent to
which security tone and expectations (per established policy) are being met within the enterprise
and support prompt decision-making in reducing risks or rewarding performance as appropriate.
The challenges described above clearly highlight the need for cloud customers to establish a
framework to effectively manage and support security in cloud management, so that the pursuit
of business targets are not potentially compromised. Unless tone and expectations for cloud
security are established (via an enterprise policy) to drive operational processes and procedures
with embedded management controls, it is very difficult to determine or evaluate business value,
performance, resource effectiveness, and risks regarding security operations in the cloud. Cloud
security governance facilitates the institution of a model that helps enterprises explicitly address
the challenges described above.
2. Value Delivery
Enterprises have a responsibility to maximize the business value (Key Goal Indicators, ROI)
from the pursuit of security initiatives in the cloud.
3. Risk Mitigation
Security initiatives in the cloud should be subject to measurements that gauge effectiveness in
mitigating risk to the enterprise (Key Risk Indicators). These initiatives should also yield results
that progressively demonstrate a reduction in these risks over time.
4. Effective Use of Resources
It is important for enterprises to establish a practical operating model for managing and
performing security operations in the cloud, including the proper definition and
operationalization of due processes, the institution of appropriate roles and responsibilities, and
use of relevant tools for overall efficiency and effectiveness.
5. Sustained Performance
Security initiatives in the cloud should be measurable in terms of performance, value and risk to
the enterprise (Key Performance Indicators, Key Risk Indicators), and yield results that
demonstrate attainment of desired targets (Key Goal Indicators) over time.
Risk Management
• Effective risk management entails identification of technology assets; identification of
data and its links to business processes, applications, and data stores; and assignment of
ownership and custodial responsibilities.
• Actions should also include maintaining a repository of information assets
• A risk assessment process should be created that allocates security resources related to
business continuity.
Risk Assessment
• Security risk assessment is critical to helping the information security organization make
informed decisions when balancing the dueling priorities of business utility and
protection of assets.
• Lack of attention to completing formalized risk assessments can contribute to an increase
in information security audit findings, can jeopardize certification goals, and can lead to
inefficient and ineffective selection of security controls that may not adequately mitigate
information security risks to an acceptable level.
Security Portfolio(selection) Management
• Security portfolio management ensures efficient and effective operation of any
information security program and organization.
Security Awareness
• Not providing proper awareness and training to the people who may need them can
expose the company to a variety of security risks
Policies, Standards, and Guidelines
• Policies, standards, and guidelines are developed that can ensure consistency of
performance.
Secure Software Development Life Cycle (SecSDLC)
• The SecSDLC involves identifying specific threats and the risks they represent. The
SecSDLC consists of six phases:
Phase 1. Investigation: Define project goals and document them.
Phase 2. Analysis: Analyze current threats and perform risk analysis.
Phase 3. Logical design: Develop a security blueprint (plan) and business responses to disaster.
Phase 4. Physical design: Select technologies to support the security blueprint (plan).
Phase 5. Implementation: Buy or develop security solutions.
Phase 6. Maintenance: Constantly monitor, test, modify, update, and repair to respond to changing threats.
Vulnerability Assessment
• Vulnerability assessment classifies network assets to more efficiently prioritize
vulnerability-mitigation programs, such as patching and system upgrading.
• It measures the effectiveness of risk mitigation by setting goals of reduced vulnerability
exposure and faster mitigation
Password Assurance Testing
• If the SaaS security team or its customers want to periodically test password strength by
running password "crackers," they can use cloud computing to decrease crack time and pay
only for what they use.
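A minimal Python sketch of the idea behind such a cracker run follows; the wordlist file name and the SHA-256 hashing scheme are assumptions for illustration, not part of any particular SaaS tool.

import hashlib

def find_weak_password(target_hash, wordlist_path="wordlist.txt"):
    # Try every candidate in the dictionary and compare its SHA-256 digest
    # with the stored hash; a match means the password is weak.
    with open(wordlist_path, encoding="utf-8") as wordlist:
        for candidate in wordlist:
            candidate = candidate.strip()
            if hashlib.sha256(candidate.encode()).hexdigest() == target_hash:
                return candidate
    return None

# Many such checks can be fanned out across cheap, short-lived cloud instances,
# which is why crack time drops while cost stays pay-per-use.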
Security Images:
• Virtualization-based cloud computing provides the ability to create "Gold image" VM
secure builds and to clone multiple copies.
• Gold image VMs also provide the ability to keep security up to date and reduce
exposure by patching offline.
Data Privacy
• Depending on the size of the organization and the scale of operations, either an individual
or a team should be assigned and given responsibility for maintaining privacy.
• A member of the security team who is responsible for privacy or security compliance
team should collaborate with the company legal team to address data privacy issues
and concerns.
• Hiring a consultant in the privacy area will help ensure that your organization is prepared to
meet the data privacy demands of its customers and regulators.
Data Governance
The data governance framework should include:
• Data inventory
• Data classification
• Data analysis (business intelligence)
• Data protection
• Data privacy
• Data retention/recovery/discovery
• Data destruction
Data Security
The challenge in cloud computing is data-level security.
Security to data is provided by:
• Encrypting the data.
• Permitting only specified users to access the data.
• Restricting the data from crossing country borders.
For example, with data-level security, the enterprise can specify that a given data set is not
allowed to go outside of India.
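A minimal sketch of client-side encryption before data leaves the enterprise is shown below; it assumes the third-party Python cryptography package (Fernet), and the sample record is illustrative.

from cryptography.fernet import Fernet

key = Fernet.generate_key()        # key stays with the enterprise, not the provider
cipher = Fernet(key)

record = b"customer record that must not leave India"
token = cipher.encrypt(record)     # only ciphertext is stored in the cloud
plain = cipher.decrypt(token)      # only users holding the key can read the data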
Application Security
This is a collaborative effort between the security and product development teams.
Application security processes
o Secure coding guidelines
o Training
o Testing scripts
o Tools
Penetration testing is performed on a system or application. It is a type of security testing
used to test the insecure areas of the system or application.
The goal of this testing is to find all the security vulnerabilities that are present in the system
being tested.
SaaS providers should secure their web applications by following Open Web Application
Security Project (OWASP) guidelines for secure application development, by locking down
ports and unnecessary commands
that the protected virtual machines are not migrated to a server in a less secure location. In the
context of Oracle VM, this implies maintaining separate server pools, each with their own group
of servers.
These rules of isolation should also be applied to networking: there are no color coded
network cables to help staff identify and isolate different routes, segments, and types of network
traffic to and from virtual machines or between them. There are no visual indicators that help
ensure that application, management, and backup traffic are kept separate. Rather than plug
network cables into different physical interfaces and switches, the Oracle VM administrator must
ensure that the virtual network interfaces are connected to separate virtual networks. Specifically,
use VLANs to isolate virtual machines from one another, and assign virtual networks for virtual
machine traffic to different physical interfaces from those used for management, storage or
backup. These can all be controlled from the Oracle VM Manager user interface. Ensure that
secure live migration is selected to guarantee that virtual machine memory data is not sent across
the wire unencrypted.
Additional care must be given to virtual machine disk images. In most cases the virtual
disks are made available over the network for migration and failover purposes. In many cases
they are files, which could easily be copied and stolen if the security of network storage is
compromised. Therefore it is essential to lock down the NAS or SAN environments and prevent
unauthorized access. An intruder with root access to a workstation on the storage network could
mount storage assets and copy or alter their contents. Use a separate network for transmission
between the storage servers and the Oracle VM hosts to ensure its traffic is not made public and
subject to being snooped. Make sure that unauthorized individuals are not permitted to log into
the Oracle VM Servers, as that would give them access to the guests' virtual disk images, and
potentially much more.
All of these steps require controlling access to the Oracle VM Manager and Oracle VM
Server domain 0 instances. Network access to these hosts should be on a private network, and the
user accounts able to log into any of the servers in the Oracle VM environment should be
rigorously controlled, and limited to the smallest possible number of individuals.
Authorization management: Activities for the effective governance and management of the
process for determining entitlement rights that decide what resources an entity is permitted to
access in accordance with the organization’s policies.
Access management: Enforcement of policies for access control in response to a request from
an entity (user, services) wanting to access an IT resource within the organization.
Data management and provisioning: Propagation of identity and data for authorization to IT
resources via automated or manual processes.
Monitoring and auditing: Monitoring, auditing, and reporting compliance by users regarding
access to resources within the organization based on the defined policies.
IAM processes support the following operational activities:
Provisioning: Provisioning can be thought of as a combination of the duties of the
human resources and IT departments, where users are given access to data repositories or
systems, applications, and databases based on a unique user identity. Deprovisioning works in
the opposite manner, resulting in the deletion or deactivation of an identity or of privileges
assigned to the user identity.
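The sketch below illustrates the provisioning/deprovisioning idea in plain Python; the Directory class, role names, and user IDs are hypothetical, not a real IAM product API.

class Directory:
    def __init__(self):
        self.accounts = {}

    def provision(self, user_id, roles):
        # HR + IT duty: create the identity and grant access based on roles
        self.accounts[user_id] = {"roles": set(roles), "active": True}

    def deprovision(self, user_id):
        # Reverse operation: deactivate the identity and revoke its privileges
        account = self.accounts.get(user_id)
        if account:
            account["roles"].clear()
            account["active"] = False

directory = Directory()
directory.provision("alice", ["crm-user", "reporting-reader"])
directory.deprovision("alice")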
Credential and attribute management: These processes are designed to manage the life cycle
of credentials and user attributes—create, issue, manage, revoke—to minimize the business risk
associated with identity impersonation and inappropriate account use.
Credentials are usually bound to an individual and are verified during the authentication process.
The processes include provisioning of attributes, static (e.g., standard text password) and
dynamic (e.g., one-time password) credentials that comply with a password standard (e.g.,
passwords resistant to dictionary attacks), handling password expiration, encryption management
of credentials during transit and at rest, and access policies of user attributes (privacy and
handling of attributes for various regulatory reasons).Minimize the business risk associated with
annauniversityedu.blogspot.com
Panimalar Instiute of Technology CS8791-Cloud Computing Unit IV Notes
identityimpersonation
annauniversityedu.blogspot.com
Panimalar Instiute of Technology CS8791-Cloud Computing Unit IV Notes
annauniversityedu.blogspot.com
Panimalar Instiute of Technology CS8791-Cloud Computing Unit IV Notes
Current IAM capabilities, in our assessment, still fall short of enterprise IAM requirements for
managing regulatory, privacy, and data protection requirements. The maturity model takes into
account the dynamic nature of IAM users, systems, and applications in the cloud and
addresses the four key components of the IAM automation process:
• User Management, New Users
• User Management, User Modifications
• Authentication Management
• Authorization Management
IAM practices and processes are applicable to cloud services; however, they need to be adjusted to the
cloud environment. Broadly speaking, user management functions in the cloud can be
categorized as follows:
• Cloud identity administration
• Federation or SSO
• Authorization management
• Compliance management
Cloud Identity Administration: Cloud identity administrative functions should focus on life
cycle management of user identities in the cloud—provisioning, deprovisioning, identity
federation, SSO, password or credentials management, profile management, and administrative
management. Organizations that are not capable of supporting federation should explore cloud-
based identity management services. This new breed of services usually synchronizes an
organization’s internal directories with its directory (usually multitenant) and acts as a proxy IdP
for the organization.
Federated Identity (SSO): Organizations planning to implement identity federation that enables
SSO for users can take one of the following two paths (architectures):
• Implement an enterprise IdP within an organization perimeter.
• Integrate with a trusted cloud-based identity management service provider.
Both architectures have pros and cons.
Enterprise identity provider: In this architecture, cloud services will delegate authentication to
an organization’s IdP. In this delegated authentication architecture, the organization federates
identities within a trusted circle of CSP domains. A circle of trust can be created with all the
domains that are authorized to delegate authentication to the IdP. In this deployment architecture,
where the organization will provide and support an IdP, greater control can be exercised over
user identities, attributes, credentials, and policies for authenticating and authorizing users to a
cloud service.
SAML standardizes queries for, and responses that contain, user authentication,
entitlements, and attribute information in an XML format. This format can then be used to
request security information about a principal from a SAML authority. A SAML authority,
sometimes called the asserting party, is a platform or application that can relay security
information. The relying party (or assertion consumer or requesting party) is a partner site that
receives the security information.
The exchanged information deals with a subject's authentication status, access
authorization, and attribute information. A subject is an entity in a particular domain. A person
identified by an email address is a subject, as might be a printer.
SAML assertions are usually transferred from identity providers to service providers. Assertions
contain statements that service providers use to make access control decisions. Three types of
statements are provided by SAML: authentication statements, attribute statements, and
authorization decision statements. SAML assertions contain a packet of security information in
this form:
<saml:Assertion A...>
    <Authentication>
    ...
    </Authentication>
    <Attribute>
    ...
    </Attribute>
    <Authorization>
    ...
    </Authorization>
</saml:Assertion>
The assertion shown above is interpreted as follows:
Assertion A, issued at time T by issuer I, regarding subject
S, provided conditions C are valid.
Authentication statements assert to a service provider that the principal did indeed
authenticate with an identity provider at a particular time using a particular method of
authentication. Other information about the authenticated principal (called the authentication
benefit. The Core deals with fundamental aspects of the protocol, namely, to establish a
mechanism for exchanging a user name and password for a token with defined rights and to
provide tools to protect the token. In fact, OAuth by itself provides no privacy at all and
depends on other protocols such as SSL to accomplish that.
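A hedged sketch of that core exchange is shown below, using the requests library against a hypothetical token endpoint; the URL and field names are illustrative, and the request is sent over HTTPS precisely because OAuth itself provides no privacy.

import requests

response = requests.post(
    "https://provider.example/oauth/token",          # hypothetical token endpoint
    data={"username": "alice", "password": "s3cret"},
    timeout=10,
)                                                    # sent over HTTPS (SSL/TLS)
response.raise_for_status()
token = response.json()["access_token"]              # token with defined rights
# The token, not the password, is presented on later API calls.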
4.5.3 OpenID
OpenID is an open, decentralized standard for user authentication and access control that
allows users to log onto many services using the same digital identity. It is a single-sign-on
(SSO) method of access control. As such, it replaces the common log-in process (i.e., a log-in
name and a password) by allowing users to log in once and gain access to resources across
participating systems. The original OpenID authentication protocol was developed in May 2005
by Brad Fitzpatrick, creator of the popular community web site LiveJournal. In late June 2005,
discussions began between OpenID developers and other developers from an enterprise software
company named NetMesh. These discussions led to further collaboration on interoperability
between OpenID and NetMesh's similar Light-Weight Identity (LID) protocol. The direct result
of the collaboration was the Yadis discovery protocol, which was announced on October 24,
2005.
The Yadis specification provides a general-purpose identifier for a person and any other
entity, which can be used with a variety of services. It provides a syntax for a resource description
document identifying services available using that identifier and an interpretation of the elements
of that document. The Yadis discovery protocol is used for obtaining a resource description
document, given that identifier. Together these enable coexistence and interoperability of a rich
variety of services using a single identifier. The identifier uses a standard syntax and a well-
established namespace and requires no additional namespace administration infrastructure.
An OpenID is in the form of a unique URL and is authenticated by the entity hosting the OpenID
URL. The OpenID protocol does not rely on a central authority to authenticate a user's identity.
Neither the OpenID protocol nor any web sites requiring identification can mandate that a
specific type of authentication be used; nonstandard forms of authentication such as smart cards,
biometrics, or ordinary passwords are allowed. A typical scenario for using OpenID might be
something like this: A user visits a web site that displays an OpenID log-in form somewhere on
the page. Unlike a typical log-in form, which has fields for user name and password, the OpenID
log-in form has only one field for the OpenID identifier (which is an OpenID URL). This form is
connected to an implementation of an OpenID client library.
A user will have previously registered an OpenID identifier with an OpenID identity
provider. The user types this OpenID identifier into the OpenID log-in form. The relying party
then requests the web page located at that URL and reads an HTML link tag to discover the
identity provider service URL. With OpenID 2.0, the client discovers the identity provider
service URL by requesting the XRDS document (also called the Yadis document) with the
content type application/xrds+xml, which may be available at the target URL but is always
available for a target XRI.
There are two modes by which the relying party can communicate with the identity
provider: checkid_immediate and checkid_setup. In checkid_immediate, the relying party
requests that the provider not interact with the user. All communication is relayed through the
user's browser without explicitly notifying the user. In checkid_setup, the user communicates
with the provider server directly using the same web browser as is used to access the relying
party site. The second option is more popular on the web.
To start a session, the relying party and the identity provider establish a shared secret—
referenced by an associate handle—which the relying party then stores. Using checkid_setup,
the relying party redirects the user's web browser to the identity provider so that the user can
authenticate with the provider. The method of authentication varies, but typically, an OpenID
identity provider prompts the user for a password, then asks whether the user trusts the relying
party web site to receive his or her credentials and identity details. If the user declines the
identity provider's request to trust the relying party web site, the browser is redirected to the
relying party with a message indicating that authentication was rejected.
The site in turn refuses to authenticate the user. If the user accepts the identity provider's
request to trust the relying party web site, the browser is redirected to the designated return page
on the relying party web site along with the user's credentials. That relying party must then
confirm that the credentials really came from the identity provider. If they had previously
established a shared secret, the relying party can validate the shared secret received with the
credentials against the one previously stored. In this case, the relying party is considered to be
stateful, because it stores the shared secret between sessions (a process sometimes referred to as
persistence). In comparison, a stateless relying party must make background requests using the
check_authentication method to be sure that the data came from the identity provider.
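The shared-secret check performed by a stateful relying party can be sketched as an HMAC comparison, as below; this is a conceptual illustration only (assuming HMAC-SHA256 over the signed fields), not the complete OpenID message format.

import hmac, hashlib, base64

def verify_response(signed_fields, received_sig, shared_secret):
    # Re-compute the signature of the signed fields with the association
    # (shared) secret and compare it with the signature in the response.
    expected = base64.b64encode(
        hmac.new(shared_secret, signed_fields.encode(), hashlib.sha256).digest()
    ).decode()
    return hmac.compare_digest(expected, received_sig)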
4.5.4 SSL/TLS
Transport Layer Security (TLS) and its predecessor, Secure Sockets Layer (SSL), are
cryptographically secure protocols designed to provide security and data integrity for
communications over TCP/IP. TLS and SSL encrypt the segments of network connections at the
transport layer. Several versions of the protocols are in general use in web browsers, email,
instant messaging, and voice-over-IP. TLS is an IETF standard protocol which was last updated
in RFC 5246.
The TLS protocol allows client/server applications to communicate across a network in a
way specifically designed to prevent eavesdropping, tampering, and message forgery. TLS
provides endpoint authentication and data confidentiality by using cryptography. TLS
authentication is typically one-way—only the server is authenticated (the client knows the
server's identity), while the client remains unauthenticated. At the browser level, this means
that the browser has validated the server's certificate—more specifically, it has checked the
digital signatures of the server certificate's issuing chain of Certification Authorities (CAs).
Validation does not identify the server to the end user. For true identification, the end
user must verify the identification information contained in the server's certificate (and, indeed,
its whole issuing CA chain).This is the only way for the end user to know the "identity" of the
server, and this is the only way identity can be securely established, verifying that the URL,
name, or address that is being used is specified in the server's certificate. Malicious web sites
cannot use the valid certificate of another web site because they have no means to encrypt the
transmission in a way that it can be decrypted with the valid certificate.
Since only a trusted CA can embed a URL in the certificate, this ensures that checking the
apparent URL with the URL specified in the certificate is an acceptable way of identifying the
site.TLS also supports a more secure bilateral connection mode whereby both ends of the
connection can be assured that they are communicating with whom they believe they are
connected. This is known as mutual (assured) authentication. Mutual authentication requires the
TLS client-side to also maintain a certificate.
TLS involves three basic phases:
1. Peer negotiation for algorithm support
2. Key exchange and authentication
3. Symmetric cipher encryption and message authentication
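A minimal sketch of one-way TLS server authentication with Python's standard ssl module is shown below; the host name is illustrative.

import socket, ssl

context = ssl.create_default_context()            # loads the trusted CA certificates
with socket.create_connection(("www.example.org", 443)) as raw_sock:
    with context.wrap_socket(raw_sock, server_hostname="www.example.org") as tls_sock:
        print(tls_sock.version())                 # negotiated TLS version
        print(tls_sock.getpeercert()["subject"])  # identity asserted in the server certificate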
UNIT V CLOUD TECHNOLOGIES AND ADVANCEMENTS
Users of Hadoop:
❖ Hadoop is running search on some of the Internet's largest sites:
o Amazon Web Services: Elastic MapReduce
o AOL: Variety of uses, e.g., behavioral analysis & targeting
o Ebay: Search optimization (532-node cluster)
o Facebook: Reporting/analytics, machine learning (1100 machines)
o LinkedIn: People You May Know (2x50 machines)
o Twitter: Store + process tweets, log files, other data
o Yahoo: >36,000 nodes; biggest cluster is 4,000 nodes
Hadoop Architecture
❖ Hadoop has a Master Slave Architecture for both Storage & Processing
❖ Hadoop framework includes following four modules:
❖ Hadoop Common: These are Java libraries and provide file system and OS level
abstractions and contains the necessary Java files and scripts required to start Hadoop.
❖ Hadoop YARN: This is a framework for job scheduling and cluster resource management.
❖ Hadoop Distributed File System (HDFS): A distributed file system that provides high-
throughput access to application data.
❖ Hadoop MapReduce: This is a system for parallel processing of large data sets.
HDFS
To store a file in this architecture,
HDFS splits the file into fixed-size blocks (e.g., 64 MB) and stores them on workers (Data
Nodes).
HDFS-Write Operation
Writing to a file:
To write a file in HDFS, a user sends a “create” request to the NameNode to create a new
file in the file system namespace.
If the file does not exist, the NameNode notifies the user and allows him to start writing
data to the file by calling the write function.
The first block of the file is written to an internal queue termed the data queue.
A data streamer monitors its writing into a DataNode.
Each file block needs to be replicated by a predefined factor.
The data streamer first sends a request to the NameNode to get a list of suitable DataNodes
to store replicas of the first block.
The streamer then stores the block in the first allocated DataNode.
Afterward, the block is forwarded to the second DataNode by the first DataNode.
The process continues until all allocated DataNodes receive a replica of the first block from
the previous DataNode.
Once this replication process is finalized, the same process starts for the second block.
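The forwarding behaviour of the write pipeline can be illustrated with the plain-Python sketch below; it is a simulation of the chain of DataNodes described above, not the HDFS client API.

def write_block(block, datanodes):
    # The streamer hands the block to the first DataNode; each DataNode stores
    # its replica and then forwards the block to the next one in the pipeline.
    stored_on = []
    for node in datanodes:
        node.append(block)
        stored_on.append(node)
    return stored_on

# Replication factor 3: three DataNodes each receive a copy of the 64 MB block.
dn1, dn2, dn3 = [], [], []
write_block("block-0 (64 MB)", [dn1, dn2, dn3])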
➢ To store a file in this architecture, HDFS splits the file into fixed-size blocks (e.g., 64 MB)
and stores them on workers (DataNodes).
➢ The NameNode (master) also manages the file system’s metadata and namespace.
➢ Job Tracker is the master node (runs with the namenode)
o Receives the user’s job
o Decides on how many tasks will run (number of mappers)
o Decides on where to run each mapper (concept of locality)
➢ Task Tracker is the slave node (runs on each datanode)
o Receives the task from Job Tracker
o Runs the task until completion (either map or reduce task)
o Always in communication with the Job Tracker reporting progress (heartbeats)
❖ The data flow starts by calling the runJob(conf) function inside a user program running on
the user node, in which conf is an object containing some tuning parameters for the
MapReduce framework.
❖ Job Submission: Each job is submitted from a user node to the JobTracker node.
❖ Task assignment : The JobTracker creates one map task for each computed input split
❖ Task execution : The control flow to execute a task (either map or reduce) starts inside the
TaskTracker by copying the job JAR file to its file system.
❖ Task running check : A task running check is performed by receiving periodic heartbeat
messages to the JobTracker from the TaskTrackers.
❖ Heartbeat: notifies the JobTracker that the sending TaskTracker is alive, and whether the
sending TaskTracker is ready to run a new task.
The Apache Hadoop project develops open-source software for reliable, scalable, distributed
computing, including:
❖ Hadoop Core, our flagship sub-project, provides a distributed filesystem (HDFS) and
support for the MapReduce distributed computing metaphor.
❖ HBase builds on Hadoop Core to provide a scalable, distributed database.
❖ Pig is a high-level data-flow language and execution framework for parallel
computation. It is built on top of Hadoop Core.
❖ ZooKeeper is a highly available and reliable coordination system. Distributed
applications use ZooKeeper to store and mediate updates for critical shared state.
❖ Hive is a data warehouse infrastructure built on Hadoop Core that provides data
summarization, ad hoc querying and analysis of datasets.
MAP REDUCE
❖ MapReduce is a programming model for data processing.
❖ MapReduce is designed to efficiently process large volumes of data by connecting many
commodity computers together to work in parallel
❖ Hadoop can run MapReduce programs written in various languages like Java, Ruby, and
Python
❖ MapReduce works by breaking the processing into two phases:
o The map phase and
o The reduce phase
Overall, MapReduce breaks the data flow into two phases, map phase and reduce phase
Mapreduce Workflow
Application writer specifies
❖ A pair of functions called Mapper and Reducer and a set of input files and submits the job
❖ Input phase generates a number of FileSplits from input files (one per Map task)
❖ The Map phase executes a user function to transform input key-pairs into a new set of key-
pairs
❖ The framework Sorts & Shuffles the key-pairs to output nodes
❖ The Reduce phase combines all key-pairs with the same key into new keypairs
❖ The output phase writes the resulting pairs to files as “parts”
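The classic word-count job illustrates this workflow. Below is a hedged sketch of a mapper and reducer in the Hadoop Streaming style (both read standard input and emit tab-separated key/value lines); the file names and the exact streaming invocation vary by installation.

# mapper.py
import sys

for line in sys.stdin:
    for word in line.split():
        print(word + "\t1")                 # emit (word, 1) pairs

# reducer.py
import sys
from itertools import groupby

pairs = (line.rstrip("\n").split("\t") for line in sys.stdin)
for word, group in groupby(pairs, key=lambda kv: kv[0]):   # input arrives sorted by key
    total = sum(int(count) for _, count in group)
    print(word + "\t" + str(total))         # emit (word, total) pairs

A typical run submits both scripts to the Streaming jar shipped with Hadoop (its path varies by distribution) using the -mapper, -reducer, -input and -output options.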
Characteristics of MapReduce
MapReduce is characterized by:
❖ Its simplified programming model which allows the user to quickly write and test
distributed systems
❖ Its efficient and automatic distribution of data and workload across machines
❖ Its flat scalability curve. Specifically, after a Mapreduce program is written and functioning
on 10 nodes, very little, if any, work is required for making that same program run on 1000
nodes
The core concept of MapReduce in Hadoop is that input may be split into logical
chunks, and each chunk may be initially processed independently, by a map task. The results of
these individual processing chunks can be physically partitioned into distinct sets, which are
then sorted. Each sorted chunk is passed to a reduce task.
A map task may run on any compute node in the cluster, and multiple map tasks may
be running in parallel across the cluster. The map task is responsible for transforming the input
records into key/value pairs. The output of all of the maps will be partitioned, and each partition
will be sorted. There will be one partition for each reduce task. Each partition’s sorted keys and the
values associated with the keys are then processed by the reduce task. There may be multiple
reduce tasks running in parallel on the cluster.
The application developer needs to provide only four items to the Hadoop framework:
the class that will read the input records and transform them into one key/value pair per record,
a map method, a reduce method, and a class that will transform the key/value pairs that the
reduce method outputs into output records.
My first MapReduce application was a specialized web crawler. This crawler received
as input large sets of media URLs that were to have their content fetched and processed. The
media items were large, and fetching them had a significant cost in time and resources.
The job had several steps:
1. Ingest the URLs and their associated metadata.
2. Normalize the URLs.
3. Eliminate duplicate URLs.
4. Filter the URLs against a set of exclusion and inclusion filters.
5. Filter the URLs against a do not fetch list.
6. Filter the URLs against a recently seen set.
7. Fetch the URLs.
8. Fingerprint the content items.
9. Update the recently seen set.
Hadoop is bad
5.2 VirtualBox
VirtualBox is a general-purpose full virtualizer for x86 hardware, targeted at server,
desktop, and embedded use. Developed initially by Innotek GmbH, it was acquired by Sun
Microsystems in 2008, which was, in turn, acquired by Oracle in 2010.
VirtualBox is an extremely feature-rich, high-performance product for enterprise
customers; it is also the only professional solution that is freely available as Open Source Software
under the terms of the GNU General Public License (GPL) version 2. It supports Windows, Linux,
Macintosh, Sun Solaris, and FreeBSD. VirtualBox has supported the Open Virtualization Format
(OVF) since version 2.2.0 (April 2009).
Operating system virtualization allows your computer’s hardware to run many operating system
images simultaneously. One of the most used instances of this is to test software or applications in
a different environment, rather than on a different computer. This can potentially save you quite a
bit of money by running multiple servers virtually on one computer.
Pros and Cons of Virtual Box over VMWare
• Virtual Box is a general-purpose full virtualizer for x86 hardware, targeted at server, desktop,
and embedded use.
• This product is a Type 2 hypervisor, so it’s virtualization host software that runs on an already
established operating system as an application.
• With VirtualBox, it’s also possible to share your clipboard between the virtualized and host
operating system.
• While VMWare functions on Windows and Linux, not Mac, Virtual Box works with
Windows, Mac, and Linux computers.
• Virtual Box truly has a lot of support because it's open-source and free. Being open-source
means that recent releases are sometimes a bit buggy, but also that they typically get fixed
relatively quickly.
• With VMWare player, instead, you have to wait for the company to release an update to fix
the bugs.
• Virtual Box offers you an unlimited number of snapshots.
• Virtual Box is easy to install, takes a smaller amount of resources, and is many people’s first
choice.
• VMWare often failed to detect my USB device. Besides, VirtualBox can detect as well as
identify USB devices after installing the VirtualBox Extension Pack.
• With VirtualBox Guest Addition, files can be dragged and copied between VirtualBox and
host.
• VMware outperforms VirtualBox in terms of CPU and memory utilization.
• VirtualBox has snapshots and VMware has rollback points to which you can revert in
case you break your virtual machine.
• VMware calls it Unity mode and VirtualBox calls it the seamless mode and they both enable
you to open application windows on the host machine, while the VM supporting that app is
running in the background quietly.
• In the case of VirtualBox, the UI is simple and clean. Your settings are split into Machine
Tools and Global Tools and the former is for creating, modifying, starting, stop and deleting
virtual machines. VMware, on the other hand, has a much more complicated UI, menu items
are named with technical terms which may seem like jargon to average users. This is
primarily because the VMware folks cater to cloud providers and server-side virtualizations
more.
• In the case of VirtualBox, PCIe pass-through can be accomplished, although you might have to
jump through some hoops. VMware, on the other hand, offers excellent customer support and would
help you out if you are in a fix.
• VirtualBox is basically a highly secure program that allows users to download and run OS as
a virtual machine. With Virtual Box, users are able to abstract their hardware via complete
virtualization thus guaranteeing a higher degree of protection from viruses running in the
guest OS.
• Virtual Box offers limited support for 3D graphics. And VMWare has a high-level 3D
graphics support with DX10 and OpenGL 3.3 support.
• The real advantage of VirtualBox over VMware server lies in its performance. VirtualBox
apparently runs faster than VMware server. A timed experiment of an installation of Windows
XP as the guest OS took 20 mins in VirtualBox and 35 mins on VMware server. A similar test
on the booting time of the guest OS also shows favor to VirtualBox, with a timing of 45 secs
compared to 1 min 39 secs on VMware server.
• In VirtualBox, the remote file sharing feature is built right in the package. Setting up remote
file sharing is easy and you only need to do it once: point the file path to the directory that you
want to share.
Google App Engine (GAE)
Google App Engine is a PaaS cloud that provides a complete Web service
environment (platform).
GAE provides Web application development platform for users.
All required hardware, operating systems and software are provided to clients.
Clients can develop their own applications, while App Engine runs the applications on
Google’s servers.
Google has established cloud development by making use of a large number of data centers.
Eg: Google established cloud services in
❖ Gmail
❖ Google Docs
❖ Google Earth etc.
These applications can support a large number of users simultaneously with High
Availability (HA).
In 2008, Google announced the GAE web application platform.
GAE enables users to run their applications on a large number of data centers.
Google App Engine environment includes the following features :
❖ Dynamic web serving
❖ Persistent (constant) storage with queries, sorting, and transactions
❖ Automatic scaling and load balancing
Provides Application Programming Interface(API) for authenticating users.
Send email using Google Accounts.
Local development environment that simulates (creates) Google App Engine on your
computer.
GAE ARCHITECTURE
When the user wants to get the data, he/she will first send an authorized data request to
Google Apps.
It forwards the request to the tunnel server.
The tunnel servers validate the request identity.
If the identity is valid, the tunnel protocol allows the SDC to set up a connection,
authenticate, and encrypt the data that flows across the Internet.
SDC also validates whether a user is authorized to access a specified resource.
Application runtime environment offers a platform for web programming and execution.
It supports two development languages: Python and Java.
Software Development Kit (SDK) is used for local application development.
The SDK allows users to execute test runs of local applications and upload application
code.
Administration console is used for easy management of user application development
cycles.
GAE web service infrastructure provides special interfaces to guarantee flexible use and
management of storage and network resources by GAE.
Google offers essentially free GAE services to all Gmail account owners.
We can register for a GAE account or use your Gmail account name to sign up for the
service.
The service is free within a quota.
If you exceed the quota, an extra amount will be charged.
Allows the user to deploy user-built applications on top of the cloud infrastructure.
They are built using the programming languages and software tools supported by the
provider (e.g., Java, Python)
GAE APPLICATIONS
GAE provides a programming model for two supported languages: Java and Python. A client
environment includes an Eclipse plug-in for Java that allows you to debug your GAE application
on your local machine. Google Web Toolkit is available for Java web application developers.
Python is used with frameworks such as Django and CherryPy, but Google also provides a webapp
Python environment.
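A minimal handler in the webapp/webapp2 style for the (older) Python runtime looks like the sketch below; the route and response text are illustrative.

import webapp2

class MainPage(webapp2.RequestHandler):
    def get(self):
        # Respond to GET / with a plain-text greeting
        self.response.headers['Content-Type'] = 'text/plain'
        self.response.write('Hello from Google App Engine')

app = webapp2.WSGIApplication([('/', MainPage)], debug=True)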
There are several powerful constructs for storing and accessing data. The data store is a
NOSQL data management system for entities. Java offers Java Data Object (JDO) and Java
Persistence API (JPA) interfaces implemented by the Data Nucleus Access platform, while Python
has a SQL-like query language called GQL. The performance of the data store can be enhanced by
in-memory caching using the memcache, which can also be used independently of the data store.
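A hedged sketch of the legacy Python datastore (db) API with a GQL query and memcache is shown below; the model, cache key, and cache lifetime are illustrative.

from google.appengine.ext import db
from google.appengine.api import memcache

class Greeting(db.Model):
    content = db.StringProperty()
    date = db.DateTimeProperty(auto_now_add=True)

def latest_greetings():
    cached = memcache.get('latest')              # check the in-memory cache first
    if cached is not None:
        return cached
    rows = list(db.GqlQuery(
        "SELECT * FROM Greeting ORDER BY date DESC LIMIT 10"))
    memcache.set('latest', rows, time=60)        # cache the result for 60 seconds
    return rows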
Recently, Google added the blobstore which is suitable for large files as its size limit is 2
GB. There are several mechanisms for incorporating external resources. The Google SDC Secure
Data Connection can tunnel through the Internet and link your intranet to an external GAE
application. The URL Fetch operation provides the ability for applications to fetch resources and
communicate with other hosts over the Internet using HTTP and HTTPS requests.
An application can use Google Accounts for user authentication. Google Accounts handles
user account creation and sign-in, and a user that already has a Google account (such as a Gmail
account) can use that account with your app. GAE provides the ability to manipulate image data
using a dedicated Images service which can resize, rotate, flip, crop, and enhance images. A GAE
application is configured to consume resources up to certain limits or quotas. With quotas, GAE
ensures that your application won’t exceed your budget, and that other applications running on
GAE won’t impact the performance of your app. In particular, GAE use is free up to certain
quotas.
Google File System (GFS)
GFS is a fundamental storage service for Google’s search engine. GFS was designed for
Google applications, and Google applications were built for GFS. There are several concerns in
GFS. As servers are composed of inexpensive commodity components, it is the norm rather
than the exception that concurrent failures will occur all the time. Another concern is the file size in
GFS. GFS typically will hold a large number of huge files, each 100 MB or larger, with files that
are multiple GB in size quite common. Thus, Google has chosen its file data block size to be 64
MB instead of the 4 KB in typical traditional file systems. The I/O pattern in the Google
application is also special. Files are typically written once, and the write operations are often the
appending data blocks to the end of files. Multiple appending operations might be concurrent. The
customized API can simplify the problem and focus on Google applications.
Figure shows the GFS architecture. It is quite obvious that there is a single master in the
whole cluster. Other nodes act as the chunk servers for storing data, while the single master stores
the metadata. The file system namespace and locking facilities are managed by the master. The
master periodically communicates with the chunk servers to collect management information as
well as give instructions to the chunk servers to do work such as load balancing or fail recovery.
The master has enough information to keep the whole cluster in a healthy state. Google
uses a shadow master to replicate all the data on the master, and the design guarantees that all the
data operations are performed directly between the client and the chunk server. The control
messages are transferred between the master and the clients and they can be cached for future use.
With the current quality of commodity servers, the single master can handle a cluster of more than
1,000 nodes.
2. The master replies with the identity of the primary and the locations of the other (secondary)
replicas. The client caches this data for future mutations. It needs to contact the master again only
when the primary becomes unreachable or replies that it no longer holds a lease.
3. The client pushes the data to all the replicas. Each chunk server will store the data in an internal
LRU buffer cache until the data is used or aged out. By decoupling the data flow from the control
flow, we can improve performance by scheduling the expensive data flow based on the network
topology regardless of which chunk server is the primary.
4. Once all the replicas have acknowledged receiving the data, the client sends a write request to
the primary. The request identifies the data pushed earlier to all the replicas. The primary assigns
consecutive serial numbers to all the mutations it receives, possibly from multiple clients, which
provides the necessary serialization. It applies the mutation to its own local state in serial order.
5. The primary forwards the write request to all secondary replicas. Each secondary replica applies
mutations in the same serial number order assigned by the primary.
6. The secondaries all reply to the primary indicating that they have completed the operation.
7. The primary replies to the client. Any errors encountered at any replicas are reported to the
client. In case of errors, the write may have succeeded at the primary and an arbitrary subset of the secondary
replicas. The client request is considered to have failed, and the modified region is left in an
inconsistent state. Our client code handles such errors by retrying the failed mutation
Big Table
BigTable was designed to provide a service for storing and retrieving structured and
semistructured data. BigTable applications include storage of web pages, per-user data, and
geographic locations. The database needs to support very high read/write rates and the scale might
be millions of operations per second. Also, the database needs to support efficient scans over all or
interesting subsets of data, as well as efficient joins of large one-to-one and one-to-many data sets.
The application may need to examine data changes over time.
The BigTable system is scalable, which means the system has thousands of servers,
terabytes of in-memory data, petabytes of disk-based data, millions of reads/writes per second, and
efficient scans. BigTable is used in many projects, including Google Search, Orkut, and Google
Maps/Google Earth, among others.
The BigTable system is built on top of an existing Google cloud infrastructure. BigTable uses the
following building blocks:
1. GFS: stores persistent state
2. Scheduler: schedules jobs involved in BigTable serving
3. Lock service: master election, location bootstrapping
4. MapReduce: often used to read/write BigTable data.
Compute (Nova)
OpenStack Compute is also known as OpenStack Nova.
Nova is the primary compute engine of OpenStack, used for deploying and managing virtual
machines.
OpenStack Compute manages pools of compute resources and works with virtualization
technologies.
Nova can be deployed using hypervisor technologies such as KVM, VMware, LXC, XenServer,
etc.
Image Service (Glance)
OpenStack image service offers storing and retrieval of virtual machine disk images.
OpenStack Compute makes use of this during VM provisioning.
Glance has a client-server architecture that allows querying of virtual machine images.
While deploying new virtual machine instances, Glance uses the stored images as templates.
OpenStack Glance supports VirtualBox, VMWare and KVM virtual machine images.
Dashboard (Horizon)
OpenStack Horizon is a web-based graphical interface that cloud administrators and users can
access to manage OpenStack compute, storage and networking services.
To service providers it provides services such as monitoring, billing, and other management
tools.
Networking (Neutron)
Neutron provides networking capability like managing networks and IP addresses for
OpenStack.
OpenStack networking allows users to create their own networks and connects devices and
servers to one or more networks.
Neutron also offers an extension framework, which supports deploying and managing other
network services such as virtual private networks (VPNs), firewalls, load balancing, and intrusion
detection systems (IDS).
Telemetry (Ceilometer)
It provides customer billing, resource tracking, and alarming
capabilities across all OpenStack core components.
Orchestration (Heat)
Heat is a service to orchestrate (coordinates) multiple composite cloud applications using
templates.
Workflow (Mistral)
Mistral is a service that manages workflows.
User typically writes a workflow using workflow language and uploads the workflow definition.
The user can start workflow manually.
Database (Trove)
Trove is Database as a Service for OpenStack.
Allows users to quickly and easily utilize the features of a database without the burden of
handling complex administrative tasks.
Elastic MapReduce (Sahara)
Users will specify several parameters like the Hadoop version number, the cluster topology
type, node flavor details (defining disk space, CPU and RAM settings), and others.
Messaging (Zaqar)
Zaqar is a multi-tenant cloud messaging service for Web developers.
DNS (Designate)
Designate is a multi-tenant API for managing DNS.
Search (Searchlight)
Searchlight provides advanced and consistent search capabilities across various OpenStack
cloud services.
Alarming (Aodh)
This alarming service enables the ability to trigger actions based on defined rules against
event data collected by Ceilometer.
Permissive Federation:
Permissive federation occurs when a server accepts a connection from a peer network
server without verifying its identity using DNS lookups or certificate checking.
The lack of verification or authentication may lead to domain spoofing (i.e., the unauthorized
use of a third-party domain name in an email message in order to pretend to be someone else),
which opens the door to widespread spam and other abuses.
Verified Federation:
This type of federation occurs when a server accepts a connection from a peer after the
identity of the peer has been verified.
It uses information obtained via DNS and by means of domain-specific keys exchanged
beforehand.
The connection is not encrypted, and the use of identity verification effectively prevents
domain spoofing.
Federation requires proper DNS setup, and that is still subject to DNS poisoning attacks.
Verified federation has been the default service policy on the open XMPP network since the
release of the open-source jabberd 1.2 server.
XMPP, a real-time communication protocol, uses XML.
Verified federation prevents address spoofing.
Encrypted federation:
Server accepts a connection from a peer if and only if the peer supports Transport Layer
Security (TLS) as defined for XMPP in Request for Comments (RFC) 3920.
The peer must present a digital certificate.
The certificate may be self-signed, but this prevents using mutual authentication.
XEP-0220 defines the Server Dialback protocol, which is used between XMPP servers
to provide identity verification.
Server Dialback uses the DNS as the basis for verifying identity.
The basic approach is that a receiving server receives a server-to-server connection
request from an originating server.
It does not accept the request until it has verified a key with an authoritative server for
the domain asserted by the originating server.
Server Dialback does not provide strong authentication or trusted federation
Future of Federation:
The implementation of federated communications is a precursor to building a seamless
cloud that can interact with people, devices, information feeds, documents, application
interfaces, and other entities.
It enables software developers and service providers to build and deploy such
applications without asking permission from a large, centralized communications
operator.
Many big companies (e.g. banks, hosting companies, etc.) and also many large institutions
maintain several distributed data-centers or server-farms, for example to serve multiple
geographically distributed offices, to implement HA, or to guarantee server proximity to
the end user. Resources and networks in these distributed data-centers are usually
configured as non-cooperative separate elements.
Many educational and research centers often deploy their own computing infrastructures,
which usually do not cooperate with those of other institutions, except in some specific
situations (e.g., joint projects or initiatives). Many times, even different departments within the
same institution maintain their own non-cooperative infrastructures.
Cloud end-users are often tied to a unique cloud provider, because the different APIs,
image formats, and access methods exposed by different providers make it very difficult
for an average user to move applications from one cloud to another, leading to a
vendor lock-in problem.
Many SMEs have their own on-premise private cloud infrastructures to support the
internal computing necessities and workloads. These infrastructures are often over-sized
to satisfy peak demand periods and avoid performance slow-downs. The hybrid cloud (or
cloud bursting) model is a solution to reduce the on-premise infrastructure size, so that it
can be dimensioned for an average load, and it is complemented with external resources
from a public cloud provider to satisfy peak demands.
The cloud consumer is often presented with "take-it-or-leave-it" standard contracts that
might be cost-saving for the provider but are often undesirable for the user. The
Commission aims to develop with stakeholders model terms for cloud computing service
level agreements for contracts.