CCS335 Cloud Computing Lecture Notes 2
UNIT I
CLOUD ARCHITECTURE MODELS AND INFRASTRUCTURE
Cloud Architecture: System Models for Distributed and Cloud Computing – NIST
Cloud Computing Reference Architecture – Cloud deployment models – Cloud service
models.
CLOUD ARCHITECTURE:
1. Explain in detail about Cloud Computing Architecture (Or) Explain the Various
Layered Cloud Architectural Development Design for Effective Cloud Computing
Environment. (Nov/Dec 2020)
To design an effective cloud computing environment, certain requirements have to be considered. The basic requirements for cloud architecture design are given as follows:
• The cloud architecture design must provide automated delivery of cloud services along
with automated management.
• It must support the latest web standards such as Web 2.0 (or higher) and REST or RESTful APIs.
• It must support very large-scale HPC infrastructure with both physical and virtual machines.
• The architecture of the cloud must be loosely coupled.
• It should provide easy access to cloud services through a self-service web portal.
• Cloud management software must efficiently receive user requests, find the correct resources and then call the provisioning services that invoke the resources in the cloud (a minimal sketch of such a provisioning request follows this list).
• It must provide enhanced security for shared access to the resources from data centers.
• It must use cluster architecture to achieve system scalability.
• The cloud architecture design must be reliable and flexible.
• It must provide efficient performance and faster speed of access.
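As a rough illustration of the self-service and provisioning requirements above, the following sketch shows how a portal client might submit a VM request to cloud management software over a RESTful API. The endpoint, payload fields and token are hypothetical placeholders, not any real provider's API.

```python
# Minimal sketch of a self-service provisioning request over a RESTful API.
# The endpoint, payload fields and token below are hypothetical placeholders;
# real cloud management software defines its own API schema.
import requests

API = "https://cloud.example.com/api/v1"   # hypothetical portal endpoint
TOKEN = "replace-with-your-api-token"      # placeholder credential

def request_vm(name, vcpus, memory_gb):
    """Ask the provisioning service to allocate a virtual machine."""
    payload = {"name": name, "vcpus": vcpus, "memory_gb": memory_gb}
    resp = requests.post(
        f"{API}/servers",
        json=payload,
        headers={"Authorization": f"Bearer {TOKEN}"},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()   # e.g. {"id": "...", "status": "BUILD"}

if __name__ == "__main__":
    print(request_vm("web-01", vcpus=2, memory_gb=4))
```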
Today's clouds are built to support a large number of tenants over shared resource pools and large data volumes, so both hardware and software play an important role in achieving this. Rapid development in multicore CPUs, memory chips and disk arrays has made it possible to build data centers with large volumes of storage space, while developments in software standards such as Web 2.0 and SOA have immensely helped in developing cloud services.
The Service Oriented Architecture (SOA) is also a crucial component which is used in the
delivery of SaaS.
The web service software detects the status of each node server as it joins or leaves the system and performs the appropriate tasks accordingly. Virtualization of the infrastructure allows for quick cloud delivery and recovery from disasters. In recent cloud platforms, resources are built into data centers that are typically owned and operated by a third-party provider.
Layered Cloud Architecture Design
The layered architecture of a cloud is composed of three basic layers called infrastructure, platform and application. These three levels of architecture are implemented with virtualization and standardization of cloud-provided hardware and software resources. This
architectural design facilitates public, private and hybrid cloud services that are conveyed to
users through networking support over the internet and the intranets. The layered cloud
architecture design is shown in Fig. 1.1
In the layered architecture, the foundation layer is the infrastructure layer, which is responsible for providing different Infrastructure as a Service (IaaS) components and related services. It is the first layer to be deployed, before the platform and application layers, in order to provide IaaS services and to run the other two layers.
• The infrastructure layer consists of virtualized services for computing, storage and
networking. It is responsible for provisioning infrastructure components like compute
(CPU and memory), storage, network and IO resources to run virtual machines or
virtual servers along with virtual storages.
• The abstraction of these hardware resources is intended to provide flexibility to the users. Internally, virtualization performs automated resource provisioning and optimizes the process of managing resources.
The infrastructure layer acts as a foundation for building the second layer, called the platform layer, which supports PaaS services.
The platform layer is responsible for providing a readily available development and deployment platform for web applications to cloud users, without requiring them to install anything on a local device. This layer provides an environment for users to create their applications, test operation flows, track performance and monitor execution results.
The platform must ensure scalability, reliability and security. In this layer, the virtualized cloud platform acts as "application middleware" between the cloud infrastructure and the application layer of the cloud. The platform layer is the foundation for the application layer.
A collection of all software modules required for SaaS applications forms the application layer. This layer is mainly responsible for on-demand application delivery. In this layer, software applications include day-to-day office management software used for information collection, document processing, calendaring and authentication.
Enterprises also use the application layer extensively in business marketing, sales,
Customer Relationship Management (CRM), financial transactions and Supply Chain
Management (SCM). It is important to remember that not all cloud services are limited to a
single layer.
Many applications may require resources from multiple layers. With a relation of dependency, the three layers are constructed in a bottom-up approach. From the perspective of the user, the services at various levels need specific amounts of vendor support and resource management for functionality.
In general, SaaS requires the provider to do the most work, PaaS is in the middle and IaaS requires the least. A good example of the application layer is Salesforce.com's CRM service, where not only the hardware at the bottom layer and the software at the top layer are supplied by the vendor, but also the platform and software tools for user application development and monitoring.
With today’s networking technology, a few LAN switches can easily connect hundreds
of machines as a working cluster. A WAN can connect many local clusters to form a very
large cluster of clusters. Massive systems are considered highly scalable, and can reach
web-scale connectivity, either physically or logically.
In the past 30 years, users have experienced a natural growth path from Internet to web
and grid computing services. Internet services such as the Telnet command enable a local computer to connect to a remote computer. A web service such as HTTP enables remote access to web pages. Grid computing is envisioned to allow close interaction among applications
running on distant computers simultaneously. Forbes Magazine has projected the global growth
of the IT-based economy from $1 trillion in 2001 to $20 trillion by 2015. The evolution from
Internet to web and grid services is certainly playing a major role in this growth.
3.1 P2P Systems
In a P2P system, client machines act autonomously to join or leave the system freely. This implies that no master-slave relationship exists among the peers. No central coordination or central database is needed. In other words, no peer machine has a global view of the entire P2P system. The system is self-organizing with distributed control.
Figure 1.17 shows the architecture of a P2P network at two abstraction levels. Initially, the
peers are totally unrelated. Each peer machine joins or leaves the P2P network voluntarily. Only
the participating peers form the physical network at any time. Unlike the cluster or grid, a P2P
network does not use a dedicated interconnection network. The physical network is simply an ad
hoc network formed at various Internet domains randomly using the TCP/IP and NAI protocols.
Thus, the physical network varies in size and topology dynamically due to the free membership
in the P2P network.
3.2 Overlay Networks
Data items or files are distributed in the participating peers. Based on communication or
file-sharing needs, the peer IDs form an overlay network at the logical level. This overlay is a
virtual network
formed by mapping each physical machine with its ID, logically, through a virtual mapping as
shown in Figure 1.17. When a new peer joins the system, its peer ID is added as a node in the
overlay network. When an existing peer leaves the system, its peer ID is removed from the overlay
network automatically. Therefore, it is the P2P overlay network that characterizes the logical
connectivity among the peers.
There are two types of overlay networks: unstructured and structured. An unstructured overlay network is characterized by a random graph; there is no fixed route to send messages or files among the nodes. Often, flooding is applied to send a query to all nodes in an unstructured overlay, resulting in heavy network traffic and nondeterministic search results (a toy flooding sketch is given after this paragraph). Structured overlay networks follow certain connectivity topologies and rules for inserting and removing nodes (peer IDs) from the overlay graph. Routing mechanisms are developed to take advantage of the structured overlays.
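The sketch below is only an illustrative toy model of query flooding in an unstructured overlay, not a real P2P protocol: the overlay is a hand-written adjacency list, and a TTL (hop limit) bounds the traffic that flooding would otherwise generate.

```python
# Illustrative sketch of flooding a query through an unstructured overlay,
# modelled as an adjacency list, with a TTL to bound the generated traffic.
from collections import deque

def flood_query(overlay, start, wanted_file, ttl=3):
    """Breadth-first flood from `start`; returns peers that hold `wanted_file`."""
    hits, visited = [], {start}
    queue = deque([(start, ttl)])
    while queue:
        peer, hops_left = queue.popleft()
        if wanted_file in overlay[peer]["files"]:
            hits.append(peer)
        if hops_left == 0:
            continue                      # TTL expired: stop forwarding
        for neighbor in overlay[peer]["neighbors"]:
            if neighbor not in visited:   # avoid re-flooding the same peer
                visited.add(neighbor)
                queue.append((neighbor, hops_left - 1))
    return hits

overlay = {
    "A": {"neighbors": ["B", "C"], "files": set()},
    "B": {"neighbors": ["A", "D"], "files": {"song.mp3"}},
    "C": {"neighbors": ["A"],      "files": set()},
    "D": {"neighbors": ["B"],      "files": {"song.mp3"}},
}
print(flood_query(overlay, "A", "song.mp3"))   # ['B', 'D']
```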
3.3 P2P Application Families
Based on application, P2P networks are classified into four groups, as shown in Table 1.5.
The first family is for distributed file sharing of digital contents (music, videos, etc.) on the P2P
network. This includes many popular P2P networks such as Gnutella, Napster, and BitTorrent,
among others. Collaboration P2P networks include MSN or Skype chatting, instant messaging,
and collaborative design, among others.
3.4 P2P Computing Challenges
P2P computing faces three types of heterogeneity problems in hardware, software, and
network requirements. There are too many hardware models and architectures to select from;
incompatibility exists between software and the OS; and different network connections and
protocols
make it too complex to apply in real applications. We need system scalability as the workload increases; system scaling is directly related to performance and bandwidth, and P2P networks do have these properties. Data location is also important, as it affects collective performance. Data locality, network proximity, and interoperability are three design objectives in distributed P2P applications.
3. Internet clouds : The idea is to move desktop computing to a service-oriented platform using server clusters and huge databases at data centers. Cloud computing leverages its low cost and simplicity to benefit both users and providers. Machine virtualization has enabled such cost-effectiveness. Cloud computing intends to satisfy many user applications simultaneously. Virtualized resources from data centers form an Internet cloud, provisioned with hardware, software, storage, network, and services for paid users to run their applications.
NIST CLOUD COMPUTING REFERENCE ARCHITECTURE
The reference architecture model is given by the National Institute of Standards and Technology (NIST). The model offers approaches for secure cloud adoption while contributing to cloud computing guidelines and standards.
The NIST team works closely with leading IT vendors, developers of standards, other governmental agencies and industries at a global level to support effective cloud computing security standards and their further development. It is important to note that this NIST cloud reference architecture does not belong to any specific vendor product, service or reference implementation, nor does it prevent further innovation in cloud technology.
Fig. 1.2 : Conceptual cloud reference model showing different actors and entities
From Fig. 1.2, note that the cloud reference architecture includes five major actors :
• Cloud consumer
• Cloud provider
• Cloud auditor
• Cloud broker
• Cloud carrier
Now, understand that a cloud consumer can request cloud services directly from a
CSP or from a cloud broker. The cloud auditor independently audits and then contacts
other actors to gather information. We will now discuss the role of each actor in detail.
Cloud Consumer
A cloud consumer is the most important stakeholder; the cloud service is built to support the cloud consumer. A cloud consumer is a person or organization that maintains a business relationship with, and uses services from, a CSP. The consumer browses the service catalogue from the cloud provider, requests an appropriate service or sets up service contracts for using the service, and is billed for the service used.
Some typical usage scenarios include :
Example 1 : The cloud consumer requests the service from a broker instead of directly contacting the CSP. The cloud broker can then create a new service by combining multiple services or by enhancing an existing service. Here, the actual cloud provider is not visible to the cloud consumer; the consumer only interacts with the broker. This is illustrated in Fig. 1.4.
Example 2 : In this scenario, the cloud carrier provides for connectivity and transports
cloud services to consumers. This is illustrated in Fig. 1.5.
In Fig. 1.5, the cloud provider participates by arranging two SLAs: one with the cloud carrier (SLA2) and one with the consumer (SLA1). Here, the cloud provider will have an arrangement (SLA) with the cloud carrier to provide secured, encrypted connections. This ensures that the services are available to the consumer at a consistent level to fulfil service requests. Here, the provider can specify requirements such as flexibility, capability and functionalities in SLA2 in order to fulfil essential service requirements in SLA1.
Example 3 : In this usage scenario, the cloud auditor conducts independent evaluations
for a cloud service. The evaluations will relate to operations and security of cloud service
implementation. Here the cloud auditor interacts with both the cloud provider and consumer,
as shown in Fig. 1.6.
In all the given scenarios, the cloud consumer plays the most important role. Based on the
service request, the activities of other players and usage scenarios can differ for other cloud
consumers. Fig. 1.7 shows an example of available cloud services types.
In Fig. 1.7, note that SaaS applications are made available over a network to all consumers. These consumers may be organisations with access to software applications, end users, app developers or administrators. Billing is based on the number of end users, the time of use, the network bandwidth consumed and the amount or volume of data stored.
PaaS consumers can utilize the tools, execution resources and development IDEs made available by cloud providers. Using these resources, they can test, develop, manage, deploy and configure many applications hosted on a cloud. PaaS consumers are billed based on the processing, database, storage and network resources consumed, and for the duration of platform usage.
On the other hand, IaaS consumers can access virtual computers, network-attached storage, network components, processor resources and other computing resources, on which they can deploy and run arbitrary software. IaaS consumers are billed based on the amount and duration of hardware resources consumed: the number of IP addresses, the volume of data stored, the network bandwidth and the CPU hours used over a certain duration. A simple illustration of such usage-based billing is sketched below.
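The following sketch only illustrates the arithmetic of usage-based IaaS billing; the rates are made-up placeholders, not any provider's actual prices, and real providers follow their own published rate cards.

```python
# Illustrative IaaS bill calculation; the rates below are invented placeholders.
RATES = {
    "cpu_hour": 0.05,          # currency units per vCPU-hour
    "storage_gb_month": 0.02,  # per GB stored for a month
    "bandwidth_gb": 0.01,      # per GB transferred
    "public_ip_month": 2.00,   # per allocated public IP per month
}

def monthly_iaas_bill(cpu_hours, storage_gb, bandwidth_gb, public_ips):
    return (cpu_hours     * RATES["cpu_hour"]
            + storage_gb  * RATES["storage_gb_month"]
            + bandwidth_gb * RATES["bandwidth_gb"]
            + public_ips  * RATES["public_ip_month"])

# Example: 2 vCPUs running the whole month (~730 h each), 100 GB stored,
# 50 GB of traffic and one public IP address -> 77.5 currency units.
print(round(monthly_iaas_bill(2 * 730, 100, 50, 1), 2))
```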
Cloud Provider
Cloud provider is an entity that offers cloud services to interested parties. A cloud provider
manages the infrastructure needed for providing cloud services. The CSP also runs the
software to provide services and organizes the service delivery to cloud consumers
through networks.
SaaS providers deploy, configure, maintain and update all operations of the software application on the cloud infrastructure, in order to ensure that services are provisioned and cloud consumer service requests are fulfilled. SaaS providers assume most of the responsibilities associated with managing and controlling the applications deployed on the infrastructure, whereas SaaS consumers have no or limited administrative control.
PaaS cloud providers manage the computing infrastructure, ensure that the platform runs the cloud software, and implement the databases, the appropriate runtime software execution stack and other required middleware elements. They support the development, deployment and management activities of PaaS consumers by providing them with the necessary tools, such as IDEs and SDKs. PaaS consumers, in turn, have control over their applications and some settings of the hosting environment, but have little control over the infrastructure lying under the platform: the network, servers, OS and storage.
The IaaS CSP aggregates physical cloud resources such as networks, servers, storage and the network hosting infrastructure. The provider operates the cloud software and makes all compute resources available to the IaaS cloud consumer through a set of service interfaces, such as VMs and virtual network interfaces. The IaaS cloud provider has control over the physical hardware and the cloud software that enables the provisioning of these infrastructure services.
Cloud Auditor
The cloud auditor performs the task of independently evaluating cloud service controls to provide an honest opinion when requested. Cloud audits are done to validate conformance to standards by reviewing objective evidence. The auditor will examine the services provided by the cloud provider for security controls, privacy, performance, and so on.
Cloud Broker
The cloud broker collects service requests from cloud consumers and manages the use,
performance, and delivery of cloud services. The cloud broker will also negotiate and manage
the relationship between cloud providers and consumers. A cloud broker may provide services
that fall into one of the following categories :
• Service intermediation : Here the cloud broker improves some specific capability and provides value-added services to cloud consumers.
• Service aggregation : The cloud broker links and integrates different services into
one or more new services.
• Service Arbitrage : This is similar to aggregation, except for the fact that services
that are aggregated are not fixed. In service arbitrage, the broker has the liberty to
choose services from different agencies.
Cloud Carrier
The cloud carrier tries to establish connectivity and transports cloud services between a
www.EnggTree.com
cloud consumer and a cloud provider. Cloud carriers offer network access for consumers,
by providing telecommunication links for accessing resources using other devices (laptops,
computers, tablets, smartphones, etc.). Usually, a transport agent is an entity offering
telecommunication carriers to a business organization to access resources. The cloud provider
will set up SLAs with cloud carrier to ensure carrier transport is consistent with the level of
SLA provided by the consumers. Cloud carriers provide secure and dedicated high - speed
links with cloud providers and between different cloud
entities.
Actor and definition :
• Cloud Consumer : A person or organization that maintains a business relationship with, and uses service from, Cloud Providers.
CLOUD DEPLOYMENT MODELS
• Public cloud
• Private cloud
• Hybrid cloud
• Community cloud
These models describe the way in which users can access cloud services. Each cloud deployment model fits different organizational needs, so it is important to pick a model that suits your organization's needs. The four deployment models are characterized by the functionality and accessibility of the cloud services they provide, and are shown in Fig. 1.9.
Public Cloud
Public cloud services run over the internet. Therefore, users who want cloud services must have an internet connection on their local device, such as a thin client, thick client, mobile, laptop or desktop. Public cloud services are managed and maintained by Cloud Service Providers (CSPs) or Cloud Service Brokers (CSBs). They are often offered with utility-based pricing, such as subscription or pay-per-use models, and are provided through the internet and APIs. This model allows users to easily access services without purchasing any specialized hardware or software; any device with a web browser and internet connectivity can be a public cloud client. Popular public cloud service providers are Amazon Web Services, Microsoft Azure, Google App Engine, Salesforce, etc.
Advantages of public cloud
1. It saves the capital cost of purchasing server hardware, operating systems and application software licenses.
2. There is no need for server administrators to take care of the servers, as they are kept at the CSP's data center and managed by the CSP.
3. A user gets easy access to multiple services under a single self-service portal.
4. It is cheaper than an in-house cloud implementation, because users pay only for what they have used.
5. The resources are easily scalable.
Disadvantages of public cloud
1. There is a lack of data security: data is stored in a public data center managed by third-party vendors, so the user's confidential data may be compromised.
2. Recovery of backup data is expensive.
3. The user never knows where (at which location) their data is stored, how it can be recovered and how many replicas of the data have been created.
Private Cloud
Private cloud services are used by organizations internally and most of the time run over an intranet connection. They are designed for a single organization, so anyone within the organization can easily access data, services and web applications through local servers and the local network, but users outside the organization cannot access them. Because these cloud services are hosted on the intranet, only users connected to that intranet get access to them. The infrastructure for a private cloud is fully managed and maintained by the organization itself.
It is much more secure than a public cloud, as it gives local administrators the freedom to write their own security policies for user access. It also provides a good level of trust and privacy to the users. Private clouds are more expensive than public clouds due to the capital expenditure involved in acquiring and maintaining them. Well-known private cloud platforms are OpenStack, OpenNebula, Eucalyptus, VMware Private Cloud, etc.
Advantages of private cloud
1. Speed of access is very high, as services are provided through local servers over the local network.
2. It is more secure than a public cloud, as the security of cloud services is handled by local administrators.
3. It can be customized as per the organization's needs.
Hybrid Cloud
Hybrid cloud services are composed of two or more clouds and offer the benefits of multiple deployment models. A hybrid cloud mostly comprises an on-premise private cloud and an off-premise public cloud, to leverage the benefits of both and to allow users inside and outside the organization to access it. The hybrid cloud provides flexibility such that users can migrate their applications and services from the private cloud to the public cloud and vice versa. It has become highly favored in the IT industry because of features like mobility, customized security, high throughput, scalability, disaster recovery, easy backup and replication across clouds, high availability and cost efficiency. Popular hybrid clouds are AWS with Eucalyptus, AWS with VMware Cloud, Google Cloud with Nutanix, etc.
The limitations of the hybrid cloud are compatibility between deployment models, vendor lock-in solutions, the need for common cloud management software and the management of separate cloud platforms.
Community Cloud
The community cloud is basically a combination of one or more public, private or hybrid clouds that is shared by many organizations for a single cause. A community cloud is set up between multiple organizations whose objectives are the same. The infrastructure for a community cloud is shared by several organizations within a specific community with common security and compliance objectives, and it is managed by a third-party organization or managed internally. Well-known community clouds are Salesforce Community Cloud, Google community cloud, etc.
The comparison between different deployment models of cloud computing is given in Table 1.3.1.
Sr. | Feature     | Public Cloud            | Private Cloud                                        | Hybrid Cloud                     | Community Cloud
1   | Scalability | Very High               | Limited                                              | Very High                        | Limited
2   | Security    | Less Secure             | Most Secure                                          | Very Secure                      | Less Secure
3   | Performance | Low to Medium           | Good                                                 | Good                             | Medium
4   | Reliability | Medium                  | High                                                 | Medium to High                   | Medium
5   | Network     | Internet                | Intranet                                             | Intranet and Internet            | Internet
6   | Example     | Windows Azure, AWS etc. | OpenStack, VMware Cloud, CloudStack, Eucalyptus etc. | Combination of OpenStack and AWS | Salesforce Community Cloud
CLOUD SERVICE MODELS
Cloud computing is meant to provide a variety of services and applications for users over the internet or an intranet. The most widespread services of cloud computing are categorised into three service classes, which are called cloud service models, cloud reference models or working models of cloud computing. They are based on the abstraction level of the offered capabilities and the service model of the CSPs. The various service models are :
• Infrastructure as a Service (IaaS)
• Platform as a Service (PaaS)
• Software as a Service (SaaS)
The three service models of cloud computing and their functions are shown in Fig. 1.10.
From Fig. 1.10, we can see that Infrastructure as a Service (IaaS) is the bottommost layer in the model, while Software as a Service (SaaS) lies at the top. IaaS has a lower level of abstraction and visibility, while SaaS has the highest level of visibility.
The Fig. 1.11 represents the cloud stack organization from physical infrastructure to
applications.
In this layered architecture, the abstraction levels are seen where higher-layer services include the services of the underlying layers.
As you can see in Fig. 1.11, the three services, IaaS, PaaS and SaaS, can exist independent of one another or may combine with one another at some layers. Different layers in every cloud computing model are managed either by the user or by the vendor (provider).
In case of the traditional IT model, all the layers or levels are managed by the user because
he or she is solely responsible for managing and hosting the applications.
In the case of IaaS, the top five layers are managed by the user, while the four lower layers (virtualisation, server hardware, storage and networking) are managed by the vendor or provider. So, here, the user is accountable for managing everything from the operating system up to the applications, including databases and application security.
The core middleware manages the physical resources and the VMs are deployed on top
of them. This deployment will provide the features of pay-per-use services and multi-tenancy.
Infrastructure services support cloud development environments and provide capabilities for
application development and implementation.
It provides different libraries, models for programming, APIs, editors and so on to support
application development. When this deployment is ready for the cloud, they can be used
by end-users/ organisations. With this idea, let us further explore the different service models.
Infrastructure as a Service (IaaS)
• Developers use the IaaS service model to create virtual hardware on which the applications and/or services are developed.
• Developers can create virtual private storage, virtual private servers, and virtual
private networks by using IaaS.
• The private virtual systems contain software applications to complete the IaaS
solution. The infrastructure of IaaS consists of communication networks, physical
compute nodes, storage solutions and the pool of virtualized computing resources
managed by a service provider.
• IaaS provides users with a web-based service that can be used to create, destroy
and manage virtual machines and storage.
The resources for this server instance are drawn from a mix of virtualised systems, RAID disks, network and interface capacity. These physical systems are partitioned into logical units.
The client in IaaS is allocated with its own private network. For example, Amazon EC2
enables this service to behave such that each server has its own separate network unless the
user creates a virtual private cloud. If the EC2 deployment is scaled by adding additional
networks on the infrastructure, it is easy to logically scale, but this can create an overhead
as traffic gets routed between logical networks.
In IaaS, the customer has control over the OS, storage and installed applications, but has limited control over network components; the user cannot control the underlying cloud infrastructure. Services offered by IaaS include web servers, server hosting, computer hardware, OS, virtual instances, load balancing and bandwidth provisioning. These services are useful during volatile demand, when computing resources are needed for a new business launch, when the company does not want to buy hardware, or when the organisation wants to expand. A short example of launching an IaaS virtual machine programmatically is sketched below.
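The sketch below shows how an IaaS consumer might create a virtual server programmatically using the boto3 SDK for Amazon EC2 (mentioned above). The AMI ID, key pair name and region are placeholders, and it assumes valid AWS credentials are already configured on the machine running the script.

```python
# Sketch of requesting an IaaS virtual machine via the boto3 SDK for Amazon EC2.
# ImageId, KeyName and the region are placeholders; AWS credentials are assumed
# to be configured (e.g. via environment variables or ~/.aws/credentials).
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",   # placeholder machine image
    InstanceType="t2.micro",           # small virtual server size
    KeyName="my-keypair",              # placeholder SSH key pair
    MinCount=1,
    MaxCount=1,
)

instance_id = response["Instances"][0]["InstanceId"]
print("Launched instance:", instance_id)

# Later, the same API can release the resource:
# ec2.terminate_instances(InstanceIds=[instance_id])
```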
Platform as a Service
• Platform as a Service can be defined as a computing platform that allows the user to create web applications quickly and easily, without worrying about buying and maintaining the software and infrastructure.
• Platform as a Service provides tools for developing, deploying and testing software, along with middleware solutions, databases, programming languages and APIs for developers to build custom applications, without installing or configuring the development environment.
• PaaS provides a platform to run web applications without installing them on a local machine, i.e. applications written by users can be run directly on the PaaS cloud. It is built on top of the IaaS layer.
• PaaS realizes many unique benefits, such as utility computing, hardware virtualization, dynamic resource allocation, low investment costs and a pre-configured development environment. It has all the applications typically required by the client deployed on it. Some key providers of PaaS clouds are Google App Engine, Microsoft Azure, NetSuite, Red Hat OpenShift, etc.
• The PaaS model includes the software environment where the developer can create custom solutions using the development tools available with the PaaS platform. The components of a PaaS platform are shown in Fig. 1.13. Platforms can support specific development languages, frameworks for applications and other constructs. Also, PaaS provides tools and development environments to design applications; usually, a fully Integrated Development Environment (IDE) is offered as part of the platform.
A PaaS customer can control services such as device integration, session management,
content management, sandbox, and so on. In addition to these services, customer controls are
also possible in Universal Description Discovery and Integration (UDDI), and platform
independent Extensible Mark-up Language (XML) registry that allows registration and
identification of web service apps.
Let us consider the example of Google App Engine. The platform allows developers to program apps using Google's published APIs. In this platform, Google defines the tools to be used within the development framework, the file system structure and the data stores. A similar PaaS offering is given by Force.com, a vendor based on the Salesforce.com development platform for the latter's SaaS offerings; Force.com provides an add-on development environment.
In PaaS, developers can build an app with Python and the Google APIs. Here, the PaaS vendor offers a complete solution to the user; for instance, Google acts as a PaaS vendor and offers web service apps to users. Other examples are Google Earth, Google Maps, Gmail, etc. A minimal sketch of the kind of application a PaaS consumer deploys is shown below.
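The following is a minimal sketch of a web application as a PaaS consumer might deploy it on Google App Engine's standard environment, assuming a Python runtime declared in a small app.yaml descriptor (e.g. "runtime: python39"; exact runtime versions change over time). The platform supplies the runtime, scaling and HTTP serving; the developer supplies only the application code.

```python
# main.py - minimal sketch of a PaaS-hosted web app (e.g. Google App Engine
# standard environment). Only application logic is written by the developer;
# the platform handles servers, scaling and request routing.
from flask import Flask

app = Flask(__name__)

@app.route("/")
def hello():
    # Business logic only; no web server, OS or hardware to manage.
    return "Hello from a PaaS-hosted application!"

if __name__ == "__main__":
    # Local testing; in production the platform runs the app behind its own server.
    app.run(host="127.0.0.1", port=8080, debug=True)
```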
PaaS has a few disadvantages. It locks the developer and the solution into a platform specific to the PaaS vendor. For example, an application developed in Python using the Google API on Google App Engine might work only in that environment.
PaaS is also useful in situations such as major software development projects, where developers and users collaborate to develop applications and automate testing services.
Part-A
1. List out the major functionalities of cloud computing. (or) Mention the characteristic
features of cloud. (Apr/May’17)(May-2022)
• The cloud will free users to focus on user application development and create business
value by outsourcing job execution to cloud providers. The computations (programs) are
sent to where the data is located, rather than copying the data to millions of desktops as
in the traditional approach.
• Cloud computing avoids large data movement, resulting in much better network
bandwidth utilization.
• Furthermore, machine virtualization has enhanced resource utilization, increased
application flexibility, and reduced the total cost of using virtualized data-center
resources.
• The cloud offers significant benefit to IT companies by freeing them from the low-level
task of setting up the hardware (servers) and managing the system software.
2. Write short notes on Research Compute Cloud (RC2) / Why do we need hybrid cloud?
[NOV / DEC’16](Nov/Dec 2021)
Research Compute Cloud (RC2) is a private cloud, built by IBM, that interconnects the computing and IT resources at eight IBM Research Centers scattered
throughout the United States, Europe, and Asia. A hybrid cloud provides access to clients, the
partner network, and third parties. Public clouds promote standardization, preserve capital
investment, and offer application flexibility. Private clouds attempt to achieve customization
and offer higher efficiency, resiliency, security, and privacy. Hybrid clouds operate in the
middle, with many compromises in terms of resource sharing.
• SaaS: Software as a Service provides users direct access to the cloud application without installing anything on the local system.
• IaaS: Infrastructure as a Service provides the infrastructure in terms of hardware, such as memory, processing capacity, storage, etc.
• PaaS: Platform as a Service provides a cloud application platform for developers. It also reduces the maintenance and support effort for applications developed using the cloud service.
• Professional cloud
• Personal cloud
• Performance cloud
IaaS (Infrastructure as a Service) provides the virtual and physical resources that are used to build a cloud. It deals with the complexities of deploying and maintaining the services provided by this layer. Here the infrastructure consists of the servers, storage and other hardware systems.
• Zero infrastructure investment
• Just in time infrastructure
• More efficient resource utilization
17. What are the characteristics of cloud architecture that separate it from the traditional one?
• Reference architecture
• Technical architecture
• Deployment operation architecture
UNIT II
VIRTUALIZATION BASICS
Virtual Machine:
A machine, from the perspective of a system, is implemented by the underlying hardware alone, and the ISA provides the interface between the system and the machine.
Performance is often of secondary importance compared to accurate functionality, while with codesigned VMs, performance (and power efficiency) are often major goals. In the figure, codesigned VMs are connected using dotted lines because their interface is typically at a lower level than that of other system VMs.
Types of Virtual Machines : You can classify virtual machines into two types:
1. System Virtual Machine : These virtual machines provide a complete system platform and support the execution of a complete operating system. Just like VirtualBox, a system virtual machine provides an environment in which an OS can be installed completely. As shown in the figure, the hardware of the real machine is distributed between two simulated operating systems by the virtual machine monitor, and programs and processes then run separately on the hardware of each simulated machine.
2. Process Virtual Machine : A process virtual machine, unlike a system virtual machine, does not provide the facility to install a virtual operating system completely. Rather, it creates a virtual environment of that OS while an app or program is being used, and this environment is destroyed as soon as we exit that app. As shown in the figure, some apps run on the main OS, while virtual machines are created to run other apps; because those programs require a different OS, the process virtual machine provides it for the time those programs are running. Example - Wine software on Linux helps to run Windows applications.
Each instance of an operating system is called a Virtual Machine (VM), and the operating system running inside a virtual machine is called the guest operating system. Depending on the position of the virtualization layer, there are two classes of VM architectures, namely the bare-metal and host-based hypervisor architectures. The hypervisor is the software used for doing virtualization, also known as the VMM (Virtual Machine Monitor). Hypervisor software provides two different structures of virtualization, namely the Hosted structure (also called a Type 2 hypervisor) and the Bare-Metal structure (also called a Type 1 hypervisor).
To implement the Hosted structure, a base OS needs to be installed first, over which the VMM can then be installed. The hosted structure is a simple solution for running multiple desktop OSes independently. Fig. 2.2.2 (a) and (b) show Windows running on a Linux base OS and Linux running on a Windows base OS using a hosted hypervisor.
The popular hosted hypervisors are QEMU, VMware Workstation, Microsoft Virtual PC,
Oracle VirtualBox etc.
✓ It does not allow the guest OS to access the hardware directly; it has to go through the base OS, which increases resource overhead.
✓ Virtual machine performance is slow and degraded due to the reliance on the intermediate host OS for hardware access.
✓ It does not scale up beyond a certain limit.
✓ In the Bare-Metal structure, the VMM is installed directly on top of the hardware, so no intermediate host OS is needed. The VMM can communicate directly with the hardware and does not rely on a host system for pass-through permission, which results in better performance, scalability and stability. The Bare-Metal structure is shown in Fig. 2.2.3.
✓ Bare-metal virtualization is mostly used in enterprise data centers for advanced features like resource pooling, high availability, disaster recovery and security.
The popular Bare-Metal hypervisors are Citrix XenServer, VMware ESXi and Microsoft Hyper-V.
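Whichever hypervisor structure is used, VMs are typically managed through a common management interface. The sketch below uses the libvirt Python bindings (package "libvirt-python") to list the VMs on a local QEMU/KVM host; the connection URI is specific to that setup, other hypervisors use different URIs, and it assumes libvirtd is running and the user may connect to it.

```python
# Sketch of querying a hypervisor through the libvirt Python bindings.
# Assumes libvirt-python is installed and a local QEMU/KVM libvirtd is running.
import libvirt

conn = libvirt.open("qemu:///system")     # connect to the local hypervisor
try:
    for dom in conn.listAllDomains():     # every defined virtual machine
        state = "running" if dom.isActive() else "shut off"
        print(f"{dom.name():20s} {state}")
except libvirt.libvirtError as err:
    print("libvirt error:", err)
finally:
    conn.close()
```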
• That is, an emulator works by translating instructions from the guest platform into instructions of the host platform. These instructions include both processor-oriented instructions (add, sub, jump, etc.) and I/O-specific (IN/OUT) instructions for the devices. Although this virtual machine architecture works fine in terms of simplicity and robustness, it has its own pros and cons.
• The advantages of ISA-level virtualization are that it provides ease of implementation when dealing with multiple platforms, and that it provides the infrastructure through which one can create virtual machines of one ISA (such as x86) on host platforms of a different architecture, such as SPARC or Alpha. The disadvantage is that every instruction issued by the emulated computer needs to be interpreted in software first, which degrades performance.
a) Bochs
It is a highly portable emulator that can be run on most popular platforms that include x86,
PowerPC, Alpha, Sun, and MIPS. It can be compiled to emulate most of the versions of x86
machines including 386, 486, Pentium, Pentium Pro or AMD64 CPU, including optional MMX, SSE,
SSE2, and 3DNow instructions.
b) QEMU
QEMU (Quick Emulator) is a fast processor emulator that uses a portable dynamic translator. It supports two operating modes: user-space-only emulation and full-system emulation. In the former mode, QEMU can launch Linux processes compiled for one CPU on another CPU, or be used for cross-compilation and cross-debugging. In the latter mode, it can emulate a full system that includes a processor and several peripheral devices. It supports emulation of a number of processor architectures, including x86, ARM, PowerPC and SPARC.
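As a small, hedged illustration of the two modes, the sketch below launches QEMU's full-system emulator from Python; the disk image path and memory size are placeholders, and qemu-system-x86_64 must be installed and on the PATH.

```python
# Sketch of launching QEMU in full-system emulation mode from Python.
# The disk image is a placeholder; the command will fail if it does not exist.
import subprocess

cmd = [
    "qemu-system-x86_64",       # full-system emulator for an x86-64 guest
    "-m", "1024",               # 1024 MB of guest RAM
    "-hda", "guest-disk.img",   # placeholder disk image containing a guest OS
    "-nographic",               # use the terminal instead of a graphical console
]
subprocess.run(cmd, check=True)

# User-mode emulation, by contrast, runs a single foreign-architecture binary:
#   qemu-arm ./hello_arm
```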
c) Crusoe
The Crusoe processor comes with a dynamic x86 emulator, called the code morphing engine, that can execute any x86-based application on top of it. The Crusoe is designed to handle the x86 ISA's precise exception semantics without constraining speculative scheduling. This is accomplished by shadowing all registers holding the x86 state.
d) BIRD
BIRD is an interpretation engine for x86 binaries that currently supports only x86 as the host ISA and aims to be extended to other architectures as well. It exploits the similarity between the architectures and tries to execute as many instructions as possible on the native hardware; all other instructions are supported through software emulation.
Hardware Abstraction Layer (HAL) level virtualization exploits the similarities that exist between the guest and host architectures: the virtualization technique maps virtual resources to physical resources and uses the native hardware for computations in the virtual machine. This approach generates a virtual hardware environment that virtualizes computer resources like the CPU, memory and I/O devices.
• For the successful working of HAL virtualization, the VM must be able to trap every privileged instruction execution and pass it to the underlying VMM, because multiple VMs running their own OSes may issue privileged instructions that need the full attention of the CPU. If this is not managed properly, a privileged instruction may crash the system instead of being trapped as an exception and sent to the VMM. However, the most popular platform, x86, is not fully virtualizable, because certain privileged instructions fail silently rather than being trapped when executed with insufficient privileges. Some of the popular HAL virtualization tools are:
a) VMware
The VMware products are targeted towards x86-based workstations and servers. Thus, VMware has to deal with the complications that arise because x86 is not a fully virtualizable architecture. VMware deals with this problem by using a patent-pending technology that dynamically rewrites portions of the hosted machine code to insert traps wherever VMM intervention is required. Although this solves the problem, it adds some overhead due to translation and execution costs. VMware tries to reduce this cost by caching the results and reusing them wherever possible; nevertheless, this again adds some caching cost that is hard to avoid.
b) Virtual PC
Microsoft Virtual PC is based on the Virtual Machine Monitor (VMM) architecture and lets the user create and configure one or more virtual machines. It provides most of the same functions as VMware, but adds functions such as an undo disk operation that lets the user easily undo previous operations on the hard disks of a VM. This enables easy data recovery and may come in handy in several circumstances.
c) Denali
The Denali project was developed at the University of Washington to address the issue of VM scalability. It introduced a new virtualization architecture, also called para-virtualization.
• Operating-system-level virtualization is an abstraction layer between the OS and user applications. It allows multiple operating system environments and applications to run simultaneously without requiring a reboot or dual boot. The degree of isolation of each environment is very high, and it can be implemented at low risk with easy maintenance. The implementation of operating-system-level virtualization includes operating system installation, application suite installation, network setup, and so on. Therefore, if the required OS is the same as the one on the physical machine, the user basically ends up duplicating most of the effort he/she has already invested in setting up the physical machine. To run applications properly, the operating system keeps the application-specific data structures, user-level libraries, environmental settings and other requisites separately.
• The key idea behind all OS-level virtualization techniques is that the virtualization layer above the OS produces, on demand, a partition per virtual machine that is a replica of the operating environment on the physical machine. With a careful partitioning and multiplexing technique, each VM can export a full operating environment and remain fairly isolated from the other VMs and from the underlying physical machine.
• The popular OS level virtualization tools are
a) Jail
Jail is FreeBSD-based virtualization software that provides the ability to partition an operating system environment while maintaining the simplicity of the UNIX "root" model. The environments captured within a jail are typical system resources and data structures such as processes, file systems, network resources, etc. A process in a partition is referred to as an "in jail" process. When the system is booted up after a fresh install, no processes will be in jail. When a process is placed in a jail, all of its descendants created after the jail creation, along with itself, remain within the jail. A process may not belong to more than one jail. Jails are created by a privileged process when it invokes the special system call jail(); every call to jail() creates a new jail. The only way for a new process to enter a jail is by inheriting access to the jail from another process that is already in that jail.
b) Ensim
Ensim virtualizes a server's native operating system so that it can be partitioned into isolated computing environments called virtual private servers. These virtual private servers operate independently of each other, just like dedicated servers. Ensim is commonly used to create hosting environments that allocate hardware resources among a large number of distributed users.
Library Level Virtualization
Most systems use an extensive set of Application Programmer Interfaces (APIs) instead of legacy system calls to implement various libraries at user level. Such APIs are designed to hide operating-system-related details and keep things simpler for normal programmers. In this technique, the virtual environment is created above the OS layer and is mostly used to implement a different Application Binary Interface (ABI) and Application Programming Interface (API) using the underlying system.
The example of library-level virtualization is WINE. Wine is an implementation of the Windows API and can be used as a library to port Windows applications to UNIX. It is a virtualization layer on top of X and UNIX that exports the Windows API/ABI, which allows Windows binaries to run on top of it.
Application Level Virtualization
In this abstraction technique, the operating systems and user-level programs execute like applications for the machine. Therefore, specialized instructions are needed for hardware manipulations such as I/O-mapped operations (manipulating the I/O) and memory-mapped operations (mapping a chunk of memory to the I/O and then manipulating that memory). The group of such special instructions constitutes the layer called application-level virtualization. The Java Virtual Machine (JVM) is a popular example of application-level virtualization: it creates a virtual machine at the application level rather than at the OS level, and supports a self-defined set of instructions called Java bytecode.
Such VMs pose little security threat to the system while letting the user work with them like physical machines. Like a physical machine, such a VM has to provide an operating environment to its applications, either by hosting a commercial operating system or by coming up with its own environment.
The comparison between different levels of virtualization is shown in Table 2.4.1.
Every hypervisor uses some mechanisms to control and manage virtualization strategies that allow different operating systems, such as Linux and Windows, to run on the same physical machine simultaneously. Depending on the position of the virtualization layer, there are several classes of VM mechanisms, namely binary translation, para-virtualization, full virtualization, hardware-assisted virtualization and host-based virtualization. The mechanisms of virtualization defined by VMware and other virtualization providers are explained as follows.
The binary translation mechanisms with full and host-based virtualization are explained as follows.
a) Binary translation
In Binary translation of guest OS, The VMM runs at Ring 0 and the guest OS at Ring 1. The VMM
checks the instruction stream and identifies the privileged, control and behavior-sensitive
instructions. At the point when these instructions are identified, they are trapped into the
VMM, which emulates the behavior of these instructions. The method used in this emulation
is called binary translation. The binary translation mechanism is shown in Fig. 2.5.3.
Fig. 2.5.3 Binary Translation mechanism
b) Full Virtualization
In full virtualization, the guest OS does not require any modification to its OS code. Instead, full virtualization relies on binary translation to trap and virtualize the execution of certain sensitive, non-virtualizable instructions. Most guest operating systems and their applications are composed of critical and noncritical instructions, and these instructions are executed with the help of the binary translation mechanism.
With full virtualization, noncritical instructions run on the hardware directly, while critical instructions are discovered and replaced with traps into the VMM to be emulated by software. In host-based virtualization, both the host OS and guest OS take part in virtualization, with a virtualization software layer lying between them. Therefore, full virtualization combines binary translation with direct execution of instructions; the guest OS is completely decoupled from the underlying hardware and, consequently, is unaware that it is being virtualized.
Full virtualization can give degraded performance, because it involves binary translation of instructions before execution, which is time-consuming. In particular, full virtualization of I/O-intensive applications is a big challenge. Binary translation employs a code cache to store translated instructions to improve performance, but this increases the cost of memory usage. A toy sketch of the trap-and-emulate idea is given below.
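The sketch below is a toy model only, not real VMM code: it illustrates how noncritical instructions take the fast "direct execution" path while critical ones are trapped and emulated against the VM's own virtual state. The instruction names and VMM state are invented for illustration.

```python
# Toy model of the trap-and-emulate idea behind full virtualization.
CRITICAL = {"HLT", "OUT", "LOAD_CR3"}        # pretend these are sensitive ops

def run_on_hardware(instr):
    print(f"  direct execution : {instr}")

def vmm_emulate(instr, vm_state):
    # The VMM applies the effect to the VM's *virtual* state, never the host's.
    print(f"  trapped to VMM   : emulating {instr}")
    vm_state["privileged_ops"] += 1

def execute_guest(instructions):
    vm_state = {"privileged_ops": 0}
    for instr in instructions:
        if instr in CRITICAL:
            vmm_emulate(instr, vm_state)     # trap / binary-translation path
        else:
            run_on_hardware(instr)           # fast path
    return vm_state

execute_guest(["MOV", "ADD", "OUT", "MOV", "HLT"])
```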
c) Host-based virtualization
In host-based virtualization, the virtualization layer runs on top of the host OS, and the guest OS runs over the virtualization layer. Therefore, the host OS is responsible for managing the hardware and controlling the instructions executed by the guest OS.
Host-based virtualization does not require modifying the host OS code, but the virtualization software has to rely on the host OS to provide device drivers and other low-level services. This architecture simplifies the VM design and eases deployment, but gives degraded performance compared to other hypervisor architectures because of host OS intervention. The host OS performs four layers of mapping during any I/O request by the guest OS or VMM, which downgrades performance significantly.
Para-Virtualization
Para-virtualization is one of the efficient virtualization techniques and requires explicit modification of the guest operating systems. A para-virtualized VM provides special APIs that require substantial OS modifications in user applications.
In some virtualized systems, performance degradation becomes a critical issue. Therefore, para-virtualization attempts to reduce the virtualization overhead, and thus improve performance, by modifying only the guest OS kernel. The para-virtualization architecture is shown in Fig. 2.5.4.
Fig. 2.5.5 Para-virtualization (Source : VMware)
In para-virtualization, the virtualization layer is inserted between the hardware and the OS. Since the x86 architecture requires the virtualization layer to be installed at Ring 0, instructions issued by the guest OS from Ring 0 may cause problems. In this architecture, the non-virtualizable instructions are replaced with hypercalls that communicate directly with the hypervisor or VMM. The user applications are executed directly on the host system hardware upon user request.
Unlike full virtualization, which traps sensitive instructions and emulates them at runtime, para-virtualization can handle such instructions at compile time. In para-virtualization with compiler support, the guest OS kernel is modified to replace the privileged and sensitive instructions with hypercalls to the hypervisor or VMM at compile time itself. The Xen hypervisor assumes such a para-virtualization architecture.
Here, a guest OS running in a guest domain may run at Ring 1 instead of Ring 0, which is why the guest OS may not be able to execute some of its privileged and sensitive instructions. Such privileged instructions are therefore implemented as hypercalls to the hypervisor. After replacing the instructions with hypercalls, the modified guest OS emulates the behavior of the original guest OS.
Virtualization of CPU, Memory, And I/O Devices
5. Explain in detail about Virtualization of CPU, Memory, And I/O Devices.( Nov/Dec
2021)
Virtualization of CPU
CPU virtualization is related to the protection levels, called rings, in which code can execute. The Intel x86 CPU architecture offers four levels of privilege, known as Ring 0, 1, 2 and 3, to manage access to the computer hardware. Among these, Ring 0, Ring 1 and Ring 2 are associated with the operating system, while Ring 3 is reserved for applications. Ring 0 is used by the kernel and therefore has the highest privilege level, while Ring 3 has the lowest privilege as it belongs to user-level applications, as shown in Fig. 2.6.1.
User-level applications typically run in Ring 3; the operating system needs direct access to the memory and hardware and must execute its privileged instructions in Ring 0. Therefore, virtualizing the x86 architecture requires placing a virtualization layer under the operating system to create and manage the virtual machines that deliver shared resources. Some sensitive instructions cannot be virtualized as they have different semantics, and trapping and translating these sensitive and privileged instructions at runtime is the main challenge. The x86 privilege level architecture without virtualization is shown in Fig. 2.6.2.
Fig. 2.6.2 X86 privilege level architecture without virtualization
In most virtualization systems, the majority of VM instructions are executed on the host processor in native mode; hence, unprivileged instructions of VMs run directly on the host machine for higher efficiency. The privileged instructions are executed in a privileged mode and get trapped if executed outside this mode. Control-sensitive instructions attempt to change the configuration of the resources used during execution, while behavior-sensitive instructions behave differently depending on the configuration of resources, including the load and store operations over virtual memory.
Generally, a CPU architecture is virtualizable if and only if it provides the ability to run the VM's privileged and unprivileged instructions in the CPU's user mode while the VMM runs in supervisor mode. When the privileged instructions, along with the control- and behavior-sensitive instructions, of a VM are executed, they are trapped in the VMM. In such scenarios, the VMM acts as the unified mediator for hardware access from the different VMs and guarantees the correctness and stability of the whole system. However, not all CPU architectures are virtualizable. Three techniques can be used for handling sensitive and privileged instructions to virtualize the CPU on the x86 architecture: full virtualization with binary translation, para-virtualization, and hardware-assisted virtualization.
In binary translation, the virtual machine issues privileged instructions contained within its compiled code. The VMM takes control of these instructions and changes the code under execution to avoid any impact on the state of the system. The full virtualization technique does not need to modify the guest operating system; it relies on binary translation to trap and virtualize the execution of certain instructions.
The noncritical instructions run directly on the hardware, while critical instructions have to be discovered first and then replaced with traps into the VMM to be emulated by software.
The para-virtualization technique refers to establishing communication between the guest OS and the hypervisor (through hypercalls), which requires the guest OS to be modified. The hardware-assisted virtualization technique relies on processor support (such as Intel VT-x or AMD-V) that removes the need for either binary translation or para-virtualization. Fig. 2.6.5 shows hardware-assisted virtualization.
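The trap-and-emulate idea described above can be pictured with a short Python sketch. This is purely illustrative; the instruction names and the VMM handler are invented for this example, and real VMMs work at the level of machine instructions, not strings.

# Illustrative sketch of trap-and-emulate CPU virtualization.
# The instruction names and handlers are invented for this example.

PRIVILEGED = {"load_cr3", "hlt", "out"}          # sensitive/privileged instructions

def vmm_emulate(vm_state, instr):
    # The VMM gains control on a trap and emulates the instruction safely,
    # updating only the virtual (not the real) machine state.
    vm_state["emulated"].append(instr)

def run_guest(vm_state, instructions):
    for instr in instructions:
        if instr in PRIVILEGED:
            vmm_emulate(vm_state, instr)          # trap into the VMM
        else:
            vm_state["executed"].append(instr)    # runs natively on the host CPU

vm = {"executed": [], "emulated": []}
run_guest(vm, ["add", "load_cr3", "mov", "hlt"])
print(vm)   # unprivileged: add, mov ; trapped and emulated: load_cr3, hlt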
Virtualization of Memory
6. Explain in detail about virtualization of memory with an example.
Memory virtualization involves sharing the physical memory and dynamically allocating it to virtual machines. In a traditional execution environment, the operating system maintains the mappings of virtual memory to machine memory using page tables, which provide a single-stage mapping from virtual memory to machine memory. All recent x86 CPUs include a built-in Memory Management Unit (MMU) and a Translation Lookaside Buffer (TLB) to improve virtual memory performance. However, in a virtual execution environment, mappings are required from virtual memory to physical memory and from physical memory to machine memory; hence a two-stage mapping process is needed.
A modern OS provides virtual memory support that is similar to memory virtualization. The virtualized memory is seen by applications as a contiguous address space that is not tied to the underlying physical memory in the system. The operating system is responsible for mapping the virtual page numbers to physical page numbers stored in page tables. Therefore, to run multiple virtual machines with guest OSes on a single system, the MMU has to be virtualized, as shown in Fig. 2.7.1.
The guest OS controls the mapping of virtual addresses to the guest physical memory addresses, but the guest OS cannot directly access the actual machine memory. The VMM is responsible for mapping the guest physical memory to the actual machine memory, and it uses shadow page tables to accelerate the mappings. The VMM uses the TLB (Translation Lookaside Buffer) hardware to map the virtual memory directly to the machine memory, avoiding the two levels of translation on every access. When the guest OS changes the virtual-memory-to-physical-memory mapping, the VMM updates the shadow page tables to enable a direct lookup. Hardware-assisted memory virtualization, for example by AMD processors, provides hardware assistance for the two-stage address translation in a virtual execution environment by using a technology called nested paging.
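The two-stage mapping and the shadow page table can be visualized with a minimal Python sketch. The page numbers and table contents below are invented for illustration: the guest OS maps virtual pages to guest-physical pages, the VMM maps guest-physical pages to machine pages, and the shadow table caches the composed virtual-to-machine mapping so a lookup needs only one step.

# Illustrative two-stage address translation with a shadow page table.
# Page numbers are invented for this example.

guest_page_table = {0: 5, 1: 7, 2: 3}     # guest virtual page -> guest physical page
vmm_p2m_table    = {5: 12, 7: 40, 3: 9}   # guest physical page -> machine page

# The VMM builds a shadow page table that maps guest virtual pages
# directly to machine pages, avoiding two lookups on every access.
shadow_page_table = {vpn: vmm_p2m_table[gpn]
                     for vpn, gpn in guest_page_table.items()}

def translate(vpn):
    return shadow_page_table[vpn]          # single-step lookup, like the TLB fast path

print(translate(1))   # virtual page 1 -> machine page 40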
Virtualization of I/O Devices
The virtualization of devices and I/O is a bit more difficult than CPU virtualization. It involves managing the routing of I/O requests between virtual devices and the shared physical hardware. Software-based I/O virtualization and management techniques can be used for device and I/O virtualization to enable a rich set of features and simplified management. The network is an integral component of the system that enables communication between different VMs. I/O virtualization provides virtual NICs and switches that create virtual networks between the virtual machines without network traffic consuming bandwidth on the physical network. NIC teaming allows multiple physical NICs to appear as one and provides failover transparency for virtual machines. It allows virtual machines to be seamlessly relocated to different systems using VMware VMotion while keeping their existing MAC addresses. The key to effective I/O virtualization is to preserve the virtualization benefits with minimum CPU utilization. Fig. 2.7.2 shows device and I/O virtualization.
The virtual devices shown in Fig. 2.7.2 can effectively emulate well-known hardware and translate the virtual machine requests to the system hardware. Standardized device drivers help with virtual machine standardization. Portability in I/O virtualization allows all the virtual machines across platforms to be configured and run on the same virtual hardware regardless of the actual physical hardware in the system. There are four methods to implement I/O virtualization, namely full device emulation, para-virtualization, direct I/O virtualization and self-virtualized I/O. In the full device emulation approach, well-known real-world devices are emulated and all the functions of a device, such as enumeration, identification, interrupts and DMA, are replicated in software. The para-virtualization method of I/O virtualization uses a split driver model that consists of frontend and backend drivers: the frontend driver runs in Domain U and manages the I/O requests of the guest OS, while the backend driver runs in Domain 0 and manages the real I/O devices, multiplexing the I/O data of different VMs; they interact with each other via a block of shared memory. Direct I/O virtualization lets the VM access devices directly; it mainly focuses on networking for mainframes. Each method is described in more detail below.
In full device emulation, the I/O devices are virtualized using emulation software. This method can emulate all well-known, real-world devices. The emulation software is responsible for performing all the functions of a device or bus infrastructure, such as device enumeration, identification, interrupts and DMA, which are replicated in software. The software runs inside the VMM and acts as a virtual device. In this method, the I/O access requests of the guest OS are trapped in the VMM, which interacts with the I/O devices. Multiple VMs can share a single hardware device and run concurrently. However, software emulation consumes more time per I/O access, which is why it runs much slower than the hardware it emulates.
In the para-virtualization method of I/O virtualization, the split driver model is used, which consists of a frontend driver and a backend driver. It is used in the Xen hypervisor with two kinds of domains, Domain 0 and Domain U. The frontend driver runs in Domain U while the backend driver runs in Domain 0. Both drivers interact with each other via a block of shared memory. The frontend driver is responsible for managing the I/O requests of the guest OSes, while the backend driver is responsible for managing the real I/O devices and multiplexing the I/O data of different VMs. The para-virtualization method of I/O virtualization achieves better device performance than full device emulation, but with a higher CPU overhead.
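A minimal sketch of the split-driver idea is given below. A simple queue stands in for the shared-memory ring, and the request format is invented; a real Xen ring buffer, grant tables and event channels are far more involved.

# Illustrative split-driver (frontend/backend) I/O model used by para-virtualized I/O.
# The shared memory block is modelled with a simple queue; request fields are invented.
from queue import Queue

shared_ring = Queue()                      # stands in for the shared-memory block

def frontend_submit(vm_id, op, block):
    # Frontend driver in Domain U: forwards the guest OS I/O request.
    shared_ring.put({"vm": vm_id, "op": op, "block": block})

def backend_service():
    # Backend driver in Domain 0: multiplexes requests from all VMs onto the real device.
    while not shared_ring.empty():
        req = shared_ring.get()
        print(f"device <- VM{req['vm']}: {req['op']} block {req['block']}")

frontend_submit(1, "read", 42)
frontend_submit(2, "write", 7)
backend_service()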
In direct I/O virtualization, the virtual machines can access I/O devices directly and do not have to rely on any device emulation in the VMM. It can give better I/O performance than the para-virtualization method without high CPU costs. It was originally designed with a focus on networking for mainframes.
In the self-virtualized I/O method, the rich resources of a multicore processor are harnessed together. Self-virtualized I/O encapsulates all the tasks related to virtualizing an I/O device. It provides virtual devices with an associated access API to the VMs and a management API to the VMM, and defines one Virtual Interface (VIF) for every kind of virtualized I/O device.
The virtualized I/O interfaces include virtual network interfaces, virtual block devices (disks), virtual camera devices, and others. The guest OS interacts with the virtual interfaces via device drivers. Each VIF carries a unique ID for identifying it in self-virtualized I/O and consists of two message queues: one for outgoing messages to the devices and another for incoming messages from the devices.
As there are many challenges associated with commodity hardware devices, multiple I/O virtualization techniques need to be combined to eliminate problems such as system crashes during reassignment of I/O devices, incorrect functioning of I/O devices, and the high overhead of device emulation.
PART-A
1. “Although virtualization is widely accepted today; it does have its limits”. Comment on
the statement. (May-2021)
Although virtualization is widely accepted today, it does have its limitations, which are listed below.
• High Upfront Investments : Organisations need to acquire resources beforehand to implement virtualization. Additional resources may also need to be acquired over time.
• Performance Issues : Although virtualization is an efficient technique and its efficiency can be increased further by tuning, there may be cases where the performance is not as good as that of the actual physical systems.
• Licensing Issues : All software may not be supported on virtual platforms. Although
vendors are becoming aware of the increasing popularity of virtualization and have started
providing licenses for software to run on these platforms, the problem has not completely
vanished. Therefore, it is advised to check the licenses with the vendor before using
the software.
• Difficulty in Root Cause Analysis : The extra layer added by virtualization increases complexity, which makes root cause analysis difficult in the case of unidentified problems.
2. List the requirements of VMM.(Nov/Dec 2021)
The requirements of VMM or hypervisor are
• VMM must support efficient task scheduling and resource allocation techniques.
• VMM should provide an environment for programs which is essentially identical to the
original physical machine.
• A VMM should be in complete control of the system resources.
• Any program run under a VMM should exhibit a function identical to that which it runs
on the original physical machine directly.
• VMM must be tightly related to the architectures of processors
3. Give the role of a VM. (or) Give the basic operations of a VM. (May-2017)
Virtualization allows running multiple operating systems on a single physical machine. Each instance of an operating system running inside it is called a virtual machine (VM). The main role of a VM is to run an operating system on the resources allocated from the host machine. The other roles of a VM are :
• Provide virtual hardware, including CPUs, memory, storage, hard drives, network interfaces
and other devices to run virtual operating system.
• Provide fault and security isolation at the hardware level.
• Preserve performance with advanced resource controls.
• Save the entire state of a virtual machine to files.
• Move and copy virtual machine data as easily as moving and copying files.
• Provision to migrate any virtual machine to any physical server.
4. Give the significance of virtualization. (Dec 2019)(May-2021)
Large amounts of compute, storage, and networking resources are needed to build a cluster, grid or cloud solution. These resources need to be aggregated in one place to offer a single system image. Therefore, the concept of virtualization comes into the picture, where resources can be aggregated together to fulfil requests for resource provisioning with rapid speed as a single system image. Virtualization is a solution that can offer application flexibility, software manageability, optimum resource utilization and security on existing physical machines. In particular, every cloud solution has to rely on virtualization for provisioning resources dynamically. Therefore, virtualization technology is one of the fundamental components of cloud computing. It provides a secure, customizable, and isolated execution environment for running applications on abstracted hardware. It is mainly used for providing different computing environments which, although virtual, appear to be physical. The different characteristics of virtualization are :
• Maximum resource utilization
• Reduced hardware cost
• Minimized maintenance cost
• Support for dynamic load balancing
• Support for server consolidation
• Support for disaster recovery
The Xen hypervisor provides a virtual environment located between the hardware and the OS. As the Xen hypervisor runs directly on the hardware, it can run many guest operating systems on top of it. The operating system platforms supported as a guest OS by the Xen hypervisor include Windows, Linux, BSD and Solaris.
9. Differentiate full Virtualization and Para-Virtualization. (Nov-2020)
1. In full virtualization, virtual machines permit the execution of instructions with an unmodified OS running in an entirely isolated way. In para-virtualization, a virtual machine does not implement full isolation of the OS but rather provides a different API, which is utilized when the OS is subjected to alteration.
The remaining rows compare a virtual machine (VM) with a container :
6. A VM permits us to install other software, so virtually we control it, as opposed to installing the software on a computer directly. Containers are software that permit distinct application functionalities to run independently.
7. Applications executing on a virtual machine can run on distinct OSes. Applications executing within the container environment share a single OS.
8. A VM facilitates a way to virtualize an entire computer system. A container only virtualizes the OS.
9. VMs have a large size. Containers are very light (a few megabytes).
10. A VM takes minutes to start due to its large size. Containers start in seconds.
11. A VM utilizes a lot of system memory. Containers utilize very little system memory.
12. A VM is highly secured. Containers are less secure.
UNIT III
Types of Virtualization
1. Virtualization ranging from hardware to applications in five abstraction levels.
Based on the functionality of virtualized applications, there are five basic types of
virtualization which are explained as follows.
Desktop Virtualization
The processing of multiple virtual desktops occurs on one or a few physical servers, typically at the centralized data center. The copy of the OS and applications that each end user utilizes is typically cached in memory as one image on the physical server.
Desktop virtualization provides a virtual desktop environment where a client can access the system resources remotely through the network.
The ultimate goal of desktop virtualization is to make the computer operating system accessible from anywhere over the network. Virtual desktop environments do not require specific systems or hardware resources on the client side; they require just a network connection.
The user can utilize a customized and personalized desktop from a remote location through the network connection. The virtualization of the desktop is sometimes referred to as Virtual Desktop Infrastructure (VDI), where all the operating systems, such as Windows or Linux, are installed as virtual machines on a physical server at one place and delivered remotely through remote desktop protocols like RDP (in Windows) or VNC (in Linux).
Currently, VMware Horizon and Citrix XenDesktop are the two most popular VDI solutions available in the market, with many dominant features. Although the desktop operating system provided by VDI is virtual, it appears like a physical desktop operating system. The virtual desktop can run all the types of applications that are supported on a physical computer; the only difference is that they are delivered through the network.
Some of the benefits provided by desktop virtualization are :
• It provides easier management of devices and operating systems due to centralized management.
• It reduces capital expenditure and the maintenance cost of hardware due to the consolidation of multiple operating systems onto a single physical server.
• It provides enhanced security, as confidential data is stored in the data center instead of on personal devices that could easily be lost, stolen or tampered with.
• With desktop virtualization, operating systems can be quickly and easily provisioned for new users without any manual setup.
• Upgradation of the operating system is easier.
• It can facilitate the work-from-home feature for IT employees due to desktop operating system delivery over the internet.
Network Virtualization
The Network virtualization is the ability to create virtual networks that are decoupled from the
underlying network hardware. This ensures the network can better integrate with and support
increasingly virtual environments. It has capability to combine multiple physical networks into one
virtual, or it can divide one physical network into separate, independent virtual networks.
Network virtualization can combine an entire network into a single unit and allocate its bandwidth, channels, and other resources based on its workload.
Network virtualization is similar to server virtualization but instead of dividing up a physical
server among several virtual machines, physical network resources are divided up among multiple
virtual networks.
Storage Virtualization
Storage virtualization is the process of grouping multiple physical storages using software
to appear as a single storage device in a virtual form.
It pools the physical storage from different network storage devices and makes it appear to be
a single storage unit that is handled from a single console. Storage virtualization helps to address
the storage and data management issues by facilitating easy backup, archiving and recovery tasks
in less time.
It aggregates the functions and hides the actual complexity of the storage area network. Storage virtualization can be implemented with data storage technologies like snapshots and RAID that take physical disks and present them in a virtual format. These features add redundancy to the storage and give optimum performance by presenting storage to the host as a volume. Virtualizing storage separates the storage management software from the underlying hardware infrastructure in order to provide more flexible and scalable pools of storage resources. The benefits provided by storage virtualization are :
• Automated management of storage media with reduced down time.
Server Virtualization
Server virtualization is the process of dividing a physical server into multiple unique and isolated virtual servers by means of software. It partitions a single physical server into multiple virtual servers; each virtual server can run its own operating system and applications independently. A virtual server is also termed a virtual machine. This consolidation helps in running many virtual machines under a single physical server.
Each virtual machine shares the hardware resources of the physical server, which leads to better utilization of the physical server's resources. The resources utilized by a virtual machine include CPU, memory, storage, and networking. The hypervisor is the operating system or software that runs on the physical machine to perform server virtualization.
The hypervisor running on the physical server is responsible for providing resources to the virtual machines. Each virtual machine runs independently of the other virtual machines on the same box, with different operating systems that are isolated from each other.
Popular server virtualization software includes VMware vSphere, Citrix XenServer, Microsoft Hyper-V, and Red Hat Enterprise Virtualization.
The benefits of server virtualization include better utilization of hardware through consolidation, reduced hardware and power costs, easier provisioning and management of servers, and improved isolation between workloads.
Application Virtualization
Application virtualization is a technology that encapsulates an application from the underlying operating system on which it is executed. It enables access to an application without needing to install it on the local or target device. From the user's perspective, the application works and interacts as if it were native to the device.
It allows the use of any cloud client that supports BYOD, such as a thin client, thick client, mobile client, PDA and so on.
Application virtualization uses software to bundle an application into a single executable, "run anywhere" package. The software application is isolated from the operating system and runs in an environment called a "sandbox".
There are two types of application virtualization : remote applications and streamed applications. In the first type, the remote application runs on a server, and the client uses some kind of remote display protocol to communicate back. For a large number of administrators and users, it is fairly simple to set up a remote display protocol for applications.
In the second type, one copy of the streamed application runs on the server, and client desktops then access and run the streamed application locally. With a streamed application, the upgrade process is simpler, since you simply set up another streamed application with the upgraded version and have the end users point to the new version of the application.
Some of the popular application virtualization software products on the market are VMware ThinApp, Citrix XenApp, Novell ZENworks Application Virtualization and so on.
Some of the prominent benefits of application virtualization are :
• It allows cross-platform operation, such as running Windows applications on Linux or Android and vice versa.
• It allows running applications that have legacy issues, such as those supported only on older operating systems.
• It avoids conflicts between other virtualized applications.
• It allows a user to run more than one instance of an application at the same time.
• It reduces system integration and administration costs by maintaining a common software baseline across multiple diverse computers in an organization.
• It allows incompatible applications to run side by side, at the same time.
• It utilizes fewer resources than a separate virtual machine.
• It provides greater security because of the isolation between applications and the operating system.
2. Explain in detail about Virtual Clusters and Resource Management with an example.
A multicore virtualization method allows hardware designers to obtain an abstraction of the low-level details of the processor cores. This technique alleviates the burden and inefficiency of managing hardware resources by software. It is located under the ISA and remains unmodified by the operating system or VMM (hypervisor). The figure below illustrates the technique of a software-visible VCPU moving from one core to another, and the temporary suspension of a VCPU when there are no appropriate cores on which it can run.
Multicore virtualization
Virtual Hierarchy
➢ The emerging many-core chip multiprocessors (CMPs) provide a new computing landscape. Instead of supporting time-sharing jobs on one or a few cores, abundant cores can be used in a space-sharing manner, where single-threaded or multithreaded jobs are simultaneously assigned to separate groups of cores for long time intervals.
➢ To optimize for space-shared workloads, researchers have proposed using virtual hierarchies to overlay a coherence and caching hierarchy onto a physical processor. Unlike a fixed physical hierarchy, a virtual hierarchy can adapt to fit how the work is space-shared, for improved performance and performance isolation.
Virtual clusters are built with VMs installed at distributed servers from one or more physical clusters. The VMs in a virtual cluster are interconnected logically by a virtual network across several
physical networks. The below Figure illustrates the concepts of virtual clusters and physical clusters.
Each virtual cluster is formed with physical machines or a VM hosted by multiple physical clusters.
The virtual cluster boundaries are shown as distinct boundaries.
Virtual clusters have the following advantages over physical clusters :
1. Software stacks (OS, libraries, applications) can be constructed and distributed to the physical nodes inside the clusters as quickly as possible.
2. Runtime environments can be quickly switched from one user's virtual cluster to another user's virtual cluster. If one user finishes using his system, the corresponding virtual cluster should shut down or suspend quickly to free the resources to run other VMs for other users.
It is important to efficiently manage the disk space occupied by template software packages. Some storage architecture designs can be applied to reduce duplicated blocks in a distributed file system of virtual clusters. Hash values are used to compare the contents of data blocks. Users have their own profiles which store the identification of the data blocks for the corresponding VMs in a user-specific virtual cluster. Basically, there are four steps to deploy a group of VMs onto a target cluster:
3. Explain about Virtualization for the Linux and Windows NT Platforms. Describe the process of Live Migration of a VM from one host to another.
Steps 0 and 1: Start migration. This step makes preparations for the migration, including
determining the migrating VM and the destination host. Although users could manually make a VM
migrate to an appointed host, in most circumstances, the migration is automatically started by
strategies such as load balancing and server consolidation.
Step 2 : Transfer memory. Since the whole execution state of the VM is stored in memory, sending the VM's memory to the destination node ensures continuity of the service provided by the VM. All of the memory data is transferred in the first round, and then the migration controller recopies the memory data that changed during the previous round. These steps keep iterating until the dirty portion of the memory is small enough to handle the final copy. Although pre-copying of memory is performed iteratively, the execution of programs is not noticeably interrupted.
Step 3 : Suspend the VM and copy the last portion of the data. The migrating VM's execution is suspended when the last round's memory data is transferred. Other non-memory data such as CPU and network states should be sent as well. During this step, the VM is stopped and its applications no longer run. This "service unavailable" time is called the "downtime" of migration, which should be as short as possible so that it is negligible to users.
Steps 4 and 5: Commit and activate the new host. After all the needed data is copied, on the
destination host, the VM reloads the states and recovers the execution of programs in it, and the
service provided by this VM continues. Then the network connection is redirected to the new VM and
the dependency to the source host is cleared.
The whole migration process finishes by removing the original VM from the source host.
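The iterative pre-copy phase (Steps 2 and 3 above) can be summarized with a short Python sketch. The page counts, dirty-rate model and stop-and-copy threshold are invented for illustration only.

# Illustrative pre-copy live migration loop (Steps 2 - 3 above).
# Numbers (memory size, dirty pages per round, threshold) are invented.
import random

total_pages = 10_000
dirty = set(range(total_pages))      # round 1: every page must be sent
threshold = 50                       # stop-and-copy when the dirty set is small enough

round_no = 0
while len(dirty) > threshold:
    round_no += 1
    sent = len(dirty)                # transfer the currently dirty pages
    # while copying, the running VM dirties some pages again (modelled randomly)
    dirty = {random.randrange(total_pages) for _ in range(sent // 20)}
    print(f"round {round_no}: sent {sent} pages, {len(dirty)} dirtied during copy")

print(f"suspend VM, copy final {len(dirty)} pages + CPU/network state (downtime phase)")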
Performance Effects :
The diagram below shows the effect of the live migration of a VM from one host to another on the data transmission rate (Mbit/second). Before copying the VM with 512 KB files for 100 clients, the data throughput was 870 Mbit/second.
The first precopy takes 63 seconds, during which the rate is reduced to 765 Mbit/second. Then the data rate reduces to 694 Mbit/second in 9.8 seconds for more iterations of the copying process. The system experiences only 165 ms of downtime before the VM is restored at the destination host. This experimental result shows a very small migration overhead in the live transfer of a VM between host nodes.
Effect on data transmission rate of a VM migrated from one failing web server to another.
File Migration:
Memory Migration:
• Memory Migration is one of the most important aspects of VM migration. Moving the memory
instance of a VM from one physical host to another can be approached in any number of ways.
• Memory migration can be in a range of hundreds of megabytes to a few gigabytes in a typical
system today, and it needs to be done in an efficient manner.
• The Internet Suspend-Resume (ISR) technique exploits temporal locality, as memory states are
likely to have considerable overlap in the suspended and the resumed instances of a VM.
• Temporal locality refers to the fact that the memory states differ only by the amount of work done
since a VM was last suspended before being initiated for migration.
Network Migration: A migrating VM should maintain all open network connections without relying
on forwarding mechanisms on the original host or on support from mobility or redirection
mechanisms.
• To enable remote systems to locate and communicate with a VM, each VM must be assigned a virtual IP address known to other entities.
• This address can be distinct from the IP address of the host machine where the VM is currently
located. Each VM can also have its own distinct virtual MAC address.
• The VMM maintains a mapping of the virtual IP and MAC addresses to their corresponding VMs. In
general, a migrating VM includes all the protocol states and carries its IP address with it.
The Cellular Disco at Stanford is a virtual cluster built in a shared-memory multiprocessor system.
The INRIA virtual cluster was built to test parallel algorithm performance. The COD (Cluster-on-
Demand) project is a virtual cluster management system for dynamic allocation of servers from a
computing pool to multiple virtual clusters.
The idea is illustrated by the prototype implementation of the COD shown in figure. The COD partitions
a physical cluster into multiple virtual clusters (vClusters). vCluster owners specify the operating
systems and software for their clusters through an XML-RPC interface. The vClusters run a batch scheduler from Sun's GridEngine on a web server cluster. The COD system can respond to load changes by restructuring the virtual clusters dynamically.
3. Explain in detail about Virtual Storage Management with an example
The term “storage virtualization” was widely used before the renaissance of system
virtualization. Yet the term has a different meaning in a system virtualization environment. Previously,
storage virtualization was largely used to describe the aggregation and repartitioning of disks at very
coarse time scales for use by physical machines. In system virtualization, virtual storage includes the
storage managed by VMMs and guest OSes. Generally, the data stored in this environment can be
classified into two categories:
i. VM images and
ii. Application Data.
The VM images are special to the virtual environment, while application data includes all other data
which is the same as the data in traditional OS environments. The most important aspects of system
virtualization are encapsulation and isolation. The following are the major functions provided by Virtual Storage Management :
• This procedure complicates storage operations. On the one hand, storage management of the guest
OS performs as though it is operating in a real hard disk while the guest OSes cannot access the hard
disk directly.
• On the other hand, many guest OSes contest the hard disk when many VMs are running on a single
physical machine. Therefore, storage management of the underlying VMM is much more complex than
that of guest OSes (traditional OSes).
Example : Parallax - Providing Virtual Disks to Client VMs from a Large Common Shared Physical Disk.
The architecture of Parallax is scalable and especially suitable for use in cluster-based environments. The figure below shows a high-level view of the structure of a Parallax-based cluster. A cluster-wide administrative domain manages all storage appliance VMs, which makes storage management easy. This mechanism enables advanced storage features, such as snapshot facilities, to be implemented in software and delivered above commodity network storage targets.
Dockers:
4. Explain in detail about Docker and its components with an example.
Introduction to Dockers
What is Docker?
Docker is an open-source centralized platform designed to create, deploy, and run applications. Docker uses containers on the host's operating system to run applications. It allows applications to use the same Linux kernel as the host system, rather than creating a whole virtual operating system. Containers ensure that our application works in any environment, such as development, test, or production.
Docker includes components such as the Docker client, Docker server (daemon), Docker Machine, Docker Hub, Docker Compose, etc.
Why Docker?
Docker is designed to benefit both developers and system administrators. There are the following reasons to use Docker -
o Docker allows us to easily install and run software without worrying about setup or dependencies.
o Developers use Docker to eliminate machine-specific problems, i.e. "but the code works on my laptop", when working on code together with co-workers.
o Operators use Docker to run and manage apps in isolated containers for better compute density.
o Enterprises use Docker to build secure, agile software delivery pipelines to ship new application features faster and more securely.
o Since Docker is not only used for deployment but is also a great platform for development, it helps to efficiently increase customer satisfaction.
Advantages of Docker
o Docker allows you to use a remote repository to share your container with others.
Disadvantages of Docker
o Some features, such as container self-registration, container self-inspection, copying files from the host to the container, and more, are missing in Docker.
o Docker is not a good solution for applications that require a rich graphical interface.
Docker Engine
Docker components
• Docker Images
• Registries
• Docker Containers
Docker is a client-server application. The Docker client talks to the Docker server
or daemon, which, in turn, does all the work. Docker ships with a command line client binary, docker,
as well as a full RESTful API. You can run the Docker daemon and client on the same host or connect
your local Docker client to a remote daemon running on another host. You can see Docker's
architecture depicted here:
Docker Architecture
Docker images
Images are the building blocks of the Docker world. You launch your containers from images. Images are the "build" part of Docker's life cycle. They are a layered format, using union file systems, that are built step-by-step using a series of instructions, for example :
• Add a file.
• Run a command.
• Open a port.
You can consider images to be the "source code" for your containers. They are highly portable and can be shared, stored, and updated. In the following sections, we'll learn how to use existing images as well as build our own images.
Registries:
• Docker stores the images you build in registries. There are two types of registries: public and
private. Docker, Inc., operates the public registry for images, called the Docker Hub. You can
create an account on the Docker Hub and use it to share and store your own images.
• The Docker Hub also contains, at last count, over 10,000 images that other people have built
and shared. Want a Docker image for an Nginx web server, the Asterisk open source PABX
system, or a MySQL database? All of these are available, along with a whole lot more.
• You can also store images that you want to keep private on the Docker Hub. These images might
include source code or other proprietary information you want to keep secure or only share
with other members of your team or organization.
Containers
Docker helps you build and deploy containers inside of which you can package your applications and
services. As we've just learnt, containers are launched from images and can contain one or more
running processes. You can think about images as the building or packing aspect of Docker and the
containers as the running or execution aspect of Docker.
• An image format.
• An execution environment.
Docker borrows the concept of the standard shipping container, used to transport goods
globally, as a model for its containers. But instead of shipping goods, Docker containers ship software.
Each container contains a software image -- its 'cargo' -- and, like its physical counterpart,
allows a set of operations to be performed. For example, it can be created, started, stopped, restarted,
and destroyed. Like a shipping container, Docker doesn't care about the contents of the container
when performing these actions; for example, whether a container is a web server, a database, or an
application server. Each container is loaded the same as any other container.
Docker also doesn't care where you ship your container: you can build on your laptop, upload to a
registry, then download to a physical or virtual server, test, deploy to a cluster of a dozen Amazon EC2
hosts, and run. Like a normal shipping container, it is interchangeable, stackable, portable, and as
generic as possible.
Docker can be run on any x64 host running a modern Linux kernel; kernel version 3.8 or later is recommended. It has low overhead and can be used on servers, desktops, or laptops. It includes:
• A native Linux container format that Docker calls libcontainer, as well as the popular container platform, lxc. The libcontainer format is now the default format.
• Linux kernel namespaces, which provide isolation for filesystems, processes, and networks between containers.
• Resource isolation and grouping : resources like CPU and memory are allocated to each container using the cgroups kernel feature.
• Logging : STDOUT, STDERR and STDIN from the container are collected, logged, and made available for analysis or troubleshooting.
• Interactive shell : You can create a pseudo-tty and attach it to STDIN to provide an interactive shell to your container.
A Docker image is made up of filesystems layered over each other. At the base is a boot filesystem,
bootfs, which resembles the typical Linux/Unix boot filesystem. A Docker user will probably never
interact with the boot filesystem. Indeed, when a container has booted, it is moved into memory,
and the boot filesystem is unmounted to free up the RAM used by the initrd disk image.
Docker calls each of these filesystems images. Images can be layered on top of one another. The
image below is called the parent image and you can traverse each layer until you reach the bottom
of the image stack where the final image is called the base image. Finally, when a container is
launched from an image, Docker mounts a read-write filesystem on top of any layers below. This is
where whatever processes we want our Docker container to run will execute.
Let's get started with Docker images by looking at what images are available to us on our Docker host. We can do this using the docker images command.
That image was downloaded from a repository. Images live inside repositories, and repositories live on registries. The default registry is the public registry managed by Docker, Inc., Docker Hub. Each repository can contain multiple images (e.g., the ubuntu repository contains images for Ubuntu 12.04, 12.10, 13.04, 13.10, and 14.04). Let's get the rest of the images in the ubuntu repository now.
Listing 4.3: Pulling the Ubuntu image
. . .
Here we've used the docker pull command to pull down the entire contents of the
ubuntu repository.
We can refer to a specific image inside a repository by suffixing the repository name with a colon and a tag name, for example, ubuntu:12.04. Launching a container from that tagged image drops us into a shell prompt such as:
root@79e36bff89b4:/#
This launches a container from the ubuntu:12.04 image, which is an Ubuntu 12.04 operating system. We can also see that some images with the same ID (see image ID 74fe38d11401) are tagged more than once. Image ID 74fe38d11401 is actually tagged both 12.04 and precise : the version number and code name for that Ubuntu release, respectively.
Pulling images
When we run a container from images with the docker run command, if the image isn't
present locally already then Docker will download it from the Docker Hub. By default, if
you don't specify a specific tag, Docker will download the latest tag, for example:
will download the ubuntu:latest image if it isn't already present on the host.
Alternatively, we can use the docker pull command to pull images down our- selves.
Using docker pull saves us some time launching a container from a new image. Let's see
that now by pulling down the fedora base image.
Pulling the fedora image
Let's see this new image on our Docker host using the docker images command. This time, however, let's narrow our review of the images to only the fedora images. To do so, we can specify the image name after the docker images command.
We can see that the fedora image contains the development Rawhide release as well as
Fedora 20. We can also see that the Fedora 20 release is tagged in three ways -- 20,
heisenbug, and latest -- but it is the same image (we can see all three entries have an ID
of b7de3133ff98). If we wanted the Fedora 20 image, therefore, we could use any of the
following:
• fedora:20
• fedora:heisenbug
• fedora:latest
We could have also just downloaded one tagged image using the docker pull command.
Listing 4.9 : Pulling a tagged fedora image
$ sudo docker pull fedora:20
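For completeness, the same pull can also be scripted. The sketch below assumes the Docker SDK for Python (the docker package, installed with pip install docker) and a running local Docker daemon; the notes themselves use the docker CLI.

# Sketch: pulling a tagged image programmatically with the Docker SDK for Python.
# Assumes `pip install docker` and a running Docker daemon.
import docker

client = docker.from_env()                 # connect to the local Docker daemon
image = client.images.pull("fedora", tag="20")
print(image.tags)                          # e.g. ['fedora:20']

# Listing local images, similar to `docker images`
for img in client.images.list():
    print(img.id, img.tags)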
We can also search all of the publicly available images on Docker Hub using the
docker search command:
Listing 4.10 : Searching for images
wfarr/puppet-module...
jamtur01/puppetmaster
. . .
Here, we've searched the Docker Hub for the term puppet. It'll search images and return:
• Repository names
• Image descriptions
This will pull down the jamtur01/puppetmaster image (which, by the way, contains a pre-installed Puppet master).
We can then use this image to build a new container. Let's do that now using the
docker run command again.
Listing 4.12: Creating a Docker container from the Puppet master image
You can see we've launched a new container from our jamtur01/puppetmaster image.
We've launched the container interactively and told the container to run the Bash shell. Once
inside the container's shell, we've run Facter (Puppet's inventory application), which was
pre-installed on our image. From inside the container, we've also run the puppet binary
to confirm it is installed.
Docker Hub:
Docker Hub is a repository service and it is a cloud-based service where people push their Docker
Container Images and also pull the Docker Container Images from the Docker Hub anytime or
anywhere via the internet. It provides features such as you can push your images as private or
public. Mainly DevOps team uses the Docker Hub. It is an open-source tool and freely available for
all operating systems. It is like storage where we store the images and pull the images when it is
required. When a person wants to push/pull images from the Docker Hub they must have a basic
knowledge of Docker. Let us discuss the requirements of the Docker tool.
Docker is a tool that enterprises are adopting rapidly. When a developer team wants to share a project with all its dependencies for testing, the developers can push their code to Docker Hub with all dependencies. First, they create the images and push them to Docker Hub. After that, the testing team pulls the same image from Docker Hub, eliminating the need for any additional files, software, or plugins to run the image, because the developer team has shared the image with all dependencies.
• Docker hub plays a very important role in industries as it becomes more popular day by day
and it acts as a bridge between the developer team and the testing team.
• If a person wants to share their code, software or any type of file for public use, they can simply make the images public on Docker Hub.
Creating First Repository in Docker Hub Using GUI
Step 1: We must open Docker Hub first, then select Create Repository.
Step 2 : After that, we will be taken to a screen for configuring the repository, where we must choose the namespace, repository name, and an optional description. In the visibility area, as indicated in the picture, there are two options : Public and Private. We can choose either of them depending on the type of organization you are in. If you choose Public, everyone will be able to push, pull and use the image because it will be accessible to everyone. If you select the Private option, only those with access to that image can view and utilize it.
Step 3 : Finally, the repository is created. With the help of the Docker commands we can push or pull the image :
docker push <your-username>/<my-testprivate-repo>
1. Push Command
This command as the name suggests itself is used to pushing a docker image onto the docker hub.
Implementation
Follow this example to get an idea of the push command:
# docker images
The above command will list all the images on your system.
Step 4 : Then give your credentials : type in your Docker Hub username and password.
• username
• password
Step 5 : After that, hit the Enter key and you will see a login success message on your screen.
Step 7 : Then tag the image with your Docker Hub username and the name under which it should appear on Docker Hub, using the command below :
# docker tag geeksforgeek mdahtisham/geeksimage
geeksimage - the name under which the image will appear on Docker Hub
Note : Below you can see the Docker image successfully pushed to Docker Hub :
mdahtisham/geeksimage
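The same tag-and-push flow can be scripted as well. The sketch below again assumes the Docker SDK for Python and reuses the example names from above; replace the username, password and repository names with your own.

# Sketch: tagging and pushing an image with the Docker SDK for Python.
# Assumes `pip install docker`, a running daemon, and a Docker Hub account.
import docker

client = docker.from_env()
client.login(username="mdahtisham", password="********")   # use your own credentials

image = client.images.get("geeksforgeek")                   # the locally built image
image.tag("mdahtisham/geeksimage", tag="latest")            # name it for Docker Hub

for line in client.images.push("mdahtisham/geeksimage", tag="latest",
                               stream=True, decode=True):
    print(line)                                              # progress output from the push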
2. Pull Command
The pull command is used to get an image from the Docker Hub.
Implementation:
Follow the example to get an overview of the pull command in Docker:
Step 1: Now you can search the image using the below command in docker as follows:
# docker search imagename
All available images with this name will be listed on your screen. One can also pull an image directly if one knows its exact name.
geeksimage - the name under which the image appears on Docker Hub
Step 3: Now check for the pulled image using the below command as follows:
# docker images
PART-A
1. Define Virtualization.
Virtualization is a process that allows a computer to share its hardware
resources with multiple digitally separated environments. Each virtualized
environment runs within its allocated resources, such as memory, processing
power, and storage.
2. List the types of Virtualization in cloud.
Server Virtualization
Network Virtualization
Storage Virtualization
Desktop Virtualization
Application Virtualization
3. What is meant by Desktop Virtualization?
The processing of multiple virtual desktops occurs on one or a few physical
servers, typically at the centralized data center. The copy of the OS and
applications that each end user utilizes will typically be cached in memory as
one image on the physical server.
The Desktop virtualization provides a virtual desktop environment where
client can access the system resources remotely through the network.
Server virtualization
is the process of dividing a physical server
into multiple unique and isolated virtual servers by means of software. It
partitions a single physical server into the multiple virtual servers; each
virtual server can run its own operating system and applications
independently.
16. What is Network Migration ?
A migrating VM should maintain all open network connections without
relying on forwarding mechanisms on the original host or on support from
mobility or redirection mechanisms.
19. List out the advantages of Docker.
o It does not require a full operating system to run applications.
24. Define Containers.
Docker helps you build and deploy containers inside of which you can package
your applications and services. As we've just learnt, containers are launched
from images and can contain one or more running processes. You can think
about images as the building or packing aspect of Docker and the containers as
the running or execution aspect of Docker.
A Docker container is:
• An image format.
• A set of standard operations.
• An execution environment.
25. What is a Docker image?
A Docker image is made up of filesystems layered over each other. At the
base is a boot filesystem, bootfs, which resembles the typical Linux/Unix boot
filesystem. A Docker user will probably never interact with the boot
filesystem. Indeed, when a container has booted, it is moved into memory,
and the boot filesystem is unmounted to free up the RAM used by the initrd
disk image.
26. Difference between Docker Vs VM (Virtual Machine).
UNIT IV
Google App Engine – Amazon AWS – Microsoft Azure; Cloud Software Environments –
Eucalyptus– OpenStack.
1. Discuss in detail about the Google App engine and its architecture(or)Discuss in detail
about GAE Applications. Nov/Dec 2020(Or)Explain the functional modules of GAE with
an example(May-2022)(Or) Demonstrate the programming environment of Google
APP Engine.(May-2023)
Google App Engine (GAE) is a Platform-as-a-Service cloud computing model that supports
many programming languages.
GAE is a scalable runtime environment mostly devoted to executing Web applications. In fact, it allows developers to integrate third-party frameworks and libraries with the infrastructure still being managed by Google.
It allows developers to use a readymade platform to develop and deploy web applications using
development tools, runtime engine, databases and middleware solutions. It supports languages
like Java, Python, .NET, PHP, Ruby, Node.js and Go in which developers can write their code and
deploy it on available google infrastructure with the help of Software Development Kit (SDK).
In GAE, SDKs are required to set up your computer for developing, deploying, and managing
your apps in App Engine. GAE enables users to run their applications on a large number of data
centers associated with Google's search engine operations. Presently, Google App Engine is a fully managed, serverless platform that allows developers to choose from several popular languages, libraries, and frameworks to develop their applications, and App Engine then takes care of provisioning servers and scaling application instances based on demand. The functional architecture of the Google cloud platform for App Engine is shown in Fig. 4.1.
The infrastructure for Google cloud is managed inside datacenters. All the cloud services and applications on Google run on servers inside the datacenters. Inside each data center, there are thousands of servers forming different clusters. Each cluster can run multipurpose servers.
The infrastructure for GAE is composed of four main components : Google File System (GFS), MapReduce, BigTable, and Chubby. GFS is used for storing large amounts of data on Google storage clusters. MapReduce is used for application program development with data processing on large clusters. Chubby is used as a distributed application locking service, while BigTable offers a storage service for accessing structured as well as unstructured data. In this architecture, users can interact with Google applications via the web interface provided by each application.
Fig. 4.1 : Functional architecture of the Google cloud platform for app engine
• Application runtime environment offers a platform that has built-in execution engine
for scalable web programming and execution.
• Software Development Kit (SDK) for local application development and deployment
over google cloud platform.
• Datastore to provision object-oriented, distributed, structured data storage to store application data. It also provides secure data management operations based on BigTable techniques.
• Admin console used for easy management of user application development and resource
management
• GAE web service for providing APIs and interfaces.
The Google provides programming support for its cloud environment, that is, Google Apps
Engine, through Google File System (GFS), Big Table, and Chubby. The following sections provide
a brief description about GFS, Big Table, Chubby and Google APIs.
Google has designed a distributed file system, named GFS, for meeting its exacting demands of processing a large amount of data. Most of the objectives of designing the GFS are similar to those of earlier distributed file systems. Some of the objectives include availability, performance, reliability, and scalability of systems. GFS has also been designed with certain challenging assumptions that also provide opportunities for developers and researchers to achieve these objectives. Some of the assumptions are listed as follows :
b) Efficient storage support for large - sized files as a huge amount of data to be processed is
stored in these files. Storage support is provided for small - sized files without requiring any
optimization for them.
c) With workloads that mainly consist of two kinds of reads, large streaming reads and small random reads, the system should be performance conscious so that the small reads are made steady rather than going back and forth by batching and sorting while advancing through the file.
d) The system supports small writes without being inefficient, along with the usual large
and sequential writes through which data is appended to files.
e) Semantics that are defined well are implemented.
g) Provision for sustained bandwidth is given priority over reduced latency. Google takes the aforementioned assumptions into consideration and supports its cloud platform, Google App Engine, through GFS. Fig. 4.2 shows the architecture of the GFS clusters.
GFS provides a file system interface and different APIs for supporting different file operations such as create to create a new file instance, delete to delete a file instance, open to open a named file and return a handle, close to close a given file specified by a handle, read to read data from a specified file and write to write data to a specified file.
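The shape of this file-operation API can be made concrete with a small Python sketch. This is a hypothetical stand-in written only for illustration: GFS is internal to Google and exposes these operations through its own client library, and real reads and writes go to chunk servers, not an in-memory dictionary.

# Hypothetical sketch of a GFS-style client interface exposing the operations
# named above (create, delete, open, close, read, write). Not Google's actual API.

class GFSClient:
    def __init__(self):
        self.files = {}            # file name -> bytearray (stands in for chunk servers)
        self.handles = {}          # handle -> file name

    def create(self, name):
        self.files[name] = bytearray()

    def delete(self, name):
        self.files.pop(name, None)

    def open(self, name):
        handle = len(self.handles) + 1
        self.handles[handle] = name
        return handle

    def close(self, handle):
        self.handles.pop(handle, None)

    def write(self, handle, data):      # append-style write, as favoured by GFS workloads
        self.files[self.handles[handle]] += data

    def read(self, handle):
        return bytes(self.files[self.handles[handle]])

gfs = GFSClient()
gfs.create("/logs/crawl-001")
h = gfs.open("/logs/crawl-001")
gfs.write(h, b"fetched 100 pages\n")
print(gfs.read(h))
gfs.close(h)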
In the example, a single GFS Master and three chunk servers serving two clients comprise a GFS cluster. These clients and servers, as well as the Master, are Linux machines, each running a server process at the user level. These processes are known as user-level server processes.
Applications use a file-system-specific Application Programming Interface (API) implemented by the code written for the GFS client. Further, communication with the GFS Master and chunk servers is established for performing the read and write operations on behalf of the application.
The clients interact with the Master only for metadata operations. However, data-bearing
communications are forwarded directly to chunk servers. POSIX API, a feature that is common to
most of the popular file systems, is not included in GFS, and therefore, Linux vnode layer hook-
in is not required.
Clients or servers do not perform the caching of file data. Due to the presence of the
streamed workload, caching does not benefit clients, whereas caching by servers has the least
consequence as a buffer cache that already maintains a record for frequently requested files
locally.
The GFS provides the following features :
Big Table
Google's Big Table is a distributed storage system that allows storing huge volumes of structured as well as unstructured data on storage media.
Google created Big Table with the aim of developing a fast, reliable, efficient and scalable storage system that can process concurrent requests at high speed.
Millions of users access billions of web pages and many hundred TBs of satellite images. A
lot of semi-structured data is generated from Google or web access by users.
This data needs to be stored, managed, and processed to retrieve insights. This required
data management systems to have very high scalability.
Google's aim behind developing Big Table was to provide a highly efficient system for
managing a huge amount of data so that it can help cloud storage services.
It is required for concurrent processes that can update various data pieces so that the most
recent data can be accessed easily at a fast speed. The design requirements of Big Table
are as follows :
1. High speed
2. Reliability
3. Scalability
4. Efficiency
5. High performance
Big Table is a popular, distributed data storage system that is highly scalable and self-
managed. It involves thousands of servers, terabytes of data storage for in-memory
operations, millions of read/write requests by users in a second and petabytes of data stored
on disks. Its self-managing services help in dynamic addition and removal of servers that
are capable of adjusting the load imbalance by themselves.
It has gained extreme popularity at Google as it stores almost all kinds of data, such as Web indexes, personalized search data, Google Earth, Google Analytics, and Google Finance. The table that contains data crawled from the Web is referred to as the Web table. The generalized architecture of Big Table is shown in Fig. 4.3.
It is composed of three entities, namely the client, the Bigtable master and the tablet servers. Bigtables
are implemented over one or more clusters that are similar to GFS clusters. The client
application uses libraries to execute Bigtable queries on the master server. A Bigtable is
broken up into row ranges called tablets, which are assigned to tablet servers that carry out the actual read and write work. Each
tablet is 100 to 200 MB in size.
The master server is responsible for allocating tablets to tablet servers, performing garbage collection
and monitoring the performance of tablet servers. The master server splits tasks and executes
them over tablet servers, and it maintains a centralized view of the system to support optimal
placement and load-balancing decisions.
Control operations go through the master, while data operations are handled strictly by the
tablet servers. Once tablets have been assigned, tablet servers provide row access to clients. Fig. 4.6.3 shows the structure of Big table
:
Big Table is arranged as a sorted map that is spread across multiple dimensions and is
sparse, distributed, and persistent. The Big Table data model primarily combines
three dimensions, namely row, column, and timestamp. The first two dimensions are string types,
whereas the time dimension is a 64-bit integer. The value indexed by this combination of
dimensions is an uninterpreted string.
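A minimal, illustrative sketch of this (row, column, timestamp) → value map follows, using plain Python only; the class and method names are hypothetical and this is not the real Bigtable API.

```python
# Toy model of Bigtable's data model: a sorted map from
# (row key, column key, timestamp) to an uninterpreted string value.

class TinyBigtable:
    def __init__(self):
        self._cells = {}  # {(row, column): {timestamp: value}}

    def put(self, row, column, value, timestamp):
        self._cells.setdefault((row, column), {})[timestamp] = value

    def get_latest(self, row, column):
        """Return the most recent version of a cell, mirroring the default
        Bigtable lookup that favours the newest timestamp."""
        versions = self._cells.get((row, column), {})
        if not versions:
            return None
        latest_ts = max(versions)
        return latest_ts, versions[latest_ts]

    def scan_row_range(self, start_row, end_row):
        """Rows are kept in lexicographic order, so a range scan visits
        adjacent rows together (why URL-like row keys cluster a site)."""
        for (row, column) in sorted(self._cells):
            if start_row <= row < end_row:
                for ts, value in sorted(self._cells[(row, column)].items()):
                    yield row, column, ts, value

table = TinyBigtable()
table.put("com.example/index.html", "contents:", "<html>old</html>", timestamp=1)
table.put("com.example/index.html", "contents:", "<html>new</html>", timestamp=2)
print(table.get_latest("com.example/index.html", "contents:"))  # newest version wins
```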
Each row in Big Table has an associated row key, which is an arbitrary string of up to
64 KB in size. A row name is a string, and rows are kept in lexicographic order. Although
Big Table rows do not support the relational model, they offer atomic access to the data:
reads and writes under a single row key are atomic. The rows contain a large amount of data
about a given entity, such as a web page. In the Web table, for example, the row keys are URLs, and the rows
contain information about the resources referenced by those URLs.
The other important dimension that is assigned to Big Table is a timestamp. In Big table,
the multiple versions of data are indexed by timestamp for a given cell. The timestamp is either
related to real-time or can be an arbitrary value that is assigned by a programmer. It is used for
storing various data versions in a cell.
By default, any new data that is inserted into Big Table is taken as current, but you can
explicitly set the timestamp for any new write operation in Big Table. Timestamps provide the
Big Table lookup option that returns the specified number of the most recent values. It can be
used for marking the attributes of the column families.
The attributes either retain the most recent values in a specified number or keep the values
for a particular time duration.
Big Table supports APIs that can be used by developers to perform a wide range of operations
such as metadata operations, read/write operations, or modify/update operations. The
commonly used operations by APIs are as follows:
• Creation and deletion of tables
• Creation and deletion of column families within tables
• Writing or deleting cell values
• Accessing data from rows
• Associate metadata such as access control information with tables and column
families
The functions that are used for atomic write operations are as follows :
Chubby
Chubby is the crucial service in the Google infrastructure that offers storage and coordination
for other infrastructure services such as GFS and Bigtable. It is a coarse - grained distributed
locking service that is used for synchronizing distributed activities in an asynchronous
environment on a large scale. It is used as a name service within Google and provides reliable
storage for file systems along with the election of coordinator for multiple replicas. The Chubby
interface is similar to the interfaces that are provided by
distributed systems with advisory locks. However, the aim of designing Chubby is to provide
reliable storage with consistent availability.
It is designed for use with loosely coupled distributed systems that are connected by a
high-speed network and contain several small-sized machines. The lock service enables the
synchronization of the activities of clients and permits the clients to reach a consensus about
the environment in which they are placed. Chubby's main aim is to efficiently handle a large set
of clients by providing them a highly reliable and available system. Its other important
characteristics, such as throughput and storage capacity, are secondary. Fig. 4.5 shows the
typical structure of a Chubby system :
The chubby architecture involves two primary components, namely server and client library.
Both the components communicate through a Remote Procedure Call (RPC). However, the
library has a special purpose, i.e., linking the clients against the chubby cell. A Chubby cell
contains a small set of servers. The servers are also called replicas, and usually, five servers are
used in every cell. The Master is elected from the five replicas through a distributed consensus
protocol. A majority of the replicas must vote for the Master, and those replicas agree not to
elect a different Master for a certain period after voting; this period is termed the Master lease.
Chubby supports a similar file system as Unix. However, the Chubby file system is simpler than
the Unix one. The files and directories, known as nodes, are contained in the Chubby namespace.
Each node is associated with different types of metadata. The nodes are opened to obtain the
Unix file descriptors known as handles. A handle specifier includes check digits that prevent
clients from guessing handle values, a handle sequence number, and mode information for
recreating the lock state when the Master changes. Reader and writer locks are implemented by
Chubby using files and directories. While exclusive permission for a lock in the writer mode can
be obtained by a single client, there can be any number of clients who share a lock in the reader’s
mode.
Another important term used with Chubby is an event, which can be subscribed to by
clients after the creation of handles. An event is delivered when the action that corresponds to it
is completed. An event can be :
a. Modification in the contents of a file
In Chubby, caching is done by clients, which store file data and metadata locally to reduce read
traffic. Although handles and file locks can also be cached, the Master maintains a list of what
each client may be caching. Thanks to caching, clients see consistent data; if consistency cannot
be guaranteed, an error is flagged. Chubby maintains sessions between clients and servers with
the help of a keep-alive message, which must be exchanged every few seconds to
remind the system that the session is still active.
If a server failure has indeed occurred, the Master does not respond to a client's
keep-alive message within the local lease timeout. This incident places the session in jeopardy.
It can be recovered in a manner as explained in the following points:
• The cache needs to be cleared.
• The client needs to wait for a grace period, which is about 45 seconds.
• Another attempt is made to contact the Master.
If the attempt to contact the Master is successful, the session resumes and its jeopardy is over.
However, if this attempt fails, the client assumes that the session is lost. Fig.4.6 shows the case of
the failure of the Master :
Chubby offers a decent level of scalability, which means that there can be any (unspecified)
number of the Chubby cells. If these cells are fed with heavy loads, the lease timeout increases.
This increment can be anything between 12 seconds and 60 seconds. The data is fed in a small
package and held in the Random-Access Memory (RAM) only. The Chubby system also uses
partitioning mechanisms to divide data into smaller packages. With all of its services and
applications, Chubby has proved to be a great innovation when it comes to storage,
locking, and program support services.
Chubby is implemented using the following APIs :

API - Description
Open - Opens the file or directory and returns a handle
Close - Closes the file or directory and releases the associated handle
Delete - Deletes the file or directory
ReadDir - Returns the contents of a directory
SetContents - Writes the contents of a file
GetStat - Returns the metadata
GetContentsAndStat - Returns the file contents and the metadata associated with the file
Acquire - Acquires a lock on a file
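To show how these calls fit together, here is a tiny in-memory stand-in for a Chubby-style cell, written only to illustrate coordinator election with an exclusive writer lock; all names are hypothetical and this is not Google's Chubby client library.

```python
# In-memory stand-in for a Chubby-style cell (illustrative only).

class TinyChubbyCell:
    def __init__(self):
        self._files = {}   # path -> contents
        self._locks = {}   # path -> owner name

    def Open(self, path):
        self._files.setdefault(path, "")
        return path                      # the path doubles as the handle here

    def Acquire(self, handle, owner, mode="writer"):
        # Writer locks are exclusive: the first caller wins, later callers fail.
        if mode == "writer" and handle not in self._locks:
            self._locks[handle] = owner
            return True
        return False

    def SetContents(self, handle, data):
        self._files[handle] = data

    def GetContentsAndStat(self, handle):
        return self._files[handle], {"path": handle}

    def Close(self, handle):
        pass

cell = TinyChubbyCell()
for replica in ["replica-1", "replica-2", "replica-3"]:
    h = cell.Open("/ls/my-service/master-lock")
    if cell.Acquire(h, owner=replica):
        cell.SetContents(h, replica)     # the winner records itself as master
    cell.Close(h)
print("Elected master:", cell.GetContentsAndStat("/ls/my-service/master-lock")[0])
```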
Google developed a set of Application Programming Interfaces (APIs) that can be used to
communicate with Google services; this set of APIs is referred to as the Google APIs. They also
help in integrating Google services with other services. Google App Engine helps in deploying an
API for an app without the developer having to be aware of its infrastructure. Google App
Engine also hosts the endpoint APIs that are created by Google Cloud Endpoints. Google Cloud
Endpoints is a set of libraries, tools, and capabilities that can be used to generate APIs and client
libraries from an App Engine application; it eases data accessibility for client applications. Using
Google Cloud Endpoints also saves the time of writing network communication code, since it can
generate client libraries for accessing the backend API.
AWS:
2. Explain in detail about AWS EC2 and EB with an example.
Programming on Amazon EC2
✓ Amazon was the first company to introduce VMs in application hosting. Customers can
rent VMs instead of physical machines to run their own applications. By using VMs,
customers can load any software of their choice.
✓ The elastic feature of such a service is that a customer can create, launch, and terminate
server instances as needed, paying by the hour for active servers. Amazon provides
several types of preinstalled VMs.
✓ Instances are launched from Amazon Machine Images (AMIs), which are preconfigured
with operating systems based on Linux or Windows, and with additional software.
Table 6.12 defines three types of AMI. Figure 6.24 shows an execution environment. AMIs are
the templates for instances, which are running VMs. The workflow to create a VM is:
Create an AMI → Create Key Pair → Configure Firewall → Launch
This sequence is supported by public, private, and paid AMIs shown in Figure 6.24. The AMIs
are formed from the virtualized compute, storage, and server resources shown at the bottom of
Figure 6.23.
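Assuming the standard boto3 SDK is available, the following minimal sketch walks the same key pair → firewall → launch sequence; the region, AMI ID and resource names are placeholders rather than values from the text.

```python
# Minimal boto3 sketch of the AMI -> key pair -> firewall -> launch workflow.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Create a key pair used to log in to the instance.
key = ec2.create_key_pair(KeyName="demo-key")

# Configure the "firewall": a security group allowing SSH in.
sg = ec2.create_security_group(GroupName="demo-sg", Description="allow ssh")
ec2.authorize_security_group_ingress(GroupId=sg["GroupId"],
                                     IpProtocol="tcp",
                                     FromPort=22, ToPort=22,
                                     CidrIp="0.0.0.0/0")

# Launch an instance from a (placeholder) AMI.
resp = ec2.run_instances(ImageId="ami-0123456789abcdef0",   # hypothetical AMI ID
                         InstanceType="t2.micro",
                         KeyName="demo-key",
                         SecurityGroupIds=[sg["GroupId"]],
                         MinCount=1, MaxCount=1)
print(resp["Instances"][0]["InstanceId"])
```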
Cluster compute instances provide high network performance and are well suited for
high-performance computing (HPC) applications and other demanding network-bound
applications. They use 10 Gigabit Ethernet interconnections.
Amazon S3 offers a simple web services interface that can be used to store and retrieve any
amount of data from anywhere, at any time on the web. It gives any developer access to the same
scalable, secure, fast, low-cost data storage infrastructure that Amazon uses to operate its own
global website network. S3 is an online backup and storage system. A high-speed data transfer
feature known as AWS Import/Export moves data into and out of AWS by shipping portable
storage devices, which are loaded and unloaded over Amazon's own internal network rather
than over the Internet.
Amazon S3 is a cloud-based storage system that allows storage of data objects in the range
of 1 byte up to 5 GB in a flat namespace. The storage containers in S3 are called buckets. A
bucket serves the function of a directory, although there is no object hierarchy within a bucket;
the user saves objects, not files, to it. It is important to note that the concept of a file system is
not associated with S3, because file systems are not supported; only objects are stored. In
addition, the user is not required to mount a bucket, as would be done with a file system.
Fig. 4.7 shows S3 diagrammatically.
S3 system allows buckets to be named (Fig. 4.8), but the name must be unique in the S3
namespace across all consumers of AWS. The bucket can be accessed through the S3 web API
(with SOAP or REST), which is similar to a normal disk storage system.
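A small boto3 sketch of creating a bucket and storing and retrieving an object through the S3 REST API follows; the bucket name is a placeholder and, as noted above, must be globally unique across all AWS accounts.

```python
# Sketch: create a bucket, write an object, and read it back with boto3.
import boto3

s3 = boto3.client("s3")
bucket = "my-unique-demo-bucket-2024"          # placeholder, must be globally unique

s3.create_bucket(Bucket=bucket)
s3.put_object(Bucket=bucket,
              Key="backups/notes.txt",          # keys live in a flat namespace
              Body=b"hello from S3")
obj = s3.get_object(Bucket=bucket, Key="backups/notes.txt")
print(obj["Body"].read())
```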
The performance of S3 makes it best suited to non-operational functions such as data archiving,
retrieval and disk backup. The REST API is preferred to the SOAP API because it is easier to
work with large binary objects in REST.
Amazon S3 offers large volumes of reliable storage with high protection and low
bandwidth access. S3 is most ideal for applications that need storage archives. For example,
S3 is used by large storage sites that share photos and images.
The APIs to manage buckets have the following features :
The S3 service can be used by many users as a backup component in a 3-2-1 backup method.
This implies that your original data is 1, a copy of your data is 2 and an off-site copy of data is 3.
In this method, S3 is the 3rd level of backup. In addition to this, Amazon S3 provides the feature
of versioning.
In versioning, every version of an object stored in an S3 bucket is retained, but for this, the
user must enable the versioning feature. Any HTTP or REST operation, namely PUT, POST, COPY
or DELETE, creates a new object that is stored along with the older version. A GET operation
retrieves the newest version of the object, but the ability to recover and undo actions is also
available. Versioning is a useful method for preserving and
archiving data.
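The following boto3 sketch enables versioning on a bucket and lists the stored versions of an object; the bucket and key names are placeholders.

```python
# Sketch: enable versioning and inspect object versions with boto3.
import boto3

s3 = boto3.client("s3")
bucket = "my-unique-demo-bucket-2024"

# Versioning must be explicitly enabled, as described above.
s3.put_bucket_versioning(Bucket=bucket,
                         VersioningConfiguration={"Status": "Enabled"})

# Each PUT now creates a new version instead of overwriting the object.
s3.put_object(Bucket=bucket, Key="report.txt", Body=b"draft 1")
s3.put_object(Bucket=bucket, Key="report.txt", Body=b"draft 2")

versions = s3.list_object_versions(Bucket=bucket, Prefix="report.txt")
for v in versions.get("Versions", []):
    print(v["VersionId"], v["IsLatest"])

# A plain GET returns the latest version; older versions can be fetched
# by passing VersionId to get_object.
```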
Amazon Glacier
Amazon Glacier is a very low-priced online file storage web service that offers secure, flexible
and durable storage for online data backup and archiving. This web service is specially designed
for data that is not accessed frequently. Data for which a retrieval time of three to five hours is
acceptable is a good fit for the Amazon Glacier service.
You can store virtually any type, format and amount of data using Amazon Glacier. Files in ZIP
and TAR format are the most common type of data stored in Amazon Glacier.
Some of the common use of amazon glacier are :
• Replacing the traditional tape solutions with backup and archive which can last
longer.
• Storing data which is used for the purposes of compliance.
Glacier Vs S3
Both amazon S3 and amazon glacier work almost the same way. However, there are
certain important aspects that can reflect the difference between them. Table 6.10.1 shows
the comparison of amazon glacier and amazon S3 :
You can also use the Amazon S3 interface to avail the offerings of Amazon Glacier without
learning a new interface. This can be done by using Glacier as an S3 storage class together with
object lifecycle policies.
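As a concrete, hedged example of such a lifecycle policy, the boto3 sketch below transitions objects under a prefix to the Glacier storage class after 30 days; the bucket name, rule ID and prefix are placeholders.

```python
# Sketch: S3 lifecycle rule moving objects to the GLACIER storage class.
import boto3

s3 = boto3.client("s3")
s3.put_bucket_lifecycle_configuration(
    Bucket="my-unique-demo-bucket-2024",
    LifecycleConfiguration={
        "Rules": [{
            "ID": "archive-old-backups",
            "Status": "Enabled",
            "Filter": {"Prefix": "backups/"},
            "Transitions": [{"Days": 30, "StorageClass": "GLACIER"}],
        }]
    },
)
```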
Azure:
4. Explain in detail about Azure Architecture and its Components with an example.
Microsoft Windows Azure(Azure)
In 2008, Microsoft launched a Windows Azure platform to meet the challenges in cloud computing.
This platform is built over Microsoft data centers. Figure 4.22 shows the overall architecture of
Microsoft’s cloud platform. The platform is divided into three major component platforms.
Windows Azure offers a cloud platform built on Windows OS and based on Microsoft virtualization
technology. Applications are installed on VMs deployed on the data-center servers. Azure manages
all servers, storage, and network resources of the data center. On top of the infrastructure are the
various services for building different cloud applications.
• Live service : Users can visit Microsoft Live applications and apply the data involved across multiple
machines concurrently.
• .NET service : This package supports application development on local hosts and execution on
cloud machines.
• SQL Azure : This function makes it easier for users to visit and use the relational database associated
with the SQL server in the cloud.
• SharePoint service : This provides a scalable and manageable platform for users to develop their
special business applications in upgraded web services.
• Dynamic CRM service : This provides software developers a business platform in managing CRM
applications in financing, marketing, and sales and promotions.
✓ All these cloud services in Azure can interact with traditional Microsoft software
applications, such as Windows Live, Office Live, Exchange online, SharePoint online, and
dynamic CRM online.
✓ The Azure platform applies the standard web communication protocols SOAP and REST. The
Azure service applications allow users to integrate the cloud application with other
platforms or third-party clouds.
✓ You can download the Azure development kit to run a local version of Azure. The powerful
SDK allows Azure applications to be developed and debugged on the Windows hosts.
SQLAzure
Azure offers a very rich set of storage capabilities, as shown in Figure 6.25. All the storage
modalities are accessed with REST interfaces except for the recently introduced Drives, which are
analogous to Amazon EBS (discussed above under AWS) and offer a file system interface
as a durable NTFS volume backed by blob storage. The REST interfaces are automatically
associated with URLs, and all storage is replicated three times for fault tolerance and is
guaranteed to be consistent in access.
The basic storage system is built from blobs which are analogous to S3 for Amazon. Blobs are
arranged as a three-level hierarchy: Account → Containers → Page or Block Blobs.
Containers are analogous to directories in traditional file systems with the account acting as the
root. The block blob is used for streaming data and each such blob is made up as a sequence of
blocks of up to 4 MB each, while each block has a 64 byte ID.
Block blobs can be up to 200 GB in size. Page blobs are for random read/write access and consist
of an array of pages with a maximum blob size of 1 TB. One can associate metadata with blobs
as <name, value> pairs with up to 8 KB per blob.
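A hedged sketch using the azure-storage-blob Python SDK illustrates the Account → Container → Blob hierarchy described above; the connection string, container and blob names are placeholders for a real storage account.

```python
# Sketch: walking the Account -> Container -> Blob hierarchy in Azure.
from azure.storage.blob import BlobServiceClient

service = BlobServiceClient.from_connection_string("<connection-string>")  # account level
container = service.create_container("lecture-notes")                      # container level
container.upload_blob(name="unit4.txt", data=b"azure blob demo")           # block blob

blob = container.get_blob_client("unit4.txt")
print(blob.download_blob().readall())

# Metadata is attached as <name, value> pairs, up to 8 KB per blob.
blob.set_blob_metadata({"course": "CCS335"})
```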
Azure Tables
The Azure Table and Queue storage modes are aimed at much smaller data volumes. Queues
provide reliable message delivery and are naturally used to support work spooling between web
and worker roles. Queues consist of an unlimited number of messages which can be retrieved
and processed at least once, with an 8 KB limit on message size.
Azure supports PUT, GET, and DELETE message operations as well as CREATE and DELETE for
queues. Each account can have any number of Azure tables which consist of rows called entities
and columns called properties.
There is no limit to the number of entities in a table and the technology is designed to scale well
to a large number of entities stored on distributed computers. All entities can have up to 255
general properties which are <name, type, value> triples.
Two extra properties, PartitionKey and RowKey, must be defined for each entity, but
otherwise, there are no constraints on the names of properties—this table is very flexible!
RowKey is designed to give each entity a unique label, while PartitionKey is designed to be
shared; entities with the same PartitionKey are stored next to each other, so a good use of
PartitionKey can speed up search performance. An entity can have, at most, 1 MB of storage; if you
need larger values, just store a link to a blob store in the Table property value. ADO.NET and
LINQ support table queries.
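The plain-Python sketch below (not the Azure SDK) illustrates how Table entities are keyed: PartitionKey groups related entities together, RowKey identifies an entity inside its partition, and the pair addresses exactly one entity.

```python
# Illustrative model of Azure Table entities keyed by (PartitionKey, RowKey).

entities = {}  # {(PartitionKey, RowKey): {property name: value}}

def insert(partition_key, row_key, **properties):
    entities[(partition_key, row_key)] = properties

insert("student:2024", "32653156", name="A. Kumar", dept="CSE")
insert("student:2024", "32653157", name="B. Devi", dept="IT")

# Entities sharing a PartitionKey are stored near each other, so scanning
# one partition is cheap compared with scanning the whole table.
same_partition = [k for k in sorted(entities) if k[0] == "student:2024"]
print(same_partition)
```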
Eucalyptus is a Linux-based open-source software architecture for cloud computing and also a
storage platform that implements Infrastructure as a Service (IaaS). It provides quick and efficient
computing services. Eucalyptus was designed to provide services compatible with Amazon’s EC2
cloud and Simple Storage Service(S3).
Eucalyptus Architecture
Eucalyptus CLIs can manage both Amazon Web Services and their own private instances. Clients have
the independence to transfer instances from Eucalyptus to Amazon Elastic Compute Cloud. The virtualization
layer oversees the network, storage, and computing resources. Instances are isolated from one another by
hardware virtualization.
Components of Architecture
• Node Controller manages the lifecycle of instances running on each node. It interacts with the
operating system, hypervisor, and Cluster Controller, and it controls the working of VM instances
on the host machine.
• Cluster Controller manages one or more Node Controllers and communicates with the Cloud Controller.
It gathers information and schedules VM execution.
• Storage Controller (Walrus) allows the creation of snapshots of volumes and provides persistent block
storage for VM instances. The Walrus storage controller is a simple file storage system; it stores
images and snapshots, and it stores and serves files using S3 (Simple Storage Service) APIs.
• Cloud Controller is the front-end for the entire architecture. It acts as a compliant web service
to client tools on one side and interacts with the rest of the components on the other side.
• Managed Mode: Offers numerous security groups to users, as the network is large. Each security
group is assigned a set or a subset of IP addresses. Ingress rules are applied through the
security groups specified by the user. The network is isolated by VLAN between the Cluster
Controller and the Node Controller. Two IP addresses are assigned to each virtual machine.
• Managed (No VLAN) Mode: The root user on a virtual machine can snoop into other
virtual machines running on the same network layer. It does not provide VM network isolation.
• System Mode: The simplest of all modes, with the fewest features. A MAC address is assigned to
a virtual machine instance and attached to the Node Controller's bridge Ethernet device.
• Static Mode: Similar to System Mode but with more control over the assignment of IP addresses.
Each MAC address/IP address pair is mapped to a static entry within the DHCP server.
Advantages Of The Eucalyptus Cloud
1. Eucalyptus can be utilized to benefit both the Eucalyptus private cloud and the Eucalyptus
public cloud.
2. Instances of Amazon or Eucalyptus machine images can be run on both clouds.
3. Its API is compatible with the Amazon Web Services APIs.
4. Eucalyptus can be utilized with DevOps tools like Chef and Puppet.
5. Although it is not as popular yet, it has the potential to be an alternative to OpenStack and
CloudStack.
6. It is used to build hybrid, public and private clouds.
7. It allows users to turn their own data centers into a private cloud and hence extend the
services to other organizations.
Nimbus
Nimbus is a toolkit that, once installed on a cluster, provides Infrastructure-as-a-Service to its
clients through WSRF-based web service APIs or the Amazon EC2 WSDL. Nimbus is free and
open-source software, released under the Apache License, version 2.
Nimbus supports both the Xen and KVM hypervisors, as well as the resource schedulers
Portable Batch System and Oracle Grid Engine. It allows the deployment of self-configured
virtual clusters, and it is configurable with respect to scheduling, network leasing, and usage accounting.
Open Stack
OpenStack is an open - source cloud operating system that is increasingly gaining admiration
among data centers. This is because OpenStack provides a cloud computing platform to handle
enormous computing, storage, database and networking resources in a data center. Put simply,
OpenStack is an open-source, highly scalable cloud computing platform that
provides tools for developing private, public or hybrid clouds, along with a web interface for users
to access resources and admins to manage those resources.
Put otherwise, OpenStack is a platform that enables potential cloud providers to create,
manage and bill their custom-made VMs to their future customers. OpenStack is free and open,
which essentially means that everyone can have access to its source code and can suggest or make
changes to it and share it with the OpenStack community. OpenStack is an open-source and freely
available cloud computing platform that enables its users to create, manage and deploy virtual
machines and other instances. Technically, OpenStack provides Infrastructure-as-a-Service
(IaaS) to its users to enable them to manage virtual private servers in their data centers.
OpenStack provides the required software tools and technologies to abstract the underlying
infrastructure to a uniform consumption model. Basically, OpenStack allows various
organisations to provide cloud services to the user community by leveraging the organization’s
pre-existing infrastructure. It also provides options for scalability so that resources can be scaled
whenever organisations need to add more resources without hindering the ongoing processes.
The main objective of OpenStack is to provide a cloud computing platform that is :
• Global
• Open-source
• Freely available
• Easy to use
• Highly and easily scalable
• Easy to implement
• Interoperable
OpenStack is for all. It satisfies the needs of users, administrators and operators of private
clouds as well as public clouds. Some examples of open-source cloud platforms already available
are Eucalyptus, OpenNebula, Nimbus, CloudStack and OpenStack, which are used for
infrastructure control and are usually implemented in private clouds.
Components of OpenStack
OpenStack consists of many different components. Because OpenStack cloud is open - source,
developers can add components to benefit the OpenStack community. The following are the core
components of OpenStack as identified by the OpenStack community:
• Nova : This is one of the primary services of OpenStack, which provides numerous tools for
the deployment and management of a large number of virtual machines. Nova is the
compute service of OpenStack.
• Swift : Swift provides storage services for storing files and objects. Swift can be equated
with Amazon’s Simple Storage System (S3).
• Cinder : This component provides block storage to Nova Virtual Machines. Its working
is similar to a traditional computer storage system where the computer is able to access
specific locations on a disk drive. Cinder is analogous to AWS's EBS.
• Glance : Glance is OpenStack's image service component that provides virtual templates
(images) of hard disks. These templates can be used for new VMs. Glance may use either
Swift or flat files to store these templates.
• Neutron (formerly known as Quantum) : This component of OpenStack provides
Networking-as-a- Service, Load-Balancer-as-a-Service and Firewall- as-a-Service. It also
ensures communication between other components.
• Heat : It is the orchestration component of OpenStack. It allows users to manage
infrastructural needs of applications by allowing the storage of requirements in files.
• Keystone : This component provides identity management in OpenStack
• Horizon : This is a dashboard of OpenStack, which provides a graphical interface.
• Ceilometer : This component of OpenStack provisions meters and billing models for
users of the cloud services. It also keeps an account of the resources used by each
individual user of the OpenStack cloud. Let us also discuss some of the non- core
components of OpenStack and their offerings.
The basic architectural components of OpenStack, shown in Fig:4.12, includes its core and
optional services/ components. The optional services of OpenStack are also known as Big Tent
services, and OpenStack can be used without these components or they can be used as per
requirement.
We have already discussed the core services and the four optional services. Let us now
discuss the rest of the services.
• Designate : This component offers DNS services analogous to Amazon’s Route 53.
The following are the subsystems of Designate :
Mini DNS Server
Pool Manager
Central Service and APIs
• Barbican : Barbican is the key management service of OpenStack that is comparable to
KMS from AWS. This provides secure storage, retrieval, and provisioning and management
of various types of secret data, such as keys, certificates, and even binary data.
• AMQP : AMQP stands for Advanced Message Queuing Protocol and is the messaging mechanism
used by OpenStack. The AMQP broker lies between two Nova components and enables them to
communicate in a loosely coupled fashion.
Further, OpenStack uses two architectures - Conceptual and Logical, which are
discussed in the next section.
OpenStack helps build cloud environments by providing the ability to integrate various
technologies of your choice. Apart from the fact that OpenStack is open-source, there are
numerous benefits that make it stand out. Following are some of the features and benefits of
OpenStack Cloud :
• Compatibility : OpenStack supports both private and public clouds and is very easy to
deploy and manage. OpenStack APIs are compatible with Amazon Web Services APIs. This
compatibility eliminates the need for rewriting applications for AWS, thus enabling easy
portability between public and private clouds.
• Security : OpenStack addresses the security concerns, which are the top- most concerns
for most organisations, by providing robust and reliable security systems.
• Real-time Visibility : OpenStack provides real-time client visibility to administrators,
including visibility of resources and instances, thus enabling administrators and providers
to track what clients are requesting for.
• Live Upgrades : This feature allows upgrading services without any downtime.
Earlier, upgrades required shutting down complete systems, which
resulted in loss of performance. Now, OpenStack enables upgrading systems while they
are running, by requiring only individual components to shut down.
Apart from these, OpenStack offers other remarkable features, such as networking,
compute, Identity Access Management, orchestration, etc.
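As a concrete, hedged illustration of how the core services above are consumed programmatically, the following openstacksdk sketch uses Glance to select an image and Nova to boot a VM; the cloud name, image, flavor and network identifiers are placeholders that depend on a particular deployment.

```python
# Sketch: exercising Glance (images) and Nova (compute) via openstacksdk.
import openstack

conn = openstack.connect(cloud="mycloud")   # credentials come from clouds.yaml

# Glance: pick an image to boot from.
image = conn.image.find_image("ubuntu-22.04")

# Nova: boot a VM from that image.
server = conn.compute.create_server(
    name="demo-vm",
    image_id=image.id,
    flavor_id=conn.compute.find_flavor("m1.small").id,
    networks=[{"uuid": "REPLACE-WITH-NETWORK-UUID"}],
)
server = conn.compute.wait_for_server(server)
print(server.status)
```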
Fig. 4.13 depicts a magnified version of the architecture, showing relationships among
different services and between the services and VMs. This expanded representation is
also known as the conceptual architecture of OpenStack.
From Fig. 4.13, we can see that every service of OpenStack depends on other services within
the system, and all these services exist in a single ecosystem working together to produce a
virtual machine. Any service can be turned on or off depending on the VM required to be
produced. These services communicate with each other through APIs and, in some cases, through
privileged admin commands.
Let us now discuss the relationship between the various components or services specified in the
conceptual architecture of OpenStack. As you can see in Fig. 4.13, three components, Keystone,
Ceilometer and Horizon, are shown on top of the OpenStack platform.
Here, Horizon is providing user interface to the users or administrators to interact with
underlying OpenStack components or services, Keystone is providing authentication to the
user by mapping the central directory of users to the accessible OpenStack services, and
Ceilometer is monitoring the OpenStack cloud for the purpose of scalability, billing,
benchmarking, usage reporting and other telemetry services. Inside the OpenStack platform, you
can see that various processes are handled by different OpenStack services; Glance is registering
Hadoop images, providing image services to OpenStack and allowing retrieval and storage of
disk images. Glance stores the images in Swift, which is responsible for providing reading service
and storing data in the form of objects and files. All other OpenStack components also store data
in Swift, which also stores data or job binaries. Cinder, which offers permanent block storage or
volumes to VMs, also stores backup volumes in Swift. Trove stores backup databases in Swift and
boots databases instances via Nova, which is the main computing engine that provides and
manages virtual machines using disk images.
Neutron enables network connectivity for VMs and facilitates PXE Network for Ironic that
fetches images via Glance. VMs are used by the users or administrators to avail and provide the
benefits of cloud services. All the OpenStack services are used by VMs in order to provide best
services to the users. The infrastructure required for running cloud services is managed by Heat,
the orchestration component of OpenStack, which orchestrates clusters and stores the
necessary resource requirements of a cloud application. Here, Sahara is used to offer a
simple means of providing a data processing framework to the cloud users.
Table 4.14 shows the dependencies of these services.
OpenStack majorly operates in two modes - single host and multi host. A single host mode of
operation is that in which the network services are based on a central server, whereas a multi
host operation mode is that in which each compute node has a duplicate copy of the network
running on it and the nodes act like Internet gateways that are running on individual nodes.
In addition to this, in a multi host operation mode, the compute nodes also individually host
floating IPs and security groups. On the other hand, in a single host mode of operation, floating
IPs and security groups are hosted on the cloud controller to enable communication.
Both single host and multi host modes of operation are widely used and have their own
sets of advantages and limitations. The single host mode of operation has a major limitation: if
the cloud controller goes down, the entire system fails because instances
stop communicating. This is overcome by the multi host mode, in which a copy of the
network is provisioned to every node. However, the multi host
mode requires a unique public IP address for each compute node to enable
communication; if public IP addresses are not available, using the multi host mode is not
possible.
PART-A
7. Define GAE.
Google App Engine (GAE) is a platform-as-a-service cloud computing model that
supports many programming languages. GAE is a scalable runtime environment mostly
devoted to execute Web applications. In fact, it allows developers to integrate third-party
frameworks and libraries with the infrastructure still being managed by Google. It allows
developers to use readymade platform to develop and deploy web applications using
development tools, runtime engine, databases and middleware solutions. It supports
languages like Java, Python, .NET, PHP, Ruby, Node.js and Go in which developers can write
their code and deploy it on available google infrastructure with the help of Software
Development Kit (SDK). In GAE, SDKs are required to set up your computer for developing,
deploying, and managing your apps in App Engine.
OpenStack is an open source highly scalable cloud computing platform that provides
tools for developing private, public or hybrid clouds, along with a web interface for
users to access resources and admins to manage those resources.
The different components of Openstack architecture are :
a. Nova (Compute)
b. Swift (Object storage)
c. Cinder (Block level storage)
d. Neutron (Networking)
e. Glance (Image Management)
f. Keystone (Identity management)
g. Horizon (Dashboard)
h. Ceilometer (Metering)
i. Heat (Orchestration)
• Versioning - With this feature, users don't need to worry about tracking the latest version or who made
any changes.
• Data Protection - By storing data on cloud storage services, data is well protected
against all kinds of disasters, such as floods, earthquakes and human error.
• Disaster Recovery - Data stored in the cloud is not only protected from disasters by
having the same copy at several locations, but can also favor disaster recovery in
order to ensure business continuity.
The descriptions of popular cloud storage providers are given as follows :
• Google Bigtable : Bigtable is built over a large number of commodity servers that can store
petabytes of data together. Bigtable has been designed with very high speed, versatility and
extremely high scalability in mind. The size of the Bigtable database can be petabytes, spanning
thousands of distributed servers. Bigtable is now open to developers as part of the Google App
Engine, their cloud computing platform.
• Microsoft Live Mesh : Windows Live Mesh was a free-to-use Internet-based file
synchronization application designed by Microsoft to enable files and directories
between two or more computers to be synchronized on Windows or Mac OS
platforms. It has support of mesh objects that consists of data feeds, which can be
represented in Atom, RSS, JSON, or XML. It uses Live Framework APIs to share any
data item between devices that recognize the data.
• Nirvanix : Nirvanix offers public, hybrid and private cloud storage services with
usage-based pricing. It supports Cloud-based Network Attached Storage
(CloudNAS) to store data in premises. Nirvanix CloudNAS is intended for
businesses that manage archival, backup, or unstructured archives that need long-
term, secure storage, or organizations that use automated processes to migrate files
to mapped drives. The CloudNAS has built-in disaster data recovery and automatic
data replication feature for up to three geographically distributed storage nodes.
17. What is meant by Amazon Elastic Block Store (EBS) and SimpleDB?
• The Elastic Block Store (EBS) provides the volume block interface for saving and
restoring the virtual images of EC2 instances.
• Traditional EC2 instances will be destroyed after use. The status of EC2 can now be saved
in the EBS system after the machine is shut down. Users can use EBS to save persistent
data and mount to the running instances of EC2.
18. What is Amazon SimpleDB Service?
• SimpleDB provides a simplified data model based on the relational database data model.
Structured data from users must be organized into domains. Each domain can be
considered a table. The items are the rows in the table.
• A cell in the table is recognized as the value for a specific attribute (column name) of the
corresponding row. This is similar to a table in a relational database. However, it is
possible to assign multiple values to a single cell in the table.
UNIT V
CLOUD SECURITY
Virtualization System-Specific Attacks: Guest hopping – VM migration attack – hyperjacking. Data
Security and Storage; Identity and Access Management (IAM) - IAM Challenges - IAM Architecture
and Practice.
GUEST HOPPING:
Guest hopping in the context of virtual machines and cloud computing typically refers to an attack scenario
where an unauthorized user gains access to multiple virtual machines within a cloud environment. This
type of attack can potentially compromise the security and integrity of the virtual machines and the data
they contain.
Here are some considerations related to guest hopping attacks on virtual machines in a cloud computing
environment:
Shared Infrastructure: Cloud computing often involves the sharing of physical resources among multiple
virtual machines. If an attacker successfully compromises one virtual machine, they may attempt to
leverage that access to gain unauthorized access to other virtual machines within the same infrastructure.
Hypervisor Vulnerabilities: The hypervisor is the software layer that manages and orchestrates virtual
machines in a cloud environment. Exploiting vulnerabilities in the hypervisor could allow an attacker to
break the isolation between virtual machines, enabling guest hopping attacks.
Misconfigurations: Misconfigurations in virtual machine settings or the cloud infrastructure can create
security weaknesses that attackers can exploit to move laterally between virtual machines. This includes
weak authentication mechanisms, insecure network configurations, or inadequate access controls.
Privilege Escalation: Once inside a virtual machine, an attacker may attempt to escalate their privileges to
gain administrative or root access. This can be achieved by exploiting vulnerabilities in the guest operating
system or misconfigurations within the virtual machine.
Data Exfiltration and Malware Propagation: Once an attacker gains access to multiple virtual machines,
they may exfiltrate sensitive data from those machines or propagate malware to further compromise the
cloud infrastructure or launch attacks on other targets.
To mitigate guest hopping attacks in a cloud computing environment, it is crucial to follow security best
practices:
Regularly patch and update the hypervisor, virtual machine software, and guest operating systems to
address known vulnerabilities.
Implement strong access controls, including robust authentication mechanisms, to prevent unauthorized
access to virtual machines. Employ network segmentation and isolation techniques to limit lateral
movement between virtual machines.
Use intrusion detection and prevention systems to monitor and detect suspicious activities within the cloud
environment.
Regularly audit and review system logs for any signs of unauthorized access or anomalous behavior.
Educate users and administrators about secure configuration practices and the importance of adhering to
security guidelines.
By implementing these measures, organizations can reduce the risk of guest hopping attacks and enhance
the security of their cloud computing environments.
Application-level security issues (or cloud service provider CSP level attacks) refer to intrusion from the
malicious attackers due to vulnerabilities of the shared nature of the cloud. Some companies host their
applications in shared environments used by multiple users, without considering the possibilities of
exposure to security breaches, such as:
1. SQL Injection
An unauthorized user gains access to the entire database of an application by inserting malicious code into
a standard SQL code. Often used to attack websites, SQL injection can be avoided by the usage of
dynamically generated SQL in the code. It is also necessary to remove all stored procedures that are rarely
used and assign the least possible privileges to users who have permission to access the database.
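The short Python sketch below contrasts string-built SQL with the parameterized-query defence just mentioned, using the built-in sqlite3 module; the table, column names and payload are made up for the example.

```python
# Demonstration: string concatenation vs parameterized queries (sqlite3).
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 's3cr3t')")

user_supplied = "alice' OR '1'='1"   # a classic injection payload

# Vulnerable: the payload is concatenated into the SQL string, so the
# WHERE clause becomes always-true and leaks every row.
vulnerable = conn.execute(
    "SELECT * FROM users WHERE name = '" + user_supplied + "'").fetchall()

# Safe: the driver binds the value as data, never as SQL syntax.
safe = conn.execute(
    "SELECT * FROM users WHERE name = ?", (user_supplied,)).fetchall()

print(vulnerable)  # leaks the stored row
print(safe)        # empty: the payload is treated as a literal name
```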
2. Guest-Hopping Attack
In guest-hopping attacks, due to the failure of separation between shared infrastructures, an attacker gets
access to a virtual machine by penetrating another virtual machine hosted on the same hardware. One
possible mitigation of guest-hopping attacks is to use forensics and VM debugging tools to observe any attempt
to compromise the virtual machine. Another solution is to use the High Assurance Platform (HAP), which
provides a high degree of isolation between virtual machines.
3. Side-Channel Attack
An attacker opens a side-channel attack by placing a malicious virtual machine on the same physical
machine as the victim machine. Through this, the attacker gains access to all confidential information on
the victim machine. The countermeasure to eliminate the risk of side-channel attacks in a virtualized cloud
environment is to ensure that a user's VMs do not reside on the same physical hardware as other users' VMs.
4. Malicious Insider
A malicious insider can be a current or former employee or business associate who maliciously and
intentionally abuses system privileges and credentials to access and steal sensitive customer information
within the network of an organization. Strict privilege planning and security auditing can minimize this
security risk that originates from within an organization.
5. Cookie Poisoning
Cookie poisoning means gaining unauthorized access to an application or a webpage by modifying the
contents of the cookie. In a SaaS model, cookies contain user identity credential information that allows the
applications to authenticate the user identity. Cookies can be forged to impersonate an authorized user. A
solution is to regularly clean up cookies and to encrypt or cryptographically sign the cookie data.
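A minimal sketch of the signing approach follows, using Python's standard hmac module so that any tampering with the cookie is detected server-side; the secret key and cookie contents are placeholders.

```python
# Sketch: detecting cookie poisoning by HMAC-signing the cookie value.
import hmac, hashlib

SECRET_KEY = b"server-side-secret"          # never shipped to the client

def make_cookie(value):
    sig = hmac.new(SECRET_KEY, value.encode(), hashlib.sha256).hexdigest()
    return f"{value}|{sig}"

def verify_cookie(cookie):
    value, _, sig = cookie.rpartition("|")
    expected = hmac.new(SECRET_KEY, value.encode(), hashlib.sha256).hexdigest()
    # compare_digest avoids timing side channels during comparison
    return value if hmac.compare_digest(sig, expected) else None

cookie = make_cookie("user=alice;role=member")
print(verify_cookie(cookie))                              # valid cookie -> value
print(verify_cookie(cookie.replace("member", "admin")))   # tampered -> None
```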
6. Backdoor and Debug Options
A backdoor is a hidden entrance to an application, created intentionally or unintentionally by
developers while coding. The debug option is a similar entry point, often used by developers to facilitate
troubleshooting in applications. The problem is that hackers can use these hidden doors to bypass
security policies, enter the website and access sensitive information. To prevent this kind of attack,
developers should disable the debugging option.
7. Browser Security
A web browser is a universal client application that uses the Transport Layer Security (TLS) protocol to
provide privacy and data security for Internet communications. TLS encrypts the connection between web
applications and servers, such as a web browser loading a website. However, web browsers rely only on TLS
encryption and TLS signatures, which are not sufficient by themselves to defend against malicious attacks.
One of the solutions is to use TLS together with XML-based cryptography in the browser core.
8. Cloud Malware Injection
A malicious virtual machine or service implementation module, such as a SaaS or IaaS module, is injected into
the cloud system, making it believe that the new instance is valid. If the attack succeeds, user requests are
redirected automatically to the new instance, where the malicious code is executed. The mitigation is to
perform an integrity check of the service instance before using it for incoming requests in the cloud system.
9. ARP Poisoning
Address Resolution Protocol (ARP) poisoning is when an attacker exploits some ARP protocol weakness to
map a network IP address to one malicious MAC and then update the ARP cache with this malicious MAC
address. It is better to use static ARP entries to minimize this attack. This tactic can work for small networks
such as personal clouds, but it is easier to use other strategies such as port security features on large-scale
clouds to lock a single port (or network device) to a particular IP address.
VM MIGRATION ATTACK:
VM migration is the process of moving a virtual machine from one physical host to another within a cloud
infrastructure. It allows for workload balancing, resource optimization, and maintenance activities in a
dynamic cloud environment. However, if the migration process is compromised, it can lead to security risks
and potential unauthorized access to VMs and their data.
1. Interception of Migration Traffic: Attackers may attempt to intercept the migration traffic between the
source and destination hosts. By positioning themselves as a "man-in-the-middle," they can eavesdrop on
sensitive information, such as the VM's contents, network communications, and credentials exchanged
during the migration process.
2. Unauthorized VM Migration: An attacker might try to initiate unauthorized VM migrations within the
cloud environment. This can involve moving VMs to their control or to compromised hosts under their
influence. Once the VM is under their control, the attacker can gain access to sensitive data, manipulate the
VM's behavior, or disrupt the cloud infrastructure.
3. Exploiting Migration Channel Vulnerabilities: The migration process relies on communication channels
and protocols between the source and destination hosts. Attackers may exploit vulnerabilities or
weaknesses in these channels to gain unauthorized access, inject malicious code into the VM, or manipulate
the migration process to their advantage.
4. Resource Exhaustion: Attackers can target the resources involved in the migration process, such as
network bandwidth or storage, to cause resource exhaustion. By overwhelming these resources, the
attacker can disrupt VM migrations, cause denial-of-service conditions, or impact the availability and
performance of other cloud services.
5. VM Rollback Attacks: During VM migration, checkpoints or snapshots are often created to ensure data
consistency. Attackers might attempt to tamper with these snapshots or manipulate the rollback process.
By doing so, they can compromise the integrity of the VM or introduce unauthorized changes to the VM's
state or data.
To mitigate VM migration attacks in a cloud computing environment, the following security measures are
recommended:
1. Secure Communication Channels: Encrypting the migration traffic using secure protocols, such as
SSL/TLS, ensures the confidentiality and integrity of data during transit. This protects against interception,
eavesdropping, and tampering (a minimal sketch of such a channel appears after this list).
2. Authentication and Authorization: Implement strong authentication mechanisms and access controls to
ensure that only authorized users can initiate or participate in VM migration processes. This prevents
unauthorized individuals from manipulating or intercepting the migration.
3. Intrusion Detection and Prevention: Employ intrusion detection and prevention systems (IDPS) to
monitor migration traffic and detect any anomalous or malicious activities. IDPS can identify suspicious
patterns, detect unauthorized access attempts, and raise alerts for further investigation.
4. Network Segmentation: Segmenting the network infrastructure separates migration traffic from other
data traffic. This reduces the attack surface and limits the impact of potential compromises. Additionally,
network segmentation helps in applying more granular security controls and monitoring specifically for
migration-related activities.
5. Resource Monitoring and Protection: Implement mechanisms to monitor resource utilization during VM
migrations. This helps in detecting and mitigating resource exhaustion attacks, ensuring that sufficient
resources are available to complete migrations successfully.
6. Regular Updates and Patching: Keep the migration infrastructure, including the hypervisor and
migration software, up to date with the latest security patches. Regular updates address known
vulnerabilities and minimize the risk of exploitation.
7. Auditing and Logging: Enable comprehensive logging of migration activities, including the source and
destination hosts, migration timestamps, and user information. Regularly review the logs to identify any
suspicious or unauthorized activities. Logging is crucial for forensic analysis and investigations in case of a
security incident.
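As referenced in measure 1 above, the following minimal sketch wraps a migration channel in TLS using Python's standard ssl module; the host name, port and certificate paths are placeholders, and a real hypervisor would run its own migration protocol on top of such a channel.

```python
# Sketch: protecting a migration channel with TLS (standard library only).
import socket, ssl

context = ssl.create_default_context(ssl.Purpose.SERVER_AUTH,
                                     cafile="migration-ca.pem")   # trust anchor
context.minimum_version = ssl.TLSVersion.TLSv1_2

with socket.create_connection(("destination-host.example", 49152)) as raw:
    with context.wrap_socket(raw, server_hostname="destination-host.example") as tls:
        # Everything written here is encrypted and integrity protected,
        # defeating man-in-the-middle interception of VM memory pages.
        tls.sendall(b"VM-MIGRATION-HANDSHAKE")
```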
By implementing these security measures, organizations can strengthen the security and integrity of VM
migration processes within their cloud computing environments.
1. Cold Migration :
A powered-down Virtual Machine is moved to a separate host or data store. The Virtual Machine's power
state is OFF, there is no need for common shared storage, no CPU compatibility check is performed, and the
outage time is long. Log files and configuration files are migrated from the source host to the destination
host. The first host's Virtual Machine is shut down and started again on the next host. Applications and the
OS are terminated on the Virtual Machine before it is moved. The user is given the choice of moving the
associated disks from one data store to another.
2. Hot Migration :
A powered on Virtual Machine is moved from one physical host to another physical host. A source host
state is cloned to destination host and then that source host state is discarded. Complete state is shifted to
the destination host. Network is moved to destination Virtual Machine.
A common shared storage is needed and CPU compatibility checks are put into use. The outage time is very
small. The OS and applications are shifted from the source to the destination without being stopped. The
physical server is freed up for maintenance purposes, and workloads are dynamically balanced among
physical servers so as to run at optimized levels. Downtime for clients is easily avoidable.
The first host's Virtual Machine is suspended, its CPU registers and RAM are cloned across to the second
host, and it is resumed a short time later on the second host. This migration runs while the source system is
operative.
Stage-0: The pre-migration stage, with a functional Virtual Machine on the primary host.
Stage-1: The reservation stage, which initializes a container on the destination host.
Stage-2: The iterative pre-copy stage, in which shadow paging is enabled and all dirty pages are copied in
successive rounds.
Stage-3: Stop and copy, in which the first host's Virtual Machine is suspended and all remaining Virtual
Machine state is synchronized on the second host.
Stage-4: Commitment, in which the Virtual Machine state held on the first host is released.
Stage-5: The activation stage, in which the second host's Virtual Machine starts, establishes connections to
all local resources and resumes normal activities.
Virtualization provides the technologies of virtual machine migration along with its execution procedures.
Basically, migration is the process of moving a virtual machine from one host to another. It also has the
capability to move the workload of multiple running virtual machines off a single physical machine. The
main difference between plain virtualization and virtual machine migration is that a migration module is
added to the hypervisor. The architecture of a virtualized platform supporting virtual machine migration is
shown in the figure.
Although process migration was first attempted in the 1980s, it was used only occasionally because of its
main limitation, namely how to handle the interaction between the various modules of the operating
system. This limitation is overcome in virtual machine migration because the whole operating system is
moved along with its running processes, which makes the process simpler and more efficient. It also takes
care of load balancing, energy consumption, workload consolidation, etc. Hence, it has become popular and
widely adopted in industry. The table below describes the types of VM migration.
Cold Migration : Before migration, the virtual machine must be powered off; after this is done, the old copy
is deleted from the source host. The virtual machine need not be on shared storage.
Warm Migration : The OS and applications are transferred without suspending the source host. It is in high
demand in public clouds.
Live Migration : The process of moving a running virtual machine from the source host to the destination
host without stopping the OS and other applications.
Techniques of VM Migration
This subsection describes the types of virtual machine migration techniques. There are basically two types :
• Pre-Copy Migration
• Post-Copy Migration
Pre-Copy Migration : In this technique, the hypervisor copies all memory pages from the source machine to
the destination machine while the virtual machine is still running. It has two phases: the warm-up phase and
the stop-and-copy phase.
Warm-Up Phase : While all memory pages are being copied from the source to the destination, some pages
change because the source machine's CPU is still active. The changed memory pages are known as dirty
pages. All these dirty pages must be recopied to the destination machine; this phase is called the warm-up
phase.
Stop-and-Copy Phase : The warm-up phase is repeated until the remaining dirty pages can be recopied to the
destination in one final pass. At that point the CPU of the source machine is deactivated until the remaining
memory pages have been transferred. During this period the VM is suspended on both the source and the
destination; this is known as the downtime. Minimizing this downtime is the main aspect to explore when
optimizing migration.
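The toy Python simulation below illustrates the pre-copy loop just described: rounds of copying dirty pages shrink the remaining set until a final, brief stop-and-copy transfers what is left (the downtime window). The page counts and dirtying rate are made-up illustrative numbers, not measurements.

```python
# Toy simulation of pre-copy live migration rounds.
import random

def pre_copy_migration(total_pages=1000, dirty_rate=0.10, max_rounds=30,
                       stop_threshold=10):
    dirty = set(range(total_pages))              # round 0: everything must be sent
    round_no = 0
    while len(dirty) > stop_threshold and round_no < max_rounds:
        sent = len(dirty)
        # While these pages were being copied, the still-running VM
        # dirtied roughly dirty_rate * sent pages again.
        dirty = set(random.sample(range(total_pages), int(sent * dirty_rate)))
        round_no += 1
        print(f"round {round_no}: copied {sent} pages, {len(dirty)} dirtied again")
    # Stop-and-copy: pause the VM and transfer whatever is still dirty.
    print(f"downtime: VM paused, copying final {len(dirty)} pages")

pre_copy_migration()
```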
Post-Copy Migration : In this technique, the VM at the source is suspended to start the post-copy migration.
While the VM is suspended, the minimal execution state of the VM (CPU state, registers, non-pageable
memory) is transferred to the target, where the VM is resumed. In parallel, the source actively sends the
remaining memory pages of the VM to the target; this process is known as pre-paging. At the target, if the
VM tries to access a page that has not been transferred yet, it generates a page fault, also known as a
network fault. These faults are redirected to the source, which responds with the faulted pages. Because of
this, application performance degrades as the number of network faults grows. To reduce the impact, the
pre-paging scheme pushes pages near the last fault first by dynamically adapting the page transmission
order. Figures 3 and 4 show the pre-copy and post-copy migration techniques, respectively.
Hyperjacking:
Hyperjacking refers to a type of cyber attack where an attacker gains unauthorized access and control over
a hypervisor in a virtualized environment. The hypervisor is a critical component that manages and
orchestrates multiple virtual machines (VMs) on a physical host. By compromising the hypervisor, the
attacker can potentially control and manipulate the entire virtualized infrastructure, including the VMs
running on it.
1. Hypervisor Compromise: The attacker first gains unauthorized access to the hypervisor, typically by
exploiting vulnerabilities in the virtualization software or its management interfaces.
2. Privilege Escalation: Once an attacker gains access to the hypervisor, they can attempt to escalate their
privileges to gain administrative control. This allows them to control the VMs, modify their configurations,
access sensitive data, or even launch further attacks within the virtualized environment.
3. Lateral Movement: Hyperjacking attacks may involve lateral movement within the virtualized
environment. Once the hypervisor is compromised, the attacker can attempt to move laterally across
different VMs, potentially compromising their security and integrity.
4. Data Exfiltration and Manipulation: With control over the hypervisor and potentially the VMs, an attacker
can exfiltrate sensitive data from the VMs or manipulate their contents. This can lead to unauthorized
access to critical information or tampering with the operations of virtualized systems.
To prevent hyperjacking attacks and enhance the security of virtualized environments, consider the
following measures:
1. Patching and Updates: Keep the hypervisor software and virtualization platform up to date with the latest
security patches and updates. Regularly applying patches helps mitigate known vulnerabilities that could
be exploited by attackers.
2. Secure Configuration: Follow best practices for securing the hypervisor and virtualized environment,
such as disabling unnecessary services, implementing strong authentication mechanisms, and applying
appropriate access controls.
3. Hypervisor Hardening: Employ hypervisor hardening techniques to reduce the attack surface and
strengthen the security of the hypervisor. This can include measures like disabling unused features,
configuring secure network settings, and employing intrusion detection systems for monitoring.
4. Network Segmentation: Implement network segmentation within the virtualized environment to isolate
different VMs and limit the lateral movement of an attacker who gains access to the hypervisor. This helps
contain the impact of a potential hyperjacking attack.
5. Hypervisor Security Monitoring: Implement robust monitoring and logging mechanisms to detect
suspicious activities and potential signs of hyperjacking. Monitor hypervisor logs, network traffic, and
other relevant indicators to identify any unauthorized access or abnormal behavior.
6. Access Control and Authentication: Implement strong access control measures for the hypervisor,
including multi-factor authentication, role-based access control, and regular review of access privileges.
This helps minimize the risk of unauthorized access to the hypervisor.
By implementing these security measures, organizations can reduce the risk of hyperjacking attacks and
strengthen the overall security posture of their virtualized environments. Regular security assessments
and audits can also help identify and address potential vulnerabilities before they are exploited by
attackers.
In addition to the security of your own customer data, customers should also be concerned about what data
the provider collects and how the CSP protects that data. Specifically with regard to your customer data,
what metadata does the provider have about your data?
Storage
For data stored in the cloud (i.e., storage-as-a-service), we are referring to IaaS and not data
associated with an application running in the cloud on PaaS or SaaS. The same three information security
concerns are associated with this data stored in the cloud (e.g., Amazon’s S3) as with data stored elsewhere:
confidentiality, integrity, and availability.
Confidentiality
For large organizations, this coarse authorization presents significant security concerns unto itself. Often,
the only authorization levels cloud vendors provide are administrator authorization and user authorization
with no levels in between. Again, these access control issues are not unique to CSPs
Symmetric Encryption Diagram
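In place of the diagram, the short Python sketch below illustrates symmetric encryption for data
confidentiality: the same key is used to encrypt and decrypt. It assumes the third-party cryptography
package is installed, and the key handling is deliberately simplified for illustration; in practice the key
would be kept in a key management service.

from cryptography.fernet import Fernet   # pip install cryptography

# Generate a symmetric key; the same key must be used for encryption and decryption.
key = Fernet.generate_key()
cipher = Fernet(key)

ciphertext = cipher.encrypt(b"customer record: account 4711, balance 1200")
plaintext = cipher.decrypt(ciphertext)

print(ciphertext[:32], b"...")   # unreadable without the key
print(plaintext)                 # original data recovered with the same key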
IAM
5. Draw the architecture of IAM and explain in detail.
• Identity Governance: The ability to ensure that the right people are granted the right access rights, that
the wrong ones are not, and to manage the identity lifecycle through organizational structure, processes and
enabling technology.
• Directory Services: The ability to control the storage of identity information about users and their
access rights.
• Access Management: The ability to enforce access rights, within specified policy, when users attempt to
access a desired application, system or platform.
Why IAM?
• Properly architected IAM technology and processes can improve efficiency by automating user on-boarding
and other repetitive tasks.
• IAM also supports regulatory compliance requirements – HIPAA, SOX.
The core IAM functions are:
• User management
• Authentication management
• Authorization management
• Access management
Enterprise IAM functional architecture
• Federation or SSO
• Authorization management
• Compliance management
Before extending IAM to a SaaS service, two questions should be answered:
– Is the organization ready to provision and manage the user life cycle by extending its established IAM
practice to the SaaS service?
– Are the SaaS provider's capabilities sufficient to automate user provisioning and life cycle management
without implementing a custom solution for the SaaS service?
Customer responsibilities
• User provisioning
• Profile management
• Investigation support
• Compliance management
PaaS
• Customer Responsibility
• CCID (this database is generally community-supported, and may not reflect all CSPs and all incidents
that have occurred)
• CSP customer mailing list that notifies customers of occurring and recently occurred outages
• RSS feed for RSS readers with availability and outage information
• Availability of your virtual servers and the attached storage for compute services.
• Availability of virtual storage that your users and virtual server depend on for storage service.
• Availability of your network connectivity to the Internet or virtual network connectivity to IaaS
services.
• Availability of network services, including a DNS, routing services, and authentication services required
to connect to the IaaS service.
• CCID (this database is generally community-supported, and may not reflect all CSPs and all incidents
that have occurred).
• CSP customer mailing list that notifies customers of occurring and recently occurred outages.
• Internal or third-party-based service monitoring tools (e.g., Nagios) that periodically check the health
of your IaaS virtual server.
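The internal or third-party monitoring mentioned in the last item can be approximated with a very small
health-check script. The sketch below is an illustrative Python example that probes a hypothetical IaaS
virtual server URL and reports Nagios-style exit codes (0 = OK, 2 = CRITICAL); the URL and timeout are
assumed values.

import sys
import urllib.error
import urllib.request

HEALTH_URL = "http://203.0.113.10/health"   # hypothetical IaaS virtual server endpoint
TIMEOUT_SECONDS = 5

def check_health(url):
    """Return a Nagios-style exit code: 0 = OK, 2 = CRITICAL."""
    try:
        with urllib.request.urlopen(url, timeout=TIMEOUT_SECONDS) as response:
            if response.status == 200:
                print("OK - %s responded with HTTP 200" % url)
                return 0
            print("CRITICAL - %s responded with HTTP %s" % (url, response.status))
            return 2
    except (urllib.error.URLError, OSError) as exc:
        print("CRITICAL - %s unreachable: %s" % (url, exc))
        return 2

if __name__ == "__main__":
    sys.exit(check_health(HEALTH_URL))

A scheduler (or a monitoring tool such as Nagios) would run a check like this periodically against each
virtual server, storage endpoint and network service the deployment depends on.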
• Typical issues arising from dependence on the Cloud Computing provider are:
– If the Cloud Computing provider were to go bankrupt and stop providing services, the customer could
experience problems in accessing data and therefore potentially in business continuity.
– Some widely used Cloud Computing services (e.g., Google Docs) do not include any contract between the
customer and the Cloud Computing provider.
The IAM architecture is made up of several processes and activities (see Fig. 4.9.2). The processes
supported by IAM are given as follows.
a) User management - It provides processes for managing the identity of different entities.
b) Authentication management - It provides activities for management of the process for
determining that an entity is who or what it claims to be.
c) Access management - It provides policies for access control in response to a request for a resource by
an entity.
d) Data management - It provides activities for the propagation of identity data needed for authorization to
resources, using automated processes.
e) Authorization management - It provides activities for determining the rights associated with entities and
for deciding what resources an entity is permitted to access in accordance with the organization's policies.
f) Monitoring and auditing - Based on the defined policies, it provides monitoring, auditing, and reporting
on users' compliance regarding access to resources.
The activities supported by IAM are given as follows.
a) Provisioning - Provisioning comprises the essential processes that give users the necessary access to
data and resources. It supports management of all user account operations, such as adding, modifying,
suspending, and deleting users, along with password management. Through provisioning, users are given access
to data, systems, applications, and databases based on a unique user identity. Deprovisioning does the
reverse: it deactivates or deletes the user's identity together with its privileges.
b) Credential and attribute management - Credential and attribute management prevents identity impersonation
and inappropriate account use. It deals with the management of credentials and user attributes, such as
creating, issuing, managing and revoking them, in order to minimize the associated business risk. An
individual's credentials are verified during the authentication process. Credential and attribute management
processes include provisioning static or dynamic attributes that comply with a password standard, encryption
management of credentials, and handling access policies for user attributes.
c) Compliance management - Compliance management is the process of monitoring and tracking access rights and
privileges to ensure the security of an enterprise's resources. It also helps auditors verify compliance with
various access control policies and standards. It includes practices such as access monitoring, periodic
auditing, and reporting.
d) Identity federation management - Identity federation management is the process of managing trust
relationships beyond network boundaries, where organizations come together to exchange information about
their users and entities.
e) Entitlement management - In IAM, entitlements are essentially authorization policies. Entitlement
management provides processes for provisioning and deprovisioning the privileges needed for users to access
resources, including systems, applications, and databases.
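Entitlement (authorization) policies of this kind are often expressed as role-based access control. The
sketch below is a minimal illustrative Python model of provisioning, deprovisioning and evaluating an access
request; the roles, resources and class name are hypothetical and do not correspond to any particular IAM
product.

from dataclasses import dataclass, field

@dataclass
class EntitlementStore:
    """Minimal role-based entitlement model: role -> set of permitted resources."""
    role_entitlements: dict = field(default_factory=dict)
    user_roles: dict = field(default_factory=dict)

    def provision(self, user, role):
        self.user_roles.setdefault(user, set()).add(role)

    def deprovision(self, user):
        self.user_roles.pop(user, None)   # removes the identity together with its privileges

    def is_permitted(self, user, resource):
        roles = self.user_roles.get(user, set())
        return any(resource in self.role_entitlements.get(role, set()) for role in roles)

store = EntitlementStore(role_entitlements={
    "hr-analyst": {"payroll-db", "hr-reports"},
    "developer": {"source-repo", "ci-server"},
})
store.provision("alice", "hr-analyst")
print(store.is_permitted("alice", "payroll-db"))    # True
print(store.is_permitted("alice", "source-repo"))   # False
store.deprovision("alice")
print(store.is_permitted("alice", "payroll-db"))    # False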
Security Standards
Security standards are needed to define the processes, measures and practices required to implement a
security program in a web or network environment. These standards also apply to cloud-related IT activities
and include specific actions to ensure that a secure environment is provided for cloud services, along with
privacy for confidential information. Security standards are based on a set of key principles designed to
protect this kind of trusted environment. The following sections explain the different security standards
used to protect a cloud environment.
Security Assertion Markup Language (SAML) is a security standard developed by OASIS Security Services
Technical Committee that enables Single Sign-On technology (SSO) by offering a way of authenticating a
user once and then communicating authentication to multiple applications. It is an open standard for
exchanging authentication and authorization data between parties, in particular, between an identity
provider and a service provider.
It enables Identity Providers (IdPs) to pass permissions and authorization credentials to Service
Providers (SP). A range of existing standards, including SOAP, HTTP, and XML, are incorporated into
SAML. SAML transactions use Extensible Markup Language (XML) for standardized communication between the
identity provider and service providers. SAML is the link between the authentication of a user's identity
and the authorization to use a service. The majority of SAML transactions are in a standardized XML form;
the XML schema is mainly used to specify SAML assertions and protocols. For authentication and message
integrity, both SAML 1.1 and SAML 2.0 use digital signatures based on the XML Signature standard. XML
encryption is supported in SAML 2.0 but not in the earlier SAML 1.x versions, which lack encryption
capabilities. SAML defines assertions, protocols, bindings and profiles based on XML.
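To make the XML structure concrete, the short Python sketch below builds a skeletal, unsigned SAML-style
assertion using only the standard library. The issuer, subject and attribute values are placeholders; a real
deployment would add XML digital signatures and normally use a dedicated SAML library rather than hand-built
XML.

import xml.etree.ElementTree as ET

SAML_NS = "urn:oasis:names:tc:SAML:2.0:assertion"
ET.register_namespace("saml", SAML_NS)

def build_assertion(issuer, subject, attributes):
    """Build a skeletal, unsigned SAML 2.0-style assertion (illustrative only)."""
    assertion = ET.Element("{%s}Assertion" % SAML_NS, {"Version": "2.0", "ID": "_example-1"})
    ET.SubElement(assertion, "{%s}Issuer" % SAML_NS).text = issuer
    subj = ET.SubElement(assertion, "{%s}Subject" % SAML_NS)
    ET.SubElement(subj, "{%s}NameID" % SAML_NS).text = subject
    stmt = ET.SubElement(assertion, "{%s}AttributeStatement" % SAML_NS)
    for name, value in attributes.items():
        attr = ET.SubElement(stmt, "{%s}Attribute" % SAML_NS, {"Name": name})
        ET.SubElement(attr, "{%s}AttributeValue" % SAML_NS).text = value
    return ET.tostring(assertion, encoding="unicode")

print(build_assertion("https://idp.example.edu", "alice@example.edu", {"role": "student"}))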
OAuth is a standard protocol that allows secure API authorization for various types of web applications in a
simple, standard method. OAuth is an open standard for delegated access: it is used as a way of allowing
internet users to grant websites and applications access to their data without sharing their passwords. It
enables secure authorization from web, mobile or desktop applications in a simple and standard way, and it is
a method for publishing and interacting with protected data, giving applications access to user data while
protecting the credentials of user accounts. OAuth enables users to share their information between service
providers and consumers without revealing their full identities. Companies such as Amazon, Google, Facebook,
Microsoft and Twitter use this mechanism to permit users to share information about their accounts with
third-party applications or websites. OAuth specifies a process by which resource owners can authorize
third-party access to their server resources without sharing their credentials. Over secure Hypertext
Transfer Protocol (HTTPS), OAuth essentially allows access tokens to be issued to third-party clients by an
authorization server, with the approval of the resource owner. The third party then uses the access token to
access the protected resources hosted by the resource server.
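The final steps of that flow, exchanging an authorization grant for an access token and then calling a
protected API with it, can be sketched in a few lines of Python. The endpoints, client credentials and
redirect URI below are hypothetical placeholders, and the example assumes the third-party requests library;
each real provider publishes its own endpoints and registration process.

import requests   # pip install requests

# Hypothetical values obtained when registering the client application with the provider.
TOKEN_URL = "https://auth.example.com/oauth2/token"
API_URL = "https://api.example.com/v1/profile"
CLIENT_ID = "demo-client-id"
CLIENT_SECRET = "demo-client-secret"

def exchange_code_for_token(authorization_code, redirect_uri):
    """OAuth 2.0 authorization code grant: swap the code for an access token."""
    response = requests.post(TOKEN_URL, data={
        "grant_type": "authorization_code",
        "code": authorization_code,
        "redirect_uri": redirect_uri,
        "client_id": CLIENT_ID,
        "client_secret": CLIENT_SECRET,
    }, timeout=10)
    response.raise_for_status()
    return response.json()["access_token"]

def call_protected_api(access_token):
    """Use the bearer token to access the resource server on the user's behalf."""
    response = requests.get(API_URL,
                            headers={"Authorization": "Bearer " + access_token},
                            timeout=10)
    response.raise_for_status()
    return response.json()

if __name__ == "__main__":
    token = exchange_code_for_token("code-received-from-redirect",
                                    "https://client.example.com/callback")
    print(call_protected_api(token))

Note that at no point does the client application see the resource owner's password; it only ever handles the
short-lived authorization code and the access token.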
Secure Sockets Layer and Transport Layer Security
Secure Sockets Layer (SSL) and Transport Layer Security (TLS) are cryptographically secure protocols that
provide security and data integrity for TCP/IP-based communications. TLS and SSL encrypt the segments of
network connections at the transport layer. Many implementations of these protocols are widely used in web
browsers, e-mail, instant messaging and Voice over IP. TLS is the IETF standard protocol; TLS 1.2 is
specified in RFC 5246. The TLS protocol allows client/server applications to communicate across a network in
a way that prevents eavesdropping, tampering and message forgery. TLS uses cryptography to ensure endpoint
authentication and data confidentiality.
TLS also supports a more secure bilateral connection mode, which ensures that both ends of the connection are
communicating with the party they believe they are connected to. This is called mutual authentication, and it
requires the TLS client side to also hold a certificate. A TLS connection involves three basic phases:
1. Cipher suite negotiation - the client and the server negotiate the cipher suites to determine the ciphers
to be used.
2. Authentication and key exchange - decisions are made on the authentication and key exchange algorithms to
be used; the key exchange and authentication algorithms are public key algorithms.
3. Message authentication with symmetric cipher encryption - the message authentication codes are determined;
cryptographic hash functions are used for the message authentication codes.
Once these decisions are made, the transfer of data can begin.
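A minimal illustration of the client side of such a handshake, using only the Python standard library, is
sketched below. The host name is an example placeholder; create_default_context() enables certificate
verification and hostname checking, so the client authenticates the server as part of the handshake.

import socket
import ssl

HOSTNAME = "www.example.com"   # illustrative endpoint
PORT = 443

context = ssl.create_default_context()   # verifies the server certificate and host name

with socket.create_connection((HOSTNAME, PORT), timeout=10) as raw_sock:
    with context.wrap_socket(raw_sock, server_hostname=HOSTNAME) as tls_sock:
        print("Negotiated protocol:", tls_sock.version())     # e.g. TLSv1.3
        print("Negotiated cipher suite:", tls_sock.cipher())  # (name, protocol, secret bits)
        print("Server certificate subject:", tls_sock.getpeercert().get("subject"))

For mutual authentication, the client would additionally load its own certificate and private key into the
context (for example with context.load_cert_chain) so that the server can verify the client in turn.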
PART-A
Firewalls, intrusion detection and prevention, integrity monitoring, and log inspection can all be deployed
as software on virtual machines to increase protection and maintain compliance integrity of servers and
applications as virtual resources move from on-premises to public cloud environments.
Integrity monitoring and log inspection software must be applied at the virtual machine level.
2. Define IAM.
Identity and access management is a critical function for every organization, and a fundamental expectation
of SaaS customers is that access to their data is granted according to the principle of least privilege.
3. What are security standards?
Security standards define the processes, procedures, and practices necessary for implementing a security
program.
These standards also apply to cloud related IT activities and include specific steps that should be taken to
ensure a secure environment is maintained that provides privacy and security of confidential information
in a cloud environment.
4. What is SAML?
It allows businesses to securely send assertions between partner organizations regarding the identity
and entitlements of a principal.
• SAML assertions can carry three kinds of statements:
• Authentication statements
• Attribute statements
• Authorization decision statements
• A SAML protocol describes how certain SAML elements (including assertions) are packaged within SAML request
and response elements; the SAML protocol is a simple request-response protocol.
• The most important type of SAML protocol request is a query, of which there are three kinds:
• Authentication query
• Attribute query
• Authorization decision query
8. What is OAuth?
• OAuth (Open Authentication) is an open protocol, initiated by Blaine Cook and Chris Messina, to allow
secure API authorization in a simple, standardized method for various types of web applications.
• OAuth is a method for publishing and interacting with protected data.
• OAuth allows users to grant access to their information, which is shared by the service provider and
consumers, without sharing all of their identity.
• OpenID is an open, decentralized standard for user authentication and access control that allows
users to log onto many services using the same digital identity.
• It is a single-sign-on (SSO) method of access control.
• An OpenID is in the form of a unique URL and is authenticated by the entity hosting the OpenID
URL.
• SSL/TLS: Transport Layer Security (TLS) and its predecessor, Secure Sockets Layer (SSL), are
cryptographically secure protocols designed to provide security and data integrity for communications over
TCP/IP.
• TLS and SSL encrypt the segments of network connections at the transport layer.
• TLS also supports a more secure bilateral connection mode whereby both ends of the connection
can be assured that they are communicating with whom they believe they are connected. This is
known as mutual authentication.
• Mutual authentication requires the TLS client side to also maintain a certificate