Introduction to Cloud Computing
🔹 What is Cloud Computing?
Cloud computing refers to the delivery of computing services—like servers, storage, databases,
networking, software, and more—over the Internet (“the cloud”) to offer faster innovation, flexible
resources, and economies of scale.
It turns traditional computing services into a self-service utility, just like electricity or water.
Users can access technology on demand, without needing to understand or manage the
underlying infrastructure.
🌥️ Two Core Concepts of Cloud Computing:
1. Abstraction
○ Hides the complexity of system implementation from users and developers.
○ Users don’t know where apps run or where data is stored.
○ System administration is outsourced, and user access is universal.
2. Virtualization
○ Combines resources into a single system that can be shared.
○ Enables dynamic provisioning of systems and storage.
○ Pay-per-use model, scalable, supports multiple users (multi-tenancy).
🔹 Evolution of Cloud Computing
● Not just the Internet renamed: Although network diagrams often draw the Internet and intranets as a cloud, cloud computing represents a genuinely new model.
● Utility Computing Dream: The idea of computing as a utility has been around for
decades and is now being realized thanks to enabling technologies.
🔹 Key Real-World Examples
● Google: Offers Software as a Service (SaaS) through free apps backed by their global
infrastructure.
● Microsoft Azure: A Platform as a Service (PaaS) for .NET developers to run
applications online.
● Amazon Web Services (AWS): An Infrastructure as a Service (IaaS) offering virtual
machines and storage on demand.
🌐 Cloud Types
To better understand cloud computing, we divide it into two main categories:
1. Deployment Models
Define where the cloud infrastructure is located and who manages it.
● Public Cloud: Open to the public or a large group. Owned by service providers (e.g.,
AWS, Azure).
● Private Cloud: Exclusively used by a single organization. Managed internally or by a
third party.
● Hybrid Cloud: A combination of two or more clouds (public, private, community) that
remain separate but are linked.
● Community Cloud: Shared infrastructure for a specific group or organization with
shared concerns (e.g., government agencies).
🔍 Example: The U.S. Government’s Apps.gov is a community cloud serving federal agencies.
2. Service Models
Define what type of service is being offered on the cloud.
● Infrastructure as a Service (IaaS)
🧱 Provides virtual machines, storage, and infrastructure.
📌 You manage the OS and applications.
Examples: Amazon EC2, Linode, Rackspace.
● Platform as a Service (PaaS)
🛠️ Provides OS, runtime, and development tools.
📌 You deploy your apps; the provider manages the platform.
Examples: Google App Engine, Microsoft Azure, Force.com.
● Software as a Service (SaaS)
📋 Delivers ready-to-use apps via a browser.
📌 You just use the software; everything else is handled.
Examples: Google Workspace, Salesforce.com, QuickBooks Online.
This layered model is also called the SPI Model (Software, Platform, Infrastructure).
🏛️ The NIST Model
The National Institute of Standards and Technology (NIST) provides a widely accepted
framework for understanding cloud computing. It separates cloud into:
● Service Models: IaaS, PaaS, SaaS (explained above)
● Deployment Models: Public, Private, Hybrid, Community
✨ NIST’s Key Characteristics of Cloud Computing:
1. On-demand self-service
2. Broad network access
3. Resource pooling
4. Rapid elasticity
5. Measured service
Initially, the NIST model didn't require virtualization or multi-tenancy, but newer versions
include both. It also doesn't fully cover service brokers, provisioning, or integration services,
which are becoming more important in modern cloud computing.
📦 XaaS – Everything as a Service
Beyond IaaS, PaaS, and SaaS, many new service models are emerging:
● StaaS – Storage as a Service
● IdaaS – Identity as a Service
● CmaaS – Compliance as a Service
But most can be grouped under the core SPI model.
📄 1. Characteristics of Cloud Computing
🌐 Key Characteristics
1. On-demand Self-service
○ Users can automatically access computing resources like storage and processing
power without human intervention from the provider.
2. Broad Network Access
○ Services are accessible over the network via standard platforms (e.g., phones,
tablets, laptops, etc.).
3. Resource Pooling
○ Cloud resources are pooled to serve multiple users using a multi-tenant model.
Resources are dynamically assigned and reassigned based on demand.
4. Rapid Elasticity
○ Resources can be scaled up or down quickly. To users, resources appear to be
unlimited and can be purchased in any quantity at any time.
5. Measured Service
○ Cloud systems automatically control and optimize resource use through metering
(e.g., bandwidth, storage, processing). Users are billed based on usage.
🛠️ Additional Features
● Lower Costs: Efficient operations lead to reduced costs for users.
● Ease of Use: Services are typically plug-and-play.
● Quality of Service (QoS): Guaranteed performance levels.
● Reliability: Redundancy and failover systems ensure high availability.
● Outsourced IT: Management and maintenance are handled by the provider.
● Simplified Maintenance: Centralized software updates and patching.
● Low Barrier to Entry: Minimal upfront investment needed.
📄 2. Benefits of Cloud Computing
✅ Major Advantages
1. Cost Efficiency
○ Reduced capital expenses. Pay only for what you use.
2. Scalability and Flexibility
○ Easily adjust computing resources as business needs grow or shrink.
3. Accessibility
○ Access applications and data from anywhere with an internet connection.
4. Disaster Recovery
○ Cloud-based backup solutions simplify recovery in case of system failure.
5. Automatic Updates
○ Providers handle software updates and security patches.
6. Collaboration Efficiency
○ Teams can access, edit, and share documents in real time, from anywhere.
7. Environmentally Friendly
○ Shared resources lead to less energy consumption and carbon output.
8. Faster Deployment
○ Services and applications can be deployed quickly.
9. High Availability
○ Most providers offer 99.9% uptime and robust disaster recovery options.
📄 3. Disadvantages of Cloud Computing
⚠️ Key Concerns
1. Limited Control
○ Users may have less control over infrastructure and services compared to
on-premise systems.
2. Security and Privacy Risks
○ Data stored offsite is vulnerable to breaches, government surveillance, and
mismanagement.
3. Internet Dependency
○ Cloud access requires a stable and fast internet connection.
4. Latency Issues
○ WAN-based services may experience delays in high-speed, data-heavy
operations.
5. Compliance and Legal Risks
○ Regulations like GDPR, HIPAA, and SOX may be difficult to comply with due to
data crossing borders.
6. Downtime
○ Even top providers can experience outages, affecting availability.
7. Vendor Lock-In
○ Migrating from one provider to another can be complex and expensive.
8. Customization Limitations
○ SaaS applications may lack the flexibility of custom-built on-premise software.
9. Performance Variability
○ Shared infrastructure can lead to performance fluctuations, especially during
peak usage.
📘 Understanding Abstraction and Virtualization
🧩 1. Using Virtualization Technologies
✅ Definition
● Virtualization is the process of abstracting physical resources (like CPU, memory,
storage, and network) into logical, manageable units.
● It enables resource pooling and efficient resource management in cloud computing.
🧠 Key Concept
● Virtualization allows multiple virtual systems to run on a single physical system.
● Users access cloud services through virtualized interfaces, not the actual physical
machines.
💡 How Virtualization Works
● Logical Naming: Physical resources are given logical names and accessed through pointers.
● Dynamic Mapping: The link between virtual and physical resources is flexible and responds to load changes.
● Facile Changes: The mapping can be updated instantly, without service interruption.
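A minimal Python sketch of this indirection, with invented resource names: callers hold only the logical name, and the logical-to-physical mapping can be changed on the fly without the caller noticing.

```python
# Minimal sketch of logical naming with dynamic mapping.
# All resource names here are invented for illustration.

class ResourceMapper:
    def __init__(self):
        self._table = {}  # logical name -> physical resource

    def bind(self, logical_name, physical_id):
        # (Re)map a logical name; callers are unaffected.
        self._table[logical_name] = physical_id

    def resolve(self, logical_name):
        # Callers always go through the logical name ("pointer").
        return self._table[logical_name]

mapper = ResourceMapper()
mapper.bind("app-data-volume", "san-array-03/lun-17")
print(mapper.resolve("app-data-volume"))  # san-array-03/lun-17

# "Facile changes": remap in response to load, with no interruption.
mapper.bind("app-data-volume", "san-array-07/lun-02")
print(mapper.resolve("app-data-volume"))  # san-array-07/lun-02
```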
📚 Types of Virtualization in Cloud Computing
● Access: Users can access cloud services from anywhere via virtual interfaces.
● Application: Multiple instances of an application run in the cloud, and requests are routed based on load.
● CPU: Physical CPUs are divided into virtual machines, or workloads are distributed using load balancing.
● Storage: Data is distributed across multiple storage devices and replicated for availability.
🔄 Mobility Patterns in Virtualization
These patterns define how workloads move between environments:
● P2V: Physical to Virtual
● V2V: Virtual to Virtual
● V2P: Virtual to Physical
● P2P: Physical to Physical
● D2C: Datacenter to Cloud
● C2C: Cloud to Cloud
● C2D: Cloud to Datacenter
● D2D: Datacenter to Datacenter
🧱 Gartner’s Five Cloud Attributes Enabled by Virtualization
1. Service-Based – Abstracted through interfaces.
2. Scalable & Elastic – Adjusts based on demand.
3. Shared Services – Resource pooling.
4. Metered Usage – Pay-as-you-use model.
5. Internet Delivery – Access through internet protocols.
🌐 2. Load Balancing and Virtualization
🚦 What is Load Balancing?
● Distributes workloads across multiple resources (servers, networks, apps).
● Ensures high availability, fault tolerance, and efficient performance.
🛠️ Load Balancing Techniques
● Hardware-Based: Dedicated devices such as F5 BIG-IP and Cisco application delivery controllers.
● Software-Based: Tools such as Apache mod_proxy_balancer, Pound, and Squid.
🎯 What Can Be Load Balanced?
● Network Services: DNS, FTP, HTTP
● Connections: Using intelligent switches
● Processing: By server allocation
● Storage: Across devices
● Application Access: Routes user sessions
⚙️ Load Balancing Algorithms
● Round Robin: Cycles through resources in order, giving each an equal share.
● Weighted Round Robin: Like round robin, but accounts for each resource's capacity.
● Least Connections: Chooses the server with the fewest active connections.
● Fastest Response Time: Chooses the server with the lowest latency.
● Custom: Based on workload, health, priority, etc.
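As a rough illustration, here are toy Python versions of the first three algorithms; the server names, weights, and connection counts are invented:

```python
# Illustrative sketches of three common balancing algorithms.
from itertools import cycle

servers = ["web-1", "web-2", "web-3"]

# Round robin: cycle through servers in order.
rr = cycle(servers)
print([next(rr) for _ in range(5)])  # web-1, web-2, web-3, web-1, web-2

# Weighted round robin: repeat each server according to its capacity.
weights = {"web-1": 3, "web-2": 1, "web-3": 1}
wrr = cycle([s for s, w in weights.items() for _ in range(w)])
print([next(wrr) for _ in range(5)])  # web-1 three times, then web-2, web-3

# Least connections: pick the server with the fewest active connections.
active = {"web-1": 12, "web-2": 4, "web-3": 9}
print(min(active, key=active.get))   # web-2
```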
🔁 Session Persistence
Maintains user sessions across load-balanced systems using:
● Session Cookies (client-side)
● Server-Side DB Replication
● URL Rewrite Engines
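A toy sketch of the first approach (client-side session cookies), assuming the balancer can read a session ID from each request; all names here are illustrative:

```python
# Minimal sketch of sticky sessions: the balancer remembers which
# backend served a session and routes later requests the same way.
import hashlib

servers = ["app-1", "app-2", "app-3"]
sticky = {}  # session id -> server (the persistence record)

def route(session_id):
    if session_id not in sticky:
        # First request: pick a server by hashing the session id so
        # the choice is deterministic and roughly uniform.
        idx = int(hashlib.sha256(session_id.encode()).hexdigest(), 16) % len(servers)
        sticky[session_id] = servers[idx]
    return sticky[session_id]

print(route("sess-abc"))  # always the same backend for this session
print(route("sess-abc"))
```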
💎 Advanced Load Balancers / Application Delivery Controllers (ADCs)
● ADC = Load Balancer + Application Layer Control
● Functions:
○ Health checks
○ Traffic shaping & filtering
○ Data compression
○ TCP offload
○ Authentication
○ SSL termination
🏢 Examples of ADC Vendors:
● F5 Networks, Cisco, Citrix, Akamai, Juniper, Barracuda, A10 Networks
☁️ 3. Case Study: Google Cloud Infrastructure
🌍 Why Google is a Benchmark
● Most visited site
● Runs 1M+ servers
● Processes 1B+ search requests/day
● Generates 20 petabytes of data daily
🏗️ Google Data Center Strategy
● Cheap/Renewable Energy: High priority
● Low-Latency Site Connections: High priority
● Peering with Internet Hubs: High priority
● Cooling Availability: Medium priority
● Large Land Purchases: Medium priority
● Tax Concessions: Medium priority
🔄 How Google Uses Load Balancing
1. DNS Load Balancing (IP Virtualization)
○ Requests resolved to nearest datacenter.
○ Uses round robin DNS.
2. Cluster-Level Load Balancing
○ Incoming traffic distributed across server racks.
3. Proxy Cache Layer (Squid Server)
○ Cached queries answered instantly.
4. Application Server Load Balancing
○ Real-time server utilization measured.
🧠 Google's "Secret Sauce"
● Inverted Index: Maps keywords to document IDs
● PageRank: Determines the importance of pages
● Data Compression: Efficient storage of “shards”
● Fault Tolerance: Automatically reassigns failed tasks
🧰 Other Google Services in Action
● Specialized Servers for calculations, reverse lookups
● AdSense / AdWords for monetization
● Spelling Servers for intelligent suggestions
🔹 Understanding Hypervisors in Cloud Computing
A hypervisor, also known as a Virtual Machine Monitor (VMM), is a low-level program that
enables the creation and management of virtual machines (VMs). It abstracts and isolates the
underlying physical hardware from the operating systems, allowing multiple VMs to run on a
single physical system.
📌 Purpose of Hypervisors
Hypervisors play a central role in virtualization, which is a foundational technology in cloud
computing. They allow cloud providers to:
● Run multiple operating systems on a single physical server.
● Dynamically allocate and manage resources.
● Improve server utilization and reduce costs.
● Enable workload isolation and mobility.
🔹 Types of Hypervisors
Hypervisors are primarily categorized into two types based on how they interact with hardware
and host operating systems.
✅ Type 1 Hypervisor (Bare-Metal)
● Installed directly on physical hardware.
● Does not require a host operating system.
● Provides better performance and efficiency.
● Commonly used in enterprise environments and cloud data centers.
✅ Type 2 Hypervisor (Hosted)
● Installed on top of an existing operating system (host OS).
● Suitable for desktop and development environments.
● Easier to install and use but with more overhead and lower performance.
🔽 Comparison of Type 1 and Type 2 Hypervisors
● Installation: Type 1 installs directly on hardware (bare-metal); Type 2 installs on top of a host OS.
● Performance: Type 1 is high (near-native); Type 2 is moderate to low.
● Overhead: Type 1 has minimal overhead; Type 2 has more due to the host OS.
● Use Case: Type 1 suits data centers, servers, and cloud infrastructure; Type 2 suits development, testing, and personal use.
● Examples: Type 1 includes VMware ESXi, Microsoft Hyper-V (bare-metal), Xen, and Oracle VM; Type 2 includes VMware Workstation, VirtualBox, Parallels, KVM, and Hyper-V (hosted).
● Resource Allocation: Type 1 controls hardware directly; Type 2 allocates resources via the host OS.
🔹 Types of Virtual Machines
Hypervisors support two major types of VMs:
● System Virtual Machine: Emulates an entire hardware system with its own OS and applications.
● Process Virtual Machine: Designed to run a single process or application (e.g., the JVM or .NET CLR).
🔹 Virtualization Techniques
Hypervisors implement different virtualization methods to manage guest operating systems:
✅ Full Virtualization
● Emulates the complete hardware environment.
● Guest OS runs without modification.
● Allows running multiple OS types on the same hardware.
● Common in Type 1 hypervisors.
✅ Paravirtualization
● Guest OS is modified to interact with the hypervisor via an API (para-API).
● Requires support from both the host and guest OS.
● Offers better performance than full virtualization.
✅ Emulation
● Software completely simulates hardware.
● Guest OS does not need to match host hardware.
● Useful for cross-platform compatibility.
● Typically slower due to overhead.
● Full Virtualization: No guest OS modification; moderate to high performance; general-purpose virtualization.
● Paravirtualization: Guest OS modification required; high performance; cloud systems needing optimized I/O.
● Emulation: No guest OS modification; low performance; legacy system support and testing.
🔹 Hypervisor in Cloud Computing
In cloud platforms, hypervisors enable:
● Resource isolation and multi-tenancy.
● Dynamic provisioning and cloning of VMs.
● Support for failover, load balancing, and replication.
● Efficient management through virtual infrastructure tools.
For example, Amazon Web Services (AWS) has used the Xen and KVM hypervisors to run EC2 instances (launched from Amazon Machine Images, or AMIs), while Microsoft Azure uses Hyper-V.
🔹 Operating System Virtualization
Apart from hardware-level virtualization, some OSes support OS-level virtualization, also
known as container-based virtualization.
● Creates virtual environments (VEs) or virtual private servers (VPS).
● All VEs share the same kernel.
● Lightweight and allows higher density of instances.
● Examples: Solaris Zones, IBM AIX Workload Partitions (WPARs), Docker (Linux
containers).
● Kernel Sharing: shared in OS-level virtualization; separate per VM with a hypervisor.
● Overhead: low for OS-level; higher for hypervisor-based.
● Isolation: moderate for OS-level; strong for hypervisor-based.
● Performance: high for OS-level; moderate to high for hypervisor-based.
● Use Case: OS-level suits microservices and containers; hypervisor-based suits full VMs and legacy OS support.
🔹 VMware vSphere:
🔸 What is VMware vSphere?
VMware vSphere is a cloud computing virtualization platform developed by VMware. It
serves as the foundation for building and managing virtualized data centers. In essence,
vSphere abstracts and pools hardware resources—compute, storage, and networking—and
provides tools to manage these resources effectively in a cloud environment.
vSphere is the successor to VMware Infrastructure and includes both infrastructure services
(like ESXi hypervisor and vCenter Server) and application services (like High Availability,
DRS, etc.).
🔸 Core Components of VMware vSphere
1. VMware ESXi:
○ A Type 1 hypervisor that installs directly on physical hardware (bare metal).
○ Boots directly into its own vmkernel, which handles all virtualization tasks (there is no general-purpose host OS underneath).
○ Allows multiple virtual machines (VMs) to run on a single physical machine.
2. vCenter Server:
○ A centralized management console used to provision, manage, and monitor
vSphere environments.
○ Enables cluster management, performance tuning, automation, and alerting.
3. VMFS (Virtual Machine File System):
○ A clustered file system optimized for storing virtual machine disk images.
○ Supports concurrent access by multiple ESXi hosts.
4. VMotion:
○ Enables live migration of VMs from one physical server to another with zero
downtime.
○ Maintains VM state and memory contents during transfer.
5. Storage VMotion:
○ Moves a VM’s virtual disks from one datastore to another while the VM
remains active.
6. vNetwork Distributed Switch (DVS):
○ Creates and manages virtual network configurations across multiple hosts.
○ Supports advanced features like firewall, load balancing, and integration with
third-party switches like Cisco Nexus 1000V.
7. DRS (Distributed Resource Scheduler):
○ Automatically balances workloads by moving VMs between hosts based on CPU
and memory usage.
○ Can include Distributed Power Management (DPM) to reduce power usage
during low loads.
8. Virtual SMP (Symmetric Multi-Processing):
○ Allows a VM to utilize multiple physical CPUs, improving performance for
compute-intensive workloads.
9. vCompute, vStorage, vNetwork Services:
○ Abstract physical resources into pools:
■ vCompute: CPU and RAM
■ vStorage: Disk and file systems
■ vNetwork: Virtual switches, VLANs, and NICs
🔸 vSphere Architecture (Conceptual Overview)
A typical vSphere environment includes:
● Multiple physical hosts running ESXi
● A shared storage system (SAN, NAS, iSCSI, etc.)
● A management server (vCenter)
● Virtual Machines deployed on hosts, managed in resource pools
● Datastores that act as shared storage for VM files
These VMs can be dynamically moved and scaled according to business needs without being
tied to a specific piece of hardware.
🔸 Storage and Network Virtualization
Storage Virtualization:
● Involves creating logical representations of physical storage devices.
● ESXi presents storage as logical units (LUNs) and translates virtual disk block addresses (LBAs) to physical locations, effectively abstracting storage.
● Enables features like Storage VMotion and thin provisioning.
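A simplified Python sketch of this kind of address translation, with an invented extent layout: the guest's logical block addresses are translated to (LUN, physical block) pairs, so the physical layout can change without the guest knowing.

```python
# Toy block-address translation for a virtual disk; the extent map
# and sizes are invented for illustration.
EXTENT_SIZE = 1024  # blocks per extent (illustrative)

# extent index -> (physical LUN, starting block on that LUN)
extent_map = {
    0: ("lun-A", 0),
    1: ("lun-B", 4096),   # extents may live on different devices
    2: ("lun-A", 2048),
}

def translate(lba):
    extent, offset = divmod(lba, EXTENT_SIZE)
    lun, base = extent_map[extent]
    return lun, base + offset

print(translate(1500))  # ('lun-B', 4572): block 1500 lands in extent 1
```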
Network Virtualization:
● Uses virtual NICs (vNICs) and virtual switches to mimic physical network interfaces.
● Allows network policies (like security, QoS) to be enforced virtually.
● External virtualization can include VLANs and network hardware abstraction using
software-defined networking (SDN) principles.
🔸 Key Advantage: Flexibility and Speed
● Rapid Deployment: New VMs can be spun up in seconds using pre-defined templates.
● Scalability: Easily scale up by adding hosts or VMs.
● Resiliency: HA and DRS provide failover and automatic load balancing.
🔹 Understanding Machine Imaging in Cloud Computing
🔸 What is Machine Imaging?
Machine Imaging is the process of creating a snapshot or clone of a virtual machine (VM),
including its operating system, applications, configurations, and data. The image serves as
a template for rapidly deploying multiple instances of identical environments.
In cloud computing, this is often referred to as a server image, machine image, or VM image.
🔸 Why Machine Imaging is Important
✅ Rapid Deployment: Deploy new VMs instantly using pre-configured images.
✅ Consistency: Ensure all instances have identical environments, eliminating configuration drift.
✅ Scalability: Easily scale up services by launching more instances from the same image.
✅ Disaster Recovery: Recover systems quickly using stored machine images.
✅ Automation: Integral part of DevOps and Infrastructure-as-Code (IaC).
🔸 Key Terms
● Image: A read-only template of a system's disk.
● Snapshot: A point-in-time copy of a VM's state, including memory and disk.
● Golden Image: A fully configured, secured, and tested image used as a base template.
● AMI (Amazon Machine Image): AWS-specific image format used to launch EC2 instances.
● Custom Image: A user-created image tailored to a specific use case or app.
🔸 Components of a Machine Image
A complete machine image includes:
● 📦 Operating System (e.g., Linux, Windows)
● ⚙️ System Configurations (e.g., registry settings, network configs)
● 🧩 Installed Software and Services
● 🔐 Security Settings (firewall rules, user permissions)
● 🧾 Startup Scripts or Metadata (for initialization tasks)
🔸 How Machine Imaging Works (General Steps)
1. Configure a VM: Install OS, configure system, deploy applications.
2. Stop the VM (optional): Ensures consistency during image creation.
3. Create Image:
○ On AWS: Use Create Image to make an AMI.
○ On VMware: Use Clone to Template or Export OVF.
4. Store Image: Stored in object storage or image registries (like AWS S3, Azure Blob, or
Docker registry).
5. Launch Instances: Use the image to spin up as many identical VMs as needed.
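For example, steps 3 and 5 might look like this on AWS with boto3; this is a sketch rather than production code, the instance ID and names are placeholders, and configured credentials are assumed (step 4, storage, is handled automatically via EBS snapshots):

```python
# Hedged sketch: create an AMI from a configured instance, then
# launch identical instances from it.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Step 3: create an image from a configured, running instance.
image = ec2.create_image(
    InstanceId="i-0123456789abcdef0",   # placeholder instance ID
    Name="webapp-golden-2024-01",
    Description="Ubuntu + Apache + MySQL + PHP golden image",
    NoReboot=False,                     # reboot for a consistent image
)
ami_id = image["ImageId"]

# Step 5: spin up identical VMs from the image.
ec2.run_instances(
    ImageId=ami_id,
    InstanceType="t3.micro",
    MinCount=10,
    MaxCount=10,
)
```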
🔸 Types of Machine Images
● Base Image: A clean OS install with minimal configuration; used to start fresh with a custom setup.
● Custom Image: Includes specific software and settings; used to deploy app-ready environments.
● Golden Image: A secured, patched, and tested image; used for enterprise deployments at scale.
● Vendor Image: Provided by cloud providers or third parties; used for standardized environments (e.g., a LAMP stack).
🔸 Machine Imaging in Different Platforms
✅ AWS
● Uses Amazon Machine Images (AMIs)
● Each AMI includes:
○ One or more EBS snapshots (for volumes)
○ Launch permissions
○ Block device mapping
✅ Azure
● Uses Managed Images and Shared Image Gallery
● Support for image versioning, regions, and replication
✅ Google Cloud
● Uses Custom Images
● Can be stored and used in multiple regions
✅ VMware
● Create VM templates or OVF (Open Virtualization Format) exports
● Used in vCenter to deploy cloned VMs or deploy via automation
🔸 Best Practices
✅ Use golden images for production environments.
✅ Automate image creation with scripts (e.g., Packer).
✅ Keep images updated with security patches.
✅ Avoid hardcoding sensitive information in images.
✅ Use version control for managing image changes.
🔸 Real-World Example
Suppose you’re deploying a web app that runs on Ubuntu with Apache, MySQL, and PHP.
Rather than configuring each server manually:
1. You set up the full stack once on a VM.
2. Create a golden image.
3. Launch 10 more VMs using that image—each one is production-ready in minutes.
4. Update the image when new security patches or app versions are released.
🔹 Capacity Planning in Cloud Environments
🔸 What is Capacity Planning?
Capacity Planning is the process of predicting and managing the computing resources
(like CPU, memory, storage, and network bandwidth) needed by applications or systems to
handle current and future workloads efficiently and cost-effectively.
In cloud computing, capacity planning ensures that your resources are:
● 🔄 Scalable on demand
● 💰 Cost-optimized
● ⚙️ Aligned with performance and availability requirements
🔸 Why is Capacity Planning Important in the Cloud?
● ⚡ Scalability: Ensures resources can handle peak loads without overprovisioning.
● 💵 Cost Efficiency: Avoids paying for unused resources.
● 📈 Performance: Maintains application performance during traffic spikes.
● 🔒 Reliability: Prevents downtime or crashes due to resource shortages.
● 📊 Forecasting: Helps plan for future growth and expansion.
🔸 Key Concepts in Cloud Capacity Planning
● Provisioning: Allocating resources based on current or expected need.
● Over-Provisioning: Allocating more resources than required (wasteful).
● Under-Provisioning: Allocating fewer resources than needed (leads to performance issues).
● Elasticity: The ability to scale resources up or down automatically.
● Auto Scaling: A cloud feature that automatically adjusts resource capacity.
● Utilization Metrics: CPU, memory, disk, and network usage statistics used for decision-making.
🔸 Capacity Planning Process (Step-by-Step)
1. Understand Application Requirements
○ Identify workload patterns (e.g., constant, bursty, seasonal)
○ Know your app’s CPU, memory, storage, and network needs
2. Collect Historical Data
○ Analyze resource usage over time (via monitoring tools)
○ Track trends in user growth, transaction volume, etc.
3. Forecast Future Demands
○ Use predictive analytics or linear projections
○ Consider upcoming features or events that may cause traffic spikes
4. Define SLAs & Performance Targets
○ E.g., 99.99% uptime, response time < 2 seconds
5. Select Right Instance Types & Services
○ Choose appropriate compute instances (e.g., EC2, Azure VMs)
○ Consider managed services (e.g., RDS, Lambda)
6. Implement Auto-Scaling Policies
○ Set rules to add/remove instances based on metrics (e.g., CPU > 70%); see the sketch after this list.
7. Continuously Monitor and Adjust
○ Use tools like AWS CloudWatch, Azure Monitor, Google Cloud Operations Suite
○ Adapt based on real-time and predictive metrics
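A minimal sketch of the threshold logic behind step 6, with invented thresholds and limits; real platforms implement this through their auto scaling services rather than hand-rolled code:

```python
# Illustrative scaling decision: thresholds, limits, and counts are
# assumptions for this example.
import math

def desired_instances(current, avg_cpu, scale_out_at=70, scale_in_at=30,
                      min_instances=2, max_instances=20):
    """Return a new instance count for the observed average CPU (%)."""
    if avg_cpu > scale_out_at:
        # Scale out proportionally to how far we are over the target.
        current = math.ceil(current * avg_cpu / scale_out_at)
    elif avg_cpu < scale_in_at:
        # Scale in one instance at a time to avoid thrashing.
        current -= 1
    return max(min_instances, min(max_instances, current))

print(desired_instances(4, avg_cpu=90))  # 6: scale out under load
print(desired_instances(4, avg_cpu=20))  # 3: scale in when idle
```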
🔸 Tools for Capacity Planning
● AWS: CloudWatch, Trusted Advisor, Cost Explorer, Compute Optimizer
● Azure: Azure Monitor, Advisor, Cost Management + Billing
● GCP: Operations Suite (Stackdriver), Recommender API
● VMware: vRealize Operations Manager
🔸 Common Challenges
● Inaccurate Forecasting: Leads to over- or under-provisioning.
● Ignoring Performance Trends: Causes bottlenecks or slowdowns.
● Static Resource Allocation: Doesn't adapt to changing demand.
● Lack of Monitoring: Misses real-time issues.
🔸 Example Scenario
Let’s say you're running an e-commerce website. During regular days, 4 VMs are enough. But
during a festival sale:
● Traffic spikes by 4×
● You need to scale up to 16 VMs
● After the sale, scale back to 4 VMs
With proper capacity planning:
● You forecast this pattern based on previous years
● Use Auto Scaling to automatically handle the load
● Monitor metrics in real time to adjust thresholds
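Worked through, the scenario's raw need is 4 VMs × 4 = 16 VMs; a quick sketch that also applies the 10–20% buffer recommended in the best practices below (15% assumed here):

```python
# Capacity arithmetic for the scenario above; the 15% buffer is an
# assumed value inside the 10-20% best-practice range.
import math

def required_vms(baseline_vms, traffic_multiplier, buffer=0.15):
    return math.ceil(baseline_vms * traffic_multiplier * (1 + buffer))

print(required_vms(4, 1))  # 5: regular days, with headroom
print(required_vms(4, 4))  # 19: festival sale at 4x traffic
```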
🔸 Best Practices
✅ Use Auto Scaling and Elastic Load Balancing.
✅ Perform load testing before major events.
✅ Maintain a buffer margin (usually 10–20%).
✅ Use cost calculators to estimate spend.
✅ Periodically review and adjust capacity plans.
🔹 Defining Baselines and Metrics in Cloud Monitoring
Monitoring in cloud computing ensures that cloud resources are functioning efficiently, securely,
and within expected performance ranges. Two core elements in monitoring are metrics and
baselines. Metrics provide raw data points, and baselines define what values are considered
“normal.” Together, they enable effective monitoring, troubleshooting, and optimization.
🔸 What Are Metrics?
Metrics are numerical measurements collected from cloud resources. They indicate system
health, performance, and usage patterns.
📌 Common Types of Metrics
● System Metrics: Track hardware and infrastructure performance (e.g., CPU usage, memory usage, disk I/O).
● Application Metrics: Monitor software/application performance (e.g., request latency, error rate, API throughput).
● Business Metrics: Reflect operational or business indicators (e.g., transactions per second, active users).
● Custom Metrics: User-defined metrics for specific needs (e.g., queue depth, job processing time).
Cloud platforms like AWS CloudWatch, Azure Monitor, and Google Cloud Monitoring collect
both default and custom metrics for analysis and visualization.
🔸 What Is a Baseline?
A baseline is a reference pattern or average measurement that reflects “normal” system
behavior over time. It acts as a benchmark for comparing real-time data to detect anomalies or
abnormal performance.
📌 Key Characteristics of a Baseline
● Normal Range: The acceptable range of a metric under typical conditions.
● Time-Dependent: Baselines vary by time of day, week, or season (e.g., higher usage on Mondays).
● Dynamic or Static: Baselines can be fixed (static) or adapt over time using machine learning.
● Data-Driven: Built from historical data and analysis of trends.
For instance, if average CPU usage during peak hours is consistently 60–70%, that range
becomes the CPU baseline for those hours.
🔸 Why Baselines and Metrics Matter
Defining accurate baselines and tracking relevant metrics enables proactive monitoring and
efficient incident response. Without them, teams may either overlook genuine issues or react
to normal variations unnecessarily.
📌 Benefits of Using Baselines and Metrics
● Anomaly Detection: Identify unusual spikes or drops in performance metrics.
● Performance Optimization: Spot bottlenecks and tune systems for better efficiency.
● Capacity Planning: Use trends to forecast resource needs and scale appropriately.
● SLA Monitoring: Ensure services meet the agreed Service Level Agreements.
● Alert Configuration: Set up alerts when thresholds are breached relative to baselines.
🔸 Example Scenario
Suppose a cloud-based e-commerce site sees the following typical CPU usage patterns:
● Weekdays (9 AM – 6 PM): CPU usage is around 40%–60%
● Weekends: CPU usage drops to 20%–30%
These observed ranges become baselines. If on a Tuesday afternoon the CPU spikes to 95%,
an alert is triggered — indicating a possible system overload or abnormal traffic.
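As a closing illustration, a minimal Python sketch of that comparison, using invented historical samples: the baseline and a simple static threshold are derived from past data, and the 95% spike trips the alert.

```python
# Baseline-based anomaly check; the samples are made up for the example.
from statistics import mean, stdev

weekday_cpu = [42, 55, 48, 60, 51, 44, 58, 47]  # historical samples (%)

baseline = mean(weekday_cpu)
upper = baseline + 3 * stdev(weekday_cpu)  # simple static threshold

def check(sample):
    if sample > upper:
        return f"ALERT: CPU {sample}% exceeds baseline {baseline:.0f}% + 3 sigma ({upper:.0f}%)"
    return f"OK: CPU {sample}% within normal range"

print(check(52))  # a typical Tuesday afternoon reading
print(check(95))  # the spike from the example triggers an alert
```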