Amazon EC2 Availability Tactics Explained

Software architecture aims to maximize system availability through various tactics: 1) Fault detection methods like pinging, heartbeats, and data checks monitor system health. 2) Recovery is enabled by redundancy, rollbacks, retries, and state resynchronization when failures occur. 3) Fault prevention focuses on removing potential causes of failure through software upgrades, monitoring, and exception handling.

Uploaded by

abhay


Software Architecture

Availability
BITS Pilani

1) Call out
2) Give Response

SS ZG653
• The SLA that Amazon provides for its EC2 cloud service is:
  "AWS will use commercially reasonable efforts to make Amazon EC2 available with an Annual Uptime Percentage [defined elsewhere] of at least 99.95% during the Service Year. In the event Amazon EC2 does not meet the Annual Uptime Percentage commitment, you will be eligible to receive a Service Credit as described below."
System Availability Requirements

Amazon EC2's SLA commits to an Annual Uptime Percentage of at least 99.95%

• Availability is the ability of a system to minimize system outages

• Availability is generally measured as % of up-time

• Ex. If a system is down for an average of 1 day in 100 days, its availability is 99%
Calculating availability

               Uptime                    Downtime
Measure        Time between failures     Time to repair
Abbreviation   TBF                       TTR
Example        99 days                   1 day

Availability = Mean TBF / (Mean TBF + Mean TTR)
             = MTBF / (MTBF + MTTR)
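The formula can be checked with a few lines of Python. This is a minimal sketch: the function names are illustrative, and the downtime budget assumes a plain 365-day year rather than the SLA's own definition of a Service Year.

```python
def availability(mtbf: float, mttr: float) -> float:
    """Return availability as a fraction: MTBF / (MTBF + MTTR)."""
    if mtbf < 0 or mttr < 0 or mtbf + mttr == 0:
        raise ValueError("MTBF and MTTR must be non-negative and not both zero")
    return mtbf / (mtbf + mttr)


def allowed_downtime_hours(target: float, period_hours: float = 365 * 24) -> float:
    """Return the downtime budget, in hours, for an availability target."""
    return (1.0 - target) * period_hours


# 99 days of uptime and 1 day of repair, as in the table above.
slide_example = availability(mtbf=99, mttr=1)

# Downtime that a 99.95% target leaves over an assumed 365-day year.
ec2_budget = allowed_downtime_hours(0.9995)

print(f"{slide_example:.2%}")
print(f"{ec2_budget:.2f} hours per year")
```

With these inputs, 99 days up and 1 day down works out to 99% availability, and a 99.95% target leaves a little over four hours of downtime per year.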
How to recover from node failure in a network?
• Detect node failure by 'pinging' nodes

• If a node does not 'Echo' back, change the routing of packets via alternate nodes
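A rough sketch of the ping/echo idea follows. It simulates the network in memory: the node names, the `node_is_up` table and the `simulated_ping` function are all hypothetical stand-ins for a real ping over the network.

```python
from typing import Callable, Dict, List, Optional

Route = List[str]


def failed_nodes(nodes: List[str], ping: Callable[[str], bool]) -> List[str]:
    """Ping every node and return those that did not echo back."""
    return [node for node in nodes if not ping(node)]


def pick_route(routes: List[Route], ping: Callable[[str], bool]) -> Optional[Route]:
    """Return the first route whose nodes all echo, or None if none is usable."""
    for route in routes:
        if all(ping(node) for node in route):
            return route
    return None


# Simulated network state: True means the node answers the ping.
node_is_up: Dict[str, bool] = {"A": True, "B": False, "C": True, "D": True}


def simulated_ping(node: str) -> bool:
    """Stand-in for a real ping; unknown nodes count as failed."""
    return node_is_up.get(node, False)


routes = [["A", "B", "D"], ["A", "C", "D"]]  # preferred route listed first
down = failed_nodes(list(node_is_up), simulated_ping)
chosen = pick_route(routes, simulated_ping)
print(down, chosen)
```

Here node B does not echo, so the packets are routed via C instead.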
Types of faults

• Omission: Failure to respond
• Crash: Repeated failure to respond
• Timing issue: Responding early or late (ex. late arrival of a message)
• Incorrect response: Responding with an incorrect value
• Replication
• Functional redundancy (same input, but diversely designed components)
• Analytic redundancy (ex. determining altitude both from barometric pressure and from geometry)
• Redundancy:
  • Active (hot spare): All components process in parallel. Ex. disk mirroring, sending data via 2 routes
  • Passive (warm spare): Periodic state updates. Ex. server farms in eCommerce
    • The application must store as much of its state on non-volatile shared storage as possible. Equally important is the ability to restart on another node at the last state before failure, using the saved state from the shared storage.
  • Spare (cold spare): The spare stays out of service until needed, then goes through a power-on procedure
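The warm-spare idea of saving state to shared storage and restarting from it on another node can be sketched as below. Everything here is an assumption made for illustration: `OrderCounter` is a toy application, a temporary directory stands in for shared storage, and the state is saved after every order rather than periodically.

```python
import json
import os
import tempfile


class OrderCounter:
    """Toy application whose only state is a running order count."""

    def __init__(self, checkpoint_path: str) -> None:
        self.checkpoint_path = checkpoint_path
        self.count = 0

    def handle_order(self) -> None:
        self.count += 1
        self._checkpoint()

    def _checkpoint(self) -> None:
        # Write to a temporary file and then rename it, so a reader sees
        # either the previous checkpoint or the new one.
        tmp_path = self.checkpoint_path + ".tmp"
        with open(tmp_path, "w", encoding="utf-8") as f:
            json.dump({"count": self.count}, f)
        os.replace(tmp_path, self.checkpoint_path)

    @classmethod
    def restart_from(cls, checkpoint_path: str) -> "OrderCounter":
        """Start a replacement instance from the last saved state."""
        node = cls(checkpoint_path)
        if os.path.exists(checkpoint_path):
            with open(checkpoint_path, encoding="utf-8") as f:
                node.count = json.load(f)["count"]
        return node


shared_dir = tempfile.mkdtemp()  # stands in for shared storage
path = os.path.join(shared_dir, "state.json")

primary = OrderCounter(path)
for _ in range(3):
    primary.handle_order()
del primary  # simulate the primary node failing

standby = OrderCounter.restart_from(path)
print(standby.count)
```

The standby resumes with the count the primary had reached. With truly periodic checkpoints, any work done since the last checkpoint would be lost on failover.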
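How to recover from failed online transactions?
Roll back incomplete transactions, using a log

Before the transaction: A=10,000, B=10,000

Time
 |   Begin Transaction
 |   Read A
 |   A = A - 1000
 |   Write A
 |   <-- Crash
 |   Read B
 |   B = B + 1000
 |   Write B
 v   End Transaction

After a complete transaction: A=9,000, B=11,000
If the crash occurs after Write A, A has been debited but B has not been credited, so the write to A must be undone from the log.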
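The rollback can be illustrated with an in-memory undo log. This is a sketch under simplifying assumptions: the crash is simulated with an exception, and the log lives in memory, whereas a real database keeps its log on durable storage and replays it during recovery after a restart. The names `accounts`, `undo_log` and `transfer` are illustrative.

```python
from typing import Dict, List, Tuple

accounts: Dict[str, int] = {"A": 10_000, "B": 10_000}
undo_log: List[Tuple[str, int]] = []  # (account, value before the write)


def write(account: str, new_value: int) -> None:
    """Log the old value first, then apply the write."""
    undo_log.append((account, accounts[account]))
    accounts[account] = new_value


def rollback() -> None:
    """Undo logged writes in reverse order, restoring the old values."""
    while undo_log:
        account, old_value = undo_log.pop()
        accounts[account] = old_value


def transfer(amount: int, crash_after_debit: bool = False) -> None:
    undo_log.clear()  # begin transaction
    try:
        write("A", accounts["A"] - amount)
        if crash_after_debit:
            raise RuntimeError("simulated crash between Write A and Read B")
        write("B", accounts["B"] + amount)
        undo_log.clear()  # end transaction: nothing left to undo
    except RuntimeError:
        rollback()


transfer(1000, crash_after_debit=True)
after_crash = dict(accounts)   # both balances restored to 10,000
transfer(1000)
after_commit = dict(accounts)  # A debited, B credited
print(after_crash, after_commit)
```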
How to ensure integrity of data when 2 txns are trying to access the same data?
'Lock' concept in database

Before: A = 10,000

Time   User 1            User 2
 |     Read A
 |                       Read A
 |     A = A + 1000
 |                       A = A - 1000
 |     Write A
 v                       Write A

After: A = 9,000 (User 1's update is lost)

Interleaving of txns can lead to loss of data integrity
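The lost update and the effect of a lock can be sketched in Python. The first half replays the interleaving by hand so the result is deterministic. The second half uses `threading.Lock` as a stand-in for a database lock; a real database would use its own locking or isolation mechanism.

```python
import threading

# 1) Replay the interleaving by hand: both users read before either writes.
a = 10_000
user1_copy = a        # User 1: Read A
user2_copy = a        # User 2: Read A
user1_copy += 1000    # User 1: A = A + 1000
user2_copy -= 1000    # User 2: A = A - 1000
a = user1_copy        # User 1: Write A
a = user2_copy        # User 2: Write A (overwrites User 1's update)
lost_update_result = a

# 2) The same two updates, with each read-modify-write done under a lock.
account = {"A": 10_000}
account_lock = threading.Lock()


def update(delta: int) -> None:
    with account_lock:                   # one txn at a time in this section
        current = account["A"]           # Read A
        account["A"] = current + delta   # Write A


threads = [threading.Thread(target=update, args=(d,)) for d in (1000, -1000)]
for t in threads:
    t.start()
for t in threads:
    t.join()
locked_result = account["A"]

print(lost_update_result, locked_result)
```

Without the lock the final value is 9,000. With it, both updates are applied and the value returns to 10,000, whichever thread runs first.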

Availability Tactics
Source: 'Software Architecture in Practice' by Len Bass & others

Fault detection
• Ping/echo
• Heartbeat
• Timestamp
• Data sanity check
• Condition monitoring
• Voting
• Exception detection
• Self-test

Error masking
• Active redundancy (hot)
• Passive redundancy (warm)
• Spare (cold)
• Exception handling
• Graceful degradation
• Ignore faulty behavior

Recover from fault
• Rollback
• Retry
• Reconfiguration
• Shadow operation
• State resynchronization
• Escalating restart
• Nonstop forwarding

Fault prevention
• Removal of a component to prevent anticipated failure: auto/manual reboot
• Create transaction
• Software upgrade
• Predictive model
• Process monitor that can detect, remove and restart a faulty process
• Exception prevention

9/17/2023 SS ZG653
Detecting faults – other tactics

• Checksum: Used in networks and data storage
• Voting: Triple Modular Redundancy (TMR), used in satellite systems
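Both tactics can be sketched briefly. The example below uses CRC-32 from Python's `zlib` module as one possible checksum, and a simple majority vote over three replica outputs for TMR. The payloads and replica values are made up for illustration.

```python
import zlib
from collections import Counter
from typing import List, Optional, Tuple


def with_checksum(payload: bytes) -> Tuple[bytes, int]:
    """Pair a payload with its CRC-32 so the receiver can verify it."""
    return payload, zlib.crc32(payload)


def is_intact(payload: bytes, checksum: int) -> bool:
    """Recompute the CRC-32 and compare it with the one that was sent."""
    return zlib.crc32(payload) == checksum


def majority_vote(outputs: List[int]) -> Optional[int]:
    """Return the value a strict majority of replicas agree on, else None."""
    if not outputs:
        return None
    value, votes = Counter(outputs).most_common(1)[0]
    return value if votes > len(outputs) // 2 else None


payload, crc = with_checksum(b"altitude=10000")
received_ok = is_intact(payload, crc)
received_corrupted = is_intact(b"altitude=90000", crc)  # one byte changed

voted = majority_vote([42, 42, 17])    # one of three replicas is faulty
no_majority = majority_vote([1, 2, 3]) # all three disagree

print(received_ok, received_corrupted, voted, no_majority)
```

The vote masks a single faulty replica. If all three disagree, no value has a majority and the fault can only be reported, not masked.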
Disk mirroring (active redundancy)
To recover from disk failure

Once a new disk is installed, copy the database to it (resynchronization)
High availability in Tomcat

Use of a temporary server while the main server is being upgraded for a new release

• Live traffic requests are redirected to the temporary server until the upgrade of the main server is complete
Load balancing
• Distribute load to different servers in a server farm
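One simple way to distribute load is round-robin, sketched below. Round-robin is only one of several possible policies, and the `RoundRobinBalancer` class and server names are hypothetical.

```python
from itertools import cycle
from typing import Iterator, List


class RoundRobinBalancer:
    """Hand out servers in a fixed rotation, one per incoming request."""

    def __init__(self, servers: List[str]) -> None:
        if not servers:
            raise ValueError("need at least one server")
        self._rotation: Iterator[str] = cycle(servers)

    def next_server(self) -> str:
        """Return the server that should handle the next request."""
        return next(self._rotation)


balancer = RoundRobinBalancer(["server-1", "server-2", "server-3"])
assignments = [balancer.next_server() for _ in range(6)]
print(assignments)
```

Six requests are spread evenly, two per server. A production balancer would typically also skip servers that fail health checks.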
Types of faults that impact availability
Hardware faults
• Disk failure
• Power failure
• Network failure

Software faults
• Memory leak
• Divide by zero
• Incorrect parameter passing
Server farms

A server farm or server cluster is a collection of computer servers, usually maintained by an organization to supply server functionality far beyond the capability of a single machine.

Server farms often consist of thousands of computers which require a large amount of power to run and to keep cool. At the optimum performance level, a server farm has enormous costs (both financial and environmental) associated with it.

Server farms often have backup servers, which can take over the function of primary servers in the event of a primary-server failure.

Server farms are typically collocated with the network switches and/or routers which enable communication between the different parts of the cluster and the users of the cluster.
Amazon Server Farm

Refer: https://www.youtube.com/watch?v=q6WlzHLxNKI&t=342s
Recovering from faults – other tactics

• Exception handling: Ex. divide by zero, file I/O error
• Retry: Used in networks and server farms
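A short sketch of both tactics follows. The helper names are illustrative, the flaky request is simulated, and returning 0.0 on divide by zero is just one possible policy chosen for the example.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry(operation: Callable[[], T], attempts: int = 3, delay_s: float = 0.0) -> T:
    """Call operation until it succeeds or the attempts run out."""
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except ConnectionError:
            if attempt == attempts:
                raise  # give up and let the caller handle the failure
            time.sleep(delay_s * attempt)  # simple linear back-off
    raise ValueError("attempts must be at least 1")


def safe_ratio(numerator: float, denominator: float) -> float:
    """Exception handling: turn a divide by zero into a defined result."""
    try:
        return numerator / denominator
    except ZeroDivisionError:
        return 0.0


calls = {"count": 0}


def flaky_request() -> str:
    """Simulated request that fails twice and then succeeds."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("transient failure")
    return "ok"


result = retry(flaky_request)
ratio = safe_ratio(1, 0)
print(result, calls["count"], ratio)
```

Retry only makes sense for faults that are likely to be transient, which is why the sketch retries on `ConnectionError` and nothing else.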
Summary of availability tactics

• Detect faults
• Recover from faults (Automated or manual)
• Prevent faults
