Lecture 5 Distributed Storage Systems
Cloud Computing
Spring 2025
Introduction
• In cloud computing, storage is not confined to a single server or
location.
• Distributed storage systems enable reliable, scalable, and high-
performance data storage across a network of machines.
• These systems underpin many cloud services and are fundamental to
supporting modern applications that require access to large-scale,
highly available data.
• This chapter explores the various facets of distributed storage in the
cloud, from fundamental storage services to advanced architectures
for real-time processing and disaster recovery.
Cloud Storage Services
Cloud providers and open-source projects offer highly scalable and durable
storage solutions for unstructured data. Key systems include:
• Amazon S3
• Ceph
• Lustre
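To make the object-storage model concrete, the sketch below uses the boto3 SDK to write and read a single object in Amazon S3. It assumes AWS credentials are configured and that the bucket name and key, which are illustrative placeholders, already exist.

```python
# Minimal sketch of object storage access with boto3. The bucket name and
# object key are illustrative placeholders, not part of the lecture.
import boto3

s3 = boto3.client("s3")

# Upload an object (bucket + key -> bytes).
s3.put_object(Bucket="example-lecture-bucket",
              Key="notes/lecture5.txt",
              Body=b"Distributed storage systems lecture notes")

# Download it again.
response = s3.get_object(Bucket="example-lecture-bucket",
                         Key="notes/lecture5.txt")
print(response["Body"].read().decode())
```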
Hadoop Distributed File System (HDFS)
• HDFS is a scalable, fault-tolerant distributed file system designed to
run on commodity hardware.
• Designed for batch processing with MapReduce
• Replicates data across nodes for fault tolerance
• Optimized for large, sequential reads
• It divides large files into blocks and distributes them across nodes in a
cluster.
• Each block is replicated to ensure data durability and availability.
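The following is a simplified, illustrative Python sketch (not HDFS code) of the two ideas above: a large file is split into fixed-size blocks, and each block is assigned to several DataNodes. The 128 MB block size and replication factor of 3 are the common HDFS defaults; the node names and round-robin placement are teaching simplifications.

```python
# Illustrative sketch of HDFS-style block splitting and replica placement.
# This is a teaching model, not the actual HDFS implementation.
BLOCK_SIZE = 128 * 1024 * 1024   # 128 MB, the common HDFS default
REPLICATION = 3                  # default replication factor

datanodes = ["dn1", "dn2", "dn3", "dn4", "dn5"]

def split_into_blocks(file_size_bytes):
    """Return the number of blocks a file of the given size occupies."""
    return (file_size_bytes + BLOCK_SIZE - 1) // BLOCK_SIZE

def place_replicas(block_id):
    """Pick REPLICATION distinct DataNodes for one block
    (round-robin here; real HDFS placement is rack-aware)."""
    start = block_id % len(datanodes)
    return [datanodes[(start + i) % len(datanodes)] for i in range(REPLICATION)]

file_size = 500 * 1024 * 1024          # a 500 MB file -> 4 blocks
for block_id in range(split_into_blocks(file_size)):
    print(block_id, place_replicas(block_id))
```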
Ceph
• Ceph is a unified, distributed storage system designed for excellent
performance, reliability, and scalability.
• It provides object, block, and file system storage in a single platform.
• Ceph uses the CRUSH (Controlled Replication Under Scalable Hashing)
algorithm for data placement, eliminating the need for a central
metadata server.
• Highly scalable with self-healing capabilities
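The sketch below illustrates the key idea behind CRUSH: placement is computed by a deterministic hash function, so any client can locate data without consulting a central metadata server. It is a heavily simplified stand-in for illustration, not the real CRUSH algorithm, and the OSD names and placement-group count are assumed values.

```python
# Simplified illustration of hash-based data placement (the CRUSH idea):
# every client computes the same placement from the object name alone,
# so no central metadata server is needed. Not the real CRUSH algorithm.
import hashlib

osds = ["osd.0", "osd.1", "osd.2", "osd.3", "osd.4", "osd.5"]
NUM_PGS = 64          # number of placement groups (assumed for the example)
REPLICAS = 3

def object_to_pg(obj_name):
    """Hash the object name into a placement group."""
    h = int(hashlib.md5(obj_name.encode()).hexdigest(), 16)
    return h % NUM_PGS

def pg_to_osds(pg):
    """Deterministically map a placement group to a set of OSDs."""
    return [osds[(pg + i) % len(osds)] for i in range(REPLICAS)]

print(pg_to_osds(object_to_pg("photo-0001.jpg")))
```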
Lustre
• Lustre is a high-performance distributed file system commonly used
in large-scale cluster computing.
• Supports POSIX (Portable Operating System Interface) compliance for compatibility
• It is widely deployed in supercomputing environments where
performance and throughput are critical.
• Used in scientific computing and financial modeling.
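Because Lustre is POSIX-compliant, applications use ordinary file I/O without modification. The sketch below shows standard Python file operations; the mount point and path are assumed examples of a Lustre file system.

```python
# Ordinary POSIX file I/O works unchanged on a POSIX-compliant file system
# such as Lustre. "/mnt/lustre" is an assumed mount point for illustration.
path = "/mnt/lustre/experiment/results.csv"

with open(path, "w") as f:          # standard open/write/close calls
    f.write("step,value\n0,1.0\n")

with open(path) as f:
    print(f.read())
```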
NoSQL Databases in the Cloud
NoSQL databases provide flexible schemas and horizontal scalability for
cloud applications.
• Amazon DynamoDB
• Apache Cassandra
• MongoDB Atlas
Amazon DynamoDB
• DynamoDB is a fully managed NoSQL database service that supports
key-value and document data models.
• Single-digit millisecond latency with auto-scaling
• Supports ACID transactions (atomicity, consistency, isolation, and
durability) and global tables
• It is designed for low-latency, high-throughput applications and offers
features such as on-demand scaling and DAX (DynamoDB Accelerator), a
fully managed in-memory cache.
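A minimal key-value write and read with boto3 is sketched below. The table name "Users" and its partition key "user_id" are illustrative assumptions; the table is assumed to already exist.

```python
# Minimal DynamoDB key-value access with boto3. The table "Users" and its
# partition key "user_id" are assumed to exist for illustration.
import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("Users")

# Write one item (key-value / document style).
table.put_item(Item={"user_id": "u123", "name": "Alice", "plan": "premium"})

# Read it back by primary key.
response = table.get_item(Key={"user_id": "u123"})
print(response.get("Item"))
```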
Apache Cassandra
• Cassandra is a highly scalable NoSQL database designed for handling
large amounts of data across multiple commodity servers with no
single point of failure.
• It uses a peer-to-peer architecture and supports eventual consistency.
• Decentralized, wide-column store with tunable consistency
• Linear scalability across multiple data centers
• Used by Netflix, Apple, and other large-scale applications
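The sketch below uses the DataStax Python driver to write and read with a tunable consistency level (QUORUM). The contact point, keyspace, and table are illustrative assumptions.

```python
# Tunable consistency with the DataStax Cassandra driver. The contact point,
# keyspace, and table are assumed for illustration.
from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement
from cassandra import ConsistencyLevel

cluster = Cluster(["127.0.0.1"])          # any node can act as coordinator
session = cluster.connect("demo_keyspace")

# Write at QUORUM: a majority of replicas must acknowledge the write.
insert = SimpleStatement(
    "INSERT INTO users (user_id, name) VALUES (%s, %s)",
    consistency_level=ConsistencyLevel.QUORUM)
session.execute(insert, ("u123", "Alice"))

# Read at QUORUM as well, so read and write quorums overlap.
select = SimpleStatement(
    "SELECT name FROM users WHERE user_id = %s",
    consistency_level=ConsistencyLevel.QUORUM)
print(session.execute(select, ("u123",)).one())
```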
MongoDB Atlas
• MongoDB Atlas is a fully managed cloud version of MongoDB, a
document-based NoSQL database.
• Atlas supports multi-region deployments, automated backups, and
integrated monitoring tools.
• Document-oriented database with JSON-like schema.
• Supports sharding for horizontal scaling.
• Available as a managed service.
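A minimal document insert and query with pymongo is sketched below; the Atlas connection string, database, and collection names are placeholders.

```python
# Minimal document store access with pymongo. The Atlas connection string,
# database, and collection names are placeholders for illustration.
from pymongo import MongoClient

client = MongoClient("mongodb+srv://user:password@cluster0.example.mongodb.net/")
db = client["shop"]
orders = db["orders"]

# Insert a JSON-like document (flexible schema: fields can vary per document).
orders.insert_one({"order_id": 1001, "customer": "Alice",
                   "items": [{"sku": "A1", "qty": 2}]})

# Query by field.
print(orders.find_one({"customer": "Alice"}))
```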
Data Consistency Models and Replication
Strategies
Distributed storage systems often face trade-offs between consistency,
availability, and partition tolerance (CAP theorem). Various consistency
models are used to balance these trade-offs:
• Strong Consistency: Guarantees that all users see the same data at
the same time.
• Eventual Consistency: Updates will eventually propagate through the
system, but immediate consistency is not guaranteed (see the sketch after this list).
• Causal Consistency: Ensures that causally related updates are seen by
all nodes in the same order.
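The toy sketch below shows why a reader may observe stale data under eventual consistency: a write is acknowledged once it reaches one replica, and the remaining replicas catch up later. It is an illustrative model, not a real system.

```python
# Toy model of eventual consistency: a write is acknowledged once one replica
# has it, and the remaining replicas are updated asynchronously later.
replicas = [{"x": 0}, {"x": 0}, {"x": 0}]

def write_eventual(key, value):
    replicas[0][key] = value        # acknowledge after updating one replica
    # replication to replicas[1] and replicas[2] happens "later"

def propagate():
    for r in replicas[1:]:
        r.update(replicas[0])       # background (anti-entropy) sync

write_eventual("x", 42)
print(replicas[1]["x"])   # 0  -> stale read: update not yet propagated
propagate()
print(replicas[1]["x"])   # 42 -> replicas eventually converge
```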
Data Consistency Models and Replication
Strategies
Replication strategies include:
• Master-slave replication: One node handles writes, others replicate
data.
• Multi-master replication: Multiple nodes can handle writes, requiring
conflict resolution.
• Quorum-based replication: Read and write operations require a
quorum of nodes to agree, balancing consistency and availability (e.g.,
Dynamo-style systems; see the sketch after this list).
• Synchronous replication: Ensures data consistency but increases
latency
• Asynchronous replication: Lower latency but risk of data loss
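In Dynamo-style quorum replication with N replicas, a write quorum W, and a read quorum R, choosing R + W > N guarantees that every read quorum overlaps the most recent write quorum. The sketch below checks that condition and simulates a quorum read over toy replicas; the values and versions are illustrative.

```python
# Quorum-based replication: with N replicas, write quorum W and read quorum R,
# the condition R + W > N makes every read overlap the most recent write.
N, W, R = 3, 2, 2
assert R + W > N, "quorums do not overlap: stale reads become possible"

# Toy replicas storing (value, version) pairs; the last replica is lagging
# because the latest write (version 2) only reached W = 2 replicas.
replicas = [("v2", 2), ("v2", 2), ("v1", 1)]

def quorum_read(replica_set, r):
    """Read from r replicas and return the value with the highest version."""
    responses = replica_set[-r:]     # includes the lagging replica
    return max(responses, key=lambda pair: pair[1])[0]

print(quorum_read(replicas, R))      # "v2": overlap guarantees the newest value
```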
Cloud-Based Data Warehousing
Modern data warehouses enable large-scale analytics with serverless
architectures.
• Google BigQuery
• Snowflake
• Amazon Redshift
Google BigQuery
• BigQuery is a serverless, highly scalable data warehouse that allows
users to run SQL-like queries on large datasets.
• It integrates with Google Cloud Storage for data lakes and supports
external (federated) queries over data stored there.
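A minimal serverless query with the google-cloud-bigquery client library is sketched below. It assumes Google Cloud credentials are configured; the referenced public dataset (`bigquery-public-data.usa_names.usa_1910_2013`) is a commonly used example.

```python
# Minimal serverless SQL query with the BigQuery client library.
# Assumes Google Cloud credentials; the public dataset is an example.
from google.cloud import bigquery

client = bigquery.Client()          # uses the project from your credentials

query = """
    SELECT name, SUM(number) AS total
    FROM `bigquery-public-data.usa_names.usa_1910_2013`
    GROUP BY name
    ORDER BY total DESC
    LIMIT 5
"""

# BigQuery allocates and scales the execution resources itself.
for row in client.query(query).result():
    print(row.name, row.total)
```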
Data Streaming and Real-Time Processing
Real-time data processing is crucial for applications such as fraud
detection, log analysis, and recommendation systems.
Cloud-based streaming services include:
• Apache Kafka: A distributed event streaming platform that enables
real-time data feeds (see the producer/consumer sketch after this list).
• Amazon Kinesis / Azure Event Hubs: Managed streaming services for
real-time data ingestion and analytics; they support ingestion from IoT
devices, logs, and transactions.
• Google Cloud Dataflow: A serverless data processing service for
stream and batch data using Apache Beam SDK.
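As a concrete illustration of the streaming model, the sketch below uses the kafka-python client to publish and consume events; the broker address and topic name are illustrative assumptions.

```python
# Minimal event streaming sketch with kafka-python. The broker address and
# topic name are illustrative assumptions.
import json
from kafka import KafkaProducer, KafkaConsumer

# Producer: publish events (e.g., transactions) to a topic.
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"))
producer.send("transactions", {"user": "u123", "amount": 42.0})
producer.flush()

# Consumer: read the event stream for real-time processing (e.g., fraud checks).
consumer = KafkaConsumer(
    "transactions",
    bootstrap_servers="localhost:9092",
    auto_offset_reset="earliest",
    value_deserializer=lambda b: json.loads(b.decode("utf-8")))
for message in consumer:
    print(message.value)
    break
```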
Backup, Disaster Recovery, and Storage
Security
• Backup and Disaster Recovery
• Storage Security
Backup and Disaster Recovery
Cloud providers offer automated backup services with options for
versioning and point-in-time recovery.
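As one concrete example, object versioning can be enabled on an S3 bucket so earlier versions of overwritten or deleted objects remain recoverable. The bucket name and object key below are placeholders.

```python
# Enable object versioning on an S3 bucket with boto3 so previous versions of
# objects remain recoverable. The bucket name and key are placeholders.
import boto3

s3 = boto3.client("s3")
s3.put_bucket_versioning(
    Bucket="example-backup-bucket",
    VersioningConfiguration={"Status": "Enabled"})

# List the stored versions of a single object key.
versions = s3.list_object_versions(Bucket="example-backup-bucket",
                                   Prefix="db-dumps/orders.sql")
for v in versions.get("Versions", []):
    print(v["VersionId"], v["LastModified"])
```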
Disaster recovery strategies include:
• Cold standby: Delayed recovery using periodically updated backups.
• Warm standby: Partially active infrastructure that can be quickly
scaled.
• Hot standby: Fully active and redundant systems across regions.
Storage Security
Security in cloud storage involves:
• Access Control: Fine-grained IAM policies and access logs (see the policy sketch below).
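A minimal sketch of fine-grained access control: attaching a bucket policy that allows read access only to a specific IAM role. The bucket name, account ID, and role name are placeholders.

```python
# Attach a bucket policy that restricts read access to one IAM role.
# Bucket name, account ID, and role name are placeholders for illustration.
import json
import boto3

policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"AWS": "arn:aws:iam::123456789012:role/analytics-reader"},
        "Action": ["s3:GetObject"],
        "Resource": "arn:aws:s3:::example-secure-bucket/*"
    }]
}

s3 = boto3.client("s3")
s3.put_bucket_policy(Bucket="example-secure-bucket", Policy=json.dumps(policy))
```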