Unit 5 Lecture 2

Uploaded by

Mansi Varshney

Subject Name: Cloud Computing

Subject Code: KCS 713

Unit No.: 5
Lecture No.: 2
Topic Name: Google File System
Contents
• Google File System (GFS)
• GFS Architecture
• System Interactions
• Read Algorithm
• Write Algorithm
• Master Operation
• Garbage Collection
• Fault tolerance
• Challenges
• Important Questions
• References
Cloud File Systems
• Google File System (GFS)
– Designed to manage relatively large files using a very large distributed cluster of
commodity servers connected by a high-speed network
– Handles:
• Failures even during reading or writing of individual files
• Fault tolerance: a necessity, since for a large number of components N,
p(system failure) = 1 - (1 - p(component failure))^N --> 1
• Supports parallel reads, writes and appends by multiple simultaneous client
programs
• Hadoop Distributed File System (HDFS)
– Open source implementation of GFS architecture
– Available on Amazon EC2 cloud platform
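The failure-probability claim above can be checked numerically. This is a minimal sketch; the function name is illustrative, not part of GFS:

```python
def system_failure_probability(p_component: float, n: int) -> float:
    """P(at least one of N independent components fails) = 1 - (1 - p)^N."""
    return 1.0 - (1.0 - p_component) ** n

# Even a very reliable component (0.1% failure chance) makes some failure
# near-certain across a large cluster:
print(system_failure_probability(0.001, 10))      # small cluster: still small
print(system_failure_probability(0.001, 10_000))  # large cluster: close to 1
```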
GOOGLE FILE SYSTEM ARCHITECTURE

• A GFS cluster consists of a single master and multiple chunkservers.


• The three basic roles in GFS are the master, clients, and chunkservers.
GFS Architecture
• Files are divided into fixed-size chunks.
• Chunkservers store chunks on local disks as Linux files.
• The master maintains all file system metadata: the namespace, access control information, the mapping
from files to chunks, and the current locations of chunks.
• Clients interact with the master for metadata operations only. Chunkservers need not cache file data.
Chunk
• Similar to the concept of a block in ordinary file systems, but much larger: the chunk size is 64 MB.
• Larger chunks mean fewer chunks per file and less chunk metadata in the master. A problem with this
chunk size is that a small, popular file can become a hotspot.
• Each chunk is stored on a chunkserver as a Linux file and is identified by a chunk handle, i.e., the
chunk file name.
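The mapping from a byte range to chunk indices is simple integer arithmetic. A small sketch, where the constant and function names are assumptions for illustration:

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB, the GFS chunk size

def byte_range_to_chunks(offset: int, length: int) -> range:
    """Chunk indices covering the byte range [offset, offset + length)."""
    first = offset // CHUNK_SIZE
    last = (offset + length - 1) // CHUNK_SIZE
    return range(first, last + 1)

# A 100 MB read starting at byte 0 spans chunks 0 and 1:
print(list(byte_range_to_chunks(0, 100 * 1024 * 1024)))  # [0, 1]
```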

Metadata
The master stores three major types of metadata: the file and chunk namespaces, the mapping from files to
chunks, and the location of each chunk’s replicas.
• The first two types are kept persistent by logging mutations to an operation log stored on the master’s
local disk.
• Because metadata is stored in memory, master operations are fast.
• It is also easy and efficient for the master to periodically scan its entire state. Periodic scanning is
used to implement chunk garbage collection, re-replication, and chunk migration.
Master
• A single process, running on a separate machine, that stores all metadata.
• Clients contact the master to get the metadata needed to contact the chunkservers.
SYSTEM INTERACTION
Read Algorithm
1. Application originates the read request.
2. GFS client translates the request from (filename, byte range) -> (filename,
chunk index), and sends it to the master.
3. Master responds with the chunk handle and replica locations (i.e. the chunkservers where the
replicas are stored).
4. Client picks a location and sends the (chunk handle, byte range) request to that location.
5. Chunkserver sends the requested data to the client.
6. Client forwards the data to the application.
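The read steps above can be sketched in miniature. The in-memory `master_metadata` and `chunkservers` dictionaries stand in for the real master and chunkserver RPCs; all names and values are toy assumptions:

```python
CHUNK_SIZE = 64 * 1024 * 1024

# Toy master metadata: (filename, chunk index) -> (chunk handle, replica locations)
master_metadata = {
    ("/logs/a", 0): ("handle-17", ["cs1", "cs2", "cs3"]),
}
# Toy chunkserver storage: (server, chunk handle) -> chunk bytes
chunkservers = {("cs2", "handle-17"): b"hello, gfs!"}

def gfs_read(filename: str, offset: int, length: int) -> bytes:
    # Step 2: translate (filename, byte range) -> (filename, chunk index)
    chunk_index = offset // CHUNK_SIZE
    # Step 3: "master" returns the chunk handle and replica locations
    handle, replicas = master_metadata[(filename, chunk_index)]
    # Step 4: pick a replica (here: any server that actually holds the chunk)
    server = next(s for s in replicas if (s, handle) in chunkservers)
    # Steps 5-6: "chunkserver" returns the requested byte range of the chunk
    data = chunkservers[(server, handle)]
    start = offset % CHUNK_SIZE
    return data[start:start + length]

print(gfs_read("/logs/a", 0, 5))  # b'hello'
```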
Write Algorithm
1. Application originates the write request.
2. GFS client translates the request from (filename, data) -> (filename, chunk index), and sends it to
the master.
3. Master responds with the chunk handle and (primary + secondary) replica locations.
4. Client pushes the write data to all locations. Data is stored in the chunkservers’ internal buffers.

5. Client sends the write command to the primary.


6. Primary determines a serial order for the data instances stored in its buffer and writes the instances
in that order to the chunk.

7. Primary sends the serial order to the secondaries and tells them to perform the write.
8. Secondaries respond to the primary.
9. Primary responds back to the client.
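The data-push and primary-serialization steps of the write path can be sketched in memory. Replica and handle names are illustrative; real GFS uses RPCs and leases, which are omitted here:

```python
# Toy replicas: each maps chunk handle -> chunk contents
replicas = {"primary": {}, "sec1": {}, "sec2": {}}
buffers = {name: [] for name in replicas}  # chunkservers' internal buffers

def push_data(handle: str, data: bytes) -> None:
    # Step 4: client pushes data to every replica's buffer (not yet applied)
    for name in buffers:
        buffers[name].append((handle, data))

def primary_commit(handle: str) -> None:
    # Step 6: the primary picks one serial order for its buffered data...
    order = [d for h, d in buffers["primary"] if h == handle]
    # ...applies it, and (steps 7-8) the secondaries apply the same order.
    for name in replicas:
        chunk = replicas[name].setdefault(handle, bytearray())
        for data in order:
            chunk.extend(data)
        buffers[name] = [(h, d) for h, d in buffers[name] if h != handle]

push_data("h1", b"rec-a;")
push_data("h1", b"rec-b;")
primary_commit("h1")
print(bytes(replicas["sec2"]["h1"]))  # b'rec-a;rec-b;' - same on all replicas
```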
Record Append Algorithm

1. Application originates the record append request.

2. GFS client translates the request and sends it to the master.

3. Master responds with the chunk handle and (primary + secondary) replica locations.

4. Client pushes the write data to all replicas of the last chunk of the file.

5. Primary checks if the record fits in the specified chunk.

6. If the record does not fit, the primary: pads the chunk, tells the secondaries to do the same, and
informs the client. The client then retries the append with the next chunk.

7. If the record fits, the primary: appends the record, tells the secondaries to write the data at the
exact same offset, receives responses from the secondaries, and sends the final response to the client.


MASTER OPERATION
Namespace management and locking
• Multiple master operations can be active at once; each uses locks over regions of the
namespace to ensure proper serialization.
• GFS does not have a per-directory data structure.
• GFS logically represents its namespace as a lookup table mapping full pathnames to metadata.
Each master operation acquires a set of locks before it runs.

Replica placement
• A GFS cluster is highly distributed.
• The chunk replica placement policy serves two purposes: maximize data reliability and availability,
and maximize network bandwidth utilization.
• Chunk replicas are also spread across racks.
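The namespace locking scheme can be sketched as a function that lists the locks an operation would take: read locks on every ancestor path and a write lock on the path being mutated. `locks_for_operation` is a hypothetical helper, not a GFS API:

```python
def locks_for_operation(path: str) -> list[tuple[str, str]]:
    """Locks a master operation acquires before mutating `path`:
    read locks on each ancestor, a write lock on the full pathname.
    (No per-directory structure: the namespace is a flat lookup table.)"""
    parts = path.strip("/").split("/")
    locks = []
    for i in range(1, len(parts)):
        locks.append(("read", "/" + "/".join(parts[:i])))
    locks.append(("write", path))
    return locks

print(locks_for_operation("/home/user/file"))
# [('read', '/home'), ('read', '/home/user'), ('write', '/home/user/file')]
```

Because two operations conflict only when their lock sets conflict, mutations in different directories can proceed concurrently.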
Creation, Re-replication and Balancing of Chunks

• Factors for choosing where to place the initially empty replicas:

1. Place new replicas on chunkservers with below-average disk space utilization.

2. Limit the number of “recent” creations on each chunkserver.

3. Spread replicas of a chunk across racks.

• The master re-replicates a chunk when the number of available replicas falls below the replication goal.

• A chunk that needs to be re-replicated is prioritized based on how far it is from its replication goal.

• Finally, the master rebalances replicas periodically.

GARBAGE COLLECTION
• Garbage collection happens at both the file and chunk levels.

• When a file is deleted by the application, the master logs the deletion immediately.

• The file is not reclaimed at once; it is just renamed to a hidden name.

• The file can still be read under the new, special name and can be undeleted.

• When the hidden file is later removed during the master’s regular namespace scan, its in-memory
metadata is erased.
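The lazy-deletion scheme can be sketched as a rename followed by a periodic scan. The hidden-name format is an invented convention, and the three-day grace period matches the GFS paper's default:

```python
namespace = {"/data/old.log": {"chunks": ["h9"]}}

def delete_file(path: str, now: int) -> None:
    # Deletion is logged, then the file is renamed to a hidden, timestamped name
    meta = namespace.pop(path)
    namespace[f"/.deleted{path}.{now}"] = meta  # still readable / undeletable

def gc_scan(now: int, grace_seconds: int = 3 * 24 * 3600) -> None:
    # During the periodic namespace scan, old hidden files are removed for real
    for name in list(namespace):
        if name.startswith("/.deleted"):
            ts = int(name.rsplit(".", 1)[1])
            if now - ts > grace_seconds:
                del namespace[name]  # metadata erased; chunks become orphans

delete_file("/data/old.log", now=0)
gc_scan(now=4 * 24 * 3600)
print(namespace)  # {} - hidden file collected after the grace period
```

Orphaned chunks (those no longer reachable from any file) are then reclaimed when chunkservers report their chunk sets in heartbeat messages.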


FAULT TOLERANCE
High Availability

• Fast recovery

• Chunk replication

• Master replication

Data Integrity

• Each chunkserver uses checksumming to detect corruption: a chunk is broken up into 64 KB blocks,
each with its own checksum.
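Block-level checksumming can be sketched as below. CRC32 is used here as an illustrative checksum; the source does not specify the actual algorithm:

```python
import zlib

BLOCK = 64 * 1024  # each chunk is divided into 64 KB blocks

def checksums(chunk: bytes) -> list[int]:
    """One 32-bit checksum per 64 KB block of the chunk."""
    return [zlib.crc32(chunk[i:i + BLOCK]) for i in range(0, len(chunk), BLOCK)]

def verify(chunk: bytes, sums: list[int]) -> bool:
    """Recompute block checksums and compare against the stored ones."""
    return checksums(chunk) == sums

chunk = b"x" * (3 * BLOCK)
sums = checksums(chunk)
print(verify(chunk, sums))                        # True: data intact
corrupted = chunk[:BLOCK] + b"y" + chunk[BLOCK + 1:]
print(verify(corrupted, sums))                    # False: flipped byte detected
```

On a mismatch, a real chunkserver returns an error to the requester and reports the corruption to the master, which re-replicates the chunk from a good replica.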


CHALLENGES

• Storage size: metadata for all files must fit in the single master’s memory.

• The single master can become a bottleneck for the clients.

• Time: operations that must go through the master add latency.
Important Questions

1. What is NoSQL?

2. Explain the difference between NoSQL and relational databases.

3. What does Google File System (GFS) mean?

4. What is the GFS file system in Linux?

5. Explain the architecture of the Google File System.


References
 Dan C. Marinescu: “Cloud Computing: Theory and Practice.” Elsevier (MK), 2013.
 Rajkumar Buyya, James Broberg, Andrzej Goscinski: “Cloud Computing: Principles
and Paradigms”, Wiley, 2014.
 https://www.ques10.com/p/13989/explain-architecture-of-google-file-system-1/
 https://www.sciencedirect.com/topics/computer-science/google-file-system
 https://www.researchgate.net/publication/220910111_The_Google_File_System
