Unit 5 Lecture 2

Uploaded by

Mansi Varshney

Subject Name: Cloud Computing

Subject Code: KCS 713

Unit No.: 5
Lecture No.: 2
Topic Name: Google File System
Contents
• Google File System (GFS)
• GFS Architecture
• System Interactions
• Read Algorithm
• Write Algorithm
• Master Operation
• Garbage Collection
• Fault tolerance
• Challenges
• Important Questions
• References
Cloud File Systems
• Google File System (GFS)
– Designed to manage relatively large files using a very large distributed cluster of
commodity servers connected by a high-speed network
– Handles:
• Failures even during reading or writing of individual files
• Fault tolerance: a necessity, since for a large number of components N,
p(system failure) = 1 - (1 - p(component failure))^N --> 1
• Supports parallel reads, writes and appends by multiple simultaneous client
programs
• Hadoop Distributed File System (HDFS)
– Open source implementation of GFS architecture
– Available on Amazon EC2 cloud platform
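The failure-probability claim above can be checked numerically. This is a minimal sketch; the function name is illustrative, not part of GFS:

```python
def system_failure_probability(p_component: float, n: int) -> float:
    """P(at least one of N independent components fails) = 1 - (1 - p)^N."""
    return 1.0 - (1.0 - p_component) ** n

# Even a very reliable component (0.1% failure chance) makes some failure
# near-certain across a large cluster:
print(system_failure_probability(0.001, 10))      # small cluster: still small
print(system_failure_probability(0.001, 10_000))  # large cluster: close to 1
```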
GOOGLE FILE SYSTEM ARCHITECTURE

• A GFS cluster consists of a single master and multiple chunkservers.


• The three basic roles in GFS are the master, clients, and chunkservers.
GFS Architecture
• Files are divided into fixed-size chunks.
• Chunkservers store chunks on local disks as Linux files.
• The master maintains all file system metadata: the namespace, access control information, the mapping
from files to chunks, and the current locations of chunks.
• Clients interact with the master for metadata operations only. Chunkservers need not cache file data.
Chunk
• Similar to the concept of a block in ordinary file systems, but much larger: the chunk size is 64 MB.
• Larger chunks mean fewer chunks per file and less chunk metadata in the master. A problem with this
chunk size is that a small, popular file can become a hotspot.
• Each chunk is stored on a chunkserver as a Linux file and is identified by a chunk handle, i.e., the
chunk file name.
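The mapping from a byte range to chunk indices is simple integer arithmetic. A small sketch, where the constant and function names are assumptions for illustration:

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB, the GFS chunk size

def byte_range_to_chunks(offset: int, length: int) -> range:
    """Chunk indices covering the byte range [offset, offset + length)."""
    first = offset // CHUNK_SIZE
    last = (offset + length - 1) // CHUNK_SIZE
    return range(first, last + 1)

# A 100 MB read starting at byte 0 spans chunks 0 and 1:
print(list(byte_range_to_chunks(0, 100 * 1024 * 1024)))  # [0, 1]
```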

Metadata
The master stores three major types of metadata: the file and chunk namespaces, the mapping from files to
chunks, and the location of each chunk’s replicas.
• The first two types are kept persistent by logging mutations to an operation log stored on the master’s
local disk.
• Because metadata is stored in memory, master operations are fast.
• It is also easy and efficient for the master to periodically scan its entire state. Periodic scanning is
used to implement chunk garbage collection, re-replication, and chunk migration.
Master
• A single process, running on a separate machine, that stores all metadata.
• Clients contact the master to get the metadata needed to contact the chunkservers.
SYSTEM INTERACTION
Read Algorithm
1. Application originates the read request.
2. GFS client translates the request from (filename, byte range) -> (filename,
chunk index), and sends it to the master.
3. Master responds with the chunk handle and replica locations (i.e. the chunkservers where the
replicas are stored).
4. Client picks a location and sends the (chunk handle, byte range) request to that location.
5. Chunkserver sends the requested data to the client.
6. Client forwards the data to the application.
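The read steps above can be sketched in miniature. The in-memory `master_metadata` and `chunkservers` dictionaries stand in for the real master and chunkserver RPCs; all names and values are toy assumptions:

```python
CHUNK_SIZE = 64 * 1024 * 1024

# Toy master metadata: (filename, chunk index) -> (chunk handle, replica locations)
master_metadata = {
    ("/logs/a", 0): ("handle-17", ["cs1", "cs2", "cs3"]),
}
# Toy chunkserver storage: (server, chunk handle) -> chunk bytes
chunkservers = {("cs2", "handle-17"): b"hello, gfs!"}

def gfs_read(filename: str, offset: int, length: int) -> bytes:
    # Step 2: translate (filename, byte range) -> (filename, chunk index)
    chunk_index = offset // CHUNK_SIZE
    # Step 3: "master" returns the chunk handle and replica locations
    handle, replicas = master_metadata[(filename, chunk_index)]
    # Step 4: pick a replica (here: any server that actually holds the chunk)
    server = next(s for s in replicas if (s, handle) in chunkservers)
    # Steps 5-6: "chunkserver" returns the requested byte range of the chunk
    data = chunkservers[(server, handle)]
    start = offset % CHUNK_SIZE
    return data[start:start + length]

print(gfs_read("/logs/a", 0, 5))  # b'hello'
```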
Write Algorithm
1. Application originates the write request.
2. GFS client translates the request from (filename, data) -> (filename, chunk index), and sends it to
the master.
3. Master responds with the chunk handle and (primary + secondary) replica locations.
4. Client pushes the write data to all locations. Data is stored in the chunkservers’ internal buffers.

5. Client sends the write command to the primary.


6. Primary determines a serial order for the data instances stored in its buffer and writes the instances
in that order to the chunk.

7. Primary sends the serial order to the secondaries and tells them to perform the write.
8. Secondaries respond to the primary.
9. Primary responds back to the client.
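The data-push and primary-serialization steps of the write path can be sketched in memory. Replica and handle names are illustrative; real GFS uses RPCs and leases, which are omitted here:

```python
# Toy replicas: each maps chunk handle -> chunk contents
replicas = {"primary": {}, "sec1": {}, "sec2": {}}
buffers = {name: [] for name in replicas}  # chunkservers' internal buffers

def push_data(handle: str, data: bytes) -> None:
    # Step 4: client pushes data to every replica's buffer (not yet applied)
    for name in buffers:
        buffers[name].append((handle, data))

def primary_commit(handle: str) -> None:
    # Step 6: the primary picks one serial order for its buffered data...
    order = [d for h, d in buffers["primary"] if h == handle]
    # ...applies it, and (steps 7-8) the secondaries apply the same order.
    for name in replicas:
        chunk = replicas[name].setdefault(handle, bytearray())
        for data in order:
            chunk.extend(data)
        buffers[name] = [(h, d) for h, d in buffers[name] if h != handle]

push_data("h1", b"rec-a;")
push_data("h1", b"rec-b;")
primary_commit("h1")
print(bytes(replicas["sec2"]["h1"]))  # b'rec-a;rec-b;' - same on all replicas
```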
Record Append Algorithm

1. Application originates the record append request.

2. GFS client translates the request and sends it to the master.

3. Master responds with the chunk handle and (primary + secondary) replica locations.

4. Client pushes the write data to all replicas of the last chunk of the file.

5. Primary checks if the record fits in the specified chunk.

6. If the record does not fit, the primary: pads the chunk, tells the secondaries to do the same, and
informs the client. The client then retries the append with the next chunk.

7. If the record fits, the primary: appends the record, tells the secondaries to write the data at the
exact same offset, receives responses from the secondaries, and sends the final response to the client.


MASTER OPERATION
Namespace management and locking
• Multiple master operations can be active at once; each uses locks over regions of the
namespace to ensure proper serialization.
• GFS does not have a per-directory data structure.
• GFS logically represents its namespace as a lookup table mapping full pathnames to metadata.
Each master operation acquires a set of locks before it runs.

Replica placement
• A GFS cluster is highly distributed.
• The chunk replica placement policy serves two purposes: maximize data reliability and availability,
and maximize network bandwidth utilization.
• Chunk replicas are also spread across racks.
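The namespace locking scheme can be sketched as a function that lists the locks an operation would take: read locks on every ancestor path and a write lock on the path being mutated. `locks_for_operation` is a hypothetical helper, not a GFS API:

```python
def locks_for_operation(path: str) -> list[tuple[str, str]]:
    """Locks a master operation acquires before mutating `path`:
    read locks on each ancestor, a write lock on the full pathname.
    (No per-directory structure: the namespace is a flat lookup table.)"""
    parts = path.strip("/").split("/")
    locks = []
    for i in range(1, len(parts)):
        locks.append(("read", "/" + "/".join(parts[:i])))
    locks.append(("write", path))
    return locks

print(locks_for_operation("/home/user/file"))
# [('read', '/home'), ('read', '/home/user'), ('write', '/home/user/file')]
```

Because two operations conflict only when their lock sets conflict, mutations in different directories can proceed concurrently.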
Creation, Re-replication and Balancing of Chunks

• Factors for choosing where to place the initially empty replicas:

1. Place new replicas on chunkservers with below-average disk space utilization.

2. Limit the number of “recent” creations on each chunkserver.

3. Spread replicas of a chunk across racks.

• The master re-replicates a chunk when the number of available replicas falls below the replication goal.

• A chunk that needs to be re-replicated is prioritized based on how far it is from its replication goal.

• Finally, the master rebalances replicas periodically.

GARBAGE COLLECTION
• Garbage collection happens at both the file and chunk levels.

• When a file is deleted by the application, the master logs the deletion immediately.

• The file is not reclaimed at once; it is just renamed to a hidden name.

• The file can still be read under the new, special name and can be undeleted.

• When the hidden file is later removed during the master’s regular namespace scan, its in-memory
metadata is erased.
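The lazy-deletion scheme can be sketched as a rename followed by a periodic scan. The hidden-name format is an invented convention, and the three-day grace period matches the GFS paper's default:

```python
namespace = {"/data/old.log": {"chunks": ["h9"]}}

def delete_file(path: str, now: int) -> None:
    # Deletion is logged, then the file is renamed to a hidden, timestamped name
    meta = namespace.pop(path)
    namespace[f"/.deleted{path}.{now}"] = meta  # still readable / undeletable

def gc_scan(now: int, grace_seconds: int = 3 * 24 * 3600) -> None:
    # During the periodic namespace scan, old hidden files are removed for real
    for name in list(namespace):
        if name.startswith("/.deleted"):
            ts = int(name.rsplit(".", 1)[1])
            if now - ts > grace_seconds:
                del namespace[name]  # metadata erased; chunks become orphans

delete_file("/data/old.log", now=0)
gc_scan(now=4 * 24 * 3600)
print(namespace)  # {} - hidden file collected after the grace period
```

Orphaned chunks (those no longer reachable from any file) are then reclaimed when chunkservers report their chunk sets in heartbeat messages.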


FAULT TOLERANCE
High Availability

• Fast recovery

• Chunk replication

• Master replication

Data Integrity

• Each chunkserver uses checksumming to detect corruption: a chunk is broken up into 64 KB blocks,
each with its own checksum.
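Block-level checksumming can be sketched as below. CRC32 is used here as an illustrative checksum; the source does not specify the actual algorithm:

```python
import zlib

BLOCK = 64 * 1024  # each chunk is divided into 64 KB blocks

def checksums(chunk: bytes) -> list[int]:
    """One 32-bit checksum per 64 KB block of the chunk."""
    return [zlib.crc32(chunk[i:i + BLOCK]) for i in range(0, len(chunk), BLOCK)]

def verify(chunk: bytes, sums: list[int]) -> bool:
    """Recompute block checksums and compare against the stored ones."""
    return checksums(chunk) == sums

chunk = b"x" * (3 * BLOCK)
sums = checksums(chunk)
print(verify(chunk, sums))                        # True: data intact
corrupted = chunk[:BLOCK] + b"y" + chunk[BLOCK + 1:]
print(verify(corrupted, sums))                    # False: flipped byte detected
```

On a mismatch, a real chunkserver returns an error to the requester and reports the corruption to the master, which re-replicates the chunk from a good replica.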


CHALLENGES

• Storage size: metadata for all files must fit in the single master’s memory.

• The single master can become a bottleneck for the clients.

• Time: operations that must go through the master add latency.
Important Questions

1. What is NoSQL?

2. Explain the difference between NoSQL and relational databases.

3. What does Google File System (GFS) mean?

4. What is the GFS file system in Linux?

5. Explain the architecture of the Google File System.


References
 Dan C. Marinescu: “Cloud Computing: Theory and Practice.” Elsevier (MK), 2013.
 Rajkumar Buyya, James Broberg, Andrzej Goscinski: “Cloud Computing: Principles
and Paradigms”, Wiley, 2014.
 https://www.ques10.com/p/13989/explain-architecture-of-google-file-system-1/
 https://www.sciencedirect.com/topics/computer-science/google-file-system
 https://www.researchgate.net/publication/220910111_The_Google_File_System
