Unit 5 Lecture 2
Metadata
The master stores three major types of metadata: the file and chunk namespaces, the mapping from files to
chunks, and the locations of each chunk’s replicas.
• The first two types are kept persistent by recording changes in an operation log stored on the master’s local
disk.
• Because metadata is stored in memory, master operations are fast.
• It is also easy and efficient for the master to periodically scan its entire state. Periodic scanning is
used to implement chunk garbage collection, re-replication and chunk
migration.
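The three metadata types can be pictured as in-memory tables. The following is a minimal sketch, not the actual GFS implementation: the class `MasterMetadata`, its method names, and the replication target of 3 are all assumptions made for illustration. Only changes to the first two tables are appended to the (here simulated) operation log, and a periodic scan is shown as a simple pass over the tables.

```python
from dataclasses import dataclass, field


@dataclass
class MasterMetadata:
    """Hypothetical in-memory view of the master's three metadata types."""
    namespace: set = field(default_factory=set)          # file names (logged)
    file_to_chunks: dict = field(default_factory=dict)   # filename -> [chunk handles] (logged)
    chunk_locations: dict = field(default_factory=dict)  # handle -> set of chunkservers (memory only)
    operation_log: list = field(default_factory=list)    # stand-in for the on-disk operation log

    def create_file(self, filename):
        self.operation_log.append(("create", filename))
        self.namespace.add(filename)
        self.file_to_chunks[filename] = []

    def add_chunk(self, filename, handle):
        self.operation_log.append(("add_chunk", filename, handle))
        self.file_to_chunks[filename].append(handle)

    def report_replica(self, handle, chunkserver):
        # Replica locations are kept in memory only in this sketch, so nothing is logged.
        self.chunk_locations.setdefault(handle, set()).add(chunkserver)

    def under_replicated(self, target=3):
        """Periodic scan: handles with fewer than `target` known replicas (target is assumed)."""
        return sorted(
            h
            for chunks in self.file_to_chunks.values()
            for h in chunks
            if len(self.chunk_locations.get(h, ())) < target
        )
```

A scan like `under_replicated()` is the kind of pass that could feed re-replication; because everything lives in memory, it is a plain loop over dictionaries.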
Master
• A single process, running on a separate machine, that stores all metadata.
• Clients contact the master to get the metadata they need to contact the chunkservers.
SYSTEM INTERACTION
Read Algorithm
1. Application originates the read request
2. GFS client translates the request from (filename, byte range) -> (filename,
chunk index), and sends it to the master
3. Master responds with chunk handle and replica locations (i.e. chunkservers where the
replicas are stored)
4. Client picks a location and sends the (chunk handle, byte range) request to that location
5. Chunkserver sends the requested data to the client
6. Client forwards the data to the application
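The read steps above can be sketched in a few lines. This is an illustrative sketch under stated assumptions: the 64 MB fixed chunk size is the commonly cited GFS default, the function names `to_chunk_requests` and `read` are hypothetical, and the master and chunkservers are modelled as plain dictionaries rather than remote services.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # assumed fixed chunk size (64 MB)


def to_chunk_requests(filename, offset, length, chunk_size=CHUNK_SIZE):
    """Step 2: translate (filename, byte range) into per-chunk requests.

    Returns a list of (filename, chunk_index, start_in_chunk, nbytes).
    """
    requests = []
    end = offset + length
    while offset < end:
        index = offset // chunk_size
        start = offset % chunk_size
        nbytes = min(chunk_size - start, end - offset)
        requests.append((filename, index, start, nbytes))
        offset += nbytes
    return requests


def read(master, chunkservers, filename, offset, length, chunk_size=CHUNK_SIZE):
    """Steps 2-6 with dictionaries standing in for the master and chunkservers.

    master:       (filename, chunk_index) -> (chunk_handle, [replica locations])
    chunkservers: location -> {chunk_handle: bytes}
    """
    data = b""
    for name, index, start, nbytes in to_chunk_requests(filename, offset, length, chunk_size):
        handle, locations = master[(name, index)]      # step 3: handle + replica locations
        server = chunkservers[locations[0]]            # step 4: pick a location (here, the first)
        data += server[handle][start:start + nbytes]   # step 5: chunkserver returns the bytes
    return data                                        # step 6: hand the data to the application
```

Note that the master is consulted only for the handle and locations; the data itself flows directly from the chunkserver to the client.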
Write Algorithm
1. Application originates the request
2. GFS client translates the request from (filename, data) -> (filename, chunk index), and sends it to the
master
3. Master responds with chunk handle and (primary + secondary) replica locations
4. Client pushes write data to all locations. Data is stored in chunkservers’ internal buffers
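5. Client sends the write request to the primary
6. Primary determines a serial order for the data instances in its buffer and writes them to the chunk in that order
7. Primary sends the serial order to the secondaries and tells them to perform the write
8. Secondaries respond to the primary
9. Primary responds back to the client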
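The role of the primary's serial order can be illustrated with a small sketch. It is a simplification under stated assumptions: the `Replica` class and `commit` function are hypothetical names, data is identified by an assumed `data_id`, and the primary simply uses its own arrival order as the serial order.

```python
class Replica:
    """Hypothetical chunkserver replica with an internal buffer and one chunk."""

    def __init__(self):
        self.buffer = {}  # data_id -> bytes, pushed by the client but not yet written
        self.chunk = []   # writes applied to the chunk, in order

    def push(self, data_id, data):
        # Client pushes data; it only lands in the internal buffer.
        self.buffer[data_id] = data

    def apply(self, order):
        # Perform the buffered writes in the given serial order.
        for data_id in order:
            self.chunk.append(self.buffer.pop(data_id))
        return True  # acknowledgement


def commit(primary, secondaries):
    """Primary picks a serial order, applies it, and has the secondaries do the same."""
    order = list(primary.buffer)                  # assumed policy: primary's arrival order
    primary.apply(order)
    acks = [s.apply(order) for s in secondaries]  # secondaries respond to the primary
    return all(acks)                              # primary responds back to the client
```

Even if the pushes reach a secondary in a different order than they reach the primary, every replica ends up with the same sequence of writes, because all of them apply the order chosen by the primary.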
Record Append Algorithm
3. Master responds with chunk handle and (primary + secondary) replica locations.
4. Client pushes write data to all replicas of the last chunk of the file.
11. Primary tells the secondaries to write the data at the exact same offset, then receives responses from the secondaries
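The "exact offset" step can be sketched as follows. This is a hedged illustration, not the full algorithm: the classes `ChunkReplica` and `Primary` are hypothetical, and the handling of a record that does not fit (returning `None` so the client can retry elsewhere) is an assumption, since the corresponding steps are not shown in these notes.

```python
class ChunkReplica:
    """Hypothetical replica holding one fixed-size chunk."""

    def __init__(self, size):
        self.data = bytearray(size)

    def write_at(self, offset, record):
        # Write the record at the exact offset it was told to use.
        self.data[offset:offset + len(record)] = record
        return True  # acknowledgement


class Primary(ChunkReplica):
    def __init__(self, size):
        super().__init__(size)
        self.end = 0  # current end of the valid data in the chunk

    def record_append(self, record, secondaries):
        if self.end + len(record) > len(self.data):
            return None  # assumed behaviour: record does not fit, client must retry
        offset = self.end                 # primary chooses the offset
        self.write_at(offset, record)
        acks = [s.write_at(offset, record) for s in secondaries]  # same offset everywhere
        self.end += len(record)
        return offset if all(acks) else None
```

Unlike a regular write, the client does not supply the offset; the primary picks it and reports it back, which is what lets many clients append to the same file concurrently.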
Replica placement
• A GFS cluster is highly distributed.
• The chunk replica placement policy serves two purposes: maximize data reliability and availability, and
maximize network bandwidth utilization.
• Chunk replicas are also spread across racks.
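Spreading replicas across racks can be sketched as a round-robin pick over racks. This is a deliberately simplified illustration: the function `place_replicas`, the default of 3 replicas, and the input layout are assumptions, and a real placement policy would weigh more factors than rack membership.

```python
def place_replicas(servers_by_rack, count=3):
    """Pick up to `count` chunkservers, visiting racks round-robin.

    servers_by_rack: dict mapping rack name -> list of chunkserver names.
    Returns a list of (rack, chunkserver) pairs.
    """
    # Copy the lists so the caller's data is not modified; sort racks for determinism.
    pools = {rack: list(servers) for rack, servers in sorted(servers_by_rack.items())}
    chosen = []
    while len(chosen) < count and any(pools.values()):
        for rack in pools:
            if pools[rack] and len(chosen) < count:
                chosen.append((rack, pools[rack].pop(0)))
    return chosen
```

With replicas on different racks, the loss of a whole rack (for example through a failed switch) still leaves a copy available, and reads can draw on the bandwidth of several racks.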
Creation, Re-replication and Balancing Chunks
• When a file is deleted, it is renamed to a hidden name. Until it is garbage collected, the file can still be read under the new, special name and can be undeleted.
• Master replication
• Replication
Data Integrity
• Storage size.
• Time.
Important Questions
1. What is NoSQL?