Assignment 4 - 044


DISTRIBUTED COMPUTING

CS3551

ASSIGNMENT – 4

NAME: Saraniya P
REG.NO: 310822243044
DEPT: AI&DS
YEAR: III
1. Issues in Failure Recovery:
Failure recovery in distributed systems is complex due to several challenges:
• Unpredictable Failures: Hardware or software failures can occur at any node,
and their impact may cascade across the system.
• Concurrency and Non-Determinism: Processes in distributed systems
operate concurrently and may exhibit non-deterministic behaviour, making
recovery nontrivial.
• Global State Consistency: It is difficult to ensure a consistent global state for
recovery because distributed systems lack a single point of control.
• Partial Failures: Some nodes may fail while others remain operational,
complicating coordination and recovery efforts.
• Communication Failures: Loss or delay of messages between nodes can lead
to inconsistent states and make recovery harder.
• Cost of Recovery: Recovery mechanisms like checkpointing and logging add
computational and storage overhead.
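
The global state consistency issue above can be made concrete with a small check. The following is a minimal sketch, not a full algorithm: it assumes each checkpoint records how many messages the process has sent to and received from each peer, and it calls a set of checkpoints consistent when no process has received more messages from a peer than that peer recorded sending (no orphan messages). The names `Checkpoint` and `is_consistent` are hypothetical, introduced only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Checkpoint:
    """Summary of a local checkpoint: per-peer message counts (illustrative)."""
    sent: dict = field(default_factory=dict)      # peer -> messages sent to peer
    received: dict = field(default_factory=dict)  # peer -> messages received from peer


def is_consistent(cut):
    """Return True if the cut (process name -> Checkpoint) has no orphan messages."""
    for p, cp in cut.items():
        for q, n_recv in cp.received.items():
            # p must not have received more from q than q recorded sending to p.
            if n_recv > cut[q].sent.get(p, 0):
                return False
    return True


# P1 sent 2 messages, P2 recorded 1 receive: one message is in transit, still consistent.
cut_ok = {"P1": Checkpoint(sent={"P2": 2}), "P2": Checkpoint(received={"P1": 1})}
# P2 recorded 2 receives but P1 recorded only 1 send: an orphan message.
cut_bad = {"P1": Checkpoint(sent={"P2": 1}), "P2": Checkpoint(received={"P1": 2})}

print(is_consistent(cut_ok), is_consistent(cut_bad))
```

Because no single node holds all of these counts, a real system has to collect them by exchanging messages, which is where the difficulty described above comes from.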

2. Algorithm for Asynchronous Checkpointing and Recovery:


Asynchronous checkpointing allows processes in a distributed system to take
checkpoints independently, avoiding the overhead of coordination.

Steps:
1. Checkpointing:
o Each process periodically saves its local state (checkpoint)
without waiting for other processes.
o Checkpoints include process state and metadata about
dependencies (e.g., sent/received messages).

2. Log Communication:
o Messages exchanged between checkpoints are logged to
ensure they can be replayed during recovery.
o Processes log messages sent and received during execution.

3. Failure Detection:
o Upon detecting a failure, the system identifies a consistent set of
checkpoints for recovery.

4. Recovery:
o Processes roll back to the checkpoints in the consistent set, which may
be older than their latest checkpoints.
o Messages received after those checkpoints are replayed from the logs to
ensure consistency.
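
The checkpointing step can be sketched as follows. This is a toy illustration under stated assumptions, not the algorithm of any particular system: the `Process` class, its running-total state, and the in-memory `checkpoints` list (standing in for stable storage) are all hypothetical. It shows a process saving its state together with dependency metadata (sent and received counts) without contacting any other process.

```python
class Process:
    """Toy process whose application state is a running total of received values."""

    def __init__(self, name):
        self.name = name
        self.total = 0
        self.sent = {}         # peer name -> number of messages sent
        self.received = {}     # peer name -> number of messages received
        self.checkpoints = []  # stands in for stable storage (assumption)

    def send(self, peer, value):
        self.sent[peer.name] = self.sent.get(peer.name, 0) + 1
        peer.deliver(self.name, value)

    def deliver(self, sender, value):
        self.received[sender] = self.received.get(sender, 0) + 1
        self.total += value

    def take_checkpoint(self):
        # No coordination: copy the local state and its dependency metadata.
        self.checkpoints.append(
            {"total": self.total,
             "sent": dict(self.sent),
             "received": dict(self.received)}
        )


p1, p2 = Process("P1"), Process("P2")
p1.send(p2, 5)
p2.take_checkpoint()   # P2 checkpoints on its own schedule
p1.send(p2, 7)
p1.take_checkpoint()   # P1 checkpoints later, independently

print(p2.checkpoints[-1])
print(p1.checkpoints[-1])
```

The two latest checkpoints disagree about the second message (P1 recorded sending it, P2 had not yet received it), which is why recovery needs the recorded counts to decide which checkpoints can be combined.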

Benefits:
• No need for global coordination, reducing latency.
• Best suited to systems where failures are rare and processes communicate
infrequently, because few dependencies then form between checkpoints.

Drawbacks:
• Risk of cascading rollbacks if dependencies among checkpoints are not
carefully managed.
• Increased storage and communication overhead for logging.
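
The cascading-rollback risk can be illustrated with a small search for a consistent recovery line. This is a simplified sketch, assuming each checkpoint stores per-peer sent and received counts and that the first checkpoint of every process is its initial state with no messages; the function name `find_recovery_line` and the data layout are hypothetical. Starting from the latest checkpoints, a process is rolled back one checkpoint whenever it has recorded more receives from a peer than that peer's chosen checkpoint recorded sends.

```python
def find_recovery_line(checkpoints):
    """checkpoints: process -> list of {"sent": {...}, "received": {...}}, oldest first.

    Index 0 is assumed to be the initial state (no messages), so the search
    always terminates. Returns process -> index of the checkpoint to restore.
    """
    line = {p: len(cps) - 1 for p, cps in checkpoints.items()}
    changed = True
    while changed:
        changed = False
        for p, cps in checkpoints.items():
            for q, n_recv in cps[line[p]]["received"].items():
                n_sent = checkpoints[q][line[q]]["sent"].get(p, 0)
                if n_recv > n_sent:      # p depends on a message q has not sent
                    line[p] -= 1         # roll p back one checkpoint and recheck
                    changed = True
                    break
    return line


initial = {"sent": {}, "received": {}}
checkpoints = {
    "P1": [initial,
           {"sent": {"P2": 1}, "received": {"P2": 1}},
           {"sent": {"P2": 1}, "received": {"P2": 2}}],
    "P2": [initial,
           {"sent": {"P1": 1}, "received": {}},
           {"sent": {"P1": 1}, "received": {"P1": 1}}],
}

print(find_recovery_line(checkpoints))
```

Here P2 restarts from its latest checkpoint, which records only one message sent to P1, so P1 must give up its own latest checkpoint even though P1 did not fail. With less favourable checkpoint placement the same loop can push every process back to its initial state.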

3. Log-Based Rollback Recovery:


Log-based rollback recovery is a mechanism to recover a system to a
consistent state by replaying logged events after a failure.

Key Concepts:
• Event Logging:
o Each process logs events such as message sends, receives, and state
changes.
o Logs are stored persistently to survive failures.

• Consistent Recovery Point:
o A consistent global state is reconstructed by rolling back processes to
their checkpoints and replaying logs.

Types:
1. Pessimistic Logging:
o Ensures logs are committed to stable storage synchronously before
proceeding, guaranteeing no loss of information.
o Low recovery overhead but high runtime overhead.

2. Optimistic Logging:
o Allows processes to proceed without waiting for logs to be committed,
reducing runtime overhead.
o Recovery is more complex: a failure can leave orphan processes whose
state depends on events that were never logged, forcing extra rollbacks.

3. Causal Logging:
o Piggybacks the information needed to replay non-deterministic events on
outgoing messages, so that logs respect causal dependencies between events.
o Balances runtime performance and recovery complexity.
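
The trade-off between the first two logging types can be sketched in a few lines. This is an illustration only, assuming a local file stands in for stable storage; the `MessageLog` class and its methods are hypothetical names. In pessimistic mode every entry is forced to disk before `record` returns, while in optimistic mode entries sit in a volatile buffer until `flush` is called. Causal logging is not shown.

```python
import os
import tempfile


class MessageLog:
    """Toy message log; a file stands in for stable storage (assumption)."""

    def __init__(self, path, pessimistic):
        self.path = path
        self.pessimistic = pessimistic
        self.buffer = []                   # volatile: lost if the process crashes

    def record(self, message):
        if self.pessimistic:
            self._write([message])         # block until the entry is on disk
        else:
            self.buffer.append(message)    # return at once, write later

    def flush(self):
        self._write(self.buffer)
        self.buffer = []

    def _write(self, messages):
        with open(self.path, "a", encoding="utf-8") as f:
            for m in messages:
                f.write(m + "\n")
            f.flush()
            os.fsync(f.fileno())           # force the data to the storage device

    def durable_entries(self):
        if not os.path.exists(self.path):
            return []
        with open(self.path, encoding="utf-8") as f:
            return f.read().splitlines()


tmp = tempfile.mkdtemp()
pess = MessageLog(os.path.join(tmp, "pess.log"), pessimistic=True)
opt = MessageLog(os.path.join(tmp, "opt.log"), pessimistic=False)
for log in (pess, opt):
    log.record("m1")
    log.record("m2")

# If a crash happened now, only the entries already on disk would survive.
print(pess.durable_entries())  # ['m1', 'm2']
print(opt.durable_entries())   # []
```

The pessimistic log pays one synchronous write per message but loses nothing; the optimistic log is cheaper at runtime but anything still in `buffer` at the time of a crash is gone.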

Steps in Log-Based Rollback Recovery:


1. Detect failure and identify affected processes.
2. Roll back each affected process to its latest checkpoint.
3. Replay logged events to restore the system to a consistent state.
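
Steps 2 and 3 can be sketched as a replay over a deterministic state transition. This is a minimal example under the assumption that the process state is a single number and that every logged event is deterministic; the functions `apply_event` and `recover` and the event format are hypothetical.

```python
def apply_event(state, event):
    """Deterministic transition: the same state and event always give the same result."""
    kind, amount = event
    return state + amount if kind == "deposit" else state - amount


def recover(checkpoint_state, logged_events):
    """Restore the checkpoint, then replay the events logged after it, in order."""
    state = checkpoint_state
    for event in logged_events:
        state = apply_event(state, event)
    return state


checkpoint_state = 100                               # state saved at the last checkpoint
event_log = [("deposit", 50), ("withdraw", 30)]      # events logged after the checkpoint

print(recover(checkpoint_state, event_log))  # 120
```

Replay reproduces the pre-failure state only because each transition is deterministic; non-deterministic events have to be logged with enough detail to repeat their outcome.
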
Advantages:
• Provides precise recovery by replaying only necessary events.
• Can tolerate multiple simultaneous failures if logs are intact.

Challenges:
• Managing and storing logs efficiently in large-scale systems.
• Ensuring that logs capture all necessary events for recovery without
excessive overhead.
