Distributed Systems Simulation and Modeling (Group 19)
Presented by Group 19
GROUP 19 COE 453 - Distributed Computing
Introduction to Distributed Systems Simulation and Modeling
Purpose
• Performance testing
• Bottleneck detection
• Scalability
• Fault tolerance
• Optimization
Techniques
• Replay-based
• Synthetic
• Benchmarking tools (e.g., JMeter, Locust)
Characteristics
• Request arrival rate
• Service time
• Concurrency
• Resource usage
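As a rough illustration of how these characteristics feed a synthetic workload model, the sketch below simulates requests arriving at a pool of identical servers. It is a minimal sketch under stated assumptions, not the method used in the slides: arrivals are assumed Poisson, service times exponential, scheduling first-come-first-served, and the function name `simulate_workload` and all parameter values are hypothetical.

```python
import heapq
import random


def simulate_workload(arrival_rate, mean_service_time, servers, num_requests, seed=0):
    """Simulate a first-come-first-served queue with `servers` identical workers.

    arrival_rate      -- average requests per second (assumed Poisson arrivals)
    mean_service_time -- average seconds of work per request (assumed exponential)
    servers           -- concurrency: how many requests can be served at once
    """
    rng = random.Random(seed)
    free_at = [0.0] * servers          # min-heap: time at which each server is next free
    heapq.heapify(free_at)
    now = 0.0
    total_wait = 0.0
    busy_time = 0.0
    for _ in range(num_requests):
        now += rng.expovariate(arrival_rate)               # next arrival time
        service = rng.expovariate(1.0 / mean_service_time)
        earliest = heapq.heappop(free_at)                  # server that frees up first
        start = max(now, earliest)                         # wait if all servers are busy
        total_wait += start - now
        busy_time += service
        heapq.heappush(free_at, start + service)
    makespan = max(free_at)
    return {
        "mean_wait": total_wait / num_requests,
        "utilization": busy_time / (servers * makespan),   # resource usage, 0..1
    }


# Hypothetical workload: 50 requests/sec, 50 ms per request, 4 servers.
print(simulate_workload(arrival_rate=50, mean_service_time=0.05, servers=4, num_requests=10_000))
```

Varying `servers` or `arrival_rate` in a model like this is one simple way to look for the point where waiting time starts to grow sharply.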
(Cont’d)
Instead of solving a complicated math formula, we run a Monte Carlo simulation:
1. Randomly generate millions of fake hacking attempts.
1 | 8 chars | 1M attempts/sec | 5 hours
2 | 8 chars | 5M attempts/sec | 1 hour
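The Monte Carlo idea above can be sketched in a few lines. This is an illustrative sketch only: it assumes the target password is equally likely to sit anywhere in the attacker's search order, and the keyspace (8 lowercase letters) is an assumption that is not stated on the slide, so the absolute times it prints are not expected to match the table. What it does show is the scaling: raising the attempt rate fivefold cuts the estimated time to one fifth, as in the table. The function name `simulate_crack_times` is hypothetical.

```python
import random


def simulate_crack_times(keyspace_size, attempts_per_sec, trials, seed=0):
    """Return simulated brute-force times (seconds), one per random trial."""
    rng = random.Random(seed)
    times = []
    for _ in range(trials):
        # Assumption: the password's position in the search order is uniform,
        # so the number of attempts needed is uniform on 1..keyspace_size.
        attempts_needed = rng.randint(1, keyspace_size)
        times.append(attempts_needed / attempts_per_sec)
    return times


KEYSPACE = 26 ** 8  # assumption: 8 characters, lowercase letters only

for rate in (1_000_000, 5_000_000):
    times = simulate_crack_times(KEYSPACE, rate, trials=100_000)
    mean_hours = sum(times) / len(times) / 3600
    print(f"{rate:>9,} attempts/sec: mean time about {mean_hours:.1f} hours")
```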
Overhead | Low (event-driven) | Low to moderate | Moderate (sampling-based)
EdgeCloudSim | Models edge computing architectures | Simulates edge and cloud resource distribution | Testing edge offloading strategies
Network Partition | Some parts of the system can't communicate | WhatsApp message delays | Replication, retries, partition-tolerant databases
Hardware Crash | A server, disk, or memory fails | Bank ATM server crash | Backups, failover mechanisms, redundant power
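To see why retries appear as a mitigation for communication failures, the sketch below estimates how often a message gets through when each send attempt can be dropped. It is a simplified model under stated assumptions: drops are treated as independent with a fixed probability, whereas during a real partition failures are usually correlated in time. The function name `delivery_rate` and the probabilities are hypothetical.

```python
import random


def delivery_rate(drop_prob, max_attempts, trials, seed=0):
    """Fraction of messages delivered when each attempt is dropped with drop_prob."""
    rng = random.Random(seed)
    delivered = 0
    for _ in range(trials):
        for _ in range(max_attempts):
            if rng.random() >= drop_prob:   # this attempt got through
                delivered += 1
                break                       # stop retrying once delivered
    return delivered / trials


for attempts in (1, 2, 3):
    simulated = delivery_rate(drop_prob=0.3, max_attempts=attempts, trials=50_000)
    expected = 1 - 0.3 ** attempts          # closed form under the independence assumption
    print(f"{attempts} attempt(s): simulated {simulated:.3f}, expected {expected:.3f}")
```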
Chaos Monkey | Randomly shuts down services | Microservices & cloud resilience | Shuts down transaction services to test failover mechanisms
Gremlin | Simulates network failures (latency, dropped connections) | Network stability testing | Delays customer transactions to check retry mechanisms
AWS FIS | Simulates AWS outages & resource failures | AWS cloud applications | Tests if the system recovers from an AWS S3 storage failure
Let’s consider an online banking system that handles transactions, account management, and fraud
detection. The bank’s infrastructure is cloud-based and runs on AWS and Kubernetes with multiple
microservices.
To test its fault tolerance, the bank can combine Chaos Monkey, Gremlin, AWS FIS, and LitmusChaos in
the following ways:
Step 1: Test Microservices Resilience with Chaos Monkey
Step 2: Simulate Network Failures Using Gremlin
Step 3: Test Cloud Service Failures with AWS FIS
Step 4: Test Kubernetes-Based Services with LitmusChaos
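The idea behind Step 1 can be illustrated with a toy experiment written in plain Python. This sketch does not use the API of Chaos Monkey or any other tool named above; it only mimics the concept of randomly shutting down service instances and checking whether requests still succeed through failover. The names `Replica`, `send_with_failover`, and `run_experiment` are hypothetical, and failed replicas are never restarted, which a real deployment would normally do.

```python
import random


class Replica:
    """A hypothetical instance of a transaction service."""

    def __init__(self, name):
        self.name = name
        self.healthy = True

    def handle(self, request_id):
        if not self.healthy:
            raise ConnectionError(f"{self.name} is down")
        return f"{self.name} processed request {request_id}"


def send_with_failover(replicas, request_id):
    """Try each replica in turn; return the first successful response or None."""
    for replica in replicas:
        try:
            return replica.handle(request_id)
        except ConnectionError:
            continue                      # fail over to the next replica
    return None


def run_experiment(num_replicas, num_requests, kill_prob, seed=0):
    """Return the fraction of requests served while replicas are randomly shut down."""
    rng = random.Random(seed)
    replicas = [Replica(f"txn-{i}") for i in range(num_replicas)]
    served = 0
    for request_id in range(num_requests):
        if rng.random() < kill_prob:
            # Fault injection: shut down a random replica (it may already be down).
            rng.choice(replicas).healthy = False
        if send_with_failover(replicas, request_id) is not None:
            served += 1
    return served / num_requests


for n in (1, 3, 5):
    print(f"{n} replica(s): availability {run_experiment(n, 1000, kill_prob=0.002):.3f}")
```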
By combining these tools, the bank ensures that its system can handle failures from multiple angles—from
random shutdowns to cloud outages—keeping transactions secure and minimizing downtime.