1-Introduction
Motivating Parallelism
• Traditionally, software has been written for serial computation:
• To be run on a single computer having a single Central Processing Unit (CPU);
• A problem is broken into a discrete series of instructions that are executed sequentially.
• Only one instruction may execute at any moment in time.
• Any concurrency is only apparent, achieved through multitasking (time-sharing) on the single CPU
• For example:
• Weather simulations
• Gaming
• Web servers
• Code breaking
Motivating Parallelism
• In the simplest sense, parallel computing is the simultaneous use of multiple
compute resources to solve a computational problem
• To be run using multiple CPUs
• A problem is broken into discrete parts that can be solved in parallel
• Each part is further broken down to a series of instructions
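As a concrete illustration, a minimal Python sketch of this decomposition (the problem, chunk sizes, and process count are illustrative assumptions): a large sum is broken into parts, each part is summed in parallel, and the partial results are combined.

```python
# Minimal sketch: one problem (a large sum) broken into discrete parts
# that are solved in parallel on multiple CPU cores, then combined.
from multiprocessing import Pool

def partial_sum(chunk):
    # Each worker executes the same series of instructions on its own part.
    return sum(chunk)

if __name__ == "__main__":
    data = list(range(10_000_000))
    n_parts = 4
    size = len(data) // n_parts
    chunks = [data[i * size:(i + 1) * size] for i in range(n_parts)]
    chunks[-1].extend(data[n_parts * size:])   # last chunk takes any remainder

    with Pool(processes=n_parts) as pool:
        partials = pool.map(partial_sum, chunks)   # parts solved concurrently
    print(sum(partials))                           # partial results combined
```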
Motivating Parallelism
• The role of parallelism in accelerating computing speeds has been recognized
for several decades
• The scalable performance and lower cost of parallel platforms are reflected in the
wide variety of applications
Motivating Parallelism
• Developing parallel hardware and software has traditionally been time and effort
intensive.
• Latest trends in hardware design indicate that uniprocessors may not be able to
sustain the rate of realizable performance increments in the future.
The Computational Power Argument
• If one is to buy into Moore’s law, the question still remains – how does one
translate transistors into useful OPS (operations per second)?
• Most serial (or seemingly serial) processors rely extensively on implicit parallelism.
• The focus of this class, for the most part, is on explicit parallelism.
The Computational Power Argument
• Why doesn't doubling the number of transistors double the speed?
• The increase in transistors per processor now comes largely from adding cores (multi-core CPUs)
• In other words, to keep following Moore's law, manufacturers had to move to ultra-large-scale
integration and the multi-core era
The Computational Power Argument
• So, we must look for efficient parallel software solutions to fulfill our future
computational needs
• Solution(s)?
• Need to find more scalable distributed and hybrid solutions
The Memory/Disk Speed Argument
• While clock rates of high-end processors have increased at roughly 40% per
year over the past decade, DRAM access times have only improved at the
rate of roughly 10% per year over this interval
• This mismatch in speeds causes significant performance bottlenecks
• Some of the fastest growing applications of parallel computing exploit not raw
computational speed, but rather the ability of parallel platforms to pump data to and from
memory and disk faster
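A rough back-of-the-envelope calculation shows how quickly this mismatch compounds, assuming the growth rates above hold for ten years:

\[
\left(\frac{1.40}{1.10}\right)^{10} \approx 11
\]

That is, the processor-memory speed gap grows by roughly an order of magnitude per decade, which is one reason the ability to move data matters as much as raw compute.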
The Data Communication Argument
• As the network evolves, the vision of the Internet as one large computing
platform has emerged
• In many applications (typically databases and data mining) the volume of data
is such that it cannot be moved
• Any analyses on this data must be performed over the network using parallel
techniques
Distributed Computing vs. Distributed Systems
Distributed Systems
• A collection of autonomous computers, connected through a network and
distribution middleware
• This enables computers to coordinate their activities and to share the resources of
the system
• The system is usually perceived as a single, integrated computing facility
• Mostly concerned with hardware-based acceleration
Distributed Computing
• A specific use of distributed systems: splitting a large, complex computation into
subparts and executing them in parallel to increase productivity
• Mainly concerned with software-based acceleration (i.e., designing and
implementing algorithms)
Parallel and Distributed Computing
Parallel (shared-memory) Computing
• The term is usually used for developing concurrent solutions for the following two
types of systems:
• Multi-core architectures
• Many-core architectures (e.g., GPUs)
Distributed Computing
• This type of computing is mainly concerned with developing algorithms for the
distributed cluster systems
• Here, distributed means the computers are separated by geographical distance and
have no shared memory
Scope of Parallel Computing Applications
• Parallelism is used in very diverse application domains, for different
motivating reasons
• These range from improved application performance to cost considerations
Scientific Applications
• Functional and structural characterization of genes and proteins
• Large-scale servers (mail and web servers) are often implemented using
parallel platforms
Applications in Computer Systems
• Network intrusion detection: A large amount of data needs to be analyzed and
processed
• Graphic processing
Von Neumann Architecture
• Named after the Hungarian mathematician John von Neumann, who first
authored the general requirements for an electronic computer in his 1945 paper.
• Since then, virtually all computers have followed this basic design.
• Parallel computers still follow this basic design, just multiplied in units. The
basic, fundamental architecture remains the same.
Flynn's Classical Taxonomy
• There are different ways to classify parallel computers. One of the more widely
used classifications, in use since 1966, is called Flynn's Taxonomy.
• SISD: Single Instruction, Single Data
• SIMD: Single Instruction, Multiple Data
• MISD: Multiple Instruction, Single Data
• MIMD: Multiple Instruction, Multiple Data
Single Instruction, Single Data (SISD)
• A serial (non-parallel) computer
• Deterministic execution
Single Instruction, Multiple Data (SIMD)
• Single instruction: All processing units execute the same instruction at any given
clock cycle
• Multiple data: Each processing unit can operate on a different data element
• Best suited for specialized problems characterized by a high degree of regularity, such
as graphics/image processing.
• Most modern computers, particularly those with graphics processing units (GPUs),
employ SIMD instructions and execution units.
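A minimal sketch of the SIMD idea, using NumPy for illustration (an assumption: NumPy's vectorized operations are typically dispatched to SIMD execution units on modern CPUs):

```python
# SIMD-style data parallelism: one operation applied to many data elements at once.
import numpy as np

a = np.arange(1_000_000, dtype=np.float32)
b = np.ones(1_000_000, dtype=np.float32)

# A single vectorized operation acting on multiple data elements,
# instead of an explicit Python loop handling one element per step.
c = a * 2.0 + b
print(c[:5])   # [1. 3. 5. 7. 9.]
```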
Multiple Instruction, Single Data (MISD)
• A type of parallel computer
• Multiple Instruction: Each processing unit operates on the data independently,
via separate instruction streams.
• Single Data: A single data stream is fed into multiple processing units.
• Few (if any) actual examples of this class of parallel computer have ever existed.
• Some conceivable uses might be:
• multiple frequency filters operating on a single signal stream
• multiple cryptography algorithms attempting to crack a single coded message.
Multiple Instruction, Multiple Data (MIMD)
• Currently, the most common type of parallel
computer. Most modern computers fall into this
category.
• Parallel Execution: Execution of a program by more than one task, with each
task being able to execute the same or different statement at the same moment
in time.
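A minimal MIMD-flavoured sketch in Python (the task names and inputs are illustrative assumptions): two processes execute different instruction streams on different data at the same time.

```python
# MIMD sketch: independent tasks run different code on different data concurrently.
from concurrent.futures import ProcessPoolExecutor

def word_count(text):
    return len(text.split())

def checksum(numbers):
    return sum(numbers) % 65521

if __name__ == "__main__":
    with ProcessPoolExecutor(max_workers=2) as ex:
        # Two different functions (instruction streams) on two different inputs (data streams).
        f1 = ex.submit(word_count, "parallel and distributed computing")
        f2 = ex.submit(checksum, range(1_000))
        print(f1.result(), f2.result())
```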
Design Goals of Parallel and Distributed Computing
• Performance Improvement
• Speedup: Execute tasks faster by dividing the workload among multiple processors (quantified below).
• High Throughput: Perform many tasks concurrently.
• Low Latency: Reduce time to complete individual tasks through parallel execution.
• Scalability
• Horizontal Scalability: Add more nodes/processors without significant
reconfiguration.
• Vertical Scalability: Increase the power of existing nodes.
• Load Balancing: Efficiently distribute work across available resources to prevent
bottlenecks.
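The speedup mentioned above is usually quantified with the standard definitions below, where T_1 is the execution time on one processor and T_p the time on p processors:

\[
S(p) = \frac{T_1}{T_p}, \qquad E(p) = \frac{S(p)}{p}
\]

Ideal (linear) speedup means S(p) = p and efficiency E(p) = 1; real programs fall short of this because of communication, synchronization, and serial portions of the code.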
Design Goals of Parallel and Distributed Computing
• Reliability and Fault Tolerance
• Redundancy: Ensure backup systems are available in case of failure.
• Replication: Duplicate critical data and services across nodes.
• Recovery Mechanism: Enable quick recovery after failures through checkpointing
and rollback mechanisms.
Design Goals of Parallel and Distributed Computing
• Transparency
• Access Transparency: Users do not need to know the physical location of
resources.
• Location Transparency: Resource location changes are hidden from users.
• Replication Transparency: Users are unaware of the presence of multiple copies.
• Concurrency Transparency: Multiple processes execute simultaneously without
conflict.
• Failure Transparency: Failures are masked from users to ensure uninterrupted
service.
• Cost Effectiveness
• Utilize cost-effective commodity hardware.
• Reduce operational costs through optimized parallel and distributed processing.
Types of Parallel Systems
• Shared Memory Systems:
• Multiple processors share a single address space (RAM)
• Communication happens through shared variables
• Examples:
• symmetric multiprocessing (SMP)
• NUMA (non-uniform memory access)
• Pros: Low communication overhead, easy programming
• Cons: Scalability issues, memory access bottlenecks
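A minimal sketch of communication through a shared variable (Python threads are used only for illustration; a shared-memory program on an SMP/NUMA machine would more typically use OpenMP or pthreads in C/C++):

```python
# Shared-memory sketch: threads in one address space communicate through
# a shared variable, protected by a lock to avoid lost updates.
import threading

counter = 0                    # shared variable in a single address space
lock = threading.Lock()

def worker(n_increments):
    global counter
    for _ in range(n_increments):
        with lock:             # serialize access to the shared variable
            counter += 1

threads = [threading.Thread(target=worker, args=(100_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)                 # 400000 with the lock; unpredictable without it
```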
Types of Distributed Systems
• Grid Computing
• Uses geographically distributed computers to perform large-scale computations.
• Typically, loosely coupled compared to clusters.
• Cloud Computing
• Uses virtualized distributed resources over the Internet.
• Provides services like IaaS, PaaS, SaaS.
• Example: AWS, Google Cloud, Microsoft Azure.
Types of Distributed Systems
• Peer-to-Peer (P2P) Systems
• No central authority; all nodes are equal and communicate directly.
• Used in file sharing, cryptocurrencies, decentralized networks.
• Example: BitTorrent, Ethereum blockchain.
Comparison of Parallel vs. Distributed Systems
Enabling Technologies for PDC
• Hardware:
• Multicore Processors
• CPUs with multiple cores for parallel execution (e.g., Intel Xeon, AMD Ryzen).
• High-Speed Networks: Networks that facilitate fast data transfer between computing
nodes in distributed systems.
• InfiniBand, Ethernet, RDMA for fast inter-node communication.
• Storage Technologies: Storage solutions designed for fast access and distributed data
management.
Enabling Technologies for PDC
• Middleware and Communication:
• Message Passing Interface (MPI)
• A standard for message passing between processes in parallel programs
(implementations include OpenMPI and MPICH; see the sketch after this list).
• Distributed File Systems: File systems that allow multiple machines to access shared
storage.
• HDFS, Google File System (GFS), Ceph.
• Cloud Computing:
• Cloud computing provides on-demand computing resources over the internet, allowing flexible
and scalable computing power.
• Amazon Web Services (AWS) – EC2, S3, Lambda, EMR.
• Microsoft Azure – Azure Virtual Machines, Kubernetes Service.
• Google Cloud Platform (GCP) – Compute Engine, Kubernetes, BigQuery.
• Distributed Databases:
• NoSQL Databases: Apache Cassandra, MongoDB, Google Bigtable.
• NewSQL Databases: Google Spanner, CockroachDB.
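To make the MPI bullet above concrete, here is a minimal message-passing sketch using mpi4py (an assumption; MPI programs are just as commonly written in C or Fortran). It needs at least two ranks, e.g. mpirun -n 2 python hello_mpi.py:

```python
# Minimal MPI sketch: rank 0 sends a Python object that rank 1 receives.
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

if rank == 0:
    comm.send({"greeting": "hello from rank 0"}, dest=1, tag=0)
elif rank == 1:
    msg = comm.recv(source=0, tag=0)
    print("rank 1 received:", msg)
```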
Summary: Technologies, Platforms, and Software
• Hardware: Multicore CPUs, GPUs (NVIDIA, AMD), FPGAs, InfiniBand
• Middleware: MPI, RPC, gRPC, Distributed File Systems (HDFS, Ceph)
• Cloud Platforms: AWS, Azure, Google Cloud, OpenStack
• Big Data: Apache Hadoop, Spark, Google BigQuery
• Edge Computing: NVIDIA Jetson, AWS Greengrass, OpenVINO