
CSC 334 – Parallel and Distributed Computing

Instructor: Ms. Muntha Amjad


Lecture# 01: Introduction

Motivating Parallelism
• Traditionally, software has been written for serial computation:
• To be run on a single computer having a single Central Processing Unit (CPU);
• A problem is broken into a discrete series of instructions that are executed sequentially.
• Only one instruction may execute at any moment in time.
• Performance is achieved by executing instructions concurrently through multitasking

• Uniprocessors are fast but:


• Some problems require too much computation
• Some problems use too much data
• Some problems have too many parameters to explore

• For example:
• Weather simulations
• Gaming
• Web servers
• Code breaking
Motivating Parallelism
• In the simplest sense, parallel computing is the simultaneous use of multiple
compute resources to solve a computational problem
• To be run using multiple CPUs
• A problem is broken into discrete parts that can be solved in parallel
• Each part is further broken down to a series of instructions

• Instructions from each part execute simultaneously on different CPUs
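As a quick illustration (a minimal sketch of our own, not part of the original slides, assuming a standard Python 3 installation), the snippet below breaks a problem into discrete parts and executes the parts simultaneously on different CPUs:

from concurrent.futures import ProcessPoolExecutor

def partial_sum(chunk):
    # One "part" of the problem: its own series of instructions on one CPU.
    return sum(x * x for x in chunk)

if __name__ == "__main__":
    data = list(range(1_000_000))
    chunks = [data[i::4] for i in range(4)]           # break into discrete parts
    with ProcessPoolExecutor(max_workers=4) as pool:  # run parts on separate CPUs
        total = sum(pool.map(partial_sum, chunks))
    print(total)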

Motivating Parallelism
• The role of parallelism in accelerating computing speeds has been recognized
for several decades

• Its role in providing multiplicity of datapaths and increased access to storage
elements has been significant in commercial applications.

• The scalable performance and lower cost of parallel platforms is reflected in the
wide variety of applications

Motivating Parallelism
• Developing parallel hardware and software has traditionally been time and effort
intensive.

• If one is to view this in the context of rapidly improving uniprocessor speeds, one
is tempted to question the need for parallel computing.

• Latest trends in hardware design indicate that uniprocessors may not be able to
sustain the rate of realizable performance increments in the future.

• This is the result of several fundamental physical and computational limitations.

• The emergence of standardized parallel programming environments, libraries, and
hardware has significantly reduced the time to develop (parallel) solutions.
The Computational Power Argument
• Moore’s Law states (1965):
• The number of transistors incorporated on a microchip doubles approximately every
two years, while the cost of computers is halved.

The Computational Power Argument
• If one is to buy into Moore’s law, the question still remains – how does one
translate transistors into useful OPS (operations per second)?

• The logical recourse is to rely on parallelism, both implicit and explicit.

• Most serial (or seemingly serial) processors rely extensively on implicit parallelism.

• The focus of this class, for the most part, is on explicit parallelism.

The Computational Power Argument
• Why does doubling the number of transistors not double the speed?
• The increase in transistors per processor is largely due to multi-core CPUs
• This means that, to keep following Moore’s law, companies had to introduce
ultra-large-scale integration and the multi-core processing era

• Will Moore’s law hold forever?
• Adding multiple cores on a single chip causes heat issues
• Furthermore, increasing the number of cores may not increase speeds, due to
inter-process interactions
• Moreover, transistors would eventually reach the limit of miniaturization at the
atomic level

The Computational Power Argument
• So, we must look for efficient
parallel software solutions to fulfill
our future computational needs

• As stated earlier, the number of cores on a single chip also has some restrictions

• Solution(s)?
• Need to find more scalable
distributed and hybrid solutions

The Memory/Disk Speed Argument
• While clock rates of high-end processors have increased at roughly 40% per
year over the past decade, DRAM access times have only improved at the
rate of roughly 10% per year over this interval
• This mismatch in speeds causes significant performance bottlenecks

• Parallel platforms provide increased bandwidth to the memory systems

• Parallel platforms also provide higher aggregate caches

• Some of the fastest growing applications of parallel computing utilize not their
raw computational speed but rather their ability to pump data to memory and disk
faster

The Data Communication Argument
• As the network evolves, the vision of the Internet as one large computing
platform has emerged
• In many applications (typically databases and data mining) the volume of data
is such that they cannot be moved
• Any analyses on this data must be performed over the network using parallel
techniques

Computing vs. Systems
Distributed Systems
• A collection of autonomous computers, connected through a network and
distribution middleware
• This enables computers to coordinate their activities and to share the resources of
the system
• The system is usually perceived as a single, integrated computing facility
• Mainly concerned with hardware-based acceleration

Distributed Computing
• A specific use of distributed systems: splitting a large, complex computation into
subparts and executing them in parallel to increase productivity
• Computing is mainly concerned with software-based acceleration (i.e., designing and
implementing algorithms)
Parallel and Distributed Computing
Parallel (shared-memory) Computing
• The term is usually used for developing concurrent solutions for the following two
types of systems:
• Multi-core Architecture
• Many-core Architectures (i.e., GPUs)

Distributed Computing
• This type of computing is mainly concerned with developing algorithms for the
distributed cluster systems
• Here, distributed means that the computers are geographically separated and have
no shared memory
Scope of Parallel Computing Applications
• Parallelism finds applications in very diverse application domains for different
motivating reasons
• These range from improved application performance to cost considerations

Scientific Applications
• Functional and structural characterization of genes and proteins

• Applications in astrophysics have explored the evolution of galaxies, thermonuclear
processes, and the analysis of extremely large datasets from telescopes

• Advances in computational physics and chemistry have explored new materials,
understanding of chemical pathways, and more efficient processes

• Bioinformatics and astrophysics also present some of the most challenging problems
with respect to analyzing extremely large datasets

• Weather modeling, mineral prospecting, flood prediction, etc., are other important
applications
Commercial Applications
• Some of the largest parallel computers power Wall Street

• Data mining analysis for optimizing business and marketing decisions

• Large scale servers (mail and web servers) are often implemented using
parallel platforms

• Applications such as information retrieval and search are typically powered by
large clusters
Applications in Computer Systems
• Network intrusion detection: A large amount of data needs to be analyzed and
processed

• Cryptography (the art of writing or solving codes) employs parallel infrastructures
and algorithms to solve complex codes

• Graphic processing

• A modern automobile consists of tens of processors communicating to perform complex
tasks for optimizing handling and performance
Von Neumann Architecture
• Named after the Hungarian mathematician John von Neumann who first
authored the general requirements for an electronic computer in his 1945 papers.

• Also known as "stored-program computer" – both program instructions and data are
kept in electronic memory. Differs from earlier computers which were programmed
through "hard wiring".

• Since then, virtually all computers have followed this basic design.

• Comprised of four main components:


• Memory
• Control Unit
• Arithmetic Logic Unit
• Input/Output
Von Neumann Architecture
• Basic Design
• Read/write random-access memory is used to store both program instructions and data
• Program instructions are coded data which tell the computer to do something
• Data is simply information to be used by the program
• Control unit fetches instructions/data from memory, decodes the instructions and then
sequentially coordinates operations to accomplish the programmed task.
• Arithmetic Logic Unit performs basic arithmetic (and logic) operations
• Input/Output is the interface to the human operator

• Parallel computers still follow this basic design, just multiplied in units. The
basic, fundamental architecture remains the same.

Flynn's Classical Taxonomy
• There are different ways to classify parallel computers. One of the more widely
used classifications, in use since 1966, is called Flynn's Taxonomy.

• Flynn's taxonomy distinguishes multi-processor computer architectures according to
how they can be classified along the two independent dimensions of Instruction and
Data. Each of these dimensions can have only one of two possible states: Single or
Multiple.

• The Flynn matrix below defines the 4 possible classifications:

SISD – Single Instruction, Single Data   | SIMD – Single Instruction, Multiple Data
MISD – Multiple Instruction, Single Data | MIMD – Multiple Instruction, Multiple Data
Single Instruction, Single Data (SISD)
• A serial (non-parallel) computer

• Single instruction: only one instruction stream is being acted on by the CPU during
any one clock cycle

• Single data: only one data stream is being used as input during any one clock cycle

• Deterministic execution

• This is the oldest and, until recently, the most prevalent form of computer

• Examples: most PCs, single-CPU workstations and mainframes
Single Instruction, Multiple Data (SIMD)
• A type of parallel computer

• Single instruction: All processing units execute the same instruction at any given
clock cycle

• Multiple data: Each processing unit can operate on a different data element

• This type of machine typically has an instruction dispatcher, a very high-bandwidth
internal network, and a very large array of very small-capacity instruction units.

• Best suited for specialized problems characterized by a high degree of regularity, such
as graphics/image processing.

• Synchronous (lockstep) and deterministic execution


• Two varieties: Processor Arrays and Vector Pipelines
Single Instruction, Multiple Data (SIMD)
• Examples:
• Processor Arrays: Thinking Machines CM-2, MasPar MP-1 & MP-2
• Vector Pipelines: IBM 9000, Cray C90, Fujitsu VP, NEC SX-2, Hitachi S820

• Most modern computers, particularly those with graphics processing units (GPUs),
employ SIMD instructions and execution units.
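For instance, the following sketch (our own example using NumPy, an assumption on our part rather than one of the machines listed above) applies a single logical operation across many data elements at once; such vectorized operations typically map onto the CPU's SIMD units:

import numpy as np

a = np.arange(1_000_000, dtype=np.float32)
b = np.ones_like(a)
c = a + b          # one "add" operation applied to multiple data elements
print(c[:5])       # [1. 2. 3. 4. 5.]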
Multiple Instruction, Single Data (MISD)
• A type of a parallel computer
• Multiple Instruction: Each processing unit operates on the data independently via
separate instruction streams.
• Single Data: A single data stream is fed into multiple processing units.
• Few (if any) actual examples of this class of parallel computer have ever existed.
• Some conceivable uses might be:
• multiple frequency filters operating on a single signal stream
• multiple cryptography algorithms attempting to crack a single coded message.

Multiple Instruction, Multiple Data (MIMD)
• Currently, the most common type of parallel
computer. Most modern computers fall into this
category.

• Multiple Instruction: every processor may be executing a different instruction
stream

• Multiple Data: every processor may be working with a different data stream

• Execution can be synchronous or asynchronous, deterministic or non-deterministic

• Examples: most current supercomputers, networked parallel computer "grids",
multi-processor SMP computers, and multi-core PCs
General Parallel Computing Terminology
• CPU: A modern CPU has one or more cores, and each core runs its own stream of
instructions. A system may have multiple CPU sockets, each with its own attached
memory; in such systems, memory can be shared across the sockets.
• Node: A node is like a separate computer that has multiple CPUs, memory, and
network connections. Many nodes together form a supercomputer by
connecting through a network.
• Task: A logically discrete section of computational work. A task is typically a
program or program-like set of instructions that is executed by a processor. A
parallel program consists of multiple tasks running on multiple processors.

• Pipelining: Breaking a task into steps performed by different processor units, with
inputs streaming through, much like an assembly line; a type of parallel computing.
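A minimal sketch of pipelining (our own example; the two stages and the queue are hypothetical) in which inputs stream through two steps running on different workers, assembly-line style:

import threading, queue

stage1_out = queue.Queue()

def stage1(items):
    for x in items:
        stage1_out.put(x * 2)     # first step of the task
    stage1_out.put(None)          # sentinel: end of the input stream

def stage2():
    while True:
        x = stage1_out.get()
        if x is None:
            break
        print(x + 1)              # second step, overlapped with stage 1

t1 = threading.Thread(target=stage1, args=(range(5),))
t2 = threading.Thread(target=stage2)
t1.start(); t2.start()
t1.join(); t2.join()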
General Parallel Computing Terminology
• Serial Execution: Execution of a program sequentially, one statement at a
time. In the simplest sense, this is what happens on a one processor machine.
However, virtually all parallel tasks will have sections of a parallel program that
must be executed serially.

• Parallel Execution: Execution of a program by more than one task, with each
task being able to execute the same or different statement at the same moment
in time.

• Shared Memory: From a strictly hardware point of view, describes a computer
architecture where all processors have direct (usually bus-based) access to common
physical memory. In a programming sense, it describes a model where parallel tasks
all have the same "picture" of memory and can directly address and access the same
logical memory locations regardless of where the physical memory actually exists.
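A minimal sketch of the shared-memory model (our own example using standard Python threads): all tasks see the same logical memory location and coordinate their updates through a lock:

import threading

counter = 0               # one logical memory location shared by all tasks
lock = threading.Lock()   # coordination so that concurrent updates do not race

def work(n):
    global counter
    for _ in range(n):
        with lock:
            counter += 1

threads = [threading.Thread(target=work, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)            # 40000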
General Parallel Computing Terminology
• Distributed Memory: In hardware, refers to network-based memory access for
physical memory that is not common. As a programming model, tasks can only
logically "see" local machine memory and must use communications to access
memory on other machines where other tasks are executing.

• Communications: Parallel tasks typically need to exchange data. There are several
ways this can be accomplished, such as through a shared memory bus or over a
network; however, the actual event of data exchange is commonly referred to as
communications regardless of the method employed.
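A hedged sketch of such data exchange in the distributed-memory style, assuming the mpi4py package and an MPI implementation are installed (it would be launched with something like mpiexec -n 2 python example.py, where the file name is hypothetical):

from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

if rank == 0:
    data = {"payload": list(range(10))}
    comm.send(data, dest=1, tag=11)      # explicit data exchange between tasks
elif rank == 1:
    data = comm.recv(source=0, tag=11)
    print("rank 1 received", data)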

• Synchronization: The coordination of parallel tasks in real time, very often
associated with communications. Often implemented by establishing a synchronization
point within an application where a task may not proceed further until another
task(s) reaches the same or logically equivalent point. Synchronization usually
involves waiting by at least one task and can therefore cause a parallel
application's wall clock execution time to increase.
General Parallel Computing Terminology
• Computational Granularity: Granularity refers to how much computation is
done before a task needs to communicate with other tasks. It is the ratio of
computation to communication in a parallel program.
• There are two types:
• Coarse-Grained: Large chunks of computation are done before communication
happens.
• Pros: Less communication overhead, more efficiency.
• Cons: Less flexibility in load balancing.
• Fine-Grained: Small amounts of computation happen before communication.
• Pros: Better load balancing, suitable for highly dynamic tasks.
• Cons: High communication overhead, may slow down execution.
• Observed Speedup: This measures how much faster a program runs when executed in
parallel. The observed speedup of a code which has been parallelized is defined as:

  Observed Speedup = wall-clock time of serial execution / wall-clock time of parallel execution
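A tiny worked example with illustrative numbers (not taken from the slides):

serial_time = 100.0       # wall-clock seconds for the serial run
parallel_time = 25.0      # wall-clock seconds for the parallel run
observed_speedup = serial_time / parallel_time
print(observed_speedup)   # 4.0, i.e., the parallel version runs 4x faster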
General Parallel Computing Terminology
• Parallel Overhead: Extra execution time required only for parallel tasks (not useful
computations).
• Cause: Parallelization introduces additional tasks that don't exist in serial execution.
• Factors Contributing to Overhead:
• Task start-up time: Time needed to initialize parallel tasks.
• Synchronizations: Time spent waiting for tasks to reach a common point.
• Data communications: Time taken to exchange data between processors.
• Software overhead: Extra processing due to parallel programming languages, libraries, and
OS management.
• Task termination time: Time spent in wrapping up parallel execution.
• Massively Parallel: A computing system with many processing elements working
together. Earlier, "many" meant hundreds of processors; today, supercomputers have
hundreds of thousands to millions of processors.
• Examples: GPU clusters with thousands of cores.
General Parallel Computing Terminology
• Embarrassingly (IDEALLY) Parallel: A problem that can be easily parallelized
because tasks are independent and require little or no communication. No
synchronization or coordination needed between tasks.

• Scalability: The ability of a parallel system (hardware/software) to increase
speedup proportionally when more resources (processors) are added. Factors that
contribute to scalability include:
• Hardware limitations – CPU/memory bandwidth and network speed.
• Application algorithm – Some algorithms parallelize well, others do not.
• Parallel overhead – Too much overhead can limit scalability.
• Application characteristics – Code structure can affect parallel efficiency.

Design Goals of Parallel and Distributed Computing
• Performance Improvement
• Speedup: Execute tasks faster by dividing the workload among multiple processors.
• High Throughput: Perform many tasks concurrently.
• Low Latency: Reduce time to complete individual tasks through parallel execution.

• Scalability
• Horizontal Scalability: Add more nodes/processors without significant
reconfiguration.
• Vertical Scalability: Increase the power of existing nodes.
• Load Balancing: Efficiently distribute work across available resources to prevent
bottlenecks.
Design Goals of Parallel and Distributed Computing
• Reliability and Fault Tolerance
• Redundancy: Ensure backup systems are available in case of failure.
• Replication: Duplicate critical data and services across nodes.
• Recovery Mechanism: Enable quick recovery after failures through checkpointing
and rollback mechanisms.

• Resource Sharing and Utilization


• Efficient Resource Utilization: Optimize the use of computational resources like
CPU, memory, and network bandwidth.
• Multi-user Support: Allow multiple users or tasks to share and access resources
concurrently

Design Goals of Parallel and Distributed Computing
• Transparency
• Access Transparency: Users do not need to know the physical location of
resources.
• Location Transparency: Resource location changes are hidden from users.
• Replication Transparency: Users are unaware of the presence of multiple copies.
• Concurrency Transparency: Multiple processes execute simultaneously without
conflict.
• Failure Transparency: Failures are masked from users to ensure uninterrupted
service.

• Modularity and Flexibility


• Modularity: Design systems as independent components that can be developed, tested,
and maintained separately.
• Flexibility: Easily upgrade, modify, or reconfigure the system without major
overhauls.
Design Goals of Parallel and Distributed Computing
• Heterogeneity Support:
• Support for diverse hardware, operating systems, and network protocols to ensure
seamless integration and communication.

• Cost Effectiveness
• Utilize cost-effective commodity hardware.
• Reduce operational costs through optimized parallel and distributed processing.

• Security and Privacy


• Implement robust security measures like encryption, authentication, and secure
communication to protect data in distributed environments.

Types of Parallel Systems
• Shared Memory Systems:
• Multiple processors share a single address space (RAM)
• Communication happens through shared variables
• Examples:
• symmetric multiprocessing (SMP)
• NUMA (non-uniform memory access)
• Pros: Low communication overhead, easy programming
• Cons: Scalability issues, memory access bottlenecks

• Distributed Memory Systems:


• Each processor has its own private memory
• Processes communicate using message passing (MPI, PVM)
• Examples:
• Cluster Computing
• Massively Parallel Processors (MPP)
• Pros: High scalability, cost-effective
• Cons: Complex programming, data consistency challenges
Types of Parallel Systems
• Hybrid Systems (Distributed Shared Memory):
• Combines shared memory and distributed memory models
• Allows processes to communicate either through shared memory or message
passing
• Examples:
• Intel Xeon Phi, Cray supercomputers
• NUMA (non-uniform memory access)

• SIMD vs. MIMD Systems:


• SIMD (Single Instruction, Multiple Data) – One instruction operates on multiple data
streams (e.g., GPUs).
• MIMD (Multiple Instruction, Multiple Data) – Different processors execute different
instructions independently.
Types of Distributed Systems
• Cluster Computing
• Multiple computers (nodes) work together as a single logical unit.
• Typically uses high-speed local networks (LAN).
• Example: Beowulf Clusters, Apache Hadoop Clusters.

• Grid Computing
• Uses geographically distributed computers to perform large-scale computations.
• Typically, loosely coupled compared to clusters.

• Cloud Computing
• Uses virtualized distributed resources over the Internet.
• Provides services like IaaS, PaaS, SaaS.
• Example: AWS, Google Cloud, Microsoft Azure.
Types of Distributed Systems
• Peer-to-Peer (P2P) Systems
• No central authority; all nodes are equal and communicate directly.
• Used in file sharing, cryptocurrencies, decentralized networks.
• Example: BitTorrent, Ethereum blockchain.

• Internet of Things (IoT) Systems


• Distributed network of smart devices that process and share data.
• Uses Edge & Fog Computing to process data closer to the source.
• Example: Smart Cities, Industrial IoT.

Comparison of Parallel vs. Distributed Systems

Feature         | Parallel Systems            | Distributed Systems
Memory Sharing  | Shared or Distributed       | Completely Distributed
Communication   | Fast, Low Latency           | Uses Networks (LAN/WAN)
Scalability     | Limited                     | Highly Scalable
Fault Tolerance | Lower                       | Higher
Use Cases       | HPC, Scientific Simulations | Cloud, IoT, Big Data
Enabling Technologies for PDC
• Hardware:
• Multicore Processors
• CPUs with multiple cores for parallel execution (e.g., Intel Xeon, AMD Ryzen).

• GPUs (Graphics Processing Units)


• Parallel execution of thousands of threads (e.g., NVIDIA CUDA, AMD ROCm).

• FPGAs (Field-Programmable Gate Arrays)


• Reconfigurable hardware optimized for specific parallel tasks (e.g., Xilinx, Intel FPGA).

• High-Speed Networks: Networks that facilitate fast data transfer between computing
nodes in distributed systems.
• InfiniBand, Ethernet, RDMA for fast inter-node communication.
• Storage Technologies: Storage solutions designed for fast access and distributed data
management.
Enabling Technologies for PDC
• Middleware and Communication:
• Message Passing Interface (MPI)
• A standardized communication protocol used in parallel computing (e.g., OpenMPI,
MPICH).

• Remote Procedure Call (RPC)


• A protocol that enables function execution on a remote system as if it were local. (e.g.,
gRPC, Thrift).

• Distributed File Systems: File systems that allow multiple machines to access shared
storage.
• HDFS, Google File System (GFS), Ceph.

• Virtualization & Containerization: Technologies that create isolated environments for 42


applications, making them more portable and scalable.
• Vmware, KVM, Docker, Kubernetes
Enabling Technologies for PDC
• Cloud & Edge Computing:
• Virtual Machines (VMs) & Hypervisors
• Xen, KVM, Hyper-V for cloud-based distributed computing.

• Serverless Computing: Serverless computing is a cloud computing model where
developers write and deploy code without managing or provisioning servers. The
cloud provider dynamically allocates resources and automatically scales
applications based on demand.
• AWS Lambda, Google Cloud Functions, Azure Functions

• Edge Computing Platforms: Edge computing is a distributed computing model where
data processing and computation occur closer to the data source (i.e., at the
"edge" of the network) rather than relying on a centralized cloud or data center.
• NVIDIA Jetson, Intel OpenVINO, AWS Greengrass
Platforms for PDC
• High Performance Computing (HPC):
• HPC involves supercomputers and clustered computing to solve complex computational
problems.
• Supercomputers – IBM Summit, Fugaku, Cray XC40.
• HPC Clusters – Linux-based Beowulf Clusters, Slurm Workload Manager.

• Cloud Computing:
• Cloud computing provides on-demand computing resources over the internet, allowing flexible
and scalable computing power.
• Amazon Web Services (AWS) – EC2, S3, Lambda, EMR.
• Microsoft Azure – Azure Virtual Machines, Kubernetes Service.
• Google Cloud Platform (GCP) – Compute Engine, Kubernetes, BigQuery.

• Big Data Analytics:


• Big data analytics platforms are used for processing large-scale datasets efficiently.
• Apache Hadoop – Distributed storage and processing (HDFS + MapReduce).
• Apache Spark – Fast in-memory distributed processing.
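As an illustration, a hedged sketch of the MapReduce-style processing these platforms provide, assuming the pyspark package is installed (the input lines are invented for the example):

from pyspark import SparkContext

sc = SparkContext("local[*]", "wordcount-sketch")
lines = sc.parallelize(["to be or not to be", "to parallelize or not"])
counts = (lines.flatMap(lambda line: line.split())   # map: split lines into words
               .map(lambda word: (word, 1))
               .reduceByKey(lambda a, b: a + b))      # reduce: sum counts per word
print(counts.collect())
sc.stop()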
Platforms for PDC
• Blockchain & Distributed Ledger:
• Blockchain is a decentralized, distributed ledger used for secure transactions
and data management without a central authority.
• Ethereum – Smart contracts and decentralized applications.
• Hyperledger Fabric – Enterprise blockchain solutions.

• IoT and Edge Computing:


• IoT (Internet of Things) and Edge Computing focus on processing data closer
to the source (edge devices) rather than sending it to centralized cloud servers.
• AWS IoT, Azure IoT Hub – Cloud-based IoT data processing.
• Google TensorFlow Lite – AI inference on edge devices.
Software Environments for PDC
• Programming Models & Languages:
• Parallel Programming
• OpenMP (C, C++, Fortran) – Shared memory parallelism.
• MPI (C, C++, Python) – Message passing for distributed systems.
• CUDA (C, Python) – GPU programming for high-performance tasks.
• Distributed Programming
• Hadoop MapReduce (Java, Python) – Large-scale data processing.
• Apache Spark (Scala, Python) – Distributed in-memory computing.
• TensorFlow, PyTorch (Python) – Distributed deep learning.
• General-Purpose Languages
• Python (Dask, Ray) – Distributed computing frameworks.
• Rust – Safe and efficient parallel programming.
• Java, Scala – Used in cloud-based distributed systems.
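As a sketch of the task-parallel style offered by Python frameworks such as Ray (assuming the ray package is installed; the square function is our own example):

import ray

ray.init()                # start a local Ray runtime

@ray.remote
def square(x):
    return x * x

futures = [square.remote(i) for i in range(8)]   # tasks scheduled across workers
print(ray.get(futures))                          # [0, 1, 4, 9, 16, 25, 36, 49]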
Software Environments for PDC
• Workflow Management & Orchestration:
• Apache Airflow – Distributed task scheduling.
• Kubernetes – Container orchestration for cloud and edge.
• Apache Kafka – Distributed streaming and event processing.

• Distributed Databases:
• NoSQL Databases: Apache Cassandra, MongoDB, Google Bigtable.
• NewSQL Databases: Google Spanner, CockroachDB.

Summary: Technologies, Platforms, and Software

Category           | Examples
Hardware           | Multicore CPUs, GPUs (NVIDIA, AMD), FPGAs, InfiniBand
Middleware         | MPI, RPC, gRPC, Distributed File Systems (HDFS, Ceph)
Cloud Platform     | AWS, Azure, Google Cloud, OpenStack
Big Data           | Apache Hadoop, Spark, Google BigQuery
Edge Computing     | NVIDIA Jetson, AWS Greengrass, OpenVINO
Programming Models | OpenMP, MPI, CUDA, Dask, Ray
Workflow Tools     | Apache Airflow, Kubernetes, Apache Kafka
Databases          | Apache Cassandra, Google Spanner, MongoDB
