
CSC 334 – Parallel and Distributed Computing

Instructor: Ms. Muntha Amjad


Lecture# 01: Introduction

Motivating Parallelism
• Traditionally, software has been written for serial computation:
• To be run on a single computer having a single Central Processing Unit (CPU);
• A problem is broken into a discrete series of instructions that are executed sequentially.
• Only one instruction may execute at any moment in time.
• Performance is achieved by executing instructions concurrently through multitasking

• Uniprocessors are fast but:


• Some problems require too much computation
• Some problems use too much data
• Some problems have too many parameters to explore

• For example:
• Weather simulations
• Gaming
• Web servers
• Code breaking
Motivating Parallelism
• In the simplest sense, parallel computing is the simultaneous use of multiple
compute resources to solve a computational problem
• To be run using multiple CPUs
• A problem is broken into discrete parts that can be solved in parallel
• Each part is further broken down to a series of instructions

• Instructions from each part execute simultaneously on different CPUs
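As a quick illustration (a minimal sketch of our own, not part of the original slides, assuming a standard Python 3 installation), the snippet below breaks a problem into discrete parts and executes the parts simultaneously on different CPUs:

from concurrent.futures import ProcessPoolExecutor

def partial_sum(chunk):
    # One "part" of the problem: its own series of instructions on one CPU.
    return sum(x * x for x in chunk)

if __name__ == "__main__":
    data = list(range(1_000_000))
    chunks = [data[i::4] for i in range(4)]           # break into discrete parts
    with ProcessPoolExecutor(max_workers=4) as pool:  # run parts on separate CPUs
        total = sum(pool.map(partial_sum, chunks))
    print(total)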

Motivating Parallelism
• The role of parallelism in accelerating computing speeds has been recognized
for several decades

• Its role in providing multiplicity of datapaths and increased access to storage
elements has been significant in commercial applications.

• The scalable performance and lower cost of parallel platforms is reflected in the
wide variety of applications

Motivating Parallelism
• Developing parallel hardware and software has traditionally been time and effort
intensive.

• If one is to view this in the context of rapidly improving uniprocessor speeds, one
is tempted to question the need for parallel computing.

• Latest trends in hardware design indicate that uniprocessors may not be able to
sustain the rate of realizable performance increments in the future.

• This is the result of several fundamental physical and computational limitations.

• The emergence of standardized parallel programming environments, libraries, and
hardware has significantly reduced the time to develop (parallel) solutions.
The Computational Power Argument
• Moore’s Law states (1965):
• The number of transistors incorporated on a microchip doubles approximately every
two years, while the cost of computers is halved.

The Computational Power Argument
• If one is to buy into Moore’s law, the question still remains – how does one
translate transistors into useful OPS (operations per second)?

• The logical recourse is to rely on parallelism, both implicit and explicit.

• Most serial (or seemingly serial) processors rely extensively on implicit parallelism.

• The focus of this class, for the most part, is on explicit parallelism.

The Computational Power Argument
• Why does doubling the number of transistors not double the speed?
• The increase in transistors per processor is largely due to multi-core CPUs
• This means that, to keep following Moore’s law, companies had to introduce
ultra-large-scale integration and the multi-core processing era

• Will Moore’s law hold forever?
• Adding multiple cores on a single chip causes heat issues
• Furthermore, increasing the number of cores may not increase speeds, due to
inter-process interactions
• Moreover, transistors would eventually reach the limit of miniaturization at the
atomic level

The Computational Power Argument
• So, we must look for efficient
parallel software solutions to fulfill
our future computational needs

• As stated earlier, the number of cores on a single chip also has some restrictions

• Solution(s)?
• Need to find more scalable
distributed and hybrid solutions

The Memory/Disk Speed Argument
• While clock rates of high-end processors have increased at roughly 40% per
year over the past decade, DRAM access times have only improved at the
rate of roughly 10% per year over this interval
• This mismatch in speeds causes significant performance bottlenecks

• Parallel platforms provide increased bandwidth to the memory systems

• Parallel platforms also provide higher aggregate caches

• Some of the fastest growing applications of parallel computing utilize not their
raw computational speed but rather their ability to pump data to memory and disk
faster

The Data Communication Argument
• As the network evolves, the vision of the Internet as one large computing
platform has emerged
• In many applications (typically databases and data mining) the volume of data
is such that they cannot be moved
• Any analyses on this data must be performed over the network using parallel
techniques

Computing vs. Systems
Distributed Systems
• A collection of autonomous computers, connected through a network and
distribution middleware
• This enables computers to coordinate their activities and to share the resources of
the system
• The system is usually perceived as a single, integrated computing facility
• Mainly concerned with hardware-based acceleration

Distributed Computing
• A specific use of distributed systems: splitting a large, complex computation into
subparts and executing them in parallel to increase productivity
• Computing is mainly concerned with software-based acceleration (i.e., designing and
implementing algorithms)
Parallel and Distributed Computing
Parallel (shared-memory) Computing
• The term is usually used for developing concurrent solutions for the following two
types of systems:
• Multi-core Architecture
• Many-core Architectures (i.e., GPUs)

Distributed Computing
• This type of computing is mainly concerned with developing algorithms for the
distributed cluster systems
• Here, distributed means that the computers are geographically separated and have
no shared memory
Scope of Parallel Computing Applications
• Parallelism finds applications in very diverse application domains for different
motivating reasons
• These range from improved application performance to cost considerations

Scientific Applications
• Functional and structural characterization of genes and proteins

• Applications in astrophysics have explored the evolution of galaxies, thermonuclear
processes, and the analysis of extremely large datasets from telescopes

• Advances in computational physics and chemistry have explored new materials,
understanding of chemical pathways, and more efficient processes

• Bioinformatics and astrophysics also present some of the most challenging problems
with respect to analyzing extremely large datasets

• Weather modeling, mineral prospecting, flood prediction, etc., are other important
applications
Commercial Applications
• Some of the largest parallel computers power Wall Street

• Data mining analysis for optimizing business and marketing decisions

• Large scale servers (mail and web servers) are often implemented using
parallel platforms

• Applications such as information retrieval and search are typically powered by
large clusters
Applications in Computer Systems
• Network intrusion detection: A large amount of data needs to be analyzed and
processed

• Cryptography (the art of writing or solving codes) employs parallel infrastructures
and algorithms to solve complex codes

• Graphic processing

• A modern automobile consists of tens of processors communicating to perform complex
tasks for optimizing handling and performance
Von Neumann Architecture
• Named after the Hungarian mathematician John von Neumann who first
authored the general requirements for an electronic computer in his 1945 papers.

• Also known as "stored-program computer" – both program instructions and data are
kept in electronic memory. Differs from earlier computers which were programmed
through "hard wiring".

• Since then, virtually all computers have followed this basic design.

• Comprised of four main components:


• Memory
• Control Unit
• Arithmetic Logic Unit
• Input/Output
Von Neumann Architecture
• Basic Design
• Read/write random-access memory is used to store both program instructions and data
• Program instructions are coded data which tell the computer to do something
• Data is simply information to be used by the program
• Control unit fetches instructions/data from memory, decodes the instructions and then
sequentially coordinates operations to accomplish the programmed task.
• Arithmetic Logic Unit performs basic arithmetic (and logic) operations
• Input/Output is the interface to the human operator

• Parallel computers still follow this basic design, just multiplied in units. The
basic, fundamental architecture remains the same.

Flynn's Classical Taxonomy
• There are different ways to classify parallel computers. One of the more widely
used classifications, in use since 1966, is called Flynn's Taxonomy.

• Flynn's taxonomy distinguishes multi-processor computer architectures according to
how they can be classified along the two independent dimensions of Instruction and
Data. Each of these dimensions can have only one of two possible states: Single or
Multiple.

• The Flynn matrix below defines the 4 possible classifications:

SISD – Single Instruction, Single Data   | SIMD – Single Instruction, Multiple Data
MISD – Multiple Instruction, Single Data | MIMD – Multiple Instruction, Multiple Data
Single Instruction, Single Data (SISD)
• A serial (non-parallel) computer

• Single instruction: only one instruction stream is being acted on by the CPU during
any one clock cycle

• Single data: only one data stream is being used as input during any one clock cycle

• Deterministic execution

• This is the oldest and, until recently, the most prevalent form of computer

• Examples: most PCs, single-CPU workstations and mainframes
Single Instruction, Multiple Data (SIMD)
• A type of parallel computer

• Single instruction: All processing units execute the same instruction at any given
clock cycle

• Multiple data: Each processing unit can operate on a different data element

• This type of machine typically has an instruction dispatcher, a very high-bandwidth
internal network, and a very large array of very small-capacity instruction units.

• Best suited for specialized problems characterized by a high degree of regularity, such
as graphics/image processing.

• Synchronous (lockstep) and deterministic execution


• Two varieties: Processor Arrays and Vector Pipelines
Single Instruction, Multiple Data (SIMD)
• Examples:
• Processor Arrays: Thinking Machines CM-2, MasPar MP-1 & MP-2
• Vector Pipelines: IBM 9000, Cray C90, Fujitsu VP, NEC SX-2, Hitachi S820

• Most modern computers, particularly those with graphics processing units (GPUs),
employ SIMD instructions and execution units.
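For instance, the following sketch (our own example using NumPy, an assumption on our part rather than one of the machines listed above) applies a single logical operation across many data elements at once; such vectorized operations typically map onto the CPU's SIMD units:

import numpy as np

a = np.arange(1_000_000, dtype=np.float32)
b = np.ones_like(a)
c = a + b          # one "add" operation applied to multiple data elements
print(c[:5])       # [1. 2. 3. 4. 5.]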
Multiple Instruction, Single Data (MISD)
• A type of a parallel computer
• Multiple Instruction: Each processing unit operates on the data independently via
separate instruction streams.
• Single Data: A single data stream is fed into multiple processing units.
• Few (if any) actual examples of this class of parallel computer have ever existed.
• Some conceivable uses might be:
• multiple frequency filters operating on a single signal stream
• multiple cryptography algorithms attempting to crack a single coded message.

Multiple Instruction, Multiple Data (MIMD)
• Currently, the most common type of parallel
computer. Most modern computers fall into this
category.

• Multiple Instruction: every processor may be executing a different instruction
stream

• Multiple Data: every processor may be working with a different data stream

• Execution can be synchronous or asynchronous, deterministic or non-deterministic

• Examples: most current supercomputers, networked parallel computer "grids",
multi-processor SMP computers, and multi-core PCs
General Parallel Computing Terminology
• CPU: A modern CPU has one or more cores, and each core runs its own stream of
instructions. A system may have multiple CPU sockets, each with its own attached
memory; in such systems, memory can be shared across the sockets.
• Node: A node is like a separate computer that has multiple CPUs, memory, and
network connections. Many nodes together form a supercomputer by
connecting through a network.
• Task: A logically discrete section of computational work. A task is typically a
program or program-like set of instructions that is executed by a processor. A
parallel program consists of multiple tasks running on multiple processors.

• Pipelining: Breaking a task into steps performed by different processor units, with
inputs streaming through, much like an assembly line; a type of parallel computing.
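A minimal sketch of pipelining (our own example; the two stages and the queue are hypothetical) in which inputs stream through two steps running on different workers, assembly-line style:

import threading, queue

stage1_out = queue.Queue()

def stage1(items):
    for x in items:
        stage1_out.put(x * 2)     # first step of the task
    stage1_out.put(None)          # sentinel: end of the input stream

def stage2():
    while True:
        x = stage1_out.get()
        if x is None:
            break
        print(x + 1)              # second step, overlapped with stage 1

t1 = threading.Thread(target=stage1, args=(range(5),))
t2 = threading.Thread(target=stage2)
t1.start(); t2.start()
t1.join(); t2.join()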
General Parallel Computing Terminology
• Serial Execution: Execution of a program sequentially, one statement at a
time. In the simplest sense, this is what happens on a one processor machine.
However, virtually all parallel tasks will have sections of a parallel program that
must be executed serially.

• Parallel Execution: Execution of a program by more than one task, with each
task being able to execute the same or different statement at the same moment
in time.

• Shared Memory: From a strictly hardware point of view, describes a computer
architecture where all processors have direct (usually bus-based) access to common
physical memory. In a programming sense, it describes a model where parallel tasks
all have the same "picture" of memory and can directly address and access the same
logical memory locations regardless of where the physical memory actually exists.
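A minimal sketch of the shared-memory model (our own example using standard Python threads): all tasks see the same logical memory location and coordinate their updates through a lock:

import threading

counter = 0               # one logical memory location shared by all tasks
lock = threading.Lock()   # coordination so that concurrent updates do not race

def work(n):
    global counter
    for _ in range(n):
        with lock:
            counter += 1

threads = [threading.Thread(target=work, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)            # 40000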
General Parallel Computing Terminology
• Distributed Memory: In hardware, refers to network-based memory access for
physical memory that is not common. As a programming model, tasks can only
logically "see" local machine memory and must use communications to access
memory on other machines where other tasks are executing.

• Communications: Parallel tasks typically need to exchange data. There are several
ways this can be accomplished, such as through a shared memory bus or over a
network; however, the actual event of data exchange is commonly referred to as
communications regardless of the method employed.
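A hedged sketch of such data exchange in the distributed-memory style, assuming the mpi4py package and an MPI implementation are installed (it would be launched with something like mpiexec -n 2 python example.py, where the file name is hypothetical):

from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

if rank == 0:
    data = {"payload": list(range(10))}
    comm.send(data, dest=1, tag=11)      # explicit data exchange between tasks
elif rank == 1:
    data = comm.recv(source=0, tag=11)
    print("rank 1 received", data)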

• Synchronization: The coordination of parallel tasks in real time, very often
associated with communications. Often implemented by establishing a synchronization
point within an application where a task may not proceed further until another
task(s) reaches the same or logically equivalent point. Synchronization usually
involves waiting by at least one task and can therefore cause a parallel
application's wall clock execution time to increase.
General Parallel Computing Terminology
• Computational Granularity: Granularity refers to how much computation is
done before a task needs to communicate with other tasks. It is the ratio of
computation to communication in a parallel program.
• There are two types:
• Coarse-Grained: Large chunks of computation are done before communication
happens.
• Pros: Less communication overhead, more efficiency.
• Cons: Less flexibility in load balancing.
• Fine-Grained: Small amounts of computation happen before communication.
• Pros: Better load balancing, suitable for highly dynamic tasks.
• Cons: High communication overhead, may slow down execution.
• Observed Speedup: This measures how much faster a program runs when executed in
parallel. The observed speedup of a code which has been parallelized is defined as:

  Observed Speedup = wall-clock time of serial execution / wall-clock time of parallel execution
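A tiny worked example with illustrative numbers (not taken from the slides):

serial_time = 100.0       # wall-clock seconds for the serial run
parallel_time = 25.0      # wall-clock seconds for the parallel run
observed_speedup = serial_time / parallel_time
print(observed_speedup)   # 4.0, i.e., the parallel version runs 4x faster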
General Parallel Computing Terminology
• Parallel Overhead: Extra execution time required only for parallel tasks (not useful
computations).
• Cause: Parallelization introduces additional tasks that don't exist in serial execution.
• Factors Contributing to Overhead:
• Task start-up time: Time needed to initialize parallel tasks.
• Synchronizations: Time spent waiting for tasks to reach a common point.
• Data communications: Time taken to exchange data between processors.
• Software overhead: Extra processing due to parallel programming languages, libraries, and
OS management.
• Task termination time: Time spent in wrapping up parallel execution.
• Massively Parallel: A computing system with many processing elements working
together. Earlier, "many" meant hundreds of processors; today, supercomputers have
hundreds of thousands to millions of processors.
• Examples: GPU clusters with thousands of cores.
General Parallel Computing Terminology
• Embarrassingly (IDEALLY) Parallel: A problem that can be easily parallelized
because tasks are independent and require little or no communication. No
synchronization or coordination needed between tasks.

• Scalability: The ability of a parallel system (hardware/software) to increase
speedup proportionally when more resources (processors) are added. Factors that
contribute to scalability include:
• Hardware limitations – CPU/memory bandwidth and network speed.
• Application algorithm – Some algorithms parallelize well, others do not.
• Parallel overhead – Too much overhead can limit scalability.
• Application characteristics – Code structure can affect parallel efficiency.

Design Goals of Parallel and Distributed Computing
• Performance Improvement
• Speedup: Execute tasks faster by dividing the workload among multiple processors.
• High Throughput: Perform many tasks concurrently.
• Low Latency: Reduce time to complete individual tasks through parallel execution.

• Scalability
• Horizontal Scalability: Add more nodes/processors without significant
reconfiguration.
• Vertical Scalability: Increase the power of existing nodes.
• Load Balancing: Efficiently distribute work across available resources to prevent
bottlenecks.
Design Goals of Parallel and Distributed Computing
• Reliability and Fault Tolerance
• Redundancy: Ensure backup systems are available in case of failure.
• Replication: Duplicate critical data and services across nodes.
• Recovery Mechanism: Enable quick recovery after failures through checkpointing
and rollback mechanisms.

• Resource Sharing and Utilization


• Efficient Resource Utilization: Optimize the use of computational resources like
CPU, memory, and network bandwidth.
• Multi-user Support: Allow multiple users or tasks to share and access resources
concurrently

Design Goals of Parallel and Distributed Computing
• Transparency
• Access Transparency: Users do not need to know the physical location of
resources.
• Location Transparency: Resource location changes are hidden from users.
• Replication Transparency: Users are unaware of the presence of multiple copies.
• Concurrency Transparency: Multiple processes execute simultaneously without
conflict.
• Failure Transparency: Failures are masked from users to ensure uninterrupted
service.

• Modularity and Flexibility


• Modularity: Design systems as independent components that can be developed, tested,
and maintained separately.
• Flexibility: Easily upgrade, modify, or reconfigure the system without major
overhauls.
Design Goals of Parallel and Distributed Computing
• Heterogeneity Support:
• Support for diverse hardware, operating systems, and network protocols to ensure
seamless integration and communication.

• Cost Effectiveness
• Utilize cost-effective commodity hardware.
• Reduce operational costs through optimized parallel and distributed processing.

• Security and Privacy


• Implement robust security measures like encryption, authentication, and secure
communication to protect data in distributed environments.

Types of Parallel Systems
• Shared Memory Systems:
• Multiple processors share a single address space (RAM)
• Communication happens through shared variables
• Examples:
• symmetric multiprocessing (SMP)
• NUMA (non-uniform memory access)
• Pros: Low communication overhead, easy programming
• Cons: Scalability issues, memory access bottlenecks

• Distributed Memory Systems:


• Each processor has its own private memory
• Processes communicate using message passing (MPI, PVM)
• Examples:
• Cluster Computing
• Massively Parallel Processors (MPP)
• Pros: High scalability, cost-effective
• Cons: Complex programming, data consistency challenges
Types of Parallel Systems
• Hybrid Systems (Distributed Shared Memory):
• Combines shared memory and distributed memory models
• Allows processes to communicate either through shared memory or message
passing
• Examples:
• Intel Xeon Phi, Cray supercomputers
• NUMA (non-uniform memory access)

• SIMD vs. MIMD Systems:


• SIMD (Single Instruction, Multiple Data) – One instruction operates on multiple data
streams (e.g., GPUs).
• MIMD (Multiple Instruction, Multiple Data) – Different processors execute different
instructions independently.
Types of Distributed Systems
• Cluster Computing
• Multiple computers (nodes) work together as a single logical unit.
• Typically uses high-speed local networks (LAN).
• Example: Beowulf Clusters, Apache Hadoop Clusters.

• Grid Computing
• Uses geographically distributed computers to perform large-scale computations.
• Typically, loosely coupled compared to clusters.

• Cloud Computing
• Uses virtualized distributed resources over the Internet.
• Provides services like IaaS, PaaS, SaaS.
• Example: AWS, Google Cloud, Microsoft Azure.
Types of Distributed Systems
• Peer-to-Peer (P2P) Systems
• No central authority; all nodes are equal and communicate directly.
• Used in file sharing, cryptocurrencies, decentralized networks.
• Example: BitTorrent, Ethereum blockchain.

• Internet of Things (IoT) Systems


• Distributed network of smart devices that process and share data.
• Uses Edge & Fog Computing to process data closer to the source.
• Example: Smart Cities, Industrial IoT.

Comparison of Parallel vs. Distributed Systems

Feature         | Parallel Systems            | Distributed Systems
Memory Sharing  | Shared or Distributed       | Completely Distributed
Communication   | Fast, Low Latency           | Uses Networks (LAN/WAN)
Scalability     | Limited                     | Highly Scalable
Fault Tolerance | Lower                       | Higher
Use Cases       | HPC, Scientific Simulations | Cloud, IoT, Big Data
Enabling Technologies for PDC
• Hardware:
• Multicore Processors
• CPUs with multiple cores for parallel execution (e.g., Intel Xeon, AMD Ryzen).

• GPUs (Graphics Processing Units)


• Parallel execution of thousands of threads (e.g., NVIDIA CUDA, AMD ROCm).

• FPGAs (Field-Programmable Gate Arrays)


• Reconfigurable hardware optimized for specific parallel tasks (e.g., Xilinx, Intel FPGA).

• High-Speed Networks: Networks that facilitate fast data transfer between computing
nodes in distributed systems.
• InfiniBand, Ethernet, RDMA for fast inter-node communication.
• Storage Technologies: Storage solutions designed for fast access and distributed data
management.
Enabling Technologies for PDC
• Middleware and Communication:
• Message Passing Interface (MPI)
• A standardized communication protocol used in parallel computing (e.g., OpenMPI,
MPICH).

• Remote Procedure Call (RPC)


• A protocol that enables function execution on a remote system as if it were local. (e.g.,
gRPC, Thrift).

• Distributed File Systems: File systems that allow multiple machines to access shared
storage.
• HDFS, Google File System (GFS), Ceph.

• Virtualization & Containerization: Technologies that create isolated environments for 42


applications, making them more portable and scalable.
• Vmware, KVM, Docker, Kubernetes
Enabling Technologies for PDC
• Cloud & Edge Computing:
• Virtual Machines (VMs) & Hypervisors
• Xen, KVM, Hyper-V for cloud-based distributed computing.

• Serverless Computing: Serverless computing is a cloud computing model where
developers write and deploy code without managing or provisioning servers. The
cloud provider dynamically allocates resources and automatically scales
applications based on demand.
• AWS Lambda, Google Cloud Functions, Azure Functions

• Edge Computing Platforms: Edge computing is a distributed computing model where
data processing and computation occur closer to the data source (i.e., at the
"edge" of the network) rather than relying on a centralized cloud or data center.
• NVIDIA Jetson, Intel OpenVINO, AWS Greengrass
Platforms for PDC
• High Performance Computing (HPC):
• HPC involves supercomputers and clustered computing to solve complex computational
problems.
• Supercomputers – IBM Summit, Fugaku, Cray XC40.
• HPC Clusters – Linux-based Beowulf Clusters, Slurm Workload Manager.

• Cloud Computing:
• Cloud computing provides on-demand computing resources over the internet, allowing flexible
and scalable computing power.
• Amazon Web Services (AWS) – EC2, S3, Lambda, EMR.
• Microsoft Azure – Azure Virtual Machines, Kubernetes Service.
• Google Cloud Platform (GCP) – Compute Engine, Kubernetes, BigQuery.

• Big Data Analytics:


• Big data analytics platforms are used for processing large-scale datasets efficiently.
• Apache Hadoop – Distributed storage and processing (HDFS + MapReduce).
• Apache Spark – Fast in-memory distributed processing.
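As an illustration, a hedged sketch of the MapReduce-style processing these platforms provide, assuming the pyspark package is installed (the input lines are invented for the example):

from pyspark import SparkContext

sc = SparkContext("local[*]", "wordcount-sketch")
lines = sc.parallelize(["to be or not to be", "to parallelize or not"])
counts = (lines.flatMap(lambda line: line.split())   # map: split lines into words
               .map(lambda word: (word, 1))
               .reduceByKey(lambda a, b: a + b))      # reduce: sum counts per word
print(counts.collect())
sc.stop()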
Platforms for PDC
• Blockchain & Distributed Ledger:
• Blockchain is a decentralized, distributed ledger used for secure transactions
and data management without a central authority.
• Ethereum – Smart contracts and decentralized applications.
• Hyperledger Fabric – Enterprise blockchain solutions.

• IoT and Edge Computing:


• IoT (Internet of Things) and Edge Computing focus on processing data closer
to the source (edge devices) rather than sending it to centralized cloud servers.
• AWS IoT, Azure IoT Hub – Cloud-based IoT data processing.
• Google TensorFlow Lite – AI inference on edge devices.
Software Environments for PDC
• Programming Models & Languages:
• Parallel Programming
• OpenMP (C, C++, Fortran) – Shared memory parallelism.
• MPI (C, C++, Python) – Message passing for distributed systems.
• CUDA (C, Python) – GPU programming for high-performance tasks.
• Distributed Programming
• Hadoop MapReduce (Java, Python) – Large-scale data processing.
• Apache Spark (Scala, Python) – Distributed in-memory computing.
• TensorFlow, PyTorch (Python) – Distributed deep learning.
• General-Purpose Languages
• Python (Dask, Ray) – Distributed computing frameworks.
• Rust – Safe and efficient parallel programming.
• Java, Scala – Used in cloud-based distributed systems.
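As a sketch of the task-parallel style offered by Python frameworks such as Ray (assuming the ray package is installed; the square function is our own example):

import ray

ray.init()                # start a local Ray runtime

@ray.remote
def square(x):
    return x * x

futures = [square.remote(i) for i in range(8)]   # tasks scheduled across workers
print(ray.get(futures))                          # [0, 1, 4, 9, 16, 25, 36, 49]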
Software Environments for PDC
• Workflow Management & Orchestration:
• Apache Airflow – Distributed task scheduling.
• Kubernetes – Container orchestration for cloud and edge.
• Apache Kafka – Distributed streaming and event processing.

• Distributed Databases:
• NoSQL Databases: Apache Cassandra, MongoDB, Google Bigtable.
• NewSQL Databases: Google Spanner, CockroachDB.

Summary: Technologies, Platforms, and Software

Category           | Examples
Hardware           | Multicore CPUs, GPUs (NVIDIA, AMD), FPGAs, InfiniBand
Middleware         | MPI, RPC, gRPC, Distributed File Systems (HDFS, Ceph)
Cloud Platform     | AWS, Azure, Google Cloud, OpenStack
Big Data           | Apache Hadoop, Spark, Google BigQuery
Edge Computing     | NVIDIA Jetson, AWS Greengrass, OpenVINO
Programming Models | OpenMP, MPI, CUDA, Dask, Ray
Workflow Tools     | Apache Airflow, Kubernetes, Apache Kafka
Databases          | Apache Cassandra, Google Spanner, MongoDB
