
Introduction to parallel programming
Why parallel computing?
▪ From 1986 to 2002
▪ Microprocessor performance increased, on average, by about 50% per year
▪ Since then, the annual performance gain has dropped to roughly 20%
▪ In response, computer designers shifted their focus to designing parallel computers
▪ Rather than ever more complex single-core processors
▪ Multi-core processors
▪ But software developers continued to write serial programs
▪ Aren’t single-processor systems fast enough?
▪ Why build parallel systems?
▪ Why do we need parallel programs?
Why do we need ever-increasing performance?
▪ Past improvements in microprocessor performance gave us faster web search, quicker and more accurate medical diagnosis, realistic computer games, etc.
▪ Higher computation power means we can solve larger problems:
▪ Climate modelling
▪ Protein folding
▪ Drug discovery
▪ Energy research
▪ Data analysis
Why are we building parallel systems?
▪ The increase in single-processor performance has been due to the ever-increasing density of transistors
▪ As the size of transistors decreases, their speed can be increased
▪ But their power consumption also increases
▪ The chip dissipates more heat
▪ And becomes increasingly unreliable
▪ Hence, it became impractical to keep increasing the speed of monolithic integrated circuits
▪ But increasing transistor density can continue
▪ Rather than building ever-faster, more complex, monolithic processors
▪ Designers started putting multiple, relatively simple, complete processors on a single chip
Why do we need to write parallel programs?
▪ Most serial programs are designed to run on a single core
▪ They are unaware of multiple processors
▪ At most, we can run multiple instances of the same program on different cores
▪ This is not what we want. Why?
Why parallelism?
▪ Transistors to FLOPS
▪ It is possible to fabricate devices with very large transistor counts
▪ How do we use these transistors to achieve increasing rates of computation?
▪ Memory and disk speed
▪ The overall speed of a computation is determined not just by the speed of the processor, but also by the ability of the memory system to feed data to it
▪ Bottleneck: the gap between processor speed and memory speed
▪ Data communication
▪ Data mining: analysis of large datasets distributed over relatively low-bandwidth networks
▪ Without parallelism, it is often not feasible to collect all the data at a central location
Applications of parallel computing
▪ Applications in Engineering and Design
▪ Optimization problems
▪ Internal combustion engines
▪ Airfoil design for aircraft
▪ Scientific Applications
▪ Sequencing of the human genome
▪ Weather modeling, mineral prospecting, flood prediction, etc.
▪ Commercial Applications
▪ Web and database servers
▪ Data mining
▪ Analysis for optimizing business and marketing decisions
▪ Applications in Computer Systems
Introduction
▪ Parallel Computing: the use of a parallel computer to reduce the time needed to solve a computational problem

▪ Parallel Computer: a multi-processor computer system that supports parallel programming

▪ Two types of parallel computers:
▪ Multi-computer: a parallel computer constructed out of multiple computers and an interconnection network
▪ Centralized multiprocessor: an integrated system in which all CPUs share access to a single global memory
Introduction
▪ Parallel Programming: programming in a language that allows you to explicitly indicate how different portions of the computation may be executed concurrently
Stored-program computer architecture
▪ Instructions are numbers that are stored as data in memory
▪ Instructions are read and executed by a control unit
▪ The arithmetic logic unit (ALU) is responsible for the actual computation; it manipulates the data as directed by the instructions
▪ I/O facilities allow communication with the user
▪ The control unit and ALU, together with the appropriate interfaces to memory and I/O, are called the CPU
Stored-program computer architecture
▪ Programming a stored-program computer requires us to modify the instructions stored in memory
▪ This is generally done by another program, called a compiler
▪ This is the general blueprint for all mainstream computers
▪ It has a few drawbacks:
▪ Instructions and data must be continuously fed to the control and arithmetic units: this is known as the von Neumann bottleneck
▪ The architecture is inherently sequential, processing a single instruction with (possibly) a single operand or a group of operands from memory
General purpose cache-based microprocessor
architecture
▪ Arithmetic units are responsible for running
the applications
▪ FP (Floating point) and INT (Integer)
▪ CPU registers hold operands to be accessed
by instructions
▪ INT reg. file and FP reg. file
▪ 16-128 such registers are generally available
▪ LD (load) and ST (store) units handle instructions that transfer data between memory and registers
▪ Instructions are stored in queues to be
executed
▪ Cache holds the data for re-use
Performance metrics
▪ Let Π be an arbitrary computational problem which is to be solved by a computer
▪ A sequential algorithm performs one operation in each step
▪ A parallel algorithm may perform multiple operations in a single step
▪ Let 𝑃 be a parallel algorithm for Π that contains potential parallelism
▪ Let 𝐶(𝑝) be a parallel computer of the kind 𝐶 which contains 𝑝 processing units
Performance metrics
▪ The performance of P depends on both C and p
▪ We must consider two things:
▪ Potential parallelism in 𝑃
▪ Ability of 𝐶(𝑝) to execute, in parallel, multiple operations of P
▪ So, the performance of the algorithm 𝑃 on the parallel computer 𝐶(𝑝) depends on 𝐶(𝑝)'s capability to exploit 𝑃's potential parallelism
▪ Here, "performance" means the time required to execute 𝑃 on 𝐶(𝑝)
▪ This is called the parallel execution time (or parallel runtime) of 𝑃 on 𝐶(𝑝)
▪ Denoted by 𝑻𝒑𝒂𝒓
Performance metrics
▪ Speedup: how many times the parallel execution of 𝑃 on 𝐶(𝑝) is faster than the sequential execution of 𝑃

𝑆 = 𝑇𝑠𝑒𝑞 / 𝑇𝑝𝑎𝑟

▪ Parallel execution of 𝑃 on 𝐶(𝑝) is 𝑆 times faster than sequential execution
Performance metrics and enhancement
▪ Efficiency: the average contribution of each of the 𝑝 processing units of 𝐶(𝑝) to the speedup

𝐸 = 𝑆 / 𝑝

▪ Since 𝑇𝑝𝑎𝑟 ≤ 𝑇𝑠𝑒𝑞 ≤ 𝑝 · 𝑇𝑝𝑎𝑟, speedup is bounded by 𝑝 (𝑆 ≤ 𝑝) and efficiency is bounded by 1 (𝐸 ≤ 1)
▪ "For any 𝐶 and 𝑝, the parallel execution of 𝑃 on 𝐶(𝑝) can be at most 𝑝 times faster than the execution of 𝑃 on a single processor"
Example:
• Consider a program whose sequential execution time is normalized to 1, consisting of a serial fraction (1-P) and a parallelizable fraction P
• Time taken to execute the parallel part on n processors = P/n
• Where n = number of processors
• Then the time to run the parallel program will be 𝑇𝑝𝑎𝑟 = 1-P+P/n
Example:
• Assume, 80% of the program can be parallelized
• Then, 20% cannot be parallelized
• Assume n=4
• Then, time taken to run the parallel program is : 1-0.8+(0.8/4)=0.4

• Speedup (S) = 𝑇𝑠𝑒𝑞 / 𝑇𝑝𝑎𝑟 = 1/0.4 = 2.5
• Efficiency (E) = S/n = 2.5/4 = 0.625
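
As a quick check of the arithmetic above, here is a minimal C sketch (the values of P and n are simply the ones assumed in this example) that evaluates the same formulas:

```c
#include <stdio.h>

int main(void) {
    double P = 0.8;  /* parallelizable fraction, as assumed in the example */
    int    n = 4;    /* number of processors, as assumed in the example    */

    double t_seq = 1.0;               /* sequential time, normalized to 1 */
    double t_par = (1.0 - P) + P / n; /* time of the parallel program     */
    double S     = t_seq / t_par;     /* speedup                          */
    double E     = S / n;             /* efficiency                       */

    printf("T_par = %.2f, S = %.2f, E = %.3f\n", t_par, S, E);
    /* Prints: T_par = 0.40, S = 2.50, E = 0.625 */
    return 0;
}
```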
Taxonomy of parallel computers
▪ In 1966, Michael Flynn proposed a taxonomy of computer
architectures
▪ Based on how many instructions and data items they can
process concurrently
▪ SISD: A sequential machine that can execute one
instruction at a time on a single data item
(Conventional non-parallel systems)
▪ SIMD: A single instruction is applied to a collection of data items (e.g., GPUs)
▪ MISD: Multiple instructions are applied to a single data item
▪ MIMD: Multiple instructions are applied to multiple data items
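
To make the SIMD/MIMD distinction concrete, here is a small C sketch (my own illustration, not from the slides, assuming an OpenMP 4.0+ compiler, e.g. gcc with -fopenmp): the first loop asks the compiler to apply one instruction to several data elements at once (SIMD), while the second runs independent instruction streams on separate threads (MIMD):

```c
#include <stdio.h>

#define N 1024

int main(void) {
    float a[N], b[N], c[N];
    for (int i = 0; i < N; i++) { a[i] = (float)i; b[i] = 2.0f * i; }

    /* SIMD: one (vector) instruction operates on several elements at a time. */
    #pragma omp simd
    for (int i = 0; i < N; i++)
        c[i] = a[i] + b[i];

    /* MIMD (shared memory): each thread executes its own instruction stream
       on its own chunk of the iterations. */
    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        c[i] = 0.5f * c[i];

    printf("c[N-1] = %.1f\n", c[N - 1]);
    return 0;
}
```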
Taxonomy of parallel computers
▪ MIMD can be further divided into two categories:
▪ Shared-memory MIMD
▪ Distributed-memory MIMD
Shared-memory systems
▪ A number of CPUs work on a common, shared physical address space
▪ Two varieties of shared-memory systems:
▪ Uniform Memory Access (UMA)
▪ Latency and bandwidth are the same for all processors and all memory locations
▪ Also called Symmetric Multi-Processing (SMP)
▪ Cache-coherent Non-uniform Memory Access (ccNUMA)
▪ Memory is physically distributed but logically shared
▪ The physical layout of such systems is similar to that of distributed-memory systems
▪ The network logic makes the aggregated memory of the whole system appear as one single address space
▪ In both cases, copies of the same data may reside in different caches, possibly in a modified state
Cache-coherence
▪ Copies of the same cache line could potentially reside in several CPU
caches
▪ One of those gets modified and evicted to memory
▪ The other caches’ contents reflect outdated data
▪ Cache coherence protocols ensure a consistent view of memory under all circumstances

[Figure: two processors P1 and P2 with caches C1 and C2, each holding a copy of the cache line containing A1 and A2, backed by a single shared memory]
Cache-coherence (MESI protocol)
▪ Under the control of the cache coherence logic this discrepancy can be avoided
▪ M (modified): The cache line has been modified in this cache, and it resides in no other cache. Only upon eviction does memory reflect the most current state.
▪ E (exclusive): The cache line has been read from memory but not (yet) modified. However, it resides in no other cache.
▪ S (shared): The cache line has been read from memory but not (yet) modified. There may be other copies in other caches of the machine.
▪ I (invalid): The cache line does not reflect any sensible data. Under normal circumstances this happens if the cache line was in the shared state and another processor has requested exclusive ownership.
Uniform Memory Access (UMA)
▪ The simplest implementation of a UMA system is a dual-core processor, in which two CPUs on one chip share a single path to memory

▪ The problem with UMA systems is that bandwidth bottlenecks are bound to occur

ccNUMA
▪ Locality domain (LD) is a set of processor cores together with locally
connected memory
▪ Multiple LDs are linked via a coherent interconnect
▪ Provides transparent access from any processor to any other processor’s
memory
Shared-memory systems
▪ Advantages:
▪ The global address space provides a user-friendly programming perspective on memory
▪ Fast and uniform data sharing due to the proximity of memory to the CPUs
▪ Disadvantages:
▪ Lack of scalability between memory and CPUs: adding more CPUs increases traffic on the shared memory-CPU path
▪ It is the programmer's responsibility to ensure correct (properly synchronized) access to global memory
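
As a rough illustration of the shared-memory programming model sketched above (assuming an OpenMP-capable C compiler, e.g. compiled with -fopenmp), all threads below see the same array in the global address space, and the reduction clause takes care of the otherwise race-prone updates to the shared sum:

```c
#include <stdio.h>
#include <omp.h>

#define N 1000000

int main(void) {
    static double a[N];              /* one array, visible to every thread */
    for (int i = 0; i < N; i++) a[i] = 1.0;

    double sum = 0.0;
    /* Threads share 'a'; the reduction avoids a data race on 'sum'. */
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < N; i++)
        sum += a[i];

    printf("sum = %.1f using up to %d threads\n", sum, omp_get_max_threads());
    return 0;
}
```

Note how the correctness burden mentioned above falls on the programmer: without the reduction clause, concurrent updates to the shared variable would be a race condition.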
Distributed-memory systems
▪ Each processor P is connected to an exclusive local memory
▪ No other CPU has direct access to it
▪ Each node comprises at least one network interface
▪ A serial process runs on each CPU and can communicate with processes on other CPUs by means of the network
Distributed-memory systems
▪ Advantages:
▪ Memory is scalable with the number of CPUs
▪ Each CPU can rapidly access its own memory without the overhead incurred in maintaining global cache coherence
▪ Disadvantages:
▪ The programmer is responsible for many of the details associated with data communication between processors
▪ It is usually difficult to map existing data structures, designed for a global memory, onto this memory organization
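
A minimal message-passing sketch in C of this model (assuming an MPI installation; built with mpicc and launched with mpirun): each process owns only its private local data, and the only way to share it is an explicit message over the network:

```c
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* this process's id         */
    MPI_Comm_size(MPI_COMM_WORLD, &size);  /* total number of processes */

    int local = rank * rank;               /* data in this process's local memory */

    if (rank != 0) {
        /* No shared memory: the value must be sent explicitly to process 0. */
        MPI_Send(&local, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
    } else {
        int total = local;
        for (int src = 1; src < size; src++) {
            int recv;
            MPI_Recv(&recv, 1, MPI_INT, src, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            total += recv;
        }
        printf("sum of rank^2 over %d processes = %d\n", size, total);
    }

    MPI_Finalize();
    return 0;
}
```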
Hybrid systems
▪ Large-scale parallel computers are neither purely shared-memory nor purely distributed-memory systems
▪ They are shared-memory building blocks connected via a fast network
▪ Advantages:
▪ Increased scalability
▪ Disadvantages:
▪ Increased programming complexity
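
A common way to program such hybrid machines (a sketch under the assumption that both MPI and OpenMP are available) is to use MPI between the distributed-memory nodes and OpenMP threads within each shared-memory node:

```c
#include <stdio.h>
#include <mpi.h>
#include <omp.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* typically one MPI process per node */

    double local_sum = 0.0;
    /* Shared-memory parallelism inside the node: OpenMP threads. */
    #pragma omp parallel for reduction(+:local_sum)
    for (int i = 0; i < 1000000; i++)
        local_sum += 1.0;

    double global_sum = 0.0;
    /* Distributed-memory communication between nodes: MPI. */
    MPI_Reduce(&local_sum, &global_sum, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("global sum = %.1f\n", global_sum);

    MPI_Finalize();
    return 0;
}
```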
END
