Module 1 Chapter 3

Chapter 3: Principles of Scalable Performance

• Performance measures
• Speedup laws
• Scalability principles
• Scaling up vs. scaling down
Performance metrics and measures
2
• Parallelism profiles
• Asymptotic speedup factor
• System efficiency, utilization and quality
• Standard performance measures
Degree of parallelism
3
• Reflects the matching of software and hardware parallelism
• Discrete time function – measures, for each time period, the number of processors used
• The parallelism profile is a plot of the DOP as a function of time
• Ideally assumes unlimited resources
Factors affecting parallelism profiles
4
• Algorithm structure
• Program optimization
• Resource utilization
• Run-time conditions
• Realistically limited by the number of available processors, memory, and other nonprocessor resources
Average parallelism variables
5
• n – homogeneous processors
• m – maximum parallelism in a profile
• Δ – computing capacity of a single processor (execution rate only, no overhead)
• DOP = i – number of processors busy during an observation period
Average parallelism
6
• Total amount of work performed is proportional to the area under the profile curve:

W = \Delta \int_{t_1}^{t_2} DOP(t)\,dt

W = \Delta \sum_{i=1}^{m} i \cdot t_i

  where t_i is the total time during which DOP = i.
Average parallelism
7
A = \frac{1}{t_2 - t_1} \int_{t_1}^{t_2} DOP(t)\,dt

A = \left( \sum_{i=1}^{m} i \cdot t_i \right) \Big/ \left( \sum_{i=1}^{m} t_i \right)
Example: parallelism profile and average parallelism
8
Asymptotic speedup
9
T(1) = \sum_{i=1}^{m} t_i(1) = \sum_{i=1}^{m} \frac{W_i}{\Delta}    (response time on a single processor)

T(\infty) = \sum_{i=1}^{m} t_i(\infty) = \sum_{i=1}^{m} \frac{W_i}{i\,\Delta}    (response time with unlimited processors)

S_\infty = \frac{T(1)}{T(\infty)} = \frac{\sum_{i=1}^{m} W_i}{\sum_{i=1}^{m} W_i / i} = A in the ideal case
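
A small follow-on sketch (hypothetical work values W_i, unit capacity Δ = 1) showing that the asymptotic speedup equals the average parallelism in the ideal case.

```python
# Hypothetical work profile: DOP value i -> work W_i executed at that DOP.
W = {1: 2.0, 2: 6.0, 4: 16.0, 8: 8.0}
delta = 1.0                                             # single-processor capacity

T1 = sum(W_i / delta for W_i in W.values())             # T(1)
T_inf = sum(W_i / (i * delta) for i, W_i in W.items())  # T(infinity)

print("S_inf =", T1 / T_inf)                            # 3.2, the average parallelism A
```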
Performance measures
10
• Consider n processors executing m programs in various modes
• Want to define the mean performance of these multimode computers:
  • Arithmetic mean performance
  • Geometric mean performance
  • Harmonic mean performance
Arithmetic mean performance
11
R_a = \frac{1}{m} \sum_{i=1}^{m} R_i    – arithmetic mean execution rate (assumes equal weighting)

R_a^* = \sum_{i=1}^{m} f_i R_i    – weighted arithmetic mean execution rate

• Proportional to the sum of the inverses of the execution times
Geometric mean performance
12
R_g = \prod_{i=1}^{m} R_i^{1/m}    – geometric mean execution rate

R_g^* = \prod_{i=1}^{m} R_i^{f_i}    – weighted geometric mean execution rate

• Does not summarize the real performance, since it does not have an inverse relation with the total time
Harmonic mean performance
13
T_i = 1 / R_i    – mean execution time per instruction for program i

T_a = \frac{1}{m} \sum_{i=1}^{m} T_i = \frac{1}{m} \sum_{i=1}^{m} \frac{1}{R_i}    – arithmetic mean execution time per instruction
Harmonic mean performance
14
R_h = 1 / T_a = \frac{m}{\sum_{i=1}^{m} (1 / R_i)}    – harmonic mean execution rate

R_h^* = \frac{1}{\sum_{i=1}^{m} (f_i / R_i)}    – weighted harmonic mean execution rate

• Corresponds to the total number of operations divided by the total time (closest to the real performance)
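
A short sketch (hypothetical rates, equal weights f_i = 1/m) comparing the three means and showing why the harmonic mean tracks total operations divided by total time.

```python
import math

R = [4.0, 2.0, 8.0]                    # hypothetical per-program execution rates
m = len(R)

R_a = sum(R) / m                       # arithmetic mean rate
R_g = math.prod(R) ** (1 / m)          # geometric mean rate
R_h = m / sum(1 / r for r in R)        # harmonic mean rate

# If each program issues the same number of operations, the real aggregate rate is
# total operations / total time, which is exactly the harmonic mean.
ops = 1.0
real_rate = (m * ops) / sum(ops / r for r in R)
print(R_a, R_g, R_h, real_rate)        # R_h equals real_rate
```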
Harmonic Mean Speedup
15
• Ties the various modes of a program to the number of processors used
• Program is in mode i if i processors are used
• Sequential execution time T_1 = 1/R_1 = 1

S = T_1 / T^* = \frac{1}{\sum_{i=1}^{n} f_i / R_i}
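
A minimal sketch of this formula with R_i = i and a hypothetical mode-probability vector f.

```python
# Hypothetical probabilities of executing in modes 1..4 (mode i uses i processors),
# with R_i = i as assumed on the Amdahl's Law slide.
f = [0.4, 0.3, 0.2, 0.1]
R = [i + 1 for i in range(len(f))]            # R_1..R_4 = 1..4

S = 1.0 / sum(f_i / R_i for f_i, R_i in zip(f, R))
print(f"Harmonic mean speedup S = {S:.3f}")
```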
Harmonic Mean Speedup Performance
16
Amdahl’s Law
17
• Assume R_i = i and the weight vector w = (\alpha, 0, 0, \ldots, 1 - \alpha)
• System is either sequential, with probability \alpha, or fully parallel, with probability 1 - \alpha

S_n = \frac{n}{1 + (n - 1)\alpha}

• Implies S_n \to 1/\alpha as n \to \infty
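
A quick sketch of this bound with a hypothetical sequential fraction α = 0.05; the speedup saturates toward 1/α no matter how many processors are added.

```python
alpha = 0.05                                   # hypothetical sequential fraction

for n in (1, 4, 16, 64, 256, 1024):
    S_n = n / (1 + (n - 1) * alpha)
    print(f"n = {n:5d}   S_n = {S_n:6.2f}   (limit 1/alpha = {1 / alpha:.0f})")
```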
Speedup Performance
18
System Efficiency
19
• O(n) is the total number of unit operations performed on an n-processor system
• T(n) is the execution time in unit time steps
• T(n) < O(n) and T(1) = O(1)

S(n) = T(1) / T(n)

E(n) = \frac{S(n)}{n} = \frac{T(1)}{n\,T(n)}
Redundancy and Utilization
20
• Redundancy signifies the extent of matching of software and hardware parallelism

R(n) = O(n) / O(1)

• Utilization indicates the percentage of resources kept busy during execution

U(n) = R(n)\,E(n) = \frac{O(n)}{n\,T(n)}
Quality of Parallelism
21
• Directly proportional to the speedup and efficiency, and inversely related to the redundancy
• Upper-bounded by the speedup S(n)

Q(n) = \frac{S(n)\,E(n)}{R(n)} = \frac{T^3(1)}{n\,T^2(n)\,O(n)}
Example of Performance
22
• Given O(1) = T(1) = n^3, O(n) = n^3 + n^2 \log n, and T(n) = 4n^3/(n + 3):
  • S(n) = (n + 3)/4
  • E(n) = (n + 3)/(4n)
  • R(n) = (n + \log n)/n
  • U(n) = (n + 3)(n + \log n)/(4n^2)
  • Q(n) = (n + 3)^2 / (16(n + \log n))
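
A sketch that plugs a hypothetical n into the given O and T functions and reproduces the closed forms above (assuming a base-2 logarithm, which the slide leaves unstated).

```python
import math

n = 64                                   # hypothetical machine size
log_n = math.log2(n)                     # assumed base-2 logarithm

O1 = T1 = n ** 3
On = n ** 3 + n ** 2 * log_n
Tn = 4 * n ** 3 / (n + 3)

S = T1 / Tn                              # (n + 3) / 4
E = S / n                                # (n + 3) / (4n)
R = On / O1                              # (n + log n) / n
U = R * E                                # (n + 3)(n + log n) / (4 n^2)
Q = S * E / R                            # (n + 3)^2 / (16 (n + log n))

print(S, E, R, U, Q)
print((n + 3) / 4, (n + 3) / (4 * n))    # matches S and E above
```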
Standard Performance Measures
23
• MIPS and Mflops
  • Depend on the instruction set and the program used
• Dhrystone results
  • Measure of integer performance
• Whetstone results
  • Measure of floating-point performance
• TPS and KLIPS ratings
  • Transaction processing performance and reasoning power
Parallel Processing Applications
24
• Drug design
• High-speed civil transport
• Ocean modeling
• Ozone depletion research
• Air pollution
• Digital anatomy
Application Models for Parallel Computers
25
• Fixed-load model
  • Constant workload
• Fixed-time model
  • Demands constant program execution time
• Fixed-memory model
  • Limited by the memory bound
26
Algorithm Characteristics
27
• Deterministic vs. nondeterministic
• Computational granularity
• Parallelism profile
• Communication patterns and synchronization requirements
• Uniformity of operations
• Memory requirement and data structures
Isoefficiency Concept
28
• Relates the workload to the machine size n needed to maintain a fixed efficiency:

E = \frac{w(s)}{w(s) + h(s, n)}    (workload w(s), overhead h(s, n))

• The smaller the power of n in the required workload growth, the more scalable the system
Isoefficiency Function
29
• To maintain a constant E, w(s) should grow in proportion to h(s, n):

w(s) = \frac{E}{1 - E}\, h(s, n)

• C = E/(1 - E) is constant for a fixed E, giving the isoefficiency function:

f_E(n) = C \cdot h(s, n)
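
A brief sketch (not from the slides) that evaluates the isoefficiency function for a hypothetical overhead model h(s, n) = n·log2(n) and a target efficiency E = 0.8.

```python
import math

def isoefficiency(n, E=0.8):
    """Workload needed to hold efficiency at E on n processors,
    assuming a hypothetical overhead h(s, n) = n * log2(n)."""
    C = E / (1 - E)
    h = n * math.log2(n)
    return C * h

for n in (2, 8, 32, 128):
    print(n, round(isoefficiency(n), 1))
```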
Speedup Performance Laws
30
• Amdahl's law
  • For a fixed workload or fixed problem size
• Gustafson's law
  • For scaled problems (problem size increases with increased machine size)
• Memory-bounded speedup model
  • For scaled problems bounded by memory capacity
Amdahl’s Law
31
• As the number of processors increases, the fixed load is distributed across more processors
• Minimal turnaround time is the primary goal
• Speedup factor is upper-bounded by a sequential bottleneck
• Two cases:
  • DOP < n
  • DOP ≥ n
Fixed Load Speedup Factor
32
• Case 1: DOP = i ≥ n

t_i(n) = \frac{W_i}{i\,\Delta} \left\lceil \frac{i}{n} \right\rceil

• Case 2: DOP = i < n

t_i(n) = t_i(\infty) = \frac{W_i}{i\,\Delta}

• Combining both cases:

T(n) = \sum_{i=1}^{m} \frac{W_i}{i\,\Delta} \left\lceil \frac{i}{n} \right\rceil

S_n = \frac{T(1)}{T(n)} = \frac{\sum_{i=1}^{m} W_i}{\sum_{i=1}^{m} \frac{W_i}{i} \left\lceil \frac{i}{n} \right\rceil}
Amdahl’s Law
33
Gustafson’s Law
34
• With Amdahl's law, the workload cannot scale to match the available computing power as n increases
• Gustafson's law fixes the time, allowing the problem size to increase with higher n
• Not saving time, but increasing accuracy
Fixed-time Speedup
35
• As the machine size increases, there is an increased workload and a new parallelism profile
• In general, W_i' > W_i for 2 ≤ i ≤ m' and W_1' = W_1
• Assume T(1) = T'(n)
Gustafson’s Scaled Speedup
36
• Fixed-time condition, T(1) = T'(n):

\sum_{i=1}^{m} W_i = \sum_{i=1}^{m'} \frac{W_i'}{i} \left\lceil \frac{i}{n} \right\rceil + Q(n)

• Scaled (fixed-time) speedup:

S_n' = \frac{\sum_{i=1}^{m'} W_i'}{\sum_{i=1}^{m} W_i} = \frac{W_1 + n W_n}{W_1 + W_n}

  where the last equality is the special case with only sequential work W_1 and fully parallel work W_n, and negligible overhead Q(n).
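
A minimal sketch of the special case above, with a hypothetical normalized split between sequential and parallel work; the speedup now grows nearly linearly in n.

```python
# Hypothetical split of the original workload: sequential part W1, parallel part Wn.
W1, Wn = 0.05, 0.95

for n in (1, 4, 16, 64, 256):
    S_scaled = (W1 + n * Wn) / (W1 + Wn)
    print(n, round(S_scaled, 2))
```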
Gustafson’s Scaled Speedup
37
Memory Bounded Speedup Model
38
• Idea is to solve the largest problem possible, limited by memory space
• Results in a scaled workload and higher accuracy
• Each node can handle only a small subproblem in a distributed-memory system
• Using a large number of nodes collectively increases the memory capacity proportionally
Fixed-Memory Speedup
39
• Let M be the memory requirement and W the computational workload: W = g(M)
• Scaled workload: g^*(nM) = G(n)\,g(M) = G(n)\,W_n

• Memory-bounded speedup (general form):

S_n^* = \frac{\sum_{i=1}^{m^*} W_i^*}{\sum_{i=1}^{m^*} \frac{W_i^*}{i} \left\lceil \frac{i}{n} \right\rceil + Q(n)}

• Special case with only sequential and fully parallel work and negligible overhead:

S_n^* = \frac{W_1 + G(n)\,W_n}{W_1 + G(n)\,W_n / n}
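
A sketch of the special-case formula with a hypothetical growth factor G(n); setting G(n) = 1 recovers the fixed-load (Amdahl) case and G(n) = n recovers Gustafson's case.

```python
def memory_bounded_speedup(n, W1=0.05, Wn=0.95, G=lambda n: n ** 1.2):
    """Special-case memory-bounded speedup for a hypothetical growth factor G(n)."""
    return (W1 + G(n) * Wn) / (W1 + G(n) * Wn / n)

for n in (4, 16, 64, 256):
    print(n, round(memory_bounded_speedup(n), 2))
```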
Fixed-Memory Speedup
40
• Scaled speedup model using fixed memory
Relating Speedup Models
41
• G(n) reflects the increase in workload as memory increases n times
• G(n) = 1 : fixed problem size (Amdahl)
• G(n) = n : workload increases n times when memory is increased n times (Gustafson)
• G(n) > n : workload increases faster than the memory requirement
Scalability Metrics
42
• Machine size (n) : number of processors
• Clock rate (f) : determines the basic machine cycle
• Problem size (s) : amount of computational workload; directly proportional to T(s,1)
• CPU time (T(s,n)) : actual CPU time for execution
• I/O demand (d) : demand in moving the program, data, and results for a given run
Scalability Metrics
43
• Memory capacity (m) : maximum number of memory words demanded
• Communication overhead (h(s,n)) : amount of time spent on interprocessor communication, synchronization, etc.
• Computer cost (c) : total cost of hardware and software resources required
• Programming overhead (p) : development overhead associated with an application program
Speedup and Efficiency
44
• The problem size is the independent parameter

S(s, n) = \frac{T(s, 1)}{T(s, n) + h(s, n)}

E(s, n) = \frac{S(s, n)}{n}
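
A sketch of these two definitions using hypothetical timing and overhead models (both functions below are assumptions, not from the slides).

```python
import math

def T(s, n):
    """Hypothetical parallel CPU time for problem size s on n processors."""
    return s / n

def h(s, n):
    """Hypothetical communication/synchronization overhead."""
    return 0.01 * s * math.log2(n) if n > 1 else 0.0

s = 1_000.0
for n in (1, 4, 16, 64):
    S = T(s, 1) / (T(s, n) + h(s, n))
    print(n, round(S, 2), round(S / n, 3))    # speedup and efficiency
```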
Scalable Systems
45
• Ideally, if E(s,n) = 1 for all algorithms and any s and n, the system is scalable
• Practically, consider the scalability of a machine relative to an ideal reference machine:

\Phi(s, n) = \frac{S(s, n)}{S_I(s, n)} = \frac{T_I(s, n)}{T(s, n)}
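
A small sketch of Φ(s, n) with hypothetical timing models for the ideal and real machines.

```python
def T_ideal(s, n):
    """Hypothetical ideal-machine time (no overhead)."""
    return s / n

def T_real(s, n):
    """Hypothetical real-machine time with a fixed per-run overhead."""
    return s / n + 0.02 * s

s = 1_000.0
for n in (4, 16, 64):
    phi = T_ideal(s, n) / T_real(s, n)        # Phi(s, n) = T_I(s, n) / T(s, n)
    print(n, round(phi, 3))
```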