Module 1 Chapter 3

Chapter 3: Principles of Scalable Performance

• Performance measures
• Speedup laws
• Scalability principles
• Scaling up vs. scaling down
Performance metrics and measures
2
• Parallelism profiles
• Asymptotic speedup factor
• System efficiency, utilization and quality
• Standard performance measures
Degree of parallelism
3
• Reflects the matching of software and hardware parallelism
• Discrete time function – measures, for each time period, the number of processors used
• The parallelism profile is a plot of the DOP as a function of time
• Ideally assumes unlimited resources
Factors affecting parallelism profiles
4
• Algorithm structure
• Program optimization
• Resource utilization
• Run-time conditions
• Realistically limited by the number of available processors, memory, and other nonprocessor resources
Average parallelism variables
5
• n – homogeneous processors
• m – maximum parallelism in a profile
• Δ – computing capacity of a single processor (execution rate only, no overhead)
• DOP = i – number of processors busy during an observation period
Average parallelism
6
• Total amount of work performed is proportional to the area under the profile curve:

W = \Delta \int_{t_1}^{t_2} DOP(t)\,dt

W = \Delta \sum_{i=1}^{m} i \cdot t_i

  where t_i is the total time during which DOP = i.
Average parallelism
7
A = \frac{1}{t_2 - t_1} \int_{t_1}^{t_2} DOP(t)\,dt

A = \left( \sum_{i=1}^{m} i \cdot t_i \right) \Big/ \left( \sum_{i=1}^{m} t_i \right)
Example: parallelism profile and average parallelism
8
Asymptotic speedup
9
T(1) = \sum_{i=1}^{m} t_i(1) = \sum_{i=1}^{m} \frac{W_i}{\Delta}    (response time on a single processor)

T(\infty) = \sum_{i=1}^{m} t_i(\infty) = \sum_{i=1}^{m} \frac{W_i}{i\,\Delta}    (response time with unlimited processors)

S_\infty = \frac{T(1)}{T(\infty)} = \frac{\sum_{i=1}^{m} W_i}{\sum_{i=1}^{m} W_i / i} = A in the ideal case
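
A small follow-on sketch (hypothetical work values W_i, unit capacity Δ = 1) showing that the asymptotic speedup equals the average parallelism in the ideal case.

```python
# Hypothetical work profile: DOP value i -> work W_i executed at that DOP.
W = {1: 2.0, 2: 6.0, 4: 16.0, 8: 8.0}
delta = 1.0                                             # single-processor capacity

T1 = sum(W_i / delta for W_i in W.values())             # T(1)
T_inf = sum(W_i / (i * delta) for i, W_i in W.items())  # T(infinity)

print("S_inf =", T1 / T_inf)                            # 3.2, the average parallelism A
```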
Performance measures
10
• Consider n processors executing m programs in various modes
• Want to define the mean performance of these multimode computers:
  • Arithmetic mean performance
  • Geometric mean performance
  • Harmonic mean performance
Arithmetic mean performance
11
R_a = \frac{1}{m} \sum_{i=1}^{m} R_i    – arithmetic mean execution rate (assumes equal weighting)

R_a^* = \sum_{i=1}^{m} f_i R_i    – weighted arithmetic mean execution rate

• Proportional to the sum of the inverses of the execution times
Geometric mean performance
12
R_g = \prod_{i=1}^{m} R_i^{1/m}    – geometric mean execution rate

R_g^* = \prod_{i=1}^{m} R_i^{f_i}    – weighted geometric mean execution rate

• Does not summarize the real performance, since it does not have an inverse relation with the total time
Harmonic mean performance
13
T_i = 1 / R_i    – mean execution time per instruction for program i

T_a = \frac{1}{m} \sum_{i=1}^{m} T_i = \frac{1}{m} \sum_{i=1}^{m} \frac{1}{R_i}    – arithmetic mean execution time per instruction
Harmonic mean performance
14
R_h = 1 / T_a = \frac{m}{\sum_{i=1}^{m} (1 / R_i)}    – harmonic mean execution rate

R_h^* = \frac{1}{\sum_{i=1}^{m} (f_i / R_i)}    – weighted harmonic mean execution rate

• Corresponds to the total number of operations divided by the total time (closest to the real performance)
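
A short sketch (hypothetical rates, equal weights f_i = 1/m) comparing the three means and showing why the harmonic mean tracks total operations divided by total time.

```python
import math

R = [4.0, 2.0, 8.0]                    # hypothetical per-program execution rates
m = len(R)

R_a = sum(R) / m                       # arithmetic mean rate
R_g = math.prod(R) ** (1 / m)          # geometric mean rate
R_h = m / sum(1 / r for r in R)        # harmonic mean rate

# If each program issues the same number of operations, the real aggregate rate is
# total operations / total time, which is exactly the harmonic mean.
ops = 1.0
real_rate = (m * ops) / sum(ops / r for r in R)
print(R_a, R_g, R_h, real_rate)        # R_h equals real_rate
```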
Harmonic Mean Speedup
15
• Ties the various modes of a program to the number of processors used
• Program is in mode i if i processors are used
• Sequential execution time T_1 = 1/R_1 = 1

S = T_1 / T^* = \frac{1}{\sum_{i=1}^{n} f_i / R_i}
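
A minimal sketch of this formula with R_i = i and a hypothetical mode-probability vector f.

```python
# Hypothetical probabilities of executing in modes 1..4 (mode i uses i processors),
# with R_i = i as assumed on the Amdahl's Law slide.
f = [0.4, 0.3, 0.2, 0.1]
R = [i + 1 for i in range(len(f))]            # R_1..R_4 = 1..4

S = 1.0 / sum(f_i / R_i for f_i, R_i in zip(f, R))
print(f"Harmonic mean speedup S = {S:.3f}")
```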
Harmonic Mean Speedup Performance
16
Amdahl’s Law
17
• Assume R_i = i and the weight vector w = (\alpha, 0, 0, \ldots, 1 - \alpha)
• System is either sequential, with probability \alpha, or fully parallel, with probability 1 - \alpha

S_n = \frac{n}{1 + (n - 1)\alpha}

• Implies S_n \to 1/\alpha as n \to \infty
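
A quick sketch of this bound with a hypothetical sequential fraction α = 0.05; the speedup saturates toward 1/α no matter how many processors are added.

```python
alpha = 0.05                                   # hypothetical sequential fraction

for n in (1, 4, 16, 64, 256, 1024):
    S_n = n / (1 + (n - 1) * alpha)
    print(f"n = {n:5d}   S_n = {S_n:6.2f}   (limit 1/alpha = {1 / alpha:.0f})")
```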
Speedup Performance
18
System Efficiency
19
• O(n) is the total number of unit operations performed on an n-processor system
• T(n) is the execution time in unit time steps
• T(n) < O(n) and T(1) = O(1)

S(n) = T(1) / T(n)

E(n) = \frac{S(n)}{n} = \frac{T(1)}{n\,T(n)}
Redundancy and Utilization
20
• Redundancy signifies the extent of matching of software and hardware parallelism

R(n) = O(n) / O(1)

• Utilization indicates the percentage of resources kept busy during execution

U(n) = R(n)\,E(n) = \frac{O(n)}{n\,T(n)}
Quality of Parallelism
21
• Directly proportional to the speedup and efficiency, and inversely related to the redundancy
• Upper-bounded by the speedup S(n)

Q(n) = \frac{S(n)\,E(n)}{R(n)} = \frac{T^3(1)}{n\,T^2(n)\,O(n)}
Example of Performance
22
• Given O(1) = T(1) = n^3, O(n) = n^3 + n^2 \log n, and T(n) = 4n^3/(n + 3):
  • S(n) = (n + 3)/4
  • E(n) = (n + 3)/(4n)
  • R(n) = (n + \log n)/n
  • U(n) = (n + 3)(n + \log n)/(4n^2)
  • Q(n) = (n + 3)^2 / (16(n + \log n))
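
A sketch that plugs a hypothetical n into the given O and T functions and reproduces the closed forms above (assuming a base-2 logarithm, which the slide leaves unstated).

```python
import math

n = 64                                   # hypothetical machine size
log_n = math.log2(n)                     # assumed base-2 logarithm

O1 = T1 = n ** 3
On = n ** 3 + n ** 2 * log_n
Tn = 4 * n ** 3 / (n + 3)

S = T1 / Tn                              # (n + 3) / 4
E = S / n                                # (n + 3) / (4n)
R = On / O1                              # (n + log n) / n
U = R * E                                # (n + 3)(n + log n) / (4 n^2)
Q = S * E / R                            # (n + 3)^2 / (16 (n + log n))

print(S, E, R, U, Q)
print((n + 3) / 4, (n + 3) / (4 * n))    # matches S and E above
```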
Standard Performance Measures
23
• MIPS and Mflops
  • Depend on the instruction set and the program used
• Dhrystone results
  • Measure of integer performance
• Whetstone results
  • Measure of floating-point performance
• TPS and KLIPS ratings
  • Transaction processing performance and reasoning power
Parallel Processing Applications
24
• Drug design
• High-speed civil transport
• Ocean modeling
• Ozone depletion research
• Air pollution
• Digital anatomy
Application Models for Parallel Computers
25
• Fixed-load model
  • Constant workload
• Fixed-time model
  • Demands constant program execution time
• Fixed-memory model
  • Limited by the memory bound
26
Algorithm Characteristics
27
• Deterministic vs. nondeterministic
• Computational granularity
• Parallelism profile
• Communication patterns and synchronization requirements
• Uniformity of operations
• Memory requirement and data structures
Isoefficiency Concept
28
• Relates the workload to the machine size n needed to maintain a fixed efficiency:

E = \frac{w(s)}{w(s) + h(s, n)}    (workload w(s), overhead h(s, n))

• The smaller the power of n in the required workload growth, the more scalable the system
Isoefficiency Function
29
• To maintain a constant E, w(s) should grow in proportion to h(s, n):

w(s) = \frac{E}{1 - E}\, h(s, n)

• C = E/(1 - E) is constant for a fixed E, giving the isoefficiency function:

f_E(n) = C \cdot h(s, n)
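
A brief sketch (not from the slides) that evaluates the isoefficiency function for a hypothetical overhead model h(s, n) = n·log2(n) and a target efficiency E = 0.8.

```python
import math

def isoefficiency(n, E=0.8):
    """Workload needed to hold efficiency at E on n processors,
    assuming a hypothetical overhead h(s, n) = n * log2(n)."""
    C = E / (1 - E)
    h = n * math.log2(n)
    return C * h

for n in (2, 8, 32, 128):
    print(n, round(isoefficiency(n), 1))
```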
Speedup Performance Laws
30
• Amdahl's law
  • For a fixed workload or fixed problem size
• Gustafson's law
  • For scaled problems (problem size increases with increased machine size)
• Memory-bounded speedup model
  • For scaled problems bounded by memory capacity
Amdahl’s Law
31
• As the number of processors increases, the fixed load is distributed across more processors
• Minimal turnaround time is the primary goal
• Speedup factor is upper-bounded by a sequential bottleneck
• Two cases:
  • DOP < n
  • DOP ≥ n
Fixed Load Speedup Factor
32
• Case 1: DOP = i ≥ n

t_i(n) = \frac{W_i}{i\,\Delta} \left\lceil \frac{i}{n} \right\rceil

• Case 2: DOP = i < n

t_i(n) = t_i(\infty) = \frac{W_i}{i\,\Delta}

• Combining both cases:

T(n) = \sum_{i=1}^{m} \frac{W_i}{i\,\Delta} \left\lceil \frac{i}{n} \right\rceil

S_n = \frac{T(1)}{T(n)} = \frac{\sum_{i=1}^{m} W_i}{\sum_{i=1}^{m} \frac{W_i}{i} \left\lceil \frac{i}{n} \right\rceil}
Amdahl’s Law
33
Gustafson’s Law
34
• With Amdahl's law, the workload cannot scale to match the available computing power as n increases
• Gustafson's law fixes the time, allowing the problem size to increase with higher n
• Not saving time, but increasing accuracy
Fixed-time Speedup
35
• As the machine size increases, there is an increased workload and a new parallelism profile
• In general, W_i' > W_i for 2 ≤ i ≤ m' and W_1' = W_1
• Assume T(1) = T'(n)
Gustafson’s Scaled Speedup
36
• Fixed-time condition, T(1) = T'(n):

\sum_{i=1}^{m} W_i = \sum_{i=1}^{m'} \frac{W_i'}{i} \left\lceil \frac{i}{n} \right\rceil + Q(n)

• Scaled (fixed-time) speedup:

S_n' = \frac{\sum_{i=1}^{m'} W_i'}{\sum_{i=1}^{m} W_i} = \frac{W_1 + n W_n}{W_1 + W_n}

  where the last equality is the special case with only sequential work W_1 and fully parallel work W_n, and negligible overhead Q(n).
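
A minimal sketch of the special case above, with a hypothetical normalized split between sequential and parallel work; the speedup now grows nearly linearly in n.

```python
# Hypothetical split of the original workload: sequential part W1, parallel part Wn.
W1, Wn = 0.05, 0.95

for n in (1, 4, 16, 64, 256):
    S_scaled = (W1 + n * Wn) / (W1 + Wn)
    print(n, round(S_scaled, 2))
```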
Gustafson’s Scaled Speedup
37
Memory Bounded Speedup Model
38
• Idea is to solve the largest problem possible, limited by memory space
• Results in a scaled workload and higher accuracy
• Each node can handle only a small subproblem in a distributed-memory system
• Using a large number of nodes collectively increases the memory capacity proportionally
Fixed-Memory Speedup
39
• Let M be the memory requirement and W the computational workload: W = g(M)
• Scaled workload: g^*(nM) = G(n)\,g(M) = G(n)\,W_n

• Memory-bounded speedup (general form):

S_n^* = \frac{\sum_{i=1}^{m^*} W_i^*}{\sum_{i=1}^{m^*} \frac{W_i^*}{i} \left\lceil \frac{i}{n} \right\rceil + Q(n)}

• Special case with only sequential and fully parallel work and negligible overhead:

S_n^* = \frac{W_1 + G(n)\,W_n}{W_1 + G(n)\,W_n / n}
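
A sketch of the special-case formula with a hypothetical growth factor G(n); setting G(n) = 1 recovers the fixed-load (Amdahl) case and G(n) = n recovers Gustafson's case.

```python
def memory_bounded_speedup(n, W1=0.05, Wn=0.95, G=lambda n: n ** 1.2):
    """Special-case memory-bounded speedup for a hypothetical growth factor G(n)."""
    return (W1 + G(n) * Wn) / (W1 + G(n) * Wn / n)

for n in (4, 16, 64, 256):
    print(n, round(memory_bounded_speedup(n), 2))
```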
Fixed-Memory Speedup
40
• Scaled speedup model using fixed memory
Relating Speedup Models
41
• G(n) reflects the increase in workload as memory increases n times
• G(n) = 1 : fixed problem size (Amdahl)
• G(n) = n : workload increases n times when memory is increased n times (Gustafson)
• G(n) > n : workload increases faster than the memory requirement
Scalability Metrics
42
• Machine size (n) : number of processors
• Clock rate (f) : determines the basic machine cycle
• Problem size (s) : amount of computational workload; directly proportional to T(s,1)
• CPU time (T(s,n)) : actual CPU time for execution
• I/O demand (d) : demand in moving the program, data, and results for a given run
Scalability Metrics
43
• Memory capacity (m) : maximum number of memory words demanded
• Communication overhead (h(s,n)) : amount of time spent on interprocessor communication, synchronization, etc.
• Computer cost (c) : total cost of hardware and software resources required
• Programming overhead (p) : development overhead associated with an application program
Speedup and Efficiency
44
• The problem size is the independent parameter

S(s, n) = \frac{T(s, 1)}{T(s, n) + h(s, n)}

E(s, n) = \frac{S(s, n)}{n}
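
A sketch of these two definitions using hypothetical timing and overhead models (both functions below are assumptions, not from the slides).

```python
import math

def T(s, n):
    """Hypothetical parallel CPU time for problem size s on n processors."""
    return s / n

def h(s, n):
    """Hypothetical communication/synchronization overhead."""
    return 0.01 * s * math.log2(n) if n > 1 else 0.0

s = 1_000.0
for n in (1, 4, 16, 64):
    S = T(s, 1) / (T(s, n) + h(s, n))
    print(n, round(S, 2), round(S / n, 3))    # speedup and efficiency
```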
Scalable Systems
45
• Ideally, if E(s,n) = 1 for all algorithms and any s and n, the system is scalable
• Practically, consider the scalability of a machine relative to an ideal reference machine:

\Phi(s, n) = \frac{S(s, n)}{S_I(s, n)} = \frac{T_I(s, n)}{T(s, n)}
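
A small sketch of Φ(s, n) with hypothetical timing models for the ideal and real machines.

```python
def T_ideal(s, n):
    """Hypothetical ideal-machine time (no overhead)."""
    return s / n

def T_real(s, n):
    """Hypothetical real-machine time with a fixed per-run overhead."""
    return s / n + 0.02 * s

s = 1_000.0
for n in (4, 16, 64):
    phi = T_ideal(s, n) / T_real(s, n)        # Phi(s, n) = T_I(s, n) / T(s, n)
    print(n, round(phi, 3))
```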