Part6_AnalyzingParallelPerformance

This document discusses the analysis of parallel performance in programming, focusing on concepts such as speedup, efficiency, and Amdahl's Law. It explains how to calculate maximum speedup, the impact of parallel overhead, and introduces the Karp-Flatt metric for assessing performance. Additionally, it covers the implications of workload imbalance and provides examples of speedup predictions based on different metrics.

Uploaded by

Boy Tân

Introduction to Parallel Programming – Part 6

Analyzing Parallel Performance


Intel® Software College

Objectives
At the end of this module, you should be able to
Define speedup and efficiency
Use Amdahl’s Law to predict maximum speedup
Use the Karp-Flatt metric to
analyze parallel program performance
predict speedup with additional processors

Copyright © 2006, Intel Corporation. All rights reserved.


Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

Speedup
Speedup is the ratio between sequential execution time and
parallel execution time
For example, if the sequential program executes in 6
seconds and the parallel program executes in 2 seconds, the
speedup is 3

[Figure: typical speedup curve, speedup versus number of processors, shown with the reference line y = x.]


Efficiency

Efficiency
A measure of processor utilization
Speedup divided by the number of processors
Example
Program achieves speedup of 3 on 4 CPUs
Efficiency is 3 / 4 = 75%
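
The two definitions above can be sketched in a few lines of Python. The function names are illustrative choices, not part of the original slides; the numbers are the ones used in the slide examples.

```python
def speedup(sequential_time: float, parallel_time: float) -> float:
    """Ratio of sequential execution time to parallel execution time."""
    return sequential_time / parallel_time


def efficiency(speedup_value: float, processors: int) -> float:
    """Speedup divided by the number of processors."""
    return speedup_value / processors


# Slide figures: 6 s sequential, 2 s parallel; speedup of 3 on 4 CPUs.
s = speedup(6.0, 2.0)
print(s)                 # 3.0
print(efficiency(s, 4))  # 0.75
```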

[Figure: typical efficiency curve, efficiency versus number of processors, shown with the reference line y = 1.0.]

Idea Behind Amdahl’s Law

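f: Portion of computation that will be performed sequentially
1-f: Portion of computation that will be executed in parallel

[Figure: execution time versus number of processors. The sequential portion f stays the same in every bar, while the parallel portion shrinks from 1-f on one processor to (1-f)/2, (1-f)/3, (1-f)/4, and (1-f)/5.]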

Derivation of Amdahl’s Law

Speedup is ratio of execution time on 1 processor to execution time on p processors
Execution time on 1 processor is f + (1-f)
Execution time on p processors is at least f + (1-f)/p

$$\psi \le \frac{f + (1 - f)}{f + (1 - f)/p} = \frac{1}{f + (1 - f)/p}$$
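
As a rough sketch of the bound just derived, the function below evaluates 1 / (f + (1-f)/p). The name `amdahl_speedup` and the sample value f = 0.1 are assumptions made for illustration only.

```python
def amdahl_speedup(f: float, p: int) -> float:
    """Upper bound on speedup for sequential fraction f on p processors."""
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must be between 0 and 1")
    if p < 1:
        raise ValueError("p must be at least 1")
    return 1.0 / (f + (1.0 - f) / p)


# f = 0.1 is an arbitrary illustrative value, not a figure from the slides.
for p in (1, 2, 4, 8, 1024):
    print(p, round(amdahl_speedup(0.1, p), 2))
```

Because the (1-f)/p term shrinks toward zero as p grows, the bound can never exceed 1/f, which is 10 for the value used here.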


Amdahl’s Law Is Too Optimistic

Amdahl’s Law ignores parallel processing overhead


Examples of this overhead include time spent
creating and terminating threads
Parallel processing overhead is usually an increasing
function of the number of processors


Graph with Parallel Overhead Added

[Figure: execution time versus number of processors. Parallel overhead increases with the number of processors.]

Other Optimistic Assumptions

Amdahl’s Law assumes that the computation divides evenly among the processors
In reality, the amount of work does not divide evenly
among the processors
Processor waiting time is another form of overhead
[Figure: per-processor timeline from task started to task completed, divided into working time and waiting time.]
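
One way to picture waiting time is the sketch below, which assumes every processor must wait until the slowest one finishes. The function name and the per-processor working times are hypothetical and do not come from the slides.

```python
def imbalance_report(work_per_processor: list[float]) -> dict[str, float]:
    """Summarize waiting time when all processors wait for the slowest one."""
    parallel_time = max(work_per_processor)   # finish time of the slowest processor
    total_work = sum(work_per_processor)
    p = len(work_per_processor)
    # Each processor idles from the end of its own work until the slowest finishes.
    waiting = sum(parallel_time - w for w in work_per_processor)
    return {
        "parallel_time": parallel_time,
        "ideal_time": total_work / p,         # time if work divided evenly
        "total_waiting_time": waiting,
    }


# Hypothetical working times in seconds for four processors.
print(imbalance_report([4.0, 3.0, 2.0, 3.0]))
```

With these made-up numbers the parallel phase takes 4 seconds instead of the evenly divided 3 seconds, and the processors spend 4 seconds in total waiting.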

Graph with Workload Imbalance Added

[Figure: execution time versus number of processors, with time lost due to workload imbalance added.]

More General Speedup Formula

$\psi(n,p)$: Speedup for problem of size n on p CPUs
$\sigma(n)$: Time spent in sequential portion of code for problem of size n
$\varphi(n)$: Time spent in parallelizable portion of code for problem of size n
$\kappa(n,p)$: Parallel overhead

$$\psi(n,p) \le \frac{\sigma(n) + \varphi(n)}{\sigma(n) + \varphi(n)/p + \kappa(n,p)}$$
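
A minimal sketch of the general formula follows. The timings (1 s sequential, 9 s parallelizable) and the overhead model of 0.1 s per additional processor are assumptions chosen only to make the behaviour visible; they are not measurements from the slides.

```python
def general_speedup(sigma: float, phi: float, kappa: float, p: int) -> float:
    """Upper bound (sigma + phi) / (sigma + phi / p + kappa)."""
    return (sigma + phi) / (sigma + phi / p + kappa)


# Assumed timings in seconds; overhead assumed to grow by 0.1 s per extra processor.
for p in (1, 2, 4, 8, 16, 32):
    kappa = 0.1 * (p - 1)
    print(p, round(general_speedup(1.0, 9.0, kappa, p), 2))
```

Under these assumed numbers the bound rises at first and then falls again once the growing overhead outweighs the shrinking parallel term.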

Amdahl’s Law: Maximum Speedup

$$\psi(n,p) \le \frac{\sigma(n) + \varphi(n)}{\sigma(n) + \varphi(n)/p + \kappa(n,p)}$$

The $\varphi(n)/p$ term assumes parallel work divides perfectly among available CPUs
The $\kappa(n,p)$ term is set to 0


The Amdahl Effect

$$\psi(n,p) \le \frac{\sigma(n) + \varphi(n)}{\sigma(n) + \varphi(n)/p + \kappa(n,p)}$$

As $n \to \infty$, the $\varphi(n)$ terms dominate

Speedup is an increasing function of problem size
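
The effect can be illustrated with a toy cost model. The three cost functions below (sequential cost growing like n, parallelizable cost like n², overhead like n·log₂ p) are assumptions picked so that the parallelizable part grows fastest; they do not describe any particular program.

```python
import math


def modeled_speedup(n: int, p: int) -> float:
    """Speedup bound under an assumed cost model (arbitrary time units)."""
    sigma = n                  # assumed sequential cost, grows like n
    phi = n * n / 100          # assumed parallelizable cost, grows like n^2
    kappa = n * math.log2(p)   # assumed parallel overhead
    return (sigma + phi) / (sigma + phi / p + kappa)


for n in (1_000, 10_000, 100_000):
    print(n, round(modeled_speedup(n, 8), 2))
```

On 8 processors the modeled speedup moves closer to 8 as n grows, because the parallelizable term outgrows the other two.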


Illustration of the Amdahl Effect


[Figure: speedup versus number of processors, showing the linear speedup line and curves for n = 100,000, n = 10,000, and n = 1,000, listed from top to bottom.]

Using Amdahl’s Law

Program executes in 5 seconds
Profile reveals 80% of time spent in function alpha, which we can execute in parallel
What would be maximum speedup on 2 processors?

$$\psi \le \frac{1}{0.2 + (1 - 0.2)/2} = \frac{1}{0.6} \approx 1.67$$

New execution time ≥ 5 sec / 1.67 = 3 seconds
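
The arithmetic of this example can be checked with a short script. The variable names are illustrative; the last line adds the limiting case of unlimited processors, which follows from the same formula but is not on the slide.

```python
total_time = 5.0          # seconds, from the example
parallel_fraction = 0.8   # share of time spent in function alpha
p = 2

f = 1.0 - parallel_fraction              # sequential fraction
max_speedup = 1.0 / (f + (1.0 - f) / p)
min_time = total_time / max_speedup

print(round(max_speedup, 2))  # 1.67
print(round(min_time, 2))     # 3.0
print(round(1.0 / f, 2))      # 5.0, the bound as p grows without limit
```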


The Karp-Flatt Metric

Suppose we benchmark a parallel program and get these speedup figures

Processors   Speedup   Efficiency
2            1.5       75%
3            1.8       60%
4            2         50%

Why is efficiency dropping?
How much speedup could we expect on 8 processors?


Deriving the Karp-Flatt Metric

$$\psi(n,p) \le \frac{\sigma(n) + \varphi(n)}{\sigma(n) + \varphi(n)/p + \kappa(n,p)}$$

The denominator represents parallel execution time
One processor does sequential code; others idle
All processors incur overhead time

“Wasted time” = $(p-1)\,\sigma(n) + p\,\kappa(n,p)$

Experimentally determined serial fraction = “wasted time” divided by (p-1) times sequential time
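
The step from this definition to a formula in terms of measured speedup can be written out as follows. The shorthand $T_1$ and $T_p$ is introduced here for convenience and is not used in the slides; the derivation assumes the measured parallel time equals the denominator of the speedup formula.

```latex
% Notation introduced for this sketch:
%   T_1 = \sigma(n) + \varphi(n)                      (sequential time)
%   T_p = \sigma(n) + \varphi(n)/p + \kappa(n,p)      (parallel time)
\begin{align*}
p\,T_p - T_1 &= (p-1)\,\sigma(n) + p\,\kappa(n,p)
  && \text{(the ``wasted time'')} \\
e &= \frac{p\,T_p - T_1}{(p-1)\,T_1}
   = \frac{p/\psi - 1}{p - 1}
  && \text{(using } \psi = T_1 / T_p \text{)} \\
  &= \frac{1/\psi - 1/p}{1 - 1/p}
  && \text{(dividing numerator and denominator by } p \text{)}
\end{align*}
```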

Karp-Flatt Metric

$$e = \frac{1/\psi - 1/p}{1 - 1/p}$$
The experimentally determined serial fraction is a
function of speedup and the number of processors
We can use e to determine whether efficiency
decreases are due to
Sequential component of computation
Increases in overhead


How to Interpret “e”

If “e” is constant as the number of processors increases, then speedup is constrained by the sequential component of the computation
If “e” is increasing as the number of processors
increases, then speedup is constrained by
parallel overhead, such as
Thread creation/termination time
Contention for shared data structures
Cache-related inefficiencies
Often a combination of the two factors


Going Back to Our Example

Processors   Speedup   Efficiency   e
2            1.5       75%          0.33
3            1.8       60%          0.33
4            2.0       50%          0.33

In this case, speedup is constrained by the relatively large amount of time spent in sequential code
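
The e column can be reproduced from the speedup column with the Karp-Flatt formula e = (1/ψ - 1/p) / (1 - 1/p). The sketch below uses the benchmark figures from the table; the function name `karp_flatt` is an illustrative choice.

```python
def karp_flatt(speedup: float, p: int) -> float:
    """Experimentally determined serial fraction e for p >= 2 processors."""
    if p < 2:
        raise ValueError("the metric is defined for p >= 2")
    return (1.0 / speedup - 1.0 / p) / (1.0 - 1.0 / p)


benchmark = {2: 1.5, 3: 1.8, 4: 2.0}  # processors -> speedup, from the table
for p, s in benchmark.items():
    print(p, round(karp_flatt(s, p), 2))  # e rounds to 0.33 in every row
```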


Example: Rectangle Rule Program

Processors   Speedup   Efficiency   e
2            1.87      93%          0.070
3            2.60      87%          0.078
4            3.16      79%          0.089

Benchmark data from an OpenMP program computing $\pi$ using the rectangle rule
We can predict speedup on 6 processors
Extrapolate e to be 0.11
Speedup would be 3.87
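
The prediction can be checked by solving the Karp-Flatt formula for speedup, which gives p / (e(p-1) + 1). The sketch below takes the extrapolated value e = 0.11 from the slide as given and does not attempt to reproduce how it was extrapolated; the function name is an illustrative choice.

```python
def predicted_speedup(e: float, p: int) -> float:
    """Speedup implied by serial fraction e on p processors."""
    return p / (e * (p - 1) + 1.0)


# e = 0.11 is the extrapolated value quoted for 6 processors.
print(round(predicted_speedup(0.11, 6), 2))  # 3.87
```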


Speedup Prediction Formula


$$e = \frac{1/\psi - 1/p}{1 - 1/p} \quad\Longrightarrow\quad \psi = \frac{p}{e(p - 1) + 1}$$
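
The algebra connecting the two forms is short; it is spelled out here as a sketch, using the same symbols e, ψ, and p.

```latex
\begin{align*}
e\left(1 - \frac{1}{p}\right) &= \frac{1}{\psi} - \frac{1}{p} \\
\frac{1}{\psi} &= e\left(1 - \frac{1}{p}\right) + \frac{1}{p}
               = \frac{e(p - 1) + 1}{p} \\
\psi &= \frac{p}{e(p - 1) + 1}
\end{align*}
```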


Case Study

We benchmark a sequential program and find it spends 85% of its time in functions we believe we can make parallel
We make these functions multithreaded and execute
the program on a dual-core system
The parallel program achieves a speedup of 1.67 on
2 processors
If we can get access to a quad-core system, what
kind of speedup should we expect?


Prediction Based on Amdahl’s Law


$$\psi \le \frac{1}{0.15 + (1 - 0.15)/4} \approx 2.76$$


Prediction Based on Karp-Flatt Metric

When p = 2, e = 0.20
We know 0.15 of e is sequential component
Rest of e (0.05) is parallel overhead
If parallel overhead increases linearly with number of processors, then it will be 0.15 when p = 4
We predict when p = 4, e = 0.30
Hence when p = 4, we predict speedup of 2.11
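
The case study can be worked through in code as a rough check. This sketch makes two assumptions: e at p = 2 is rounded to two decimals before further use (the formula gives roughly 0.198, which rounds to 0.20), and the overhead part of e grows by the same amount for each added processor. The function names are illustrative.

```python
def amdahl_speedup(f: float, p: int) -> float:
    """Amdahl bound for sequential fraction f on p processors."""
    return 1.0 / (f + (1.0 - f) / p)


def karp_flatt(speedup: float, p: int) -> float:
    """Experimentally determined serial fraction e."""
    return (1.0 / speedup - 1.0 / p) / (1.0 - 1.0 / p)


def predicted_speedup(e: float, p: int) -> float:
    """Speedup implied by serial fraction e on p processors."""
    return p / (e * (p - 1) + 1.0)


f = 0.15                                  # sequential fraction from the profile
e2 = round(karp_flatt(1.67, 2), 2)        # measured at p = 2, rounded to 0.2
overhead_per_step = e2 - f                # overhead share of e at p = 2
e4 = f + overhead_per_step * (4 - 1)      # assumed linear growth up to p = 4

print(round(amdahl_speedup(f, 4), 2))     # 2.76
print(round(predicted_speedup(e4, 4), 2)) # 2.11
```

The gap between the two predictions comes entirely from the overhead term that Amdahl’s Law leaves out.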


Superlinear Speedup

According to our general speedup formula, the maximum speedup a program can achieve on p processors is p
Superlinear speedup is the situation where
speedup is greater than the number of processors
used
It means the computational rate of the processors is
faster when the parallel program is executing
Superlinear speedup is usually caused because the
cache hit rate of the parallel program is higher


References

Michael J. Quinn, Parallel Programming in C with MPI and OpenMP, McGraw-Hill (2004).
