Module 4
Shared Memory Programming with OpenMP
Course Outcome:
At the end of the course, the student will be able to apply OpenMP
pragma and directives to parallelize the code to solve the given
problem
Topic
●
Introduction
●
OpenMP Pragmas and Directives
●
Trapezoidal Rule
●
The Reduction Clause
●
Loop Carried Dependency
●
Scheduling
●
Cache Coherence & False Sharing
●
Tasking
●
Thread Safety
Introduction
●
OpenMP is a “directive-based” API for Shared-Memory MIMD
Programming
●
It allows programmers to incrementally parallelize existing serial programs
●
It allows the programmer to specify the blocks of code to be executed
in parallel
●
However, the parallel execution of the code by threads is taken care of by the
compiler and run-time system
●
It requires a C compiler that supports OpenMP
Introduction
●
Shared-memory programs use “Fork / Join” parallelism
●
At the beginning, a single thread called the “Master Thread” is active and
executes the serial portion of the program
●
When parallel operations are to be executed, it creates (forks)
additional threads
●
At the end of the parallel code, the additional threads created
are destroyed or suspended (join)
OpenMP Pragmas & Directives
●
OpenMP supports parallelism through compiler directives called
“pragmas”
●
It always starts with “#pragma omp”
●
The general structure of the directive is:
#pragma omp <directive-name> [clause [clause ...]]
●
Where ‘directive-name’ specifies the action to be taken
●
‘Clause’ (optional) specifies the behavior of the parallel execution
●
The directive must end with a newline
OpenMP Pragmas & Directives
●
“parallel” pragma:
– It precedes a block of code that should be executed in parallel by all
threads
– Syntax is:
#pragma omp parallel
– If the block of code consists of more than a single statement, it must be
surrounded by curly braces ({ })
– The code after the ‘parallel’ pragma will be replicated among threads
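A minimal sketch of the ‘parallel’ pragma in use (the thread count of 4 and the message are just illustrative):

    #include <stdio.h>
    #include <omp.h>

    int main(void) {
        #pragma omp parallel num_threads(4)
        {   /* this block is replicated: each of the 4 threads executes it */
            printf("Hello from thread %d of %d\n",
                   omp_get_thread_num(), omp_get_num_threads());
        }   /* implicit join: the extra threads terminate here */
        return 0;
    }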
OpenMP Pragmas & Directives
●
“parallel for” pragma:
– It precedes a for loop that should be parallelized
– Syntax is:
#pragma omp parallel for
– The control clause of the for loop must allow the run-time system to
determine the number of iterations before the loop executes
– The loop index variable is ‘private’ by default, while other variables
are ‘shared’
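A minimal sketch of the pragma; the array a and its size are just illustrative:

    double a[1000];
    int i, n = 1000;

    #pragma omp parallel for
    for (i = 0; i < n; i++)       /* i is private; a and n are shared */
        a[i] = 2.0 * i;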
OpenMP Pragmas & Directives
●
“parallel for” pragma: (continued...)
– The for loop is parallelized only if the following conditions are met:
●
For loop is in canonical form
●
‘break’, ‘return’, ‘exit’ and ‘goto’ statements are not allowed, while
‘continue’ is allowed in the for loop
●
The ‘index’ variable must be of integer or pointer type
●
The ‘start’, ‘end’ and ‘incr’ must have compatible types
●
The ‘start’, ‘end’ and ‘incr’ must not change during execution
●
The ‘index’ must be changed only in the ‘update’ part of the for loop
OpenMP Pragmas & Directives
The canonical form of parallel for loop
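In outline (following the usual OpenMP definition of a canonical loop):

    for ( index = start ;  index OP end ;  UPDATE )

    where OP is one of        <    <=    >=    >
    and UPDATE is one of      index++   ++index   index--   --index
                              index += incr       index -= incr
                              index = index + incr    index = index - incr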
OpenMP Pragmas & Directives
●
“critical” pragma:
– Used to indicate a ‘critical section’ of a parallel program
– It immediately precedes the code that should be executed by
one thread at a time, i.e., with mutual exclusion
– It must be placed in the parallel section of the code
– A critical section serializes execution and therefore reduces the speedup achieved
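A minimal sketch; do_partial_work() and the variable names are hypothetical:

    double global_result = 0.0;

    #pragma omp parallel
    {
        double my_result = do_partial_work();

        #pragma omp critical
        global_result += my_result;    /* executed by one thread at a time */
    }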
OpenMP Pragmas & Directives
●
“single” pragma:
– It tells the compiler that only a single thread should execute
the block of code that follows it
– It must be placed in the parallel section of the code
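A minimal sketch; do_parallel_work() is hypothetical:

    #pragma omp parallel
    {
        do_parallel_work();                    /* executed by every thread    */

        #pragma omp single
        printf("Printed by one thread\n");     /* the other threads skip this */
                                               /* and wait at the implicit    */
                                               /* barrier that follows        */
    }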
OpenMP Pragmas & Directives
●
“section” pragma:
– It is used to achieve functional parallelism
– It precedes each function call that is executed in parallel by a
separate thread
– It must be specified inside the “parallel sections” pragma
OpenMP Pragmas & Directives
●
“parallel sections” pragma:
– It is used to achieve functional parallelism
– It precedes a block of ‘k’ blocks of code that may be
executed in parallel by ‘k’ separate threads
– The block of ‘k’ code blocks must be specified using curly
braces ({ })
OpenMP Pragmas & Directives
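A minimal sketch of the pragma with two hypothetical functions funcA() and funcB():

    #pragma omp parallel sections
    {
        #pragma omp section
        funcA();                  /* may run on one thread ...     */

        #pragma omp section
        funcB();                  /* ... while this runs on another */
    }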
OpenMP Pragmas & Directives
●
“sections” pragma:
– It is used to achieve functional parallelism
– Functionality is similar to “parallel sections” pragma
– However, it appears inside ‘parallel’ pragma and
– It doesn’t create any new threads; instead it uses the threads
created by the ‘parallel’ pragma
OpenMP Pragmas & Directives
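The same example written with ‘sections’ inside an existing ‘parallel’ region (funcA() and funcB() hypothetical):

    #pragma omp parallel
    {
        #pragma omp sections      /* reuses the threads created by ‘parallel’ */
        {
            #pragma omp section
            funcA();

            #pragma omp section
            funcB();
        }
    }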
OpenMP Pragmas & Directives
●
“private” clause:
– Tells the compiler to make one or more variables ‘private’ (each thread gets its own copy)
– Syntax: private (<variable list>)
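A short sketch of the clause in use (compute(), results[] and n are hypothetical placeholders):

    int i;
    double x;

    #pragma omp parallel for private(x)
    for (i = 0; i < n; i++) {
        x = compute(i);     /* each thread works on its own (uninitialized) copy of x */
        results[i] = x;
    }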
OpenMP Pragmas & Directives
●
“omp_get_num_procs” function:
– Returns the number of physical processors available for use
by parallel program
●
“omp_get_num_threads” function:
– Returns the number of active threads in the current parallel
region
OpenMP Pragmas & Directives
●
“omp_set_num_threads” function:
– Used to set the number of threads to be active in the parallel
sections of code
●
“omp_get_thread_num” function:
– Every thread on a multiprocessor has a unique identification
number
– Used to retrieve the unique ID of a thread
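A short sketch combining these four functions (the printf output is just illustrative):

    #include <stdio.h>
    #include <omp.h>

    int main(void) {
        int p = omp_get_num_procs();    /* physical processors available     */
        omp_set_num_threads(p);         /* request one thread per processor  */

        #pragma omp parallel
        {
            int id    = omp_get_thread_num();   /* unique ID of this thread   */
            int count = omp_get_num_threads();  /* threads in this region     */
            printf("Thread %d of %d (machine has %d processors)\n", id, count, p);
        }
        return 0;
    }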
Trapezoidal rule
●
A classic example of how to create a parallel program for a given problem
●
Problem: find the area under a curve between the two limits ‘a’
and ‘b’, where a < b
●
Solution:
– Divide the interval between ‘a’ and ‘b’ into ‘n’
subintervals of equal length
– Approximate the area over each subinterval by a trapezoid and sum these
areas to get the total area under the curve
Trapezoidal rule
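A sketch of the parallel trapezoidal rule using the ‘critical’ pragma; f(), a, b and n are assumed to be the function and inputs from the problem statement:

    #include <omp.h>

    double trap(double a, double b, int n, double (*f)(double)) {
        double h = (b - a) / n;            /* width of each subinterval        */
        double total = 0.0;

        #pragma omp parallel
        {
            double x, my_sum = 0.0;
            int i;

            /* each thread sums the interior points of its share of iterations */
            #pragma omp for
            for (i = 1; i <= n - 1; i++) {
                x = a + i * h;
                my_sum += f(x);
            }

            #pragma omp critical
            total += my_sum;               /* combine local results safely     */
        }

        total += (f(a) + f(b)) / 2.0;      /* endpoints counted with weight 1/2 */
        return total * h;
    }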
Reduction clause
●
Reductions are operations that combine the local results of
all threads into a single result
●
Without the reduction clause, reductions are usually written inside a ‘critical section’,
which reduces the speedup
Reduction clause
●
OpenMP provides a ‘reduction’ clause to specify reductions
●
Syntax: reduction(<operator>: <variable list>)
●
The reduction clause is specified on the ‘parallel’ or ‘parallel for’ directive
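The same kind of sum written with the clause (compute() and n are hypothetical placeholders):

    double sum = 0.0;
    int i;

    #pragma omp parallel for reduction(+: sum)
    for (i = 0; i < n; i++)
        sum += compute(i);   /* each thread accumulates a private copy of sum, */
                             /* combined with ‘+’ at the end of the loop       */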
Loop-Carried Dependency
●
Data dependency: a condition where one computation depends
on the result of another computation
●
Loop-carried dependency: a condition where data computed in one
iteration of a for loop is used in subsequent iterations
●
e.g., computing the Fibonacci series, estimating Pi
●
OpenMP compilers don’t check for data dependencies; the programmer
has to do it
●
A for loop with a loop-carried dependency cannot be parallelized correctly
without using features such as the tasking API
Loop-Carried Dependency
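A sketch of the Fibonacci example: iteration i reads values produced by iterations i-1 and i-2, so the ‘parallel for’ shown here (deliberately) gives incorrect results:

    long fibo[30];
    int i;

    fibo[0] = fibo[1] = 1;

    /* WRONG: loop-carried dependency on fibo[i-1] and fibo[i-2] */
    #pragma omp parallel for
    for (i = 2; i < 30; i++)
        fibo[i] = fibo[i - 1] + fibo[i - 2];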
Loop-Carried Dependency
●
Default clause:
– It makes the programmer declare the scope of every variable used in a parallel block
– Any variable declared outside but used inside the parallel block must have its
scope declared explicitly
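A short sketch with default(none): every variable used in the block must now be scoped explicitly (f() and the variable names are illustrative):

    double sum = 0.0;
    int i, n = 1000;

    #pragma omp parallel for default(none) reduction(+: sum) shared(n)
    for (i = 0; i < n; i++)
        sum += f(i);     /* forgetting to scope a variable is now a compile-time error */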
Loop Scheduling
●
In OpenMP, assigning loop iterations to threads is called scheduling
●
OpenMP by default uses block partitioning
●
The ‘schedule’ clause can be used to control how iterations are assigned in a
‘for’ or ‘parallel for’ directive
Loop Scheduling
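The clause has the general form (chunksize is optional):

    schedule(<type> [, <chunksize>])

where <type> is static, dynamic, guided or runtime (described on the following slides)
and <chunksize> is a positive integer.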
Loop Scheduling
●
‘Static’ Scheduling:
– System assigns ‘chunksize’ iterations to each thread in ‘round-robin’ fashion
– Useful when each iteration takes equal amount of time to execute
– If ‘chunksize’ is omitted, it will be equal to: total_iterations / thread_count
Examples: schedule(static, 1), schedule(static, 2), schedule(static, 4)
Loop Scheduling
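A worked illustration, assuming 12 iterations and 3 threads (work() is a hypothetical function):

    /* schedule(static, 2): chunks of 2 iterations, dealt out round-robin */
    /*   thread 0: iterations 0 1   6 7                                   */
    /*   thread 1: iterations 2 3   8 9                                   */
    /*   thread 2: iterations 4 5  10 11                                  */
    #pragma omp parallel for schedule(static, 2) num_threads(3)
    for (i = 0; i < 12; i++)
        work(i);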
Loop Scheduling
●
‘dynamic’ Scheduling:
– System assigns ‘chunksize’ iterations to each thread in a ‘first-come first-served’ fashion
– When a thread finishes its chunk, it requests another one from the run-time system
– Useful when loop iterations do not take uniform amount of time to execute
– If ‘chunksize’ is omitted, it will be equal to 1
Loop Scheduling
●
‘guided’ Scheduling:
– It is similar to ‘dynamic’ scheduling
– However, as chunks are completed, the size of new chunks decreases
– Useful when loop iterations do not take uniform amount of time to execute
Loop Scheduling
Loop Scheduling
●
‘runtime’ Scheduling:
– It uses the environment variable ‘OMP_SCHEDULE’ to determine at run time
how to schedule the loop
– The environment variable can take any of the values that can be used
for static, dynamic or guided schedule
– e.g., export OMP_SCHEDULE="static,1"
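A short sketch of schedule(runtime); the schedule is then taken from OMP_SCHEDULE when the program runs (work() and n are hypothetical):

    #pragma omp parallel for schedule(runtime)
    for (i = 0; i < n; i++)
        work(i);          /* actual schedule read from OMP_SCHEDULE */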
Cache coherence & False Sharing
●
Cache coherence: keeping the caches of different cores consistent when
they hold copies of the same shared variables
●
Cache line (or block): a block of contiguous memory is transferred from main
memory to the cache instead of a single value
●
When a thread updates a value in its cache, the entire cache line is
invalidated in the caches of the other cores
●
This forces the other threads to re-read the line from memory even though
they are not actually sharing the updated value
●
This phenomenon is called ‘false sharing’
Cache coherence & False Sharing
●
False sharing has a significant effect on performance of a parallel program
●
e.g., matrix-vector multiplication y = Ax, where A is m × n, x is n × 1 and y is m × 1
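A sketch of the parallelized product (row-major storage of A in a one-dimensional array is an assumption of this sketch):

    int i, j;

    #pragma omp parallel for default(none) private(j) shared(A, x, y, m, n)
    for (i = 0; i < m; i++) {
        y[i] = 0.0;                       /* write into the shared result vector */
        for (j = 0; j < n; j++)
            y[i] += A[i*n + j] * x[j];    /* reads of A and x, updates of y[i]   */
    }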
Cache coherence & False Sharing
●
The 8,000,000 × 8 case takes 22% more time than the 8000 × 8000 case due to write-misses at Line 4
of the code
●
The 8 × 8,000,000 case takes 26% more time than the 8000 × 8000 case due to read-misses at Line 6
of the code
Tasking
●
while and do-while loops cannot be parallelized with the ‘parallel for’ directive
●
for loops in which the number of iterations is not known in advance also cannot
be parallelized this way
●
This limits parallelization: it cannot be applied to recursive
algorithms, graph-related algorithms, etc.
●
Tasking functionality was created to address this issue
Tasking
●
It allows developers to specify independent units of computation with the
‘task’ directive
●
Syntax: #pragma omp task
●
When this directive is reached, a new task will be created
●
The new task may not necessarily be executed immediately
●
Tasks must be launched within a ‘parallel’ region by only one thread
●
Hence, tasking generally looks like:
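In outline (node_t, head and process() are hypothetical; the pattern itself is the usual one thread generating tasks inside a ‘single’ block):

    #pragma omp parallel
    {
        #pragma omp single            /* only one thread generates the tasks    */
        {
            node_t *p = head;         /* hypothetical linked list to traverse   */
            while (p != NULL) {
                #pragma omp task firstprivate(p)
                process(p);           /* each call becomes an independent task  */
                p = p->next;
            }
        }
    }   /* all tasks have completed by the end of the parallel region */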