01 ParProg20

This document provides an overview of a course on parallel programming with OpenMP and MPI. The course will be taught online with weekly lectures and exercise sessions. The lectures will cover parallel computer architecture, shared-memory and message passing parallel programming, and performance optimization. Students are expected to have some programming experience in C, C++, or Fortran and familiarity with Linux. The course aims to teach how to effectively map numerical algorithms to modern parallel computer hardware.


Winter term 2020/2021

Parallel Programming with OpenMP and MPI


Dr. Georg Hager
Erlangen Regional Computing Center (RRZE) at Friedrich-Alexander-Universität Erlangen-Nürnberg
Institute of Physics, Universität Greifswald

Lecture 1: Preliminaries (kick-off meeting)


Audience and contact
▪ Audience
▪ Physics, theoretical chemistry, computer science, applied math, materials science, “computational XYZ”
▪ Everyone who
▪ Needs more computing power than what a laptop/PC can provide
▪ Wants to learn about parallel programming from desktop to supercomputers
▪ Lecturer
▪ Georg Hager [email protected]
▪ Associate lecturer at University of Greifswald, Institute of Physics
▪ PhD 2005, Habilitation 2014 (both in Greifswald)
▪ Contact: Preferably use the Moodle forum
▪ Moodle course: http://tiny.cc/ParProg20

Parallel Programming 2020 2020-10-13 2


Course format
▪ Online lecture
▪ 2 hours (90 minutes) per week
▪ Lecture video published every Monday on Moodle

▪ Exercises
▪ One exercise sheet every week
▪ Solutions will be discussed in the Q&A session (no submission necessary)

▪ Online Q&A session (via BigBlueButton, BBB) with discussion of exercises
▪ Tuesday 3 p.m.

▪ All material (slides, videos, exercises) available at http://tiny.cc/ParProg20


Course prerequisites
▪ Lecture:
▪ Some C, C++, or Fortran programming
▪ Examples are in (simple) C or Fortran

▪ Exercises:
▪ Linux command line (including remote access via SSH)
▪ Recommended Windows tool: MobaXTerm (https://mobaxterm.mobatek.net/)
▪ Handling a compiler on the command line
▪ You will get accounts for accessing the HPC clusters at RRZE (FAU Erlangen-Nürnberg)

▪ Linux tutorial for n00bs: https://ryanstutorials.net/linuxtutorial/


Supporting material
▪ G. Hager and G. Wellein: Introduction to High Performance Computing for Scientists and Engineers. CRC Computational Science Series, 2010. ISBN 978-1439811924
▪ Documentation:
▪ https://www.openmp.org
▪ https://www.mpi-forum.org

▪ The big ones and more useful HPC-related information:
▪ https://www.top500.org/



Outline of lecture
▪ Basics of parallel computer architecture
▪ Basics of parallel computing
▪ Introduction to shared-memory programming with OpenMP
▪ OpenMP performance issues
▪ Introduction to the Message Passing Interface (MPI)
▪ Advanced MPI
▪ MPI performance issues
▪ Hybrid MPI+OpenMP programming

▪ Goal: A good grasp of the potential and the performance issues of parallel computing in computational science



Supercomputing
HPC applications
▪ What are supercomputers good for?
▪ Weather and climate prediction
▪ Drug design
▪ Simulation of biochemical reactions
▪ Processing and analysis of measurement data
▪ Properties of condensed matter
▪ Fundamental interactions and structure of matter
▪ Fluid simulations, structural analysis, fluid-structure interaction
▪ Mechanical properties of materials
▪ Rendering of 3D images and movies
▪ Simulation of nuclear explosions
▪ Medical image reconstruction
▪ …

(Image credits on the slide: © WW1, FAU; © T. Exner, Molcad GmbH)



HPC algorithms
▪ Whatever the application, there’s usually a numerical algorithm behind it
▪ Computational science → many standard algorithms
▪ “Seven dwarfs”
1. Dense linear algebra
2. Sparse linear algebra
3. Spectral methods
4. N-body methods
5. Structured grids
6. Unstructured grids
7. Monte Carlo methods
See also: The Landscape of Parallel Computing Research: A View from Berkeley, Chapter 3

Parallel computing
Task: Map a numerical algorithm to the hardware of a parallel computer

$v_i = \sum_{j=1}^{N} A_{ij} b_j$   ???

Goal: Execute the task as quickly and efficiently as possible
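To make the mapping task a little more concrete, here is a minimal sketch of one possible decomposition of the matrix-vector product above: each row of the result is independent of the others, so the rows can be split into blocks and handed to different workers. The helper names `matvec_rows` and `parallel_matvec`, the plain Python lists, and the use of a thread pool are assumptions made for illustration only. In standard CPython the threads will not actually speed up this loop; the sketch only shows how the work is divided, not how to get performance.

```python
# Minimal sketch: row-block decomposition of v = A b.
# Assumptions (not from the slides): pure-Python lists, hypothetical helper
# names, and a thread pool standing in for real parallel hardware.
from concurrent.futures import ThreadPoolExecutor


def matvec_rows(A, b, rows):
    """Compute v_i = sum_j A[i][j] * b[j] for the given row indices."""
    return [sum(a_ij * b_j for a_ij, b_j in zip(A[i], b)) for i in rows]


def parallel_matvec(A, b, workers=2):
    """Split the rows into contiguous blocks and process one block per task."""
    n = len(A)
    block = max(1, (n + workers - 1) // workers)  # ceiling division, at least 1
    chunks = [range(start, min(start + block, n)) for start in range(0, n, block)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns the partial results in the order of the chunks
        parts = pool.map(lambda rows: matvec_rows(A, b, rows), chunks)
        return [v_i for part in parts for v_i in part]


A = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
b = [1.0, -1.0]
v = parallel_matvec(A, b, workers=2)
print(v)
```

The block decomposition needs no communication between workers because every worker only reads `A` and `b` and writes its own part of `v`. Whether this is also the fastest mapping on a given machine is exactly the kind of question the lecture addresses.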


Parallelism in modern computers
▪ Core: registers, execution units, L1 cache, L2 cache
▪ Chip (up to 64 cores): many cores plus L3 cache
▪ Socket: possibly multiple chips per socket, with attached memory
▪ Node: 2 sockets + memory + I/O
▪ Supercomputer: many nodes, high-performance network, storage

The Top500 list
▪ Survey of the 500 most powerful supercomputers
▪ http://www.top500.org
▪ How is performance ranked?
▪ Solve large dense system of equations: 𝐴𝑥 = 𝑏 (“LINPACK”)
▪ Max. performance achieved with 64-bit floating-point numbers: 𝑅𝑚𝑎𝑥
▪ Published twice a year (ISC in Germany, SC in USA)
▪ First list: 1993 (#1: CM5 / 1,024 procs.): 60 Gflop/s
▪ June 2020 (#1: Fugaku / 7.3 million procs.): 415.5 Pflop/s
▪ Performance increase: 79% p.a. from 1993 to 2020
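The annual growth figure can be checked from the two #1 entries quoted above. The following sketch assumes steady exponential growth between 1993 and 2020, which is a simplification; the variable names are chosen for illustration.

```python
# Sketch: average annual growth of the #1 LINPACK performance,
# assuming steady exponential growth between the two data points.
import math

first_year, first_rmax = 1993, 60e9      # CM5, in flop/s
last_year, last_rmax = 2020, 415.5e15    # Fugaku, in flop/s

years = last_year - first_year
growth_factor = (last_rmax / first_rmax) ** (1.0 / years)
annual_increase = growth_factor - 1.0

# Time needed to double the performance at this growth rate, in years
doubling_time = math.log(2.0) / math.log(growth_factor)

print(f"Average increase: {annual_increase:.1%} per year")
print(f"Doubling time:    {doubling_time:.2f} years")
```

Under this assumption the #1 performance doubles roughly every 14 months.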


What is “performance”?

𝑃 = Work / Time

▪ Work: measured by a performance metric, e.g., “flops” (+ - * /), lattice site updates, iterations, “solving the problem”, ...
▪ Time: “wall-clock time”
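As a small illustration of the definition, the sketch below counts the flops of a simple vector update and divides by the measured wall-clock time. The kernel, the function name `measure_triad`, and the problem size are assumptions chosen for illustration. The number it reports mostly reflects Python interpreter overhead and says little about what the hardware can do.

```python
# Sketch: performance = work / wall-clock time for a simple vector update.
# Assumptions: kernel a[i] = b[i] + s * c[i] counted as 2 flops per iteration;
# function name and problem size are illustrative only.
import time


def measure_triad(n=200_000):
    a = [0.0] * n
    b = [1.0] * n
    c = [2.0] * n
    s = 3.0
    start = time.perf_counter()          # wall-clock timer
    for i in range(n):
        a[i] = b[i] + s * c[i]           # one multiply and one add
    elapsed = time.perf_counter() - start
    work = 2 * n                         # total number of flops
    return work / elapsed, a


flops_per_second, result = measure_triad()
print(f"{flops_per_second / 1e6:.1f} Mflop/s")
```

Swapping "flops" for another unit of work (iterations, lattice site updates) only changes the `work` line; the time in the denominator stays the same.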



The flop is quite popular…
▪ Flop == Floating-point operation (add, subtract, multiply, divide)
▪ Flop/s == “how many flops can be done per second?”

▪ How many flops can be done by a machine at most (“peak performance”)?
▪ Depends on precision of input operands (double, float, half-precision)
▪ Divides are slow and thus usually neglected

▪ Some double-precision peak numbers to get you oriented…
▪ Top500 range (June 2020): 2.6 Pflop/s … 514 Pflop/s
▪ Modern multicore server CPU (AMD Rome 7742): 2.3 Tflop/s
▪ Your PC: 100 … 500 Gflop/s (+ GPU 0.5 … 10 Tflop/s)
▪ Your cellphone: 5 … 50 Gflop/s
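Peak numbers like these are usually obtained by multiplying core count, clock frequency, and flops per cycle per core. The sketch below reproduces the AMD Rome figure and the peak of the Meggie cluster described later in these slides. The value of 16 double-precision flops per cycle per core (two 256-bit fused multiply-add units) and the 2.25 GHz base clock of the Rome 7742 are assumptions not stated on the slides; the function name `peak_flops` is illustrative.

```python
# Sketch: peak performance = cores * clock frequency * flops per cycle per core.
# Assumptions (not from the slides): 16 double-precision flops per cycle per
# core for both CPUs, and a 2.25 GHz base clock for the AMD Rome 7742.


def peak_flops(cores, clock_hz, flops_per_cycle):
    """Theoretical peak in flop/s, ignoring turbo modes and clock throttling."""
    return cores * clock_hz * flops_per_cycle


rome_peak = peak_flops(cores=64, clock_hz=2.25e9, flops_per_cycle=16)

# Meggie: 728 nodes with 20 cores each at 2.2 GHz (from the cluster slide)
meggie_peak = peak_flops(cores=728 * 20, clock_hz=2.2e9, flops_per_cycle=16)

print(f"AMD Rome 7742:  {rome_peak / 1e12:.2f} Tflop/s")
print(f"Meggie cluster: {meggie_peak / 1e15:.2f} Pflop/s")
```

Real clock frequencies under heavy vector load may differ from the nominal values, so such numbers are an upper bound rather than a prediction.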
Supercomputing in Germany
▪ Jülich Supercomputing Center (JSC): JUWELS (9.9 Pflop/s)
▪ Leibniz Supercomputing Center (LRZ): SuperMUC-NG (26.8 Pflop/s)
▪ HLRS: Hawk (26 Pflop/s)
▪ RRZE (0.5 Pflop/s)
▪ Also marked on the map: Hannover, Berlin



RRZE “Meggie” cluster (you will get access to this!)
▪ 728 compute nodes (14,560 cores)
▪ 2x Intel Xeon E5-2630 v4 (Broadwell) 2.2 GHz (10 cores each)
▪ 20 cores/node
▪ 64 GB main memory per node
▪ No local disks
▪ Peak performance: 𝑅𝑝𝑒𝑎𝑘 = 0.5 Pflop/s
▪ #346 @TOP500 (Nov. 2016)
▪ 𝑅𝑚𝑎𝑥 = 0.48 Pflop/s
▪ Price tag: 2.5 million €
▪ Power consumption: 120 kW – 210 kW (depending on workload)
Power consumption of RRZE HPC systems (last 7 days)
[Figure: plot of the power consumption of the RRZE HPC systems over the last 7 days]


Power consumption of supercomputers
▪ Cost of electrical energy (example FAU): 20 ct/kWh
▪ 1 MW of power costs about 1.8 million € per year
→ cost of electrical power over lifetime ≈ investment sum
▪ This does not include the cost of cooling (may be 5% … 150% of electrical power)
▪ ≈ 1000 €/a for a typical server
▪ Other countries have different boundary conditions
▪ US: 7 ct/kWh for industrial customers (2019)
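The cost figures follow from simple arithmetic, sketched below under the assumption of continuous operation (8760 hours per year) and the 20 ct/kWh quoted for FAU. The function name and the Meggie power range taken from the cluster slide are used for illustration; cooling is not included.

```python
# Sketch: annual electricity cost at constant power draw.
# Assumptions: continuous operation all year, 0.20 EUR/kWh, no cooling cost.
HOURS_PER_YEAR = 365 * 24  # 8760 hours


def annual_energy_cost(power_kw, price_eur_per_kwh=0.20):
    """Cost in EUR of drawing power_kw continuously for one year."""
    return power_kw * HOURS_PER_YEAR * price_eur_per_kwh


one_megawatt = annual_energy_cost(1000.0)
meggie_low = annual_energy_cost(120.0)
meggie_high = annual_energy_cost(210.0)

# Average power draw that corresponds to 1000 EUR per year
server_kw = 1000.0 / (HOURS_PER_YEAR * 0.20)

print(f"1 MW:   {one_megawatt:,.0f} EUR per year")
print(f"Meggie: {meggie_low:,.0f} to {meggie_high:,.0f} EUR per year")
print(f"1000 EUR per year corresponds to about {server_kw * 1000:.0f} W")
```

The 1 MW result of about 1.75 million € per year matches the rounded 1.8 million € on the slide.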
Take-home messages
▪ Supercomputers are parallel computers
▪ No parallelism → no performance
▪ It’s your task to write parallel code (or use parallel programs that someone else wrote)
▪ Even your desktop PC is a parallel computer nowadays

▪ Supercomputers are expensive
▪ … to buy
▪ … and to run,
so their efficient use is paramount
▪ → learn how to write efficient parallel programs
