IBM POWER8 as an HPC platform
Alexander Pozdneev, Georgy Pavlov
IBM
October 23, 2015 — IBM Linux on Power: Platform News
© 2015 IBM Corporation
What is HPC?
• HPC — High Performance Computing
• A.k.a. technical computing
• Aeroacoustics: Effects of chevrons on jet noise
• Computational fluid dynamics simulation of supersonic jet engine noise
• 128k Blue Gene/P cores — ≈ 100 hours
• 1M Blue Gene/Q cores — ≈ 12 hours
http://youtu.be/cjoz5tncRUs http://youtu.be/uxT-VmY3OWc
Secrets of the Dark Universe
• Cosmology: simulation of the evolution of the Universe
• Understanding the physics of dark matter and dark energy
• 1 BG/Q rack — 68B particles
• 32 BG/Q racks — 1.1T particles
http://www.youtube.com/watch?v=tdv8yrJk4VE http://www.youtube.com/watch?v=-S-T_iTiAxQ
Real-time modeling of human heart ventricles
• Physiology: simulation of drug-induced arrhythmias
• Resolution: 0.1 mm
• 768k Blue Gene/Q cores
• 43% of peak performance
• http://dl.acm.org/citation.cfm?id=2388999
• LLNL, IBM Research, IBM Research Collaboratory for Life Sciences
Modelling of a complete human viral pathogen: poliovirus
• Molecular biology: Reconstruction and simulation of poliovirus
• Antiviral drugs, virus infection, modelling related viruses
• 3.3M–3.7M atoms
• Blue Gene/Q, Victorian Life Sciences Computing Initiative
• http://www.youtube.com/watch?v=Nih0Qa673FY
Speakers
• Alexander Pozdneev
Research Software Engineer
HPC
• Georgy Pavlov
Software Engineer
ESSL Russian team leader
Outline
1 Data centric computing as a new HPC paradigm
2 Architecture of IBM HPC systems based on POWER8+NVIDIA servers
3 Software stack of IBM HPC systems
4 IBM HPC mathematical libraries
5 Measuring efficiency of an HPC system on real applications
6 Summary
Application diversity
Image credit: http://www.ibm.com/common/ssi/cgi-bin/ssialias?htmlfid=POL03229USEN
Data centric computing as a new HPC paradigm
• It is all about moving data around
• Memory bandwidth
• Memory latency
• High ratio of memory access operations to computations
• Number of FLOPs (floating-point operations) per cycle is no longer the relevant metric
• Offloading computing to memory (Active Memory Cube by Micron)
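The ratio of computation to data movement can be made concrete with a small calculation. The sketch below is not from the talk: it uses the roofline model as an illustration, and the machine figures (`PEAK_GFLOPS`, `BANDWIDTH_GBS`) are hypothetical values picked only to show the arithmetic.

```python
def arithmetic_intensity(flops: float, bytes_moved: float) -> float:
    """FLOPs performed per byte moved to or from memory."""
    return flops / bytes_moved


def attainable_gflops(intensity: float, peak_gflops: float, bandwidth_gbs: float) -> float:
    """Roofline bound: the lower of the compute peak and intensity * bandwidth."""
    return min(peak_gflops, intensity * bandwidth_gbs)


# Triad-like kernel a[i] = b[i] + s * c[i] on 8-byte doubles:
# 2 FLOPs per element, 2 loads + 1 store = 24 bytes per element.
triad = arithmetic_intensity(flops=2, bytes_moved=24)

# Hypothetical machine parameters (assumptions, not measured values).
PEAK_GFLOPS = 200.0
BANDWIDTH_GBS = 100.0

bound = attainable_gflops(triad, PEAK_GFLOPS, BANDWIDTH_GBS)
print(f"triad intensity: {triad:.4f} FLOP/byte")
print(f"attainable: {bound:.2f} of {PEAK_GFLOPS:.0f} GFLOP/s")
```

For a kernel like this, the bound is set by memory bandwidth rather than by the number of FLOPs per cycle, which is the point of the slide.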
Department of Energy CORAL project
IBM systems in CORAL project
Architecture of POWER8+NVIDIA systems for HPC
Overview of IBM Power System S822LC
Power S822LC model 8335-GTA
• POWER8 processor module:
  • 8-core, 3.32 GHz
  • 10-core, 2.92 GHz
• Two sockets
• Graphics processing units
  • Two NVIDIA K80 GPUs
• Eight memory slots
• 2U height
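The core counts and clock rates above translate into a theoretical CPU peak once a per-core FLOP rate is fixed. The sketch below assumes 8 double-precision FLOPs per core per cycle; that figure is not stated on the slide and should be checked against the processor documentation. GPUs are left out of the calculation.

```python
def peak_gflops(sockets: int, cores_per_socket: int, ghz: float, flops_per_cycle: int) -> float:
    """Theoretical peak in GFLOP/s: sockets * cores * clock (GHz) * FLOPs per cycle."""
    return sockets * cores_per_socket * ghz * flops_per_cycle


# ASSUMPTION: 8 double-precision FLOPs per core per cycle (not from the slides).
FLOPS_PER_CYCLE = 8
SOCKETS = 2

# Processor module options listed on the slide: (cores per socket, clock in GHz).
configs = [(8, 3.32), (10, 2.92)]

peaks = {cores: peak_gflops(SOCKETS, cores, ghz, FLOPS_PER_CYCLE) for cores, ghz in configs}
for cores, ghz in configs:
    print(f"{cores}-core @ {ghz} GHz, {SOCKETS} sockets: {peaks[cores]:.2f} GFLOP/s")
```

Under this assumption the 10-core option has the higher aggregate peak despite the lower clock rate.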
Architecture of IBM Power System S822LC
System software
Software stack of IBM HPC systems
• System software
  • Operating system: Linux, bare-metal (no virtualization)
  • Drivers:
    • Mellanox InfiniBand OFED
    • NVIDIA
  • Deployment: xCAT
• Parallel operating environment
  • IBM Parallel Environment Runtime Edition (PE RTE)
  • Workload scheduler: IBM Platform LSF
  • Parallel filesystem: IBM Spectrum Scale ("GPFS", General Parallel File System)
Development tools
Software stack of IBM HPC systems
• Compilers
  • IBM XL C/C++/Fortran compilers
  • IBM Advance Toolchain, http://ibm.co/AdvanceToolchain
    • Fork of GNU compiler/tools optimized for POWER8
    • gcc, g++, gfortran
    • Analysis tools (oprofile, valgrind, itrace)
  • Vanilla GCC, binutils, etc.
  • CUDA Toolkit
• IBM Parallel Environment Developer Edition (PE DE)
• IBM Software Development Kit for Linux on Power
• Mathematical libraries
  • Mathematical Acceleration Subsystem (MASS)
  • IBM Engineering and Scientific Subroutine Library (ESSL)
  • IBM Parallel ESSL
Engineering and Scientific Subroutine Library
• High-performance mathematical functions
  • Scientific applications
  • Engineering applications
• Platforms
  • IBM POWER servers
  • IBM POWER clusters
• Libraries
  • ESSL Serial and SMP (symmetric multiprocessing): 600+ subroutines
  • Parallel ESSL: 125+ subroutines
• Languages:
  • C
  • C++
  • Fortran
http://www.ibm.com/systems/power/software/essl
ESSL: Industry de facto standards
• ESSL implements the following interfaces:
  • BLAS (linear algebra)
  • LAPACK (linear algebra)
  • FFTW (Fourier transforms)
• Parallel ESSL implements the following interfaces:
  • ScaLAPACK
• Easy migration
• Just recompile!
http://fftw.org
What mathematical areas are supported?
ESSL
• Linear algebra subprograms
• Matrix operations
• Linear algebraic equations
• Eigensystems analysis
• Fourier transforms, convolution, correlation, ...
• Sorting and searching
• Interpolation
• Numerical quadrature
• Random number generation
Parallel ESSL
• BLACS
• Level 2 parallel BLAS
• Level 3 parallel BLAS
• Linear algebraic equations
• Eigensystems analysis
• Fourier transforms
• Random number generation
How to leverage the hardware?
Symmetric multiprocessing:
• Multiple hardware threads
• Multiple cores
POWER8+NVIDIA:
• Use multiple GPUs
• Select which GPU to use
• Run ESSL in a hybrid mode
Synthetic benchmarks vs. real apps
Measuring car pollution in official tests?
• You get low toxic nitrogen oxide emissions in a lab environment
• You cannot predict how much smoke you will produce unless you test your own scenarios
• You would take a test drive before buying a car
Thread behavior: Typical vs. HPC
[Figure panels: typical workload, HPC workload]
Importance of thread affinity
NAS Parallel Benchmarks, mg.C (peaks at SMT1), 20 cores
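Thread affinity means pinning each software thread to a chosen hardware thread. The sketch below builds such a list under one assumption that is not taken from the slides: logical CPUs are numbered so that core `c` owns ids `8*c` to `8*c + 7`, as is commonly seen on Linux on POWER8. The helper name `cpu_list` is hypothetical.

```python
from typing import List


def cpu_list(cores: int, threads_per_core: int, smt_width: int = 8) -> List[int]:
    """Logical CPU ids that place `threads_per_core` threads on each of `cores` cores.

    ASSUMPTION: core c owns logical CPUs smt_width*c .. smt_width*c + smt_width - 1.
    """
    if not 1 <= threads_per_core <= smt_width:
        raise ValueError("threads_per_core must be between 1 and smt_width")
    return [core * smt_width + thread
            for core in range(cores)
            for thread in range(threads_per_core)]


# Two threads on each of four cores.
binding = ",".join(str(cpu) for cpu in cpu_list(cores=4, threads_per_core=2))
print(binding)
```

A comma-separated list like this is the form accepted by tools such as `taskset -c` or the `GOMP_CPU_AFFINITY` environment variable. Which mechanism applies depends on the compiler runtime in use.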
Choice of compilation parameters: -O5 -qnohot
NAS Parallel Benchmarks, bt.C, affinity, baseline: -O3 -qhot
Compilation parameters: -O3 -qhot, -O5 -qprefetch
NAS Parallel Benchmarks, mg.C, affinity, baseline: -O5 -qnohot
Choice of an SMT mode: SMT1
NAS Parallel Benchmarks, mg.C, affinity, baseline: SMT8
Choice of an SMT mode: SMT2, SMT4
NAS Parallel Benchmarks, bt.C, affinity, baseline: SMT1
Choice of an SMT mode: SMT8
NAS Parallel Benchmarks, cg.C, affinity, baseline: SMT1
Summary
• Technical computing problems ⇒ need for HPC
• Data centric computing as a new HPC paradigm
• CORAL project
• IBM Power System S822LC model 8335-GTA
• IBM HPC Software stack
• High performance math libraries
• Leveraging performance
Publications
http://www.redbooks.ibm.com/abstracts/sg248263.html
Further reading
• XL C/C++ for Linux,
http://www.ibm.com/support/knowledgecenter/SSXVZZ/
• XL Fortran for Linux,
http://www.ibm.com/support/knowledgecenter/SSAT4T/
• XL C/C++ for Linux 13.1.2 Optimization and Programming Guide,
http://www.ibm.com/support/knowledgecenter/SSXVZZ_13.1.2/com.ibm.xlcpp1312.lelinux.doc/proguide/optimization.html
• XL Fortran for Linux 15.1.2 Optimization and Programming Guide,
http://www.ibm.com/support/knowledgecenter/SSAT4T_15.1.2/com.ibm.xlf1512.lelinux.doc/proguide/optimization.html
Relevance of LINPACK
• Based on DGEMM()
• 80–90% of peak performance
• Commercial deployment verification test for large systems
• Proprietary binary files run by the installation team
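To make the DGEMM reference concrete, the sketch below is a plain-Python reference for the operation C ← αAB + βC, together with the conventional 2·m·n·k FLOP count used to turn a timing into a fraction of peak. It is an illustration only: a real BLAS `dgemm` also takes transposition flags and leading dimensions and updates C in place, and the timing and peak values here are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def dgemm(alpha: float, a: Matrix, b: Matrix, beta: float, c: Matrix) -> Matrix:
    """Return alpha * A * B + beta * C for row-major lists of lists (no transposes)."""
    m, k, n = len(a), len(b), len(b[0])
    return [[alpha * sum(a[i][p] * b[p][j] for p in range(k)) + beta * c[i][j]
             for j in range(n)]
            for i in range(m)]


def dgemm_flops(m: int, n: int, k: int) -> int:
    """Conventional FLOP count: one multiply and one add per inner-product term."""
    return 2 * m * n * k


def fraction_of_peak(flops: float, seconds: float, peak_gflops: float) -> float:
    """Achieved GFLOP/s divided by the theoretical peak."""
    return flops / seconds / 1e9 / peak_gflops


result = dgemm(1.0, [[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]],
               1.0, [[1.0, 1.0], [1.0, 1.0]])
print(result)

# Hypothetical run: 10000 x 10000 x 10000 multiply in 5 s on a 467.2 GFLOP/s machine.
share = fraction_of_peak(dgemm_flops(10000, 10000, 10000), seconds=5.0, peak_gflops=467.2)
print(f"{share:.1%} of peak")
```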
LINPACK vs. HPC analytics
Benchmarking methodology options
1. Take one core for the initial tuning
  • Try SMT1, SMT2, SMT4, SMT8
    (number of threads + affinity + -qtune=pwr8:XXX)
  • Try different optimization options (-O3, -O4, ...)
2. Choose the SMT mode and compiler options that provide the best timing
3. Take one core as a baseline
  • Run on 1–5 cores (within one chip)
  • Run on 5 and 10 cores (within one socket)
  • Run on 10 and 20 cores
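Step 3 of the methodology above amounts to computing speedup and parallel efficiency against the one-core baseline. The sketch below shows that bookkeeping; the timings are hypothetical placeholders, not measurements from the talk.

```python
from typing import Dict, List, Tuple


def scaling_table(timings: Dict[int, float]) -> List[Tuple[int, float, float]]:
    """Return (cores, speedup, efficiency) rows relative to the one-core timing."""
    baseline = timings[1]
    rows = []
    for cores in sorted(timings):
        speedup = baseline / timings[cores]
        rows.append((cores, speedup, speedup / cores))
    return rows


# Hypothetical wall-clock times in seconds, keyed by number of cores.
timings = {1: 100.0, 5: 25.0, 10: 12.5, 20: 10.0}

table = scaling_table(timings)
for cores, speedup, efficiency in table:
    print(f"{cores:2d} cores: speedup {speedup:5.2f}, efficiency {efficiency:.0%}")
```

A drop in efficiency between the chip, socket, and two-socket steps would point to where the application stops scaling.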
POWER8 features
• Eight threads per core
  • Hide memory latency (like a GPU, graphics processing unit)
  • Instruction flow is arbitrary (unlike a GPU)
• Memory bandwidth
• No sense in benchmarking only one thread (as with a GPU)
• Scalability within a core depends only on the application
• Advanced features to try
  • Transactional memory
  • Relaxed memory model
  • Decimal floating point unit
Disclaimer
All the information, representations, statements, opinions and proposals in this
document are correct and accurate to the best of our present knowledge but are
not intended (and should not be taken) to be contractually binding unless and
until they become the subject of separate, specific agreement between us.
Any IBM Machines provided are subject to the Statements of Limited Warranty
accompanying the applicable Machine.
Any IBM Program Products provided are subject to their applicable license terms.
Nothing herein, in whole or in part, shall be deemed to constitute a warranty.
IBM products are subject to withdrawal from marketing and/or service upon
notice, and changes to product configurations, or follow-on products, may result
in price changes.
Any references in this document to “partner” or “partnership” do not constitute or
imply a partnership in the sense of the Partnership Act 1890.
IBM is not responsible for printing errors in this proposal that result in pricing or
information inaccuracies.
Legal information
IBM, the IBM logo, BladeCenter, System Storage, and System x are trademarks of International Business
Machines Corporation in the United States and/or other countries. A complete list of IBM trademarks is available
on the Web at www.ibm.com/legal/copytrade.shtml.
Other company, product, and service names may be trademarks or service marks of others.
(c) 2015 International Business Machines Corporation. All rights reserved.
References in this publication to IBM products or services do not imply that IBM intends to make them
available in all countries in which it operates; information about the availability of products or services
is subject to change without notice. For the latest information on IBM products and services available in
your region, contact your nearest IBM sales office or an authorized business partner.
All statements regarding IBM's future direction and intent are subject to change without notice.
Information about third-party products was obtained from the suppliers of those products or from their
published announcements. IBM has not tested those products and cannot confirm the performance,
compatibility, or any other claims related to third-party products. Questions about the capabilities of
third-party products should be addressed to the suppliers of those products.
This information may contain technical inaccuracies or typographical errors. Changes may be made to the
information in this publication; these changes will be incorporated in new editions of the publication.
IBM may make changes to the products or services described in this publication at any time without notice.
Any references to third-party Web sites are provided for convenience only and do not in any manner serve as
an endorsement of those Web sites. The materials at those Web sites are not part of the materials for
this IBM product.