Understanding and optimizing
parallelism in NumPy-based programs
Ralf Gommers
21 April 2022
First make it work, then make it fast
>>> %timeit main()
50.1 ms ± 1.08 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
>>> # ... perform some optimizations
>>> %timeit main()
9.58 ms ± 22.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
>>> # break out your profiler (e.g., py-spy), optimize some more
>>> %timeit main()
2.83 ms ± 30.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
[Figure: `htop` output]
Approaches for performant numerical code (single-threaded)
● Vectorization
● Use compiled code
● Python compilers, e.g. Pythran
● Python interpreters: CPython, plus Cinder, Pyston, and more (very experimental, and limited gains for numerical code)
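As a minimal illustration of the first approach (vectorization), the two hypothetical functions below compute the same sum of squares. The first runs one interpreter iteration per element; the second hands the element-wise work to NumPy's compiled loops, which is where the speedup comes from.

```python
import numpy as np

def sum_of_squares_loop(x):
    # Pure-Python loop: the interpreter executes one iteration per element.
    total = 0.0
    for value in x:
        total += value * value
    return float(total)

def sum_of_squares_vectorized(x):
    # Vectorized: the element-wise multiply and the sum run in compiled code.
    x = np.asarray(x, dtype=np.float64)
    return float(np.sum(x * x))

x = np.arange(1_000, dtype=np.float64)
print(sum_of_squares_loop(x), sum_of_squares_vectorized(x))
```

Timing both with `%timeit` on a large array is an easy way to see the gap on your own machine.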
multiprocessing & multithreading
A key issue: oversubscription
Package A sees N CPU cores, and decides to use them all.
Package B, which uses package A, or the end user then decides to use multiprocessing, with 1 process per core. The result is N processes that each start N threads: N² threads competing for N cores.
The more CPU cores a machine has, the worse the effect is!
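The arithmetic behind oversubscription is simple enough to write down. The helper below is hypothetical; it just multiplies processes by threads per process and compares the total with the number of cores, using a 64-core machine as the example.

```python
def oversubscription(n_cores, n_processes, threads_per_process):
    """Return (total threads, threads per available core)."""
    total = n_processes * threads_per_process
    return total, total / n_cores

# Package A alone on a 64-core machine: one process, 64 BLAS threads.
print(oversubscription(64, 1, 64))   # (64, 1.0)
# Package B adds one process per core, and each still starts 64 threads.
print(oversubscription(64, 64, 64))  # (4096, 64.0)
# Limiting each process to n_cores // n_processes threads restores balance.
print(oversubscription(64, 64, 1))   # (64, 1.0)
```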
Parallel APIs & behavior: NumPy
NumPy is single-threaded: no code in NumPy itself is written for parallel execution.
However, most numpy.linalg functions (those using BLAS or LAPACK) execute in
parallel, and by default they use all available cores on a machine.
NumPy does release the GIL wherever it can.
numpy.random has specific APIs to allow users to:
(a) Obtain independent streams for random number generation across
processes (local or distributed)
(b) Perform multithreaded random number generation
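A minimal sketch of (b), assuming one independent `Generator` per thread, each derived from a single `SeedSequence` and each filling its own contiguous slice of a shared output array. The function name, seed and thread count are illustrative. It relies on NumPy releasing the GIL while a Generator fills a preallocated `out=` array, so the threads can actually run concurrently.

```python
import concurrent.futures
import numpy as np

def parallel_standard_normal(n, seed=12345, n_threads=4):
    # One independent Generator per thread, all derived from one SeedSequence.
    gens = [np.random.default_rng(s)
            for s in np.random.SeedSequence(seed).spawn(n_threads)]
    out = np.empty(n, dtype=np.float64)
    # Contiguous chunk boundaries, one chunk per thread.
    bounds = np.linspace(0, n, n_threads + 1).astype(int)

    def fill(i):
        # Write directly into this thread's slice of the output array.
        gens[i].standard_normal(out=out[bounds[i]:bounds[i + 1]])

    with concurrent.futures.ThreadPoolExecutor(max_workers=n_threads) as ex:
        list(ex.map(fill, range(n_threads)))
    return out

x = parallel_standard_normal(1_000_000)
```

Because each chunk always comes from the same child generator, the result is reproducible for a given `seed` and `n_threads`, regardless of how the threads are scheduled.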
Parallel APIs & behavior: SciPy
SciPy is single-threaded by default (same as NumPy)
Calls to functionality using BLAS or LAPACK are again multithreaded:
● primarily in scipy.linalg and scipy.sparse.linalg,
● also higher-level functionality using linear algebra under the hood:
kernel density estimation, multivariate distributions etc. in scipy.stats,
vector quantization in scipy.cluster, interpolators in scipy.interpolate,
optimizers in scipy.optimize, and more
Some APIs have a workers= keyword (typically defaulting to 1), which lets the user control the
number of processes or threads. Some of them also accept a custom Pool's map method instead.
scipy.fft provides a context manager:
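The context manager is `scipy.fft.set_workers`, and `scipy.fft.get_workers` reports the current setting. A minimal sketch (array shape and worker count are arbitrary choices):

```python
import numpy as np
import scipy.fft

x = np.random.default_rng(0).standard_normal((64, 4096))
default_workers = scipy.fft.get_workers()

# Every scipy.fft call inside the block uses up to 4 workers,
# unless a call passes its own workers= argument.
with scipy.fft.set_workers(4):
    inside = scipy.fft.get_workers()  # 4
    y = scipy.fft.fft(x, axis=-1)

# On exit the previous setting is restored.
print(inside, scipy.fft.get_workers() == default_workers)
```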
An example using workers=:
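A sketch of one API with a `workers=` keyword, `scipy.spatial.KDTree.query` (available with this keyword in recent SciPy releases). The data sizes are arbitrary; the point is that the parallel query returns the same answer as the serial one.

```python
import numpy as np
from scipy.spatial import KDTree

rng = np.random.default_rng(42)
points = rng.random((10_000, 3))
queries = rng.random((2_000, 3))
tree = KDTree(points)

# workers=1 (the default) is serial; workers=-1 uses all CPU threads.
dist_serial, idx_serial = tree.query(queries, k=1)
dist_par, idx_par = tree.query(queries, k=1, workers=-1)
```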
Parallel APIs & behavior: scikit-learn
Scikit-learn is mostly single-threaded by default.
However, more and more functionality uses OpenMP for automatic
parallelization. This defaults to the number of virtual (not physical) CPU cores.
Many scikit-learn APIs offer an n_jobs= keyword to let users enable multiple
threads or processes via joblib.
Scikit-learn implements fairly complex control of NumPy/SciPy’s BLAS and
LAPACK libraries to prevent oversubscription in the presence of
multiprocessing on top of multi-threading. This is done via the threadpoolctl
package.
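A minimal sketch of the `n_jobs=` keyword on one estimator, `RandomForestClassifier`, with a synthetic dataset. The sizes and `n_jobs=2` are arbitrary; `n_jobs=-1` would request all available CPUs.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# n_jobs=2: joblib builds the trees with 2 workers.
# n_jobs=None (the default) means 1 unless a joblib context says otherwise.
clf = RandomForestClassifier(n_estimators=50, n_jobs=2, random_state=0)
clf.fit(X, y)
print(clf.score(X, y))
```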
Controlling parallelism - packages
[Figure: Dependencies (Conda)]
[Figure: Dependencies (PyPI)]
[Figure: Conda vs. PyPI]
Tuning the default behavior
Default behavior is inconsistent: too aggressive for linear algebra, and too
conservative for workers (SciPy) and n_jobs (scikit-learn)!
OpenBLAS, MKL and OpenMP don’t have a nice Python API, only environment variables:
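A sketch of setting those environment variables from Python. The libraries read them when they are loaded or initialized, so they must be set before NumPy/SciPy are first imported in the process; the value "2" is just an example. The shell equivalent is `OPENBLAS_NUM_THREADS=2 python script.py`.

```python
import os

n_threads = "2"
# Must run before the first `import numpy` in this process.
os.environ["OPENBLAS_NUM_THREADS"] = n_threads  # OpenBLAS
os.environ["MKL_NUM_THREADS"] = n_threads       # Intel MKL
os.environ["OMP_NUM_THREADS"] = n_threads       # OpenMP (e.g. scikit-learn's compiled code)

import numpy as np  # noqa: E402  (imported after setting the variables on purpose)
```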
For scikit-learn you can explicitly choose a backend (but defaults are usually fine):
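A sketch using joblib's `parallel_backend` context manager (newer joblib versions, 1.3 and up, also offer `parallel_config` for the same purpose). The estimator is left at `n_jobs=None` so that the surrounding context decides the number of workers; the "threading" backend and 2 workers are arbitrary choices.

```python
from joblib import parallel_backend
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, random_state=0)
# n_jobs=None lets the joblib context below choose the number of workers.
clf = RandomForestClassifier(n_estimators=50, n_jobs=None, random_state=0)

# Work that scikit-learn dispatches through joblib inside this block uses
# the "threading" backend with 2 workers.
with parallel_backend("threading", n_jobs=2):
    clf.fit(X, y)
```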
NumPy, SciPy and scikit-learn all recommend using threadpoolctl in case you
want more granular control over threading behavior of BLAS, LAPACK and
OpenMP libraries (or cannot set environment variables):
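A minimal threadpoolctl sketch: `threadpool_info()` lists the BLAS/OpenMP libraries loaded in the process, and `threadpool_limits` temporarily caps their thread pools. The matrix size and the limit of 1 are arbitrary.

```python
import numpy as np
from threadpoolctl import threadpool_info, threadpool_limits

# Inspect which BLAS/OpenMP libraries are loaded and their thread counts.
for lib in threadpool_info():
    print(lib["internal_api"], lib["num_threads"])

a = np.random.default_rng(0).standard_normal((500, 500))

# Limit BLAS to a single thread only for the duration of this block.
with threadpool_limits(limits=1, user_api="blas"):
    b = a @ a
```

Unlike environment variables, this works after NumPy has already been imported and can be scoped to one section of code.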
A pitfall on multi-tenant machines
Multi-tenant machines: N “vCPU” (virtual CPU) cores for you, M in total.
CircleCI gives you 2 cores for a CI job, on a 64-core machine (and
os.cpu_count() reports 64). Set OPENBLAS_NUM_THREADS=2 to avoid problems!
GitHub Actions, Azure DevOps and other services are better behaved.
The impact can be severe.
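A sketch of a more careful core count than `os.cpu_count()`, using two hypothetical helpers. It assumes Linux with cgroup v2, where a CPU quota may be exposed in `/sys/fs/cgroup/cpu.max` as `"<quota> <period>"` or `"max <period>"`; whether a given CI service exposes its limit this way is an assumption, so setting OPENBLAS_NUM_THREADS explicitly remains the safest fix.

```python
import os

def cgroup_cpu_limit(path="/sys/fs/cgroup/cpu.max"):
    """Hypothetical helper: whole CPUs allowed by a cgroup v2 quota, or None."""
    try:
        with open(path) as f:
            quota, period = f.read().split()[:2]
        if quota == "max":  # no quota configured
            return None
        return max(1, int(quota) // int(period))
    except (OSError, ValueError, ZeroDivisionError):
        return None

def usable_cpu_count():
    """Hypothetical helper: min of CPU affinity (if available) and cgroup quota."""
    if hasattr(os, "sched_getaffinity"):  # Linux-only API
        n = len(os.sched_getaffinity(0))
    else:
        n = os.cpu_count() or 1
    limit = cgroup_cpu_limit()
    return min(n, limit) if limit is not None else n

# Respect an explicit setting; otherwise use the detected count.
# Must run before NumPy is imported for OpenBLAS to pick it up.
os.environ.setdefault("OPENBLAS_NUM_THREADS", str(usable_cpu_count()))
```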
Parallel random number generation
First, what not to do: naively drawing random numbers in different subprocesses (for example,
seeding each one identically, or relying on global random state inherited via fork) will give you the same numbers in each process:
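A sketch of one common form of this mistake: a hypothetical `worker` function that builds its generator from a fixed seed. For brevity the four "processes" are simulated with a loop in one process; real subprocesses running this function behave the same way.

```python
import numpy as np

def worker(seed=12345):
    # Anti-pattern: every worker builds its generator from the same seed.
    rng = np.random.default_rng(seed)
    return rng.random(3)

# Simulating 4 "processes" in one process for brevity.
results = [worker() for _ in range(4)]
print(all(np.array_equal(results[0], r) for r in results))  # True
```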
Use SeedSequence to obtain independent streams easily:
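A minimal sketch with `SeedSequence.spawn`: one root sequence yields independent child sequences, one per worker. The `worker` function and seed are illustrative; in real code each child would be sent to a separate process (SeedSequence objects can be pickled), e.g. via `concurrent.futures.ProcessPoolExecutor.map(worker, children)`.

```python
import numpy as np

def worker(seed_seq, n=3):
    # Each worker builds its own Generator from its own child SeedSequence.
    rng = np.random.default_rng(seed_seq)
    return rng.random(n)

# One root SeedSequence (record its seed to reproduce the run later)...
root = np.random.SeedSequence(12345)
# ...spawns independent children, one per worker.
children = root.spawn(4)

results = [worker(child) for child in children]
```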
Second option: use the .jumped() method of BitGenerator instances that support it (e.g. PCG64, Philox, MT19937) to obtain
independent streams easily:
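A sketch of the chaining pattern with `PCG64.jumped()`: each call returns a new bit generator whose state is advanced by a very large, fixed number of steps, so the resulting streams do not overlap in practice. The helper name and seed are illustrative.

```python
import numpy as np

def make_jumped_generators(seed, n_streams):
    # Start from one PCG64 bit generator...
    bit_gen = np.random.PCG64(seed)
    generators = []
    for _ in range(n_streams):
        generators.append(np.random.Generator(bit_gen))
        # ...and chain: the next stream starts from a jumped copy of the state.
        bit_gen = bit_gen.jumped()
    return generators

streams = make_jumped_generators(12345, 4)
draws = [g.random(3) for g in streams]
```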
Where is NumPy going - technical
● Interoperability: Array API standard support
● Extensibility: easier custom dtypes
● Performance: SIMD acceleration on x86, arm64, PPC, …?
● C++: just dipping our toes in the water here; so far it was just Python and C
● Platform support: PPC, AIX, s390x, cross-compiling to embedded ARM systems, ...
● Type annotations: main namespace annotations just completed
Note what is not on this list: auto-parallelization
Resources to learn more
Scikit-learn:
https://scikit-learn.org/stable/computing/parallelism.html
https://joblib.readthedocs.io/en/latest/parallel.html
SciPy:
http://scipy.github.io/devdocs/dev/toolchain.html#openmp-support
http://scipy.github.io/devdocs/search.html?q=workers
NumPy:
https://numpy.org/doc/stable/reference/random/parallel.html
Relevant paper: Composable Multi-Threading and Multi-Processing for Numeric Libraries
Find me at: ralf.gommers@gmail.com, rgommers, ralfgommers
Thank you!
