Profiling in Python
Fabian Pedregosa
Profiling: measuring information about a running program (timing,
memory consumption, disk usage, etc.)
Profiling is about avoiding unnecessary optimizations and
focusing on important ones.
Knowing the algorithm is not enough to write optimal code:
Profiling is necessary to make meaningful comparisons.
# three equivalent ways to add two vectors
# (a and b are NumPy arrays of the same shape)

# option 1
for i in range(a.size):
    a[i] = a[i] + b[i]

# option 2
a = a + b

# option 3
a += b
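As a sanity check, the three variants can be timed with the standard timeit module (a minimal sketch; the array size and repetition count are arbitrary):

import timeit

setup = "import numpy as np; a = np.random.randn(10000); b = np.random.randn(10000)"

# option 1: explicit Python loop over the elements
t1 = timeit.timeit("for i in range(a.size): a[i] = a[i] + b[i]",
                   setup=setup, number=100)
# option 2: vectorized addition, allocates a new array
t2 = timeit.timeit("a = a + b", setup=setup, number=100)
# option 3: vectorized in-place addition, no new allocation
t3 = timeit.timeit("a += b", setup=setup, number=100)
print(t1, t2, t3)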
The main Python profilers
cProfile (in the standard library)
line_profiler: line-by-line timing.
memory_profiler: to get memory usage information (☺)
IPython magic commands.
yep extension profiler: to get timing of compiled extensions (☺)
Example: Gradient-descent algorithm
import numpy as np
from scipy import optimize

def gd(f, grad, x0, rtol=1e-3):
    xk = x0
    fk = f(xk)
    tol = np.inf
    while tol >= rtol:
        # .. compute gradient ..
        gk = grad(xk)
        # .. find step size by line-search ..
        step_size = optimize.line_search(f, grad, xk, -gk, gfk=gk)[0]
        # .. perform update ..
        for i in range(xk.size):
            xk[i] = xk[i] - step_size * gk[i]
        fk_new = f(xk)
        tol = np.abs(fk - fk_new) / fk
        fk = fk_new
    return xk
Question: should I optimize the gradient evaluation, the line-search,
or the coefficient update?
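Before profiling, we need a realistic input for gd. The following driver is a sketch of what the rest of gd.py might contain: the make_classification call matches what appears in the cProfile output below, but the least-squares objective, problem sizes, and names are assumptions.

# hypothetical driver appended to gd.py; objective and sizes are assumptions
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=1000, n_features=100)

def f(w):
    # least-squares objective on the synthetic data
    r = X.dot(w) - y
    return 0.5 * r.dot(r)

def grad(w):
    # gradient of the least-squares objective
    return X.T.dot(X.dot(w) - y)

x0 = np.random.randn(X.shape[1])
w = gd(f, grad, x0)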
Profiling How-To
1. Create a realistic example
2. Run it with the appropriate profiler (typical invocations are sketched after this list).
cProfile
line_profiler
memory_profiler
yep
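For reference, the typical invocation for each tool, demonstrated in detail on the following slides (gd.py stands in for your own script):

$ python -m cProfile -s time gd.py     # cProfile, sorted by total time
$ kernprof -l -v gd.py                 # line_profiler (requires @profile)
$ python -m memory_profiler gd.py      # memory_profiler, line-by-line
$ mprof run gd.py && mprof plot        # memory_profiler, time-based
$ python -m yep -v gd.py               # yep, for compiled extensions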
(most of the time) cProfile is useless
$ python -m cProfile -s time gd.py
ncalls tottime percall cumtime percall filename:lineno(function)
2 0.555 0.278 0.555 0.278 {method 'randn' of 'mtrand.RandomState' objects}
1 0.465 0.465 1.075 1.075 samples_generator.py:38(make_classification)
106/90 0.254 0.002 0.290 0.003 {built-in method load_dynamic}
65 0.174 0.003 0.174 0.003 {built-in method dot}
449 0.067 0.000 0.067 0.000 {built-in method loads}
2 0.051 0.026 0.051 0.026 {method 'take' of 'numpy.ndarray' objects}
261 0.039 0.000 0.094 0.000 doccer.py:12(docformat)
628/1 0.039 0.000 2.453 2.453 {built-in method exec}
3737 0.036 0.000 0.036 0.000 {built-in method stat}
7 0.031 0.004 0.034 0.005 {method 'get_filename' of 'zipimport.zipimporter' object
1985 0.031 0.000 0.122 0.000 <frozen importlib._bootstrap>:2016(find_spec)
1114/1108 0.030 0.000 0.086 0.000 {built-in method __build_class__}
1 0.026 0.026 0.211 0.211 gd.py:5(gd)
449 0.022 0.000 0.136 0.000 <frozen importlib._bootstrap>:1534(get_code)
7/3 0.020 0.003 0.223 0.074 {method 'load_module' of 'zipimport'.
Enter line_profiler
line-by-line time consumption
decorate the function with @profile:
run the code with kernprof -l -v my_script.py
written by Robert Kern
@profile
def gd(f, grad, x0, rtol=1e-3):
    xk = x0
    fk = f(xk)
    tol = np.inf
    while tol >= rtol:
        # .. compute gradient ..
        gk = grad(xk)
        # etc.
$ kernprof -l -v gd.py
Total time: 0.561124 s
File: gd.py
Function: gd at line 4
Line # Hits Time Per Hit % Time Line Contents
==============================================================
4 @profile
5 def gd(f, grad, x0, rtol=1e-12):
6 1 3 3.0 0.0 xk = x0
7 1 998 998.0 0.2 fk = f(xk)
8 1 2 2.0 0.0 tol = np.inf
9 18 28 1.6 0.0 while tol >= rtol:
10 # .. compute gradient ..
11 17 29337 1725.7 5.2 gk = grad(xk)
12
13 # .. find step size by line-search ..
14 17 101399 5964.6 18.1 step_size = optimize.line_search(f, grad, xk,
15
16 # .. perform update ..
17 170017 152034 0.9 27.1 for i in range(xk.size):
18 170000 260621 1.5 46.4 xk[i] = xk[i] - step_size * gk[i]
19 17 16550 973.5 2.9 fk_new = f(xk)
20 17 135 7.9 0.0 tol = np.abs(fk - fk_new) / fk
21 17 17 1.0 0.0 fk = fk_new
The "% Time" column gives the percentage of time spent on each line.
More than 70% of the time is spent performing the parameter
vector update!
vectorizing the parameter update
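The only change in gd.py is replacing the element-wise Python loop with a single in-place NumPy operation:

# before: element-wise update in pure Python
for i in range(xk.size):
    xk[i] = xk[i] - step_size * gk[i]

# after: one vectorized, in-place update
xk -= step_size * gk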
$ kernprof -l -v gd.py
Total time: 0.138711 s
File: gd.py
Function: gd at line 4
Line # Hits Time Per Hit % Time Line Contents
==============================================================
4 @profile
5 def gd(f, grad, x0, rtol=1e-12):
6 1 3 3.0 0.0 xk = x0
7 1 1007 1007.0 0.7 fk = f(xk)
8 1 2 2.0 0.0 tol = np.inf
9 17 34 2.0 0.0 while tol >= rtol:
10 # .. compute gradient ..
11 16 29335 1833.4 21.1 gk = grad(xk)
12
13 # .. find step size by line-search ..
14 16 91213 5700.8 65.8 step_size = optimize.line_search(f, grad, xk,
15
16 # .. perform update ..
17 16 589 36.8 0.4 xk -= step_size * gk
18 16 16363 1022.7 11.8 fk_new = f(xk)
19 16 145 9.1 0.1 tol = np.abs(fk - fk_new) / fk
20 16 20 1.2 0.0 fk = fk_new
From 48.5% to 0.4%!
Total time divided by 4, without leaving Python.
memory_profiler
Get memory consumption of a Python program.
Two modes: line-by-line and time-based.
The line-by-line mode has an interface similar to line_profiler:
$ python -m memory_profiler gd.py
Filename: gd.py
Line # Mem usage Increment Line Contents
================================================
4 352.117 MiB 0.000 MiB @profile
5 def gd(f, grad, x0, rtol=1e-12):
6 352.117 MiB 0.000 MiB xk = x0
7 352.145 MiB 0.027 MiB fk = f(xk)
8 352.145 MiB 0.000 MiB tol = np.inf
9 354.133 MiB 1.988 MiB while tol >= rtol:
10 # .. compute gradient ..
11 354.133 MiB 0.000 MiB gk = grad(xk)
12
13 # .. find step size by line-search ..
14 354.133 MiB 0.000 MiB step_size = optimize.line_search(f, grad, xk, -gk, gfk=
15
16 # .. perform update ..
17 354.133 MiB 0.000 MiB xk -= step_size * gk
18 354.133 MiB 0.000 MiB fk_new = f(xk)
19 354.133 MiB 0.000 MiB tol = np.abs(fk - fk_new) / fk
20 354.133 MiB 0.000 MiB fk = fk_new
memory_profiler
Time-based memory consumption.
Run the script with mprof run my_script.py
Plot with mprof plot
memory_profiler, example 2
Two functions, process_data, create_data
import numpy as np
import scipy.signal

@profile
def create_data():
    ret = []
    for n in range(70):
        ret.append(np.random.randn(1, 70, 71, 72))
    return ret

@profile
def process_data(data):
    data = np.concatenate(data)
    detrended = scipy.signal.detrend(data, axis=0)
    return detrended
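To exercise the two functions and record the time-based profile, one can add a small driver and run the script through mprof (the script name and the __main__ block are assumptions):

if __name__ == "__main__":
    data = create_data()
    process_data(data)

$ mprof run example2.py
$ mprof plot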
memory_profiler, example 2 (continued)
[figure: mprof plot of memory usage over time for the two functions]
IPython shortcuts
line_profiler and memory_profiler have IPython shortcuts
(AKA magic commands).
line_profiler shortcut: %lprun
memory_profiler shortcuts: %memit, %mprun
IPython also provides %timeit for quick timing of a statement.
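A sketch of what a session could look like (the calls to gd and the arrays are illustrative, not taken from the slides):

In [1]: %load_ext line_profiler
In [2]: %lprun -f gd gd(f, grad, x0)      # line-by-line timing of gd

In [3]: %load_ext memory_profiler
In [4]: %memit np.ones((1000, 1000))      # peak memory of one statement
In [5]: %mprun -f gd gd(f, grad, x0)      # line-by-line memory of gd

In [6]: %timeit a + b                     # quick timing of a short statement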
YEP
A recursive acronym for YEP Extension Profiler.
Existing Python profilers only record Python functions.
Many codebases contain significant parts written in Cython/C/C++.
YEP allows getting timing information for compiled extensions
(Cython/C/C++).
It is the only profiler that does so without special compiling/linking.
It integrates the google-perftools profiler within Python.
Does not work on Windows.
YEP
Simple example using scikit-learn
The svm module in scikit-learn uses libsvm.
import numpy as np
from sklearn import svm, datasets
X, y = datasets.make_classification(n_samples=1000)
clf = svm.SVC()
clf.fit(X, y)
run with $ python -m yep -v simple_svm.py
YEP output
$ python -m yep -v simple_svm.py
2 0.7% 0.7% 268 97.5% _PyEval_EvalFrameEx
0 0.0% 0.7% 267 97.1% _PyClassMethod_New
0 0.0% 0.7% 267 97.1% _PyEval_EvalCode
0 0.0% 0.7% 267 97.1% _PyEval_EvalCodeEx
0 0.0% 0.7% 267 97.1% _PyObject_Call
0 0.0% 0.7% 267 97.1% __PyBuiltin_Init
0 0.0% 0.7% 206 74.9% __mh_execute_header
0 0.0% 0.7% 193 70.2% _Py_GetArgcArgv
0 0.0% 0.7% 193 70.2% _Py_Main
0 0.0% 0.7% 193 70.2% _main
0 0.0% 0.7% 193 70.2% start
0 0.0% 0.7% 188 68.4% _PyInit_libsvm
0 0.0% 0.7% 187 68.0% _svm_train
0 0.0% 0.7% 186 67.6% svm::Solver::Solve
48 17.5% 18.2% 184 66.9% svm_csr::Kernel::kernel_precomputed
135 49.1% 67.3% 135 49.1% svm::Kernel::kernel_rbf
0 0.0% 67.3% 74 26.9% _PyEval_CallObjectWithKeywords
0 0.0% 67.3% 74 26.9% _PyImport_ImportModuleLevelObject
0 0.0% 67.3% 74 26.9% __PyObject_CallMethodIdObjArgs
0 0.0% 67.3% 71 25.8% svm::Solver::select_working_set
0 0.0% 67.3% 27 9.8% _PyType_GenericAlloc
25 9.1% 76.4% 25 9.1% _stat
0 0.0% 76.4% 22 8.0% _PyInit_posix
YEP is still a work in progress and has many rough edges.
Conclusion
Python has many tools for code profiling.
Some of them are mature and easy to use.
They integrate well with IPython.
Happy hacking!