A0-Class
Assignment 0
Programmer Headaches
●
Why is the program slow?
●
How many page faults / context switches in the program?
●
Cache misses / Branch mispredictions
●
Where is the Hotspot in the program? Which function?
●
Memory leaks in the program?
●
gdb is too slow; working with breakpoints is cumbersome.
Outline
●
Pin
●
Valgrind
●
Perf
●
Gprof
Pin
●
Code instrumentation on x86 binaries
– insert arbitrary code (C or C++) in arbitrary places
in an executable
Original binary
    Line 0
    Line 1
    Line 2
    Line 3
    ...
pintool
    [[ pin init code ]]
    Analysis code function() {
        Foreach Instruction:
            [[ analysis code for all Instructions ]]
        If ALU Instruction:
            [[ analysis code for ALU Instruction ]]
        If Branch Instruction:
            [[ analysis code for Branch Instruction ]]
        If Memory Access Instruction:
            [[ analysis code for Memory Access Instruction ]]
        ...
    }
    [[ pin final code ]]
Generated Executable
    [[ pin init code ]]
    [[ analysis code for Line 0 ]]
    Line 0
    [[ analysis code for Line 1 ]]
    Line 1
    [[ analysis code for Line 2 ]]
    Line 2
    [[ analysis code for Line 3 ]]
    Line 3
    ...
    [[ pin final code ]]
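The split between instrumentation (deciding once, per instruction, which analysis code to attach) and analysis (the attached code running every time that instruction executes) can be illustrated with a toy model in plain C++. This is only a sketch of the idea and does not use Pin's API: `Kind`, `Counters`, `Instrumented`, `instrument`, and `run` are hypothetical names invented for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical instruction categories for a toy "binary" (not Pin types).
enum class Kind { Alu, Branch, Memory };

// Statistics gathered by the analysis code.
struct Counters {
    std::size_t total = 0;
    std::size_t alu = 0;
    std::size_t branch = 0;
    std::size_t memory = 0;
};

// One instruction of the "generated executable": the original
// instruction plus the analysis calls placed in front of it.
struct Instrumented {
    Kind kind;
    std::vector<std::function<void()>> analysis;
};

// Instrumentation pass: visits each instruction once and decides
// which analysis calls to attach, based on the instruction category.
std::vector<Instrumented> instrument(const std::vector<Kind>& program, Counters& c) {
    std::vector<Instrumented> out;
    out.reserve(program.size());
    for (Kind k : program) {
        Instrumented ins{k, {}};
        ins.analysis.push_back([&c] { ++c.total; });  // for all instructions
        switch (k) {
            case Kind::Alu:    ins.analysis.push_back([&c] { ++c.alu; });    break;
            case Kind::Branch: ins.analysis.push_back([&c] { ++c.branch; }); break;
            case Kind::Memory: ins.analysis.push_back([&c] { ++c.memory; }); break;
        }
        out.push_back(std::move(ins));
    }
    return out;
}

// Execution: the attached analysis calls run every time an instruction
// executes, so a loop body executed N times is counted N times.
void run(const std::vector<Instrumented>& code, int repetitions) {
    assert(repetitions >= 0);
    for (int r = 0; r < repetitions; ++r)
        for (const Instrumented& ins : code)
            for (const auto& call : ins.analysis)
                call();
}
```

Calling `instrument` once and then `run` several times mirrors the diagram: the decision about what to insert is made once per instruction, while the inserted code executes as often as the instruction does.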
Pintool – Example code
...
// Pin calls this function every time a new instruction is encountered
VOID Instruction(INS ins, VOID* v) {
    // Insert a call to docount before every instruction, no arguments are passed
    INS_InsertCall(ins, IPOINT_BEFORE, (AFUNPTR)docount, IARG_END);
}
VOID docount() { icount++; }
...
Pin
●
Instruction stats, Register access patterns,
Memory access patterns, Branches stats
– Application Trace
Pin – Execution
●
$ pin -t pintool -- binary
●
Pin Tools page
Assignment
●
Profile and analyze a family of two or more similar
programs using the Intel Pin tool.
– Collect the instruction count, instruction address trace,
and memory reference trace.
Valgrind
●
Valgrind is also an instrumentation tool.
●
memcheck - checks for memory-related errors
●
cachegrind - a cache and branch-prediction
profiler
●
callgrind - a call-graph-generating cache and
branch-prediction profiler
Valgrind
●
Can instrument most binaries – Intel, ARM,
PPC, Android on ARM
●
$ valgrind ./a.out
●
Valgrind Page, Tutorial, FAQ page.
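As a small target for memcheck, the following sketch contains a deliberate leak next to a leak-free variant. The function names `sum_leaky` and `sum_clean` are hypothetical, chosen only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Deliberately leaky: the buffer obtained with new[] is never released.
long sum_leaky(std::size_t n) {
    assert(n > 0);
    int* data = new int[n];
    for (std::size_t i = 0; i < n; ++i)
        data[i] = static_cast<int>(i);
    long sum = 0;
    for (std::size_t i = 0; i < n; ++i)
        sum += data[i];
    return sum;  // missing: delete[] data;
}

// Leak-free variant: std::vector releases its storage on scope exit.
long sum_clean(std::size_t n) {
    std::vector<int> data(n);
    std::iota(data.begin(), data.end(), 0);  // fills with 0, 1, ..., n-1
    return std::accumulate(data.begin(), data.end(), 0L);
}
```

If both functions are called from a `main`, the program is built with `-g`, and it is run as `valgrind --leak-check=full ./a.out`, memcheck should report the allocation made in `sum_leaky` as lost and nothing for `sum_clean`. The exact wording of the report depends on the Valgrind version.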
Perf tool
●
Profiling and performance analysis tool.
●
Prints out performance counters.
– Hardware counters from PMU: number of cycles,
instructions retired, L1 cache misses and so on
– Software counters - context-switches
●
For the full list of events, run
– $ perf list
perf Example
$ perf stat -B dd if=/dev/zero of=/dev/null count=1000000
1000000+0 records in
1000000+0 records out
512000000 bytes (512 MB) copied, 0.956217 s, 535 MB/s
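To see hardware counters differ between two otherwise similar programs, one option is to compare two traversal orders over the same matrix. The sketch below is an assumed example rather than part of the assignment; `sum_row_order` and `sum_column_order` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sums an n x n matrix stored row by row, visiting elements in
// storage order (stride 1), which is friendly to the cache.
long long sum_row_order(const std::vector<int>& m, std::size_t n) {
    assert(m.size() == n * n);
    long long sum = 0;
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j)
            sum += m[i * n + j];
    return sum;
}

// Same result, but visits the matrix column by column (stride n),
// so consecutive accesses touch different cache lines once n is large.
long long sum_column_order(const std::vector<int>& m, std::size_t n) {
    assert(m.size() == n * n);
    long long sum = 0;
    for (std::size_t j = 0; j < n; ++j)
        for (std::size_t i = 0; i < n; ++i)
            sum += m[i * n + j];
    return sum;
}
```

Building one executable per traversal, with a matrix large enough not to fit in cache, and running each under `perf stat -e cache-misses,branch-misses ./a.out` would be expected to show more cache misses for the column-order version. Which events are available depends on the machine, so it is worth checking `perf list` first.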