Dynamic Program Analysis - Purdue CS
Xiangyu Zhang
Introduction
Dynamic program analysis solves problems of software
dependability and productivity by inspecting software execution.
Program executions vs. programs
Not all statements are executed; one statement may be executed many times.
Analysis on a single path – the executed path
CS510 Software Engineering
Resulting in: …
Program Tracing
Outline
What is tracing.
Why tracing.
How to trace.
Reducing trace size.
What is Tracing
Tracing is a process that faithfully records detailed
information of program execution (lossless).
Dependence tracing
the sequence of exercised dependences.
Value tracing
the sequence of values that are produced by each instruction.
Memory access tracing
the sequence of memory references during an execution
Why Tracing
Debugging
Enables time travel to understand what has happened.
Code optimizations
Identify hot program paths;
Data compression;
Value speculation;
Data locality that helps cache design;
Security
Malware analysis
Testing
Coverage.
Outline
What is tracing.
Why tracing.
How to trace.
Reducing trace size.
Trace accessibility
Tracing by Printf
Max = 0;
for (p = head; p; p = p->next)
{
    printf("In loop\n");
    …
}
Tracing by Source Level Instrumentation
An Example
printf("In loop\n")
Limitations of Source Level Instrumentation
Tracing by Binary Instrumentation
What is binary instrumentation
Given a binary executable, parses it into an intermediate
representation. More advanced representations such as …
Static vs. Dynamic Instrumentation
Static: takes an executable and generates an
instrumented executable that can be run on
many different inputs
Dynamic Binary Instrumentation - Valgrind
Developed by Julian Seward at Cambridge University.
Google-O'Reilly Open Source Award for "Best Toolmaker" 2006
A merit (bronze) Open Source Award 2004
Open source
Works on x86, AMD64
Valgrind Infrastructure
[Figure: the VALGRIND CORE (Dispatcher, BB Decoder, BB Compiler, Instrumenter, Trampoline, Runtime state) with plug-in tools Tool 1, Tool 2, …, Tool n. Its inputs are the binary code and the program input. Arrows are labelled pc, New BB, and New pc.]
Example program:
1: do {
2:   i=i+1;
3:   s1;
4: } while (i<2)
5: s2;
Walkthrough, with a tool that prints the first statement number of each block:
pc 1 reaches the Dispatcher.
The BB Decoder decodes the block made of statements 1-4.
The Instrumenter inserts print("1") at the start of that block.
The instrumented block runs twice. OUTPUT: 1 1
pc 5 reaches the BB Decoder, which decodes the block "5: s2;".
The Instrumenter inserts print("5") before s2.
The instrumented block runs. OUTPUT: 1 1 5
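The walkthrough of the example program can be mimicked with a toy dispatcher loop in C. This is only a sketch of the idea, not Valgrind code: basic blocks are modelled as C functions, the "translation" step just looks the block up on first visit, and the tool's print is modelled as a call made at block entry. All names (`State`, `Core`, `decode`, `tool_print_pc`, `run`) are hypothetical, and keeping translated blocks for reuse is an assumption made for the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define MAX_PC 8

typedef struct {
    int  i;        /* the variable i of the example program */
    char out[64];  /* text produced by the instrumentation */
} State;

/* A basic block runs on the state and returns the next pc (0 means exit). */
typedef int (*BlockFn)(State *);

/* Statements 1-4: do { i=i+1; s1; } while (i<2) */
static int block_at_1(State *s) { s->i = s->i + 1; return (s->i < 2) ? 1 : 5; }

/* Statement 5: s2 */
static int block_at_5(State *s) { (void)s; return 0; }

typedef struct {
    BlockFn body[MAX_PC];  /* blocks already translated, indexed by pc */
    int     translations;  /* number of blocks decoded so far */
} Core;

/* Stand-in for the BB decoder: find the block that starts at pc. */
static BlockFn decode(int pc)
{
    if (pc == 1) return block_at_1;
    if (pc == 5) return block_at_5;
    return NULL;
}

/* Stand-in for the tool's instrumentation: print the pc of the block. */
static void tool_print_pc(State *s, int pc)
{
    size_t used = strlen(s->out);
    snprintf(s->out + used, sizeof s->out - used, "%s%d", used ? " " : "", pc);
}

/* Dispatcher loop: translate a block on first visit, then run it. */
void run(Core *c, State *s, int pc)
{
    assert(c != NULL && s != NULL);
    while (pc > 0 && pc < MAX_PC) {
        if (c->body[pc] == NULL) {
            c->body[pc] = decode(pc);
            if (c->body[pc] == NULL)
                return;              /* no block starts at this pc */
            c->translations++;
        }
        tool_print_pc(s, pc);        /* instrumentation at block entry */
        pc = c->body[pc](s);         /* run the block, get the new pc */
    }
}
```

Starting `run` at pc 1 with `i` equal to 0 visits the loop block twice and the final block once, while each block is decoded only once.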
Instrumentation with Valgrind
UCodeBlock* SK_(instrument)(UCodeBlock* cb_in, …)
{
    …
    UCodeBlock* cb = VG_(setup_UCodeBlock)(…);
    …
    for (i = 0; i < VG_(get_num_instrs)(cb_in); i++) {
        u = VG_(get_instr)(cb_in, i);
        switch (u->opcode) {
        case LD:
            …
        case ST:
            …
        case MOV:
            …
        case ADD:
            …
        case CALL:
            …
        }
    }
    return cb;
}
Outline
What is tracing.
Why tracing.
How to trace.
Reducing trace size.
Fine-Grained Tracing is Expensive
1: sum=0
2: i=1
3: while (i<N) do
4:   i=i+1
5:   sum=sum+i
   endwhile
6: print(sum)
Trace (N=6): 1 2 3 4 5 3 4 5 3 4 5 3 4 5 3 4 5 3 6
BB Trace: 1 3 4 3 4 3 4 3 4 3 4 3 6
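The step from the statement trace to the basic-block (BB) trace can be sketched in C. This is a minimal sketch, assuming the block leaders are already known (statements 1, 3, 4 and 6 in the example); the function name `bb_trace` and the `is_leader` array are illustrative names, not part of the slides. Because a basic block is always entered at its first statement, keeping only the leaders yields one entry per executed block.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Reduce a statement trace to a basic-block trace.
 * is_leader[s] is nonzero when statement s is the first statement of a
 * basic block. out must have room for len entries.
 * Returns the number of entries written to out.
 */
size_t bb_trace(const int *stmts, size_t len, const int *is_leader, int *out)
{
    size_t n = 0;

    assert(stmts != NULL && is_leader != NULL && out != NULL);
    for (size_t i = 0; i < len; i++) {
        if (is_leader[stmts[i]])
            out[n++] = stmts[i];  /* one entry per executed block */
    }
    return n;
}
```

On the trace for N=6 this keeps 13 of the 19 entries; the saving grows with the number of statements per block.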
More Ideas
Would a function level tracing idea work?
A trace entry is a function call with its parameters.
Predicate tracing
Divide traces into chunks, and then compress them with zlib.
Compression using value predictors
Last n values predictor
Facilitated by a buffer that stores the last n unique values
encountered.
If the next value is one of the n values, the index of the value (in [0,
n-1]) is emitted to the encoded trace, prefixed with a bit 0 to
mark a hit. Otherwise, a bit 1 is emitted, followed by the 32-bit value.
Example (last-2 predictor):
Input:   999 333 999 333 999 999 999 333
Encoded: 1 999 1 333 00 01 00 00 00 01 (the values 999 and 333 take 32 bits each)
999 333 555 555 999 333 999 999 999 333
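A last-n encoder along these lines can be sketched in C. This is a sketch under stated assumptions, not the slides' implementation: the slides do not say which buffer entry is replaced when the buffer is full, so FIFO replacement is assumed here, and the encoded trace is written as readable text tokens rather than packed bits. The names `LastN`, `lastn_find`, `lastn_insert` and `lastn_encode` are hypothetical.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define N_LAST     2  /* buffer size n (last-2 predictor) */
#define INDEX_BITS 1  /* log2(N_LAST) bits per emitted index */

typedef struct {
    uint32_t vals[N_LAST];
    size_t   count;  /* number of valid entries */
    size_t   next;   /* slot overwritten next (FIFO is an assumption) */
} LastN;

/* Return the buffer index of v, or -1 if v is not in the buffer. */
static int lastn_find(const LastN *p, uint32_t v)
{
    for (size_t i = 0; i < p->count; i++)
        if (p->vals[i] == v)
            return (int)i;
    return -1;
}

static void lastn_insert(LastN *p, uint32_t v)
{
    p->vals[p->next] = v;
    p->next = (p->next + 1) % N_LAST;
    if (p->count < N_LAST)
        p->count++;
}

/*
 * Encode in[0..len) into out as space-separated tokens:
 *   hit:  "0" followed by the index in binary, e.g. "01"
 *   miss: "1 <value>", where <value> stands for 32 raw bits
 * Returns the size of the encoded trace in bits.
 */
size_t lastn_encode(const uint32_t *in, size_t len, char *out, size_t cap)
{
    LastN p;
    size_t bits = 0, used = 0;

    assert(in != NULL && out != NULL && cap > 0);
    memset(&p, 0, sizeof p);
    out[0] = '\0';
    for (size_t i = 0; i < len; i++) {
        char tok[16];
        size_t tok_bits;
        int idx = lastn_find(&p, in[i]);

        if (idx >= 0) {
            size_t k = 0;
            tok[k++] = '0';
            for (int b = INDEX_BITS - 1; b >= 0; b--)
                tok[k++] = ((idx >> b) & 1) ? '1' : '0';
            tok[k] = '\0';
            tok_bits = 1 + INDEX_BITS;
        } else {
            snprintf(tok, sizeof tok, "1 %" PRIu32, in[i]);
            lastn_insert(&p, in[i]);
            tok_bits = 1 + 32;
        }
        int w = snprintf(out + used, cap - used, "%s%s", used ? " " : "", tok);
        if (w < 0 || (size_t)w >= cap - used)
            break;  /* output buffer is full */
        used += (size_t)w;
        bits += tok_bits;
    }
    return bits;
}
```

For the eight-value example input, this produces two misses and six hits, so the encoded trace takes 2 × 33 + 6 × 2 = 78 bits instead of 8 × 32 = 256.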
Compression using value predictors
Decompression
Take one bit from the encoded trace. If it is 1, emit the next 32 bits. If it is 0,
emit the value in the buffer indexed by the next log n bits.
Maintain the buffer in the same way as during compression.
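The decompression rule for the last-n predictor can be sketched in C as follows. To keep the sketch short, bit-level parsing is abstracted away: each `Token` holds the leading bit and what follows it (a 32-bit value or a buffer index). The `Token` type, the function name `lastn_decode`, and the FIFO replacement policy are assumptions made for illustration; the decoder must use whatever policy the encoder used.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define N_LAST 2  /* buffer size n (last-2 predictor) */

/* One unit of the encoded trace, already split out of the bit stream. */
typedef struct {
    int      bit;      /* leading bit: 1 = literal value, 0 = buffer index */
    uint32_t payload;  /* the 32-bit value, or an index in [0, N_LAST-1] */
} Token;

/*
 * Decode len tokens into out (room for len values).
 * Returns 0 on success, -1 if an index refers to an empty buffer slot.
 */
int lastn_decode(const Token *in, size_t len, uint32_t *out)
{
    uint32_t vals[N_LAST];
    size_t count = 0, next = 0;

    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < len; i++) {
        if (in[i].bit == 1) {
            /* literal: emit it and insert it as the encoder did (FIFO) */
            out[i] = in[i].payload;
            vals[next] = in[i].payload;
            next = (next + 1) % N_LAST;
            if (count < N_LAST)
                count++;
        } else {
            /* hit: emit the buffered value at the given index */
            if (in[i].payload >= count)
                return -1;
            out[i] = vals[in[i].payload];
        }
    }
    return 0;
}
```

The buffer changes only on literals, which mirrors an encoder that inserts a value only when it misses.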
Compression using value predictors
Finite Context Method (FCM)
Facilitated by a lookup table that predicts a value based on the
context of the n values to its left. 2-FCM, 3-FCM
If the next value can be found in the table through its left context, a
bit 0 is emitted to the encoded trace. Otherwise, a bit 1 is emitted,
followed by the 32-bit value.
Example (2-FCM):
Input:   1 2 3 4 5 3 4 5 3 4 5 … 3 4 5 6
Encoded: 1 1 1 2 1 3 1 4 1 5 1 3 1 4 0 0 0 0 … 0 1 6 (each value after a 1 bit takes 32 bits)
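An FCM encoder of this kind can be sketched in C. This is a minimal sketch, assuming a context length of 2 and a small table searched linearly (a real implementation would more likely hash the context); the output is an array of tokens rather than packed bits. The names `Fcm`, `Entry`, `Token`, `fcm_lookup`, `fcm_update` and `fcm_encode`, and the capacity `MAX_ENTRIES`, are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ORDER       2    /* context length n (2-FCM) */
#define MAX_ENTRIES 256  /* hypothetical table capacity */

typedef struct { uint32_t ctx[ORDER]; uint32_t value; } Entry;

typedef struct {
    Entry    entries[MAX_ENTRIES];
    size_t   n_entries;
    uint32_t ctx[ORDER];  /* the last ORDER values, oldest first */
    size_t   seen;        /* number of values processed so far */
} Fcm;

/* bit 0: value was predicted; bit 1: value holds the 32-bit literal. */
typedef struct { int bit; uint32_t value; } Token;

/* Find the table entry for the current left context, or NULL. */
static Entry *fcm_lookup(Fcm *f)
{
    if (f->seen < ORDER)
        return NULL;  /* context is not complete yet */
    for (size_t i = 0; i < f->n_entries; i++)
        if (memcmp(f->entries[i].ctx, f->ctx, sizeof f->ctx) == 0)
            return &f->entries[i];
    return NULL;
}

/* Record that v followed the current context, then shift v into it. */
static void fcm_update(Fcm *f, uint32_t v)
{
    if (f->seen >= ORDER) {
        Entry *e = fcm_lookup(f);
        if (e == NULL && f->n_entries < MAX_ENTRIES) {
            e = &f->entries[f->n_entries++];
            memcpy(e->ctx, f->ctx, sizeof f->ctx);
        }
        if (e != NULL)
            e->value = v;
    }
    memmove(f->ctx, f->ctx + 1, (ORDER - 1) * sizeof f->ctx[0]);
    f->ctx[ORDER - 1] = v;
    f->seen++;
}

/* Encode in[0..len) into out[0..len). Returns the encoded size in bits. */
size_t fcm_encode(const uint32_t *in, size_t len, Token *out)
{
    Fcm f;
    size_t bits = 0;

    assert(in != NULL && out != NULL);
    memset(&f, 0, sizeof f);
    for (size_t i = 0; i < len; i++) {
        const Entry *e = fcm_lookup(&f);
        if (e != NULL && e->value == in[i]) {
            out[i].bit = 0; out[i].value = 0;      /* correct prediction */
            bits += 1;
        } else {
            out[i].bit = 1; out[i].value = in[i];  /* literal */
            bits += 1 + 32;
        }
        fcm_update(&f, in[i]);
    }
    return bits;
}
```

On the input 1 2 3 4 5 3 4 5 3 4 5 3 4 5 6, the first seven values and the final 6 are emitted as literals and the seven values in between are predicted, giving 8 × 33 + 7 = 271 bits.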
Compression using value predictors
Decompression
Take one bit from the encoded trace. If it is 1, emit the next 32 bits. If it is 0,
emit the value looked up from the table using the left n values.
Maintain the table in the same way as during compression.
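FCM decompression can be sketched in C in the same spirit. This is a sketch, assuming a context length of 2, a linearly searched table, and an encoded trace already split into tokens (a leading bit plus an optional 32-bit literal); the names `Fcm`, `Entry`, `Token`, `fcm_lookup`, `fcm_update` and `fcm_decode` are hypothetical. The table is updated after every value, exactly as a matching encoder would do it, so both sides always hold the same predictions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ORDER       2    /* context length n (2-FCM) */
#define MAX_ENTRIES 256  /* hypothetical table capacity */

typedef struct { uint32_t ctx[ORDER]; uint32_t value; } Entry;

typedef struct {
    Entry    entries[MAX_ENTRIES];
    size_t   n_entries;
    uint32_t ctx[ORDER];  /* the last ORDER values, oldest first */
    size_t   seen;        /* number of values processed so far */
} Fcm;

/* bit 0: take the predicted value; bit 1: value holds the literal. */
typedef struct { int bit; uint32_t value; } Token;

/* Find the table entry for the current left context, or NULL. */
static Entry *fcm_lookup(Fcm *f)
{
    if (f->seen < ORDER)
        return NULL;
    for (size_t i = 0; i < f->n_entries; i++)
        if (memcmp(f->entries[i].ctx, f->ctx, sizeof f->ctx) == 0)
            return &f->entries[i];
    return NULL;
}

/* Record that v followed the current context, then shift v into it. */
static void fcm_update(Fcm *f, uint32_t v)
{
    if (f->seen >= ORDER) {
        Entry *e = fcm_lookup(f);
        if (e == NULL && f->n_entries < MAX_ENTRIES) {
            e = &f->entries[f->n_entries++];
            memcpy(e->ctx, f->ctx, sizeof f->ctx);
        }
        if (e != NULL)
            e->value = v;
    }
    memmove(f->ctx, f->ctx + 1, (ORDER - 1) * sizeof f->ctx[0]);
    f->ctx[ORDER - 1] = v;
    f->seen++;
}

/*
 * Decode len tokens into out (room for len values).
 * Returns 0 on success, -1 if a 0 bit has no matching table entry.
 */
int fcm_decode(const Token *in, size_t len, uint32_t *out)
{
    Fcm f;

    assert(in != NULL && out != NULL);
    memset(&f, 0, sizeof f);
    for (size_t i = 0; i < len; i++) {
        if (in[i].bit == 1) {
            out[i] = in[i].value;          /* literal */
        } else {
            const Entry *e = fcm_lookup(&f);
            if (e == NULL)
                return -1;                 /* malformed encoded trace */
            out[i] = e->value;             /* predicted from left context */
        }
        fcm_update(&f, out[i]);            /* same update as the encoder */
    }
    return 0;
}
```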
Compression using value predictors
FCM (finite context method).
Example, FCM-3
[Figure: FCM-3 compression example, animated over two slides; the surviving labels are "Compressed", "1", and "0B".]
Fast.
Good compression.
Methodology:
Have a small sliding window on the compressed string.
The string in the window is plain text (decompressed)
The strings on the left and the right of the window are
compressed.
Enable bidirectional traversal
Forward compressed, backward decompressed FCM
Traditional FCM is forward compressed, forward decompressed
[Figure: Bidirectional FCM, showing a left-context lookup table and a right-context lookup table around the current context (X Y Z, A), with the compressed and uncompressed parts of the trace marked.]
Left-context look up table
Predict the next value based on its left context
Right-context look up table
Bidirectional FCM - example
[Figure: worked example of bidirectional FCM on a trace of the values A, X, Y, Z.]
Characteristics of bidirectional predictors