1-Dynamic Program Analysis - Purdue CS PDF

Dynamic program analysis involves inspecting software execution to solve problems regarding dependability and productivity. It has several advantages over static analysis such as handling variable instantiation, precision, and applicability to real executions. Key techniques include tracing, profiling, checkpointing, slicing, and debugging. Tracing records detailed execution information and is useful for debugging, optimizations, and security. It can be done through source code instrumentation or binary instrumentation. Binary instrumentation avoids source code requirements and easily handles libraries but incurs overhead.

Dynamic Program Analysis

Xiangyu Zhang
CS510 Software Engineering

Introduction
Dynamic program analysis solves problems regarding software dependability and productivity by inspecting software execution.
Program executions vs. programs
Not all statements are executed; one statement may be executed many times.
Analysis on a single path: the executed path.
All variables are instantiated (solving the aliasing problem).
Resulting in:
Relatively lower learning curve.
Precision.
Applicability.
Scalability.
Dynamic program analysis can be constructed from a set of primitives
Tracing
Profiling
Checkpointing and replay
Dynamic slicing
Execution indexing
Delta debugging
Applications
Dynamic information flow tracking
Automated debugging

Program Tracing

Outline
What is tracing.
Why tracing.
How to trace.
Reducing trace size.

What is Tracing
Tracing is a process that faithfully records detailed information about a program's execution (lossless).
Control flow tracing: the sequence of executed statements.
Dependence tracing: the sequence of exercised dependences.
Value tracing: the sequence of values that are produced by each instruction.
Memory access tracing: the sequence of memory references during an execution.
The most basic primitive.

Why Tracing
Debugging: enables time travel to understand what has happened.
Code optimizations: identifying hot program paths; data compression; value speculation; data locality that helps cache design.
Security: malware analysis.
Testing: coverage.

Outline
What is tracing.
Why tracing.
How to trace.
Reducing trace size.
Trace accessibility

Tracing by Printf
max = 0;
for (p = head; p; p = p->next)
{
    printf("In loop\n");
    if (p->value > max)
    {
        printf("True branch\n");
        max = p->value;
    }
}

Tracing by Source Level Instrumentation
Read a source file and parse it into ASTs.
Annotate the parse trees with instrumentation.
Translate the annotated trees to a new source file.
Compile the new source.
Execute the program; a trace is produced.
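As a rough illustration of what the "new source file" produced by these steps might look like, the sketch below instruments the max-finding loop by hand. It is a stand-in for a tool's output, not the output of any particular instrumenter: `trace_emit`, `trace_buf`, `find_max_traced`, and the statement IDs 1 and 2 are all hypothetical names chosen for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical runtime support that an instrumenter would link in. */
#define TRACE_CAP 1024
static int trace_buf[TRACE_CAP];
static size_t trace_len = 0;

/* Records the ID of the program point that has just been reached. */
static void trace_emit(int stmt_id)
{
    assert(trace_len < TRACE_CAP);  /* sketch only: no overflow handling */
    trace_buf[trace_len++] = stmt_id;
}

struct node {
    int value;
    struct node *next;
};

/* Instrumented version of the max-finding loop.
 * Assumed IDs: 1 = loop body entered, 2 = true branch taken. */
int find_max_traced(const struct node *head)
{
    int max = 0;
    for (const struct node *p = head; p; p = p->next) {
        trace_emit(1);              /* inserted instrumentation */
        if (p->value > max) {
            trace_emit(2);          /* inserted instrumentation */
            max = p->value;
        }
    }
    return max;
}
```

Writing IDs into a buffer instead of calling printf keeps the trace compact and leaves the output format to a later stage; that choice is an assumption of this sketch.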

An Example
[Figure: instrumentation example; the only recoverable text is the inserted statement printf("In loop\n").]

Limitations of Source Level Instrumentation
Hard to handle libraries.
Proprietary libraries: communication (MPI, PVM), linear algebra (NGA), database query (SQL libraries).
Hard to handle multi-lingual programs.
Source code level instrumentation is heavily language dependent.
Requires source code.
Worms and viruses are rarely provided with source code.

Tracing by Binary Instrumentation
What is binary instrumentation
Given a binary executable, parse it into an intermediate representation. More advanced representations such as control flow graphs may also be generated.
Tracing instrumentation is added to the intermediate representation.
A lightweight compiler compiles the instrumented representation into a new executable.
Features
No source code requirement.
Easily handles libraries.

Static vs. Dynamic Instrumentation
Static: takes an executable and generates an instrumented executable that can be executed with many different inputs.
Dynamic: given the original binary and an input, starts executing the binary with the input; during execution, an instrumented binary is generated on the fly, so it is essentially the instrumented binary that is executed.

Dynamic Binary Instrumentation - Valgrind
Developed by Julian Seward at Cambridge University.
Google-O'Reilly Open Source Award for "Best Toolmaker" 2006
A merit (bronze) Open Source Award 2004
Open source
Works on x86, AMD64
Easy to execute, e.g.:
valgrind --tool=memcheck ls
It has become very popular
One of the two most popular dynamic instrumentation tools: Pin and Valgrind
Very good usability, extensibility, and robustness
25MLOC
Mozilla, MIT, Berkeley-security, Me, and many other places
Overhead is the problem
5-10X slowdown without any instrumentation
Reading assignment
Valgrind: A Framework for Heavyweight Dynamic Binary Instrumentation (PLDI07)

Valgrind Infrastructure
[Figure: the VALGRIND CORE contains the Dispatcher, BB Decoder, BB Compiler, Instrumenter, and Trampoline. Its inputs are the binary code and the program input; Tool 1 ... Tool n plug into the Instrumenter. The Dispatcher hands a pc to the BB Decoder, the decoded BB is instrumented and compiled into a new BB, and running the new BB updates the runtime state and yields a new pc.]

Valgrind Infrastructure: example
1: do {
2:   i=i+1;
3:   s1;
4: } while (i<2)
5: s2;
The Dispatcher starts at pc 1.
The BB Decoder decodes the basic block at lines 1-4.
The Instrumenter inserts print("1") at the start of the block.
The new BB is compiled and executed through the Trampoline. OUTPUT: 1
The loop branches back to pc 1 and the already translated block runs again. OUTPUT: 1 1
Execution reaches pc 5; the block "5: s2;" is decoded, instrumented with print("5"), compiled, and executed. OUTPUT: 1 1 5
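The translate-on-first-use behaviour in this walkthrough can be mimicked with a toy dispatcher loop. The sketch below is a simulation under simplifying assumptions and does not use any Valgrind API: guest basic blocks are ordinary C functions, "translation" just stores a function pointer in a table, and the tool's inserted print is modelled by appending the block's pc to a string. All names (`dbi_run`, `decode`, `struct guest`, `struct dbi`) are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Guest state for:  1: do { 2: i=i+1; 3: s1; 4: } while (i<2)   5: s2; */
struct guest { int i; };

#define PC_EXIT 0
#define MAX_PC  6

/* "Original" basic blocks: each one runs and returns the next pc. */
static int bb_at_1(struct guest *g) { g->i += 1; return g->i < 2 ? 1 : 5; }
static int bb_at_5(struct guest *g) { (void)g; return PC_EXIT; }

typedef int (*bb_fn)(struct guest *);

/* Stand-in for the BB Decoder: finds the block that starts at pc. */
static bb_fn decode(int pc)
{
    if (pc == 1) return bb_at_1;
    if (pc == 5) return bb_at_5;
    return NULL;
}

struct dbi {
    bb_fn translated[MAX_PC];  /* non-NULL once the block at pc is translated */
    int translations;          /* number of blocks translated so far */
    char out[64];              /* output of the modelled print() calls */
};

/* Runs the guest from start_pc, translating each block on first use only. */
void dbi_run(struct dbi *d, struct guest *g, int start_pc)
{
    int pc = start_pc;
    while (pc != PC_EXIT) {
        assert(pc > 0 && pc < MAX_PC);
        if (d->translated[pc] == NULL) {       /* first visit: translate */
            d->translated[pc] = decode(pc);
            assert(d->translated[pc] != NULL);
            d->translations++;
        }
        /* Modelled instrumentation: print the pc of the block being run. */
        size_t used = strlen(d->out);
        snprintf(d->out + used, sizeof d->out - used, "%s%d",
                 used ? " " : "", pc);
        pc = d->translated[pc](g);             /* run the block, get new pc */
    }
}
```

Starting from a zero-initialised `struct dbi` and `struct guest` at pc 1, the loop block is translated once and run twice, and the exit block is translated and run once.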

Instrumentation with Valgrind
UCodeBlock* SK_(instrument)(UCodeBlock* cb_in, …)
{
    UCodeBlock* cb = VG_(setup_UCodeBlock)(…);
    for (i = 0; i < VG_(get_num_instrs)(cb_in); i++) {
        u = VG_(get_instr)(cb_in, i);
        switch (u->opcode) {
        case LD:    /* … */
        case ST:    /* … */
        case MOV:   /* … */
        case ADD:   /* … */
        case CALL:  /* … */
        }
    }
    return cb;
}

Outline
What is tracing.
Why tracing.
How to trace.
Reducing trace size.

Fine-Grained Tracing is Expensive
1: sum=0
2: i=1
3: while (i<N) do
4:   i=i+1
5:   sum=sum+i
endwhile
6: print(sum)
[Figure: control flow graph of the same program.]

Trace(N=6): 1 2 3 4 5 3 4 5 3 4 5 3 4 5 3 4 5 3 6

Space Complexity: 4 bytes * Execution length
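To make the cost concrete, the following sketch replays the loop above and records one statement number per executed statement. The function names `trace_sum_loop` and `trace_bytes` and the `EMIT` macro are hypothetical, and the 4-byte entry size is taken from the space complexity line.

```c
#include <assert.h>
#include <stddef.h>

/* Fills trace[] with the executed statement numbers of the loop program
 * and returns the number of entries; cap is the capacity of trace[]. */
size_t trace_sum_loop(int n, int *trace, size_t cap)
{
    size_t len = 0;
    int sum, i;
#define EMIT(id) do { assert(len < cap); trace[len++] = (id); } while (0)
    EMIT(1); sum = 0;
    EMIT(2); i = 1;
    for (;;) {
        EMIT(3);                    /* loop condition is evaluated */
        if (!(i < n)) break;
        EMIT(4); i = i + 1;
        EMIT(5); sum = sum + i;
    }
    EMIT(6); (void)sum;             /* print(sum) is omitted in this sketch */
#undef EMIT
    return len;
}

/* Space needed when every trace entry is stored as a 4-byte integer. */
size_t trace_bytes(size_t execution_length)
{
    return 4 * execution_length;
}
```

For N=6 this yields the 19-entry trace shown above, i.e. 76 bytes for a program that runs six source lines; the trace grows linearly with the number of loop iterations.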


Basic Block Level Tracing
1: sum=0
2: i=1
3: while (i<N) do
4:   i=i+1
5:   sum=sum+i
endwhile
6: print(sum)
[Figure: control flow graph of the same program.]

Trace(N=6): 1 2 3 4 5 3 4 5 3 4 5 3 4 5 3 4 5 3 6

Each basic block is recorded once, by the number of its first statement.
BB Trace: 1 3 4 3 4 3 4 3 4 3 4 3 6
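A basic block trace can be derived from the statement trace by keeping only the entries that start a block. The sketch below assumes the block boundaries implied by the BB trace above, namely blocks {1,2}, {3}, {4,5}, and {6}; `is_leader` and `to_bb_trace` are hypothetical helper names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Block leaders for the loop program (assumed from the BB trace):
 * blocks {1,2}, {3}, {4,5}, {6} are named by statements 1, 3, 4, 6. */
static bool is_leader(int stmt)
{
    return stmt == 1 || stmt == 3 || stmt == 4 || stmt == 6;
}

/* Copies only the block-leader entries of stmt_trace into bb_trace and
 * returns how many were written; bb_trace must hold at least n entries. */
size_t to_bb_trace(const int *stmt_trace, size_t n, int *bb_trace)
{
    size_t out = 0;
    assert(stmt_trace != NULL && bb_trace != NULL);
    for (size_t k = 0; k < n; k++)
        if (is_leader(stmt_trace[k]))
            bb_trace[out++] = stmt_trace[k];
    return out;
}
```

For the N=6 trace this reduces 19 entries to 13. No information is lost, because the statements inside a block always execute together.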

More Ideas
Would a function level tracing idea work?
A trace entry is a function call with its parameters.
Predicate tracing
1: sum=0
2: i=1
3: while (i<N) do
4:   i=i+1
5:   sum=sum+i
endwhile
6: print(sum)

Instruction trace      Predicate trace
1 2 3 6                F
1 2 3 4 5 3 6          TF

Lose random accessibility
Path based tracing
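The predicate trace in the table above is enough to rebuild the instruction trace, as long as the program text is known. The sketch below hard-codes the control flow of the loop program and replays a string of 'T'/'F' outcomes for the condition at statement 3; `replay_predicates` is a hypothetical name, and returning 0 for malformed input is a choice made for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Rebuilds the instruction trace of the loop program from its predicate
 * trace (one 'T' or 'F' per evaluation of the condition at statement 3).
 * Returns the number of entries written to out[], or 0 if the predicate
 * trace is malformed or out[] (capacity cap) is too small. */
size_t replay_predicates(const char *preds, int *out, size_t cap)
{
    size_t len = 0;
    assert(preds != NULL && out != NULL);
    if (cap < 2) return 0;
    out[len++] = 1;
    out[len++] = 2;
    for (const char *p = preds; *p; p++) {
        if (len + 3 > cap) return 0;         /* conservative space check */
        out[len++] = 3;
        if (*p == 'T') {                     /* loop body executes */
            out[len++] = 4;
            out[len++] = 5;
        } else if (*p == 'F') {              /* loop exits */
            out[len++] = 6;
            return p[1] == '\0' ? len : 0;   /* 'F' must be the last outcome */
        } else {
            return 0;
        }
    }
    return 0;                                /* no terminating 'F' was seen */
}
```

The replay has to start from the first predicate, which illustrates the loss of random accessibility: the statement at an arbitrary position cannot be read off without re-executing the control flow up to that point.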

Compression
Using zlib
Zlib is a software library used for data compression. It wraps the compression algorithm used in gzip.
Divide traces into chunks, and then compress them with zlib.
Disadvantage: a trace can only be accessed after complete decompression; slow.
Desired features
Accessing traces in their compressed form.
Traversing forwards and backwards.
Fast.

Compression using value predictors
Last n values predictor
Facilitated by a buffer that stores the last n unique values encountered.
If the next value is one of the n values, the index of the value (in [0, n-1]) is emitted to the encoded trace, prefixed with a bit 0 to indicate the prediction is correct.
Otherwise (mis-prediction), the original value (32 bits) is emitted to the encoded trace, prefixed with a bit 1 to indicate mis-prediction.
The buffer is updated with a least used strategy.

Example:
999 333 999 333 999 999 999 333 use last-2 predictor
1 999 1 333 00 01 00 00 00 01 (999 and 333 are emitted as 32-bit values)

999 333 555 555 999 333 999 999 999 333

Compression using value predictors
Decompression
Take one bit from the encoded trace. If it is 1, emit the next 32 bits. If it is 0, emit the value in the buffer indexed by the next log n bits.
Maintain the buffer in the same way as during compression.
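A compact way to see that the encoder and decoder stay in step is to implement both against the same buffer logic. The sketch below is one possible reading of the last n values predictor, with several assumptions: tokens are kept as structs rather than packed bits, the replacement policy is least recently used (the slides only say "least used"), and all names (`lastn_encode`, `lastn_decode`, `struct token`, `MAX_N`) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_N 8

/* flag 0: prediction correct, payload is a buffer index;
 * flag 1: mis-prediction, payload is the 32-bit value itself. */
struct token { int flag; uint32_t payload; };

struct lastn {
    int n, used;                 /* capacity and number of filled slots */
    uint32_t val[MAX_N];
    unsigned long stamp[MAX_N];  /* last-use time of each slot */
    unsigned long tick;
};

static void lastn_init(struct lastn *b, int n)
{
    assert(n >= 1 && n <= MAX_N);
    b->n = n; b->used = 0; b->tick = 0;
}

/* Returns the slot holding v, or -1 if v is not in the buffer. */
static int lastn_find(const struct lastn *b, uint32_t v)
{
    for (int k = 0; k < b->used; k++)
        if (b->val[k] == v) return k;
    return -1;
}

/* Stores v after a miss: use a free slot, else evict the LRU slot
 * (LRU is an assumed policy). */
static void lastn_insert(struct lastn *b, uint32_t v)
{
    int slot = 0;
    if (b->used < b->n) {
        slot = b->used++;
    } else {
        for (int k = 1; k < b->n; k++)
            if (b->stamp[k] < b->stamp[slot]) slot = k;
    }
    b->val[slot] = v;
    b->stamp[slot] = ++b->tick;
}

/* Number of bits needed for an index in [0, n-1]. */
static int index_bits(int n) { int b = 0; while ((1 << b) < n) b++; return b; }

/* Encodes in[0..len) into out[0..len); returns the encoded size in bits. */
size_t lastn_encode(const uint32_t *in, size_t len, int n, struct token *out)
{
    struct lastn b;
    size_t bits = 0;
    lastn_init(&b, n);
    for (size_t k = 0; k < len; k++) {
        int slot = lastn_find(&b, in[k]);
        if (slot >= 0) {                       /* bit 0 + index */
            out[k] = (struct token){0, (uint32_t)slot};
            b.stamp[slot] = ++b.tick;
            bits += 1 + (size_t)index_bits(n);
        } else {                               /* bit 1 + 32-bit value */
            out[k] = (struct token){1, in[k]};
            lastn_insert(&b, in[k]);
            bits += 1 + 32;
        }
    }
    return bits;
}

/* Decodes by replaying the same buffer updates as the encoder. */
void lastn_decode(const struct token *in, size_t len, int n, uint32_t *out)
{
    struct lastn b;
    lastn_init(&b, n);
    for (size_t k = 0; k < len; k++) {
        if (in[k].flag == 0) {
            int slot = (int)in[k].payload;
            assert(slot < b.used);
            out[k] = b.val[slot];
            b.stamp[slot] = ++b.tick;
        } else {
            out[k] = in[k].payload;
            lastn_insert(&b, out[k]);
        }
    }
}
```

On the slide's first example with n = 2, two values are mispredicted and six are predicted, so the encoded size is 2 * 33 + 6 * 2 = 78 bits instead of 8 * 32 = 256.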

Compression using value predictors
Finite Context Method (FCM)
Facilitated by a lookup table that predicts a value based on the context of the left n values (2-FCM, 3-FCM).
If the table predicts the next value from its left context, a bit 0 is emitted to the encoded trace.
Otherwise (mis-prediction), the original value (32 bits) is emitted to the encoded trace, prefixed with a bit 1 to indicate mis-prediction.
The lookup table is updated accordingly.

Example:
1 2 3 4 5 3 4 5 3 4 5 … 3 4 5 6
1 1 1 2 1 3 1 4 1 5 1 3 1 4 0 0 0 0 … 0 1 6 (the value after each bit 1 is 32 bits)

Compression using value predictors
Decompression
Take one bit from the encoded trace. If it is 1, emit the next 32 bits. If it is 0, emit the value looked up from the table using the left n values.
Maintain the table in the same way as during compression.
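The FCM scheme can be sketched in the same style. The code below is a 2-FCM under stated assumptions: the table is a small array searched linearly (a real implementation would hash the context), tokens are structs rather than packed bits, and the names `fcm_encode`, `fcm_decode`, `struct fcm_token`, `ORDER`, and `TABLE_CAP` are hypothetical. Order 2 is chosen because it reproduces the slide's example, where the first seven values are mispredicted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ORDER 2        /* 2-FCM: predict from the two values to the left */
#define TABLE_CAP 256  /* sketch-sized table, searched linearly */

/* flag 0: value was predicted; flag 1: mis-prediction, value follows. */
struct fcm_token { int flag; uint32_t value; };

struct fcm {
    size_t used;
    uint32_t ctx[TABLE_CAP][ORDER];
    uint32_t next[TABLE_CAP];
};

/* Returns the table slot for the context, or -1 if it is absent. */
static int fcm_find(const struct fcm *t, const uint32_t *ctx)
{
    for (size_t k = 0; k < t->used; k++)
        if (memcmp(t->ctx[k], ctx, sizeof t->ctx[k]) == 0) return (int)k;
    return -1;
}

/* Records that ctx was followed by v, overwriting any older prediction. */
static void fcm_update(struct fcm *t, const uint32_t *ctx, uint32_t v)
{
    int slot = fcm_find(t, ctx);
    if (slot < 0) {
        assert(t->used < TABLE_CAP);
        slot = (int)t->used++;
        memcpy(t->ctx[slot], ctx, sizeof t->ctx[slot]);
    }
    t->next[slot] = v;
}

/* Encodes in[0..len) into out[0..len); returns the number of
 * mis-predictions. */
size_t fcm_encode(const uint32_t *in, size_t len, struct fcm_token *out)
{
    struct fcm t = { .used = 0 };
    size_t misses = 0;
    for (size_t k = 0; k < len; k++) {
        int slot = k >= ORDER ? fcm_find(&t, &in[k - ORDER]) : -1;
        if (slot >= 0 && t.next[slot] == in[k]) {
            out[k] = (struct fcm_token){0, 0};
        } else {
            out[k] = (struct fcm_token){1, in[k]};
            misses++;
        }
        if (k >= ORDER) fcm_update(&t, &in[k - ORDER], in[k]);
    }
    return misses;
}

/* Decodes by rebuilding the same table from the values already emitted. */
void fcm_decode(const struct fcm_token *in, size_t len, uint32_t *out)
{
    struct fcm t = { .used = 0 };
    for (size_t k = 0; k < len; k++) {
        if (in[k].flag) {
            out[k] = in[k].value;
        } else {
            assert(k >= ORDER);               /* a hit needs a full context */
            int slot = fcm_find(&t, &out[k - ORDER]);
            assert(slot >= 0);
            out[k] = t.next[slot];
        }
        if (k >= ORDER) fcm_update(&t, &out[k - ORDER], out[k]);
    }
}
```

The decoder never needs the table to be transmitted: it sees the same prefix of values as the encoder did, so it performs the same updates in the same order.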

Compression using value predictors
FCM (finite context method).
Example, FCM-3
[Figure: the uncompressed trace ends in X Y Z A; the left-context lookup table holds the entry XYZ A; the compressed output shown is 1.]
[Figure: the uncompressed trace ends in X Y Z B; the left-context lookup table entry for XYZ changes from A to B; the surviving compressed-output fragment reads 0B.]

Length(Compressed) = n/32 + n*(1 - prediction rate)

It was shown that predictors are better than zlib;
It works so well because of the repetitive patterns caused by loops;
Only forward traversable.
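As a worked instance of the length formula, assume that n counts 32-bit trace values and that lengths are measured in 32-bit words, so that n/32 is the cost of one flag bit per value and n*(1 - prediction rate) is the cost of the mispredicted values. The figures n = 3200 and a prediction rate p = 0.9 are illustrative only, not measurements from the slides.

```latex
\begin{align*}
\text{Length(Compressed)} &= \frac{n}{32} + n\,(1 - p), \qquad n = 3200,\; p = 0.9 \\
&= \frac{3200}{32} + 3200 \times 0.1 \\
&= 100 + 320 = 420 \text{ words}
\end{align*}
```

Under these assumptions the compressed trace is about 13% of the original 3200 words, and the mispredicted values dominate the flag bits.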

Bidirectional Compression
Allow trace traversal in the compressed form.
Bidirectional.
Fast.
Good compression.
Methodology:
Have a small sliding window on the compressed string.
The string in the window is plain text (decompressed).
The strings on the left and the right of the window are compressed.

Enable bidirectional traversal
Forward compressed, backward decompressed FCM
Traditional FCM is forward compressed, forward decompressed
[Figure: for the uncompressed trace X Y Z A, the left-context lookup table predicts A from XYZ, while the right-context lookup table predicts X from YZA; the compressed form 1X is followed by the uncompressed current context Y Z A.]
Bidirectional FCM
[Figure: Right Context lookup table and Left Context lookup table.]

Left-context lookup table
Predict the next value based on its left context.
Right-context lookup table
Predict the next value based on its right context.
Moving the plain text window of size n one step forward
Decompress using the left-context lookup table (now get a window of size n+1).
Compress the first value of the window using the right-context lookup table (again we get a window of size n).
Moving the window one step backward
The opposite actions.

Bidirectional FCM - example
[Figure: worked example showing the Right Context lookup table and the Left Context lookup table.]

Characteristics of bidirectional predictors
High compression rate
The compression rate is nearly the SAME as unidirectional predictors;
Fast compression and de-compression
Roughly TWO times slower than unidirectional predictors;