Symbolic Execution And KLEE

Symbolic execution
Overview of work done by Dawson Engler’s group at Stanford (EGT/EXE/KLEE*)
by Shauvik Roy Choudhary
http://cc.gatech.edu/~shauvik
Some slides adapted from the EXE and KLEE presentations, plus slides from Saswat

Old research area but still active
- First introduced in 1975 (source: Saswat)
- 1976 by James King, IBM T.J. Watson
- Very active area of research, e.g.:
  - EGT / EXE / KLEE [Stanford]
  - DART [Bell Labs]
  - CUTE [UIUC]
  - SAGE, Pex [MSR Redmond]
  - Vigilante [MSR Cambridge]
  - BitScope [Berkeley/CMU]
  - CatchConv [Berkeley]
  - JPF [NASA Ames]

Symbolic Execution
- Symbolic execution refers to execution of a program with symbols as arguments.
- Unlike concrete execution, in symbolic execution the program can take any feasible path (limitation: constraint solver).
- During symbolic execution, program state consists of
  - symbolic values for some memory locations
  - path condition
- The path condition is a conjunction of constraints on the symbolic input values.
- A solution of the path condition is a test input that covers the respective path.

Implementation of Symbolic Execution
- Transformation approach
  - transform the program to another program that operates on symbolic values such that execution of the transformed program is equivalent to symbolic execution of the original program
  - difficult to implement, portable solution, suitable for Java, .NET
- Instrumentation approach
  - callback hooks are inserted in the program such that symbolic execution is done in the background during normal execution of the program
  - easy to implement for C
- Customized runtime approach
  - customize the runtime (e.g., JVM) to support symbolic execution
  - applicable to Java, .NET, difficult to implement, flexible, not portable
- Tools labeled on the slide: CUTE, KLEE; JPF

Limitations of Symbolic Execution
- Limited by the power of the constraint solver
  - cannot handle non-linear and very complex constraints
- Does not scale when the number of paths is large (subject of ongoing research in this area)
- Source code, or equivalent (e.g., Java class files), is required for precise symbolic execution

EGT & EXE
Slides based on D. Engler’s slides

Goal: find many bugs in systems code
- Generic features: baroque interfaces, tricky input, rat's nest of conditionals.
- Enormous undertaking to hit with manual testing.
- Random “fuzz” testing
  - Charm: no manual work
  - Blind generation makes it hard to hit errors for narrow input ranges
  - Also hard to hit errors that require structure
- This talk: a simple trick to finesse.

EGT: Execution Generated Testing [SPIN’05]
How to make system code crash itself!
- Basic idea: use the code itself to construct its input!
- Basic algorithm:
  - Symbolic execution + constraint solving.
  - Run code on symbolic inputs, initial value = “anything”
  - As code observes inputs, it tells us values it can be.
  - At conditionals that use symbolic input, fork
    - On true branch, add constraint that input satisfies check
    - On false, that it does not.
  - Then solve these constraints to generate inputs and re-run the code on them.

The toy example
- Initialize x to be “any int”
- Code will run 3 times.
- Solve constraints at each to get our 3 test cases.
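
The fork-and-solve loop of EGT can be sketched in a few lines of C. This is only an illustration under loud assumptions: the program under test is modeled as three hypothetical conditionals on one symbolic int x (it is not the toy example from the slide), constraints are limited to comparisons against constants, and a brute-force search over [-100, 100] stands in for a real decision procedure such as CVCL. All names (constraint, solve, explore) are made up for this sketch.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_DEPTH 8    /* max conditionals on a path (hypothetical limit) */
#define MAX_TESTS 64   /* max test inputs we keep (hypothetical limit)   */

typedef enum { OP_LT, OP_GT, OP_EQ } cmp_op;

/* One branch condition "x OP rhs"; negated selects the false side. */
typedef struct { cmp_op op; int rhs; bool negated; } constraint;

static bool holds(constraint c, int x) {
    bool r;
    switch (c.op) {
    case OP_LT: r = x <  c.rhs; break;
    case OP_GT: r = x >  c.rhs; break;
    default:    r = x == c.rhs; break;
    }
    return c.negated ? !r : r;
}

/* Stand-in for a constraint solver: brute force over a small domain.
 * Stores the first satisfying value in *out and returns true if one exists. */
static bool solve(const constraint *pc, size_t n, int *out) {
    for (int x = -100; x <= 100; x++) {
        size_t i = 0;
        while (i < n && holds(pc[i], x)) i++;
        if (i == n) { *out = x; return true; }
    }
    return false;
}

/* Fork at each conditional: follow the true side, then the false side.
 * Infeasible path conditions are pruned; each completed path emits one
 * concrete test input. Returns the updated number of tests. */
static size_t explore(const constraint *branches, size_t nbranches,
                      constraint *pc, size_t depth,
                      int *tests, size_t ntests) {
    int witness;
    if (!solve(pc, depth, &witness)) return ntests;   /* impossible: stop */
    if (depth == nbranches) {                         /* path ended: emit */
        if (ntests < MAX_TESTS) tests[ntests++] = witness;
        return ntests;
    }
    pc[depth] = branches[depth];
    pc[depth].negated = false;                        /* true side  */
    ntests = explore(branches, nbranches, pc, depth + 1, tests, ntests);
    pc[depth].negated = true;                         /* false side */
    ntests = explore(branches, nbranches, pc, depth + 1, tests, ntests);
    return ntests;
}
```

Calling explore on the conditionals x < 10, x > 5, x == 7 with a pc buffer of MAX_DEPTH entries yields four test inputs, one per feasible path; the other four of the eight branch combinations are pruned as soon as their path condition becomes unsatisfiable.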

The big picture
- Implementation prototype
  - Do source-to-source transformation using CIL
  - Use CVCL decision procedure to solve constraints, then re-run code on concrete values
  - Robustness: use mixed symbolic and concrete execution
- 3 ways to look at what’s going on
  - Grammar extraction
  - Turn code inside out from input consumer to generator
  - Sort-of Heisenberg effect: observations perturb symbolic inputs into increasingly concrete ones. More definite observation = more definite perturbation

Mixed execution
- Basic idea: given an operation:
  - If all of its operands are concrete, just do it.
  - If any are symbolic, add constraint.
  - If current constraints are impossible, stop.
  - If current path causes something to blow up, solve & emit.
  - If current path calls unmodelled function, solve & call.
  - If program exits, solve & emit.
- How to track?
  - Use variable addresses to determine if symbolic or concrete
  - Note: symbolic assignment is not destructive; it creates a new symbol

Example transformation “+”
- Each var v has v.concrete and v.symbolic fields
- If v is concrete, symbol = <invalid> and vice versa
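
One way to picture the “+” transformation is the C sketch below. It is an assumption-laden illustration, not EGT's generated code: the struct layout, the is_symbolic flag, the text representation of symbolic expressions, and the names value, make_concrete, make_symbol and value_add are all hypothetical. It shows only the rule stated on the slides: add concretely when both operands are concrete, otherwise build a new symbolic expression.

```c
#include <stdbool.h>
#include <stdio.h>

#define EXPR_MAX 64

/* Hypothetical shadow value: exactly one of the two payload fields is valid. */
typedef struct {
    bool is_symbolic;
    int  concrete;             /* valid when !is_symbolic                */
    char symbolic[EXPR_MAX];   /* expression text, valid when is_symbolic */
} value;

static value make_concrete(int v) {
    value r = { .is_symbolic = false, .concrete = v, .symbolic = "" };
    return r;
}

static value make_symbol(const char *name) {
    value r = { .is_symbolic = true, .concrete = 0, .symbolic = "" };
    snprintf(r.symbolic, sizeof r.symbolic, "%s", name);
    return r;
}

/* Render either kind of operand as expression text. */
static void render(const value *v, char *buf, size_t n) {
    if (v->is_symbolic) snprintf(buf, n, "%s", v->symbolic);
    else                snprintf(buf, n, "%d", v->concrete);
}

/* "+": concrete fast path if both operands are concrete; otherwise the
 * result is a fresh symbolic expression and the operands are untouched. */
static value value_add(value a, value b) {
    if (!a.is_symbolic && !b.is_symbolic)
        return make_concrete(a.concrete + b.concrete);

    char lhs[EXPR_MAX], rhs[EXPR_MAX];
    render(&a, lhs, sizeof lhs);
    render(&b, rhs, sizeof rhs);

    value r = { .is_symbolic = true, .concrete = 0, .symbolic = "" };
    snprintf(r.symbolic, sizeof r.symbolic, "(%s + %s)", lhs, rhs);
    return r;
}
```

Because value_add returns a new value instead of modifying its operands, the sketch also mirrors the note that symbolic assignment is not destructive. Long expressions are silently truncated to EXPR_MAX characters here; a real tool would build an expression tree for the solver instead of text.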

Results
- Mutt versions <= 1.4 have buffer overflow (OSDI paper)
  - Input size 4, took 34 minutes to generate 458 tests with 98% statement coverage
- printf (3 implementations: pintOS, gccfast, embedded)
  - Made format strings symbolic
  - Two bugs
    - Incorrect grouping of integers
    - Incorrect handling of plus flags (“%” followed by space)

More..
- WsMP3 server case study
  - 2000 LOC
  - Technique: make recv input symbolic
  - Found known security hole + 2 new bugs
    - Network controlled infinite loop
    - Buffer overflow

EXE: EXecution generated Executions [CCS’06]
Automatically Generating Inputs of Death!
- Same ideas as EGT
- Main contributions
  - More practical tool: can test any code path
  - Generates actual attacks
- Constraint solver: STP
  - Decision procedure for bitvectors and arrays.
  - If solvable, passes constraints to MiniSAT
  - Four times less code than CVCL and an order of magnitude faster
  - Array optimizations (substitution, refinements, simplification)

The mechanics
- User marks input to treat symbolically using either:
- Compile with EXE compiler, exe-cc. Uses CIL to
  - Insert checks around every expression: if operands all concrete, run as normal. Otherwise, add as constraint
  - Insert fork calls when symbolic could cause multiple acts
- ./a.out: forks at each decision point.
  - When path terminates use STP to solve constraints.
  - Terminates when: (1) exit, (2) crash, (3) EXE detects err
  - Rerun concrete through uninstrumented code.

Isn’t exponential expensive?
- Only fork on symbolic branches.
  - Most concrete (linear).
- Loops? Heuristics.
  - Default: DFS. Linear processes with chain depth.
  - Can get stuck.
  - “Best first” search: choose branch, backtrack to point that will run code hit fewest times.
  - Can do better…
- However:
  - Happy to let run for weeks as long as generating interesting test cases. Competition is manual and random.

Mixed execution
- Basic idea: given expression (e.g., deref, ALU op)
  - If all of its operands are concrete, just do it.
  - If any are symbolic, add as constraint.
  - If current constraints are impossible, stop.
  - If current path hits error or exit(), solve+emit.
  - If calls uninstrumented code: do call, or solve and do call
- Example: “x = y + z”
  - If y, z both concrete, execute. Record x = concrete.
  - Otherwise set “x = y + z”, record x = symbolic.
- Result:
  - Most code runs concretely: small slice deals w/ symbolics.
  - Robust: do not need all source code (e.g., OS). Just run

Limits
- Missed constraints:
  - If call asm, or CIL cannot eat file.
  - STP cannot do div/mod: constrain the divisor to be a power of 2, then use shift and mask respectively.
  - Cannot handle **p where “p” is symbolic: must concretize *p. (Note: **p still symbolic.)
  - Stops path if cannot solve; can get lost in exponentials.
- Missing:
  - No symbolic function pointers, symbolics passed to varargs not tracked.
  - No floating point.
  - long long support is erratic.
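
The div/mod workaround rests on a standard identity for unsigned integers: when the divisor d is a power of two, x / d equals a right shift by log2(d) and x % d equals a mask with d - 1. The C sketch below demonstrates that identity on concrete values. The helper names are hypothetical, and the code says nothing about how EXE or STP encode it internally.

```c
#include <stdbool.h>
#include <stdint.h>

/* True when d has exactly one bit set. */
static bool is_power_of_two(uint32_t d) {
    return d != 0 && (d & (d - 1)) == 0;
}

/* log2 of a power of two: counts how far the single set bit is shifted. */
static unsigned log2_exact(uint32_t d) {
    unsigned k = 0;
    while (d > 1) { d >>= 1; k++; }
    return k;
}

/* x / d for unsigned x, valid only when d is a power of two. */
static uint32_t div_pow2(uint32_t x, uint32_t d) {
    return x >> log2_exact(d);
}

/* x % d for unsigned x, valid only when d is a power of two. */
static uint32_t mod_pow2(uint32_t x, uint32_t d) {
    return x & (d - 1);
}
```

For example, with x = 100 and d = 8 the shift gives 12 and the mask gives 4, matching 100 / 8 and 100 % 8. The identity is stated for unsigned values only; signed division of negative numbers rounds differently from an arithmetic shift.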

EXE Results
- Berkeley Packet Filter
  - Two buffer overflow exploits
- udhcpd: well tested user-level DHCP server
  - Five memory errors
- PCRE: Perl Compatible Regular Expressions
  - Many out-of-bounds writes leading to abort in glibc on free
- Disks of death: file systems
  - Four bugs on ext2 & ext3 file systems.
  - Null pointer dereference in JFS

A galactic view [Oakland’06]

KLEE
Thanks to Cristian Cadar for the slides

Writing Systems Code Is Hard
- Code complexity
  - Tricky control flow
  - Complex dependencies
  - Abusive use of pointer operations
- Environmental dependencies
  - Code has to anticipate all possible interactions
  - Including malicious ones

KLEE [OSDI 2008, Best Paper Award]
- Based on symbolic execution and constraint solving techniques
- Automatically generates high coverage test suites
  - Over 90% on average on ~160 user-level apps
- Finds deep bugs in complex systems programs
  - Including higher-level correctness ones

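Toy Example

    int bad_abs(int x) {
        if (x < 0) return -x;
        if (x == 1234) return -x;
        return x;
    }

- x starts as an unconstrained symbolic value.
- Branch x < 0:
  - TRUE: path condition x < 0, return -x
  - FALSE: path condition x ≥ 0, continue to the next branch
- Branch x = 1234:
  - TRUE: path condition x = 1234, return -x
  - FALSE: path condition x ≠ 1234, return x
- Three generated test files (test1.out, test2.out, test3.out) cover the inputs x = -2, x = 1234 and x = 3.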
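
The three inputs can be replayed by hand to see which path each one drives and which one exposes the bug. The C sketch below is a plain concrete replay and does not run KLEE; the array klee_style_tests and the oracle violates_abs_property are hypothetical names added for illustration. With KLEE itself one would instead mark x symbolic (for instance with the klee_make_symbolic intrinsic from klee/klee.h) and let the tool derive such inputs.

```c
/* The function from the slide. The second branch is the bug: for
 * x == 1234 it returns a negative value. */
int bad_abs(int x) {
    if (x < 0)     return -x;
    if (x == 1234) return -x;   /* bug: should return x */
    return x;
}

/* One representative input per path, as listed on the slide. */
static const int klee_style_tests[3] = { -2, 1234, 3 };

/* Hypothetical oracle: an absolute value must never be negative.
 * Returns 1 when bad_abs violates that property for input x. */
int violates_abs_property(int x) {
    return bad_abs(x) < 0;
}
```

Only the input 1234 violates the property. Random testing would have to guess that single value out of all ints, whereas the path condition x ≥ 0 and x = 1234 hands it to the solver directly.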

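KLEE Architecture
[Diagram: C code is compiled by LLVM to LLVM bytecode, which KLEE executes against a symbolic environment. KLEE sends path constraints such as x ≥ 0, x ≠ 1234 to the constraint solver (STP) and emits the test inputs x = -2, x = 1234, x = 3.]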

Outline
- Motivation Example and Basic Architecture
- Scalability Challenges
- Experimental Evaluation

Three Big Challenges
- Motivation Example and Basic Architecture
- Scalability Challenges
  - Exponential number of paths
  - Interaction with environment
- Experimental Evaluation

Exponential Search Space
- Naïve exploration can easily get “stuck”
- Use search heuristics:
  - Coverage-optimized search
    - Select path closest to an uncovered instruction
    - Favor paths that recently hit new code
  - Random path search
- See [KLEE – OSDI’08]
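
A deliberately simplified picture of coverage-optimized selection is sketched below in C. The exec_state fields and the pick_state function are hypothetical, and the deterministic "pick the minimum" rule is a simplification chosen for readability; it is not KLEE's actual scheduler, which the OSDI’08 paper describes in detail. The sketch only encodes the two preferences on the slide: closest to uncovered code first, then most recent new coverage.

```c
#include <stddef.h>

/* Hypothetical summary of one pending execution state (path). */
typedef struct {
    int      id;
    unsigned dist_to_uncovered;    /* distance to nearest uncovered instruction */
    unsigned insts_since_new_cov;  /* instructions run since last new coverage  */
} exec_state;

/* Returns the index of the state to run next, or -1 if there is none.
 * Primary key: smallest distance to an uncovered instruction.
 * Tie-break:   the state that covered new code most recently. */
static int pick_state(const exec_state *s, size_t n) {
    if (n == 0) return -1;
    size_t best = 0;
    for (size_t i = 1; i < n; i++) {
        if (s[i].dist_to_uncovered < s[best].dist_to_uncovered ||
            (s[i].dist_to_uncovered == s[best].dist_to_uncovered &&
             s[i].insts_since_new_cov < s[best].insts_since_new_cov))
            best = i;
    }
    return (int)best;
}
```

A purely greedy rule like this can starve other states, which is one reason to combine it with a second strategy such as the random path search listed on the slide.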

Constraint Solving
- Dominates runtime
  - Inherently expensive (NP-complete)
  - Invoked at every branch
- Two simple and effective optimizations
  - Eliminating irrelevant constraints
  - Caching solutions
- Dramatic speedup on our benchmarks

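Eliminating Irrelevant Constraints
- In practice, each branch usually depends on a small number of variables
- Example: the path condition holds x + y > 10 and z & -z = z, and execution reaches the branch if (x < 10) { … }
- The solver query "x < 10 ?" only needs the constraints connected to x, here x + y > 10; the independent constraint z & -z = z can be dropped from the query.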
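
The elimination step amounts to a reachability computation over shared variables, which the following C sketch illustrates. It is a minimal sketch under assumptions: each constraint is described only by a bitmask of the variables it mentions, and the names constraint, VAR_X, VAR_Y, VAR_Z and relevant_constraints are hypothetical rather than taken from KLEE.

```c
#include <stdbool.h>
#include <stddef.h>

/* One bit per symbolic variable (hypothetical encoding). */
enum { VAR_X = 1u << 0, VAR_Y = 1u << 1, VAR_Z = 1u << 2 };

typedef struct {
    const char *text;   /* human-readable form, for printing only  */
    unsigned    vars;   /* bitmask of variables the constraint uses */
} constraint;

/* Marks in keep[] every constraint that shares a variable with the query,
 * directly or through a chain of other constraints. Returns how many
 * constraints were kept. */
static size_t relevant_constraints(const constraint *cs, size_t n,
                                   unsigned query_vars, bool *keep) {
    for (size_t i = 0; i < n; i++) keep[i] = false;

    unsigned reach = query_vars;   /* variables known to influence the query */
    size_t kept = 0;
    bool changed = true;
    while (changed) {              /* iterate until no new constraint joins */
        changed = false;
        for (size_t i = 0; i < n; i++) {
            if (!keep[i] && (cs[i].vars & reach)) {
                keep[i] = true;
                reach |= cs[i].vars;
                kept++;
                changed = true;
            }
        }
    }
    return kept;
}
```

For the slide's example, a query on x keeps x + y > 10 (it mentions x, and pulls in y) and drops z & -z = z. If another constraint linked y and z, the closure would pull z back in, which is why the loop runs to a fixed point.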

Caching Solutions
- Static set of branches: lots of similar constraint sets
- Cached: { 2 * y < 100, x > 3, x + y > 10 } has the solution x = 5, y = 15
- Eliminating constraints cannot invalidate the solution: { 2 * y < 100, x + y > 10 } is still satisfied by x = 5, y = 15
- Adding constraints often does not invalidate the solution: { 2 * y < 100, x > 3, x + y > 10, x < 10 } is still satisfied by x = 5, y = 15
- UBTree data structure [Hoffmann and Koehler, IJCAI ’99]
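
The two cache rules can be sketched in C as follows. This is an illustration under assumptions, not KLEE's cache: constraints are modeled as function pointers over a hypothetical (x, y) assignment, set membership is pointer equality, and a linear scan replaces the UBTree index. All names (assignment, constraint_fn, cache_entry, cache_lookup) are invented for the sketch.

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct { int x, y; } assignment;
typedef bool (*constraint_fn)(assignment);

/* The constraints from the slide, as predicates over (x, y). */
static bool c_2y_lt_100(assignment a) { return 2 * a.y < 100; }
static bool c_x_gt_3(assignment a)    { return a.x > 3; }
static bool c_xy_gt_10(assignment a)  { return a.x + a.y > 10; }
static bool c_x_lt_10(assignment a)   { return a.x < 10; }

static bool satisfies_all(const constraint_fn *cs, size_t n, assignment a) {
    for (size_t i = 0; i < n; i++)
        if (!cs[i](a)) return false;
    return true;
}

static bool contains(const constraint_fn *set, size_t n, constraint_fn c) {
    for (size_t i = 0; i < n; i++)
        if (set[i] == c) return true;
    return false;
}

static bool is_subset(const constraint_fn *sub, size_t nsub,
                      const constraint_fn *sup, size_t nsup) {
    for (size_t i = 0; i < nsub; i++)
        if (!contains(sup, nsup, sub[i])) return false;
    return true;
}

/* A solved constraint set together with its satisfying assignment. */
typedef struct {
    const constraint_fn *set;
    size_t n;
    assignment solution;
} cache_entry;

/* Rule 1: if the query is a subset of a cached set, the cached solution
 *         is guaranteed to satisfy it.
 * Rule 2: if the query is a superset, the cached solution may still work,
 *         so it is checked by evaluation before being reused. */
static bool cache_lookup(const cache_entry *cache, size_t ncache,
                         const constraint_fn *query, size_t nquery,
                         assignment *out) {
    for (size_t i = 0; i < ncache; i++) {
        const cache_entry *e = &cache[i];
        bool hit = is_subset(query, nquery, e->set, e->n) ||
                   (is_subset(e->set, e->n, query, nquery) &&
                    satisfies_all(query, nquery, e->solution));
        if (hit) { *out = e->solution; return true; }
    }
    return false;   /* miss: a real tool would now call the solver */
}
```

With the slide's cached set and solution x = 5, y = 15, both the smaller query and the query extended with x < 10 are answered without a solver call. Evaluating a candidate assignment is cheap compared with solving, which is why trying a cached solution on a superset is worthwhile even though it can fail.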

Dramatic Speedup
- Aggregated data over 73 applications
- [Figure: plot with axes "Time (s)" and "Executed instructions (normalized)"]

Environment: Calling Out Into OS
- int fd = open("t.txt", O_RDONLY);
  - If all arguments are concrete, forward to OS
- int fd = open(sym_str, O_RDONLY);
  - Otherwise, provide models that can handle symbolic files
  - Goal is to explore all possible legal interactions with the environment

Environmental Modeling

    // actual implementation: ~50 LOC
    ssize_t read(int fd, void *buf, size_t count) {
        exe_file_t *f = get_file(fd);
        …
        memcpy(buf, f->contents + f->off, count);
        f->off += count;
        …
    }

- Plain C code run by KLEE
  - Users can extend/replace environment w/o any knowledge of KLEE internals
- Currently: effective support for symbolic command line arguments, files, links, pipes, ttys, environment vars
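
To make the elided parts of the read model concrete, here is a self-contained C sketch of what such a model might look like. Everything beyond the memcpy and offset update shown on the slide is an assumption: the model_file_t struct, the fixed-size fd_table, the clamping at end of file, and the name model_read (chosen so it does not clash with the real read) are hypothetical and are not KLEE's actual model.

```c
#include <stddef.h>
#include <string.h>
#include <sys/types.h>   /* ssize_t */

#define MAX_FDS 8

/* Hypothetical in-memory file; under a symbolic executor the bytes in
 * contents could be symbolic. */
typedef struct {
    char  *contents;
    size_t size;
    size_t off;        /* current read offset */
} model_file_t;

/* Hypothetical descriptor table: slot i is NULL when fd i is not open. */
static model_file_t *fd_table[MAX_FDS];

static model_file_t *get_file(int fd) {
    return (fd >= 0 && fd < MAX_FDS) ? fd_table[fd] : NULL;
}

/* Model of read(): copies up to count bytes from the in-memory file and
 * advances the offset. Returns bytes copied, 0 at end of file, or -1 for
 * a descriptor that is not open. */
ssize_t model_read(int fd, void *buf, size_t count) {
    model_file_t *f = get_file(fd);
    if (f == NULL) return -1;

    size_t left = f->size - f->off;     /* bytes remaining in the file */
    if (count > left) count = left;     /* never read past the end     */

    memcpy(buf, f->contents + f->off, count);
    f->off += count;
    return (ssize_t)count;
}
```

Because the model is ordinary C, the symbolic executor can run it like any other code, so symbolic file contents flow into the caller's buffer without special handling. A fuller model would also set errno and handle permissions and other file types.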

Does KLEE work?
- Motivation Example and Basic Architecture
- Scalability Challenges
- Evaluation
  - Coverage results

GNU Coreutils Suite
- Core user-level apps installed on many UNIX systems
- 89 stand-alone (i.e. excluding wrappers) apps (v6.10)
  - File system management: ls, mkdir, chmod, etc.
  - Management of system properties: hostname, printenv, etc.
  - Text file processing: sort, wc, od, etc.
  - …
- Variety of functions, different authors, intensive interaction with environment
- Heavily tested, mature code
