Symbolic execution
Overview of work done by Dawson Engler’s group at Stanford (EGT/EXE/KLEE*)
by Shauvik Roy Choudhary
http://cc.gatech.edu/~shauvik
Some slides adapted from the EXE and KLEE presentations + slides from Saswat

Old research area but still active
  • First introduced in 1975 (source: Saswat)
      • 1976 by James King, IBM T.J. Watson
  • Very active area of research, e.g.:
      • EGT / EXE / KLEE [Stanford]
      • DART [Bell Labs]
      • CUTE [UIUC]
      • SAGE, Pex [MSR Redmond]
      • Vigilante [MSR Cambridge]
      • BitScope [Berkeley/CMU]
      • CatchConv [Berkeley]
      • JPF [NASA Ames]

Symbolic Execution
  • Symbolic execution refers to executing a program with symbols as arguments.
  • Unlike concrete execution, in symbolic execution the program can take any feasible path (limitation: the constraint solver).
  • During symbolic execution, the program state consists of:
      • symbolic values for some memory locations
      • the path condition
  • The path condition is a conjunction of constraints on the symbolic input values.
  • A solution of the path condition is a test input that covers the respective path.

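To make the path condition concrete, here is a minimal C sketch. It assumes a single symbolic integer input and constraints that compare it with a constant. The names `constraint_t`, `satisfies`, and `solve` are hypothetical, and the range scan is only a stand-in for a real constraint solver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical encoding: each constraint compares the symbolic input x
   with a constant. */
typedef enum { OP_LT, OP_GE, OP_EQ, OP_NE } op_t;

typedef struct {
    op_t op;
    int  rhs;
} constraint_t;

/* True if the concrete value x satisfies one constraint. */
static bool holds(constraint_t c, int x) {
    switch (c.op) {
    case OP_LT: return x <  c.rhs;
    case OP_GE: return x >= c.rhs;
    case OP_EQ: return x == c.rhs;
    case OP_NE: return x != c.rhs;
    }
    return false;
}

/* A path condition is the conjunction of all constraints on the path. */
static bool satisfies(const constraint_t *pc, size_t n, int x) {
    for (size_t i = 0; i < n; i++)
        if (!holds(pc[i], x))
            return false;
    return true;
}

/* Stand-in for a solver (an assumption, not how real tools work):
   scan [lo, hi] for an input that satisfies the path condition.
   Returns true and stores the input in *out if one exists. */
static bool solve(const constraint_t *pc, size_t n, int lo, int hi, int *out) {
    assert(lo <= hi && out != NULL);
    for (int x = lo; ; x++) {
        if (satisfies(pc, n, x)) {
            *out = x;
            return true;
        }
        if (x == hi)
            break;
    }
    return false;
}
```

A path condition such as `{x >= 0, x == 1234}` yields the test input 1234, while `{x < 0, x == 1234}` has no solution, so that path is infeasible.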
Implementation of Symbolic Execution
  • Transformation approach
      • transform the program to another program that operates on symbolic values, such that execution of the transformed program is equivalent to symbolic execution of the original program
      • difficult to implement, portable solution, suitable for Java, .NET
  • Instrumentation approach
      • callback hooks are inserted in the program such that symbolic execution is done in the background during normal execution of the program
      • easy to implement for C
  • Customized runtime approach
      • customize the runtime (e.g., JVM) to support symbolic execution
      • applicable to Java, .NET; difficult to implement, flexible, not portable
  • Tools labeled on the slide: CUTE, KLEE, JPF

Limitations of Symbolic Execution
  • Limited by the power of the constraint solver
      • cannot handle non-linear and very complex constraints
  • Does not scale when the number of paths is large (subject of ongoing research in this area)
  • Source code, or an equivalent (e.g., Java class files), is required for precise symbolic execution

EGT & EXE
Slides based on D. Engler’s slides

Goal: find many bugs in systems code
  • Generic features: baroque interfaces, tricky input, rat’s nest of conditionals
  • Enormous undertaking to hit with manual testing
  • Random “fuzz” testing
      • Charm: no manual work
      • Blind generation makes it hard to hit errors for a narrow input range
      • Also hard to hit errors that require structure
  • This talk: a simple trick to finesse

EGT: Execution Generated Testing [SPIN’05]
How to make system code crash itself!
  • Basic idea: use the code itself to construct its input!
  • Basic algorithm: symbolic execution + constraint solving
      • Run code on symbolic inputs, initial value = “anything”
      • As the code observes inputs, it tells us the values they can be
      • At conditionals that use symbolic input, fork
          • On the true branch, add the constraint that the input satisfies the check
          • On the false branch, add the constraint that it does not
      • Then generate inputs from these constraints and re-run the code on them

The toy example
  • Initialize x to be “any int”
  • Code will run 3 times
  • Solve constraints at each   to get our 3 test cases

The big picture
  • Implementation prototype
      • Do source-to-source transformation using CIL
      • Use the CVCL decision procedure to solve constraints, then re-run code on concrete values
      • Robustness: use mixed symbolic and concrete execution
  • 3 ways to look at what’s going on
      • Grammar extraction
      • Turn code inside out, from input consumer to generator
      • Sort-of Heisenberg effect: observations perturb symbolic inputs into increasingly concrete ones. More definite observation = more definite perturbation

Mixed execution
  • Basic idea: given an operation:
      • If all of its operands are concrete, just do it
      • If any are symbolic, add a constraint
      • If the current constraints are impossible, stop
      • If the current path causes something to blow up, solve & emit
      • If the current path calls an unmodelled function, solve & call
      • If the program exits, solve & emit
  • How to track?
      • Use variable addresses to determine if symbolic or concrete
      • Note: symbolic assignment is not destructive; it creates a new symbol

Example transformation “+”
  • Each var v has v.concrete and v.symbolic fields
  • If v is concrete, symbol = <invalid> and vice versa

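The transformed code for this slide is not in the transcript, so the following is only a sketch of what an instrumented “+” could look like. It assumes a single symbolic input `s` and represents every value as `coeff * s + offset`. That linear form is a simplification chosen for this example, not the representation EGT uses, and all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical value representation for a program with one symbolic
   input s: the value is coeff * s + offset. coeff == 0 means the value
   is concrete and equal to offset. */
typedef struct {
    int coeff;
    int offset;
} value_t;

static value_t make_concrete(int v) { return (value_t){ 0, v }; }
static value_t make_symbolic(void)  { return (value_t){ 1, 0 }; }
static bool    is_symbolic(value_t v) { return v.coeff != 0; }

/* Instrumented "+": if both operands are concrete, just do the addition.
   Otherwise record the symbolic expression for the result. */
static value_t mixed_add(value_t a, value_t b) {
    if (!is_symbolic(a) && !is_symbolic(b))
        return make_concrete(a.offset + b.offset);
    return (value_t){ a.coeff + b.coeff, a.offset + b.offset };
}

/* Evaluate a value once the solver has picked a concrete s. */
static int evaluate(value_t v, int s) {
    assert(is_symbolic(v) || v.coeff == 0);
    return v.coeff * s + v.offset;
}
```

Adding two concrete values stays concrete, while `mixed_add(make_symbolic(), make_concrete(7))` yields a symbolic value that evaluates to `s + 7` for any chosen `s`.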
Results
  • Mutt versions <= 1.4 have a buffer overflow (OSDI paper)
      • Input size 4: took 34 minutes to generate 458 tests with 98% statement coverage
  • printf (3 implementations: Pintos, gccfast, embedded)
      • Made format strings symbolic
      • Two bugs
          • Incorrect grouping of integers
          • Incorrect handling of plus flags (“%” followed by space)

More..
  • WsMP3 server case study
      • 2000 LOC
      • Technique: make recv input symbolic
      • Found known security hole + 2 new bugs
          • Network-controlled infinite loop
          • Buffer overflow

EXE: EXecution generated Executions [CCS’06]
Automatically generating inputs of death!
  • Same ideas as EGT
  • Main contributions
      • More practical tool: can test any code path
      • Generates actual attacks
  • Constraint solver: STP
      • Decision procedure for bitvectors and arrays
      • If solvable, passes constraints to MiniSAT
      • About four times less code than CVCL and an order of magnitude faster
      • Array optimizations (substitution, refinements, simplification)

The mechanics
  • User marks input to treat symbolically using either:
  • Compile with the EXE compiler, exe-cc. Uses CIL to:
      • Insert checks around every expression: if all operands are concrete, run as normal; otherwise, add as constraint
      • Insert fork calls when a symbolic value could cause multiple actions
  • ./a.out: forks at each decision point
      • When a path terminates, use STP to solve constraints
      • Terminates when: (1) exit, (2) crash, (3) EXE detects an error
  • Rerun concrete inputs through uninstrumented code

Isn’t exponential expensive?
  • Only fork on symbolic branches
      • Most are concrete (linear)
  • Loops? Heuristics
      • Default: DFS. Linear processes with chain depth. Can get stuck.
      • “Best first” search: choose branch, backtrack to the point that will run code hit the fewest times
      • Can do better…
  • However: happy to let it run for weeks as long as it is generating interesting test cases. The competition is manual and random testing.

Mixed execution
  • Basic idea: given an expression (e.g., deref, ALU op)
      • If all of its operands are concrete, just do it
      • If any are symbolic, add as constraint
      • If the current constraints are impossible, stop
      • If the current path hits an error or exit(), solve + emit
      • If it calls uninstrumented code: do call, or solve and do call
  • Example: “x = y + z”
      • If y, z both concrete, execute. Record x = concrete.
      • Otherwise set “x = y + z”, record x = symbolic.
  • Result:
      • Most code runs concretely: a small slice deals with symbolics
      • Robust: do not need all source code (e.g., OS). Just run

Limits
  • Missed constraints:
      • If the code calls asm, or CIL cannot eat the file
      • STP cannot do div/mod: constrain the divisor to be a power of 2, then use shift and mask respectively
      • Cannot handle **p where “p” is symbolic: must concretize *p. (Note: **p still symbolic.)
      • Stops path if it cannot solve; can get lost in exponentials
  • Missing:
      • No symbolic function pointers; symbolics passed to varargs not tracked
      • No floating point
      • long long support is erratic

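The div/mod workaround presumably relies on the standard identity that, for unsigned values, dividing by a power of two is a right shift and the remainder is a bit mask. The sketch below shows that identity in plain C. The function names are hypothetical, and it does not claim to reproduce how EXE emits these constraints.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if d has exactly one bit set. */
static bool is_power_of_two(uint32_t d) {
    return d != 0 && (d & (d - 1)) == 0;
}

/* Base-2 logarithm of a power of two. */
static unsigned log2_pow2(uint32_t d) {
    assert(is_power_of_two(d));
    unsigned k = 0;
    while ((d >> k) != 1u)
        k++;
    return k;
}

/* x / d for a power-of-two divisor, expressed as a shift. */
static uint32_t div_pow2(uint32_t x, uint32_t d) {
    return x >> log2_pow2(d);
}

/* x % d for a power-of-two divisor, expressed as a mask. */
static uint32_t mod_pow2(uint32_t x, uint32_t d) {
    assert(is_power_of_two(d));
    return x & (d - 1);
}
```

The identity holds only for unsigned operands. Signed division rounds toward zero, so a plain arithmetic shift would give a different result for negative values.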
EXE Results
  • Berkeley Packet Filter
      • Two buffer overflow exploits
  • udhcpd: well-tested user-level DHCP server
      • Five memory errors
  • PCRE: Perl Compatible Regular Expressions
      • Many out-of-bounds writes leading to abort in glibc on free
  • Disks of death: file systems
      • Four bugs on ext2 & ext3 file systems
      • Null pointer dereference in JFS

A galactic view [Oakland’06]
KLEE
Thanks to Cristian Cadar for the slides

Writing Systems Code Is Hard
  • Code complexity
      • Tricky control flow
      • Complex dependencies
      • Abusive use of pointer operations
  • Environmental dependencies
      • Code has to anticipate all possible interactions
      • Including malicious ones

KLEE [OSDI 2008, Best Paper Award]
  • Based on symbolic execution and constraint solving techniques
  • Automatically generates high coverage test suites
      • Over 90% on average on ~160 user-level apps
  • Finds deep bugs in complex systems programs
      • Including higher-level correctness ones

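Toy Example
    int bad_abs(int x) {
        if (x < 0)
            return -x;
        if (x == 1234)
            return -x;
        return x;
    }
  • x < 0 is TRUE: return -x; generated test x = -2
  • x < 0 is FALSE (x ≥ 0) and x == 1234 is TRUE: return -x; generated test x = 1234
  • x < 0 is FALSE (x ≥ 0) and x == 1234 is FALSE (x ≠ 1234): return x; generated test x = 3
  • Test cases are written to test1.out, test2.out, test3.out
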
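The three generated inputs can be replayed against the function with an ordinary C check. The sketch below repeats `bad_abs` from the slide and adds a hypothetical oracle, `violates_abs_property`, which flags any input whose result is negative. The oracle is an assumption added for illustration; the slide only shows the generated inputs.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* From the slide: the x == 1234 branch returns the wrong sign. */
static int bad_abs(int x) {
    if (x < 0)
        return -x;
    if (x == 1234)
        return -x;   /* bug: should return x */
    return x;
}

/* Hypothetical oracle: an absolute value must never be negative.
   INT_MIN is excluded because -INT_MIN overflows. */
static bool violates_abs_property(int x) {
    assert(x != INT_MIN);
    return bad_abs(x) < 0;
}
```

Of the three inputs, one per path, only x = 1234 violates the property. Random testing would need to guess that single value out of the whole `int` range.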
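KLEE Architecture
  • C code → LLVM → LLVM bytecode → KLEE
  • KLEE runs the bytecode against a symbolic environment and queries the constraint solver (STP)
  • Example from the diagram: the constraints x ≥ 0, x ≠ 1234 are solved to x = 3; generated tests: x = -2, x = 1234, x = 3
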
Outline
  • Motivation Example and Basic Architecture
  • Scalability Challenges
  • Experimental Evaluation

Three Big Challenges
  • Exponential number of paths
  • Expensive constraint solving
  • Interaction with environment

Exponential Search Space
  • Naïve exploration can easily get “stuck”
  • Use search heuristics:
      • Coverage-optimized search
          • Select path closest to an uncovered instruction
          • Favor paths that recently hit new code
      • Random path search
  • See [KLEE, OSDI’08]

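The “select path closest to an uncovered instruction” rule can be sketched as a minimum search over pending states. The `state_t` fields and `pick_closest` are hypothetical. The sketch covers only this one rule, and the OSDI’08 paper describes how KLEE actually weights and interleaves its heuristics.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical summary of one pending execution state. */
typedef struct {
    int      id;
    unsigned dist_to_uncovered;  /* distance to the nearest uncovered instruction */
} state_t;

/* Return the index of the state closest to uncovered code.
   Ties are resolved in favour of the earliest state. */
static size_t pick_closest(const state_t *states, size_t n) {
    assert(states != NULL && n > 0);
    size_t best = 0;
    for (size_t i = 1; i < n; i++)
        if (states[i].dist_to_uncovered < states[best].dist_to_uncovered)
            best = i;
    return best;
}
```

A scheduler built on this rule would run the chosen state for a while, update the distances as coverage grows, and then pick again.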
Constraint Solving
  • Dominates runtime
      • Inherently expensive (NP-complete)
      • Invoked at every branch
  • Two simple and effective optimizations
      • Eliminating irrelevant constraints
      • Caching solutions
  • Dramatic speedup on our benchmarks

Eliminating Irrelevant Constraints
  • In practice, each branch usually depends on a small number of variables
  • Example branch:
        if (x < 10) {
            …
        }
  • Current constraints: x + y > 10, z & -z = z
  • Query: x < 10 ?

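One way to read this optimization: keep only the constraints that share variables, directly or transitively, with the branch condition being queried. The sketch below encodes each constraint's variables as a bitmask. The encoding and the `relevant` function are assumptions for illustration, not KLEE's data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical variable ids, one bit per variable. */
enum { VAR_X = 1u << 0, VAR_Y = 1u << 1, VAR_Z = 1u << 2 };

typedef struct {
    const char *text;  /* human-readable form, for display only */
    unsigned    vars;  /* bitmask of the variables the constraint mentions */
} constraint_t;

/* Mark in keep[] every constraint that shares a variable, directly or
   transitively, with the query. Returns the number of constraints kept. */
static size_t relevant(const constraint_t *cs, size_t n,
                       unsigned query_vars, bool *keep) {
    assert((cs != NULL && keep != NULL) || n == 0);
    size_t kept = 0;
    for (size_t i = 0; i < n; i++)
        keep[i] = false;

    bool changed = true;
    while (changed) {            /* iterate until no new constraint is pulled in */
        changed = false;
        for (size_t i = 0; i < n; i++) {
            if (!keep[i] && (cs[i].vars & query_vars) != 0) {
                keep[i] = true;
                query_vars |= cs[i].vars;  /* its variables become relevant too */
                kept++;
                changed = true;
            }
        }
    }
    return kept;
}
```

For the slide's example, the query `x < 10` keeps `x + y > 10` because it mentions `x`, and drops `z & -z = z`, so the solver sees a smaller problem.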
Caching Solutions
  • Static set of branches: lots of similar constraint sets
  • Cached: { 2 * y < 100, x > 3, x + y > 10 } has solution x = 5, y = 15
  • Eliminating constraints cannot invalidate the solution:
      • { 2 * y < 100, x + y > 10 } is still satisfied by x = 5, y = 15
  • Adding constraints often does not invalidate the solution:
      • { 2 * y < 100, x > 3, x + y > 10, x < 10 } is still satisfied by x = 5, y = 15
  • UBTree data structure [Hoffman and Koehler, IJCAI ’99]

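The cache check amounts to re-evaluating a stored assignment against a new constraint set before calling the solver. The sketch below hard-codes the slide's constraints as C predicates and assumes the garbled first constraint reads `2 * y < 100`. The types and function names are hypothetical, and the UBTree lookup that finds candidate subsets and supersets is not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct { int x, y; } assignment_t;
typedef bool (*constraint_fn)(assignment_t);

/* The constraints from the slide, written as predicates.
   "2 * y < 100" assumes the lost operator was a multiplication. */
static bool c_2y_lt_100(assignment_t a) { return 2 * a.y < 100; }
static bool c_x_gt_3(assignment_t a)    { return a.x > 3; }
static bool c_sum_gt_10(assignment_t a) { return a.x + a.y > 10; }
static bool c_x_lt_10(assignment_t a)   { return a.x < 10; }

/* True if the cached assignment satisfies every constraint in the set,
   in which case no solver call is needed. */
static bool satisfies_all(const constraint_fn *cs, size_t n, assignment_t a) {
    assert(cs != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (!cs[i](a))
            return false;
    return true;
}
```

Checking a candidate assignment is linear in the number of constraints, which is far cheaper than solving, so trying cached solutions first pays off even when some attempts fail.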
Dramatic Speedup
  • Aggregated data over 73 applications
  • [Figure: time (s) against executed instructions (normalized)]

Environment: Calling Out Into OS
    int fd = open("t.txt", O_RDONLY);
  • If all arguments are concrete, forward to OS
    int fd = open(sym_str, O_RDONLY);
  • Otherwise, provide models that can handle symbolic files
      • Goal is to explore all possible legal interactions with the environment

Environmental Modeling
    // actual implementation: ~50 LOC
    ssize_t read(int fd, void *buf, size_t count) {
        exe_file_t *f = get_file(fd);
        …
        memcpy(buf, f->contents + f->off, count);
        f->off += count;
        …
    }
  • Plain C code run by KLEE
      • Users can extend/replace the environment without any knowledge of KLEE internals
  • Currently: effective support for symbolic command line arguments, files, links, pipes, ttys, environment vars

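The slide elides most of the model, so the following is a self-contained sketch of what a modelled `read` over an in-memory file could look like. The `model_file_t` type and `model_read` are hypothetical stand-ins for `exe_file_t` and the real model, and the end-of-file clamp is an assumption about what the elided lines handle.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>   /* ssize_t */

/* Hypothetical in-memory file; the slide's exe_file_t is not shown. */
typedef struct {
    const char *contents;  /* file bytes (could be symbolic under KLEE) */
    size_t      size;      /* total number of bytes */
    size_t      off;       /* current read offset */
} model_file_t;

/* Modelled read(): copy at most count bytes from the current offset,
   stop at end of file, and advance the offset.
   Returns the number of bytes copied (0 at end of file). */
static ssize_t model_read(model_file_t *f, void *buf, size_t count) {
    assert(f != NULL && buf != NULL);
    assert(f->off <= f->size);
    size_t remaining = f->size - f->off;
    if (count > remaining)
        count = remaining;
    memcpy(buf, f->contents + f->off, count);
    f->off += count;
    return (ssize_t)count;
}
```

Because the model is ordinary C, a symbolic executor can run it like any other code, and the file contents can themselves be symbolic bytes.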
Does KLEE work?
  • Evaluation
      • Coverage results
      • Bug finding
      • Crosschecking

GNU Coreutils Suite
  • Core user-level apps installed on many UNIX systems
  • 89 stand-alone (i.e. excluding wrappers) apps (v6.10)
      • File system management: ls, mkdir, chmod, etc.
      • Management of system properties: hostname, printenv, etc.
      • Text file processing: sort, wc, od, etc.
      • …
  • Variety of functions, different authors, intensive interaction with environment
  • Heavily tested, mature code


Symbolic Execution And KLEE

  • 55. Coreutils ELOC (incl. called lib)
      • [Figure: number of applications against Executable Lines of Code (ELOC)]
  • 56. Methodology
      • Fully automatic runs
      • Run KLEE one hour per utility, generate test cases
      • Run test cases on uninstrumented version of utility
      • Measure line coverage using gcov
      • Coverage measurements not inflated by potential bugs in our tool
  • 57. High Line Coverage (Coreutils, non-lib, 1h/utility = 89 h)
      • Overall: 84%, Average 91%, Median 95%
      • 16 at 100%
      • [Figure: coverage (ELOC %), apps sorted by KLEE coverage]
  • 58. Beats 15 Years of Manual Testing
      • Avg/utility: KLEE 91%, Manual 68%
      • Manual tests also check correctness
      • [Figure: KLEE coverage minus manual coverage, apps sorted by that difference]
  • 59. Busybox Suite for Embedded Devices
      • Overall: 91%, Average 94%, Median 98%
      • 31 at 100%
      • [Figure: coverage (ELOC %), apps sorted by KLEE coverage]
  • 60. Busybox: KLEE vs. Manual
      • Avg/utility: KLEE 94%, Manual 44%
      • [Figure: KLEE coverage minus manual coverage, apps sorted by that difference]
  • 64. GNU Coreutils Bugs
      • Ten crash bugs
      • More crash bugs than approx. the last three years combined
      • KLEE generates actual command lines exposing crashes
  • 66. Ten command lines of death
        md5sum -c t1.txt
        mkdir -Z a b
        mkfifo -Z a b
        mknod -Z a b p
        seq -f %0 1
        pr -e t2.txt
        tac -r t3.txt t3.txt
        paste -d\\ abcdefghijklmnopqrstuvwxyz
        ptx -F\\ abcdefghijklmnopqrstuvwxyz
        ptx x t4.txt
      • t1.txt: \t \tMD5(
      • t2.txt: \b\b\b\b\b\b\b\t
      • t3.txt: \n
      • t4.txt: A
  • 70. Finding Correctness Bugs
      • KLEE can prove asserts on a per-path basis
      • Constraints have no approximations
      • An assert is just a branch, and KLEE proves feasibility/infeasibility of each branch it reaches
      • If KLEE determines infeasibility of the false side of an assert, the assert was proven on the current path
  • 73. Crosschecking
      • Assume f(x) and f’(x) implement the same interface
      • Make input x symbolic
      • Run KLEE on assert(f(x) == f’(x))
      • For each explored path:
          • KLEE terminates w/o error: paths are equivalent
          • KLEE terminates w/ error: mismatch found
      • Coreutils vs. Busybox:
          • UNIX utilities should conform to IEEE Std. 1003.1
          • Crosschecked pairs of Coreutils and Busybox apps
          • Verified paths, found mismatches
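The crosschecking recipe can be illustrated without KLEE by scanning a small input range, which stands in for symbolic exploration. The two `clamp` implementations and the `crosscheck` helper below are hypothetical. Under KLEE, the loop would presumably be replaced by marking `x` symbolic (for instance with `klee_make_symbolic`) and asserting `f(x) == g(x)` once.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef int (*impl_fn)(int);

/* Two hypothetical implementations of the same interface:
   clamp x into the range [0, 255]. */
static int clamp_a(int x) {
    return x < 0 ? 0 : (x > 255 ? 255 : x);
}

static int clamp_b(int x) {
    if (x <= 0)
        return 0;
    if (x >= 255)
        return 254;   /* deliberate off-by-one at the upper bound */
    return x;
}

/* Stand-in for symbolic exploration: scan [lo, hi] for an input on which
   f and g disagree. Returns true and stores the input in *witness. */
static bool crosscheck(impl_fn f, impl_fn g, int lo, int hi, int *witness) {
    assert(lo <= hi && witness != NULL);
    for (int x = lo; ; x++) {
        if (f(x) != g(x)) {
            *witness = x;
            return true;
        }
        if (x == hi)
            break;
    }
    return false;
}
```

The scan reports 255 as the first disagreeing input. A symbolic executor would reach the same mismatch by solving the path condition of the failing assert rather than enumerating inputs.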
  • 74. Mismatches Found
        Input                  | Busybox                   | Coreutils
        tee "" <t1.txt         | [infinite loop]           | [terminates]
        tee -                  | [copies once to stdout]   | [copies twice]
        comm t1.txt t2.txt     | [doesn’t show diff]       | [shows diff]
        cksum /                | "4294967295 0 /"          | "/: Is a directory"
        split /                | "/: Is a directory"       |
        tr                     | [duplicates input]        | "missing operand"
        [ 0 "<" 1 ]            |                           | "binary op. expected"
        tail -2l               | [rejects]                 | [accepts]
        unexpand -f            | [accepts]                 | [rejects]
        split -                | [rejects]                 | [accepts]
      • t1.txt: a, t2.txt: b (no newlines!)
  • 75. KLEE: Effective Testing of Systems Programs
      • KLEE can effectively:
          • Generate high coverage test suites
              • Over 90% on average on ~160 user-level applications
          • Find deep bugs in complex software
              • Including higher-level correctness bugs, via crosschecking
  • 76. KLEE DEMO
      • Tool available at http://klee.llvm.org/
      • Experiments
          • Tool examples
              • isLower()
              • RegExp
          • More experimentation
  • 77. Discussion
      • Questions / Ideas?
      • Thanks for listening!