
Invivo Fuzzing by Amplifying Actual Executions

Octavio Galland, Canonical, Argentina          Marcel Böhme, MPI-SP, Germany

Abstract—A major bottleneck that remains when fuzzing software libraries is the need for fuzz drivers, i.e., the glue code between the fuzzer and the library. Despite years of fuzzing, critical security flaws are still found, e.g., by manual auditing, because the fuzz drivers do not cover the complex interactions between the library and the host programs using it.

In this work, we propose an alternative approach to library fuzzing, which leverages a valid execution context that is set up by a given program using the library (the host) and amplifies its execution. More specifically, we execute the host until a designated function from a list of target functions has been reached, and then perform coverage-guided function-level fuzzing on it. Once the fuzzing quota is exhausted, we move on to fuzzing the next target from the list. In this way, we not only reduce the amount of manual work needed by a developer to incorporate fuzzing into their workflow, but we also allow the fuzzer to explore parts of the library as they are used in real-world programs that may otherwise not have been tested due to the simplicity of most fuzz drivers.

Amplifier Point                                              Amplifier Constraints
ASN1_parse(BIO* _, const char* pp, long len, int _)          sizeof(pp) = len ∧ len < C
OPENSSL_hexstr2buf(const char* str, long* _)                 sizeof(str) < C
ossl_punycode_decode(char* pEnc, size_t encLen,              sizeof(pEnc) = encLen ∧ sizeof(pDec) = encLen
    int* pDec, int* pOutLen)                                   ∧ encLen < C ∧ sizeof(pOutLen) = 1 ∧ *pOutLen = encLen
cms_kek_cipher(char** _, size_t* _, char* in,                sizeof(in) = inlen ∧ inlen < C
    size_t inlen, CMS_KeyAgreeRecipientInfo* _, int _)

TABLE I: Four of 100+ amplifier points semi-automatically selected for OpenSSL. C is a constant chosen to prevent spurious out-of-memory errors. For brevity, we omit names for parameters which are not fuzzed (_).

I. INTRODUCTION

Today, fuzz drivers have to be developed to make a software library amenable to fuzzing. Fuzzing is a popular automated testing technique which has proven to be very effective at bug finding, with tens of thousands of vulnerabilities discovered in commonly used software [1]. Since this technique involves executing the program, when applying it to code libraries it becomes necessary to implement a fuzz driver. A fuzz driver is a piece of code that acts as the entry point for execution during the fuzzing campaign. It sets up an artificial calling context and is responsible for accepting data from the fuzzer and feeding it into the library in the appropriate format. Traditionally, fuzz drivers are implemented manually, which constitutes a major hindrance to widespread adoption of fuzzing. For instance, Google incentivizes the development of fuzz drivers for important open source projects by paying up to 30k USD in integration awards [2]. Google's project OSS-Fuzz [1] is primarily a community-maintained collection of fuzz drivers for open source projects.

Moreover, fuzz drivers do not capture the complex interaction that actual host programs have with the library. They often set up too simplistic a context and separately target very specific parts of any given API. While this allows developers to maximize fuzzer throughput—for instance, LibFuzzer's documentation [3] advises developers to make fuzz drivers execute as fast as possible and leave any global state unmodified after execution has finished—it also reduces the search space for bugs that could be observed during normal execution by a host but not during executions generated via the fuzz drivers.

In fact, despite an abundance of fuzz drivers and years of fuzzing, it is still possible to find vulnerabilities in software libraries by manual auditing that could have been found by fuzzing if only the "right" fuzz driver had been written. For instance, a high-severity vulnerability was recently found in the OpenSSL project which involved faulty memory management during the parsing of Punycode and resulted in a stack-buffer overflow (CVE-2022-3602). In the aftermath of the discovery, it was found that this vulnerability could have been exposed in a matter of minutes through fuzzing, if only the relevant part of the code had been targeted.1 Moreover, the relevant function was being executed by the test suite, which suggests that if the test suite had been "amplified" by fuzzing, the bug might have been spotted earlier.

1 https://allsoftwaresucks.blogspot.com/2022/11/why-cve-2022-3602-was-not-detected-by.html

In this paper, we propose in-vivo fuzzing to make all code subject to fuzzing by (i) identifying amplifier points, (ii) injecting function-level fuzzers at amplifier points, and (iii) amplifying actual executions from a host application that is using the target library. This approach side-steps any requirement for fuzz drivers and allows us to amplify executions generated in any way as they are entering an amplifier point.

Amplifier points. While it is possible to choose every function entry as an amplifier point, to maximize utility we would like to focus on the most interesting functions. In order to identify such functions, the user can rely on their expert knowledge of the codebase, or on automated static analyses. For our implementation,2 we choose functions that are associated with parsing a specific data chunk from the input sequence of bytes. Without loss of generality, this simplifies the data types used for amplification and minimizes the number of false positives. The criterion is that the function has a parsing-related name (e.g., parse, decode) and at least one parameter whose type is a pointer to a byte stream. Table I-Col.1 shows four (of 100+) amplifier points automatically identified for OpenSSL.

2 https://anonymous.4open.science/r/afllive-598A/README.md#config-file-1
Amplifier preconditions. In-vivo fuzzing mutates the input parameters of functions chosen as amplifier points. The hope is that the mutational approach corrupts the valid program state only minimally, so as to maintain the validity of the resulting program state and minimize the number of false positives. Nevertheless, there are certain constraints that need to be satisfied to maintain validity. These preconditions take the form of a conjunction of constraints that the arguments need to satisfy. When specifying amplifier points, we therefore allow such preconditions to be specified as well. Table I-Col.2 shows the preconditions for the four amplifier points in OpenSSL.

Fig. 1: Overall procedure.

Instrumentation and runtime. In order to fuzz a library with our approach, it is first necessary to find a suitable host which uses it, and to instrument both of them during compilation. This instrumentation enables us to intercept any call to an amplifier point during an execution of the host. Upon invocation of an amplifier point, we can proceed to amplify the execution by repeatedly forking into a shadow execution, replacing the parameters passed to the function with parameters provided by the fuzzer, and allowing this shadow execution to terminate while monitoring the shadow process for potential crashes.

In-vivo fuzzing (§III). To study the effectiveness of our in-vivo approach, we implemented our tool, called AFLLIVE, and measured the difference in code coverage achieved between the unamplified, original executions and the amplified, shadow executions. For each of the four target libraries, we chose a host program and input to generate unamplified executions. Indeed, we observe a substantial increase in code coverage without false positives, indicating that AFLLIVE effectively explores the "valid" neighborhood of the original execution.

Auto harnessing (§IV). An approach to enable fuzzing for a library without manual intervention is the automatic synthesis of fuzz drivers. To study the difference in effectiveness, we compare AFLLIVE against state-of-the-art fuzz driver generators. Specifically, we choose the fuzz drivers generated by FuzzGen [4] and FUDGE [5] and compare them with our approach on the FuzzGen and FUDGE benchmarks.3 Ensuring the same initial conditions, we find that our prototype is able to identify 7 bugs in one of the subjects (5 memory corruption bugs and 2 assertion violations), while FuzzGen and FUDGE are unable to find any. Furthermore, our prototype consistently achieves greater code coverage on all of their benchmarks.

3 We spent several months conducting experiments with the UTopia fuzz driver generator [6], but could only reproduce false positives on the original and most recent versions of the benchmark programs. Details in the Appendix.

Test amplification (§V). We discuss the amplification of the executions generated by a test suite as a special use case of in-vivo fuzzing. For the OpenSSL example, amplifying the test suite not only rediscovered the Punycode vulnerability, but also found a previously unknown vulnerability that received 2.4k USD in bug bounty. Test cases often cover certain edge cases or are designed to expose regressions similar to previously discovered bugs. Amplifying such test cases allows us to search their "neighborhood" to bring to light new bugs or existing bugs that have been incompletely fixed. In fact, over 40% of 0-days exploited in-the-wild are variants of previously discovered vulnerabilities.4 Also, test suites often achieve high code coverage. Amplifying the test suite thus enables the fuzzer to reach deep into the code. During amplification, the test cases practically become fuzz drivers for in-vivo fuzzing.

4 https://blog.google/threat-analysis-group/0-days-exploited-wild-2022/

In summary, the main contributions of this work are:
• An approach that auto-enables fuzzing for every compilable system or library subject to user-defined constraints that is executed, e.g., in production.
• A way to harness existing test suites by amplifying their coverage and exploring the neighborhood states induced by existing regression tests.
• An open-source prototype implementation, AFLLIVE, and an extensive evaluation, available at: https://anonymous.4open.science/r/afllive-598A.

II. IN-VIVO FUZZING

Given a host process p and a set of interesting functions F, called amplifier points, in-vivo fuzzing piggybacks on the correct execution of the host process to generate a valid library state, calling context, and arguments for any function f ∈ F. Trying to minimize interference with the original process, an in-vivo fuzzer proceeds to repeatedly fork the execution and mutate the parameters of a targeted function call—within the constraints C specified by the user and in a coverage-guided manner—to generate a crashing function call.

In-vivo fuzzing is inspired by the mutational fuzzing approach for file-processing programs [7]–[11] or protocol implementations [12]–[15]. Given a valid seed input, such as a PDF file for a PDF reader, a mutational fuzzer slightly corrupts the valid file by applying various mutation operators to generate semi-valid files that can still reach deep into
the parsing process but suddenly induce crashes in the file-processing program. We might consider the generated inputs to be within the "neighborhood" of the valid seed file. Moreover, given a seed corpus with a high diversity, a mutational fuzzer can cover a large diversity of program behaviors in the file-processing program.

Algorithm 1 In-Vivo Fuzzing – Function fuzz
Input: Amplifier points F, Types T, Constraints C
Input: Instrumented process p, Time budgets t0 and t1
 1: Global corpus Q = ∅
 2: Local corpus Qf = ∅ for all f ∈ F
 3: Crashes Q✗ = ∅
 4: Shadow process p′ = fork(p)
 5: for each function f ∈ F executed in p do
 6:   Objects objs = collect_initial_args(p′, f)
 7:   Types t ∈ T corresponding to f
 8:   Args args = serialize(objs, f, t)
 9:   Add args to local corpus Qf
10:   Add ⟨f, args⟩ to global Q
11: end for
12:
13: for each function f ∈ F executed in p do
14:   while t0 not expired do
15:     Args q = select(Qf)
16:     For f, find types t ∈ T and constraints c ∈ C
17:     fuzz_function_args(p′, f, t, c, q, Q, Qf, Q✗)
18:   end while
19: end for
20: while campaign not aborted and t1 not expired do
21:   Tuple ⟨f, q⟩ = select(Q)
22:   For f, find types t ∈ T and constraints c ∈ C
23:   fuzz_function_args(p′, f, t, c, q, Q, Qf, Q✗)
24: end while
Output: Crashes Q✗

Similarly, we propose to use as seed a valid calling context and valid function arguments generated by a host. This way, an in-vivo fuzzer remains within the "neighborhood" of a valid program state when fuzzing a function. Assuming the host application generates several calls to the library at different amplifier points, our in-vivo fuzzer can reach deep into the program and cover and amplify a diverse set of program states.

A. Overall Procedure

Figure 1 sketches the overall procedure. Given the host code, including the library code, and the user-specified amplifier points and constraints, the first step is to compile and instrument the program. Our instrumentation pass introduces a function call into our in-vivo runtime within the preamble of every function identified as an amplifier point. During fuzzing, this transfer of control allows the runtime to create a shadow process and independently fuzz the function arguments in that shadow process in collaboration with the in-vivo fuzzer. We implemented our prototype, AFLLIVE, on top of AFL++ 4.02c.

Instrumentation. Our instrumentation pass (LLVM 14 [16]) adds the runtime whose purpose is to mediate between the in-vivo fuzzer and the running process of the instrumented binary. At the entry point of the main function, AFLLIVE inserts a call into our runtime to initialize any necessary state and communicate with the fuzzer. At the entry point of an amplifier point, the instrumentation pass inserts a call into our runtime containing the name of the amplifier point and the memory addresses of the arguments that will be fuzzed. Additionally, AFLLIVE hooks the exit point of every amplifier point to facilitate early termination of the host if configured in this way.

Amplifier points. To focus the in-vivo fuzzing on interesting library functions, we require the user to specify a set of functions called amplifier points. This selection need not be limited to library API functions only. It is possible to choose these amplifier points manually or using an auto-discovery process. Generally, we would be looking for functions whose signature or behavior suggests they may be attacker-controllable and contain vulnerabilities. For instance, our prototype uses CodeQL [17] to find parsing functions5 (which are user-controlled by default and have the added benefit that constraint specification for them tends to be particularly simple, since they typically take a byte array and its length).

5 This script checks each function's name against a predefined list of substrings and validates that at least one of its parameters is a pointer (or double pointer) of type char* or uint8_t* and another one is a number.

Amplifier constraints. To minimize the number of false positives, the in-vivo fuzzer also takes user-provided amplifier constraints that the generated function parameters need to satisfy before they are passed into the amplified function. These amplifier constraints carry the same role as the preconditions in property-based testing [18], [19]. They take the form of binary relationships between arguments and/or constants. For example, Row 1 of Table I shows constraints which imply that the pointer pp must point to an array of length len, and that len should be less than a constant C.

Amplifier types. The instrumentation pass also records type information for the given amplifier points (in JSON format). During fuzzing, these amplifier types are used to serialize function parameter objects to a sequence of bytes for the in-vivo fuzzer and, vice versa, to deserialize for the runtime, similar to the coverage-guided Java fuzzing approach proposed by Padhye, Lemieux, and Sen [19]. This type information is composed of the bitwidth for primitive types, the fields' types and offsets for struct types, and the type of the pointee for pointers. Note that since the collected types are LLVM Intermediate Representation (IR) types, these three cases cover every kind of variable type encountered when fuzzing most targets.

B. In-Vivo Fuzzing Algorithm

Algorithm 1 starts with the user-provided amplifier points and constraints, the auto-generated amplifier types, an instance of the instrumented host binary p, and two user-provided time budgets t0 and t1 (which determine the lengths of the screening phase and the main fuzzing loop, respectively).

Global and local corpora. In Line 1–3, all seed corpora are initially set to the empty set. The fuzzer maintains two global corpora Q and Q✗ and one local corpus Qf for every amplifier function f ∈ F. Throughout the campaign, the global corpus will contain tuples where the first element is
an amplifier point and the second the serialized function arguments. A local corpus does not need the amplifier information and is hence a set of serialized function arguments. Since the fuzzer proceeds in a coverage-guided manner, the local and global corpora Q (and Q✗) contain arguments that have been observed to be coverage-increasing (and crash-inducing, respectively).

Forking. In Line 4, the execution of process p is forked, which allows the host to continue execution normally while we keep a handle to the shadow execution, which will be used for fuzzing. We assume that the forked process is isolated from the original execution and does not interfere with it. Conceptually, we assume the process has the ability to be rewound back to the invocation of any amplifier point, which is needed to alternate amplifier points during fuzzing.

Auto-collecting initial seeds. In Line 5–11, the fuzzer harvests the initial seeds from the original execution. These seeds are later used for coverage-guided, mutational fuzzing. For every amplifier point that is executed in the original, running process, our instrumentation pass makes sure the call is routed through the in-vivo runtime, which collects the function arguments as objects from the (forked) shadow process (Line 6). These objects corresponding to the function arguments are serialized and added to the global and local corpora.

Screening loop. In Line 12–18, the fuzzer fuzzes every executed amplifier point for a fixed amount of time in order to collect sufficient coverage information for the main fuzzing loop. The time budget t0 for every amplifier point is fixed by the user. Without the screening loop, all amplifier points would be considered as equally good at generating coverage-increasing inputs, since the coverage collected during the initial, non-amplified execution will be exactly the same for all initial seeds. The screening loop over all amplifier functions forces the fuzzer to explore which regions of code each amplifier point is capable of covering. The function select (Line 14) selects the next best seed from the current local queue while the function fuzz_function_args (Line 16) fuzzes the selected seed. During fuzzing, all coverage-increasing inputs are added to the global and local queues (Q, Qf) while all crashing inputs are added to the set of crashes (Q✗).

Main fuzzing loop. In Line 19–23, the fuzzer fuzzes the seeds selected from the global queue until the campaign is aborted or the time budget t1 is depleted. To select the next seed, we can now simply reuse the default heuristics of the underlying fuzzer (AFL++). Given a seed corpus of serialized function arguments, this is how we fuzz those arguments in a coverage-guided manner with a minimal false positive rate.

Early termination. AFLLIVE can be configured to terminate at the exit of an amplifier function, or an arbitrary period of time after it has returned, e.g., if the fuzzer throughput is too low. Our intuition is that crashes often arise shortly after the call to the amplified function. Early termination allows the user to strike a balance between performance and stability for a given campaign, at the risk of introducing false negatives.

Algorithm 2 Function fuzz_function_args
Input: Process p′, function f, types t, constraints c, args q
Input: Global corpus Q, Local corpus Qf, Crashes Q✗
 1: Energy e = compute_energy(f, q, Q)
 2: while e not expired do
 3:   Mutated args q′ = mutate(q)
 4:   Mutated objs o = deserialize(q′, t, c)
 5:   Process p′′ = fork_rewind_wait(p′, f)
 6:   Result r = substitute_continue(p′′, f, o)
 7:   if r = new crash detected then
 8:     Add ⟨f, q′⟩ to Q✗
 9:   else if r = coverage increased then
10:     Add ⟨f, q′⟩ to Q
11:     Add q′ to Qf
12:   end if
13: end while
Output: Global corpus Q, Local corpus Qf, Crashes Q✗

C. Mutational Fuzzing of Function Arguments

Algorithm 2 shows fuzz_function_args, called in Line 16 and 22 of Algorithm 1. Given the shadow process, the amplifier point, the types and constraints, and the seed arguments, it mutates the seed to generate alternative function arguments. Those that increased code coverage are added to the local and global corpora while those that induced a unique crash are added to the set of crashes.

Mutation. In Line 1–3, an "optimal" number of mutations q′ of the serialized function arguments q are created. What is considered optimal is computed in the compute_energy function, while the mutation operators applied to the arguments are implemented in the mutate function. Since the serialized arguments are just a sequence of bytes, we can reuse the implementations of both functions from a classic fuzzer [7], [20].

Deserialization. In Line 4, the in-vivo runtime receives and parses the mutated sequence of bytes into the actual function argument objects using the instrumenter-provided type information t. This process is deterministic: deserializing the same byte sequence multiple times results in the same argument objects being generated, which ensures consistency and reproducibility. The deserialization procedure also enforces the user-provided constraints c. We discuss the procedure of deserialize in a separate section below.

Spawning shadow executions. In Line 5 and 6, the in-vivo runtime uses the shadow process p′ to spawn another shadow execution, which is rewound back to right before the selected function f is called, so as to continue executing with the mutated function parameters. Technically, we can implement the function fork_rewind_wait by considering p′ as running in a virtual machine and using a snapshot-restore mechanism to restore a snapshot of p′ right before f is called. This approach fully isolates the constructed process p′′ from the shadow process p′ (and the original process p), but it also introduces a performance and memory overhead for storing and loading the snapshots. In our prototype, we chose to fully reexecute the host application until f is reached (to conceptually rewind it) and to start a fork server at f where the runtime intervenes to provide the function call parameters o.
Coverage-guidance. In Line 7 to 12, the fuzzer adds function arguments to the corpora that are observed to be coverage-increasing. Function arguments that are observed to induce crashes are added to the corpus containing the crashing inputs.

D. Serialization and Deserialization

In order to reuse existing greybox fuzzers to implement seed selection (select), prioritization (compute_energy), and mutation (mutate), we need to translate function arguments into a sequence of bytes (which the fuzzer can handle) and back again. We accommodate the serialization (serialize) and deserialization (deserialize) procedures in the in-vivo runtime that is instrumented into the host binary (cf. Fig. 1). On a high level, this process is similar to the approach proposed by Padhye et al. [19], which enables coverage-guided mutational fuzzing for an object-oriented language like Java.

Algorithm 3 Function deserialize
Input: Function argument byte sequence q,
Input: Function argument types t
Input: Function argument constraints c
Output: Function argument objects o
 1: Objects o = ∅
 2: for type in t do
 3:   Object obj = deserialize_arg(q, type, c)
 4:   Add obj to o
 5: end for
 6: function deserialize_arg(q, type)
 7:   Object obj
 8:   if type is primitive then
 9:     obj = q.consume_bytes(type.bitWidth / 8)
10:   else if type = Struct then
11:     for field, fieldType in type.fields do
12:       obj.field = deserialize_arg(q, fieldType, c)
13:     end for
14:   else if type = Pointer then
15:     type′ = type.pointeeType
16:     length = q.consume_bytes(4)
17:     for i ∈ {0, ..., length − 1} do
18:       obj[i] = deserialize_arg(q, type′, c)
19:     end for
20:   end if
21:   obj = enforce_constraints(obj, c)
22:   return obj
23: end function

Algorithm 3 presents the procedure of the deserialize function. The procedure of serialize is analogous. Given a seed byte sequence, the argument types, and the argument constraints, the deserialization algorithm computes the function argument objects to be passed into the function call. In Line 1–5, the runtime generates one object for every function argument using its type information. Line 7–24 sketches the recursive procedure of the corresponding deserialize_arg function, which also enforces the validity of the synthesized function argument objects.

The provided sequence of bytes q is "consumed", such that each byte is used at most once for the construction of the function argument objects. The function consume_bytes keeps an offset into the fuzzer-provided byte sequence, initially set to 0. When called, the function returns the desired amount of bytes available in the sequence and advances the offset by that same amount of bytes. If the in-vivo runtime attempts to consume more bytes than available, the function returns all available bytes, with the remaining bytes set to zero.

For primitive types (Line 9–10), the runtime reads as many bytes from the fuzzer-provided byte sequence q as needed in order to properly cast them into the appropriate type t.

For structured types (Line 11–14), an empty instance of the structure is allocated, and the algorithm is then applied repeatedly and recursively for each field of the structure, consuming the available bytes from the byte sequence q.

For pointer types (Line 15–21), our current prototype deserializes those as arrays. The value represented by the four bytes consumed from the byte sequence q determines the number of elements (length) that are to be included in the array. This includes arrays of length one (single elements) and zero (null pointers). This is because in the C programming language a pointer can transparently point to one element or to the beginning of a list of elements. We can recover the "width" of a single element from the pointee type information (type′). The individual elements can then be deserialized recursively. We rely on the user-provided constraints to enforce the validity of the deserialized array length (cf. Tab. I).

Constraint enforcement. In Line 22, the fuzzer runtime modifies the constructed function argument objects to render them valid with respect to the user-provided constraints C. These constraints denote inequalities between constants, primitive type parameters, lengths of array parameters, or array items. In order to "enforce" them, the runtime goes over the deserialized values, bounding the value of each left-hand side of a constraint with respect to the right-hand side.

This implies a dependency relationship between the values of the left-hand side of a constraint and its right-hand side. This in turn means that the set of constraints specified by the user cannot denote circular dependencies, in order to allow the runtime to traverse the set of arguments in a valid order. Additionally, the user can tag string arguments as filenames, for which the runtime will dump the fuzzing data into a temporary file and replace the string provided to the function with the corresponding filename.

Serialization. As mentioned earlier, the serialization in Algorithm 1 (Line 8) proceeds analogously, translating the function argument objects o into a byte sequence q, such that if Algorithm 3 is applied to q, we would recover o. However, it might not be immediately clear how pointers are handled. How do we know a priori whether a pointer points to no element at all, a single element, or a number of elements? In this case, we rely on user-specified constraints to properly indicate the size of the array by way of reference. If the user-provided constraints are not strong enough to assign a value to the length of the array referred to by a pointer, our fuzzer prototype defaults to treating character pointers as null-terminated strings (in which case the length of the array is calculated by looking for the first occurrence of the null byte in the string) and any other pointers as pointers to single elements.
Subject    Type         #LOC    Version  Host          AP  M.#C.
boringssl  Encryption   483.2k  dd52194  crypto_test   37  2
bzip2      Compression  8.2k    1.0.8    bzip2         1   2
libass     Rendering    35.4k   0.17.1   ffmpeg        4   2
libexif    Parsing      30.7k   0.6.24   photographer  2   1

TABLE II: Detailed information about our subject programs.

III. IS EXECUTION AMPLIFICATION EFFECTIVE?

To study the effectiveness of our in-vivo approach, we implement AFLLIVE and measure the difference in code coverage achieved between the unamplified original executions and the amplified, shadow executions on four target libraries. For each one of them, we choose a host program and one host input to generate unamplified executions, and use a simple CodeQL script to choose interesting amplifier points heuristically.

[Figure 2: four coverage-vs-time plots (lines covered over 24 hours) for (a) bzip2, (b) boringssl, (c) libass, and (d) libexif.]
Fig. 2: Coverage-vs-time for test subjects using in-vivo fuzzing. The horizontal, dashed lines indicate the baseline coverage from the original, unamplified execution. The vertical, dashed lines indicate when AFLLIVE switched from the screening loop to the main fuzzing loop, on the average.

A. Experimental Setup

Libraries and hosts. Table II shows information about the benchmarks selected for this experiment (two more to follow). We randomly picked four widely-used open-source C libraries that parse host-provided input data. These libraries cover a wide range of domains, from cryptography to rendering. Applications using these libraries might attempt to parse untrusted data, and thus any errors present in them might represent potential security vulnerabilities. For every library, we picked one host and one input for that host whose execution we sought to amplify. For our host selection criteria, we focused on programs that were either developed or endorsed by the same group that developed the library. This was done with the intention of minimizing the likelihood of potential crashes stemming from wrong library usage instead of actual bugs in the library.

For boringssl, we used as host a binary that is supposed to test the encryption functionality of the library, which also generates the one unamplified execution. For bzip2, we used the example application bundled with the source code as host and a compressed version of a text file containing sample text6 to generate the unamplified execution. For libass, we used ffmpeg as host, a large video and audio editing library that integrates subtitle functionality via libass. The unamplified execution was generated by adding a subtitle track with a single subtitle to the shortest possible video. For libexif, we used an example application bundled with the source code and one of the test images7 to generate the unamplified execution.

Amplifier points and constraints are identified using a CodeQL script implementing heuristics to identify parsing-related functions (Section II-A). This script returns potential amplifier points; Table II (Column M.#C.) in turn shows the median number of constraints specified for each amplifier point, as a "proxy" metric of the amount of effort involved in setting up each subject.

Fuzzing campaigns. For every project, we started 20 in-vivo campaigns initialized with the same original execution using AFLLIVE. All campaigns were run for 24 hours each on an AMD EPYC 7713P 64-core processor with 256GB of RAM.

B. Experimental Results

Presentation. The results are shown in Figure 2. All values reflect coverage within the library, and not on the host. The vertical, dashed line indicates when, on the average, AFLLIVE switched from screening to the main fuzzing loop. However, since the screening loop only lasts one minute for every target function executed, it is barely visible in cases where few functions were amplified. The horizontal, dashed line indicates the coverage achieved by the unamplified host execution.

Results. The greatest increases in terms of coverage were obtained in libass and libexif, where AFLLIVE achieved an increase of 38.24% (1706 lines) and 91.79% (1040 lines) over the baseline, unamplified execution, respectively. For bzip2 and boringssl, AFLLIVE managed to achieve an increase in coverage of 18.75% (173 lines) and 0.82% (321
points consisting of at most a few hundred functions. We then lines), respectively.
went through the list, adjusting the automatically inferred con- The lack of increase in coverage for boringssl is further
straints based on the function signature and example invoca- confirmed by a visual inspection of Figure 2b where we can
tions within the code. Column AP in Table II shows how many see the campaign reach a plateau about 1 hour into the 24 hour
of the identified amplifier points were executed during the host fuzzing campaign.8 Although line coverage did not increase
execution used for the fuzzing campaign. Column M.#C. in after the first few hours, new path-increasing inputs kept being
added to the corpus throughout the campaign. We attribute this
6 “The quick brown fox jumps over the lazy dog”
7 22-canon_tags.jpg 8 Notice that coverage increases well after the screening loop has finished.
to the fact that out of the 2044 functions executed by the host, 1381 were fully covered in terms of lines by the unamplified host execution. This in turn means there was little room for improvement upon initial line coverage.

The substantial increase in coverage for libass and libexif is observed over the entire day-long fuzzing campaign and appears to further increase beyond our time budget. This suggests that amplifying the right executions can bring tremendous benefits to automatic vulnerability discovery where the amplified executions reach deep into the code base.

False positives. No crashes were reported in these (previously well-fuzzed) programs during the campaign. This in turn implies that no false positives were reported either, although it is important to keep in mind that the rate of false positives depends on the quality of the specified amplifier constraints.

IV. ONBOARDING LIBRARIES WITHOUT FUZZ DRIVERS

An advantage of in-vivo fuzzing, as shown in the previous section, is that it makes fuzz drivers superfluous. A fuzz driver is a piece of code that acts as glue code between an off-the-shelf fuzzer and the library-under-test. The driver sets up an artificial calling context and is responsible for accepting data from the fuzzer and feeding it into the library in the appropriate format.

However, effective fuzz drivers are often manually implemented, which constitutes a major hindrance to the widespread adoption of fuzzing. For instance, Google incentivizes the development of fuzz drivers for important open source projects by paying up to 30k USD for a successful and effective integration [2]. In fact, Google's project OSS-Fuzz [1] is primarily a community-maintained collection of fuzz drivers for open source projects.

An existing approach to overcome this hindrance is to automatically synthesize fuzz drivers. For instance, FuzzGen [4] leverages a whole-system analysis to infer the library's interface and synthesizes fuzz drivers specifically against that interface. Fudge [5] scans a repository for usages of the library's API, uses program slicing [21] to extract the corresponding code snippets, synthesizes a fuzz driver candidate for every code snippet by concretizing placeholders, and evaluates the generated fuzz driver candidates by building and running them. IntelliGen [22] also first infers the library's interface annotated with vulnerability likelihoods and generates fuzz drivers for the entry functions through hierarchical parameter replacement and type inference. Daisy [23] first dynamically observes how a host system calls the library's API, and then synthesizes fuzz drivers that follow a similar object usage pattern via a series of API calls.

However, these approaches hoist the tested library into a very artificial context, resulting in high false positive and false negative rates. The libraries would never be integrated or used in this way in real applications. Approaches that imitate the actual usage as faithfully as possible will still not be as close to fuzzing a library as it is actually used. This is precisely our proposal: we suggest amplifying actual user-generated executions where a library is actually used.

In the following, we compare the effectiveness of automatic fuzz driver generation to in-vivo fuzzing as implemented in AFLLIVE. Like fuzz driver generation techniques, in-vivo fuzzing requires only the source code and little human intervention in the specification of amplifier points and constraints.

A. Experimental Setup

Fuzz driver generators. We selected FuzzGen [4] and Fudge [5] according to the following selection criteria. We consider approaches that target C libraries, and that are either themselves publicly available and compilable or whose generated drivers are publicly available and compilable. We also considered the following fuzz driver generators, but excluded them for the following reasons. GraphFuzz [24] focuses on object-oriented libraries, and in the case of C libraries a complete dataflow specification must be provided (see https://github.com/hgarrereyn/GraphFuzz/issues/1), which we do not have available. For Daisy [23], despite substantial effort, we did not succeed in compiling the available fuzz drivers due to missing dependencies. For IntelliGen [22], neither the tool itself nor fuzz drivers generated by it were publicly available.

For the comparison against FuzzGen, since the tool itself was not available, we selected three of the seven libraries, as shown in Table III. Out of the four excluded libraries, three had an API that consisted of a single function accepting a complex struct object which wraps the actual library call and maintains the entire state of the library interaction (see, e.g., for libhevc: https://android.googlesource.com/platform/external/libhevc/+/refs/heads/main/test/decoder/main.c#563). The remaining library was excluded because it could not be compiled (or easily fixed). For the three selected libraries, we used the only driver available for libaom and libvpx and a random fuzz driver (cod2lin) for libgsm.

For the comparison against Fudge, we selected all fuzz drivers mentioned in the paper, except OpenCV, as shown in Table III. OpenCV was excluded since the entire API consisted of C++ rather than C functions. leptonica and htslib are highly popular libraries used for image processing and high-throughput sequencing data processing, respectively.

Hosts and Original Execution. For every library, we picked one host and one input for that host whose execution we could amplify (cf. Table III). To be fair, we provided each auto-generated fuzz driver with an initial corpus that generates precisely the same values for the library API as our in-vivo fuzzer. Our intention is that the tested libraries execute on the same piece of data during the first run (for instance, they should attempt to decode the same byte-stream in the case of decoders). For our host selection criteria, we focused on programs that were either developed or endorsed by the same group that developed the library. This was done with the intention of minimizing the likelihood of potential crashes stemming from wrong library usage instead of actual bugs in the library.

Fuzzing campaigns. For all of the five projects, we started 20 in-vivo fuzzing campaigns initialized with the same original execution using AFLLIVE and 20 normal AFL campaigns using
SOTA     Library    Type               #LOC    Version  Synth. fuzz driver  #LOC  Host         Initial corpus                      AP  M.#C.
FuzzGen  libaom     Video Codec        693.0k  3613e5d  av1_dec_fuzzer      1131  aomdec       sample av1 file                      4  1.5
FuzzGen  libvpx     Video Codec        0.5k    1.12.0   simple_decoder       482  vpxdec       sample vp9 file                      5  2
FuzzGen  libgsm     Speech compressor  8.7k    1.0.22   cod2lin              371  STL/rpedemo  sample wav file                      3  1
Fudge    htslib     File parser        99.0k   1.16     hts_open             152  samtools     sample sam and fasta file            2  2
Fudge    leptonica  Image processor    320.0k  1.83.0   pix_rotate_shear      68  tesseract    sample png file with english text   18  1

TABLE III: Summary of setup for each test subject for comparison against state-of-the-art (SOTA) fuzz driver generators.

Fig. 3: Coverage-vs-time comparison between state-of-the-art (dash-dots) and in-vivo fuzzing (solid), plus crashes found: (a) libaom (FuzzGen), (b) libvpx (FuzzGen), (c) libgsm (FuzzGen), (d) htslib (Fudge), (e) leptonica (Fudge), and (f) the crashes found in htslib: one NULL pointer dereference, one use-after-free, one buffer overflow, two out-of-memory errors, and two assertion violations (IDs blinded).

the synthesized fuzz harnesses. All campaigns were run for 24 hours each on an AMD EPYC 7713P 64-Core processor with 256GB of RAM.

B. Experimental Results

Presentation. Figure 3 shows the results in terms of coverage over time and crashes found for all five subjects. The dashed vertical line indicates when, on average, AFLLIVE switched from screening to the main fuzzing loop. In all cases, only coverage achieved within the library is counted, excluding any coverage information about the host or the fuzz driver.

Coverage results. For the entire duration of the campaign and for all subjects, AFLLIVE achieves substantially more coverage than the campaigns via the synthesized fuzz drivers. Both seem to plateau at around the same time. However, in-vivo fuzzing has the capability to cover substantially more code before reaching that plateau. It is interesting to note that AFLLIVE consistently achieves more initial coverage than AFL via the synthesized fuzz drivers when the campaign is started. We find that a synthesized fuzz driver only exercises a handful of API functions in a rather shallow manner, while a host application often interacts with a library via a complex series of API function calls. The synthesized fuzz drivers do not seem to be able to mimic these complex interactions.

The case of libgsm seems pathological, since there is little coverage increase over time for both fuzzers. Upon closer inspection, we found that the encoding and decoding routines in this library consisted almost entirely of sequential blocks of instructions with no control flow statements. This explains why neither fuzzer was able to increase coverage substantially via the selected amplifier points.

Bug finding results. Our in-vivo fuzzer AFLLIVE found seven previously unknown crashes in htslib (cf. Fig. 3f). Five were memory safety bugs: a null pointer dereference and two out-of-memory errors within cram/cram_encode.c, as well as a heap overflow in header.c and a use-after-free in md5.c. The remaining two crashes were assertion violations in cram/cram_io.c and cram/cram_codecs.c. AFLLIVE found these seven bugs despite the Fudge-synthesized (and later manually adjusted) driver (https://github.com/samtools/htslib/commits/develop/test/fuzz/hts_open_fuzzer.c) having continuously fuzzed the library for four years (https://github.com/google/oss-fuzz/commit/af319543). No further crashes were reported by other campaigns.

False positives. All of the reported crashes were true positives, which could be reproduced after the campaigns had finished. Moreover, manual inspection revealed that all seven crashes were reproducible via the host program by providing an appropriate system-level input which, we confirmed, could be under attacker control.

V. AMPLIFYING THE PROGRAM'S MANUAL TEST SUITE

AFLLIVE can amplify any execution, including one that is generated by a manually constructed test suite. Test cases often cover certain edge cases or are designed to catch regressions similar to previously discovered bugs. Over 40% of 0-days exploited in-the-wild are variants of previously discovered vulnerabilities [25]. Amplifying test cases allows us to search their "neighborhood" and bring to light new bugs or those that have been incompletely fixed. Since test suites are designed with code coverage in mind, amplifying test executions might allow us to reach deep into the code, effectively rendering every function amenable to fuzzing.


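The "neighborhood" search described above amounts to mutating the concrete values that a test case already passes to an amplifier point. The following self-contained sketch illustrates the idea on a toy target; the parse_header function and its validity rule are invented for illustration and are not part of any library studied here. Starting from the valid input exercised by a test, single-byte mutants probe inputs just beyond the edge cases the test was written for.

```c
#include <stddef.h>
#include <string.h>

/* Toy amplifier point (invented): expects a 4-byte magic followed by a
   length byte that must not exceed the remaining payload. */
static int parse_header(const unsigned char *buf, size_t n) {
    if (n < 5 || memcmp(buf, "IMG0", 4) != 0) return -1;
    if ((size_t)buf[4] > n - 5) return -1;  /* length out of bounds */
    return 0;
}

/* Explore the one-byte neighborhood of a valid test-case input and
   count how many mutants the parser rejects: these rejections trace
   the validity boundary that amplified fuzzing probes for bugs. */
static int count_rejected_neighbors(const unsigned char *seed, size_t n) {
    unsigned char mutant[32];
    int rejected = 0;
    if (n > sizeof mutant) return -1;
    for (size_t i = 0; i < n; i++) {
        memcpy(mutant, seed, n);
        mutant[i] ^= 0x01;  /* minimal single-byte mutation */
        if (parse_header(mutant, n) != 0) rejected++;
    }
    return rejected;
}
```

For a seed consisting of the magic "IMG0", a length byte, and two payload bytes, every mutant of the magic or of the length byte is rejected, while payload mutants remain valid; it is exactly this boundary between accepted and rejected neighbors that amplification explores around an already-valid execution.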
Library  Type               #LOC  Version  Initial %Coverage  AP  M.#C.
openssl  Cryptography       1M    3.0.6    60%                93  2
libxml2  Parsing            308k  2.10.3   61%                14  2
opus     Speech compressor  80k   1.3.1    93%                 9  2

TABLE IV: Information about libraries and manual test suites.

Fig. 4: Coverage and bugs in test amplification campaigns: (a) openssl, (b) libxml2, (c) opus, and (d) the bugs found in OpenSSL: a buffer overflow (CVE-2022-3602), a buffer overflow (PR 19166), a use-after-free (CVE-[blinded]), and a denial of service (PR [blinded]).

Distributing energy. Ideally, we would like to fuzz every amplifier point that is executed by the test suite for the same amount of time. However, some amplifier points are executed by a large number of test cases, while other amplifier points are executed by just a single test case. So, how much "energy" do we assign to each test case to achieve this objective?

Algorithm 4 Test amplification
Input: Test suite S
Input: Amplifier points F, Types T, Constraints C, Time t0
 1: Map test2func = ∅
 2: Set funcs = ∅
 3: for Test s ∈ S do
 4:   test2func[s] = get_exec_amplifiers(s, F)
 5:   funcs = funcs ∪ test2func[s]
 6: end for
 7: executed = |funcs|
 8: fuzzed_funcs = ∅
 9: while not aborted do
10:   for s in S do
11:     unfuzzed = |test2func[s] − fuzzed_funcs|
12:     if unfuzzed > 0 then
13:       Time budget t1 = unfuzzed/executed
14:       fuzz(F, T, C, exec(s), t0, t1)
15:       fuzzed_funcs = fuzzed_funcs ∪ test2func[s]
16:     end if
17:   end for
18: end while

Algorithm 4 illustrates our algorithm to distribute the available energy evenly over the amplifier points executed by the test suite S. In Lines 1–7, it finds the amplifier functions executed by each test case s ∈ S and counts how many amplifiers are executed in total. In Lines 8–18, it skips test cases that execute no unfuzzed amplifier point (Line 12). Otherwise, it computes the proportion of all executed amplifiers that are executed by test case s and still unfuzzed as the time budget t1 for s (Line 13), and starts a corresponding fuzzing campaign (Line 14). Specifically, the function fuzz implements the proposed in-vivo fuzzing approach as defined in Algorithm 1.

A. Experimental Setup

Table IV shows the selected libraries, the corresponding test suite coverage, and the number of executed amplifier points (AP). We randomly chose security-critical, well-fuzzed (5+ years; a 2016 OSS-Fuzz commit already contains OpenSSL and LibXML2: https://github.com/google/oss-fuzz/commit/a143b9b3), and widely used open-source C libraries from diverse domains. For test amplification, no host or host input is needed, as all libraries had test suites and testing frameworks readily available. Like for the other experiments, the amplifier points were auto-identified using our tool (§ II-A) and manually constrained afterwards.

State of the art. There exists a fuzz driver generator specific to test amplification, called UTopia [6]. Given a library-under-test and its gtest or boost test suite, UTopia first performs a lightweight static analysis before synthesizing fuzz drivers for the tested library functions. The static analysis is used to identify the precondition of every library function. For every test case, the synthesis first identifies the library functions used in the test case and the constants used as parameters in a corresponding function call, and then generates a fuzz driver for the library functions by rendering the constant library function call parameters subject to fuzzing. For our experiments, we reuse the identified functions and preconditions as amplifier points and constraints using a straightforward translation, to ensure the fairness of the comparison. This demonstrates the versatility of our in-vivo approach, which allows diverse means of automatic amplifier point identification and requires no specific test framework.

Unfortunately, despite several months of experimentation, we realized that on the UTopia benchmark programs, using the UTopia-identified amplifier points and constraints, all crashing inputs generated by UTopia (and by our AFLLIVE) only reveal false positives. Upon manual examination, we discovered that the drivers synthesized by UTopia (as well as the results of its analysis) did lead to an incorrect usage of the libraries, and thus to a large amount of spurious crashes. To be sure, we repeated the analysis by filtering inputs that did not crash on the most recent version, assuming these bugs would now be fixed, but only found that the remaining crashers were flaky, i.e., did not crash reliably when run repeatedly. Since the prototype provided by the authors is highly automated (i.e., it requires little intervention and there is not much room for misuse), we conclude that an experimental comparison would not provide much insight.
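Before turning to the results, the budget computation in Lines 10–17 of Algorithm 4 can be made concrete with a small self-contained sketch. Amplifier points are modeled as bits of a mask, and the covered sets are invented for illustration; where the sketch merely records a budget, a real implementation would invoke fuzz(F, T, C, exec(s), t0, t1).

```c
#include <stddef.h>

/* Toy model of Alg. 4: amplifier points are bits 0..31 of a mask.
   covered[i] is the set of amplifier points executed by test case i
   (values invented for illustration). */
#define N_TESTS 3
static const unsigned covered[N_TESTS] = { 0x0Fu, 0x03u, 0x30u };

static int popcount(unsigned x) {
    int n = 0;
    for (; x; x >>= 1) n += (int)(x & 1u);
    return n;
}

/* One pass over the test suite (Lines 10-17): assign each test case a
   time budget proportional to its share of still-unfuzzed amplifier
   points; skip test cases with no unfuzzed amplifier point (Line 12). */
static void one_pass(double t_total, double budget[N_TESTS]) {
    unsigned all = 0u, fuzzed = 0u;
    for (size_t i = 0; i < N_TESTS; i++) all |= covered[i];
    int executed = popcount(all);            /* Line 7: |funcs| */
    for (size_t i = 0; i < N_TESTS; i++) {
        int unfuzzed = popcount(covered[i] & ~fuzzed);
        if (unfuzzed == 0) { budget[i] = 0.0; continue; }  /* Line 12 */
        budget[i] = t_total * (double)unfuzzed / (double)executed; /* Line 13 */
        fuzzed |= covered[i];                /* Line 15 */
        /* a real implementation would start fuzz(F, T, C, exec(s), t0, t1) here */
    }
}
```

On this toy suite, the second test case receives no budget because the first one already covers all of its amplifier points, mirroring the skip in Line 12, and the assigned budgets sum to the total time, so the available energy is spread evenly over the amplifier points.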
           Inferred config.            Curated config.
Subject    Cov. (#LOC)  T.P.  F.P.    Cov. (#LOC)  T.P.  F.P.
boringssl    40027.95    -     4        39511.95    -     -
bzip2            -       -     -         1096.10    -     -
libass        6553.40    -     -         6168.25    -     -
libexif          -       -     -         2174.95    -     -
htslib        5723.00    -     -         8504.50    7     -
leptonica     3178.85    -     -         5255.75    -     -
libaom       11442.00    -     -        38261.55    -     -
libgsm           -       -     -         1218.40    -     -
libvpx        8203.00    -     -         9565.80    -     -
libxml2      59681.40    -     -        59235.95    -     -
openssl     135903.20    -     4       136162.90    4     -
opus         18804.95    -     -        16642.00    -     -

TABLE V: Coverage and bugs in fully automated campaigns.

B. Experimental Results

Presentation. Figure 4 shows the average coverage over time and the bugs found during test amplification. The vertical dashed lines indicate a change in test case during the fuzzing campaign (Line 14 of Alg. 4). The horizontal dashed line indicates the initial coverage of the library's test suite. We only measure coverage of the library.

Coverage results. AFLLIVE achieved an increase over the manual test suite of around 600 LoC for openssl and over 300 LoC for libxml2. No increase in coverage was achieved for opus. Closer inspection revealed that the manual test suite is of very high quality and nearly saturated, covering almost 95% of the lines of code in opus. There are five test cases that exercise all of the amplifier points selected for opus (which explains why all of the time budget was invested into one test case). The latter is true also for LibXML2, where the first test case already exercises all but one amplifier point.

For openssl, we see that switching test cases to exercise new amplifier points is effective: after saturation, code coverage increases again when the next test case is fuzzed. This highlights that, using our approach, once the amplifier points have been identified and their constraints correctly specified, the user is able to set up several fuzzing campaigns with little extra effort. Towards the end of the 24-hour campaign, it is also interesting to note that coverage saturates despite switching to test cases that exercise new amplifier points.

Bug finding results. AFLLIVE discovers 4 bugs in openssl, two of which had previously been found only by manual auditing, including the high-severity PunyCode vulnerability (CVE-2022-3602), and two of which had not previously been known, including a moderate-severity use-after-free (CVE-[blinded]). In terms of false positives, no false positive crashes were reported.

VI. SEMI-AUTOMATED IDENTIFICATION OF AMPLIFIER POINTS AND CONSTRAINTS

The two main concepts of in-vivo fuzzing are amplifier points (APs) and amplifier constraints (ACs). While APs identify interesting functions, the purpose of ACs is to make implicit function preconditions explicit, just like user-defined preconditions in property-based testing (PBT) [18], [19] or user-defined repOK methods in search-based software testing (SBST) [26], [27].

In general, ACs can be written manually to reduce false positives, but they do not need to be added. In terms of manual effort, there is a tradeoff between specifying ACs and going through the false positives. For instance, suppose AFLLIVE finds a possible null-pointer dereference on a function parameter, but that function is never called with a null pointer. This is a false positive. We allow users of in-vivo fuzzing to encode this implicit assumption explicitly.

A. Semi-Automatic Identification

For our experiments, we used a semi-automated approach. An initial set of APs/ACs was first automatically identified and then manually refined. For automation, we developed a CodeQL script to identify APs (113 LoC) and a Python script to generate ACs (274 LoC; https://anonymous.4open.science/r/afllive-598A/config_generator). For manual refinement:

• For subjects where no executed APs were identified (bzip2, libexif and libgsm), we added the main entry points of the library as APs (identified via the library's documentation).
• We added (or modified) ACs to ensure one of the following conditions:
  1) (sizeof(buf) = len ∧ len < C), which requires that the variable len determines the length of the buffer buf and that len is less than the constant C;
  2) (sizeof(buf) < C), which requires that the length of the buffer is smaller than the constant C; or
  3) (is_file(filename)), which requires that the string filename refers to a valid file into which the fuzzing input will be dumped.
These patterns account for 98% of all ACs.

As an indicator of the additional manual effort for each subject, we note that we either added or removed constraints for no more than 12% of the automatically identified amplifier points across all subjects. Even then, no more than two constraints needed to be added or removed.

In comparison to AC specification, writing a fuzz driver from scratch could take an experienced developer several hours, and the driver would need to be maintained afterwards. For instance, the driver integrated into OSS-Fuzz [1] for libass was written over the course of two days by a core developer of the project, and iterated upon several times.

B. Ablation Study

In order to study the impact of our additional manual effort to reduce false positives, we compare the effectiveness of AFLLIVE using only the auto-generated APs and ACs to the effectiveness of AFLLIVE using the manually augmented set of APs and ACs. All campaigns were run for 24 hours each on an AMD EPYC 7713P 64-Core processor with 256GB of RAM.

Coverage results. Table V shows the average coverage achieved throughout the campaign, along with false and true positives reported, for both the fully automated and manually modified configurations. For all subjects with identifiable amplifier points, coverage achieved via auto-generated APs
and ACs was on par (i.e., the same order of magnitude) with the coverage achieved through the semi-automatic approach.

For the subjects where no executed APs were identified automatically (i.e., bzip2, libexif and libgsm), the campaigns failed to run. However, after manually specifying 6 amplifier points and 7 constraints across the three subjects, the campaigns ran and managed to increase coverage significantly over the original execution (see Figure 2, Figure 3).

For some subjects the automatically inferred constraints led to a higher code coverage, such as in the case of boringssl, libass, libxml2 and opus. This can be attributed to the fact that we were overly conservative when manually modifying constraints in an effort to prevent a high false-positive rate.

Bug finding results. Given only the automatically inferred constraints, AFLLIVE failed to find the previously discovered bugs. Expectedly, this also led to a higher number of false positives for two of the subjects (boringssl and openssl), which were also the most complex subjects that we analyzed. Still, no more than five false positives were reported in each case, and these could thus be triaged in a reasonable amount of time (less than a few hours).

VII. RELATED WORK

Automatic unit-level testing. Long before fuzzing entered the stage, the software engineering community studied automatic approaches for unit-level test generation [18], [26], [28]–[30]. Examples of a unit are Java objects or C functions. One major research challenge of automatic unit-level testing has been to minimize the number of false positives, i.e., bugs that only appear during automatic testing, but never in production when the unit is properly used. There are two approaches to tackle this problem: (a) to let the user specify conditions representing the valid usage of that unit [18], [26], and (b) to observe how the unit is used, e.g., during system-level testing, and to enforce the inferred protocol during unit-level testing [31]. For instance, the approach taken by the Daisy [23] fuzz driver generator represents Approach (b), while our AFLLIVE takes Approach (a) to minimize the number of false positives during in-vivo fuzzing.

Valid calling context. Another major research challenge of automatic unit-level testing has been to generate a valid sequence of API calls and construct the required objects to pass in as parameters to these calls. Given the preconditions (called contract), Randoop [29] constructs the sequence of API calls and objects in a feedback-directed manner, continuously evolving test cases that do not violate the user-provided contract. JQF [19] and CGPT [32] add coverage guidance. However, fundamentally these tools follow a generational approach where the API calls and objects are generated out of thin air and validated only against a user-provided specification. In contrast, ours is a mutational approach, where we piggyback on a valid sequence of API calls that are passed valid objects. Like the mutational approach on the system level [3], [7], [20], this allows us to reach much deeper into the code. Staying within the neighborhood of a valid program state, there is a low risk of false positives.

Selective symbolic execution. S2E [33] first introduced the "in-vivo approach" for symbolic execution by injecting a symbolic executor into a program binary that would activate whenever an "expansion point" is reached and collapse the symbolic state (corsetting) whenever symbolic execution becomes impractical, e.g., for library calls. In contrast, our coverage-guided in-vivo fuzzer does not require the symbolic execution machinery for tracking and solving symbolic states. Our approach is coverage-guided and works even for deployed binaries using actual executions, if non-interference between shadow and original execution is guaranteed by the snapshotting mechanism.

Snapshot fuzzing. The first lightweight snapshot-restore mechanism in fuzzing was the AFL fork server [20]. It would allow the fuzzer to skip the expensive execution prefix during repeated execution of the same program with different inputs. Snappy [34] further explored how to place the fork server as late as possible in the execution of the program. Nyx [14], [35] introduced a proper Virtual Machine (VM)-based snapshot-restore mechanism. In contrast, we relax the constraint that the fuzzer must produce a system-level input and instead propose to use the snapshot-restore mechanism to amplify an original execution at user-specified concrete amplifier points and constraints to generate shadow executions.

In-vivo fuzzing in production. Our long-term vision, assuming several technical challenges are tackled, is to integrate in-vivo fuzzing into the production system, so as to fuzz the entire supply chain of a software system, including all of its dependencies. The idea of integrating bug finding into production is not far-fetched. For instance, Google is running a no-overhead version of AddressSanitizer [36] on every Android 11 phone and every Chrome browser [37], [38]. Apart from bug finding, Google has long been running Google-Wide Profiling (GWP), which conducts lightweight program analysis across entire fleets of machines [39]. Mozilla implemented the approach for Firefox [40]. The open source community implemented the approach for the Linux kernel [41].

VIII. CONCLUSION

A. Perspective

Existing fuzzers are designed to test a software system in-vitro, i.e., under artificial lab conditions. However, the effectiveness of in-vitro fuzzing is limited [42]. It is these limitations which we sought to address in this paper.

Solving the dependency on fuzz driver quality. A fuzzer must first be "glued" to the software via fuzz drivers. Typically, fuzz drivers are tediously developed and continuously updated over months. For instance, Google pays up to 20k USD for fuzz drivers of critical open source software [1], [43]. To reduce some manual effort, recent research has focused on generating drivers automatically [4], [5], [44]. Whenever a security flaw was found by manual auditing, the developer would add a new fuzz driver through which the fuzzer is able to find the security flaw. While the drivers can be improved over time, this dependency
on driver quality cannot be avoided. OpenSSL has 16 drivers in OSS-Fuzz which have been continuously fuzzed 24/7 over the past six (6) years [45]. In contrast, in-vivo fuzzing eliminates the need for fuzz drivers entirely. Just by amplifying the developer test suite, our in-vivo prototype found a critical bug in "unfuzzed" code of OpenSSL (CVE-2023-0215).

Solving structure-aware fuzzing. A fuzzer's effectiveness depends critically on the quality of the initial seed corpus [46]. For instance, if we are fuzzing a PNG image library, inputs that were generated by mutating valid PNG image files will reach more deeply into the library than a random string of bytes. However, valid input structures are easily broken and new input structures are difficult to generate by chance. For instance, if none of the seed images contains an optional eXIf chunk specifying some metadata, it will hardly be generated. Recent work, including ours [8], has addressed this by using (or learning) the input structure and "inventing" the missing data chunks [47]–[50]. However, the critical dependence on initial seeds remains. In contrast, in-vivo fuzzing allows us to define as amplifier point that function in the parser which handles an interesting data chunk, or to set amplifier points deep in the program functionality to entirely skip the parser.

Solving stateful fuzzing. Some software systems require

fuzz-driver generation in terms of both code coverage and bug finding. Providing empirical evidence is the discovery of seven (7) previously unknown vulnerabilities in htslib, even as this library has been continually fuzzed using synthetic fuzz drivers for seven (7) years as of the time of writing. This not only suggests that execution amplification is effective, but also that real-world applications do indeed interact with libraries in ways that are not properly captured by existing fuzz drivers. Moreover, through test amplification we re-discover a high-severity vulnerability in openssl and also uncover a novel moderate-severity vulnerability, both of which had not been found through fuzzing before. Apart from the vulnerabilities, we find a known bug and a novel one, as well.

We should note that the effectiveness of our approach depends crucially on the choice of amplifier points and constraints. If we choose the wrong amplifiers, we might get false positive crashes; but given the flexibility of our approach, we did not find this to be an obstacle. For our experiments, we developed a CodeQL script to come up with an initial amplifier set.15 Via an interactive process, we refined the constraints (i.e., preconditions) for every function as follows: Whenever a constraint was incorrectly specified, the fuzzer would fail within a few seconds, and the constraint would need
inputs in a certain order. For instance, the Transmission Con- an obvious adjustment. Overall, the amplifier identification
trol Protocol (TCP) requires a three-way handshake between process took no more than a few hours for every library.
client and server before data can actually be sent. Without There are still several interesting socio-technical challenges
knowing precisely the implemented protocol, it is difficult ahead of us. Considering that the largest continuous fuzzing
for a fuzzer to generate the right sequence of packets with platform, OSS-Fuzz [1], which fuzzes over 1000 open source
the correct structure. Recent work, including ours [12], has projects on 100k machines 24/7, is nothing but a collection of
used mutational, feedback-direct fuzzing that uses response manually generated fuzz drivers, we are truely excited about
codes, state variables, or human annotations to identify and the prospect that in-vivo fuzzing enables fuzzing for every
leverage the sequence of software states for a sequence of library that is used and compiled in a production environment.
inputs/packets [14], [51]. However, these approaches heavily
depend on the recorded sequences of packets that are used to
seed the mutational fuzzers. In contrast, in-vivo fuzzing allows R EFERENCES
us to define as amplifier point that function which handles a
certain state or state transition. [1] K. Serebryany, “OSS-Fuzz - google’s continuous fuzzing service for
open source software,” in USENIX Security. Vancouver, BC: USENIX
Association, Aug. 2017.
B. Paper Summary
[2] OSS-Fuzz, “Integration rewards,” https://google.github.io/oss-fuzz/
Our approach allows the user to fuzz a library within the getting-started/integration-rewards/, 2021, accessed: 2023-01-11.
context of a host application by exploring the neighborhood [3] LLVM, “Libfuzzer,” https://llvm.org/docs/LibFuzzer.html, accessed:
2023-01-11.
of a valid program state induced by an actual host-generated [4] K. Ispoglou, D. Austin, V. Mohan, and M. Payer, “FuzzGen:
execution of that library. We do so by applying coverage- Automatic fuzzer generation,” in 29th USENIX Security Symposium
guided mutation-based fuzzing on the arguments of each (USENIX Security 20). USENIX Association, Aug. 2020, pp.
2271–2287. [Online]. Available: https://www.usenix.org/conference/
function marked as an amplifier point, subject to a set of user- usenixsecurity20/presentation/ispoglou
specified constraints. By using real-world programs, we can [5] D. Babić, S. Bucur, Y. Chen, F. Ivančić, T. King, M. Kusano,
leverage our approach to fuzz the library within a production- C. Lemieux, L. Szekeres, and W. Wang, “Fudge: Fuzz driver generation
at scale,” in Proceedings of the 2019 27th ACM Joint Meeting on
like usage context. Conversely, we can use a test-suite as a European Software Engineering Conference and Symposium on the
host to explore variants of regression tests and corner cases Foundations of Software Engineering, ser. ESEC/FSE 2019. New
identified by the developers. In contrast to a fuzz-driver based York, NY, USA: Association for Computing Machinery, 2019, p.
975–985. [Online]. Available: https://doi.org/10.1145/3338906.3340456
approach, selected amplifier points need not be part of the API,
[6] B. Jeong, J. Jang, H. Yi, J. Moon, J. Kim, I. Jeon, T. Kim, W. Shim, and
implying that our approach can reach deeper into the code. Y. H. Hwang, “Utopia: Automatic generation of fuzz driver using unit
In our experiments we manage to increase coverage sig- tests,” in 2023 IEEE Symposium on Security and Privacy (SP), 2023,
nificantly over non-amplified executions, indicating that am- pp. 2676–2692.
plification is indeed effective. Furthermore, we manage to
outperform existing state-of-the-art approaches for automated 15 https://anonymous.4open.science/r/afllive-598A/README.md#config-file-1
[7] M. Böhme, V.-T. Pham, and A. Roychoudhury, “Coverage-based greybox fuzzing as markov chain,” in Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, ser. CCS ’16. New York, NY, USA: Association for Computing Machinery, 2016, p. 1032–1043. [Online]. Available: https://doi.org/10.1145/2976749.2978428
[8] V.-T. Pham, M. Böhme, A. E. Santosa, A. R. Căciulescu, and A. Roychoudhury, “Smart greybox fuzzing,” IEEE Transactions on Software Engineering, vol. 47, no. 9, pp. 1980–1997, 2021.
[9] C. Aschermann, T. Frassetto, T. Holz, P. Jauernig, A.-R. Sadeghi, and D. Teuchert, “Nautilus: Fishing for deep bugs with grammars,” in NDSS, 2019.
[10] J. Wang, B. Chen, L. Wei, and Y. Liu, “Superion: Grammar-aware greybox fuzzing,” in 2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE). IEEE, 2019, pp. 724–735.
[11] P. Srivastava and M. Payer, “Gramatron: Effective grammar-aware fuzzing,” in Proceedings of the 30th ACM SIGSOFT International Symposium on Software Testing and Analysis, ser. ISSTA 2021. New York, NY, USA: Association for Computing Machinery, 2021, p. 244–256. [Online]. Available: https://doi.org/10.1145/3460319.3464814
[12] V.-T. Pham, M. Böhme, and A. Roychoudhury, “Aflnet: A greybox fuzzer for network protocols,” in 2020 IEEE 13th International Conference on Software Testing, Validation and Verification (ICST). IEEE, 2020, pp. 460–465.
[13] X. Feng, R. Sun, X. Zhu, M. Xue, S. Wen, D. Liu, S. Nepal, and Y. Xiang, “Snipuzz: Black-box fuzzing of iot firmware via message snippet inference,” in Proceedings of the 2021 ACM SIGSAC Conference on Computer and Communications Security, ser. CCS ’21. New York, NY, USA: Association for Computing Machinery, 2021, p. 337–350. [Online]. Available: https://doi.org/10.1145/3460120.3484543
[14] S. Schumilo, C. Aschermann, A. Jemmett, A. Abbasi, and T. Holz, “Nyx-net: Network fuzzing with incremental snapshots,” in Proceedings of the Seventeenth European Conference on Computer Systems, ser. EuroSys ’22. New York, NY, USA: Association for Computing Machinery, 2022, p. 166–180. [Online]. Available: https://doi.org/10.1145/3492321.3519591
[15] H. Gascon, C. Wressnegger, F. Yamaguchi, D. Arp, and K. Rieck, “Pulsar: Stateful black-box fuzzing of proprietary network protocols,” in Security and Privacy in Communication Networks: 11th EAI International Conference, SecureComm 2015, Dallas, TX, USA, October 26-29, 2015, Proceedings 11. Springer, 2015, pp. 330–347.
[16] C. Lattner and V. Adve, “LLVM: A compilation framework for lifelong program analysis and transformation,” in CGO, San Jose, CA, USA, Mar 2004, pp. 75–88.
[17] GitHub, “CodeQL,” https://codeql.github.com/, 2021, accessed: 2023-01-11.
[18] K. Claessen and J. Hughes, “Quickcheck: A lightweight tool for random testing of haskell programs,” in Proceedings of the Fifth ACM SIGPLAN International Conference on Functional Programming, ser. ICFP ’00. New York, NY, USA: Association for Computing Machinery, 2000, p. 268–279. [Online]. Available: https://doi.org/10.1145/351240.351266
[19] R. Padhye, C. Lemieux, and K. Sen, “Jqf: Coverage-guided property-based testing in java,” in Proceedings of the 28th ACM SIGSOFT International Symposium on Software Testing and Analysis, ser. ISSTA 2019. New York, NY, USA: Association for Computing Machinery, 2019, p. 398–401. [Online]. Available: https://doi.org/10.1145/3293882.3339002
[20] A. Fioraldi, D. Maier, H. Eißfeldt, and M. Heuse, “Afl++: Combining incremental steps of fuzzing research,” in Proceedings of the 14th USENIX Conference on Offensive Technologies, ser. WOOT’20. USA: USENIX Association, 2020.
[21] M. Weiser, “Program slicing,” in Proceedings of the 5th International Conference on Software Engineering, ser. ICSE ’81. IEEE Press, 1981, p. 439–449.
[22] M. Zhang, J. Liu, F. Ma, H. Zhang, and Y. Jiang, “Intelligen: Automatic driver synthesis for fuzz testing,” in 2021 IEEE/ACM 43rd International Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP), 2021, pp. 318–327.
[23] M. Zhang, C. Zhou, J. Liu, M. Wang, J. Liang, J. Zhu, and Y. Jiang, “Daisy: Effective fuzz driver synthesis with object usage sequence analysis,” in 2023 IEEE/ACM 45th International Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP), 2023, pp. 87–98.
[24] H. Green and T. Avgerinos, “Graphfuzz: Library api fuzzing with lifetime-aware dataflow graphs,” in Proceedings of the 44th International Conference on Software Engineering, ser. ICSE ’22. New York, NY, USA: Association for Computing Machinery, 2022, p. 1070–1081. [Online]. Available: https://doi.org/10.1145/3510003.3510228
[25] M. Stone, “The ups and downs of 0-days: Our review of 0-days exploited in-the-wild in 2022,” July 2023, accessed: 2023-01-11.
[26] C. Boyapati, S. Khurshid, and D. Marinov, “Korat: Automated testing based on java predicates,” in Proceedings of the 2002 ACM SIGSOFT International Symposium on Software Testing and Analysis, ser. ISSTA ’02. New York, NY, USA: Association for Computing Machinery, 2002, p. 123–133. [Online]. Available: https://doi.org/10.1145/566172.566191
[27] D. Marinov and S. Khurshid, “Testera: A novel framework for automated testing of java programs,” in Proceedings 16th Annual International Conference on Automated Software Engineering (ASE 2001), 2001, pp. 22–31.
[28] G. Fraser and A. Arcuri, “Evolutionary generation of whole test suites,” in International Conference On Quality Software (QSIC). IEEE Computer Society, 2011, pp. 31–40.
[29] C. Pacheco, S. K. Lahiri, M. D. Ernst, and T. Ball, “Feedback-directed random test generation,” in ICSE 2007, Proceedings of the 29th International Conference on Software Engineering, Minneapolis, MN, USA, May 2007, pp. 75–84.
[30] P. McMinn, “Search-based software test data generation: A survey,” Software Testing, Verification and Reliability, vol. 14, no. 2, pp. 105–156, 2004. [Online]. Available: https://onlinelibrary.wiley.com/doi/abs/10.1002/stvr.294
[31] S. Elbaum, H. N. Chin, M. B. Dwyer, and J. Dokulil, “Carving differential unit test cases from system test cases,” in Proceedings of the 14th ACM SIGSOFT International Symposium on Foundations of Software Engineering, ser. SIGSOFT ’06/FSE-14. New York, NY, USA: Association for Computing Machinery, 2006, p. 253–264. [Online]. Available: https://doi.org/10.1145/1181775.1181806
[32] L. Lampropoulos, M. Hicks, and B. C. Pierce, “Coverage guided, property based testing,” Proc. ACM Program. Lang., vol. 3, no. OOPSLA, Oct 2019. [Online]. Available: https://doi.org/10.1145/3360607
[33] V. Chipounov, V. Kuznetsov, and G. Candea, “S2e: A platform for in-vivo multi-path analysis of software systems,” in Proceedings of the Sixteenth International Conference on Architectural Support for Programming Languages and Operating Systems. New York, NY, USA: Association for Computing Machinery, 2011, p. 265–278. [Online]. Available: https://doi.org/10.1145/1950365.1950396
[34] E. Geretto, C. Giuffrida, H. Bos, and E. Van Der Kouwe, “Snappy: Efficient fuzzing with adaptive and mutable snapshots,” in Proceedings of the 38th Annual Computer Security Applications Conference, ser. ACSAC ’22. New York, NY, USA: Association for Computing Machinery, 2022, p. 375–387. [Online]. Available: https://doi.org/10.1145/3564625.3564639
[35] S. Schumilo, C. Aschermann, A. Abbasi, S. Wörner, and T. Holz, “Nyx: Greybox hypervisor fuzzing using fast snapshots and affine types,” in 30th USENIX Security Symposium (USENIX Security 21). USENIX Association, Aug. 2021, pp. 2597–2614. [Online]. Available: https://www.usenix.org/conference/usenixsecurity21/presentation/schumilo
[36] K. Serebryany, D. Bruening, A. Potapenko, and D. Vyukov, “Addresssanitizer: A fast address sanity checker,” in Proceedings of the 2012 USENIX Conference on Annual Technical Conference, ser. USENIX ATC’12. USA: USENIX Association, 2012, p. 28.
[37] M. Morehouse, M. Phillips, and K. Serebryany, “Crowdsourced bug detection in production: Gwp-asan and beyond,” in Proceedings of the C++ Russia, 2020.
[38] V. Tsyrklevich, “Gwp-asan: Sampling heap memory error detection in-the-wild,” https://www.chromium.org/Home/chromium-security/articles/gwp-asan, accessed: 2023-01-11.
[39] G. Ren, E. Tune, T. Moseley, Y. Shi, S. Rus, and R. Hundt, “Google-wide profiling: A continuous profiling infrastructure for data centers,” IEEE Micro, pp. 65–79, 2010. [Online]. Available: http://www.computer.org/portal/web/csdl/doi/10.1109/MM.2010.68
[40] C. Holler, “Phc (probabilistic heap checker): A port of chromium’s gwp-asan project to firefox,” https://bugzilla.mozilla.org/show_bug.cgi?id=1523268, 2021, accessed: 2023-01-11.
[41] L. K. Developers, “Kernel electric-fence (kfence),” https://www.kernel.org/doc/html/latest/dev-tools/kfence.html, 2021, accessed: 2023-01-11.
[42] M. Böhme, C. Cadar, and A. Roychoudhury, “Fuzzing: Challenges and
reflections,” IEEE Software, vol. 38, no. 3, pp. 79–86, 2021.
[43] O.-F. Team, “Oss-fuzz integration awards,” https://google.github.io/
oss-fuzz/getting-started/integration-rewards/, accessed: 2023-01-11.
[44] C. Zhang, X. Lin, Y. Li, Y. Xue, J. Xie, H. Chen, X. Ying,
J. Wang, and Y. Liu, “APICraft: Fuzz driver generation for
closed-source SDK libraries,” in 30th USENIX Security Symposium
(USENIX Security 21). USENIX Association, Aug. 2021, pp.
2811–2828. [Online]. Available: https://www.usenix.org/conference/
usenixsecurity21/presentation/zhang-cen
[45] “Openssl at oss-fuzz: Commit history,” https://github.com/google/
oss-fuzz/commits/master/projects/openssl, accessed: 2023-01-11.
[46] A. Herrera, H. Gunadi, S. Magrath, M. Norrish, M. Payer, and A. L.
Hosking, “Seed selection for successful fuzzing,” in Proceedings of
the 30th ACM SIGSOFT International Symposium on Software Testing
and Analysis, ser. ISSTA 2021. New York, NY, USA: Association
for Computing Machinery, 2021, p. 230–243. [Online]. Available:
https://doi.org/10.1145/3460319.3464795
[47] W. You, X. Liu, S. Ma, D. Perry, X. Zhang, and B. Liang,
“Slf: Fuzzing without valid seed inputs,” in Proceedings of the
41st International Conference on Software Engineering, ser. ICSE
’19. IEEE Press, 2019, p. 712–723. [Online]. Available: https:
//doi.org/10.1109/ICSE.2019.00080
[48] Y. Li, B. Chen, M. Chandramohan, S.-W. Lin, Y. Liu, and
A. Tiu, “Steelix: Program-state based binary fuzzing,” in Proceedings
of the 2017 11th Joint Meeting on Foundations of Software
Engineering, ser. ESEC/FSE 2017. New York, NY, USA: Association
for Computing Machinery, 2017, p. 627–637. [Online]. Available:
https://doi.org/10.1145/3106237.3106295
[49] C. Aschermann, S. Schumilo, T. Blazytko, R. Gawlik, and T. Holz,
“Redqueen: Fuzzing with input-to-state correspondence,” in Symposium
on Network and Distributed System Security (NDSS), 2019.
[50] A. Fioraldi, D. C. D’Elia, and E. Coppa, “Weizz: automatic grey-box
fuzzing for structured binary formats,” in Proceedings of the 29th
ACM SIGSOFT International Symposium on Software Testing and
Analysis, ser. ISSTA 2020. New York, NY, USA: Association
for Computing Machinery, 2020, p. 1–13. [Online]. Available:
https://doi.org/10.1145/3395363.3397372
[51] C. Aschermann, S. Schumilo, A. Abbasi, and T. Holz, “Ijon: Exploring
deep state spaces via fuzzing,” in 2020 IEEE Symposium on Security
and Privacy, ser. S&P 2020, 2020, pp. 1597–1612.
[52] P. Godefroid, “Micro execution,” in Proceedings of the 36th
International Conference on Software Engineering, ser. ICSE 2014.
New York, NY, USA: Association for Computing Machinery, 2014, p.
539–549. [Online]. Available: https://doi.org/10.1145/2568225.2568273
[53] W. Gao, V.-T. Pham, D. Liu, O. Chang, T. Murray, and B. I.
Rubinstein, “Beyond the coverage plateau: A comprehensive study
of fuzz blockers (registered report),” in Proceedings of the 2nd
International Fuzzing Workshop, ser. FUZZING 2023. New York, NY,
USA: Association for Computing Machinery, 2023, p. 47–55. [Online].
Available: https://doi.org/10.1145/3605157.3605177
[54] C. Holler, K. Herzig, and A. Zeller, “Fuzzing with code
fragments,” in 21st USENIX Security Symposium (USENIX
Security 12). Bellevue, WA: USENIX Association, Aug. 2012,
pp. 445–458. [Online]. Available: https://www.usenix.org/conference/
usenixsecurity12/technical-sessions/presentation/holler
[55] T. Dullien, “Introducing Prodfiler,” https://prodfiler.com/blog/, 2021, accessed: 2023-01-11.
[56] Google, “Syzkaller: an unsupervised coverage-guided kernel fuzzer,”
https://github.com/google/syzkaller, 2021, accessed: 2023-01-11.
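The paper summary describes coverage-guided, mutation-based fuzzing of the arguments at an amplifier point, subject to user-specified constraints such as "sizeof(enc) = encLen and encLen < C" (Table I). The following is a minimal, self-contained Python sketch of that loop, not the actual prototype: the target function, its planted bug, and the names `amplify` and `punycode_decode` are hypothetical stand-ins, and the real implementation keeps only coverage-increasing mutants rather than all non-crashing ones.

```python
import random

C = 64  # upper bound used in the amplifier constraint (the paper's "len < C")

def punycode_decode(enc: bytes, enc_len: int) -> int:
    """Stand-in target function (NOT OpenSSL's ossl_punycode_decode).
    A bug is planted for illustration: a trailing byte with the high bit
    set simulates the out-of-bounds read a sanitizer would report in C."""
    if enc_len > 0 and enc[enc_len - 1] >= 0x80:
        raise MemoryError("simulated out-of-bounds read")
    return sum(enc[:enc_len])

def amplify(target, seed_args, constraint, iterations=2000, seed=0):
    """Mutation-based fuzzing of one amplifier point.
    seed_args are the valid arguments captured from the host execution;
    constraint is the user-specified precondition, so mutants that violate
    it are discarded instead of being reported as false-positive crashes."""
    rng = random.Random(seed)
    corpus = [seed_args]
    crashes = []
    for _ in range(iterations):
        buf, n = rng.choice(corpus)
        buf = bytearray(buf)
        op = rng.randrange(3)  # flip a byte, grow, or shrink the buffer
        if op == 0 and buf:
            buf[rng.randrange(len(buf))] = rng.randrange(256)
        elif op == 1 and len(buf) + 1 < C:
            buf.append(rng.randrange(256))
        elif op == 2 and buf:
            buf.pop()
        args = (bytes(buf), len(buf))
        if not constraint(*args):
            continue  # precondition violated: skip, do not report
        try:
            target(*args)
            corpus.append(args)  # a real fuzzer would filter by coverage here
        except MemoryError as err:
            crashes.append((args, err))
    return crashes

# The host execution reaches the amplifier point with valid arguments:
host_args = (b"abc", 3)
# Constraint in the style of Table I: sizeof(enc) = enc_len and enc_len < C.
precondition = lambda enc, enc_len: len(enc) == enc_len and enc_len < C
found = amplify(punycode_decode, host_args, precondition)
```

Because every mutant is checked against the precondition before the call, every reported crash is reachable from a state that satisfies the amplifier constraints, which is how the approach avoids the false-positive crashes discussed above.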