Binary Fuzzing with American Fuzzy Lop



Intro

The GitHub project is here: https://github.com/google/AFL

This article refers to the official documentation: https://afl-1.readthedocs.io/

What is AFL?

American fuzzy lop is a security-oriented fuzzer that employs a novel type of compile-time instrumentation and genetic algorithms to automatically discover clean, interesting test cases that trigger new internal states in the targeted binary. This substantially improves the functional coverage for the fuzzed code. The compact synthesized corpora produced by the tool are also useful for seeding other, more labor- or resource-intensive testing regimes down the road.

How to install the environment

Method 1

sudo pacman -S afl-utils
sudo pacman -S afl

Type afl- and press Tab twice; the completions are as follows:

sh: afl-
afl-analyze afl-clang-fast++ afl-fuzz afl-minimize afl-showmap afl-vcrash
afl-clang afl-cmin afl-g++ afl-multicore afl-stats afl-whatsup
afl-clang++ afl-collect afl-gcc afl-multikill afl-sync
afl-clang-fast afl-cron afl-gotcpu afl-plot afl-tmin

So you can use these tools to start fuzzing.

Method 2

git clone https://github.com/google/AFL.git
cd AFL
make -j
sudo make install

How does it work?

The following sections are adapted from the official documentation, which has all the details.

0x00 Technical

Technical “whitepaper” for afl-fuzz.

0x01 Coverage measurements

The instrumentation injected into compiled programs captures branch (edge) coverage, along with coarse branch-taken hit counts. The code injected at branch points is essentially equivalent to

cur_location = <COMPILE_TIME_RANDOM>;
shared_mem[cur_location ^ prev_location]++;
prev_location = cur_location >> 1;

The cur_location value is generated randomly to simplify the process of linking complex projects and keep the XOR output distributed uniformly.

The shared_mem[] array is a 64 KB SHM region passed to the instrumented binary by the caller. Every byte set in the output map can be thought of as a hit for a particular (branch_src, branch_dst) tuple in the instrumented code.

The size of the map is chosen so that collisions are sporadic with almost all of the intended targets, which usually sport between 2k and 10k discoverable branch points:

   Branch cnt | Colliding tuples | Example targets
   -----------+------------------+-----------------
        1,000 | 0.75%            | giflib, lzo
        2,000 | 1.5%             | zlib, tar, xz
        5,000 | 3.5%             | libpng, libwebp
       10,000 | 7%               | libxml
       20,000 | 14%              | sqlite
       50,000 | 30%              | -

At the same time, its size is small enough to allow the map to be analyzed in a matter of microseconds on the receiving end, and to effortlessly fit within L2 cache.

This form of coverage provides considerably more insight into the execution path of the program than simple block coverage. In particular, it trivially distinguishes between the following execution traces:

A -> B -> C -> D -> E (tuples: AB, BC, CD, DE)
A -> B -> D -> C -> E (tuples: AB, BD, DC, CE)

This aids the discovery of subtle fault conditions in the underlying code, because security vulnerabilities are more often associated with unexpected or incorrect state transitions than with merely reaching a new basic block.

The reason for the shift operation in the last line of the pseudocode shown earlier in this section is to preserve the directionality of tuples (without this, A ^ B would be indistinguishable from B ^ A) and to retain the identity of tight loops (otherwise, A ^ A would be obviously equal to B ^ B).
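To see this concretely, here is a toy reproduction of the tuple hash (hypothetical 16-bit location values):

```c
#include <stdint.h>

/* Edge A -> B is recorded at index cur_B ^ (cur_A >> 1). Without the
   shift it would be cur_B ^ cur_A, which is symmetric, so the edges
   A -> B and B -> A (and every tight loop A -> A vs. B -> B) would
   collapse into the same map index. */
static uint16_t edge_index(uint16_t cur, uint16_t prev) {
    return cur ^ (uint16_t)(prev >> 1);
}
```

With hypothetical locations A = 0x1234 and B = 0x5678, edge_index(B, A) and edge_index(A, B) differ, while the unshifted A ^ B and B ^ A would be identical, and every self-loop would hash to zero.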

The absence of simple saturating arithmetic opcodes on Intel CPUs means that the hit counters can sometimes wrap around to zero. Since this is a fairly unlikely and localized event, it’s seen as an acceptable performance trade-off.

0x02 Detecting new behaviors

The fuzzer maintains a global map of tuples seen in previous executions; this data can be rapidly compared with individual traces and updated in just a couple of dword- or qword-wide instructions and a simple loop.
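A sketch of that word-wide scan, assuming (as in AFL's source) a "virgin" map whose bits start out all-ones and are cleared as tuples are seen; the real code also classifies hit counts into buckets first, which is omitted here:

```c
#include <stdint.h>
#include <string.h>

#define MAP_WORDS (65536 / 8)   /* 64 KB map scanned as 64-bit words */

/* Returns nonzero if the trace contains any tuple bits still marked
   unseen in the virgin map, and clears those bits as a side effect. */
static int has_new_bits(const uint64_t *trace, uint64_t *virgin) {
    int found = 0;
    for (int i = 0; i < MAP_WORDS; i++) {
        if (trace[i] & virgin[i]) {   /* something here is still unseen */
            virgin[i] &= ~trace[i];
            found = 1;
        }
    }
    return found;
}
```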

When a mutated input produces an execution trace containing new tuples, the corresponding input file is preserved and routed for additional processing later on (see section #3). Inputs that do not trigger new local-scale state transitions in the execution trace (i.e., produce no new tuples) are discarded, even if their overall control flow sequence is unique.

This approach allows for a very fine-grained and long-term exploration of program state while not having to perform any computationally intensive and fragile global comparisons of complex execution traces, and while avoiding the scourge of path explosion.

To illustrate the properties of the algorithm, consider that the second trace shown below would be considered substantially new because of the presence of new tuples (CA, AE):

#1: A -> B -> C -> D -> E
#2: A -> B -> C -> A -> E

At the same time, with #2 processed, the following pattern will not be seen as unique, despite having a markedly different overall execution path:

#3: A -> B -> C -> A -> B -> C -> A -> B -> C -> D -> E

In addition to detecting new tuples, the fuzzer also considers coarse tuple hit counts. These are divided into several buckets:

1, 2, 3, 4-7, 8-15, 16-31, 32-127, 128+

To some extent, the number of buckets is an implementation artifact: it allows an in-place mapping of an 8-bit counter generated by the instrumentation to an 8-position bitmap relied on by the fuzzer executable to keep track of the already-seen execution counts for each tuple.
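The real implementation does this with a precomputed 256-entry lookup table; a hand-rolled sketch of the same mapping:

```c
#include <stdint.h>

/* Map a raw 8-bit hit counter onto the bucket bitmap
   1, 2, 3, 4-7, 8-15, 16-31, 32-127, 128+ (one bit per bucket). */
static uint8_t bucket_of(uint8_t hits) {
    if (hits == 0)   return 0;
    if (hits == 1)   return 1 << 0;
    if (hits == 2)   return 1 << 1;
    if (hits == 3)   return 1 << 2;
    if (hits <= 7)   return 1 << 3;
    if (hits <= 15)  return 1 << 4;
    if (hits <= 31)  return 1 << 5;
    if (hits <= 127) return 1 << 6;
    return 1 << 7;
}
```

A loop going from 47 to 48 iterations stays inside the 32-127 bucket and is ignored, while a block going from one hit to two crosses a bucket boundary and is flagged.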

Changes within the range of a single bucket are ignored; transition from one bucket to another is flagged as an interesting change in program control flow, and is routed to the evolutionary process outlined in the section below.

The hit count behavior provides a way to distinguish between potentially interesting control flow changes, such as a block of code being executed twice when it was normally hit only once. At the same time, it is fairly insensitive to empirically less notable changes, such as a loop going from 47 cycles to 48. The counters also provide some degree of “accidental” immunity against tuple collisions in dense trace maps.

The execution is policed fairly heavily through memory and execution time limits; by default, the timeout is set at 5x the initially-calibrated execution speed, rounded up to 20 ms. The aggressive timeouts are meant to prevent dramatic fuzzer performance degradation by descending into tarpits that, say, improve coverage by 1% while being 100x slower; we pragmatically reject them and hope that the fuzzer will find a less expensive way to reach the same code. Empirical testing strongly suggests that more generous time limits are not worth the cost.

0x03 Evolving the input queue

Mutated test cases that produced new state transitions within the program are added to the input queue and used as a starting point for future rounds of fuzzing. They supplement, but do not automatically replace, existing finds.

In contrast to more greedy genetic algorithms, this approach allows the tool to progressively explore various disjoint and possibly mutually incompatible features of the underlying data format, as shown in this image:

http://lcamtuf.coredump.cx/afl/afl_gzip.png

Several practical examples of the results of this algorithm are discussed here:

http://lcamtuf.blogspot.com/2014/11/pulling-jpegs-out-of-thin-air.html http://lcamtuf.blogspot.com/2014/11/afl-fuzz-nobody-expects-cdata-sections.html

The synthetic corpus produced by this process is essentially a compact collection of “hmm, this does something new!” input files, and can be used to seed any other testing processes down the line (for example, to manually stress-test resource-intensive desktop apps).

With this approach, the queue for most targets grows to somewhere between 1k and 10k entries; approximately 10-30% of this is attributable to the discovery of new tuples, and the remainder is associated with changes in hit counts.

The following table compares the relative ability to discover file syntax and explore program states when using several different approaches to guided fuzzing. The instrumented target was GNU patch 2.7.3 compiled with -O3 and seeded with a dummy text file; the session consisted of a single pass over the input queue with afl-fuzz:

   Fuzzer guidance | Blocks  | Edges   | Edge hit | Highest-coverage
   strategy used   | reached | reached | cnt var  | test case generated
   ----------------+---------+---------+----------+---------------------------
   (Initial file)  | 156     | 163     | 1.00     | (none)
   Blind fuzzing S | 182     | 205     | 2.23     | First 2 B of RCS diff
   Blind fuzzing L | 228     | 265     | 2.23     | First 4 B of -c mode diff
   Block coverage  | 855     | 1,130   | 1.57     | Almost-valid RCS diff
   Edge coverage   | 1,452   | 2,070   | 2.18     | One-chunk -c mode diff
   AFL model       | 1,765   | 2,597   | 4.99     | Four-chunk -c mode diff

The first entry for blind fuzzing (“S”) corresponds to executing just a single round of testing; the second set of figures (“L”) shows the fuzzer running in a loop for a number of execution cycles comparable with that of the instrumented runs, which required more time to fully process the growing queue.

Roughly similar results have been obtained in a separate experiment where the fuzzer was modified to compile out all the random fuzzing stages and leave just a series of rudimentary, sequential operations such as walking bit flips. Because this mode would be incapable of altering the size of the input file, the sessions were seeded with a valid unified diff:

   Queue extension | Blocks  | Edges   | Edge hit | Number of unique
   strategy used   | reached | reached | cnt var  | crashes found
   ----------------+---------+---------+----------+------------------
   (Initial file)  | 624     | 717     | 1.00     | -
   Blind fuzzing   | 1,101   | 1,409   | 1.60     | 0
   Block coverage  | 1,255   | 1,649   | 1.48     | 0
   Edge coverage   | 1,259   | 1,734   | 1.72     | 0
   AFL model       | 1,452   | 2,040   | 3.16     | 1

As noted earlier, some of the prior work on genetic fuzzing relied on maintaining a single test case and evolving it to maximize coverage. At least in the tests described above, this "greedy" approach appears to confer no substantial benefits over blind fuzzing strategies.

0x04 Culling the corpus

The progressive state exploration approach outlined above means that some of the test cases synthesized later on in the game may have edge coverage that is a strict superset of the coverage provided by their ancestors.

To optimize the fuzzing effort, AFL periodically re-evaluates the queue using a fast algorithm that selects a smaller subset of test cases that still cover every tuple seen so far, and whose characteristics make them particularly favorable to the tool.

The algorithm works by assigning every queue entry a score proportional to its execution latency and file size; and then selecting lowest-scoring candidates for each tuple.

The tuples are then processed sequentially using a simple workflow:

  1. Find next tuple not yet in the temporary working set,
  2. Locate the winning queue entry for this tuple,
  3. Register all tuples present in that entry’s trace in the working set,
  4. Go to #1 if there are any missing tuples in the set.
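The workflow above amounts to a greedy set cover. A simplified sketch (hypothetical 64-tuple map; score stands in for execution latency times file size):

```c
#include <stdbool.h>
#include <string.h>

#define MAP_SIZE 64

typedef struct {
    unsigned char trace[MAP_SIZE]; /* tuples hit by this entry */
    unsigned score;                /* exec latency x file size; lower wins */
    bool favored;
} entry_t;

static void cull(entry_t *q, int n) {
    unsigned char covered[MAP_SIZE] = {0};
    for (int t = 0; t < MAP_SIZE; t++) {
        if (covered[t]) continue;              /* 1. next uncovered tuple  */
        int best = -1;
        for (int i = 0; i < n; i++)            /* 2. lowest-scoring winner */
            if (q[i].trace[t] && (best < 0 || q[i].score < q[best].score))
                best = i;
        if (best < 0) continue;                /* tuple never seen at all  */
        q[best].favored = true;
        for (int j = 0; j < MAP_SIZE; j++)     /* 3. register its tuples   */
            if (q[best].trace[j]) covered[j] = 1;
    }                                          /* 4. loop to next tuple    */
}
```

Note that an entry whose tuples are all covered by earlier winners never gets favored, even if it has a good score, which is exactly what shrinks the favored set.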

The generated corpus of “favored” entries is usually 5-10x smaller than the starting data set. Non-favored entries are not discarded, but they are skipped with varying probabilities when encountered in the queue:

  • If there are new, yet-to-be-fuzzed favorites present in the queue, 99% of non-favored entries will be skipped to get to the favored ones.
  • If there are no new favorites:
    • If the current non-favored entry was fuzzed before, it will be skipped 95% of the time.
    • If it hasn’t gone through any fuzzing rounds yet, the odds of skipping drop down to 75%.
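The skip rules can be condensed into a few lines (sketch; roll is a uniform random number in [0, 100)):

```c
/* Nonzero means: skip this queue entry on the current pass. */
static int should_skip(int entry_favored, int entry_was_fuzzed,
                       int pending_favorites, int roll) {
    if (entry_favored) return 0;               /* favored entries always run */
    if (pending_favorites) return roll < 99;   /* rush toward new favorites  */
    if (entry_was_fuzzed)  return roll < 95;
    return roll < 75;                          /* never-fuzzed non-favorite  */
}
```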

Based on empirical testing, this provides a reasonable balance between queue cycling speed and test case diversity.

Slightly more sophisticated but much slower culling can be performed on input or output corpora with afl-cmin. This tool permanently discards the redundant entries and produces a smaller corpus suitable for use with afl-fuzz or external tools.

0x05 Trimming input files

File size has a dramatic impact on fuzzing performance, both because large files make the target binary slower, and because they reduce the likelihood that a mutation would touch important format control structures, rather than redundant data blocks. This is discussed in more detail in Performance Tips.

The possibility that the user will provide a low-quality starting corpus aside, some types of mutations can have the effect of iteratively increasing the size of the generated files, so it is important to counter this trend.

Luckily, the instrumentation feedback provides a simple way to automatically trim down input files while ensuring that the changes made to the files have no impact on the execution path.

The built-in trimmer in afl-fuzz attempts to sequentially remove blocks of data with variable length and stepover; any deletion that doesn’t affect the checksum of the trace map is committed to disk. The trimmer is not designed to be particularly thorough; instead, it tries to strike a balance between precision and the number of execve() calls spent on the process, selecting the block size and stepover to match. The average per-file gains are around 5-20%.
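A sketch of that loop, with same_path() standing in for "re-execute the target and compare the trace-map checksum" (exercised here with a toy predicate; the real trimmer tunes block sizes and stepovers rather than halving blindly):

```c
#include <string.h>

typedef int (*same_path_fn)(const unsigned char *buf, int len);

/* Try to delete ever-smaller blocks; keep any deletion that leaves the
   execution path (as judged by same_path) unchanged. */
static void trim(unsigned char *buf, int *len_p, same_path_fn same_path) {
    for (int block = *len_p / 2; block >= 1; block /= 2) {
        int pos = 0;
        while (pos + block <= *len_p) {
            unsigned char saved[block];        /* C99 VLA; sketch only */
            int tail = *len_p - (pos + block);
            memcpy(saved, buf + pos, block);
            memmove(buf + pos, buf + pos + block, tail);
            if (same_path(buf, *len_p - block)) {
                *len_p -= block;               /* deletion committed */
            } else {
                memmove(buf + pos + block, buf + pos, tail);
                memcpy(buf + pos, saved, block);  /* roll back */
                pos += block;
            }
        }
    }
}

/* Toy predicate: pretend the path depends only on whether the keyword
   byte 'K' is still present anywhere in the input. */
static int toy_same_path(const unsigned char *buf, int len) {
    return memchr(buf, 'K', len) != NULL;
}
```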

The standalone afl-tmin tool uses a more exhaustive, iterative algorithm, and also attempts to perform alphabet normalization on the trimmed files. The operation of afl-tmin is as follows.

First, the tool automatically selects the operating mode. If the initial input crashes the target binary, afl-tmin will run in non-instrumented mode, simply keeping any tweaks that produce a simpler file but still crash the target. If the target is non-crashing, the tool uses an instrumented mode and keeps only the tweaks that produce exactly the same execution path.

The actual minimization algorithm is:

  1. Attempt to zero large blocks of data with large stepovers. Empirically, this is shown to reduce the number of execs by preempting finer-grained efforts later on.
  2. Perform a block deletion pass with decreasing block sizes and stepovers, binary-search-style.
  3. Perform alphabet normalization by counting unique characters and trying to bulk-replace each with a zero value.
  4. As a last resort, perform byte-by-byte normalization on non-zero bytes.

Instead of zeroing with a 0x00 byte, afl-tmin uses the ASCII digit ‘0’. This is done because such a modification is much less likely to interfere with text parsing, so it is more likely to result in successful minimization of text files.

The algorithm used here is less involved than some other test case minimization approaches proposed in academic work, but requires far fewer executions and tends to produce comparable results in most real-world applications.

0x06 Fuzzing strategies

The feedback provided by the instrumentation makes it easy to understand the value of various fuzzing strategies and optimize their parameters so that they work equally well across a wide range of file types. The strategies used by afl-fuzz are generally format-agnostic and are discussed in more detail here:

http://lcamtuf.blogspot.com/2014/08/binary-fuzzing-strategies-what-works.html

It is somewhat notable that especially early on, most of the work done by afl-fuzz is actually highly deterministic, and progresses to random stacked modifications and test case splicing only at a later stage. The deterministic strategies include:

  • Sequential bit flips with varying lengths and stepovers,
  • Sequential addition and subtraction of small integers,
  • Sequential insertion of known interesting integers (0, 1, INT_MAX, etc),

The purpose of opening with deterministic steps is related to their tendency to produce compact test cases and small diffs between the non-crashing and crashing inputs.

With deterministic fuzzing out of the way, the non-deterministic steps include stacked bit flips, insertions, deletions, arithmetics, and splicing of different test cases.

The relative yields and execve() costs of all these strategies have been investigated and are discussed in the aforementioned blog post.

For the reasons discussed in History (chiefly, performance, simplicity, and reliability), AFL generally does not try to reason about the relationship between specific mutations and program states; the fuzzing steps are nominally blind, and are guided only by the evolutionary design of the input queue.

That said, there is one (trivial) exception to this rule: when a new queue entry goes through the initial set of deterministic fuzzing steps, and tweaks to some regions in the file are observed to have no effect on the checksum of the execution path, they may be excluded from the remaining phases of deterministic fuzzing - and the fuzzer may proceed straight to random tweaks. Especially for verbose, human-readable data formats, this can reduce the number of execs by 10-40% or so without an appreciable drop in coverage. In extreme cases, such as normally block-aligned tar archives, the gains can be as high as 90%.

Because the underlying “effector maps” are local to every queue entry and remain in force only during deterministic stages that do not alter the size or the general layout of the underlying file, this mechanism appears to work very reliably and proved to be simple to implement.

0x07 Dictionaries

The feedback provided by the instrumentation makes it easy to automatically identify syntax tokens in some types of input files, and to detect that certain combinations of predefined or auto-detected dictionary terms constitute a valid grammar for the tested parser.

A discussion of how these features are implemented within afl-fuzz can be found here:

http://lcamtuf.blogspot.com/2015/01/afl-fuzz-making-up-grammar-with.html

In essence, when basic, typically easily-obtained syntax tokens are combined together in a purely random manner, the instrumentation and the evolutionary design of the queue together provide a feedback mechanism to differentiate between meaningless mutations and ones that trigger new behaviors in the instrumented code - and to incrementally build more complex syntax on top of this discovery.

The dictionaries have been shown to enable the fuzzer to rapidly reconstruct the grammar of highly verbose and complex languages such as JavaScript, SQL, or XML; several examples of generated SQL statements are given in the blog post mentioned above.

Interestingly, the AFL instrumentation also allows the fuzzer to automatically isolate syntax tokens already present in an input file. It can do so by looking for runs of bytes that, when flipped, produce a consistent change to the program’s execution path; this is suggestive of an underlying atomic comparison to a predefined value baked into the code. The fuzzer relies on this signal to build compact “auto dictionaries” that are then used in conjunction with other fuzzing strategies.

0x08 De-duping crashes

De-duplication of crashes is one of the more important problems for any competent fuzzing tool. Many of the naive approaches run into problems; in particular, looking just at the faulting address may lead to completely unrelated issues being clustered together if the fault happens in a common library function (say, strcmp, strcpy); while checksumming call stack backtraces can lead to extreme crash count inflation if the fault can be reached through a number of different, possibly recursive code paths.

The solution implemented in afl-fuzz considers a crash unique if either of two conditions is met:

  • The crash trace includes a tuple not seen in any of the previous crashes,
  • The crash trace is missing a tuple that was always present in earlier faults.

The approach is vulnerable to some path count inflation early on, but exhibits a very strong self-limiting effect, similar to the execution path analysis logic that is the cornerstone of afl-fuzz.
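Both conditions are easy to express over trace bitmaps. A sketch with a hypothetical 64-tuple map (the names and the two explicit history arrays are illustrative, not AFL's internals):

```c
#include <stdbool.h>

#define CMAP 64   /* toy tuple map size */

static unsigned char seen_any[CMAP];  /* tuple seen in >= 1 earlier crash  */
static unsigned char seen_all[CMAP];  /* tuple seen in every earlier crash */
static int n_crashes;

/* A crash is unique if it hits a tuple no earlier crash hit, or misses
   a tuple that every earlier crash hit. */
static bool crash_is_unique(const unsigned char *trace) {
    bool unique = false;
    for (int i = 0; i < CMAP; i++) {
        if (trace[i] && !seen_any[i]) unique = true;
        if (!trace[i] && n_crashes && seen_all[i]) unique = true;
    }
    for (int i = 0; i < CMAP; i++) {  /* update crash history */
        if (n_crashes == 0) seen_all[i] = trace[i] ? 1 : 0;
        else if (!trace[i]) seen_all[i] = 0;
        if (trace[i]) seen_any[i] = 1;
    }
    n_crashes++;
    return unique;
}
```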

Investigating crashes

The exploitability of many types of crashes can be ambiguous; afl-fuzz tries to address this by providing a crash exploration mode where a known-faulting test case is fuzzed in a manner very similar to the normal operation of the fuzzer, but with a constraint that causes any non-crashing mutations to be thrown away.

A detailed discussion of the value of this approach can be found here:

http://lcamtuf.blogspot.com/2014/11/afl-fuzz-crash-exploration-mode.html

The method uses instrumentation feedback to explore the state of the crashing program to get past the ambiguous faulting condition and then isolate the newly-found inputs for human review.

On the subject of crashes, it is worth noting that in contrast to normal queue entries, crashing inputs are not trimmed; they are kept exactly as discovered to make it easier to compare them to the parent, non-crashing entry in the queue. That said, afl-tmin can be used to shrink them at will.

0x09 The fork server

To improve performance, afl-fuzz uses a “fork server”, where the fuzzed process goes through execve(), linking, and libc initialization only once, and is then cloned from a stopped process image by leveraging copy-on-write. The implementation is described in more detail here:

http://lcamtuf.blogspot.com/2014/10/fuzzing-binaries-without-execve.html

The fork server is an integral aspect of the injected instrumentation and simply stops at the first instrumented function to await commands from afl-fuzz.

With fast targets, the fork server can offer considerable performance gains, usually between 1.5x and 2x. It is also possible to:

  • Use the fork server in manual (“deferred”) mode, skipping over larger, user-selected chunks of initialization code. It requires very modest code changes to the targeted program, and with some targets can produce 10x+ performance gains.
  • Enable “persistent” mode, where a single process is used to try out multiple inputs, greatly limiting the overhead of repetitive fork() calls. This generally requires some code changes to the targeted program, but can improve the performance of fast targets by a factor of 5 or more - approximating the benefits of in-process fuzzing jobs while still maintaining very robust isolation between the fuzzer process and the targeted binary.
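Stripped to its essence, one fork-server round trip looks like this (sketch; in real AFL the “go” command and exit status travel over dedicated pipes between afl-fuzz and a shim injected ahead of main, and target_main() here is a hypothetical stand-in for the instrumented target):

```c
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for the target's already-initialized main work. */
static int target_main(void) { return 0; }

/* One fuzzing run: clone the initialized process image via fork() and
   copy-on-write, let the clone process the input, report its wait status. */
static int serve_one_run(void) {
    pid_t child = fork();
    if (child == 0) _exit(target_main());
    int status = 0;
    waitpid(child, &status, 0);
    return status;
}
```

The performance win comes from never repeating execve(), linking, and libc initialization; only the cheap fork() is paid per run.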

0x0A Parallelization

The parallelization mechanism relies on periodically examining the queues produced by independently-running instances on other CPU cores or on remote machines, and then selectively pulling in the test cases that, when tried out locally, produce behaviors not yet seen by the fuzzer at hand.

This allows for extreme flexibility in fuzzer setup, including running synced instances against different parsers of a common data format, often with synergistic effects.

For more information about this design, see Tips for parallel fuzzing.

0x0B Binary-only instrumentation

Instrumentation of black-box, binary-only targets is accomplished with the help of a separately-built version of QEMU in “user emulation” mode. This also allows the execution of cross-architecture code - say, ARM binaries on x86.

QEMU uses basic blocks as translation units; the instrumentation is implemented on top of this and uses a model roughly analogous to the compile-time hooks:

if (block_address > elf_text_start && block_address < elf_text_end) {

cur_location = (block_address >> 4) ^ (block_address << 8);
shared_mem[cur_location ^ prev_location]++;
prev_location = cur_location >> 1;

}

The shift-and-XOR-based scrambling in the second line is used to mask the effects of instruction alignment.
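Since basic blocks in the .text section are typically aligned, the low bits of block_address carry little entropy on their own; a quick numeric check of the scrambling (toy addresses):

```c
#include <stdint.h>

/* The same mixing as in the QEMU-mode snippet above: XOR the address's
   high bits (>> 4) with its low bits shifted up (<< 8), so that
   16-byte-aligned neighbors still land far apart in the map. */
static uint32_t scramble(uint32_t block_address) {
    return (block_address >> 4) ^ (block_address << 8);
}
```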

The start-up of binary translators such as QEMU, DynamoRIO, and PIN is fairly slow; to counter this, the QEMU mode leverages a fork server similar to that used for compiler-instrumented code, effectively spawning copies of an already-initialized process paused at _start.

First-time translation of a new basic block also incurs substantial latency. To eliminate this problem, the AFL fork server is extended by providing a channel between the running emulator and the parent process. The channel is used to notify the parent about the addresses of any newly-encountered blocks and to add them to the translation cache that will be replicated for future child processes.

As a result of these two optimizations, the overhead of the QEMU mode is roughly 2-5x, compared to 100x+ for PIN.

0x0C Others

The afl-analyze tool

The file format analyzer is a simple extension of the minimization algorithm discussed earlier on; instead of attempting to remove no-op blocks, the tool performs a series of walking byte flips and then annotates runs of bytes in the input file.

It uses the following classification scheme:

  • “No-op blocks” - segments where bit flips cause no apparent changes to control flow. Common examples may be comment sections, pixel data within a bitmap file, etc.
  • “Superficial content” - segments where some, but not all, bitflips produce some control flow changes. Examples may include strings in rich documents (e.g., XML, RTF).
  • “Critical stream” - a sequence of bytes where all bit flips alter control flow in different but correlated ways. This may be compressed data, non-atomically compared keywords or magic values, etc.
  • “Suspected length field” - small, atomic integer that, when touched in any way, causes a consistent change to program control flow, suggestive of a failed length check.
  • “Suspected cksum or magic int” - an integer that behaves similarly to a length field, but has a numerical value that makes the length explanation unlikely. This is suggestive of a checksum or other “magic” integer.
  • “Suspected checksummed block” - a long block of data where any change always triggers the same new execution path. Likely caused by failing a checksum or a similar integrity check before any subsequent parsing takes place.
  • “Magic value section” - a generic token where changes cause the type of binary behavior outlined earlier, but that doesn’t meet any of the other criteria. May be an atomically compared keyword or so.

How to use?

The AFL approach

American Fuzzy Lop is a brute-force fuzzer coupled with an exceedingly simple but rock-solid instrumentation-guided genetic algorithm. It uses a modified form of edge coverage to effortlessly pick up subtle, local-scale changes to program control flow.

Simplifying a bit, the overall algorithm can be summed up as:

  1. Load user-supplied initial test cases into the queue,
  2. Take next input file from the queue,
  3. Attempt to trim the test case to the smallest size that doesn’t alter the measured behavior of the program,
  4. Repeatedly mutate the file using a balanced and well-researched variety of traditional fuzzing strategies,
  5. If any of the generated mutations resulted in a new state transition recorded by the instrumentation, add mutated output as a new entry in the queue.
  6. Go to 2.
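To make the loop concrete, here is a deliberately tiny model of it (not AFL's code): run_target() is a fake instrumented target whose coverage map depends only on the first input byte, the queue holds one-byte inputs, the only mutation is a deterministic walking bit flip, and trimming/culling are omitted:

```c
#include <string.h>

#define MAP  8    /* toy coverage map */
#define QMAX 16   /* toy queue limit  */

/* Fake target: "coverage" depends only on the first input byte. */
static void run_target(const unsigned char *in, unsigned char *map) {
    memset(map, 0, MAP);
    map[0] = 1;                      /* entry block      */
    if (in[0] > 0x40) map[1] = 1;    /* guarded branch A */
    if (in[0] > 0x80) map[2] = 1;    /* guarded branch B */
}

/* Returns the final queue length after `rounds` passes. */
static int fuzz(unsigned char seed, int rounds) {
    unsigned char queue[QMAX];
    unsigned char global[MAP], map[MAP];
    int qlen = 0;

    queue[qlen++] = seed;                        /* 1. load seed input   */
    run_target(&queue[0], global);

    for (int r = 0; r < rounds; r++)
        for (int q = 0; q < qlen; q++)           /* 2. next queue entry  */
            for (int bit = 0; bit < 8; bit++) {  /* 4. mutate (bit flip) */
                unsigned char in = queue[q] ^ (1 << bit);
                run_target(&in, map);
                int is_new = 0;
                for (int i = 0; i < MAP; i++)
                    if (map[i] && !global[i]) { global[i] = 1; is_new = 1; }
                if (is_new && qlen < QMAX)
                    queue[qlen++] = in;          /* 5. queue new finds   */
            }                                    /* 6. loop              */
    return qlen;
}
```

Starting from a zero seed, the walking bit flips first discover an input that takes branch A, which joins the queue and is itself mutated to discover branch B: exactly the evolutionary growth described above.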

The discovered test cases are also periodically culled to eliminate ones that have been obsoleted by newer, higher-coverage finds; and undergo several other instrumentation-driven effort minimization steps.

As a side result of the fuzzing process, the tool creates a small, self-contained corpus of interesting test cases. These are extremely useful for seeding other, labor- or resource-intensive testing regimes - for example, for stress-testing browsers, office applications, graphics suites, or closed-source tools.

The fuzzer is thoroughly tested to deliver out-of-the-box performance far superior to blind fuzzing or coverage-only tools.

Example 1

vul.c

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <signal.h>

int vuln_3(char *str) {
    int len = strlen(str);
    if (str[0] == 'A' && len == 66) {
        raise(SIGSEGV);
    } else if (str[0] == 'F' && len == 6) {
        raise(SIGSEGV);
    } else {
        printf("it is good!\n");
    }
    return 0;
}

int main(int argc, char *argv[]) {
    char buf[100] = {0};
    gets(buf);   // vul1: unbounded stdin read
    printf(buf); // vul2: format string
    vuln_3(buf);
    return 0;
}

Compile source code

afl-gcc vul.c -o vul

Test program

afl-showmap -m none -o /dev/null -- ./vul

output:

afl-showmap 2.57b by <lcamtuf@google.com>
[*] Executing './vul'...

-- Program output begins --
AAAAAAAAAABBBBBBBBBBBBBBBB
AAAAAAAAAABBBBBBBBBBBBBBBBit is good!
-- Program output ends --
[+] Captured 3 tuples in '/dev/null'.

Test two:

To trigger the raised signal:

[i0gan@arch example_1]$ afl-showmap -m none -o /dev/null -- ./vul
afl-showmap 2.57b by <lcamtuf@google.com>
[*] Executing './vul'...

-- Program output begins --
Faaaaa
-- Program output ends --

+++ Program killed by signal 11 +++
[+] Captured 5 tuples in '/dev/null'.

With different inputs, under normal circumstances, afl-showmap captures different tuples, which shows that our instrumentation is effective. In addition, the previously mentioned afl-cmin uses this tool to remove duplicate input files.

Close core pattern

Before executing afl-fuzz, check whether the system is configured to send core dump notifications to an external program. If so, the delay between the crash and the crash information reaching the fuzzer increases, and the crash may be misreported as a timeout. We therefore have to modify the core_pattern file; run this command as root:

echo core >/proc/sys/kernel/core_pattern

Create an output dir

mkdir out

Starting fuzz

For target binaries that accept input directly from stdin, the usual syntax is:

afl-fuzz -i testcase_dir -o findings_dir /path/to/program [params]

So our fuzzing command is:

afl-fuzz -i ./testcases/ -o out ./vul

Breaking the above command into parts:

  • -i testcase_dir specifies the directory to take the seed test cases from
  • -o findings_dir specifies the directory where AFL stores all result files for crashes, hangs, and the queue

The testcases directory comes from the GitHub AFL project; just copy that directory here, or run the following commands:

mkdir testcases
cp /bin/sh ./testcases # copy random seeds to testcases

You may encounter an error like this:

Whoops, your system uses on-demand CPU frequency scaling, adjusted
between 1367 and 2050 MHz. Unfortunately, the scaling algorithm in the
kernel is imperfect and can miss the short-lived processes spawned by
afl-fuzz. To keep things moving, run these commands as root:

cd /sys/devices/system/cpu
echo performance | tee cpu*/cpufreq/scaling_governor

output:

[i0gan@arch example_1]$ afl-fuzz -i ./testcases/ -o out ./vul @@
afl-fuzz 2.57b by <lcamtuf@google.com>
[+] You have 8 CPU cores and 1 runnable tasks (utilization: 12%).
[+] Try parallel jobs - see /usr/share/doc/afl/parallel_fuzzing.txt.
[*] Checking CPU core loadout...
[+] Found a free CPU core, binding to #0.
[*] Checking core_pattern...
[*] Checking CPU scaling governor...
[*] Setting up output directories...
[+] Output directory exists but deemed OK to reuse.
[*] Deleting old session data...
[+] Output dir cleanup successful.
[*] Scanning './testcases/'...
[+] No auto-generated dictionary tokens to reuse.
[*] Creating hard links for all input files...
[*] Validating target binary...
[*] Attempting dry run with 'id:000000,orig:README.testcases'...
[*] Spinning up the fork server...
[+] All right - fork server is up.
len = 828, map size = 4, exec speed = 225 us
[+] All test cases processed.

[+] Here are some useful stats:

Test case count : 1 favored, 0 variable, 1 total
Bitmap range : 4 to 4 bits (average: 4.00 bits)
Exec timing : 225 to 225 us (average: 225 us)

[*] No -t option specified, so I'll use exec timeout of 20 ms.
[+] All set and ready to roll!

Status screen

                         american fuzzy lop 2.57b (vul)

┌─ process timing ─────────────────────────────────────┬─ overall results ─────┐
│ run time : 0 days, 0 hrs, 1 min, 9 sec │ cycles done : 172 │
│ last new path : 0 days, 0 hrs, 1 min, 8 sec │ total paths : 3 │
│ last uniq crash : 0 days, 0 hrs, 0 min, 48 sec │ uniq crashes : 4 │
│ last uniq hang : none seen yet │ uniq hangs : 0 │
├─ cycle progress ────────────────────┬─ map coverage ─┴───────────────────────┤
│ now processing : 1 (33.33%) │ map density : 0.00% / 0.01% │
│ paths timed out : 0 (0.00%) │ count coverage : 1.00 bits/tuple │
├─ stage progress ────────────────────┼─ findings in depth ────────────────────┤
│ now trying : splice 14 │ favored paths : 3 (100.00%) │
│ stage execs : 31/32 (96.88%) │ new edges on : 3 (100.00%) │
│ total execs : 363k │ total crashes : 588 (4 unique) │
│ exec speed : 5736/sec │ total tmouts : 2 (1 unique) │
├─ fuzzing strategy yields ───────────┴───────────────┬─ path geometry ────────┤
│ bit flips : 0/128, 0/125, 0/119 │ levels : 2 │
│ byte flips : 0/16, 0/13, 0/7 │ pending : 0 │
│ arithmetics : 1/893, 0/220, 0/2 │ pend fav : 0 │
│ known ints : 0/82, 0/334, 0/308 │ own finds : 2 │
│ dictionary : 0/0, 0/0, 0/0 │ imported : n/a │
│ havoc : 4/158k, 1/202k │ stability : 100.00% │
│ trim : 98.56%/17, 0.00% ├────────────────────────┘
^C────────────────────────────────────────────────────┘ [cpu000: 40%]

We found 4 unique crashes in just a few seconds.

tree out/crashes
out/crashes
├── id:000000,sig:06,src:000000,op:havoc,rep:128
├── id:000001,sig:06,src:000002,op:havoc,rep:64
├── id:000002,sig:11,src:000000+000002,op:splice,rep:32
├── id:000003,sig:06,src:000001,op:havoc,rep:32
└── README.txt
cat out/crashes/id:000002,sig:11,src:000000+000002,op:splice,rep:32
F+�Ft
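Each crash file name encodes its provenance: id is the crash index, sig the signal that killed the target, src the queue entry (or spliced pair) it was mutated from, and op the mutation strategy. A small sketch to pull the signal number out of a name (the name below is taken from the listing above):

```shell
# Decode the signal field from an AFL crash file name.
name='id:000002,sig:11,src:000000+000002,op:splice,rep:32'
sig=$(printf '%s\n' "$name" | sed -n 's/.*sig:\([0-9]*\).*/\1/p')
echo "signal $sig"   # 11 = SIGSEGV
```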

Test the crashes

cat out/crashes/id:000002,sig:11,src:000000+000002,op:splice,rep:32 | ./vul 
[1] 1213401 done cat out/crashes/id:000002,sig:11,src:000000+000002,op:splice,rep:32 |
1213402 segmentation fault (core dumped) ./vul
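To replay every crashing input in one pass, a small loop over the crashes directory works (a sketch; it assumes the out/crashes files and the ./vul binary from above, and that vul reads STDIN as in the test):

```shell
# Feed each crash file to the target; "|| true" keeps the loop alive
# when the target dies with SIGSEGV/SIGABRT.
for f in out/crashes/id:*; do
    [ -f "$f" ] || continue   # glob matched nothing
    echo "=== $f ==="
    ./vul < "$f" || true
done
```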

Example 2 (fuzzgoat)

Target

ref: https://medium.com/@ayushpriya10/fuzzing-applications-with-american-fuzzy-lop-afl-54facc65d102

Modify CC in the Makefile to afl-gcc:

git clone https://github.com/fuzzstati0n/fuzzgoat
cd fuzzgoat
make
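If the Makefile hardcodes a different compiler, you can also override CC at build time rather than editing the file (a sketch; it assumes the Makefile honors the conventional CC variable):

```shell
make clean
make CC=afl-gcc
```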

After building, we get the vulnerable fuzzgoat binary:

afl-command-line  fuzzgoat_ASAN  fuzzgoat.h         in           LICENSE  Makefile   seed
fuzzgoat fuzzgoat.c fuzzgoatNoVulns.c input-files main.c README.md
[i0gan@arch fuzzgoat]$ ./fuzzgoat
./fuzzgoat <file_json

Create test_in and test_out directories

mkdir test_in test_out

Copy seed to test_in

cp /bin/ls ./test_in

Before we start fuzzing, we should set a few kernel parameters. Log in as root and run the following commands:

su - root
echo core >/proc/sys/kernel/core_pattern
cd /sys/devices/system/cpu
echo performance | tee cpu*/cpufreq/scaling_governor
exit
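Both settings can be sanity-checked before launching afl-fuzz (plain reads, no root needed):

```shell
# afl-fuzz refuses to run if core_pattern pipes cores to a helper program.
cat /proc/sys/kernel/core_pattern
# Each core should now report "performance" (files absent on some VMs).
cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor 2>/dev/null || true
```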

Fuzzing

Finally, to fuzz the application we use the following command

afl-fuzz -i test_in -o test_out -- ./fuzzgoat @@

Breaking the above command into parts:

  • -- separates AFL’s flags from the target’s command line: everything to its left is passed to AFL, and everything to its right is the command that runs the target, in this case ./fuzzgoat
  • @@ defines the position where AFL is supposed to insert the test file in the target application’s command structure

Note: Using @@ is not mandatory. AFL can also pass input to the target through STDIN.
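When the target reads from STDIN instead of a file argument, just omit @@ (shown here with a hypothetical stdin-reading target name):

```shell
# No @@ here: AFL feeds each test case to the target's standard input.
afl-fuzz -i test_in -o test_out -- ./stdin_target
```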

Running AFL should yield the following interface on your terminal:

                      american fuzzy lop 2.57b (fuzzgoat)

┌─ process timing ─────────────────────────────────────┬─ overall results ─────┐
│ run time : 0 days, 0 hrs, 1 min, 5 sec │ cycles done : 0 │
│ last new path : 0 days, 0 hrs, 0 min, 0 sec │ total paths : 165 │
│ last uniq crash : 0 days, 0 hrs, 0 min, 18 sec │ uniq crashes : 10 │
│ last uniq hang : none seen yet │ uniq hangs : 0 │
├─ cycle progress ────────────────────┬─ map coverage ─┴───────────────────────┤
│ now processing : 100 (60.61%) │ map density : 0.16% / 0.55% │
│ paths timed out : 0 (0.00%) │ count coverage : 2.28 bits/tuple │
├─ stage progress ────────────────────┼─ findings in depth ────────────────────┤
│ now trying : arith 8/8 │ favored paths : 53 (32.12%) │
│ stage execs : 243/1404 (17.31%) │ new edges on : 74 (44.85%) │
│ total execs : 258k │ total crashes : 1317 (10 unique) │
│ exec speed : 4082/sec │ total tmouts : 0 (0 unique) │
├─ fuzzing strategy yields ───────────┴───────────────┬─ path geometry ────────┤
│ bit flips : 7/1064, 1/1027, 1/953 │ levels : 4 │
│ byte flips : 0/133, 0/96, 0/41 │ pending : 129 │
│ arithmetics : 15/6243, 0/983, 0/71 │ pend fav : 22 │
│ known ints : 1/621, 1/1939, 0/975 │ own finds : 164 │
│ dictionary : 0/0, 0/0, 0/0 │ imported : n/a │
│ havoc : 146/242k, 0/0 │ stability : 100.00% │
│ trim : 100.00%/61, 0.00% ├────────────────────────┘
^C────────────────────────────────────────────────────┘ [cpu000: 41%]

Let the fuzzer run until it has completed at least 50 cycles.
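Progress is also written to test_out/fuzzer_stats as key : value lines, so the cycle count can be checked without watching the UI. A minimal sketch, assuming the standard fuzzer_stats layout (the stat_of helper name is ours):

```shell
# Print the value of one key from an AFL fuzzer_stats file.
# Lines look like "cycles_done       : 50".
stat_of() {
    awk -F':' -v k="$2" \
        '{ key = $1; gsub(/ /, "", key) }
         key == k { sub(/^ +/, "", $2); print $2 }' "$1"
}
# usage: stat_of test_out/fuzzer_stats cycles_done
```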

Analysing Results

All that’s left is looking at the results. Let’s navigate to the directory where AFL has kept all the test cases that resulted in crashes or hangs:

The crashes/ and hangs/ directories should contain files with names resembling (but not identical to) those shown below:

[i0gan@arch fuzzgoat]$ tree test_out/
test_out/
├── crashes
│   ├── id:000000,sig:06,src:000000,op:havoc,rep:8
│   ├── id:000001,sig:11,src:000000,op:havoc,rep:128
│   ├── id:000002,sig:06,src:000035,op:havoc,rep:4
│   ├── id:000003,sig:06,src:000055,op:flip1,pos:0
│   ├── id:000004,sig:06,src:000055,op:flip1,pos:2
│   ├── id:000005,sig:06,src:000055,op:arith8,pos:0,val:-21
│   ├── id:000006,sig:06,src:000055,op:arith8,pos:0,val:-24
│   ├── id:000007,sig:06,src:000055,op:arith8,pos:0,val:-25
│   ├── id:000008,sig:06,src:000055,op:arith8,pos:2,val:-21
│   ├── id:000009,sig:06,src:000055,op:arith8,pos:2,val:-24
│   └── README.txt
├── fuzz_bitmap
├── fuzzer_stats
├── hangs
├── plot_data
└── queue
├── id:000000,orig:sh
├── id:000001,src:000000,op:flip1,pos:0,+cov
├── id:000002,src:000000,op:flip1,pos:0,+cov
├── id:000003,src:000000,op:flip1,pos:0,+cov
├── id:000004,src:000000,op:arith8,pos:0,val:-11,+cov
├── id:000005,src:000000,op:arith8,pos:0,val:-17,+cov
...

Now we can look at these files to see what exactly AFL mutated the seed input into, and figure out why that made the application crash or hang. What we do with the bugs AFL found is then up to us.
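One convenient triage step is to replay a crash under gdb and capture a backtrace (a sketch; the gdb flags are standard, and the crash file name is taken from the listing above):

```shell
# Run the target on the crashing input and print a backtrace on death.
gdb -batch -ex run -ex bt --args \
    ./fuzzgoat 'test_out/crashes/id:000001,sig:11,src:000000,op:havoc,rep:128'
```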

Test results

Test 1
./fuzzgoat test_out/crashes/id:000000,sig:06,src:000000,op:havoc,rep:8 
""
--------------------------------

string:
free(): invalid pointer
[1] 848244 abort (core dumped) ./fuzzgoat test_out/crashes/id:000000,sig:06,src:000000,op:havoc,rep:8

Test 2
./fuzzgoat test_out/crashes/id:000009,sig:06,src:000055,op:arith8,pos:2,val:-24 
""

--------------------------------

string:
free(): invalid pointer
[1] 848263 abort (core dumped) ./fuzzgoat

Show file content as hex

hexedit test_out/crashes/id:000000,sig:06,src:000000,op:havoc,rep:8 
22 22

hexedit test_out/crashes/id:000008,sig:06,src:000055,op:arith8,pos:2,val:-21
22 22 0D
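hexedit is interactive; for a quick non-interactive dump, od (or xxd) does the same job. The first crash file is literally two 0x22 bytes, i.e. the bare string "", which we can reproduce directly:

```shell
# Two double-quote bytes (0x22 0x22), the same content hexedit showed.
printf '""' | od -An -t x1
```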

Conclusion

This article will be extended in a later update.
