# Efficient Fuzzing Guide

Once you have a fuzz target running, you can analyze and tweak it to improve its
efficiency. This document describes techniques to minimize fuzzing time and
maximize your results.

*** note
**Note:** If you haven’t created your first fuzz target yet, see the [Getting
Started Guide].
***

The most direct way to gauge the effectiveness of your fuzz target is to collect
metrics. You can get them manually, or take them from a [ClusterFuzz status]
page after your fuzz target is checked into the Chromium repository.

[TOC]

## Key metrics of a fuzz target

### Execution speed

A fuzzing engine such as libFuzzer typically explores a large search space by
performing randomized mutations, so it needs to run as fast as possible to find
interesting code paths.

Fuzz target speed is measured in executions per second (`exec/s`). libFuzzer
prints it in its status lines while a fuzz target is running:

```
#11002 NEW cov: 1337 ft: 10934 corp: 707/409Kb lim: 1098 exec/s: 5333 rss: 27Mb L: 186/1098
```

You should aim for at least 1,000 exec/s from your fuzz target locally before
submitting it to the Chromium repository. If you’re under 1,000, consider the
following improvements:

* [Simplifying initialization/cleanup](#Simplifying-initialization-cleanup)
* [Minimizing memory usage](#Minimizing-memory-usage)

#### Simplifying initialization/cleanup

If your `LLVMFuzzerTestOneInput` function is too complex, it can decrease the
fuzzer’s execution speed. It can also cause the fuzzer to target specific
use cases or fail to account for unexpected scenarios.

Instead of performing setup and teardown on each input, use static
initialization and shared resources. See [startup initialization] in
libFuzzer’s documentation for an example.

*** note
**Note:** You can skip freeing static resources. However, all other resources
allocated within the `LLVMFuzzerTestOneInput` function should be deallocated,
since the function gets called millions of times during a fuzzing session. If
you don’t, you’ll often run out of memory and reduce overall fuzzing efficiency.
***

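
A minimal sketch of the static-initialization pattern described above: an
expensive, read-only object is built on the first call and reused for every
later input, while per-input state stays local. The `Environment` type, its
keyword table, and the `g_environment_builds` counter are hypothetical and
exist only to illustrate the idea.

```cpp
#include <cstddef>
#include <cstdint>
#include <set>
#include <string>

// Counts how often the environment is built; present only for demonstration.
int g_environment_builds = 0;

// Hypothetical expensive, read-only state shared by all inputs (for example,
// a keyword table that the code under test consults).
struct Environment {
  Environment() : keywords{"if", "else", "while", "return"} {
    ++g_environment_builds;
  }
  std::set<std::string> keywords;
};

extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) {
  // Initialized on the first call only and deliberately never freed, which
  // is acceptable for static resources.
  static Environment* env = new Environment();

  // Per-input state is a local object that is released when the call returns.
  std::string token(reinterpret_cast<const char*>(data), size);
  bool is_keyword = env->keywords.count(token) > 0;
  (void)is_keyword;  // Hypothetical: pass the result to the code under test.
  return 0;
}
```

However many inputs the fuzzer runs, `Environment` is constructed exactly once.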
#### Minimizing memory usage

Avoid allocation of dynamic memory wherever possible. Memory instrumentation
works faster for stack-based and static objects than for heap-allocated ones.

*** note
**Note:** It’s always a good idea to try different variants for your fuzz target
locally, then submit only the fastest implementation to the Chromium repository.
***

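
As a sketch of the stack-over-heap advice, assume a hypothetical format whose
inputs start with a fixed-size `Header`. The target below copies the header into
a stack object and uses the payload in place, instead of allocating a `Header`
with `new` or copying the payload into a heap buffer. `Header` and `ReadHeader`
are made up for this example.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical fixed-size header of the format under test.
struct Header {
  char magic[4];
  uint32_t payload_length;
};

// Copies the header into a caller-provided (typically stack) object. Returns
// false if the input is too short or the declared payload does not fit.
bool ReadHeader(const uint8_t* data, size_t size, Header* header) {
  if (size < sizeof(Header)) return false;
  std::memcpy(header, data, sizeof(Header));
  return header->payload_length <= size - sizeof(Header);
}

extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) {
  Header header;  // Stack object: no per-input new/delete.
  if (!ReadHeader(data, size, &header)) return 0;
  // The payload is used in place through a pointer into `data` rather than
  // being copied into a heap-allocated buffer such as std::vector.
  const uint8_t* payload = data + sizeof(Header);
  (void)payload;  // Hypothetical: hand `payload` to the code under test here.
  return 0;
}
```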
### Code coverage

You can check the percentage of code covered by your fuzz target to gauge
fuzzing effectiveness:

* Review aggregated Chrome coverage from recent runs by checking the [fuzzing
coverage] report. This report can provide insight into how to improve code
coverage.
* Generate a source-level coverage report for your fuzzer by running the
[coverage script] stored in the Chromium repository. The script provides
detailed instructions and a usage example.

For the `out/coverage` target in the coverage script, make sure to add all of
the gn args you needed to build the `out/libfuzzer` target; this could include
args like `target_os=chromeos` and `is_asan=true` depending on the [gn config]
you chose.

*** note
**Note:** The code coverage of a fuzz target depends heavily on the corpus. A
well-chosen corpus will produce much greater code coverage. On the other hand,
a coverage report generated by a fuzz target without a corpus won't cover much
code. If you don’t have a corpus to use, you can download the [corpus from
ClusterFuzz]. For more information on the corpus, see
[Corpus size](#Corpus-Size).
***

### Corpus size

A guided fuzzing engine such as libFuzzer considers an input (a.k.a. testcase
or corpus unit) *interesting* if the input results in new code coverage (i.e.,
if the fuzzer reaches code that has not been reached before). The set of all
interesting inputs is called the *corpus*. A corpus is shared across fuzzer runs
and grows over time.

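
A toy model of this bookkeeping, assuming coverage can be summarized as a set of
edge IDs. `CoverageOf` is a hypothetical stand-in for compiler-inserted
instrumentation (here, a parser that checks for the prefix `FUZ`), and `Corpus`
is not libFuzzer's actual data structure, which tracks finer-grained features.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for coverage instrumentation: returns the IDs of the
// "edges" an input reaches in a parser that checks for the prefix "FUZ".
std::set<int> CoverageOf(const std::string& input) {
  std::set<int> edges = {0};  // The entry block is always reached.
  if (input.size() > 0 && input[0] == 'F') {
    edges.insert(1);
    if (input.size() > 1 && input[1] == 'U') {
      edges.insert(2);
      if (input.size() > 2 && input[2] == 'Z') edges.insert(3);
    }
  }
  return edges;
}

// Keeps an input only if it reaches at least one edge that no earlier input
// reached, i.e. only if the input is "interesting".
class Corpus {
 public:
  bool MaybeAdd(const std::string& input) {
    bool is_interesting = false;
    for (int edge : CoverageOf(input)) {
      if (seen_edges_.insert(edge).second) is_interesting = true;
    }
    if (is_interesting) units_.push_back(input);
    return is_interesting;
  }
  std::size_t size() const { return units_.size(); }

 private:
  std::set<int> seen_edges_;
  std::vector<std::string> units_;
};
```

Adding `"x"`, `"F"`, `"FU"`, and `"FUZ"` in that order keeps all four, while a
later `"y"` is rejected because it only reaches edge 0. Once every reachable
edge has been seen, the corpus stops growing.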
If a fuzz target stops discovering new interesting inputs after running for a
while, it typically indicates that the fuzz target has hit a code barrier and
reached a *coverage plateau*. The corpus for a reasonably complex target
should contain hundreds (if not thousands) of inputs.

If a fuzz target reaches a coverage plateau with a small corpus, the common
causes are checksums and magic numbers. It may also be impossible for your
fuzzer to reach a large part of the code. The easiest way to diagnose the
problem is to generate and analyze a [coverage report](#code-coverage). Then,
to fix the issue, try the following:

* Change the code (e.g., disable CRC checks while fuzzing) with a
[custom build](#Custom-build).
* Prepare or improve the [seed corpus](#Seed-corpus).
* Prepare or improve the [fuzzer dictionary](#Fuzzer-dictionary).

## Ways to improve a fuzz target

### Seed corpus

You can give your fuzz target a starting point by creating a set of valid and
interesting inputs called a *seed corpus*. If you don’t provide a seed corpus,
the fuzzing engine has to guess inputs from scratch, which can take time
(depending on the size of the inputs and the complexity of the target format).
In many cases, providing a seed corpus can increase code coverage by an order of
magnitude.

Seed corpuses work especially well for strictly defined file formats and data
transmission protocols:

* For file format parsers, add valid files from your test suite.
* For protocol parsers, save valid raw streams from your test suite as separate
files.
* For graphics libraries, add a variety of small PNG/JPG/GIF files.

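
If your test suite keeps raw protocol streams in memory (for example, as string
constants), a small helper can dump each one into its own file so the directory
can serve as a seed corpus. The `WriteSeedCorpus` function and its `seed_N`
file-naming scheme are hypothetical; the sketch uses only C++17
`<filesystem>` and `<fstream>`.

```cpp
#include <cstddef>
#include <filesystem>
#include <fstream>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Writes each raw stream to its own binary file (seed_0, seed_1, ...) in
// `dir`, creating the directory if needed. Returns the number of files
// written successfully.
std::size_t WriteSeedCorpus(const fs::path& dir,
                            const std::vector<std::string>& streams) {
  fs::create_directories(dir);
  std::size_t written = 0;
  for (std::size_t i = 0; i < streams.size(); ++i) {
    std::ofstream out(dir / ("seed_" + std::to_string(i)), std::ios::binary);
    out.write(streams[i].data(),
              static_cast<std::streamsize>(streams[i].size()));
    if (out) ++written;
  }
  return written;
}
```

Binary mode matters here: protocol streams often contain `\r\n` or NUL bytes
that must reach the fuzzer unchanged.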
#### Using a corpus locally

If you’re running a fuzz target locally, you can easily designate a corpus by
passing a directory as an argument:

```
./out/libfuzzer/my_fuzzer ~/tmp/my_fuzzer_corpus
```

The fuzzer stores all the interesting inputs it finds in the directory.

#### Creating a Chromium repository seed corpus

When running fuzz targets at scale, ClusterFuzz looks for a seed corpus defined
in the Chromium source repository. You can define one in your `BUILD.gn` file by
adding a `seed_corpus` attribute to your `fuzzer_test` target definition:

```
fuzzer_test("my_fuzzer") {
  ...
  seed_corpus = "test/fuzz/testcases"
  ...
}
```

If you want to specify multiple seed corpus directories, use the `seed_corpuses`
attribute instead:

```
fuzzer_test("my_fuzzer") {
  ...
  seed_corpuses = [ "test/fuzz/testcases", "test/unittest/data" ]
  ...
}
```

All files found in these directories and their subdirectories are stored in a
`<my_fuzzer>_seed_corpus.zip` output archive.

#### Uploading corpus files to GCS

If you can't store your seed corpus in the Chromium repository (e.g., it’s too
large, can’t be open-sourced, etc.), you can upload the corpus to the Google
Cloud Storage (GCS) bucket used by ClusterFuzz.

1) Open the [Corpus GCS Bucket] in your browser.
2) Search for the directory named `<my_fuzzer>`. If the directory does not
exist, create it.
3) In the `<my_fuzzer>` directory, upload your corpus files.

*** note
**Note:** If you upload your corpus to GCS, you don’t need to add the
`seed_corpus` attribute to your `fuzzer_test` target definition. However,
adding the seed corpus to the Chromium repository is the preferred approach.
***

Alternatively, you can upload the files with the [gsutil] command-line tool:

```bash
gsutil -m rsync <path_to_corpus> gs://clusterfuzz-corpus/libfuzzer/<my_fuzzer>
```

*** note
**Note:** To write to this bucket using `gsutil`, you must be logged into your
@google.com account (@chromium.org will not work). If you installed `gsutil`
through `gcloud`, you can log into your account in `gsutil` with the
`gcloud auth login` command.
***

#### Minimizing a seed corpus

Your seed corpus is synced to all fuzzing bots for every iteration, so it's
important to minimize it to a small set of interesting inputs before uploading.
Keeping the seed corpus small improves fuzzing efficiency and prevents our bots
from running out of disk space.

You can minimize your seed corpus by using libFuzzer’s `-merge=1` option:

```bash
# Create an empty directory.
mkdir seed_corpus_minimized

# Run the fuzzer with the -merge=1 flag.
./my_fuzzer -merge=1 ./seed_corpus_minimized ./seed_corpus
```

After running the command, the `seed_corpus_minimized` directory will contain a
minimized corpus that gives the same code coverage as your initial `seed_corpus`
directory.

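
The idea behind coverage-preserving minimization can be sketched as a greedy
pass: visit inputs from smallest to largest and keep one only if it covers
something that no kept input covers yet. This is a simplified model that assumes
per-input coverage is already known (the hypothetical `edges` field);
libFuzzer's real `-merge=1` implementation collects coverage by running the
target and differs in its details.

```cpp
#include <algorithm>
#include <set>
#include <string>
#include <utility>
#include <vector>

struct Unit {
  std::string data;
  std::set<int> edges;  // Coverage reached by this unit (assumed precomputed).
};

// Greedy, coverage-preserving minimization: the kept units cover exactly the
// same edges as the input set, preferring smaller units.
std::vector<Unit> MinimizeCorpus(std::vector<Unit> units) {
  std::stable_sort(units.begin(), units.end(),
                   [](const Unit& a, const Unit& b) {
                     return a.data.size() < b.data.size();
                   });
  std::set<int> covered;
  std::vector<Unit> kept;
  for (Unit& u : units) {
    bool adds_coverage = false;
    for (int edge : u.edges) {
      if (covered.insert(edge).second) adds_coverage = true;
    }
    if (adds_coverage) kept.push_back(std::move(u));
  }
  return kept;
}
```

Every edge is recorded when the first (smallest) unit reaching it is kept, so
total coverage is preserved. A greedy pass does not guarantee the smallest
possible set, however.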
### Fuzzer dictionary

You can help your fuzzer increase its coverage by providing a set of common
words or values that you expect to find in the input. Such a dictionary works
especially well for certain use cases (e.g., fuzzing file format decoders or
text-based protocols like XML).

Add a fuzzer dictionary:

1) Create a flat ASCII text file that lists one input token per line in the
format `name="value"`. The value must appear in quotes with hex escaping
(`\xNN`) applied to all non-printable, high-bit, or otherwise problematic
characters (the `\\` and `\"` shorthands are recognized, too). This syntax is
similar to the one used by the [AFL] fuzzing engine (`-x` option).

*** note
**Note:** `name` can be omitted, but it is a convenient way to document the
meaning of each token. Here’s an example dictionary:
***

```
# Lines starting with '#' and empty lines are ignored.

# Adds "blah" word (w/o quotes) to the dictionary.
kw1="blah"
# Use \\ for backslash and \" for quotes.
kw2="\"ac\\dc\""
# Use \xAB for hex values.
kw3="\xF7\xF8"
# Key name before '=' can be omitted:
"foo\x0Abar"
```

2) Test your dictionary by running your fuzz target locally:

```bash
./out/libfuzzer/my_fuzzer -dict=<path_to_dict> <path_to_corpus>
```

If the dictionary is effective, you should see `NEW` units discovered in the
output.

3) Place the dictionary file in the same directory as your fuzz target, then add
the `dict` attribute to the `fuzzer_test` definition in your `BUILD.gn` file:

```
fuzzer_test("my_fuzzer") {
  ...
  dict = "my_fuzzer.dict"
}
```

Once the dictionary is checked into the Chromium repository and ClusterFuzz
picks up a build from a newer revision, the dictionary is used automatically.

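
If you generate dictionary tokens programmatically (for example, from magic
values defined in a header), a helper can emit lines in the `name="value"`
format from step 1. `FormatDictEntry` is a hypothetical sketch that escapes
conservatively: it uses the `\\` and `\"` shorthands and hex-escapes every
byte outside printable ASCII (0x20 to 0x7E).

```cpp
#include <cstdio>
#include <string>

// Formats one dictionary line. An empty `name` produces the key-less form.
std::string FormatDictEntry(const std::string& name, const std::string& value) {
  std::string line = name.empty() ? "\"" : name + "=\"";
  for (unsigned char c : value) {
    if (c == '\\') {
      line += "\\\\";  // Backslash shorthand: \\ .
    } else if (c == '"') {
      line += "\\\"";  // Quote shorthand: \" .
    } else if (c >= 0x20 && c <= 0x7E) {
      line += static_cast<char>(c);  // Printable ASCII is copied as-is.
    } else {
      char buf[5];  // "\xNN" plus the terminating NUL.
      std::snprintf(buf, sizeof(buf), "\\x%02X", static_cast<unsigned>(c));
      line += buf;
    }
  }
  line += '"';
  return line;
}
```

For instance, the value `"ac\dc"` (with the quotes) under the name `kw2`
yields `kw2="\"ac\\dc\""`, matching the example dictionary in step 1.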
### Custom build

If you need to change the code being tested by your fuzz target, you can guard
the change with `#ifdef FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION` in your
target code.

*** note
**Note:** Patching target code is not the preferred way of improving the
corresponding fuzz target, but in some cases it might be the only way to do it
(e.g., when there is no intended API to disable checksum verification, or when
the target code uses a random generator that affects the reproducibility of
crashes).
***

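
A minimal sketch of disabling checksum verification in fuzzing builds. It
assumes a made-up record format whose last byte is the sum of the preceding
bytes modulo 256, and assumes, as this section implies, that your fuzzing build
defines `FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION`. `Checksum` and
`ParseRecord` are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical checksum: the sum of all bytes modulo 256.
static uint8_t Checksum(const uint8_t* data, size_t size) {
  uint8_t sum = 0;
  for (size_t i = 0; i < size; ++i) sum = static_cast<uint8_t>(sum + data[i]);
  return sum;
}

// Returns true if the record is accepted for further parsing.
bool ParseRecord(const uint8_t* data, size_t size) {
  if (size < 2) return false;
  // Random mutations almost never produce a matching checksum, so fuzzing
  // builds skip the check and let inputs reach the parsing code below.
#ifndef FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION
  if (Checksum(data, size - 1) != data[size - 1]) return false;
#endif
  // Parse the payload in data[0 .. size - 2] here.
  return true;
}

extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) {
  ParseRecord(data, size);
  return 0;
}
```

Production builds keep rejecting corrupted records, while the fuzzing build
explores the parser regardless of the checksum byte.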
[AFL]: http://lcamtuf.coredump.cx/afl/
[ClusterFuzz status]: libFuzzer_integration.md#Status-Links
[Corpus GCS Bucket]: https://console.cloud.google.com/storage/clusterfuzz-corpus/libfuzzer
[Getting Started Guide]: getting_started.md
[gn config]: getting_started.md#running-the-fuzz-target
[corpus from ClusterFuzz]: libFuzzer_integration.md#Corpus
[coverage script]: https://cs.chromium.org/chromium/src/tools/code_coverage/coverage.py
[fuzzing coverage]: https://chromium-coverage.appspot.com/reports/latest_fuzzers_only/linux/index.html
[gsutil]: https://cloud.google.com/storage/docs/gsutil
[startup initialization]: https://llvm.org/docs/LibFuzzer.html#startup-initialization