diff options
author | Alon Zakai <azakai@google.com> | 2020-04-28 15:34:45 -0700 |
---|---|---|
committer | GitHub <noreply@github.com> | 2020-04-28 15:34:45 -0700 |
commit | e44b0c9225cf17d3c76cfb9e41a83eb0104f13e4 (patch) | |
tree | b7145f00b4edd66aeb9dff957dede5a7b59d3909 | |
parent | 1cd20951895a8e54ab52bdbf011296c666b0630f (diff) | |
download | binaryen-e44b0c9225cf17d3c76cfb9e41a83eb0104f13e4.tar.gz binaryen-e44b0c9225cf17d3c76cfb9e41a83eb0104f13e4.tar.bz2 binaryen-e44b0c9225cf17d3c76cfb9e41a83eb0104f13e4.zip |
Simple reduction for any* fuzzer-found bug (#2817)
This builds on recent work for deterministic reproduction of fuzzer
testcases using IDs. With that, we can remove all the old auto-reduction
code and make something very similar possible for all* things the
fuzzer script checks for.
The idea is simple: if you run the fuzzer script and it finds a bug,
it prints out the ID it found it with. If you then run
fuzz_opt.py ID
then it runs that exact testcase again, deterministically, making
all the same random choices it made before. The new addition
in this PR is that you can do
fuzz_opt.py ID WASM
which also adds a wasm file. If provided, we still randomly
generate one in the fuzzer script (so that later random numbers
are the same) but we swap in that provided wasm. This then
lets wasm-reduce drive fuzz_opt.py itself as a whole. No more
extracting a testcase and all its commands, it's all done for you.
The fuzzer script will print out hopefully-useful text when it finds
a bug, something like this:
================================================================================
You found a bug! Please report it with
seed: 4671273171120144526
and the exact version of Binaryen you found it on, plus the exact Python
version (hopefully deterministic random numbers will be identical).
You can run that testcase again with "fuzz_opt.py 4671273171120144526"
The initial wasm file used here is saved as /home/username/binaryen/out/test/original.wasm
You can try to reduce the testcase with
wasm-reduce /home/username/binaryen/out/test/original.wasm '--command=bash reduce.sh' -t /home/username/binaryen/out/test/t.wasm -w /home/username/binaryen/out/test/w.wasm
where "reduce.sh" is something like
# check the input is even a valid wasm file
bin/wasm-opt /home/username/binaryen/out/test/t.wasm
echo $?
# run the command
./scripts/fuzz_opt.py 4671273171120144526 /home/username/binaryen/out/test/t.wasm > o 2> e
cat o | tail -n 10
echo $?
You may want to adjust what is printed there: in the example we save stdout
and stderr separately and then print (so that wasm-reduce can see it) what we
think is the relevant part of that output. Make sure that includes the right
details, and preferably no more (less details allow more reduction, but raise
the risk of it reducing to something you don't quite want).
You may also need to add --timeout 5 or such if the testcase is a slow one.
================================================================================
The text has full instructions to run the reducer, which should
work in almost all cases - see (*) note below. Because of that
corner case I think it's safer to not run the reducer automatically,
but it's just a quick copy-paste away, and the user can then adjust
the reduce.sh script if necessary.
(*) Well, almost any. There are some corner cases, such as if the
fuzzer generator includes bounds checks in the wasm, reduction
might remove them. We can fix this eventually by making the
bounds checks additions a pass that can be run after the fuzzer
generator, but meanwhile you can work around this by making the
reduction script look for the right thing (i.e. if all it looks for is a
failing return code, that won't be enough as a removed bounds
check will fail but on something else).
-rwxr-xr-x | scripts/fuzz_opt.py | 230 |
1 files changed, 99 insertions, 131 deletions
diff --git a/scripts/fuzz_opt.py b/scripts/fuzz_opt.py index 94f8d955f..2d15ef76a 100755 --- a/scripts/fuzz_opt.py +++ b/scripts/fuzz_opt.py @@ -19,10 +19,10 @@ import contextlib import os import difflib import math +import shutil import subprocess import random import re -import shutil import sys import time import traceback @@ -182,7 +182,7 @@ def compare_between_vms(x, y, context): x_lines = x.splitlines() y_lines = y.splitlines() if len(x_lines) != len(y_lines): - return compare(x, y, context) + return compare(x, y, context + ' (note: different number of lines between vms)') num_lines = len(x_lines) for i in range(num_lines): @@ -269,13 +269,6 @@ def run_d8(wasm): return run_vm([shared.V8] + shared.V8_OPTS + [in_binaryen('scripts', 'fuzz_shell.js'), '--', wasm]) -# There are two types of test case handlers: -# * get_commands() users: these return a list of commands to run (for example, "run this wasm-opt -# command, then that one"). The calling code gets and runs those commands on the test wasm -# file, and has enough information and control to be able to perform auto-reduction of any -# bugs found. -# * Totally generic: These receive the input pattern, a wasm generated from it, and a wasm -# optimized from that, and can then do anything it wants with those. class TestCaseHandler: # how frequent this handler will be run. 1 means always run it, 0.5 means half the # time @@ -284,9 +277,9 @@ class TestCaseHandler: def __init__(self): self.num_runs = 0 - # If the core handle_pair() method is not overridden, it calls handle_single() - # on each of the pair. That is useful if you just want the two wasms, and don't - # care about their relationship + # If the core handle_pair() method is not overridden, it calls handle() on + # each of the items. That is useful if you just want the two wasms and don't + # care about their relationship. def handle_pair(self, input, before_wasm, after_wasm, opts): self.handle(before_wasm) self.handle(after_wasm) @@ -312,7 +305,17 @@ class VM: self.can_compare_to_others = can_compare_to_others +# Fuzz the interpreter with --fuzz-exec. +class FuzzExec(TestCaseHandler): + frequency = 1 + + def handle_pair(self, input, before_wasm, after_wasm, opts): + run([in_bin('wasm-opt'), before_wasm] + opts + ['--fuzz-exec']) + + class CompareVMs(TestCaseHandler): + frequency = 0.5 + def __init__(self): super(CompareVMs, self).__init__() @@ -458,28 +461,10 @@ class CompareVMs(TestCaseHandler): return all([x in feature_opts for x in ['--disable-simd', '--disable-reference-types', '--disable-exception-handling', '--disable-multivalue']]) -# Fuzz the interpreter with --fuzz-exec. This tests everything in a single command (no -# two separate binaries) so it's easy to reproduce. -class FuzzExec(TestCaseHandler): - def get_commands(self, wasm, opts, random_seed): - return [ - '%(MAX_INTERPRETER_ENV_VAR)s=%(MAX_INTERPRETER_DEPTH)d %(wasm_opt)s --fuzz-exec %(opts)s %(wasm)s' % { - 'MAX_INTERPRETER_ENV_VAR': MAX_INTERPRETER_ENV_VAR, - 'MAX_INTERPRETER_DEPTH': MAX_INTERPRETER_DEPTH, - 'wasm_opt': in_bin('wasm-opt'), - 'opts': ' '.join(opts), - 'wasm': wasm - } - ] - - # Check for determinism - the same command must have the same output. -# Note that this doesn't use get_commands() intentionally, since we are testing -# for something that autoreduction won't help with anyhow (nondeterminism is very -# hard to reduce). class CheckDeterminism(TestCaseHandler): # not that important - frequency = 0.333 + frequency = 0.3 def handle_pair(self, input, before_wasm, after_wasm, opts): # check for determinism @@ -489,6 +474,8 @@ class CheckDeterminism(TestCaseHandler): class Wasm2JS(TestCaseHandler): + frequency = 0.6 + def handle_pair(self, input, before_wasm, after_wasm, opts): # always check for compiler crashes. without NaNs we can also compare # before and after (with NaNs, a reinterpret through memory might end up @@ -531,6 +518,8 @@ class Wasm2JS(TestCaseHandler): class Asyncify(TestCaseHandler): + frequency = 0.6 + def handle_pair(self, input, before_wasm, after_wasm, opts): # we must legalize in order to run in JS run([in_bin('wasm-opt'), before_wasm, '--legalize-js-interface', '-o', before_wasm] + FEATURE_OPTS) @@ -593,91 +582,26 @@ testcase_handlers = [ # Do one test, given an input file for -ttf and some optimizations to run -def test_one(random_input, opts, allow_autoreduce): +def test_one(random_input, opts, given_wasm): randomize_pass_debug() randomize_feature_opts() randomize_fuzz_settings() print() - generate_command = [in_bin('wasm-opt'), random_input, '-ttf', '-o', 'a.wasm'] + FUZZ_OPTS + FEATURE_OPTS - if PRINT_WATS: - printed = run(generate_command + ['--print']) - with open('a.printed.wast', 'w') as f: - f.write(printed) + if given_wasm: + shutil.copyfile(given_wasm, 'a.wasm') else: - run(generate_command) + generate_command = [in_bin('wasm-opt'), random_input, '-ttf', '-o', 'a.wasm'] + FUZZ_OPTS + FEATURE_OPTS + if PRINT_WATS: + printed = run(generate_command + ['--print']) + with open('a.printed.wast', 'w') as f: + f.write(printed) + else: + run(generate_command) wasm_size = os.stat('a.wasm').st_size bytes = wasm_size print('pre wasm size:', wasm_size) - # first, run all handlers that use get_commands(). those don't need the second wasm in the - # pair, since they all they do is return their commands, and expect us to run them, and - # those commands do the actual testing, by operating on the original input wasm file. by - # fuzzing the get_commands() ones first we can find bugs in creating the second wasm (that - # has the opts run on it) before we try to create it later down for the passes that - # expect to get it as one of their inputs. - for testcase_handler in testcase_handlers: - if testcase_handler.can_run_on_feature_opts(FEATURE_OPTS): - if hasattr(testcase_handler, 'get_commands'): - print('running testcase handler:', testcase_handler.__class__.__name__) - testcase_handler.increment_runs() - - # if the testcase handler supports giving us a list of commands, then we can get those commands - # and use them to do useful things like automatic reduction. in this case we give it the input - # wasm plus opts and a random seed (if it needs any internal randomness; we want to have the same - # value there if we reduce). - random_seed = random.random() - - # gets commands from the handler, for a given set of optimizations. this is all the commands - # needed to run the testing that that handler wants to do. - def get_commands(opts): - return testcase_handler.get_commands(wasm='a.wasm', opts=opts + FUZZ_OPTS + FEATURE_OPTS, random_seed=random_seed) - - def write_commands_and_test(opts): - commands = get_commands(opts) - write_commands(commands, 't.sh') - subprocess.check_call(['bash', 't.sh']) - - try: - write_commands_and_test(opts) - except subprocess.CalledProcessError: - if allow_autoreduce: - print('') - print('====================') - print('Found a problem! See "t.sh" for the commands, and "input.wasm" for the input. Auto-reducing to "reduced.wasm" and "tt.sh"...') - print('====================') - print('') - # first, reduce the fuzz opts: keep removing until we can't - while 1: - reduced = False - for i in range(len(opts)): - # some opts can't be removed, like --flatten --dfo requires flatten - if opts[i] == '--flatten': - if i != len(opts) - 1 and opts[i + 1] in ('--dfo', '--local-cse', '--rereloop'): - continue - shorter = opts[:i] + opts[i + 1:] - try: - write_commands_and_test(shorter) - except subprocess.CalledProcessError: - # great, the shorter one is good as well - opts = shorter - print('reduced opts to ' + ' '.join(opts)) - reduced = True - break - if not reduced: - break - # second, reduce the wasm - # copy a.wasm to a safe place as the reducer will use the commands on new inputs, and the commands work on a.wasm - shutil.copyfile('a.wasm', 'input.wasm') - # add a command to verify the input. this lets the reducer see that it is indeed working on the input correctly - commands = [in_bin('wasm-opt') + ' -all a.wasm'] + get_commands(opts) - write_commands(commands, 'tt.sh') - # reduce the input to something smaller with the same behavior on the script - subprocess.check_call([in_bin('wasm-reduce'), 'input.wasm', '--command=bash tt.sh', '-t', 'a.wasm', '-w', 'reduced.wasm']) - print('Finished reduction. See "tt.sh" and "reduced.wasm".') - raise Exception('halting after autoreduction') - print('') - # create a second wasm for handlers that want to look at pairs. generate_command = [in_bin('wasm-opt'), 'a.wasm', '-o', 'b.wasm'] + opts + FUZZ_OPTS + FEATURE_OPTS if PRINT_WATS: @@ -690,26 +614,29 @@ def test_one(random_input, opts, allow_autoreduce): bytes += wasm_size print('post wasm size:', wasm_size) - # run some of the pair handling handlers. we don't run them all so that we get more - # coverage of different wasm files (but we do run all get_commands() using ones, - # earlier, because those are autoreducible and we want to maximize the chance of - # that) - # however, also try our best to pick a handler that can run this + # first, find which handlers can even run here relevant_handlers = [handler for handler in testcase_handlers if not hasattr(handler, 'get_commands') and handler.can_run_on_feature_opts(FEATURE_OPTS)] - NUM_PAIR_HANDLERS = 2 - if len(relevant_handlers) > 0: - chosen_handlers = set(random.choices(relevant_handlers, k=NUM_PAIR_HANDLERS)) - for testcase_handler in chosen_handlers: - assert testcase_handler.can_run_on_feature_opts(FEATURE_OPTS) - assert not hasattr(testcase_handler, 'get_commands') - if random.random() < testcase_handler.frequency: - print('running testcase handler:', testcase_handler.__class__.__name__) - testcase_handler.increment_runs() - - # let the testcase handler handle this testcase however it wants. in this case we give it - # the input and both wasms. - testcase_handler.handle_pair(input=random_input, before_wasm='a.wasm', after_wasm='b.wasm', opts=opts + FUZZ_OPTS + FEATURE_OPTS) - print('') + if len(relevant_handlers) == 0: + return 0 + # filter by frequency + filtered_handlers = [handler for handler in relevant_handlers if random.random() < handler.frequency] + if len(filtered_handlers) == 0: + # pick at least one, to not waste the effort we put into making the wasm + filtered_handlers = [random.choice(relevant_handlers)] + # run only some of the pair handling handlers. if we ran them all all the + # time that would mean we have less variety in wasm files and passes run + # on them in the same amount of time. + NUM_PAIR_HANDLERS = 3 + chosen_handlers = set(random.choices(filtered_handlers, k=NUM_PAIR_HANDLERS)) + for testcase_handler in chosen_handlers: + assert testcase_handler.can_run_on_feature_opts(FEATURE_OPTS) + print('running testcase handler:', testcase_handler.__class__.__name__) + testcase_handler.increment_runs() + + # let the testcase handler handle this testcase however it wants. in this case we give it + # the input and both wasms. + testcase_handler.handle_pair(input=random_input, before_wasm='a.wasm', after_wasm='b.wasm', opts=opts + FUZZ_OPTS + FEATURE_OPTS) + print('') return bytes @@ -817,9 +744,16 @@ print('POSSIBLE_FEATURE_OPTS:', POSSIBLE_FEATURE_OPTS) if __name__ == '__main__': # if we are given a seed, run exactly that one testcase. otherwise, # run new ones until we fail - if len(sys.argv) == 2: + # if we are given a seed, we can also be given a wasm file, which we use + # instead of the randomly generating one. this can be useful for + # reduction. + given_wasm = None + if len(sys.argv) >= 2: given_seed = int(sys.argv[1]) print('checking a single given seed', given_seed) + if len(sys.argv) >= 3: + given_wasm = sys.argv[2] + print('using given wasm file', given_wasm) else: given_seed = None print('checking infinite random inputs') @@ -852,9 +786,7 @@ if __name__ == '__main__': opts = randomize_opt_flags() print('randomized opts:', ' '.join(opts)) try: - # don't autoreduce if we are given a specific case to test, as this - # is a reproduction of the test case, not the first finding of it - total_wasm_size += test_one(raw_input_data, opts, allow_autoreduce=given_seed is None) + total_wasm_size += test_one(raw_input_data, opts, given_wasm) except KeyboardInterrupt: print('(stopping by user request)') break @@ -873,6 +805,12 @@ if __name__ == '__main__': if given_seed is not None: given_seed_passed = False else: + # show some useful info about filing a bug and reducing the + # testcase (to make reduction simple, save "original.wasm" on + # the side, so that we can autoreduce using the name "a.wasm" + # which we use internally) + original_wasm = os.path.abspath('original.wasm') + shutil.copyfile('a.wasm', original_wasm) print('''\ ================================================================================ You found a bug! Please report it with @@ -882,9 +820,39 @@ You found a bug! Please report it with and the exact version of Binaryen you found it on, plus the exact Python version (hopefully deterministic random numbers will be identical). -(you can run that testcase again with "fuzz_opt.py %(seed)d") +You can run that testcase again with "fuzz_opt.py %(seed)d" + +The initial wasm file used here is saved as %(original_wasm)s + +You can try to reduce the testcase with + + wasm-reduce %(original_wasm)s '--command=bash reduce.sh' -t %(temp_wasm)s -w %(working_wasm)s + +where "reduce.sh" is something like + + # check the input is even a valid wasm file + bin/wasm-opt %(temp_wasm)s + echo $? + + # run the command + ./scripts/fuzz_opt.py %(seed)s %(temp_wasm)s > o 2> e + cat o | tail -n 10 + echo $? + +You may want to adjust what is printed there: in the example we save stdout +and stderr separately and then print (so that wasm-reduce can see it) what we +think is the relevant part of that output. Make sure that includes the right +details, and preferably no more (less details allow more reduction, but raise +the risk of it reducing to something you don't quite want). + +You may also need to add --timeout 5 or such if the testcase is a slow one. + +After reduction, the reduced file will be in %(working_wasm)s ================================================================================ - ''' % {'seed': seed}) + ''' % {'seed': seed, + 'original_wasm': original_wasm, + 'temp_wasm': os.path.abspath('t.wasm'), + 'working_wasm': os.path.abspath('w.wasm')}) break if given_seed is not None: if given_seed_passed: |