| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We had two JS files that could run a wasm file for fuzzing purposes:
* --emit-js-shell, which emitted a custom JS file that runs the wasm.
* scripts/fuzz_shell.js, which was a generic file that did the same.
Both of those load the wasm and then call the exports in order and print out
logging as it goes of their return values (if any), exceptions, etc. Then the
fuzzer compares that output to running the same wasm in another VM, etc. The
difference is that one was custom for the wasm file, and one was generic. Aside
from that they are similar and duplicated a bunch of code.
This PR improves things by removing 1 and using 2 in all places, that is, we
now use the generic file everywhere.
I believe we added 1 because we thought a generic file can't do all the
things we need, like know the order of exports and the types of return values,
but in practice there are ways to do those things: The exports are in fact
in the proper order (JS order of iteration is deterministic, thankfully), and
for the type we don't want to print type internals anyhow since that would
limit fuzzing --closed-world. We do need to be careful with types in JS (see
notes in the PR about the type of null) but it's not too bad. As for the types
of params, it's fine to pass in null for them all anyhow (null converts to a
number or a reference without error).
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously the idea was that we started with HANG_LIMIT = 10 or so, and we'd decrement
it by one in each potentially-recursive call and loop entry. When we reached 0 we'd start
to unwind the stack. Then, after we unwound it all the way, we'd reset HANG_LIMIT before
calling the next export.
That approach adds complexity that each "execution wrapper", like for JS or for --fuzz-exec,
had to manually reset HANG_LIMIT. That was done by calling an export. Calls to those
exports had to appear in various places, which is sort of a hack.
The new approach here does the following when the hang limit reaches zero: It resets
HANG_LIMIT, and it traps. The trap unwinds the call stack all the way out. When the next
export is called, it will have a fresh hang limit since we reset it before the trap.
This does have downsides. Before, we did not always trap when we hit the hang limit but
rather we'd emit something unreachable, like a return. The idea was that we'd leave the
current function scope at least, so we don't hang forever. That let us still execute a small
amount of code "on the way out" as we unwind the stack. I'm not sure it's worth the
complexity for that.
The advantages of this PR are to simplify the code, and also it makes more fuzzing
approaches easy to implement. I'd like to add a wasm-ctor-eval fuzzer, and having to add
hacks to call the hang limit init export in it would be tricky. With this PR, the execution
model is simple in the fuzzer: The exports are called one by one, in order, and that's it -
no extra magic execution needs to be done.
Also bump the hang limit from 10 to 100, just to give some more chance for code to run.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
With the goal of supporting null characters (i.e. zero bytes) in strings.
Rewrite the underlying interned `IString` to store a `std::string_view` rather
than a `const char*`, reduce the number of map lookups necessary to intern a
string, and present a more immutable interface.
Most importantly, replace the `c_str()` method that returned a `const char*`
with a `toString()` method that returns a `std::string`. This new method can
correctly handle strings containing null characters. A `const char*` can still
be had by calling `data()` on the `std::string_view`, although this usage should
be discouraged.
This change is NFC in spirit, although not in practice. It does not intend to
support any particular new functionality, but it is probably now possible to use
strings containing null characters in at least some cases. At least one parser
bug is also incidentally fixed. Follow-on PRs will explicitly support and test
strings containing nulls for particular use cases.
The C API still uses `const char*` to represent strings. As strings containing
nulls become better supported by the rest of Binaryen, this will no longer be
sufficient. Updating the C and JS APIs to use pointer, length pairs is left as
future work.
|
|
|
|
|
|
|
|
|
| |
When using nominal types, func.ref of two functions with identical signatures
but different HeapTypes will yield different types. To preserve these semantics,
Functions need to track their HeapTypes, not just their Signatures.
This PR replaces the Signature field in Function with a HeapType field and adds
new utility methods to make it almost as simple to update and query the function
HeapType as it was to update and query the Function Signature.
|
|
|
|
| |
VMs will not convert a 0 or undefined from JS into a wasm null reference - it must be null.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This matches #3747 which makes us not log out reference values, instead
we print just their types.
This also prints a type for non-reference things, replacing a previous
exception, which affects things like SIMD and BigInts, but those trap
anyhow at the JS boundary I believe (or did that change for SIMD?).
Anyhow, printing the type won't give a false "ok" when comparing wasm2js
output to the interpreter, assuming the interpreter prints out a value and
not just a type (which is the case). We could try to do better, but this
code is on the JS side, where we don't have the type - just a string
representation of it, which we'd need to parse etc.
|
| |
|
|
|
| |
Since they make the code clearer and more self-documenting.
|
|
|
| |
This leads to simpler code and is a prerequisite for #3012, which makes it so that not all `Type`s are backed by vectors that `expand` could return.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The main benefit here is comparing VMs, instead of just comparing
each VM to itself after opts. Comparing VMs is a little tricky since there
is room for nondeterminism with how results are printed and other
annoying things, which is why that didn't work well earlier.
With this PR I can run 10's of thousands of iterations without finding
any issues between v8 and the binaryen interpreter. That's after
fixing the various issues over the last few days as found by this:
#2760 #2757 #2750 #2752
Aside from that main benefit I ended up adding more improvements
to make it practical to do all that testing:
Randomize global fuzz settings like whether we allow NaNs and
out-of-bounds memory accesses. (This was necessary here since
we have to disable cross-VM comparisons if NaNs are enabled.)
Better logging of statistics like how many times each handler
was run.
Remove redundant FuzzExecImmediately handler (looks like
after past refactorings it was no longer adding any value).
Deterministic testcase handling: if you run e.g. fuzz_opt.py 42 it
will run one testcase and exactly the same one. If you run without
an argument it will run forever until it fails, and if it fails, it prints out
that ID so that you can easily reproduce it (I guess, on the same
binaryen + same python, not sure how python's deterministic
RNG changes between versions and builds).
Upgrade to Python 3.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Function signatures were previously redundantly stored on Function
objects as well as on FunctionType objects. These two signature
representations had to always be kept in sync, which was error-prone
and needlessly complex. This PR takes advantage of the new ability of
Type to represent multiple value types by consolidating function
signatures as a pair of Types (params and results) stored on the
Function object.
Since there are no longer module-global named function types,
significant changes had to be made to the printing and emitting of
function types, as well as their parsing and manipulation in various
passes.
The C and JS APIs and their tests also had to be updated to remove
named function types.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Adds the ability to create multivalue types from vectors of concrete value
types. All types are transparently interned, so their representation is still a
single uint32_t. Types can be extracted into vectors of their component parts,
and all the single value types expand into vectors containing themselves.
Multivalue types are not yet used in the IR, but their creation and inspection
functionality is exposed and tested in the C and JS APIs.
Also makes common type predicates methods of Type and improves the ergonomics of
type printing.
|
|
|
| |
Applies the changes in #2065, and temprarily disables the hook since it's too slow to run on a change this large. We should re-enable it in a later commit.
|
|
|
| |
Mass change to apply clang-format to everything. We are applying this in a PR by me so the (git) blame is all mine ;) but @aheejin did all the work to get clang-format set up and all the manual work to tidy up some things to make the output nicer in #2048
|
|
|
|
|
|
|
|
|
| |
* make DE_NAN avoid creating nan literals in the first place
* add a reducer option `--denan` to not introduce nans in destructive reduction
* add a `Literal::isNaN()` method
* also remove the default exception logging from the fuzzer js glue, which is a source of non-useful VM differences (like nan nondeterminism)
* added an option `--no-fuzz-nans` to make it easy to avoid nans when fuzzing (without hacking the source and recompiling).
Background: trying to get fuzzing on jsc working despite this open issue: https://bugs.webkit.org/show_bug.cgi?id=175691
|
|
|
|
|
|
|
|
|
|
|
| |
The main fuzz_opt.py script compares JS VMs, and separately runs binaryen's fuzz-exec that compares the binaryen interpreter to itself (before and after opts). This PR lets us directly compare binaryen's interpreter output to JS VMs. This found a bunch of minor things we can do better on both sides, giving more fuzz coverage.
To enable this, a bunch of tiny fixes were needed:
* Add --fuzz-exec-before which is like --fuzz-exec but just runs the code before opts are run, instead of before and after.
* Normalize double printing (so JS and C++ print comparable things). This includes negative zero in JS, which we never printed properly til now.
* Various improvements to how we print fuzz-exec logging - remove unuseful things, and normalize the others across JS and C++.
* Properly legalize the wasm when --emit-js-wrapper (i.e., we will run the code from JS), and use that in the JS wrapper code.
|
|
|
|
|
|
|
|
|
| |
After we added logging to the fuzzer, we forgot to add to the JS glue code the necessary imports so it can be run there too.
Also adds legalization for the JS glue code imports and exports.
Also adds a missing validator check on imports having a function type (the fuzzing code was missing one).
Fixes #1842
|
|
|
|
| |
* rename WasmType to Type. it's in the wasm:: namespace anyhow, and without Wasm- it fits in better alongside Index, Address, Expression, Module, etc.
|
|
This adds a new method of fuzzing, "translate to fuzz" which means we consider the input to be a stream of data that we translate into a valid wasm module. It's sort of like a random seed for a process that creates a random wasm module. By using the input that way, we can explore the space of valid wasm modules quickly, and it makes afl-fuzz integration easy.
Also adds a "fuzz binary" option which is similar to "fuzz execution". It makes wasm-opt not only execute the code before and after opts, but also write to binary and read from it, helping to fuzz the binary format.
|