| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
| |
|
|
|
| |
Split them into two i32 globals.
|
|
|
| |
Fixes #1984
|
| |
|
|
|
|
|
| |
In the absence of the target features section or command line flags. When there are command line flags, it is an error if they do not exactly match the target features section, except if --detect-features has been provided.
Also adds a --print-features pass to print the command line flags for all enabled options and uses it to make the feature tests more rigorous.
|
|
|
|
|
| |
* Make the memory instrumentation pass log both pointers and values.
* Use "env" as the import module - simpler to support and get working.
|
|
|
| |
Fixes #2007 #2008
|
|
|
|
|
| |
This allows us to emit a (potentially modified) target features
section and conditionally emit other sections such as the DataCount
section based on the presence of features.
|
|
|
|
|
|
|
|
| |
* I64ToI32Lowering - don't assume address 0 is a hardcoded location for scratch memory. Import __tempMemory__ for that.
* RemoveNonJSOps - also use __tempMemory__. Oddly here the address was a hardcoded 1024 (perhaps where the rust program put a static global?).
* Support imported ints in wasm2js, coercing them as needed.
* Add "env" import support in the tests, since now we emit imports from there.
* Make wasm2js tests split out multi-module tests using split_wast which is more robust and avoids emitting multiple outputs in one file (which makes no sense for ES6 modules)
|
| |
|
|
|
| |
We handled this for the normal case, but the optimizer can also precompute a return into a value, so check the output of the precomputation as well.
|
|
|
|
|
|
|
| |
Get fuzzer to attempt to create almost all features. Pass v8 all the flags to allow that.
Fix fuzz bugs where we read signed_ even when it was irrelevant for that type of load.
Improve wasm-reduce on fuzz testcases, try to replace a node with drops of its children, not just the children themselves.
|
|
|
|
| |
computations (#1990)
|
|
|
|
| |
(#1989)
|
|
|
|
|
|
|
|
| |
Hash the contents of all of memory and log that out in random places in the fuzzer, so we are more sensitive there and can catch memory bugs.
Fix UB that was uncovered by this in the binary writing code - if a segment is empty, we should not look at &vector[0], and instead use vector.data().
Add Builder::addExport convenience method.
|
|
|
|
|
|
| |
It was previously part of writing a binary, but changing the number of
segments at such a late stage would not work in the presence of bulk
memory's datacount section. Also updates the memory packing pass
to respect the web's limits on the number of data segments.
|
|
|
|
| |
optimizing to an unreachable (#1985)
|
|
|
|
|
| |
Adds support for the bulk memory proposal's passive segments. Uses a
new (data passive ...) s-expression syntax to mark sections as
passive.
|
|
|
|
|
|
|
| |
Minus multi-memory which we don't support yet.
Improve validator.
Fix some minor validation issues in our tests.
|
|
|
|
|
|
|
|
|
| |
This allows
wasm-opt --pass-arg=KEY:VALUE
where KEY and VALUE are strings. It is then added to passOptions.arguments, where passes can read it.
This is used in ExtractFunction instead of an env var.
|
| |
|
| |
|
|
|
|
|
|
|
|
| |
This renames the following:
- `i32.wait` -> `i32.atomic.wait`
- `i64.wait` -> `i64.atomic.wait`
- `wake` -> `atomic.notify`
to match the spec.
|
|
|
|
|
|
|
| |
This adds an ssa-nomerge pass, which like ssa creates new local indexes for each set, but it does not alter indexes that have merges (in practice adding indexes to merges can lead to more copies in the end.)
This also stops adding a new local index for a set that is already in "ssa form", that is, has only one set (aside from the zero initialization which wasm mandates, but for an "ssa form" index, that must not be used).
This then enables ssa-nomerge in -O3 and -Os. This doesn't help much on well-optimized code like from the wasm backend (but it does sometimes - 0.5% code size improvement on Box2D), but on AssemblyScript for example it can remove a copy in the n-body benchmark as can be seen in the test updates here.
|
|
|
|
|
|
|
| |
A propagated constant can be helpful in the various patterns in optimize instructions.
Testcase shows an example of this in action - we can optimize out a load offset for a constant, but if we propagated it afterwards, we would miss that.
In general these two passes can help each other, so maybe they should be combined and run multiple iterations, but that's what --converge is for. Meanwhile this change improves us on what seems to be the more common case - guessed at by it being what I noticed in practice, and when I run the fuzzer, I see only this type of case.
|
|
|
|
|
| |
Recreate it using --extract-function which turns unwanted functions into exports. This avoids weirdness with them having empty function bodies and the inliner taking advantage of that.
Also uses updated LLVM, which no longer has incorrectly identified irreducible control flow here.
|
|
|
|
| |
Also, always output high level metrics even when zero.
|
|
|
|
|
| |
Parse the formats allowed by the spec proposal and emit the i32x4
canonical format.
|
| |
|
|
|
| |
And run it in wasm-emscripten-finalize. This will prevent the emscripten output from changing when the target features section lands in LLVM.
|
| |
|
|
|
|
|
|
|
| |
That caused it to miss switch targets, and a code-folding bug.
Fixes #1838
Sadly the fuzzer didn't find this because code folding looks for very particular code patterns that are unlikely to be emitted randomly.
|
| |
|
|
|
|
| |
that one var by reusing a param
|
|
|
|
| |
uses of the original add, as otherwise we may just be adding work (both an offset, and an add). Refactor local-utils.h, and make UnneededSetRemover also check for side effects, so it cleanly removes all traces of unneeded sets.
|
|
|
|
|
| |
Multiple propagations may be possible in some cases, like nested
structs in C.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The initial OptimizeAddedConstants pass did not try to handle the case of non-ssa locals. However, that can happen, and optimizing those cases too improves us by almost 1% of code size on some large benchmarks like bullet.
How this works is that if we see
b = a + 10
a = c
load(b)
then we copy the base value at the add,
a' = a
b = a' + 10
a = c
load(a', offset=10)
This no longer has a guarantee of improving code size, since in theory both b and a may have other uses. However, in practice it's very common for b to be optimized out later.
|
|
|
|
|
| |
This can happen in emscripten if we run fpcast-emu after previous passes created dynCalls already. If so, it's fine to just do nothing.
Fixes emscripten-core/emscripten#8229
|
|
|
|
|
|
|
|
| |
This PR changes the formatting of v128.const literals in text format / stack ir like so
- v128.const i32 0x1 0x2 0x3 0x4 0x5 0x6 0x7 0x8 0x9 0xa 0xb 0xc 0xd 0xe 0xf 0x80
+ v128.const i32 0x04030201 0x08070605 0x0c0b0a09 0x800f0e0d
Recently hit this when trying to load Binaryen generated text format with WABT, which errored with `error: unexpected token 0x5, expected ).
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
See #1919 - we did not do this consistently before.
This adds a lowMemoryUnused option to PassOptions. It can be passed on the commandline with --low-memory-unused. If enabled, we run the new optimize-added-constants pass, which does the real work here, replacing older code in post-emscripten.
Aside from running at the proper time (unlike the old pass, see #1919), this also has a -propagate mode, which can do stuff like this:
y = x + 10
[..]
load(y)
[..]
load(y)
=>
y = x + 10
[..]
load(x, offset=10)
[..]
load(x, offset=10)
That is, it can propagate such offsets to the loads/stores. This pattern is common in big interpreter loops, where the pointers are offsets into a big struct of state.
The pass does this propagation by using a new feature of LocalGraph, which can verify which locals are in SSA mode. Binaryen IR is not SSA (intentionally, since it's a later IR), but if a local only has a single set for all gets, that means that local is in such a state, and can be optimized. The tricky thing is that all locals are initialized to zero, so there are at minimum two sets. But if we verify that the real set dominates all the gets, then the zero initialization cannot reach them, and we are safe.
This PR also makes safe-heap aware of lowMemoryUnused. If so, we check for not just an access of 0, but the range 0-1023.
This makes zlib 5% faster, with either the wasm backend or asm2wasm. It also makes it 0.5% smaller. Also helps sqlite (1.5% faster) and lua (1% faster)
|
|
|
|
|
| |
This refactors the hashing and comparison code to use a single immediate-value iterator. This makes us have a single place that knows the list of immediate fields in every node type, instead of 2.
This also fixes a few bugs found by doing that. In particular, this makes us slightly slower than before since we are hashing more fields.
|
|
|
|
|
|
|
| |
* Finds functions whose return value is always dropped, and removes the return.
* Run multiple iterations of the pass, as one can enable others.
* Do not run DeadArgumentElimination at all if debug info is present (with these improvements, it became much more likely to destroy debug info).
Saves 2.5% on hello world, because of some simple libc calls.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Checks if a value is being dropped higher up, like
```
(drop
(block i32
(block i32
(i32.const 1)
)
)
)
```
Handling this forces us to be careful in that pass about whether a value is used, and whether the type matters (for example, we can't replace a unary with its child in all cases, if the return value matters).
|
|
|
|
|
|
|
|
|
| |
* make DE_NAN avoid creating nan literals in the first place
* add a reducer option `--denan` to not introduce nans in destructive reduction
* add a `Literal::isNaN()` method
* also remove the default exception logging from the fuzzer js glue, which is a source of non-useful VM differences (like nan nondeterminism)
* added an option `--no-fuzz-nans` to make it easy to avoid nans when fuzzing (without hacking the source and recompiling).
Background: trying to get fuzzing on jsc working despite this open issue: https://bugs.webkit.org/show_bug.cgi?id=175691
|
|
|
| |
See [emscripten-core/emscripten#7679
|
|
|
|
|
|
|
|
| |
WebAssembly/tool-conventions#93 has a summary of emscripten's current thinking on this. For Binaryen, we don't want to do anything to the producers section by default, but do want it to be possible to optionally remove it. To achieve that, this PR
* creates a --strip-producers pass that removes that section.
* creates a --strip-debug pass that removes debug info, same as the old --strip, which is still around but deprecated.
A followup in emscripten will use this pass by default.
|
|
|
| |
Before this, we just did not emit illegal dynCalls. This was wrong as we do need them (e.g. if a function with a setjmp call calls a function with an i64 param - we'll have an invoke with that signature there). We just need to legalize them. This fixes that by first emitting them, and second by running legalization late, after dynCalls have been generated, so it legalizes them too.
|
|
|
|
|
|
|
|
|
|
| |
FuncCastEmulation supports a hardcoded number of parameters:
// This should be enough for everybody. (As described above, we need this
// to match when dynamically linking, and also dynamic linking is why we
// can't just detect this automatically in the module we see.)
static const int NUM_PARAMS = 15;
Turns out 15 is not enough for everybody: Ruby 2.6.0 needs NUM_PARAMS = 16. This patch is necessary to support Ruby 2.6.0 in WebAssembly, and in fact is the only patch needed to make the relevant build process work with an otherwise normal emscripten toolchain.
|
|
|
|
|
| |
* Also fixes some bugs in wasm2js tests that did not validate.
* Rename FeatureOptions => ToolOptions, as they now contain all the basic stuff each tool needs for commandline options (validation yes or no, and which features if so).
|
|
|
|
|
|
|
|
|
|
|
| |
The main fuzz_opt.py script compares JS VMs, and separately runs binaryen's fuzz-exec that compares the binaryen interpreter to itself (before and after opts). This PR lets us directly compare binaryen's interpreter output to JS VMs. This found a bunch of minor things we can do better on both sides, giving more fuzz coverage.
To enable this, a bunch of tiny fixes were needed:
* Add --fuzz-exec-before which is like --fuzz-exec but just runs the code before opts are run, instead of before and after.
* Normalize double printing (so JS and C++ print comparable things). This includes negative zero in JS, which we never printed properly til now.
* Various improvements to how we print fuzz-exec logging - remove unuseful things, and normalize the others across JS and C++.
* Properly legalize the wasm when --emit-js-wrapper (i.e., we will run the code from JS), and use that in the JS wrapper code.
|