| Commit message | Author | Age | Files | Lines |
| |
Previously the only Ifs that were typed unreachable were those in which
both arms were unreachable and those whose condition was unreachable
and that would otherwise have been typed none. This caused
problems in IRBuilder because Ifs with unreachable conditions and
value-returning arms would have concrete types, effectively hiding the
unreachable condition from the logic for dropping concretely typed
expressions preceding an unreachable expression when finishing a scope.
Relax the conditions under which an If can be typed unreachable so that
all Ifs with unreachable conditions or two unreachable arms are typed
unreachable. Propagating unreachability more eagerly this way makes
various optimizations of Ifs more powerful. It also requires new
handling for unreachable Ifs with concretely typed arms in the Printer
to ensure that printed wat remains valid.
Also update Unsubtyping, Flatten, and CodeFolding to account for the
newly unreachable Ifs.
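For example, here is a minimal hand-written sketch (not from the tests) of an If
that previously had type i32 and is now typed unreachable, since its condition
is unreachable even though both arms are concretely typed:
(if (result i32)
  (unreachable)        ;; unreachable condition
  (then (i32.const 1)) ;; concretely typed arm
  (else (i32.const 2)) ;; concretely typed arm
)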
|
| |
(#7116)
Lower away saturating fptoint operations when we know we are using
emscripten.
|
| |
The only internal use was in wasm2js, which doesn't need it. Fix API
tests to explicitly drop expressions as necessary.
|
| |
When a loop has no name, the name does not matter, but we also cannot
emit the same name for all such loops, as that is invalid JS. Just do not
emit a while(){} at all in that case, as no continue can exist anyhow.
Fixes #7099
Also fix two missing *s in the error reporting logic, which was printing pointers
rather than the expressions we wanted to print. I think we changed how
iostream prints things years ago and forgot to update these.
|
| |
IRBuilder often has to generate new label names for blocks and other
scopes. Previously it would generate each new name by starting with
"block" or "label" and incrementing a suffix until finding a fresh name,
but this made name generation quadratic in the number of names to
generate.
To spend less time generating names, track a hint index at which to
start looking for a fresh name and increment it every time a name is
generated. This speeds up a version of the binary parser that uses
IRBuilder by about 15%.
|
| |
Run the upstream tests by default, except for a large list of them that do not
run successfully. Remove the local versions of the tests that do run successfully
upstream, where the local version is entirely subsumed by the upstream version.
|
| |
We were missing code to mangle such names for JS. Without that, the name
of a temp var for the type `(ref $foo)` would end up with `(`, `)` in
the name, which is not valid in JS.
|
| |
Previously only basic types were allowed.
Generalizing this to arbitrary types means we use a map instead of a vector,
which is slower, but I can't measure any noticeable difference. Temp vars are
pretty rare, and there are just much slower parts of wasm2js, I think.
|
| |
(#6659)
This avoids special-casing particular global init forms. After this we should
support everything in global inits that we support anywhere else.
|
| |
TableGet, Set, Size, Grow, Fill, Copy.
Also move "null" into shared-constants, to make the code
more consistent overall.
|
| |
This adds ref.eq, ref.null, ref.is_null, ref.func.
|
| |
When generating assertions, traverse the `WASTScript` data structure rather than
interleaving assertion parsing with emitting.
|
| |
We previously supported (and primarily used) a non-standard text format for
conditionals in which the condition, if-true expression, and if-false expression
were all simply s-expression children of the `if` expression. The standard text
format, however, requires the use of `then` and `else` forms to introduce the
if-true and if-false arms of the conditional. Update the legacy text parser to
require the standard format and update all tests to match. Update the printer to
print the standard format as well.
The .wast and .wat test inputs were mechanically updated with this script:
https://gist.github.com/tlively/85ae7f01f92f772241ec994c840ccbb1
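For illustration, a hand-written sketch (not taken from the script or the tests) of the two forms:
;; non-standard folded form, previously accepted
(if (local.get $c)
  (nop)
  (nop)
)
;; standard text format, now required
(if (local.get $c)
  (then (nop))
  (else (nop))
)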
|
| |
Update the legacy text parser and all tests to use the standard text format for shared memories, e.g. `(memory $m 1 1 shared)` rather than `(memory $m (shared 1 1))`. Also remove support for non-standard in-line "data" or "segment" declarations.
This change makes the tests more compatible with the new text parser, which only supports the standard format.
|
| |
We previously supported a non-standard `(func "name" ...` syntax for declaring
functions exported with the quoted name. Since that is not part of the standard
text format, drop support for it, replacing it with the standard `(func $name
(export "name") ...` syntax instead.
Also replace our other usage of the quoted form in our text output, which was
where we quoted names containing characters that are not allowed to appear in
standard names. To handle that case, adjust our output from `"$name"` to
`$"name"`, which is the standards-track way of supporting such names. Also fix
how we detect non-standard name characters to match the spec.
Update the lit test output generation script to account for these changes,
including by making the `$` prefix on names mandatory. This causes the script to
stop interpreting declarative element segments with the `(elem declare ...`
syntax as being named "declare", so prevent our generated output from regressing
by counting "declare" as a name in the script.
|
| |
`wasm2js.asserts.js` and `wasm2js.traps.js` seem to be used in wasm2js
asserts test:
https://github.com/WebAssembly/binaryen/blob/1d615b38dd4152494d2f4d3520c8b1d917624a30/scripts/test/wasm2js.py#L28
https://github.com/WebAssembly/binaryen/blob/1d615b38dd4152494d2f4d3520c8b1d917624a30/scripts/test/wasm2js.py#L126-L127
But other `*.js` tests in `test/` don't seem to be used anywhere. Please
let me know if they are actually being used.
This moves `wasm2js.asserts.js` and `wasm2js.traps.js`, which are only
used in wasm2js tests, to `test/wasm2js/`, and deletes all other `*.js`
tests in `test/`.
|
| |
That optimization uncovered some LLVM and Binaryen bugs.
|
| |
If we export a function that just calls another function, we can export the callee
instead. Then the one in the middle may become unused:
function foo() {
  return bar();
}
export foo; // can instead be an export of bar
This saves a few bytes in rare cases, but probably more important is that it saves
the trampoline, so if this is on a hot path, we save a call.
Context: emscripten-core/emscripten#20478 (comment)
In general this is not needed as inlining helps us out by inlining foo() into the
caller (since foo is tiny, that always ends up happening). But exports are a case
the inliner cannot handle, so we do it here.
|
| |
(#5989)
Fixes #5983: The testcase from there is used here in a new testcase
remove-unused-brs_levels in which we check if we are willing to unconditionally
do a division operation. Turning an if with an arm that does a division into a
select, which always does the division, is almost 5x slower, so we should probably
be extremely careful about doing that.
I took some measurements and have some suggestions for changes in this PR:
* Raise the cost of div/rem to what I measure on my machine, which is 5x slower
than an add, or worse.
* For some reason we added the if arms' costs rather than taking the max of them, so
fix that. This does not help the issue, but was confusing.
* Adjust TooCostlyToRunUnconditionally in the pass from 9 to 8 (this helps
balance the last point).
* Use half that value when not optimizing for size. That is, we allow only 4 units of
extra unconditional work normally, 8 in -Os, and in -Oz we allow any extra amount.
Aside from the new testcases, some existing ones changed. They all appear to
change in a reasonable way, to me.
We should perhaps go even further than this, and not even run a division
unconditionally in -Os, but I wasn't sure it makes sense to go that far as
other benchmarks may be affected. For now, this makes the benchmark in
#5983 run at full speed in -O3 or -Os, and it remains slow in -Oz. The
modified version of the benchmark that only divides in the if (no other
operations) is still fast in -O3, but it becomes slow in -Os as we do turn that
if into a select (but again, I didn't want to go so far as to overfit on that one
benchmark).
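As a hand-written sketch (not one of the new test cases) of the transformation these
limits constrain: turning an if whose arm divides into a select makes the division run
unconditionally.
;; before: the division only runs when $c is true
(if (result i32)
  (local.get $c)
  (then (i32.div_s (local.get $x) (local.get $y)))
  (else (i32.const 0))
)
;; after the if->select transform: the division always runs
(select
  (i32.div_s (local.get $x) (local.get $y))
  (i32.const 0)
  (local.get $c)
)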
|
| |
Br and BrOn can consider the code before and after them connected if it might
be reached (which is the case if the Br has a condition, which BrOn always has).
The wasm2js changes may look a little odd as some of them have this:
i64toi32_i32$1 = i64toi32_i32$2;
i64toi32_i32$1 = i64toi32_i32$2;
I looked into that and the reason is that those outputs are not optimized, and
also even in unoptimized wasm2js we do run simplify-locals once (to try to
reduce the downsides of flatten). As a result, this PR makes a difference there,
and that difference can lead to such odd duplicated code after other operations.
However, there are no changes to optimized wasm2js outputs, so there is no
actual problem.
Followup to #5860.
|
| |
SimplifyLocals (#5860)
This addresses most of the minor regression from the correctness fix in #5857.
That PR makes us consider calls as branching, but in some cases it is ok to
ignore that branching (see the comment in the code here), which this PR allows as
an option.
This undoes one test change from that PR, showing it undoes the regression for
SimplifyLocals. More tests are added to cover this specifically as well.
|
| |
This changes loops from having the effect "may trap (timeout)" to having
"may not return." The only noticeable difference is in TrapsNeverHappen mode,
which ignores the former but not the latter. So after this PR, in TNH mode we do
not optimize away an infinite loop that seems to have no other side effects. We
may also use this for other things in the future, like continuations/stack switching.
There are upsides and downsides to allowing the optimizer to remove infinite
loops (the C and C++ communities have had interesting discussions on that topic
over the years...) but it seems safer to not optimize them out for now, to let the
most code work properly. If a need comes up to optimize such code, we can look
at various options then (like a flag to ignore infinite loops).
See discussion in #5228
|
| |
Each time we inline we put the contents in a block. Before we used the same name
each time we inlined the same method, and as a result had many conflicts if a
function was inlined many times. With this PR we emit a different name each
time.
This is not 100% NFC as it does change block names, which is observable in the IR
(as can be seen in the test updates).
This helps #5696 by speeding up UniqueNameMapper.
|
| |
* Do not treat `atomic.fence` as using a memory
Update RemoveUnusedModuleElements so that it no longer keeps the memory alive
due to an `atomic.fence` instruction and update validation to allow modules to
use `atomic.fence` without a memory.
* update wasm2js tests
|
| |
Previously we treated global.get as a constant expression and only
additionally verified that the target globals were immutable in some cases. But
global.get of a mutable global is never a constant expression, and further,
only imported globals are available in constant expressions unless GC is
enabled.
Fix constant expression validation to only allow global.get of immutable,
imported globals, and fix all the invalid tests.
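A minimal hypothetical module (not from the tests) showing what is now rejected:
(module
  (global $mut (mut i32) (i32.const 1))
  ;; invalid: a constant expression may not read a mutable global,
  ;; and may only read imported globals unless GC is enabled
  (global $bad i32 (global.get $mut))
)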
|
| |
Without this fix, the common idiom of using `INT_MAX` in C source to mean that an
unlimited number of waiters should be woken up actually compiled down to an
argument of -1 in JS, causing zero waiters to be woken.
|
| |
The assertion that the offset is zero does not necessarily hold for code that
uses this instruction via the clang builtin. Add support so that Emscripten
wasm2js tests pass in the presence of such code.
|
| |
This can handle e.g.
(drop
  (i32.add
    (call ..)
    (call ..)
  )
)
We can remove the add and just leave two dropped calls:
(drop
  (call ..)
)
(drop
  (call ..)
)
|
| |
As noted in #4739, legacy code emitting nan and infinity
exists, with the observation that it can be removed once asm.js
is no longer used and global NaN is available.
This commit removes that asm.js-specific code accordingly.
|
| |
As noted in #4806, trying to optimize past level 0 can result in
passes emitting non-JS code, which then cannot be converted during
final output.
This commit creates a new targetJS option in PassOptions, which can
be checked inside each pass where non-JS code might be emitted.
This commit initially adds that logic to OptimizeInstructions, where
this issue was first noticed.
|
| |
possible (#5194)
(local.set $refined (cast (local.get $plain)))
..
.. (local.get $plain) .. ;; we can change this to read from $refined
By using the more refined type we may be able to eliminate casts later.
To do this, look at the fallthrough value (so we can look through a cast or a block
value - this is the reason for the small wasm2js improvements in tests), and also
extend the code that picks which local index to read to look at types (previously
we just ignored any pairs of locals with different types).
|
| |
The previous code was making emscripten-specific assumptions about
imports basically all coming from the `env` module.
I can't find a way to make this backwards compatible so may do a
combined roll with the emscripten-side change:
https://github.com/emscripten-core/emscripten/pull/17806
|
| |
Recently we added logic to ignore effects that don't "escape" past the function call.
That is, e.g. local.set only affects the current function scope, and once the call stack
is unwound it no longer matters as an effect. This moves that logic to a shared place,
and uses it in the core Vacuum logic.
The new constructor in EffectAnalyzer receives a function and then scans it as
a whole. This works just like e.g. scanning a Block as a whole (if we see a break in
the block, that has an effect only inside it, and the Block + children doesn't have a
branch effect).
Various tests are updated so they don't optimize away trivially, by adding new
return values for them.
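A minimal hand-written sketch of the idea (the function name is hypothetical): scanning
this function as a whole, the local.set is not an escaping effect, just as a break inside a
Block is not a branch effect of the Block as a whole.
(func $example (param $p i32)
  (local $tmp i32)
  ;; this only affects $example's own scope; once the call stack is
  ;; unwound it no longer matters as an effect
  (local.set $tmp (local.get $p))
)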
|
| |
due to timeout (#5039)
I think this simplifies the logic behind what we consider to trap. Before we had kind of
a hack in visitLoop that now has clearer reasoning behind it: we consider as
trapping those things that trap in all VMs all the time, or will eventually. So a single allocation
doesn't trap, but an unbounded number of them can, and an infinite loop is considered to
trap as well (a timeout in a VM will be hit eventually, somehow).
This means we cannot optimize away a trivial infinite loop with no effects in it,
while (1) {}
But we can optimize it out in trapsNeverHappen mode. In any event, such a loop
is not a realistic situation; an infinite loop with some other effect in it, like a call to
an import, will not be optimized out, of course.
Also clarify some other things regarding traps and trapsNeverHappen following
recent discussions in https://github.com/emscripten-core/emscripten/issues/17732
Specifically, TNH will never be allowed to remove calls to imports.
|
| |
This import was being injected and then used to implement trapping.
Rather than injecting an import that doesn't exist in the original
module we instead use the existing mechanism to implement this as
an internal helper.
|
| |
Previously we were assuming asmLibraryArg, which is what emscripten
passes as the `env` import object, but using this method is more
flexible and should allow wasm2js to work with imports that do
not all come from a single object.
The slight size increase here is just temporary until emscripten
gets updated.
See https://github.com/emscripten-core/emscripten/pull/17737
|
| |
Previously the wat parser would turn this input:
(block
  (nop)
)
into something like this:
(block $block17
  (nop)
)
It just added a name all the time, in case the block is referred to by an index
later even though it doesn't have a name.
This PR makes us roundtrip more precisely by not adding such names: if there
was no name before, and there is no break by index, then do not add a name.
In addition, this will be useful for non-nullable locals since whether a block has
a name or not matters there. Like #4912, this makes us more regular in our
usage of block names.
|
| |
Export an object with a `.value` property like the wasm JS API does
in browsers, and implement them with a getter and setter.
Fixes #4522
|
| |
Without this, the result in a build without assertions might be quite
confusing. See #4410
Also make the internal names more obviously internal.
|
| |
Also, fix a bug where a pointer was being used directly to
index into an Int32Array. I suppose this code had basically
zero users until I tried to land this change in emscripten:
https://github.com/emscripten-core/emscripten/pull/15742
|
| |
This removes the old hardcoded value numbering in that pass and makes
it use the new code that was split into helper code. The immediate benefit
of this is to make the code aware of identical constants: if two locals have
the same constant then they do not interfere. Future improvements to
numbering will also automatically help here.
This changes some constants in existing tests so that they keep testing
what they were testing before, and adds new tests for the new benefit here.
This implements a proposed TODO from #4314
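A minimal hand-written sketch (names hypothetical) of the new benefit: both locals hold
the same constant over their overlapping live ranges, so they do not interfere and can
be coalesced.
(func $example (result i32)
  (local $a i32)
  (local $b i32)
  (local.set $a (i32.const 5))
  (local.set $b (i32.const 5)) ;; the same constant as $a
  ;; $a and $b are both live here, but hold identical values,
  ;; so they do not interfere
  (i32.add (local.get $a) (local.get $b))
)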
|
| |
It seems that with this emscripten change, DCE is able to remove
the `assert` JS runtime function, making this call to assert fail
with `ReferenceError: assert is not defined`.
|
| |
The old algorithm can be summarized as: In each basic block, start at the beginning.
Each pair of live locals there might interfere with each other, as they might arrive from
different entry blocks with different values. Afterwards, go through the block and find
overlapping live ranges, and mark interferences there as well.
This is non-linear because at the start of the block we do a double-loop over all
pairs of live locals, which in general can be O(N^2) (N - number of locals). It also
has the downside of ignoring copies: if two locals have overlapping live ranges but
they must have identical values on those ranges, they do not actually interfere,
for example
x = 10;
y = x;
.. // live ranges overlap here
foo(x, y); // live ranges end here.
We can ignore this overlap since the copy shows they are identical there, but the
pass did not take this into account. To some extent other passes can remove such
copies (SimplifyLocals, MergeLocals, RedundantSetElimination), but in general
this was a weak spot for the optimizer.
I realized there is a solution to both these problems: In Wasm, given that we have
a default value for all locals, if a local is live at the start of a block then it must be
live at the end of all the blocks reaching it. That is so because the liveness will
extend backwards all the way to some set of the local, possibly all the way to
the zero-initialization at the start of the function, and it extends that way through
all predecessor blocks. A consequence of this is that there are no interferences
between locals that only occur during a merge: The live ranges include the
predecessor blocks, and theirs, and so forth, until we reach a block where one
of the locals is assigned a value different from the other's. That is a necessary and
sufficient condition for interference, and therefore when processing a block we
only need to look at its contents, and can ignore the merging of control flow,
which allows us to be linear.
More details on this and on the new algorithm in comments in the source, but
the basic idea is that it simply goes through each block in a linear way, finding
which values are assigned to each local (using a numbering of unique values),
and noting which are live at each time. If two locals are live and one is assigned
a value that is not the same as the value in the other, mark them as interfering.
This is of substantial benefit to j2wasm output, I believe because it is common
there to find local common subexpression elimination opportunities after inlining, and
each time we find one we add a local. If we inline different functions into the
same target, we may end up with copied locals for each of them. (This was
not noticed in the past because it is very rare on LLVM output, which has
already had inlining and GVN etc. done.)
There is a small benefit to LLVM output as well, though just a few
percent at best. However, it is enough to be noticeable on some of
the code size tests.
This is also faster than the previous pass. It's normally not noticeable
as this pass is not one of the slowest anyhow, but I found some real-world
codebases where the pass becomes 50% faster. I have not found any
case where it is slower than the old algorithm.
Fuzzed over several days to be sure this is correct, and also verified
on the emscripten test suite.
|
| |
Canonicalize:
(signed)x > -1 ==> x >= 0
(signed)x <= -1 ==> x < 0
(signed)x < 1 ==> x <= 0
(signed)x >= 1 ==> x > 0
(unsigned)x < 1 ==> x == 0
(unsigned)x >= 1 ==> x != 0
This should help #4265, and in general 0 is usually a more
common constant, and reasonable to canonicalize to.
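For example, the first rule in wat form (hand-written sketch):
;; (signed)x > -1
(i32.gt_s (local.get $x) (i32.const -1))
;; becomes
(i32.ge_s (local.get $x) (i32.const 0))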
|
| |
If all a select's inputs are boolean, we can sometimes turn the select
into an AND or an OR operation,
x ? y : 0 => x & y
x ? 1 : y => x | y
I believe LLVM aggressively canonicalizes to this form. It makes sense
to do it here too as it is smaller (saving the constant 0 or 1). It also allows
further optimizations (which is why LLVM does it) but I don't think we
have those yet.
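A concrete hand-written sketch of the first rule, assuming $x and $y are known to be boolean:
;; x ? y : 0
(select
  (local.get $y)
  (i32.const 0)
  (local.get $x)
)
;; becomes
(i32.and
  (local.get $x)
  (local.get $y)
)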
|
| |
Enable it in -O3 and -Os and higher.
This helps very little on output from LLVM, but also it does not alter
compile times much anyhow. On code that has not been run through
an optimizing compiler already, this can help quite a lot, e.g., 15% of
code size on some wasm GC samples.
This will not normally help with speed, as optimizing VMs do such
things anyhow. However, this can help baseline compilers and
interpreters and so forth.
|
| |
Fixes #3973
Loads:
f32.reinterpret_i32(i32.load(x)) => f32.load(x)
f64.reinterpret_i64(i64.load(x)) => f64.load(x)
i32.reinterpret_f32(f32.load(x)) => i32.load(x)
i64.reinterpret_f64(f64.load(x)) => i64.load(x)
Stores:
f32.store(y, f32.reinterpret_i32(x)) => i32.store(y, x)
f64.store(y, f64.reinterpret_i64(x)) => i64.store(y, x)
i32.store(y, i32.reinterpret_f32(x)) => f32.store(y, x)
i64.store(y, i64.reinterpret_f64(x)) => f64.store(y, x)
Also optimize reinterprets that are undone:
i32.reinterpret_f32(f32.reinterpret_i32(x)) => x
i64.reinterpret_f64(f64.reinterpret_i64(x)) => x
f32.reinterpret_i32(i32.reinterpret_f32(x)) => x
f64.reinterpret_i64(i64.reinterpret_f64(x)) => x
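For example, the first load rule in wat form (hand-written sketch):
;; f32.reinterpret_i32(i32.load(x))
(f32.reinterpret_i32
  (i32.load (local.get $x))
)
;; becomes
(f32.load (local.get $x))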
|