The spec for it is here:
https://docs.google.com/document/d/1DklC3qVuOdLHSXB5UXghM_syCh-4cMinQ50ICiXnK3Q/edit#
Also reorder some things in wasm.h that were not in the canonical order (this has
no effect, but the inconsistency is confusing to read).
|
For example,
(if (result i32)
  (local.get $x)
  (return
    (local.get $y)
  )
  (return
    (local.get $z)
  )
)
If we moved the returns outside, the if would become unreachable, but we
should not make such type changes in this pass (they are handled by DCE
and Vacuum). (Found by the fuzzer.)
|
We tried to ignore unreachable code, but only checked the type of
the entire node. However, an arm might be unreachable, and after moving
code around, more work would be needed to update the type. Such
cases are best left to DCE anyhow, so just check for any unreachability
and stop there.
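
For illustration, a minimal sketch of such a case (hypothetical function
names, not from the commit): the if below has type i32 as a whole, yet
its first arm is unreachable.

(module
  (func $get-y (result i32)
    (i32.const 1)
  )
  (func $test (param $x i32) (result i32)
    ;; the if has type i32 overall, but the then arm is unreachable;
    ;; moving its code out would require refinalizing types, so such
    ;; cases are left to DCE
    (if (result i32)
      (local.get $x)
      (then
        (return
          (i32.const 0)
        )
      )
      (else
        (call $get-y)
      )
    )
  )
)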
|
Effects are fine in the moved code if we are doing so on an if
(which runs just one arm anyhow).
Allow unreachable, which lets us hoist returns, for example.
Allow none, which lets us hoist drop and call, for example. For
this we also need to be careful with subtyping, as at least drop
is polymorphic, so the child types may not have an LUB (see
example in code).
Adds a small ShallowEffectAnalyzer child of EffectAnalyzer that
calls visit to do just a shallow analysis (instead of walk, which
also walks the children).
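
A minimal sketch of the drop-hoisting case (hypothetical function names,
assuming both callees return i32):

(module
  (func $foo (result i32)
    (i32.const 1)
  )
  (func $bar (result i32)
    (i32.const 2)
  )
  (func $test (param $c i32)
    ;; before: each arm drops its own value
    (if (local.get $c)
      (then
        (drop
          (call $foo)
        )
      )
      (else
        (drop
          (call $bar)
        )
      )
    )
    ;; after: the drop is hoisted out of the if, whose arms now
    ;; yield their values directly
    (drop
      (if (result i32)
        (local.get $c)
        (then
          (call $foo)
        )
        (else
          (call $bar)
        )
      )
    )
  )
)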
|
(select
  (foo
    (X)
  )
  (foo
    (Y)
  )
  (condition)
)
=>
(foo
  (select
    (X)
    (Y)
    (condition)
  )
)
To make this simpler, refactor optimizeTernary to be templated.
|
(select
  (i32.eqz (X))
  (i32.const 0|1)
  (Y)
)
=>
(i32.eqz
  (select
    (X)
    (i32.const 1|0)
    (Y)
  )
)
This is beneficial as the eqz may be folded into something on the outside.
I see this pattern in real-world code: it appears in a GC benchmark (which
is why I noticed it), and it also shrinks code size by tiny amounts on the
Emscripten benchmark suite.
|
In both cases doing the ref.as_non_null last is beneficial as we have
optimizations that can remove it based on where it is consumed.
|
* Note that ref.cast has a fallthrough value.
* Optimize ref.eq on identical inputs.
|
ref.as_non_null is not needed if the value flows into a place that traps
on null anyhow. We replace a trap on one instruction with a trap on
another, but we allow such things (and even changing trap types, which
does not happen here).
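
A minimal illustration (hypothetical type and field names, using GC
struct types): the ref.as_non_null below is redundant because struct.get
already traps on a null reference.

(module
  (type $T (struct (field i32)))
  (func $get (param $r (ref null $T)) (result i32)
    ;; the ref.as_non_null can be removed: struct.get traps on null
    ;; anyhow, so the trap just moves to another instruction
    (struct.get $T 0
      (ref.as_non_null
        (local.get $r)
      )
    )
  )
)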
|
If we are ignoring implicit traps, and if the cast is from a subtype to a supertype,
then we ignore the possible RTT-related inconsistency and can just drop the
cast.
See #3636
|
This is similar to the optimization of BrOn in #3719 and #3724. When the
type tells us the kind of input we have, we can tell at compile time what
result we'll get; for example, ref.is_func of something with type (ref func)
will always return 1, etc.
There are some code size and perf tradeoffs that should be looked into
more; they are marked as TODOs.
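
A minimal sketch (hypothetical function name, using the GC MVP syntax of
the time; ref.is_func was later removed from the final spec):

(module
  (func $test (param $f (ref func)) (result i32)
    ;; the param type is non-null and definitely a function, so this
    ;; can be constant-folded to (i32.const 1)
    (ref.is_func
      (local.get $f)
    )
  )
)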
|
This was noticed by samparker on LLVM:
https://reviews.llvm.org/D99171
This is apparently a pattern LLVM emits, and doing it there helps by 1-2%
on the real-world Bullet Physics codebase. Seems worthwhile doing here
as well.
|
Same as we already do for struct.set.
|
(#3680)
When storing to an i8, we can ignore any higher bits, etc.
Adds a getByteSize utility to Field to make this convenient.
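
A minimal sketch (hypothetical type and field names): the explicit mask
below is redundant, as only the low 8 bits are stored into an i8 field
anyhow.

(module
  (type $T (struct (field (mut i8))))
  (func $set (param $r (ref $T)) (param $x i32)
    ;; the i32.and can be dropped: storing to an i8 field already
    ;; ignores the higher bits
    (struct.set $T 0
      (local.get $r)
      (i32.and
        (local.get $x)
        (i32.const 0xff)
      )
    )
  )
)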
|
Instead of a single big optimize() method we now use separate functions
per instruction. This gives us smaller functions and less nesting in some
cases, and avoids manual casting and checking, etc.
The reason this was not done originally is that this pass does repeated
applications. That is, if optimize() changed something, it would run again
on the result, perhaps further optimizing it. It did not need to run on the
children, but just on the result itself, so it didn't do another full walk,
and the simplest way was to just loop on optimize(). To replace that,
this PR modifies replaceCurrent(), which the methods now call to report
that the current node can be replaced. There is some code in there now that
keeps doing more processing while changes happen. It's not trivial code, as
it avoids recursion, but that slight complexity seems worthwhile in order to
simplify the bulk of the (very large) pass.
|
In principle an unreachable expression can be used in any position. An
exception to this rule is in OptimizeInstructions, which avoids replacing
concrete expressions with unreachable expressions so that it doesn't need
to refinalize any expressions. Notably, Type::getLeastUpperBound was
already treating unreachable as the bottom type.
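
A minimal sketch of what "any position" means here (hypothetical
function name):

(module
  (func $test (result i32)
    ;; unreachable validates wherever any type is expected, here as
    ;; the first i32 operand of an add
    (i32.add
      (unreachable)
      (i32.const 1)
    )
  )
)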
|
This updates `try`-`catch`-`catch_all` and `rethrow` instructions to
match the new spec. `delegate` is not included. Now `Try` contains not a
single `catchBody` expression but a vector of catch
bodies and events.
This updates most existing routines, optimizations, and tests modulo the
interpreter and the CFG traversal. Because the interpreter has not been
updated yet, the EH spec test is temporarily disabled in check.py. Also,
because the CFG traversal for EH is not yet updated, several EH tests in
`rse_all-features.wast`, which uses CFG traversal, are temporarily
commented out.
Also added a few more tests in existing EH test functions in
test/passes. In the previous spec, `catch` caught all exceptions,
so it was assumed that anything a `try` body throws is caught by its
`catch`; now we can assume the same only if there is a `catch_all`. The
newly added tests separately cover cases where there is a `catch_all`
and cases where there are only `catch`es.
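
A rough sketch of the structure this enables (hypothetical tag and
values, written in the later tag-based text syntax rather than the
event-based one in use at the time):

(module
  (tag $e (param i32))
  (func $test (result i32)
    (try (result i32)
      (do
        (throw $e
          (i32.const 42)
        )
      )
      (catch $e
        ;; the thrown i32 payload is on the stack here and becomes
        ;; the result
      )
      (catch_all
        (i32.const -1)
      )
    )
  )
)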
|
The code there looks for a "sign-extend": (x << a) >> b where the
right shift is signed. If a = b = 24, for example, then that is a sign
extend of an 8-bit value (it works by shifting the 8-bit value's sign bit
to the position of the 32-bit value's sign bit, then shifting all the way
back, which fills everything above 8 bits with the sign bit). The tricky
thing is that in some cases we can handle a != b, but we forgot to check
that in one place. Specifically, a repeated sign-extend is not necessary,
but if the outer one has extra shifts, we can't remove it.
This is annoyingly complex code, but for purposes of reviewing this
PR, you can see (unless I messed up) that the only change is to
ensure that when we look for a repeated sign-extend, we only
optimize when there are no extra shifts. A repeated sign-extend is
obviously ok to remove:
(((x << a) >> a) << a) >> a => (x << a) >> a
This is an ancient bug, showing how hard it can be to find certain
patterns either by fuzzing or in the real world...
Fixes #3362
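
For concreteness, a minimal sketch of the 8-bit sign-extend pattern and
the removable repeat (hypothetical function names):

(module
  (func $sext8 (param $x i32) (result i32)
    ;; sign-extend the low 8 bits: shift the 8-bit sign bit up to
    ;; bit 31, then shift back down, filling the top bits with it
    (i32.shr_s
      (i32.shl
        (local.get $x)
        (i32.const 24)
      )
      (i32.const 24)
    )
  )
  (func $sext8-twice (param $x i32) (result i32)
    ;; sign-extending an already sign-extended value changes nothing,
    ;; so the outer shift pair can be removed (there are no extra
    ;; shifts here)
    (i32.shr_s
      (i32.shl
        (call $sext8
          (local.get $x)
        )
        (i32.const 24)
      )
      (i32.const 24)
    )
  )
)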
|
values (#3399)
|
bugs (#3401)
* Count signatures in tuple locals.
* Count nested signature types (confirming @aheejin was right: that was missing).
* Inlining was using the wrong type.
* OptimizeInstructions should return -1 for unhandled types, not error.
* The fuzzer should check for ref types as well, not just typed function references,
similar to what GC does.
* The fuzzer now creates a function if it has no other option for creating a constant
expression of a function type, then does a ref.func of that.
* Handle unreachability in call_ref binary reading.
* S-expression parsing fixes in more places, and add a tiny fuzzer for it.
* Switch fuzzer test to just have the metrics, and not print all the fuzz output,
which changes a lot. Also fix noprint handling, which only worked on binaries before.
* Fix Properties::getLiteral() to use the specific function type properly, and make
Literal's function constructor require that, to prevent future bugs.
* Turn all input types into nullable types, for now.
|
The vacuum code can be deleted as it is handled by the default anyhow.
|
See discussion in #3303
|
X - Y <= 0
=>
X <= Y
That is true mathematically, but not in the case of an overflow, e.g.,
X=10, Y=0x8000000000000000. X - Y overflows to a negative number, so
X - Y <= 0 is true. But it is not true that X <= Y (as Y is negative, but
X is not).
See discussion in #3303 (comment)
The actual regression was in #3275, but the fuzzer had an easier time
finding it due to #3303
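
A minimal sketch demonstrating the counterexample (hypothetical function
names):

(module
  ;; X - Y <= 0 with X=10, Y=0x8000000000000000: the subtraction
  ;; wraps to a negative number, so this returns 1
  (func $subtract-then-compare (result i32)
    (i64.le_s
      (i64.sub
        (i64.const 10)
        (i64.const 0x8000000000000000)
      )
      (i64.const 0)
    )
  )
  ;; X <= Y directly: Y is negative as a signed value, so this
  ;; returns 0, showing the two forms are not equivalent
  (func $compare-directly (result i32)
    (i64.le_s
      (i64.const 10)
      (i64.const 0x8000000000000000)
    )
  )
)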
|
bool(i32(x) % C_pot) -> bool(i32(x) & (C_pot - 1))
bool(i32(x) % min_s) -> bool(i32(x) & max_s)
We already do this for (i32|i64).rem_s in all other situations.
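
A minimal sketch for C_pot = 4 (hypothetical function names): in a
boolean context the sign of the remainder does not matter, only whether
it is nonzero.

(module
  ;; before: bool(x % 4) with a signed remainder
  (func $before (param $x i32) (result i32)
    (i32.ne
      (i32.rem_s
        (local.get $x)
        (i32.const 4)
      )
      (i32.const 0)
    )
  )
  ;; after: bool(x & 3), a cheaper mask
  (func $after (param $x i32) (result i32)
    (i32.ne
      (i32.and
        (local.get $x)
        (i32.const 3)
      )
      (i32.const 0)
    )
  )
)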
|
Using addition in more places is better for gzip, and helps simplify the
optimizer as well.
Add a FinalOptimizer phase to do optimizations like our signed LEB tweaks, to
reduce binary size in the rare case when we do want a subtraction.
|
Move the checks for most unoptimizable expression types out into visitExpression and simplify some other code.
|
We can still make x * -1.0 cheaper in non-fastMath mode as:
x * -1.0 -> -0.0 - x
This should at least help baseline compilers.
It could also enable further optimizations, e.g.:
a + b * -1
a + (-0.0 - b)
(a - 0.0) - b
a - b
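
A minimal sketch (hypothetical function names); the subtraction from
-0.0 preserves IEEE semantics, including the sign of zero:

(module
  ;; before: a multiply by -1.0
  (func $before (param $x f64) (result f64)
    (f64.mul
      (local.get $x)
      (f64.const -1)
    )
  )
  ;; after: a subtraction from -0.0 (note that 0.0 - x would get
  ;; the sign of zero wrong for x = 0.0)
  (func $after (param $x f64) (result f64)
    (f64.sub
      (f64.const -0)
      (local.get $x)
    )
  )
)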
|
`C1 - (x + C2)` -> `(C1 - C2) - x`
`C1 - (x - C2)` -> `(C1 + C2) - x`
`C1 - (C2 - x)` -> `x + (C1 - C2)`
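
A minimal sketch of the first rule with C1 = 10, C2 = 3 (hypothetical
function names); wraparound arithmetic makes this valid for all x:

(module
  ;; before: 10 - (x + 3)
  (func $before (param $x i32) (result i32)
    (i32.sub
      (i32.const 10)
      (i32.add
        (local.get $x)
        (i32.const 3)
      )
    )
  )
  ;; after: (10 - 3) - x, folding the constants into 7
  (func $after (param $x i32) (result i32)
    (i32.sub
      (i32.const 7)
      (local.get $x)
    )
  )
)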
|
Fixes a fuzz bug that was triggered by
https://github.com/WebAssembly/binaryen/pull/3015#issuecomment-718001620
but was actually a pre-existing bug in pow2 which that PR just happened
to uncover.
|
But only when doing so doesn't require adding a new local.
|
This change makes matchers in OptimizeInstructions more compact and readable by
removing the explicit `Abstract::` namespace from individual operations. In some
cases, this makes multi-line matcher expressions fit on a single line.
This change is only possible because it also adds an explicit "RMW" prefix to
each element of the `AtomicRMWOp` enumeration. Without that, their names
conflicted with the names of Abstract ops.
|
Extend ZeroRemover and optimizeAddedConstants to handle 64-bit integers as well.
Use Literal.makeFromInt64 to make this easier.
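
A minimal sketch of the 64-bit case this enables (hypothetical function
names):

(module
  ;; before: adding zero
  (func $before (param $x i64) (result i64)
    (i64.add
      (local.get $x)
      (i64.const 0)
    )
  )
  ;; after: the add is removed entirely, as already happened for i32
  (func $after (param $x i64) (result i64)
    (local.get $x)
  )
)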
|
i32(bool(x)) != 0 ==> i32(bool(x))
i64(bool(x)) & 1 ==> i64(bool(x))
Also:
* clean up related matching rules in optimizeWithConstantOnRight
* add more explanations about isPowerOf2Float & rename to
isPowerOfTwoInvertibleFloat
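
A minimal sketch of the first rule, using eqz as the boolean producer
(hypothetical function names):

(module
  ;; before: bool(x) != 0, comparing an already-boolean value
  (func $before (param $x i32) (result i32)
    (i32.ne
      (i32.eqz
        (local.get $x)
      )
      (i32.const 0)
    )
  )
  ;; after: bool(x) is already 0 or 1, so the compare is dropped
  (func $after (param $x i32) (result i32)
    (i32.eqz
      (local.get $x)
    )
  )
)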
|
`(uint32_t)x / C` --> `x >= C`, where `C > 2^31`
`(uint32_t)x / -1` --> `x != -1`
and for `shrinkLevel == 0`:
`(uint64_t)x / C` --> `uint64_t(x >= C)`, where `C > 2^63`
`(uint64_t)x / -1` --> `x != -1`
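
A minimal sketch of the first rule with C = 0x80000001 (hypothetical
function names): since C > 2^31, the unsigned quotient can only be 0 or
1, and it is 1 exactly when x >= C.

(module
  ;; before: unsigned division by a constant > 2^31
  (func $before (param $x i32) (result i32)
    (i32.div_u
      (local.get $x)
      (i32.const 0x80000001)
    )
  )
  ;; after: a plain unsigned comparison
  (func $after (param $x i32) (result i32)
    (i32.ge_u
      (local.get $x)
      (i32.const 0x80000001)
    )
  )
)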
|
When there are two versions of a function, one handling tuples and the other handling non-tuple values, the previous naming convention was to have "Single" in the name of the non-tuple handling function. This PR simplifies the convention and shortens function names by making the names plural for the tuple-handling version and singular for the non-tuple-handling version.
|
Wasm turned out not to be that good a fit for a DSL for such peephole
optimizations, so that never made progress. Meanwhile, we have the new
matcher infrastructure, which works well.
|
Specifically, truncates constant shift values that are greater than the number of bits available and optimizes out explicit masking of the shift value that is redundant with the implicit masking performed by shift operations.
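
A minimal sketch of both cases (hypothetical function names):

(module
  ;; a constant shift amount is taken modulo the bit width, so the
  ;; 33 here can be truncated to 1
  (func $truncate (param $x i32) (result i32)
    (i32.shl
      (local.get $x)
      (i32.const 33)
    )
  )
  ;; the explicit (and 31) mask is redundant with the implicit
  ;; masking the shift performs, so it can be removed
  (func $mask (param $x i32) (param $y i32) (result i32)
    (i32.shl
      (local.get $x)
      (i32.and
        (local.get $y)
        (i32.const 31)
      )
    )
  )
)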