(#4985)
x + nan -> nan'
x - nan -> nan'
x * nan -> nan'
x / nan -> nan'
min(x, nan) -> nan'
max(x, nan) -> nan'
where nan' is the canonicalized NaN of the RHS
x != nan -> 1
x == nan -> 0
x >= nan -> 0
x <= nan -> 0
x > nan -> 0
x < nan -> 0
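E.g., the add case in wat form (a sketch; `$x` is a hypothetical f64 local):
```wat
(f64.add (local.get $x) (f64.const nan))
;; the rhs nan, canonicalized =>
(f64.const nan)
```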
E.g.
x + C1 > C2 ==> x > (C2-C1)
We do need to be careful of overflows in either the add on the left or
the proposed subtract on the right. In the latter case, we can at least do
x + C1 > C2 ==> x + (C1-C2) > 0
Helps #5008 (but more patterns remain).
Found by the superoptimizer #4994. This was the top suggestion for Java and Dart.
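A wat sketch of the first form (hypothetical constants; as noted above, the pass must rule out overflow in the add):
```wat
;; x + 10 > 20 ==> x > (20 - 10)
(i32.gt_s
  (i32.add (local.get $x) (i32.const 10))
  (i32.const 20))
;; =>
(i32.gt_s (local.get $x) (i32.const 10))
```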
E.g. if we just do addition etc., then any higher bits will be wrapped out anyhow:
int32_t(int64_t(x) + int64_t(10))
=>
x + int32_t(10)
Found by the superoptimizer #4994. This is by far the most promising suggestion it
had. Interestingly, it mainly helps Go, where it removes 20% of all Unary operations
(the extends and wraps), and Rust, where it removes 3%.
Helps #5004. This handles the common cases I see in the superoptimizer output, but
there are more that could be handled.
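The example above in wat form (a sketch; `$x` is a hypothetical i32 local):
```wat
(i32.wrap_i64
  (i64.add
    (i64.extend_i32_s (local.get $x))
    (i64.const 10)))
;; =>
(i32.add (local.get $x) (i32.const 10))
```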
shifts and same constant (#4996)
(x >> C) << C -> x & -(1 << C)
(x >>> C) << C -> x & -(1 << C)
(x << C) >>> C -> x & (-1 >>> C)
// (x << C) >> C is not handled: the arithmetic shift sign-extends rather than masks
Found by the superoptimizer #4994
Fixes #5012
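For instance, with C = 3 (a wat sketch, not from the PR):
```wat
(i32.shl
  (i32.shr_u (local.get $x) (i32.const 3))
  (i32.const 3))
;; -(1 << 3) = -8 masks off the low 3 bits =>
(i32.and (local.get $x) (i32.const -8))
```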
x ? 0 : y ==> z & y where z = !x
x ? y : 1 ==> z | y where z = !x
Only do this when we have z = !x, that is, when we can invert x without adding
an actual eqz (which would add work).
To do this, canonicalize selects to prefer to flip the arms, when
possible, if it would move a constant to a location that the existing
optimizations already turn into an and/or. That is,
x >= 5 ? 0 : y != 42
would be canonicalized into
x < 5 ? y != 42 : 0
and existing opts turn that into
(x < 5) & (y != 42)
The canonicalization does not always help this optimization, as we need
the values to be boolean for it, but canonicalizing is still nice for getting
more regular code, which might compress slightly better.
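The example from above as a wat sketch (hypothetical locals `$x`, `$y`):
```wat
;; x >= 5 ? 0 : y != 42
(select
  (i32.const 0)
  (i32.ne (local.get $y) (i32.const 42))
  (i32.ge_s (local.get $x) (i32.const 5)))
;; after canonicalizing and applying the existing and/or rules =>
(i32.and
  (i32.lt_s (local.get $x) (i32.const 5))
  (i32.ne (local.get $y) (i32.const 42)))
```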
An overview of this is in the README in the diff here (conveniently, it is near the
top of the diff). Basically, we fix up nn locals after each pass, by default. This keeps
things easy to reason about - what validates is what is valid wasm - but there are
some minor nuances as mentioned there, in particular, we ignore nameless blocks
(which are commonly added by various passes; ignoring them means we can keep
more locals non-nullable).
The key addition here is LocalStructuralDominance which checks which local
indexes have the "structural dominance" property of 1a, that is, that each get has
a set in its block or an outer block that precedes it. I optimized that function quite
a lot to reduce the overhead of running that logic after each pass. The overhead
is something like 2% on J2Wasm and 0% on Dart (0% because in this mode we
shrink code size, so there is actually less work, and it balances out).
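A minimal wat sketch of the 1a property (hypothetical functions, not from the PR):
```wat
;; validates as-is: the set structurally dominates the get
(func $ok
  (local $f (ref func))
  (local.set $f (ref.func $ok))
  (drop (local.get $f)))

;; needs a fixup: the set is in a block that ends before the get
(func $needs-fixup
  (local $f (ref func))
  (block
    (local.set $f (ref.func $needs-fixup)))
  (drop (local.get $f)))
```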
Since we run fixups after each pass, this PR removes logic to manually call the
fixup code from various places we used to call it (like eh-utils and various passes).
Various passes are now marked as requiresNonNullableLocalFixups => false.
That lets us skip running the fixups after them, which we normally do automatically.
This helps avoid overhead. Most passes still need the fixups, though - any pass
that adds a local, or a named block, or moves code around, likely does.
This removes a hack in SimplifyLocals that is no longer needed. Before, we
worked to avoid moving a set into a try, as it might not validate. Now we just do it
and let fixups happen automatically if they are needed: in the common case they
probably are not, so the extra complexity seemed not worth it.
Also removes a hack from StackIR. That hack tried to avoid roundtrip adding a
nondefaultable local. But we have the logic to fix that up now, and opts will
likely keep it non-nullable as well.
Various tests end up updated here because now a local can be non-nullable -
previous fixups are no longer needed.
Note that this doesn't remove the gc-nn-locals feature. That has been useful for
testing, and may still be useful in the future - it basically just allows nn locals in
all positions (that can't read the null default value at the entry). We can consider
removing it separately.
Fixes #4824
To unblock some optimizations. For example this:
```wat
(select
(i32.eqz (local.get $x))
(i32.const 0)
(i32.eqz (local.get $y))
)
```
Was previously optimized as:
```wat
(i32.eqz
(select
(i32.const 1)
(local.get $x)
(local.get $y)
)
)
```
Because `optimizeSelect` applied `!x ? !y : 0 -> x ? 0 : !y` and then `!(x ? 1 : y)`, blocking the later rules that could have folded this into an `or` or `and`.
After this PR the same example optimizes better:
```wat
(i32.eqz
(i32.or
(local.get $x)
(local.get $y)
)
)
```
A continuation of #4272.
```
(signed)x < s_min + 1 ==> x == s_min
(signed)x >= s_min + 1 ==> x != s_min
(signed)x > s_max - 1 ==> x == s_max
(signed)x <= s_max - 1 ==> x != s_max
(unsigned)x <= u_max - 1 ==> x != u_max
(unsigned)x > u_max - 1 ==> x == u_max
```
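For the i32 signed-minimum case, for instance (a wat sketch):
```wat
(i32.lt_s (local.get $x) (i32.const -2147483647))  ;; x < s_min + 1
;; =>
(i32.eq (local.get $x) (i32.const -2147483648))    ;; x == s_min
```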
Fuzzing with TrapsNeverHappen found a bug, and then reading similar code
I found another, where we checked structural equality but ignored effects. Some
things look equal but may have different values at runtime:
(foo
(call $x)
(call $x)
)
The arms are identical structurally but due to side effects may not be identical
in value.
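A hedged illustration of why value equality can fail here (hypothetical callee; any impure function behaves this way):
```wat
(global $n (mut i32) (i32.const 0))
;; $x returns a different value on each call
(func $x (result i32)
  (global.set $n (i32.add (global.get $n) (i32.const 1)))
  (global.get $n))
```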
casts (#4720)
i32 -> f64 -> i32 roundtripping optimizations:
```rust
i32(f64(i32(x))) -> x // i32.trunc(_sat)_f64_s(f64.convert_i32_s(x)) -> x
u32(f64(u32(x))) -> x // i32.trunc(_sat)_f64_u(f64.convert_i32_u(x)) -> x
// note: asymmetric signed/unsigned or unsigned/signed cases can't be simplified in a similar way
```
and rounding after integer to float casts:
```rust
ceil(float(int(x))) -> float(int(x))
floor(float(int(x))) -> float(int(x))
trunc(float(int(x))) -> float(int(x))
nearest(float(int(x))) -> float(int(x))
```
where `float = f32 | f64`, `int = i32 | i64 | u32 | u64`
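E.g., one of the rounding cases in wat form (a sketch):
```wat
(f64.floor (f64.convert_i32_s (local.get $x)))
;; the value is already integral, so the floor is a no-op =>
(f64.convert_i32_s (local.get $x))
```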
This PR removes the single memory restriction in IR, adding support for a single module to reference multiple memories. To support this change, a new memory name field was added to 13 memory instructions in order to identify the memory for the instruction.
It is a goal of this PR to maintain backwards compatibility with existing text and binary wasm modules, so memory indexes remain optional for memory instructions. Similarly, the JS API makes assumptions about which memory is intended when only one memory is present in the module. Another goal of this PR is that existing test behavior be unaffected. That said, tests must now explicitly define a memory before invoking memory instructions or exporting a memory, and memory names are now printed for each memory instruction in the text format.
There remain quite a few places where a hardcoded reference to the first memory persists (memory flattening, for example, will return early if more than one memory is present in the module). Many of these call-sites, particularly within passes, will require us to rethink how the optimization works in a multi-memories world. Other call-sites may necessitate more invasive code restructuring to fully convert away from relying on a globally available, single memory pointer.
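A sketch of what this enables in the text format (hypothetical module; the memory name on each instruction selects the memory):
```wat
(module
  (memory $a 1 1)
  (memory $b 1 1)
  (func $copy-first-byte
    (i32.store8 $b (i32.const 0)
      (i32.load8_u $a (i32.const 0)))))
```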
eqz(eqz(i32(x))) -> i32(x) != 0
eqz(eqz(i64(x))) -> i64(x) != 0
Only when shrinkLevel == 0 (prefer speed over binary size).
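In wat form (a sketch of the i32 case):
```wat
(i32.eqz (i32.eqz (local.get $x)))
;; =>
(i32.ne (local.get $x) (i32.const 0))
```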
RTTs were removed from the GC spec, and if they are added back in the future,
they will be heap types rather than value types as in our implementation.
Updating our implementation to have RTTs be heap types would have been more work
than deleting them for questionable benefit since we don't know how long it will
be before they are specced again.
+ Move these rules to a separate function;
+ Refactor them to use matches;
+ Add comments;
+ Handle rotational shifts as well;
+ Handle overflows for `<<`, `>>`, `>>>` shifts;
+ Add mixed rotate rules:
```rust
rotl(rotr(x, C1), C2) => rotr(x, C1 - C2)
rotr(rotl(x, C1), C2) => rotl(x, C1 - C2)
```
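For example, with C1 = 5 and C2 = 2 (a wat sketch):
```wat
(i32.rotl
  (i32.rotr (local.get $x) (i32.const 5))
  (i32.const 2))
;; net rotation is right by 5 - 2 = 3 =>
(i32.rotr (local.get $x) (i32.const 3))
```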
constants on RHS (#4808)
(x * C1) << C2 -> x * (C1 << C2)
(x << C1) * C2 -> x * (C2 << C1)
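E.g., with C1 = 3 and C2 = 2 (a wat sketch):
```wat
(i32.shl
  (i32.mul (local.get $x) (i32.const 3))
  (i32.const 2))
;; 3 << 2 = 12 =>
(i32.mul (local.get $x) (i32.const 12))
```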
Basic reference types like `Type::funcref`, `Type::anyref`, etc. made it easy to
accidentally forget to handle reference types with the same basic HeapTypes but
the opposite nullability. In principle there is nothing special about the types
with shorthands except in the binary and text formats. Removing these shorthands
from the internal type representation by removing all basic reference types
makes some code more complicated locally, but simplifies code globally and
encourages properly handling both nullable and non-nullable reference types.
For them to be the same we must have a value that can appear on both
sides. If the heap types disallow that, then only null is possible, and if
that is impossible as well then the result must be 0.
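A wat sketch (assuming `$a` and `$b` have unrelated, non-nullable struct types, so no value can appear on both sides):
```wat
(ref.eq
  (local.get $a)   ;; (ref $A)
  (local.get $b))  ;; (ref $B), unrelated to $A
;; =>
(i32.const 0)
```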
This marks all reference operations that return 0/1 as doing so. This
allows various bitwise operations to be optimized on them.
This also marks StringEq as a boolean, though we can't test that fully yet
as Strings support is a work in progress (no interpreter support yet, among other things).
As a drive-by, this moves emitsBoolean to its own file, and uses it
in getMaxBits to avoid redundancy (the redundant code paths now have
a WASM_UNREACHABLE).
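One kind of bitwise optimization this enables, as a wat sketch:
```wat
;; ref.is_null only returns 0 or 1, so masking with 1 does nothing
(i32.and
  (ref.is_null (local.get $r))
  (i32.const 1))
;; =>
(ref.is_null (local.get $r))
```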
This just moves code around and adds comments.
(#4749)
(ref.eq
(local.tee $x (..))
(local.get $x)
)
That will definitely return 1. Before this PR the side effects of tee stopped us
from optimizing.
#4748 regressed us in some cases, because it removed casts first:
(ref.is_func
(ref.as_func
(local.get $anyref)))
If the cast is removed first, and the local has no useful type info, then
we'd have removed the cast but could not remove the ref.is. But
the ref.is could be optimized to 1, as it must be a func - the type
info proves it thanks to the cast. To avoid this, remove casts after
everything else.
(#4748)
Comparing references does not depend on the cast, so if we are ignoring
traps in traps-never-happen mode then we can remove them.
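A wat sketch of the idea (hypothetical locals; the cast can only trap, and the comparison does not depend on it):
```wat
(ref.eq
  (ref.as_non_null (local.get $a))
  (local.get $b))
;; with traps-never-happen =>
(ref.eq (local.get $a) (local.get $b))
```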
Similar to #4004 but for 32-bit integers
i32(x) << 24 >> 24 ==> i32.extend8_s(x)
i32(x) << 16 >> 16 ==> i32.extend16_s(x)
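The first pattern in wat form (a sketch):
```wat
(i32.shr_s
  (i32.shl (local.get $x) (i32.const 24))
  (i32.const 24))
;; =>
(i32.extend8_s (local.get $x))
```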
calls (#4660)
This extends the existing call_indirect code to do the same for call_ref,
basically. The shared code is added to a new helper utility.
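The rough shape of the optimization, as a hedged wat sketch (hypothetical function `$f` and argument):
```wat
(call_ref
  (local.get $arg)
  (ref.func $f))  ;; constant target
;; =>
(call $f (local.get $arg))
```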
Casts can replace a type with a subtype, which normally has no downsides, but
in a corner case of struct types it can lead to us needing to refinalize higher up
too, see details in the comment.
We have avoided any Refinalize calls in OptimizeInstructions, but the case
handled here requires it sadly. I considered moving it to another pass, but this
is a peephole optimization so there isn't really a better place.
(#4399)
Final part of #4265
(i32(x) >= 0) & (i32(y) >= 0) ==> i32(x | y) >= 0
(i64(x) >= 0) & (i64(y) >= 0) ==> i64(x | y) >= 0
(i32(x) == -1) & (i32(y) == -1) ==> i32(x & y) == -1
(i64(x) == -1) & (i64(y) == -1) ==> i64(x & y) == -1
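The first rule as a wat sketch:
```wat
(i32.and
  (i32.ge_s (local.get $x) (i32.const 0))
  (i32.ge_s (local.get $y) (i32.const 0)))
;; both are non-negative iff the sign bit of x | y is 0 =>
(i32.ge_s
  (i32.or (local.get $x) (local.get $y))
  (i32.const 0))
```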
(#4372)
(i32(x) >= 0) | (i32(y) >= 0) ==> i32(x & y) >= 0
(i64(x) >= 0) | (i64(y) >= 0) ==> i64(x & y) >= 0
(i32(x) != -1) | (i32(y) != -1) ==> i32(x & y) != -1
(i64(x) != -1) | (i64(y) != -1) ==> i64(x & y) != -1
without "ignoreImplicitTraps" (#4295)" (#4459)
This reverts commit 5cf3521708cfada341285414df2dc7366d7e5454.
"ignoreImplicitTraps" (#4295)
`ref.cast` can be statically removed when the ref's type is a subtype of
the intended RTT type and either of `--ignore-implicit-traps` or
`--traps-never-happen` is given: https://github.com/WebAssembly/binaryen/blob/083ab9842ec3d4ca278c95e1a33112ae7cd4d9e5/src/passes/OptimizeInstructions.cpp#L1603-L1624
Some more context: https://github.com/WebAssembly/binaryen/pull/4097#discussion_r694456784
But this can create a block in which a `pop` is nested, which makes the
`catch` invalid. The test in this PR is the same as the example given by
@kripken in #4237. This calls the fixup function
`EHUtils::handleBlockNestedPops` at the end of the pass to fix this.
Also, because this pass creates a lot of blocks in other patterns, it is
possible that other patterns cause this kind of `pop` nesting as well.
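A hedged sketch of the invalid IR this guards against (hypothetical tag; the block stands for one introduced by an optimization):
```wat
(try
  (do)
  (catch $e
    (drop
      (block (result i32)  ;; a block introduced by an optimization
        (pop i32)))))      ;; invalid: the pop is no longer at the top of the catch body
```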
(#4339)
(i32(x) < 0) & (i32(y) < 0) ==> i32(x & y) < 0
(i64(x) < 0) & (i64(y) < 0) ==> i64(x & y) < 0
(#4338)
(i32(x) < 0) | (i32(y) < 0) ==> i32(x | y) < 0
(i32(x) != 0) | (i32(y) != 0) ==> i32(x | y) != 0
Likewise for i64.
PassOptions is a fairly large structure and even includes a std::map. I also
have plans to add further fields there to make it even larger. Before doing that
I noticed that in some places we copy it instead of being consistent and taking
it by reference, which this PR fixes.
With nominal function types, this change makes it so that we preserve the
identity of the function type used with call_indirect instructions rather than
recreating a function heap type, which may or may not be the same as the
originally parsed heap type, from the function signature during module writing.
This will simplify the type system implementation by removing the need to store
a "canonical" nominal heap type for each unique signature. We previously
depended on those canonical types to avoid creating multiple duplicate function
types during module writing, but now we aren't creating any new function types
at all.
(#4336)
(i32(x) != 0) | (i32(y) != 0) ==> i32(x | y) != 0
(i64(x) != 0) | (i64(y) != 0) ==> i64(x | y) != 0
(#4333)
(i32(x) == 0) & (i32(y) == 0) ==> i32(x | y) == 0
(i64(x) == 0) & (i64(y) == 0) ==> i64(x | y) == 0
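In wat form (a sketch of the i32 case):
```wat
(i32.and
  (i32.eqz (local.get $x))
  (i32.eqz (local.get $y)))
;; both are zero iff their or is zero =>
(i32.eqz (i32.or (local.get $x) (local.get $y)))
```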
(#4307)
i64.extend_i32_u(i32.load8_u(x)) -> i64.load8_u(x)
i64.extend_i32_u(i32.load16_u(x)) -> i64.load16_u(x)
i64.extend_i32_s(i32.load8_u(x)) -> i64.load8_u(x)
i64.extend_i32_s(i32.load16_u(x)) -> i64.load16_u(x)
i64.extend_i32_s(i32.load8_s(x)) -> i64.load8_s(x)
i64.extend_i32_s(i32.load16_s(x)) -> i64.load16_s(x)
i64.extend_i32_u(i32.load(x)) -> i64.load32_u(x)
i64.extend_i32_s(i32.load(x)) -> i64.load32_s(x)
don't apply to
i64.extend_i32_u(i32.load8_s(x)) -> skip
i64.extend_i32_u(i32.load16_s(x)) -> skip
i64.extend_i32_s(i32.atomic.load(x)) -> skip
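The first pattern in wat form (a sketch; `$ptr` is a hypothetical address):
```wat
(i64.extend_i32_u (i32.load8_u (local.get $ptr)))
;; the zero-extension folds into the load =>
(i64.load8_u (local.get $ptr))
```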
We found one cast that has another as its input, and forgot that
the child was possibly a fallthrough value. That is, there might be more
code that needs to be kept around.
Rather than fix the middle of the three cases there - the one with
HeapType::isSubType(childIntendedType, intendedType) - I
noticed it is not actually needed. That case checks if the child's
type is more specific than the parent's, and if so, then the parent
is not needed. But we already handle that earlier above in the
same function: regardless of what the child of a static cast is,
if the cast is static and the input is the proper type already, the
cast is unneeded (see lines 1565-1566).
Canonicalize:
(signed)x > -1 ==> x >= 0
(signed)x <= -1 ==> x < 0
(signed)x < 1 ==> x <= 0
(signed)x >= 1 ==> x > 0
(unsigned)x < 1 ==> x == 0
(unsigned)x >= 1 ==> x != 0
This should help #4265, and in general 0 is usually a more
common constant, and reasonable to canonicalize to.
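E.g., the first rule in wat form:
```wat
(i32.gt_s (local.get $x) (i32.const -1))
;; =>
(i32.ge_s (local.get $x) (i32.const 0))
```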
Fuzzing followup to #4244.
This optimizes this type of pattern:
(local.set $x (struct.new X Y Z))
(struct.set (local.get $x) X')
=>
(local.set $x (struct.new X' Y Z))
Note how the struct.set is removed, and X' moves to where X was.
This removes almost 90% (!) of the struct.sets in j2wasm output, which reduces
total code size by 2.5%. However, I see no speedup with this - I guess that either
this is not on the hot path, or V8 optimizes it well already, or the CPU is making
stores "free" anyhow...
patterns (#4181)
i32(x) ? i32(x) : 0 ==> x
i32(x) ? 0 : i32(x) ==> {x, 0}
i64(x) == 0 ? 0 : i64(x) ==> x
i64(x) != 0 ? i64(x) : 0 ==> x
i64(x) == 0 ? i64(x) : 0 ==> {x, 0}
i64(x) != 0 ? 0 : i64(x) ==> {x, 0}
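The first pattern in wat form (a sketch; select takes ifTrue, ifFalse, condition):
```wat
(select
  (local.get $x)   ;; ifTrue: x, which is nonzero here
  (i32.const 0)    ;; ifFalse: 0, and the condition x is 0 here too
  (local.get $x))
;; either way the result equals x =>
(local.get $x)
```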
Rather than load from the table and call that reference, call using the table.
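A hedged wat sketch (hypothetical table `$t` and type `$sig`):
```wat
(call_ref
  (table.get $t (local.get $i)))
;; call through the table directly =>
(call_indirect $t (type $sig) (local.get $i))
```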
Now that they are all implemented, we can optimize them. This removes the
big if that ignored static operations, and implements things for them.
In general this matches the existing rtt-using case, but there are a few things
we can do better, which this does:
* A cast of a subtype to a type always succeeds.
* A test of a subtype to a type is always 1 (if non-nullable).
* Repeated static casts can leave just the most demanding of them.
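E.g., for the last point, assuming `$B` is a subtype of `$A` (a sketch using the static cast variants):
```wat
(ref.cast_static $B
  (ref.cast_static $A  ;; implied by the outer, more demanding cast
    (local.get $x)))
;; =>
(ref.cast_static $B (local.get $x))
```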
We added an optional ReFinalize in OptimizeInstructions at some point,
but that is not valid: ReFinalize only updates types when all other
work is done, but the pass works incrementally. The bug the fuzzer found
is that a child is changed to be unreachable, and then the parent is
optimized before finalize() is called on it, which led to an assertion being
hit (as the child was unreachable but not the parent, which should also
be).
To fix this, do not change types in this pass. Emit an extra block with a
declared type when necessary. Other passes can remove the extra block.
These variants take a HeapType that is the type we intend to cast to,
and do not take an RTT.
These are intended to be more statically optimizable. For now though
this PR just implements the minimum to get them parsing and to get
through the optimizer without crashing.
Spec: https://docs.google.com/document/d/1afthjsL_B9UaMqCA5ekgVmOm75BVFu6duHNsN9-gnXw/edit#
See #4149
If we can remove such traps, we can remove ref.as_non_null if the local
type is nullable anyhow.
If we support non-nullable locals, however, then do not do this, as it
could inhibit specializing the local type later. Do the same for tees, for which
we had existing code.
Background: #4061 (comment)
That PR reused the same node twice in the output, which fails on the
assertion in BINARYEN_PASS_DEBUG=1 mode.
No new test is needed because the existing test suite fails already
in that mode. That the PR managed to land seems to say that we are
not testing pass-debug mode on our lit tests, which we need to
investigate.