| Commit message | Author | Age | Files | Lines |
| |
(#4749)
(ref.eq
  (local.tee $x (..))
  (local.get $x)
)
That will definitely return 1. Before this PR the side effects of tee stopped us
from optimizing.
| |
#4748 regressed us in some cases, because it removed casts first:
(ref.is_func
  (ref.as_func
    (local.get $anyref)))
If the cast is removed first, and the local has no useful type info, then
we'd have removed the cast but could not remove the ref.is. But
the ref.is could be optimized to 1, as it must be a func - the type
info proves it thanks to the cast. To avoid this, remove casts after
everything else.
| |
(#4748)
Comparing references does not depend on the cast, so if we are ignoring
traps in traps-never-happen mode then we can remove the casts.
| |
* Updating wasm.h/cpp for DataSegments
* Updating wasm-binary.h/cpp for DataSegments
* Removed link from Memory to DataSegments and updated module-utils, Metrics and wasm-traversal
* checking isPassive when copying data segments to know whether to construct the data segment with an offset or not
* Removing memory member var from DataSegment class as there is only one memory rn. Updated wasm-validator.cpp
* Updated wasm-interpreter
* First look at updating Passes
* Updated wasm-s-parser
* Updated files in src/ir
* Updating tools files
* Last pass on src files before building
* added visitDataSegment
* Fixing build errors
* Data segments need a name
* fixing var name
* ran clang-format
* Ensuring a name on DataSegment
* Ensuring more datasegments have names
* Adding explicit name support
* Fix fuzzing name
* Outputting data name in wasm binary only if explicit
* Checking temp dataSegments vector to validateBinary because it's the one with the segments before we processNames
* Pass on when data segment names are explicitly set
* Ran auto_update_tests.py and check.py, success all around
* Removed an errant semi-colon and corrected a counter. Everything still passes
* Linting
* Fixing processing memory names after parsed from binary
* Updating the test from the last fix
* Correcting error comment
* Impl kripken@ comments
* Impl tlively@ comments
* Updated tests that remove data print when == 0
* Ran clang format
* Impl tlively@ comments
* Ran clang-format
| |
Spec and VM support for that is not yet stable (atm VMs do not allow complex
user-defined types to be passed around).
| |
Otherwise when a type is only used on a global, it will be incorrectly omitted
from the output.
| |
In GSI we look for a read of a global in a situation like this:
$global1: value1
$global2: value2
(struct.get $Type (ref))
If global inference shows this get must be of either $global1 or $global2, then we
can optimize to this:
(ref) == $global1 ? value1 : value2
We focus on the case of two values because 1 is handled by other passes, and >2
makes the tradeoffs less clear.
However, a simple extension is the case where there are more than 2 globals, but
there are only two values, and one value is unique to one global:
$global1: valueA
$global2: valueB
$global3: valueA
=>
(ref) == $global2 ? valueB : valueA
We can still use a single comparison here, on the global that has the
unique value. Then the else will handle all the other globals.
This increases the cases that GSI can optimize J2Wasm output by over 50%.
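A rough sketch of the grouping decision described above, in Python rather than Binaryen's C++. The function name `plan_single_comparison` and the dictionary representation are hypothetical, chosen only for illustration; the real pass works on Binaryen IR.

```python
from collections import defaultdict

def plan_single_comparison(global_values):
    """Decide whether one ref.eq comparison can select the field value.

    global_values maps a global name to the constant it holds in the field.
    Returns (global_to_compare, value_if_equal, value_otherwise), or None
    when this shape does not apply. Hypothetical helper, not Binaryen API.
    """
    by_value = defaultdict(list)
    for name, value in global_values.items():
        by_value[value].append(name)
    # Exactly two distinct values are required: one value is left to other
    # passes, and more than two would need more than one comparison.
    if len(by_value) != 2:
        return None
    (value_a, globals_a), (value_b, globals_b) = by_value.items()
    # Compare against the global whose value is unique; the else arm then
    # covers every other global.
    if len(globals_a) == 1:
        return (globals_a[0], value_a, value_b)
    if len(globals_b) == 1:
        return (globals_b[0], value_b, value_a)
    return None
```

With three globals holding valueA, valueB, valueA, this picks `$global2` for the comparison, matching the example in the message. When both values are shared by several globals it returns None, since a single comparison is no longer enough.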
| |
Similar to #4004 but for 32-bit integers
i32(x) << 24 >> 24 ==> i32.extend8_s(x)
i32(x) << 16 >> 16 ==> i32.extend16_s(x)
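As a sanity check of the two rewrites above, here is a small Python model of the i32 operations, assuming `>>` in the rules means the signed shift (i32.shr_s). The helper names are made up for this sketch.

```python
MASK32 = 0xFFFFFFFF

def as_signed32(x):
    """Interpret the low 32 bits of x as a signed i32."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x

def shl_then_shr_s(x, amount):
    """Model i32.shr_s(i32.shl(x, amount), amount)."""
    shifted = (x << amount) & MASK32
    # Python's >> on a negative int is an arithmetic shift, like shr_s.
    return as_signed32(shifted) >> amount

def extend_s(x, bits):
    """Model i32.extend8_s / i32.extend16_s: sign-extend the low `bits` bits."""
    low = x & ((1 << bits) - 1)
    return low - (1 << bits) if low & (1 << (bits - 1)) else low
```

Both sides discard the upper bits of x and replicate the sign bit of the low byte (or low 16 bits), which is why the shift pair can be replaced by a single extend instruction.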
| |
This optimizes constants in the megamorphic case of two: when we
know two function references are possible, we could in theory emit this:
(select
  (ref.func A)
  (ref.func B)
  (ref.eq
    (..ref value..) ;; globally, only 2 things are possible here, and one has
                    ;; ref.func A as its value, and the other ref.func B
    (ref.func A)))
That is, compare to one of the values, and emit the two possible values there.
Other optimizations can then turn a call_ref on this select into an if over
two direct calls, leading to devirtualization.
We cannot compare a ref.func directly (since function references are not
comparable), and so instead we look at immutable global structs. If we
find a struct type that has only two possible values in some field, and
the structs are in immutable globals (which happens in the vtable case
in j2wasm for example), then we can compare the references of the struct
to decide between the two values in the field.
| |
SimplifyLocals (#4705)
Followup to #4703, this also handles the case where there is a non-
nullable local.set in the value of a nullable one, which we also cannot
optimize.
Fixes #4702
| |
Binaryen will not change dominance in SimplifyLocals. However, the current spec's
notion of dominance is simpler than ours, and we must not optimize certain cases in
order to still validate. See details in the comment and test.
Helps #4702
| |
calls (#4660)
This extends the existing call_indirect code to do the same for call_ref,
basically. The shared code is added to a new helper utility.
| |
Optionally avoid updating types in TypeUpdating::updateParamTypes(). That update
is incomplete if the function signature is also changing, which is the case in
SignatureRefining (but not DeadArgumentElimination). "Incomplete" means that
we updated the local.get type, but the function signature does not match yet. That
incomplete state can hit an internal error in GlobalTypeRewriter::updateSignatures
where it updates types. To avoid that, do the full update only there (in
GlobalTypeRewriter::updateSignatures).
| |
Previously we could return different results depending on the order we
noted things:
note(anyref.null);
note(funcref.null);
get() => anyref.null
note(funcref.null);
note(anyref.null);
get() => funcref.null
This is correct, as nulls are equal anyhow, and any could be used in
the location we are optimizing. However, it can lead to nondeterminism
if the caller's order of notes is nondeterministic. That is the case in
DeadArgumentElimination, where we scan functions in parallel, then
merge them without special ordering.
To fix this, make the note operation symmetric. That seems simplest and
least likely to be confusing. We can use the LUB to do that.
To avoid duplicating the null logic, refactor note() to use combine().
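A minimal sketch of why using the LUB makes the result order-independent. The toy hierarchy in `PARENT` and the class name `NullTracker` are assumptions for illustration only; they are not Binaryen's types or API.

```python
# Hypothetical toy heap-type hierarchy: child -> parent (None marks the top).
PARENT = {"any": None, "func": "any", "eq": "any", "data": "eq"}

def ancestors(heap_type):
    """Return heap_type followed by all of its supertypes, bottom to top."""
    chain = []
    while heap_type is not None:
        chain.append(heap_type)
        heap_type = PARENT[heap_type]
    return chain

def lub(a, b):
    """Least upper bound of two types in the toy tree hierarchy."""
    seen = set(ancestors(a))
    for candidate in ancestors(b):
        if candidate in seen:
            return candidate
    raise ValueError("no common supertype")

class NullTracker:
    """Tracks the type of the null values noted so far."""
    def __init__(self):
        self.heap_type = None

    def note(self, heap_type):
        # Combining with the LUB is symmetric, so the order of notes
        # cannot affect the final result.
        if self.heap_type is None:
            self.heap_type = heap_type
        else:
            self.heap_type = lub(self.heap_type, heap_type)

    def get(self):
        return self.heap_type
```

Noting `any` then `func`, or `func` then `any`, both end at `any`, so parallel scans merged in an arbitrary order give the same answer.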
| |
Casts involve branches in the VM, so adding a cast in return for removing a branch
(like If=>Select) is not beneficial. We don't want to ever do any more casts than we
already are.
| |
Do not prune parameters if there is a supertype that is a signature.
Without this we crash on an assertion in TypeBuilder when we try to
recreate the types (as we try to make a subtype with fewer fields
than the super).
| |
Remove `Type::externref` and `HeapType::ext` and replace them with uses of
anyref and any, respectively, now that we have unified these types in the GC
proposal. For backwards compatibility, continue to parse `extern` and
`externref` and maintain their relevant C API functions.
| |
V8 requires that supertypes come before subtypes when it parses
isorecursive (i.e. standards-track) type definitions. Since 2268f2a we are
emitting nominal types using the standard isorecursive format, so respect the
ordering requirement.
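The ordering requirement amounts to a topological sort over the declared-supertype relation. A small sketch, assuming each type has at most one declared supertype; the function name and the dictionary input are hypothetical and do not mirror Binaryen's type-emitting code.

```python
def order_supertypes_first(supertype_of):
    """Return type names ordered so every supertype precedes its subtypes.

    supertype_of maps a type name to its declared supertype name, or None.
    """
    ordered = []
    visited = set()

    def visit(name):
        if name in visited:
            return
        visited.add(name)
        parent = supertype_of.get(name)
        if parent is not None:
            visit(parent)  # emit the supertype before this type
        ordered.append(name)

    for name in supertype_of:
        visit(name)
    return ordered
```

For example, with `$C` extending `$B` and `$B` extending `$A`, the output starts with `$A`, then `$B`, then `$C`, regardless of the order in which the types were listed.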
| |
We assume a closed world atm in the GC space, but the call.without.effects
intrinsic sort of breaks that: that intrinsic looks like an import, but we really
need to care about what is sent to it even in a closed world:
(call $call-without-effects
  (ref.func $target-keep)
)
That reference cannot be ignored, as logically it is called just as if there
were a call_ref there. This adds support for that, fixing the combination of
#4621 and using call.without.effects.
Also flip the vector of ref.func names to a set. I realized that in a very
large program we might see the same name many times.
| |
If we see (ref.func $foo) that does not mean that $foo is reachable - we
must also see a (call_ref ..) of the proper type. Only after seeing both should
we mark the function as reachable, which this PR does.
This adds some complexity as we need to track intermediate state as we go,
since we could see the RefFunc before the CallRef or vice versa. We also
need to handle the case of a RefFunc without a CallRef properly: We cannot
remove the function, as the RefFunc must refer to it, but at least we can
empty out the body since we know it is never reached.
This removes an old wasm-opt test which is now superseded by a new lit
test.
On J2Wasm output this removes 3% of all functions, which account for
2.5% of total code size.
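The intermediate-state tracking can be sketched as a worklist, here in Python. This is a simplified model under stated assumptions: it matches call_ref to ref.func by exact signature type (ignoring subtyping), and the function name `classify_functions` and the dictionary layout are hypothetical.

```python
from collections import defaultdict

def classify_functions(functions, roots):
    """Classify each function as "keep", "empty-body", or "remove".

    functions maps a name to a dict with "type" (its signature) and optional
    "calls", "ref_funcs", and "call_ref_types" lists found in its body.
    """
    reachable = set()
    called_types = set()              # signatures seen in some call_ref
    uncalled_refs = defaultdict(set)  # signature -> ref.func targets still waiting
    worklist = list(roots)
    while worklist:
        name = worklist.pop()
        if name in reachable:
            continue
        reachable.add(name)
        info = functions[name]
        worklist.extend(info.get("calls", ()))
        for target in info.get("ref_funcs", ()):
            sig = functions[target]["type"]
            if sig in called_types:
                worklist.append(target)         # CallRef was seen first
            else:
                uncalled_refs[sig].add(target)  # wait for a matching CallRef
        for sig in info.get("call_ref_types", ()):
            if sig not in called_types:
                called_types.add(sig)
                worklist.extend(uncalled_refs.pop(sig, ()))  # RefFunc was seen first
    referenced_only = set().union(*uncalled_refs.values()) - reachable
    result = {}
    for name in functions:
        if name in reachable:
            result[name] = "keep"
        elif name in referenced_only:
            result[name] = "empty-body"  # must exist for the RefFunc, never runs
        else:
            result[name] = "remove"
    return result
```

A function that is referenced but never called through a matching call_ref ends up as "empty-body": it cannot be deleted because the ref.func still names it, but its body can be emptied.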
| |
Casts can replace a type with a subtype, which normally has no downsides, but
in a corner case of struct types it can lead to us needing to refinalize higher up
too, see details in the comment.
We have avoided any Refinalize calls in OptimizeInstructions, but the case
handled here requires it sadly. I considered moving it to another pass, but this
is a peephole optimization so there isn't really a better place.
| |
This hits the fuzzer when it tries to call reference exports with a null.
| |
The cast instruction may be unreachable but the intended type for the cast
still needs to be collected. Otherwise we end up with problems both during
optimizations that look at heap types and in printing (which will use the heap
type in code but not declare it).
Diff without whitespace is much smaller: this just moves code around so
that we can use a template to avoid code duplication. The actual change
is just to scan ->intendedType unconditionally, and not ignore it if the
cast is unreachable.
| |
When a field has no reads, we remove all its writes, but we did this:
(struct.set $foo A B)
=>
(drop A) (drop B)
We also need to trap if A, the reference, is null, which this PR
fixes:
(struct.set $foo A B)
=>
(drop (ref.as_non_null A)) (drop B)
| |
This fixes two bugs: First, we need to compare the nominal types of function
constants when looking for constants to "merge", not just their structure.
Second, when creating the new function we must use the proper type of
those constants, and not just another type.
| |
Related: emscripten-core/emscripten#15893 (comment)
--pass-arg=asyncify-side-module option will be used not only from
side modules, but also from main modules.
| |
We can preserve return_calls in inlined functions when the inlined call site is
itself a return_call, since the call result types must transitively match in
that case. This solves a problem where the previous inlining logic could
introduce stack exhaustion by downgrading recursive return_calls to normal
calls.
Fixes #4587.
| |
247f4c20a1 introduced a bug that caused expressions that refer to data segments
to be associated with the wrong segments in the presence of other segments that
have no referring expressions at all.
Fixes #4569.
Fixes #4571.
| |
CoalesceLocals (#4574)
Normally we just replace unreachable local.gets with a constant (0, or null), but if
the local is non-nullable we can't do that.
Fixes #4573
| |
Fixes #4562
Fixes #4564
| |
(#4553)
Previously we'd remove a field from a type if that field has no uses in any
sub- or super-type. In that case we'd remove it from all the types at once.
However, there is a case where we can remove a field only from a parent
but not from its children, if the field is at the end: if A has fields {x, y, z}
and its subtype B has fields {x, y, z, w}, and A pointers only access
field y while B pointers access all the fields, then we can remove z
from A. Removing it from the end is safe, and then B will not only add
w as it did before but also add z. Note that we cannot remove x,
because it is not at the end: removing it from just A but not B would
shift the indexes, making them incompatible.
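The "only from the end" rule can be illustrated with a tiny Python helper. It is a sketch under the assumption that `used` already contains every field accessed through pointers of that type; the name `trim_unused_tail` is hypothetical.

```python
def trim_unused_tail(fields, used):
    """Drop unused fields from the end of a struct type's field list.

    Only a trailing run of unused fields is removed, so the indexes of the
    remaining fields stay the same and subtypes remain compatible.
    """
    end = len(fields)
    while end > 0 and fields[end - 1] not in used:
        end -= 1
    return fields[:end]
```

For the example in the message, A with fields x, y, z and only y accessed becomes x, y: z is dropped because it is at the end, while x stays even though it is unused, since removing it would shift y's index. B keeps all four of its fields.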
| |
This basically just adds a call to ParamUtils::applyConstantValues. However,
we also need to be careful to not optimize in the presence of imports or
exports, so this adds a boolean that indicates unoptimizability.
| |
This moves more logic from ConstantFieldPropagation into PossibleConstantValues,
that is, instead of handling the two cases of a Literal or a Name before calling
PossibleConstantValues, move that code into the helper class. That way all users of
PossibleConstantValues can benefit from it. In particular, this makes
DeadArgumentElimination now support optimizing immutable globals, as well as
ref.func and ref.null.
(Changes to test/lit/passes/dae-gc-refine-params.wast are to avoid the new
optimizations from kicking in, so that it still tests what it tested before.)
| |
This adds a new signature-pruning pass that prunes parameters from
signature types where those parameters are never used in any function
that has that type. This is similar to DeadArgumentElimination but works
on a set of functions, and it can handle indirect calls.
Also move a little code from SignatureRefining into a shared place to
avoid duplication of logic to update signature types.
This pattern happens in j2wasm code, for example if all method functions
for some virtual method just return a constant and do not use the this
pointer.
| |
When copying a MemorySize or MemoryGrow instruction (e.g. for inlining),
transfer the memory type also to the copy. Otherwise it will always be
i32, even if memory64 should be used.
This fixes issue #4530.
| |
Merge similar functions that differ only in constant values (like the immediate
operands of const and call instructions) by parameterization.
Performing this pass at post-link time can merge more functions across
objects. Inspired by Swift compiler's optimization which is derived from
LLVM's one:
https://github.com/apple/swift/blob/main/lib/LLVMPasses/LLVMMergeFunctions.cpp
https://github.com/llvm/llvm-project/blob/main/llvm/docs/MergeFunctions.rst
The basic ideas here are constant value parameterization and direct callee
parameterization by indirection.
Constant value parameterization is like below:
;; Before
(func $big-const-42 (result i32)
  [[many instr 1]]
  (i32.const 42)
  [[many instr 2]]
)
(func $big-const-43 (result i32)
  [[many instr 1]]
  (i32.const 43)
  [[many instr 2]]
)
;; After
(func $byn$mgfn-shared$big-const-42 (param $0 i32) (result i32)
  [[many instr 1]]
  (local.get $0) ;; parameterized!!
  [[many instr 2]]
)
(func $big-const-42 (result i32)
  (call $byn$mgfn-shared$big-const-42
    (i32.const 42)
  )
)
(func $big-const-43 (result i32)
  (call $byn$mgfn-shared$big-const-42
    (i32.const 43)
  )
)
Direct callee parameterization is similar to the constant value parameterization,
but it parameterizes the callee function via ref.func instead. Therefore it is enabled
only when the reference-types and typed-function-references features are enabled.
I saw a 1-2% reduction for the SwiftWasm binary and Ruby's wasm port
using wasi-sdk, and a 3-4.5% reduction for the Unity WebGL binary with -Oz.
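The constant value parameterization described above can be sketched on a toy instruction list. This is a simplified model in Python, not the pass itself: instructions are tuples, only `i32.const` immediates may differ, and the name `merge_by_constants` is hypothetical.

```python
def merge_by_constants(bodies):
    """Try to merge function bodies that differ only in i32.const immediates.

    bodies maps a function name to a list of instruction tuples such as
    ("i32.const", 42). Returns (shared_body, args_per_function), where
    differing constants in shared_body are replaced by ("local.get", index),
    or None when the bodies are not similar enough.
    """
    names = list(bodies)
    first = bodies[names[0]]
    if any(len(bodies[name]) != len(first) for name in names):
        return None
    shared = []
    args = {name: [] for name in names}
    for i, instr in enumerate(first):
        column = [bodies[name][i] for name in names]
        if all(other == instr for other in column):
            shared.append(instr)  # identical everywhere: keep as is
        elif all(other[0] == "i32.const" for other in column):
            # Differing constants become a new parameter of the shared body.
            shared.append(("local.get", len(args[names[0]])))
            for name, other in zip(names, column):
                args[name].append(other[1])
        else:
            return None  # a non-constant difference: give up
    return shared, args
```

Each original function then becomes a thunk that calls the shared body with its own list of constants, which only pays off when the shared body is large compared to the thunks.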
| |
We were missing this particular case, which we can in fact handle
when the cast is static.
| |
This pass ignores reads from structs - it only cares about writes (during a
create or a struct.set). That makes sense since we want to refine the type
of fields to more specific things based on what is actually written to them.
However, a corner case was missed: If we ignore reads, the pass may
"cleverly" optimize to something that is no longer valid to read from. How
that happens is if there is no info at all for a type - no sets or news, so all
we have is a read, which as mentioned before we ignore, so we think we
have nothing at all for that type, and can do arbitrary stuff with it. But then
the arbitrary replacement can be invalid to read from, say if it has fewer
fields.
To handle that, just emit an unreachable. If all we have is a get but no
new then there cannot be an instance here at all. (That's only true in a
closed world, of course, but this entire pass assumes that anyhow.)
| |
(#4399)
Final part of #4265
(i32(x) >= 0) & (i32(y) >= 0) ==> i32(x | y) >= 0
(i64(x) >= 0) & (i64(y) >= 0) ==> i64(x | y) >= 0
(i32(x) == -1) & (i32(y) == -1) ==> i32(x & y) == -1
(i64(x) == -1) & (i64(y) == -1) ==> i64(x & y) == -1
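These four rewrites rest on two bit-level facts: `x | y` has a clear sign bit only when both inputs do, and `x & y` is all ones only when both inputs are. A brute-force check over a narrow width, as a sketch (8-bit signed values stand in for i32/i64, and the function name is made up):

```python
def check_and_rules(bits=8):
    """Exhaustively check both rewrite shapes for signed `bits`-wide values."""
    lo, hi = -(1 << (bits - 1)), 1 << (bits - 1)
    for x in range(lo, hi):
        for y in range(lo, hi):
            # (x >= 0) & (y >= 0)  ==>  (x | y) >= 0
            assert ((x >= 0) and (y >= 0)) == ((x | y) >= 0)
            # (x == -1) & (y == -1)  ==>  (x & y) == -1
            assert ((x == -1) and (y == -1)) == ((x & y) == -1)
    return True
```

Python integers behave like sign-extended two's complement values under `|` and `&`, so the same reasoning carries over to wider integer types.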
| |
(#4372)
(i32(x) >= 0) | (i32(y) >= 0) ==> i32(x & y) >= 0
(i64(x) >= 0) | (i64(y) >= 0) ==> i64(x & y) >= 0
(i32(x) != -1) | (i32(y) != -1) ==> i32(x & y) != -1
(i64(x) != -1) | (i64(y) != -1) ==> i64(x & y) != -1
| |
This PR is part of the solution to emscripten-core/emscripten#15594.
emscripten Asyncify won't work properly in side modules, because the
globals, __asyncify_state and __asyncify_data, are not synchronized
between main-module and side-modules.
A new pass arg, asyncify-side-module, is added to make
__asyncify_state and __asyncify_data imported in the instrumented
wasm.
| |
without "ignoreImplicitTraps" (#4295)" (#4459)
This reverts commit 5cf3521708cfada341285414df2dc7366d7e5454.
| |
"ignoreImplicitTraps" (#4295)
| |
`ref.cast` can be statically removed when the ref's type is a subtype of
the intended RTT type and either of `--ignore-implicit-traps` or
`--traps-never-happen` is given: https://github.com/WebAssembly/binaryen/blob/083ab9842ec3d4ca278c95e1a33112ae7cd4d9e5/src/passes/OptimizeInstructions.cpp#L1603-L1624
Some more context: https://github.com/WebAssembly/binaryen/pull/4097#discussion_r694456784
But this can create a block in which a `pop` is nested, which makes the
`catch` invalid. The test in this PR is the same as the example given by
@kripken in #4237. This calls the fixup function
`EHUtils::handleBlockNestedPops` at the end of the pass to fix this.
Also, because this pass creates a lot of blocks in other patterns, I
think it is possible there can be other patterns to cause this kind of
`pop` nesting.
| |
Inlining creates additional `block`s at inlined call sites, which can be
inside a `catch`. For example:
```wast
(try
  (do)
  (catch $tag
    (call $callee
      (pop i32)
    )
  )
)
```
After inlining, this becomes
```wast
(try
  (do)
  (catch $tag
    (block $__inlined_func$callee
      (local.set $0
        (pop i32) ;; Invalid!!
      )
      (nop)
    )
  )
)
```
Now the `pop` is nested in a `block`, which makes this invalid. This PR
runs `EHUtils::handleBlockNestedPops` at the end to assign the `pop` to
a local right after the `catch`, making the code valid again:
```wast
(try
  (do)
  (catch $tag
    (local.set $new ;; New local to store `pop` result
      (pop i32)
    )
    (block $__inlined_func$callee
      (local.set $0
        (local.get $new)
      )
      (nop)
    )
  )
)
```
| |
Similar to what DeadArgumentElimination does for individual functions, this
can refine the results of a set of functions all using the same heap type, when
they all return something more specific. After this PR SignatureRefining can
refine both params and results and is basically complete.