| |
This will be used in an upcoming type optimization pass and may be
generally useful.
|
|
|
|
|
| |
The local was only used once, so it didn't really add much. And, it was
causing some compilers to error on "unused variable" (when building without
assertions, the use was removed).
|
|
|
|
|
|
| |
We had a TODO to use it once Names was optimized, which it has been.
The Names version is also far faster. When building
https://github.com/JetBrains/kotlinconf-app it saves 70 seconds(!).
|
| |
Before the PR:
$ bin/wasm-opt test/hello_world.wat --metrics
total
[exports] : 1
[funcs] : 1
[globals] : 0
[imports] : 0
[memories] : 1
[memory-data] : 0
[tables] : 0
[tags] : 0
[total] : 3
[vars] : 0
Binary : 1
LocalGet : 2
After the PR:
$ bin/wasm-opt test/hello_world.wat --metrics
Metrics
total
[exports] : 1
[funcs] : 1
...
Note the "Metrics" addition at the top. And the title can be customized:
$ bin/wasm-opt test/hello_world.wat --metrics=text
Metrics: text
total
[exports] : 1
[funcs] : 1
The custom title can be helpful when multiple invocations of metrics are used
at once, e.g. --metrics=before -O3 --metrics=after.
|
|
|
|
|
|
|
|
|
| |
Implement a non-recursive version of Tarjan's Strongly Connected
Component algorithm that consumes and produces iterators for maximum
flexibility.
This will be used in an optimization that transforms the heap type graph
to use minimal recursion groups, which correspond to the strongly
connected components of the type graph.
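
As a rough illustration of the non-recursive approach, here is a minimal sketch. It is not the code added by this commit: it takes a plain adjacency list rather than iterators, and the name findSCCs and the Frame struct are made up for this example. An explicit work stack of (node, next successor) frames replaces the call stack of the recursive formulation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not Binaryen's API: returns the strongly connected
// components of a graph given as adjacency lists, using an explicit work
// stack instead of recursion.
std::vector<std::vector<size_t>>
findSCCs(const std::vector<std::vector<size_t>>& succs) {
  const size_t N = succs.size();
  const size_t Unvisited = N; // valid indices are always < N
  std::vector<size_t> index(N, Unvisited), lowlink(N, 0);
  std::vector<bool> onStack(N, false);
  std::vector<size_t> sccStack;
  // Each frame is a node plus the position of its next unprocessed successor.
  struct Frame {
    size_t node;
    size_t next;
  };
  std::vector<Frame> work;
  std::vector<std::vector<size_t>> sccs;
  size_t counter = 0;

  for (size_t root = 0; root < N; root++) {
    if (index[root] != Unvisited) {
      continue;
    }
    index[root] = lowlink[root] = counter++;
    sccStack.push_back(root);
    onStack[root] = true;
    work.push_back({root, 0});
    while (!work.empty()) {
      size_t node = work.back().node;
      if (work.back().next < succs[node].size()) {
        size_t succ = succs[node][work.back().next++];
        assert(succ < N);
        if (index[succ] == Unvisited) {
          // "Recurse" into the successor by pushing a new frame.
          index[succ] = lowlink[succ] = counter++;
          sccStack.push_back(succ);
          onStack[succ] = true;
          work.push_back({succ, 0});
        } else if (onStack[succ]) {
          lowlink[node] = std::min(lowlink[node], index[succ]);
        }
        continue;
      }
      // All successors are done: emit an SCC if this node is its root.
      if (lowlink[node] == index[node]) {
        std::vector<size_t> scc;
        size_t member;
        do {
          member = sccStack.back();
          sccStack.pop_back();
          onStack[member] = false;
          scc.push_back(member);
        } while (member != node);
        sccs.push_back(scc);
      }
      work.pop_back();
      // Propagate the lowlink to the parent, as the recursive version does
      // when a call returns.
      if (!work.empty()) {
        size_t parent = work.back().node;
        lowlink[parent] = std::min(lowlink[parent], lowlink[node]);
      }
    }
  }
  return sccs;
}
```

In this sketch, components come out with sinks first: a component is emitted only after every component reachable from it.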
|
| |
|
| |
|
|
|
|
| |
Generalize the code for simplifying element segments to handle more than
just null and funcref elements.
|
| |
We marked various expressions as having cost "Unacceptable", fixed at 100, to
ensure we never moved them out from an If arm, etc. Giving them such a high
cost avoids that problem - the cost is higher than the limit we have for moving
code from conditional to unconditional execution - but it also means the total
cost is unrealistic. For example, a function with one such instruction + an add
(cost 1) would end up with cost 101, and removing the add would look
insignificant, which causes issues for things that want to compare costs
(like Monomorphization).
To fix this, adjust some costs. The main change here is to give casts a cost of 5.
I measured this in depth (see the attached benchmark scripts), and it looks
clear that in both V8 and SpiderMonkey the cost of a cast is high enough to
make it not worth turning an if with a ref.test arm into a select (which would
always execute the test).
Other costs adjusted here matter a lot less, because they are on operations
that have side effects and so the optimizer will anyhow not move them from
conditional to unconditional execution, but I tried to make them a bit more
realistic while I was removing "Unacceptable":
* Give most atomic operations the 10 cost we've been using for atomic loads/
stores. Perhaps wait and notify should be slower, but it seems like
assuming fast switching might be more relevant.
* Give growth operations a cost of 20, and throw operations a cost of 10. These
numbers are entirely made up as I am not even sure how to measure them in
a useful way (but, again, this should not matter much as they have side
effects).
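
To make the comparison problem concrete, here is a small sketch. The Op enum, the two cost tables and the helper names are hypothetical and cover only the cast and add from the example above; the real cost analysis handles many more instructions. Only the numbers 100, 5 and 1 come from the description.

```cpp
#include <cassert>
#include <vector>

// Hypothetical instruction kinds and cost tables, for illustration only.
enum class Op { Add, Cast };

// Old scheme: a cast is "Unacceptable", fixed at 100.
int oldCost(Op op) { return op == Op::Cast ? 100 : 1; }
// New scheme: a cast costs 5.
int newCost(Op op) { return op == Op::Cast ? 5 : 1; }

// Sums the per-instruction costs of a function body.
int totalCost(const std::vector<Op>& body, int (*cost)(Op)) {
  int total = 0;
  for (Op op : body) {
    total += cost(op);
  }
  return total;
}

// Percentage of the original cost saved by an optimization.
double savedPercent(int before, int after) {
  assert(before > 0);
  return 100.0 * (before - after) / before;
}
```

With the old table, removing the add from a cast + add body takes the total from 101 to 100, a saving of under 1%. With the new table the total goes from 6 to 5, a saving of about 17%, which a pass comparing costs can actually notice.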
|
|
|
|
|
|
| |
We used the target's type for the read from the source, but due to
subtyping those might be different.
Found by the fuzzer.
|
|
|
| |
Fixes #6781
|
|
|
|
|
|
|
| |
When lacking a common supertype the GLB operation makes the type of the cast
unreachable, which errors on getHeapType in the later code.
Fixes #6738
|
|
|
| |
Fixes #6776.
|
|
|
|
|
|
|
|
| |
Aside from the fact that there's no need for this to be non-const and
this is the usual way to write an assignment operator, this is also
needed because of a recent change to std::pair
(https://github.com/llvm/llvm-project/pull/89652). This seems to be
forcing pair to want the const version of the assignment operator of its
members.
|
|
|
|
|
|
|
|
| |
Followup to #6727, which added support for failing casts in Struct2Local; it
turns out that Array2Struct required changes as well. Specifically, when we
turn an array into a struct, casts can look like they behave differently
(what used to be an array input becomes a struct). As with RefTest, which we
already handled, check if the cast succeeds in the original form and handle
that.
|
|
|
|
|
|
|
|
|
| |
This abbreviates a common pattern where we first had to check whether a
heap type was basic, then if it was, get its unshared version and
compare it to some expected BasicHeapType.
Suggested in
https://github.com/WebAssembly/binaryen/pull/6771#discussion_r1683005495.
|
|
|
|
|
|
|
|
| |
Update the fuzzer to both handle shared types in initial contents and
create and use new shared types without crashing or producing invalid
modules. Since V8 does not have a complete implementation of
shared-everything-threads yet, disable fuzzing V8 when shared-everything
is enabled. To avoid losing too much coverage of V8, disable
shared-everything in the fuzzer more frequently than other features.
|
| |
|
| |
|
|
|
|
| |
We previously special-cased things like GC types, but switch to a more
general solution of detecting what features a table's type requires.
|
| |
Previously call operands were monomorphized (considered as part of the
call context, so we can create a specialized function with those operands
fixed) if they were constant or had a different type than the function
parameter's type. This generalizes that to pull in pretty much all the code
we possibly can, including nested code. For example:
(call $foo
  (struct.new $struct
    (i32.const 10)
    (local.get $x)
    (local.get $y)
  )
)
This can turn into
(call $foo_mono
  (local.get $x)
  (local.get $y)
)
The struct.new and even one of the struct.new's children are moved into the
called function, replacing the original ref argument with two other ones. If the
original called function was this:
(func $foo (param $ref (ref ..))
  ..
)
then the monomorphized function looks like this:
(func $foo_mono (param $x i32) (param $y i32)
  (local $ref (ref ..))
  (local.set $ref
    (struct.new $struct
      (i32.const 10)
      (local.get $x)
      (local.get $y)
    )
  )
  ..
)
The struct.new and its constant child appear here, and we read the
parameters.
To do this, generalize the code that creates the call context to accept
everything except what is impossible to copy (like a local.get) or what would
be tricky and likely not worthwhile (like another call or a tuple). Also check
for effect interactions, as this code motion does some reordering.
For this to work, we need to adjust how we compute the costs we
compare when deciding what to monomorphize. Before we just
compared the called function to the monomorphized called function,
which was good enough when the call context only contained consts,
but now it can contain arbitrarily nested code. The proper comparison
is between these two:
* Old function + call context
* New monomorphized function
Including the call context makes this a fair comparison. In the example
above, the struct.new and the i32.const are part of the call context,
and so they are in the monomorphized function, so if we didn't count
them in the other function we'd decide not to optimize anything with a large
context.
The new functionality is tested in a new file. A few parts of existing
tests needed changes to not become pointless after this improvement,
namely by replacing stuff that we now optimize with things that we
don't, like replacing an i32.eqz with a local.get. There are also a
handful of test outcomes that change in CAREFUL mode due to the
new cost analysis.
|
|
|
|
|
|
| |
Similar to #6765, but for types instead of heap types. Generalize the
logic for transforming written reference types to types that are
supported without GC so that it will automatically handle shared types
and other new types correctly.
|
|
|
|
|
|
|
|
|
|
| |
We represent `ref.null`s as having bottom heap types, even when GC is
not enabled. Bottom heap types are a feature of the GC proposal, so in
that case the binary writer needs to write the corresponding top type
instead. We previously had separate logic for this for each type
hierarchy in the binary writer, but that did not handle shared types and
would not have automatically handled other new types, either. Simplify
and generalize the implementation and test that we can write `ref.null`s
of shared types without GC enabled.
|
|
|
|
|
|
|
| |
Update the validator to reject mixed-shareability ref.eq, although this
is still under discussion in
https://github.com/WebAssembly/shared-everything-threads/issues/76. Fix
the implementation of `Literal::operator==` to work properly with shared
i31ref.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
(#6752)" (#6761)
Allowing Literals with different types to compare equal causes problems
for passes that want equality to mean real equality, e.g. because they
are using literals as map keys or because they otherwise need to use
them interchangeably.
At a minimum, we would need to differentiate a `refEq` operation where
mixed-shareability i31refs can compare equal from physical equality on
Literals, but there is also appetite to disallow mixed-shareability
ref.eq at the spec level. See
https://github.com/WebAssembly/shared-everything-threads/issues/76.
|
|
|
|
|
|
| |
Component binary format: https://github.com/WebAssembly/component-model/blob/main/design/mvp/Binary.md#component-definitions
Context:
https://github.com/WebAssembly/binaryen/issues/6728#issuecomment-2231288924
|
|
|
|
|
|
|
| |
`ref.null` of shared types should only be allowed when shared-everything
is enabled, but we were previously checking only that reference types
were enabled when validating `ref.null`. Update the code to check all
features required by the null type and factor out shared logic for
printing lists of missing feature options in error messages.
|
|
|
| |
--skip-pass can now be specified more than once on the commandline.
|
|
|
|
|
|
|
| |
`ref.null` of shared types should only be allowed when shared-everything
is enabled, but we were previously checking only that reference types
were enabled when validating `ref.null`. Update the code to check all
features required by the null type and factor out shared logic for
printing lists of missing feature options in error messages.
|
|
|
|
| |
The logic for adding the shared-everything feature was not previously
executed for shared basic heap types.
|
|
|
|
|
|
|
| |
Add functions to:
* Set and get the trapsNeverHappen, closedWorld, generateStackIR and optimizeStackIR flags
* Manage the list of passes to skip.
|
|
|
|
| |
When creating a new subtype, make sure to copy the supertype's
shareability.
|
|
|
|
|
|
|
| |
Normally, values of different types can never compare equal to each
other, but since i31refs are not actually allocations, `ref.eq` has no
way to differentiate a shared i31ref and an unshared i31ref with the
same value, so it will report them as equal. Update the implementation
of value equality to reflect this correctly.
|
| |
flexibleCopy always visited parents before children, but it visited
vector children in reverse order:
(call ;; 1
  (call $a) ;; 3
  (call $b) ;; 2
)
The order of children happened to not matter in any user of this code,
and that's just what you get when you iterate over children in a vector
and push them to a stack before visiting them, so this odd ordering
was not noticed.
For a new user I will introduce soon, however, it would be nice to have
the normal pre-order:
(call ;; 1
  (call $a) ;; 2
  (call $b) ;; 3
)
(2 & 3 swapped).
This cannot be tested in the current code as it is NFC, but the later PR
will depend on it and test it heavily.
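
The ordering effect can be reproduced with a small stack-based traversal. This is only a sketch: Node and visitOrder are hypothetical names, not flexibleCopy itself, and pushing children in reverse is just one way to obtain the normal pre-order, which may not be how this change does it.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical tree node; not Binaryen's Expression class.
struct Node {
  std::string name;
  std::vector<Node> children;
};

// Visits parents before children using an explicit stack. With
// pushInReverse == false, children are pushed in vector order and therefore
// popped last-to-first (the old ordering). With pushInReverse == true, they
// are pushed back-to-front, so they are popped in the usual pre-order.
std::vector<std::string> visitOrder(const Node& root, bool pushInReverse) {
  std::vector<std::string> order;
  std::vector<const Node*> stack{&root};
  while (!stack.empty()) {
    const Node* curr = stack.back();
    stack.pop_back();
    assert(curr != nullptr);
    order.push_back(curr->name);
    const std::vector<Node>& kids = curr->children;
    if (pushInReverse) {
      for (auto it = kids.rbegin(); it != kids.rend(); ++it) {
        stack.push_back(&*it);
      }
    } else {
      for (const Node& kid : kids) {
        stack.push_back(&kid);
      }
    }
  }
  return order;
}
```

For the two-call example above, the first mode yields call, $b, $a and the second yields call, $a, $b.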
|
|
|
|
|
|
| |
When we switched to the new type printing machinery, we inserted this
extra space to minimize the diff in the test output compared with the
previous type printer. Improve the quality of the printed output by
removing it.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
When creating a reference to `func`, fix the probability of choosing to
continue on to choose some function other than the last one rather than
making it depend on the number of functions. Then, do not eagerly pick
from the rest of the candidate functions. Instead, fall through to the
more general logic that will already pick a random candidate function.
Also move the logic for coming up with a concrete signature down to
where it is needed.
These simplifications will make it easier to update the code to handle
shared types.
|
|
|
|
|
|
|
|
|
|
|
|
| |
Each pass instance can now store its own argument, which can differ between instances.
This may be a breaking change for the corner case of running a pass multiple
times and setting the pass's argument multiple times as well (before, the last
pass argument affected them all; now, it affects the last instance only). This
only affects arguments with the name of a pass; others remain global, as
before (and multiple passes can read them, in fact). See the CHANGELOG for
details.
Fixes #6646
|
| |
We now consider a drop to be part of the call context: If we see
(drop
  (call $foo)
)
(func $foo (result i32)
  (i32.const 42)
)
Then we'd monomorphize to this:
(call $foo_1) ;; call the specialized function instead
(func $foo_1 ;; the specialized function returns nothing
  (drop ;; the drop was moved into here
    (i32.const 42)
  )
)
With the drop now in the called function, we may be able to optimize out unused work.
Refactor a bit of code out of DAE that we can reuse here, into a new return-utils.h.
|
|
|
|
| |
The standard name for the instruction is `ref.i31`. Remove support for
the non-standard name and update tests that were still using it.
|
|
|
|
|
|
|
| |
Implement `ref.i31_shared`, the new instruction for creating references
to shared i31s. Implement binary and text parsing and emitting as well
as interpretation. Copy the upstream spec test for i31 and modify it so
that all the heap types are shared. Comment out some parts that we do
not yet support.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
E.g. loading 4 bytes from 2^32 - 2 should error: 2 bytes are past the maximum
address. Before this PR we added 2^32 - 2 + 4 and overflowed to 2, which we
saw as a low and safe address. This PR adds an extra check for an overflow in
that add.
Also add unreachables after calls to segfault(), which reduces the overhead of
the extra check here (the unreachable apparently allows VMs to see that
control flow ends, after the segfault() which is truly no-return).
Fixes emscripten-core/emscripten#21557
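
The wraparound can be shown with plain 32-bit arithmetic. This is a sketch only: the real check is emitted as wasm code rather than written in C++, and the function names and the limit parameter are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Old-style check: addr + bytes is computed modulo 2^32, so an access near
// the top of the address space can wrap around to a small "safe" value.
bool inBoundsOld(uint32_t addr, uint32_t bytes, uint32_t limit) {
  assert(bytes != 0); // zero-size accesses are not modeled in this sketch
  uint32_t end = addr + bytes;
  return end <= limit;
}

// New-style check: additionally reject the access if the add overflowed,
// which is the case exactly when the sum is smaller than an operand.
bool inBoundsNew(uint32_t addr, uint32_t bytes, uint32_t limit) {
  assert(bytes != 0);
  uint32_t end = addr + bytes;
  if (end < addr) {
    return false;
  }
  return end <= limit;
}
```

Loading 4 bytes from 2^32 - 2 (0xFFFFFFFE) gives an end of 2 after wrapping, so inBoundsOld accepts it for any limit of at least 2, while inBoundsNew rejects it.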
|
|
|
|
|
|
|
|
| |
The full syntax for an expression in an element segment looks like
`(item (ref.null none))`, but we have been printing the abbreviated
version, which omits the `(item ...)`. This abbreviation is only valid
when the item has only a single instruction, so it is not always correct
to use it. Rather than determining whether or not to use the
abbreviation on a case-by-case basis, always print the full syntax.
|
|
|
| |
This edge case makes the lowering a little more tricky.
|
|
|
|
|
| |
Eventually we will need to do some tuning of compile time speed, but for
now it is going to be simpler to do all the opts, in particular because it makes
writing tests simpler.
|
|
|
|
|
| |
`any.convert_extern` and `extern.convert_any` return references to
shared heap types iff their operands are references to shared heap
types.
|