| Commit message | Author | Age | Files | Lines |
| |
visitBlock() and validateCallParamsAndResult() both assumed they were
running inside a function, but might be called on global code too. Calls
and blocks are invalid in global positions, so we should error there, but
must do so properly without a null deref.
Fixes #6847
Fixes #6848
| |
Ensure the "fp16" feature is enabled for FP16 instructions.
| |
The best way to lower strings is via the "magic imports" API that uses
the names of imported string globals as their values. This approach only
works for valid UTF-8 strings, though. The existing
string-lowering-magic-imports pass falls back to putting non-UTF-8
strings in a JSON custom section, but this requires the runtime to
support that custom section for correctness. To help catch errors early
when runtimes do not support the strings custom section, add a new pass
that uses magic imports and raises an error if there are any invalid
strings.
| |
Specified at
https://github.com/WebAssembly/half-precision/blob/main/proposals/half-precision/Overview.md
| |
* Add interpreter support for exnref values.
* Fix optimization passes to support try_table.
* Enable the interpreter on exceptions (but not in V8; see the code).
| |
We previously printed explicit typeuses (e.g. `(type $f)`) in function
signatures when GC was enabled. But even when GC is not enabled,
function types may use non-MVP features that require the explicit
typeuse to be printed. Fix the printer to always print the explicit type
use for such types.
Fixes #6850.
| |
Most of our type optimization passes emit all non-public types as a
single large rec group, which trivially ensures that different types
remain different, even if they are optimized to have the same structure.
Usually emitting a single large rec group is fine, but it also means
that if the module is split, all of the types will need to be repeated
in all of the split modules. To better support this use case, add a pass
that can split the large rec group back into minimal rec groups, taking
care to preserve separate type identities by emitting different
permutations of the same group where possible or by inserting unused
brand types to differentiate them.
| |
Audit the remaining occurrences of `== HeapType::` and fix those that did
not handle shared types correctly. Add tests for some of the fixes;
others are NFC but clarify the code.
| |
Also use TableInit in the interpreter to initialize the module's table
state, which now handles traps properly, fixing #6431
| |
We don't properly validate that yet. E.g.:
(module
  (rec
    (type $func (func))
    (type $unused (sub (struct (field v128))))
  )
  (func $func (type $func))
)
That v128 is not used, but it ends up in the output because it is in a rec group that is used.
Atm we do not require that SIMD be enabled in such a case, which can trip up the fuzzer.
Context: #6820. For now, modify the test that uncovered this.
| |
This is based on these two proposals:
* https://github.com/WebAssembly/tool-conventions/blob/main/BuildId.md
* https://github.com/tc39/source-map/blob/main/proposals/debug-id.md
| |
Since reference types only introduced function and extern references,
all of the types in the `any` hierarchy require GC, including `none`.
Fixes #6839.
| |
Previously we included supertypes, but did not increase their count.
This was done so that the output for the nominal type system, which
introduced explicit supertypes, would more closely match the output
with the old equirecursive type system. Neither type system exists
anymore and we only support the single, standard isorecursive type
system, so we can now properly count supertypes. It turns out it doesn't
make much of a difference in the test outputs anyway.
| |
The argument is the minimum benefit we must see before we decide to optimize, e.g.
  --monomorphize --pass-arg=monomorphize-min-benefit@50
With a minimum benefit of 50%, we only optimize a call site when monomorphization
reduces its cost by at least 50%; at 95% we only optimize when we remove almost
all the cost, etc.
In practice, 95% tends to reduce code size overall: while we add monomorphized
versions of functions, we only do so when we remove a lot of work and size, and
after inlining we gain further benefits. However, 50% or even lower can lead to
better benchmark results, in return for larger code size, just like with inlining.
To be careful, the default is set to 95%.
Previously we optimized whenever we saw any benefit at all, which is the same
as requiring a minimum benefit of 0%. Old tests have the flag applied in this PR
to set that value, so they do not change.
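As a rough illustration of the threshold (a standalone sketch with hypothetical
names; the real pass uses Binaryen's cost analysis, which is not modeled here):
```
#include <cstdint>
#include <iostream>

// Hedged sketch: decide whether to monomorphize based on the percentage of the
// cost that would be removed, compared against the configured minimum benefit.
bool shouldMonomorphize(uint64_t originalCost,
                        uint64_t monomorphizedCost,
                        uint64_t minBenefitPercent) {
  if (monomorphizedCost >= originalCost) {
    return false; // No benefit at all.
  }
  uint64_t benefitPercent =
    (originalCost - monomorphizedCost) * 100 / originalCost;
  return benefitPercent >= minBenefitPercent;
}

int main() {
  // With the default of 95%, removing 90% of the cost is not enough, while
  // removing 96% of it is.
  std::cout << shouldMonomorphize(100, 10, 95) << '\n'; // 0
  std::cout << shouldMonomorphize(100, 4, 95) << '\n';  // 1
}
```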
| |
Previously we tracked only whether an expression was relevant to analysis, that is,
whether it interacted with the allocation we were tracing the behavior of. That is
not enough for all cases, though, so also track the form of the interaction, namely
whether the allocation flows through or is fully consumed. An example where that
matters:
(ref.eq
  (struct.get $A 0
    (local.tee $x
      (struct.new_default $A)
    )
  )
  (local.get $x)
)
Here the local.get flows out the allocation, but the struct.get only fully consumes
it. Before this PR we thought the struct.get flowed the allocation, and we misoptimized
this to 1.
To make this possible, do a bunch of minor refactoring:
* Move ParentChildInteraction out of the class.
* Add a "None" interaction there.
* Replace the set of reached expressions with a map of them to their interactions.
* Add helper functions to get an expression's interaction or to update it when replacing.
The new testcase here shows the main fix. The new assertions are covered by existing
testcases.
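A minimal standalone sketch of the bookkeeping change (hypothetical names, not
Binaryen's actual code): the old set of reached expressions becomes a map from
expression to interaction, with a helper to carry the interaction over when an
expression is replaced.
```
#include <cassert>
#include <unordered_map>

// Hypothetical stand-in for an IR expression; only identity matters here.
struct Expression {};

// The interaction kinds described above, including the new "None".
enum class ParentChildInteraction { None, Flows, FullyConsumes };

struct InteractionTracker {
  // A map of reached expressions to their interactions (replacing the old set).
  std::unordered_map<Expression*, ParentChildInteraction> interactions;

  ParentChildInteraction get(Expression* expr) const {
    auto it = interactions.find(expr);
    return it == interactions.end() ? ParentChildInteraction::None : it->second;
  }

  // When an optimization replaces an expression, keep its recorded interaction.
  void replace(Expression* old, Expression* replacement) {
    auto it = interactions.find(old);
    if (it != interactions.end()) {
      auto interaction = it->second;
      interactions.erase(it);
      interactions[replacement] = interaction;
    }
  }
};

int main() {
  Expression structGet, replacement;
  InteractionTracker tracker;
  tracker.interactions[&structGet] = ParentChildInteraction::FullyConsumes;
  tracker.replace(&structGet, &replacement);
  assert(tracker.get(&replacement) == ParentChildInteraction::FullyConsumes);
}
```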
| |
Before, we only removed fields from the end of a struct. If we had, say
struct Foo {
  int x;
  int y;
  int z;
};
// Add no fields but inherit the parent's.
struct Bar : Foo {};
If y is only used in Bar, but never in Foo, then we still kept it around, because
if we removed it from Foo we'd end up with Foo = {x, z}, Bar = {x, y, z}, which
is invalid - Bar no longer extends Foo. But we can do this if we first reorder
the two:
struct Foo {
  int x;
  int z;
  int y; // now y is at the end
};
struct Bar : Foo {};
And the optimized form is
struct Foo {
  int x;
  int z;
};
struct Bar : Foo {
  int y; // now y is added in Bar
};
This lets us remove all fields possible in all cases AFAIK.
This situation is not super-common, as most fields are actually used both
up and down the hierarchy (if they are used at all), but testing on some
large real-world codebases, I see 10 fields removed in Java, 45 in Kotlin,
and 31 in Dart testcases.
The NFC change to src/wasm-type-ordering.h was needed for this to
compile.
| |
The syntax for handler clauses in `resume` instructions has recently
changed, using `on` instead of `tag` now.
Instead of
```
(resume $ct (tag $tag0 $block0) ... (tag $tagn $blockn))
```
we now have
```
(resume $ct (on $tag0 $block0) ... (on $tagn $blockn))
```
This PR adapts parsing, printing, and some tests accordingly.
(Note that this PR deliberately does not yet make any of the other changes
that will arise from implementing the new, combined stack switching proposal.)
| |
Specified at
https://github.com/WebAssembly/half-precision/blob/main/proposals/half-precision/Overview.md
| |
Specified at
https://github.com/WebAssembly/half-precision/blob/main/proposals/half-precision/Overview.md
| |
The optimization is to only use ChildLocalizer, which moves children to
locals, if we actually have a reason to use it. It is simple enough to check here
whether the fields we are removing have side effects, and to call ChildLocalizer
only when it is actually needed. However, this will become much more complicated in a
subsequent PR which will reorder fields, which allows removing yet more
of them (without reordering, we can only remove fields at the end, if any
subtype needs the field).
This is a pretty minor optimization, as it avoids adding a few locals in the rare
case of struct.new operands having side effects. We run --gto at the
start of the pipeline, so later opts will clean that up anyhow. (This might make
us a little less efficient in the meantime, but the following PR will justify the
regression.)
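A rough sketch of the kind of check this enables (a hypothetical model, not the
actual --gto code): children only need to be spilled to locals when an operand
being removed has side effects that must be preserved.
```
#include <vector>

// Hypothetical model of a struct.new operand: whether its field is being
// removed, and whether evaluating it has side effects.
struct Operand {
  bool removed;
  bool hasSideEffects;
};

// A ChildLocalizer-style spill of children into locals is only required when a
// removed operand has side effects that must be kept; otherwise the removed
// operands can simply be deleted and no locals are needed.
bool needsChildLocalizer(const std::vector<Operand>& operands) {
  for (auto& operand : operands) {
    if (operand.removed && operand.hasSideEffects) {
      return true;
    }
  }
  return false;
}

int main() {
  // A removed operand with no side effects does not force localization.
  std::vector<Operand> ops = {{false, true}, {true, false}};
  return needsChildLocalizer(ops) ? 1 : 0;
}
```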
| |
The type index from the TypeBuilder error was mapped to a file location
incorrectly, resulting in an assertion failure.
Fixes #6816.
| |
Specified at
https://github.com/WebAssembly/half-precision/blob/main/proposals/half-precision/Overview.md
| |
The `timport$` prefix is already used for tables, so the binary parser
currently uses `eimport$` to name tags (I guess because they are
normally exception tags?).
| |
We had a TODO to use it once Names was optimized, which it has been.
The Names version is also far faster. When building
https://github.com/JetBrains/kotlinconf-app it saves 70 seconds(!).
| |
Before the PR:
$ bin/wasm-opt test/hello_world.wat --metrics
total
[exports] : 1
[funcs] : 1
[globals] : 0
[imports] : 0
[memories] : 1
[memory-data] : 0
[tables] : 0
[tags] : 0
[total] : 3
[vars] : 0
Binary : 1
LocalGet : 2
After the PR:
$ bin/wasm-opt test/hello_world.wat --metrics
Metrics
total
[exports] : 1
[funcs] : 1
...
Note the "Metrics" addition at the top. And the title can be customized:
$ bin/wasm-opt test/hello_world.wat --metrics=text
Metrics: text
total
[exports] : 1
[funcs] : 1
The custom title can be helpful when multiple invocations of metrics are used
at once, e.g. --metrics=before -O3 --metrics=after.
| |
We marked various expressions as having cost "Unacceptable", fixed at 100, to
ensure we never moved them out from an If arm, etc. Giving them such a high
cost avoids that problem - the cost is higher than the limit we have for moving
code from conditional to unconditional execution - but it also means the total
cost is unrealistic. For example, a function with one such instruction + an add
(cost 1) would end up with cost 101, and removing the add would look
insignificant, which causes issues for things that want to compare costs
(like Monomorphization).
To fix this, adjust some costs. The main change here is to give casts a cost of 5.
I measured this in depth, see the attached benchmark scripts, and it looks
clear that in both V8 and SpiderMonkey the cost of a cast is high enough to
make it not worth turning an if with ref.test arm into a select (which would
always execute the test).
Other costs adjusted here matter a lot less, because they are on operations
that have side effects and so the optimizer will anyhow not move them from
conditional to unconditional execution, but I tried to make them a bit more
realistic while I was removing "Unacceptable":
* Give most atomic operations the cost of 10 we've been using for atomic loads/
stores. Perhaps wait and notify should be costlier, but assuming fast switching
seems more relevant.
* Give growth operations a cost of 20, and throw operations a cost of 10. These
numbers are entirely made up as I am not even sure how to measure them in
a useful way (but, again, this should not matter much as they have side
effects).
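To make the distortion concrete, a small worked illustration with made-up
numbers (not measured data): a cast pinned at cost 100 makes removing an
adjacent add look negligible, while a realistic cast cost of 5 makes the same
removal a sizable relative win.
```
#include <iostream>

int main() {
  auto percent = [](double removed, double total) {
    return 100.0 * removed / total;
  };
  // Old: a cast pinned at cost 100 drowns out the add (cost 1) next to it.
  std::cout << percent(1, 100 + 1) << "%\n"; // ~0.99%
  // New: with the cast at cost 5, removing the add is a meaningful fraction.
  std::cout << percent(1, 5 + 1) << "%\n";   // ~16.7%
}
```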
| |
We used the target's type for the read from the source, but due to
subtyping those might be different.
Found by the fuzzer.
| |
Fixes #6781
| |
Fixes #6776.
| |
Followup to #6727, which added support for failing casts in Struct2Local, but it
turns out that Array2Struct changes were required as well. Specifically, when we
turn an array into a struct, casts can look like they behave differently (what
used to be an array input becomes a struct), so as with RefTest, which we already
handled, check whether the cast succeeds in the original form and handle that.
| |
We previously special-cased things like GC types, but switch to a more
general solution of detecting what features a table's type requires.
| |
Previously call operands were monomorphized (considered as part of the
call context, so we can create a specialized function with those operands
fixed) if they were constant or had a different type than the function
parameter's type. This generalizes that to pull in pretty much all the code
we possibly can, including nested code. For example:
(call $foo
  (struct.new $struct
    (i32.const 10)
    (local.get $x)
    (local.get $y)
  )
)
This can turn into
(call $foo_mono
  (local.get $x)
  (local.get $y)
)
The struct.new and even one of the struct.new's children are moved into the
called function, replacing the original ref argument with two other ones. If the
original called function was this:
(func $foo (param $ref (ref ..))
  ..
)
then the monomorphized function looks like this:
(func $foo_mono (param $x i32) (param $y i32)
  (local $ref (ref ..))
  (local.set $ref
    (struct.new $struct
      (i32.const 10)
      (local.get $x)
      (local.get $y)
    )
  )
  ..
)
The struct.new and its constant child appear here, and we read the
parameters.
To do this, generalize the code that creates the call context to accept
everything that is impossible to copy (like a local.get) or that would be
tricky and likely unworthwhile (like another call or a tuple). Also check
for effect interactions, as this code motion does some reordering.
For this to work, we need to adjust how we compute the costs we
compare when deciding what to monomorphize. Before we just
compared the called function to the monomorphized called function,
which was good enough when the call context only contained consts,
but now it can contain arbitrarily nested code. The proper comparison
is between these two:
* Old function + call context
* New monomorphized function
Including the call context makes this a fair comparison. In the example
above, the struct.new and the i32.const are part of the call context,
and so they are in the monomorphized function, so if we didn't count
them in the other function, we'd decide not to optimize anything with a large
context.
The new functionality is tested in a new file. A few parts of existing
tests needed changes to not become pointless after this improvement,
namely by replacing stuff that we now optimize with things that we
don't, like replacing an i32.eqz with a local.get. There are also a
handful of test outcomes that change in CAREFUL mode due to the
new cost analysis.
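The changed comparison can be sketched as follows (a standalone sketch with
hypothetical names, not the pass's actual code): the call context's cost is
charged to the original side, so pulling a large context into the monomorphized
function is not unfairly penalized.
```
#include <cstdint>

// The monomorphized function absorbs the call context, so we compare it
// against the original function plus that same context.
bool monomorphizationReducesCost(uint64_t oldFunctionCost,
                                 uint64_t callContextCost,
                                 uint64_t newFunctionCost) {
  return newFunctionCost < oldFunctionCost + callContextCost;
}

int main() {
  // E.g. a context of cost 10 folded into the new function: 50 + 10 = 60 on
  // the original side vs. 55 on the monomorphized side is still a win.
  return monomorphizationReducesCost(50, 10, 55) ? 0 : 1;
}
```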
| |
Similar to #6765, but for types instead of heap types. Generalize the
logic for transforming written reference types to types that are
supported without GC so that it will automatically handle shared types
and other new types correctly.
| |
We represent `ref.null`s as having bottom heap types, even when GC is
not enabled. Bottom heap types are a feature of the GC proposal, so in
that case the binary writer needs to write the corresponding top type
instead. We previously had separate logic for this for each type
hierarchy in the binary writer, but that did not handle shared types and
would not have automatically handled other new types, either. Simplify
and generalize the implementation and test that we can write `ref.null`s
of shared types without GC enabled.
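A minimal standalone sketch of that rule (hypothetical, simplified types; the
actual binary writer derives the top type generically from the type hierarchy
and also handles sharedness):
```
#include <cassert>

// Hypothetical, simplified model of some basic heap types (not Binaryen's
// HeapType API).
enum class BasicHeapType { Func, NoFunc, Extern, NoExtern };

bool isBottom(BasicHeapType t) {
  return t == BasicHeapType::NoFunc || t == BasicHeapType::NoExtern;
}

// Without GC, a ref.null of a bottom type is written using the top type of the
// same hierarchy. (A switch-like check keeps this sketch self-contained; the
// real change computes the top type generically rather than per hierarchy.)
BasicHeapType heapTypeToWrite(BasicHeapType t, bool gcEnabled) {
  if (gcEnabled || !isBottom(t)) {
    return t;
  }
  return t == BasicHeapType::NoFunc ? BasicHeapType::Func
                                    : BasicHeapType::Extern;
}

int main() {
  assert(heapTypeToWrite(BasicHeapType::NoExtern, /*gcEnabled=*/false) ==
         BasicHeapType::Extern);
  assert(heapTypeToWrite(BasicHeapType::NoFunc, /*gcEnabled=*/true) ==
         BasicHeapType::NoFunc);
}
```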
| |
Once the fuzzer is updated to be able to handle initial contents with
shared types, it still will not be able to handle initial contents with
continuation types. To avoid future issues, remove continuations from
lit/basic/shared-types.wast.
| |
Component binary format: https://github.com/WebAssembly/component-model/blob/main/design/mvp/Binary.md#component-definitions
Context:
https://github.com/WebAssembly/binaryen/issues/6728#issuecomment-2231288924
| |
`ref.null` of shared types should only be allowed when shared-everything
is enabled, but we were previously checking only that reference types
were enabled when validating `ref.null`. Update the code to check all
features required by the null type and factor out shared logic for
printing lists of missing feature options in error messages.
| |
--skip-pass can now be specified more than once on the commandline.
| |
`ref.null` of shared types should only be allowed when shared-everything
is enabled, but we were previously checking only that reference types
were enabled when validating `ref.null`. Update the code to check all
features required by the null type and factor out shared logic for
printing lists of missing feature options in error messages.
| |
The logic for adding the shared-everything feature was not previously
executed for shared basic heap types.
| |
When creating a new subtype, make sure to copy the supertype's
shareability.
| |
When we switched to the new type printing machinery, we inserted this
extra space to minimize the diff in the test output compared with the
previous type printer. Improve the quality of the printed output by
removing it.
| |
Each pass instance can now store its own argument, which can differ between instances.
This may be a breaking change for the corner case of running a pass multiple
times and setting the pass's argument multiple times as well (before, the last
pass argument affected them all; now, it affects the last instance only). This
only affects arguments with the name of a pass; others remain global, as
before (and multiple passes can read them, in fact). See the CHANGELOG for
details.
Fixes #6646
| |
We now consider a drop to be part of the call context: If we see
(drop
  (call $foo)
)
(func $foo (result i32)
  (i32.const 42)
)
Then we'd monomorphize to this:
(call $foo_1) ;; call the specialized function instead
(func $foo_1 ;; the specialized function returns nothing
  (drop ;; the drop was moved into here
    (i32.const 42)
  )
)
With the drop now in the called function, we may be able to optimize out unused work.
Refactor a bit of code out of DAE that we can reuse here, into a new return-utils.h.
| |
The standard name for the instruction is `ref.i31`. Remove support for
the non-standard name and update tests that were still using it.