| Commit message | Author | Age | Files | Lines |
... | |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Before this change, replace_lane converted all the F16 lanes to F32
and then replaced one lane with the F16 (I32 representation) value, but
it did not convert the other lanes back to F16 (I32). To fix this we
can simply leave the lanes as I32 and replace the one lane.
Note: previous replace_lane tests did not catch this because they
started with all-zero vectors, so the missing F32->I32 conversion did
not matter. Other operations do not run into this issue since they
iterate over all lanes and convert the F32 values back to F16 (I32).
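A minimal sketch of the fixed behavior (illustrative C++ only, not the
actual interpreter code), assuming lanes are held as widened F16 bit
patterns:
```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Each lane holds the raw F16 bit pattern, widened to 32 bits.
using F16x8Lanes = std::array<uint32_t, 8>;

// Correct replace_lane: leave the seven untouched lanes in their I32
// (F16-bit) form and overwrite only the requested lane. The buggy code
// converted every lane to F32 first and never converted the untouched
// lanes back to F16 bits.
F16x8Lanes replaceLane(F16x8Lanes lanes, std::size_t index,
                       uint32_t newF16Bits) {
    lanes[index] = newF16Bits;
    return lanes;
}
```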
---------
Co-authored-by: Alon Zakai <alonzakai@gmail.com>
|
|
|
|
|
|
|
|
| |
This allows removing a reference field from all Java objects, reducing
the per-object memory and initialization overhead.
The pass is designed to run directly on the J2CL output, before other
optimizations, since it relies on invariants that might get lost in
optimization. If the invariants don't hold, the pass aborts.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
As with all type optimizations, MinimizeRecGroups only changes private
types, which are the only types that are safe to modify. However, it is
important for correctness that MinimizeRecGroups maintain separate
type identities for all types, whether public or private, to ensure that
casts that should differentiate two types cannot change behavior.
Previously the pass worked exclusively on private types, so there was
nothing preventing it from constructing a minimal rec group that
happened to have the same shape, and therefore the same type identity,
as a public rec group. #6886 exhibits a fuzzer test case where this
happens and changes the behavior of the program.
Fix the bug by recording all public rec group shapes and resolving
conflicts with these shapes by updating the shape of the conflicting
non-public type.
Fixes #6886.
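A rough sketch of the conflict-avoidance idea, with assumed names
(`Shape`, `withBrandType`) standing in for the real rec group machinery:
```cpp
#include <set>
#include <string>

using Shape = std::string; // stand-in for a canonical rec group shape

// Shapes of public rec groups; a private group must never match one.
std::set<Shape> publicShapes;

// Changing the group, e.g. by appending an unused "brand" type, gives
// it a distinct identity without affecting observable behavior.
Shape withBrandType(const Shape& shape) { return shape + "+brand"; }

// Resolve collisions before emitting a minimized private group.
Shape finalizeGroup(Shape shape) {
    while (publicShapes.count(shape)) {
        shape = withBrandType(shape);
    }
    return shape;
}
```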
|
|
|
|
|
|
| |
We computed both get and set influences, but getGetInfluences() was
never called, so that work was entirely pointless.
This makes the pass 20% faster.
|
|
|
|
|
|
|
| |
We previously incremented the use count for a declared supertype only if
it was also a type we had never seen before. Fix the count by treating
the supertype the same as any other type used in a type definition.
Update tests accordingly, including by manually moving input types
around to better match the output.
|
|
|
|
|
|
|
|
|
|
| |
LocalGraph by default computes all the local.sets that can be read from all
local.gets. However, many passes query only a small subset of those. To
avoid wasted work, add a lazy mode that only computes sets when asked
about a get.
This is then used in a single place, LoopInvariantCodeMotion, which becomes
18% faster.
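A sketch of the lazy mode with hypothetical names; the point is simply
to memoize per-get results and compute them only on demand:
```cpp
#include <unordered_map>
#include <unordered_set>

struct LocalGet;
struct LocalSet;

struct LazyLocalGraph {
    using Sets = std::unordered_set<LocalSet*>;

    // Computed and cached only when a pass actually asks about a get.
    const Sets& getSets(LocalGet* get) {
        auto it = cache.find(get);
        if (it == cache.end()) {
            it = cache.emplace(get, computeSets(get)).first;
        }
        return it->second;
    }

private:
    // The flow analysis for a single get (definition elided here).
    Sets computeSets(LocalGet* get);

    std::unordered_map<LocalGet*, Sets> cache;
};
```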
|
| |
|
|
|
|
|
|
| |
This saves memory and could in principle improve performance, although a
quick experiment with 30 samples on ReorderGlobals did not yield a
statistically significant improvement. At any rate, using Index is more
consistent with other parts of the code base.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Rec groups need to be topologically sorted for the output module to be
valid, but the specific order of rec groups also affects the module size
because types at lower indices require fewer bytes to reference. We
previously optimized for code size when gathering types by sorting the
list of groups before doing the topological sort. This was brittle,
though, and depended for correctness on implementation details of the
topological sort.
Replace the old topological sort with the new
`TopologicalSort::minSort` utility, which is a more principled method of
achieving a minimal topological sort with respect to some comparator.
Also draw inspiration from ReorderGlobals and apply an exponential
factor to take the users of a rec group into account when determining
its weight.
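One way to read the weighting, sketched under assumed details (the
factor and exact formula here are illustrative, not the pass's actual
values): each use by another group contributes a discounted share of the
user's weight, which compounds exponentially along chains of users, as
in ReorderGlobals.
```cpp
#include <vector>

// Assumed discount applied per level of use.
constexpr double kUserFactor = 0.5;

double groupWeight(double ownUseCount,
                   const std::vector<double>& userWeights) {
    double weight = ownUseCount;
    for (double user : userWeights) {
        // User weights computed recursively this way give
        // exponentially decaying influence to distant users.
        weight += kUserFactor * user;
    }
    return weight;
}
```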
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Rather than finding the minimum sort with respect to the original order
of vertices, find the minimum sort with respect to an arbitrary
user-provided comparator. Users of the minSort utility previously had to
sort their input graphs according to their desired ordering, but now
they can simply provide their comparator instead.
Take advantage of the new functionality in ReorderGlobals and also
standardize on a single data type for representing dependence graphs to
avoid unnecessary conversions. Together, these changes slightly speed up
ReorderGlobals.
Move the topological sort code previously in a .cpp file into the header
so the comparator can be provided as a lambda template parameter instead
of as a `std::function`. This makes ReorderGlobals about 5% faster.
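A self-contained sketch of such a header-only `minSort` (the real
utility's signature and graph representation may differ): Kahn's
algorithm, with the ready set kept in a heap ordered by the caller's
comparator.
```cpp
#include <cstddef>
#include <queue>
#include <vector>

// succs[i] lists the vertices that must come after vertex i.
// preferred(a, b) returns true when a should be placed before b.
template<typename Compare>
std::vector<std::size_t>
minSort(const std::vector<std::vector<std::size_t>>& succs,
        Compare preferred) {
    std::size_t n = succs.size();
    std::vector<std::size_t> indegree(n, 0);
    for (const auto& ss : succs) {
        for (std::size_t s : ss) {
            ++indegree[s];
        }
    }
    // std::priority_queue pops its "greatest" element, so invert the
    // preference to pop the most-preferred ready vertex first.
    auto heapCmp = [&](std::size_t a, std::size_t b) {
        return preferred(b, a);
    };
    std::priority_queue<std::size_t, std::vector<std::size_t>,
                        decltype(heapCmp)> ready(heapCmp);
    for (std::size_t i = 0; i < n; ++i) {
        if (indegree[i] == 0) {
            ready.push(i);
        }
    }
    std::vector<std::size_t> order;
    while (!ready.empty()) {
        std::size_t next = ready.top();
        ready.pop();
        order.push_back(next);
        for (std::size_t s : succs[next]) {
            if (--indegree[s] == 0) {
                ready.push(s);
            }
        }
    }
    return order;
}
```
Because `Compare` is a template parameter, a capturing lambda such as
`[&](std::size_t a, std::size_t b) { return weight[a] > weight[b]; }`
can be passed directly and inlined, which a `std::function` parameter
would prevent.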
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
HeapStoreOptimization (#6882)
This just moves code out of OptimizeInstructions to the new pass. The existing
test is renamed and now runs the new pass instead. The new pass is run right
after each --optimize-instructions invocation, so it should not cause any
noticeable effects whatsoever, making this NFC.
The motivation here is that there is a bug in the pass, see the new testcase
added at the end, which shows the bug. It is not practical to fix that bug in
OptimizeInstructions since we need more than peephole optimizations to do
so. This PR moves the code to a new pass so we can fix it there properly,
later.
The new pass is named HeapStoreOptimization since the same infrastructure
we will need to fix the bug will also help dead store elimination and related
things.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Specified at
https://github.com/WebAssembly/half-precision/blob/main/proposals/half-precision/Overview.md
A few notes:
- The F32x4 and F64x2 versions of madd and nmadd are missing spec
tests.
- For madd, the implementation was incorrectly computing `(b*c)+a`
where it should be `(a*b)+c` (see the sketch below).
- For nmadd, the implementation was incorrectly computing `(-b*c)+a`
where it should be `-(a*b)+c`.
- There doesn't appear to be a great way to actually implement a fused
nmadd, but the spec allows the double-rounded version I added.
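Lane-wise, the corrected semantics look like this sketch (plain C++
floats standing in for the SIMD lanes; not the actual interpreter code):
```cpp
// madd: multiply the first two operands, then add the third.
float maddLane(float a, float b, float c) {
    return (a * b) + c; // was incorrectly computing (b * c) + a
}

// nmadd: negate the product, then add the third operand. This is the
// double-rounded form the spec permits; a fused, single-rounding
// negated multiply-add has no straightforward implementation.
float nmaddLane(float a, float b, float c) {
    return -(a * b) + c; // was incorrectly computing (-b * c) + a
}
```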
|
|
|
|
|
|
|
| |
Previously they were structs and their results were accessed with
`operator*()`, but that was unnecessarily complicated and could lead to
problems with temporary lifetimes being too short. Simplify the
utilities by making them functions. This also allows the wrapper
templates to infer the proper element types automatically.
|
|
|
|
|
|
|
|
| |
Reuse the code implementing Kahn's topological sort algorithm with a new
configuration that uses a min-heap to always choose the best available
element.
Also add wrapper utilities that can find topological sorts of graphs
with arbitrary element types, not just indices.
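A sketch of the wrapper idea under assumed details: translate arbitrary
keys to dense indices, run the index-based sort, and translate back.
The plain stack below would become a min-heap in the best-element
configuration.
```cpp
#include <cstddef>
#include <map>
#include <vector>

// Topologically sort arbitrary keys: deps[k] lists the keys that must
// come before k, and every dependency is assumed to appear as a key.
template<typename T>
std::vector<T> sortKeys(const std::map<T, std::vector<T>>& deps) {
    std::vector<T> keys;
    std::map<T, std::size_t> indices;
    for (const auto& entry : deps) {
        indices[entry.first] = keys.size();
        keys.push_back(entry.first);
    }
    // Build an index-based graph: edge d -> k for each dependency d of k.
    std::vector<std::vector<std::size_t>> succs(keys.size());
    std::vector<std::size_t> indegree(keys.size(), 0);
    for (const auto& entry : deps) {
        for (const T& d : entry.second) {
            succs[indices.at(d)].push_back(indices.at(entry.first));
            ++indegree[indices.at(entry.first)];
        }
    }
    // Kahn's algorithm over the dense indices.
    std::vector<std::size_t> ready, order;
    for (std::size_t i = 0; i < keys.size(); ++i) {
        if (indegree[i] == 0) {
            ready.push_back(i);
        }
    }
    while (!ready.empty()) {
        std::size_t next = ready.back();
        ready.pop_back();
        order.push_back(next);
        for (std::size_t s : succs[next]) {
            if (--indegree[s] == 0) {
                ready.push_back(s);
            }
        }
    }
    // Translate indices back to the caller's element type.
    std::vector<T> result;
    for (std::size_t i : order) {
        result.push_back(keys[i]);
    }
    return result;
}
```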
|
|
|
|
| |
Previously for in-tree builds, they were put directly into test/, which
unnecessarily pollutes the tree.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Before, we just had a map that was accessed as localGraph.getSetses[get];
now it is a call, localGraph.getSets(get), which more nicely hides the
internal implementation details.
Also rename getSetses => getSetsMap.
This will allow a later PR to optimize the internals of this API.
This is performance-neutral as far as I can measure. (We do replace a direct read
from a data structure with a call, but the call is in a header and should always get
inlined.)
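In outline (assumed member shapes), the new API looks something like:
```cpp
#include <unordered_map>
#include <unordered_set>

struct LocalGet;
struct LocalSet;

struct LocalGraph {
    using Sets = std::unordered_set<LocalSet*>;

    std::unordered_map<LocalGet*, Sets> getSetsMap; // was getSetses

    // Header-defined accessor: should inline down to the old direct
    // map read, while hiding the representation from callers.
    const Sets& getSets(LocalGet* get) { return getSetsMap[get]; }
};
```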
|
|
|
|
|
|
|
| |
The instructions relaxed_fma and relaxed_fnma have been renamed to
relaxed_madd and relaxed_nmadd.
https://github.com/WebAssembly/relaxed-simd/blob/main/proposals/relaxed-simd/Overview.md#binary-format
|
|
|
|
|
|
|
|
|
| |
The parser function for `action` returned a `MaybeResult`, but we were
treating it as returning a normal `Result` and not checking that it had
contents in several places. Replace the current `action()` with
`maybeAction()` and add a new `action()` that requires the action to be
present.
Fixes #6872.
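A sketch of the split, using std::optional as a stand-in for the
parser's MaybeResult/Result types:
```cpp
#include <optional>
#include <stdexcept>

struct Action { /* a parsed wast action */ };

// Parses an action if one is present; an empty result means "no action
// here", a legal outcome for callers that treat actions as optional.
std::optional<Action> maybeAction();

// For contexts where an action is required: an empty result is now an
// error instead of being silently treated as success.
Action action() {
    if (auto a = maybeAction()) {
        return *a;
    }
    throw std::runtime_error("expected action");
}
```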
|
|
|
|
| |
Specified at
https://github.com/WebAssembly/half-precision/blob/main/proposals/half-precision/Overview.md
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This constructed a LocalGraph, which computes the sets that reach each get. But
all we need to know is which params are live, so instead we can do a liveness
computation (which computes just a boolean, not the list of sets). Also, it is
simple to restrict the liveness computation to the parameters rather than all
the locals, as a further optimization.
Existing tests cover this, though I did find that the case of unreachability needed
a new test.
On a large testcase I am looking at, this makes --dae 17% faster.
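A sketch of the cheaper analysis on a toy instruction type; a real
version would also OR liveness across control-flow joins, but the
per-instruction transfer is just this single bit per parameter:
```cpp
#include <cstddef>
#include <vector>

struct Inst {
    enum Kind { Get, Set, Other } kind;
    std::size_t local; // local index, for Get/Set
};

// Backward scan over straight-line code: a get makes its local live,
// and a set kills it (the value before the set cannot be read).
void scanBlock(const std::vector<Inst>& block, std::vector<bool>& live) {
    for (auto it = block.rbegin(); it != block.rend(); ++it) {
        if (it->kind == Inst::Get) {
            live[it->local] = true;
        } else if (it->kind == Inst::Set) {
            live[it->local] = false;
        }
    }
}
```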
|
|
|
|
|
|
|
|
|
|
| |
visitBlock() and validateCallParamsAndResult() both assumed they were
running inside a function, but might be called on global code too. Calls
and blocks are invalid in global positions, so we should error there, but
must do so properly without a null deref.
Fixes #6847
Fixes #6848
|
|
|
| |
Ensure the "fp16" feature is enabled for FP16 instructions.
|
|
|
|
|
|
|
|
|
|
|
| |
The best way to lower strings is via the "magic imports" API that uses
the names of imported string globals as their values. This approach only
works for valid UTF-8 strings, though. The existing
string-lowering-magic-imports pass falls back to putting non-UTF-8
strings in a JSON custom section, but this requires the runtime to
support that custom section for correctness. To help catch errors early
when runtimes do not support the strings custom section, add a new pass
that uses magic imports and raises an error if there are any invalid
strings.
|
|
|
|
| |
Specified at
https://github.com/WebAssembly/half-precision/blob/main/proposals/half-precision/Overview.md
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Spec tests pass the value `ref.extern n`, where `n` is some integer,
into exported functions that expect to receive externrefs and receive
such values back out as return values. The payload serves to distinguish
externrefs so the test can assert that the correct one was returned.
Parse these values in wast scripts and represent them as externalized
i31refs carrying the payload. We will need a different representation
eventually, since some tests explicitly expect these externrefs to not
be i31refs, but this suffices to get several new tests passing.
To get the memory64 version of table_grow.wast passing, additionally fix
the interpreter to handle growing 64-bit tables correctly.
Delete the local versions of the upstream tests that can now be run
successfully.
|
|
|
|
|
|
|
|
| |
The leading byte that indicates what kind of heap type is being defined
is a single raw byte, but we were previously treating it as an
SLEB128-encoded value. Since we emit the smallest LEB encodings possible, we were
writing the correct bytes in output files, but we were also improperly
accepting binaries that used more than one byte to encode these values.
This was caught by an upstream spec test.
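A sketch of the stricter reading, with an assumed reader shape: the kind
is one raw byte, so multi-byte SLEB128 spellings of the same value must
be rejected.
```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Read the heap type kind as exactly one byte. SLEB128-based reading
// also accepted redundant encodings such as 0xF0 0x7F for what should
// be the single byte 0x70.
uint8_t readHeapTypeKind(const std::vector<uint8_t>& in, std::size_t& pos) {
    if (pos >= in.size()) {
        throw std::runtime_error("unexpected end of binary");
    }
    return in[pos++];
}
```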
|
|
|
|
|
|
| |
Run the upstream tests by default, except for a large list of them that
do not successfully run. Remove the local version of those that do
successfully run where the local version is entirely subsumed by the
upstream version.
|
|
|
|
|
|
| |
* Add interpreter support for exnref values.
* Fix optimization passes to support try_table.
* Enable the interpreter (but not in V8, see code) on exceptions.
|
|
|
|
|
|
|
|
|
| |
IRBuilder is responsible for validation involving type annotations on GC
instructions because those type annotations may not be preserved in the
built IR to be used by the main validator. For `array.init_elem`, we
were not using the type annotation to validate the element segment,
which allowed us to parse invalid modules when the reference operand was
a nullref. Add the missing validation in IRBuilder and fix a relevant
spec test.
|
|
|
|
|
|
|
|
|
| |
We previously printed explicit typeuses (e.g. `(type $f)`) in function
signatures when GC was enabled. But even when GC is not enabled,
function types may use non-MVP features that require the explicit
typeuse to be printed. Fix the printer to always print the explicit type
use for such types.
Fixes #6850.
|
|
|
|
|
|
|
|
|
|
|
|
| |
Most of our type optimization passes emit all non-public types as a
single large rec group, which trivially ensures that different types
remain different, even if they are optimized to have the same structure.
Usually emitting a single large rec group is fine, but it also means
that if the module is split, all of the types will need to be repeated
in all of the split modules. To better support this use case, add a pass
that can split the large rec group back into minimal rec groups, taking
care to preserve separate type identities by emitting different
permutations of the same group where possible or by inserting unused
brand types to differentiate them.
|
|
|
|
|
| |
Audit the remaining occurrences of `== HeapType::` and fix those that did
not handle shared types correctly. Add tests for some of the fixes;
others are NFC but clarify the code.
|
|
|
|
|
| |
Also use TableInit in the interpreter to initialize a module's table
state, which now handles traps properly, fixing #6431
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We don't properly validate that yet. E.g.:
(module
(rec
(type $func (func))
(type $unused (sub (struct (field v128))))
)
(func $func (type $func))
)
That v128 is not used, but it ends up in the output because it is in a rec group that is used.
At the moment we do not require that SIMD be enabled in such a case, which can trip up the fuzzer.
Context: #6820. For now, modify the test that uncovered this.
|
|
|
|
|
|
|
| |
This is based on these two proposals:
* https://github.com/WebAssembly/tool-conventions/blob/main/BuildId.md
* https://github.com/tc39/source-map/blob/main/proposals/debug-id.md
|
|
|
|
|
|
| |
Since reference types only introduced function and extern references,
all of the types in the `any` hierarchy require GC, including `none`.
Fixes #6839.
|
|
|
|
|
|
|
|
|
| |
Previously we included supertypes, but did not increase their count.
This was done so that the output for the nominal type system, which
introduced explicit supertypes, would more closely match the output
with the old equirecursive type system. Neither type system exists
anymore and we only support the single, standard isorecursive type
system, so we can now properly count supertypes. It turns out it doesn't
make much of a difference in the test outputs anyway.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The argument is the minimum benefit we must see for us to decide to optimize, e.g.
--monomorphize --pass-arg=monomorphize-min-benefit@50
With a minimum benefit of 50%, we only monomorphize when doing so reduces
the cost by at least 50%; at 95% we would only optimize when we remove
almost all the cost, etc.
In practice I see that 95% actually tends to reduce code size overall: while
we add monomorphized versions of functions, we only do so when we remove a
lot of work and size, and after inlining we gain benefits. However, 50% or
even lower can lead to better benchmark results, in return for larger code
size, just like with inlining. To be careful, the default is set to 95%.
Previously we optimized whenever we saw any benefit at all, which is the same
as requiring a minimum benefit of 0%. Old tests have the flag applied in this PR
to set that value, so they do not change.
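The decision boils down to a percentage comparison, roughly like this
sketch (names assumed; the real measure is the pass's internal cost
model):
```cpp
// Decide whether a monomorphized call site is worth keeping: benefit
// is the percentage of the original cost that monomorphization removes.
bool shouldMonomorphize(double costBefore, double costAfter,
                        double minBenefitPercent) {
    if (costBefore <= 0) {
        return false; // nothing to improve
    }
    double benefit = 100.0 * (costBefore - costAfter) / costBefore;
    return benefit >= minBenefitPercent;
}
```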
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously we tracked only whether an expression was relevant to the analysis,
that is, whether it interacted with the allocation whose behavior we were
tracing. That is not enough for all cases, though, so also track the form of
the interaction, namely whether the allocation flows through the expression
or is fully consumed by it. An example where that matters:
(ref.eq
(struct.get $A 0
(local.tee $x
(struct.new_default $A)
)
)
(local.get $x)
)
Here the local.get flows the allocation out, but the struct.get only fully
consumes it. Before this PR we thought the struct.get flowed the allocation
out as well, and we misoptimized this to 1.
To make this possible, do a bunch of minor refactoring:
* Move ParentChildInteraction out of the class.
* Add a "None" interaction there.
* Replace the set of reached expressions with a map of them to their interactions.
* Add helper functions to get an expression's interaction or to update it when replacing.
The new testcase here shows the main fix. The new assertions are covered by existing
testcases.
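In outline, the refactoring replaces a set of reached expressions with a
map to their interactions, along the lines of this sketch (the names
follow the commit message; the surrounding types are assumed):
```cpp
#include <unordered_map>

struct Expression;

// Moved out of the class, with a "None" member added.
enum class ParentChildInteraction {
    None,          // does not interact with the allocation
    Flows,         // the allocation itself flows through the expression
    FullyConsumes, // uses the allocation without flowing it onward
};

// Was: a plain set of reached expressions.
std::unordered_map<Expression*, ParentChildInteraction> interactions;

// Helper to query an expression's interaction.
ParentChildInteraction getInteraction(Expression* expr) {
    auto it = interactions.find(expr);
    return it == interactions.end() ? ParentChildInteraction::None
                                    : it->second;
}
```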
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Before, we only removed fields from the end of a struct. If we had, say
struct Foo {
int x;
int y;
int z;
};
// Add no fields but inherit the parent's.
struct Bar : Foo {};
If y is only used in Bar, but never in Foo, we still kept it around, because
if we removed it from Foo we'd end up with Foo = {x, z}, Bar = {x, y, z},
which is invalid - Bar would no longer extend Foo. But we can remove it if
we first reorder the two:
struct Foo {
int x;
int z;
int y; // now y is at the end
};
struct Bar : Foo {};
And the optimized form is
struct Foo {
int x;
int z;
};
struct Bar : Foo {
int y; // now y is added in Bar
};
This lets us remove all fields possible in all cases AFAIK.
This situation is not super-common, as most fields are actually used both
up and down the hierarchy (if they are used at all), but testing on some
large real-world codebases, I see 10 fields removed in Java, 45 in Kotlin,
and 31 in Dart testcases.
The NFC change to src/wasm-type-ordering.h was needed for this to
compile.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The syntax for handler clauses in `resume` instructions has recently
changed, using `on` instead of `tag` now.
Instead of
```
(resume $ct (tag $tag0 $block0) ... (tag $tagn $blockn))
```
we now have
```
(resume $ct (on $tag0 $block0) ... (on $tagn $blockn))
```
This PR adapts parsing, printing, and some tests accordingly.
(Note that this PR deliberately makes none of the other changes that
will arise from implementing the new, combined stack switching proposal,
yet.)
|
|
|
|
| |
Specified at
https://github.com/WebAssembly/half-precision/blob/main/proposals/half-precision/Overview.md
|
|
|
|
| |
Specified at
https://github.com/WebAssembly/half-precision/blob/main/proposals/half-precision/Overview.md
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The optimization is to only use ChildLocalizer, which moves children to
locals, if we actually have a reason to use it. It is simple enough to see if
we are removing fields with side effects here, and only call ChildLocalizer
if we are. However, this will become much more complicated in a
subsequent PR which will reorder fields, which allows removing yet more
of them (without reordering, we can only remove fields at the end, if any
subtype needs the field).
This is a pretty minor optimization, as it avoids adding a few locals in the rare
case of struct.new operands having side effects. We run --gto at the
start of the pipeline, so later opts will clean that up anyhow. (Though, this
might make us a little less efficient, but the following PR will justify this
regression.)
|
|
|
|
|
|
| |
The type index from the TypeBuilder error was mapped to a file location
incorrectly, resulting in an assertion failure.
Fixes #6816.
|
|
|
|
| |
Specified at
https://github.com/WebAssembly/half-precision/blob/main/proposals/half-precision/Overview.md
|
|
|
|
|
| |
Single-segment mappings were already handled in readNextDebugLocation,
but not in readSourceMapHeader.
|
|
|
|
|
| |
The `timport$` prefix is already used for tables, so the binary parser
currently uses `eimport$` to name tags (I guess because they are
normally exception tags?).
|
|
|
|
|
|
|
|
|
| |
Use an extension of Kahn's algorithm for finding topological orders that
iteratively makes every possible choice at every step to find all the
topological orders. The order being constructed and the set of possible
choices are managed in-place in the same buffer, so the algorithm takes
linear time and space plus amortized constant time per generated order.
This will be used in an upcoming type optimization.
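A simplified recursive sketch of the enumeration (the actual utility is
iterative and manages the order and the candidate set in-place in one
buffer, which is what achieves the stated linear-space and amortized
constant-time-per-order bounds):
```cpp
#include <cstddef>
#include <functional>
#include <vector>

// succs[v] lists the vertices that must come after v. indegree must
// initially hold each vertex's in-degree; order starts empty.
void allOrders(const std::vector<std::vector<std::size_t>>& succs,
               std::vector<std::size_t>& indegree,
               std::vector<std::size_t>& order,
               const std::function<void(const std::vector<std::size_t>&)>&
                   emit) {
    if (order.size() == succs.size()) {
        emit(order); // every vertex placed: one complete order
        return;
    }
    for (std::size_t v = 0; v < succs.size(); ++v) {
        if (indegree[v] != 0) {
            continue; // not ready, or already placed (marked below)
        }
        // Choose v, recurse, then undo the choice.
        order.push_back(v);
        indegree[v] = static_cast<std::size_t>(-1); // mark as placed
        for (std::size_t s : succs[v]) {
            --indegree[s];
        }
        allOrders(succs, indegree, order, emit);
        for (std::size_t s : succs[v]) {
            ++indegree[s];
        }
        indegree[v] = 0;
        order.pop_back();
    }
}
```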
|
| |
|