| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
|
|
|
|
|
|
|
|
| |
We previously printed explicit typeuses (e.g. `(type $f)`) in function
signatures when GC was enabled. But even when GC is not enabled,
function types may use non-MVP features that require the explicit
typeuse to be printed. Fix the printer to always print the explicit type
use for such types.
Fixes #6850.
|
|
|
|
|
|
|
|
| |
Replace code that checked `isStruct()`, `isArray()`, etc. in sequence
with uses of `HeapType::getKind()` and switch statements. This will make
it easier to find the code that needs updating if/when we add new heap
type kinds in the future. It also makes it much easier to find code that
already needs updating to handle continuation types by grepping for
"TODO: cont".
|
|
|
|
|
|
|
|
|
|
|
|
| |
Most of our type optimization passes emit all non-public types as a
single large rec group, which trivially ensures that different types
remain different, even if they are optimized to have the same structure.
Usually emitting a single large rec group is fine, but it also means
that if the module is split, all of the types will need to be repeated
in all of the split modules. To better support this use case, add a pass
that can split the large rec group back into minimal rec groups, taking
care to preserve separate type identities by emitting different
permutations of the same group where possible or by inserting unused
brand types to differentiate them.
|
|
|
|
|
| |
Audit the remaining occurrences of `== HeapType::` and fix those that did
not handle shared types correctly. Add tests for some of the fixes;
others are NFC but clarify the code.
|
|
|
|
|
| |
Also use TableInit in the interpreter to initialize the module's table
state, which now handles traps properly, fixing #6431.
|
|
|
|
|
| |
Also we had a mix of os.environ.get and os.getenv. Prefer the former, because the default
value we pass does actual work, so it is a little more efficient not to compute it unnecessarily. That is,
os.getenv('X', work()) is less efficient than os.environ.get('X') or work().
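A minimal sketch of the difference described above. The `work` function and the variable name `X` are stand-ins for illustration, not code from this repository; the point is only that Python evaluates a default argument eagerly, while `or` short-circuits.

```python
import os

calls = []

def work():
    # Hypothetical stand-in for an expensive default computation.
    calls.append(1)
    return "computed-default"

os.environ["X"] = "from-env"

# os.getenv evaluates its default argument before the lookup happens,
# so work() runs even though X is set.
a = os.getenv("X", work())
calls_after_getenv = len(calls)

# The `or` form only calls work() when the lookup returns None or "".
b = os.environ.get("X") or work()
calls_after_get = len(calls)
```

One behavioral difference to keep in mind: with the `or` form, a variable that is set to the empty string also falls through to `work()`, whereas `os.getenv('X', default)` would return the empty string.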
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We don't properly validate that yet. E.g.:
(module
  (rec
    (type $func (func))
    (type $unused (sub (struct (field v128))))
  )
  (func $func (type $func))
)
That v128 is not used, but it ends up in the output because it is in a rec group that is used.
Atm we do not require that SIMD be enabled in such a case, which can trip up the fuzzer.
Context: #6820. For now, modify the test that uncovered this.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The previous rules for stale types were complicated and hard to
remember: in general it was ok for result types to be further refinable
as long as they were not refinable all the way to `unreachable`, but
control flow structures had a carve-out and it was ok for them to be
refinable all the way to unreachable.
Simplify the rules so that further refinable result types are always ok,
no matter what they can be refined to and no matter what kind of
instruction is being validated. This will be much easier to remember and
reason about.
This relaxation of the rules strictly increases the set of valid IR, so
no passes or tests need to be updated. It does make it possible for us
to miss type refinement opportunities that previously would have been
validation errors, but only in cases where non-control-flow instructions
could have been refined all the way to unreachable, so the risk seems
small.
|
|
|
|
|
|
|
|
| |
Diff without whitespace is smaller.
* HeapType::ext was handled in two places. The second place was wrong, but not reached.
* Near the end all we have left are refs, so no need to check isRef etc.
* Simplify the code to get the heap type once.
|
|
|
|
|
|
|
| |
This is based on these two proposals:
* https://github.com/WebAssembly/tool-conventions/blob/main/BuildId.md
* https://github.com/tc39/source-map/blob/main/proposals/debug-id.md
|
|
|
|
|
|
| |
Since reference types only introduced function and extern references,
all of the types in the `any` hierarchy require GC, including `none`.
Fixes #6839.
|
|
|
|
|
|
|
|
|
| |
Previously we included supertypes, but did not increase their count.
This was done so that the output for the nominal type system, which
introduced explicit supertypes, would more closely match the output
with the old equirecursive type system. Neither type system exists
anymore and we only support the single, standard isorecursive type
system, so we can now properly count supertypes. It turns out it doesn't
make much of a difference in the test outputs anyway.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The argument is the minimum benefit we must see for us to decide to optimize, e.g.
--monomorphize --pass-arg=monomorphize-min-benefit@50
When the minimum benefit is 50% then if we reduce the cost by 50% through
monomorphization then we optimize there. 95% would only optimize when we
remove almost all the cost, etc.
In practice I see 95% will actually tend to reduce code size overall, as while we add
monomorphized versions of functions, we only do so when we remove a lot of
work and size, and after inlining we gain benefits. However, 50% or even lower can
lead to better benchmark results, in return for larger code size, just like with
inlining. To be careful, the default is set to 95%.
Previously we optimized whenever we saw any benefit at all, which is the same
as requiring a minimum benefit of 0%. Old tests have the flag applied in this PR
to set that value, so they do not change.
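As a rough illustration of the threshold described above, here is a sketch that assumes the benefit is measured as the percentage reduction in estimated cost. The function names, the cost numbers, and the exact handling of the boundary case are assumptions made for this example, not the pass's actual code.

```python
def percent_benefit(cost_before, cost_after):
    # Percentage of the original cost removed by monomorphizing.
    return 100.0 - 100.0 * cost_after / cost_before

def should_monomorphize(cost_before, cost_after, min_benefit):
    # Assumption: we require some benefit at all, and at least
    # min_benefit percent of the cost to be removed.
    benefit = percent_benefit(cost_before, cost_after)
    return benefit > 0 and benefit >= min_benefit

# A call whose cost drops from 100 to 40 has a 60% benefit: enough for a
# 50% threshold, but not for the default 95%.
moderate_at_50 = should_monomorphize(100, 40, 50)
moderate_at_95 = should_monomorphize(100, 40, 95)
# A drop from 100 to 4 removes almost all of the cost.
large_at_95 = should_monomorphize(100, 4, 95)
# With a threshold of 0, any benefit at all is enough.
tiny_at_0 = should_monomorphize(100, 99, 0)
```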
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously we tracked only whether an expression was relevant to analysis, that is,
whether it interacted with the allocation we were tracing the behavior of. That is
not enough for all cases, though, so also track the form of the interaction, namely
whether the allocation flows through or is fully consumed. An example where that
matters:
(ref.eq
  (struct.get $A 0
    (local.tee $x
      (struct.new_default $A)
    )
  )
  (local.get $x)
)
Here the local.get flows out the allocation, but the struct.get only fully consumes
it. Before this PR we thought the struct.get flowed the allocation, and we misoptimized
this to 1.
To make this possible, do a bunch of minor refactoring:
* Move ParentChildInteraction out of the class.
* Add a "None" interaction there.
* Replace the set of reached expressions with a map of them to their interactions.
* Add helper functions to get an expression's interaction or to update it when replacing.
The new testcase here shows the main fix. The new assertions are covered by existing
testcases.
|
|
|
| |
Fixes #6833
|
|
|
|
|
|
|
|
|
|
| |
Previously a module's type names were updated in
`GlobalTypeRewriter::rebuildTypes`, which builds new versions of the
existing types, rather than `GlobalTypeRewriter::mapTypes`, which
otherwise handles replacing old types with new types everywhere in a
module, but should not necessarily replace names. So that users of
`mapTypes` who are building their own versions of existing types can
also easily update type names, split type name mapping logic out into a
new method `GlobalTypeRewriter::mapTypeNames`.
|
|
|
|
|
|
|
|
|
|
| |
Given a function that maps the old child heap types to new child heap
types, the new API takes care of copying the rest of the structure of a
given heap type into a TypeBuilder slot.
Use the new API in GlobalTypeRewriter::rebuildTypes. It will also be
used in an upcoming type optimization. This refactoring also required
adding the ability to clear the supertype of a TypeBuilder slot, which
was previously not possible.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Before, we only removed fields from the end of a struct. If we had, say
    struct Foo {
      int x;
      int y;
      int z;
    };
    // Add no fields but inherit the parent's.
    struct Bar : Foo {};
If y is only used in Bar, but never Foo, then we still kept it around, because
if we removed it from Foo we'd end up with Foo = {x, z}, Bar = {x, y, z} which
is invalid - Bar no longer extends Foo. But we can do this if we first reorder
the two:
    struct Foo {
      int x;
      int z;
      int y; // now y is at the end
    };
    struct Bar : Foo {};
And the optimized form is
    struct Foo {
      int x;
      int z;
    };
    struct Bar : Foo {
      int y; // now y is added in Bar
    };
This lets us remove all fields possible in all cases AFAIK.
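The reorder-then-remove idea in the Foo/Bar example can be sketched for a single parent/child pair as follows. This is a simplified model with hypothetical names; it ignores deeper hierarchies, multiple subtypes, and everything else the real pass must handle.

```python
def remove_unused_parent_fields(parent_fields, used_via_parent, used_via_child):
    # Fields the parent still needs keep their relative order and form
    # the new parent layout.
    parent = [f for f in parent_fields if f in used_via_parent]
    # Fields only the child needs are moved past the end of the parent
    # and re-declared in the child.
    pushed_down = [
        f for f in parent_fields
        if f not in used_via_parent and f in used_via_child
    ]
    # The child must begin with the parent's fields to remain a subtype.
    child = parent + pushed_down
    return parent, child

foo, bar = remove_unused_parent_fields(["x", "y", "z"], {"x", "z"}, {"y"})
```

Here `foo` ends up as x, z and `bar` as x, z, y, matching the optimized form shown above. A field used by neither type is dropped from both.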
This situation is not super-common, as most fields are actually used both
up and down the hierarchy (if they are used at all), but testing on some
large real-world codebases, I see 10 fields removed in Java, 45 in Kotlin,
and 31 in Dart testcases.
The NFC change to src/wasm-type-ordering.h was needed for this to
compile.
|
|
|
|
| |
Without this all the newly created thunks lack names in the name
section.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The syntax for handler clauses in `resume` instructions has recently
changed, using `on` instead of `tag` now.
Instead of
```
(resume $ct (tag $tag0 $block0) ... (tag $tagn $blockn))
```
we now have
```
(resume $ct (on $tag0 $block0) ... (on $tagn $blockn))
```
This PR adapts parsing, printing, and some tests accordingly.
(Note that this PR deliberately makes none of the other changes that
will arise from implementing the new, combined stack switching proposal,
yet.)
|
|
|
|
| |
Specified at
https://github.com/WebAssembly/half-precision/blob/main/proposals/half-precision/Overview.md
|
|
|
|
| |
Specified at
https://github.com/WebAssembly/half-precision/blob/main/proposals/half-precision/Overview.md
|
|
|
|
|
|
| |
Make `TopologicalOrders` its own iterator rather than having a separate
iterator class that wraps a pointer to `TopologicalOrders`. This
simplifies usage in cases where an iterator needs to be persistently
stored. Notably, all of the tests continue working as they are.
|
|
|
|
|
|
|
|
|
|
|
| |
This is very similar to the internal utilities for canonicalizing rec
groups in the type system implementation, except that the new utility
also supports ordered comparison of rec groups, and of course the new
utility only uses the public type API.
A follow-up PR will replace the internal implementation of rec group
comparison and hashing in the type system with this one.
Another follow-up PR will use this new utility in a type optimization.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The optimization is to only use ChildLocalizer, which moves children to
locals, if we actually have a reason to use it. It is simple enough to see if
we are removing fields with side effects here, and only call ChildLocalizer
if we are. However, this will become much more complicated in a
subsequent PR which will reorder fields, which allows removing yet more
of them (without reordering, we can only remove fields at the end, if any
subtype needs the field).
This is a pretty minor optimization, as it avoids adding a few locals in the rare
case of struct.new operands having side effects. We run --gto at the
start of the pipeline, so later opts will clean that up anyhow. (Though, this
might make us a little less efficient, but the following PR will justify this
regression.)
|
|
|
|
|
|
| |
Match the current spec and clarify terminology by renaming the old
`deftype` to `rectype` and renaming the old `subtype` to `typedef`. Also
split the parser for actual `subtype` out of the parser for the newly
named `typedef`.
|
|
|
|
|
|
| |
The type index from the TypeBuilder error was mapped to a file location
incorrectly, resulting in an assertion failure.
Fixes #6816.
|
|
|
|
| |
Specified at
https://github.com/WebAssembly/half-precision/blob/main/proposals/half-precision/Overview.md
|
|
|
|
|
|
|
|
|
| |
PR #6803 proposed removing Type::isString and HeapType::isString in
favor of more explicit, verbose callsites. There was no consensus to
make this change, but it was accidentally committed as part of #6804.
Revert the accidental change, except for the useful, noncontroversial
parts, such as fixing the `isString` implementation and a few other
locations to correctly handle shared types.
|
|
|
|
|
| |
Single-segment mappings were already handled in readNextDebugLocation,
but not in readSourceMapHeader.
|
|
|
|
|
|
| |
The code for collecting inhabitable types incorrectly considered shared,
non-nullable externrefs to be inhabitable, which disagreed with the code
for rewriting types to be inhabitable, which was correct, causing the
type fuzzer to report an error.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The HeapType API has functions like `isBasic()`, `isStruct()`,
`isSignature()`, etc. to test the classification of a heap type. Many
users have to call these functions in sequence and handle all or most of
the possible classifications. When we add a new kind of heap type,
finding and updating all these sites is a manual and error-prone
process.
To make adding new heap type kinds easier, introduce a new API that
returns an enum classifying the heap type. The enum can be used in
switch statements and the compiler's exhaustiveness checker will flag
use sites that need to be updated when we add a new kind of heap type.
This commit uses the new enum internally in the type system, but
follow-on commits will add new uses and convert uses of the existing
APIs to use `getKind` instead.
|
|
|
|
|
| |
The `timport$` prefix is already used for tables, so the binary parser
currently uses `eimport$` to name tags (I guess because they are
normally exception tags?).
|
|
|
|
|
| |
As a followup we could probably make these more consistent. For example,
we could use a single char prefix for defined functions/tables/globals
(e.g. f0/t0/g0).
|
| |
|
|
|
|
|
|
|
|
|
| |
Use an extension of Kahn's algorithm for finding topological orders that
iteratively makes every possible choice at every step to find all the
topological orders. The order being constructed and the set of possible
choices are managed in-place in the same buffer, so the algorithm takes
linear time and space plus amortized constant time per generated order.
This will be used in an upcoming type optimization.
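For readers unfamiliar with the idea, here is a small sketch of enumerating every topological order by trying each available choice at each step of Kahn's algorithm. It is a plain recursive backtracking version written for illustration; it does not use the in-place buffer described above and does not have the same complexity guarantees. The graph representation (a dict mapping each node to its successors, with every node present as a key) is an assumption of the example.

```python
def all_topological_orders(graph):
    # Count incoming edges for every node.
    indegree = {node: 0 for node in graph}
    for successors in graph.values():
        for succ in successors:
            indegree[succ] += 1

    order, placed = [], set()

    def extend():
        if len(order) == len(graph):
            yield list(order)
            return
        # Try every node that currently has no unplaced predecessors.
        for node in graph:
            if node in placed or indegree[node] != 0:
                continue
            placed.add(node)
            order.append(node)
            for succ in graph[node]:
                indegree[succ] -= 1
            yield from extend()
            # Undo the choice before trying the next candidate.
            for succ in graph[node]:
                indegree[succ] += 1
            order.pop()
            placed.remove(node)

    yield from extend()

orders = list(all_topological_orders({"a": ["c"], "b": ["c"], "c": []}))
```

For this three-node graph the generator yields the two orders a, b, c and b, a, c. A graph with a cycle yields no orders at all.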
|
| |
|
| |
|
|
|
|
| |
This will be used in an upcoming type optimization pass and may be
generally useful.
|
|
|
|
|
| |
The local was only used once, so it didn't really add much. And, it was
causing some compilers to error on "unused variable" (when building without
assertions, the use was removed).
|
|
|
|
|
|
| |
We had a TODO to use it once Names was optimized, which it has been.
The Names version is also far faster. When building
https://github.com/JetBrains/kotlinconf-app it saves 70 seconds(!).
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Before the PR:
$ bin/wasm-opt test/hello_world.wat --metrics
total
[exports] : 1
[funcs] : 1
[globals] : 0
[imports] : 0
[memories] : 1
[memory-data] : 0
[tables] : 0
[tags] : 0
[total] : 3
[vars] : 0
Binary : 1
LocalGet : 2
After the PR:
$ bin/wasm-opt test/hello_world.wat --metrics
Metrics
total
[exports] : 1
[funcs] : 1
...
Note the "Metrics" addition at the top. And the title can be customized:
$ bin/wasm-opt test/hello_world.wat --metrics=text
Metrics: text
total
[exports] : 1
[funcs] : 1
The custom title can be helpful when multiple invocations of metrics are used
at once, e.g. --metrics=before -O3 --metrics=after.
|
|
|
|
|
|
|
|
|
| |
Implement a non-recursive version of Tarjan's Strongly Connected
Component algorithm that consumes and produces iterators for maximum
flexibility.
This will be used in an optimization that transforms the heap type graph
to use minimal recursion groups, which correspond to the strongly
connected components of the type graph.
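A sketch of what a non-recursive Tarjan SCC looks like, written as a generator so that it both consumes an iterable of nodes and produces components lazily. The names and the `successors` callback are assumptions of this example, not the API of the utility added here.

```python
def strongly_connected_components(nodes, successors):
    index, lowlink = {}, {}
    stack, on_stack = [], set()
    next_index = 0
    for root in nodes:
        if root in index:
            continue
        index[root] = lowlink[root] = next_index
        next_index += 1
        stack.append(root)
        on_stack.add(root)
        # Explicit work stack of (node, iterator over its successors)
        # replaces the recursion of the textbook algorithm.
        work = [(root, iter(successors(root)))]
        while work:
            node, children = work[-1]
            descended = False
            for child in children:
                if child not in index:
                    index[child] = lowlink[child] = next_index
                    next_index += 1
                    stack.append(child)
                    on_stack.add(child)
                    work.append((child, iter(successors(child))))
                    descended = True
                    break
                if child in on_stack:
                    lowlink[node] = min(lowlink[node], index[child])
            if descended:
                continue
            # All successors of node are done: propagate and maybe emit.
            work.pop()
            if work:
                parent = work[-1][0]
                lowlink[parent] = min(lowlink[parent], lowlink[node])
            if lowlink[node] == index[node]:
                component = []
                while True:
                    member = stack.pop()
                    on_stack.remove(member)
                    component.append(member)
                    if member == node:
                        break
                yield component

graph = {1: [2], 2: [3], 3: [1], 4: [3, 5], 5: []}
sccs = [sorted(c) for c in strongly_connected_components(graph, graph.__getitem__)]
```

Components come out in reverse topological order of the condensed graph, so a component is always emitted before any component that points into it.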
|
| |
|
| |
|
|
|
|
| |
Generalize the code for simplifying element segments to handle more than
just null and funcref elements.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We marked various expressions as having cost "Unacceptable", fixed at 100, to
ensure we never moved them out from an If arm, etc. Giving them such a high
cost avoids that problem - the cost is higher than the limit we have for moving
code from conditional to unconditional execution - but it also means the total
cost is unrealistic. For example, a function with one such instruction + an add
(cost 1) would end up with cost 101, and removing the add would look
insignificant, which causes issues for things that want to compare costs
(like Monomorphization).
To fix this, adjust some costs. The main change here is to give casts a cost of 5.
I measured this in depth, see the attached benchmark scripts, and it looks
clear that in both V8 and SpiderMonkey the cost of a cast is high enough to
make it not worth turning an if with ref.test arm into a select (which would
always execute the test).
Other costs adjusted here matter a lot less, because they are on operations
that have side effects and so the optimizer will anyhow not move them from
conditional to unconditional execution, but I tried to make them a bit more
realistic while I was removing "Unacceptable":
* Give most atomic operations the 10 cost we've been using for atomic loads/
stores. Perhaps wait and notify should be slower, but it seems like
assuming fast switching might be more relevant.
* Give growth operations a cost of 20, and throw operations a cost of 10. These
numbers are entirely made up as I am not even sure how to measure them in
a useful way (but, again, this should not matter much as they have side
effects).
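To make the 101-versus-add arithmetic above concrete, this sketch compares how significant removing an add (cost 1) looks under the old fixed cost of 100 and under the new cast cost of 5. It assumes, for illustration only, that the expensive instruction in the example function is a cast; the helper name is hypothetical.

```python
def percent_saved(costs, removed):
    # Share of the function's total cost that removing one instruction saves.
    total = sum(costs.values())
    return 100.0 * costs[removed] / total

old_costs = {"cast": 100, "add": 1}  # total 101, as in the example above
new_costs = {"cast": 5, "add": 1}    # total 6 with the adjusted cast cost

old_saving = percent_saved(old_costs, "add")
new_saving = percent_saved(new_costs, "add")
```

With the old costs, removing the add saves about 1% of the total; with the new costs it saves about 17%, which is the kind of difference a cost-comparing pass can actually act on.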
|
|
|
|
|
|
| |
We used the target's type for the read from the source, but due to
subtyping those might be different.
Found by the fuzzer.
|