| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When we do a local.set of a value into a local then we have both a subtyping constraint - for
the value to be valid to put in that local - and also a flow of a value, which can then reach
more places. Such flow then interacts with casts in Unsubtyping, since it needs to know
what can flow where in order to know how casts force us to keep subtyping relations.
That regressed in the not-actually-NFC #6323 in which I added the innocuous lines
to add subtyping constraints in ref.eq. It seems fine to require that the arms of a
RefEq must be of type eqref, but Unsubtyping then assuming those arms flowed into
a location of type eqref... which means casts might force us to not optimize some
things.
To fix this, differentiate the rare case of non-flowing subtyping constraints, which is
basically only RefEq. There are perhaps a few more cases (like i31 operations) but they
do not matter in practice for Unsubtyping anyhow; I suggest we land this first to undo
the regression and then at our leisure investigate the other instructions.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This PR is part of a series that adds basic support for the [typed
continuations/wasmfx proposal](https://github.com/wasmfx/specfx).
This particular PR adds support for the `cont.new` instruction for creating
continuations, documented [here(https://github.com/wasmfx/specfx/blob/main/proposals/continuations/Overview.md#instructions).
In short, these instructions are of the form `(cont.new $ct)` where `$ct` must
be a continuation type. The instruction takes a single (nullable) function
reference as its argument, which means that the folded representation of the
instruction is of the form `(cont.new $ct (foo ...))`.
Support for the instruction is implemented in both the old and the new wat
parser.
Note that this PR does not implement validation of the new instruction.
|
|
|
|
|
|
|
|
| |
This replaces horrible hacks to find which nulls need to switch (from none to
noext) with general code using SubtypingDiscoverer. That helper is aware of
where each expression is written, so we can find those nulls trivially.
This is NFC on existing usage but should fix any remaining bugs with null
constants.
|
|
|
|
|
|
| |
This pulls out the subtype-exprs.h parts of #6108
These are NFC in the current codebase, but are fixes for that unlanded PR, and
another unrelated PR that will be opened shortly.
|
|
|
|
| |
We want to actually remove all stringref appearances, in both public and
private types.
|
| |
|
|
|
| |
We only noted the type but not the literal value.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This pass finds all string.const and creates globals for them. After this transform, no
string.const appears anywhere but in a global, and each string appears in one global
which is then global.get-ed everywhere.
This avoids overhead in VMs where executing a string.const is an allocation, and is
also a good step towards imported strings. For that, this pass will be extended from
gathering to a full lowering pass, which will first gather into globals as this pass does,
and then turn each of those globals with a string.const into an imported externref.
(For that reason this pass is in a file called StringLowering, as the two passes will
share much of their code, and the larger pass should decide the name I think.)
This pass runs in -O2 and above. Repeated executions have no downside (see
details in code).
|
| |
|
|
|
|
|
| |
Fixes a fuzz testcase for wasm-ctor-eval.
Add the beginnings of a polyfill for stdckdint.h to help that.
|
|
|
|
|
| |
This PR is part of a series that adds basic support for the [typed continuations proposal](https://github.com/wasmfx/specfx).
This particular PR adds support for the `resume` instruction. The most notable missing feature is validation, which is not implemented, yet.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This adds basic support for the new instructions in the new EH proposal
passed at the Oct CG hybrid CG meeting:
https://github.com/WebAssembly/meetings/blob/main/main/2023/CG-10.md
https://github.com/WebAssembly/exception-handling/blob/main/proposals/exception-handling/Exceptions.md
This mainly adds two instructions: `try_table` and `throw_ref`. This is
the bare minimum required to read and write text and binary format, and
does not include analyses or optimizations. (It includes some analysis
required for validation of existing instructions.) Validation for
the new instructions is not yet included.
`try_table` faces the same problem with the `resume` instruction in
#6083 that without the module-level tag info, we are unable to know the
'sent types' of `try_table`. This solves it with a similar approach
taken in #6083: this adds `Module*` parameter to `finalize` methods,
which defaults to `nullptr` when not given. The `Module*` parameter is
given when called from the binary and text parser, and we cache those
tag types in `sentTypes` array within `TryTable` class. In later
optimization passes, as long as they don't touch tags, it is fine to
call `finalize` without the `Module*`. Refer to
https://github.com/WebAssembly/binaryen/pull/6083#issuecomment-1854634679
and #6096 for related discussions when `resume` was added.
|
|
|
|
|
|
|
| |
`_VECTOR` or `_ARRAY` defines in `wasm-delegations-fields.def` are
supposed to be defined in terms of their non-vector/array counterparts
when undefined. This removes empty `_VECTOR`/`_ARRAY` defines when
including `wasm-delegations-fields.def`, while adding definitions for
`DELEGATE_GET_FIELD` in case it is missing.
|
|
|
|
| |
Those fields should be copied together with all the rest of the metadata that
already is. This was just missed in the prior PR.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Remove hardcoded paths for globals/functions/etc. in favor of general code
paths that support all the module elements uniformly. As a result of that, we
now support all parts of wasm, such as tables and element segments, that
we didn't before.
This refactoring is NFC aside from adding functionality. Note that this reduces
the size of wasm-metadce by 10% while increasing its functionality - the
benefits of writing generic code.
To support this, add some trivial generic helpers to get or iterate over module
elements using their kind in a dynamic manner. Using them might make
wasm-metadce slightly slower, but I can't measure any difference.
|
|
|
|
|
|
|
|
|
|
| |
This implements an idea I mentioned in the past, to extract the subtyping discovery
code out of Unsubtyping so it could be reused elsewhere. Example possible uses:
the validator could use to remove a lot of code, and also a future PR of mine will
need it. Separately from those, I think this is a nice refactoring as it makes Unsubtyping
much smaller.
This just moves the code out and adds some C++ template elbow grease as needed.
|
|
|
|
|
|
| |
This is not needed in GUFA as it tracks local values precisely (each set is connected to
the gets that actually read from it), but in a future PR it will be useful to track local
values per index (each set is connected to all gets for that index, i.e., each local index
is a single "location").
|
| |
|
|
|
| |
Helps #5951
|
|
|
|
|
|
|
|
|
|
| |
Followup to #6046 - the fuzzer found we missed handling the case of the entry itself
being unreachable, or of an unreachable loop later. Properly identifying unreachable
code requires a flow analysis, unfortunately, so this PR gives up on that and instead
allows LocalGraph to be imprecise in unreachable code. That avoids adding any
overhead, but does mean the IR may be slightly confusing when debugging. It does
not have any optimization downsides, however, as it only affects unreachable code.
Also add a dump() impl in that file which helps debugging.
|
|
|
|
|
|
|
|
|
|
|
| |
If a local index has no sets, then all gets of that index read from the entry block
(a param, or a zero for a local). This is actually a common case, where a param
has no other set, and so it is worth optimizing, which this PR does by avoiding
any flowing operation at all for that index: we just skip and write the entry block
as the source of information for such gets.
#6042 on precompute-propagate goes from 3 minutes to 2 seconds with this (!).
But that testcase is rather special in that it is a huge function with many, many
gets in it, so the overhead we remove is very noticeable there.
|
|
|
|
|
|
|
|
|
|
|
|
| |
allGets was declared in a scope that kept it alive for all blocks, and at
the end of the loop we clear the gets for a particular block. That's clumsy,
and makes a followup harder, so this PR moves it to the natural place for it.
(That is, it moves it to the scope that handles a particular block, and removes
the manual clearing-out of the get at the end of the loop iteration.) Optimizing
compilers are smart enough to be efficient about stack allocations
of objects inside loops anyhow (which I measured).
Helps #6042.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Type annotations on multivalue blocks (and loops, ifs, and trys) are type
indices that refer to function types in the type section. For these type
annotations, the identities of the function types does not matter. As long as
the referenced type has the correct parameters and results, it will be valid to
use.
Previously, when collecting module types, we always used the "default" function
type for multivalue control flow, i.e. we used a final function type with no
supertypes in a singleton rec group. However, in cases where the program already
contains another function type with the expected signature, using the default
type is unnecessary and bloats the type section.
Update the type collecting code to reuse existing function types for multivalue
control flow where possible rather than unconditionally adding the default
function type. Similarly, update the binary writer to use the first heap type
with the required signature when emitting annotations on multivalue control flow
structures. To make this all testable, update the printer to print the type
annotations as well, rather than just the result types. Since the parser was not
able to parse those newly emitted type annotations, update the parser as well.
|
|
|
|
| |
With this, the fuzzer can replace e.g. an eq expression with a specific struct type,
because now it is away that struct types have eq as their ancestor.
|
|
|
|
| |
A later PR will add getSuperType which will mean "get the general super type -
either declared, or not".
|
|
|
|
|
|
|
|
| |
Empty, passive element segments were always emitted as having `func` type
because all their elements trivially were RefFunc (because they have no
elements) and because we were incorrectly checking table types if they existed
instead of the element segment's type directly to see if it was non-func.
Fix the bug by checking each element segment's type directly and add a test.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
(#5989)
Fixes #5983: The testcase from there is used here in a new testcase
remove-unused-brs_levels in which we check if we are willing to unconditionally
do a division operation. Turning an if with an arm that does a division into a
select, which always does the division, is almost 5x slower, so we should probably
be extremely careful about doing that.
I took some measurements and have some suggestions for changes in this PR:
* Raise the cost of div/rem to what I measure on my machine, which is 5x slower
than an add, or worse.
* For some reason we added the if arms rather than take the max of them, so
fix that. This does not help the issue, but was confusing.
* Adjust TooCostlyToRunUnconditionally in the pass from 9 to 8 (this helps
balance the last point).
* Use half that value when not optimizing for size. That is, we allow only 4 extra
unconditional work normally, and 8 in -Os, and when -Oz then we allow any
extra amount.
Aside from the new testcases, some existing ones changed. They all appear to
change in a reasonable way, to me.
We should perhaps go even further than this, and not even run a division
unconditionally in -Os, but I wasn't sure it makes sense to go that far as
other benchmarks may be affected. For now, this makes the benchmark in
#5983 run at full speed in -O3 or -Os, and it remains slow in -Oz. The
modified version of the benchmark that only divides in the if (no other
operations) is still fast in -O3, but it become slow in -Os as we do turn that
if into a select (but again, I didn't want to go that far as to overfit on that one
benchmark).
|
|
|
|
|
|
| |
Move the topological sort from the constructor to a separate method. This CRTP
utility calls into its subclass to query supertype relationships, but doing so
in the base class constructor means that the subclass has not been fully
initialized yet, which can cause problems.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If we see A->f0 = A->f0 then we might be copying fields not only between
instances of A but also of any subtypes of A, and so if some subtype has
value x then that x might now have reached any other subtype of A
(even in a sibling type, so long as A is their parent).
We already thought we were handling that, but the mechanism we used to
do so (copying New info to Set info, and letting Set info propagate) was not
enough.
Also add a small constructor to save the work of computing subTypes again.
Add TODOs for some cases that we could optimize regarding copies but
do not, yet.
|
|
|
|
|
| |
Probably any array of non-reference data can be allowed to be public and sent
out of the module, as it is just data. For now, however, just special case the i8
and i16 array types which are useful already for string interop.
|
|
|
|
|
|
|
|
|
| |
TypeFinalization finalizes all types that we can, that is, all private types that have no
children. TypeUnFinalization unfinalizes (opens) all (private) types.
These could be used by first opening all types, optimizing, and then finalizing, as that
might find more opportunities.
Fixes #5933
|
|
|
|
|
|
|
|
| |
This instruction was standardized as part of the bulk memory proposal, but we
never implemented it until now. Leave similar instructions like table.copy as
future work.
Fixes #5939.
|
|
|
|
|
|
|
|
| |
Globally replace the source string "I31New" with "RefI31" in preparation for
renaming the instruction from "i31.new" to "ref.i31", as implemented in the spec
in https://github.com/WebAssembly/gc/pull/422. This would be NFC, except that it
also changes the string in the external-facing C APIs.
A follow-up PR will make the corresponding behavioral change.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Match the spec and parse the shorthand binary and text formats as final and emit
final types without supertypes using the shorthands as well. This is a
potentially-breaking change, since the text and binary shorthands can no longer
be used to define types that have subtypes.
Also make TypeBuilder entries final by default to better match the spec and
update the internal APIs to use the "open" terminology rather than "final"
terminology. Future changes will update the text format to use the standard "sub
open" rather than the current "sub final" keywords. The exception is the new wat
parser, which supporst "sub open" as of this change, since it didn't support
final types at all previously.
|
|
|
|
|
| |
Now that the WasmGC spec has settled on a way of validating non-nullable locals,
we no longer need this experimental feature that allowed nonstandard uses of
non-nullable locals.
|
|
|
|
|
|
| |
The code validating and fixing up non-nullable locals previously did not
correctly handle tuples that contained non-nullable elements, which could have
resulted in invalid modules going undetected. Update the code to handle tuples
and add tests.
|
|
|
| |
Renaming the multimemory flag in Binaryen to match its naming in LLVM.
|
|
|
|
|
|
|
|
|
|
|
|
| |
Simplify the optimization of ref.cast and ref.test in OptimizeInstructions by
moving the loop that examines fallthrough values one at a time out to a shared
function in properties.h. Also simplify ref.cast optimization by analyzing the
cast result in just one place.
In addition to simplifying the code, also make the cast optimizations more
powerful by analyzing the nullability and heap type of the cast value
independently, resulting in a potentially more precise analysis of the cast
behavior. Also improve optimization power by considering fallthrough values when
optimizing the SuccessOnlyIfNonNull case.
|
|
|
|
|
| |
Remove old, experimental instructions and type encodings that will not be
shipped as part of WasmGC. Updating the encodings and text format to match the
final spec is left as future work.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Br and BrOn can consider the code before and after them connected if it might
be reached (which is the case if the Br has a condition, which BrOn always has).
The wasm2js changes may look a little odd as some of them have this:
i64toi32_i32$1 = i64toi32_i32$2;
i64toi32_i32$1 = i64toi32_i32$2;
I looked into that and the reason is that those outputs are not optimized, and
also even in unoptimized wasm2js we do run simplify-locals once (to try to
reduce the downsides of flatten). As a result, this PR makes a difference there,
and that difference can lead to such odd duplicated code after other operations.
However, there are no changes to optimized wasm2js outputs, so there is no
actual problem.
Followup to #5860.
|
|
|
|
|
|
|
|
|
|
|
| |
SimplifyLocals (#5860)
This addresses most of the minor regression from the correctness fix in #5857.
That PR makes us consider calls as branching, but in some cases it is ok to
ignore that branching (see the comment in the code here), which this PR allows as
an option.
This undoes one test change from that PR, showing it undoes the regression for
SimplifyLocals. More tests are added to cover this specifically as well.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Calls were simply not handled there, so we could think we were still in the same
basic block when we were not, affecting various passes (but somehow this went
unnoticed until the TNHOracle #5850 ran on some particular Java code).
One existing test was affected, and two new tests are added: one for TNHOracle
where I detected this, and one in OptimizeCasts which is perhaps a simpler way
to see the problem.
All the cases but the TNH one, however, do not need this fix for correctness
since they actually don't care if a call would throw. As a TODO, we should find a
way to undo this minor regression. The regression only affects builds with EH
enabled, though, so most users should be unaffected even in the interm.
|
|
|
|
|
|
|
|
| |
None of that code is speed-sensitive, or at least doesn't need to be inlined to be
fast. Move it to cpp for faster compile times.
This caused a cascade of necessary header fixes (i.e. after removing unneeded
header inclusions in module-utils.h, files that improperly depended on that
stopped working and needed an added include).
|
|
|
|
|
|
|
|
| |
In order of appearance in this diff:
* Use heapType rather than otherHeapType when either will do, for brevity.
* We checked isNull after checking for isLiteral (line 173), but every null is a literal.
* Calling setNoneOrNull simplifies some repetitive code.
* We checked isLiteral yet again lower down.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
TypeMapper is a utility used to globally rewrite types, mapping some eliminated
source types into destination types they should be replaced with. This was
previously done by first rewriting all the types in the IR according to the
given mapping, then rewriting the type definitions and updating all the types in
the IR again. Not only was doing the rewriting twice inefficient, it also
introduced a subtle bug where the set of private types eligible to be rewritten
could be inconsistent because updating types in the IR could change the types of
control flow structures. The fuzzer found a case where this inconsistency caused
the type rebuilding to fail.
Fix the bug by first building the new types with the mapping applied and only
then rewriting the IR a single time.
Also add a `TypeBuilder::dump` utility for use in debugging.
Fixes #5845.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This adds a TrapsNeverHappen oracle that is used inside the main PossibleContents
oracle of GUFA. The idea is that when traps never happen we can reason "backwards"
from information to things that must be true before it:
temp = x.field;
x.cast_to<Y>(); // Y is a subtype of x's type X
Here we cast x to a subtype. If we assume traps never happen then the cast must
succeed, and that means we can assume we had a Y on the previous line, where
perhaps that information lets us infer the value of x.field.
This PR focuses on calls, which are the more interesting situation to optimize
because other passes do some work already inside functions. Specifically, we look
for things that will trap in the called function or the caller, such as if the called
function always casts a param to some type, we can assume the caller passes
such a type in. And if we have a call_ref then any target that would trap cannot be
called (at least in a closed world).
This has some benefits, in particular when combined with --gufa-cast-all since
that casts more things, which lets us apply the inferences made here. I see 3.3%
fewer call_ref instructions on a Kotlin testcase, for example. This helps more
on -Os when we inline less.
|
| |
|
|
|
|
|
|
|
|
|
| |
A pass that just operates on locals, for example, does not care about branches outside of
the function. That means that when we see a call, then even if EH is enabled we don't need
to create a new basic block right after it (unless the call is inside a try-catch - then it might
branch to the catch, of course).
This makes CFG-using passes 9% faster compared to before this PR. This plus #5827 offset
the slowdown from #5823 and overall give an improvement compared to before.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Before this PR, if a call had no paths to a catch in the same function then we skipped
creating a new basic block right after it. As a result, we could have a call in the middle
of a basic block. If EH is enabled that means we might transfer control flow out of
the function from the middle of a block. But it is better to have the property that
any transfer of control flow - to another basic block, or outside of the function - can
only happen at the end of a basic block.
This causes some overhead, but a subsequent PR (#5838) will remove that as a
followup, and this PR adds a little code to pass the module and check if EH is enabled,
and avoid the overhead if not, which at least avoids regressing the non-EH case
until that followup lands.
|