| Commit message (Collapse) | Author | Age | Files | Lines |
| |
| |
This PR is part of a series that adds basic support for the typed continuations proposal.
This PR relaxes the restriction that tags must not have results, only params. Tags with
results must not be used for exception handling and are only allowed if the typed
continuations feature is enabled.
As a minor point, this PR also changes the printing of tags without params: To make the
presentation consistent, (param) is omitted when printing a tag.
| |
This test is failing on main; it looks like the update to the test was overwritten when commits were merged. Fix it with the result of running update_lit_test.py.
| |
(#5989)
Fixes #5983: The testcase from there is used here in a new testcase
remove-unused-brs_levels in which we check if we are willing to unconditionally
do a division operation. Turning an if with an arm that does a division into a
select, which always does the division, is almost 5x slower, so we should probably
be extremely careful about doing that.
I took some measurements and have some suggestions for changes in this PR:
* Raise the cost of div/rem to what I measure on my machine, which is 5x slower
than an add, or worse.
* For some reason we added the if arms rather than take the max of them, so
fix that. This does not help the issue, but was confusing.
* Adjust TooCostlyToRunUnconditionally in the pass from 9 to 8 (this helps
balance the last point).
* Use half that value when not optimizing for size. That is, we allow only 4 units of
extra unconditional work normally and 8 in -Os, and in -Oz we allow any
extra amount.
Aside from the new testcases, some existing ones changed. They all appear to
change in a reasonable way, to me.
We should perhaps go even further than this, and not even run a division
unconditionally in -Os, but I wasn't sure it makes sense to go that far as
other benchmarks may be affected. For now, this makes the benchmark in
#5983 run at full speed in -O3 or -Os, and it remains slow in -Oz. The
modified version of the benchmark that only divides in the if (no other
operations) is still fast in -O3, but it becomes slow in -Os as we do turn that
if into a select (but again, I didn't want to go that far as to overfit on that one
benchmark).
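The decision described above can be sketched roughly as follows, in Python rather than Binaryen's C++. The cost table, the function names, and the choice to compare the costlier arm against the budget are illustrative assumptions; only the figures quoted in this message (div/rem about 5x an add, a limit of 8 that is halved to 4 when not optimizing for size, no limit in -Oz) come from the source.

```python
import math

# Hypothetical unit costs. Only the "div/rem is about 5x an add" ratio
# comes from the commit message; the rest is made up for illustration.
COST = {"add": 1, "div": 5, "rem": 5}

# Value named in the commit message (lowered from 9 to 8).
TOO_COSTLY_TO_RUN_UNCONDITIONALLY = 8


def arm_cost(ops):
    """Total cost of one if arm, given as a list of operation names."""
    return sum(COST[op] for op in ops)


def if_cost(then_ops, else_ops):
    """Only one arm of an if executes, so charge the costlier arm, not the sum."""
    return max(arm_cost(then_ops), arm_cost(else_ops))


def budget(mode):
    """Extra unconditional work allowed: 4 normally, 8 in -Os, unlimited in -Oz."""
    if mode == "Oz":
        return math.inf
    if mode == "Os":
        return TOO_COSTLY_TO_RUN_UNCONDITIONALLY
    return TOO_COSTLY_TO_RUN_UNCONDITIONALLY // 2


def may_run_unconditionally(then_ops, else_ops, mode):
    # Assumption: the boundary is inclusive (cost == budget is still allowed).
    return if_cost(then_ops, else_ops) <= budget(mode)
```

With these numbers, an arm containing a lone division (cost 5) exceeds the normal budget of 4 but fits the -Os budget of 8, which is consistent with the behavior reported for the modified benchmark.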
| |
```wast
(if (result i32)
  (expr0)
  (i32.const 1)
  (expr1)
)
```
can be written as
```wast
(i32.or
  (expr0)
  (expr1)
)
```
Also this removes some unused variables and methods.
This also adds an optimization for
```wast
(i32.eqz
  (global.get $__asyncify_state)
)
```
in `--mod-asyncify-always-and-only-unwind` to fix an unexpected
regression caused by this.
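As a quick sanity check of the if-to-or rewrite shown earlier in this message, the following Python sketch compares the two forms. It assumes that both operands are booleans (0 or 1) and that evaluating expr1 unconditionally has no side effects; the function names are hypothetical and model the wasm semantics, not Binaryen's code.

```python
def if_form(expr0, expr1):
    # (if (result i32) expr0 (i32.const 1) expr1)
    return 1 if expr0 != 0 else expr1


def or_form(expr0, expr1):
    # (i32.or expr0 expr1), kept within 32 bits
    return (expr0 | expr1) & 0xFFFFFFFF


booleans = (0, 1)
agree_on_booleans = all(
    if_form(a, b) == or_form(a, b) for a in booleans for b in booleans
)
```

The two forms agree on every boolean input. For a non-boolean condition such as 2 they produce different values (1 versus 2), although both are still nonzero, so the rewrite relies on the operands being boolean or on only the truthiness of the result being observed.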
| |
Just like we do with other casts, refine the cast type to be the greatest lower
bound of its previous cast type and its input type. The difference is that the
output type of ref.test remains i32, but it's still useful to retain more
precise type information.
| |
If we see A->f0 = A->f0 then we might be copying fields not only between
instances of A but also of any subtypes of A, and so if some subtype has
value x then that x might now have reached any other subtype of A
(even in a sibling type, so long as A is their parent).
We already thought we were handling that, but the mechanism we used to
do so (copying New info to Set info, and letting Set info propagate) was not
enough.
Also add a small constructor to save the work of computing subTypes again.
Add TODOs for some cases that we could optimize regarding copies but
do not, yet.
| |
Like table.set, it can modify a table.
| |
local2stack removes a pair of
  local.set 0
  local.get 0
when that set is not used anywhere else: whatever value is put into the local,
we can just leave it on the stack to replace the get. However, we only handled
actual uses of the set which we checked using LocalGraph. There may be code
that does not actually use the set local, but needs that set purely for validation
reasons:
  local.set 0
  local.get 0
  block
    local.set 0
  end
  local.get 0
That last get reads the value set in the block, so the first set is not used by it.
But for validation purposes, the inner set stops helping at the block end, so
we do need that initial set.
To fix this, check for gets that need our set to validate before removing any.
Fixes #5917
| |
(#5968)
Apparently $N (e.g. FooClass$5) is a convention in Java for anonymous classes, so our
$N that we use to disambiguate could be confusing. As the way we disambiguate does
not matter, switch to using _N. This PR does that in both TypeSSA and NameTypes.
Also make NameTypes "lint" names as it goes. That pass tries to give types nice names,
leaving existing ones that seem ok, and renaming long or unnamed ones. This PR makes
it aware of the _N notation and it tries to remove it, if removing it does not cause a
collision. An example of how that helps is if TypeSSA creates a subtype $Foo_0 and then
we manage to remove $Foo, then we can use the shorter name for the subtype.
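The linting step can be illustrated with a small Python sketch. This is not the NameTypes implementation; `lint_names` and its regular expression are hypothetical, and the sketch only models the rule stated above: drop a trailing _N when the shorter name is not already taken.

```python
import re

# Matches names of the form <base>_<digits>.
_SUFFIX = re.compile(r"^(.*)_(\d+)$")


def lint_names(names):
    """Map each name to a linted name, removing _N when that causes no collision."""
    taken = set(names)
    renamed = {}
    for name in names:
        match = _SUFFIX.match(name)
        if match and match.group(1) and match.group(1) not in taken:
            shorter = match.group(1)
            # Free the old name and reserve the shorter one.
            taken.discard(name)
            taken.add(shorter)
            renamed[name] = shorter
        else:
            renamed[name] = name
    return renamed
```

For example, `lint_names(["$Foo_0", "$Bar"])` shortens `$Foo_0` to `$Foo`, while `lint_names(["$Foo", "$Foo_0"])` leaves both names alone because `$Foo` is still in use.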
| |
Probably any array of non-reference data can be allowed to be public and sent
out of the module, as it is just data. For now, however, just special case the i8
and i16 array types which are useful already for string interop.
| |
E.g.
  (local $x (ref eq))
  ...
  (local.set $x
    (struct.new $float
      ...
    )
  )
  (struct.get $float 0
    (ref.cast (ref $float)
      (local.get $x)
    )
  )
This PR allows us to use heap2local, ignoring the passing cast.
This is similar to existing handling of ref.as_non_null.
| |
This Stack IR optimization is not compatible with a much more powerful
optimization we plan to do for tuples in the binary writer.
| |
Fix some whitespace, and name and reorder a few items to make the output better
match the input, but otherwise port the tests to lit unmodified.
| |
TypeFinalization finalizes all types that we can, that is, all private types that have no
children. TypeUnFinalization unfinalizes (opens) all (private) types.
These could be used by first opening all types, optimizing, and then finalizing, as that
might find more opportunities.
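A minimal Python sketch of the selection rule stated above, assuming a module is described by a hypothetical mapping from each type name to its declared supertype (or None) plus a set of public type names. Neither function is Binaryen's API.

```python
def finalizable_types(supertype_of, public):
    """Private types with no children: the ones TypeFinalization could finalize."""
    has_children = {sup for sup in supertype_of.values() if sup is not None}
    return {
        t for t in supertype_of if t not in public and t not in has_children
    }


def openable_types(supertype_of, public):
    """All private types: the ones TypeUnFinalization could open."""
    return {t for t in supertype_of if t not in public}


# Hypothetical module: $B and $C extend $A, and $P is public.
types = {"$A": None, "$B": "$A", "$C": "$A", "$P": None}
public = {"$P"}
```

Here `finalizable_types(types, public)` selects `$B` and `$C`, since `$A` has children and `$P` is public, while `openable_types(types, public)` selects every private type.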
Fixes #5933
| |
| |
Remove support for the "struct_subtype", "array_subtype", "func_subtype", and
"extends" notations we used at various times to declare WasmGC types, leaving
only support for the standard text format for declaring types. Update all the
tests using the old formats and delete tests that existed solely to test the old
formats.
| |
In some cases tuples are obviously not needed, such as when they are only used
in local operations and make/extract. Such tuples are not used as return values or
in control flow structures, so we might as well lower them to individual locals per
lane, which other passes can optimize a lot better.
I believe LLVM does the same with its own tuples: it lowers them as much as
possible, leaving only necessary ones.
Fixes #5923
| |
E.g.
  (tuple.extract 1
    (tuple.make (A) (B) (C)))
  =>
  (B)
Modify some existing tests to not be in this trivial form, so that they do not
stop testing what they should.
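The rewrite above can be modeled on a toy IR in Python. The nested-tuple representation and the `simplify` function are hypothetical, and the sketch assumes the discarded lanes have no side effects; it is not how the pass is implemented in Binaryen.

```python
def simplify(expr):
    """Fold (tuple.extract i (tuple.make ...)) into the selected lane."""
    if isinstance(expr, tuple) and expr and expr[0] == "tuple.extract":
        _, index, operand = expr
        # Simplify the operand first so that nested extracts also fold.
        operand = simplify(operand)
        if isinstance(operand, tuple) and operand and operand[0] == "tuple.make":
            # Lanes start at position 1, after the "tuple.make" tag.
            return operand[1 + index]
        return ("tuple.extract", index, operand)
    return expr
```

An extract whose operand is not a `tuple.make`, such as a `local.get`, is left unchanged.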
| |
Replace i31.new with ref.i31 in the printer, tests, and source code. Continue
parsing i31.new for the time being to allow a graceful transition. Also update
the JS API to reflect the new instruction name.
| |
Remove the old forms of ref.test and ref.cast that took heap types instead of
ref types and remove the old array.init_static name for array.new_fixed.
| |
Previously, the printer incorrectly reconstructed imported functions' types from
their signatures instead of printing their types directly. This could cause the
printer to print uses of types that were never defined and did not exist in the
module. Fix the bug by printing imported functions' heap types directly.
| |
Match the spec and parse the shorthand binary and text formats as final and emit
final types without supertypes using the shorthands as well. This is a
potentially-breaking change, since the text and binary shorthands can no longer
be used to define types that have subtypes.
Also make TypeBuilder entries final by default to better match the spec and
update the internal APIs to use the "open" terminology rather than "final"
terminology. Future changes will update the text format to use the standard "sub
open" rather than the current "sub final" keywords. The exception is the new wat
parser, which supports "sub open" as of this change, since it didn't support
final types at all previously.
| |
Now that the WasmGC spec has settled on a way of validating non-nullable locals,
we no longer need this experimental feature that allowed nonstandard uses of
non-nullable locals.
| |
In the binary parser, when creating a scratch local to hold multivalue results
as tuples, we previously ensured that the scratch local did not contain any
non-nullable elements by modifying its type and inserting ref.as_non_null as necessary.
Now that we properly support non-nullable elements in tuple locals, however,
this parser behavior is no longer necessary. Remove it.
| |
The code validating and fixing up non-nullable locals previously did not
correctly handle tuples that contained non-nullable elements, which could have
resulted in invalid modules going undetected. Update the code to handle tuples
and add tests.
| |
| |
When printing Binaryen IR, we previously generated names for unnamed heap types
based on their structure. This was useful for seeing the structure of simple
types at a glance without having to separately go look up their definitions, but
it also had two problems:
1. The same name could be generated for multiple types. The generated names did
not take into account rec group structure or finality, so types that differed
only in these properties would have the same name. Also, generated type names
were limited in length, so very large types that shared only some structure
could also end up with the same names. Using the same name for multiple types
produces incorrect and unparsable output.
2. The generated names were not useful beyond the most trivial examples. Even
with length limits, names for nontrivial types were extremely long and visually
noisy, which made reading disassembled real-world code more challenging.
Fix these problems by emitting simple indexed names for unnamed heap types
instead. This regresses readability for very simple examples, but the trade-off
is worth it.
This change also reduces the number of type printing systems we have by one.
Previously we had the system in Print.cpp, but we had another, more general and
extensible system in wasm-type-printing.h and wasm-type.cpp as well. Remove the
old type printing system from Print.cpp and replace it with a much smaller use
of the new system. This requires significant refactoring of Print.cpp so that
the PrintExpressionContents object now holds a reference to a parent
PrintSExpression object that holds the type name state.
This diff is very large because almost every test output changed slightly. To
minimize the diff and ease review, change the type printer in wasm-type.cpp to
behave the same as the old type printer in Print.cpp except for the differences
in name generation. These changes will be reverted in much smaller PRs in the
future to generally improve how types are printed.
| |
Previously it was possible that the supertype merging phase would merge
unrelated types when DFA minimization would split a common supertype out of a
partition, leaving unrelated types behind in the same partition. Fix the problem
by post-processing the partitions in the supertype merging phase to split any
partitions that contain unrelated types.
Fixes #5877.
| |
| |
If we refine a signature type that is used in a call.without.effects then that call's
results may need to be updated. In the IR it looks like a normal call that happens to
pass a function reference as the last param, but it actually means that we call that
function (without side effects), so we need to have the same results, and the validator
already verified that (so the new testcase here fails without this fix).
| |
* Update text output for `ref.cast` and `ref.test`
* Update text output for `array.new_fixed`
* Update tests with new syntax for `ref.cast` and `ref.test`
* Update tests with new `array.new_fixed` syntax
| |
The improvements to RemoveUnusedBrs in #5887 also introduced a regression where
the pass did not correctly handle unreachable fallthrough values and crashed
with an assertion failure. Fix the problem by returning early when a fallthrough
value is unreachable and add a regression test.
Fixes #5892.
| |
* Allow new syntax for some stringref opcodes
Fixes #5607
* Update stringref text output
* Update tests with new syntax for stringref opcodes
Except in test/lit/strings.wat, to check that the legacy syntax still works.
| |
Renaming the multimemory flag in Binaryen to match its naming in LLVM.
| |
Optimize both the known-null and known-non-null cases for BrOnNull and
BrOnNonNull and optimize for more cast behaviors such as SuccessOnlyIfNonNull
and Unreachable for BrOnCast and BrOnCastFail. Leave optimizing
SuccessOnlyIfNull to future work, since that's more complicated. Use type
information from fallthrough values to inform all the optimizations.
| |
Previously CallRef::finalize() would never update the type of the CallRef, even
if the type of the call target had been refined to give a more precise result
type. Besides unnecessarily losing type information, this could also lead to
validation errors, since the validator checks that the type of CallRef matches
the result type of the target signature.
Fix the bug by updating CallRef's type based on its target signature in
CallRef::finalize() and add a test that depends on this refinalization.
| |
Similar to #5885 this was uncovered by #5881 #5882. Here we need to refinalize
when we replace a local.get with a null, since the null's type is more refined.
| |
This has been a bug for a while but it became noticeable after #5881 #5882
which do more work in refinalization.
| |
We previously improved the nullability and heap type of the ref.cast target type
in RefCast::finalize() based on what we knew about its input type. Simplify the
code and make this improvement more powerful by using the greatest lower bound
of the original cast target and input type.
| |
The WasmGC spec will require that the target cast type of br_on_cast and
br_on_cast_fail be a subtype of the input type, but so far Binaryen has not
enforced this constraint, so it could produce invalid modules when optimizations
refined the input to a br_on_cast* such that it was no longer a supertype of the
cast target type.
Fix this problem by setting the cast target type to be the greatest lower bound
of the original cast target type and the current input type in
`BrOn::finalize()`. This maintains the invariant that the cast target type
should be a subtype of the input type and it also does not change cast behavior;
any value that could make the original cast succeed at runtime necessarily
inhabits both the original cast target type and the input type, so it also must
inhabit their greatest lower bound and will make the updated cast succeed as
well.
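The greatest-lower-bound computation can be sketched in Python under simplifying assumptions: heap types form a single-inheritance tree with one bottom type, and a reference type is a (heap type, nullable) pair. The hierarchy, the names, and the `glb` function below are hypothetical and are not Binaryen's implementation.

```python
# Hypothetical heap type hierarchy: child -> parent.
PARENT = {
    "any": None,
    "eq": "any",
    "struct": "eq",
    "$A": "struct",
    "$B": "$A",
    "$C": "$A",
}
BOTTOM = "none"  # subtype of every heap type in this hierarchy


def is_sub_heap(a, b):
    """True if heap type a is a subtype of heap type b."""
    if a == BOTTOM:
        return True
    while a is not None:
        if a == b:
            return True
        a = PARENT[a]
    return False


def glb(x, y):
    """Greatest lower bound of two reference types given as (heap, nullable)."""
    (heap_x, null_x), (heap_y, null_y) = x, y
    if is_sub_heap(heap_x, heap_y):
        heap = heap_x
    elif is_sub_heap(heap_y, heap_x):
        heap = heap_y
    else:
        # Unrelated types share only the bottom type.
        heap = BOTTOM
    # The result is nullable only if both inputs allow null.
    return (heap, null_x and null_y)
```

For instance, a cast to nullable `$B` whose input has been refined to non-nullable `$A` becomes a cast to non-nullable `$B`, which is a subtype of the input type and accepts exactly the values that could have passed the original cast.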
| |
Simplify the optimization of ref.cast and ref.test in OptimizeInstructions by
moving the loop that examines fallthrough values one at a time out to a shared
function in properties.h. Also simplify ref.cast optimization by analyzing the
cast result in just one place.
In addition to simplifying the code, also make the cast optimizations more
powerful by analyzing the nullability and heap type of the cast value
independently, resulting in a potentially more precise analysis of the cast
behavior. Also improve optimization power by considering fallthrough values when
optimizing the SuccessOnlyIfNonNull case.
| |
We shouldn't need to in the general case, but the fuzzer found a corner case
where we do (see the explanation and testcase). Heap2Local replaces struct
fields with locals, and the locals should have the same types as the fields.
However, if a field was less refined for some reason, then the locals could
actually be more refined. (A field could be less refined if we read it from a
type that was under-refined due to a tee or such.)
| |
Remove old, experimental instructions and type encodings that will not be
shipped as part of WasmGC. Updating the encodings and text format to match the
final spec is left as future work.
| |
Br and BrOn can consider the code before and after them connected if it might
be reached (which is the case if the Br has a condition, which BrOn always has).
The wasm2js changes may look a little odd as some of them have this:
i64toi32_i32$1 = i64toi32_i32$2;
i64toi32_i32$1 = i64toi32_i32$2;
I looked into that and the reason is that those outputs are not optimized, and
also even in unoptimized wasm2js we do run simplify-locals once (to try to
reduce the downsides of flatten). As a result, this PR makes a difference there,
and that difference can lead to such odd duplicated code after other operations.
However, there are no changes to optimized wasm2js outputs, so there is no
actual problem.
Followup to #5860.
| |
Followup to #5860, this does the same for (part of) OptimizeCasts.
As there, this is valid because it's ok if we branch away. This part of the pass
picks a different local to get when it knows locals have the same values but one
is more refined. It is ok to add a tee earlier even if it isn't used later.
| |
Followup to #5860, this does the same for LocalCSE.
As there, this is valid because it's ok if we branch away. This pass adds a local.tee of
a reused value and then gets it later, and it's ok to add a tee even if we branch away
and do not use it.
| |
Followup to #5860, this does the same for SimplifyGlobals as for SimplifyLocals.
As there, this is valid because it's ok if we branch away. This part of the pass
applies a global value to a global.get based on a dominating global.set, so any
dominance is good enough for us.
| |
SimplifyLocals (#5860)
This addresses most of the minor regression from the correctness fix in #5857.
That PR makes us consider calls as branching, but in some cases it is ok to
ignore that branching (see the comment in the code here), which this PR allows as
an option.
This undoes one test change from that PR, showing it undoes the regression for
SimplifyLocals. More tests are added to cover this specifically as well.
| |
Calls were simply not handled there, so we could think we were still in the same
basic block when we were not, affecting various passes (but somehow this went
unnoticed until the TNHOracle #5850 ran on some particular Java code).
One existing test was affected, and two new tests are added: one for TNHOracle
where I detected this, and one in OptimizeCasts which is perhaps a simpler way
to see the problem.
All the cases but the TNH one, however, do not need this fix for correctness
since they actually don't care if a call would throw. As a TODO, we should find a
way to undo this minor regression. The regression only affects builds with EH
enabled, though, so most users should be unaffected even in the interim.