| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
CodeFolding previously did not consider br_on_* instructions at all, so
it would happily merge tails even if there were br_on_* branches to the
same label with non-matching tails. Fix the bug by making any label
targeted by any instruction not explicitly handled by CodeFolding
unoptimizable. This will gracefully handle other branching instructions
like `resume` and `resume_throw` as well. Folding these branches
properly is left as future work.
Also rename the test file from code-folding_enable-threads.wast to just
code-folding.wast and enable all features instead of just threads. The
old name was left over from when the test was originally ported to lit,
and the new feature is necessary because the new test uses GC
instructions.
|
|
|
|
| |
Replacing an if with a select may have refined the type. Without this fix,
the sharper stale type checks complain.
|
|
|
|
|
|
|
|
| |
I forgot that there is a validation rule that the output type for
br_on_cast and br_on_cast_fail must be a subtype of the input type. We
were previously printing bottom input types in cases where the cast
operand was unreachable, but that's only valid if the cast type is the
same bottom type. Instead print the most precise valid input type, which
is the cast type itself.
|
|
|
|
|
|
|
|
|
| |
Since Load expressions use their `type` field to encode the type of the
loaded value, unreachable loads need to come up with some other valid
type to print. Previously we always chose i32 as that type, but that's
not valid when the load was originally a v128 load with an alignment of
8, since 8 is greater than the maximum valid alignment of 4 for an i32.
Fix the problem by taking alignment into account when choosing a type
for the unreachable load.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
br_on_cast and br_on_cast_fail have two type annotations: one for their
input type and one for their cast type. In cases where their operands
were unreachable, we were previously printing "unreachable" for the
input type annotation. This is not valid wat because "unreachable" is
not a reference type.
To fix the problem, print the bottom type of the cast type's hierarchy
as the input type for br_on_cast and br_on_cast_fail when the operand is
unreachable. This ensures that the instructions have the most precise
possible output type according to Wasm typing rules, so it maximizes the
number of contexts in which the printed instructions are valid.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We previously allowed valid expressions to have stale types as long as
those stale types were supertypes of the most precise possible types for
the expressions. Allowing stale types like this could mask bugs where we
failed to propagate precise type information, though.
Make validation stricter by requiring all expressions except for control
flow structures to have the most precise possible types. Control flow
structures are exempt because many passes that can refine types wrap the
refined expressions in blocks with the old type to avoid the need for
refinalization. This pattern would be broken and we would need to
refinalize more frequently without this exception for control flow
structures.
Now that all non-control flow expressions must have precise types,
remove functionality relating to building select instructions with
non-precise types. Since finalization of selects now always calculates a
LUB rather than using a provided type, remove the type parameter from
BinaryenSelect in the C and JS APIs.
Now that stale types are no longer valid, fix a bug in TypeSSA where it
failed to refinalize module-level code. This bug previously would not
have caused problems on its own, but the stale types could cause
problems for later runs of Unsubtyping. Now the stale types would cause
TypeSSA output to fail validation.
Also fix a bug where Builder::replaceWithIdenticalType was in fact
replacing with refined types.
Fixes #7087.
|
|
|
|
|
|
|
|
|
|
| |
Before, we would simply not export a function that had an e.g. anyref
param. As a result, the modules were effectively "closed", which was
good for testing full closed-world mode, but not for testing degrees of
open world. To improve that, this PR allows the fuzzer to export such
functions, and an "enclose world" pass is added that "closes" the wasm
(makes it more compatible with closed-world) that is run 50% of the
time, giving us coverage of both styles.
|
|
|
|
|
|
|
|
|
|
|
| |
This pass lowers nontrapping FP to int instructions to implement LLVM's
conversion behavior.
This means that they are not fully complete lowerings according to the
wasm spec, but have the same
undefined behavior that LLM does. This keeps the pass simpler and
preserves existing behavior when
compiling without nontrapping-ft.
This will be used in emscripten, so that we can build libraries with
nontrapping-fp and lower them away after link if desired.
|
|
|
|
|
|
| |
Since IRBuilder already knows what labels are used by branches, it is
easy for it to pass that information when finalizing blocks. This avoids
finalization having to walk the blocks looking for branches, speeding up
a future version of the binary parser that uses IRBuilder by 10%.
|
|
|
|
| |
Since the resulting code has the same undefined behavior as LLVM, make
the pass name reflect that.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
IRBuilder contains a pointer to the current function that is used to
create scratch locals, look up the operand types for returns, etc. This
pointer is nullable because IRBuilder can also be used in non-function
contexts such as global initializers. Visiting the start of a function
sets the function pointer, and after this change visiting the end of a
function resets the pointer to null. This avoids potential problems
where code outside a function would be able to incorrectly use scratch
locals and returns if the IRBuilder had previously been used to build a
function.
This change requires some adjustments to Outlining, which visits code
out of order, so ends up visiting code from inside a function after
visiting the end of the function. To support this use case, add a
`setFunction` method to IRBuilder that lets the user explicitly control
its function context. Also remove the optional function pointer
parameter to the IRBuilder constructor since it is less flexible and not
used.
|
|
|
|
|
|
|
|
|
|
| |
When IRBuilder builds an empty non-block scope such as a function body,
an if arm, a try block, etc, it needs to produce some expression to
represent the empty contents. Previously it produced a nop, but change
it to produce an empty block instead. The binary writer and printer have
special logic to elide empty blocks, so this produces smaller output.
Update J2CLOpts to recognize functions containing empty blocks as
trivial to avoid regressing one of its tests.
|
|
|
|
| |
(#7072)
|
|
|
|
|
|
|
|
|
|
|
| |
There were previously two separate code paths for printing function
signatures, one for imported functions and one for declared functions.
The only intended difference was that parameter names were printed for
declared functions but not for imported functions.
Reduce duplication by consolidating the code paths, and add support for
printing names for imported function parameters that have them. Also fix
a bug where empty names were printed as `$` rather than the correct
`$""`.
|
|
|
|
|
|
|
|
| |
This pass lowers away memory.copy and memory.fill operations. It
generates a function that implements the each of the instructions and
replaces the instructions with calls to those functions.
It does not handle other bulk memory operations (e.g. passive segments
and table operations) because they are not used by emscripten to enable
targeting old browsers that don't support bulk memory.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This PR fixes this situation:
(block $out
(local.set $x (struct.new X Y Z))
(struct.set $X 0 (local.get $x) (..br $out..)) ;; X' here has a br
)
(local.get $x)
=>
(block $out
(local.set $x (struct.new (..br $out..) Y Z))
)
(local.get $x)
We want to fold the struct.set into the struct.new, but the br is
a problem: if it executes then we skip the struct.set, and the last
local.get in fact reads the struct before the write. And, if we did this
optimization, we'd end up with the br on the struct.new, so it
would skip that instruction and even the local.set.
To fix this, we use the new API from #7039, which lets us query,
"is it ok to move the local.set to where the struct.set is?"
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
I believe the history here is that
1. We added a PickLoadSigns pass. It checks if a load from memory is stored in
a local that is only every used in a signed or an unsigned manner. If it is, we can
adjust the sign of the load (load8_u/s) to do the sign/unsign during the load.
2. The pass finds each LocalGet and looks either 2 or 3 parents above it. For
a sign operation, we need to look up 3, since the operation is x << K >> K. For
an unsigned, we need only 2, since we have x & M. We hardcoded those
numbers 2 and 3.
3. We added the SignExt feature, which adds i32.extend8_s. This does a sign
extend with a single instruction, not two nested ones, so now we can sign-
extend at depth 2, unlike before. Properties::getSignExtValue was updated
for this, but not the pass PickLoadSigns.
The bug that is fixed here is that we looked at depth 3 for a sign-extend, and
we blindly accepted it if we found one. So we ended up accepting
(i32.extend8_s (ANYTHING (x))), which is a sign-extend of something, but
not of x, which is bad.
We were also missing an optimization opportunity, as we didn't look for
depth 2 sign extends.
This bug is quite old, from when Properties got SignExt support, in #3910.
But the blame isn't there - to notice this then, we'd have had to check each
caller of getSignExtValue throughout the codebase, which isn't reasonable.
The fault is mine, from the first write-up of PickLoadSigns in 2017: the code
should have been fully general, handling 2/3 and checking the output when
it does so (adding == curr, that the sign/zero-extended value is the one we
expect). That is what this PR does.
|
|
|
| |
See https://github.com/WebAssembly/memory64/pull/92
|
| |
|
|
|
|
|
| |
When we combine a load/store offset with a const, we must not
overflow, as the semantics of offsets do not wrap.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
CFP is less precise than GUFA, in particular, when it flows around types then
it does not consider what field it is flowing them to, and its core data
structure is "if a struct.get is done on this type's field, what can be read?".
To see the issue this PR fixes, assume we have
A
/ \
B C
Then if we see struct.set $C, we know that can be read by a struct.get $A
(we can store a reference to a C in such a local/param/etc.), so we propagate
the value of that set to A. And, in general, anything in A can appear in B
(say, if we see a copy, a struct.set of struct.get that operates on types A,
then one of the sides might be a B), so we propagate from A to B. But
now we have propagated something from C to B, which might be of an
incompatible type.
This cannot cause runtime issues, as it just means we are propagating more
than we should, and will end up with less-useful results. But it can break
validation if no other value is possible but one with an incompatible type,
as we'd replace a struct.get $B with a value that only makes sense for C.
(The qualifier "no other value is possible" was added in the previous
sentence because if another one is possible then we'd end up with too
many values to infer anything, and not optimize at all, avoiding any error.)
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This fixes a regression from #7019. That PR fixed an error on situations with
mixed public and private types, but it made us stop optimizing in valid cases,
including cases with entirely private types.
The specific regression was that we checked if we had an entry in the
map of "can become immutable", and we thought that was enough. But
we may have a private child type with a public parent, and still be able to
optimize in the child if the field is not present in the parent. We also did
not have exhaustive checking of all the states canBecomeImmutable can be,
so add those + testing.
|
|
|
|
|
|
|
|
|
|
|
| |
This is NFC on 64-bit systems but noticeable on 32.
Also remove the 32-bit path in hash_combine. That isn't necessary for this fix,
but it makes the code simpler and also makes debugging between systems
simpler. It might also avoid problems in future cases, if we are lucky. The only
cost is perhaps a slight slowdown on 32-bit systems, which seems worth it.
Fixes #7046
|
| |
|
| |
|
|
|
|
|
|
|
|
| |
unrefine the output (#7036)
Paradoxically, when a BrOn's castType is refined, its own type (the type it flows out)
can get un-refined: making the castType non-nullable means nulls no longer
flow on the branch, so they may flow out directly, making the BrOn nullable.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
TypeMerging works by representing the type definition graph as a
partitioned DFA and then refining the partitions to find mergeable
types. #7023 was due to a bug where the DFA included edges from public
types to their children, but did not necessarily include corresponding
states for those children.
One way to fix the bug would have been to traverse the type graph,
finding all reachable public types and creating DFA states for them, but
that might be expensive in cases where there are large graphs of public
types.
Instead, fix the problem by removing the edges from public types to
their children entirely. Types reachable from public types are also
public and therefore are not eligible to be merged, so these edges were
never necessary for correctness.
Fixes #7023.
|
|
|
|
|
|
|
|
| |
We only checked for the case of the immediate super being public while we
are private, but it might be a grandsuper instead. That is, any ancestor that
is public will prevent GTO from removing a field (since we can only add
fields on top of our ancestors). Also, the ancestors might not all have the
field, which would add more complexity to that particular assertion, so just
remove it, and add comprehensive tests.
|
|
|
|
|
|
|
|
|
|
|
| |
These were added to avoid common problems with closed world mode, but
in practice they are causing more harm than good, forcing users to work
around them. In the meantime (until #6965), remove this validation to unblock
current toolchain makers.
Fix GlobalTypeOptimization and AbstractTypeRefining on issues that this
uncovers: without this validation, it is possible to run them on more wasm
files than before, hence these were not previously detected. They are
bundled in this PR because their tests cannot validate before this PR.
|
|
|
|
|
|
|
|
|
|
| |
Similar to #7017 . As with that PR, this reduces some optimizations that were
valid, as we tried to do something complex here and refine types in a public
rec group when it seemed safe to do so, but our analysis was incomplete.
The testcase here shows how another operation can end up causing a
dependency that breaks things, if another type that uses one that we
modify is public. To be safe, ignore all public types. In the future perhaps we
can find a good way to handle "almost-private" types in public rec groups,
in closed world.
|
|
|
| |
Similar to #7017 and #7018
|
| |
|
|
|
|
|
|
| |
TypeUpdater which it uses internally already does so, but we must also
ignore such types earlier, and make no other modifications to them.
Helps #7015
|
|
|
|
|
|
|
| |
This can help find errors in the middle of passes like Inlining, that do
multiple cycles and include optimizations in the middle.
We do this in BINARYEN_PASS_DEBUG >= 2 to avoid slowing down the
timing reports in 1.
|
|
|
|
|
| |
A mutable exported global might be shared with another module which
writes to it using the current type, which is unsafe and the type system does
not allow, so do not refine there.
|
|
|
|
|
|
|
|
|
|
| |
(#7008)
When we gather strings, we create new globals for each one, that is then
the canonical defining global for it, which will then be used everywhere
else. We create such a global if we lack one, but if we happen to have such
a global - a global that simply defines a string - then we reuse it. But we
didn't handle the case where there was a use before the definition, and
failed to sort the definition before the use.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If we have
(drop
(block $b (result exnref)
(try_table (catch_all_ref $b)
then we don't really need to send the ref: it is dropped, so we can just replace
catch_all_ref with catch_all and then remove the drop and the block value.
MergeBlocks already had logic to remove block values, so it is the natural
place to add this.
|
|
|
|
|
|
|
|
|
|
| |
with ref.as_non_null (#7004)
(any.convert_extern/extern.convert_any (ref.as_non_null ..))
=>
(ref.as_non_null (any.convert_extern/extern.convert_any ..))
This then allows the RefAsNonNull to be combined with parents in some cases
(whereas the reverse allows nothing).
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This allows
(block $out (result i32)
(try_table (catch..)
..
(br $out
(i32.const 42)
)
)
)
=>
(block $out (result i32)
(try_table (result i32) (catch..) ;; add a result
..
(i32.const 42) ;; remove the br around the value
)
)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
(#6994)
In #6984 we optimized dropped blocks even if they had unreachable code. In #6988
that part was reverted, and blocks with unreachable code were ignored once more.
However, I realized that the check was not actually for unreachable code, but for
having an unreachable child, so it would miss things like this:
(block
(block
..
(br $somewhere) ;; unreachable type, but no unreachable code
)
)
But it is useful to merge such blocks: we don't need the inner block here.
To fix this, just run ReFinalize if we change anything, which will propagate
unreachability as needed. I think MergeBlocks was written before we had
that utility, so it didn't use it...
This is not only useful for itself but will unblock an EH optimization in a
later PR, that has code in this form. It also simplifies the code by removing
the hasUnreachableChild checks.
|
|
|
|
|
|
|
|
| |
#6980 was missing the logic to reset flows after replacing a throw. The process
of replacing the throw introduces new code and in particular a drop, which
blocks branches from flowing to their targets.
In the testcase here, the br was turned into nop before this fix.
|
|
|
| |
This error makes #6989 less confusing.
|
|
|
|
| |
We ignored legacy Trys in #6980, but they can also catch.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When I refactored the optimizeDroppedBlock logic in #6982, I didn't move the
unreachability check with that code, which was wrong. When that function
was called from another place in #6984, the fuzzer found an issue.
Diff without whitespace is smaller. This reverts almost all the test updates
from #6984 - those changes were on blocks with unreachable children.
The change was safe on them, but in general removing a block value in the
presence of unreachable code is tricky, so it's best to avoid it.
The testcase is a little bizarre, but it's the one the fuzzer found and I can't
find a way to generate a better one (other than to reduce it, which I did).
|
|
|
|
|
|
| |
Just call optimizeDroppedBlock from visitDrop to handle that.
Followup to #6982. This optimizes the new testcase added there. Some older
tests also improve.
|
|
|
|
|
|
|
|
| |
This change is NFC on all things we previously optimized, but also makes
us optimize TryTable, BrOn, etc., by replacing hard-coded logic for Break
with generic code.
Also simplify the code there a little - we didn't really need ControlFlowWalker.
|
|
|
|
|
|
|
|
| |
This just moves the code out into a function. A later PR will use it in another place.
Add a test that shows the motivation for that later PR: we fail to optimize away
a block return value at the top level of a function. Fixing that will involve calling
the new function here from another place.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
E.g.
(try_table (catch_all $catch)
(throw $e)
)
=>
(try_table (catch_all $catch)
(br $catch)
)
This can then allow other passes to remove the TryTable, if no throwing
things remain.
|
|
|
|
|
|
|
| |
Support 5-segment source mappings, which add a name.
Reference:
https://github.com/tc39/source-map/blob/main/source-map-rev3.md#proposed-format
|