GlobalStructInference optimizes gets of immutable fields of structs that
are only ever instantiated to initialize immutable globals. Due to all
the immutability, it's not possible for the optimized reads to
synchronize with any writes via the accessed memory, so we just need to
be careful to replace removed seqcst gets with seqcst fences.
As a drive-by, fix some stale comments in gsi.wast.
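For example, something like this (a sketch; the type $S, global $g, the value 42, and the exact atomic text syntax are assumptions):
(struct.atomic.get seqcst $S 0 (global.get $g))
=>
(block (result i32)
  (atomic.fence)  ;; preserve the get's place in the seqcst global order
  (i32.const 42)  ;; the inferred value of the immutable field
)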
Conservatively avoid introducing synchronization bugs by not optimizing
atomic struct.gets at all in GUFA. It is possible that we could be more
precise in the future.
Also remove obsolete logic dealing with the types of null values as a
drive-by. All null values now have bottom types, so the type mismatch
this code checked for is impossible.
GTO removes fields that are never read and also removes sets to those
fields. Update the pass to add a seqcst fence when removing a seqcst set
to preserve its effect on the global order of seqcst operations.
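A sketch of the shape of the fix (names and syntax are assumptions):
(struct.atomic.set seqcst $S 0 (local.get $ref) (local.get $value))
=>
(block
  (drop (local.get $ref))    ;; operands are kept for any side effects
  (drop (local.get $value))
  (atomic.fence)             ;; preserve the seqcst global ordering
)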
Sequentially consistent gets that are optimized out need to have seqcst
fences inserted in their place to keep the same effect on global
ordering of sequentially consistent operations. In principle, acquire
gets could be similarly optimized with an acquire fence in their place,
but acquire fences synchronize more strongly than acquire gets, so this
may have a negative performance impact. For now, inhibit optimization of
acquire gets.
Heap2Local replaces gets and sets of non-escaping heap allocations with
gets and sets of locals. Since the accessed data does not escape, it
cannot be used to directly synchronize with other threads, so this
optimization is generally safe even in the presence of shared structs
and atomic struct accesses. The only caveat is that sequentially
consistent accesses additionally participate in the global ordering of
sequentially consistent operations, and that effect on the global
ordering cannot be removed. Insert seqcst fences to maintain this global
synchronization when removing sequentially consistent gets and sets.
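A sketch of the idea (the locals, type, and syntax are assumptions):
;; $s does not escape, so its field becomes a local
(local.set $s (struct.new $S (i32.const 1)))
(drop (struct.atomic.get seqcst $S 0 (local.get $s)))
=>
(local.set $field0 (i32.const 1))
(atomic.fence) ;; the get is gone, but its effect on the global
               ;; seqcst ordering is preserved
(drop (local.get $field0))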
Implement support for both sequentially consistent and acquire-release
variants of `struct.atomic.get` and `struct.atomic.set`, as proposed by
shared-everything-threads. Introduce a new `MemoryOrdering` enum for
describing different levels of atomicity (or the lack thereof). This new
enum should eventually be adopted by linear memory atomic accessors as
well to support acquire-release semantics, but for now just use it in
`StructGet` and `StructSet`.
In addition to implementing parsing and emitting for the instructions,
validate that shared-everything is enabled to use them, mark them as
having synchronization side effects, and lightly optimize them by
relaxing acquire-release accesses to non-shared structs to normal,
unordered accesses. This is valid because such accesses cannot possibly
synchronize with other threads. Also update Precompute to avoid
optimizing out synchronization points.
There are probably other passes that need to be updated to avoid
incorrectly optimizing synchronizing accesses, but identifying and
fixing them is left as future work.
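For example, the new instructions look something like this in the text format (the type $S, field 0, and exact syntax are assumptions):
(struct.atomic.get seqcst $S 0 (local.get $ref))
(struct.atomic.get acqrel $S 0 (local.get $ref))
(struct.atomic.set seqcst $S 0 (local.get $ref) (i32.const 1))
(struct.atomic.set acqrel $S 0 (local.get $ref) (i32.const 1))
If $S is not shared, the acqrel forms can be relaxed to plain struct.get and struct.set.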
We normally like to move brs after ifs into the if, when in a loop:
(loop $loop
  (if
    ..
    (unreachable)
    (code)
  )
  (br $loop)
)
=>
(loop $loop
  (if
    ..
    (unreachable)
    (block
      (code)
      (br $loop) ;; moved in
    )
  )
)
However, this may be invalid to do if the if condition is unreachable, as
then one arm may be concrete (`code` in the example could be an `i32`,
for example). As this is dead code anyhow, leave it for DCE.
(#7138)
If we see a StructGet with no content (the type it reads from has no writes),
then we can make it unreachable. The old code literally just changed the type
to unreachable, which would later work out with refinalization - but only if
the StructGet's ref was unreachable. It is possible for this situation to
occur without that, and when it did, we hit the validation error "can't have
an unreachable node without an unreachable child".
To fix this, merge all the code paths that handle "impossible" situations,
which simplifies things, and add this situation to them.
This uncovered an existing bug where we noted default values of refs, but not
of non-refs (which could lead us to think that a field of a struct that was
only ever created by struct.new_default was never created at all). That is
fixed as well.
Similar to call-export*, these imports call a wasm function from outside the
module. The difference is that we send a function reference for them to call
(rather than an export index).
This gives more coverage, first by sending a ref from wasm to JS, and also
since we will now try to call anything that is sent. Exports, in comparison,
are filtered by the fuzzer to things that JS can handle, so this may lead to
more traps, but maybe also some new situations. This also leads to adding
more logic to execution-results.h to model JS trapping properly.
fuzz_shell.js is refactored to allow sharing code between call-export* and
call-ref*.
This pass is now just part of Memory64Lowering.
Once this lands we can remove the `--table64-lowering` flag from
emscripten. Because I've used an alias here, there will be an interim
period where emscripten runs this pass twice, since it passes both
flags. However, this will only be temporary, and that second run will be
a no-op since the first one will remove the feature.
In open world we must assume that a funcref that escapes to the outside
might be called.
RemoveUnusedBrs sinks blocks into If arms when those arms contain
branches to the blocks and the other arm and condition do not. Now that
we type Ifs with unreachable conditions as unreachable, it is possible
for the If arms to have a different type than the block that would be
sunk, so sinking the block would produce invalid IR. Fix the problem by
never sinking blocks into Ifs with unreachable conditions.
Fixes #7128.
IRBuilder is a utility for turning arbitrary valid streams of Wasm
instructions into valid Binaryen IR. It is already used in the text
parser, so now use it in the binary parser as well. Since the IRBuilder
API for building each instruction requires only the information that the
binary and text formats include as immediates to that instruction, the
parser is now much simpler than before. In particular, it does not need
to manage a stack of instructions to figure out what the children of
each expression should be; IRBuilder handles this instead.
There are some differences between the IR constructed by IRBuilder and
the IR the binary parser constructed before this change. Most
importantly, IRBuilder generates better multivalue code because it
avoids eagerly breaking up multivalue results into individual components
that might need to be immediately reassembled into a tuple. It also
parses try-delegate more correctly, allowing the delegate to target
arbitrary labels, not just other `try`s. There are also a couple
superficial differences in the generated label and scratch local names.
As part of this change, add support for recording binary source
locations in IRBuilder.
Previously the only Ifs that were typed unreachable were those in which
both arms were unreachable and those in which the condition was
unreachable that would have otherwise been typed none. This caused
problems in IRBuilder because Ifs with unreachable conditions and
value-returning arms would have concrete types, effectively hiding the
unreachable condition from the logic for dropping concretely typed
expressions preceding an unreachable expression when finishing a scope.
Relax the conditions under which an If can be typed unreachable so that
all Ifs with unreachable conditions or two unreachable arms are typed
unreachable. Propagating unreachability more eagerly this way makes
various optimizations of Ifs more powerful. It also requires new
handling for unreachable Ifs with concretely typed arms in the Printer
to ensure that printed wat remains valid.
Also update Unsubtyping, Flatten, and CodeFolding to account for the
newly unreachable Ifs.
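For example (a sketch):
(if (result i32) ;; previously typed i32, now typed unreachable
  (unreachable)
  (then (i32.const 1))
  (else (i32.const 2))
)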
CodeFolding previously only worked on blocks that did not produce
values. It worked on Ifs that produced values, but only by accident; the
logic for folding matching tails was not written to support tails
producing concrete values, but it happened to work for Ifs because
subsequent ReFinalize runs fixed all the incorrect types it produced.
Improve the power of the optimization by explicitly handling tails that
produce concrete values for both blocks and ifs. Now that the core logic
handles concrete values correctly, remove the unnecessary ReFinalize
run.
Also remove the separate optimization of Ifs with identical arms; this
optimization requires ReFinalize and is already performed by
OptimizeInstructions.
The LUB of sibling types is their common supertype, but after the
sibling types are merged, their LUB is the merged type, which is a
strict subtype of the previous LUB. This means that merging sibling
types causes `select`s to have stale types when the two select arms
previously had the two merged sibling types. To fix any potential stale
types, ReFinalize after merging sibling types.
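A sketch of the problem (types and locals are made up):
;; $B and $C are siblings whose common supertype is $A
(select (result (ref null $A)) ;; the LUB of the arms before merging
  (local.get $b)  ;; (ref null $B)
  (local.get $c)  ;; (ref null $C)
  (local.get $cond)
)
;; after $B and $C are merged into one type, the LUB of the arms is the
;; merged type, a strict subtype of $A, so the select's type is stale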
CodeFolding previously did not consider br_on_* instructions at all, so
it would happily merge tails even if there were br_on_* branches to the
same label with non-matching tails. Fix the bug by making any label
targeted by any instruction not explicitly handled by CodeFolding
unoptimizable. This will gracefully handle other branching instructions
like `resume` and `resume_throw` as well. Folding these branches
properly is left as future work.
Also rename the test file from code-folding_enable-threads.wast to just
code-folding.wast and enable all features instead of just threads. The
old name was left over from when the test was originally ported to lit,
and the additional features are necessary because the new test uses GC
instructions.
Replacing an if with a select may have refined the type. Without this fix,
the sharper stale type checks complain.
Previously the classification of public types propagated public
visibility only through types that had previously been collected by
`collectHeapTypes`. Since there are settings that cause
`collectHeapTypes` to collect fewer types, it was possible for public
types to be missed if they were public only because they were reachable
through an uncollected type.
Ensure that all public heap types are properly classified by propagating
public visibility even through types that are not part of the collected
output.
Fixes #7103.
We previously allowed valid expressions to have stale types as long as
those stale types were supertypes of the most precise possible types for
the expressions. Allowing stale types like this could mask bugs where we
failed to propagate precise type information, though.
Make validation stricter by requiring all expressions except for control
flow structures to have the most precise possible types. Control flow
structures are exempt because many passes that can refine types wrap the
refined expressions in blocks with the old type to avoid the need for
refinalization. This pattern would be broken and we would need to
refinalize more frequently without this exception for control flow
structures.
Now that all non-control flow expressions must have precise types,
remove functionality relating to building select instructions with
non-precise types. Since finalization of selects now always calculates a
LUB rather than using a provided type, remove the type parameter from
BinaryenSelect in the C and JS APIs.
Now that stale types are no longer valid, fix a bug in TypeSSA where it
failed to refinalize module-level code. This bug previously would not
have caused problems on its own, but the stale types could cause
problems for later runs of Unsubtyping. Now the stale types would cause
TypeSSA output to fail validation.
Also fix a bug where Builder::replaceWithIdenticalType was in fact
replacing with refined types.
Fixes #7087.
Before, we would simply not export a function that had e.g. an anyref
param. As a result, the modules were effectively "closed", which was
good for testing full closed-world mode, but not for testing degrees of
open world. To improve that, this PR allows the fuzzer to export such
functions, and adds an "enclose world" pass that "closes" the wasm
(makes it more compatible with closed world). That pass runs 50% of the
time, giving us coverage of both styles.
This pass lowers nontrapping FP to int instructions to implement LLVM's
conversion behavior. This means that they are not fully complete
lowerings according to the wasm spec, but they have the same undefined
behavior that LLVM does. This keeps the pass simpler and preserves
existing behavior when compiling without nontrapping-fp.
This will be used in emscripten, so that we can build libraries with
nontrapping-fp and lower them away after link if desired.
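This suggests a lowering along these lines (a sketch; assuming the pass maps the saturating forms to the trapping MVP forms):
(i32.trunc_sat_f64_s (local.get $x))
=>
(i32.trunc_f64_s (local.get $x)) ;; traps where the saturating form would
                                 ;; clamp, but LLVM treats such inputs as
                                 ;; undefined behavior anyway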
IRBuilder often has to generate new label names for blocks and other
scopes. Previously it would generate each new name by starting with
"block" or "label" and incrementing a suffix until finding a fresh name,
but this made name generation quadratic in the number of names to
generate.
To spend less time generating names, track a hint index at which to
start looking for a fresh name and increment it every time a name is
generated. This speeds up a version of the binary parser that uses
IRBuilder by about 15%.
Since the resulting code has the same undefined behavior as LLVM, make
the pass name reflect that.
When IRBuilder builds an empty non-block scope such as a function body,
an if arm, a try block, etc, it needs to produce some expression to
represent the empty contents. Previously it produced a nop, but change
it to produce an empty block instead. The binary writer and printer have
special logic to elide empty blocks, so this produces smaller output.
Update J2CLOpts to recognize functions containing empty blocks as
trivial to avoid regressing one of its tests.
heap-store-optimization.wast had a test without its accompanying
generated output.
(#7072)
This pass lowers away memory.copy and memory.fill operations. It
generates a function that implements each of the instructions and
replaces the instructions with calls to those functions.
It does not handle other bulk memory operations (e.g. passive segments
and table operations) because emscripten does not need them lowered when
targeting old browsers that don't support bulk memory.
This PR fixes this situation:
(block $out
  (local.set $x (struct.new X Y Z))
  (struct.set $X 0 (local.get $x) (..br $out..)) ;; X' here has a br
)
(local.get $x)
=>
(block $out
  (local.set $x (struct.new (..br $out..) Y Z))
)
(local.get $x)
We want to fold the struct.set into the struct.new, but the br is
a problem: if it executes then we skip the struct.set, and the last
local.get in fact reads the struct before the write. And, if we did this
optimization, we'd end up with the br on the struct.new, so it
would skip that instruction and even the local.set.
To fix this, we use the new API from #7039, which lets us query,
"is it ok to move the local.set to where the struct.set is?"
I believe the history here is that:
1. We added a PickLoadSigns pass. It checks if a load from memory is stored in
a local that is only ever used in a signed or an unsigned manner. If it is, we
can adjust the sign of the load (load8_u/s) to do the sign/unsign during the
load.
2. The pass finds each LocalGet and looks either 2 or 3 parents above it. For
a sign operation, we need to look up 3, since the operation is x << K >> K.
For an unsigned one, we need only 2, since we have x & M. We hardcoded those
numbers 2 and 3.
3. We added the SignExt feature, which adds i32.extend8_s. This does a sign
extend with a single instruction, not two nested ones, so now we can
sign-extend at depth 2, unlike before. Properties::getSignExtValue was updated
for this, but not the pass PickLoadSigns.
The bug fixed here is that we looked at depth 3 for a sign-extend, and we
blindly accepted it if we found one. So we ended up accepting
(i32.extend8_s (ANYTHING (x))), which is a sign-extend of something, but
not of x, which is bad.
We were also missing an optimization opportunity, as we didn't look for
depth 2 sign extends.
This bug is quite old, from when Properties got SignExt support, in #3910.
But the blame isn't there: to notice this then, we'd have had to check each
caller of getSignExtValue throughout the codebase, which isn't reasonable.
The fault is mine, from the first write-up of PickLoadSigns in 2017: the code
should have been fully general, handling both depths 2 and 3 and checking the
output when it does so (adding a check that the extended value == curr, that
is, that the sign/zero-extended value is the one we expect). That is what this
PR does.
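To make the patterns concrete (a sketch; $x holds the loaded value):
;; depth 3: the classic shift-based sign extend of the loaded value
(i32.shr_s (i32.shl (local.get $x) (i32.const 24)) (i32.const 24))
;; depth 2, with SignExt: a single-instruction sign extend
(i32.extend8_s (local.get $x))
;; the bug: at depth 3 we also accepted a sign extend of something that
;; merely contains x, rather than of x itself
(i32.extend8_s (i32.add (local.get $x) (i32.const 1)))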
When we combine a load/store offset with a const, we must not
overflow, as the semantics of offsets do not wrap.
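For example (a sketch):
(i32.load offset=8
  (i32.add (local.get $p) (i32.const 0xfffffff8)))
;; folding the constant into the offset would give offset=0x100000000,
;; which overflows; offsets do not wrap, so we must not combine here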
CFP is less precise than GUFA: in particular, when it flows values around by
type, it does not consider what field it is flowing them to, and its core data
structure is "if a struct.get is done on this type's field, what can be read?".
To see the issue this PR fixes, assume we have
  A
 / \
B   C
Then if we see a struct.set of $C, we know that value can be read by a
struct.get of $A (we can store a reference to a C in such a
local/param/etc.), so we propagate
the value of that set to A. And, in general, anything in A can appear in B
(say, if we see a copy, a struct.set of struct.get that operates on types A,
then one of the sides might be a B), so we propagate from A to B. But
now we have propagated something from C to B, which might be of an
incompatible type.
This cannot cause runtime issues, as it just means we are propagating more
than we should, and will end up with less-useful results. But it can break
validation if no other value is possible but one with an incompatible type,
as we'd replace a struct.get $B with a value that only makes sense for C.
(The qualifier "no other value is possible" was added in the previous
sentence because if another one is possible then we'd end up with too
many values to infer anything, and not optimize at all, avoiding any error.)
This fixes a regression from #7019. That PR fixed an error on situations with
mixed public and private types, but it made us stop optimizing in valid cases,
including cases with entirely private types.
The specific regression was that we checked if we had an entry in the
map of "can become immutable", and we thought that was enough. But
we may have a private child type with a public parent, and still be able to
optimize in the child if the field is not present in the parent. We also did
not have exhaustive checking of all the states canBecomeImmutable can be
in, so add that, plus testing.
unrefine the output (#7036)
Paradoxically, when a BrOn's castType is refined, its own type (the type it
flows out) can get un-refined: making the castType non-nullable means nulls
no longer flow on the branch, so they may flow out directly, making the
BrOn's own type nullable.
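A sketch (the label, type, and local are made up):
;; castType (ref null $T): a null matches the cast and takes the branch,
;; so the value flowing out of the br_on_cast is non-nullable
(br_on_cast $l anyref (ref null $T)
  (local.get $x))
;; castType refined to (ref $T): a null no longer takes the branch and
;; instead flows out, so the br_on_cast's own type becomes nullable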
TypeMerging works by representing the type definition graph as a
partitioned DFA and then refining the partitions to find mergeable
types. #7023 was due to a bug where the DFA included edges from public
types to their children, but did not necessarily include corresponding
states for those children.
One way to fix the bug would have been to traverse the type graph,
finding all reachable public types and creating DFA states for them, but
that might be expensive in cases where there are large graphs of public
types.
Instead, fix the problem by removing the edges from public types to
their children entirely. Types reachable from public types are also
public and therefore are not eligible to be merged, so these edges were
never necessary for correctness.
Fixes #7023.
We only checked for the case of the immediate super being public while we
are private, but it might be a grandsuper instead. That is, any ancestor that
is public will prevent GTO from removing a field (since we can only add
fields on top of our ancestors). Also, the ancestors might not all have the
field, which would add more complexity to that particular assertion, so just
remove it, and add comprehensive tests.
These were added to avoid common problems with closed world mode, but
in practice they are causing more harm than good, forcing users to work
around them. In the meantime (until #6965), remove this validation to unblock
current toolchain makers.
Also fix issues in GlobalTypeOptimization and AbstractTypeRefining that this
uncovers: without this validation, it is possible to run them on more wasm
files than before, so these issues were not previously detected. The fixes are
bundled in this PR because their tests cannot validate before it.
Similar to #7017. As with that PR, this gives up some optimizations that
were valid: we tried to do something complex here and refine types in a
public rec group when it seemed safe to do so, but our analysis was
incomplete.
The testcase here shows how another operation can end up causing a
dependency that breaks things, if another type that uses one that we
modify is public. To be safe, ignore all public types. In the future perhaps we
can find a good way to handle "almost-private" types in public rec groups,
in closed world.
Similar to #7017 and #7018
TypeUpdater, which it uses internally, already does so, but we must also
ignore such types earlier, and make no other modifications to them.
Helps #7015.
When EH+GC are enabled then wasm has non-nullable types, and the sent
exnref should be non-nullable. In Binaryen IR we use the non-nullable type
all the time, which we also do for function references and other things; if
GC is not enabled we lower it to a nullable type for the binary format (see
`WasmBinaryWriter::writeType`, to which comments were added in this PR).
That is, this PR makes us handle exnref the same as those other types.
A new test verifies that behavior. Various existing tests are updated
because ReFinalize will now use the more refined type, so this is
an optimization. It is also a bugfix as in #6987 we started to emit
the refined form in the fuzzer, and this PR makes us handle it
properly in validation and ReFinalization.
Similar to Break, BrOn, etc., we must apply subtyping constraints of the
types we send to blocks, so that Unsubtyping will not remove subtypings
that are actually needed.
A mutable exported global might be shared with another module which
writes to it using the current type. Refining the type would be unsafe, and
the type system does not allow it, so do not refine there.
(#7008)
When we gather strings, we create new globals for each one, that is then
the canonical defining global for it, which will then be used everywhere
else. We create such a global if we lack one, but if we happen to have such
a global - a global that simply defines a string - then we reuse it. But we
didn't handle the case where there was a use before the definition, and
failed to sort the definition before the use.
If we have
(drop
  (block $b (result exnref)
    (try_table (catch_all_ref $b)
      ..
then we don't really need to send the ref: it is dropped, so we can just replace
catch_all_ref with catch_all and then remove the drop and the block value.
MergeBlocks already had logic to remove block values, so it is the natural
place to add this.
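So the example above can become something like (a sketch):
(block $b
  (try_table (catch_all $b)
    ..
with the drop and the block's result removed along with the sent ref.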
with ref.as_non_null (#7004)
(any.convert_extern/extern.convert_any (ref.as_non_null ..))
=>
(ref.as_non_null (any.convert_extern/extern.convert_any ..))
This then allows the RefAsNonNull to be combined with parents in some cases
(whereas the reverse allows nothing).