| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
| |
If all the fields of a struct.new are defaultable, see if replacing it with a
struct.new_default preserves the behavior, and reduce that way if so.
Also add a missing --closed-world to the --remove-unused-types
invocation. Without that, it was erroring and not working, which I
noticed when testing this. The test also checks that.
|
|
|
|
|
|
|
|
|
| |
GlobalStructInference optimizes gets of immutable fields of structs that
are only ever instantiated to initialize immutable globals. Due to all
the immutability, it's not possible for the optimized reads to
synchronize with any writes via the accessed memory, so we just need to
be careful to replace removed seqcst gets with seqcst fences.
As a drive-by, fix some stale comments in gsi.wast.
|
|
|
|
|
|
|
|
|
| |
Conservatively avoid introducing synchronization bugs by not optimizing
atomic struct.gets at all in GUFA. It is possible that we could be more
precise in the future.
Also remove obsolete logic dealing with the types of null values as a
drive-by. All null values now have bottom types, so the type mismatch
this code checked for is impossible.
|
|
|
|
|
| |
GTO removes fields that are never read and also removes sets to those
fields. Update the pass to add a seqcst fence when removing a seqcst set
to preserve its effect on the global order of seqcst operations.
|
|
|
|
|
|
|
|
|
| |
Sequentially consistent gets that are optimized out need to have seqcst
fences inserted in their place to keep the same effect on global
ordering of sequentially consistent operations. In principle, acquire
gets could be similarly optimized with an acquire fence in their place,
but acquire fences synchronize more strongly than acquire gets, so this
may have a negative performance impact. For now, inhibit optimization of
acquire gets.
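
The rule above can be sketched as a toy Python model. This is not Binaryen's
C++ code: the `Ordering` enum, the tuple-based instructions and the
`replace_removed_get` helper are hypothetical names used only for
illustration.

```python
from enum import Enum


class Ordering(Enum):
    # Hypothetical stand-in for the ordering of a struct.get.
    UNORDERED = 0
    SEQ_CST = 1
    ACQ_REL = 2


def replace_removed_get(order, replacement):
    """Return the instructions that stand in for an optimized-out get,
    or None if the get should be left alone."""
    if order is Ordering.UNORDERED:
        # No synchronization to preserve.
        return [replacement]
    if order is Ordering.SEQ_CST:
        # Keep the get's effect on the global order of seqcst operations.
        return [("atomic.fence", "seqcst"), replacement]
    # Acquire get: an acquire fence would synchronize more strongly than
    # the get it replaces, so do not optimize for now.
    return None
```

With this model, a removed seqcst get becomes a fence followed by the
replacement value, while an acquire get is simply not optimized.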
|
|
|
|
|
|
|
| |
1. Error on retrying due to a wasm-opt issue, locally (in production, we
don't want to error on ClusterFuzz).
2. Move some asserts from test_run_py to the helper generate_testcases
(so that the asserts happen in all callers).
|
|
|
|
|
|
|
|
|
|
|
| |
Heap2Local replaces gets and sets of non-escaping heap allocations with
gets and sets of locals. Since the accessed data does not escape, it
cannot be used to directly synchronize with other threads, so this
optimization is generally safe even in the presence of shared structs
and atomic struct accesses. The only caveat is that sequentially
consistent accesses additionally participate in the global ordering of
sequentially consistent operations, and that effect on the global
ordering cannot be removed. Insert seqcst fences to maintain this global
synchronization when removing sequentially consistent gets and sets.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Implement support for both sequentially consistent and acquire-release
variants of `struct.atomic.get` and `struct.atomic.set`, as proposed by
shared-everything-threads. Introduce a new `MemoryOrdering` enum for
describing different levels of atomicity (or the lack thereof). This new
enum should eventually be adopted by linear memory atomic accessors as
well to support acquire-release semantics, but for now just use it in
`StructGet` and `StructSet`.
In addition to implementing parsing and emitting for the instructions,
validate that shared-everything is enabled to use them, mark them as
having synchronization side effects, and lightly optimize them by
relaxing acquire-release accesses to non-shared structs to normal,
unordered accesses. This is valid because such accesses cannot possibly
synchronize with other threads. Also update Precompute to avoid
optimizing out synchronization points.
There are probably other passes that need to be updated to avoid
incorrectly optimizing synchronizing accesses, but identifying and
fixing them is left as future work.
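
A minimal sketch of the relaxation described above, in Python rather than
Binaryen's C++. The string orderings and the `relax_ordering` helper are
assumptions made for illustration; only the acquire-release case mentioned in
this message is relaxed.

```python
def relax_ordering(order, struct_is_shared):
    """Return the ordering to use for an access after the light optimization.

    `order` is one of "unordered", "acqrel" or "seqcst" (hypothetical names).
    """
    if order == "acqrel" and not struct_is_shared:
        # A non-shared struct cannot be seen by another thread, so the
        # access cannot synchronize with anything.
        return "unordered"
    return order
```

Accesses to shared structs keep their ordering, and so does every ordering
other than acquire-release in this sketch.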
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We normally like to move brs after ifs into the if, when in a loop:
  (loop $loop
    (if
      ..
      (unreachable)
      (code)
    )
    (br $loop)
  )
=>
  (loop $loop
    (if
      ..
      (unreachable)
      (block
        (code)
        (br $loop) ;; moved in
      )
    )
  )
However this may be invalid to do if the if condition is unreachable, as
then one arm may be concrete (`code` in the example could be an `i32`,
for example). As this is dead code anyhow, leave it for DCE.
|
|
|
|
|
|
|
| |
(#7154)
With this option, each time we reduce we save a file w.wasm.17 or such,
incrementing that counter. This is useful when debugging the reducer, but
might have more uses.
|
|
|
|
|
|
|
|
|
|
| |
* Add a new "sleep" fuzzer import, that does a sleep for some ms.
* Add JSPI support in fuzz_shell.js. This is in the form of commented-out async/await
keywords - commented out so that normal fuzzing is not impacted. When we want
to fuzz JSPI, we uncomment them. We also apply the JSPI operations of marking
imports and exports as suspending/promising.
JSPI fuzzing is added to both fuzz_opt.py and ClusterFuzz's run.py.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Since multivalue was standardized, WebAssembly has supported not only
multiple results but also an arbitrary number of inputs on control flow
structures, but until now Binaryen did not support control flow input.
Binaryen IR still has no way to represent control flow input, so lower
it away using scratch locals in IRBuilder. Since both the text and
binary parsers use IRBuilder, this gives us full support for parsing
control flow inputs.
The lowering scheme is mostly simple. A local.set writing the control
flow inputs to a scratch local is inserted immediately before the
control flow structure begins and a local.get retrieving those inputs is
inserted inside the control flow structure before the rest of its body.
The only complications come from ifs, in which the inputs must be
retrieved at the beginning of both arms, and from loops, where branches
to the beginning of the loop must be transformed so their values are
written to the scratch local along the way.
Resolves #6407.
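
The simple cases of the lowering can be sketched on a toy IR as follows. This
is an illustrative Python model, not IRBuilder's code: instructions are plain
tuples, and `lower_block_input` and `lower_if_input` are hypothetical helpers.
The loop case, where branches must also write to the scratch local, is left
out.

```python
def lower_block_input(inputs, body, scratch):
    """Lower a block that takes `inputs` as control flow input."""
    return [
        # Stash the inputs immediately before the block begins.
        ("local.set", scratch, inputs),
        # Retrieve them inside the block, before the rest of its body.
        ("block", [("local.get", scratch)] + list(body)),
    ]


def lower_if_input(inputs, condition, then_body, else_body, scratch):
    """Lower an if with input: both arms start by retrieving the inputs."""
    get = ("local.get", scratch)
    return [
        ("local.set", scratch, inputs),
        ("if", condition, [get] + list(then_body), [get] + list(else_body)),
    ]
```

In both cases the inputs are still evaluated before anything inside the
control flow structure, so evaluation order is unchanged in this model.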
|
|
|
| |
Fixes #7145
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Value types were previously represented internally as either enum values
for "basic," i.e. non-reference, non-tuple types or pointers to
`TypeInfo` structs encoding either references or tuples. Update the
representation of reference types to use one bit to encode nullability
and the rest of the bits to encode the referenced heap type. This allows
canonical reference types to be created with a single logical or rather
than by taking a lock on a global type store and doing a hash map lookup
to canonicalize.
This change is a massive performance improvement and dramatically
improves how performance scales with threads because the removed lock
was highly contended. Even with a single core, the performance of an O3
optimization pipeline on a WasmGC module improves by 6%. With 8 cores,
the improvement increases to 29% and with all 128 threads on my machine,
the improvement reaches 46%.
The full new encoding of types is as follows:
- If the type ID is within the range of the basic types, the type is
the corresponding basic type.
- Otherwise, if bit 0 is set, the type is a tuple and the rest of the
bits are a canonical pointer to the tuple.
- Otherwise, the type is a reference type. Bit 1 determines the
nullability and the rest of the bits encode the heap type.
Also update the encodings of basic heap types so they no longer use the
low two bits to avoid conflicts with the use of those bits in the
encoding of types.
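
The encoding listed above can be modeled in a few lines of Python. This is a
sketch under stated assumptions, not the C++ implementation: the count of
basic type IDs (`NUM_BASIC_TYPES`) is assumed, and heap types and tuple
pointers are represented as plain integers whose low two bits are zero.

```python
NUM_BASIC_TYPES = 7   # assumption: IDs 0..6 denote basic types
TUPLE_BIT = 0b01      # bit 0: the type is a tuple
NULLABLE_BIT = 0b10   # bit 1: the reference is nullable
LOW_BITS = 0b11


def make_ref(heap_type, nullable):
    """Create a reference type ID with a single logical or."""
    assert heap_type & LOW_BITS == 0 and heap_type >= NUM_BASIC_TYPES
    return heap_type | (NULLABLE_BIT if nullable else 0)


def make_tuple(tuple_ptr):
    """Tag a canonical tuple pointer as a tuple type ID."""
    assert tuple_ptr & LOW_BITS == 0 and tuple_ptr >= NUM_BASIC_TYPES
    return tuple_ptr | TUPLE_BIT


def decode(type_id):
    """Classify a type ID following the three rules above, in order."""
    if type_id < NUM_BASIC_TYPES:
        return ("basic", type_id)
    if type_id & TUPLE_BIT:
        return ("tuple", type_id & ~LOW_BITS)
    return ("ref", type_id & ~LOW_BITS, bool(type_id & NULLABLE_BIT))
```

Because `make_ref` needs no lookup in shared state, no lock is involved when
a reference type is created in this model.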
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
(#7138)
If we see a StructGet with no content (the type it reads from has no writes)
then we can make it unreachable. The old code literally just changed the type
to unreachable, which would later work out with refinalization - but only if
the StructGet's ref was unreachable. But it is possible for this situation to
occur without that, and if so, this hit the validation error "can't have an
unreachable node without an unreachable child".
To fix this, merge all code paths that handle "impossible" situations, which
simplifies things, and add this situation.
This uncovered an existing bug where we noted default values of refs, but
not non-refs (which could lead us to think that a field of a struct that was
only ever created by struct.new_default was never created at all). Fixed as
well.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Similar to call-export*, these imports call a wasm function from outside the
module. The difference is that we send a function reference for them to call
(rather than an export index).
This gives more coverage, first by sending a ref from wasm to JS, and also
since we will now try to call anything that is sent. Exports, in comparison,
are filtered by the fuzzer to things that JS can handle, so this may lead to
more traps, but maybe also some new situations. This also leads to adding
more logic to execution-results.h to model JS trapping properly.
fuzz_shell.js is refactored to allow sharing code between call-export* and
call-ref*.
|
|
|
|
|
|
|
|
|
|
| |
LLVM recently split the bulk-memory-opt feature out from bulk-memory,
containing just memory.copy and memory.fill. This change follows that,
making bulk-memory-opt also enabled when all of bulk-memory is enabled.
It also introduces call-indirect-overlong following LLVM, but ignores
it, since Binaryen has always allowed the encoding (i.e. command
line flags enabling or disabling the feature are accepted but
ignored).
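
The implication between the two bulk memory features can be sketched like
this. The `Feature` flag set and the `enable` helper are hypothetical Python
names for illustration; they are not Binaryen's feature API.

```python
from enum import Flag, auto


class Feature(Flag):
    NONE = 0
    BULK_MEMORY = auto()
    BULK_MEMORY_OPT = auto()


def enable(features, flag):
    """Enable `flag`, pulling in bulk-memory-opt along with bulk-memory."""
    features |= flag
    if flag & Feature.BULK_MEMORY:
        # All of bulk-memory includes memory.copy and memory.fill.
        features |= Feature.BULK_MEMORY_OPT
    return features
```

Enabling only bulk-memory-opt does not enable the rest of bulk-memory in this
sketch.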
|
|
|
|
|
|
|
|
|
| |
This pass is now just part of Memory64Lowering.
Once this lands we can remove the `--table64-lowering` flag from
emscripten. Because I've used an alias here there will be some interim
period where emscripten will run this pass twice since it passes both
flags. However, this will only be temporary and that second run will be
a no-op since the first one will remove the feature.
|
|
|
|
| |
In open world we must assume that a funcref that escapes to the outside
might be called.
|
|
|
|
|
|
|
|
|
|
|
|
| |
Move all state relevant to reading source maps out of WasmBinaryReader
and into a new utility, SourceMapReader. This is a prerequisite for
parallelizing the parsing of function bodies, since the source map
reader state is different at the beginning of each function.
Also take the opportunity to simplify the way we read source maps, for
example by deferring the reading of anything but the position of a debug
location until it will be used and by using `std::optional` instead of
singleton `std::set`s to store function prologue and epilogue debug
locations.
|
|
|
|
|
|
|
|
|
|
| |
While parsing a binary file, there may be pops that need to be fixed up
even if EH is not (yet) enabled because the target features section has
not been parsed yet. Previously `EHUtils::handleBlockNestedPops` did not
do anything if EH was not enabled, so the binary parser would fail to
fix up pops in that case. Add an optional parameter to override this
behavior so the parser can fix up pops unconditionally.
Fixes #7127.
|
|
|
|
|
|
|
|
|
|
| |
RemoveUnusedBrs sinks blocks into If arms when those arms contain
branches to the blocks and the other arm and condition do not. Now that
we type Ifs with unreachable conditions as unreachable, it is possible
for the If arms to have a different type than the block that would be
sunk, so sinking the block would produce invalid IR. Fix the problem by
never sinking blocks into Ifs with unreachable conditions.
Fixes #7128.
|
|
|
|
| |
Even if the size is 0, if the offset is > 0 then we should trap.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
IRBuilder is a utility for turning arbitrary valid streams of Wasm
instructions into valid Binaryen IR. It is already used in the text
parser, so now use it in the binary parser as well. Since the IRBuilder
API for building each instruction requires only the information that the
binary and text formats include as immediates to that instruction, the
parser is now much simpler than before. In particular, it does not need
to manage a stack of instructions to figure out what the children of
each expression should be; IRBuilder handles this instead.
There are some differences between the IR constructed by IRBuilder and
the IR the binary parser constructed before this change. Most
importantly, IRBuilder generates better multivalue code because it
avoids eagerly breaking up multivalue results into individual components
that might need to be immediately reassembled into a tuple. It also
parses try-delegate more correctly, allowing the delegate to target
arbitrary labels, not just other `try`s. There are also a couple
superficial differences in the generated label and scratch local names.
As part of this change, add support for recording binary source
locations in IRBuilder.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously the only Ifs that were typed unreachable were those in which
both arms were unreachable and those in which the condition was
unreachable that would have otherwise been typed none. This caused
problems in IRBuilder because Ifs with unreachable conditions and
value-returning arms would have concrete types, effectively hiding the
unreachable condition from the logic for dropping concretely typed
expressions preceding an unreachable expression when finishing a scope.
Relax the conditions under which an If can be typed unreachable so that
all Ifs with unreachable conditions or two unreachable arms are typed
unreachable. Propagating unreachability more eagerly this way makes
various optimizations of Ifs more powerful. It also requires new
handling for unreachable Ifs with concretely typed arms in the Printer
to ensure that printed wat remains valid.
Also update Unsubtyping, Flatten, and CodeFolding to account for the
newly unreachable Ifs.
|
|
|
| |
This feature is depended on by our ClusterFuzz integration.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
CodeFolding previously only worked on blocks that did not produce
values. It worked on Ifs that produced values, but only by accident; the
logic for folding matching tails was not written to support tails
producing concrete values, but it happened to work for Ifs because
subsequent ReFinalize runs fixed all the incorrect types it produced.
Improve the power of the optimization by explicitly handling tails that
produce concrete values for both blocks and ifs. Now that the core logic
handles concrete values correctly, remove the unnecessary ReFinalize
run.
Also remove the separate optimization of Ifs with identical arms; this
optimization requires ReFinalize and is already performed by
OptimizeInstructions.
|
|
|
|
|
|
|
|
|
| |
The two files are then linked and run by fuzz_shell.js (we had this functionality
already in order to fuzz wasm-split). By adding multiple build and run commands
of both the primary and secondary wasm files, we can end up with multiple
instances of two different wasm files that call between themselves.
To help testing, add a script that extracts the wasm files from the testcase. This
may also be useful in the future for testcase reduction.
|
|
|
|
|
|
|
|
| |
The LUB of sibling types is their common supertype, but after the
sibling types are merged, their LUB is the merged type, which is a
strict subtype of the previous LUB. This means that merging sibling
types causes `selects` to have stale types when the two select arms
previously had the two merged sibling types. To fix any potential stale
types, ReFinalize after merging sibling types.
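
A small Python model shows why the LUB changes. The type names (`$A`, `$B`,
`$Super`), the `parent` map and the `lub` helper are made up for
illustration, and the hierarchy is assumed to be single-inheritance.

```python
def lub(a, b, parent):
    """Least upper bound of a and b, or None if they share no supertype."""
    ancestors = set()
    t = a
    while t is not None:
        ancestors.add(t)
        t = parent.get(t)
    t = b
    while t is not None:
        if t in ancestors:
            return t
        t = parent.get(t)
    return None


# Two siblings under a common supertype.
parent = {"$A": "$Super", "$B": "$Super", "$Super": None}
before = lub("$A", "$B", parent)

# Merge $B into $A: every use of $B now refers to $A.
merged = {"$B": "$A"}
after = lub(merged.get("$A", "$A"), merged.get("$B", "$B"), parent)
```

Here `before` is the common supertype and `after` is the merged type, so a
select that was typed with the old LUB would be stale.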
|
|
|
|
|
|
| |
(#7116)
Lower away saturating fptoint operations when we know we are using
emscripten.
|
|
|
|
|
| |
This was never right for over a decade, and just never used I suppose... it should
have been called "take" since it grabbed data from the other item and then set
that other item to empty. Fix it so it swaps properly.
|
|
|
|
|
| |
Previously the interpreter only executed overflow and bounds checks for
memory.grow on 32-bit memories. Run the checks on 64-bit memories as
well.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
CodeFolding previously did not consider br_on_* instructions at all, so
it would happily merge tails even if there were br_on_* branches to the
same label with non-matching tails. Fix the bug by making any label
targeted by any instruction not explicitly handled by CodeFolding
unoptimizable. This will gracefully handle other branching instructions
like `resume` and `resume_throw` as well. Folding these branches
properly is left as future work.
Also rename the test file from code-folding_enable-threads.wast to just
code-folding.wast and enable all features instead of just threads. The
old name was left over from when the test was originally ported to lit,
and the new feature is necessary because the new test uses GC
instructions.
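
The conservative fix can be illustrated with a toy Python sketch. The set of
handled instruction kinds and the `unoptimizable_labels` helper are
assumptions for illustration, not the actual CodeFolding code.

```python
# Hypothetical: the branch kinds the folding logic knows how to handle.
HANDLED_KINDS = {"br"}


def unoptimizable_labels(branches):
    """Collect labels targeted by any instruction that is not handled.

    `branches` is an iterable of (instruction_kind, target_label) pairs.
    """
    return {label for kind, label in branches if kind not in HANDLED_KINDS}
```

A label is excluded from folding as soon as one unhandled instruction targets
it, even if plain branches target it as well.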
|
|
|
|
| |
Replacing an if with a select may have refined the type. Without this fix,
the sharper stale type checks complain.
|
|
|
|
| |
The only internal use was in wasm2js, which doesn't need it. Fix API
tests to explicitly drop expressions as necessary.
|
|
|
|
|
|
|
|
| |
I forgot that there is a validation rule that the output type for
br_on_cast and br_on_cast_fail must be a subtype of the input type. We
were previously printing bottom input types in cases where the cast
operand was unreachable, but that's only valid if the cast type is the
same bottom type. Instead print the most precise valid input type, which
is the cast type itself.
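
As a sketch, the choice of printed input type looks like this. The helper
name and the string representation of types are hypothetical.

```python
def printed_input_type(operand_type, cast_type):
    """Pick the input type annotation for br_on_cast / br_on_cast_fail."""
    if operand_type == "unreachable":
        # The cast type is always a valid input type for itself, and it is
        # the most precise valid choice.
        return cast_type
    return operand_type
```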
|
|
|
|
|
|
|
|
|
| |
Since Load expressions use their `type` field to encode the type of the
loaded value, unreachable loads need to come up with some other valid
type to print. Previously we always chose i32 as that type, but that's
not valid when the load was originally a v128 load with an alignment of
8, since 8 is greater than the maximum valid alignment of 4 for an i32.
Fix the problem by taking alignment into account when choosing a type
for the unreachable load.
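
One way to take alignment into account is sketched below. The message does
not spell out the exact rule used, so the thresholds here are an assumption
based on the maximum valid alignments of 4, 8 and 16 bytes for full-width
i32, i64 and v128 loads; the helper name is hypothetical.

```python
def printed_type_for_unreachable_load(align):
    """Pick a type whose maximum valid alignment is at least `align`."""
    if align <= 4:
        return "i32"
    if align <= 8:
        return "i64"
    return "v128"
```

With this rule, the alignment of 8 from the example above no longer ends up
printed on an i32 load.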
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously the classification of public types propagated public
visibility only through types that had previously been collected by
`collectHeapTypes`. Since there are settings that cause
`collectHeapTypes` to collect fewer types, it was possible for public
types to be missed if they were only public because they were reached by
an uncollected type.
Ensure that all public heap types are properly classified by propagating
public visibility even through types that are not part of the collected
output.
Fixes #7103.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
br_on_cast and br_on_cast_fail have two type annotations: one for their
input type and one for their cast type. In cases where their operands
were unreachable, we were previously printing "unreachable" for the
input type annotation. This is not valid wat because "unreachable" is
not a reference type.
To fix the problem, print the bottom type of the cast type's hierarchy
as the input type for br_on_cast and br_on_cast_fail when the operand is
unreachable. This ensures that the instructions have the most precise
possible output type according to Wasm typing rules, so it maximizes the
number of contexts in which the printed instructions are valid.
|
|
|
|
|
| |
This just moves code around. It will allow more code reuse in a later PR.
Also add a bit of test logging.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We previously allowed valid expressions to have stale types as long as
those stale types were supertypes of the most precise possible types for
the expressions. Allowing stale types like this could mask bugs where we
failed to propagate precise type information, though.
Make validation stricter by requiring all expressions except for control
flow structures to have the most precise possible types. Control flow
structures are exempt because many passes that can refine types wrap the
refined expressions in blocks with the old type to avoid the need for
refinalization. This pattern would be broken and we would need to
refinalize more frequently without this exception for control flow
structures.
Now that all non-control flow expressions must have precise types,
remove functionality relating to building select instructions with
non-precise types. Since finalization of selects now always calculates a
LUB rather than using a provided type, remove the type parameter from
BinaryenSelect in the C and JS APIs.
Now that stale types are no longer valid, fix a bug in TypeSSA where it
failed to refinalize module-level code. This bug previously would not
have caused problems on its own, but the stale types could cause
problems for later runs of Unsubtyping. Now the stale types would cause
TypeSSA output to fail validation.
Also fix a bug where Builder::replaceWithIdenticalType was in fact
replacing with refined types.
Fixes #7087.
|
|
|
|
|
|
|
|
|
|
|
| |
When a loop has no name, the name does not matter, but we also cannot
emit the same name for all such loops, as that is invalid JS. Just do not
emit a while(){} at all in that case, as no continue can exist anyhow.
Fixes #7099
Also fix two missing * in error reporting logic, which was printing pointers
rather than the expression we wanted to print. I think we changed how
iostream prints things years ago, and forgot to update these.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The main fuzz_shell.js code builds and runs the given wasm. After the refactoring
in #7096, it is simple to append to that file and add more build and run operations,
adding more variety to the code, including cross-module interactions. Add logic
to run.py to do that for ClusterFuzz.
To test this, add a node test that builds a module with internal state that can
actually show which module is being executed. The test appends a build+run
operation, whose output proves that we are calling from the first module to the
second and vice versa.
Also add a ClusterFuzz test for run.py that verifies that we add a variety of
build/run operations.
|
|
|
|
|
|
|
|
|
|
|
|
| |
This mostly moves the code around and avoids some duplication. It
also tracks the list of exports with both names and values, so that
if we compile more than one module, we can still access exports
from the previous.
Also add a first test of running fuzz_shell.js in node.
This does make build() append the exports, which was done before
on the main module but not the second one. That only affects the
wasm-split fuzzer, which is not active yet, so this is still NFC.
|
|
|
|
| |
Also add a test that the ClusterFuzz run.py does not warn,
which was helpful when debugging this.
|
|
|
|
|
|
|
|
|
|
| |
Before, we would simply not export a function that had, e.g., an anyref
param. As a result, the modules were effectively "closed", which was
good for testing full closed-world mode, but not for testing degrees of
open world. To improve that, this PR allows the fuzzer to export such
functions, and an "enclose world" pass is added that "closes" the wasm
(makes it more compatible with closed-world) that is run 50% of the
time, giving us coverage of both styles.
|
|
|
|
|
|
|
|
|
|
|
| |
This pass lowers nontrapping FP to int instructions to implement LLVM's
conversion behavior. This means that they are not fully complete lowerings
according to the wasm spec, but have the same undefined behavior that LLVM
does. This keeps the pass simpler and preserves existing behavior when
compiling without nontrapping-fp.
This will be used in emscripten, so that we can build libraries with
nontrapping-fp and lower them away after link if desired.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The main addition here is a bundle_clusterfuzz.py script which will package up
the exact files that should be uploaded to ClusterFuzz. It also documents the
process of bundling and testing. You can do
bundle.py OUTPUT_FILE.tgz
That bundles wasm-opt from ./bin, which is enough for local testing. For
actually uploading to ClusterFuzz, we need a portable build, and @dschuff
had the idea to reuse the emsdk build, which works nicely. Doing
bundle.py OUTPUT_FILE.tgz --build-dir=/path/to/emsdk/upstream/
will bundle wasm-opt (+libs) from the emsdk. I verified that those builds
work on ClusterFuzz.
I added several forms of testing here. First, our main fuzzer fuzz_opt.py now
has a ClusterFuzz testcase handler, which simulates a ClusterFuzz environment.
Second, there are smoke tests that run in the unit test suite, and can also be
run separately:
python -m unittest test/unit/test_cluster_fuzz.py
Those unit tests can also run on a given bundle, e.g. one created from an
emsdk build, for testing right before upload:
BINARYEN_CLUSTER_FUZZ_BUNDLE=/path/to/bundle.tgz python -m unittest test/unit/test_cluster_fuzz.py
A third piece of testing is to add a --fuzz-passes test. That is a mode for
-ttf (translate random data into a valid wasm fuzz testcase) that uses random
data to pick and run a set of passes, to further shape the wasm. (--fuzz-passes
had no previous testing, and this PR fixes it and tidies it up a little, adding some
newer passes too).
Otherwise this PR includes the key run.py script that is bundled and then
executed by ClusterFuzz, basically a python script that runs wasm-opt -ttf [..]
to generate testcases, sets up their JS, and emits them.
fuzz_shell.js, which is the JS to execute testcases, will now check if it is
provided binary data of a wasm file. If so, it does not read a wasm file from
argv[1]. (This is needed because ClusterFuzz expects a single file for the
testcase, so we make a JS file with bundled wasm inside it.)
|
|
|
|
|
|
|
|
|
|
|
|
| |
IRBuilder often has to generate new label names for blocks and other
scopes. Previously it would generate each new name by starting with
"block" or "label" and incrementing a suffix until finding a fresh name,
but this made name generation quadratic in the number of names to
generate.
To spend less time generating names, track a hint index at which to
start looking for a fresh name and increment it every time a name is
generated. This speeds up a version of the binary parser that uses
IRBuilder by about 15%.
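
The hint-index idea can be sketched in Python as follows. The `LabelNamer`
class is a hypothetical illustration, not IRBuilder's actual code.

```python
class LabelNamer:
    """Generate fresh names without rescanning suffixes from zero."""

    def __init__(self, used=()):
        self.used = set(used)
        self.hint = 0  # suffix at which the next search starts

    def fresh(self, prefix):
        while True:
            candidate = f"{prefix}{self.hint}"
            # Advance the hint on every attempt so that later calls never
            # retry suffixes that were already considered.
            self.hint += 1
            if candidate not in self.used:
                self.used.add(candidate)
                return candidate
```

Since the hint only moves forward, the total number of candidates tried is
bounded by the number of generated names plus the number of pre-existing
names, instead of growing quadratically.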
|
|
|
|
| |
Since the resulting code has the same undefined behavior as LLVM, make
the pass name reflect that.
|