| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
IRBuilder contains a pointer to the current function that is used to
create scratch locals, look up the operand types for returns, etc. This
pointer is nullable because IRBuilder can also be used in non-function
contexts such as global initializers. Visiting the start of a function
sets the function pointer, and after this change visiting the end of a
function resets the pointer to null. This avoids potential problems
where code outside a function would be able to incorrectly use scratch
locals and returns if the IRBuilder had previously been used to build a
function.
This change requires some adjustments to Outlining, which visits code
out of order, so ends up visiting code from inside a function after
visiting the end of the function. To support this use case, add a
`setFunction` method to IRBuilder that lets the user explicitly control
its function context. Also remove the optional function pointer
parameter to the IRBuilder constructor since it is less flexible and not
used.
|
|
|
|
|
|
|
|
|
|
| |
When IRBuilder builds an empty non-block scope such as a function body,
an if arm, a try block, etc, it needs to produce some expression to
represent the empty contents. Previously it produced a nop, but change
it to produce an empty block instead. The binary writer and printer have
special logic to elide empty blocks, so this produces smaller output.
Update J2CLOpts to recognize functions containing empty blocks as
trivial to avoid regressing one of its tests.
|
|
|
|
|
|
|
|
|
|
|
| |
The binary reader has special handling for blocks immediately nested
inside other blocks to eliminate recursion while parsing very deep
stacks of blocks. This special handling did not record binary locations
for the nested blocks, though.
Add logic to record binary locations for nested blocks. This binary
reading code is about to be replaced with completely different code that
uses IRBuilder instead, but this change will eliminate some test
differences that we would otherwise see when we make that change.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Instead of setting the local names at the end of binary reading, eagerly
set them before parsing function bodies. This is NFC now, but will fix a
future bug once the binary reader uses IRBuilder. IRBuilder can
introduce new scratch locals, and it gives them the names `$scratch`,
`$scratch_1`, etc. If the name section includes locals with the same
names and we set those local names after parsing function bodies, then
we can end up with multiple locals with the same names. Setting the
names before parsing the function bodies ensures that IRBuilder will
generate different names for the scratch locals.
The alternative fix would be to generate fresh names when setting names
from the name section, but it is better to respect the names in the name
section and use fresh names for the newly introduced scratch locals
instead.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
IRBuilder introduces scratch locals to hoist values from underneath
stacky code to the top of the stack for consumption by the next
instruction. When it does so, the sequence of instructions from the set
to the get of the scratch local is packaged in a block so the entire
sequence can be made a child of the next instruction. In cases where the
hoisted value comes from a `pop`, this packaging can make the IR
invalid, since `pop`s are not allowed to appear inside blocks.
Detect when this problem might occur and fix it by running
`EHUtils::handleBlockNestedPops` after the function containing the
problem has been constructed.
|
|
|
|
|
|
|
|
|
| |
Rather than back-patching names when we get to the names section in the
binary reader, skip ahead to read the names section before anything else
so we can use the final names right away. This is a prerequisite for
using IRBuilder in the binary reader.
The only functional change is that we now allow empty local names. Empty
names are perfectly valid.
|
|
|
| |
See https://github.com/WebAssembly/memory64/pull/92
|
|
|
|
|
|
|
|
| |
This has not been emitted in LLVM since
https://github.com/llvm/llvm-project/commit/3f34e1b883351c7d98426b084386a7aa762aa366.
The corresponding proposed tool-conventions change:
https://github.com/WebAssembly/tool-conventions/pull/236
|
|
|
|
|
|
|
|
| |
Emscripten's JS loader code for wasm-split isn't prepared for handling
multiple tables that binaryen automatically creates when reference types
are enabled (especially in conjunction with dynamic loading). For now,
disable creation of multiple tables when using Emscripten's table ABI
(distinguished by importing or exporting a table named
"__indirect_function_table".
|
|
|
|
| |
table.fill was introduced by the reference-types proposal (but also, only
makes sense among the other bulk memory operations, so require both).
|
|
|
|
|
|
|
|
|
|
|
| |
These were added to avoid common problems with closed world mode, but
in practice they are causing more harm than good, forcing users to work
around them. In the meantime (until #6965), remove this validation to unblock
current toolchain makers.
Fix GlobalTypeOptimization and AbstractTypeRefining on issues that this
uncovers: without this validation, it is possible to run them on more wasm
files than before, hence these were not previously detected. They are
bundled in this PR because their tests cannot validate before this PR.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When EH+GC are enabled then wasm has non-nullable types, and the
sent exnref should be non-nullable. In BinaryenIR we use the non-
nullable type all the time, which we also do for function references
and other things; we lower it if GC is not enabled to a nullable type
for the binary format (see `WasmBinaryWriter::writeType`, to which
comments were added in this PR). That is, this PR makes us handle
exnref the same as those other types.
A new test verifies that behavior. Various existing tests are updated
because ReFinalize will now use the more refined type, so this is
an optimization. It is also a bugfix as in #6987 we started to emit
the refined form in the fuzzer, and this PR makes us handle it
properly in validation and ReFinalization.
|
|
|
|
|
|
|
|
|
|
|
|
| |
getModuleElement (#6998)
Passing a constant string to functions requires memory allocation, and
allocation is inherently slow. Since we are using C++17, we can use
string_view and remove this unnecessary allocation.
Although the code seems simple enough for the optimizer to remove this
allocation after inlining, tests on Clang 18 show that this is not the
case (on Apple Silicon at least).
|
|
|
|
|
|
|
| |
Support 5-segment source mappings, which add a name.
Reference:
https://github.com/tc39/source-map/blob/main/source-map-rev3.md#proposed-format
|
|
|
|
| |
Some versions of libcxx or clang error without this, apparently due to
Type being a forward declaration.
|
|
|
|
|
|
|
| |
This raises the number of locals accepted by the binary parser to the
absolute limit in the spec. A warning is now printed when writing a
binary file if the Web limit of 50,000 locals is exceeded.
Fixes #6968.
|
|
|
|
|
|
|
|
|
|
| |
Note: FP16 is a little different from F32/F64 since it can't represent
the full 2^16 integer range. 65504 is the max whole integer. This leads
to some slightly strange behavior when converting integers greater than
65504 since they become infinity.
Specified at
https://github.com/WebAssembly/half-precision/blob/main/proposals/half-precision/Overview.md
|
|
|
|
|
|
|
|
|
| |
The purpose of the datacount section is to pre-declare how many data
segments there will be so that engines can allocate space for them
and not have to back patch subsequent instructions in the code section
that refer to them. Once we use IRBuilder in the binary parser, we will
have to have the data segments available by the time we parse
instructions that use them, so eagerly construct the data segments when
parsing the datacount section.
|
|
|
|
|
|
|
|
| |
In preparation for using IRBuilder in the binary parser, eagerly create
Functions when parsing the function section so that they are already
created once we parse the code section. IRBuilder will require the
functions to exist when parsing calls so it can figure out what type
each call should have, even when there is a call to a function whose
body has not been parsed yet.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When a struct.get or array.get is optimized to have a null reference
operand, its return type loses meaning since the operation will always
trap. Previously when refinalizing such expressions, we just left their
return type unchanged since there was no longer an associated struct or
array type to calculate it from. However, this could lead to a strange
setup where the stale return type was the last remaining use of some
heap type in the module. That heap type would never be emitted in the
binary, but it was still used in the IR, so type optimizations would
have to keep updating it. Our type collecting logic went out of its way
to include the return types of struct.get and array.get expressions to
account for this strange possibility, even though it otherwise collected
only types that would appear in binaries.
In principle, all of this should have applied to `call_ref` as well, but
the type collection logic did not have the necessary special case, so
there was probably a latent bug there.
Get rid of these special cases in the type collection logic and make it
impossible for the IR to use a stale type that no longer appears in the
binary by updating such stale types during finalization. One possibility
would have been to make the return types of null accessors unreachable,
but this violates the usual invariant that unreachable instructions must
either have unreachable children or be branches or `(unreachable)`.
Instead, refine the return types to be uninhabitable non-nullable
references to bottom, which is nearly as good as refining them directly
to unreachable.
We can consider refining them to `unreachable` in the future, but
another problem with that is that it would currently allow the parsers
to admit more invalid modules with arbitrary junk after null accessor
instructions.
|
|
|
|
|
|
|
| |
This avoids creating a large Literals (SmallVector of Literal) and then copying it. All the places
that construct GCData do not need the Literals afterwards.
This gives a 7% speedup on the --precompute benchmark from #6931
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
methods (#6936)
This just moves code around. As a result, isRef() vanishes entirely from the
profiling traces in #6931, since now the core isRef/Tuple/etc. methods are
all inlineable.
This also required some reordering of wasm-type.h, namely to move HeapType
up front. No changes to that class otherwise.
TypeInfo is now in the header. getTypeInfo is now a static method on Type.
This has the downside of moving internal details into the header, and it may
increase compile time a little. The upside is making the --precompute benchmark
from #6931 significantly faster, 33%, and it will also help the many
Type::isNonNullable() etc. calls we have scattered around the codebase in
other passes too.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There are a few heap types that are hard-coded to be considered public
and therefore allowed on module boundaries even in --closed-world mode,
specifically to support js-string-builtins. We previously considered
both open and closed (i.e. final) mutable i8 arrays to be public in this
manner, but js-string-builtins only uses the closed versions, so remove
the open versions.
This fixes a particular bug in which Unsubtyping optimized a private
array type to be equivalent to an ignorable public array type,
incorrectly changing the behavior of a cast, but it does not address the
larger problem of optimizations producing types that are equivalent to
public types. Add a TODO about that problem for now.
Fixes #6935.
|
|
|
|
|
|
|
|
| |
We were doing a debug logging for every LEB byte. It turns out that the
isDebugEnabled() calls are expensive when called so frequently: in a
release+assertion build, even with debug disabled, these checks are the
highest thing in the profile. This PR removes the checks, which makes
binary reading 12% faster.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Unlike other module elements, types are not stored on the `Module`.
Instead, they are collected by traversing the IR before printing and
binary writing. The code that collects the types tries to optimize the
order of rec groups based on the number of times each type is used. As a
result, the output order of types generally has no relation to the input
order of types. In addition, most type optimizations rewrite the types
into a single large rec group, and the order of types in that group is
essentially arbitrary. Changes to the code for counting type uses,
sorting types, or sorting rec groups can yield very large changes in the
output order of types, producing test diffs that are hard to review and
potentially harming the readability of tests by moving output types away
from the corresponding input types.
To help make test output more stable and readable, introduce a tool
option that causes the order of output types to match the order of input
types as closely as possible. It is implemented by having the parsers
record the indices of the input types on the `Module` just like they
already record the type names. The `GlobalTypeRewriter` infrastructure
used by type optimizations associates the new types with the old indices
just like it already does for names and also respects the input order
when rewriting types into a large recursion group.
By default, wasm-opt and other tools clear the recorded type indices
after parsing the module, so their default behavior is not modified by
this change.
Follow-on PRs will use the new flag in more tests, which will generate
large diffs but leave the tests in stable, more readable states that
will no longer change due to other changes to the optimizing type
sorting logic.
|
|
|
|
|
| |
This new API on lazy local graphs allows us to use laziness in another place,
StackIR opts. This makes writing the binary (which includes StackIR opts, when
those are enabled), 10% faster.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Before this change, replace lane was converting all the F16 lanes to F32
and then replacing one lane with the F16 (I32 representation) value, but
it did not then convert all the other lanes back to F16 (I32). To fix
this we can just leave the lanes as I32 and replace the one lane.
Note: Previous replace lane tests didn't catch this since they started
with vectors with all zeros so the F32->I32 didn't matter. Also, other
operations don't run into this issue since they iterate over all lanes
and convert the F32's back to F16 (I32).
---------
Co-authored-by: Alon Zakai <alonzakai@gmail.com>
|
|
|
|
|
|
|
| |
This renames `Catch(All)_P3` enum to denote the old Phase 3
`catch(_all)` instructions to `Catch(All)_Legacy`, which sounds clearer.
This is also to be consistent with
https://github.com/llvm/llvm-project/pull/107187.
|
|
|
|
|
| |
This replaces direct access of the data structure graph.*influences[foo] with a call
graph.get*influences(foo). This will allow a later PR to make those calls optionally
lazy.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Specified at
https://github.com/WebAssembly/half-precision/blob/main/proposals/half-precision/Overview.md
A few notes:
- The F32x4 and F64x2 versions of madd and nmadd are missing spect
tests.
- For madd, the implementation was incorrectly doing `(b*c)+a` where it
should be `(a*b)+c`.
- For nmadd, the implementation was incorrectly doing `(-b*c)+a` where
it should be `-(a*b)+c`.
- There doesn't appear to be a great way to actually implement a fused
nmadd, but the spec allows the double rounded version I added.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Before we just had a map that people would access with localGraph.getSetses[get],
while now it is a call localGraph.getSets(get), which more nicely hides the internal
implementation details.
Also rename getSetses => getSetsMap.
This will allow a later PR to optimize the internals of this API.
This is performance-neutral as far as I can measure. (We do replace a direct read
from a data structure with a call, but the call is in a header and should always get
inlined.)
|
|
|
|
|
|
|
| |
The instructions relaxed_fma and relaxed_fnma have been renamed to
relaxed_madd and relaxed_nmadd.
https://github.com/WebAssembly/relaxed-simd/blob/main/proposals/relaxed-simd/Overview.md#binary-format
|
|
|
|
| |
Specified at
https://github.com/WebAssembly/half-precision/blob/main/proposals/half-precision/Overview.md
|
|
|
|
|
|
|
|
|
|
| |
visitBlock() and validateCallParamsAndResult() both assumed they were
running inside a function, but might be called on global code too. Calls
and blocks are invalid in global positions, so we should error there, but
must do so properly without a null deref.
Fixes #6847
Fixes #6848
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Spec tests use constants like `ref.array` and `ref.eq` to assert that
exported function return references of the correct types. Support more
such constants in the wast parser.
Also fix a bug where the interpretation of `array.new_data` for arrays
of packed fields was not properly truncating the packed data. Move the
function for reading fields from memory from literal.cpp to
wasm-interpreter.h, where the function for truncating packed data lives.
Other bugs prevent us from enabling any more spec tests as a result of
this change, but we can get farther through several of them before
failing. Update the comments about the failures accordingly.
|
|
|
| |
Ensure the "fp16" feature is enabled for FP16 instructions.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is in quite ancient code, so it's a long-standing issue, but it got worse
when we enabled StackIR in more situations (#6568), which made it more
noticeable, I think.
For example, testing on test_biggerswitch in Emscripten, the LLVM part
is pretty slow too so the Binaryen slowdown didn't stand out hugely, but
just doing
wasm-opt --optimize-level=2 input.wasm -o output.wasm
(that is, do no work, but set the optimize level to 2 so that StackIR opts
are run) used to take 28 seconds (!). With this PR that goes down to less
than 1.
|
|
|
|
| |
Specified at
https://github.com/WebAssembly/half-precision/blob/main/proposals/half-precision/Overview.md
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Spec tests pass the value `ref.extern n`, where `n` is some integer,
into exported functions that expect to receive externrefs and receive
such values back out as return values. The payload serves to distinguish
externrefs so the test can assert that the correct one was returned.
Parse these values in wast scripts and represent them as externalized
i31refs carrying the payload. We will need a different representation
eventually, since some tests explicitly expect these externrefs to not
be i31refs, but this suffices to get several new tests passing.
To get the memory64 version of table_grow.wast passing, additionally fix
the interpreter to handle growing 64-bit tables correctly.
Delete the local versions of the upstream tests that can now be run
successfully.
|
|
|
|
|
|
|
|
| |
The leading bytes that indicate what kind of heap type is being defined
are bytes, but we were previously treating them as SLEB128-encoded
values. Since we emit the smallest LEB encodings possible, we were
writing the correct bytes in output files, but we were also improperly
accepting binaries that used more than one byte to encode these values.
This was caught by an upstream spec test.
|
|
|
|
|
|
| |
* Add interpreter support for exnref values.
* Fix optimization passes to support try_table.
* Enable the interpreter (but not in V8, see code) on exceptions.
|
|
|
|
|
|
|
|
|
| |
IRBuilder is responsible for validation involving type annotations on GC
instructions because those type annotations may not be preserved in the
built IR to be used by the main validator. For `array.init_elem`, we
were not using the type annotation to validate the element segment,
which allowed us to parse invalid modules when the reference operand was
a nullref. Add the missing validation in IRBuilder and fix a relevant
spec test.
|
|
|
|
|
|
|
|
| |
Replace code that checked `isStruct()`, `isArray()`, etc. in sequence
with uses of `HeapType::getKind()` and switch statements. This will make
it easier to find the code that needs updating if/when we add new heap
type kinds in the future. It also makes it much easier to find code that
already needs updating to handle continuation types by grepping for
"TODO: cont".
|
|
|
|
|
|
|
|
|
|
|
|
| |
Most of our type optimization passes emit all non-public types as a
single large rec group, which trivially ensures that different types
remain different, even if they are optimized to have the same structure.
Usually emitting a single large rec group is fine, but it also means
that if the module is split, all of the types will need to be repeated
in all of the split modules. To better support this use case, add a pass
that can split the large rec group back into minimal rec groups, taking
care to preserve separate type identities by emitting different
permutations of the same group where possible or by inserting unused
brand types to differentiate them.
|
|
|
|
|
| |
Audit the remaining ocurrences of `== HeapType::` and fix those that did
not handle shared types correctly. Add tests for some of the fixes;
others are NFC but clarify the code.
|
|
|
|
|
| |
Also use TableInit in the interpreter to initialize module's table
state, which will now handle traps properly, fixing #6431
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The previous rules for stale types were complicated and hard to
remember: in general it was ok for result types to be further refinable
as long as they were not refinable all the way to `unreachable`, but
control flow structures had a carve-out and it was ok for them to be
refinable all the way to unreachable.
Simplify the rules so that further refinable result types are always ok,
no matter what they can be refined to and no matter what kind of
instruction is being validated. This will be much easier to remember and
reason about.
This relaxation of the rules strictly increases the set of valid IR, so
no passes or tests need to be updated. It does make it possible for us
to miss type refinement opportunities that previously would have been
validation errors, but only in cases where non-control-flow instructions
could have been refined all the way to unreachable, so the risk seems
small.
|
|
|
|
|
|
|
|
| |
Diff without whitespace is smaller.
* HeapType::ext was handled in two places. The second place was wrong, but not reached.
* Near the end all we have left are refs, so no need to check isRef etc.
* Simplify the code to get the heap type once.
|
|
|
|
|
|
|
| |
This is based on these two proposals:
* https://github.com/WebAssembly/tool-conventions/blob/main/BuildId.md
* https://github.com/tc39/source-map/blob/main/proposals/debug-id.md
|
|
|
|
|
|
| |
Since reference types only introduced function and extern references,
all of the types in the `any` hierarchy require GC, including `none`.
Fixes #6839.
|