| Commit message | Author | Age | Files | Lines |
... | |
|
|
|
|
| |
This makes the behavior consistent with emcc builds where we don't run
finalization, and potentially makes testing and debugging easier.
Emscripten still strips the target features section when optimizing.
|
|
|
|
|
|
|
|
|
|
| |
Make the ID enum an `int8_t`, and make the Specific ID a `constexpr`
of that type.
This seems more idiomatic and makes some code simpler; see the
change to `find_all.h`, which no longer needs a cast to compile.
This has no performance impact.
|
| |
|
|
|
|
|
|
|
|
| |
unrefine the output (#7036)
Paradoxically, when a BrOn's castType is refined, its own type (the type that flows out of it)
can get un-refined: making the castType non-nullable means nulls no longer
flow on the branch, so they may flow out directly, making the BrOn's type nullable.
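For illustration only (the type, label, and local names here are made up, not taken from this PR), a minimal wat sketch of the typing involved:
(module
 (type $A (struct))
 (func $example (param $x anyref) (result anyref)
  (block $l (result (ref $A))
   (return
    ;; the cast type (ref $A) is non-nullable, so nulls do not branch to $l;
    ;; they flow out of the br_on_cast itself, which is why its own type
    ;; is the nullable anyref even though the cast type is refined
    (br_on_cast $l anyref (ref $A)
     (local.get $x)
    )
   )
  )
 )
)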
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Similar to
* https://github.com/WebAssembly/binaryen/pull/6330
* https://github.com/WebAssembly/binaryen/issues/6311
* https://github.com/WebAssembly/binaryen/pull/6912
* https://github.com/WebAssembly/binaryen/issues/5946
This extends the region we ignore the gcc warning in.
The warning:
ninja: job failed: /usr/bin/c++ -I/src/src -I/src/third_party/FP16/include -I/src/third_party/llvm-project/include -I/src -static -DBUILD_LLVM_DWARF -Wall -Werror -Wextra -Wno-unused-parameter -Wno-dangling-pointer -fno-omit-frame-pointer -fno-rtti -Wno-implicit-int-float-conversion -Wno-unknown-warning-option -Wswitch -Wimplicit-fallthrough -Wnon-virtual-dtor -fPIC -fdiagnostics-color=always -O3 -DNDEBUG -UNDEBUG -std=c++17 -MD -MT src/passes/CMakeFiles/passes.dir/Precompute.cpp.o -MF src/passes/CMakeFiles/passes.dir/Precompute.cpp.o.d -o src/passes/CMakeFiles/passes.dir/Precompute.cpp.o -c /src/src/passes/Precompute.cpp
In file included from /src/src/literal.h:27,
from /src/src/wasm.h:36,
from /src/src/ir/boolean.h:20,
from /src/src/ir/bits.h:20,
from /src/src/ir/properties.h:20,
from /src/src/ir/iteration.h:20,
from /src/src/passes/Precompute.cpp:30:
In copy constructor 'wasm::SmallVector<T, N>::SmallVector(const wasm::SmallVector<T, N>&) [with T = wasm::Expression*; long unsigned int N = 10]',
inlined from 'constexpr std::pair<_T1, _T2>::pair(const _T1&, const _T2&) [with _U1 = wasm::Select* const; _U2 = wasm::SmallVector<wasm::Expression*, 10>; typename std::enable_if<(std::_PCC<true, _T1, _T2>::_ConstructiblePair<_U1, _U2>() && std::_PCC<true, _T1, _T2>::_ImplicitlyConvertiblePair<_U1, _U2>()), bool>::type <anonymous> = true; _T1 = wasm::Select* const; _T2 = wasm::SmallVector<wasm::Expression*, 10>]' at /usr/include/c++/13.2.1/bits/stl_pair.h:559:21,
inlined from 'T& wasm::InsertOrderedMap<Key, T>::operator[](const Key&) [with Key = wasm::Select*; T = wasm::SmallVector<wasm::Expression*, 10>]' at /src/src/support/insert_ordered.h:112:29,
inlined from 'void wasm::Precompute::partiallyPrecompute(wasm::Function*)::StackFinder::visitSelect(wasm::Select*)' at /src/src/passes/Precompute.cpp:571:24,
inlined from 'static void wasm::Walker<SubType, VisitorType>::doVisitSelect(SubType*, wasm::Expression**) [with SubType = wasm::Precompute::partiallyPrecompute(wasm::Function*)::StackFinder; VisitorType = wasm::Visitor<wasm::Precompute::partiallyPrecompute(wasm::Function*)::StackFinder, void>]' at /src/src/wasm-delegations.def:50:1:
/src/src/support/small_vector.h:69:35: error: '<unnamed>.wasm::SmallVector<wasm::Expression*, 10>::fixed' may be used uninitialized [-Werror=maybe-uninitialized]
69 | : usedFixed(other.usedFixed), fixed(other.fixed), flexible(other.flexible) {
| ^~~~~~~~~~~~~~~~~~
In file included from /src/src/passes/Precompute.cpp:37:
/src/src/support/insert_ordered.h: In static member function 'static void wasm::Walker<SubType, VisitorType>::doVisitSelect(SubType*, wasm::Expression**) [with SubType = wasm::Precompute::partiallyPrecompute(wasm::Function*)::StackFinder; VisitorType = wasm::Visitor<wasm::Precompute::partiallyPrecompute(wasm::Function*)::StackFinder, void>]':
/src/src/support/insert_ordered.h:112:29: note: '<anonymous>' declared here
112 | std::pair<const Key, T> kv = {k, {}};
| ^~
|
|
|
| |
Corrected `maybeRefType` declaration to `maybeReftype`.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
TypeMerging works by representing the type definition graph as a
partitioned DFA and then refining the partitions to find mergeable
types. #7023 was due to a bug where the DFA included edges from public
types to their children, but did not necessarily include corresponding
states for those children.
One way to fix the bug would have been to traverse the type graph,
finding all reachable public types and creating DFA states for them, but
that might be expensive in cases where there are large graphs of public
types.
Instead, fix the problem by removing the edges from public types to
their children entirely. Types reachable from public types are also
public and therefore are not eligible to be merged, so these edges were
never necessary for correctness.
Fixes #7023.
|
|
|
|
|
|
|
|
|
|
|
| |
We already generated (throw ..) instructions in wasm, but it makes sense to model
throws from outside as well, as they cross the module boundary. This adds a new fuzzer
import to the generated modules, "throw", that just does a throw from JS etc.
Also be more precise about handling fuzzing-support imports in fuzz-exec: we now
check that logging functions start with "log" and error otherwise (this check is
now needed given that we have "throw", which is not logging). Also fix a minor issue
with name conflicts by using getValidFunctionName, both for the logging functions
and for throw.
|
|
|
|
|
|
|
|
| |
We only checked for the case of the immediate super being public while we
are private, but the public type might be a grandsuper instead. That is, any ancestor that
is public will prevent GTO from removing a field (since we can only add
fields on top of our ancestors). Also, the ancestors might not all have the
field, which would add more complexity to that particular assertion, so just
remove it, and add comprehensive tests.
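For illustration only (the type and export names are made up), a sketch of the "grandsuper" case described here:
(module
 (type $A (sub (struct (field i32))))
 (type $B (sub $A (struct (field i32))))
 (type $C (sub $B (struct (field i32))))
 ;; $A is public because it appears on an export; $C is private, but since
 ;; any public ancestor blocks field removal, the field cannot be removed
 ;; from $C either
 (func $keep (export "keep") (param (ref $A))
 )
)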
|
|
|
|
|
|
|
|
|
|
|
| |
These were added to avoid common problems with closed world mode, but
in practice they are causing more harm than good, forcing users to work
around them. In the meantime (until #6965), remove this validation to unblock
current toolchain makers.
Fix issues in GlobalTypeOptimization and AbstractTypeRefining that this
uncovers: without this validation it is possible to run them on more wasm
files than before, so these issues were not previously detected. They are
bundled in this PR because their tests cannot validate before this PR.
|
|
|
|
|
|
|
|
|
|
| |
Similar to #7017. As with that PR, this reduces some optimizations that were
valid: we tried to do something complex here and refine types in a public
rec group when it seemed safe to do so, but our analysis was incomplete.
The testcase here shows how another operation can end up causing a
dependency that breaks things, when another type that uses one we
modify is public. To be safe, ignore all public types. In the future perhaps we
can find a good way to handle "almost-private" types in public rec groups,
in closed world.
|
|
|
| |
Similar to #7017 and #7018
|
| |
|
|
|
|
|
|
| |
The TypeUpdater that it uses internally already does so, but we must also
ignore such types earlier and make no other modifications to them.
Helps #7015
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When EH+GC are enabled then wasm has non-nullable types, and the
sent exnref should be non-nullable. In Binaryen IR we use the non-nullable
type all the time, which we also do for function references
and other things; we lower it if GC is not enabled to a nullable type
for the binary format (see `WasmBinaryWriter::writeType`, to which
comments were added in this PR). That is, this PR makes us handle
exnref the same as those other types.
A new test verifies that behavior. Various existing tests are updated
because ReFinalize will now use the more refined type, so this is
an optimization. It is also a bugfix as in #6987 we started to emit
the refined form in the fuzzer, and this PR makes us handle it
properly in validation and ReFinalization.
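A minimal sketch of the Binaryen IR shape described (the tag, label, and function names are illustrative):
(module
 (tag $e)
 (func $get-exn (result exnref)
  (block $l (result (ref exn))  ;; non-nullable exnref, as catch_all_ref never sends a null
   (try_table (catch_all_ref $l)
    (throw $e)
   )
   (unreachable)
  )
 )
)
When GC is not enabled, the (ref exn) here is lowered to nullable exnref for the binary format, as described above.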
|
|
|
|
|
| |
Similar to Break, BrOn, etc., we must apply subtyping constraints of the
types we send to blocks, so that Unsubtyping will not remove subtypings
that are actually needed.
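For illustration (the type names are made up, and this is just one way a value can reach a block), a sketch of the kind of constraint involved:
(module
 (type $A (struct))
 (type $B (sub $A (struct)))
 (func $pass (param $x (ref $B)) (result (ref $A))
  ;; the block receives a (ref $B) as its value but is typed (ref $A),
  ;; so Unsubtyping must keep $B a subtype of $A
  (block (result (ref $A))
   (local.get $x)
  )
 )
)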
|
|
|
|
|
|
|
| |
This can help find errors in the middle of passes like Inlining, which do
multiple cycles and run optimizations in the middle.
We do this only in BINARYEN_PASS_DEBUG >= 2 to avoid slowing down the
timing reports at level 1.
|
| |
|
|
|
|
| |
This was used in asm2wasm (the asm.js to wasm compiler, used in
fastcomp, before the LLVM wasm backend replaced it).
|
|
|
|
|
| |
A mutable exported global might be shared with another module that
writes to it using the current type, which would be unsafe and which the
type system does not allow, so do not refine there.
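For illustration (the names are made up), the kind of global that must keep its declared type:
(module
 ;; another module may import this global and write any value of the
 ;; declared type to it, so refining the type based only on what this
 ;; module writes would be unsafe
 (global $g (export "g") (mut anyref) (ref.null any))
)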
|
|
|
|
|
|
|
|
|
|
| |
(#7008)
When we gather strings, we create a new global for each one, which is then
the canonical defining global for it and is used everywhere
else. We create such a global if we lack one, but if we happen to have such
a global - a global that simply defines a string - then we reuse it. But we
didn't handle the case where there was a use before the definition, and
failed to sort the definition before the use.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If we have
(drop
(block $b (result exnref)
(try_table (catch_all_ref $b)
then we don't really need to send the ref: it is dropped, so we can just replace
catch_all_ref with catch_all and then remove the drop and the block value.
MergeBlocks already had logic to remove block values, so it is the natural
place to add this.
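For illustration (the tag and function names are made up), the transformed form would be roughly:
(module
 (tag $t)
 (func $example
  ;; before: (drop (block $b (result exnref) (try_table (catch_all_ref $b) ...)))
  ;; after: nothing is sent, so the drop and the block value go away
  (block $b
   (try_table (catch_all $b)
    (throw $t)
   )
  )
 )
)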
|
|
|
|
|
|
|
|
|
|
| |
with ref.as_non_null (#7004)
(any.convert_extern/extern.convert_any (ref.as_non_null ..))
=>
(ref.as_non_null (any.convert_extern/extern.convert_any ..))
This then allows the RefAsNonNull to be combined with parents in some cases
(whereas the reverse allows nothing).
|
|
|
|
|
|
|
|
|
|
|
|
| |
getModuleElement (#6998)
Passing a constant string to functions requires memory allocation, and
allocation is inherently slow. Since we are using C++17, we can use
string_view and remove this unnecessary allocation.
Although the code seems simple enough for the optimizer to remove this
allocation after inlining, tests on Clang 18 show that this is not the
case (on Apple Silicon at least).
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This allows
(block $out (result i32)
(try_table (catch..)
..
(br $out
(i32.const 42)
)
)
)
=>
(block $out (result i32)
(try_table (result i32) (catch..) ;; add a result
..
(i32.const 42) ;; remove the br around the value
)
)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
(#6994)
In #6984 we optimized dropped blocks even if they had unreachable code. In #6988
that part was reverted, and blocks with unreachable code were ignored once more.
However, I realized that the check was not actually for unreachable code, but for
having an unreachable child, so it would miss things like this:
(block
(block
..
(br $somewhere) ;; unreachable type, but no unreachable code
)
)
But it is useful to merge such blocks: we don't need the inner block here.
To fix this, just run ReFinalize if we change anything, which will propagate
unreachability as needed. I think MergeBlocks was written before we had
that utility, so it didn't use it...
This is not only useful for itself but will unblock an EH optimization in a
later PR, that has code in this form. It also simplifies the code by removing
the hasUnreachableChild checks.
|
|
|
|
|
| |
BrOn does not always send a value. This is an odd asymmetry in the wasm
spec, where br_on_null does not send the null on the branch (which makes sense,
but the asymmetry does mean we need to special-case it).
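A minimal wat sketch of that asymmetry (names are illustrative):
(module
 (func $example (param $x anyref)
  (block $l       ;; the branch target receives no value
   (drop
    ;; branches to $l if $x is null, without sending the null;
    ;; otherwise the non-null value flows out and is dropped here
    (br_on_null $l
     (local.get $x)
    )
   )
  )
 )
)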
|
|
|
|
|
|
|
|
| |
#6980 was missing the logic to reset flows after replacing a throw. The process
of replacing the throw introduces new code and in particular a drop, which
blocks branches from flowing to their targets.
In the testcase here, the br was turned into nop before this fix.
|
|
|
|
| |
Also make Try/TryTables with type none, and not just concrete types as
before.
|
|
|
| |
This error makes #6989 less confusing.
|
|
|
|
| |
We ignored legacy Trys in #6980, but they can also catch.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When I refactored the optimizeDroppedBlock logic in #6982, I didn't move the
unreachability check with that code, which was wrong. When that function
was called from another place in #6984, the fuzzer found an issue.
Diff without whitespace is smaller. This reverts almost all the test updates
from #6984 - those changes were on blocks with unreachable children.
The change was safe on them, but in general removing a block value in the
presence of unreachable code is tricky, so it's best to avoid it.
The testcase is a little bizarre, but it's the one the fuzzer found and I can't
find a way to generate a better one (other than to reduce it, which I did).
|
|
|
|
|
|
| |
Just call optimizeDroppedBlock from visitDrop to handle that.
Followup to #6982. This optimizes the new testcase added there. Some older
tests also improve.
|
|
|
|
|
|
|
|
| |
This change is NFC on all things we previously optimized, but also makes
us optimize TryTable, BrOn, etc., by replacing hard-coded logic for Break
with generic code.
Also simplify the code there a little - we didn't really need ControlFlowWalker.
|
|
|
|
|
|
|
|
| |
This just moves the code out into a function. A later PR will use it in another place.
Add a test that shows the motivation for that later PR: we fail to optimize away
a block return value at the top level of a function. Fixing that will involve calling
the new function here from another place.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
E.g.
(try_table (catch_all $catch)
(throw $e)
)
=>
(try_table (catch_all $catch)
(br $catch)
)
This can then allow other passes to remove the TryTable, if no throwing
things remain.
|
|
|
|
|
|
|
| |
Support 5-segment source mappings, which add a name.
Reference:
https://github.com/tc39/source-map/blob/main/source-map-rev3.md#proposed-format
|
|
|
|
| |
Some versions of libcxx or clang error without this, apparently due to
Type being a forward declaration.
|
|
|
|
|
|
|
| |
This raises the number of locals accepted by the binary parser to the
absolute limit in the spec. A warning is now printed when writing a
binary file if the Web limit of 50,000 locals is exceeded.
Fixes #6968.
|
|
|
|
|
|
|
|
|
|
|
| |
When we precompute something, we try to avoid allocating a new copy. That's important to
avoid many allocations each time we run Precompute - otherwise, each time we see a br
we'd allocate a fresh one, and likewise for its values. But we had a bug where we reused a RefFunc
as the value of a br without updating the type. It's actually tricky to reach a situation where
we find a RefFunc to reuse and it is different from the actual one we want, but the fuzzer
found one.
Fixes the fuzz bug reported on #6845 (but unrelated to that PR).
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Note: FP16 is a little different from F32/F64 since it can't represent
the full 2^16 integer range: 65504 is the largest representable whole number. This leads
to some slightly strange behavior when converting integers greater than
65504, since they become infinity.
Specified at
https://github.com/WebAssembly/half-precision/blob/main/proposals/half-precision/Overview.md
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Inlining was careful about nested calls like this:
(call $a
(call $b)
)
If we inlined the outer call first, we'd have
(block $inlined-code-from-a
..code..
(call $b)
)
After that, the inner call is a child of a block, not of a call. That is,
we've moved the inner call to another parent. To replace that
inner call when we inline, we'd need to update the new parent,
which would require work. To avoid that work, the pass simply
created a block in the middle:
(call $a
(block
(call $b)
)
)
Now the inner call's immediate parent will not change when we
inline the outer call.
However, it turns out that this was entirely unnecessary. We find
the calls using a post-order traversal, and we store the actions in
a vector that we traverse in order, so we only ever process things
in the optimal order of children before parents. And in that order
there is no problem: inlining the inner call first leads to
(call $a
(block $inlined-code-from-b
(..code..)
)
)
That does not affect the outer call's parent.
This PR removes the creation of the unnecessary blocks. This doesn't
improve the final output as optimizations remove the unneeded
blocks later anyhow, but it does make the code simpler and a little
faster. It also makes debugging less confusing. But this is not truly
NFC because --inlining (but not --inlining-optimizing) will actually
emit fewer blocks now (but only --inlining-optimizing is used by
default in production).
The diff on tests here is very small when ignoring whitespace. The
remaining differences are just emitting fewer obviously-unneeded
blocks. There is also one test that needed manual changes,
inlining-eh-legacy, because it tested that we do Pop fixups, but
after emitting one fewer block, those fixups were not needed. I
added a new test there with two nested calls, which does end up
needing those fixups. I also added such a test in
inlining_all-features so that we have coverage for such nested
calls (we might remove the eh-legacy file some day, and other
existing tests with nested calls that I found were more complex).
|
|
|
|
| |
In WasmGC modules there is often no memory at all, and we can skip
walking the code in this pass in such cases.
|
|
|
|
|
|
| |
This makes it slightly faster.
Followup to
https://github.com/WebAssembly/binaryen/pull/6953#discussion_r1765313401
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We may inline multiple times into a single function. Previously, if we did so, we
did the "fixups" such as ReFinalize and non-nullable local fixes once per such
inlining. But that is wasteful, as each ReFinalize etc. scans the whole function;
the fixups could instead be done once, after we copy in the code from all the
inlinings. That is what this PR does: it splits doInlining() into one function that
inlines code and one that does the updates afterwards, and the updates are done
once after all inlinings.
This turns out to be very important: a 5x speedup on two large real-world wasm
files I am looking at. The reason is that we actually inline more than once in
half the cases, and sometimes far more - in one case we inline over 1,000
times into a function! (and ReFinalized 1,000 times too many)
This is practically NFC, but it turns out that there are some tiny noticeable
differences between running ReFinalize once at the end vs. once after each
inlining. These differences are not really functional or observable in the
behavior of the code, and optimizations would remove them anyhow, but
they are noticeable in two tests here. The changes to tests are, in order:
* Different block names, just because the counter we use sees more things.
* In a testcase with unreachable code, we inline twice into a function, and
the first inlining brings in an unreachable, and ReFinalizing early will lead
to it propagating differently than if we wait to ReFinalize. (It actually leads
to another cycle of inlining in that case, as a fluke.)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We already parallelized collecting info about functions and finding
inlining opportunities, but the actual inlining - copying the code into
the target function - was done sequentially. It turns out that a lot of
work happens there: this PR makes the pass over 2x faster.
Details:
* Move nameHint to InliningAction, so it is there when we apply the
actions.
* Add a DoInlining internal pass which calls doInlining().
* Refactor OptUtils a little to make it easy to run DoInlining before
opts.
* Also remove the return value of doInlining which was not used.
|
|
|
|
|
|
|
|
|
| |
The purpose of the datacount section is to pre-declare how many data
segments there will be so that engines can allocate space for them
and not have to back patch subsequent instructions in the code section
that refer to them. Once we use IRBuilder in the binary parser, we will
have to have the data segments available by the time we parse
instructions that use them, so eagerly construct the data segments when
parsing the datacount section.
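For illustration (the names are made up), the kind of instruction that needs its data segment to already exist while the code section is being parsed:
(module
 (memory 1)
 (data $d "hello")   ;; in the binary format the data section comes after the code section
 (func $drop-it
  (data.drop $d)     ;; parsing this needs segment $d to exist already, hence datacount
 )
)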
|
|
|
|
|
|
|
|
| |
In preparation for using IRBuilder in the binary parser, eagerly create
Functions when parsing the function section so that they are already
created once we parse the code section. IRBuilder will require the
functions to exist when parsing calls so it can figure out what type
each call should have, even when there is a call to a function whose
body has not been parsed yet.
|