| Commit message | Author | Age | Files | Lines |
|
Also update the parser so that implicit type uses are not matched with shared
function types.
|
Since the BasicHeapTypes are in an enum, calling HeapType methods on them
requires something like `HeapType(HeapType::func).someMethod()`. This is
unnecessarily verbose, so add a new `HeapTypes` namespace that contains
constexpr HeapType globals that can be used instead, shortening this to
`HeapTypes::func.someMethod()`.
|
Add the feature and flags to enable and disable it. Require the new feature to
be enabled for shared heap types to validate. To make the test work, update the
validator to actually check features for global types.
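For instance, a module like this (a minimal sketch; the type and global names are illustrative) should now validate only when the feature is enabled:
  (module
    (type $t (shared (struct (field i32))))
    (global $g (ref null $t) (ref.null $t)) ;; global types are feature-checked now
  )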
|
(#6659)
This avoids special-casing particular global init forms. After this we should
support everything in global inits that we support anywhere else.
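For example, inits like the following (a hypothetical sample; any otherwise-supported constant expression should now be handled):
  (global $g (ref i31) (ref.i31 (i32.const 5)))
  (global $h anyref (global.get $g))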
|
With this we now print e.g.
(local.set $temp (; local type: i32 ;)
...
This can be nice in large functions to avoid needing to scroll up to
see the local type, e.g. when debugging why unsubtyping doesn't
work somewhere.
Also avoid [ ] in this mode, in favor of the standard (; ;), and put those
at the end rather than at the start.
|
Parse the text format for shared composite types as described in the
shared-everything thread proposal. Update the parser to use 'comptype' instead
of 'strtype' to match the final GC spec and add the new syntactic class
'sharecomptype'.
Update the type canonicalization logic to take sharedness into account to avoid
merging shared and unshared types. Make the same change in the TypeMerging pass.
Ensure that shared and unshared types cannot be in a subtype relationship with
each other.
Follow-up PRs will add shared abstract heap types, binary parsing and emitting
for shared types, and fuzzer support for shared types.
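For reference, the new 'sharecomptype' syntax wraps a composite type in `shared`, so the parser now accepts types such as (type names are illustrative):
  (module
    (type $f (shared (func (param i32))))
    (type $s (shared (struct (field (mut i32)))))
    (type $a (shared (array i8)))
  )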
|
We automatically copy debuginfo in replaceCurrent(), but there are a few
places that do other operations than simple replacements. call-utils.h will
turn a call_ref with a select target into two direct calls, and we were missing
the logic to copy debuginfo from the call_ref to the calls.
To make this work, refactor out the copying logic from wasm-traversal, into
debuginfo.h, and use it in call-utils.h.
debuginfo.h itself is renamed from debug.h (as now this needs to be included
from wasm-traversal, which nearly everything does, and it turns out some files
have internal stuff like a debug() helper that ends up conflicting with the old
debug namespace).
Also rename the old copyDebugInfo function to copyDebugInfoBetweenFunctions
which is more explicit. That is also moved from the header to a cpp file because
it depends on wasm-traversal (so we'd end up with recursive headers otherwise).
That is fine, as that method is called after copying a function, which is not that
frequent. The new copyDebugInfoToReplacement (which was refactored out of
wasm-traversal) is in the header because it can be called very frequently (every
single instruction we optimize) and we want it to get inlined.
|
Because the parser has five stages, it requires instantiating all of the
templates in parsers.h with up to five different contexts. Instantiating all
those templates in a single compilation unit takes a long time. On my machine, a
release build of wat-parser.cpp.o took 32 seconds. To reduce the time of
incremental rebuilds on machines with many cores, split the code across several
compilation units so that the templates need to be instantiated for just a
single context in each unit. On my machine the longest compilation time after
this splitting is 17 seconds. The time for a full release build also drops from
42 seconds to 33 seconds. On machines with fewer cores, the benefit may be
smaller or even negative, though.
|
The module splitting code incorrectly assumed that there would be at least one
active element segment and failed to initialize the table slot manager with a
function table if that was not the case. Fix the bug by setting the table even
when there are no active segments and add a test.
Fixes #6572 and #6637.
|
TableGet, Set, Size, Grow, Fill, Copy.
Also move "null" into shared-constants, to make the code
more consistent overall.
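A sketch of the table instructions now handled in the text format (hypothetical module for illustration):
  (module
    (table $t 10 funcref)
    (func $use (param $i i32) (result i32)
      (table.set $t (local.get $i) (ref.null func))
      (table.fill $t (i32.const 0) (ref.null func) (i32.const 5))
      (table.copy $t $t (i32.const 0) (i32.const 5) (i32.const 2))
      (drop (table.get $t (local.get $i)))
      (drop (table.grow $t (ref.null func) (i32.const 1)))
      (table.size $t)))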
|
The binary writing of `stringview_wtf16.slice` requires scratch locals to store
the `start` and `end` operands while the string operand is converted to a
stringview. To avoid unbounded binary bloat when round-tripping, we detect the
case that `start` and `end` are already `local.get`s and avoid using scratch
locals by deferring the binary writing of the `local.get` operands until after
the stringview conversion is emitted.
We previously optimized the scratch locals for `start` and `end` independently,
but this could produce incorrect code in the case where the `local.get` for
`start` is deferred but its value is changed by a `local.set` in the code for
`end`. Fix the problem by only optimizing to avoid scratch locals in the case
where both `start` and `end` are already `local.get`s, so they will still be
emitted in the original relative order and they cannot interfere with each other
anyway.
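For instance, in a case like this (local names are illustrative), both operands are plain `local.get`s, so their writing can be deferred together with no risk of interference:
  (stringview_wtf16.slice
    (local.get $str)   ;; converted to a stringview in the emitted binary
    (local.get $start) ;; deferred: emitted after the conversion
    (local.get $end))  ;; deferred together with $start
If either operand were anything more complex, scratch locals are used for both.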
|
We need StringLowering to modify even public types, as it must replace every
single stringref with externref, even if that modifies the ABI. To achieve that
we told it that all string-using types were private, which let TypeUpdater update
them, but the problem is that it moves all private types to a new single
rec group, which meant public and private types ended up in the same group.
As a result, a single public type would make it all public, preventing optimizations
and breaking things as in #6630 #6640.
Ideally TypeUpdater would modify public types while keeping them in the same
rec groups, but this may be a very specific issue for StringLowering, and that
might be a lot of work. Instead, just make StringLowering handle public types of
functions in a manual way, which is simple and should handle all cases that
matter in practice, at least in J2Wasm.
|
This adds ref.eq, ref.null, ref.is_null, ref.func.
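For example, the parser now accepts text like this (a minimal sketch):
  (module
    (elem declare func $f)
    (func $f (param $x eqref) (param $y eqref) (result i32)
      (drop (ref.null func))
      (drop (ref.func $f))
      (if (ref.is_null (local.get $x))
        (then (return (i32.const 0))))
      (ref.eq (local.get $x) (local.get $y))))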
|
Create a temp var to store the ChildIterator.
Fixes #6639
|
Without that logic we could end up dropping that particular effect. This actually
made a test pass when it should not: the modified test here has a function with
effects that are ok to remove, but it had a loop which adds MayNotReturn which
we should actually not remove, so it was removed erroneously.
To fix the test, add other effects there (local ones) that we can see are removable.
Also add a function with a loop to test that we do not remove an infinite loop,
which adds coverage for the fix here.
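A minimal illustration of the effect that must be kept (an infinite loop adds MayNotReturn):
  (func $infinite
    (loop $l
      (br $l))) ;; never returns; this effect must not be dropped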
|
The parser was incorrectly handling the parsing of declarative element segments whose `init` is a `vec(expr)`.
https://webassembly.github.io/spec/core/binary/modules.html#element-section
The binary parser was simply reading a single `u32LEB` value for `init`
instead of parsing an expression when `usesExpressions = true`.
This commit updates the `WasmBinaryReader::readElementSegments` function
to correctly parse the expressions for declarative element segments by
calling `readExpression` instead of `getU32LEB` when `usesExpressions = true`.
Resolves the parsing exception:
"[parse exception: bad section size, started at ... not being equal to new position ...]"
Related discussion: https://github.com/tanishiking/scala-wasm/issues/136
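For reference, the two encodings correspond to text like the following; the second (`vec(expr)`) form is the one that was mishandled:
  (elem declare func $f)               ;; init as a vector of function indices
  (elem declare funcref (ref.func $f)) ;; init as a vector of expressions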
|
The old ordering in that pass did a topological sort while sorting by uses
both within topological groups and between them. That could be suboptimal
in some cases, however, and on J2CL output this pass actually made the
binary larger, which is how we noticed this.
The problem is that such a topological sort keeps topological groups in
place, but it can be useful to interleave them sometimes. Imagine this:
  $c - $a
 /
$e
 \
  $d - $b
Here $e depends on $c, etc. The optimal order may interleave the two
arms here, e.g. $a, $b, $d, $c, $e. That is because the dependencies define
a partial order, and so the arms here are actually independent.
Sorting by topological depth first might help in some cases, but also is not
optimal in general, as we may want to mix topological depths:
$a, $c, $b, $d, $e does so, and it may be the best ordering.
This PR implements a natural greedy algorithm that picks the global with
the highest use count at each step, out of the set of possible globals, which
is the set of globals that have no unresolved dependencies. So we start by
picking the first global with no dependencies and adding it at the front; then
that unlocks anything that depended on it and we pick from that set, and
so forth.
This may also not be optimal, but it is easy to make it more flexible by
customizing the counts, and we consider 4 sorts here:
* Set all counts to 0. This means we only take into account dependencies,
and we break ties by the original order, so this is as close to the original
order as we can be.
* Use the actual use counts. This is the simple greedy algorithm.
* Set the count of each global to also contain the counts of its children,
so the count is the total that might be unlocked. This gives more weight
to globals that can unlock more later, so it is less greedy.
* Like the last, but weight children's counts lower in an exponential way, which
makes sense as they may depend on other globals too.
In practice it is simple to generate cases where 1, 2, or 3 is optimal (see
new tests), but on real-world J2CL I see that 4 (with a particular exponential
coefficient) is best, so the pass computes all 4 and picks the best. As a
result it will never worsen the size and it has a good chance of
improving.
The differences between these are small, so in theory we could pick any
of them, but given they are all modifications of a single algorithm it is
very easy to compute them all with little code complexity.
The benefits are rather small here, but this can save a few hundred
bytes on a multi-MB Java file. This comes at a tiny compile time cost, but
seems worth it for the new guarantee to never regress size.
|
--log-execution=NAME will use NAME as the module for the logger
function import, rather than infer it.
If the name is not provided (--log-execution as before this PR) then we
will try to automatically decide which to use ("env", unless we see
another module name is used, which can be the case in optimized
modules).
|
If we replace a type with another, use the original name for the new type,
and give the old a unique name (for the rare cases in which it has uses).
|
We had that logic right in other places, but the specific part of Vacuum that
looks at code that leads up to an unreachable did not check for infinite loops,
so it could remove them.
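A minimal sketch of code Vacuum must preserve:
  (func $f (result i32)
    (loop $l
      (br $l))      ;; infinite loop leading up to the unreachable: must stay
    (unreachable))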
|
The SignaturePruning pass optimizes away parameters that it proves are safe to
remove. It turns out that this does not always match the definition of private
types, which is more restrictive. Specifically, if, say, all the types are in one big
rec group and one of them is used on an exported function then all of them are
considered public (as the rec group is). However, in closed world, it would be ok
to leave that rec group unchanged but to create a pruned version of that type
and use it, in cases where we see it is safe to remove a parameter. (See the
testcase for a concrete example.)
To put it another way, SignaturePruning already proves that a parameter is
safe to remove in all the ways that matter. Before this PR, however, the testcase
in this PR would error - so this PR is not an optimization but a bugfix, really -
because SignaturePruning would see that a parameter is safe to remove but
then TypeUpdating would see the type is public and so it would leave it alone,
leading to a broken module.
This situation is in fact not that rare, and happens on real-world Java code.
The reason we did not notice it before is that typically there are no remaining
SignaturePruning opportunities late in the process (when other closed world
optimizations have typically led to a single big rec group).
The concrete fix here is to add additionalPrivateTypes to a few more places
in TypeUpdating. We already supported that for cases where a pass knew
better than the general logic what can be modified, and this adds that
ability to the signature-rewriting logic there. Then SignaturePruning can
send in all the types it has proven are safe to modify.
* Also necessary here is to only add from additionalPrivateTypes if the type
is not already in our list (or we'd end up with duplicates in the final rec
group).
* Also move newSignatures in SignaturePruning out of the top level, which
was confusing (the pass has multiple iterations, and we want each to have
a fresh instance).
|
Remove `SExpressionParser`, `SExpressionWasmBuilder`, and `cashew::Parser`.
Simplify gen-s-parser.py. Remove the --new-wat-parser and
--deprecated-wat-parser flags.
|
When generating assertions, traverse the `WASTScript` data structure rather than
interleaving assertion parsing with emitting.
|
Doing it before anything else can help a lot if there is a significant amount of
dead code that can be removed, as it saves work for all the later passes. We
did run this pass a few passes later down when GC was enabled, but even so
it is worthwhile to run it an additional time, and it makes sense to do so even
without GC (though in typical optimized LLVM outputs there will be little
dead code).
If there is no dead code then this is wasted work, but this is a fairly fast pass,
and I measure no significant slowdown due to this. E.g. on the 35 MB clang.wasm
(which is already optimized, so little dead code) it takes around a second, while
all of -O2 takes almost two minutes, so the difference is just 1%.
On J2CL I measure a 15% speedup in -O3 --closed-world -tnh, and also the
binary is 2.5% smaller, which means there is less work for later cycles of -O3.
|
(#6584)
Heap stores (struct.set) are optimized into the struct.new when they are adjacent
in a statement list.
Pushing struct.new down past irrelevant instructions increases the likelihood that
it ends up adjacent to sets.
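Roughly, the idea (a sketch with a hypothetical type $T):
  ;; before: the struct.new and struct.set are separated
  (local.set $ref (struct.new $T (i32.const 0)))
  (nop) ;; irrelevant instruction the struct.new can be pushed past
  (struct.set $T 0 (local.get $ref) (i32.const 42))
  ;; after pushing down, the pair is adjacent and can fold into:
  (local.set $ref (struct.new $T (i32.const 42)))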
|
caught it (#6626)
The DataSegment was manually added to .dataSegments, but we need to add it
using addDataSegment so the maps are updated and getDataSegment(name)
works.
Also add validation that would have caught this earlier: check that each item in
the item lists can be fetched by name.
|
The offsets are unsigned.
|
With this PR we generate global.gets in globals, which we did not do before.
We do that by replacing makeConst (the only thing we did before, for the
contents of globals) with makeTrivial, and add code to makeTrivial to sometimes
make a global.get. When no suitable global exists, makeGlobalGet will emit a
constant, so there is no danger in trying.
Also raise the number of globals a little.
Also explicitly note the current limitation that tuple globals must contain
tuple.make and nothing else (in particular, not global.get), and avoid adding such
invalid global.gets in tuple globals in the fuzzer.
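That is, the fuzzer may now emit shapes like these (a sketch using Binaryen's tuple extension syntax):
  (global $a i32 (i32.const 1))
  (global $b i32 (global.get $a)) ;; plain globals may now use global.get
  (global $t (tuple i32 i32)      ;; tuple globals: tuple.make and nothing else
    (tuple.make 2 (i32.const 1) (i32.const 2)))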
|
If we wanted to switch types in such cases we'd need to refinalize (which is likely
worth doing, though other passes should refine globals anyhow).
|
Use the new wast parser to parse a full script up front, then traverse the
parsed script data structure and execute the commands. wasm-shell had previously
used the new wat parser for top-level modules, but it now uses the new parser
for module assertions as well. Fix various bugs this uncovered.
After this change, wasm-shell supports all the assertions used in the upstream
spec tests (although not new kinds of assertions introduced in any proposals).
Uncomment various `assert_exhaustion` tests that we can now execute.
Other kinds of assertions remain commented out in our tests: wasm-shell now
supports `assert_unlinkable`, but the interpreter does not eagerly check for the
existence of imports, so those tests do not pass. Tests that check for NaNs also
remain commented out because they do not yet use the standard syntax that
wasm-shell now supports for canonical and arithmetic NaN results, and our
interpreter would not pass all of those tests even if they did use the standard
syntax.
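For example, wasm-shell can now run spec-style assertions such as (illustrative wast, not from the actual test suite):
  (assert_return (invoke "add" (i32.const 1) (i32.const 2)) (i32.const 3))
  (assert_exhaustion (invoke "recurse-forever") "call stack exhausted")
  (assert_unlinkable
    (module (import "missing" "f" (func)))
    "unknown import")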
|
validation (#6603)
GlobalRefining did not traverse module code, so it did not update global.gets
in other globals.
Add missing validation that actually errors on that: we did not check global.get
types.
These could be separate PRs but it would be difficult to test them separately.
|
NFC (#6600)
Followup to #6599.
|
This allows modules to contain both 32-bit and 64-bit segments.
In order to check the table/memory state when visiting segments we need
to ensure that memories/tables are visited only after their segments.
The comments in visitTable/visitMemory already assumed this but it
wasn't true in practice.
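That is, a hypothetical module like this mixes both kinds of segments:
  (module
    (memory $m32 1)
    (memory $m64 i64 1)
    (data (memory $m32) (i32.const 0) "hi")  ;; 32-bit segment
    (data (memory $m64) (i64.const 0) "hi")) ;; 64-bit segment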
|
This makes us compliant with the wasm spec by adding a cast: we use the refined
type for br_if fallthrough values, and the wasm spec uses the branch target. If the
two differ, we add a cast after the br_if to make things match.
Alternatively we could match the wasm spec's typing in our IR, but we hope the wasm
spec will improve here, and so this will only be temporary in that case. Even if not,
this is useful because by using the most refined type in the IR we optimize in the best
way possible, and only suffer when we emit fixups in the binary, but in practice those
cases are very rare: br_if is almost always dropped rather than used, in real-world
code (except for fuzz cases and exploits).
We check carefully when a br_if value is actually used (and not dropped) and its type
actually differs, and it does not already have a cast. The last condition ensures that
we do not keep adding casts over repeated roundtripping.
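Roughly what the emitted fixup looks like (a sketch; the label $l has the br_if's declared type, while the value is more refined):
  (block $l (result anyref)
    (ref.cast (ref i31)         ;; cast added so the used value keeps the
      (br_if $l                 ;; refined (ref i31) type Binaryen IR expects
        (ref.i31 (i32.const 1))
        (local.get $cond))))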
|
Changes to wasm-validator.cpp here are mostly for consistency between
elem and data segment validation.
|
test (#6596)
This existed before #6495 but became noticeable there. We only looked at
the fallthrough values in the later part of areConsecutiveInputsEqual, but
there can be invalidation due to the non-fallthrough part:
(i32.add
(local.get $x)
(block
(local.set $x ..)
(local.get $x)
)
)
The set can cause the local.get to differ the second time. To fix this,
check if the non-fallthrough part invalidates the fallthrough (but only
on the right hand side).
Fixes #6593
|
We settled on the name `WASM_EXNREF` in Emscripten for the new EH option.
https://github.com/emscripten-core/emscripten/blob/2bc5e3156f07e603bc4f3580cf84c038ea99b2df/src/settings.js#L782-L786
"New EH" sounds vague and I'm not sure if "experimental" is really
necessary anyway, given that the potential users of this option are aware
that this is a new spec that has been adopted recently.
To make the option names consistent, this renames `--translate-to-eh`
(the option that only runs the translator) to `--translate-to-exnref`,
and `--experimental-new-eh` to `--emit-exnref` (the option that runs the
translator at the end of the whole pipeline), and renames the pass and
variable names in the code accordingly as well.
In case anyone is using the old option names (and also to make the
Chromium CI pass), this does not delete the old options.
|
The whole annotation was parsed as a keyword, which prevented file paths with non-ASCII characters or paths starting with `/` or `.`.
Also, there was a typo: one was comparing `fileSize` rather than `lineSize` to `contents->npos`.
|
The stringref proposal has been superseded by the imported JS strings proposal,
but the former has many more operations than the latter. To reduce complexity,
remove all operations that are part of stringref but not part of imported
strings.
|
The stringview types from the stringref proposal have three irregularities that
break common invariants and require pervasive special casing to handle properly:
they are supertypes of `none` but not subtypes of `any`, they cannot be the
targets of casts, and they cannot be used to construct nullable references. At
the same time, the stringref proposal has been superseded by the imported
strings proposal, which does not have these irregularities. The cost of
maintaining and improving our support for stringview types is no longer worth the
benefit of supporting them.
Simplify the code base by entirely removing the stringview types and related
instructions that do not have analogues in the imported strings proposal and do
not make sense in the absence of stringviews.
Three remaining instructions, `stringview_wtf16.get_codeunit`,
`stringview_wtf16.slice`, and `stringview_wtf16.length` take stringview operands
in the stringref proposal but cannot be removed because they lower to operations
from the imported strings proposal. These instructions are changed to take
stringref operands in Binaryen IR, and to allow a graceful upgrade path for
users of these instructions, the text and binary parsers still accept but ignore
`string.as_wtf16`, which is the instruction used to convert stringrefs to
stringviews. The binary writer emits code sequences that use scratch locals and `string.as_wtf16` to keep the output valid.
Future PRs will further align binaryen with the imported strings proposal
instead of the stringref proposal, for example by making `string` a subtype of
`extern` instead of a subtype of `any` and by removing additional instructions
that do not have analogues in the imported strings proposal.
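So code like the following still parses, with the conversion treated as a no-op (local name is illustrative):
  (stringview_wtf16.length
    (string.as_wtf16       ;; accepted but ignored by the parsers
      (local.get $str)))   ;; the operand is a stringref in Binaryen IR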
|
#6587 was incorrect: It checked generativity early in an incremental manner, but
it did not accumulate that information as we do with hashes. As a result we
could end up optimizing something with a generative child, and sadly we lacked
testing for that case.
This adds incremental generativity computation alongside hashes. It also splits
out this check from isRelevant.
Also add a test for nested effects (as opposed to generativity), but that already
worked before this PR (as we already compute effects and invalidation as we go).
|
I recently added TableSize/Grow and noticed I didn't need these. It seems
they are superfluous.
|
(#6520)
;;@
with nothing else (no source:line) can be used to specify that the following
expression does not have any debug info associated to it. This can be used
to stop the automatic propagation of debug info in the text parsers.
The text printer has also been updated to output this comment when needed.
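For example (hypothetical file/line positions):
  ;;@ src/main.cpp:10:5
  (local.set $x (i32.const 1)) ;; annotated with src/main.cpp:10:5
  ;;@
  (local.set $y (i32.const 2)) ;; explicitly no debug info; propagation stops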
|