| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
|
|
|
|
|
|
| |
Implement `ref.i31_shared` the new instruction for creating references
to shared i31s. Implement binary and text parsing and emitting as well
as interpretation. Copy the upstream spec test for i31 and modify it so
that all the heap types are shared. Comment out some parts that we do
not yet support.
|
|
|
|
|
|
|
|
| |
The full syntax for an expression in an element syntax looks like
`(item (ref.null none))`, but we have been printing the abbreviated
version, which omits the `(item ...)`. This abbreviation is only valid
when the item has only a single instruction, so it is not always correct
to use it. Rather than determining whether or not to use the
abbreviation on a case-by-case basis, always print the full syntax.
|
|
|
| |
This edge case make the lowering a little more tricky.
|
|
|
| |
Test was converted using port_passes_tests_to_lit.py.
|
|
|
|
|
| |
Eventually we will need to do some tuning of compile time speed, but for
now it is going to be simpler to do all the opts, in particular because it makes
writing tests simpler.
|
|
|
|
|
|
|
| |
Rather than trying to trampoline primary-to-secondary calls through an
existing table, just create a fresh table for this purpose. This ensures
that modifications to the existing tables cannot interfere with
primary-to-secondary calls and conversely that loading the secondary
module cannot overwrite modifications to the tables.
|
|
|
|
|
|
|
| |
Previously we just did not optimize cases where our escape analysis showed an
allocation flowed into a cast that failed. However, after inlining there can be
real-world cases where that happens, even in traps-never-happen mode (if the
cast is behind a conditional branch), so it seems worth optimizing.
|
|
|
|
|
| |
This is a tiny bit more code but it is more consistent with other
operations, and it saves work later.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously the pass would monomorphize a call when we were sending more
refined types than the target expects. This generalizes the pass to also consider
the case where we send a constant in a parameter.
To achieve that, this refactors the pass to explicitly define the "call context",
which is the code around the call (inputs and outputs) that may end up leading
to optimization opportunities when combined with the target function. Also
add comments about the overall design + roadmap.
The existing test is mostly unmodified, and the diff there is smaller when
ignoring whitespace. We do "regress" those tests by adding more local.set
operations, as in the refactoring that makes things a lot simpler, that is, to
handle the general case of an operand having either a refined type or be a
constant, we copy it inside the function, which works either way. This
"regression" is only in the testing version of the pass (the normal version
runs optimizations, which would remove that extra code).
This also enables the pass when GC is disabled. Previously we only handled
refined types, so only GC could benefit. Add a test for MVP content
specifically to show we operate there as well.
|
| |
|
|
|
|
|
|
|
|
|
| |
Normally we use it when optimizing (above a certain level). This lets the user
prevent it from being used even then.
Also add optimization options to wasm-metadce so that this is possible
there as well and not just in wasm-opt (this also opens the door to running
more passes in metadce, which may be useful later).
|
|
|
|
|
|
|
|
|
| |
There are times after collecting a profile, we wish to manually include
specific functions into the primary module.
It could be due to non-deterministic profiling or functions for error
scenarios (e.g. _trap).
This PR helps to unlock this workflow by honoring both the
`--keep-funcs` flag as well as the `--profile` flag
|
|
|
|
| |
The `filecheck` command used for tests does not support `CHECK-SAME`, so
use should be avoided.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Anything else right before an unreachable is removed by the main DCE
pass anyhow, but because of the structured form of BinaryenIR we can't remove
a drop. That is, this is the difference between
(i32.eqz
(i32.const 42)
(unreachable)
)
and
(drop
(call $foo)
)
(unreachable)
In both cases the unreachable is preceded by something we don't need,
but in the latter case it must remain in BinaryenIR for validation.
To optimize this, add a rule in StackIR.
Fixes #6715
|
|
|
|
|
|
|
|
|
| |
Rename instructions `extern.internalize` into `any.convert_extern` and
`extern.externalize` into `extern.convert_any` to follow more closely
the spec. This was changed in
https://github.com/WebAssembly/gc/issues/432.
The legacy name is still accepted in text inputs and in the C and JS
APIs.
|
|
|
|
|
|
|
|
|
| |
(#6709)
Previously the replacement select got the debug info, but we should also copy it
to the values, as often optimizations lead to one of those values remaining by
itself.
Similar to #6652 in general form.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
RefTest (#6692)
CFP focuses on finding when a field always contains a constant, and then replaces
a struct.get with that constant. If we find there are two constant values, then in some
cases we can still optimize, if we have a way to pick between them. All we have is the
struct.get and its reference, so we must use a ref.test:
(struct.get $T x (..ref..))
=>
(select
(..constant1..)
(..constant2..)
(ref.test $U (..ref..))
)
This is valid if, of all the subtypes of $T, those that pass the test have
constant1 in that field, and those that fail the test have constant2. For
example, a simple case is where $T has two subtypes, $T is never created
itself, and each of the two subtypes has a different constant value.
This is a somewhat risky operation, as ref.test is not necessarily cheap.
To mitigate that, this is a new pass, --cfp-reftest that is not run by
default, and also we only optimize when we can use a ref.test on what
we think will be a final type (because ref.test on a final type can be
faster in VMs).
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Give the type fuzzer the ability to generate shared heap types when the
shared-everything feature is enabled. It correctly ensures that shared
structs and arrays cannot reference unshared heap types, but that
unshared heap types can reference any heap type.
Update the main fuzzer so that for the time being it never uses the
shared-everything feature when generating additional heap types, so it
never generates shared types. We can lift this restriction once the main
fuzzer has been updated to properly handle shared types.
As a drive-by, fix some logic for subtracting feature sets from each
other that is used in this commit.
|
|
|
|
|
| |
If an allocation does not escape, then we can compute ref.eq for it: when
compared to itself the result is 1, and when compared to anything else it
is 0 (since it did not escape, anything else must be different).
|
|
|
| |
Add spec tests checking validation for structs and arrays.
|
|
|
| |
Fixes #6690
|
|
|
|
|
|
|
| |
This pass receives a list of functions to trace, and then wraps them in calls to
imports. This can be useful for tracing malloc/free calls, for example, but is
generic.
Fixes #6548
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
(#6688)
If we have
(global $g (struct.new $S
(i32.const 1)
(struct.new $T ..)
(ref.func $f)
))
then before this PR if we wanted to read the middle field we'd stop, as it is non-constant.
However, we can un-nest it, making it constant:
(global $g.unnested (struct.new $T ..))
(global $g (struct.new $S
(i32.const 1)
(global.get $g.unnested)
(ref.func $f)
))
Now the field is a global.get of an immutable global, which is constant. Using this
technique we can handle anything in a struct field, constant or not. The cost of adding
a global is likely offset by the benefit of being able to refer to it directly, as that opens
up more opportunities later.
Concretely, this replaces the constant values we look for in GSI with a variant over
constants or expressions (we do still want to group constants, as multiple globals
with the same constant field can be treated as a whole). And we note cases where we
need to un-nest, and handle those at the end.
|
|
|
|
|
|
|
|
|
|
|
| |
Implement binary and text parsing and printing of shared basic heap types and
incorporate them into the type hierarchy.
To avoid the massive amount of code duplication that would be necessary if we
were to add separate enum variants for each of the shared basic heap types, use
bit 0 to indicate whether the type is shared and replace `getBasic()` with
`getBasic(Unshared)`, which clears that bit. Update all the use sites to record
whether the original type was shared and produce shared or unshared output
without code duplication.
|
|
|
|
|
|
|
|
|
| |
When popping past an unreachable instruction would lead to popping from an empty
stack or popping an incorrect type, we need to avoid popping and produce new
Unreachable instructions instead to ensure we parse valid IR. The logic for
this was flawed and made the synthetic Unreachable come before the popped
unreachable child, which was not correct in the case that that popped
unreachable was a branch or other non-trapping instruction. Fix and simplify the
logic and re-enable the spec test that uncovered the bug.
|
|
|
|
| |
This is achieved by simply replacing the Literal with PossibleConstantValues, which
supports both Literals and Globals.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The code used i instead of index, as in this pseudocode:
for i in range(num_names):
index = readU32LEB() # index of the data segment to name
name = readName() # name to give that segment
data[i] = name # XXX 'i' should be 'index'
That (funnily enough) happened to always work before since we write names in
order. That is, normally given segments A,B,C we'd write then in the names section
as A,B,C. Then the reader, which had the bug, would always have i and index
identical in value anyhow. But if a wasm producer used different indexes, a
problem could happen.
To test this, add a binary file that has a reversed name section.
Fixes #6672
|
|
|
|
|
|
|
| |
As an abbreviation, a `typeuse` can be given as just a list of parameters and
results, in which case it corresponds to the index of the first function type
with the same parameters and results. That function type must also be an MVP
function type, i.e. it cannot have a nontrivial rec group, be non-final, or have
a declared supertype. The parser did not previously implement all of these rules.
|
|
|
|
| |
Also update the parser so that implicit type uses are not matched with shared
function types.
|
|
|
|
|
| |
Add the feature and flags to enable and disable it. Require the new feature to
be enabled for shared heap types to validate. To make the test work, update the
validator to actually check features for global types.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
With this we now print e.g.
(local.set $temp (; local type: i32 ;)
...
This can be nice in large functions to avoid needing to scroll up to
see the local type, e.g. when debugging why unsubtyping doesn't
work somewhere.
Also avoid [ ] in this mode, in favor of the standard (; ;), and put those
at the end rather than at the start.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Parse the text format for shared composite types as described in the
shared-everything thread proposal. Update the parser to use 'comptype' instead
of 'strtype' to match the final GC spec and add the new syntactic class
'sharecomptype'.
Update the type canonicalization logic to take sharedness into account to avoid
merging shared and unshared types. Make the same change in the TypeMerging pass.
Ensure that shared and unshared types cannot be in a subtype relationship with
each other.
Follow-up PRs will add shared abstract heap types, binary parsing and emitting
for shared types, and fuzzer support for shared types.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We automatically copy debuginfo in replaceCurrent(), but there are a few
places that do other operations than simple replacements. call-utils.h will
turn a call_ref with a select target into two direct calls, and we were missing
the logic to copy debuginfo from the call_ref to the calls.
To make this work, refactor out the copying logic from wasm-traversal, into
debuginfo.h, and use it in call-utils.h.
debuginfo.h itself is renamed from debug.h (as now this needs to be included
from wasm-traversal, which nearly everything does, and it turns out some files
have internal stuff like a debug() helper that ends up conflicing with the old
debug namespace).
Also rename the old copyDebugInfo function to copyDebugInfoBetweenFunctions
which is more explicit. That is also moved from the header to a cpp file because
it depends on wasm-traversal (so we'd end up with recursive headers otherwise).
That is fine, as that method is called after copying a function, which is not that
frequent. The new copyDebugInfoToReplacement (which was refactored out of
wasm-traversal) is in the header because it can be called very frequently (every
single instruction we optimize) and we want it to get inlined.
|
|
|
|
|
|
|
|
| |
The module splitting code incorrectly assumed that there would be at least one
active element segment and failed to initialize the table slot manager with a
function table if that was not the case. Fix the bug by setting the table even
when there are no active segments and add a test.
Fixes #6572 and #6637.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The binary writing of `stringview_wtf16.slice` requires scratch locals to store
the `start` and `end` operands while the string operand is converted to a
stringview. To avoid unbounded binary bloat when round-tripping, we detect the
case that `start` and `end` are already `local.get`s and avoid using scratch
locals by deferring the binary writing of the `local.get` operands until after
the stringview conversoins is emitted.
We previously optimized the scratch locals for `start` and `end` independently,
but this could produce incorrect code in the case where the `local.get` for
`start` is deferred but its value is changed by a `local.set` in the code for
`end`. Fix the problem by only optimizing to avoid scratch locals in the case
where both `start` and `end` are already `local.get`s, so they will still be
emitted in the original relative order and they cannot interfere with each other
anyway.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We need StringLowering to modify even public types, as it must replace every
single stringref with externref, even if that modifies the ABI. To achieve that
we told it that all string-using types were private, which let TypeUpdater update
them, but the problem is that it moves all private types to a new single
rec group, which meant public and private types ended up in the same group.
As a result, a single public type would make it all public, preventing optimizations
and breaking things as in #6630 #6640.
Ideally TypeUpdater would modify public types while keeping them in the same
rec groups, but this may be a very specific issue for StringLowering, and that
might be a lot of work. Instead, just make StringLowering handle public types of
functions in a manual way, which is simple and should handle all cases that
matter in practice, at least in J2Wasm.
|
|
|
|
|
|
|
|
|
|
| |
Without that logic we could end up dropping that particular effect. This actually
made a test pass when it should not: the modified test here has a function with
effects that are ok to remove, but it had a loop which adds MayNotReturn which
we should actually not remove, so it was removed erroneously.
To fix the test, add other effects there (local ones) that we can see are removable.
Also add a function with a loop to test that we do not remove an infinite loop,
which adds coverage for the fix here.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The parser was incorrectly handling the parsing of declarative element segments whose `init` is a `vec(expr)`.
https://webassembly.github.io/spec/core/binary/modules.html#element-section
Binry parser was simply reading a single `u32LEB` value for `init`
instead of parsing a expression regardless `usesExpressions = true`.
This commit updates the `WasmBinaryReader::readElementSegments` function
to correctly parse the expressions for declarative element segments by
calling `readExpression` instead of `getU32LEB` when `usesExpressions = true`.
Resolves the parsing exception:
"[parse exception: bad section size, started at ... not being equal to new position ...]"
Related discussion: https://github.com/tanishiking/scala-wasm/issues/136
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The old ordering in that pass did a topological sort while sorting by uses
both within topological groups and between them. That could be unoptimal
in some cases, however, and actually on J2CL output this pass made the
binary larger, which is how we noticed this.
The problem is that such a toplogical sort keeps topological groups in
place, but it can be useful to interleave them sometimes. Imagine this:
$c - $a
/
$e
\
$d - $b
Here $e depends on $c, etc. The optimal order may interleave the two
arms here, e.g. $a, $b, $d, $c, $e. That is because the dependencies define
a partial order, and so the arms here are actually independent.
Sorting by toplogical depth first might help in some cases, but also is not
optimal in general, as we may want to mix toplogical depths:
$a, $c, $b, $d, $e does so, and it may be the best ordering.
This PR implements a natural greedy algorithm that picks the global with
the highest use count at each step, out of the set of possible globals, which
is the set of globals that have no unresolved dependencies. So we start by
picking the first global with no dependencies and add at at the front; then
that unlocks anything that depended on it and we pick from that set, and
so forth.
This may also not be optimal, but it is easy to make it more flexible by
customizing the counts, and we consider 4 sorts here:
* Set all counts to 0. This means we only take into account dependencies,
and we break ties by the original order, so this is as close to the original
order as we can be.
* Use the actual use counts. This is the simple greedy algorithm.
* Set the count of each global to also contain the counts of its children,
so the count is the total that might be unlocked. This gives more weight
to globals that can unlock more later, so it is less greedy.
* As last, but weight children's counts lower in an exponential way, which
makes sense as they may depend on other globals too.
In practice it is simple to generate cases where 1, 2, or 3 is optimal (see
new tests), but on real-world J2CL I see that 4 (with a particular exponential
coefficient) is best, so the pass computes all 4 and picks the best. As a
result it will never worsen the size and it has a good chance of
improving.
The differences between these are small, so in theory we could pick any
of them, but given they are all modifications of a single algorithm it is
very easy to compute them all with little code complexity.
The benefits are rather small here, but this can save a few hundred
bytes on a multi-MB Java file. This comes at a tiny compile time cost, but
seems worth it for the new guarantee to never regress size.
|
|
|
|
|
|
|
|
|
| |
--log-execution=NAME will use NAME as the module for the logger
function import, rather than infer it.
If the name is not provided (--log-execution as before this PR) then we
will try to automatically decide which to use ("env", unless we see
another module name is used, which can be the case in optimized
modules).
|
|
|
|
|
| |
If we replace a type with another, use the original name for the new type,
and give the old a unique name (for the rare cases in which it has uses).
|
| |
|
|
|
|
|
| |
We had that logic right in other places, but the specific part of Vacuum that
looks at code that leads up to an unreachable did not check for infinite loops,
so it could remove them.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The SignaturePruning pass optimizes away parameters that it proves are safe to
remove. It turns out that that does not always match the definition of private
types, which is more restrictive. Specifically, if say all the types are in one big
rec group and one of them is used on an exported function then all of them are
considered public (as the rec group is). However, in closed world, it would be ok
to leave that rec group unchanged but to create a pruned version of that type
and use it, in cases where we see it is safe to remove a parameter. (See the
testcase for a concrete example.)
To put it another way, SignaturePruning already proves that a parameter is
safe to remove in all the ways that matter. Before this PR, however, the testcase
in this PR would error - so this PR is not an optimization but a bugfix, really -
because SignaturePruning would see that a parameter is safe to remove but
then TypeUpdating would see the type is public and so it would leave it alone,
leading to a broken module.
This situation is in fact not that rare, and happens on real-world Java code.
The reason we did not notice it before is that typically there are no remaining
SignaturePruning opportunities late in the process (when other closed world
optimizations have typically led to a single big rec group).
The concrete fix here is to add additionalPrivateTypes to a few more places
in TypeUpdating. We already supported that for cases where a pass knew
better than the general logic what can be modified, and this adds that
ability to the signature-rewriting logic there. Then SignaturePruning can
send in all the types it has proven are safe to modify.
* Also necessary here is to only add from additionalPrivateTypes if the type
is not already in our list (or we'd end up with duplicates in the final rec
group).
* Also move newSignatures in SignaturePruning out of the top level, which
was confusing (the pass has multiple iterations, and we want each to have
a fresh instance).
|
|
|
|
|
| |
Remove `SExpressionParser`, `SExpressionWasmBuilder`, and `cashew::Parser`.
Simplify gen-s-parser.py. Remove the --new-wat-parser and
--deprecated-wat-parser flags.
|
|
|
|
|
|
|
|
|
| |
(#6584)
Heap stores (struct.set) are optimized into the struct.new when they are adjacent
in a statement list.
Pushing struct.new down past irrelevant instructions increases the likelihood that
it ends up adjacent to sets.
|
|
|
|
| |
This renames old EH tests in the form of `-eh-old.wast` to
`-eh-legacy.wast`, to be clearer in names.
|
|
|
| |
The offsets are unsigned.
|
| |
|
|
|
|
| |
If we wanted to switch types in such cases we'd need to refinalize (which is likely
worth doing, though other passes should refine globals anyhow).
|