| Commit message | Author | Age | Files | Lines |
CFP is less precise than GUFA. In particular, when it flows types around it does
not consider which field it is flowing them to, and its core data structure
answers the question "if a struct.get is done on this type's field, what can be read?".
To see the issue this PR fixes, assume we have
A
/ \
B C
Then if we see a struct.set of $C, we know that value can be read by a struct.get
of $A (since a reference to a C can be stored in such a local/param/etc.), so we
propagate the value of that set to A. And, in general, anything in A can appear in B
(say, if we see a copy, a struct.set of a struct.get, both operating on type A,
then either side might actually be a B), so we propagate from A to B. But
now we have propagated something from C to B, which might be of an
incompatible type.
This cannot cause runtime issues, as it just means we are propagating more
than we should, and will end up with less-useful results. But it can break
validation if no other value is possible but one with an incompatible type,
as we'd replace a struct.get $B with a value that only makes sense for C.
(The qualifier "no other value is possible" was added in the previous
sentence because if another one is possible then we'd end up with too
many values to infer anything, and not optimize at all, avoiding any error.)
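The propagation described above can be sketched as follows; this is a minimal illustration with hypothetical names, not Binaryen's actual data structures:

```python
# Sets flow up to supertypes (a struct.get of a supertype may read them), and
# copies flow supertype info back down to every subtype. Composing the two
# lets a value written only to C wrongly reach its sibling B.

supertype = {"A": None, "B": "A", "C": "A"}
subtypes = {"A": ["B", "C"], "B": [], "C": []}
readable = {t: set() for t in supertype}  # what a struct.get of t might read

def note_set(ty, value):
    # struct.set $C can be observed by struct.get $A, etc.
    while ty is not None:
        readable[ty].add(value)
        ty = supertype[ty]

def propagate_copies():
    # A copy on A (struct.set of a struct.get, both on A) means anything
    # readable from A may also appear in any subtype of A.
    for ty, subs in subtypes.items():
        for sub in subs:
            readable[sub] |= readable[ty]

note_set("C", "c_only_value")
propagate_copies()
assert "c_only_value" in readable["B"]  # the over-propagation this PR fixes
```

The final assert demonstrates exactly the unwanted C-to-B flow described above.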
RefTest (#6692)
CFP focuses on finding when a field always contains a constant, and then replaces
a struct.get with that constant. If we find there are two constant values, then in some
cases we can still optimize, if we have a way to pick between them. All we have is the
struct.get and its reference, so we must use a ref.test:
(struct.get $T x (..ref..))
=>
(select
(..constant1..)
(..constant2..)
(ref.test $U (..ref..))
)
This is valid if, of all the subtypes of $T, those that pass the test have
constant1 in that field, and those that fail the test have constant2. For
example, a simple case is where $T has two subtypes, $T is never created
itself, and each of the two subtypes has a different constant value.
This is a somewhat risky operation, as ref.test is not necessarily cheap.
To mitigate that, this is done in a new pass, --cfp-reftest, which is not run by
default, and we only optimize when we can use a ref.test of what
we think will be a final type (because ref.test on a final type can be
faster in VMs).
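The validity condition above can be sketched as a small check; the helper name and representation here are hypothetical, not the pass's real code:

```python
# Among all subtypes of $T, every type that passes `ref.test $U` must hold
# constant1 in the field, and every type that fails must hold constant2.

def can_use_reftest(subtypes_of_T, passes_test_U, field_value, c1, c2):
    for ty in subtypes_of_T:
        expected = c1 if passes_test_U(ty) else c2
        if field_value[ty] != expected:
            return False
    return True

# The simple case from the text: $T is never created itself, and each of its
# two subtypes holds a different constant in the field.
field_value = {"T1": 10, "T2": 20}
assert can_use_reftest(["T1", "T2"], lambda t: t == "T1", field_value, 10, 20)
```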
Heap2Local's (#6493)
CFP already had logic for truncating but not for sign-extending, which this
fixes.
Use the new helper function in Heap2Local as well. This changes the model
there from "truncate on set, sign-extend on get" to "truncate or sign-extend
on get". That is both simpler by reusing the same logic as CFP but also more
optimal: the idea of truncating on sets made sense since sets are rarer, but if
we must then sign-extend on gets, we can end up doing more work
overall (the truncations on sets are unnecessary if all gets are signed).
Found by #6486
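The "truncate or sign-extend on get" model for a packed i8 field can be sketched as follows (helper names are hypothetical):

```python
# A set may write a value wider than the packed field; only the low bits
# survive, and a signed get must then sign-extend them.

def get_i8_u(stored):           # unsigned get: truncate to 8 bits
    return stored & 0xFF

def get_i8_s(stored):           # signed get: truncate, then sign-extend
    v = stored & 0xFF
    return v - 0x100 if v & 0x80 else v

assert get_i8_u(0x1FF) == 0xFF  # only the low 8 bits remain
assert get_i8_s(0x1FF) == -1    # and they read back as negative when signed
assert get_i8_s(0x7F) == 0x7F   # no sign bit, no extension
```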
If we see A->f0 = A->f0 then we might be copying fields not only between
instances of A but also between instances of any subtype of A, and so if some
subtype has value x then that x might now have reached any other subtype of A
(even a sibling type, so long as A is their common parent).
We already thought we were handling that, but the mechanism we used to
do so (copying New info to Set info, and letting Set info propagate) was not
enough.
Also add a small constructor to save the work of computing subTypes again.
Add TODOs for some cases that we could optimize regarding copies but
do not, yet.
The logic ignored copied values, which was fine for struct.get operations but
not for struct.new.
The same bug was present in both: we ignored packing, so writing a larger
value than fits in the field would lead us to propagate the original, untruncated value.
Equirecursive is no longer standards track and its implementation is extremely
complex. Remove it.
Previously only WalkerPasses had access to the `getPassRunner` and
`getPassOptions` methods. Move those methods to `Pass` so all passes can use
them. As a result, the `PassRunner` passed to `Pass::run` and
`Pass::runOnFunction` is no longer necessary, so remove it.
Also update `Pass::create` to return a unique_ptr, which is more efficient than
having it return a raw pointer only to have the `PassRunner` wrap that raw
pointer in a `unique_ptr`.
Delete the unused template `PassRunner::getLast()`, which looks like it was
intended to enable retrieving previous analyses and has been in the code base
since 2015 but is not implemented anywhere.
An overview of this is in the README in the diff here (conveniently, it is near the
top of the diff). Basically, we fix up nn locals after each pass, by default. This keeps
things easy to reason about - what validates is what is valid wasm - but there are
some minor nuances as mentioned there, in particular, we ignore nameless blocks
(which are commonly added by various passes; ignoring them means we can keep
more locals non-nullable).
The key addition here is LocalStructuralDominance which checks which local
indexes have the "structural dominance" property of 1a, that is, that each get has
a set in its own block or an enclosing block that precedes it. I optimized that
function quite a lot to reduce the overhead of running this logic after each pass.
The overhead is something like 2% on J2Wasm and 0% on Dart (0% because in this
mode we shrink code size, so there is actually less work, and it balances out).
Since we run fixups after each pass, this PR removes logic to manually call the
fixup code from various places we used to call it (like eh-utils and various passes).
Various passes are now marked as requiresNonNullableLocalFixups => false.
That lets us skip running the fixups after them, which we normally do automatically.
This helps avoid overhead. Most passes still need the fixups, though - any pass
that adds a local, or a named block, or moves code around, likely does.
This removes a hack in SimplifyLocals that is no longer needed. Before we
worked to avoid moving a set into a try, as it might not validate. Now, we just do it
and let fixups happen automatically if they need to: in the common code they
probably don't, so the extra complexity seems not worth it.
Also removes a hack from StackIR. That hack tried to avoid roundtrip adding a
nondefaultable local. But we have the logic to fix that up now, and opts will
likely keep it non-nullable as well.
Various tests end up updated here because now a local can be non-nullable -
previous fixups are no longer needed.
Note that this doesn't remove the gc-nn-locals feature. That has been useful for
testing, and may still be useful in the future - it basically just allows nn locals in
all positions (that can't read the null default value at the entry). We can consider
removing it separately.
Fixes #4824
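The structural dominance property can be sketched over a toy block structure; this is a simplified illustration with a hypothetical IR, not the LocalStructuralDominance implementation:

```python
# A local.get is structurally dominated if a local.set of the same index
# appears earlier in its own block or in an enclosing block.

def structurally_dominated(body, live=frozenset()):
    # body is a list of: ("set", i), ("get", i), or ("block", [...])
    live = set(live)
    for item in body:
        if item[0] == "set":
            live.add(item[1])
        elif item[0] == "get":
            if item[1] not in live:
                return False
        else:  # a nested block: its sets do not dominate code after it
            if not structurally_dominated(item[1], frozenset(live)):
                return False
    return True

ok  = [("set", 0), ("block", [("get", 0)])]  # outer set dominates inner get
bad = [("block", [("set", 0)]), ("get", 0)]  # inner set does not escape
assert structurally_dominated(ok)
assert not structurally_dominated(bad)
```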
Nominal types don't make much sense without GC, and in particular trying to emit
them with typed function references but not GC enabled can result in invalid
binaries because nominal types do not respect the type ordering constraints
required by the typed function references proposal. Making this change was
mostly straightforward, but required fixing the fuzzer to use --nominal only
when GC is enabled and required exiting early from nominal-only optimizations
when GC was not enabled.
Fixes #4756.
This moves more logic from ConstantFieldPropagation into PossibleConstantValues,
that is, instead of handling the two cases of a Literal or a Name before calling
PossibleConstantValues, move that code into the helper class. That way all users of
PossibleConstantValues can benefit from it. In particular, this makes
DeadArgumentElimination now support optimizing immutable globals, as well as
ref.func and ref.null.
(The changes to test/lit/passes/dae-gc-refine-params.wast are to prevent the new
optimizations from kicking in, so that it still tests what it tested before.)
This just moves PossibleConstantValues to a new separate file
(as a preparation for other passes using it too).
This avoids cluttering the main wasm namespace, and clarifies what the
scanner does.
If we write an immutable global to a field, and that is the only thing
we ever write, then we can replace reads of the field with a get of
the global. To do that, this tracks immutable globals written to
fields and not just constant values.
Normally this is not needed, as if the global is immutable then we
propagate its constant value to everywhere anyhow. However, for
references this is useful: If we have a global immutable vtable,
for example, then we cannot replace a get of it with a constant.
So this PR helps with immutable reference types in globals, allowing
us to propagate global.gets of them to more places, which can then
allow further optimizations there.
This + later opts removes 25% of array.gets from j2wasm. I believe
almost all of those are itable calls, so this means those are getting
devirtualized now. I see something like a 5% speedup due to that.
Saves a little code size and might prevent some bugs.
Add struct.get tracking, and if a field is never read from, simply remove
it.
This will error if a field is written using struct.new with a value with side
effects. It is not clear we can handle that, as if the struct.new is in a
global then we can't save the other values to locals etc. to reorder
things. We could perhaps use other globals for it (ugh) but at least for
now, that corner case does not happen on any code I can see.
This allows a quite large code size reduction on j2wasm output (20%). The
reason is that many vtable fields are not actually read, and so removing
them and the ref.func they hold allows us to get rid of those functions,
and code that they reach.
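The decision described above can be sketched as follows; the names are hypothetical, and the raised error mirrors the corner case mentioned:

```python
# A field is removable if it is never read, but if an unread field is written
# by a struct.new with a value that has side effects, we cannot simply drop
# it (e.g. when the struct.new is in a global, we cannot spill the other
# operands to locals to reorder things), so we error.

def removable_fields(num_fields, read_fields, new_operand_has_effects):
    out = set()
    for f in range(num_fields):
        if f in read_fields:
            continue
        if new_operand_has_effects.get(f, False):
            raise ValueError(f"unread field {f} written with side effects")
        out.add(f)
    return out

# Field 0 is read and kept; field 1 is never read and can be removed.
assert removable_fields(2, {0}, {}) == {1}
```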
This method is in parallel to runOnFunction above it. It sets the runner
and then does the walk, like that method.
Also set runner to nullptr by default. I noticed ubsan was warning on
things here, which this should avoid, but otherwise I'm not aware of an
actual bug, so this should be NFC. But it does provide a safer API
that should avoid future bugs.
This just moves code outside and makes it more generic. One set of
functionality is "struct utils": tools to scan wasm for info
about the usage of struct fields and to analyze that data. The other
tool is a general analysis of nominal subtypes.
The code will be useful in a few upcoming passes, so this will avoid a
significant amount of code duplication.
This finishes the refactoring started in #4115 by doing the
same change to pass a Module into EffectAnalyzer instead of
features. To do so this refactors the fallthrough API and a few
other small things. After those changes, this PR removes the
old feature constructor of EffectAnalyzer entirely.
This requires a small breaking change in the C API, changing
BinaryenExpressionGetSideEffects's feature param to a
module. That makes this change not NFC, but otherwise it is.
When looking for all values written to a field, we can ignore
values that are loaded from that same field, i.e., are copied from
something already present there. Such operations never introduce
new values.
This helps by a small but non-zero amount on j2cl.
ConstantFieldPropagation (#4064)
Previously we tracked struct.new and struct.set in the same way. That means we
did the same when seeing if either a struct.new or a struct.set can write to the memory that
is read by a struct.get, where the rule is that if either type is a subtype of the
other then they might. But with struct.new we know the precise type, which
means we can do better.
Specifically, if we see a new of type B, then only a get of a supertype of B can
possibly read that data: it is not possible for our struct of type B to appear in
a location that requires a subtype of B. Conceptually:
A = type struct
B = type extends A
C = type extends B
x = struct.new<B>
struct.get<A>(y) // x might appear here, as it can be assigned to a
// variable y of a supertype
struct.get<C>(y) // x cannot appear here
This allows more devirtualization. It is a followup for #4052 that implements
a TODO from there.
The diff without whitespace is simpler.
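The refined rule can be sketched like this, using a hypothetical representation of the A/B/C hierarchy from the example:

```python
# A struct.new of exact type B can only be read by a struct.get whose
# declared type is a supertype of B (or B itself), whereas struct.set must
# stay symmetric: either type may be a subtype of the other.

supertype = {"A": None, "B": "A", "C": "B"}

def is_supertype_or_self(x, y):   # is x a supertype of (or equal to) y?
    while y is not None:
        if x == y:
            return True
        y = supertype[y]
    return False

def new_may_reach_get(new_type, get_type):
    return is_supertype_or_self(get_type, new_type)

def set_may_reach_get(set_type, get_type):
    return (is_supertype_or_self(get_type, set_type)
            or is_supertype_or_self(set_type, get_type))

assert new_may_reach_get("B", "A")      # x might appear at a get of A
assert not new_may_reach_get("B", "C")  # x cannot appear at a get of C
assert set_may_reach_get("B", "C")      # sets remain symmetric
```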
A field in a struct is constant if we can see that in the entire program we
only ever write the same constant value to it. For example, imagine a
vtable type that we construct with the same funcrefs each time; then (if
there are no struct.sets, or if there are, they write that same value) we
could replace a get with that constant value, since it cannot be anything
else:
(struct.new $T (i32.const 10) (rtt))
..no other conflicting values..
(struct.get $T 0) => (i32.const 10)
If the value is a function reference, then this may allow other passes
to turn what was a call_ref into a direct call and perhaps also get
inlined, effectively a form of devirtualization.
This only works in nominal typing, as we need to know the supertype
of each type. (It could work in theory in structural, but we'd need to do
hard work to find all the possible supertypes, and it would also
become far less effective.)
This deletes a trivial test for running -O on GC content. We have
many more tests for GC at this point, so that test is not needed, and
this PR also optimizes the code into something trivial and
uninteresting anyhow.
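The core inference can be sketched as a tiny lattice over possible values; this is only an illustration, not the pass's actual code:

```python
# Track every value written to a field: none seen yet, exactly one constant,
# or many. A get can be replaced only in the single-constant state.

class PossibleConstantValues:
    NONE, MANY = object(), object()

    def __init__(self):
        self.value = self.NONE

    def note(self, value):
        if self.value is self.NONE:
            self.value = value
        elif self.value != value:
            self.value = self.MANY

    def is_constant(self):
        return self.value is not self.NONE and self.value is not self.MANY

info = PossibleConstantValues()
info.note(10)   # (struct.new $T (i32.const 10) (rtt))
info.note(10)   # another write of the same constant: still constant
assert info.is_constant() and info.value == 10
info.note(20)   # a conflicting write: no longer optimizable
assert not info.is_constant()
```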