| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
| |
If the types are completely incompatible, we know the cast will fail. However,
ref.cast does allow a null to pass through, which makes it a little more
complicated.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
(#4101)
Reverse postorder basically just means that a block's immediate dominator
must precede it in the list. That is useful because then algorithms that look at
dominance can simply process the list in order, and the immediate dominator
will have already been seen before each block.
Another way to put it is that in reverse postorder a block's dominators
appear before it in the list, as do all non-loop predecessors. At least in
reducible graphs that is the case, and our IR, like wasm, is reducible.
It is pretty natural to emit reverse postorder on wasm given the
reducibility: simply process the wasm in postorder, and make sure to
create new basic blocks only when reaching their code - that is, do not
create them "ahead of time". We were doing that in a single place, for
try-catch, so this PR refactors that. Specifically it makes us create the
basic blocks for catches right when we reach them, and not earlier.
So the data structure that used to store them becomes a list of things
to connect to them.
This is useful for #4100 , see more details there.
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
Enable it in -O3 and -Os and higher.
This helps very little on output from LLVM, but also it does not alter
compile times much anyhow. On code that has not been run through
an optimizing compiler already, this can help quite a lot, e.g., 15% of
code size on some wasm GC samples.
This will not normally help with speed, as optimizing VMs do such
things anyhow. However, this can help baseline compilers and
interpreters and so forth.
|
|
|
|
|
|
|
|
| |
Before, we'd compute the hash of a child, then store that in a map, then
the parent would find the child's hash in the map using the pointer to the
child. But as we do a simple postorder walk, we can use a stack, and
avoid hashing the child pointers.
This makes it 10% faster or so.
|
|
|
|
|
|
|
|
|
|
|
| |
This allows common patterns in J2CL to be optimized, where we write
to various array indices and get the values or the reference from a
struct.
It would be nice to do even better here, and look at actually specific
types, but I think we should be careful to keep the runtime constant.
That seems hard to do if we accumulate a list of types and do
Type::isSubType on them etc. But maybe someone has a better
idea than this PR?
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If we replace
A
A
A
with
(local.set A)
(local.get)
(local.get)
then it is ok for A to trap (so long as it does so deterministically), as if
it does trap then the first appearance will do so, and the others not
be reached anyhow.
This helps GC code as often there are repeated struct.gets and such that
may trap.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The goal of this mode is to remove obviously-unneeded code like
(drop
(i32.load
(local.get $x)))
In general we can't remove it, as the load might trap - we'd be removing
a side effect. This is fairly rare in general, but actually becomes quite
annoying with wasm GC code where such patterns are more common,
and we really need to remove them.
Historically the IgnoreImplicitTraps option was meant to help here. However,
in practice it did not quite work well enough for most production code, as
mentioned e.g. in #3934 . TrapsNeverHappen mode is an attempt to fix that,
based on feedback from @askeksa in that issue, and also I believe this
implements an idea that @fitzgen mentioned a while ago (sorry, I can't
remember where exactly...). So I'm hopeful this will be generally useful
and not just for GC.
The idea in TrapsNeverHappen mode is that traps are assumed to not
actually happen at runtime. That is, if there is a trap in the code, it will
not be reached, or if it is reached then it will not trap. For example, an
(unreachable) would be assumed to never be reached, which means
that the optimizer can remove it and any code that executes right before
it:
(if
(..condition..)
(block
(..code that can be removed, if it does not branch out..)
(..code that can be removed, if it does not branch out..)
(..code that can be removed, if it does not branch out..)
(unreachable)))
And something like a load from memory is assumed to not trap, etc.,
which in particular would let us remove that dropped load from earlier.
This mode should be usable in production builds with assertions
disabled, if traps are seen as failing assertions. That might not be true
of all release builds (maybe some use traps for other purposes), but
hopefully in some. That is, if traps are like assertions, then enabling
this new mode would be like disabling assertions in release builds
and living with the fact that if an assertion would have been hit then
that is "undefined behavior" and the optimizer might have removed
the trap or done something weird.
TrapsNeverHappen (TNH) is different from IgnoreImplicitTraps (IIT).
The old IIT mode would just ignore traps when computing effects.
That is a simple model, but a problem happens with a trap behind
a condition, like this:
if (x != 0) foo(1 / x);
We won't trap on integer division by zero here only because of the
guarding if. In IIT, we'd compute no side effects on 1 / x, and then
we might end up moving it around, depending on other code in
the area, and potentially out of the if - which would make it happen
unconditionally, which would break.
TNH avoids that problem because it does not simply ignore traps.
Instead, there is a new hasUnremovableSideEffects() method
that must be opted-in by passes. That checks if there are no side
effects, or if there are, if we can remove them - and we know we can
remove a trap if we are running under TrapsNeverHappen mode,
as the trap won't happen by assumption. A pass must only use that
method where it is safe, that is, where it would either remove the
side effect (in which case, no problem), or if not, that it at least does
not move it around (avoiding the above problem with IIT).
This PR does not implement all optimizations possible with
TNH, just a small initial set of things to get started. It is already
useful on wasm GC code, including being as good as IIT on removing
unnecessary casts in some cases, see the test suite updates here.
Also, a significant part of the 18% speedup measured in
#4052 (comment)
is due to my testing with this enabled, as otherwise the devirtualization
there leaves a lot of unneeded code.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Technically this is not a new pass, but it is a rewrite almost from scratch.
Local Common Subexpression Elimination looks for repeated patterns,
stuff like this:
x = (a + b) + c
y = a + b
=>
temp = a + b
x = temp + c
y = temp
The old pass worked on flat IR, which is inefficient, and was overly
complicated because of that. The new pass uses a new algorithm that
I think is pretty simple, see the detailed comment at the top.
This keeps the pass enabled only in -O4, like before - right after
flattening the IR. That is to make this as minimal a change as possible.
Followups will enable the pass in the main pipeline, that is, we will
finally be able to run it by default. (Note that to make the pass work
well after flatten, an extra simplify-locals is added - the old pass used
to do part of simplify-locals internally, which was one source of
complexity. Even so, some of the -O4 tests have changes, due to
minor factors - they are just minor orderings etc., which can be
seen by inspecting the outputs before and after using e.g.
--metrics)
This plus some followup work leads to large wins on wasm GC output.
On j2cl there is a common pattern of repeated struct.gets, so common
that this pass removes 85% of all struct.gets, which makes the total
binary 15% smaller. However, on LLVM-emitted code the benefit is
minor, less than 1%.
|
|
|
|
|
|
|
|
| |
When looking for all values written to a field, we can ignore
values that are loaded from that same field, i.e., are copied from
something already present there. Such operations never introduce
new values.
This helps by a small but non-zero amount on j2cl.
|
|
|
|
|
| |
Use ToolOptions there, which adds --nominal support.
We must also pass --nominal to the sub-commands we run.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
typing (#4069)
(if (result i32)
(local.get $x)
(struct.get $B 1
(ref.null $B)
)
(struct.get $C 1
(ref.null $C)
)
)
With structural typing it is safe to turn this into this:
(struct.get $A 1
(if (result (ref $A))
(local.get $x)
(ref.null $B)
(ref.null $C)
)
)
Here $A is the LUB of the others. This works since $A must have
field 1 in it. But with nominal types it is possible that the LUB in fact
does not have that field, and we would not validate.
This actually seems like a more general issue that might happen with
other things, even though atm perhaps it can't. For simplicity, avoid this
pattern in both nominal and structural typing, to avoid making a
difference between them.
|
| |
|
|
|
|
|
|
| |
This adds and tests the new method. It will be used in a new pass later, where
computing shallow hashes allows it to be done in linear time.
99% of the diff is whitespace.
|
|
|
|
|
|
|
|
|
|
| |
This caused no noticeable bugs, but it could in theory in new passes - in fact
in a pass I will open later this week it did.
Also fix the order in wasm.h. That part has no effect, but it is nice to be
consistent. After this PR, everything should match the single source of
truth which is wasm-delegations-fields.h (as that is used in printing, binary
reading/writing, etc., so it has to be correct). Also Switch now matches
the ordering in Break.
|
|
|
|
|
|
|
|
|
|
|
| |
This was being set in the creation of Loads in the binary reader, but
forgotten in the SIMD logic - which ends up creating a Load with
type v128, and signed_ was uninitialized.
Very hard to test this, but I saw it "break" hash value computation
which is how I noticed this.
Also initialize the I31 sign field. Now all of them in wasm.h are
properly initialized.
|
|
|
|
|
|
|
|
| |
(#4051)
We ignore sets in unreachable code, but their values may not be
compatible with a new type we specialize a local for. That is, the
validator cares about unreachable sets, while logically we don't need
to, and this pass doesn't. Fix up such unreachable sets at the end.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
themselves (#4070)
If the only uses of a global are
if (global == 0) {
global = 1;
}
Then we do not need that global: while it has both reads and
writes, the value in the global does not cause anything observable.
It is read, but only to write to itself, and nothing else.
This happens in real-world code from j2cl quite a lot, as they have
an initialization pattern with globals, and in some cases we can
optimize away the work done in the initialization, leaving only the
globals in this pattern.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
First, move the tiny pattern of call-ref-of-ref-func from Directize
into OptimizeInstructions. This is important because Directize is
a global optimization pass - it looks at the table to see if a
CallIndirect can be turned into a direct call. We only run global
passes at the end of the pipeline, but we don't need any global
data for call-ref of a ref-func, and OptimizeInstructions is the
place for such patterns.
Second, extend that to also handle fallthrough values. This is
less simple, but as call_ref is so inefficient, it's worth doing all
we can.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously we would keep doing iterations of inlining until we hit
the number of functions in the module (at which point, we would
definitely know we are recursing). This prevents infinite recursion,
but it can take a very very long time to notice that in a huge
module with one tiny recursive function and 100,000 other ones.
To do better than that, track how many times we've inlined into
a function. After a fixed number of such inlinings, stop.
Aside from avoding very slow compile times, such infinite
recursion likely is not that beneficial to do a great many times,
so anyhow it is best to stop after a few iterations.
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
This only moves code around. visitBrOn was in the main part of the pass,
which was incorrect as it could interfere with other work being done there.
Specifically, we have a stack of Ifs there, and if we replace a BrOn with
an If, an assertion was hit. To fix this, run it like sinkBlocks(), in a separate
interleaved phase.
This fixes a bug reported by askeksa-google here:
https://github.com/WebAssembly/gc/issues/226#issuecomment-868739853
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
ConstantFieldPropagation (#4064)
Previously we tracked them in the same way. That means that we did the same
when seeing if either a struct.new or a struct.set can write to the memory that
is read by a struct.get, where the rule is that if either type is a subtype of the
other then they might. But with struct.new we know the precise type, which
means we can do better.
Specifically, if we see a new of type B, then only a get of a supertype of B can
possibly read that data: it is not possible for our struct of type B to appear in
a location that requires a subtype of B. Conceptually:
A = type struct
B = type extends A
C = type extends B
x = struct.new<B>
struct.get<A>(y) // x might appear here, as it can be assigned to a
// variable y of a supertype
struct.get<C>(y) // x cannot appear here
This allows more devirtualization. It is a followup for #4052 that implements
a TODO from there.
The diff without whitespace is simpler.
|
|
|
|
|
| |
This moves code out of the existing methods so that it can be
called directly from more places. A later PR will use the new
methods.
|
| |
|
|
|
|
| |
Localizer may have added a new tee and a local, and we should apply
the tee in the right place.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
A field in a struct is constant if we can see that in the entire program we
only ever write the same constant value to it. For example, imagine a
vtable type that we construct with the same funcrefs each time, then (if
we have no struct.sets, or if we did, and they had the same value), we
could replace a get with that constant value, since it cannot be anything
else:
(struct.new $T (i32.const 10) (rtt))
..no other conflicting values..
(struct.get $T 0) => (i32.const 10)
If the value is a function reference, then this may allow other passes
to turn what was a call_ref into a direct call and perhaps also get
inlined, effectively a form of devirtualization.
This only works in nominal typing, as we need to know the supertype
of each type. (It could work in theory in structural, but we'd need to do
hard work to find all the possible supertypes, and it would also
become far less effective.)
This deletes a trivial test for running -O on GC content. We have
many more tests for GC at this point, so that test is not needed, and
this PR also optimizes the code into something trivial and
uninteresting anyhow.
|
|
|
|
|
|
| |
Define a "single constant expression" consistently, as a thing that is a
compile-time constant. Move I31New out of there, and also clean up
the function it is moved into, canInitializeGlobal(), to be consistent
in validating children.
|
|
|
|
|
|
| |
We were missing StructNew and ArrayLen.
Also refactor the helper method to allow for shorter code in each
caller.
|
| |
|
| |
|
|
|
| |
The wrong variable was null checked, leading to segfaults.
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
Add a new run line to every list test containing a struct type to run the test
again with nominal typing. In cases where tests do not use any struct subtyping,
this does not change the test output. In cases where struct subtyping is used, a
new check prefix is introduced to capture the difference that `(extends ...)`
clauses are emitted in nominal mode but not in equirecursive mode. There are no
other test differences.
Some tests are cleaned up along the way. Notably,
O_all-features{,-ignore-implicit-traps}.wast is consolidated to a single file.
|
|
|
|
|
| |
Add a new OptimizeForJS pass which contains rewriting rules specific to JavaScript.
LLVM usually lowers x != 0 && (x & (x - 1)) == 0 (isPowerOf2) to popcnt(x) == 1 which is ok for wasm and other targets but is quite expensive for JavaScript. In this PR we lower the popcnt pattern back to the isPowerOf2 pattern.
|
|
|
|
| |
The field name lookup was previously failing because the lookup used a fresh
struct type rather than using the intended struct type.
|
|
|
|
|
|
|
|
| |
We just cleared the list of exports, but the exportMap was still populated, so
the data was in an inconsistent state.
This fixes the case of running --extract-function multiple times on a file (which
is usually not useful, unless one of the passes you are debugging adds new
functions to the file - which some do).
|
| |
|
| |
|
| |
|
|
|
|
|
| |
DeadArgumentElimination (#4036)
To do this we need to look at tail calls and not just returns.
|
|
|
|
|
|
| |
It is ok to use the default value of a reference even if we refine the type,
as it would be a more specifically-typed null, and all nulls compare the
same. However, if the default is used then we *cannot* alter the type to
be non-nullable, as then we'd use a null where that is not allowed.
|
|
|
|
| |
Partially reverts #4025, removing the code and updates the test to show
we do the optimization.
|
|
|
|
| |
interpreter (#4023)
|
|
|
|
|
|
|
|
|
| |
The tail call spec does not include subtyping because it is based on the
upstream spec, which does not contain subtyping. However, there is no reason
that subtyping shouldn't apply to tail calls like it does for any other call or
return. Update the validator to allow subtyping and avoid a related null pointer
dereference while we're at it.
Do not run the test in with --nominal because it is buggy in that mode.
|
|
|
|
| |
(#4031)
|
|
|
| |
This is necessary when using GC and EH together, for instance.
|