The order of operations could allow us to add vars but then later decide
not to do the optimization due to unreachability. In that case we did not do a
fixup for non-nullability on those args, leading to a fuzzer error.
Found by the fuzzer. Calling makeZero on an rtt with depth will
error because we try to create a zero Literal from it, and we can't
do that: we don't know a list of super types to give it. We could
work around it, but we don't want to: if the rtt has depth then we
can't make a nice zero for it, as we'd need some rtt.subs anyhow,
so simply mark it as a type we can't make a zero for.
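To illustrate the approach, a minimal standalone C++ sketch with stand-in types (Type, Literal, and this makeZero are simplified assumptions, not Binaryen's real API):

  #include <optional>

  // Simplified stand-ins for illustration only.
  struct Type {
    bool isRtt = false;
    int rttDepth = 0; // 0 means the rtt has no depth
  };
  struct Literal {};

  // Decline to synthesize a zero for an rtt with depth: its Literal would
  // need a chain of super types (rtt.canon plus rtt.subs), so callers
  // treat such types as having no zero and skip them.
  std::optional<Literal> makeZero(const Type& type) {
    if (type.isRtt && type.rttDepth > 0) {
      return std::nullopt;
    }
    return Literal{}; // an ordinary zero/null literal otherwise
  }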
(#4307)
Apply:
i64.extend_i32_u(i32.load8_u(x)) -> i64.load8_u(x)
i64.extend_i32_u(i32.load16_u(x)) -> i64.load16_u(x)
i64.extend_i32_s(i32.load8_u(x)) -> i64.load8_u(x)
i64.extend_i32_s(i32.load16_u(x)) -> i64.load16_u(x)
i64.extend_i32_s(i32.load8_s(x)) -> i64.load8_s(x)
i64.extend_i32_s(i32.load16_s(x)) -> i64.load16_s(x)
i64.extend_i32_u(i32.load(x)) -> i64.load32_u(x)
i64.extend_i32_s(i32.load(x)) -> i64.load32_s(x)
Don't apply to (see the sketch below for why the mixed-signedness cases are unsound):
i64.extend_i32_u(i32.load8_s(x)) -> skip
i64.extend_i32_u(i32.load16_s(x)) -> skip
i64.extend_i32_s(i32.atomic.load(x)) -> skip
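As a sanity check on the skipped cases, this standalone C++ snippet (an illustration, not Binaryen code) shows that a sign-extending 8-bit load followed by an unsigned 32-to-64 extend matches neither i64.load8_u nor i64.load8_s:

  #include <cassert>
  #include <cstdint>

  int main() {
    // The byte 0x80 is -128 when sign-extended.
    uint8_t byte = 0x80;

    // i32.load8_s followed by i64.extend_i32_u:
    int32_t loaded = int8_t(byte);        // 0xFFFFFF80
    uint64_t extended = uint32_t(loaded); // 0x00000000FFFFFF80

    // Neither single 64-bit load produces that value:
    uint64_t load8_u = byte;                            // 0x0000000000000080
    uint64_t load8_s = uint64_t(int64_t(int8_t(byte))); // 0xFFFFFFFFFFFFFF80

    assert(extended != load8_u);
    assert(extended != load8_s);
    return 0;
  }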
This just moves code around.
Before this we had special logic for various call types. This replaces all
that with a single general code path, which unifies everything except
for control flow constructs (which remain as before, handled in a
special way for each of them).
The algorithm is simple and direct: basically, it goes through the
children, and when it finds a block it sees if it can move the block's
contents outside of the parent. While doing so it takes into account
effects and so forth.
To make this easy, a random-access API is added to ChildIterator.
Diffing without whitespace makes the existing test updates a lot simpler
to read.
Note that this is not NFC, as the old algorithm had some quirks like
not taking effects into account when there were more than 2 children;
the new code is uniform in how it handles things.
This ends up removing 19% of all blocks in j2wasm, which reduces
total code size by 1%.
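A toy sketch of the hoisting idea, under two big simplifying assumptions: a made-up IR, and a crude commutation rule where `effects` means any effect at all (reads included), so two expressions commute iff at least one is effect-free. Binaryen's real pass uses EffectAnalyzer for this check.

  #include <cstddef>
  #include <vector>

  // Toy IR, not Binaryen's: `effects` covers reads as well as writes.
  struct Expr {
    bool isBlock = false;
    bool effects = false;
    std::vector<Expr*> list; // a block's items; the last is its value
  };

  // Try to move each block child's items out before `parent`. Returns the
  // hoisted items in order; a hoisted block child is replaced by its value.
  std::vector<Expr*> hoistBlockChildren(Expr& parent) {
    std::vector<Expr*> hoisted;
    bool earlierEffects = false; // effects of children left of this one
    for (auto*& child : parent.list) {
      if (child->isBlock && child->list.size() > 1) {
        bool itemEffects = false;
        for (size_t i = 0; i + 1 < child->list.size(); i++) {
          itemEffects |= child->list[i]->effects;
        }
        // The items would execute before the earlier children, so one
        // side must be effect-free for the reordering to be valid.
        if (!itemEffects || !earlierEffects) {
          for (size_t i = 0; i + 1 < child->list.size(); i++) {
            hoisted.push_back(child->list[i]);
          }
          child = child->list.back();
        }
      }
      earlierEffects |= child->effects;
    }
    return hoisted;
  }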
(#4319)
Before, if we saw a param is written, that prevented us from subtyping it:
function foo(x : oldType) {
  ..
  x = someValue;
  ..
}
Even if all calls to foo send some specific struct type that we'd like to subtype
to, seeing that write stopped us. To handle such a write we need some
extra handling for the case in which a less-specific type is written to the
param (that is, if someValue is of type oldType): something like this:
function foo(x : newType) {
  var x_old : oldType;
  x_old = x; // copy the param to x_old, and use x_old everywhere
  ..
  x_old = someValue;
  ..
}
That is, we still refine the param type, but inside the function we use a new
local that has the old type and is guaranteed to validate. This PR implements
that logic so that we can optimize more cases.
To allow that, this PR avoids trying to both refine a param's type and remove
it as unused - that has annoying corner cases. If it is unused, we can simply
remove it anyhow.
This specializes the fields of structs based on the types written to them. That is,
if a field is of type A but in practice we always write some subtype B to it,
then we can change the type of the field to that subtype.
On j2wasm this manages to improve at least one field in 2% of types. Not a
large amount, but it does lead to further benefits in later opts (e.g. about a third
of the improvements turn a field non-nullable).
Tiny followup to #4314
Also updates some function types in test output, fixing breakage on
main after racing landings.
The old algorithm can be summarized as: in each basic block, start at the beginning.
Each pair of locals live there might interfere with each other, as they might arrive from
different entry blocks with different values. Afterwards, go through the block and find
overlapping live ranges, and mark interferences there as well.
This is non-linear because at the start of the block we do a double loop over all
pairs of live locals, which in general can be O(N^2) (N = number of locals). It also
has the downside of ignoring copies: if two locals have overlapping live ranges but
they must have identical values on those ranges, they do not actually interfere.
For example:
x = 10;
y = x;
.. // live ranges overlap here
foo(x, y); // live ranges end here.
We can ignore this overlap since the copy shows they are identical there, but the
pass did not take this into account. To some extent other passes can remove such
copies (SimplifyLocals, MergeLocals, RedundantSetElimination), but in general
this was a weak spot for the optimizer.
I realized there is a solution to both these problems: in wasm, given that we have
a default value for all locals, if a local is live at the start of a block then it must be
live at the end of all the blocks reaching it. That is so because the liveness will
extend backwards all the way to some set of the local, possibly all the way to
the zero-initialization at the start of the function, and it extends that way through
all predecessor blocks. A consequence of this is that there are no interferences
between locals that only occur during a merge: the live ranges include the
predecessor blocks, and theirs, and so forth, until we reach a block where one
of the locals is assigned a value different than the other. That is a necessary and
sufficient condition for interference, and therefore when processing a block we
only need to look at its contents, and can ignore the merging of control flow,
which allows us to be linear.
More details on this and on the new algorithm are in comments in the source, but
the basic idea is that it simply goes through each block in a linear way, finding
which values are assigned to each local (using a numbering of unique values)
and noting which are live at each time. If two locals are live and one is assigned
a value that is not the same as the value in the other, mark them as interfering.
This is of substantial benefit to j2wasm output, I believe because it is common
there to find local subexpression elimination opportunities after inlining, and
each time we find one we add a local. If we inline different functions into the
same target, we may end up with copied locals for each of them. (This was
not noticed in the past because it is very rare on LLVM output, which has
already had inlining and GVN etc. done.)
There is a small benefit to LLVM output as well, though just a few
percent at best. However, it is enough to be noticeable on some of
the code size tests.
This is also faster than the previous pass. It's normally not noticeable,
as this pass is not one of the slowest anyhow, but I found some real-world
codebases where the pass becomes 50% faster. I have not found any
case where it is slower than the old algorithm.
Fuzzed over several days to be sure this is correct, and also verified
on the emscripten test suite.
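A toy sketch of the per-block scan under simplified assumptions: liveness is precomputed into explicit last-use events, value number 0 stands for the default zero value, and copies share a value number. Locals that are live together but never set in the block were already handled at their defining sets, per the argument above.

  #include <cstdint>
  #include <set>
  #include <vector>

  struct Inst {
    enum Kind { Set, LastUse } kind;
    uint32_t local;
    uint32_t valueNumber; // meaningful for Set only; copies share a number
  };

  void scanBlock(const std::vector<Inst>& block,
                 std::vector<uint32_t>& valueInLocal, // starts all zero
                 std::set<uint32_t>& live,
                 std::vector<std::vector<bool>>& interferes) {
    for (const auto& inst : block) {
      if (inst.kind == Inst::LastUse) {
        live.erase(inst.local);
        continue;
      }
      // A set makes the local live. It interferes with each other live
      // local holding a different value; equal value numbers (copies)
      // do not interfere.
      for (uint32_t other : live) {
        if (other != inst.local && valueInLocal[other] != inst.valueNumber) {
          interferes[inst.local][other] = true;
          interferes[other][inst.local] = true;
        }
      }
      valueInLocal[inst.local] = inst.valueNumber;
      live.insert(inst.local);
    }
  }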
We found one cast that has another as its input, and forgot that
the child was possibly a fallthrough value. That is, there might be more
code that needs to be kept around.
Rather than fix the middle of the three cases there - the one with
HeapType::isSubType(childIntendedType, intendedType) - I
noticed it is not actually needed. That case checks if the child's
type is more specific than the parent's, and if so, then the parent
is not needed. But we already handle that earlier above in the
same function: regardless of what the child of a static cast is,
if the cast is static and the input is the proper type already, the
cast is unneeded (see lines 1565-1566).
This avoids cluttering the main wasm namespace, and clarifies what the
scanner does.
This improves validation of `catch` bodies, mostly by checking the
validity of `pop`s.
For every `catch` body:
- Checks if its tag exists
- If the tag's type is none:
  - Ensures there are no `pop`s
- If the tag's type is not none:
  - Checks if there's a single `pop` within the catch body
  - Checks if the tag type matches the `pop`'s type
  - Checks if the `pop`'s location is valid
For every `catch_all` body:
- Ensures there are no `pop`s
(A sketch of the per-body rule is below.)
This uncovers several bugs related to `pop`s in existing tests, which
this PR also fixes.
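A minimal sketch of that per-body rule, with hypothetical stand-in types (the real validator works on the IR, but the shape of the check is this):

  #include <cstddef>

  // Hypothetical stand-ins, not Binaryen's real IR types.
  struct Tag { bool typeIsNone; };
  struct CatchBody {
    std::size_t numPops;
    bool popTypeMatchesTag;  // only meaningful when numPops == 1
    bool popLocationIsValid; // likewise
  };

  // The rule per `catch` body; a `catch_all` body behaves like the
  // type-is-none case (no pops allowed).
  bool validCatchBody(const Tag& tag, const CatchBody& body) {
    if (tag.typeIsNone) {
      return body.numPops == 0;
    }
    return body.numPops == 1 && body.popTypeMatchesTag &&
           body.popLocationIsValid;
  }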
Without this, roundtripping may not work in nominal mode, as
we might not assign the expected heap types in the right places.
Specifically, when the signature matches but the nominal types are
distinct, we need to keep them that way (whereas the sugar in the
text format parsing will merge them).
We marked that as only trapping if the input is nullable. But ref.as_func
will trap if the input isn't a func, for example.
We could in theory try to check if a trap is possible, like checking if the
input is already non-nullable or already a function, etc., but we have
optimization passes to get rid of RefAs when they are not needed
anyhow, so there is no point in duplicating that here.
Now that all known issues with that pass are fixed, enable it by
default. This adds it in a place that seems to make sense on j2wasm,
but in general multiple cycles of optimization will be needed.
This adds a test showing that we run this pass and that it helps
ConstantFieldPropagation by running before it.
The BrOn logic there optimizes and updates types incrementally, so
we cannot assume that at every point in the middle the types are fully
updated.
Fixes #4308.
Generate both nullable and non-nullable references to basic HeapTypes and
introduce `i31` and `data` HeapTypes. Generate subtypes rather than exact types
for all concrete-typed children.
We only need to use locals if there are effects we can't remove, or if they
interact with other children. Improve the comment to explain what the
ChildLocalizer is working towards: a state where all the children of the
expression can be reordered or removed freely (local.gets have that
property, as do other things if they have no relevant effects).
Aside from avoiding wasteful locals, this is necessary for running
GlobalTypeOptimization on j2wasm: That code will do a global.get
of an rtt, and those cannot be placed in locals.
In preparation for using it from a separate file specifically for generating
random HeapTypes that has no need to depend on all of fuzzing.h.
Allocation and cast instructions without explicit RTTs should use the canonical
RTTs for the given types. Furthermore, the RTTs for nominal types should reflect
the static type hierarchy. Previously, however, we implemented allocations and
casts without RTTs using an alternative system that only used static types
rather than RTT values. This alternative system would work fine in a world
without first-class RTTs, but it did not properly allow mixing instructions that
use RTTs and instructions that do not use RTTs as intended by the M4 GC spec.
This PR fixes the issue by using canonical RTTs where appropriate and cleans up
the relevant casting code using std::variant.
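A small sketch of the std::variant shape this enables, with hypothetical simplified types: a cast or allocation target is either a static heap type (for which the canonical RTT is used) or a first-class RTT value.

  #include <variant>

  // Hypothetical simplified types, for illustration only.
  struct HeapType { unsigned id = 0; };
  struct Rtt {
    unsigned depth = 0;
    HeapType type;
  };

  // std::variant makes the two cases explicit at each use site.
  using CastTarget = std::variant<HeapType, Rtt>;

  bool usesFirstClassRtt(const CastTarget& target) {
    return std::holds_alternative<Rtt>(target);
  }

  HeapType heapTypeOf(const CastTarget& target) {
    if (auto* rtt = std::get_if<Rtt>(&target)) {
      return rtt->type;
    }
    return std::get<HeapType>(target); // canonical RTT for a static type
  }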
This is a minor refactoring in DAE to have a helper class that does the
incremental LUB calculation. The class is also used in LocalSubtyping,
where it has the effect of making the work incremental, which it was not
before (that has no observable consequence, but it should make
us faster in the common case where we fail to find a new LUB).
This will allow further optimization in a central place later.
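A toy sketch of such a helper, under a deliberately simple model (types form a single subtype chain identified by depth, so the LUB is just the shallower type; the real class works on wasm types):

  #include <cstdint>
  #include <optional>

  // Toy model: smaller depth = more general type in one chain.
  using Type = uint32_t;
  inline Type lubOf(Type a, Type b) { return a < b ? a : b; }

  // Incremental LUB calculation: note() types one at a time; a caller can
  // bail out early once the running LUB is no better than the type it
  // hoped to refine.
  struct LUBFinder {
    std::optional<Type> lub;
    void note(Type type) { lub = lub ? lubOf(*lub, type) : type; }
    bool hasNoted() const { return bool(lub); }
  };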
Similar to what we do with structs, if a global is immutable then we know it
cannot interact with calls.
This changes the JS API for getSideEffects(). That was actually broken:
passing in the optional module param would just pass it along to the
compiled C code, where it was coerced to 0 or 1 rather than a pointer to a
module. To fix that, this now does module.ptr to actually get the pointer, and
this is now actually tested, as without a module we cannot compute the effects
of a global. This PR also makes the module param mandatory in the JS API
since, again, without a module we can't compute global effects. (The module
param has already been mandatory in the C++ API for some time.)
This helps prevent bugs where we assume that the GCData has either a HeapType or
Rtt without checking. Indeed, one such bug is found and fixed.
When the allocation we optimize away flows through a loop, then just like
with a block we must change the type to be nullable, since we are replacing
the allocation with a null.
Fixes #4287
We have separate logic for printing their headers and bodies, and they were not in sync.
Specifically, we would not emit drops in the body of a block, which is not valid, and would
fail roundtripping on text.
If we write an immutable global to a field, and that is the only thing
we ever write to it, then we can replace reads of the field with a get of
the global. To do that, this tracks immutable globals written to
fields and not just constant values.
Normally this is not needed, as if the global is immutable then we
propagate its constant value everywhere anyhow. However, for
references this is useful: if we have a global immutable vtable,
for example, then we cannot replace a get of it with a constant.
So this PR helps with immutable reference types in globals, allowing
us to propagate global.gets of them to more places, which then
can allow optimizations there.
This + later opts removes 25% of array.gets from j2wasm. I believe
almost all of those are itable calls, so this means those are getting
devirtualized now. I see something like a 5% speedup due to that.
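One way to picture the tracking is as a small lattice per field; this hypothetical sketch (names and shapes are illustrative, not the pass's code) merges each observed write into a running value:

  #include <cstdint>
  #include <string>
  #include <variant>

  // Illustrative lattice: a field has seen no writes, one constant,
  // one immutable global, or conflicting ("many") values.
  struct ConstantValue { int64_t value; };
  struct ImmutableGlobal { std::string name; };
  struct Many {};

  using FieldValue =
    std::variant<std::monostate, ConstantValue, ImmutableGlobal, Many>;

  inline bool sameValue(const FieldValue& a, const FieldValue& b) {
    if (a.index() != b.index()) return false;
    if (auto* c = std::get_if<ConstantValue>(&a)) {
      return c->value == std::get<ConstantValue>(b).value;
    }
    if (auto* g = std::get_if<ImmutableGlobal>(&a)) {
      return g->name == std::get<ImmutableGlobal>(b).name;
    }
    return true; // monostate or Many
  }

  // Merge a newly seen write into the running value for a field.
  FieldValue combine(const FieldValue& old, const FieldValue& seen) {
    if (std::holds_alternative<std::monostate>(old)) return seen;
    if (sameValue(old, seen)) return old;
    return Many{};
  }

If a field's final value is a single ImmutableGlobal, each read of that field can be rewritten to a global.get of it.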
Canonicalize:
(signed)x > -1 ==> x >= 0
(signed)x <= -1 ==> x < 0
(signed)x < 1 ==> x <= 0
(signed)x >= 1 ==> x > 0
(unsigned)x < 1 ==> x == 0
(unsigned)x >= 1 ==> x != 0
This should help #4265, and in general 0 is usually a more
common constant, and reasonable to canonicalize to.
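A standalone spot-check of the equivalences over edge-case values:

  #include <cassert>
  #include <cstdint>
  #include <limits>

  int main() {
    int32_t samples[] = {std::numeric_limits<int32_t>::min(), -2, -1, 0,
                         1, 2, std::numeric_limits<int32_t>::max()};
    for (int32_t x : samples) {
      assert((x > -1) == (x >= 0));
      assert((x <= -1) == (x < 0));
      assert((x < 1) == (x <= 0));
      assert((x >= 1) == (x > 0));
      uint32_t u = uint32_t(x);
      assert((u < 1) == (u == 0));
      assert((u >= 1) == (u != 0));
    }
    return 0;
  }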
Without this, we'd just return the old type for the tuple, which meant
its fields referred to unrewritten types, causing possible validation errors
if the types changed.
Having a monolithic header file containing all the implementation meant there
was no good way to split up the code or introduce new files. The new
implementation file and source directory will make it much easier to add new
fuzzing functionality in new files.
Saves a little code size and might prevent some bugs.
(#4263)
If struct.new operands have side effects, and we are removing an operand
because its field is removed, we must keep the side effects. To handle that,
store all the operands in locals and read from the locals; removing a
local.get is then always safe, and nothing has been reordered:
(struct.new
 (A)
 (side effect) ;; this field will be removed
 (B)
)
=>
(local.set $a (A))
(local.set $t (side effect))
(local.set $b (B))
(struct.new
 (local.get $a)
 (local.get $b)
)
Later passes can remove unneeded local operations etc.
This is necessary before enabling this pass, as this corner case occurs on
j2wasm.
Do so by applying --debug to extraFlags right at the start. That global
is used everywhere already. In particular, this PR removes manually adding
-g in the first diff chunk here, and you can see extraFlags appears there
already on the previous line.
This adds support for `try`-`delegate` to `CFGWalker`. This also adds a
single test for `catch`-less `try`.
The current code handles the innermost (`i`th) case specially first and
then handles the `i-1`th `try` in each loop iteration. This change puts
the `i`th case in the loop, so each iteration handles the `i`th `try`,
which is simpler. Then we don't need to check
`throwingInstsStack.empty()` at the beginning, because the `for` loop
wouldn't be entered if it's empty anyway.
Add struct.get tracking, and if a field is never read from, simply remove
it.
This will error if a field is written using struct.new with a value with side
effects. It is not clear we can handle that, as if the struct.new is in a
global then we can't save the other values to locals etc. to reorder
things. We could perhaps use other globals for it (ugh) but at least for
now, that corner case does not happen on any code I can see.
This allows a quite large code size reduction on j2wasm output (20%). The
reason is that many vtable fields are not actually read, and so removing
them and the ref.func they hold allows us to get rid of those functions,
and code that they reach.
We already detected code that looks like
if (foo == 0) {
foo = 1;
}
That "read only to write" pattern occurs also in functions, like this:
function bar() {
if (foo == 0) return;
foo = 1;
}
This PR detects that pattern. It moves code around to share almost
all the logic with the previous pattern (the git diff is not that useful
there, sadly, but looking at them side by side that should be
obvious).
This helps in j2cl on some common clinits, where the clinit function
ends up empty, which is exactly this pattern.
Fuzzing followup to #4244.
Code in the If condition can be moved out to before the if.
Existing test updates are 99% whitespace.
This sets the C++ standard variable in the build to C++17, and makes use of std::optional (a C++17 library feature) in one place, to test that it's working.
Just as the --nominal flag forces all types to be parsed as nominal, the
--structural flag forces all types to be parsed as equirecursive. This is the
current default behavior, but a future PR will change the default to parse types
as either structural or nominal according to their syntax or encoding. This new
flag will then be necessary to get the current behavior.
Also take this opportunity to deduplicate more flags in the help tests.
Very simple with the work so far: just add StructGet/ArrayGet code to check
whether the field is immutable, and allow the get to go through in that case.
This adds support for tag-using instructions (`throw` and `catch`) to
wasm-metadce. We had to use a hacky workaround in
emscripten-core/emscripten#15266 because of the lack of this support;
after this lands we can remove it.
Switch from "extends" to M4 nominal syntax
Change all test inputs from using the old (extends $super) syntax to using the
new *_subtype syntax for their inputs and also update the printer to emit the
new syntax. Add a new test explicitly testing the old notation to make sure it
keeps working until we remove support for it.
This optimizes this type of pattern:
(local.set $x (struct.new X Y Z))
(struct.set (local.get $x) X')
=>
(local.set $x (struct.new X' Y Z))
Note how the struct.set is removed, and X' moves to where X was.
This removes almost 90% (!) of the struct.sets in j2wasm output, which reduces
total code size by 2.5%. However, I see no speedup with this - I guess that either
this is not on the hot path, or V8 optimizes it well already, or the CPU is making
stores "free" anyhow...
Not sure why the current code tries to add the name even when it is
null, but it causes `dump()` to behave strangely and pollute stdout when
it tries to print `root.str`.
Also this changes code printing `Name.str` to printing just `Name`; when
`Name.str` is null, it prints `(null Name)` instead of polluting stdout,
and it is the recommended way of printing `Name` anyway.
Precompute will run the interpreter on struct.new etc. repeatedly,
as it keeps doing so while it propagates constant values around (if one
of the operands to the struct.new becomes constant, that could have
a noticeable effect). But creating new GC data each time meant we lost
track of their identity, so ref.eq would not work, and we had disabled
basically all struct operations. This implements identity tracking so we
can start to optimize there, which is a step towards using it for immutable
field propagation.
To track identity, always store the data representing each struct.new
in the source using the same GCData structure. That keeps identity
consistent no matter how many times we execute.
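A hypothetical sketch of the identity tracking (names are illustrative, not the interpreter's real types): key the created data on the originating struct.new expression, so every execution hands back the same object and ref.eq stays meaningful.

  #include <cstdint>
  #include <memory>
  #include <unordered_map>
  #include <vector>

  // Stand-ins for illustration.
  struct Expression {};                           // a struct.new site in the IR
  struct GCData { std::vector<int64_t> fields; };

  class IdentityTracker {
    std::unordered_map<Expression*, std::shared_ptr<GCData>> created;

  public:
    // Return the one GCData for this allocation site, creating it on
    // first use; repeated executions then agree for ref.eq purposes.
    std::shared_ptr<GCData> dataFor(Expression* site) {
      auto& data = created[site];
      if (!data) {
        data = std::make_shared<GCData>();
      }
      return data;
    }
  };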