| Commit message | Author | Age | Files | Lines |
|
|
|
| |
This adds support for `try`-`delegate` to `CFGWalker`. This also adds a
single test for `catch`-less `try`.
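A rough sketch (names hypothetical) of the kind of construct this covers, in
the folded EH text format of the time; note that a `try` whose only handler is
a `delegate` has no `catch` at all, which is the catch-less case:
  (try $outer
    (do
      (try
        (do
          (call $may-throw)
        )
        (delegate $outer)  ;; exceptions skip past here to $outer's handlers
      )
    )
    (catch_all)
  )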
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add struct.get tracking, and if a field is never read from, simply remove
it.
This will error if a field is written using struct.new with a value that has
side effects. It is not clear we can handle that: if the struct.new is in a
global then we can't save the other values to locals etc. to reorder
things. We could perhaps use other globals for it (ugh), but at least for
now, that corner case does not happen on any code I can see.
This allows a quite large code size reduction on j2wasm output (20%). The
reason is that many vtable fields are not actually read, and so removing
them and the ref.func they hold allows us to get rid of those functions,
and code that they reach.
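A hypothetical sketch of the kind of field this removes (names invented here):
  (type $vtable (struct (field funcref) (field funcref)))
  ...
  (struct.new $vtable
    (ref.func $used)
    (ref.func $never-read)  ;; no struct.get ever reads this field
  )
If the second field is removed, the ref.func goes with it, which can make
$never-read itself (and the code it reaches) removable.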
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We already detected code that looks like
  if (foo == 0) {
    foo = 1;
  }
That "read only to write" pattern occurs also in functions, like this:
  function bar() {
    if (foo == 0) return;
    foo = 1;
  }
This PR detects that pattern. It moves code around to share almost
all the logic with the previous pattern (the git diff is not that useful
there, sadly, but looking at the two side by side it should be
obvious).
This helps in j2cl on some common clinits, where the clinit function
ends up empty, which is exactly this pattern.
|
|
|
| |
Fuzzing followup to #4244.
|
|
|
|
|
| |
Code in the If condition can be moved out to before the if.
Existing test updates are 99% whitespace.
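An illustrative sketch of the idea, assuming the condition contains a block
with a side-effecting call (names hypothetical):
  (if
    (block (result i32)
      (call $log)
      (local.get $x)
    )
    (..)
  )
  =>
  (call $log)
  (if
    (local.get $x)
    (..)
  )
The condition runs before the if either way, so moving its leading code out
preserves behavior.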
|
| |
|
|
|
|
|
|
|
|
|
| |
Just as the --nominal flag forces all types to be parsed as nominal, the
--structural flag forces all types to be parsed as equirecursive. This is the
current default behavior, but a future PR will change the default to parse types
as either structural or nominal according to their syntax or encoding. This new
flag will then be necessary to get the current behavior.
Also take this opportunity to deduplicate more flags in the help tests.
|
|
|
|
| |
Very simple with the work so far: just add StructGet/ArrayGet code to check
whether the field is immutable, and allow the get to go through in that case.
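A minimal sketch of a get that can now go through, assuming field 0 of $T is
immutable and the value is known:
  (struct.get $T 0
    (struct.new $T
      (i32.const 42)
    )
  )
  ;; the immutable field must still hold 42, so this is (i32.const 42)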
|
|
|
|
|
|
| |
This adds support for tag-using instructions (`throw` and `catch`) to
wasm-metadce. We had to use a hacky workaround in
emscripten-core/emscripten#15266 because of the lack of this support;
after this lands we can remove it.
|
|
|
|
|
|
|
|
| |
Switch from "extends" to M4 nominal syntax
Change all test inputs from using the old (extends $super) syntax to using the
new *_subtype syntax, and also update the printer to emit the new syntax. Add a
new test explicitly testing the old notation to make sure it
keeps working until we remove support for it.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This optimizes this type of pattern:
(local.set $x (struct.new X Y Z))
(struct.set (local.get $x) X')
=>
(local.set $x (struct.new X' Y Z))
Note how the struct.set is removed, and X' moves to where X was.
This removes almost 90% (!) of the struct.sets in j2wasm output, which reduces
total code size by 2.5%. However, I see no speedup with this - I guess that either
this is not on the hot path, or V8 optimizes it well already, or the CPU is making
stores "free" anyhow...
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Precompute will run the interpreter on struct.new etc. repeatedly,
as it keeps doing so while it propagates constant values around (if one
of the operands to the struct.new becomes constant, that could have
a noticeable effect). But creating new GC data means we lose track of
their identity, and so ref.eq would not work, and we disabled basically
all struct operations. This implements identity tracking so we can start
to optimize there, which is a step towards using it for immutable field
propagation.
To track identity, always store the data representing each struct.new
in the source using the same GCData structure. That keeps identity
consistent no matter how many times we execute.
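A sketch of what stable identity enables (locals hypothetical):
  (local.set $x
    (struct.new $T
      (i32.const 1)
    )
  )
  (ref.eq
    (local.get $x)
    (local.get $x)
  )
Previously, each time the interpreter evaluated the struct.new it created
fresh GC data, so the propagated values looked like different objects and we
had to give up; with a single shared GCData the ref.eq can be precomputed to 1.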
|
|
|
|
|
|
|
|
|
|
|
| |
Side effects in the first element are always ok there, as they are
not moved across anything else: they happen before their parent
both before and after the opt.
The pass just left ternary as a TODO, so do at least one part of
that now (we can do the rest as well, with some care).
This is fairly useful on array.set which has 3 operands, and the
first often has interesting things in it.
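For example (type and names hypothetical), an array.set like
  (array.set $T
    (call $get-array)  ;; side effects here are fine: this always runs first
    (i32.const 0)
    (local.get $value)
  )
illustrates the case: the first operand's side effects stay put, since nothing
is ever reordered across it.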
|
|
|
|
| |
This makes Binaryen match LLVM on a real-world case, which is probably
the safest heuristic to use.
|
|
|
|
|
|
| |
This is the easy part of using immutability more: Just note immutable
fields as such when we read from them, and then a write to a struct
does not interfere with such reads. That is, only a read from a mutable
field can notice the effect of a write.
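A sketch of the reasoning (type and field names hypothetical):
  (struct.get $T $imm  ;; $imm is an immutable field
    (local.get $a)
  )
  (struct.set $T $mut  ;; $mut is a mutable field
    (local.get $b)
    (i32.const 1)
  )
A struct.set can only target a mutable field, so it can never change what the
struct.get of the immutable field observes, and the two do not interfere.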
|
|
|
|
|
| |
`BinaryenTableSizeSetTable` was being declared in the header correctly, but defined
as `BinaryenTableSetSizeTable`. Add test for `BinaryenTableSizeGetTable` and
`BinaryenTableSizeSetTable`.
|
| |
|
| |
|
|
|
|
|
|
|
|
|
| |
Implement parsing the new {func,struct,array}_subtype format for nominal types.
For now, the new format is parsed the same way the old-style (extends X) format
is parsed, i.e. in --nominal mode types are parsed as nominal but otherwise they
are parsed as equirecursive. Intentionally do not parse the new types
unconditionally as nominal for now to allow frontends to update their nominal
text format while continuing to use the workflow of running wasm-opt without
--nominal to lower nominal types to structural types.
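Roughly, the new forms look like this (the concrete fields, parameters, and
names here are hypothetical):
  (type $A (struct (field i32)))
  (type $B (struct_subtype (field i32) (field i64) $A))
  (type $f (func_subtype (param i32) (result i32) $g))
  (type $a (array_subtype i8 $arr))
with the declared supertype appearing last.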
|
|
|
|
|
|
|
|
| |
See #4220 - this lets us handle the common case for now of simply having
an identical heap type to the table when the signature is identical.
With this PR, #4207's optimization of call_ref + table.get into
call_indirect now leads to a binary that works in V8 in nominal mode.
|
|
|
| |
Followup to #4215
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add a new pass to perform global type optimization. So far this just
does one thing: find fields with no struct.set and turn them
immutable (where possible - sub- and supertypes must agree).
To do that, this adds a GlobalTypeRewriter utility which rewrites
all the heap types in the module, allowing changes while doing so.
In this PR, the change is to flip the mutable field. Otherwise, the
utility handles all the boilerplate of creating temp heap types using
a TypeBuilder, and it handles replacing the types in every place
they are used in the module.
This is not enabled by default yet as I don't see enough of a benefit
on j2cl. This PR is basically the simplest thing to do in the space of
global type optimization, and the simplest way I can think of to
fully test the GlobalTypeRewriter (which can't be done as a unit
test, really, since we want to emit a full module and validate it etc.).
This PR builds the foundation for more complicated things like
removing unused fields, subtyping fields, and more.
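A sketch of the one change this makes so far: if nothing in the module does a
struct.set of the field (and sub- and supertypes agree), then
  (type $T (struct (field (mut i32))))
can be rewritten to
  (type $T (struct (field i32)))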
|
|
|
|
|
|
|
|
|
|
|
| |
patterns (#4181)
i32(x) ? i32(x) : 0 ==> x
i32(x) ? 0 : i32(x) ==> {x, 0}
i64(x) == 0 ? 0 : i64(x) ==> x
i64(x) != 0 ? i64(x) : 0 ==> x
i64(x) == 0 ? i64(x) : 0 ==> {x, 0}
i64(x) != 0 ? 0 : i64(x) ==> {x, 0}
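The first pattern in wat form (select takes ifTrue, ifFalse, then condition):
  (select
    (local.get $x)  ;; ifTrue: x
    (i32.const 0)   ;; ifFalse: 0
    (local.get $x)  ;; condition: x
  )
  ;; when x is nonzero the result is x; when x is zero the result is 0 == x,
  ;; so this is just (local.get $x)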
|
|
|
|
|
|
|
|
| |
These new nominal types do not depend on the global type system being changed
with the --nominal flag. Instead, they can coexist with the existing
equirecursive structural types, as required by the new milestone 4 spec. This PR
implements subtyping, upper bounding, canonicalizing, and other type operations,
but using the new types in the parsers and elsewhere in Binaryen is left to a
follow-on PR.
|
|
|
|
|
| |
Before this fix, element segments for the first table (index 0) were treated as
having "no table index" even when the table's type is not funcref, which could
break things if that table had a more specialized type.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
(call_indirect
  ..args..
  (select
    (i32.const x)
    (i32.const y)
    (condition)
  )
)
=>
(if
  (condition)
  (call $func-for-x
    ..args..
  )
  (call $func-for-y
    ..args..
  )
)
To do this we must reorder the condition with the args, and also use
the args more than once, so place them all in locals.
This works towards the goal of polymorphic devirtualization, that is,
turning an indirect call of more than one possible target into more
than one direct call.
|
|
|
| |
Rather than load from the table and call that reference, call using the table.
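Roughly, in the style of the other examples here (type annotations omitted):
  (call_ref
    ..args..
    (table.get $table
      (i32.const 0)
    )
  )
  =>
  (call_indirect $table
    ..args..
    (i32.const 0)
  )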
|
| |
|
|
|
|
|
|
|
|
| |
- fpcast-emu.wast
- generate-dyncalls_all-features.wast
- generate-i64-dyncalls.wast
- instrument-locals_all-features_disable-typed-function-references.wast
- instrument-memory.wast
- instrument-memory64.wast
|
|
|
|
| |
Adds the part of the spec test suite that this passes (without table.set we
can't do it all).
|
|
|
|
|
|
|
|
|
|
|
|
| |
Now that they are all implemented, we can optimize them. This removes the
big if that ignored static operations, and implements things for them.
In general this matches the existing rtt-using case, but there are a few things
we can do better, which this does:
* A cast of a subtype to a type always succeeds.
* A test of a subtype to a type is always 1 (if non-nullable).
* Repeated static casts can leave just the most demanding of them.
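For example (types hypothetical), if $sub is declared a subtype of $super and
$x has non-nullable type (ref $sub), then a static test like
  (ref.test_static $super
    (local.get $x)
  )
must succeed, so it can be replaced by (i32.const 1).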
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
A SmallSet starts with fixed storage that it uses in the simplest
possible way (linear scan, no sorting). If it exceeds that size then it
starts using a normal std::set. So for small amounts of data it
avoids allocation and any other overhead.
This adds a unit test and also uses it in LocalGraph which provides
a large amount of additional coverage.
I also changed an unrelated data structure from std::map to
std::unordered_map which I noticed while doing profiling in
LocalGraph. (And a tiny bit of additional refactoring there.)
This makes LocalGraph-using passes like ssa-nomerge and
precompute-propagate 10-15% faster on a bunch of real-world
codebases I tested.
|
|
|
|
|
|
|
|
|
| |
This clang-formats c/cpp files in test/ directory, and updates
clang-format-diff.sh so that it does not ignore test/ directory anymore.
bigswitch.cpp is excluded from formatting, because there are big
commented-out code blocks, and apparently clang-format messes up
formatting in them. To make matters worse, different clang-format
versions do different things with those commented-out code blocks.
|
|
|
|
|
| |
Locally I saw a 10% speedup on j2cl but reports of regressions have
arrived, so let's disable it for now pending investigation. The option added
here should make it easy to experiment.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously the set of functions to keep was initially empty, then the profile
added new functions to keep, then the --keep-funcs functions were added, then
the --split-funcs functions were removed. This method of composing these
different options was arbitrary and not necessarily intuitive, and it prevented
reasonable workflows from working. For example, providing only a --split-funcs
list would result in all functions being split out no matter which functions
were listed.
To make the behavior of these options, and --split-funcs in particular, more
intuitive, disallow mixing them, and when --split-funcs is used, split out only
the listed functions.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Precompute has a mode in which it propagates results from local.sets to
local.gets. That constructs a LocalGraph which is a non-trivial amount of
work. We used to run multiple iterations of this, but investigation shows that
such opportunities are extremely rare, as doing just a single propagation
iteration has no effect on the entire emscripten benchmark suite, nor on
j2cl output. Furthermore, we run this pass twice in the normal pipeline (once
early, once late) so even if there are such opportunities they may be
optimized already. And, --converge is a way to get additional iterations of
all passes if a user wants that, so it makes sense not to do costly work for
more iterations automatically.
In effect, 99.99% of the time before this change we would create the LocalGraph
twice: once the first time, then a second time only to see that we can't
actually optimize anything further. This PR makes us only create it once, which
makes precompute-propagate 10% faster on j2cl and even faster on other things
like poppler (33%) and LLVM (29%).
See the change in the test suite for an example of a case that does require
more than one iteration to be optimized. Note that even there, we only manage
to get benefit from a second iteration by doing something that overlaps with
another pass (optimizing out an if with condition 0), which shows even more
how unnecessary the extra work was.
See #4165
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
if (A) {
  if (B) {
    C
  }
}
=>
if (A ? B : 0) {
  C
}
when B has no side effects, and is fast enough to consider running
unconditionally. In that case, we replace an if with a select and a
zero, which is the same size, but should be faster and may be
further optimized.
As suggested in #4168
|
| |
|
|
|
|
|
|
|
|
|
| |
See #4149
This modifies the test added in #4163 which used static casts on
dynamically-created structs and arrays. That was technically not
valid (as we don't want users to "mix" the two forms). This makes that
test 100% static, which both fixes the test and gives test coverage
to the new instructions added here.
|
|
|
|
|
|
|
|
|
|
|
|
| |
We added an optional ReFinalize in OptimizeInstructions at some point,
but that is not valid: ReFinalize only updates types when all other
work is done, but the pass works incrementally. The bug the fuzzer found
is that a child is changed to be unreachable, and then the parent is
optimized before finalize() is called on it, which led to an assertion being
hit (as the child was unreachable but not the parent, which should also
have been).
To fix this, do not change types in this pass. Emit an extra block with a
declared type when necessary. Other passes can remove the extra block.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
These variants take a HeapType that is the type we intend to cast to,
and do not take an RTT.
These are intended to be more statically optimizable. For now though
this PR just implements the minimum to get them parsing and to get
through the optimizer without crashing.
Spec: https://docs.google.com/document/d/1afthjsL_B9UaMqCA5ekgVmOm75BVFu6duHNsN9-gnXw/edit#
See #4149
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This PR helps with functions like this:
function foo(x) {
  if (x) {
    ..
    lots of work here
    ..
  }
}
If "lots of work" is large enough, then we won't inline such a
function. However, we may end up calling into the function
only to get a false on that if and immediately exit. So it is useful
to partially inline this function, basically by creating a split
of it into a condition part that is inlineable
function foo$inlineable(x) {
  if (x) {
    foo$outlined();
  }
}
and an outlined part that is not inlineable:
function foo$outlined(x) {
  ..
  lots of work here
  ..
}
We can then inline the inlineable part. That means that a call
like
foo(param);
turns into
if (param) {
  foo$outlined();
}
In other words, we end up replacing a call and then a check with
a check and then a call. Any time that the condition is false, this
will be a speedup.
The cost here is increased size, as we duplicate the condition
into the callsites. For that reason, only do this when heavily
optimizing for size.
This is a 10% speedup on j2cl. This helps two types of functions
there: Java class inits, which often look like "have I been
initialized before? if not, do all this work", and also assertion
methods which look like "if the input is null, throw an
exception".
|
|
|
|
|
|
|
|
|
|
| |
If we can remove such traps, we can remove ref.as_non_null if the local
type is nullable anyhow.
If we support non-nullable locals, however, then do not do this, as it
could inhibit specializing the local type later. Do the same for tees, for
which we had existing code.
Background: #4061 (comment)
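A sketch of the simple case (assuming traps can be ignored and $x is nullable):
  (local.set $x
    (ref.as_non_null
      (..)
    )
  )
  =>
  (local.set $x
    (..)
  )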
|
|
|
|
|
|
| |
This means that when check.py tries to run the lit tests with
BINARYEN_PASS_DEBUG, this is now correctly reflected in the tests. Manually
validated to catch the bug identified in
https://github.com/WebAssembly/binaryen/pull/4130#discussion_r709619855.
|
|
|
|
|
|
|
|
|
| |
That PR reused the same node twice in the output, which fails on the
assertion in BINARYEN_PASS_DEBUG=1 mode.
No new test is needed because the existing test suite fails already
in that mode. That the PR managed to land seems to say that we are
not testing pass-debug mode on our lit tests, which we need to
investigate.
|
|
|
|
|
|
| |
Avoids a crash in calling getHeapType when there isn't one.
Also add the relevant lit test (and a few others) to the list of files to
fuzz more heavily.
|
| |
|