Commit messages

Fuzzing followup to #4244.

Code in the If condition can be moved out to before the if.
Existing test updates are 99% whitespace.

This sets the C++ standard variable in the build to C++17, and makes use of std::optional (a C++17 library feature) in one place, to test that it's working.
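
As a generic illustration of the library feature being exercised here (this is
not the actual call site added by the change, just a hypothetical example),
std::optional expresses "a value or nothing" without a sentinel:

  #include <optional>
  #include <string>

  // Hypothetical helper: return a value only when one exists, with no
  // sentinel such as an empty string.
  std::optional<std::string> findExportName(bool hasExport) {
    if (hasExport) {
      return "main";
    }
    return std::nullopt;  // callers must check before dereferencing
  }

  // Usage: if (auto name = findExportName(true)) { use(*name); }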

Just as the --nominal flag forces all types to be parsed as nominal, the
--structural flag forces all types to be parsed as equirecursive. This is the
current default behavior, but a future PR will change the default to parse types
as either structural or nominal according to their syntax or encoding. This new
flag will then be necessary to get the current behavior.
Also take this opportunity to deduplicate more flags in the help tests.

This is very simple given the work so far: just add StructGet/ArrayGet code to check
if the field is immutable, and allow the get to go through in that case.

This adds support for tag-using instructions (`throw` and `catch`) to
wasm-metadce. We had to use a hacky workaround in
emscripten-core/emscripten#15266 because of the lack of this support;
after this lands we can remove it.

Switch from "extends" to M4 nominal syntax
Change all test inputs from the old (extends $super) syntax to the new
*_subtype syntax, and also update the printer to emit the
new syntax. Add a new test explicitly testing the old notation to make sure it
keeps working until we remove support for it.

This optimizes this type of pattern:
(local.set $x (struct.new X Y Z))
(struct.set (local.get $x) X')
=>
(local.set $x (struct.new X' Y Z))
Note how the struct.set is removed, and X' moves to where X was.
This removes almost 90% (!) of the struct.sets in j2wasm output, which reduces
total code size by 2.5%. However, I see no speedup with this - I guess that either
this is not on the hot path, or V8 optimizes it well already, or the CPU is making
stores "free" anyhow...
|
| |
|
|
|
|
|
|
|
|
|
| |
Not sure why the current code tries to add the name even when it is
null, but it causes `dump()` to behave strangely and pollute stdout when
it tries to print `root.str`.
This also changes code that prints `Name.str` to print just `Name`; when
`Name.str` is null, it prints `(null Name)` instead of polluting stdout,
and it is the recommended way of printing `Name` anyway.

Precompute will run the interpreter on struct.new etc. repeatedly,
as it keeps doing so while it propagates constant values around (if one
of the operands to the struct.new becomes constant, that could have
a noticeable effect). But creating new GC data each time means we lose track
of identity, so ref.eq would not work, and we had disabled basically all
struct operations. This implements identity tracking so we can start
to optimize there, which is a step towards using it for immutable field
propagation.
To track identity, always store the data representing each struct.new
in the source using the same GCData structure. That keeps identity
consistent no matter how many times we execute.
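
A minimal sketch of that identity-tracking idea (the names and shapes below are
illustrative assumptions, not Binaryen's actual interpreter internals): the
allocation is cached per originating struct.new expression, so re-executing the
same expression yields the same object and ref.eq stays meaningful:

  #include <map>
  #include <memory>
  #include <vector>

  struct Expression;                           // the struct.new node (assumed)
  struct GCData { std::vector<int> fields; };  // simplified field storage

  struct IdentityCache {
    // One GCData per originating struct.new, reused across executions.
    std::map<Expression*, std::shared_ptr<GCData>> created;

    std::shared_ptr<GCData> evaluate(Expression* structNew,
                                     std::vector<int> fieldValues) {
      auto& data = created[structNew];
      if (!data) {
        data = std::make_shared<GCData>();
      }
      data->fields = std::move(fieldValues);   // refresh values, keep identity
      return data;                             // ref.eq compares this pointer
    }
  };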

Side effects in the first element are always ok there, as they are
not moved across anything else: they happen before their parent
both before and after the opt.
The pass just left ternary as a TODO, so do at least one part of
that now (we can do the rest as well, with some care).
This is fairly useful on array.set which has 3 operands, and the
first often has interesting things in it.

This makes Binaryen match LLVM on a real-world case, which is probably
the safest heuristic to use.

This is the easy part of using immutability more: Just note immutable
fields as such when we read from them, and then a write to a struct
does not interfere with such reads. That is, only a read from a mutable
field can notice the effect of a write.
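
A hedged sketch of that interference rule (the names and shape are illustrative,
not Binaryen's actual effect-analysis code): a write only conflicts with reads
of mutable fields, so reads of immutable fields can move across writes:

  #include <set>
  #include <string>

  struct Effects {
    std::set<std::string> mutableFieldsRead;    // reads a write can affect
    std::set<std::string> immutableFieldsRead;  // tracked, never invalidated
    std::set<std::string> fieldsWritten;        // struct.set targets

    // Would running other (which may write fields) invalidate our reads?
    bool invalidatedBy(const Effects& other) const {
      for (auto& field : other.fieldsWritten) {
        if (mutableFieldsRead.count(field)) {
          return true;  // a write can change what a mutable-field read sees
        }
      }
      return false;     // immutable-field reads ignore writes entirely
    }
  };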

Add an assert on not emitting a null name (which would cause
a crash a few lines down on trying to read its bytes). I hit that
when writing a buggy pass that updated field names.
Also fix the case of a type not having a name but some of its
fields having names. We can't test that atm since our text
format requires types to have names anyhow, so this is a
fix for a possible future where we do allow parsing non-named
types.

Div/rem by a constant can be optimized by VMs, so it is usually
closer to the speed of a mul.
Div on 64-bit (either with or without a constant) can be slower
than 32-bit, so bump that up by one as well.
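
An illustrative cost function along those lines (the specific constants and
names are assumptions for the sketch, not the values in Binaryen's cost model):

  // Hypothetical sketch: div/rem by a constant is closer to a mul, and
  // 64-bit division gets bumped up by one.
  int divRemCost(bool rhsIsConstant, bool is64Bit) {
    int cost = rhsIsConstant ? 2   // VMs can strength-reduce this case
                             : 3;  // full division otherwise
    if (is64Bit) {
      cost += 1;                   // 64-bit div/rem tends to be slower
    }
    return cost;
  }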

`BinaryenTableSizeSetTable` was being declared in the header correctly, but defined
as `BinaryenTableSetSizeTable`. Add test for `BinaryenTableSizeGetTable` and
`BinaryenTableSizeSetTable`.

We moved call_ref out of there, but it was still checking for the possible
presence of call_refs (using the feature), which means that even if we had
no valid tables to optimize on, we'd scan the whole module.

This method is in parallel to runOnFunction above it. It sets the runner
and then does the walk, like that method.
Also set runner to nullptr by default. I noticed ubsan was warning on
things here, which this should avoid, but otherwise I'm not aware of an
actual bug, so this should be NFC. But it does provide a safer API
that should avoid future bugs.

Implement parsing the new {func,struct,array}_subtype format for nominal types.
For now, the new format is parsed the same way the old-style (extends X) format
is parsed, i.e. in --nominal mode types are parsed as nominal but otherwise they
are parsed as equirecursive. Intentionally do not parse the new types
unconditionally as nominal for now to allow frontends to update their nominal
text format while continuing to use the workflow of running wasm-opt without
--nominal to lower nominal types to structural types.

See #4220 - this lets us handle the common case for now of simply having
an identical heap type to the table when the signature is identical.
With this PR, #4207's optimization of call_ref + table.get into
call_indirect now leads to a binary that works in V8 in nominal mode.

Followup to #4215

Clearer this way.

Add a new pass to perform global type optimization. So far this just
does one thing: find fields with no struct.set and turn them
immutable (where possible; sub- and supertypes must agree).
To do that, this adds a GlobalTypeRewriter utility which rewrites
all the heap types in the module, allowing changes while doing so.
In this PR, the change is to flip the mutable field. Otherwise, the
utility handles all the boilerplate of creating temp heap types using
a TypeBuilder, and it handles replacing the types in every place
they are used in the module.
This is not enabled by default yet as I don't see enough of a benefit
on j2cl. This PR is basically the simplest thing to do in the space of
global type optimization, and the simplest way I can think of to
fully test the GlobalTypeRewriter (which can't be done as a unit
test, really, since we want to emit a full module and validate it etc.).
This PR builds the foundation for more complicated things like
removing unused fields, subtyping fields, and more.

patterns (#4181)
i32(x) ? i32(x) : 0 ==> x
i32(x) ? 0 : i32(x) ==> {x, 0}
i64(x) == 0 ? 0 : i64(x) ==> x
i64(x) != 0 ? i64(x) : 0 ==> x
i64(x) == 0 ? i64(x) : 0 ==> {x, 0}
i64(x) != 0 ? 0 : i64(x) ==> {x, 0}

These new nominal types do not depend on the global type system being changed
with the --nominal flag. Instead, they can coexist with the existing
equirecursive structural types, as required in the new milestone 4 spec. This PR
implements subtyping, upper bounding, canonicalizing, and other type operations
but using the new types in the parsers and elsewhere in Binaryen is left to a
follow-on PR.

Update the binary format used in --nominal mode to match the format of nominal
types in milestone 4. In particular, types without declared supertypes are now
emitted using the nominal type codes with either `func` or `data` as their
supertypes. This change is hopefully enough to get --nominal mode code running
on V8's milestone 4 implementation until the rest of the type system changes can
be implemented for use without --nominal.

Before this fix, the first table (index 0) was treated as if its element segments
had "no table index" even when the table's type is not funcref, which could break
things if that table had a more specialized type.

(call_indirect
  ..args..
  (select
    (i32.const x)
    (i32.const y)
    (condition)
  )
)
=>
(if
  (condition)
  (call $func-for-x
    ..args..
  )
  (call $func-for-y
    ..args..
  )
)
To do this we must reorder the condition with the args, and also use
the args more than once, so place them all in locals.
This works towards the goal of polymorphic devirtualization, that is,
turning an indirect call of more than one possible target into more
than one direct call.

The type field is present in all Expressions, but RefNull's delegations
marked it as if it were a new field. That meant that we processed it twice.
This was mostly just some extra work.

This just moves code outside and makes it more generic. One set of
functionality is "struct utils", which are tools to scan wasm for info
about the usage of struct fields, and to analyze that data. The other
tool is a general analysis of nominal subtypes.
The code will be useful in a few upcoming passes, so this will avoid a
significant amount of code duplication.

Rather than load from the table and call that reference, call using the table.

Emscripten must have rolled in a new warning about using `|` on booleans.

It's deprecated in C++17.

Adds the part of the spec test suite that this passes (without table.set we
can't do it all).

Now that they are all implemented, we can optimize them. This removes the
big if that ignored static operations, and implements things for them.
In general this matches the existing rtt-using case, but there are a few things
we can do better, which this does:
* A cast of a subtype to a type always succeeds.
* A test of a subtype to a type is always 1 (if non-nullable).
* Repeated static casts can leave just the most demanding of them.

A SmallSet starts with fixed storage that it uses in the simplest
possible way (linear scan, no sorting). If it exceeds a size then it
starts using a normal std::set. So for small amounts of data it
avoids allocation and any other overhead.
This adds a unit test and also uses it in LocalGraph which provides
a large amount of additional coverage.
I also changed an unrelated data structure from std::map to
std::unordered_map which I noticed while doing profiling in
LocalGraph. (And a tiny bit of additional refactoring there.)
This makes LocalGraph-using passes like ssa-nomerge and
precompute-propagate 10-15% faster on a bunch of real-world
codebases I tested.
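
A simplified sketch of the idea (an illustration, not the actual SmallSet
implementation): N inline slots scanned linearly, spilling everything into a
std::set only once the fixed storage overflows:

  #include <array>
  #include <cstddef>
  #include <set>

  template<typename T, size_t N>
  struct SmallSet {
    std::array<T, N> fixed;  // inline storage, unsorted
    size_t used = 0;         // number of valid entries in fixed
    std::set<T> flexible;    // used only after we exceed N elements

    void insert(const T& x) {
      if (!flexible.empty()) {
        flexible.insert(x);
        return;
      }
      for (size_t i = 0; i < used; i++) {
        if (fixed[i] == x) {
          return;  // already present
        }
      }
      if (used < N) {
        fixed[used++] = x;  // still fits inline, no allocation
      } else {
        // Spill the inline elements and switch to the std::set.
        flexible.insert(fixed.begin(), fixed.begin() + used);
        flexible.insert(x);
        used = 0;
      }
    }

    bool count(const T& x) const {
      if (!flexible.empty()) {
        return flexible.count(x) > 0;
      }
      for (size_t i = 0; i < used; i++) {
        if (fixed[i] == x) {
          return true;
        }
      }
      return false;
    }
  };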

Locally I saw a 10% speedup on j2cl but reports of regressions have
arrived, so let's disable it for now pending investigation. The option added
here should make it easy to experiment.

By mistake the recent partial inlining work introduced quadratic time into
the compiler: erasing a function from the list of functions takes linear time,
which is why we have removeFunctions that does a group at a time.
This isn't noticeable on small programs, but on j2cl output this makes the
inlining-optimizing step 2x faster.
See #4165
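
A hedged sketch of the group-removal idea (the types and function below are
illustrative, not Binaryen's actual Module API): erasing functions one at a
time from a vector costs O(n) per erase, so removing k of them is O(k * n),
while filtering the whole group out in a single pass is O(n):

  #include <algorithm>
  #include <memory>
  #include <string>
  #include <unordered_set>
  #include <vector>

  struct Function { std::string name; };

  void removeFunctions(std::vector<std::unique_ptr<Function>>& functions,
                       const std::unordered_set<std::string>& toRemove) {
    functions.erase(
      std::remove_if(functions.begin(), functions.end(),
                     [&](const std::unique_ptr<Function>& f) {
                       return toRemove.count(f->name) > 0;
                     }),
      functions.end());
  }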

Previously the set of functions to keep was initially empty, then the profile
added new functions to keep, then the --keep-funcs functions were added, then
the --split-funcs functions were removed. This method of composing these
different options was arbitrary and not necessarily intuitive, and it prevented
reasonable workflows from working. For example, providing only a --split-funcs
list would result in all functions being split out no matter which functions
were listed.
To make the behavior of these options, and --split-funcs in particular, more
intuitive, disallow mixing them and when --split-funcs is used, split out only
the listed functions.

Precompute has a mode in which it propagates results from local.sets to
local.gets. That constructs a LocalGraph which is a non-trivial amount of
work. We used to run multiple iterations of this, but investigation shows that
such opportunities are extremely rare, as doing just a single propagation
iteration has no effect on the entire emscripten benchmark suite, nor on
j2cl output. Furthermore, we run this pass twice in the normal pipeline (once
early, once late) so even if there are such opportunities they may be
optimized already. And, --converge is a way to get additional iterations of
all passes if a user wants that, so it makes sense not to do costly work for
more iterations automatically.
In effect, 99.99% of the time before this PR we would create the LocalGraph
twice: once the first time, then a second time only to see that we can't
actually optimize anything further. This PR makes us only create it once, which
makes precompute-propagate 10% faster on j2cl and even faster on other things
like poppler (33%) and LLVM (29%).
See the change in the test suite for an example of a case that does require
more than one iteration to be optimized. Note that even there, we only manage
to get benefit from a second iteration by doing something that overlaps with
another pass (optimizing out an if with condition 0), which shows even more
how unnecessary the extra work was.
See #4165

if (A) {
  if (B) {
    C
  }
}
=>
if (A ? B : 0) {
  C
}
when B has no side effects, and is fast enough to consider running
unconditionally. In that case, we replace an if with a select and a
zero, which is the same size, but should be faster and may be
further optimized.
As suggested in #4168