| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
| |
When generating assertions, traverse the `WASTScript` data structure rather than
interleaving assertion parsing with emitting.
|
|
|
|
|
|
|
|
|
|
|
| |
Each time we inline we put the contents in a block. Before we used the same name
each time we inlined the same method, and as a result had many conflicts if a
function was inlined many times. With this PR we emit a different name each
time.
This is not 100% NFC as it does change block names, which is observable in the IR
(as can be seen in the test updates).
This helps #5696 in speeding up UniqueNameManner.
|
|
|
|
|
|
|
| |
As noted in #4739, legacy language emitting nan and infinity
exists, with the observation that it can be removed once asm.js
is no longer used and global NaN is available.
This commit removes that asm.js-specific code accordingly.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
possible (#5194)
(local.set $refined (cast (local.get $plain)))
..
.. (local.get $plain) .. ;; we can change this to read from $refined
By using the more refined type we may be able to eliminate casts later.
To do this, look at the fallthrough value (so we can look through a cast or a block
value - this is the reason for the small wasm2js improvements in tests), and also
extend the code that picks which local index to read to look at types (previously
we just ignored any pairs of locals with different types).
|
|
|
|
|
|
|
|
| |
The previous code was making emscripten-specific assumptions about
imports basically all coming from the `env` module.
I can't find a way to make this backwards compatible so may do a
combined roll with the emscripten-side change:
https://github.com/emscripten-core/emscripten/pull/17806
|
|
|
|
|
|
| |
This import was being injected and then used to implement trapping.
Rather than injecting an import that doesn't exist in the original
module we instead use the existing mechanism to implement this as
an internal helper.
|
|
|
|
|
|
|
|
|
|
|
| |
Previously we were assuming asmLibraryArg which is what emscripten
passes as the `env` import object but using this method is more
flexible and should allow wasm2js to work with import that are
not all form a single object.
The slight size increase here is just temporary until emscripten
gets updated.
See https://github.com/emscripten-core/emscripten/pull/17737
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The old algorithm can be summarized as: In each basic block, start at the beginning.
Each pair of live locals there might interfere with each other, as they might arrive from
different entry blocks with different values. Afterwards, go through the block and find
overlapping live ranges, and mark interferences there as well.
This is non-linear because at the start of the block we do a double-loop over all
pairs of live locals, which in general can be O(N^2) (N - number of locals). It also
has the downside of ignoring copies: if two locals have overlapping live ranges but
they must have identical values on those ranges, they do not actually interfere,
for example
x = 10;
y = x;
.. // live ranges overlap here
foo(x, y); // live ranges end here.
We can ignore this overlap since the copy shows they are identical there, but the
pass did not take this into account. To some extent other passes can remove such
copies (SimplifyLocals, MergeLocals, RedundantSetElimination), but in general
this was a weak spot for the optimizer.
I realized there is a solution to both these problems: In Wasm, given that we have
a default value for all locals, if a local is live at the start of a block then it must be
live at the end of all the blocks reaching it. That is so because the liveness will
extend backwards all the way to some set of the local, possibly all the way to
the zero-initialization at the start of the function, and it extends that way through
all predecessor blocks. A consequence of this is that there are no interferences
between locals that only occur during a merge: The live ranges include the
predecessor blocks, and theirs, and so forth, until we reach a block where one
of the locals is assigned a value different than the other. That is a necessary and
sufficient condition for intererence, and therefore when processing a block we
only need to look at its contents, and can ignore the merging of control flow,
which allows us to be linear.
More details on this and on the new algorithm in comments in the source, but
the basic idea is that it simply goes through each block in a linear way, finding
which values are assigned to each local (using a numbering of unique values),
and noting which are live at each time. If two locals are live and one is assigned
a value that is not the same as the value in the other, mark them as interfering.
This is of substantial benefit to j2wasm output, I believe because it is common
there to find local subexpression elimination opportunities after inlining, and
each time we find one we add a local. If we inline different functions into the
same target, we may end up with copied locals for each of them. (This was
not noticed in the past because it is very rare on LLVM output, which has
already had inlining and GVN etc. done.)
There is a small benefit to LLVM output as well, though just a few
percent at best. However, it is enough to be noticeable on some of
the code size tests.
This is also faster than the previous pass. It's normally not noticeable
as this pass is not one of the slowest anyhow, but I found some real-world
codebases where the pass becomes 50% faster. I have not found any
case where it is slower than the old algorithm.
Fuzzed over several days to be sure this is correct, and also verified
on the emscripten test suite.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Canonicalize:
(signed)x > -1 ==> x >= 0
(signed)x <= -1 ==> x < 0
(signed)x < 1 ==> x <= 0
(signed)x >= 1 ==> x > 0
(unsigned)x < 1 ==> x == 0
(unsigned)x >= 1 ==> x != 0
This should help #4265, and in general 0 is usually a more
common constant, and reasonable to canonicalize to.
|
|
|
|
|
|
|
|
|
|
|
|
| |
Enable it in -O3 and -Os and higher.
This helps very little on output from LLVM, but also it does not alter
compile times much anyhow. On code that has not been run through
an optimizing compiler already, this can help quite a lot, e.g., 15% of
code size on some wasm GC samples.
This will not normally help with speed, as optimizing VMs do such
things anyhow. However, this can help baseline compilers and
interpreters and so forth.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This implements emscripten-core/emscripten#13744
Inlining functions with a single use allows us to remove the function afterward.
That looks highly beneficial, shrinking every single benchmark in emscripten's
benchmark suite, by an average of 2% on the macrobenchmarks and 3.5% on
all of them. Speed also improves, although mostly on the microbenchmarks so
that might be less realistic.
There may be a slight downside to startup time due to emitting larger functions,
but given the baseline compilers in VMs these days it seems worth it, as the
delay would be just to get to the upper tier. On the benchmark suite the risk
seems low.
See more details in the PR above.
|
| |
|
|
|
|
|
|
|
| |
Using addition in more places is better for gzip, and helps simplify the
optimizer as well.
Add a FinalOptimizer phase to do optimizations like our signed LEB tweaks, to
reduce binary size in the rare case when we do want a subtraction.
|
| |
|
|
|
|
|
| |
Selectify turns an if-else into a select where possible. Previously we abandoned
hope if any part of the if had a side effect. But it's fine for the condition to have a
side effect, so long as moving it to the end doesn't invalidate the arms.
|
| |
|
|
|
|
|
| |
Also, format the asmFunc call to make it more readable in the ES6
modules case.
|
| |
|
| |
|
|
|
| |
We don't ever emit "use asm" anymore, so this similar annotation is not really useful, it just increases size.
|
| |
|
| |
|
|
|
| |
This happens on e.g. an i32 load of a constant offset, then we have constant >> 2.
|
|
|
| |
This helps quite a lot on wasm2js.
|
|
|
|
|
| |
In JS a reinterpret is especially expensive, as we implement it as a write to a temp buffer and a read using another view. This finds places where we load a value from memory, then reinterpret it later - in that case, we can load it using another view, at the cost of another load and another local.
This is helpful on things like Box2D, where there are many reinterprets due to the main 2D vector class being an union over two floats/ints, and LLVM likes to do a single i64 load of them.
|
|
|
|
| |
When loading a boolean, prefer the signed heap (which is more commonly used, and may be faster).
We never use HEAPU32 (HEAP32 is always enough), just remove it.
|
|
|
| |
A minifier would probably remove them later anyhow, but they make reading the code annoying and hard.
|
|
|
| |
In particular, coalesce-locals is useful even if closure is run later (apparently it finds stuff closure can't).
|
|
We flatten for the i64 lowering etc. passes, and it is worth optimizing afterwards, to clean up stuff they created. That is run if the user ran wasm2js with an optimization level (like wasm2js -O3).
Split the test files to check both optimized and unoptimized code.
|