| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
| |
As the name of a class, uppercase seems better here.
|
|
|
|
|
|
|
|
|
|
|
| |
The code had a different workaround, namely to add a block with an explicit
type to avoid changing the type at all (i.e. the block declared the original
type of the thing we replaced, not the refined type). But by adding a block we
can end up with an invalid pop, as the fuzzer found, see the EH testcase here.
To fix this, use the usual workaround of just running ReFinalize. That is simpler
and also improves the code while we do it. It does add more work, but this is
likely a very rare situation anyhow.
|
|
|
|
|
|
|
|
|
| |
A pass that just operates on locals, for example, does not care about branches outside of
the function. That means that when we see a call, then even if EH is enabled we don't need
to create a new basic block right after it (unless the call is inside a try-catch - then it might
branch to the catch, of course).
This makes CFG-using passes 9% faster compared to before this PR. This plus #5827 offset
the slowdown from #5823 and overall give an improvement compared to before.
|
| |
|
|
|
| |
Same testcase as in #5287 but in another pass.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously only WalkerPasses had access to the `getPassRunner` and
`getPassOptions` methods. Move those methods to `Pass` so all passes can use
them. As a result, the `PassRunner` passed to `Pass::run` and
`Pass::runOnFunction` is no longer necessary, so remove it.
Also update `Pass::create` to return a unique_ptr, which is more efficient than
having it return a raw pointer only to have the `PassRunner` wrap that raw
pointer in a `unique_ptr`.
Delete the unused template `PassRunner::getLast()`, which looks like it was
intended to enable retrieving previous analyses and has been in the code base
since 2015 but is not implemented anywhere.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
An overview of this is in the README in the diff here (conveniently, it is near the
top of the diff). Basically, we fix up nn locals after each pass, by default. This keeps
things easy to reason about - what validates is what is valid wasm - but there are
some minor nuances as mentioned there, in particular, we ignore nameless blocks
(which are commonly added by various passes; ignoring them means we can keep
more locals non-nullable).
The key addition here is LocalStructuralDominance which checks which local
indexes have the "structural dominance" property of 1a, that is, that each get has
a set in its block or an outer block that precedes it. I optimized that function quite
a lot to reduce the overhead of running that logic after each pass. The overhead
is something like 2% on J2Wasm and 0% on Dart (0%, because in this mode we
shrink code size, so there is less work actually, and it balances out).
Since we run fixups after each pass, this PR removes logic to manually call the
fixup code from various places we used to call it (like eh-utils and various passes).
Various passes are now marked as requiresNonNullableLocalFixups => false.
That lets us skip running the fixups after them, which we normally do automatically.
This helps avoid overhead. Most passes still need the fixups, though - any pass
that adds a local, or a named block, or moves code around, likely does.
This removes a hack in SimplifyLocals that is no longer needed. Before we
worked to avoid moving a set into a try, as it might not validate. Now, we just do it
and let fixups happen automatically if they need to: in the common code they
probably don't, so the extra complexity seems not worth it.
Also removes a hack from StackIR. That hack tried to avoid roundtrip adding a
nondefaultable local. But we have the logic to fix that up now, and opts will
likely keep it non-nullable as well.
Various tests end up updated here because now a local can be non-nullable -
previous fixups are no longer needed.
Note that this doesn't remove the gc-nn-locals feature. That has been useful for
testing, and may still be useful in the future - it basically just allows nn locals in
all positions (that can't read the null default value at the entry). We can consider
removing it separately.
Fixes #4824
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
in a function. (#4567)
* Lift the restriction in liveness-traversal.h that supported max 65535 locals in a function.
* Lint
* Fix typo
* Fix static
* Lint
* Lint
* Lint
* Add needed canRun function
* lint
* Use either a sparse or a dense matrix for tracking liveness copies, depending on the locals count.
* Lint
* Fix lint
* Lint
* Implement sparse_square_matrix class and use that as a backing.
* Lint
* Lint
* Lint #includes
* Lint
* Lint includes
* Remove unnecessary code
* Fix canonical accesses to copies matrix
* Lint
* Add missing variable update
* Remove canRun() function
* Address review
* Update expected test results
* Update test name
* Add asserts to sparse_square_matrix set and get functions that they are not out of bound.
* Lint includes
* Update test expectation
* Use .clear() + .resize() to reset totalCopies vector
|
|
|
|
|
| |
Fixes #4562
Fixes #4564
|
|
|
|
|
|
|
|
|
|
|
|
| |
This removes the old hardcoded value numbering in that pass and makes
it use the new code that was split into helper code. The immediate benefit
of this is to make the code aware of identical constants: if two locals have
the same constant then they do not interfere. Future improvements to
numbering will also automatically help here.
This changes some constants in existing tests so that they keep testing
what they were testing before, and adds new tests for the new benefit here.
This implements a proposed TODO from #4314
|
|
|
|
|
|
| |
Tiny followup to #4314
Also updates some function types in test output, fixing breakage on
main after racing landings.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The old algorithm can be summarized as: In each basic block, start at the beginning.
Each pair of live locals there might interfere with each other, as they might arrive from
different entry blocks with different values. Afterwards, go through the block and find
overlapping live ranges, and mark interferences there as well.
This is non-linear because at the start of the block we do a double-loop over all
pairs of live locals, which in general can be O(N^2) (N - number of locals). It also
has the downside of ignoring copies: if two locals have overlapping live ranges but
they must have identical values on those ranges, they do not actually interfere,
for example
x = 10;
y = x;
.. // live ranges overlap here
foo(x, y); // live ranges end here.
We can ignore this overlap since the copy shows they are identical there, but the
pass did not take this into account. To some extent other passes can remove such
copies (SimplifyLocals, MergeLocals, RedundantSetElimination), but in general
this was a weak spot for the optimizer.
I realized there is a solution to both these problems: In Wasm, given that we have
a default value for all locals, if a local is live at the start of a block then it must be
live at the end of all the blocks reaching it. That is so because the liveness will
extend backwards all the way to some set of the local, possibly all the way to
the zero-initialization at the start of the function, and it extends that way through
all predecessor blocks. A consequence of this is that there are no interferences
between locals that only occur during a merge: The live ranges include the
predecessor blocks, and theirs, and so forth, until we reach a block where one
of the locals is assigned a value different than the other. That is a necessary and
sufficient condition for intererence, and therefore when processing a block we
only need to look at its contents, and can ignore the merging of control flow,
which allows us to be linear.
More details on this and on the new algorithm in comments in the source, but
the basic idea is that it simply goes through each block in a linear way, finding
which values are assigned to each local (using a numbering of unique values),
and noting which are live at each time. If two locals are live and one is assigned
a value that is not the same as the value in the other, mark them as interfering.
This is of substantial benefit to j2wasm output, I believe because it is common
there to find local subexpression elimination opportunities after inlining, and
each time we find one we add a local. If we inline different functions into the
same target, we may end up with copied locals for each of them. (This was
not noticed in the past because it is very rare on LLVM output, which has
already had inlining and GVN etc. done.)
There is a small benefit to LLVM output as well, though just a few
percent at best. However, it is enough to be noticeable on some of
the code size tests.
This is also faster than the previous pass. It's normally not noticeable
as this pass is not one of the slowest anyhow, but I found some real-world
codebases where the pass becomes 50% faster. I have not found any
case where it is slower than the old algorithm.
Fuzzed over several days to be sure this is correct, and also verified
on the emscripten test suite.
|
|
|
| |
Emscripten must have rolled in a new warning about using `|` on booleans.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Precompute not only computes values, but looks at the fallthrough,
(local.set 0
(block
..stuff we can ignore..
;; the fallthrough we care about - if a value is set to local 0, it is this
(i32.const 10)
)
)
Normally that is fine, but the fuzzer found a case where it is not: RefCast may
return a different type than the fallthrough, even an incompatible type if we
try to do something bad like cast a function to a struct. As we may then
propagate the value to a place that expects the proper type, this can cause an
error.
To fix this, check if the precomputed value is a proper subtype. If it is not,
then do not look through into the fallthrough, but compute the entire thing.
(In the case of a bad RefCast of a func to a struct, it would then indicate a
trap happens, and we would not precompute the value.)
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously the addDefault* methods would avoid adding opt passes that we
know are incompatible with DWARF. However, that didn't handle the case of
passes that are added in other ways. For example, when running Asyncify,
emcc will run --flatten before, and that pass is not compatible with DWARF.
This PR lets us warn on that by annotating the passes themselves. Then we
use those annotation to either not run a pass at all (matching the previous
behavior) or to show a warning when necessary.
Fixes emscripten-core/emscripten#13288 . That is, concretely
after this PR running asyncify + DWARF will show a warning to the user.
|
|
|
|
|
|
|
|
|
|
|
| |
coalesce-locals is nonlinear in the number of locals, so it is
greatly beneficial to reorder the locals (which then drops the
unused ones at the end automatically). The default passes
do this already, but wasm2js does some custom work, and
this was missing.
With this change that pass takes 10x less time on poppler
with --flatten --flatten --simplify-locals-notee-nostructure
which approximates what wasm2js does.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The analysis currently uses a dense matrix. If there are >65535
locals then the indexes don't fit in a 32-bit type like a wasm32
index, which led to overflows and incorrect behavior. To avoid
that, don't run passes with liveness analysis for now if they have
that many locals.
Note that skipping coalesce-locals (the main liveness-using
pass) is not that bad, as we run it more than once, and it's
likely that even if the first must be skipped, we can still run
the second (which is after simplify- and reorder-locals, which
can greatly reduce the local count).
Fixes #2559
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
- Reflected new renamed instruction names in code and tests:
- `get_local` -> `local.get`
- `set_local` -> `local.set`
- `tee_local` -> `local.tee`
- `get_global` -> `global.get`
- `set_global` -> `global.set`
- `current_memory` -> `memory.size`
- `grow_memory` -> `memory.grow`
- Removed APIs related to old instruction names in Binaryen.js and added
APIs with new names if they are missing.
- Renamed `typedef SortedVector LocalSet` to `SetsOfLocals` to prevent
name clashes.
- Resolved several TODO renaming items in wasm-binary.h:
- `TableSwitch` -> `BrTable`
- `I32ConvertI64` -> `I32WrapI64`
- `I64STruncI32` -> `I64SExtendI32`
- `I64UTruncI32` -> `I64UExtendI32`
- `F32ConvertF64` -> `F32DemoteI64`
- `F64ConvertF32` -> `F64PromoteF32`
- Renamed `BinaryenGetFeatures` and `BinaryenSetFeatures` to
`BinaryenModuleGetFeatures` and `BinaryenModuleSetFeatures` for
consistency.
|
|
|
| |
Applies the changes in #2065, and temprarily disables the hook since it's too slow to run on a change this large. We should re-enable it in a later commit.
|
|
|
| |
Mass change to apply clang-format to everything. We are applying this in a PR by me so the (git) blame is all mine ;) but @aheejin did all the work to get clang-format set up and all the manual work to tidy up some things to make the output nicer in #2048
|
|
|
| |
Removed semicolons that cause errors when compiling with -pedantic-errors.
|
| |
|
|
|
|
| |
that one var by reusing a param
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If copies is the case where an if arm is a get that feeds into
a set of the same local:
(set_local $x
(if (result i32)
(..condition..)
(..result)
(get_local $x)
)
)
We can rework this so that the if-else is only an if, which
executes the code path not going to the get.
This was done in coalesce-locals only because it is likely to
work there as after coalescing there are more copies. However,
the logic is of removing a branch, and so belongs in
remove-unused-brs, and fits alongside existing logic there
for handling ifs with an arm that is a br. Also refactor that
code so that the two optimizations can feed into each other.
|
|
|
|
|
|
|
|
|
| |
If locals are known to contain the same value, we can
* Pick which local to use for a get_local of any of them. Makes sense to prefer the most common, to increase the chance of one dropping to zero uses.
* Remove copies between a local and one that we know contains the same value.
This is a consistent win, small though, around 0.1-0.2%.
|
| |
|
|
|
|
| |
* rename WasmType to Type. it's in the wasm:: namespace anyhow, and without Wasm- it fits in better alongside Index, Address, Expression, Module, etc.
|
|
|
|
|
| |
This simplifies the logic there into a more standard flow operation. This is not always faster, but it is much faster on the worst cases we saw before like sqlite, and it is simpler.
The rewrite also fixes a fuzz bug.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is an experiment to help with Boehm-style GC. It will spill things that could be pointers to the C stack, so that they can be seen by conservative garbage collection.
The spills add code size and runtime overhead, but actually less than I thought: 10% slower (smaller than the difference between VMs), 15% gzip size larger. We can do even better with more optimizations for this, like a dead store elimination pass.
This PR does the following:
* Add the new pass.
* Create an abi/ dir, with info about the pointer size and stack manipulation utilities.
* Separates out the liveness analysis from CoalesceLocals, so that other passes can use it (like SpillPointers).
* Refactor out the SortedVector class from the liveness analysis to a separate file (just seems nicer that way).
|
|
|
| |
The IR is indeed a tree, but not an "abstract syntax tree" since there is no language for which it is the syntax (except in the most trivial and meaningless sense).
|
| |
|
|
|
|
| |
the get in that arm. only when it is not a tee can we remove that arm and make the if-else into an if
|
| |
|
|
|
|
| |
need the value, as it may do things
|
|
|
|
|
|
|
| |
* Add SSA pass which ensures a single assign for each local, except for merged locals where we ensure exactly a single assign from one of the paths leading to that use
* Also add InstrumentLocals pass, useful for debugging locals (similar to InstrumentMemory but for locals)
* Fix a PickLoadSigns bug with tees not being ignored, which was not noticed until now because we ran it on flatter output by default, but the ssa pass uncovered the bug
|
|
|
|
|
|
|
| |
* validate that types are properly finalized, when in pass-debug mode (BINARYEN_PASS_DEBUG env var): check after each pass is run that the type of each node is equal to the proper type (when finalizing it, i.e., fully recomputing the type).
* fix many fuzz bugs found by that.
* in particular, fix dce bugs with type changes not being fully updated during code removal. add a new TypeUpdater helper class that lets a pass update types efficiently, by the helper tracking deps between blocks and branches etc., and updating/propagating type changes only as necessary.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* add debugInfo option to passes, and use it to keep debug info alive through optimizations when we need it
* add fib testcase for debug info
* when preserving debug info, do not move code around call-imports, so debug info intrinsics remain stationary
* improve wasm-module-building handling of the single-threaded case: don't create workers, which is more efficient and also nicer for debugging
* process debug info in a more precise way, reordering it from being after the node (as it was a comment in JS) to before the node
* remove unreachable hack for debug info, which is no longer needed since we reorder them, and make sure to finalize blocks in which we reorder
|
| |
|
|
|
|
|
|
|
|
| |
* add a coalesce-locals test for costly backedges
* add script to strip local names out, for convenient diffing
* prioritize backedge copies
|
| |
|
|
|
|
| |
* Implement binary search using <algorithm>
|
|
|
|
| |
anyhow, and it just adds overhead to not ignore it
|
| |
|
|
|
|
| |
when in practice we need a lot less (and on e.g. sqlite, n^2 is very large)
|
| |
|
| |
|
| |
|
| |
|
|
|
|
| |
coalesce-locals
|
| |
|