summaryrefslogtreecommitdiff
path: root/src/passes/CoalesceLocals.cpp
Commit message (Collapse)AuthorAgeFilesLines
* [NFC] Standardize Super:: over super:: (#6920)Alon Zakai2024-09-101-1/+1
| | | | As the name of a class, uppercase seems better here.
* CoalesceLocals: ReFinalize when we refine a type (#6502)Alon Zakai2024-04-161-12/+12
| | | | | | | | | | | The code had a different workaround, namely to add a block with an explicit type to avoid changing the type at all (i.e. the block declared the original type of the thing we replaced, not the refined type). But by adding a block we can end up with an invalid pop, as the fuzzer found, see the EH testcase here. To fix this, use the usual workaround of just running ReFinalize. That is simpler and also improves the code while we do it. It does add more work, but this is likely a very rare situation anyhow.
* CFGWalker: Allow users to ignore branches outside the function [NFC] (#5838)Alon Zakai2023-07-261-0/+4
| | | | | | | | | A pass that just operates on locals, for example, does not care about branches outside of the function. That means that when we see a call, then even if EH is enabled we don't need to create a new basic block right after it (unless the call is inside a try-catch - then it might branch to the catch, of course). This makes CFG-using passes 9% faster compared to before this PR. This plus #5827 offset the slowdown from #5823 and overall give an improvement compared to before.
* Use C++17's [[maybe_unused]]. NFC (#5309)Sam Clegg2022-12-021-2/+1
|
* [Wasm GC] Fix CoalesceLocals on tees that receive a refined type (#5289)Alon Zakai2022-11-221-3/+22
| | | Same testcase as in #5287 but in another pass.
* Refactor interaction between Pass and PassRunner (#5093)Thomas Lively2022-09-301-2/+6
| | | | | | | | | | | | | | Previously only WalkerPasses had access to the `getPassRunner` and `getPassOptions` methods. Move those methods to `Pass` so all passes can use them. As a result, the `PassRunner` passed to `Pass::run` and `Pass::runOnFunction` is no longer necessary, so remove it. Also update `Pass::create` to return a unique_ptr, which is more efficient than having it return a raw pointer only to have the `PassRunner` wrap that raw pointer in a `unique_ptr`. Delete the unused template `PassRunner::getLast()`, which looks like it was intended to enable retrieving previous analyses and has been in the code base since 2015 but is not implemented anywhere.
* [Wasm GC] Support non-nullable locals in the "1a" form (#4959)Alon Zakai2022-08-311-1/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | An overview of this is in the README in the diff here (conveniently, it is near the top of the diff). Basically, we fix up nn locals after each pass, by default. This keeps things easy to reason about - what validates is what is valid wasm - but there are some minor nuances as mentioned there, in particular, we ignore nameless blocks (which are commonly added by various passes; ignoring them means we can keep more locals non-nullable). The key addition here is LocalStructuralDominance which checks which local indexes have the "structural dominance" property of 1a, that is, that each get has a set in its block or an outer block that precedes it. I optimized that function quite a lot to reduce the overhead of running that logic after each pass. The overhead is something like 2% on J2Wasm and 0% on Dart (0%, because in this mode we shrink code size, so there is less work actually, and it balances out). Since we run fixups after each pass, this PR removes logic to manually call the fixup code from various places we used to call it (like eh-utils and various passes). Various passes are now marked as requiresNonNullableLocalFixups => false. That lets us skip running the fixups after them, which we normally do automatically. This helps avoid overhead. Most passes still need the fixups, though - any pass that adds a local, or a named block, or moves code around, likely does. This removes a hack in SimplifyLocals that is no longer needed. Before we worked to avoid moving a set into a try, as it might not validate. Now, we just do it and let fixups happen automatically if they need to: in the common code they probably don't, so the extra complexity seems not worth it. Also removes a hack from StackIR. That hack tried to avoid roundtrip adding a nondefaultable local. But we have the logic to fix that up now, and opts will likely keep it non-nullable as well. Various tests end up updated here because now a local can be non-nullable - previous fixups are no longer needed. Note that this doesn't remove the gc-nn-locals feature. That has been useful for testing, and may still be useful in the future - it basically just allows nn locals in all positions (that can't read the null default value at the entry). We can consider removing it separately. Fixes #4824
* Lift the restriction in liveness-traversal.h that supported max 65535 locals ↵juj2022-04-281-25/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | in a function. (#4567) * Lift the restriction in liveness-traversal.h that supported max 65535 locals in a function. * Lint * Fix typo * Fix static * Lint * Lint * Lint * Add needed canRun function * lint * Use either a sparse or a dense matrix for tracking liveness copies, depending on the locals count. * Lint * Fix lint * Lint * Implement sparse_square_matrix class and use that as a backing. * Lint * Lint * Lint #includes * Lint * Lint includes * Remove unnecessary code * Fix canonical accesses to copies matrix * Lint * Add missing variable update * Remove canRun() function * Address review * Update expected test results * Update test name * Add asserts to sparse_square_matrix set and get functions that they are not out of bound. * Lint includes * Update test expectation * Use .clear() + .resize() to reset totalCopies vector
* Use LiteralUtils::canMakeZero before calling makeZero (#4568)Alon Zakai2022-04-011-4/+5
| | | | | Fixes #4562 Fixes #4564
* CoalesceLocals: Use ValueNumbering (#4355)Alon Zakai2021-11-241-14/+20
| | | | | | | | | | | | This removes the old hardcoded value numbering in that pass and makes it use the new code that was split into helper code. The immediate benefit of this is to make the code aware of identical constants: if two locals have the same constant then they do not interfere. Future improvements to numbering will also automatically help here. This changes some constants in existing tests so that they keep testing what they were testing before, and adds new tests for the new benefit here. This implements a proposed TODO from #4314
* CoalesceLocals: Remove a redundant tee of the same local as a parent set (#4318)Alon Zakai2021-11-091-1/+6
| | | | | | Tiny followup to #4314 Also updates some function types in test output, fixing breakage on main after racing landings.
* CoalesceLocals: Rewrite the algorithm to be linear and to ignore copies (#4314)Alon Zakai2021-11-101-27/+161
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The old algorithm can be summarized as: In each basic block, start at the beginning. Each pair of live locals there might interfere with each other, as they might arrive from different entry blocks with different values. Afterwards, go through the block and find overlapping live ranges, and mark interferences there as well. This is non-linear because at the start of the block we do a double-loop over all pairs of live locals, which in general can be O(N^2) (N - number of locals). It also has the downside of ignoring copies: if two locals have overlapping live ranges but they must have identical values on those ranges, they do not actually interfere, for example x = 10; y = x; .. // live ranges overlap here foo(x, y); // live ranges end here. We can ignore this overlap since the copy shows they are identical there, but the pass did not take this into account. To some extent other passes can remove such copies (SimplifyLocals, MergeLocals, RedundantSetElimination), but in general this was a weak spot for the optimizer. I realized there is a solution to both these problems: In Wasm, given that we have a default value for all locals, if a local is live at the start of a block then it must be live at the end of all the blocks reaching it. That is so because the liveness will extend backwards all the way to some set of the local, possibly all the way to the zero-initialization at the start of the function, and it extends that way through all predecessor blocks. A consequence of this is that there are no interferences between locals that only occur during a merge: The live ranges include the predecessor blocks, and theirs, and so forth, until we reach a block where one of the locals is assigned a value different than the other. That is a necessary and sufficient condition for intererence, and therefore when processing a block we only need to look at its contents, and can ignore the merging of control flow, which allows us to be linear. More details on this and on the new algorithm in comments in the source, but the basic idea is that it simply goes through each block in a linear way, finding which values are assigned to each local (using a numbering of unique values), and noting which are live at each time. If two locals are live and one is assigned a value that is not the same as the value in the other, mark them as interfering. This is of substantial benefit to j2wasm output, I believe because it is common there to find local subexpression elimination opportunities after inlining, and each time we find one we add a local. If we inline different functions into the same target, we may end up with copied locals for each of them. (This was not noticed in the past because it is very rare on LLVM output, which has already had inlining and GVN etc. done.) There is a small benefit to LLVM output as well, though just a few percent at best. However, it is enough to be noticeable on some of the code size tests. This is also faster than the previous pass. It's normally not noticeable as this pass is not one of the slowest anyhow, but I found some real-world codebases where the pass becomes 50% faster. I have not found any case where it is slower than the old algorithm. Fuzzed over several days to be sure this is correct, and also verified on the emscripten test suite.
* Fix Emscripten build by changing `|` to `||` (#4205)Thomas Lively2021-10-041-1/+1
| | | Emscripten must have rolled in a new warning about using `|` on booleans.
* [Wasm GC] Fix precomputing of incompatible fallthrough values (#3875)Alon Zakai2021-05-101-1/+0
| | | | | | | | | | | | | | | | | | | | | | Precompute not only computes values, but looks at the fallthrough, (local.set 0 (block ..stuff we can ignore.. ;; the fallthrough we care about - if a value is set to local 0, it is this (i32.const 10) ) ) Normally that is fine, but the fuzzer found a case where it is not: RefCast may return a different type than the fallthrough, even an incompatible type if we try to do something bad like cast a function to a struct. As we may then propagate the value to a place that expects the proper type, this can cause an error. To fix this, check if the precomputed value is a proper subtype. If it is not, then do not look through into the fallthrough, but compute the entire thing. (In the case of a bad RefCast of a func to a struct, it would then indicate a trap happens, and we would not precompute the value.)
* Warn when running a pass not compatible with DWARF (#3506)Alon Zakai2021-01-261-0/+4
| | | | | | | | | | | | Previously the addDefault* methods would avoid adding opt passes that we know are incompatible with DWARF. However, that didn't handle the case of passes that are added in other ways. For example, when running Asyncify, emcc will run --flatten before, and that pass is not compatible with DWARF. This PR lets us warn on that by annotating the passes themselves. Then we use those annotation to either not run a pass at all (matching the previous behavior) or to show a warning when necessary. Fixes emscripten-core/emscripten#13288 . That is, concretely after this PR running asyncify + DWARF will show a warning to the user.
* Run reorder-locals more in wasm2js (#2729)Alon Zakai2020-04-081-0/+5
| | | | | | | | | | | coalesce-locals is nonlinear in the number of locals, so it is greatly beneficial to reorder the locals (which then drops the unused ones at the end automatically). The default passes do this already, but wasm2js does some custom work, and this was missing. With this change that pass takes 10x less time on poppler with --flatten --flatten --simplify-locals-notee-nostructure which approximates what wasm2js does.
* Skip liveness analysis if too many locals (#2560)Alon Zakai2020-01-061-0/+3
| | | | | | | | | | | | | | | The analysis currently uses a dense matrix. If there are >65535 locals then the indexes don't fit in a 32-bit type like a wasm32 index, which led to overflows and incorrect behavior. To avoid that, don't run passes with liveness analysis for now if they have that many locals. Note that skipping coalesce-locals (the main liveness-using pass) is not that bad, as we run it more than once, and it's likely that even if the first must be skipped, we can still run the second (which is after simplify- and reorder-locals, which can greatly reduce the local count). Fixes #2559
* Reflect instruction renaming in code (#2128)Heejin Ahn2019-05-211-9/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | - Reflected new renamed instruction names in code and tests: - `get_local` -> `local.get` - `set_local` -> `local.set` - `tee_local` -> `local.tee` - `get_global` -> `global.get` - `set_global` -> `global.set` - `current_memory` -> `memory.size` - `grow_memory` -> `memory.grow` - Removed APIs related to old instruction names in Binaryen.js and added APIs with new names if they are missing. - Renamed `typedef SortedVector LocalSet` to `SetsOfLocals` to prevent name clashes. - Resolved several TODO renaming items in wasm-binary.h: - `TableSwitch` -> `BrTable` - `I32ConvertI64` -> `I32WrapI64` - `I64STruncI32` -> `I64SExtendI32` - `I64UTruncI32` -> `I64UExtendI32` - `F32ConvertF64` -> `F32DemoteI64` - `F64ConvertF32` -> `F64PromoteF32` - Renamed `BinaryenGetFeatures` and `BinaryenSetFeatures` to `BinaryenModuleGetFeatures` and `BinaryenModuleSetFeatures` for consistency.
* clang-tidy braces changes (#2075)Alon Zakai2019-05-011-7/+14
| | | Applies the changes in #2065, and temprarily disables the hook since it's too slow to run on a change this large. We should re-enable it in a later commit.
* Apply format changes from #2048 (#2059)Alon Zakai2019-04-261-69/+117
| | | Mass change to apply clang-format to everything. We are applying this in a PR by me so the (git) blame is all mine ;) but @aheejin did all the work to get clang-format set up and all the manual work to tidy up some things to make the output nicer in #2048
* Remove unnecessary semicolons (#1942)Ryoga2019-03-181-1/+1
| | | Removed semicolons that cause errors when compiling with -pedantic-errors.
* Optimize away sets of the same local (#1940)Alon Zakai2019-03-071-0/+4
|
* CoalesceLocals: run even if we have just 1 var, as we may be able to remove ↵Alon Zakai2019-03-061-5/+0
| | | | that one var by reusing a param
* Move if copy logic from coalesce-locals to remove-unused-brs.Alon Zakai (kripken)2018-12-041-33/+0
| | | | | | | | | | | | | | | | | | | | | | | If copies is the case where an if arm is a get that feeds into a set of the same local: (set_local $x (if (result i32) (..condition..) (..result) (get_local $x) ) ) We can rework this so that the if-else is only an if, which executes the code path not going to the get. This was done in coalesce-locals only because it is likely to work there as after coalescing there are more copies. However, the logic is of removing a branch, and so belongs in remove-unused-brs, and fits alongside existing logic there for handling ifs with an arm that is a br. Also refactor that code so that the two optimizations can feed into each other.
* Optimize equivalent locals (#1540)Alon Zakai2018-05-101-6/+2
| | | | | | | | | If locals are known to contain the same value, we can * Pick which local to use for a get_local of any of them. Makes sense to prefer the most common, to increase the chance of one dropping to zero uses. * Remove copies between a local and one that we know contains the same value. This is a consistent win, small though, around 0.1-0.2%.
* Fix MSVC warnings when compiling the binaryen target (#1535)Daniel Wirtz2018-05-091-2/+2
|
* Rename WasmType => Type (#1398)Alon Zakai2018-02-021-1/+1
| | | | * rename WasmType to Type. it's in the wasm:: namespace anyhow, and without Wasm- it fits in better alongside Index, Address, Expression, Module, etc.
* Improve LocalGraph (#1382)Alon Zakai2018-01-241-1/+1
| | | | | This simplifies the logic there into a more standard flow operation. This is not always faster, but it is much faster on the worst cases we saw before like sqlite, and it is simpler. The rewrite also fixes a fuzz bug.
* SpillPointers pass (#1339)Alon Zakai2017-12-301-283/+4
| | | | | | | | | | | | | This is an experiment to help with Boehm-style GC. It will spill things that could be pointers to the C stack, so that they can be seen by conservative garbage collection. The spills add code size and runtime overhead, but actually less than I thought: 10% slower (smaller than the difference between VMs), 15% gzip size larger. We can do even better with more optimizations for this, like a dead store elimination pass. This PR does the following: * Add the new pass. * Create an abi/ dir, with info about the pointer size and stack manipulation utilities. * Separates out the liveness analysis from CoalesceLocals, so that other passes can use it (like SpillPointers). * Refactor out the SortedVector class from the liveness analysis to a separate file (just seems nicer that way).
* notation change: AST => IR (#1245)Alon Zakai2017-10-241-1/+1
| | | The IR is indeed a tree, but not an "abstract syntax tree" since there is no language for which it is the syntax (except in the most trivial and meaningless sense).
* Add a superclass typedef to WalkerPass to simplify overrides (#1211)jgravelle-google2017-10-041-1/+1
|
* when removing an if-copy in coalesce-locals, if it's a tee, we do still need ↵Alon Zakai (kripken)2017-07-131-1/+3
| | | | the get in that arm. only when it is not a tee can we remove that arm and make the if-else into an if
* fix handling of an if in a tee without an else, in coalesce-localsAlon Zakai (kripken)2017-07-131-1/+3
|
* fix coalesce-locals handling of set/tee local of an unreachable; we still ↵Alon Zakai (kripken)2017-07-111-2/+6
| | | | need the value, as it may do things
* SSA pass (#1049)Alon Zakai2017-06-131-29/+1
| | | | | | | * Add SSA pass which ensures a single assign for each local, except for merged locals where we ensure exactly a single assign from one of the paths leading to that use * Also add InstrumentLocals pass, useful for debugging locals (similar to InstrumentMemory but for locals) * Fix a PickLoadSigns bug with tees not being ignored, which was not noticed until now because we ran it on flatter output by default, but the ssa pass uncovered the bug
* Validate finalization (#1014)Alon Zakai2017-05-181-7/+4
| | | | | | | * validate that types are properly finalized, when in pass-debug mode (BINARYEN_PASS_DEBUG env var): check after each pass is run that the type of each node is equal to the proper type (when finalizing it, i.e., fully recomputing the type). * fix many fuzz bugs found by that. * in particular, fix dce bugs with type changes not being fully updated during code removal. add a new TypeUpdater helper class that lets a pass update types efficiently, by the helper tracking deps between blocks and branches etc., and updating/propagating type changes only as necessary.
* Preserve debug info through the optimizer (#981)Alon Zakai2017-04-281-0/+1
| | | | | | | | | | | | | | * add debugInfo option to passes, and use it to keep debug info alive through optimizations when we need it * add fib testcase for debug info * when preserving debug info, do not move code around call-imports, so debug info intrinsics remain stationary * improve wasm-module-building handling of the single-threaded case: don't create workers, which is more efficient and also nicer for debugging * process debug info in a more precise way, reordering it from being after the node (as it was a comment in JS) to before the node * remove unreachable hack for debug info, which is no longer needed since we reorder them, and make sure to finalize blocks in which we reorder
* Optimize away copies through an if (#816)Alon Zakai2016-10-311-4/+52
|
* Add priority to copies on backedges (#791)Alon Zakai2016-10-201-1/+31
| | | | | | | | * add a coalesce-locals test for costly backedges * add script to strip local names out, for convenient diffing * prioritize backedge copies
* handle unreachable tee_local properly in coalesce-locals (#761)Alon Zakai2016-10-121-1/+5
|
* Implement binary search for coalesce-locals (#744)Loo Rong Jie2016-10-121-27/+14
| | | | * Implement binary search using <algorithm>
* track unreachable code in coalesce-locals, we know what is unreachable ↵Alon Zakai2016-10-081-1/+10
| | | | anyhow, and it just adds overhead to not ignore it
* add a commentAlon Zakai2016-09-141-1/+1
|
* allocate newCopies on demand in coalesce-locals, to avoid n^2 allocation ↵Alon Zakai2016-09-141-1/+2
| | | | when in practice we need a lot less (and on e.g. sqlite, n^2 is very large)
* coalesce-locals code cleanupAlon Zakai2016-09-141-3/+3
|
* fix getCopies return type, so that we take into account the full range of valuesAlon Zakai2016-09-141-1/+1
|
* optimization commentAlon Zakai2016-09-101-1/+1
|
* sort locals by number of total copiesAlon Zakai2016-09-101-11/+77
|
* take into account removed copies even when number of locals is the same, in ↵Alon Zakai2016-09-091-5/+46
| | | | coalesce-locals
* add drop and tee expressionsAlon Zakai2016-09-071-1/+11
|