path: root/src/ir/LocalGraph.cpp
Commit message | Author | Age | Files | Lines
* LocalGraph::canMoveSet (#7039) | Alon Zakai | 2024-11-11 | 1 | -40/+166
| This new API lets us ask if a set can be safely moved to a new position. The new position must be the location of an expression from a particular class (this allows us to populate the IR once and then query any of those locations).
* [NFC] Add isSSA to LazyLocalGraph, and use it in OptimizeAddedConstants (#6952) | Alon Zakai | 2024-09-18 | 1 | -0/+32
| This makes the pass 15% faster.
* [NFC] Make Precompute use a lazy LocalGraph (#6934) | Alon Zakai | 2024-09-12 | 1 | -6/+46
| To do this, add locations and getInfluences to LazyLocalGraph. Neither can really be computed in a fine-grained manner, so just compute them all on the first request. That is not as efficient as our lazy computation of getSets and setInfluences, but they are also less important, and this change makes the pass 20% faster.
* [NFC] Make LazyLocalGraph even lazier (#6919) | Alon Zakai | 2024-09-10 | 1 | -1/+14
| Do not even construct the Flower helper class until we actually need it. This avoids even scanning the function and building the internal CFG if we never get any API call that needs it.
|
| This speeds up LICM by 50% (as now we never construct the CFG if we don't find a loop), and Stack IR-enabled binary writing by 10% (as many functions do not have locals in positions that can be optimized using LocalGraph).
|
| This moves |locations| from the base class to LocalGraph. It is not needed in the lazy version, so that makes sense for now (we can't keep it in the base, as then it would need to be mutable, which only makes sense for laziness).
* [NFC] LazyLocalGraph: Add getSetInfluences() (#6909) | Alon Zakai | 2024-09-09 | 1 | -6/+62
| This new API on lazy local graphs allows us to use laziness in another place, StackIR opts. This makes writing the binary (which includes StackIR opts, when those are enabled) 10% faster.
* [NFC] Add a lazy mode to LocalGraph (#6895) | Alon Zakai | 2024-09-05 | 1 | -27/+135
| LocalGraph by default will compute all the local.sets that can be read from all local.gets. However, many passes only query a small number of those. To avoid wasted work, add a lazy mode that only computes sets when asked about a get. This is then used in a single place, LoopInvariantCodeMotion, which becomes 18% faster.
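The lazy mode described above amounts to computing an answer on first request and caching it. A minimal sketch of that idea follows, on a toy straight-line instruction list rather than Binaryen IR. The names `Instr`, `ToyLazyGraph`, and `getSet` are hypothetical and are not Binaryen's API; the real LocalGraph works on a CFG and returns a set of possible `local.set`s per get.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <unordered_map>
#include <vector>

// Toy straight-line IR (an assumption for illustration): each instruction
// either sets or gets a local.
struct Instr {
  enum Kind { Set, Get } kind;
  unsigned local;
};

// Hypothetical lazy graph: the answer for a get is computed on the first
// request and cached, instead of being computed for every get up front.
class ToyLazyGraph {
  const std::vector<Instr>& code;
  // Cache: position of a get -> position of the set it reads, or nullopt
  // when it reads the value the local had on function entry.
  std::unordered_map<std::size_t, std::optional<std::size_t>> cache;

public:
  // Counts how many gets were actually analyzed (cache misses).
  std::size_t computations = 0;

  explicit ToyLazyGraph(const std::vector<Instr>& code) : code(code) {}

  std::optional<std::size_t> getSet(std::size_t getPos) {
    assert(getPos < code.size() && code[getPos].kind == Instr::Get);
    auto it = cache.find(getPos);
    if (it != cache.end()) {
      return it->second;
    }
    computations++;
    std::optional<std::size_t> result;
    // Scan backwards for the closest earlier set of the same local.
    for (std::size_t i = getPos; i-- > 0;) {
      if (code[i].kind == Instr::Set && code[i].local == code[getPos].local) {
        result = i;
        break;
      }
    }
    cache.emplace(getPos, result);
    return result;
  }
};
```

A pass that queries only a few gets pays only for those gets; constructing the graph does no analysis at all.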
* [NFC] Refactor LocalGraph to split up flow() for future laziness work (#6880) | Alon Zakai | 2024-09-03 | 1 | -88/+150
|
* [NFC] Refactor LocalGraph's core getSets API (#6877) | Alon Zakai | 2024-08-28 | 1 | -13/+13
| Before, we just had a map that people would access with localGraph.getSetses[get], while now it is a call, localGraph.getSets(get), which more nicely hides the internal implementation details.
|
| Also rename getSetses => getSetsMap.
|
| This will allow a later PR to optimize the internals of this API.
|
| This is performance-neutral as far as I can measure. (We do replace a direct read from a data structure with a call, but the call is in a header and should always get inlined.)
* Fix unreachable code in LocalGraph by making it imprecise there (#6048) | Alon Zakai | 2023-10-24 | 1 | -18/+10
| Followup to #6046 - the fuzzer found we missed handling the case of the entry itself being unreachable, or of an unreachable loop later. Properly identifying unreachable code requires a flow analysis, unfortunately, so this PR gives up on that and instead allows LocalGraph to be imprecise in unreachable code. That avoids adding any overhead, but does mean the IR may be slightly confusing when debugging. It does not have any optimization downsides, however, as it only affects unreachable code.
|
| Also add a dump() impl in that file which helps debugging.
* [NFC] LocalGraph: Optimize params with no sets (#6046) | Alon Zakai | 2023-10-24 | 1 | -1/+34
| If a local index has no sets, then all gets of that index read from the entry block (a param, or a zero for a local). This is actually a common case, where a param has no other set, and so it is worth optimizing, which this PR does by avoiding any flowing operation at all for that index: we just skip and write the entry block as the source of information for such gets.
|
| #6042 on precompute-propagate goes from 3 minutes to 2 seconds with this (!). But that testcase is rather special in that it is a huge function with many, many gets in it, so the overhead we remove is very noticeable there.
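The shortcut above can be sketched as a cheap pre-pass: find which local indexes have any set, then resolve every get of a set-free index to the function entry without running a flow analysis. This is a toy illustration on a flat instruction list; `Instr`, `Partition`, and `partitionGets` are hypothetical names, not Binaryen's.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy instruction (an assumption for illustration).
struct Instr {
  enum Kind { Set, Get } kind;
  unsigned local;
};

struct Partition {
  // Positions of gets whose local has no set anywhere: they can only read
  // the entry value (a param, or zero for a non-param local).
  std::vector<std::size_t> resolvedFromEntry;
  // Positions of gets that still need the expensive flow analysis.
  std::vector<std::size_t> needFlow;
};

Partition partitionGets(const std::vector<Instr>& code, unsigned numLocals) {
  // One linear scan to find which local indexes are ever set.
  std::vector<bool> hasSet(numLocals, false);
  for (const auto& instr : code) {
    assert(instr.local < numLocals);
    if (instr.kind == Instr::Set) {
      hasSet[instr.local] = true;
    }
  }
  Partition result;
  for (std::size_t i = 0; i < code.size(); i++) {
    if (code[i].kind != Instr::Get) {
      continue;
    }
    if (hasSet[code[i].local]) {
      result.needFlow.push_back(i);
    } else {
      result.resolvedFromEntry.push_back(i);
    }
  }
  return result;
}
```

In a function with many gets of never-written params, most gets land in `resolvedFromEntry`, which is where the reported speedup on the special testcase would come from.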
* [NFC] LocalGraph: Move definition to logical place (#6045) | Alon Zakai | 2023-10-24 | 1 | -3/+5
| allGets was declared in a scope that kept it alive for all blocks, and at the end of the loop we clear the gets for a particular block. That's clumsy, and makes a followup harder, so this PR moves it to the natural place for it. (That is, it moves it to the scope that handles a particular block, and removes the manual clearing-out of the get at the end of the loop iteration.)
|
| Optimizing compilers are smart enough to be efficient about stack allocations of objects inside loops anyhow (which I measured).
|
| Helps #6042.
* CFGWalker: Allow users to ignore branches outside the function [NFC] (#5838) | Alon Zakai | 2023-07-26 | 1 | -0/+4
| A pass that just operates on locals, for example, does not care about branches outside of the function. That means that when we see a call, then even if EH is enabled we don't need to create a new basic block right after it (unless the call is inside a try-catch - then it might branch to the catch, of course).
|
| This makes CFG-using passes 9% faster compared to before this PR. This plus #5827 offset the slowdown from #5823 and overall give an improvement compared to before.
* End the current basic block on a Call (#5823) | Alon Zakai | 2023-07-26 | 1 | -3/+5
| Before this PR, if a call had no paths to a catch in the same function then we skipped creating a new basic block right after it. As a result, we could have a call in the middle of a basic block. If EH is enabled that means we might transfer control flow out of the function from the middle of a block. But it is better to have the property that any transfer of control flow - to another basic block, or outside of the function - can only happen at the end of a basic block.
|
| This causes some overhead, but a subsequent PR (#5838) will remove that as a followup, and this PR adds a little code to pass the module and check if EH is enabled, and avoid the overhead if not, which at least avoids regressing the non-EH case until that followup lands.
* [Debugging] Fix compile error for dumping LocalGraph (#5055) | Axis | 2022-09-20 | 1 | -4/+4
|
* Modernize code to C++17 (#3104) | Max Graey | 2021-11-22 | 1 | -16/+7
|
* Allow only computing necessary influences in LocalGraph. NFC (#3861) | Alon Zakai | 2021-05-05 | 1 | -6/+12
| Some passes need setInfluences but not getInfluences, but were computing them nonetheless. This makes e.g. MergeLocals 12% faster. It will also help use LocalGraph in new passes with fewer worries about speed.
* Add LocalGraph::equivalent (#3848) | Alon Zakai | 2021-04-29 | 1 | -1/+38
| This compares two local.gets and checks whether we are sure they are equivalent, that is, they contain the same value.
|
| This does not solve the general problem, but uses the existing info to get a positive answer for the common case where two gets only receive values from a single set, like
|
|   (local.set $x ..)
|   (a use.. (local.get $x))
|   (another use.. (local.get $x))
|
| If they only receive values from the same single set, then we know it must dominate them. The only risk is that the set is "in between" the gets, that is, that the set occurs after one get and before the other. That can happen in a loop in theory,
|
|   (loop $loop
|     (use (local.get $x))
|     (local.set $x ..some new value each iteration..)
|     (use (local.get $x))
|     (br_if $loop ..)
|   )
|
| Both of those gets receive a value from the set, and they may be different values, from different loop iterations. But as mentioned in the source code, this is not a problem since wasm always has a zero-initialization value, and so the first local.get in that loop would have another set from which it can receive a value, the function entry. (The only way to avoid that is for this entire code to be unreachable, in which case nothing matters.)
|
| This will be useful in dead store elimination, which has to use this to reason about references and pointers in order to be able to do anything useful with GC and memory.
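The rule in this message can be sketched as a small predicate over the sets that reach each get. This is a toy model, not the code from the PR: sets are plain integer ids, `EntrySet` is a hypothetical id for the implicit value at function entry, and the entry case is handled conservatively (the toy does not track which local an entry value belongs to, so it gives up there).

```cpp
#include <cassert>
#include <set>

// Hypothetical id for the implicit "set" at function entry (a param value or
// the zero-initialization of a local).
constexpr int EntrySet = -1;

// The ids of all sets whose value may reach a given local.get.
using Sets = std::set<int>;

// Returns true only when we are sure both gets read the same value: each is
// reached by exactly one set, and it is the same explicit set.
bool equivalent(const Sets& a, const Sets& b) {
  if (a.size() != 1 || b.size() != 1) {
    return false;
  }
  int setA = *a.begin();
  int setB = *b.begin();
  if (setA != setB) {
    return false;
  }
  // Conservative simplification for this sketch: the entry id is shared by
  // all locals here, so matching entry ids prove nothing.
  return setA != EntrySet;
}
```

In the loop example, the first get is reached by both the entry value and the set inside the loop, so its `Sets` has two elements and the predicate returns false, which is the behavior the message relies on.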
* Refactor printing code so that printing Expressions always works (#3450) | Alon Zakai | 2020-12-17 | 1 | -1/+0
| This avoids needing to add an include of wasm-printing if a file doesn't already have it. To achieve that, add the std::ostream hooks in wasm.h, and also use them when possible, removing the need for the special WasmPrinter object.
|
| Also stop printing in "full" (print types on each line) in error messages by default. The user can still get that, as always, using BINARYEN_PRINT_FULL=1 in the env.
* Reflect instruction renaming in code (#2128) | Heejin Ahn | 2019-05-21 | 1 | -15/+15
| - Reflected new renamed instruction names in code and tests:
|   - `get_local` -> `local.get`
|   - `set_local` -> `local.set`
|   - `tee_local` -> `local.tee`
|   - `get_global` -> `global.get`
|   - `set_global` -> `global.set`
|   - `current_memory` -> `memory.size`
|   - `grow_memory` -> `memory.grow`
| - Removed APIs related to old instruction names in Binaryen.js and added APIs with new names if they are missing.
| - Renamed `typedef SortedVector LocalSet` to `SetsOfLocals` to prevent name clashes.
| - Resolved several TODO renaming items in wasm-binary.h:
|   - `TableSwitch` -> `BrTable`
|   - `I32ConvertI64` -> `I32WrapI64`
|   - `I64STruncI32` -> `I64SExtendI32`
|   - `I64UTruncI32` -> `I64UExtendI32`
|   - `F32ConvertF64` -> `F32DemoteI64`
|   - `F64ConvertF32` -> `F64PromoteF32`
| - Renamed `BinaryenGetFeatures` and `BinaryenSetFeatures` to `BinaryenModuleGetFeatures` and `BinaryenModuleSetFeatures` for consistency.
* clang-tidy braces changes (#2075) | Alon Zakai | 2019-05-01 | 1 | -4/+8
| Applies the changes in #2065, and temporarily disables the hook since it's too slow to run on a change this large. We should re-enable it in a later commit.
* Apply format changes from #2048 (#2059) | Alon Zakai | 2019-04-26 | 1 | -33/+46
| Mass change to apply clang-format to everything. We are applying this in a PR by me so the (git) blame is all mine ;) but @aheejin did all the work to get clang-format set up and all the manual work to tidy up some things to make the output nicer in #2048.
* Consistently optimize small added constants into load/store offsets (#1924) | Alon Zakai | 2019-03-01 | 1 | -0/+33
| See #1919 - we did not do this consistently before.
|
| This adds a lowMemoryUnused option to PassOptions. It can be passed on the commandline with --low-memory-unused. If enabled, we run the new optimize-added-constants pass, which does the real work here, replacing older code in post-emscripten.
|
| Aside from running at the proper time (unlike the old pass, see #1919), this also has a -propagate mode, which can do stuff like this:
|
|   y = x + 10
|   [..]
|   load(y)
|   [..]
|   load(y)
|   =>
|   y = x + 10
|   [..]
|   load(x, offset=10)
|   [..]
|   load(x, offset=10)
|
| That is, it can propagate such offsets to the loads/stores. This pattern is common in big interpreter loops, where the pointers are offsets into a big struct of state.
|
| The pass does this propagation by using a new feature of LocalGraph, which can verify which locals are in SSA mode. Binaryen IR is not SSA (intentionally, since it's a later IR), but if a local only has a single set for all gets, that means that local is in such a state, and can be optimized. The tricky thing is that all locals are initialized to zero, so there are at minimum two sets. But if we verify that the real set dominates all the gets, then the zero initialization cannot reach them, and we are safe.
|
| This PR also makes safe-heap aware of lowMemoryUnused. If so, we check for not just an access of 0, but the range 0-1023.
|
| This makes zlib 5% faster, with either the wasm backend or asm2wasm. It also makes it 0.5% smaller. Also helps sqlite (1.5% faster) and lua (1% faster).
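The "SSA mode" check described above can be sketched as follows: a local qualifies when it has exactly one explicit set and every get of it is reached by that set alone, so the implicit zero initialization reaches no get. This is a toy model with hypothetical names (`EntrySet`, `Sets`, `isSSA` as a free function); it is not the LocalGraph implementation, which derives the reaching sets from the function's control flow.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical id for the implicit zero-initialization (or param value) at
// function entry.
constexpr int EntrySet = -1;

using Sets = std::set<int>;

// explicitSets: ids of the local.set instructions that write this local.
// setsPerGet:   for each local.get of this local, the sets that may reach it.
bool isSSA(const Sets& explicitSets, const std::vector<Sets>& setsPerGet) {
  // The entry value is implicit and must not be listed as an explicit set.
  assert(explicitSets.count(EntrySet) == 0);
  if (explicitSets.size() != 1) {
    return false;
  }
  for (const auto& sets : setsPerGet) {
    // Every get must see exactly the one real set. If EntrySet also reaches
    // a get, the real set does not dominate it and the local is not SSA.
    if (sets != explicitSets) {
      return false;
    }
  }
  return true;
}
```

Only when this holds for `y` in the example is it safe to rewrite `load(y)` into `load(x, offset=10)`, since `y` then always equals `x + 10` at each load (assuming `x` itself is unchanged in between, which this sketch does not check).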
* Massive renaming (#1855) | Thomas Lively | 2019-01-07 | 1 | -3/+3
| Automated renaming according to https://github.com/WebAssembly/spec/issues/884#issuecomment-426433329.
* Some minor LocalGraph improvements (#1625) | Alon Zakai | 2018-07-21 | 1 | -80/+57
| * Remove the Action class - we just need a pointer to a get or set. This simplifies the code and saves a little memory, but doesn't seem to have any impact on speed.
| * Miscellaneous code style and comment changes.
* Speedup localgraph (#1610) | Loppin Vincent | 2018-07-20 | 1 | -18/+64
| * LocalGraph: Replace seen unordered_set by boolean check.
| * LocalGraph: Use unordered_map to store index -> last set_local instead of vector.
| * LocalGraph:
|   - Use internal counter to avoid invalidation at each cycle.
|   - Move all blocks structs into a contiguous vector of smaller ones.
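One common way to "use an internal counter to avoid invalidation at each cycle" is a generation stamp: each block stores the number of the last iteration that visited it, so starting a new iteration only bumps a counter instead of clearing a set or resetting every flag. Whether the PR does exactly this is an assumption; the sketch below, with the hypothetical class `VisitMarks`, only illustrates the general technique.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: per-block "seen in the current iteration" marks that
// are invalidated in O(1) by bumping a counter.
class VisitMarks {
  // lastSeen[b] is the iteration number in which block b was last marked;
  // 0 means "never", since iterations are numbered from 1.
  std::vector<unsigned> lastSeen;
  unsigned current = 0;

public:
  explicit VisitMarks(std::size_t numBlocks) : lastSeen(numBlocks, 0) {}

  // Begin a new traversal. All previous marks become stale without touching
  // the vector. (Counter overflow is ignored in this sketch.)
  void startIteration() { current++; }

  // Marks the block as seen; returns true if it was not yet seen in the
  // current iteration.
  bool markSeen(std::size_t block) {
    assert(current > 0 && block < lastSeen.size());
    if (lastSeen[block] == current) {
      return false;
    }
    lastSeen[block] = current;
    return true;
  }
};
```

Compared with an `unordered_set` of seen blocks, this replaces hashing and per-iteration clearing with one integer comparison per block.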
* Improve LocalGraph (#1382) | Alon Zakai | 2018-01-24 | 1 | -211/+164
| This simplifies the logic there into a more standard flow operation. This is not always faster, but it is much faster on the worst cases we saw before like sqlite, and it is simpler.
|
| The rewrite also fixes a fuzz bug.
* merge-locals pass (#1334) | Alon Zakai | 2017-12-17 | 1 | -5/+2
| This optimizes the situation described in #1331. Namely, when x is copied into y, then on subsequent gets of x we could use y instead, and vice versa, as their value is equal. Specifically, this seems to get rid of the definite overlap in the live ranges of x and y, as removing it allows coalesce-locals to merge them. The pass therefore does nothing if the live range of y ends there anyhow.
|
| The danger here is that we may extend the live range so that it causes more conflicts with other things, so this is a heuristic, but I've tested it on every codebase I can find and it always produces a net win. On one I even saw a 0.4% reduction in code size, which surprised me.
|
| This is a fairly slow pass, because it uses LocalGraph, which isn't much optimized. This PR includes a minor optimization for it, but we should rewrite it. Meanwhile this is just enabled in -O3 and -Oz.
|
| This PR also includes some fuzzing improvements, to better test stuff like this.
* notation change: AST => IR (#1245) | Alon Zakai | 2017-10-24 | 1 | -0/+273
The IR is indeed a tree, but not an "abstract syntax tree" since there is no language for which it is the syntax (except in the most trivial and meaningless sense).