summaryrefslogtreecommitdiff
path: root/src/ir/local-graph.h
Commit message (Collapse)AuthorAgeFilesLines
* LocalGraph::canMoveSet (#7039)Alon Zakai2024-11-111-1/+37
| | | | | This new API lets us ask if a set can be safely moved to a new position. The new position must be the location of an expression from a particular class (this allows us to populate the IR once and then query any of those locations).
* [NFC] Add isSSA to LazyLocalGraph, and use it in OptimizeAddedConstants (#6952)Alon Zakai2024-09-181-0/+15
| | | This makes the pass 15% faster.
* [NFC] Make Precompute use a lazy LocalGraph (#6934)Alon Zakai2024-09-121-0/+26
| | | | | | | To do this, add locations and getInfluences to LazyLocalGraph. Both cannot really be computed in a fine-grained manner, so just compute them all on the first request. That is not as efficient as our lazy computation of getSets and setInfluences, but they are also less important, and this change makes the pass 20% faster.
* [NFC] Make LazyLocalGraph even lazier (#6919)Alon Zakai2024-09-101-5/+12
| | | | | | | | | | | | | | | Do not even construct the Flower helper class until we actually need it. This avoids even scanning the function and building the internal CFG if we never get any API call that needs it. This speeds up LICM by 50% (as now we never construct the CFG if we don't find a loop), and Stack IR-enabled binary writing by 10% (as many functions do not have locals in positions that can be optimized using LocalGraph). This moves |locations| from the base class to LocalGraph. It is not needed in the lazy version, so that makes sense for now (we can't keep it in the base, as then it would need to be mutable, which only makes sense for laziness).
* [NFC] LazyLocalGraph: Add getSetInfluences() (#6909)Alon Zakai2024-09-091-11/+30
| | | | | This new API on lazy local graphs allows us to use laziness in another place, StackIR opts. This makes writing the binary (which includes StackIR opts, when those are enabled), 10% faster.
* [NFC] Add a lazy mode to LocalGraph (#6895)Alon Zakai2024-09-051-29/+72
| | | | | | | | | | LocalGraph by default will compute all the local.sets that can be read from all local.gets. However, many passes only query a small amount of those. To avoid wasted work, add a lazy mode that only computes sets when asked about a get. This is then used in a single place, LoopInvariantCodeMotion, which becomes 18% faster.
* [NFC] Convert LocalGraph influences accesses to function calls (#6899)Alon Zakai2024-09-041-5/+25
| | | | | This replaces direct access of the data structure graph.*influences[foo] with a call graph.get*influences(foo). This will allow a later PR to make those calls optionally lazy.
* [NFC] Refactor LocalGraph to split up flow() for future laziness work (#6880)Alon Zakai2024-09-031-5/+14
|
* [NFC] Refactor LocalGraph's core getSets API (#6877)Alon Zakai2024-08-281-18/+29
| | | | | | | | | | | | | | Before we just had a map that people would access with localGraph.getSetses[get], while now it is a call localGraph.getSets(get), which more nicely hides the internal implementation details. Also rename getSetses => getSetsMap. This will allow a later PR to optimize the internals of this API. This is performance-neutral as far as I can measure. (We do replace a direct read from a data structure with a call, but the call is in a header and should always get inlined.)
* Fix unreachable code in LocalGraph by making it imprecise there (#6048)Alon Zakai2023-10-241-0/+8
| | | | | | | | | | Followup to #6046 - the fuzzer found we missed handling the case of the entry itself being unreachable, or of an unreachable loop later. Properly identifying unreachable code requires a flow analysis, unfortunately, so this PR gives up on that and instead allows LocalGraph to be imprecise in unreachable code. That avoids adding any overhead, but does mean the IR may be slightly confusing when debugging. It does not have any optimization downsides, however, as it only affects unreachable code. Also add a dump() impl in that file which helps debugging.
* End the current basic block on a Call (#5823)Alon Zakai2023-07-261-2/+6
| | | | | | | | | | | | | Before this PR, if a call had no paths to a catch in the same function then we skipped creating a new basic block right after it. As a result, we could have a call in the middle of a basic block. If EH is enabled that means we might transfer control flow out of the function from the middle of a block. But it is better to have the property that any transfer of control flow - to another basic block, or outside of the function - can only happen at the end of a basic block. This causes some overhead, but a subsequent PR (#5838) will remove that as a followup, and this PR adds a little code to pass the module and check if EH is enabled, and avoid the overhead if not, which at least avoids regressing the non-EH case until that followup lands.
* Add a SmallSet and use it in LocalGraph. NFC (#4188)Alon Zakai2021-09-291-7/+11
| | | | | | | | | | | | | | | | | A SmallSet starts with fixed storage that it uses in the simplest possible way (linear scan, no sorting). If it exceeds a size then it starts using a normal std::set. So for small amounts of data it avoids allocation and any other overhead. This adds a unit test and also uses it in LocalGraph which provides a large amount of additional coverage. I also changed an unrelated data structure from std::map to std::unordered_map which I noticed while doing profiling in LocalGraph. (And a tiny bit of additional refactoring there.) This makes LocalGraph-using passes like ssa-nomerge and precompute-propagate 10-15% faster on a bunch of real-world codebases I tested.
* Allow only computing necessary influences in LocalGraph. NFC (#3861)Alon Zakai2021-05-051-1/+7
| | | | | | | Some passes need setInfluences but not getInfluences, but were computing them nonetheless. This makes e.g. MergeLocals 12% faster. It will also help use LocalGraph in new passes with less worries about speed.
* Add LocalGraph::equivalent (#3848)Alon Zakai2021-04-291-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This compares two local.gets and checks whether we are sure they are equivalent, that is, they contain the same value. This does not solve the general problem, but uses the existing info to get a positive answer for the common case where two gets only receive values by a single set, like (local.set $x ..) (a use.. (local.get $x)) (another use.. (local.get $x)) If they only receive values from the same single set, then we know it must dominate them. The only risk is that the set is "in between" the gets, that is, that the set occurs after one get and before the other. That can happen in a loop in theory, (loop $loop (use (local.get $x)) (local.set $x ..some new value each iteration..) (use (local.get $x)) (br_if $loop ..) ) Both of those gets receive a value from the set, and they may be different values, from different loop iterations. But as mentioned in the source code, this is not a problem since wasm always has a zero-initialization value, and so the first local.get in that loop would have another set from which it can receive a value, the function entry. (The only way to avoid that is for this entire code to be unreachable, in which case nothing matters.) This will be useful in dead store elimination, which has to use this to reason about references and pointers in order to be able to do anything useful with GC and memory.
* Reflect instruction renaming in code (#2128)Heejin Ahn2019-05-211-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | - Reflected new renamed instruction names in code and tests: - `get_local` -> `local.get` - `set_local` -> `local.set` - `tee_local` -> `local.tee` - `get_global` -> `global.get` - `set_global` -> `global.set` - `current_memory` -> `memory.size` - `grow_memory` -> `memory.grow` - Removed APIs related to old instruction names in Binaryen.js and added APIs with new names if they are missing. - Renamed `typedef SortedVector LocalSet` to `SetsOfLocals` to prevent name clashes. - Resolved several TODO renaming items in wasm-binary.h: - `TableSwitch` -> `BrTable` - `I32ConvertI64` -> `I32WrapI64` - `I64STruncI32` -> `I64SExtendI32` - `I64UTruncI32` -> `I64UExtendI32` - `F32ConvertF64` -> `F32DemoteI64` - `F64ConvertF32` -> `F64PromoteF32` - Renamed `BinaryenGetFeatures` and `BinaryenSetFeatures` to `BinaryenModuleGetFeatures` and `BinaryenModuleSetFeatures` for consistency.
* Apply format changes from #2048 (#2059)Alon Zakai2019-04-261-13/+18
| | | Mass change to apply clang-format to everything. We are applying this in a PR by me so the (git) blame is all mine ;) but @aheejin did all the work to get clang-format set up and all the manual work to tidy up some things to make the output nicer in #2048
* Consistently optimize small added constants into load/store offsets (#1924)Alon Zakai2019-03-011-3/+30
| | | | | | | | | | | | | | | | | | | | | | | | | | See #1919 - we did not do this consistently before. This adds a lowMemoryUnused option to PassOptions. It can be passed on the commandline with --low-memory-unused. If enabled, we run the new optimize-added-constants pass, which does the real work here, replacing older code in post-emscripten. Aside from running at the proper time (unlike the old pass, see #1919), this also has a -propagate mode, which can do stuff like this: y = x + 10 [..] load(y) [..] load(y) => y = x + 10 [..] load(x, offset=10) [..] load(x, offset=10) That is, it can propagate such offsets to the loads/stores. This pattern is common in big interpreter loops, where the pointers are offsets into a big struct of state. The pass does this propagation by using a new feature of LocalGraph, which can verify which locals are in SSA mode. Binaryen IR is not SSA (intentionally, since it's a later IR), but if a local only has a single set for all gets, that means that local is in such a state, and can be optimized. The tricky thing is that all locals are initialized to zero, so there are at minimum two sets. But if we verify that the real set dominates all the gets, then the zero initialization cannot reach them, and we are safe. This PR also makes safe-heap aware of lowMemoryUnused. If so, we check for not just an access of 0, but the range 0-1023. This makes zlib 5% faster, with either the wasm backend or asm2wasm. It also makes it 0.5% smaller. Also helps sqlite (1.5% faster) and lua (1% faster)
* Massive renaming (#1855)Thomas Lively2019-01-071-2/+2
| | | | | | Automated renaming according to https://github.com/WebAssembly/spec/issues/884#issuecomment-426433329.
* fix an ssa bug: we can't assume that even if all set locations assign to the ↵Alon Zakai2018-02-221-1/+1
| | | | same local that it is ok to read that local, as locals may also be written elsewhere (#1423)
* Improve LocalGraph (#1382)Alon Zakai2018-01-241-58/+8
| | | | | This simplifies the logic there into a more standard flow operation. This is not always faster, but it is much faster on the worst cases we saw before like sqlite, and it is simpler. The rewrite also fixes a fuzz bug.
* merge-locals pass (#1334)Alon Zakai2017-12-171-2/+1
| | | | | | | | | This optimizes the situation described in #1331. Namely, when x is copied into y, then on subsequent gets of x we could use y instead, and vice versa, as their value is equal. Specifically, this seems to get rid of the definite overlap in the live ranges of x and y, as removing it allows coalesce-locals to merge them. The pass therefore does nothing if the live range of y ends there anyhow. The danger here is that we may extend the live range so that it causes more conflicts with other things, so this is a heuristic, but I've tested it on every codebase I can find and it always produces a net win, even on one I saw a 0.4% reduction of code size, which surprised me. This is a fairly slow pass, because it uses LocalGraph which isn't much optimized. This PR includes a minor optimization for it, but we should rewrite it. Meanwhile this is just enabled in -O3 and -Oz. This PR also includes some fuzzing improvements, to better test stuff like this.
* notation change: AST => IR (#1245)Alon Zakai2017-10-241-0/+111
The IR is indeed a tree, but not an "abstract syntax tree" since there is no language for which it is the syntax (except in the most trivial and meaningless sense).