summaryrefslogtreecommitdiff
path: root/src/passes/OptimizeAddedConstants.cpp
Commit message (Collapse)AuthorAgeFilesLines
* [NFC] Add isSSA to LazyLocalGraph, and use it in OptimizeAddedConstants (#6952)Alon Zakai2024-09-181-7/+5
| | | This makes the pass 15% faster.
* [NFC] Avoid collecting unnecessary parents in OptimizeAddedConstants (#6953)Alon Zakai2024-09-181-2/+24
| | | This makes the pass 20% faster.
* [NFC] OptimizeAddedConstants: Early exit if there are no memories (#6926)Alon Zakai2024-09-101-0/+5
| | | | | | | | The pass optimizes loads and stores, so without a memory there is nothing to do. This only helps if the user set --low-memory-unused and also has no memory, which is likely rare, but it's a trivial change so it seems worthwhile. In particular this pass constructs a LocalGraph, so if we can avoid work it can be substantial.
* [NFC] Standardize Super:: over super:: (#6920)Alon Zakai2024-09-101-1/+1
| | | | As the name of a class, uppercase seems better here.
* [NFC] Convert LocalGraph influences accesses to function calls (#6899)Alon Zakai2024-09-041-1/+1
| | | | | This replaces direct access of the data structure graph.*influences[foo] with a call graph.get*influences(foo). This will allow a later PR to make those calls optionally lazy.
* [NFC] Refactor LocalGraph's core getSets API (#6877)Alon Zakai2024-08-281-1/+1
| | | | | | | | | | | | | | Before we just had a map that people would access with localGraph.getSetses[get], while now it is a call localGraph.getSets(get), which more nicely hides the internal implementation details. Also rename getSetses => getSetsMap. This will allow a later PR to optimize the internals of this API. This is performance-neutral as far as I can measure. (We do replace a direct read from a data structure with a call, but the call is in a header and should always get inlined.)
* OptimizeAddedConstants: Replace an assert with a proper error (#6375)Alon Zakai2024-03-041-2/+5
| | | See #6373
* End the current basic block on a Call (#5823)Alon Zakai2023-07-261-1/+1
| | | | | | | | | | | | | Before this PR, if a call had no paths to a catch in the same function then we skipped creating a new basic block right after it. As a result, we could have a call in the middle of a basic block. If EH is enabled that means we might transfer control flow out of the function from the middle of a block. But it is better to have the property that any transfer of control flow - to another basic block, or outside of the function - can only happen at the end of a basic block. This causes some overhead, but a subsequent PR (#5838) will remove that as a followup, and this PR adds a little code to pass the module and check if EH is enabled, and avoid the overhead if not, which at least avoids regressing the non-EH case until that followup lands.
* [NFC] Remove our bespoke `make_unique` implementation (#5613)Thomas Lively2023-03-311-1/+1
| | | | This code predates our adoption of C++14 and can now be removed in favor of `std::make_unique`, which should be more efficient.
* Update comment in OptimizeAddedConstants.cpp (#5283)Alon Zakai2022-11-291-2/+2
|
* Refactor interaction between Pass and PassRunner (#5093)Thomas Lively2022-09-301-1/+3
| | | | | | | | | | | | | | Previously only WalkerPasses had access to the `getPassRunner` and `getPassOptions` methods. Move those methods to `Pass` so all passes can use them. As a result, the `PassRunner` passed to `Pass::run` and `Pass::runOnFunction` is no longer necessary, so remove it. Also update `Pass::create` to return a unique_ptr, which is more efficient than having it return a raw pointer only to have the `PassRunner` wrap that raw pointer in a `unique_ptr`. Delete the unused template `PassRunner::getLast()`, which looks like it was intended to enable retrieving previous analyses and has been in the code base since 2015 but is not implemented anywhere.
* Add wasm64 support in OptimizeAddedConstants (#5043)Axis2022-09-211-9/+24
| | | This lets that pass optimize 64-bit offsets on memory64 loads and stores.
* [Wasm GC] Support non-nullable locals in the "1a" form (#4959)Alon Zakai2022-08-311-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | An overview of this is in the README in the diff here (conveniently, it is near the top of the diff). Basically, we fix up nn locals after each pass, by default. This keeps things easy to reason about - what validates is what is valid wasm - but there are some minor nuances as mentioned there, in particular, we ignore nameless blocks (which are commonly added by various passes; ignoring them means we can keep more locals non-nullable). The key addition here is LocalStructuralDominance which checks which local indexes have the "structural dominance" property of 1a, that is, that each get has a set in its block or an outer block that precedes it. I optimized that function quite a lot to reduce the overhead of running that logic after each pass. The overhead is something like 2% on J2Wasm and 0% on Dart (0%, because in this mode we shrink code size, so there is less work actually, and it balances out). Since we run fixups after each pass, this PR removes logic to manually call the fixup code from various places we used to call it (like eh-utils and various passes). Various passes are now marked as requiresNonNullableLocalFixups => false. That lets us skip running the fixups after them, which we normally do automatically. This helps avoid overhead. Most passes still need the fixups, though - any pass that adds a local, or a named block, or moves code around, likely does. This removes a hack in SimplifyLocals that is no longer needed. Before we worked to avoid moving a set into a try, as it might not validate. Now, we just do it and let fixups happen automatically if they need to: in the common code they probably don't, so the extra complexity seems not worth it. Also removes a hack from StackIR. That hack tried to avoid roundtrip adding a nondefaultable local. But we have the logic to fix that up now, and opts will likely keep it non-nullable as well. Various tests end up updated here because now a local can be non-nullable - previous fixups are no longer needed. Note that this doesn't remove the gc-nn-locals feature. That has been useful for testing, and may still be useful in the future - it basically just allows nn locals in all positions (that can't read the null default value at the entry). We can consider removing it separately. Fixes #4824
* Modernize code to C++17 (#3104)Max Graey2021-11-221-2/+1
|
* Use the new module version of EffectAnalyzer (#4116)Alon Zakai2021-08-311-2/+1
| | | | | | | | | | | This finishes the refactoring started in #4115 by doing the same change to pass a Module into EffectAnalyzer instead of features. To do so this refactors the fallthrough API and a few other small things. After those changes, this PR removes the old feature constructor of EffectAnalyzer entirely. This requires a small breaking change in the C API, changing BinaryenExpressionGetSideEffects's feature param to a module. That makes this change not NFC, but otherwise it is.
* Allow only computing necessary influences in LocalGraph. NFC (#3861)Alon Zakai2021-05-051-1/+1
| | | | | | | Some passes need setInfluences but not getInfluences, but were computing them nonetheless. This makes e.g. MergeLocals 12% faster. It will also help use LocalGraph in new passes with less worries about speed.
* Add EH support for EffectAnalyzer (#2631)Heejin Ahn2020-02-031-1/+2
| | | | | | | | | | | | | | | | | | | | This adds EH support to `EffectAnalyzer`. Before `throw` and `rethrow` conservatively set property. Now `EffectAnalyzer` has a new property `throws` to represent an expression that can throw, and expression that can throw sets `throws` correctly. When EH is enabled, any calls can throw too, so we cannot reorder them with another expression with any side effects, meaning all calls should be treated in the same way as branches when evaluating `invalidate`. This prevents many reorderings, so this patch sets `throws` for calls only when the exception handling features is enabled. This is also why I passed `--disable-exception-handling` to `wasm2js` tests. Most of code changes outside of `EffectAnalyzer` class was made in order to pass `FeatureSet` to it. `throws` isn't always set whenever an expression contains a throwable instruction. When an throwable instruction is within an inner try, it will be caught by the corresponding inner catch, so it does not set `throws`.
* [NFC] Enforce use of `Type::` on type names (#2434)Thomas Lively2020-01-071-3/+3
|
* [NFC] Clean up unnecessary `template`s in calls 🧹🧹🧹 (#2394)Thomas Lively2020-01-071-4/+4
|
* Reflect instruction renaming in code (#2128)Heejin Ahn2019-05-211-15/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | - Reflected new renamed instruction names in code and tests: - `get_local` -> `local.get` - `set_local` -> `local.set` - `tee_local` -> `local.tee` - `get_global` -> `global.get` - `set_global` -> `global.set` - `current_memory` -> `memory.size` - `grow_memory` -> `memory.grow` - Removed APIs related to old instruction names in Binaryen.js and added APIs with new names if they are missing. - Renamed `typedef SortedVector LocalSet` to `SetsOfLocals` to prevent name clashes. - Resolved several TODO renaming items in wasm-binary.h: - `TableSwitch` -> `BrTable` - `I32ConvertI64` -> `I32WrapI64` - `I64STruncI32` -> `I64SExtendI32` - `I64UTruncI32` -> `I64UExtendI32` - `F32ConvertF64` -> `F32DemoteI64` - `F64ConvertF32` -> `F64PromoteF32` - Renamed `BinaryenGetFeatures` and `BinaryenSetFeatures` to `BinaryenModuleGetFeatures` and `BinaryenModuleSetFeatures` for consistency.
* Apply format changes from #2048 (#2059)Alon Zakai2019-04-261-51/+63
| | | Mass change to apply clang-format to everything. We are applying this in a PR by me so the (git) blame is all mine ;) but @aheejin did all the work to get clang-format set up and all the manual work to tidy up some things to make the output nicer in #2048
* Optimize added constants with propagation only if we see we will remove all ↵Alon Zakai2019-03-061-8/+68
| | | | uses of the original add, as otherwise we may just be adding work (both an offset, and an add). Refactor local-utils.h, and make UnneededSetRemover also check for side effects, so it cleanly removes all traces of unneeded sets.
* Run multiple iterations in OptimizeAddedConstantsAlon Zakai2019-03-061-13/+33
| | | | | Multiple propagations may be possible in some cases, like nested structs in C.
* Propagate a load/store offset even if locals are not in ssa formAlon Zakai2019-03-061-11/+89
| | | | | | | | | | | | | | | | | The initial OptimizeAddedConstants pass did not try to handle the case of non-ssa locals. However, that can happen, and optimizing those cases too improves us by almost 1% of code size on some large benchmarks like bullet. How this works is that if we see b = a + 10 a = c load(b) then we copy the base value at the add, a' = a b = a' + 10 a = c load(a', offset=10) This no longer has a guarantee of improving code size, since in theory both b and a may have other uses. However, in practice it's very common for b to be optimized out later.
* Consistently optimize small added constants into load/store offsets (#1924)Alon Zakai2019-03-011-0/+239
See #1919 - we did not do this consistently before. This adds a lowMemoryUnused option to PassOptions. It can be passed on the commandline with --low-memory-unused. If enabled, we run the new optimize-added-constants pass, which does the real work here, replacing older code in post-emscripten. Aside from running at the proper time (unlike the old pass, see #1919), this also has a -propagate mode, which can do stuff like this: y = x + 10 [..] load(y) [..] load(y) => y = x + 10 [..] load(x, offset=10) [..] load(x, offset=10) That is, it can propagate such offsets to the loads/stores. This pattern is common in big interpreter loops, where the pointers are offsets into a big struct of state. The pass does this propagation by using a new feature of LocalGraph, which can verify which locals are in SSA mode. Binaryen IR is not SSA (intentionally, since it's a later IR), but if a local only has a single set for all gets, that means that local is in such a state, and can be optimized. The tricky thing is that all locals are initialized to zero, so there are at minimum two sets. But if we verify that the real set dominates all the gets, then the zero initialization cannot reach them, and we are safe. This PR also makes safe-heap aware of lowMemoryUnused. If so, we check for not just an access of 0, but the range 0-1023. This makes zlib 5% faster, with either the wasm backend or asm2wasm. It also makes it 0.5% smaller. Also helps sqlite (1.5% faster) and lua (1% faster)