summaryrefslogtreecommitdiff
path: root/src/passes/LocalCSE.cpp
Commit message (Collapse)AuthorAgeFilesLines
* LocalCSE: Fix regression from #6587 by accumulating generativity (#6591)Alon Zakai2024-05-151-28/+44
| | | | | | | | | | | | | #6587 was incorrect: It checked generativity early in an incremental manner, but it did not accumulate that information as we do with hashes. As a result we could end up optimizing something with a generative child, and sadly we lacked testing for that case. This adds incremental generativity computation alongside hashes. It also splits out this check from isRelevant. Also add a test for nested effects (as opposed to generativity), but that already worked before this PR (as we compute effects and invalidation as we go, already).
* LocalCSE: Ignore traps of code in between (#6588)Alon Zakai2024-05-141-1/+11
| | | | | | | | | | | | | | | | | | Given: (ORIGINAL) (in between) (COPY) We want to change that to (local.tee $temp (ORIGINAL)) (in between) (local.get $temp) It is fine if "in between" traps: then we never reach the new local.get. This is a safer situation than most optimizations because we are not reordering anything, only replacing known-equivalent code.
* LocalCSE: Check effects/generativity early (#6587)Alon Zakai2024-05-141-10/+40
| | | | | | | | | | | | | | | | | | | | Previously we checked late, and as a result might end up failing to optimize when a sub-pattern could have worked. E.g. (call (A) ) (call (A) ) The call cannot be optimized, but the A pattern repeats. Before this PR we'd greedily focus on the entire call and then fail. After this PR we skip the call before we commit to which patterns to try to optimize, so we succeed. Add a isShallowlyGenerative helper here as we compute this step by step as we go. Also remove a parameter to the generativity code (it did not use the features it was passed).
* LocalCSE: Do not optimize small things like global.get (#6087)Alon Zakai2023-11-081-3/+6
| | | | | | | | | LocalCSE is nice for large expressions, but for small things it has always been of unclear benefit since VMs also do GVN/CSE anyhow. So we are likely not speeding anything up, but hopefully we are reducing code size at least. Doing LocalCSE on something small like a global.get is very possibly going to increase code size, however (since we add a tee, and since the local gets are of similar size to global gets - depends on LUB sizes). On real-world Java code that overhead is noticeable, so this PR makes us more careful, and we skip things of size 1 (no children).
* LocalCSE: Connect adjacent blocks in LinearExecutionWalker (#5867)Alon Zakai2023-08-081-0/+11
| | | | | | | Followup to #5860, this does the same for LocalCSE. As there, this is valid because it's ok if we branch away. This pass adds a local.tee of a reused value and then gets it later, and it's ok to add a tee even if we branch away and do not use it.
* [Wasm GC] Fix GUFA on externalize/internalize (#5220)Alon Zakai2022-11-041-1/+1
| | | | | | | These operations emit a completely different type than their input, so they must be marked as roots, and not as things that flow values through them (because then we filter everything out as the types are not compatible). Fixes #5219
* Refactor interaction between Pass and PassRunner (#5093)Thomas Lively2022-09-301-1/+3
| | | | | | | | | | | | | | Previously only WalkerPasses had access to the `getPassRunner` and `getPassOptions` methods. Move those methods to `Pass` so all passes can use them. As a result, the `PassRunner` passed to `Pass::run` and `Pass::runOnFunction` is no longer necessary, so remove it. Also update `Pass::create` to return a unique_ptr, which is more efficient than having it return a raw pointer only to have the `PassRunner` wrap that raw pointer in a `unique_ptr`. Delete the unused template `PassRunner::getLast()`, which looks like it was intended to enable retrieving previous analyses and has been in the code base since 2015 but is not implemented anywhere.
* [Wasm GC] Support non-nullable locals in the "1a" form (#4959)Alon Zakai2022-08-311-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | An overview of this is in the README in the diff here (conveniently, it is near the top of the diff). Basically, we fix up nn locals after each pass, by default. This keeps things easy to reason about - what validates is what is valid wasm - but there are some minor nuances as mentioned there, in particular, we ignore nameless blocks (which are commonly added by various passes; ignoring them means we can keep more locals non-nullable). The key addition here is LocalStructuralDominance which checks which local indexes have the "structural dominance" property of 1a, that is, that each get has a set in its block or an outer block that precedes it. I optimized that function quite a lot to reduce the overhead of running that logic after each pass. The overhead is something like 2% on J2Wasm and 0% on Dart (0%, because in this mode we shrink code size, so there is less work actually, and it balances out). Since we run fixups after each pass, this PR removes logic to manually call the fixup code from various places we used to call it (like eh-utils and various passes). Various passes are now marked as requiresNonNullableLocalFixups => false. That lets us skip running the fixups after them, which we normally do automatically. This helps avoid overhead. Most passes still need the fixups, though - any pass that adds a local, or a named block, or moves code around, likely does. This removes a hack in SimplifyLocals that is no longer needed. Before we worked to avoid moving a set into a try, as it might not validate. Now, we just do it and let fixups happen automatically if they need to: in the common code they probably don't, so the extra complexity seems not worth it. Also removes a hack from StackIR. That hack tried to avoid roundtrip adding a nondefaultable local. But we have the logic to fix that up now, and opts will likely keep it non-nullable as well. Various tests end up updated here because now a local can be non-nullable - previous fixups are no longer needed. Note that this doesn't remove the gc-nn-locals feature. That has been useful for testing, and may still be useful in the future - it basically just allows nn locals in all positions (that can't read the null default value at the entry). We can consider removing it separately. Fixes #4824
* [NFC] Avoid some unnecessary copies of PassOptions (#4361)Alon Zakai2021-12-011-5/+5
| | | | | | PassOptions is a fairly large structure and even includes a std::map. I also have plans to add further fields there to make it even larger. Before doing that I noticed that in some places we copy it instead of being consistent and taking it by reference, which this PR fixes.
* Modernize code to C++17 (#3104)Max Graey2021-11-221-3/+1
|
* Rename isIntrinsicallyNondeterministic() to isGenerative() (#4092)Alon Zakai2021-09-091-2/+1
|
* Add a Module parameter to EffectAnalyzer. NFC (#4115)Alon Zakai2021-08-311-2/+2
| | | | | | | | | | | | | Knowing the module will allow us to do more analysis in the effect analyzer. For now, this just refactors the code to allow providing a module instead of features, and to infer the features from the module. This actually shortens the code in most places which is nice (just pass module instead of module->features). This modifies basically all callers to use the new module form, except for the fallthrough logic. That would require some more refactoring, so to keep this PR reasonably small that is not yet done.
* Optimize LocalCSE hash computations using a stack. NFC (#4091)Alon Zakai2021-08-181-8/+9
| | | | | | | | Before, we'd compute the hash of a child, then store that in a map, then the parent would find the child's hash in the map using the pointer to the child. But as we do a simple postorder walk, we can use a stack, and avoid hashing the child pointers. This makes it 10% faster or so.
* LocalCSE: ignore traps (#4085)Alon Zakai2021-08-171-0/+9
| | | | | | | | | | | | | | | | | | | | If we replace A A A with (local.set A) (local.get) (local.get) then it is ok for A to trap (so long as it does so deterministically), as if it does trap then the first appearance will do so, and the others not be reached anyhow. This helps GC code as often there are repeated struct.gets and such that may trap.
* LocalCSE rewrite (#4079)Alon Zakai2021-08-171-192/+468
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Technically this is not a new pass, but it is a rewrite almost from scratch. Local Common Subexpression Elimination looks for repeated patterns, stuff like this: x = (a + b) + c y = a + b => temp = a + b x = temp + c y = temp The old pass worked on flat IR, which is inefficient, and was overly complicated because of that. The new pass uses a new algorithm that I think is pretty simple, see the detailed comment at the top. This keeps the pass enabled only in -O4, like before - right after flattening the IR. That is to make this as minimal a change as possible. Followups will enable the pass in the main pipeline, that is, we will finally be able to run it by default. (Note that to make the pass work well after flatten, an extra simplify-locals is added - the old pass used to do part of simplify-locals internally, which was one source of complexity. Even so, some of the -O4 tests have changes, due to minor factors - they are just minor orderings etc., which can be seen by inspecting the outputs before and after using e.g. --metrics) This plus some followup work leads to large wins on wasm GC output. On j2cl there is a common pattern of repeated struct.gets, so common that this pass removes 85% of all struct.gets, which makes the total binary 15% smaller. However, on LLVM-emitted code the benefit is minor, less than 1%.
* Refactor LinearExecutionWalker to a separate file. NFC (#3956)Alon Zakai2021-06-281-0/+1
|
* Warn when running a pass not compatible with DWARF (#3506)Alon Zakai2021-01-261-0/+4
| | | | | | | | | | | | Previously the addDefault* methods would avoid adding opt passes that we know are incompatible with DWARF. However, that didn't handle the case of passes that are added in other ways. For example, when running Asyncify, emcc will run --flatten before, and that pass is not compatible with DWARF. This PR lets us warn on that by annotating the passes themselves. Then we use those annotation to either not run a pass at all (matching the previous behavior) or to show a warning when necessary. Fixes emscripten-core/emscripten#13288 . That is, concretely after this PR running asyncify + DWARF will show a warning to the user.
* Refactor hashing (#3023)Daniel Wirtz2020-08-121-3/+5
| | | | | | | | | * Unifies internal hashing helpers to naturally integrate with std::hash * Removes the previous custom implementation * Computed hashes are now always size_t * Introduces a hash_combine helper * Fixes an overwritten partial hash in Relooper.cpp
* Use direct pointers as Type IDs (#2745)Thomas Lively2020-04-131-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Instead of using indices into the global interned type table. This means that a lock is *never* needed to access an expanded Type. The Type lock is now only acquired when a complex Type is created. On a real-world wasm2js workload this improves wall clock time by 23% on my machine with 72 cores and makes traffic on the Type lock entirely insignificant. **Before** 72 cores real 0m6.914s user 184.014s sys 0m3.995s 1 core real 0m25.903s user 0m25.658s sys 0m0.253s **After** 72 cores real 5.349s user 70.309s sys 9.691s 1 core real 25.859s user 25.615s sys 0.253s
* Fix LocalCSE's usable local selection (#2638)Heejin Ahn2020-02-051-5/+29
| | | | | | | | | | | | | | | | Now that we have subtypes, we cannot reuse any local that contains the same expression, because that local's type can be a supertype. For example: ``` (local $0 anyref) (local $1 nullref) ... (local.set $0 (ref.null)) (local.set $1 (ref.null)) ;; cannot be replaced with (local.get $0) ``` This extends `usables` map's key to contain both `HashedExpression` and the local's type, so we can get the right usable local in presence of subtypes.
* Add EH support for EffectAnalyzer (#2631)Heejin Ahn2020-02-031-6/+12
| | | | | | | | | | | | | | | | | | | | This adds EH support to `EffectAnalyzer`. Before `throw` and `rethrow` conservatively set property. Now `EffectAnalyzer` has a new property `throws` to represent an expression that can throw, and expression that can throw sets `throws` correctly. When EH is enabled, any calls can throw too, so we cannot reorder them with another expression with any side effects, meaning all calls should be treated in the same way as branches when evaluating `invalidate`. This prevents many reorderings, so this patch sets `throws` for calls only when the exception handling features is enabled. This is also why I passed `--disable-exception-handling` to `wasm2js` tests. Most of code changes outside of `EffectAnalyzer` class was made in order to pass `FeatureSet` to it. `throws` isn't always set whenever an expression contains a throwable instruction. When an throwable instruction is within an inner try, it will be caught by the corresponding inner catch, so it does not set `throws`.
* Add support for reference types proposal (#2451)Heejin Ahn2019-12-301-2/+5
| | | | | | | | | | | | This adds support for the reference type proposal. This includes support for all reference types (`anyref`, `funcref`(=`anyfunc`), and `nullref`) and four new instructions: `ref.null`, `ref.is_null`, `ref.func`, and new typed `select`. This also adds subtype relationship support between reference types. This does not include table instructions yet. This also does not include wasm2js support. Fixes #2444 and fixes #2447.
* Make local.tee's type its local's type (#2511)Heejin Ahn2019-12-121-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | According to the current spec, `local.tee`'s return type should be the same as its local's type. (Discussions on whether we should change this rule is going on in WebAssembly/reference-types#55, but here I will assume this spec does not change. If this changes, we should change many parts of Binaryen transformation anyway...) But currently in Binaryen `local.tee`'s type is computed from its value's type. This didn't make any difference in the MVP, but after we have subtype relationship in #2451, this can become a problem. For example: ``` (func $test (result funcref) (local $0 anyref) (local.tee $0 (ref.func $test) ) ) ``` This shouldn't validate in the spec, but this will pass Binaryen validation with the current `local.tee` implementation. This makes `local.tee`'s type computed from the local's type, and makes `LocalSet::makeTee` get a type parameter, to which we should pass the its corresponding local's type. We don't embed the local type in the class `LocalSet` because it may increase memory size. This also fixes the type of `local.get` to be the local type where `local.get` and `local.set` pair is created from `local.tee`.
* Multivalue type creation and inspection (#2459)Thomas Lively2019-11-221-1/+1
| | | | | | | | | | | | | Adds the ability to create multivalue types from vectors of concrete value types. All types are transparently interned, so their representation is still a single uint32_t. Types can be extracted into vectors of their component parts, and all the single value types expand into vectors containing themselves. Multivalue types are not yet used in the IR, but their creation and inspection functionality is exposed and tested in the C and JS APIs. Also makes common type predicates methods of Type and improves the ergonomics of type printing.
* Reflect instruction renaming in code (#2128)Heejin Ahn2019-05-211-6/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | - Reflected new renamed instruction names in code and tests: - `get_local` -> `local.get` - `set_local` -> `local.set` - `tee_local` -> `local.tee` - `get_global` -> `global.get` - `set_global` -> `global.set` - `current_memory` -> `memory.size` - `grow_memory` -> `memory.grow` - Removed APIs related to old instruction names in Binaryen.js and added APIs with new names if they are missing. - Renamed `typedef SortedVector LocalSet` to `SetsOfLocals` to prevent name clashes. - Resolved several TODO renaming items in wasm-binary.h: - `TableSwitch` -> `BrTable` - `I32ConvertI64` -> `I32WrapI64` - `I64STruncI32` -> `I64SExtendI32` - `I64UTruncI32` -> `I64UExtendI32` - `F32ConvertF64` -> `F32DemoteI64` - `F64ConvertF32` -> `F64PromoteF32` - Renamed `BinaryenGetFeatures` and `BinaryenSetFeatures` to `BinaryenModuleGetFeatures` and `BinaryenModuleSetFeatures` for consistency.
* Apply format changes from #2048 (#2059)Alon Zakai2019-04-261-11/+12
| | | Mass change to apply clang-format to everything. We are applying this in a PR by me so the (git) blame is all mine ;) but @aheejin did all the work to get clang-format set up and all the manual work to tidy up some things to make the output nicer in #2048
* Verify flat IR where it is expected, and give a nice error (#2009)Alon Zakai2019-04-161-1/+3
| | | Fixes #2007 #2008
* Massive renaming (#1855)Thomas Lively2019-01-071-2/+2
| | | | | | Automated renaming according to https://github.com/WebAssembly/spec/issues/884#issuecomment-426433329.
* LocalCSE: Consider pass options, both size and cost (#1840)Alon Zakai2018-12-211-2/+14
| | | With this we can optimize redundant global accesses fairly well (at least locally; licm also works), see #1831
* LocalCSE fuzz fix: invalidate the set operations too (#1778)Alon Zakai2018-11-281-2/+23
| | | | | We invalidated based on effects of set values, but not of the sets themselves. Without that, a set could be overridden by something irrelevant and we thought we could still reuse the old value. Before this PR, the testcase would have the last set's value be optimized into a get, incorrectly.
* Improve local-cse (#1594)Alon Zakai2018-06-081-34/+77
| | | | | This makes it much more effective, by rewriting it to depend on flatten. In flattened IR, it is very simple to check if an expression is equivalent to one already available for use in a local, and use that one instead, basically we just track values in locals. Helps with #1521
* Rename WasmType => Type (#1398)Alon Zakai2018-02-021-1/+1
| | | | * rename WasmType to Type. it's in the wasm:: namespace anyhow, and without Wasm- it fits in better alongside Index, Address, Expression, Module, etc.
* notation change: AST => IR (#1245)Alon Zakai2017-10-241-2/+2
| | | The IR is indeed a tree, but not an "abstract syntax tree" since there is no language for which it is the syntax (except in the most trivial and meaningless sense).
* Add a superclass typedef to WalkerPass to simplify overrides (#1211)jgravelle-google2017-10-041-1/+1
|
* add the option to seek named breaks, not just taken breaks; refactor headers ↵Alon Zakai (kripken)2017-07-111-1/+1
| | | | to make this practical
* Local CSE (#930)Alon Zakai2017-03-081-0/+156
Simple local common subexpression elimination. Useful mostly to reduce code size (as VMs do GVN etc.). Enabled by default in -Oz.