path: root/src/passes/Heap2Local.cpp
Commit message | Author | Age | Files | Lines
* Modernize code to C++17 (#3104) | Max Graey | 2021-11-22 | 1 | -2/+1
* Heap2Local: Handle loops (#4288) | Alon Zakai | 2021-10-28 | 1 | -2/+8

    When the allocation we optimize away flows through a loop, then just like with a block we must change the type to be nullable, since we are replacing the allocation with a null.

    Fixes #4287
* Add a SmallSet and use it in LocalGraph. NFC (#4188) | Alon Zakai | 2021-09-29 | 1 | -1/+1

    A SmallSet starts with fixed storage that it uses in the simplest possible way (linear scan, no sorting). If it exceeds a size then it starts using a normal std::set. So for small amounts of data it avoids allocation and any other overhead.

    This adds a unit test and also uses it in LocalGraph which provides a large amount of additional coverage.

    I also changed an unrelated data structure from std::map to std::unordered_map which I noticed while doing profiling in LocalGraph. (And a tiny bit of additional refactoring there.)

    This makes LocalGraph-using passes like ssa-nomerge and precompute-propagate 10-15% faster on a bunch of real-world codebases I tested.
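The fixed-storage-then-std::set idea described above can be sketched roughly as follows. This is a minimal illustration written for this log, not Binaryen's actual SmallSet: the class name `SmallSetSketch`, its members, and the `usesHeap` helper are all hypothetical, and it only supports insertion and lookup.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <set>

// Hypothetical sketch of a "small set"; NOT Binaryen's real SmallSet.
// T must be default-constructible and comparable with == and <.
template <typename T, std::size_t N>
class SmallSetSketch {
  std::array<T, N> fixed{};  // inline storage, kept unsorted
  std::size_t usedFixed = 0; // number of valid entries in `fixed`
  std::set<T> flexible;      // only used once we outgrow `fixed`

  bool usingFixed() const { return flexible.empty(); }

public:
  void insert(const T& x) {
    if (usingFixed()) {
      if (count(x)) {
        return; // already present, nothing to do
      }
      if (usedFixed < N) {
        fixed[usedFixed++] = x; // still fits inline
        return;
      }
      // Overflow: move the inline items into the std::set, then fall through.
      assert(usedFixed == N);
      flexible.insert(fixed.begin(), fixed.begin() + usedFixed);
      usedFixed = 0;
    }
    flexible.insert(x);
  }

  std::size_t count(const T& x) const {
    if (usingFixed()) {
      // Linear scan: cheap for a handful of items.
      for (std::size_t i = 0; i < usedFixed; i++) {
        if (fixed[i] == x) {
          return 1;
        }
      }
      return 0;
    }
    return flexible.count(x);
  }

  std::size_t size() const {
    return usingFixed() ? usedFixed : flexible.size();
  }

  // True once the set has switched over to std::set.
  bool usesHeap() const { return !flexible.empty(); }
};
```

With `N = 3`, inserting three distinct values stays entirely in the inline array; the fourth distinct value triggers the one-time move into `std::set`.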
* [Wasm GC] Implement static (rtt-free) StructNew, ArrayNew, ArrayInit (#4172) | Alon Zakai | 2021-09-23 | 1 | -1/+3

    See #4149

    This modifies the test added in #4163 which used static casts on dynamically-created structs and arrays. That was technically not valid (as we won't want users to "mix" the two forms). This makes that test 100% static, which both fixes the test and gives test coverage to the new instructions added here.
* Use the new module version of EffectAnalyzer (#4116) | Alon Zakai | 2021-08-31 | 1 | -2/+2

    This finishes the refactoring started in #4115 by doing the same change to pass a Module into EffectAnalyzer instead of features. To do so this refactors the fallthrough API and a few other small things. After those changes, this PR removes the old feature constructor of EffectAnalyzer entirely.

    This requires a small breaking change in the C API, changing BinaryenExpressionGetSideEffects's feature param to a module. That makes this change not NFC, but otherwise it is.
* [Wasm GC] Fix Heap2Local + non-nullable locals (#4017) | Alon Zakai | 2021-07-23 | 1 | -9/+26

    Given

        (local $x (ref $foo))
        (local.set $x ..)
        (.. (local.get $x))

    If we remove the local.set but not the get, then we end up with

        (local $x (ref $foo))
        (.. (local.get $x))

    It looks like the local.get reads the initial value of a non-nullable local, which is not allowed.

    In practice, this would crash in precompute-propagate which would try to propagate the initial value to the get. Add an assertion there with a clear message, as until we have full validation of non-nullable locals (and the spec for that is in flux), that pass is where bugs will end up being noticed.

    To fix this, replace the get as well. We can replace it with a null for simplicity; it will never be used anyhow.

    This also uncovered a small bug with reached not containing all the things we reached - it was missing local.gets.
* [Wasm GC] Heap2Local: Replace the allocation with null (#3893) | Alon Zakai | 2021-05-17 | 1 | -24/+54

    Previously we would try to stop using the allocation as much as possible, for example not writing it to locals any more, and leaving it to other passes to actually remove it (and remove gets of those locals etc.). This seemed simpler and more modular, but does not actually work in some cases as the fuzzer has found. Specifically, if we stop writing our allocation to locals, then if we do a (ref.as_non_null (local.get ..)) of that, then we will trap on the null present in the local.

    Instead, this changes our rewriting to do slightly more work, but it is simpler in the end. We replace the allocation with a null, and replace all the places that use it accordingly, for example, updating types to be nullable, and removing RefAsNonNulls, etc. This literally gets rid of the allocation and all the places it flows to (leaving less for other passes to do later).
* [Wasm GC] Heap2Local: Handle branches (#3881) | Alon Zakai | 2021-05-12 | 1 | -11/+24

    If we branch to a block, and there are no other branches or a final value on the block either, then there is no mixing, and we may be able to optimize the allocation. Before this PR, all branches stopped us.

    To do this, add some helpers in BranchUtils.

    The main flow logic in Heap2Local used to stop when we reached a child for the second time. With branches, however, a child can flow both to its immediate parent, and to branch targets, and so the proper thing to look at is when we reach a parent for the second time (which would definitely indicate mixing).

    Tests are added for the new functionality. Note that some existing tests already covered some things we should not optimize, and so no tests were needed for them. The existing ones are: $get-through-block, $branch-to-block.
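The "reaching a parent for the second time" rule can be illustrated with a toy model. This sketch is an assumption-laden simplification and not the pass's real code: expressions are represented by made-up string names, `FlowGraph` and `mixes` are hypothetical, and the real pass works on Binaryen IR with the BranchUtils helpers mentioned above.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical model: flowsTo[x] lists every place the value of x can flow
// to, i.e. its immediate parent plus any branch targets.
using FlowGraph = std::map<std::string, std::vector<std::string>>;

// Returns true if the value starting at `allocation` reaches some parent a
// second time, which this toy model treats as mixing.
bool mixes(const FlowGraph& flowsTo, const std::string& allocation) {
  assert(!allocation.empty()); // node names are expected to be non-empty
  std::set<std::string> seenParents;
  std::vector<std::string> work{allocation};
  while (!work.empty()) {
    std::string child = work.back();
    work.pop_back();
    auto it = flowsTo.find(child);
    if (it == flowsTo.end()) {
      continue; // the value flows nowhere further from here
    }
    for (const auto& parent : it->second) {
      // insert() reports whether the parent was new; a repeat means the
      // value arrives at the same parent along two different paths.
      if (!seenParents.insert(parent).second) {
        return true;
      }
      work.push_back(parent);
    }
  }
  return false;
}
```

For example, a value that flows to an inner block and also branches directly to the outer block, where the inner block itself flows to the outer block, reaches the outer block twice and is reported as mixing; a single chain of parents is not.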
* Heap2Local: Use escape analysis to turn heap allocations into local data (#3866) | Alon Zakai | 2021-05-12 | 1 | -0/+701

    If we allocate some GC data, and do not let the reference escape, then we can replace the allocation with locals, one local for each field in the allocation basically. This avoids the allocation, and also allows us to optimize the locals further.

    On the Dart DeltaBlue benchmark, this is a 24% speedup (making it faster than the JS version, incidentally), and also a 6% reduction in code size.

    The tests are not the best way to show what this does, as the pass assumes other passes will clean up after. Here is an example to clarify. First, in pseudocode:

        ref = new Int(42)
        do {
          ref.set(ref.get() + 1)
        } while (import(ref.get()))

    That is, we allocate an int on the heap and use it as a counter. Unnecessarily, as it could be a normal int on the stack.

    Wat:

        (module
          ;; A boxed integer: an entire struct just to hold an int.
          (type $boxed-int (struct (field (mut i32))))

          (import "env" "import" (func $import (param i32) (result i32)))

          (func "example"
            (local $ref (ref null $boxed-int))

            ;; Allocate a boxed integer of 42 and save the reference to it.
            (local.set $ref
              (struct.new_with_rtt $boxed-int
                (i32.const 42)
                (rtt.canon $boxed-int)
              )
            )

            ;; Increment the integer in a loop, looking for some condition.
            (loop $loop
              (struct.set $boxed-int 0
                (local.get $ref)
                (i32.add
                  (struct.get $boxed-int 0
                    (local.get $ref)
                  )
                  (i32.const 1)
                )
              )
              (br_if $loop
                (call $import
                  (struct.get $boxed-int 0
                    (local.get $ref)
                  )
                )
              )
            )
          )
        )

    Before this pass, the optimizer could do essentially nothing with this. Even with this pass, running -O1 has no effect, as the pass is only used in -O2+. However, running --heap2local -O1 leads to this:

        (func $0
          (local $0 i32)
          (local.set $0
            (i32.const 42)
          )
          (loop $loop
            (br_if $loop
              (call $import
                (local.tee $0
                  (i32.add
                    (local.get $0)
                    (i32.const 1)
                  )
                )
              )
            )
          )
        )

    All the GC heap operations have been removed, and we just have a plain int now, allowing a bunch of other opts to run. That output is basically the optimal code, I think.
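As a loose analogy in C++ (not something the pass operates on), the before and after shapes of the example look like the two functions below. All names here (`BoxedInt`, `countBoxed`, `countLocal`, `checkEquivalent`, `keepGoing`) are hypothetical, `keepGoing` stands in for the imported function, and the return value is added only so the result can be observed; the original example returns nothing.

```cpp
#include <cassert>
#include <functional>
#include <memory>

// A boxed integer: an entire heap object just to hold an int.
struct BoxedInt {
  int value;
};

// "Before": the counter lives on the heap, but the reference never escapes
// this function.
int countBoxed(const std::function<bool(int)>& keepGoing) {
  auto ref = std::make_unique<BoxedInt>(BoxedInt{42});
  do {
    ref->value = ref->value + 1;
  } while (keepGoing(ref->value));
  return ref->value;
}

// "After": the single field has become a plain local variable.
int countLocal(const std::function<bool(int)>& keepGoing) {
  int value = 42;
  do {
    value = value + 1;
  } while (keepGoing(value));
  return value;
}

// Checks that both versions agree for a given (stateless) predicate.
void checkEquivalent(const std::function<bool(int)>& keepGoing) {
  assert(countBoxed(keepGoing) == countLocal(keepGoing));
}
```

Because the reference is never stored anywhere or passed out of `countBoxed`, replacing its one field with a local does not change observable behavior, which is the property the escape analysis needs to establish.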