diff options
author | Alon Zakai <azakai@google.com> | 2021-08-17 07:28:05 -0700 |
---|---|---|
committer | GitHub <noreply@github.com> | 2021-08-17 14:28:05 +0000 |
commit | ef654c63819a7d00fd0cc4181f170111fa4c15d2 (patch) | |
tree | f80b27d6e64ea5430fe4a05b7d37f8a4b59dd4b2 /test/lit/passes/local-cse.wast | |
parent | eeb864a593f08d1bebbbda5f6fbc21fa93c5b8af (diff) | |
download | binaryen-ef654c63819a7d00fd0cc4181f170111fa4c15d2.tar.gz binaryen-ef654c63819a7d00fd0cc4181f170111fa4c15d2.tar.bz2 binaryen-ef654c63819a7d00fd0cc4181f170111fa4c15d2.zip |
LocalCSE rewrite (#4079)
Technically this is not a new pass, but it is a rewrite almost from scratch.
Local Common Subexpression Elimination looks for repeated patterns,
stuff like this:
x = (a + b) + c
y = a + b
=>
temp = a + b
x = temp + c
y = temp
The old pass worked on flat IR, which is inefficient, and was overly
complicated because of that. The new pass uses a new algorithm that
I think is pretty simple, see the detailed comment at the top.
This keeps the pass enabled only in -O4, like before - right after
flattening the IR. That is to make this as minimal a change as possible.
Followups will enable the pass in the main pipeline, that is, we will
finally be able to run it by default. (Note that to make the pass work
well after flatten, an extra simplify-locals is added - the old pass used
to do part of simplify-locals internally, which was one source of
complexity. Even so, some of the -O4 tests have changes, due to
minor factors - they are just minor orderings etc., which can be
seen by inspecting the outputs before and after using e.g.
--metrics)
This plus some followup work leads to large wins on wasm GC output.
On j2cl there is a common pattern of repeated struct.gets, so common
that this pass removes 85% of all struct.gets, which makes the total
binary 15% smaller. However, on LLVM-emitted code the benefit is
minor, less than 1%.
Diffstat (limited to 'test/lit/passes/local-cse.wast')
-rw-r--r-- | test/lit/passes/local-cse.wast | 433 |
1 files changed, 433 insertions, 0 deletions
diff --git a/test/lit/passes/local-cse.wast b/test/lit/passes/local-cse.wast new file mode 100644 index 000000000..5cdf712f8 --- /dev/null +++ b/test/lit/passes/local-cse.wast @@ -0,0 +1,433 @@ +;; NOTE: Assertions have been generated by update_lit_checks.py --all-items and should not be edited. +;; NOTE: This test was ported using port_test.py and could be cleaned up. + +;; RUN: foreach %s %t wasm-opt --local-cse -S -o - | filecheck %s + +(module + (memory 100 100) + ;; CHECK: (type $none_=>_none (func)) + + ;; CHECK: (type $i32_=>_i32 (func (param i32) (result i32))) + + ;; CHECK: (type $none_=>_i64 (func (result i64))) + + ;; CHECK: (memory $0 100 100) + + ;; CHECK: (func $basics + ;; CHECK-NEXT: (local $x i32) + ;; CHECK-NEXT: (local $y i32) + ;; CHECK-NEXT: (local $2 i32) + ;; CHECK-NEXT: (local $3 i32) + ;; CHECK-NEXT: (drop + ;; CHECK-NEXT: (local.tee $2 + ;; CHECK-NEXT: (i32.add + ;; CHECK-NEXT: (i32.const 1) + ;; CHECK-NEXT: (i32.const 2) + ;; CHECK-NEXT: ) + ;; CHECK-NEXT: ) + ;; CHECK-NEXT: ) + ;; CHECK-NEXT: (drop + ;; CHECK-NEXT: (local.get $2) + ;; CHECK-NEXT: ) + ;; CHECK-NEXT: (if + ;; CHECK-NEXT: (i32.const 0) + ;; CHECK-NEXT: (nop) + ;; CHECK-NEXT: ) + ;; CHECK-NEXT: (drop + ;; CHECK-NEXT: (i32.add + ;; CHECK-NEXT: (i32.const 1) + ;; CHECK-NEXT: (i32.const 2) + ;; CHECK-NEXT: ) + ;; CHECK-NEXT: ) + ;; CHECK-NEXT: (drop + ;; CHECK-NEXT: (local.tee $3 + ;; CHECK-NEXT: (i32.add + ;; CHECK-NEXT: (local.get $x) + ;; CHECK-NEXT: (local.get $y) + ;; CHECK-NEXT: ) + ;; CHECK-NEXT: ) + ;; CHECK-NEXT: ) + ;; CHECK-NEXT: (drop + ;; CHECK-NEXT: (local.get $3) + ;; CHECK-NEXT: ) + ;; CHECK-NEXT: (drop + ;; CHECK-NEXT: (local.get $3) + ;; CHECK-NEXT: ) + ;; CHECK-NEXT: (call $basics) + ;; CHECK-NEXT: (drop + ;; CHECK-NEXT: (local.get $3) + ;; CHECK-NEXT: ) + ;; CHECK-NEXT: (local.set $x + ;; CHECK-NEXT: (i32.const 100) + ;; CHECK-NEXT: ) + ;; CHECK-NEXT: (drop + ;; CHECK-NEXT: (i32.add + ;; CHECK-NEXT: (local.get $x) + ;; CHECK-NEXT: (local.get $y) + ;; CHECK-NEXT: ) + ;; CHECK-NEXT: ) + ;; CHECK-NEXT: ) + (func $basics + (local $x i32) + (local $y i32) + ;; These two adds can be optimized. + (drop + (i32.add (i32.const 1) (i32.const 2)) + ) + (drop + (i32.add (i32.const 1) (i32.const 2)) + ) + (if (i32.const 0) (nop)) + ;; This add is after an if, which means we are no longer in the same basic + ;; block - which means we cannot optimize it with the previous identical + ;; adds. + (drop + (i32.add (i32.const 1) (i32.const 2)) + ) + ;; More adds with different contents than the previous, but all three are + ;; identical. + (drop + (i32.add (local.get $x) (local.get $y)) + ) + (drop + (i32.add (local.get $x) (local.get $y)) + ) + (drop + (i32.add (local.get $x) (local.get $y)) + ) + ;; Calls have side effects, but that is not a problem for these adds which + ;; only use locals, so we can optimize the add after the call. + (call $basics) + (drop + (i32.add (local.get $x) (local.get $y)) + ) + ;; Modify $x, which means we cannot optimize the add after the set. + (local.set $x (i32.const 100)) + (drop + (i32.add (local.get $x) (local.get $y)) + ) + ) + + ;; CHECK: (func $recursive1 + ;; CHECK-NEXT: (local $x i32) + ;; CHECK-NEXT: (local $y i32) + ;; CHECK-NEXT: (local $2 i32) + ;; CHECK-NEXT: (local $3 i32) + ;; CHECK-NEXT: (drop + ;; CHECK-NEXT: (local.tee $3 + ;; CHECK-NEXT: (i32.add + ;; CHECK-NEXT: (i32.const 1) + ;; CHECK-NEXT: (local.tee $2 + ;; CHECK-NEXT: (i32.add + ;; CHECK-NEXT: (i32.const 2) + ;; CHECK-NEXT: (i32.const 3) + ;; CHECK-NEXT: ) + ;; CHECK-NEXT: ) + ;; CHECK-NEXT: ) + ;; CHECK-NEXT: ) + ;; CHECK-NEXT: ) + ;; CHECK-NEXT: (drop + ;; CHECK-NEXT: (local.get $3) + ;; CHECK-NEXT: ) + ;; CHECK-NEXT: (drop + ;; CHECK-NEXT: (local.get $2) + ;; CHECK-NEXT: ) + ;; CHECK-NEXT: ) + (func $recursive1 + (local $x i32) + (local $y i32) + ;; The first two dropped things are identical and can be optimized. + (drop + (i32.add + (i32.const 1) + (i32.add + (i32.const 2) + (i32.const 3) + ) + ) + ) + (drop + (i32.add + (i32.const 1) + (i32.add + (i32.const 2) + (i32.const 3) + ) + ) + ) + ;; The last thing here appears inside the previous pattern, and can still + ;; be optimized, with another local. + (drop + (i32.add + (i32.const 2) + (i32.const 3) + ) + ) + ) + + ;; CHECK: (func $recursive2 + ;; CHECK-NEXT: (local $x i32) + ;; CHECK-NEXT: (local $y i32) + ;; CHECK-NEXT: (local $2 i32) + ;; CHECK-NEXT: (local $3 i32) + ;; CHECK-NEXT: (drop + ;; CHECK-NEXT: (local.tee $3 + ;; CHECK-NEXT: (i32.add + ;; CHECK-NEXT: (i32.const 1) + ;; CHECK-NEXT: (local.tee $2 + ;; CHECK-NEXT: (i32.add + ;; CHECK-NEXT: (i32.const 2) + ;; CHECK-NEXT: (i32.const 3) + ;; CHECK-NEXT: ) + ;; CHECK-NEXT: ) + ;; CHECK-NEXT: ) + ;; CHECK-NEXT: ) + ;; CHECK-NEXT: ) + ;; CHECK-NEXT: (drop + ;; CHECK-NEXT: (local.get $2) + ;; CHECK-NEXT: ) + ;; CHECK-NEXT: (drop + ;; CHECK-NEXT: (local.get $3) + ;; CHECK-NEXT: ) + ;; CHECK-NEXT: ) + (func $recursive2 + (local $x i32) + (local $y i32) + ;; As before, but the order is different, with the sub-pattern in the + ;; middle. + (drop + (i32.add + (i32.const 1) + (i32.add + (i32.const 2) + (i32.const 3) + ) + ) + ) + (drop + (i32.add + (i32.const 2) + (i32.const 3) + ) + ) + (drop + (i32.add + (i32.const 1) + (i32.add + (i32.const 2) + (i32.const 3) + ) + ) + ) + ) + + ;; CHECK: (func $self + ;; CHECK-NEXT: (local $x i32) + ;; CHECK-NEXT: (local $y i32) + ;; CHECK-NEXT: (local $2 i32) + ;; CHECK-NEXT: (drop + ;; CHECK-NEXT: (i32.add + ;; CHECK-NEXT: (local.tee $2 + ;; CHECK-NEXT: (i32.add + ;; CHECK-NEXT: (i32.const 2) + ;; CHECK-NEXT: (i32.const 3) + ;; CHECK-NEXT: ) + ;; CHECK-NEXT: ) + ;; CHECK-NEXT: (local.get $2) + ;; CHECK-NEXT: ) + ;; CHECK-NEXT: ) + ;; CHECK-NEXT: (drop + ;; CHECK-NEXT: (local.get $2) + ;; CHECK-NEXT: ) + ;; CHECK-NEXT: ) + (func $self + (local $x i32) + (local $y i32) + ;; As before, with just the large pattern and the sub pattern, but no + ;; repeats of the large pattern. + (drop + (i32.add + (i32.add + (i32.const 2) + (i32.const 3) + ) + (i32.add + (i32.const 2) + (i32.const 3) + ) + ) + ) + (drop + (i32.add + (i32.const 2) + (i32.const 3) + ) + ) + ) + + ;; CHECK: (func $loads + ;; CHECK-NEXT: (drop + ;; CHECK-NEXT: (i32.load + ;; CHECK-NEXT: (i32.const 10) + ;; CHECK-NEXT: ) + ;; CHECK-NEXT: ) + ;; CHECK-NEXT: (drop + ;; CHECK-NEXT: (i32.load + ;; CHECK-NEXT: (i32.const 10) + ;; CHECK-NEXT: ) + ;; CHECK-NEXT: ) + ;; CHECK-NEXT: ) + (func $loads + ;; The possible trap on loads prevents optimization. + ;; TODO: optimize that too + (drop + (i32.load (i32.const 10)) + ) + (drop + (i32.load (i32.const 10)) + ) + ) + + ;; CHECK: (func $calls (param $x i32) (result i32) + ;; CHECK-NEXT: (drop + ;; CHECK-NEXT: (call $calls + ;; CHECK-NEXT: (i32.const 10) + ;; CHECK-NEXT: ) + ;; CHECK-NEXT: ) + ;; CHECK-NEXT: (drop + ;; CHECK-NEXT: (call $calls + ;; CHECK-NEXT: (i32.const 10) + ;; CHECK-NEXT: ) + ;; CHECK-NEXT: ) + ;; CHECK-NEXT: (i32.const 10) + ;; CHECK-NEXT: ) + (func $calls (param $x i32) (result i32) + ;; The side effects of calls prevent optimization. + (drop + (call $calls (i32.const 10)) + ) + (drop + (call $calls (i32.const 10)) + ) + (i32.const 10) + ) + + ;; CHECK: (func $many-sets (result i64) + ;; CHECK-NEXT: (local $temp i64) + ;; CHECK-NEXT: (local $1 i64) + ;; CHECK-NEXT: (local.set $temp + ;; CHECK-NEXT: (local.tee $1 + ;; CHECK-NEXT: (i64.add + ;; CHECK-NEXT: (i64.const 1) + ;; CHECK-NEXT: (i64.const 2) + ;; CHECK-NEXT: ) + ;; CHECK-NEXT: ) + ;; CHECK-NEXT: ) + ;; CHECK-NEXT: (local.set $temp + ;; CHECK-NEXT: (i64.const 9999) + ;; CHECK-NEXT: ) + ;; CHECK-NEXT: (local.set $temp + ;; CHECK-NEXT: (local.get $1) + ;; CHECK-NEXT: ) + ;; CHECK-NEXT: (local.get $temp) + ;; CHECK-NEXT: ) + (func $many-sets (result i64) + (local $temp i64) + ;; Assign to $temp three times here. We can optimize the add regardless of + ;; that, and should not be confused by the sets themselves having effects + ;; that are in conflict (the value is what matters). + (local.set $temp + (i64.add + (i64.const 1) + (i64.const 2) + ) + ) + (local.set $temp + (i64.const 9999) + ) + (local.set $temp + (i64.add + (i64.const 1) + (i64.const 2) + ) + ) + (local.get $temp) + ) + + ;; CHECK: (func $switch-children (param $x i32) (result i32) + ;; CHECK-NEXT: (local $1 i32) + ;; CHECK-NEXT: (block $label$1 (result i32) + ;; CHECK-NEXT: (br_table $label$1 $label$1 + ;; CHECK-NEXT: (local.tee $1 + ;; CHECK-NEXT: (i32.and + ;; CHECK-NEXT: (local.get $x) + ;; CHECK-NEXT: (i32.const 3) + ;; CHECK-NEXT: ) + ;; CHECK-NEXT: ) + ;; CHECK-NEXT: (local.get $1) + ;; CHECK-NEXT: ) + ;; CHECK-NEXT: ) + ;; CHECK-NEXT: ) + (func $switch-children (param $x i32) (result i32) + (block $label$1 (result i32) + ;; We can optimize the two children of this switch. This test verifies + ;; that we do so properly and do not hit an assertion involving the + ;; ordering of the switch's children, which was incorrect in the past. + (br_table $label$1 $label$1 + (i32.and + (local.get $x) + (i32.const 3) + ) + (i32.and + (local.get $x) + (i32.const 3) + ) + ) + ) + ) +) + +(module + ;; CHECK: (type $none_=>_none (func)) + + ;; CHECK: (global $glob (mut i32) (i32.const 1)) + (global $glob (mut i32) (i32.const 1)) + + ;; CHECK: (global $other-glob (mut i32) (i32.const 1)) + (global $other-glob (mut i32) (i32.const 1)) + + ;; CHECK: (func $global + ;; CHECK-NEXT: (local $0 i32) + ;; CHECK-NEXT: (drop + ;; CHECK-NEXT: (local.tee $0 + ;; CHECK-NEXT: (global.get $glob) + ;; CHECK-NEXT: ) + ;; CHECK-NEXT: ) + ;; CHECK-NEXT: (drop + ;; CHECK-NEXT: (local.get $0) + ;; CHECK-NEXT: ) + ;; CHECK-NEXT: (global.set $other-glob + ;; CHECK-NEXT: (i32.const 100) + ;; CHECK-NEXT: ) + ;; CHECK-NEXT: (drop + ;; CHECK-NEXT: (local.get $0) + ;; CHECK-NEXT: ) + ;; CHECK-NEXT: (global.set $glob + ;; CHECK-NEXT: (i32.const 200) + ;; CHECK-NEXT: ) + ;; CHECK-NEXT: (drop + ;; CHECK-NEXT: (global.get $glob) + ;; CHECK-NEXT: ) + ;; CHECK-NEXT: ) + (func $global + ;; We should optimize redundant global.get operations. + (drop (global.get $glob)) + (drop (global.get $glob)) + ;; We can do it past a write to another global + (global.set $other-glob (i32.const 100)) + (drop (global.get $glob)) + ;; But we can't do it past a write to our global. + (global.set $glob (i32.const 200)) + (drop (global.get $glob)) + ) +) |