path: root/test/lit/passes/local-cse_all-features.wast
author	Alon Zakai <azakai@google.com>	2021-08-17 07:28:05 -0700
committer	GitHub <noreply@github.com>	2021-08-17 14:28:05 +0000
commit	ef654c63819a7d00fd0cc4181f170111fa4c15d2 (patch)
tree	f80b27d6e64ea5430fe4a05b7d37f8a4b59dd4b2 /test/lit/passes/local-cse_all-features.wast
parent	eeb864a593f08d1bebbbda5f6fbc21fa93c5b8af (diff)
LocalCSE rewrite (#4079)
Technically this is not a new pass, but it is a rewrite almost from scratch. Local Common Subexpression Elimination looks for repeated patterns, stuff like this:

    x = (a + b) + c
    y = a + b
  =>
    temp = a + b
    x = temp + c
    y = temp

The old pass worked on flat IR, which is inefficient, and was overly complicated because of that. The new pass uses a new algorithm that I think is pretty simple, see the detailed comment at the top.

This keeps the pass enabled only in -O4, like before - right after flattening the IR. That is to make this as minimal a change as possible. Followups will enable the pass in the main pipeline, that is, we will finally be able to run it by default.

(Note that to make the pass work well after flatten, an extra simplify-locals is added - the old pass used to do part of simplify-locals internally, which was one source of complexity. Even so, some of the -O4 tests have changes, due to minor factors - they are just minor orderings etc., which can be seen by inspecting the outputs before and after using e.g. --metrics)

This plus some followup work leads to large wins on wasm GC output. On j2cl there is a common pattern of repeated struct.gets, so common that this pass removes 85% of all struct.gets, which makes the total binary 15% smaller. However, on LLVM-emitted code the benefit is minor, less than 1%.
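The transformation in the commit message can be sketched in Python. This is not Binaryen's implementation (which walks the IR, checks side effects, and uses local.tee at the first occurrence); it is a minimal model over straight-line assignments, assuming all expressions are pure, that shows the core idea: count repeated subexpressions, compute each repeated one once into a temp, and reuse the temp.

```python
from collections import Counter

def subexprs(e):
    """Yield every compound subexpression of e, including e itself.
    Expressions are tuples like ('add', 'a', 'b'); atoms are strings."""
    if isinstance(e, tuple):
        yield e
        for part in e[1:]:
            yield from subexprs(part)

def local_cse(assignments):
    """assignments: list of (var, rhs) pairs. All expressions are assumed
    pure -- a real pass must also check for side effects, possible traps,
    and intervening writes to the locals an expression reads."""
    # Pass 1: count how often each compound subexpression appears.
    counts = Counter(s for _, rhs in assignments for s in subexprs(rhs))
    temps = {}  # original expression -> temp variable holding its value
    out = []

    def rewrite(e):
        if not isinstance(e, tuple):
            return e                        # atoms pass through unchanged
        if counts[e] > 1:                   # repeated: compute once, reuse
            if e not in temps:
                temps[e] = f"t{len(temps)}"
                out.append((temps[e], (e[0],) + tuple(rewrite(p) for p in e[1:])))
            return temps[e]
        return (e[0],) + tuple(rewrite(p) for p in e[1:])

    # Pass 2: emit rewritten assignments, hoisting temps as we go.
    for var, rhs in assignments:
        out.append((var, rewrite(rhs)))
    return out
```

Running it on the commit message's example reproduces the transformation: `x = (a + b) + c; y = a + b` becomes `t0 = a + b; x = t0 + c; y = t0`.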
Diffstat (limited to 'test/lit/passes/local-cse_all-features.wast')
-rw-r--r--	test/lit/passes/local-cse_all-features.wast	272
1 file changed, 272 insertions(+), 0 deletions(-)
diff --git a/test/lit/passes/local-cse_all-features.wast b/test/lit/passes/local-cse_all-features.wast
new file mode 100644
index 000000000..b51240c50
--- /dev/null
+++ b/test/lit/passes/local-cse_all-features.wast
@@ -0,0 +1,272 @@
+;; NOTE: Assertions have been generated by update_lit_checks.py --all-items and should not be edited.
+;; NOTE: This test was ported using port_test.py and could be cleaned up.
+
+;; RUN: foreach %s %t wasm-opt --local-cse --all-features -S -o - | filecheck %s
+
+(module
+ ;; CHECK: (type $i32_=>_i32 (func (param i32) (result i32)))
+
+ ;; CHECK: (type $none_=>_none (func))
+
+ ;; CHECK: (elem declare func $calls $ref.func)
+
+ ;; CHECK: (func $calls (param $x i32) (result i32)
+ ;; CHECK-NEXT: (drop
+ ;; CHECK-NEXT: (call_ref
+ ;; CHECK-NEXT: (i32.const 10)
+ ;; CHECK-NEXT: (ref.func $calls)
+ ;; CHECK-NEXT: )
+ ;; CHECK-NEXT: )
+ ;; CHECK-NEXT: (drop
+ ;; CHECK-NEXT: (call_ref
+ ;; CHECK-NEXT: (i32.const 10)
+ ;; CHECK-NEXT: (ref.func $calls)
+ ;; CHECK-NEXT: )
+ ;; CHECK-NEXT: )
+ ;; CHECK-NEXT: (i32.const 20)
+ ;; CHECK-NEXT: )
+ (func $calls (param $x i32) (result i32)
+ ;; The side effects of calls prevent optimization.
+ (drop
+ (call_ref (i32.const 10) (ref.func $calls))
+ )
+ (drop
+ (call_ref (i32.const 10) (ref.func $calls))
+ )
+ (i32.const 20)
+ )
+
+ ;; CHECK: (func $ref.func
+ ;; CHECK-NEXT: (drop
+ ;; CHECK-NEXT: (ref.func $ref.func)
+ ;; CHECK-NEXT: )
+ ;; CHECK-NEXT: (drop
+ ;; CHECK-NEXT: (ref.func $ref.func)
+ ;; CHECK-NEXT: )
+ ;; CHECK-NEXT: )
+ (func $ref.func
+ ;; RefFunc and other constants should be ignored - don't undo the effects
+ ;; of constant propagation.
+ (drop
+ (ref.func $ref.func)
+ )
+ (drop
+ (ref.func $ref.func)
+ )
+ )
+)
+
+(module
+ ;; CHECK: (type $A (struct (field i32)))
+ (type $A (struct (field i32)))
+
+ ;; CHECK: (type $ref|$A|_=>_none (func (param (ref $A))))
+
+ ;; CHECK: (type $B (array (mut i32)))
+ (type $B (array (mut i32)))
+
+ ;; CHECK: (type $none_=>_none (func))
+
+ ;; CHECK: (func $struct-gets (param $ref (ref $A))
+ ;; CHECK-NEXT: (local $1 i32)
+ ;; CHECK-NEXT: (drop
+ ;; CHECK-NEXT: (local.tee $1
+ ;; CHECK-NEXT: (struct.get $A 0
+ ;; CHECK-NEXT: (local.get $ref)
+ ;; CHECK-NEXT: )
+ ;; CHECK-NEXT: )
+ ;; CHECK-NEXT: )
+ ;; CHECK-NEXT: (drop
+ ;; CHECK-NEXT: (local.get $1)
+ ;; CHECK-NEXT: )
+ ;; CHECK-NEXT: (drop
+ ;; CHECK-NEXT: (local.get $1)
+ ;; CHECK-NEXT: )
+ ;; CHECK-NEXT: )
+ (func $struct-gets (param $ref (ref $A))
+ ;; Repeated loads from a struct can be optimized.
+ ;;
+ ;; Note that these struct.gets cannot trap as the reference is non-nullable,
+ ;; so there are no side effects here, and we can optimize.
+ (drop
+ (struct.get $A 0
+ (local.get $ref)
+ )
+ )
+ (drop
+ (struct.get $A 0
+ (local.get $ref)
+ )
+ )
+ (drop
+ (struct.get $A 0
+ (local.get $ref)
+ )
+ )
+ )
+
+ ;; CHECK: (func $non-nullable-value (param $ref (ref $A))
+ ;; CHECK-NEXT: (local $1 (ref null $A))
+ ;; CHECK-NEXT: (drop
+ ;; CHECK-NEXT: (ref.as_non_null
+ ;; CHECK-NEXT: (local.tee $1
+ ;; CHECK-NEXT: (select (result (ref $A))
+ ;; CHECK-NEXT: (local.get $ref)
+ ;; CHECK-NEXT: (local.get $ref)
+ ;; CHECK-NEXT: (i32.const 1)
+ ;; CHECK-NEXT: )
+ ;; CHECK-NEXT: )
+ ;; CHECK-NEXT: )
+ ;; CHECK-NEXT: )
+ ;; CHECK-NEXT: (drop
+ ;; CHECK-NEXT: (ref.as_non_null
+ ;; CHECK-NEXT: (local.get $1)
+ ;; CHECK-NEXT: )
+ ;; CHECK-NEXT: )
+ ;; CHECK-NEXT: )
+ (func $non-nullable-value (param $ref (ref $A))
+ ;; The value that is repeated is non-nullable, which we must do some fixups
+ ;; for after creating a local of that type.
+ (drop
+ (select (result (ref $A))
+ (local.get $ref)
+ (local.get $ref)
+ (i32.const 1)
+ )
+ )
+ (drop
+ (select (result (ref $A))
+ (local.get $ref)
+ (local.get $ref)
+ (i32.const 1)
+ )
+ )
+ )
+
+ ;; CHECK: (func $creations
+ ;; CHECK-NEXT: (drop
+ ;; CHECK-NEXT: (struct.new_with_rtt $A
+ ;; CHECK-NEXT: (i32.const 1)
+ ;; CHECK-NEXT: (rtt.canon $A)
+ ;; CHECK-NEXT: )
+ ;; CHECK-NEXT: )
+ ;; CHECK-NEXT: (drop
+ ;; CHECK-NEXT: (struct.new_with_rtt $A
+ ;; CHECK-NEXT: (i32.const 1)
+ ;; CHECK-NEXT: (rtt.canon $A)
+ ;; CHECK-NEXT: )
+ ;; CHECK-NEXT: )
+ ;; CHECK-NEXT: (drop
+ ;; CHECK-NEXT: (array.new_with_rtt $B
+ ;; CHECK-NEXT: (i32.const 1)
+ ;; CHECK-NEXT: (i32.const 1)
+ ;; CHECK-NEXT: (rtt.canon $B)
+ ;; CHECK-NEXT: )
+ ;; CHECK-NEXT: )
+ ;; CHECK-NEXT: (drop
+ ;; CHECK-NEXT: (array.new_with_rtt $B
+ ;; CHECK-NEXT: (i32.const 1)
+ ;; CHECK-NEXT: (i32.const 1)
+ ;; CHECK-NEXT: (rtt.canon $B)
+ ;; CHECK-NEXT: )
+ ;; CHECK-NEXT: )
+ ;; CHECK-NEXT: )
+ (func $creations
+ ;; Allocating GC data has no side effects, but each allocation is unique
+ ;; and so we cannot replace separate allocations with a single one.
+ (drop
+ (struct.new_with_rtt $A
+ (i32.const 1)
+ (rtt.canon $A)
+ )
+ )
+ (drop
+ (struct.new_with_rtt $A
+ (i32.const 1)
+ (rtt.canon $A)
+ )
+ )
+ (drop
+ (array.new_with_rtt $B
+ (i32.const 1)
+ (i32.const 1)
+ (rtt.canon $B)
+ )
+ )
+ (drop
+ (array.new_with_rtt $B
+ (i32.const 1)
+ (i32.const 1)
+ (rtt.canon $B)
+ )
+ )
+ )
+)
+
+(module
+ ;; Real-world testcase from AssemblyScript, containing multiple nested things
+ ;; that can be optimized. The inputs to the add (the xors) are identical, and
+ ;; we can avoid repeating them.
+ ;; CHECK: (type $i32_i32_=>_i32 (func (param i32 i32) (result i32)))
+
+ ;; CHECK: (func $div16_internal (param $0 i32) (param $1 i32) (result i32)
+ ;; CHECK-NEXT: (local $2 i32)
+ ;; CHECK-NEXT: (i32.add
+ ;; CHECK-NEXT: (local.tee $2
+ ;; CHECK-NEXT: (i32.xor
+ ;; CHECK-NEXT: (i32.shr_s
+ ;; CHECK-NEXT: (i32.shl
+ ;; CHECK-NEXT: (local.get $0)
+ ;; CHECK-NEXT: (i32.const 16)
+ ;; CHECK-NEXT: )
+ ;; CHECK-NEXT: (i32.const 16)
+ ;; CHECK-NEXT: )
+ ;; CHECK-NEXT: (i32.shr_s
+ ;; CHECK-NEXT: (i32.shl
+ ;; CHECK-NEXT: (local.get $1)
+ ;; CHECK-NEXT: (i32.const 16)
+ ;; CHECK-NEXT: )
+ ;; CHECK-NEXT: (i32.const 16)
+ ;; CHECK-NEXT: )
+ ;; CHECK-NEXT: )
+ ;; CHECK-NEXT: )
+ ;; CHECK-NEXT: (local.get $2)
+ ;; CHECK-NEXT: )
+ ;; CHECK-NEXT: )
+ (func $div16_internal (param $0 i32) (param $1 i32) (result i32)
+ (i32.add
+ (i32.xor
+ (i32.shr_s
+ (i32.shl
+ (local.get $0)
+ (i32.const 16)
+ )
+ (i32.const 16)
+ )
+ (i32.shr_s
+ (i32.shl
+ (local.get $1)
+ (i32.const 16)
+ )
+ (i32.const 16)
+ )
+ )
+ (i32.xor
+ (i32.shr_s
+ (i32.shl
+ (local.get $0)
+ (i32.const 16)
+ )
+ (i32.const 16)
+ )
+ (i32.shr_s
+ (i32.shl
+ (local.get $1)
+ (i32.const 16)
+ )
+ (i32.const 16)
+ )
+ )
+ )
+ )
+)