diff options
author | Alon Zakai <azakai@google.com> | 2024-11-11 14:41:56 -0800 |
---|---|---|
committer | GitHub <noreply@github.com> | 2024-11-11 14:41:56 -0800 |
commit | f6cfc5647259c12036f58f0a3b6b3cfc83e4b914 (patch) | |
tree | fb13b2c529722fdda9910d33a307df80053fd01f /test | |
parent | 9e11c5fd12a7c6392ac62979cc16c62a368f6e33 (diff) | |
download | binaryen-f6cfc5647259c12036f58f0a3b6b3cfc83e4b914.tar.gz binaryen-f6cfc5647259c12036f58f0a3b6b3cfc83e4b914.tar.bz2 binaryen-f6cfc5647259c12036f58f0a3b6b3cfc83e4b914.zip |
Fix PickLoadSigns on SignExt feature instructions (#7069)
I believe the history here is that
1. We added a PickLoadSigns pass. It checks if a load from memory is stored in
a local that is only every used in a signed or an unsigned manner. If it is, we can
adjust the sign of the load (load8_u/s) to do the sign/unsign during the load.
2. The pass finds each LocalGet and looks either 2 or 3 parents above it. For
a sign operation, we need to look up 3, since the operation is x << K >> K. For
an unsigned, we need only 2, since we have x & M. We hardcoded those
numbers 2 and 3.
3. We added the SignExt feature, which adds i32.extend8_s. This does a sign
extend with a single instruction, not two nested ones, so now we can sign-
extend at depth 2, unlike before. Properties::getSignExtValue was updated
for this, but not the pass PickLoadSigns.
The bug that is fixed here is that we looked at depth 3 for a sign-extend, and
we blindly accepted it if we found one. So we ended up accepting
(i32.extend8_s (ANYTHING (x))), which is a sign-extend of something, but
not of x, which is bad.
We were also missing an optimization opportunity, as we didn't look for
depth 2 sign extends.
This bug is quite old, from when Properties got SignExt support, in #3910.
But the blame isn't there - to notice this then, we'd have had to check each
caller of getSignExtValue throughout the codebase, which isn't reasonable.
The fault is mine, from the first write-up of PickLoadSigns in 2017: the code
should have been fully general, handling 2/3 and checking the output when
it does so (adding == curr, that the sign/zero-extended value is the one we
expect). That is what this PR does.
Diffstat (limited to 'test')
-rw-r--r-- | test/lit/passes/pick-load-signs_sign-ext.wast | 80 |
1 files changed, 80 insertions, 0 deletions
diff --git a/test/lit/passes/pick-load-signs_sign-ext.wast b/test/lit/passes/pick-load-signs_sign-ext.wast new file mode 100644 index 000000000..c50af32c5 --- /dev/null +++ b/test/lit/passes/pick-load-signs_sign-ext.wast @@ -0,0 +1,80 @@ +;; NOTE: Assertions have been generated by update_lit_checks.py and should not be edited. +;; RUN: wasm-opt %s --pick-load-signs -all -S -o - | filecheck %s + +(module + ;; CHECK: (memory $0 16 17 shared) + (memory $0 16 17 shared) + + ;; CHECK: (func $load-other-use (type $0) (result i32) + ;; CHECK-NEXT: (local $temp i32) + ;; CHECK-NEXT: (block $label (result i32) + ;; CHECK-NEXT: (local.set $temp + ;; CHECK-NEXT: (i32.load8_u + ;; CHECK-NEXT: (i32.const 22) + ;; CHECK-NEXT: ) + ;; CHECK-NEXT: ) + ;; CHECK-NEXT: (drop + ;; CHECK-NEXT: (i32.extend8_s + ;; CHECK-NEXT: (br_if $label + ;; CHECK-NEXT: (local.get $temp) + ;; CHECK-NEXT: (i32.const 1) + ;; CHECK-NEXT: ) + ;; CHECK-NEXT: ) + ;; CHECK-NEXT: ) + ;; CHECK-NEXT: (unreachable) + ;; CHECK-NEXT: ) + ;; CHECK-NEXT: ) + (func $load-other-use (result i32) + (local $temp i32) + ;; The load here is unsigned, while the value in the local is used in two + ;; ways: it is sign-extended, and it is sent on a branch by a br_if. We must + ;; not ignore the branch, as if we did then we'd think the only use is signed, + ;; and we'd optimize load8_u => load8_s. + (block $label (result i32) + (local.set $temp + (i32.load8_u + (i32.const 22) + ) + ) + (drop + (i32.extend8_s + (br_if $label + (local.get $temp) + (i32.const 1) + ) + ) + ) + (unreachable) + ) + ) + + ;; CHECK: (func $load-valid (type $1) + ;; CHECK-NEXT: (local $temp i32) + ;; CHECK-NEXT: (local.set $temp + ;; CHECK-NEXT: (i32.load8_s + ;; CHECK-NEXT: (i32.const 22) + ;; CHECK-NEXT: ) + ;; CHECK-NEXT: ) + ;; CHECK-NEXT: (drop + ;; CHECK-NEXT: (i32.extend8_s + ;; CHECK-NEXT: (local.get $temp) + ;; CHECK-NEXT: ) + ;; CHECK-NEXT: ) + ;; CHECK-NEXT: ) + (func $load-valid + (local $temp i32) + ;; As above, but remove the br_if in the middle. Now this is a valid case to + ;; optimize, load8_u => load8_s. + (local.set $temp + (i32.load8_u + (i32.const 22) + ) + ) + (drop + (i32.extend8_s + (local.get $temp) + ) + ) + ) +) + |