[Wasm GC] Fix TypeRefining corner case with uncreated types (#4500)

This pass ignores reads from structs - it only cares about writes (during a create or a struct.set). That makes sense since we want to refine the type of fields to more specific things based on what is actually written to them. However, a corner case was missed: If we ignore reads, the pass may "cleverly" optimize to something that is no longer valid to read from. How that happens is if there is no info at all for a type - no sets or news, so all we have is a read, which as mentioned before we ignore, so we think we have nothing at all for that type, and can do arbitrary stuff with it. But then the arbitrary replacement can be invalid to read from, say if it has fewer fields. To handle that, just emit an unreachable. If all we have is a get but no new then there cannot be an instance here at all. (That's only true in a closed world, of course, but this entire pass assumes that anyhow.)
author: Alon Zakai <azakai@google.com> 2022-02-03 14:10:46 -0800
committer: GitHub <noreply@github.com> 2022-02-03 22:10:46 +0000
commit: 9800178d7cfdb336fb5bac34f0d9844639f3a9cc (patch)
tree: d196a7447d865a602cb00fa90f745bafd30aacb2
parent: 4e8b85a8d3953234bbfa490cb1d4f5c32380fb3f (diff)
download: binaryen-9800178d7cfdb336fb5bac34f0d9844639f3a9cc.tar.gz
binaryen-9800178d7cfdb336fb5bac34f0d9844639f3a9cc.tar.bz2
binaryen-9800178d7cfdb336fb5bac34f0d9844639f3a9cc.zip
2 files changed, 136 insertions, 0 deletions
diff --git a/src/passes/TypeRefining.cpp b/src/passes/TypeRefining.cpp
index 9480322bb..74b23bb26 100644
--- a/src/passes/TypeRefining.cpp
+++ b/src/passes/TypeRefining.cpp
@@ -199,10 +199,59 @@ struct TypeRefining : public Pass {
     }
 
     if (canOptimize) {
+      updateInstructions(*module, runner);
       updateTypes(*module, runner);
     }
   }
 
+  // If we change types then some instructions may need to be modified.
+  // Specifically, we assume that reads from structs impose no constraints on
+  // us, so that we can optimize maximally. If a struct is never created nor
+  // written to, but only read from, then we have literally no constraints on it
+  // at all, and we can end up with a situation where we alter the type to
+  // something that is invalid for that read. To ensure the code still
+  // validates, simply remove such reads.
+  void updateInstructions(Module& wasm, PassRunner* runner) {
+    struct ReadUpdater : public WalkerPass<PostWalker<ReadUpdater>> {
+      bool isFunctionParallel() override { return true; }
+
+      TypeRefining& parent;
+
+      ReadUpdater(TypeRefining& parent) : parent(parent) {}
+
+      ReadUpdater* create() override { return new ReadUpdater(parent); }
+
+      void visitStructGet(StructGet* curr) {
+        if (curr->ref->type == Type::unreachable) {
+          return;
+        }
+
+        auto oldType = curr->ref->type.getHeapType();
+        auto newFieldType =
+          parent.finalInfos[oldType][curr->index].getBestPossible();
+        if (!Type::isSubType(newFieldType, curr->type)) {
+          // This instruction is invalid, so it must be the result of the
+          // situation described above: we ignored the read during our
+          // inference, and optimized accordingly, and so now we must remove it
+          // to keep the module validating. It doesn't matter what we emit here,
+          // since there are no struct.new or struct.sets for this type, so this
+          // code is logically unreachable.
+          //
+          // Note that we emit an unreachable here, which changes the type, and
+          // so we should refinalize. However, we will be refinalizing later
+          // anyhow in updateTypes, so there is no need.
+          Builder builder(*getModule());
+          replaceCurrent(builder.makeSequence(builder.makeDrop(curr->ref),
+                                              builder.makeUnreachable()));
+        }
+      }
+    };
+
+    ReadUpdater updater(*this);
+    updater.run(runner, &wasm);
+    updater.runOnModuleCode(runner, &wasm);
+  }
+
   void updateTypes(Module& wasm, PassRunner* runner) {
     class TypeRewriter : public GlobalTypeRewriter {
       TypeRefining& parent;
diff --git a/test/lit/passes/type-refining.wast b/test/lit/passes/type-refining.wast
index eb33f058a..7f903d64e 100644
--- a/test/lit/passes/type-refining.wast
+++ b/test/lit/passes/type-refining.wast
@@ -768,3 +768,90 @@
     )
   )
 )
+
+(module
+  ;; There are two parallel type hierarchies here: "Outer", which are objects
+  ;; that have fields, that contain the "Inner" objects.
+  ;;
+  ;; Root-Outer -> Leaf1-Outer
+  ;;            -> Leaf2-Outer
+  ;;
+  ;; Root-Inner -> Leaf1-Inner
+  ;;            -> Leaf2-Inner
+  ;;
+  ;; Adding their contents, where X[Y] means X has a field of type Y:
+  ;;
+  ;; Root-Outer[Root-Inner] -> Leaf1-Outer[Leaf1-Inner]
+  ;;                        -> Leaf2-Outer[Leaf2-Inner]
+
+  ;; CHECK:      (type $Leaf2-Inner (struct_subtype  $Root-Inner))
+  (type $Leaf2-Inner (struct_subtype  $Root-Inner))
+
+  ;; CHECK:      (type $none_=>_none (func_subtype func))
+
+  ;; CHECK:      (type $Leaf1-Outer (struct_subtype (field (ref $Leaf2-Inner)) $Root-Outer))
+  (type $Leaf1-Outer (struct_subtype (field (ref $Leaf1-Inner)) $Root-Outer))
+
+ ;; CHECK:      (type $Leaf2-Outer (struct_subtype (field (ref $Leaf2-Inner)) $Root-Outer))
+ (type $Leaf2-Outer (struct_subtype (field (ref $Leaf2-Inner)) $Root-Outer))
+
+  ;; CHECK:      (type $Root-Outer (struct_subtype (field (ref $Leaf2-Inner)) data))
+  (type $Root-Outer (struct_subtype (field (ref $Root-Inner)) data))
+
+  ;; CHECK:      (type $Root-Inner (struct_subtype  data))
+  (type $Root-Inner (struct_subtype  data))
+
+  (type $Leaf1-Inner (struct_subtype (field i32) $Root-Inner))
+
+  ;; CHECK:      (func $func (type $none_=>_none)
+  ;; CHECK-NEXT:  (drop
+  ;; CHECK-NEXT:   (block ;; (replaces something unreachable we can't emit)
+  ;; CHECK-NEXT:    (drop
+  ;; CHECK-NEXT:     (block
+  ;; CHECK-NEXT:      (drop
+  ;; CHECK-NEXT:       (ref.null $Leaf1-Outer)
+  ;; CHECK-NEXT:      )
+  ;; CHECK-NEXT:      (unreachable)
+  ;; CHECK-NEXT:     )
+  ;; CHECK-NEXT:    )
+  ;; CHECK-NEXT:   )
+  ;; CHECK-NEXT:  )
+  ;; CHECK-NEXT:  (drop
+  ;; CHECK-NEXT:   (struct.new $Leaf2-Outer
+  ;; CHECK-NEXT:    (struct.new_default $Leaf2-Inner)
+  ;; CHECK-NEXT:   )
+  ;; CHECK-NEXT:  )
+  ;; CHECK-NEXT: )
+  (func $func
+    (drop
+      ;; The situation here is that we have only a get for some types, and no
+      ;; other constraints. As we ignore gets, we work under no constraints at
+      ;; We then have to pick some type, so we pick the one used by our
+      ;; supertype - and the supertype might have picked up a type from another
+      ;; branch of the type tree, which is not a subtype of ours.
+      ;;
+      ;; In more detail, we never create an instance of $Leaf1-Outer, and we
+      ;; only have a get of its field. This optimization ignores the get (to not
+      ;; be limited by it). It will then optimize $Leaf1-Outer's field of
+      ;; $Leaf1-Inner (another struct for which we have no creation, and only a
+      ;; get) into $Leaf2-Inner, which is driven by the fact that we do have a
+      ;; creation of $Leaf2-Inner. But then this struct.get $Leaf1-Inner on field
+      ;; 0 is no longer valid, as we turn $Leaf1-Inner => $Leaf2-Inner, and
+      ;; $Leaf2-Inner has no field 0. To keep the module validating, we must not
+      ;; emit that. Instead, since there can be no instance of $Leaf1-Inner (as
+      ;; mentioned before, it is never created, nor anything that can be cast to
+      ;; it), we know this code is logically unreachable, and can emit an
+      ;; unreachable here.
+      (struct.get $Leaf1-Inner 0
+        (struct.get $Leaf1-Outer 0
+          (ref.null $Leaf1-Outer)
+        )
+      )
+    )
+    (drop
+      (struct.new $Leaf2-Outer
+        (struct.new_default $Leaf2-Inner)
+      )
+    )
+  )
+)
author	Alon Zakai <azakai@google.com>	2022-02-03 14:10:46 -0800
committer	GitHub <noreply@github.com>	2022-02-03 22:10:46 +0000
commit	9800178d7cfdb336fb5bac34f0d9844639f3a9cc (patch)
tree	d196a7447d865a602cb00fa90f745bafd30aacb2
parent	4e8b85a8d3953234bbfa490cb1d4f5c32380fb3f (diff)
download	binaryen-9800178d7cfdb336fb5bac34f0d9844639f3a9cc.tar.gz binaryen-9800178d7cfdb336fb5bac34f0d9844639f3a9cc.tar.bz2 binaryen-9800178d7cfdb336fb5bac34f0d9844639f3a9cc.zip