summaryrefslogtreecommitdiff
path: root/src/passes/Precompute.cpp
diff options
context:
space:
mode:
authorAlon Zakai <azakai@google.com>2024-08-21 16:13:21 -0700
committerGitHub <noreply@github.com>2024-08-21 16:13:21 -0700
commit2d99e10c03e619b01688268319c1cd43f7539e33 (patch)
tree8fb390668de6d4f83e0fabcc654c19d87477d0f3 /src/passes/Precompute.cpp
parent60bd610acaa91cab538497342eb06e3b50d2fac9 (diff)
downloadbinaryen-2d99e10c03e619b01688268319c1cd43f7539e33.tar.gz
binaryen-2d99e10c03e619b01688268319c1cd43f7539e33.tar.bz2
binaryen-2d99e10c03e619b01688268319c1cd43f7539e33.zip
[NFC] Avoid quadratic time when precomputing blocks (#6862)
When precomputing fails on a child block of a parent block, there is no point to precompute the parent, as that will fail as well. This makes --precompute on Emscripten's test_biggerswitch go from 1.44 seconds to 0.02 seconds (not a typo, that is 72x faster). The absolute number is not that big, but we do run this pass more than once, so it saves a noticeable chunk of time.
Diffstat (limited to 'src/passes/Precompute.cpp')
-rw-r--r--src/passes/Precompute.cpp67
1 files changed, 67 insertions, 0 deletions
diff --git a/src/passes/Precompute.cpp b/src/passes/Precompute.cpp
index c60136ec7..6e0d7535f 100644
--- a/src/passes/Precompute.cpp
+++ b/src/passes/Precompute.cpp
@@ -363,6 +363,73 @@ struct Precompute
}
}
+ void visitBlock(Block* curr) {
+ // When block precomputation fails, it can lead to quadratic slowness due to
+ // the "tower of blocks" pattern used to implement switches:
+ //
+ // (block
+ // (block
+ // ...
+ // (block
+ // (br_table ..
+ //
+ // If we try to precompute each block here, and fail on each, then we end up
+ // doing quadratic work. This is also wasted work as once a nested block
+ // fails to precompute there is not really a chance to succeed on the
+ // parent. If we do *not* fail to precompute, however, then we do want to
+ // precompute such nested blocks, e.g.:
+ //
+ // (block $out
+ // (block
+ // (br $out)
+ // )
+ // )
+ //
+ // Here we *can* precompute the inner block, so when we get to the outer one
+ // we see this:
+ //
+ // (block $out
+ // (br $out)
+ // )
+ //
+ // And that precomputes to nothing. Therefore when we see a child of the
+ // block that is another block (it failed to precompute to something
+ // simpler) then we leave early here.
+ //
+ // Note that in theory we could still precompute here if wasm had
+ // instructions that allow such things, e.g.:
+ //
+ // (block $out
+ // (block
+ // (cause side effect1)
+ // (cause side effect2)
+ // )
+ // (undo those side effects exactly)
+ // )
+ //
+ // We are forced to invent a side effect that we can precisely undo (unlike,
+ // say locals - a local.set would persist outside of the block, and even if
+ // we did another set to the original value, this pass doesn't track values
+ // that way). Only with that can we make the inner block un-precomputable
+ // (because there are side effects) but the outer one is (because those
+ // effects are undone). Note that it is critical that we have two things in
+ // the block, so that we can't precompute it to one of them (which is what
+ // we did to the br in the previous example). Note also that this is still
+ // optimizable using other passes, as merge-blocks will fold the two blocks
+ // together.
+ if (!curr->list.empty() && curr->list[0]->is<Block>()) {
+ // The first child is a block, that is, it could not be simplified, so
+ // this looks like the "tower of blocks" pattern. Avoid quadratic time
+ // here as explained above. (We could also look at other children of the
+ // block, but the only real-world pattern identified so far is on the
+ // first child, so keep things simple here.)
+ return;
+ }
+
+ // Otherwise, precompute normally like all other expressions.
+ visitExpression(curr);
+ }
+
// If we failed to precompute a constant, perhaps we can still precompute part
// of an expression. Specifically, consider this case:
//