diff options
author | Alon Zakai <azakai@google.com> | 2024-08-21 16:13:21 -0700 |
---|---|---|
committer | GitHub <noreply@github.com> | 2024-08-21 16:13:21 -0700 |
commit | 2d99e10c03e619b01688268319c1cd43f7539e33 (patch) | |
tree | 8fb390668de6d4f83e0fabcc654c19d87477d0f3 /src/passes/Precompute.cpp | |
parent | 60bd610acaa91cab538497342eb06e3b50d2fac9 (diff) | |
download | binaryen-2d99e10c03e619b01688268319c1cd43f7539e33.tar.gz binaryen-2d99e10c03e619b01688268319c1cd43f7539e33.tar.bz2 binaryen-2d99e10c03e619b01688268319c1cd43f7539e33.zip |
[NFC] Avoid quadratic time when precomputing blocks (#6862)
When precomputing fails on a child block of a parent block, there is no point
to precompute the parent, as that will fail as well.
This makes --precompute on Emscripten's test_biggerswitch go from 1.44
seconds to 0.02 seconds (not a typo, that is 72x faster). The absolute number
is not that big, but we do run this pass more than once, so it saves a noticeable
chunk of time.
Diffstat (limited to 'src/passes/Precompute.cpp')
-rw-r--r-- | src/passes/Precompute.cpp | 67 |
1 files changed, 67 insertions, 0 deletions
diff --git a/src/passes/Precompute.cpp b/src/passes/Precompute.cpp index c60136ec7..6e0d7535f 100644 --- a/src/passes/Precompute.cpp +++ b/src/passes/Precompute.cpp @@ -363,6 +363,73 @@ struct Precompute } } + void visitBlock(Block* curr) { + // When block precomputation fails, it can lead to quadratic slowness due to + // the "tower of blocks" pattern used to implement switches: + // + // (block + // (block + // ... + // (block + // (br_table .. + // + // If we try to precompute each block here, and fail on each, then we end up + // doing quadratic work. This is also wasted work as once a nested block + // fails to precompute there is not really a chance to succeed on the + // parent. If we do *not* fail to precompute, however, then we do want to + // precompute such nested blocks, e.g.: + // + // (block $out + // (block + // (br $out) + // ) + // ) + // + // Here we *can* precompute the inner block, so when we get to the outer one + // we see this: + // + // (block $out + // (br $out) + // ) + // + // And that precomputes to nothing. Therefore when we see a child of the + // block that is another block (it failed to precompute to something + // simpler) then we leave early here. + // + // Note that in theory we could still precompute here if wasm had + // instructions that allow such things, e.g.: + // + // (block $out + // (block + // (cause side effect1) + // (cause side effect2) + // ) + // (undo those side effects exactly) + // ) + // + // We are forced to invent a side effect that we can precisely undo (unlike, + // say locals - a local.set would persist outside of the block, and even if + // we did another set to the original value, this pass doesn't track values + // that way). Only with that can we make the inner block un-precomputable + // (because there are side effects) but the outer one is (because those + // effects are undone). Note that it is critical that we have two things in + // the block, so that we can't precompute it to one of them (which is what + // we did to the br in the previous example). Note also that this is still + // optimizable using other passes, as merge-blocks will fold the two blocks + // together. + if (!curr->list.empty() && curr->list[0]->is<Block>()) { + // The first child is a block, that is, it could not be simplified, so + // this looks like the "tower of blocks" pattern. Avoid quadratic time + // here as explained above. (We could also look at other children of the + // block, but the only real-world pattern identified so far is on the + // first child, so keep things simple here.) + return; + } + + // Otherwise, precompute normally like all other expressions. + visitExpression(curr); + } + // If we failed to precompute a constant, perhaps we can still precompute part // of an expression. Specifically, consider this case: // |