author     Alon Zakai <alonzakai@gmail.com>    2019-03-01 10:28:07 -0800
committer  GitHub <noreply@github.com>         2019-03-01 10:28:07 -0800
commit     689fe405a3417fbfd59456035add6f6f53149f35 (patch)
tree       d6f1dcaf0cbb85eb3ae830f68a46c9a6627d1562 /src/passes/PostEmscripten.cpp
parent     f59c3033e678ced61bc8c78e8ac9fbee31ef0210 (diff)
Consistently optimize small added constants into load/store offsets (#1924)
See #1919 - we did not do this consistently before.
This adds a lowMemoryUnused option to PassOptions. It can be passed on the commandline with --low-memory-unused. If enabled, we run the new optimize-added-constants pass, which does the real work here, replacing older code in post-emscripten.
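As a rough usage sketch (the exact flag spellings are assumptions here; check wasm-opt --help for the current ones), enabling this from the command line might look like:

  wasm-opt --low-memory-unused -O3 input.wasm -o output.wasm

or, running the new pass directly:

  wasm-opt --low-memory-unused --optimize-added-constants input.wasm -o output.wasm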
Aside from running at the proper time (unlike the old pass, see #1919), this also has a -propagate mode, which can do stuff like this:
y = x + 10
[..]
load(y)
[..]
load(y)
=>
y = x + 10
[..]
load(x, offset=10)
[..]
load(x, offset=10)
That is, it can propagate such offsets to the loads/stores. This pattern is common in big interpreter loops, where the pointers are offsets into a big struct of state.
The pass does this propagation by using a new feature of LocalGraph, which can verify which locals are in SSA mode. Binaryen IR is not SSA (intentionally, since it's a later IR), but if a local has only a single set for all of its gets, then that local is effectively in SSA form and can be optimized. The tricky thing is that all locals are initialized to zero, so counting that implicit initialization there are always at least two sets. But if we verify that the real set dominates all the gets, then the zero initialization cannot reach them, and we are safe.
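As a rough illustration of that check (a standalone sketch, not binaryen's actual LocalGraph interface; LocalInfo, isSSALike, and kZeroInit are names made up for this example), the condition boils down to: one explicit set, and that set is the only one that can reach any get.

  #include <iostream>
  #include <vector>

  constexpr int kZeroInit = -1;  // the implicit "set to zero" at function entry

  struct LocalInfo {
    int numExplicitSets = 0;                      // how many real sets of this local exist
    std::vector<std::vector<int>> getsReachedBy;  // per get: ids of the sets that can reach it
  };

  // A local behaves like an SSA value if it has exactly one explicit set and that
  // set is the only one reaching every get -- in particular, the zero initialization
  // never reaches a get, so an added constant can safely be propagated past it.
  bool isSSALike(const LocalInfo& local) {
    if (local.numExplicitSets != 1) return false;
    for (const auto& reaching : local.getsReachedBy) {
      if (reaching.size() != 1 || reaching[0] == kZeroInit) return false;
    }
    return true;
  }

  int main() {
    LocalInfo ok{1, {{0}, {0}}};             // the single set (id 0) reaches both gets
    LocalInfo bad{1, {{0}, {kZeroInit, 0}}}; // the zero init can also reach a get
    std::cout << isSSALike(ok) << " " << isSSALike(bad) << "\n";  // prints: 1 0
  }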
This PR also makes safe-heap aware of lowMemoryUnused. When it is set, we check not just for an access of address 0, but for any access in the range 0-1023.
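As a sketch of what that check means (hypothetical names, not the real safe-heap instrumentation code):

  #include <cstdint>

  // Hypothetical sketch: with lowMemoryUnused, any access below 1024 is flagged,
  // not just an access of address 0.
  constexpr uint32_t kLowMemoryLimit = 1024;

  bool isInvalidLowAccess(uint32_t addr, bool lowMemoryUnused) {
    return lowMemoryUnused ? addr < kLowMemoryLimit : addr == 0;
  }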
This makes zlib 5% faster, with either the wasm backend or asm2wasm, and 0.5% smaller. It also helps sqlite (1.5% faster) and lua (1% faster).
Diffstat (limited to 'src/passes/PostEmscripten.cpp')
-rw-r--r--   src/passes/PostEmscripten.cpp   | 60
1 file changed, 0 insertions, 60 deletions
diff --git a/src/passes/PostEmscripten.cpp b/src/passes/PostEmscripten.cpp
index 72c0d8808..7e2bacf25 100644
--- a/src/passes/PostEmscripten.cpp
+++ b/src/passes/PostEmscripten.cpp
@@ -32,66 +32,6 @@ struct PostEmscripten : public WalkerPass<PostWalker<PostEmscripten>> {
 
   Pass* create() override { return new PostEmscripten; }
 
-  // When we have a Load from a local value (typically a GetLocal) plus a constant offset,
-  // we may be able to fold it in.
-  // The semantics of the Add are to wrap, while wasm offset semantics purposefully do
-  // not wrap. So this is not always safe to do. For example, a load may depend on
-  // wrapping via
-  //   (2^32 - 10) + 100   =>  wrap and load from address 90
-  // Without wrapping, we get something too large, and an error. *However*, for
-  // asm2wasm output coming from Emscripten, we allocate the lowest 1024 for mapped
-  // globals. Mapped globals are simple types (i32, float or double), always
-  // accessed directly by a single constant. Therefore if we see (..) + K where
-  // K is less then 1024, then if it wraps, it wraps into [0, 1024) which is at best
-  // a mapped global, but it can't be because they are accessed directly (at worst,
-  // it's 0 or an unused section of memory that was reserved for mapped globlas).
-  // Thus it is ok to optimize such small constants into Load offsets.
-
-  #define SAFE_MAX 1024
-
-  void optimizeMemoryAccess(Expression*& ptr, Address& offset) {
-    while (1) {
-      auto* add = ptr->dynCast<Binary>();
-      if (!add) break;
-      if (add->op != AddInt32) break;
-      auto* left = add->left->dynCast<Const>();
-      auto* right = add->right->dynCast<Const>();
-      // note: in optimized code, we shouldn't see an add of two constants, so don't worry about that much
-      //       (precompute would optimize that)
-      if (left) {
-        auto value = left->value.geti32();
-        if (value >= 0 && value < SAFE_MAX) {
-          offset = offset + value;
-          ptr = add->right;
-          continue;
-        }
-      }
-      if (right) {
-        auto value = right->value.geti32();
-        if (value >= 0 && value < SAFE_MAX) {
-          offset = offset + value;
-          ptr = add->left;
-          continue;
-        }
-      }
-      break;
-    }
-    // finally ptr may be a const, but it isn't worth folding that in (we still have a const); in fact,
-    // it's better to do the opposite for gzip purposes as well as for readability.
-    auto* last = ptr->dynCast<Const>();
-    if (last) {
-      last->value = Literal(int32_t(last->value.geti32() + offset));
-      offset = 0;
-    }
-  }
-
-  void visitLoad(Load* curr) {
-    optimizeMemoryAccess(curr->ptr, curr->offset);
-  }
-  void visitStore(Store* curr) {
-    optimizeMemoryAccess(curr->ptr, curr->offset);
-  }
-
   void visitCall(Call* curr) {
     // special asm.js imports can be optimized
     auto* func = getModule()->getFunction(curr->target);