path: root/src/passes/OptimizeInstructions.cpp
Commit message | Author | Date | Files | Lines
...
* [Wasm GC] Fix OptimizeInstructions on unreachable ref.test (#4156) | Alon Zakai | 2021-09-15 | 1 file | -0/+4
  Avoids a crash in calling getHeapType when there isn't one. Also add the relevant lit test (and a few others) to the list of files to fuzz more heavily.
* [OptimizeInstructions] Optimize memory.fill with constant arguments (#4130) | Max Graey | 2021-09-14 | 1 file | -1/+126
  This is a reland of #3071.
  Do similar optimizations as in #3038 but for memory.fill.
  `memory.fill(d, v, 0)` ==> `{ drop(d), drop(v) }`, only with `ignoreImplicitTraps` or `trapsNeverHappen`
  `memory.fill(d, v, 1)` ==> `store8(d, v)`
  Further simplifications can be done only if v is constant because otherwise binary size would increase:
  `memory.fill(d, C, 1)` ==> `store8(d, (C & 0xFF))`
  `memory.fill(d, C, 2)` ==> `store16(d, (C & 0xFF) * 0x0101)`
  `memory.fill(d, C, 4)` ==> `store32(d, (C & 0xFF) * 0x01010101)`
  `memory.fill(d, C, 8)` ==> `store64(d, (C & 0xFF) * 0x0101010101010101)`
  `memory.fill(d, C, 16)` ==> `store128(d, i8x16.splat(C & 0xFF))`
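  As a quick sanity check on the splat constants above (not part of the commit), here is a small standalone C++ snippet; the fill value 0xAB is an arbitrary choice:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Check that multiplying the low byte by a "repeating ones" constant
// produces the same bytes that memory.fill(d, C, 8) would write.
int main() {
  uint64_t C = 0xAB; // arbitrary fill value
  uint64_t splat = (C & 0xFF) * 0x0101010101010101ULL;
  uint8_t expected[8], actual[8];
  std::memset(expected, (int)(C & 0xFF), 8); // what memory.fill writes
  std::memcpy(actual, &splat, 8);            // what store64 of the splat writes
  assert(std::memcmp(expected, actual, 8) == 0);
}
```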
* OptimizeInstructions: Optimize boolean selects (#4147) | Alon Zakai | 2021-09-13 | 1 file | -0/+19
  If all a select's inputs are boolean, we can sometimes turn the select into an AND or an OR operation:
    x ? y : 0 => x & y
    x ? 1 : y => x | y
  I believe LLVM aggressively canonicalizes to this form. It makes sense to do here too, as it is smaller (saves the constant 0 or 1). It also allows further optimizations (which is why LLVM does it), but I don't think we have those yet.
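  A standalone sketch (not the pass's code) of why the rewrite is sound when both inputs are known booleans:

```cpp
#include <cassert>
#include <cstdint>

// Exhaustively check the two rewrites over boolean inputs (0 or 1),
// which is the precondition the pass requires.
int main() {
  for (uint32_t x = 0; x <= 1; ++x) {
    for (uint32_t y = 0; y <= 1; ++y) {
      assert((x ? y : 0u) == (x & y)); // x ? y : 0 => x & y
      assert((x ? 1u : y) == (x | y)); // x ? 1 : y => x | y
    }
  }
}
```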
* Rename isIntrinsicallyNondeterministic() to isGenerative() (#4092) | Alon Zakai | 2021-09-09 | 1 file | -2/+1
* [OptimizeInstructions] propagate sign for integer multiplication (#4098) | Max Graey | 2021-09-09 | 1 file | -0/+50
```ts
-x * -y  =>  (x * y)
-x * y   =>  -(x * y)
x * -y   =>  -(x * y), if x != C && y != C
-x * C   =>  x * -C,   if C != C_pot || shrinkLevel != 0
-x * C   =>  -(x * C), otherwise
```
  We are skipping propagation when lhs and rhs are constants because this should be handled by constant folding. Also skip cases like `-x * 4 -> x * -4` for `shrinkLevel != 0`, as this will be further converted to `-(x << 2)`.
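  The sign-propagation identities can be spot-checked under wrap-around arithmetic; the sketch below (not from the PR) models wasm's i32.mul with C++ unsigned arithmetic:

```cpp
#include <cassert>
#include <cstdint>

// Check the sign-propagation identities under 32-bit wrap-around
// (unsigned arithmetic models wasm's i32.mul exactly).
int main() {
  uint32_t samples[] = {0u, 1u, 2u, 7u, 0x80000000u, 0xFFFFFFFFu, 12345u};
  for (uint32_t x : samples) {
    for (uint32_t y : samples) {
      uint32_t negx = 0u - x, negy = 0u - y;
      assert(negx * negy == x * y);     // -x * -y => x * y
      assert(negx * y == 0u - (x * y)); // -x * y  => -(x * y)
      assert(x * negy == 0u - (x * y)); // x * -y  => -(x * y)
    }
  }
}
```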
* [Refactoring] Cleanup asm2wasm. Use JS instead of ASM prefix where possible. NFC (#4090) | Max Graey | 2021-09-01 | 1 file | -6/+0
* Use the new module version of EffectAnalyzer (#4116) | Alon Zakai | 2021-08-31 | 1 file | -17/+15
  This finishes the refactoring started in #4115 by doing the same change to pass a Module into EffectAnalyzer instead of features. To do so this refactors the fallthrough API and a few other small things. After those changes, this PR removes the old feature constructor of EffectAnalyzer entirely.
  This requires a small breaking change in the C API, changing BinaryenExpressionGetSideEffects's feature param to a module. That makes this change not NFC, but otherwise it is.
* Add a Module parameter to EffectAnalyzer. NFC (#4115) | Alon Zakai | 2021-08-31 | 1 file | -17/+13
  Knowing the module will allow us to do more analysis in the effect analyzer. For now, this just refactors the code to allow providing a module instead of features, and to infer the features from the module. This actually shortens the code in most places, which is nice (just pass module instead of module->features).
  This modifies basically all callers to use the new module form, except for the fallthrough logic. That would require some more refactoring, so to keep this PR reasonably small that is not yet done.
* OptimizeInstructions: Handle trivial ref.cast and ref.test (#4097) | Alon Zakai | 2021-08-24 | 1 file | -28/+97
  If the types are completely incompatible, we know the cast will fail. However, ref.cast does allow a null to pass through, which makes it a little more complicated.
* TrapsNeverHappen mode (#4059) | Alon Zakai | 2021-08-17 | 1 file | -9/+17
  The goal of this mode is to remove obviously-unneeded code like

    (drop (i32.load (local.get $x)))

  In general we can't remove it, as the load might trap - we'd be removing a side effect. This is fairly rare in general, but actually becomes quite annoying with wasm GC code where such patterns are more common, and we really need to remove them.
  Historically the IgnoreImplicitTraps option was meant to help here. However, in practice it did not quite work well enough for most production code, as mentioned e.g. in #3934. TrapsNeverHappen mode is an attempt to fix that, based on feedback from @askeksa in that issue, and also I believe this implements an idea that @fitzgen mentioned a while ago (sorry, I can't remember where exactly...). So I'm hopeful this will be generally useful and not just for GC.
  The idea in TrapsNeverHappen mode is that traps are assumed to not actually happen at runtime. That is, if there is a trap in the code, it will not be reached, or if it is reached then it will not trap. For example, an (unreachable) would be assumed to never be reached, which means that the optimizer can remove it and any code that executes right before it:

    (if (..condition..)
      (block
        (..code that can be removed, if it does not branch out..)
        (..code that can be removed, if it does not branch out..)
        (..code that can be removed, if it does not branch out..)
        (unreachable)))

  And something like a load from memory is assumed to not trap, etc., which in particular would let us remove that dropped load from earlier.
  This mode should be usable in production builds with assertions disabled, if traps are seen as failing assertions. That might not be true of all release builds (maybe some use traps for other purposes), but hopefully in some. That is, if traps are like assertions, then enabling this new mode would be like disabling assertions in release builds and living with the fact that if an assertion would have been hit then that is "undefined behavior" and the optimizer might have removed the trap or done something weird.
  TrapsNeverHappen (TNH) is different from IgnoreImplicitTraps (IIT). The old IIT mode would just ignore traps when computing effects. That is a simple model, but a problem happens with a trap behind a condition, like this:

    if (x != 0) foo(1 / x);

  We won't trap on integer division by zero here only because of the guarding if. In IIT, we'd compute no side effects on 1 / x, and then we might end up moving it around, depending on other code in the area, and potentially out of the if - which would make it happen unconditionally, which would break.
  TNH avoids that problem because it does not simply ignore traps. Instead, there is a new hasUnremovableSideEffects() method that must be opted-in by passes. That checks if there are no side effects, or if there are, if we can remove them - and we know we can remove a trap if we are running under TrapsNeverHappen mode, as the trap won't happen by assumption. A pass must only use that method where it is safe, that is, where it would either remove the side effect (in which case, no problem), or if not, that it at least does not move it around (avoiding the above problem with IIT).
  This PR does not implement all optimizations possible with TNH, just a small initial set of things to get started. It is already useful on wasm GC code, including being as good as IIT on removing unnecessary casts in some cases, see the test suite updates here. Also, a significant part of the 18% speedup measured in #4052 (comment) is due to my testing with this enabled, as otherwise the devirtualization there leaves a lot of unneeded code.
* LocalCSE rewrite (#4079) | Alon Zakai | 2021-08-17 | 1 file | -18/+9
  Technically this is not a new pass, but it is a rewrite almost from scratch. Local Common Subexpression Elimination looks for repeated patterns, stuff like this:

    x = (a + b) + c
    y = a + b
  =>
    temp = a + b
    x = temp + c
    y = temp

  The old pass worked on flat IR, which is inefficient, and was overly complicated because of that. The new pass uses a new algorithm that I think is pretty simple, see the detailed comment at the top.
  This keeps the pass enabled only in -O4, like before - right after flattening the IR. That is to make this as minimal a change as possible. Followups will enable the pass in the main pipeline, that is, we will finally be able to run it by default.
  (Note that to make the pass work well after flatten, an extra simplify-locals is added - the old pass used to do part of simplify-locals internally, which was one source of complexity. Even so, some of the -O4 tests have changes, due to minor factors - they are just minor orderings etc., which can be seen by inspecting the outputs before and after using e.g. --metrics)
  This plus some followup work leads to large wins on wasm GC output. On j2cl there is a common pattern of repeated struct.gets, so common that this pass removes 85% of all struct.gets, which makes the total binary 15% smaller. However, on LLVM-emitted code the benefit is minor, less than 1%.
* [Wasm GC] Fix OptimizeInstructions on folding of identical code with nominal typing (#4069) | Alon Zakai | 2021-08-16 | 1 file | -4/+27
    (if (result i32)
      (local.get $x)
      (struct.get $B 1
        (ref.null $B)
      )
      (struct.get $C 1
        (ref.null $C)
      )
    )

  With structural typing it is safe to turn this into this:

    (struct.get $A 1
      (if (result (ref $A))
        (local.get $x)
        (ref.null $B)
        (ref.null $C)
      )
    )

  Here $A is the LUB of the others. This works since $A must have field 1 in it. But with nominal types it is possible that the LUB in fact does not have that field, and we would not validate.
  This actually seems like a more general issue that might happen with other things, even though atm perhaps it can't. For simplicity, avoid this pattern in both nominal and structural typing, to avoid making a difference between them.
* Improve optimization of call_ref into direct calls (#4068) | Alon Zakai | 2021-08-10 | 1 file | -0/+83
  First, move the tiny pattern of call-ref-of-ref-func from Directize into OptimizeInstructions. This is important because Directize is a global optimization pass - it looks at the table to see if a CallIndirect can be turned into a direct call. We only run global passes at the end of the pipeline, but we don't need any global data for call-ref of a ref-func, and OptimizeInstructions is the place for such patterns.
  Second, extend that to also handle fallthrough values. This is less simple, but as call_ref is so inefficient, it's worth doing all we can.
* [Wasm GC] RefEq(x, null) => RefIsNull(x) (#4066) | Alon Zakai | 2021-08-09 | 1 file | -0/+12
* [Wasm GC] Optimize repeated identical ref.casts (#4039) | Alon Zakai | 2021-07-29 | 1 file | -2/+31
* [Optimize Instructions] Simplify zero/sign extensions (special case) (#4009) | Max Graey | 2021-07-22 | 1 file | -9/+25
  For signed or unsigned extension to 64 bits after lowering from partially filled 64-bit arguments:
```rust
i64.extend_i32_u(i32.wrap_i64(x)) => x // where maxBits(x) <= 32
i64.extend_i32_s(i32.wrap_i64(x)) => x // where maxBits(x) <= 31
```
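  A small illustration (not from the commit) of those preconditions, using C++ integer casts to model wrap and extend:

```cpp
#include <cassert>
#include <cstdint>

// Check the wrap-then-extend roundtrips under the stated bit-width
// preconditions (maxBits(x) <= 32 for the unsigned case, <= 31 for
// the signed one).
int main() {
  uint64_t fits32[] = {0, 1, 0x7FFFFFFFull, 0x80000000ull, 0xFFFFFFFFull};
  for (uint64_t x : fits32) {
    // i64.extend_i32_u(i32.wrap_i64(x)) => x, valid when the high 32 bits are 0
    assert((uint64_t)(uint32_t)x == x);
  }
  uint64_t fits31[] = {0, 1, 12345, 0x7FFFFFFFull};
  for (uint64_t x : fits31) {
    // i64.extend_i32_s(i32.wrap_i64(x)) => x, valid when bit 31 is also 0
    assert((uint64_t)(int64_t)(int32_t)(uint32_t)x == x);
  }
}
```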
* Do not create a select with multivalue arms in OptimizeInstructions (#4012) | Alon Zakai | 2021-07-22 | 1 file | -1/+6
  Similar to #4005 but on OptimizeInstructions instead of RemoveUnusedBrs.
* [Optimize Instructions] Combine reinterprets, loads and stores (#4006) | Max Graey | 2021-07-21 | 1 file | -0/+32
  Fixes #3973

  Loads:
    f32.reinterpret_i32(i32.load(x)) => f32.load(x)
    f64.reinterpret_i64(i64.load(x)) => f64.load(x)
    i32.reinterpret_f32(f32.load(x)) => i32.load(x)
    i64.reinterpret_f64(f64.load(x)) => i64.load(x)

  Stores:
    f32.store(y, f32.reinterpret_i32(x)) => i32.store(y, x)
    f64.store(y, f64.reinterpret_i64(x)) => i64.store(y, x)
    i32.store(y, i32.reinterpret_f32(x)) => f32.store(y, x)
    i64.store(y, i64.reinterpret_f64(x)) => f64.store(y, x)

  Also optimize reinterprets that are undone:
    i32.reinterpret_f32(f32.reinterpret_i32(x)) => x
    i64.reinterpret_f64(f64.reinterpret_i64(x)) => x
    f32.reinterpret_i32(i32.reinterpret_f32(x)) => x
    f64.reinterpret_i64(i64.reinterpret_f64(x)) => x
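  A standalone C++ illustration (not Binaryen code) of why fusing the load with the reinterpret is bit-exact; the byte values are arbitrary:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Model a 4-byte memory location and check that "load as i32, then
// reinterpret to f32" gives the same bits as "load as f32 directly",
// which is what the load/reinterpret fusion relies on.
int main() {
  unsigned char memory[4] = {0xDB, 0x0F, 0x49, 0x40}; // arbitrary bytes

  uint32_t asInt;
  std::memcpy(&asInt, memory, 4); // i32.load
  float viaReinterpret;
  std::memcpy(&viaReinterpret, &asInt, 4); // f32.reinterpret_i32

  float direct;
  std::memcpy(&direct, memory, 4); // f32.load

  assert(std::memcmp(&viaReinterpret, &direct, 4) == 0);
}
```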
* [Optimize Instructions] Simplify sign extensions (#4004) | Max Graey | 2021-07-20 | 1 file | -0/+51
  Requiring the sign-extension feature:
```
i64(x) << 56 >> 56 ==> i64.extend8_s(x)
i64(x) << 48 >> 48 ==> i64.extend16_s(x)
i64(x) << 32 >> 32 ==> i64.extend32_s(x)
i64.extend_i32_s(i32.wrap_i64(x)) ==> i64.extend32_s(x)
```
  General:
```
i32.wrap_i64(i64.extend_i32_s(x)) ==> x
i32.wrap_i64(i64.extend_i32_u(x)) ==> x
```
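  The shift-pair identities can be spot-checked in plain C++ (an illustration only; it assumes the usual two's-complement arithmetic right shift, which is what i64.shr_s specifies):

```cpp
#include <cassert>
#include <cstdint>

// Check that the shift pair behaves like a dedicated sign-extension:
// (x << 56) >> 56 (arithmetic shift) == extend8_s(x), and similarly
// for 16 and 32 bits.
int main() {
  uint64_t samples[] = {0, 1, 0x7F, 0x80, 0xFF, 0x1234567890ABCDEFull,
                        0xFFFFFFFFFFFFFFFFull};
  for (uint64_t x : samples) {
    assert(((int64_t)(x << 56) >> 56) == (int64_t)(int8_t)x);  // extend8_s
    assert(((int64_t)(x << 48) >> 48) == (int64_t)(int16_t)x); // extend16_s
    assert(((int64_t)(x << 32) >> 32) == (int64_t)(int32_t)x); // extend32_s
  }
}
```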
* [Optimize-Instructions] Simplify sign patterns like x < 0 ? -1 : 1 (#3756) | Max Graey | 2021-07-16 | 1 file | -0/+24
  i32(x) < 0 ? i32(-1) : i32(1) -> x >> 31 | 1
  i64(x) < 0 ? i64(-1) : i64(1) -> x >> 63 | 1
  This shrinks code size by 2 bytes.
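  A quick standalone check of the 32-bit form (not part of the PR), again assuming an arithmetic right shift:

```cpp
#include <cassert>
#include <cstdint>

// Check that (x >> 31) | 1 (arithmetic shift) computes the same
// "sign" value as x < 0 ? -1 : 1, matching wasm's i32.shr_s.
int main() {
  int32_t samples[] = {0, 1, -1, 42, -42, INT32_MIN, INT32_MAX};
  for (int32_t x : samples) {
    int32_t selectForm = x < 0 ? -1 : 1;
    int32_t shiftForm = (x >> 31) | 1; // sign bit smeared, then force the low bit
    assert(selectForm == shiftForm);
  }
}
```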
* [Wasm GC] Add experimental array.copy (#3911) | Alon Zakai | 2021-05-27 | 1 file | -0/+5
  Spec for it is here: https://docs.google.com/document/d/1DklC3qVuOdLHSXB5UXghM_syCh-4cMinQ50ICiXnK3Q/edit#
  Also reorder some things in wasm.h that were not in the canonical order (that has no effect, but it is confusing to read).
* OptimizeInstructions: Do not change unreachability in if/select changes (#3840) | Alon Zakai | 2021-04-23 | 1 file | -2/+29
  For example,

    (if (result i32)
      (local.get $x)
      (return
        (local.get $y)
      )
      (return
        (local.get $z)
      )
    )

  If we move the returns outside we'd become unreachable, but we should not make such type changes in this pass (they are handled by DCE and Vacuum).
  (found by the fuzzer)
* OptimizeInstructions: Handle EqZInt64 on an if/select arm, not just 32 (#3837) | Alon Zakai | 2021-04-22 | 1 file | -3/+11
* OptimizeInstructions: Fix/ignore eqz hoisting of if with unreachable arm (#3835) | Alon Zakai | 2021-04-22 | 1 file | -1/+5
  We tried to ignore unreachable code, but only checked the type of the entire node. But an arm might be unreachable, and after moving code around that requires more work to update the type. But such cases are best left to DCE anyhow, so just check for any unreachability and stop there.
* Generalize moving of identical code from if/select arms (#3833) | Alon Zakai | 2021-04-21 | 1 file | -19/+30
  Effects are fine in the moved code, if we are doing so on an if (which runs just one arm anyhow).
  Allow unreachable, which lets us hoist returns for example.
  Allow none, which lets us hoist drop and call for example. For this we also need to be careful with subtyping, as at least drop is polymorphic, so the child types may not have an LUB (see example in code).
  Adds a small ShallowEffectAnalyzer child of EffectAnalyzer that calls visit to just do a shallow analysis (instead of walk which walks the children).
* OptimizeInstructions: Move identical unary code out of if/select arms (#3828) | Alon Zakai | 2021-04-21 | 1 file | -9/+74
    (select
      (foo
        (X)
      )
      (foo
        (Y)
      )
      (condition)
    )
  =>
    (foo
      (select
        (X)
        (Y)
        (condition)
      )
    )

  To make this simpler, refactor optimizeTernary to be templated.
* Optimize if/select with one arm an EqZ and another a 0 or a 1 (#3822) | Alon Zakai | 2021-04-20 | 1 file | -0/+61
    (select
      (i32.eqz (X))
      (i32.const 0|1)
      (Y)
    )
  =>
    (i32.eqz
      (select
        (X)
        (i32.const 1|0)
        (Y)
      )
    )

  This is beneficial as the eqz may be folded into something on the outside. I see this pattern in real-world code, both in a GC benchmark (which is why I noticed it), and it shrinks code size by tiny amounts on the emscripten benchmark suite as well.
* [Wasm GC] Reorder ref.as_non_null with tee and cast (#3820) | Alon Zakai | 2021-04-19 | 1 file | -0/+51
  In both cases doing the ref.as_non_null last is beneficial as we have optimizations that can remove it based on where it is consumed.
* [Wasm GC] Optimize reference identity checks (#3814) | Alon Zakai | 2021-04-19 | 1 file | -0/+46
  * Note that ref.cast has a fallthrough value.
  * Optimize ref.eq on identical inputs.
* [Wasm GC] Optimize away unnecessary non-null assertions (#3800) | Alon Zakai | 2021-04-12 | 1 file | -0/+27
  ref.as_non_null is not needed if the value flows into a place that traps on null anyhow. We replace a trap on one instruction with a trap on another, but we allow such things (and even changing trap types, which does not happen here).
* [Wasm GC] Optimize RefCast + ignore-implicit-traps (#3748) | Alon Zakai | 2021-03-30 | 1 file | -0/+36
  If we are ignoring implicit traps, and if the cast is from a subtype to a supertype, then we ignore the possible RTT-related inconsistency and can just drop the cast.
  See #3636
* [Wasm GC] Optimize RefIs and RefAs when the type lets us (#3725) | Alon Zakai | 2021-03-30 | 1 file | -0/+93
  This is similar to the optimization of BrOn in #3719 and #3724. When the type tells us the kind of input we have, we can tell at compile time what result we'll get, like ref.is_func of something with type (ref func) will always return 1, etc.
  There are some code size and perf tradeoffs that should be looked into more and are marked as TODOs.
* OptimizeInstructions: bool(x) xor 1 ==> !bool(x) (#3741) | Alon Zakai | 2021-03-30 | 1 file | -0/+14
  This was noticed by samparker on LLVM: https://reviews.llvm.org/D99171
  This is apparently a pattern LLVM emits, and doing it there helps by 1-2% on the real-world Bullet Physics codebase. Seems worthwhile doing here as well.
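  The rewrite is easy to confirm for boolean inputs; a trivial standalone check (not from the commit):

```cpp
#include <cassert>
#include <cstdint>

// For a boolean value (0 or 1), xor with 1 is the same as logical not,
// which is what lets bool(x) xor 1 become !bool(x).
int main() {
  for (uint32_t x = 0; x <= 1; ++x) {
    assert((x ^ 1u) == (uint32_t)!x);
  }
}
```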
* [Wasm GC] Optimize array.set stored values (#3690) | Alon Zakai | 2021-03-16 | 1 file | -0/+7
  Same as we already do for struct.set.
* [Wasm GC] Optimize struct stores like stores to memory, ignore unneeded bits (#3680) | Alon Zakai | 2021-03-12 | 1 file | -23/+36
  When storing to an i8, we can ignore any higher bits, etc.
  Adds a getByteSize utility to Field to make this convenient.
* Refactor OptimizeInstructions to use visit* methods. NFC (#3678) | Alon Zakai | 2021-03-11 | 1 file | -507/+550
  Instead of a single big optimize() method we now use separate functions per instruction. This gives us smaller functions and less nesting in some cases, and avoids manually casting and checking etc.
  The reason this was not done originally is that this pass does repeated applications. That is, if optimize() changed something, it would run again on the result, perhaps further optimizing it. It did not need to run on the children, but just on the result itself, so it didn't just do another full walk, and so the simplest way was to just do a loop on optimize().
  To replace that, this PR modifies replaceCurrent(), which the methods now call to report that the current node can be replaced. There is some code in there now that keeps doing more processing while changes happen. It's not trivial code as it avoids recursion, but that slight complexity seems worthwhile in order to simplify the bulk of the (very large) pass.
* Make unreachable a subtype of everything (#3673) | Thomas Lively | 2021-03-11 | 1 file | -23/+19
  Since in principle an unreachable expression can be used in any position. An exception to this rule is in OptimizeInstructions, which avoids replacing concrete expressions with unreachable expressions so that it doesn't need to refinalize any expressions. Notably, Type::getLeastUpperBound was already treating unreachable as the bottom type.
* cleanup to allow binaryen to be built in more strict environments (#3566) | walkingeyerobot | 2021-02-16 | 1 file | -0/+1
* Basic EH instruction support for the new spec (#3487) | Heejin Ahn | 2021-01-15 | 1 file | -1/+3
  This updates `try`-`catch`-`catch_all` and `rethrow` instructions to match the new spec. `delegate` is not included. Now `Try` contains not a single `catchBody` expression but a vector of catch bodies and events.
  This updates most existing routines, optimizations, and tests modulo the interpreter and the CFG traversal. Because the interpreter has not been updated yet, the EH spec test is temporarily disabled in check.py. Also, because the CFG traversal for EH is not yet updated, several EH tests in `rse_all-features.wast`, which uses CFG traversal, are temporarily commented out. Also added a few more tests in existing EH test functions in test/passes.
  In the previous spec, `catch` was catching all exceptions, so it was assumed that anything a `try` body throws is caught by its `catch`, but now we can assume the same only if there is a `catch_all`. The newly added tests cover cases where there is a `catch_all` and cases where there are only `catch`es, separately.
* [OptimizeInstructions] Fix fuzz bug with shifts (#3376) | Alon Zakai | 2020-12-02 | 1 file | -8/+14
  The code there looks for a "sign-extend":

    (x << a) >> b

  where the right shift is signed. If a = b = 24 for example then that is a sign extend of an 8-bit value (it works by shifting the 8-bit value's sign bit to the position of the 32-bit value's sign bit, then shifting all the way back, which fills everything above 8 bits with the sign bit).
  The tricky thing is that in some cases we can handle a != b - but we forgot a place to check that. Specifically, a repeated sign-extend is not necessary, but if the outer one has extra shifts, we can't do it.
  This is annoyingly complex code, but for purposes of reviewing this PR, you can see (unless I messed up) that the only change is to ensure that when we look for a repeated sign extend, then we only optimize that case when there are no extra shifts. And a repeated sign-extend is obviously ok to remove:

    (((x << a) >> a) << a) >> a  =>  (x << a) >> a

  This is an ancient bug, showing how hard it can be to find certain patterns either by fuzzing or in the real world...
  Fixes #3362
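  The idempotence claim (that a repeated shift-based sign-extend can be dropped) can be spot-checked in isolation; this is an illustration only, not the pass's code, and it assumes the usual arithmetic right shift:

```cpp
#include <cassert>
#include <cstdint>

// A sign-extend built from shifts is idempotent: applying it twice
// gives the same result as applying it once, so the repeated form
// can be dropped.
int main() {
  int32_t samples[] = {0, 1, 0x7F, 0x80, 0xFF, 0x12345678, -1, INT32_MIN};
  for (int32_t x : samples) {
    int32_t once = (int32_t)((uint32_t)x << 24) >> 24;     // extend low 8 bits
    int32_t twice = (int32_t)((uint32_t)once << 24) >> 24; // repeated extend
    assert(once == twice);
  }
}
```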
* [OptimizeInstructions] Fix a fuzz bug with comparing signed and unsigned values (#3399) | Alon Zakai | 2020-12-01 | 1 file | -38/+41
* [TypedFunctionReferences] Enable call_ref in fuzzer, and fix minor misc fuzz bugs (#3401) | Alon Zakai | 2020-11-25 | 1 file | -1/+3
  * Count signatures in tuple locals.
  * Count nested signature types (confirming @aheejin was right, that was missing).
  * Inlining was using the wrong type.
  * OptimizeInstructions should return -1 for unhandled types, not error.
  * The fuzzer should check for ref types as well, not just typed function references, similar to what GC does.
  * The fuzzer now creates a function if it has no other option for creating a constant expression of a function type, then does a ref.func of that.
  * Handle unreachability in call_ref binary reading.
  * S-expression parsing fixes in more places, and add a tiny fuzzer for it.
  * Switch fuzzer test to just have the metrics, and not print all the fuzz output which changes a lot. Also fix noprint handling which only worked on binaries before.
  * Fix Properties::getLiteral() to use the specific function type properly, and make Literal's function constructor require that, to prevent future bugs.
  * Turn all input types into nullable types, for now.
* Minor code cleanups. NFC (#3364) | Alon Zakai | 2020-11-15 | 1 file | -1/+1
  The vacuum code can be deleted as it is handled by the default anyhow.
* Some refactorings in addition to #3338 (#3336) | Max Graey | 2020-11-12 | 1 file | -16/+10
  See discussion in #3303
* OptimizeInstructions: Fix regression from #3303 / #3275 (#3338) | Alon Zakai | 2020-11-12 | 1 file | -6/+5
    X - Y <= 0  =>  X <= Y

  That is true mathematically, but not in the case of an overflow, e.g. X=10, Y=0x8000000000000000. X - Y is a negative number, so X - Y <= 0 is true. But it is not true that X <= Y (as Y is negative, but X is not).
  See discussion in #3303 (comment)
  The actual regression was in #3275, but the fuzzer had an easier time finding it due to #3303
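  The counterexample can be reproduced in a few lines of standalone C++ (wrapping the subtraction the way i64.sub does); this is an illustration, not code from the fix:

```cpp
#include <cassert>
#include <cstdint>

// Concrete counterexample for the broken rewrite X - Y <= 0 => X <= Y:
// under wrap-around the subtraction overflows and the two sides disagree.
int main() {
  int64_t X = 10;
  int64_t Y = INT64_MIN; // 0x8000000000000000
  // Do the subtraction in uint64_t to model wasm's wrapping i64.sub.
  int64_t diff = (int64_t)((uint64_t)X - (uint64_t)Y);
  bool lhs = diff <= 0; // X - Y <= 0
  bool rhs = X <= Y;    // X <= Y
  assert(lhs == true);
  assert(rhs == false); // the rewrite would change the program's result
}
```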
* Optimize i32(x) % C_pot in boolean context (#3307) | Max Graey | 2020-11-10 | 1 file | -2/+17
    bool(i32(x) % C_pot) -> bool(i32(x) & (C_pot - 1))
    bool(i32(x) % min_s) -> bool(i32(x) & max_s)

  For all other situations we already do this for (i32|i64).rem_s
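  A standalone spot-check (not from the PR) that the remainder and the mask agree in a boolean context, including for negative values:

```cpp
#include <cassert>
#include <cstdint>

// In a boolean context, signed remainder by a power of two tests the
// same thing as masking by (power - 1): both are zero exactly when the
// low bits are zero. C++'s truncating % matches wasm's i32.rem_s here.
int main() {
  int32_t samples[] = {0, 1, 7, 8, -1, -7, -8, 12345, -12345,
                       INT32_MIN, INT32_MAX};
  for (int32_t x : samples) {
    assert((bool)(x % 8) == (bool)(x & 7));                 // C_pot case
    assert((bool)(x % INT32_MIN) == (bool)(x & INT32_MAX)); // min_s case
  }
}
```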
* Canonicalize subtraction with constant on the right to addition (#3321) | Max Graey | 2020-11-10 | 1 file | -59/+79
  Using addition in more places is better for gzip, and helps simplify the optimizer as well.
  Add a FinalOptimizer phase to do optimizations like our signed LEB tweaks, to reduce binary size in the rare case when we do want a subtraction.
* Optimize signed / unsigned relationals when RHS is min or max constant (#3314) | Max Graey | 2020-11-04 | 1 file | -7/+89
* Slight refactoring of handOptimize (#3305) | Max Graey | 2020-11-03 | 1 file | -15/+15
  Move the checks for most unoptimizable expression types out into visitExpression and simplify some other code.
* Optimize x * -1.0 in non-fastMath case (#3315) | Max Graey | 2020-11-03 | 1 file | -3/+12
  We can still make x * -1.0 cheaper for non-fastMath mode as:
    x * -1.0  ->  -0.0 - x
  Should at least help baseline compilers. Also it could enable further optimizations, e.g.:
    a + b * -1
    a + (-0.0 - b)
    (a - 0.0) - b
    a - b
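  A small standalone check (not part of the commit) that the two forms are bit-identical for ordinary values; NaN is excluded since wasm leaves NaN sign/payload nondeterministic, though both forms still yield a NaN:

```cpp
#include <cassert>
#include <cmath>
#include <cstring>

// Check that x * -1.0 and -0.0 - x produce identical bits for ordinary
// values, including signed zeros and infinities.
int main() {
  double samples[] = {0.0, -0.0, 1.5, -1.5, 1e300, -1e300,
                      INFINITY, -INFINITY};
  for (double x : samples) {
    double viaMul = x * -1.0;
    double viaSub = -0.0 - x;
    assert(std::memcmp(&viaMul, &viaSub, sizeof(double)) == 0);
  }
}
```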