path: root/src/passes/OptimizeInstructions.cpp
...
* OptimizeInstructions: Check for possible added-constant overflows (#5227) (Alon Zakai, 2022-12-20, 1 file, -14/+99)

Fix a regression from #5025: we subtract constants there, and we need to be aware that such subtraction can change a constant from negative to positive when the comparison is signed, as 0x80000000 - 1 = 0x7fffffff. 0x80000000 is a negative number when seen as signed, but the result is always positive after the subtraction.

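A minimal wat sketch of the hazard, using the commit's own constants (the local `$x` is hypothetical, not from the commit):

```wat
;; x + 1 >_s INT32_MIN: true for every x except x = INT32_MAX,
;; where the add wraps around
(i32.gt_s
  (i32.add (local.get $x) (i32.const 1))
  (i32.const -2147483648)  ;; 0x80000000
)
;; a naive fold to x >_s (0x80000000 - 1) = x >_s 0x7fffffff
;; would be false for every x: the constant flipped from the
;; most negative value to the most positive one
(i32.gt_s (local.get $x) (i32.const 2147483647))
```
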
* Update RefCast representation to drop extra HeapType (#5350) (Thomas Lively, 2022-12-20, 1 file, -2/+7)

The latest upstream version of ref.cast is parameterized with a target reference type, not just a heap type, because the nullability of the result is parameterizable. As a first step toward implementing these new, more flexible ref.cast instructions, change the internal representation of ref.cast to use the expression type as the cast target rather than storing a separate heap type field. For now, require that the encoded semantics match the previously allowed semantics, so that none of the optimization passes need to be updated.

* [Wasm GC] Optimize away null arms that would trap (#5358) (Alon Zakai, 2022-12-16, 1 file, -5/+74)

E.g.

(struct.get
  (select
    (ref.null ..)
    (something)
    (condition)
  )
)

If traps-never-happen then this can be

(drop (condition))
(struct.get
  (something)
)

That is, we can remove the arm that is null, as it would trap, but traps are assumed to not happen. Also fix a bug this uncovers on struct.set on a null type.

* Fix a fuzz bug with incremental unreachability in OptimizeInstructions (#5237) (Alon Zakai, 2022-11-09, 1 file, -1/+7)

OptimizeInstructions can in rare cases add unreachability. We propagate it out at the end, all at once. The fuzzer was smart enough to find a very special combination of code + passes that can hit an issue; see the testcase. As mentioned in the TODO, we should perhaps avoid adding unreachability in OptimizeInstructions at all. If this happens again, that might be worth the effort. But checking the type of the child, as in this PR, doesn't add much complexity to the code.

* [Wasm GC] Externalize/Internalize allow nulls (#5175) (Alon Zakai, 2022-10-21, 1 file, -0/+6)

These are encoded as RefAs operations, and we have optimizations that assume those trap on null, but Externalize/Internalize do not. Skip them in those optimizations to avoid a later error about the type being incorrect.

* Implement bottom heap types (#5115) (Thomas Lively, 2022-10-07, 1 file, -3/+34)

These types, `none`, `nofunc`, and `noextern`, are uninhabited, so references to them can only possibly be null. To simplify the IR and increase type precision, introduce new invariants that all `ref.null` instructions must be typed with one of these new bottom types and that `Literals` have a bottom type iff they represent null values.

These new invariants require several additional changes. First, it is now possible that the `ref` or `target` child of a `StructGet`, `StructSet`, `ArrayGet`, `ArraySet`, or `CallRef` instruction has a bottom reference type, so it is not possible to determine what heap type annotation to emit in the binary or text formats. (The bottom types are not valid type annotations since they do not have indices in the type section.) To fix that problem, update the printer and binary emitter to emit unreachables instead of the instruction with the undetermined type annotation. This is a valid transformation because the only possible value that could flow into those instructions in that case is null, and all of those instructions trap on nulls.

That fix uncovered a latent bug in the binary parser in which new unreachables within unreachable code were handled incorrectly. This bug was not previously found by the fuzzer because we generally stop emitting code once we encounter an instruction with type `unreachable`. Now, however, it is possible to emit an `unreachable` for instructions that do not have type `unreachable` (but are known to trap at runtime), so we will continue emitting code. See the new test/lit/parse-double-unreachable.wast for details.

Update other miscellaneous code that creates `RefNull` expressions and null `Literals` to maintain the new invariants as well.

* Refactor standardizeNaN (#5064) (Max Graey, 2022-10-04, 1 file, -5/+1)

Change `standardizeNaN` to take a `Literal` to reduce usage verbosity.

* Refactor interaction between Pass and PassRunner (#5093) (Thomas Lively, 2022-09-30, 1 file, -1/+3)

Previously only WalkerPasses had access to the `getPassRunner` and `getPassOptions` methods. Move those methods to `Pass` so all passes can use them. As a result, the `PassRunner` passed to `Pass::run` and `Pass::runOnFunction` is no longer necessary, so remove it.

Also update `Pass::create` to return a unique_ptr, which is more efficient than having it return a raw pointer only to have the `PassRunner` wrap that raw pointer in a `unique_ptr`.

Delete the unused template `PassRunner::getLast()`, which looks like it was intended to enable retrieving previous analyses and has been in the code base since 2015 but is not implemented anywhere.

* [OptimizeInstruction] Prevent reordering for rule in #5034 (#5066) (Max Graey, 2022-09-21, 1 file, -2/+3)

* [OptimizeInstructions] Simplify add / sub with negative on LHS or RHS for floating points (#5034) (Max Graey, 2022-09-20, 1 file, -0/+25)

```
(-x) + y  ->  y - x
x + (-y)  ->  x - y
x - (-y)  ->  x + y
```

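For instance, the first rule in wat form (an illustrative sketch, not from the commit; `$x` and `$y` are hypothetical f64 locals):

```wat
(f64.add
  (f64.neg (local.get $x))
  (local.get $y)
)
;; IEEE subtraction is addition of the negation, so this is exactly:
(f64.sub
  (local.get $y)
  (local.get $x)
)
```
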
* [OptimizeInstructions] More canonicalizations for floating points (#5033) (Max Graey, 2022-09-15, 1 file, -14/+5)

```
x - C      ->  x + (-C)
min(C, x)  ->  min(x, C)
max(C, x)  ->  max(x, C)
```

Also remove redundant rules.

* Move relational-optimizing code to optimizeRelational [NFC] (#5036) (Alon Zakai, 2022-09-13, 1 file, -33/+35)

This just moves the code from #5025 to the right function, which I did not realize existed. optimizeRelational is where we optimize binary operations that do comparisons, and it's nice to put all that code together. Avoids repeated checks of isRelational() in separate places.

* OptimizeInstructions: Use min/max bits in comparisons (#5035) (Alon Zakai, 2022-09-13, 1 file, -6/+84)

When we see e.g. x < y and the possible bits of x show that it must be less than y, we can infer a result. Helps #5010. As mentioned there, this is one of the top superoptimizer findings. On j2wasm it ends up removing a few hundred binary operations, for example.

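A minimal sketch of the idea (illustrative only; `$x` is a hypothetical i32 local):

```wat
;; (x & 7) can be at most 7, so the unsigned comparison
;; against 16 is always true
(i32.lt_u
  (i32.and (local.get $x) (i32.const 7))
  (i32.const 16)
)
;; => (i32.const 1)
```
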
* [OptimizeInstructions] Simplify floating point ops with NaN on right side (#4985) (Max Graey, 2022-09-12, 1 file, -0/+34)

```
x + nan      ->  nan'
x - nan      ->  nan'
x * nan      ->  nan'
x / nan      ->  nan'
min(x, nan)  ->  nan'
max(x, nan)  ->  nan'
```

where nan' is the canonicalized NaN of the rhs.

```
x != nan  ->  1
x == nan  ->  0
x >= nan  ->  0
x <= nan  ->  0
x > nan   ->  0
x < nan   ->  0
```

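For example, one comparison rule in wat form (a sketch, not from the commit; `$x` hypothetical):

```wat
;; no value compares less than NaN, so the result is known
(f64.lt (local.get $x) (f64.const nan))
;; => (i32.const 0)
```
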
* OptimizeInstructions: Optimize comparisons with an added offset (#5025) (Alon Zakai, 2022-09-09, 1 file, -19/+81)

E.g.

x + C1 > C2  ==>  x > (C2 - C1)

We do need to be careful of overflows in either the add on the left or the proposed subtract on the right. In the latter case, we can at least do

x + C1 > C2  ==>  x + (C1 - C2) > 0

Helps #5008 (but more patterns remain). Found by the superoptimizer #4994. This was the top suggestion for Java and Dart.

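An illustrative sketch with hypothetical locals; here the mask guarantees the add cannot overflow, which is the kind of precondition the pass must establish before folding:

```wat
;; (x & 255) + 10 can never overflow, so the offset can move
;; to the constant side
(i32.gt_s
  (i32.add
    (i32.and (local.get $x) (i32.const 255))
    (i32.const 10)
  )
  (i32.const 20)
)
;; =>
(i32.gt_s
  (i32.and (local.get $x) (i32.const 255))
  (i32.const 10)
)
```
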
* Switch to i32 operations when heading to a wrap anyhow (#5022) (Alon Zakai, 2022-09-07, 1 file, -8/+119)

E.g. if we just do addition etc., then any higher bits will be wrapped out anyhow:

int32_t(int64_t(x) + int64_t(10))  =>  x + int32_t(10)

Found by the superoptimizer #4994. This is by far the most promising suggestion it had. Interestingly, it mainly helps Go, where it removes 20% of all Unary operations (the extends and wraps), and Rust, where it removes 3%. Helps #5004. This handles the common cases I see in the superoptimizer output, but there are more that could be handled.

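The same pattern in wat terms (a sketch with a hypothetical local):

```wat
(i32.wrap_i64
  (i64.add
    (i64.extend_i32_s (local.get $x))
    (i64.const 10)
  )
)
;; the low 32 bits of the add depend only on the low 32 bits
;; of its operands, so:
(i32.add (local.get $x) (i32.const 10))
```
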
* [OptimizeInstructions] Simplify two binary expressions with asymmetric shifts and same constant (#4996) (Max Graey, 2022-09-06, 1 file, -0/+49)

```
(x >> C) << C   ->  x & -(1 << C)
(x >>> C) << C  ->  x & -(1 << C)
(x << C) >>> C  ->  x & (-1 >>> C)
```

`(x << C) >> C` (with a signed right shift) is not handled this way.

Found by the superoptimizer #4994. Fixes #5012.

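A concrete instance with C = 3 (illustrative; `$x` hypothetical):

```wat
(i32.shl
  (i32.shr_u (local.get $x) (i32.const 3))
  (i32.const 3)
)
;; -(1 << 3) = -8, i.e. a mask clearing the low 3 bits
(i32.and (local.get $x) (i32.const -8))
```
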
* OptimizeInstructions: Select => and/or in more cases (#4154) (Alon Zakai, 2022-09-01, 1 file, -1/+25)

x ? 0 : y  ==>  z & y  where z = !x
x ? y : 1  ==>  z | y  where z = !x

Only do this when we have z = !x, that is, when we can invert x without adding an actual eqz (which would add work).

To do this, canonicalize selects to prefer to flip the arms, when possible, if it would move a constant to a location that the existing optimizations already turn into an and/or. That is,

x >= 5 ? 0 : y != 42

would be canonicalized into

x < 5 ? y != 42 : 0

and existing opts turn that into

(x < 5) & (y != 42)

The canonicalization does not always help this optimization, as we need the values to be boolean to do this, but canonicalizing is still nice to get more regular code, which might compress slightly better.

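The same example in wat form (a sketch; locals hypothetical; note that both select arms here are side-effect-free, so evaluating both is safe):

```wat
(select
  (i32.const 0)
  (i32.ne (local.get $y) (i32.const 42))
  (i32.ge_s (local.get $x) (i32.const 5))
)
;; =>
(i32.and
  (i32.lt_s (local.get $x) (i32.const 5))
  (i32.ne (local.get $y) (i32.const 42))
)
```
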
* [Wasm GC] Support non-nullable locals in the "1a" form (#4959) (Alon Zakai, 2022-08-31, 1 file, -3/+0)

An overview of this is in the README in the diff here (conveniently, it is near the top of the diff). Basically, we fix up nn locals after each pass, by default. This keeps things easy to reason about - what validates is what is valid wasm - but there are some minor nuances as mentioned there; in particular, we ignore nameless blocks (which are commonly added by various passes; ignoring them means we can keep more locals non-nullable).

The key addition here is LocalStructuralDominance, which checks which local indexes have the "structural dominance" property of 1a, that is, that each get has a set in its block or an outer block that precedes it. I optimized that function quite a lot to reduce the overhead of running that logic after each pass. The overhead is something like 2% on J2Wasm and 0% on Dart (0% because in this mode we shrink code size, so there is less work actually, and it balances out).

Since we run fixups after each pass, this PR removes logic to manually call the fixup code from various places we used to call it (like eh-utils and various passes). Various passes are now marked as requiresNonNullableLocalFixups => false. That lets us skip running the fixups after them, which we normally do automatically. This helps avoid overhead. Most passes still need the fixups, though - any pass that adds a local, or a named block, or moves code around likely does.

This removes a hack in SimplifyLocals that is no longer needed. Before, we worked to avoid moving a set into a try, as it might not validate. Now, we just do it and let fixups happen automatically if they need to: in the common code they probably don't, so the extra complexity seems not worth it.

Also removes a hack from StackIR. That hack tried to avoid the roundtrip adding a nondefaultable local. But we have the logic to fix that up now, and opts will likely keep it non-nullable as well.

Various tests end up updated here because now a local can be non-nullable - previous fixups are no longer needed.

Note that this doesn't remove the gc-nn-locals feature. That has been useful for testing, and may still be useful in the future - it basically just allows nn locals in all positions (that can't read the null default value at the entry). We can consider removing it separately.

Fixes #4824

* [OptimizeInstruction] Reorder rules in optimizeSelect (#4984) (Max Graey, 2022-08-29, 1 file, -32/+32)

To unblock some optimizations. For example, this:

```wat
(select
  (i32.eqz (local.get $x))
  (i32.const 0)
  (i32.eqz (local.get $y))
)
```

was previously optimized as:

```wat
(i32.eqz
  (select
    (i32.const 1)
    (local.get $x)
    (local.get $y)
  )
)
```

because `optimizeSelect` applied `!x ? !y : 0 -> x ? 0 : !y` and then `!(x ? 1 : y)`, blocking the next rules, which could have folded this to `or` or `and`. After this PR the same example optimizes better:

```wat
(i32.eqz
  (i32.or
    (local.get $x)
    (local.get $y)
  )
)
```

* [OptimizeInstructions] Canonicalize relational ops near min/max values (#4282) (Max Graey, 2022-08-26, 1 file, -0/+61)

A continuation of #4272.

```
(signed)x < s_min + 1    ==>  x == s_min
(signed)x >= s_min + 1   ==>  x != s_min
(signed)x > s_max - 1    ==>  x == s_max
(signed)x <= s_max - 1   ==>  x != s_max
(unsigned)x <= u_max - 1 ==>  x != u_max
(unsigned)x > u_max - 1  ==>  x == u_max
```

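Taking the first rule for i32, where s_min = -2147483648 (illustrative; `$x` hypothetical):

```wat
;; only INT32_MIN is less than INT32_MIN + 1
(i32.lt_s (local.get $x) (i32.const -2147483647))
;; =>
(i32.eq (local.get $x) (i32.const -2147483648))
```
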
* [TNH Fuzzing] Fix some equality checks that ignored effects (#4951) (Alon Zakai, 2022-08-24, 1 file, -8/+20)

Fuzzing with TrapsNeverHappen found a bug, and then reading similar code I found another, where we check structural equality but ignore effects. Some things look equal but may have different values at runtime:

(foo
  (call $x)
  (call $y)
)

The arms are identical structurally, but due to side effects they may not be identical in value.

* [OptimizeInstructions] Simplify rounding or conversions after int to float casts (#4720) (Max Graey, 2022-08-18, 1 file, -0/+67)

i32 -> f64 -> i32 round-tripping optimizations:

```rust
i32(f64(i32(x))) -> x  // i32.trunc(_sat)_f64_s(f64.convert_i32_s(x)) -> x
u32(f64(u32(x))) -> x  // i32.trunc(_sat)_f64_u(f64.convert_i32_u(x)) -> x
// note: asymmetric signed/unsigned or unsigned/signed cases
// can't be simplified in a similar way
```

and rounding after integer to float casts:

```rust
ceil(float(int(x)))    -> float(int(x))
floor(float(int(x)))   -> float(int(x))
trunc(float(int(x)))   -> float(int(x))
nearest(float(int(x))) -> float(int(x))
```

where `float = f32 | f64`, `int = i32 | i64 | u32 | u64`

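In wat terms (a sketch; `$x` hypothetical): every i32 is exactly representable as an f64, so the round trip is the identity and cannot trap:

```wat
(i32.trunc_f64_s
  (f64.convert_i32_s (local.get $x))
)
;; => (local.get $x)
```
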
* Multi-Memories Support in IR (#4811) (Ashley Nelson, 2022-08-17, 1 file, -35/+65)

This PR removes the single-memory restriction in IR, adding support for a single module to reference multiple memories. To support this change, a new memory name field was added to 13 memory instructions in order to identify the memory for the instruction.

It is a goal of this PR to maintain backwards compatibility with existing text and binary wasm modules, so memory indexes remain optional for memory instructions. Similarly, the JS API makes assumptions about which memory is intended when only one memory is present in the module. Another goal of this PR is that existing tests' behavior be unaffected. That said, tests must now explicitly define a memory before invoking memory instructions or exporting a memory, and memory names are now printed for each memory instruction in the text format.

There remain quite a few places where a hardcoded reference to the first memory persists (memory flattening, for example, will return early if more than one memory is present in the module). Many of these call sites, particularly within passes, will require us to rethink how the optimization works in a multi-memories world. Other call sites may necessitate more invasive code restructuring to fully convert away from relying on a globally available, single memory pointer.

* [Optimize Instructions] Fold eqz(eqz(x)) to not-equal of zero (#4855) (Max Graey, 2022-08-08, 1 file, -4/+17)

```
eqz(eqz(i32(x))) -> i32(x) != 0
eqz(eqz(i64(x))) -> i64(x) != 0
```

Only when shrinkLevel == 0 (prefer speed over binary size).

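In wat (illustrative; `$x` hypothetical); the `ne` form trades a slightly larger binary encoding for one fewer dependent operation, which is why it only applies at shrinkLevel == 0:

```wat
(i32.eqz (i32.eqz (local.get $x)))
;; =>
(i32.ne (local.get $x) (i32.const 0))
```
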
* Remove RTTs (#4848) (Thomas Lively, 2022-08-05, 1 file, -114/+71)

RTTs were removed from the GC spec, and if they are added back in the future, they will be heap types rather than value types as in our implementation. Updating our implementation to have RTTs be heap types would have been more work than deleting them, for questionable benefit, since we don't know how long it will be before they are specced again.

* [Optimize Instructions] Refactor squared rules (#4840) (Max Graey, 2022-08-02, 1 file, -51/+127)

+ Move these rules to a separate function;
+ Refactor them to use matches;
+ Add comments;
+ Handle rotational shifts as well;
+ Handle overflows for `<<`, `>>`, `>>>` shifts;
+ Add mixed rotate rules:

```rust
rotl(rotr(x, C1), C2) => rotr(x, C1 - C2)
rotr(rotl(x, C1), C2) => rotl(x, C1 - C2)
```

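For the mixed rotate rule with C1 = 5 and C2 = 3 (illustrative; `$x` hypothetical):

```wat
(i32.rotl
  (i32.rotr (local.get $x) (i32.const 5))
  (i32.const 3)
)
;; rotating right by 5 then left by 3 is a net rotate right by 2
(i32.rotr (local.get $x) (i32.const 2))
```
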
* [OptimizeInstructions] Add folding for mixed left shift and mul with constants on RHS (#4808) (Max Graey, 2022-07-26, 1 file, -0/+17)

```
(x * C1) << C2  ->  x * (C1 << C2)
(x << C1) * C2  ->  x * (C2 << C1)
```

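For example, with C1 = 5 and C2 = 2 (illustrative; `$x` hypothetical):

```wat
(i32.shl
  (i32.mul (local.get $x) (i32.const 5))
  (i32.const 2)
)
;; (x * 5) << 2  ==  x * (5 << 2)  ==  x * 20  (mod 2^32)
(i32.mul (local.get $x) (i32.const 20))
```
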
* Remove basic reference types (#4802) (Thomas Lively, 2022-07-20, 1 file, -3/+5)

Basic reference types like `Type::funcref`, `Type::anyref`, etc. made it easy to accidentally forget to handle reference types with the same basic HeapTypes but the opposite nullability. In principle there is nothing special about the types with shorthands, except in the binary and text formats. Removing these shorthands from the internal type representation by removing all basic reference types makes some code more complicated locally, but simplifies code globally and encourages properly handling both nullable and non-nullable reference types.

* [Wasm GC] Check if ref.eq inputs can possibly be the same (#4780) (Alon Zakai, 2022-07-14, 1 file, -3/+27)

For them to be the same, we must have a value that can appear on both sides. If the heap types disallow that, then only null is possible, and if that is impossible as well, then the result must be 0.

* [Wasm GC] RefIs / RefEq / RefTest return a boolean (#4786) (Alon Zakai, 2022-07-08, 1 file, -0/+1)

This marks all reference operations that return 0/1 as doing so. This allows various bitwise operations to be optimized on them.

This also marks StringEq as a boolean, though we can't test that fully yet as Strings support is a work in progress (no interpreter or other support yet).

As a driveby, this moves emitsBoolean to its own file and uses it in getMaxBits to avoid redundancy (the redundant code paths now have a WASM_UNREACHABLE).

* [NFC] Refactor and clarify conditions for removing casts (#4754) (Alon Zakai, 2022-06-29, 1 file, -43/+115)

This just moves code around and adds comments.

* [Wasm GC] OptimizeInstructions: Optimize ref.eq on equal inputs with a tee (#4749) (Alon Zakai, 2022-06-24, 1 file, -3/+10)

(ref.eq
  (local.tee $x (..))
  (local.get $x)
)

That will definitely return 1. Before this PR, the side effects of the tee stopped us from optimizing.

* [WasmGC] OptimizeInstructions: Improve RefIs cast ordering (#4752) (Alon Zakai, 2022-06-24, 1 file, -4/+18)

#4748 regressed us in some cases, because it removed casts first:

(ref.is_func (ref.as_func (local.get $anyref)))

If the cast is removed first, and the local has no useful type info, then we'd have removed the cast but could not remove the ref.is. But the ref.is could be optimized to 1, as the value must be a func - the type info proves it, thanks to the cast. To avoid this, remove casts after everything else.

* [Wasm GC] [TNH] OptimizeInstructions: remove casts leading to comparisons (#4748) (Alon Zakai, 2022-06-24, 1 file, -0/+38)

Comparing references does not depend on the cast, so if we are ignoring traps in traps-never-happen mode then we can remove them.

* [Optimize Instructions] Simplify sign extensions (Part 2) (#4382) (Max Graey, 2022-06-07, 1 file, -0/+16)

Similar to #4004 but for 32-bit integers:

```
i32(x) << 24 >> 24  ==>  i32.extend8_s(x)
i32(x) << 16 >> 16  ==>  i32.extend16_s(x)
```

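The first rule in wat form (illustrative; `$x` hypothetical):

```wat
(i32.shr_s
  (i32.shl (local.get $x) (i32.const 24))
  (i32.const 24)
)
;; one instruction, given the sign-extension feature
(i32.extend8_s (local.get $x))
```
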
* OptimizeInstructions: Turn call_ref of a select into an if over two direct calls (#4660) (Alon Zakai, 2022-05-27, 1 file, -0/+18)

This extends the existing call_indirect code to do the same for call_ref, basically. The shared code is added to a new helper utility.

* OptimizeInstructions: Refinalize after a cast removal (#4611) (Alon Zakai, 2022-04-25, 1 file, -0/+26)

Casts can replace a type with a subtype, which normally has no downsides, but in a corner case of struct types it can lead to us needing to refinalize higher up too; see details in the comment. We have avoided any Refinalize calls in OptimizeInstructions, but the case handled here requires it, sadly. I considered moving it to another pass, but this is a peephole optimization, so there isn't really a better place.

* [OptimizeInstructions] Combine some relational ops joined by Or/And (Part 7-8) (#4399) (Max Graey, 2022-01-26, 1 file, -12/+63)

Final part of #4265.

```
(i32(x) >= 0) & (i32(y) >= 0)    ==>  i32(x | y) >= 0
(i64(x) >= 0) & (i64(y) >= 0)    ==>  i64(x | y) >= 0
(i32(x) == -1) & (i32(y) == -1)  ==>  i32(x & y) == -1
(i64(x) == -1) & (i64(y) == -1)  ==>  i64(x & y) == -1
```

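The first rule in wat form (illustrative; `$x`/`$y` hypothetical): both values are non-negative iff neither sign bit is set, iff the sign bit of their `or` is clear:

```wat
(i32.and
  (i32.ge_s (local.get $x) (i32.const 0))
  (i32.ge_s (local.get $y) (i32.const 0))
)
;; =>
(i32.ge_s
  (i32.or (local.get $x) (local.get $y))
  (i32.const 0)
)
```
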
* [OptimizeInstructions] Combine some relational ops joined by Or/And (Part 5-6) (#4372) (Max Graey, 2022-01-20, 1 file, -6/+59)

```
(i32(x) >= 0) | (i32(y) >= 0)    ==>  i32(x & y) >= 0
(i64(x) >= 0) | (i64(y) >= 0)    ==>  i64(x & y) >= 0
(i32(x) != -1) | (i32(y) != -1)  ==>  i32(x & y) != -1
(i64(x) != -1) | (i64(y) != -1)  ==>  i64(x & y) != -1
```

* Revert "[OptimizeInstructions] Optimize zero sized bulk memory ops even ↵Thomas Lively2022-01-141-21/+5
| | | | | without "ignoreImplicitTraps" (#4295)" (#4459) This reverts commit 5cf3521708cfada341285414df2dc7366d7e5454.
* [OptimizeInstructions] Optimize zero sized bulk memory ops even without "ignoreImplicitTraps" (#4295) (Max Graey, 2022-01-12, 1 file, -5/+21)

* [EH][GC] Fix nested pop after removing ref.cast (#4407) (Heejin Ahn, 2021-12-28, 1 file, -0/+4)

`ref.cast` can be statically removed when the ref's type is a subtype of the intended RTT type and either of `--ignore-implicit-traps` or `--traps-never-happen` is given: https://github.com/WebAssembly/binaryen/blob/083ab9842ec3d4ca278c95e1a33112ae7cd4d9e5/src/passes/OptimizeInstructions.cpp#L1603-L1624

Some more context: https://github.com/WebAssembly/binaryen/pull/4097#discussion_r694456784

But this can create a block in which a `pop` is nested, which makes the `catch` invalid. The test in this PR is the same as the example given by @kripken in #4237. This calls the fixup function `EHUtils::handleBlockNestedPops` at the end of the pass to fix this. Also, because this pass creates a lot of blocks in other patterns, I think it is possible there are other patterns that cause this kind of `pop` nesting.

* [OptimizeInstructions] Combine some relational ops joined by Or/And (Part 4) (#4339) (Max Graey, 2021-12-14, 1 file, -4/+53)

```
(i32(x) < 0) & (i32(y) < 0)  ==>  i32(x & y) < 0
(i64(x) < 0) & (i64(y) < 0)  ==>  i64(x & y) < 0
```

* [OptimizeInstructions] Combine some relational ops joined by Or/And (Part 3) (#4338) (Max Graey, 2021-12-04, 1 file, -9/+47)

```
(i32(x) < 0) | (i32(y) < 0)    ==>  i32(x | y) < 0
(i32(x) != 0) | (i32(y) != 0)  ==>  i32(x | y) != 0
```

Likewise for i64.

* [NFC] Avoid some unnecessary copies of PassOptions (#4361) (Alon Zakai, 2021-12-01, 1 file, -5/+5)

PassOptions is a fairly large structure, and it even includes a std::map. I also have plans to add further fields there that will make it even larger. Before doing that, I noticed that in some places we copy it instead of consistently taking it by reference, which this PR fixes.

* Change from storing Signature to HeapType on CallIndirect (#4352) (Thomas Lively, 2021-11-22, 1 file, -1/+1)

With nominal function types, this change makes it so that we preserve the identity of the function type used with call_indirect instructions rather than recreating a function heap type, which may or may not be the same as the originally parsed heap type, from the function signature during module writing. This will simplify the type system implementation by removing the need to store a "canonical" nominal heap type for each unique signature. We previously depended on those canonical types to avoid creating multiple duplicate function types during module writing, but now we aren't creating any new function types at all.

* [OptimizeInstructions] Combine some relational ops joined by Or/And (Part 2) (#4336) (Max Graey, 2021-11-16, 1 file, -5/+24)

```
(i32(x) != 0) | (i32(y) != 0)  ==>  i32(x | y) != 0
(i64(x) != 0) | (i64(y) != 0)  ==>  i64(x | y) != 0
```

* [OptimizeInstructions] Combine some relational ops joined by Or/And (Part 1) (#4333) (Max Graey, 2021-11-16, 1 file, -8/+34)

```
(i32(x) == 0) & (i32(y) == 0)  ==>  i32(x | y) == 0
(i64(x) == 0) & (i64(y) == 0)  ==>  i64(x | y) == 0
```

* [OptimizeInstructions] Combine extend into i64 and 32-bit load operations (#4307) (Max Graey, 2021-11-12, 1 file, -0/+40)

```
i64.extend_i32_u(i32.load8_u(x))   ->  i64.load8_u(x)
i64.extend_i32_u(i32.load16_u(x))  ->  i64.load16_u(x)
i64.extend_i32_s(i32.load8_u(x))   ->  i64.load8_u(x)
i64.extend_i32_s(i32.load16_u(x))  ->  i64.load16_u(x)
i64.extend_i32_s(i32.load8_s(x))   ->  i64.load8_s(x)
i64.extend_i32_s(i32.load16_s(x))  ->  i64.load16_s(x)
i64.extend_i32_u(i32.load(x))      ->  i64.load32_u(x)
i64.extend_i32_s(i32.load(x))      ->  i64.load32_s(x)
```

These don't apply:

```
i64.extend_i32_u(i32.load8_s(x))     ->  skip
i64.extend_i32_u(i32.load16_s(x))    ->  skip
i64.extend_i32_s(i32.atomic.load(x)) ->  skip
```

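The first rule in wat form (a sketch; assumes a module with a memory, and `$ptr` is a hypothetical local):

```wat
(i64.extend_i32_u
  (i32.load8_u (local.get $ptr))
)
;; the byte is already zero-extended, so load it as i64 directly
(i64.load8_u (local.get $ptr))
```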