summaryrefslogtreecommitdiff
path: root/src/passes/Precompute.cpp
Commit message (Collapse)AuthorAgeFilesLines
* LocalCSE rewrite (#4079)Alon Zakai2021-08-171-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Technically this is not a new pass, but it is a rewrite almost from scratch. Local Common Subexpression Elimination looks for repeated patterns, stuff like this: x = (a + b) + c y = a + b => temp = a + b x = temp + c y = temp The old pass worked on flat IR, which is inefficient, and was overly complicated because of that. The new pass uses a new algorithm that I think is pretty simple, see the detailed comment at the top. This keeps the pass enabled only in -O4, like before - right after flattening the IR. That is to make this as minimal a change as possible. Followups will enable the pass in the main pipeline, that is, we will finally be able to run it by default. (Note that to make the pass work well after flatten, an extra simplify-locals is added - the old pass used to do part of simplify-locals internally, which was one source of complexity. Even so, some of the -O4 tests have changes, due to minor factors - they are just minor orderings etc., which can be seen by inspecting the outputs before and after using e.g. --metrics) This plus some followup work leads to large wins on wasm GC output. On j2cl there is a common pattern of repeated struct.gets, so common that this pass removes 85% of all struct.gets, which makes the total binary 15% smaller. However, on LLVM-emitted code the benefit is minor, less than 1%.
* [Wasm GC] Handle uses of default values in LocalSubtyping (#4024)Alon Zakai2021-07-281-2/+4
| | | | | | It is ok to use the default value of a reference even if we refine the type, as it would be a more specifically-typed null, and all nulls compare the same. However, if the default is used then we *cannot* alter the type to be non-nullable, as then we'd use a null where that is not allowed.
* [Wasm GC] Fix Heap2Local + non-nullable locals (#4017)Alon Zakai2021-07-231-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | Given (local $x (ref $foo)) (local.set $x ..) (.. (local.get $x)) If we remove the local.set but not the get, then we end up with (local $x (ref $foo)) (.. (local.get $x)) It looks like the local.get reads the initial value of a non-nullable local, which is not allowed. In practice, this would crash in precompute-propagate which would try to propagate the initial value to the get. Add an assertion there with a clear message, as until we have full validation of non-nullable locals (and the spec for that is in flux), that pass is where bugs will end up being noticed. To fix this, replace the get as well. We can replace it with a null for simplicity; it will never be used anyhow. This also uncovered a small bug with reached not containing all the things we reached - it was missing local.gets.
* Revert "Partially fix Precompute on SIMD (#3983)" (#4002)Alon Zakai2021-07-191-9/+9
| | | | | | | | | | | | | This reverts commit b68691e826a46d1b03b27c552b1f5b7f51f92665. Instead of applying the workaround to avoid precomputing SIMD in more places, which prevents things we could optimize before, we should probably focus on making the workaround not needed - that is, implement full SIMD support in the intepreter (the reason for that PR), and the other TODO in the comment here, // Until engines implement v128.const and we have SIMD-aware optimizations // that can break large v128.const instructions into smaller consts and // splats, do not try to precompute v128 expressions.
* Partially fix Precompute on SIMD (#3983)Alon Zakai2021-07-131-9/+9
| | | We had the logic in only one place.
* [Wasm GC] Add experimental array.copy (#3911)Alon Zakai2021-05-271-0/+1
| | | | | | | | Spec for it is here: https://docs.google.com/document/d/1DklC3qVuOdLHSXB5UXghM_syCh-4cMinQ50ICiXnK3Q/edit# Also reorder some things in wasm.h that were not in the canonical order (that has no effect, but it is confusing to read).
* [Wasm GC] Fix precomputing of incompatible fallthrough values (#3875)Alon Zakai2021-05-101-3/+21
| | | | | | | | | | | | | | | | | | | | | | Precompute not only computes values, but looks at the fallthrough, (local.set 0 (block ..stuff we can ignore.. ;; the fallthrough we care about - if a value is set to local 0, it is this (i32.const 10) ) ) Normally that is fine, but the fuzzer found a case where it is not: RefCast may return a different type than the fallthrough, even an incompatible type if we try to do something bad like cast a function to a struct. As we may then propagate the value to a place that expects the proper type, this can cause an error. To fix this, check if the precomputed value is a proper subtype. If it is not, then do not look through into the fallthrough, but compute the entire thing. (In the case of a bad RefCast of a func to a struct, it would then indicate a trap happens, and we would not precompute the value.)
* [Wasm GC] Fix precompute on GC data (#3810)Alon Zakai2021-04-151-13/+41
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This fixes precomputation on GC after #3803 was too optimistic. The issue is subtle. Precompute will repeatedly evaluate expressions and propagate their values, flowing them around, and it ignores side effects when doing so. For example: (block ..side effect.. (i32.const 1) ) When we evaluate that we see there are side effects, but regardless of them we know the value flowing out is 1. So we can propagate that value, if it is assigned to a local and read elsewhere. This is not valid for GC because struct.new and array.new have a "side effect" that is noticeable in the result. Each time we call struct.new we get a new struct with a new address, which ref.eq can distinguish. So when this pass evaluates the same thing multiple times it will get a different result. Also, we can't precompute a struct.get even if we know the struct, not unless we know the reference has not escaped (where a call could modify it). To avoid all that, do not precompute references, aside from the trivially safe ones like nulls and function references (simple constants that are the same each time we evaluate the expression emitting them). precomputeExpression() had a minor bug which this fixes. It checked the type of the expression to see if we can create a constant for it, but really it should check the value - since (separate from this PR) we have no way to emit a "constant" for a struct etc. Also that only matters if replaceExpression is true, that is, if we are replacing with a constant; if we just want the value internally, we have no limit on that. Also add Literal support for comparing GC refs, which is used by ref.eq. Without that tiny fix the tests here crash. This adds a bunch of tests, many for corner cases that we don't handle (since the PR makes us not propagate GC references). But they should be helpful if/when we do, to avoid the mistakes in #3803
* [Wasm GC] Full precompute support for GC (#3803)Alon Zakai2021-04-131-6/+8
| | | | | | | | | | | | The precompute pass ignored all reference types, but that was overly pessimistic: we can precompute some of them, namely a null and a reference to a function are fully precomputable, etc. To allow that to work, add missing integration in getFallthrough as well. With this, we can precompute quite a lot of field accesses in the existing -Oz testcase, as can be seen from the output. That testcase runs --fuzz-exec so it prints out all those logged values, proving they have not changed.
* [GC] Add br_on_cast (#3451)Alon Zakai2020-12-171-8/+35
| | | | | | | | | | | | | | | | The tricky part here, as pointed out by aheejin in my previous attempt, is that we need to know the type of the value we send if the branch is taken. We can normally calculate that from the rtt parameter's type - we are casting to that RTT, so we know what type that is - but if the rtt is unreachable, that's a problem. To fix that, store the cast type on BrOnCast instructions. This includes a test with a br_on_cast that succeeds and sends the cast value, one that fails and passes through the uncast value, and also of one with an unreachable RTT. This includes a fix for Precompute, as noticed by that new test. If a break is taken, with a ref as a value, we can't precompute it - for the same reasons we can't precompute a ref in general, that it is a pointer to possibly shared data.
* [GC] Fully implement RTT semantics (#3441)Alon Zakai2020-12-151-0/+4
| | | | | | | | | | | | | | This adds info to RTT literals so that they can represent the chain of rtt.canon/sub commands that generated them, and it adds an internal RTT for each GC allocation (array or struct). The approach taken is to simply store the full chain of rtt.sub types that led to each literal. This is not efficient, but it is simple and seems sufficient for the semantics described in the GC MVP doc - specifically, only the types matter, in that repeated executions of rtt.canon/sub on the same inputs yield equal outputs. This PR fixes a bunch of minor issues regarding that, enough to allow testing of the optimization and execution of ref.test/cast.
* [GC] Fix Array optimization issues (#3438)Alon Zakai2020-12-111-5/+6
| | | | | | | | | | | Precompute still tried to precompute a reference because the check was not in the topmost place. Also we truncated i8/i16 values, but did not extend them properly. That was also an issue with structs. The new test replaces the old one by moving from -O1 to -Oz (which runs more opts, and would have noticed this earlier), and adds array operations too, including sign extends.
* [GC] Add struct.new and start to test interesting execution (#3433)Alon Zakai2020-12-091-0/+5
| | | | | | | | | | | | | | With struct.new read/write support, we can start to do interesting things! This adds a test of creating a struct and seeing that references behave like references, that is, if we write to the value X refers to, and if Y refers to the same thing, when reading from Y's value we see the change as well. The test is run through all of -O1, which uncovered a minor issue in Precompute: We can't try to precompute a reference type, as we can't replace a reference with a value. Note btw that the test shows the optimizer properly running CoalesceLocals on reference types, merging two locals.
* Refactor naming convention for functions handling tuples (#3196)Max Graey2020-10-091-1/+2
| | | When there are two versions of a function, one handling tuples and the other handling non-tuple values, the previous naming convention was to have "Single" in the name of the non-tuple handling function. This PR simplifies the convention and shortens function names by making the names plural for the tuple-handling version and singular for the non-tuple-handling version.
* Fix RefNull issues (#3123)Daniel Wirtz2020-09-131-3/+6
| | | | | | | | | * ExpressionAnalyzer: Fix `ref.null ht` equality check to include `ht`. * Precompute: Fix `ref.null ht` expression reuse to also update `ht`. * Fuzzing: Fix `ref.null func` becoming canonicalized to `ref.func $funcref` when evaluating execution results, by adding a check for `isNull`. * Fuzzing: Print actual and expected execution results when aborting. * Tests: Update `if-arms-subtype` test in `optimize-instructions` to check that identical `if` arms become folded while not identical arms are kept.
* Update reference types (#3084)Daniel Wirtz2020-09-091-1/+1
| | | | | | | Align with the current state of the reference types proposal: * Remove `nullref` * Remove `externref` and `funcref` subtyping * A `Literal` of a nullable reference type can now represent `null` (previously was type `nullref`) * Update the tests and temporarily comment out those tests relying on subtyping
* Refactor ExpressionRunner (#2804)Daniel Wirtz2020-04-271-6/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Tackles the concerns raised in https://github.com/WebAssembly/binaryen/issues/2797 directly related to https://github.com/WebAssembly/binaryen/pull/2702 by reverting merging all of `PrecomputeExpressionRunner` into the base `ExpressionRunner`, instead adding a common base for both the precompute pass and the new C-API to inherit. No functional changes. --- ### Current hierarchy after https://github.com/WebAssembly/binaryen/pull/2702 is ``` ExpressionRunner ├ [PrecomputeExpressionRunner] ├ [CExpressionRunner] ├ ConstantExpressionRunner └ RuntimeExpressionRunner ``` where `ExpressionRunner` contains functionality not utilized by `ConstantExpressionRunner` and `RuntimeExpressionRunner`. ### New hierarchy will be: ``` ExpressionRunner ├ ConstantExpressionRunner │ ├ [PrecomputeExpressionRunner] │ └ [CExpressionRunner] ├ InitializerExpressionRunner └ RuntimeExpressionRunner ``` with the precompute pass's and the C-API's shared functionality now moved out of `ExpressionRunner` into a new `ConstantExpressionRunner`. Also renames the previous `ConstantExpressionRunner` to `InitializerExpressionRunner` to [better represent its uses](https://webassembly.org/docs/modules/#initializer-expression) and to make its previous name usable for the new intermediate template, where it fits perfectly. Also adds a few comments answering some of the questions that came up recently. ### Old hierarchy before https://github.com/WebAssembly/binaryen/pull/2702 for comparison: ``` ExpressionRunner ├ [PrecomputeExpressionRunner] ├ ConstantExpressionRunner └ RuntimeExpressionRunner ```
* Fix ExpressionRunner issues found by the fuzzer (#2790)Daniel Wirtz2020-04-231-1/+1
| | | | | | | Fixes #2788 found by the fuzzer, introduced in #2702, which turned out to be incorrect usage of std::move, by removing any std::moves introduced in that PR to be better safe than sorry. Also fixes problems with WASM_INTERPRETER_DEBUG spotted during debugging.
* Refactor expression runner so it can be used via the C and JS APIs (#2702)Daniel Wirtz2020-04-201-78/+25
| | | | | | | Refactors most of the precompute pass's expression runner into its base class so it can also be used via the C and JS APIs. Also adds the option to populate the runner with known constant local and global values upfront, and remembers assigned intermediate values as well as traversing into functions if requested.
* Fix reuse of constant nodes in Precompute (#2764)Heejin Ahn2020-04-141-28/+33
| | | | | | | | | | Previously we tried to reuse `Const` node if a precomputed value is a constant node. But now we have two more kinds of constant node (`RefNull` and `RefFunc`), so we shouldn't reuse them interchangeably, meaning we shouldn't try to reuse a `Const` node when the value at hand is a `RefNull`. This correctly checks the type of node and tries to reuse only if the types of nodes match. Fixes #2759.
* Update Precompute to handle tuples (#2687)Thomas Lively2020-03-101-31/+27
| | | | | | This involves replacing `Literal::makeZero` with `Literal::makeZeroes` and `Literal::makeSingleZero` and updating `isConstantExpression` to handle constant tuples as well. Also makes `Literals` its own struct and adds convenience methods on it.
* Handle multivalue returns in the interpreter (#2684)Thomas Lively2020-03-101-18/+13
| | | | Updates the interpreter to properly flow vectors of values, including at function boundaries. Adds a small spec test for multivalue return.
* Initial multivalue support (#2675)Thomas Lively2020-03-051-19/+26
| | | | | | | | | Implements parsing and emitting of tuple creation and extraction and tuple-typed control flow for both the text and binary formats. TODO: - Extend Precompute/interpreter to handle tuple values - C and JS API support/testing - Figure out how to lower in stack IR - Fuzzing
* Add EH support for OptimizeInstructions (#2608)Heejin Ahn2020-02-051-1/+2
| | | | | | - Adds support for `Try` in `optimizeBoolean` function - Adds support for `Try` in `getFallThrough` function - Adds approximate cost values for instructions in EH and reference types proposals.
* [NFC] Enforce use of `Type::` on type names (#2434)Thomas Lively2020-01-071-6/+7
|
* Add support for reference types proposal (#2451)Heejin Ahn2019-12-301-8/+11
| | | | | | | | | | | | This adds support for the reference type proposal. This includes support for all reference types (`anyref`, `funcref`(=`anyfunc`), and `nullref`) and four new instructions: `ref.null`, `ref.is_null`, `ref.func`, and new typed `select`. This also adds subtype relationship support between reference types. This does not include table instructions yet. This also does not include wasm2js support. Fixes #2444 and fixes #2447.
* Multivalue type creation and inspection (#2459)Thomas Lively2019-11-221-4/+4
| | | | | | | | | | | | | Adds the ability to create multivalue types from vectors of concrete value types. All types are transparently interned, so their representation is still a single uint32_t. Types can be extracted into vectors of their component parts, and all the single value types expand into vectors containing themselves. Multivalue types are not yet used in the IR, but their creation and inspection functionality is exposed and tested in the C and JS APIs. Also makes common type predicates methods of Type and improves the ergonomics of type printing.
* Do not precompute SIMDLoad (#2409)Thomas Lively2019-10-301-0/+1
| | | This fixes a crash when programs containing load_splats are optimized.
* Add basic exception handling support (#2282)Heejin Ahn2019-08-131-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This adds basic support for exception handling instructions, according to the spec: https://github.com/WebAssembly/exception-handling/blob/master/proposals/Exceptions.md This PR includes support for: - Binary reading/writing - Wast reading/writing - Stack IR - Validation - binaryen.js + C API - Few IR routines: branch-utils, type-updating, etc - Few passes: just enough to make `wasm-opt -O` pass - Tests This PR does not include support for many optimization passes, fuzzer, or interpreter. They will be follow-up PRs. Try-catch construct is modeled in Binaryen IR in a similar manner to that of if-else: each of try body and catch body will contain a block, which can be omitted if there is only a single instruction. This block will not be emitted in wast or binary, as in if-else. As in if-else, `class Try` contains two expressions each for try body and catch body, and `catch` is not modeled as an instruction. `exnref` value pushed by `catch` is get by `pop` instruction. `br_on_exn` is special: it returns different types of values when taken and not taken. We make `exnref`, the type `br_on_exn` pushes if not taken, as `br_on_exn`'s type.
* Minimal Push/Pop support (#2207)Alon Zakai2019-07-031-0/+2
| | | | | | | This is the first stage of adding support for stacky/multivaluey things. It adds new push/pop instructions, and so far just shows that they can be read and written, and that the optimizer doesn't do anything immediately wrong on them. No fuzzer support, since there isn't a "correct" way to use these yet. The current test shows some "incorrect" usages of them, which is nice to see that we can parse/emit them, but we should replace them with proper usages of push/pop once we actually have those (see comments in the tests). This should be enough to unblock exceptions (which needs a pop in try-catches). It is also a step towards multivalue (I added some docs about that), but most of multivalue is left to be done.
* Limit interpreter depth in precompute, but not when running whole modules ↵Alon Zakai2019-07-011-2/+8
| | | | | | | (#2191) Keep limiting in precompute as before: that is useful since that pass is run as part of normal compilation, and we want to avoid native stack limits on all platforms. Also that pass is not likely to find any pattern of size 50 of higher that it can't precompute as a sum of smaller pieces. Restore the 250 limit from before for interpreting entire modules, as without that the fuzzer will sometimes hit the limit and cause a false positive.
* Reflect instruction renaming in code (#2128)Heejin Ahn2019-05-211-8/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | - Reflected new renamed instruction names in code and tests: - `get_local` -> `local.get` - `set_local` -> `local.set` - `tee_local` -> `local.tee` - `get_global` -> `global.get` - `set_global` -> `global.set` - `current_memory` -> `memory.size` - `grow_memory` -> `memory.grow` - Removed APIs related to old instruction names in Binaryen.js and added APIs with new names if they are missing. - Renamed `typedef SortedVector LocalSet` to `SetsOfLocals` to prevent name clashes. - Resolved several TODO renaming items in wasm-binary.h: - `TableSwitch` -> `BrTable` - `I32ConvertI64` -> `I32WrapI64` - `I64STruncI32` -> `I64SExtendI32` - `I64UTruncI32` -> `I64UExtendI32` - `F32ConvertF64` -> `F32DemoteI64` - `F64ConvertF32` -> `F64PromoteF32` - Renamed `BinaryenGetFeatures` and `BinaryenSetFeatures` to `BinaryenModuleGetFeatures` and `BinaryenModuleSetFeatures` for consistency.
* Look through fallthrough values in precompute-propagate (#2093)Alon Zakai2019-05-101-1/+3
| | | This helps quite a lot on wasm2js.
* clang-tidy braces changes (#2075)Alon Zakai2019-05-011-6/+12
| | | Applies the changes in #2065, and temprarily disables the hook since it's too slow to run on a change this large. We should re-enable it in a later commit.
* Apply format changes from #2048 (#2059)Alon Zakai2019-04-261-76/+73
| | | Mass change to apply clang-format to everything. We are applying this in a PR by me so the (git) blame is all mine ;) but @aheejin did all the work to get clang-format set up and all the manual work to tidy up some things to make the output nicer in #2048
* Do not precompute bulk memory operations (#2023)Thomas Lively2019-04-171-0/+12
| | | Fixes #1984
* don't precompute anything to a vector for now (#1999)Alon Zakai2019-04-101-1/+2
| | | We handled this for the normal case, but the optimizer can also precompute a return into a value, so check the output of the precomputation as well.
* Rename atomic wait/notify instructions (#1972)Heejin Ahn2019-03-301-1/+1
| | | | | | | | This renames the following: - `i32.wait` -> `i32.atomic.wait` - `i64.wait` -> `i64.atomic.wait` - `wake` -> `atomic.notify` to match the spec.
* Consistently optimize small added constants into load/store offsets (#1924)Alon Zakai2019-03-011-1/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | See #1919 - we did not do this consistently before. This adds a lowMemoryUnused option to PassOptions. It can be passed on the commandline with --low-memory-unused. If enabled, we run the new optimize-added-constants pass, which does the real work here, replacing older code in post-emscripten. Aside from running at the proper time (unlike the old pass, see #1919), this also has a -propagate mode, which can do stuff like this: y = x + 10 [..] load(y) [..] load(y) => y = x + 10 [..] load(x, offset=10) [..] load(x, offset=10) That is, it can propagate such offsets to the loads/stores. This pattern is common in big interpreter loops, where the pointers are offsets into a big struct of state. The pass does this propagation by using a new feature of LocalGraph, which can verify which locals are in SSA mode. Binaryen IR is not SSA (intentionally, since it's a later IR), but if a local only has a single set for all gets, that means that local is in such a state, and can be optimized. The tricky thing is that all locals are initialized to zero, so there are at minimum two sets. But if we verify that the real set dominates all the gets, then the zero initialization cannot reach them, and we are safe. This PR also makes safe-heap aware of lowMemoryUnused. If so, we check for not just an access of 0, but the range 0-1023. This makes zlib 5% faster, with either the wasm backend or asm2wasm. It also makes it 0.5% smaller. Also helps sqlite (1.5% faster) and lua (1% faster)
* Massive renaming (#1855)Thomas Lively2019-01-071-3/+3
| | | | | | Automated renaming according to https://github.com/WebAssembly/spec/issues/884#issuecomment-426433329.
* Do not precompute v128 expressions (#1839)Thomas Lively2018-12-191-0/+4
| | | | | | | Without this change, sequences like `i32.const 0, i32x4.splat` will get precomputed to v128.const ops, which are much larger and also not implemented in V8 yet. Until we have SIMD-aware optimization passes or at least engine support for v128.const, do not perform such transformations.
* SIMD (#1820)Thomas Lively2018-12-131-1/+1
| | | | | | | | | Implement and test the following functionality for SIMD. - Parsing and printing - Assembling and disassembling - Interpretation - C API - JS API
* Add v128 type (#1777)Thomas Lively2018-11-291-1/+0
|
* comment on nondeterminism in precompute pass [ci skip] (#1704)Alon Zakai2018-10-161-0/+5
|
* Unify imported and non-imported things (#1678)Alon Zakai2018-09-191-8/+3
| | | | | | | | | | | | | | Fixes #1649 This moves us to a single object for functions, which can be imported or nor, and likewise for globals (as a result, GetGlobals do not need to check if the global is imported or not, etc.). All imported things now inherit from Importable, which has the module and base of the import, and if they are set then it is an import. For convenient iteration, there are a few helpers like ModuleUtils::iterDefinedGlobals(wasm, [&](Global* global) { .. use global .. }); as often iteration only cares about imported or defined (non-imported) things.
* Change the Literal class's operator== to be bitwise (#1661)Alon Zakai2018-09-011-1/+1
| | | | | The change means that nan values will be compared bitwise when writing A == B, and so the float rule of a nan is different from itself would not apply. I think this is a safer default. In particular this PR fixes a fuzz bug in the rse pass, which placed Literals in a hash table, and due to nan != nan, an infinite loop... Also, looks like we really want a bitwise comparison pretty much everywhere anyhow, as can be seen in the diff here. Really the single place we need a floaty comparison is in the intepreter where we implement f32.eq etc., and there the code was already using the proper code path anyhow.
* Support constant globals in precompute pass (#1622)Daniel Wirtz2018-07-181-23/+31
| | | | | | | | | This PR includes non-mutable globals in precompute, which will allow me to continue removing manual inlining of constants in AssemblyScript without breaking something. Related: #1621, i.e. enum Animal { CAT = 0, DOG = CAT + 1 // requires that `Animal.CAT` is evaluated to // precompute the constant value for `Animal.DOG` }
* Fix MSVC warnings when compiling the binaryen target (#1535)Daniel Wirtz2018-05-091-1/+1
|
* precompute-propagate may benefit from multiple passes (#1518)Alon Zakai2018-04-271-8/+19
| | | One pass may remove code that includes a tee which then makes more optimization possible. Found by the Souper investigations.
* Improve precompute-propagate (#1514)Alon Zakai2018-04-261-5/+31
| | | Propagate constants through a tee_local. Found by Souper. Details in patch comments - basically we didn't differentiate precomputing a value and an expression.