path: root/src/passes/Precompute.cpp
Commit message | Author | Date | Files | Lines

* Fix the type of reused RefFunc in Precompute (#6976) | Alon Zakai | 2024-09-30 | 1 | -1/+2
  When we precompute something, we try to avoid allocating a new copy. That's important to avoid many allocations each time we run Precompute - otherwise, each time we see a br we'd allocate a fresh one, and also fresh copies of its values. But we had a bug where we reused a RefFunc as the value of a br without updating the type. It's actually tricky to reach a situation where we find a RefFunc to reuse that is different from the one we actually want, but the fuzzer found one. Fixes the fuzz bug reported on #6845 (but unrelated to that PR).

* [NFC] Make Precompute use a lazy LocalGraph (#6934) | Alon Zakai | 2024-09-12 | 1 | -4/+4
  To do this, add locations and getInfluences to LazyLocalGraph. Neither can really be computed in a fine-grained manner, so just compute them all on the first request. That is not as efficient as our lazy computation of getSets and setInfluences, but they are also less important, and this change makes the pass 20% faster.
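
  The "compute it all on the first request" flavor of laziness described here might look roughly like the following C++ sketch (hypothetical names, not the actual LazyLocalGraph API):

      #include <map>
      #include <optional>
      #include <set>

      struct LocalGet;
      struct LocalSet;

      class LazyGraphSketch {
      public:
        // Influences are hard to compute per-item, so the first call pays for
        // computing the entire map; later calls are plain lookups.
        const std::set<LocalGet*>& getInfluences(LocalSet* set) {
          if (!influences) {
            computeAllInfluences();
          }
          return (*influences)[set];
        }

      private:
        std::optional<std::map<LocalSet*, std::set<LocalGet*>>> influences;

        void computeAllInfluences() {
          influences.emplace();
          // ... walk the whole function once, filling (*influences) for every set ...
        }
      };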

* [NFC] Ignore invalid precomputed fallthrough values (#6933) | Alon Zakai | 2024-09-11 | 1 | -19/+13
  Followup to #6928. If the fallthrough precomputes to an invalid type, it makes no sense to precompute the non-fallthrough, as it can't return anything useful.

* [NFC] Optimize propagateLocals() (#6928) | Alon Zakai | 2024-09-11 | 1 | -116/+156
  That is the slowest part of --precompute-propagate, which is one of our slowest passes. This makes that pass 25% faster.
  The main change is to make the maps consider missing elements as non-constant, rather than storing a None element in them. That saves allocating entries for the common case of a non-constant set/get.
  Also switch to a simple vector for the work queue, which is possible if we only add work when a get/set becomes constant (which can only happen once). Another benefit to adding work in that manner is that we don't start by adding every single get/set as "work" at the start.
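
  A minimal sketch of the two data-structure ideas (absence in the map means non-constant; work is pushed only on the transition to constant), using hypothetical names rather than the pass's real ones:

      #include <unordered_map>
      #include <vector>

      struct Expr;
      struct Value { long long x; };  // stand-in for a known constant value

      std::unordered_map<Expr*, Value> knownConstants;  // missing key == non-constant
      std::vector<Expr*> work;                          // plain vector, no dedup needed

      bool isConstant(Expr* e) {
        return knownConstants.count(e) > 0;
      }

      void markConstant(Expr* e, Value v) {
        // An item can become constant at most once, so it generates work at most
        // once, and we never need to seed the queue with every get/set up front.
        if (knownConstants.emplace(e, v).second) {
          work.push_back(e);
        }
      }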

* [NFC] Standardize Super:: over super:: (#6920) | Alon Zakai | 2024-09-10 | 1 | -2/+2
  As the name of a class, uppercase seems better here.

* [NFC] Convert LocalGraph influences accesses to function calls (#6899) | Alon Zakai | 2024-09-04 | 1 | -2/+2
  This replaces direct access of the data structure graph.*influences[foo] with a call graph.get*influences(foo). This will allow a later PR to make those calls optionally lazy.

* [NFC] Refactor LocalGraph's core getSets API (#6877) | Alon Zakai | 2024-08-28 | 1 | -1/+1
  Before we just had a map that people would access with localGraph.getSetses[get], while now it is a call localGraph.getSets(get), which more nicely hides the internal implementation details. Also rename getSetses => getSetsMap. This will allow a later PR to optimize the internals of this API.
  This is performance-neutral as far as I can measure. (We do replace a direct read from a data structure with a call, but the call is in a header and should always get inlined.)
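
  Roughly, the shape of the change (a simplified sketch, not the real LocalGraph internals): the map stays private and callers go through a small header-defined accessor that the compiler can inline, which is why the change measures as performance-neutral.

      #include <map>
      #include <set>

      struct LocalGet;
      struct LocalSet;

      class GraphSketch {
      public:
        using Sets = std::set<LocalSet*>;

        // Callers now write graph.getSets(get) instead of graph.getSetsMap[get].
        const Sets& getSets(LocalGet* get) const {
          static const Sets empty;
          auto it = getSetsMap.find(get);
          return it == getSetsMap.end() ? empty : it->second;
        }

      private:
        std::map<LocalGet*, Sets> getSetsMap;  // renamed from getSetses
      };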

* [NFC] Avoid quadratic time when precomputing blocks (#6862) | Alon Zakai | 2024-08-21 | 1 | -0/+67
  When precomputing fails on a child block of a parent block, there is no point in precomputing the parent, as that will fail as well. This makes --precompute on Emscripten's test_biggerswitch go from 1.44 seconds to 0.02 seconds (not a typo, that is 72x faster). The absolute number is not that big, but we do run this pass more than once, so it saves a noticeable chunk of time.
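
  One way to picture the fix (a hedged sketch with made-up helper names, not the actual pass code): remember which blocks already failed to precompute, and bail out on a parent as soon as its value-producing child is in that set.

      #include <unordered_set>

      struct Block;

      // Stub helpers standing in for the real walker/interpreter machinery.
      Block* childBlockProvidingValue(Block*) { return nullptr; }
      bool interpretWholeBlock(Block*) { return false; }

      std::unordered_set<Block*> failedBlocks;

      // Called in post-order, so child blocks are attempted before their parents.
      bool tryPrecomputeBlock(Block* block) {
        if (Block* child = childBlockProvidingValue(block)) {
          if (failedBlocks.count(child)) {
            failedBlocks.insert(block);  // the parent cannot do better than its child
            return false;                // skip the expensive re-interpretation
          }
        }
        if (!interpretWholeBlock(block)) {
          failedBlocks.insert(block);
          return false;
        }
        return true;
      }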

* Fix direct comparisons with unshared basic heap types (#6845) | Thomas Lively | 2024-08-16 | 1 | -1/+1
  Audit the remaining occurrences of `== HeapType::` and fix those that did not handle shared types correctly. Add tests for some of the fixes; others are NFC but clarify the code.

* Restore isString type methods (#6815) | Thomas Lively | 2024-08-06 | 1 | -1/+1
  PR #6803 proposed removing Type::isString and HeapType::isString in favor of more explicit, verbose callsites. There was no consensus to make this change, but it was accidentally committed as part of #6804. Revert the accidental change, except for the useful, noncontroversial parts, such as fixing the `isString` implementation and a few other locations to correctly handle shared types.

* [NFC] Add HeapType::getKind returning a new HeapTypeKind enum (#6804) | Thomas Lively | 2024-08-06 | 1 | -1/+1
  The HeapType API has functions like `isBasic()`, `isStruct()`, `isSignature()`, etc. to test the classification of a heap type. Many users have to call these functions in sequence and handle all or most of the possible classifications. When we add a new kind of heap type, finding and updating all these sites is a manual and error-prone process.
  To make adding new heap type kinds easier, introduce a new API that returns an enum classifying the heap type. The enum can be used in switch statements and the compiler's exhaustiveness checker will flag use sites that need to be updated when we add a new kind of heap type.
  This commit uses the new enum internally in the type system, but follow-on commits will add new uses and convert uses of the existing APIs to use `getKind` instead.
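
  The exhaustiveness benefit looks like this in a stripped-down sketch (made-up kind names, not the real HeapTypeKind values):

      enum class KindSketch { Basic, Func, Struct, Array };

      struct TypeSketch {
        KindSketch kind;
        KindSketch getKind() const { return kind; }
      };

      const char* describe(const TypeSketch& t) {
        switch (t.getKind()) {
          case KindSketch::Basic:  return "basic";
          case KindSketch::Func:   return "signature";
          case KindSketch::Struct: return "struct";
          case KindSketch::Array:  return "array";
          // No default case: adding a new enumerator makes -Wswitch flag this site,
          // unlike a chain of isBasic()/isStruct()/... calls that silently falls through.
        }
        return "";
      }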

* Fix stack-use-after-scope on Windows in Precompute (#6643) | mtb | 2024-06-05 | 1 | -1/+2
  Create a temp var to store the ChildIterator. Fixes #6639
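
  The general hazard being fixed (a generic illustration, not the actual Binaryen code): taking a reference into a temporary object, whose storage goes away at the end of the full expression, instead of first giving the object a name so it outlives the reference.

      #include <vector>

      struct ChildListSketch {
        std::vector<int> children{1, 2, 3};
        std::vector<int>& items() { return children; }
      };

      void buggyShape() {
        // The temporary ChildListSketch is destroyed at the end of the statement,
        // so `kids` would dangle; sanitizers report this as stack-use-after-scope.
        // std::vector<int>& kids = ChildListSketch().items();
      }

      void fixedShape() {
        ChildListSketch list;                   // named temp var keeps the object alive
        std::vector<int>& kids = list.items();  // safe for the rest of the scope
        (void)kids;
      }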

* [Strings] Remove operations not included in imported strings (#6589) | Thomas Lively | 2024-05-15 | 1 | -3/+3
  The stringref proposal has been superseded by the imported JS strings proposal, but the former has many more operations than the latter. To reduce complexity, remove all operations that are part of stringref but not part of imported strings.

* Precompute: Ignore mutable arrays in StringNew (#6522) | Alon Zakai | 2024-04-23 | 1 | -0/+22
  All Struct/Array operations must ignore mutable fields in Precompute atm, which we did, but StringNew has an array variant that does an effective ArrayGet operation, which we didn't handle.

* Precompute: Mark StringEncode as non-removable, just like ArrayCopy (#6423) | Alon Zakai | 2024-03-22 | 1 | -1/+8
  The only StringEncode we support is the one that writes into an array, so it has the same effects as ArrayCopy. Precompute needs to be made aware of such side effects in a manual manner (as we already do for ArrayCopy etc.): it simply tries to execute code in the interpreter, and if it succeeds it replaces; it does not check for side effects (checking for side effects would prevent optimizing cases where the side effects do not happen, as we check them statically; e.g. dividing by a non-zero constant does not trap, but a division would be seen as having a potential trap effect).
  I verified no other string operation is hit by this: all the others emit or operate on immutable strings; it is just StringEncode that is basically an Array operation that appears in the Strings proposal.
* Revert "Strings: Disable precomputing for now (#6412)" (#6413)Alon Zakai2024-03-201-30/+0
| | | | | | | | This reverts commit 70ac213fce134840609190a5d3a18118a089ba8a. Reverts #6412 On second thought we found a way to make fixing this less urgent, and the code size downsides of this are worrying, so let's revert it.

* Strings: Disable precomputing for now (#6412) | Alon Zakai | 2024-03-20 | 1 | -0/+30
  Our UTF implementation is still not fully stable, it seems, as we have reports of issues. Disable it for now.

* Precompute: Optimize array.len (#6299) | Alon Zakai | 2024-02-12 | 1 | -1/+1
  Arrays have immutable length, so we can optimize them like immutable fields.
* Revert "Stop propagating/inlining string constants (#6234)" (#6258)Alon Zakai2024-01-311-4/+2
| | | | | | | | | | This reverts commit 9090ce56fcc67e15005aeedc59c6bc6773220f11. This has the effect of once more propagating string constants from globals to other places (and from non-globals too), which is useful for various optimizations even if it isn't useful in the final output. To fix the final output problem, #6257 added a pass that is run at the end to collect string.const to globals, which allows us to once more propagate strings in the optimizer, now without a downside.

* Stop propagating/inlining string constants (#6234) | Alon Zakai | 2024-01-23 | 1 | -2/+4
  This causes overhead atm since in VMs executing a string.const will actually allocate a string, and more copies means more allocations. For now, just do not add more. This required changes to two passes: SimplifyGlobals and Precompute.

* Precompute into select arms (#6212) | Alon Zakai | 2024-01-10 | 1 | -12/+318
  E.g.

      (i32.add
        (select
          (i32.const 100)
          (i32.const 200)
          (..condition..)
        )
        (i32.const 50)
      )
      ;; =>
      (select
        (i32.const 150)
        (i32.const 250)
        (..condition..)
      )

  We cannot fully precompute the select, but we can "partially precompute" it, by precomputing its arms using the parent. This may require looking several steps up the parent chain, which is an awkward operation in our simple walkers, so to do it we capture stacks of parents and operate directly on them. This is a little slower than a normal walk, so only do it when we see a promising select, and only in -O2 and above (this makes the pass 7% or so slower; not a large cost, but best to avoid it in -O1).

* Precompute: Check defaultability, not nullability (#6052) | Alon Zakai | 2023-10-25 | 1 | -2/+2
  Followup to #6048; we did not handle nondefaultable tuples because of this.

* Fix unreachable code in LocalGraph by making it imprecise there (#6048) | Alon Zakai | 2023-10-24 | 1 | -3/+11
  Followup to #6046 - the fuzzer found we missed handling the case of the entry itself being unreachable, or of an unreachable loop later.
  Properly identifying unreachable code requires a flow analysis, unfortunately, so this PR gives up on that and instead allows LocalGraph to be imprecise in unreachable code. That avoids adding any overhead, but does mean the IR may be slightly confusing when debugging. It does not have any optimization downsides, however, as it only affects unreachable code.
  Also add a dump() impl in that file which helps debugging.

* End the current basic block on a Call (#5823) | Alon Zakai | 2023-07-26 | 1 | -1/+1
  Before this PR, if a call had no paths to a catch in the same function then we skipped creating a new basic block right after it. As a result, we could have a call in the middle of a basic block. If EH is enabled that means we might transfer control flow out of the function from the middle of a block. But it is better to have the property that any transfer of control flow - to another basic block, or outside of the function - can only happen at the end of a basic block.
  This causes some overhead, but a subsequent PR (#5838) will remove that as a followup, and this PR adds a little code to pass the module and check if EH is enabled, and avoid the overhead if not, which at least avoids regressing the non-EH case until that followup lands.
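
  The invariant can be pictured with a toy block builder (hypothetical types, not the real CFG code): a call ends the current block when exception handling is enabled, since it may throw and leave the function.

      #include <vector>

      struct InstSketch { bool isCall = false; };
      struct BlockSketch { std::vector<InstSketch*> insts; };

      std::vector<BlockSketch> buildBlocks(const std::vector<InstSketch*>& body,
                                           bool ehEnabled) {
        std::vector<BlockSketch> blocks(1);
        for (auto* inst : body) {
          blocks.back().insts.push_back(inst);
          // With EH, a call may transfer control out of the function, so make
          // sure that can only happen at the end of a basic block.
          if (inst->isCall && ehEnabled) {
            blocks.emplace_back();
          }
        }
        return blocks;
      }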

* [NFC] Internally rename `ArrayInit` to `ArrayNewFixed` (#5526) | Thomas Lively | 2023-02-28 | 1 | -2/+2
  To match the standard instruction name, rename the expression class without changing any parsing or printing behavior. A follow-on PR will take care of the functional side of this change while keeping support for parsing the old name. This change will allow `ArrayInit` to be used as the expression class for the upcoming `array.init_data` and `array.init_elem` instructions.

* [Strings] Initial string execution support (#5491) | Alon Zakai | 2023-02-15 | 1 | -0/+4
  Store string data as GC data. Inefficient (one Const per char), but ok for now. Implement string.new_wtf16 and string.const, enough for basic testing. Create strings in makeConstantExpression, which enables ctor-eval support. Print strings in fuzz-exec, which makes testing easier.

* Implement bottom heap types (#5115) | Thomas Lively | 2022-10-07 | 1 | -2/+2
  These types, `none`, `nofunc`, and `noextern`, are uninhabited, so references to them can only possibly be null. To simplify the IR and increase type precision, introduce new invariants that all `ref.null` instructions must be typed with one of these new bottom types and that `Literals` have a bottom type iff they represent null values. These new invariants require several additional changes.
  First, it is now possible that the `ref` or `target` child of a `StructGet`, `StructSet`, `ArrayGet`, `ArraySet`, or `CallRef` instruction has a bottom reference type, so it is not possible to determine what heap type annotation to emit in the binary or text formats. (The bottom types are not valid type annotations since they do not have indices in the type section.) To fix that problem, update the printer and binary emitter to emit unreachables instead of the instruction with undetermined type annotation. This is a valid transformation because the only possible value that could flow into those instructions in that case is null, and all of those instructions trap on nulls.
  That fix uncovered a latent bug in the binary parser in which new unreachables within unreachable code were handled incorrectly. This bug was not previously found by the fuzzer because we generally stop emitting code once we encounter an instruction with type `unreachable`. Now, however, it is possible to emit an `unreachable` for instructions that do not have type `unreachable` (but are known to trap at runtime), so we will continue emitting code. See the new test/lit/parse-double-unreachable.wast for details.
  Update other miscellaneous code that creates `RefNull` expressions and null `Literals` to maintain the new invariants as well.

* Refactor interaction between Pass and PassRunner (#5093) | Thomas Lively | 2022-09-30 | 1 | -1/+3
  Previously only WalkerPasses had access to the `getPassRunner` and `getPassOptions` methods. Move those methods to `Pass` so all passes can use them. As a result, the `PassRunner` passed to `Pass::run` and `Pass::runOnFunction` is no longer necessary, so remove it.
  Also update `Pass::create` to return a unique_ptr, which is more efficient than having it return a raw pointer only to have the `PassRunner` wrap that raw pointer in a `unique_ptr`.
  Delete the unused template `PassRunner::getLast()`, which looks like it was intended to enable retrieving previous analyses and has been in the code base since 2015 but is not implemented anywhere.
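
  The ownership change reads roughly like this (a simplified sketch of the pattern, not the full Pass interface):

      #include <memory>

      struct PassSketch {
        virtual ~PassSketch() = default;
        // Returning unique_ptr directly avoids handing out a raw pointer that
        // the runner then has to wrap in a unique_ptr itself.
        virtual std::unique_ptr<PassSketch> create() = 0;
      };

      struct PrecomputeSketch : PassSketch {
        std::unique_ptr<PassSketch> create() override {
          return std::make_unique<PrecomputeSketch>();
        }
      };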

* Remove RTTs (#4848) | Thomas Lively | 2022-08-05 | 1 | -3/+2
  RTTs were removed from the GC spec and if they are added back in the future, they will be heap types rather than value types as in our implementation. Updating our implementation to have RTTs be heap types would have been more work than deleting them for questionable benefit, since we don't know how long it will be before they are specced again.

* Update reference type Literal constructors to use HeapType (#4857) | Thomas Lively | 2022-08-01 | 1 | -1/+1
  We already require non-null literals to have non-null types, but with this change we can enforce that constraint by construction. Also remove the default behavior of creating a function reference literal with heap type `func`, since there is always a more specific function type to use.

* Remove basic reference types (#4802) | Thomas Lively | 2022-07-20 | 1 | -1/+2
  Basic reference types like `Type::funcref`, `Type::anyref`, etc. made it easy to accidentally forget to handle reference types with the same basic HeapTypes but the opposite nullability. In principle there is nothing special about the types with shorthands except in the binary and text formats. Removing these shorthands from the internal type representation by removing all basic reference types makes some code more complicated locally, but simplifies code globally and encourages properly handling both nullable and non-nullable reference types.

* [Precompute][SIMD] Enable constant folding for simd (#4381) | Max Graey | 2021-12-13 | 1 | -9/+0

* Modernize code to C++17 (#3104) | Max Graey | 2021-11-22 | 1 | -2/+1

* [Wasm GC] Propagate immutable fields (#4251) | Alon Zakai | 2021-10-15 | 1 | -2/+31
  Very simple with the work so far, just add StructGet/ArrayGet code to check if the field is immutable, and allow the get to go through in that case.

* Precompute: Track reference identity (#4243) | Alon Zakai | 2021-10-14 | 1 | -15/+78
  Precompute will run the interpreter on struct.new etc. repeatedly, as it keeps doing so while it propagates constant values around (if one of the operands to the struct.new becomes constant, that could have a noticeable effect). But creating new GC data means we lose track of their identity, and so ref.eq would not work, and we disabled basically all struct operations. This implements identity tracking so we can start to optimize there, which is a step towards using it for immutable field propagation.
  To track identity, always store the data representing each struct.new in the source using the same GCData structure. That keeps identity consistent no matter how many times we execute.
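
  A rough sketch of the identity trick (hypothetical names, not the interpreter's real structures): key the allocated data on the originating struct.new expression, so that repeated evaluation hands back the same object and ref.eq keeps behaving consistently.

      #include <map>
      #include <memory>
      #include <vector>

      struct Expr;                                 // the struct.new in the source
      struct GCDataSketch { std::vector<int> fields; };

      std::map<Expr*, std::shared_ptr<GCDataSketch>> madeData;

      std::shared_ptr<GCDataSketch> makeData(Expr* origin, std::vector<int> fields) {
        auto& data = madeData[origin];
        if (!data) {
          data = std::make_shared<GCDataSketch>();
        }
        data->fields = std::move(fields);  // refresh the contents, keep the identity
        return data;
      }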

* Precompute: Only run a single LocalGraph iteration (#4184) | Alon Zakai | 2021-09-23 | 1 | -19/+24
  Precompute has a mode in which it propagates results from local.sets to local.gets. That constructs a LocalGraph, which is a non-trivial amount of work. We used to run multiple iterations of this, but investigation shows that such opportunities are extremely rare, as doing just a single propagation iteration has no effect on the entire emscripten benchmark suite, nor on j2cl output. Furthermore, we run this pass twice in the normal pipeline (once early, once late), so even if there are such opportunities they may be optimized already. And, --converge is a way to get additional iterations of all passes if a user wants that, so it makes sense not to do costly work for more iterations automatically.
  In effect, 99.99% of the time before this pass we would create the LocalGraph twice: once the first time, then a second time only to see that we can't actually optimize anything further. This PR makes us only create it once, which makes precompute-propagate 10% faster on j2cl and even faster on other things like poppler (33%) and LLVM (29%).
  See the change in the test suite for an example of a case that does require more than one iteration to be optimized. Note that even there, we only manage to get benefit from a second iteration by doing something that overlaps with another pass (optimizing out an if with condition 0), which shows even more how unnecessary the extra work was.
  See #4165

* Use UniqueDeferringQueue in Precompute (#4179) | Alon Zakai | 2021-09-22 | 1 | -7/+6

* Remove unneeded work in Precompute (#4176) | Alon Zakai | 2021-09-22 | 1 | -1/+0
  isSSA is not called anywhere. See #4165

* [Wasm GC] ArrayInit support (#4138) | Alon Zakai | 2021-09-10 | 1 | -0/+1
  array.init is like array.new_with_rtt except that it takes as arguments the values to initialize the array with (as opposed to a size and an optional initial value).
  Spec: https://docs.google.com/document/d/1afthjsL_B9UaMqCA5ekgVmOm75BVFu6duHNsN9-gnXw/edit#

* Rename isIntrinsicallyNondeterministic() to isGenerative() (#4092) | Alon Zakai | 2021-09-09 | 1 | -2/+2

* Use the new module version of EffectAnalyzer (#4116) | Alon Zakai | 2021-08-31 | 1 | -1/+1
  This finishes the refactoring started in #4115 by doing the same change to pass a Module into EffectAnalyzer instead of features. To do so this refactors the fallthrough API and a few other small things. After those changes, this PR removes the old feature constructor of EffectAnalyzer entirely.
  This requires a small breaking change in the C API, changing BinaryenExpressionGetSideEffects's feature param to a module. That makes this change not NFC, but otherwise it is.

* LocalCSE rewrite (#4079) | Alon Zakai | 2021-08-17 | 1 | -1/+4
  Technically this is not a new pass, but it is a rewrite almost from scratch. Local Common Subexpression Elimination looks for repeated patterns, stuff like this:

      x = (a + b) + c
      y = a + b
      =>
      temp = a + b
      x = temp + c
      y = temp

  The old pass worked on flat IR, which is inefficient, and was overly complicated because of that. The new pass uses a new algorithm that I think is pretty simple, see the detailed comment at the top.
  This keeps the pass enabled only in -O4, like before - right after flattening the IR. That is to make this as minimal a change as possible. Followups will enable the pass in the main pipeline, that is, we will finally be able to run it by default. (Note that to make the pass work well after flatten, an extra simplify-locals is added - the old pass used to do part of simplify-locals internally, which was one source of complexity. Even so, some of the -O4 tests have changes, due to minor factors - they are just minor orderings etc., which can be seen by inspecting the outputs before and after using e.g. --metrics)
  This plus some followup work leads to large wins on wasm GC output. On j2cl there is a common pattern of repeated struct.gets, so common that this pass removes 85% of all struct.gets, which makes the total binary 15% smaller. However, on LLVM-emitted code the benefit is minor, less than 1%.

* [Wasm GC] Handle uses of default values in LocalSubtyping (#4024) | Alon Zakai | 2021-07-28 | 1 | -2/+4
  It is ok to use the default value of a reference even if we refine the type, as it would be a more specifically-typed null, and all nulls compare the same. However, if the default is used then we *cannot* alter the type to be non-nullable, as then we'd use a null where that is not allowed.

* [Wasm GC] Fix Heap2Local + non-nullable locals (#4017) | Alon Zakai | 2021-07-23 | 1 | -2/+4
  Given

      (local $x (ref $foo))
      (local.set $x ..)
      (.. (local.get $x))

  If we remove the local.set but not the get, then we end up with

      (local $x (ref $foo))
      (.. (local.get $x))

  It looks like the local.get reads the initial value of a non-nullable local, which is not allowed. In practice, this would crash in precompute-propagate which would try to propagate the initial value to the get. Add an assertion there with a clear message, as until we have full validation of non-nullable locals (and the spec for that is in flux), that pass is where bugs will end up being noticed.
  To fix this, replace the get as well. We can replace it with a null for simplicity; it will never be used anyhow.
  This also uncovered a small bug with reached not containing all the things we reached - it was missing local.gets.
* Revert "Partially fix Precompute on SIMD (#3983)" (#4002)Alon Zakai2021-07-191-9/+9
| | | | | | | | | | | | | This reverts commit b68691e826a46d1b03b27c552b1f5b7f51f92665. Instead of applying the workaround to avoid precomputing SIMD in more places, which prevents things we could optimize before, we should probably focus on making the workaround not needed - that is, implement full SIMD support in the intepreter (the reason for that PR), and the other TODO in the comment here, // Until engines implement v128.const and we have SIMD-aware optimizations // that can break large v128.const instructions into smaller consts and // splats, do not try to precompute v128 expressions.

* Partially fix Precompute on SIMD (#3983) | Alon Zakai | 2021-07-13 | 1 | -9/+9
  We had the logic in only one place.

* [Wasm GC] Add experimental array.copy (#3911) | Alon Zakai | 2021-05-27 | 1 | -0/+1
  Spec for it is here: https://docs.google.com/document/d/1DklC3qVuOdLHSXB5UXghM_syCh-4cMinQ50ICiXnK3Q/edit#
  Also reorder some things in wasm.h that were not in the canonical order (that has no effect, but it is confusing to read).

* [Wasm GC] Fix precomputing of incompatible fallthrough values (#3875) | Alon Zakai | 2021-05-10 | 1 | -3/+21
  Precompute not only computes values, but looks at the fallthrough,

      (local.set 0
        (block
          ..stuff we can ignore..
          ;; the fallthrough we care about - if a value is set to local 0, it is this
          (i32.const 10)
        )
      )

  Normally that is fine, but the fuzzer found a case where it is not: RefCast may return a different type than the fallthrough, even an incompatible type if we try to do something bad like cast a function to a struct. As we may then propagate the value to a place that expects the proper type, this can cause an error.
  To fix this, check if the precomputed value is a proper subtype. If it is not, then do not look through into the fallthrough, but compute the entire thing. (In the case of a bad RefCast of a func to a struct, it would then indicate a trap happens, and we would not precompute the value.)

* [Wasm GC] Fix precompute on GC data (#3810) | Alon Zakai | 2021-04-15 | 1 | -13/+41
  This fixes precomputation on GC after #3803 was too optimistic.
  The issue is subtle. Precompute will repeatedly evaluate expressions and propagate their values, flowing them around, and it ignores side effects when doing so. For example:

      (block
        ..side effect..
        (i32.const 1)
      )

  When we evaluate that we see there are side effects, but regardless of them we know the value flowing out is 1. So we can propagate that value, if it is assigned to a local and read elsewhere.
  This is not valid for GC because struct.new and array.new have a "side effect" that is noticeable in the result. Each time we call struct.new we get a new struct with a new address, which ref.eq can distinguish. So when this pass evaluates the same thing multiple times it will get a different result. Also, we can't precompute a struct.get even if we know the struct, not unless we know the reference has not escaped (where a call could modify it).
  To avoid all that, do not precompute references, aside from the trivially safe ones like nulls and function references (simple constants that are the same each time we evaluate the expression emitting them).
  precomputeExpression() had a minor bug which this fixes. It checked the type of the expression to see if we can create a constant for it, but really it should check the value - since (separate from this PR) we have no way to emit a "constant" for a struct etc. Also that only matters if replaceExpression is true, that is, if we are replacing with a constant; if we just want the value internally, we have no limit on that.
  Also add Literal support for comparing GC refs, which is used by ref.eq. Without that tiny fix the tests here crash.
  This adds a bunch of tests, many for corner cases that we don't handle (since the PR makes us not propagate GC references). But they should be helpful if/when we do, to avoid the mistakes in #3803.

* [Wasm GC] Full precompute support for GC (#3803) | Alon Zakai | 2021-04-13 | 1 | -6/+8
  The precompute pass ignored all reference types, but that was overly pessimistic: we can precompute some of them, namely a null and a reference to a function are fully precomputable, etc. To allow that to work, add missing integration in getFallthrough as well.
  With this, we can precompute quite a lot of field accesses in the existing -Oz testcase, as can be seen from the output. That testcase runs --fuzz-exec, so it prints out all those logged values, proving they have not changed.