path: root/src/passes
Commit message | Author | Age | Files | Lines
...
* [FP16] Implement arithmetic operations. (#6855) | Brendan Dahl | 2024-08-21 | 1 | -0/+25
  Specified at https://github.com/WebAssembly/half-precision/blob/main/proposals/half-precision/Overview.md
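  A minimal hand-written sketch of these operations in the text format (not from the PR's tests), assuming the f16x8 operation names from the proposal linked above:
  ```
  ;; illustrative only
  (func $f16-math (param $a v128) (param $b v128) (result v128)
    (f16x8.mul
      (f16x8.add (local.get $a) (local.get $b))
      (local.get $b)
    )
  )
  ```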
* [Exceptions] Finish interpreter + optimizer support for try_table. (#6814) | Sébastien Doeraene | 2024-08-20 | 4 | -13/+31
  * Add interpreter support for exnref values.
  * Fix optimization passes to support try_table.
  * Enable the interpreter (but not in V8, see code) on exceptions.
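  For reference, a small hand-written sketch of the try_table form this covers (not taken from the PR itself):
  ```
  (module
    (tag $e (param i32))
    (func $catcher (result i32)
      (block $h (result i32)
        (try_table (catch $e $h) ;; on a throw of $e, branch to $h with the payload
          (throw $e (i32.const 42))
        )
        (i32.const 0) ;; normal fallthrough result
      )
    )
  )
  ```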
* Print explicit typeuses for non-MVP function types (#6851) | Thomas Lively | 2024-08-19 | 1 | -2/+11
  We previously printed explicit typeuses (e.g. `(type $f)`) in function signatures when GC was enabled. But even when GC is not enabled, function types may use non-MVP features that require the explicit typeuse to be printed. Fix the printer to always print the explicit typeuse for such types.
  Fixes #6850.
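  As a hypothetical illustration (the message does not spell out which non-MVP feature was involved): a shared function type from the shared-everything proposal is one case where the implicit form cannot express the type, so the typeuse must be printed:
  ```
  (type $f (shared (func)))
  ;; the explicit typeuse is required here: a bare (func $g)
  ;; signature could not express that the type is shared
  (func $g (type $f))
  ```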
* [NFC] Use HeapType::getKind more broadly (#6846) | Thomas Lively | 2024-08-19 | 3 | -39/+67
  Replace code that checked `isStruct()`, `isArray()`, etc. in sequence with uses of `HeapType::getKind()` and switch statements. This will make it easier to find the code that needs updating if/when we add new heap type kinds in the future. It also makes it much easier to find code that already needs updating to handle continuation types by grepping for "TODO: cont".
* Add a pass for minimizing recursion groups (#6832) | Thomas Lively | 2024-08-17 | 4 | -0/+807
  Most of our type optimization passes emit all non-public types as a single large rec group, which trivially ensures that different types remain different, even if they are optimized to have the same structure. Usually emitting a single large rec group is fine, but it also means that if the module is split, all of the types will need to be repeated in all of the split modules. To better support this use case, add a pass that can split the large rec group back into minimal rec groups, taking care to preserve separate type identities by emitting different permutations of the same group where possible or by inserting unused brand types to differentiate them.
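  A rough hand-written sketch of the idea (not the pass's actual output): two structurally identical types stay distinct inside one big rec group, and splitting must preserve that, e.g. with a brand type:
  ```
  ;; before: one large rec group keeps $A and $B distinct
  (rec
    (type $A (struct (field i32)))
    (type $B (struct (field i32)))
  )
  ;; after: minimal groups, with an unused brand type so the two
  ;; groups (and hence $A and $B) remain different types
  (rec
    (type $A (struct (field i32)))
    (type $brand (struct))
  )
  (rec
    (type $B (struct (field i32)))
  )
  ```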
* Fix direct comparisons with unshared basic heap types (#6845) | Thomas Lively | 2024-08-16 | 2 | -5/+10
  Audit the remaining occurrences of `== HeapType::` and fix those that did not handle shared types correctly. Add tests for some of the fixes; others are NFC but clarify the code.
* Implement table.init (#6827) | Alon Zakai | 2024-08-16 | 4 | -0/+15
  Also use TableInit in the interpreter to initialize a module's table state, which will now handle traps properly, fixing #6431.
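  A minimal example of the instruction (hand-written, not from the PR's tests):
  ```
  (module
    (table $t 10 funcref)
    (elem $e func $f) ;; a passive element segment
    (func $f)
    (func $init
      ;; copy 1 entry from segment $e at offset 0 into table $t at index 0
      (table.init $t $e (i32.const 0) (i32.const 0) (i32.const 1))
    )
  )
  ```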
* Monomorphization: Add a flag to control the required improvement (#6837) | Alon Zakai | 2024-08-14 | 1 | -9/+50
  The argument is the minimum benefit we must see for us to decide to optimize, e.g.
  ```
  --monomorphize --pass-arg=monomorphize-min-benefit@50
  ```
  With a minimum benefit of 50%, we only monomorphize when doing so reduces the cost by at least 50%; 95% would only optimize when we remove almost all the cost, etc.
  In practice I see 95% will actually tend to reduce code size overall, as while we add monomorphized versions of functions, we only do so when we remove a lot of work and size, and after inlining we gain benefits. However, 50% or even lower can lead to better benchmark results, in return for larger code size, just like with inlining. To be careful, the default is set to 95%.
  Previously we optimized whenever we saw any benefit at all, which is the same as requiring a minimum benefit of 0%. Old tests have the flag applied in this PR to set that value, so they do not change.
* Heap2Local: Track interactions in detail (#6834) | Alon Zakai | 2024-08-13 | 1 | -70/+105
  Previously we tracked only whether an expression was relevant to analysis, that is, whether it interacted with the allocation we were tracing the behavior of. That is not enough for all cases, though, so also track the form of the interaction, namely whether the allocation flows through or is fully consumed. An example where that matters:
  ```
  (ref.eq
    (struct.get $A 0
      (local.tee $x
        (struct.new_default $A)
      )
    )
    (local.get $x)
  )
  ```
  Here the local.get flows the allocation out, but the struct.get only fully consumes it. Before this PR we thought the struct.get flowed the allocation, and we misoptimized this to 1.
  To make this possible, do a bunch of minor refactoring:
  * Move ParentChildInteraction out of the class.
  * Add a "None" interaction there.
  * Replace the set of reached expressions with a map of them to their interactions.
  * Add helper functions to get an expression's interaction or to update it when replacing.
  The new testcase here shows the main fix. The new assertions are covered by existing testcases.
* Add a TypeBuilder API for copying a heap type (#6828) | Thomas Lively | 2024-08-12 | 1 | -14/+1
  Given a function that maps the old child heap types to new child heap types, the new API takes care of copying the rest of the structure of a given heap type into a TypeBuilder slot. Use the new API in GlobalTypeRewriter::rebuildTypes. It will also be used in an upcoming type optimization.
  This refactoring also required adding the ability to clear the supertype of a TypeBuilder slot, which was previously not possible.
* GlobalTypeOptimization: Reorder fields in order to remove them (#6820) | Alon Zakai | 2024-08-12 | 1 | -52/+149
  Before, we only removed fields from the end of a struct. If we had, say
  ```
  struct Foo {
    int x;
    int y;
    int z;
  };
  // Add no fields but inherit the parent's.
  struct Bar : Foo {};
  ```
  If y is only used in Bar, but never Foo, then we still kept it around, because if we removed it from Foo we'd end up with Foo = {x, z}, Bar = {x, y, z}, which is invalid - Bar no longer extends Foo. But we can do this if we first reorder the two:
  ```
  struct Foo {
    int x;
    int z;
    int y; // now y is at the end
  };
  struct Bar : Foo {};
  ```
  And the optimized form is
  ```
  struct Foo {
    int x;
    int z;
  };
  struct Bar : Foo {
    int y; // now y is added in Bar
  };
  ```
  This lets us remove all fields possible in all cases AFAIK. This situation is not super-common, as most fields are actually used both up and down the hierarchy (if they are used at all), but testing on some large real-world codebases, I see 10 fields removed in Java, 45 in Kotlin, and 31 in Dart testcases.
  The NFC change to src/wasm-type-ordering.h was needed for this to compile.
* Set hasExplicitName for thunks generated in FuncCastEmulation. NFC (#6826) | Sam Clegg | 2024-08-09 | 1 | -0/+1
  Without this all the newly created thunks lack names in the name section.
* Typed continuations: update syntax of handler clauses (#6824) | Frank Emrich | 2024-08-09 | 1 | -3/+1
  The syntax for handler clauses in `resume` instructions has recently changed, using `on` instead of `tag` now. Instead of
  ```
  (resume $ct (tag $tag0 $block0) ... (tag $tagn $blockn))
  ```
  we now have
  ```
  (resume $ct (on $tag0 $block0) ... (on $tagn $blockn))
  ```
  This PR adapts parsing, printing, and some tests accordingly. (Note that this PR deliberately makes none of the other changes that will arise from implementing the new, combined stack switching proposal, yet.)
* [FP16] Implement relation operations. (#6825) | Brendan Dahl | 2024-08-09 | 1 | -0/+18
  Specified at https://github.com/WebAssembly/half-precision/blob/main/proposals/half-precision/Overview.md
* [FP16] Implement lane access instructions. (#6821) | Brendan Dahl | 2024-08-08 | 1 | -0/+9
  Specified at https://github.com/WebAssembly/half-precision/blob/main/proposals/half-precision/Overview.md
* GTO: Remove minor optimization of avoiding ChildLocalizer sometimes (#6818) | Alon Zakai | 2024-08-07 | 1 | -21/+7
  The optimization is to only use ChildLocalizer, which moves children to locals, if we actually have a reason to use it. It is simple enough to see if we are removing fields with side effects here, and only call ChildLocalizer if we are not. However, this will become much more complicated in a subsequent PR, which will reorder fields, allowing us to remove yet more of them (without reordering, we can only remove fields at the end, if any subtype needs the field).
  This is a pretty minor optimization, as it avoids adding a few locals in the rare case of struct.new operands having side effects. We run --gto at the start of the pipeline, so later opts will clean that up anyhow. (Though this might make us a little less efficient, the following PR will justify this regression.)
* [FP16] Implement load and store instructions. (#6796) | Brendan Dahl | 2024-08-06 | 1 | -3/+13
  Specified at https://github.com/WebAssembly/half-precision/blob/main/proposals/half-precision/Overview.md
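  A hand-written sketch in the text format, assuming the scalar f32.load_f16 / f32.store_f16 conversions from the proposal linked above:
  ```
  (module
    (memory 1)
    (func $roundtrip (param $ptr i32) (param $v f32) (result f32)
      ;; narrow $v to 16 bits and store it
      (f32.store_f16 (local.get $ptr) (local.get $v))
      ;; load 16 bits back and widen to f32
      (f32.load_f16 (local.get $ptr))
    )
  )
  ```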
* Restore isString type methods (#6815) | Thomas Lively | 2024-08-06 | 1 | -1/+1
  PR #6803 proposed removing Type::isString and HeapType::isString in favor of more explicit, verbose callsites. There was no consensus to make this change, but it was accidentally committed as part of #6804. Revert the accidental change, except for the useful, noncontroversial parts, such as fixing the `isString` implementation and a few other locations to correctly handle shared types.
* [NFC] Add HeapType::getKind returning a new HeapTypeKind enum (#6804) | Thomas Lively | 2024-08-06 | 1 | -1/+1
  The HeapType API has functions like `isBasic()`, `isStruct()`, `isSignature()`, etc. to test the classification of a heap type. Many users have to call these functions in sequence and handle all or most of the possible classifications. When we add a new kind of heap type, finding and updating all these sites is a manual and error-prone process.
  To make adding new heap type kinds easier, introduce a new API that returns an enum classifying the heap type. The enum can be used in switch statements, and the compiler's exhaustiveness checker will flag use sites that need to be updated when we add a new kind of heap type.
  This commit uses the new enum internally in the type system, but follow-on commits will add new uses and convert uses of the existing APIs to use `getKind` instead.
* [LegalizeJSInterface] Use explicit names for stub functions (#6806) | Sam Clegg | 2024-08-05 | 1 | -0/+2
* Fix shareability handling in TypeSSA collision logic (#6798) | Alon Zakai | 2024-08-01 | 1 | -0/+1
* Add a customizable title to Metrics reporting (#6792) | Alon Zakai | 2024-07-30 | 2 | -1/+10
  Before the PR:
  ```
  $ bin/wasm-opt test/hello_world.wat --metrics
  total
   [exports]     : 1
   [funcs]       : 1
   [globals]     : 0
   [imports]     : 0
   [memories]    : 1
   [memory-data] : 0
   [tables]      : 0
   [tags]        : 0
   [total]       : 3
   [vars]        : 0
   Binary        : 1
   LocalGet      : 2
  ```
  After the PR:
  ```
  $ bin/wasm-opt test/hello_world.wat --metrics
  Metrics
  total
   [exports]     : 1
   [funcs]       : 1
   ...
  ```
  Note the "Metrics" addition at the top. And the title can be customized:
  ```
  $ bin/wasm-opt test/hello_world.wat --metrics=text
  Metrics: text
  total
   [exports]     : 1
   [funcs]       : 1
  ```
  The custom title can be helpful when multiple invocations of metrics are used at once, e.g. --metrics=before -O3 --metrics=after.
* Cost analysis: Remove "Unacceptable" hack (#6782) | Alon Zakai | 2024-07-25 | 1 | -2/+8
  We marked various expressions as having cost "Unacceptable", fixed at 100, to ensure we never moved them out from an If arm, etc. Giving them such a high cost avoids that problem - the cost is higher than the limit we have for moving code from conditional to unconditional execution - but it also means the total cost is unrealistic. For example, a function with one such instruction + an add (cost 1) would end up with cost 101, and removing the add would look insignificant, which causes issues for things that want to compare costs (like Monomorphization).
  To fix this, adjust some costs. The main change here is to give casts a cost of 5. I measured this in depth, see the attached benchmark scripts, and it looks clear that in both V8 and SpiderMonkey the cost of a cast is high enough to make it not worth turning an if with a ref.test arm into a select (which would always execute the test).
  Other costs adjusted here matter a lot less, because they are on operations that have side effects and so the optimizer will anyhow not move them from conditional to unconditional execution, but I tried to make them a bit more realistic while I was removing "Unacceptable":
  * Give most atomic operations the 10 cost we've been using for atomic loads/stores. Perhaps wait and notify should be slower, but it seems like assuming fast switching might be more relevant.
  * Give growth operations a cost of 20, and throw operations a cost of 10. These numbers are entirely made up as I am not even sure how to measure them in a useful way (but, again, this should not matter much as they have side effects).
* TupleOptimization: Properly handle subtyping in copies (#6786) | Alon Zakai | 2024-07-25 | 1 | -2/+7
  We used the target's type for the read from the source, but due to subtyping those might be different. Found by the fuzzer.
* Make FunctionInfo's assignment operator argument const (#6780) | Derek Schuff | 2024-07-22 | 1 | -1/+1
  Aside from the fact that there's no need for this to be non-const and this is the usual way to write an assignment operator, this is also needed because of a recent change to std::pair (https://github.com/llvm/llvm-project/pull/89652). This seems to be forcing pair to want the const version of the assignment operator of its members.
* Heap2Local: Properly handle failing array casts (#6772) | Alon Zakai | 2024-07-18 | 1 | -5/+39
  Followup to #6727, which added support for failing casts in Struct2Local, but it turns out that Array2Struct changes were required as well. Specifically, when we turn an array into a struct, casts can appear to behave differently (what used to be an array input becomes a struct), so as with RefTest, which we already handled, check if the cast succeeds in the original form and handle that.
* Monomorphization: Add a limit on the number of parameters (#6774) | Alon Zakai | 2024-07-18 | 1 | -0/+20
* Monomorphize all the things (#6760) | Alon Zakai | 2024-07-18 | 1 | -28/+260
  Previously call operands were monomorphized (considered as part of the call context, so we can create a specialized function with those operands fixed) if they were constant or had a different type than the function parameter's type. This generalizes that to pull in pretty much all the code we possibly can, including nested code. For example:
  ```
  (call $foo
    (struct.new $struct
      (i32.const 10)
      (local.get $x)
      (local.get $y)
    )
  )
  ```
  This can turn into
  ```
  (call $foo_mono
    (local.get $x)
    (local.get $y)
  )
  ```
  The struct.new and even one of the struct.new's children is moved into the called function, replacing the original ref argument with two other ones. If the original called function was this:
  ```
  (func $foo (param $ref (ref ..))
    ..
  )
  ```
  then the monomorphized function looks like this:
  ```
  (func $foo_mono (param $x i32) (param $y i32)
    (local $ref (ref ..))
    (local.set $ref
      (struct.new $struct
        (i32.const 10)
        (local.get $x)
        (local.get $y)
      )
    )
    ..
  )
  ```
  The struct.new and its constant child appear here, and we read the parameters.
  To do this, generalize the code that creates the call context to accept everything that is impossible to copy (like a local.get) or that would be tricky and likely unworthwhile (like another call or a tuple). Also check for effect interactions, as this code motion does some reordering.
  For this to work, we need to adjust how we compute the costs we compare when deciding what to monomorphize. Before we just compared the called function to the monomorphized called function, which was good enough when the call context only contained consts, but now it can contain arbitrarily nested code. The proper comparison is between these two:
  * Old function + call context
  * New monomorphized function
  Including the call context makes this a fair comparison. In the example above, the struct.new and the i32.const are part of the call context, and so they are in the monomorphized function, so if we didn't count them in the other function we'd decide not to optimize anything with a large context.
  The new functionality is tested in a new file. A few parts of existing tests needed changes to not become pointless after this improvement, namely by replacing stuff that we now optimize with things that we don't, like replacing an i32.eqz with a local.get. There are also a handful of test outcomes that change in CAREFUL mode due to the new cost analysis.
* [threads] Update TypeSSA for shared types (#6753) | Thomas Lively | 2024-07-16 | 1 | -1/+4
  When creating a new subtype, make sure to copy the supertype's shareability.
* Allow different arguments for multiple instances of a pass (#6687) | Christian Speckner | 2024-07-15 | 15 | -64/+89
  Each pass instance can now store an argument for it, which can be different. This may be a breaking change for the corner case of running a pass multiple times and setting the pass's argument multiple times as well (before, the last pass argument affected them all; now, it affects the last instance only). This only affects arguments with the name of a pass; others remain global, as before (and multiple passes can read them, in fact). See the CHANGELOG for details.
  Fixes #6646.
* Monomorphize dropped functions (#6734) | Alon Zakai | 2024-07-12 | 2 | -36/+88
  We now consider a drop to be part of the call context. If we see
  ```
  (drop
    (call $foo)
  )
  (func $foo (result i32)
    (i32.const 42)
  )
  ```
  then we'd monomorphize to this:
  ```
  (call $foo_1) ;; call the specialized function instead
  (func $foo_1 ;; the specialized function returns nothing
    (drop ;; the drop was moved into here
      (i32.const 42)
    )
  )
  ```
  With the drop now in the called function, we may be able to optimize out unused work.
  Refactor a bit of code out of DAE that we can reuse here, into a new return-utils.h.
* [threads] ref.i31_shared (#6735) | Thomas Lively | 2024-07-12 | 1 | -1/+5
  Implement `ref.i31_shared`, the new instruction for creating references to shared i31s. Implement binary and text parsing and emitting as well as interpretation. Copy the upstream spec test for i31 and modify it so that all the heap types are shared. Comment out some parts that we do not yet support.
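  A minimal hand-written example (assuming the (shared i31) text syntax from the shared-everything proposal):
  ```
  (func $make (result (ref (shared i31)))
    (ref.i31_shared (i32.const 42))
  )
  ```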
* SafeHeap: Handle overflows when adding the pointer and the size (#6409) | Alon Zakai | 2024-07-12 | 1 | -11/+33
  E.g. loading 4 bytes from 2^32 - 2 should error: 2 bytes are past the maximum address. Before this PR we added 2^32 - 2 + 4 and overflowed to 2, which we saw as a low and safe address. This PR adds an extra check for an overflow in that add.
  Also add unreachables after calls to segfault(), which reduces the overhead of the extra check here (the unreachable apparently allows VMs to see that control flow ends, after the segfault() which is truly no-return).
  Fixes emscripten-core/emscripten#21557
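  Conceptually, the added check looks something like this sketch (hand-written, not the pass's actual output):
  ```
  ;; guarding a 4-byte access at $ptr: if $ptr + 4 wraps around,
  ;; the sum is less than $ptr itself, which exposes the overflow
  (if (i32.lt_u
        (i32.add (local.get $ptr) (i32.const 4))
        (local.get $ptr))
    (then
      (call $segfault)
      (unreachable) ;; let VMs see that control flow ends here
    )
  )
  ```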
* Do not abbreviate items in element segments (#6737) | Thomas Lively | 2024-07-12 | 1 | -1/+2
  The full syntax for an expression in an element segment looks like `(item (ref.null none))`, but we have been printing the abbreviated version, which omits the `(item ...)`. This abbreviation is only valid when the item has only a single instruction, so it is not always correct to use it. Rather than determining whether or not to use the abbreviation on a case-by-case basis, always print the full syntax.
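  For example (a hand-written illustration), a passive segment is now printed as
  ```
  (elem $e funcref (item (ref.func $f)) (item (ref.null func)))
  ```
  rather than with the abbreviated (ref.func $f) form.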
* Memory64Lowering: Handle -1 return value from memory.grow (#6733) | Sam Clegg | 2024-07-11 | 1 | -2/+25
  This edge case makes the lowering a little more tricky.
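  A sketch of the subtlety (the handling shown is an assumption for illustration, not the pass's exact output): the lowered 32-bit result cannot simply be extended, because the failure value -1 must remain -1 as an i64:
  ```
  (func $grow (param $delta i64) (result i64)
    (local $r i32)
    (local.set $r (memory.grow (i32.wrap_i64 (local.get $delta))))
    (if (result i64)
      (i32.eq (local.get $r) (i32.const -1))
      (then (i64.const -1))                    ;; propagate failure unchanged
      (else (i64.extend_i32_u (local.get $r))) ;; valid page counts are non-negative
    )
  )
  ```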
* Monomorphize: Use -O3 over -O1 + tweaks (#6732) | Alon Zakai | 2024-07-11 | 1 | -22/+12
  Eventually we will need to do some tuning of compile time speed, but for now it is going to be simpler to do all the opts, in particular because it makes writing tests simpler.
* [WasmGC] Heap2Local: Optimize RefCast failures (#6727) | Alon Zakai | 2024-07-11 | 1 | -22/+22
  Previously we just did not optimize cases where our escape analysis showed an allocation flowed into a cast that failed. However, after inlining there can be real-world cases where that happens, even in traps-never-happen mode (if the cast is behind a conditional branch), so it seems worth optimizing.
* Heap2Local: Drop RefEq's two arms (#6729) | Alon Zakai | 2024-07-11 | 1 | -3/+3
  This is a tiny bit more code but it is more consistent with other operations, and it saves work later.
* Monomorphization: Optimize constants (#6711) | Alon Zakai | 2024-07-11 | 1 | -82/+435
  Previously the pass would monomorphize a call when we were sending more refined types than the target expects. This generalizes the pass to also consider the case where we send a constant in a parameter.
  To achieve that, this refactors the pass to explicitly define the "call context", which is the code around the call (inputs and outputs) that may end up leading to optimization opportunities when combined with the target function. Also add comments about the overall design + roadmap.
  The existing test is mostly unmodified, and the diff there is smaller when ignoring whitespace. We do "regress" those tests by adding more local.set operations, due to the refactoring that makes things a lot simpler: to handle the general case of an operand having either a refined type or being a constant, we copy it inside the function, which works either way. This "regression" is only in the testing version of the pass (the normal version runs optimizations, which would remove that extra code).
  This also enables the pass when GC is disabled. Previously we only handled refined types, so only GC could benefit. Add a test for MVP content specifically to show we operate there as well.
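  A rough hand-written sketch of the new constant case (not from the pass's tests):
  ```
  (call $foo (i32.const 7))
  ;; can become, roughly:
  (call $foo_7)
  (func $foo_7
    (local $x i32)
    (local.set $x (i32.const 7)) ;; the constant operand, copied inside
    ;; ... the original body of $foo, reading $x ...
  )
  ```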
* [WasmGC] Heap2Local: Optimize RefIs and RefTest (#6705) | Alon Zakai | 2024-07-11 | 1 | -1/+62
* Rename external conversion instructions (#6716) | Jérôme Vouillon | 2024-07-08 | 4 | -8/+8
  Rename instructions `extern.internalize` into `any.convert_extern` and `extern.externalize` into `extern.convert_any` to follow the spec more closely. This was changed in https://github.com/WebAssembly/gc/issues/432. The legacy names are still accepted in text inputs and in the C and JS APIs.
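  The renamed pair in use (a hand-written example):
  ```
  (func $wrap (param $x externref) (result anyref)
    (any.convert_extern (local.get $x))
  )
  (func $unwrap (param $x anyref) (result externref)
    (extern.convert_any (local.get $x))
  )
  ```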
* [DebugInfo] Add debug info to the values emitted in GlobalStructInference (#6709) | Alon Zakai | 2024-07-02 | 1 | -15/+23
  Previously the replacement select got the debug info, but we should also copy it to the values, as often optimizations lead to one of those values remaining by itself. Similar to #6652 in general form.
* ConstantFieldPropagation: Add a variation that picks between 2 values using RefTest (#6692) | Alon Zakai | 2024-06-27 | 3 | -16/+255
  CFP focuses on finding when a field always contains a constant, and then replaces a struct.get with that constant. If we find there are two constant values, then in some cases we can still optimize, if we have a way to pick between them. All we have is the struct.get and its reference, so we must use a ref.test:
  ```
  (struct.get $T x (..ref..))
  =>
  (select
    (..constant1..)
    (..constant2..)
    (ref.test $U (..ref..))
  )
  ```
  This is valid if, of all the subtypes of $T, those that pass the test have constant1 in that field, and those that fail the test have constant2. For example, a simple case is where $T has two subtypes, $T is never created itself, and each of the two subtypes has a different constant value.
  This is a somewhat risky operation, as ref.test is not necessarily cheap. To mitigate that, this is a new pass, --cfp-reftest, that is not run by default, and also we only optimize when we can use a ref.test on what we think will be a final type (because ref.test on a final type can be faster in VMs).
* [NFC] Add HeapType::getFeatures() (#6707) | Alon Zakai | 2024-06-27 | 1 | -1/+1
* [WasmGC] Heap2Local: Optimize RefEq (#6703) | Alon Zakai | 2024-06-26 | 1 | -2/+42
  If an allocation does not escape, then we can compute ref.eq for it: when compared to itself the result is 1, and when compared to anything else it is 0 (since it did not escape, anything else must be different).
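  A minimal hand-written illustration:
  ```
  (local $x (ref null $A))
  (local.set $x (struct.new_default $A))
  ;; $x does not escape, and both arms are the same allocation: this is 1
  (drop (ref.eq (local.get $x) (local.get $x)))
  ```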
* [NFC] Remove a minor compile-time optimization in Heap2Local (#6699) | Alon Zakai | 2024-06-25 | 1 | -43/+6
  We tracked which expressions we saw an allocated struct/array reach, and then quickly exited when another one did (as when two allocations mix, we can optimize neither). It turns out that this helps very little in actual measurements (it looks to be within noise - likely we are ruling out the un-optimizable cases early otherwise anyhow). Also the complexity it adds is a problem for an improvement I want to make to the pass, so remove it.
* Add TraceCalls pass (#6619) | Marcin Kolny | 2024-06-21 | 4 | -0/+224
  This pass receives a list of functions to trace, and then wraps them in calls to imports. This can be useful for tracing malloc/free calls, for example, but is generic.
  Fixes #6548
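  A sketch of the shape of the instrumentation (the import name and signature here are hypothetical, not necessarily what the pass emits):
  ```
  ;; hypothetical tracing import; the real pass defines its own interface
  (import "env" "trace_malloc" (func $trace_malloc (param i32)))
  (func $malloc_wrapper (param $size i32) (result i32)
    (call $trace_malloc (local.get $size)) ;; report the call
    (call $malloc (local.get $size))       ;; then do the real work
  )
  ```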
* GlobalStructInference: Un-nest struct.news in globals when that is helpful (#6688) | Alon Zakai | 2024-06-20 | 1 | -62/+220
  If we have
  ```
  (global $g (struct.new $S
    (i32.const 1)
    (struct.new $T ..)
    (ref.func $f)
  ))
  ```
  then before this PR, if we wanted to read the middle field we'd stop, as it is non-constant. However, we can un-nest it, making it constant:
  ```
  (global $g.unnested (struct.new $T ..))
  (global $g (struct.new $S
    (i32.const 1)
    (global.get $g.unnested)
    (ref.func $f)
  ))
  ```
  Now the field is a global.get of an immutable global, which is constant. Using this technique we can handle anything in a struct field, constant or not. The cost of adding a global is likely offset by the benefit of being able to refer to it directly, as that opens up more opportunities later.
  Concretely, this replaces the constant values we look for in GSI with a variant over constants or expressions (we do still want to group constants, as multiple globals with the same constant field can be treated as a whole). And we note cases where we need to un-nest, and handle those at the end.
* [threads] Shared basic heap types (#6667) | Thomas Lively | 2024-06-19 | 1 | -3/+4
  Implement binary and text parsing and printing of shared basic heap types and incorporate them into the type hierarchy. To avoid the massive amount of code duplication that would be necessary if we were to add separate enum variants for each of the shared basic heap types, use bit 0 to indicate whether the type is shared and replace `getBasic()` with `getBasic(Unshared)`, which clears that bit. Update all the use sites to record whether the original type was shared and produce shared or unshared output without code duplication.
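  For example (hand-written, assuming the (shared ...) text syntax):
  ```
  (global $g (ref null (shared any)) (ref.null (shared any)))
  ```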
* GlobalStructInference: Optimize globals too (#6674) | Alon Zakai | 2024-06-17 | 1 | -11/+10
  This is achieved by simply replacing the Literal with PossibleConstantValues, which supports both Literals and Globals.