summaryrefslogtreecommitdiff
path: root/src
Commit message (Collapse)AuthorAgeFilesLines
* Monomorphization: Add a flag to control the required improvement (#6837)Alon Zakai2024-08-141-9/+50
| | | | | | | | | | | | | | | | | | | The argument is the minimum benefit we must see for us to decide to optimize, e.g. --monomorphize --pass-arg=monomorphize-min-benefit@50 When the minimum benefit is 50% then if we reduce the cost by 50% through monomorphization then we optimize there. 95% would only optimize when we remove almost all the cost, etc. In practice I see 95% will actually tend to reduce code size overall, as while we add monomorphized versions of functions, we only do so when we remove a lot of work and size, and after inlining we gain benefits. However, 50% or even lower can lead to better benchmark results, in return for larger code size, just like with inlining. To be careful, the default is set to 95%. Previously we optimized whenever we saw any benefit at all, which is the same as requiring a minimum benefit of 0%. Old tests have the flag applied in this PR to set that value, so they do not change.
* Heap2Local: Track interactions in detail (#6834)Alon Zakai2024-08-131-70/+105
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Previously we tracked only whether an expression was relevant to analysis, that is, whether it interacted with the allocation we were tracing the behavior of. That is not enough for all cases, though, so also track the form of the interaction, namely whether the allocation flows through or is fully consumed. An example where that matters: (ref.eq (struct.get $A 0 (local.tee $x (struct.new_default $A) ) ) (local.get $x) ) Here the local.get flows out the allocation, but the struct.get only fully consumes it. Before this PR we thought the struct.get flowed the allocation, and we misoptimized this to 1. To make this possible, do a bunch of minor refactoring: * Move ParentChildInteraction out of the class. * Add a "None" interaction there. * Replace the set of reached expressions with a map of them to their interactions. * Add helper functions to get an expression's interaction or to update it when replacing. The new testcase here shows the main fix. The new assertions are covered by existing testcases.
* Add missing parser error check in makeArrayInitElem (#6835)Sofi Aberegg2024-08-131-0/+1
| | | Fixes #6833
* [NFC] Separate out GlobalTypeRewriter::mapTypeNames (#6829)Thomas Lively2024-08-122-25/+32
| | | | | | | | | | Previously a module's type names were updated in `GlobalTypeRewriter::rebuildTypes`, which builds new versions of the existing types, rather than `GlobalTypeRewriter::mapTypes`, which otherwise handles replacing old types with new types everywhere in a module, but should not necessarily replace names. So that users of `mapTypes` who are building their own versions of existing types can also easily update type names, split type name mapping logic out into a new method `GlobalTypeRewriter::mapTypeNames`.
* Add a TypeBuilder API for copying a heap type (#6828)Thomas Lively2024-08-124-59/+105
| | | | | | | | | | Given a function that maps the old child heap types to new child heap types, the new API takes care of copying the rest of the structure of a given heap type into a TypeBuilder slot. Use the new API in GlobalTypeRewriter::rebuildTypes. It will also be used in an upcoming type optimization. This refactoring also required adding the ability to clear the supertype of a TypeBuilder slot, which was previously not possible.
* GlobalTypeOptimization: Reorder fields in order to remove them (#6820)Alon Zakai2024-08-123-55/+151
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Before, we only removed fields from the end of a struct. If we had, say struct Foo { int x; int y; int z; }; // Add no fields but inherit the parent's. struct Bar : Foo {}; If y is only used in Bar, but never Foo, then we still kept it around, because if we removed it from Foo we'd end up with Foo = {x, z}, Bar = {x, y, z} which is invalid - Bar no longer extends Foo. But we can do this if we first reorder the two: struct Foo { int x; int z; int y; // now y is at the end }; struct Bar : Foo {}; And the optimized form is struct Foo { int x; int z; }; struct Bar : Foo { int y; // now y is added in Bar }; This lets us remove all fields possible in all cases AFAIK. This situation is not super-common, as most fields are actually used both up and down the hierarchy (if they are used at all), but testing on some large real-world codebases, I see 10 fields removed in Java, 45 in Kotlin, and 31 in Dart testcases. The NFC change to src/wasm-type-ordering.h was needed for this to compile.
* Set hasExplicitName for thunks generated in FuncCastEmulation. NFC (#6826)Sam Clegg2024-08-091-0/+1
| | | | Without this all the newly created thunks lack names in the name section.
* Typed continuations: update syntax of handler clauses (#6824)Frank Emrich2024-08-092-5/+3
| | | | | | | | | | | | | | | | | | | | | The syntax for handler clauses in `resume` instructions has recently changed, using `on` instead of `tag` now. Instead of ``` (resume $ct (tag $tag0 $block0) ... (tag $tagn $blockn)) ``` we now have ``` (resume $ct (on $tag0 $block0) ... (on $tagn $blockn)) ``` This PR adapts parsing, printing, and some tests accordingly. (Note that this PR deliberately makes none of the other changes that will arise from implementing the new, combined stack switching proposal, yet.)
* [FP16] Implement relation operations. (#6825)Brendan Dahl2024-08-0913-3/+187
| | | | Specified at https://github.com/WebAssembly/half-precision/blob/main/proposals/half-precision/Overview.md
* [FP16] Implement lane access instructions. (#6821)Brendan Dahl2024-08-0814-0/+107
| | | | Specified at https://github.com/WebAssembly/half-precision/blob/main/proposals/half-precision/Overview.md
* Simplify TopologicalOrders (#6811)Thomas Lively2024-08-072-28/+22
| | | | | | Make `TopologicalOrders` its own iterator rather than having a separate iterator class that wraps a pointer to `TopologicalOrders`. This simplifies usage in cases where an iterator needs to be persistently stored. Notably, all of the tests continue working as they are.
* Add a utility for comparing and hashing rec group shapes (#6808)Thomas Lively2024-08-075-0/+519
| | | | | | | | | | | This is very similar to the internal utilities for canonicalizing rec groups in the type system implementation, except that the new utility also supports ordered comparison of rec groups, and of course the new utility only uses the public type API. A follow-up PR will replace the internal implementation of rec group comparison and hashing in the type system with this one. Another follow-up PR will use this new utility in a type optimization.
* GTO: Remove minor optimization of avoiding ChildLocalizer sometimes (#6818)Alon Zakai2024-08-071-21/+7
| | | | | | | | | | | | | | | | The optimization is to only use ChildLocalizer, which moves children to locals, if we actually have a reason to use it. It is simple enough to see if we are removing fields with side effects here, and only call ChildLocalizer if we are not. However, this will become much more complicated in a subsequent PR which will reorder fields, which allows removing yet more of them (without reordering, we can only remove fields at the end, if any subtype needs the field). This is a pretty minor optimization, as it avoids adding a few locals in the rare case of struct.new operands having side effects. We run --gto at the start of the pipeline, so later opts will clean that up anyhow. (Though, this might make us a little less efficient, but the following PR will justify this regression.)
* [NFC][parser] Rename deftype and subtype (#6819)Thomas Lively2024-08-074-36/+43
| | | | | | Match the current spec and clarify terminology by renaming the old `deftype` to `rectype` and renaming the old `subtype` to `typedef`. Also split the parser for actual `subtype` out of the parser for the newly named `typedef`.
* [parser] Fix bug when printing type builder errors (#6817)Thomas Lively2024-08-061-1/+1
| | | | | | The type index from the TypeBuilder error was mapped to a file location incorrectly, resulting in an assertion failure. Fixes #6816.
* [FP16] Implement load and store instructions. (#6796)Brendan Dahl2024-08-067-41/+166
| | | | Specified at https://github.com/WebAssembly/half-precision/blob/main/proposals/half-precision/Overview.md
* Restore isString type methods (#6815)Thomas Lively2024-08-069-35/+31
| | | | | | | | | PR ##6803 proposed removing Type::isString and HeapType::isString in favor of more explicit, verbose callsites. There was no consensus to make this change, but it was accidentally committed as part of #6804. Revert the accidental change, except for the useful, noncontroversial parts, such as fixing the `isString` implementation and a few other locations to correctly handle shared types.
* [Source maps] Handle single-segment entries in source map header decoder (#6794)Ömer Sinan Ağacan2024-08-061-6/+13
| | | | | Single-segment mappings were already handled in readNextDebugLocation, but not in readSourceMapHeader.
* Fix sharedness bug in inhabitable type fuzzer (#6807)Thomas Lively2024-08-061-1/+2
| | | | | | The code for collecting inhabitable types incorrectly considered shared, non-nullable externrefs to be inhabitable, which disagreed with the code for rewriting types to be inhabitable, which was correct, causing the type fuzzer to report an error.
* [NFC] Add HeapType::getKind returning a new HeapTypeKind enum (#6804)Thomas Lively2024-08-069-144/+134
| | | | | | | | | | | | | | | | | The HeapType API has functions like `isBasic()`, `isStruct()`, `isSignature()`, etc. to test the classification of a heap type. Many users have to call these functions in sequence and handle all or most of the possible classifications. When we add a new kind of heap type, finding and updating all these sites is a manual and error-prone process. To make adding new heap type kinds easier, introduce a new API that returns an enum classifying the heap type. The enum can be used in switch statements and the compiler's exhaustiveness checker will flag use sites that need to be updated when we add a new kind of heap type. This commit uses the new enum internally in the type system, but follow-on commits will add new uses and convert uses of the existing APIs to use `getKind` instead.
* Make source parser consistent with binary parser when naming things. NFC (#6813)Sam Clegg2024-08-061-2/+3
| | | | | The `timport$` prefix is already used for tables, so the binary parser currently uses `eimport$` to name tags (I guess because they are normally exception tags?).
* WasmBinaryReader: Use helper function to create names for items. NFC (#6810)Sam Clegg2024-08-051-14/+15
| | | | | As a followup we could probably make these more consistent. For example, we could use a single char prefix for defined functions/tables/globals (e.g. f0/t0/g0)
* [NFC] Remove unused `isException` type methods (#6802)Thomas Lively2024-08-052-8/+0
|
* Add a utility for iterating over all topological orders (#6801)Thomas Lively2024-08-054-2/+225
| | | | | | | | | Use an extension of Kahn's algorithm for finding topological orders that iteratively makes every possible choice at every step to find all the topological orders. The order being constructed and the set of possible choices are managed in-place in the same buffer, so the algorithm takes linear time and space plus amortized constant time per generated order. This will be used in an upcoming type optimization.
* [LegalizeJSInterface] Use explicit names for stub functions (#6806)Sam Clegg2024-08-051-0/+2
|
* Fix shareability handling in TypeSSA collision logic (#6798)Alon Zakai2024-08-011-0/+1
|
* Add a disjoint sets (union-find) utility (#6797)Thomas Lively2024-08-011-0/+87
| | | | This will be used in an upcoming type optimization pass and may be generally useful.
* [NFC] Avoid a temp local (#6800)Alon Zakai2024-08-011-3/+3
| | | | | The local was only used once, so it didn't really add much. And, it was causing some compilers to error on "unused variable" (when building without assertions, the use was removed).
* Use Names::getValidNameGivenExisting in binary reading (#6793)Alon Zakai2024-07-311-8/+3
| | | | | | We had a TODO to use it once Names was optimized, which it has been. The Names version is also far faster. When building https://github.com/JetBrains/kotlinconf-app it saves 70 seconds(!).
* Fix shareability of externalized nulls (#6791)Alon Zakai2024-07-301-2/+3
|
* Add a customizable title to Metrics reporting (#6792)Alon Zakai2024-07-302-1/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Before the PR: $ bin/wasm-opt test/hello_world.wat --metrics total [exports] : 1 [funcs] : 1 [globals] : 0 [imports] : 0 [memories] : 1 [memory-data] : 0 [tables] : 0 [tags] : 0 [total] : 3 [vars] : 0 Binary : 1 LocalGet : 2 After the PR: $ bin/wasm-opt test/hello_world.wat --metrics Metrics total [exports] : 1 [funcs] : 1 ... Note the "Metrics" addition at the top. And the title can be customized: $ bin/wasm-opt test/hello_world.wat --metrics=text Metrics: text total [exports] : 1 [funcs] : 1 The custom title can be helpful when multiple invocations of metrics are used at once, e.g. --metrics=before -O3 --metrics=after.
* Add a Tarjan's Strongly Connected Component utilty (#6790)Thomas Lively2024-07-301-0/+262
| | | | | | | | | Implement a non-recursive version of Tarjan's Strongly Connected Component algorithm that consumes and produces iterators for maximum flexibility. This will be used in an optimization that transforms the heap type graph to use minimal recursion groups, which correspond to the strongly connected components of the type graph.
* Fix shareability of internalized nulls (#6789)Alon Zakai2024-07-291-2/+3
|
* Generalize Literal::externalize/internalize for strings and shareability (#6784)Alon Zakai2024-07-291-18/+12
|
* [wasm-reduce] Do not crash on non-func element segments (#6778)Thomas Lively2024-07-261-10/+5
| | | | Generalize the code for simplifying element segments to handle more than just null and funcref elements.
* Cost analysis: Remove "Unacceptable" hack (#6782)Alon Zakai2024-07-252-27/+37
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We marked various expressions as having cost "Unacceptable", fixed at 100, to ensure we never moved them out from an If arm, etc. Giving them such a high cost avoids that problem - the cost is higher than the limit we have for moving code from conditional to unconditional execution - but it also means the total cost is unrealistic. For example, a function with one such instruction + an add (cost 1) would end up with cost 101, and removing the add would look insignificant, which causes issues for things that want to compare costs (like Monomorphization). To fix this, adjust some costs. The main change here is to give casts a cost of 5. I measured this in depth, see the attached benchmark scripts, and it looks clear that in both V8 and SpiderMonkey the cost of a cast is high enough to make it not worth turning an if with ref.test arm into a select (which would always execute the test). Other costs adjusted here matter a lot less, because they are on operations that have side effects and so the optimizer will anyhow not move them from conditional to unconditional execution, but I tried to make them a bit more realistic while I was removing "Unacceptable": * Give most atomic operations the 10 cost we've been using for atomic loads/ stores. Perhaps wait and notify should be slower, however, but it seems like assuming fast switching might be more relevant. * Give growth operations a cost of 20, and throw operations a cost of 10. These numbers are entirely made up as I am not even sure how to measure them in a useful way (but, again, this should not matter much as they have side effects).
* TupleOptimization: Properly handle subtyping in copies (#6786)Alon Zakai2024-07-251-2/+7
| | | | | | We used the target's type for the read from the source, but due to subtyping those might be different. Found by the fuzzer.
* Validate RefAsNonNull (#6785)Alon Zakai2024-07-242-3/+18
| | | Fixes #6781
* Properly validate ref.cast when lacking a common supertype (#6741)Alon Zakai2024-07-231-0/+15
| | | | | | | When lacking a common supertype the GLB operation makes the type of the cast unreachable, which errors on getHeapType in the later code. Fixes #6738
* [threads] Calculate shared heap type depths in subtypes.h (#6777)Thomas Lively2024-07-231-7/+14
| | | Fixes #6776.
* Make FunctionInfo's assignment operator argument const (#6780)Derek Schuff2024-07-221-1/+1
| | | | | | | | Aside from the fact that there's no need for this to be non-const and this is the usual way to write an assignment operator, this is also needed because of a recent change to std::pair (https://github.com/llvm/llvm-project/pull/89652). This seems to be forcing pair to want the const version of the assignment operator of its members.
* Heap2Local: Properly handle failing array casts (#6772)Alon Zakai2024-07-181-5/+39
| | | | | | | | Followup to #6727 which added support for failing casts in Struct2Local, but it turns out that it required Array2Struct changes as well. Specifically, when we turn an array into a struct then casts can look like they behave differently (what used to be an array input, becomes a struct), so like with RefTest that we already handled, check if the cast succeeds in the original form and handle that.
* [NFC] Add HeapType::isMaybeShared(BasicHeapType) utility (#6773)Thomas Lively2024-07-188-11/+13
| | | | | | | | | This abbreviates a common pattern where we first had to check whether a heap type was basic, then if it was, get its unshared version and compare it to some expected BasicHeapType. Suggested in https://github.com/WebAssembly/binaryen/pull/6771#discussion_r1683005495.
* [threads] Update the fuzzer for shared types (#6771)Thomas Lively2024-07-182-54/+90
| | | | | | | | Update the fuzzer to both handle shared types in initial contents and create and use new shared types without crashing or producing invalid modules. Since V8 does not have a complete implementation of shared-everything-threads yet, disable fuzzing V8 when shared-everything is enabled. To avoid losing too much coverage of V8, disable shared-everything in the fuzzer more frequently than other features.
* Validate features for types used in element segments (#6769)Thomas Lively2024-07-181-0/+8
|
* Monomorphization: Add a limit on the number of parameters (#6774)Alon Zakai2024-07-181-0/+20
|
* Validate features for types used in tables (#6768)Thomas Lively2024-07-181-13/+8
| | | | We previously special-cased things like GC types, but switch to a more general solution of detecting what features a table's type requires.
* [threads] ref.i31_shared requires shared-everything in validation (#6767)Thomas Lively2024-07-181-0/+6
|
* Monomorphize all the things (#6760)Alon Zakai2024-07-181-28/+260
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Previously call operands were monomorphized (considered as part of the call context, so we can create a specialized function with those operands fixed) if they were constant or had a different type than the function parameter's type. This generalizes that to pull in pretty much all the code we possibly can, including nested code. For example: (call $foo (struct.new $struct (i32.const 10) (local.get $x) (local.get $y) ) ) This can turn into (call $foo_mono (local.get $x) (local.get $y) ) The struct.new and even one of the struct.new's children is moved into the called function, replacing the original ref argument with two other ones. If the original called function was this: (func $foo (param $ref (ref ..)) .. ) then the monomorphized function then looks like this: (func $foo_mono (param $x i32) (param $y i32) (local $ref (ref ..)) (local.set $ref (struct.new $struct (i32.const 10) (local.get $x) (local.get $y) ) ) .. ) The struct.new and its constant child appear here, and we read the parameters. To do this, generalize the code that creates the call context to accept everything that is impossible to copy (like a local.get) or that would be tricky and likely unworthwhile (like another call or a tuple). Also check for effect interactions, as this code motion does some reordering. For this to work, we need to adjust how we compute the costs we compare when deciding what to monomorphize. Before we just compared the called function to the monomorphized called function, which was good enough when the call context only contained consts, but now it can contain arbitrarily nested code. The proper comparison is between these two: * Old function + call context * New monomorphized function Including the call context makes this a fair comparison. In the example above, the struct.new and the i32.const are part of the call context, and so they are in the monomorphized function, so if we didn't count them in other function we'd decide not to optimize anything with a large context. The new functionality is tested in a new file. A few parts of existing tests needed changes to not become pointless after this improvement, namely by replacing stuff that we now optimize with things that we don't like replacing an i32.eqz with a local.get. There are also a handful of test outcomes that change in CAREFUL mode due to the new cost analysis.
* [threads] Simplify and generalize reftype writing without GC (#6766)Thomas Lively2024-07-181-16/+8
| | | | | | Similar to #6765, but for types instead of heap types. Generalize the logic for transforming written reference types to types that are supported without GC so that it will automatically handle shared types and other new types correctly.