summaryrefslogtreecommitdiff
path: root/src/ir
Commit message (Collapse)AuthorAgeFilesLines
* Do not optimize atomic gets in GUFA (#7161)Thomas Lively2024-12-191-0/+24
| | | | | | | | | Conservatively avoid introducing synchronization bugs by not optimizing atomic struct.gets at all in GUFA. It is possible that we could be more precise in the future. Also remove obsolete logic dealing with the types of null values as a drive-by. All null values now have bottom types, so the type mismatch this code checked for is impossible.
* Support atomic struct accessors (#7155)Thomas Lively2024-12-181-0/+15
| | | | | | | | | | | | | | | | | | | | | Implement support for both sequentially consistent and acquire-release variants of `struct.atomic.get` and `struct.atomic.set`, as proposed by shared-everything-threads. Introduce a new `MemoryOrdering` enum for describing different levels of atomicity (or the lack thereof). This new enum should eventually be adopted by linear memory atomic accessors as well to support acquire-release semantics, but for now just use it in `StructGet` and `StructSet`. In addition to implementing parsing and emitting for the instructions, validate that shared-everything is enabled to use them, mark them as having synchronization side effects, and lightly optimize them by relaxing acquire-release accesses to non-shared structs to normal, unordered accesses. This is valid because such accesses cannot possibly synchronize with other threads. Also update Precompute to avoid optimizing out synchronization points. There are probably other passes that need to be updated to avoid incorrectly optimizing synchronizing accesses, but identifying and fixing them is left as future work.
* Fix GUFA on calls to function refs in open world (#7135)Alon Zakai2024-12-041-13/+30
| | | | In open world we must assume that a funcref that escapes to the outside might be called.
* [NFC] Encapsulate source map reader state (#7132)Thomas Lively2024-12-031-24/+12
| | | | | | | | | | | | Move all state relevant to reading source maps out of WasmBinaryReader and into a new utility, SourceMapReader. This is a prerequisite for parallelizing the parsing of function bodies, since the source map reader state is different at the beginning of each function. Also take the opportunity to simplify the way we read source maps, for example by deferring the reading of anything but the position of a debug location until it will be used and by using `std::optional` instead of singleton `std::set`s to store function prologue and epilogue debug locations.
* Fixup block-nested pops even when EH is not enabled (#7130)Thomas Lively2024-12-032-4/+11
| | | | | | | | | | While parsing a binary file, there may be pops that need to be fixed up even if EH is not (yet) enabled because the target features section has not been parsed yet. Previously `EHUtils::handleBlockNestedPops` did not do anything if EH was not enabled, so the binary parser would fail to fix up pops in that case. Add an optional parameter to override this behavior so the parser can fix up pops unconditionally. Fixes #7127.
* Make more Ifs unreachable (#7094)Thomas Lively2024-11-271-1/+1
| | | | | | | | | | | | | | | | | | | Previously the only Ifs that were typed unreachable were those in which both arms were unreachable and those in which the condition was unreachable that would have otherwise been typed none. This caused problems in IRBuilder because Ifs with unreachable conditions and value-returning arms would have concrete types, effectively hiding the unreachable condition from the logic for dropping concretely typed expressions preceding an unreachable expression when finishing a scope. Relax the conditions under which an If can be typed unreachable so that all Ifs with unreachable conditions or two unreachable arms are typed unreachable. Propagating unreachability more eagerly this way makes various optimizations of Ifs more powerful. It also requires new handling for unreachable Ifs with concretely typed arms in the Printer to ensure that printed wat remains valid. Also update Unsubtyping, Flatten, and CodeFolding to account for the newly unreachable Ifs.
* Remove AutoDrop (#7106)Thomas Lively2024-11-221-88/+0
| | | | The only internal use was in wasm2js, which doesn't need it. Fix API tests to explicitly drop expressions as necessary.
* Propagate public visibility through all types (#7105)Thomas Lively2024-11-211-11/+8
| | | | | | | | | | | | | | Previously the classification of public types propagated public visibility only through types that had previously been collected by `collectHeapTypes`. Since there are settings that cause `collectHeapTypes` to collect fewer types, it was possible for public types to be missed if they were only public because they were reached by an uncollected types. Ensure that all public heap types are properly classified by propagating public visibility even through types that are not part of the collected output. Fixes #7103.
* LocalGraph::canMoveSet (#7039)Alon Zakai2024-11-112-41/+203
| | | | | This new API lets us ask if a set can be safely moved to a new position. The new position must be the location of an expression from a particular class (this allows us to populate the IR once and then query any of those locations).
* Rename indexType -> addressType. NFC (#7060)Sam Clegg2024-11-073-7/+7
| | | See https://github.com/WebAssembly/memory64/pull/92
* [wasm64] Fix copying of 64-bit tables, and fuzz them (#7065)Alon Zakai2024-11-071-0/+1
| | | | `ModuleUtils::copyTable` was not copying the `indexType` property.
* [wasm64] Fix wasm-ctor-eval + utils on 64-bit indexes for memory64 (#7059)Alon Zakai2024-11-061-2/+7
| | | | Some places assumed a 32-bit index.
* Module splitting: don't create new tables when splitting with Emscripten (#7050)Derek Schuff2024-11-021-6/+18
| | | | | | | | Emscripten's JS loader code for wasm-split isn't prepared for handling multiple tables that binaryen automatically creates when reference types are enabled (especially in conjunction with dynamic loading). For now, disable creation of multiple tables when using Emscripten's table ABI (distinguished by importing or exporting a table named "__indirect_function_table".
* [NFC] Use more precise types for Expression IDs (#7038)Alon Zakai2024-10-301-1/+1
| | | | | | | | | | Make the ID enum an `int8_t`, and make the Specific ID a `constexpr` of that type. This seems more idiomatic and makes some code simpler, see the change to `find_all.h` which no longer needs a cast to compile. This has no performance impact.
* [EH][GC] Add missing subtyping constraints from TryTable (#7012)Alon Zakai2024-10-161-1/+7
| | | | | Similar to Break, BrOn, etc., we must apply subtyping constraints of the types we send to blocks, so that Unsubtyping will not remove subtypings that are actually needed.
* Fix BranchUtils::operateOnScopeNameUsesAndSentValues() on BrOn (#6995)Alon Zakai2024-10-101-1/+2
| | | | | BrOn does not always send a value. This is an odd asymmetry in the wasm spec, where br_on_null does not send the null on the branch (which makes sense, but the asymmetry does mean we need to special-case it).
* Source Maps: Support 5 segment mappings (#6795)Ömer Sinan Ağacan2024-10-012-13/+69
| | | | | | | Support 5-segment source mappings, which add a name. Reference: https://github.com/tc39/source-map/blob/main/source-map-rev3.md#proposed-format
* [FP16] Implement conversion operations. (#6974)Brendan Dahl2024-09-262-0/+8
| | | | | | | | | | Note: FP16 is a little different from F32/F64 since it can't represent the full 2^16 integer range. 65504 is the max whole integer. This leads to some slightly strange behavior when converting integers greater than 65504 since they become infinity. Specified at https://github.com/WebAssembly/half-precision/blob/main/proposals/half-precision/Overview.md
* [NFC] Use an unordered map in Parents (#6970)Alon Zakai2024-09-261-1/+1
| | | | | | This makes it slightly faster. Followup to https://github.com/WebAssembly/binaryen/pull/6953#discussion_r1765313401
* [NFC] Eagerly create Functions in binary parser (#6957)Thomas Lively2024-09-191-1/+4
| | | | | | | | In preparation for using IRBuilder in the binary parser, eagerly create Functions when parsing the function section so that they are already created once we parse the code section. IRBuilder will require the functions to exist when parsing calls so it can figure out what type each call should have, even when there is a call to a function whose body has not been parsed yet.
* [NFC] Add isSSA to LazyLocalGraph, and use it in OptimizeAddedConstants (#6952)Alon Zakai2024-09-182-0/+47
| | | This makes the pass 15% faster.
* [NFC + bugfix] Remove BreakTargetLocation from GUFA (#6956)Alon Zakai2024-09-182-52/+34
| | | | | | | | | | | | | | | | | | | | | | Before, a br would send its value to a BreakTargetLocation. That would then be linked to the target: { br's value } => BreakTargetLocation(target name) => { location of target } This PR skips the middle: { br's value } => { location of target } It just connects breaks directly to the targets. We can do that if we keep a map of the targets as we go. This is 2% faster as well as simplifies the code, as an NFC refactoring. But it also fixes a bug: we have handling on ExpressionLocation that filters values as they come in (they must accord with the expression's type). We were not doing that on BreakTargetLocation, leading to an assert. Removing BreakTargetLocation entirely is easier and better than adding filtering logic for it. Fixes #6955
* Improve types for null accesses and remove hacks (#6954)Thomas Lively2024-09-181-19/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When a struct.get or array.get is optimized to have a null reference operand, its return type loses meaning since the operation will always trap. Previously when refinalizing such expressions, we just left their return type unchanged since there was no longer an associated struct or array type to calculate it from. However, this could lead to a strange setup where the stale return type was the last remaining use of some heap type in the module. That heap type would never be emitted in the binary, but it was still used in the IR, so type optimizations would have to keep updating it. Our type collecting logic went out of its way to include the return types of struct.get and array.get expressions to account for this strange possibility, even though it otherwise collected only types that would appear in binaries. In principle, all of this should have applied to `call_ref` as well, but the type collection logic did not have the necessary special case, so there was probably a latent bug there. Get rid of these special cases in the type collection logic and make it impossible for the IR to use a stale type that no longer appears in the binary by updating such stale types during finalization. One possibility would have been to make the return types of null accessors unreachable, but this violates the usual invariant that unreachable instructions must either have unreachable children or be branches or `(unreachable)`. Instead, refine the return types to be uninhabitable non-nullable references to bottom, which is nearly as good as refining them directly to unreachable. We can consider refining them to `unreachable` in the future, but another problem with that is that it would currently allow the parsers to admit more invalid modules with arbitrary junk after null accessor instructions.
* [wasm-split] Minimize non-function export names (#6951)Thomas Lively2024-09-171-2/+5
| | | | | | | | The module splitting utility has a configuration option for minimizing new export names, but it was previously applied only to newly exported functions. Using the new multi-split mode can produce lots of exported tables and splitting WasmGC programs can produce lots of exported globals, so minimizing these export names can have a big impact on code size.
* [wasm-split] Configure split functions rather than kept functions (#6949)Thomas Lively2024-09-172-4/+4
| | | | | | | | The configuration for the module splitting utility previous took a set of functions to keep in the primary module. Change it to take a list of functions to split into the secondary module instead. This improves the code quality in multi-split mode because it keeps stub functions generated by previous splits from being moved into secondary modules during later splits.
* [wasm-split] Run RemoveUnusedElements on secondary modules (#6945)Thomas Lively2024-09-171-0/+11
| | | | | | | | | Rather than analyze what module elements from the primary module a secondary module will need, the splitting logic conservatively imports all module elements from the primary module into the secondary module. Run RemoveUnusedElements on the secondary module to remove any of these imports that happen to be unnecessary. Leave a TODO mentioning the possibility of being more selective about which module elements get exported to reduce code size in the primary module, too.
* [wasm-split] Add an option to skip importing placeholders (#6942)Thomas Lively2024-09-162-22/+39
| | | | | | | | | | | | | | Wasm-split generally assumes that calls to secondary functions made before the secondary module has been loaded and instantiated should go to imported placeholder functions that can be responsible for loading the secondary module and forwarding the call to the loaded function. That scheme makes the loading entirely transparent from the application's point of view, which is not always a good thing. Other schemes would make it impossible for a secondary function to be called before the secondary module has been explicitly loaded, in which case the placeholder functions would never be called. To improve code size and simplify instantiation under these schemes, add a new `--no-placeholders` option that skips adding imported placeholder functions.
* Remove open "ignorable public" array types (#6940)Thomas Lively2024-09-161-0/+4
| | | | | | | | | | | | | | | | There are a few heap types that are hard-coded to be considered public and therefore allowed on module boundaries even in --closed-world mode, specifically to support js-string-builtins. We previously considered both open and closed (i.e. final) mutable i8 arrays to be public in this manner, but js-string-builtins only uses the closed versions, so remove the open versions. This fixes a particular bug in which Unsubtyping optimized a private array type to be equivalent to an ignorable public array type, incorrectly changing the behavior of a cast, but it does not address the larger problem of optimizations producing types that are equivalent to public types. Add a TODO about that problem for now. Fixes #6935.
* [NFC] Make Precompute use a lazy LocalGraph (#6934)Alon Zakai2024-09-122-6/+72
| | | | | | | To do this, add locations and getInfluences to LazyLocalGraph. Both cannot really be computed in a fine-grained manner, so just compute them all on the first request. That is not as efficient as our lazy computation of getSets and setInfluences, but they are also less important, and this change makes the pass 20% faster.
* Replace the old topological sort everywhere (#6902)Thomas Lively2024-09-102-131/+75
| | | | | | | | | To avoid having two separate topological sort utilities in the code base, replace remaining uses of the old DFS-based, CRTP topological sort with the newer Kahn's algorithm implementation. This would be NFC, except that the new topological sort produces a different order than the old topological sort, so the output of some passes is reordered.
* Add a --preserve-type-order option (#6916)Thomas Lively2024-09-103-36/+156
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Unlike other module elements, types are not stored on the `Module`. Instead, they are collected by traversing the IR before printing and binary writing. The code that collects the types tries to optimize the order of rec groups based on the number of times each type is used. As a result, the output order of types generally has no relation to the input order of types. In addition, most type optimizations rewrite the types into a single large rec group, and the order of types in that group is essentially arbitrary. Changes to the code for counting type uses, sorting types, or sorting rec groups can yield very large changes in the output order of types, producing test diffs that are hard to review and potentially harming the readability of tests by moving output types away from the corresponding input types. To help make test output more stable and readable, introduce a tool option that causes the order of output types to match the order of input types as closely as possible. It is implemented by having the parsers record the indices of the input types on the `Module` just like they already record the type names. The `GlobalTypeRewriter` infrastructure used by type optimizations associates the new types with the old indices just like it already does for names and also respects the input order when rewriting types into a large recursion group. By default, wasm-opt and other tools clear the recorded type indices after parsing the module, so their default behavior is not modified by this change. Follow-on PRs will use the new flag in more tests, which will generate large diffs but leave the tests in stable, more readable states that will no longer change due to other changes to the optimizing type sorting logic.
* [NFC] Make LazyLocalGraph even lazier (#6919)Alon Zakai2024-09-102-6/+26
| | | | | | | | | | | | | | | Do not even construct the Flower helper class until we actually need it. This avoids even scanning the function and building the internal CFG if we never get any API call that needs it. This speeds up LICM by 50% (as now we never construct the CFG if we don't find a loop), and Stack IR-enabled binary writing by 10% (as many functions do not have locals in positions that can be optimized using LocalGraph). This moves |locations| from the base class to LocalGraph. It is not needed in the lazy version, so that makes sense for now (we can't keep it in the base, as then it would need to be mutable, which only makes sense for laziness).
* [NFC] LazyLocalGraph: Add getSetInfluences() (#6909)Alon Zakai2024-09-092-17/+92
| | | | | This new API on lazy local graphs allows us to use laziness in another place, StackIR opts. This makes writing the binary (which includes StackIR opts, when those are enabled), 10% faster.
* [NFC] Rename topological_orders.h to topological_sort.h (#6915)Thomas Lively2024-09-071-1/+1
| | | | | Now that the header includes topological sort utilities in addition to the utility that iterates over all topological orders, it makes more sense for it to be named topological_sort.h
* [NFC] Rename the old topological sort utility (#6914)Thomas Lively2024-09-062-3/+2
| | | | This will allow both the old and new topological sort utilities to be included into the same .cpp file while we phase out the old utility.
* Fix supertype counts when collecting heap types (#6905)Thomas Lively2024-09-051-12/+1
| | | | | | | We previous incremented the use count for a declared supertype only if it was also a type we had never seen before. Fix the count by treating the supertype the same as any other type used in a type definition. Update tests accordingly, including by manually moving input types around to better match the output.
* [NFC] Add a more powerful API for collecting heap types (#6904)Thomas Lively2024-09-052-116/+177
| | | | | | | | | | | | | | | | | | | | | | | | | Many passes need to know both the set of all used types and also the sets of private or public types. Previously there was no API to get both at once, so getting both required two API calls that internally collected all the types twice. Furthermore, there are many reasons to collect heap types, and they have different requirements about precisely which types need to be collected. For example, in some edge cases the IR can reference heap types that do not need to be emitted into a binary; passes that replace all types would need to collect these types, but the binary writer would not. The existing APIs for collecting types did not distinguish between these use cases, so the code conservatively collected extra types that were not always needed. Refactor the type collecting code to expose a new API that takes a description of which types need to be collected and returns the appropriate types, their use counts, and optionally whether they are each public or private. Keep this change non-functional by commenting on places where the code could be cleaned up or improved rather than actually making the changes. Follow-up PRs will implement the improvements, which will necessarily come with test changes.
* [NFC] Add a lazy mode to LocalGraph (#6895)Alon Zakai2024-09-052-56/+207
| | | | | | | | | | LocalGraph by default will compute all the local.sets that can be read from all local.gets. However, many passes only query a small amount of those. To avoid wasted work, add a lazy mode that only computes sets when asked about a get. This is then used in a single place, LoopInvariantCodeMotion, which becomes 18% faster.
* Use TopologicalSort::minSort to order rec groups (#6892)Thomas Lively2024-09-041-73/+59
| | | | | | | | | | | | | | | | Rec groups need to be topologically sorted for the output module to be valid, but the specific order of rec groups also affects the module size because types at lower indices requires fewer bytes to reference. We previously optimized for code size when gathering types by sorting the list of groups before doing the topological sort. This was brittle, though, and depended on implementation details of the topological sort to be correct. Replace the old topological sort with use of the new `TopologicalSort::minSort` utility, which is a more principled method of achieving a minimal topological sort with respect to some comparator. Also draw inspiration from ReorderGlobals and apply an exponential factor to take the users of a rec group into account when determining its weight.
* [NFC] Convert LocalGraph influences accesses to function calls (#6899)Alon Zakai2024-09-041-5/+25
| | | | | This replaces direct access of the data structure graph.*influences[foo] with a call graph.get*influences(foo). This will allow a later PR to make those calls optionally lazy.
* [NFC] Refactor LocalGraph to split up flow() for future laziness work (#6880)Alon Zakai2024-09-032-93/+164
|
* [FP16] Implement madd and nmadd. (#6878)Brendan Dahl2024-09-031-0/+2
| | | | | | | | | | | | | | Specified at https://github.com/WebAssembly/half-precision/blob/main/proposals/half-precision/Overview.md A few notes: - The F32x4 and F64x2 versions of madd and nmadd are missing spect tests. - For madd, the implementation was incorrectly doing `(b*c)+a` where it should be `(a*b)+c`. - For nmadd, the implementation was incorrectly doing `(-b*c)+a` where it should be `-(a*b)+c`. - There doesn't appear to be a great way to actually implement a fused nmadd, but the spec allows the double rounded version I added.
* [NFC] Refactor LocalGraph's core getSets API (#6877)Alon Zakai2024-08-283-33/+49
| | | | | | | | | | | | | | Before we just had a map that people would access with localGraph.getSetses[get], while now it is a call localGraph.getSets(get), which more nicely hides the internal implementation details. Also rename getSetses => getSetsMap. This will allow a later PR to optimize the internals of this API. This is performance-neutral as far as I can measure. (We do replace a direct read from a data structure with a call, but the call is in a header and should always get inlined.)
* Rename relaxed SIMD fma instructions to match spec. (#6876)Brendan Dahl2024-08-271-4/+4
| | | | | | | The instructions relaxed_fma and relaxed_fnma have been renamed to relaxed_madd and relaxed_nmadd. https://github.com/WebAssembly/relaxed-simd/blob/main/proposals/relaxed-simd/Overview.md#binary-format
* [FP16] Implement unary operations. (#6867)Brendan Dahl2024-08-272-0/+14
| | | | Specified at https://github.com/WebAssembly/half-precision/blob/main/proposals/half-precision/Overview.md
* [FP16] Implement arithmetic operations. (#6855)Brendan Dahl2024-08-212-0/+24
| | | | Specified at https://github.com/WebAssembly/half-precision/blob/main/proposals/half-precision/Overview.md
* [NFC] hash constant string as void* (#6863)Thomas Lively2024-08-211-1/+1
| | | | | | | | possible-contents.h hashes the location for caught exnrefs by hashing an arbitrary string, "caught-exnref-location". It previously used `std::hash<const char*>` for this, but some standard library implementations report an error when this template instantiation is used because hashing the location of a string is almost never correct. In this case it is fine, so switch to using `std::hash<const void*>`.
* [Exceptions] Finish interpreter + optimizer support for try_table. (#6814)Sébastien Doeraene2024-08-205-3/+86
| | | | | | * Add interpreter support for exnref values. * Fix optimization passes to support try_table. * Enable the interpreter (but not in V8, see code) on exceptions.
* [NFC] Use HeapType::getKind more broadly (#6846)Thomas Lively2024-08-192-11/+23
| | | | | | | | Replace code that checked `isStruct()`, `isArray()`, etc. in sequence with uses of `HeapType::getKind()` and switch statements. This will make it easier to find the code that needs updating if/when we add new heap type kinds in the future. It also makes it much easier to find code that already needs updating to handle continuation types by grepping for "TODO: cont".
* Implement table.init (#6827)Alon Zakai2024-08-166-0/+20
| | | | | Also use TableInit in the interpreter to initialize module's table state, which will now handle traps properly, fixing #6431