forks/binaryen.git

	Commit message	Author	Age	Files	Lines
*	Support atomic struct accessors (#7155)	Thomas Lively	2024-12-18	4	-9/+75
|	Implement support for both sequentially consistent and acquire-release variants of `struct.atomic.get` and `struct.atomic.set`, as proposed by shared-everything-threads. Introduce a new `MemoryOrdering` enum for describing different levels of atomicity (or the lack thereof). This new enum should eventually be adopted by linear memory atomic accessors as well to support acquire-release semantics, but for now just use it in `StructGet` and `StructSet`. In addition to implementing parsing and emitting for the instructions, validate that shared-everything is enabled to use them, mark them as having synchronization side effects, and lightly optimize them by relaxing acquire-release accesses to non-shared structs to normal, unordered accesses. This is valid because such accesses cannot possibly synchronize with other threads. Also update Precompute to avoid optimizing out synchronization points. There are probably other passes that need to be updated to avoid incorrectly optimizing synchronizing accesses, but identifying and fixing them is left as future work.
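A minimal sketch of the relaxation rule described above, assuming a three-value ordering enum. The names `MemoryOrdering::Unordered/SeqCst/AcqRel` and `relaxStructAccess` are hypothetical illustrations, not Binaryen's actual declarations, and only the acquire-release case from the message is modeled.

```cpp
// Hypothetical model of the orderings described above; the real
// MemoryOrdering enum in Binaryen may use different names or values.
enum class MemoryOrdering { Unordered, SeqCst, AcqRel };

// Returns the ordering an optimization may use for a struct.atomic.get/set
// with ordering `order` on a struct type that is (or is not) shared.
constexpr MemoryOrdering relaxStructAccess(MemoryOrdering order,
                                           bool structIsShared) {
  // An acquire-release access to a non-shared struct cannot synchronize with
  // another thread, so it can be treated as a normal unordered access.
  if (order == MemoryOrdering::AcqRel && !structIsShared) {
    return MemoryOrdering::Unordered;
  }
  // Other accesses keep their ordering; this sketch only models the
  // acquire-release relaxation mentioned in the message.
  return order;
}
```

Keeping the rule as a pure function of (ordering, sharedness) makes it easy to test in isolation; a real pass would also have to inspect the struct's heap type to decide whether it is shared.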
*	[NFC] Move HeapType::isBottom() to header (#7150)	Thomas Lively	2024-12-13	1	-25/+0
|	This makes Precompute about 5% faster on a WasmGC binary. Inspired by #6931.
*	Support control flow inputs in IRBuilder (#7149)	Thomas Lively	2024-12-13	2	-60/+109
|	Since multivalue was standardized, WebAssembly has supported not only multiple results but also an arbitrary number of inputs on control flow structures, but until now Binaryen did not support control flow input. Binaryen IR still has no way to represent control flow input, so lower it away using scratch locals in IRBuilder. Since both the text and binary parsers use IRBuilder, this gives us full support for parsing control flow inputs. The lowering scheme is mostly simple. A local.set writing the control flow inputs to a scratch local is inserted immediately before the control flow structure begins and a local.get retrieving those inputs is inserted inside the control flow structure before the rest of its body. The only complications come from ifs, in which the inputs must be retrieved at the beginning of both arms, and from loops, where branches to the beginning of the loop must be transformed so their values are written to the scratch local along the way. Resolves #6407.
*	[NFC] Encode reference types with bit packing (#7142)	Thomas Lively	2024-12-10	1	-286/+78
|	Value types were previously represented internally as either enum values for "basic," i.e. non-reference, non-tuple types or pointers to `TypeInfo` structs encoding either references or tuples. Update the representation of reference types to use one bit to encode nullability and the rest of the bits to encode the referenced heap type. This allows canonical reference types to be created with a single bitwise OR rather than by taking a lock on a global type store and doing a hash map lookup to canonicalize. This change is a massive performance improvement and dramatically improves how performance scales with threads because the removed lock was highly contended. Even with a single core, the performance of an O3 optimization pipeline on a WasmGC module improves by 6%. With 8 cores, the improvement increases to 29% and with all 128 threads on my machine, the improvement reaches 46%. The full new encoding of types is as follows: - If the type ID is within the range of the basic types, the type is the corresponding basic type. - Otherwise, if bit 0 is set, the type is a tuple and the rest of the bits are a canonical pointer to the tuple. - Otherwise, the type is a reference type. Bit 1 determines the nullability and the rest of the bits encode the heap type. Also update the encodings of basic heap types so they no longer use the low two bits to avoid conflicts with the use of those bits in the encoding of types.
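The three-way encoding above can be sketched with plain integer bit tests. This is a hypothetical illustration, not Binaryen's `Type` class: the threshold `kNumBasicTypes`, the helper names, and the convention that a set bit 1 means nullable are assumptions, and heap type IDs are assumed to be at least `kNumBasicTypes` with their low two bits clear (which is why basic heap type encodings had to stop using those bits).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch of the packed encoding; not Binaryen's actual code.
using TypeID = std::uintptr_t;

constexpr TypeID kNumBasicTypes = 8; // illustrative value only
constexpr TypeID kTupleBit = 1;      // bit 0: tuple pointer
constexpr TypeID kNullableBit = 2;   // bit 1: nullable (references only)

inline bool isBasic(TypeID id) { return id < kNumBasicTypes; }
inline bool isTuple(TypeID id) { return !isBasic(id) && (id & kTupleBit) != 0; }
inline bool isRef(TypeID id) { return !isBasic(id) && (id & kTupleBit) == 0; }

// Creating a canonical reference type is a single bitwise OR: no global lock
// and no hash map lookup.
inline TypeID makeRef(TypeID heapType, bool nullable) {
  assert(heapType >= kNumBasicTypes && (heapType & 3) == 0);
  return heapType | (nullable ? kNullableBit : 0);
}

inline bool isNullable(TypeID id) {
  return isRef(id) && (id & kNullableBit) != 0;
}

// Masking off the nullability bit recovers the heap type.
inline TypeID getHeapType(TypeID id) {
  assert(isRef(id));
  return id & ~kNullableBit;
}
```

The benefit of the design shows up in `makeRef`: creating a canonical reference type touches no shared state, so threads never contend on it.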
*	[NFC] Simplify TypeGraphWalker in wasm-type.cpp (#7143)	Thomas Lively	2024-12-10	1	-213/+139
|	Co-locate the declaration and implementation of TypeGraphWalkerBase and its subtypes in wasm-type.cpp and simplify the implementation. Remove the preVisit and postVisit tasks for both Types and HeapTypes since overriding scanType and scanHeapType is sufficient for all users. Stop scanning the HeapTypes in reference types because a follow-on change (#7142) will make that much more complicated, and it turns out that it is not necessary.
*	Add bulk-memory-opt feature and ignore call-indirect-overlong (#7139)	Derek Schuff	2024-12-06	3	-6/+18
|	LLVM recently split the bulk-memory-opt feature out from bulk-memory, containing just memory.copy and memory.fill. This change follows that, so that enabling all of bulk-memory also enables bulk-memory-opt. It also introduces call-indirect-overlong following LLVM, but ignores it, since Binaryen has always allowed the encoding (i.e. command line flags enabling or disabling the feature are accepted but ignored).
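One way to picture the implication between the two features, as a hedged sketch with made-up bit values. `Feature`, `withImpliedFeatures`, and `allowsMemoryCopy` are hypothetical and are not Binaryen's `FeatureSet` API.

```cpp
#include <cstdint>

// Hypothetical feature bits; names and values are illustrative only.
enum Feature : std::uint32_t {
  BulkMemory = 1u << 0,    // the full bulk-memory proposal
  BulkMemoryOpt = 1u << 1, // only memory.copy and memory.fill
  ReferenceTypes = 1u << 2,
};

// Enabling all of bulk-memory implies the bulk-memory-opt subset.
constexpr std::uint32_t withImpliedFeatures(std::uint32_t features) {
  return (features & BulkMemory) != 0 ? (features | BulkMemoryOpt) : features;
}

// memory.copy and memory.fill only require the subset.
constexpr bool allowsMemoryCopy(std::uint32_t features) {
  return (withImpliedFeatures(features) & BulkMemoryOpt) != 0;
}
```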
*	Remove incorrect warning when reading name section (#7140)	Thomas Lively	2024-12-06	1	-5/+0
|	When we refactored how the name section is read, we accidentally left an old warning about invalid field name indices in place. The old warning code compares the type index from the names section to the size of the parsed type vector to determine if the index is out-of-bounds. Now that we parse the name section before the type section, this is no longer correct. Delete the old warning; we already have a new, correct warning for out-of-bounds indices when we parse the type section.
*	[NFC] Encapsulate source map reader state (#7132)	Thomas Lively	2024-12-03	8	-283/+236
|	Move all state relevant to reading source maps out of WasmBinaryReader and into a new utility, SourceMapReader. This is a prerequisite for parallelizing the parsing of function bodies, since the source map reader state is different at the beginning of each function. Also take the opportunity to simplify the way we read source maps, for example by deferring the reading of anything but the position of a debug location until it will be used and by using `std::optional` instead of singleton `std::set`s to store function prologue and epilogue debug locations.
*	Fixup block-nested pops even when EH is not enabled (#7130)	Thomas Lively	2024-12-03	1	-1/+4
|	While parsing a binary file, there may be pops that need to be fixed up even if EH is not (yet) enabled because the target features section has not been parsed yet. Previously `EHUtils::handleBlockNestedPops` did not do anything if EH was not enabled, so the binary parser would fail to fix up pops in that case. Add an optional parameter to override this behavior so the parser can fix up pops unconditionally. Fixes #7127.
*	[NFC] Rename {F32,F64}NearestInt to {F32,F64}Nearest (#7089)	Thomas Lively	2024-11-27	2	-7/+3
|	Rename the opcode values in wasm-binary.h to better match the names of the corresponding instructions. This also makes these names match the scheme used by the rest of the basic unary operations, allowing for more macro use in the binary reader.
*	Use IRBuilder in the binary parser (#6963)	Thomas Lively	2024-11-26	2	-4246/+1453
|	IRBuilder is a utility for turning arbitrary valid streams of Wasm instructions into valid Binaryen IR. It is already used in the text parser, so now use it in the binary parser as well. Since the IRBuilder API for building each instruction requires only the information that the binary and text formats include as immediates to that instruction, the parser is now much simpler than before. In particular, it does not need to manage a stack of instructions to figure out what the children of each expression should be; IRBuilder handles this instead. There are some differences between the IR constructed by IRBuilder and the IR the binary parser constructed before this change. Most importantly, IRBuilder generates better multivalue code because it avoids eagerly breaking up multivalue results into individual components that might need to be immediately reassembled into a tuple. It also parses try-delegate more correctly, allowing the delegate to target arbitrary labels, not just other `try`s. There are also a couple superficial differences in the generated label and scratch local names. As part of this change, add support for recording binary source locations in IRBuilder.
*	Make more Ifs unreachable (#7094)	Thomas Lively	2024-11-27	2	-32/+21
|	Previously the only Ifs that were typed unreachable were those in which both arms were unreachable and those in which the condition was unreachable that would have otherwise been typed none. This caused problems in IRBuilder because Ifs with unreachable conditions and value-returning arms would have concrete types, effectively hiding the unreachable condition from the logic for dropping concretely typed expressions preceding an unreachable expression when finishing a scope. Relax the conditions under which an If can be typed unreachable so that all Ifs with unreachable conditions or two unreachable arms are typed unreachable. Propagating unreachability more eagerly this way makes various optimizations of Ifs more powerful. It also requires new handling for unreachable Ifs with concretely typed arms in the Printer to ensure that printed wat remains valid. Also update Unsubtyping, Flatten, and CodeFolding to account for the newly unreachable Ifs.
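A simplified model of the new typing rule for `If`, assuming a toy three-way type classification. `Kind` and `ifType` are hypothetical; real finalization works over full Binaryen types and computes a LUB of the arm types.

```cpp
// Hypothetical three-way classification of expression types, just enough to
// express the rule; real Binaryen types are much richer.
enum class Kind { None, Concrete, Unreachable };

// Type of an If after this change (simplified): unreachable when the
// condition is unreachable, or when there are two arms and both are
// unreachable. Otherwise a one-armed If has no value, and a two-armed If
// takes the type of its reachable arm(s).
constexpr Kind ifType(Kind condition, Kind ifTrue, bool hasElse, Kind ifFalse) {
  if (condition == Kind::Unreachable) {
    return Kind::Unreachable;
  }
  if (!hasElse) {
    return Kind::None;
  }
  if (ifTrue == Kind::Unreachable) {
    return ifFalse; // also unreachable if both arms are
  }
  return ifTrue; // a real implementation would take the LUB of both arms
}
```

The first branch is the case that changed: previously an If with an unreachable condition and value-returning arms kept a concrete type.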
*	Remove AutoDrop (#7106)	Thomas Lively	2024-11-22	1	-2/+1
|	The only internal use was in wasm2js, which doesn't need it. Fix API tests to explicitly drop expressions as necessary.
*	Make validation of stale types stricter (#7097)	Thomas Lively	2024-11-21	4	-22/+11
|	We previously allowed valid expressions to have stale types as long as those stale types were supertypes of the most precise possible types for the expressions. Allowing stale types like this could mask bugs where we failed to propagate precise type information, though. Make validation stricter by requiring all expressions except for control flow structures to have the most precise possible types. Control flow structures are exempt because many passes that can refine types wrap the refined expressions in blocks with the old type to avoid the need for refinalization. This pattern would be broken and we would need to refinalize more frequently without this exception for control flow structures. Now that all non-control flow expressions must have precise types, remove functionality relating to building select instructions with non-precise types. Since finalization of selects now always calculates a LUB rather than using a provided type, remove the type parameter from BinaryenSelect in the C and JS APIs. Now that stale types are no longer valid, fix a bug in TypeSSA where it failed to refinalize module-level code. This bug previously would not have caused problems on its own, but the stale types could cause problems for later runs of Unsubtyping. Now the stale types would cause TypeSSA output to fail validation. Also fix a bug where Builder::replaceWithIdenticalType was in fact replacing with refined types. Fixes #7087.
*	Use hints when generating fresh labels in IRBuilder (#7086)	Thomas Lively	2024-11-18	1	-2/+4
|	IRBuilder often has to generate new label names for blocks and other scopes. Previously it would generate each new name by starting with "block" or "label" and incrementing a suffix until finding a fresh name, but this made name generation quadratic in the number of names to generate. To spend less time generating names, track a hint index at which to start looking for a fresh name and increment it every time a name is generated. This speeds up a version of the binary parser that uses IRBuilder by about 15%.
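A sketch of hint-based fresh-name generation, assuming a single numeric suffix and a set of already-used names. `LabelGenerator` is a hypothetical class, not IRBuilder's implementation; it shows why resuming from a hint avoids rescanning suffixes that earlier calls already consumed.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>
#include <utility>

// Hypothetical sketch of hint-based fresh-name generation; not Binaryen's code.
class LabelGenerator {
public:
  explicit LabelGenerator(std::unordered_set<std::string> used)
      : used_(std::move(used)) {}

  // Returns prefix + N for the first N >= hint_ that is not already taken.
  std::string fresh(const std::string& prefix) {
    assert(!prefix.empty() && "labels need a non-empty prefix");
    while (true) {
      // Resume from the hint instead of restarting at 0, so generating many
      // names does not rescan all previously generated suffixes.
      std::string candidate = prefix + std::to_string(hint_++);
      if (used_.insert(candidate).second) {
        return candidate;
      }
    }
  }

private:
  std::unordered_set<std::string> used_;
  unsigned hint_ = 0;
};
```

For example, with `block0` and `block2` already taken, successive calls return `block1` and then `block3`. Sharing one hint across prefixes only skips some numbers; uniqueness is still guaranteed by the set lookup.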
*	[NFC] Finalize blocks with explicit breakability in IRBuilder (#7085)	Thomas Lively	2024-11-18	1	-4/+8
|	Since IRBuilder already knows what labels are used by branches, it is easy for it to pass that information when finalizing blocks. This avoids finalization having to walk the blocks looking for branches, speeding up a future version of the binary parser that uses IRBuilder by 10%.
*	[NFC] Remove redundant [[nodiscard]] attributes (#7084)	Thomas Lively	2024-11-15	1	-1/+1
|	Now that Result and MaybeResult are annotated [[nodiscard]] at the type level, individual functions and methods that return these types do not need to be annotated [[nodiscard]] themselves. Remove the newly redundant annotations.
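A small illustration of the type-level attribute, using a hypothetical `Result` struct rather than Binaryen's actual `Result` template:

```cpp
// Hypothetical minimal result type. Because [[nodiscard]] is on the type,
// every function returning it is implicitly "must use".
struct [[nodiscard]] Result {
  bool ok;
  const char* error; // nullptr on success
};

// No [[nodiscard]] here: it would be redundant with the type-level attribute.
constexpr Result parseThing(bool succeed) {
  return succeed ? Result{true, nullptr} : Result{false, "parse error"};
}

// parseThing(true);       // typically warns: returned Result is discarded
// (void)parseThing(true); // explicit discard, no warning
```

Mainstream compilers typically warn when such a return value is ignored; casting to `void` makes the discard explicit.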
*	Reset function context when ending a function in IRBuilder (#7081)	Thomas Lively	2024-11-15	1	-0/+1
|	IRBuilder contains a pointer to the current function that is used to create scratch locals, look up the operand types for returns, etc. This pointer is nullable because IRBuilder can also be used in non-function contexts such as global initializers. Visiting the start of a function sets the function pointer, and after this change visiting the end of a function resets the pointer to null. This avoids potential problems where code outside a function would be able to incorrectly use scratch locals and returns if the IRBuilder had previously been used to build a function. This change requires some adjustments to Outlining, which visits code out of order, so ends up visiting code from inside a function after visiting the end of the function. To support this use case, add a `setFunction` method to IRBuilder that lets the user explicitly control its function context. Also remove the optional function pointer parameter to the IRBuilder constructor since it is less flexible and not used.
*	Use empty blocks instead of nops for empty scopes in IRBuilder (#7080)	Thomas Lively	2024-11-14	1	-3/+4
|	When IRBuilder builds an empty non-block scope such as a function body, an if arm, a try block, etc., it needs to produce some expression to represent the empty contents. Previously it produced a nop, but change it to produce an empty block instead. The binary writer and printer have special logic to elide empty blocks, so this produces smaller output. Update J2CLOpts to recognize functions containing empty blocks as trivial to avoid regressing one of its tests.
*	Record binary locations for nested blocks (#7078)	Thomas Lively	2024-11-14	1	-0/+20
|	The binary reader has special handling for blocks immediately nested inside other blocks to eliminate recursion while parsing very deep stacks of blocks. This special handling did not record binary locations for the nested blocks, though. Add logic to record binary locations for nested blocks. This binary reading code is about to be replaced with completely different code that uses IRBuilder instead, but this change will eliminate some test differences that we would otherwise see when we make that change.
*	[NFC] Eagerly set local names in binary reader (#7076)	Thomas Lively	2024-11-14	1	-19/+25
|	Instead of setting the local names at the end of binary reading, eagerly set them before parsing function bodies. This is NFC now, but will fix a future bug once the binary reader uses IRBuilder. IRBuilder can introduce new scratch locals, and it gives them the names `$scratch`, `$scratch_1`, etc. If the name section includes locals with the same names and we set those local names after parsing function bodies, then we can end up with multiple locals with the same names. Setting the names before parsing the function bodies ensures that IRBuilder will generate different names for the scratch locals. The alternative fix would be to generate fresh names when setting names from the name section, but it is better to respect the names in the name section and use fresh names for the newly introduced scratch locals instead.
*	Fixup pops when necessary in IRBuilder (#7075)	Thomas Lively	2024-11-13	1	-3/+13
|	IRBuilder introduces scratch locals to hoist values from underneath stacky code to the top of the stack for consumption by the next instruction. When it does so, the sequence of instructions from the set to the get of the scratch local is packaged in a block so the entire sequence can be made a child of the next instruction. In cases where the hoisted value comes from a `pop`, this packaging can make the IR invalid, since `pop`s are not allowed to appear inside blocks. Detect when this problem might occur and fix it by running `EHUtils::handleBlockNestedPops` after the function containing the problem has been constructed.
*	Read the names section first (#7074)	Thomas Lively	2024-11-13	1	-326/+325
|	Rather than back-patching names when we get to the names section in the binary reader, skip ahead to read the names section before anything else so we can use the final names right away. This is a prerequisite for using IRBuilder in the binary reader. The only functional change is that we now allow empty local names. Empty names are perfectly valid.
*	Rename indexType -> addressType. NFC (#7060)	Sam Clegg	2024-11-07	2	-35/+37
|	See https://github.com/WebAssembly/memory64/pull/92
*	Remove FeaturePrefix::FeatureRequired (NFC) (#7034)	Heejin Ahn	2024-11-04	1	-6/+2
|	This has not been emitted in LLVM since https://github.com/llvm/llvm-project/commit/3f34e1b883351c7d98426b084386a7aa762aa366. The corresponding proposed tool-conventions change: https://github.com/WebAssembly/tool-conventions/pull/236
*	Module splitting: don't create new tables when splitting with Emscripten (#7050)	Derek Schuff	2024-11-02	1	-1/+1
|	Emscripten's JS loader code for wasm-split isn't prepared for handling multiple tables that binaryen automatically creates when reference types are enabled (especially in conjunction with dynamic loading). For now, disable creation of multiple tables when using Emscripten's table ABI (distinguished by importing or exporting a table named "__indirect_function_table").
*	Require reference-types in addition to bulk-memory for table.fill (#7040)	daxpedda	2024-10-31	1	-2/+4
|	table.fill was introduced by the reference-types proposal (but it also only makes sense alongside the other bulk memory operations, so require both).
*	Remove closed world validation checks (#7019)	Alon Zakai	2024-10-18	1	-50/+1
|	These were added to avoid common problems with closed world mode, but in practice they are causing more harm than good, forcing users to work around them. In the meantime (until #6965), remove this validation to unblock current toolchain makers. Fix GlobalTypeOptimization and AbstractTypeRefining on issues that this uncovers: without this validation, it is possible to run them on more wasm files than before, hence these were not previously detected. They are bundled in this PR because their tests cannot validate before this PR.
*	[EH][GC] Send a non-nullable exnref from TryTable (#7013)	Alon Zakai	2024-10-17	3	-5/+11
|	When EH+GC are enabled then wasm has non-nullable types, and the sent exnref should be non-nullable. In Binaryen IR we use the non-nullable type all the time, which we also do for function references and other things; we lower it if GC is not enabled to a nullable type for the binary format (see `WasmBinaryWriter::writeType`, to which comments were added in this PR). That is, this PR makes us handle exnref the same as those other types. A new test verifies that behavior. Various existing tests are updated because ReFinalize will now use the more refined type, so this is an optimization. It is also a bugfix as in #6987 we started to emit the refined form in the fuzzer, and this PR makes us handle it properly in validation and ReFinalization.
*	Optimize Module::get_* family of functions with std::string_view in getModuleElement (#6998)	Petr Makhnev	2024-10-10	1	-1/+1
|	Passing a constant string to functions that take a std::string requires constructing a temporary string, which can allocate memory, and allocation is inherently slow. Since we are using C++17, we can use string_view and remove this unnecessary allocation. Although the code seems simple enough for the optimizer to remove this allocation after inlining, tests on Clang 18 show that this is not the case (on Apple Silicon at least).
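A hedged sketch of the general technique: a map keyed by `std::string` with the transparent comparator `std::less<>` accepts `std::string_view` keys in `find`, so a lookup with a string literal need not allocate. `NamedElements` is a hypothetical container, not Binaryen's `Module`, and the commit itself may apply `std::string_view` differently.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <string_view>
#include <utility>

// Hypothetical name-keyed table. The transparent comparator std::less<>
// enables heterogeneous lookup, so get() can search with a std::string_view
// without building a temporary std::string.
template <typename T> class NamedElements {
public:
  void add(std::string name, T value) {
    bool inserted = elems_.emplace(std::move(name), std::move(value)).second;
    assert(inserted && "duplicate element name");
    (void)inserted; // avoid an unused-variable warning when NDEBUG is set
  }

  // A string literal argument converts to std::string_view; no allocation.
  const T* get(std::string_view name) const {
    auto it = elems_.find(name);
    return it == elems_.end() ? nullptr : &it->second;
  }

private:
  std::map<std::string, T, std::less<>> elems_;
};
```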
*	Source Maps: Support 5 segment mappings (#6795)	Ömer Sinan Ağacan	2024-10-01	2	-14/+65
|	Support 5-segment source mappings, which add a name. Reference: https://github.com/tc39/source-map/blob/main/source-map-rev3.md#proposed-format
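A sketch of decoding one segment of a source map `mappings` string, assuming the Base64 VLQ encoding from the Source Map Revision 3 format linked above; a fifth field, when present, is the relative index into `names`. `base64Value`, `decodeSegment`, and `checkDecodeExamples` are hypothetical helpers, not Binaryen's reader, and the 30-bit value limit is a simplification of this sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string_view>
#include <vector>

// Maps a Base64 character to its 6-bit value, or -1 if it is not Base64.
inline int base64Value(char c) {
  if (c >= 'A' && c <= 'Z') return c - 'A';
  if (c >= 'a' && c <= 'z') return c - 'a' + 26;
  if (c >= '0' && c <= '9') return c - '0' + 52;
  if (c == '+') return 62;
  if (c == '/') return 63;
  return -1;
}

// Decodes one segment (the text between commas) into its VLQ fields.
// Valid segments have 1, 4, or 5 fields.
inline std::optional<std::vector<std::int32_t>>
decodeSegment(std::string_view segment) {
  std::vector<std::int32_t> fields;
  std::uint32_t value = 0;
  int shift = 0;
  for (char c : segment) {
    int digit = base64Value(c);
    if (digit < 0 || shift > 25) {
      return std::nullopt; // bad character, or value too large for this sketch
    }
    value |= static_cast<std::uint32_t>(digit & 0x1f) << shift;
    if (digit & 0x20) { // continuation bit: more digits follow
      shift += 5;
      continue;
    }
    // The lowest bit of the assembled value is the sign bit.
    auto magnitude = static_cast<std::int32_t>(value >> 1);
    fields.push_back((value & 1) ? -magnitude : magnitude);
    value = 0;
    shift = 0;
  }
  if (shift != 0) {
    return std::nullopt; // segment ended in the middle of a value
  }
  if (fields.size() != 1 && fields.size() != 4 && fields.size() != 5) {
    return std::nullopt;
  }
  return fields;
}

// Worked examples: "AAAAA" is five zero fields; a 2-field segment is invalid.
inline void checkDecodeExamples() {
  assert(decodeSegment("AAAAA") == std::vector<std::int32_t>(5, 0));
  assert(!decodeSegment("AA"));
}
```

In `"gB"`, for instance, `g` (32) carries only the continuation bit and `B` contributes 1 << 5, giving 32, which decodes to +16 after removing the sign bit.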
*	[NFC] Move a TypeInfo constructor out of a header (#6979)	Alon Zakai	2024-10-01	1	-0/+2
|	Some versions of libcxx or clang error without this, apparently due to Type being a forward declaration.
*	Binary parser: Lift the limit on the number of locals (#6973)	Jérôme Vouillon	2024-09-30	1	-6/+14
|	This raises the number of locals accepted by the binary parser to the absolute limit in the spec. A warning is now printed when writing a binary file if the Web limit of 50,000 locals is exceeded. Fixes #6968.
*	[FP16] Implement conversion operations. (#6974)	Brendan Dahl	2024-09-26	5	-0/+83
|	Note: FP16 is a little different from F32/F64 since it can't represent the full 2^16 integer range. 65504 is the max whole integer. This leads to some slightly strange behavior when converting integers greater than 65504 since they become infinity. Specified at https://github.com/WebAssembly/half-precision/blob/main/proposals/half-precision/Overview.md
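The 65504 figure and the overflow behavior follow from the binary16 format (5 exponent bits, 10 fraction bits). A worked version of the arithmetic, assuming IEEE 754 round-to-nearest-even; the constant and function names are hypothetical.

```cpp
// Largest finite binary16 value: (2 - 2^-10) * 2^15.
constexpr double kF16Max = (2.0 - 1.0 / 1024.0) * 32768.0;

// Adjacent halves in the top binade [2^15, 2^16) are 2^(15 - 10) = 32 apart.
constexpr double kF16TopUlp = 32.0;

// With round-to-nearest-even, a value overflows to infinity once it reaches
// kF16Max + ulp / 2. At exactly that point it ties between 65504 (odd
// significand) and 2^16 (even), so it rounds to 2^16, i.e. infinity.
constexpr double kF16OverflowThreshold = kF16Max + kF16TopUlp / 2;

// True if converting a non-negative value to f16 yields +infinity.
constexpr bool overflowsF16(double x) { return x >= kF16OverflowThreshold; }
```

Under these assumptions, integers from 65505 through 65519 still round down to 65504; only values of 65520 or more become infinity.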
*	[NFC] Eagerly create segments when parsing datacount (#6958)	Thomas Lively	2024-09-19	1	-3/+21
|	The purpose of the datacount section is to pre-declare how many data segments there will be so that engines can allocate space for them and not have to back patch subsequent instructions in the code section that refer to them. Once we use IRBuilder in the binary parser, we will have to have the data segments available by the time we parse instructions that use them, so eagerly construct the data segments when parsing the datacount section.
*	[NFC] Eagerly create Functions in binary parser (#6957)	Thomas Lively	2024-09-19	1	-11/+11
|	In preparation for using IRBuilder in the binary parser, eagerly create Functions when parsing the function section so that they are already created once we parse the code section. IRBuilder will require the functions to exist when parsing calls so it can figure out what type each call should have, even when there is a call to a function whose body has not been parsed yet.
*	Improve types for null accesses and remove hacks (#6954)	Thomas Lively	2024-09-18	1	-3/+31
|	When a struct.get or array.get is optimized to have a null reference operand, its return type loses meaning since the operation will always trap. Previously when refinalizing such expressions, we just left their return type unchanged since there was no longer an associated struct or array type to calculate it from. However, this could lead to a strange setup where the stale return type was the last remaining use of some heap type in the module. That heap type would never be emitted in the binary, but it was still used in the IR, so type optimizations would have to keep updating it. Our type collecting logic went out of its way to include the return types of struct.get and array.get expressions to account for this strange possibility, even though it otherwise collected only types that would appear in binaries. In principle, all of this should have applied to `call_ref` as well, but the type collection logic did not have the necessary special case, so there was probably a latent bug there. Get rid of these special cases in the type collection logic and make it impossible for the IR to use a stale type that no longer appears in the binary by updating such stale types during finalization. One possibility would have been to make the return types of null accessors unreachable, but this violates the usual invariant that unreachable instructions must either have unreachable children or be branches or `(unreachable)`. Instead, refine the return types to be uninhabitable non-nullable references to bottom, which is nearly as good as refining them directly to unreachable. We can consider refining them to `unreachable` in the future, but another problem with that is that it would currently allow the parsers to admit more invalid modules with arbitrary junk after null accessor instructions.
*	[NFC] Make the GCData constructor a move constructor (#6946)	Alon Zakai	2024-09-17	1	-1/+1
|	This avoids creating a large Literals object (a SmallVector of Literal) and then copying it. All the places that construct GCData do not need the Literals afterwards. This gives a 7% speedup on the --precompute benchmark from #6931
*	[NFC] Move enough of wasm-type.cpp into wasm-type.h to inline core is*() methods (#6936)	Alon Zakai	2024-09-16	1	-112/+12
|	This just moves code around. As a result, isRef() vanishes entirely from the profiling traces in #6931, since now the core isRef/Tuple/etc. methods are all inlineable. This also required some reordering of wasm-type.h, namely to move HeapType up front. No changes to that class otherwise. TypeInfo is now in the header. getTypeInfo is now a static method on Type. This has the downside of moving internal details into the header, and it may increase compile time a little. The upside is making the --precompute benchmark from #6931 significantly faster, 33%, and it will also help the many Type::isNonNullable() etc. calls we have scattered around the codebase in other passes too.
*	Remove open "ignorable public" array types (#6940)	Thomas Lively	2024-09-16	1	-9/+1
|	There are a few heap types that are hard-coded to be considered public and therefore allowed on module boundaries even in --closed-world mode, specifically to support js-string-builtins. We previously considered both open and closed (i.e. final) mutable i8 arrays to be public in this manner, but js-string-builtins only uses the closed versions, so remove the open versions. This fixes a particular bug in which Unsubtyping optimized a private array type to be equivalent to an ignorable public array type, incorrectly changing the behavior of a cast, but it does not address the larger problem of optimizations producing types that are equivalent to public types. Add a TODO about that problem for now. Fixes #6935.
*	[NFC] Remove excessive debug logging from binary reading (#6927)	Alon Zakai	2024-09-10	1	-177/+6
|	We were emitting debug logging for every LEB byte. It turns out that the isDebugEnabled() calls are expensive when called so frequently: in a release+assertion build, even with debug disabled, these checks are the top item in the profile. This PR removes the checks, which makes binary reading 12% faster.
*	Add a --preserve-type-order option (#6916)	Thomas Lively	2024-09-10	1	-1/+6
|	Unlike other module elements, types are not stored on the `Module`. Instead, they are collected by traversing the IR before printing and binary writing. The code that collects the types tries to optimize the order of rec groups based on the number of times each type is used. As a result, the output order of types generally has no relation to the input order of types. In addition, most type optimizations rewrite the types into a single large rec group, and the order of types in that group is essentially arbitrary. Changes to the code for counting type uses, sorting types, or sorting rec groups can yield very large changes in the output order of types, producing test diffs that are hard to review and potentially harming the readability of tests by moving output types away from the corresponding input types. To help make test output more stable and readable, introduce a tool option that causes the order of output types to match the order of input types as closely as possible. It is implemented by having the parsers record the indices of the input types on the `Module` just like they already record the type names. The `GlobalTypeRewriter` infrastructure used by type optimizations associates the new types with the old indices just like it already does for names and also respects the input order when rewriting types into a large recursion group. By default, wasm-opt and other tools clear the recorded type indices after parsing the module, so their default behavior is not modified by this change. Follow-on PRs will use the new flag in more tests, which will generate large diffs but leave the tests in stable, more readable states that will no longer change due to other changes to the optimizing type sorting logic.
*	[NFC] LazyLocalGraph: Add getSetInfluences() (#6909)	Alon Zakai	2024-09-09	1	-8/+8
|	This new API on lazy local graphs allows us to use laziness in another place, StackIR opts. This makes writing the binary (which includes StackIR opts when those are enabled) 10% faster.
*	[FP16] Fix replace lane for F16x8. (#6906)	Brendan Dahl	2024-09-06	1	-1/+4
|	Before this change, replace lane was converting all the F16 lanes to F32 and then replacing one lane with the F16 (I32 representation) value, but it did not then convert all the other lanes back to F16 (I32). To fix this we can just leave the lanes as I32 and replace the one lane. Note: Previous replace lane tests didn't catch this since they started with vectors with all zeros so the F32->I32 didn't matter. Also, other operations don't run into this issue since they iterate over all lanes and convert the F32's back to F16 (I32). Co-authored-by: Alon Zakai <alonzakai@gmail.com>
*	[EH] Rename Catch(All)_P3 to Catch(All)_Legacy (NFC) (#6901)	Heejin Ahn	2024-09-04	2	-8/+9
|	This renames `Catch(All)_P3` enum to denote the old Phase 3 `catch(_all)` instructions to `Catch(All)_Legacy`, which sounds clearer. This is also to be consistent with https://github.com/llvm/llvm-project/pull/107187.
*	[NFC] Convert LocalGraph influences accesses to function calls (#6899)	Alon Zakai	2024-09-04	1	-1/+1
|	This replaces direct access of the data structure graph.influences[foo] with a call graph.getinfluences(foo). This will allow a later PR to make those calls optionally lazy.
*	[FP16] Implement madd and nmadd. (#6878)	Brendan Dahl	2024-09-03	3	-6/+35
|	Specified at https://github.com/WebAssembly/half-precision/blob/main/proposals/half-precision/Overview.md A few notes: - The F32x4 and F64x2 versions of madd and nmadd are missing spec tests. - For madd, the implementation was incorrectly doing `(b*c)+a` where it should be `(a*b)+c`. - For nmadd, the implementation was incorrectly doing `(-b*c)+a` where it should be `-(a*b)+c`. - There doesn't appear to be a great way to actually implement a fused nmadd, but the spec allows the double rounded version I added.
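A sketch of the corrected operand order, using `float` in place of an f16 lane since standard C++ before C++23 has no half-precision type (the function names are hypothetical). `std::fma` gives the single-rounding fused result; the unfused variants round twice, which the message notes the spec permits.

```cpp
#include <cassert>
#include <cmath>

//   madd(a, b, c)  = (a * b) + c
//   nmadd(a, b, c) = -(a * b) + c

// Fused versions: a single rounding, via std::fma. Negating a is exact,
// so fma(-a, b, c) computes -(a * b) + c with one rounding.
inline float maddFused(float a, float b, float c) { return std::fma(a, b, c); }
inline float nmaddFused(float a, float b, float c) { return std::fma(-a, b, c); }

// Unfused versions: the product is rounded, then the sum (double rounding).
// Some compilers may contract these into an fma unless -ffp-contract=off.
inline float maddUnfused(float a, float b, float c) { return a * b + c; }
inline float nmaddUnfused(float a, float b, float c) { return -(a * b) + c; }

// Exact small values distinguish the fixed operand order from the old bug:
// (2 * 3) + 10 = 16, whereas the buggy (3 * 10) + 2 would give 32.
inline void checkExamples() {
  assert(maddFused(2.0f, 3.0f, 10.0f) == 16.0f);
  assert(nmaddFused(2.0f, 3.0f, 10.0f) == 4.0f);
  assert(maddUnfused(2.0f, 3.0f, 10.0f) == 16.0f);
  assert(nmaddUnfused(2.0f, 3.0f, 10.0f) == 4.0f);
}
```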
*	[NFC] Refactor LocalGraph's core getSets API (#6877)	Alon Zakai	2024-08-28	1	-1/+1
|	Before, we just had a map that people would access with localGraph.getSetses[get], while now it is a call localGraph.getSets(get), which more nicely hides the internal implementation details. Also rename getSetses => getSetsMap. This will allow a later PR to optimize the internals of this API. This is performance-neutral as far as I can measure. (We do replace a direct read from a data structure with a call, but the call is in a header and should always get inlined.)
*	Rename relaxed SIMD fma instructions to match spec. (#6876)	Brendan Dahl	2024-08-27	3	-30/+34
|	The instructions relaxed_fma and relaxed_fnma have been renamed to relaxed_madd and relaxed_nmadd. https://github.com/WebAssembly/relaxed-simd/blob/main/proposals/relaxed-simd/Overview.md#binary-format
*	[FP16] Implement unary operations. (#6867)	Brendan Dahl	2024-08-27	5	-7/+97
|	Specified at https://github.com/WebAssembly/half-precision/blob/main/proposals/half-precision/Overview.md