forks/binaryen.git -

	Commit message	Author	Age	Files	Lines
...
*	[WasmGC] OptimizeInstructions: Reorder externalize/internalize operations with ref.as_non_null (#7004)	Alon Zakai	2024-10-14	1	-5/+34
	    (any.convert_extern/extern.convert_any (ref.as_non_null ..))
	=>
	    (ref.as_non_null (any.convert_extern/extern.convert_any ..))
	This then allows the RefAsNonNull to be combined with parents in some cases (whereas the reverse order allows nothing).
*	[Wasm EH] Optimize values flowing out of TryTable (#6997)	Alon Zakai	2024-10-10	1	-21/+47
	This allows
	    (block $out (result i32)
	      (try_table (catch..)
	        ..
	        (br $out
	          (i32.const 42)
	        )
	      )
	    )
	=>
	    (block $out (result i32)
	      (try_table (result i32) (catch..) ;; add a result
	        ..
	        (i32.const 42) ;; remove the br around the value
	      )
	    )
*	ReFinalize in MergeBlocks so we can optimize unreachable instructions too (#6994)	Alon Zakai	2024-10-10	2	-11/+5
	In #6984 we optimized dropped blocks even if they had unreachable code. In #6988 that part was reverted, and blocks with unreachable code were ignored once more. However, I realized that the check was not actually for unreachable code, but for having an unreachable child, so it would miss things like this:
	    (block
	      (block
	        ..
	        (br $somewhere) ;; unreachable type, but no unreachable code
	      )
	    )
	But it is useful to merge such blocks: we don't need the inner block here.
	To fix this, just run ReFinalize if we change anything, which will propagate unreachability as needed. I think MergeBlocks was written before we had that utility, so it didn't use it.
	This is not only useful on its own but will also unblock an EH optimization in a later PR that has code in this form. It also simplifies the code by removing the hasUnreachableChild checks.
*	Fix BranchUtils::operateOnScopeNameUsesAndSentValues() on BrOn (#6995)	Alon Zakai	2024-10-10	1	-9/+54
	BrOn does not always send a value. This is an odd asymmetry in the wasm spec, where br_on_null does not send the null on the branch (which makes sense, but the asymmetry does mean we need to special-case it).
*	Fix flow reset during throw => break opts in RemoveUnusedBrs (#6993)	Alon Zakai	2024-10-08	1	-0/+40
	#6980 was missing the logic to reset flows after replacing a throw. Replacing the throw introduces new code, in particular a drop, which blocks branches from flowing to their targets. In the testcase here, the br was turned into a nop before this fix.
*	Fix a misoptimization with mixed Try/TryTable in RemoveUnusedBrs (#6991)	Alon Zakai	2024-10-07	1	-0/+34
	We ignored legacy Trys in #6980, but they can also catch.
*	Fix a fuzz issue with #6984 (#6988)	Alon Zakai	2024-10-07	2	-5/+11
	When I refactored the optimizeDroppedBlock logic in #6982, I didn't move the unreachability check along with that code, which was wrong. When that function was called from another place in #6984, the fuzzer found an issue. The diff is smaller when ignoring whitespace.
	This reverts almost all the test updates from #6984: those changes were on blocks with unreachable children. The change was safe on them, but in general removing a block value in the presence of unreachable code is tricky, so it's best to avoid it.
	The testcase is a little bizarre, but it's the one the fuzzer found, and I can't find a way to generate a better one (other than reducing it, which I did).
*	MergeBlocks: Optimize all dropped blocks (#6984)	Alon Zakai	2024-10-04	3	-12/+34
	Just call optimizeDroppedBlock from visitDrop to handle that. Followup to #6982. This optimizes the new testcase added there. Some older tests also improve.
*	RemoveUnusedBrs: Generalize jump threading optimizations to all branches (#6983)	Alon Zakai	2024-10-04	3	-1/+81
	This change is NFC on all things we previously optimized, but also makes us optimize TryTable, BrOn, etc., by replacing hard-coded logic for Break with generic code. Also simplify the code there a little: we didn't really need ControlFlowWalker.
*	[NFC] Refactor out the dropped-block optimization code in MergeBlocks (#6982)	Alon Zakai	2024-10-03	1	-3/+24
	This just moves the code out into a function. A later PR will use it in another place. Add a test that shows the motivation for that later PR: we fail to optimize away a block return value at the top level of a function. Fixing that will involve calling the new function here from another place.
*	[Wasm EH] Optimize throws caught by TryTable into breaks (#6980)	Alon Zakai	2024-10-03	2	-2/+328
	E.g.
	    (try_table (catch_all $catch)
	      (throw $e)
	    )
	=>
	    (try_table (catch_all $catch)
	      (br $catch)
	    )
	This can then allow other passes to remove the TryTable, if no throwing things remain.
*	Source Maps: Support 5 segment mappings (#6795)	Ömer Sinan Ağacan	2024-10-01	5	-21/+74
	Support 5-segment source mappings, which add a name. Reference: https://github.com/tc39/source-map/blob/main/source-map-rev3.md#proposed-format
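For context: each comma-separated segment of a source map's `mappings` string is a run of Base64 VLQ numbers, and the format allows 1, 4, or 5 of them per segment, the fifth being an index into `names`. The Python below is a minimal decoder sketch for illustration only; `decode_segment` is a hypothetical name, this is not Binaryen's C++ parser, and it returns the raw field values, which in the format are still deltas relative to earlier segments.

```python
# Base64 alphabet used by source-map VLQ encoding.
B64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def decode_segment(segment: str) -> list[int]:
    """Decode one mappings segment into its 1, 4, or 5 (delta) fields."""
    values = []
    value = 0
    shift = 0
    for ch in segment:
        digit = B64.index(ch)
        value += (digit & 0x1F) << shift  # low 5 bits carry data
        if digit & 0x20:                  # continuation bit: more digits follow
            shift += 5
            continue
        # The lowest bit of the assembled number is the sign.
        values.append(-(value >> 1) if value & 1 else value >> 1)
        value = 0
        shift = 0
    if shift != 0:
        raise ValueError("truncated VLQ number")
    if len(values) not in (1, 4, 5):
        raise ValueError(f"unexpected segment length {len(values)}")
    return values
```

With 5 fields, the order is: generated column, source index, original line, original column, name index.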
*	Fix the type of reused RefFunc in Precompute (#6976)	Alon Zakai	2024-09-30	1	-3/+32
	When we precompute something, we try to avoid allocating a new copy. That's important to avoid many allocations each time we run Precompute: otherwise, each time we saw a br we'd allocate a fresh one, plus fresh ones for its values. But we had a bug where we reused a RefFunc as the value of a br without updating its type. It's actually tricky to reach a situation where the RefFunc we find to reuse differs from the one we want, but the fuzzer found one. Fixes the fuzz bug reported on #6845 (but unrelated to that PR).
*	[FP16] Implement conversion operations. (#6974)	Brendan Dahl	2024-09-26	1	-0/+87
	Note: FP16 is a little different from F32/F64 since it can't represent the full 2^16 integer range: 65504 is its largest finite value. This leads to some slightly strange behavior when converting integers greater than 65504: under round-to-nearest, 65505 through 65519 round down to 65504, and integers from 65520 upward become infinity. Specified at https://github.com/WebAssembly/half-precision/blob/main/proposals/half-precision/Overview.md
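To make the rounding concrete: Python's `struct` module can pack a float into IEEE binary16 (format code `e`) with round-half-to-even, and raises `OverflowError` when the rounded value would not be finite. The sketch below (the helper name `to_f16` is hypothetical) maps that overflow to a signed infinity, which is an assumption matching IEEE 754 round-to-nearest conversion; it models the semantics and is not Binaryen's implementation.

```python
import math
import struct


def to_f16(x: float) -> float:
    """Round x to the nearest binary16 value (ties to even).

    struct raises OverflowError when the rounded result is not finite;
    we map that to a signed infinity, as IEEE 754 conversion does.
    """
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        return math.copysign(math.inf, x)
```

For example, `to_f16(65519.0)` gives `65504.0` while `to_f16(65520.0)` gives `inf`, because 65520 is exactly halfway between 65504 and 2^16 and the tie goes to the even significand.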
*	[NFC-ish] Stop creating unneeded blocks around calls when inlining (#6969)	Alon Zakai	2024-09-26	9	-1760/+1430
	Inlining was careful about nested calls like this:
	    (call $a
	      (call $b)
	    )
	If we inlined the outer call first, we'd have
	    (block $inlined-code-from-a
	      ..code..
	      (call $b)
	    )
	After that, the inner call is a child of a block, not of a call. That is, we've moved the inner call to another parent. To replace that inner call when we inline it, we'd need to update the new parent, which would require work. To avoid that work, the pass simply created a block in the middle:
	    (call $a
	      (block
	        (call $b)
	      )
	    )
	Now the inner call's immediate parent will not change when we inline the outer call.
	However, it turns out that this was entirely unnecessary. We find the calls using a post-order traversal, and we store the actions in a vector that we traverse in order, so we only ever process things in the optimal order of children before parents. And in that order there is no problem: inlining the inner call first leads to
	    (call $a
	      (block $inlined-code-from-b
	        (..code..)
	      )
	    )
	That does not affect the outer call's parent.
	This PR removes the creation of the unnecessary blocks. This doesn't improve the final output, as optimizations remove the unneeded blocks later anyhow, but it does make the code simpler and a little faster. It also makes debugging less confusing. But this is not truly NFC, because --inlining (but not --inlining-optimizing) will actually emit fewer blocks now (only --inlining-optimizing is used by default in production).
	The diff on tests here is very small when ignoring whitespace. The remaining differences are just emitting fewer obviously-unneeded blocks. There is also one test that needed manual changes, inlining-eh-legacy, because it tested that we do Pop fixups, but after emitting one fewer block, those fixups were not needed. I added a new test there with two nested calls, which does end up needing those fixups. I also added such a test in inlining_all-features so that we have coverage for such nested calls (we might remove the eh-legacy file some day, and the other existing tests with nested calls that I found were more complex).
*	[NFC-ish] Avoid repeated ReFinalize etc. when inlining (#6967)	Alon Zakai	2024-09-24	2	-25/+24
	We may inline multiple times into a single function. Previously, if we did so, we did the "fixups" such as ReFinalize and non-nullable local fixes once per such inlining. But that is wasteful, as each ReFinalize etc. scans the whole function, and the fixups could instead be done once after we copy all the code from all the inlinings, which is what this PR does: it splits doInlining() into one function that inlines code and one that does the updates afterwards, and the update is done after all inlinings.
	This turns out to be very important: a 5x speedup on two large real-world wasm files I am looking at. The reason is that we actually inline more than once in half the cases, and sometimes far more: in one case we inline over 1,000 times into a function (and ReFinalized 1,000 times too many)!
	This is practically NFC, but it turns out that there are some tiny noticeable differences between running ReFinalize once at the end vs. once after each inlining. These differences are not really functional or observable in the behavior of the code, and optimizations would remove them anyhow, but they are noticeable in two tests here. The changes to tests are, in order:
	  * Different block names, just because the counter we use sees more things.
	  * In a testcase with unreachable code, we inline twice into a function, and the first inlining brings in an unreachable, and ReFinalizing early will lead to it propagating differently than if we wait to ReFinalize. (It actually leads to another cycle of inlining in that case, as a fluke.)
*	[NFC] Eagerly create segments when parsing datacount (#6958)	Thomas Lively	2024-09-19	2	-0/+9
	The purpose of the datacount section is to pre-declare how many data segments there will be so that engines can allocate space for them and not have to backpatch subsequent instructions in the code section that refer to them. Once we use IRBuilder in the binary parser, we will have to have the data segments available by the time we parse instructions that use them, so eagerly construct the data segments when parsing the datacount section.
*	Improve types for null accesses and remove hacks (#6954)	Thomas Lively	2024-09-18	1	-1/+1
	When a struct.get or array.get is optimized to have a null reference operand, its return type loses meaning since the operation will always trap. Previously, when refinalizing such expressions, we just left their return type unchanged since there was no longer an associated struct or array type to calculate it from. However, this could lead to a strange setup where the stale return type was the last remaining use of some heap type in the module. That heap type would never be emitted in the binary, but it was still used in the IR, so type optimizations would have to keep updating it.
	Our type collecting logic went out of its way to include the return types of struct.get and array.get expressions to account for this strange possibility, even though it otherwise collected only types that would appear in binaries. In principle, all of this should have applied to `call_ref` as well, but the type collection logic did not have the necessary special case, so there was probably a latent bug there.
	Get rid of these special cases in the type collection logic and make it impossible for the IR to use a stale type that no longer appears in the binary by updating such stale types during finalization.
	One possibility would have been to make the return types of null accessors unreachable, but this violates the usual invariant that unreachable instructions must either have unreachable children or be branches or `(unreachable)`. Instead, refine the return types to be uninhabitable non-nullable references to bottom, which is nearly as good as refining them directly to unreachable. We can consider refining them to `unreachable` in the future, but another problem with that is that it would currently allow the parsers to admit more invalid modules with arbitrary junk after null accessor instructions.
*	[wasm-split] Minimize non-function export names (#6951)	Thomas Lively	2024-09-17	2	-13/+13
	The module splitting utility has a configuration option for minimizing new export names, but it was previously applied only to newly exported functions. Using the new multi-split mode can produce lots of exported tables, and splitting WasmGC programs can produce lots of exported globals, so minimizing these export names can have a big impact on code size.
*	[wasm-split] Configure split functions rather than kept functions (#6949)	Thomas Lively	2024-09-17	1	-53/+49
	The configuration for the module splitting utility previously took a set of functions to keep in the primary module. Change it to take a list of functions to split into the secondary module instead. This improves the code quality in multi-split mode because it keeps stub functions generated by previous splits from being moved into secondary modules during later splits.
*	[wasm-split] Simplify handling of --keep-funcs and --split-funcs (#6948)	Thomas Lively	2024-09-17	3	-8/+7
	Maintain the invariant that every defined function belongs to either the set of kept functions or the set of split functions. Functions are kept by default except when --keep-funcs is specified without --split-funcs on the command line. This is mostly NFC except that it changes the default behavior when no arguments are specified on the command line to keep all functions. This will simplify a follow-on PR that switches from passing the kept functions to the module splitting utility to passing the split functions.
*	Fix selects of packed fields in GlobalStructOptimization (#6947)	Alon Zakai	2024-09-17	1	-0/+49
	We emit a select between two objects when only two objects of a particular type exist. However, if the field is packed, we did not handle truncating the written values.
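A small model of why truncation matters, assuming an i8 field: a real struct.set keeps only the low 8 bits, and struct.get_s / struct.get_u then sign- or zero-extend them, so a select that picks between the raw written values would return the wrong result for out-of-range writes. The helper names below are hypothetical, wasm i32 values are modeled as Python ints, and the select condition (a reference comparison in the real pass) is reduced to a boolean.

```python
def store_i8(value: int) -> int:
    """struct.set on an i8 field keeps only the low 8 bits of the operand."""
    return value & 0xFF


def load_i8(stored: int, signed: bool) -> int:
    """struct.get_s sign-extends the stored byte; struct.get_u zero-extends it."""
    if signed and stored & 0x80:
        return stored - 0x100
    return stored


def select_get(ref_is_first: bool, first_written: int, second_written: int,
               signed: bool) -> int:
    """Model of replacing a struct.get with a select between the values written
    into the only two objects of the type: the chosen value must go through the
    same truncation and extension that a real store and load would apply."""
    chosen = first_written if ref_is_first else second_written
    return load_i8(store_i8(chosen), signed)
```

Without the `store_i8` step, reading back a write of `0x1FF` would yield `0x1FF` instead of `0xFF` (or `-1` for a signed get).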
*	[wasm-split] Run RemoveUnusedElements on secondary modules (#6945)	Thomas Lively	2024-09-17	11	-23/+81
	Rather than analyze what module elements from the primary module a secondary module will need, the splitting logic conservatively imports all module elements from the primary module into the secondary module. Run RemoveUnusedElements on the secondary module to remove any of these imports that happen to be unnecessary. Leave a TODO mentioning the possibility of being more selective about which module elements get exported to reduce code size in the primary module, too.
*	[wasm-split] Add a multi-split mode (#6943)	Thomas Lively	2024-09-16	3	-3/+248
	Add a mode that splits a module into arbitrarily many parts based on a simple manifest file. This is currently implemented by splitting out one module at a time in a loop, but this could change in the future if splitting out all the modules at once would improve the quality of the output.
*	Require string-style identifiers to be UTF-8 (#6941)	Thomas Lively	2024-09-16	4	-32/+46
	In the WebAssembly text format, strings can generally be arbitrary bytes, but identifiers must be valid UTF-8. Check for UTF-8 validity when parsing string-style identifiers in the lexer. Update StringLowering to generate valid UTF-8 global names even for strings that may not be valid UTF-8 and test that text round tripping works correctly after StringLowering. Fixes #6937.
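A minimal Python sketch of both halves of this change: a UTF-8 validity check for identifier bytes, and a naming scheme that stays valid UTF-8 even when the string contents are not. The function names, the `string.const_` prefix, and the `\hh` byte escaping are hypothetical choices for illustration, not necessarily what StringLowering emits; a real implementation would also have to keep the generated names unique.

```python
def is_valid_utf8(raw: bytes) -> bool:
    """True if raw is well-formed UTF-8 (Python's strict codec also rejects
    overlong forms and encoded surrogates)."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def global_name_for_string(raw: bytes) -> str:
    """Hypothetical naming scheme: keep valid UTF-8 text as-is, otherwise
    escape every byte as \\hh so the resulting name is always valid UTF-8."""
    if is_valid_utf8(raw):
        return "string.const_" + raw.decode("utf-8")
    return "string.const_" + "".join(f"\\{b:02x}" for b in raw)
```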
*	[wasm-split] Add an option to skip importing placeholders (#6942)	Thomas Lively	2024-09-16	2	-0/+59
	Wasm-split generally assumes that calls to secondary functions made before the secondary module has been loaded and instantiated should go to imported placeholder functions that can be responsible for loading the secondary module and forwarding the call to the loaded function. That scheme makes the loading entirely transparent from the application's point of view, which is not always a good thing. Other schemes would make it impossible for a secondary function to be called before the secondary module has been explicitly loaded, in which case the placeholder functions would never be called. To improve code size and simplify instantiation under these schemes, add a new `--no-placeholders` option that skips adding imported placeholder functions.
*	Remove open "ignorable public" array types (#6940)	Thomas Lively	2024-09-16	1	-0/+32
	There are a few heap types that are hard-coded to be considered public and therefore allowed on module boundaries even in --closed-world mode, specifically to support js-string-builtins. We previously considered both open and closed (i.e. final) mutable i8 arrays to be public in this manner, but js-string-builtins only uses the closed versions, so remove the open versions. This fixes a particular bug in which Unsubtyping optimized a private array type to be equivalent to an ignorable public array type, incorrectly changing the behavior of a cast, but it does not address the larger problem of optimizations producing types that are equivalent to public types. Add a TODO about that problem for now. Fixes #6935.
*	Fix Heap2Local on pops inside of newly-created blocks (#6938)	Alon Zakai	2024-09-16	1	-0/+124
*	Fix parser error on block params (#6932)	Thomas Lively	2024-09-11	1	-0/+12
	The error checking we had to report an error when the input contains block parameters was in a code path that is no longer executed under normal circumstances. Specifically, it was part of the `ParseModuleTypesCtx` phase of parsing, which no longer parses function bodies. Move the error checking to the `ParseDefsCtx` phase, which does parse function bodies. Fixes #6929.
*	[EH] Fix pop enclosed within a block in DCE (#6922)	Heejin Ahn	2024-09-10	1	-3/+69
	#6400 fixed this case, but only when a `pop` is an immediate child of the current expression, and a `pop` can be nested deeper down. We conservatively run the EH fixup whenever we have a `pop` and create `block`s in the optimization. We considered using `FindAll<Pop>` to make it precise, but decided the quadratic time complexity was not worth it. Fixes #6918.
*	Replace the old topological sort everywhere (#6902)	Thomas Lively	2024-09-10	7	-190/+185
	To avoid having two separate topological sort utilities in the code base, replace remaining uses of the old DFS-based, CRTP topological sort with the newer Kahn's algorithm implementation. This would be NFC, except that the new topological sort produces a different order than the old topological sort, so the output of some passes is reordered.
*	[NFC] OptimizeAddedConstants: Early exit if there are no memories (#6926)	Alon Zakai	2024-09-10	1	-0/+21
	The pass optimizes loads and stores, so without a memory there is nothing to do. This only helps if the user set --low-memory-unused and also has no memory, which is likely rare, but it's a trivial change so it seems worthwhile. In particular, this pass constructs a LocalGraph, so the work avoided can be substantial.
*	Use --preserve-type-order in more tests (#6923)	Thomas Lively	2024-09-10	4	-80/+76
	Update the remaining tests whose readability will be affected by the removal of the old topological sort in #6902, no matter how small their diffs would have been.
*	Use --preserve-type-order in select tests (#6917)	Thomas Lively	2024-09-10	4	-224/+177
	These are the tests that would otherwise have the largest diffs when changing the topological sort used to sort types. signature-refining_gto.wat also cannot be automatically updated, so there is extra benefit to making sure it has stable output.
*	Add a --preserve-type-order option (#6916)	Thomas Lively	2024-09-10	11	-4/+118
	Unlike other module elements, types are not stored on the `Module`. Instead, they are collected by traversing the IR before printing and binary writing. The code that collects the types tries to optimize the order of rec groups based on the number of times each type is used. As a result, the output order of types generally has no relation to the input order of types. In addition, most type optimizations rewrite the types into a single large rec group, and the order of types in that group is essentially arbitrary.
	Changes to the code for counting type uses, sorting types, or sorting rec groups can yield very large changes in the output order of types, producing test diffs that are hard to review and potentially harming the readability of tests by moving output types away from the corresponding input types.
	To help make test output more stable and readable, introduce a tool option that causes the order of output types to match the order of input types as closely as possible. It is implemented by having the parsers record the indices of the input types on the `Module` just like they already record the type names. The `GlobalTypeRewriter` infrastructure used by type optimizations associates the new types with the old indices just like it already does for names and also respects the input order when rewriting types into a large recursion group.
	By default, wasm-opt and other tools clear the recorded type indices after parsing the module, so their default behavior is not modified by this change.
	Follow-on PRs will use the new flag in more tests, which will generate large diffs but leave the tests in stable, more readable states that will no longer change due to other changes to the optimizing type sorting logic.
*	[NFC-ish] Remove LocalGraph from LocalSubtyping (#6921)	Alon Zakai	2024-09-10	1	-17/+6
	The LocalGraph there was used for two purposes:
	  1. Get the list of gets and sets.
	  2. Get only the reachable gets and sets.
	It is trivial to get all the gets and sets in a much faster way, by just walking the code, as this PR does. The downside is that we also consider unreachable gets and sets, so unreachable code can prevent us from optimizing, but that seems worthwhile, as many passes make that assumption (and they all become maximally effective after --dce). That is the only non-NFC part here.
	Removing LocalGraph + the fixup code for unreachability makes this significantly shorter, and also 2-3x faster.
*	Adds a J2CL specific pass that moves itable entries to vtables (#6888)	Roberto Lublinerman	2024-09-06	4	-0/+242
	This allows removing a reference field from all Java objects, reducing the per-object memory and initialization overhead. The pass is designed to run directly on the J2CL output before other optimizations, since it relies on invariants that might get lost in optimization. If the invariants don't hold, the pass aborts.
*	Avoid conflicts with public rec groups in MinimizeRecGroups (#6911)	Thomas Lively	2024-09-06	1	-0/+80
	As with all type optimizations, MinimizeRecGroups only changes private types, which are the only types that are safe to modify. However, it is important for correctness that MinimizeRecGroups maintain separate type identities for all types, whether public or private, to ensure that casts that should differentiate two types cannot change behavior. Previously the pass worked exclusively on private types, so there was nothing preventing it from constructing a minimal rec group that happened to have the same shape, and therefore type identity, as a public rec group. #6886 exhibits a fuzzer test case where this happens and changes the behavior of the program. Fix the bug by recording all public rec group shapes and resolving conflicts with these shapes by updating the shape of the conflicting non-public type. Fixes #6886.
*	[NFC] Avoid wasted LocalGraph work in MergeLocals (#6908)	Alon Zakai	2024-09-05	1	-0/+30
	We computed both get and set influences, but getGetInfluences() was never called, so that work was entirely pointless. This makes the pass 20% faster.
*	Fix supertype counts when collecting heap types (#6905)	Thomas Lively	2024-09-05	8	-82/+73
	We previously incremented the use count for a declared supertype only if it was also a type we had never seen before. Fix the count by treating the supertype the same as any other type used in a type definition. Update tests accordingly, including by manually moving input types around to better match the output.
*	[NFC] Add a lazy mode to LocalGraph (#6895)	Alon Zakai	2024-09-05	1	-0/+82
	LocalGraph by default will compute all the local.sets that can be read from all local.gets. However, many passes only query a small amount of those. To avoid wasted work, add a lazy mode that only computes sets when asked about a get. This is then used in a single place, LoopInvariantCodeMotion, which becomes 18% faster.
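The idea of the lazy mode can be sketched in Python on a toy CFG: nothing is computed up front, and asking about one local.get walks backwards from that get only, stopping on each path at the last set of the same local, with results memoized. The `Block`/`LazyLocalGraph` classes, the op tuples, and the `"<entry>"` marker for the function-entry value are hypothetical modeling choices, not Binaryen's C++ data structures.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    ops: list  # each op is ("set", local, name) or ("get", local, name)
    preds: list = field(default_factory=list)


class LazyLocalGraph:
    """Compute the sets that may reach a local.get only when it is queried."""

    def __init__(self, blocks: dict, entry: str):
        self.blocks = blocks
        self.entry = entry
        self.cache = {}    # get name -> set of reaching set names
        self.computed = 0  # number of gets actually analyzed

    def get_sets(self, block_id: str, get_name: str) -> set:
        if get_name in self.cache:
            return self.cache[get_name]
        self.computed += 1
        ops = self.blocks[block_id].ops
        idx = next(i for i, op in enumerate(ops)
                   if op[0] == "get" and op[2] == get_name)
        local = ops[idx][1]
        result = set()
        found = self._last_set(ops[:idx], local)  # earlier in the same block?
        if found is not None:
            result.add(found)
        else:
            self._walk_preds(block_id, local, result)
        self.cache[get_name] = result
        return result

    @staticmethod
    def _last_set(ops: list, local: str):
        """Name of the last set of `local` in `ops`, or None."""
        for kind, loc, name in reversed(ops):
            if kind == "set" and loc == local:
                return name
        return None

    def _walk_preds(self, start: str, local: str, result: set) -> None:
        """Walk predecessors backwards, stopping at each path's last set."""
        if start == self.entry:
            result.add("<entry>")  # initial value (param or zero-init)
        seen = set()
        work = list(self.blocks[start].preds)
        while work:
            b = work.pop()
            if b in seen:
                continue
            seen.add(b)
            found = self._last_set(self.blocks[b].ops, local)
            if found is not None:
                result.add(found)
                continue
            if b == self.entry:
                result.add("<entry>")
            work.extend(self.blocks[b].preds)
```

Querying a single get touches only the blocks on its backward paths, which is the saving when a pass such as LoopInvariantCodeMotion asks about few gets.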
*	Only generate string.consts custom section if it is needed (#6893)	Goktug Gokdogan	2024-09-05	1	-0/+22
*	Use TopologicalSort::minSort to order rec groups (#6892)	Thomas Lively	2024-09-04	30	-465/+466
	Rec groups need to be topologically sorted for the output module to be valid, but the specific order of rec groups also affects the module size because types at lower indices require fewer bytes to reference. We previously optimized for code size when gathering types by sorting the list of groups before doing the topological sort. This was brittle, though, and depended on implementation details of the topological sort to be correct. Replace the old topological sort with the new `TopologicalSort::minSort` utility, which is a more principled method of achieving a minimal topological sort with respect to some comparator. Also draw inspiration from ReorderGlobals and apply an exponential factor to take the users of a rec group into account when determining its weight.
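One common way to get a topological order that is lexicographically minimal under a comparator is Kahn's algorithm with a priority queue: among all nodes whose dependencies have already been emitted, always emit the smallest by key. The Python below sketches that technique; it is not necessarily how `TopologicalSort::minSort` is implemented, and the function name, the edge convention, and the use-count weights in the usage note are assumptions for illustration.

```python
import heapq


def min_topological_sort(nodes, edges, key):
    """Kahn's algorithm that always emits the smallest ready node by `key`.

    edges maps a node to the nodes that must come after it.
    Raises ValueError if the graph has a cycle.
    """
    index = {n: i for i, n in enumerate(nodes)}  # tie-breaker, avoids comparing nodes
    indegree = {n: 0 for n in nodes}
    for n in nodes:
        for succ in edges.get(n, ()):
            indegree[succ] += 1
    ready = [(key(n), index[n], n) for n in nodes if indegree[n] == 0]
    heapq.heapify(ready)
    order = []
    while ready:
        _, _, n = heapq.heappop(ready)
        order.append(n)
        for succ in edges.get(n, ()):
            indegree[succ] -= 1
            if indegree[succ] == 0:
                heapq.heappush(ready, (key(succ), index[succ], succ))
    if len(order) != len(nodes):
        raise ValueError("cycle detected")
    return order
```

For rec groups, an edge would run from a referenced group to the group that references it, and a key such as the negated (weighted) use count puts heavily used groups at low indices whenever the dependencies allow it.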
*	[NFC] Move optimizeSubsequentStructSet() to a new pass, HeapStoreOptimization (#6882)	Alon Zakai	2024-09-03	5	-2/+62
	This just moves code out of OptimizeInstructions to the new pass. The existing test is renamed and now runs the new pass instead. The new pass is run right after each --optimize-instructions invocation, so it should not cause any noticeable effects whatsoever, making this NFC.
	The motivation here is that there is a bug in the pass, see the new testcase added at the end, which shows the bug. It is not practical to fix that bug in OptimizeInstructions since we need more than peephole optimizations to do so. This PR moves the code to a new pass so we can fix it there properly, later.
	The new pass is named HeapStoreOptimization since the same infrastructure we will need to fix the bug will also help dead store elimination and related things.
*	[FP16] Implement madd and nmadd. (#6878)	Brendan Dahl	2024-09-03	1	-30/+96
	Specified at https://github.com/WebAssembly/half-precision/blob/main/proposals/half-precision/Overview.md
	A few notes:
	  - The F32x4 and F64x2 versions of madd and nmadd are missing spec tests.
	  - For madd, the implementation was incorrectly doing `(b*c)+a` where it should be `(a*b)+c`.
	  - For nmadd, the implementation was incorrectly doing `(-b*c)+a` where it should be `-(a*b)+c`.
	  - There doesn't appear to be a great way to actually implement a fused nmadd, but the spec allows the double-rounded version I added.
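A Python sketch of the corrected operand order and of the double-rounded (unfused) variant: round the product to binary16, then round the sum. Python floats are binary64, and products and sums of binary16 values are exact in binary64, so rounding after each step models the double-rounded result exactly. The helper names are hypothetical, and `to_f16` assumes overflow becomes a signed infinity, as in IEEE 754 round-to-nearest.

```python
import math
import struct


def to_f16(x: float) -> float:
    """Round to the nearest binary16 value; overflow becomes +/-inf."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        return math.copysign(math.inf, x)


def f16_madd(a: float, b: float, c: float) -> float:
    """Double-rounded madd: (a*b)+c, rounding after the multiply and the add."""
    return to_f16(to_f16(a * b) + c)


def f16_nmadd(a: float, b: float, c: float) -> float:
    """Double-rounded nmadd: -(a*b)+c."""
    return to_f16(to_f16(-(a * b)) + c)
```

The intermediate rounding is visible at the edge of the range: `f16_madd(65504.0, 2.0, -65504.0)` is `inf` because the product already overflows binary16, whereas a fused operation would give 65504.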
*	Move lit test temporary files to out/test/ (#6887)	Thomas Lively	2024-08-29	1	-1/+1
	Previously, for in-tree builds, they were put directly into test/, which unnecessarily pollutes the tree.
*	Rename relaxed SIMD fma instructions to match spec. (#6876)	Brendan Dahl	2024-08-27	1	-28/+28
	The instructions relaxed_fma and relaxed_fnma have been renamed to relaxed_madd and relaxed_nmadd. https://github.com/WebAssembly/relaxed-simd/blob/main/proposals/relaxed-simd/Overview.md#binary-format
*	Check for required actions when parsing wast (#6874)	Thomas Lively	2024-08-27	1	-0/+8
	The parser function for `action` returned a `MaybeResult`, but we were treating it as returning a normal `Result` and not checking that it had contents in several places. Replace the current `action()` with `maybeAction()` and add a new `action()` that requires the action to be present. Fixes #6872.
*	[FP16] Implement unary operations. (#6867)	Brendan Dahl	2024-08-27	1	-30/+183
	Specified at https://github.com/WebAssembly/half-precision/blob/main/proposals/half-precision/Overview.md
*	[NFC] Optimize ParamUtils::getUsedParams() (#6866)	Alon Zakai	2024-08-26	1	-0/+31
	This constructed a LocalGraph, which computes the sets that reach each get. But all we need to know is which params are live, so instead we can do a liveness computation (which is just a boolean, not the list of sets). Also, it is simple to get the liveness computation to only work on the parameters and not all the locals, as a further optimization. Existing tests cover this, though I did find that the case of unreachability needed a new test. On a large testcase I am looking at, this makes --dae 17% faster.
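A Python sketch of the kind of computation described: backward liveness restricted to parameters, where a param counts as used exactly when it is live on entry to the function. Only sets that are live, not which sets reach which get, are tracked. The CFG encoding, the op tuples, and `used_params` are hypothetical modeling choices rather than Binaryen's code. Because liveness at the entry depends only on blocks reachable from it, gets in unreachable blocks do not make a param used.

```python
def used_params(blocks, entry, num_params):
    """Return the params that are live on function entry.

    blocks maps a block id to (ops, successors); ops are ("get", index) or
    ("set", index). Locals with index >= num_params are ignored entirely.
    """
    params = set(range(num_params))
    gen, kill, preds = {}, {}, {b: [] for b in blocks}
    for b, (ops, succs) in blocks.items():
        g, k = set(), set()
        for kind, idx in ops:
            if idx not in params:
                continue  # only track parameters, not other locals
            if kind == "get" and idx not in k:
                g.add(idx)  # read before any write in this block
            elif kind == "set":
                k.add(idx)
        gen[b], kill[b] = g, k
        for s in succs:
            preds[s].append(b)
    # Standard backward dataflow: live_in = gen | (live_out - kill).
    live_in = {b: set(gen[b]) for b in blocks}
    work = list(blocks)
    while work:
        b = work.pop()
        live_out = set().union(*(live_in[s] for s in blocks[b][1]))
        new_in = gen[b] | (live_out - kill[b])
        if new_in != live_in[b]:
            live_in[b] = new_in
            work.extend(preds[b])
    return live_in[entry]
```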