forks/binaryen.git -

	Commit message (Collapse)	Author	Age	Files	Lines
*	[NFC] Move optimizeSubsequentStructSet() to a new pass, ↵	Alon Zakai	2024-09-03	5	-2/+62
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	HeapStoreOptimization (#6882) This just moves code out of OptimizeInstructions to the new pass. The existing test is renamed and now runs the new pass instead. The new pass is run right after each --optimize-instructions invocation, so it should not cause any noticeable effects whatsoever, making this NFC. The motivation here is that there is a bug in the pass, see the new testcase added at the end, which shows the bug. It is not practical to fix that bug in OptimizeInstructions since we need more than peephole optimizations to do so. This PR moves the code to a new pass so we can fix it there properly, later. The new pass is named HeapStoreOptimization since the same infrastructure we will need to fix the bug will also help dead store elimination and related things.
*	[FP16] Implement madd and nmadd. (#6878)	Brendan Dahl	2024-09-03	2	-30/+125
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Specified at https://github.com/WebAssembly/half-precision/blob/main/proposals/half-precision/Overview.md A few notes: - The F32x4 and F64x2 versions of madd and nmadd are missing spect tests. - For madd, the implementation was incorrectly doing `(bc)+a` where it should be `(ab)+c`. - For nmadd, the implementation was incorrectly doing `(-bc)+a` where it should be `-(ab)+c`. - There doesn't appear to be a great way to actually implement a fused nmadd, but the spec allows the double rounded version I added.
*	[NFC] Change topological sort utilities to functions (#6889)	Thomas Lively	2024-09-03	1	-9/+7
\| \| \| \| \| \| \|	Previously they were structs and their results were accessed with `operator*()`, but that was unnecessarily complicated and could lead to problems with temporary lifetimes being too short. Simplify the utilities by making them functions. This also allows the wrapper templates to infer the proper element types automatically.
*	Add a utility for finding minimal topological sorts (#6884)	Thomas Lively	2024-08-29	1	-0/+55
\| \| \| \| \| \| \| \|	Reuse the code implementing Kahn's topological sort algorithm with a new configuration that uses a min-heap to always choose the best available element. Also add wrapper utilities that can find topological sorts of graphs with arbitrary element types, not just indices.
*	Move lit test temporary files to out/test/ (#6887)	Thomas Lively	2024-08-29	1	-1/+1
\| \| \| \|	Previously for in-tree builds, they were put directly into test/, which unnecessarily pollutes the tree.
*	[NFC] Refactor LocalGraph's core getSets API (#6877)	Alon Zakai	2024-08-28	1	-12/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Before we just had a map that people would access with localGraph.getSetses[get], while now it is a call localGraph.getSets(get), which more nicely hides the internal implementation details. Also rename getSetses => getSetsMap. This will allow a later PR to optimize the internals of this API. This is performance-neutral as far as I can measure. (We do replace a direct read from a data structure with a call, but the call is in a header and should always get inlined.)
*	Rename relaxed SIMD fma instructions to match spec. (#6876)	Brendan Dahl	2024-08-27	3	-36/+36
\| \| \| \| \| \| \|	The instructions relaxed_fma and relaxed_fnma have been renamed to relaxed_madd and relaxed_nmadd. https://github.com/WebAssembly/relaxed-simd/blob/main/proposals/relaxed-simd/Overview.md#binary-format
*	Check for required actions when parsing wast (#6874)	Thomas Lively	2024-08-27	1	-0/+8
\| \| \| \| \| \| \| \| \|	The parser function for `action` returned a `MaybeResult`, but we were treating it as returning a normal `Result` and not checking that it had contents in several places. Replace the current `action()` with `maybeAction()` and add a new `action()` that requires the action to be present. Fixes #6872.
*	[FP16] Implement unary operations. (#6867)	Brendan Dahl	2024-08-27	2	-30/+223
\| \| \| \|	Specified at https://github.com/WebAssembly/half-precision/blob/main/proposals/half-precision/Overview.md
*	[NFC] Optimize ParamUtils::getUsedParams() (#6866)	Alon Zakai	2024-08-26	1	-0/+31
\| \| \| \| \| \| \| \| \| \| \| \| \|	This constructed a LocalGraph, which computes the sets that reach each get. But all we need to know is which params are live, so instead we can do a liveness computation (which is just a boolean, not the list of sets). Also, it is simple to get the liveness computation to only work on the parameters and not all the locals, as a further optimization. Existing tests cover this, though I did find that the case of unreachability needed a new test. On a large testcase I am looking at, this makes --dae 17% faster.
*	Fix null dereference in FunctionValidator (#6849)	mtb	2024-08-26	1	-0/+14
\| \| \| \| \| \| \| \| \| \|	visitBlock() and validateCallParamsAndResult() both assumed they were running inside a function, but might be called on global code too. Calls and blocks are invalid in global positions, so we should error there, but must do so properly without a null deref. Fixes #6847 Fixes #6848
*	[FP16] Add a feature flag for FP16. (#6864)	Brendan Dahl	2024-08-22	15	-2/+55
\| \| \|	Ensure the "fp16" feature is enabled for FP16 instructions.
*	Add a string lowering mode disallowing non-UTF-8 strings (#6861)	Thomas Lively	2024-08-21	4	-0/+22
\| \| \| \| \| \| \| \| \| \| \|	The best way to lower strings is via the "magic imports" API that uses the names of imported string globals as their values. This approach only works for valid UTF-8 strings, though. The existing string-lowering-magic-imports pass falls back to putting non-UTF-8 strings in a JSON custom section, but this requires the runtime to support that custom section for correctness. To help catch errors early when runtimes do not support the strings custom section, add a new pass that uses magic imports and raises an error if there are any invalid strings.
*	[FP16] Implement arithmetic operations. (#6855)	Brendan Dahl	2024-08-21	2	-1/+266
\| \| \| \|	Specified at https://github.com/WebAssembly/half-precision/blob/main/proposals/half-precision/Overview.md
*	Support `ref.extern n` in spec tests (#6858)	Thomas Lively	2024-08-21	5	-2212/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Spec tests pass the value `ref.extern n`, where `n` is some integer, into exported functions that expect to receive externrefs and receive such values back out as return values. The payload serves to distinguish externrefs so the test can assert that the correct one was returned. Parse these values in wast scripts and represent them as externalized i31refs carrying the payload. We will need a different representation eventually, since some tests explicitly expect these externrefs to not be i31refs, but this suffices to get several new tests passing. To get the memory64 version of table_grow.wast passing, additionally fix the interpreter to handle growing 64-bit tables correctly. Delete the local versions of the upstream tests that can now be run successfully.
*	Fix encoding of heap type definitions (#6856)	Thomas Lively	2024-08-20	1	-963/+0
\| \| \| \| \| \| \| \|	The leading bytes that indicate what kind of heap type is being defined are bytes, but we were previously treating them as SLEB128-encoded values. Since we emit the smallest LEB encodings possible, we were writing the correct bytes in output files, but we were also improperly accepting binaries that used more than one byte to encode these values. This was caught by an upstream spec test.
*	Add the upstream spec testsuite as a submodule (#6853)	Thomas Lively	2024-08-20	60	-32153/+35
\| \| \| \| \| \|	Run the upstream tests by default, except for a large list of them that do not successfully run. Remove the local version of those that do successfully run where the local version is entirely subsumed by the upstream version.
*	[Exceptions] Finish interpreter + optimizer support for try_table. (#6814)	Sébastien Doeraene	2024-08-20	25	-385/+2539
\| \| \| \| \| \|	* Add interpreter support for exnref values. * Fix optimization passes to support try_table. * Enable the interpreter (but not in V8, see code) on exceptions.
*	Validate array.init_elem segment in IRBuilder (#6852)	Thomas Lively	2024-08-19	1	-1/+11
\| \| \| \| \| \| \| \| \|	IRBuilder is responsible for validation involving type annotations on GC instructions because those type annotations may not be preserved in the built IR to be used by the main validator. For `array.init_elem`, we were not using the type annotation to validate the element segment, which allowed us to parse invalid modules when the reference operand was a nullref. Add the missing validation in IRBuilder and fix a relevant spec test.
*	Print explicit typeuses for non-MVP function types (#6851)	Thomas Lively	2024-08-19	1	-0/+51
\| \| \| \| \| \| \| \| \|	We previously printed explicit typeuses (e.g. `(type $f)`) in function signatures when GC was enabled. But even when GC is not enabled, function types may use non-MVP features that require the explicit typeuse to be printed. Fix the printer to always print the explicit type use for such types. Fixes #6850.
*	Add a pass for minimizing recursion groups (#6832)	Thomas Lively	2024-08-17	5	-0/+1873
\| \| \| \| \| \| \| \| \| \| \| \|	Most of our type optimization passes emit all non-public types as a single large rec group, which trivially ensures that different types remain different, even if they are optimized to have the same structure. Usually emitting a single large rec group is fine, but it also means that if the module is split, all of the types will need to be repeated in all of the split modules. To better support this use case, add a pass that can split the large rec group back into minimal rec groups, taking care to preserve separate type identities by emitting different permutations of the same group where possible or by inserting unused brand types to differentiate them.
*	Fix direct comparisons with unshared basic heap types (#6845)	Thomas Lively	2024-08-16	2	-18/+131
\| \| \| \| \|	Audit the remaining ocurrences of `== HeapType::` and fix those that did not handle shared types correctly. Add tests for some of the fixes; others are NFC but clarify the code.
*	Implement table.init (#6827)	Alon Zakai	2024-08-16	9	-60/+423
\| \| \| \| \|	Also use TableInit in the interpreter to initialize module's table state, which will now handle traps properly, fixing #6431
*	[NFC] Avoid v128 in rec groups with no other v128 uses (#6843)	Alon Zakai	2024-08-15	1	-14/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We don't properly validate that yet. E.g.: (module (rec (type $func (func)) (type $unused (sub (struct (field v128)))) ) (func $func (type $func)) ) That v128 is not used, but it ends up in the output because it is in a rec group that is used. Atm we do not require that SIMD be enabled in such a case, which can trip up the fuzzer. Context: #6820. For now, modify the test that uncovered this.
*	Save build ID in a source map (#6799)	Marcin Kolny	2024-08-15	2	-0/+7
\| \| \| \| \| \| \|	This is based on these two proposals: * https://github.com/WebAssembly/tool-conventions/blob/main/BuildId.md * https://github.com/tc39/source-map/blob/main/proposals/debug-id.md
*	Heap type `none` requires GC (#6840)	Thomas Lively	2024-08-14	3	-4/+12
\| \| \| \| \| \|	Since reference types only introduced function and extern references, all of the types in the `any` hierarchy require GC, including `none`. Fixes #6839.
*	Count supertypes when collecting module types (#6838)	Thomas Lively	2024-08-14	1	-3/+3
\| \| \| \| \| \| \| \| \|	Previously we included supertypes, but did not increase their count. This was done so that the output for the nominal type system, which introduced explicitly supertypes, would more closely match the output with the old equirecursive types system. Neither type system exists anymore and we only support the single, standard isorecursive type system, so we can now properly count supertypes. It turns out it doesn't make much of a difference in the test outputs anyway.
*	Monomorphization: Add a flag to control the required improvement (#6837)	Alon Zakai	2024-08-14	5	-12/+1668
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The argument is the minimum benefit we must see for us to decide to optimize, e.g. --monomorphize --pass-arg=monomorphize-min-benefit@50 When the minimum benefit is 50% then if we reduce the cost by 50% through monomorphization then we optimize there. 95% would only optimize when we remove almost all the cost, etc. In practice I see 95% will actually tend to reduce code size overall, as while we add monomorphized versions of functions, we only do so when we remove a lot of work and size, and after inlining we gain benefits. However, 50% or even lower can lead to better benchmark results, in return for larger code size, just like with inlining. To be careful, the default is set to 95%. Previously we optimized whenever we saw any benefit at all, which is the same as requiring a minimum benefit of 0%. Old tests have the flag applied in this PR to set that value, so they do not change.
*	Heap2Local: Track interactions in detail (#6834)	Alon Zakai	2024-08-13	1	-0/+57
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Previously we tracked only whether an expression was relevant to analysis, that is, whether it interacted with the allocation we were tracing the behavior of. That is not enough for all cases, though, so also track the form of the interaction, namely whether the allocation flows through or is fully consumed. An example where that matters: (ref.eq (struct.get $A 0 (local.tee $x (struct.new_default $A) ) ) (local.get $x) ) Here the local.get flows out the allocation, but the struct.get only fully consumes it. Before this PR we thought the struct.get flowed the allocation, and we misoptimized this to 1. To make this possible, do a bunch of minor refactoring: * Move ParentChildInteraction out of the class. * Add a "None" interaction there. * Replace the set of reached expressions with a map of them to their interactions. * Add helper functions to get an expression's interaction or to update it when replacing. The new testcase here shows the main fix. The new assertions are covered by existing testcases.
*	GlobalTypeOptimization: Reorder fields in order to remove them (#6820)	Alon Zakai	2024-08-12	1	-15/+443
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Before, we only removed fields from the end of a struct. If we had, say struct Foo { int x; int y; int z; }; // Add no fields but inherit the parent's. struct Bar : Foo {}; If y is only used in Bar, but never Foo, then we still kept it around, because if we removed it from Foo we'd end up with Foo = {x, z}, Bar = {x, y, z} which is invalid - Bar no longer extends Foo. But we can do this if we first reorder the two: struct Foo { int x; int z; int y; // now y is at the end }; struct Bar : Foo {}; And the optimized form is struct Foo { int x; int z; }; struct Bar : Foo { int y; // now y is added in Bar }; This lets us remove all fields possible in all cases AFAIK. This situation is not super-common, as most fields are actually used both up and down the hierarchy (if they are used at all), but testing on some large real-world codebases, I see 10 fields removed in Java, 45 in Kotlin, and 31 in Dart testcases. The NFC change to src/wasm-type-ordering.h was needed for this to compile.
*	Typed continuations: update syntax of handler clauses (#6824)	Frank Emrich	2024-08-09	2	-8/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The syntax for handler clauses in `resume` instructions has recently changed, using `on` instead of `tag` now. Instead of ``` (resume $ct (tag $tag0 $block0) ... (tag $tagn $blockn)) ``` we now have ``` (resume $ct (on $tag0 $block0) ... (on $tagn $blockn)) ``` This PR adapts parsing, printing, and some tests accordingly. (Note that this PR deliberately makes none of the other changes that will arise from implementing the new, combined stack switching proposal, yet.)
*	[FP16] Implement relation operations. (#6825)	Brendan Dahl	2024-08-09	2	-30/+242
\| \| \| \|	Specified at https://github.com/WebAssembly/half-precision/blob/main/proposals/half-precision/Overview.md
*	[FP16] Implement lane access instructions. (#6821)	Brendan Dahl	2024-08-08	2	-0/+101
\| \| \| \|	Specified at https://github.com/WebAssembly/half-precision/blob/main/proposals/half-precision/Overview.md
*	GTO: Remove minor optimization of avoiding ChildLocalizer sometimes (#6818)	Alon Zakai	2024-08-07	1	-5/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The optimization is to only use ChildLocalizer, which moves children to locals, if we actually have a reason to use it. It is simple enough to see if we are removing fields with side effects here, and only call ChildLocalizer if we are not. However, this will become much more complicated in a subsequent PR which will reorder fields, which allows removing yet more of them (without reordering, we can only remove fields at the end, if any subtype needs the field). This is a pretty minor optimization, as it avoids adding a few locals in the rare case of struct.new operands having side effects. We run --gto at the start of the pipeline, so later opts will clean that up anyhow. (Though, this might make us a little less efficient, but the following PR will justify this regression.)
*	[parser] Fix bug when printing type builder errors (#6817)	Thomas Lively	2024-08-06	1	-0/+11
\| \| \| \| \| \|	The type index from the TypeBuilder error was mapped to a file location incorrectly, resulting in an assertion failure. Fixes #6816.
*	[FP16] Implement load and store instructions. (#6796)	Brendan Dahl	2024-08-06	2	-0/+94
\| \| \| \|	Specified at https://github.com/WebAssembly/half-precision/blob/main/proposals/half-precision/Overview.md
*	[Source maps] Handle single-segment entries in source map header decoder (#6794)	Ömer Sinan Ağacan	2024-08-06	2	-0/+64
\| \| \| \| \|	Single-segment mappings were already handled in readNextDebugLocation, but not in readSourceMapHeader.
*	Make source parser consistent with binary parser when naming things. NFC (#6813)	Sam Clegg	2024-08-06	5	-16/+16
\| \| \| \| \|	The `timport$` prefix is already used for tables, so the binary parser currently uses `eimport$` to name tags (I guess because they are normally exception tags?).
*	Add a utility for iterating over all topological orders (#6801)	Thomas Lively	2024-08-05	2	-0/+102
\| \| \| \| \| \| \| \| \|	Use an extension of Kahn's algorithm for finding topological orders that iteratively makes every possible choice at every step to find all the topological orders. The order being constructed and the set of possible choices are managed in-place in the same buffer, so the algorithm takes linear time and space plus amortized constant time per generated order. This will be used in an upcoming type optimization.
*	Fix shareability handling in TypeSSA collision logic (#6798)	Alon Zakai	2024-08-01	1	-0/+39
\|
*	Add a disjoint sets (union-find) utility (#6797)	Thomas Lively	2024-08-01	2	-0/+107
\| \| \| \|	This will be used in an upcoming type optimization pass and may be generally useful.
*	Use Names::getValidNameGivenExisting in binary reading (#6793)	Alon Zakai	2024-07-31	8	-28/+45
\| \| \| \| \| \|	We had a TODO to use it once Names was optimized, which it has been. The Names version is also far faster. When building https://github.com/JetBrains/kotlinconf-app it saves 70 seconds(!).
*	Fix shareability of externalized nulls (#6791)	Alon Zakai	2024-07-30	1	-0/+8
\|
*	Add a customizable title to Metrics reporting (#6792)	Alon Zakai	2024-07-30	14	-3/+60
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Before the PR: $ bin/wasm-opt test/hello_world.wat --metrics total [exports] : 1 [funcs] : 1 [globals] : 0 [imports] : 0 [memories] : 1 [memory-data] : 0 [tables] : 0 [tags] : 0 [total] : 3 [vars] : 0 Binary : 1 LocalGet : 2 After the PR: $ bin/wasm-opt test/hello_world.wat --metrics Metrics total [exports] : 1 [funcs] : 1 ... Note the "Metrics" addition at the top. And the title can be customized: $ bin/wasm-opt test/hello_world.wat --metrics=text Metrics: text total [exports] : 1 [funcs] : 1 The custom title can be helpful when multiple invocations of metrics are used at once, e.g. --metrics=before -O3 --metrics=after.
*	Add a Tarjan's Strongly Connected Component utilty (#6790)	Thomas Lively	2024-07-30	2	-0/+312
\| \| \| \| \| \| \| \| \|	Implement a non-recursive version of Tarjan's Strongly Connected Component algorithm that consumes and produces iterators for maximum flexibility. This will be used in an optimization that transforms the heap type graph to use minimal recursion groups, which correspond to the strongly connected components of the type graph.
*	Fix shareability of internalized nulls (#6789)	Alon Zakai	2024-07-29	1	-0/+11
\|
*	Generalize Literal::externalize/internalize for strings and shareability (#6784)	Alon Zakai	2024-07-29	2	-0/+35
\|
*	[wasm-reduce] Do not crash on non-func element segments (#6778)	Thomas Lively	2024-07-26	2	-1/+24
\| \| \| \|	Generalize the code for simplifying element segments to handle more than just null and funcref elements.
*	Cost analysis: Remove "Unacceptable" hack (#6782)	Alon Zakai	2024-07-25	1	-7/+136
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We marked various expressions as having cost "Unacceptable", fixed at 100, to ensure we never moved them out from an If arm, etc. Giving them such a high cost avoids that problem - the cost is higher than the limit we have for moving code from conditional to unconditional execution - but it also means the total cost is unrealistic. For example, a function with one such instruction + an add (cost 1) would end up with cost 101, and removing the add would look insignificant, which causes issues for things that want to compare costs (like Monomorphization). To fix this, adjust some costs. The main change here is to give casts a cost of 5. I measured this in depth, see the attached benchmark scripts, and it looks clear that in both V8 and SpiderMonkey the cost of a cast is high enough to make it not worth turning an if with ref.test arm into a select (which would always execute the test). Other costs adjusted here matter a lot less, because they are on operations that have side effects and so the optimizer will anyhow not move them from conditional to unconditional execution, but I tried to make them a bit more realistic while I was removing "Unacceptable": * Give most atomic operations the 10 cost we've been using for atomic loads/ stores. Perhaps wait and notify should be slower, however, but it seems like assuming fast switching might be more relevant. * Give growth operations a cost of 20, and throw operations a cost of 10. These numbers are entirely made up as I am not even sure how to measure them in a useful way (but, again, this should not matter much as they have side effects).
*	TupleOptimization: Properly handle subtyping in copies (#6786)	Alon Zakai	2024-07-25	1	-2/+40
\| \| \| \| \| \|	We used the target's type for the read from the source, but due to subtyping those might be different. Found by the fuzzer.