We computed both get and set influences, but getGetInfluences() was
never called, so computing the get influences was wasted work.
This makes the pass 20% faster.
|
We previously incremented the use count for a declared supertype only if
it was also a type we had never seen before. Fix the count by treating
the supertype the same as any other type used in a type definition.
Update tests accordingly, including by manually moving input types
around to better match the output.
|
Many passes need to know both the set of all used types and also the
sets of private or public types. Previously there was no API to get both
at once, so getting both required two API calls that internally
collected all the types twice.
Furthermore, there are many reasons to collect heap types, and they have
different requirements about precisely which types need to be collected.
For example, in some edge cases the IR can reference heap types that do
not need to be emitted into a binary; passes that replace all types
would need to collect these types, but the binary writer would not. The
existing APIs for collecting types did not distinguish between these use
cases, so the code conservatively collected extra types that were not
always needed.
Refactor the type collecting code to expose a new API that takes a
description of which types need to be collected and returns the
appropriate types, their use counts, and optionally whether they are
each public or private.
Keep this change non-functional by commenting on places where the code
could be cleaned up or improved rather than actually making the changes.
Follow-up PRs will implement the improvements, which will necessarily
come with test changes.
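As a sketch, an API along these lines could express the idea; all names
below are hypothetical stand-ins rather than Binaryen's actual declarations:
```cpp
#include <optional>
#include <unordered_map>

// Describes which heap types the caller needs.
struct TypeCollectionRequest {
  bool includeBinaryTypes = true;    // types the binary writer must emit
  bool includeAllReferenced = false; // every type the IR can reference
  bool classifyVisibility = false;   // also compute public/private
};

enum class Visibility { Public, Private };

struct HeapTypeInfo {
  unsigned useCount = 0;
  std::optional<Visibility> visibility; // filled only when requested
};

// Stand-in for Binaryen's real heap type handle.
using HeapType = unsigned;

// Returns each collected type with its use count (and, if requested,
// its visibility), collecting everything in a single traversal.
std::unordered_map<HeapType, HeapTypeInfo>
collectHeapTypes(const TypeCollectionRequest& request);
```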
|
LocalGraph by default computes, for each local.get, all the local.sets that it
can read from. However, many passes query only a small subset of those gets. To
avoid wasted work, add a lazy mode that only computes sets when asked about
a get.
This is then used in a single place, LoopInvariantCodeMotion, which becomes
18% faster.
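A minimal sketch of the lazy mode, with hypothetical names: the sets reaching
a get are computed on first request and memoized, instead of eagerly for every
get in the function.
```cpp
#include <unordered_map>
#include <vector>

struct LocalGet;
struct LocalSet;

class LazyLocalGraph {
public:
  const std::vector<LocalSet*>& getSets(LocalGet* get) {
    auto it = cache.find(get);
    if (it == cache.end()) {
      // First query for this get: compute only its reaching sets.
      it = cache.emplace(get, computeSets(get)).first;
    }
    return it->second;
  }

private:
  std::vector<LocalSet*> computeSets(LocalGet* get); // the flow analysis
  std::unordered_map<LocalGet*, std::vector<LocalSet*>> cache;
};
```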
|
This saves memory and could in principle improve performance, although a
quick experiment with 30 samples on ReorderGlobals did not yield a
statistically significant improvement. At any rate, using Index is more
consistent with other parts of the code base.
|
Rec groups need to be topologically sorted for the output module to be
valid, but the specific order of rec groups also affects the module size
because types at lower indices require fewer bytes to reference. We
previously optimized for code size when gathering types by sorting the
list of groups before doing the topological sort. This was brittle,
though, and depended on implementation details of the topological sort
to be correct.
Replace the old topological sort with use of the new
`TopologicalSort::minSort` utility, which is a more principled method of
achieving a minimal topological sort with respect to some comparator.
Also draw inspiration from ReorderGlobals and apply an exponential
factor to take the users of a rec group into account when determining
its weight.
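A rough sketch of such a weighting; the actual formula in the pass may differ,
and everything here (the names, the decay factor, the two propagation rounds)
is illustrative only:
```cpp
#include <cstddef>
#include <vector>

struct RecGroupInfo {
  std::size_t useCount = 0;       // direct uses of this group's types
  std::vector<std::size_t> users; // indices of groups referencing this one
};

// A group used by heavy groups should itself sort early, so each user
// contributes a decayed share of its own weight.
std::vector<double> computeWeights(const std::vector<RecGroupInfo>& groups) {
  std::vector<double> weight(groups.size());
  for (std::size_t i = 0; i < groups.size(); ++i) {
    weight[i] = static_cast<double>(groups[i].useCount);
  }
  const double factor = 0.5; // assumed exponential decay per step
  for (int round = 0; round < 2; ++round) {
    for (std::size_t i = 0; i < groups.size(); ++i) {
      for (std::size_t user : groups[i].users) {
        weight[i] += factor * weight[user];
      }
    }
  }
  return weight;
}
```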
|
We previously computed both forward and reverse dependence graphs, but
one of them was only used for a single topological sort that could just
as well be computed by reversing the topological sort on the other
graph.
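The underlying observation, as a short sketch: reversing a topological order
of a graph yields a topological order of the reversed graph, so a single
dependence graph suffices.
```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

using Graph = std::vector<std::vector<std::size_t>>; // adjacency lists

std::vector<std::size_t> topologicalSort(const Graph& graph); // any impl

std::vector<std::size_t> reverseTopologicalSort(const Graph& graph) {
  auto order = topologicalSort(graph);
  std::reverse(order.begin(), order.end());
  return order;
}
```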
|
This renames the `Catch(All)_P3` enums, which denote the old Phase 3
`catch(_all)` instructions, to `Catch(All)_Legacy`, which is clearer.
It also keeps us consistent with
https://github.com/llvm/llvm-project/pull/107187.
|
Rather than finding the minimum sort with respect to the original order
of vertices, find the minimum sort with respect to an arbitrary
user-provided comparator. Users of the minSort utility previously had to
sort their input graphs according to their desired ordering, but now
they can simply provide their comparator instead.
Take advantage of the new functionality in ReorderGlobals and also
standardize on a single data type for representing dependence graphs to
avoid unnecessary conversions. Together, these changes slightly speed up
ReorderGlobals.
Move the topological sort code previously in a .cpp file into the header
so the comparator can be provided as a lambda template parameter instead
of as a `std::function`. This makes ReorderGlobals about 5% faster.
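A sketch of why the header move matters (hypothetical signatures): a comparator
passed as a template parameter has a concrete type the compiler can inline,
while a `std::function` forces type erasure and an indirect call per
comparison.
```cpp
#include <cstddef>
#include <functional>
#include <vector>

using Graph = std::vector<std::vector<std::size_t>>;

// Before: every comparison goes through std::function's indirection.
std::vector<std::size_t>
minSortErased(const Graph& graph,
              std::function<bool(std::size_t, std::size_t)> cmp);

// After: the lambda's exact type is a template parameter, so the
// comparison can be inlined at each call site.
template<typename Cmp>
std::vector<std::size_t> minSort(const Graph& graph, Cmp cmp);
```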
|
This replaces direct accesses of the data structure, graph.*influences[foo],
with calls, graph.get*influences(foo). This will allow a later PR to make
those calls optionally lazy.
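A minimal sketch of the shape of this change (hypothetical names): once
callers go through an accessor, the implementation can later compute entries
on demand without touching any call sites.
```cpp
#include <set>
#include <unordered_map>

struct LocalGet;
struct LocalSet;

class Graph {
public:
  const std::set<LocalGet*>& getSetInfluences(LocalSet* set) {
    // Today this is a plain lookup; later it could compute lazily.
    return setInfluences[set];
  }

private:
  std::unordered_map<LocalSet*, std::set<LocalGet*>> setInfluences;
};
```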
|
This renames "delegate_br_target" to "delegate_trampoline". So how we
translate `try`-`delegate` is:
- Before:
```wast
(try $delegate_target
  ...
  (try
    (do
      ...
    )
    (delegate $delegate_target)
  )
  ...
)
```
- After:
```wast
(try_table $delegate_target
  (throw_ref
    (block $delegate_br_target (result exnref)
      ...
      (try_table (catch_all_ref $delegate_br_target)
        ...
      )
      ...
    )
  )
)
```
So `delegate_br_target` is the destination we branch to (via `try_table`)
in order to rethrow the exnref using `throw_ref`. But given that the
translated code does not actually contain a `br`, that name can be
confusing. This renames `br_target` to `trampoline`, since the block is
what we bounce the exnref off of to reach the real delegate target.
This is to be consistent with the variable names in the LLVM
implementation (which has not been submitted yet).
|
This does not use the CFG yet, so there is no benefit (and likely some small
slowdown). The next PR will actually use it to fix a correctness bug. This PR
only sets up the CFG and converts the pass to operate on it, without changing
any behavior or tests.
Followup to #6882
|
HeapStoreOptimization (#6882)
This just moves code out of OptimizeInstructions to the new pass. The existing
test is renamed and now runs the new pass instead. The new pass is run right
after each --optimize-instructions invocation, so it should not cause any
noticeable effects whatsoever, making this NFC.
The motivation here is that there is a bug in the pass, see the new testcase
added at the end, which shows the bug. It is not practical to fix that bug in
OptimizeInstructions since we need more than peephole optimizations to do
so. This PR moves the code to a new pass so we can fix it there properly,
later.
The new pass is named HeapStoreOptimization since the same infrastructure
we will need to fix the bug will also help dead store elimination and related
things.
|
This pass may do multiple iterations, and before this PR it scanned the entire
module each time. That is simpler than tracking stale data, but it can be quite
slow. This PR adds staleness tracking, which makes it over 3x faster (and this
can be one of our slowest passes in some cases, so this is significant).
To achieve this:
* Add a staleness marker on function info.
* Rewrite how we track unseen calls. Previously we used atomics in a clever
  way; now we just accumulate the data in a simple way (easier for staleness
  tracking).
* Add staleness invalidation in the proper places.
* Add a param to localizeCallsTo to allow us to learn when a function is changed.
This kind of staleness analysis is usually not worthwhile, but given the 3x plus
speedup it seems justified. I fuzzed it directly, and also any staleness bug
can lead to validation errors, so normal fuzzing also gives us good coverage here.
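A minimal sketch of the staleness idea, with hypothetical names: rescan only
the functions whose cached info has been invalidated since they were last
scanned.
```cpp
#include <unordered_map>
#include <vector>

struct Function;

struct FunctionInfo {
  bool stale = true; // must be (re)computed before use
  // ...cached analysis results would live here...
};

class Analyzer {
public:
  void invalidate(Function* func) { infos[func].stale = true; }

  void iterate(const std::vector<Function*>& funcs) {
    for (auto* func : funcs) {
      auto& info = infos[func];
      if (!info.stale) {
        continue; // skip functions whose info is still valid
      }
      scan(func, info);
      info.stale = false;
    }
  }

private:
  void scan(Function* func, FunctionInfo& info); // the expensive part
  std::unordered_map<Function*, FunctionInfo> infos;
};
```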
|
Specified at
https://github.com/WebAssembly/half-precision/blob/main/proposals/half-precision/Overview.md
A few notes:
- The F32x4 and F64x2 versions of madd and nmadd are missing spec tests.
- For madd, the implementation was incorrectly doing `(b*c)+a` where it
should be `(a*b)+c`.
- For nmadd, the implementation was incorrectly doing `(-b*c)+a` where
it should be `-(a*b)+c`.
- There doesn't appear to be a great way to actually implement a fused
nmadd, but the spec allows the double rounded version I added.
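The corrected semantics from the notes above, as a sketch: without hardware
FMA this is the "double rounded" form (rounding once after the multiply and
once after the add) that the relaxed semantics permit.
```cpp
float madd(float a, float b, float c) { return (a * b) + c; }
float nmadd(float a, float b, float c) { return -(a * b) + c; }
```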
|
Previously they were structs and their results were accessed with
`operator*()`, but that was unnecessarily complicated and could lead to
problems with temporary lifetimes being too short. Simplify the
utilities by making them functions. This also allows the wrapper
templates to infer the proper element types automatically.
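A hypothetical illustration of the lifetime hazard the struct-based design
invited:
```cpp
#include <vector>

// A struct whose result is reached via operator*() makes it easy to bind
// a reference to a member of a temporary, which dies at the end of the
// full expression.
struct SortResult {
  std::vector<int> order;
  std::vector<int>& operator*() { return order; }
};

SortResult computeSort();

void example() {
  auto& order = *computeSort(); // dangling: the temporary is gone here
  (void)order;
}

// A plain function sidesteps this: callers receive (and own) the value.
std::vector<int> computeSortOrder();
```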
|
Use the new TopologicalSort and MinTopologicalSortOf utilities instead
of the old CRTP topological sort utility and a bespoke heap-based
topological sort in ReorderGlobals. Since there is no longer a heap to
pop from, the direction of the custom comparator is now much more
intuitive.
Further simplify the code by switching from tracking the new order of
globals using a sequence of new indices to tracking the order using a
sequence of old indices.
This change also makes the pass about 20% faster on a large real-world
module.
|
Reuse the code implementing Kahn's topological sort algorithm with a new
configuration that uses a min-heap to always choose the best available
element.
Also add wrapper utilities that can find topological sorts of graphs
with arbitrary element types, not just indices.
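A compact sketch of the configuration: Kahn's algorithm with the worklist
stored in a min-heap, so among all currently available elements the smallest
(here, simply the smallest index) is always emitted next, yielding a minimal
topological sort.
```cpp
#include <cstddef>
#include <functional>
#include <queue>
#include <vector>

std::vector<std::size_t>
minTopologicalSort(const std::vector<std::vector<std::size_t>>& children) {
  std::vector<std::size_t> inDegree(children.size());
  for (const auto& succs : children) {
    for (std::size_t succ : succs) {
      ++inDegree[succ];
    }
  }
  // greater<> turns the default max-heap into a min-heap over indices.
  std::priority_queue<std::size_t, std::vector<std::size_t>,
                      std::greater<std::size_t>> ready;
  for (std::size_t i = 0; i < children.size(); ++i) {
    if (inDegree[i] == 0) {
      ready.push(i);
    }
  }
  std::vector<std::size_t> order;
  while (!ready.empty()) {
    std::size_t curr = ready.top();
    ready.pop();
    order.push_back(curr);
    for (std::size_t succ : children[curr]) {
      if (--inDegree[succ] == 0) {
        ready.push(succ);
      }
    }
  }
  return order; // incomplete if the graph had a cycle
}
```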
|
Before, we just had a map that people would access with
localGraph.getSetses[get]; now it is a call, localGraph.getSets(get), which
hides the internal implementation details more cleanly.
Also rename getSetses => getSetsMap.
This will allow a later PR to optimize the internals of this API.
This is performance-neutral as far as I can measure. (We do replace a direct read
from a data structure with a call, but the call is in a header and should always get
inlined.)
|
The instructions relaxed_fma and relaxed_fnma have been renamed to
relaxed_madd and relaxed_nmadd.
https://github.com/WebAssembly/relaxed-simd/blob/main/proposals/relaxed-simd/Overview.md#binary-format
|
The parser function for `action` returned a `MaybeResult`, but we were
treating it as returning a normal `Result` and not checking that it had
contents in several places. Replace the current `action()` with
`maybeAction()` and add a new `action()` that requires the action to be
present.
Fixes #6872.
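A simplified sketch of the distinction (the real types differ): a
`MaybeResult` can be an error, a value, or legitimately absent, so callers
must check for contents rather than assume a value exists.
```cpp
#include <optional>
#include <string>
#include <variant>

struct Err { std::string msg; };

template<typename T> using Result = std::variant<T, Err>;

template<typename T> struct MaybeResult {
  std::variant<std::optional<T>, Err> val;
  bool hasValue() const; // false both for errors and for "not present"
};

struct Action {};

MaybeResult<Action> maybeAction(); // absence is acceptable here
Result<Action> action();           // absence becomes a parse error
```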
|
Specified at
https://github.com/WebAssembly/half-precision/blob/main/proposals/half-precision/Overview.md
|
This pass constructed a LocalGraph, which computes the sets that reach each get. But
all we need to know is which params are live, so instead we can do a liveness
computation (which is just a boolean, not the list of sets). Also, it is simple to get
the liveness computation to only work on the parameters and not all the locals,
as a further optimization.
Existing tests cover this, though I did find that the case of unreachability needed
a new test.
On a large testcase I am looking at, this makes --dae 17% faster.
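A minimal sketch of the cheaper analysis (hypothetical structures): per
parameter we track a single read-before-overwrite bit, rather than
LocalGraph's full set of reaching local.sets per local.get.
```cpp
#include <cstddef>
#include <vector>

struct Instr {
  enum Kind { Get, Set, Other };
  Kind kind = Other;
  std::size_t local = 0; // local index, meaningful for Get/Set
};

// Returns, for each of numParams parameters, whether its incoming value
// may be read, for one straight-line block. A real analysis would iterate
// over the CFG, but the state stays a boolean per parameter.
std::vector<bool> liveParams(const std::vector<Instr>& block,
                             std::size_t numParams) {
  std::vector<bool> live(numParams, false);
  std::vector<bool> overwritten(numParams, false);
  for (const auto& instr : block) {
    if (instr.local >= numParams) {
      continue; // track only parameters, not all locals
    }
    if (instr.kind == Instr::Get && !overwritten[instr.local]) {
      live[instr.local] = true; // read before any overwrite
    } else if (instr.kind == Instr::Set) {
      overwritten[instr.local] = true;
    }
  }
  return live;
}
```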
|
visitBlock() and validateCallParamsAndResult() both assumed they were
running inside a function, but might be called on global code too. Calls
and blocks are invalid in global positions, so we should error there, but
must do so properly without a null deref.
Fixes #6847
Fixes #6848
|
Spec tests use constants like `ref.array` and `ref.eq` to assert that
exported functions return references of the correct types. Support more
such constants in the wast parser.
Also fix a bug where the interpretation of `array.new_data` for arrays
of packed fields was not properly truncating the packed data. Move the
function for reading fields from memory from literal.cpp to
wasm-interpreter.h, where the function for truncating packed data lives.
Other bugs prevent us from enabling any more spec tests as a result of
this change, but we can get farther through several of them before
failing. Update the comments about the failures accordingly.
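A simplified sketch of the truncation: values stored into i8 or i16 array
fields must be masked to the field's width.
```cpp
#include <cstdint>

int32_t truncateForPackedField(int32_t value, int fieldBits /* 8 or 16 */) {
  uint32_t mask = (fieldBits == 8) ? 0xffu : 0xffffu;
  return static_cast<int32_t>(static_cast<uint32_t>(value) & mask);
}
```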
|
Ensure the "fp16" feature is enabled for FP16 instructions.
|
When precomputing fails on a child block of a parent block, there is no point
in precomputing the parent, as that will fail as well.
This makes --precompute on Emscripten's test_biggerswitch go from 1.44
seconds to 0.02 seconds (not a typo, that is 72x faster). The absolute number
is not that big, but we do run this pass more than once, so it saves a noticeable
chunk of time.
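A minimal sketch of the idea (hypothetical names): remember which expressions
failed to precompute and skip any parent containing one, since the parent is
guaranteed to fail too.
```cpp
#include <unordered_set>

struct Expression {
  Expression** children = nullptr;
  int numChildren = 0;
};

class Precomputer {
public:
  bool tryPrecompute(Expression* expr) {
    for (int i = 0; i < expr->numChildren; i++) {
      if (failed.count(expr->children[i])) {
        failed.insert(expr); // the parent inherits the failure
        return false;
      }
    }
    if (!precompute(expr)) {
      failed.insert(expr);
      return false;
    }
    return true;
  }

private:
  bool precompute(Expression* expr); // the actual evaluation
  std::unordered_set<Expression*> failed;
};
```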
|
This is in quite ancient code, so it's a long-standing issue, but it got worse
when we enabled StackIR in more situations (#6568), which made it more
noticeable, I think.
For example, testing test_biggerswitch in Emscripten, the LLVM part is
pretty slow too, so the Binaryen slowdown didn't stand out hugely, but
just doing
wasm-opt --optimize-level=2 input.wasm -o output.wasm
(that is, do no work, but set the optimize level to 2 so that StackIR opts
are run) used to take 28 seconds (!). With this PR that goes down to less
than 1 second.
|
The best way to lower strings is via the "magic imports" API that uses
the names of imported string globals as their values. This approach only
works for valid UTF-8 strings, though. The existing
string-lowering-magic-imports pass falls back to putting non-UTF-8
strings in a JSON custom section, but this requires the runtime to
support that custom section for correctness. To help catch errors early
when runtimes do not support the strings custom section, add a new pass
that uses magic imports and raises an error if there are any invalid
strings.
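A sketch of the strict variant's core check, with hypothetical names and an
illustrative error message; the point is simply to fail fast on any string
that a magic import cannot represent.
```cpp
#include <stdexcept>
#include <string>
#include <vector>

bool isValidUTF8(const std::string& bytes); // assumed helper

void lowerStringsStrict(const std::vector<std::string>& strings) {
  for (const auto& str : strings) {
    if (!isValidUTF8(str)) {
      throw std::runtime_error("string is not valid UTF-8 and would "
                               "require the strings custom section");
    }
    // Emit an imported global whose import name is the string's value
    // (the "magic imports" convention).
  }
}
```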
|
Specified at
https://github.com/WebAssembly/half-precision/blob/main/proposals/half-precision/Overview.md
|
possible-contents.h hashes the location for caught exnrefs by hashing an
arbitrary string, "caught-exnref-location". It previously used
`std::hash<const char*>` for this, but some standard library
implementations report an error when this template instantiation is used
because hashing the location of a string is almost never correct. In
this case it is fine, so switch to using `std::hash<const void*>`.
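The gist of the fix: both specializations hash the pointer value, but the
`const void*` form states that intent explicitly and avoids the diagnostic.
```cpp
#include <cstddef>
#include <functional>

inline std::size_t hashCaughtExnrefLocation() {
  const char* name = "caught-exnref-location";
  // Hash the address itself; the string's contents are irrelevant here.
  return std::hash<const void*>{}(static_cast<const void*>(name));
}
```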
|
Spec tests pass the value `ref.extern n`, where `n` is some integer,
into exported functions that expect to receive externrefs and receive
such values back out as return values. The payload serves to distinguish
externrefs so the test can assert that the correct one was returned.
Parse these values in wast scripts and represent them as externalized
i31refs carrying the payload. We will need a different representation
eventually, since some tests explicitly expect these externrefs to not
be i31refs, but this suffices to get several new tests passing.
To get the memory64 version of table_grow.wast passing, additionally fix
the interpreter to handle growing 64-bit tables correctly.
Delete the local versions of the upstream tests that can now be run
successfully.
|
The leading bytes that indicate what kind of heap type is being defined
are bytes, but we were previously treating them as SLEB128-encoded
values. Since we emit the smallest LEB encodings possible, we were
writing the correct bytes in output files, but we were also improperly
accepting binaries that used more than one byte to encode these values.
This was caught by an upstream spec test.
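A small, self-contained illustration using the func type prefix, whose value
is -0x20: the canonical encoding is the single byte 0x60, but a redundant
two-byte SLEB128 spelling of the same value exists, and the old code wrongly
accepted it. Reading the value as a byte rejects the over-long form.
```cpp
#include <cassert>
#include <cstdint>

// Decode a (possibly over-long) signed LEB128 value.
int64_t readSLEB128(const uint8_t* bytes, int* length) {
  int64_t result = 0;
  int shift = 0;
  int i = 0;
  uint8_t byte;
  do {
    byte = bytes[i++];
    result |= static_cast<int64_t>(byte & 0x7f) << shift;
    shift += 7;
  } while (byte & 0x80);
  if (shift < 64 && (byte & 0x40)) {
    result |= -(static_cast<int64_t>(1) << shift); // sign extend
  }
  *length = i;
  return result;
}

int main() {
  const uint8_t canonical[] = {0x60};      // one byte, as the spec requires
  const uint8_t overlong[] = {0xe0, 0x7f}; // two-byte SLEB128 for -0x20
  int len;
  assert(readSLEB128(canonical, &len) == -0x20 && len == 1);
  assert(readSLEB128(overlong, &len) == -0x20 && len == 2);
}
```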
|
Run the upstream tests by default, except for a large list of them that
do not run successfully. Remove the local versions of those that do run
successfully, wherever the local version is entirely subsumed by the
upstream version.
|
* Add interpreter support for exnref values.
* Fix optimization passes to support try_table.
* Enable the interpreter (but not in V8, see code) on exceptions.
|
IRBuilder is responsible for validation involving type annotations on GC
instructions because those type annotations may not be preserved in the
built IR to be used by the main validator. For `array.init_elem`, we
were not using the type annotation to validate the element segment,
which allowed us to parse invalid modules when the reference operand was
a nullref. Add the missing validation in IRBuilder and fix a relevant
spec test.
|
We previously printed explicit typeuses (e.g. `(type $f)`) in function
signatures when GC was enabled. But even when GC is not enabled,
function types may use non-MVP features that require the explicit
typeuse to be printed. Fix the printer to always print the explicit typeuse
for such types.
Fixes #6850.
|
Replace code that checked `isStruct()`, `isArray()`, etc. in sequence
with uses of `HeapType::getKind()` and switch statements. This will make
it easier to find the code that needs updating if/when we add new heap
type kinds in the future. It also makes it much easier to find code that
already needs updating to handle continuation types by grepping for
"TODO: cont".
|
Most of our type optimization passes emit all non-public types as a
single large rec group, which trivially ensures that different types
remain different, even if they are optimized to have the same structure.
Usually emitting a single large rec group is fine, but it also means
that if the module is split, all of the types will need to be repeated
in all of the split modules. To better support this use case, add a pass
that can split the large rec group back into minimal rec groups, taking
care to preserve separate type identities by emitting different
permutations of the same group where possible or by inserting unused
brand types to differentiate them.
|
Audit the remaining occurrences of `== HeapType::` and fix those that did
not handle shared types correctly. Add tests for some of the fixes;
others are NFC but clarify the code.
|
Also use TableInit in the interpreter to initialize a module's table
state, which will now handle traps properly, fixing #6431.
|
The previous rules for stale types were complicated and hard to
remember: in general it was ok for result types to be further refinable
as long as they were not refinable all the way to `unreachable`, but
control flow structures had a carve-out and it was ok for them to be
refinable all the way to unreachable.
Simplify the rules so that further refinable result types are always ok,
no matter what they can be refined to and no matter what kind of
instruction is being validated. This will be much easier to remember and
reason about.
This relaxation of the rules strictly increases the set of valid IR, so
no passes or tests need to be updated. It does make it possible for us
to miss type refinement opportunities that previously would have been
validation errors, but only in cases where non-control-flow instructions
could have been refined all the way to unreachable, so the risk seems
small.
|
Diff without whitespace is smaller.
* HeapType::ext was handled in two places. The second place was wrong, but not reached.
* Near the end all we have left are refs, so no need to check isRef etc.
* Simplify the code to get the heap type once.
|
This is based on these two proposals:
* https://github.com/WebAssembly/tool-conventions/blob/main/BuildId.md
* https://github.com/tc39/source-map/blob/main/proposals/debug-id.md
|
Since reference types only introduced function and extern references,
all of the types in the `any` hierarchy require GC, including `none`.
Fixes #6839.
|
Previously we included supertypes, but did not increase their count.
This was done so that the output for the nominal type system, which
introduced explicitly supertypes, would more closely match the output
with the old equirecursive types system. Neither type system exists
anymore and we only support the single, standard isorecursive type
system, so we can now properly count supertypes. It turns out it doesn't
make much of a difference in the test outputs anyway.
|
The argument is the minimum benefit we must see in order to decide to
optimize, e.g.
--monomorphize --pass-arg=monomorphize-min-benefit@50
When the minimum benefit is 50%, we optimize wherever monomorphization
reduces the cost by at least 50%. A value of 95% would only optimize where
we remove almost all the cost, etc.
In practice I see that 95% actually tends to reduce code size overall: while
we add monomorphized versions of functions, we only do so when we remove a
lot of work and size, and after inlining we gain further benefits. However,
50% or even lower can lead to better benchmark results, in return for larger
code size, just like with inlining. To be careful, the default is set to 95%.
Previously we optimized whenever we saw any benefit at all, which is the same
as requiring a minimum benefit of 0%. Old tests have the flag applied in this PR
to set that value, so they do not change.
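A worked example of the threshold; the benefit formula is inferred from the
description above, not taken from the pass's source:
```cpp
#include <cassert>

// Benefit as the percentage of the call's cost that monomorphization removes.
double benefitPercent(double costBefore, double costAfter) {
  return 100.0 * (costBefore - costAfter) / costBefore;
}

int main() {
  double benefit = benefitPercent(80, 30); // removes 62.5% of the cost
  assert(benefit >= 50);    // optimized under monomorphize-min-benefit@50
  assert(!(benefit >= 95)); // skipped under the default of 95
}
```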
|