forks/binaryen.git -

	Commit message (Collapse)	Author	Age	Files	Lines
...
*	[GC] Ignore public types in SignaturePruning (#7018)	Alon Zakai	2024-10-18	1	-15/+6
\| \| \| \| \| \| \| \| \| \|	Similar to #7017. As with that PR, this reduces some optimizations that were valid, as we tried to do something complex here and refine types in a public rec group when it seemed safe to do so, but our analysis was incomplete. The testcase here shows how another operation can end up causing a dependency that breaks things, if another type that uses one that we modify is public. To be safe, ignore all public types. In the future perhaps we can find a good way to handle "almost-private" types in public rec groups, in closed world.
*	[GC] Ignore public types in SignatureRefining (#7022)	Alon Zakai	2024-10-18	1	-12/+5
\| \| \|	Similar to #7017 and #7018
*	[EH] Add TryTable to StripEH (#7020)	Alon Zakai	2024-10-18	1	-0/+5
\|
*	[GC] Ignore public types in GlobalTypeOptimization (#7017)	Alon Zakai	2024-10-17	1	-3/+16
\| \| \| \| \| \|	TypeUpdater which it uses internally already does so, but we must also ignore such types earlier, and make no other modifications to them. Helps #7015
*	[EH][GC] Send a non-nullable exnref from TryTable (#7013)	Alon Zakai	2024-10-17	3	-5/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When EH+GC are enabled then wasm has non-nullable types, and the sent exnref should be non-nullable. In Binaryen IR we use the non-nullable type all the time, which we also do for function references and other things; we lower it if GC is not enabled to a nullable type for the binary format (see `WasmBinaryWriter::writeType`, to which comments were added in this PR). That is, this PR makes us handle exnref the same as those other types. A new test verifies that behavior. Various existing tests are updated because ReFinalize will now use the more refined type, so this is an optimization. It is also a bugfix as in #6987 we started to emit the refined form in the fuzzer, and this PR makes us handle it properly in validation and ReFinalization.
*	[EH][GC] Add missing subtyping constraints from TryTable (#7012)	Alon Zakai	2024-10-16	1	-1/+7
\| \| \| \| \|	Similar to Break, BrOn, etc., we must apply subtyping constraints of the types we send to blocks, so that Unsubtyping will not remove subtypings that are actually needed.
*	[NFC] Add validation checks in OptUtils::optimizeAfterInlining (#7009)	Alon Zakai	2024-10-16	1	-5/+20
\| \| \| \| \| \| \|	This can help find errors in the middle of passes like Inlining, that do multiple cycles and include optimizations in the middle. We do this in BINARYEN_PASS_DEBUG >= 2 to avoid slowing down the timing reports in 1.
*	[Wasm GC] Fuzz BrOn (#7006)	Alon Zakai	2024-10-16	2	-6/+119
\|
*	[NFC] Remove unused, ancient file wasm-module-building.h (#7010)	Alon Zakai	2024-10-16	1	-316/+0
\| \| \| \|	This was used in asm2wasm (the asm.js to wasm compiler, used in fastcomp, before the LLVM wasm backend replaced it).
*	GlobalRefining: Do not refine mutable exported globals (#7007)	Alon Zakai	2024-10-15	1	-2/+9
\| \| \| \| \|	A mutable exported global might be shared with another module which writes to it using the current type. If we refined the type, those writes would be unsafe, and the type system does not allow such a mismatch, so do not refine there.
*	[Strings] StringGathering: Handle uses of strings before their definitions (#7008)	Alon Zakai	2024-10-15	1	-10/+8
\| \| \| \| \| \| \| \| \| \|	When we gather strings, we create a new global for each one, which then serves as the canonical defining global for it and is used everywhere else. We create such a global if we lack one, but if we happen to have such a global - a global that simply defines a string - then we reuse it. But we didn't handle the case where there was a use before the definition, and failed to sort the definition before the use.
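The ordering problem described in this message can be illustrated with a minimal Python sketch. It is not Binaryen's code: the `Global` class, its fields, and both helper functions are hypothetical, and a stable partition is just one plausible way to move string-defining globals ahead of their uses.

```python
from dataclasses import dataclass, field


@dataclass
class Global:
    name: str
    # Names of other globals read by this global's initializer (hypothetical model).
    uses: list = field(default_factory=list)
    # True if the global simply defines a string.
    defines_string: bool = False


def sort_definitions_first(globals_):
    """Stable partition: string-defining globals first, all others after."""
    defining = [g for g in globals_ if g.defines_string]
    others = [g for g in globals_ if not g.defines_string]
    return defining + others


def uses_before_definitions(globals_):
    """Return (user, used) pairs where a global is read before it is defined."""
    seen, bad = set(), []
    for g in globals_:
        bad += [(g.name, u) for u in g.uses if u not in seen]
        seen.add(g.name)
    return bad


# A use that appears before the definition it depends on.
module = [
    Global("$user", uses=["$str"]),
    Global("$str", defines_string=True),
]
```

Calling `uses_before_definitions(module)` reports the offending pair, and the same check on `sort_definitions_first(module)` reports none.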
*	[WasmGC] OptimizeInstructions: Cancel out internalize+externalize pairs (#7005)	Alon Zakai	2024-10-14	1	-1/+11
\|
*	[Wasm EH] Optimize away _ref from try_table catches when unused (#6996)	Alon Zakai	2024-10-14	1	-4/+52
\|	If we have
\|
\|	  (drop
\|	    (block $b (result exnref)
\|	      (try_table (catch_all_ref $b)
\|
\|	then we don't really need to send the ref: it is dropped, so we can just replace catch_all_ref with catch_all and then remove the drop and the block value.
\|
\|	MergeBlocks already had logic to remove block values, so it is the natural place to add this.
*	[WasmGC] OptimizeInstructions: Reorder externalize/internalize operations with ref.as_non_null (#7004)	Alon Zakai	2024-10-14	1	-2/+21
\|	  (any.convert_extern/extern.convert_any (ref.as_non_null ..))
\|	    =>
\|	  (ref.as_non_null (any.convert_extern/extern.convert_any ..))
\|
\|	This then allows the RefAsNonNull to be combined with parents in some cases (whereas the reverse allows nothing).
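As a rough illustration of the rewrite in this message, the following Python sketch applies it to expressions modeled as nested tuples. The tuple encoding and the `reorder` function are assumptions made for the example, not Binaryen's IR or API.

```python
# Conversion operations named in the commit message.
CONVERSIONS = {"any.convert_extern", "extern.convert_any"}


def reorder(expr):
    """Rewrite (convert (ref.as_non_null x)) into (ref.as_non_null (convert x))."""
    if not isinstance(expr, tuple):
        return expr  # leaf, e.g. a local name
    op, *children = expr
    children = [reorder(c) for c in children]
    if op in CONVERSIONS and len(children) == 1:
        child = children[0]
        if isinstance(child, tuple) and child[0] == "ref.as_non_null":
            # Hoist the null check above the conversion.
            return ("ref.as_non_null", (op, child[1]))
    return (op, *children)


before = ("any.convert_extern", ("ref.as_non_null", "x"))
after = reorder(before)
```

After the rewrite the `ref.as_non_null` is the outer node, which is the position where a parent could combine with it.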
*	Optimize Module::get_* family of functions with std::string_view in getModuleElement (#6998)	Petr Makhnev	2024-10-10	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \|	Passing a constant string to functions requires memory allocation, and allocation is inherently slow. Since we are using C++17, we can use string_view and remove this unnecessary allocation. Although the code seems simple enough for the optimizer to remove this allocation after inlining, tests on Clang 18 show that this is not the case (on Apple Silicon at least).
*	[Wasm EH] Optimize values flowing out of TryTable (#6997)	Alon Zakai	2024-10-10	1	-4/+5
\|	This allows
\|
\|	  (block $out (result i32)
\|	    (try_table (catch..)
\|	      ..
\|	      (br $out
\|	        (i32.const 42)
\|	      )
\|	    )
\|	  )
\|	  =>
\|	  (block $out (result i32)
\|	    (try_table (result i32) (catch..) ;; add a result
\|	      ..
\|	      (i32.const 42) ;; remove the br around the value
\|	    )
\|	  )
*	ReFinalize in MergeBlocks so we can optimize unreachable instructions too (#6994)	Alon Zakai	2024-10-10	1	-25/+15
\|	In #6984 we optimized dropped blocks even if they had unreachable code. In #6988 that part was reverted, and blocks with unreachable code were ignored once more. However, I realized that the check was not actually for unreachable code, but for having an unreachable child, so it would miss things like this:
\|
\|	  (block
\|	    (block
\|	      ..
\|	      (br $somewhere) ;; unreachable type, but no unreachable code
\|	    )
\|	  )
\|
\|	But it is useful to merge such blocks: we don't need the inner block here.
\|
\|	To fix this, just run ReFinalize if we change anything, which will propagate unreachability as needed. I think MergeBlocks was written before we had that utility, so it didn't use it...
\|
\|	This is not only useful for itself but will unblock an EH optimization in a later PR, that has code in this form. It also simplifies the code by removing the hasUnreachableChild checks.
*	Fix BranchUtils::operateOnScopeNameUsesAndSentValues() on BrOn (#6995)	Alon Zakai	2024-10-10	1	-1/+2
\| \| \| \| \|	BrOn does not always send a value. This is an odd asymmetry in the wasm spec, where br_on_null does not send the null on the branch (which makes sense, but the asymmetry does mean we need to special-case it).
*	Fix flow reset during throw => break opts in RemoveUnusedBrs (#6993)	Alon Zakai	2024-10-08	1	-0/+4
\| \| \| \| \| \| \| \|	#6980 was missing the logic to reset flows after replacing a throw. The process of replacing the throw introduces new code and in particular a drop, which blocks branches from flowing to their targets. In the testcase here, the br was turned into nop before this fix.
*	Fuzzer: Generate TryTables (#6987)	Alon Zakai	2024-10-07	2	-0/+69
\| \| \| \|	Also make Try/TryTables with type none, and not just concrete types as before.
*	Add explicit errors on unhandled instructions in Flatten (#6992)	Alon Zakai	2024-10-07	1	-0/+5
\| \| \|	This error makes #6989 less confusing.
*	Fix a misoptimization with mixed Try/TryTable in RemoveUnusedBrs (#6991)	Alon Zakai	2024-10-07	1	-10/+16
\| \| \| \|	We ignored legacy Trys in #6980, but they can also catch.
*	Fix a fuzz issue with #6984 (#6988)	Alon Zakai	2024-10-07	1	-14/+13
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	When I refactored the optimizeDroppedBlock logic in #6982, I didn't move the unreachability check with that code, which was wrong. When that function was called from another place in #6984, the fuzzer found an issue. Diff without whitespace is smaller. This reverts almost all the test updates from #6984 - those changes were on blocks with unreachable children. The change was safe on them, but in general removing a block value in the presence of unreachable code is tricky, so it's best to avoid it. The testcase is a little bizarre, but it's the one the fuzzer found and I can't find a way to generate a better one (other than to reduce it, which I did).
*	MergeBlocks: Optimize all dropped blocks (#6984)	Alon Zakai	2024-10-04	1	-0/+9
\| \| \| \| \| \|	Just call optimizeDroppedBlock from visitDrop to handle that. Followup to #6982. This optimizes the new testcase added there. Some older tests also improve.
*	RemoveUnusedBrs: Generalize jump threading optimizations to all branches (#6983)	Alon Zakai	2024-10-04	1	-24/+27
\| \| \| \| \| \| \| \|	This change is NFC on all things we previously optimized, but also makes us optimize TryTable, BrOn, etc., by replacing hard-coded logic for Break with generic code. Also simplify the code there a little - we didn't really need ControlFlowWalker.
*	[NFC] Refactor out the dropped-block optimization code in MergeBlocks (#6982)	Alon Zakai	2024-10-03	1	-30/+46
\| \| \| \| \| \| \| \|	This just moves the code out into a function. A later PR will use it in another place. Add a test that shows the motivation for that later PR: we fail to optimize away a block return value at the top level of a function. Fixing that will involve calling the new function here from another place.
*	[Wasm EH] Optimize throws caught by TryTable into breaks (#6980)	Alon Zakai	2024-10-03	1	-16/+79
\|	E.g.
\|
\|	  (try_table (catch_all $catch)
\|	    (throw $e)
\|	  )
\|	  =>
\|	  (try_table (catch_all $catch)
\|	    (br $catch)
\|	  )
\|
\|	This can then allow other passes to remove the TryTable, if no throwing things remain.
*	Source Maps: Support 5 segment mappings (#6795)	Ömer Sinan Ağacan	2024-10-01	8	-38/+179
\| \| \| \| \| \| \|	Support 5-segment source mappings, which add a name. Reference: https://github.com/tc39/source-map/blob/main/source-map-rev3.md#proposed-format
*	[NFC] Move a TypeInfo constructor out of a header (#6979)	Alon Zakai	2024-10-01	2	-1/+3
\| \| \| \|	Some versions of libcxx or clang error without this, apparently due to Type being a forward declaration.
*	Binary parser: Lift the limit on the number of locals (#6973)	Jérôme Vouillon	2024-09-30	1	-6/+14
\| \| \| \| \| \| \|	This raises the number of locals accepted by the binary parser to the absolute limit in the spec. A warning is now printed when writing a binary file if the Web limit of 50,000 locals is exceeded. Fixes #6968.
*	Fix the type of reused RefFunc in Precompute (#6976)	Alon Zakai	2024-09-30	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \|	When we precompute something, we try to avoid allocating a new copy. That's important to avoid many allocations each time we run Precompute - otherwise, each time we see a br we'd allocate a fresh one, and for its values. But we had a bug where we reused a RefFunc as the value of a br without updating the type. It's actually tricky to reach a situation where we find a RefFunc to reuse and it is different from the actual one we want, but the fuzzer found one. Fixes the fuzz bug reported on #6845 (but unrelated to that PR).
*	[NFC] Print type names in more places when logging (#6975)	Alon Zakai	2024-09-30	3	-3/+12
\|
*	[FP16] Implement conversion operations. (#6974)	Brendan Dahl	2024-09-26	16	-6/+188
\| \| \| \| \| \| \| \| \| \|	Note: FP16 is a little different from F32/F64 since it can't represent the full 2^16 integer range. 65504 is the max whole integer. This leads to some slightly strange behavior when converting integers greater than 65504 since they become infinity. Specified at https://github.com/WebAssembly/half-precision/blob/main/proposals/half-precision/Overview.md
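The overflow behavior noted here can be checked with a small Python sketch. It assumes IEEE 754 binary16 with round-to-nearest-even, which is what Python's `struct` format `"e"` implements; whether Binaryen's conversion rounds the same way is an assumption here. Under that rounding, integers up to 65519 still round down to 65504 and infinity starts at 65520. The helper name `int_to_f16` is made up for this example.

```python
import math
import struct

# Largest finite half-precision value: (2 - 2**-10) * 2**15.
F16_MAX = 65504.0


def int_to_f16(n: int) -> float:
    """Round an integer to the nearest binary16 value, overflowing to infinity."""
    try:
        # "e" is IEEE 754 binary16; packing rounds to nearest, ties to even.
        (value,) = struct.unpack("<e", struct.pack("<e", float(n)))
        return value
    except OverflowError:
        # Python raises instead of producing infinity, so map that case by hand.
        return math.inf if n > 0 else -math.inf


samples = {n: int_to_f16(n) for n in (2048, 2049, 65504, 65519, 65520, 100000)}
```

The `samples` dictionary shows both effects: not every integer above 2048 is representable, and large magnitudes become infinity.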
*	[NFC-ish] Stop creating unneeded blocks around calls when inlining (#6969)	Alon Zakai	2024-09-26	1	-6/+3
\|	Inlining was careful about nested calls like this:
\|
\|	  (call $a
\|	    (call $b)
\|	  )
\|
\|	If we inlined the outer call first, we'd have
\|
\|	  (block $inlined-code-from-a
\|	    ..code..
\|	    (call $b)
\|	  )
\|
\|	After that, the inner call is a child of a block, not of a call. That is, we've moved the inner call to another parent. To replace that inner call when we inline, we'd need to update the new parent, which would require work. To avoid that work, the pass simply created a block in the middle:
\|
\|	  (call $a
\|	    (block
\|	      (call $b)
\|	    )
\|	  )
\|
\|	Now the inner call's immediate parent will not change when we inline the outer call.
\|
\|	However, it turns out that this was entirely unnecessary. We find the calls using a post-order traversal, and we store the actions in a vector that we traverse in order, so we only ever process things in the optimal order of children before parents. And in that order there is no problem: inlining the inner call first leads to
\|
\|	  (call $a
\|	    (block $inlined-code-from-b
\|	      (..code..)
\|	    )
\|	  )
\|
\|	That does not affect the outer call's parent.
\|
\|	This PR removes the creation of the unnecessary blocks. This doesn't improve the final output as optimizations remove the unneeded blocks later anyhow, but it does make the code simpler and a little faster. It also makes debugging less confusing.
\|
\|	But this is not truly NFC because --inlining (but not --inlining-optimizing) will actually emit fewer blocks now (but only --inlining-optimizing is used by default in production).
\|
\|	The diff on tests here is very small when ignoring whitespace. The remaining differences are just emitting fewer obviously-unneeded blocks. There is also one test that needed manual changes, inlining-eh-legacy, because it tested that we do Pop fixups, but after emitting one fewer block, those fixups were not needed. I added a new test there with two nested calls, which does end up needing those fixups. I also added such a test in inlining_all-features so that we have coverage for such nested calls (we might remove the eh-legacy file some day, and other existing tests with nested calls that I found were more complex).
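The children-before-parents ordering that this message relies on can be seen in a short Python sketch of a post-order walk. The tuple encoding of calls and the function name `post_order_calls` are hypothetical; the real pass works on Binaryen IR.

```python
def post_order_calls(expr, out=None):
    """Collect callee names in post-order: every child is listed before its parent."""
    if out is None:
        out = []
    if isinstance(expr, tuple):
        op, *children = expr
        for child in children:
            post_order_calls(child, out)  # visit operands first
        if op == "call":
            out.append(children[0])  # then record this call's callee name
    return out


# (call $a (call $b)) from the commit message.
nested = ("call", "$a", ("call", "$b"))
actions = post_order_calls(nested)
```

Processing `actions` front to back handles `$b` before `$a`, so the outer call's parent is untouched when the inner call is replaced.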
*	[NFC] Early-exit PickLoadSigns if there are no memories (#6971)	Alon Zakai	2024-09-26	1	-0/+5
\| \| \| \|	In WasmGC modules there is often no memory at all, and we can skip walking the code in this pass in such cases.
*	[NFC] Use an unordered map in Parents (#6970)	Alon Zakai	2024-09-26	1	-1/+1
\| \| \| \| \| \|	This makes it slightly faster. Followup to https://github.com/WebAssembly/binaryen/pull/6953#discussion_r1765313401
*	[NFC-ish] Avoid repeated ReFinalize etc. when inlining (#6967)	Alon Zakai	2024-09-24	1	-6/+24
\|	We may inline multiple times into a single function. Previously, if we did so, we did the "fixups" such as ReFinalize and non-nullable local fixes once per such inlining. But that is wasteful as each ReFinalize etc. scans the whole function, and could be done after we copy all the code from all the inlinings, which is what this PR does: it splits doInlining() into one function that inlines code and one that does the updates after, and the update is done after all inlinings.
\|
\|	This turns out to be very important, a 5x speedup on two large real-world wasm files I am looking at. The reason is that we actually inline more than once in half the cases, and sometimes far more - in one case we inline over 1,000 times into a function! (and ReFinalized 1,000 times too many)
\|
\|	This is practically NFC, but it turns out that there are some tiny noticeable differences between running ReFinalize once at the end vs. once after each inlining. These differences are not really functional or observable in the behavior of the code, and optimizations would remove them anyhow, but they are noticeable in two tests here. The changes to tests are, in order:
\|
\|	 * Different block names, just because the counter we use sees more things.
\|	 * In a testcase with unreachable code, we inline twice into a function, and the first inlining brings in an unreachable, and ReFinalizing early will lead to it propagating differently than if we wait to ReFinalize. (It actually leads to another cycle of inlining in that case, as a fluke.)
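The cost difference between fixing up after every inlining and fixing up once at the end can be sketched in Python as follows. The `Function` class, its `refinalize` counter, and `inline_all` are stand-ins invented for the example; they only count whole-function scans and do not model what ReFinalize actually computes.

```python
class Function:
    def __init__(self):
        self.body = []
        self.scans = 0  # number of whole-function fixup walks performed

    def refinalize(self):
        # Stands in for a fixup that scans the entire function body.
        self.scans += 1


def inline_all(func, callee_bodies, fixup_each_time):
    """Copy every callee body into func, running fixups eagerly or once at the end."""
    for body in callee_bodies:
        func.body.extend(body)
        if fixup_each_time:
            func.refinalize()
    if not fixup_each_time and callee_bodies:
        func.refinalize()


bodies = [["code"]] * 1000  # 1,000 inlinings into one function

eager = Function()
inline_all(eager, bodies, fixup_each_time=True)

deferred = Function()
inline_all(deferred, bodies, fixup_each_time=False)
```

Both functions end up with the same body, but `eager.scans` is 1000 while `deferred.scans` is 1.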
*	[NFC] Parallelize the actual inlining part of the Inlining pass (#6966)	Alon Zakai	2024-09-24	2	-30/+97
\|	We already parallelized collecting info about functions and finding inlining opportunities, but the actual inlining - copying the code into the target function - was done sequentially. It turns out that a lot of work happens there: this PR makes the pass over 2x faster.
\|
\|	Details:
\|
\|	 * Move nameHint to InliningAction, so it is there when we apply the actions.
\|	 * Add a DoInlining internal pass which calls doInlining().
\|	 * Refactor OptUtils a little to make it easy to run DoInlining before opts.
\|	 * Also remove the return value of doInlining which was not used.
*	[NFC] Eagerly create segments when parsing datacount (#6958)	Thomas Lively	2024-09-19	2	-3/+26
\| \| \| \| \| \| \| \| \|	The purpose of the datacount section is to pre-declare how many data segments there will be so that engines can allocate space for them and not have to back patch subsequent instructions in the code section that refer to them. Once we use IRBuilder in the binary parser, we will have to have the data segments available by the time we parse instructions that use them, so eagerly construct the data segments when parsing the datacount section.
*	[NFC] Eagerly create Functions in binary parser (#6957)	Thomas Lively	2024-09-19	4	-12/+23
\| \| \| \| \| \| \| \|	In preparation for using IRBuilder in the binary parser, eagerly create Functions when parsing the function section so that they are already created once we parse the code section. IRBuilder will require the functions to exist when parsing calls so it can figure out what type each call should have, even when there is a call to a function whose body has not been parsed yet.
*	[NFC] Add isSSA to LazyLocalGraph, and use it in OptimizeAddedConstants (#6952)	Alon Zakai	2024-09-18	3	-7/+52
\| \| \|	This makes the pass 15% faster.
*	[NFC] Avoid collecting unnecessary parents in OptimizeAddedConstants (#6953)	Alon Zakai	2024-09-18	1	-2/+24
\| \| \|	This makes the pass 20% faster.
*	[NFC + bugfix] Remove BreakTargetLocation from GUFA (#6956)	Alon Zakai	2024-09-18	2	-52/+34
\|	Before, a br would send its value to a BreakTargetLocation. That would then be linked to the target:
\|
\|	  { br's value } => BreakTargetLocation(target name) => { location of target }
\|
\|	This PR skips the middle:
\|
\|	  { br's value } => { location of target }
\|
\|	It just connects breaks directly to the targets. We can do that if we keep a map of the targets as we go.
\|
\|	This is 2% faster as well as simplifies the code, as an NFC refactoring. But it also fixes a bug: we have handling on ExpressionLocation that filters values as they come in (they must accord with the expression's type). We were not doing that on BreakTargetLocation, leading to an assert. Removing BreakTargetLocation entirely is easier and better than adding filtering logic for it.
\|
\|	Fixes #6955
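A minimal Python sketch of the direct linking described in this message, keeping a map of targets while walking: the tuple encoding, the `build_links` function, and the `("ExpressionLocation", label)` pairs are illustrative assumptions and do not mirror GUFA's actual data structures.

```python
def build_links(body):
    """Link each br's value directly to the location of its target block."""
    targets = {}  # label -> location of the block, maintained as we walk
    links = []    # (br value, location of target) pairs

    def walk(expr):
        if not isinstance(expr, tuple):
            return
        op = expr[0]
        if op == "block":
            _, label, *children = expr
            # A br is always nested inside its target, so the target is seen first.
            targets[label] = ("ExpressionLocation", label)
            for child in children:
                walk(child)
        elif op == "br":
            _, label, value = expr
            links.append((value, targets[label]))

    walk(body)
    return links


links = build_links(
    ("block", "$outer", ("block", "$inner", ("br", "$outer", "v1"), ("br", "$inner", "v2")))
)
```

Each link has a single hop from the sent value to the target's location, with no intermediate node between them.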
*	Improve types for null accesses and remove hacks (#6954)	Thomas Lively	2024-09-18	2	-22/+31
\|	When a struct.get or array.get is optimized to have a null reference operand, its return type loses meaning since the operation will always trap. Previously when refinalizing such expressions, we just left their return type unchanged since there was no longer an associated struct or array type to calculate it from. However, this could lead to a strange setup where the stale return type was the last remaining use of some heap type in the module. That heap type would never be emitted in the binary, but it was still used in the IR, so type optimizations would have to keep updating it. Our type collecting logic went out of its way to include the return types of struct.get and array.get expressions to account for this strange possibility, even though it otherwise collected only types that would appear in binaries.
\|
\|	In principle, all of this should have applied to `call_ref` as well, but the type collection logic did not have the necessary special case, so there was probably a latent bug there.
\|
\|	Get rid of these special cases in the type collection logic and make it impossible for the IR to use a stale type that no longer appears in the binary by updating such stale types during finalization. One possibility would have been to make the return types of null accessors unreachable, but this violates the usual invariant that unreachable instructions must either have unreachable children or be branches or `(unreachable)`. Instead, refine the return types to be uninhabitable non-nullable references to bottom, which is nearly as good as refining them directly to unreachable.
\|
\|	We can consider refining them to `unreachable` in the future, but another problem with that is that it would currently allow the parsers to admit more invalid modules with arbitrary junk after null accessor instructions.
*	[NFC] Make the GCData constructor a move constructor (#6946)	Alon Zakai	2024-09-17	4	-12/+36
\| \| \| \| \| \| \|	This avoids creating a large Literals (SmallVector of Literal) and then copying it. All the places that construct GCData do not need the Literals afterwards. This gives a 7% speedup on the --precompute benchmark from #6931
*	[wasm-split] Minimize non-function export names (#6951)	Thomas Lively	2024-09-17	1	-2/+5
\| \| \| \| \| \| \| \|	The module splitting utility has a configuration option for minimizing new export names, but it was previously applied only to newly exported functions. Using the new multi-split mode can produce lots of exported tables and splitting WasmGC programs can produce lots of exported globals, so minimizing these export names can have a big impact on code size.
*	[wasm-split] Configure split functions rather than kept functions (#6949)	Thomas Lively	2024-09-17	3	-12/+7
\| \| \| \| \| \| \| \|	The configuration for the module splitting utility previously took a set of functions to keep in the primary module. Change it to take a list of functions to split into the secondary module instead. This improves the code quality in multi-split mode because it keeps stub functions generated by previous splits from being moved into secondary modules during later splits.
*	[wasm-split] Simplify handling of --keep-funcs and --split-funcs (#6948)	Thomas Lively	2024-09-17	3	-67/+67
\| \| \| \| \| \| \| \| \| \| \| \|	Maintain the invariant that every defined function belongs to either the set of kept functions or the set of split functions. Functions are kept by default except when --keep-funcs is specified without --split-funcs on the command line. This is mostly NFC except that it changes the default behavior when no arguments are specified on the command line to keep all functions. This will simplify a follow-on PR that switches from passing the kept functions to the module splitting utility to passing the split functions.
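The invariant and defaults stated in this message can be sketched in Python as below. The `partition` function and its parameters are hypothetical and do not reflect wasm-split's real option parsing. In particular, the message does not say what happens when a function is named in both lists, so letting an explicit keep win is purely an assumption of this sketch.

```python
def partition(defined, keep_funcs=None, split_funcs=None):
    """Return (kept, split) so that every defined function is in exactly one set."""
    defined = set(defined)
    if keep_funcs is not None and split_funcs is None:
        # Only --keep-funcs given: everything not listed is split.
        kept = set(keep_funcs) & defined
        split = defined - kept
    else:
        # Otherwise functions are kept by default and only listed ones are split.
        split = set(split_funcs or ()) & defined
        if keep_funcs is not None:
            split -= set(keep_funcs)  # assumption: an explicit keep wins on conflict
        kept = defined - split
    # The invariant from the commit message: a full, disjoint partition.
    assert kept | split == defined and not (kept & split)
    return kept, split


funcs = ["f", "g", "h"]
no_args = partition(funcs)
keep_only = partition(funcs, keep_funcs=["f"])
split_only = partition(funcs, split_funcs=["g"])
```

With no arguments every function is kept, which matches the changed default described above.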
*	Fix selects of packed fields in GlobalStructOptimization (#6947)	Alon Zakai	2024-09-17	1	-2/+4
\| \| \| \| \|	We emit a select between two objects when only two objects exist of a particular type. However, if the field is packed, we did not handle truncating the written values.
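The truncation this message refers to can be illustrated with a short Python sketch of what a read of a packed field returns for a wider written value. The helper `read_packed` is hypothetical; it only models the arithmetic of zero- and sign-extending reads of 8-bit or 16-bit fields, not the pass itself.

```python
def read_packed(written, bits, signed):
    """Value that an unsigned or signed read of a packed field yields for a written i32."""
    mask = (1 << bits) - 1
    value = written & mask  # only the low bits are stored in the field
    if signed and value >= 1 << (bits - 1):
        value -= 1 << bits  # sign-extend on a signed read
    return value


# Two objects whose i8 field was written with values wider than 8 bits.
written_values = (0x1FF, 0x105)
# Constants a select must choose between if it replaces an unsigned read.
select_arms = tuple(read_packed(v, 8, signed=False) for v in written_values)
```

A select built from the raw written values would yield 0x1FF or 0x105, whereas a real read of the field can only yield the truncated values in `select_arms`.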
*	[wasm-split] Run RemoveUnusedElements on secondary modules (#6945)	Thomas Lively	2024-09-17	2	-5/+17
\| \| \| \| \| \| \| \| \|	Rather than analyze what module elements from the primary module a secondary module will need, the splitting logic conservatively imports all module elements from the primary module into the secondary module. Run RemoveUnusedElements on the secondary module to remove any of these imports that happen to be unnecessary. Leave a TODO mentioning the possibility of being more selective about which module elements get exported to reduce code size in the primary module, too.