| Commit message | Author | Age | Files | Lines |
... | |
| |
In the WebAssembly text format, strings can generally be arbitrary
bytes, but identifiers must be valid UTF-8. Check for UTF-8 validity
when parsing string-style identifiers in the lexer.
Update StringLowering to generate valid UTF-8 global names even for
strings that may not be valid UTF-8, and test that text round-tripping
works correctly after StringLowering.
Fixes #6937.
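For reference, a minimal standalone sketch of the kind of UTF-8 validity check this requires (Binaryen's lexer has its own implementation; the helper below is illustrative only):
  #include <cstddef>
  #include <cstdint>
  #include <string_view>

  // Returns true if `bytes` is valid UTF-8: every sequence is a well-formed
  // 1-4 byte encoding with no overlong forms, surrogates, or values above
  // U+10FFFF.
  bool isValidUTF8(std::string_view bytes) {
    size_t i = 0;
    while (i < bytes.size()) {
      uint8_t b = bytes[i];
      size_t len;
      uint32_t cp, min;
      if (b < 0x80) {
        ++i;
        continue;
      } else if ((b & 0xE0) == 0xC0) {
        len = 2; cp = b & 0x1F; min = 0x80;
      } else if ((b & 0xF0) == 0xE0) {
        len = 3; cp = b & 0x0F; min = 0x800;
      } else if ((b & 0xF8) == 0xF0) {
        len = 4; cp = b & 0x07; min = 0x10000;
      } else {
        return false; // invalid leading byte
      }
      if (i + len > bytes.size()) {
        return false; // truncated sequence
      }
      for (size_t j = 1; j < len; ++j) {
        uint8_t c = bytes[i + j];
        if ((c & 0xC0) != 0x80) {
          return false; // not a continuation byte
        }
        cp = (cp << 6) | (c & 0x3F);
      }
      if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
        return false; // overlong, out of range, or surrogate
      }
      i += len;
    }
    return true;
  }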
|
| |
There are a few heap types that are hard-coded to be considered public
and therefore allowed on module boundaries even in --closed-world mode,
specifically to support js-string-builtins. We previously considered
both open and closed (i.e. final) mutable i8 arrays to be public in this
manner, but js-string-builtins only uses the closed versions, so remove
the open versions.
This fixes a particular bug in which Unsubtyping optimized a private
array type to be equivalent to an ignorable public array type,
incorrectly changing the behavior of a cast, but it does not address the
larger problem of optimizations producing types that are equivalent to
public types. Add a TODO about that problem for now.
Fixes #6935.
|
| |
#6400 fixed this case, but it only handled a `pop` that is an immediate
child of the current expression; a `pop` can be nested deeper down.
We conservatively run the EH fixup whenever we have a `pop` and create
`block`s in the optimization. We considered using `FindAll<Pop>` to make
it precise, but we decided the quadratic time complexity was not worth it.
Fixes #6918.
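A rough sketch of that conservative guard, with illustrative names (the real fixup entry point lives in Binaryen's EH utilities):
  // After rewriting the function body: if the optimization created any blocks
  // and the function contains a pop anywhere (not just as an immediate child),
  // run the EH fixup rather than trying to locate nested pops precisely.
  if (createdBlocks && functionContainsPop) {
    fixupNestedPops(func, module); // hypothetical helper name
  }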
|
| |
To avoid having two separate topological sort utilities in the code
base, replace remaining uses of the old DFS-based, CRTP topological sort
with the newer Kahn's algorithm implementation.
This would be NFC, except that the new topological sort produces a
different order than the old topological sort, so the output of some
passes is reordered.
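For reference, a self-contained sketch of the Kahn's-algorithm approach (generic code, not Binaryen's actual TopologicalSort API):
  #include <cstddef>
  #include <queue>
  #include <vector>

  // Kahn's algorithm: repeatedly emit a node with no remaining incoming
  // edges. graph[i] lists the successors of node i. Returns a topological
  // order, with fewer than graph.size() elements if there is a cycle.
  std::vector<size_t> kahnSort(const std::vector<std::vector<size_t>>& graph) {
    std::vector<size_t> inDegree(graph.size(), 0);
    for (const auto& succs : graph) {
      for (size_t s : succs) {
        ++inDegree[s];
      }
    }
    std::queue<size_t> ready;
    for (size_t i = 0; i < graph.size(); ++i) {
      if (inDegree[i] == 0) {
        ready.push(i);
      }
    }
    std::vector<size_t> order;
    while (!ready.empty()) {
      size_t node = ready.front();
      ready.pop();
      order.push_back(node);
      for (size_t s : graph[node]) {
        if (--inDegree[s] == 0) {
          ready.push(s);
        }
      }
    }
    return order;
  }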
|
| |
The pass optimizes loads and stores, so without a memory there is nothing to
do.
This only helps if the user set --low-memory-unused and also has no memory,
which is likely rare, but it's a trivial change so it seems worthwhile. In particular,
this pass constructs a LocalGraph, so the work we avoid can be substantial.
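A minimal sketch of the early exit, as a fragment of a pass's run method (illustrative, not the exact code):
  void run(Module* module) override {
    if (module->memories.empty()) {
      return; // no loads or stores are possible, so there is nothing to do
    }
    // ... the rest of the pass, including building the LocalGraph ...
  }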
|
| |
Update the remaining tests whose readability will be affected by the
removal of the old topological sort in #6902, no matter how small their
diffs would have been.
|
| |
These are the tests that would otherwise have the largest diffs when
changing the topological sort used to sort types.
signature-refining_gto.wat also cannot be automatically updated, so
there is extra benefit to making sure it has stable output.
|
| |
Unlike other module elements, types are not stored on the `Module`.
Instead, they are collected by traversing the IR before printing and
binary writing. The code that collects the types tries to optimize the
order of rec groups based on the number of times each type is used. As a
result, the output order of types generally has no relation to the input
order of types. In addition, most type optimizations rewrite the types
into a single large rec group, and the order of types in that group is
essentially arbitrary. Changes to the code for counting type uses,
sorting types, or sorting rec groups can yield very large changes in the
output order of types, producing test diffs that are hard to review and
potentially harming the readability of tests by moving output types away
from the corresponding input types.
To help make test output more stable and readable, introduce a tool
option that causes the order of output types to match the order of input
types as closely as possible. It is implemented by having the parsers
record the indices of the input types on the `Module` just like they
already record the type names. The `GlobalTypeRewriter` infrastructure
used by type optimizations associates the new types with the old indices
just like it already does for names and also respects the input order
when rewriting types into a large recursion group.
By default, wasm-opt and other tools clear the recorded type indices
after parsing the module, so their default behavior is not modified by
this change.
Follow-on PRs will use the new flag in more tests, which will generate
large diffs but leave the tests in stable, more readable states that
will no longer change due to other changes to the optimizing type
sorting logic.
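A simplified sketch of the ordering idea, assuming a recorded map from type to input index (the real implementation stores the indices on the `Module` alongside type names):
  #include <algorithm>
  #include <cstddef>
  #include <unordered_map>
  #include <vector>

  // Hypothetical handle for a type; in Binaryen this would be a HeapType.
  using TypeHandle = int;

  // Order output types to match the recorded input indices where possible;
  // types without a recorded index (e.g. newly created ones) go last.
  void sortByInputOrder(
    std::vector<TypeHandle>& types,
    const std::unordered_map<TypeHandle, size_t>& inputIndices) {
    std::stable_sort(types.begin(), types.end(),
                     [&](TypeHandle a, TypeHandle b) {
                       auto itA = inputIndices.find(a);
                       auto itB = inputIndices.find(b);
                       if (itA == inputIndices.end()) {
                         return false; // a has no recorded index: never first
                       }
                       if (itB == inputIndices.end()) {
                         return true; // a has an index, b does not: a first
                       }
                       return itA->second < itB->second;
                     });
  }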
|
| |
The LocalGraph there was used for two purposes:
1. Get the list of gets and sets.
2. Get only the reachable gets and sets.
It is trivial to get all the gets and sets in a much faster way, by just walking the
code as this PR does. The downside is that we also consider unreachable gets
and sets, so unreachable code can prevent us from optimizing, but that seems
worthwhile as many passes make that assumption (and they all become
maximally effective after --dce). That is the only non-NFC part here.
Removing LocalGraph + the fixup code for unreachability makes this
significantly shorter, and also 2-3x faster.
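A sketch of the simpler collection in the style of Binaryen's post-order walkers (treat the exact names as assumptions):
  #include <vector>
  #include "wasm.h"
  #include "wasm-traversal.h"

  namespace wasm {

  // Collect every local.get and local.set, including unreachable ones, with
  // a single linear walk instead of building a LocalGraph.
  struct GetSetCollector : PostWalker<GetSetCollector> {
    std::vector<LocalGet*> gets;
    std::vector<LocalSet*> sets;

    void visitLocalGet(LocalGet* curr) { gets.push_back(curr); }
    void visitLocalSet(LocalSet* curr) { sets.push_back(curr); }
  };

  // Usage: walk a function body once; this is much cheaper than computing
  // which sets reach which gets.
  inline void collectGetsAndSets(Function* func) {
    GetSetCollector collector;
    collector.walk(func->body);
  }

  } // namespace wasm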
|
| |
This allows removing a reference field from all Java objects, reducing
the per-object memory and initialization overhead.
The pass is designed to run directly on the J2CL output, before other
optimizations, since it relies on invariants that might get lost during
optimization. If the invariants don't hold, the pass aborts.
|
| |
As with all type optimizations, MinimizeRecGroups only changes private
types, which are the only types that are safe to modify. However, it is
important for correctness that MinimizeRecGroups maintain separate
type identities for all types, whether public or private, to ensure that
casts that should differentiate two types cannot change behavior.
Previously the pass worked exclusively on private types, so there was
nothing preventing it from constructing a minimal rec group that
happened to have the same shape, and therefore type identity, as a
public rec group. #6886 exhibits a fuzzer test case where this happens
and changes the behavior of the program.
Fix the bug by recording all public rec group shapes and resolving
conflicts with these shapes by updating the shape of the conflicting
non-public type.
Fixes #6886.
|
| |
We computed both get and set influences, but getGetInfluences() was
never called, so that work was entirely pointless.
This makes the pass 20% faster.
|
| |
We previously incremented the use count for a declared supertype only if
it was also a type we had never seen before. Fix the count by treating
the supertype the same as any other type used in a type definition.
Update tests accordingly, including by manually moving input types
around to better match the output.
|
| |
LocalGraph by default will compute all the local.sets that can be read from all
local.gets. However, many passes only query a small subset of those. To
avoid wasted work, add a lazy mode that only computes sets when asked about
a get.
This is then used in a single place, LoopInvariantCodeMotion, which becomes
18% faster.
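A generic sketch of the lazy pattern (this shows only the memoization idea, not LocalGraph's actual interface):
  #include <unordered_map>
  #include <unordered_set>

  // Placeholder types standing in for Binaryen's LocalGet/LocalSet.
  struct LocalGet;
  struct LocalSet;

  class LazySetsComputer {
  public:
    // Return the sets that can reach this get, computing them only on demand
    // and caching the result for repeated queries.
    const std::unordered_set<LocalSet*>& getSets(LocalGet* get) {
      auto it = cache.find(get);
      if (it != cache.end()) {
        return it->second;
      }
      return cache[get] = computeSets(get);
    }

  private:
    std::unordered_map<LocalGet*, std::unordered_set<LocalSet*>> cache;

    // The expensive flow analysis, run only for gets we are actually asked
    // about. Stubbed out here; the real version flows values through the CFG.
    std::unordered_set<LocalSet*> computeSets(LocalGet* get) { return {}; }
  };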
|
| |
Rec groups need to be topologically sorted for the output module to be
valid, but the specific order of rec groups also affects the module size
because types at lower indices require fewer bytes to reference. We
previously optimized for code size when gathering types by sorting the
list of groups before doing the topological sort. This was brittle,
though, and depended on implementation details of the topological sort
to be correct.
Replace the old topological sort with use of the new
`TopologicalSort::minSort` utility, which is a more principled method of
achieving a minimal topological sort with respect to some comparator.
Also draw inspiration from ReorderGlobals and apply an exponential
factor to take the users of a rec group into account when determining
its weight.
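A simplified sketch of the weighting idea; the discount factor and the single propagation pass here are assumptions for illustration, not the actual heuristic:
  #include <cstddef>
  #include <vector>

  // Each rec group gets a weight from its own use count plus a discounted
  // share of the weights of the groups that use it, so that a group needed
  // by many heavily used groups is also placed at a low index.
  struct GroupInfo {
    size_t useCount = 0;       // direct uses of types in this group
    std::vector<size_t> users; // indices of groups that use this group
  };

  std::vector<double> computeWeights(const std::vector<GroupInfo>& groups) {
    const double factor = 0.5; // assumed discount per level of indirection
    std::vector<double> weights(groups.size());
    for (size_t i = 0; i < groups.size(); ++i) {
      weights[i] = double(groups[i].useCount);
    }
    // Propagate a fraction of each user's weight into the groups it uses.
    // (A real implementation would process groups in an order that respects
    // the dependency structure; this single pass is only illustrative.)
    for (size_t i = 0; i < groups.size(); ++i) {
      for (size_t user : groups[i].users) {
        weights[i] += factor * weights[user];
      }
    }
    return weights;
  }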
|
| |
HeapStoreOptimization (#6882)
This just moves code out of OptimizeInstructions to the new pass. The existing
test is renamed and now runs the new pass instead. The new pass is run right
after each --optimize-instructions invocation, so it should not cause any
noticeable effects whatsoever, making this NFC.
The motivation here is that there is a bug in the pass; see the new testcase
added at the end, which shows the bug. It is not practical to fix that bug in
OptimizeInstructions since we need more than peephole optimizations to do
so. This PR moves the code to a new pass so we can fix it there properly,
later.
The new pass is named HeapStoreOptimization since the same infrastructure
we will need to fix the bug will also help dead store elimination and related
things.
|
| |
This constructed a LocalGraph, which computes the sets that reach each get. But
all we need to know is which params are live, so instead we can do a liveness
computation (which is just a boolean, not the list of sets). Also, it is simple to get
the liveness computation to only work on the parameters and not all the locals,
as a further optimization.
Existing tests cover this, though I did find that the case of unreachability needed
a new test.
On a large testcase I am looking at, this makes --dae 17% faster.
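A much-simplified sketch of the parameter-liveness idea, treating the body as a straight-line sequence (a real implementation must handle control flow; all names here are illustrative):
  #include <cstdint>
  #include <set>
  #include <vector>

  // Hypothetical, simplified instruction representation.
  struct Inst {
    enum { Get, Set, Other } kind;
    uint32_t index; // local index, used for Get/Set
  };

  // Walk backwards tracking which locals are live; a write kills liveness,
  // a read creates it. Parameters still live at function entry are used.
  std::set<uint32_t> liveParamsAtEntry(const std::vector<Inst>& body,
                                       uint32_t numParams) {
    std::set<uint32_t> live;
    for (auto it = body.rbegin(); it != body.rend(); ++it) {
      if (it->kind == Inst::Get) {
        live.insert(it->index);
      } else if (it->kind == Inst::Set) {
        live.erase(it->index);
      }
    }
    std::set<uint32_t> liveParams;
    for (auto index : live) {
      if (index < numParams) { // params occupy the first local indices
        liveParams.insert(index);
      }
    }
    return liveParams;
  }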
|
| |
The best way to lower strings is via the "magic imports" API that uses
the names of imported string globals as their values. This approach only
works for valid UTF-8 strings, though. The existing
string-lowering-magic-imports pass falls back to putting non-UTF-8
strings in a JSON custom section, but this requires the runtime to
support that custom section for correctness. To help catch errors early
when runtimes do not support the strings custom section, add a new pass
that uses magic imports and raises an error if there are any invalid
strings.
|
| |
* Add interpreter support for exnref values.
* Fix optimization passes to support try_table.
* Enable the interpreter (but not in V8, see code) on exceptions.
|
| |
Most of our type optimization passes emit all non-public types as a
single large rec group, which trivially ensures that different types
remain different, even if they are optimized to have the same structure.
Usually emitting a single large rec group is fine, but it also means
that if the module is split, all of the types will need to be repeated
in all of the split modules. To better support this use case, add a pass
that can split the large rec group back into minimal rec groups, taking
care to preserve separate type identities by emitting different
permutations of the same group where possible or by inserting unused
brand types to differentiate them.
|
| |
Audit the remaining occurrences of `== HeapType::` and fix those that did
not handle shared types correctly. Add tests for some of the fixes;
others are NFC but clarify the code.
|
| |
Also use TableInit in the interpreter to initialize a module's table
state, which will now handle traps properly, fixing #6431
|
| |
We don't properly validate that yet. E.g.:
  (module
    (rec
      (type $func (func))
      (type $unused (sub (struct (field v128))))
    )
    (func $func (type $func))
  )
That v128 is not used, but it ends up in the output because it is in a rec group that is used.
At the moment we do not require that SIMD be enabled in such a case, which can trip up the fuzzer.
Context: #6820. For now, modify the test that uncovered this.
|
| |
Previously we included supertypes, but did not increase their count.
This was done so that the output for the nominal type system, which
introduced explicit supertypes, would more closely match the output
with the old equirecursive type system. Neither type system exists
anymore and we only support the single, standard isorecursive type
system, so we can now properly count supertypes. It turns out it doesn't
make much of a difference in the test outputs anyway.
|
| |
The argument is the minimum benefit we must see for us to decide to optimize, e.g.
--monomorphize --pass-arg=monomorphize-min-benefit@50
When the minimum benefit is 50%, we only optimize where monomorphization
reduces the cost by at least 50%; 95% would only optimize when we
remove almost all the cost, etc.
In practice, I see that 95% actually tends to reduce code size overall: while we add
monomorphized versions of functions, we only do so when we remove a lot of
work and size, and after inlining we gain further benefits. However, 50% or even lower can
lead to better benchmark results, in return for larger code size, just like with
inlining. To be careful, the default is set to 95%.
Previously we optimized whenever we saw any benefit at all, which is the same
as requiring a minimum benefit of 0%. Old tests have the flag applied in this PR
to set that value, so they do not change.
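A hedged sketch of the percentage check (illustrative names, not the pass's actual code):
  #include <cstdint>

  // Optimize only when monomorphization removes at least minBenefitPercent
  // of the original cost.
  bool shouldMonomorphize(uint64_t originalCost, uint64_t monomorphizedCost,
                          uint64_t minBenefitPercent) {
    if (monomorphizedCost >= originalCost) {
      return false; // no benefit at all
    }
    uint64_t removed = originalCost - monomorphizedCost;
    // Equivalent to removed / originalCost >= minBenefitPercent / 100,
    // without integer division.
    return removed * 100 >= minBenefitPercent * originalCost;
  }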
|
| |
Previously we tracked only whether an expression was relevant to analysis, that is,
whether it interacted with the allocation we were tracing the behavior of. That is
not enough for all cases, though, so also track the form of the interaction, namely
whether the allocation flows through or is fully consumed. An example where that
matters:
  (ref.eq
    (struct.get $A 0
      (local.tee $x
        (struct.new_default $A)
      )
    )
    (local.get $x)
  )
Here the local.get flows out the allocation, but the struct.get only fully consumes
it. Before this PR we thought the struct.get flowed the allocation, and we misoptimized
this to 1.
To make this possible, do a bunch of minor refactoring:
* Move ParentChildInteraction out of the class.
* Add a "None" interaction there.
* Replace the set of reached expressions with a map of them to their interactions.
* Add helper functions to get an expression's interaction or to update it when replacing.
The new testcase here shows the main fix. The new assertions are covered by existing
testcases.
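A rough sketch of the new bookkeeping (simplified; the value names besides None are illustrative):
  // Track, per reached expression, how it interacts with the traced
  // allocation, rather than just whether it interacts at all.
  enum class ParentChildInteraction {
    None,          // no interaction with the allocation
    Flows,         // the allocation flows through this expression
    FullyConsumes, // the expression fully consumes the allocation
    // ... further cases elided ...
  };

  // Previously a set of reached expressions; now a map to their interactions:
  // std::unordered_map<Expression*, ParentChildInteraction> reachedInteractions;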
|
| |
Before, we only removed fields from the end of a struct. If we had, say
  struct Foo {
    int x;
    int y;
    int z;
  };
  // Add no fields but inherit the parent's.
  struct Bar : Foo {};
If y is only used in Bar, but never Foo, then we still kept it around, because
if we removed it from Foo we'd end up with Foo = {x, z}, Bar = {x, y, z} which
is invalid - Bar no longer extends Foo. But we can do this if we first reorder
the two:
  struct Foo {
    int x;
    int z;
    int y; // now y is at the end
  };
  struct Bar : Foo {};
And the optimized form is
  struct Foo {
    int x;
    int z;
  };
  struct Bar : Foo {
    int y; // now y is added in Bar
  };
This lets us remove all fields possible in all cases AFAIK.
This situation is not super-common, as most fields are actually used both
up and down the hierarchy (if they are used at all), but testing on some
large real-world codebases, I see 10 fields removed in Java, 45 in Kotlin,
and 31 in Dart testcases.
The NFC change to src/wasm-type-ordering.h was needed for this to
compile.
|
| |
The optimization is to only use ChildLocalizer, which moves children to
locals, if we actually have a reason to use it. It is simple enough to see if
we are removing fields with side effects here, and only call ChildLocalizer
if we are not. However, this will become much more complicated in a
subsequent PR which will reorder fields, which allows removing yet more
of them (without reordering, we can only remove fields at the end, if any
subtype needs the field).
This is a pretty minor optimization, as it avoids adding a few locals in the rare
case of struct.new operands having side effects. We run --gto at the
start of the pipeline, so later opts will clean that up anyhow. (This might
make us a little less efficient, but the following PR will justify the
regression.)
|
| |
Before the PR:
$ bin/wasm-opt test/hello_world.wat --metrics
total
[exports] : 1
[funcs] : 1
[globals] : 0
[imports] : 0
[memories] : 1
[memory-data] : 0
[tables] : 0
[tags] : 0
[total] : 3
[vars] : 0
Binary : 1
LocalGet : 2
After the PR:
$ bin/wasm-opt test/hello_world.wat --metrics
Metrics
total
[exports] : 1
[funcs] : 1
...
Note the "Metrics" addition at the top. And the title can be customized:
$ bin/wasm-opt test/hello_world.wat --metrics=text
Metrics: text
total
[exports] : 1
[funcs] : 1
The custom title can be helpful when multiple invocations of metrics are used
at once, e.g. --metrics=before -O3 --metrics=after.
|
| |
We marked various expressions as having cost "Unacceptable", fixed at 100, to
ensure we never moved them out from an If arm, etc. Giving them such a high
cost avoids that problem - the cost is higher than the limit we have for moving
code from conditional to unconditional execution - but it also means the total
cost is unrealistic. For example, a function with one such instruction + an add
(cost 1) would end up with cost 101, and removing the add would look
insignificant, which causes issues for things that want to compare costs
(like Monomorphization).
To fix this, adjust some costs. The main change here is to give casts a cost of 5.
I measured this in depth, see the attached benchmark scripts, and it looks
clear that in both V8 and SpiderMonkey the cost of a cast is high enough to
make it not worth turning an if with ref.test arm into a select (which would
always execute the test).
Other costs adjusted here matter a lot less, because they are on operations
that have side effects and so the optimizer will anyhow not move them from
conditional to unconditional execution, but I tried to make them a bit more
realistic while I was removing "Unacceptable":
* Give most atomic operations the cost of 10 we've been using for atomic loads/
stores. Perhaps wait and notify should be costlier, but it seems more relevant
to assume fast switching.
* Give growth operations a cost of 20, and throw operations a cost of 10. These
numbers are entirely made up as I am not even sure how to measure them in
a useful way (but, again, this should not matter much as they have side
effects).
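In summary, the adjusted constants described above look roughly like this (illustrative names; the real values live in Binaryen's cost analysis):
  constexpr int CastCost = 5;    // ref.test / ref.cast: measured as described above
  constexpr int AtomicCost = 10; // most atomic ops, matching atomic loads/stores
  constexpr int GrowthCost = 20; // memory.grow / table.grow
  constexpr int ThrowCost = 10;  // throw and related operations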
|
| |
We used the target's type for the read from the source, but due to
subtyping those might be different.
Found by the fuzzer.
|
| |
Fixes #6776.
|
| |
Followup to #6727, which added support for failing casts in Struct2Local; it
turns out that Array2Struct changes were required as well. Specifically, when we
turn an array into a struct, casts can appear to behave differently
(what used to be an array input becomes a struct), so as with RefTest, which we
already handled, check whether the cast succeeds in the original form and handle
that.
|
| |
Previously call operands were monomorphized (considered as part of the
call context, so we can create a specialized function with those operands
fixed) if they were constant or had a different type than the function
parameter's type. This generalizes that to pull in pretty much all the code
we possibly can, including nested code. For example:
  (call $foo
    (struct.new $struct
      (i32.const 10)
      (local.get $x)
      (local.get $y)
    )
  )
This can turn into
  (call $foo_mono
    (local.get $x)
    (local.get $y)
  )
The struct.new and even one of the struct.new's children is moved into the
called function, replacing the original ref argument with two other ones. If the
original called function was this:
  (func $foo (param $ref (ref ..))
    ..
  )
then the monomorphized function looks like this:
  (func $foo_mono (param $x i32) (param $y i32)
    (local $ref (ref ..))
    (local.set $ref
      (struct.new $struct
        (i32.const 10)
        (local.get $x)
        (local.get $y)
      )
    )
    ..
  )
The struct.new and its constant child appear here, and we read the
parameters.
To do this, generalize the code that creates the call context to accept
everything that is impossible to copy (like a local.get) or that would be
tricky and likely unworthwhile (like another call or a tuple). Also check
for effect interactions, as this code motion does some reordering.
For this to work, we need to adjust how we compute the costs we
compare when deciding what to monomorphize. Before we just
compared the called function to the monomorphized called function,
which was good enough when the call context only contained consts,
but now it can contain arbitrarily nested code. The proper comparison
is between these two:
* Old function + call context
* New monomorphized function
Including the call context makes this a fair comparison. In the example
above, the struct.new and the i32.const are part of the call context,
and so they are in the monomorphized function, so if we didn't count
them in other function we'd decide not to optimize anything with a large
context.
The new functionality is tested in a new file. A few parts of existing
tests needed changes to not become pointless after this improvement,
namely by replacing stuff that we now optimize with things that we
don't, like replacing an i32.eqz with a local.get. There are also a
handful of test outcomes that change in CAREFUL mode due to the
new cost analysis.
|
| |
--skip-pass can now be specified more than once on the commandline.
|
| |
When creating a new subtype, make sure to copy the supertype's
shareability.
|
| |
When we switched to the new type printing machinery, we inserted this
extra space to minimize the diff in the test output compared with the
previous type printer. Improve the quality of the printed output by
removing it.
|
| |
We now consider a drop to be part of the call context: If we see
  (drop
    (call $foo)
  )
  (func $foo (result i32)
    (i32.const 42)
  )
Then we'd monomorphize to this:
  (call $foo_1) ;; call the specialized function instead
  (func $foo_1 ;; the specialized function returns nothing
    (drop ;; the drop was moved into here
      (i32.const 42)
    )
  )
With the drop now in the called function, we may be able to optimize out unused work.
Refactor a bit of code out of DAE that we can reuse here, into a new return-utils.h.
|
| |
The standard name for the instruction is `ref.i31`. Remove support for
the non-standard name and update tests that were still using it.
|
| |
The full syntax for an expression in an element syntax looks like
`(item (ref.null none))`, but we have been printing the abbreviated
version, which omits the `(item ...)`. This abbreviation is only valid
when the item has only a single instruction, so it is not always correct
to use it. Rather than determining whether or not to use the
abbreviation on a case-by-case basis, always print the full syntax.
|
| |
This edge case makes the lowering a little more tricky.
|
| |
Test was converted using port_passes_tests_to_lit.py.
|
| |
Eventually we will need to do some tuning of compile-time speed, but for
now it is easier to just run all the opts, in particular because it makes
writing tests simpler.
|
| |
Previously we just did not optimize cases where our escape analysis showed an
allocation flowed into a cast that failed. However, after inlining there can be
real-world cases where that happens, even in traps-never-happen mode (if the
cast is behind a conditional branch), so it seems worth optimizing.
|
| |
This is a tiny bit more code but it is more consistent with other
operations, and it saves work later.
|
| |
Previously the pass would monomorphize a call when we were sending more
refined types than the target expects. This generalizes the pass to also consider
the case where we send a constant in a parameter.
To achieve that, this refactors the pass to explicitly define the "call context",
which is the code around the call (inputs and outputs) that may end up leading
to optimization opportunities when combined with the target function. Also
add comments about the overall design + roadmap.
The existing test is mostly unmodified, and the diff there is smaller when
ignoring whitespace. We do "regress" those tests by adding more local.set
operations, due to the refactoring that makes things a lot simpler: to
handle the general case of an operand that either has a refined type or is a
constant, we copy it inside the function, which works either way. This
"regression" is only in the testing version of the pass (the normal version
runs optimizations, which would remove that extra code).
This also enables the pass when GC is disabled. Previously we only handled
refined types, so only GC could benefit. Add a test for MVP content
specifically to show we operate there as well.
|