summaryrefslogtreecommitdiff
path: root/src/passes
Commit message (Collapse)AuthorAgeFilesLines
...
* Add local.set/tee local type annotations to BINARYEN_PRINT_FULL (#6657)Alon Zakai2024-06-131-12/+25
| | | | | | | | | | | | | With this we now print e.g. (local.set $temp (; local type: i32 ;) ... This can be nice in large functions to avoid needing to scroll up to see the local type, e.g. when debugging why unsubtyping doesn't work somewhere. Also avoid [ ] in this mode, in favor of the standard (; ;), and put those at the end rather than at the start.
* [threads] Parse, build, and print shared composite types (#6654)Thomas Lively2024-06-121-0/+4
| | | | | | | | | | | | | | Parse the text format for shared composite types as described in the shared-everything thread proposal. Update the parser to use 'comptype' instead of 'strtype' to match the final GC spec and add the new syntactic class 'sharecomptype'. Update the type canonicalization logic to take sharedness into account to avoid merging shared and unshared types. Make the same change in the TypeMerging pass. Ensure that shared and unshared types cannot be in a subtype relationship with each other. Follow-up PRs will add shared abstract heap types, binary parsing and emitting for shared types, and fuzzer support for shared types.
* [DebugInfo] Copy debug info in call-utils.h (#6652)Alon Zakai2024-06-122-7/+11
| | | | | | | | | | | | | | | | | | | | | | | We automatically copy debuginfo in replaceCurrent(), but there are a few places that do other operations than simple replacements. call-utils.h will turn a call_ref with a select target into two direct calls, and we were missing the logic to copy debuginfo from the call_ref to the calls. To make this work, refactor out the copying logic from wasm-traversal, into debuginfo.h, and use it in call-utils.h. debuginfo.h itself is renamed from debug.h (as now this needs to be included from wasm-traversal, which nearly everything does, and it turns out some files have internal stuff like a debug() helper that ends up conflicing with the old debug namespace). Also rename the old copyDebugInfo function to copyDebugInfoBetweenFunctions which is more explicit. That is also moved from the header to a cpp file because it depends on wasm-traversal (so we'd end up with recursive headers otherwise). That is fine, as that method is called after copying a function, which is not that frequent. The new copyDebugInfoToReplacement (which was refactored out of wasm-traversal) is in the header because it can be called very frequently (every single instruction we optimize) and we want it to get inlined.
* [Strings] Keep public and private types separate in StringLowering (#6642)Alon Zakai2024-06-101-13/+39
| | | | | | | | | | | | | | | | We need StringLowering to modify even public types, as it must replace every single stringref with externref, even if that modifies the ABI. To achieve that we told it that all string-using types were private, which let TypeUpdater update them, but the problem is that it moves all private types to a new single rec group, which meant public and private types ended up in the same group. As a result, a single public type would make it all public, preventing optimizations and breaking things as in #6630 #6640. Ideally TypeUpdater would modify public types while keeping them in the same rec groups, but this may be a very specific issue for StringLowering, and that might be a lot of work. Instead, just make StringLowering handle public types of functions in a manual way, which is simple and should handle all cases that matter in practice, at least in J2Wasm.
* Fix stack-use-after-scope on Windows in Precompute (#6643)mtb2024-06-051-1/+2
| | | | | Create a temp var to store the ChildIterator. Fixes #6639
* Optimize ReorderGlobals ordering with a new algorithm (#6625)Alon Zakai2024-05-311-71/+326
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The old ordering in that pass did a topological sort while sorting by uses both within topological groups and between them. That could be unoptimal in some cases, however, and actually on J2CL output this pass made the binary larger, which is how we noticed this. The problem is that such a toplogical sort keeps topological groups in place, but it can be useful to interleave them sometimes. Imagine this: $c - $a / $e \ $d - $b Here $e depends on $c, etc. The optimal order may interleave the two arms here, e.g. $a, $b, $d, $c, $e. That is because the dependencies define a partial order, and so the arms here are actually independent. Sorting by toplogical depth first might help in some cases, but also is not optimal in general, as we may want to mix toplogical depths: $a, $c, $b, $d, $e does so, and it may be the best ordering. This PR implements a natural greedy algorithm that picks the global with the highest use count at each step, out of the set of possible globals, which is the set of globals that have no unresolved dependencies. So we start by picking the first global with no dependencies and add at at the front; then that unlocks anything that depended on it and we pick from that set, and so forth. This may also not be optimal, but it is easy to make it more flexible by customizing the counts, and we consider 4 sorts here: * Set all counts to 0. This means we only take into account dependencies, and we break ties by the original order, so this is as close to the original order as we can be. * Use the actual use counts. This is the simple greedy algorithm. * Set the count of each global to also contain the counts of its children, so the count is the total that might be unlocked. This gives more weight to globals that can unlock more later, so it is less greedy. * As last, but weight children's counts lower in an exponential way, which makes sense as they may depend on other globals too. In practice it is simple to generate cases where 1, 2, or 3 is optimal (see new tests), but on real-world J2CL I see that 4 (with a particular exponential coefficient) is best, so the pass computes all 4 and picks the best. As a result it will never worsen the size and it has a good chance of improving. The differences between these are small, so in theory we could pick any of them, but given they are all modifications of a single algorithm it is very easy to compute them all with little code complexity. The benefits are rather small here, but this can save a few hundred bytes on a multi-MB Java file. This comes at a tiny compile time cost, but seems worth it for the new guarantee to never regress size.
* LogExecution: Optionally take a module name for the logger function (#6629)YAMAMOTO Takashi2024-05-311-12/+30
| | | | | | | | | --log-execution=NAME will use NAME as the module for the logger function import, rather than infer it. If the name is not provided (--log-execution as before this PR) then we will try to automatically decide which to use ("env", unless we see another module name is used, which can be the case in optimized modules).
* Avoid duplicate type names (#6633)Alon Zakai2024-05-301-1/+1
| | | | | If we replace a type with another, use the original name for the new type, and give the old a unique name (for the rare cases in which it has uses).
* Fix Vacuuming of code leading up to an infinite loop (#6632)Alon Zakai2024-05-291-4/+5
| | | | | We had that logic right in other places, but the specific part of Vacuum that looks at code that leads up to an unreachable did not check for infinite loops, so it could remove them.
* SignaturePruning: Properly handle public types (#6630)Alon Zakai2024-05-291-7/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The SignaturePruning pass optimizes away parameters that it proves are safe to remove. It turns out that that does not always match the definition of private types, which is more restrictive. Specifically, if say all the types are in one big rec group and one of them is used on an exported function then all of them are considered public (as the rec group is). However, in closed world, it would be ok to leave that rec group unchanged but to create a pruned version of that type and use it, in cases where we see it is safe to remove a parameter. (See the testcase for a concrete example.) To put it another way, SignaturePruning already proves that a parameter is safe to remove in all the ways that matter. Before this PR, however, the testcase in this PR would error - so this PR is not an optimization but a bugfix, really - because SignaturePruning would see that a parameter is safe to remove but then TypeUpdating would see the type is public and so it would leave it alone, leading to a broken module. This situation is in fact not that rare, and happens on real-world Java code. The reason we did not notice it before is that typically there are no remaining SignaturePruning opportunities late in the process (when other closed world optimizations have typically led to a single big rec group). The concrete fix here is to add additionalPrivateTypes to a few more places in TypeUpdating. We already supported that for cases where a pass knew better than the general logic what can be modified, and this adds that ability to the signature-rewriting logic there. Then SignaturePruning can send in all the types it has proven are safe to modify. * Also necessary here is to only add from additionalPrivateTypes if the type is not already in our list (or we'd end up with duplicates in the final rec group). * Also move newSignatures in SignaturePruning out of the top level, which was confusing (the pass has multiple iterations, and we want each to have a fresh instance).
* Remove obsolete parser code (#6607)Thomas Lively2024-05-291-1/+0
| | | | | Remove `SExpressionParser`, `SExpressionWasmBuilder`, and `cashew::Parser`. Simplify gen-s-parser.py. Remove the --new-wat-parser and --deprecated-wat-parser flags.
* Run RemoveUnneededModuleElements early (#6620)Alon Zakai2024-05-291-0/+6
| | | | | | | | | | | | | | | | Doing it before anything else can help a lot if there is a significant amount of dead code that can be removed, as it saves work for all the later passes. We did run this pass if GC was enabled just a few passes later down, but even so it is worthwhile to run it an additional time, and it makes sense to do even without GC (though in typical optimized LLVM outputs there will be little dead code). If there is no dead code then this is wasted work, but this is a fairly fast pass, and I measure no significant slowdown due to this. E.g. on the 35 MB clang.wasm (which is already optimized, so little dead code) it takes around a second, while all of -O2 takes almost two minutes, so the difference is just 1%. On J2CL I measure a 15% speedup in -O3 --closed-world -tnh, and also the binary is 2.5% smaller, which means there is less work for later cycles of -O3.
* OptimizeInstructions: Push StructNew down to help it fold away StructSets ↵Roberto Lublinerman2024-05-281-17/+55
| | | | | | | | | (#6584) Heap stores (struct.set) are optimized into the struct.new when they are adjacent in a statement list. Pushing struct.new down past irrelevant instructions increases the likelihood that it ends up adjacent to sets.
* SimplifyGlobals: Do not switch a get to use a global of another type (#6605)Alon Zakai2024-05-201-1/+8
| | | | If we wanted to switch types in such cases we'd need to refinalize (which is likely worth doing, though other passes should refine globals anyhow).
* Fix generate-dyncalls and directize passed under table64 (#6604)Sam Clegg2024-05-182-13/+14
|
* Fix GlobalRefining's handling of gets in module code and add missing ↵Alon Zakai2024-05-171-2/+3
| | | | | | | | | | | validation (#6603) GlobalRefining did not traverse module code, so it did not update global.gets in other globals. Add missing validation that actually errors on that: We did not check global.get types. These could be separate PRs but it would be difficult to test them separately.
* [Memory64Lowering/Table64Lowering] Avoid dependency in visitation order. ↵Sam Clegg2024-05-162-26/+22
| | | | | NFC (#6600) Followup to #6599.
* [Table64Lowering] Don't assume that all segments are from 64-bit tables (#6599)Sam Clegg2024-05-162-6/+18
| | | | | | | | | This allows modules to contains both 32-bit and 64-bit segment. In order to check the table/memory state when visiting segments we need to ensure that memories/tables are visited only after their segments. The comments in visitTable/visitMemory already assumed this but it wasn't true in practice.
* Add table64 lowering pass (#6595)Sam Clegg2024-05-155-36/+189
| | | | | Changes to wasm-validator.cpp here are mostly for consistency between elem and data segment validation.
* OptimizeInstructions: Add missing invalidation check in consecutive equality ↵Alon Zakai2024-05-151-0/+22
| | | | | | | | | | | | | | | | | | | | | test (#6596) This existed before #6495 but became noticeable there. We only looked at the fallthrough values in the later part of areConsecutiveInputsEqual, but there can be invalidation due to the non-fallthrough part: (i32.add (local.get $x) (block (local.set $x ..) (local.get $x) ) ) The set can cause the local.get to differ the second time. To fix this, check if the non-fallthrough part invalidates the fallthrough (but only on the right hand side). Fixes #6593
* [EH] Rename option/pass names for new EH (exnref) (#6592)Heejin Ahn2024-05-153-7/+10
| | | | | | | | | | | | | | | | | We settled on the name `WASM_EXNREF` for the new setting in Emscripten for the name for the new EH option. https://github.com/emscripten-core/emscripten/blob/2bc5e3156f07e603bc4f3580cf84c038ea99b2df/src/settings.js#L782-L786 "New EH" sounds vague and I'm not sure if "experimental" is really necessary anyway, given that the potential users of this option is aware that this is a new spec that has been adopted recently. To make the option names consistent, this renames `--translate-to-eh` (the option that only runs the translator) to `--translate-to-exnref`, and `--experimental-new-eh` to `--emit-exnref` (the option that runs the translator at the end of the whole pipeline), and renames the pass and variable names in the code accordingly as well. In case anyone is using the old option names (and also to make the Chromium CI pass), this does not delete the old options.
* [Strings] Remove operations not included in imported strings (#6589)Thomas Lively2024-05-153-61/+9
| | | | | | The stringref proposal has been superseded by the imported JS strings proposal, but the former has many more operations than the latter. To reduce complexity, remove all operations that are part of stringref but not part of imported strings.
* [Strings] Remove stringview types and instructions (#6579)Thomas Lively2024-05-153-90/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | The stringview types from the stringref proposal have three irregularities that break common invariants and require pervasive special casing to handle properly: they are supertypes of `none` but not subtypes of `any`, they cannot be the targets of casts, and they cannot be used to construct nullable references. At the same time, the stringref proposal has been superseded by the imported strings proposal, which does not have these irregularities. The cost of maintaing and improving our support for stringview types is no longer worth the benefit of supporting them. Simplify the code base by entirely removing the stringview types and related instructions that do not have analogues in the imported strings proposal and do not make sense in the absense of stringviews. Three remaining instructions, `stringview_wtf16.get_codeunit`, `stringview_wtf16.slice`, and `stringview_wtf16.length` take stringview operands in the stringref proposal but cannot be removed because they lower to operations from the imported strings proposal. These instructions are changed to take stringref operands in Binaryen IR, and to allow a graceful upgrade path for users of these instructions, the text and binary parsers still accept but ignore `string.as_wtf16`, which is the instruction used to convert stringrefs to stringviews. The binary writer emits code sequences that use scratch locals and `string.as_wtf16` to keep the output valid. Future PRs will further align binaryen with the imported strings proposal instead of the stringref proposal, for example by making `string` a subtype of `extern` instead of a subtype of `any` and by removing additional instructions that do not have analogues in the imported strings proposal.
* LocalCSE: Fix regression from #6587 by accumulating generativity (#6591)Alon Zakai2024-05-151-28/+44
| | | | | | | | | | | | | #6587 was incorrect: It checked generativity early in an incremental manner, but it did not accumulate that information as we do with hashes. As a result we could end up optimizing something with a generative child, and sadly we lacked testing for that case. This adds incremental generativity computation alongside hashes. It also splits out this check from isRelevant. Also add a test for nested effects (as opposed to generativity), but that already worked before this PR (as we compute effects and invalidation as we go, already).
* Remove redundant ptrType from MemorySize/Grow instructions. NFC (#6590)Sam Clegg2024-05-151-2/+2
| | | | I recently add TableSize/Grow and noticed I didn't need these. It seems they are superfluous.
* Source maps: Allow specifying that an expression has no debug info in text ↵Jérôme Vouillon2024-05-141-8/+24
| | | | | | | | | | | | (#6520) ;;@ with nothing else (no source:line) can be used to specify that the following expression does not have any debug info associated to it. This can be used to stop the automatic propagation of debug info in the text parsers. The text printer has also been updated to output this comment when needed.
* LocalCSE: Ignore traps of code in between (#6588)Alon Zakai2024-05-141-1/+11
| | | | | | | | | | | | | | | | | | Given: (ORIGINAL) (in between) (COPY) We want to change that to (local.tee $temp (ORIGINAL)) (in between) (local.get $temp) It is fine if "in between" traps: then we never reach the new local.get. This is a safer situation than most optimizations because we are not reordering anything, only replacing known-equivalent code.
* LocalCSE: Check effects/generativity early (#6587)Alon Zakai2024-05-142-11/+41
| | | | | | | | | | | | | | | | | | | | Previously we checked late, and as a result might end up failing to optimize when a sub-pattern could have worked. E.g. (call (A) ) (call (A) ) The call cannot be optimized, but the A pattern repeats. Before this PR we'd greedily focus on the entire call and then fail. After this PR we skip the call before we commit to which patterns to try to optimize, so we succeed. Add a isShallowlyGenerative helper here as we compute this step by step as we go. Also remove a parameter to the generativity code (it did not use the features it was passed).
* [memory64] Add table64 to existing memory64 support (#6577)Sam Clegg2024-05-101-0/+3
| | | | | | | Tests is still very limited. Hopefully we can use the upstream spec tests soon and avoid having to write our own tests for `.set/.set/.fill/etc`. See https://github.com/WebAssembly/memory64/issues/51
* [StackIR] Run StackIR during binary writing and not as a pass (#6568)Alon Zakai2024-05-098-677/+27
| | | | | | | | | | | | | | | | | | | | | | | | | | Previously we had passes --generate-stack-ir, --optimize-stack-ir, --print-stack-ir that could be run like any other passes. After generating StackIR it was stashed on the function and invalidated if we modified BinaryenIR. If it wasn't invalidated then it was used during binary writing. This PR switches things so that we optionally generate, optimize, and print StackIR only during binary writing. It also removes all traces of StackIR from wasm.h - after this, StackIR is a feature of binary writing (and printing) logic only. This is almost NFC, but there are some minor noticeable differences: 1. We no longer print has StackIR in the text format when we see it is there. It will not be there during normal printing, as it is only present during binary writing. (but --print-stack-ir still works as before; as mentioned above it runs during writing). 2. --generate/optimize/print-stack-ir change from being passes to being flags that control that behavior instead. As passes, their order on the commandline mattered, while now it does not, and they only "globally" affect things during writing. 3. The C API changes slightly, as there is no need to pass it an option "optimize" to the StackIR APIs. Whether we optimize is handled by --optimize-stack-ir which is set like other optimization flags on the PassOptions object, so we don't need the old option to those C APIs. The main benefit here is simplifying the code, so we don't need to think about StackIR in more places than just binary writing. That may also allow future improvements to our usage of StackIR.
* [J2Cl] Make J2clOpts more effective with transitive deps in constant ↵Roberto Lublinerman2024-05-091-43/+129
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | intialization (#6571) Constants that need to be hoisted sometimes are initialized by calling getters of other constants that need to be hoisted. These getters are non-trivial, e.g. (func $getConst1_<once>_@X (result (ref null $A)) (block (result (ref null $A)) (if (i32.eqz (ref.is_null (global.get $$const1@X))) (then (return (global.get $$const1@X)) ) ) (global.set $$const1@X (struct.new $A (i32.const 2))) (global.get $$const1@X) ) (func $getConst2_<once>_@X (result (ref null $A)) (block (result (ref null $A)) (if (i32.eqz (ref.is_null (global.get $$const2@X))) (then (return (global.get $$const2@X)) ) ) (global.set $$const2@X .... expression with (call $getConst1_<once>_@X) ....)) (global.get $$const2@X) ) and can only be simplified after the constants they initialize are hoisted. After the constant is hoisted the getter can be inlined and constants that depend on it for their initialization can now be hoisted. Before this pass, inlining would happen after the pass was run by a subsequent run of the inliner (likely as part of -O3), requiring as many runs of this pass, interleaved with the inliner, as the depth in the call sequence. By having a simpler inliner run as part of the loop in this pass, the pass becomes more effective and more independent of the call depths.
* Source map fixes (#6550)Jérôme Vouillon2024-05-022-2/+5
| | | | | | | | | | | | | | | | | | | | | | | | | * Keep debug locations at function start The `fn_prolog_epilog.debugInfo` test is failing otherwise, since there was debug information associated to the nop instruction at the beginning of the function. * Do not clear the debug information when reaching the end of the source map The last segment should extend to the end of the function. * Propagate debug location from the function prolog to its first instruction * Fix printing of epilogue location The text parser no longer propagates locations to the epilogue, so we should always print the location if there is one. * Fix debug location smearing The debug location of the last instruction should not smear into the function epilogue, and a debug location from a previous function should not smear into the prologue of the current function.
* Respect the Web limitation on Table size (#6567)Alon Zakai2024-05-011-0/+1
| | | | | Without this the fuzzer can error on differences in behavior between V8 and us. Also move the limitations constants to their own header.
* J2CLOpts: Add "precompute" and "remove-unused-brs" as additional cleanupRoberto Lublinerman2024-04-301-0/+2
| | | | This makes the cleanup of bodies of functions that have had constants hoisted from them more effective.
* [NFC] Use the new wat parser in RemoveNonJSOps (#6554)Thomas Lively2024-04-291-5/+4
|
* [Strings] Do not reuse mutable globals in StringGathering (#6531)Alon Zakai2024-04-241-1/+2
| | | | | We were reusing mutable globals in StringGathering, which meant that we'd use a global to represent a particular string but if it was mutated then it could contain a different string during execution.
* [EH] Fix assumption that all throw_refs are created from rethrows (#6524)Heejin Ahn2024-04-241-5/+7
| | | | | | | | `shouldBeRef` incorrectly assumed that all `throw_ref`s within a `catch` body had been generated from `rethrow`s, which was not true, because `throw_ref`s are also created when translating `try`-`delegate`s: https://github.com/WebAssembly/binaryen/blob/219e668e87b012c0634043ed702534b8be31231f/src/passes/TranslateEH.cpp#L304 This fixes the assumption and changes `cast` to `dynCast`.
* [EH] Fix missing outer block for catchless try (#6519)Heejin Ahn2024-04-241-8/+21
| | | | | | | | | | | | | | | | | | | When translating a `try` expression, we may need an 'outer' block that wraps the newly generated `try_table` so we can jump out of the expression when an exception does not occur. (The condition we use is when the `try` has any catches or if the `try` is a target of any inner `try-delegate`s: https://github.com/WebAssembly/binaryen/blob/219e668e87b012c0634043ed702534b8be31231f/src/passes/TranslateEH.cpp#L677) In case the `try` has either of `catch` or `delegate`, when we have the 'outer' block, we add the newly created `try_table` in the 'outer' block and replace the whole expression with the block: https://github.com/WebAssembly/binaryen/blob/219e668e87b012c0634043ed702534b8be31231f/src/passes/TranslateEH.cpp#L670 https://github.com/WebAssembly/binaryen/blob/219e668e87b012c0634043ed702534b8be31231f/src/passes/TranslateEH.cpp#L332-L340 But in case of a catchless `try`, we forgot to do that: https://github.com/WebAssembly/binaryen/blob/219e668e87b012c0634043ed702534b8be31231f/src/passes/TranslateEH.cpp#L388 So this PR fixes it.
* Precompute: Ignore mutable arrays in StringNew (#6522)Alon Zakai2024-04-231-0/+22
| | | | | All Struct/Array operations must ignore mutable fields in Precompute atm, which we did, but StringNew has an array variant that does an effective ArrayGet operation, which we didn't handle.
* OptimizeInstructions: Optimize subsequent struct.sets after ↵Alon Zakai2024-04-231-12/+21
| | | | | | | | | struct.new_with_default (#6523) Before we preferred not to add default values, as that increases code size. But since #6495 we turn more things into struct.new_with default, so it is important to handle this. It seems likely that in most cases the code size downside of adding default values is offset by avoiding a local.set later, so always do this (rather than add some kind of heuristic).
* [EH] Fix delegating to caller when func result is concrete (#6518)Heejin Ahn2024-04-231-1/+1
| | | | | | | | | | | hen the function return type is concrete, the translation result of a `try`-`delegate` that targets the caller includes a `return` that returns the whole function body: https://github.com/WebAssembly/binaryen/blob/219e668e87b012c0634043ed702534b8be31231f/src/passes/TranslateEH.cpp#L751-L763 We should do that based on the function's return type, not the function body's return type. The previous code didn't handle the case where the function's return type is concrete but the function body's return type is unreachable.
* DebugLocationPropagation: pass debuglocation from parent node to chil… (#6500)许鑫权2024-04-214-0/+105
| | | | | | | | | | | | | | | | | | | This PR creates a pass to propagate debug location from parent node to child nodes which has no debug location with pre-order traversal. This is useful for compilers that use Binaryen API to generate WebAssembly modules. It behaves like `wasm-opt` read text format file: children are tagged with the debug info of the parent, if they have no annotation of their own. For compilers that use Binaryen API to generate WebAssembly modules, it is a bit redundant to add debugInfo for each expression, Especially when the compiler wrap expressions. With this pass, compilers just need to add debugInfo for the parent node, which is more convenient. For example: ``` (drop (call $voidFunc) ) ``` Without this pass, if the compiler only adds debugInfo for the wrapped expression `drop`, the `call` expression has no corresponding source code mapping in DevTools debugging, which is obviously not user-friendly.
* OptimizeCasts: Also handle local.tee (#6507)Jérôme Vouillon2024-04-181-27/+26
| | | | | | | | | | | | | | | | | | Converts the following: (some.operation (ref.cast .. (local.tee $ref ..)) (local.get $ref) ) into: (some.operation (local.tee $temp (ref.cast .. (local.tee $ref ..)) ) (local.get $temp) )
* CoalesceLocals: ReFinalize when we refine a type (#6502)Alon Zakai2024-04-161-12/+12
| | | | | | | | | | | The code had a different workaround, namely to add a block with an explicit type to avoid changing the type at all (i.e. the block declared the original type of the thing we replaced, not the refined type). But by adding a block we can end up with an invalid pop, as the fuzzer found, see the EH testcase here. To fix this, use the usual workaround of just running ReFinalize. That is simpler and also improves the code while we do it. It does add more work, but this is likely a very rare situation anyhow.
* Do not repeat types names in text output (#6499)Thomas Lively2024-04-161-2/+35
| | | | | | | | | | For types that do not have explicit names, we generate index-based names in the printer. However, we did not previously ensure that the generated types were not already used as explicit names, so it was possible to print the same name for multiple types, which is not valid. Fix the problem by skipping indices that are already used as type names. Fixes #6492.
* OptimizeInstructions: Optimize StructNew/ArrayNew forms (#6495)Alon Zakai2024-04-151-0/+159
| | | | | | | | | | | | | struct.new with default values can be struct.new_default. array.new with default values can be array.new_default. array.new of size 1 should be array.new_fixed (saves the Const). array.new_fixed with default values can be array.new_default. array.new_fixed with equal but non-default values can be array.new (basically use a Const to say how many copies we want, rather than copy).
* [Strings] Add a string lowering pass using magic imports (#6497)Thomas Lively2024-04-154-16/+37
| | | | | | | | | | | | | | | | | The latest idea for efficient string constants is to encode the constants in the import names of their globals and implement fast paths in the engines for materializing those constants at instantiation time without needing to parse anything in JS. This strategy only works for valid strings (i.e. strings without unpaired surrogates) because only valid strings can be used as import names in the WebAssembly syntax. Add a new configuration of the StringLowering pass that encodes valid string contents in import names, falling back to the JSON custom section approach for invalid strings. To test this chang, update the printer to escape import and export names properly and update the legacy parser to parse escapes in import and export names properly. As a drive-by, remove the incorrect check in the parser that the import module and base names are non-empty.
* When creating `dynCall` and `legalstub` function mark them as ↵Sam Clegg2024-04-122-0/+2
| | | | | | | | | hasExplicitName. NFC (#6496) This is temporary workaround for the current test failures on the emscripten waterfall. The real fix I believe will be to make `hasExplicitName` the default in these kinds of cases. See: #6466
* Fix isGenerative on calls and test via improving ↵Alon Zakai2024-04-111-28/+52
| | | | | | | | | | | | | | | | OptimizeInstructions::areConsecutiveInputsEqual() (#6481) "Generative" is what we call something like a struct.new that may be syntactically identical to another struct.new, but each time a new value is generated. The same is true for calls, which can do anything, including return a different value for syntactically identical calls. This was not a bug because the main user of isGenerative, areConsecutiveInputsEqual(), was too weak to notice, that is, it gave up sooner, for other reasons. This PR improves that function to do a much better check, which makes the fix necessary to prevent regressions. This is not terribly important for itself, but will help a later PR that will add code that depends more heavily on areConsecutiveInputsEqual().
* Fix ConstantFieldPropagation signed packed field handling and improve ↵Alon Zakai2024-04-112-34/+10
| | | | | | | | | | | | | | | | Heap2Local's (#6493) CFP already had logic for truncating but not for sign-extending, which this fixes. Use the new helper function in Heap2Local as well. This changes the model there from "truncate on set, sign-extend on get" to "truncate or sign-extend on get". That is both simpler by reusing the same logic as CFP but also more optimal: the idea to truncate on sets made sense since sets are rarer, but if we must then sign-extend on gets then we can end up doing more work overall (as the truncations on sets are not needed if all gets are signed). Found by #6486