| Commit message | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
| |
Before, we would simply not export a function that had an e.g. anyref
param. As a result, the modules were effectively "closed", which was
good for testing full closed-world mode, but not for testing degrees of
open world. To improve that, this PR allows the fuzzer to export such
functions, and adds an "enclose world" pass that "closes" the wasm
(makes it more compatible with closed-world mode); that pass is run 50%
of the time, giving us coverage of both styles.
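As a rough sketch of what this now allows (the function name here is hypothetical), an export whose signature uses anyref leaves the module open-world unless the enclose-world pass later closes it:
```wast
(module
  ;; an exported function with an anyref param, which the fuzzer
  ;; previously refused to export
  (func $takes-any (export "takes-any") (param anyref)
    (nop)
  )
)
```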
|
|
|
|
|
|
|
|
|
|
|
| |
This pass lowers nontrapping FP to int instructions to implement LLVM's
conversion behavior. This means that they are not fully complete
lowerings according to the wasm spec, but have the same undefined
behavior that LLVM does. This keeps the pass simpler and preserves
existing behavior when compiling without nontrapping-fp.
This will be used in emscripten, so that we can build libraries with
nontrapping-fp and lower them away after link if desired.
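A sketch of the idea, assuming (per the description above) that each nontrapping instruction is mapped to its trapping counterpart, which shares LLVM's undefined behavior for NaN and out-of-range inputs:
```wast
;; nontrapping saturating conversion...
(i32.trunc_sat_f64_s (local.get $x))
;; ...lowered to the plain trapping conversion; the inputs where the two
;; differ (NaN, out of range) are undefined behavior in LLVM anyway
(i32.trunc_f64_s (local.get $x))
```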
|
|
|
|
| |
Since the resulting code has the same undefined behavior as LLVM, make
the pass name reflect that.
|
|
|
|
|
|
|
|
| |
This pass lowers away memory.copy and memory.fill operations. It
generates a function that implements each of the instructions and
replaces the instructions with calls to those functions.
It does not handle other bulk memory operations (e.g. passive segments
and table operations) because they are not used by emscripten, where
this pass will be used to enable targeting old browsers that don't
support bulk memory.
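A minimal sketch of the kind of helper such a pass can generate for memory.fill (the name and the byte-by-byte strategy are assumptions for illustration, not the pass's exact output; a memory is assumed to be present):
```wast
(func $__memory_fill (param $dst i32) (param $val i32) (param $size i32)
  (block $done
    (loop $continue
      ;; stop once all bytes have been written
      (br_if $done (i32.eqz (local.get $size)))
      (i32.store8 (local.get $dst) (local.get $val))
      (local.set $dst (i32.add (local.get $dst) (i32.const 1)))
      (local.set $size (i32.sub (local.get $size) (i32.const 1)))
      (br $continue)
    )
  )
)
```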
|
|
|
|
|
|
|
|
| |
This allows removing a reference field from all Java objects, reducing
the per-object memory and initialization overhead.
The pass is designed to run directly on the J2CL output before other
optimizations, since it relies on invariants that might get lost in
optimization. If the invariants don't hold, the pass aborts.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
HeapStoreOptimization (#6882)
This just moves code out of OptimizeInstructions to the new pass. The existing
test is renamed and now runs the new pass instead. The new pass is run right
after each --optimize-instructions invocation, so it should not cause any
noticeable effects whatsoever, making this NFC.
The motivation here is that there is a bug in the pass, see the new testcase
added at the end, which shows the bug. It is not practical to fix that bug in
OptimizeInstructions since we need more than peephole optimizations to do
so. This PR moves the code to a new pass so we can fix it there properly,
later.
The new pass is named HeapStoreOptimization since the same infrastructure
we will need to fix the bug will also help dead store elimination and related
things.
|
|
|
|
|
|
|
|
|
|
|
| |
The best way to lower strings is via the "magic imports" API that uses
the names of imported string globals as their values. This approach only
works for valid UTF-8 strings, though. The existing
string-lowering-magic-imports pass falls back to putting non-UTF-8
strings in a JSON custom section, but this requires the runtime to
support that custom section for correctness. To help catch errors early
when runtimes do not support the strings custom section, add a new pass
that uses magic imports and raises an error if there are any invalid
strings.
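A sketch of what a magic import looks like (the `"'"` module name follows the js-string-builtins convention, and the global and string here are invented for illustration):
```wast
;; the import's name *is* the string value; engines with a fast path can
;; materialize it at instantiation time without parsing anything in JS
(import "'" "hello world" (global $str.hello (ref extern)))
```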
|
|
|
|
|
|
|
|
|
|
|
|
| |
Most of our type optimization passes emit all non-public types as a
single large rec group, which trivially ensures that different types
remain different, even if they are optimized to have the same structure.
Usually emitting a single large rec group is fine, but it also means
that if the module is split, all of the types will need to be repeated
in all of the split modules. To better support this use case, add a pass
that can split the large rec group back into minimal rec groups, taking
care to preserve separate type identities by emitting different
permutations of the same group where possible or by inserting unused
brand types to differentiate them.
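A sketch of the idea with two structurally identical types (names invented): in one big rec group they are distinct by position; after splitting into minimal groups, an unused brand type keeps the two groups distinct:
```wast
;; before: $A and $B are distinct only by their positions in one group
(rec
  (type $A (struct (field i32)))
  (type $B (struct (field i32)))
)
;; after: minimal groups; the unused brand type changes the shape of the
;; second group so it does not collapse into the first
(rec
  (type $A2 (struct (field i32)))
)
(rec
  (type $B2 (struct (field i32)))
  (type $brand (struct))
)
```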
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Before the PR:
```console
$ bin/wasm-opt test/hello_world.wat --metrics
total
[exports] : 1
[funcs] : 1
[globals] : 0
[imports] : 0
[memories] : 1
[memory-data] : 0
[tables] : 0
[tags] : 0
[total] : 3
[vars] : 0
Binary : 1
LocalGet : 2
```
After the PR:
```console
$ bin/wasm-opt test/hello_world.wat --metrics
Metrics
total
[exports] : 1
[funcs] : 1
...
```
Note the "Metrics" addition at the top. And the title can be customized:
```console
$ bin/wasm-opt test/hello_world.wat --metrics=text
Metrics: text
total
[exports] : 1
[funcs] : 1
```
The custom title can be helpful when multiple invocations of metrics are used
at once, e.g. --metrics=before -O3 --metrics=after.
|
|
|
|
|
|
|
|
|
|
|
|
| |
Each pass instance can now store its own argument, which can differ between
instances. This may be a breaking change for the corner case of running a pass
multiple times and setting the pass's argument multiple times as well (before,
the last pass argument affected them all; now, it affects the last instance
only). This only affects arguments with the name of a pass; others remain
global, as before (and multiple passes can read them, in fact). See the
CHANGELOG for details.
Fixes #6646
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
RefTest (#6692)
CFP focuses on finding when a field always contains a constant, and then replaces
a struct.get with that constant. If we find there are two constant values, then in some
cases we can still optimize, if we have a way to pick between them. All we have is the
struct.get and its reference, so we must use a ref.test:
```wast
(struct.get $T x (..ref..))
;; =>
(select
  (..constant1..)
  (..constant2..)
  (ref.test $U (..ref..))
)
```
This is valid if, of all the subtypes of $T, those that pass the test have
constant1 in that field, and those that fail the test have constant2. For
example, a simple case is where $T has two subtypes, $T is never created
itself, and each of the two subtypes has a different constant value.
This is a somewhat risky operation, as ref.test is not necessarily cheap.
To mitigate that, this is a new pass, --cfp-reftest, that is not run by
default, and also we only optimize when we can use a ref.test on what
we think will be a final type (because ref.test on a final type can be
faster in VMs).
|
|
|
|
|
|
|
| |
This pass receives a list of functions to trace, and then wraps them in calls to
imports. This can be useful for tracing malloc/free calls, for example, but is
generic.
Fixes #6548
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Doing it before anything else can help a lot if there is a significant amount of
dead code that can be removed, as it saves work for all the later passes. We
did already run this pass just a few passes later when GC was enabled, but even
so it is worthwhile to run it an additional time, and it makes sense to do so
even without GC (though in typical optimized LLVM output there will be little
dead code).
If there is no dead code then this is wasted work, but this is a fairly fast pass,
and I measure no significant slowdown due to this. E.g. on the 35 MB clang.wasm
(which is already optimized, so little dead code) it takes around a second, while
all of -O2 takes almost two minutes, so the difference is just 1%.
On J2CL I measure a 15% speedup in -O3 --closed-world -tnh, and also the
binary is 2.5% smaller, which means there is less work for later cycles of -O3.
|
|
|
|
|
| |
Changes to wasm-validator.cpp here are mostly for consistency between
elem and data segment validation.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We settled on the name `WASM_EXNREF` for the new EH setting in Emscripten:
https://github.com/emscripten-core/emscripten/blob/2bc5e3156f07e603bc4f3580cf84c038ea99b2df/src/settings.js#L782-L786
"New EH" sounds vague, and I'm not sure "experimental" is really
necessary anyway, given that the potential users of this option are aware
that this is a new spec that has been adopted recently.
To make the option names consistent, this renames `--translate-to-eh`
(the option that only runs the translator) to `--translate-to-exnref`,
and `--experimental-new-eh` to `--emit-exnref` (the option that runs the
translator at the end of the whole pipeline), and renames the pass and
variable names in the code accordingly as well.
In case anyone is using the old option names (and also to make the
Chromium CI pass), this does not delete the old options.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously we had passes --generate-stack-ir, --optimize-stack-ir, --print-stack-ir
that could be run like any other passes. After generating StackIR it was stashed on
the function and invalidated if we modified BinaryenIR. If it wasn't invalidated then
it was used during binary writing. This PR switches things so that we optionally
generate, optimize, and print StackIR only during binary writing. It also removes
all traces of StackIR from wasm.h - after this, StackIR is a feature of binary writing
(and printing) logic only.
This is almost NFC, but there are some minor noticeable differences:
1. We no longer print an indication in the text format that a function has
StackIR. StackIR will not be there during normal printing, as it is only present
during binary writing (but --print-stack-ir still works as before; as mentioned
above, it runs during writing).
2. --generate/optimize/print-stack-ir change from being passes to being flags
that control that behavior instead. As passes, their order on the command line
mattered, while now it does not, and they only "globally" affect things during
writing.
3. The C API changes slightly, as there is no need to pass an "optimize" option
to the StackIR APIs. Whether we optimize is handled by --optimize-stack-ir,
which is set like other optimization flags on the PassOptions object, so we
don't need the old option to those C APIs.
The main benefit here is simplifying the code, so we don't need to think about
StackIR in more places than just binary writing. That may also allow future
improvements to our usage of StackIR.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This PR creates a pass to propagate debug locations from parent nodes, in a pre-order traversal, to child nodes that have no debug location. This is useful for compilers that use the Binaryen API to generate WebAssembly modules.
It behaves like `wasm-opt` reading a text format file: children are tagged with the debug info of the parent if they have no annotation of their own.
For compilers that use the Binaryen API to generate WebAssembly modules, it is a bit redundant to add debugInfo for each expression, especially when the compiler wraps expressions.
With this pass, compilers just need to add debugInfo for the parent node, which is more convenient.
For example:
```
(drop
(call $voidFunc)
)
```
Without this pass, if the compiler only adds debugInfo to the wrapping `drop`, the `call` expression has no corresponding source code mapping in DevTools debugging, which is obviously not user-friendly.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The latest idea for efficient string constants is to encode the constants in
the import names of their globals and implement fast paths in the engines for
materializing those constants at instantiation time without needing to parse
anything in JS. This strategy only works for valid strings (i.e. strings without
unpaired surrogates) because only valid strings can be used as import names in
the WebAssembly syntax.
Add a new configuration of the StringLowering pass that encodes valid string
contents in import names, falling back to the JSON custom section approach for
invalid strings.
To test this change, update the printer to escape import and export names
properly and update the legacy parser to parse escapes in import and export
names properly. As a drive-by, remove the incorrect check in the parser that the
import module and base names are non-empty.
|
|
|
|
|
|
|
|
|
|
|
|
| |
This change removes the "minimal" mode from `LegalizeJSInterface`
which was added in #1883.
The idea behind this change was to avoid legalizing most function except
those we know that JS will be calling. The idea was that for dynamic
linking we always want the non-legalized version to be shared between
wasm module. These days we solve this problem in a different way with
the `legalize-js-interface-export-originals` which exports the original
functions alongside the legalized ones. Emscripten then always
prefers the `$orig` functions when doing dynamic linking.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We already have passes to legalize i64 imports and exports, which the fuzzer will
run so that we can run wasm files in JS VMs. SIMD and multivalue also pose a
problem as they trap on the boundary. In principle we could legalize them as well,
but that is substantial effort, so instead just prune them: given a wasm module,
remove any imports or exports that use SIMD or multivalue (or anything else that
is not legal for JS).
Running this in the fuzzer will allow us to not skip running v8 on any testcase we
enable SIMD and multivalue for.
(Multivalue is allowed in newer VMs, so that part of this PR could be removed
eventually.)
Also remove the limitation on running v8 with multimemory (v8 now supports
that).
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
SimplifyGlobals already does this, so this is a subset of that pass, and does not
add anything new. It is useful for testing, however.
In particular it allows testing that we propagate subsequent globals in a single
pass; that is, if one global reads from another and becomes constant, then it
can be propagated as well. SimplifyGlobals runs multiple passes so this always
worked, but with this pass we can test that we do it efficiently in one pass.
This will also be useful for comparing stringref to imported strings, as it
allows gathered strings to be propagated to other globals (possible with
stringref, but not imported strings) but not anywhere else (which might have
downsides as it could lead to more allocations).
Also add an additional test for simplify-globals that we do not get confused by
an unoptimizable global.get in the middle (see last part).
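A sketch of the chained case described above (names invented): once $a is known to be constant, $b becomes constant as well, and a single pass can propagate both:
```wast
(global $a i32 (i32.const 42))
(global $b i32 (global.get $a)) ;; becomes constant once $a is propagated
;; a (global.get $b) elsewhere can then be replaced by (i32.const 42)
```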
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This extends StringGathering by replacing the gathered string globals to imported
globals. It adds a custom section with the strings that the imports are expected to
provide. It also replaces the string type with extern.
This is a complete lowering of strings, except for string operations that are a TODO.
After running this, no strings remain in the wasm, and the outside JS is expected
to provide the proper imports, which it can do by processing the JSON of the
strings in the custom section "string.consts", which looks like
["foo", "bar", ..]
That is, an array of strings, which are imported as
(import "string.const" "0" (global $string.const_foo (ref extern))) ;; foo
(import "string.const" "1" (global $string.const_bar (ref extern))) ;; bar
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This pass finds all string.const and creates globals for them. After this transform, no
string.const appears anywhere but in a global, and each string appears in one global
which is then global.get-ed everywhere.
This avoids overhead in VMs where executing a string.const is an allocation, and is
also a good step towards imported strings. For that, this pass will be extended from
gathering to a full lowering pass, which will first gather into globals as this pass does,
and then turn each of those globals with a string.const into an imported externref.
(For that reason this pass is in a file called StringLowering, as the two passes will
share much of their code, and the larger pass should decide the name I think.)
This pass runs in -O2 and above. Repeated executions have no downside (see
details in code).
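A sketch of the transform's shape as described above (names invented, not exact pass output):
```wast
;; before: the same constant is executed (possibly allocated) at each use
(drop (string.const "foo"))
;; after: one global holds the string, and every use reads it
(global $string.const_foo (ref string) (string.const "foo"))
(drop (global.get $string.const_foo))
```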
|
|
|
|
|
|
| |
The previous name feels too verbose and unwieldy.
This also removes the "new-to-old EH" placeholder. I think it'd be
better to add it back when it is actually added.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This translates the old Phase 3 EH instructions, which include `try`,
`catch`, `catch_all`, `delegate`, and `rethrow`, into the new EH
instructions, which include `try_table` (with `catch` / `catch_ref` /
`catch_all` / `catch_all_ref`) and `throw_ref`, passed at the Oct 2023
CG meeting.
This translator can be used as a standalone tool by users of the
previous EH toolchain to generate binaries for the new spec without
recompiling, and also can be used at the end of the Binaryen pipeline to
produce binaries for the new spec while the end-to-end toolchain
implementation for the new spec is in progress.
While the goal of this pass is not optimization, this tries to do a little
better than the most naive implementation, namely by omitting a few
instructions where possible and trying to minimize the number of
additional locals, because this can be used as a standalone translator
or the last stage of the pipeline while we can't post-optimize the
results because the whole pipeline (-On) is not ready for the new EH.
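A sketch of the translation shape for the simplest case (the real pass also handles tags, `catch_ref`, `rethrow`, etc., and tries to minimize the helper code):
```wast
;; old-style EH
(try
  (do (call $work))
  (catch_all (call $handler))
)
;; translated: try_table branches to a label on catch, instead of having
;; inline handler bodies
(block $done
  (block $caught
    (try_table (catch_all $caught)
      (call $work)
    )
    (br $done) ;; no exception: skip the handler
  )
  (call $handler)
)
```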
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We tested --generate-global-effects --vacuum and such, but not
--generate-global-effects -O3 or the other -O flags. Unfortunately, our
targeted testing missed a bug because of that. Specifically, we have special
logic for -O flags to make sure the passes they expand into run with the
proper opt and shrink levels, but that logic happened to also interfere with
global effect computation. It would also interfere with allowing GUFA info
or other things to be stored on the side, which we've proposed. This PR
fixes that + future issues.
The fix is to just allow a pass runner to execute more than once. We thought
to avoid that and assert against it to keep the model "hermetic" (you create
a pass runner, you run the passes, and you throw it out), which feels nice in
a way, but it led to the bug here, and I'm not sure it would prevent any other
ones really. It is also more code. It is simpler to allow a runner to execute more
than once, and add a method to clear it. With that, the logic for -O3 execution
is both simpler and does not interfere with anything but the opt and shrink
level flags: we create a single runner, give it the proper options, and then keep
using that runner + those options as we go, normally.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This PR creates a new pass to optimize J2CL-specific patterns
that would otherwise be difficult to recognize/prove generically
by other binaryen passes.
The pass currently handles fields we call "constant-like".
These are fields initialized once and unconditionally through the
"clinit" function, and technically they have 2 observable states:
- the initial null/0 state
- the initialized state.
However, you can only observe the initial null/0 state in contrived
examples, not in real-world/correct applications.
This pass moves such "clinit"-initialized fields to global initialization.
The above pattern also matches other lazy-init constructs like String and
Class literals (which binaryen already reduces to constant expressions), so
the pass is generalized to include them as well (by matching any functions
with the name pattern "_@once_").
In order for this pass to be effective:
1. It needs to run between O3 passes.
2. We need to stop inlining of "once" functions.
Stopping inlining of the once functions is important to preserve their
structure. This helps both the existing OnceReducer pass and the new J2CL
pass to be a lot more effective. It is also not useful to inline these
functions, as by definition they are only executed once. This can be
achieved by passing a no-inline filter.
Although inlining is generally disabled for these functions, it is
still needed in some cases, since the inliner is effectively responsible for
the removal of once functions that have been simplified into empty or simple
delegating functions. For this reason, the pass renames such trivial
functions so the no-inline filter no longer matches them.
Also note that after all optimizations are completed, it does make sense to
have a final stage where "partial inlining" of all once functions is
allowed. This will speed them up by moving the initialization check to the
call site.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Any function can now be annotated as not to be inlined fully (normally) or not to be
inlined partially. In the future we'll want to read those annotations from the proposed
wasm metadata section on code hints, and from wat text as well, but for now add
trivial passes that set those fields based on function name wildcards, e.g.:
--no-inline=*leave-alone* --inlining
That will not inline any function whose name contains "leave-alone".
--no-inline disables all inlining (full or partial) while --no-full-inline and
--no-partial-inline affect only full or partial inlining.
|
|
|
| |
Allow outlining to be excluded from the command line on non-Emscripten builds.
|
|
|
| |
Adds an outlining pass that performs outlining on a module end to end, and two tests.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This new optimization will eventually weaken casts by generalizing (i.e.
un-refining) their output types. If a cast is weakened enough that its output
type is a supertype of its input type, the cast will be able to be removed by
OptimizeInstructions.
Unlike refining cast inputs, generalizing cast outputs can break module
validation. For example, if the result of a cast is stored to a local and the
cast is weakened enough that its output type is no longer a subtype of that
local's type, then the local.set after the cast will no longer validate. To
avoid this validation failure, this optimization would have to generalize the
type of the local as well. In general, the more we can generalize the types of
program locations, the more we can weaken casts of values that flow into those
locations.
This initial implementation only generalizes the types of locals and does not
actually weaken casts yet. It serves as a proof of concept for the analysis
required to perform the full optimization, though. The analysis uses the new
analysis framework to perform a reverse analysis tracking type requirements for
each local and reference-typed stack value in a function.
Planned and potential future work includes:
- Implementing the transfer function for all kinds of expressions.
- Tracking requirements on the dynamic types of each location to generalize
allocations as well.
- Making the analysis interprocedural and generalizing the types of more
program locations.
- Optimizing tuple-typed locations.
- Generalizing only those locations necessary to eliminate at least one cast
(although this would make the analysis bidirectional, so it is probably better
left to separate passes).
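A sketch of the validation hazard described above (types invented): if this cast's output is generalized from `(ref $sub)` toward a supertype, the `local.set` only remains valid if `$l`'s type is generalized too:
```wast
(local $l (ref null $sub))
;; weakening the cast's output type past (ref null $sub) would make this
;; local.set fail to validate unless $l's type is weakened with it
(local.set $l
  (ref.cast (ref $sub) (local.get $x))
)
```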
|
|
|
|
|
|
|
|
| |
Because we currently strip some data segments (i.e. EM_JS strings)
during `--post-emscripten`, it is too late for `--separate-data-segments`,
which always runs in `wasm-emscripten-finalize`.
Once emscripten switches over to using the pass directly, we can remove
the support from `wasm-emscripten-finalize`.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add a new pass that analyzes the module to find the minimal subtyping relation
that is necessary to maintain the validity and semantics of the program and
rewrites the types to use this minimal relation. Besides eliminating references
to otherwise-unused intermediate types, this optimization should unlock
significant additional optimizing power in other type optimizations that are
constrained by having to maintain supertype validity, since after this new
optimization there are fewer and more general supertypes.
The analysis works by visiting each expression and module element to collect the
subtypings that are required to maintain its validity, then, using that as a
starting point, iteratively adding new subtypings required by type definitions
and casts until reaching a fixed point.
|
|
|
|
|
| |
All logging/instrumentation passes need to do this, to avoid using stale
global effects that are too low (too high is not optimal either, but at least it
cannot cause bugs).
|
|
|
|
|
|
|
|
|
| |
TypeFinalization finalizes all types that we can, that is, all private types that have no
children. TypeUnFinalization unfinalizes (opens) all (private) types.
These could be used by first opening all types, optimizing, and then finalizing, as that
might find more opportunities.
Fixes #5933
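A minimal sketch of the text-format difference: finalizing adds the `final` flag, which promises the type has no subtypes:
```wast
(type $open (sub (struct (field i32))))         ;; may have subtypes
(type $closed (sub final (struct (field i32)))) ;; may not; casts can be faster
```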
|
|
|
|
|
|
|
|
|
|
|
| |
In some cases tuples are obviously not needed, such as when they are only used
in local operations and make/extract. Such tuples are not used as return values or
in control flow structures, so we might as well lower them to individual locals per
lane, which other passes can optimize a lot better.
I believe LLVM does the same with its own tuples: it lowers them as much as
possible, leaving only necessary ones.
Fixes #5923
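A sketch of the lowering for the simple case (names invented): a tuple local used only via make/extract becomes one scalar local per lane:
```wast
;; before
(local $t (tuple i32 i64))
(local.set $t (tuple.make 2 (i32.const 1) (i64.const 2)))
(drop (tuple.extract 2 0 (local.get $t)))
;; after: one local per lane, which later passes optimize much better
(local $t.0 i32)
(local $t.1 i64)
(local.set $t.0 (i32.const 1))
(local.set $t.1 (i64.const 2))
(drop (local.get $t.0))
```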
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
GUFA refines existing casts, but does not add new casts for fear of increasing code size
and adding more cast operations at runtime. This PR adds a version that does add all
those casts, and it looks like at least code size improves rather than regresses, at least
on J2Wasm and Kotlin. That is, this pass adds a lot more casts, but subsequent
optimizations benefit enough to shrink overall code size.
However, this may still not be worthwhile, as even if code size decreases we may end
up doing more casts at runtime, and those casts might be hard to remove, e.g.:
```wast
(call $foo
  (x) ;; inferred to be non-null
)
(func $foo (param (ref null $A)) ..)
;; =>
(call $foo
  (ref.cast $A (x)) ;; add a cast here
)
(func $foo (param (ref $A)) ..) ;; later pass refines here
```
That new cast cannot be removed after we refine the function parameter. If the
function never benefits from the fact that the input is non-null, then the cast is
wasted work (e.g. if the function only compares the input to another value).
To use this new pass, try --gufa-cast-all rather than --gufa. As with normal GUFA,
running the full optimizer afterwards is important, and even more important in
order to get rid of as many of the new casts as possible.
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
This is a followup to #5333. That fixed the selection of which passes to run, but
forgot to also fix the global state of the current optimize/shrink levels. This PR
fixes that. As a result, running -O3 -Oz will now work as expected: the first -O3
will run the right passes (as #5333 fixed) and while running them, the global
optimize/shrinkLevels will be -O3 (and not -Oz), which this PR fixes.
A specific result of this is that -O3 -Oz used to inline less, since the invocation
of inlining during -O3 thought we were optimizing for size. The new test verifies
that we do fully inline in the first -O3 now.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This pass strips all EH stuff, including EH instructions and tags, from
the input module and disables the EH feature from the features section.
1. This removes `catch` and `catch_all` blocks from the code. So
```wast
(try
(do
(some code)
)
(catch
...
)
)
```
becomes just `(some code)`. Note that all `rethrow`s will be removed
with `catch`es.
2. This converts `throw (...)` into `unreachable`.
3. This removes all tags from the module, which are unused anyway after
1 and 2.
4. This removes exception handling feature from the features section.
You can use the pass with
```console
$ wasm-opt --enable-exception-handling --strip-eh INPUT -o OUTPUT
```
This is not an optimization pass, so it is not run unless you specify
the pass explicitly.
This is in effect similar to Clang's `-fignore-exceptions`, in which you
can throw but it will result in a crash and we compile away all landing
pads. This can be used for people who don't (or can't) use
`-fignore-exceptions` in their build settings or who want to compile
away `catch` blocks later.
Closes emscripten-core/emscripten#19585.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* Disable sign extension in SignExtLowering.cpp
The sign extension lowering pass would previously lower away the sign extension
instructions, but it wouldn't disable the sign extension feature, so follow-on
passes such as optimize-instructions could reintroduce sign extension
instructions.
Fix the pass to disable the sign extension feature to prevent sign extension
instructions from being reintroduced later.
* update pass description
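For reference, the standard shift-based lowering of a sign-extension instruction looks like this (a sketch):
```wast
(i32.extend8_s (local.get $x))
;; lowers to shifting the byte up to the sign bit and arithmetic-shifting
;; back down
(i32.shr_s
  (i32.shl (local.get $x) (i32.const 24))
  (i32.const 24)
)
```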
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If a type hierarchy has abstract classes in the middle, that is, types that
are never instantiated, then we can optimize casts and other operations
to them. Say in Java that we have `AbstractList`, and it only has one
subclass `IntList` that is ever created, then any place we have an `AbstractList`
we must actually have an `IntList`, or a null. (Or, if no subtype is instantiated,
then the value must definitely be a null.)
The actual implementation does a type mapping, that is, it finds all places
using an abstract type and makes them refer to the single instantiated
subtype (or null). After that change, no references to the abstract type
remain in the program, so this both refines types and also cleans up the
type section.
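A sketch using the example above: since `$AbstractList` is never created, any cast to it can be mapped to the single instantiated subtype:
```wast
(ref.cast (ref null $AbstractList) (local.get $list))
;; becomes a cast to the only type that can actually be there (or a null)
(ref.cast (ref null $IntList) (local.get $list))
```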
|
|
|
|
|
| |
Nested runners should be ignored, as they run some internal stuff in
certain passes, which would not contain the pass the user asked to
skip with --skip-pass.
|
|
|
|
|
|
|
|
| |
For example,
-O3 --skip-pass=vacuum
will run -O3 normally but it will not run the vacuum pass at all
(which normally runs more than once in -O3).
|
|
|
|
| |
Without the names section, debugging can sometimes be hard on the binaries
that mode emits for each pass.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The type rewriting utility in type-updating.cpp gathers all the used heap types,
then rewrites them to newly built and possibly modified heap types. The problem
is that for the isorecursive type system, the set of "used" heap types was
overly broad because it also included unused heap types that are in a rec group
with used types. In the context of emitting a binary, it is important to treat
these types as used because failing to emit them would change the identity of
the used types, but in the context of type optimizations it is ok to treat them
as truly unused because we are changing type identities anyway.
Update the type rewriting utility to only include truly used types in the set of
output types. This causes all existing type optimizations to implicitly drop
unused types, but only if they find any other optimizations to do and actually
run the rewriter utility. Their output will also still include unused types
that were used before their optimizations were applied.
To overcome these limitations and better match the optimizing power of nominal
mode, which never includes unused types in the output, add a new type
optimization pass that removes unused types and does nothing else and run it
near the end of the global optimization pipeline.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Do not optimize or modify public heap types in any way. Public heap types
include the types of imported or exported functions, tables, globals, etc. This
is important to maintain the public interface of a module and ensure it can
still link and interact as intended with the outside world.
Also add validation error if we find any nontrivial public types that are not
the types of imported or exported functions. This error is meant to help the
user ensure that type optimizations are not silently inhibited. In the future,
we may want to add options to silence this error or downgrade it to a warning.
This commit only updates the type updating machinery to avoid updating public
types. It does not update any optimization passes accordingly. Since we avoid
modifying public signature types already, this is not expected to break
anything, but in the future once we have function subtyping or if we make the
error optional, we may have to update some of our optimization passes.
|
|
|
| |
Per the wasm spec guidelines for Load (rule 10) & Store (rule 12), this PR adds an option for bounds checking, producing a runtime error if the instruction exceeds the bounds of the particular memory within the combined memory.
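A rough sketch of what such a guard can look like once memories are combined (the per-memory size global and the 4-byte access size are assumptions for illustration, not the pass's exact output):
```wast
;; before a 4-byte load from what was memory $m1, trap if the access
;; would cross the end of $m1's region within the combined memory
(if (i32.gt_u
      (i32.add (local.get $addr) (i32.const 4))
      (global.get $m1_byte_size)) ;; hypothetical size global
  (then (unreachable))
)
```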
|
|
|
|
|
|
|
|
| |
This finds types that can be merged into their super: types that add no
fields, and are not used in casts, etc. - so we might as well use the super.
This complements TypeSSA, in that it can merge back the new types that
TypeSSA created, if we never found a use for them. Without this, TypeSSA
can bloat binary size quite a lot (I see 10-20%).
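A sketch of a mergeable pair (names invented): `$B` adds no fields, so if it is never used in casts or other type-identity-sensitive places, every use of `$B` can simply use `$A`:
```wast
(type $A (sub (struct (field i32))))
(type $B (sub $A (struct (field i32)))) ;; identical shape; merged into $A
```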
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This creates new nominal types for each (interesting) struct.new. That then allows
type-based optimizations to be more precise, as those optimizations will track
separate info for each struct.new, in effect. That is kind of like SSA, however, we
do not handle merges. For example:
```
x = struct.new $A (5);
print(x.value);
y = struct.new $A (11);
print(y.value);
// =>
x = struct.new $A.x (5);
print(x.value);
y = struct.new $A.y (11);
print(y.value);
```
After the pass runs each of those struct.new creates a unique type, and type-based
analysis can see that 5 or 11 are the only values written in that type (if nothing else
writes there).
This bloats the type section with the new subtypes, so it is best used with a pass
to merge unneeded duplicate types, which a later PR will add. That later PR will
exactly merge back in the types created here, which are nominally different but
indistinguishable otherwise.
This pass is not enabled by default. It's not clear yet where the best place
for it is, as it must be balanced by type merging, but it might be better to do
multiple rounds of optimization between the two. Needs more investigation.
|