forks/binaryen.git -

	Commit message (Collapse)	Author	Age	Files	Lines
*	Disable sign extension in SignExtLowering.cpp (#5676)	Thomas Lively	2023-04-19	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \|	* Disable sign extension in SignExtLowering.cpp The sign extension lowering pass would previously lower away the sign extension instructions, but it wouldn't disable the sign extension feature, so follow-on passes such as optimize-instructions could reintroduce sign extension instructions. Fix the pass to disable the sign extension feature to prevent sign extension instructions from being reintroduced later. * update pass description
*	[Wasm GC] Add AbstractTypeRefining pass (#5461)	Alon Zakai	2023-02-03	1	-0/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	If a type hierarchy has abstract classes in the middle, that is, types that are never instantiated, then we can optimize casts and other operations to them. Say in Java that we have `AbstractList`, and it only has one subclass `IntList` that is ever created, then any place we have an `AbstractList` we must actually have an `IntList`, or a null. (Or, if no subtype is instantiated, then the value must definitely be a null.) The actual implementation does a type mapping, that is, it finds all places using an abstract type and makes them refer to the single instantiated subtype (or null). After that change, no references to the abstract type remain in the program, so this both refines types and also cleans up the type section.
*	Avoid spurious warnings in pass skipping (#5451)	Alon Zakai	2023-01-25	1	-7/+11
\| \| \| \| \|	Nested runners should be ignored, as they run some internal stuff in certain passes, which would not contain the pass the user asked to skip with --skip-pass.
*	Add a mechanism to skip a pass by name (#5448)	Alon Zakai	2023-01-24	1	-0/+25
\| \| \| \| \| \| \| \|	For example, -O3 --skip-pass=vacuum will run -O3 normally but it will not run the vacuum pass at all (which normally runs more than once in -O3).
*	Write debug info in Pass-Debug mode 3 (#5384)	Alon Zakai	2023-01-03	1	-1/+1
\| \| \| \|	Without the names section debugging can be hard sometimes, on the binaries that that mode emits for each pass.
*	Remove unused types during type optimizations (#5361)	Thomas Lively	2022-12-19	1	-0/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The type rewriting utility in type-updating.cpp gathers all the used heap types, then rewrites them to newly built and possibly modified heap types. The problem is that for the isorecursive type system, the set of "used" heap types was overly broad because it also included unused heap types that are in a rec group with used types. In the context of emitting a binary, it is important to treat these types as used because failing to emit them would change the identity of the used types, but in the context of type optimizations it is ok to treat them as truly unused because we are changing type identities anyway. Update the type rewriting utility to only include truly used types in the set of output types. This causes all existing type optimizations to implicitly drop unused types, but only if they find any other optimizations to do and actually run the rewriter utitility. Their output will also still include unused types that were used before their optimizations were applied. To overcome these limitations and better match the optimizing power of nominal mode, which never includes unused types in the output, add a new type optimization pass that removes unused types and does nothing else and run it near the end of the global optimization pipeline.
*	Do not optimize public types (#5347)	Thomas Lively	2022-12-16	1	-6/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Do not optimize or modify public heap types in any way. Public heap types include the types of imported or exported functions, tables, globals, etc. This is important to maintain the public interface of a module and ensure it can still link interact as intended with the outside world. Also add validation error if we find any nontrivial public types that are not the types of imported or exported functions. This error is meant to help the user ensure that type optimizations are not silently inhibited. In the future, we may want to add options to silence this error or downgrade it to a warning. This commit only updates the type updating machinery to avoid updating public types. It does not update any optimization passes accordingly. Since we avoid modifying public signature types already, this is not expected to break anything, but in the future once we have function subtyping or if we make the error optional, we may have to update some of our optimization passes.
*	Adds bounds checks to Load/Store in Multi-Memories Lowering Pass (#5256)	Ashley Nelson	2022-12-09	1	-0/+5
\| \| \|	Per the wasm spec guidelines for Load (rule 10) & Store (rule 12), this PR adds an option for bounds checking, producing a runtime error if the instruction exceeds the bounds of the particular memory within the combined memory.
*	[Wasm GC] Add TypeMerging pass (#5321)	Alon Zakai	2022-12-07	1	-0/+3
\| \| \| \| \| \| \| \|	This finds types that can be merged into their super: types that add no fields, and are not used in casts, etc. - so we might as well use the super. This complements TypeSSA, in that it can merge back the new types that TypeSSA created, if we never found a use for them. Without this, TypeSSA can bloat binary size quite a lot (I see 10-20%).
*	[Wasm GC] Add TypeSSA pass (#5299)	Alon Zakai	2022-12-02	1	-0/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This creates new nominal types for each (interesting) struct.new. That then allows type-based optimizations to be more precise, as those optimizations will track separate info for each struct.new, in effect. That is kind of like SSA, however, we do not handle merges. For example: x = struct.new $A (5); print(x.value); y = struct.new $A (11); print(y.value); // => // x = struct.new $A.x (5); print(x.value); y = struct.new $A.y (11); print(y.value); After the pass runs each of those struct.new creates a unique type, and type-based analysis can see that 5 or 11 are the only values written in that type (if nothing else writes there). This bloats the type section with the new subtypes, so it is best used with a pass to merge unneeded duplicate types, which a later PR will add. That later PR will exactly merge back in the types created here, which are nominally different but indistinguishable otherwise. This pass is not enabled by default. It's not clear yet where is the best place to do it, as it must be balanced by type merging, but it might be better to do multiple rounds of optimization between the two. Needs more investigation.
*	[Wasm GC] Implement closed-world flag (#5303)	Alon Zakai	2022-11-30	1	-6/+12
\| \| \| \| \| \| \| \| \| \| \| \| \|	With this change we default to an open world, that is, we do the safe thing by default: we no longer assume a closed world. Users that want a closed world must pass --closed-world. Atm we just do not run passes that assume a closed world. (We might later refine them to find which types don't escape and only optimize those.) The RemoveUnusedModuleElements is an exception in that the closed-world flag influences one part of its operation, but not the rest. Fixes #5292
*	Remove equirecursive typing (#5240)	Thomas Lively	2022-11-23	1	-4/+1
\| \| \| \|	Equirecursive is no longer standards track and its implementation is extremely complex. Remove it.
*	Dump only the binary in pass-debug mode (#5290)	Alon Zakai	2022-11-22	1	-4/+3
\| \| \| \| \|	Dumping the text is nice sometimes, but on huge testcases the wat can be 1 GB in size (!), and so dumping one per pass can lead to using 20 GB or so for the full optimization pipeline. Emit just the binary to avoid that.
*	[Wasm GC] Start an OptimizeCasts pass and reuse cast values there (#5263)	Alon Zakai	2022-11-17	1	-0/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	(some.operation (ref.cast .. (local.get $ref)) (local.get $ref) ) => (some.operation (local.tee $temp (ref.cast .. (local.get $ref)) ) (local.get $temp) ) This can help cases where we cast for some reason but happen to not use the cast value in all places. This occurs in j2wasm in itable calls sometimes: The this pointer is is refined, but the itable may be done with an unrefined pointer, which is less optimizable. So far this is just inside basic blocks, but that is enough for the cast of itable calls and other common patterns I see.
*	Add a pass to lower sign-ext operations to MVP (#5254)	Alon Zakai	2022-11-15	1	-0/+3
\| \| \| \|	Fixes #5250
*	[Wasm GC] Add Monomorphize pass (#5238)	Alon Zakai	2022-11-11	1	-0/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Monomorphization finds cases where we send more refined types to a function than it declares. In such cases we can copy the function and refine the parameters: // B is a subtype of A foo(new B()); function foo(x : A) { ..} => foo_B(new B()); // call redirected to refined copy function foo(x : A) { ..} // unchanged function foo_B(x : B) { ..} // refined copy This increases code size so it may not be worth it in all cases. This initial PR is hopefully enough to start experimenting with this on performance, and so it does not enable the pass by default. This adds two variations of monomorphization, one that always does it, and the default which is "careful": it sees whether monomorphizing lets the refined function actually be better than the original (say, by removing a cast). If there is no improvement then we do not make any changes. This saves a significant amount of code size - on j2wasm the careful version increases by 13% instead of 20% - but it does run more slowly obviously.
*	ReorderGlobals pass (#4904)	Alon Zakai	2022-11-02	1	-0/+9
\| \| \| \| \| \| \| \| \|	This sorts globals by their usage (and respecting dependencies). If the module has very many globals then using smaller LEBs can matter. If there are fewer than 128 globals then we cannot reduce size, and the pass exits early (so this pass will not slow down MVP builds, which usually have just 1 global, the stack pointer). But with wasm GC it is common to use globals for vtables etc., and often there is a very large number of them.
*	Multi-Memories Lowering Pass (#5107)	Ashley Nelson	2022-11-01	1	-0/+3
\| \| \| \| \| \| \| \| \| \|	Adds a multi-memories lowering pass that will create a single combined memory from the memories added to the module. This pass assumes that each memory is configured the same (type, shared). This pass also: - replaces existing memory.size instructions with a custom function that returns the size of each memory as if they existed independently - replaces existing memory.grow instructions with a custom function, using global offsets to track the page size of each memory so data doesn't overlap in the singled combined memory - adjusts the offsets of active data segments - adjusts the offsets of Loads/Stores
*	[Wasm GC] Enable various passes in hybrid mode, not just nominal (#5202)	Alon Zakai	2022-10-31	1	-1/+3
\|
*	Make `Name` a pointer, length pair (#5122)	Thomas Lively	2022-10-11	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	With the goal of supporting null characters (i.e. zero bytes) in strings. Rewrite the underlying interned `IString` to store a `std::string_view` rather than a `const char`, reduce the number of map lookups necessary to intern a string, and present a more immutable interface. Most importantly, replace the `c_str()` method that returned a `const char` with a `toString()` method that returns a `std::string`. This new method can correctly handle strings containing null characters. A `const char` can still be had by calling `data()` on the `std::string_view`, although this usage should be discouraged. This change is NFC in spirit, although not in practice. It does not intend to support any particular new functionality, but it is probably now possible to use strings containing null characters in at least some cases. At least one parser bug is also incidentally fixed. Follow-on PRs will explicitly support and test strings containing nulls for particular use cases. The C API still uses `const char` to represent strings. As strings containing nulls become better supported by the rest of Binaryen, this will no longer be sufficient. Updating the C and JS APIs to use pointer, length pairs is left as future work.
*	Refactor interaction between Pass and PassRunner (#5093)	Thomas Lively	2022-09-30	1	-6/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Previously only WalkerPasses had access to the `getPassRunner` and `getPassOptions` methods. Move those methods to `Pass` so all passes can use them. As a result, the `PassRunner` passed to `Pass::run` and `Pass::runOnFunction` is no longer necessary, so remove it. Also update `Pass::create` to return a unique_ptr, which is more efficient than having it return a raw pointer only to have the `PassRunner` wrap that raw pointer in a `unique_ptr`. Delete the unused template `PassRunner::getLast()`, which looks like it was intended to enable retrieving previous analyses and has been in the code base since 2015 but is not implemented anywhere.
*	Allow optimizing with global function effects (#5040)	Alon Zakai	2022-09-16	1	-0/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This adds a map of function name => the effects of that function to the PassOptions structure. That lets us compute those effects once and then use them in multiple passes afterwards. For example, that lets us optimize away a call to a function that has no effects: (drop (call $nothing)) [..] (func $nothing ;; .. lots of stuff but no effects, only a returned value .. ) Vacuum will remove that dropped call if we tell it that the called function has no effects. Note that a nice result of adding this to the PassOptions struct is that all passes will use the extra info automatically. This is not enabled by default as the benefits seem rather minor, though it does help in a small but noticeable way on J2Wasm code, where we use call.without.effects and have situations like this: (func $foo (call $bar) ) (func $bar (call.without.effects ..) ) The call to bar looks like it has effects, normally, but with global effect info we know it actually doesn't. To use this, one would do --generate-global-effects [.. some passes that use the effects ..] --discard-global-effects Discarding is not necessary, but if there is a pass later that adds effects, then not discarding could lead to bugs, since we'd think there are fewer effects than there are. (However, normal optimization passes never add effects, only remove them.) It's also possible to call this multiple times: --generate-global-effects -O3 --generate-global-effects -O3 That computes affects after the first -O3, and may find fewer effects than earlier. This doesn't compute the full transitive closure of the effects across functions. That is, when computing a function's effects, we don't look into its own calls. The simple case so far is enough to handle the call.without.effects example from before (though it may take multiple optimization cycles).
*	Add JavaScript promise integration (JSPI) pass. (#4961)	Brendan Dahl	2022-09-02	1	-0/+3
\| \| \| \| \| \| \|	Add a pass that wraps all imports and exports with functions that handle storing and passing along the suspender externref needed for JSPI. https://github.com/WebAssembly/js-promise-integration/blob/main/proposals/js-promise-integration/Overview.md
*	[Wasm GC] Support non-nullable locals in the "1a" form (#4959)	Alon Zakai	2022-08-31	1	-9/+23
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	An overview of this is in the README in the diff here (conveniently, it is near the top of the diff). Basically, we fix up nn locals after each pass, by default. This keeps things easy to reason about - what validates is what is valid wasm - but there are some minor nuances as mentioned there, in particular, we ignore nameless blocks (which are commonly added by various passes; ignoring them means we can keep more locals non-nullable). The key addition here is LocalStructuralDominance which checks which local indexes have the "structural dominance" property of 1a, that is, that each get has a set in its block or an outer block that precedes it. I optimized that function quite a lot to reduce the overhead of running that logic after each pass. The overhead is something like 2% on J2Wasm and 0% on Dart (0%, because in this mode we shrink code size, so there is less work actually, and it balances out). Since we run fixups after each pass, this PR removes logic to manually call the fixup code from various places we used to call it (like eh-utils and various passes). Various passes are now marked as requiresNonNullableLocalFixups => false. That lets us skip running the fixups after them, which we normally do automatically. This helps avoid overhead. Most passes still need the fixups, though - any pass that adds a local, or a named block, or moves code around, likely does. This removes a hack in SimplifyLocals that is no longer needed. Before we worked to avoid moving a set into a try, as it might not validate. Now, we just do it and let fixups happen automatically if they need to: in the common code they probably don't, so the extra complexity seems not worth it. Also removes a hack from StackIR. That hack tried to avoid roundtrip adding a nondefaultable local. But we have the logic to fix that up now, and opts will likely keep it non-nullable as well. Various tests end up updated here because now a local can be non-nullable - previous fixups are no longer needed. Note that this doesn't remove the gc-nn-locals feature. That has been useful for testing, and may still be useful in the future - it basically just allows nn locals in all positions (that can't read the null default value at the entry). We can consider removing it separately. Fixes #4824
*	Function-level pass-debug mode 2 validation (#4897)	Alon Zakai	2022-08-12	1	-2/+32
\| \| \| \| \| \|	In BINARYEN_PASS_DEBUG=2 we save the module before each pass, and if validation fails afterwards, we print the module before. This PR does the same for function-parallel passes - in that case, we can actually show the specific function that broke validation, as opposed to the whole module.
*	Grand Unified Flow Analysis (GUFA) (#4598)	Alon Zakai	2022-07-22	1	-2/+10
\| \| \| \| \| \| \| \| \| \| \| \| \|	This tracks the possible contents in the entire program all at once using a single IR. That is in contrast to say DeadArgumentElimination of LocalRefining etc., all of whom look at one particular aspect of the program (function params and returns in DAE, locals in LocalRefining). The cost is to build up an entire new IR, which takes a lot of new code (mostly in the already-landed PossibleContents). Another cost is this new IR is very big and requires a lot of time and memory to process. The benefit is that this can find opportunities that are only obvious when looking at the entire program, and also it can track information that is more specialized than the normal type system in the IR - in particular, this can track an ExactType, which is the case where we know the value is of a particular type exactly and not a subtype.
*	Enable GlobalStructInference by default (#4734)	Alon Zakai	2022-06-16	1	-0/+1
\| \| \| \| \| \| \| \| \|	This pass helps on at least one Java microbenchmark in a clear way. Real-world data is mixed, with no obvious benefit. But it does optimize 819 callsites on the real-world 14 MB J2Wasm binary, so we may find it helps if/when we run into those code paths. On that binary (the biggest we have for GC) this pass runs in 0.12 seconds, so there is very little downside to enabling it. It is a fast linear-time operation.
*	Restore and fix SpillPointers pass (#4570)	Alon Zakai	2022-06-06	1	-0/+3
\| \| \| \| \| \| \| \|	We have some possible use cases for this pass, and so are restoring it. This reverts the removal in #3261, fixes compile errors in internal API changes since then, and flips the direction of the stack for the wasm backend.
*	Global Struct Inference pass: Infer two constants in struct.get (#4659)	Alon Zakai	2022-06-01	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This optimizes constants in the megamorphic case of two: when we know two function references are possible, we could in theory emit this: (select (ref.func A) (ref.func B) (ref.eq (..ref value..) ;; globally, only 2 things are possible here, and one has ;; ref.func A as its value, and the other ref.func B (ref.func A)) That is, compare to one of the values, and emit the two possible values there. Other optimizations can then turn a call_ref on this select into an if over two direct calls, leading to devirtualization. We cannot compare a ref.func directly (since function references are not comparable), and so instead we look at immutable global structs. If we find a struct type that has only two possible values in some field, and the structs are in immutable globals (which happens in the vtable case in j2wasm for example), then we can compare the references of the struct to decide between the two values in the field.
*	[Wasm GC] Signature Pruning (#4545)	Alon Zakai	2022-03-25	1	-0/+4
\| \| \| \| \| \| \| \| \| \| \| \| \|	This adds a new signature-pruning pass that prunes parameters from signature types where those parameters are never used in any function that has that type. This is similar to DeadArgumentElimination but works on a set of functions, and it can handle indirect calls. Also move a little code from SignatureRefining into a shared place to avoid duplication of logic to update signature types. This pattern happens in j2wasm code, for example if all method functions for some virtual method just return a constant and do not use the this pointer.
*	MergeSimilarFunctions optimization pass (#4414)	Yuta Saito	2022-03-03	1	-0/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Merge similar functions that only differs constant values (like immediate operand of const and call insts) by parameterization. Performing this pass at post-link time can merge more functions across objects. Inspired by Swift compiler's optimization which is derived from LLVM's one: https://github.com/apple/swift/blob/main/lib/LLVMPasses/LLVMMergeFunctions.cpp https://github.com/llvm/llvm-project/blob/main/llvm/docs/MergeFunctions.rst The basic ideas here are constant value parameterization and direct callee parameterization by indirection. Constant value parameterization is like below: ;; Before (func $big-const-42 (result i32) [[many instr 1]] (i32.const 44) [[many instr 2]] ) (func $big-const-43 (result i32) [[many instr 1]] (i32.const 45) [[many instr 2]] ) ;; After (func $byn$mgfn-shared$big-const-42 (result i32) [[many instr 1]] (local.get $0) ;; parameterized!! [[many instr 2]] ) (func $big-const-42 (result i32) (call $byn$mgfn-shared$big-const-42 (i32.const 42) ) ) (func $big-const-43 (result i32) (call $byn$mgfn-shared$big-const-42 (i32.const 43) ) ) Direct callee parameterization is similar to the constant value parameterization, but it parameterizes callee function i by ref.func instead. Therefore it is enabled only when reference-types and typed-function-references features are enabled. I saw 1 ~ 2 % reduction for SwiftWasm binary and Ruby's wasm port using wasi-sdk, and 3 ~ 4.5% reduction for Unity WebGL binary when -Oz.
*	Remove NoExitRuntime pass (#4431)	Alon Zakai	2022-01-26	1	-4/+0
\| \| \| \|	After emscripten-core/emscripten#15905 lands Emscripten will no longer use it, and nothing else needs it AFAIK.
*	Modernize code to C++17 (#3104)	Max Graey	2021-11-22	1	-2/+2
\|
*	Add fixup function for nested pops in catch (#4348)	Heejin Ahn	2021-11-22	1	-0/+17
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This adds `EHUtils::handleBlockNestedPops`, which can be called at the end of passes that has a possibility to put `pop`s inside `block`s. This method assumes there exists a `pop` in a first-descendant line, even though it can be nested within a block. This allows a `pop` to be nested within a `block` or a `try`, but not a `loop`, since that means the `pop` can run multile times. In case of `if`, `pop` can exist only in its condition; if a `pop` is in its true or false body, that's not in the first-descendant line. This can be useful when optimization passes create blocks to do transformations. Wrapping expressions wiith a block does not change semantics most of the time, but if pops happen to be inside a block generated by those passes, they can result in invalid binaries. To test this, this adds `passes/test_passes.cpp`, which is intended to contain multiple test passes that test a single (or more) utility functions separately. Without this kind of pass, it is hard to test various cases in which nested `pop`s can be generated in existing passes. This PR also adds `PassRegistry::registerTestPass`, which registers a pass that's intended only for internal testing and does not show up in `wasm-opt --help`. Fixes #4237.
*	[Wasm GC] Signature Refining pass (#4326)	Alon Zakai	2021-11-19	1	-0/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is fairly short and simple after the recent refactorings. This basically just finds all uses of each signature/function type, and then sees if it receives more specific types as params. It then rewrites the types if so. This just handles arguments so far, and not return types. This differs from DeadArgumentElimination's refineArguments() in that that pass modifies each function by itself, changing the type of the function as needed. That is only valid if the type is not observable, that is, if the function is called indirectly then DAE ignores it. This pass will work on the types themselves, so it considers all functions sharing a type as a whole, and when it upgrades that type it ends up affecting them all. This finds optimization opportunities on 4% of the total signature types in j2wasm. Those lead to some benefits in later opts, but the effect is not huge.
*	[Wasm GC] Global Refining pass (#4344)	Alon Zakai	2021-11-18	1	-0/+3
\| \| \| \| \| \| \| \|	Fairly simple, this uses the existing infrastructure to find opportunities to refine the type of a global variable. This a common pattern in j2wasm for example, where a global begins as a null of $java.lang.Object (the least specific type) but it is in practice always assigned an object of some specific type.
*	[NFC] HeapRefining => TypeRefining (#4332)	Alon Zakai	2021-11-16	1	-3/+3
\|
*	[NFC] Rename GlobalSubtyping => HeapRefining (#4331)	Alon Zakai	2021-11-16	1	-3/+3
\|
*	Add GlobalSubtyping pass (#4306)	Alon Zakai	2021-11-10	1	-0/+4
\| \| \| \| \| \| \| \| \|	This specializes the fields of structs based on the types written to them. That is, if a field is of type A but in practice we always write some subtype B to it then we can change the type of the field to that. On j2wasm this manages to improve at least one field in 2% of types. Not a large amount, but this does lead to further benefits in later opts (e.g. about a third of the improvements are to turn a field non-nullable).
*	[Wasm GC] Enable GlobalTypeOptimization (#4305)	Alon Zakai	2021-11-05	1	-1/+6
\| \| \| \| \| \| \| \|	Now that all known issues with that pass are fixed, enable it by default. This adds it in a place that seems to make sense on j2wasm, but in general multiple cycles of optimization will be needed. This adds a test showing that we run this pass and that it helps ConstantFieldPropagation by running before it.
*	[Wasm GC] Global Type Optimization: Remove unread fields (#4255)	Alon Zakai	2021-10-20	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add struct.get tracking, and if a field is never read from, simply remove it. This will error if a field is written using struct.new with a value with side effects. It is not clear we can handle that, as if the struct.new is in a global then we can't save the other values to locals etc. to reorder things. We could perhaps use other globals for it (ugh) but at least for now, that corner case does not happen on any code I can see. This allows a quite large code size reduction on j2wasm output (20%). The reason is that many vtable fields are not actually read, and so removing them and the ref.func they hold allows us to get rid of those functions, and code that they reach.
*	[Wasm GC] GlobalTypeOptimization: Turn fields immutable when possible (#4213)	Alon Zakai	2021-10-06	1	-0/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add a new pass to perform global type optimization. So far this just does one thing, to find fields with no struct.set and to turn them immutable (where possible - sub and supertypes must agree). To do that, this adds a GlobalTypeRewriter utility which rewrites all the heap types in the module, allowing changes while doing so. In this PR, the change is to flip the mutable field. Otherwise, the utility handles all the boilerplate of creating temp heap types using a TypeBuilder, and it handles replacing the types in every place they are used in the module. This is not enabled by default yet as I don't see enough of a benefit on j2cl. This PR is basically the simplest thing to do in the space of global type optimization, and the simplest way I can think of to fully test the GlobalTypeRewriter (which can't be done as a unit test, really, since we want to emit a full module and validate it etc.). This PR builds the foundation for more complicated things like removing unused fields, subtyping fields, and more.
*	Add an Intrinsics mechanism, and a call.without.effects intrinsic (#4126)	Alon Zakai	2021-09-10	1	-0/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	An "intrinsic" is modeled as a call to an import. We could also add new IR things for them, but that would take more work and lead to less clear errors in other tools if they try to read a binary using such a nonstandard extension. A first intrinsic is added here, call.without.effects This is basically the same as call_ref except that the optimizer is free to assume the call has no side effects. Consequently, if the result is not used then it can be optimized out (as even if it is not used then side effects could have kept it around). Likewise, the lack of side effects allows more reordering and other things. A lowering pass for intrinsics is provided. Rather than automatically lower them to normal wasm at the end of optimizations, the user must call that pass explicitly. A typical workflow might be -O --intrinsic-lowering -O That optimizes with the intrinsic present - perhaps removing calls thanks to it - then lowers it into normal wasm - it turns into a call_ref - and then optimizes further, which would turns the call_ref into a direct call, potentially inline, etc.
*	Optimize away dominated calls to functions that run only once (#4111)	Alon Zakai	2021-09-03	1	-0/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Some functions run only once with this pattern: function foo() { if (foo$ran) return; foo$ran = 1; ... } If that global is not ever set to 0, then the function's payload (after the initial if and return) will never execute more than once. That means we can optimize away dominated calls: foo(); foo(); // we can remove this To do this, we find which globals are "once", which means they can fit in that pattern, as they are never set to 0. If a function looks like the above pattern, and it's global is "once", then the function is "once" as well, and we can perform this optimization. This removes over 8% of static calls in j2cl.
*	Enable LocalCSE by default (#4089)	Alon Zakai	2021-08-19	1	-0/+3
\| \| \| \| \| \| \| \| \| \| \| \|	Enable it in -O3 and -Os and higher. This helps very little on output from LLVM, but also it does not alter compile times much anyhow. On code that has not been run through an optimizing compiler already, this can help quite a lot, e.g., 15% of code size on some wasm GC samples. This will not normally help with speed, as optimizing VMs do such things anyhow. However, this can help baseline compilers and interpreters and so forth.
*	LocalCSE rewrite (#4079)	Alon Zakai	2021-08-17	1	-0/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Technically this is not a new pass, but it is a rewrite almost from scratch. Local Common Subexpression Elimination looks for repeated patterns, stuff like this: x = (a + b) + c y = a + b => temp = a + b x = temp + c y = temp The old pass worked on flat IR, which is inefficient, and was overly complicated because of that. The new pass uses a new algorithm that I think is pretty simple, see the detailed comment at the top. This keeps the pass enabled only in -O4, like before - right after flattening the IR. That is to make this as minimal a change as possible. Followups will enable the pass in the main pipeline, that is, we will finally be able to run it by default. (Note that to make the pass work well after flatten, an extra simplify-locals is added - the old pass used to do part of simplify-locals internally, which was one source of complexity. Even so, some of the -O4 tests have changes, due to minor factors - they are just minor orderings etc., which can be seen by inspecting the outputs before and after using e.g. --metrics) This plus some followup work leads to large wins on wasm GC output. On j2cl there is a common pattern of repeated struct.gets, so common that this pass removes 85% of all struct.gets, which makes the total binary 15% smaller. However, on LLVM-emitted code the benefit is minor, less than 1%.
*	[Wasm GC] Constant Field Propagation (#4052)	Alon Zakai	2021-08-05	1	-0/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	A field in a struct is constant if we can see that in the entire program we only ever write the same constant value to it. For example, imagine a vtable type that we construct with the same funcrefs each time, then (if we have no struct.sets, or if we did, and they had the same value), we could replace a get with that constant value, since it cannot be anything else: (struct.new $T (i32.const 10) (rtt)) ..no other conflicting values.. (struct.get $T 0) => (i32.const 10) If the value is a function reference, then this may allow other passes to turn what was a call_ref into a direct call and perhaps also get inlined, effectively a form of devirtualization. This only works in nominal typing, as we need to know the supertype of each type. (It could work in theory in structural, but we'd need to do hard work to find all the possible supertypes, and it would also become far less effective.) This deletes a trivial test for running -O on GC content. We have many more tests for GC at this point, so that test is not needed, and this PR also optimizes the code into something trivial and uninteresting anyhow.
*	[JS] Add a new OptimizeForJS pass (#4033)	Max Graey	2021-08-02	1	-0/+3
\| \| \| \| \|	Add a new OptimizeForJS pass which contains rewriting rules specific to JavaScript. LLVM usually lowers x != 0 && (x & (x - 1)) == 0 (isPowerOf2) to popcnt(x) == 1 which is ok for wasm and other targets but is quite expensive for JavaScript. In this PR we lower the popcnt pattern back to the isPowerOf2 pattern.
*	[Wasm GC] Local-Subtyping pass (#3765)	Alon Zakai	2021-07-23	1	-0/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If a local is say anyref, but all values assigned to it are something more specific like funcref, then we can make the type of the local more specific. In principle that might allow further optimizations, as the local.gets of that local will have a more specific type that their users can see, like this: (import .. (func $get-funcref (result funcref))) (func $foo (result i32) (local $x anyref) (local.set $x (call $get-funcref)) (ref.is_func (local.get $x)) ) => (func $foo (result i32) (local $x funcref) ;; updated to a subtype of the original (local.set $x (call $get-funcref)) (ref.is_func (local.get $x)) ;; this can now be optimized to "1" ) A possible downside is that using more specific types may not end up allowing optimizations but may end up increasing the size of the binary (say, replacing lots of anyref with various specific types that compress more poorly; also, for recursive types the LUB may be a unique type appearing nowhere else in the wasm). We should investigate the code size factors more later.
*	Add a extract-function-index pass	Thomas Lively	2021-06-17	1	-0/+3
\| \| \| \| \| \| \| \|	This is a useful alternative to extract-function when you don't know the function's name. Also moves the extract-function tests to be lit tests and re-uses them as extract-function-index tests.