summaryrefslogtreecommitdiff
path: root/src
Commit message (Collapse)AuthorAgeFilesLines
...
* [NFC] Move value numbering code to a header (#4345)Alon Zakai2021-11-192-48/+100
| | | This just moves code out of RedundantSetElimination.
* [Wasm GC] Signature Refining pass (#4326)Alon Zakai2021-11-194-0/+213
| | | | | | | | | | | | | | | | | | | This is fairly short and simple after the recent refactorings. This basically just finds all uses of each signature/function type, and then sees if it receives more specific types as params. It then rewrites the types if so. This just handles arguments so far, and not return types. This differs from DeadArgumentElimination's refineArguments() in that that pass modifies each function by itself, changing the type of the function as needed. That is only valid if the type is not observable, that is, if the function is called indirectly then DAE ignores it. This pass will work on the types themselves, so it considers all functions sharing a type as a whole, and when it upgrades that type it ends up affecting them all. This finds optimization opportunities on 4% of the total signature types in j2wasm. Those lead to some benefits in later opts, but the effect is not huge.
* Allow building basic HeapTypes in nominal mode (#4346)Thomas Lively2021-11-192-94/+157
| | | | | | | | | | | | | | | | As we work toward allowing nominal and structural types to coexist, any difference in how they can be built or used will be an inconvenient footgun that we will have to work around. In the spirit of reducing the differences between the type systems, allow TypeBuilder to construct basic HeapTypes in nominal mode just as it can in equirecursive mode. Although this change is a net increase in code complexity for not much benefit (wasm-opt never needs to build basic HeapTypes), it is also an incremental step toward getting rid of separate type system modes, so I expect it to simplify other PRs in the near future. This change also uncovered a bug in how the type fuzzer generated subtypes of basic HeapTypes. The generated subtypes did not necessarily have the intended `Kind`, which caused failures in nominal subtype validation in the fuzzer.
* [Wasm GC] Global Refining pass (#4344)Alon Zakai2021-11-184-0/+144
| | | | | | | | Fairly simple, this uses the existing infrastructure to find opportunities to refine the type of a global variable. This a common pattern in j2wasm for example, where a global begins as a null of $java.lang.Object (the least specific type) but it is in practice always assigned an object of some specific type.
* [Wasm GC] Update nulls to allow finding better LUBs (#4340)Alon Zakai2021-11-184-75/+142
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | It is common in GC code to have stuff like this: x = null; .. x = Data(); Nulls in wasm have a type, and if that initial null has say anyref then before this PR we would keep the type of x as anyref. However, while nulls have types, all null values are identical, and so we can in fact change x's type to a nullable reference of Data, by also changing the null's type to something more specific. LUBFinder now has an API that can return the best possible LUB so far, and that can be told to update nulls if we decide that the new LUB is worth using. This updates the passes using LUBFinder to use the new API. Note how TypeRefining becomes simpler because the special logic it had in a subclass of LUBFinder is now part of the main class (it used to remember if there was a null default; LUBFinder now handles both a null default as well as other nulls). This requires some changes to existing tests to avoid them from optimizing using nulls in ways that ends up not testing the original intent. Specifically the dae-gc-refine-params.wast now has calls to get a null of a type, instead of just having a ref.null of that type (which could be optimized now). And dae-gc-refine-return uses locals instead of ref.nulls.
* Small cleanups in type fuzzer (#4337)Thomas Lively2021-11-172-20/+14
| | | | | | | - Do not require defaultable types in function returns - Increase likelihood of `none` function return types - Correctly generate subtypes of basic types - Actually check output in tests - Print to cout instead of cerr
* [OptimizeInstructions] Combine some relational ops joined Or/And (Part 2) ↵Max Graey2021-11-161-5/+24
| | | | | | (#4336) (i32(x) != 0) | (i32(y) != 0) ==> i32(x | y) != 0 (i64(x) != 0) | (i64(y) != 0) ==> i64(x | y) != 0
* [OptimizeInstructions] Combine some relational ops joined Or/And (Part 1) ↵Max Graey2021-11-161-8/+34
| | | | | | (#4333) (i32(x) == 0) & (i32(y) == 0) ==> i32(x | y) == 0 (i64(x) == 0) & (i64(y) == 0) ==> i64(x | y) == 0
* Add a fuzzer specifically for types (#4328)Thomas Lively2021-11-1511-120/+875
| | | | | | | | | | | | | | | Add a new fuzzer binary that repeatedly generates random types to find bugs in the type system implementation. Each iteration creates some number of root types followed by some number of subtypes thereof. Each built type can contain arbitrary references to other built types, regardless of their order of construction. Right now the fuzzer only finds fatal errors in type building (and in its own implementation), but it is meant to be extended to check other properties in the future, such as that LUB calculations work as expected. The logic for creating types is also intended to be integrated into the main fuzzer in a follow-on PR so that the main fuzzer can fuzz with arbitrarily more interesting GC types.
* [NFC] HeapRefining => TypeRefining (#4332)Alon Zakai2021-11-164-10/+10
|
* [NFC] Rename GlobalSubtyping => HeapRefining (#4331)Alon Zakai2021-11-164-10/+10
|
* Add support for relaxed-simd instructions (#4320)Ng Zhi An2021-11-1512-33/+495
| | | | | | | | | | | | | | | | | | | | | This adds relaxed-simd instructions based on the current status of the proposal https://github.com/WebAssembly/relaxed-simd/blob/main/proposals/relaxed-simd/Overview.md. Binary opcodes are based on what is listed in https://github.com/WebAssembly/relaxed-simd/blob/main/proposals/relaxed-simd/Overview.md#binary-format. Text names are not fixed yet, and some sort sort of names that maps to the non-relaxed versions are chosen for this prototype. Support for these instructions have been added to LLVM via builtins, adding support here will allow Emscripten to successfully compile files that use those builtins. Interpreter support has also been added, and they delegate to the non-relaxed versions of the instructions. Most instructions are implemented in the interpreter the same way as the non-relaxed simd128 instructions, except for fma/fms, which is always fused.
* Directize: Fix handling of non-nullable locals and unreachability (#4330)Alon Zakai2021-11-151-7/+12
| | | | | The order of operations could allow us to add vars but then later decide not to do the optimization due to unreachability. And then we did not do a fixup for non-nullability for those args, leading to a fuzzer error.
* Fix vacuum on rtts with depth (#4327)Alon Zakai2021-11-151-0/+9
| | | | | | | | Found by the fuzzer. Calling makeZero on an rtt with depth will error because we try to create a zero Literal from it, and we can't do that - we don't know a list of super types to give it. We could work around it, but we don't want to: if the rtt has depth then we can't make a nice zero for it, we'd need some rtt.subs anyhow, so simply mark it as a type we can't make a zero for.
* [OptimizeInstructions] Combine extend into i64 and 32-bit load operations ↵Max Graey2021-11-121-0/+40
| | | | | | | | | | | | | | | | | | | | (#4307) i64.extend_i32_u(i32.load8_u(x)) -> i64.load8_u(x) i64.extend_i32_u(i32.load16_u(x)) -> i64.load16_u(x) i64.extend_i32_s(i32.load8_u(x)) -> i64.load8_u(x) i64.extend_i32_s(i32.load16_u(x)) -> i64.load16_u(x) i64.extend_i32_s(i32.load8_s(x)) -> i64.load8_s(x) i64.extend_i32_s(i32.load16_s(x)) -> i64.load16_s(x) i64.extend_i32_u(i32.load(x))) -> i64.load32_u(x) i64.extend_i32_s(i32.load(x))) -> i64.load32_s(x) don't apply to i64.extend_i32_u(i32.load8_s(x)) -> skip i64.extend_i32_u(i32.load16_s(x)) -> skip i64.extend_i32_s(i32.atomic.load(x)) -> skip
* [NFC] Refactor param updating code to a shared location (#4325)Alon Zakai2021-11-123-90/+108
| | | This just moves code around.
* MergeBlocks: Rewrite to use a generic algorithm (#4323)Alon Zakai2021-11-123-61/+153
| | | | | | | | | | | | | | | | | | | | | Before this we had special logic for various call types. This replaces all that with a single general code path, which unifies everything except for control flow constructs (which remain as before, handled in a special way for each of them). The algorithm is simple and direct, basically it goes through the children and when it finds a block, it sees if it can move the block's contents outside of the parent. While doing so it takes into account effects and so forth. To make this easy, a random-access API is added to ChildIterator. Diff without whitespace makes the existing test updates a lot simpler. Note that this is not NFC as the old algorithm had some quirks like not taking into account effects when there were more than 2 children; the new code is uniform in how it handles things. This ends up removing 19% of all blocks in j2wasm, which reduces 1% of total code size.
* DeadArgumentElimination argument subtyping: Add fixups if the param is used ↵Alon Zakai2021-11-112-16/+82
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | (#4319) Before, if we saw a param is written, that prevented us from subtyping it: function foo(x : oldType) { .. x = someValue; .. } Even if all calls to foo send some specific struct type that we'd like to subtype to, seeing that write stopped us. To handle such a write we need to do some extra handling for the case in which it is written a less-specific type (that is, if someValue is of type oldType, something like this: function foo(x : newType) { var x_old : oldType; x_old = x; // copy the param to x_old, and use x_old everywhere .. x_old = someValue; .. } That is, still refine the param type, but inside the function use a new local that has the old type, and is guaranteed to validate. This PR implements that logic so that we can optimize more cases. To allow that, this PR avoids trying to both refine a type and remove a param as being unused - that has annoying corner cases. If it is unused, we can simply remove it anyhow.
* Add GlobalSubtyping pass (#4306)Alon Zakai2021-11-107-3/+292
| | | | | | | | | This specializes the fields of structs based on the types written to them. That is, if a field is of type A but in practice we always write some subtype B to it then we can change the type of the field to that. On j2wasm this manages to improve at least one field in 2% of types. Not a large amount, but this does lead to further benefits in later opts (e.g. about a third of the improvements are to turn a field non-nullable).
* CoalesceLocals: Remove a redundant tee of the same local as a parent set (#4318)Alon Zakai2021-11-091-1/+6
| | | | | | Tiny followup to #4314 Also updates some function types in test output, fixing breakage on main after racing landings.
* CoalesceLocals: Rewrite the algorithm to be linear and to ignore copies (#4314)Alon Zakai2021-11-101-27/+161
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The old algorithm can be summarized as: In each basic block, start at the beginning. Each pair of live locals there might interfere with each other, as they might arrive from different entry blocks with different values. Afterwards, go through the block and find overlapping live ranges, and mark interferences there as well. This is non-linear because at the start of the block we do a double-loop over all pairs of live locals, which in general can be O(N^2) (N - number of locals). It also has the downside of ignoring copies: if two locals have overlapping live ranges but they must have identical values on those ranges, they do not actually interfere, for example x = 10; y = x; .. // live ranges overlap here foo(x, y); // live ranges end here. We can ignore this overlap since the copy shows they are identical there, but the pass did not take this into account. To some extent other passes can remove such copies (SimplifyLocals, MergeLocals, RedundantSetElimination), but in general this was a weak spot for the optimizer. I realized there is a solution to both these problems: In Wasm, given that we have a default value for all locals, if a local is live at the start of a block then it must be live at the end of all the blocks reaching it. That is so because the liveness will extend backwards all the way to some set of the local, possibly all the way to the zero-initialization at the start of the function, and it extends that way through all predecessor blocks. A consequence of this is that there are no interferences between locals that only occur during a merge: The live ranges include the predecessor blocks, and theirs, and so forth, until we reach a block where one of the locals is assigned a value different than the other. That is a necessary and sufficient condition for intererence, and therefore when processing a block we only need to look at its contents, and can ignore the merging of control flow, which allows us to be linear. More details on this and on the new algorithm in comments in the source, but the basic idea is that it simply goes through each block in a linear way, finding which values are assigned to each local (using a numbering of unique values), and noting which are live at each time. If two locals are live and one is assigned a value that is not the same as the value in the other, mark them as interfering. This is of substantial benefit to j2wasm output, I believe because it is common there to find local subexpression elimination opportunities after inlining, and each time we find one we add a local. If we inline different functions into the same target, we may end up with copied locals for each of them. (This was not noticed in the past because it is very rare on LLVM output, which has already had inlining and GVN etc. done.) There is a small benefit to LLVM output as well, though just a few percent at best. However, it is enough to be noticeable on some of the code size tests. This is also faster than the previous pass. It's normally not noticeable as this pass is not one of the slowest anyhow, but I found some real-world codebases where the pass becomes 50% faster. I have not found any case where it is slower than the old algorithm. Fuzzed over several days to be sure this is correct, and also verified on the emscripten test suite.
* OptimizeInstructions: Fix static cast optimizations (#4311)Alon Zakai2021-11-091-8/+22
| | | | | | | | | | | | | | We found one cast that has another as its input, and forgot that the child was possibly a fallthrough value. That is, there might be more code that needs to be kept around. Rather than fix the middle of the three cases there - the one with HeapType::isSubType(childIntendedType, intendedType) - I noticed it is not actually needed. That case checks if the child's type is more specific than the parent's, and if so, then the parent is not needed. But we already handle that earlier above in the same function: regardless of what the child of a static cast is, if the cast is static and the input is the proper type already, the cast is unneeded (see lines 1565-1566).
* [NFC] Add StructUtils namespace, and Scanner => StructScanner (#4317)Alon Zakai2021-11-093-19/+30
| | | | This avoids cluttering the main wasm namespace, and clarifies what the scanner does.
* [EH] Improve catch validation (#4315)Heejin Ahn2021-11-085-0/+177
| | | | | | | | | | | | | | | | | | | This improves validation of `catch` bodies mostly by checking the validity of `pop`s. For every `catch` body: - Checks if its tag exists - If the tag's type is none: - Ensures there shouldn't be any `pop`s - If the tag's type is not none: - Checks if there's a single `pop` within the catch body - Checks if the tag type matches the `pop`'s type - Checks if the `pop`'s location is valid For every `catch_all` body: - Ensures there shuldn't be any `pop`s This uncovers several bugs related to `pop`s in existing tests, which this PR also fixes.
* Print heap types in text format in nominal mode (#4316)Alon Zakai2021-11-081-0/+4
| | | | | | | Without this roundtripping may not work in nominal mode, as we might not assign the expected heap types in the right places. Specifically, when the signature matches but the nominal types are distinct then we need to keep them that way (and the sugar in the text format parsing will merge them).
* Effects.h: Fix RefAs (#4312)Alon Zakai2021-11-091-3/+9
| | | | | | | | | We marked that as only trapping if the input as nullable. But ref.as_func will trap if it isn't a func, for example. We could in theory try to check if a trap is possible, like checking if the input is already non-nullable or already a function, etc., but we have optimization passes to get rid of RefAs when they are not needed anyhow, so there is no point to duplicate that here.
* [Wasm GC] Enable GlobalTypeOptimization (#4305)Alon Zakai2021-11-051-1/+6
| | | | | | | | Now that all known issues with that pass are fixed, enable it by default. This adds it in a place that seems to make sense on j2wasm, but in general multiple cycles of optimization will be needed. This adds a test showing that we run this pass and that it helps ConstantFieldPropagation by running before it.
* Fix fuzz bug in RemoveUnusedBrs with incremental type updating (#4309)Alon Zakai2021-11-051-2/+7
| | | | | The BrOn logic there is incremental in optimizing and updating types, and so we cannot assume that at every point in the middle the types are fully updated.
* Return the correct flow when an RTT is breaking (#4310)Thomas Lively2021-11-051-1/+1
| | | Fixes #4308.
* Fuzz more basic GC types (#4303)Thomas Lively2021-11-042-116/+248
| | | | | Generate both nullable and non-nullable references to basic HeapTypes and introduce `i31` and `data` HeapTypes. Generate subtypes rather than exact types for all concrete-typed children.
* ChildLocalizer: Do not use locals needlessly (#4292)Alon Zakai2021-11-041-8/+34
| | | | | | | | | | | We only need to use locals if there are effects we can't remove, or if they interact with other children. Improve the comment to explain what the ChildLocalizer is working towards: a state where all the children of the expression can be reordered or removed freely (local.gets have that property, as do other things if they have no relevant effects). Aside from avoiding wasteful locals, this is necessary for running GlobalTypeOptimization on j2wasm: That code will do a global.get of an rtt, and those cannot be placed in locals.
* [NFC] Factor fuzzer randomness into a separate utility (#4304)Thomas Lively2021-11-045-85/+164
| | | | In preparation for using it from a separate file specifically for generating random HeapTypes that has no need to depend on all of fuzzing.h.
* Fix RTTs for RTT-less instructions (#4294)Thomas Lively2021-11-037-155/+160
| | | | | | | | | | | | Allocation and cast instructions without explicit RTTs should use the canonical RTTs for the given types. Furthermore, the RTTs for nominal types should reflect the static type hierarchy. Previously, however, we implemented allocations and casts without RTTs using an alternative system that only used static types rather than RTT values. This alternative system would work fine in a world without first-class RTTs, but it did not properly allow mixing instructions that use RTTs and instructions that do not use RTTs as intended by the M4 GC spec. This PR fixes the issue by using canonical RTTs where appropriate and cleans up the relevant casting code using std::variant.
* [Wasm GC] LUBFinder helper. NFC (#4298)Alon Zakai2021-11-013-18/+67
| | | | | | | | | This is a minor refactoring in DAE to have a helper class that does the incremental LUB calculation. The class is also used in LocalSubtyping, where it has the effect of making the work incremental which it was not before (that would have no observable consequence, but it should make us faster in the common case where we fail to find a new LUB). This will allow further optimization in a central place later.
* Effects: Differentiate mutable from immutable globals (#4286)Alon Zakai2021-10-294-21/+42
| | | | | | | | | | | | | Similar to what we do with structs, if a global is immutable then we know it cannot interact with calls. This changes the JS API for getSideEffects(). That was actually broken, as passing in the optional module param would just pass it along to the compiled C code, so it was coerced to 0 or 1, and not a pointer to a module. To fix that, this now does module.ptr to actually get the pointer, and this is now actually tested as without a module we cannot compute the effects of a global. This PR also makes the module param mandatory in the JS API, as again, without a module we can't compute global effects. (The module param has already been mandatory in the C++ API for some time.)
* [NFC] Use std::variant in GCData (#4289)Thomas Lively2021-10-283-7/+15
| | | | This helps prevent bugs where we assume that the GCData has either a HeapType or Rtt without checking. Indeed, one such bug is found and fixed.
* Heap2Local: Handle loops (#4288)Alon Zakai2021-10-281-2/+8
| | | | | | | When the allocation we optimize away flows through a loop, then just like with a block we must change the type to be nullable, since we are replacing the allocation with a null. Fixes #4287
* Switch binaryen.js/wasm to ESM (#4280)dcode2021-10-283-46/+6
|
* Fix printing of some unreachable GC instructions (#4281)Alon Zakai2021-10-271-22/+28
| | | | | We have separate logic for printing their headers and bodies, and they were not in sync. Specifically, we would not emit drops in the body of a block, which is not valid, and would fail roundtripping on text.
* [Wasm GC] Constant Field Propagation: Handle immutable globals (#4258)Alon Zakai2021-10-271-17/+49
| | | | | | | | | | | | | | | | | | If we write an immutable global to a field, and that is the only thing we ever write, then we can replace reads of the field with a get of the global. To do that, this tracks immutable globals written to fields and not just constant values. Normally this is not needed, as if the global is immutable then we propagate its constant value to everywhere anyhow. However, for references this is useful: If we have a global immutable vtable, for example, then we cannot replace a get of it with a constant. So this PR helps with immutable reference types in globals, allowing us to propagate global.gets to them to more places, which then can allow optimizations there. This + later opts removes 25% of array.gets from j2wasm. I believe almost all of those are itable calls, so this means those are getting devirtualized now. I see something like a 5% speedup due to that.
* [OptimizeInstructions] Canonicalize relational ops with near zero on rhs (#4272)Max Graey2021-10-261-2/+46
| | | | | | | | | | | | | Canonicalize: (signed)x > -1 ==> x >= 0 (signed)x <= -1 ==> x < 0 (signed)x < 1 ==> x <= 0 (signed)x >= 1 ==> x > 0 (unsigned)x < 1 ==> x == 0 (unsigned)x >= 1 ==> x != 0 This should help #4265, and in general 0 is usually a more common constant, and reasonable to canonicalize to.
* [Wasm GC] Add missing Tuple logic to TypeRewriter (#4278)Alon Zakai2021-10-261-0/+7
| | | | | Without this, we'd just return the old type for the tuple, which meant its fields referred to unrewritten types, and possible validation errors if the types changed.
* [NFC] Create a .cpp file for fuzzer implementation (#4279)Thomas Lively2021-10-264-3083/+3172
| | | | | | Having a monolithic header file containing all the implementation meant there was no good way to split up the code or introduce new files. The new implementation file and source directory will make it much easier to add new fuzzing functionality in new files.
* Use std::variant in ConstantFieldPropagation (#4270)Alon Zakai2021-10-261-30/+40
| | | Saves a little code size and might prevent some bugs.
* GlobalTypeOptimization: Handle side effects in removed fields in struct.new ↵Alon Zakai2021-10-262-8/+77
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | (#4263) If struct.new operands have side effects, and we are removing the operand as the field is removed, we must keep the side effects. To handle that, store all the operands in locals and read from the locals, and then removing a local.get is always safe to do, and nothing has been reordered: (struct.new (A) (side effect) ;; this field will be removed (B) ) => (local.set $a (A)) (local.set $t (side effect)) (local.set $b (B)) (struct.new (local.get $a) (local.get $b) ) Later passes can remove unneeded local operations etc. This is necessary before enabling this pass, as this corner case occurs on j2wasm.
* Reducer: Apply --debug to all commands (#4275)Alon Zakai2021-10-251-3/+4
| | | | | | Do so by applying --debug to extraFlags right at the start. That global is used everywhere already. In particular, this PR removes manually adding -g in the first diff chunk here, and you can see extraFlags appears there already on the previous line.
* [EH] Support try-delegate in CFGWalker (#4269)Heejin Ahn2021-10-211-1/+23
| | | | This adds support for `try`-`delegate` to `CFGWalker`. This also adds a single test for `catch`-less `try`.
* [EH] Make doEndThrowingInst simpler (NFC) (#4267)Heejin Ahn2021-10-211-15/+10
| | | | | | | The current code the innermost (`i`th) case specially first and handles `i-1`th `try` in each loop iteration. This puts the `i`th case in the loop and each iteration handles `i`th `try`, which is simpler. Then we don't need to check `throwingInstsStack.empty()` in the beginning because the `for` loop wouldn't be entered if it's empty anyway.
* [Wasm GC] Global Type Optimization: Remove unread fields (#4255)Alon Zakai2021-10-205-56/+273
| | | | | | | | | | | | | | | Add struct.get tracking, and if a field is never read from, simply remove it. This will error if a field is written using struct.new with a value with side effects. It is not clear we can handle that, as if the struct.new is in a global then we can't save the other values to locals etc. to reorder things. We could perhaps use other globals for it (ugh) but at least for now, that corner case does not happen on any code I can see. This allows a quite large code size reduction on j2wasm output (20%). The reason is that many vtable fields are not actually read, and so removing them and the ref.func they hold allows us to get rid of those functions, and code that they reach.
* SimplifyGlobals: Detect trivial read-only-to-write functions (#4257)Alon Zakai2021-10-192-18/+71
| | | | | | | | | | | | | | | | | | | | | We already detected code that looks like if (foo == 0) { foo = 1; } That "read only to write" pattern occurs also in functions, like this: function bar() { if (foo == 0) return; foo = 1; } This PR detects that pattern. It moves code around to share almost all the logic with the previous pattern (the git diff is not that useful there, sadly, but looking at them side by side that should be obvious). This helps in j2cl on some common clinits, where the clinit function ends up empty, which is exactly this pattern.