summaryrefslogtreecommitdiff
path: root/src/ir
Commit message (Collapse)AuthorAgeFilesLines
...
* SubtypingDiscoverer: Differentiate non-flow subtyping constraints (#6344)Alon Zakai2024-02-271-2/+18
| | | | | | | | | | | | | | | | | | When we do a local.set of a value into a local then we have both a subtyping constraint - for the value to be valid to put in that local - and also a flow of a value, which can then reach more places. Such flow then interacts with casts in Unsubtyping, since it needs to know what can flow where in order to know how casts force us to keep subtyping relations. That regressed in the not-actually-NFC #6323 in which I added the innocuous lines to add subtyping constraints in ref.eq. It seems fine to require that the arms of a RefEq must be of type eqref, but Unsubtyping then assuming those arms flowed into a location of type eqref... which means casts might force us to not optimize some things. To fix this, differentiate the rare case of non-flowing subtyping constraints, which is basically only RefEq. There are perhaps a few more cases (like i31 operations) but they do not matter in practice for Unsubtyping anyhow; I suggest we land this first to undo the regression and then at our leisure investigate the other instructions.
* Typed continuations: cont.new instructions (#6308)Frank Emrich2024-02-226-0/+16
| | | | | | | | | | | | | | | | | This PR is part of a series that adds basic support for the [typed continuations/wasmfx proposal](https://github.com/wasmfx/specfx). This particular PR adds support for the `cont.new` instruction for creating continuations, documented [here(https://github.com/wasmfx/specfx/blob/main/proposals/continuations/Overview.md#instructions). In short, these instructions are of the form `(cont.new $ct)` where `$ct` must be a continuation type. The instruction takes a single (nullable) function reference as its argument, which means that the folded representation of the instruction is of the form `(cont.new $ct (foo ...))`. Support for the instruction is implemented in both the old and the new wat parser. Note that this PR does not implement validation of the new instruction.
* [NFC] Use SubtypingDiscoverer in StringLowering (#6325)Alon Zakai2024-02-201-6/+17
| | | | | | | | This replaces horrible hacks to find which nulls need to switch (from none to noext) with general code using SubtypingDiscoverer. That helper is aware of where each expression is written, so we can find those nulls trivially. This is NFC on existing usage but should fix any remaining bugs with null constants.
* subtype-exprs.h additions [NFC] (#6323)Alon Zakai2024-02-201-8/+31
| | | | | | This pulls out the subtype-exprs.h parts of #6108 These are NFC in the current codebase, but are fixes for that unlanded PR, and another unrelated PR that will be opened shortly.
* StringLowering: Modify string=>extern also in public types (#6301)Alon Zakai2024-02-132-4/+16
| | | | We want to actually remove all stringref appearances, in both public and private types.
* Allow updating basic HeapTypes in GlobalTypeRewriter::mapTypes (#6266)Alon Zakai2024-02-011-3/+0
|
* GUFA: Propagate string literals (#6262)Alon Zakai2024-02-011-1/+2
| | | We only noted the type but not the literal value.
* StringGathering pass (#6257)Alon Zakai2024-01-311-1/+0
| | | | | | | | | | | | | | | This pass finds all string.const and creates globals for them. After this transform, no string.const appears anywhere but in a global, and each string appears in one global which is then global.get-ed everywhere. This avoids overhead in VMs where executing a string.const is an allocation, and is also a good step towards imported strings. For that, this pass will be extended from gathering to a full lowering pass, which will first gather into globals as this pass does, and then turn each of those globals with a string.const into an imported externref. (For that reason this pass is in a file called StringLowering, as the two passes will share much of their code, and the larger pass should decide the name I think.) This pass runs in -O2 and above. Repeated executions have no downside (see details in code).
* Directize: Handle overflows and out of bounds (#6255)Alon Zakai2024-01-301-1/+8
|
* Memory flattening: Check for overflow (#6233)Alon Zakai2024-01-241-1/+6
| | | | | Fixes a fuzz testcase for wasm-ctor-eval. Add the beginnings of a polyfill for stdckdint.h to help that.
* Typed continuations: resume instructions (#6083)Frank Emrich2024-01-117-0/+40
| | | | | This PR is part of a series that adds basic support for the [typed continuations proposal](https://github.com/wasmfx/specfx). This particular PR adds support for the `resume` instruction. The most notable missing feature is validation, which is not implemented, yet.
* [NFC] Add more const annotations + a trivial == (#6216)Alon Zakai2024-01-092-3/+8
|
* [EH] Add instructions for new proposal (#6181)Heejin Ahn2023-12-1910-4/+52
| | | | | | | | | | | | | | | | | | | | | | | | This adds basic support for the new instructions in the new EH proposal passed at the Oct CG hybrid CG meeting: https://github.com/WebAssembly/meetings/blob/main/main/2023/CG-10.md https://github.com/WebAssembly/exception-handling/blob/main/proposals/exception-handling/Exceptions.md This mainly adds two instructions: `try_table` and `throw_ref`. This is the bare minimum required to read and write text and binary format, and does not include analyses or optimizations. (It includes some analysis required for validation of existing instructions.) Validation for the new instructions is not yet included. `try_table` faces the same problem with the `resume` instruction in #6083 that without the module-level tag info, we are unable to know the 'sent types' of `try_table`. This solves it with a similar approach taken in #6083: this adds `Module*` parameter to `finalize` methods, which defaults to `nullptr` when not given. The `Module*` parameter is given when called from the binary and text parser, and we cache those tag types in `sentTypes` array within `TryTable` class. In later optimization passes, as long as they don't touch tags, it is fine to call `finalize` without the `Module*`. Refer to https://github.com/WebAssembly/binaryen/pull/6083#issuecomment-1854634679 and #6096 for related discussions when `resume` was added.
* Remove empty _ARRAY/_VECTOR defines (NFC) (#6182)Heejin Ahn2023-12-145-19/+2
| | | | | | | `_VECTOR` or `_ARRAY` defines in `wasm-delegations-fields.def` are supposed to be defined in terms of their non-vector/array counterparts when undefined. This removes empty `_VECTOR`/`_ARRAY` defines when including `wasm-delegations-fields.def`, while adding definitions for `DELEGATE_GET_FIELD` in case it is missing.
* Inlining: Copy no-inline flags when copying a function (#6165)Alon Zakai2023-12-121-0/+3
| | | | Those fields should be copied together with all the rest of the metadata that already is. This was just missed in the prior PR.
* wasm-metadce all the things (#6142)Alon Zakai2023-11-301-0/+26
| | | | | | | | | | | | | | | Remove hardcoded paths for globals/functions/etc. in favor of general code paths that support all the module elements uniformly. As a result of that, we now support all parts of wasm, such as tables and element segments, that we didn't before. This refactoring is NFC aside from adding functionality. Note that this reduces the size of wasm-metadce by 10% while increasing its functionality - the benefits of writing generic code. To support this, add some trivial generic helpers to get or iterate over module elements using their kind in a dynamic manner. Using them might make wasm-metadce slightly slower, but I can't measure any difference.
* [NFC] Refactor out subtyping discovery code (#6106)Alon Zakai2023-11-151-0/+338
| | | | | | | | | | This implements an idea I mentioned in the past, to extract the subtyping discovery code out of Unsubtyping so it could be reused elsewhere. Example possible uses: the validator could use to remove a lot of code, and also a future PR of mine will need it. Separately from those, I think this is a nice refactoring as it makes Unsubtyping much smaller. This just moves the code out and adds some C++ template elbow grease as needed.
* [NFC] Add LocalLocation for future use (#6105)Alon Zakai2023-11-132-0/+27
| | | | | | This is not needed in GUFA as it tracks local values precisely (each set is connected to the gets that actually read from it), but in a future PR it will be useful to track local values per index (each set is connected to all gets for that index, i.e., each local index is a single "location").
* [NFC] Simplify LiteralUtils::canMakeZero (#6093)Alon Zakai2023-11-091-13/+1
|
* Implement table.copy (#6078)Alon Zakai2023-11-064-0/+10
| | | Helps #5951
* Fix unreachable code in LocalGraph by making it imprecise there (#6048)Alon Zakai2023-10-242-18/+18
| | | | | | | | | | Followup to #6046 - the fuzzer found we missed handling the case of the entry itself being unreachable, or of an unreachable loop later. Properly identifying unreachable code requires a flow analysis, unfortunately, so this PR gives up on that and instead allows LocalGraph to be imprecise in unreachable code. That avoids adding any overhead, but does mean the IR may be slightly confusing when debugging. It does not have any optimization downsides, however, as it only affects unreachable code. Also add a dump() impl in that file which helps debugging.
* [NFC] LocalGraph: Optimize params with no sets (#6046)Alon Zakai2023-10-241-1/+34
| | | | | | | | | | | If a local index has no sets, then all gets of that index read from the entry block (a param, or a zero for a local). This is actually a common case, where a param has no other set, and so it is worth optimizing, which this PR does by avoiding any flowing operation at all for that index: we just skip and write the entry block as the source of information for such gets. #6042 on precompute-propagate goes from 3 minutes to 2 seconds with this (!). But that testcase is rather special in that it is a huge function with many, many gets in it, so the overhead we remove is very noticeable there.
* [NFC] LocalGraph: Move definition to logical place (#6045)Alon Zakai2023-10-241-3/+5
| | | | | | | | | | | | allGets was declared in a scope that kept it alive for all blocks, and at the end of the loop we clear the gets for a particular block. That's clumsy, and makes a followup harder, so this PR moves it to the natural place for it. (That is, it moves it to the scope that handles a particular block, and removes the manual clearing-out of the get at the end of the loop iteration.) Optimizing compilers are smart enough to be efficient about stack allocations of objects inside loops anyhow (which I measured). Helps #6042.
* Reuse existing function types for blocks (#6022)Thomas Lively2023-10-181-48/+91
| | | | | | | | | | | | | | | | | | | | | Type annotations on multivalue blocks (and loops, ifs, and trys) are type indices that refer to function types in the type section. For these type annotations, the identities of the function types does not matter. As long as the referenced type has the correct parameters and results, it will be valid to use. Previously, when collecting module types, we always used the "default" function type for multivalue control flow, i.e. we used a final function type with no supertypes in a singleton rec group. However, in cases where the program already contains another function type with the expected signature, using the default type is unnecessary and bloats the type section. Update the type collecting code to reuse existing function types for multivalue control flow where possible rather than unconditionally adding the default function type. Similarly, update the binary writer to use the first heap type with the required signature when emitting annotations on multivalue control flow structures. To make this all testable, update the printer to print the type annotations as well, rather than just the result types. Since the parser was not able to parse those newly emitted type annotations, update the parser as well.
* Add getGeneralSuperType() that includes basic supers, and use in fuzzer (#6005)Alon Zakai2023-10-171-1/+0
| | | | With this, the fuzzer can replace e.g. an eq expression with a specific struct type, because now it is away that struct types have eq as their ancestor.
* [NFC] Rename getSuperType to getDeclaredSuperType (#6015)Alon Zakai2023-10-176-12/+12
| | | | A later PR will add getSuperType which will mean "get the general super type - either declared, or not".
* Fix a bug printing and emitting empty, passive element segments (#6002)Thomas Lively2023-10-091-7/+4
| | | | | | | | Empty, passive element segments were always emitted as having `func` type because all their elements trivially were RefFunc (because they have no elements) and because we were incorrectly checking table types if they existed instead of the element segment's type directly to see if it was non-func. Fix the bug by checking each element segment's type directly and add a test.
* RemoveUnusedBrs: Allow less unconditional work and in particular division ↵Alon Zakai2023-10-031-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | (#5989) Fixes #5983: The testcase from there is used here in a new testcase remove-unused-brs_levels in which we check if we are willing to unconditionally do a division operation. Turning an if with an arm that does a division into a select, which always does the division, is almost 5x slower, so we should probably be extremely careful about doing that. I took some measurements and have some suggestions for changes in this PR: * Raise the cost of div/rem to what I measure on my machine, which is 5x slower than an add, or worse. * For some reason we added the if arms rather than take the max of them, so fix that. This does not help the issue, but was confusing. * Adjust TooCostlyToRunUnconditionally in the pass from 9 to 8 (this helps balance the last point). * Use half that value when not optimizing for size. That is, we allow only 4 extra unconditional work normally, and 8 in -Os, and when -Oz then we allow any extra amount. Aside from the new testcases, some existing ones changed. They all appear to change in a reasonable way, to me. We should perhaps go even further than this, and not even run a division unconditionally in -Os, but I wasn't sure it makes sense to go that far as other benchmarks may be affected. For now, this makes the benchmark in #5983 run at full speed in -O3 or -Os, and it remains slow in -Oz. The modified version of the benchmark that only divides in the if (no other operations) is still fast in -O3, but it become slow in -Os as we do turn that if into a select (but again, I didn't want to go that far as to overfit on that one benchmark).
* [NFC] Refactor SupertypesFirst utility (#5979)Thomas Lively2023-09-271-4/+3
| | | | | | Move the topological sort from the constructor to a separate method. This CRTP utility calls into its subclass to query supertype relationships, but doing so in the base class constructor means that the subclass has not been fully initialized yet, which can cause problems.
* ConstantFieldPropagation: Fully handle copies (#5969)Alon Zakai2023-09-261-1/+8
| | | | | | | | | | | | | | | | If we see A->f0 = A->f0 then we might be copying fields not only between instances of A but also of any subtypes of A, and so if some subtype has value x then that x might now have reached any other subtype of A (even in a sibling type, so long as A is their parent). We already thought we were handling that, but the mechanism we used to do so (copying New info to Set info, and letting Set info propagate) was not enough. Also add a small constructor to save the work of computing subTypes again. Add TODOs for some cases that we could optimize regarding copies but do not, yet.
* Support i8/i16 mutable arrays as public types for string interop (#5814)Alon Zakai2023-09-211-0/+5
| | | | | Probably any array of non-reference data can be allowed to be public and sent out of the module, as it is just data. For now, however, just special case the i8 and i16 array types which are useful already for string interop.
* Add passes to finalize or unfinalize types (#5944)Alon Zakai2023-09-182-0/+11
| | | | | | | | | TypeFinalization finalizes all types that we can, that is, all private types that have no children. TypeUnFinalization unfinalizes (opens) all (private) types. These could be used by first opening all types, optimizing, and then finalizing, as that might find more opportunities. Fixes #5933
* Implement table.fill (#5949)Thomas Lively2023-09-184-0/+9
| | | | | | | | This instruction was standardized as part of the bulk memory proposal, but we never implemented it until now. Leave similar instructions like table.copy as future work. Fixes #5939.
* Replace I31New with RefI31 everywhere (#5930)Thomas Lively2023-09-136-6/+6
| | | | | | | | Globally replace the source string "I31New" with "RefI31" in preparation for renaming the instruction from "i31.new" to "ref.i31", as implemented in the spec in https://github.com/WebAssembly/gc/pull/422. This would be NFC, except that it also changes the string in the external-facing C APIs. A follow-up PR will make the corresponding behavioral change.
* Make final types the default (#5918)Thomas Lively2023-09-091-1/+1
| | | | | | | | | | | | | Match the spec and parse the shorthand binary and text formats as final and emit final types without supertypes using the shorthands as well. This is a potentially-breaking change, since the text and binary shorthands can no longer be used to define types that have subtypes. Also make TypeBuilder entries final by default to better match the spec and update the internal APIs to use the "open" terminology rather than "final" terminology. Future changes will update the text format to use the standard "sub open" rather than the current "sub final" keywords. The exception is the new wat parser, which supporst "sub open" as of this change, since it didn't support final types at all previously.
* Remove the GCNNLocals feature (#5080)Thomas Lively2023-08-311-10/+0
| | | | | Now that the WasmGC spec has settled on a way of validating non-nullable locals, we no longer need this experimental feature that allowed nonstandard uses of non-nullable locals.
* Validate and fix up tuples with non-nullable elements (#5909)Thomas Lively2023-08-302-24/+81
| | | | | | The code validating and fixing up non-nullable locals previously did not correctly handle tuples that contained non-nullable elements, which could have resulted in invalid modules going undetected. Update the code to handle tuples and add tests.
* Rename multimemory flag (#5890)Ashley Nelson2023-08-212-3/+3
| | | Renaming the multimemory flag in Binaryen to match its naming in LLVM.
* Improve cast optimizations (#5876)Thomas Lively2023-08-171-0/+68
| | | | | | | | | | | | Simplify the optimization of ref.cast and ref.test in OptimizeInstructions by moving the loop that examines fallthrough values one at a time out to a shared function in properties.h. Also simplify ref.cast optimization by analyzing the cast result in just one place. In addition to simplifying the code, also make the cast optimizations more powerful by analyzing the nullability and heap type of the cast value independently, resulting in a potentially more precise analysis of the cast behavior. Also improve optimization power by considering fallthrough values when optimizing the SuccessOnlyIfNonNull case.
* Remove legacy WasmGC instructions (#5861)Thomas Lively2023-08-091-3/+3
| | | | | Remove old, experimental instructions and type encodings that will not be shipped as part of WasmGC. Updating the encodings and text format to match the final spec is left as future work.
* LinearExecutionWalker: Optionally connect blocks for Br and BrOn (#5869)Alon Zakai2023-08-091-4/+13
| | | | | | | | | | | | | | | | | | | Br and BrOn can consider the code before and after them connected if it might be reached (which is the case if the Br has a condition, which BrOn always has). The wasm2js changes may look a little odd as some of them have this: i64toi32_i32$1 = i64toi32_i32$2; i64toi32_i32$1 = i64toi32_i32$2; I looked into that and the reason is that those outputs are not optimized, and also even in unoptimized wasm2js we do run simplify-locals once (to try to reduce the downsides of flatten). As a result, this PR makes a difference there, and that difference can lead to such odd duplicated code after other operations. However, there are no changes to optimized wasm2js outputs, so there is no actual problem. Followup to #5860.
* LinearExecutionTraversal: Add an option to connect adjacent code, use in ↵Alon Zakai2023-08-081-5/+42
| | | | | | | | | | | SimplifyLocals (#5860) This addresses most of the minor regression from the correctness fix in #5857. That PR makes us consider calls as branching, but in some cases it is ok to ignore that branching (see the comment in the code here), which this PR allows as an option. This undoes one test change from that PR, showing it undoes the regression for SimplifyLocals. More tests are added to cover this specifically as well.
* Fix LinearExecutionWalker on calls (#5857)Alon Zakai2023-08-071-3/+22
| | | | | | | | | | | | | | Calls were simply not handled there, so we could think we were still in the same basic block when we were not, affecting various passes (but somehow this went unnoticed until the TNHOracle #5850 ran on some particular Java code). One existing test was affected, and two new tests are added: one for TNHOracle where I detected this, and one in OptimizeCasts which is perhaps a simpler way to see the problem. All the cases but the TNH one, however, do not need this fix for correctness since they actually don't care if a call would throw. As a TODO, we should find a way to undo this minor regression. The regression only affects builds with EH enabled, though, so most users should be unaffected even in the interm.
* [NFC] Move ModuleUtils copying and renaming logic from header to cpp (#5855)Alon Zakai2023-08-024-188/+220
| | | | | | | | None of that code is speed-sensitive, or at least doesn't need to be inlined to be fast. Move it to cpp for faster compile times. This caused a cascade of necessary header fixes (i.e. after removing unneeded header inclusions in module-utils.h, files that improperly depended on that stopped working and needed an added include).
* [NFC] Remove some unneeded cruft from PossibleContents::intersect() (#5854)Alon Zakai2023-08-021-25/+4
| | | | | | | | In order of appearance in this diff: * Use heapType rather than otherHeapType when either will do, for brevity. * We checked isNull after checking for isLiteral (line 173), but every null is a literal. * Calling setNoneOrNull simplifies some repetitive code. * We checked isLiteral yet again lower down.
* Fix a fuzz bug in TypeMapper (#5851)Thomas Lively2023-08-022-8/+29
| | | | | | | | | | | | | | | | | | TypeMapper is a utility used to globally rewrite types, mapping some eliminated source types into destination types they should be replaced with. This was previously done by first rewriting all the types in the IR according to the given mapping, then rewriting the type definitions and updating all the types in the IR again. Not only was doing the rewriting twice inefficient, it also introduced a subtle bug where the set of private types eligible to be rewritten could be inconsistent because updating types in the IR could change the types of control flow structures. The fuzzer found a case where this inconsistency caused the type rebuilding to fail. Fix the bug by first building the new types with the mapping applied and only then rewriting the IR a single time. Also add a `TypeBuilder::dump` utility for use in debugging. Fixes #5845.
* GUFA: Infer using TrapsNeverHappen (#5850)Alon Zakai2023-08-022-12/+555
| | | | | | | | | | | | | | | | | | | | | | | | This adds a TrapsNeverHappen oracle that is used inside the main PossibleContents oracle of GUFA. The idea is that when traps never happen we can reason "backwards" from information to things that must be true before it: temp = x.field; x.cast_to<Y>(); // Y is a subtype of x's type X Here we cast x to a subtype. If we assume traps never happen then the cast must succeed, and that means we can assume we had a Y on the previous line, where perhaps that information lets us infer the value of x.field. This PR focuses on calls, which are the more interesting situation to optimize because other passes do some work already inside functions. Specifically, we look for things that will trap in the called function or the caller, such as if the called function always casts a param to some type, we can assume the caller passes such a type in. And if we have a call_ref then any target that would trap cannot be called (at least in a closed world). This has some benefits, in particular when combined with --gufa-cast-all since that casts more things, which lets us apply the inferences made here. I see 3.3% fewer call_ref instructions on a Kotlin testcase, for example. This helps more on -Os when we inline less.
* PossibleContents: Support more intersection types (#5847)Alon Zakai2023-07-312-29/+68
|
* CFGWalker: Allow users to ignore branches outside the function [NFC] (#5838)Alon Zakai2023-07-261-0/+4
| | | | | | | | | A pass that just operates on locals, for example, does not care about branches outside of the function. That means that when we see a call, then even if EH is enabled we don't need to create a new basic block right after it (unless the call is inside a try-catch - then it might branch to the catch, of course). This makes CFG-using passes 9% faster compared to before this PR. This plus #5827 offset the slowdown from #5823 and overall give an improvement compared to before.
* End the current basic block on a Call (#5823)Alon Zakai2023-07-263-6/+12
| | | | | | | | | | | | | Before this PR, if a call had no paths to a catch in the same function then we skipped creating a new basic block right after it. As a result, we could have a call in the middle of a basic block. If EH is enabled that means we might transfer control flow out of the function from the middle of a block. But it is better to have the property that any transfer of control flow - to another basic block, or outside of the function - can only happen at the end of a basic block. This causes some overhead, but a subsequent PR (#5838) will remove that as a followup, and this PR adds a little code to pass the module and check if EH is enabled, and avoid the overhead if not, which at least avoids regressing the non-EH case until that followup lands.