path: root/test/lit/passes
Commit log (most recent first). Entry format: commit message (#PR) [Author, Date, Files changed, Lines -/+]

* Cost analysis: Remove "Unacceptable" hack (#6782) [Alon Zakai, 2024-07-25, 1 file, -7/+136]
  We marked various expressions as having cost "Unacceptable", fixed at 100, to
  ensure we never moved them out from an If arm, etc. Giving them such a high
  cost avoids that problem - the cost is higher than the limit we have for
  moving code from conditional to unconditional execution - but it also means
  the total cost is unrealistic. For example, a function with one such
  instruction + an add (cost 1) would end up with cost 101, and removing the
  add would look insignificant, which causes issues for things that want to
  compare costs (like Monomorphization).

  To fix this, adjust some costs. The main change here is to give casts a cost
  of 5. I measured this in depth, see the attached benchmark scripts, and it
  looks clear that in both V8 and SpiderMonkey the cost of a cast is high
  enough to make it not worth turning an if with a ref.test arm into a select
  (which would always execute the test).

  The other costs adjusted here matter a lot less, because they are on
  operations that have side effects, so the optimizer will not move them from
  conditional to unconditional execution anyhow, but I tried to make them a
  bit more realistic while removing "Unacceptable":

  * Give most atomic operations the 10 cost we've been using for atomic
    loads/stores. Perhaps wait and notify should be slower, but it seems like
    assuming fast switching might be more relevant.
  * Give growth operations a cost of 20, and throw operations a cost of 10.
    These numbers are entirely made up, as I am not even sure how to measure
    them in a useful way (but, again, this should not matter much as they
    have side effects).

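  As a rough sketch of the shape in question (illustrative only, not from the
  PR; $T, $cond, and $ref are made-up names), turning this if into a select
  would execute the ref.test unconditionally, which the new cast cost of 5 is
  meant to discourage:

    ;; Original: the ref.test runs only when $cond is true.
    (if (result i32)
      (local.get $cond)
      (then
        (ref.test (ref $T) (local.get $ref))
      )
      (else
        (i32.const 0)
      )
    )

    ;; As a select, the ref.test runs on both paths, which is only worthwhile
    ;; if the test is cheap:
    (select
      (ref.test (ref $T) (local.get $ref))
      (i32.const 0)
      (local.get $cond)
    )
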
* TupleOptimization: Properly handle subtyping in copies (#6786) [Alon Zakai, 2024-07-25, 1 file, -2/+40]
  We used the target's type for the read from the source, but due to subtyping
  those might be different. Found by the fuzzer.

* [threads] Calculate shared heap type depths in subtypes.h (#6777) [Thomas Lively, 2024-07-23, 1 file, -0/+29]
  Fixes #6776.

* Heap2Local: Properly handle failing array casts (#6772) [Alon Zakai, 2024-07-18, 1 file, -0/+125]
  Followup to #6727, which added support for failing casts in Struct2Local;
  it turns out that Array2Struct changes were required as well. Specifically,
  when we turn an array into a struct, casts can look like they behave
  differently (what used to be an array input becomes a struct), so, as with
  RefTest (which we already handled), check whether the cast succeeds in the
  original form and handle that.

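  A sketch of the kind of pattern involved (hypothetical types, not from the
  PR; $arr and $other are made-up, unrelated array types): a non-escaping
  array allocation flows into a cast that fails, and that failure must be
  preserved after Array2Struct replaces the allocation:

    (local $a (ref null $arr))
    (local.set $a
      (array.new_fixed $arr 1
        (i32.const 0)
      )
    )
    ;; $other is a different array type, so this cast fails on the original
    ;; array, and Heap2Local must preserve that failure:
    (drop
      (ref.cast (ref $other)
        (local.get $a)
      )
    )
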
* Monomorphization: Add a limit on the number of parameters (#6774) [Alon Zakai, 2024-07-18, 1 file, -0/+400]

* Monomorphize all the things (#6760) [Alon Zakai, 2024-07-18, 5 files, -305/+2071]
  Previously call operands were monomorphized (considered as part of the call
  context, so we can create a specialized function with those operands fixed)
  if they were constant or had a different type than the function parameter's
  type. This generalizes that to pull in pretty much all the code we possibly
  can, including nested code. For example:

    (call $foo
      (struct.new $struct
        (i32.const 10)
        (local.get $x)
        (local.get $y)
      )
    )

  This can turn into

    (call $foo_mono
      (local.get $x)
      (local.get $y)
    )

  The struct.new and even one of the struct.new's children are moved into the
  called function, replacing the original ref argument with two other ones.
  If the original called function was this:

    (func $foo (param $ref (ref ..))
      ..
    )

  then the monomorphized function looks like this:

    (func $foo_mono (param $x i32) (param $y i32)
      (local $ref (ref ..))
      (local.set $ref
        (struct.new $struct
          (i32.const 10)
          (local.get $x)
          (local.get $y)
        )
      )
      ..
    )

  The struct.new and its constant child appear here, and we read the
  parameters.

  To do this, generalize the code that creates the call context to accept
  everything except what is impossible to copy (like a local.get) or what
  would be tricky and likely unworthwhile (like another call or a tuple).
  Also check for effect interactions, as this code motion does some
  reordering.

  For this to work, we need to adjust how we compute the costs we compare
  when deciding what to monomorphize. Before, we just compared the called
  function to the monomorphized called function, which was good enough when
  the call context only contained consts, but now it can contain arbitrarily
  nested code. The proper comparison is between these two:

  * Old function + call context
  * New monomorphized function

  Including the call context makes this a fair comparison. In the example
  above, the struct.new and the i32.const are part of the call context, and
  so they are in the monomorphized function, so if we didn't count them in
  the other function we'd decide not to optimize anything with a large
  context.

  The new functionality is tested in a new file. A few parts of existing
  tests needed changes to not become pointless after this improvement, namely
  by replacing stuff that we now optimize with things that we don't, like
  replacing an i32.eqz with a local.get. There are also a handful of test
  outcomes that change in CAREFUL mode due to the new cost analysis.

* Make it possible to skip several passes (#6714) [Jérôme Vouillon, 2024-07-17, 1 file, -0/+65]
  --skip-pass can now be specified more than once on the commandline.

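  For example (a hypothetical invocation; vacuum and precompute are existing
  pass names, but the exact flag spelling here is an assumption):

    wasm-opt -O3 --skip-pass=vacuum --skip-pass=precompute in.wasm -o out.wasm
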
* [threads] Update TypeSSA for shared types (#6753) [Thomas Lively, 2024-07-16, 1 file, -0/+72]
  When creating a new subtype, make sure to copy the supertype's
  shareability.

* Remove extra space printed in empty structs (#6750) [Thomas Lively, 2024-07-16, 38 files, -308/+308]
  When we switched to the new type printing machinery, we inserted this extra
  space to minimize the diff in the test output compared with the previous
  type printer. Improve the quality of the printed output by removing it.

* Monomorphize dropped functions (#6734) [Alon Zakai, 2024-07-12, 1 file, -0/+1191]
  We now consider a drop to be part of the call context. If we see

    (drop
      (call $foo)
    )

    (func $foo (result i32)
      (i32.const 42)
    )

  then we'd monomorphize to this:

    (call $foo_1) ;; call the specialized function instead

    (func $foo_1 ;; the specialized function returns nothing
      (drop ;; the drop was moved into here
        (i32.const 42)
      )
    )

  With the drop now in the called function, we may be able to optimize out
  unused work.

  Refactor a bit of code out of DAE that we can reuse here, into a new
  return-utils.h.

* Remove non-standard `i31.new` (#6736) [Thomas Lively, 2024-07-12, 3 files, -44/+44]
  The standard name for the instruction is `ref.i31`. Remove support for the
  non-standard name and update tests that were still using it.

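  That is:

    (ref.i31 (i32.const 5))   ;; standard name, accepted
    (i31.new (i32.const 5))   ;; non-standard name, no longer accepted
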
* Do not abbreviate items in element segments (#6737) [Thomas Lively, 2024-07-12, 4 files, -7/+7]
  The full syntax for an expression in an element segment looks like
  `(item (ref.null none))`, but we have been printing the abbreviated
  version, which omits the `(item ...)`. This abbreviation is only valid when
  the item has a single instruction, so it is not always correct to use it.
  Rather than determining whether or not to use the abbreviation on a
  case-by-case basis, always print the full syntax.

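  A sketch of the two forms side by side (a made-up segment; $e and $f are
  hypothetical names):

    (elem $e (i32.const 0) funcref
      (item (ref.null func))  ;; full syntax, always valid
      (ref.func $f)           ;; abbreviation, valid only for a single instruction
    )
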
* Memory64Lowering: Handle -1 return value from memory.grow (#6733) [Sam Clegg, 2024-07-11, 1 file, -4/+18]
  This edge case makes the lowering a little more tricky.

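  One plausible shape of the issue (a sketch under stated assumptions, not
  the PR's exact code): when a 64-bit memory.grow is lowered to 32 bits, the
  i32 result must be sign-extended so that the -1 failure value stays -1 as
  an i64, rather than becoming 0xffffffff:

    (i64.extend_i32_s          ;; sign-extend, so -1 (failure) stays -1
      (memory.grow
        (i32.wrap_i64 (local.get $delta))
      )
    )
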
* Convert memory64 lowering test to lit. NFC (#6731) [Sam Clegg, 2024-07-11, 1 file, -0/+283]
  Test was converted using port_passes_tests_to_lit.py.

* Monomorphize: Use -O3 over -O1 + tweaks (#6732) [Alon Zakai, 2024-07-11, 2 files, -21/+24]
  Eventually we will need to do some tuning of compile time speed, but for
  now it is going to be simpler to do all the opts, in particular because it
  makes writing tests simpler.

* [WasmGC] Heap2Local: Optimize RefCast failures (#6727) [Alon Zakai, 2024-07-11, 1 file, -15/+89]
  Previously we just did not optimize cases where our escape analysis showed
  an allocation flowed into a cast that failed. However, after inlining there
  can be real-world cases where that happens, even in traps-never-happen mode
  (if the cast is behind a conditional branch), so it seems worth optimizing.

* Heap2Local: Drop RefEq's two arms (#6729) [Alon Zakai, 2024-07-11, 1 file, -81/+76]
  This is a tiny bit more code, but it is more consistent with other
  operations, and it saves work later.

* Monomorphization: Optimize constants (#6711) [Alon Zakai, 2024-07-11, 4 files, -41/+812]
  Previously the pass would monomorphize a call when we were sending more
  refined types than the target expects. This generalizes the pass to also
  consider the case where we send a constant in a parameter.

  To achieve that, this refactors the pass to explicitly define the "call
  context", which is the code around the call (inputs and outputs) that may
  end up leading to optimization opportunities when combined with the target
  function. Also add comments about the overall design + roadmap.

  The existing test is mostly unmodified, and the diff there is smaller when
  ignoring whitespace. We do "regress" those tests by adding more local.set
  operations: the refactoring that makes things a lot simpler handles the
  general case of an operand that either has a refined type or is a constant
  by copying it inside the function, which works either way. This
  "regression" is only in the testing version of the pass (the normal version
  runs optimizations, which would remove that extra code).

  This also enables the pass when GC is disabled. Previously we only handled
  refined types, so only GC could benefit. Add a test for MVP content
  specifically to show we operate there as well.

* [WasmGC] Heap2Local: Optimize RefIs and RefTest (#6705) [Alon Zakai, 2024-07-11, 1 file, -13/+376]

* [StackIR] Allow StackIR to be disabled from the commandline (#6725) [Alon Zakai, 2024-07-10, 1 file, -0/+66]
  Normally we use it when optimizing (above a certain level). This lets the
  user prevent it from being used even then.

  Also add optimization options to wasm-metadce so that this is possible
  there as well and not just in wasm-opt (this also opens the door to running
  more passes in metadce, which may be useful later).

* StackIR: Optimize away a drop before an unreachable (#6719) [Alon Zakai, 2024-07-08, 1 file, -0/+118]
  Anything else right before an unreachable is removed by the main DCE pass
  anyhow, but because of the structured form of BinaryenIR we can't remove a
  drop. That is, this is the difference between

    (i32.eqz
      (i32.const 42)
    )
    (unreachable)

  and

    (drop
      (call $foo)
    )
    (unreachable)

  In both cases the unreachable is preceded by something we don't need, but
  in the latter case it must remain in BinaryenIR for validation. To optimize
  this, add a rule in StackIR.

  Fixes #6715

* Rename external conversion instructions (#6716) [Jérôme Vouillon, 2024-07-08, 5 files, -36/+36]
  Rename the instructions `extern.internalize` to `any.convert_extern` and
  `extern.externalize` to `extern.convert_any`, to follow the spec more
  closely. This was changed in https://github.com/WebAssembly/gc/issues/432.
  The legacy names are still accepted in text inputs and in the C and JS
  APIs.

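  That is:

    (any.convert_extern (local.get $x))   ;; was extern.internalize
    (extern.convert_any (local.get $y))   ;; was extern.externalize
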
* [DebugInfo] Add debug info to the values emitted in GlobalStructInference (#6709) [Alon Zakai, 2024-07-02, 1 file, -0/+73]
  Previously the replacement select got the debug info, but we should also
  copy it to the values, as optimizations often lead to one of those values
  remaining by itself. Similar to #6652 in general form.

* ConstantFieldPropagation: Add a variation that picks between 2 values using RefTest (#6692) [Alon Zakai, 2024-06-27, 2 files, -0/+1462]
  CFP focuses on finding when a field always contains a constant, and then
  replaces a struct.get with that constant. If we find there are two constant
  values, then in some cases we can still optimize, if we have a way to pick
  between them. All we have is the struct.get and its reference, so we must
  use a ref.test:

    (struct.get $T x (..ref..))
    =>
    (select
      (..constant1..)
      (..constant2..)
      (ref.test $U (..ref..))
    )

  This is valid if, of all the subtypes of $T, those that pass the test have
  constant1 in that field, and those that fail the test have constant2. For
  example, a simple case is where $T has two subtypes, $T is never created
  itself, and each of the two subtypes has a different constant value.

  This is a somewhat risky operation, as ref.test is not necessarily cheap.
  To mitigate that, this is a new pass, --cfp-reftest, that is not run by
  default, and also we only optimize when we can use a ref.test on what we
  think will be a final type (because ref.test on a final type can be faster
  in VMs).

* [WasmGC] Heap2Local: Optimize RefEq (#6703) [Alon Zakai, 2024-06-26, 1 file, -8/+310]
  If an allocation does not escape, then we can compute ref.eq for it: when
  compared to itself the result is 1, and when compared to anything else it
  is 0 (since it did not escape, anything else must be different).

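  For example (a sketch; $x holds the non-escaping allocation and $y is any
  other reference):

    (ref.eq (local.get $x) (local.get $x))  ;; => (i32.const 1)
    (ref.eq (local.get $x) (local.get $y))  ;; => (i32.const 0)
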
* [threads] Validate shared-to-unshared edges in heap types (#6698) [Thomas Lively, 2024-06-25, 1 file, -4/+4]
  Add spec tests checking validation for structs and arrays.

* [WasmGC] Add missing ArrayNew variants to Properties::isGenerative (#6691) [Alon Zakai, 2024-06-24, 1 file, -5/+87]
  Fixes #6690.

* Add TraceCalls pass (#6619) [Marcin Kolny, 2024-06-21, 2 files, -0/+165]
  This pass receives a list of functions to trace, and then wraps them in
  calls to imports. This can be useful for tracing malloc/free calls, for
  example, but is generic.

  Fixes #6548

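  One plausible shape of such a wrapper (a sketch only; the import name,
  signature, and wrapper name here are made up, not the pass's actual
  interface):

    (import "env" "trace_malloc" (func $trace_malloc (param i32)))

    (func $malloc_traced (param $size i32) (result i32)
      (call $trace_malloc (local.get $size)) ;; notify the tracing import
      (call $malloc (local.get $size))       ;; then do the real work
    )
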
* GlobalStructInference: Un-nest struct.news in globals when that is helpful (#6688) [Alon Zakai, 2024-06-20, 1 file, -5/+412]
  If we have

    (global $g (struct.new $S
      (i32.const 1)
      (struct.new $T ..)
      (ref.func $f)
    ))

  then before this PR, if we wanted to read the middle field we'd stop, as
  it is non-constant. However, we can un-nest it, making it constant:

    (global $g.unnested (struct.new $T ..))

    (global $g (struct.new $S
      (i32.const 1)
      (global.get $g.unnested)
      (ref.func $f)
    ))

  Now the field is a global.get of an immutable global, which is constant.
  Using this technique we can handle anything in a struct field, constant or
  not. The cost of adding a global is likely offset by the benefit of being
  able to refer to it directly, as that opens up more opportunities later.

  Concretely, this replaces the constant values we look for in GSI with a
  variant over constants or expressions (we do still want to group constants,
  as multiple globals with the same constant field can be treated as a
  whole). And we note cases where we need to un-nest, and handle those at the
  end.

* [threads] Shared basic heap types (#6667) [Thomas Lively, 2024-06-19, 1 file, -0/+22]
  Implement binary and text parsing and printing of shared basic heap types
  and incorporate them into the type hierarchy. To avoid the massive amount
  of code duplication that would be necessary if we were to add separate enum
  variants for each of the shared basic heap types, use bit 0 to indicate
  whether the type is shared and replace `getBasic()` with
  `getBasic(Unshared)`, which clears that bit. Update all the use sites to
  record whether the original type was shared and produce shared or unshared
  output without code duplication.

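  In the text format a shared basic heap type looks like, for example (per
  the shared-everything proposal's syntax):

    (ref null (shared any))
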
* GlobalStructInference: Optimize globals too (#6674) [Alon Zakai, 2024-06-17, 1 file, -0/+51]
  This is achieved by simply replacing the Literal with
  PossibleConstantValues, which supports both Literals and Globals.

* [Parser] Update requirements for implicit type uses (#6665) [Thomas Lively, 2024-06-14, 2 files, -74/+76]
  As an abbreviation, a `typeuse` can be given as just a list of parameters
  and results, in which case it corresponds to the index of the first
  function type with the same parameters and results. That function type must
  also be an MVP function type, i.e. it cannot have a nontrivial rec group,
  be non-final, or have a declared supertype. The parser did not previously
  implement all of these rules.

* [threads] Binary reading and writing of shared composite types (#6664) [Thomas Lively, 2024-06-14, 1 file, -3/+6]
  Also update the parser so that implicit type uses are not matched with
  shared function types.

* [threads] Parse, build, and print shared composite types (#6654) [Thomas Lively, 2024-06-12, 1 file, -0/+74]
  Parse the text format for shared composite types as described in the
  shared-everything thread proposal. Update the parser to use 'comptype'
  instead of 'strtype' to match the final GC spec, and add the new syntactic
  class 'sharecomptype'.

  Update the type canonicalization logic to take sharedness into account to
  avoid merging shared and unshared types. Make the same change in the
  TypeMerging pass. Ensure that shared and unshared types cannot be in a
  subtype relationship with each other.

  Follow-up PRs will add shared abstract heap types, binary parsing and
  emitting for shared types, and fuzzer support for shared types.

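  For example, a shared struct type wraps the composite type in the
  shareability modifier (a sketch following the proposal's syntax; $t is a
  made-up name):

    (type $t (shared (struct (field i32))))
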
* [DebugInfo] Copy debug info in call-utils.h (#6652) [Alon Zakai, 2024-06-12, 1 file, -2/+23]
  We automatically copy debuginfo in replaceCurrent(), but there are a few
  places that do other operations than simple replacements. call-utils.h will
  turn a call_ref with a select target into two direct calls, and we were
  missing the logic to copy debuginfo from the call_ref to the calls.

  To make this work, refactor the copying logic out of wasm-traversal and
  into debuginfo.h, and use it in call-utils.h. debuginfo.h itself is renamed
  from debug.h (as now this needs to be included from wasm-traversal, which
  nearly everything does, and it turns out some files have internal stuff
  like a debug() helper that ends up conflicting with the old debug
  namespace).

  Also rename the old copyDebugInfo function to
  copyDebugInfoBetweenFunctions, which is more explicit. That is also moved
  from the header to a cpp file because it depends on wasm-traversal (so we'd
  end up with recursive headers otherwise). That is fine, as that method is
  called after copying a function, which is not that frequent. The new
  copyDebugInfoToReplacement (which was refactored out of wasm-traversal) is
  in the header because it can be called very frequently (every single
  instruction we optimize) and we want it to get inlined.

* [Strings] Keep public and private types separate in StringLowering (#6642) [Alon Zakai, 2024-06-10, 2 files, -28/+102]
  We need StringLowering to modify even public types, as it must replace
  every single stringref with externref, even if that modifies the ABI. To
  achieve that we told it that all string-using types were private, which let
  TypeUpdater update them, but the problem is that it moves all private types
  to a new single rec group, which meant public and private types ended up in
  the same group. As a result, a single public type would make it all public,
  preventing optimizations and breaking things as in #6630 and #6640.

  Ideally TypeUpdater would modify public types while keeping them in the
  same rec groups, but this may be a very specific issue for StringLowering,
  and that might be a lot of work. Instead, just make StringLowering handle
  public types of functions in a manual way, which is simple and should
  handle all cases that matter in practice, at least in J2Wasm.

* Effects: Add missing combining logic for MayNotReturn (#6635) [Alon Zakai, 2024-06-03, 1 file, -81/+295]
  Without that logic we could end up dropping that particular effect. This
  actually made a test pass when it should not: the modified test here has a
  function with effects that are ok to remove, but it had a loop, which adds
  MayNotReturn, which we should actually not remove, so it was removed
  erroneously. To fix the test, add other effects there (local ones) that we
  can see are removable.

  Also add a function with a loop to test that we do not remove an infinite
  loop, which adds coverage for the fix here.

* Optimize ReorderGlobals ordering with a new algorithm (#6625) [Alon Zakai, 2024-05-31, 3 files, -29/+1437]
  The old ordering in that pass did a topological sort while sorting by uses
  both within topological groups and between them. That could be suboptimal
  in some cases, however, and actually on J2CL output this pass made the
  binary larger, which is how we noticed this.

  The problem is that such a topological sort keeps topological groups in
  place, but it can be useful to interleave them sometimes. Imagine this:

      $c - $a
     /
    $e
     \
      $d - $b

  Here $e depends on $c, etc. The optimal order may interleave the two arms
  here, e.g. $a, $b, $d, $c, $e. That is because the dependencies define a
  partial order, and so the arms here are actually independent. Sorting by
  topological depth first might help in some cases, but is also not optimal
  in general, as we may want to mix topological depths: $a, $c, $b, $d, $e
  does so, and it may be the best ordering.

  This PR implements a natural greedy algorithm that picks the global with
  the highest use count at each step, out of the set of possible globals,
  which is the set of globals that have no unresolved dependencies. So we
  start by picking the first global with no dependencies and add it at the
  front; then that unlocks anything that depended on it and we pick from that
  set, and so forth.

  This may also not be optimal, but it is easy to make it more flexible by
  customizing the counts, and we consider 4 sorts here:

  * Set all counts to 0. This means we only take into account dependencies,
    and we break ties by the original order, so this is as close to the
    original order as we can be.
  * Use the actual use counts. This is the simple greedy algorithm.
  * Set the count of each global to also contain the counts of its children,
    so the count is the total that might be unlocked. This gives more weight
    to globals that can unlock more later, so it is less greedy.
  * Like the last, but weight children's counts lower in an exponential way,
    which makes sense as they may depend on other globals too.

  In practice it is simple to generate cases where 1, 2, or 3 is optimal (see
  new tests), but on real-world J2CL I see that 4 (with a particular
  exponential coefficient) is best, so the pass computes all 4 and picks the
  best. As a result it will never worsen the size, and it has a good chance
  of improving.

  The differences between these are small, so in theory we could pick any of
  them, but given they are all modifications of a single algorithm it is very
  easy to compute them all with little code complexity. The benefits are
  rather small here, but this can save a few hundred bytes on a multi-MB Java
  file. This comes at a tiny compile time cost, but seems worth it for the
  new guarantee to never regress size.

* LogExecution: Optionally take a module name for the logger function (#6629) [YAMAMOTO Takashi, 2024-05-31, 1 file, -0/+24]
  --log-execution=NAME will use NAME as the module for the logger function
  import, rather than infer it. If the name is not provided (--log-execution,
  as before this PR) then we will try to automatically decide which to use
  ("env", unless we see another module name is used, which can be the case in
  optimized modules).

* Avoid duplicate type names (#6633) [Alon Zakai, 2024-05-30, 1 file, -1/+2]
  If we replace a type with another, use the original name for the new type,
  and give the old one a unique name (for the rare cases in which it has
  uses).

* [NFC] Port LogExecution test to lit (#6634) [Alon Zakai, 2024-05-30, 1 file, -0/+146]

* Fix Vacuuming of code leading up to an infinite loop (#6632) [Alon Zakai, 2024-05-29, 1 file, -22/+41]
  We had that logic right in other places, but the specific part of Vacuum
  that looks at code leading up to an unreachable did not check for infinite
  loops, so it could remove them.

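  A sketch of the pattern that was mishandled (illustrative only, not from
  the PR's tests): the loop below never exits, so it must not be vacuumed
  away even though an unreachable follows it:

    (loop $l
      (br $l)     ;; spins forever; removing it would change behavior
    )
    (unreachable)
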
* SignaturePruning: Properly handle public types (#6630) [Alon Zakai, 2024-05-29, 1 file, -0/+49]
  The SignaturePruning pass optimizes away parameters that it proves are safe
  to remove. It turns out that that does not always match the definition of
  private types, which is more restrictive. Specifically, if say all the
  types are in one big rec group and one of them is used on an exported
  function, then all of them are considered public (as the rec group is).
  However, in closed world, it would be ok to leave that rec group unchanged
  but to create a pruned version of that type and use it, in cases where we
  see it is safe to remove a parameter. (See the testcase for a concrete
  example.)

  To put it another way, SignaturePruning already proves that a parameter is
  safe to remove in all the ways that matter. Before this PR, however, the
  testcase in this PR would error - so this PR is not an optimization but a
  bugfix, really - because SignaturePruning would see that a parameter is
  safe to remove, but then TypeUpdating would see the type is public and so
  it would leave it alone, leading to a broken module. This situation is in
  fact not that rare, and happens on real-world Java code. The reason we did
  not notice it before is that typically there are no remaining
  SignaturePruning opportunities late in the process (when other closed world
  optimizations have typically led to a single big rec group).

  The concrete fix here is to add additionalPrivateTypes to a few more places
  in TypeUpdating. We already supported that for cases where a pass knew
  better than the general logic what can be modified, and this adds that
  ability to the signature-rewriting logic there. Then SignaturePruning can
  send in all the types it has proven are safe to modify.

  * Also necessary here is to only add from additionalPrivateTypes if the
    type is not already in our list (or we'd end up with duplicates in the
    final rec group).
  * Also move newSignatures in SignaturePruning out of the top level, which
    was confusing (the pass has multiple iterations, and we want each to have
    a fresh instance).

* Remove obsolete parser code (#6607) [Thomas Lively, 2024-05-29, 1 file, -1/+0]
  Remove `SExpressionParser`, `SExpressionWasmBuilder`, and `cashew::Parser`.
  Simplify gen-s-parser.py. Remove the --new-wat-parser and
  --deprecated-wat-parser flags.

* OptimizeInstructions: Push StructNew down to help it fold away StructSets (#6584) [Roberto Lublinerman, 2024-05-28, 1 file, -12/+54]
  Heap stores (struct.set) are optimized into the struct.new when they are
  adjacent in a statement list. Pushing struct.new down past irrelevant
  instructions increases the likelihood that it ends up adjacent to sets.

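  A sketch of the idea (hypothetical code, not from the PR; $T is assumed to
  have a single mutable i32 field):

    ;; Before: the intervening local.get keeps the new and the set apart.
    (local.set $s (struct.new_default $T))
    (drop (local.get $x))
    (struct.set $T 0 (local.get $s) (i32.const 1))

    ;; After pushing the struct.new down, the set can fold into it:
    (drop (local.get $x))
    (local.set $s
      (struct.new $T
        (i32.const 1)
      )
    )
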
* [EH] Rename old EH tests from -old to -legacy (#6627) [Heejin Ahn, 2024-05-28, 17 files, -0/+0]
  This renames old EH tests of the form `-eh-old.wast` to `-eh-legacy.wast`,
  for clearer names.

* SimplifyGlobals: Do not switch a get to use a global of another type (#6605) [Alon Zakai, 2024-05-20, 1 file, -0/+24]
  If we wanted to switch types in such cases we'd need to refinalize (which
  is likely worth doing, though other passes should refine globals anyhow).

* Fix generate-dyncalls and directize passes under table64 (#6604) [Sam Clegg, 2024-05-18, 2 files, -0/+85]

* Fix GlobalRefining's handling of gets in module code and add missing validation (#6603) [Alon Zakai, 2024-05-17, 1 file, -0/+28]
  GlobalRefining did not traverse module code, so it did not update
  global.gets in other globals. Add missing validation that actually errors
  on that: we did not check global.get types. These could be separate PRs,
  but it would be difficult to test them separately.

* [Table64Lowering] Don't assume that all segments are from 64-bit tables (#6599) [Sam Clegg, 2024-05-16, 1 file, -11/+17]
  This allows modules to contain both 32-bit and 64-bit segments. In order to
  check the table/memory state when visiting segments, we need to ensure that
  memories/tables are visited only after their segments. The comments in
  visitTable/visitMemory already assumed this, but it wasn't true in
  practice.