path: root/test/lit/passes
Commit message | Author | Age | Files | Lines
* Compute full transitive closure in GlobalEffects (#5992)Alon Zakai2023-10-062-4/+230
|
* [typed-cont] Allow result types on tags (#5997)Frank Emrich2023-10-056-8/+8
| This PR is part of a series that adds basic support for the typed continuations proposal.
| This PR relaxes the restriction that tags must not have results, only params. Tags with results must not be used for exception handling and are only allowed if the typed continuations feature is enabled.
| As a minor point, this PR also changes the printing of tags without params: To make the presentation consistent, (param) is omitted when printing a tag.
* Updating asyncify_optimize-level=1 test (#5993)Ashley Nelson2023-10-041-19/+19
| | | This test is failing on main, looks like the update to the test was overwritten when commits merged. Fixing with the result of running update_lit_test.py
* RemoveUnusedBrs: Allow less unconditional work and in particular division (#5989)Alon Zakai2023-10-033-19/+142
| Fixes #5983: The testcase from there is used here in a new testcase remove-unused-brs_levels in which we check if we are willing to unconditionally do a division operation. Turning an if with an arm that does a division into a select, which always does the division, is almost 5x slower, so we should probably be extremely careful about doing that.
| I took some measurements and have some suggestions for changes in this PR:
| * Raise the cost of div/rem to what I measure on my machine, which is 5x slower than an add, or worse.
| * For some reason we added the if arms rather than take the max of them, so fix that. This does not help the issue, but was confusing.
| * Adjust TooCostlyToRunUnconditionally in the pass from 9 to 8 (this helps balance the last point).
| * Use half that value when not optimizing for size. That is, we allow only 4 extra unconditional work normally, and 8 in -Os, and when -Oz then we allow any extra amount.
| Aside from the new testcases, some existing ones changed. They all appear to change in a reasonable way, to me.
| We should perhaps go even further than this, and not even run a division unconditionally in -Os, but I wasn't sure it makes sense to go that far as other benchmarks may be affected. For now, this makes the benchmark in #5983 run at full speed in -O3 or -Os, and it remains slow in -Oz. The modified version of the benchmark that only divides in the if (no other operations) is still fast in -O3, but it becomes slow in -Os as we do turn that if into a select (but again, I didn't want to go that far as to overfit on that one benchmark).
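As a rough illustration of the limits described in this message (not Binaryen's actual code), the decision could be sketched in Python as below. The function names, the `shrink_level` encoding (0 for no size optimization, 1 for -Os, 2 for -Oz), and the choice to treat a cost equal to the limit as still allowed are all assumptions made for the example; only the value 8 and the "max of the arms" rule come from the message.

```python
# Hypothetical sketch of the limits described in the commit message above.
TOO_COSTLY_TO_RUN_UNCONDITIONALLY = 8  # value named in the message


def unconditional_cost_limit(shrink_level):
    """Return the allowed extra unconditional work, or None for no limit."""
    if shrink_level >= 2:  # -Oz: any extra amount is allowed
        return None
    if shrink_level == 1:  # -Os: the full value
        return TOO_COSTLY_TO_RUN_UNCONDITIONALLY
    # Not optimizing for size: half the value
    return TOO_COSTLY_TO_RUN_UNCONDITIONALLY // 2


def may_run_unconditionally(then_cost, else_cost, shrink_level):
    """Decide whether an if's arms are cheap enough to become a select."""
    cost = max(then_cost, else_cost)  # max of the arms, not their sum
    limit = unconditional_cost_limit(shrink_level)
    # Assumption: a cost exactly equal to the limit is still allowed.
    return limit is None or cost <= limit
```

With a hypothetical division cost of 5 (five times an add of cost 1), `may_run_unconditionally(5, 1, 0)` rejects the select while `may_run_unconditionally(5, 1, 1)` accepts it, which matches the -O3 versus -Os behavior the message reports for the divide-only benchmark.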
* Asyncify: Simplify if into i32.or (#5988)Heejin Ahn2023-10-0316-243/+195
| ```wast
| (if (result i32)
|   (expr0)
|   (i32.const 1)
|   (expr1)
| )
| ```
| can be written as
| ```wast
| (i32.or
|   (expr0)
|   (expr1)
| )
| ```
| Also this removes some unused variables and methods.
| This also adds an optimization for
| ```wast
| (i32.eqz
|   (global.get $__asyncify_state)
| )
| ```
| in `--mod-asyncify-always-and-only-unwind` to fix an unexpected regression caused by this.
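A small Python sketch can make the rewrite above concrete. It models i32 values as non-negative integers and checks that the `if` form and the `i32.or` form agree on zero versus non-zero. Two assumptions are baked in and are not stated in the message: the result is only consumed as a condition (the exact values can differ, for example when `expr0` is 2), and `expr1` has no side effects, since `i32.or` always evaluates it. The function names are made up for the example.

```python
# Sketch: compare the `if` form and the `i32.or` form on zero/non-zero results.
def if_form(expr0, expr1):
    """Model of (if (result i32) (expr0) (i32.const 1) (expr1))."""
    return 1 if expr0 != 0 else expr1


def or_form(expr0, expr1):
    """Model of (i32.or (expr0) (expr1)), masked to 32 bits."""
    return (expr0 | expr1) & 0xFFFFFFFF


def same_truthiness(values):
    """Check both forms agree on zero vs. non-zero for all pairs of inputs."""
    return all(
        (if_form(a, b) != 0) == (or_form(a, b) != 0)
        for a in values
        for b in values
    )
```

Calling `same_truthiness([0, 1, 2, 7, 0xFFFFFFFF])` exercises a handful of representative inputs; it is a spot check, not a proof.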
* Refine ref.test's castType during refinalization (#5985)Thomas Lively2023-10-021-1/+1
| | | | | | Just like we do with other casts, refine the cast type to be the greatest lower bound of its previous cast type and its input type. The difference is that the output type of ref.test remains i32, but it's still useful to retain more precise type information.
* ConstantFieldPropagation: Fully handle copies (#5969)Alon Zakai2023-09-261-0/+324
| | | | | | | | | | | | | | | | If we see A->f0 = A->f0 then we might be copying fields not only between instances of A but also of any subtypes of A, and so if some subtype has value x then that x might now have reached any other subtype of A (even in a sibling type, so long as A is their parent). We already thought we were handling that, but the mechanism we used to do so (copying New info to Set info, and letting Set info propagate) was not enough. Also add a small constructor to save the work of computing subTypes again. Add TODOs for some cases that we could optimize regarding copies but do not, yet.
* Handle table.fill in Directize (#5974)Alon Zakai2023-09-261-0/+95
| | | Like table.set, it can modify a table.
* StackIR local2stack: Make sure we do not break non-nullable validation (#5919)Alon Zakai2023-09-221-0/+836
| local2stack removes a pair of
|
|   local.set 0
|   local.get 0
|
| when that set is not used anywhere else: whatever value is put into the local, we can just leave it on the stack to replace the get.
| However, we only handled actual uses of the set which we checked using LocalGraph. There may be code that does not actually use the set local, but needs that set purely for validation reasons:
|
|   local.set 0
|   local.get 0
|   block
|     local.set 0
|   end
|   local.get
|
| That last get reads the value set in the block, so the first set is not used by it. But for validation purposes, the inner set stops helping at the block end, so we do need that initial set.
| To fix this, check for gets that need our set to validate before removing any.
| Fixes #5917
* NameTypes and TypeSSA: Prefer _ over $ in names, and lint away _N suffixes (#5968)Alon Zakai2023-09-223-41/+66
| Apparently $N (e.g. FooClass$5) is a convention in Java for anonymous classes, so our $N that we use to disambiguate could be confusing. As the way we disambiguate does not matter, switch to using _N. This PR does that in both TypeSSA and NameTypes.
| Also make NameTypes "lint" names as it goes. That pass tries to give types nice names, leaving existing ones that seem ok, and renaming long or unnamed ones. This PR makes it aware of the _N notation and it tries to remove it, if removing it does not cause a collision. An example of how that helps is if TypeSSA creates a subtype $Foo_0 and then we manage to remove $Foo, then we can use the shorter name for the subtype.
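The "lint" step described above can be sketched in a few lines of Python. This is only an illustration of the idea (strip a trailing `_N` when the shorter name is not already taken); the function name, the regular expression, and the processing order are assumptions, not the pass's real implementation.

```python
import re

# Hypothetical pattern for the _N disambiguation suffix.
_SUFFIX = re.compile(r"_\d+$")


def lint_names(names):
    """Drop a trailing _N from each name when the shorter name is free."""
    taken = set(names)
    result = []
    for name in names:
        shorter = _SUFFIX.sub("", name)
        if shorter and shorter != name and shorter not in taken:
            # The shorter name causes no collision, so use it.
            taken.discard(name)
            taken.add(shorter)
            name = shorter
        result.append(name)
    return result
```

For the case in the message, `lint_names(["$Foo_0"])` shortens the name once `$Foo` is gone, while `lint_names(["$Foo", "$Foo_0"])` leaves both names untouched.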
* Support i8/i16 mutable arrays as public types for string interop (#5814)Alon Zakai2023-09-212-3/+66
| | | | | Probably any array of non-reference data can be allowed to be public and sent out of the module, as it is just data. For now, however, just special case the i8 and i16 array types which are useful already for string interop.
* Make heap2local work through casts (#5952)Jérôme Vouillon2023-09-211-7/+96
| E.g.
|
|   (local $x (ref eq))
|   ...
|   (local.set $x
|     (struct.new $float
|       ...
|     )
|   )
|   (struct.get $float 0
|     (ref.cast (ref $float)
|       (local.get $x)
|     )
|   )
|
| This PR allows us to use heap2local, ignoring the passing cast.
| This is similar to existing handling of ref.as_non_null.
* Do not optimize tuple locals in StackIR local2stack (#5958)Thomas Lively2023-09-181-0/+29
| | | | This Stack IR optimization is not compatible with a much more powerful optimization we plan to do for tuples in the binary writer.
* [NFC] Port stack IR test to lit (#5957)Thomas Lively2023-09-181-0/+1401
| | | | Fix some whitespace, and name and reorder a few items to make the output better match the input, but otherwise port the tests to lit unmodified.
* Add passes to finalize or unfinalize types (#5944)Alon Zakai2023-09-181-0/+96
| | | | | | | | | TypeFinalization finalizes all types that we can, that is, all private types that have no children. TypeUnFinalization unfinalizes (opens) all (private) types. These could be used by first opening all types, optimizing, and then finalizing, as that might find more opportunities. Fixes #5933
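The selection rule for TypeFinalization ("all private types that have no children") could be sketched as follows. The data layout (a dict from each type name to its declared supertype, plus a set of private type names) and the function name are assumptions made for the example.

```python
def finalizable_types(supertype_of, private_types):
    """Return the private types that have no subtypes (children)."""
    # Any type that appears as someone's supertype has at least one child.
    has_children = {
        parent for parent in supertype_of.values() if parent is not None
    }
    return {t for t in private_types if t not in has_children}
```

For example, with `{"A": None, "B": "A", "C": "A", "D": None}` and private types `{"A", "B", "D"}`, only `B` and `D` qualify: `A` has children, and `C` is not private.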
* TupleOptimization: Handle copies of different types in unreachable code (#5956)Alon Zakai2023-09-181-0/+20
|
* Remove legacy type definition text syntax (#5948)Thomas Lively2023-09-1835-425/+424
| Remove support for the "struct_subtype", "array_subtype", "func_subtype", and "extends" notations we used at various times to declare WasmGC types, leaving only support for the standard text format for declaring types. Update all the tests using the old formats and delete tests that existed solely to test the old formats.
* Add a simple tuple optimization pass (#5937)Alon Zakai2023-09-141-0/+1009
| | | | | | | | | | | In some cases tuples are obviously not needed, such as when they are only used in local operations and make/extract. Such tuples are not used as return values or in control flow structures, so we might as well lower them to individual locals per lane, which other passes can optimize a lot better. I believe LLVM does the same with its own tuples: it lowers them as much as possible, leaving only necessary ones. Fixes #5923
* OptimizeInstructions: Simplify tuple.extract of tuple.make (#5938)Alon Zakai2023-09-141-32/+71
| E.g.
|
|   (tuple.extract 1 (tuple.make (A) (B) (C)))
|   =>
|   (B)
|
| Modify some existing tests to not be in this trivial form, so that they do not stop testing what they should.
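The rewrite above can be sketched on a toy expression representation in Python. Expressions are modeled as nested tuples, which is an assumption for the example and unrelated to Binaryen's IR classes. The sketch also assumes the discarded operands have no side effects; the message does not say how the real pass deals with that.

```python
def simplify_extract(expr):
    """Rewrite ("tuple.extract", i, ("tuple.make", ...)) to the i-th operand."""
    if (
        isinstance(expr, tuple)
        and len(expr) == 3
        and expr[0] == "tuple.extract"
        and isinstance(expr[1], int)
        and isinstance(expr[2], tuple)
        and len(expr[2]) > 0
        and expr[2][0] == "tuple.make"
    ):
        index, operands = expr[1], expr[2][1:]
        if 0 <= index < len(operands):
            # Assumes the other operands can be dropped safely.
            return operands[index]
    # Anything else is returned unchanged.
    return expr
```

Applied to `("tuple.extract", 1, ("tuple.make", "A", "B", "C"))`, the sketch returns `"B"`; an extract from anything other than a `tuple.make` is left as is.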
* Replace i31.new with ref.i31 everywhere (#5931)Thomas Lively2023-09-1312-53/+53
| | | | | Replace i31.new with ref.i31 in the printer, tests, and source code. Continue parsing i31.new for the time being to allow a graceful transition. Also update the JS API to reflect the new instruction name.
* Remove legacy GC text syntax (#5929)Thomas Lively2023-09-121-2/+2
| | | | Remove the old forms of ref.test and ref.cast that took heap types instead of ref types and remove the old array.init_static name for array.new_fixed.
* Fix printing of types for imported functions (#5927)Thomas Lively2023-09-112-2/+2
| | | | | | Previously, the printer incorrectly reconstructed imported functions' types from their signatures instead of printing their types directly. This could cause the printer to print uses of types that were never defined and did not exist in the module. Fix the bug by printing imported functions' heap types directly.
* Make final types the default (#5918)Thomas Lively2023-09-0940-512/+509
| Match the spec and parse the shorthand binary and text formats as final and emit final types without supertypes using the shorthands as well. This is a potentially-breaking change, since the text and binary shorthands can no longer be used to define types that have subtypes.
| Also make TypeBuilder entries final by default to better match the spec and update the internal APIs to use the "open" terminology rather than "final" terminology. Future changes will update the text format to use the standard "sub open" rather than the current "sub final" keywords. The exception is the new wat parser, which supports "sub open" as of this change, since it didn't support final types at all previously.
* Remove the GCNNLocals feature (#5080)Thomas Lively2023-08-316-72/+55
| | | | | Now that the WasmGC spec has settled on a way of validating non-nullable locals, we no longer need this experimental feature that allowed nonstandard uses of non-nullable locals.
* Parse non-nullable tuple elements without special handling (#5910)Thomas Lively2023-08-301-5/+3
| | | | | | | In the binary parser, when creating a scratch local to hold multivalue results as tuples, we previously ensured that the scratch local did not contain any non-nullable by modifying its type and inserting ref.as_non_null as necessary. Now that we properly support non-nullable elements in tuple locals, however, this parser behavior is no longer necessary. Remove it.
* Validate and fix up tuples with non-nullable elements (#5909)Thomas Lively2023-08-302-12/+189
| | | | | | The code validating and fixing up non-nullable locals previously did not correctly handle tuples that contained non-nullable elements, which could have resulted in invalid modules going undetected. Update the code to handle tuples and add tests.
* GlobalStructInference: Add missing ReFinalize (#5898)Alon Zakai2023-08-241-0/+47
|
* Simplify and consolidate type printing (#5816)Thomas Lively2023-08-24201-3116/+3116
| When printing Binaryen IR, we previously generated names for unnamed heap types based on their structure. This was useful for seeing the structure of simple types at a glance without having to separately go look up their definitions, but it also had two problems:
| 1. The same name could be generated for multiple types. The generated names did not take into account rec group structure or finality, so types that differed only in these properties would have the same name. Also, generated type names were limited in length, so very large types that shared only some structure could also end up with the same names. Using the same name for multiple types produces incorrect and unparsable output.
| 2. The generated names were not useful beyond the most trivial examples. Even with length limits, names for nontrivial types were extremely long and visually noisy, which made reading disassembled real-world code more challenging.
| Fix these problems by emitting simple indexed names for unnamed heap types instead. This regresses readability for very simple examples, but the trade off is worth it.
| This change also reduces the number of type printing systems we have by one. Previously we had the system in Print.cpp, but we had another, more general and extensible system in wasm-type-printing.h and wasm-type.cpp as well. Remove the old type printing system from Print.cpp and replace it with a much smaller use of the new system. This requires significant refactoring of Print.cpp so that PrintExpressionContents object now holds a reference to a parent PrintSExpression object that holds the type name state.
| This diff is very large because almost every test output changed slightly. To minimize the diff and ease review, change the type printer in wasm-type.cpp to behave the same as the old type printer in Print.cpp except for the differences in name generation. These changes will be reverted in much smaller PRs in the future to generally improve how types are printed.
* Fix merging of unrelated types in TypeMerging (#5897)Thomas Lively2023-08-231-0/+74
| | | | | | | | | Previously it was possible that the supertype merging phase would merge unrelated types when DFA minimization would split a common supertype out of a partition, leaving unrelated types behind in the same partition. Fix the problem by post-processing the partitions in the supertype merging phase to split any partitions that contain unrelated types. Fixes #5877.
* Fix a merge conflict on main (#5899)Alon Zakai2023-08-231-1/+1
|
* SignatureRefining: Handle updates to call.without.effects (#5884)Alon Zakai2023-08-231-1/+98
| | | | | | | If we refine a signature type that is used in a call.without.effects then that call's results may need to be updated. In the IR it looks like a normal call that happens to pass a function reference as the last param, but it actually means that we call that function (without side effects), so we need to have the same results, and the validator already verified that (so the new testcase here fails without this fix).
* Use the standard syntax for ref.cast, ref.test and array.new_fixed (#5894)Jérôme Vouillon2023-08-2338-974/+977
| * Update text output for `ref.cast` and `ref.test`
| * Update text output for `array.new_fixed`
| * Update tests with new syntax for `ref.cast` and `ref.test`
| * Update tests with new `array.new_fixed` syntax
* Fix assertion failure in RemoveUnusedBrs (#5895)Thomas Lively2023-08-231-0/+39
| | | | | | | | The improvements to RemoveUnusedBrs in #5887 also introduced a regression where the pass did not correctly handle unreachable fallthrough values and crashed with an assertion failure. Fix the problem by returning early when a fallthrough value is unreachable and add a regression test. Fixes #5892.
* Update stringref text format (#5891)Jérôme Vouillon2023-08-221-26/+26
| * Allow new syntax for some stringref opcodes
|   Fixes #5607
| * Update stringref text output
| * Update tests with new syntax for stringref opcodes
|   Except in test/lit/strings.wat, to check that the legacy syntax still works.
* Rename multimemory flag (#5890)Ashley Nelson2023-08-213-5/+5
| | | Renaming the multimemory flag in Binaryen to match its naming in LLVM.
* Improve br_on* optimizations (#5887)Thomas Lively2023-08-222-73/+497
| | | | | | | Optimize both the known-null and known-non-null cases for BrOnNull and BrOnNonNull and optimize for more cast behaviors such as SuccessOnlyIfNonNull and Unreachable for BrOnCast and BrOnCastFail. Leave optimizing SuccessOnlyIfNull to future work, since that's more complicated. Use type information from fallthrough values to inform all the optimizations.
* Fix finalization of call_ref to handle refined target types (#5883)Thomas Lively2023-08-211-0/+70
| | | | | | | | | | Previously CallRef::finalize() would never update the type of the CallRef, even if the type of the call target had been refined to give a more precise result type. Besides unnecessarily losing type information, this could also lead to validation errors, since the validator checks that the type of CallRef matches the result type of the target signature. Fix the bug by updating CallRef's type based on its target signature in CallRef::finalize() and add a test that depends on this refinalization.
* Fix SSA on null refinement (#5886)Alon Zakai2023-08-171-0/+28
| | | | Similar to #5885 this was uncovered by #5881 #5882. Here we need to refinalize when we replace a local.get with a null, since the null's type is more refined.
* ReFinalize in TypeSSA (#5885)Alon Zakai2023-08-171-0/+33
| | | | This has been a bug for a while but it became noticeable after #5881 #5882 which do more work in refinalization.
* Further improve ref.cast during finalization (#5882)Thomas Lively2023-08-176-15/+19
| | | | | | We previously improved the nullability and heap type of the ref.cast target type in RefCast::finalize() based on what we knew about its input type. Simplify the code and make this improvement more powerful by using the greatest lower bound of the original cast target and input type.
* Ensure br_on_cast* target type is subtype of input type (#5881)Thomas Lively2023-08-175-14/+14
| | | | | | | | | | | | | | | | The WasmGC spec will require that the target cast type of br_on_cast and br_on_cast_fail be a subtype of the input type, but so far Binaryen has not enforced this constraint, so it could produce invalid modules when optimizations refined the input to a br_on_cast* such that it was no longer a supertype of the cast target type. Fix this problem by setting the cast target type to be the greatest lower bound of the original cast target type and the current input type in `BrOn::finalize()`. This maintains the invariant that the cast target type should be a subtype of the input type and it also does not change cast behavior; any value that could make the original cast succeed at runtime necessarily inhabits both the original cast target type and the input type, so it also must inhabit their greatest lower bound and will make the updated cast succeed as well.
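To illustrate the greatest-lower-bound rule used above for cast targets, here is a Python sketch over a toy type hierarchy. Reference types are modeled as `(heap_type, nullable)` pairs, and the hierarchy is a small hand-written tree; the heap types `A`, `B`, and `C`, the `SUPERTYPE` table, and the function names are all hypothetical and are not Binaryen's API. The sketch assumes a tree-shaped hierarchy, in which two unrelated types share only the bottom type.

```python
# Toy hierarchy: each heap type maps to its declared supertype (None = top).
SUPERTYPE = {
    "any": None,
    "eq": "any",
    "struct": "eq",
    "A": "struct",
    "B": "A",
    "C": "struct",
}
BOTTOM = "none"


def is_subtype(sub, sup):
    """True if heap type `sub` equals `sup` or sits below it in the toy tree."""
    if sub == BOTTOM:
        return True
    while sub is not None:
        if sub == sup:
            return True
        sub = SUPERTYPE.get(sub)
    return False


def glb(a, b):
    """Greatest lower bound of two reference types given as (heap, nullable)."""
    (heap_a, null_a), (heap_b, null_b) = a, b
    if is_subtype(heap_a, heap_b):
        heap = heap_a
    elif is_subtype(heap_b, heap_a):
        heap = heap_b
    else:
        heap = BOTTOM  # unrelated types in a tree only share the bottom type
    # The result is nullable only if both inputs allow null.
    return (heap, null_a and null_b)
```

For instance, a cast to a nullable `B` whose input has been refined to a non-nullable `A` gets the target `glb(("B", True), ("A", False))`, a non-nullable `B`, which is a subtype of the input type as the message requires.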
* Improve cast optimizations (#5876)Thomas Lively2023-08-173-26/+563
| | | | | | | | | | | | Simplify the optimization of ref.cast and ref.test in OptimizeInstructions by moving the loop that examines fallthrough values one at a time out to a shared function in properties.h. Also simplify ref.cast optimization by analyzing the cast result in just one place. In addition to simplifying the code, also make the cast optimizations more powerful by analyzing the nullability and heap type of the cast value independently, resulting in a potentially more precise analysis of the cast behavior. Also improve optimization power by considering fallthrough values when optimizing the SuccessOnlyIfNonNull case.
* Heap2Local: Refinalize if we end up refining (#5879)Alon Zakai2023-08-171-1/+51
| We shouldn't need to in the general case, but the fuzzer found a corner case where we do need to; see the explanation and testcase. Basically, Heap2Local replaces struct fields with locals, and the locals should have the same types, but if a field was somehow less refined for some reason, then the locals could actually be more refined. (And a field could be less refined if we read it from a type that was under-refined due to a tee or such.)
* Remove legacy WasmGC instructions (#5861)Thomas Lively2023-08-0910-237/+148
| | | | | Remove old, experimental instructions and type encodings that will not be shipped as part of WasmGC. Updating the encodings and text format to match the final spec is left as future work.
* LinearExecutionWalker: Optionally connect blocks for Br and BrOn (#5869)Alon Zakai2023-08-094-23/+81
| Br and BrOn can consider the code before and after them connected if it might be reached (which is the case if the Br has a condition, which BrOn always has).
| The wasm2js changes may look a little odd as some of them have this:
|
|   i64toi32_i32$1 = i64toi32_i32$2;
|   i64toi32_i32$1 = i64toi32_i32$2;
|
| I looked into that and the reason is that those outputs are not optimized, and also even in unoptimized wasm2js we do run simplify-locals once (to try to reduce the downsides of flatten). As a result, this PR makes a difference there, and that difference can lead to such odd duplicated code after other operations. However, there are no changes to optimized wasm2js outputs, so there is no actual problem.
| Followup to #5860.
* OptimizeCasts: Connect adjacent blocks in LinearExecutionWalker (#5866)Alon Zakai2023-08-082-21/+80
| | | | | | | Followup to #5860, this does the same for (part of) OptimizeCasts. As there, this is valid because it's ok if we branch away. This part of the pass picks a different local to get when it knows locals have the same values but one is more refined. It is ok to add a tee earlier even if it isn't used later.
* LocalCSE: Connect adjacent blocks in LinearExecutionWalker (#5867)Alon Zakai2023-08-082-4/+53
| | | | | | | Followup to #5860, this does the same for LocalCSE. As there, this is valid because it's ok if we branch away. This pass adds a local.tee of a reused value and then gets it later, and it's ok to add a tee even if we branch away and do not use it.
* SimplifyGlobals: Connect adjacent blocks in LinearExecutionWalker (#5865)Alon Zakai2023-08-081-0/+55
| | | | | | | Followup to #5860, this does the same for SimplifyGlobals as for SimplifyLocals. As there, this is valid because it's ok if we branch away. This part of the pass applies a global value to a global.get based on a dominating global.set, so any dominance is good enough for us.
* LinearExecutionTraversal: Add an option to connect adjacent code, use in SimplifyLocals (#5860)Alon Zakai2023-08-085-18/+116
| This addresses most of the minor regression from the correctness fix in #5857. That PR makes us consider calls as branching, but in some cases it is ok to ignore that branching (see the comment in the code here), which this PR allows as an option. This undoes one test change from that PR, showing it undoes the regression for SimplifyLocals. More tests are added to cover this specifically as well.
* Fix LinearExecutionWalker on calls (#5857)Alon Zakai2023-08-074-2/+205
| Calls were simply not handled there, so we could think we were still in the same basic block when we were not, affecting various passes (but somehow this went unnoticed until the TNHOracle #5850 ran on some particular Java code).
| One existing test was affected, and two new tests are added: one for TNHOracle where I detected this, and one in OptimizeCasts which is perhaps a simpler way to see the problem.
| All the cases but the TNH one, however, do not need this fix for correctness since they actually don't care if a call would throw. As a TODO, we should find a way to undo this minor regression. The regression only affects builds with EH enabled, though, so most users should be unaffected even in the interim.