summaryrefslogtreecommitdiff
path: root/src
Commit message (Collapse)AuthorAgeFilesLines
* [analysis] Improve lattice fuzzer (#6050)Thomas Lively2023-10-273-119/+525
| | | | | | | | | | Implement new `RandomLattice` and `RandomFullLattice` utilities that are lattices randomly created from other lattices. By recursively using themselves as the parameter lattices for lattices like `Inverted` and `Lift`, these random lattices can become arbitrarily nested. Decouple the checking of lattice properties from the checking of transfer function properties by creating a new, standalone `checkLatticeProperties` function.
* Allow rec groups of public function types in closed world (#6053)Alon Zakai2023-10-261-6/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Closed-world mode allows function types to escape if they are on exported functions, because that has been possible since wasm MVP and cannot be avoided. But we need to also allow all types in those type's rec groups as well. Consider this case: (module (rec (type $0 (func)) (type $1 (func)) ) (func "0" (type $0) (nop) ) (func "1" (type $1) (nop) ) ) The two exported functions make the two types public, so this module validates in closed world mode. Now imagine that metadce removes one export: (module (rec (type $0 (func)) (type $1 (func)) ) (func "0" (type $0) (nop) ) ;; The export "1" is gone. ) Before this PR that no longer validates, because it only marks the type $0 as public. But when a type is public that makes its entire rec group public, so $1 is errored on. To fix that, this PR allows all types in a rec group of an exported function's type, which makes that last module validate.
* [analysis] Implement a Lift lattice (#6040)Thomas Lively2023-10-253-0/+88
| | | | This lattice "lifts" another lattice by inserting a new bottom element underneath it.
* Precompute: Check defaultability, not nullability (#6052)Alon Zakai2023-10-251-2/+2
| | | Followup to #6048, we did not handle nondefaultable tuples because of this.
* [analysis] Implement Flat lattice (#6039)Thomas Lively2023-10-251-0/+93
| | | | | Given a type `T`, `Flat<T>` is the lattice where none of the values of `T` are comparable except with themselves, but they are all greater than a common bottom element not in `T` and less than a common top element also not in `T`.
* [analysis] Add a FullLattice concept and Inverted lattice (#6038)Thomas Lively2023-10-254-4/+89
| | | | | | The FullLattice concept extends the base Lattice with `getTop` and `meet` operations. The `Inverted` lattice uses these operations to reverse the order of an arbitrary full lattice, for example to create a lattice of integers ordered by `>` rather than by `<`.
* [analysis] Implement an Int lattice (#6037)Thomas Lively2023-10-251-0/+63
| | | Implement a generic lattice template for integral types ordered by `<`.
* [analysis] Implement a Bool lattice (#6036)Thomas Lively2023-10-251-0/+47
| | | | | This is a lattice with two elements: `false` is bottom and `true` is top. Add a new gtest file for testing lattices.
* [analysis][NFC] Rename `makeLeastUpperBound` to `join` and move it to ↵Thomas Lively2023-10-255-64/+70
| | | | | | | | | | | | | | | | lattice (#6035) In general, the operation of taking the least upper bound of two lattice elements may depend on some state stored in the lattice object rather than in the elements themselves. To avoid forcing the elements to be larger and more complicated in that case (by storing a parent pointer back to the lattice), move the least upper bound operation to make it a method of the lattice rather than the lattice element. This is also more consistent with where we put e.g. the `compare` method. While we are at it, rename `makeLeastUpperBound` to `join`, which is much shorter and nicer. Usually we avoid using esoteric mathematical jargon like this, but "join" as a normal verb actually describes the operation nicely, so I think it is ok in this case.
* [analysis] Simplify core analysis code (#6034)Thomas Lively2023-10-258-150/+87
| | | | | | | | | | | | | | Simplify the monotone analyzer by replacing all the state it used to store in `BlockState` with a simple vector of lattice elements. Use simple indices to refer to both blocks and their associated states in the vector. Remove the ability for transfer functions to control the initial enqueued order of basic blocks since that was a leaky abstraction. Replace the worklist with a UniqueDeferredQueue since that has generally proven to be more efficient in smiilarly contexts, and more importantly, it has a nicer API. Make miscellaneous simplifications to other code as well. Delete a few unit tests that exposed the order in which blocks were analyzed because they printed intermediate results. These tests should be replaced with tests of analyses' public APIs in the future.
* Add missing includes (#6049)Thomas Lively2023-10-252-0/+3
| | | | These missing includes were not a problem in our standard build configuration, but were breaking other build configurations.
* Typed Continuations: Add cont type (#5998)Frank Emrich2023-10-2410-33/+250
| | | | | | | | | This PR is part of a series that adds basic support for the [typed continuations proposal](https://github.com/wasmfx/specfx). This PR adds continuation types, of the form `(cont $foo)` for some function type `$foo`. The only notable changes affecting existing code are the following: - This is the first `HeapType` which has another `HeapType` (rather than, say, a `Type`) as its immediate child. This required fixes to certain traversals that have a flag for being at the toplevel of a type. - Some shared logic for parsing `HeapType`s has been factored out.
* Fix unreachable code in LocalGraph by making it imprecise there (#6048)Alon Zakai2023-10-243-21/+29
| | | | | | | | | | Followup to #6046 - the fuzzer found we missed handling the case of the entry itself being unreachable, or of an unreachable loop later. Properly identifying unreachable code requires a flow analysis, unfortunately, so this PR gives up on that and instead allows LocalGraph to be imprecise in unreachable code. That avoids adding any overhead, but does mean the IR may be slightly confusing when debugging. It does not have any optimization downsides, however, as it only affects unreachable code. Also add a dump() impl in that file which helps debugging.
* Fix handling of exported imported functions (#6044)Alon Zakai2023-10-244-4/+19
| | | | | | | | Two trivial places did not handle that case, and assumed an exported function was actually defined (and not imported). Also add some const stuff to fix compilation after this change. This was discovered by #6026
* [NFC] LocalGraph: Optimize params with no sets (#6046)Alon Zakai2023-10-241-1/+34
| | | | | | | | | | | If a local index has no sets, then all gets of that index read from the entry block (a param, or a zero for a local). This is actually a common case, where a param has no other set, and so it is worth optimizing, which this PR does by avoiding any flowing operation at all for that index: we just skip and write the entry block as the source of information for such gets. #6042 on precompute-propagate goes from 3 minutes to 2 seconds with this (!). But that testcase is rather special in that it is a huge function with many, many gets in it, so the overhead we remove is very noticeable there.
* [NFC] LocalGraph: Move definition to logical place (#6045)Alon Zakai2023-10-241-3/+5
| | | | | | | | | | | | allGets was declared in a scope that kept it alive for all blocks, and at the end of the loop we clear the gets for a particular block. That's clumsy, and makes a followup harder, so this PR moves it to the natural place for it. (That is, it moves it to the scope that handles a particular block, and removes the manual clearing-out of the get at the end of the loop iteration.) Optimizing compilers are smart enough to be efficient about stack allocations of objects inside loops anyhow (which I measured). Helps #6042.
* Partially revert #6026 (#6043)Alon Zakai2023-10-231-0/+5
| | | That optimization uncovered some LLVM and Binaryen bugs.
* Fix segfault in catch validator (#6032)Philip Blair2023-10-231-26/+24
| | | | | The problem was if you construct a try expression which references a nonexistent tag in one of its catch blocks, the validation code successfully identified the null pointer but then proceeded to try to read from it.
* [analysis][NFC] Create a TransferFunction concept (#6033)Thomas Lively2023-10-207-176/+138
| | | | | | | | | Factor the static assertions for transfer functions out into a new transfer-function.h header. The concept requires the `getDependents` method to return an input range of basic blocks, and to satisfy that requirement, fix up _indirect_ptr_iterator in cfg-impl.h so that it is a proper iterator. Remove part of the lattice fuzzer that was using a placeholder transfer function in a way that does not satisfy the new type constraints; most of that code will be overhauled in the future anyway.
* [analysis][NFC] Move the stack lattice to analysis/lattices (#6030)Thomas Lively2023-10-202-61/+73
| | | Also shorten various names in the implementation to improve readability.
* [analysis][NFC] Remove unused sign-lattice.cpp (#6029)Thomas Lively2023-10-202-46/+0
| | | | There is no header for this source file and its contents are not used anywhere. It will be easy to reintroduce the sign lattice in the future if we need it.
* [analysis][NFC] Move powerset lattices to their own header (#6028)Thomas Lively2023-10-205-139/+199
| | | | Move the powerset lattices out of lattice.h, which now only contains the Lattice concept, to their own dedicated header in a new analysis/lattices directory.
* RemoveUnusedModuleElements: Make exports skip trampolines (#6026)Alon Zakai2023-10-191-0/+61
| | | | | | | | | | | | | | | | | | | If we export a function that just calls another function, we can export that one instead. Then the one in the middle may be unused, function foo() { return bar(); } export foo; // can be an export of bar This saves a few bytes in rare cases, but probably more important is that it saves the trampoline, so if this is on a hot path, we save a call. Context: emscripten-core/emscripten#20478 (comment) In general this is not needed as inlining helps us out by inlining foo() into the caller (since foo is tiny, that always ends up happening). But exports are a case the inliner cannot handle, so we do it here.
* [analysis][NFC] Use C++20 concepts for Lattice (#6027)Thomas Lively2023-10-187-118/+107
| | | | | | | | | | | | | Replace the static assertions ensuring that Lattice types have the necessary operations with a C++20 concept called `Lattice`. To avoid name conflicts with the new concept, rename existing type parameters named `Lattice` to `L`. When not building with C++20, `Lattice` is a macro that resolves to `typename` so the code continues compiling and has the same behavior, but without any eager checks of the requirements on lattices. Add a new C++20 builder to CI to ensure that future changes compile with both C++17 and C++20. Once we switch to C++20 by default, the new builder can be removed. Update the lint builder to use a recent clang-format that understands concepts.
* SimplifyGlobals: Fold single-use globals to their use (#6023)Alon Zakai2023-10-181-0/+61
| | | | | | | | | | | | | | | | For example, (global $a (struct.new $A)) (global $b (struct.new $B (global.get $A))) => (global $b (struct.new $B (struct.new $A))) and the global $a is now unused. This is valid if $a has no other uses. This saves a little in code size, but should not really help otherwise, as we already look through immutable global.get operations in important optimizations. But the code size may matter if there are many such single- use globals.
* [Outlining] Filter branches & returns (#6024)Ashley Nelson2023-10-182-4/+30
| | | Adds a function to filter branch and return instructions from being included in potential outlining candidates.
* Reuse existing function types for blocks (#6022)Thomas Lively2023-10-187-64/+176
| | | | | | | | | | | | | | | | | | | | | Type annotations on multivalue blocks (and loops, ifs, and trys) are type indices that refer to function types in the type section. For these type annotations, the identities of the function types does not matter. As long as the referenced type has the correct parameters and results, it will be valid to use. Previously, when collecting module types, we always used the "default" function type for multivalue control flow, i.e. we used a final function type with no supertypes in a singleton rec group. However, in cases where the program already contains another function type with the expected signature, using the default type is unnecessary and bloats the type section. Update the type collecting code to reuse existing function types for multivalue control flow where possible rather than unconditionally adding the default function type. Similarly, update the binary writer to use the first heap type with the required signature when emitting annotations on multivalue control flow structures. To make this all testable, update the printer to print the type annotations as well, rather than just the result types. Since the parser was not able to parse those newly emitted type annotations, update the parser as well.
* Fuzzer: Allow non-nullable locals (#6019)Alon Zakai2023-10-181-9/+30
| | | | | | | Remove the code that avoided such locals. To avoid much new trapping, add logic to set a value to such locals if they have accesses that are not dominated by a set already. Also in makeTrivial only rarely emit a local.get of a non-nullable local (prefer a constant).
* SimplifyGlobals: Run on function code in missing places (#6020)Alon Zakai2023-10-181-4/+10
| | | | | | | | | | | | | | The pass was written before we had relevant code in module locations, but now with GC we can have global.gets of more things. The scanner did not run on global code in a way that is not a problem yet, but will be for a later PR I'll open. It will be tested there. That is, right now there is no optimization that is confused by the fact that we did not scan code at the module level, but the next PR will do that. The use modifier did not run on global code either, which was an actual missed optimization opportunity: There are cases where we want to modify a global.get to point to another one, and such a get might be in global code, not just in a function. A test is added for that.
* [Outlining] Filter Local Set (#6018)Ashley Nelson2023-10-183-3/+79
| | | | | Adds a general purpose walker named FilterStringifyWalker, intended to walk control flow and take note of whether any of the expressions satisfy the condition. Also includes an << overload for SuffixTree::RepeatedSubstring to make debugging easier.
* [NFC] Some misc comments for clarification (#6017)Alon Zakai2023-10-171-0/+2
|
* Add getGeneralSuperType() that includes basic supers, and use in fuzzer (#6005)Alon Zakai2023-10-175-10/+47
| | | | With this, the fuzzer can replace e.g. an eq expression with a specific struct type, because now it is away that struct types have eq as their ancestor.
* Fuzzer: Move logic for adding a new local on demand to local.get (#6008)Alon Zakai2023-10-171-15/+36
| | | | | | | Previously makeTrappingRefUse would add a local on demand if one was missing for the type, and add a tee for it. This PR moves that logic to makeLocalGet so that we get those benefits any time we want to emit a local.get of a local type that does not exist (including from makeTrappingRefUse which calls makeLocalGet).
* [NFC] Rename getSuperType to getDeclaredSuperType (#6015)Alon Zakai2023-10-1720-49/+51
| | | | A later PR will add getSuperType which will mean "get the general super type - either declared, or not".
* Heap2Local: Refinalize when removing a cast (#6012)Alon Zakai2023-10-161-0/+14
|
* [wasm-split] Fix instrumentation to work with memory 64 (#6013)Thomas Lively2023-10-161-21/+23
| | | | Correctly use the output memory's index type when generating the __write_profile function. Requires moving some code around, but is a very small fix.
* GUFA: Add missing set of optimized boolean (#6010)Alon Zakai2023-10-161-0/+1
| | | | Without marking us as having optimized we didn't refinalize, and broke validation.
* Add an "unsubtyping" optimization (#5982)Thomas Lively2023-10-108-3/+591
| | | | | | | | | | | | | | Add a new pass that analyzes the module to find the minimal subtyping relation that is necessary to maintain the validity and semantics of the program and rewrites the types to use this minimal relation. Besides eliminating references to otherwise-unused intermediate types, this optimization should unlock significant additional optimizing power in other type optimizations that are constrained by having to maintain supertype validity, since after this new optimization there are fewer and more general supertypes. The analysis works by visiting each expression and module element to collect the subtypings that are required to maintain its validity, then, using that as a starting point, iteratively adding new subtypings required by type definitions and casts until reaching a fixed point.
* Fix a bug printing and emitting empty, passive element segments (#6002)Thomas Lively2023-10-091-7/+4
| | | | | | | | Empty, passive element segments were always emitted as having `func` type because all their elements trivially were RefFunc (because they have no elements) and because we were incorrectly checking table types if they existed instead of the element segment's type directly to see if it was non-func. Fix the bug by checking each element segment's type directly and add a test.
* Automatically discard global effects in the rare passes that add effects (#5999)Alon Zakai2023-10-0614-0/+53
| | | | | All logging/instrumentation passes need to do this, to avoid us using stale global effects that are too low (too high is not optimal either, but at least it cannot cause bugs).
* Compute full transitive closure in GlobalEffects (#5992)Alon Zakai2023-10-061-33/+156
|
* [typed-cont] Allow result types on tags (#5997)Frank Emrich2023-10-052-8/+33
| | | | | | | | | | | This PR is part of a series that adds basic support for the typed continuations proposal. This PR relaxes the restriction that tags must not have results , only params. Tags with results must not be used for exception handling and are only allowed if the typed continuations feature is enabled. As a minor point, this PR also changes the printing of tags without params: To make the presentation consistent, (param) is omitted when printing a tag.
* [typed-cont] Add feature flag (#5996)Frank Emrich2023-10-055-1/+16
| | | | | | | This PR is part of a series that adds basic support for the [typed continuations proposal](https://github.com/wasmfx/specfx). This particular PR simply extends `FeatureSet` with a corresponding entry for this proposal.
* [Outlining] Adds separator context (#5977)Ashley Nelson2023-10-043-32/+118
| | | Adds a std::variant to represent the context of why a unique symbol was inserted in the stringified module. This allows us to pass necessary contextual data to subclasses of StringifyWalker in a structured manner.
* Work around a gcc 13 issue with signbit that made us not compute fmin of -0 ↵Alon Zakai2023-10-041-4/+14
| | | | properly (#5994)
* [Outlining] Adds SuffixTree::RepeatSubstring dedupe test (#5972)Ashley Nelson2023-10-043-0/+56
| | | This PR adds a StringProcessor struct intended to hold functions that filter vectors of SuffixTree::RepeatedSubstring, and a test of its first functionality, removing overlapping repeated substrings.
* RemoveUnusedBrs: Allow less unconditional work and in particular division ↵Alon Zakai2023-10-032-20/+46
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | (#5989) Fixes #5983: The testcase from there is used here in a new testcase remove-unused-brs_levels in which we check if we are willing to unconditionally do a division operation. Turning an if with an arm that does a division into a select, which always does the division, is almost 5x slower, so we should probably be extremely careful about doing that. I took some measurements and have some suggestions for changes in this PR: * Raise the cost of div/rem to what I measure on my machine, which is 5x slower than an add, or worse. * For some reason we added the if arms rather than take the max of them, so fix that. This does not help the issue, but was confusing. * Adjust TooCostlyToRunUnconditionally in the pass from 9 to 8 (this helps balance the last point). * Use half that value when not optimizing for size. That is, we allow only 4 extra unconditional work normally, and 8 in -Os, and when -Oz then we allow any extra amount. Aside from the new testcases, some existing ones changed. They all appear to change in a reasonable way, to me. We should perhaps go even further than this, and not even run a division unconditionally in -Os, but I wasn't sure it makes sense to go that far as other benchmarks may be affected. For now, this makes the benchmark in #5983 run at full speed in -O3 or -Os, and it remains slow in -Oz. The modified version of the benchmark that only divides in the if (no other operations) is still fast in -O3, but it become slow in -Os as we do turn that if into a select (but again, I didn't want to go that far as to overfit on that one benchmark).
* Asyncify: Improve comments (#5987)Heejin Ahn2023-10-033-43/+56
| | | | | | | | This fixes some outdated comments and typos in Asyncify and improves some other comments. This tries to make code comments more readable by making them more accurate and also by using the three state (normal, unwinding, and rewinding) consistently. Drive-by fix: Typo fixes in SimplifyGlobals and wasm-reduce option.
* Asyncify: Simpify if into i32.or (#5988)Heejin Ahn2023-10-031-17/+26
| | | | | | | | | | | | | | | | | | | | | | | | | | | ```wast (if (result i32) (expr0) (i32.const 1) (expr1) ) ``` can be written as ```wast (i32.or (expr0) (expr1) ) ``` Also this removes some unused variables and methods. This also adds an optimization for ```wast (i32.eqz (global.get $__asyncify_state) ) ``` in `--mod-asyncify-always-and-only-unwind` to fix an unexpected regression caused by this.
* [NFC] Mark operator== as const (#5990)walkingeyerobot2023-10-031-1/+1
| | | | | C++20 will automatically generate an operator== with reversed operand order, which is ambiguous with the written operator== when one argument is marked const and the other isn't.