| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
| |
Followup to #6046 - the fuzzer found we missed handling the case of the entry itself
being unreachable, or of an unreachable loop later. Properly identifying unreachable
code requires a flow analysis, unfortunately, so this PR gives up on that and instead
allows LocalGraph to be imprecise in unreachable code. That avoids adding any
overhead, but does mean the IR may be slightly confusing when debugging. It does
not have any optimization downsides, however, as it only affects unreachable code.
Also add a dump() impl in that file which helps debugging.
|
|
|
|
|
|
|
|
| |
Two trivial places did not handle that case, and assumed an exported function
was actually defined (and not imported).
Also add some const stuff to fix compilation after this change.
This was discovered by #6026
|
|
|
|
|
|
|
|
|
|
|
| |
If a local index has no sets, then all gets of that index read from the entry block
(a param, or a zero for a local). This is actually a common case, where a param
has no other set, and so it is worth optimizing, which this PR does by avoiding
any flowing operation at all for that index: we just skip and write the entry block
as the source of information for such gets.
#6042 on precompute-propagate goes from 3 minutes to 2 seconds with this (!).
But that testcase is rather special in that it is a huge function with many, many
gets in it, so the overhead we remove is very noticeable there.
|
|
|
|
|
|
|
|
|
|
|
|
| |
allGets was declared in a scope that kept it alive for all blocks, and at
the end of the loop we clear the gets for a particular block. That's clumsy,
and makes a followup harder, so this PR moves it to the natural place for it.
(That is, it moves it to the scope that handles a particular block, and removes
the manual clearing-out of the get at the end of the loop iteration.) Optimizing
compilers are smart enough to be efficient about stack allocations
of objects inside loops anyhow (which I measured).
Helps #6042.
|
|
|
| |
That optimization uncovered some LLVM and Binaryen bugs.
|
|
|
|
|
| |
The problem was if you construct a try expression which references a nonexistent tag in
one of its catch blocks, the validation code successfully identified the null pointer but
then proceeded to try to read from it.
|
|
|
|
|
|
|
|
|
| |
Factor the static assertions for transfer functions out into a new
transfer-function.h header. The concept requires the `getDependents` method to
return an input range of basic blocks, and to satisfy that requirement, fix up
_indirect_ptr_iterator in cfg-impl.h so that it is a proper iterator. Remove
part of the lattice fuzzer that was using a placeholder transfer function in a
way that does not satisfy the new type constraints; most of that code will be
overhauled in the future anyway.
|
|
|
| |
Also shorten various names in the implementation to improve readability.
|
|
|
|
| |
There is no header for this source file and its contents are not used anywhere.
It will be easy to reintroduce the sign lattice in the future if we need it.
|
|
|
|
| |
Move the powerset lattices out of lattice.h, which now only contains the Lattice
concept, to their own dedicated header in a new analysis/lattices directory.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If we export a function that just calls another function, we can export that one
instead. Then the one in the middle may be unused,
function foo() {
return bar();
}
export foo; // can be an export of bar
This saves a few bytes in rare cases, but probably more important is that it saves
the trampoline, so if this is on a hot path, we save a call.
Context: emscripten-core/emscripten#20478 (comment)
In general this is not needed as inlining helps us out by inlining foo() into the
caller (since foo is tiny, that always ends up happening). But exports are a case
the inliner cannot handle, so we do it here.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Replace the static assertions ensuring that Lattice types have the necessary
operations with a C++20 concept called `Lattice`. To avoid name conflicts with
the new concept, rename existing type parameters named `Lattice` to `L`. When
not building with C++20, `Lattice` is a macro that resolves to `typename` so the
code continues compiling and has the same behavior, but without any eager checks
of the requirements on lattices.
Add a new C++20 builder to CI to ensure that future changes compile with both
C++17 and C++20. Once we switch to C++20 by default, the new builder can be
removed. Update the lint builder to use a recent clang-format that understands
concepts.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
For example,
(global $a (struct.new $A))
(global $b (struct.new $B (global.get $A)))
=>
(global $b (struct.new $B (struct.new $A)))
and the global $a is now unused. This is valid if $a has no other uses.
This saves a little in code size, but should not really help otherwise, as we
already look through immutable global.get operations in important
optimizations. But the code size may matter if there are many such single-
use globals.
|
|
|
| |
Adds a function to filter branch and return instructions from being included in potential outlining candidates.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Type annotations on multivalue blocks (and loops, ifs, and trys) are type
indices that refer to function types in the type section. For these type
annotations, the identities of the function types does not matter. As long as
the referenced type has the correct parameters and results, it will be valid to
use.
Previously, when collecting module types, we always used the "default" function
type for multivalue control flow, i.e. we used a final function type with no
supertypes in a singleton rec group. However, in cases where the program already
contains another function type with the expected signature, using the default
type is unnecessary and bloats the type section.
Update the type collecting code to reuse existing function types for multivalue
control flow where possible rather than unconditionally adding the default
function type. Similarly, update the binary writer to use the first heap type
with the required signature when emitting annotations on multivalue control flow
structures. To make this all testable, update the printer to print the type
annotations as well, rather than just the result types. Since the parser was not
able to parse those newly emitted type annotations, update the parser as well.
|
|
|
|
|
|
|
| |
Remove the code that avoided such locals.
To avoid much new trapping, add logic to set a value to such locals if they have
accesses that are not dominated by a set already. Also in makeTrivial only
rarely emit a local.get of a non-nullable local (prefer a constant).
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The pass was written before we had relevant code in module locations, but now
with GC we can have global.gets of more things.
The scanner did not run on global code in a way that is not a problem yet, but
will be for a later PR I'll open. It will be tested there. That is, right now there is no
optimization that is confused by the fact that we did not scan code at the module
level, but the next PR will do that.
The use modifier did not run on global code either, which was an actual missed
optimization opportunity: There are cases where we want to modify a global.get
to point to another one, and such a get might be in global code, not just in a
function. A test is added for that.
|
|
|
|
|
| |
Adds a general purpose walker named FilterStringifyWalker, intended to walk control flow and take note of whether any of the expressions satisfy the condition.
Also includes an << overload for SuffixTree::RepeatedSubstring to make debugging easier.
|
| |
|
|
|
|
| |
With this, the fuzzer can replace e.g. an eq expression with a specific struct type,
because now it is away that struct types have eq as their ancestor.
|
|
|
|
|
|
|
| |
Previously makeTrappingRefUse would add a local on demand if one was
missing for the type, and add a tee for it. This PR moves that logic to
makeLocalGet so that we get those benefits any time we want to emit a
local.get of a local type that does not exist (including from
makeTrappingRefUse which calls makeLocalGet).
|
|
|
|
| |
A later PR will add getSuperType which will mean "get the general super type -
either declared, or not".
|
| |
|
|
|
|
| |
Correctly use the output memory's index type when generating the __write_profile
function. Requires moving some code around, but is a very small fix.
|
|
|
|
| |
Without marking us as having optimized we didn't refinalize, and broke validation.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add a new pass that analyzes the module to find the minimal subtyping relation
that is necessary to maintain the validity and semantics of the program and
rewrites the types to use this minimal relation. Besides eliminating references
to otherwise-unused intermediate types, this optimization should unlock
significant additional optimizing power in other type optimizations that are
constrained by having to maintain supertype validity, since after this new
optimization there are fewer and more general supertypes.
The analysis works by visiting each expression and module element to collect the
subtypings that are required to maintain its validity, then, using that as a
starting point, iteratively adding new subtypings required by type definitions
and casts until reaching a fixed point.
|
|
|
|
|
|
|
|
| |
Empty, passive element segments were always emitted as having `func` type
because all their elements trivially were RefFunc (because they have no
elements) and because we were incorrectly checking table types if they existed
instead of the element segment's type directly to see if it was non-func.
Fix the bug by checking each element segment's type directly and add a test.
|
|
|
|
|
| |
All logging/instrumentation passes need to do this, to avoid us using stale
global effects that are too low (too high is not optimal either, but at least it
cannot cause bugs).
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
This PR is part of a series that adds basic support for the typed continuations proposal.
This PR relaxes the restriction that tags must not have results , only params. Tags with
results must not be used for exception handling and are only allowed if the typed
continuations feature is enabled.
As a minor point, this PR also changes the printing of tags without params: To make the
presentation consistent, (param) is omitted when printing a tag.
|
|
|
|
|
|
|
| |
This PR is part of a series that adds basic support for the [typed continuations
proposal](https://github.com/wasmfx/specfx).
This particular PR simply extends `FeatureSet` with a corresponding entry for
this proposal.
|
|
|
| |
Adds a std::variant to represent the context of why a unique symbol was inserted in the stringified module. This allows us to pass necessary contextual data to subclasses of StringifyWalker in a structured manner.
|
|
|
|
| |
properly (#5994)
|
|
|
| |
This PR adds a StringProcessor struct intended to hold functions that filter vectors of SuffixTree::RepeatedSubstring, and a test of its first functionality, removing overlapping repeated substrings.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
(#5989)
Fixes #5983: The testcase from there is used here in a new testcase
remove-unused-brs_levels in which we check if we are willing to unconditionally
do a division operation. Turning an if with an arm that does a division into a
select, which always does the division, is almost 5x slower, so we should probably
be extremely careful about doing that.
I took some measurements and have some suggestions for changes in this PR:
* Raise the cost of div/rem to what I measure on my machine, which is 5x slower
than an add, or worse.
* For some reason we added the if arms rather than take the max of them, so
fix that. This does not help the issue, but was confusing.
* Adjust TooCostlyToRunUnconditionally in the pass from 9 to 8 (this helps
balance the last point).
* Use half that value when not optimizing for size. That is, we allow only 4 extra
unconditional work normally, and 8 in -Os, and when -Oz then we allow any
extra amount.
Aside from the new testcases, some existing ones changed. They all appear to
change in a reasonable way, to me.
We should perhaps go even further than this, and not even run a division
unconditionally in -Os, but I wasn't sure it makes sense to go that far as
other benchmarks may be affected. For now, this makes the benchmark in
#5983 run at full speed in -O3 or -Os, and it remains slow in -Oz. The
modified version of the benchmark that only divides in the if (no other
operations) is still fast in -O3, but it become slow in -Os as we do turn that
if into a select (but again, I didn't want to go that far as to overfit on that one
benchmark).
|
|
|
|
|
|
|
|
| |
This fixes some outdated comments and typos in Asyncify and improves
some other comments. This tries to make code comments more readable by
making them more accurate and also by using the three state (normal,
unwinding, and rewinding) consistently.
Drive-by fix: Typo fixes in SimplifyGlobals and wasm-reduce option.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
```wast
(if (result i32)
(expr0)
(i32.const 1)
(expr1)
)
```
can be written as
```wast
(i32.or
(expr0)
(expr1)
)
```
Also this removes some unused variables and methods.
This also adds an optimization for
```wast
(i32.eqz
(global.get $__asyncify_state)
)
```
in `--mod-asyncify-always-and-only-unwind` to fix an unexpected
regression caused by this.
|
|
|
|
|
| |
C++20 will automatically generate an operator== with reversed operand order,
which is ambiguous with the written operator== when one argument is marked
const and the other isn't.
|
|
|
|
|
|
| |
The parser previously parsed labels and could attach them to control flow
structures, but did not maintain the context necessary to correctly parse
branches. Support parsing labels as both names and indices in IRBuilder,
handling shadowing correctly, and use that support to implement parsing of br.
|
|
|
|
|
|
| |
Just like we do with other casts, refine the cast type to be the greatest lower
bound of its previous cast type and its input type. The difference is that the
output type of ref.test remains i32, but it's still useful to retain more
precise type information.
|
|
|
|
| |
Instead of just reporting the reason and line + column, also log out the element
the error occurred at.
|
|
|
|
|
|
| |
Move the topological sort from the constructor to a separate method. This CRTP
utility calls into its subclass to query supertype relationships, but doing so
in the base class constructor means that the subclass has not been fully
initialized yet, which can cause problems.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If we see A->f0 = A->f0 then we might be copying fields not only between
instances of A but also of any subtypes of A, and so if some subtype has
value x then that x might now have reached any other subtype of A
(even in a sibling type, so long as A is their parent).
We already thought we were handling that, but the mechanism we used to
do so (copying New info to Set info, and letting Set info propagate) was not
enough.
Also add a small constructor to save the work of computing subTypes again.
Add TODOs for some cases that we could optimize regarding copies but
do not, yet.
|
| |
|
|
|
| |
Like table.set, it can modify a table.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
local2stack removes a pair of
local.set 0
local.get 0
when that set is not used anywhere else: whatever value is put into the local,
we can just leave it on the stack to replace the get. However, we only handled
actual uses of the set which we checked using LocalGraph. There may be code
that does not actually use the set local, but needs that set purely for validation
reasons:
local.set 0
local.get 0
block
local.set 0
end
local.get
That last get reads the value set in the block, so the first set is not used by it.
But for validation purposes, the inner set stops helping at the block end, so
we do need that initial set.
To fix this, check for gets that need our set to validate before removing any.
Fixes #5917
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
(#5968)
Apparently $N (e.g. FooClass$5) is a convention in Java for anonymous classes, so our
$N that we use to disambiguate could be confusing. As the way we disambiguate does
not matter, switch to using _N. This PR does that in both TypeSSA and NameTypes.
Also make NameTypes "lint" names as it goes. That pass tries to give types nice names,
leaving existing ones that seem ok, and renaming long or unnamed ones. This PR makes
it aware of the _N notation and it tries to remove it, if removing it does not cause a
collision. An example of how that helps is if TypeSSA creates a subtype $Foo_0 and then
we manage to remove $Foo, then we can use the shorter name for the subtype.
|
|
|
|
|
|
| |
Add a `visitFunctionStart` function to IRBuilder and make it responsible for
setting the function's body when the context is closed. This will simplify
outlining, will be necessary to support branches to function scope properly, and
removes an extra block around function bodies in the new wat parser.
|
|
|
| |
Parse loops in the new wat parser and add support for them to the IRBuilder.
|
|
|
|
|
|
| |
Somewhat counterintuitively, the text syntax for a folded `if` allows any number
of folded instructions in the condition position, not just one. Update the
corresponding `foldedinsts` parsing function to parse arbitrary sequences of
folded instructions and add a test.
|