| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
| |
At the Oct hybrid CG meeting, we decided to add back `exnref`, which was
removed in 2020:
https://github.com/WebAssembly/meetings/blob/main/main/2023/CG-10.md
The new version of the proposal reflected in the explainer:
https://github.com/WebAssembly/exception-handling/blob/main/proposals/exception-handling/Exceptions.md
While adding support for `exnref` in the current codebase which has all
GC subtype hierarchies, I noticed we might need `noexn` heap type for
the bottom type of `exn`. We don't have it now so I just set it to 0xff
for the moment.
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Adds support for the loop instruction to be outlined and a test showing a repeat loop being outlined.
Reviewers: tlively
Reviewed By: tlively
Pull Request: https://github.com/WebAssembly/binaryen/pull/6141
|
|
|
|
|
|
|
|
|
|
|
|
| |
- Change outlining debug logs to use std::cerr
- Add controlFlowQueue push log
- Fix build error with wasm-ir-builder log's use of ShallowExpression
Reviewers: tlively
Reviewed By: tlively
Pull Request: https://github.com/WebAssembly/binaryen/pull/6140
|
|
|
|
|
|
|
|
|
|
| |
Changes the controlFlowQueue used in stringify-walker to push values of Expression*, This ensures that we walk the Wasm module in the same order, regardless of whether the control flow expression is outlined.
Reviewers: tlively
Reviewed By: tlively
Pull Request: https://github.com/WebAssembly/binaryen/pull/6139
|
| |
|
|
|
|
|
|
|
| |
These module fields are especially complex to parse because they contain both
nontrivial types and instructions, so their parsing logic needs to be spread out
across the ParseDecls, ParseModuleTypes, and ParseDefs phases of parsing. This
applies to in-line elements in table definitions as well, which means we need to
be able to match a table to its in-line element segment across multiple phases.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Any function can now be annotated as not to be inlined fully (normally) or not to be
inlined partially. In the future we'll want to read those annotations from the proposed
wasm metadata section on code hints, and from wat text as well, but for now add
trivial passes that set those fields based on function name wildcards, e.g.:
--no-inline=*leave-alone* --inlining
That will not inline any function whose name contains "leave-alone" in the name.
--no-inline disables all inlining (full or partial) while --no-full-inline and
--no-partial-inline affect only full or partial inlining.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
A trivial call is something like a function that just calls another immediately,
function foo(x, y) {
return bar(y, 15);
}
We can inline those and expect to benefit in most cases, though we might
increase code size slightly. Hence it makes sense to inline such cases, even
though in general we are careful and do not inline functions with calls in
them; a "trampoline" like that likely has most of the work in the call itself,
which we can avoid by inlining.
Suggested based on findings in Java.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Remove hardcoded paths for globals/functions/etc. in favor of general code
paths that support all the module elements uniformly. As a result of that, we
now support all parts of wasm, such as tables and element segments, that
we didn't before.
This refactoring is NFC aside from adding functionality. Note that this reduces
the size of wasm-metadce by 10% while increasing its functionality - the
benefits of writing generic code.
To support this, add some trivial generic helpers to get or iterate over module
elements using their kind in a dynamic manner. Using them might make
wasm-metadce slightly slower, but I can't measure any difference.
|
|
|
|
|
|
|
|
|
|
|
|
| |
Avoid adding suffixes when we don't need them to keep names unique.
As background, the suffixes are not used by emcc at all, so they are just
for internal use in the tool. How that works is that metadce gets as input
the list of things the user cares about, with names for them, so it knows
the proper names to give imports and exports, and makes up names for
other things. Those made up names will not be read by the user, so we
can make them prettier as this PR does without breaking anything.
The main benefit of this PR is to make debugging easier.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Parse the legacy v3 syntax for try/catch/catch_all/delegate in both its folded
and unfolded forms.
The first sources of significant complexity is the optional IDs after `catch`
and `catch_all` in the unfolded form, which can be confused for tag indices and
require backtracking to parse correctly.
The second source of complexity is the handling of delegate labels, which are
relative to the try's parent scope despite being parsed after the try's scope
has already started. Handling this correctly requires punching a whole big
enough to drive a truck through through both the parser and IRBuilder
abstractions.
|
|
|
| |
Fixes #6136
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Asyncify gained a way to wrap a pass so that it only runs on a given set of
functions, rather than on all functions, so the wrapper "filters" what the pass
operates on. That was useful in Asyncify as we wanted to only do work on
functions that Asyncify actually instrumented.
There is another place in the code that needs such functionality,
optimizeAfterInlining, which runs optimizations after we inline; again, we
only want to optimize on the functions we know are relevant because they
changed. To do that, move that logic out to a general place so it can be
reused. This makes the code there a lot less hackish.
While doing so make the logic only work on function-parallel passes. It
never did anyhow, but now it asserts on that. (It can't run on a general
pass because a general one does not provide an interface to affect which
functions it operates on; a general pass is entirely opaque in that way.)
|
|
|
| |
See #6088
|
|
|
|
| |
Also fix the parser to correctly error if an imported item appears after a
non-imported item and make the corresponding fix to the test.
|
|
|
|
|
|
|
|
|
|
|
|
| |
When branches target control flow structures other than blocks or loops, the
IRBuilder wraps those control flow structures with an extra block for the
branches to target in Binaryen IR. Usually that block has the same type as the
control flow structure it wraps, but when the control flow structure is
unreachable because all its bodies are unreachable, the wrapper block may still
need to have a non-unreachable type if it is targeted by branches.
Previously the wrapper block would also be unreachable in that case. Fix the bug
by tracking whether the wrapper block will be targeted by any branches and use
the control flow structure's original, non-unreachable type if so.
|
|
|
| |
Adds support for call_indirect to wasm-ir-builder. Tests this works by outlining a sequence including call_indirect.
|
|
|
|
|
|
|
|
|
|
|
| |
Besides If, no control flow structure consumes values from the stack. Fix a
bug in IRBuilder that was causing it to pop control flow children. Also fix a
follow on bug in outlining where it did not make the If condition available on
the stack when starting to visit an If. This required making push() part of
the public API of IRBuilder.
As a drive-by, also add helpful debug logging to IRBuilder.
Co-authored-by: Ashley Nelson <nashley@google.com>
|
|
|
|
|
|
| |
Checking a couple of testing TODOs off and adding more tests of the outlining pass for outlining:
- a sequence at the beginning of an existing function
- a sequence that is outlined into a function that takes no arguments
- multiple sequences from the same source function into different outlined functions
|
|
|
|
|
|
|
|
|
|
| |
This implements an idea I mentioned in the past, to extract the subtyping discovery
code out of Unsubtyping so it could be reused elsewhere. Example possible uses:
the validator could use to remove a lot of code, and also a future PR of mine will
need it. Separately from those, I think this is a nice refactoring as it makes Unsubtyping
much smaller.
This just moves the code out and adds some C++ template elbow grease as needed.
|
|
|
|
|
|
|
| |
Finish the transfer functions for all expressions except for string
instructions, exception handling instructions, tuple instructions, and branch
instructions that carry values. The latter require more work in the CFG builder
because dropping the extra stack values happens after the branch but before the
target block.
|
|
|
|
|
|
| |
call.without.effects implies a call to the function reference in the last parameter,
so the values sent in the other parameters must be taken into account when
computing LUBs for refining arguments, otherwise we might refine so much that
the intrinsic call no longer validates.
|
|
|
|
| |
Also mark array.new_elem as unimplemented as a drive-by; it previously had an
incorrect implementation.
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
|
|
| |
Allow outlining to be excluded from the command line on non-Emscripten builds.
|
|
|
|
|
|
| |
This is not needed in GUFA as it tracks local values precisely (each set is connected to
the gets that actually read from it), but in a future PR it will be useful to track local
values per index (each set is connected to all gets for that index, i.e., each local index
is a single "location").
|
|
|
|
|
|
|
|
| |
We had an assert there that was wrong. In fact the assert is just in one of two code paths,
and an optional one: the end situation is we have an expression and a constant to add to it,
and the assert was in the case that the expression is a Const so we can do the add at
compile time (the other code path does the add at runtime). This code path is optional as
Precompute would do such compile-time addition anyhow, but it is nice to fix and leave that
path so that this pass emits fully optimal code.
|
|
|
| |
Adds an outlining pass that performs outlining on a module end to end, and two tests.
|
|
|
|
|
|
|
|
|
|
|
| |
Class template argument deduction (CTAD) is a C++17 feature that allows
variables to be declared with class template types without specifying the
template parameters. Deduction guides are a mechanism by which template authors
can control how the template parameters are inferred when CTAD is used. The
Google style guide prohibits the use of CTAD except where template authors opt
in to supporting it by providing explicit deduction guides. For compatibility
with users adhering to Google style, set the compiler flag to check this
condition and add the necessary deduction guides to make the compiler happy
again.
|
|
|
|
|
|
| |
The new wat parser parses block, if, loop, then, and else keywords directly
rather than depending on code generated from gen-s-parser.py. Filter these
keywords out in gen-s-parser.py when generating the new wat parser and delete
the stub functions that the removed generated code used to depend on.
|
| |
|
|
|
|
| |
Also add testcases to be comprehensive and notice changes if we ever decide to
modify that behavior.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
interactions with a parent (#6089)
We had a simple rule that if we reach an expression twice then we give up, which makes
sense for say a block: if one allocation flows out of it, then another can't - it would get
mixed in with the other one, which is a case we don't optimize. However, there are
cases where a parent has multiple children and different interactions with them, like
a struct.set: the reference child does not escape, but the value child does. Before this
PR if we reached the value child first, we'd mark the parent as seen, and then the reference
child would see it isn't the first to get here, and not optimize.
To fix this, reorder the code to handle this case. The manner of interaction between the
child and the parent decides whether we mark the parent as seen and to be further
avoided.
Noticed by the determinism fuzzer, since the order of analysis mattered here.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This new optimization will eventually weaken casts by generalizing (i.e.
un-refining) their output types. If a cast is weakened enough that its output
type is a supertype of its input type, the cast will be able to be removed by
OptimizeInstructions.
Unlike refining cast inputs, generalizing cast outputs can break module
validation. For example, if the result of a cast is stored to a local and the
cast is weakened enough that its output type is no longer a subtype of that
local's type, then the local.set after the cast will no longer validate. To
avoid this validation failure, this optimization would have to generalize the
type of the local as well. In general, the more we can generalize the types of
program locations, the more we can weaken casts of values that flow into those
locations.
This initial implementation only generalizes the types of locals and does not
actually weaken casts yet. It serves as a proof of concept for the analysis
required to perform the full optimization, though. The analysis uses the new
analysis framework to perform a reverse analysis tracking type requirements for
each local and reference-typed stack value in a function.
Planned and potential future work includes:
- Implementing the transfer function for all kinds of expressions.
- Tracking requirements on the dynamic types of each location to generalize
allocations as well.
- Making the analysis interprocedural and generalizing the types of more
program locations.
- Optimizing tuple-typed locations.
- Generalizing only those locations necessary to eliminate at least one cast
(although this would make the anlysis bidirectional, so it is probably better
left to separate passes).
|
|
|
|
|
|
|
|
| |
Because we currently strip some data segments (i.e. EM_JS strings)
during `--post-emscripten` this is too late as `--separate-data-segments`
always runs in `wasm-emscripten-finalize`.
Once emscripten switches over to using the pass directly we can remove
the support from `wasm-emscripten-finalize`
|
|
|
|
|
|
|
|
|
| |
LocalCSE is nice for large expressions, but for small things it has always been of
unclear benefit since VMs also do GVN/CSE anyhow. So we are likely not speeding
anything up, but hopefully we are reducing code size at least. Doing LocalCSE on
something small like a global.get is very possibly going to increase code size,
however (since we add a tee, and since the local gets are of similar size to global
gets - depends on LUB sizes). On real-world Java code that overhead is noticeable,
so this PR makes us more careful, and we skip things of size 1 (no children).
|
|
|
|
| |
To support parsing calls, add support for parsing function indices and building
calls with IRBuilder.
|
|
|
|
| |
Update the C++20 builder to use Ubuntu 20.04 to catch problems building with its
system compiler. Also fix such a problem in wasm-fuzz-lattices.cpp.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously CFGWalker designated a particular block as the "exit" block, but it
was just the block that happened to appear at the end of the function that
returned values by implicitly flowing them out. That exit block was not tied in
any way to other blocks that might end in returns, so analyses that needed to
perform some action at the end of the function would have had to perform that
action at the end of the designated exit block but also separately at any return
instruction.
Update CFGWalker to make the exit block a synthetic empty block that is a
successor of all other blocks tthat implicitly or explicitly return from the
function in case there are multiple such blocks, or to make the exit block the
single returning block if there is only one. This means that analyses will only
perform their end-of-function actions at the end of the exit block rather than
additionally at every return instruction.
|
|
|
| |
Helps #5951
|
|
|
|
|
|
|
|
|
| |
Combine the `transfer` and `getDependents` methods of a transfer function so
that a transfer function only has to implement `transfer`, which now returns a
range of basic blocks that may need to be re-analyzed.
To make it easier to implement the returned basic block range, change the
requirement so that it provides iterators to `const BasicBlock*` rather than
`BasicBlock`. This allows us to entirely remove cfg-impl.h.
|
|
|
|
|
|
|
|
|
| |
Previously, modifying a single vector element of a `Shared<Vector>` element
required materializing a full vector to do the join. When there is just a single
element to update, materializing all the other elements with bottom value is
useless work. Add a `Vector<L>::SingletonElement` utility that represents but
does not materialize a vector with a single non-bottom element and allow it to
be passed to `Vector<L>::join`. Also update `Shared` and `Inverted` so that
`SingletonElement` joins still work on vectors wrapped in those other lattices.
|
|
|
|
|
|
|
| |
Remove the ability to represent the top element of the stack lattice since it
isn't necessary. Also simplify the element type to be a simple vector, update
the lattice method implementations to be more consistent with implementations in
other lattices, and make the tests more consistent with the tests for other
lattices.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The analysis framework stores a separate lattice element for each basic block
being analyzed to represent the program state at the beginning of the block.
However, in many analyses a significant portion of program state is not
flow-sensitive, so does not benefit from having a separate copy per block. For
example, an analysis might track constraints on the types of locals that do not
vary across blocks, so it really only needs a single copy of the constrains for
each local. It would be correct to simply duplicate the state across blocks
anyway, but it would not be efficient.
To make it possible to share a single copy of a lattice element across basic
blocks, introduce a `Shared<L>` lattice. Mathematically, this lattice represents
a single ascending chain in the underlying lattice and its elements are ordered
according to sequence numbers corresponding to positions in that chain.
Concretely, though, the `Shared<L>` lattice only ever materializes a single,
monotonically increasing element of `L` and all of its elements provide access
to that shared underlying element.
`Shared<L>` will let us get the benefits of having mutable shared state in the
concrete implementation of analyses without losing the benefits of keeping those
analyses expressible purely in terms of the monotone framework.
|