| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
| |
This reverts commit 9090ce56fcc67e15005aeedc59c6bc6773220f11.
This has the effect of once more propagating string constants from
globals to other places (and from non-globals too), which is useful
for various optimizations even if it isn't useful in the final output.
To fix the final output problem, #6257 added a pass that is run at the
end to collect string.const to globals, which allows us to once more
propagate strings in the optimizer, now without a downside.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This pass finds all string.const and creates globals for them. After this transform, no
string.const appears anywhere but in a global, and each string appears in one global
which is then global.get-ed everywhere.
This avoids overhead in VMs where executing a string.const is an allocation, and is
also a good step towards imported strings. For that, this pass will be extended from
gathering to a full lowering pass, which will first gather into globals as this pass does,
and then turn each of those globals with a string.const into an imported externref.
(For that reason this pass is in a file called StringLowering, as the two passes will
share much of their code, and the larger pass should decide the name I think.)
This pass runs in -O2 and above. Repeated executions have no downside (see
details in code).
|
|
|
|
| |
Specifically offsets larger than 2^32 which were being interpreted
misinterpreted here as very large int64_t values.
|
|
|
|
|
|
| |
The previous name feels too verbose and unwieldy.
This also removes the "new-to-old EH" placeholder. I think it'd be
better to add it back when it is actually added.
|
| |
|
|
|
|
|
|
| |
Rather than `(pop valtype*)`, use `(pop valtype)`, where `valtype` is now
allowed to be a tuple. This will make it possible to parse un-folded multivalue
pops in the new text parser. The alternative would have been to put an arity in
the syntax like we have for other tuple instructions, but that's much uglier.
|
|
|
|
| |
Instead of e.g. `(i32 i32)`, use `(tuple i32 i32)`. Having a keyword to
introduce the s-expression is more consistent with the rest of the language.
|
|
|
|
|
|
|
|
| |
This adds support `CFGWalker` for the new EH instructions (`try_table`
and `throw_ref`). `CFGWalker` is used by many different passes, but in
the same vein as #3494, this adds tests for `RedundantSetElimination`
pass. `rse-eh.wast` file is created from translated and simplified
version of `rse-eh-old.wast`, but many tests were removed because we
don't have special `catch` block or `delegate` anymore.
|
|
|
|
|
|
| |
They might trap. Leave that for RemoveUnusedModuleElements.
Fixes #6230
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
An out of bounds active segment traps during startup, which is an effect we must
preserve.
To avoid a regression here, ignore this in TNH mode (where the user assures us
nothing will trap), and also check if a segment will trivially be in bounds and not
trap (if so, it can be removed).
Fixes the remove-unused-module-elements part of #6230
The small change to an existing testcase made a segment there be in bounds,
to avoid this affecting it. Tests for this are in a new file.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This translates the old Phase 3 EH instructions, which include `try`,
`catch`, `catch_all`, `delegate`, and `rethrow`, into the new EH
instructions, which include `try_table` (with `catch` / `catch_ref` /
`catch_all` / `catch_all_ref`) and `throw_ref`, passed at the Oct 2023
CG meeting.
This translator can be used as a standalone tool by users of the
previous EH toolchain to generate binaries for the new spec without
recompiling, and also can be used at the end of the Binaryen pipeline to
produce binaries for the new spec while the end-to-end toolchain
implementation for the new spec is in progress.
While the goal of this pass is not optimization, this tries to a little
better than the most naive implementation, namely by omitting a few
instructions where possible and trying to minimize the number of
additional locals, because this can be used as a standalone translator
or the last stage of the pipeline while we can't post-optimize the
results because the whole pipeline (-On) is not ready for the new EH.
|
|
|
|
|
| |
This causes overhead atm since in VMs executing a string.const will actually allocate a
string, and more copies means more allocations. For now, just do not add more. This
required changes to two passes: SimplifyGlobals and Precompute.
|
| |
|
|
|
|
|
|
|
| |
This renames all existing EH lit tests with filenames `*eh*` to
`*eh-old*`. This is a prep work so that we can add tests for the new EH
spec using `*eh*`. The reason I'm trying to split old and new EH test
files is we don't support fuzzing for the new EH yet and I wouldn't want
to exclude old EH tests from fuzzing too because of that.
|
|
|
|
|
|
|
| |
We already applied such globals to other globals, but can do the same to offsets
of data and element segments.
Suggested in #6220
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
E.g.
(i32.add
(select
(i32.const 100)
(i32.const 200)
(..condition..)
)
(i32.const 50)
)
;; =>
(select
(i32.const 150)
(i32.const 250)
(..condition..)
)
We cannot fully precompute the select, but we can "partially precompute" it, by precomputing
its arms using the parent.
This may require looking several steps up the parent chain, which is an awkward operation in
our simple walkers, so to do it we capture stacks of parents and operate directly on them. This
is a little slower than a normal walk, so only do it when we see a promising select, and only in
-O2 and above (this makes the pass 7% or so slower; not a large cost, but best to avoid it in
-O1).
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We tested --generate-global-effects --vacuum and such, but not
--generate-global-effects -O3 or the other -O flags. Unfortunately, our
targeted testing missed a bug because of that. Specifically, we have special
logic for -O flags to make sure the passes they expand into run with the
proper opt and shrink levels, but that logic happened to also interfere with
global effect computation. It would also interfere with allowing GUFA info
or other things to be stored on the side, which we've proposed. This PR
fixes that + future issues.
The fix is to just allow a pass runner to execute more than once. We thought
to avoid that and assert against it to keep the model "hermetic" (you create
a pass runner, you run the passes, and you throw it out), which feels nice in
a way, but it led to the bug here, and I'm not sure it would prevent any other
ones really. It is also more code. It is simpler to allow a runner to execute more
than once, and add a method to clear it. With that, the logic for -O3 execution
is both simpler and does not interfere with anything but the opt and shrink
level flags: we create a single runner, give it the proper options, and then keep
using that runner + those options as we go, normally.
|
|
|
|
|
|
|
|
|
| |
The new wat parser is much more strict than the legacy wat parser; the latter
accepts all sorts of things that the spec does not allow. To ease an eventual
transition to using the new wat parser by default, update the tests to use the
standard text format in many places where they previously did not. We do not yet
have a way to prevent new errors from being introduced into the test suite, but
at least there will now be many fewer errors when it comes time to make the
switch.
|
|
|
|
|
|
|
|
|
|
|
|
| |
We previously supported (and primarily used) a non-standard text format for
conditionals in which the condition, if-true expression, and if-false expression
were all simply s-expression children of the `if` expression. The standard text
format, however, requires the use of `then` and `else` forms to introduce the
if-true and if-false arms of the conditional. Update the legacy text parser to
require the standard format and update all tests to match. Update the printer to
print the standard format as well.
The .wast and .wat test inputs were mechanically updated with this script:
https://gist.github.com/tlively/85ae7f01f92f772241ec994c840ccbb1
|
|
|
|
|
| |
Update the legacy text parser and all tests to use the standard text format for shared memories, e.g. `(memory $m 1 1 shared)` rather than `(memory $m (shared 1 1))`. Also remove support for non-standard in-line "data" or "segment" declarations.
This change makes the tests more compatible with the new text parser, which only supports the standard format.
|
|
|
|
|
|
| |
These type annotations were removed during the development of the GC proposal,
but we maintained support for parsing them to ease the transition. Now that GC
is shipped, remove support for the non-standard annotation and update our tests
accordingly.
|
|
|
|
|
|
|
|
|
|
| |
Previously the lit test update script interpreted module names as the names of
import items and export names as the names of export items, but it is more
precise to use the actual identifiers of the imported or exported items as the
names instead.
Update update_lit_checks.py to use a more correct regex to match names and to
correctly use the identifiers of import and export items as their names. In some
cases this can improve the readability of test output.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We previously supported a non-standard `(func "name" ...` syntax for declaring
functions exported with the quoted name. Since that is not part of the standard
text format, drop support for it, replacing it with the standard `(func $name
(export "name") ...` syntax instead.
Also replace our other usage of the quoted form in our text output, which was
where we quoted names containing characters that are not allowed to appear in
standard names. To handle that case, adjust our output from `"$name"` to
`$"name"`, which is the standards-track way of supporting such names. Also fix
how we detect non-standard name characters to match the spec.
Update the lit test output generation script to account for these changes,
including by making the `$` prefix on names mandatory. This causes the script to
stop interpreting declarative element segments with the `(elem declare ...`
syntax as being named "declare", so prevent our generated output from regressing
by counting "declare" as a name in the script.
|
|
|
|
|
| |
Parse `tuple.make`, `tuple.extract`, and `tuple.drop`. Also slightly improve the
way we break up tuples into individual elements in IRBuilder by using a
`local.tee` instead of a block containing a `local.set` and `local.get`.
|
|
|
|
|
|
|
|
|
| |
In Binaryen IR, we allow single `Drop` expressions to drop multiple values
packaged up as a tuple. When using IRBuilder to rebuild IR containing such a
drop, it previously treated the drop as a normal WebAssembly drop that dropped
only a single value, producing invalid IR that had extra, undropped values. Fix
the problem by preserving the arity of `Drop` inputs in IRBuilder. To avoid
bloating the IR, thread the size of the desired value through IRBuilder's pop
implementation so that tuple values do not need to be split up and recombined.
|
|
|
|
| |
Existing convention uses _@once@_ but we also use @ for class separation.
It is cleaner&more future proof to use something other convention like _<once>_.
|
|
|
|
|
|
|
|
| |
Once support for tuple.extract lands in the new WAT parser, this arity immediate
will let the parser determine how many values it should pop off the stack to
serve as the tuple operand to `tuple.extract`. This will usually coincide with
the arity of a tuple-producing instruction on top of the stack, but in the
spirit of treating the input as a proper stack machine, it will not have to and
the parser will still work correctly.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The patch puts a new guardrail that will only hoist the field
if it is initialized with the owner class.
The constant hoisting optimization in J2CL pass relies on the
assumption that clinit that will initialize the field will be
executed before the read of the field. That means the field
that is optimized is within the same class:
class Foo {
public static final Object field = new Object();
}
Although it is possible to observe the initial value, that is
not intention of the developer (which the point of the
optimization).
However can also see a similar pattern in following:
class Foo {
public static Object field;
}
class Zoo {
static {
Foo.field = new Object();
}
}
Currently the pass also optimizes it as well since the field
is only initialized once and by a clinit. However Zoo clinit
is not guaranteed to be run before Foo.field access so it is
less safe to speculate on the intention of the developer here
hence it is not worth the risk.
FWIW, we haven't seen this issue. But this is something we
are also guarding in Closure Compiler so I decided it is
worthwhile to do here as well.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We previously overloaded `drop` to mean both normal drops of single values and
also drops of tuple values. That works fine in the legacy text parser since it
can infer parent-child relationships directly from the s-expression structure of
the input, so it knows that a drop should drop an entire tuple if the
tuple-producing instruction is a child of the drop. The new text parser,
however, is much more like the binary parser in that it uses instruction types
to create parent-child instructions. The new parser always assumes that `drop`
is meant to drop just a single value because that's what it does in WebAssembly.
Since we want to continue to let `Drop` IR expressions consume tuples, and since
we will need a way to write tests for that IR pattern that work with the new
parser, introduce a new pseudoinstruction, `tuple.drop`, to represent drops of
tuples. This pseudoinstruction only exists in the text format and it parses to
normal `Drop` expressions. `tuple.drop` takes the arity of its operand as an
immediate, which will let the new parser parse it correctly in the future.
|
|
|
|
|
|
|
|
|
|
| |
Previously, the number of tuple elements was inferred from the number of
s-expression children of the `tuple.make` expression, but that scheme would not
work in the new wat parser, where s-expressions are optional and cannot be
semantically meaningful.
Update the text format to take the number of tuple elements (i.e. the tuple
arity) as an immediate. This new format will be able to be implemented in the
new parser as follow-on work.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This PR creates a new pass to optimize J2CL specific patterns
that would otherwise difficult to recognize/prove generically
by other binaryen passes.
The pass currently handles fields what we call as "constant-like".
These fields are fields initialized once and unconditionally through
"clinit" function and technically they do have 2 observable states;
- initial null/0 state
- initialized state.
However you can only observe initial null/0 state in contrived examples,
not in real world/correct applications.
This pass moves such "clinit" initialized fields to global initialization.
Above pattern also matches other lazy init construct like String and Class
literals (which binaryen already reduces to constant expressions). So
the pass is generalized to include them as well. (by matching any functions
with the name pattern "_@once_")
In order for this pass to be effective:
1. It needs to run between O3 passes
2. We need to stop inlining of "once" functions.
Stopping inlining of the once functions are important to preserve their
structure. This both helps existing OnceReducer pass and new J2CL pass to
be a lot more effective. Also it is not useful to inline these functions
as by defintion they only executed once. This could be achieved by passing
no-inline filter.
Although the inlining is generally disabled for these functions, it is
still needed for some cases since inliner is effectively responsible for
removal of the once functions that are simplified into empty or simple
delegating functions. For this reason, the pass will rename such trivial
function so no-inline filter will no longer match them.
Also note that after all optimizations completed, it does make sense to
have a final stage where the "partial inline" of all once functions are
allowed. This will speed them up by moving the initialization check to
call-site.
|
|
|
|
| |
Those fields should be copied together with all the rest of the metadata that
already is. This was just missed in the prior PR.
|
|
|
|
| |
We don't have `*.fromasm` files anymore. Also `BIN_DIR` and
`WATERFALL_BUILD_DIR` variables don't seem to be used as well.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Mixes up the number of results, params, and operands used in
call_indirect instructions that are outlined. Also adds a function
identifier to the original call_indirect test to improve test output
readability.
Reviewers: tlively
Reviewed By: tlively
Pull Request: https://github.com/WebAssembly/binaryen/pull/6152
|
|
|
|
|
|
|
|
|
|
| |
Adds support for the loop instruction to be outlined and a test showing a repeat loop being outlined.
Reviewers: tlively
Reviewed By: tlively
Pull Request: https://github.com/WebAssembly/binaryen/pull/6141
|
|
|
|
|
|
|
|
|
|
| |
Changes the controlFlowQueue used in stringify-walker to push values of Expression*, This ensures that we walk the Wasm module in the same order, regardless of whether the control flow expression is outlined.
Reviewers: tlively
Reviewed By: tlively
Pull Request: https://github.com/WebAssembly/binaryen/pull/6139
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Any function can now be annotated as not to be inlined fully (normally) or not to be
inlined partially. In the future we'll want to read those annotations from the proposed
wasm metadata section on code hints, and from wat text as well, but for now add
trivial passes that set those fields based on function name wildcards, e.g.:
--no-inline=*leave-alone* --inlining
That will not inline any function whose name contains "leave-alone" in the name.
--no-inline disables all inlining (full or partial) while --no-full-inline and
--no-partial-inline affect only full or partial inlining.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
A trivial call is something like a function that just calls another immediately,
function foo(x, y) {
return bar(y, 15);
}
We can inline those and expect to benefit in most cases, though we might
increase code size slightly. Hence it makes sense to inline such cases, even
though in general we are careful and do not inline functions with calls in
them; a "trampoline" like that likely has most of the work in the call itself,
which we can avoid by inlining.
Suggested based on findings in Java.
|
|
|
| |
Adds support for call_indirect to wasm-ir-builder. Tests this works by outlining a sequence including call_indirect.
|
|
|
| |
Adds two tests, creates an outlined function that returns a single value and creates an outlined function that returns multivalue.
|
|
|
| |
Adds tests that ensure outlining is skipping repeat sequences that include local.get, local.set, br, and return instructions.
|
|
|
|
|
|
|
|
|
|
|
| |
Besides If, no control flow structure consumes values from the stack. Fix a
bug in IRBuilder that was causing it to pop control flow children. Also fix a
follow on bug in outlining where it did not make the If condition available on
the stack when starting to visit an If. This required making push() part of
the public API of IRBuilder.
As a drive-by, also add helpful debug logging to IRBuilder.
Co-authored-by: Ashley Nelson <nashley@google.com>
|
| |
|
|
|
|
|
|
| |
Checking a couple of testing TODOs off and adding more tests of the outlining pass for outlining:
- a sequence at the beginning of an existing function
- a sequence that is outlined into a function that takes no arguments
- multiple sequences from the same source function into different outlined functions
|
|
|
|
|
|
|
| |
Finish the transfer functions for all expressions except for string
instructions, exception handling instructions, tuple instructions, and branch
instructions that carry values. The latter require more work in the CFG builder
because dropping the extra stack values happens after the branch but before the
target block.
|
|
|
|
|
|
| |
call.without.effects implies a call to the function reference in the last parameter,
so the values sent in the other parameters must be taken into account when
computing LUBs for refining arguments, otherwise we might refine so much that
the intrinsic call no longer validates.
|
|
|
|
|
|
|
|
| |
We had an assert there that was wrong. In fact the assert is just in one of two code paths,
and an optional one: the end situation is we have an expression and a constant to add to it,
and the assert was in the case that the expression is a Const so we can do the add at
compile time (the other code path does the add at runtime). This code path is optional as
Precompute would do such compile-time addition anyhow, but it is nice to fix and leave that
path so that this pass emits fully optimal code.
|
|
|
| |
Adds an outlining pass that performs outlining on a module end to end, and two tests.
|
|
|
|
| |
Also add testcases to be comprehensive and notice changes if we ever decide to
modify that behavior.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
interactions with a parent (#6089)
We had a simple rule that if we reach an expression twice then we give up, which makes
sense for say a block: if one allocation flows out of it, then another can't - it would get
mixed in with the other one, which is a case we don't optimize. However, there are
cases where a parent has multiple children and different interactions with them, like
a struct.set: the reference child does not escape, but the value child does. Before this
PR if we reached the value child first, we'd mark the parent as seen, and then the reference
child would see it isn't the first to get here, and not optimize.
To fix this, reorder the code to handle this case. The manner of interaction between the
child and the parent decides whether we mark the parent as seen and to be further
avoided.
Noticed by the determinism fuzzer, since the order of analysis mattered here.
|