| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This pass finds all string.const and creates globals for them. After this transform, no
string.const appears anywhere but in a global, and each string appears in one global
which is then global.get-ed everywhere.
This avoids overhead in VMs where executing a string.const is an allocation, and is
also a good step towards imported strings. For that, this pass will be extended from
gathering to a full lowering pass, which will first gather into globals as this pass does,
and then turn each of those globals with a string.const into an imported externref.
(For that reason this pass is in a file called StringLowering, as the two passes will
share much of their code, and the larger pass should decide the name I think.)
This pass runs in -O2 and above. Repeated executions have no downside (see
details in code).
|
|
|
|
|
|
| |
The previous name feels too verbose and unwieldy.
This also removes the "new-to-old EH" placeholder. I think it'd be
better to add it back when it is actually added.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This translates the old Phase 3 EH instructions, which include `try`,
`catch`, `catch_all`, `delegate`, and `rethrow`, into the new EH
instructions, which include `try_table` (with `catch` / `catch_ref` /
`catch_all` / `catch_all_ref`) and `throw_ref`, passed at the Oct 2023
CG meeting.
This translator can be used as a standalone tool by users of the
previous EH toolchain to generate binaries for the new spec without
recompiling, and also can be used at the end of the Binaryen pipeline to
produce binaries for the new spec while the end-to-end toolchain
implementation for the new spec is in progress.
While the goal of this pass is not optimization, this tries to a little
better than the most naive implementation, namely by omitting a few
instructions where possible and trying to minimize the number of
additional locals, because this can be used as a standalone translator
or the last stage of the pipeline while we can't post-optimize the
results because the whole pipeline (-On) is not ready for the new EH.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This PR creates a new pass to optimize J2CL specific patterns
that would otherwise difficult to recognize/prove generically
by other binaryen passes.
The pass currently handles fields what we call as "constant-like".
These fields are fields initialized once and unconditionally through
"clinit" function and technically they do have 2 observable states;
- initial null/0 state
- initialized state.
However you can only observe initial null/0 state in contrived examples,
not in real world/correct applications.
This pass moves such "clinit" initialized fields to global initialization.
Above pattern also matches other lazy init construct like String and Class
literals (which binaryen already reduces to constant expressions). So
the pass is generalized to include them as well. (by matching any functions
with the name pattern "_@once_")
In order for this pass to be effective:
1. It needs to run between O3 passes
2. We need to stop inlining of "once" functions.
Stopping inlining of the once functions are important to preserve their
structure. This both helps existing OnceReducer pass and new J2CL pass to
be a lot more effective. Also it is not useful to inline these functions
as by defintion they only executed once. This could be achieved by passing
no-inline filter.
Although the inlining is generally disabled for these functions, it is
still needed for some cases since inliner is effectively responsible for
removal of the once functions that are simplified into empty or simple
delegating functions. For this reason, the pass will rename such trivial
function so no-inline filter will no longer match them.
Also note that after all optimizations completed, it does make sense to
have a final stage where the "partial inline" of all once functions are
allowed. This will speed them up by moving the initialization check to
call-site.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Any function can now be annotated as not to be inlined fully (normally) or not to be
inlined partially. In the future we'll want to read those annotations from the proposed
wasm metadata section on code hints, and from wat text as well, but for now add
trivial passes that set those fields based on function name wildcards, e.g.:
--no-inline=*leave-alone* --inlining
That will not inline any function whose name contains "leave-alone" in the name.
--no-inline disables all inlining (full or partial) while --no-full-inline and
--no-partial-inline affect only full or partial inlining.
|
|
|
| |
See #6088
|
|
|
| |
Adds an outlining pass that performs outlining on a module end to end, and two tests.
|
|
|
|
|
|
|
|
| |
Because we currently strip some data segments (i.e. EM_JS strings)
during `--post-emscripten` this is too late as `--separate-data-segments`
always runs in `wasm-emscripten-finalize`.
Once emscripten switches over to using the pass directly we can remove
the support from `wasm-emscripten-finalize`
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add a new pass that analyzes the module to find the minimal subtyping relation
that is necessary to maintain the validity and semantics of the program and
rewrites the types to use this minimal relation. Besides eliminating references
to otherwise-unused intermediate types, this optimization should unlock
significant additional optimizing power in other type optimizations that are
constrained by having to maintain supertype validity, since after this new
optimization there are fewer and more general supertypes.
The analysis works by visiting each expression and module element to collect the
subtypings that are required to maintain its validity, then, using that as a
starting point, iteratively adding new subtypings required by type definitions
and casts until reaching a fixed point.
|
|
|
|
|
|
|
| |
This PR is part of a series that adds basic support for the [typed continuations
proposal](https://github.com/wasmfx/specfx).
This particular PR simply extends `FeatureSet` with a corresponding entry for
this proposal.
|
|
|
|
|
|
|
|
| |
This fixes some outdated comments and typos in Asyncify and improves
some other comments. This tries to make code comments more readable by
making them more accurate and also by using the three state (normal,
unwinding, and rewinding) consistently.
Drive-by fix: Typo fixes in SimplifyGlobals and wasm-reduce option.
|
|
|
|
|
|
|
|
|
| |
TypeFinalization finalizes all types that we can, that is, all private types that have no
children. TypeUnFinalization unfinalizes (opens) all (private) types.
These could be used by first opening all types, optimizing, and then finalizing, as that
might find more opportunities.
Fixes #5933
|
|
|
|
|
|
|
|
|
|
|
| |
In some cases tuples are obviously not needed, such as when they are only used
in local operations and make/extract. Such tuples are not used as return values or
in control flow structures, so we might as well lower them to individual locals per
lane, which other passes can optimize a lot better.
I believe LLVM does the same with its own tuples: it lowers them as much as
possible, leaving only necessary ones.
Fixes #5923
|
|
|
|
|
| |
Now that the WasmGC spec has settled on a way of validating non-nullable locals,
we no longer need this experimental feature that allowed nonstandard uses of
non-nullable locals.
|
|
|
| |
Renaming the multimemory flag in Binaryen to match its naming in LLVM.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
GUFA refines existing casts, but does not add new casts for fear of increasing code size
and adding more cast operations at runtime. This PR adds a version that does add all
those casts, and it looks like at least code size improves rather than regresses, at least
on J2Wasm and Kotlin. That is, this pass adds a lot more casts, but subsequent
optimizations benefit enough to shrink overall code size.
However, this may still not be worthwhile, as even if code size decreases we may end
up doing more casts at runtime, and those casts might be hard to remove, e.g.:
(call $foo
(x) ;; inferred to be non-null
)
(func $foo (param (ref null $A)
=>
(call $foo
(ref.cast $A (x) ;; add a cast here
)
(func $foo (param (ref $A) ;; later pass refines here
That new cast cannot be removed after we refine the function parameter. If the
function never benefits from the fact that the input is non-null, then the cast is
wasted work (e.g. if the function only compares the input to another value).
To use this new pass, try --gufa-cast-all rather than --gufa. As with normal GUFA,
running the full optimizer afterwards is important, and even more important in
order to get rid of as many of the new casts as possible.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This pass strips all EH stuff, including EH instructions and tags, from
the input module and disables the EH feature from the features section.
1. This removes `catch` and `catch_all` blocks from the code. So
```wast
(try
(do
(some code)
)
(catch
...
)
)
```
becomes just `(some code)`. Note that all `rethrow`s will be removed
with `catch`es. Note that all `rethrow`s will be removed with `catch`es.
2. This converts 'throw (...)` into `unreachable`. Note that `rethrows
3. This removes all tags from the module, which are unused anyway after
1 and 2.
4. This removes exception handling feature from the features section.
You can use the pass with
```console
$ wasm-opt --enable-exception-handling --strip-eh INPUT -o OUTPUT
```
This is not an optimization pass, so it is not run unless you specify
the pass explicitly.
This is in effect similar to Clang's `-fignore-exceptions`, in which you
can throw but it will result in a crash and we compile away all landing
pads. This can be used for people who don't (or can't) use
`-fignore-exceptions` in their build settings or who want to compile
away `catch` blocks later.
Closes emscripten-core/emscripten#19585.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We used to have a wasm-merge tool but removed it for a lack of use cases. Recently
use cases have been showing up in the wasm GC space and elsewhere, as people are
using more diverse toolchains together, for example a project might build some C++
code alongside some wasm GC code. Merging those wasm files together can allow
for nice optimizations like inlining and better DCE etc., so it makes sense to have a
tool for merging.
Background:
* Removal: #1969
* Requests:
* wasm-merge - why it has been deleted #2174
* Compiling and linking wat files #2276
* wasm-link? #2767
This PR is a compete rewrite of wasm-merge, not a restoration of the original
codebase. The original code was quite messy (my fault), and also, since then
we've added multi-memory and multi-table which makes things a lot simpler.
The linking semantics are as described in the "wasm-link" issue #2767 : all we do
is merge normal wasm files together and connect imports and export. That is, we
have a graph of modules and their names, and each import to a module name can
be resolved to that module. Basically, like a JS bundler would do for JS, or, in other
words, we do the same operations as JS code would do to glue wasm modules
together at runtime, but at compile time. See the README update in this PR for a
concrete example.
There are no plans to do more than that simple bundling, so this should not
really overlap with wasm-ld's use cases.
This should be fairly fast as it works in linear time on the total input code. However,
it won't be as fast as wasm-ld, of course, as it does build Binaryen IR for each
module. An advantage to working on Binaryen IR is that we can easily do some
global DCE after merging, and further optimizations are possible later.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* Disable sign extension in SignExtLowering.cpp
The sign extension lowering pass would previously lower away the sign extension
instructions, but it wouldn't disable the sign extension feature, so follow-on
passes such as optimize-instructions could reintroduce sign extension
instructions.
Fix the pass to disable the sign extension feature to prevent sign extension
instructions from being reintroduced later.
* update pass description
|
|
|
|
|
| |
After this change, the only type system usable from the tools will be the
standard isorecursive type system. The nominal type system is still usable via
the API, but it will be removed entirely in a follow-on PR.
|
|
|
|
|
|
|
|
|
|
| |
(#5510)
Simply loop over the values and use tuple.make.
This also adds a lit test for ctor-eval. I found that the problem blocking us
before was the logging, which confuses the update script. As this test at
least does not require that logging, this PR adds a --quiet flag that
disables the logging, and then a lit test just works.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If a type hierarchy has abstract classes in the middle, that is, types that
are never instantiated, then we can optimize casts and other operations
to them. Say in Java that we have `AbstractList`, and it only has one
subclass `IntList` that is ever created, then any place we have an `AbstractList`
we must actually have an `IntList`, or a null. (Or, if no subtype is instantiated,
then the value must definitely be a null.)
The actual implementation does a type mapping, that is, it finds all places
using an abstract type and makes them refer to the single instantiated
subtype (or null). After that change, no references to the abstract type
remain in the program, so this both refines types and also cleans up the
type section.
|
|
|
|
|
|
|
|
| |
For example,
-O3 --skip-pass=vacuum
will run -O3 normally but it will not run the vacuum pass at all
(which normally runs more than once in -O3).
|
|
|
|
|
|
|
|
|
|
| |
When using JSPI with wasm-split, any calls to secondary module functions
will now first check a global to see if the module is loaded. If not
loaded it will call a JSPI'ed function that will handle loading module.
The setup is split into the JSPI pass and wasm-split tool since the JSPI
pass is first run by emscripten and we need to JSPI'ify the load secondary
module function. wasm-split then injects all the checks and calls to the
load function.
|
|
|
|
| |
This allows tools like wasm-reduce to be told to operate in closed-world mode. That
lets them validate in the more strict way of that mode.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The type rewriting utility in type-updating.cpp gathers all the used heap types,
then rewrites them to newly built and possibly modified heap types. The problem
is that for the isorecursive type system, the set of "used" heap types was
overly broad because it also included unused heap types that are in a rec group
with used types. In the context of emitting a binary, it is important to treat
these types as used because failing to emit them would change the identity of
the used types, but in the context of type optimizations it is ok to treat them
as truly unused because we are changing type identities anyway.
Update the type rewriting utility to only include truly used types in the set of
output types. This causes all existing type optimizations to implicitly drop
unused types, but only if they find any other optimizations to do and actually
run the rewriter utitility. Their output will also still include unused types
that were used before their optimizations were applied.
To overcome these limitations and better match the optimizing power of nominal
mode, which never includes unused types in the output, add a new type
optimization pass that removes unused types and does nothing else and run it
near the end of the global optimization pipeline.
|
|
|
| |
Per the wasm spec guidelines for Load (rule 10) & Store (rule 12), this PR adds an option for bounds checking, producing a runtime error if the instruction exceeds the bounds of the particular memory within the combined memory.
|
|
|
|
|
|
|
|
| |
This finds types that can be merged into their super: types that add no
fields, and are not used in casts, etc. - so we might as well use the super.
This complements TypeSSA, in that it can merge back the new types that
TypeSSA created, if we never found a use for them. Without this, TypeSSA
can bloat binary size quite a lot (I see 10-20%).
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This creates new nominal types for each (interesting) struct.new. That then allows
type-based optimizations to be more precise, as those optimizations will track
separate info for each struct.new, in effect. That is kind of like SSA, however, we
do not handle merges. For example:
x = struct.new $A (5);
print(x.value);
y = struct.new $A (11);
print(y.value);
// => //
x = struct.new $A.x (5);
print(x.value);
y = struct.new $A.y (11);
print(y.value);
After the pass runs each of those struct.new creates a unique type, and type-based
analysis can see that 5 or 11 are the only values written in that type (if nothing else
writes there).
This bloats the type section with the new subtypes, so it is best used with a pass
to merge unneeded duplicate types, which a later PR will add. That later PR will
exactly merge back in the types created here, which are nominally different but
indistinguishable otherwise.
This pass is not enabled by default. It's not clear yet where is the best place to do it,
as it must be balanced by type merging, but it might be better to do multiple
rounds of optimization between the two. Needs more investigation.
|
|
|
| |
The flag does nothing so far.
|
|
|
|
| |
Equirecursive is no longer standards track and its implementation is extremely
complex. Remove it.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
(some.operation
(ref.cast .. (local.get $ref))
(local.get $ref)
)
=>
(some.operation
(local.tee $temp
(ref.cast .. (local.get $ref))
)
(local.get $temp)
)
This can help cases where we cast for some reason but happen to not use the
cast value in all places. This occurs in j2wasm in itable calls sometimes: The
this pointer is is refined, but the itable may be done with an unrefined pointer,
which is less optimizable.
So far this is just inside basic blocks, but that is enough for the cast of itable
calls and other common patterns I see.
|
|
|
|
| |
Fixes #5250
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Monomorphization finds cases where we send more refined types to a function
than it declares. In such cases we can copy the function and refine the parameters:
// B is a subtype of A
foo(new B());
function foo(x : A) { ..}
=>
foo_B(new B()); // call redirected to refined copy
function foo(x : A) { ..} // unchanged
function foo_B(x : B) { ..} // refined copy
This increases code size so it may not be worth it in all cases. This initial PR is
hopefully enough to start experimenting with this on performance, and so it does
not enable the pass by default.
This adds two variations of monomorphization, one that always does it, and the
default which is "careful": it sees whether monomorphizing lets the refined function
actually be better than the original (say, by removing a cast). If there is no
improvement then we do not make any changes. This saves a significant amount
of code size - on j2wasm the careful version increases by 13% instead of 20% -
but it does run more slowly obviously.
|
|
|
|
|
|
|
|
|
| |
This sorts globals by their usage (and respecting dependencies). If the module
has very many globals then using smaller LEBs can matter.
If there are fewer than 128 globals then we cannot reduce size, and the pass
exits early (so this pass will not slow down MVP builds, which usually have just
1 global, the stack pointer). But with wasm GC it is common to use globals for
vtables etc., and often there is a very large number of them.
|
|
|
|
|
|
|
|
|
|
| |
Adds a multi-memories lowering pass that will create a single combined memory from the memories added to the module. This pass assumes that each memory is configured the same (type, shared).
This pass also:
- replaces existing memory.size instructions with a custom function that returns the size of each memory as if they existed independently
- replaces existing memory.grow instructions with a custom function, using global offsets to track the page size of each memory so data doesn't overlap in the singled combined memory
- adjusts the offsets of active data segments
- adjusts the offsets of Loads/Stores
|
|
|
|
|
| |
This allows a three-step upgrade process where binaryen is updated with this
change, then users remove their use of these flags, then binaryen can remove the
flags permanently.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This adds a map of function name => the effects of that function to the
PassOptions structure. That lets us compute those effects once and then
use them in multiple passes afterwards. For example, that lets us optimize
away a call to a function that has no effects:
(drop (call $nothing))
[..]
(func $nothing
;; .. lots of stuff but no effects, only a returned value ..
)
Vacuum will remove that dropped call if we tell it that the called function has
no effects. Note that a nice result of adding this to the PassOptions struct
is that all passes will use the extra info automatically.
This is not enabled by default as the benefits seem rather minor, though it
does help in a small but noticeable way on J2Wasm code, where we use
call.without.effects and have situations like this:
(func $foo
(call $bar)
)
(func $bar
(call.without.effects ..)
)
The call to bar looks like it has effects, normally, but with global effect info
we know it actually doesn't.
To use this, one would do
--generate-global-effects [.. some passes that use the effects ..] --discard-global-effects
Discarding is not necessary, but if there is a pass later that adds effects, then not
discarding could lead to bugs, since we'd think there are fewer effects than there are.
(However, normal optimization passes never add effects, only remove them.)
It's also possible to call this multiple times:
--generate-global-effects -O3 --generate-global-effects -O3
That computes affects after the first -O3, and may find fewer effects than earlier.
This doesn't compute the full transitive closure of the effects across functions. That is,
when computing a function's effects, we don't look into its own calls. The simple case
so far is enough to handle the call.without.effects example from before (though it
may take multiple optimization cycles).
|
|
|
| |
Adds an --in-secondary-memory switch to the wasm-split tool that allows profile data to be stored in a separate memory from module main memory. With this option, users do not need to reserve the initial memory region for profile data and the data can be shared between multiple threads.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In practice typed function references will not ship before GC and is not
independently useful, so it's not necessary to have a separate feature for it.
Roll the functionality previously enabled by --enable-typed-function-references
into --enable-gc instead.
This also avoids a problem with the ongoing implementation of the new GC bottom
heap types. That change will make all ref.null instructions in Binaryen IR refer
to one of the bottom heap types. But since those bottom types are introduced in
GC, it's not valid to emit them in binaries unless unless GC is enabled. The fix
if only reference types is enabled is to emit (ref.null func) instead
of (ref.null nofunc), but that doesn't always work if typed function references
are enabled because a function type more specific than func may be required.
Getting rid of typed function references as a separate feature makes this a
nonissue.
|
|
|
|
|
|
|
| |
Add a pass that wraps all imports and exports with functions that handle
storing and passing along the suspender externref needed for JSPI.
https://github.com/WebAssembly/js-promise-integration/blob/main/proposals/js-promise-integration/Overview.md
|
|
|
| |
Adding multi-memories to the the list of wasm-features.
|
|
|
|
| |
This is no longer needed by emscripten as of:
https://github.com/emscripten-core/emscripten/pull/16529
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There are several reasons why a function may not be trained in deterministically.
So to perform quick validation we need to inspect profile.data (another ways requires split to be performed). However as profile.data is a binary file and is not self sufficient, so we cannot currently use it to perform such validation.
Therefore to allow quick check on whether a particular function has been trained in, we need to dump profile.data in a more readable format.
This PR, allows us to output, the list of functions to be kept (in main wasm) and those split functions (to be moved to deferred.wasm) in a readable format, to console.
Added a new option `--print-profile`
- input path to orig.wasm (its the original wasm file that will be used later during split)
- input path to profile.data that we need to output
optionally pass `--unescape`
to unescape the function names
Usage:
```
binaryen\build>bin\wasm-split.exe test\profile_data\MY.orig.wasm --print-profile=test\profile_data\profile.data > test\profile_data\out.log
```
note: meaning of prefixes
`+` => fn to be kept in main wasm
`-` => fn to be split and moved to deferred wasm
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This tracks the possible contents in the entire program all at once using a single IR.
That is in contrast to say DeadArgumentElimination of LocalRefining etc., all of whom
look at one particular aspect of the program (function params and returns in DAE,
locals in LocalRefining). The cost is to build up an entire new IR, which takes a lot
of new code (mostly in the already-landed PossibleContents). Another cost
is this new IR is very big and requires a lot of time and memory to process.
The benefit is that this can find opportunities that are only obvious when looking
at the entire program, and also it can track information that is more specialized
than the normal type system in the IR - in particular, this can track an ExactType,
which is the case where we know the value is of a particular type exactly and not
a subtype.
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
Implement the basic infrastructure for the full WAT parser with just enough
detail to parse basic modules that contain only imported globals. Parsing
functions correspond to elements of the grammar in the text specification and
are templatized over context types that correspond to each phase of parsing.
Errors are explicitly propagated via `Result<T>` and `MaybeResult<T>` types.
Follow-on PRs will implement additional phases of parsing and parsing for new
elements in the grammar.
|
|
|
|
|
|
|
|
| |
We have some possible use cases for this pass, and so are restoring
it.
This reverts the removal in #3261, fixes compile errors in internal API
changes since then, and flips the direction of the stack for the
wasm backend.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This optimizes constants in the megamorphic case of two: when we
know two function references are possible, we could in theory emit this:
(select
(ref.func A)
(ref.func B)
(ref.eq
(..ref value..) ;; globally, only 2 things are possible here, and one has
;; ref.func A as its value, and the other ref.func B
(ref.func A))
That is, compare to one of the values, and emit the two possible values there.
Other optimizations can then turn a call_ref on this select into an if over
two direct calls, leading to devirtualization.
We cannot compare a ref.func directly (since function references are not
comparable), and so instead we look at immutable global structs. If we
find a struct type that has only two possible values in some field, and
the structs are in immutable globals (which happens in the vtable case
in j2wasm for example), then we can compare the references of the struct
to decide between the two values in the field.
|