| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add a new pass that analyzes the module to find the minimal subtyping relation
that is necessary to maintain the validity and semantics of the program and
rewrites the types to use this minimal relation. Besides eliminating references
to otherwise-unused intermediate types, this optimization should unlock
significant additional optimizing power in other type optimizations that are
constrained by having to maintain supertype validity, since after this new
optimization there are fewer and more general supertypes.
The analysis works by visiting each expression and module element to collect the
subtypings that are required to maintain its validity, then, using that as a
starting point, iteratively adding new subtypings required by type definitions
and casts until reaching a fixed point.
|
|
|
|
|
|
|
| |
This PR is part of a series that adds basic support for the [typed continuations
proposal](https://github.com/wasmfx/specfx).
This particular PR simply extends `FeatureSet` with a corresponding entry for
this proposal.
|
|
|
|
|
|
|
|
| |
This fixes some outdated comments and typos in Asyncify and improves
some other comments. This tries to make code comments more readable by
making them more accurate and also by using the three state (normal,
unwinding, and rewinding) consistently.
Drive-by fix: Typo fixes in SimplifyGlobals and wasm-reduce option.
|
|
|
|
|
|
|
|
|
| |
TypeFinalization finalizes all types that we can, that is, all private types that have no
children. TypeUnFinalization unfinalizes (opens) all (private) types.
These could be used by first opening all types, optimizing, and then finalizing, as that
might find more opportunities.
Fixes #5933
|
|
|
|
|
|
|
|
|
|
|
| |
In some cases tuples are obviously not needed, such as when they are only used
in local operations and make/extract. Such tuples are not used as return values or
in control flow structures, so we might as well lower them to individual locals per
lane, which other passes can optimize a lot better.
I believe LLVM does the same with its own tuples: it lowers them as much as
possible, leaving only necessary ones.
Fixes #5923
|
|
|
|
|
| |
Now that the WasmGC spec has settled on a way of validating non-nullable locals,
we no longer need this experimental feature that allowed nonstandard uses of
non-nullable locals.
|
|
|
| |
Renaming the multimemory flag in Binaryen to match its naming in LLVM.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
GUFA refines existing casts, but does not add new casts for fear of increasing code size
and adding more cast operations at runtime. This PR adds a version that does add all
those casts, and it looks like at least code size improves rather than regresses, at least
on J2Wasm and Kotlin. That is, this pass adds a lot more casts, but subsequent
optimizations benefit enough to shrink overall code size.
However, this may still not be worthwhile, as even if code size decreases we may end
up doing more casts at runtime, and those casts might be hard to remove, e.g.:
(call $foo
(x) ;; inferred to be non-null
)
(func $foo (param (ref null $A)
=>
(call $foo
(ref.cast $A (x) ;; add a cast here
)
(func $foo (param (ref $A) ;; later pass refines here
That new cast cannot be removed after we refine the function parameter. If the
function never benefits from the fact that the input is non-null, then the cast is
wasted work (e.g. if the function only compares the input to another value).
To use this new pass, try --gufa-cast-all rather than --gufa. As with normal GUFA,
running the full optimizer afterwards is important, and even more important in
order to get rid of as many of the new casts as possible.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This pass strips all EH stuff, including EH instructions and tags, from
the input module and disables the EH feature from the features section.
1. This removes `catch` and `catch_all` blocks from the code. So
```wast
(try
(do
(some code)
)
(catch
...
)
)
```
becomes just `(some code)`. Note that all `rethrow`s will be removed
with `catch`es. Note that all `rethrow`s will be removed with `catch`es.
2. This converts 'throw (...)` into `unreachable`. Note that `rethrows
3. This removes all tags from the module, which are unused anyway after
1 and 2.
4. This removes exception handling feature from the features section.
You can use the pass with
```console
$ wasm-opt --enable-exception-handling --strip-eh INPUT -o OUTPUT
```
This is not an optimization pass, so it is not run unless you specify
the pass explicitly.
This is in effect similar to Clang's `-fignore-exceptions`, in which you
can throw but it will result in a crash and we compile away all landing
pads. This can be used for people who don't (or can't) use
`-fignore-exceptions` in their build settings or who want to compile
away `catch` blocks later.
Closes emscripten-core/emscripten#19585.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We used to have a wasm-merge tool but removed it for a lack of use cases. Recently
use cases have been showing up in the wasm GC space and elsewhere, as people are
using more diverse toolchains together, for example a project might build some C++
code alongside some wasm GC code. Merging those wasm files together can allow
for nice optimizations like inlining and better DCE etc., so it makes sense to have a
tool for merging.
Background:
* Removal: #1969
* Requests:
* wasm-merge - why it has been deleted #2174
* Compiling and linking wat files #2276
* wasm-link? #2767
This PR is a compete rewrite of wasm-merge, not a restoration of the original
codebase. The original code was quite messy (my fault), and also, since then
we've added multi-memory and multi-table which makes things a lot simpler.
The linking semantics are as described in the "wasm-link" issue #2767 : all we do
is merge normal wasm files together and connect imports and export. That is, we
have a graph of modules and their names, and each import to a module name can
be resolved to that module. Basically, like a JS bundler would do for JS, or, in other
words, we do the same operations as JS code would do to glue wasm modules
together at runtime, but at compile time. See the README update in this PR for a
concrete example.
There are no plans to do more than that simple bundling, so this should not
really overlap with wasm-ld's use cases.
This should be fairly fast as it works in linear time on the total input code. However,
it won't be as fast as wasm-ld, of course, as it does build Binaryen IR for each
module. An advantage to working on Binaryen IR is that we can easily do some
global DCE after merging, and further optimizations are possible later.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* Disable sign extension in SignExtLowering.cpp
The sign extension lowering pass would previously lower away the sign extension
instructions, but it wouldn't disable the sign extension feature, so follow-on
passes such as optimize-instructions could reintroduce sign extension
instructions.
Fix the pass to disable the sign extension feature to prevent sign extension
instructions from being reintroduced later.
* update pass description
|
|
|
|
|
| |
After this change, the only type system usable from the tools will be the
standard isorecursive type system. The nominal type system is still usable via
the API, but it will be removed entirely in a follow-on PR.
|
|
|
|
|
|
|
|
|
|
| |
(#5510)
Simply loop over the values and use tuple.make.
This also adds a lit test for ctor-eval. I found that the problem blocking us
before was the logging, which confuses the update script. As this test at
least does not require that logging, this PR adds a --quiet flag that
disables the logging, and then a lit test just works.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If a type hierarchy has abstract classes in the middle, that is, types that
are never instantiated, then we can optimize casts and other operations
to them. Say in Java that we have `AbstractList`, and it only has one
subclass `IntList` that is ever created, then any place we have an `AbstractList`
we must actually have an `IntList`, or a null. (Or, if no subtype is instantiated,
then the value must definitely be a null.)
The actual implementation does a type mapping, that is, it finds all places
using an abstract type and makes them refer to the single instantiated
subtype (or null). After that change, no references to the abstract type
remain in the program, so this both refines types and also cleans up the
type section.
|
|
|
|
|
|
|
|
| |
For example,
-O3 --skip-pass=vacuum
will run -O3 normally but it will not run the vacuum pass at all
(which normally runs more than once in -O3).
|
|
|
|
|
|
|
|
|
|
| |
When using JSPI with wasm-split, any calls to secondary module functions
will now first check a global to see if the module is loaded. If not
loaded it will call a JSPI'ed function that will handle loading module.
The setup is split into the JSPI pass and wasm-split tool since the JSPI
pass is first run by emscripten and we need to JSPI'ify the load secondary
module function. wasm-split then injects all the checks and calls to the
load function.
|
|
|
|
| |
This allows tools like wasm-reduce to be told to operate in closed-world mode. That
lets them validate in the more strict way of that mode.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The type rewriting utility in type-updating.cpp gathers all the used heap types,
then rewrites them to newly built and possibly modified heap types. The problem
is that for the isorecursive type system, the set of "used" heap types was
overly broad because it also included unused heap types that are in a rec group
with used types. In the context of emitting a binary, it is important to treat
these types as used because failing to emit them would change the identity of
the used types, but in the context of type optimizations it is ok to treat them
as truly unused because we are changing type identities anyway.
Update the type rewriting utility to only include truly used types in the set of
output types. This causes all existing type optimizations to implicitly drop
unused types, but only if they find any other optimizations to do and actually
run the rewriter utitility. Their output will also still include unused types
that were used before their optimizations were applied.
To overcome these limitations and better match the optimizing power of nominal
mode, which never includes unused types in the output, add a new type
optimization pass that removes unused types and does nothing else and run it
near the end of the global optimization pipeline.
|
|
|
| |
Per the wasm spec guidelines for Load (rule 10) & Store (rule 12), this PR adds an option for bounds checking, producing a runtime error if the instruction exceeds the bounds of the particular memory within the combined memory.
|
|
|
|
|
|
|
|
| |
This finds types that can be merged into their super: types that add no
fields, and are not used in casts, etc. - so we might as well use the super.
This complements TypeSSA, in that it can merge back the new types that
TypeSSA created, if we never found a use for them. Without this, TypeSSA
can bloat binary size quite a lot (I see 10-20%).
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This creates new nominal types for each (interesting) struct.new. That then allows
type-based optimizations to be more precise, as those optimizations will track
separate info for each struct.new, in effect. That is kind of like SSA, however, we
do not handle merges. For example:
x = struct.new $A (5);
print(x.value);
y = struct.new $A (11);
print(y.value);
// => //
x = struct.new $A.x (5);
print(x.value);
y = struct.new $A.y (11);
print(y.value);
After the pass runs each of those struct.new creates a unique type, and type-based
analysis can see that 5 or 11 are the only values written in that type (if nothing else
writes there).
This bloats the type section with the new subtypes, so it is best used with a pass
to merge unneeded duplicate types, which a later PR will add. That later PR will
exactly merge back in the types created here, which are nominally different but
indistinguishable otherwise.
This pass is not enabled by default. It's not clear yet where is the best place to do it,
as it must be balanced by type merging, but it might be better to do multiple
rounds of optimization between the two. Needs more investigation.
|
|
|
| |
The flag does nothing so far.
|
|
|
|
| |
Equirecursive is no longer standards track and its implementation is extremely
complex. Remove it.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
(some.operation
(ref.cast .. (local.get $ref))
(local.get $ref)
)
=>
(some.operation
(local.tee $temp
(ref.cast .. (local.get $ref))
)
(local.get $temp)
)
This can help cases where we cast for some reason but happen to not use the
cast value in all places. This occurs in j2wasm in itable calls sometimes: The
this pointer is is refined, but the itable may be done with an unrefined pointer,
which is less optimizable.
So far this is just inside basic blocks, but that is enough for the cast of itable
calls and other common patterns I see.
|
|
|
|
| |
Fixes #5250
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Monomorphization finds cases where we send more refined types to a function
than it declares. In such cases we can copy the function and refine the parameters:
// B is a subtype of A
foo(new B());
function foo(x : A) { ..}
=>
foo_B(new B()); // call redirected to refined copy
function foo(x : A) { ..} // unchanged
function foo_B(x : B) { ..} // refined copy
This increases code size so it may not be worth it in all cases. This initial PR is
hopefully enough to start experimenting with this on performance, and so it does
not enable the pass by default.
This adds two variations of monomorphization, one that always does it, and the
default which is "careful": it sees whether monomorphizing lets the refined function
actually be better than the original (say, by removing a cast). If there is no
improvement then we do not make any changes. This saves a significant amount
of code size - on j2wasm the careful version increases by 13% instead of 20% -
but it does run more slowly obviously.
|
|
|
|
|
|
|
|
|
| |
This sorts globals by their usage (and respecting dependencies). If the module
has very many globals then using smaller LEBs can matter.
If there are fewer than 128 globals then we cannot reduce size, and the pass
exits early (so this pass will not slow down MVP builds, which usually have just
1 global, the stack pointer). But with wasm GC it is common to use globals for
vtables etc., and often there is a very large number of them.
|
|
|
|
|
|
|
|
|
|
| |
Adds a multi-memories lowering pass that will create a single combined memory from the memories added to the module. This pass assumes that each memory is configured the same (type, shared).
This pass also:
- replaces existing memory.size instructions with a custom function that returns the size of each memory as if they existed independently
- replaces existing memory.grow instructions with a custom function, using global offsets to track the page size of each memory so data doesn't overlap in the singled combined memory
- adjusts the offsets of active data segments
- adjusts the offsets of Loads/Stores
|
|
|
|
|
| |
This allows a three-step upgrade process where binaryen is updated with this
change, then users remove their use of these flags, then binaryen can remove the
flags permanently.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This adds a map of function name => the effects of that function to the
PassOptions structure. That lets us compute those effects once and then
use them in multiple passes afterwards. For example, that lets us optimize
away a call to a function that has no effects:
(drop (call $nothing))
[..]
(func $nothing
;; .. lots of stuff but no effects, only a returned value ..
)
Vacuum will remove that dropped call if we tell it that the called function has
no effects. Note that a nice result of adding this to the PassOptions struct
is that all passes will use the extra info automatically.
This is not enabled by default as the benefits seem rather minor, though it
does help in a small but noticeable way on J2Wasm code, where we use
call.without.effects and have situations like this:
(func $foo
(call $bar)
)
(func $bar
(call.without.effects ..)
)
The call to bar looks like it has effects, normally, but with global effect info
we know it actually doesn't.
To use this, one would do
--generate-global-effects [.. some passes that use the effects ..] --discard-global-effects
Discarding is not necessary, but if there is a pass later that adds effects, then not
discarding could lead to bugs, since we'd think there are fewer effects than there are.
(However, normal optimization passes never add effects, only remove them.)
It's also possible to call this multiple times:
--generate-global-effects -O3 --generate-global-effects -O3
That computes affects after the first -O3, and may find fewer effects than earlier.
This doesn't compute the full transitive closure of the effects across functions. That is,
when computing a function's effects, we don't look into its own calls. The simple case
so far is enough to handle the call.without.effects example from before (though it
may take multiple optimization cycles).
|
|
|
| |
Adds an --in-secondary-memory switch to the wasm-split tool that allows profile data to be stored in a separate memory from module main memory. With this option, users do not need to reserve the initial memory region for profile data and the data can be shared between multiple threads.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In practice typed function references will not ship before GC and is not
independently useful, so it's not necessary to have a separate feature for it.
Roll the functionality previously enabled by --enable-typed-function-references
into --enable-gc instead.
This also avoids a problem with the ongoing implementation of the new GC bottom
heap types. That change will make all ref.null instructions in Binaryen IR refer
to one of the bottom heap types. But since those bottom types are introduced in
GC, it's not valid to emit them in binaries unless unless GC is enabled. The fix
if only reference types is enabled is to emit (ref.null func) instead
of (ref.null nofunc), but that doesn't always work if typed function references
are enabled because a function type more specific than func may be required.
Getting rid of typed function references as a separate feature makes this a
nonissue.
|
|
|
|
|
|
|
| |
Add a pass that wraps all imports and exports with functions that handle
storing and passing along the suspender externref needed for JSPI.
https://github.com/WebAssembly/js-promise-integration/blob/main/proposals/js-promise-integration/Overview.md
|
|
|
| |
Adding multi-memories to the the list of wasm-features.
|
|
|
|
| |
This is no longer needed by emscripten as of:
https://github.com/emscripten-core/emscripten/pull/16529
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There are several reasons why a function may not be trained in deterministically.
So to perform quick validation we need to inspect profile.data (another ways requires split to be performed). However as profile.data is a binary file and is not self sufficient, so we cannot currently use it to perform such validation.
Therefore to allow quick check on whether a particular function has been trained in, we need to dump profile.data in a more readable format.
This PR, allows us to output, the list of functions to be kept (in main wasm) and those split functions (to be moved to deferred.wasm) in a readable format, to console.
Added a new option `--print-profile`
- input path to orig.wasm (its the original wasm file that will be used later during split)
- input path to profile.data that we need to output
optionally pass `--unescape`
to unescape the function names
Usage:
```
binaryen\build>bin\wasm-split.exe test\profile_data\MY.orig.wasm --print-profile=test\profile_data\profile.data > test\profile_data\out.log
```
note: meaning of prefixes
`+` => fn to be kept in main wasm
`-` => fn to be split and moved to deferred wasm
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This tracks the possible contents in the entire program all at once using a single IR.
That is in contrast to say DeadArgumentElimination of LocalRefining etc., all of whom
look at one particular aspect of the program (function params and returns in DAE,
locals in LocalRefining). The cost is to build up an entire new IR, which takes a lot
of new code (mostly in the already-landed PossibleContents). Another cost
is this new IR is very big and requires a lot of time and memory to process.
The benefit is that this can find opportunities that are only obvious when looking
at the entire program, and also it can track information that is more specialized
than the normal type system in the IR - in particular, this can track an ExactType,
which is the case where we know the value is of a particular type exactly and not
a subtype.
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
Implement the basic infrastructure for the full WAT parser with just enough
detail to parse basic modules that contain only imported globals. Parsing
functions correspond to elements of the grammar in the text specification and
are templatized over context types that correspond to each phase of parsing.
Errors are explicitly propagated via `Result<T>` and `MaybeResult<T>` types.
Follow-on PRs will implement additional phases of parsing and parsing for new
elements in the grammar.
|
|
|
|
|
|
|
|
| |
We have some possible use cases for this pass, and so are restoring
it.
This reverts the removal in #3261, fixes compile errors in internal API
changes since then, and flips the direction of the stack for the
wasm backend.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This optimizes constants in the megamorphic case of two: when we
know two function references are possible, we could in theory emit this:
(select
(ref.func A)
(ref.func B)
(ref.eq
(..ref value..) ;; globally, only 2 things are possible here, and one has
;; ref.func A as its value, and the other ref.func B
(ref.func A))
That is, compare to one of the values, and emit the two possible values there.
Other optimizations can then turn a call_ref on this select into an if over
two direct calls, leading to devirtualization.
We cannot compare a ref.func directly (since function references are not
comparable), and so instead we look at immutable global structs. If we
find a struct type that has only two possible values in some field, and
the structs are in immutable globals (which happens in the vtable case
in j2wasm for example), then we can compare the references of the struct
to decide between the two values in the field.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This adds a new signature-pruning pass that prunes parameters from
signature types where those parameters are never used in any function
that has that type. This is similar to DeadArgumentElimination but works
on a set of functions, and it can handle indirect calls.
Also move a little code from SignatureRefining into a shared place to
avoid duplication of logic to update signature types.
This pattern happens in j2wasm code, for example if all method functions
for some virtual method just return a constant and do not use the this
pointer.
|
|
|
| |
See https://github.com/WebAssembly/extended-const
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Merge similar functions that only differs constant values (like immediate
operand of const and call insts) by parameterization.
Performing this pass at post-link time can merge more functions across
objects. Inspired by Swift compiler's optimization which is derived from
LLVM's one:
https://github.com/apple/swift/blob/main/lib/LLVMPasses/LLVMMergeFunctions.cpp
https://github.com/llvm/llvm-project/blob/main/llvm/docs/MergeFunctions.rst
The basic ideas here are constant value parameterization and direct callee
parameterization by indirection.
Constant value parameterization is like below:
;; Before
(func $big-const-42 (result i32)
[[many instr 1]]
(i32.const 44)
[[many instr 2]]
)
(func $big-const-43 (result i32)
[[many instr 1]]
(i32.const 45)
[[many instr 2]]
)
;; After
(func $byn$mgfn-shared$big-const-42 (result i32)
[[many instr 1]]
(local.get $0) ;; parameterized!!
[[many instr 2]]
)
(func $big-const-42 (result i32)
(call $byn$mgfn-shared$big-const-42
(i32.const 42)
)
)
(func $big-const-43 (result i32)
(call $byn$mgfn-shared$big-const-42
(i32.const 43)
)
)
Direct callee parameterization is similar to the constant value parameterization,
but it parameterizes callee function i by ref.func instead. Therefore it is enabled
only when reference-types and typed-function-references features are enabled.
I saw 1 ~ 2 % reduction for SwiftWasm binary and Ruby's wasm port
using wasi-sdk, and 3 ~ 4.5% reduction for Unity WebGL binary when -Oz.
|
|
|
|
|
| |
Introduce static consts with PassOptions Defaults.
Add assertion to verify that the default options are the Os options.
Also update the text in relevant tests.
|
|
|
|
|
|
|
| |
Add an option for running the asyncify transformation on the primary module
emitted by wasm-split. The idea is that the placeholder functions should be able
to unwind the stack while the secondary module is asynchronously loaded, then
once the placeholder functions have been patched out by the secondary module the
stack should be rewound and end up in the correct secondary function.
|
|
|
|
|
|
|
|
|
|
| |
Add support for isorecursive types to wasm-fuzz-types by generating recursion
groups and ensuring that children types are only selected from candidates
through the end of the current group. For non-isorecursive systems, treat all
the types as belonging to a single group so that their behavior is unchanged.
Also fix two small bugs found by the fuzzer: LUB calculation was taking the
wrong path for isorecursive types and isorecursive validation was not handling
basic heap types properly.
|
| |
|
| |
|