| Commit message | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
| |
Each pass instance can now store its own argument, which can differ between instances.
This may be a breaking change for the corner case of running a pass multiple
times and also setting the pass's argument multiple times (before, the last
pass argument affected all instances; now, it affects only the last instance). This
only affects arguments that share the name of a pass; others remain global, as
before (and multiple passes can read them, in fact). See the CHANGELOG for
details.
Fixes #6646
|
|
|
|
|
|
|
|
|
| |
Normally we use it when optimizing (above a certain level). This lets the user
prevent it from being used even then.
Also add optimization options to wasm-metadce so that this is possible
there as well and not just in wasm-opt (this also opens the door to running
more passes in metadce, which may be useful later).
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
RefTest (#6692)
CFP focuses on finding when a field always contains a constant, and then replaces
a struct.get with that constant. If we find there are two constant values, then in some
cases we can still optimize, if we have a way to pick between them. All we have is the
struct.get and its reference, so we must use a ref.test:
(struct.get $T x (..ref..))
=>
(select
(..constant1..)
(..constant2..)
(ref.test $U (..ref..))
)
This is valid if, of all the subtypes of $T, those that pass the test have
constant1 in that field, and those that fail the test have constant2. For
example, a simple case is where $T has two subtypes, $T is never created
itself, and each of the two subtypes has a different constant value.
This is a somewhat risky operation, as ref.test is not necessarily cheap.
To mitigate that, this is a new pass, --cfp-reftest, that is not run by
default, and we only optimize when we can use a ref.test on what
we think will be a final type (because ref.test on a final type can be
faster in VMs).
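As an illustrative sketch of the "simple case" above (type and field names are hypothetical, and the exact output may differ):
```wast
;; $T is never created itself, and each final subtype always holds a different
;; constant in field $x.
(type $T (sub (struct (field $x i32))))
(type $T.a (sub final $T (struct (field $x i32))))  ;; always created with $x = 10
(type $T.b (sub final $T (struct (field $x i32))))  ;; always created with $x = 20

(struct.get $T $x (..ref..))
;; =>
(select
 (i32.const 10)  ;; the value in $T.a, which passes the test
 (i32.const 20)  ;; the value in $T.b
 (ref.test $T.a (..ref..))
)
```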
|
|
|
|
|
|
|
| |
This pass receives a list of functions to trace, and then wraps them in calls to
imports. This can be useful for tracing malloc/free calls, for example, but is
generic.
Fixes #6548
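A rough sketch of the idea (all names here are illustrative, not the pass's actual output):
```wast
;; Calls to $malloc get redirected to a wrapper that first reports the call to
;; an import and then forwards to the original function.
(import "env" "trace_malloc" (func $trace_malloc (param i32)))
(func $malloc_traced (param $size i32) (result i32)
 (call $trace_malloc (local.get $size))
 (call $malloc (local.get $size))
)
```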
|
|
|
|
|
| |
Add the feature and flags to enable and disable it. Require the new feature to
be enabled for shared heap types to validate. To make the test work, update the
validator to actually check features for global types.
|
|
|
|
|
| |
Remove `SExpressionParser`, `SExpressionWasmBuilder`, and `cashew::Parser`.
Simplify gen-s-parser.py. Remove the --new-wat-parser and
--deprecated-wat-parser flags.
|
|
|
|
|
| |
Changes to wasm-validator.cpp here are mostly for consistency between
elem and data segment validation.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We settled on the name `WASM_EXNREF` for the new EH setting in Emscripten:
https://github.com/emscripten-core/emscripten/blob/2bc5e3156f07e603bc4f3580cf84c038ea99b2df/src/settings.js#L782-L786
"New EH" sounds vague, and I'm not sure "experimental" is really
necessary anyway, given that the potential users of this option are aware
that this is a new spec that has been adopted recently.
To make the option names consistent, this renames `--translate-to-eh`
(the option that only runs the translator) to `--translate-to-exnref`,
and `--experimental-new-eh` to `--emit-exnref` (the option that runs the
translator at the end of the whole pipeline), and renames the pass and
variable names in the code accordingly as well.
In case anyone is using the old option names (and also to make the
Chromium CI pass), this does not delete the old options.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously we had passes --generate-stack-ir, --optimize-stack-ir, --print-stack-ir
that could be run like any other passes. After generating StackIR it was stashed on
the function and invalidated if we modified BinaryenIR. If it wasn't invalidated then
it was used during binary writing. This PR switches things so that we optionally
generate, optimize, and print StackIR only during binary writing. It also removes
all traces of StackIR from wasm.h - after this, StackIR is a feature of binary writing
(and printing) logic only.
This is almost NFC, but there are some minor noticeable differences:
1. We no longer print a "has StackIR" note in the text format when we see it is there.
StackIR will not be there during normal printing, as it is only present during binary
writing (but --print-stack-ir still works as before; as mentioned above, it runs during writing).
2. --generate/optimize/print-stack-ir change from being passes to being flags that
control that behavior instead. As passes, their order on the command line mattered,
while now it does not, and they only "globally" affect things during writing.
3. The C API changes slightly, as there is no need to pass an "optimize" option to
the StackIR APIs. Whether we optimize is handled by --optimize-stack-ir, which is
set like other optimization flags on the PassOptions object, so we don't need the
old option in those C APIs.
The main benefit here is simplifying the code, so we don't need to think about
StackIR in more places than just binary writing. That may also allow future
improvements to our usage of StackIR.
|
|
|
|
|
| |
This flag is intended to help users gracefully migrate to the new wat parser. It
will be removed again not too long after the new wat parser is enabled by
default in wasm-opt.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This PR creates a pass that propagates debug locations from parent nodes to child nodes that have no debug location, using a pre-order traversal. This is useful for compilers that use the Binaryen API to generate WebAssembly modules.
It behaves like `wasm-opt` reading a text format file: children are tagged with the debug info of the parent if they have no annotation of their own.
For compilers that use the Binaryen API to generate WebAssembly modules, it is a bit redundant to add debugInfo for each expression, especially when the compiler wraps expressions.
With this pass, compilers just need to add debugInfo for the parent node, which is more convenient.
For example:
```
(drop
(call $voidFunc)
)
```
Without this pass, if the compiler only adds debugInfo to the wrapping `drop` expression, the `call` expression has no corresponding source code mapping when debugging in DevTools, which is obviously not user-friendly.
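For instance (the source location here is made up for illustration), after the pass the child carries the parent's location:
```wast
;;@ src.ts:10:3
(drop
 ;; the child had no annotation, so the parent's location is propagated to it:
 ;;@ src.ts:10:3
 (call $voidFunc)
)
```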
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The latest idea for efficient string constants is to encode the constants in
the import names of their globals and implement fast paths in the engines for
materializing those constants at instantiation time without needing to parse
anything in JS. This strategy only works for valid strings (i.e. strings without
unpaired surrogates) because only valid strings can be used as import names in
the WebAssembly syntax.
Add a new configuration of the StringLowering pass that encodes valid string
contents in import names, falling back to the JSON custom section approach for
invalid strings.
To test this change, update the printer to escape import and export names
properly and update the legacy parser to parse escapes in import and export
names properly. As a drive-by, remove the incorrect check in the parser that the
import module and base names are non-empty.
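A hypothetical sketch of the encoding (the module and global names here are illustrative, not necessarily what the pass emits):
```wast
;; The string contents themselves become the import's base name, so the engine
;; can materialize the constant at instantiation time without parsing anything.
(import "string.const" "hello world" (global $string.const_0 (ref extern)))
```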
|
|
|
|
|
|
|
|
|
|
|
|
| |
This change removes the "minimal" mode from `LegalizeJSInterface`
which was added in #1883.
The idea behind this change was to avoid legalizing most functions, except
those we know JS will be calling. The reasoning was that for dynamic
linking we always want the non-legalized version to be shared between
wasm modules. These days we solve this problem in a different way, with
`legalize-js-interface-export-originals`, which exports the original
functions alongside the legalized ones. Emscripten then always
prefers the `$orig` functions when doing dynamic linking.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We already have passes to legalize i64 imports and exports, which the fuzzer will
run so that we can run wasm files in JS VMs. SIMD and multivalue also pose a
problem as they trap on the boundary. In principle we could legalize them as well,
but that is substantial effort, so instead just prune them: given a wasm module,
remove any imports or exports that use SIMD or multivalue (or anything else that
is not legal for JS).
Running this in the fuzzer will allow us to not skip running v8 on any testcase we
enable SIMD and multivalue for.
(Multivalue is allowed in newer VMs, so that part of this PR could be removed
eventually.)
Also remove the limitation on running v8 with multimemory (v8 now supports
that).
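For example (a hypothetical export), a signature that uses SIMD on the boundary is simply pruned rather than legalized:
```wast
;; The result type v128 is not legal on the JS boundary, so this export would
;; be removed by the pass.
(func $get_vec (export "get_vec") (result v128)
 (v128.const i64x2 0 0)
)
```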
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We had two JS files that could run a wasm file for fuzzing purposes:
* --emit-js-shell, which emitted a custom JS file that runs the wasm.
* scripts/fuzz_shell.js, which was a generic file that did the same.
Both of those load the wasm and then call the exports in order and print out
logging as it goes of their return values (if any), exceptions, etc. Then the
fuzzer compares that output to running the same wasm in another VM, etc. The
difference is that one was custom for the wasm file, and one was generic. Aside
from that they are similar and duplicated a bunch of code.
This PR improves things by removing 1 and using 2 in all places, that is, we
now use the generic file everywhere.
I believe we added 1 because we thought a generic file can't do all the
things we need, like know the order of exports and the types of return values,
but in practice there are ways to do those things: The exports are in fact
in the proper order (JS order of iteration is deterministic, thankfully), and
for the type we don't want to print type internals anyhow since that would
limit fuzzing --closed-world. We do need to be careful with types in JS (see
notes in the PR about the type of null) but it's not too bad. As for the types
of params, it's fine to pass in null for them all anyhow (null converts to a
number or a reference without error).
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
SimplifyGlobals already does this, so this is a subset of that pass, and does not
add anything new. It is useful for testing, however.
In particular it allows testing that we propagate subsequent globals in a single
pass, that is, if one global reads from another and becomes constant, then it
can be propagated as well. SimplifyGlobals runs multiple passes so this always
worked, but with this pass we can test that we do it efficiently in one pass.
This will also be useful for comparing stringref to imported strings, as it
allows gathered strings to be propagated to other globals (possible with
stringref, but not imported strings) but not anywhere else (which might have
downsides as it could lead to more allocations).
Also add an additional test for simplify-globals that we do not get confused by
an unoptimizable global.get in the middle (see last part).
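A small sketch of that subsequent-global propagation (names illustrative; this relies on global initializers being allowed to read earlier immutable globals, as in wasm GC):
```wast
(global $a i32 (i32.const 42))
(global $b i32 (global.get $a))  ;; reads another global and becomes constant
(global $c i32 (global.get $b))
;; => after a single run of the pass
(global $a i32 (i32.const 42))
(global $b i32 (i32.const 42))
(global $c i32 (i32.const 42))
```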
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This adds an `--experimental-new-eh` option to `wasm-opt`. The difference
between this and `--translate-to-new-eh` is that `--translate-to-new-eh`
just runs the `TranslateToNewEH` pass, while `--experimental-new-eh`
attaches the `TranslateToNewEH` pass at the end of the whole optimization
pipeline. So if no other passes or optimization options (`-On`) are
specified, it is equivalent to `--translate-to-new-eh`. If other
optimization passes are specified, it runs them and then runs the
translator at the end to ensure the new EH instructions are emitted. The
reason we are doing it this way is that the optimization pipeline as a whole
does not support the new EH instructions yet, but we would like to
provide an option to emit reasonably good code with the new EH
instructions.
This also means that when the optimization level is > 3, it will also run
the StackIR + local2stack optimization after the translation.
I'm not sure how to test the output of this option, given that there is not
much point in testing the default optimization passes, and it is also
not clear how to print the Stack IR if its generation and
optimization run as part of the pipeline rather than via explicit command
line options.
This is created in favor of #6267, which added the option to
`optimization-options.h`. That had the problem of running the translator
multiple times when `-On` was given multiple times on the command line,
which I learned is a rather common usage. This adds the option directly
to `wasm-opt.cpp`, which avoids the problem. With this, it is still
possible to create and optimize Stack IR unnecessarily, but that feels like a
better alternative.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This extends StringGathering by replacing the gathered string globals with imported
globals. It adds a custom section with the strings that the imports are expected to
provide. It also replaces the string type with extern.
This is a complete lowering of strings, except for string operations that are a TODO.
After running this, no strings remain in the wasm, and the outside JS is expected
to provide the proper imports, which it can do by processing the JSON of the
strings in the custom section "string.consts", which looks like
["foo", "bar", ..]
That is, an array of strings, which are imported as
(import "string.const" "0" (global $string.const_foo (ref extern))) ;; foo
(import "string.const" "1" (global $string.const_bar (ref extern))) ;; bar
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This pass finds all string.const and creates globals for them. After this transform, no
string.const appears anywhere but in a global, and each string appears in one global
which is then global.get-ed everywhere.
This avoids overhead in VMs where executing a string.const is an allocation, and is
also a good step towards imported strings. For that, this pass will be extended from
gathering to a full lowering pass, which will first gather into globals as this pass does,
and then turn each of those globals with a string.const into an imported externref.
(For that reason this pass is in a file called StringLowering, as the two passes will
share much of their code, and the larger pass should decide the name I think.)
This pass runs in -O2 and above. Repeated executions have no downside (see
details in code).
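A small sketch of the gathering (names illustrative):
```wast
(call $use (string.const "foo"))
(call $use (string.const "foo"))
;; => after gathering, one global holds the string and every use reads it
(global $string.const_foo (ref string) (string.const "foo"))
...
(call $use (global.get $string.const_foo))
(call $use (global.get $string.const_foo))
```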
|
|
|
|
|
|
| |
The previous name feels too verbose and unwieldy.
This also removes the "new-to-old EH" placeholder. I think it'd be
better to add it back when it is actually added.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This translates the old Phase 3 EH instructions, which include `try`,
`catch`, `catch_all`, `delegate`, and `rethrow`, into the new EH
instructions, which include `try_table` (with `catch` / `catch_ref` /
`catch_all` / `catch_all_ref`) and `throw_ref`, passed at the Oct 2023
CG meeting.
This translator can be used as a standalone tool by users of the
previous EH toolchain to generate binaries for the new spec without
recompiling, and also can be used at the end of the Binaryen pipeline to
produce binaries for the new spec while the end-to-end toolchain
implementation for the new spec is in progress.
While the goal of this pass is not optimization, it tries to do a little
better than the most naive implementation, namely by omitting a few
instructions where possible and trying to minimize the number of
additional locals, because it can be used as a standalone translator
or as the last stage of the pipeline, and we can't post-optimize the
results because the whole pipeline (-On) is not ready for the new EH.
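As an illustrative sketch (not necessarily the pass's exact output), a simple `try`/`catch_all` can be translated along these lines:
```wast
(try
 (do
  (call $foo)
 )
 (catch_all
  (call $bar)
 )
)
;; => one possible translation using the new instructions
(block $done
 (block $caught
  (try_table (catch_all $caught)
   (call $foo)
  )
  (br $done)  ;; no exception: skip the handler
 )
 (call $bar)  ;; the old catch_all body
)
```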
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This PR creates a new pass to optimize J2CL-specific patterns
that would otherwise be difficult to recognize/prove generically
by other Binaryen passes.
The pass currently handles fields that we call "constant-like".
These are fields initialized once and unconditionally through a
"clinit" function; technically they do have two observable states:
- the initial null/0 state
- the initialized state.
However, the initial null/0 state can only be observed in contrived examples,
not in real-world/correct applications.
This pass moves such "clinit"-initialized fields to global initialization.
The above pattern also matches other lazy-init constructs, like String and Class
literals (which Binaryen already reduces to constant expressions). So
the pass is generalized to include them as well (by matching any functions
with the name pattern "_@once_").
In order for this pass to be effective:
1. It needs to run between O3 passes.
2. We need to stop inlining of "once" functions.
Stopping inlining of the once functions is important to preserve their
structure. This helps both the existing OnceReducer pass and the new J2CL pass
be a lot more effective. Also, it is not useful to inline these functions,
as by definition they are only executed once. This can be achieved by passing
a no-inline filter.
Although inlining is generally disabled for these functions, it is
still needed in some cases, since the inliner is effectively responsible for
removing once functions that have been simplified into empty or trivially
delegating functions. For this reason, the pass renames such trivial
functions so the no-inline filter no longer matches them.
Also note that after all optimizations have completed, it does make sense to
have a final stage where "partial inlining" of all once functions is
allowed. This will speed them up by moving the initialization check to the
call site.
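A simplified sketch of the main pattern (names and the exact shape are illustrative):
```wast
(global $Foo$clinit.done (mut i32) (i32.const 0))
(global $Foo$x (mut i32) (i32.const 0))
(func $Foo_clinit_@once_
 (if (global.get $Foo$clinit.done) (then (return)))
 (global.set $Foo$clinit.done (i32.const 1))
 (global.set $Foo$x (i32.const 42))  ;; initialized once, unconditionally
)
;; => the "constant-like" field moves to the global's initializer
(global $Foo$x (mut i32) (i32.const 42))
```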
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Any function can now be annotated as not to be inlined fully (normally) or not to be
inlined partially. In the future we'll want to read those annotations from the proposed
wasm metadata section on code hints, and from wat text as well, but for now add
trivial passes that set those fields based on function name wildcards, e.g.:
--no-inline=*leave-alone* --inlining
That will not inline any function whose name contains "leave-alone".
--no-inline disables all inlining (full or partial) while --no-full-inline and
--no-partial-inline affect only full or partial inlining.
|
|
|
| |
Adds an outlining pass that performs outlining on a module end to end, and two tests.
|
|
|
|
|
|
|
|
| |
Because we currently strip some data segments (i.e. EM_JS strings)
during `--post-emscripten`, this is too late, as `--separate-data-segments`
always runs in `wasm-emscripten-finalize`.
Once emscripten switches over to using the pass directly we can remove
the support from `wasm-emscripten-finalize`.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add a new pass that analyzes the module to find the minimal subtyping relation
that is necessary to maintain the validity and semantics of the program and
rewrites the types to use this minimal relation. Besides eliminating references
to otherwise-unused intermediate types, this optimization should unlock
significant additional optimizing power in other type optimizations that are
constrained by having to maintain supertype validity, since after this new
optimization there are fewer and more general supertypes.
The analysis works by visiting each expression and module element to collect the
subtypings that are required to maintain its validity, then, using that as a
starting point, iteratively adding new subtypings required by type definitions
and casts until reaching a fixed point.
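A minimal sketch (types illustrative): if nothing in the module actually requires $B <: $A (no casts, no place where a $B value flows into an $A location), then the declared supertype can be dropped:
```wast
(type $A (sub (struct (field i32))))
(type $B (sub $A (struct (field i32) (field i32))))
;; =>
(type $A (sub (struct (field i32))))
(type $B (sub (struct (field i32) (field i32))))
```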
|
|
|
|
|
|
|
| |
This PR is part of a series that adds basic support for the [typed continuations
proposal](https://github.com/wasmfx/specfx).
This particular PR simply extends `FeatureSet` with a corresponding entry for
this proposal.
|
|
|
|
|
|
|
|
|
| |
TypeFinalization finalizes all types that we can, that is, all private types that have no
children. TypeUnFinalization unfinalizes (opens) all (private) types.
These could be used by first opening all types, optimizing, and then finalizing, as that
might find more opportunities.
Fixes #5933
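For example (a minimal sketch), a private type with no subtypes can be marked final:
```wast
(type $A (sub (struct (field i32))))
;; => after TypeFinalization, as $A is private and has no subtypes
(type $A (sub final (struct (field i32))))
```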
|
|
|
|
|
|
|
|
|
|
|
| |
In some cases tuples are obviously not needed, such as when they are only used
in local operations and make/extract. Such tuples are not used as return values or
in control flow structures, so we might as well lower them to individual locals per
lane, which other passes can optimize a lot better.
I believe LLVM does the same with its own tuples: it lowers them as much as
possible, leaving only necessary ones.
Fixes #5923
|
|
|
|
|
| |
Now that the WasmGC spec has settled on a way of validating non-nullable locals,
we no longer need this experimental feature that allowed nonstandard uses of
non-nullable locals.
|
|
|
| |
Renaming the multimemory flag in Binaryen to match its naming in LLVM.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
GUFA refines existing casts, but does not add new casts for fear of increasing code size
and adding more cast operations at runtime. This PR adds a version that does add all
those casts, and it looks like at least code size improves rather than regresses, at least
on J2Wasm and Kotlin. That is, this pass adds a lot more casts, but subsequent
optimizations benefit enough to shrink overall code size.
However, this may still not be worthwhile, as even if code size decreases we may end
up doing more casts at runtime, and those casts might be hard to remove, e.g.:
(call $foo
 (x) ;; inferred to be non-null
)
(func $foo (param (ref null $A)) ..)
=>
(call $foo
 (ref.cast $A (x)) ;; add a cast here
)
(func $foo (param (ref $A)) ..) ;; a later pass refines here
That new cast cannot be removed after we refine the function parameter. If the
function never benefits from the fact that the input is non-null, then the cast is
wasted work (e.g. if the function only compares the input to another value).
To use this new pass, try --gufa-cast-all rather than --gufa. As with normal GUFA,
running the full optimizer afterwards is important, and even more important in
order to get rid of as many of the new casts as possible.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This pass strips all EH stuff, including EH instructions and tags, from
the input module and disables the EH feature from the features section.
1. This removes `catch` and `catch_all` blocks from the code. So
```wast
(try
(do
(some code)
)
(catch
...
)
)
```
becomes just `(some code)`. Note that all `rethrow`s will be removed
along with `catch`es.
2. This converts `throw (...)` into `unreachable`. (`rethrow`s are already
removed in 1, as noted above.)
3. This removes all tags from the module, which are unused anyway after
1 and 2.
4. This removes the exception handling feature from the features section.
You can use the pass with
```console
$ wasm-opt --enable-exception-handling --strip-eh INPUT -o OUTPUT
```
This is not an optimization pass, so it is not run unless you specify
the pass explicitly.
This is in effect similar to Clang's `-fignore-exceptions`, in which you
can throw but it will result in a crash and we compile away all landing
pads. This can be used for people who don't (or can't) use
`-fignore-exceptions` in their build settings or who want to compile
away `catch` blocks later.
Closes emscripten-core/emscripten#19585.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* Disable sign extension in SignExtLowering.cpp
The sign extension lowering pass would previously lower away the sign extension
instructions, but it wouldn't disable the sign extension feature, so follow-on
passes such as optimize-instructions could reintroduce sign extension
instructions.
Fix the pass to disable the sign extension feature to prevent sign extension
instructions from being reintroduced later.
* update pass description
|
|
|
|
|
| |
After this change, the only type system usable from the tools will be the
standard isorecursive type system. The nominal type system is still usable via
the API, but it will be removed entirely in a follow-on PR.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If a type hierarchy has abstract classes in the middle, that is, types that
are never instantiated, then we can optimize casts and other operations
to them. Say that in Java we have `AbstractList`, and `IntList` is the only
subclass of it that is ever created; then any place we have an `AbstractList`
we must actually have an `IntList`, or a null. (Or, if no subtype is instantiated,
then the value must definitely be a null.)
The actual implementation does a type mapping, that is, it finds all places
using an abstract type and makes them refer to the single instantiated
subtype (or null). After that change, no references to the abstract type
remain in the program, so this both refines types and also cleans up the
type section.
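For example (type names are illustrative), a cast to the abstract type can simply refer to the single instantiated subtype instead:
```wast
;; $AbstractList is never created; $IntList is the only created subtype.
(ref.cast $AbstractList (..ref..))
;; =>
(ref.cast $IntList (..ref..))
```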
|
|
|
|
|
|
|
|
| |
For example,
-O3 --skip-pass=vacuum
will run -O3 normally but it will not run the vacuum pass at all
(which normally runs more than once in -O3).
|
|
|
|
| |
This allows tools like wasm-reduce to be told to operate in closed-world mode. That
lets them validate in the stricter way of that mode.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The type rewriting utility in type-updating.cpp gathers all the used heap types,
then rewrites them to newly built and possibly modified heap types. The problem
is that for the isorecursive type system, the set of "used" heap types was
overly broad because it also included unused heap types that are in a rec group
with used types. In the context of emitting a binary, it is important to treat
these types as used because failing to emit them would change the identity of
the used types, but in the context of type optimizations it is ok to treat them
as truly unused because we are changing type identities anyway.
Update the type rewriting utility to only include truly used types in the set of
output types. This causes all existing type optimizations to implicitly drop
unused types, but only if they find any other optimizations to do and actually
run the rewriter utility. Their output will also still include unused types
that were used before their optimizations were applied.
To overcome these limitations and better match the optimizing power of nominal
mode, which never includes unused types in the output, add a new type
optimization pass that removes unused types and does nothing else and run it
near the end of the global optimization pipeline.
|
|
|
| |
Per the wasm spec guidelines for Load (rule 10) & Store (rule 12), this PR adds an option for bounds checking, producing a runtime error if the instruction exceeds the bounds of the particular memory within the combined memory.
|
|
|
|
|
|
|
|
| |
This finds types that can be merged into their super: types that add no
fields, and are not used in casts, etc. - so we might as well use the super.
This complements TypeSSA, in that it can merge back the new types that
TypeSSA created, if we never found a use for them. Without this, TypeSSA
can bloat binary size quite a lot (I see 10-20%).
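A minimal sketch (types illustrative): a subtype that adds no fields and is not used in casts gets folded into its supertype:
```wast
(type $A (sub (struct (field i32))))
(type $B (sub $A (struct (field i32))))  ;; adds nothing over $A
;; => all uses of $B become uses of $A, and $B can be removed
(type $A (sub (struct (field i32))))
```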
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This creates new nominal types for each (interesting) struct.new. That then allows
type-based optimizations to be more precise, as those optimizations will track
separate info for each struct.new, in effect. That is kind of like SSA, however, we
do not handle merges. For example:
x = struct.new $A (5);
print(x.value);
y = struct.new $A (11);
print(y.value);
=>
x = struct.new $A.x (5);
print(x.value);
y = struct.new $A.y (11);
print(y.value);
After the pass runs each of those struct.new creates a unique type, and type-based
analysis can see that 5 or 11 are the only values written in that type (if nothing else
writes there).
This bloats the type section with the new subtypes, so it is best used with a pass
to merge unneeded duplicate types, which a later PR will add. That later PR will
exactly merge back in the types created here, which are nominally different but
indistinguishable otherwise.
This pass is not enabled by default. It's not clear yet where is the best place to do it,
as it must be balanced by type merging, but it might be better to do multiple
rounds of optimization between the two. Needs more investigation.
|
|
|
| |
The flag does nothing so far.
|
|
|
|
| |
Equirecursive is no longer standards track and its implementation is extremely
complex. Remove it.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
(some.operation
(ref.cast .. (local.get $ref))
(local.get $ref)
)
=>
(some.operation
(local.tee $temp
(ref.cast .. (local.get $ref))
)
(local.get $temp)
)
This can help cases where we cast for some reason but happen to not use the
cast value in all places. This occurs in j2wasm in itable calls sometimes: the
this pointer is refined, but the itable access may be done with an unrefined pointer,
which is less optimizable.
So far this is just inside basic blocks, but that is enough for the casts in itable
calls and other common patterns I see.
|
|
|
|
| |
Fixes #5250
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Monomorphization finds cases where we send more refined types to a function
than it declares. In such cases we can copy the function and refine the parameters:
// B is a subtype of A
foo(new B());
function foo(x : A) { ..}
=>
foo_B(new B()); // call redirected to refined copy
function foo(x : A) { ..} // unchanged
function foo_B(x : B) { ..} // refined copy
This increases code size so it may not be worth it in all cases. This initial PR is
hopefully enough to start experimenting with this on performance, and so it does
not enable the pass by default.
This adds two variations of monomorphization, one that always does it, and the
default which is "careful": it sees whether monomorphizing lets the refined function
actually be better than the original (say, by removing a cast). If there is no
improvement then we do not make any changes. This saves a significant amount
of code size - on j2wasm the careful version increases by 13% instead of 20% -
but it does run more slowly obviously.
|
|
|
|
|
|
|
|
|
| |
This sorts globals by their usage (while respecting dependencies). If the module
has very many globals then using smaller LEBs can matter.
If there are fewer than 128 globals then we cannot reduce size, and the pass
exits early (so this pass will not slow down MVP builds, which usually have just
1 global, the stack pointer). But with wasm GC it is common to use globals for
vtables etc., and often there is a very large number of them.
|
|
|
|
|
|
|
|
|
|
| |
Adds a multi-memories lowering pass that will create a single combined memory from the memories added to the module. This pass assumes that each memory is configured the same (type, shared).
This pass also:
- replaces existing memory.size instructions with a custom function that returns the size of each memory as if they existed independently
- replaces existing memory.grow instructions with a custom function, using global offsets to track the size in pages of each memory so data doesn't overlap in the single combined memory
- adjusts the offsets of active data segments
- adjusts the offsets of Loads/Stores
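Conceptually (a rough sketch; the instruction shape and the helper global are illustrative, not the pass's literal output), an access to a non-first memory is remapped into the combined memory at that memory's offset:
```wast
(i32.load $mem2 (local.get $ptr))
;; =>
(i32.load
 (i32.add
  (local.get $ptr)
  (global.get $mem2_offset)  ;; illustrative: where $mem2's data starts in the combined memory
 )
)
```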
|