| Commit message | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
| |
Before, we would simply not export a function that had an e.g. anyref
param. As a result, the modules were effectively "closed", which was
good for testing full closed-world mode, but not for testing degrees of
open world. To improve that, this PR allows the fuzzer to export such
functions, and adds an "enclose world" pass that "closes" the wasm
(makes it more compatible with closed-world mode); that pass is run 50%
of the time, giving us coverage of both styles.
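As a rough sketch of what this now allows (the function name here is hypothetical), an export whose signature uses anyref leaves the module open-world unless the enclose-world pass later closes it:
```wast
(module
  ;; an exported function with an anyref param, which the fuzzer
  ;; previously refused to export
  (func $takes-any (export "takes-any") (param anyref)
    (nop)
  )
)
```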
|
|
|
|
|
|
|
|
|
|
|
| |
This pass lowers nontrapping FP to int instructions to implement LLVM's
conversion behavior. This means that they are not fully complete
lowerings according to the wasm spec, but have the same undefined
behavior that LLVM does. This keeps the pass simpler and preserves
existing behavior when compiling without nontrapping-fp.
This will be used in emscripten, so that we can build libraries with
nontrapping-fp and lower them away after link if desired.
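A sketch of the idea, assuming (per the description above) that each nontrapping instruction is mapped to its trapping counterpart, which shares LLVM's undefined behavior for NaN and out-of-range inputs:
```wast
;; nontrapping saturating conversion...
(i32.trunc_sat_f64_s (local.get $x))
;; ...lowered to the plain trapping conversion; the inputs where the two
;; differ (NaN, out of range) are undefined behavior in LLVM anyway
(i32.trunc_f64_s (local.get $x))
```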
|
|
|
|
| |
Since the resulting code has the same undefined behavior as LLVM, make
the pass name reflect that.
|
|
|
|
|
|
|
|
| |
This pass lowers away memory.copy and memory.fill operations. It
generates a function that implements each of the instructions and
replaces the instructions with calls to those functions.
It does not handle other bulk memory operations (e.g. passive segments
and table operations) because they are not used by emscripten, where
this pass will be used to enable targeting old browsers that don't
support bulk memory.
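A minimal sketch of the kind of helper such a pass can generate for memory.fill (the name and the byte-by-byte strategy are assumptions for illustration, not the pass's exact output; a memory is assumed to be present):
```wast
(func $__memory_fill (param $dst i32) (param $val i32) (param $size i32)
  (block $done
    (loop $continue
      ;; stop once all bytes have been written
      (br_if $done (i32.eqz (local.get $size)))
      (i32.store8 (local.get $dst) (local.get $val))
      (local.set $dst (i32.add (local.get $dst) (i32.const 1)))
      (local.set $size (i32.sub (local.get $size) (i32.const 1)))
      (br $continue)
    )
  )
)
```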
|
|
|
|
|
|
|
|
| |
This allows removing a reference field from all Java objects, reducing
the per-object memory and initialization overhead.
The pass is designed to run directly on the J2CL output before other
optimizations, since it relies on invariants that might get lost in
optimization. If the invariants don't hold, the pass aborts.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
HeapStoreOptimization (#6882)
This just moves code out of OptimizeInstructions to the new pass. The existing
test is renamed and now runs the new pass instead. The new pass is run right
after each --optimize-instructions invocation, so it should not cause any
noticeable effects whatsoever, making this NFC.
The motivation here is that there is a bug in the pass, see the new testcase
added at the end, which shows the bug. It is not practical to fix that bug in
OptimizeInstructions since we need more than peephole optimizations to do
so. This PR moves the code to a new pass so we can fix it there properly,
later.
The new pass is named HeapStoreOptimization since the same infrastructure
we will need to fix the bug will also help dead store elimination and related
things.
|
|
|
|
|
|
|
|
|
|
|
| |
The best way to lower strings is via the "magic imports" API that uses
the names of imported string globals as their values. This approach only
works for valid UTF-8 strings, though. The existing
string-lowering-magic-imports pass falls back to putting non-UTF-8
strings in a JSON custom section, but this requires the runtime to
support that custom section for correctness. To help catch errors early
when runtimes do not support the strings custom section, add a new pass
that uses magic imports and raises an error if there are any invalid
strings.
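A sketch of what a magic import looks like (the `"'"` module name follows the js-string-builtins convention, and the global and string here are invented for illustration):
```wast
;; the import's name *is* the string value; engines with a fast path can
;; materialize it at instantiation time without parsing anything in JS
(import "'" "hello world" (global $str.hello (ref extern)))
```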
|
|
|
|
|
|
|
|
|
|
|
|
| |
Most of our type optimization passes emit all non-public types as a
single large rec group, which trivially ensures that different types
remain different, even if they are optimized to have the same structure.
Usually emitting a single large rec group is fine, but it also means
that if the module is split, all of the types will need to be repeated
in all of the split modules. To better support this use case, add a pass
that can split the large rec group back into minimal rec groups, taking
care to preserve separate type identities by emitting different
permutations of the same group where possible or by inserting unused
brand types to differentiate them.
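A sketch of the idea with two structurally identical types (names invented): in one big rec group they are distinct by position; after splitting into minimal groups, an unused brand type keeps the two groups distinct:
```wast
;; before: $A and $B are distinct only by their positions in one group
(rec
  (type $A (struct (field i32)))
  (type $B (struct (field i32)))
)
;; after: minimal groups; the unused brand type changes the shape of the
;; second group so it does not collapse into the first
(rec
  (type $A2 (struct (field i32)))
)
(rec
  (type $B2 (struct (field i32)))
  (type $brand (struct))
)
```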
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Before the PR:
```console
$ bin/wasm-opt test/hello_world.wat --metrics
total
[exports] : 1
[funcs] : 1
[globals] : 0
[imports] : 0
[memories] : 1
[memory-data] : 0
[tables] : 0
[tags] : 0
[total] : 3
[vars] : 0
Binary : 1
LocalGet : 2
```
After the PR:
```console
$ bin/wasm-opt test/hello_world.wat --metrics
Metrics
total
[exports] : 1
[funcs] : 1
...
```
Note the "Metrics" addition at the top. And the title can be customized:
```console
$ bin/wasm-opt test/hello_world.wat --metrics=text
Metrics: text
total
[exports] : 1
[funcs] : 1
```
The custom title can be helpful when multiple invocations of metrics are used
at once, e.g. --metrics=before -O3 --metrics=after.
|
|
|
|
|
|
|
|
|
|
|
|
| |
Each pass instance can now store its own argument, which can differ between
instances. This may be a breaking change for the corner case of running a pass
multiple times and setting the pass's argument multiple times as well (before,
the last pass argument affected them all; now, it affects the last instance
only). This only affects arguments with the name of a pass; others remain
global, as before (and multiple passes can read them, in fact). See the
CHANGELOG for details.
Fixes #6646
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
RefTest (#6692)
CFP focuses on finding when a field always contains a constant, and then replaces
a struct.get with that constant. If we find there are two constant values, then in some
cases we can still optimize, if we have a way to pick between them. All we have is the
struct.get and its reference, so we must use a ref.test:
```wast
(struct.get $T x (..ref..))
;; =>
(select
  (..constant1..)
  (..constant2..)
  (ref.test $U (..ref..))
)
```
This is valid if, of all the subtypes of $T, those that pass the test have
constant1 in that field, and those that fail the test have constant2. For
example, a simple case is where $T has two subtypes, $T is never created
itself, and each of the two subtypes has a different constant value.
This is a somewhat risky operation, as ref.test is not necessarily cheap.
To mitigate that, this is a new pass, --cfp-reftest, that is not run by
default, and also we only optimize when we can use a ref.test on what
we think will be a final type (because ref.test on a final type can be
faster in VMs).
|
|
|
|
|
|
|
| |
This pass receives a list of functions to trace, and then wraps them in calls to
imports. This can be useful for tracing malloc/free calls, for example, but is
generic.
Fixes #6548
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Doing it before anything else can help a lot if there is a significant amount of
dead code that can be removed, as it saves work for all the later passes. We
did already run this pass just a few passes later when GC was enabled, but even
so it is worthwhile to run it an additional time, and it makes sense to do so
even without GC (though in typical optimized LLVM output there will be little
dead code).
If there is no dead code then this is wasted work, but this is a fairly fast pass,
and I measure no significant slowdown due to this. E.g. on the 35 MB clang.wasm
(which is already optimized, so little dead code) it takes around a second, while
all of -O2 takes almost two minutes, so the difference is just 1%.
On J2CL I measure a 15% speedup in -O3 --closed-world -tnh, and also the
binary is 2.5% smaller, which means there is less work for later cycles of -O3.
|
|
|
|
|
| |
Changes to wasm-validator.cpp here are mostly for consistency between
elem and data segment validation.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We settled on the name `WASM_EXNREF` for the new EH setting in Emscripten:
https://github.com/emscripten-core/emscripten/blob/2bc5e3156f07e603bc4f3580cf84c038ea99b2df/src/settings.js#L782-L786
"New EH" sounds vague, and I'm not sure "experimental" is really
necessary anyway, given that the potential users of this option are aware
that this is a new spec that has been adopted recently.
To make the option names consistent, this renames `--translate-to-eh`
(the option that only runs the translator) to `--translate-to-exnref`,
and `--experimental-new-eh` to `--emit-exnref` (the option that runs the
translator at the end of the whole pipeline), and renames the pass and
variable names in the code accordingly as well.
In case anyone is using the old option names (and also to make the
Chromium CI pass), this does not delete the old options.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously we had passes --generate-stack-ir, --optimize-stack-ir, --print-stack-ir
that could be run like any other passes. After generating StackIR it was stashed on
the function and invalidated if we modified BinaryenIR. If it wasn't invalidated then
it was used during binary writing. This PR switches things so that we optionally
generate, optimize, and print StackIR only during binary writing. It also removes
all traces of StackIR from wasm.h - after this, StackIR is a feature of binary writing
(and printing) logic only.
This is almost NFC, but there are some minor noticeable differences:
1. We no longer print an indication in the text format that a function has
StackIR. StackIR will not be there during normal printing, as it is only present
during binary writing (but --print-stack-ir still works as before; as mentioned
above, it runs during writing).
2. --generate/optimize/print-stack-ir change from being passes to being flags
that control that behavior instead. As passes, their order on the command line
mattered, while now it does not, and they only "globally" affect things during
writing.
3. The C API changes slightly, as there is no need to pass an "optimize" option
to the StackIR APIs. Whether we optimize is handled by --optimize-stack-ir,
which is set like other optimization flags on the PassOptions object, so we
don't need the old option to those C APIs.
The main benefit here is simplifying the code, so we don't need to think about
StackIR in more places than just binary writing. That may also allow future
improvements to our usage of StackIR.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This PR creates a pass to propagate debug locations from parent nodes, in a pre-order traversal, to child nodes that have no debug location. This is useful for compilers that use the Binaryen API to generate WebAssembly modules.
It behaves like `wasm-opt` reading a text format file: children are tagged with the debug info of the parent if they have no annotation of their own.
For compilers that use the Binaryen API to generate WebAssembly modules, it is a bit redundant to add debugInfo for each expression, especially when the compiler wraps expressions.
With this pass, compilers just need to add debugInfo for the parent node, which is more convenient.
For example:
```
(drop
(call $voidFunc)
)
```
Without this pass, if the compiler only adds debugInfo to the wrapping `drop`, the `call` expression has no corresponding source code mapping in DevTools debugging, which is obviously not user-friendly.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The latest idea for efficient string constants is to encode the constants in
the import names of their globals and implement fast paths in the engines for
materializing those constants at instantiation time without needing to parse
anything in JS. This strategy only works for valid strings (i.e. strings without
unpaired surrogates) because only valid strings can be used as import names in
the WebAssembly syntax.
Add a new configuration of the StringLowering pass that encodes valid string
contents in import names, falling back to the JSON custom section approach for
invalid strings.
To test this change, update the printer to escape import and export names
properly and update the legacy parser to parse escapes in import and export
names properly. As a drive-by, remove the incorrect check in the parser that the
import module and base names are non-empty.
|
|
|
|
|
|
|
|
|
|
|
|
| |
This change removes the "minimal" mode from `LegalizeJSInterface`
which was added in #1883.
The idea behind this change was to avoid legalizing most function except
those we know that JS will be calling. The idea was that for dynamic
linking we always want the non-legalized version to be shared between
wasm module. These days we solve this problem in a different way with
the `legalize-js-interface-export-originals` which exports the original
functions alongside the legalized ones. Emscripten then always
prefers the `$orig` functions when doing dynamic linking.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We already have passes to legalize i64 imports and exports, which the fuzzer will
run so that we can run wasm files in JS VMs. SIMD and multivalue also pose a
problem as they trap on the boundary. In principle we could legalize them as well,
but that is substantial effort, so instead just prune them: given a wasm module,
remove any imports or exports that use SIMD or multivalue (or anything else that
is not legal for JS).
Running this in the fuzzer will allow us to not skip running v8 on any testcase we
enable SIMD and multivalue for.
(Multivalue is allowed in newer VMs, so that part of this PR could be removed
eventually.)
Also remove the limitation on running v8 with multimemory (v8 now supports
that).
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
SimplifyGlobals already does this, so this is a subset of that pass, and does not
add anything new. It is useful for testing, however.
In particular it allows testing that we propagate subsequent globals in a single
pass; that is, if one global reads from another and becomes constant, then it
can be propagated as well. SimplifyGlobals runs multiple passes so this always
worked, but with this pass we can test that we do it efficiently in one pass.
This will also be useful for comparing stringref to imported strings, as it
allows gathered strings to be propagated to other globals (possible with
stringref, but not imported strings) but not anywhere else (which might have
downsides as it could lead to more allocations).
Also add an additional test for simplify-globals that we do not get confused by
an unoptimizable global.get in the middle (see last part).
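A sketch of the chained case described above (names invented): once $a is known to be constant, $b becomes constant as well, and a single pass can propagate both:
```wast
(global $a i32 (i32.const 42))
(global $b i32 (global.get $a)) ;; becomes constant once $a is propagated
;; a (global.get $b) elsewhere can then be replaced by (i32.const 42)
```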
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This extends StringGathering by replacing the gathered string globals to imported
globals. It adds a custom section with the strings that the imports are expected to
provide. It also replaces the string type with extern.
This is a complete lowering of strings, except for string operations that are a TODO.
After running this, no strings remain in the wasm, and the outside JS is expected
to provide the proper imports, which it can do by processing the JSON of the
strings in the custom section "string.consts", which looks like
["foo", "bar", ..]
That is, an array of strings, which are imported as
(import "string.const" "0" (global $string.const_foo (ref extern))) ;; foo
(import "string.const" "1" (global $string.const_bar (ref extern))) ;; bar
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This pass finds all string.const and creates globals for them. After this transform, no
string.const appears anywhere but in a global, and each string appears in one global
which is then global.get-ed everywhere.
This avoids overhead in VMs where executing a string.const is an allocation, and is
also a good step towards imported strings. For that, this pass will be extended from
gathering to a full lowering pass, which will first gather into globals as this pass does,
and then turn each of those globals with a string.const into an imported externref.
(For that reason this pass is in a file called StringLowering, as the two passes will
share much of their code, and the larger pass should decide the name I think.)
This pass runs in -O2 and above. Repeated executions have no downside (see
details in code).
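A sketch of the transform's shape as described above (names invented, not exact pass output):
```wast
;; before: the same constant is executed (possibly allocated) at each use
(drop (string.const "foo"))
;; after: one global holds the string, and every use reads it
(global $string.const_foo (ref string) (string.const "foo"))
(drop (global.get $string.const_foo))
```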
|
|
|
|
|
|
| |
The previous name feels too verbose and unwieldy.
This also removes the "new-to-old EH" placeholder. I think it'd be
better to add it back when it is actually added.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This translates the old Phase 3 EH instructions, which include `try`,
`catch`, `catch_all`, `delegate`, and `rethrow`, into the new EH
instructions, which include `try_table` (with `catch` / `catch_ref` /
`catch_all` / `catch_all_ref`) and `throw_ref`, passed at the Oct 2023
CG meeting.
This translator can be used as a standalone tool by users of the
previous EH toolchain to generate binaries for the new spec without
recompiling, and also can be used at the end of the Binaryen pipeline to
produce binaries for the new spec while the end-to-end toolchain
implementation for the new spec is in progress.
While the goal of this pass is not optimization, this tries to do a little
better than the most naive implementation, namely by omitting a few
instructions where possible and trying to minimize the number of
additional locals, because this can be used as a standalone translator
or the last stage of the pipeline while we can't post-optimize the
results because the whole pipeline (-On) is not ready for the new EH.
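A sketch of the translation shape for the simplest case (the real pass also handles tags, `catch_ref`, `rethrow`, etc., and tries to minimize the helper code):
```wast
;; old-style EH
(try
  (do (call $work))
  (catch_all (call $handler))
)
;; translated: try_table branches to a label on catch, instead of having
;; inline handler bodies
(block $done
  (block $caught
    (try_table (catch_all $caught)
      (call $work)
    )
    (br $done) ;; no exception: skip the handler
  )
  (call $handler)
)
```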
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We tested --generate-global-effects --vacuum and such, but not
--generate-global-effects -O3 or the other -O flags. Unfortunately, our
targeted testing missed a bug because of that. Specifically, we have special
logic for -O flags to make sure the passes they expand into run with the
proper opt and shrink levels, but that logic happened to also interfere with
global effect computation. It would also interfere with allowing GUFA info
or other things to be stored on the side, which we've proposed. This PR
fixes that + future issues.
The fix is to just allow a pass runner to execute more than once. We thought
to avoid that and assert against it to keep the model "hermetic" (you create
a pass runner, you run the passes, and you throw it out), which feels nice in
a way, but it led to the bug here, and I'm not sure it would prevent any other
ones really. It is also more code. It is simpler to allow a runner to execute more
than once, and add a method to clear it. With that, the logic for -O3 execution
is both simpler and does not interfere with anything but the opt and shrink
level flags: we create a single runner, give it the proper options, and then keep
using that runner + those options as we go, normally.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This PR creates a new pass to optimize J2CL-specific patterns
that would otherwise be difficult to recognize/prove generically
by other binaryen passes.
The pass currently handles fields we call "constant-like".
These are fields initialized once and unconditionally through the
"clinit" function, and technically they have 2 observable states:
- the initial null/0 state
- the initialized state.
However, you can only observe the initial null/0 state in contrived
examples, not in real-world/correct applications.
This pass moves such "clinit"-initialized fields to global initialization.
The above pattern also matches other lazy-init constructs like String and
Class literals (which binaryen already reduces to constant expressions), so
the pass is generalized to include them as well (by matching any functions
with the name pattern "_@once_").
In order for this pass to be effective:
1. It needs to run between O3 passes.
2. We need to stop inlining of "once" functions.
Stopping inlining of the once functions is important to preserve their
structure. This helps both the existing OnceReducer pass and the new J2CL
pass to be a lot more effective. It is also not useful to inline these
functions, as by definition they are only executed once. This can be
achieved by passing a no-inline filter.
Although inlining is generally disabled for these functions, it is
still needed in some cases, since the inliner is effectively responsible for
the removal of once functions that have been simplified into empty or simple
delegating functions. For this reason, the pass renames such trivial
functions so the no-inline filter no longer matches them.
Also note that after all optimizations are completed, it does make sense to
have a final stage where "partial inlining" of all once functions is
allowed. This will speed them up by moving the initialization check to the
call site.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Any function can now be annotated as not to be inlined fully (normally) or not to be
inlined partially. In the future we'll want to read those annotations from the proposed
wasm metadata section on code hints, and from wat text as well, but for now add
trivial passes that set those fields based on function name wildcards, e.g.:
--no-inline=*leave-alone* --inlining
That will not inline any function whose name contains "leave-alone".
--no-inline disables all inlining (full or partial) while --no-full-inline and
--no-partial-inline affect only full or partial inlining.
|
|
|
| |
Allow outlining to be excluded from the command line on non-Emscripten builds.
|
|
|
| |
Adds an outlining pass that performs outlining on a module end to end, and two tests.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This new optimization will eventually weaken casts by generalizing (i.e.
un-refining) their output types. If a cast is weakened enough that its output
type is a supertype of its input type, the cast will be able to be removed by
OptimizeInstructions.
Unlike refining cast inputs, generalizing cast outputs can break module
validation. For example, if the result of a cast is stored to a local and the
cast is weakened enough that its output type is no longer a subtype of that
local's type, then the local.set after the cast will no longer validate. To
avoid this validation failure, this optimization would have to generalize the
type of the local as well. In general, the more we can generalize the types of
program locations, the more we can weaken casts of values that flow into those
locations.
This initial implementation only generalizes the types of locals and does not
actually weaken casts yet. It serves as a proof of concept for the analysis
required to perform the full optimization, though. The analysis uses the new
analysis framework to perform a reverse analysis tracking type requirements for
each local and reference-typed stack value in a function.
Planned and potential future work includes:
- Implementing the transfer function for all kinds of expressions.
- Tracking requirements on the dynamic types of each location to generalize
allocations as well.
- Making the analysis interprocedural and generalizing the types of more
program locations.
- Optimizing tuple-typed locations.
- Generalizing only those locations necessary to eliminate at least one cast
(although this would make the analysis bidirectional, so it is probably better
left to separate passes).
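A sketch of the validation hazard described above (types invented): if this cast's output is generalized from `(ref $sub)` toward a supertype, the `local.set` only remains valid if `$l`'s type is generalized too:
```wast
(local $l (ref null $sub))
;; weakening the cast's output type past (ref null $sub) would make this
;; local.set fail to validate unless $l's type is weakened with it
(local.set $l
  (ref.cast (ref $sub) (local.get $x))
)
```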
|
|
|
|
|
|
|
|
| |
Because we currently strip some data segments (i.e. EM_JS strings)
during `--post-emscripten`, it is too late for `--separate-data-segments`,
which always runs in `wasm-emscripten-finalize`.
Once emscripten switches over to using the pass directly, we can remove
the support from `wasm-emscripten-finalize`.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add a new pass that analyzes the module to find the minimal subtyping relation
that is necessary to maintain the validity and semantics of the program and
rewrites the types to use this minimal relation. Besides eliminating references
to otherwise-unused intermediate types, this optimization should unlock
significant additional optimizing power in other type optimizations that are
constrained by having to maintain supertype validity, since after this new
optimization there are fewer and more general supertypes.
The analysis works by visiting each expression and module element to collect the
subtypings that are required to maintain its validity, then, using that as a
starting point, iteratively adding new subtypings required by type definitions
and casts until reaching a fixed point.
|
|
|
|
|
| |
All logging/instrumentation passes need to do this, to avoid using stale
global effects that are too low (too high is not optimal either, but at least it
cannot cause bugs).
|
|
|
|
|
|
|
|
|
| |
TypeFinalization finalizes all types that we can, that is, all private types that have no
children. TypeUnFinalization unfinalizes (opens) all (private) types.
These could be used by first opening all types, optimizing, and then finalizing, as that
might find more opportunities.
Fixes #5933
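A minimal sketch of the text-format difference: finalizing adds the `final` flag, which promises the type has no subtypes:
```wast
(type $open (sub (struct (field i32))))         ;; may have subtypes
(type $closed (sub final (struct (field i32)))) ;; may not; casts can be faster
```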
|
|
|
|
|
|
|
|
|
|
|
| |
In some cases tuples are obviously not needed, such as when they are only used
in local operations and make/extract. Such tuples are not used as return values or
in control flow structures, so we might as well lower them to individual locals per
lane, which other passes can optimize a lot better.
I believe LLVM does the same with its own tuples: it lowers them as much as
possible, leaving only necessary ones.
Fixes #5923
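A sketch of the lowering for the simple case (names invented): a tuple local used only via make/extract becomes one scalar local per lane:
```wast
;; before
(local $t (tuple i32 i64))
(local.set $t (tuple.make 2 (i32.const 1) (i64.const 2)))
(drop (tuple.extract 2 0 (local.get $t)))
;; after: one local per lane, which later passes optimize much better
(local $t.0 i32)
(local $t.1 i64)
(local.set $t.0 (i32.const 1))
(local.set $t.1 (i64.const 2))
(drop (local.get $t.0))
```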
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
GUFA refines existing casts, but does not add new casts for fear of increasing code size
and adding more cast operations at runtime. This PR adds a version that does add all
those casts, and it looks like at least code size improves rather than regresses, at least
on J2Wasm and Kotlin. That is, this pass adds a lot more casts, but subsequent
optimizations benefit enough to shrink overall code size.
However, this may still not be worthwhile, as even if code size decreases we may end
up doing more casts at runtime, and those casts might be hard to remove, e.g.:
```wast
(call $foo
  (x) ;; inferred to be non-null
)
(func $foo (param (ref null $A)) ..)
;; =>
(call $foo
  (ref.cast $A (x)) ;; add a cast here
)
(func $foo (param (ref $A)) ..) ;; later pass refines here
```
That new cast cannot be removed after we refine the function parameter. If the
function never benefits from the fact that the input is non-null, then the cast is
wasted work (e.g. if the function only compares the input to another value).
To use this new pass, try --gufa-cast-all rather than --gufa. As with normal GUFA,
running the full optimizer afterwards is important, and even more important in
order to get rid of as many of the new casts as possible.
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
This is a followup to #5333. That fixed the selection of which passes to run, but
forgot to also fix the global state of the current optimize/shrink levels. This PR
fixes that. As a result, running -O3 -Oz will now work as expected: the first -O3
will run the right passes (as #5333 fixed) and while running them, the global
optimize/shrinkLevels will be -O3 (and not -Oz), which this PR fixes.
A specific result of this is that -O3 -Oz used to inline less, since the invocation
of inlining during -O3 thought we were optimizing for size. The new test verifies
that we do fully inline in the first -O3 now.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This pass strips all EH stuff, including EH instructions and tags, from
the input module and disables the EH feature from the features section.
1. This removes `catch` and `catch_all` blocks from the code. So
```wast
(try
(do
(some code)
)
(catch
...
)
)
```
becomes just `(some code)`. Note that all `rethrow`s will be removed
with `catch`es.
2. This converts `throw (...)` into `unreachable`.
3. This removes all tags from the module, which are unused anyway after
1 and 2.
4. This removes exception handling feature from the features section.
You can use the pass with
```console
$ wasm-opt --enable-exception-handling --strip-eh INPUT -o OUTPUT
```
This is not an optimization pass, so it is not run unless you specify
the pass explicitly.
This is in effect similar to Clang's `-fignore-exceptions`, in which you
can throw but it will result in a crash and we compile away all landing
pads. This can be used for people who don't (or can't) use
`-fignore-exceptions` in their build settings or who want to compile
away `catch` blocks later.
Closes emscripten-core/emscripten#19585.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* Disable sign extension in SignExtLowering.cpp
The sign extension lowering pass would previously lower away the sign extension
instructions, but it wouldn't disable the sign extension feature, so follow-on
passes such as optimize-instructions could reintroduce sign extension
instructions.
Fix the pass to disable the sign extension feature to prevent sign extension
instructions from being reintroduced later.
* update pass description
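For reference, the standard shift-based lowering of a sign-extension instruction looks like this (a sketch):
```wast
(i32.extend8_s (local.get $x))
;; lowers to shifting the byte up to the sign bit and arithmetic-shifting
;; back down
(i32.shr_s
  (i32.shl (local.get $x) (i32.const 24))
  (i32.const 24)
)
```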
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If a type hierarchy has abstract classes in the middle, that is, types that
are never instantiated, then we can optimize casts and other operations
to them. Say in Java that we have `AbstractList`, and it only has one
subclass `IntList` that is ever created, then any place we have an `AbstractList`
we must actually have an `IntList`, or a null. (Or, if no subtype is instantiated,
then the value must definitely be a null.)
The actual implementation does a type mapping, that is, it finds all places
using an abstract type and makes them refer to the single instantiated
subtype (or null). After that change, no references to the abstract type
remain in the program, so this both refines types and also cleans up the
type section.
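A sketch using the example above: since `$AbstractList` is never created, any cast to it can be mapped to the single instantiated subtype:
```wast
(ref.cast (ref null $AbstractList) (local.get $list))
;; becomes a cast to the only type that can actually be there (or a null)
(ref.cast (ref null $IntList) (local.get $list))
```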
|
|
|
|
|
| |
Nested runners should be ignored, as they run some internal stuff in
certain passes, which would not contain the pass the user asked to
skip with --skip-pass.
|
|
|
|
|
|
|
|
| |
For example,
-O3 --skip-pass=vacuum
will run -O3 normally but it will not run the vacuum pass at all
(which normally runs more than once in -O3).
|
|
|
|
| |
Without the names section, debugging can sometimes be hard on the binaries
that mode emits for each pass.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The type rewriting utility in type-updating.cpp gathers all the used heap types,
then rewrites them to newly built and possibly modified heap types. The problem
is that for the isorecursive type system, the set of "used" heap types was
overly broad because it also included unused heap types that are in a rec group
with used types. In the context of emitting a binary, it is important to treat
these types as used because failing to emit them would change the identity of
the used types, but in the context of type optimizations it is ok to treat them
as truly unused because we are changing type identities anyway.
Update the type rewriting utility to only include truly used types in the set of
output types. This causes all existing type optimizations to implicitly drop
unused types, but only if they find any other optimizations to do and actually
run the rewriter utility. Their output will also still include unused types
that were used before their optimizations were applied.
To overcome these limitations and better match the optimizing power of nominal
mode, which never includes unused types in the output, add a new type
optimization pass that removes unused types and does nothing else and run it
near the end of the global optimization pipeline.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Do not optimize or modify public heap types in any way. Public heap types
include the types of imported or exported functions, tables, globals, etc. This
is important to maintain the public interface of a module and ensure it can
still link and interact as intended with the outside world.
Also add validation error if we find any nontrivial public types that are not
the types of imported or exported functions. This error is meant to help the
user ensure that type optimizations are not silently inhibited. In the future,
we may want to add options to silence this error or downgrade it to a warning.
This commit only updates the type updating machinery to avoid updating public
types. It does not update any optimization passes accordingly. Since we avoid
modifying public signature types already, this is not expected to break
anything, but in the future once we have function subtyping or if we make the
error optional, we may have to update some of our optimization passes.
|
|
|
| |
Per the wasm spec guidelines for Load (rule 10) & Store (rule 12), this PR adds an option for bounds checking, producing a runtime error if the instruction exceeds the bounds of the particular memory within the combined memory.
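A rough sketch of what such a guard can look like once memories are combined (the per-memory size global and the 4-byte access size are assumptions for illustration, not the pass's exact output):
```wast
;; before a 4-byte load from what was memory $m1, trap if the access
;; would cross the end of $m1's region within the combined memory
(if (i32.gt_u
      (i32.add (local.get $addr) (i32.const 4))
      (global.get $m1_byte_size)) ;; hypothetical size global
  (then (unreachable))
)
```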
|
|
|
|
|
|
|
|
| |
This finds types that can be merged into their super: types that add no
fields, and are not used in casts, etc. - so we might as well use the super.
This complements TypeSSA, in that it can merge back the new types that
TypeSSA created, if we never found a use for them. Without this, TypeSSA
can bloat binary size quite a lot (I see 10-20%).
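A sketch of a mergeable pair (names invented): `$B` adds no fields, so if it is never used in casts or other type-identity-sensitive places, every use of `$B` can simply use `$A`:
```wast
(type $A (sub (struct (field i32))))
(type $B (sub $A (struct (field i32)))) ;; identical shape; merged into $A
```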
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This creates new nominal types for each (interesting) struct.new. That then allows
type-based optimizations to be more precise, as those optimizations will track
separate info for each struct.new, in effect. That is kind of like SSA, however, we
do not handle merges. For example:
```
x = struct.new $A (5);
print(x.value);
y = struct.new $A (11);
print(y.value);
// =>
x = struct.new $A.x (5);
print(x.value);
y = struct.new $A.y (11);
print(y.value);
```
After the pass runs each of those struct.new creates a unique type, and type-based
analysis can see that 5 or 11 are the only values written in that type (if nothing else
writes there).
This bloats the type section with the new subtypes, so it is best used with a pass
to merge unneeded duplicate types, which a later PR will add. That later PR will
exactly merge back in the types created here, which are nominally different but
indistinguishable otherwise.
This pass is not enabled by default. It's not clear yet where the best place
for it is, as it must be balanced by type merging, but it might be better to do
multiple rounds of optimization between the two. Needs more investigation.
|