| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
|
|
|
|
|
|
|
|
|
|
|
|
| |
With this change we default to an open world, that is, we do the safe thing
by default: we no longer assume a closed world. Users that want a closed
world must pass --closed-world.
Atm we just do not run passes that assume a closed world. (We might later
refine them to find which types don't escape and only optimize those.) The
RemoveUnusedModuleElements is an exception in that the closed-world
flag influences one part of its operation, but not the rest.
Fixes #5292
|
|
|
|
| |
Equirecursive is no longer standards track and its implementation is extremely
complex. Remove it.
|
|
|
|
|
| |
Dumping the text is nice sometimes, but on huge testcases the wat can be 1 GB
in size (!), and so dumping one per pass can lead to using 20 GB or so for the full
optimization pipeline. Emit just the binary to avoid that.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
(some.operation
(ref.cast .. (local.get $ref))
(local.get $ref)
)
=>
(some.operation
(local.tee $temp
(ref.cast .. (local.get $ref))
)
(local.get $temp)
)
This can help cases where we cast for some reason but happen to not use the
cast value in all places. This occurs in j2wasm in itable calls sometimes: The
this pointer is is refined, but the itable may be done with an unrefined pointer,
which is less optimizable.
So far this is just inside basic blocks, but that is enough for the cast of itable
calls and other common patterns I see.
|
|
|
|
| |
Fixes #5250
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Monomorphization finds cases where we send more refined types to a function
than it declares. In such cases we can copy the function and refine the parameters:
// B is a subtype of A
foo(new B());
function foo(x : A) { ..}
=>
foo_B(new B()); // call redirected to refined copy
function foo(x : A) { ..} // unchanged
function foo_B(x : B) { ..} // refined copy
This increases code size so it may not be worth it in all cases. This initial PR is
hopefully enough to start experimenting with this on performance, and so it does
not enable the pass by default.
This adds two variations of monomorphization, one that always does it, and the
default which is "careful": it sees whether monomorphizing lets the refined function
actually be better than the original (say, by removing a cast). If there is no
improvement then we do not make any changes. This saves a significant amount
of code size - on j2wasm the careful version increases by 13% instead of 20% -
but it does run more slowly obviously.
|
|
|
|
|
|
|
|
|
| |
This sorts globals by their usage (and respecting dependencies). If the module
has very many globals then using smaller LEBs can matter.
If there are fewer than 128 globals then we cannot reduce size, and the pass
exits early (so this pass will not slow down MVP builds, which usually have just
1 global, the stack pointer). But with wasm GC it is common to use globals for
vtables etc., and often there is a very large number of them.
|
|
|
|
|
|
|
|
|
|
| |
Adds a multi-memories lowering pass that will create a single combined memory from the memories added to the module. This pass assumes that each memory is configured the same (type, shared).
This pass also:
- replaces existing memory.size instructions with a custom function that returns the size of each memory as if they existed independently
- replaces existing memory.grow instructions with a custom function, using global offsets to track the page size of each memory so data doesn't overlap in the singled combined memory
- adjusts the offsets of active data segments
- adjusts the offsets of Loads/Stores
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
With the goal of supporting null characters (i.e. zero bytes) in strings.
Rewrite the underlying interned `IString` to store a `std::string_view` rather
than a `const char*`, reduce the number of map lookups necessary to intern a
string, and present a more immutable interface.
Most importantly, replace the `c_str()` method that returned a `const char*`
with a `toString()` method that returns a `std::string`. This new method can
correctly handle strings containing null characters. A `const char*` can still
be had by calling `data()` on the `std::string_view`, although this usage should
be discouraged.
This change is NFC in spirit, although not in practice. It does not intend to
support any particular new functionality, but it is probably now possible to use
strings containing null characters in at least some cases. At least one parser
bug is also incidentally fixed. Follow-on PRs will explicitly support and test
strings containing nulls for particular use cases.
The C API still uses `const char*` to represent strings. As strings containing
nulls become better supported by the rest of Binaryen, this will no longer be
sufficient. Updating the C and JS APIs to use pointer, length pairs is left as
future work.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously only WalkerPasses had access to the `getPassRunner` and
`getPassOptions` methods. Move those methods to `Pass` so all passes can use
them. As a result, the `PassRunner` passed to `Pass::run` and
`Pass::runOnFunction` is no longer necessary, so remove it.
Also update `Pass::create` to return a unique_ptr, which is more efficient than
having it return a raw pointer only to have the `PassRunner` wrap that raw
pointer in a `unique_ptr`.
Delete the unused template `PassRunner::getLast()`, which looks like it was
intended to enable retrieving previous analyses and has been in the code base
since 2015 but is not implemented anywhere.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This adds a map of function name => the effects of that function to the
PassOptions structure. That lets us compute those effects once and then
use them in multiple passes afterwards. For example, that lets us optimize
away a call to a function that has no effects:
(drop (call $nothing))
[..]
(func $nothing
;; .. lots of stuff but no effects, only a returned value ..
)
Vacuum will remove that dropped call if we tell it that the called function has
no effects. Note that a nice result of adding this to the PassOptions struct
is that all passes will use the extra info automatically.
This is not enabled by default as the benefits seem rather minor, though it
does help in a small but noticeable way on J2Wasm code, where we use
call.without.effects and have situations like this:
(func $foo
(call $bar)
)
(func $bar
(call.without.effects ..)
)
The call to bar looks like it has effects, normally, but with global effect info
we know it actually doesn't.
To use this, one would do
--generate-global-effects [.. some passes that use the effects ..] --discard-global-effects
Discarding is not necessary, but if there is a pass later that adds effects, then not
discarding could lead to bugs, since we'd think there are fewer effects than there are.
(However, normal optimization passes never add effects, only remove them.)
It's also possible to call this multiple times:
--generate-global-effects -O3 --generate-global-effects -O3
That computes affects after the first -O3, and may find fewer effects than earlier.
This doesn't compute the full transitive closure of the effects across functions. That is,
when computing a function's effects, we don't look into its own calls. The simple case
so far is enough to handle the call.without.effects example from before (though it
may take multiple optimization cycles).
|
|
|
|
|
|
|
| |
Add a pass that wraps all imports and exports with functions that handle
storing and passing along the suspender externref needed for JSPI.
https://github.com/WebAssembly/js-promise-integration/blob/main/proposals/js-promise-integration/Overview.md
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
An overview of this is in the README in the diff here (conveniently, it is near the
top of the diff). Basically, we fix up nn locals after each pass, by default. This keeps
things easy to reason about - what validates is what is valid wasm - but there are
some minor nuances as mentioned there, in particular, we ignore nameless blocks
(which are commonly added by various passes; ignoring them means we can keep
more locals non-nullable).
The key addition here is LocalStructuralDominance which checks which local
indexes have the "structural dominance" property of 1a, that is, that each get has
a set in its block or an outer block that precedes it. I optimized that function quite
a lot to reduce the overhead of running that logic after each pass. The overhead
is something like 2% on J2Wasm and 0% on Dart (0%, because in this mode we
shrink code size, so there is less work actually, and it balances out).
Since we run fixups after each pass, this PR removes logic to manually call the
fixup code from various places we used to call it (like eh-utils and various passes).
Various passes are now marked as requiresNonNullableLocalFixups => false.
That lets us skip running the fixups after them, which we normally do automatically.
This helps avoid overhead. Most passes still need the fixups, though - any pass
that adds a local, or a named block, or moves code around, likely does.
This removes a hack in SimplifyLocals that is no longer needed. Before we
worked to avoid moving a set into a try, as it might not validate. Now, we just do it
and let fixups happen automatically if they need to: in the common code they
probably don't, so the extra complexity seems not worth it.
Also removes a hack from StackIR. That hack tried to avoid roundtrip adding a
nondefaultable local. But we have the logic to fix that up now, and opts will
likely keep it non-nullable as well.
Various tests end up updated here because now a local can be non-nullable -
previous fixups are no longer needed.
Note that this doesn't remove the gc-nn-locals feature. That has been useful for
testing, and may still be useful in the future - it basically just allows nn locals in
all positions (that can't read the null default value at the entry). We can consider
removing it separately.
Fixes #4824
|
|
|
|
|
|
| |
In BINARYEN_PASS_DEBUG=2 we save the module before each pass, and if
validation fails afterwards, we print the module before. This PR does the same for
function-parallel passes - in that case, we can actually show the specific function
that broke validation, as opposed to the whole module.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This tracks the possible contents in the entire program all at once using a single IR.
That is in contrast to say DeadArgumentElimination of LocalRefining etc., all of whom
look at one particular aspect of the program (function params and returns in DAE,
locals in LocalRefining). The cost is to build up an entire new IR, which takes a lot
of new code (mostly in the already-landed PossibleContents). Another cost
is this new IR is very big and requires a lot of time and memory to process.
The benefit is that this can find opportunities that are only obvious when looking
at the entire program, and also it can track information that is more specialized
than the normal type system in the IR - in particular, this can track an ExactType,
which is the case where we know the value is of a particular type exactly and not
a subtype.
|
|
|
|
|
|
|
|
|
| |
This pass helps on at least one Java microbenchmark in a clear way. Real-world data
is mixed, with no obvious benefit. But it does optimize 819 callsites on the real-world
14 MB J2Wasm binary, so we may find it helps if/when we run into those code paths.
On that binary (the biggest we have for GC) this pass runs in 0.12 seconds, so there
is very little downside to enabling it. It is a fast linear-time operation.
|
|
|
|
|
|
|
|
| |
We have some possible use cases for this pass, and so are restoring
it.
This reverts the removal in #3261, fixes compile errors in internal API
changes since then, and flips the direction of the stack for the
wasm backend.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This optimizes constants in the megamorphic case of two: when we
know two function references are possible, we could in theory emit this:
(select
(ref.func A)
(ref.func B)
(ref.eq
(..ref value..) ;; globally, only 2 things are possible here, and one has
;; ref.func A as its value, and the other ref.func B
(ref.func A))
That is, compare to one of the values, and emit the two possible values there.
Other optimizations can then turn a call_ref on this select into an if over
two direct calls, leading to devirtualization.
We cannot compare a ref.func directly (since function references are not
comparable), and so instead we look at immutable global structs. If we
find a struct type that has only two possible values in some field, and
the structs are in immutable globals (which happens in the vtable case
in j2wasm for example), then we can compare the references of the struct
to decide between the two values in the field.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This adds a new signature-pruning pass that prunes parameters from
signature types where those parameters are never used in any function
that has that type. This is similar to DeadArgumentElimination but works
on a set of functions, and it can handle indirect calls.
Also move a little code from SignatureRefining into a shared place to
avoid duplication of logic to update signature types.
This pattern happens in j2wasm code, for example if all method functions
for some virtual method just return a constant and do not use the this
pointer.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Merge similar functions that only differs constant values (like immediate
operand of const and call insts) by parameterization.
Performing this pass at post-link time can merge more functions across
objects. Inspired by Swift compiler's optimization which is derived from
LLVM's one:
https://github.com/apple/swift/blob/main/lib/LLVMPasses/LLVMMergeFunctions.cpp
https://github.com/llvm/llvm-project/blob/main/llvm/docs/MergeFunctions.rst
The basic ideas here are constant value parameterization and direct callee
parameterization by indirection.
Constant value parameterization is like below:
;; Before
(func $big-const-42 (result i32)
[[many instr 1]]
(i32.const 44)
[[many instr 2]]
)
(func $big-const-43 (result i32)
[[many instr 1]]
(i32.const 45)
[[many instr 2]]
)
;; After
(func $byn$mgfn-shared$big-const-42 (result i32)
[[many instr 1]]
(local.get $0) ;; parameterized!!
[[many instr 2]]
)
(func $big-const-42 (result i32)
(call $byn$mgfn-shared$big-const-42
(i32.const 42)
)
)
(func $big-const-43 (result i32)
(call $byn$mgfn-shared$big-const-42
(i32.const 43)
)
)
Direct callee parameterization is similar to the constant value parameterization,
but it parameterizes callee function i by ref.func instead. Therefore it is enabled
only when reference-types and typed-function-references features are enabled.
I saw 1 ~ 2 % reduction for SwiftWasm binary and Ruby's wasm port
using wasi-sdk, and 3 ~ 4.5% reduction for Unity WebGL binary when -Oz.
|
|
|
|
| |
After emscripten-core/emscripten#15905 lands Emscripten will no longer use it,
and nothing else needs it AFAIK.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This adds `EHUtils::handleBlockNestedPops`, which can be called at the
end of passes that has a possibility to put `pop`s inside `block`s. This
method assumes there exists a `pop` in a first-descendant line, even
though it can be nested within a block. This allows a `pop` to be nested
within a `block` or a `try`, but not a `loop`, since that means the
`pop` can run multile times. In case of `if`, `pop` can exist only in
its condition; if a `pop` is in its true or false body, that's not in
the first-descendant line.
This can be useful when optimization passes create blocks to do
transformations. Wrapping expressions wiith a block does not change
semantics most of the time, but if pops happen to be inside a block
generated by those passes, they can result in invalid binaries.
To test this, this adds `passes/test_passes.cpp`, which is intended to
contain multiple test passes that test a single (or more) utility
functions separately. Without this kind of pass, it is hard to test
various cases in which nested `pop`s can be generated in existing
passes. This PR also adds `PassRegistry::registerTestPass`, which
registers a pass that's intended only for internal testing and does not
show up in `wasm-opt --help`.
Fixes #4237.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is fairly short and simple after the recent refactorings. This basically
just finds all uses of each signature/function type, and then sees if it
receives more specific types as params. It then rewrites the types if so.
This just handles arguments so far, and not return types.
This differs from DeadArgumentElimination's refineArguments() in that
that pass modifies each function by itself, changing the type of the
function as needed. That is only valid if the type is not observable, that
is, if the function is called indirectly then DAE ignores it. This pass will
work on the types themselves, so it considers all functions sharing a
type as a whole, and when it upgrades that type it ends up affecting them
all.
This finds optimization opportunities on 4% of the total signature
types in j2wasm. Those lead to some benefits in later opts, but the
effect is not huge.
|
|
|
|
|
|
|
|
| |
Fairly simple, this uses the existing infrastructure to find opportunities
to refine the type of a global variable. This a common pattern in j2wasm
for example, where a global begins as a null of $java.lang.Object (the
least specific type) but it is in practice always assigned an object of
some specific type.
|
| |
|
| |
|
|
|
|
|
|
|
|
|
| |
This specializes the fields of structs based on the types written to them. That is,
if a field is of type A but in practice we always write some subtype B to it
then we can change the type of the field to that.
On j2wasm this manages to improve at least one field in 2% of types. Not a
large amount, but this does lead to further benefits in later opts (e.g. about a third
of the improvements are to turn a field non-nullable).
|
|
|
|
|
|
|
|
| |
Now that all known issues with that pass are fixed, enable it by
default. This adds it in a place that seems to make sense on j2wasm,
but in general multiple cycles of optimization will be needed.
This adds a test showing that we run this pass and that it helps
ConstantFieldPropagation by running before it.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add struct.get tracking, and if a field is never read from, simply remove
it.
This will error if a field is written using struct.new with a value with side
effects. It is not clear we can handle that, as if the struct.new is in a
global then we can't save the other values to locals etc. to reorder
things. We could perhaps use other globals for it (ugh) but at least for
now, that corner case does not happen on any code I can see.
This allows a quite large code size reduction on j2wasm output (20%). The
reason is that many vtable fields are not actually read, and so removing
them and the ref.func they hold allows us to get rid of those functions,
and code that they reach.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add a new pass to perform global type optimization. So far this just
does one thing, to find fields with no struct.set and to turn them
immutable (where possible - sub and supertypes must agree).
To do that, this adds a GlobalTypeRewriter utility which rewrites
all the heap types in the module, allowing changes while doing so.
In this PR, the change is to flip the mutable field. Otherwise, the
utility handles all the boilerplate of creating temp heap types using
a TypeBuilder, and it handles replacing the types in every place
they are used in the module.
This is not enabled by default yet as I don't see enough of a benefit
on j2cl. This PR is basically the simplest thing to do in the space of
global type optimization, and the simplest way I can think of to
fully test the GlobalTypeRewriter (which can't be done as a unit
test, really, since we want to emit a full module and validate it etc.).
This PR builds the foundation for more complicated things like
removing unused fields, subtyping fields, and more.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
An "intrinsic" is modeled as a call to an import. We could also add new
IR things for them, but that would take more work and lead to less
clear errors in other tools if they try to read a binary using such a
nonstandard extension.
A first intrinsic is added here, call.without.effects This is basically the same
as call_ref except that the optimizer is free to assume the call has no
side effects. Consequently, if the result is not used then it can be optimized
out (as even if it is not used then side effects could have kept it around).
Likewise, the lack of side effects allows more reordering and other
things.
A lowering pass for intrinsics is provided. Rather than automatically
lower them to normal wasm at the end of optimizations, the user must
call that pass explicitly. A typical workflow might be
-O --intrinsic-lowering -O
That optimizes with the intrinsic present - perhaps removing calls
thanks to it - then lowers it into normal wasm - it turns into a call_ref -
and then optimizes further, which would turns the call_ref into a
direct call, potentially inline, etc.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Some functions run only once with this pattern:
function foo() {
if (foo$ran) return;
foo$ran = 1;
...
}
If that global is not ever set to 0, then the function's payload (after the
initial if and return) will never execute more than once. That means we
can optimize away dominated calls:
foo();
foo(); // we can remove this
To do this, we find which globals are "once", which means they can
fit in that pattern, as they are never set to 0. If a function looks like the
above pattern, and it's global is "once", then the function is "once" as
well, and we can perform this optimization.
This removes over 8% of static calls in j2cl.
|
|
|
|
|
|
|
|
|
|
|
|
| |
Enable it in -O3 and -Os and higher.
This helps very little on output from LLVM, but also it does not alter
compile times much anyhow. On code that has not been run through
an optimizing compiler already, this can help quite a lot, e.g., 15% of
code size on some wasm GC samples.
This will not normally help with speed, as optimizing VMs do such
things anyhow. However, this can help baseline compilers and
interpreters and so forth.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Technically this is not a new pass, but it is a rewrite almost from scratch.
Local Common Subexpression Elimination looks for repeated patterns,
stuff like this:
x = (a + b) + c
y = a + b
=>
temp = a + b
x = temp + c
y = temp
The old pass worked on flat IR, which is inefficient, and was overly
complicated because of that. The new pass uses a new algorithm that
I think is pretty simple, see the detailed comment at the top.
This keeps the pass enabled only in -O4, like before - right after
flattening the IR. That is to make this as minimal a change as possible.
Followups will enable the pass in the main pipeline, that is, we will
finally be able to run it by default. (Note that to make the pass work
well after flatten, an extra simplify-locals is added - the old pass used
to do part of simplify-locals internally, which was one source of
complexity. Even so, some of the -O4 tests have changes, due to
minor factors - they are just minor orderings etc., which can be
seen by inspecting the outputs before and after using e.g.
--metrics)
This plus some followup work leads to large wins on wasm GC output.
On j2cl there is a common pattern of repeated struct.gets, so common
that this pass removes 85% of all struct.gets, which makes the total
binary 15% smaller. However, on LLVM-emitted code the benefit is
minor, less than 1%.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
A field in a struct is constant if we can see that in the entire program we
only ever write the same constant value to it. For example, imagine a
vtable type that we construct with the same funcrefs each time, then (if
we have no struct.sets, or if we did, and they had the same value), we
could replace a get with that constant value, since it cannot be anything
else:
(struct.new $T (i32.const 10) (rtt))
..no other conflicting values..
(struct.get $T 0) => (i32.const 10)
If the value is a function reference, then this may allow other passes
to turn what was a call_ref into a direct call and perhaps also get
inlined, effectively a form of devirtualization.
This only works in nominal typing, as we need to know the supertype
of each type. (It could work in theory in structural, but we'd need to do
hard work to find all the possible supertypes, and it would also
become far less effective.)
This deletes a trivial test for running -O on GC content. We have
many more tests for GC at this point, so that test is not needed, and
this PR also optimizes the code into something trivial and
uninteresting anyhow.
|
|
|
|
|
| |
Add a new OptimizeForJS pass which contains rewriting rules specific to JavaScript.
LLVM usually lowers x != 0 && (x & (x - 1)) == 0 (isPowerOf2) to popcnt(x) == 1 which is ok for wasm and other targets but is quite expensive for JavaScript. In this PR we lower the popcnt pattern back to the isPowerOf2 pattern.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If a local is say anyref, but all values assigned to it are something
more specific like funcref, then we can make the type of the local
more specific. In principle that might allow further optimizations, as
the local.gets of that local will have a more specific type that their
users can see, like this:
(import .. (func $get-funcref (result funcref)))
(func $foo (result i32)
(local $x anyref)
(local.set $x (call $get-funcref))
(ref.is_func (local.get $x))
)
=>
(func $foo (result i32)
(local $x funcref) ;; updated to a subtype of the original
(local.set $x (call $get-funcref))
(ref.is_func (local.get $x)) ;; this can now be optimized to "1"
)
A possible downside is that using more specific types may not end
up allowing optimizations but may end up increasing the size of
the binary (say, replacing lots of anyref with various specific
types that compress more poorly; also, for recursive types the LUB
may be a unique type appearing nowhere else in the wasm). We
should investigate the code size factors more later.
|
|
|
|
|
|
|
|
| |
This is a useful alternative to extract-function when you don't know the
function's name.
Also moves the extract-function tests to be lit tests and re-uses them as
extract-function-index tests.
|
|
|
|
|
|
|
|
|
|
|
| |
If we run a pass that removes DWARF followed by one that could destroy it, then
there is no possible problem - there is nothing left to destroy. We can run the later
pass with no issues (and no warnings).
Also add an assertion on running a pass runner only once. That has always been
the assumption, and now that we track whether the added passes remove debug
info, we need to check it.
Fixes emscripten-core/emscripten#14161
|
|
|
|
|
|
|
|
|
|
| |
wasm-as supports --symbolmap=FOO as an argument. We got a request to
support the same in wasm-opt. wasm-opt does have --print-function-map which
does the same, but as a pass. To unify them, use the new pass arg sugar from
#3882 which allows us to add a --symbolmap pass whose argument can be
set as --symbolmap=FOO. That perfectly matches the wasm-as notation.
For now, keep the old --print-function-map notation as well, to not break
emscripten. After we remove it there we can remove it here.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If we allocate some GC data, and do not let the reference escape, then we can
replace the allocation with locals, one local for each field in the allocation
basically. This avoids the allocation, and also allows us to optimize the locals
further.
On the Dart DeltaBlue benchmark, this is a 24% speedup (making it faster than
the JS version, incidentially), and also a 6% reduction in code size.
The tests are not the best way to show what this does, as the pass assumes
other passes will clean up after. Here is an example to clarify. First, in pseudocode:
ref = new Int(42)
do {
ref.set(ref.get() + 1)
} while (import(ref.get())
That is, we allocate an int on the heap and use it as a counter. Unnecessarily,
as it could be a normal int on the stack.
Wat:
(module
;; A boxed integer: an entire struct just to hold an int.
(type $boxed-int (struct (field (mut i32))))
(import "env" "import" (func $import (param i32) (result i32)))
(func "example"
(local $ref (ref null $boxed-int))
;; Allocate a boxed integer of 42 and save the reference to it.
(local.set $ref
(struct.new_with_rtt $boxed-int
(i32.const 42)
(rtt.canon $boxed-int)
)
)
;; Increment the integer in a loop, looking for some condition.
(loop $loop
(struct.set $boxed-int 0
(local.get $ref)
(i32.add
(struct.get $boxed-int 0
(local.get $ref)
)
(i32.const 1)
)
)
(br_if $loop
(call $import
(struct.get $boxed-int 0
(local.get $ref)
)
)
)
)
)
)
Before this pass, the optimizer could do essentially nothing with this.
Even with this pass, running -O1 has no effect, as the pass is only
used in -O2+. However, running --heap2local -O1 leads to this:
(func $0
(local $0 i32)
(local.set $0
(i32.const 42)
)
(loop $loop
(br_if $loop
(call $import
(local.tee $0
(i32.add
(local.get $0)
(i32.const 1)
)
)
)
)
)
)
All the GC heap operations have been removed, and we just
have a plain int now, allowing a bunch of other opts to run. That
output is basically the optimal code, I think.
|
|
|
|
|
|
| |
This allows changing a global's value on the commandline in an easy way.
In some toolchains this is useful as the build can contain a global that
indicates something like a logging level, which this can customize.
|
|
|
|
|
| |
The pass gives a simple short name to each type, $type$N. This can be
useful in debugging, to avoid the autogenerated names which can be very
long.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Adds a Poppify ("--poppify") pass for converting normal Binaryen IR to Poppy IR.
Like the existing construction of Stacky IR, Poppify depends on the
BinaryenIRWriter to drive the emitting of instructions in correct stack machine
order. As instructions are "emitted," Poppify replaces their children with pops
and collects them in a list. At the end of each scope, Poppify creates a block
containing all the collected instructions for that scope and injects that block
into the enclosing scope. All tuple globals and instructions dealing with tuples
are also expanded to remove all tuples from the program.
The validator currently fails to validate many valid Poppy IR patterns produced
in the tests, but fixing that is left as follow-on work to keep this PR focused
on the Poppify pass itself. For now the tests simply skip validation.
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously the addDefault* methods would avoid adding opt passes that we
know are incompatible with DWARF. However, that didn't handle the case of
passes that are added in other ways. For example, when running Asyncify,
emcc will run --flatten before, and that pass is not compatible with DWARF.
This PR lets us warn on that by annotating the passes themselves. Then we
use those annotation to either not run a pass at all (matching the previous
behavior) or to show a warning when necessary.
Fixes emscripten-core/emscripten#13288 . That is, concretely
after this PR running asyncify + DWARF will show a warning to the user.
|
|
|
|
|
|
|
|
| |
This avoids needing to add include wasm-printing if a file doesn't already have it.
To achieve that, add the std::ostream hooks in wasm.h, and also use them
when possible, removing the need for the special WasmPrinter object.
Also stop printing in "full" (print types on each line) in error messages by default. The
user can still get that, as always, using BINARYEN_PRINT_FULL=1 in the env.
|
| |
|