| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
|
|
|
|
| |
An initial content testcase may only work in open world, so check for that
using the existing mechanism of checking if such testcases work with out
feature flags.
|
|
|
|
|
|
|
|
| |
This finds types that can be merged into their super: types that add no
fields, and are not used in casts, etc. - so we might as well use the super.
This complements TypeSSA, in that it can merge back the new types that
TypeSSA created, if we never found a use for them. Without this, TypeSSA
can bloat binary size quite a lot (I see 10-20%).
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This creates new nominal types for each (interesting) struct.new. That then allows
type-based optimizations to be more precise, as those optimizations will track
separate info for each struct.new, in effect. That is kind of like SSA, however, we
do not handle merges. For example:
x = struct.new $A (5);
print(x.value);
y = struct.new $A (11);
print(y.value);
// => //
x = struct.new $A.x (5);
print(x.value);
y = struct.new $A.y (11);
print(y.value);
After the pass runs each of those struct.new creates a unique type, and type-based
analysis can see that 5 or 11 are the only values written in that type (if nothing else
writes there).
This bloats the type section with the new subtypes, so it is best used with a pass
to merge unneeded duplicate types, which a later PR will add. That later PR will
exactly merge back in the types created here, which are nominally different but
indistinguishable otherwise.
This pass is not enabled by default. It's not clear yet where is the best place to do it,
as it must be balanced by type merging, but it might be better to do multiple
rounds of optimization between the two. Needs more investigation.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
With this change we default to an open world, that is, we do the safe thing
by default: we no longer assume a closed world. Users that want a closed
world must pass --closed-world.
Atm we just do not run passes that assume a closed world. (We might later
refine them to find which types don't escape and only optimize those.) The
RemoveUnusedModuleElements is an exception in that the closed-world
flag influences one part of its operation, but not the rest.
Fixes #5292
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
(some.operation
(ref.cast .. (local.get $ref))
(local.get $ref)
)
=>
(some.operation
(local.tee $temp
(ref.cast .. (local.get $ref))
)
(local.get $temp)
)
This can help cases where we cast for some reason but happen to not use the
cast value in all places. This occurs in j2wasm in itable calls sometimes: The
this pointer is is refined, but the itable may be done with an unrefined pointer,
which is less optimizable.
So far this is just inside basic blocks, but that is enough for the cast of itable
calls and other common patterns I see.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Monomorphization finds cases where we send more refined types to a function
than it declares. In such cases we can copy the function and refine the parameters:
// B is a subtype of A
foo(new B());
function foo(x : A) { ..}
=>
foo_B(new B()); // call redirected to refined copy
function foo(x : A) { ..} // unchanged
function foo_B(x : B) { ..} // refined copy
This increases code size so it may not be worth it in all cases. This initial PR is
hopefully enough to start experimenting with this on performance, and so it does
not enable the pass by default.
This adds two variations of monomorphization, one that always does it, and the
default which is "careful": it sees whether monomorphizing lets the refined function
actually be better than the original (say, by removing a cast). If there is no
improvement then we do not make any changes. This saves a significant amount
of code size - on j2wasm the careful version increases by 13% instead of 20% -
but it does run more slowly obviously.
|
|
|
|
|
|
|
| |
That PR renamed test/lit/optimize-instructions.wast to
test/lit/optimize-instructions-mvp.wast. However, the fuzzer was explicitly
adding the testto the list of important initial contents under the old name, so
it was failing an assertion that the initial contents existed. Update the fuzzer
to use the new test name.
|
|
|
|
|
|
|
| |
These operations emit a completely different type than their input, so they must be
marked as roots, and not as things that flow values through them (because then
we filter everything out as the types are not compatible).
Fixes #5219
|
| |
|
|
|
|
|
| |
The fuzzer started to fail on the recent externalize/internalize test
that was added in #5175 as we lack interpreter support. Move that to a separate
file and ignore it in the fuzzer for now.
|
|
|
| |
Also fix some formatting issue in the file.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This adds a map of function name => the effects of that function to the
PassOptions structure. That lets us compute those effects once and then
use them in multiple passes afterwards. For example, that lets us optimize
away a call to a function that has no effects:
(drop (call $nothing))
[..]
(func $nothing
;; .. lots of stuff but no effects, only a returned value ..
)
Vacuum will remove that dropped call if we tell it that the called function has
no effects. Note that a nice result of adding this to the PassOptions struct
is that all passes will use the extra info automatically.
This is not enabled by default as the benefits seem rather minor, though it
does help in a small but noticeable way on J2Wasm code, where we use
call.without.effects and have situations like this:
(func $foo
(call $bar)
)
(func $bar
(call.without.effects ..)
)
The call to bar looks like it has effects, normally, but with global effect info
we know it actually doesn't.
To use this, one would do
--generate-global-effects [.. some passes that use the effects ..] --discard-global-effects
Discarding is not necessary, but if there is a pass later that adds effects, then not
discarding could lead to bugs, since we'd think there are fewer effects than there are.
(However, normal optimization passes never add effects, only remove them.)
It's also possible to call this multiple times:
--generate-global-effects -O3 --generate-global-effects -O3
That computes affects after the first -O3, and may find fewer effects than earlier.
This doesn't compute the full transitive closure of the effects across functions. That is,
when computing a function's effects, we don't look into its own calls. The simple case
so far is enough to handle the call.without.effects example from before (though it
may take multiple optimization cycles).
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In practice typed function references will not ship before GC and is not
independently useful, so it's not necessary to have a separate feature for it.
Roll the functionality previously enabled by --enable-typed-function-references
into --enable-gc instead.
This also avoids a problem with the ongoing implementation of the new GC bottom
heap types. That change will make all ref.null instructions in Binaryen IR refer
to one of the bottom heap types. But since those bottom types are introduced in
GC, it's not valid to emit them in binaries unless unless GC is enabled. The fix
if only reference types is enabled is to emit (ref.null func) instead
of (ref.null nofunc), but that doesn't always work if typed function references
are enabled because a function type more specific than func may be required.
Getting rid of typed function references as a separate feature makes this a
nonissue.
|
|
|
|
| |
These new GC instructions infallibly convert between `extern` and `any`
references now that those types are not in the same hierarchy.
|
|
|
|
| |
Also fix a small logic error - call lines can be prefixes of each other, so use
the full line (including newline) to differentiate.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This mode is tricky to fuzz because the mode is basically "assume traps never
happen; if a trap does happen, that is undefined behavior". So if any trap occurs
in the random fuzz testcase, we can't optimize with -tnh and assume the results
stay to same. To avoid that, we ignore all functions from the first one that traps,
that is, we only compare the code that ran without trapping. That often is a small
subset of the total functions, sadly, but I do see that this ends up with some useful
coverage despite the drawback.
This also requires some fixes to comparing of references, specifically, funcrefs
are printed with the function name/index, but that can change during opts, so
ignore that. This wasn't noticed before because this new fuzzer mode does
comparisons of --fuzz-exec-before output, instead of doing --fuzz-exec which
runs it before and after and compares it internally in wasm-opt. Here we are
comparing the output externally, which we didn't do before.
|
|
|
|
|
|
|
| |
This PR removes the single memory restriction in IR, adding support for a single module to reference multiple memories. To support this change, a new memory name field was added to 13 memory instructions in order to identify the memory for the instruction.
It is a goal of this PR to maintain backwards compatibility with existing text and binary wasm modules, so memory indexes remain optional for memory instructions. Similarly, the JS API makes assumptions about which memory is intended when only one memory is present in the module. Another goal of this PR is that existing tests behavior be unaffected. That said, tests must now explicitly define a memory before invoking memory instructions or exporting a memory, and memory names are now printed for each memory instruction in the text format.
There remain quite a few places where a hardcoded reference to the first memory persist (memory flattening, for example, will return early if more than one memory is present in the module). Many of these call-sites, particularly within passes, will require us to rethink how the optimization works in a multi-memories world. Other call-sites may necessitate more invasive code restructuring to fully convert away from relying on a globally available, single memory pointer.
|
| |
|
|
|
|
|
|
|
| |
* better
* fix
* undo
|
| |
|
|
|
|
|
|
|
| |
RTTs were removed from the GC spec and if they are added back in in the future,
they will be heap types rather than value types as in our implementation.
Updating our implementation to have RTTs be heap types would have been more work
than deleting them for questionable benefit since we don't know how long it will
be before they are specced again.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This tracks the possible contents in the entire program all at once using a single IR.
That is in contrast to say DeadArgumentElimination of LocalRefining etc., all of whom
look at one particular aspect of the program (function params and returns in DAE,
locals in LocalRefining). The cost is to build up an entire new IR, which takes a lot
of new code (mostly in the already-landed PossibleContents). Another cost
is this new IR is very big and requires a lot of time and memory to process.
The benefit is that this can find opportunities that are only obvious when looking
at the entire program, and also it can track information that is more specialized
than the normal type system in the IR - in particular, this can track an ExactType,
which is the case where we know the value is of a particular type exactly and not
a subtype.
|
|
|
|
|
|
| |
#4758 regressed the determinism fuzzer. First, FEATURE_OPTS is not a
list but a string. Second, as a hack another location would add --strip-dwarf
to that list, but that only works in wasm-opt. To fix that, remove the hack
and instead just ignore DWARF-containing files.
|
|
|
|
|
|
|
|
| |
This starts to implement the Wasm Strings proposal
https://github.com/WebAssembly/stringref/blob/main/proposals/stringref/Overview.md
This just adds the types.
|
|
|
|
|
|
|
|
|
|
|
| |
Nominal types don't make much sense without GC, and in particular trying to emit
them with typed function references but not GC enabled can result in invalid
binaries because nominal types do not respect the type ordering constraints
required by the typed function references proposal. Making this change was
mostly straightforward, but required fixing the fuzzer to use --nominal only
when GC is enabled and required exiting early from nominal-only optimizations
when GC was not enabled.
Fixes #4756.
|
| |
|
|
|
| |
Relates to #4741
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This optimizes constants in the megamorphic case of two: when we
know two function references are possible, we could in theory emit this:
(select
(ref.func A)
(ref.func B)
(ref.eq
(..ref value..) ;; globally, only 2 things are possible here, and one has
;; ref.func A as its value, and the other ref.func B
(ref.func A))
That is, compare to one of the values, and emit the two possible values there.
Other optimizations can then turn a call_ref on this select into an if over
two direct calls, leading to devirtualization.
We cannot compare a ref.func directly (since function references are not
comparable), and so instead we look at immutable global structs. If we
find a struct type that has only two possible values in some field, and
the structs are in immutable globals (which happens in the vtable case
in j2wasm for example), then we can compare the references of the struct
to decide between the two values in the field.
|
|
|
|
| |
If we use it as initial contents, we will try to execute it, and hit the TODOs
in the interpreter for unimplemented parts.
|
|
|
|
|
|
| |
Without this the error from a bad given wasm file - either by the user, or during
a reduction where smaller wasms are given - could be very confusing. A bad
given wasm file during reduction, in particular, indicates a bug in the reducer
most likely.
|
|
|
|
|
| |
Without this the reduction will fail on not being able to
parse the input file, if the input file depends on
nominal typing.
|
|
|
|
|
| |
wasm-dis does enable all features by default, so we don't need the
feature flags, but we do need --nominal etc. since we emit such
modules now.
|
|
|
|
|
|
|
| |
Instead of a raw run command, use the helper function, which adds the
feature flags. That adds --nominal which is needed more now after #4625
This fixes the fuzz failures mentioned in #4625 (comment)
|
|
|
| |
Add a flag to make it easy to pick which typesystem to test.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This adds a new signature-pruning pass that prunes parameters from
signature types where those parameters are never used in any function
that has that type. This is similar to DeadArgumentElimination but works
on a set of functions, and it can handle indirect calls.
Also move a little code from SignatureRefining into a shared place to
avoid duplication of logic to update signature types.
This pattern happens in j2wasm code, for example if all method functions
for some virtual method just return a constant and do not use the this
pointer.
|
|
|
|
|
|
|
|
|
| |
This enables fuzzing EH with initial contents. fuzzing.cpp/h does not
yet support generation of EH instructions, but with this we can still
fuzz EH based on initial contents.
The fuzzer ran successfully for more than 1,900,000 iterations, with my
local modification that always enables EH and lets the fuzzer select
only EH tests for its initial contents.
|
| |
|
| |
|
|
|
|
|
|
|
|
|
| |
Fuzzer shows the initial contents and prompts the user if they want to
proceed after #4173, but when the fuzzer is used within a script called
from `wasm-reduce`, we shouldn't pause for the user input. This shows
the prompt only when there is no seed given.
To do that, we now initialize the important initial contents from
`main`. We used to assign those variables before we start `main`.
|
|
|
|
|
|
|
|
|
| |
See #4149
This modifies the test added in #4163 which used static casts on
dynamically-created structs and arrays. That was technically not
valid (as we won't want users to "mix" the two forms). This makes that
test 100% static, which both fixes the test and gives test coverage
to the new instructions added here.
|
|
|
|
|
|
|
|
|
| |
This is another attempt to address #4073. Instead of relying on the
timestamp, this examines git log to gather the list of test files added
or modified within some fixed number of days. The number of days is
currently set to 30 (= 1 month) but can be changed. This will be enabled
by `--auto-initial-contents`, which is now disabled by default.
Hopefully fixes #4073.
|
|
|
|
|
|
|
|
|
|
|
|
| |
We added an optional ReFinalize in OptimizeInstructions at some point,
but that is not valid: The ReFinalize only updates types when all other
works is done, but the pass works incrementally. The bug the fuzzer found
is that a child is changed to be unreachable, and then the parent is
optimized before finalize() is called on it, which led to an assertion being
hit (as the child was unreachable but not the parent, which should also
be).
To fix this, do not change types in this pass. Emit an extra block with a
declared type when necessary. Other passes can remove the extra block.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This PR helps with functions like this:
function foo(x) {
if (x) {
..
lots of work here
..
}
}
If "lots of work" is large enough, then we won't inline such a
function. However, we may end up calling into the function
only to get a false on that if and immediately exit. So it is useful
to partially inline this function, basically by creating a split
of it into a condition part that is inlineable
function foo$inlineable(x) {
if (x) {
foo$outlined();
}
}
and an outlined part that is not inlineable:
function foo$outlined(x) {
..
lots of work here
..
}
We can then inline the inlineable part. That means that a call
like
foo(param);
turns into
if (param) {
foo$outlined();
}
In other words, we end up replacing a call and then a check with
a check and then a call. Any time that the condition is false, this
will be a speedup.
The cost here is increased size, as we duplicate the condition
into the callsites. For that reason, only do this when heavily
optimizing for size.
This is a 10% speedup on j2cl. This helps two types of functions
there: Java class inits, which often look like "have I been
initialized before? if not, do all this work", and also assertion
methods which look like "if the input is null, throw an
exception".
|
|
|
|
|
|
| |
Avoids a crash in calling getHeapType when there isn't one.
Also add the relevant lit test (and a few others) to the list of files to
fuzz more heavily.
|
|
|
|
|
|
|
|
|
|
|
|
| |
tablify() attempts to turns a sequence of br_ifs into a single
br_table. This PR adds some flexibility to the specific pattern it
looks for, specifically:
* Accept i32.eqz as a comparison to zero, and not just to look
for i32.eq against a constant.
* Allow the first condition to be a tee. If it is, compare later
conditions to local.get of that local.
This will allow more br_tables to be emitted in j2cl output.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Some functions run only once with this pattern:
function foo() {
if (foo$ran) return;
foo$ran = 1;
...
}
If that global is not ever set to 0, then the function's payload (after the
initial if and return) will never execute more than once. That means we
can optimize away dominated calls:
foo();
foo(); // we can remove this
To do this, we find which globals are "once", which means they can
fit in that pattern, as they are never set to 0. If a function looks like the
above pattern, and it's global is "once", then the function is "once" as
well, and we can perform this optimization.
This removes over 8% of static calls in j2cl.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When we catches "Output must be deterministic" we can't see any details. This PR fix
this and now we can see diff of b1.wasm and b2.wasm files.
Example output:
Output must be deterministic.
Diff:
--- expected
+++ actual
@@ -2072,9 +2072,7 @@
)
(drop
(block $label$16 (result funcref)
- (local.set $10
- (ref.null func)
- )
+ (nop)
(drop
(call $22
(f64.const 0.296)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Technically this is not a new pass, but it is a rewrite almost from scratch.
Local Common Subexpression Elimination looks for repeated patterns,
stuff like this:
x = (a + b) + c
y = a + b
=>
temp = a + b
x = temp + c
y = temp
The old pass worked on flat IR, which is inefficient, and was overly
complicated because of that. The new pass uses a new algorithm that
I think is pretty simple, see the detailed comment at the top.
This keeps the pass enabled only in -O4, like before - right after
flattening the IR. That is to make this as minimal a change as possible.
Followups will enable the pass in the main pipeline, that is, we will
finally be able to run it by default. (Note that to make the pass work
well after flatten, an extra simplify-locals is added - the old pass used
to do part of simplify-locals internally, which was one source of
complexity. Even so, some of the -O4 tests have changes, due to
minor factors - they are just minor orderings etc., which can be
seen by inspecting the outputs before and after using e.g.
--metrics)
This plus some followup work leads to large wins on wasm GC output.
On j2cl there is a common pattern of repeated struct.gets, so common
that this pass removes 85% of all struct.gets, which makes the total
binary 15% smaller. However, on LLVM-emitted code the benefit is
minor, less than 1%.
|
| |
|
|
|
|
|
|
|
| |
Now that the features section adds on top of the commandline arguments,
it means the way we test if initial contents are ok to use will not work if
the wasm has a features section - as it will enable a feature, even if
we wanted to see if the wasm can work without that feature. To fix this,
strip the features section there.
|