| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We previously supported a non-standard `(func "name" ...` syntax for declaring
functions exported with the quoted name. Since that is not part of the standard
text format, drop support for it, replacing it with the standard `(func $name
(export "name") ...` syntax instead.
Also replace our other usage of the quoted form in our text output, which was
where we quoted names containing characters that are not allowed to appear in
standard names. To handle that case, adjust our output from `"$name"` to
`$"name"`, which is the standards-track way of supporting such names. Also fix
how we detect non-standard name characters to match the spec.
Update the lit test output generation script to account for these changes,
including by making the `$` prefix on names mandatory. This causes the script to
stop interpreting declarative element segments with the `(elem declare ...`
syntax as being named "declare", so prevent our generated output from regressing
by counting "declare" as a name in the script.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
interactions with a parent (#6089)
We had a simple rule that if we reach an expression twice then we give up, which makes
sense for say a block: if one allocation flows out of it, then another can't - it would get
mixed in with the other one, which is a case we don't optimize. However, there are
cases where a parent has multiple children and different interactions with them, like
a struct.set: the reference child does not escape, but the value child does. Before this
PR if we reached the value child first, we'd mark the parent as seen, and then the reference
child would see it isn't the first to get here, and not optimize.
To fix this, reorder the code to handle this case. The manner of interaction between the
child and the parent decides whether we mark the parent as seen and to be further
avoided.
Noticed by the determinism fuzzer, since the order of analysis mattered here.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
E.g.
(local $x (ref eq)
...
(local.set $x
(struct.new $float
...
)
)
(struct.get $float 0
(ref.cast (ref $float)
(local.get $x)
)
)
This PR allows us to use heap2local, ignoring the passing cast.
This is similar to existing handling of ref.as_non_null.
|
|
|
|
|
|
|
|
| |
We shouldn't need to in the general case, but the fuzzer found a corner case
where we do need to, see the explanation + testcase, but basically Heap2Local
replaces struct fields with locals, and the locals should have the same types,
but if a field was somehow less refined for some reason, then the locals could
actually be more refined. (And a field could be less refined if we read it from a
typed that was under-refined due to a tee or such.)
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Before this PR, if a call had no paths to a catch in the same function then we skipped
creating a new basic block right after it. As a result, we could have a call in the middle
of a basic block. If EH is enabled that means we might transfer control flow out of
the function from the middle of a block. But it is better to have the property that
any transfer of control flow - to another basic block, or outside of the function - can
only happen at the end of a basic block.
This causes some overhead, but a subsequent PR (#5838) will remove that as a
followup, and this PR adds a little code to pass the module and check if EH is enabled,
and avoid the overhead if not, which at least avoids regressing the non-EH case
until that followup lands.
|
|
|
|
|
|
|
| |
If we see (ref.cast $A) but we have inferred that a more refined type will
be present there at runtime $B then we can refine the cast to (ref.cast $B).
We could do the same even when a cast is not present, but that would
increase code size. This optimization keeps code size constant.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously only WalkerPasses had access to the `getPassRunner` and
`getPassOptions` methods. Move those methods to `Pass` so all passes can use
them. As a result, the `PassRunner` passed to `Pass::run` and
`Pass::runOnFunction` is no longer necessary, so remove it.
Also update `Pass::create` to return a unique_ptr, which is more efficient than
having it return a raw pointer only to have the `PassRunner` wrap that raw
pointer in a `unique_ptr`.
Delete the unused template `PassRunner::getLast()`, which looks like it was
intended to enable retrieving previous analyses and has been in the code base
since 2015 but is not implemented anywhere.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
An overview of this is in the README in the diff here (conveniently, it is near the
top of the diff). Basically, we fix up nn locals after each pass, by default. This keeps
things easy to reason about - what validates is what is valid wasm - but there are
some minor nuances as mentioned there, in particular, we ignore nameless blocks
(which are commonly added by various passes; ignoring them means we can keep
more locals non-nullable).
The key addition here is LocalStructuralDominance which checks which local
indexes have the "structural dominance" property of 1a, that is, that each get has
a set in its block or an outer block that precedes it. I optimized that function quite
a lot to reduce the overhead of running that logic after each pass. The overhead
is something like 2% on J2Wasm and 0% on Dart (0%, because in this mode we
shrink code size, so there is less work actually, and it balances out).
Since we run fixups after each pass, this PR removes logic to manually call the
fixup code from various places we used to call it (like eh-utils and various passes).
Various passes are now marked as requiresNonNullableLocalFixups => false.
That lets us skip running the fixups after them, which we normally do automatically.
This helps avoid overhead. Most passes still need the fixups, though - any pass
that adds a local, or a named block, or moves code around, likely does.
This removes a hack in SimplifyLocals that is no longer needed. Before we
worked to avoid moving a set into a try, as it might not validate. Now, we just do it
and let fixups happen automatically if they need to: in the common code they
probably don't, so the extra complexity seems not worth it.
Also removes a hack from StackIR. That hack tried to avoid roundtrip adding a
nondefaultable local. But we have the logic to fix that up now, and opts will
likely keep it non-nullable as well.
Various tests end up updated here because now a local can be non-nullable -
previous fixups are no longer needed.
Note that this doesn't remove the gc-nn-locals feature. That has been useful for
testing, and may still be useful in the future - it basically just allows nn locals in
all positions (that can't read the null default value at the entry). We can consider
removing it separately.
Fixes #4824
|
|
|
|
|
|
|
| |
RTTs were removed from the GC spec and if they are added back in in the future,
they will be heap types rather than value types as in our implementation.
Updating our implementation to have RTTs be heap types would have been more work
than deleting them for questionable benefit since we don't know how long it will
be before they are specced again.
|
| |
|
|
|
|
|
|
|
| |
When the allocation we optimize away flows through a loop, then just like
with a block we must change the type to be nullable, since we are replacing
the allocation with a null.
Fixes #4287
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
A SmallSet starts with fixed storage that it uses in the simplest
possible way (linear scan, no sorting). If it exceeds a size then it
starts using a normal std::set. So for small amounts of data it
avoids allocation and any other overhead.
This adds a unit test and also uses it in LocalGraph which provides
a large amount of additional coverage.
I also changed an unrelated data structure from std::map to
std::unordered_map which I noticed while doing profiling in
LocalGraph. (And a tiny bit of additional refactoring there.)
This makes LocalGraph-using passes like ssa-nomerge and
precompute-propagate 10-15% faster on a bunch of real-world
codebases I tested.
|
|
|
|
|
|
|
|
|
| |
See #4149
This modifies the test added in #4163 which used static casts on
dynamically-created structs and arrays. That was technically not
valid (as we won't want users to "mix" the two forms). This makes that
test 100% static, which both fixes the test and gives test coverage
to the new instructions added here.
|
|
|
|
|
|
|
|
|
|
|
| |
This finishes the refactoring started in #4115 by doing the
same change to pass a Module into EffectAnalyzer instead of
features. To do so this refactors the fallthrough API and a few
other small things. After those changes, this PR removes the
old feature constructor of EffectAnalyzer entirely.
This requires a small breaking change in the C API, changing
BinaryenExpressionGetSideEffects's feature param to a
module. That makes this change not NFC, but otherwise it is.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Given
(local $x (ref $foo))
(local.set $x ..)
(.. (local.get $x))
If we remove the local.set but not the get, then we end up with
(local $x (ref $foo))
(.. (local.get $x))
It looks like the local.get reads the initial value of a non-nullable local,
which is not allowed.
In practice, this would crash in precompute-propagate which would
try to propagate the initial value to the get. Add an assertion there with
a clear message, as until we have full validation of non-nullable locals
(and the spec for that is in flux), that pass is where bugs will end up
being noticed.
To fix this, replace the get as well. We can replace it with a null
for simplicity; it will never be used anyhow.
This also uncovered a small bug with reached not containing all
the things we reached - it was missing local.gets.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously we would try to stop using the allocation as much as possible,
for example not writing it to locals any more, and leaving it to other passes
to actually remove it (and remove gets of those locals etc.). This seemed
simpler and more modular, but does not actually work in some cases as
the fuzzer has found. Specifically, if we stop writing our allocation to
locals, then if we do a (ref.as_non_null (local.get ..)) of that, then we
will trap on the null present in the local.
Instead, this changes our rewriting to do slightly more work, but it is
simpler in the end. We replace the allocation with a null, and replace
all the places that use it accordingly, for example, updating types to be
nullable, and removing RefAsNonNulls, etc. This literally gets rid of the
allocation and all the places it flows to (leaving less for other passes to
do later).
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If we branch to a block, and there are no other branches or a final
value on the block either, then there is no mixing, and we may be able
to optimize the allocation. Before this PR, all branches stopped us.
To do this, add some helpers in BranchUtils.
The main flow logic in Heap2Local used to stop when we reached a
child for the second time. With branches, however, a child can flow both to
its immediate parent, and to branch targets, and so the proper thing to look
at is when we reach a parent for the second time (which would definitely
indicate mixing).
Tests are added for the new functionality. Note that some existing tests
already covered some things we should not optimize, and so no tests
were needed for them. The existing ones are: $get-through-block,
$branch-to-block.
|
|
If we allocate some GC data, and do not let the reference escape, then we can
replace the allocation with locals, one local for each field in the allocation
basically. This avoids the allocation, and also allows us to optimize the locals
further.
On the Dart DeltaBlue benchmark, this is a 24% speedup (making it faster than
the JS version, incidentially), and also a 6% reduction in code size.
The tests are not the best way to show what this does, as the pass assumes
other passes will clean up after. Here is an example to clarify. First, in pseudocode:
ref = new Int(42)
do {
ref.set(ref.get() + 1)
} while (import(ref.get())
That is, we allocate an int on the heap and use it as a counter. Unnecessarily,
as it could be a normal int on the stack.
Wat:
(module
;; A boxed integer: an entire struct just to hold an int.
(type $boxed-int (struct (field (mut i32))))
(import "env" "import" (func $import (param i32) (result i32)))
(func "example"
(local $ref (ref null $boxed-int))
;; Allocate a boxed integer of 42 and save the reference to it.
(local.set $ref
(struct.new_with_rtt $boxed-int
(i32.const 42)
(rtt.canon $boxed-int)
)
)
;; Increment the integer in a loop, looking for some condition.
(loop $loop
(struct.set $boxed-int 0
(local.get $ref)
(i32.add
(struct.get $boxed-int 0
(local.get $ref)
)
(i32.const 1)
)
)
(br_if $loop
(call $import
(struct.get $boxed-int 0
(local.get $ref)
)
)
)
)
)
)
Before this pass, the optimizer could do essentially nothing with this.
Even with this pass, running -O1 has no effect, as the pass is only
used in -O2+. However, running --heap2local -O1 leads to this:
(func $0
(local $0 i32)
(local.set $0
(i32.const 42)
)
(loop $loop
(br_if $loop
(call $import
(local.tee $0
(i32.add
(local.get $0)
(i32.const 1)
)
)
)
)
)
)
All the GC heap operations have been removed, and we just
have a plain int now, allowing a bunch of other opts to run. That
output is basically the optimal code, I think.
|