| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Inlining was careful about nested calls like this:
(call $a
(call $b)
)
If we inlined the outer call first, we'd have
(block $inlined-code-from-a
..code..
(call $b)
)
After that, the inner call is a child of a block, not of a call. That is,
we've moved the inner call to another parent. To replace that
inner call when we inline, we'd need to update the new parent,
which would require work. To avoid that work, the pass simply
created a block in the middle:
(call $a
(block
(call $b)
)
)
Now the inner call's immediate parent will not change when we
inline the outer call.
However, it turns out that this was entirely unnecessary. We find
the calls using a post-order traversal, and we store the actions in
a vector that we traverse in order, so we only ever process things
in the optimal order of children before parents. And in that order
there is no problem: inlining the inner call first leads to
(call $a
(block $inlined-code-from-b
(..code..)
)
)
That does not affect the outer call's parent.
This PR removes the creation of the unnecessary blocks. This doesn't
improve the final output as optimizations remove the unneeded
blocks later anyhow, but it does make the code simpler and a little
faster. It also makes debugging less confusing. But this is not truly
NFC because --inlining (but not --inlining-optimizing) will actually
emit fewer blocks now (but only --inlining-optimizing is used by
default in production).
The diff on tests here is very small when ignoring whitespace. The
remaining differences are just emitting fewer obviously-unneeded
blocks. There is also one test that needed manual changes,
inlining-eh-legacy, because it tested that we do Pop fixups, but
after emitting one fewer block, those fixups were not needed. I
added a new test there with two nested calls, which does end up
needing those fixups. I also added such a test in
inlining_all-features so that we have coverage for such nested
calls (we might remove the eh-legacy file some day, and other
existing tests with nested calls that I found were more complex).
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We may inline multiple times into a single function. Previously, if we did so, we
did the "fixups" such as ReFinalize and non-nullable local fixes once per such
inlining. But that is wasteful as each ReFinalize etc. scans the whole function,
and could be done after we copy all the code from all the inlinings, which is
what this PR does: it splits doInlining() into one function that inlines code and
one that does the updates after, and the update is done after all inlinings.
This turns out to be very important, a 5x speedup on two large real-world wasm
files I am looking at. The reason is that we actually inline more than once in
half the cases, and sometimes far more - in one case we inline over 1,000
times into a function! (and ReFinalized 1,000 times too many)
This is practically NFC, but it turns out that there are some tiny noticeable
differences between running ReFinalize once at the end vs. once after each
inlining. These differences are not really functional or observable in the
behavior of the code, and optimizations would remove them anyhow, but
they are noticeable in two tests here. The changes to tests are, in order:
* Different block names, just because the counter we use sees more things.
* In a testcase with unreachable code, we inline twice into a function, and
the first inlining brings in an unreachable, and ReFinalizing early will lead
to it propagating differently than if we wait to ReFinalize. (It actually leads
to another cycle of inlining in that case, as a fluke.)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The new text parser is faster and more standards compliant than the old text
parser. Enable it by default in wasm-opt and update the tests to reflect the
slightly different results it produces. Besides following the spec, the new
parser differs from the old parser in that it:
- Does not synthesize `loop` and `try` labels unnecessarily
- Synthesizes different block names in some cases
- Parses exports in a different order
- Parses `nop`s instead of empty blocks for empty control flow arms
- Does not support parsing Poppy IR
- Produces different error messages
- Cannot parse `pop` except as the first instruction inside a `catch`
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
A trivial call is something like a function that just calls another immediately,
function foo(x, y) {
return bar(y, 15);
}
We can inline those and expect to benefit in most cases, though we might
increase code size slightly. Hence it makes sense to inline such cases, even
though in general we are careful and do not inline functions with calls in
them; a "trampoline" like that likely has most of the work in the call itself,
which we can avoid by inlining.
Suggested based on findings in Java.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When printing Binaryen IR, we previously generated names for unnamed heap types
based on their structure. This was useful for seeing the structure of simple
types at a glance without having to separately go look up their definitions, but
it also had two problems:
1. The same name could be generated for multiple types. The generated names did
not take into account rec group structure or finality, so types that differed
only in these properties would have the same name. Also, generated type names
were limited in length, so very large types that shared only some structure
could also end up with the same names. Using the same name for multiple types
produces incorrect and unparsable output.
2. The generated names were not useful beyond the most trivial examples. Even
with length limits, names for nontrivial types were extremely long and visually
noisy, which made reading disassembled real-world code more challenging.
Fix these problems by emitting simple indexed names for unnamed heap types
instead. This regresses readability for very simple examples, but the trade off
is worth it.
This change also reduces the number of type printing systems we have by one.
Previously we had the system in Print.cpp, but we had another, more general and
extensible system in wasm-type-printing.h and wasm-type.cpp as well. Remove the
old type printing system from Print.cpp and replace it with a much smaller use
of the new system. This requires significant refactoring of Print.cpp so that
PrintExpressionContents object now holds a reference to a parent
PrintSExpression object that holds the type name state.
This diff is very large because almost every test output changed slightly. To
minimize the diff and ease review, change the type printer in wasm-type.cpp to
behave the same as the old type printer in Print.cpp except for the differences
in name generation. These changes will be reverted in much smaller PRs in the
future to generally improve how types are printed.
|
|
|
|
|
|
|
|
|
|
|
| |
Each time we inline we put the contents in a block. Before we used the same name
each time we inlined the same method, and as a result had many conflicts if a
function was inlined many times. With this PR we emit a different name each
time.
This is not 100% NFC as it does change block names, which is observable in the IR
(as can be seen in the test updates).
This helps #5696 in speeding up UniqueNameManner.
|
|
|
|
|
|
|
|
|
|
| |
All top-level Module elements are identified and referred to by Name, but for
historical reasons element and data segments were referred to by index instead.
Fix this inconsistency by using Names to refer to segments from expressions that
use them. Also parse and print segment names like we do for other elements.
The C API is partially converted to use names instead of indices, but there are
still many functions that refer to data segments by index. Finishing the
conversion can be done in the future once it becomes necessary.
|
|
|
|
|
|
|
|
|
|
|
|
| |
See example in the new comment. In general, we never want to go from
unreachable to reachable (refinalize doesn't even try), as it misses out on
DCE opportunities. Also it may have validation issues, which is how the
fuzzer found this.
Remove code in the same place that was redundant after the refinalize
was added in #5492. That simplifies some existing testcases slightly, by
removing an unneeded br we added, and now we add a new unreachable
at the end in other cases that need it.
|
|
|
|
|
|
|
|
|
|
|
|
| |
We must refinalize as inlining unreachable code can lead to
more things becoming unreachable.
We also must uniquify label names before refinalizing, as the
IR must be valid at that time, so that code is moved.
This causes some minor changes to existing test code (some
label changes, and refinalization makes more things
unreachable), but only the two new tests show actual problems
that needed to be fixed.
|
| |
|
|
|
| |
I was reading these tests and failing to find the names script.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously we would keep doing iterations of inlining until we hit
the number of functions in the module (at which point, we would
definitely know we are recursing). This prevents infinite recursion,
but it can take a very very long time to notice that in a huge
module with one tiny recursive function and 100,000 other ones.
To do better than that, track how many times we've inlined into
a function. After a fixed number of such inlinings, stop.
Aside from avoding very slow compile times, such infinite
recursion likely is not that beneficial to do a great many times,
so anyhow it is best to stop after a few iterations.
|
|
|