| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
| |
When IRBuilder builds an empty non-block scope such as a function body,
an if arm, a try block, etc, it needs to produce some expression to
represent the empty contents. Previously it produced a nop, but change
it to produce an empty block instead. The binary writer and printer have
special logic to elide empty blocks, so this produces smaller output.
Update J2CLOpts to recognize functions containing empty blocks as
trivial to avoid regressing one of its tests.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Rec groups need to be topologically sorted for the output module to be
valid, but the specific order of rec groups also affects the module size
because types at lower indices requires fewer bytes to reference. We
previously optimized for code size when gathering types by sorting the
list of groups before doing the topological sort. This was brittle,
though, and depended on implementation details of the topological sort
to be correct.
Replace the old topological sort with use of the new
`TopologicalSort::minSort` utility, which is a more principled method of
achieving a minimal topological sort with respect to some comparator.
Also draw inspiration from ReorderGlobals and apply an exponential
factor to take the users of a rec group into account when determining
its weight.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The argument is the minimum benefit we must see for us to decide to optimize, e.g.
--monomorphize --pass-arg=monomorphize-min-benefit@50
When the minimum benefit is 50% then if we reduce the cost by 50% through
monomorphization then we optimize there. 95% would only optimize when we
remove almost all the cost, etc.
In practice I see 95% will actually tend to reduce code size overall, as while we add
monomorphized versions of functions, we only do so when we remove a lot of
work and size, and after inlining we gain benefits. However, 50% or even lower can
lead to better benchmark results, in return for larger code size, just like with
inlining. To be careful, the default is set to 95%.
Previously we optimized whenever we saw any benefit at all, which is the same
as requiring a minimum benefit of 0%. Old tests have the flag applied in this PR
to set that value, so they do not change.
|
|
Previously call operands were monomorphized (considered as part of the
call context, so we can create a specialized function with those operands
fixed) if they were constant or had a different type than the function
parameter's type. This generalizes that to pull in pretty much all the code
we possibly can, including nested code. For example:
(call $foo
(struct.new $struct
(i32.const 10)
(local.get $x)
(local.get $y)
)
)
This can turn into
(call $foo_mono
(local.get $x)
(local.get $y)
)
The struct.new and even one of the struct.new's children is moved into the
called function, replacing the original ref argument with two other ones. If the
original called function was this:
(func $foo (param $ref (ref ..))
..
)
then the monomorphized function then looks like this:
(func $foo_mono (param $x i32) (param $y i32)
(local $ref (ref ..))
(local.set $ref
(struct.new $struct
(i32.const 10)
(local.get $x)
(local.get $y)
)
)
..
)
The struct.new and its constant child appear here, and we read the
parameters.
To do this, generalize the code that creates the call context to accept
everything that is impossible to copy (like a local.get) or that would be
tricky and likely unworthwhile (like another call or a tuple). Also check
for effect interactions, as this code motion does some reordering.
For this to work, we need to adjust how we compute the costs we
compare when deciding what to monomorphize. Before we just
compared the called function to the monomorphized called function,
which was good enough when the call context only contained consts,
but now it can contain arbitrarily nested code. The proper comparison
is between these two:
* Old function + call context
* New monomorphized function
Including the call context makes this a fair comparison. In the example
above, the struct.new and the i32.const are part of the call context,
and so they are in the monomorphized function, so if we didn't count
them in other function we'd decide not to optimize anything with a large
context.
The new functionality is tested in a new file. A few parts of existing
tests needed changes to not become pointless after this improvement,
namely by replacing stuff that we now optimize with things that we
don't like replacing an i32.eqz with a local.get. There are also a
handful of test outcomes that change in CAREFUL mode due to the
new cost analysis.
|