| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
| |
Now that the header includes topological sort utilities in addition to
the utility that iterates over all topological orders, it makes more
sense for it to be named topological_sort.h
|
|
|
|
|
|
| |
This saves memory and could in principle improve performance, although a
quick experiment with 30 samples on ReorderGlobals did not yield a
statistically significant improvement. At any rate, using Index is more
consistent with other parts of the code base.
|
|
|
|
|
|
| |
We previously computed both forward and reverse dependence graphs, but
one of them was only used for a single topological sort that could just
as well be computed by reversing the topological sort on the other
graph.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Rather than finding the minimum sort with respect to the original order
of vertices, find the minimum sort with respect to an arbitrary
user-provided comparator. Users of the minSort utility previously had to
sort their input graphs according to their desired ordering, but now
they can simply provide their comparator instead.
Take advantage of the new functionality in ReorderGlobals and also
standardize on a single data type for representing dependence graphs to
avoid unnecessary conversions. Together, these changes slightly speed up
ReorderGlobals.
Move the topological sort code previously in a .cpp file into the header
so the comparator can be provided as a lambda template parameter instead
of as a `std::function`. This makes ReorderGlobals about 5% faster.
|
|
|
|
|
|
|
| |
Previously they were structs and their results were accessed with
`operator*()`, but that was unnecessarily complicated and could lead to
problems with temporary lifetimes being too short. Simplify the
utilities by making them functions. This also allows the wrapper
templates to infer the proper element types automatically.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Use the new TopologicalSort and MinTopologicalSortOf utilities instead
of the old CRTP topological sort utility and a bespoke heap-based
topological sort in ReorderGlobals. Since there is no longer a heap to
pop from, the direction of the custom comparator is now much more
intuitive.
Further simplify the code by switching from tracking the new order of
globals using a sequence of new indices to tracking the order using a
sequence of old indices.
This change also makes the pass about 20% faster on a large real-world
module.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The old ordering in that pass did a topological sort while sorting by uses
both within topological groups and between them. That could be unoptimal
in some cases, however, and actually on J2CL output this pass made the
binary larger, which is how we noticed this.
The problem is that such a toplogical sort keeps topological groups in
place, but it can be useful to interleave them sometimes. Imagine this:
$c - $a
/
$e
\
$d - $b
Here $e depends on $c, etc. The optimal order may interleave the two
arms here, e.g. $a, $b, $d, $c, $e. That is because the dependencies define
a partial order, and so the arms here are actually independent.
Sorting by toplogical depth first might help in some cases, but also is not
optimal in general, as we may want to mix toplogical depths:
$a, $c, $b, $d, $e does so, and it may be the best ordering.
This PR implements a natural greedy algorithm that picks the global with
the highest use count at each step, out of the set of possible globals, which
is the set of globals that have no unresolved dependencies. So we start by
picking the first global with no dependencies and add at at the front; then
that unlocks anything that depended on it and we pick from that set, and
so forth.
This may also not be optimal, but it is easy to make it more flexible by
customizing the counts, and we consider 4 sorts here:
* Set all counts to 0. This means we only take into account dependencies,
and we break ties by the original order, so this is as close to the original
order as we can be.
* Use the actual use counts. This is the simple greedy algorithm.
* Set the count of each global to also contain the counts of its children,
so the count is the total that might be unlocked. This gives more weight
to globals that can unlock more later, so it is less greedy.
* As last, but weight children's counts lower in an exponential way, which
makes sense as they may depend on other globals too.
In practice it is simple to generate cases where 1, 2, or 3 is optimal (see
new tests), but on real-world J2CL I see that 4 (with a particular exponential
coefficient) is best, so the pass computes all 4 and picks the best. As a
result it will never worsen the size and it has a good chance of
improving.
The differences between these are small, so in theory we could pick any
of them, but given they are all modifications of a single algorithm it is
very easy to compute them all with little code complexity.
The benefits are rather small here, but this can save a few hundred
bytes on a multi-MB Java file. This comes at a tiny compile time cost, but
seems worth it for the new guarantee to never regress size.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If the first module has a global that reads from a global that appears in a later
module, then we need to reorder the globals, because if we just append the
globals from the later module we'd end up with a global reading from another
that is not before it.
Changes to the existing renamings test are just due to the global sorting
pass that now runs (it not only fixes up validation errors but also tries to sort
in a more optimal order for size).
Fixes #6220
|
|
This sorts globals by their usage (and respecting dependencies). If the module
has very many globals then using smaller LEBs can matter.
If there are fewer than 128 globals then we cannot reduce size, and the pass
exits early (so this pass will not slow down MVP builds, which usually have just
1 global, the stack pointer). But with wasm GC it is common to use globals for
vtables etc., and often there is a very large number of them.
|