summaryrefslogtreecommitdiff
path: root/src/passes/ReorderGlobals.cpp
Commit message (Collapse)AuthorAgeFilesLines
* [NFC] Rename topological_orders.h to topological_sort.h (#6915)Thomas Lively2024-09-071-1/+1
| | | | | Now that the header includes topological sort utilities in addition to the utility that iterates over all topological orders, it makes more sense for it to be named topological_sort.h
* [NFC] Use Index instead of size_t in topological sort util (#6903)Thomas Lively2024-09-051-6/+6
| | | | | | This saves memory and could in principle improve performance, although a quick experiment with 30 samples on ReorderGlobals did not yield a statistically significant improvement. At any rate, using Index is more consistent with other parts of the code base.
* [NFC] Compute only one dependence graph in ReorderGlobals (#6891)Thomas Lively2024-09-041-28/+18
| | | | | | We previously computed both forward and reverse dependence graphs, but one of them was only used for a single topological sort that could just as well be computed by reversing the topological sort on the other graph.
* [NFC] Take custom comparator in TopologicalSort::minSort (#6890)Thomas Lively2024-09-041-57/+35
| | | | | | | | | | | | | | | | Rather than finding the minimum sort with respect to the original order of vertices, find the minimum sort with respect to an arbitrary user-provided comparator. Users of the minSort utility previously had to sort their input graphs according to their desired ordering, but now they can simply provide their comparator instead. Take advantage of the new functionality in ReorderGlobals and also standardize on a single data type for representing dependence graphs to avoid unnecessary conversions. Together, these changes slightly speed up ReorderGlobals. Move the topological sort code previously in a .cpp file into the header so the comparator can be provided as a lambda template parameter instead of as a `std::function`. This makes ReorderGlobals about 5% faster.
* [NFC] Change topological sort utilities to functions (#6889)Thomas Lively2024-09-031-3/+2
| | | | | | | Previously they were structs and their results were accessed with `operator*()`, but that was unnecessarily complicated and could lead to problems with temporary lifetimes being too short. Simplify the utilities by making them functions. This also allows the wrapper templates to infer the proper element types automatically.
* Simplify ReorderGlobals using new topological sort utils (#6885)Thomas Lively2024-08-291-141/+62
| | | | | | | | | | | | | | Use the new TopologicalSort and MinTopologicalSortOf utilities instead of the old CRTP topological sort utility and a bespoke heap-based topological sort in ReorderGlobals. Since there is no longer a heap to pop from, the direction of the custom comparator is now much more intuitive. Further simplify the code by switching from tracking the new order of globals using a sequence of new indices to tracking the order using a sequence of old indices. This change also makes the pass about 20% faster on a large real-world module.
* Optimize ReorderGlobals ordering with a new algorithm (#6625)Alon Zakai2024-05-311-71/+326
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The old ordering in that pass did a topological sort while sorting by uses both within topological groups and between them. That could be unoptimal in some cases, however, and actually on J2CL output this pass made the binary larger, which is how we noticed this. The problem is that such a toplogical sort keeps topological groups in place, but it can be useful to interleave them sometimes. Imagine this: $c - $a / $e \ $d - $b Here $e depends on $c, etc. The optimal order may interleave the two arms here, e.g. $a, $b, $d, $c, $e. That is because the dependencies define a partial order, and so the arms here are actually independent. Sorting by toplogical depth first might help in some cases, but also is not optimal in general, as we may want to mix toplogical depths: $a, $c, $b, $d, $e does so, and it may be the best ordering. This PR implements a natural greedy algorithm that picks the global with the highest use count at each step, out of the set of possible globals, which is the set of globals that have no unresolved dependencies. So we start by picking the first global with no dependencies and add at at the front; then that unlocks anything that depended on it and we pick from that set, and so forth. This may also not be optimal, but it is easy to make it more flexible by customizing the counts, and we consider 4 sorts here: * Set all counts to 0. This means we only take into account dependencies, and we break ties by the original order, so this is as close to the original order as we can be. * Use the actual use counts. This is the simple greedy algorithm. * Set the count of each global to also contain the counts of its children, so the count is the total that might be unlocked. This gives more weight to globals that can unlock more later, so it is less greedy. * As last, but weight children's counts lower in an exponential way, which makes sense as they may depend on other globals too. In practice it is simple to generate cases where 1, 2, or 3 is optimal (see new tests), but on real-world J2CL I see that 4 (with a particular exponential coefficient) is best, so the pass computes all 4 and picks the best. As a result it will never worsen the size and it has a good chance of improving. The differences between these are small, so in theory we could pick any of them, but given they are all modifications of a single algorithm it is very easy to compute them all with little code complexity. The benefits are rather small here, but this can save a few hundred bytes on a multi-MB Java file. This comes at a tiny compile time cost, but seems worth it for the new guarantee to never regress size.
* wasm-merge: Sort globals to ensure proper validation (#6221)Alon Zakai2024-01-111-1/+5
| | | | | | | | | | | | | If the first module has a global that reads from a global that appears in a later module, then we need to reorder the globals, because if we just append the globals from the later module we'd end up with a global reading from another that is not before it. Changes to the existing renamings test are just due to the global sorting pass that now runs (it not only fixes up validation errors but also tries to sort in a more optimal order for size). Fixes #6220
* ReorderGlobals pass (#4904)Alon Zakai2022-11-021-0/+164
This sorts globals by their usage (and respecting dependencies). If the module has very many globals then using smaller LEBs can matter. If there are fewer than 128 globals then we cannot reduce size, and the pass exits early (so this pass will not slow down MVP builds, which usually have just 1 global, the stack pointer). But with wasm GC it is common to use globals for vtables etc., and often there is a very large number of them.