diff options
author | Alon Zakai <azakai@google.com> | 2024-05-31 14:55:57 -0700 |
---|---|---|
committer | GitHub <noreply@github.com> | 2024-05-31 14:55:57 -0700 |
commit | f8086adbf030b26eb7ce75e5d1834ae8134c689e (patch) | |
tree | b2d9e63f721aa7913c86924c8a966c9c1cc89923 /test/lit/passes/reorder-globals.wast | |
parent | 0c23394e9d73252000c3161fb33344ca7bbf247c (diff) | |
download | binaryen-f8086adbf030b26eb7ce75e5d1834ae8134c689e.tar.gz binaryen-f8086adbf030b26eb7ce75e5d1834ae8134c689e.tar.bz2 binaryen-f8086adbf030b26eb7ce75e5d1834ae8134c689e.zip |
Optimize ReorderGlobals ordering with a new algorithm (#6625)
The old ordering in that pass did a topological sort while sorting by uses
both within topological groups and between them. That could be unoptimal
in some cases, however, and actually on J2CL output this pass made the
binary larger, which is how we noticed this.
The problem is that such a toplogical sort keeps topological groups in
place, but it can be useful to interleave them sometimes. Imagine this:
$c - $a
/
$e
\
$d - $b
Here $e depends on $c, etc. The optimal order may interleave the two
arms here, e.g. $a, $b, $d, $c, $e. That is because the dependencies define
a partial order, and so the arms here are actually independent.
Sorting by toplogical depth first might help in some cases, but also is not
optimal in general, as we may want to mix toplogical depths:
$a, $c, $b, $d, $e does so, and it may be the best ordering.
This PR implements a natural greedy algorithm that picks the global with
the highest use count at each step, out of the set of possible globals, which
is the set of globals that have no unresolved dependencies. So we start by
picking the first global with no dependencies and add at at the front; then
that unlocks anything that depended on it and we pick from that set, and
so forth.
This may also not be optimal, but it is easy to make it more flexible by
customizing the counts, and we consider 4 sorts here:
* Set all counts to 0. This means we only take into account dependencies,
and we break ties by the original order, so this is as close to the original
order as we can be.
* Use the actual use counts. This is the simple greedy algorithm.
* Set the count of each global to also contain the counts of its children,
so the count is the total that might be unlocked. This gives more weight
to globals that can unlock more later, so it is less greedy.
* As last, but weight children's counts lower in an exponential way, which
makes sense as they may depend on other globals too.
In practice it is simple to generate cases where 1, 2, or 3 is optimal (see
new tests), but on real-world J2CL I see that 4 (with a particular exponential
coefficient) is best, so the pass computes all 4 and picks the best. As a
result it will never worsen the size and it has a good chance of
improving.
The differences between these are small, so in theory we could pick any
of them, but given they are all modifications of a single algorithm it is
very easy to compute them all with little code complexity.
The benefits are rather small here, but this can save a few hundred
bytes on a multi-MB Java file. This comes at a tiny compile time cost, but
seems worth it for the new guarantee to never regress size.
Diffstat (limited to 'test/lit/passes/reorder-globals.wast')
-rw-r--r-- | test/lit/passes/reorder-globals.wast | 547 |
1 files changed, 534 insertions, 13 deletions
diff --git a/test/lit/passes/reorder-globals.wast b/test/lit/passes/reorder-globals.wast index 6017f09ad..542311d59 100644 --- a/test/lit/passes/reorder-globals.wast +++ b/test/lit/passes/reorder-globals.wast @@ -145,8 +145,6 @@ ;; As above, but without dependencies, so now $c is first and then $b. (module - - ;; CHECK: (global $c i32 (i32.const 30)) ;; CHECK: (global $b i32 (i32.const 20)) @@ -180,10 +178,10 @@ ) ) -;; As above, but a mixed case: $b depends on $a but $c has no dependencies. $c -;; can be first. +;; As above, but a mixed case: $b depends on $a but $c has no dependencies, and +;; the counts are $c with the most, followed by $b, and then $a. $c can be +;; first here, but $b must follow $a. (module - ;; CHECK: (global $c i32 (i32.const 30)) ;; CHECK: (global $a i32 (i32.const 10)) @@ -197,6 +195,12 @@ ;; CHECK-NEXT: (global.get $b) ;; CHECK-NEXT: ) ;; CHECK-NEXT: (drop + ;; CHECK-NEXT: (global.get $b) + ;; CHECK-NEXT: ) + ;; CHECK-NEXT: (drop + ;; CHECK-NEXT: (global.get $c) + ;; CHECK-NEXT: ) + ;; CHECK-NEXT: (drop ;; CHECK-NEXT: (global.get $c) ;; CHECK-NEXT: ) ;; CHECK-NEXT: (drop @@ -204,6 +208,10 @@ ;; CHECK-NEXT: ) ;; CHECK-NEXT: ) (func $uses + ;; ($a already has one use in the global $b) + (drop + (global.get $b) + ) (drop (global.get $b) ) @@ -213,27 +221,111 @@ (drop (global.get $c) ) + (drop + (global.get $c) + ) ) ) -;; Another mixed case, now with $c depending on $b. $b can be before $a. +;; As above, but with the counts adjusted: before we had $c, $b, $a from most to +;; least uses, and now $b, $c, $a. +;; +;; A greedy sort would do $c, $a, $b (as the first choice is between $c and $a, +;; and $c wins), but that leaves $b, the highest count, for the end. The +;; smoothed-out LEB costs (1 byte at the start, +1/128 each index later) are: +;; +;; $c $a $b +;; 1 * 2 + 129/128 * 1 + 130/128 * 3 = 775/128 +;; +;; The original sort is +;; +;; $a $b $c +;; 1 * 1 + 129/128 * 3 + 130/128 * 2 = 775/128 +;; +;; As they are equal we prefer the original order. (module + ;; CHECK: (global $a i32 (i32.const 10)) + (global $a i32 (i32.const 10)) + ;; CHECK: (global $b i32 (global.get $a)) + (global $b i32 (global.get $a)) + ;; CHECK: (global $c i32 (i32.const 30)) + (global $c i32 (i32.const 30)) + ;; CHECK: (func $uses (type $0) + ;; CHECK-NEXT: (drop + ;; CHECK-NEXT: (global.get $b) + ;; CHECK-NEXT: ) + ;; CHECK-NEXT: (drop + ;; CHECK-NEXT: (global.get $b) + ;; CHECK-NEXT: ) + ;; CHECK-NEXT: (drop + ;; CHECK-NEXT: (global.get $b) + ;; CHECK-NEXT: ) + ;; CHECK-NEXT: (drop + ;; CHECK-NEXT: (global.get $c) + ;; CHECK-NEXT: ) + ;; CHECK-NEXT: (drop + ;; CHECK-NEXT: (global.get $c) + ;; CHECK-NEXT: ) + ;; CHECK-NEXT: ) + (func $uses + (drop + (global.get $b) + ) + (drop + (global.get $b) + ) + (drop + (global.get $b) + ) + (drop + (global.get $c) + ) + (drop + (global.get $c) + ) + ) +) - ;; CHECK: (global $b i32 (i32.const 20)) - - ;; CHECK: (global $c i32 (global.get $b)) - +;; As above, but with the counts adjusted to $b, $a, $c. +(module ;; CHECK: (global $a i32 (i32.const 10)) (global $a i32 (i32.const 10)) - (global $b i32 (i32.const 20)) - (global $c i32 (global.get $b)) + ;; CHECK: (global $b i32 (global.get $a)) + (global $b i32 (global.get $a)) + ;; CHECK: (global $c i32 (i32.const 30)) + (global $c i32 (i32.const 30)) ;; CHECK: (func $uses (type $0) ;; CHECK-NEXT: (drop ;; CHECK-NEXT: (global.get $b) ;; CHECK-NEXT: ) ;; CHECK-NEXT: (drop + ;; CHECK-NEXT: (global.get $b) + ;; CHECK-NEXT: ) + ;; CHECK-NEXT: ) + (func $uses + (drop + (global.get $b) + ) + (drop + (global.get $b) + ) + ) +) + +;; As above, but with the counts adjusted to $c, $a, $b. +(module + ;; CHECK: (global $c i32 (i32.const 30)) + + ;; CHECK: (global $a i32 (i32.const 10)) + (global $a i32 (i32.const 10)) + ;; CHECK: (global $b i32 (global.get $a)) + (global $b i32 (global.get $a)) + (global $c i32 (i32.const 30)) + + ;; CHECK: (func $uses (type $0) + ;; CHECK-NEXT: (drop ;; CHECK-NEXT: (global.get $c) ;; CHECK-NEXT: ) ;; CHECK-NEXT: (drop @@ -242,11 +334,63 @@ ;; CHECK-NEXT: ) (func $uses (drop - (global.get $b) + (global.get $c) ) (drop (global.get $c) ) + ) +) + +;; As above, but with the counts adjusted to $a, $b, $c. +(module + ;; CHECK: (global $a i32 (i32.const 10)) + (global $a i32 (i32.const 10)) + ;; CHECK: (global $b i32 (global.get $a)) + (global $b i32 (global.get $a)) + ;; CHECK: (global $c i32 (i32.const 30)) + (global $c i32 (i32.const 30)) + + ;; CHECK: (func $uses (type $0) + ;; CHECK-NEXT: (drop + ;; CHECK-NEXT: (global.get $a) + ;; CHECK-NEXT: ) + ;; CHECK-NEXT: (drop + ;; CHECK-NEXT: (global.get $b) + ;; CHECK-NEXT: ) + ;; CHECK-NEXT: ) + (func $uses + (drop + (global.get $a) + ) + (drop + (global.get $b) + ) + ) +) + +;; As above, but with the counts adjusted to $a, $c, $b. +(module + ;; CHECK: (global $a i32 (i32.const 10)) + (global $a i32 (i32.const 10)) + ;; CHECK: (global $c i32 (i32.const 30)) + + ;; CHECK: (global $b i32 (global.get $a)) + (global $b i32 (global.get $a)) + (global $c i32 (i32.const 30)) + + ;; CHECK: (func $uses (type $0) + ;; CHECK-NEXT: (drop + ;; CHECK-NEXT: (global.get $a) + ;; CHECK-NEXT: ) + ;; CHECK-NEXT: (drop + ;; CHECK-NEXT: (global.get $c) + ;; CHECK-NEXT: ) + ;; CHECK-NEXT: ) + (func $uses + (drop + (global.get $a) + ) (drop (global.get $c) ) @@ -292,3 +436,380 @@ ) ) ) + +;; Lower letters have lower counts: $a has the least, and $e has the most. +;; +;; Dependency graph (left depends on right): +;; +;; $c - $a +;; / +;; $e +;; \ +;; $d - $b +;; +;; $e has the most uses, followed by $c and $d. $a and $b have a reverse +;; ordering from their dependers, so a naive topological sort will fail to +;; be optimal. There are multiple optimal orders however, including: +;; +;; $b, $a, $c, $d, $e +;; $b, $d, $a, $c, $e +;; +;; $b and $e must be at the edges, but there is no single way to sort the +;; others: the first sorting here puts $a before both $d (though $a has +;; lower count) while the second puts $d before $c. Our greedy algorithm +;; picks the second order here. +(module + ;; CHECK: (global $b i32 (i32.const 20)) + + ;; CHECK: (global $d i32 (global.get $b)) + + ;; CHECK: (global $a i32 (i32.const 10)) + (global $a i32 (i32.const 10)) + + (global $b i32 (i32.const 20)) + + ;; CHECK: (global $c i32 (global.get $a)) + (global $c i32 (global.get $a)) + + (global $d i32 (global.get $b)) + + ;; CHECK: (global $e i32 (i32.add + ;; CHECK-NEXT: (global.get $c) + ;; CHECK-NEXT: (global.get $d) + ;; CHECK-NEXT: )) + (global $e i32 (i32.add + (global.get $c) + (global.get $d) + )) + + ;; CHECK: (func $uses (type $0) + ;; CHECK-NEXT: (drop + ;; CHECK-NEXT: (global.get $e) + ;; CHECK-NEXT: ) + ;; CHECK-NEXT: (drop + ;; CHECK-NEXT: (global.get $e) + ;; CHECK-NEXT: ) + ;; CHECK-NEXT: (drop + ;; CHECK-NEXT: (global.get $e) + ;; CHECK-NEXT: ) + ;; CHECK-NEXT: (drop + ;; CHECK-NEXT: (global.get $e) + ;; CHECK-NEXT: ) + ;; CHECK-NEXT: (drop + ;; CHECK-NEXT: (global.get $e) + ;; CHECK-NEXT: ) + ;; CHECK-NEXT: (drop + ;; CHECK-NEXT: (global.get $d) + ;; CHECK-NEXT: ) + ;; CHECK-NEXT: (drop + ;; CHECK-NEXT: (global.get $d) + ;; CHECK-NEXT: ) + ;; CHECK-NEXT: (drop + ;; CHECK-NEXT: (global.get $d) + ;; CHECK-NEXT: ) + ;; CHECK-NEXT: (drop + ;; CHECK-NEXT: (global.get $c) + ;; CHECK-NEXT: ) + ;; CHECK-NEXT: (drop + ;; CHECK-NEXT: (global.get $c) + ;; CHECK-NEXT: ) + ;; CHECK-NEXT: (drop + ;; CHECK-NEXT: (global.get $b) + ;; CHECK-NEXT: ) + ;; CHECK-NEXT: ) + (func $uses + ;; $a, $b, $c, $d each have one already from the globals. Add more so that + ;; $a has the least, and $e has the most + (drop (global.get $e)) + (drop (global.get $e)) + (drop (global.get $e)) + (drop (global.get $e)) + (drop (global.get $e)) + + (drop (global.get $d)) + (drop (global.get $d)) + (drop (global.get $d)) + + (drop (global.get $c)) + (drop (global.get $c)) + + (drop (global.get $b)) + ) +) + +;; As above, but add a direct dep from $d to $a: +;; +;; $c - $a +;; / / +;; $e / <-- this was added +;; \ / +;; $d - $b +;; +;; This forces $a to appear before $d: the order goes from before, which was +;; $b, $d, $a, $c, $e +;; to +;; $b, $a, $d, $c, $e +(module + ;; CHECK: (global $b i32 (i32.const 20)) + + ;; CHECK: (global $a i32 (i32.const 10)) + (global $a i32 (i32.const 10)) + + (global $b i32 (i32.const 20)) + + ;; CHECK: (global $d i32 (i32.add + ;; CHECK-NEXT: (global.get $b) + ;; CHECK-NEXT: (global.get $a) + ;; CHECK-NEXT: )) + + ;; CHECK: (global $c i32 (global.get $a)) + (global $c i32 (global.get $a)) + + (global $d i32 (i32.add + (global.get $b) + (global.get $a) ;; this was added + )) + + ;; CHECK: (global $e i32 (i32.add + ;; CHECK-NEXT: (global.get $c) + ;; CHECK-NEXT: (global.get $d) + ;; CHECK-NEXT: )) + (global $e i32 (i32.add + (global.get $c) + (global.get $d) + )) + + ;; CHECK: (func $uses (type $0) + ;; CHECK-NEXT: (drop + ;; CHECK-NEXT: (global.get $e) + ;; CHECK-NEXT: ) + ;; CHECK-NEXT: (drop + ;; CHECK-NEXT: (global.get $e) + ;; CHECK-NEXT: ) + ;; CHECK-NEXT: (drop + ;; CHECK-NEXT: (global.get $e) + ;; CHECK-NEXT: ) + ;; CHECK-NEXT: (drop + ;; CHECK-NEXT: (global.get $e) + ;; CHECK-NEXT: ) + ;; CHECK-NEXT: (drop + ;; CHECK-NEXT: (global.get $e) + ;; CHECK-NEXT: ) + ;; CHECK-NEXT: (drop + ;; CHECK-NEXT: (global.get $e) + ;; CHECK-NEXT: ) + ;; CHECK-NEXT: (drop + ;; CHECK-NEXT: (global.get $d) + ;; CHECK-NEXT: ) + ;; CHECK-NEXT: (drop + ;; CHECK-NEXT: (global.get $d) + ;; CHECK-NEXT: ) + ;; CHECK-NEXT: (drop + ;; CHECK-NEXT: (global.get $d) + ;; CHECK-NEXT: ) + ;; CHECK-NEXT: (drop + ;; CHECK-NEXT: (global.get $d) + ;; CHECK-NEXT: ) + ;; CHECK-NEXT: (drop + ;; CHECK-NEXT: (global.get $c) + ;; CHECK-NEXT: ) + ;; CHECK-NEXT: (drop + ;; CHECK-NEXT: (global.get $c) + ;; CHECK-NEXT: ) + ;; CHECK-NEXT: (drop + ;; CHECK-NEXT: (global.get $c) + ;; CHECK-NEXT: ) + ;; CHECK-NEXT: (drop + ;; CHECK-NEXT: (global.get $b) + ;; CHECK-NEXT: ) + ;; CHECK-NEXT: (drop + ;; CHECK-NEXT: (global.get $b) + ;; CHECK-NEXT: ) + ;; CHECK-NEXT: ) + (func $uses + ;; $b, $c, $d each have one already from the globals, and $a has two. Add + ;; more so that $a has the least, and $e has the most. + (drop (global.get $e)) + (drop (global.get $e)) + (drop (global.get $e)) + (drop (global.get $e)) + (drop (global.get $e)) + (drop (global.get $e)) + + (drop (global.get $d)) + (drop (global.get $d)) + (drop (global.get $d)) + (drop (global.get $d)) + + (drop (global.get $c)) + (drop (global.get $c)) + (drop (global.get $c)) + + (drop (global.get $b)) + (drop (global.get $b)) + ) +) + +;; A situation where the simple greedy sort fails to be optimal. We have a +;; chain and one more independent global +;; +;; a <- b <- c +;; +;; other +;; +;; The candidates for the first global emitted are a and other, as they have no +;; dependencies, and other has a higher count so greedy sorting would pick it. +;; however, c has the highest count by far, so it is worth being less greedy and +;; doing a just in order to be able to do b and then c, and to emit other last. +;; In other words, the original order is best. +(module + ;; CHECK: (global $a i32 (i32.const 0)) + (global $a i32 (i32.const 0)) + ;; CHECK: (global $b i32 (global.get $a)) + (global $b i32 (global.get $a)) + ;; CHECK: (global $c i32 (global.get $b)) + (global $c i32 (global.get $b)) + + ;; CHECK: (global $other i32 (i32.const 1)) + (global $other i32 (i32.const 1)) + + ;; CHECK: (func $uses (type $0) + ;; CHECK-NEXT: (drop + ;; CHECK-NEXT: (global.get $c) + ;; CHECK-NEXT: ) + ;; CHECK-NEXT: (drop + ;; CHECK-NEXT: (global.get $c) + ;; CHECK-NEXT: ) + ;; CHECK-NEXT: (drop + ;; CHECK-NEXT: (global.get $c) + ;; CHECK-NEXT: ) + ;; CHECK-NEXT: (drop + ;; CHECK-NEXT: (global.get $c) + ;; CHECK-NEXT: ) + ;; CHECK-NEXT: (drop + ;; CHECK-NEXT: (global.get $c) + ;; CHECK-NEXT: ) + ;; CHECK-NEXT: (drop + ;; CHECK-NEXT: (global.get $c) + ;; CHECK-NEXT: ) + ;; CHECK-NEXT: (drop + ;; CHECK-NEXT: (global.get $c) + ;; CHECK-NEXT: ) + ;; CHECK-NEXT: (drop + ;; CHECK-NEXT: (global.get $c) + ;; CHECK-NEXT: ) + ;; CHECK-NEXT: (drop + ;; CHECK-NEXT: (global.get $c) + ;; CHECK-NEXT: ) + ;; CHECK-NEXT: (drop + ;; CHECK-NEXT: (global.get $c) + ;; CHECK-NEXT: ) + ;; CHECK-NEXT: (drop + ;; CHECK-NEXT: (global.get $other) + ;; CHECK-NEXT: ) + ;; CHECK-NEXT: (drop + ;; CHECK-NEXT: (global.get $other) + ;; CHECK-NEXT: ) + ;; CHECK-NEXT: ) + (func $uses + ;; Ten uses for $c, far more than all the rest combined. + (drop (global.get $c)) + (drop (global.get $c)) + (drop (global.get $c)) + (drop (global.get $c)) + (drop (global.get $c)) + (drop (global.get $c)) + (drop (global.get $c)) + (drop (global.get $c)) + (drop (global.get $c)) + (drop (global.get $c)) + + ;; Two uses for other, which is more than $a's single use. + (drop (global.get $other)) + (drop (global.get $other)) + ) +) + +;; As above, but with the original order a little different. This is again a +;; case where the greedy sort is unoptimal (see above), but now also the +;; original sort is unoptimal as well, and instead we use the "sum" sort, which +;; counts the sum of the uses of a global and all the things it depends on and +;; uses that as the count for that global (which in effect means that we take +;; into consideration not only its own size but the entire size that it may +;; unlock, which happens to work well here). +;; +;; The only change in the input compared to the previous test is that $other +;; was moved up to between $b and $c. Sum sort works well here because the +;; first comparison is $a and $other, and sum takes into account $b and $c in +;; $a's favor, so it wins. Likewise $b and $c win against $other as well, so +;; the order is $a, $b, $c, $other which is optimal here. +(module + ;; CHECK: (global $a i32 (i32.const 0)) + (global $a i32 (i32.const 0)) + ;; CHECK: (global $b i32 (global.get $a)) + (global $b i32 (global.get $a)) + + ;; CHECK: (global $c i32 (global.get $b)) + + ;; CHECK: (global $other i32 (i32.const 1)) + (global $other i32 (i32.const 1)) + + (global $c i32 (global.get $b)) + + ;; CHECK: (func $uses (type $0) + ;; CHECK-NEXT: (drop + ;; CHECK-NEXT: (global.get $c) + ;; CHECK-NEXT: ) + ;; CHECK-NEXT: (drop + ;; CHECK-NEXT: (global.get $c) + ;; CHECK-NEXT: ) + ;; CHECK-NEXT: (drop + ;; CHECK-NEXT: (global.get $c) + ;; CHECK-NEXT: ) + ;; CHECK-NEXT: (drop + ;; CHECK-NEXT: (global.get $c) + ;; CHECK-NEXT: ) + ;; CHECK-NEXT: (drop + ;; CHECK-NEXT: (global.get $c) + ;; CHECK-NEXT: ) + ;; CHECK-NEXT: (drop + ;; CHECK-NEXT: (global.get $c) + ;; CHECK-NEXT: ) + ;; CHECK-NEXT: (drop + ;; CHECK-NEXT: (global.get $c) + ;; CHECK-NEXT: ) + ;; CHECK-NEXT: (drop + ;; CHECK-NEXT: (global.get $c) + ;; CHECK-NEXT: ) + ;; CHECK-NEXT: (drop + ;; CHECK-NEXT: (global.get $c) + ;; CHECK-NEXT: ) + ;; CHECK-NEXT: (drop + ;; CHECK-NEXT: (global.get $c) + ;; CHECK-NEXT: ) + ;; CHECK-NEXT: (drop + ;; CHECK-NEXT: (global.get $other) + ;; CHECK-NEXT: ) + ;; CHECK-NEXT: (drop + ;; CHECK-NEXT: (global.get $other) + ;; CHECK-NEXT: ) + ;; CHECK-NEXT: ) + (func $uses + ;; Ten uses for $c, far more than all the rest combined. + (drop (global.get $c)) + (drop (global.get $c)) + (drop (global.get $c)) + (drop (global.get $c)) + (drop (global.get $c)) + (drop (global.get $c)) + (drop (global.get $c)) + (drop (global.get $c)) + (drop (global.get $c)) + (drop (global.get $c)) + + ;; Two uses for other, which is more than $a's single use. + (drop (global.get $other)) + (drop (global.get $other)) + ) +) + |