author | Alon Zakai <azakai@google.com> | 2023-10-03 16:39:12 -0700 |
---|---|---|
committer | GitHub <noreply@github.com> | 2023-10-03 16:39:12 -0700 |
commit | b2e096d79c36daa2cbfb7dc3db31af76e9f45cc8 (patch) | |
tree | 52f962128c9fe6c1b8cc8b847e5af40f33117c40 /test/passes | |
parent | 24779b2a3fe5e5c7cc6b1da3661d346cd9c129ae (diff) | |
RemoveUnusedBrs: Allow less unconditional work and in particular division (#5989)
Fixes #5983: The testcase from there is used here in a new testcase
remove-unused-brs_levels in which we check if we are willing to unconditionally
do a division operation. Turning an if with an arm that does a division into a
select, which always does the division, is almost 5x slower, so we should probably
be extremely careful about doing that.
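To make the asymmetry concrete, here is a minimal sketch (not Binaryen code) of how much work an if and a select execute at runtime. The function names and the unit costs are hypothetical and chosen only for illustration; the cost of evaluating the condition is ignored.

```cpp
#include <cstdint>

// Hypothetical work model, not Binaryen's cost analysis.

// An `if` executes only the arm that is taken.
constexpr uint32_t ifWork(bool condition, uint32_t ifTrueCost, uint32_t ifFalseCost) {
  return condition ? ifTrueCost : ifFalseCost;
}

// A `select` evaluates both value operands before picking one.
constexpr uint32_t selectWork(uint32_t ifTrueCost, uint32_t ifFalseCost) {
  return ifTrueCost + ifFalseCost;
}

// Illustrative unit costs (assumptions): a cheap operation and a division.
constexpr uint32_t kCheapCost = 1;
constexpr uint32_t kDivisionCost = 5;

// When the division arm is not taken, the if skips it but the select still pays.
static_assert(ifWork(false, kDivisionCost, kCheapCost) == 1, "if skips the division");
static_assert(selectWork(kDivisionCost, kCheapCost) == 6, "select always divides");
```

Under this model the select is only a win when the extra unconditional work is cheaper than the branch it replaces, which is why an expensive arm such as a division needs a stricter limit.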
I took some measurements and have some suggestions for changes in this PR:
* Raise the cost of div/rem to what I measure on my machine, which is 5x slower
than an add, or worse.
* For some reason we summed the costs of the if arms rather than taking their
max, so fix that. This does not help the issue, but was confusing.
* Adjust TooCostlyToRunUnconditionally in the pass from 9 to 8 (this helps
balance the last point).
* Use half that value when not optimizing for size. That is, we allow only 4 units
of extra unconditional work normally, 8 in -Os, and any amount in -Oz.
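The list above can be sketched roughly as follows. This is not the code from the pass: the enum, the function names, and the strict `>` comparison are assumptions made for illustration, and only the threshold value 8 and its name come from the commit message. A division cost of 5 is likewise an illustrative figure based on the "5x slower than an add" measurement.

```cpp
#include <cstdint>

// Hypothetical stand-in for the optimizer's shrink level.
enum class ShrinkLevel { None, Os, Oz };

// Threshold named in the commit message (lowered from 9 to 8).
constexpr uint32_t kTooCostlyToRunUnconditionally = 8;

// Cost of the two if arms: the max of the arms, not their sum,
// since only one arm runs at a time.
constexpr uint32_t ifArmsCost(uint32_t ifTrueCost, uint32_t ifFalseCost) {
  return ifTrueCost > ifFalseCost ? ifTrueCost : ifFalseCost;
}

// Whether `cost` units of work are too much to run unconditionally.
// Assumption: a cost equal to the limit is still allowed.
constexpr bool tooCostlyToRunUnconditionally(uint32_t cost, ShrinkLevel level) {
  return level == ShrinkLevel::Oz   ? false  // -Oz: allow any amount
         : level == ShrinkLevel::Os ? cost > kTooCostlyToRunUnconditionally
                                    : cost > kTooCostlyToRunUnconditionally / 2;
}

// Illustrative cost for a division (assumption, in units of one add).
constexpr uint32_t kDivisionCost = 5;

static_assert(tooCostlyToRunUnconditionally(kDivisionCost, ShrinkLevel::None),
              "a lone division is rejected when not optimizing for size");
static_assert(!tooCostlyToRunUnconditionally(kDivisionCost, ShrinkLevel::Os),
              "a lone division is still allowed in -Os");
```

With these assumed numbers, an arm containing only a division stays an if at the default level, becomes a select in -Os, and is always eligible in -Oz.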
Aside from the new testcases, some existing ones changed. They all appear to
change in a reasonable way, to me.
We should perhaps go even further than this, and not even run a division
unconditionally in -Os, but I wasn't sure it makes sense to go that far as
other benchmarks may be affected. For now, this makes the benchmark in
#5983 run at full speed in -O3 or -Os, and it remains slow in -Oz. The
modified version of the benchmark that only divides in the if (no other
operations) is still fast in -O3, but it becomes slow in -Os as we do turn that
if into a select (but again, I didn't want to go so far as to overfit on that one
benchmark).
Diffstat (limited to 'test/passes')
-rw-r--r-- | test/passes/remove-unused-brs_enable-multivalue.txt | 7 |
1 files changed, 3 insertions, 4 deletions
diff --git a/test/passes/remove-unused-brs_enable-multivalue.txt b/test/passes/remove-unused-brs_enable-multivalue.txt
index 9df644daa..0a66153e9 100644
--- a/test/passes/remove-unused-brs_enable-multivalue.txt
+++ b/test/passes/remove-unused-brs_enable-multivalue.txt
@@ -2502,8 +2502,9 @@
   (i32.const 0)
  )
  (func $ifs-copies-recursive (param $20 i32) (result i32)
-  (local.set $20
-   (select
+  (if
+   (i32.const 1)
+   (local.set $20
     (select
      (select
       (i32.const 4)
@@ -2513,8 +2514,6 @@
      (local.get $20)
      (i32.const 2)
     )
-    (local.get $20)
-    (i32.const 1)
    )
   )
   (local.get $20)