summaryrefslogtreecommitdiff
path: root/src
diff options
context:
space:
mode:
authorAlon Zakai <azakai@google.com>2021-11-09 17:01:47 -0800
committerGitHub <noreply@github.com>2021-11-10 01:01:47 +0000
commitc30152666af8db5ef8634165a5a3ff192d3a6c98 (patch)
tree07330c059c5861480db31e396c188879b6800beb /src
parentb260f1cc65096a7784da8ef8ad25a067e0480e5b (diff)
downloadbinaryen-c30152666af8db5ef8634165a5a3ff192d3a6c98.tar.gz
binaryen-c30152666af8db5ef8634165a5a3ff192d3a6c98.tar.bz2
binaryen-c30152666af8db5ef8634165a5a3ff192d3a6c98.zip
CoalesceLocals: Rewrite the algorithm to be linear and to ignore copies (#4314)
The old algorithm can be summarized as: In each basic block, start at the beginning. Each pair of live locals there might interfere with each other, as they might arrive from different entry blocks with different values. Afterwards, go through the block and find overlapping live ranges, and mark interferences there as well. This is non-linear because at the start of the block we do a double-loop over all pairs of live locals, which in general can be O(N^2) (N - number of locals). It also has the downside of ignoring copies: if two locals have overlapping live ranges but they must have identical values on those ranges, they do not actually interfere, for example x = 10; y = x; .. // live ranges overlap here foo(x, y); // live ranges end here. We can ignore this overlap since the copy shows they are identical there, but the pass did not take this into account. To some extent other passes can remove such copies (SimplifyLocals, MergeLocals, RedundantSetElimination), but in general this was a weak spot for the optimizer. I realized there is a solution to both these problems: In Wasm, given that we have a default value for all locals, if a local is live at the start of a block then it must be live at the end of all the blocks reaching it. That is so because the liveness will extend backwards all the way to some set of the local, possibly all the way to the zero-initialization at the start of the function, and it extends that way through all predecessor blocks. A consequence of this is that there are no interferences between locals that only occur during a merge: The live ranges include the predecessor blocks, and theirs, and so forth, until we reach a block where one of the locals is assigned a value different than the other. That is a necessary and sufficient condition for intererence, and therefore when processing a block we only need to look at its contents, and can ignore the merging of control flow, which allows us to be linear. More details on this and on the new algorithm in comments in the source, but the basic idea is that it simply goes through each block in a linear way, finding which values are assigned to each local (using a numbering of unique values), and noting which are live at each time. If two locals are live and one is assigned a value that is not the same as the value in the other, mark them as interfering. This is of substantial benefit to j2wasm output, I believe because it is common there to find local subexpression elimination opportunities after inlining, and each time we find one we add a local. If we inline different functions into the same target, we may end up with copied locals for each of them. (This was not noticed in the past because it is very rare on LLVM output, which has already had inlining and GVN etc. done.) There is a small benefit to LLVM output as well, though just a few percent at best. However, it is enough to be noticeable on some of the code size tests. This is also faster than the previous pass. It's normally not noticeable as this pass is not one of the slowest anyhow, but I found some real-world codebases where the pass becomes 50% faster. I have not found any case where it is slower than the old algorithm. Fuzzed over several days to be sure this is correct, and also verified on the emscripten test suite.
Diffstat (limited to 'src')
-rw-r--r--src/passes/CoalesceLocals.cpp188
1 files changed, 161 insertions, 27 deletions
diff --git a/src/passes/CoalesceLocals.cpp b/src/passes/CoalesceLocals.cpp
index d62981d2c..7b2505094 100644
--- a/src/passes/CoalesceLocals.cpp
+++ b/src/passes/CoalesceLocals.cpp
@@ -57,10 +57,10 @@ struct CoalesceLocals
void increaseBackEdgePriorities();
+ // Calculate interferences between locals. This will will fill
+ // the data structure |interferences|.
void calculateInterferences();
- void calculateInterferences(const SetOfLocals& locals);
-
void pickIndicesFromOrder(std::vector<Index>& order,
std::vector<Index>& indices);
void pickIndicesFromOrder(std::vector<Index>& order,
@@ -146,47 +146,180 @@ void CoalesceLocals::increaseBackEdgePriorities() {
void CoalesceLocals::calculateInterferences() {
interferences.resize(numLocals * numLocals);
std::fill(interferences.begin(), interferences.end(), false);
+
+ // We will track the values in each local, using a numbering where each index
+ // represents a unique different value. This array maps a local index to the
+ // value index it contains.
+ //
+ // To avoid reallocating this array all the time, allocate it once outside the
+ // loop.
+ std::vector<Index> values(numLocals);
+
for (auto& curr : basicBlocks) {
if (liveBlocks.count(curr.get()) == 0) {
continue; // ignore dead blocks
}
- // everything coming in might interfere, as it might come from a different
- // block
- auto live = curr->contents.end;
- calculateInterferences(live);
- // scan through the block itself
+
+ // First, find which gets end a live range. While doing so, also calculate
+ // the effectiveness of sets.
auto& actions = curr->contents.actions;
+ std::vector<bool> endsLiveRange(actions.size(), false);
+ auto live = curr->contents.end;
for (int i = int(actions.size()) - 1; i >= 0; i--) {
auto& action = actions[i];
auto index = action.index;
if (action.isGet()) {
- // new live local, interferes with all the rest
- live.insert(index);
- for (auto i : live) {
- interfere(i, index);
+ if (!live.has(index)) {
+ // The local is not live after us, so its liveness ends here.
+ endsLiveRange[i] = true;
+ live.insert(index);
}
} else {
+ // This is a set. Check if the local is alive after it; if it is then
+ // the set if effective as there is some get that can read the value.
if (live.erase(index)) {
action.effective = true;
}
}
}
+
+ // We have processed from the end of the block to the start, updating |live|
+ // as we go, and now it must be equal to the state at the start of the
+ // block. We will also use |live| in the next loop, and assume it begins
+ // in that state.
+ assert(live == curr->contents.start);
+
+ // Now that we know live ranges, check if locals interfere in this block.
+ // Locals interfere if they might contain different values on areas where
+ // their live ranges overlap. To evaluate that, we do an analysis inside
+ // the block that gives each set a unique value number, and as those flow
+ // around through copies between sets we can see when sets are guaranteed to
+ // be equal.
+
+ // Give all default values the same value ID, regardless of their type. This
+ // is valid since we only coalesce locals of the same type anyhow.
+ auto zeroInit = 0;
+ Index nextValue = 1;
+
+ if (curr.get() == entry) {
+ // Each parameter is assumed to have a different value on entry.
+ for (Index i = 0; i < getFunction()->getNumParams(); i++) {
+ values[i] = nextValue++;
+ }
+
+ for (Index i = getFunction()->getNumParams();
+ i < getFunction()->getNumLocals();
+ i++) {
+ values[i] = zeroInit;
+ }
+ } else {
+ // In any block but the entry, assume that each live local might have a
+ // different value at the start.
+ // TODO: Propagating value IDs across blocks could identify more copies,
+ // however, it would also be nonlinear.
+ for (auto index : curr->contents.start) {
+ values[index] = nextValue++;
+ }
+ }
+
+ // Traverse through the block from start to finish. We keep track of both
+ // liveness (in |live|) and the value IDs in each local (in |values|)
+ // while doing so.
+ for (Index i = 0; i < actions.size(); i++) {
+ auto& action = actions[i];
+ auto index = action.index;
+ if (action.isGet()) {
+ if (endsLiveRange[i]) {
+ bool erased = live.erase(action.index);
+ assert(erased);
+ WASM_UNUSED(erased);
+ }
+ continue;
+ }
+
+ // This is a set. Find the value being assigned to the local.
+ auto* set = (*action.origin)->cast<LocalSet>();
+ Index newValue;
+ if (set->value->is<LocalGet>() || set->value->is<LocalSet>()) {
+ // This is a copy: Either it is a get or a tee, that occurs right
+ // before us. Set our new value to theirs.
+ assert(i > 0 && set->value == *actions[i - 1].origin);
+ newValue = values[actions[i - 1].index];
+ } else {
+ // This is not a copy; assign a new unique value.
+ newValue = nextValue++;
+ }
+ values[index] = newValue;
+
+ // If this set has no gets that read from it, then it does not start a
+ // live range, and it cannot cause interference.
+ if (!action.effective) {
+ continue;
+ }
+
+ // Update interferences: This will interfere with any other local that
+ // is currently live and contains a different value.
+ for (auto other : live) {
+ // This index cannot have been live before this set (as we would be
+ // trampling some other set before us, if so; and then that set would
+ // have been ineffective). We will mark this index as live right after
+ // this loop).
+ assert(other != index);
+ if (values[other] != newValue) {
+ interfere(other, index);
+ }
+ }
+ live.insert(action.index);
+ }
+
+ // Note that we do not need to do anything for merges: while in general an
+ // interference can happen either in a block or when control flow merges,
+ // in wasm we have default values for all locals. As a result, if a local is
+ // live at the beginning of a block, it will be live at the ends of *all*
+ // the blocks reaching it: there is no possibility of an "unset local." That
+ // is, imagine we have this merge with a conflict:
+ //
+ // [a is set to some value] ->-
+ // |
+ // |->- [merge block where a and b are used]
+ // |
+ // [b is set to some value] ->-
+ //
+ // It is true that a conflict happens in the merge block, and if we had
+ // unset locals then the top block would have b unset, and the bottom block
+ // would have a unset, and so there would be no conflict there and the
+ // problem would only appear in the merge. But in wasm, that a and b are
+ // used in the merge block means that they are live at the end of both the
+ // top and bottom block, and that liveness will extend all the way back to
+ // *some* set of those values, possibly only the zero-initialization at the
+ // function start. Therefore a conflict will be noticed in both the top and
+ // bottom blocks, and that merge block does not need to reason about merging
+ // its inputs. In other words, a conflict will appear in the middle of a
+ // block, somewhere, and therefore we leave it to that block to identify,
+ // and so blocks only need to reason about their own contents and not what
+ // arrives to them.
+ //
+ // The one exception here is the entry to the function, see below.
}
- // Params have a value on entry, so mark them as live, as variables
- // live at the entry expect their zero-init value.
- SetOfLocals start = entry->contents.start;
+
+ // We must not try to coalesce parameters as they are fixed. Mark them as
+ // "interfering" so that we do not need to special-case them later.
auto numParams = getFunction()->getNumParams();
for (Index i = 0; i < numParams; i++) {
- start.insert(i);
+ for (Index j = i + 1; j < numParams; j++) {
+ interfereLowHigh(i, j);
+ }
}
- calculateInterferences(start);
-}
-void CoalesceLocals::calculateInterferences(const SetOfLocals& locals) {
- Index size = locals.size();
- for (Index i = 0; i < size; i++) {
- for (Index j = i + 1; j < size; j++) {
- interfereLowHigh(locals[i], locals[j]);
+ // We must handle interference between uses of the zero-init value and
+ // parameters manually. A zero initialization represents a set (to a default
+ // value), and that set would be what alerts us to a conflict, but there is no
+ // actual set in the IR since the zero-init value is applied implicitly.
+ for (auto i : entry->contents.start) {
+ if (i >= numParams) {
+ for (Index j = 0; j < numParams; j++) {
+ interfereLowHigh(j, i);
+ }
}
}
}
@@ -376,12 +509,13 @@ void CoalesceLocals::applyIndices(std::vector<Index>& indices,
set->index = indices[set->index];
// in addition, we can optimize out redundant copies and ineffective
// sets
- LocalGet* get;
- if ((get = set->value->dynCast<LocalGet>()) &&
- get->index == set->index) {
- action.removeCopy();
- continue;
+ if (auto* get = set->value->dynCast<LocalGet>()) {
+ if (get->index == set->index) {
+ action.removeCopy();
+ continue;
+ }
}
+ // TODO: do the same for a tee
// remove ineffective actions
if (!action.effective) {
// value may have no side effects, further optimizations can eliminate