determinism fix: hash results may differ between runs (#1431)

Hash results may differ between runs, as they can depend on pointers. In remove-duplicate-functions, that shouldn't matter, except that we only considered the first item in each hash group vs the others (to avoid O(N^2)), which is fine except for hash collisions (collisions mean 2 groups are merged into one, and considering just the first item vs the rest we miss out on the other duplicates in that single group). And hash collisions do occur (rarely) in practice. Instead, consider all comparisons in each hash group, which should be fine unless we have large amounts of hash collisions.
author: Alon Zakai <alonzakai@gmail.com> 2018-02-16 08:18:55 -0800
committer: GitHub <noreply@github.com> 2018-02-16 08:18:55 -0800
commit: 27000a9bfa616133c9368214386fd0416f398dfe (patch)
tree: dec89c6aabbed596fa7671432a071a889b4fb4a9 /src
parent: 85ae8cc6ffeffa65ff30d17649a0d8589dab2b00 (diff)
download: binaryen-27000a9bfa616133c9368214386fd0416f398dfe.tar.gz
binaryen-27000a9bfa616133c9368214386fd0416f398dfe.tar.bz2
binaryen-27000a9bfa616133c9368214386fd0416f398dfe.zip
1 files changed, 16 insertions, 17 deletions
diff --git a/src/passes/DuplicateFunctionElimination.cpp b/src/passes/DuplicateFunctionElimination.cpp
index c7852237b..027d16b4a 100644
--- a/src/passes/DuplicateFunctionElimination.cpp
+++ b/src/passes/DuplicateFunctionElimination.cpp
@@ -102,23 +102,22 @@ struct DuplicateFunctionElimination : public Pass {
       std::set<Name> duplicates;
       for (auto& pair : hashGroups) {
         auto& group = pair.second;
-        if (group.size() == 1) continue;
-        // pick a base for each group, and try to replace everyone else to it. TODO: multiple bases per hash group, for collisions
-#if 0
-        // for comparison purposes, pick in a deterministic way based on the names
-        Function* base = nullptr;
-        for (auto* func : group) {
-          if (!base || strcmp(func->name.str, base->name.str) < 0) {
-            base = func;
-          }
-        }
-#else
-        Function* base = group[0];
-#endif
-        for (auto* func : group) {
-          if (func != base && equal(func, base)) {
-            replacements[func->name] = base->name;
-            duplicates.insert(func->name);
+        Index size = group.size();
+        if (size == 1) continue;
+        // The groups should be fairly small, and even if a group is large we should
+        // have almost all of them identical, so we should not hit actual O(N^2)
+        // here unless the hash is quite poor.
+        for (Index i = 0; i < size - 1; i++) {
+          auto* first = group[i];
+          if (duplicates.count(first->name)) continue;
+          for (Index j = i + 1; j < size; j++) {
+            auto* second = group[j];
+            if (duplicates.count(second->name)) continue;
+            if (equal(first, second)) {
+              // great, we can replace the second with the first!
+              replacements[second->name] = first->name;
+              duplicates.insert(second->name);
+            }
           }
         }
       }
author	Alon Zakai <alonzakai@gmail.com>	2018-02-16 08:18:55 -0800
committer	GitHub <noreply@github.com>	2018-02-16 08:18:55 -0800
commit	27000a9bfa616133c9368214386fd0416f398dfe (patch)
tree	dec89c6aabbed596fa7671432a071a889b4fb4a9 /src
parent	85ae8cc6ffeffa65ff30d17649a0d8589dab2b00 (diff)
download	binaryen-27000a9bfa616133c9368214386fd0416f398dfe.tar.gz binaryen-27000a9bfa616133c9368214386fd0416f398dfe.tar.bz2 binaryen-27000a9bfa616133c9368214386fd0416f398dfe.zip