path: root/src/support/hash.h
Commit message | Author | Age | Files | Lines
* Make 32-bit hashing identical to 64-bit in TypeSSA (#7048) | Alon Zakai | 2024-11-04 | 1 | -10/+12
This is NFC on 64-bit systems but noticeable on 32-bit systems. Also remove the 32-bit path in hash_combine. That isn't necessary for this fix, but it makes the code simpler and also makes debugging between systems simpler. It might also avoid problems in future cases, if we are lucky. The only cost is perhaps a slight slowdown on 32-bit systems, which seems worth it. Fixes #7046
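A minimal sketch of a single-path hash_combine after dropping the 32-bit branch (a reconstruction for illustration, not a verbatim copy of hash.h; the magic number is the conventional 64-bit golden-ratio mixing constant):

    #include <cstddef>

    // One code path on every platform: always mix with the 64-bit constant,
    // accepting a possible slight slowdown when size_t is only 32 bits.
    inline void hash_combine(std::size_t& digest, const std::size_t other) {
      digest ^= other + 0x9e3779b97f4a7c15 + (digest << 12) + (digest >> 4);
    }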
* TypeSSA: Handle collisions by adding a hash to ensure a fresh rec group (#5724) | Alon Zakai | 2023-05-19 | 1 | -1/+4
Fixes #5720
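A toy illustration of the collision-handling idea (hypothetical code, not the PR's actual implementation): a new rec group can be made structurally unique by appending a "brand" whose shape encodes a hash, so it cannot collide with an existing group.

    #include <cstdint>
    #include <vector>

    // Hypothetical sketch: encode a hash, two bits at a time, as a list of
    // field kinds; different hashes yield structurally different brands.
    std::vector<uint32_t> makeBrandFields(uint64_t hash) {
      std::vector<uint32_t> fieldKinds;
      for (int i = 0; i < 32; i++) {
        fieldKinds.push_back(uint32_t((hash >> (2 * i)) & 3));
      }
      return fieldKinds;
    }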
* [NFC] Add a generic hash implementation for tuples (#5162) | Thomas Lively | 2022-10-19 | 1 | -0/+18
We already provided a specialization of `std::hash` for arbitrary pairs, so add one for `std::tuple` as well. Use the new specialization where we were previously using nested pairs just to be able to use the pair specialization.
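A sketch of what such a specialization can look like (a reconstruction for illustration; the real one lives in src/support/hash.h): fold each element's hash into a running digest with a boost-style combiner.

    #include <cstddef>
    #include <functional>
    #include <tuple>

    static inline void combineHash(std::size_t& digest, std::size_t other) {
      digest ^= other + 0x9e3779b9 + (digest << 6) + (digest >> 2);
    }

    namespace std {
    // Hash a tuple by hashing each element and combining the results.
    template<typename... Ts> struct hash<tuple<Ts...>> {
      size_t operator()(const tuple<Ts...>& value) const {
        size_t digest = 0;
        apply([&](const Ts&... elements) {
          (combineHash(digest, hash<Ts>{}(elements)), ...);
        }, value);
        return digest;
      }
    };
    } // namespace std

With this in place, containers like std::unordered_map keyed on a std::tuple work directly, instead of nesting pairs just to reuse the pair specialization.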
* [GUFA] Improve hashing (#5091) | Alon Zakai | 2022-09-28 | 1 | -1/+1
Avoid manually doing bitshifts etc.; leave combining to the core hash logic, which can do a better job.
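A hypothetical before/after in the spirit of this change (illustrative only, not GUFA's actual code): ad-hoc shifting mixes bits poorly, while routing values through one shared combiner keeps the mixing quality in a single place.

    #include <cstddef>
    #include <functional>

    // Before (hypothetical): hand-rolled mixing that loses entropy.
    std::size_t hashPairManually(int a, int b) {
      return (std::hash<int>{}(a) << 1) ^ std::hash<int>{}(b);
    }

    // After (hypothetical): one boost-style combiner does all the mixing.
    std::size_t hashPairCentrally(int a, int b) {
      std::size_t digest = std::hash<int>{}(a);
      digest ^= std::hash<int>{}(b) + 0x9e3779b9 + (digest << 6) + (digest >> 2);
      return digest;
    }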
* Support comparing, subtyping, and naming recursive types (#3610) | Thomas Lively | 2021-02-25 | 1 | -0/+13
When the type section is emitted, types with an equal number of references are ordered by an arbitrary measure of simplicity, which previously would recurse infinitely on structurally equivalent recursive types. Similarly, calculating whether a recursive type was a subtype of another recursive type could have recursed infinitely. This PR avoids infinite recursion in both cases by switching the algorithms from normal inductive recursion to coinductive recursion. The difference is that while the inductive algorithms assume the relations do not hold for a pair of HeapTypes until they have been exhaustively shown to hold, the coinductive algorithms assume the relations hold unless a counterexample can be found.

In addition to those two algorithms, this PR also implements name generation for recursive types, using de Bruijn indices to stand in for inner uses of the recursive types. There are additional algorithms that will need to be switched from inductive to coinductive recursion, such as least upper bound generation, but these presented a good starting point and are sufficient to get some interesting programs working end-to-end.
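A toy sketch of the coinductive pattern (hypothetical types, not Binaryen's actual API): a pair currently being checked is assumed to be in the relation, so a cycle through recursive types terminates with success instead of recursing forever, and only a concrete structural mismatch produces a counterexample.

    #include <cstddef>
    #include <set>
    #include <utility>
    #include <vector>

    struct ToyType {
      int kind;                        // leaf shape of this type
      std::vector<ToyType*> children;  // possibly cyclic references
    };

    bool equivalent(ToyType* a, ToyType* b,
                    std::set<std::pair<ToyType*, ToyType*>>& assumed) {
      if (a == b) {
        return true;
      }
      if (!assumed.insert({a, b}).second) {
        // Coinductive step: this pair is already being checked, so assume
        // the relation holds rather than recursing forever.
        return true;
      }
      if (a->kind != b->kind || a->children.size() != b->children.size()) {
        return false; // a concrete counterexample
      }
      for (std::size_t i = 0; i < a->children.size(); i++) {
        if (!equivalent(a->children[i], b->children[i], assumed)) {
          return false;
        }
      }
      return true;
    }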
* Add 64-bit hash_combine (#3041) | Daniel Wirtz | 2020-08-16 | 1 | -1/+9
Currently only the low 32 bits of a hash are guaranteed to be shuffled before combining with the other hash, so this PR adds a 64-bit variant of hash_combine, including a comment on where the constants come from.
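A sketch of the 32-/64-bit split this introduces (a reconstruction, not the verbatim code; the magic numbers are the 32- and 64-bit golden-ratio constants conventionally used by boost-style combiners):

    #include <cstddef>
    #include <cstdint>

    inline void hash_combine(std::size_t& digest, const std::size_t other) {
    #if SIZE_MAX == UINT64_MAX
      // 64-bit variant: shuffles all 64 bits before combining.
      digest ^= other + 0x9e3779b97f4a7c15 + (digest << 12) + (digest >> 4);
    #else
      // 32-bit variant, as in boost::hash_combine.
      digest ^= other + 0x9e3779b9 + (digest << 6) + (digest >> 2);
    #endif
    }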
* Refactor hashing (#3023) | Daniel Wirtz | 2020-08-12 | 1 | -19/+14
  * Unifies internal hashing helpers to naturally integrate with std::hash
  * Removes the previous custom implementation
  * Computed hashes are now always size_t
  * Introduces a hash_combine helper (see the sketch below)
  * Fixes an overwritten partial hash in Relooper.cpp
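A sketch of the unified style described above (a reconstruction; the helper name follows later versions of hash.h, so treat it as an assumption): values are hashed via std::hash and folded into a running size_t digest by a single helper.

    #include <cstddef>
    #include <functional>

    // Fold one more value into a running digest; all bit mixing lives here,
    // so callers never shuffle bits by hand.
    template<typename T>
    inline void rehash(std::size_t& digest, const T& value) {
      auto other = std::hash<T>{}(value);
      digest ^= other + 0x9e3779b9 + (digest << 6) + (digest >> 2);
    }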
* Apply format changes from #2048 (#2059) | Alon Zakai | 2019-04-26 | 1 | -1/+2
Mass change to apply clang-format to everything. We are applying this in a PR by me so the (git) blame is all mine ;) but @aheejin did all the work to get clang-format set up and all the manual work to tidy up some things to make the output nicer in #2048
* Stack IR (#1623) | Alon Zakai | 2018-07-30 | 1 | -5/+7
This adds a new IR, "Stack IR". This represents wasm at a very low level, as a simple stream of instructions, basically the same as wasm's binary format. This is unlike Binaryen IR, which is structured and in a tree format.

This gives some small wins on binary sizes, less than 1% in most cases, usually 0.25-0.50% or so. That's not much by itself, but looking forward this prepares us for multi-value, which we really need an IR like this to be able to optimize well. Also, it's possible there is more we can do already; currently there are just a few Stack IR optimizations implemented:

  * DCE
  * local2stack: check if a set_local/get_local pair can be removed, which keeps the set's value on the stack, which, if the stars align, can be popped instead of the get.
  * Block removal: remove any blocks with no branches, as they are valid in wasm binary format.

Implementation-wise, the IR is defined in wasm-stack.h. A new StackInst is defined, representing a single instruction (see the sketch after this entry). Most are simple reflections of Binaryen IR (an add, a load, etc.) and are just pointers to them. Control flow constructs are expanded into multiple instructions, like a block turning into a block begin and end, and we may also emit extra unreachables to handle the fact that Binaryen IR has unreachable blocks/ifs/loops but wasm does not. Overall, all the Binaryen IR differences from wasm vanish on the way to Stack IR.

Where this IR lives: each Function now has a unique_ptr to Stack IR, that is, a function may have Stack IR alongside the main IR. If the Stack IR is present, we write it out during binary writing; if not, we do the same Binaryen IR => wasm binary process as before (this PR should not affect speed there). This design lets us use normal Passes on Stack IR; in particular, this PR defines three passes:

  * Generate Stack IR
  * Optimize Stack IR (might be worth splitting out into separate passes eventually)
  * Print Stack IR for debugging purposes

Having these as normal passes is convenient as they can then run in parallel across functions, with all the other conveniences of our current Pass system. However, a downside of keeping the second IR as an option on Functions, and using normal Passes to operate on it, is that we may get out of sync: if you generate Stack IR, then modify Binaryen IR, the Stack IR may no longer be valid (for example, maybe you removed locals or modified instructions in place, etc.). To avoid that, Passes now declare whether they modify Binaryen IR; if they do, we throw away the Stack IR.

Miscellaneous notes:

  * Just writing Stack IR, then writing to binary (no optimizations) is 20% slower than going directly to binary, which is one reason why we still support direct writing. This does lead to some "fun" C++ template code to make that convenient: there is a single StackWriter class, templated over the "mode", which is either Binaryen2Binary (direct writing), Binaryen2Stack, or Stack2Binary. This avoids a lot of boilerplate, as the three modes share a lot of code in overlapping ways.
  * Stack IR does not support source maps / debug info. We just don't use that IR if debug info is present.
  * A tiny text format comment (if emitting non-minified text) indicates Stack IR is present, if it is: ((; has Stack IR ;)). This may help with debugging, just in case people forget. There is also a pass to print out the Stack IR for debug purposes, as mentioned above.
  * The sieve binaryen.js test was actually not validating all along; these new opts broke it in a more noticeable manner. Fixed.
  * Added extra checks in pass-debug mode, to verify that if Stack IR should have been thrown out, it was. This should help avoid any confusion with the IR being invalid.
  * Added a comment about the possible future of Stack IR as the main IR, depending on optimization results, following some discussion earlier today.
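A minimal sketch of the StackInst shape described above (field and enum names assumed for illustration; the real definition lives in wasm-stack.h): most stack instructions simply point at the Binaryen IR expression they mirror, while control flow is split into explicit begin/end markers.

    // Hypothetical reduction of the idea; see wasm-stack.h for the real one.
    struct Expression; // a Binaryen IR node, opaque in this sketch

    struct StackInst {
      enum Op {
        Basic,      // a plain instruction mirroring one IR node (add, load, ...)
        BlockBegin, // control flow expands into paired begin/end markers
        BlockEnd,
        IfBegin,
        IfElse,
        IfEnd,
        // ... loop begin/end, etc.
      } op;
      Expression* origin; // the Binaryen IR expression this reflects
    };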
* duplicate-function-elimination improvements (#1590) | Alon Zakai | 2018-06-07 | 1 | -3/+6
On a codebase with 370K functions, 160K were in fact duplicates (!)... and it took many, many passes to figure that out, over 2 minutes in fact (!), as A and B may be identical only after we see that the functions C1, C2 that they call are identical (so there can be long "chains" here). To avoid this, limit how many passes we do. In -O1, just do one pass; that gets most duplicates. In -O2, do 10 passes; that gets almost all of it on this codebase. And in -O3 (or -Os/-Oz), do as many passes as necessary (i.e., the old behavior). This at least lets iteration builds (-O1) be nice and fast. This PR also refactors the hashing code used in that pass, moving it to nicer header files for clearer readability. Also some other minor cleanups in hashing code that helped debug this.
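The iteration cap described above, as a sketch (the helper is hypothetical, but the limits come straight from the text: one pass at -O1, ten at -O2, unlimited at -O3/-Os/-Oz):

    #include <cstdint>
    #include <limits>

    // Equal functions may only be revealed after their callees merge, so the
    // pass repeats; cap the repeats by optimization level.
    uint32_t maxPasses(int optimizeLevel, bool optimizeForSize) {
      if (optimizeLevel >= 3 || optimizeForSize) {
        return std::numeric_limits<uint32_t>::max(); // run to a fixed point
      }
      if (optimizeLevel == 2) {
        return 10; // catches almost all duplicates on the codebase above
      }
      return 1; // -O1: one pass gets most duplicates and keeps builds fast
    }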
* disambiguate hash usage (#1182) | Alon Zakai | 2017-09-13 | 1 | -1/+1
* Const hoisting (#1176) | Alon Zakai | 2017-09-12 | 1 | -0/+5
A pass that hoists repeating constants to a local, and replaces their uses with a get of that local. This can reduce binary size, but can also *increase* gzip size, so it's mostly for experimentation and not used by default.
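A toy sketch of the raw-size trade-off (hypothetical helper and byte costs, not the pass's actual heuristics): hoisting pays off once a constant's encoding is repeated often enough to outweigh the one-time local.set plus a local.get per use; the more uniform byte stream can still compress worse, which is the gzip caveat above.

    #include <cstdint>

    // Raw binary size only; says nothing about gzipped size.
    bool worthHoisting(uint32_t constEncodedBytes, uint32_t uses) {
      uint32_t repeated = constEncodedBytes * uses;  // re-emit the constant everywhere
      uint32_t hoisted = constEncodedBytes           // emit the constant once
                         + 2                         // one local.set (~2 bytes, assumed)
                         + 2 * uses;                 // one local.get per use (~2 bytes, assumed)
      return hoisted < repeated;
    }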
* add hash utility, and support for hashing and comparing expressions | Alon Zakai | 2016-05-28 | 1 | -0/+39