| Commit message | Author | Age | Files | Lines |
| |
With this, we can update DWARF debug line info properly as
we write a new binary.
To do that we track binary locations as we write. Each
instruction is mapped to the location it is written to. We
must also adjust them as we move code around because
of LEB optimization (we emit a function or a section
with a 5-byte LEB placeholder, the maximal size; later
we shrink it which is almost always possible).
writeDWARFSections() now takes a second param, the new
locations of instructions. It then maps debug line info from the
original offsets in the binary to the new offsets in the binary
being written.
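As an illustration of that mapping (toy types, not Binaryen's actual ones), the core idea is a map from each instruction's offset in the original code section to its offset in the newly written one, which debug line addresses are then translated through:
#include <cstdint>
#include <map>
#include <optional>

// Hypothetical sketch: old code-section offset -> new code-section offset,
// recorded as each instruction is written out.
using BinaryLocations = std::map<uint32_t, uint32_t>;

// Translate a debug-line address from the original binary to the new one.
// Returns nothing if the address does not land on a tracked instruction
// (such entries are skipped, as described above).
std::optional<uint32_t> translate(const BinaryLocations& locs, uint32_t oldAddr) {
  auto it = locs.find(oldAddr);
  if (it == locs.end()) {
    return std::nullopt;
  }
  return it->second;
}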
The core logic for updating the debug line section is in
wasm-debug.cpp. It basically tracks state machine logic
both to read the existing debug lines and to emit the new
ones. I couldn't find a way to reuse LLVM code for this, but
reading LLVM's code was very useful here.
A final tricky thing we need to do is to update the DWARF
section's internal size annotation. The LLVM YAML writing
code doesn't do that for us. Luckily it's pretty easy, in
fixEmittedSection we just update the first 4 bytes in place
to have the section size, after we've emitted it and know
the size.
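A sketch of that in-place fixup, assuming the section's 4-byte length field sits at its start and covers the bytes that follow it (a simplification of the real DWARF initial-length rules):
#include <cstdint>
#include <vector>

// After emitting the full section, patch the length field at its start,
// which must hold the size of the data that follows it.
void fixSectionLength(std::vector<uint8_t>& section) {
  if (section.size() < 4) {
    return; // nothing to patch
  }
  uint32_t length = uint32_t(section.size() - 4); // size after the length field
  section[0] = uint8_t(length & 0xff);
  section[1] = uint8_t((length >> 8) & 0xff);
  section[2] = uint8_t((length >> 16) & 0xff);
  section[3] = uint8_t((length >> 24) & 0xff);
}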
This ignores debug lines with a 0 in the line, col, or addr,
see WebAssembly/debugging#9 (comment)
This ignores debug line offsets into the middle of
instructions, which LLVM sometimes emits for some
reason, see WebAssembly/debugging#9 (comment)
Handling that would likely at least double our memory
usage, which is unfortunate - we are run in an LTO manner,
where the entire app's DWARF is present, and it may be
massive. I think we should see if such odd offsets are
a bug in LLVM, and if we can fix or prevent that.
This does not emit "special" opcodes for debug lines. Those
are purely an optimization, which I wanted to leave for
later. (Even without them we decrease the size quite a lot,
btw, as many lines have 0s in them...)
This adds some testing that shows we can load and save
fib2.c and fannkuch.cpp properly. The latter includes more
than one function and has nontrivial code.
To actually emit correct offsets a few minor fixes are
done here:
* Fix the code section location tracking during reading -
the correct offset we care about is the body of the code
section, not including the section declaration and size.
* Fix wasm-stack debug line emitting. We need to update
in BinaryInstWriter::visit(), that is, right before writing
bytes for the instruction. That differs from
BinaryenIRWriter::visit, which is a recursive function
that also calls the children - so the offset there would be
that of the first child. For some reason that is correct with
source maps (I don't understand why), but it's wrong for
DWARF...
* Print code section offsets in hex, to match other tools.
Remove DWARFUpdate pass, which was useful for testing
temporarily, but doesn't make sense now (it just updates without
writing a binary).
cc @yurydelendik
| |
This imports LLVM code for DWARF handling. That code is under the
Apache 2 license, like us. It's also the same code used to
emit DWARF in the common toolchain, so it seems like a safe choice.
This adds two passes: --dwarfdump, which runs the same code LLVM
runs for llvm-dwarfdump - this shows we can parse it ok, and will
be useful for debugging; and --dwarfupdate, which writes out the DWARF
sections (unchanged from what we read, so it just roundtrips - for
updating we need #2515).
This puts LLVM in thirdparty which is added here.
All the LLVM code is behind USE_LLVM_DWARF, which is on
by default, but off in JS for now, as it increases code size by 20%.
This current approach imports the LLVM files directly. This is not
how they are intended to be used, so it required a bunch of
local changes - more than I expected actually, for the platform-specific
stuff. For now this seems to work, so it may be good enough, but
in the long term we may want to switch to linking against libllvm.
A downside to doing that is that binaryen users would need to
have an LLVM build, and even in the waterfall builds we'd have a
problem - while we ship LLVM there anyhow, we constantly update
it, which means that binaryen would need to be on latest llvm all
the time too (which otherwise, given DWARF is quite stable, we
might not need to constantly update).
An even larger issue is that as I did this work I learned about how
DWARF works in LLVM, and while the reading code is easy to
reuse, the writing code is trickier. The main code path is heavily
integrated with the MC layer, which we don't have - we might want
to create a "fake MC layer" for that, but it sounds hard. Instead,
there is the YAML path, which is used mostly for testing, and which
can convert DWARF to and from both YAML and binary. Using
the non-YAML parts there, we can convert binary DWARF to
the YAML layer's nice Info data, then convert that back to binary. This
works; however, it is not the path LLVM uses normally, and it
supports only some basic DWARF sections - I had to add ranges
support, in fact. So if we need more complex things, we may end
up needing to use the MC layer approach, or consider some other
DWARF library. However, hopefully that should not affect the core
binaryen code which just calls a library for DWARF stuff.
Helps #2400
| |
Currently `BINARYEN_PASS_DEBUG=3` prints `.wasm` files, but they are
actually text wast files. This makes `BINARYEN_PASS_DEBUG=3` print both
wasm and wast files, where the wasm file contains the binary and the wast file the text.
| |
This pass writes and reads the module. This shows the effects
of converting to and back from the binary format, and will be
useful in testing DWARF debug support (where we'll need to see
that writing and reading a module preserves debug info properly).
| |
clang/llvm introduce __original_main as a workaround for
the fact that main may have different signatures. A downside
to that is that users get it in stack traces, which is confusing.
In -O2 and above we normally inline __original_main anyhow,
but as this is for debugging, non-optimized builds matter too,
so add a pass for this.
The implementation is trivial, just call doInlining. However, we
must check some corner cases first.
Bonus minor fixes to FindAllPointers, which unnecessarily
created an object to get the class Id (which is not valid
for all classes), and which didn't take the input by
reference properly, which meant we couldn't get the
pointer to the function body's toplevel.
| |
This pass strips DWARF debug sections, but not other debug
sections. This is useful when emitting source maps, as we do
need the SourceMapURL section, but the DWARF sections are
no longer necessary (and we've seen a testcase where they
are massively large, so big the wasm can't even be loaded in
a browser...).
Also contains a trivial one-line fix in --extract-function which
was necessary to create the testcase here: that pass extracts
a function from a wasm file (like llvm-extract) but it didn't check
if an export already existed for the function.
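For illustration, DWARF lives in wasm custom sections whose names start with ".debug_", so the filtering can be sketched like this (toy section type, not Binaryen's):
#include <algorithm>
#include <string>
#include <vector>

struct CustomSection {
  std::string name;
  std::vector<char> data;
};

// Keep non-DWARF custom sections (like sourceMappingURL), drop ".debug_*" ones.
bool isDWARFSection(const std::string& name) {
  return name.rfind(".debug_", 0) == 0;
}

void stripDWARF(std::vector<CustomSection>& sections) {
  sections.erase(
    std::remove_if(sections.begin(), sections.end(),
                   [](const CustomSection& s) { return isDWARFSection(s.name); }),
    sections.end());
}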
| |
Adds the AssemblyScript-specific passes post-assemblyscript
and post-assemblyscript-finalize, eliminating redundant ARC-style
retain/release patterns conservatively emitted by the compiler.
| |
These passes are meant to be run after Asyncify has been run; they modify its
output. We can then assume that we will always unwind if we reach an import, or
that we will never unwind, etc.
This is meant to help with lazy code loading, that is, the ability for an
initially-downloaded wasm to not contain all the code, and if code not present
there is called, we download all the rest and continue with that. That could
work something like this:
* The wasm is created. It contains calls to a special import for lazy code
loading.
* Asyncify is run on it.
* The initially downloaded wasm is created by running
--mod-asyncify-always-and-only-unwind: if the special import for lazy code
loading is called, we will definitely unwind, and we won't rewind in this binary.
* The lazily downloaded wasm is created by running --mod-asyncify-never-unwind:
we will rewind into this binary, but no longer need support for unwinding.
(Optionally, there could also be a third wasm, which has not had Asyncify run
on it, and which we'd swap to for max speed.)
These --mod-asyncify passes allow the optimizer to do a lot of work, especially
for the initially downloaded wasm if we have lots of calls to the lazy code
loading import. In that case the optimizer will see that those calls unwind,
which means the code after them is not reached, potentially making lots of code
dead and removable.
| |
This optimizes stuff like
(global.set $x (i32.const 123))
(global.get $x)
into
(global.set $x (i32.const 123))
(i32.const 123)
This doesn't help much with LLVM output, as it's rare to use globals (except for the stack pointer, and that's already well optimized), but it may help on general wasm. It can also help with Asyncify, which does use globals extensively.
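A sketch of the idea on a toy straight-line IR (not Binaryen's actual classes): remember the constant most recently written to each global and reuse it at reads, forgetting everything at calls that might modify globals:
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Toy straight-line IR: each op either sets a global to a constant,
// reads a global, or is an unknown call that may change any global.
struct Op {
  enum { SetConst, Get, Call } kind;
  std::string global;
  int32_t value = 0;      // for SetConst
  bool replaced = false;  // for Get: was it replaced by a constant?
};

void propagateGlobalConstants(std::vector<Op>& ops) {
  std::map<std::string, int32_t> known; // global -> constant currently in it
  for (auto& op : ops) {
    switch (op.kind) {
      case Op::SetConst:
        known[op.global] = op.value;
        break;
      case Op::Get: {
        auto it = known.find(op.global);
        if (it != known.end()) {
          op.value = it->second; // use the constant instead of the get
          op.replaced = true;
        }
        break;
      }
      case Op::Call:
        known.clear(); // a call may write any global; forget what we knew
        break;
    }
  }
}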
| |
This is both an optimization and a workaround for the problem that emscripten-core/emscripten#7641 uncovered and had to be reverted because of.
What's going on there is that wasm-emscripten-finalize turns emscripten_longjmp_jmpbuf into emscripten_longjmp (for some LLVM internal reason - there's a long comment in the source that I didn't fully follow). There are two such imports already, one for each name, and before that PR, we ended up with just one. After that PR, we end up with two. And with two, the minification of import names gets confused - we have two imports with the same name, and the code there ends up ignoring one of them.
I'm not sure why that PR changed things - I guess the wasm-emscripten-finalize code looks at the name, and that PR changed what name appears? @sbc100 maybe #2285 is related?
Anyhow, it's not trivial to make import minification code support two identical imports, but I don't think we should - we should avoid having such duplication anyhow. And we should add an assert that they don't exist (I'll open a PR for that later when it's possible).
This fixes the duplication by adding a useful pass to remove duplicate imports (just functions, for now). Pretty simple, but we didn't do it yet. Even if there is a wasm-emscripten-finalize bug we need to fix with those duplicate imports, I think this pass is still a good thing to add.
I confirmed that this fixes the issue caused by that PR.
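A sketch of the deduplication, keyed on the import's module, base name, and signature (toy types; the real pass works on Binaryen's IR):
#include <map>
#include <string>
#include <tuple>
#include <vector>

struct FuncImport {
  std::string internalName; // name used by call sites
  std::string module;       // e.g. "env"
  std::string base;         // the imported name
  std::string signature;    // e.g. "vii"
};

// Returns a map from removed internal names to the surviving duplicate,
// so call sites can be updated to use the kept import.
std::map<std::string, std::string>
deduplicateImports(std::vector<FuncImport>& imports) {
  std::map<std::tuple<std::string, std::string, std::string>, std::string> seen;
  std::map<std::string, std::string> replacements;
  std::vector<FuncImport> kept;
  for (auto& imp : imports) {
    auto key = std::make_tuple(imp.module, imp.base, imp.signature);
    auto it = seen.find(key);
    if (it == seen.end()) {
      seen[key] = imp.internalName;
      kept.push_back(imp);
    } else {
      replacements[imp.internalName] = it->second;
    }
  }
  imports = std::move(kept);
  return replacements;
}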
| |
(#2242)
Main change here is in pass.h, everything else is changes to work with the new API.
The add("name") form remains as before, while the weird variadic add(..) which constructed the pass now just takes a std::unique_ptr to a pass. This also makes the memory management internally fully automatic. And it makes it trivial to parallelize WalkerPass::run on parallel passes.
As a benefit, this allows removing a lot of code since in many cases there is no need to create a new pass runner, and running a pass can be just a single line.
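A minimal sketch of the shape of such an API (the names here stand in for Binaryen's actual ones): the runner owns its passes via std::unique_ptr, so memory management is automatic and running them is trivial:
#include <memory>
#include <string>
#include <vector>

struct Pass {
  virtual ~Pass() = default;
  virtual void run() = 0;
};

struct Runner {
  std::vector<std::unique_ptr<Pass>> passes;

  // Construct-by-name form stays as before; the registry lookup is elided here.
  void add(const std::string& name) { (void)name; /* look up and construct */ }

  // New form: the caller constructs the pass and hands over ownership.
  void add(std::unique_ptr<Pass> pass) { passes.push_back(std::move(pass)); }

  void run() {
    for (auto& pass : passes) {
      pass->run();
    }
  }
};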
| |
* Clarify the difference between old and new Asyncify.
* Remove the old --bysyncify pass option.
| |
After some discussion this seems like a less confusing name: what the pass does is "asyncify" code, after all.
The one downside is the name overlaps with the old emscripten "Asyncify" utility, which we'll need to clarify in the docs there.
This keeps the old --bysyncify flag around for now, which is helpful for avoiding temporary breakage on CI as we move the emscripten side as well.
| |
Fix and test mutable globals support, replace string literals with
constants, and add a pass to emit the target features section.
| |
This adds a new pass, Bysyncify, which transforms code to allow unwinding and rewinding the call stack and local state. This allows things like coroutines, turning synchronous code asynchronous, etc.
The new pass file itself has a large comment on top with docs.
So far the tests here seem to show this works, but this hasn't been tested heavily yet. My next step is to hook this up to emscripten as a replacement for asyncify/emterpreter, see emscripten-core/emscripten#8561
Note that this is completely usable by itself, so it could be useful for any language that needs coroutines etc., and not just ones using LLVM and/or emscripten. See docs on the ABI in the pass source.
| |
* work
* fix
* fix
* format
| |
This is useful for front-ends which wish to selectively enable or
disable coloring.
Also expose these APIs from the C API.
| |
In JS a reinterpret is especially expensive, as we implement it as a write to a temp buffer and a read using another view. This finds places where we load a value from memory, then reinterpret it later - in that case, we can load it using another view, at the cost of another load and another local.
This is helpful on things like Box2D, where there are many reinterprets due to the main 2D vector class being a union over two floats/ints, and LLVM likes to do a single i64 load of them.
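For illustration, here is the difference in C++ terms, using memcpy as the stand-in for loads and reinterprets: rather than loading a float and bit-casting it, the pass arranges to load the same bytes directly as an integer:
#include <cstddef>
#include <cstdint>
#include <cstring>

// Reinterpret after the fact: load a float, then bit-cast it to an integer.
uint32_t loadThenReinterpret(const unsigned char* memory, size_t addr) {
  float f;
  std::memcpy(&f, memory + addr, sizeof(f)); // the original load
  uint32_t bits;
  std::memcpy(&bits, &f, sizeof(bits));      // the reinterpret (costly in JS)
  return bits;
}

// What the pass prefers: load the same bytes directly as an integer,
// alongside the float load, avoiding the reinterpret entirely.
uint32_t loadAsInteger(const unsigned char* memory, size_t addr) {
  uint32_t bits;
  std::memcpy(&bits, memory + addr, sizeof(bits));
  return bits;
}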
| |
Helps to avoid trampling each other when binaryen is called multiple times from emcc, for example.
| |
If a global is marked mutable but not assigned to, make it immutable.
If an immutable global is a copy of another, use the original, so we can remove the duplicates.
Fixes #2011
| |
This replaces the wasm2js code that lowered them to pessimistic (1-byte aligned) loads and stores. The new pass will do the optimal thing, keeping 2-byte alignment where possible.
This is also nicer as a standalone pass, which has the simple property that after it runs all loads and stores are aligned, instead of some code scattered inside wasm2js.
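A sketch of the lowering for a 4-byte load whose address is known to be 2-byte aligned: two aligned 2-byte loads plus a shift, instead of four 1-byte loads:
#include <cstddef>
#include <cstdint>

// Emulate an unaligned 4-byte little-endian load using two aligned
// 2-byte loads, which is the kind of code the pass can emit when it
// knows the pointer is at least 2-byte aligned.
uint32_t load32From2ByteAligned(const uint16_t* memory, size_t halfwordIndex) {
  uint32_t low = memory[halfwordIndex];       // aligned 16-bit load
  uint32_t high = memory[halfwordIndex + 1];  // aligned 16-bit load
  return low | (high << 16);
}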
| |
Applies the changes in #2065, and temporarily disables the hook since it's too slow to run on a change this large. We should re-enable it in a later commit.
| |
Mass change to apply clang-format to everything. We are applying this in a PR by me so the (git) blame is all mine ;) but @aheejin did all the work to get clang-format set up and all the manual work to tidy up some things to make the output nicer in #2048
| |
In the absence of the target features section or command line flags. When there are command line flags, it is an error if they do not exactly match the target features section, except if --detect-features has been provided.
Also adds a --print-features pass to print the command line flags for all enabled options and uses it to make the feature tests more rigorous.
| |
This allows us to emit a (potentially modified) target features
section and conditionally emit other sections such as the DataCount
section based on the presence of features.
| |
It was previously part of writing a binary, but changing the number of
segments at such a late stage would not work in the presence of bulk
memory's datacount section. Also updates the memory packing pass
to respect the web's limits on the number of data segments.
| |
This adds an ssa-nomerge pass, which, like ssa, creates new local indexes for each set, but does not alter indexes that have merges (in practice, adding indexes to merges can lead to more copies in the end).
This also stops adding a new local index for a set that is already in "ssa form", that is, has only one set (aside from the zero initialization which wasm mandates, but for an "ssa form" index, that must not be used).
This then enables ssa-nomerge in -O3 and -Os. This doesn't help much on well-optimized code like from the wasm backend (but it does sometimes - 0.5% code size improvement on Box2D), but on AssemblyScript for example it can remove a copy in the n-body benchmark as can be seen in the test updates here.
| |
A propagated constant can be helpful in the various patterns in optimize instructions.
Testcase shows an example of this in action - we can optimize out a load offset for a constant, but if we propagated it afterwards, we would miss that.
In general these two passes can help each other, so maybe they should be combined and run for multiple iterations, but that's what --converge is for. Meanwhile this change improves what seems to be the more common case - a guess based on what I noticed in practice, and when I run the fuzzer I see only this type of case.
| |
And run it in wasm-emscripten-finalize. This will prevent the emscripten output from changing when the target features section lands in LLVM.
| |
See #1919 - we did not do this consistently before.
This adds a lowMemoryUnused option to PassOptions. It can be passed on the commandline with --low-memory-unused. If enabled, we run the new optimize-added-constants pass, which does the real work here, replacing older code in post-emscripten.
Aside from running at the proper time (unlike the old pass, see #1919), this also has a -propagate mode, which can do stuff like this:
y = x + 10
[..]
load(y)
[..]
load(y)
=>
y = x + 10
[..]
load(x, offset=10)
[..]
load(x, offset=10)
That is, it can propagate such offsets to the loads/stores. This pattern is common in big interpreter loops, where the pointers are offsets into a big struct of state.
The pass does this propagation by using a new feature of LocalGraph, which can verify which locals are in SSA mode. Binaryen IR is not SSA (intentionally, since it's a later IR), but if a local only has a single set for all gets, that means that local is in such a state, and can be optimized. The tricky thing is that all locals are initialized to zero, so there are at minimum two sets. But if we verify that the real set dominates all the gets, then the zero initialization cannot reach them, and we are safe.
This PR also makes safe-heap aware of lowMemoryUnused. If it is set, we check not just for an access of 0, but for the whole range 0-1023.
This makes zlib 5% faster, with either the wasm backend or asm2wasm. It also makes it 0.5% smaller. Also helps sqlite (1.5% faster) and lua (1% faster).
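A sketch of the core fold on a toy load representation (the real pass works on Binaryen IR and checks SSA-ness via LocalGraph, as described above): when the address is a local plus a constant and low memory is known unused, the constant can move into the load's offset immediate:
#include <cstdint>

// Toy model of a load whose address is `local + addedConstant`.
struct Load {
  int addressLocal;       // local index holding the base pointer
  uint32_t addedConstant; // explicit add in the address computation
  uint32_t offset;        // wasm load offset immediate
};

// Fold the added constant into the offset immediate. This is only valid if
// low memory is known to be unused (--low-memory-unused), since otherwise
// the behavior for small or wrapped addresses could differ.
void foldConstantIntoOffset(Load& load) {
  load.offset += load.addedConstant;
  load.addedConstant = 0;
}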
| |
* optimize normally with debug info - some of it may be removed, but that's the price of higher optimization levels, and by optimizing normally in profiling and -g2 etc. builds they are more comparable to normal ones, yielding better data
* copy debug locations automatically in replaceCurrent in wasm-traversal, so optimization passes at least by default will preserve debuggability
| |
* Finds functions whose return value is always dropped, and removes the return.
* Run multiple iterations of the pass, as one can enable others.
* Do not run DeadArgumentElimination at all if debug info is present (with these improvements, it became much more likely to destroy debug info).
Saves 2.5% on hello world, because of some simple libc calls.
| |
WebAssembly/tool-conventions#93 has a summary of emscripten's current thinking on this. For Binaryen, we don't want to do anything to the producers section by default, but do want it to be possible to optionally remove it. To achieve that, this PR
* creates a --strip-producers pass that removes that section.
* creates a --strip-debug pass that removes debug info, same as the old --strip, which is still around but deprecated.
A followup in emscripten will use this pass by default.
| |
Automated renaming according to
https://github.com/WebAssembly/spec/issues/884#issuecomment-426433329.
| |
Even when we don't want to fully legalize code for JS, we should still legalize things that only JS cares about. In particular, dynCall_* methods are used from JS to call into the wasm table, and if they exist they are only for JS, so we should only legalize them.
The use case motivating this is that in dynamic linking you may want to disable legalization, so that wasm=>wasm module calls are fast even with i64s, but you do still need dynCalls to be legalized even in that case, otherwise an invoke with an i64 parameter would fail.
| |
When emscripten knows that the runtime will not be exited, it can tell codegen to not emit atexit() calls (since those callbacks will never be run). This saves both code size and startup time. In asm2wasm the JSBackend does it directly. For the wasm backend, this pass does the same on the output wasm.
| |
We now emit more sets and tees of if-elses from simplify-locals, and
coalesce-locals is necessary to remove them if they are ineffectual,
that is, if no get will read them.
| |
This is sort of like --strip on a native binary. The more specific use case for us is e.g. you link with a library that has -g in its CFLAGS, but you don't want debug info in your final executable (I hit this with poppler now). We can make emcc pass this to binaryen if emcc is not building an output with intended debug info.
| |
sometimes that is not desirable.
| |
Rely on the dedicated pass for that. It's not worth the extra complexity to try, as we can't easily handle all the cases anyhow.
Add another run of the dedicated name-removing pass in the default passes.
| |
This new pass minifies import and export names, for example, this may minify
(import "env" "longname" (func $internal))
to
(import "env" "a" (func $internal))
By updating the JS that provides those imports/calls those exports, we can use the minified names properly. This can save a useful amount of space in the wasm and JS, see kripken/emscripten#7414
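A sketch of the kind of minified-name generator such a pass needs (illustrative, not the pass's actual scheme): hand out short names like "a", "b", ..., "aa", and remember the mapping so the JS side can be updated to match:
#include <map>
#include <string>

struct NameMinifier {
  std::map<std::string, std::string> oldToNew; // for updating the JS side
  unsigned next = 0;

  std::string minify(const std::string& oldName) {
    auto it = oldToNew.find(oldName);
    if (it != oldToNew.end()) {
      return it->second; // same original name always gets the same short name
    }
    // Encode `next` in base 26 as a short lowercase identifier.
    unsigned n = next++;
    std::string name;
    do {
      name += char('a' + (n % 26));
      n /= 26;
    } while (n > 0);
    oldToNew[oldName] = name;
    return name;
  }
};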
| |
Fixes #1649
This moves us to a single object for functions, which can be imported or not, and likewise for globals (as a result, GetGlobals do not need to check if the global is imported or not, etc.). All imported things now inherit from Importable, which has the module and base of the import, and if they are set then it is an import.
For convenient iteration, there are a few helpers like
ModuleUtils::iterDefinedGlobals(wasm, [&](Global* global) {
  .. use global ..
});
as often iteration only cares about imported or defined (non-imported) things.
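A sketch of that shape (simplified; the real classes live in Binaryen's IR headers): one object type, where a nonempty module/base means it is an import:
#include <string>

// Anything that can be imported: functions, globals, etc.
struct Importable {
  std::string name;   // internal name
  std::string module; // import module, e.g. "env" (empty if not an import)
  std::string base;   // import base name (empty if not an import)

  bool imported() const { return !module.empty(); }
};

struct Global : Importable {
  bool mutable_ = false;
  // defined globals also carry an init expression, elided here
};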
| |
This adds a pass to remove unnecessary call arguments in an LTO-like manner, that is:
* If a parameter is not actually used in a function, we don't need to send anything, and can remove it from the function's declaration. Concretely,
(func $a (param $x i32)
  ..no uses of $x..
)
(func $b
  (call $a (..))
)
=>
(func $a
  ..no uses of $x..
)
(func $b
  (call $a)
)
And
* If a parameter is only ever sent the same constant value, we can just set that constant value in the function (which then means that the values sent from the outside are no longer used, as in the previous point). Concretely,
(func $a (param $x i32)
  ..may use $x..
)
(func $b
  (call $a (i32.const 1))
  (call $a (i32.const 1))
)
=>
(func $a
  (local $x i32)
  (set_local $x (i32.const 1))
  ..may use $x..
)
(func $b
  (call $a)
  (call $a)
)
How much this helps depends on the codebase obviously, but sometimes it is pretty useful. For example, it shrinks 0.72% on Unity and 0.37% on Mono. Note that those numbers include not just the optimization itself, but the other optimizations it then enables - in particular the second point from earlier leads to inlining a constant value, which often allows constant propagation, and also removing parameters may enable more duplicate function elimination, etc. - which explains how this can shrink Unity by almost 1%.
Implementation is pretty straightforward, but there is some work to make the heavy part of the pass parallel, and a bunch of corner cases to avoid (can't change a function that is exported or in the table, etc.). Like the Inlining pass, there is both a standard and an "optimizing" version of this pass - the latter also optimizes the functions it changes, as like Inlining, it's useful to not need to re-run all function optimizations on the whole module.
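A sketch of the analysis behind the second point, over a toy list of call-site arguments (illustrative types): a parameter qualifies only if every call passes the same known constant:
#include <cstdint>
#include <optional>
#include <vector>

// One call site's argument for a particular parameter index:
// either a known constant, or "something else" (nullopt).
using Argument = std::optional<int32_t>;

// If every call passes the same constant for this parameter, return it;
// otherwise return nothing and the parameter must be kept as-is.
std::optional<int32_t> constantForParam(const std::vector<Argument>& argsAtCalls) {
  std::optional<int32_t> common;
  for (const auto& arg : argsAtCalls) {
    if (!arg) {
      return std::nullopt; // a non-constant argument; give up
    }
    if (!common) {
      common = arg;
    } else if (*common != *arg) {
      return std::nullopt; // two different constants; give up
    }
  }
  return common; // note: nullopt if there are no calls at all
}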
| |
This adds an licm pass. Not that important for LLVM-originating code obviously, but for AssemblyScript and other non-LLVM compilers this might help a lot. Also when wasm has GC a bunch more non-LLVM languages may arrive that can benefit.
The pass is mostly straightforward. I considered using the DataFlow IR since it's in SSA form, or the CFG IR, but in the end it's actually pretty convenient to use the main IR as it is - with explicit loops already present - plus LocalGraph which connects each get to the sets influencing it.
Passed a bunch of fuzzing, and also the emscripten test suite at -O1 with licm added to the default passes (but I don't think it would make sense to run this by default, as LLVM doesn't need it).
We limit code moved by this pass as follows: An increased code size on fuzz testcases (and, more rarely, on real inputs) can happen due to stuff like this:
(loop
  (set_local $x (i32.const 1))
  ..
)
=>
(set_local $x (i32.const 1))
(loop
  ..
)
For a const or a get_local, such an assignment to a local is both very cheap (a copy to another local may be optimized out later), and moving it out may prevent other optimizations (since we have no pass that tries to move code back into a loop - edit: well, not by default; precompute-propagate etc. would do it, but those are only run at high opt levels). So I made the pass not move such trivial code (sets/tees of consts or gets). However, the risk remains if code is moved out that is later reduced to a constant, so something like -Os --flatten --licm -Os may make sense.
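A sketch of the basic hoisting test over a toy expression summary (the real pass uses LocalGraph and effect analysis): an expression can move out only if it has no side effects, is not one of the trivial cases above, and reads no local that is set inside the loop:
#include <set>

// Toy summary of a candidate expression inside a loop.
struct Candidate {
  std::set<int> localsRead; // local indexes the expression reads
  bool hasSideEffects;      // calls, stores, possible traps, ...
  bool isTrivial;           // a bare const / get_local set, as discussed above
};

bool canHoist(const Candidate& expr, const std::set<int>& localsSetInLoop) {
  if (expr.hasSideEffects || expr.isTrivial) {
    return false; // unsafe, or not worth moving (see the size concern above)
  }
  for (int local : expr.localsRead) {
    if (localsSetInLoop.count(local)) {
      return false; // depends on a value that changes during the loop
    }
  }
  return true;
}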
| |
Background: google/souper#323
This adds a --souperify pass, which emits Souper IR in text format. That can then be read by Souper which can emit superoptimization rules. We hope that eventually we can integrate those rules into Binaryen.
How this works is we emit an internal "DataFlow IR", which is an SSA-based IR, and then write that out into Souper text.
This also adds a --dfo pass, which stands for data-flow optimizations. A DataFlow IR is generated, like in souperify, and then some trivial optimizations are performed using it. There are very few things it can do that our other optimizations can't already, but this is also good testing for the DataFlow IR, plus it is good preparation for using Souper's superoptimization output (which would also construct DataFlow IR, like here, but then do some matching on the Souper rules).