Automated renaming according to
https://github.com/WebAssembly/spec/issues/884#issuecomment-426433329.
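For reference, this is the instruction-renaming change; a minimal before/after sketch of a few of the renamed instructions (local and global accesses) as they appear in the text format:
```
;; before
(set_local $x (i32.const 1))
(get_local $x)
(get_global $g)

;; after
(local.set $x (i32.const 1))
(local.get $x)
(global.get $g)
```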
|
Remove the existing hack, and optimize them just as we do for ifs and blocks. This now handles a few more cases than before.
|
We now emit more sets and tees of if-elses from simplify-locals, and
coalesce-locals is necessary to remove them if they are ineffectual,
that is, if no get will read them.
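A rough sketch of the pattern in question (hypothetical code; the names are made up): simplify-locals can merge per-arm sets into one set of the whole if-else, and if no get ever reads the local, coalesce-locals then removes that set.
```
;; before simplify-locals: each arm sets $x
(if (get_local $c)
  (set_local $x (i32.const 1))
  (set_local $x (i32.const 2)))

;; after simplify-locals: a single set of the if-else
(set_local $x
  (if (result i32) (get_local $c)
    (i32.const 1)
    (i32.const 2)))

;; if no get_local $x ever reads this, the set is ineffectual and
;; coalesce-locals can remove it
```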
|
It's simpler if we always import these functions from the embedder rather than synthesizing them in various places.
This is part of a 4-part change:
LLVM: https://reviews.llvm.org/D53240
fastcomp: https://github.com/kripken/emscripten-fastcomp/pull/237
emscripten: https://github.com/kripken/emscripten/pull/7358
binaryen: https://github.com/kripken/emscripten/pull/7358
Fixes: https://github.com/kripken/emscripten/issues/7273
Fixes: https://github.com/kripken/emscripten/issues/7304
|
That is the correct order in the text format; wabt errors out otherwise.
See AssemblyScript/assemblyscript#310
|
Fixes #1649
This moves us to a single object for functions, which can be imported or not, and likewise for globals (as a result, GetGlobals do not need to check if the global is imported or not, etc.). All imported things now inherit from Importable, which has the module and base of the import; if those are set, the thing is an import.
For convenient iteration, there are a few helpers like
ModuleUtils::iterDefinedGlobals(wasm, [&](Global* global) {
  .. use global ..
});
as often iteration only cares about imported or defined (non-imported) things.
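To illustrate the imported/defined distinction that such iteration cares about, here is a hypothetical module (a sketch; the names are made up) - iterDefinedGlobals would visit only $def, since $imp has a module/base set and is therefore an import:
```
(module
  (import "env" "g" (global $imp i32))   ;; imported: module/base are set
  (global $def (mut i32) (i32.const 0))  ;; defined (non-imported)
)
```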
|
This adds a pass to remove unnecessary call arguments in an LTO-like manner, that is:
* If a parameter is not actually used in a function, we don't need to send anything, and can remove it from the function's declaration. Concretely,
(func $a (param $x i32)
  ..no uses of $x..
)
(func $b
  (call $a (..))
)
=>
(func $a
  ..no uses of $x..
)
(func $b
  (call $a)
)
And
* If a parameter is only ever sent the same constant value, we can just set that constant value in the function (which then means that the values sent from the outside are no longer used, as in the previous point). Concretely,
(func $a (param $x i32)
  ..may use $x..
)
(func $b
  (call $a (i32.const 1))
  (call $a (i32.const 1))
)
=>
(func $a
  (local $x i32)
  (set_local $x (i32.const 1))
  ..may use $x..
)
(func $b
  (call $a)
  (call $a)
)
How much this helps obviously depends on the codebase, but sometimes it is pretty useful. For example, it shrinks Unity by 0.72% and Mono by 0.37%. Note that those numbers include not just the optimization itself, but the other optimizations it then enables - in particular, the second point from earlier leads to inlining a constant value, which often allows constant propagation, and removing parameters may also enable more duplicate function elimination, etc. - which explains how this can shrink Unity by almost 1%.
Implementation is pretty straightforward, but there is some work to make the heavy part of the pass parallel, and a bunch of corner cases to avoid (we can't change a function that is exported or in the table, etc.). Like the Inlining pass, there is both a standard and an "optimizing" version of this pass - the latter also optimizes the functions it changes, since, as with Inlining, it is useful not to need to re-run all function optimizations on the whole module.
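As a hypothetical illustration of the follow-on wins mentioned above (made-up code, not from the PR): once the constant has been moved into the callee, later passes can propagate and fold it.
```
(func $a (result i32)
  (local $x i32)
  (set_local $x (i32.const 1))
  ;; with $x now known to be constant, this can later be folded to (i32.const 11)
  (i32.add (get_local $x) (i32.const 10))
)
```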
|
This adds a new IR, "Stack IR". This represents wasm at a very low level, as a simple stream of instructions, basically the same as wasm's binary format. This is unlike Binaryen IR which is structured and in a tree format.
This gives some small wins on binary sizes, less than 1% in most cases, usually 0.25-0.50% or so. That's not much by itself, but looking forward this prepares us for multi-value, which we really need an IR like this to be able to optimize well. Also, it's possible there is more we can do already - currently just a few stack IR optimizations are implemented:
* DCE
* local2stack - check if a set_local/get_local pair can be removed; this keeps the set's value on the stack, where, if the stars align, it can be consumed directly instead of through the get (see the sketch after this list).
* Block removal - remove any blocks with no branches to them, as their contents are valid in the wasm binary format without the block.
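A minimal sketch of the local2stack case (a hypothetical instruction sequence, shown linearly as stack IR sees it; $use is a made-up function):
```
;; before: the value takes a round trip through $x
i32.const 1
set_local $x
get_local $x
call $use

;; after: if nothing else needs $x, the set/get pair goes away and the
;; value simply stays on the stack for the call
i32.const 1
call $use
```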
Implementation-wise, the IR is defined in wasm-stack.h. A new StackInst is defined, representing a single instruction. Most are simple reflections of Binaryen IR (an add, a load, etc.), and just pointers to them. Control flow constructs are expanded into multiple instructions, like a block turns into a block begin and end, and we may also emit extra unreachables to handle the fact Binaryen IR has unreachable blocks/ifs/loops but wasm does not. Overall, all the Binaryen IR differences with wasm vanish on the way to stack IR.
Where this IR lives: Each Function now has a unique_ptr to stack IR, that is, a function may have stack IR alongside the main IR. If the stack IR is present, we write it out during binary writing; if not, we do the same binaryen IR => wasm binary process as before (this PR should not affect speed there). This design lets us use normal Passes on stack IR, in particular this PR defines 3 passes:
* Generate stack IR
* Optimize stack IR (might be worth splitting out into separate passes eventually)
* Print stack IR for debugging purposes
Having these as normal passes is convenient as then they can run in parallel across functions and use all the other conveniences of our current Pass system. However, a downside of keeping the second IR as an option on Functions, and using normal Passes to operate on it, is that we may get out of sync: if you generate stack IR, then modify Binaryen IR, then the stack IR may no longer be valid (for example, maybe you removed locals or modified instructions in place, etc.). To avoid that, Passes now declare whether they modify Binaryen IR; if they do, we throw away the stack IR.
Miscellaneous notes:
* Just writing Stack IR, then writing to binary - with no optimizations - is 20% slower than going directly to binary, which is one reason why we still support direct writing. This does lead to some "fun" C++ template code to make that convenient: there is a single StackWriter class, templated over the "mode", which is either Binaryen2Binary (direct writing), Binaryen2Stack, or Stack2Binary. This avoids a lot of boilerplate as the 3 modes share a lot of code in overlapping ways.
* Stack IR does not support source maps / debug info. We just don't use that IR if debug info is present.
* A tiny text format comment (if emitting non-minified text), (; has Stack IR ;), indicates that stack IR is present. This may help with debugging, just in case people forget. There is also a pass to print out the stack IR for debug purposes, as mentioned above.
* The sieve binaryen.js test was actually not validating all along - these new opts broke it in a more noticeable manner. Fixed.
* Added extra checks in pass-debug mode, to verify that if stack IR should have been thrown out, it was. This should help avoid any confusion with the IR being invalid.
* Added a comment about the possible future of stack IR as the main IR, depending on optimization results, following some discussion earlier today.
|
E.g.
```
(block
..
(loop $l
..
(br_if $l (..))
.. code that does not branch to the loop top
)
.. that code could be moved here ..
)
```
Moving the code out of the loop may help the loop body become a singleton expression, and is more readable anyhow.
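For comparison, a sketch of the result after moving that code out (same hypothetical shape as above):
```
(block
  ..
  (loop $l
    ..
    (br_if $l (..))
  )
  .. the code that did not branch to the loop top now sits here ..
)
```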
|
opts (#1523)
|
Propagate constants through a tee_local. Found by Souper. Details in the patch comments - basically, we didn't differentiate between precomputing a value and precomputing an expression.
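A sketch of the kind of case this enables (hypothetical code, not the exact pattern from the patch): the constant flows through the tee, so a later read of the local can be precomputed.
```
(set_local $x (tee_local $y (i32.const 7)))
..
(get_local $x) ;; can now be replaced with (i32.const 7)
```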
|
* add tempRet0 helpers when necessary in legalize-js-interface
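For context, roughly what a legalized i64-returning export looks like and where the tempRet0 helper comes in (a hand-written sketch with made-up names, not the pass's exact output): the legal stub returns the low 32 bits as an i32 and hands the high 32 bits to setTempRet0.
```
(func $legalstub$foo (result i32)
  (local $r i64)
  (set_local $r (call $foo))
  ;; high 32 bits go out through the tempRet0 helper
  (call $setTempRet0
    (i32.wrap/i64 (i64.shr_u (get_local $r) (i64.const 32))))
  ;; low 32 bits are the legal i32 return value
  (i32.wrap/i64 (get_local $r))
)
```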
|
Simplify inlining logic: don't special-case the first iteration or change behavior based on whether we are optimizing or not. Instead, use one simpler set of heuristics for both inlining and inlining-optimizing. We only run inlining-optimizing by default anyhow; there is no point trying to make inlining without optimizations useful by itself, as it's not a realistic use case. (inlining by itself is still useful for debugging, or if you will run optimizations on everything later anyhow, in which case inlining-optimizing would just add some redundancy.) The simpler heuristics after this let us do a somewhat better job, as we are no longer paranoid about inlining in multiple iterations.
Also raise limit on inlining things that are obviously worth it from size 1 to size 2: things of size 2 will never lead to an increase in code size after we optimize (it takes at least 3 nodes to generate something that reads two locals and reverses their order, which would require a temp local in the outside scope etc.).
Also fix infinite recursion of inlining an infinitely recursive set of calls.
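As a hypothetical example of the size-2 case (names made up): a function whose body is just a load of its parameter can always be inlined without growing code, since the call it replaces is at least as large.
```
(func $read (param $p i32) (result i32)
  (i32.load (get_local $p)) ;; two nodes: the load and the get
)
;; (call $read (get_local $q)) simply becomes (i32.load (get_local $q))
```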
|
* inline 1-element functions when optimizing, as they will be smaller than the call we are replacing
* add an extra useful vacuum after inlining+optimizing
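A trivial sketch of the 1-element case (hypothetical function): the body is a single node, so inlining it is never larger than the call it replaces.
```
(func $answer (result i32)
  (i32.const 42)
)
;; any (call $answer) just becomes (i32.const 42)
```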
|
* run dfe at the very end, as it may be more effective after inlining
* optimize reorder-functions
* do a final dfe in asm2wasm after all other opts
* make inlining deterministic: std::atomic<T> values are not zero-initialized
* do global post opts at the end of asm2wasm, and don't also do them in the module builder
* fix function type removing
* don't inline+optimize when preserving debug info
|
We can remove the memory/table (the definition itself, or an import if imported) if they are not used. This is pretty minor on a large wasm file, but when reading small wasts it's very noticeable to have an unused memory and table all the time.
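A hypothetical small module where this applies (a sketch): neither the memory nor the table is ever accessed or exported, so both can be dropped.
```
(module
  (memory $0 1)
  (table 0 anyfunc)
  (func $f (result i32)
    (i32.const 1)
  )
)
```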
|
This optimizes #1343. It looks for sets of a value that is already present in the local, which in particular can remove the initial set to 0 in loops starting at zero, since all locals are already initialized to zero. This helps in real-world code, but is not super-common since coalescing means we tend to have assigned something else to the local anyhow before we need it to be zero, so this mainly helps in small functions (and running this before coalescing would extend live ranges in potentially bad ways).
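A sketch of the common case (hypothetical function): the explicit zeroing of $i stores a value the local already holds, so it can be removed.
```
(func $count (param $n i32)
  (local $i i32)
  (set_local $i (i32.const 0)) ;; redundant: locals start at zero
  (loop $l
    (set_local $i (i32.add (get_local $i) (i32.const 1)))
    (br_if $l (i32.lt_u (get_local $i) (get_local $n)))
  )
)
```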
|
* Extract Asm2WasmBuilder::TrapMode to shared FloatTrapMode
* Extract makeTrappingI32Binary
* Extract makeTrappingI64Binary
* Extract asm2wasm test script into scripts/test/asm2wasm.py
This matches s2wasm.py, and makes iterating on asm2wasm slightly faster.
* Simplify callsites with an arg struct
* Combine func adding across i32 and i64
* Support f32-to-int in asm2wasm
* Add BinaryenTrapMode pass, run pass from s2wasm
* BinaryenTrapMode pass takes trap context as a parameter
* Pass fully supports non-trapping binary ops
* Defer adding functions until after iteration (hackily)
* Update asm2wasm to work with deferred function adding, rebuild tests
* Extract makeTrappingFloatToInt32
* Extract makeTrappingFloatToInt64
* Add unary conversions to trap pass
* Add functions in the pass itself
* Set s2wasm trap mode with command-line arguments
* Print BINARYEN_PASS_DEBUG state when testing
* Get asm2wasm using the BinaryenTrapMode pass instead of handling it inline
* Also handle f32 to int in asm2wasm
* Make BinaryenTrapMode only need a FloatTrapMode from the caller
* Just pass the current binary Expression directly
* Combine makeTrappingI32Binary with makeTrappingI64Binary
* Pass Unary expr to makeTrappingFloatToInt32
* Unify makeTrappingFloatToInt32 & 64
* Move makeTrapping* functions inside BinaryenTrapMode, make addedFunctions non-static
* Remove FloatTrapContext
* Minor cleanups
* Extract some smaller subfunctions
* Emit name switch/casing, rename is32Bit to isI64 for consistency
* Rename BinaryenTrapMode to FloatTrap, make trap mode a nested enum
* Add some comments explaining why FloatTrap is non-parallel
* Rename addedFunctions to generatedFunctions for precision
* Rename, move, and split float-clamp.h into passes/FloatTrap.(h|cpp)
* Use builder instead of allocator
* Instantiate trap handling passes via the pass manager
* Move passes/FloatTrap.h to ast/trapping.h
* Add helper function to add trap-handling passes
* Add trap mode pass tests
* Rename FloatTrap.cpp to TrapMode.cpp
* Add s2wasm trap mode tests. Force float->int conversion to be signed
* Add trapping_sint_div_s test to unit.asm.js
* Fix flake8 issues with test scripts
* Update pass description comment
* Extract building functions methods
* Make generate functions into top-level functions
* Add GeneratedTrappingFunctions class to manage function/import additions
* Move ensure/makeTrapping functions outside class scope
* Use GeneratedTrappingFunctions to add immediately in asm2wasm mode
* Remove trapping_sint_div_s test
We only added it to test that trapping divisions would get
constant-folded at the correct time. Now that we're not changing the
timing of trapping modes, the test is unneeded (and problematic).
* Review feedback, add validator/*.wasm to .gitignore
* Add support for unsigned float-to-int conversion
* Use opcode directly instead of bools
* Update s2wasm clamp test for unsigned ftoi
|
Implements #1172: this adds a variant of precompute, "precompute-propagate", which also does constant propagation. Precompute by itself just runs the interpreter on each expression and sees if it is in fact a constant; precompute-propagate also looks at the graph of connections between get and set locals, and propagates those constant values.
This helps with cases as noticed in #1168 - while in most cases LLVM will do this already, it's important when inlining, e.g. inlining of the clamping math functions. This new pass is run when inlining, and otherwise only in -O3/-Oz, as it does increase compilation time noticeably if run on everything (and for almost no benefit if LLVM has run).
Most of the code here is just refactoring out from the ssa pass the get/set graph computation, so it can now be used by both the ssa pass and precompute-propagate.
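A sketch of the difference (hypothetical code): plain precompute can do nothing with the get below on its own, but precompute-propagate sees that the only set reaching it is a constant.
```
(set_local $x (i32.const 10))
..
(i32.mul (get_local $x) (i32.const 3)) ;; can be propagated and folded to (i32.const 30)
```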
|
* improve the inlining pass to inline single-use functions that are fairly small, which makes it useful for removing unnecessary global constructors from clang. add an inlining-optimizing pass that also optimizes where it inlined, as new opportunities arise. enable it by default in -O2+
* fix a bug where we didn't run all passes properly - refactor addDefaultGlobalOptimizationPasses() into a pre and post version. we can only run the post version in incremental optimizing builds (functions appear one by one, we optimize them first, and do global stuff when all are done), but can run both when doing a full optimize
* copy in inlining, allowing multiple inlinings of the same function in the future
|
Adds a pass that folds code, i.e., merges it when possible. See details in the comment in the pass implementation cpp.
This is enabled by default in -Os and -Oz. Seems risky to enable anywhere else, as it does add branches - likely predictable ones so maybe no slowdown, but still some risk.
Code size numbers:
wasm-backend: 196331
+ binaryen -Os (before): 182598
+ binaryen -Os (with folding): 181943
asm2wasm -Os (before): 172463
asm2wasm -Os (with folding): 168774
So this reduces the wasm backend's output by an additional 0.5% beyond what it could do before. Mainly this is because the wasm backend already has code folding, whereas on asm2wasm output, where we didn't have folding before, this saves over 2%. The 0.5% improvement on the wasm backend's output might be because this can fold more types of code than LLVM can (it can fold nested control flow, in particular).
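A hypothetical sketch of the kind of merging the pass performs (made-up code): a tail shared by both arms of an if can be folded out after it.
```
(if (get_local $x)
  (block
    (call $a)
    (call $shared)
  )
  (block
    (call $b)
    (call $shared)
  )
)
;; can be folded into
(if (get_local $x)
  (call $a)
  (call $b)
)
(call $shared)
```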
|
* consistently handle possible traps in asm.js ffi results
* consistently handle possible traps in i64/float conversions
|
Support both syntax formats in input since the old spec
tests still need to be parsable.
|
* validate that types are properly finalized, when in pass-debug mode (BINARYEN_PASS_DEBUG env var): check after each pass is run that the type of each node is equal to the proper type (when finalizing it, i.e., fully recomputing the type).
* fix many fuzz bugs found by that.
* in particular, fix dce bugs with type changes not being fully updated during code removal. add a new TypeUpdater helper class that lets a pass update types efficiently: the helper tracks dependencies between blocks and branches, etc., and updates/propagates type changes only as necessary.
|
unruly (#928)
|
* use 3 modes for potentially trapping ops in asm2wasm: allow (just emit a potentially trapping op), js (do exactly what js does, even if it takes a slow ffi to do it), and clamp (avoid the trap by clamping as necessary)
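As a rough illustration of the difference between the modes (a hand-written sketch, not the exact code the pass emits; the helper name and checks are approximations): in allow mode a float-to-int conversion is emitted directly and may trap, while in clamp mode it goes through a small generated helper that handles the bad cases first.
```
;; allow: emit the op directly; it may trap on NaN or out-of-range values
(i32.trunc_s/f64 (get_local $d))

;; clamp: call a generated helper roughly along these lines instead
(func $f64-to-int (param $d f64) (result i32)
  (if (result i32)
    (f64.ne (get_local $d) (get_local $d)) ;; NaN
    (i32.const 0)
    (if (result i32)
      (f64.ge (get_local $d) (f64.const 2147483648))
      (i32.const 2147483647)
      (if (result i32)
        (f64.le (get_local $d) (f64.const -2147483649))
        (i32.const -2147483648)
        (i32.trunc_s/f64 (get_local $d))))))
```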