Commit messages
...
This adds an ssa-nomerge pass which, like ssa, creates new local indexes for each set, but does not alter indexes that have merges (in practice, adding indexes to merges can lead to more copies in the end).
This also stops adding a new local index for a local that is already in "ssa form", that is, has only one set (aside from the zero initialization that wasm mandates, which for an "ssa form" index must not be used).
This then enables ssa-nomerge in -O3 and -Os. It doesn't help much on already well-optimized code like output from the wasm backend (though it does sometimes - a 0.5% code size improvement on Box2D), but on AssemblyScript, for example, it can remove a copy in the n-body benchmark, as can be seen in the test updates here.
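As a sketch of which locals the pass touches (hypothetical code, in the older set_local/get_local text format used throughout these notes):
(func $example (param $p i32)
 (local $x i32)
 ;; $x is set in both arms, and the get after the if merges those values:
 ;; ssa-nomerge leaves this index alone, to avoid adding copies
 (if (get_local $p)
  (set_local $x (i32.const 1))
  (set_local $x (i32.const 2))
 )
 (drop (get_local $x))
 ;; this later set has no merge - every get of it sees only this one set -
 ;; so the pass can give it a fresh local index, say $x_1
 (set_local $x (i32.const 3))
 (drop (get_local $x))
)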
...
A propagated constant can be helpful in the various patterns in optimize instructions.
The testcase shows an example of this in action: we can optimize out a load offset for a constant, but if we only propagated the constant afterwards, we would miss that.
In general these two passes can help each other, so maybe they should be combined and run for multiple iterations, but that's what --converge is for. Meanwhile, this change improves what seems to be the more common case - a guess based on it being what I noticed in practice, and on the fact that when I run the fuzzer, I see only this type of case.
...
And run it in wasm-emscripten-finalize. This will prevent the emscripten output from changing when the target features section lands in LLVM.
...
See #1919 - we did not do this consistently before.
This adds a lowMemoryUnused option to PassOptions. It can be passed on the command line with --low-memory-unused. If enabled, we run the new optimize-added-constants pass, which does the real work here, replacing older code in post-emscripten.
Aside from running at the proper time (unlike the old pass, see #1919), this also has a -propagate mode, which can do things like this:
y = x + 10
[..]
load(y)
[..]
load(y)
=>
y = x + 10
[..]
load(x, offset=10)
[..]
load(x, offset=10)
That is, it can propagate such offsets to the loads/stores. This pattern is common in big interpreter loops, where the pointers are offsets into a big struct of state.
The pass does this propagation by using a new feature of LocalGraph, which can verify which locals are in SSA mode. Binaryen IR is not SSA (intentionally, since it's a later IR), but if a local only has a single set for all gets, that means that local is in such a state, and can be optimized. The tricky thing is that all locals are initialized to zero, so there are at minimum two sets. But if we verify that the real set dominates all the gets, then the zero initialization cannot reach them, and we are safe.
This PR also makes safe-heap aware of lowMemoryUnused. If it is set, we check not just for an access of address 0, but for the whole range 0-1023.
This makes zlib 5% faster, with either the wasm backend or asm2wasm. It also makes it 0.5% smaller. It also helps sqlite (1.5% faster) and lua (1% faster).
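In wasm text form, the propagation looks something like this sketch (hypothetical code, not literal pass output):
;; before: $y = $x + 10 is used as a pointer twice
(set_local $y (i32.add (get_local $x) (i32.const 10)))
(drop (i32.load (get_local $y)))
(drop (i32.load (get_local $y)))
;; after: the added constant becomes a load offset; $y may now be unused
;; and removable by later passes
(set_local $y (i32.add (get_local $x) (i32.const 10)))
(drop (i32.load offset=10 (get_local $x)))
(drop (i32.load offset=10 (get_local $x)))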
...
* Optimize normally even with debug info - some of it may be removed, but that's the price of higher optimization levels; also, by optimizing normally, profiling and -g2 etc. builds are more comparable to normal ones, yielding better data.
* Copy debug locations automatically in replaceCurrent in wasm-traversal, so that optimization passes preserve debuggability by default.
...
* Finds functions whose return value is always dropped, and removes the return (see the sketch below).
* Run multiple iterations of the pass, as one can enable others.
* Do not run DeadArgumentElimination at all if debug info is present (with these improvements, it became much more likely to destroy debug info).
Saves 2.5% on hello world, because of some simple libc calls.
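A minimal sketch of the dropped-return case (hypothetical code, in the text format used elsewhere in these notes):
(func $answer (result i32)
 (i32.const 42)
)
(func $caller
 ;; every call to $answer drops the result, so the return can be removed
 (drop (call $answer))
)
;; becomes something like
(func $answer
 (drop (i32.const 42)) ;; the now-unused value is dropped; later opts can remove it
)
(func $caller
 (call $answer)
)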
...
WebAssembly/tool-conventions#93 has a summary of emscripten's current thinking on this. For Binaryen, we don't want to do anything to the producers section by default, but do want it to be possible to optionally remove it. To achieve that, this PR
* creates a --strip-producers pass that removes that section.
* creates a --strip-debug pass that removes debug info, same as the old --strip, which is still around but deprecated.
A followup in emscripten will use this pass by default.
...
Automated renaming according to
https://github.com/WebAssembly/spec/issues/884#issuecomment-426433329.
...
Even when we don't want to fully legalize code for JS, we should still legalize things that only JS cares about. In particular, dynCall_* methods are used from JS to call into the wasm table, and if they exist they are only for JS, so we should legalize them (and only them).
The use case motivating this is that in dynamic linking you may want to disable legalization, so that wasm=>wasm module calls are fast even with i64s, but you do still need dynCalls to be legalized even in that case, otherwise an invoke with an i64 parameter would fail.
...
When emscripten knows that the runtime will not be exited, it can tell codegen to not emit atexit() calls (since those callbacks will never be run). This saves both code size and startup time. In asm2wasm the JSBackend does it directly. For the wasm backend, this pass does the same on the output wasm.
...
We now emit more sets and tees of if-elses from simplify-locals, and
coalesce-locals is necessary to remove them if they are ineffectual,
that is, if no get will read them.
...
This is sort of like --strip on a native binary. The more specific use case for us is e.g. you link with a library that has -g in its CFLAGS, but you don't want debug info in your final executable (I hit this with poppler just now). We can make emcc pass this to binaryen when emcc is not building an output that is meant to have debug info.
...
sometimes that is not desirable.
...
Rely on the dedicated pass for that. It's not worth the extra complexity to try, as we can't easily handle all the cases anyhow.
Add another run of the dedicated name-removing pass in the default passes.
...
This new pass minifies import and export names, for example, this may minify
(import "env" "longname" (func $internal))
to
(import "env" "a" (func $internal))
By updating the JS that provides those imports/calls those exports, we can use the minified names properly. This can save a useful amount of space in the wasm and JS, see kripken/emscripten#7414
...
Fixes #1649
This moves us to a single object for functions, which can be imported or not, and likewise for globals (as a result, GetGlobals do not need to check if the global is imported or not, etc.). All imported things now inherit from Importable, which has the module and base of the import; if they are set, then it is an import.
For convenient iteration, there are a few helpers like
ModuleUtils::iterDefinedGlobals(wasm, [&](Global* global) {
.. use global ..
});
as often iteration only cares about imported or defined (non-imported) things.
...
This adds a pass to remove unnecessary call arguments in an LTO-like manner, that is:
* If a parameter is not actually used in a function, we don't need to send anything, and can remove it from the function's declaration. Concretely,
(func $a (param $x i32)
..no uses of $x..
)
(func $b
(call $a (..))
)
=>
(func $a
..no uses of $x..
)
(func $b
(call $a)
)
And
* If a parameter is only ever sent the same constant value, we can just set that constant value in the function (which then means that the values sent from the outside are no longer used, as in the previous point). Concretely,
(func $a (param $x i32)
..may use $x..
)
(func $b
(call $a (i32.const 1))
(call $a (i32.const 1))
)
=>
(func $a
(local $x i32)
 (set_local $x (i32.const 1))
..may use $x..
)
(func $b
(call $a)
(call $a)
)
How much this helps depends on the codebase obviously, but sometimes it is pretty useful. For example, it shrinks 0.72% on Unity and 0.37% on Mono. Note that those numbers include not just the optimization itself, but the other optimizations it then enables - in particular the second point from earlier leads to inlining a constant value, which often allows constant propagation, and also removing parameters may enable more duplicate function elimination, etc. - which explains how this can shrink Unity by almost 1%.
Implementation is pretty straightforward, but there is some work to make the heavy part of the pass parallel, and a bunch of corner cases to avoid (can't change a function that is exported or in the table, etc.). Like the Inlining pass, there is both a standard and an "optimizing" version of this pass - the latter also optimizes the functions it changes, as like Inlining, it's useful to not need to re-run all function optimizations on the whole module.
...
This adds an licm pass. Not that important for LLVM-originating code obviously, but for AssemblyScript and other non-LLVM compilers this might help a lot. Also when wasm has GC a bunch more non-LLVM languages may arrive that can benefit.
The pass is mostly straightforward. I considered using the DataFlow IR since it's in SSA form, or the CFG IR, but in the end it's actually pretty convenient to use the main IR as it is - with explicit loops already present - plus LocalGraph which connects each get to the sets influencing it.
Passed a bunch of fuzzing, and also the emscripten test suite at -O1 with licm added to the default passes (but I don't think it would make sense to run this by default, as LLVM doesn't need it).
We limit code moved by this pass as follows: An increased code size on fuzz testcases (and, more rarely, on real inputs) can happen due to stuff like this:
(loop
(set_local $x (i32.const 1))
..
)
=>
(set_local $x (i32.const 1))
(loop
..
)
For a const or a get_local, such an assignment to a local is both very cheap (a copy to another local may be optimized out later), and moving it out may prevent other optimizations, since we have no pass that tries to move code back into a loop (well, not by default - precompute-propagate etc. would do it, but those only run at high optimization levels). So I made the pass not move such trivial code (sets/tees of consts or gets). However, the risk remains if code is moved out that is later reduced to a constant, so something like -Os --flatten --licm -Os may make sense.
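Conversely, a sketch of code the pass does hoist (hypothetical, in the same style as the example above), since the moved value is neither a const nor a get and its inputs do not change inside the loop:
(loop $l
 ;; $n is not written inside the loop, and i32.mul has no side effects,
 ;; so this whole set can move out of the loop
 (set_local $scaled (i32.mul (get_local $n) (i32.const 100)))
 ..
 (br_if $l (get_local $continue))
)
;; becomes
(set_local $scaled (i32.mul (get_local $n) (i32.const 100)))
(loop $l
 ..
 (br_if $l (get_local $continue))
)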
...
Background: google/souper#323
This adds a --souperify pass, which emits Souper IR in text format. That can then be read by Souper which can emit superoptimization rules. We hope that eventually we can integrate those rules into Binaryen.
How this works is we emit an internal "DataFlow IR", which is an SSA-based IR, and then write that out into Souper text.
This also adds a --dfo pass, which stands for data-flow optimizations. It generates a DataFlow IR, as in souperify, and then performs some trivial optimizations using it. There are very few things it can do that our other optimizations can't already, but this is good testing for the DataFlow IR, plus it is good preparation for using Souper's superoptimization output (which would also construct DataFlow IR, like here, but then do some matching on the Souper rules).
...
This adds a new IR, "Stack IR". This represents wasm at a very low level, as a simple stream of instructions, basically the same as wasm's binary format. This is unlike Binaryen IR which is structured and in a tree format.
This gives some small wins on binary sizes, less than 1% in most cases, usually 0.25-0.50% or so. That's not much by itself, but looking forward this prepares us for multi-value, which we really need an IR like this to be able to optimize well. Also, it's possible there is more we can do already - currently just a few stack IR optimizations are implemented:
* DCE
* local2stack - check if a set_local/get_local pair can be removed, keeping the set's value on the stack so that, if the stars align, it can simply be popped where the get was.
* Block removal - remove blocks with no branches to them, which is valid in the wasm binary format.
Implementation-wise, the IR is defined in wasm-stack.h. A new StackInst is defined, representing a single instruction. Most are simple reflections of Binaryen IR (an add, a load, etc.), and just pointers to them. Control flow constructs are expanded into multiple instructions, like a block turns into a block begin and end, and we may also emit extra unreachables to handle the fact Binaryen IR has unreachable blocks/ifs/loops but wasm does not. Overall, all the Binaryen IR differences with wasm vanish on the way to stack IR.
Where this IR lives: Each Function now has a unique_ptr to stack IR, that is, a function may have stack IR alongside the main IR. If the stack IR is present, we write it out during binary writing; if not, we do the same Binaryen IR => wasm binary process as before (this PR should not affect speed there). This design lets us use normal Passes on stack IR; in particular this PR defines 3 passes:
* Generate stack IR
* Optimize stack IR (might be worth splitting out into separate passes eventually)
* Print stack IR (for debugging purposes)
Having these as normal passes is convenient, as they can then run in parallel across functions and get all the other conveniences of our current Pass system. However, a downside of keeping the second IR as an option on Functions, and using normal Passes to operate on it, is that we may get out of sync: if you generate stack IR, then modify Binaryen IR, the stack IR may no longer be valid (for example, maybe you removed locals or modified instructions in place, etc.). To avoid that, Passes now declare whether they modify Binaryen IR; if they do, we throw away the stack IR.
Miscellaneous notes:
Just writing Stack IR, then writing to binary - no optimizations - is 20% slower than going directly to binary, which is one reason why we still support direct writing. This does lead to some "fun" C++ template code to make that convenient: there is a single StackWriter class, templated over the "mode", which is either Binaryen2Binary (direct writing), Binaryen2Stack, or Stack2Binary. This avoids a lot of boilerplate as the 3 modes share a lot of code in overlapping ways.
Stack IR does not support source maps / debug info. We just don't use that IR if debug info is present.
A tiny text-format comment, (; has Stack IR ;), indicates that stack IR is present (when emitting non-minified text). This may help with debugging, just in case people forget. There is also a pass to print out the stack IR for debug purposes, as mentioned above.
The sieve binaryen.js test was actually not validating all along - these new opts broke it in a more noticeable manner. Fixed.
Added extra checks in pass-debug mode, to verify that if stack IR should have been thrown out, it was. This should help avoid any confusion with the IR being invalid.
Added a comment about the possible future of stack IR as the main IR, depending on optimization results, following some discussion earlier today.
...
This defines a new -O4 optimization mode, as flatten + flat-only opts (currently local-cse) + -O3.
In practice, flattening is not needed for LLVM output, which is pretty flat already (no block or if values, etc., even if it does use tees and does nest expressions; and LLVM has already done gvn etc. anyhow). In general, though, wasm generated by a non-LLVM compiler may naturally be nested because wasm allows that. See for example #1593 where an AssemblyScript testcase requires flattening to be fully optimized. So -O4 can help there.
-O4 takes 3x longer to run than -O3 in my testing, basically because flat IR is much bigger. But when it's useful it may be worth it. It does handle that AssemblyScript testcase and others like it. There's not much big real-world code that isn't LLVM yet, but running the fuzzer - which happily creates nested stuff all the time - I see -O4 consistently shrink the size by around 20% over -O3.
...
This makes it much more effective, by rewriting it to depend on flatten. In flattened IR, it is very simple to check if an expression is equivalent to one already available for use in a local, and to use that one instead; basically we just track values in locals.
Helps with #1521
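A sketch of the idea in flattened IR (hypothetical locals, not literal pass output):
;; flat IR: every value is written to a local before use
(set_local $t1 (i32.mul (get_local $a) (get_local $b)))
(set_local $c (get_local $t1))
(set_local $t2 (i32.mul (get_local $a) (get_local $b))) ;; same value as $t1,
(set_local $d (get_local $t2))                          ;; and $a/$b are unchanged in between
;; becomes
(set_local $t1 (i32.mul (get_local $a) (get_local $b)))
(set_local $c (get_local $t1))
(set_local $d (get_local $t1)) ;; reuse the value already available in $t1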
...
helpful in both positions on general code (#1581)
...
This commit implements the `copysign` instruction for the wasm2asm tool. The implementation here is a new pass which wholesale replaces `copysign` instructions with the equivalent bit operations and reinterpret instructions. The intent is that this matches Emscripten's existing lowering.
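For reference, the usual bit-ops lowering of f64.copysign looks something like the following sketch (in the old text format; the pass's exact output may differ):
(func $copysign64 (param $x f64) (param $y f64) (result f64)
 (f64.reinterpret/i64
  (i64.or
   ;; magnitude bits of $x
   (i64.and (i64.reinterpret/f64 (get_local $x)) (i64.const 0x7fffffffffffffff))
   ;; sign bit of $y
   (i64.and (i64.reinterpret/f64 (get_local $y)) (i64.const 0x8000000000000000))
  )
 )
)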
...
If locals are known to contain the same value, we can
* Pick which local to use for a get_local of any of them. Makes sense to prefer the most common, to increase the chance of one dropping to zero uses.
* Remove copies between a local and one that we know contains the same value.
This is a consistent win, though a small one, around 0.1-0.2%.
...
Remove the entire memory/table when possible, in particular, when not imported, exported, or used. Previously we did not look at whether they were imported, so we assumed we could never remove them.
Also add a variant that removes everything but functions, which can be useful when reducing a testcase that only cares about code in functions.
...
Add a version of simplify-locals which does not create nesting. This keeps the IR flat (in the sense of --flatten).
Also refactor simplify-locals to be a template, so the various modes are all template parameters.
...
Noticed by Souper.
* We only folded identical code in an if-else when both arms were blocks, so we were missing the case of one arm being just a singleton expression. This PR wraps such an arm in a block so the rest of the optimization can work on it, if it sees it is going to be folded out. Turns out this is common for phis.
* We only ran code-folding in -Os, because I assumed it was just good for code size, but as it may remove phis in the wasm VM later, it seems we should run it when not optimizing for size as well.
Together, these two shrink lua -O3 by almost 1%.
...
Inspired by #1501
* remove unneeded appearances of the default switch target (at the front or back of the list of targets)
* optimize a switch with 0, 1 or 2 targets into an if or if-chain
* optimize a br_if br pair when they have the same target, as sketched below
Makes e.g. fastcomp libc++ 2% smaller. Noticeable improvements on other things like box2d etc.
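For the br_if br case, a minimal sketch (hypothetical code; the exact output of the pass may differ):
(block $out
 (br_if $out (get_local $cond))
 (br $out)
)
;; the conditional branch is redundant - both paths reach $out - so this can become
(block $out
 (drop (get_local $cond)) ;; keep the condition only for its side effects (none here)
 (br $out)
)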
...
unreachable, the modified version is as well (#1481)
...
This adds a pass that implements "function pointer cast emulation" - it allows indirect calls to go through even if the number of arguments or their types are incorrect. That is undefined behavior in C/C++, but in practice it somehow works on native architectures. It is even relied upon in e.g. Python.
Emscripten already has such emulation for asm.js, which also worked for asm2wasm. This implements something like it in binaryen which also allows the wasm backend to use it. As a result, Python should now be portable using the wasm backend.
The mechanism used for the emulation is to make all indirect calls use a fixed number of arguments, all of type i64, and a return type of i64 as well. Thunks are then placed in the table which translate the arguments properly for the target, basically by reinterpreting to i64 and back. As a result, receiving an i64 when an i32 is sent will have the upper bits all zero, and the reverse would truncate the upper bits, etc. (Note that this is different from emscripten's existing emulation, which converts (as signed) to a double. That makes sense for JS, where doubles can contain all of its numeric values, but in wasm we have i64s. Also, bitwise conversion may be more like what native architectures do anyhow. It is enough for Python.)
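For example, a thunk for a table entry whose real signature is (i32) -> i32 might look roughly like this (hypothetical names, and just two unified i64 parameters shown for illustration):
(func $foo_thunk (param $a i64) (param $b i64) (result i64)
 ;; $foo is the original (i32) -> i32 function; extra unified params are ignored.
 ;; truncate the incoming i64 to the i32 the real target expects,
 ;; then zero-extend the i32 result back to the unified i64 type
 (i64.extend_u/i32
  (call $foo
   (i32.wrap/i64 (get_local $a))
  )
 )
)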
Also adds validation for a function's type matching the function's actual params and result (surprised we didn't have that before, but we didn't, and there was even a place in the test suite where that was wrong).
Also simplifies the build script by moving two cpp files into the wasm/ subdir, so they can be built once and shared between the various tools.
...
* refactor BINARYEN_PASS_DEBUG code for writing byn-* files, make it easy to emit binaries instead of text
* fix up bad argument numbers in asm2wasm. This can be caused by undefined behavior on the LLVM side, which happens to work natively. It's nicer to fix it up like it would be in a native build, and give a warning, instead of failing to compile.
* update build-js.sh
* updated builds
...
Simplify inlining logic: don't special-case the first iteration or change behavior based on whether we are optimizing or not. Instead, use one simpler set of heuristics for both inlining and inlining-optimizing. We only run inlining-optimizing by default anyhow; there's no point trying to make inlining without optimizations useful by itself, as it's not a realistic use case. (Plain inlining is still useful for debugging, or if you will run optimizations on everything later anyhow, in which case inlining-optimizing might add some redundancy.) The simpler heuristics after this let us do a somewhat better job, as we are no longer paranoid about inlining in multiple iterations.
Also raise the limit on inlining things that are obviously worth it from size 1 to size 2: things of size 2 will never lead to an increase in code size after we optimize (it takes at least 3 nodes to generate something that reads two locals and reverses their order, which would require a temp local in the outside scope, etc.).
Also fix infinite recursion of inlining an infinitely recursive set of calls.
...
* run dfe at the very end, as it may be more effective after inlining
* optimize reorder-functions
* do a final dfe in asm2wasm after all other opts
* make inlining deterministic: std::atomic<T> values are not zero-initialized
* do global post opts at the end of asm2wasm, and don't also do them in the module builder
* fix function type removing
* don't inline+optimize when preserving debug info
...
Emits binary size and opcode counts for each function, which helps investigating what's taking up space in a wasm binary.
...
(#1356)
...
This optimizes #1343. It looks for sets of a value that is already present in the local, which in particular can remove the initial set to 0 of loops starting at zero, since all locals are already initialized to that. This helps in real-world code, but is not super-common since coalescing means we tend to have assigned something else to the local anyhow before we need it to be zero, so this mainly helps in small functions (and running this before coalescing would extend live ranges in potentially bad ways).
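The loop case mentioned above, as a sketch (hypothetical code):
(func $count (param $n i32)
 (local $i i32) ;; locals start at zero
 ;; this set writes the value $i already holds, so it can be removed
 (set_local $i (i32.const 0))
 (loop $l
  ..
  (set_local $i (i32.add (get_local $i) (i32.const 1)))
  (br_if $l (i32.lt_u (get_local $i) (get_local $n)))
 )
)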
...
This is an experiment to help with Boehm-style GC. It spills things that could be pointers onto the C stack, so that they can be seen by conservative garbage collection.
The spills add code size and runtime overhead, but actually less than I thought: 10% slower (smaller than the difference between VMs), 15% gzip size larger. We can do even better with more optimizations for this, like a dead store elimination pass.
This PR does the following:
* Add the new pass.
* Create an abi/ dir, with info about the pointer size and stack manipulation utilities.
* Separates out the liveness analysis from CoalesceLocals, so that other passes can use it (like SpillPointers).
* Refactor out the SortedVector class from the liveness analysis to a separate file (just seems nicer that way).
...
This optimizes the situation described in #1331. Namely, when x is copied into y, then on subsequent gets of x we could use y instead, and vice versa, as their values are equal. Specifically, the goal is to get rid of the definite overlap in the live ranges of x and y, since removing it allows coalesce-locals to merge them. The pass therefore does nothing if the live range of y ends there anyhow.
The danger here is that we may extend the live range so that it causes more conflicts with other things, so this is a heuristic, but I've tested it on every codebase I can find and it always produces a net win; on one I even saw a 0.4% reduction of code size, which surprised me.
This is a fairly slow pass, because it uses LocalGraph which isn't much optimized. This PR includes a minor optimization for it, but we should rewrite it. Meanwhile this is just enabled in -O3 and -Oz.
This PR also includes some fuzzing improvements, to better test stuff like this.
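A sketch of the idea (hypothetical code):
(set_local $y (get_local $x)) ;; copy: $x and $y hold the same value here
..
(drop (get_local $x))         ;; keeps $x live, overlapping with $y
;; becomes
(set_local $y (get_local $x))
..
(drop (get_local $y))         ;; $x's live range can end at the copy,
                              ;; so coalesce-locals may merge $x and $y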
...
Do not print the entire and possibly very large module when validation fails. Leave printing to tools using the validator, instead of always doing it in the validator where it can't be overridden.
...
This enum describes which wasm features the IR is expected to include. The
validator should reject operations which require excluded features, and passes
should avoid producing IR which requires excluded features.
This makes it easier to catch possible errors in Binaryen producers (e.g.
emscripten). Asm2wasm has a flag to enable or disable atomics. Other
tools currently just accept all features (as, dis and opt are just for
inspecting or modifying existing modules, so it would be annoying to have to use
flags with those tools and I expect the risk of accidentally introducing
atomics to be low).
...
Rename flatten-control-flow to flatten, which now flattens everything, not just control flow, so e.g.
(i32.add
(call $x)
(call $y)
)
==>
(block
(set_local $temp_x (call $x))
(set_local $temp_y (call $y))
(i32.add
(get_local $temp_x)
(get_local $temp_y)
)
)
This uses more locals than before, but is much simpler and avoids a bunch of corner cases and fuzz bugs the old one hit. We can optimize later if necessary.
...
* refactor validator API to use enums
...
* Extract Asm2WasmBuilder::TrapMode to shared FloatTrapMode
* Extract makeTrappingI32Binary
* Extract makeTrappingI64Binary
* Extract asm2wasm test script into scripts/test/asm2wasm.py
This matches s2wasm.py, and makes iterating on asm2wasm slightly faster.
* Simplify callsites with an arg struct
* Combine func adding across i32 and i64
* Support f32-to-int in asm2wasm
* Add BinaryenTrapMode pass, run pass from s2wasm
* BinaryenTrapMode pass takes trap context as a parameter
* Pass fully supports non-trapping binary ops
* Defer adding functions until after iteration (hackily)
* Update asm2wasm to work with deferred function adding, rebuild tests
* Extract makeTrappingFloatToInt32
* Extract makeTrappingFloatToInt64
* Add unary conversions to trap pass
* Add functions in the pass itself
* Set s2wasm trap mode with command-line arguments
* Print BINARYEN_PASS_DEBUG state when testing
* Get asm2wasm using the BinaryenTrapMode pass instead of handling it inline
* Also handle f32 to int in asm2wasm
* Make BinaryenTrapMode only need a FloatTrapMode from the caller
* Just pass the current binary Expression directly
* Combine makeTrappingI32Binary with makeTrappingI64Binary
* Pass Unary expr to makeTrappingFloatToInt32
* Unify makeTrappingFloatToInt32 & 64
* Move makeTrapping* functions inside BinaryenTrapMode, make addedFunctions non-static
* Remove FloatTrapContext
* Minor cleanups
* Extract some smaller subfunctions
* Emit name switch/casing, rename is32Bit to isI64 for consistency
* Rename BinaryenTrapMode to FloatTrap, make trap mode a nested enum
* Add some comments explaining why FloatTrap is non-parallel
* Rename addedFunctions to generatedFunctions for precision
* Rename move and split float-clamp.h to passes/FloatTrap.(h|cpp)
* Use builder instead of allocator
* Instantiate trap handling passes via the pass manager
* Move passes/FloatTrap.h to ast/trapping.h
* Add helper function to add trap-handling passes
* Add trap mode pass tests
* Rename FloatTrap.cpp to TrapMode.cpp
* Add s2wasm trap mode tests. Force float->int conversion to be signed
* Add trapping_sint_div_s test to unit.asm.js
* Fix flake8 issues with test scripts
* Update pass description comment
* Extract building functions methods
* Make generate functions into top-level functions
* Add GeneratedTrappingFunctions class to manage function/import additions
* Move ensure/makeTrapping functions outside class scope
* Use GeneratedTrappingFunctions to add immediately in asm2wasm mode
* Remove trapping_sint_div_s test
We only added it to test that trapping divisions would get
constant-folded at the correct time. Now that we're not changing the
timing of trapping modes, the test is unneeded (and problematic).
* Review feedback, add validator/*.wasm to .gitignore
* Add support for unsigned float-to-int conversion
* Use opcode directly instead of bools
* Update s2wasm clamp test for unsigned ftoi
...
Implements #1172: this adds a variant of precompute, "precompute-propagate", which also does constant propagation. Precompute by itself just runs the interpreter on each expression and sees if it is in fact a constant; precompute-propagate also looks at the graph of connections between get and set locals, and propagates those constant values.
This helps with cases as noticed in #1168 - while in most cases LLVM will do this already, it's important when inlining, e.g. inlining of the clamping math functions. This new pass is run when inlining, and otherwise only in -O3/-Oz, as it does increase compilation time noticeably if run on everything (and for almost no benefit if LLVM has run).
Most of the code here just refactors the get/set graph computation out of the ssa pass, so it can now be used by both the ssa pass and precompute-propagate.
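A sketch of the difference (hypothetical code): plain precompute cannot evaluate the add below, as it does not track the value flowing through $x, while precompute-propagate can:
(set_local $x (i32.const 10))
(set_local $y (i32.add (get_local $x) (i32.const 20)))
;; precompute-propagate knows the only value reaching the get is 10, so this becomes
(set_local $x (i32.const 10))
(set_local $y (i32.const 30))
;; (the now-unneeded set of $x can then be cleaned up by other passes)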
...
A pass that hoists repeating constants to a local, and replaces their uses with a get of that local. This can reduce binary size, but can also *increase* gzip size, so it's mostly for experimentation and not used by default.
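A sketch of the transform (hypothetical code): each use of a large constant costs a full LEB encoding in the binary, while a get_local is smaller:
(drop (i64.const 12345678901234))
(drop (i64.const 12345678901234))
(drop (i64.const 12345678901234))
;; becomes
(set_local $hoisted (i64.const 12345678901234))
(drop (get_local $hoisted))
(drop (get_local $hoisted))
(drop (get_local $hoisted))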