| Commit message | Author | Age | Files | Lines |
| |
With this, we can update DWARF debug line info properly as
we write a new binary.
To do that we track binary locations as we write. Each
instruction is mapped to the location it is written to. We
must also adjust them as we move code around because
of LEB optimization (we emit a function or a section
with a 5-byte LEB placeholder, the maximal size; later
we shrink it which is almost always possible).
writeDWARFSections() now takes a second param, the new
locations of instructions. It then maps debug line info from the
original offsets in the binary to the new offsets in the binary
being written.
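As an illustration of that mapping (toy types, not Binaryen's actual ones), the core idea is a map from each instruction's offset in the original code section to its offset in the newly written one, which debug line addresses are then translated through:
#include <cstdint>
#include <map>
#include <optional>

// Hypothetical sketch: old code-section offset -> new code-section offset,
// recorded as each instruction is written out.
using BinaryLocations = std::map<uint32_t, uint32_t>;

// Translate a debug-line address from the original binary to the new one.
// Returns nothing if the address does not land on a tracked instruction
// (such entries are skipped, as described above).
std::optional<uint32_t> translate(const BinaryLocations& locs, uint32_t oldAddr) {
  auto it = locs.find(oldAddr);
  if (it == locs.end()) {
    return std::nullopt;
  }
  return it->second;
}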
The core logic for updating the debug line section is in
wasm-debug.cpp. It basically tracks state machine logic
both to read the existing debug lines and to emit the new
ones. I couldn't find a way to reuse LLVM code for this, but
reading LLVM's code was very useful here.
A final tricky thing we need to do is to update the DWARF
section's internal size annotation. The LLVM YAML writing
code doesn't do that for us. Luckily it's pretty easy, in
fixEmittedSection we just update the first 4 bytes in place
to have the section size, after we've emitted it and know
the size.
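A sketch of that in-place fixup, assuming the section's 4-byte length field sits at its start and covers the bytes that follow it (a simplification of the real DWARF initial-length rules):
#include <cstdint>
#include <vector>

// After emitting the full section, patch the length field at its start,
// which must hold the size of the data that follows it.
void fixSectionLength(std::vector<uint8_t>& section) {
  if (section.size() < 4) {
    return; // nothing to patch
  }
  uint32_t length = uint32_t(section.size() - 4); // size after the length field
  section[0] = uint8_t(length & 0xff);
  section[1] = uint8_t((length >> 8) & 0xff);
  section[2] = uint8_t((length >> 16) & 0xff);
  section[3] = uint8_t((length >> 24) & 0xff);
}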
This ignores debug lines with a 0 in the line, col, or addr,
see WebAssembly/debugging#9 (comment)
This ignores debug line offsets into the middle of
instructions, which LLVM sometimes emits for some
reason, see WebAssembly/debugging#9 (comment)
Handling that would likely at least double our memory
usage, which is unfortunate - we are run in an LTO manner,
where the entire app's DWARF is present, and it may be
massive. I think we should see if such odd offsets are
a bug in LLVM, and if we can fix or prevent that.
This does not emit "special" opcodes for debug lines. Those
are purely an optimization, which I wanted to leave for
later. (Even without them we decrease the size quite a lot,
btw, as many lines have 0s in them...)
This adds some testing that shows we can load and save
fib2.c and fannkuch.cpp properly. The latter includes more
than one function and has nontrivial code.
To actually emit correct offsets a few minor fixes are
done here:
* Fix the code section location tracking during reading -
the correct offset we care about is the body of the code
section, not including the section declaration and size.
* Fix wasm-stack debug line emitting. We need to update
in BinaryInstWriter::visit(), that is, right before writing
bytes for the instruction. That differs from
BinaryenIRWriter::visit, which is a recursive function
that also calls the children - so the offset there would be
that of the first child. For some reason that is correct with
source maps (I don't understand why), but it's wrong for
DWARF...
* Print code section offsets in hex, to match other tools.
Remove DWARFUpdate pass, which was useful for testing
temporarily, but doesn't make sense now (it just updates without
writing a binary).
cc @yurydelendik
| |
This imports LLVM code for DWARF handling. That code is under the
Apache 2 license, like us. It's also the same code used to
emit DWARF in the common toolchain, so it seems like a safe choice.
This adds two passes: --dwarfdump, which runs the same code LLVM
runs for llvm-dwarfdump - this shows we can parse it ok, and will
be useful for debugging; and --dwarfupdate, which writes out the DWARF
sections (unchanged from what we read, so it just roundtrips - for
updating we need #2515).
This puts LLVM in thirdparty which is added here.
All the LLVM code is behind USE_LLVM_DWARF, which is on
by default, but off in JS for now, as it increases code size by 20%.
This current approach imports the LLVM files directly. This is not
how they are intended to be used, so it required a bunch of
local changes - more than I expected actually, for the platform-specific
stuff. For now this seems to work, so it may be good enough, but
in the long term we may want to switch to linking against libllvm.
A downside to doing that is that binaryen users would need to
have an LLVM build, and even in the waterfall builds we'd have a
problem - while we ship LLVM there anyhow, we constantly update
it, which means that binaryen would need to be on latest llvm all
the time too (which otherwise, given DWARF is quite stable, we
might not need to constantly update).
An even larger issue is that as I did this work I learned about how
DWARF works in LLVM, and while the reading code is easy to
reuse, the writing code is trickier. The main code path is heavily
integrated with the MC layer, which we don't have - we might want
to create a "fake MC layer" for that, but it sounds hard. Instead,
there is the YAML path, which is used mostly for testing, and which
can convert DWARF to and from both YAML and binary. Using
the non-YAML parts there, we can convert binary DWARF to
the YAML layer's nice Info data, then convert that back to binary. This
works; however, it is not the path LLVM uses normally, and it
supports only some basic DWARF sections - I had to add ranges
support, in fact. So if we need more complex things, we may end
up needing to use the MC layer approach, or consider some other
DWARF library. However, hopefully that should not affect the core
binaryen code which just calls a library for DWARF stuff.
Helps #2400
| |
Currently `BINARYEN_PASS_DEBUG=3` prints `.wasm` files, but they are
actually text wast files. This makes `BINARYEN_PASS_DEBUG=3` print both
wasm and wast files, where the wasm file contains the binary and the wast file the text.
| |
This pass writes and reads the module. This shows the effects
of converting to and back from the binary format, and will be
useful in testing DWARF debug support (where we'll need to see
that writing and reading a module preserves debug info properly).
| |
clang/llvm introduce __original_main as a workaround for
the fact that main may have different signatures. A downside
to that is that users get it in stack traces, which is confusing.
In -O2 and above we normally inline __original_main anyhow,
but as this is for debugging, non-optimized builds matter too,
so add a pass for this.
The implementation is trivial, just call doInlining. However, we
must check some corner cases first.
Bonus minor fixes to FindAllPointers, which unnecessarily
created an object to get the class Id (which is not valid
for all classes), and which didn't take the input by
reference properly, which meant we couldn't get the
pointer to the function body's toplevel.
| |
This pass strips DWARF debug sections, but not other debug
sections. This is useful when emitting source maps, as we do
need the SourceMapURL section, but the DWARF sections are
no longer necessary (and we've seen a testcase where they
are massively large, so big the wasm can't even be loaded in
a browser...).
Also contains a trivial one-line fix in --extract-function which
was necessary to create the testcase here: that pass extracts
a function from a wasm file (like llvm-extract) but it didn't check
if an export already existed for the function.
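For illustration, DWARF lives in wasm custom sections whose names start with ".debug_", so the filtering can be sketched like this (toy section type, not Binaryen's):
#include <algorithm>
#include <string>
#include <vector>

struct CustomSection {
  std::string name;
  std::vector<char> data;
};

// Keep non-DWARF custom sections (like sourceMappingURL), drop ".debug_*" ones.
bool isDWARFSection(const std::string& name) {
  return name.rfind(".debug_", 0) == 0;
}

void stripDWARF(std::vector<CustomSection>& sections) {
  sections.erase(
    std::remove_if(sections.begin(), sections.end(),
                   [](const CustomSection& s) { return isDWARFSection(s.name); }),
    sections.end());
}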
| |
Adds the AssemblyScript-specific passes post-assemblyscript
and post-assemblyscript-finalize, eliminating redundant ARC-style
retain/release patterns conservatively emitted by the compiler.
| |
These passes are meant to be run after Asyncify has been run; they modify its
output. We can then assume that we will always unwind if we reach an import, or
that we will never unwind, etc.
This is meant to help with lazy code loading, that is, the ability for an
initially-downloaded wasm to not contain all the code, and if code not present
there is called, we download all the rest and continue with that. That could
work something like this:
* The wasm is created. It contains calls to a special import for lazy code
loading.
* Asyncify is run on it.
* The initially downloaded wasm is created by running
--mod-asyncify-always-and-only-unwind: if the special import for lazy code
loading is called, we will definitely unwind, and we won't rewind in this binary.
* The lazily downloaded wasm is created by running --mod-asyncify-never-unwind:
we will rewind into this binary, but no longer need support for unwinding.
(Optionally, there could also be a third wasm, which has not had Asyncify run
on it, and which we'd swap to for max speed.)
These --mod-asyncify passes allow the optimizer to do a lot of work, especially
for the initially downloaded wasm if we have lots of calls to the lazy code
loading import. In that case the optimizer will see that those calls unwind,
which means the code after them is not reached, potentially making lots of code
dead and removable.
| |
This optimizes stuff like
(global.set $x (i32.const 123))
(global.get $x)
into
(global.set $x (i32.const 123))
(i32.const 123)
This doesn't help much with LLVM output, as it's rare to use globals (except for the stack pointer, and that's already well optimized), but it may help on general wasm. It can also help with Asyncify, which does use globals extensively.
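A sketch of the idea on a toy straight-line IR (not Binaryen's actual classes): remember the constant most recently written to each global and reuse it at reads, forgetting everything at calls that might modify globals:
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Toy straight-line IR: each op either sets a global to a constant,
// reads a global, or is an unknown call that may change any global.
struct Op {
  enum { SetConst, Get, Call } kind;
  std::string global;
  int32_t value = 0;      // for SetConst
  bool replaced = false;  // for Get: was it replaced by a constant?
};

void propagateGlobalConstants(std::vector<Op>& ops) {
  std::map<std::string, int32_t> known; // global -> constant currently in it
  for (auto& op : ops) {
    switch (op.kind) {
      case Op::SetConst:
        known[op.global] = op.value;
        break;
      case Op::Get: {
        auto it = known.find(op.global);
        if (it != known.end()) {
          op.value = it->second; // use the constant instead of the get
          op.replaced = true;
        }
        break;
      }
      case Op::Call:
        known.clear(); // a call may write any global; forget what we knew
        break;
    }
  }
}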
| |
This is both an optimization and a workaround for the problem that emscripten-core/emscripten#7641 uncovered and had to be reverted because of.
What's going on there is that wasm-emscripten-finalize turns emscripten_longjmp_jmpbuf into emscripten_longjmp (for some LLVM internal reason - there's a long comment in the source that I didn't fully follow). There are two such imports already, one for each name, and before that PR, we ended up with just one. After that PR, we end up with two. And with two, the minification of import names gets confused - we have two imports with the same name, and the code there ends up ignoring one of them.
I'm not sure why that PR changed things - I guess the wasm-emscripten-finalize code looks at the name, and that PR changed what name appears? @sbc100 maybe #2285 is related?
Anyhow, it's not trivial to make import minification code support two identical imports, but I don't think we should - we should avoid having such duplication anyhow. And we should add an assert that they don't exist (I'll open a PR for that later when it's possible).
This fixes the duplication by adding a useful pass to remove duplicate imports (just functions, for now). Pretty simple, but we didn't do it yet. Even if there is a wasm-emscripten-finalize bug we need to fix with those duplicate imports, I think this pass is still a good thing to add.
I confirmed that this fixes the issue caused by that PR.
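A sketch of the deduplication, keyed on the import's module, base name, and signature (toy types; the real pass works on Binaryen's IR):
#include <map>
#include <string>
#include <tuple>
#include <vector>

struct FuncImport {
  std::string internalName; // name used by call sites
  std::string module;       // e.g. "env"
  std::string base;         // the imported name
  std::string signature;    // e.g. "vii"
};

// Returns a map from removed internal names to the surviving duplicate,
// so call sites can be updated to use the kept import.
std::map<std::string, std::string>
deduplicateImports(std::vector<FuncImport>& imports) {
  std::map<std::tuple<std::string, std::string, std::string>, std::string> seen;
  std::map<std::string, std::string> replacements;
  std::vector<FuncImport> kept;
  for (auto& imp : imports) {
    auto key = std::make_tuple(imp.module, imp.base, imp.signature);
    auto it = seen.find(key);
    if (it == seen.end()) {
      seen[key] = imp.internalName;
      kept.push_back(imp);
    } else {
      replacements[imp.internalName] = it->second;
    }
  }
  imports = std::move(kept);
  return replacements;
}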
| |
(#2242)
Main change here is in pass.h, everything else is changes to work with the new API.
The add("name") form remains as before, while the weird variadic add(..) which constructed the pass now just takes a std::unique_ptr to a pass. This also makes the memory management internally fully automatic. And it makes it trivial to parallelize WalkerPass::run on parallel passes.
As a benefit, this allows removing a lot of code since in many cases there is no need to create a new pass runner, and running a pass can be just a single line.
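A minimal sketch of the shape of such an API (the names here stand in for Binaryen's actual ones): the runner owns its passes via std::unique_ptr, so memory management is automatic and running them is trivial:
#include <memory>
#include <string>
#include <vector>

struct Pass {
  virtual ~Pass() = default;
  virtual void run() = 0;
};

struct Runner {
  std::vector<std::unique_ptr<Pass>> passes;

  // Construct-by-name form stays as before; the registry lookup is elided here.
  void add(const std::string& name) { (void)name; /* look up and construct */ }

  // New form: the caller constructs the pass and hands over ownership.
  void add(std::unique_ptr<Pass> pass) { passes.push_back(std::move(pass)); }

  void run() {
    for (auto& pass : passes) {
      pass->run();
    }
  }
};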
| |
* Clarify the difference between old and new Asyncify.
* Remove the old --bysyncify pass option.
| |
After some discussion this seems like a less confusing name: what the pass does is "asyncify" code, after all.
The one downside is the name overlaps with the old emscripten "Asyncify" utility, which we'll need to clarify in the docs there.
This keeps the old --bysyncify flag around for now, which is helpful for avoiding temporary breakage on CI as we move the emscripten side as well.
| |
Fix and test mutable globals support, replace string literals with
constants, and add a pass to emit the target features section.
| |
This adds a new pass, Bysyncify, which transforms code to allow unwinding and rewinding the call stack and local state. This allows things like coroutines, turning synchronous code asynchronous, etc.
The new pass file itself has a large comment on top with docs.
So far the tests here seem to show this works, but this hasn't been tested heavily yet. My next step is to hook this up to emscripten as a replacement for asyncify/emterpreter, see emscripten-core/emscripten#8561
Note that this is completely usable by itself, so it could be useful for any language that needs coroutines etc., and not just ones using LLVM and/or emscripten. See docs on the ABI in the pass source.
| |
* work
* fix
* fix
* format
| |
This is useful for front-ends which wish to selectively enable or
disable coloring.
Also expose these APIs from the C API.
| |
In JS a reinterpret is especially expensive, as we implement it as a write to a temp buffer and a read using another view. This finds places where we load a value from memory, then reinterpret it later - in that case, we can load it using another view, at the cost of another load and another local.
This is helpful on things like Box2D, where there are many reinterprets due to the main 2D vector class being a union over two floats/ints, and LLVM likes to do a single i64 load of them.
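For illustration, here is the difference in C++ terms, using memcpy as the stand-in for loads and reinterprets: rather than loading a float and bit-casting it, the pass arranges to load the same bytes directly as an integer:
#include <cstddef>
#include <cstdint>
#include <cstring>

// Reinterpret after the fact: load a float, then bit-cast it to an integer.
uint32_t loadThenReinterpret(const unsigned char* memory, size_t addr) {
  float f;
  std::memcpy(&f, memory + addr, sizeof(f)); // the original load
  uint32_t bits;
  std::memcpy(&bits, &f, sizeof(bits));      // the reinterpret (costly in JS)
  return bits;
}

// What the pass prefers: load the same bytes directly as an integer,
// alongside the float load, avoiding the reinterpret entirely.
uint32_t loadAsInteger(const unsigned char* memory, size_t addr) {
  uint32_t bits;
  std::memcpy(&bits, memory + addr, sizeof(bits));
  return bits;
}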
| |
Helps to avoid trampling each other when binaryen is called multiple times from emcc, for example.
| |
If a global is marked mutable but not assigned to, make it immutable.
If an immutable global is a copy of another, use the original, so we can remove the duplicates.
Fixes #2011
| |
This replaces the wasm2js code that lowered them to pessimistic (1-byte aligned) loads and stores. The new pass will do the optimal thing, keeping 2-byte alignment where possible.
This is also nicer as a standalone pass, which has the simple property that after it runs all loads and stores are aligned, instead of some code scattered inside wasm2js.
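A sketch of the lowering for a 4-byte load whose address is known to be 2-byte aligned: two aligned 2-byte loads plus a shift, instead of four 1-byte loads:
#include <cstddef>
#include <cstdint>

// Emulate an unaligned 4-byte little-endian load using two aligned
// 2-byte loads, which is the kind of code the pass can emit when it
// knows the pointer is at least 2-byte aligned.
uint32_t load32From2ByteAligned(const uint16_t* memory, size_t halfwordIndex) {
  uint32_t low = memory[halfwordIndex];       // aligned 16-bit load
  uint32_t high = memory[halfwordIndex + 1];  // aligned 16-bit load
  return low | (high << 16);
}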
| |
Applies the changes in #2065, and temporarily disables the hook since it's too slow to run on a change this large. We should re-enable it in a later commit.
| |
Mass change to apply clang-format to everything. We are applying this in a PR by me so the (git) blame is all mine ;) but @aheejin did all the work to get clang-format set up and all the manual work to tidy up some things to make the output nicer in #2048
| |
In the absence of the target features section or command line flags. When there are command line flags, it is an error if they do not exactly match the target features section, except if --detect-features has been provided.
Also adds a --print-features pass to print the command line flags for all enabled options and uses it to make the feature tests more rigorous.
| |
This allows us to emit a (potentially modified) target features
section and conditionally emit other sections such as the DataCount
section based on the presence of features.
| |
It was previously part of writing a binary, but changing the number of
segments at such a late stage would not work in the presence of bulk
memory's datacount section. Also updates the memory packing pass
to respect the web's limits on the number of data segments.
| |
This adds an ssa-nomerge pass, which, like ssa, creates new local indexes for each set, but does not alter indexes that have merges (in practice, adding indexes to merges can lead to more copies in the end).
This also stops adding a new local index for a set that is already in "ssa form", that is, has only one set (aside from the zero initialization which wasm mandates, but for an "ssa form" index, that must not be used).
This then enables ssa-nomerge in -O3 and -Os. This doesn't help much on well-optimized code like from the wasm backend (but it does sometimes - 0.5% code size improvement on Box2D), but on AssemblyScript for example it can remove a copy in the n-body benchmark as can be seen in the test updates here.
| |
A propagated constant can be helpful in the various patterns in optimize instructions.
Testcase shows an example of this in action - we can optimize out a load offset for a constant, but if we propagated it afterwards, we would miss that.
In general these two passes can help each other, so maybe they should be combined and run for multiple iterations, but that's what --converge is for. Meanwhile this change improves what seems to be the more common case - a guess based on what I noticed in practice, and when I run the fuzzer I see only this type of case.
| |
And run it in wasm-emscripten-finalize. This will prevent the emscripten output from changing when the target features section lands in LLVM.
| |
See #1919 - we did not do this consistently before.
This adds a lowMemoryUnused option to PassOptions. It can be passed on the commandline with --low-memory-unused. If enabled, we run the new optimize-added-constants pass, which does the real work here, replacing older code in post-emscripten.
Aside from running at the proper time (unlike the old pass, see #1919), this also has a -propagate mode, which can do stuff like this:
y = x + 10
[..]
load(y)
[..]
load(y)
=>
y = x + 10
[..]
load(x, offset=10)
[..]
load(x, offset=10)
That is, it can propagate such offsets to the loads/stores. This pattern is common in big interpreter loops, where the pointers are offsets into a big struct of state.
The pass does this propagation by using a new feature of LocalGraph, which can verify which locals are in SSA mode. Binaryen IR is not SSA (intentionally, since it's a later IR), but if a local only has a single set for all gets, that means that local is in such a state, and can be optimized. The tricky thing is that all locals are initialized to zero, so there are at minimum two sets. But if we verify that the real set dominates all the gets, then the zero initialization cannot reach them, and we are safe.
This PR also makes safe-heap aware of lowMemoryUnused. If it is set, we check not just for an access of 0, but for the whole range 0-1023.
This makes zlib 5% faster, with either the wasm backend or asm2wasm. It also makes it 0.5% smaller. Also helps sqlite (1.5% faster) and lua (1% faster).
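A sketch of the core fold on a toy load representation (the real pass works on Binaryen IR and checks SSA-ness via LocalGraph, as described above): when the address is a local plus a constant and low memory is known unused, the constant can move into the load's offset immediate:
#include <cstdint>

// Toy model of a load whose address is `local + addedConstant`.
struct Load {
  int addressLocal;       // local index holding the base pointer
  uint32_t addedConstant; // explicit add in the address computation
  uint32_t offset;        // wasm load offset immediate
};

// Fold the added constant into the offset immediate. This is only valid if
// low memory is known to be unused (--low-memory-unused), since otherwise
// the behavior for small or wrapped addresses could differ.
void foldConstantIntoOffset(Load& load) {
  load.offset += load.addedConstant;
  load.addedConstant = 0;
}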
| |
* optimize normally with debug info - some of it may be removed, but that's the price of higher optimization levels, and by optimizing normally in profiling and -g2 etc. builds they are more comparable to normal ones, yielding better data
* copy debug locations automatically in replaceCurrent in wasm-traversal, so optimization passes at least by default will preserve debuggability
| |
* Finds functions whose return value is always dropped, and removes the return.
* Run multiple iterations of the pass, as one can enable others.
* Do not run DeadArgumentElimination at all if debug info is present (with these improvements, it became much more likely to destroy debug info).
Saves 2.5% on hello world, because of some simple libc calls.
| |
WebAssembly/tool-conventions#93 has a summary of emscripten's current thinking on this. For Binaryen, we don't want to do anything to the producers section by default, but do want it to be possible to optionally remove it. To achieve that, this PR
* creates a --strip-producers pass that removes that section.
* creates a --strip-debug pass that removes debug info, same as the old --strip, which is still around but deprecated.
A followup in emscripten will use this pass by default.
| |
Automated renaming according to
https://github.com/WebAssembly/spec/issues/884#issuecomment-426433329.
| |
Even when we don't want to fully legalize code for JS, we should still legalize things that only JS cares about. In particular, dynCall_* methods are used from JS to call into the wasm table, and if they exist they are only for JS, so we should only legalize them.
The use case motivating this is that in dynamic linking you may want to disable legalization, so that wasm=>wasm module calls are fast even with i64s, but you do still need dynCalls to be legalized even in that case, otherwise an invoke with an i64 parameter would fail.
| |
When emscripten knows that the runtime will not be exited, it can tell codegen to not emit atexit() calls (since those callbacks will never be run). This saves both code size and startup time. In asm2wasm the JSBackend does it directly. For the wasm backend, this pass does the same on the output wasm.
| |
We now emit more sets and tees of if-elses from simplify-locals, and
coalesce-locals is necessary to remove them if they are ineffectual,
that is, if no get will read them.
| |
This is sort of like --strip on a native binary. The more specific use case for us is e.g. you link with a library that has -g in its CFLAGS, but you don't want debug info in your final executable (I hit this with poppler now). We can make emcc pass this to binaryen if emcc is not building an output with intended debug info.
| |
sometimes that is not desirable.
| |
Rely on the dedicated pass for that. It's not worth the extra complexity to try, as we can't easily handle all the cases anyhow.
Add another run of the dedicated name-removing pass in the default passes.
| |
This new pass minifies import and export names, for example, this may minify
(import "env" "longname" (func $internal))
to
(import "env" "a" (func $internal))
By updating the JS that provides those imports/calls those exports, we can use the minified names properly. This can save a useful amount of space in the wasm and JS, see kripken/emscripten#7414
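A sketch of the kind of minified-name generator such a pass needs (illustrative, not the pass's actual scheme): hand out short names like "a", "b", ..., "aa", and remember the mapping so the JS side can be updated to match:
#include <map>
#include <string>

struct NameMinifier {
  std::map<std::string, std::string> oldToNew; // for updating the JS side
  unsigned next = 0;

  std::string minify(const std::string& oldName) {
    auto it = oldToNew.find(oldName);
    if (it != oldToNew.end()) {
      return it->second; // same original name always gets the same short name
    }
    // Encode `next` in base 26 as a short lowercase identifier.
    unsigned n = next++;
    std::string name;
    do {
      name += char('a' + (n % 26));
      n /= 26;
    } while (n > 0);
    oldToNew[oldName] = name;
    return name;
  }
};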
| |
Fixes #1649
This moves us to a single object for functions, which can be imported or not, and likewise for globals (as a result, GetGlobals do not need to check if the global is imported or not, etc.). All imported things now inherit from Importable, which has the module and base of the import, and if they are set then it is an import.
For convenient iteration, there are a few helpers like
ModuleUtils::iterDefinedGlobals(wasm, [&](Global* global) {
  .. use global ..
});
as often iteration only cares about imported or defined (non-imported) things.
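A sketch of that shape (simplified; the real classes live in Binaryen's IR headers): one object type, where a nonempty module/base means it is an import:
#include <string>

// Anything that can be imported: functions, globals, etc.
struct Importable {
  std::string name;   // internal name
  std::string module; // import module, e.g. "env" (empty if not an import)
  std::string base;   // import base name (empty if not an import)

  bool imported() const { return !module.empty(); }
};

struct Global : Importable {
  bool mutable_ = false;
  // defined globals also carry an init expression, elided here
};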
| |
This adds a pass to remove unnecessary call arguments in an LTO-like manner, that is:
* If a parameter is not actually used in a function, we don't need to send anything, and can remove it from the function's declaration. Concretely,
(func $a (param $x i32)
  ..no uses of $x..
)
(func $b
  (call $a (..))
)
=>
(func $a
  ..no uses of $x..
)
(func $b
  (call $a)
)
And
* If a parameter is only ever sent the same constant value, we can just set that constant value in the function (which then means that the values sent from the outside are no longer used, as in the previous point). Concretely,
(func $a (param $x i32)
  ..may use $x..
)
(func $b
  (call $a (i32.const 1))
  (call $a (i32.const 1))
)
=>
(func $a
  (local $x i32)
  (set_local $x (i32.const 1))
  ..may use $x..
)
(func $b
  (call $a)
  (call $a)
)
How much this helps depends on the codebase obviously, but sometimes it is pretty useful. For example, it shrinks 0.72% on Unity and 0.37% on Mono. Note that those numbers include not just the optimization itself, but the other optimizations it then enables - in particular the second point from earlier leads to inlining a constant value, which often allows constant propagation, and also removing parameters may enable more duplicate function elimination, etc. - which explains how this can shrink Unity by almost 1%.
Implementation is pretty straightforward, but there is some work to make the heavy part of the pass parallel, and a bunch of corner cases to avoid (can't change a function that is exported or in the table, etc.). Like the Inlining pass, there is both a standard and an "optimizing" version of this pass - the latter also optimizes the functions it changes, as like Inlining, it's useful to not need to re-run all function optimizations on the whole module.
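A sketch of the analysis behind the second point, over a toy list of call-site arguments (illustrative types): a parameter qualifies only if every call passes the same known constant:
#include <cstdint>
#include <optional>
#include <vector>

// One call site's argument for a particular parameter index:
// either a known constant, or "something else" (nullopt).
using Argument = std::optional<int32_t>;

// If every call passes the same constant for this parameter, return it;
// otherwise return nothing and the parameter must be kept as-is.
std::optional<int32_t> constantForParam(const std::vector<Argument>& argsAtCalls) {
  std::optional<int32_t> common;
  for (const auto& arg : argsAtCalls) {
    if (!arg) {
      return std::nullopt; // a non-constant argument; give up
    }
    if (!common) {
      common = arg;
    } else if (*common != *arg) {
      return std::nullopt; // two different constants; give up
    }
  }
  return common; // note: nullopt if there are no calls at all
}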
| |
This adds an licm pass. Not that important for LLVM-originating code obviously, but for AssemblyScript and other non-LLVM compilers this might help a lot. Also when wasm has GC a bunch more non-LLVM languages may arrive that can benefit.
The pass is mostly straightforward. I considered using the DataFlow IR since it's in SSA form, or the CFG IR, but in the end it's actually pretty convenient to use the main IR as it is - with explicit loops already present - plus LocalGraph which connects each get to the sets influencing it.
Passed a bunch of fuzzing, and also the emscripten test suite at -O1 with licm added to the default passes (but I don't think it would make sense to run this by default, as LLVM doesn't need it).
We limit code moved by this pass as follows: An increased code size on fuzz testcases (and, more rarely, on real inputs) can happen due to stuff like this:
(loop
  (set_local $x (i32.const 1))
  ..
)
=>
(set_local $x (i32.const 1))
(loop
  ..
)
For a const or a get_local, such an assignment to a local is both very cheap (a copy to another local may be optimized out later), and moving it out may prevent other optimizations (since we have no pass that tries to move code back into a loop - edit: well, not by default; precompute-propagate etc. would do it, but those are only run at high opt levels). So I made the pass not move such trivial code (sets/tees of consts or gets). However, the risk remains if code is moved out that is later reduced to a constant, so something like -Os --flatten --licm -Os may make sense.
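A sketch of the basic hoisting test over a toy expression summary (the real pass uses LocalGraph and effect analysis): an expression can move out only if it has no side effects, is not one of the trivial cases above, and reads no local that is set inside the loop:
#include <set>

// Toy summary of a candidate expression inside a loop.
struct Candidate {
  std::set<int> localsRead; // local indexes the expression reads
  bool hasSideEffects;      // calls, stores, possible traps, ...
  bool isTrivial;           // a bare const / get_local set, as discussed above
};

bool canHoist(const Candidate& expr, const std::set<int>& localsSetInLoop) {
  if (expr.hasSideEffects || expr.isTrivial) {
    return false; // unsafe, or not worth moving (see the size concern above)
  }
  for (int local : expr.localsRead) {
    if (localsSetInLoop.count(local)) {
      return false; // depends on a value that changes during the loop
    }
  }
  return true;
}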
| |
Background: google/souper#323
This adds a --souperify pass, which emits Souper IR in text format. That can then be read by Souper which can emit superoptimization rules. We hope that eventually we can integrate those rules into Binaryen.
How this works is we emit an internal "DataFlow IR", which is an SSA-based IR, and then write that out into Souper text.
This also adds a --dfo pass, which stands for data-flow optimizations. A DataFlow IR is generated, like in souperify, and then some trivial optimizations are performed using it. There are very few things it can do that our other optimizations can't already, but this is also good testing for the DataFlow IR, plus it is good preparation for using Souper's superoptimization output (which would also construct DataFlow IR, like here, but then do some matching on the Souper rules).