| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
| |
The DCE pass is one of the oldest in binaryen, and had quite a lot of
cruft from the changes in unreachability and other stuff in wasm and
binaryen's history. This PR rewrites it from scratch, making it about
1/3 the size.
I noticed this when looking for places to use code autogeneration.
The old version had annoying boilerplate, while the new one avoids
any need for it.
There may be noticeable differences, as the old pass did more than
it needed to. It overlapped with remove-unused-names for some
reason I don't remember. The new pass leaves that to the other
pass to do. I added another run of remove-unused-names to avoid
noticeable differences in optimized builds, but you can see
differences in the testcases that only run DCE by itself. (The test
differences in this PR are mostly whitespace.)
(The overlap is that if a block ended up not needed, that is, all
branches to it were removed, the old DCE would remove the block.)
This pass is about 15% faster than the old version. However, when
adding another run of remove-unused-names the difference
basically vanishes, so this isn't a speedup.
|
|
|
|
|
|
|
|
|
| |
And associated stack.h. The current stack.h clearly doesn't work with
the llvm backend as it assumes the stack grows up, which means none of these
have been working or used in a long time.
Rather than trying to fix these unused features it's probably cleaner to
just remove them for now and restore them from git history if it's something
that anyone actually wants to use in the future.
|
|
|
|
| |
The use of these passes was removed on the emscripten side
in https://github.com/emscripten-core/emscripten/pull/12536.
|
|
|
|
| |
We can't validate or print out the wasm in that case, but at least
logging the names as they run can help debug some situations.
|
|
|
|
| |
This pass will convert a module with 64-bit loads and stores accessing a 64-bit memory to a regular 32-bit one.
Pointers remain 64-bit but are truncated just before use.
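As a rough illustration of "truncated just before use", here is a toy model in Python (not the pass itself). The helper names and the bytearray standing in for linear memory are made up for this sketch.

```python
MASK32 = 0xFFFFFFFF


def wrap_pointer(ptr64: int) -> int:
    """Model of i32.wrap_i64: keep only the low 32 bits of a 64-bit pointer."""
    return ptr64 & MASK32


def load_u8(memory: bytearray, ptr64: int) -> int:
    # The pointer is still a 64-bit value in the program; it is truncated
    # only here, at the point where the memory access happens.
    return memory[wrap_pointer(ptr64)]
```

Any high bits set in the pointer are silently dropped in this model, which is only safe when the program never really addresses memory above 4GB.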
|
|
|
| |
See emscripten-core/emscripten#11860
|
|
|
|
|
|
| |
This PR contains:
- Changes that enable/disable tests on Windows to allow for better local testing.
- Also changes many abort() into Fatal() when it is really just exiting on error. This is because abort() generates a dialog window on Windows which is not great in automated scripts.
- Improvements to CMake to better work with the project in IDEs (VS).
|
|
|
|
|
|
| |
Two new flags here, one to completely remove dynCalls, and another to
limit them to only signatures that contain i64.
See #3043
|
|
|
|
|
|
|
|
|
| |
* Unifies internal hashing helpers to naturally integrate with std::hash
* Removes the previous custom implementation
* Computed hashes are now always size_t
* Introduces a hash_combine helper
* Fixes an overwritten partial hash in Relooper.cpp
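For readers unfamiliar with the idea, a hash_combine helper folds one more digest into a running hash instead of replacing it. The following is a toy Python model using a boost-style mixing formula; the constant and shifts are an assumption for illustration and may not match what this PR uses.

```python
MASK64 = (1 << 64) - 1  # model a 64-bit size_t


def hash_combine(seed: int, digest: int) -> int:
    """Fold `digest` into `seed` (boost-style mixing, constants assumed)."""
    mixed = digest + 0x9E3779B97F4A7C15 + (seed << 6) + (seed >> 2)
    return (seed ^ mixed) & MASK64


def hash_fields(fields) -> int:
    """Hash a sequence of fields by combining each one into the running seed."""
    seed = 0
    for field in fields:
        # Combining (rather than assigning) keeps the partial hash so far.
        seed = hash_combine(seed, hash(field) & MASK64)
    return seed
```

An "overwritten partial hash" bug is what you get if the loop body assigns the new digest to seed directly: every field before the last one stops contributing to the result.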
|
|
|
|
|
|
|
|
| |
This doesn't lower them - it just replaces the unsupported operation
with a drop. This will be useful for fuzzing, where to compare JS to the
correct semantics we must avoid operations where JS is not always
accurate.
Also fully document the i64 -> f32 conversion issue in JS.
|
|
|
|
|
|
| |
The core logic is still living in EmscriptenGlueGenerator because
it's also used by fixInvokeFunctionNames.
As a followup we can figure out how to make these more independent.
|
|
|
|
|
| |
Pretty trivial, but will be useful in wasm2js testing, where we
can't assume an incorrectly-aligned load/store will still work,
so we'll need to be pessimistic about alignment there.
|
|
|
|
|
| |
This new pass takes an optional stack-check-handler argument
which is the name of the function to call on stack overflow.
If no argument is passed then it just traps.
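A toy model of that behavior in Python, assuming a stack that grows down towards a fixed limit (the direction, the Trap class and all names here are illustrative, not taken from the pass):

```python
class Trap(Exception):
    """Stands in for a wasm trap (unreachable)."""


def make_stack_pointer_setter(stack_limit, handler=None):
    """Return a checked replacement for writes to the stack pointer."""

    def set_stack_pointer(new_sp):
        if new_sp < stack_limit:
            if handler is None:
                # No handler name was given: just trap.
                raise Trap("stack overflow")
            # Otherwise call the user-provided handler function.
            handler()
        return new_sp

    return set_stack_pointer
```

Passing a handler corresponds to giving the stack-check-handler argument; leaving it out gives the plain trap.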
|
|
|
|
| |
Doing it this way happens to re-order the __assign_got_entries
function in the module, but it's otherwise NFC.
|
|
|
| |
First step in making wasm-emscripten-finalize use more passes.
|
|
|
|
|
|
| |
This moves the fuzzer de-NaN logic out into a separate pass. This is
cleaner and also better since the old way would de-NaN once, but then
the reducer could generate code with nans. The new way lets us de-NaN
while reducing.
|
|
|
|
| |
Based on feedback in #2741 it looks like we can use the existing
`simplify-globals-optimizing` pass to trigger the cleanups we need.
|
|
|
|
|
| |
Since the global is never read, we know that any write operation
will be unobservable.
|
|
|
|
|
|
|
|
|
|
|
| |
Anything that merges/swaps/etc. locals, or inlines, or merges functions,
must be disabled for now. However, that does still leave almost all
passes, so this should not affect output sizes much (and the full LLVM
optimizer can be run before too).
Over time we can resolve each of those FIXMEs.
The test output here shows how disabling those allows over twice as
much debug_line info to be preserved.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This replaces imports like env.foo with a.foo, which can
save a bunch of bytes when there are many imported
functions.
Note that by changing all the import names to a it ends
up requiring a single merged import module.
Note also that when doing this we modify all the imports,
minifying their modules and names (since it makes no
sense to be careful about minifying only modules known
to us - env/wasi - if we are minifying the names of all
modules).
This will require an emscripten PR to benefit from it.
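A minimal Python sketch of the renaming, assuming names are handed out in the order a, b, ..., z, aa, ab, ... and that every import moves to the single module "a". The real pass may order names differently or skip some (reserved words, for example); this only shows the shape of the mapping.

```python
import itertools
import string


def short_names():
    """Yield a, b, ..., z, aa, ab, ... (assumed ordering, for illustration)."""
    for length in itertools.count(1):
        for chars in itertools.product(string.ascii_lowercase, repeat=length):
            yield "".join(chars)


def minify_imports(imports):
    """Map each (module, name) import to ("a", short_name)."""
    names = short_names()
    mapping = {}
    for module, name in imports:
        if (module, name) not in mapping:
            # All imports end up in one merged module called "a".
            mapping[(module, name)] = ("a", next(names))
    return mapping
```

The JS side has to supply one merged import object under "a" with the same short names, which is why an emscripten change is needed as well.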
|
|
|
|
|
|
|
|
|
| |
When memory is packed and there are passive segments, bulk memory
operations that reference those segments by index need to be updated to
reflect the new indices and possibly split into multiple instructions
that reference multiple split segments. For some bulk-memory operations,
it is necessary to introduce new globals to explicitly track the drop
state of the original segments, but this PR is careful to only add
globals where necessary.
| |
With this, we can update DWARF debug line info properly as
we write a new binary.
To do that we track binary locations as we write. Each
instruction is mapped to the location it is written to. We
must also adjust them as we move code around because
of LEB optimization (we emit a function or a section
with a 5-byte LEB placeholder, the maximal size; later
we shrink it which is almost always possible).
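To make the placeholder idea concrete, here is a small Python sketch of unsigned LEB128 in both minimal and padded 5-byte form, plus a hypothetical helper that shifts recorded offsets once the placeholder shrinks. The function names are made up for illustration.

```python
def encode_uleb128(value: int) -> bytes:
    """Minimal unsigned LEB128 encoding."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_uleb128_padded(value: int, width: int = 5) -> bytes:
    """Same value, padded with continuation bytes to a fixed width."""
    out = bytearray()
    for i in range(width):
        byte = value & 0x7F
        value >>= 7
        out.append((byte | 0x80) if i < width - 1 else byte)
    assert value == 0, "value does not fit in the placeholder"
    return bytes(out)


def shift_after_shrink(offsets, placeholder_pos, saved_bytes):
    """Move every recorded offset located after the placeholder back."""
    return [o - saved_bytes if o > placeholder_pos else o for o in offsets]
```

For example, a size of 300 takes 2 bytes in minimal form, so shrinking its 5-byte placeholder moves everything after it back by 3 bytes.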
writeDWARFSections() now takes a second param, the new
locations of instructions. It then maps debug line info from the
original offsets in the binary to the new offsets in the binary
being written.
The core logic for updating the debug line section is in
wasm-debug.cpp. It basically tracks state machine logic
both to read the existing debug lines and to emit the new
ones. I couldn't find a way to reuse LLVM code for this, but
reading LLVM's code was very useful here.
A final tricky thing we need to do is to update the DWARF
section's internal size annotation. The LLVM YAML writing
code doesn't do that for us. Luckily it's pretty easy, in
fixEmittedSection we just update the first 4 bytes in place
to have the section size, after we've emitted it and know
the size.
This ignores debug lines with a 0 in the line, col, or addr,
see WebAssembly/debugging#9 (comment)
This ignores debug line offsets into the middle of
instructions, which LLVM sometimes emits for some
reason, see WebAssembly/debugging#9 (comment)
Handling that would likely at least double our memory
usage, which is unfortunate - we are run in an LTO manner,
where the entire app's DWARF is present, and it may be
massive. I think we should see if such odd offsets are
a bug in LLVM, and if we can fix or prevent that.
This does not emit "special" opcodes for debug lines. Those
are purely an optimization, which I wanted to leave for
later. (Even without them we decrease the size quite a lot,
btw, as many lines have 0s in them...)
This adds some testing that shows we can load and save
fib2.c and fannkuch.cpp properly. The latter includes more
than one function and has nontrivial code.
To actually emit correct offsets a few minor fixes are
done here:
* Fix the code section location tracking during reading -
the correct offset we care about is the body of the code
section, not including the section declaration and size.
* Fix wasm-stack debug line emitting. We need to update
in BinaryInstWriter::visit(), that is, right before writing
bytes for the instruction. That differs from
BinaryenIRWriter::visit, which is a recursive function
that also calls the children - so the offset there would be
of the first child. For some reason that is correct with
source maps, I don't understand why, but it's wrong for
DWARF...
* Print code section offsets in hex, to match other tools.
Remove DWARFUpdate pass, which was useful for testing
temporarily, but doesn't make sense now (it just updates without
writing a binary).
cc @yurydelendik
| |
This imports LLVM code for DWARF handling. That code has the
Apache 2 license like us. It's also the same code used to
emit DWARF in the common toolchain, so it seems like a safe choice.
This adds two passes: --dwarfdump which runs the same code LLVM
runs for llvm-dwarfdump. This shows we can parse it ok, and will
be useful for debugging. And --dwarfupdate writes out the DWARF
sections (unchanged from what we read, so it just roundtrips - for
updating we need #2515).
This puts LLVM in thirdparty which is added here.
All the LLVM code is behind USE_LLVM_DWARF, which is on
by default, but off in JS for now, as it increases code size by 20%.
This current approach imports the LLVM files directly. This is not
how they are intended to be used, so it required a bunch of
local changes - more than I expected actually, for the platform-specific
stuff. For now this seems to work, so it may be good enough, but
in the long term we may want to switch to linking against libllvm.
A downside to doing that is that binaryen users would need to
have an LLVM build, and even in the waterfall builds we'd have a
problem - while we ship LLVM there anyhow, we constantly update
it, which means that binaryen would need to be on latest llvm all
the time too (which otherwise, given DWARF is quite stable, we
might not need to constantly update).
An even larger issue is that as I did this work I learned about how
DWARF works in LLVM, and while the reading code is easy to
reuse, the writing code is trickier. The main code path is heavily
integrated with the MC layer, which we don't have - we might want
to create a "fake MC layer" for that, but it sounds hard. Instead,
there is the YAML path which is used mostly for testing, and which
can convert DWARF to and from YAML and from binary. Using
the non-YAML parts there, we can convert binary DWARF to
the YAML layer's nice Info data, then convert that to binary. This
works, however, this is not the path LLVM uses normally, and it
supports only some basic DWARF sections - I had to add ranges
support, in fact. So if we need more complex things, we may end
up needing to use the MC layer approach, or consider some other
DWARF library. However, hopefully that should not affect the core
binaryen code which just calls a library for DWARF stuff.
Helps #2400
|
|
|
|
|
| |
Currently `BINARYEN_PASS_DEBUG=3` prints `.wasm` files but they are
actually text wast files. This makes `BINARYEN_PASS_DEBUG=3` print both
wasm/wast files, where wasm contains a binary file and wast a text file.
|
|
|
|
|
|
| |
This pass writes and reads the module. This shows the effects
of converting to and back from the binary format, and will be
useful in testing dwarf debug support (where we'll need to see
that writing and reading a module preserves debug info properly).
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
clang/llvm introduce __original_main as a workaround for
the fact that main may have different signatures. A downside
to that is that users get it in stack traces, which is confusing.
In -O2 and above we normally inline __original_main anyhow,
but as this is for debugging, non-optimized builds matter too,
so add a pass for this.
The implementation is trivial, just call doInlining. However we
must check some corner cases first.
Bonus minor fixes to FindAllPointers, which unnecessarily
created an object to get the class Id (which is not valid
for all classes), and that it didn't take the input by
reference properly, which meant we couldn't get the
pointer to the function body's toplevel.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This pass strips DWARF debug sections, but not other debug
sections. This is useful when emitting source maps, as we do
need the SourceMapURL section, but the DWARF sections are
no longer necessary (and we've seen a testcase where they
are massively large, so big the wasm can't even be loaded in
a browser...).
Also contains a trivial one-line fix in --extract-function which
was necessary to create the testcase here: that pass extracts
a function from a wasm file (like llvm-extract) but it didn't check
if an export already existed for the function.
|
|
|
|
|
| |
Adds the AssemblyScript-specific passes post-assemblyscript
and post-assemblyscript-finalize, eliminating redundant ARC-style
retain/release patterns conservatively emitted by the compiler.
| |
These passes are meant to be run after Asyncify has been run, they modify the
output. We can assume that we will always unwind if we reach an import, or
that we will never unwind, etc.
This is meant to help with lazy code loading, that is, the ability for an
initially-downloaded wasm to not contain all the code, and if code not present
there is called, we download all the rest and continue with that. That could
work something like this:
* The wasm is created. It contains calls to a special import for lazy code
loading.
* Asyncify is run on it.
* The initially downloaded wasm is created by running
--mod-asyncify-always-and-only-unwind: if the special import for lazy code
loading is called, we will definitely unwind, and we won't rewind in this binary.
* The lazily downloaded wasm is created by running --mod-asyncify-never-unwind:
we will rewind into this binary, but no longer need support for unwinding.
(Optionally, there could also be a third wasm, which has not had Asyncify run
on it, and which we'd swap to for max speed.)
These --mod-asyncify passes allow the optimizer to do a lot of work, especially
for the initially downloaded wasm if we have lots of calls to the lazy code
loading import. In that case the optimizer will see that those calls unwind,
which means the code after them is not reached, potentially making lots of code
dead and removable.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This optimizes stuff like
(global.set $x (i32.const 123))
(global.get $x)
into
(global.set $x (i32.const 123))
(i32.const 123)
This doesn't help much with LLVM output as it's rare to use globals (except for the stack pointer, and that's already well optimized), but it may help on general wasm. It can also help with Asyncify that does use globals extensively.
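A toy version of the idea in Python, over a flat list of tuple-encoded instructions rather than real IR. The encoding and the very conservative "forget everything on any other instruction" rule are assumptions of this sketch, not a description of the pass.

```python
def propagate_global_constants(instrs):
    """Replace a global.get with the constant last written to that global."""
    known = {}  # global name -> constant instruction most recently set
    out = []
    for instr in instrs:
        op = instr[0]
        if op == "global.set":
            name, value = instr[1], instr[2]
            if value[0] == "i32.const":
                known[name] = value
            else:
                # Unknown value (and possibly side effects): forget everything.
                known.clear()
            out.append(instr)
        elif op == "global.get" and instr[1] in known:
            out.append(known[instr[1]])
        elif op == "global.get":
            out.append(instr)
        else:
            # Calls, branches etc. may change globals; be conservative.
            known.clear()
            out.append(instr)
    return out
```

Running it on the two-instruction example above leaves the global.set in place and turns the global.get into (i32.const 123).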
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is both an optimization and a workaround for the problem that emscripten-core/emscripten#7641 uncovered, and because of which that PR had to be reverted.
What's going on there is that wasm-emscripten-finalize turns emscripten_longjmp_jmpbuf into emscripten_longjmp (for some LLVM internal reason - there's a long comment in the source that I didn't fully follow). There are two such imports already, one for each name, and before that PR, we ended up with just one. After that PR, we end up with two. And with two, the minification of import names gets confused - we have two imports with the same name, and the code there ends up ignoring one of them.
I'm not sure why that PR changed things - I guess the wasm-emscripten-finalize code looks at the name, and that PR changed what name appears? @sbc100 maybe #2285 is related?
Anyhow, it's not trivial to make import minification code support two identical imports, but I don't think we should - we should avoid having such duplication anyhow. And we should add an assert that they don't exist (I'll open a PR for that later when it's possible).
This fixes the duplication by adding a useful pass to remove duplicate imports (just functions, for now). Pretty simple, but we didn't do it yet. Even if there is a wasm-emscripten-finalize bug we need to fix with those duplicate imports, I think this pass is still a good thing to add.
I confirmed that this fixes the issue caused by that PR.
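Roughly, the deduplication can be pictured like this in Python. The tuple layout and the choice to key on (module, base, signature) are assumptions of the sketch.

```python
def dedupe_function_imports(imports):
    """imports: list of (internal_name, module, base, signature) tuples.

    Returns the imports to keep, plus a map from each removed internal
    name to the surviving one, so that calls can be redirected.
    """
    first_seen = {}
    kept = []
    replacements = {}
    for name, module, base, signature in imports:
        key = (module, base, signature)
        if key in first_seen:
            replacements[name] = first_seen[key]
        else:
            first_seen[key] = name
            kept.append((name, module, base, signature))
    return kept, replacements
```

After this, every call to a removed import needs to be rewritten to its replacement, and import minification only ever sees one import per external name.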
|
|
|
|
|
|
|
|
|
| |
(#2242)
Main change here is in pass.h, everything else is changes to work with the new API.
The add("name") remains as before, while the weird variadic add(..) which constructed the pass now just gets a std::unique_ptr of a pass. This also makes the memory management internally fully automatic. And it makes it trivial to parallelize WalkerPass::run on parallel passes.
As a benefit, this allows removing a lot of code since in many cases there is no need to create a new pass runner, and running a pass can be just a single line.
|
|
|
|
|
| |
* Clarify the difference between old and new Asyncify.
* Remove the old --bysyncify pass option.
|
|
|
|
|
|
|
| |
After some discussion this seems like a less confusing name: what the pass does is "asyncify" code, after all.
The one downside is the name overlaps with the old emscripten "Asyncify" utility, which we'll need to clarify in the docs there.
This keeps the old --bysyncify flag around for now, which is helpful for avoiding temporary breakage on CI as we move the emscripten side as well.
|
|
|
|
|
| |
Fix and test mutable globals support, replace string literals with
constants, and add a pass to emit the target features section.
|
|
|
|
|
|
|
|
|
| |
This adds a new pass, Bysyncify, which transforms code to allow unwinding and rewinding the call stack and local state. This allows things like coroutines, turning synchronous code asynchronous, etc.
The new pass file itself has a large comment on top with docs.
So far the tests here seem to show this works, but this hasn't been tested heavily yet. My next step is to hook this up to emscripten as a replacement for asyncify/emterpreter, see emscripten-core/emscripten#8561
Note that this is completely usable by itself, so it could be useful for any language that needs coroutines etc., and not just ones using LLVM and/or emscripten. See docs on the ABI in the pass source.
|
|
|
|
|
|
|
|
|
|
| |
* work
* fix
* fix
* format
|
|
|
|
|
|
| |
This is useful for front-ends which wish to selectively enable or
disable coloring.
Also expose these APIs from the C API.
|
|
|
|
|
| |
In JS a reinterpret is especially expensive, as we implement it as a write to a temp buffer and a read using another view. This finds places where we load a value from memory, then reinterpret it later - in that case, we can load it using another view, at the cost of another load and another local.
This is helpful on things like Box2D, where there are many reinterprets due to the main 2D vector class being a union over two floats/ints, and LLVM likes to do a single i64 load of them.
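The equivalence this relies on can be shown with Python's struct module: reinterpreting a loaded float gives the same bits as loading from the same address with an integer "view". The helper names are illustrative only.

```python
import struct


def reinterpret_f32_as_i32(value: float) -> int:
    """Model of i32.reinterpret_f32 via a temp buffer, as JS has to do it."""
    return struct.unpack("<i", struct.pack("<f", value))[0]


def load_f32(memory: bytearray, addr: int) -> float:
    return struct.unpack_from("<f", memory, addr)[0]


def load_i32(memory: bytearray, addr: int) -> int:
    return struct.unpack_from("<i", memory, addr)[0]
```

So load-then-reinterpret can be replaced by a second load through the other view, which is the trade described above: one more load and local, but no temp buffer round trip.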
|
|
|
| |
Helps to avoid trampling each other when binaryen is called multiple times from emcc, for example.
|
|
|
|
|
|
|
| |
If a global is marked mutable but not assigned to, make it immutable.
If an immutable global is a copy of another, use the original, so we can remove the duplicates.
Fixes #2011
|
|
|
|
|
| |
This replaces the wasm2js code that lowered them to pessimistic (1-byte aligned) loads and stores. The new pass will do the optimal thing, keeping 2-byte alignment where possible.
This is also nicer as a standalone pass, which has the simple property that after it runs all loads and stores are aligned, instead of some code scattered inside wasm2js.
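As a sketch of what "keeping 2-byte alignment where possible" means for a 32-bit little-endian load, in Python (a toy model with made-up helper names, not the pass):

```python
def load_u16(memory: bytes, addr: int) -> int:
    return int.from_bytes(memory[addr:addr + 2], "little")


def lowered_load_i32(memory: bytes, addr: int, align: int) -> int:
    """Unsigned 32-bit load using only accesses of size <= align."""
    if align >= 4:
        # Already naturally aligned: a single 4-byte load.
        return int.from_bytes(memory[addr:addr + 4], "little")
    if align == 2:
        # Two aligned 2-byte loads, combined with a shift and an or.
        return load_u16(memory, addr) | (load_u16(memory, addr + 2) << 16)
    # Pessimistic case: four 1-byte loads.
    return sum(memory[addr + i] << (8 * i) for i in range(4))
```

The alignment-2 case needs two loads instead of the four that the old pessimistic lowering would have used.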
|
|
|
| |
Applies the changes in #2065, and temporarily disables the hook since it's too slow to run on a change this large. We should re-enable it in a later commit.
|
|
|
| |
Mass change to apply clang-format to everything. We are applying this in a PR by me so the (git) blame is all mine ;) but @aheejin did all the work to get clang-format set up and all the manual work to tidy up some things to make the output nicer in #2048
|
|
|
|
|
| |
In the absence of the target features section or command line flags. When there are command line flags, it is an error if they do not exactly match the target features section, except if --detect-features has been provided.
Also adds a --print-features pass to print the command line flags for all enabled options and uses it to make the feature tests more rigorous.
|
|
|
|
|
| |
This allows us to emit a (potentially modified) target features
section and conditionally emit other sections such as the DataCount
section based on the presence of features.
|
|
|
|
|
|
| |
It was previously part of writing a binary, but changing the number of
segments at such a late stage would not work in the presence of bulk
memory's datacount section. Also updates the memory packing pass
to respect the web's limits on the number of data segments.
|
| |
|