| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
|
|
| |
Specified at
https://github.com/WebAssembly/half-precision/blob/main/proposals/half-precision/Overview.md
|
|
|
|
|
|
| |
Almost no actual change in the files except for a license update. The new license
is a proper FOSS one, it turns out, see #5947
Fixes #5947
|
|
|
|
|
| |
Unfortunately there isn't a single place where an error may occur. I tested on
several files with different flags and added sufficient warnings so that we warn
on them all.
|
|
|
| |
This was missed in #4536.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add gtest as a git submodule in third_party and integrate it into the build the
same way WABT does. Adds a new executable, `binaryen-unittests`, to execute
`gtest_main`. As a nontrivial example test, port one of the `TypeBuilder` tests
from example/ to gtest/.
Using gtest has a number of advantages over the current example tests:
- Tests are compiled and linked at build time rather than runtime, surfacing
errors earlier and speeding up test execution.
- Tests are all built into a single binary, reducing overall link time and
further reducing test overhead.
- Tests are built from the same CMake project as the rest of Binaryen, so
compiler settings (e.g. sanitizers) are applied uniformly rather than having
to be separately set via the COMPILER_FLAGS environment variable.
- Using the industry-standard gtest rather than our own script reduces our
maintenance burden.
Using gtest will lower the barrier to writing C++ tests and will hopefully lead
to us having more proper unit tests.
|
|
|
|
|
| |
(on VS2019)
Triggered by:
wasm-emscripten-finalize --minimize-wasm-changes -g --bigint --no-dyncalls --dwarf test_asan_api.wasm -o test_asan_api.wasm --detect-features
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
A recent change in LLVM causes it to sometimes end up with a thing
with no parent. That is, a debug_line or a debug_loc that has no CU that
refers to it. This is perhaps LLVM DCEing CUs, or something else that
changed - not sure. But it seems like valid DWARF we should handle.
This PR handles that in our code. Two things broke here. First, locs must
be simply ignored when there is no CU. Second, lines are trickier as we
used to compute their position by scanning them, and that list contained
only ones with a CU. So we missed some and ended up with wrong offsets.
To make things simpler and more robust, just track the position of each
line table on itself.
Fixes #3697
|
|
|
|
|
| |
The address size was hard-coded to 4, it now gets this information from .debug_info.
This required changing the parsing order.
Also made failure to parse .debug_loc fail the program, as before this error was easy to ignore.
|
|
|
| |
And fix errors from such a build.
|
|
|
| |
This removes `exnref` type and `br_on_exn` instruction.
|
|
|
|
| |
We change the AddrSize which causes all DW_FORM_addr to be written differently.
Depends on https://reviews.llvm.org/D91395
|
|
|
|
|
|
| |
Apparently I misunderstood the DWARF spec on this. Abbreviation offsets are all
relative to 1 (as 0 is not a valid number). For some reason I thought the first DIE's
index was the "base", as in practice LLVM always emits the lowest index there,
and that's what the LLVM YAML code suggested to me.
|
|
|
| |
Adds a new script `./third_party/setup.py` to conveniently install necessary dependencies for testing and fuzzing, including the SpiderMonkey JS shell (mozjs), the V8 JS shell and WABT. Other scripts now automatically pick these up when installed and fall back to look for the tools in PATH like before.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The previous code assumed that each compile unit had its own
abbreviation section, and they are all in order. That's normally how
LLVM emits things, but in #2992 there is a testcase in which linking
of object files with IR files somehow ends up with a different order.
The proper fix is to track the binary offsets of abbreviations in the
abbreviation section. That section is comprised of null-terminated
lists, which each CU has an offset to the beginning of. With those
offsets, we can match things properly.
Add a testcase that crashes without this, to prevent regressions.
Fixes #2992
Fixes #3007
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
(#2862)
In the .debug_loc section the Start/End address offsets in a location list are
relative to the address of the compilation unit that refers that location list.
There is a problem in function wasm::Debug:: updateLoc(), which compares these
offsets with the actual module addresses of expressions and functions, causing
the generation of invalid location lists.
The fix is not trivial, because the DWARF debug_loc section does not specify
which is the compilation unit associated to each location list entry.
A simple workaround is to store, in LocationUpdater, a map of location list
offsets to the base address of the compilation units referencing them, and that
can be easily calculated in updateDIE().
|
|
|
|
|
|
|
|
|
|
| |
Such a module can't have valid DIEs, since we have no way to
interpret them.
Also check if DWARF sections from LLVM have contents -
when they are empty the section may exist but have a null
for its data.
Fixes #2673
|
|
|
|
| |
:: (#2669)
|
|
|
|
|
|
|
|
|
| |
Each compilation unit's abbreviations must be terminated by
a zero, so that we use the right abbreviations. This adds that
support to the YAML layer, both adding the zeros and parsing
them to look in the right abbreviation section at the right time.
Also add two large testcases, zlib and cubescript, which
crash without this and the last PR.
|
|
|
|
|
|
|
|
| |
(#2628)
The debug_line section is the only one in which we change
sizes and so must update offsets. It turns out that there are such
offsets, DW_AT_stmt_list, so without updating them we can't
handle multi-unit dwarf files.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add support for that section to the YAML layer, and add
code to update it.
The updating is slightly tricky - unlike .debug_ranges, the
size of entries is not fixed. So we can't just skip entries,
as the end marker is smaller than a normal entry. Instead,
replace now-invalid segments with (1, 1) which is of size
0 and so should be ignored by the debugger (we can't use
(0, 0) as that would be an end marker, and (-1, *) is
the special base marker).
In the future we probably do want to do this in a more
sophisticated manner, completely rewriting the indexes
into the section as well. For now though this should be
enough for when binaryen does not optimize (as we
don't move/reorder anything).
Note that this doesn't update the location description
(like where on the wasm expression stack the value is).
Again, that is correct for when binaryen doesn't
optimize, but for fully optimized builds we would need
to track things (which would be hard!).
Also clean up some code that uses "Extra" instead of
"Delimiter" that was missed before, and shorten some
unnecessarily long names.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Fixes the testcase in #2343 (comment)
Looks like that's from Rust. Not sure why it would have an invalid
abbreviation code, but perhaps the LLVM there emits dwarf differently
than we've tested on so far. May be worth investigating further, but
for now emit a warning, skip that element, and don't crash.
Also fix valgrind warnings about Span values not being initialized,
which was invalid and bad as well (wasted memory in our maps,
and might have overlapped with real values), and interfered with
figuring this out.
|
|
|
|
| |
From
llvm/llvm-project@adf7a0a
|
|
|
|
|
| |
gimli-rs and perhaps also the spec require a final 0
to terminate the list. LLVM itself is fine without that.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Multiple tables appear to be emitted when linking files
together. This fixes our support for that, which did not
update their size properly. This required patching the
YAML emitting code from LLVM in order to measure
the size and then emit it, as that code is apparently
not designed to handle changes in line table
contents.
Other minor fixes:
* Set the flags for our dwarfdump command to emit
the same as llvm-dwarfdump does with -v -all.
* Add support for a few more opcodes,
set_discriminator, set_basic_block, fixed_advance_pc,
set_isa.
* Handle a compile unit without abbreviations in the
YAML code (again, apparently not something this
LLVM code was intended to do).
* Handle a compile unit with zero entries in the
YAML code (ditto).
* Properly set the AddressSize - we use the
DWARFContext in a different way than LLVM expects,
apparently.
With this the emscripten test suite passes with
-gforce_dwarf without crashing.
My overall impression so from the the YAML code is
that it probably isn't a long-term solution for us. Perhaps
it may end up being scaffolding, that is, we can
replace it with our own code eventually that is based
on it, and remove most of the LLVM code. Before
deciding that we should get everything working first,
and this seems like the quickest path there.
|
|
|
|
| |
This requires removing some more LLVM code, which
was only pulled in a debug build.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
With this, we can update DWARF debug line info properly as
we write a new binary.
To do that we track binary locations as we write. Each
instruction is mapped to the location it is written to. We
must also adjust them as we move code around because
of LEB optimization (we emit a function or a section
with a 5-byte LEB placeholder, the maximal size; later
we shrink it which is almost always possible).
writeDWARFSections() now takes a second param, the new
locations of instructions. It then maps debug line info from the
original offsets in the binary to the new offsets in the binary
being written.
The core logic for updating the debug line section is in
wasm-debug.cpp. It basically tracks state machine logic
both to read the existing debug lines and to emit the new
ones. I couldn't find a way to reuse LLVM code for this, but
reading LLVM's code was very useful here.
A final tricky thing we need to do is to update the DWARF
section's internal size annotation. The LLVM YAML writing
code doesn't do that for us. Luckily it's pretty easy, in
fixEmittedSection we just update the first 4 bytes in place
to have the section size, after we've emitted it and know
the size.
This ignores debug lines with a 0 in the line, col, or addr,
see WebAssembly/debugging#9 (comment)
This ignores debug line offsets into the middle of
instructions, which LLVM sometimes emits for some
reason, see WebAssembly/debugging#9 (comment)
Handling that would likely at least double our memory
usage, which is unfortunate - we are run in an LTO manner,
where the entire app's DWARF is present, and it may be
massive. I think we should see if such odd offsets are
a bug in LLVM, and if we can fix or prevent that.
This does not emit "special" opcodes for debug lines. Those
are purely an optimization, which I wanted to leave for
later. (Even without them we decrease the size quite a lot,
btw, as many lines have 0s in them...)
This adds some testing that shows we can load and save
fib2.c and fannkuch.cpp properly. The latter includes more
than one function and has nontrivial code.
To actually emit correct offsets a few minor fixes are
done here:
* Fix the code section location tracking during reading -
the correct offset we care about is the body of the code
section, not including the section declaration and size.
* Fix wasm-stack debug line emitting. We need to update
in BinaryInstWriter::visit(), that is, right before writing
bytes for the instruction. That differs from
* BinaryenIRWriter::visit which is a recursive function
that also calls the children - so the offset there would be
of the first child. For some reason that is correct with
source maps, I don't understand why, but it's wrong for
DWARF...
* Print code section offsets in hex, to match other tools.
Remove DWARFUpdate pass, which was useful for testing
temporarily, but doesn't make sense now (it just updates without
writing a binary).
cc @yurydelendik
|
|
This imports LLVM code for DWARF handling. That code has the
Apache 2 license like us. It's also the same code used to
emit DWARF in the common toolchain, so it seems like a safe choice.
This adds two passes: --dwarfdump which runs the same code LLVM
runs for llvm-dwarfdump. This shows we can parse it ok, and will
be useful for debugging. And --dwarfupdate writes out the DWARF
sections (unchanged from what we read, so it just roundtrips - for
updating we need #2515).
This puts LLVM in thirdparty which is added here.
All the LLVM code is behind USE_LLVM_DWARF, which is on
by default, but off in JS for now, as it increases code size by 20%.
This current approach imports the LLVM files directly. This is not
how they are intended to be used, so it required a bunch of
local changes - more than I expected actually, for the platform-specific
stuff. For now this seems to work, so it may be good enough, but
in the long term we may want to switch to linking against libllvm.
A downside to doing that is that binaryen users would need to
have an LLVM build, and even in the waterfall builds we'd have a
problem - while we ship LLVM there anyhow, we constantly update
it, which means that binaryen would need to be on latest llvm all
the time too (which otherwise, given DWARF is quite stable, we
might not need to constantly update).
An even larger issue is that as I did this work I learned about how
DWARF works in LLVM, and while the reading code is easy to
reuse, the writing code is trickier. The main code path is heavily
integrated with the MC layer, which we don't have - we might want
to create a "fake MC layer" for that, but it sounds hard. Instead,
there is the YAML path which is used mostly for testing, and which
can convert DWARF to and from YAML and from binary. Using
the non-YAML parts there, we can convert binary DWARF to
the YAML layer's nice Info data, then convert that to binary. This
works, however, this is not the path LLVM uses normally, and it
supports only some basic DWARF sections - I had to add ranges
support, in fact. So if we need more complex things, we may end
up needing to use the MC layer approach, or consider some other
DWARF library. However, hopefully that should not affect the core
binaryen code which just calls a library for DWARF stuff.
Helps #2400
|