path: root/src/wasm/wasm-debug.cpp
Commit message | Author | Age | Files | Lines
* DWARF: Ignore debug_loc spans that are invalid (#2939) | Alon Zakai | 2020-07-01 | 1 | -2/+3
  An (x, y) span is updated to some (q, r) in the new binary. If q > r then the span is no longer valid - the optimizer has reordered things too much. It's possible this could be flipped, but I'm not certain. It seems safer to just omit these, which are very rare (I only see this on some larger testcases in the emscripten test suite).
* DWARF: Never emit (0, 0) to mean an empty span in debug_loc (#2940) | Alon Zakai | 2020-07-01 | 1 | -0/+10
  After mapping to the new positions, and after relativizing to the base, if we end up with (0, 0) then we must emit something else, as that would be interpreted as the end of a list. As it is an empty span, the actual value doesn't matter, it just has to be != 0.

  This can happen if the very first span in a compile unit is an empty span, in which case relative to the base of the compile unit we would have (0, 0).
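The rule described in this commit can be sketched as follows. This is a minimal illustration, not binaryen's code: `relativizeSpan` is a hypothetical helper, and the choice of (1, 1) as the replacement is an assumption of the sketch (the message only requires a value != 0).

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Hypothetical helper (not binaryen's API): make an already-mapped span
// relative to the compile unit base without ever producing (0, 0), which a
// reader of .debug_loc would take as the end-of-list marker.
std::pair<uint32_t, uint32_t>
relativizeSpan(uint32_t newStart, uint32_t newEnd, uint32_t base) {
  assert(base <= newStart && newStart <= newEnd);
  uint32_t start = newStart - base;
  uint32_t end = newEnd - base;
  if (start == 0 && end == 0) {
    // An empty span sitting exactly at the base. Any equal, nonzero pair is
    // still an empty span; (1, 1) is an assumption of this sketch.
    return {1, 1};
  }
  return {start, end};
}
```

For example, `relativizeSpan(100, 100, 100)` yields (1, 1), while `relativizeSpan(100, 120, 100)` yields the ordinary (0, 20).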
* DWARF: Always update .debug_loc base offsets (#2936) | Alon Zakai | 2020-06-30 | 1 | -47/+88
  .debug_loc entries can have bases: a value that all values after it in the list are relative to. Previously we used to keep the base value as it was, to keep things as similar to the original DWARF as possible. However, if optimizations move code around so that the values after the base are before the base, then the values could no longer be emitted, and we skipped them in effect.

  This PR makes us always pick a new base for each list. This allows the base to always work for the values after it, but does mean we change the lists quite a lot more. If there is any extra meaning to the original bases here we may lose that, but the DWARF spec doesn't seem to indicate anything like that (however, it isn't clear to me why LLVM then doesn't always choose the maximal base as the code here does - LLVM's values seem oddly arbitrary).

  Also properly note the base of each compile unit: previously we noted only the old value, and didn't look at the new one in the new binary being written.
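One way to read "always pick a new base for each list" is sketched below. It assumes the "maximal base" is the smallest start address in the list, since that is the largest base for which every relative offset stays non-negative. `LocSpan`, `RebasedList` and `rebase` are hypothetical names for illustration only, and this may differ from what wasm-debug.cpp actually does.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical types for illustration; not binaryen's data structures.
struct LocSpan {
  uint32_t start;
  uint32_t end;
};

struct RebasedList {
  uint32_t base;              // new base chosen for this list
  std::vector<LocSpan> spans; // spans relative to that base
};

// Assumption of this sketch: the new base is the minimum start address, so
// no span can end up "before the base" after code has been reordered.
RebasedList rebase(const std::vector<LocSpan>& absolute) {
  assert(!absolute.empty());
  uint32_t base = absolute[0].start;
  for (const LocSpan& span : absolute) {
    base = std::min(base, span.start);
  }
  RebasedList out{base, {}};
  for (const LocSpan& span : absolute) {
    out.spans.push_back({span.start - base, span.end - base});
  }
  return out;
}
```

With absolute spans (300, 310) and (120, 150), this picks base 120 and emits (180, 190) and (0, 30), even though the second span now sits before the first in the binary.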
* DWARF: Track sequences so that we can handle reordering within one (#2932) | Alon Zakai | 2020-06-25 | 1 | -27/+43
  Previously we tracked sequence ends, so if an instruction was marked as the end, we'd keep marking it that way in the output. However, if X, Y, Z form a sequence that is then reordered into Z, Y, X then we need to emit the end on X now.

  To do that, give a "sequence number" to each debug line. Then when emitting, we can tell if two adjacent lines are in a sequence or not, and emit the end properly.

  This fixes a large partner testcase, allowing llvm-dwarfdump --verify --debug-line to pass on it.

  With this change it is easier to remove the hackish handling of prologueEnd that we had before, where we reset it. Instead, just emit it when it is set, and that's all. In particular we can get rid of the "// Reset the state" and resetAfterLine() calls in emitDiff. That function now just emits a diff, with no side effects, and is marked const.

  This refactoring moves the needToEmit() check to an earlier place. Instead of noting lines we'll never emit, don't even note them at all.

  The test diff seems large, but it is all due to one small change that then changes all the later offsets:

    - 0x00000831: 01 DW_LNS_copy
    -             0x000000000000086e 43 4 1 0 0 is_stmt
    + 0x00000831: 00 DW_LNE_end_sequence
    +             0x000000000000086e 43 4 1 0 0 is_stmt end_sequence

  Note how we add end_sequence there. We used to have an entry right after it with line 0 that was marked as the end of the sequence. In the new code, we don't emit that unnecessary line (which was previously only emitted for the end sequence!) and instead emit the end sequence on the last valid line.
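The sequence-number idea can be sketched as below. `DebugLine` and `markSequenceEnds` are hypothetical names, and the real emitter works on a line-table state machine rather than a flat vector. The sketch only shows the comparison of adjacent lines described above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical record for illustration: a debug line tagged with the id of
// the sequence it belonged to in the original binary.
struct DebugLine {
  uint32_t addr;
  uint32_t line;
  uint32_t sequenceId;
};

// Given lines in their NEW output order, decide which ones must carry
// end_sequence: a line ends its sequence if it is the last line overall or
// if the next line belongs to a different sequence.
std::vector<bool> markSequenceEnds(const std::vector<DebugLine>& lines) {
  std::vector<bool> ends(lines.size(), false);
  for (size_t i = 0; i < lines.size(); i++) {
    bool isLast = i + 1 == lines.size();
    ends[i] = isLast || lines[i + 1].sequenceId != lines[i].sequenceId;
  }
  return ends;
}
```

If X, Y, Z (all in sequence 0) are written out as Z, Y, X, the end lands on X because X is now the last line before a different sequence id begins, regardless of which line carried the end in the input.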
* DWARF: Fix sequence_end emitting (#2929) | Alon Zakai | 2020-06-24 | 1 | -2/+3
  We must emit those, even if otherwise it looks like a line we can omit, as the ends of sequences have important meaning and dwarfdump will warn without them.

  Looks like fannkuch0 in the test suite already had an example of an incorrectly-omitted sequence_end, so no need for a new testcase.

  Verified that without this e.g. wasm2.test_exceptions with -g added will lead to a wasm that warns, but with this PR the debug_line section is reported as valid by dwarfdump.
* Fix DWARF location list updating with nonzero compilation unit base addr (#2862) | Paolo Severini | 2020-05-27 | 1 | -3/+32
  In the .debug_loc section the Start/End address offsets in a location list are relative to the address of the compilation unit that refers to that location list. There is a problem in function wasm::Debug::updateLoc(), which compares these offsets with the actual module addresses of expressions and functions, causing the generation of invalid location lists.

  The fix is not trivial, because the DWARF debug_loc section does not specify which compilation unit is associated with each location list entry. A simple workaround is to store, in LocationUpdater, a map of location list offsets to the base address of the compilation units referencing them, which can be easily calculated in updateDIE().
* [dwarf] Handle a bad mapped base in debug_loc updating (#2859) | Alon Zakai | 2020-05-18 | 1 | -3/+17
  Turns out we had a testcase for this already, but were doing the wrong thing on it.
* DWARF: Ignore a compile unit with no abbreviations (#2678) | Alon Zakai | 2020-03-04 | 1 | -1/+1
  Such a module can't have valid DIEs, since we have no way to interpret them.

  Also check if DWARF sections from LLVM have contents - when they are empty the section may exist but have a null for its data.

  Fixes #2673
* DWARF: Fix debug_range handling of invalid entries (#2662) | Alon Zakai | 2020-02-18 | 1 | -18/+14
  If an invalid entry appears - either it began as such, or became invalid after optimization - we should not emit (0, 0) which is an end marker. Instead, emit an invalid entry marker, something with (0, x) for x != 0.

  As a bonus, if a test/passes case has "noprint" in the name, don't print the wasm, which we do by default. In the testcase here for example we just care about the dwarf, and the printed module would be quite large.

  Thank you to @paolosevMSFT for identifying and suggesting the fix.
* DWARF: Update DW_AT_stmt_list which are offsets into the debug_line section (#2628) | Alon Zakai | 2020-01-28 | 1 | -21/+51
  The debug_line section is the only one in which we change sizes and so must update offsets. It turns out that there are such offsets, DW_AT_stmt_list, so without updating them we can't handle multi-unit dwarf files.
* DWARF: Properly emit signed 32 bit values for advance_line (#2625) | Alon Zakai | 2020-01-24 | 1 | -1/+4
  The LLVM SData field is 64-bit (to support 64-bit addresses I suppose) so when we assigned to it we actually led it to emit an LEB for a signed 64-bit value that is an unsigned 32-bit one. This worked in LLVM (where I guess it forces the value to 32-bit anyhow?) but failed in gimli (where I guess it doesn't?).
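The pitfall can be reproduced in isolation. The sketch below is not the binaryen fix itself: `lineDelta` and `encodeSLEB128` are hypothetical helpers showing why a negative line delta computed in unsigned 32-bit arithmetic has to be reinterpreted as a signed 32-bit value before it is widened to a 64-bit field.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: the line delta as a signed 64-bit value. The delta is
// computed with 32-bit wraparound, reinterpreted as int32_t, then widened.
// (The uint32_t -> int32_t conversion is modular on mainstream compilers and
// is guaranteed to be so from C++20 on.)
int64_t lineDelta(uint32_t oldLine, uint32_t newLine) {
  uint32_t wrapped = newLine - oldLine;
  return int64_t(int32_t(wrapped));
}

// Plain signed LEB128 encoder, used here only to show the effect on the
// emitted bytes. Relies on arithmetic right shift of negative values.
std::vector<uint8_t> encodeSLEB128(int64_t value) {
  std::vector<uint8_t> out;
  bool more = true;
  while (more) {
    uint8_t byte = uint8_t(value & 0x7f);
    value >>= 7;
    bool signBit = (byte & 0x40) != 0;
    more = !((value == 0 && !signBit) || (value == -1 && signBit));
    if (more) {
      byte |= 0x80;
    }
    out.push_back(byte);
  }
  return out;
}
```

Going from line 10 to line 5, `lineDelta` gives -5, which encodes as the single byte 0x7b. Widening the wrapped unsigned value directly would instead give 4294967291, a five-byte LEB that a strict reader interprets as a huge positive advance.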
* DWARF: Update .debug_loc (#2616) | Alon Zakai | 2020-01-23 | 1 | -47/+117
  Add support for that section to the YAML layer, and add code to update it.

  The updating is slightly tricky - unlike .debug_ranges, the size of entries is not fixed. So we can't just skip entries, as the end marker is smaller than a normal entry. Instead, replace now-invalid segments with (1, 1) which is of size 0 and so should be ignored by the debugger (we can't use (0, 0) as that would be an end marker, and (-1, *) is the special base marker).

  In the future we probably do want to do this in a more sophisticated manner, completely rewriting the indexes into the section as well. For now though this should be enough for when binaryen does not optimize (as we don't move/reorder anything).

  Note that this doesn't update the location description (like where on the wasm expression stack the value is). Again, that is correct for when binaryen doesn't optimize, but for fully optimized builds we would need to track things (which would be hard!).

  Also clean up some code that uses "Extra" instead of "Delimiter" that was missed before, and shorten some unnecessarily long names.
* DWARF: Update debug_ranges (#2612) | Alon Zakai | 2020-01-22 | 1 | -2/+55
  Pretty straightforward given all we have so far. Note that fannkuch3_manyopts has an example of a sequence of ranges of which some must be skipped while others must not, showing we handle that by skipping the bad ones and updating the remaining.

  That is, if we have a sequence of two (begin, end) spans

    [(10, 20), (30, 40)]

  it's possible that (10, 20) maps in the new binary to (110, 120) while (30, 40) was eliminated by the optimizer and we have nothing valid to map it to. In that case we emit

    [(110, 120)]
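The example in this message can be written out as a small sketch. The lookup table `moved` stands in for whatever old-to-new address tracking the writer provides. It and `updateRanges` are hypothetical, and the sketch reflects the skipping behavior described in this commit only.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// A (begin, end) span; hypothetical alias for illustration.
using Range = std::pair<uint32_t, uint32_t>;

// Keep only the ranges that still have a valid location in the new binary,
// rewritten to their new addresses. Ranges with no mapping are skipped.
std::vector<Range> updateRanges(const std::vector<Range>& oldRanges,
                                const std::map<Range, Range>& moved) {
  std::vector<Range> out;
  for (const Range& range : oldRanges) {
    auto it = moved.find(range);
    if (it != moved.end()) {
      out.push_back(it->second);
    }
  }
  return out;
}
```

Feeding it [(10, 20), (30, 40)] with only (10, 20) mapped to (110, 120) produces [(110, 120)], matching the case above.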
* DWARF: Fix debug lines in fannkuch -O0 (#2611) | Alon Zakai | 2020-01-22 | 1 | -6/+18
  Just some trivial fixes:
  * Properly reset prologue after each line (unlike others, this flag should be reset immediately).
  * Test for a function's end address first, as LLVM output appears to use 1-past-the-end-of-the-function as a location in that function, and not the next (note the first byte of the next function, which is ambiguously identical to that value, is used at least in low_pc; I'm not sure if it's used in debug lines too).
  * Ignore the same address if LLVM emitted it more than once, which it does sometimes.
* DWARF: Use end_sequence and copy properly (#2610) | Alon Zakai | 2020-01-22 | 1 | -6/+12
  We need to track end_sequence directly, and use either end_sequence or copy (copy emits a line without marking it as ending a sequence).

  After this, fib2 debug line output looks perfect.
* DWARF: Allow debug lines with column 0 (#2609) | Alon Zakai | 2020-01-22 | 1 | -2/+2
  While line and address values of 0 should be skipped, it seems that lines with column 0 are valid lines emitted by LLVM.
* DWARF: Track more function locations (#2604) | Alon Zakai | 2020-01-22 | 1 | -27/+68
  DWARF from LLVM can refer to the first byte belonging to the function, where the size LEB is, or to the first byte after that, where the local declarations are, or the end opcode, or to one byte past that which is one byte past the bytes that belong to the function. We aren't sure why LLVM does this, but track it all for now.

  After this all debug line positions are identified. However, in some cases a debug line refers to one past the end of the function, which may be an LLVM bug. That location is ambiguous as it could also be the first byte of the next function (what made this discovery possible was when this happened to the last function, after which there is another section).
* DWARF: Track the positions of 'end', 'else', 'catch' binary locations (#2603) | Alon Zakai | 2020-01-21 | 1 | -13/+58
  Control flow structures have those in addition to the normal span of (start, end), and we need to track them too. Tracking them during reading requires us to track control flow structures while parsing, so that we can know which structure an end/else/catch refers to.

  We track these locations using a map on the side of instruction to its "extra" locations. That avoids increasing the size of the tracking info for the much more common non-control flow instructions.

  Note that there is one more 'end' location, that of the function (not referring to any instruction). I left that to a later PR to not increase this one too much.
* Update debug line info with function entries (#2600) | Alon Zakai | 2020-01-17 | 1 | -1/+12
  LLVM points to the start of the function in some debug line entries - right after the size LEB of the function, which is where the locals are declared, and before any instructions.
* Use BinaryLocation instead of hardcoding uint32_t (#2598) | Alon Zakai | 2020-01-16 | 1 | -17/+17
  This will make it easier to switch to something else for offsets in wasm binaries if we get >4GB files.
* DWARF: high_pc computation (#2595) | Alon Zakai | 2020-01-16 | 1 | -46/+113
  Update high_pc values. These are interesting as they may be a relative offset compared to the low_pc.

  For functions we already had both a start and an end. Add such tracking for instructions as well.
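The "relative offset" remark reflects how DWARF 4 encodes DW_AT_high_pc: an address-class form holds an absolute address, while a constant-class form holds an offset from DW_AT_low_pc. A minimal sketch of recomputing the value under that rule follows. `HighPcForm` and `updatedHighPc` are hypothetical names, not binaryen's.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical tag for how the attribute was encoded in the input.
enum class HighPcForm { Address, OffsetFromLowPc };

// Given the NEW start and end addresses of the function or instruction,
// return the value to write back for high_pc in the same form it had.
uint32_t updatedHighPc(HighPcForm form, uint32_t newLow, uint32_t newEnd) {
  assert(newLow <= newEnd);
  if (form == HighPcForm::Address) {
    return newEnd; // absolute address
  }
  return newEnd - newLow; // size relative to the updated low_pc
}
```

This is why both a start and an end need to be tracked: the relative form cannot be updated from the end address alone.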
* DWARF: Function location tracking (#2592) | Alon Zakai | 2020-01-16 | 1 | -32/+90
  Track the beginning and end of each function, both when reading and writing.

  We track expressions and functions separately, instead of having a single big map of (oldAddr) => (newAddr) because of the potentially ambiguous case of the final expression in a function: its end might be identical in offset to the end of the function. So we have two different things that map to the same offset. However, if the context is "the end of the function" then the updated address is the new end of the function, even if the function ends with a different instruction now, as the old last instruction might have moved or been optimized out.

  Concretely, we have getNewExprAddr and getNewFuncAddr, so we can ask to update the location of either an expression or a function, and use that contextual information.

  This checks for the DIE tag in order to know what we are looking for. To be safe, if we hit an unknown tag, we halt, so that we don't silently miss things.

  As the test updates show, the new things we can do thanks to this PR are to update compile unit and subprogram low_pc locations. Note btw that in the first test (dwarfdump_roundtrip_dwarfdump.bin.txt) we change 5 to 0: that is correct since that test does not write out DWARF (it intentionally has no -g), so we do not track binary locations while writing, and so we have nothing to update to (the other tests show actual updating).

  Also fix the order in the python test runner code to show a diff of expected to encountered, and not the reverse, which confused me.
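The reason for two lookups instead of one can be shown with a small sketch. `AddrMaps`, `newExprEnd` and `newFuncEnd` are hypothetical stand-ins inspired by the getNewExprAddr / getNewFuncAddr split described above. Their signatures, and the convention of returning 0 for "no new location", are assumptions of this sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Hypothetical container: old end offsets of expressions and of functions
// are kept in separate maps because the same old offset can mean both.
struct AddrMaps {
  std::unordered_map<uint32_t, uint32_t> exprEnds;
  std::unordered_map<uint32_t, uint32_t> funcEnds;

  // Returns 0 when the expression has no location in the new binary.
  uint32_t newExprEnd(uint32_t oldAddr) const {
    auto it = exprEnds.find(oldAddr);
    return it == exprEnds.end() ? 0 : it->second;
  }

  // Returns 0 when the function has no location in the new binary.
  uint32_t newFuncEnd(uint32_t oldAddr) const {
    auto it = funcEnds.find(oldAddr);
    return it == funcEnds.end() ? 0 : it->second;
  }
};
```

Suppose old offset 50 was both the end of a function and the end of its last expression, and that expression was optimized out. Asking about the expression yields 0, while asking about the function still yields its new end; a single combined map could not give both answers.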
* DWARF updating: update DW_AT_low_pc attributes (#2584) | Alon Zakai | 2020-01-14 | 1 | -18/+102
  Mostly straightforward: go over the dwarf entries, find the low_pc ones, and update their positions.

  A slight oddity is that we must traverse both the dwarf context - which has the rich APIs for analysis - and the YAML data structure - which is minimal but is used for writing out.
* Omit DWARF debug line ranges starting with 0 (#2587) | Alon Zakai | 2020-01-14 | 1 | -0/+25
  Check if an entry starts a new range of addresses. Each range is a set of related addresses, where in particular, if the first has been zeroed out by the linker, we must omit the entire range. If we do not, then the initial range is 0 and the others are offsets relative to it, which will look like random addresses, perhaps into the middle of instructions, and perhaps that happen to collide with real ones (a debugger would ignore those, so we must too; it's easier and better to simply omit them).

  See https://bugs.llvm.org/show_bug.cgi?id=44516#c2
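A minimal sketch of the filtering described here is shown below. It assumes each row already knows whether it starts a new address range. `LineRow` and `dropZeroedRanges` are hypothetical names, and the real code detects range starts from the line-program opcodes rather than from a precomputed flag.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical row of a decoded line table.
struct LineRow {
  uint32_t addr;
  bool startsRange; // true if this row begins a new range of addresses
};

// Drop every range whose first address was zeroed out (e.g. by the linker),
// including the rows that follow it within the same range.
std::vector<LineRow> dropZeroedRanges(const std::vector<LineRow>& rows) {
  std::vector<LineRow> out;
  bool omitting = false;
  for (const LineRow& row : rows) {
    if (row.startsRange) {
      omitting = row.addr == 0;
    }
    if (!omitting) {
      out.push_back(row);
    }
  }
  return out;
}
```

Rows that follow a zeroed start are dropped even though their own addresses are nonzero, since those addresses are only offsets from a meaningless base.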
* DWARF support for multiple line tables (#2557) | Alon Zakai | 2020-01-09 | 1 | -23/+43
  Multiple tables appear to be emitted when linking files together. This fixes our support for that, which did not update their size properly. This required patching the YAML emitting code from LLVM in order to measure the size and then emit it, as that code is apparently not designed to handle changes in line table contents.

  Other minor fixes:
  * Set the flags for our dwarfdump command to emit the same as llvm-dwarfdump does with -v -all.
  * Add support for a few more opcodes: set_discriminator, set_basic_block, fixed_advance_pc, set_isa.
  * Handle a compile unit without abbreviations in the YAML code (again, apparently not something this LLVM code was intended to do).
  * Handle a compile unit with zero entries in the YAML code (ditto).
  * Properly set the AddressSize - we use the DWARFContext in a different way than LLVM expects, apparently.

  With this the emscripten test suite passes with -gforce_dwarf without crashing.

  My overall impression so far from the YAML code is that it probably isn't a long-term solution for us. Perhaps it may end up being scaffolding, that is, we can replace it with our own code eventually that is based on it, and remove most of the LLVM code. Before deciding that we should get everything working first, and this seems like the quickest path there.
* DWARF debug line updating (#2545) | Alon Zakai | 2019-12-20 | 1 | -11/+305
  With this, we can update DWARF debug line info properly as we write a new binary.

  To do that we track binary locations as we write. Each instruction is mapped to the location it is written to. We must also adjust them as we move code around because of LEB optimization (we emit a function or a section with a 5-byte LEB placeholder, the maximal size; later we shrink it which is almost always possible).

  writeDWARFSections() now takes a second param, the new locations of instructions. It then maps debug line info from the original offsets in the binary to the new offsets in the binary being written.

  The core logic for updating the debug line section is in wasm-debug.cpp. It basically tracks state machine logic both to read the existing debug lines and to emit the new ones. I couldn't find a way to reuse LLVM code for this, but reading LLVM's code was very useful here.

  A final tricky thing we need to do is to update the DWARF section's internal size annotation. The LLVM YAML writing code doesn't do that for us. Luckily it's pretty easy, in fixEmittedSection we just update the first 4 bytes in place to have the section size, after we've emitted it and know the size.

  This ignores debug lines with a 0 in the line, col, or addr, see WebAssembly/debugging#9 (comment)

  This ignores debug line offsets into the middle of instructions, which LLVM sometimes emits for some reason, see WebAssembly/debugging#9 (comment). Handling that would likely at least double our memory usage, which is unfortunate - we are run in an LTO manner, where the entire app's DWARF is present, and it may be massive. I think we should see if such odd offsets are a bug in LLVM, and if we can fix or prevent that.

  This does not emit "special" opcodes for debug lines. Those are purely an optimization, which I wanted to leave for later. (Even without them we decrease the size quite a lot, btw, as many lines have 0s in them...)

  This adds some testing that shows we can load and save fib2.c and fannkuch.cpp properly. The latter includes more than one function and has nontrivial code.

  To actually emit correct offsets a few minor fixes are done here:
  * Fix the code section location tracking during reading - the correct offset we care about is the body of the code section, not including the section declaration and size.
  * Fix wasm-stack debug line emitting. We need to update in BinaryInstWriter::visit(), that is, right before writing bytes for the instruction. That differs from BinaryenIRWriter::visit which is a recursive function that also calls the children - so the offset there would be of the first child. For some reason that is correct with source maps, I don't understand why, but it's wrong for DWARF...
  * Print code section offsets in hex, to match other tools.

  Remove DWARFUpdate pass, which was useful for testing temporarily, but doesn't make sense now (it just updates without writing a binary).

  cc @yurydelendik
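The in-place size fix-up mentioned above can be illustrated as follows. This is a sketch under assumptions, not the body of fixEmittedSection: `patchUnitLength` is a hypothetical helper, it assumes the 32-bit DWARF format (where the leading 4-byte length field does not count itself), and it assumes a buffer holding exactly one unit.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: after a unit has been emitted and its final size is
// known, overwrite its first 4 bytes with the length of everything that
// follows them, in little-endian byte order.
void patchUnitLength(std::vector<uint8_t>& unit) {
  assert(unit.size() >= 4);
  uint32_t length = uint32_t(unit.size() - 4);
  for (int i = 0; i < 4; i++) {
    unit[i] = uint8_t(length >> (8 * i));
  }
}
```

Patching after emission mirrors the approach in the message: write the contents first, then fill in the length once it is known, rather than trying to predict the size up front.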
* DWARF parsing and writing support using LLVM (#2520) | Alon Zakai | 2019-12-19 | 1 | -0/+151
  This imports LLVM code for DWARF handling. That code has the Apache 2 license like us. It's also the same code used to emit DWARF in the common toolchain, so it seems like a safe choice.

  This adds two passes: --dwarfdump which runs the same code LLVM runs for llvm-dwarfdump. This shows we can parse it ok, and will be useful for debugging. And --dwarfupdate writes out the DWARF sections (unchanged from what we read, so it just roundtrips - for updating we need #2515).

  This puts LLVM in thirdparty which is added here. All the LLVM code is behind USE_LLVM_DWARF, which is on by default, but off in JS for now, as it increases code size by 20%.

  This current approach imports the LLVM files directly. This is not how they are intended to be used, so it required a bunch of local changes - more than I expected actually, for the platform-specific stuff. For now this seems to work, so it may be good enough, but in the long term we may want to switch to linking against libllvm. A downside to doing that is that binaryen users would need to have an LLVM build, and even in the waterfall builds we'd have a problem - while we ship LLVM there anyhow, we constantly update it, which means that binaryen would need to be on latest llvm all the time too (which otherwise, given DWARF is quite stable, we might not need to constantly update).

  An even larger issue is that as I did this work I learned about how DWARF works in LLVM, and while the reading code is easy to reuse, the writing code is trickier. The main code path is heavily integrated with the MC layer, which we don't have - we might want to create a "fake MC layer" for that, but it sounds hard. Instead, there is the YAML path which is used mostly for testing, and which can convert DWARF to and from YAML and from binary. Using the non-YAML parts there, we can convert binary DWARF to the YAML layer's nice Info data, then convert that to binary. This works, however, this is not the path LLVM uses normally, and it supports only some basic DWARF sections - I had to add ranges support, in fact. So if we need more complex things, we may end up needing to use the MC layer approach, or consider some other DWARF library. However, hopefully that should not affect the core binaryen code which just calls a library for DWARF stuff.

  Helps #2400