summaryrefslogtreecommitdiff
Commit message (Collapse)AuthorAgeFilesLines
...
* DWARF: Use end_sequence and copy properly (#2610)Alon Zakai2020-01-226-2446/+2274
| | | | | | | We need to track end_sequence directly, and use either end_sequence or copy (copy emits a line without marking it as ending a sequence). After this, fib2 debug line output looks perfect.
* DWARF: Allow debug lines with column 0 (#2609)Alon Zakai2020-01-226-1788/+1835
| | | | While line and address values of 0 should be skipped, it seems like column 0 are valid lines emitted by LLVM.
* DWARF: Track more function locations (#2604)Alon Zakai2020-01-224-41/+104
| | | | | | | | | | | | | | DWARF from LLVM can refer to the first byte belonging to the function, where the size LEB is, or to the first byte after that, where the local declarations are, or the end opcode, or to one byte past that which is one byte past the bytes that belong to the function. We aren't sure why LLVM does this, but track it all for now. After this all debug line positions are identified. However, in some cases a debug line refers to one past the end of the function, which may be an LLVM bug. That location is ambiguous as it could also be the first byte of the next function (what made this discovery possible was when this happened to the last function, after which there is another section).
* DWARF: Track the positions of 'end', 'else', 'catch' binary locations (#2603)Alon Zakai2020-01-218-1160/+1366
| | | | | | | | | | | | | | | | | Control flow structures have those in addition to the normal span of (start, end), and we need to track them too. Tracking them during reading requires us to track control flow structures while parsing, so that we can know to which structure an end/else/catch refers to. We track these locations using a map on the side of instruction to its "extra" locations. That avoids increasing the size of the tracking info for the much more common non-control flow instructions. Note that there is one more 'end' location, that of the function (not referring to any instruction). I left that to a later PR to not increase this one too much.
* Handle an invalid AbbrCode in DWARF handling (#2607)Alon Zakai2020-01-212-1/+8
| | | | | | | | | | | | | Fixes the testcase in #2343 (comment) Looks like that's from Rust. Not sure why it would have an invalid abbreviation code, but perhaps the LLVM there emits dwarf differently than we've tested on so far. May be worth investigating further, but for now emit a warning, skip that element, and don't crash. Also fix valgrind warnings about Span values not being initialized, which was invalid and bad as well (wasted memory in our maps, and might have overlapped with real values), and interfered with figuring this out.
* Unify JS memory segment API (#2533)Daniel Wirtz2020-01-215-5/+26
| | | | | | | | | | Binaryen.js now uses offset instead of byteOffset when inspecting a memory segment, matching the arguments on memory segment creation. Also adds inspection of the passive property. Previously, one would specify { offset, data, passive } on creation and get back { byteOffset, data } upon inspection. This PR unifies both to the keys on creation while also adding the respective C-API to retrieve passive status, which was missing.
* Simplify binary parsing a little (#2602)Alon Zakai2020-01-211-2/+4
| | | | | Instead of hackishly advancing the read position in the binary buffer, call readExpression which will do that, and also do all the debug info handling for us.
* Remove limit in the log length in fuzz_opt.py (#2601)Heejin Ahn2020-01-171-4/+2
| | | | | | It is convenient to have the full command when debugging fuzzing errors. The fuzzer sometimes fails before running `wasm-reduce` and being able to reproduce the command right away from the log is very handy in that case.
* Update debug line info with function entries (#2600)Alon Zakai2020-01-176-757/+839
| | | | | | LLVM points to the start of the function in some debug line entries - right after the size LEB of the function, which is where the locals are declared, and before any instructions.
* Expose ExpressionAnalyzer in C-/JS-API (#2585)Daniel Wirtz2020-01-176-0/+261
| | | | | | | Instead of reinventing the wheel on our side, this adds ExpressionAnalyzer bindings to the C- and JS-APIs, which can be useful for generators. For example, a generator may decide to simplify a compilation step if a subexpression doesn't have any side effects, or simply skip emitting something that is likely to compile to a drop or an empty block right away.
* Use BinaryLocation instead of hardcoding uint32_t (#2598)Alon Zakai2020-01-163-26/+32
| | | | This will make it easier to switch to something else for offsets in wasm binaries if we get >4GB files.
* DWARF: high_pc computation (#2595)Alon Zakai2020-01-1611-66/+151
| | | | | | | Update high_pc values. These are interesting as they may be a relative offset compared to the low_pc. For functions we already had both a start and an end. Add such tracking for instructions as well.
* Add EH support for CFGWalker (#2597)Heejin Ahn2020-01-163-0/+288
| | | | | | | | This adds EH instruction support for `CFGWalker`. This also implements `call` instruction handling within a try-catch; every call can possibly throw and unwind to the innermost catch block. This adds tests for RedundantSetElimination pass, which uses `CFGWalker`.
* Update DWARF testcases (#2594)Alon Zakai2020-01-166-8703/+9118
| | | | | This only touches test code. The files are compiled with latest LLVM + https://reviews.llvm.org/D71681 in order to get more realistic DWARF content.
* Update LLVM to support WASM_location (#2596)Alon Zakai2020-01-163-1/+6
| | | | From llvm/llvm-project@adf7a0a
* DWARF: Function location tracking (#2592)Alon Zakai2020-01-1611-71/+148
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Track the beginning and end of each function, both when reading and writing. We track expressions and functions separately, instead of having a single big map of (oldAddr) => (newAddr) because of the potentially ambiguous case of the final expression in a function: it's end might be identical in offset to the end of the function. So we have two different things that map to the same offset. However, if the context is "the end of the function" then the updated address is the new end of the function, even if the function ends with a different instruction now, as the old last instruction might have moved or been optimized out. Concretely, we have getNewExprAddr and getNewFuncAddr, so we can ask to update the location of either an expression or a function, and use that contextual information. This checks for the DIE tag in order to know what we are looking for. To be safe, if we hit an unknown tag, we halt, so that we don't silently miss things. As the test updates show, the new things we can do thanks to this PR are to update compile unit and subprogram low_pc locations. Note btw that in the first test (dwarfdump_roundtrip_dwarfdump.bin.txt) we change 5 to 0: that is correct since that test does not write out DWARF (it intentionally has no -g), so we do not track binary locations while writing, and so we have nothing to update to (the other tests show actual updating). Also fix the order in the python test runner code to show a diff of expected to encountered, and not the reverse, which confused me.
* Optimize passive segments in memory-packing (#2426)Thomas Lively2020-01-157-159/+1978
| | | | | | | | | When memory is packed and there are passive segments, bulk memory operations that reference those segments by index need to be updated to reflect the new indices and possibly split into multiple instructions that reference multiple split segments. For some bulk-memory operations, it is necessary to introduce new globals to explicitly track the drop state of the original segments, but this PR is careful to only add globals where necessary.
* Align binaryen.js with the npm package (#2551)Daniel Wirtz2020-01-1426-854/+739
| | | | | Binaryen.js now uses binaryen (was Binaryen) as its global name to align with the npm package. Also fixes issues with emitting and testing both the JS and Wasm builds.
* DWARF updating: update DW_AT_low_pc attributes (#2584)Alon Zakai2020-01-143-62/+146
| | | | | | | Mostly straightforward: go over the dwarf entries, find the low_pc ones, and update their positions. A slight oddity is that we must traverse both the dwarf context - which has the rich APIs for analsis - and the YAML data structure - which is minimal but is used for writing out.
* Omit DWARF debug line ranges starting with 0 (#2587)Alon Zakai2020-01-144-0/+1093
| | | | | | | | | | | Check if an entry starts a new range of addresses. Each range is a set of related addresses, where in particular, if the first has been zeroed out by the linker, we must omit the entire range. If we do not, then the initial range is 0 and the others are offsets relative to it, which will look like random addresses, perhaps into the middle of instructions, and perhaps that happen to collide with real ones (a debugger would ignore those, so we must too; it's easier and better to simply omit them). See https://bugs.llvm.org/show_bug.cgi?id=44516#c2
* Clear Mixedarena in ModuleUtils::clearModule (#2588)Heejin Ahn2020-01-131-0/+1
|
* Verify --version output matches CHANGELOG (#2580)Sam Clegg2020-01-103-8/+22
| | | | | | | | | | | | | | | | | The new version string looks like this: wasm-opt version 90 (version_90-18-g77329439d) The version reported here is the version from the CMakeLists.txt file followed by the git version in brackets. We verify that the main version here matches the CHANGELOG to prevent people from changing one without changeing the other. This will help with emscripten that wants to be able to programaticaly check the --version of binaryen tools. See https://github.com/emscripten-core/emscripten/issues/10175
* Fix emitting of .debug_abbrev (#2582)Alon Zakai2020-01-106-10/+14
| | | | | gimli-rs and perhaps also the spec require a final 0 to terminate the list. LLVM itself is fine without that.
* wasm2js: Do not convert x >>> 0 | 0 to x >>> 0 (#2581)Alon Zakai2020-01-105-20/+28
| | | | | | | | | | | | isBinary was used where we should only accept a signed binary, as removing the | 0 from an unsigned value may be incorrect. This does regress a few small things (as can be seen in the diff). If it's important we can add more sophisticated optimizations here, perhaps like an assumption that the signedness of a local never matters. Fixes emscripten-core/emscripten#10173
* DWARF support for multiple line tables (#2557)Alon Zakai2020-01-0915-31/+2571
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Multiple tables appear to be emitted when linking files together. This fixes our support for that, which did not update their size properly. This required patching the YAML emitting code from LLVM in order to measure the size and then emit it, as that code is apparently not designed to handle changes in line table contents. Other minor fixes: * Set the flags for our dwarfdump command to emit the same as llvm-dwarfdump does with -v -all. * Add support for a few more opcodes, set_discriminator, set_basic_block, fixed_advance_pc, set_isa. * Handle a compile unit without abbreviations in the YAML code (again, apparently not something this LLVM code was intended to do). * Handle a compile unit with zero entries in the YAML code (ditto). * Properly set the AddressSize - we use the DWARFContext in a different way than LLVM expects, apparently. With this the emscripten test suite passes with -gforce_dwarf without crashing. My overall impression so from the the YAML code is that it probably isn't a long-term solution for us. Perhaps it may end up being scaffolding, that is, we can replace it with our own code eventually that is based on it, and remove most of the LLVM code. Before deciding that we should get everything working first, and this seems like the quickest path there.
* Remove implicit conversion operators from Type (#2577)Thomas Lively2020-01-0830-208/+217
| | | | | | | | | | * Remove implicit conversion operators from Type Now types must be explicitly converted to uint32_t with Type::getID or to ValueType with Type::getVT. This fixes #2572 for switches that use Type::getVT. * getVT => getSingle
* Remove git dependency (#2578)Sam Clegg2020-01-083-16/+20
| | | | | | | Only use git to set version number if .git directory is present. This means that for release archives the VERSION string will be used as-is. Fixes #2563
* Revert "Reland "Fix renaming in FixInvokeFunctionNamesWalker (#2513)" ↵Sam Clegg2020-01-078-462/+72
| | | | | (#2542)" (#2576) This reverts commit f62e171c38bea14302f9b79f7941a248ea704425.
* [NFC] Enforce use of `Type::` on type names (#2434)Thomas Lively2020-01-0777-2153/+2308
|
* [NFC] Clean up unnecessary `template`s in calls 🧹🧹🧹 (#2394)Thomas Lively2020-01-073-10/+9
|
* Fix debug build (#2561)Alon Zakai2020-01-061-46/+1
| | | | This requires removing some more LLVM code, which was only pulled in a debug build.
* DCE at the end of wasm2js (#2574)Alon Zakai2020-01-068-69/+265
| | | | | | By doing so we ensure that our calls to convert wasm types to JS types never try to convert an unreachable. Fixes #2558
* Do not print push/pop in stack IR (#2571)Heejin Ahn2020-01-063-9/+14
| | | | | This makes push and pop instructions not printed in the stack IR format to make it valid wat form. Push and pop are still generated in the stack IR in memory but not printed in the text format.
* Allow subtype in throw instruction (#2568)Heejin Ahn2020-01-065-48/+71
| | | | | This allows subtype for arguments of `throw`. This also renames `shouldBeSubTypeOrUnreachable` to `shouldBeSubTypeOrFirstIsUnreachable`, to be consistent with `shouldBeEqualOrFirstIsUnreachable`.
* Add line and col info to wast parser exceptions (#2570)Heejin Ahn2020-01-061-63/+80
| | | | | | This adds line and column info to wast parser exception messages to be more readable when they are encoutered. In other cases this makes existing line and column number more fine grained, or adds some helpful strings (if line and column info is not available).
* Skip liveness analysis if too many locals (#2560)Alon Zakai2020-01-066-0/+58
| | | | | | | | | | | | | | | The analysis currently uses a dense matrix. If there are >65535 locals then the indexes don't fit in a 32-bit type like a wasm32 index, which led to overflows and incorrect behavior. To avoid that, don't run passes with liveness analysis for now if they have that many locals. Note that skipping coalesce-locals (the main liveness-using pass) is not that bad, as we run it more than once, and it's likely that even if the first must be skipped, we can still run the second (which is after simplify- and reorder-locals, which can greatly reduce the local count). Fixes #2559
* Parse memarg in atomic.wait and atomic.notify (#2569)Heejin Ahn2020-01-036-21/+119
| | | | | | - Allow `atomic.notify` and `atomic.wait` instructions to parse memory arguments (`align` and `offset`) and print the offset in these instruction when writing binary, rather than assuming it to be 0 - Change arguments of `parseMemAttributes` to be references
* Generate push/pop in stack IR (#2566)Heejin Ahn2020-01-035-7/+17
| | | | | | | | | | | We have not been generating push and pop instructions in the stack IR. Even though they are not written in binary, they have to be in the stack IR to match the number of inputs and outputs of instructions. Currently `BinaryenIRWriter` is used both for stack IR generation and binary generation, so we should emit those instructions in `BinaryenIRWriter`. `BinaryenIRToBinaryWriter`, which inherits `BinaryenIRWriter`, does not do anything for push and pop instructions, so they are still not emitted in binary.
* Use FeatureSet instead of FeatureSet::Feature(NFC) (#2562)Heejin Ahn2020-01-023-35/+34
| | | | | This uses `FeatureSet` in place of `FeatureSet::Feature` when possible, making it possible for functions take a set of multiple features as one argument.
* Add support for reference types proposal (#2451)Heejin Ahn2019-12-30118-2284/+7083
| | | | | | | | | | | | This adds support for the reference type proposal. This includes support for all reference types (`anyref`, `funcref`(=`anyfunc`), and `nullref`) and four new instructions: `ref.null`, `ref.is_null`, `ref.func`, and new typed `select`. This also adds subtype relationship support between reference types. This does not include table instructions yet. This also does not include wasm2js support. Fixes #2444 and fixes #2447.
* Move Type-related functions into Type class (NFC) (#2556)Heejin Ahn2019-12-2915-128/+143
| | | | | | | | | | | Several type-related functions currently exist outside of `Type` class and thus in the `wasm`, effectively global, namespace. This moves these functions into `Type` class, making them either member functions or static functions. Also this renames `getSize` to `getByteSize` to make it not to be confused with `size`, which returns the number of types in multiple types. This also reorders the order of functions in `wasm-type.cpp` to match that of `wasm-type.h`.
* Add a dwarf updating test that runs -O4 which does a lot of code changes and ↵Alon Zakai2019-12-243-0/+4746
| | | | movements (#2552)
* Fix for binaryen.js getExpressionInfo on switch names (#2553)Brion Vibber2019-12-233-3/+6
| | | | | | | | | | | | Switch label names for br_table instructions were corrupted in the binaryen.js API layer, with each label cropped down to the number of characters that it is an index into the list. This was due to passing UTF8ToString as a callback method to Array.prototype.map, which passes the index as second parameter. The second parameter of UTF8ToString is the max number of bytes to copy, so the initial label came out as '', then 'l', then 'la', 'lab', etc. Corrected an existing test case that had the wrong output in it.
* version_90Alon Zakai2019-12-231-0/+3
|
* Refactor module element related functions (NFC) (#2550)Heejin Ahn2019-12-232-82/+72
| | | | This does something similar to #2489 for more functions, removing boilerplate code for each module element using template functions.
* Reorder git ignores (#2549)Alon Zakai2019-12-221-2/+2
|
* Fix memory size calculation in MemoryPacking (#2548)Heejin Ahn2019-12-202-2/+10
| | | | Because `memory.size` returns the size in number of pages, we have to multiply the size with the page size when converting `memory.init`.
* DWARF debug line updating (#2545)Alon Zakai2019-12-2022-173/+6946
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With this, we can update DWARF debug line info properly as we write a new binary. To do that we track binary locations as we write. Each instruction is mapped to the location it is written to. We must also adjust them as we move code around because of LEB optimization (we emit a function or a section with a 5-byte LEB placeholder, the maximal size; later we shrink it which is almost always possible). writeDWARFSections() now takes a second param, the new locations of instructions. It then maps debug line info from the original offsets in the binary to the new offsets in the binary being written. The core logic for updating the debug line section is in wasm-debug.cpp. It basically tracks state machine logic both to read the existing debug lines and to emit the new ones. I couldn't find a way to reuse LLVM code for this, but reading LLVM's code was very useful here. A final tricky thing we need to do is to update the DWARF section's internal size annotation. The LLVM YAML writing code doesn't do that for us. Luckily it's pretty easy, in fixEmittedSection we just update the first 4 bytes in place to have the section size, after we've emitted it and know the size. This ignores debug lines with a 0 in the line, col, or addr, see WebAssembly/debugging#9 (comment) This ignores debug line offsets into the middle of instructions, which LLVM sometimes emits for some reason, see WebAssembly/debugging#9 (comment) Handling that would likely at least double our memory usage, which is unfortunate - we are run in an LTO manner, where the entire app's DWARF is present, and it may be massive. I think we should see if such odd offsets are a bug in LLVM, and if we can fix or prevent that. This does not emit "special" opcodes for debug lines. Those are purely an optimization, which I wanted to leave for later. (Even without them we decrease the size quite a lot, btw, as many lines have 0s in them...) This adds some testing that shows we can load and save fib2.c and fannkuch.cpp properly. The latter includes more than one function and has nontrivial code. To actually emit correct offsets a few minor fixes are done here: * Fix the code section location tracking during reading - the correct offset we care about is the body of the code section, not including the section declaration and size. * Fix wasm-stack debug line emitting. We need to update in BinaryInstWriter::visit(), that is, right before writing bytes for the instruction. That differs from * BinaryenIRWriter::visit which is a recursive function that also calls the children - so the offset there would be of the first child. For some reason that is correct with source maps, I don't understand why, but it's wrong for DWARF... * Print code section offsets in hex, to match other tools. Remove DWARFUpdate pass, which was useful for testing temporarily, but doesn't make sense now (it just updates without writing a binary). cc @yurydelendik
* Strip DWARF in finalize, to avoid keeping it around til later unnecessarily ↵Alon Zakai2019-12-201-0/+8
| | | | | | | | (#2544) Without this, the first wasm-opt invocation will remove it. But it can be very large, and we will soon start to automatically do updating on it when it exists, so avoid the work if we aren't actually building a final output with dwarf.
* Reland "Fix renaming in FixInvokeFunctionNamesWalker (#2513)" (#2542)Sam Clegg2019-12-208-72/+462
| | | | | | | | | | | | | | | | * Reland "Fix renaming in FixInvokeFunctionNamesWalker (#2513)" In the previous iteration of this change we were not calling `renameFunctions` for each of the functions we removed. The problem manifested itself when we rename the imported function to `emscripten_longjmp_jmpbuf` to `emscripten_longjmp`. In this case the import of `emscripten_longjmp` already exists so we remove the import of `emscripten_longjmp_jmpbuf` but we were not correclty calling renameFunctions to handle the rename of all the uses. Add an additional test case to cover the failures that we saw on the emscripten tree.