path: root/src
Commit message    Author    Age    Files    Lines
* Clear Mixedarena in ModuleUtils::clearModule (#2588)Heejin Ahn2020-01-131-0/+1
|
* Verify --version output matches CHANGELOG (#2580)Sam Clegg2020-01-101-1/+1
| The new version string looks like this:
|
|   wasm-opt version 90 (version_90-18-g77329439d)
|
| The version reported here is the version from the CMakeLists.txt file followed by the git version in brackets. We verify that the main version here matches the CHANGELOG to prevent people from changing one without changing the other. This will help with emscripten, which wants to be able to programmatically check the --version of binaryen tools.
|
| See https://github.com/emscripten-core/emscripten/issues/10175
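The check described in this message can be sketched roughly as follows. This is only an illustration, assuming the `--version` output has exactly the shape quoted above; `mainVersion` and `versionMatchesChangelog` are hypothetical names, not Binaryen's actual test code.

```typescript
// Hypothetical helper: pull the main version number out of `--version` output
// such as "wasm-opt version 90 (version_90-18-g77329439d)".
function mainVersion(versionOutput: string): string | null {
  const match = /version (\d+)/.exec(versionOutput);
  if (match === null) {
    return null;
  }
  const digits: string | undefined = match[1];
  return digits === undefined ? null : digits;
}

// Hypothetical check: the main version must equal the version named in the CHANGELOG.
function versionMatchesChangelog(versionOutput: string, changelogVersion: string): boolean {
  return mainVersion(versionOutput) === changelogVersion;
}
```

The git suffix in brackets is deliberately ignored, since only the main version is compared against the CHANGELOG.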
* wasm2js: Do not convert x >>> 0 | 0 to x >>> 0 (#2581)Alon Zakai2020-01-101-2/+10
| | | | | | | | | | | | isBinary was used where we should only accept a signed binary, as removing the | 0 from an unsigned value may be incorrect. This does regress a few small things (as can be seen in the diff). If it's important we can add more sophisticated optimizations here, perhaps like an assumption that the signedness of a local never matters. Fixes emscripten-core/emscripten#10173
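A small sketch of why dropping the `| 0` is not always safe: in JavaScript, `x >>> 0` yields an unsigned 32-bit value, while a trailing `| 0` reinterprets it as signed. The function names below are illustrative only and do not come from wasm2js.

```typescript
// `x >>> 0` reinterprets the low 32 bits of x as an unsigned integer.
function unsignedOnly(x: number): number {
  return x >>> 0;
}

// `x >>> 0 | 0` parses as `(x >>> 0) | 0`, which converts back to a signed 32-bit integer.
function unsignedThenSigned(x: number): number {
  return x >>> 0 | 0;
}

// For small non-negative inputs the two agree; for negative inputs they differ.
const agreeOnFive = unsignedOnly(5) === unsignedThenSigned(5);
const differOnMinusOne = unsignedOnly(-1) !== unsignedThenSigned(-1);
```

With `x = -1`, the first form gives 4294967295 and the second gives -1, so the two expressions are not interchangeable.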
* DWARF support for multiple line tables (#2557)Alon Zakai2020-01-091-23/+43
| Multiple tables appear to be emitted when linking files together. This fixes our support for that, which did not update their size properly. This required patching the YAML emitting code from LLVM in order to measure the size and then emit it, as that code is apparently not designed to handle changes in line table contents.
|
| Other minor fixes:
|
| * Set the flags for our dwarfdump command to emit the same as llvm-dwarfdump does with -v -all.
| * Add support for a few more opcodes, set_discriminator, set_basic_block, fixed_advance_pc, set_isa.
| * Handle a compile unit without abbreviations in the YAML code (again, apparently not something this LLVM code was intended to do).
| * Handle a compile unit with zero entries in the YAML code (ditto).
| * Properly set the AddressSize - we use the DWARFContext in a different way than LLVM expects, apparently.
|
| With this the emscripten test suite passes with -gforce_dwarf without crashing.
|
| My overall impression so far from the YAML code is that it probably isn't a long-term solution for us. Perhaps it may end up being scaffolding, that is, we can replace it with our own code eventually that is based on it, and remove most of the LLVM code. Before deciding that we should get everything working first, and this seems like the quickest path there.
* Remove implicit conversion operators from Type (#2577)Thomas Lively2020-01-0830-208/+217
| * Remove implicit conversion operators from Type
|
|   Now types must be explicitly converted to uint32_t with Type::getID or to ValueType with Type::getVT. This fixes #2572 for switches that use Type::getVT.
|
| * getVT => getSingle
* Remove git dependency (#2578)Sam Clegg2020-01-081-1/+1
| | | | | | | Only use git to set version number if .git directory is present. This means that for release archives the VERSION string will be used as-is. Fixes #2563
* Revert "Reland "Fix renaming in FixInvokeFunctionNamesWalker (#2513)" (#2542)" (#2576)Sam Clegg2020-01-071-26/+15
| This reverts commit f62e171c38bea14302f9b79f7941a248ea704425.
* [NFC] Enforce use of `Type::` on type names (#2434)Thomas Lively2020-01-0776-2049/+2204
|
* [NFC] Clean up unnecessary `template`s in calls 🧹🧹🧹 (#2394)Thomas Lively2020-01-073-10/+9
|
* DCE at the end of wasm2js (#2574)Alon Zakai2020-01-061-0/+3
| | | | | | By doing so we ensure that our calls to convert wasm types to JS types never try to convert an unreachable. Fixes #2558
* Do not print push/pop in stack IR (#2571)Heejin Ahn2020-01-061-0/+5
| | | | | This makes push and pop instructions not printed in the stack IR format to make it valid wat form. Push and pop are still generated in the stack IR in memory but not printed in the text format.
* Allow subtype in throw instruction (#2568)Heejin Ahn2020-01-061-47/+52
| | | | | This allows subtype for arguments of `throw`. This also renames `shouldBeSubTypeOrUnreachable` to `shouldBeSubTypeOrFirstIsUnreachable`, to be consistent with `shouldBeEqualOrFirstIsUnreachable`.
* Add line and col info to wast parser exceptions (#2570)Heejin Ahn2020-01-061-63/+80
| This adds line and column info to wast parser exception messages to make them more readable when they are encountered. In other cases this makes existing line and column numbers more fine grained, or adds some helpful strings (if line and column info is not available).
* Skip liveness analysis if too many locals (#2560)Alon Zakai2020-01-063-0/+22
| | | | | | | | | | | | | | | The analysis currently uses a dense matrix. If there are >65535 locals then the indexes don't fit in a 32-bit type like a wasm32 index, which led to overflows and incorrect behavior. To avoid that, don't run passes with liveness analysis for now if they have that many locals. Note that skipping coalesce-locals (the main liveness-using pass) is not that bad, as we run it more than once, and it's likely that even if the first must be skipped, we can still run the second (which is after simplify- and reorder-locals, which can greatly reduce the local count). Fixes #2559
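The overflow mentioned above can be illustrated with a little arithmetic. This sketch assumes the dense matrix has one entry per pair of locals (numLocals x numLocals) addressed by a 32-bit index; that layout and the helper names are assumptions for illustration, not taken from the pass's source.

```typescript
// Assumed layout: a dense matrix with one entry per (local, local) pair.
function denseMatrixEntries(numLocals: number): number {
  return numLocals * numLocals;
}

// Whether a count can be represented in an unsigned 32-bit index.
function fitsInUint32(n: number): boolean {
  return n <= 0xffffffff;
}

// What a 32-bit unsigned computation would see after wraparound.
function wrappedTo32Bits(n: number): number {
  return n >>> 0;
}

const atLimit = denseMatrixEntries(65535); // 4294836225, still below 2^32
const overLimit = denseMatrixEntries(65536); // 4294967296, exactly 2^32
```

Under this assumption, 65535 locals still fit, while 65536 locals produce a count that wraps to 0 in 32 bits, which is consistent with the threshold named in the message.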
* Parse memarg in atomic.wait and atomic.notify (#2569)Heejin Ahn2020-01-032-21/+39
| - Allow `atomic.notify` and `atomic.wait` instructions to parse memory arguments (`align` and `offset`) and print the offset in these instructions when writing binary, rather than assuming it to be 0
| - Change arguments of `parseMemAttributes` to be references
* Generate push/pop in stack IR (#2566)Heejin Ahn2020-01-033-7/+8
| | | | | | | | | | | We have not been generating push and pop instructions in the stack IR. Even though they are not written in binary, they have to be in the stack IR to match the number of inputs and outputs of instructions. Currently `BinaryenIRWriter` is used both for stack IR generation and binary generation, so we should emit those instructions in `BinaryenIRWriter`. `BinaryenIRToBinaryWriter`, which inherits `BinaryenIRWriter`, does not do anything for push and pop instructions, so they are still not emitted in binary.
* Use FeatureSet instead of FeatureSet::Feature(NFC) (#2562)Heejin Ahn2020-01-023-35/+34
| This uses `FeatureSet` in place of `FeatureSet::Feature` when possible, making it possible for functions to take a set of multiple features as one argument.
* Add support for reference types proposal (#2451)Heejin Ahn2019-12-3059-477/+1532
| | | | | | | | | | | | This adds support for the reference type proposal. This includes support for all reference types (`anyref`, `funcref`(=`anyfunc`), and `nullref`) and four new instructions: `ref.null`, `ref.is_null`, `ref.func`, and new typed `select`. This also adds subtype relationship support between reference types. This does not include table instructions yet. This also does not include wasm2js support. Fixes #2444 and fixes #2447.
* Move Type-related functions into Type class (NFC) (#2556)Heejin Ahn2019-12-2915-128/+143
| Several type-related functions currently exist outside of the `Type` class and thus in the `wasm` namespace, which is effectively global. This moves these functions into the `Type` class, making them either member functions or static functions.
|
| Also this renames `getSize` to `getByteSize` so that it is not confused with `size`, which returns the number of types in multiple types.
|
| This also reorders the functions in `wasm-type.cpp` to match the order in `wasm-type.h`.
* Fix for binaryen.js getExpressionInfo on switch names (#2553)Brion Vibber2019-12-231-1/+4
| Switch label names for br_table instructions were corrupted in the binaryen.js API layer, with each label cropped down to a number of characters equal to its index in the list.
|
| This was due to passing UTF8ToString as a callback method to Array.prototype.map, which passes the index as second parameter. The second parameter of UTF8ToString is the max number of bytes to copy, so the initial label came out as '', then 'l', then 'la', 'lab', etc.
|
| Corrected an existing test case that had the wrong output in it.
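The pitfall described above can be reproduced in miniature. In this sketch `readString` is a hypothetical stand-in for UTF8ToString (it takes a string rather than a pointer, and counts characters rather than bytes); only the shape of the bug is meant to match.

```typescript
// Hypothetical stand-in for UTF8ToString: the optional second argument
// caps how many characters are read.
function readString(s: string, maxChars?: number): string {
  return maxChars === undefined ? s : s.slice(0, maxChars);
}

const labels: string[] = ["label", "label", "label", "label"];

// Buggy: map calls the callback as (element, index, array), so the index
// silently becomes the length cap.
const cropped: string[] = labels.map(readString);

// Fixed: wrap the call so that only the element is forwarded.
const intact: string[] = labels.map((s) => readString(s));
```

Here `cropped` comes out as `["", "l", "la", "lab"]`, matching the symptom in the message, while `intact` keeps every label whole.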
* Refactor module element related functions (NFC) (#2550)Heejin Ahn2019-12-232-82/+72
| | | | This does something similar to #2489 for more functions, removing boilerplate code for each module element using template functions.
* Fix memory size calculation in MemoryPacking (#2548)Heejin Ahn2019-12-201-1/+6
| | | | Because `memory.size` returns the size in number of pages, we have to multiply the size with the page size when converting `memory.init`.
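The unit conversion mentioned here is simple but easy to miss. A minimal sketch, using the standard 64 KiB WebAssembly page size; `pagesToBytes` is an illustrative name, not a function from MemoryPacking.

```typescript
// A WebAssembly page is 64 KiB.
const WASM_PAGE_SIZE = 65536;

// `memory.size` reports pages, while `memory.init` bounds are in bytes,
// so a page count must be scaled before comparing it with byte offsets.
function pagesToBytes(pages: number): number {
  return pages * WASM_PAGE_SIZE;
}
```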
* DWARF debug line updating (#2545)Alon Zakai2019-12-2011-46/+406
| With this, we can update DWARF debug line info properly as we write a new binary.
|
| To do that we track binary locations as we write. Each instruction is mapped to the location it is written to. We must also adjust them as we move code around because of LEB optimization (we emit a function or a section with a 5-byte LEB placeholder, the maximal size; later we shrink it which is almost always possible).
|
| writeDWARFSections() now takes a second param, the new locations of instructions. It then maps debug line info from the original offsets in the binary to the new offsets in the binary being written.
|
| The core logic for updating the debug line section is in wasm-debug.cpp. It basically tracks state machine logic both to read the existing debug lines and to emit the new ones. I couldn't find a way to reuse LLVM code for this, but reading LLVM's code was very useful here.
|
| A final tricky thing we need to do is to update the DWARF section's internal size annotation. The LLVM YAML writing code doesn't do that for us. Luckily it's pretty easy, in fixEmittedSection we just update the first 4 bytes in place to have the section size, after we've emitted it and know the size.
|
| This ignores debug lines with a 0 in the line, col, or addr, see WebAssembly/debugging#9 (comment)
|
| This ignores debug line offsets into the middle of instructions, which LLVM sometimes emits for some reason, see WebAssembly/debugging#9 (comment)
| Handling that would likely at least double our memory usage, which is unfortunate - we are run in an LTO manner, where the entire app's DWARF is present, and it may be massive. I think we should see if such odd offsets are a bug in LLVM, and if we can fix or prevent that.
|
| This does not emit "special" opcodes for debug lines. Those are purely an optimization, which I wanted to leave for later.
| (Even without them we decrease the size quite a lot, btw, as many lines have 0s in them...)
|
| This adds some testing that shows we can load and save fib2.c and fannkuch.cpp properly. The latter includes more than one function and has nontrivial code.
|
| To actually emit correct offsets a few minor fixes are done here:
|
| * Fix the code section location tracking during reading - the correct offset we care about is the body of the code section, not including the section declaration and size.
| * Fix wasm-stack debug line emitting. We need to update in BinaryInstWriter::visit(), that is, right before writing bytes for the instruction. That differs from BinaryenIRWriter::visit which is a recursive function that also calls the children - so the offset there would be of the first child. For some reason that is correct with source maps, I don't understand why, but it's wrong for DWARF...
| * Print code section offsets in hex, to match other tools.
|
| Remove DWARFUpdate pass, which was useful for testing temporarily, but doesn't make sense now (it just updates without writing a binary).
|
| cc @yurydelendik
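The "5-byte LEB placeholder" mentioned in this message refers to unsigned LEB128 encoding, where a 32-bit value may be written in its minimal form or padded out to the maximal 5 bytes. The sketch below is a generic illustration of that encoding with hypothetical function names; it is not Binaryen's writer code.

```typescript
// Minimal unsigned LEB128 encoding of a 32-bit value.
function encodeU32Leb(value: number): number[] {
  const bytes: number[] = [];
  let rest = value >>> 0;
  do {
    let byte = rest & 0x7f;
    rest >>>= 7;
    if (rest !== 0) {
      byte |= 0x80; // continuation bit: more bytes follow
    }
    bytes.push(byte);
  } while (rest !== 0);
  return bytes;
}

// The same value padded to 5 bytes, usable as a fixed-size placeholder
// that can be filled in before the final size is known.
function encodeU32LebPadded(value: number): number[] {
  const bytes: number[] = [];
  let rest = value >>> 0;
  for (let i = 0; i < 5; i++) {
    let byte = rest & 0x7f;
    rest >>>= 7;
    if (i < 4) {
      byte |= 0x80; // force continuation on all but the last byte
    }
    bytes.push(byte);
  }
  return bytes;
}

// Bytes saved by shrinking a padded placeholder to the minimal form.
function bytesSavedByShrinking(value: number): number {
  return encodeU32LebPadded(value).length - encodeU32Leb(value).length;
}
```

For a size of 300, the minimal form takes 2 bytes and the placeholder 5, so everything written after it moves back by 3 bytes once it is shrunk. That shift is why recorded instruction locations have to be adjusted.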
* Strip DWARF in finalize, to avoid keeping it around til later unnecessarily (#2544)Alon Zakai2019-12-201-0/+8
| Without this, the first wasm-opt invocation will remove it. But it can be very large, and we will soon start to automatically do updating on it when it exists, so avoid the work if we aren't actually building a final output with dwarf.
* Reland "Fix renaming in FixInvokeFunctionNamesWalker (#2513)" (#2542)Sam Clegg2019-12-201-15/+26
| * Reland "Fix renaming in FixInvokeFunctionNamesWalker (#2513)"
|
| In the previous iteration of this change we were not calling `renameFunctions` for each of the functions we removed. The problem manifested itself when we renamed the imported function `emscripten_longjmp_jmpbuf` to `emscripten_longjmp`. In this case the import of `emscripten_longjmp` already exists so we remove the import of `emscripten_longjmp_jmpbuf` but we were not correctly calling renameFunctions to handle the rename of all the uses.
|
| Add an additional test case to cover the failures that we saw on the emscripten tree.
* Binary format code section offset tracking (#2515)Alon Zakai2019-12-1910-8/+118
| Optionally track the binary format code section offsets, that is, when loading a binary, remember where each IR node was read from. This is necessary for DWARF debug info, as these are the offsets DWARF refers to.
|
| (Note that eventually we may want to do something else, like first read the DWARF and only then add debug info annotations into the IR in a more LLVM-like manner, but this is more straightforward and should be enough to update debug lines and ranges).
|
| This tracking adds noticeable overhead - every single IR node adds an entry in a map - so avoid it unless actually necessary. Specifically, if the user passes in -g and there are actually DWARF sections in the binary, and we are not about to remove those sections, then we need it.
|
| Print binary format code section offsets in text, when printing with -g. This will help debug and test dwarf support. It looks like
|
|   ;; code offset: 0x7
|
| as an annotation right before each node.
|
| Also add support for -g in wasm-opt tests (unlike a pass, it has just one `-` as a prefix).
|
| Helps #2400
* Compile Binaryen to WebAssembly (#2503)Daniel Wirtz2019-12-191-428/+498
| This PR enables compiling Binaryen to WebAssembly when building binaryen.js. Since WebAssembly is best compiled and instantiated asynchronously in browsers, it also adds a new mechanism to tell whether, and when, the module is ready, by means of one of the following:
|
|     // Using a promise
|     const binaryen = require("binaryen");
|     binaryen.ready.then(() => {
|       ... use normally ...
|     });
|
|     // Using await
|     const binaryen = require("binaryen");
|     (async () => {
|       await binaryen.ready;
|       ... use normally ...
|     })();
|
|     // Where top-level await is available
|     const binaryen = await require("binaryen").ready;
|     ... use normally ...
|
| One can also tell if Binaryen is already ready (for example when assuming it in follow-up code) by:
|
|     if (/* we already know that */ binaryen.isReady) {
|       ... use normally ...
|     } else {
|       throw Error("Binaryen is supposed to be ready here but isn't");
|     }
|
| The JS test cases have been updated accordingly by wrapping everything in a test function and invoking it once ready. Documentation will have to be updated as well to cover this of course.
|
| New file size is about 2.5mb, even though the Wasm becomes inlined into the JS file which makes distribution across different environments a lot easier. Also makes building binaryen (to either js or wasm) emit binaryen.js, and not binaryen_js.js etc.
|
| Supersedes and thus fixes #1381
| With .ready it also fixes #2452
* Revert "Fix renaming in FixInvokeFunctionNamesWalker (#2513)" (#2541)Sam Clegg2019-12-191-13/+8
| | | This reverts commit f0a2e2c75c7bb3008f10b6edbb8dc4cfd27b7d28.
* DWARF parsing and writing support using LLVM (#2520)Alon Zakai2019-12-197-0/+256
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This imports LLVM code for DWARF handling. That code has the Apache 2 license like us. It's also the same code used to emit DWARF in the common toolchain, so it seems like a safe choice. This adds two passes: --dwarfdump which runs the same code LLVM runs for llvm-dwarfdump. This shows we can parse it ok, and will be useful for debugging. And --dwarfupdate writes out the DWARF sections (unchanged from what we read, so it just roundtrips - for updating we need #2515). This puts LLVM in thirdparty which is added here. All the LLVM code is behind USE_LLVM_DWARF, which is on by default, but off in JS for now, as it increases code size by 20%. This current approach imports the LLVM files directly. This is not how they are intended to be used, so it required a bunch of local changes - more than I expected actually, for the platform-specific stuff. For now this seems to work, so it may be good enough, but in the long term we may want to switch to linking against libllvm. A downside to doing that is that binaryen users would need to have an LLVM build, and even in the waterfall builds we'd have a problem - while we ship LLVM there anyhow, we constantly update it, which means that binaryen would need to be on latest llvm all the time too (which otherwise, given DWARF is quite stable, we might not need to constantly update). An even larger issue is that as I did this work I learned about how DWARF works in LLVM, and while the reading code is easy to reuse, the writing code is trickier. The main code path is heavily integrated with the MC layer, which we don't have - we might want to create a "fake MC layer" for that, but it sounds hard. Instead, there is the YAML path which is used mostly for testing, and which can convert DWARF to and from YAML and from binary. Using the non-YAML parts there, we can convert binary DWARF to the YAML layer's nice Info data, then convert that to binary. 
This works, however, this is not the path LLVM uses normally, and it supports only some basic DWARF sections - I had to add ranges support, in fact. So if we need more complex things, we may end up needing to use the MC layer approach, or consider some other DWARF library. However, hopefully that should not affect the core binaryen code which just calls a library for DWARF stuff. Helps #2400
* Fix trapping and dangling insts in memory packing (#2540)Heejin Ahn2019-12-191-4/+14
| This does two things:
| - Restore the `visitDataDrop` handler deleted in #2529, but now we convert invalid `data.drop`s not to `unreachable` but to `nop`. This conforms to the revised spec that `data.drop` on the active segment can be treated as a nop.
| - Make `visitMemoryInit` trap if offset or size are not equal to 0 or if the dest address is out of bounds. Otherwise drop all its arguments.
|
| Fixes #2535.
* SIMD {i8x16,i16x8}.avgr_u instructions (#2539)Thomas Lively2019-12-1814-1/+68
| | | As specified in https://github.com/WebAssembly/simd/pull/126.
* Correctly clear memory / table info in clearModule (#2536)Heejin Ahn2019-12-172-2/+17
| | | | | | Currently `ModuleUtils::clearModule` does not clear `exists` flags in the memory and table, and running RoundTrip pass on any module that has a memory or a table fails as a result. This creates `clear` function in `Memory` and `Table` and makes `clearModule` call them.
* Fix renaming in FixInvokeFunctionNamesWalker (#2513)Sam Clegg2019-12-171-8/+13
| This fixes https://github.com/emscripten-core/emscripten/issues/9950.
|
| The issue only shows up when debug names are not present so most of the changes in this CL come from disabling debug names in the lld tests. We want to make sure that wasm-emscripten-finalize runs fine without debug names so I think it makes most sense to test in this mode.
|
| The actual bugfix is in wasm-emscripten.cpp as part of the FixInvokeFunctionNamesWalker. The problem was that the name of the function, rather than its import name, was being added to importRenames. This means that when debug names were present (and the two names were the same) we didn't see the bug.
* Implement 0-len/drop spec changes in bulk memory (#2529)Heejin Ahn2019-12-162-19/+22
| This implements recent bulk memory spec changes (WebAssembly/bulk-memory-operations#126) in Binaryen. Now `data.drop` is equivalent to shrinking a segment size to 0, and dropping already dropped segments or active segments (which are thought to be dropped in the beginning) is treated as a no-op. And all bounds checking is performed in advance, so partial copying/filling/initializing does not occur.
|
| I tried to implement `visitDataDrop` in the interpreter as `segment.data.clear();`, which is exactly what the revised spec says. I didn't end up doing that because this also deletes all contents from active segments, and there are cases where we shouldn't do that:
| - `wasm-ctor-eval` shouldn't delete active segments, because it will store the changed contents back into segments
| - When `--fuzz-exec` is given to `wasm-opt`, it runs the module and compares the execution call results before and after transformations. But if running a module will nullify all active segments, applying any transformation to the module or re-running it does not make any sense.
* Improve RoundTrip pass: avoid copying (#2531)Alon Zakai2019-12-161-5/+3
|
* Write wasm/wast files with BINARYEN_PASS_DEBUG=3 (#2527)Heejin Ahn2019-12-131-3/+3
| Currently `BINARYEN_PASS_DEBUG=3` prints `.wasm` files but they are actually text wast files. This makes `BINARYEN_PASS_DEBUG=3` print both wasm and wast files, where the wasm file contains the binary and the wast file the text.
* Remove redundant instructions in Flatten (#2524)Heejin Ahn2019-12-121-17/+23
| When the expression type is none, it does not seem to be necessary to make it a prelude and insert a nop. This also results in unnecessary blocks that contain an expression with a nop, which can be reduced to just the expression. This also adds some newlines to improve readability.
* Support stack overflow checks in standalone mode (#2525)Alon Zakai2019-12-123-5/+22
| | | | | | | | | In normal mode we call a JS import, but we can't import from JS in standalone mode. Instead, just trap in that case with an unreachable. (The error reporting is not as good in this case, but at least it catches all errors and halts, and the emitted wasm is valid for standalone mode.) Helps emscripten-core/emscripten#10019
* Make local.tee's type its local's type (#2511)Heejin Ahn2019-12-1222-57/+71
| According to the current spec, `local.tee`'s return type should be the same as its local's type. (Discussion on whether we should change this rule is ongoing in WebAssembly/reference-types#55, but here I will assume this spec does not change. If this changes, we should change many parts of Binaryen transformation anyway...)
|
| But currently in Binaryen `local.tee`'s type is computed from its value's type. This didn't make any difference in the MVP, but after we have subtype relationship in #2451, this can become a problem. For example:
|
| ```
| (func $test (result funcref) (local $0 anyref)
|   (local.tee $0
|     (ref.func $test)
|   )
| )
| ```
|
| This shouldn't validate in the spec, but this will pass Binaryen validation with the current `local.tee` implementation.
|
| This makes `local.tee`'s type computed from the local's type, and makes `LocalSet::makeTee` take a type parameter, to which we should pass its corresponding local's type. We don't embed the local type in the class `LocalSet` because it may increase memory size.
|
| This also fixes the type of `local.get` to be the local type where a `local.get` and `local.set` pair is created from `local.tee`.
* Remove FunctionType (#2510)Thomas Lively2019-12-1162-1477/+848
| | | | | | | | | | | | | | | | | Function signatures were previously redundantly stored on Function objects as well as on FunctionType objects. These two signature representations had to always be kept in sync, which was error-prone and needlessly complex. This PR takes advantage of the new ability of Type to represent multiple value types by consolidating function signatures as a pair of Types (params and results) stored on the Function object. Since there are no longer module-global named function types, significant changes had to be made to the printing and emitting of function types, as well as their parsing and manipulation in various passes. The C and JS APIs and their tests also had to be updated to remove named function types.
* Fix loop parent computation in DataFlow.Graph (#2522)Heejin Ahn2019-12-111-0/+3
| | | | | | This fixes the parent-child relationship computation in `DataFlow.Graph` when there is a loop. This wasn't discovered until now because this is used in Souperify and Souperify only runs after Flatten pass, which produces redundant blocks between inside and outside of a loop.
* Add a RoundTrip pass (#2516)Alon Zakai2019-12-095-2/+92
| | | | | | This pass writes and reads the module. This shows the effects of converting to and back from the binary format, and will be useful in testing dwarf debug support (where we'll need to see that writing and reading a module preserves debug info properly).
* Fix comparison of none and unreachable types (#2514)Heejin Ahn2019-12-091-2/+2
| Currently `none` and `unreachable` types are stored as the same empty `{}` in src/wasm/wasm-type.cpp. This makes `Type::operator<` behave incorrectly when given `none` and `unreachable`, because it expands both given types and lexicographically compares them, when both of the expanded vectors will be empty.
|
| This was found by the fuzzer. This line in `Modder::visitExpression` tries to retrieve candidates of the same type. Because we can't really compare these two types, if you give `unreachable` as the key, candidates of `none` type can be returned. This generates incorrect code that ends up failing in validation in a very weird way.
|
| It was hard to generate a small testcase to trigger this part because it was found by generating fuzzed code from a random data file. But I guess this fix is pretty straightforward.
|
| Fixes #2512.
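The comparison problem can be modelled in a few lines. This sketch assumes, as the message states, that both `none` and `unreachable` expand to an empty list; the helper and variable names are hypothetical and the string-based type representation is purely illustrative.

```typescript
// Lexicographic "less than" over expanded type lists.
function lexLess(a: string[], b: string[]): boolean {
  const shared = Math.min(a.length, b.length);
  for (let i = 0; i < shared; i++) {
    const x = a[i] as string;
    const y = b[i] as string;
    if (x !== y) {
      return x < y;
    }
  }
  return a.length < b.length;
}

// Assumed expansions: both special types expand to nothing.
const noneExpanded: string[] = [];
const unreachableExpanded: string[] = [];

// Neither compares less than the other, so an ordered lookup keyed on this
// comparison cannot tell the two types apart.
const indistinguishable =
  !lexLess(noneExpanded, unreachableExpanded) &&
  !lexLess(unreachableExpanded, noneExpanded);
```

When neither key is less than the other, an ordered container treats them as equivalent, which is how a lookup for `unreachable` could return `none` candidates.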
* Use wat over wast for text format filenames (#2518)Sam Clegg2019-12-0810-17/+13
|
* Don't include `$` with names unless outputting to wat format (#2506)Sam Clegg2019-12-062-20/+26
| The `$` is not actually part of the name, it's the marker that starts a name in the wat format. It can be confusing to see it show up when doing `cerr << name`, for example.
|
| This change has Print.cpp add the `$`, which seems like the right place to do this. Plus it revealed a bunch of places where we were not calling printName to escape all the names we were printing.
* Avoid errors in binaryen.js assertions builds, and enable ASSERTIONS in debug builds. (#2507)Alon Zakai2019-12-061-0/+6
|
* Include in minification all imports from modules starting with `wasi_` (#2509)Sam Clegg2019-12-051-3/+1
| | | | | | This allows us to support not just wasi_unstable but also the new wasi_snapshot_preview1 and beyond. See https://github.com/emscripten-core/emscripten/pull/9956
* Add some tracing to wasm-emscripten-finalize (#2505)Sam Clegg2019-12-053-9/+30
| Also fix a bug in splitting the names of the trace channels. Obviously I can't write string.split correctly in C first time around.
* Add string parameter to WASM_UNREACHABLE (#2499)Sam Clegg2019-12-0556-420/+450
| This works more like llvm's unreachable handler in that it preserves information even in release builds.
* Add BYN_ENABLE_ASSERTSION option to allow assertions to be disabled. (#2500)Sam Clegg2019-12-049-6/+29
| We always enable assertions by default, but this option allows for a build without them.
|
| Fix all errors in the ASSERTIONS=OFF build; even though we don't normally build this, it's good to keep it building.