path: root/src
Commit message (Collapse)AuthorAgeFilesLines
* Add a HeapType method for getting the rec group index (#4480)Thomas Lively2022-01-272-2/+10
| | | | | Storing the rec group index on the HeapTypeInfo avoids having to do a linear scan through the rec group to find the index for a particular type. This will be important for isorecursive canonicalization, which uses rec group indices.
* [NFC] Split shallow and deep updating of CanonicalizationState (#4478)Thomas Lively2022-01-261-19/+41
| | | | | | | Since isorecursive canonicalization will happen incrementally in a bottom-up manner, it will be more efficient if we can more precisely control which types get updated at each step. Refactor the `CanonicalizationState` interface to separately expose shallow updating, which updates only the top-level state, and deep updating of the use sites within HeapTypeInfos.
* [NFC] Templatize/generalize RuntimeExpressionRunner (#4477)Alon Zakai2022-01-261-9/+20
| | | | | | | | | Add a base class for it, that is templated and can be extended in general ways, and make callFunction templated on the runner to use as well. This allows the interpreter's behavior to be customized in a way that we couldn't so far. wasm-ctor-eval wants to use a special Runner when it evals a function, so that it can track certain operations, which this will enable.
* Remove NoExitRuntime pass (#4431)Alon Zakai2022-01-264-71/+0
| | | | After emscripten-core/emscripten#15905 lands Emscripten will no longer use it, and nothing else needs it AFAIK.
* Remove old EffectAnalyzer hacks for asm.js debugInfo (#4457)Alon Zakai2022-01-261-9/+1
| | | | | | | | In asm2wasm we modelled debugInfo using special imports, and we tried not to move them around much. Current debugInfo is tracked on instructions and is not affected by removing this. This may have a tiny beneficial effect on code size in debug builds, perhaps.
* [OptimizeInstructions] Combine some relational ops joined Or/And (Part 7-8) (#4399)Max Graey2022-01-261-12/+63
|
| Final part of #4265
|
|   (i32(x) >= 0) & (i32(y) >= 0)     ==>   i32(x | y) >= 0
|   (i64(x) >= 0) & (i64(y) >= 0)     ==>   i64(x | y) >= 0
|   (i32(x) == -1) & (i32(y) == -1)   ==>   i32(x & y) == -1
|   (i64(x) == -1) & (i64(y) == -1)   ==>   i64(x & y) == -1
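The rewrite rules above rely on two sign-bit facts: `x | y` is non-negative only when both operands are, and `x & y` is all ones only when both operands are. A minimal C++ sketch that checks the i32 forms on boundary values follows; the function names are illustrative and do not appear in Binaryen.

```cpp
#include <cassert>
#include <cstdint>

// Original form: two separate comparisons joined by a bitwise And.
bool bothNonNegative(int32_t x, int32_t y) { return (x >= 0) & (y >= 0); }

// Combined form: the sign bit of x | y is clear only if it is clear in both.
bool bothNonNegativeCombined(int32_t x, int32_t y) { return (x | y) >= 0; }

// Original form: both values equal -1 (all bits set).
bool bothMinusOne(int32_t x, int32_t y) { return (x == -1) & (y == -1); }

// Combined form: x & y has all bits set only if both operands do.
bool bothMinusOneCombined(int32_t x, int32_t y) { return (x & y) == -1; }

// Returns true if original and combined forms agree on every sample pair.
bool rulesAgreeOnSamples() {
  const int32_t samples[] = {INT32_MIN, -2, -1, 0, 1, 2, INT32_MAX};
  for (int32_t x : samples) {
    for (int32_t y : samples) {
      if (bothNonNegative(x, y) != bothNonNegativeCombined(x, y)) {
        return false;
      }
      if (bothMinusOne(x, y) != bothMinusOneCombined(x, y)) {
        return false;
      }
    }
  }
  return true;
}
```

The combined forms replace two comparisons and a logical operation with one bitwise operation and one comparison, which is presumably where the code-size benefit comes from.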
* Isorecursive type validation (#4475)Thomas Lively2022-01-262-43/+106
| | | | | | | When building isorecursive types, validate their relationships according to the rules described in https://github.com/WebAssembly/gc/pull/243. Specifically, supertypes must be declared before their subtypes to statically prevent cycles and child types must be declared either before or in the same recursion group as their parents.
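The two ordering rules in this commit message can be sketched as a single pass over the declarations. This is an illustration only: `TypeDecl` and `isValidDeclarationOrder` are hypothetical names, types are identified by their position in declaration order, and the real validation in Binaryen works on its own internal structures.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical, simplified declaration record (not Binaryen's HeapTypeInfo).
struct TypeDecl {
  size_t recGroup;                 // index of the enclosing recursion group
  std::optional<size_t> supertype; // declaration index of the supertype, if any
  std::vector<size_t> children;    // declaration indices of referenced types
};

// Checks the two rules described above:
//  1. a supertype must be declared strictly before its subtype;
//  2. a child type must be declared before its parent or share its rec group.
bool isValidDeclarationOrder(const std::vector<TypeDecl>& types) {
  for (size_t i = 0; i < types.size(); ++i) {
    const TypeDecl& type = types[i];
    if (type.supertype && *type.supertype >= i) {
      return false; // supertype declared at or after its subtype
    }
    for (size_t child : type.children) {
      if (child >= types.size()) {
        return false; // reference to an undeclared type
      }
      bool declaredBefore = child < i;
      bool sameGroup = types[child].recGroup == type.recGroup;
      if (!declaredBefore && !sameGroup) {
        return false; // forward reference that leaves the rec group
      }
    }
  }
  return true;
}
```

Because rule 1 requires a strictly smaller index, a chain of supertypes can never loop back on itself, which is how the ordering prevents cycles statically.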
* Asyncify: Use stack instead of recursive call to avoid stack overflow (#4433)Yuta Saito2022-01-251-71/+147
| | | | | Rewrite AsyncifyFlow.process to use stack instead of recursive call. This patch resolves #4401
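The general technique, replacing recursion with an explicit work stack so that depth is limited by heap memory rather than the native call stack, can be sketched as below. This is not the Asyncify code: `Node`, `sumRecursive`, `sumIterative`, and `makeChain` are hypothetical stand-ins for walking a nested IR.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical tree node standing in for nested IR.
struct Node {
  int value = 0;
  std::vector<std::unique_ptr<Node>> children;
};

// Recursive walk: native stack usage grows with the depth of the tree.
int sumRecursive(const Node& node) {
  int total = node.value;
  for (const auto& child : node.children) {
    total += sumRecursive(*child);
  }
  return total;
}

// Iterative walk: pending nodes live in a heap-allocated vector instead.
int sumIterative(const Node& root) {
  int total = 0;
  std::vector<const Node*> work{&root};
  while (!work.empty()) {
    const Node* node = work.back();
    work.pop_back();
    total += node->value;
    for (const auto& child : node->children) {
      work.push_back(child.get());
    }
  }
  return total;
}

// Builds a single chain of `depth` nodes, each with value 1.
std::unique_ptr<Node> makeChain(int depth) {
  auto root = std::make_unique<Node>();
  root->value = 1;
  Node* tail = root.get();
  for (int i = 1; i < depth; ++i) {
    tail->children.push_back(std::make_unique<Node>());
    tail = tail->children.back().get();
    tail->value = 1;
  }
  return root;
}
```

Note that the `unique_ptr` destructors in this sketch still recurse, so the example tree itself should be kept to a modest depth.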
* Make `TypeBuilder::build()` fallible (#4474)Thomas Lively2022-01-256-27/+85
| | | | | | | | | | | It is possible for type building to fail, for example if the declared nominal supertypes form a cycle or are structurally invalid. Previously we would report a fatal error and kill the program from inside `TypeBuilder::build()` in these situations, but this handles errors at the wrong layer of the code base and is inconvenient for testing the error cases. In preparation for testing the new error cases introduced by isorecursive typing, make type building fallible and add new tests for existing error cases. Also fix supertype cycle detection, which it turns out did not work correctly.
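A fallible build can be sketched as a function that returns either the built result or an error value, leaving the caller to decide how to report it. The names below (`BuiltTypes`, `BuildError`, `buildTypes`) are hypothetical and the input is reduced to a list of optional supertype indices; the actual `TypeBuilder::build()` signature is not shown in this log.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <variant>
#include <vector>

// Hypothetical result types for this sketch.
struct BuiltTypes {
  size_t count; // number of types that were built
};
struct BuildError {
  size_t index;       // index of the type where the problem was found
  const char* reason; // human-readable description
};
using BuildResult = std::variant<BuiltTypes, BuildError>;

// Each entry holds the index of that type's declared supertype, if any.
// Instead of aborting on bad input, return an error the caller can inspect.
BuildResult buildTypes(const std::vector<std::optional<size_t>>& supertypes) {
  const size_t n = supertypes.size();
  for (size_t i = 0; i < n; ++i) {
    // Walk up the supertype chain. An acyclic chain visits fewer than n
    // types, so taking more than n steps means the chain loops.
    std::optional<size_t> current = supertypes[i];
    size_t steps = 0;
    while (current) {
      if (*current >= n) {
        return BuildError{i, "supertype index out of range"};
      }
      if (++steps > n) {
        return BuildError{i, "supertype cycle"};
      }
      current = supertypes[*current];
    }
  }
  return BuiltTypes{n};
}
```

A test can then assert on the error alternative directly rather than having to observe a process abort.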
* Parse, create, and print isorecursive recursion groups (#4464)Thomas Lively2022-01-215-28/+265
| | | | | | | | | | | | | In `--hybrid` isorecursive mode, associate each defined type with a recursion group, represented as a `(rec ...)` wrapping the type definitions in the text format. Parse that text format, create the rec groups using a new TypeBuilder method, and print the rec groups in the printer. The only semantic difference rec groups currently make is that if one type in a rec group will be included in the output, all the types in that rec group will be included. This is because changing a rec group in any way (for example by removing a type) changes the identity of the types in that group in the isorecursive type system. Notably, rec groups do not yet participate in validation, so `--hybrid` is largely equivalent to `--nominal` for now.
* StackCheck: Add argument stack-check-handler call (#4471)Sam Clegg2022-01-211-4/+9
| | | | | | | This function call now takes the address (which by definition is outside of the stack range) that the program was attempting to set SP to. This allows emscripten to provide a more useful error message on stack over/under flow.
* Create `ParentIndexIterator` to reduce iterator boilerplate (#4469)Thomas Lively2022-01-213-66/+108
| | | | | | | Add a utility class for defining all the common operations like pre- and post- increment and decrement, addition and subtraction, and assigning addition and subtraction for iterators that are comprised of a parent object and an index into that parent object. Use the new utility to reduce the boilerplate in wasm-type.h. Add a new test of the iterator behavior.
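The idea of a shared base for "parent plus index" iterators can be sketched with the curiously recurring template pattern. This is a simplified illustration, not the class added to Binaryen: `ParentIndexIteratorSketch`, `DoubledIterator`, and `sumDoubled` are hypothetical names, and only a few of the operators are shown. It assumes C++17 for aggregate initialization of a class with a base.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Base that supplies index-based iterator operations once. The derived
// class only has to provide operator*.
template<typename Parent, typename Derived> struct ParentIndexIteratorSketch {
  Parent parent;
  size_t index;

  Derived& self() { return static_cast<Derived&>(*this); }

  bool operator==(const ParentIndexIteratorSketch& other) const {
    return index == other.index && parent == other.parent;
  }
  bool operator!=(const ParentIndexIteratorSketch& other) const {
    return !(*this == other);
  }
  Derived& operator++() {
    ++index;
    return self();
  }
  Derived operator++(int) {
    Derived copy = self();
    ++index;
    return copy;
  }
  Derived& operator--() {
    --index;
    return self();
  }
  Derived& operator+=(std::ptrdiff_t n) {
    index = static_cast<size_t>(static_cast<std::ptrdiff_t>(index) + n);
    return self();
  }
  std::ptrdiff_t operator-(const ParentIndexIteratorSketch& other) const {
    return static_cast<std::ptrdiff_t>(index) -
           static_cast<std::ptrdiff_t>(other.index);
  }
};

// Example derived iterator: yields each element of a vector, doubled.
struct DoubledIterator
  : ParentIndexIteratorSketch<const std::vector<int>*, DoubledIterator> {
  int operator*() const { return 2 * (*parent)[index]; }
};

// Sums the doubled elements using only the inherited operators.
int sumDoubled(const std::vector<int>& values) {
  DoubledIterator it{{&values, 0}};
  DoubledIterator end{{&values, values.size()}};
  int total = 0;
  for (; it != end; ++it) {
    total += *it;
  }
  return total;
}
```

Each new iterator type then costs a few lines instead of a full set of hand-written operators.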
* Reset global type state between tests (#4468)Thomas Lively2022-01-202-0/+18
| | | | | Add a `destroyAllTypes` function to clear the global state of the type system and use it in a custom gtest test fixture to ensure that each test starts and ends with a fresh state.
* [OptimizeInstructions] Combine some relational ops joined Or/And (Part 5-6) (#4372)Max Graey2022-01-201-6/+59
|
|   (i32(x) >= 0) | (i32(y) >= 0)     ==>   i32(x & y) >= 0
|   (i64(x) >= 0) | (i64(y) >= 0)     ==>   i64(x & y) >= 0
|   (i32(x) != -1) | (i32(y) != -1)   ==>   i32(x & y) != -1
|   (i64(x) != -1) | (i64(y) != -1)   ==>   i64(x & y) != -1
* Remove unused `isNominal` field on HeapTypeInfo (#4465)Thomas Lively2022-01-204-63/+13
| | | | | | | | This field was originally added with the goal of allowing types from multiple type systems to coexist by determining the type system on a per-type level rather than globally. This goal was never fully achieved and the `isNominal` field is not used outside of tests. Now that we are working on implementing the hybrid isorecursive system, it does not look like having types from multiple systems coexist will be useful in the near term, so clean up this tech debt.
* Update heuristic for finding `__stack_pointer` to allow exports. NFC (#4467)Sam Clegg2022-01-201-6/+7
| | | | | | | | | | | There is no reason the `__stack_pointer` global can't be exported from the module, and in fact I'm experimenting with a non-relocatable main module that requires this. See https://github.com/emscripten-core/emscripten/issues/12682 This heuristic still kind of sucks but should always be good enough for llvm output that always puts the stack pointer first.
* Introduce gtest (#4466)Thomas Lively2022-01-201-10/+0
| Add gtest as a git submodule in third_party and integrate it into the build the same way WABT does. Adds a new executable, `binaryen-unittests`, to execute `gtest_main`. As a nontrivial example test, port one of the `TypeBuilder` tests from example/ to gtest/.
|
| Using gtest has a number of advantages over the current example tests:
|
|  - Tests are compiled and linked at build time rather than runtime, surfacing errors earlier and speeding up test execution.
|  - Tests are all built into a single binary, reducing overall link time and further reducing test overhead.
|  - Tests are built from the same CMake project as the rest of Binaryen, so compiler settings (e.g. sanitizers) are applied uniformly rather than having to be separately set via the COMPILER_FLAGS environment variable.
|  - Using the industry-standard gtest rather than our own script reduces our maintenance burden.
|
| Using gtest will lower the barrier to writing C++ tests and will hopefully lead to us having more proper unit tests.
* SAFE_HEAP: Avoid annotating any function reachable from start function (#4463)Sam Clegg2022-01-191-23/+32
| | | | | | Since https://reviews.llvm.org/D117412 landed it has caused a bunch of SAFE_HEAP tests in emscripten to start failing, because `__wasm_apply_data_relocs` can now sometimes be called from within `__wasm_init_memory` as opposed to directly from the start function.
* Add a `--hybrid` type system option (#4460)Thomas Lively2022-01-193-0/+13
| | | | | Eventually this will enable the isorecursive hybrid type system described in https://github.com/WebAssembly/gc/pull/243, but for now it just throws a fatal error if used.
* Add --no-emit-metadata option to wasm-emscripten-finalize (#4450)Sam Clegg2022-01-191-3/+14
| | | | | | This is useful for the case where we might want to finalize without extracting metadata. See: https://github.com/emscripten-core/emscripten/pull/15918
* Allow import mutable globals used in Asyncify pass (#4427)かめのこにょこにょこ2022-01-141-10/+25
| | | | | | | | | | | This PR is part of the solution to emscripten-core/emscripten#15594. emscripten Asyncify won't work properly in side modules, because the globals, __asyncify_state and __asyncify_data, are not synchronized between main-module and side-modules. A new pass arg, asyncify-side-module, is added to make __asyncify_state and __asyncify_data imported in the instrumented wasm.
* Refactor ModuleUtils::collectHeapTypes (#4455)Thomas Lively2022-01-149-68/+103
| | | | | Update the API to make both the type indices and optimized sorting optional. It will become more important to avoid unnecessary sorting once isorecursive types have been implemented because they will make the sorting more complicated.
* Revert "[OptimizeInstructions] Optimize zero sized bulk memory ops even without "ignoreImplicitTraps" (#4295)" (#4459)Thomas Lively2022-01-141-21/+5
| | | This reverts commit 5cf3521708cfada341285414df2dc7366d7e5454.
* Add fast paths for Literals::getType (#4454)Alon Zakai2022-01-141-0/+6
| | | | | In the common case, avoid allocating a vector and calling malloc. This makes us over 3x faster on the benchmark in #4452
* Optimize Literal constructors and destructor (#4456)Alon Zakai2022-01-141-47/+63
| | | | | | | Handle the isBasic() case first - that inlined function is very fast to call, and it is the common case. Also, do not do unnecessary work there: just write out what we need, instead of always doing a memcpy of 16 bytes. This makes us over 2x faster on the benchmark in #4452
* Add a fast path to isSubType (#4453)Alon Zakai2022-01-141-0/+8
| | | | | We call this very frequently in the interpreter. This is a 25% speedup on the benchmark in #4452
* [NFC] Move ModuleUtils::collectHeapTypes to a .cpp file (#4458)Thomas Lively2022-01-133-152/+178
| | | In preparation for the refactoring in #4455.
* LiteralList => Literals (#4451)Alon Zakai2022-01-136-26/+30
| | | | | | | LiteralList overlaps with Literals, but is less efficient as it is not a SmallVector. Add reserve/capacity methods to SmallVector which are now necessary to compile.
* [OptimizeInstructions] Optimize zero sized bulk memory ops even without "ignoreImplicitTraps" (#4295)Max Graey2022-01-121-5/+21
|
* [ctor-eval] Eval functions with params if ignoring external input (#4446)Alon Zakai2022-01-121-6/+24
| | | | | | | | | | | | | | | | | When ignoring external input, assume params have a value of 0. This makes it possible to eval main(argc, argv) if one is careful and does not actually use those values. This is basically a workaround for main always receiving argc/argv, even if the C code has no args (in that case the compiler emits __original_main for the user's main, and wraps it with a main that adds the args, hence the problem). This is similar to the existing support for handling wasi_args_get when ignoring external input, although it just sets values of zeros for the params. Perhaps it could check for main() specifically and return 1 for argc and a proper buffer for argv somehow, but I think if a program wants to use --ignore-external-input it can avoid actually reading argc/argv.
* [ctor-eval] Followup refactoring to use std::optional for EvalCtorOutcome (#4448)Alon Zakai2022-01-121-21/+16
|
* [ctor-eval] Eval functions with a return value (#4443)Alon Zakai2022-01-121-24/+49
| | | This is necessary for e.g. main() which returns an i32.
* [ctor-eval] Stop if there are any memory.init instructions (#4442)Alon Zakai2022-01-114-57/+99
| | | | | | | | This tool depends (atm) on flattening memory segments. That is not compatible with memory.init which cares about segment identities. This changes flatten() only by adding the check for MemoryInit. The rest is unchanged, although I saw the other two params are not needed and I removed them while I was there.
* [ctor-eval] Add an option to keep some exports (#4441)Alon Zakai2022-01-111-15/+39
| | | | | | | | | | | | | | | | | | | | | | By default wasm-ctor-eval removes exports that it manages to completely eval (if it just partially evals then the export remains, but points to a function with partially-evalled contents). However, in some cases we do want to keep the export around even so, for example during fuzzing (as the fuzzer wants to call the same exports before and after wasm-ctor-eval runs) and also if there is an ABI we need to preserve (like if we manage to eval all of main()), or if the function returns a value (which we don't support yet, but this is a PR to prepare for that). Specifically, there is now a new option: --kept-exports foo,bar That is a list of exports to keep around. Note that when we keep around an export after evalling the ctor we make the export point to a new function. That new function just contains a nop, so that nothing happens when it is called. But the original function is kept around as it may have other callers, who we do not want to modify.
* [ctor-eval] Fix evalling of overlapping table segments (#4440)Alon Zakai2022-01-111-19/+25
|
* [ctor-eval] Partial evaluation (#4438)Alon Zakai2022-01-112-26/+184
| This lets us eval part of a function but not all, which is necessary to handle real-world things like __wasm_call_ctors in LLVM output, as that is the single ctor that is exported and it has calls to the actual ctors.
|
| To do so, we look for a toplevel block and execute its items one by one, in a FunctionScope. If we stop in the middle, then we are performing a partial eval. In that case, we only remove the parts of the function that we evalled, and we also serialize the locals whose values we read from the FunctionScope.
|
| For example, consider this:
|
|   function foo() {
|     return 10;
|   }
|
|   function __wasm_call_ctors() {
|     var x;
|     x = foo();
|     x++;
|     // We stop evalling here.
|     import1();
|     import2(x);
|   }
|
| We can eval x = foo() and x++, but we must stop evalling when we reach the first of those imports. The partially-evalled function then looks like this:
|
|   function __wasm_call_ctors() {
|     var x;
|     x = 11;
|     import1();
|     import2(x);
|   }
|
| That is, we evalled two lines of executing code and simply removed them, and then we wrote out the value of the local at that point, and then the rest of the code in the function is as it used to be.
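The shape of that transformation can be sketched on a toy statement list with a single local. Everything here is hypothetical (`Op`, `Stmt`, `evalPrefix`, `partiallyEval`, `exampleCtors`); wasm-ctor-eval operates on Binaryen IR through its interpreter, not on a structure like this.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy statement kinds for a function with one local, `x`.
enum class Op { SetLocal, AddToLocal, CallImport };
struct Stmt {
  Op op;
  int operand; // constant for SetLocal/AddToLocal, import id for CallImport
};

struct PartialEvalResult {
  size_t evalled; // number of leading statements that were evaluated
  int local;      // value of the local when evaluation stopped
};

// Evaluates statements in order and stops at the first import call, which
// cannot be evaluated ahead of time.
PartialEvalResult evalPrefix(const std::vector<Stmt>& body) {
  PartialEvalResult result{0, 0};
  for (const Stmt& stmt : body) {
    switch (stmt.op) {
      case Op::SetLocal:
        result.local = stmt.operand;
        break;
      case Op::AddToLocal:
        result.local += stmt.operand;
        break;
      case Op::CallImport:
        return result;
    }
    ++result.evalled;
  }
  return result;
}

// Drops the evalled prefix and, when the eval was only partial, writes the
// local's value back with a single SetLocal before the remaining code.
std::vector<Stmt> partiallyEval(const std::vector<Stmt>& body) {
  PartialEvalResult result = evalPrefix(body);
  std::vector<Stmt> out;
  if (result.evalled > 0 && result.evalled < body.size()) {
    out.push_back({Op::SetLocal, result.local});
  }
  out.insert(out.end(),
             body.begin() + static_cast<std::ptrdiff_t>(result.evalled),
             body.end());
  return out;
}

// Mirrors the example above: x = foo() (which is 10); x++; import1();
// import2(x).
std::vector<Stmt> exampleCtors() {
  return {{Op::SetLocal, 10},
          {Op::AddToLocal, 1},
          {Op::CallImport, 1},
          {Op::CallImport, 2}};
}
```

For the example input, the output holds three statements: a write of 11 to the local followed by the two import calls, matching the partially-evalled function shown above.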
* SafeHeap: Avoid instrumenting functions directly called from the "start" (#4439)Sam Clegg2022-01-101-20/+36
|
* Escape \t as well as \n when writing JSON output. (#4437)Sam Clegg2022-01-101-0/+5
| | | | | | | | As it happens, this doesn't (normally) break the resulting EM_ASM or EM_JS strings because (IIUC) JS supports the tab literal inside of strings as well as "\t". However, it's better to preserve the original text so that it looks the same in the JS file as it did in the original source.
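A minimal sketch of this kind of escaping follows. It handles only the two characters named in the commit title; a complete JSON string encoder would also need to handle quotes, backslashes, and other control characters. `escapeForJSON` is a hypothetical name, not the function changed in this commit.

```cpp
#include <cassert>
#include <string>

// Replaces literal newline and tab characters with their two-character
// escape sequences so the original text round-trips through JSON output.
std::string escapeForJSON(const std::string& input) {
  std::string out;
  out.reserve(input.size());
  for (char c : input) {
    switch (c) {
      case '\n':
        out += "\\n";
        break;
      case '\t':
        out += "\\t";
        break;
      default:
        out += c;
    }
  }
  return out;
}
```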
* Fix emscripten build by removing dummy atexit function (#4435)Sam Clegg2022-01-091-4/+0
| Since https://github.com/emscripten-core/emscripten/pull/15905 landed emscripten now includes its own dummy atexit function when building with EXIT_RUNTIME=0. Our dummy function conflicts with the emscripten-provided one:
|
| ```
| wasm-ld: error: duplicate symbol: atexit
| >>> defined in CMakeFiles/binaryen_wasm.dir/src/binaryen-c.cpp.o
| >>> defined in ...wasm32-emscripten/lto/libnoexit.a(atexit_dummy.o)
| ```
|
| Normally overriding symbols from libc does not cause issues, but one needs to be sure to override all the symbols in a given object file so that the object in question (atexit_dummy.o) does not get linked in. In this case some other symbol being defined in atexit_dummy.o (e.g. __cxa_atexit) is likely the cause of the conflict.
|
| Overriding symbols from libc is likely to break in this way as the libc evolves, and since emscripten is now providing a dummy, just as we want, it's better/safer to simply remove our dummy.
* [ctor-eval] Switch logging from stderr to stdout (#4432)Alon Zakai2022-01-071-7/+7
| | | | | This logging is central to what this tool does, and not optional, so stdout makes more sense I think. Also, as I'm re-integrating this on the emscripten side, this makes it simpler.
* Warn about and ignore empty local/param names in name section (#4426)Alon Zakai2022-01-071-3/+17
| | | | | | | Fixes the crash in #4418. Also replace the .at() there with better logic to handle imported functions. See WebAssembly/wabt#1799 for details on why wabt sometimes emits this.
* [ctor-eval] Eval and store changes to globals (#4430)Alon Zakai2022-01-071-16/+11
| | | | | | | | | | This is necessary for being able to optimize real-world code, as it lets us use the stack pointer for example. With this PR we allow changes to globals, and we simply store the final state of the global in the global at the end. Basically the same as we do for memory, but for globals. Remove a test that now fails ("imported2"). Replace it with a nicer test of saving the values of globals. Also add a test for an imported global, which we do not allow (we never did, but I don't see a test for it).
* [ctor-eval] Add --ignore-external-input option (#4428)Alon Zakai2022-01-062-11/+73
| | | | | | | | | | | | This is meant to address one of the main limitations of wasm-ctor-eval in emscripten atm, that libc++ global ctors will read env vars, which means they call an import, which stops us from evalling (see emscripten-core/emscripten#15403). To handle that, this adds an option to ignore external input. When set, we can assume that no env vars will be read, no reading from stdin, no arguments to main(), etc. Perhaps these could each be separate options, but I think keeping it simple for now might be good enough.
* [ctor-eval] Refactor an applyToModule() method instead of hacks [NFC] (#4425)Alon Zakai2022-01-061-19/+38
| | | | | | | Previously this would hackishly apply all execution changes to the memory all the time, and then "undo" it by saving the state before and copying that in. Instead, this PR makes execution write into a side buffer, and now there is a clear method for when we want to actually apply the results to the module.
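The side-buffer approach can be sketched as follows: writes made during evaluation go to a pending map, reads consult that map first, and the module's memory changes only when an explicit apply step runs. All names here (`ModuleMemorySketch`, `EvalSessionSketch`, `evalThenMaybeApply`) are hypothetical, and the per-address map is an assumption of this sketch rather than the data structure used in wasm-ctor-eval.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Stand-in for the module's flattened memory.
struct ModuleMemorySketch {
  std::vector<uint8_t> bytes;
};

class EvalSessionSketch {
  ModuleMemorySketch& module;
  std::map<size_t, uint8_t> pending; // side buffer of not-yet-applied writes

public:
  explicit EvalSessionSketch(ModuleMemorySketch& module) : module(module) {}

  // Stores go to the side buffer; the module is untouched.
  void store(size_t address, uint8_t value) { pending[address] = value; }

  // Loads see pending writes first, then fall back to the module.
  uint8_t load(size_t address) const {
    auto it = pending.find(address);
    return it != pending.end() ? it->second : module.bytes.at(address);
  }

  // The single, explicit point where results become visible in the module.
  void applyToModule() {
    for (const auto& [address, value] : pending) {
      module.bytes.at(address) = value;
    }
    pending.clear();
  }
};

// Runs one store and returns the module bytes, with or without applying.
std::vector<uint8_t> evalThenMaybeApply(bool apply) {
  ModuleMemorySketch module{std::vector<uint8_t>(4, 0)};
  EvalSessionSketch session(module);
  session.store(1, 7);
  if (apply) {
    session.applyToModule();
  }
  return module.bytes;
}
```

With this structure a failed evaluation needs no undo step: the pending writes are simply discarded.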
* [C API] Fix BinaryenTypeCreate argument numTypes type (#4417)chai20102022-01-062-3/+3
| | | All other numXxx arguments use the BinaryenIndex type.
* [ctor-eval] Remove stack hacks (#4429)Alon Zakai2022-01-061-55/+2
| | | | | | | | | Remove some hackish code for fastcomp's stack handling. The stack pointer arrives in an imported global there. Upstream does not do this, so this code is completely unneeded these days (and, frankly, kind of scary as I read it now... it modeled the stack as separate memory from the heap...). Remove the tests for this as well. I verified that there was nothing else in those tests that we need to keep.
* Add categories to --help text (#4421)Alon Zakai2022-01-0516-13/+216
| The general shape of the --help output is now:
|
|   ========================
|   wasm-foo
|
|   Does the foo operation
|   ========================
|
|   wasm-foo opts:
|   --------------
|     --foo-bar ..
|
|   Tool opts:
|   ----------
|     ..
|
| The options are now in categories, with the more specific ones - most likely to be wanted by the user - first. I think this makes the list a lot less confusing.
|
| In particular, in wasm-opt all the opt passes are now in their own category.
|
| Also add a script to make it easy to update the help tests.
* Turn an assertion on not colliding with an internal name into an error (#4422)Alon Zakai2022-01-052-3/+6
| | | | | | Without this, the result in a build without assertions might be quite confusing. See #4410. Also make the internal names more obviously internal names.
* Add binary format parse check for imported function types (#4423)Alon Zakai2022-01-051-1/+7
| | | | | Without this we hit an assertion later, which is less clear. See #4413
* [EH] Fixup nested pops after reading stacky binary (#4420)Heejin Ahn2022-01-043-45/+55
| | | | | | When reading stacky code in the binary reader, we create `block`s to make it fit into Binaryen AST, within which `pop`s can be nested, making the resulting AST invalid. This PR runs the fixup function after reading each `Try` to fix this.