summaryrefslogtreecommitdiff
path: root/src/tools
Commit message (Collapse)AuthorAgeFilesLines
* [wasm-split] Add --print-profile option (#4771)sps-gold2022-07-253-19/+118
| | | | | | | | | | | | | | | | | | | | | | | There are several reasons why a function may not be trained in deterministically. So to perform quick validation we need to inspect profile.data (another ways requires split to be performed). However as profile.data is a binary file and is not self sufficient, so we cannot currently use it to perform such validation. Therefore to allow quick check on whether a particular function has been trained in, we need to dump profile.data in a more readable format. This PR, allows us to output, the list of functions to be kept (in main wasm) and those split functions (to be moved to deferred.wasm) in a readable format, to console. Added a new option `--print-profile` - input path to orig.wasm (its the original wasm file that will be used later during split) - input path to profile.data that we need to output optionally pass `--unescape` to unescape the function names Usage: ``` binaryen\build>bin\wasm-split.exe test\profile_data\MY.orig.wasm --print-profile=test\profile_data\profile.data > test\profile_data\out.log ``` note: meaning of prefixes `+` => fn to be kept in main wasm `-` => fn to be split and moved to deferred wasm
* Remove basic reference types (#4802)Thomas Lively2022-07-205-112/+28
| | | | | | | | | Basic reference types like `Type::funcref`, `Type::anyref`, etc. made it easy to accidentally forget to handle reference types with the same basic HeapTypes but the opposite nullability. In principle there is nothing special about the types with shorthands except in the binary and text formats. Removing these shorthands from the internal type representation by removing all basic reference types makes some code more complicated locally, but simplifies code globally and encourages properly handling both nullable and non-nullable reference types.
* [Strings] Add feature flag for Strings proposal (#4766)Alon Zakai2022-06-301-0/+1
|
* Fix more no-assertions warnings (#4765)Alon Zakai2022-06-302-1/+3
|
* [Strings] Add string proposal types (#4755)Alon Zakai2022-06-293-2/+21
| | | | | | | | This starts to implement the Wasm Strings proposal https://github.com/WebAssembly/stringref/blob/main/proposals/stringref/Overview.md This just adds the types.
* Disallow --nominal with GC (#4758)Thomas Lively2022-06-281-0/+6
| | | | | | | | | | | Nominal types don't make much sense without GC, and in particular trying to emit them with typed function references but not GC enabled can result in invalid binaries because nominal types do not respect the type ordering constraints required by the typed function references proposal. Making this change was mostly straightforward, but required fixing the fuzzer to use --nominal only when GC is enabled and required exiting early from nominal-only optimizations when GC was not enabled. Fixes #4756.
* First class Data Segments (#4733)Ashley Nelson2022-06-214-40/+36
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * Updating wasm.h/cpp for DataSegments * Updating wasm-binary.h/cpp for DataSegments * Removed link from Memory to DataSegments and updated module-utils, Metrics and wasm-traversal * checking isPassive when copying data segments to know whether to construct the data segment with an offset or not * Removing memory member var from DataSegment class as there is only one memory rn. Updated wasm-validator.cpp * Updated wasm-interpreter * First look at updating Passes * Updated wasm-s-parser * Updated files in src/ir * Updating tools files * Last pass on src files before building * added visitDataSegment * Fixing build errors * Data segments need a name * fixing var name * ran clang-format * Ensuring a name on DataSegment * Ensuring more datasegments have names * Adding explicit name support * Fix fuzzing name * Outputting data name in wasm binary only if explicit * Checking temp dataSegments vector to validateBinary because it's the one with the segments before we processNames * Pass on when data segment names are explicitly set * Ran auto_update_tests.py and check.py, success all around * Removed an errant semi-colon and corrected a counter. Everything still passes * Linting * Fixing processing memory names after parsed from binary * Updating the test from the last fix * Correcting error comment * Impl kripken@ comments * Impl tlively@ comments * Updated tests that remove data print when == 0 * Ran clang format * Impl tlively@ comments * Ran clang-format
* Reducer: Support --hybrid (#4726)Alon Zakai2022-06-141-0/+3
|
* [Parser] Begin parsing modules (#4716)Thomas Lively2022-06-101-0/+6
| | | | | | | | | | | Implement the basic infrastructure for the full WAT parser with just enough detail to parse basic modules that contain only imported globals. Parsing functions correspond to elements of the grammar in the text specification and are templatized over context types that correspond to each phase of parsing. Errors are explicitly propagated via `Result<T>` and `MaybeResult<T>` types. Follow-on PRs will implement additional phases of parsing and parsing for new elements in the grammar.
* Fuzzer: Add support for creating structs and arrays in makeConst (#4707)Alon Zakai2022-06-011-8/+20
| | | | | | | | #4659 adds a testcase with an import of (ref $struct). This could cause an error in the fuzzer, since it wants to remove imports (because the various fuzzers cannot pass in custom imports - they want to just run the wasm). When it tries to remove that import it tries to create a constant for a struct reference, and fails. To fix that, add enough support to create structs and arrays at least in the simple case where all their fields are defaultable.
* Fuzzer: Refactor makeConst into separate functions [NFC] (#4709)Alon Zakai2022-06-012-85/+118
| | | This just moves code around + adds assertions.
* Remove renameMainArgcArgv from wasm-emscripten-finalize (#4700)Sam Clegg2022-05-311-6/+0
| | | | | | | | | This part to finalize is currently not used and was added in preparation for https://reviews.llvm.org/D75277. However, the better solution to dealing with this alternative name for main is on the emscripten side. The main reason for this is that doing the rename here in binaryen would require finalize to always re-write the binary, which is expensive.
* Validator: Check features for ref.null's type (#4677)Alon Zakai2022-05-181-0/+2
|
* [GC Fuzzing] Avoid non-nullable eqref without GC (#4675)Alon Zakai2022-05-181-2/+22
| | | | | | With only reference types but not GC, we cannot easily create a constant for eqref for example. Only GC adds i31.new etc. To avoid assertions in the fuzzer, avoid randomly picking (ref eq) etc., that is, keep it nullable so that we can emit a (ref.null eq) if we need a constant value of that type.
* wasm-reduce: Fix order in shrinkByReduction call (#4673)Alon Zakai2022-05-171-1/+4
| | | | | | The old code would short-circuit and not do anything after we managed any reduction in the loop here. That would end up doing entire iterations of the whole pipeline before removing another element segment, which could be slow.
* [Fuzzer] Reduce trap probability in function ref fallback code (#4653)Alon Zakai2022-05-161-10/+15
| | | | | | Also improve comments. As suggested in #4647
* [Fuzzer] Fix another reference types vs gc types issue (#4647)Alon Zakai2022-05-061-36/+37
| | | | | | | | | | Diff without whitespace is smaller. We can't emit HeapType::data without GC. Fixing that by switching to func, another problem was uncovered: makeRefFuncConst had a TODO to handle the case where we need a function to refer to but have created none yet. In fact that TODO was done at the end of the function. Fix up the logic in between to actually get there.
* Fix fuzzer's choosing of reference types (#4642)Alon Zakai2022-05-051-7/+18
| | | | | | * Don't emit "i31" or "data" if GC is not enabled, as only the GC feature adds those. * Don't emit "any" without GC either. While it is allowed, fuzzer limitations prevent this atm (see details in comment - it's fixable).
* Remove externref (#4633)Thomas Lively2022-05-044-38/+4
| | | | | | Remove `Type::externref` and `HeapType::ext` and replace them with uses of anyref and any, respectively, now that we have unified these types in the GC proposal. For backwards compatibility, continue to parse `extern` and `externref` and maintain their relevant C API functions.
* wasm-reduce: Try to remove functions from a random place (#4612)Alon Zakai2022-04-251-7/+32
| | | | | | Previously we'd only try to remove functions from index 0, so we missed some opportunities. With this change we still go through all the functions if things go well, but we start from a deterministic random location in the vector.
* [NominalFuzzing] Fix TranslateToFuzzReader::getSubType(Rtt) (#4604)Alon Zakai2022-04-211-0/+6
| | | | Randomly selecting a depth is ok for structural typing, but in nominal it must match the actual hierarchy of types.
* [NominalFuzzing] Don't compare nominal types in the fuzzer (#4603)Alon Zakai2022-04-211-3/+8
| | | | | The same module will have a different type after some transformations, even though that is not observable, like --roundtrip. Basically, we should not be comparing types between separate modules, which is what the fuzzer does.
* [SIMD] Make swizzle's opcode name consistent (NFC) (#4585)Heejin Ahn2022-04-091-1/+1
| | | | Other opcode ends with `Inxm` or `Fnxm` (where n and m are integers), while `i8x16.swizzle`'s opcode name doesn't have an `I` in there.
* [NFC] Refactor Feature::All to match FeatureSet.setAll() (#4557)Alon Zakai2022-03-311-3/+3
| | | | | | | | | | | As we recently noted in #4555, that Feature::All and FeatureSet.setAll() are different is potentially confusing... I think the best thing is to make them identical. This does that, and adds a new Feature::AllPossible which is everything possible and not just the set of all features that are enabled by -all. This undoes part of #4555 as now the old/simpler code works properly.
* [Wasm GC] Fix non-nullable tuples (#4555)Alon Zakai2022-03-301-3/+3
| | | | | | Apply the same logic to tuple fields as we do for all other fields, when checking whether a non-nullable value is valid. Fixes #4554
* [Wasm GC] Signature Pruning (#4545)Alon Zakai2022-03-251-0/+1
| | | | | | | | | | | | | This adds a new signature-pruning pass that prunes parameters from signature types where those parameters are never used in any function that has that type. This is similar to DeadArgumentElimination but works on a set of functions, and it can handle indirect calls. Also move a little code from SignatureRefining into a shared place to avoid duplication of logic to update signature types. This pattern happens in j2wasm code, for example if all method functions for some virtual method just return a constant and do not use the this pointer.
* Add support for extended-const proposal (#4529)Sam Clegg2022-03-191-0/+1
| | | See https://github.com/WebAssembly/extended-const
* Fix errors when building in C++20 mode (#4528)Jakub Szewczyk2022-03-181-2/+2
| | | | | | | * use [[noreturn]] available since C++11 instead of compiler-specific attributes * replace deprecated std::is_pod with is_trivial&&is_standard_layout (also available since C++11/14) * explicitly capture this in [=] lambdas * extra const functions in FeatureSet, fix implicit cast warning by using the features field directly * Use CMAKE_CXX_STANDARD to ensure the C++ standard parameter is set on all targets, remove manual compiler flag workaround.
* Clarify in tools help message that -O == -Os. (#4516)t4lz2022-02-161-11/+18
| | | | | Introduce static consts with PassOptions Defaults. Add assertion to verify that the default options are the Os options. Also update the text in relevant tests.
* Make IndexedTypeNameGenerator more powerful (#4511)Thomas Lively2022-02-091-2/+2
| | | | | Allow IndexedTypeNameGenerator to be configured with a custom prefix and also allow it to be parameterized with an explicit fallback generator. This allows multiple IndexedTypeNameGenerators to be composed together, for example.
* [wasm-split] Add an --asyncify option (#4513)Thomas Lively2022-02-093-0/+20
| | | | | | | Add an option for running the asyncify transformation on the primary module emitted by wasm-split. The idea is that the placeholder functions should be able to unwind the stack while the secondary module is asynchronously loaded, then once the placeholder functions have been patched out by the secondary module the stack should be rewound and end up in the correct secondary function.
* Fuzz structural type canonicalization (#4510)Thomas Lively2022-02-091-14/+308
| | | | | | | Add a new fuzz checker to wasm-type-fuzzer that builds copies of the originally built types, randomly selecting for each child type from all potential sources, including both the originally built types and the not-yet-built duplicate types. After building the new types, check that they are indeed identical to the old types, which means that nothing has gone wrong with canonicalization.
* Print recursion groups in wasm-fuzz-types (#4509)Thomas Lively2022-02-081-0/+17
|
* Generate heap type names when printing types (#4503)Thomas Lively2022-02-071-1/+22
| | | | | | | | | | | | | | | | | | | | | | The previous printing system in the Types API would print the full recursive structure of a Type or HeapType with special markers using de Bruijn indices to avoid infinite recursion and a separate special marker for when the size exceeded an arbitrary upper limit. In practice, the types printed by that system were not human readable, so all that complexity was not useful. Replace that system with a new system that always emits a HeapType name rather than recursing into the structure of inner HeapTypes. Add methods for printing Types and HeapTypes with custom HeapType name generators. Also add a new wasm-type-printing.h header with off-the-shelf type name generators that implement simple naming schemes sufficient for tests and the type fuzzer. Note that these new printing methods and the old printing methods they augment are not used for emitting text modules. Printing types as part of expressions and modules is handled by separate code in Print.cpp and the printing API modified in this PR is mostly used for debugging. However, the new printing methods are general enough that Print.cpp should be able to use them as well, so update the format used to print types in the modified printing system to match the text format in anticipation of making that change in a follow-up PR.
* [NFC] Move type fuzzer method definitions out of line (#4505)Thomas Lively2022-02-041-91/+102
| | | | This makes it easier to get an overview of what methods exist by looking at the shorter struct definition.
* Isorecursive type fuzzing (#4501)Thomas Lively2022-02-042-25/+79
| | | | | | | | | | Add support for isorecursive types to wasm-fuzz-types by generating recursion groups and ensuring that children types are only selected from candidates through the end of the current group. For non-isorecursive systems, treat all the types as belonging to a single group so that their behavior is unchanged. Also fix two small bugs found by the fuzzer: LUB calculation was taking the wrong path for isorecursive types and isorecursive validation was not handling basic heap types properly.
* wasm-reduce: Add newer passes (#4502)Alon Zakai2022-02-031-0/+4
| | | | | | | | | | | | | | | | These might help reduction. Most newer passes, like say --type-refining, are not going to actually help by themselves without other passes, so those are not added (they get run in the -O2 etc. modes, which at least gives them a chance to help). DeadArgumentElimination: Might help by itself, if just removing arguments reduces code size. In some cases applying constants may increase code size, though, but the -optimizing variant helps there. GlobalTypeOptimization: This can remove type fields which can shrink the type section by a lot. This is the reason I realized I should open this PR, when I happened to notice that running that pass manually after reduction helped a lot more. SimplifyGlobals: Can remove unused globals, merge identical immutable ones, etc., all of which can help code size directly.
* [Wasm GC] [ctor-eval] Evaluate and serialize GC data (#4491)Alon Zakai2022-02-031-4/+169
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This ended up simpler than I thought. We can simply emit global and local data as we go, creating globals as necessary to contain GC data, and referring to them using global.get later. That will ensure that data identity works (things referring to the same object in the interpreter will refer to the same object when the wasm is loaded). In more detail, each live GC item is created in a "defining global", a global that is immutable and of the precise type of that data. Then we just read from that location in any place that wants to refer to that data. That is, something like function foo() { var x = Bar(10); var y = Bar(20); var z = x; z.value++; // first object now contains 11 ... } will be evalled into something like var define$0 = Bar(11); // note the ++ has taken effect here var define$1 = Bar(20); function foo() { var x = define$0; var y = define$1; var z = define$0; ... } This PR should handle everything but "cycles", that is, GC data that at runtime ends up forming a loop. Leaving that for later work (not sure how urgent it is to fix).
* [Docs] Document wasm-ctor-eval (#4493)Alon Zakai2022-02-031-3/+2
|
* Remove used wasm-emscripten-finalize option `--initial-stack-pointer` (#4490)Sam Clegg2022-02-011-8/+0
|
* Interpreter: Remove GlobalManager (#4486)Alon Zakai2022-01-311-84/+21
| | | | | | | | | | | | | | | | | | | | | | | | | GlobalManager is another class that added complexity in the interpreter logic, and did not help. In fact it hurts extensibility, as when one wants to extend the interpreter one has another class to customize, and it is templated on the main runner, so again as #4479 we end up with annoying template cycles. This simply removes that class. That makes the interpreter code strictly simpler. Applying that change to wasm-ctor-eval also ends up fixing a pre-existing bug, so this PR gets testing through that. The ctor-eval issue was that we did not extend the GlobalManager properly in the past: we checked for accesses on imported globals there, but not in the main class, i.e., not on global.get operations. Needing to do things in two places is an example of the previous complexity. The fix is simply to implement visitGlobalGet in one place, and remove all the GlobalManager logic added in ctor-eval, which then gets a lot simpler as well. The new imported-global-2.wast checks for that bug (a global.get of an import should stop us from evalling). Existing tests cover the other cases, like it being ok to read a non-imported global, etc. The existing test indirect-call3.wast required a slight change: There was a global.get of an imported global, which was ignored in the place it happened (an init of an elem segment); the new code checks all global.gets, so it now catches that.
* [NFC] Refactor ModuleInstanceBase+RuntimeExpressionRunner into a single ↵Alon Zakai2022-01-283-32/+29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | class (#4479) As recently discussed, the interpreter code is way too complex. Trying to add ctor-eval stuff I need, I got stuck and ended up spending some time to get rid of some of the complexity. We had a ModuleInstanceBase class which was basically an instance of a module, that is, an execution of it. And internally we have RuntimeExpressionRunner which is a runner that integrates with the ModuleInstanceBase - basically, it uses the runtime info to execute code. For example, the MIB has globals info, and the RER would read it from there. But these two classes are really just one functionality - an execution of a module. We get rid of some complexity by removing the separation between them, ending up with a class that can run a module. One set of problems we avoid is that we can now extend the single class in a simple way. Before, we would need to extend both - and inform each other of those changes. That gets "fun" with CRTP which we use everywhere. In other words, each of the two classes depended on the other / would need to be templated on the other. Specifically, MIB.callFunction would need to be given the RER to run with, and so that would need to be templated on it. This ends up leading to a bunch more templating all around - all complexity that we just don't need. See the simplification to the wasm-ctor-eval for some of that (and even worse complexity would have been needed without this PR in the next steps for that tool to eval GC stuff). The final single class is now called ModuleRunner. Also fixes a pre-existing issue uncovered by this PR. We had the delegate target on the runner, but it should be tied to a function scope. This happened to not be a problem if one always created a new runner for each scope, but this PR makes the runner longer-lived, so the stale data ended up mattering. The PR moves that data to the proper place. Note: Diff without whitespace is far, far smaller.
* Fuzzer: Fix a missing return of a trap (#4485)Alon Zakai2022-01-281-0/+1
| | | | | We emitted the right text to stdout to indicate a trap in one code path, but did not return a Trap from the function. As a result, we'd continue and hit the assert on the next line.
* wasm-emscripten-finalize: Remove legacy --new-pic-abi option (#4483)Sam Clegg2022-01-271-6/+0
|
* Make `TypeBuilder::build()` fallible (#4474)Thomas Lively2022-01-251-1/+6
| | | | | | | | | | | It is possible for type building to fail, for example if the declared nominal supertypes form a cycle or are structurally invalid. Previously we would report a fatal error and kill the program from inside `TypeBuilder::build()` in these situations, but this handles errors at the wrong layer of the code base and is inconvenient for testing the error cases. In preparation for testing the new error cases introduced by isorecursive typing, make type building fallible and add new tests for existing error cases. Also fix supertype cycle detection, which it turns out did not work correctly.
* Introduce gtest (#4466)Thomas Lively2022-01-201-10/+0
| | | | | | | | | | | | | | | | | | | | | | | | Add gtest as a git submodule in third_party and integrate it into the build the same way WABT does. Adds a new executable, `binaryen-unittests`, to execute `gtest_main`. As a nontrivial example test, port one of the `TypeBuilder` tests from example/ to gtest/. Using gtest has a number of advantages over the current example tests: - Tests are compiled and linked at build time rather than runtime, surfacing errors earlier and speeding up test execution. - Tests are all built into a single binary, reducing overall link time and further reducing test overhead. - Tests are built from the same CMake project as the rest of Binaryen, so compiler settings (e.g. sanitizers) are applied uniformly rather than having to be separately set via the COMPILER_FLAGS environment variable. - Using the industry-standard gtest rather than our own script reduces our maintenance burden. Using gtest will lower the barrier to writing C++ tests and will hopefully lead to us having more proper unit tests.
* Add a `--hybrid` type system option (#4460)Thomas Lively2022-01-191-0/+9
| | | | | Eventually this will enable the isorecursive hybrid type system described in https://github.com/WebAssembly/gc/pull/243, but for now it just throws a fatal error if used.
* Add --no-emit-metadata option to wasm-emscripten-finalize (#4450)Sam Clegg2022-01-191-3/+14
| | | | | | This is useful for the case where we might want to finalize without extracting metadata. See: https://github.com/emscripten-core/emscripten/pull/15918
* LiteralList => Literals (#4451)Alon Zakai2022-01-133-6/+6
| | | | | | | LiteralList overlaps with Literals, but is less efficient as it is not a SmallVector. Add reserve/capacity methods to SmallVector which are now necessary to compile.
* [ctor-eval] Eval functions with params if ignoring external input (#4446)Alon Zakai2022-01-121-6/+24
| | | | | | | | | | | | | | | | | When ignoring external input, assume params have a value of 0. This makes it possible to eval main(argc, argv) if one is careful and does not actually use those values. This is basically a workaround for main always receiving argc/argv, even if the C code has no args (in that case the compiler emits __original_main for the user's main, and wraps it with a main that adds the args, hence the problem). This is similar to the existing support for handling wasi_args_get when ignoring external input, although it just sets values of zeros for the params. Perhaps it could check for main() specifically and return 1 for argc and a proper buffer for argv somehow, but I think if a program wants to use --ignore-external-input it can avoid actually reading argc/argv.