path: root/src/tools/wasm-metadce.cpp
Commit message | Author | Age | Files | Lines
* Add a --preserve-type-order option (#6916) | Thomas Lively | 2024-09-10 | 1 | -1/+3
Unlike other module elements, types are not stored on the `Module`. Instead, they are collected by traversing the IR before printing and binary writing. The code that collects the types tries to optimize the order of rec groups based on the number of times each type is used. As a result, the output order of types generally has no relation to the input order of types. In addition, most type optimizations rewrite the types into a single large rec group, and the order of types in that group is essentially arbitrary. Changes to the code for counting type uses, sorting types, or sorting rec groups can yield very large changes in the output order of types, producing test diffs that are hard to review and potentially harming the readability of tests by moving output types away from the corresponding input types.

To help make test output more stable and readable, introduce a tool option that causes the order of output types to match the order of input types as closely as possible. It is implemented by having the parsers record the indices of the input types on the `Module` just like they already record the type names. The `GlobalTypeRewriter` infrastructure used by type optimizations associates the new types with the old indices just like it already does for names and also respects the input order when rewriting types into a large recursion group.

By default, wasm-opt and other tools clear the recorded type indices after parsing the module, so their default behavior is not modified by this change. Follow-on PRs will use the new flag in more tests, which will generate large diffs but leave the tests in stable, more readable states that will no longer change due to other changes to the optimizing type sorting logic.
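To make the ordering idea concrete, here is a minimal sketch that sorts types by a recorded input index, assuming a hypothetical `OutputType` record with an optional `inputIndex`. It ignores rec groups entirely, so it illustrates "input order as closely as possible" rather than reproducing Binaryen's `GlobalTypeRewriter` logic.

```cpp
#include <algorithm>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for an output type; rec groups are ignored here.
struct OutputType {
  std::string name;
  std::optional<std::size_t> inputIndex; // recorded by the parser, if any
};

// Order types by their recorded input index. Types without an index (for
// example, types created by an optimization) go last and keep their
// relative order because std::stable_sort is used.
void preserveInputOrder(std::vector<OutputType>& types) {
  std::stable_sort(
    types.begin(), types.end(), [](const OutputType& a, const OutputType& b) {
      if (a.inputIndex && b.inputIndex) {
        return *a.inputIndex < *b.inputIndex;
      }
      // An indexed type sorts before an unindexed one.
      return a.inputIndex.has_value() && !b.inputIndex.has_value();
    });
}
```

A stable sort is the natural choice here: types that carry no recorded index keep whatever relative order the default sorting gave them.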
* [StackIR] Allow StackIR to be disabled from the commandline (#6725) | Alon Zakai | 2024-07-10 | 1 | -2/+2
Normally we use it when optimizing (above a certain level). This lets the user prevent it from being used even then. Also add optimization options to wasm-metadce so that this is possible there as well and not just in wasm-opt (this also opens the door to running more passes in metadce, which may be useful later).
* [StackIR] Run StackIR during binary writing and not as a pass (#6568) | Alon Zakai | 2024-05-09 | 1 | -1/+1
Previously we had passes --generate-stack-ir, --optimize-stack-ir, --print-stack-ir that could be run like any other passes. After generating StackIR it was stashed on the function and invalidated if we modified BinaryenIR. If it wasn't invalidated then it was used during binary writing. This PR switches things so that we optionally generate, optimize, and print StackIR only during binary writing. It also removes all traces of StackIR from wasm.h - after this, StackIR is a feature of binary writing (and printing) logic only.

This is almost NFC, but there are some minor noticeable differences:

1. We no longer print "has StackIR" in the text format when StackIR is present. It will not be there during normal printing, as it is only present during binary writing. (But --print-stack-ir still works as before; as mentioned above, it runs during writing.)
2. --generate/optimize/print-stack-ir change from being passes to being flags that control that behavior instead. As passes, their order on the commandline mattered, while now it does not, and they only "globally" affect things during writing.
3. The C API changes slightly, as there is no need to pass an "optimize" option to the StackIR APIs. Whether we optimize is handled by --optimize-stack-ir, which is set like other optimization flags on the PassOptions object, so we don't need the old option to those C APIs.

The main benefit here is simplifying the code, so we don't need to think about StackIR in more places than just binary writing. That may also allow future improvements to our usage of StackIR.
* Do not add an extra null character when reading files (#6538) | Thomas Lively | 2024-04-24 | 1 | -2/+0
The new wat parser currently considers itself to be at the end of the file whenever it cannot lex another token. This is not quite right, but fixing it causes parser errors because of the extra null character we were appending to files when we read them. This null character is not useful since we can already read files as `std::string`, which always has an implicit null character, so remove it. Clean up some users of `read_file` while we're at it.
* Add sourcemap support to wasm-metadce and wasm-merge (#6372) | Jérôme Vouillon | 2024-03-06 | 1 | -1/+32
* Remove empty _ARRAY/_VECTOR defines (NFC) (#6182) | Heejin Ahn | 2023-12-14 | 1 | -3/+0
`_VECTOR` or `_ARRAY` defines in `wasm-delegations-fields.def` are supposed to be defined in terms of their non-vector/array counterparts when undefined. This removes empty `_VECTOR`/`_ARRAY` defines when including `wasm-delegations-fields.def`, while adding definitions for `DELEGATE_GET_FIELD` in case it is missing.
* wasm-metadce all the things (#6142) | Alon Zakai | 2023-11-30 | 1 | -141/+82
Remove hardcoded paths for globals/functions/etc. in favor of general code paths that support all the module elements uniformly. As a result of that, we now support all parts of wasm, such as tables and element segments, that we didn't before. This refactoring is NFC aside from adding functionality. Note that this reduces the size of wasm-metadce by 10% while increasing its functionality - the benefits of writing generic code. To support this, add some trivial generic helpers to get or iterate over module elements using their kind in a dynamic manner. Using them might make wasm-metadce slightly slower, but I can't measure any difference.
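The "get or iterate over module elements using their kind in a dynamic manner" idea can be sketched as follows. `ElementKind`, `SimpleModule`, `getElements`, and `iterAllElements` are hypothetical names for illustration only, not the helpers this commit actually added; the point is that one code path serves every kind, selected at runtime.

```cpp
#include <functional>
#include <initializer_list>
#include <string>
#include <vector>

// Hypothetical, simplified module: each element kind is a list of names.
enum class ElementKind { Function, Global, Table };

struct SimpleModule {
  std::vector<std::string> functions;
  std::vector<std::string> globals;
  std::vector<std::string> tables;
};

// Return the container for a kind chosen at runtime.
std::vector<std::string>& getElements(SimpleModule& module, ElementKind kind) {
  switch (kind) {
    case ElementKind::Function:
      return module.functions;
    case ElementKind::Global:
      return module.globals;
    case ElementKind::Table:
      return module.tables;
  }
  return module.functions; // not reached for valid enum values
}

// Visit every element of every kind through one generic code path.
void iterAllElements(
  SimpleModule& module,
  const std::function<void(ElementKind, const std::string&)>& visit) {
  for (auto kind :
       {ElementKind::Function, ElementKind::Global, ElementKind::Table}) {
    for (const auto& name : getElements(module, kind)) {
      visit(kind, name);
    }
  }
}
```

The runtime dispatch through `getElements` is where the possible (but unmeasured) slowdown mentioned above would come from.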
* wasm-metadce: Improve name deduplication (#6138) | Alon Zakai | 2023-11-30 | 1 | -2/+5
Avoid adding suffixes when we don't need them to keep names unique. As background, the suffixes are not used by emcc at all, so they are just for internal use in the tool. metadce receives as input the list of things the user cares about, with names for them, so it knows the proper names to give imports and exports, and it makes up names for everything else. Those made-up names will not be read by the user, so we can make them prettier, as this PR does, without breaking anything. The main benefit of this PR is to make debugging easier.
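A minimal sketch of "only add a suffix when needed", assuming a hypothetical `getUniqueName` helper and an `_N` suffix format (the tool's real suffix format is not shown in this log):

```cpp
#include <set>
#include <string>

// Hypothetical helper: keep `base` as-is when it is free, and only append a
// numeric suffix when needed to avoid a collision.
std::string getUniqueName(const std::string& base, std::set<std::string>& used) {
  std::string name = base;
  for (int i = 1; used.count(name); i++) {
    name = base + "_" + std::to_string(i);
  }
  used.insert(name);
  return name;
}
```

The first request for a name returns it unchanged; only later collisions get `_1`, `_2`, and so on.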
* [NFC] Add a helper to get function DCE names in wasm-metadce (#5793) | Alon Zakai | 2023-06-30 | 1 | -30/+15
* [wasm-metadce] Note ref.func connections + fix rooting of segment offsets (#5791) | Jérôme Vouillon | 2023-06-29 | 1 | -13/+28
* Switch from `typedef` to `using` in C++ code. NFC (#5258) | Sam Clegg | 2022-11-15 | 1 | -1/+1
This is more modern and (IMHO) easier to read than the old C typedef syntax.
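For readers unfamiliar with the two spellings, the following sketch shows them side by side. The alias names are hypothetical and not taken from wasm-metadce.cpp.

```cpp
#include <map>
#include <string>
#include <type_traits>

// Hypothetical aliases, written both ways.
typedef std::map<std::string, int> NameCountsOld; // C-style typedef
using NameCounts = std::map<std::string, int>;     // C++11 alias declaration

// Both spellings name exactly the same type.
static_assert(std::is_same_v<NameCountsOld, NameCounts>);

// `using` can also declare alias templates, which `typedef` cannot.
template<typename T> using NameMap = std::map<std::string, T>;
```

Besides readability (the new name comes first), alias templates are a practical reason to prefer `using` consistently.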
* Make `Name` a pointer, length pair (#5122) | Thomas Lively | 2022-10-11 | 1 | -8/+8
With the goal of supporting null characters (i.e. zero bytes) in strings. Rewrite the underlying interned `IString` to store a `std::string_view` rather than a `const char*`, reduce the number of map lookups necessary to intern a string, and present a more immutable interface. Most importantly, replace the `c_str()` method that returned a `const char*` with a `toString()` method that returns a `std::string`. This new method can correctly handle strings containing null characters. A `const char*` can still be had by calling `data()` on the `std::string_view`, although this usage should be discouraged.

This change is NFC in spirit, although not in practice. It does not intend to support any particular new functionality, but it is probably now possible to use strings containing null characters in at least some cases. At least one parser bug is also incidentally fixed. Follow-on PRs will explicitly support and test strings containing nulls for particular use cases.

The C API still uses `const char*` to represent strings. As strings containing nulls become better supported by the rest of Binaryen, this will no longer be sufficient. Updating the C and JS APIs to use pointer, length pairs is left as future work.
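The following is a simplified sketch of an interned string stored as a `std::string_view` (pointer plus length). `InternedString` is a hypothetical class written for illustration, not Binaryen's actual `IString`. It shows why a length-carrying view preserves embedded null characters that a bare `const char*` would cut off.

```cpp
#include <string>
#include <string_view>
#include <unordered_set>

// Hypothetical, simplified interned string (not Binaryen's IString): each
// distinct string is stored once in a pool, and instances hold a
// std::string_view (pointer + length) into that storage.
class InternedString {
public:
  explicit InternedString(std::string_view s) : view(intern(s)) {}

  // Unlike a const char*, this keeps embedded null characters.
  std::string toString() const { return std::string(view); }

  // Equal strings share one pool entry, so comparing pointers suffices.
  bool operator==(const InternedString& other) const {
    return view.data() == other.view.data();
  }

private:
  static std::string_view intern(std::string_view s) {
    // Elements of an unordered_set keep their address across rehashing.
    static std::unordered_set<std::string> pool;
    return *pool.emplace(s).first;
  }

  std::string_view view;
};
```

Interning makes equality a pointer comparison, and the stored length makes `toString()` round-trip strings such as `"a\0b"` intact.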
* Refactor interaction between Pass and PassRunner (#5093) | Thomas Lively | 2022-09-30 | 1 | -1/+3
Previously only WalkerPasses had access to the `getPassRunner` and `getPassOptions` methods. Move those methods to `Pass` so all passes can use them. As a result, the `PassRunner` passed to `Pass::run` and `Pass::runOnFunction` is no longer necessary, so remove it. Also update `Pass::create` to return a unique_ptr, which is more efficient than having it return a raw pointer only to have the `PassRunner` wrap that raw pointer in a `unique_ptr`. Delete the unused template `PassRunner::getLast()`, which looks like it was intended to enable retrieving previous analyses and has been in the code base since 2015 but is not implemented anywhere.
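The ownership change to `create()` can be sketched with a toy hierarchy. `Pass`, `ExamplePass`, and `SimpleRunner` here are hypothetical simplifications, not Binaryen's real classes: `create()` hands back a `std::unique_ptr` directly, and the runner just takes ownership instead of wrapping a raw pointer.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified pass hierarchy (not Binaryen's actual classes).
struct Pass {
  virtual ~Pass() = default;
  // Returning unique_ptr makes ownership explicit at the call site.
  virtual std::unique_ptr<Pass> create() = 0;
  virtual std::string name() const = 0;
};

struct ExamplePass : Pass {
  std::unique_ptr<Pass> create() override {
    return std::make_unique<ExamplePass>();
  }
  std::string name() const override { return "example-pass"; }
};

// The runner simply takes ownership; it never wraps a raw pointer itself.
struct SimpleRunner {
  std::vector<std::unique_ptr<Pass>> passes;
  void add(std::unique_ptr<Pass> pass) { passes.push_back(std::move(pass)); }
};
```

Usage is a single line per pass, e.g. `runner.add(prototype.create());`, with no manual `delete` anywhere.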
* First class Data Segments (#4733) | Ashley Nelson | 2022-06-21 | 1 | -5/+2
  * Updating wasm.h/cpp for DataSegments
  * Updating wasm-binary.h/cpp for DataSegments
  * Removed link from Memory to DataSegments and updated module-utils, Metrics and wasm-traversal
  * checking isPassive when copying data segments to know whether to construct the data segment with an offset or not
  * Removing memory member var from DataSegment class as there is only one memory rn. Updated wasm-validator.cpp
  * Updated wasm-interpreter
  * First look at updating Passes
  * Updated wasm-s-parser
  * Updated files in src/ir
  * Updating tools files
  * Last pass on src files before building
  * added visitDataSegment
  * Fixing build errors
  * Data segments need a name
  * fixing var name
  * ran clang-format
  * Ensuring a name on DataSegment
  * Ensuring more datasegments have names
  * Adding explicit name support
  * Fix fuzzing name
  * Outputting data name in wasm binary only if explicit
  * Checking temp dataSegments vector to validateBinary because it's the one with the segments before we processNames
  * Pass on when data segment names are explicitly set
  * Ran auto_update_tests.py and check.py, success all around
  * Removed an errant semi-colon and corrected a counter. Everything still passes
  * Linting
  * Fixing processing memory names after parsed from binary
  * Updating the test from the last fix
  * Correcting error comment
  * Impl kripken@ comments
  * Impl tlively@ comments
  * Updated tests that remove data print when == 0
  * Ran clang format
  * Impl tlively@ comments
  * Ran clang-format
* Add categories to --help text (#4421) | Alon Zakai | 2022-01-05 | 1 | -0/+7
The general shape of the --help output is now:

    ========================
    wasm-foo

    Does the foo operation
    ========================

    wasm-foo opts:
    --------------

    --foo-bar ..

    Tool opts:
    ----------

    ..

The options are now in categories, with the more specific ones - most likely to be wanted by the user - first. I think this makes the list a lot less confusing. In particular, in wasm-opt all the opt passes are now in their own category. Also add a script to make it easy to update the help tests.
* Modernize code to C++17 (#3104) | Max Graey | 2021-11-22 | 1 | -10/+4
* [wasm-metadce] Add support for tags (#4250) | Heejin Ahn | 2021-10-14 | 1 | -0/+17
This adds support for tag-using instructions (`throw` and `catch`) to wasm-metadce. We had to use a hacky workaround in emscripten-core/emscripten#15266 because of the lack of this support; after this lands we can remove it.
* [wasm-metadce] Don't add null names to roots (#4246) | Heejin Ahn | 2021-10-14 | 1 | -7/+5
Not sure why the current code tries to add the name even when it is null, but it causes `dump()` to behave strangely and pollute stdout when it tries to print `root.str`. Also this changes code printing `Name.str` to printing just `Name`; when `Name.str` is null, it prints `(null Name)` instead of polluting stdout, and it is the recommended way of printing `Name` anyway.
* Apply features from the commandline first (#3960) | Alon Zakai | 2021-07-02 | 1 | -2/+1
As suggested in https://github.com/WebAssembly/binaryen/pull/3955#issuecomment-871016647

This applies commandline features first. If the features section is present, and disallows some of them, then we warn. Otherwise, the features can combine (for example, a wasm may enable feature X because it has to use it, and a user can simply add the flag for feature Y if they want the optimizer to try to use it; both flags will then be enabled).

This is important because in some cases we need to know the features before parsing the wasm, in the case that the wasm does not use the features section. In particular, non-nullable GC locals have an effect during parsing. (Typed function references also does, but we found a way to apply its effect all the time, that is, always use the refined type, and that happened to not break the case where the feature is disabled - but such a workaround is not possible with non-nullable locals.)

To make this less error-prone, add a FeatureSet input as a parameter to WasmBinaryBuilder. That is, when building a module, we must give it the features to use while doing so.

This will unblock #3955. That PR will also add a test for the actual usage of a feature during loading (the test can only be added there, after that PR unbreaks things).
* [EH] Replace event with tag (#3937) | Heejin Ahn | 2021-06-18 | 1 | -18/+17
We recently decided to change 'event' to 'tag', and 'event section' to 'tag section', out of the rationale that the section contains a generalized tag that references a type, which may be used for something other than exceptions, and the name 'event' can be confusing in the web context. See:
- https://github.com/WebAssembly/exception-handling/issues/159#issuecomment-857910130
- https://github.com/WebAssembly/exception-handling/pull/161
* wasm-metadce: Keep symbols alive if there is any reference to corresponding GOT.mem or GOT.func import (#3831) | Sam Clegg | 2021-04-21 | 1 | -0/+11
This prevents the DCE of used symbols in emscripten's `MAIN_MODULE=2` use case, which we are starting to use and recommend a lot more. Part of the fix for https://github.com/emscripten-core/emscripten/issues/13786
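As a rough illustration of the rule in the title, the sketch below collects the symbol names named by `GOT.mem` / `GOT.func` imports so that a DCE pass could treat them as roots. `ImportRecord` and `collectGotRoots` are hypothetical; the actual change wires this into metadce's reachability graph, which is not shown in this log.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified import record.
struct ImportRecord {
  std::string module;
  std::string base;
};

// Collect the names of symbols referenced through GOT.mem or GOT.func
// imports; a DCE pass would then keep the matching symbols alive.
std::set<std::string> collectGotRoots(const std::vector<ImportRecord>& imports) {
  std::set<std::string> roots;
  for (const auto& imp : imports) {
    if (imp.module == "GOT.mem" || imp.module == "GOT.func") {
      roots.insert(imp.base);
    }
  }
  return roots;
}
```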
* [RT] Support expressions in element segments (#3666) | Abbas Mashayekh | 2021-03-24 | 1 | -7/+9
This PR adds support for `ref.null t` as a valid element segment item. The abbreviated format of `(elem ... func $f $g...)` is kept in both printing and binary emitting if all items are `ref.func`s. Public APIs aren't updated in this PR.
* [reference-types] Support passive elem segments (#3572) | Abbas Mashayekh | 2021-03-05 | 1 | -12/+10
Passive element segments do not belong to any table, so the link between Table and elem needs to be weaker; i.e. an elem may have a table in case of active segments, or simply be a collection of function references in case of passive/declarative segments. This PR takes Table::Segment out and turns it into a first class module element just like tables and functions. It also implements early support for parsing, printing, encoding and decoding passive/declarative elem segments.
* [reference-types] remove single table restriction in IR (#3517) | Abbas Mashayekh | 2021-02-09 | 1 | -9/+11
Adds support for modules with multiple tables. Adds a field for the table name to `CallIndirect` and updates the C/JS APIs accordingly.
* Refactor printing code so that printing Expressions always works (#3450) | Alon Zakai | 2020-12-17 | 1 | -1/+1
This avoids needing to add an include of wasm-printing if a file doesn't already have it. To achieve that, add the std::ostream hooks in wasm.h, and also use them when possible, removing the need for the special WasmPrinter object. Also stop printing in "full" (print types on each line) in error messages by default. The user can still get that, as always, using BINARYEN_PRINT_FULL=1 in the env.
* Binary format code section offset tracking (#2515) | Alon Zakai | 2019-12-19 | 1 | -0/+1
Optionally track the binary format code section offsets, that is, when loading a binary, remember where each IR node was read from. This is necessary for DWARF debug info, as these are the offsets DWARF refers to. (Note that eventually we may want to do something else, like first read the DWARF and only then add debug info annotations into the IR in a more LLVM-like manner, but this is more straightforward and should be enough to update debug lines and ranges).

This tracking adds noticeable overhead - every single IR node adds an entry in a map - so avoid it unless actually necessary. Specifically, if the user passes in -g and there are actually DWARF sections in the binary, and we are not about to remove those sections, then we need it.

Print binary format code section offsets in text, when printing with -g. This will help debug and test DWARF support. It looks like `;; code offset: 0x7` as an annotation right before each node.

Also add support for -g in wasm-opt tests (unlike a pass, it has just one - as a prefix). Helps #2400
* Fix metadce debug info after #2497 (#2501) | Sam Clegg | 2019-12-04 | 1 | -0/+1
This line was mistakenly removed as part of the BYN_TRACE conversion.
* Convert to using DEBUG macros (#2497) | Sam Clegg | 2019-12-04 | 1 | -7/+2
This means that debugging/tracing can now be enabled and controlled centrally without managing and passing state around the codebase.
* Add feature flags and validation to wasm-metadce (#2364) | Thomas Lively | 2019-09-27 | 1 | -2/+12
Sometimes wasm-metadce is the last tool to run over a binary in Emscripten, and in that case it needs to know what features are enabled in order to emit a valid binary. For example it needs to know whether to emit a data count section.
* Simplify PassRunner.add() and automatically parallelize parallel functions (#2242) | Alon Zakai | 2019-07-19 | 1 | -3/+1
Main change here is in pass.h; everything else is changes to work with the new API. The add("name") remains as before, while the weird variadic add(..) which constructed the pass now just gets a std::unique_ptr of a pass. This also makes the memory management internally fully automatic. And it makes it trivial to parallelize WalkerPass::run on parallel passes. As a benefit, this allows removing a lot of code since in many cases there is no need to create a new pass runner, and running a pass can be just a single line.
* Add event section (#2151) | Heejin Ahn | 2019-05-31 | 1 | -2/+32
This adds support for the event and the event section, as specified in https://github.com/WebAssembly/exception-handling/blob/master/proposals/Exceptions.md#changes-to-the-binary-model. Wasm events are features that suspend the current execution and transfer the control flow to a corresponding handler. Currently the only supported event kind is exceptions.

For events, this includes support for
- Binary file reading/writing
- Wast file reading/writing
- Binaryen.js API
- Fuzzer
- Validation
- Metadce
- Passes: metrics, minify-imports-and-exports, remove-unused-module-elements
* Reflect instruction renaming in code (#2128) | Heejin Ahn | 2019-05-21 | 1 | -4/+4
- Reflected new renamed instruction names in code and tests:
  - `get_local` -> `local.get`
  - `set_local` -> `local.set`
  - `tee_local` -> `local.tee`
  - `get_global` -> `global.get`
  - `set_global` -> `global.set`
  - `current_memory` -> `memory.size`
  - `grow_memory` -> `memory.grow`
- Removed APIs related to old instruction names in Binaryen.js and added APIs with new names if they are missing.
- Renamed `typedef SortedVector LocalSet` to `SetsOfLocals` to prevent name clashes.
- Resolved several TODO renaming items in wasm-binary.h:
  - `TableSwitch` -> `BrTable`
  - `I32ConvertI64` -> `I32WrapI64`
  - `I64STruncI32` -> `I64SExtendI32`
  - `I64UTruncI32` -> `I64UExtendI32`
  - `F32ConvertF64` -> `F32DemoteI64`
  - `F64ConvertF32` -> `F64PromoteF32`
- Renamed `BinaryenGetFeatures` and `BinaryenSetFeatures` to `BinaryenModuleGetFeatures` and `BinaryenModuleSetFeatures` for consistency.
* Fix misc. things for globals (#2119) | Heejin Ahn | 2019-05-17 | 1 | -1/+1
* Allow color API to enable and disable colors (#2111) | Siddharth | 2019-05-17 | 1 | -1/+1
This is useful for front-ends which wish to selectively enable or disable coloring. Also expose these APIs from the C API.
* clang-tidy braces changes (#2075) | Alon Zakai | 2019-05-01 | 1 | -2/+4
Applies the changes in #2065, and temporarily disables the hook since it's too slow to run on a change this large. We should re-enable it in a later commit.
* Apply format changes from #2048 (#2059) | Alon Zakai | 2019-04-26 | 1 | -122/+146
Mass change to apply clang-format to everything. We are applying this in a PR by me so the (git) blame is all mine ;) but @aheejin did all the work to get clang-format set up and all the manual work to tidy up some things to make the output nicer in #2048
* Passive segments (#1976) | Thomas Lively | 2019-04-05 | 1 | -1/+3
Adds support for the bulk memory proposal's passive segments. Uses a new (data passive ...) s-expression syntax to mark sections as passive.
* Code style improvements (#1868) | Alon Zakai | 2019-01-15 | 1 | -1/+1
  * Use modern T p = v; notation to initialize class fields
  * Use modern X() = default; notation for empty class constructors
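Both conventions fit in a few lines. `ToolOptions` is a hypothetical class used only to illustrate them, not one from wasm-metadce.cpp.

```cpp
// Hypothetical class showing both conventions from this commit.
struct ToolOptions {
  int optimizeLevel = 0; // default member initializer: T p = v;
  bool debugInfo = false;

  ToolOptions() = default; // defaulted empty constructor
};
```

Because the defaults live next to the field declarations, the constructor has nothing left to do and can simply be defaulted.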
* Unify imported and non-imported things (#1678) | Alon Zakai | 2018-09-19 | 1 | -38/+49
Fixes #1649

This moves us to a single object for functions, which can be imported or not, and likewise for globals (as a result, GetGlobals do not need to check if the global is imported or not, etc.). All imported things now inherit from Importable, which has the module and base of the import, and if they are set then it is an import. For convenient iteration, there are a few helpers like ModuleUtils::iterDefinedGlobals(wasm, [&](Global* global) { .. use global .. }); as often iteration only cares about imported or defined (non-imported) things.
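A simplified sketch of the "import iff module and base are set" model follows. It uses `std::string` instead of Binaryen's `Name`, and `iterDefined` is a hypothetical helper in the spirit of `ModuleUtils::iterDefinedGlobals`, not its real implementation.

```cpp
#include <string>
#include <vector>

// Simplified, hypothetical stand-in for Binaryen's Importable: an element
// is an import exactly when its module is set.
struct Importable {
  std::string module;
  std::string base;
  bool imported() const { return !module.empty(); }
};

struct Global : Importable {
  std::string name;
};

// Hypothetical helper in the spirit of ModuleUtils::iterDefinedGlobals:
// visit only globals that are defined in the module (not imported).
template<typename F>
void iterDefined(std::vector<Global>& globals, F visitor) {
  for (auto& global : globals) {
    if (!global.imported()) {
      visitor(&global);
    }
  }
}
```

With one `Global` type for both cases, code that reads a global no longer needs a separate path for imports.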
* 'std::string &' => 'std::string& ' (#1403) | Alon Zakai | 2018-02-05 | 1 | -3/+3
The & on the type is the proper convention.
* metadce fixes (#1329) | Alon Zakai | 2017-12-07 | 1 | -37/+114
  * ignore missing imports (the wasm may have already had them optimized out)
  * handle segments that hold on to globals (root them, for now, as we can't remove segments)
  * run reorder-functions, as the optimal order may have changed after we dce
  * fix global, global init, and segment offset reachability
  * fix import rooting and processing - imports may be imported more than once
* wasm-metadce tool (#1320) | Alon Zakai | 2017-12-06 | 1 | -0/+471
This adds a new tool for better dead code elimination. The problem this helps overcome is when the wasm module is part of something larger, like a wasm+JS combination, and therefore doing DCE in either one is not sufficient as it can't remove a cycle spanning the wasm and JS worlds.

Concretely, when binaryen performs DCE by itself, it can never remove an export, because it considers those roots - but in the larger ("meta") space outside, they may actually be removable.

To solve that, this tool receives a description of the outside graph (in very abstract form), including which nodes are roots. It then adds to that graph nodes from the wasm, so that we have a single graph representing the entire space (the outside + wasm + connections between them). It then performs DCE, finding what is not reachable from the roots, and cleaning it up from the wasm. It of course can't clean up things from the outside, since all it has is the abstract representation of those things in the graph, but it prints out the ids of the removable nodes, which an outside tool can use.

This tool is written in as general a way as possible; hopefully it can have multiple uses. The use I have in mind is to write something in emscripten that uses this to DCE the JS+wasm combination that we emit.
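The core reachability step can be sketched as a worklist traversal over one combined graph. This is a minimal sketch, assuming a hypothetical `MetaGraph` where every node (outside or wasm) appears as a key in `edges` and node ids are plain strings. It does not reproduce the tool's input format or its wasm cleanup. Note that the JS/wasm cycle below is reported as dead, which per-side DCE could not detect.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified model of the combined "meta" graph: each node id
// maps to the ids it reaches; every node is assumed to appear as a key.
struct MetaGraph {
  std::map<std::string, std::vector<std::string>> edges;
  std::set<std::string> roots;
};

// Return the ids of every node not reachable from any root, sorted.
std::vector<std::string> findUnreachable(const MetaGraph& graph) {
  std::set<std::string> reached;
  std::vector<std::string> work(graph.roots.begin(), graph.roots.end());
  while (!work.empty()) {
    std::string node = work.back();
    work.pop_back();
    if (!reached.insert(node).second) {
      continue; // already visited
    }
    auto it = graph.edges.find(node);
    if (it != graph.edges.end()) {
      for (const auto& next : it->second) {
        work.push_back(next);
      }
    }
  }
  std::vector<std::string> unreachable;
  for (const auto& [node, targets] : graph.edges) {
    if (!reached.count(node)) {
      unreachable.push_back(node);
    }
  }
  return unreachable;
}
```

In this model, unreachable wasm nodes would be removed from the module, while unreachable outside nodes can only be reported by id for an outside tool to act on.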