summaryrefslogtreecommitdiff
path: root/src/ir/module-splitting.cpp
Commit message (Collapse)AuthorAgeFilesLines
* Module splitting: don't create new tables when splitting with Emscripten (#7050)Derek Schuff2024-11-021-6/+18
| | | | | | | | Emscripten's JS loader code for wasm-split isn't prepared for handling multiple tables that binaryen automatically creates when reference types are enabled (especially in conjunction with dynamic loading). For now, disable creation of multiple tables when using Emscripten's table ABI (distinguished by importing or exporting a table named "__indirect_function_table".
* [wasm-split] Minimize non-function export names (#6951)Thomas Lively2024-09-171-2/+5
| | | | | | | | The module splitting utility has a configuration option for minimizing new export names, but it was previously applied only to newly exported functions. Using the new multi-split mode can produce lots of exported tables and splitting WasmGC programs can produce lots of exported globals, so minimizing these export names can have a big impact on code size.
* [wasm-split] Configure split functions rather than kept functions (#6949)Thomas Lively2024-09-171-1/+1
| | | | | | | | The configuration for the module splitting utility previous took a set of functions to keep in the primary module. Change it to take a list of functions to split into the secondary module instead. This improves the code quality in multi-split mode because it keeps stub functions generated by previous splits from being moved into secondary modules during later splits.
* [wasm-split] Run RemoveUnusedElements on secondary modules (#6945)Thomas Lively2024-09-171-0/+11
| | | | | | | | | Rather than analyze what module elements from the primary module a secondary module will need, the splitting logic conservatively imports all module elements from the primary module into the secondary module. Run RemoveUnusedElements on the secondary module to remove any of these imports that happen to be unnecessary. Leave a TODO mentioning the possibility of being more selective about which module elements get exported to reduce code size in the primary module, too.
* [wasm-split] Add an option to skip importing placeholders (#6942)Thomas Lively2024-09-161-21/+34
| | | | | | | | | | | | | | Wasm-split generally assumes that calls to secondary functions made before the secondary module has been loaded and instantiated should go to imported placeholder functions that can be responsible for loading the secondary module and forwarding the call to the loaded function. That scheme makes the loading entirely transparent from the application's point of view, which is not always a good thing. Other schemes would make it impossible for a secondary function to be called before the secondary module has been explicitly loaded, in which case the placeholder functions would never be called. To improve code size and simplify instantiation under these schemes, add a new `--no-placeholders` option that skips adding imported placeholder functions.
* [wasm-split] Use a fresh table when reference types are enabled (#6726)Thomas Lively2024-07-111-13/+18
| | | | | | | Rather than trying to trampoline primary-to-secondary calls through an existing table, just create a fresh table for this purpose. This ensures that modifications to the existing tables cannot interfere with primary-to-secondary calls and conversely that loading the secondary module cannot overwrite modifications to the tables.
* Fix wasm-split bug in absence of active element segments (#6651)Thomas Lively2024-06-111-7/+13
| | | | | | | | The module splitting code incorrectly assumed that there would be at least one active element segment and failed to initialize the table slot manager with a function table if that was not the case. Fix the bug by setting the table even when there are no active segments and add a test. Fixes #6572 and #6637.
* wasm-split: Handle RefFuncs (#6513)Alon Zakai2024-05-081-8/+85
| | | | | When we have a ref.func that refers to the secondary module then make a trampoline that calls it directly. The trampoline's call is then fixed up like all direct calls to the secondary module.
* [jspi] - Support new version of JSPI for module splitting. (#6546)Brendan Dahl2024-04-291-6/+20
| | | | | | With the old version of JSPI, the JSPI pass was required to be run before splitting and would automatically add an export to be able to find the load_secondary_module function. Now that the pass is no longer needed, just add an import manually for the load_secondary_module function.
* [wasm-split] Do not split out functions referring to segments (#6517)Thomas Lively2024-04-231-4/+62
| | | | | | | Since data and elem segments cannot be imported or exported, there is no way to access them from the secondary module, so functions that need to refer to them cannot be split out. Fixes #6512.
* Fixes regarding explicit names (#6466)Jérôme Vouillon2024-04-111-3/+4
| | | | | | | - Only write explicit function names. - When merging modules, the name of types, globals and tags in all modules but the first were lost. - Set name as explicit when copying a function with a new name.
* Support using JSPI to load the secondary wasm split module. (#5431)Brendan Dahl2023-01-201-3/+52
| | | | | | | | | | When using JSPI with wasm-split, any calls to secondary module functions will now first check a global to see if the module is loaded. If not loaded it will call a JSPI'ed function that will handle loading module. The setup is split into the JSPI pass and wasm-split tool since the JSPI pass is first run by emscripten and we need to JSPI'ify the load secondary module function. wasm-split then injects all the checks and calls to the load function.
* Make `Name` a pointer, length pair (#5122)Thomas Lively2022-10-111-3/+2
| | | | | | | | | | | | | | | | | | | | | | | With the goal of supporting null characters (i.e. zero bytes) in strings. Rewrite the underlying interned `IString` to store a `std::string_view` rather than a `const char*`, reduce the number of map lookups necessary to intern a string, and present a more immutable interface. Most importantly, replace the `c_str()` method that returned a `const char*` with a `toString()` method that returns a `std::string`. This new method can correctly handle strings containing null characters. A `const char*` can still be had by calling `data()` on the `std::string_view`, although this usage should be discouraged. This change is NFC in spirit, although not in practice. It does not intend to support any particular new functionality, but it is probably now possible to use strings containing null characters in at least some cases. At least one parser bug is also incidentally fixed. Follow-on PRs will explicitly support and test strings containing nulls for particular use cases. The C API still uses `const char*` to represent strings. As strings containing nulls become better supported by the rest of Binaryen, this will no longer be sufficient. Updating the C and JS APIs to use pointer, length pairs is left as future work.
* Mutli-Memories Support in IR (#4811)Ashley Nelson2022-08-171-8/+3
| | | | | | | This PR removes the single memory restriction in IR, adding support for a single module to reference multiple memories. To support this change, a new memory name field was added to 13 memory instructions in order to identify the memory for the instruction. It is a goal of this PR to maintain backwards compatibility with existing text and binary wasm modules, so memory indexes remain optional for memory instructions. Similarly, the JS API makes assumptions about which memory is intended when only one memory is present in the module. Another goal of this PR is that existing tests behavior be unaffected. That said, tests must now explicitly define a memory before invoking memory instructions or exporting a memory, and memory names are now printed for each memory instruction in the text format. There remain quite a few places where a hardcoded reference to the first memory persist (memory flattening, for example, will return early if more than one memory is present in the module). Many of these call-sites, particularly within passes, will require us to rethink how the optimization works in a multi-memories world. Other call-sites may necessitate more invasive code restructuring to fully convert away from relying on a globally available, single memory pointer.
* Remove basic reference types (#4802)Thomas Lively2022-07-201-6/+6
| | | | | | | | | Basic reference types like `Type::funcref`, `Type::anyref`, etc. made it easy to accidentally forget to handle reference types with the same basic HeapTypes but the opposite nullability. In principle there is nothing special about the types with shorthands except in the binary and text formats. Removing these shorthands from the internal type representation by removing all basic reference types makes some code more complicated locally, but simplifies code globally and encourages properly handling both nullable and non-nullable reference types.
* Modernize code to C++17 (#3104)Max Graey2021-11-221-6/+2
|
* Change from storing Signature to HeapType on CallIndirect (#4352)Thomas Lively2021-11-221-2/+2
| | | | | | | | | | | | With nominal function types, this change makes it so that we preserve the identity of the function type used with call_indirect instructions rather than recreating a function heap type, which may or may not be the same as the originally parsed heap type, from the function signature during module writing. This will simplify the type system implementation by removing the need to store a "canonical" nominal heap type for each unique signature. We previously depended on those canonical types to avoid creating multiple duplicate function types during module writing, but now we aren't creating any new function types at all.
* Preserve Function HeapTypes (#3952)Thomas Lively2021-06-301-14/+15
| | | | | | | | | When using nominal types, func.ref of two functions with identical signatures but different HeapTypes will yield different types. To preserve these semantics, Functions need to track their HeapTypes, not just their Signatures. This PR replaces the Signature field in Function with a HeapType field and adds new utility methods to make it almost as simple to update and query the function HeapType as it was to update and query the Function Signature.
* [EH] Make tag's attribute encoding detail (#3947)Heejin Ahn2021-06-211-1/+0
| | | | | | | | | This removes `attribute` field from `Tag` class, making the reserved and unused field known only to binary encoder and decoder. This also removes the `attribute` parameter from `makeTag` and `addTag` methods in wasm-builder.h, C API, and Binaryen JS API. Suggested in https://github.com/WebAssembly/binaryen/pull/3946#pullrequestreview-687756523.
* [EH] Replace event with tag (#3937)Heejin Ahn2021-06-181-7/+7
| | | | | | | | | | | We recently decided to change 'event' to 'tag', and to 'event section' to 'tag section', out of the rationale that the section contains a generalized tag that references a type, which may be used for something other than exceptions, and the name 'event' can be confusing in the web context. See - https://github.com/WebAssembly/exception-handling/issues/159#issuecomment-857910130 - https://github.com/WebAssembly/exception-handling/pull/161
* [wasm-split] Add an option to emit a placeholder map (#3931)Thomas Lively2021-06-121-2/+7
| | | | | | The new instruction emits a file containing a map between placeholder index and the name of the split out function that placeholder is replacing in the table. This map is intended to be useful for debugging, as discussed in https://github.com/emscripten-core/emscripten/issues/14330.
* [wasm-split] Minimize names of newly created exports (#3905)Thomas Lively2021-06-011-2/+10
| | | | | | | | | wasm-split would previously use internal function names to create the external names of the functions that are newly exported from the primary module to be imported into the secondary module. When the input module contains full function names (as is commonly the case when emitting symbol maps), this caused the function names to be preserved as the export names, even when names are otherwise being stripped. To save on code size and properly anonymize functions, generate minimal export names when debuginfo is disabled instead.
* Minor wasm-split improvements (#3825)Thomas Lively2021-04-201-39/+35
| | | | | | | | | | | - Support functions appearing more than once in the table. It turns out we were assuming and asserting that functions would appear at most once, but we weren't making use of that assumption in any way. - Make TableSlotManager::getSlot take a function Name rather than a RefFunc expression to avoid allocating and leaking unnecessary expressions. - Add and use a Builder interface for building TableElementSegments to make them more similar to other module-level items.
* wasm-split: Update dylink section when growing table (#3791)Sam Clegg2021-04-091-0/+3
|
* RefFunc: Validate that the type is non-nullable, and avoid possible bugs in ↵Alon Zakai2021-04-081-2/+1
| | | | | | | | the builder (#3790) The builder can receive a HeapType so that callers don't need to set non-nullability themselves. Not NFC as some of the callers were in fact still making it nullable.
* [RT] Add type to tables and element segments (#3763)Abbas Mashayekh2021-04-061-6/+13
|
* [RT] Support expressions in element segments (#3666)Abbas Mashayekh2021-03-241-34/+50
| | | | | | This PR adds support for `ref.null t` as a valid element segment item. The abbreviated format of `(elem ... func $f $g...)` is kept in both printing and binary emitting if all items are `ref.func`s. Public APIs aren't updated in this PR.
* [reference-types] Support passive elem segments (#3572)Abbas Mashayekh2021-03-051-47/+62
| | | | | | | | | | | Passive element segments do not belong to any table, so the link between Table and elem needs to be weaker; i.e. an elem may have a table in case of active segments, or simply be a collection of function references in case of passive/declarative segments. This PR takes Table::Segment out and turns it into a first class module element just like tables and functions. It also implements early support for parsing, printing, encoding and decoding passive/declarative elem segments.
* Support comparing, subtyping, and naming recursive types (#3610)Thomas Lively2021-02-251-13/+0
| | | | | | | | | | | | | | | | | | | | | When the type section is emitted, types with an equal amount of references are ordered by an arbitrary measure of simplicity, which previously would infinitely recurse on structurally equivalent recursive types. Similarly, calculating whether an recursive type was a subtype of another recursive type could have infinitely recursed. This PR avoids infinite recursions in both cases by switching the algorithms from using normal inductive recursion to using coinductive recursion. The difference is that while the inductive algorithms assume the relations do not hold for a pair of HeapTypes until they have been exhaustively shown to hold, the coinductive algorithms assume the relations hold unless a counterexample can be found. In addition to those two algorithms, this PR also implement name generation for recursive types, using de Bruijn indices to stand in for inner uses of the recursive types. There are additional algorithms that will need to be switched from inductive to coinductive recursion, such as least upper bound generation, but these presented a good starting point and are sufficient to get some interesting programs working end-to-end.
* Remove assertions that prevent non-assertion builds (#3576)Alon Zakai2021-02-171-0/+1
| | | And fix errors from such a build.
* [reference-types] remove single table restriction in IR (#3517)Abbas Mashayekh2021-02-091-46/+83
| | | Adds support for modules with multiple tables. Adds a field for the table name to `CallIndirect` and updates the C/JS APIs accordingly.
* [module-splitting] Fix a crash when a function is exported twice (#3455)Thomas Lively2020-12-171-0/+4
| | | | | | | | | `ModuleSplitter::thunkExportedSecondaryFunctions` creates a thunk for each secondary function that needs to be exported from the main module. Previously, if a secondary function was exported twice, this code would try to create two thunks for it rather than just making one thunk and exporting it twice. This caused a fatal error because the second thunk had the same name as the first thunk and therefore could not be added to the module. This PR fixes the issue by creating no more than one thunk per function.
* Fix wasm-split name collision bug (#3447)Thomas Lively2020-12-161-3/+16
| | | | | | | | | During module splitting, a map is constructed from internal names to their corresponding export names. This code previously did not take into account the fact that the same internal name may be used by different kinds of entities (e.g. a table and a memory may have the same internal name), which resulted in the secondary module incorrectly using the same import name for all of the entities that shared an internal name. This PR fixes the problem by including the ExternalKind of the entity in the keys of the export map.
* [module-splitting] Allow splitting with non-const table offsets (#3408)Thomas Lively2020-12-011-82/+181
| | | | | | | | | | Extend the splitting logic to handle splitting modules with a single table segment with a non-const offset. In this situation the placeholder function names are interpreted as offsets from the table base global rather than absolute indices into the table. Since addition is not allowed in segment offset expressions, the secondary module's segment must start at the same place as the first table's segment. That means that some primary functions must be duplicated in the secondary segment to fill any gaps. They are exported and imported as necessary.
* Module splitting (#3317)Thomas Lively2020-11-121-0/+479
Adds the capability to programatically split a module into a primary and secondary module such that the primary module can be compiled and run before the secondary module has been instantiated. All calls to secondary functions (i.e. functions that have been split out into the secondary module) in the primary module are rewritten to be indirect calls through the table. Initially, the table slots of all secondary functions contain references to imported placeholder functions. When the secondary module is instantiated, it will automatically patch the table to insert references to the original functions. The process of module splitting involves these steps: 1. Create the new secondary module. 2. Export globals, events, tables, and memories from the primary module and import them in the secondary module. 3. Move the deferred functions from the primary to the secondary module. 4. For any secondary function exported from the primary module, export in its place a trampoline function that makes an indirect call to its placeholder function (and eventually to the original secondary function), allocating a new table slot for the placeholder if necessary. 5. Rewrite direct calls from primary functions to secondary functions to be indirect calls to their placeholder functions (and eventually to their original secondary functions), allocating new table slots for the placeholders if necessary. 6. For each primary function directly called from a secondary function, export the primary function if it is not already exported and import it into the secondary module. 7. Replace all references to secondary functions in the primary module's table segments with references to imported placeholder functions. 8. Create new active table segments in the secondary module that will replace all the placeholder function references in the table with references to their corresponding secondary functions upon instantiation. Functions can be used or referenced three ways in a WebAssembly module: they can be exported, called, or placed in a table. The above procedure introduces a layer of indirection to each of those mechanisms that removes all references to secondary functions from the primary module but restores the original program's semantics once the secondary module is instantiated. As more mechanisms that reference functions are added in the future, such as ref.func instructions, they will have to be modified to use a similar layer of indirection. The code as currently written makes a few assumptions about the module that is being split: 1. It assumes that mutable-globals is allowed. This could be worked around by introducing wrapper functions for globals and rewriting secondary code that accesses them, but now that mutable-globals is shipped on all browsers, hopefully that extra complexity won't be necessary. 2. It assumes that all table segment offsets are constants. This simplifies the generation of segments to actively patch in the secondary functions without overwriting any other table slots. This assumption could be relaxed by 1) having secondary segments re-write primary function slots as well, 2) allowing addition in segment offsets, or 3) synthesizing a start function to modify the table instead of using segments. 3. It assumes that each function appears in the table at most once. This isn't necessarily true in general or even for LLVM output after function deduplication. Relaxing this assumption would just require slightly more complex code, so it is a good candidate for a follow up PR. Future Binaryen work for this feature includes providing a command line tool exposing this functionality as well as C API, JS API, and fuzzer support. We will also want to provide a simple instrumentation pass for finding dead or late-executing functions that would be good candidates for splitting out. It would also be good to integrate that instrumentation with future function outlining work so that dead or exceptional basic blocks could be split out into a separate module.