| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
| |
Emscripten's JS loader code for wasm-split isn't prepared for handling
multiple tables that binaryen automatically creates when reference types
are enabled (especially in conjunction with dynamic loading). For now,
disable creation of multiple tables when using Emscripten's table ABI
(distinguished by importing or exporting a table named
"__indirect_function_table".
|
|
|
|
|
|
|
|
| |
The module splitting utility has a configuration option for minimizing
new export names, but it was previously applied only to newly exported
functions. Using the new multi-split mode can produce lots of exported
tables and splitting WasmGC programs can produce lots of exported
globals, so minimizing these export names can have a big impact on code
size.
|
|
|
|
|
|
|
|
| |
The configuration for the module splitting utility previous took a set
of functions to keep in the primary module. Change it to take a list of
functions to split into the secondary module instead. This improves the
code quality in multi-split mode because it keeps stub functions
generated by previous splits from being moved into secondary modules
during later splits.
|
|
|
|
|
|
|
|
|
| |
Rather than analyze what module elements from the primary module a
secondary module will need, the splitting logic conservatively imports
all module elements from the primary module into the secondary module.
Run RemoveUnusedElements on the secondary module to remove any of these
imports that happen to be unnecessary. Leave a TODO mentioning the
possibility of being more selective about which module elements get
exported to reduce code size in the primary module, too.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Wasm-split generally assumes that calls to secondary functions made
before the secondary module has been loaded and instantiated should go
to imported placeholder functions that can be responsible for loading
the secondary module and forwarding the call to the loaded function.
That scheme makes the loading entirely transparent from the
application's point of view, which is not always a good thing. Other
schemes would make it impossible for a secondary function to be called
before the secondary module has been explicitly loaded, in which case
the placeholder functions would never be called. To improve code size
and simplify instantiation under these schemes, add a new
`--no-placeholders` option that skips adding imported placeholder
functions.
|
|
|
|
|
|
|
| |
Rather than trying to trampoline primary-to-secondary calls through an
existing table, just create a fresh table for this purpose. This ensures
that modifications to the existing tables cannot interfere with
primary-to-secondary calls and conversely that loading the secondary
module cannot overwrite modifications to the tables.
|
|
|
|
|
|
|
|
| |
The module splitting code incorrectly assumed that there would be at least one
active element segment and failed to initialize the table slot manager with a
function table if that was not the case. Fix the bug by setting the table even
when there are no active segments and add a test.
Fixes #6572 and #6637.
|
|
|
|
|
| |
When we have a ref.func that refers to the secondary module then make a trampoline
that calls it directly. The trampoline's call is then fixed up like all direct calls to the
secondary module.
|
|
|
|
|
|
| |
With the old version of JSPI, the JSPI pass was required to be run before
splitting and would automatically add an export to be able to find the
load_secondary_module function. Now that the pass is no longer needed,
just add an import manually for the load_secondary_module function.
|
|
|
|
|
|
|
| |
Since data and elem segments cannot be imported or exported, there is no way to
access them from the secondary module, so functions that need to refer to them
cannot be split out.
Fixes #6512.
|
|
|
|
|
|
|
| |
- Only write explicit function names.
- When merging modules, the name of types, globals and tags in all
modules but the first were lost.
- Set name as explicit when copying a function with a new name.
|
|
|
|
|
|
|
|
|
|
| |
When using JSPI with wasm-split, any calls to secondary module functions
will now first check a global to see if the module is loaded. If not
loaded it will call a JSPI'ed function that will handle loading module.
The setup is split into the JSPI pass and wasm-split tool since the JSPI
pass is first run by emscripten and we need to JSPI'ify the load secondary
module function. wasm-split then injects all the checks and calls to the
load function.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
With the goal of supporting null characters (i.e. zero bytes) in strings.
Rewrite the underlying interned `IString` to store a `std::string_view` rather
than a `const char*`, reduce the number of map lookups necessary to intern a
string, and present a more immutable interface.
Most importantly, replace the `c_str()` method that returned a `const char*`
with a `toString()` method that returns a `std::string`. This new method can
correctly handle strings containing null characters. A `const char*` can still
be had by calling `data()` on the `std::string_view`, although this usage should
be discouraged.
This change is NFC in spirit, although not in practice. It does not intend to
support any particular new functionality, but it is probably now possible to use
strings containing null characters in at least some cases. At least one parser
bug is also incidentally fixed. Follow-on PRs will explicitly support and test
strings containing nulls for particular use cases.
The C API still uses `const char*` to represent strings. As strings containing
nulls become better supported by the rest of Binaryen, this will no longer be
sufficient. Updating the C and JS APIs to use pointer, length pairs is left as
future work.
|
|
|
|
|
|
|
| |
This PR removes the single memory restriction in IR, adding support for a single module to reference multiple memories. To support this change, a new memory name field was added to 13 memory instructions in order to identify the memory for the instruction.
It is a goal of this PR to maintain backwards compatibility with existing text and binary wasm modules, so memory indexes remain optional for memory instructions. Similarly, the JS API makes assumptions about which memory is intended when only one memory is present in the module. Another goal of this PR is that existing tests behavior be unaffected. That said, tests must now explicitly define a memory before invoking memory instructions or exporting a memory, and memory names are now printed for each memory instruction in the text format.
There remain quite a few places where a hardcoded reference to the first memory persist (memory flattening, for example, will return early if more than one memory is present in the module). Many of these call-sites, particularly within passes, will require us to rethink how the optimization works in a multi-memories world. Other call-sites may necessitate more invasive code restructuring to fully convert away from relying on a globally available, single memory pointer.
|
|
|
|
|
|
|
|
|
| |
Basic reference types like `Type::funcref`, `Type::anyref`, etc. made it easy to
accidentally forget to handle reference types with the same basic HeapTypes but
the opposite nullability. In principle there is nothing special about the types
with shorthands except in the binary and text formats. Removing these shorthands
from the internal type representation by removing all basic reference types
makes some code more complicated locally, but simplifies code globally and
encourages properly handling both nullable and non-nullable reference types.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
With nominal function types, this change makes it so that we preserve the
identity of the function type used with call_indirect instructions rather than
recreating a function heap type, which may or may not be the same as the
originally parsed heap type, from the function signature during module writing.
This will simplify the type system implementation by removing the need to store
a "canonical" nominal heap type for each unique signature. We previously
depended on those canonical types to avoid creating multiple duplicate function
types during module writing, but now we aren't creating any new function types
at all.
|
|
|
|
|
|
|
|
|
| |
When using nominal types, func.ref of two functions with identical signatures
but different HeapTypes will yield different types. To preserve these semantics,
Functions need to track their HeapTypes, not just their Signatures.
This PR replaces the Signature field in Function with a HeapType field and adds
new utility methods to make it almost as simple to update and query the function
HeapType as it was to update and query the Function Signature.
|
|
|
|
|
|
|
|
|
| |
This removes `attribute` field from `Tag` class, making the reserved and
unused field known only to binary encoder and decoder. This also removes
the `attribute` parameter from `makeTag` and `addTag` methods in
wasm-builder.h, C API, and Binaryen JS API.
Suggested in
https://github.com/WebAssembly/binaryen/pull/3946#pullrequestreview-687756523.
|
|
|
|
|
|
|
|
|
|
|
| |
We recently decided to change 'event' to 'tag', and to 'event section'
to 'tag section', out of the rationale that the section contains a
generalized tag that references a type, which may be used for something
other than exceptions, and the name 'event' can be confusing in the web
context.
See
- https://github.com/WebAssembly/exception-handling/issues/159#issuecomment-857910130
- https://github.com/WebAssembly/exception-handling/pull/161
|
|
|
|
|
|
| |
The new instruction emits a file containing a map between placeholder index and
the name of the split out function that placeholder is replacing in the table.
This map is intended to be useful for debugging, as discussed in
https://github.com/emscripten-core/emscripten/issues/14330.
|
|
|
|
|
|
|
|
|
| |
wasm-split would previously use internal function names to create the external
names of the functions that are newly exported from the primary module to be
imported into the secondary module. When the input module contains full function
names (as is commonly the case when emitting symbol maps), this caused the
function names to be preserved as the export names, even when names are
otherwise being stripped. To save on code size and properly anonymize functions,
generate minimal export names when debuginfo is disabled instead.
|
|
|
|
|
|
|
|
|
|
|
| |
- Support functions appearing more than once in the table. It turns out we
were assuming and asserting that functions would appear at most once, but we
weren't making use of that assumption in any way.
- Make TableSlotManager::getSlot take a function Name rather than a RefFunc
expression to avoid allocating and leaking unnecessary expressions.
- Add and use a Builder interface for building TableElementSegments to make
them more similar to other module-level items.
|
| |
|
|
|
|
|
|
|
|
| |
the builder (#3790)
The builder can receive a HeapType so that callers don't need to set non-nullability
themselves.
Not NFC as some of the callers were in fact still making it nullable.
|
| |
|
|
|
|
|
|
| |
This PR adds support for `ref.null t` as a valid element segment
item. The abbreviated format of `(elem ... func $f $g...)` is kept in
both printing and binary emitting if all items are `ref.func`s. Public
APIs aren't updated in this PR.
|
|
|
|
|
|
|
|
|
|
|
| |
Passive element segments do not belong to any table, so the link between
Table and elem needs to be weaker; i.e. an elem may have a table in case
of active segments, or simply be a collection of function references in
case of passive/declarative segments.
This PR takes Table::Segment out and turns it into a first class module
element just like tables and functions. It also implements early support
for parsing, printing, encoding and decoding passive/declarative elem
segments.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When the type section is emitted, types with an equal amount of references are
ordered by an arbitrary measure of simplicity, which previously would infinitely
recurse on structurally equivalent recursive types. Similarly, calculating
whether an recursive type was a subtype of another recursive type could have
infinitely recursed. This PR avoids infinite recursions in both cases by
switching the algorithms from using normal inductive recursion to using
coinductive recursion. The difference is that while the inductive algorithms
assume the relations do not hold for a pair of HeapTypes until they have been
exhaustively shown to hold, the coinductive algorithms assume the relations hold
unless a counterexample can be found.
In addition to those two algorithms, this PR also implement name generation for
recursive types, using de Bruijn indices to stand in for inner uses of the
recursive types.
There are additional algorithms that will need to be switched from inductive to
coinductive recursion, such as least upper bound generation, but these presented
a good starting point and are sufficient to get some interesting programs
working end-to-end.
|
|
|
| |
And fix errors from such a build.
|
|
|
| |
Adds support for modules with multiple tables. Adds a field for the table name to `CallIndirect` and updates the C/JS APIs accordingly.
|
|
|
|
|
|
|
|
|
| |
`ModuleSplitter::thunkExportedSecondaryFunctions` creates a thunk for each
secondary function that needs to be exported from the main module. Previously,
if a secondary function was exported twice, this code would try to create two
thunks for it rather than just making one thunk and exporting it twice. This
caused a fatal error because the second thunk had the same name as the first
thunk and therefore could not be added to the module. This PR fixes the issue by
creating no more than one thunk per function.
|
|
|
|
|
|
|
|
|
| |
During module splitting, a map is constructed from internal names to their
corresponding export names. This code previously did not take into account the
fact that the same internal name may be used by different kinds of
entities (e.g. a table and a memory may have the same internal name), which
resulted in the secondary module incorrectly using the same import name for all
of the entities that shared an internal name. This PR fixes the problem by
including the ExternalKind of the entity in the keys of the export map.
|
|
|
|
|
|
|
|
|
|
| |
Extend the splitting logic to handle splitting modules with a single table
segment with a non-const offset. In this situation the placeholder function
names are interpreted as offsets from the table base global rather than absolute
indices into the table. Since addition is not allowed in segment offset
expressions, the secondary module's segment must start at the same place as the
first table's segment. That means that some primary functions must be duplicated
in the secondary segment to fill any gaps. They are exported and imported as
necessary.
|
|
Adds the capability to programatically split a module into a primary and
secondary module such that the primary module can be compiled and run before the
secondary module has been instantiated. All calls to secondary functions (i.e.
functions that have been split out into the secondary module) in the primary
module are rewritten to be indirect calls through the table. Initially, the
table slots of all secondary functions contain references to imported
placeholder functions. When the secondary module is instantiated, it will
automatically patch the table to insert references to the original functions.
The process of module splitting involves these steps:
1. Create the new secondary module.
2. Export globals, events, tables, and memories from the primary module and
import them in the secondary module.
3. Move the deferred functions from the primary to the secondary module.
4. For any secondary function exported from the primary module, export in
its place a trampoline function that makes an indirect call to its
placeholder function (and eventually to the original secondary function),
allocating a new table slot for the placeholder if necessary.
5. Rewrite direct calls from primary functions to secondary functions to be
indirect calls to their placeholder functions (and eventually to their
original secondary functions), allocating new table slots for the
placeholders if necessary.
6. For each primary function directly called from a secondary function, export
the primary function if it is not already exported and import it into the
secondary module.
7. Replace all references to secondary functions in the primary module's table
segments with references to imported placeholder functions.
8. Create new active table segments in the secondary module that will replace
all the placeholder function references in the table with references to
their corresponding secondary functions upon instantiation.
Functions can be used or referenced three ways in a WebAssembly module: they can
be exported, called, or placed in a table. The above procedure introduces a
layer of indirection to each of those mechanisms that removes all references to
secondary functions from the primary module but restores the original program's
semantics once the secondary module is instantiated. As more mechanisms that
reference functions are added in the future, such as ref.func instructions, they
will have to be modified to use a similar layer of indirection.
The code as currently written makes a few assumptions about the module that is
being split:
1. It assumes that mutable-globals is allowed. This could be worked around by
introducing wrapper functions for globals and rewriting secondary code that
accesses them, but now that mutable-globals is shipped on all browsers,
hopefully that extra complexity won't be necessary.
2. It assumes that all table segment offsets are constants. This simplifies the
generation of segments to actively patch in the secondary functions without
overwriting any other table slots. This assumption could be relaxed by 1)
having secondary segments re-write primary function slots as well, 2)
allowing addition in segment offsets, or 3) synthesizing a start function to
modify the table instead of using segments.
3. It assumes that each function appears in the table at most once. This isn't
necessarily true in general or even for LLVM output after function
deduplication. Relaxing this assumption would just require slightly more
complex code, so it is a good candidate for a follow up PR.
Future Binaryen work for this feature includes providing a command line tool
exposing this functionality as well as C API, JS API, and fuzzer support. We
will also want to provide a simple instrumentation pass for finding dead or
late-executing functions that would be good candidates for splitting out. It
would also be good to integrate that instrumentation with future function
outlining work so that dead or exceptional basic blocks could be split out into
a separate module.
|