| Commit message (Collapse) | Author | Age | Files | Lines |
| |
This makes Precompute about 5% faster on a WasmGC binary.
Inspired by #6931.
| |
Value types were previously represented internally as either enum values
for "basic," i.e. non-reference, non-tuple types or pointers to
`TypeInfo` structs encoding either references or tuples. Update the
representation of reference types to use one bit to encode nullability
and the rest of the bits to encode the referenced heap type. This allows
canonical reference types to be created with a single bitwise OR rather
than by taking a lock on a global type store and doing a hash map lookup
to canonicalize.
This change is a massive performance improvement and dramatically
improves how performance scales with threads because the removed lock
was highly contended. Even with a single core, the performance of an O3
optimization pipeline on a WasmGC module improves by 6%. With 8 cores,
the improvement increases to 29% and with all 128 threads on my machine,
the improvement reaches 46%.
The full new encoding of types is as follows:
- If the type ID is within the range of the basic types, the type is
the corresponding basic type.
- Otherwise, if bit 0 is set, the type is a tuple and the rest of the
bits are a canonical pointer to the tuple.
- Otherwise, the type is a reference type. Bit 1 determines the
nullability and the rest of the bits encode the heap type.
Also update the encodings of basic heap types so they no longer use the
low two bits to avoid conflicts with the use of those bits in the
encoding of types.
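The encoding described above can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: the constant values, the `Kind` enum, and the helper names are all hypothetical, and heap type IDs are assumed to lie above the basic range with their low two bits clear.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical constants; the real values in Binaryen may differ.
constexpr uintptr_t kLastBasicTypeId = 6;      // IDs 0..6 are basic types
constexpr uintptr_t kTupleBit = 1u << 0;       // bit 0: tuple
constexpr uintptr_t kNullableBit = 1u << 1;    // bit 1: nullable reference
constexpr uintptr_t kTagMask = kTupleBit | kNullableBit;

enum class Kind { Basic, Tuple, Ref };

// Decode the kind of a type ID in the order the commit message lists.
constexpr Kind classify(uintptr_t id) {
  if (id <= kLastBasicTypeId) {
    return Kind::Basic;
  }
  if (id & kTupleBit) {
    return Kind::Tuple;
  }
  return Kind::Ref;
}

// Build a reference type from a heap type ID with a single bitwise OR.
// Assumes heapTypeId > kLastBasicTypeId and (heapTypeId & kTagMask) == 0.
constexpr uintptr_t makeRef(uintptr_t heapTypeId, bool nullable) {
  return heapTypeId | (nullable ? kNullableBit : 0);
}

constexpr bool isNullable(uintptr_t id) { return (id & kNullableBit) != 0; }

constexpr uintptr_t getHeapTypeId(uintptr_t id) { return id & ~kTagMask; }
```

Because creating a reference type is pure arithmetic in this scheme, no shared state is touched, which is consistent with the removal of the contended lock described above.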
| |
Some versions of libcxx or clang error without this, apparently due to
Type being a forward declaration.
| |
methods (#6936)
This just moves code around. As a result, isRef() vanishes entirely from the
profiling traces in #6931, since now the core isRef/Tuple/etc. methods are
all inlineable.
This also required some reordering of wasm-type.h, namely to move HeapType
up front. No changes to that class otherwise.
TypeInfo is now in the header. getTypeInfo is now a static method on Type.
This has the downside of moving internal details into the header, and it may
increase compile time a little. The upside is making the --precompute benchmark
from #6931 significantly faster, 33%, and it will also help the many
Type::isNonNullable() etc. calls we have scattered around the codebase in
other passes too.
| |
* Add interpreter support for exnref values.
* Fix optimization passes to support try_table.
* Enable the interpreter (but not in V8, see code) on exceptions.
| |
Given a function that maps the old child heap types to new child heap
types, the new API takes care of copying the rest of the structure of a
given heap type into a TypeBuilder slot.
Use the new API in GlobalTypeRewriter::rebuildTypes. It will also be
used in an upcoming type optimization. This refactoring also required
adding the ability to clear the supertype of a TypeBuilder slot, which
was previously not possible.
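As a rough sketch of the idea, assuming a much simplified model in which a heap type definition is just a list of child type indices plus an optional supertype (the `TypeDef` struct and `copyWithMapping` are hypothetical names, not the new API):

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <vector>

// Hypothetical stand-in for the contents of a TypeBuilder slot.
struct TypeDef {
  std::vector<int> children;    // indices of child heap types
  std::optional<int> supertype; // empty when there is no declared supertype
  bool open = false;            // whether the type may have subtypes
};

// Copy every part of `old` unchanged except the heap types it refers to,
// which are translated through `map`.
TypeDef copyWithMapping(const TypeDef& old,
                        const std::function<int(int)>& map) {
  TypeDef copy = old;
  for (int& child : copy.children) {
    child = map(child);
  }
  if (copy.supertype) {
    copy.supertype = map(*copy.supertype);
  }
  return copy;
}
```

The caller only supplies the mapping; everything else about the type carries over, which is the division of labor the message describes.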
| |
This is very similar to the internal utilities for canonicalizing rec
groups in the type system implementation, except that the new utility
also supports ordered comparison of rec groups, and of course the new
utility only uses the public type API.
A follow-up PR will replace the internal implementation of rec group
comparison and hashing in the type system with this one.
Another follow-up PR will use this new utility in a type optimization.
| |
PR #6803 proposed removing Type::isString and HeapType::isString in
favor of more explicit, verbose callsites. There was no consensus to
make this change, but it was accidentally committed as part of #6804.
Revert the accidental change, except for the useful, noncontroversial
parts, such as fixing the `isString` implementation and a few other
locations to correctly handle shared types.
| |
The HeapType API has functions like `isBasic()`, `isStruct()`,
`isSignature()`, etc. to test the classification of a heap type. Many
users have to call these functions in sequence and handle all or most of
the possible classifications. When we add a new kind of heap type,
finding and updating all these sites is a manual and error-prone
process.
To make adding new heap type kinds easier, introduce a new API that
returns an enum classifying the heap type. The enum can be used in
switch statements and the compiler's exhaustiveness checker will flag
use sites that need to be updated when we add a new kind of heap type.
This commit uses the new enum internally in the type system, but
follow-on commits will add new uses and convert uses of the existing
APIs to use `getKind` instead.
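To illustrate why an enum helps, here is a small sketch. The enumerator names and the `describe` function are assumptions made for illustration and may not match the real `getKind` result type.

```cpp
#include <cassert>
#include <string>

// Hypothetical classification enum.
enum class HeapTypeKind { Basic, Func, Struct, Array, Cont };

// No `default` label on purpose: compilers that warn about unhandled
// enumerators (e.g. -Wswitch in GCC and Clang) will then flag this switch
// when a new kind is added to the enum.
std::string describe(HeapTypeKind kind) {
  switch (kind) {
    case HeapTypeKind::Basic:
      return "basic";
    case HeapTypeKind::Func:
      return "func";
    case HeapTypeKind::Struct:
      return "struct";
    case HeapTypeKind::Array:
      return "array";
    case HeapTypeKind::Cont:
      return "cont";
  }
  return "unknown"; // not reached for valid enumerators
}
```

A chain of `isStruct()` / `isArray()` style checks gives the compiler nothing to verify, whereas the switch turns a missing case into a diagnostic.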
| |
This abbreviates a common pattern where we first had to check whether a
heap type was basic, then if it was, get its unshared version and
compare it to some expected BasicHeapType.
Suggested in
https://github.com/WebAssembly/binaryen/pull/6771#discussion_r1683005495.
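The pattern being abbreviated might look like the sketch below. The `HeapTypeModel` struct is a hypothetical stand-in with explicit fields rather than the real class, and `isMaybeShared` is shown here only as one plausible shape for such a helper.

```cpp
#include <cassert>

enum class Basic { Func, Any, Eq, String };

// Hypothetical stand-in for a heap type.
struct HeapTypeModel {
  bool basic;  // true for basic heap types, false for defined types
  bool shared; // sharedness, ignored by the comparison below
  Basic base;  // the unshared basic type (meaningful only if `basic`)

  bool isBasic() const { return basic; }
  Basic getUnshared() const { return base; }

  // One call instead of "check basic, strip sharedness, compare".
  bool isMaybeShared(Basic expected) const {
    return isBasic() && getUnshared() == expected;
  }
};
```

A call site that used to spell out both steps can then test `type.isMaybeShared(Basic::Any)` directly.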
| |
Add spec tests checking validation for structs and arrays.
| |
Implement binary and text parsing and printing of shared basic heap types and
incorporate them into the type hierarchy.
To avoid the massive amount of code duplication that would be necessary if we
were to add separate enum variants for each of the shared basic heap types, use
bit 0 to indicate whether the type is shared and replace `getBasic()` with
`getBasic(Unshared)`, which clears that bit. Update all the use sites to record
whether the original type was shared and produce shared or unshared output
without code duplication.
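A minimal sketch of the bit-0 scheme, assuming hypothetical enum values that are shifted left by one so that bit 0 is free (the real values and the exact signature of `getBasic` may differ):

```cpp
#include <cassert>
#include <cstdint>

namespace sketch {

enum Shareability { Shared, Unshared };

// Hypothetical IDs: every basic heap type leaves bit 0 clear.
enum BasicHeapType : uint32_t {
  func = 1 << 1,
  any = 2 << 1,
  eq = 3 << 1,
};

constexpr uint32_t kSharedBit = 1;

constexpr bool isShared(uint32_t id) { return (id & kSharedBit) != 0; }

// Return the same basic heap type with the requested sharedness.
// getBasic(id, Unshared) clears the shared bit, as in the message above.
constexpr uint32_t getBasic(uint32_t id, Shareability share) {
  return share == Shared ? (id | kSharedBit) : (id & ~kSharedBit);
}

} // namespace sketch
```

Code that handles `any` can then switch on `getBasic(id, Unshared)` once and reapply the recorded sharedness to its output, instead of duplicating each case for the shared variant.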
| |
Since the BasicHeapTypes are in an enum, calling HeapType methods on them
requires something like `HeapType(HeapType::func).someMethod()`. This is
unnecessarily verbose, so add a new `HeapTypes` namespace that contains
constexpr HeapType globals that can be used instead, shortening this to
`HeapTypes::func.someMethod()`.
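The idea can be sketched with a cut-down class. The `HeapType` below is a simplified stand-in with only three basic types and one hypothetical method, `isBottom`, used in place of `someMethod`.

```cpp
#include <cassert>
#include <cstdint>

// Simplified stand-in for the real class.
class HeapType {
public:
  enum BasicHeapType : uint32_t { func, any, none };

  constexpr HeapType(BasicHeapType id) : id(id) {}

  // Hypothetical method used only to have something to call.
  constexpr bool isBottom() const { return id == none; }

private:
  uint32_t id;
};

namespace HeapTypes {

// constexpr globals wrapping each enum value in a full HeapType object.
constexpr HeapType func = HeapType::func;
constexpr HeapType any = HeapType::any;
constexpr HeapType none = HeapType::none;

} // namespace HeapTypes

// Both spellings work; the second is the shorter one.
static_assert(!HeapType(HeapType::func).isBottom(), "verbose form");
static_assert(HeapTypes::none.isBottom(), "short form");
```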
| |
Parse the text format for shared composite types as described in the
shared-everything thread proposal. Update the parser to use 'comptype' instead
of 'strtype' to match the final GC spec and add the new syntactic class
'sharecomptype'.
Update the type canonicalization logic to take sharedness into account to avoid
merging shared and unshared types. Make the same change in the TypeMerging pass.
Ensure that shared and unshared types cannot be in a subtype relationship with
each other.
Follow-up PRs will add shared abstract heap types, binary parsing and emitting
for shared types, and fuzzer support for shared types.
| |
Use the new wast parser to parse a full script up front, then traverse the
parsed script data structure and execute the commands. wasm-shell had previously
used the new wat parser for top-level modules, but it now uses the new parser
for module assertions as well. Fix various bugs this uncovered.
After this change, wasm-shell supports all the assertions used in the upstream
spec tests (although not new kinds of assertions introduced in any proposals).
Uncomment various `assert_exhaustion` tests that we can now execute.
Other kinds of assertions remain commented out in our tests: wasm-shell now
supports `assert_unlinkable`, but the interpreter does not eagerly check for the
existence of imports, so those tests do not pass. Tests that check for NaNs also
remain commented out because they do not yet use the standard syntax that
wasm-shell now supports for canonical and arithmetic NaN results, and our
interpreter would not pass all of those tests even if they did use the standard
syntax.
| |
The stringview types from the stringref proposal have three irregularities that
break common invariants and require pervasive special casing to handle properly:
they are supertypes of `none` but not subtypes of `any`, they cannot be the
targets of casts, and they cannot be used to construct nullable references. At
the same time, the stringref proposal has been superseded by the imported
strings proposal, which does not have these irregularities. The cost of
maintaining and improving our support for stringview types is no longer worth the
benefit of supporting them.
Simplify the code base by entirely removing the stringview types and related
instructions that do not have analogues in the imported strings proposal and do
not make sense in the absence of stringviews.
Three remaining instructions, `stringview_wtf16.get_codeunit`,
`stringview_wtf16.slice`, and `stringview_wtf16.length` take stringview operands
in the stringref proposal but cannot be removed because they lower to operations
from the imported strings proposal. These instructions are changed to take
stringref operands in Binaryen IR, and to allow a graceful upgrade path for
users of these instructions, the text and binary parsers still accept but ignore
`string.as_wtf16`, which is the instruction used to convert stringrefs to
stringviews. The binary writer emits code sequences that use scratch locals and
`string.as_wtf16` to keep the output valid.
Future PRs will further align binaryen with the imported strings proposal
instead of the stringref proposal, for example by making `string` a subtype of
`extern` instead of a subtype of `any` and by removing additional instructions
that do not have analogues in the imported strings proposal.
| |
This PR is part of a series that adds basic support for the typed
continuations/wasmfx proposal.
This particular PR adds cont and nocont as top and bottom types for
continuation types, completely analogous to func and nofunc for function types
(also: exn and noexn).
| |
At the Oct hybrid CG meeting, we decided to add back `exnref`, which was
removed in 2020:
https://github.com/WebAssembly/meetings/blob/main/main/2023/CG-10.md
The new version of the proposal is reflected in the explainer:
https://github.com/WebAssembly/exception-handling/blob/main/proposals/exception-handling/Exceptions.md
While adding support for `exnref` in the current codebase which has all
GC subtype hierarchies, I noticed we might need `noexn` heap type for
the bottom type of `exn`. We don't have it now so I just set it to 0xff
for the moment.
| |
This new optimization will eventually weaken casts by generalizing (i.e.
un-refining) their output types. If a cast is weakened enough that its output
type is a supertype of its input type, the cast will be able to be removed by
OptimizeInstructions.
Unlike refining cast inputs, generalizing cast outputs can break module
validation. For example, if the result of a cast is stored to a local and the
cast is weakened enough that its output type is no longer a subtype of that
local's type, then the local.set after the cast will no longer validate. To
avoid this validation failure, this optimization would have to generalize the
type of the local as well. In general, the more we can generalize the types of
program locations, the more we can weaken casts of values that flow into those
locations.
This initial implementation only generalizes the types of locals and does not
actually weaken casts yet. It serves as a proof of concept for the analysis
required to perform the full optimization, though. The analysis uses the new
analysis framework to perform a reverse analysis tracking type requirements for
each local and reference-typed stack value in a function.
Planned and potential future work includes:
- Implementing the transfer function for all kinds of expressions.
- Tracking requirements on the dynamic types of each location to generalize
allocations as well.
- Making the analysis interprocedural and generalizing the types of more
program locations.
- Optimizing tuple-typed locations.
- Generalizing only those locations necessary to eliminate at least one cast
(although this would make the analysis bidirectional, so it is probably better
left to separate passes).
| |
This PR is part of a series that adds basic support for the [typed continuations proposal](https://github.com/wasmfx/specfx).
This PR adds continuation types, of the form `(cont $foo)` for some function type `$foo`.
The only notable changes affecting existing code are the following:
- This is the first `HeapType` which has another `HeapType` (rather than, say, a `Type`) as its immediate child. This required fixes to certain traversals that have a flag for being at the toplevel of a type.
- Some shared logic for parsing `HeapType`s has been factored out.
| |
With this, the fuzzer can replace e.g. an eq expression with a specific struct type,
because now it is aware that struct types have eq as their ancestor.
| |
A later PR will add getSuperType which will mean "get the general super type -
either declared, or not".
| |
Add a new pass that analyzes the module to find the minimal subtyping relation
that is necessary to maintain the validity and semantics of the program and
rewrites the types to use this minimal relation. Besides eliminating references
to otherwise-unused intermediate types, this optimization should unlock
significant additional optimizing power in other type optimizations that are
constrained by having to maintain supertype validity, since after this new
optimization there are fewer and more general supertypes.
The analysis works by visiting each expression and module element to collect the
subtypings that are required to maintain its validity, then, using that as a
starting point, iteratively adding new subtypings required by type definitions
and casts until reaching a fixed point.
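The fixed-point step can be sketched with a worklist. This is a toy model, not the pass itself: types are integers, each type has at most one reference-typed field, and the only propagation rule is the hypothetical one that requiring A <: B also requires field(A) <: field(B).

```cpp
#include <cassert>
#include <map>
#include <set>
#include <utility>
#include <vector>

using TypeId = int;
using Subtyping = std::pair<TypeId, TypeId>; // (subtype, supertype)

// Start from the subtypings collected from the IR and keep adding the ones
// they imply until nothing new appears.
std::set<Subtyping> closeSubtypings(const std::set<Subtyping>& initial,
                                    const std::map<TypeId, TypeId>& fieldOf) {
  std::set<Subtyping> required;
  std::vector<Subtyping> work(initial.begin(), initial.end());
  while (!work.empty()) {
    Subtyping curr = work.back();
    work.pop_back();
    // Skip trivial and already-known requirements; the latter guarantees
    // termination because the set of possible pairs is finite.
    if (curr.first == curr.second || !required.insert(curr).second) {
      continue;
    }
    auto sub = fieldOf.find(curr.first);
    auto super = fieldOf.find(curr.second);
    if (sub != fieldOf.end() && super != fieldOf.end()) {
      work.push_back({sub->second, super->second});
    }
  }
  return required;
}
```

Any subtyping in the original module that does not end up in the returned set would be a candidate for removal under this model.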
| |
Probably any array of non-reference data can be allowed to be public and sent
out of the module, as it is just data. For now, however, just special case the i8
and i16 array types which are useful already for string interop.
| |
Match the spec and parse the shorthand binary and text formats as final and emit
final types without supertypes using the shorthands as well. This is a
potentially-breaking change, since the text and binary shorthands can no longer
be used to define types that have subtypes.
Also make TypeBuilder entries final by default to better match the spec and
update the internal APIs to use the "open" terminology rather than "final"
terminology. Future changes will update the text format to use the standard "sub
open" rather than the current "sub final" keywords. The exception is the new wat
parser, which supports "sub open" as of this change, since it didn't support
final types at all previously.
| |
Simplify the optimization of ref.cast and ref.test in OptimizeInstructions by
moving the loop that examines fallthrough values one at a time out to a shared
function in properties.h. Also simplify ref.cast optimization by analyzing the
cast result in just one place.
In addition to simplifying the code, also make the cast optimizations more
powerful by analyzing the nullability and heap type of the cast value
independently, resulting in a potentially more precise analysis of the cast
behavior. Also improve optimization power by considering fallthrough values when
optimizing the SuccessOnlyIfNonNull case.
| |
TypeMapper is a utility used to globally rewrite types, mapping some eliminated
source types into destination types they should be replaced with. This was
previously done by first rewriting all the types in the IR according to the
given mapping, then rewriting the type definitions and updating all the types in
the IR again. Not only was doing the rewriting twice inefficient, it also
introduced a subtle bug where the set of private types eligible to be rewritten
could be inconsistent because updating types in the IR could change the types of
control flow structures. The fuzzer found a case where this inconsistency caused
the type rebuilding to fail.
Fix the bug by first building the new types with the mapping applied and only
then rewriting the IR a single time.
Also add a `TypeBuilder::dump` utility for use in debugging.
Fixes #5845.
| |
Implement support in the type system for final types, which are not allowed to
have any subtypes. Final types are syntactically different from similar
non-final types, so type canonicalization is made aware of finality. Similarly,
TypeMerging and TypeSSA are updated to work correctly in the presence of final
types as well.
Implement binary and text parsing and emitting of final types. Use the standard
text format to represent final types and interpret the non-standard
"struct_subtype" and friends as non-final. This allows a graceful upgrade path
for users currently using the non-standard text format, where they can update
their code to use final types correctly at the point when they update to use the
standard format. Once users have migrated to using the fully expanded standard
text format, we can update Binaryen's parsers to interpret the MVP
shorthands as final types to match the spec without breaking those users.
To make it safe for V8 to independently start interpreting types declared
without `sub` as final, also reserve that shorthand encoding only for types that
have no strict subtypes.
| |
Rather than wrap a `TypeList`, make `Tuple` an alias of `TypeList`. This means
removing `Tuple::toString`, but that had no callers and was of limited use for
debugging anyway. In return, the use of tuples becomes much less verbose.
In the future, it may make sense to remove one of `Tuple` and `TypeList`.
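The difference in verbosity can be seen in a small sketch, where `int` stands in for `Type` and `WrappedTuple` is a hypothetical reconstruction of the old wrapper shape:

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

using TypeList = std::vector<int>; // ints stand in for Type here

// Before (hypothetical shape): a struct wrapping the list, so every use
// has to go through the member.
struct WrappedTuple {
  TypeList types;
};

// After: a plain alias, so a Tuple is used exactly like a TypeList.
using Tuple = TypeList;

static_assert(std::is_same<Tuple, TypeList>::value,
              "an alias names the same type");

std::size_t arity(const Tuple& tuple) { return tuple.size(); }
```

With the wrapper, element access reads `tuple.types[0]`; with the alias it is simply `tuple[0]`, and any function taking a `TypeList` accepts a `Tuple` unchanged.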
| |
Rewrite the type canonicalization algorithm to fully canonicalize a single rec
group at a time rather than canonicalizing multiple rec groups at once in
multiple stages. The previous code was useful when it had to be shared with
equirecursive and nominal canonicalization, but was much more complicated than
necessary for just isorecursive canonicalization, which is all we support today.
| |
Now that we no longer support constructing basic heap types in TypeBuilder, we
can fully initialize rec groups when they are created, rather than having to
initialize them later during the build step after any basic types have been
canonicalized. Alongside that change, also simplify the process of initializing
a type builder slot to avoid completely overwriting the HeapTypeInfo in the slot
and avoid the hacky workarounds that required.
| |
This capability was originally introduced to support calculating LUBs in the
equirecursive type system, but has not been needed for anything except tests
since the equirecursive type system was removed. Since building basic heap types
is no longer useful and was a source of significant complexity, remove the APIs
that allowed it and the tests that used those APIs.
Also remove test/example/type-builder.cpp, since a significant portion of it
tested the removed APIs and the rest is already better tested in
test/gtest/type-builder.cpp.
| |
And since the only type system left is the standard isorecursive type system,
remove `TypeSystem` and its associated APIs entirely. Delete a few tests that
only made sense under the isorecursive type system.
| |
Some valid GC types, such as non-nullable references to bottom heap types and
types that contain non-nullable references to themselves, are uninhabitable,
meaning it is not possible to construct values of those types. This can cause
problems for the fuzzer, which generally needs to be able to construct values of
arbitrary types.
To simplify things for the fuzzer, introduce a utility for transforming type
graphs such that all their types are inhabitable. The utility performs a DFS to
find cycles of non-nullable references and breaks those cycles by introducing
nullability.
The new utility is itself fuzzed in the type fuzzer.
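The cycle-breaking DFS can be sketched on a toy graph. The `Ref` and `TypeGraph` types and the function names are hypothetical, and the sketch ignores details such as bottom heap types and arrays.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model: each type is a list of references to other types.
struct Ref {
  int target;
  bool nullable;
};
using TypeGraph = std::vector<std::vector<Ref>>;

enum class Mark { Unvisited, OnStack, Done };

// DFS over non-nullable edges only. An edge back to a type that is still
// on the DFS stack closes a cycle of non-nullable references, so it is
// made nullable to break the cycle.
void visit(TypeGraph& graph, int node, std::vector<Mark>& marks) {
  marks[node] = Mark::OnStack;
  for (Ref& ref : graph[node]) {
    if (ref.nullable) {
      continue; // a nullable reference can always hold null
    }
    if (marks[ref.target] == Mark::OnStack) {
      ref.nullable = true;
    } else if (marks[ref.target] == Mark::Unvisited) {
      visit(graph, ref.target, marks);
    }
  }
  marks[node] = Mark::Done;
}

void makeInhabitable(TypeGraph& graph) {
  std::vector<Mark> marks(graph.size(), Mark::Unvisited);
  for (std::size_t i = 0; i < graph.size(); ++i) {
    if (marks[i] == Mark::Unvisited) {
      visit(graph, static_cast<int>(i), marks);
    }
  }
}
```

After the pass, the non-nullable edges form an acyclic graph in this model, so values can be constructed bottom-up.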
| |
Store string data as GC data. Inefficient (one Const per char), but ok for now.
Implement string.new_wtf16 and string.const, enough for basic testing.
Create strings in makeConstantExpression, which enables ctor-eval support.
Print strings in fuzz-exec which makes testing easier.
| |
`struct` has replaced `data` in the upstream spec, so update Binaryen's types to
match. We had already supported `struct` as an alias for data, but now remove
support for `data` entirely. Also remove instructions like `ref.is_data` that
are deprecated and do not make sense without a `data` type.
| |
We generalized the underlying API, TypeBuilder::setSubType, to allow it to take
any HeapType as the supertype in #5045. Make the same change now in the helper.
| |
Equirecursive is no longer standards track and its implementation is extremely
complex. Remove it.
| |
This is more modern and (IMHO) easier to read than the old C typedef
syntax.
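For comparison, here are the two spellings side by side on a function pointer type, where the difference is most visible. The names are made up for illustration.

```cpp
#include <cassert>
#include <type_traits>

// Old C-style spelling: the new name is buried inside the declarator.
typedef int (*OldBinaryOp)(int, int);

// Alias declaration: the new name comes first, the type after the '='.
using BinaryOp = int (*)(int, int);

static_assert(std::is_same<OldBinaryOp, BinaryOp>::value,
              "both spellings name the same type");

int add(int a, int b) { return a + b; }

int apply(BinaryOp op, int a, int b) { return op(a, b); }
```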
| |
`array` is the supertype of all defined array types and for now is a subtype of
`data`. (Once `data` becomes `struct` this will no longer be true.) Update the
binary and text parsing of `array.len` to ignore the obsolete type annotation
and update the binary emitting to emit a zero in place of the old type
annotation and the text printing to print an arbitrary heap type for the
annotation. A follow-on PR will add support for the newer unannotated version of
`array.len`.
| |
With the goal of supporting null characters (i.e. zero bytes) in strings.
Rewrite the underlying interned `IString` to store a `std::string_view` rather
than a `const char*`, reduce the number of map lookups necessary to intern a
string, and present a more immutable interface.
Most importantly, replace the `c_str()` method that returned a `const char*`
with a `toString()` method that returns a `std::string`. This new method can
correctly handle strings containing null characters. A `const char*` can still
be had by calling `data()` on the `std::string_view`, although this usage should
be discouraged.
This change is NFC in spirit, although not in practice. It does not intend to
support any particular new functionality, but it is probably now possible to use
strings containing null characters in at least some cases. At least one parser
bug is also incidentally fixed. Follow-on PRs will explicitly support and test
strings containing nulls for particular use cases.
The C API still uses `const char*` to represent strings. As strings containing
nulls become better supported by the rest of Binaryen, this will no longer be
sufficient. Updating the C and JS APIs to use pointer, length pairs is left as
future work.
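A minimal sketch of interning by `std::string_view`, to show why the length-carrying view matters for embedded nulls. The `intern` function is a hypothetical illustration, not the `IString` implementation, and it is not thread-safe.

```cpp
#include <cassert>
#include <string>
#include <string_view>
#include <unordered_set>

// Return a view of the unique pooled copy of `s`. std::unordered_set is
// node-based, so the pooled strings (and views into them) stay valid as
// the pool grows. A single insert() both looks up and adds the string.
std::string_view intern(std::string_view s) {
  static std::unordered_set<std::string> pool;
  return *pool.insert(std::string(s)).first;
}

// Copy out as a std::string, preserving any embedded null characters.
std::string toString(std::string_view interned) {
  return std::string(interned);
}
```

A `const char*` obtained from `data()` still stops at the first zero byte when treated as a C string, which is why that usage is discouraged above.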
| |
These types, `none`, `nofunc`, and `noextern`, are uninhabited, so references to
them can only possibly be null. To simplify the IR and increase type precision,
introduce new invariants that all `ref.null` instructions must be typed with one
of these new bottom types and that `Literals` have a bottom type iff they
represent null values. These new invariants require several additional changes.
First, it is now possible that the `ref` or `target` child of a `StructGet`,
`StructSet`, `ArrayGet`, `ArraySet`, or `CallRef` instruction has a bottom
reference type, so it is not possible to determine what heap type annotation to
emit in the binary or text formats. (The bottom types are not valid type
annotations since they do not have indices in the type section.)
To fix that problem, update the printer and binary emitter to emit unreachables
instead of the instruction with undetermined type annotation. This is a valid
transformation because the only possible value that could flow into those
instructions in that case is null, and all of those instructions trap on nulls.
That fix uncovered a latent bug in the binary parser in which new unreachables
within unreachable code were handled incorrectly. This bug was not previously
found by the fuzzer because we generally stop emitting code once we encounter an
instruction with type `unreachable`. Now, however, it is possible to emit an
`unreachable` for instructions that do not have type `unreachable` (but are
known to trap at runtime), so we will continue emitting code. See the new
test/lit/parse-double-unreachable.wast for details.
Update other miscellaneous code that creates `RefNull` expressions and null
`Literals` to maintain the new invariants as well.
| |
Fixes #5041
| |
Match the latest version of the GC spec. This change does not depend on V8
changing its interpretation of the shorthands because we are still temporarily
not emitting the binary shorthands, but all Binaryen users will have to update
their interpretations along with this change if they use the text or binary
shorthands.
| |
The GC proposal has split `any` and `extern` back into two separate types, so
reintroduce `HeapType::ext` to represent `extern`. Before it was originally
removed in #4633, externref was a subtype of anyref, but now it is not. Now that
we have separate heaptype type hierarchies, make `HeapType::getLeastUpperBound`
fallible as well.
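What "fallible" means here can be sketched with `std::optional`. The hierarchy below is a simplified assumption (`i31 <: eq <: any`, with `ext` and `func` as roots of their own hierarchies), and `leastUpperBound` is a hypothetical free function rather than the real method.

```cpp
#include <cassert>
#include <optional>

enum class HT { Ext, Func, Any, Eq, I31 };

// Direct supertype in the simplified hierarchy, or nothing for a root.
std::optional<HT> parent(HT type) {
  switch (type) {
    case HT::I31:
      return HT::Eq;
    case HT::Eq:
      return HT::Any;
    default:
      return std::nullopt;
  }
}

// Walk up from `a`; the first ancestor-or-self of `a` that is also an
// ancestor-or-self of `b` is the least upper bound. If the two types are
// in different hierarchies, no such type exists.
std::optional<HT> leastUpperBound(HT a, HT b) {
  for (std::optional<HT> x = a; x; x = parent(*x)) {
    for (std::optional<HT> y = b; y; y = parent(*y)) {
      if (*x == *y) {
        return x;
      }
    }
  }
  return std::nullopt;
}
```

Callers must now handle the empty result, for example when joining an `extern` reference with an `any` reference.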
| |
RTTs were removed from the GC spec and if they are added back in the future,
they will be heap types rather than value types as in our implementation.
Updating our implementation to have RTTs be heap types would have been more work
than deleting them for questionable benefit since we don't know how long it will
be before they are specced again.
| |
Basic reference types like `Type::funcref`, `Type::anyref`, etc. made it easy to
accidentally forget to handle reference types with the same basic HeapTypes but
the opposite nullability. In principle there is nothing special about the types
with shorthands except in the binary and text formats. Removing these shorthands
from the internal type representation by removing all basic reference types
makes some code more complicated locally, but simplifies code globally and
encourages properly handling both nullable and non-nullable reference types.
| |
This starts to implement the Wasm Strings proposal
https://github.com/WebAssembly/stringref/blob/main/proposals/stringref/Overview.md
This just adds the types.
|