| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Generally we try to order types by decreasing use count so that frequently used
types get smaller indices. For the equirecursive and nominal systems, there are
no contraints on the ordering of types, so we just have to sort them according
to their use counts. For the isorecursive type system, however, there are a
number of ordering constraints that have to be met for the type section to be
valid. First, types in the same recursion group must be adjacent so they can be
grouped together. Second, groups must be ordered topologically so that they only
refer to types in themselves or prior groups.
Update type ordering to produce a valid isorecursive output by performing a
topological sort on the recursion groups. While performing the sort, prefer to
visit and finish processing the most used groups first as a heuristic to improve
the final ordering.
Do not reorder types within groups, since doing so would change type identity
and could affect the external interface of the module. Leave that reordering to
an optimization pass (not yet implemented) that users can explicitly opt in to.
|
|
|
|
|
| |
Update the HeapType constructors that take Signature, Structs, and Arrays to
work properly with isorecursive typing. This is particularly important for the
Signature constructor, which is used frequently throughout the code base.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Isorecursive canonicalization is similar to equirecursive canonicalization in
that it deduplicates types based on their structure, but unlike equirecursive
canonicalization, isorecursive canonicalization does not need to minimize the
type definition first. Another difference is that structures are deduplicated at
the level of recursion groups rather than individual types, so we cannot reuse
the existing `FiniteShapeHasher` and `FiniteShapeEquator` utilities. Instead,
introduce a new `RecGroupStructure` wrapper that provides equality comparison
and hashing very similar to the former utilities.
Another feature of the isorecursive type system is that its constraints on the
order of recursion groups means that they are already topologically sorted and
can be incrementally canonicalized in a bottom-up manner. This incremental
canonicalization means that the `RecGroupStructure` utility can assume that all
child `HeapTypes` have already been canonicalized and can avoid traversing
anything but the top-level type definitions in the wrapped recursion group. The
only exception is self-references into the wrapped recursion group itself, which
may not be canonical. That special case is detected and handled without any
nontrivial extra work.
Overall, the canonicalization algorithm traverses each type definition once to
canonicalize its `HeapType` use sites, then once to hash its recursion group
structure, then finally once to canonicalize its `Type` use sites in
`globallyCanonicalize`, giving only linear time complexity.
|
|
|
|
|
| |
Storing the rec group index on the HeapTypeInfo avoids having to do a linear
scan through the rec group to find the index for a particular type. This will
be important for isorecursive canonicalization, which uses rec group indices.
|
|
|
|
|
|
|
| |
Since isorecursive canonicalization will happen incrementally in a bottom-up
manner, it will be more efficient if can more precisely control which types get
updated at each step. Refactor the `CanonicalizationState` interface to
separately expose shallow updating, which updates only the top-level state, and
deep updating of the use sites within HeapTypeInfos.
|
|
|
|
|
|
|
| |
When building isorecursive types, validate their relationships according to the
rules described in https://github.com/WebAssembly/gc/pull/243. Specifically,
supertypes must be declared before their subtypes to statically prevent cycles
and child types must be declared either before or in the same recursion group as
their parents.
|
|
|
|
|
|
|
|
|
|
|
| |
It is possible for type building to fail, for example if the declared nominal
supertypes form a cycle or are structurally invalid. Previously we would report
a fatal error and kill the program from inside `TypeBuilder::build()` in these
situations, but this handles errors at the wrong layer of the code base and is
inconvenient for testing the error cases.
In preparation for testing the new error cases introduced by isorecursive
typing, make type building fallible and add new tests for existing error cases.
Also fix supertype cycle detection, which it turns out did not work correctly.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In `--hybrid` isorecursive mode, associate each defined type with a recursion
group, represented as a `(rec ...)` wrapping the type definitions in the text
format. Parse that text format, create the rec groups using a new TypeBuilder
method, and print the rec groups in the printer.
The only semantic difference rec groups currently make is that if one type in a
rec group will be included in the output, all the types in that rec group will
be included. This is because changing a rec group in any way (for example by
removing a type) changes the identity of the types in that group in the
isorecursive type system. Notably, rec groups do not yet participate in
validation, so `--hybrid` is largely equivalent to `--nominal` for now.
|
|
|
|
|
|
|
| |
Add a utility class for defining all the common operations like pre- and post-
increment and decrement, addition and subtraction, and assigning addition and
subtraction for iterators that are comprised of a parent object and an index
into that parent object. Use the new utility to reduce the boilerplate in
wasm-type.h. Add a new test of the iterator behavior.
|
|
|
|
|
| |
Add a `destroyAllTypes` function to clear the global state of the type system
and use it in a custom gtest test fixture to ensure that each test starts and
ends with a fresh state.
|
|
|
|
|
|
|
|
| |
This field was originally added with the goal of allowing types from multiple
type systems to coexist by determining the type system on a per-type level
rather than globally. This goal was never fully achieved and the `isNominal`
field is not used outside of tests. Now that we are working on implementing the
hybrid isorecursive system, it does not look like having types from multiple
systems coexist will be useful in the near term, so clean up this tech debt.
|
|
|
|
|
|
|
|
|
|
|
| |
There is no reason the `__stack_pointer` global can't be exported from
the module, and in fact I'm experimenting with a non-relocatable main
module that requires this.
See https://github.com/emscripten-core/emscripten/issues/12682
This heuristic still kind of sucks but should always be good enough
for llvm output that always puts the stack pointer first.
|
|
|
|
|
| |
Eventually this will enable the isorecursive hybrid type system described in
https://github.com/WebAssembly/gc/pull/243, but for now it just throws a fatal
error if used.
|
|
|
|
|
| |
Update the API to make both the type indices and optimized sorting optional.
It will become more important to avoid unnecessary sorting once isorecursive
types have been implemented because they will make the sorting more complicated.
|
|
|
|
|
|
|
| |
Handle the isBasic() case first - that inlined function is very fast to call,
and it is the common case. Also, do not do unnecessary work there: just
write out what we need, instead of always doing a memcpy of 16 bytes.
This makes us over 2x faster on the benchmark in #4452
|
|
|
|
|
| |
We call this very frequently in the interpreter.
This is a 25% speedup on the benchmark in #4452
|
|
|
|
|
|
|
|
| |
As it happens, this doesn't (normally) break the resulting EM_ASM or
EM_JS strings because (IIUC) JS supports the tab literal inside of
strings as well as "\t".
However, it's better to preserve the original text so that it looks
the same in the JS file as it did in the original source.
|
|
|
|
|
|
|
| |
Fixes the crash in #4418
Also replace the .at() there with better logic to handle imported functions.
See WebAssembly/wabt#1799 for details on why wabt sometimes emits this.
|
|
|
|
|
|
| |
Without this, the result in a build without assertions might be quite
confusing. See #4410
Also make the internal names more obviously internal names.
|
|
|
|
|
| |
Without this we hit an assertion later, which is less clear.
See #4413
|
|
|
|
|
|
| |
When reading stacky code in the binary reader, we create `block`s to
make it fit into Binaryen AST, within which `pop`s can be nested, making
the resulting AST invalid. This PR runs the fixup function after reading
each `Try` to fix this.
|
|
|
|
|
|
|
|
|
| |
This enables fuzzing EH with initial contents. fuzzing.cpp/h does not
yet support generation of EH instructions, but with this we can still
fuzz EH based on initial contents.
The fuzzer ran successfully for more than 1,900,000 iterations, with my
local modification that always enables EH and lets the fuzzer select
only EH tests for its initial contents.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Equirecursive canonicalization generates new minimal type definitions for each
distinct type, but then it must replace the old, non-minimial definitions with
the new ones. That second part where the types are replaced is not unique to
equirecursive canonicalization; the same replacement happens with BasicKind
types and in the future the hybrid isorecursive system will also need to do that
kind of replacement. To improve code reuse and separation of concerns, factor
the type replacement logic out into a separate utility.
This change slightly slows down shape canonicalization, but shape
canonicalization is still much faster than global canonicalization. Nominal
typing performance is not affected.
|
|
|
| |
See https://github.com/emscripten-core/emscripten/pull/15855
|
|
|
|
|
|
|
| |
If that type is not valid then we cannot even create and finalize the node,
which means we'd hit an assertion inside finalize(), before we reach the
validator.
Fixes #4383
|
|
|
|
|
| |
Update the LUB calculation code to use std::optional rather than out params and
validate LUBs in the fuzzer to ensure that the change is NFC as intended. Also
add HeapType::getLeastUpperBound to the public API as a convenience.
|
|
|
|
|
|
|
|
|
|
| |
Hashing and comparison of nominal HeapTypeInfos previously observed their child
Types, so the Types had to be canonicalized before the HeapTypes. Unfortunately
equirecursive canonicalization requires that the HeapTypes be canonicalized
before the Types, so this was a point of divergence between the two systems.
However, #4394 updated hashing and comparison of nominal types to not depend on
child Types, so now we can harmonize the two systems by having them use the same
`globallyCanonicalize` function to canonicalize their HeapTypes followed by
their Types.
|
|
|
|
|
|
|
| |
Now that caching of "canonical" nominal signatures is handled at a separate
layer, we can remove the separate code paths for hashing and comparing
HeapTypeInfos based on their structure even in nominal mode. Now hashing and
comparing of HeapTypeInfos is uniformly handled by FiniteShapeHasher and
FiniteShapeEquator.
|
|
|
| |
Fixes #4384
|
|
|
|
|
|
|
| |
We have always cached nominal signature types keyed on their signatures to avoid
creating extra nominal types through the `HeapType::HeapType(Signature)`
constructor. However, that logic was previously built into the HeapTypeInfo
canonicalization system. To allow that system to be simplified in future PRs,
separate the caching into its own explicit layer.
|
|
|
|
|
|
|
| |
Types and HeapTypes are inserted into their respective stores either by copying
a reference to a `TypeInfo` or `HeapTypeInfo` or by moving a
`std::unique_ptr<TypeInfo>` or `std::unique_ptr<HeapTypeInfo>`. Previously these
two code paths had separate, similar logic. To reduce deduplication, combine
both code paths into a single method.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We do some postprocessing after parsing `Try` to make sure `delegate`
only targets `try`s and not `block`s:
https://github.com/WebAssembly/binaryen/blob/9659f9b07c1196447edee68fe04c8d7dd2480652/src/wasm/wasm-binary.cpp#L6404-L6426
But in case the outer `try` has neither of `catch` nor `delegate`, the
previous code just return prematurely, skipping the postprocessing part,
resulting in a binary parsing error. This PR removes that early-exiting
code.
Some test outputs have changed because `try`s are assigned labels after
the early exit. But those labels can be removed by other optimization
passes when there is no inner `rethrow` or `delegate` that targets them.
(On a side note, the restriction that `delegate` cannot target a `block`
has been removed a few months ago in the spec, so if a `delegate`
targets a `block`, it means it is just rethrown from that block. But I
still think this is a convenient invariant to hold at least within the
binaryen IR. I'm planning to allow parsing of `delegate` targeting
`block`s later, but I will make them point to `try` when read in the
IR. At the moment the LLVM toolchain does not generate such code.)
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
With nominal function types, this change makes it so that we preserve the
identity of the function type used with call_indirect instructions rather than
recreating a function heap type, which may or may not be the same as the
originally parsed heap type, from the function signature during module writing.
This will simplify the type system implementation by removing the need to store
a "canonical" nominal heap type for each unique signature. We previously
depended on those canonical types to avoid creating multiple duplicate function
types during module writing, but now we aren't creating any new function types
at all.
|
|
|
|
|
| |
Check that types that were meant to have a subtype relationship actually do. To
expose the intended subtyping to the fuzzer, expose `subtypeIndices` in the
return value of the type generation function.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
As we work toward allowing nominal and structural types to coexist, any
difference in how they can be built or used will be an inconvenient footgun that
we will have to work around. In the spirit of reducing the differences between
the type systems, allow TypeBuilder to construct basic HeapTypes in nominal mode
just as it can in equirecursive mode.
Although this change is a net increase in code complexity for not much
benefit (wasm-opt never needs to build basic HeapTypes), it is also an
incremental step toward getting rid of separate type system modes, so I expect
it to simplify other PRs in the near future.
This change also uncovered a bug in how the type fuzzer generated subtypes of
basic HeapTypes. The generated subtypes did not necessarily have the intended
`Kind`, which caused failures in nominal subtype validation in the fuzzer.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add a new fuzzer binary that repeatedly generates random types to find bugs in
the type system implementation. Each iteration creates some number of root types
followed by some number of subtypes thereof. Each built type can contain
arbitrary references to other built types, regardless of their order of
construction.
Right now the fuzzer only finds fatal errors in type building (and in its own
implementation), but it is meant to be extended to check other properties in the
future, such as that LUB calculations work as expected.
The logic for creating types is also intended to be integrated into the main
fuzzer in a follow-on PR so that the main fuzzer can fuzz with arbitrarily more
interesting GC types.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This adds relaxed-simd instructions based on the current status of the
proposal
https://github.com/WebAssembly/relaxed-simd/blob/main/proposals/relaxed-simd/Overview.md.
Binary opcodes are based on what is listed in
https://github.com/WebAssembly/relaxed-simd/blob/main/proposals/relaxed-simd/Overview.md#binary-format.
Text names are not fixed yet, and some sort sort of names that maps to
the non-relaxed versions are chosen for this prototype.
Support for these instructions have been added to LLVM via builtins,
adding support here will allow Emscripten to successfully compile files
that use those builtins.
Interpreter support has also been added, and they delegate to the
non-relaxed versions of the instructions.
Most instructions are implemented in the interpreter the same way as the non-relaxed
simd128 instructions, except for fma/fms, which is always fused.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This improves validation of `catch` bodies mostly by checking the
validity of `pop`s.
For every `catch` body:
- Checks if its tag exists
- If the tag's type is none:
- Ensures there shouldn't be any `pop`s
- If the tag's type is not none:
- Checks if there's a single `pop` within the catch body
- Checks if the tag type matches the `pop`'s type
- Checks if the `pop`'s location is valid
For every `catch_all` body:
- Ensures there shuldn't be any `pop`s
This uncovers several bugs related to `pop`s in existing tests, which
this PR also fixes.
|
|
|
|
|
|
|
|
|
|
|
|
| |
Allocation and cast instructions without explicit RTTs should use the canonical
RTTs for the given types. Furthermore, the RTTs for nominal types should reflect
the static type hierarchy. Previously, however, we implemented allocations and
casts without RTTs using an alternative system that only used static types
rather than RTT values. This alternative system would work fine in a world
without first-class RTTs, but it did not properly allow mixing instructions that
use RTTs and instructions that do not use RTTs as intended by the M4 GC spec.
This PR fixes the issue by using canonical RTTs where appropriate and cleans up
the relevant casting code using std::variant.
|
|
|
|
| |
This helps prevent bugs where we assume that the GCData has either a HeapType or
Rtt without checking. Indeed, one such bug is found and fixed.
|
|
|
| |
This sets the C++ standard variable in the build to C++17, and makes use of std::optional (a C++17 library feature) in one place, to test that it's working.
|
| |
|
|
|
|
|
|
|
|
| |
Switch from "extends" to M4 nominal syntax
Change all test inputs from using the old (extends $super) syntax to using the
new *_subtype syntax for their inputs and also update the printer to emit the
new syntax. Add a new test explicitly testing the old notation to make sure it
keeps working until we remove support for it.
|
|
|
|
|
|
|
|
|
|
|
| |
Add an assert on not emitting a null name (which would cause
a crash a few lines down on trying to read its bytes). I hit that
when writing a buggy pass that updated field names.
Also fix the case of a type not having a name but some of its
fields having names. We can't test that atm since our text
format requires types to have names anyhow, so this is a
fix for a possible future where we do allow parsing non-named
types.
|
| |
|
| |
|
|
|
|
|
|
|
|
|
| |
Implement parsing the new {func,struct,array}_subtype format for nominal types.
For now, the new format is parsed the same way the old-style (extends X) format
is parsed, i.e. in --nominal mode types are parsed as nominal but otherwise they
are parsed as equirecursive. Intentionally do not parse the new types
unconditionally as nominal for now to allow frontends to update their nominal
text format while continuing to use the workflow of running wasm-opt without
--nominal to lower nominal types to structural types.
|
|
|
|
|
|
|
|
| |
See #4220 - this lets us handle the common case for now of simply having
an identical heap type to the table when the signature is identical.
With this PR, #4207's optimization of call_ref + table.get into
call_indirect now leads to a binary that works in V8 in nominal mode.
|