| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
|
|
|
|
|
|
|
|
| |
`array` is the supertype of all defined array types and for now is a subtype of
`data`. (Once `data` becomes `struct` this will no longer be true.) Update the
binary and text parsing of `array.len` to ignore the obsolete type annotation
and update the binary emitting to emit a zero in place of the old type
annotation and the text printing to print an arbitrary heap type for the
annotation. A follow-on PR will add support for the newer unannotated version of
`array.len`.
|
|
|
|
|
|
|
| |
In the upstream spec, `data` has been replaced with a type called `struct`. To
allow for a graceful update in Binaryen, start by introducing "struct" as an
alias for "data". Once users have stopped emitting `data` directly, future PRs
will remove `data` and update the subtyping so that arrays are no longer
subtypes of `struct`.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
With the goal of supporting null characters (i.e. zero bytes) in strings.
Rewrite the underlying interned `IString` to store a `std::string_view` rather
than a `const char*`, reduce the number of map lookups necessary to intern a
string, and present a more immutable interface.
Most importantly, replace the `c_str()` method that returned a `const char*`
with a `toString()` method that returns a `std::string`. This new method can
correctly handle strings containing null characters. A `const char*` can still
be had by calling `data()` on the `std::string_view`, although this usage should
be discouraged.
This change is NFC in spirit, although not in practice. It does not intend to
support any particular new functionality, but it is probably now possible to use
strings containing null characters in at least some cases. At least one parser
bug is also incidentally fixed. Follow-on PRs will explicitly support and test
strings containing nulls for particular use cases.
The C API still uses `const char*` to represent strings. As strings containing
nulls become better supported by the rest of Binaryen, this will no longer be
sufficient. Updating the C and JS APIs to use pointer, length pairs is left as
future work.
|
|
|
| |
Finishes work missed in #5126.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
These types, `none`, `nofunc`, and `noextern` are uninhabited, so references to
them can only possibly be null. To simplify the IR and increase type precision,
introduce new invariants that all `ref.null` instructions must be typed with one
of these new bottom types and that `Literals` have a bottom type iff they
represent null values. These new invariants requires several additional changes.
First, it is now possible that the `ref` or `target` child of a `StructGet`,
`StructSet`, `ArrayGet`, `ArraySet`, or `CallRef` instruction has a bottom
reference type, so it is not possible to determine what heap type annotation to
emit in the binary or text formats. (The bottom types are not valid type
annotations since they do not have indices in the type section.)
To fix that problem, update the printer and binary emitter to emit unreachables
instead of the instruction with undetermined type annotation. This is a valid
transformation because the only possible value that could flow into those
instructions in that case is null, and all of those instructions trap on nulls.
That fix uncovered a latent bug in the binary parser in which new unreachables
within unreachable code were handled incorrectly. This bug was not previously
found by the fuzzer because we generally stop emitting code once we encounter an
instruction with type `unreachable`. Now, however, it is possible to emit an
`unreachable` for instructions that do not have type `unreachable` (but are
known to trap at runtime), so we will continue emitting code. See the new
test/lit/parse-double-unreachable.wast for details.
Update other miscellaneous code that creates `RefNull` expressions and null
`Literals` to maintain the new invariants as well.
|
|
|
|
|
|
|
| |
Emit call_ref instructions with type annotations and a temporary opcode. Also
implement support for parsing optional type annotations on call_ref in the text
and binary formats. This is part of a multi-part graceful update to switch
Binaryen and all of its users over to using the type-annotated version of
call_ref without there being any breakage.
|
|
|
|
|
|
| |
The GC spec has been updated to have heap type annotations on call_ref and
return_call_ref. To avoid breaking users, we will have a graceful, multi-step
upgrade to the annotated version of call_ref, but since return_call_ref has no
users yet, update it in a single step.
|
|
|
|
|
|
|
| |
Previously when we parsed `string.const` payloads in the text format we were
using the text strings directly instead of un-escaping them. Fix that parsing,
and while we're editing the code, also add support for the `\r` escape allowed
by the spec. Remove a spurious nested anonymous namespace and spurious `static`s
in Print.cpp as well.
|
|
|
|
| |
(#5038)
|
|
|
|
|
|
|
| |
Match the latest version of the GC spec. This change does not depend on V8
changing its interpretation of the shorthands because we are still temporarily
not emitting the binary shorthands, but all Binaryen users will have to update
their interpretations along with this change if they use the text or binary
shorthands.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously the wat parser would turn this input:
(block
(nop)
)
into something like this:
(block $block17
(nop)
)
It just added a name all the time, in case the block is referred to by an index
later even though it doesn't have a name.
This PR makes us rountrip more precisely by not adding such names: if there
was no name before, and there is no break by index, then do not add a name.
In addition, this will be useful for non-nullable locals since whether a block has
a name or not matters there. Like #4912, this makes us more regular in our
usage of block names.
|
|
|
|
|
|
|
| |
The GC proposal has split `any` and `extern` back into two separate types, so
reintroduce `HeapType::ext` to represent `extern`. Before it was originally
removed in #4633, externref was a subtype of anyref, but now it is not. Now that
we have separate heaptype type hierarchies, make `HeapType::getLeastUpperBound`
fallible as well.
|
|
|
|
|
|
|
| |
This PR removes the single memory restriction in IR, adding support for a single module to reference multiple memories. To support this change, a new memory name field was added to 13 memory instructions in order to identify the memory for the instruction.
It is a goal of this PR to maintain backwards compatibility with existing text and binary wasm modules, so memory indexes remain optional for memory instructions. Similarly, the JS API makes assumptions about which memory is intended when only one memory is present in the module. Another goal of this PR is that existing tests behavior be unaffected. That said, tests must now explicitly define a memory before invoking memory instructions or exporting a memory, and memory names are now printed for each memory instruction in the text format.
There remain quite a few places where a hardcoded reference to the first memory persist (memory flattening, for example, will return early if more than one memory is present in the module). Many of these call-sites, particularly within passes, will require us to rethink how the optimization works in a multi-memories world. Other call-sites may necessitate more invasive code restructuring to fully convert away from relying on a globally available, single memory pointer.
|
|
|
| |
Like the 8-bit array variants, it takes 3 parameters.
|
| |
|
|
|
|
|
|
|
| |
RTTs were removed from the GC spec and if they are added back in in the future,
they will be heap types rather than value types as in our implementation.
Updating our implementation to have RTTs be heap types would have been more work
than deleting them for questionable benefit since we don't know how long it will
be before they are specced again.
|
| |
|
|
|
|
|
|
|
|
|
| |
Basic reference types like `Type::funcref`, `Type::anyref`, etc. made it easy to
accidentally forget to handle reference types with the same basic HeapTypes but
the opposite nullability. In principle there is nothing special about the types
with shorthands except in the binary and text formats. Removing these shorthands
from the internal type representation by removing all basic reference types
makes some code more complicated locally, but simplifies code globally and
encourages properly handling both nullable and non-nullable reference types.
|
| |
|
|
|
|
|
|
|
| |
Unfortunately one slice is the same as python [start:end], using 2 params,
and the other slice is one param, [CURR:CURR+num] (where CURR is implied
by the current state in the iter). So we can't use a single class here. Perhaps
a different name would be good, like slice vs substring (like JS does), but
I picked names to match the current spec.
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
|
|
|
|
| |
This is more work than a typical instruction because it also adds a new section:
all the (string.const "foo") strings are put in a new "strings" section in the binary, and
the instructions refer to them by index.
|
|
|
|
|
|
| |
This is the first instruction from the Strings proposal.
This includes everything but interpreter support.
|
|
|
|
|
|
|
|
| |
This starts to implement the Wasm Strings proposal
https://github.com/WebAssembly/stringref/blob/main/proposals/stringref/Overview.md
This just adds the types.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* Updating wasm.h/cpp for DataSegments
* Updating wasm-binary.h/cpp for DataSegments
* Removed link from Memory to DataSegments and updated module-utils, Metrics and wasm-traversal
* checking isPassive when copying data segments to know whether to construct the data segment with an offset or not
* Removing memory member var from DataSegment class as there is only one memory rn. Updated wasm-validator.cpp
* Updated wasm-interpreter
* First look at updating Passes
* Updated wasm-s-parser
* Updated files in src/ir
* Updating tools files
* Last pass on src files before building
* added visitDataSegment
* Fixing build errors
* Data segments need a name
* fixing var name
* ran clang-format
* Ensuring a name on DataSegment
* Ensuring more datasegments have names
* Adding explicit name support
* Fix fuzzing name
* Outputting data name in wasm binary only if explicit
* Checking temp dataSegments vector to validateBinary because it's the one with the segments before we processNames
* Pass on when data segment names are explicitly set
* Ran auto_update_tests.py and check.py, success all around
* Removed an errant semi-colon and corrected a counter. Everything still passes
* Linting
* Fixing processing memory names after parsed from binary
* Updating the test from the last fix
* Correcting error comment
* Impl kripken@ comments
* Impl tlively@ comments
* Updated tests that remove data print when == 0
* Ran clang format
* Impl tlively@ comments
* Ran clang-format
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This adds exported tags to `exports` section in wasm-emscripten-finalize
metadata so Emscripten can use it.
Also fixes a bug in the parser. We have only recognized the export
format of
```wasm
(tag $e2 (param f32))
(export "e2" (tag $e2))
```
and ignored this format:
```wasm
(tag $e1 (export "e1") (param i32))
```
Companion patch: https://github.com/emscripten-core/emscripten/pull/17064
|
|
|
|
|
|
|
|
|
|
| |
Share the logic for parsing imported and non-imported globals of the formats:
(import "module" "base" (global $name? type))
(global $name? type init)
This fixes #4676, since the deleted logic for parsing imported globals did not
handle parsing GC types correctly.
|
|
|
|
|
|
| |
This unsafe experimental instruction is semantically equivalent to
ref.cast_static, but V8 will unsafely turn it into a nop. This is meant to help
us measure cast overhead more precisely than we can by globally turning all
casts into nops.
|
|
|
|
|
|
| |
Remove `Type::externref` and `HeapType::ext` and replace them with uses of
anyref and any, respectively, now that we have unified these types in the GC
proposal. For backwards compatibility, continue to parse `extern` and
`externref` and maintain their relevant C API functions.
|
|
|
|
| |
Instead of just reporting the type index that causes an error when building
types, report the name of the responsible type when parsing the text format.
|
|
|
|
|
|
|
|
|
|
|
| |
It is possible for type building to fail, for example if the declared nominal
supertypes form a cycle or are structurally invalid. Previously we would report
a fatal error and kill the program from inside `TypeBuilder::build()` in these
situations, but this handles errors at the wrong layer of the code base and is
inconvenient for testing the error cases.
In preparation for testing the new error cases introduced by isorecursive
typing, make type building fallible and add new tests for existing error cases.
Also fix supertype cycle detection, which it turns out did not work correctly.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In `--hybrid` isorecursive mode, associate each defined type with a recursion
group, represented as a `(rec ...)` wrapping the type definitions in the text
format. Parse that text format, create the rec groups using a new TypeBuilder
method, and print the rec groups in the printer.
The only semantic difference rec groups currently make is that if one type in a
rec group will be included in the output, all the types in that rec group will
be included. This is because changing a rec group in any way (for example by
removing a type) changes the identity of the types in that group in the
isorecursive type system. Notably, rec groups do not yet participate in
validation, so `--hybrid` is largely equivalent to `--nominal` for now.
|
|
|
|
|
|
|
|
| |
This field was originally added with the goal of allowing types from multiple
type systems to coexist by determining the type system on a per-type level
rather than globally. This goal was never fully achieved and the `isNominal`
field is not used outside of tests. Now that we are working on implementing the
hybrid isorecursive system, it does not look like having types from multiple
systems coexist will be useful in the near term, so clean up this tech debt.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
With nominal function types, this change makes it so that we preserve the
identity of the function type used with call_indirect instructions rather than
recreating a function heap type, which may or may not be the same as the
originally parsed heap type, from the function signature during module writing.
This will simplify the type system implementation by removing the need to store
a "canonical" nominal heap type for each unique signature. We previously
depended on those canonical types to avoid creating multiple duplicate function
types during module writing, but now we aren't creating any new function types
at all.
|
| |
|
| |
|
| |
|
|
|
|
|
|
|
|
|
| |
Implement parsing the new {func,struct,array}_subtype format for nominal types.
For now, the new format is parsed the same way the old-style (extends X) format
is parsed, i.e. in --nominal mode types are parsed as nominal but otherwise they
are parsed as equirecursive. Intentionally do not parse the new types
unconditionally as nominal for now to allow frontends to update their nominal
text format while continuing to use the workflow of running wasm-opt without
--nominal to lower nominal types to structural types.
|
| |
|
|
|
|
| |
Adds the part of the spec test suite that this passes (without table.set we
can't do it all).
|
|
|
|
|
|
|
|
|
| |
See #4149
This modifies the test added in #4163 which used static casts on
dynamically-created structs and arrays. That was technically not
valid (as we won't want users to "mix" the two forms). This makes that
test 100% static, which both fixes the test and gives test coverage
to the new instructions added here.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
These variants take a HeapType that is the type we intend to cast to,
and do not take an RTT.
These are intended to be more statically optimizable. For now though
this PR just implements the minimum to get them parsing and to get
through the optimizer without crashing.
Spec: https://docs.google.com/document/d/1afthjsL_B9UaMqCA5ekgVmOm75BVFu6duHNsN9-gnXw/edit#
See #4149
|
|
|
|
|
|
|
| |
array.init is like array.new_with_rtt except that it takes
as arguments the values to initialize the array with (as opposed to
a size and an optional initial value).
Spec: https://docs.google.com/document/d/1afthjsL_B9UaMqCA5ekgVmOm75BVFu6duHNsN9-gnXw/edit#
|