| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
| |
Add methods to `Token` for determining whether the token can be interpreted as a
particular token type, returning the interpreted value as appropriate. These
methods perform additional bounds checks for integers and NaN payloads that
could not be done during the initial lexing because the lexer did not know what
the intended token type was. The float methods also reinterpret integer tokens
as floating point tokens since the float grammar is a superset of the integer
grammar and inject the NaN payloads into parsed NaN values.
Move all bounds checking to these new classifier functions to have it in one
place.
|
|
|
|
| |
This moves it out of the validator so it can be used elsewhere. It will be
used in #4685
|
|
|
|
|
|
| |
The first way to should detect this is if the main function actually
doesn't take any params. They we fallback to looking deeper.
In preparation for https://reviews.llvm.org/D75277
|
|
|
|
|
|
|
|
|
| |
This part to finalize is currently not used and was added in preparation
for https://reviews.llvm.org/D75277.
However, the better solution to dealing with this alternative name for
main is on the emscripten side. The main reason for this is that
doing the rename here in binaryen would require finalize to always
re-write the binary, which is expensive.
|
|
|
|
|
|
|
|
| |
Previously we were tracking whether integer tokens were signed but we did not
differentiate between positive and negative signs. Unfortunately, without
differentiating them, there's no way to tell the difference between an in-bounds
negative integer and a wildly out-of-bounds positive integer when trying to
perform bounds checks for s32 tokens. Fix the problem by tracking not only
whether there is a sign on an integer token, but also what the sign is.
|
| |
|
| |
|
|
|
|
|
|
| |
wat-parser-internal.h was already quite large after implementing just the lexer,
so it made sense to rename it to be lexer-specific and start a new file for the
higher-level parser. Also make it a proper .cpp file and split the testable
interface out into wat-lexer.h.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This adds exported tags to `exports` section in wasm-emscripten-finalize
metadata so Emscripten can use it.
Also fixes a bug in the parser. We have only recognized the export
format of
```wasm
(tag $e2 (param f32))
(export "e2" (tag $e2))
```
and ignored this format:
```wasm
(tag $e1 (export "e1") (param i32))
```
Companion patch: https://github.com/emscripten-core/emscripten/pull/17064
|
|
|
|
| |
Improve comments and variable names to make it clear that we allocate and build
a separate string only when necessary to handle escape sequences.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Rather than trying to actually implement the parsing of float values, which
cannot be done naively due to precision concerns, just parse the float grammar
then postprocess the parsed text into a form we can pass to `strtod` to do the
actual parsing of the value.
Since the float grammar reuses `num` and `hexnum` from the integer grammar but
does not care about overflow, add a mode to `LexIntCtx`, `num`, and `hexnum` to
allow parsing overflowing numbers.
For NaNs, store the payload as a separate value rather than as part of the
parsed double. The payload will be injected into the NaN at a higher level of
the parser once we know whether we are parsing an f64 or an f32 and therefore
know what the allowable payload values are.
|
|
|
|
|
| |
Also include reserved words that look like keywords to avoid having to find and
enumerate all the valid keywords. Invalid keywords will be rejected at a higher
level in the parser instead.
|
| |
|
| |
|
|
|
|
|
|
|
|
| |
We were missing CallRef in the CFG traversal code in a place where we
note possible exceptions. As a result we thought CallRef cannot throw, and
were missing some control flow edges.
To actually detect the problem, we need to validate non-nullable locals
properly, which we were not doing. This adds that as well.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Begin implementing a new text format parser that will accept the
standard text format. Start with a lexer that can iterate over
tokens in an underlying text buffer. The initial supported tokens
are integers, parentheses, and whitespace including comments.
The implementation is in a new private internal header so it can
be included into a gtest source file even though it is not meant
to be a public API. Once the parser is more complete, there will
be an additional public header exposing a more concise public API
and the private header will be included into a source file that
implements that public API.
The new parser will improve on the existing text format parser
not only because it will accept the full standard text format,
but also because its code will be simpler and easier to maintain
and because it will hopefully be faster as well. The new parser
will be built out of small functions that closely mirror the
grammar productions given in the spec and will heavily use C++17
features like string_view, optional, and variant to provide more
self-documenting and efficient code.
Future PRs will add support for lexing other kinds of tokens
followed by support for parsing more complex constructs.
|
|
|
|
|
|
| |
We were checking that nominal modules only had a single element in their type
sections, but that's not correct for the prototype nominal binary format we
still want to support. The test for this missed catching the bug because it
wasn't actually parsing in nominal mode.
|
|
|
|
|
|
|
|
|
|
| |
Share the logic for parsing imported and non-imported globals of the formats:
(import "module" "base" (global $name? type))
(global $name? type init)
This fixes #4676, since the deleted logic for parsing imported globals did not
handle parsing GC types correctly.
|
| |
|
|
|
|
|
|
|
| |
There's no reason not to allow growing by zero slots, but previously doing so
would trigger an assertion. This caused a crash when roundtripping a trivial
module.
Fixes #4667.
|
|
|
|
|
|
|
|
| |
Taking a Type is redundant as we only care about the heap type -
the nullability must be Nullable.
This avoids needing an assertion in the function, that is, it makes
the API more type-safe.
|
|
|
|
|
|
| |
This unsafe experimental instruction is semantically equivalent to
ref.cast_static, but V8 will unsafely turn it into a nop. This is meant to help
us measure cast overhead more precisely than we can by globally turning all
casts into nops.
|
|
|
|
|
|
| |
In f124a11ca3 we removed support for the prototype nominal binary format
entirely, but that means that we can no longer parse older binary modules that
used that format. Fix this regression by restoring the ability to parse the
prototype binary format.
|
|
|
|
|
|
| |
Remove `Type::externref` and `HeapType::ext` and replace them with uses of
anyref and any, respectively, now that we have unified these types in the GC
proposal. For backwards compatibility, continue to parse `extern` and
`externref` and maintain their relevant C API functions.
|
|
|
|
|
|
|
|
|
|
| |
Print subtype declarations using the standards-track format with a vector of
supertypes followed by a normal type declaration rather than our interim nominal
format that used alternative versions of the func, struct, and array forms.
Desugar the nominal format to additionally emit all the types into a single
large recursion group. Currently V8 is performing this desugaring, but after
this change and a future change that fixes the order of nominal types to ensure
supertypes precede subtypes, it will no longer need to.
|
| |
|
|
|
| |
As proposed in https://github.com/WebAssembly/relaxed-simd/issues/52.
|
|
|
|
| |
Other opcode ends with `Inxm` or `Fnxm` (where n and m are integers),
while `i8x16.swizzle`'s opcode name doesn't have an `I` in there.
|
|
|
| |
As proposed in https://github.com/WebAssembly/relaxed-simd/issues/40.
|
|
|
|
|
| |
#4555 fixed validation for such tuples, but we also did not handle
them in "stacky" code using pops etc., due to a logic bug in the
binary reading code.
|
|
|
|
|
|
| |
Apply the same logic to tuple fields as we do for all other fields,
when checking whether a non-nullable value is valid.
Fixes #4554
|
| |
|
|
|
| |
See https://github.com/WebAssembly/extended-const
|
|
|
|
| |
Instead of just reporting the type index that causes an error when building
types, report the name of the responsible type when parsing the text format.
|
|
|
|
|
|
|
|
|
| |
We were already eagerly canonicalizing basic HeapTypes when building types so
the more complicated canonicalization algorithms would not have to handle
noncanonical heap types, but we were not doing the same for Types. Equirecursive
canonicalization was properly handling noncanonical Types everywhere, but
isorecursive canonicalization was not. Rather than update isorecursive
canonicalization in multiple places, remove the special handling from
equirecursive canonicalization and canonicalize types in a single location.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The previous printing system in the Types API would print the full recursive
structure of a Type or HeapType with special markers using de Bruijn indices to
avoid infinite recursion and a separate special marker for when the size
exceeded an arbitrary upper limit. In practice, the types printed by that system
were not human readable, so all that complexity was not useful.
Replace that system with a new system that always emits a HeapType name rather
than recursing into the structure of inner HeapTypes. Add methods for printing
Types and HeapTypes with custom HeapType name generators. Also add a new
wasm-type-printing.h header with off-the-shelf type name generators that
implement simple naming schemes sufficient for tests and the type fuzzer.
Note that these new printing methods and the old printing methods they augment
are not used for emitting text modules. Printing types as part of expressions
and modules is handled by separate code in Print.cpp and the printing API
modified in this PR is mostly used for debugging. However, the new printing
methods are general enough that Print.cpp should be able to use them as well, so
update the format used to print types in the modified printing system to match
the text format in anticipation of making that change in a follow-up PR.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Validating nominally declared supertypes depends on being able to test type
equality, which in turn depends on the compared types having already been
canonicalized. Previously we validated supertypes before canonicalization, so
validation would fail in cases where it should have succeeded. Fix the bug by
canonicalizing first. This means that the global type store can now end up
holding invalid types, but those types will never be exposed to the user, so
that's not a huge problem.
Also fix an unrelated bug that was preventing the test from passing, which is
that supertypes were being hashed and compared without the special logic for
detecting self-references. This bug preventing the equivalent recursion groups
in the test from being deduplicated.
|
|
|
|
|
|
|
|
|
|
| |
Add support for isorecursive types to wasm-fuzz-types by generating recursion
groups and ensuring that children types are only selected from candidates
through the end of the current group. For non-isorecursive systems, treat all
the types as belonging to a single group so that their behavior is unchanged.
Also fix two small bugs found by the fuzzer: LUB calculation was taking the
wrong path for isorecursive types and isorecursive validation was not handling
basic heap types properly.
|
|
|
|
|
|
|
|
|
|
|
| |
Write and parse recursion groups in binary type sections. Unlike in the text
format, where we ignore recursion groups when not using isorecursive types, do
not allow parsing binary recursion group when using other type systems. Doing so
would produce incorrect results because recursions groups only count as single
entries in the type system vector so we dynamically grow the TypeBuilder when we
encounter them. That would change the mapping of later indices to types, and
would change the meaning of previous type definitions that use those later
indices. This is not a problem in the isorecursive system because in that system
type definitions are not allowed to use later indices.
|
|
|
|
| |
We can only call getHeapType if it is indeed a function type. Otherwise we should
show the error and move on.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Generally we try to order types by decreasing use count so that frequently used
types get smaller indices. For the equirecursive and nominal systems, there are
no contraints on the ordering of types, so we just have to sort them according
to their use counts. For the isorecursive type system, however, there are a
number of ordering constraints that have to be met for the type section to be
valid. First, types in the same recursion group must be adjacent so they can be
grouped together. Second, groups must be ordered topologically so that they only
refer to types in themselves or prior groups.
Update type ordering to produce a valid isorecursive output by performing a
topological sort on the recursion groups. While performing the sort, prefer to
visit and finish processing the most used groups first as a heuristic to improve
the final ordering.
Do not reorder types within groups, since doing so would change type identity
and could affect the external interface of the module. Leave that reordering to
an optimization pass (not yet implemented) that users can explicitly opt in to.
|
|
|
|
|
| |
Update the HeapType constructors that take Signature, Structs, and Arrays to
work properly with isorecursive typing. This is particularly important for the
Signature constructor, which is used frequently throughout the code base.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Isorecursive canonicalization is similar to equirecursive canonicalization in
that it deduplicates types based on their structure, but unlike equirecursive
canonicalization, isorecursive canonicalization does not need to minimize the
type definition first. Another difference is that structures are deduplicated at
the level of recursion groups rather than individual types, so we cannot reuse
the existing `FiniteShapeHasher` and `FiniteShapeEquator` utilities. Instead,
introduce a new `RecGroupStructure` wrapper that provides equality comparison
and hashing very similar to the former utilities.
Another feature of the isorecursive type system is that its constraints on the
order of recursion groups means that they are already topologically sorted and
can be incrementally canonicalized in a bottom-up manner. This incremental
canonicalization means that the `RecGroupStructure` utility can assume that all
child `HeapTypes` have already been canonicalized and can avoid traversing
anything but the top-level type definitions in the wrapped recursion group. The
only exception is self-references into the wrapped recursion group itself, which
may not be canonical. That special case is detected and handled without any
nontrivial extra work.
Overall, the canonicalization algorithm traverses each type definition once to
canonicalize its `HeapType` use sites, then once to hash its recursion group
structure, then finally once to canonicalize its `Type` use sites in
`globallyCanonicalize`, giving only linear time complexity.
|
|
|
|
|
| |
Storing the rec group index on the HeapTypeInfo avoids having to do a linear
scan through the rec group to find the index for a particular type. This will
be important for isorecursive canonicalization, which uses rec group indices.
|
|
|
|
|
|
|
| |
Since isorecursive canonicalization will happen incrementally in a bottom-up
manner, it will be more efficient if can more precisely control which types get
updated at each step. Refactor the `CanonicalizationState` interface to
separately expose shallow updating, which updates only the top-level state, and
deep updating of the use sites within HeapTypeInfos.
|
|
|
|
|
|
|
| |
When building isorecursive types, validate their relationships according to the
rules described in https://github.com/WebAssembly/gc/pull/243. Specifically,
supertypes must be declared before their subtypes to statically prevent cycles
and child types must be declared either before or in the same recursion group as
their parents.
|
|
|
|
|
|
|
|
|
|
|
| |
It is possible for type building to fail, for example if the declared nominal
supertypes form a cycle or are structurally invalid. Previously we would report
a fatal error and kill the program from inside `TypeBuilder::build()` in these
situations, but this handles errors at the wrong layer of the code base and is
inconvenient for testing the error cases.
In preparation for testing the new error cases introduced by isorecursive
typing, make type building fallible and add new tests for existing error cases.
Also fix supertype cycle detection, which it turns out did not work correctly.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In `--hybrid` isorecursive mode, associate each defined type with a recursion
group, represented as a `(rec ...)` wrapping the type definitions in the text
format. Parse that text format, create the rec groups using a new TypeBuilder
method, and print the rec groups in the printer.
The only semantic difference rec groups currently make is that if one type in a
rec group will be included in the output, all the types in that rec group will
be included. This is because changing a rec group in any way (for example by
removing a type) changes the identity of the types in that group in the
isorecursive type system. Notably, rec groups do not yet participate in
validation, so `--hybrid` is largely equivalent to `--nominal` for now.
|
|
|
|
|
|
|
| |
Add a utility class for defining all the common operations like pre- and post-
increment and decrement, addition and subtraction, and assigning addition and
subtraction for iterators that are comprised of a parent object and an index
into that parent object. Use the new utility to reduce the boilerplate in
wasm-type.h. Add a new test of the iterator behavior.
|
|
|
|
|
| |
Add a `destroyAllTypes` function to clear the global state of the type system
and use it in a custom gtest test fixture to ensure that each test starts and
ends with a fresh state.
|