| Commit message | Author | Age | Files | Lines |
| |
ParseDefsCtx was the only client of the CRTP InstrParserCtx utility and the
separation between the two did not serve a real purpose. Simplify the code by
combining them.
| |
Add parsing functions for `memarg`s, the offset and align fields of load and
store instructions. These fields are interesting because they are lexically
reserved words that need to be further parsed to extract their actual values. On
top of that, add support for parsing all of the load and store instructions.
This required fixing a buffer overflow problem in the generated parser code and
adding more information to the signatures of the SIMD load and store
instructions. `SIMDLoadStoreLane` instructions are particularly interesting
because they may require backtracking to parse correctly.
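Here is a minimal sketch of the `memarg` parsing described above. It assumes the `offset=`/`align=` keywords have already been split into whole tokens. `parseU64`, `MemArg`, and `parseMemArg` are hypothetical names, not Binaryen's API, and the number parsing is simplified (no `_` digit separators).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string_view>
#include <vector>

// Parse an unsigned decimal or 0x-prefixed hexadecimal integer.
// Simplification (hypothetical helper): `_` digit separators are not handled.
std::optional<uint64_t> parseU64(std::string_view digits) {
  unsigned base = 10;
  if (digits.size() > 2 && digits[0] == '0' &&
      (digits[1] == 'x' || digits[1] == 'X')) {
    base = 16;
    digits.remove_prefix(2);
  }
  if (digits.empty()) {
    return std::nullopt;
  }
  uint64_t value = 0;
  for (char c : digits) {
    unsigned d;
    if (c >= '0' && c <= '9') {
      d = c - '0';
    } else if (base == 16 && c >= 'a' && c <= 'f') {
      d = c - 'a' + 10;
    } else if (base == 16 && c >= 'A' && c <= 'F') {
      d = c - 'A' + 10;
    } else {
      return std::nullopt;
    }
    if (value > (UINT64_MAX - d) / base) {
      return std::nullopt; // Overflow.
    }
    value = value * base + d;
  }
  return value;
}

struct MemArg {
  uint64_t offset = 0;
  uint64_t align = 0;
};

// Consume optional `offset=N` then `align=N` keyword tokens starting at `i`.
// Returns nullopt if a keyword is present but its value is malformed, or if
// the alignment is not a nonzero power of two.
std::optional<MemArg> parseMemArg(const std::vector<std::string_view>& toks,
                                  std::size_t& i, uint64_t naturalAlign) {
  MemArg arg{0, naturalAlign};
  auto keyword = [&](std::string_view prefix, uint64_t& out) {
    if (i < toks.size() && toks[i].substr(0, prefix.size()) == prefix) {
      auto value = parseU64(toks[i].substr(prefix.size()));
      if (!value) {
        return false;
      }
      out = *value;
      ++i;
    }
    return true;
  };
  if (!keyword("offset=", arg.offset) || !keyword("align=", arg.align)) {
    return std::nullopt;
  }
  if (arg.align == 0 || (arg.align & (arg.align - 1)) != 0) {
    return std::nullopt;
  }
  return arg;
}
```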
| |
Externalize and Internalize are encoded as RefAs operations, and we have
optimizations that assume RefAs operations trap on null, but Externalize and
Internalize do not. Skip them in those optimizations to avoid a later error
about an incorrect type.
| |
Since gen-s-parser.py is essentially a giant table mapping instruction names to
the information necessary to construct the corresponding IR nodes, there should
be no need to further parse instruction names after the code generated by
gen-s-parser.py runs. However, memory instruction parsing still parsed
instruction names to get information such as size and default alignment. The new
parser does not have the ability to parse that information out of instruction
names, so put it in the gen-s-parser.py table instead.
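To illustrate carrying the access size and default alignment in the table instead of re-parsing names, here is a hypothetical excerpt of such a table. `LoadInfo`, `kLoads`, and `lookupLoad` are illustrative names. The real generator emits its own name-matching code rather than a linear search, so treat this only as a sketch of the data involved.

```cpp
#include <cassert>
#include <optional>
#include <string_view>

// Hypothetical table entry: everything needed to build a load node, so
// nothing has to be re-derived from the instruction name afterwards.
struct LoadInfo {
  std::string_view name;
  unsigned bytes;        // Access size in bytes; also the default alignment.
  bool isSigned;         // Whether a narrow load sign-extends.
  std::string_view type; // Result value type.
};

// Illustrative excerpt, not the full list of load instructions.
constexpr LoadInfo kLoads[] = {
  {"i32.load", 4, false, "i32"},
  {"i32.load8_s", 1, true, "i32"},
  {"i32.load8_u", 1, false, "i32"},
  {"i32.load16_s", 2, true, "i32"},
  {"i32.load16_u", 2, false, "i32"},
  {"i64.load", 8, false, "i64"},
  {"i64.load32_u", 4, false, "i64"},
  {"f64.load", 8, false, "f64"},
};

std::optional<LoadInfo> lookupLoad(std::string_view name) {
  for (const auto& info : kLoads) {
    if (info.name == name) {
      return info;
    }
  }
  return std::nullopt;
}
```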
| |
This wasn't noticed because we apparently only use module code scanning to find
things like function references at the moment (which can't be in a data
segment). But newer passes will need to scan everything (#5163).
| |
Specifically, if a segment offset was a const, we checked that it made sense.
But the wasm spec doesn't do that, and it actually causes some issues (#5163).
In theory this extra validation might be useful (a compile-time error rather
than a runtime one), but if we want it, it should probably be optional, such as
an opt-in flag or a --lint pass.
| |
I believe all locations that create one already set it (or else we'd see
errors), but that is not easy to see when reading the code. Other similar
locations (like DataSegment) do initialize to null, so do the same here for
consistency.
| |
Also add the ability to parse memory indexes to correctly handle the
multi-memory versions of these instructions. Add and use a conversion from
`Result` to `MaybeResult` as well.
| |
Parse 32-bit and 64-bit memories, including their initial and max sizes. Shared
memories are left to a follow-up PR. The memory abbreviation that includes
inline data is parsed, but the associated data segment is not yet created. Also
do some minor simplifications in neighboring helper functions for other kinds of
module elements.
| |
We already provided a specialization of `std::hash` for arbitrary pairs, so add
one for `std::tuple` as well. Use the new specialization where we were
previously using nested pairs just to be able to use the pair specialization.
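This sketch hashes a `std::tuple` by combining the hashes of its elements. `hashCombine` and `TupleHash` are hypothetical. It uses a standalone hasher rather than a `std::hash` specialization to keep the example self-contained, since the standard only sanctions user specializations of `std::hash` that involve a program-defined type.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <tuple>
#include <type_traits>
#include <unordered_set>

// Boost-style mixing step (an assumption; not necessarily Binaryen's helper).
inline void hashCombine(std::size_t& seed, std::size_t value) {
  seed ^= value + 0x9e3779b9 + (seed << 6) + (seed >> 2);
}

// Hashes any tuple whose element types all have std::hash specializations.
struct TupleHash {
  template<typename... Ts>
  std::size_t operator()(const std::tuple<Ts...>& t) const {
    std::size_t seed = 0;
    std::apply(
      [&seed](const auto&... elems) {
        // Fold over the elements, mixing in each element's std::hash.
        (hashCombine(seed, std::hash<std::decay_t<decltype(elems)>>{}(elems)),
         ...);
      },
      t);
    return seed;
  }
};
```

Usage: `std::unordered_set<std::tuple<int, std::string>, TupleHash>` can replace a set keyed on nested pairs.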
| |
`Push` expressions were removed in #2867, so we no longer need to make them.
| |
When we read from a struct/array using a cone type, read from the types in the
cone and nothing else. Previously we used the declared type in the wasm, which
might be larger (both in the base type and the depth). Likewise for writes.
To do this, extend ConeReadLocation with a depth (previously the depth there
was assumed to be infinite; now it can be limited).
After this we are fully utilizing cone types in GUFA, as the test changes show
(or at least I can't think of any other uses of cones).
| |
The C API still returned non-nullable types for `dataref` (`ref data` instead of `ref null data`) and `i31ref` (`ref i31` instead of `ref null i31`). This PR aligns with the current state of the GC proposal, making them nullable when obtained via the C API.
| |
This includes all `SIMDExtract`, `SIMDReplace`, and `SIMDShuffle` expressions.
| |
Also add some missing error checking for the similar local instructions and make
some neighboring styling more consistent.
| |
modern nulls (#5154)
Modern nulls never compare equal unless they have the same type too.
| |
Now that we have a cone type, we are able to represent in PossibleContents the
natural content of a wasm location: a type or any of its subtypes. This allows us to
enforce the wasm typing rules, that is, to filter the data arriving at a location by the
wasm type of the location.
Technically this could be unnecessary if we had full implementations of flowFoo
and so forth, that is, tailored code for each wasm expression that makes sure we
only contain and flow content that fits in the wasm type. At the moment we don't
have that, and until the wasm spec stabilizes it's probably not worth the
effort. Instead, simply filter based on the type, which gives the same result
(though it does take a little more work; I measured it at 3% or so of runtime).
While doing so, normalize cones to their actual maximum depth, which simplifies
things and will help more later as well.
| |
Adds `BinaryenHeapTypeNone`, `BinaryenHeapTypeNoext` and `BinaryenHeapTypeNofunc` to obtain the bottom heap types. Also adds `BinaryenHeapTypeIsBottom` to test whether a given heap type is a bottom type, and `BinaryenHeapTypeGetBottom` to obtain the respective bottom type given a heap type.
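The following is a simplified model of the bottom-type queries these functions expose. It covers only basic heap types. `HeapType`, `isBottom`, and `getBottom` here are illustrative stand-ins, not the C API itself.

```cpp
#include <cassert>

// Simplified model of the basic heap types (an assumption: user-defined types
// are omitted; a signature type would map to NoFunc, struct/array types to None).
enum class HeapType { Func, Ext, Any, Eq, I31, Data, Array, String, None, NoFunc, NoExt };

bool isBottom(HeapType t) {
  return t == HeapType::None || t == HeapType::NoFunc || t == HeapType::NoExt;
}

HeapType getBottom(HeapType t) {
  switch (t) {
    case HeapType::Func:
    case HeapType::NoFunc:
      return HeapType::NoFunc;
    case HeapType::Ext:
    case HeapType::NoExt:
      return HeapType::NoExt;
    default:
      // Everything in the `any` hierarchy shares the bottom type `none`.
      return HeapType::None;
  }
}
```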
| |
Test that we can still parse the old annotated form as well.
| |
`array` is the supertype of all defined array types and for now is a subtype of
`data`. (Once `data` becomes `struct` this will no longer be true.) Update the
binary and text parsing of `array.len` to ignore the obsolete type annotation
and update the binary emitting to emit a zero in place of the old type
annotation and the text printing to print an arbitrary heap type for the
annotation. A follow-on PR will add support for the newer unannotated version of
`array.len`.
| |
As the number of basic heap types has grown, the complexity of the subtype and
LUB calculations has grown as well. To ensure that they are correct, test the
complete matrix of basic types and trivial user-defined types. Fix the subtype
calculation to make string types subtypes of `any` to make the test pass.
| |
If the only memories are imported, we don't need the section. We were already
doing that for tables, functions, etc.
| |
Instead of Many, use a proper cone type for the data, as appropriate.
| |
Since the type annotations are not stored explicitly in Binaryen IR, we have to
validate them in the parser. Implement this and fix a newly-caught incorrect
annotation in the tests.
| |
In the upstream spec, `data` has been replaced with a type called `struct`. To
allow for a graceful update in Binaryen, start by introducing "struct" as an
alias for "data". Once users have stopped emitting `data` directly, future PRs
will remove `data` and update the subtyping so that arrays are no longer
subtypes of `struct`.
| |
Since our usage of `WithPosition` depends on C++17 class template argument
deduction, it triggers a clang warning `-Wctad-maybe-unsupported`. Silence the
warning by providing an explicit deduction guide.
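This is a minimal sketch of the fix, assuming a `WithPosition`-style RAII guard parameterized on the context type. `Ctx` and the member names are hypothetical. The point is the explicit deduction guide on the last line.

```cpp
#include <cassert>
#include <cstddef>

// Minimal stand-in for a parse context (hypothetical, for illustration).
struct Ctx {
  std::size_t pos = 0;
};

// Hypothetical RAII helper: moves the context's position, restores it on exit.
template<typename C> struct WithPosition {
  C& ctx;
  std::size_t original;
  WithPosition(C& ctx, std::size_t pos) : ctx(ctx), original(ctx.pos) {
    ctx.pos = pos;
  }
  ~WithPosition() { ctx.pos = original; }
  WithPosition(const WithPosition&) = delete;
  WithPosition& operator=(const WithPosition&) = delete;
};

// Explicit deduction guide: `WithPosition guard(ctx, pos);` deduces
// WithPosition<Ctx>. Clang's -Wctad-maybe-unsupported fires when CTAD is used
// on a class template with no user-declared deduction guides; declaring one
// opts in and silences it.
template<typename C> WithPosition(C&, std::size_t) -> WithPosition<C>;
```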
| |
This computes how deep the children of a heap type are. This will be useful in
cone type optimizations, since we want to "normalize" cones: a cone of depth
infinity can just be a cone of the actual maximum depth of existing children, etc.,
and it's simpler to have a single canonical representation to avoid extra work.
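This sketch computes that depth, assuming subtyping forms a tree stored as a map from each type to its immediate subtypes. `maxDepth` and `normalizeConeDepth` are hypothetical helpers, not Binaryen functions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <vector>

// Types are integer ids; subtypes[t] lists t's immediate subtypes
// (a simplified, hypothetical representation).
using TypeId = int;
using SubtypeMap = std::unordered_map<TypeId, std::vector<TypeId>>;

// Length of the longest chain of subtypes below `t`: 0 if t has no subtypes,
// 1 if it only has children, and so on. Memoized so shared work is done once.
std::size_t maxDepth(TypeId t, const SubtypeMap& subtypes,
                     std::unordered_map<TypeId, std::size_t>& memo) {
  if (auto it = memo.find(t); it != memo.end()) {
    return it->second;
  }
  std::size_t depth = 0;
  if (auto it = subtypes.find(t); it != subtypes.end()) {
    for (TypeId child : it->second) {
      depth = std::max(depth, maxDepth(child, subtypes, memo) + 1);
    }
  }
  memo[t] = depth;
  return depth;
}

// Normalize a cone: a depth beyond the actual maximum (e.g. "infinity") is
// clamped, so each cone has a single canonical representation.
std::size_t normalizeConeDepth(std::size_t depth, std::size_t actualMax) {
  return std::min(depth, actualMax);
}
```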
| |
This requires parsing local indices and fixing a bug in `Function::setLocalName`
where it only set up the mapping from index to name and not the mapping from
name to index.
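This is a minimal sketch of keeping both directions of the mapping in sync. The `Function` struct is a simplified stand-in. Erasing a stale reverse entry when an index is renamed is an assumption about the desired behavior, not something stated above.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>

using Index = uint32_t;

// Simplified, hypothetical stand-in for a function's local-name bookkeeping.
struct Function {
  std::unordered_map<Index, std::string> localNames;   // index -> name
  std::unordered_map<std::string, Index> localIndices; // name -> index

  void setLocalName(Index index, const std::string& name) {
    // If this index already had a name, drop the stale reverse entry.
    if (auto it = localNames.find(index); it != localNames.end()) {
      localIndices.erase(it->second);
    }
    localNames[index] = name;
    // The fix described above: also record the name -> index mapping.
    localIndices[name] = index;
  }
};
```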
| |
Parse unary, binary, drop, and select instructions, properly fixing up stacky
code, unreachable code, and multivalue code so it can be represented in Binaryen
IR.
| |
This will be useful in further cone type optimizations.
| |
The `makeXXX` functions that are responsible for individual instructions will
generally need the locations of those instructions to emit useful errors. However,
since the instruction names are parsed before the `makeXXX` functions are
called, the functions have no good way of getting the location of the beginning
of the instruction. Fix this by explicitly passing them the location of the
beginning of the instruction.
| |
Avoid allocating there. This is faster, and it also ensures we never modify our
internal data structure after construction.
| |
Rather than passing both a `Ctx` and a `ParseInput` to every parsing function,
pass only a `Ctx` with a `ParseInput` inside of it. This significantly reduces
verbosity in the parser. To handle cases where parsing needs to happen at
specific locations, which used to be handled by constructing a new `ParseInput`
independent from the ctx, introduce a new RAII utility for temporarily changing
the location of the `ParseInput` inside a context.
Also add a utility for generating an error at a particular location to avoid
having to construct new `ParseInput` objects just for that purpose. This
resolves a few TODOs about correcting error locations, but since we don't test
those yet, I still consider this NFC.
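This sketch shows both utilities, assuming a context that owns a simple `ParseInput` with a byte cursor. `ParseInput`, `Err`, `Ctx::err`, and `WithPosition` are simplified, hypothetical versions. The `line:col: error: msg` format is also an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical simplified input: the text being parsed plus a cursor.
struct ParseInput {
  std::string_view text;
  std::size_t pos = 0;
};

struct Err {
  std::string msg;
};

// Hypothetical context that owns its ParseInput, so parsing functions only
// need to take a Ctx.
struct Ctx {
  ParseInput in;

  // Report an error at an arbitrary byte offset without a new ParseInput.
  Err err(std::size_t pos, std::string_view msg) const {
    std::size_t line = 1, col = 1;
    for (std::size_t i = 0; i < pos && i < in.text.size(); ++i) {
      if (in.text[i] == '\n') {
        ++line;
        col = 1;
      } else {
        ++col;
      }
    }
    return Err{std::to_string(line) + ":" + std::to_string(col) +
               ": error: " + std::string(msg)};
  }
  // Report an error at the current position.
  Err err(std::string_view msg) const { return err(in.pos, msg); }
};

// RAII helper: temporarily moves the context's cursor, restores it on exit.
struct WithPosition {
  Ctx& ctx;
  std::size_t original;
  WithPosition(Ctx& ctx, std::size_t pos) : ctx(ctx), original(ctx.in.pos) {
    ctx.in.pos = pos;
  }
  ~WithPosition() { ctx.in.pos = original; }
  WithPosition(const WithPosition&) = delete;
  WithPosition& operator=(const WithPosition&) = delete;
};
```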
| |
When the heap types are not subtypes of each other, but a null is possible, the
intersection exists and is a null. That null must be the shared bottom type.
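This sketch applies that rule to a tiny, assumed heap type hierarchy: `any > eq > data > array`, `eq > i31`, and `func`, with bottoms `none` and `nofunc`. The names `HT`, `Ref`, and `intersect` are hypothetical.

```cpp
#include <cassert>
#include <optional>

enum class HT { Any, Eq, Data, Array, I31, None, Func, NoFunc };

// Immediate supertype in the assumed hierarchy, if any.
std::optional<HT> parent(HT t) {
  switch (t) {
    case HT::Eq: return HT::Any;
    case HT::Data: return HT::Eq;
    case HT::Array: return HT::Data;
    case HT::I31: return HT::Eq;
    default: return std::nullopt; // Any, Func, and the bottom types.
  }
}

HT bottomOf(HT t) {
  return (t == HT::Func || t == HT::NoFunc) ? HT::NoFunc : HT::None;
}

bool isSubtype(HT a, HT b) {
  if (a == bottomOf(b)) {
    return true; // The bottom type is below everything in its hierarchy.
  }
  for (std::optional<HT> t = a; t; t = parent(*t)) {
    if (*t == b) {
      return true;
    }
  }
  return false;
}

struct Ref {
  HT heap;
  bool nullable;
};

// Intersection of two reference types; nullopt means nothing fits both.
std::optional<Ref> intersect(Ref a, Ref b) {
  bool nullable = a.nullable && b.nullable;
  if (isSubtype(a.heap, b.heap)) {
    return Ref{a.heap, nullable};
  }
  if (isSubtype(b.heap, a.heap)) {
    return Ref{b.heap, nullable};
  }
  // Unrelated heap types: only a null can be in both, and only if both are
  // nullable and share a hierarchy, in which case it is the shared bottom.
  if (nullable && bottomOf(a.heap) == bottomOf(b.heap)) {
    return Ref{bottomOf(a.heap), true};
  }
  return std::nullopt;
}
```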
| |
A cone type is a PossibleContents that has a base type and a depth, and it
contains all subtypes up to that depth. So depth 0 is an exact type from
before, etc.
This only adds cone type computations when combining types, that is, when we
combine two exact types we might get a cone, etc. This does not yet use the
cone info in all places (like struct gets and sets), and it does not yet define roots
of cone types, all of which is left for later. In other words, this is the MVP of
cone types: just enough to add them, pass tests, and test the new functionality.
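This sketch shows cone membership and combination, assuming a single-supertype hierarchy given as a parent array. An exact type is a cone of depth 0. Combining two cones roots the result at their least common supertype and makes it deep enough to cover both inputs. `Cone`, `combine`, and `contains` are illustrative names, not the actual PossibleContents interface.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Simplified model (assumption): types are ids 0..n-1 and parent[t] is t's
// declared supertype, or -1 for a root.
using TypeId = int;

struct Cone {
  TypeId base;
  std::size_t depth; // 0 = exactly `base`; 1 = base and direct subtypes; ...
};

// Number of subtyping steps from `t` up to `ancestor`, if it is above t.
std::optional<std::size_t> distanceUp(const std::vector<TypeId>& parent,
                                      TypeId t, TypeId ancestor) {
  std::size_t steps = 0;
  for (TypeId cur = t; cur != -1; cur = parent[cur], ++steps) {
    if (cur == ancestor) {
      return steps;
    }
  }
  return std::nullopt;
}

bool contains(const std::vector<TypeId>& parent, const Cone& cone, TypeId t) {
  auto d = distanceUp(parent, t, cone.base);
  return d && *d <= cone.depth;
}

// Root the result at the least common supertype, deep enough to cover both.
// Returns nullopt if the types share no common supertype in this model.
std::optional<Cone> combine(const std::vector<TypeId>& parent, Cone a, Cone b) {
  for (TypeId lub = a.base; lub != -1; lub = parent[lub]) {
    if (auto db = distanceUp(parent, b.base, lub)) {
      std::size_t da = *distanceUp(parent, a.base, lub);
      return Cone{lub, std::max(da + a.depth, *db + b.depth)};
    }
  }
  return std::nullopt;
}
```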
| |
Remove an obsolete error about null characters and test both binary and text
round tripping of a string constant containing an escaped zero byte.
| |
With the goal of supporting null characters (i.e. zero bytes) in strings.
Rewrite the underlying interned `IString` to store a `std::string_view` rather
than a `const char*`, reduce the number of map lookups necessary to intern a
string, and present a more immutable interface.
Most importantly, replace the `c_str()` method that returned a `const char*`
with a `toString()` method that returns a `std::string`. This new method can
correctly handle strings containing null characters. A `const char*` can still
be had by calling `data()` on the `std::string_view`, although this usage should
be discouraged.
This change is NFC in spirit, although not in practice. It does not intend to
support any particular new functionality, but it is probably now possible to use
strings containing null characters in at least some cases. At least one parser
bug is also incidentally fixed. Follow-on PRs will explicitly support and test
strings containing nulls for particular use cases.
The C API still uses `const char*` to represent strings. As strings containing
nulls become better supported by the rest of Binaryen, this will no longer be
sufficient. Updating the C and JS APIs to use pointer, length pairs is left as
future work.
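This small illustration shows why a `const char*` interface cannot represent these strings. `strlen` stops at the first zero byte, while a pointer-and-length view keeps every byte. This is generic C++, not Binaryen's `IString`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <string_view>

// A string with an embedded null character (zero byte). Constructing the view
// with an explicit length keeps the bytes after the null.
const std::string_view kWithNull("ab\0cd", 5);

// Length as seen through a `const char*` interface: strlen stops at the first
// zero byte, silently truncating the string.
std::size_t lengthViaCStr(std::string_view s) {
  std::string copy(s);
  return std::strlen(copy.c_str());
}

// Length as seen through a (pointer, length) interface: all bytes survive.
std::size_t lengthViaView(std::string_view s) {
  return std::string(s).size();
}
```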
| |
Parse folded expressions as described in the spec:
https://webassembly.github.io/spec/core/text/instructions.html#folded-instructions.
The old binaryen parser _only_ parses folded expressions, and furthermore
requires them to be folded such that a parent instruction consumes the values
produced by its children and only those values. The standard format is much more
general and allows folded instructions to have an arbitrary number of children
independent of dataflow.
To prevent the rest of the parser from having to know or care about the
difference between folded and unfolded instructions, parse folded instructions
after their children have been parsed. This means that a sequence of
instructions is always parsed in the order they would appear in a binary no
matter how they are folded (or not folded).
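This sketch illustrates the parsing order described above under several simplifying assumptions. It handles only plain (non-block) instructions. A token that starts with a lowercase letter begins an instruction, and any other token is an immediate. It also accepts unfolded instructions inside a fold, which is more permissive than the spec grammar. `tokenize`, `parseInstrs`, and `linearize` are hypothetical helpers.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Split source text into "(", ")", and whitespace-separated atoms.
std::vector<std::string> tokenize(const std::string& src) {
  std::vector<std::string> toks;
  std::string cur;
  auto flush = [&] {
    if (!cur.empty()) {
      toks.push_back(cur);
      cur.clear();
    }
  };
  for (char c : src) {
    if (c == '(' || c == ')') {
      flush();
      toks.push_back(std::string(1, c));
    } else if (std::isspace(static_cast<unsigned char>(c))) {
      flush();
    } else {
      cur += c;
    }
  }
  flush();
  return toks;
}

bool isImmediate(const std::string& tok) {
  return tok != "(" && tok != ")" &&
         !std::islower(static_cast<unsigned char>(tok[0]));
}

// Read an instruction name plus any immediates that follow it.
std::string readInstr(const std::vector<std::string>& toks, std::size_t& i) {
  std::string instr = toks.at(i++);
  while (i < toks.size() && isImmediate(toks[i])) {
    instr += " " + toks[i++];
  }
  return instr;
}

// Emit instructions in binary order: a folded instruction comes after all of
// its children, whatever their number.
void parseInstrs(const std::vector<std::string>& toks, std::size_t& i,
                 std::vector<std::string>& out) {
  while (i < toks.size() && toks[i] != ")") {
    if (toks[i] == "(") {
      ++i; // Consume "(".
      std::string instr = readInstr(toks, i);
      parseInstrs(toks, i, out); // Children first...
      out.push_back(instr);      // ...then the folded instruction itself.
      ++i;                       // Consume ")".
    } else {
      out.push_back(readInstr(toks, i)); // Unfolded: already in order.
    }
  }
}

std::vector<std::string> linearize(const std::string& src) {
  auto toks = tokenize(src);
  std::size_t i = 0;
  std::vector<std::string> out;
  parseInstrs(toks, i, out);
  return out;
}
```

For example, `(drop (i32.const 1) (i32.const 2))` linearizes to `i32.const 1`, `i32.const 2`, `drop`, even though `drop` consumes only one of those values.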
| |
Finishes work missed in #5126.
| |
As an NFC preliminary change that will minimize the diff in #5122, which moves
IString to the wasm namespace.
| |
Change wasm-validator so that Memory::kUnlimitedSize is properly treated as the unlimited case. The check that memory.initial does not exceed memory.max now happens only if memory.hasMax(), that is, only if memory.max is not set to kUnlimitedSize.
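This sketch shows the guarded check, using a simplified `Memory` with a sentinel for "no maximum". The struct, the sentinel value, and the error text are assumptions for illustration. Per the wasm spec, limits are valid when the initial size does not exceed the maximum.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <string>
#include <vector>

// Simplified, hypothetical stand-in for a memory's limits, with a sentinel
// for "no maximum" in the spirit of Memory::kUnlimitedSize.
struct Memory {
  static constexpr uint64_t kUnlimitedSize = std::numeric_limits<uint64_t>::max();
  uint64_t initial = 0;
  uint64_t max = kUnlimitedSize;
  bool hasMax() const { return max != kUnlimitedSize; }
};

// Collect validation errors; the initial/max comparison only runs when a
// maximum is actually present.
std::vector<std::string> validateLimits(const Memory& memory) {
  std::vector<std::string> errors;
  if (memory.hasMax() && memory.initial > memory.max) {
    errors.push_back("memory initial size exceeds its maximum");
  }
  return errors;
}
```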
| |
Previously we treated each local index as a location, and every local.set to
that index could be read by every local.get. With this we connect only
relevant sets to gets.
Practically speaking, this removes LocalLocation which is what was just
described, and instead there is ParamLocation for incoming parameter
values. And local.get/set use normal ExpressionLocations to connect a
set to a get.
I was worried this would be slow, since computing LocalGraph takes time,
but it actually more than pays for itself on J2Wasm: we are faster, I guess
because we do less updating after local.sets.
This makes a noticeable change on the J2Wasm binary, and perhaps will
help with benchmarks.
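This sketch covers straight-line code only. Each `local.get` reads the most recent `local.set` to the same index, or the incoming value if there is none. Code with branches and loops can have several reaching sets per get, which is what an analysis like LocalGraph computes. `Inst` and `connectSetsToGets` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <vector>

enum class Op { Get, Set };

struct Inst {
  Op op;
  unsigned local;
};

// Straight-line code only (assumption: no branches or loops). Maps each get's
// position to the position of the set it reads, or -1 if it reads the value
// the local had on function entry (e.g. an incoming parameter).
std::unordered_map<std::size_t, long> connectSetsToGets(const std::vector<Inst>& code) {
  std::unordered_map<unsigned, long> lastSet; // local index -> latest set position
  std::unordered_map<std::size_t, long> result;
  for (std::size_t i = 0; i < code.size(); ++i) {
    if (code[i].op == Op::Set) {
      lastSet[code[i].local] = static_cast<long>(i);
    } else {
      auto it = lastSet.find(code[i].local);
      result[i] = it == lastSet.end() ? -1 : it->second;
    }
  }
  return result;
}
```

For example, in `get 0; set 0; get 0; set 1; set 0; get 0`, the second get reads only the first set. Treating local index 0 as a single location would also have let it see the later set.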
| |
Unfortunately there isn't a single place where an error may occur. I tested on
several files with different flags and added sufficient warnings so that we warn
on them all.
| |
These types, `none`, `nofunc`, and `noextern` are uninhabited, so references to
them can only possibly be null. To simplify the IR and increase type precision,
introduce new invariants that all `ref.null` instructions must be typed with one
of these new bottom types and that `Literals` have a bottom type iff they
represent null values. These new invariants require several additional changes.
First, it is now possible that the `ref` or `target` child of a `StructGet`,
`StructSet`, `ArrayGet`, `ArraySet`, or `CallRef` instruction has a bottom
reference type, so it is not possible to determine what heap type annotation to
emit in the binary or text formats. (The bottom types are not valid type
annotations since they do not have indices in the type section.)
To fix that problem, update the printer and binary emitter to emit unreachables
instead of the instruction with undetermined type annotation. This is a valid
transformation because the only possible value that could flow into those
instructions in that case is null, and all of those instructions trap on nulls.
That fix uncovered a latent bug in the binary parser in which new unreachables
within unreachable code were handled incorrectly. This bug was not previously
found by the fuzzer because we generally stop emitting code once we encounter an
instruction with type `unreachable`. Now, however, it is possible to emit an
`unreachable` for instructions that do not have type `unreachable` (but are
known to trap at runtime), so we will continue emitting code. See the new
test/lit/parse-double-unreachable.wast for details.
Update other miscellaneous code that creates `RefNull` expressions and null
`Literals` to maintain the new invariants as well.