summaryrefslogtreecommitdiff
path: root/scripts/gen-s-parser.py
Commit message (Collapse)AuthorAgeFilesLines
* Typed continuations: suspend instructions (#6393)Frank Emrich2024-03-191-0/+1
| | | | | | | | | | | | | | | | | | | | | This PR is part of a series that adds basic support for the [typed continuations/wasmfx proposal](https://github.com/wasmfx/specfx). This particular PR adds support for the `suspend` instruction for suspending with a given tag, documented [here](https://github.com/wasmfx/specfx/blob/main/proposals/continuations/Overview.md#instructions). These instructions are of the form `(suspend $tag)`. Assuming that `$tag` is defined with _n_ `param` types `t_1` to `t_n`, the instruction consumes _n_ arguments of types `t_1` to `t_n`. Its result type is the same as the `result` type of the tag. Thus, the folded textual representation looks like `(suspend $tag arg1 ... argn)`. Support for the instruction is implemented in both the old and the new wat parser. Note that this PR does not implement validation of the new instruction. This PR also fixes finalization of `cont.new`, `cont.bind` and `resume` nodes in those cases where any of their children are unreachable.
* Typed continuations: cont.bind instructions (#6365)Frank Emrich2024-03-041-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | This PR is part of a series that adds basic support for the [typed continuations/wasmfx proposal](https://github.com/wasmfx/specfx). This particular PR adds support for the `cont.bind` instruction for partially applying continuations, documented [here](https://github.com/wasmfx/specfx/blob/main/proposals/continuations/Overview.md#instructions). In short, these instructions are of the form `(cont.bind $ct_before $ct_after)` where `$ct_before` and `$ct_after` are related continuation types. They must only differ in the number of arguments, where `$ct_before` has _n_ additional parameters as compared to `$ct_after`, for some _n_ ≥ 0. The idea is that `(cont.bind $ct_before $ct_after)` then takes a reference to a continuation of type `$ct_before` as well as _n_ operands and returns a (reference to a) continuation of type `$ct_after`. Thus, the folded textual representation looks like `(cont.bind $ct_before $ct_after arg1 ... argn c)`. Support for the instruction is implemented in both the old and the new wat parser. Note that this PR does not implement validation of the new instruction.
* [Parser] Parse annotations, including source map comments (#6345)Thomas Lively2024-02-261-2/+2
| | | | | | | | | | Parse annotations using the standards-track `(@annotation ...)` format as well as the `;;@ source-map:0:1` format. Have the lexer implicitly collect annotations while it skips whitespace and add lexer APIs to access the annotations since the last token was parsed. Collect annotations before parsing each instruction and pass the annotations explicitly to the parser and parser context functions for instructions. Add an API to `IRBuilder` to set a debug location to be attached to the next visited or created instruction and use it from the parser.
* Typed continuations: cont.new instructions (#6308)Frank Emrich2024-02-221-0/+1
| | | | | | | | | | | | | | | | | This PR is part of a series that adds basic support for the [typed continuations/wasmfx proposal](https://github.com/wasmfx/specfx). This particular PR adds support for the `cont.new` instruction for creating continuations, documented [here(https://github.com/wasmfx/specfx/blob/main/proposals/continuations/Overview.md#instructions). In short, these instructions are of the form `(cont.new $ct)` where `$ct` must be a continuation type. The instruction takes a single (nullable) function reference as its argument, which means that the folded representation of the instruction is of the form `(cont.new $ct (foo ...))`. Support for the instruction is implemented in both the old and the new wat parser. Note that this PR does not implement validation of the new instruction.
* Typed continuations: resume instructions (#6083)Frank Emrich2024-01-111-0/+2
| | | | | This PR is part of a series that adds basic support for the [typed continuations proposal](https://github.com/wasmfx/specfx). This particular PR adds support for the `resume` instruction. The most notable missing feature is validation, which is not implemented, yet.
* [Parser] Parse br_if correctly (#6202)Thomas Lively2024-01-041-2/+2
| | | | The new text parser and IRBuilder were previously not differentiating between `br` and `br_if`. Handle `br_if` correctly by popping and assigning a condition.
* [EH] Add instructions for new proposal (#6181)Heejin Ahn2023-12-191-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | This adds basic support for the new instructions in the new EH proposal passed at the Oct CG hybrid CG meeting: https://github.com/WebAssembly/meetings/blob/main/main/2023/CG-10.md https://github.com/WebAssembly/exception-handling/blob/main/proposals/exception-handling/Exceptions.md This mainly adds two instructions: `try_table` and `throw_ref`. This is the bare minimum required to read and write text and binary format, and does not include analyses or optimizations. (It includes some analysis required for validation of existing instructions.) Validation for the new instructions is not yet included. `try_table` faces the same problem with the `resume` instruction in #6083 that without the module-level tag info, we are unable to know the 'sent types' of `try_table`. This solves it with a similar approach taken in #6083: this adds `Module*` parameter to `finalize` methods, which defaults to `nullptr` when not given. The `Module*` parameter is given when called from the binary and text parser, and we cache those tag types in `sentTypes` array within `TryTable` class. In later optimization passes, as long as they don't touch tags, it is fine to call `finalize` without the `Module*`. Refer to https://github.com/WebAssembly/binaryen/pull/6083#issuecomment-1854634679 and #6096 for related discussions when `resume` was added.
* Add a `tuple.drop` text pseudoinstruction (#6170)Thomas Lively2023-12-121-0/+1
| | | | | | | | | | | | | | | | | We previously overloaded `drop` to mean both normal drops of single values and also drops of tuple values. That works fine in the legacy text parser since it can infer parent-child relationships directly from the s-expression structure of the input, so it knows that a drop should drop an entire tuple if the tuple-producing instruction is a child of the drop. The new text parser, however, is much more like the binary parser in that it uses instruction types to create parent-child instructions. The new parser always assumes that `drop` is meant to drop just a single value because that's what it does in WebAssembly. Since we want to continue to let `Drop` IR expressions consume tuples, and since we will need a way to write tests for that IR pattern that work with the new parser, introduce a new pseudoinstruction, `tuple.drop`, to represent drops of tuples. This pseudoinstruction only exists in the text format and it parses to normal `Drop` expressions. `tuple.drop` takes the arity of its operand as an immediate, which will let the new parser parse it correctly in the future.
* [Parser] Parse try/catch/catch_all/delegate (#6128)Thomas Lively2023-11-291-6/+4
| | | | | | | | | | | | | | Parse the legacy v3 syntax for try/catch/catch_all/delegate in both its folded and unfolded forms. The first sources of significant complexity is the optional IDs after `catch` and `catch_all` in the unfolded form, which can be confused for tag indices and require backtracking to parse correctly. The second source of complexity is the handling of delegate labels, which are relative to the try's parent scope despite being parsed after the try's scope has already started. Handling this correctly requires punching a whole big enough to drive a truck through through both the parser and IRBuilder abstractions.
* [Parser][NFC] Filter out unused instructions in gen-s-parser.py (#6095)Thomas Lively2023-11-091-0/+5
| | | | | | The new wat parser parses block, if, loop, then, and else keywords directly rather than depending on code generated from gen-s-parser.py. Filter these keywords out in gen-s-parser.py when generating the new wat parser and delete the stub functions that the removed generated code used to depend on.
* Implement table.copy (#6078)Alon Zakai2023-11-061-2/+1
| | | Helps #5951
* [NFC][Parser] Simplify instruction handling (#5964)Thomas Lively2023-09-211-3/+2
| | | | | | | | | | | The new wat parser previously returned InstrT types when parsing individual instructions and collected InstrsT types when parsing sequences of instructions. However, instructions were always actually tracked in the internal state of the parsing context, so these types never held any interesting or necessary data. Simplify the parser by removing these types and leaning into the pattern that the parser context will keep track of parsed instructions. This allows for a much cleaner separation between the `instrs` and `foldedinstrs` parser functions.
* [Parser] Parse if-else in the new wat parser and IRBuilder (#5963)Thomas Lively2023-09-211-0/+3
| | | | | | Parse both the straight-line and folded versions of if, including the abbreviations that allow omitting the else clause. In the IRBuilder, generalize the scope stack to be able to track scopes other than blocks and add methods for visiting the beginnings of ifs and elses.
* Implement table.fill (#5949)Thomas Lively2023-09-181-0/+1
| | | | | | | | This instruction was standardized as part of the bulk memory proposal, but we never implemented it until now. Leave similar instructions like table.copy as future work. Fixes #5939.
* Replace i31.new with ref.i31 everywhere (#5931)Thomas Lively2023-09-131-1/+2
| | | | | Replace i31.new with ref.i31 in the printer, tests, and source code. Continue parsing i31.new for the time being to allow a graceful transition. Also update the JS API to reflect the new instruction name.
* Replace I31New with RefI31 everywhere (#5930)Thomas Lively2023-09-131-1/+1
| | | | | | | | Globally replace the source string "I31New" with "RefI31" in preparation for renaming the instruction from "i31.new" to "ref.i31", as implemented in the spec in https://github.com/WebAssembly/gc/pull/422. This would be NFC, except that it also changes the string in the external-facing C APIs. A follow-up PR will make the corresponding behavioral change.
* Remove legacy GC text syntax (#5929)Thomas Lively2023-09-121-1/+0
| | | | Remove the old forms of ref.test and ref.cast that took heap types instead of ref types and remove the old array.init_static name for array.new_fixed.
* Update stringref text format (#5891)Jérôme Vouillon2023-08-221-0/+9
| | | | | | | | | | | * Allow new syntax for some stringref opcodes Fixes #5607 * Update stringref text output * Update tests with new syntax for stringref opcodes Except in test/lit/strings.wat, to check that the legacy syntax still works.
* Remove legacy WasmGC instructions (#5861)Thomas Lively2023-08-091-16/+2
| | | | | Remove old, experimental instructions and type encodings that will not be shipped as part of WasmGC. Updating the encodings and text format to match the final spec is left as future work.
* [NFC] Refactor each of ArrayNewSeg and ArrayInit into subclasses for ↵Alon Zakai2023-05-041-4/+4
| | | | | | | | | | | Data/Elem (#5692) ArrayNewSeg => ArrayNewSegData, ArrayNewSegElem ArrayInit => ArrayInitData, ArrayInitElem Basically we remove the opcode and use the class type to differentiate them. This adds some code but it makes the representation simpler and more compact in memory, and it will help with #5690
* Implement array.fill, array.init_data, and array.init_elem (#5637)Thomas Lively2023-04-061-0/+3
| | | | | These complement array.copy, which we already supported, as an initial complete set of bulk array operations. Replace the WIP spec tests with the upstream spec tests, lightly edited for compatibility with Binaryen.
* Parse and print `array.new_fixed` (#5527)Thomas Lively2023-02-281-1/+2
| | | | | | | | | This is a (more) standard name for `array.init_static`. (The full upstream name in the spec repo is `array.new_canon_fixed`, but I'm still hoping we can drop `canon` from all the instruction names and it doesn't appear elsewhere in Binaryen). Update all the existing tests to use the new name and add a test specifically to ensure the old name continues parsing.
* [NFC] Internally rename `ArrayInit` to `ArrayNewFixed` (#5526)Thomas Lively2023-02-281-1/+1
| | | | | | | | To match the standard instruction name, rename the expression class without changing any parsing or printing behavior. A follow-on PR will take care of the functional side of this change while keeping support for parsing the old name. This change will allow `ArrayInit` to be used as the expression class for the upcoming `array.init_data` and `array.init_elem` instructions.
* [Strings] Add experimental string.hash instruction (#5480)Alon Zakai2023-02-031-0/+1
| | | See WebAssembly/stringref#60
* [Strings] Add experimental StringNew variants (#5459)Alon Zakai2023-01-261-4/+7
| | | | | | string.from_code_point makes a string from an int code point. string.new_utf8*_try makes a utf8 string and returns null on a UTF8 encoding error rather than trap.
* [Strings] Add string.compare (#5453)Alon Zakai2023-01-251-1/+2
| | | See WebAssembly/stringref#58
* [Wasm GC] Replace `HeapType::data` with `HeapType::struct_` (#5416)Thomas Lively2023-01-101-4/+0
| | | | | | `struct` has replaced `data` in the upstream spec, so update Binaryen's types to match. We had already supported `struct` as an alias for data, but now remove support for `data` entirely. Also remove instructions like `ref.is_data` that are deprecated and do not make sense without a `data` type.
* Represent ref.as_{func,data,i31} with RefCast (#5413)Thomas Lively2023-01-101-3/+3
| | | | | | | | | | | | | These operations are deprecated and directly representable as casts, so remove their opcodes in the internal IR and parse them as casts instead. For now, add logic to the printing and binary writing of RefCast to continue emitting the legacy instructions to minimize test changes. The few test changes necessary are because it is no longer valid to perform a ref.as_func on values outside the func type hierarchy now that ref.as_func is subject to the ref.cast validation rules. RefAsExternInternalize, RefAsExternExternalize, and RefAsNonNull are left unmodified. A future PR may remove RefAsNonNull as well, since it is also expressible with casts.
* Replace `RefIs` with `RefIsNull` (#5401)Thomas Lively2023-01-091-4/+4
| | | | | | | | | | | | | | | * Replace `RefIs` with `RefIsNull` The other `ref.is*` instructions are deprecated and expressible in terms of `ref.test`. Update binary and text parsing to parse those instructions as `RefTest` expressions. Also update the printing and emitting of `RefTest` expressions to emit the legacy instructions for now to minimize test changes and make this a mostly non-functional change. Since `ref.is_null` is the only `RefIs` instruction left, remove the `RefIsOp` field and rename the expression class to `RefIsNull`. The few test changes are due to the fact that `ref.is*` instructions are now subject to `ref.test` validation, and in particular it is no longer valid to perform a `ref.is_func` on a value outside of the `func` type hierarchy.
* Consolidate br_on* operations (#5399)Thomas Lively2023-01-061-12/+12
| | | | | | | | | | | | | | | | | | The `br_on{_non}_{data,i31,func}` operations are deprecated and directly representable in terms of the new `br_on_cast` and `br_on_cast_fail` instructions, so remove their dedicated IR opcodes in favor of representing them as casts. `br_on_null` and `br_on_non_null` cannot be consolidated the same way because their behavior is not directly representable in terms of `br_on_cast` and `br_on_cast_fail`; when the cast to null bottom type succeeds, the null check instructions implicitly drop the null value whereas the cast instructions would propagate it. Add special logic to the binary writer and printer to continue emitting the deprecated instructions for now. This will allow us to update the test suite in a separate future PR with no additional functional changes. Some tests are updated because the validator no longer allows passing non-func data to `br_on_func`. Doing so has not made sense since we separated the three reference type hierarchies.
* [Parser] Parse blocks (#5393)Thomas Lively2023-01-051-1/+1
| | | | | | | | | | | | | Parse both the folded and unfolded forms of blocks and structure the code to make supporting additional block instructions like if-else and try-catch relatively simple. Parsing block types is extra fun because they may implicitly define new signature heap types via a typeuse, but only if their types are not given by a single result type. To figuring out whether a new type may be introduced in all the relevant parsing stages, always track at least the arity of parsed results. The parser parses block labels, but more work will be required to support branch instructions that use them.
* Fix OOB string_view read in generated parser code (#5349)Thomas Lively2022-12-141-7/+5
| | | | | | | | | The `op` string_view was intentionally created to point into the `buf` buffer so that reading past its end would still be safe, but some C++ standard library implementations assert when reading past the end of a string_view. Change the generated code to read out of `buf` instead to avoid those assertions. Fixes #5322. Fixes #5342.
* Add standard versions of WasmGC casts (#5331)Thomas Lively2022-12-071-9/+12
| | | | | | | We previously supported only the non-standard cast instructions introduced when we were experimenting with nominal types. Parse the names and opcodes of their standard counterparts and switch to emitting the standard names and opcodes. Port all of the tests to use the standard instructions, but add additional tests showing that the non-standard versions are still parsed correctly.
* Implement `array.new_data` and `array.new_elem` (#5214)Thomas Lively2022-11-071-0/+2
| | | | | | | | | In order to test them, fix the binary and text parsers to accept passive data segments even if a module has no memory. In addition to parsing and emitting the new instructions, also implement their validation and interpretation. Test the interpretation directly with wasm-shell tests adapted from the upstream spec tests. Running the upstream spec tests directly would require fixing too many bugs in the legacy text parser, so it will have to wait for the new text parser to be ready.
* [Parser] Parse loads and stores (#5174)Thomas Lively2022-10-211-24/+28
| | | | | | | | | | Add parsing functions for `memarg`s, the offset and align fields of load and store instructions. These fields are interesting because they are lexically reserved words that need to be further parsed to extract their actual values. On top of that, add support for parsing all of the load and store instructions. This required fixing a buffer overflow problem in the generated parser code and adding more information to the signatures of the SIMD load and store instructions. `SIMDLoadStoreLane` instructions are particularly interesting because they may require backtracking to parse correctly.
* [NFC] Avoid re-parsing instruction names (#5171)Thomas Lively2022-10-201-88/+88
| | | | | | | | | Since gen-s-parser.py is essentially a giant table mapping instruction names to the information necessary to construct the corresponding IR nodes, there should be no need to further parse instruction names after the code generated by gen-s-parser.py runs. However, memory instruction parsing still parsed instruction names to get information such as size and default alignment. The new parser does not have the ability to parse that information out of instruction names, so put it in the gen-s-parser.py table instead.
* [Parser][NFC] Pass instruction locations to `makeXXX` functions (#5133)Thomas Lively2022-10-121-2/+2
| | | | | | | | The `makeXXX` functions that are responsible for individual instructions will generally need the locations of those functions to emit useful errors. However, since the instruction names are parsed before the `makeXXX` functions are called, the functions have no good way of getting the location of the beginning of the instruction. Fix this by explicitly passing them the location of the beginning of the instruction.
* [Parser][NFC] Move `ParseInput` into the parser context (#5132)Thomas Lively2022-10-121-2/+2
| | | | | | | | | | | | | Rather than passing both a `Ctx` and a `ParseInput` to every parsing function, pass only a `Ctx` with a `ParseInput` inside of it. This significantly reduces verbosity in the parser. To handle cases where parsing needs to happen at specific locations, which used to be handled by constructing a new `ParseInput` independent from the ctx, introduce a new RAII utility for temporarily changing the location of the `ParseInput` inside a context. Also add a utility for generating an error at a particular location to avoid having to construct new `ParseInput` objects just for that purpose. This resolves a few TODOs about correcting error locations, but since we don't test those yet, I still consider this NFC.
* Make `Name` a pointer, length pair (#5122)Thomas Lively2022-10-111-3/+6
| | | | | | | | | | | | | | | | | | | | | | | With the goal of supporting null characters (i.e. zero bytes) in strings. Rewrite the underlying interned `IString` to store a `std::string_view` rather than a `const char*`, reduce the number of map lookups necessary to intern a string, and present a more immutable interface. Most importantly, replace the `c_str()` method that returned a `const char*` with a `toString()` method that returns a `std::string`. This new method can correctly handle strings containing null characters. A `const char*` can still be had by calling `data()` on the `std::string_view`, although this usage should be discouraged. This change is NFC in spirit, although not in practice. It does not intend to support any particular new functionality, but it is probably now possible to use strings containing null characters in at least some cases. At least one parser bug is also incidentally fixed. Follow-on PRs will explicitly support and test strings containing nulls for particular use cases. The C API still uses `const char*` to represent strings. As strings containing nulls become better supported by the rest of Binaryen, this will no longer be sufficient. Updating the C and JS APIs to use pointer, length pairs is left as future work.
* Implement `extern.externalize` and `extern.internalize` (#4975)Thomas Lively2022-08-291-0/+2
| | | | These new GC instructions infallibly convert between `extern` and `any` references now that those types are not in the same hierarchy.
* Remove RTTs (#4848)Thomas Lively2022-08-051-10/+0
| | | | | | | RTTs were removed from the GC spec and if they are added back in in the future, they will be heap types rather than value types as in our implementation. Updating our implementation to have RTTs be heap types would have been more work than deleting them for questionable benefit since we don't know how long it will be before they are specced again.
* [Strings] GC variants for string.encode (#4817)Alon Zakai2022-07-211-0/+2
|
* [Strings] Add string.new GC variants (#4813)Alon Zakai2022-07-191-0/+2
|
* [Strings] stringview_wtf16.length (#4809)Alon Zakai2022-07-181-0/+1
| | | | This measures the length of a view, so it seems simplest to make it a sub-operation of the existing measure instruction.
* [Strings] stringview_*.slice (#4805)Alon Zakai2022-07-151-0/+3
| | | | | | | Unfortunately one slice is the same as python [start:end], using 2 params, and the other slice is one param, [CURR:CURR+num] (where CURR is implied by the current state in the iter). So we can't use a single class here. Perhaps a different name would be good, like slice vs substring (like JS does), but I picked names to match the current spec.
* [Strings] stringview access operations (#4798)Alon Zakai2022-07-131-0/+5
|
* [Strings] string.as (#4797)Alon Zakai2022-07-121-0/+3
|
* [Parser] Start to parse instructions (#4789)Thomas Lively2022-07-111-6/+22
| | | | | | | | | | | | | | | | | | | | | Update gen-s-parser.py to produce a second version of its parsing code that works with the new wat parser. The new version automatically replaces the `s` element argument in the existing parser with the `ctx` and `in` arguments used by the new parser, so adding new instructions will not require any additional work in gen-s-parser.py after this change. Also add stub `make***` functions to the new wat parser, with a few filled out, namely `makeNop`, `makeUnreachable`, `makeConst`, and `makeRefNull`. Update the `global` parser to parse global initializer instructions and update wat-kitchen-sink.wast to demonstrate that the instructions are parsed correctly. Adding new instruction classes will require adding a new `make***` function to wat-parser.cpp in additional to wasm-s-parser.{h,cpp} after this change, but adding a trivial failing implementation is good enough for the time being, so I don't expect this to appreciably increase our maintenance burden in the near term. The infrastructure for parsing folded instructions, instructions with operands, and control flow instructions will be implemented in future PRs.
* [Strings] string.is_usv_sequence (#4783)Alon Zakai2022-07-081-0/+1
| | | | | | | This implements it as a StringMeasure opcode. They do have the same number of operands, same trapping behavior, and same return type. They both get a string and do some inspection of it to return an i32. Perhaps the name could be StringInspect or something like that, rather than StringMeasure..? But I think for now this might be good enough, and the spec may change anyhow later.
* [Strings] string.eq (#4781)Alon Zakai2022-07-081-0/+1
|