path: root/src/parser
Commit message | Author | Age | Files | Lines
* [Strings] Represent string values as WTF-16 internally (#6418) | Thomas Lively | 2024-03-22 | 2 | -20/+10
| WTF-16, i.e. arbitrary sequences of 16-bit values, is the encoding of Java and JavaScript strings, and using the same encoding makes the interpretation of string operations trivial, even when accounting for non-ASCII characters. Specifically, use little-endian WTF-16. Re-encode string constants from WTF-8 to WTF-16 in the parsers, then back to WTF-8 in the writers. Update the constructor for string `Literal`s to interpret the string as WTF-16 and store a sequence of WTF-16 code units, i.e. 16-bit integers. Update `Builder::makeConstantExpression` accordingly to convert from the new `Literal` string representation back to a WTF-16 string. Update the interpreter to remove the logic for detecting non-ASCII characters and bailing out. The naive implementations of all the string operations are correct now that our string encoding matches the JS string encoding.
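As a rough illustration of the re-encoding step described above (not Binaryen's C++ implementation), the following Python sketch converts WTF-8 bytes to little-endian WTF-16 code units and back. The helper names are hypothetical, and the sketch assumes Python's `surrogatepass` error handler is an acceptable stand-in for WTF-8 handling of lone surrogates.

```python
def wtf8_to_wtf16_units(data: bytes) -> list[int]:
    """Decode WTF-8 bytes into 16-bit code units (hypothetical helper)."""
    # 'surrogatepass' lets lone surrogates through, which strict UTF-8 rejects.
    text = data.decode("utf-8", errors="surrogatepass")
    raw = text.encode("utf-16-le", errors="surrogatepass")
    # Each code unit occupies two bytes, low byte first (little-endian).
    return [raw[i] | (raw[i + 1] << 8) for i in range(0, len(raw), 2)]


def wtf16_units_to_wtf8(units: list[int]) -> bytes:
    """Re-encode 16-bit code units as WTF-8 bytes (hypothetical helper)."""
    raw = b"".join(unit.to_bytes(2, "little") for unit in units)
    text = raw.decode("utf-16-le", errors="surrogatepass")
    return text.encode("utf-8", errors="surrogatepass")
```

With code units in hand, operations such as length or indexing reduce to plain list operations, which is the simplification the commit message points to.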
* Typed continuations: suspend instructions (#6393) | Frank Emrich | 2024-03-19 | 2 | -0/+19
| This PR is part of a series that adds basic support for the [typed continuations/wasmfx proposal](https://github.com/wasmfx/specfx). This particular PR adds support for the `suspend` instruction for suspending with a given tag, documented [here](https://github.com/wasmfx/specfx/blob/main/proposals/continuations/Overview.md#instructions). These instructions are of the form `(suspend $tag)`. Assuming that `$tag` is defined with _n_ `param` types `t_1` to `t_n`, the instruction consumes _n_ arguments of types `t_1` to `t_n`. Its result type is the same as the `result` type of the tag. Thus, the folded textual representation looks like `(suspend $tag arg1 ... argn)`. Support for the instruction is implemented in both the old and the new wat parser. Note that this PR does not implement validation of the new instruction. This PR also fixes finalization of `cont.new`, `cont.bind` and `resume` nodes in those cases where any of their children are unreachable.
* [Parser] Propagate debug locations like the old parser (#6377) | Thomas Lively | 2024-03-05 | 1 | -0/+55
| Add a pass that propagates debug locations to unannotated child and sibling expressions after parsing. The new parser on its own only attaches debug locations to directly annotated instructions, but this pass, which we run unconditionally, emulates the behavior of the previous parser for compatibility with existing programs. It does unintuitive things to programs using the non-nested format because it runs on nested Binaryen IR, so we may want to rethink this at some point.
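One way to picture such a propagation pass is the Python sketch below. It is only a model under stated assumptions: the `Expr` class and `propagate` function are hypothetical, and the rule used here (an unannotated expression takes the location of its previous sibling, or of its parent if it is the first child) is an assumption about the behavior, not a transcription of the actual pass.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Expr:
    """Hypothetical stand-in for a nested IR expression."""
    name: str
    loc: Optional[str] = None  # debug location, e.g. "a.c:1:1"
    children: list["Expr"] = field(default_factory=list)


def propagate(expr: Expr, inherited: Optional[str] = None) -> None:
    """Fill in missing debug locations from the previous sibling or the parent."""
    if expr.loc is None:
        expr.loc = inherited
    current = expr.loc
    for child in expr.children:
        propagate(child, current)
        # Assumption: a later sibling inherits from the sibling before it.
        current = child.loc
```

Because the walk runs over nested expressions, the result depends on how flat input was nested during parsing, which is the source of the unintuitive behavior the message mentions.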
* [Parser] Support prologue and epilogue sourcemap annotations (#6370) | Thomas Lively | 2024-03-04 | 4 | -32/+58
| Also fix a bug with sourcemap annotations on folded `if` conditions. Update IRBuilder to apply prologue and epilogue source locations when beginning and ending a function scope. Add basic support in the parser for explicitly tracking annotations on module fields, although they are currently only used for prologue source location annotations.
* Typed continuations: cont.bind instructions (#6365) | Frank Emrich | 2024-03-04 | 2 | -0/+27
| This PR is part of a series that adds basic support for the [typed continuations/wasmfx proposal](https://github.com/wasmfx/specfx). This particular PR adds support for the `cont.bind` instruction for partially applying continuations, documented [here](https://github.com/wasmfx/specfx/blob/main/proposals/continuations/Overview.md#instructions). In short, these instructions are of the form `(cont.bind $ct_before $ct_after)` where `$ct_before` and `$ct_after` are related continuation types. They must only differ in the number of arguments, where `$ct_before` has _n_ additional parameters as compared to `$ct_after`, for some _n_ ≥ 0. The idea is that `(cont.bind $ct_before $ct_after)` then takes a reference to a continuation of type `$ct_before` as well as _n_ operands and returns a (reference to a) continuation of type `$ct_after`. Thus, the folded textual representation looks like `(cont.bind $ct_before $ct_after arg1 ... argn c)`. Support for the instruction is implemented in both the old and the new wat parser. Note that this PR does not implement validation of the new instruction.
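The relationship between the two continuation types can be sketched in Python as follows. This is an illustration only: `bound_arity` is a hypothetical helper, types are modeled as plain lists of strings, and the sketch assumes the _n_ extra parameters of `$ct_before` are the leading ones.

```python
def bound_arity(before_params, after_params, before_results, after_results):
    """Return n, the number of operands consumed besides the continuation.

    Raises ValueError if the two continuation types are not related.
    """
    n = len(before_params) - len(after_params)
    # Assumption: the bound parameters are the first n of $ct_before.
    if n < 0 or before_params[n:] != after_params:
        raise ValueError("parameter lists are not related by partial application")
    if before_results != after_results:
        raise ValueError("result types must match")
    return n
```

For example, binding `[i32, i64, f32]` down to `[f32]` gives _n_ = 2, so the folded form would carry two arguments followed by the continuation operand.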
* [Parser] Support inline data in 64-bit memory declarations (#6364) | Thomas Lively | 2024-02-29 | 1 | -7/+24
| This new form of the abbreviated memory declaration with inline data is introduced in the memory64 proposal.
* [Parser] Do not require a memory for GC string ops (#6363) | Thomas Lively | 2024-02-29 | 2 | -12/+54
| We previously required a memory to exist while parsing all `StringNew` and `StringEncode` instructions, even though some variants of the instructions use GC arrays instead. Require a memory only for those instructions that use one.
* [Parser] Parse annotations, including source map comments (#6345) | Thomas Lively | 2024-02-26 | 4 | -583/+1700
| Parse annotations using the standards-track `(@annotation ...)` format as well as the `;;@ source-map:0:1` format. Have the lexer implicitly collect annotations while it skips whitespace and add lexer APIs to access the annotations since the last token was parsed. Collect annotations before parsing each instruction and pass the annotations explicitly to the parser and parser context functions for instructions. Add an API to `IRBuilder` to set a debug location to be attached to the next visited or created instruction and use it from the parser.
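To make the comment-style annotation concrete, here is a small Python sketch that pulls a file, line, and column out of a `;;@ file:line:col` comment. The function name and the regular expression are assumptions made for illustration; the real lexer collects annotations while skipping whitespace rather than matching whole lines.

```python
import re

# Assumed shape: ";;@", optional spaces, then file:line:col.
_SOURCE_MAP_COMMENT = re.compile(r";;@\s*(.+):(\d+):(\d+)\s*")


def parse_source_map_comment(comment: str):
    """Return (file, line, column), or None if the comment is not an annotation."""
    match = _SOURCE_MAP_COMMENT.fullmatch(comment)
    if match is None:
        return None
    file_name, line, column = match.groups()
    return file_name, int(line), int(column)
```

A parser built this way would hand the resulting triple to whatever sets the debug location for the next instruction.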
* Typed continuations: cont.new instructions (#6308) | Frank Emrich | 2024-02-22 | 2 | -0/+15
| This PR is part of a series that adds basic support for the [typed continuations/wasmfx proposal](https://github.com/wasmfx/specfx). This particular PR adds support for the `cont.new` instruction for creating continuations, documented [here](https://github.com/wasmfx/specfx/blob/main/proposals/continuations/Overview.md#instructions). In short, these instructions are of the form `(cont.new $ct)` where `$ct` must be a continuation type. The instruction takes a single (nullable) function reference as its argument, which means that the folded representation of the instruction is of the form `(cont.new $ct (foo ...))`. Support for the instruction is implemented in both the old and the new wat parser. Note that this PR does not implement validation of the new instruction.
* [Parser][NFC] Remove `Token` from lexer interface (#6333) | Thomas Lively | 2024-02-22 | 2 | -44/+46
| Replace the general `peek` method that returned a `Token` with specific peek methods that look for (but do not consume) specific kinds of tokens. This change is a prerequisite for simplifying the lexer implementation by removing `Token` entirely.
* [Parser][NFC] Remove parser/input.h (#6332) | Thomas Lively | 2024-02-22 | 6 | -111/+31
| Remove the layer of abstraction sitting between the parser and the lexer now that the lexer has an interface the parser can use directly.
* [Parser] Simplify the lexer interface (#6319) | Thomas Lively | 2024-02-20 | 3 | -318/+252
| The lexer was previously an iterator over tokens, but that expressivity is not actually used in the parser. Instead, we have `input.h` that adapts the token iterator interface into an interface that is actually useful. As a first step toward simplifying the lexer implementation to no longer be an iterator over tokens, update its interface by moving the adaptation from input.h to the lexer itself. This requires extensive changes to the lexer unit tests, which will not have to change further when we actually simplify the lexer implementation.
* [Parser] Parse `resume` (#6295) | Thomas Lively | 2024-02-09 | 2 | -9/+53
|
* [Parser] Support references to struct fields by name (#6293) | Thomas Lively | 2024-02-08 | 2 | -11/+28
| Construct a mapping from heap type and field name to field index, then use it while parsing instructions.
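The mapping described above might look like the following Python sketch. The data layout (a dict from type name to a list of optional field names) and the function name are assumptions chosen for illustration, not the parser's actual data structures.

```python
def build_field_index(struct_types):
    """Map (type name, field name) pairs to field indices.

    `struct_types` maps a type name to its fields in declaration order;
    unnamed fields are represented by None and get no entry.
    """
    index = {}
    for type_name, field_names in struct_types.items():
        for position, field_name in enumerate(field_names):
            if field_name is not None:
                index[(type_name, field_name)] = position
    return index
```

An instruction such as `struct.get $point $y` would then resolve `$y` by looking up `("$point", "$y")` in the resulting mapping.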
* [Parser] Do not involve IRBuilder for imported functions (#6286) | Thomas Lively | 2024-02-07 | 3 | -12/+11
| We previously had a bug where we would begin and end an IRBuilder context for imported functions even though they don't have bodies. For functions that return results, ending this empty scope should have produced an error except that we had another bug where we only produced that error for multivalue functions. We did not previously have imported multivalue functions in wat-kitchen-sink.wast, so both of these bugs went undetected. Fix both bugs and update the test to include an imported multivalue function so that it would have failed without this fix.
* [Parser] Support string-style identifiers (#6278) | Thomas Lively | 2024-02-06 | 2 | -29/+68
| In addition to normal identifiers, support parsing identifiers of the format `$"..."`. This format is not yet allowed by the standard, but it is a popular proposed extension (see https://github.com/WebAssembly/spec/issues/617 and https://github.com/WebAssembly/annotations/issues/21). Binaryen has historically allowed a similar format and has supported arbitrary non-standard identifier characters, so it's much easier to support this extended syntax than to fix everything to use the restricted standard syntax.
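A minimal Python sketch of recognizing both identifier forms is shown below. The regular expressions and the `lex_id` name are illustrative assumptions; in particular, the sketch returns the quoted contents verbatim and does not process escape sequences.

```python
import re
import string

# Characters allowed in standard text-format identifiers.
_IDCHARS = string.ascii_letters + string.digits + "!#$%&'*+-./:<=>?@\\^_`|~"
_PLAIN_ID = re.compile(r"\$([" + re.escape(_IDCHARS) + r"]+)")
# String-style identifier: $"..." with backslash escapes left unprocessed.
_STRING_ID = re.compile(r'\$"((?:[^"\\]|\\.)*)"')


def lex_id(text: str):
    """Return the identifier's name without the leading `$`, or None."""
    match = _STRING_ID.fullmatch(text) or _PLAIN_ID.fullmatch(text)
    return match.group(1) if match else None
```

The string form accepts names containing spaces or other characters that the plain form rejects.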
* [Parser] Parse v128.const (#6275) | Thomas Lively | 2024-02-05 | 5 | -1/+142
|
* [Parser] Templatize lexing of integers (#6272) | Thomas Lively | 2024-02-05 | 4 | -108/+50
| Have a single implementation for lexing each of unsigned, signed, and uninterpreted integers, each generic over the bit width of the integer. This reduces duplication in the existing code and it will make it much easier to support lexing more 8- and 16-bit integers.
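The idea of one implementation per signedness, parameterized by bit width, can be sketched in Python as follows. All names are hypothetical, and the accepted ranges are assumptions based on the usual text-format rules: unsigned literals carry no sign, and uninterpreted integers accept anything that fits either the signed or the unsigned range.

```python
import re

_INT = re.compile(
    r"([+-]?)(?:0x([0-9A-Fa-f]+(?:_[0-9A-Fa-f]+)*)|([0-9]+(?:_[0-9]+)*))"
)


def _parse(text):
    """Return (sign, magnitude) for a decimal or 0x-prefixed literal, else None."""
    match = _INT.fullmatch(text)
    if match is None:
        return None
    sign, hex_digits, dec_digits = match.groups()
    if hex_digits is not None:
        return sign, int(hex_digits, 16)
    return sign, int(dec_digits, 10)


def lex_unsigned(text, bits):
    parsed = _parse(text)
    if parsed is None or parsed[0] != "":
        return None  # unsigned literals may not carry a sign
    return parsed[1] if parsed[1] < (1 << bits) else None


def lex_signed(text, bits):
    parsed = _parse(text)
    if parsed is None:
        return None
    value = -parsed[1] if parsed[0] == "-" else parsed[1]
    in_range = -(1 << (bits - 1)) <= value < (1 << (bits - 1))
    return value if in_range else None


def lex_uninterpreted(text, bits):
    parsed = _parse(text)
    if parsed is None:
        return None
    value = -parsed[1] if parsed[0] == "-" else parsed[1]
    if not -(1 << (bits - 1)) <= value < (1 << bits):
        return None
    return value & ((1 << bits) - 1)  # return the raw bit pattern
```

Passing `bits=8` or `bits=16` needs no extra code, which is the kind of reuse the commit is after.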
* [Parser] Parse start declarations (#6256) | Thomas Lively | 2024-01-30 | 3 | -0/+37
|
* [Parser] Parse pops (by doing nothing) (#6252) | Thomas Lively | 2024-01-30 | 2 | -1/+8
| Parse pop expressions and check that they have the expected types, but do not actually create new Pop expressions or push anything onto the stack because we already create Pop expressions as necessary when visiting the beginning of catch blocks. Unlike the legacy text parser, the new text parser is not capable of parsing pops in invalid locations in the IR. This means that the new text parser will never be able to parse test/lit/catch-pop-fixup-eh-old.wast, which deliberately parses invalid IR to check that the pops can be fixed up and moved to the correct locations. It should be acceptable to delete that test when we turn on the new parser by default, though, so that won't be a problem.
* [Parser] Parse tuple types (#6249) | Thomas Lively | 2024-01-29 | 2 | -5/+45
| Use the new `(tuple ...)` syntax. Enforce that tuples have a valid number of elements and are not nested to avoid assertion failures when parsing invalid input.
* [Parser] Parse throw_ref (#6238) | Thomas Lively | 2024-01-25 | 2 | -1/+6
|
* [Parser] Parse try_table (#6237) | Thomas Lively | 2024-01-25 | 2 | -3/+136
|
* Typed continuations: resume instructions (#6083) | Frank Emrich | 2024-01-11 | 1 | -0/+5
| This PR is part of a series that adds basic support for the [typed continuations proposal](https://github.com/wasmfx/specfx). This particular PR adds support for the `resume` instruction. The most notable missing feature is validation, which is not yet implemented.
* [Parser] Parse remaining heap and reference types (#6218) | Thomas Lively | 2024-01-10 | 2 | -20/+60
| Parse types like `exnref` and `nofunc` that we did not previously support.
* [Parser] Parse br_if correctly (#6202) | Thomas Lively | 2024-01-04 | 2 | -6/+7
| The new text parser and IRBuilder were previously not differentiating between `br` and `br_if`. Handle `br_if` correctly by popping and assigning a condition.
* [Parser] Go back to "sub final" instead of "sub open" (#6199) | Thomas Lively | 2024-01-03 | 1 | -1/+1
| The planned spec change to use "sub open" never came together, so the standard format remains "sub final".
* [Parser] Parse br_on_cast{_fail} input annotations (#6198) | Thomas Lively | 2024-01-03 | 2 | -7/+13
| And validate in IRBuilder both that the input annotation is valid and that the input matches it.
* [Parser] Parse folded instructions that contain parentheses (#6196) | Thomas Lively | 2024-01-03 | 2 | -38/+51
| To parse folded instructions in the right order, we need to defer parsing each instruction until we have parsed each of its children and found its closing parenthesis. Previously we naively looked for parentheses to determine where instructions began and ended before we parsed them, but that scheme did not correctly handle instructions that can contain parentheses in their immediates, such as call_indirect. Fix the problem by using the actual instruction parser functions with a placeholder context to find the end of the instructions, including any kind of immediates they might have.
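The ordering problem can be illustrated with a toy Python unfolder that emits children before their parent while keeping parenthesized immediates attached to the instruction. This is a simplification, not the approach in the commit: the sketch hard-codes an assumed set of immediate heads (`type`, `param`, `result`), whereas the fix described above reuses the real instruction parsers with a placeholder context. All function names are hypothetical.

```python
import re

# Assumption: parenthesized groups with these heads are immediates, not children.
IMMEDIATE_HEADS = {"type", "param", "result"}


def tokenize(source):
    return re.findall(r"[()]|[^\s()]+", source)


def parse(tokens, pos=0):
    """Parse one parenthesized form at tokens[pos]; return (form, next_pos)."""
    assert tokens[pos] == "("
    pos += 1
    items = []
    while tokens[pos] != ")":
        if tokens[pos] == "(":
            sub_form, pos = parse(tokens, pos)
            items.append(sub_form)
        else:
            items.append(tokens[pos])
            pos += 1
    return items, pos + 1


def render(item):
    if isinstance(item, str):
        return item
    return "(" + " ".join(render(part) for part in item) + ")"


def is_immediate(item):
    if isinstance(item, str):
        return True
    return bool(item) and isinstance(item[0], str) and item[0] in IMMEDIATE_HEADS


def unfold(form, out):
    """Append instructions to `out` in execution order: children, then parent."""
    immediates = [render(item) for item in form[1:] if is_immediate(item)]
    for item in form[1:]:
        if not is_immediate(item):
            unfold(item, out)
    out.append(" ".join([form[0]] + immediates))
    return out
```

Treating every parenthesis as the start of a child instruction would misread `(type $t)` as an instruction, which is the failure mode the message describes for `call_indirect`.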
* [Parser] Support standalone import definitions (#6191) | Thomas Lively | 2024-01-02 | 3 | -7/+95
| We previously supported the in-line import abbreviation, but now add support for explicit, non-abbreviated imports as well.
* [EH] Add instructions for new proposal (#6181) | Heejin Ahn | 2023-12-19 | 1 | -0/+5
| This adds basic support for the new instructions in the new EH proposal passed at the Oct hybrid CG meeting: https://github.com/WebAssembly/meetings/blob/main/main/2023/CG-10.md https://github.com/WebAssembly/exception-handling/blob/main/proposals/exception-handling/Exceptions.md This mainly adds two instructions: `try_table` and `throw_ref`. This is the bare minimum required to read and write text and binary format, and does not include analyses or optimizations. (It includes some analysis required for validation of existing instructions.) Validation for the new instructions is not yet included. `try_table` faces the same problem as the `resume` instruction in #6083: without the module-level tag info, we are unable to know the 'sent types' of `try_table`. This solves it with an approach similar to the one taken in #6083: this adds a `Module*` parameter to `finalize` methods, which defaults to `nullptr` when not given. The `Module*` parameter is given when called from the binary and text parser, and we cache those tag types in the `sentTypes` array within the `TryTable` class. In later optimization passes, as long as they don't touch tags, it is fine to call `finalize` without the `Module*`. Refer to https://github.com/WebAssembly/binaryen/pull/6083#issuecomment-1854634679 and #6096 for related discussions when `resume` was added.
* [Parser] Parse explicit exports (#6179) | Thomas Lively | 2023-12-14 | 3 | -0/+78
|
* [Parser] Parse tuple operations (#6174) | Thomas Lively | 2023-12-13 | 2 | -3/+41
| Parse `tuple.make`, `tuple.extract`, and `tuple.drop`. Also slightly improve the way we break up tuples into individual elements in IRBuilder by using a `local.tee` instead of a block containing a `local.set` and `local.get`.
* [Parser] Parse the remaining array operations (#6158) | Thomas Lively | 2023-12-12 | 4 | -23/+191
| Parse `array.new_elem`, `array.init_data`, and `array.init_elem`. Accidentally also includes: [Parser] Parse string types and operations (#6161)
* [Parser] Parse rethrow (#6155) | Thomas Lively | 2023-12-12 | 2 | -1/+8
| Like `delegate`, rethrow takes a `Try` label. Refactor the delegate handling so that `Try` can share its logic.
* [Parser] Parse table operations (#6154) | Thomas Lively | 2023-12-12 | 2 | -6/+67
| Including table.get, table.set, table.size, table.grow, table.fill, and table.copy.
* Add a `tuple.drop` text pseudoinstruction (#6170) | Thomas Lively | 2023-12-12 | 1 | -0/+5
| We previously overloaded `drop` to mean both normal drops of single values and also drops of tuple values. That works fine in the legacy text parser since it can infer parent-child relationships directly from the s-expression structure of the input, so it knows that a drop should drop an entire tuple if the tuple-producing instruction is a child of the drop. The new text parser, however, is much more like the binary parser in that it uses instruction types to create parent-child instructions. The new parser always assumes that `drop` is meant to drop just a single value because that's what it does in WebAssembly. Since we want to continue to let `Drop` IR expressions consume tuples, and since we will need a way to write tests for that IR pattern that work with the new parser, introduce a new pseudoinstruction, `tuple.drop`, to represent drops of tuples. This pseudoinstruction only exists in the text format and it parses to normal `Drop` expressions. `tuple.drop` takes the arity of its operand as an immediate, which will let the new parser parse it correctly in the future.
* [Parser] Parse call_indirect and return_call_indirect (#6148) | Thomas Lively | 2023-12-06 | 2 | -1/+26
|
* [Parser] Parse tables and element segments (#6147) | Thomas Lively | 2023-12-06 | 5 | -20/+537
| These module fields are especially complex to parse because they contain both nontrivial types and instructions, so their parsing logic needs to be spread out across the ParseDecls, ParseModuleTypes, and ParseDefs phases of parsing. This applies to in-line elements in table definitions as well, which means we need to be able to match a table to its in-line element segment across multiple phases.
* [Parser] Parse try/catch/catch_all/delegate (#6128) | Thomas Lively | 2023-11-29 | 2 | -23/+206
| Parse the legacy v3 syntax for try/catch/catch_all/delegate in both its folded and unfolded forms. The first source of significant complexity is the optional IDs after `catch` and `catch_all` in the unfolded form, which can be confused for tag indices and require backtracking to parse correctly. The second source of complexity is the handling of delegate labels, which are relative to the try's parent scope despite being parsed after the try's scope has already started. Handling this correctly requires punching a hole big enough to drive a truck through in both the parser and IRBuilder abstractions.
* [Parser] Parse tags and throw (#6126) | Thomas Lively | 2023-11-20 | 5 | -11/+144
| Also fix the parser to correctly error if an imported item appears after a non-imported item and make the corresponding fix to the test.
* [Parser] Parse call_ref (#6103) | Thomas Lively | 2023-11-15 | 2 | -6/+11
| Also mark array.new_elem as unimplemented as a drive-by; it previously had an incorrect implementation.
* [Parser] Parse array.new_fixed (#6102) | Thomas Lively | 2023-11-15 | 2 | -1/+15
|
* [Parser] Parse RefAs expressions (#6101) | Thomas Lively | 2023-11-15 | 2 | -1/+6
|
* [Parser] Parse BrOn expressions (#6100) | Thomas Lively | 2023-11-15 | 2 | -2/+19
|
* [Parser] Parse ref.test and ref.cast (#6099) | Thomas Lively | 2023-11-15 | 2 | -2/+16
|
* [Parser] Parse br_table (#6098) | Thomas Lively | 2023-11-15 | 2 | -1/+22
|
* [Parser] Parse ref.func (#6097) | Thomas Lively | 2023-11-15 | 2 | -3/+8
|
* [Parser][NFC] Filter out unused instructions in gen-s-parser.py (#6095) | Thomas Lively | 2023-11-09 | 1 | -23/+0
| The new wat parser parses block, if, loop, then, and else keywords directly rather than depending on code generated from gen-s-parser.py. Filter these keywords out in gen-s-parser.py when generating the new wat parser and delete the stub functions that the removed generated code used to depend on.
* [Parser] Parse `call` and `return_call` (#6086) | Thomas Lively | 2023-11-07 | 2 | -3/+41
| To support parsing calls, add support for parsing function indices and building calls with IRBuilder.