path: root/src/parser
Commit message | Author | Age | Files | Lines
* [NFC] Make MemoryOrder parameters non-optional (#7171)Thomas Lively2024-12-212-8/+7
| | | | | | Update Builder and IRBuilder makeStructGet and makeStructSet functions to require the memory order to be explicitly supplied. This is slightly more verbose, but will reduce the chances that we forget to properly consider synchronization when implementing new features in the future.
* Support atomic struct accessors (#7155)Thomas Lively2024-12-182-8/+63
| | | | | | | | | | | | | | | | | | | | | Implement support for both sequentially consistent and acquire-release variants of `struct.atomic.get` and `struct.atomic.set`, as proposed by shared-everything-threads. Introduce a new `MemoryOrdering` enum for describing different levels of atomicity (or the lack thereof). This new enum should eventually be adopted by linear memory atomic accessors as well to support acquire-release semantics, but for now just use it in `StructGet` and `StructSet`. In addition to implementing parsing and emitting for the instructions, validate that shared-everything is enabled to use them, mark them as having synchronization side effects, and lightly optimize them by relaxing acquire-release accesses to non-shared structs to normal, unordered accesses. This is valid because such accesses cannot possibly synchronize with other threads. Also update Precompute to avoid optimizing out synchronization points. There are probably other passes that need to be updated to avoid incorrectly optimizing synchronizing accesses, but identifying and fixing them is left as future work.
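The relaxation described above can be sketched as follows. This is a minimal illustration rather than Binaryen's code: the `MemoryOrder` enumerators and the `relaxForUnsharedStruct` helper are hypothetical names chosen for this example.

```cpp
#include <cassert>

// Hypothetical stand-in for the ordering enum described in the commit.
enum class MemoryOrder { Unordered, SeqCst, AcqRel };

// An acquire-release access to a struct that is not shared cannot
// synchronize with another thread, so it can be treated as unordered.
// Other orderings are left untouched in this sketch.
inline MemoryOrder relaxForUnsharedStruct(MemoryOrder order, bool isShared) {
  if (!isShared && order == MemoryOrder::AcqRel) {
    return MemoryOrder::Unordered;
  }
  return order;
}
```

Accesses to shared structs keep their ordering because another thread may observe them.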
* Support control flow inputs in IRBuilder (#7149)Thomas Lively2024-12-131-23/+22
| | | | | | | | | | | | | | | | | | | | Since multivalue was standardized, WebAssembly has supported not only multiple results but also an arbitrary number of inputs on control flow structures, but until now Binaryen did not support control flow input. Binaryen IR still has no way to represent control flow input, so lower it away using scratch locals in IRBuilder. Since both the text and binary parsers use IRBuilder, this gives us full support for parsing control flow inputs. The lowering scheme is mostly simple. A local.set writing the control flow inputs to a scratch local is inserted immediately before the control flow structure begins and a local.get retrieving those inputs is inserted inside the control flow structure before the rest of its body. The only complications come from ifs, in which the inputs must be retrieved at the beginning of both arms, and from loops, where branches to the beginning of the loop must be transformed so their values are written to the scratch local along the way. Resolves #6407.
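A toy version of the simple case (a `block` with one input) might look like the sketch below. It works on instruction strings instead of Binaryen IR, and `lowerBlockInput` and the scratch local name are assumptions made for illustration; the `if` and `loop` complications mentioned above are not handled.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Lower a `block` that takes one input into a form without control flow
// inputs. `scratch` names a hypothetical scratch local; `body` holds the
// original instructions inside the block.
inline std::vector<std::string>
lowerBlockInput(const std::string& scratch,
                const std::vector<std::string>& body) {
  std::vector<std::string> out;
  // Save the input just before the control flow structure begins.
  out.push_back("local.set " + scratch);
  out.push_back("block");
  // Retrieve the input inside the structure, ahead of the original body.
  out.push_back("local.get " + scratch);
  out.insert(out.end(), body.begin(), body.end());
  out.push_back("end");
  return out;
}
```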
* Mark Result and MaybeResult [[nodiscard]] (#7083)Thomas Lively2024-11-151-3/+3
| | | | | | Since these types may be carrying errors that need to be handled or propagated, it is always an error not to use them in some way. Adding the [[nodiscard]] attribute caused the compiler to find a few instances where we were incorrectly ignoring results. Fix these places.
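The effect of the attribute can be shown with a simplified result type. The `Result`, `Err`, and `parseDigit` definitions below are a hypothetical C++17 sketch, not the parser's actual types.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <variant>

// Simplified, hypothetical error type.
struct Err {
  std::string msg;
};

// Marking the class [[nodiscard]] makes the compiler warn whenever a
// function's returned Result is ignored by the caller.
template<typename T> struct [[nodiscard]] Result {
  std::variant<T, Err> val;
  Result(T value) : val(std::move(value)) {}
  Result(Err err) : val(std::move(err)) {}
  bool isErr() const { return std::holds_alternative<Err>(val); }
  const T& value() const { return std::get<T>(val); }
};

// Example producer: succeeds only on an ASCII digit.
inline Result<int> parseDigit(char c) {
  if (c < '0' || c > '9') {
    return Err{"expected digit"};
  }
  return c - '0';
}
```

With this definition, a bare statement such as `parseDigit('7');` draws a compiler warning, which is how ignored results get noticed.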
* Reset function context when ending a function in IRBuilder (#7081)Thomas Lively2024-11-151-1/+1
| | | | | | | | | | | | | | | | | | | IRBuilder contains a pointer to the current function that is used to create scratch locals, look up the operand types for returns, etc. This pointer is nullable because IRBuilder can also be used in non-function contexts such as global initializers. Visiting the start of a function sets the function pointer, and after this change visiting the end of a function resets the pointer to null. This avoids potential problems where code outside a function would be able to incorrectly use scratch locals and returns if the IRBuilder had previously been used to build a function. This change requires some adjustments to Outlining, which visits code out of order, so ends up visiting code from inside a function after visiting the end of the function. To support this use case, add a `setFunction` method to IRBuilder that lets the user explicitly control its function context. Also remove the optional function pointer parameter to the IRBuilder constructor since it is less flexible and not used.
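The lifecycle of the function pointer can be modeled with a small toy class. Apart from `setFunction`, which the message names, everything here (`ToyBuilder`, `visitFunctionStart`, `visitFunctionEnd`, `addScratchLocal`) is a hypothetical simplification.

```cpp
#include <cassert>
#include <string>

struct Function {
  std::string name;
  int numScratchLocals = 0;
};

// Toy model of a builder with a nullable "current function" context.
class ToyBuilder {
  Function* func = nullptr; // null outside of any function

public:
  void visitFunctionStart(Function* f) { func = f; }
  // Ending a function clears the context so later code cannot use it.
  void visitFunctionEnd() { func = nullptr; }
  // Explicit control for users that visit code out of order.
  void setFunction(Function* f) { func = f; }

  // Scratch locals only make sense inside a function; report failure
  // instead of silently touching a stale function.
  bool addScratchLocal() {
    if (!func) {
      return false;
    }
    ++func->numScratchLocals;
    return true;
  }
};
```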
* Rename indexType -> addressType. NFC (#7060)Sam Clegg2024-11-073-33/+35
| | | See https://github.com/WebAssembly/memory64/pull/92
* Fix typo in parsers.h (#7032)Angela Upreti2024-10-251-1/+1
| | | Corrected `maybeRefType` declaration to `maybeReftype`.
* Source Maps: Support 5 segment mappings (#6795)Ömer Sinan Ağacan2024-10-011-3/+24
| | | | | | | Support 5-segment source mappings, which add a name. Reference: https://github.com/tc39/source-map/blob/main/source-map-rev3.md#proposed-format
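For context, each segment in a source map's `mappings` string is a run of base64 VLQ numbers: one, four, or five of them, with the fifth being the name index. The decoder below is an independent sketch (requires C++17); `decodeSegment` and `hasName` are hypothetical helpers, not Binaryen functions.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Decode one source map segment (base64 VLQ) into its fields.
// Returns std::nullopt on malformed input.
inline std::optional<std::vector<int>> decodeSegment(const std::string& seg) {
  static const std::string alphabet =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
  std::vector<int> fields;
  unsigned value = 0;
  unsigned shift = 0;
  bool inNumber = false;
  for (char c : seg) {
    auto digit = alphabet.find(c);
    if (digit == std::string::npos || shift > 25) {
      return std::nullopt; // not base64, or number too large for this sketch
    }
    // The low five bits carry data; bit 5 is the continuation flag.
    value |= unsigned(digit & 31) << shift;
    shift += 5;
    inNumber = true;
    if (!(digit & 32)) {
      // Last digit of this number: bit 0 of the value is the sign.
      int magnitude = int(value >> 1);
      fields.push_back((value & 1) ? -magnitude : magnitude);
      value = 0;
      shift = 0;
      inNumber = false;
    }
  }
  if (inNumber) {
    return std::nullopt; // dangling continuation bit
  }
  return fields;
}

// A five-field segment carries a name index as its last field.
inline bool hasName(const std::vector<int>& fields) {
  return fields.size() == 5;
}
```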
* Require string-style identifiers to be UTF-8 (#6941)Thomas Lively2024-09-161-0/+10
| | | | | | | | | | | In the WebAssembly text format, strings can generally be arbitrary bytes, but identifiers must be valid UTF-8. Check for UTF-8 validity when parsing string-style identifiers in the lexer. Update StringLowering to generate valid UTF-8 global names even for strings that may not be valid UTF-8 and test that text round tripping works correctly after StringLowering. Fixes #6937.
* Fix parser error on block params (#6932)Thomas Lively2024-09-111-6/+6
| | | | | | | | | | The error checking we had to report an error when the input contains block parameters was in a code path that is no longer executed under normal circumstances. Specifically, it was part of the `ParseModuleTypesCtx` phase of parsing, which no longer parses function bodies. Move the error checking to the `ParseDefsCtx` phase, which does parse function bodies. Fixes #6929.
* Add a --preserve-type-order option (#6916)Thomas Lively2024-09-101-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Unlike other module elements, types are not stored on the `Module`. Instead, they are collected by traversing the IR before printing and binary writing. The code that collects the types tries to optimize the order of rec groups based on the number of times each type is used. As a result, the output order of types generally has no relation to the input order of types. In addition, most type optimizations rewrite the types into a single large rec group, and the order of types in that group is essentially arbitrary. Changes to the code for counting type uses, sorting types, or sorting rec groups can yield very large changes in the output order of types, producing test diffs that are hard to review and potentially harming the readability of tests by moving output types away from the corresponding input types. To help make test output more stable and readable, introduce a tool option that causes the order of output types to match the order of input types as closely as possible. It is implemented by having the parsers record the indices of the input types on the `Module` just like they already record the type names. The `GlobalTypeRewriter` infrastructure used by type optimizations associates the new types with the old indices just like it already does for names and also respects the input order when rewriting types into a large recursion group. By default, wasm-opt and other tools clear the recorded type indices after parsing the module, so their default behavior is not modified by this change. Follow-on PRs will use the new flag in more tests, which will generate large diffs but leave the tests in stable, more readable states that will no longer change due to other changes to the optimizing type sorting logic.
* Fix a warning under gcc 14 (#6912)Scott Moser2024-09-091-0/+5
| | | Fixes: https://github.com/WebAssembly/binaryen/issues/6779
* Check for required actions when parsing wast (#6874)Thomas Lively2024-08-271-3/+11
| | | | | | | | | The parser function for `action` returned a `MaybeResult`, but we were treating it as returning a normal `Result` and not checking that it had contents in several places. Replace the current `action()` with `maybeAction()` and add a new `action()` that requires the action to be present. Fixes #6872.
* Support more reference constants in wast scripts (#6865)Thomas Lively2024-08-261-0/+28
| | | | | | | | | | | | | | Spec tests use constants like `ref.array` and `ref.eq` to assert that exported functions return references of the correct types. Support more such constants in the wast parser. Also fix a bug where the interpretation of `array.new_data` for arrays of packed fields was not properly truncating the packed data. Move the function for reading fields from memory from literal.cpp to wasm-interpreter.h, where the function for truncating packed data lives. Other bugs prevent us from enabling any more spec tests as a result of this change, but we can get farther through several of them before failing. Update the comments about the failures accordingly.
* Support `ref.extern n` in spec tests (#6858)Thomas Lively2024-08-211-1/+11
| | | | | | | | | | | | | | | | | Spec tests pass the value `ref.extern n`, where `n` is some integer, into exported functions that expect to receive externrefs and receive such values back out as return values. The payload serves to distinguish externrefs so the test can assert that the correct one was returned. Parse these values in wast scripts and represent them as externalized i31refs carrying the payload. We will need a different representation eventually, since some tests explicitly expect these externrefs to not be i31refs, but this suffices to get several new tests passing. To get the memory64 version of table_grow.wast passing, additionally fix the interpreter to handle growing 64-bit tables correctly. Delete the local versions of the upstream tests that can now be run successfully.
* Implement table.init (#6827)Alon Zakai2024-08-162-0/+25
| | | | | Also use TableInit in the interpreter to initialize module's table state, which will now handle traps properly, fixing #6431
* Add missing parser error check in makeArrayInitElem (#6835)Sofi Aberegg2024-08-131-0/+1
| | | Fixes #6833
* Typed continuations: update syntax of handler clauses (#6824)Frank Emrich2024-08-091-2/+2
| | | | | | | | | | | | | | | | | | | | | The syntax for handler clauses in `resume` instructions has recently changed, using `on` instead of `tag` now. Instead of ``` (resume $ct (tag $tag0 $block0) ... (tag $tagn $blockn)) ``` we now have ``` (resume $ct (on $tag0 $block0) ... (on $tagn $blockn)) ``` This PR adapts parsing, printing, and some tests accordingly. (Note that this PR deliberately makes none of the other changes that will arise from implementing the new, combined stack switching proposal, yet.)
* [NFC][parser] Rename deftype and subtype (#6819)Thomas Lively2024-08-074-36/+43
| | | | | | Match the current spec and clarify terminology by renaming the old `deftype` to `rectype` and renaming the old `subtype` to `typedef`. Also split the parser for actual `subtype` out of the parser for the newly named `typedef`.
* [parser] Fix bug when printing type builder errors (#6817)Thomas Lively2024-08-061-1/+1
| | | | | | The type index from the TypeBuilder error was mapped to a file location incorrectly, resulting in an assertion failure. Fixes #6816.
* Make source parser consistent with binary parser when naming things. NFC (#6813)Sam Clegg2024-08-061-2/+3
| | | | | The `timport$` prefix is already used for tables, so the binary parser currently uses `eimport$` to name tags (I guess because they are normally exception tags?).
* [threads] ref.i31_shared (#6735)Thomas Lively2024-07-123-7/+22
| | | | | | | Implement `ref.i31_shared`, the new instruction for creating references to shared i31s. Implement binary and text parsing and emitting as well as interpretation. Copy the upstream spec test for i31 and modify it so that all the heap types are shared. Comment out some parts that we do not yet support.
* Validate that names are valid UTF-8 (#6682)Thomas Lively2024-06-191-4/+5
| | | | | | Add an `isUTF8` utility and use it in both the text and binary parsers. Add missing checks for overlong encodings and overlarge code points in our WTF8 reader, which the new utility uses. Re-enable the spec tests that test UTF-8 validation.
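The checks mentioned above (overlong encodings and overlarge code points) can be illustrated with a standalone validator. This is an independent C++17 sketch named `isValidUTF8`, not Binaryen's `isUTF8` implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string_view>

// Minimal UTF-8 validator.
inline bool isValidUTF8(std::string_view s) {
  std::size_t i = 0;
  while (i < s.size()) {
    std::uint8_t lead = std::uint8_t(s[i]);
    std::size_t len = 0;
    std::uint32_t code = 0;
    std::uint32_t min = 0; // smallest code point that needs `len` bytes
    if (lead < 0x80) {
      len = 1;
      code = lead;
      min = 0;
    } else if ((lead & 0xE0) == 0xC0) {
      len = 2;
      code = lead & 0x1F;
      min = 0x80;
    } else if ((lead & 0xF0) == 0xE0) {
      len = 3;
      code = lead & 0x0F;
      min = 0x800;
    } else if ((lead & 0xF8) == 0xF0) {
      len = 4;
      code = lead & 0x07;
      min = 0x10000;
    } else {
      return false; // stray continuation byte or invalid lead byte
    }
    if (s.size() - i < len) {
      return false; // truncated sequence
    }
    for (std::size_t j = 1; j < len; ++j) {
      std::uint8_t cont = std::uint8_t(s[i + j]);
      if ((cont & 0xC0) != 0x80) {
        return false; // not a continuation byte
      }
      code = (code << 6) | (cont & 0x3F);
    }
    if (code < min) {
      return false; // overlong encoding
    }
    if (code > 0x10FFFF) {
      return false; // overlarge code point
    }
    if (code >= 0xD800 && code <= 0xDFFF) {
      return false; // surrogates are not valid in UTF-8
    }
    i += len;
  }
  return true;
}
```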
* Re-enable binary.wast spec test (#6677)Thomas Lively2024-06-181-0/+2
| | | | | | Fix the wast parser to accept IDs on quoted modules, remove tests that are invalidated by the multimemory proposal, and add validation that the total number of variables in a function is less than 2^32 and that the code section is present if there is a non-empty function section.
* [threads] Shared basic heap types (#6667)Thomas Lively2024-06-192-70/+114
| | | | | | | | | | | Implement binary and text parsing and printing of shared basic heap types and incorporate them into the type hierarchy. To avoid the massive amount of code duplication that would be necessary if we were to add separate enum variants for each of the shared basic heap types, use bit 0 to indicate whether the type is shared and replace `getBasic()` with `getBasic(Unshared)`, which clears that bit. Update all the use sites to record whether the original type was shared and produce shared or unshared output without code duplication.
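The bit-0 scheme can be pictured with the following sketch. The enumerator values, `ToyHeapType`, and `make` are hypothetical; only the idea of a share bit and a `getBasic(Unshared)` query comes from the message.

```cpp
#include <cassert>
#include <cstdint>

enum class Shareability : std::uint32_t { Unshared = 0, Shared = 1 };

// Hypothetical encoding: each basic heap type gets an even id, leaving
// bit 0 free to mark the shared variant of the same type.
enum class BasicHeapType : std::uint32_t {
  ext = 0 << 1,
  func = 1 << 1,
  any = 2 << 1,
  eq = 3 << 1,
};

struct ToyHeapType {
  std::uint32_t id;

  static ToyHeapType make(BasicHeapType basic, Shareability share) {
    return ToyHeapType{std::uint32_t(basic) | std::uint32_t(share)};
  }

  bool isShared() const { return (id & 1) != 0; }

  // Return the basic type with the requested shareability, regardless of
  // whether this type itself is shared. With Unshared this clears bit 0.
  BasicHeapType getBasic(Shareability share) const {
    return BasicHeapType((id & ~std::uint32_t(1)) | std::uint32_t(share));
  }
};
```

Code that switches over basic heap types can then call `getBasic(Shareability::Unshared)` once and handle shared and unshared inputs with the same cases.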
* [Parser] Fix error message on required reftype (#6666)Thomas Lively2024-06-141-9/+15
| | | | | | | | | Not all uses of the `reftype` parser handled the fact that it returned a `MaybeResult`. Change its name to `maybeReftype`, add a new `reftype` parser that returns an error if there is no reftype, and update all the use sites. Fixes #6655.
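The optional/required split can be sketched with `std::variant` (C++17). The `MaybeResult`, `Result`, `None`, and `Err` definitions below are simplified stand-ins, and the two accepted reftype spellings are only examples.

```cpp
#include <cassert>
#include <string>
#include <string_view>
#include <variant>

struct Err {
  std::string msg;
};
struct None {};

// Three-way outcome: a value, nothing at this position, or an error.
template<typename T> using MaybeResult = std::variant<T, None, Err>;
// Two-way outcome: a value or an error.
template<typename T> using Result = std::variant<T, Err>;

// Optional parser: the absence of a reftype is not an error.
inline MaybeResult<std::string> maybeReftype(std::string_view in) {
  if (in == "funcref" || in == "externref") {
    return std::string(in);
  }
  return None{};
}

// Required parser: converts "nothing here" into an error.
inline Result<std::string> reftype(std::string_view in) {
  auto parsed = maybeReftype(in);
  if (auto* err = std::get_if<Err>(&parsed)) {
    return *err;
  }
  if (std::holds_alternative<None>(parsed)) {
    return Err{"expected reftype"};
  }
  return std::get<std::string>(parsed);
}
```

Call sites that need a reftype use the required form and only have to handle two outcomes.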
* [Parser] Update requirements for implicit type uses (#6665)Thomas Lively2024-06-141-1/+1
| | | | | | | As an abbreviation, a `typeuse` can be given as just a list of parameters and results, in which case it corresponds to the index of the first function type with the same parameters and results. That function type must also be an MVP function type, i.e. it cannot have a nontrivial rec group, be non-final, or have a declared supertype. The parser did not previously implement all of these rules.
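The lookup rule can be sketched as a linear search over the defined types. `FuncTypeDef` and `findImplicitType` are hypothetical names, and value types are represented as plain strings for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Toy description of a defined function type.
struct FuncTypeDef {
  std::vector<std::string> params;
  std::vector<std::string> results;
  std::size_t recGroupSize = 1;
  bool isFinal = true;
  bool hasSupertype = false;
};

// Find the index of the first type an implicit typeuse may refer to:
// same signature, trivial rec group, final, and no declared supertype.
inline std::optional<std::size_t>
findImplicitType(const std::vector<FuncTypeDef>& types,
                 const std::vector<std::string>& params,
                 const std::vector<std::string>& results) {
  for (std::size_t i = 0; i < types.size(); ++i) {
    const auto& t = types[i];
    bool isMVP = t.recGroupSize == 1 && t.isFinal && !t.hasSupertype;
    if (isMVP && t.params == params && t.results == results) {
      return i;
    }
  }
  return std::nullopt;
}
```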
* [threads] Binary reading and writing of shared composite types (#6664)Thomas Lively2024-06-141-1/+2
| | | | Also update the parser so that implicit type uses are not matched with shared function types.
* [Parser][NFC] Make typeidx and maybeTypeidx return consistent types (#6663)Thomas Lively2024-06-142-13/+12
| | | | | | | Since the BasicHeapTypes are in an enum, calling HeapType methods on them requires something like `HeapType(HeapType::func).someMethod()`. This is unnecessarily verbose, so add a new `HeapTypes` namespace that contains constexpr HeapType globals that can be used instead, shortening this to `HeapTypes::func.someMethod()`.
* [threads] Parse, build, and print shared composite types (#6654)Thomas Lively2024-06-122-10/+29
| | | | | | | | | | | | | | Parse the text format for shared composite types as described in the shared-everything-threads proposal. Update the parser to use 'comptype' instead of 'strtype' to match the final GC spec and add the new syntactic class 'sharecomptype'. Update the type canonicalization logic to take sharedness into account to avoid merging shared and unshared types. Make the same change in the TypeMerging pass. Ensure that shared and unshared types cannot be in a subtype relationship with each other. Follow-up PRs will add shared abstract heap types, binary parsing and emitting for shared types, and fuzzer support for shared types.
* [Parser][NFC] Split parser into multiple compilation units (#6653)Thomas Lively2024-06-129-167/+369
| | | | | | | | | | | | Because the parser has five stages, it requires instantiating all of the templates in parsers.h with up to five different contexts. Instantiating all those templates in a single compilation unit takes a long time. On my machine, a release build of wat-parser.cpp.o took 32 seconds. To reduce the time of incremental rebuilds on machines with many cores, split the code across several compilation units so that the templates need to be instantiated for just a single context in each unit. On my machine the longest compilation time after this splitting is 17 seconds. The time for a full release build also drops from 42 seconds to 33 seconds. On machines with fewer cores, the benefit may be smaller or even negative, though.
* Remove obsolete parser code (#6607)Thomas Lively2024-05-291-2/+0
| | | | | Remove `SExpressionParser`, `SExpressionWasmBuilder`, and `cashew::Parser`. Simplify gen-s-parser.py. Remove the --new-wat-parser and --deprecated-wat-parser flags.
* Use new wast parser in wasm2js (#6606)Thomas Lively2024-05-291-2/+6
| | | | When generating assertions, traverse the `WASTScript` data structure rather than interleaving assertion parsing with emitting.
* Rewrite wasm-shell to use new wast parser (#6601)Thomas Lively2024-05-176-43/+52
| | | | | | | | | | | | | | | | | | Use the new wast parser to parse a full script up front, then traverse the parsed script data structure and execute the commands. wasm-shell had previously used the new wat parser for top-level modules, but it now uses the new parser for module assertions as well. Fix various bugs this uncovered. After this change, wasm-shell supports all the assertions used in the upstream spec tests (although not new kinds of assertions introduced in any proposals). Uncomment various `assert_exhaustion` tests that we can now execute. Other kinds of assertions remain commented out in our tests: wasm-shell now supports `assert_unlinkable`, but the interpreter does not eagerly check for the existence of imports, so those tests do not pass. Tests that check for NaNs also remain commented out because they do not yet use the standard syntax that wasm-shell now supports for canonical and arithmetic NaN results, and our interpreter would not pass all of those tests even if they did use the standard syntax.
* Debug location parser: accept arbitrary paths (#6594)Jérôme Vouillon2024-05-151-15/+14
| | | | | The whole annotation was parsed as a keyword, which prevented file paths with non-ASCII characters or paths starting with `/` or `.`. Also, there was a typo: the code compared `fileSize` rather than `lineSize` to `contents->npos`.
* [Strings] Remove operations not included in imported strings (#6589)Thomas Lively2024-05-152-69/+12
| | | | | | The stringref proposal has been superseded by the imported JS strings proposal, but the former has many more operations than the latter. To reduce complexity, remove all operations that are part of stringref but not part of imported strings.
* [Strings] Remove stringview types and instructions (#6579)Thomas Lively2024-05-153-122/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | The stringview types from the stringref proposal have three irregularities that break common invariants and require pervasive special casing to handle properly: they are supertypes of `none` but not subtypes of `any`, they cannot be the targets of casts, and they cannot be used to construct nullable references. At the same time, the stringref proposal has been superseded by the imported strings proposal, which does not have these irregularities. The cost of maintaining and improving our support for stringview types is no longer worth the benefit of supporting them. Simplify the code base by entirely removing the stringview types and related instructions that do not have analogues in the imported strings proposal and do not make sense in the absence of stringviews. Three remaining instructions, `stringview_wtf16.get_codeunit`, `stringview_wtf16.slice`, and `stringview_wtf16.length` take stringview operands in the stringref proposal but cannot be removed because they lower to operations from the imported strings proposal. These instructions are changed to take stringref operands in Binaryen IR, and to allow a graceful upgrade path for users of these instructions, the text and binary parsers still accept but ignore `string.as_wtf16`, which is the instruction used to convert stringrefs to stringviews. The binary writer emits code sequences that use scratch locals and `string.as_wtf16` to keep the output valid. Future PRs will further align binaryen with the imported strings proposal instead of the stringref proposal, for example by making `string` a subtype of `extern` instead of a subtype of `any` and by removing additional instructions that do not have analogues in the imported strings proposal.
* Source maps: Allow specifying that an expression has no debug info in text (#6520)Jérôme Vouillon2024-05-141-1/+7
| | | | | | | | | | | | ;;@ with nothing else (no source:line) can be used to specify that the following expression does not have any debug info associated to it. This can be used to stop the automatic propagation of debug info in the text parsers. The text printer has also been updated to output this comment when needed.
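A sketch of how such an annotation might be interpreted is shown below. It assumes the annotation body has the shape `file:line:col` and that the text after `;;@` has already been trimmed; `Annotation`, `DebugLoc`, `parseNum`, and `parseAnnotation` are hypothetical names, not Binaryen's parser API.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <string_view>

struct DebugLoc {
  std::string file;
  unsigned line;
  unsigned col;
};

struct Annotation {
  bool valid = false;          // false if the text is malformed
  std::optional<DebugLoc> loc; // empty means "no debug info"
};

// Parse a short decimal number made only of digits.
inline std::optional<unsigned> parseNum(std::string_view s) {
  if (s.empty() || s.size() > 9) {
    return std::nullopt;
  }
  unsigned n = 0;
  for (char c : s) {
    if (c < '0' || c > '9') {
      return std::nullopt;
    }
    n = n * 10 + unsigned(c - '0');
  }
  return n;
}

// `text` is what follows `;;@`, with surrounding whitespace removed.
inline Annotation parseAnnotation(std::string_view text) {
  Annotation result;
  if (text.empty()) {
    result.valid = true; // bare `;;@`: explicitly no debug info
    return result;
  }
  // Split from the right so the path itself may contain any characters.
  auto colPos = text.rfind(':');
  if (colPos == std::string_view::npos || colPos == 0) {
    return result;
  }
  auto linePos = text.rfind(':', colPos - 1);
  if (linePos == std::string_view::npos || linePos == 0) {
    return result;
  }
  auto line = parseNum(text.substr(linePos + 1, colPos - linePos - 1));
  auto col = parseNum(text.substr(colPos + 1));
  if (!line || !col) {
    return result;
  }
  result.valid = true;
  result.loc = DebugLoc{std::string(text.substr(0, linePos)), *line, *col};
  return result;
}
```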
* [Parser] Parse wast scripts (#6581)Thomas Lively2024-05-135-1/+518
| | | | | | | | | | | The spec tests use an extension of the standard text format that includes various commands and assertions used to test WebAssembly implementations. Add a utility to parse this extended WebAssembly script format and use it in wasm-shell to check that it parses our spec tests without error. Fix a few errors the new parser found in our spec tests. A future PR will rewrite wasm-shell to interpret the results of the new parser, but for now to keep the diff smaller, do not do anything with the new parser except check for errors.
* [memory64] Add table64 to existing memory64 support (#6577)Sam Clegg2024-05-103-27/+58
| | | | | | | Testing is still very limited. Hopefully we can use the upstream spec tests soon and avoid having to write our own tests for `.get/.set/.fill/etc`. See https://github.com/WebAssembly/memory64/issues/51
* [Parser][NFC] Clean up the lexer index/pos API (#6553)Thomas Lively2024-04-293-32/+30
| | | | | The lexer previously had both `getPos` and `getIndex` APIs that did different things, but after a recent refactoring there is no difference between the index and the position. Deduplicate the API surface.
* [Parser] Do not eagerly lex numbers (#6544)Thomas Lively2024-04-252-293/+141
| | | | Lex integers and floats on demand to avoid wasted work. Remove `Token` completely now that all kinds of tokens are lexed on demand.
* [Parser] Do not eagerly lex strings (#6543)Thomas Lively2024-04-252-49/+25
| | | Lex them on demand instead to avoid wasted work.
* [Parser] Do not eagerly lex IDs (#6542)Thomas Lively2024-04-252-43/+23
| | | Lex them on demand instead to avoid wasted work.
* [Parser] Do not eagerly lex keywords (#6541)Thomas Lively2024-04-252-85/+56
| | | Lex them on demand instead to avoid wasted work.
* [Parser] Do not eagerly lex parens (#6540)Thomas Lively2024-04-252-65/+36
| | | | | | | | | | | The lexer currently lexes tokens eagerly and stores them in a `Token` variant ahead of when they are actually requested by the parser. It is wasteful, however, to classify tokens before they are requested by the parser because it is likely that the next token will be precisely the kind the parser requests. The work of checking and rejecting other possible classifications ahead of time is not useful. To make incremental progress toward removing `Token` completely, lex parentheses on demand instead of eagerly.
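The on-demand style can be illustrated with a toy lexer in which the parser asks for a specific kind of token and the lexer checks only for that kind. `ToyLexer` and its keyword character set are simplifications assumed for this sketch (C++17).

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <optional>
#include <string_view>

// Toy lexer that classifies input only when the parser asks for a
// specific kind of token.
class ToyLexer {
  std::string_view buffer;
  std::size_t pos = 0;

  void skipSpace() {
    while (pos < buffer.size() &&
           std::isspace(static_cast<unsigned char>(buffer[pos]))) {
      ++pos;
    }
  }

public:
  explicit ToyLexer(std::string_view in) : buffer(in) {}

  // Consume "(" if it is next; otherwise consume nothing but whitespace.
  bool takeLParen() {
    skipSpace();
    if (pos < buffer.size() && buffer[pos] == '(') {
      ++pos;
      return true;
    }
    return false;
  }

  // Consume a run of lowercase letters, dots, and underscores if present.
  std::optional<std::string_view> takeKeyword() {
    skipSpace();
    std::size_t end = pos;
    while (end < buffer.size() &&
           (std::islower(static_cast<unsigned char>(buffer[end])) ||
            buffer[end] == '.' || buffer[end] == '_')) {
      ++end;
    }
    if (end == pos) {
      return std::nullopt;
    }
    auto keyword = buffer.substr(pos, end - pos);
    pos = end;
    return keyword;
  }
};
```

No token object is built ahead of time, so a successful request does exactly one classification check.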
* [Parser] Enable the new text parser by default (#6371)Thomas Lively2024-04-251-0/+2
| | | | | | | | | | | | | | The new text parser is faster and more standards compliant than the old text parser. Enable it by default in wasm-opt and update the tests to reflect the slightly different results it produces. Besides following the spec, the new parser differs from the old parser in that it: - Does not synthesize `loop` and `try` labels unnecessarily - Synthesizes different block names in some cases - Parses exports in a different order - Parses `nop`s instead of empty blocks for empty control flow arms - Does not support parsing Poppy IR - Produces different error messages - Cannot parse `pop` except as the first instruction inside a `catch`
* [Parser] Use the new parser in wasm-shell and wasm-as (#6529)Thomas Lively2024-04-244-16/+35
| | | | | | | | | | | | | | | | | | | Updating just one or the other of these tools would cause the tests spec/import-after-*.fail.wast to fail, since only the updated tool would correctly fail to parse its contents. To avoid this, update both tools at once. (The tests erroneously pass before this change because check.py does not ensure that .fail.wast tests fail, only that failing tests end in .fail.wast.) In wasm-shell, to minimize the diff, only use the new parser to parse modules and instructions. Continue using the legacy parsing based on s-expressions for the other wast commands. Updating the parsing of the other commands to use `Lexer` instead of `SExpressionParser` is left as future work. The boundary between the two parsing styles is somewhat hacky, but it is worth it to enable incremental development. Update the tests to fix incorrect wast rejected by the new parser. Many of the spec/old_* tests use non-standard forms from before Wasm MVP was standardized, so fixing them would have been onerous. All of these tests have non-old_* variants, so simply delete them.
* DebugLocationPropagation: pass debuglocation from parent node to chil… (#6500)许鑫权2024-04-211-45/+1
| | | | | | | | | | | | | | | | | | | This PR creates a pass that propagates debug locations from a parent node to child nodes that have no debug location, using a pre-order traversal. This is useful for compilers that use the Binaryen API to generate WebAssembly modules. It behaves like `wasm-opt` reading a text format file: children are tagged with the debug info of the parent if they have no annotation of their own. For such compilers, it is redundant to add debugInfo for each expression, especially when the compiler wraps expressions. With this pass, compilers just need to add debugInfo for the parent node, which is more convenient. For example: ``` (drop (call $voidFunc) ) ``` Without this pass, if the compiler only adds debugInfo for the wrapping expression `drop`, the `call` expression has no corresponding source code mapping when debugging in DevTools, which is not user-friendly.
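The propagation rule can be sketched on a toy expression tree. `Node` and `propagate` are hypothetical, and a single line number stands in for a full debug location.

```cpp
#include <cassert>
#include <memory>
#include <optional>
#include <vector>

struct Node {
  std::optional<int> debugLine; // stand-in for a full debug location
  std::vector<std::unique_ptr<Node>> children;
};

// Pre-order walk: a node without its own location inherits the nearest
// annotated ancestor's location before its children are visited.
inline void propagate(Node& node,
                      std::optional<int> inherited = std::nullopt) {
  if (!node.debugLine) {
    node.debugLine = inherited;
  }
  for (auto& child : node.children) {
    propagate(*child, node.debugLine);
  }
}
```

In the `drop`/`call` example, annotating only the `drop` node is then enough for the `call` child to receive the same location.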
* [Parser][NFC] Do less work when parsing function types (#6516)Thomas Lively2024-04-192-3/+11
| | | | | | | After the initial parsing pass to find the locations of all the module elements and after the type definitions have been parsed, the next phase of parsing is to visit all of the module elements and parse their types. This phase does not require parsing function bodies, but it previously parsed entire functions anyway for simplicity. To improve performance, skip that useless work.