path: root/test/gtest/wat-lexer.cpp
Commit message | Author | Age | Files | Lines
* [Parser] Do not eagerly lex parens (#6540) | Thomas Lively | 2024-04-25 | 1 | -3/+0
  The lexer currently lexes tokens eagerly and stores them in a `Token` variant ahead of when they are actually requested by the parser. It is wasteful, however, to classify tokens before they are requested by the parser because it is likely that the next token will be precisely the kind the parser requests. The work of checking and rejecting other possible classifications ahead of time is not useful. To make incremental progress toward removing `Token` completely, lex parentheses on demand instead of eagerly.
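As a rough illustration of lexing on demand, the sketch below checks for a parenthesis only when the caller asks for one, rather than classifying the next token ahead of time. `SketchLexer` and its members are hypothetical names for this example and are not taken from the Binaryen sources.

```cpp
#include <cctype>
#include <cstddef>
#include <string_view>

// Hypothetical on-demand lexer sketch; not Binaryen's actual implementation.
class SketchLexer {
  std::string_view input;
  std::size_t pos = 0;

  // Skip whitespace so that `pos` points at the start of the next token.
  void skipSpace() {
    while (pos < input.size() &&
           std::isspace(static_cast<unsigned char>(input[pos]))) {
      ++pos;
    }
  }

  // Consume `c` only if it is the next non-whitespace character.
  bool takeChar(char c) {
    skipSpace();
    if (pos < input.size() && input[pos] == c) {
      ++pos;
      return true;
    }
    return false;
  }

public:
  explicit SketchLexer(std::string_view in) : input(in) {}

  // No token is classified until the parser asks for a specific kind.
  bool takeLParen() { return takeChar('('); }
  bool takeRParen() { return takeChar(')'); }

  // True once only whitespace (or nothing) remains.
  bool empty() {
    skipSpace();
    return pos == input.size();
  }
};
```

A failed request such as `takeRParen()` on an opening parenthesis consumes nothing, so the caller can simply try another token kind next.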
* [Parser] Simplify the lexer interface (#6319) | Thomas Lively | 2024-02-20 | 1 | -1432/+831
  The lexer was previously an iterator over tokens, but that expressivity is not actually used in the parser. Instead, we have `input.h` that adapts the token iterator interface into an interface that is actually useful. As a first step toward simplifying the lexer implementation to no longer be an iterator over tokens, update its interface by moving the adaptation from `input.h` to the lexer itself. This requires extensive changes to the lexer unit tests, which will not have to change further when we actually simplify the lexer implementation.
* [Parser] Support string-style identifiers (#6278) | Thomas Lively | 2024-02-06 | 1 | -0/+27
  In addition to normal identifiers, support parsing identifiers of the format `$"..."`. This format is not yet allowed by the standard, but it is a popular proposed extension (see https://github.com/WebAssembly/spec/issues/617 and https://github.com/WebAssembly/annotations/issues/21). Binaryen has historically allowed a similar format and has supported arbitrary non-standard identifier characters, so it's much easier to support this extended syntax than to fix everything to use the restricted standard syntax.
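To make the `$"..."` format concrete, here is a minimal sketch of recognizing such an identifier at the start of some input. `IdResult` and `lexStringId` are hypothetical names, and the sketch assumes the contents contain no escape sequences, which a real lexer would have to handle.

```cpp
#include <cstddef>
#include <optional>
#include <string_view>

// Hypothetical result type for this sketch.
struct IdResult {
  std::string_view name; // identifier contents without `$` or the quotes
  std::size_t length;    // number of input characters consumed
};

// Recognize `$"..."` at the start of `in`. Assumption: no escape sequences
// appear between the quotes, so the next `"` always ends the identifier.
std::optional<IdResult> lexStringId(std::string_view in) {
  if (in.size() < 3 || in[0] != '$' || in[1] != '"') {
    return std::nullopt;
  }
  std::size_t close = in.find('"', 2);
  if (close == std::string_view::npos) {
    return std::nullopt; // unterminated string
  }
  return IdResult{in.substr(2, close - 2), close + 1};
}
```

Under these assumptions, `lexStringId("$\"hello world\" rest")` yields the name `hello world`, while a plain identifier such as `$foo` is left for the normal identifier path.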
* [Parser] Templatize lexing of integers (#6272) | Thomas Lively | 2024-02-05 | 1 | -166/+166
  Have a single implementation for lexing each of unsigned, signed, and uninterpreted integers, each generic over the bit width of the integer. This reduces duplication in the existing code and it will make it much easier to support lexing more 8- and 16-bit integers.
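The idea of one implementation that is generic over bit width can be sketched as follows for the unsigned case. `lexUnsigned` is a hypothetical name, and the sketch assumes plain decimal digits only (no hexadecimal, underscores, or signs).

```cpp
#include <cstdint>
#include <limits>
#include <optional>
#include <string_view>
#include <type_traits>

// Hypothetical helper: parse decimal digits into an unsigned integer of
// type T, rejecting values that do not fit in T's bit width.
template<typename T>
std::optional<T> lexUnsigned(std::string_view text) {
  static_assert(std::is_unsigned_v<T>, "expected an unsigned integer type");
  if (text.empty()) {
    return std::nullopt;
  }
  constexpr std::uint64_t max = std::numeric_limits<T>::max();
  std::uint64_t value = 0;
  for (char c : text) {
    if (c < '0' || c > '9') {
      return std::nullopt; // not a decimal digit
    }
    std::uint64_t digit = c - '0';
    // Reject before overflowing: value * 10 + digit must not exceed max.
    if (value > (max - digit) / 10) {
      return std::nullopt;
    }
    value = value * 10 + digit;
  }
  return static_cast<T>(value);
}
```

The same function then serves every width: `lexUnsigned<std::uint8_t>("255")` succeeds and `lexUnsigned<std::uint8_t>("256")` is rejected, with no 8-bit-specific code.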
* [NFC] Split the new wat parser into multiple files (#5960) | Thomas Lively | 2023-09-19 | 1 | -1/+1
  And put the new files in a new source directory, "parser". This is a rough split and is not yet expected to dramatically improve compile times. The exact organization of the new files is subject to change, but this splitting should be enough to make further parser development more pleasant.
* Replace more uses of `NAN` (#5354) | Thomas Lively | 2022-12-15 | 1 | -2/+2
  MSVC is making `NAN` negative, so use an explicitly constructed positive NaN instead.
* Remove more uses of NAN (#5310) | Thomas Lively | 2022-12-02 | 1 | -8/+10
  In favor of the more portable code snippet using `std::copysign`. Also reintroduce assertions that the NaNs have the expected signs. This continues work started in #5302.
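One way to build NaNs with a known sign using `std::copysign` is sketched below. The helper names `positiveNaN` and `negativeNaN` are hypothetical; the exact snippet used in the tests may differ.

```cpp
#include <cmath>
#include <limits>

// Hypothetical helpers: construct a quiet NaN whose sign bit is set
// explicitly, rather than relying on the sign of the `NAN` macro.
inline float positiveNaN() {
  return std::copysign(std::numeric_limits<float>::quiet_NaN(), 1.0f);
}

inline float negativeNaN() {
  return std::copysign(std::numeric_limits<float>::quiet_NaN(), -1.0f);
}
```

The sign can then be asserted with `std::signbit`, for example `!std::signbit(positiveNaN())`.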
* [Parser][NFC] Small code cleanups (#4729) | Thomas Lively | 2022-06-14 | 1 | -184/+184
  Apply cleanups suggested by aheejin in post-merge code review of previous parser PRs.
* [Parser] Begin parsing modules (#4716) | Thomas Lively | 2022-06-10 | 1 | -1/+1
  Implement the basic infrastructure for the full WAT parser with just enough detail to parse basic modules that contain only imported globals. Parsing functions correspond to elements of the grammar in the text specification and are templatized over context types that correspond to each phase of parsing. Errors are explicitly propagated via `Result<T>` and `MaybeResult<T>` types. Follow-on PRs will implement additional phases of parsing and parsing for new elements in the grammar.
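Explicit error propagation of this kind might look roughly like the sketch below, which builds both types on `std::variant`. Only the names `Result<T>` and `MaybeResult<T>` come from the commit message; the members, the `Err` and `None` structs, and the `maybeDigit`/`digit` functions are assumptions made for illustration.

```cpp
#include <string>
#include <utility>
#include <variant>

// Assumed helper types for this sketch.
struct Err {
  std::string msg;
};
struct None {};

// Either a value or an error.
template<typename T> struct Result {
  std::variant<T, Err> val;
  Result(T v) : val(std::move(v)) {}
  Result(Err e) : val(std::move(e)) {}
  const Err* getErr() const { return std::get_if<Err>(&val); }
  const T& operator*() const { return std::get<T>(val); }
};

// A value, an error, or "nothing matched" (not an error by itself).
template<typename T> struct MaybeResult {
  std::variant<T, None, Err> val;
  MaybeResult() : val(None{}) {}
  MaybeResult(T v) : val(std::move(v)) {}
  MaybeResult(Err e) : val(std::move(e)) {}
  const Err* getErr() const { return std::get_if<Err>(&val); }
  const T* getPtr() const { return std::get_if<T>(&val); }
};

// Hypothetical optional grammar element: a single decimal digit.
MaybeResult<int> maybeDigit(char c) {
  if (c >= '0' && c <= '9') {
    return c - '0';
  }
  return {}; // None: the input is simply not a digit
}

// Hypothetical required grammar element: absence becomes an error.
Result<int> digit(char c) {
  auto d = maybeDigit(c);
  if (auto* err = d.getErr()) {
    return Err{*err}; // propagate the error explicitly
  }
  if (auto* v = d.getPtr()) {
    return *v;
  }
  return Err{"expected digit"};
}
```

The split lets a caller distinguish "this optional element is absent" from "this element is malformed" without exceptions.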
* [Parser] Token classification (#4699) | Thomas Lively | 2022-06-01 | 1 | -8/+521
  Add methods to `Token` for determining whether the token can be interpreted as a particular token type, returning the interpreted value as appropriate. These methods perform additional bounds checks for integers and NaN payloads that could not be done during the initial lexing because the lexer did not know what the intended token type was. The float methods also reinterpret integer tokens as floating point tokens since the float grammar is a superset of the integer grammar and inject the NaN payloads into parsed NaN values. Move all bounds checking to these new classifier functions to have it in one place.
* [Parser] Replace Signedness with ternary Sign (#4698) | Thomas Lively | 2022-05-27 | 1 | -42/+42
  Previously we were tracking whether integer tokens were signed but we did not differentiate between positive and negative signs. Unfortunately, without differentiating them, there's no way to tell the difference between an in-bounds negative integer and a wildly out-of-bounds positive integer when trying to perform bounds checks for s32 tokens. Fix the problem by tracking not only whether there is a sign on an integer token, but also what the sign is.
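The role of the sign in an s32 bounds check can be sketched as follows. The `Sign` enumerators, the `IntTok` layout, and `fitsS32` are assumptions for this example; in particular, the sketch stores the magnitude of the integer, which may not match how the real token stores its value.

```cpp
#include <cstdint>

// Ternary sign: no sign written, explicit `+`, or explicit `-`.
enum class Sign { None, Pos, Neg };

// Hypothetical integer token. Assumption: `n` holds the magnitude.
struct IntTok {
  std::uint64_t n;
  Sign sign;
};

// Check whether the token is a valid signed 32-bit integer.
bool fitsS32(const IntTok& tok) {
  constexpr std::uint64_t maxPos = 2147483647ull; // 2^31 - 1
  constexpr std::uint64_t maxNeg = 2147483648ull; // 2^31
  if (tok.sign == Sign::Neg) {
    return tok.n <= maxNeg;
  }
  return tok.n <= maxPos;
}
```

With only a signed/unsigned flag, `-2147483648` and `+2147483648` would look the same to this check; with the ternary sign the first is accepted and the second is rejected.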
* [Parser][NFC] Create a public wat-lexer.h header (#4695) | Thomas Lively | 2022-05-27 | 1 | -0/+1004
  wat-parser-internal.h was already quite large after implementing just the lexer, so it made sense to rename it to be lexer-specific and start a new file for the higher-level parser. Also make it a proper .cpp file and split the testable interface out into wat-lexer.h.