| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
| |
The lexer currently lexes tokens eagerly and stores them in a `Token` variant
ahead of when they are actually requested by the parser. It is wasteful,
however, to classify tokens before they are requested by the parser because it
is likely that the next token will be precisely the kind the parser requests.
The work of checking and rejecting other possible classifications ahead of time
is not useful.
To make incremental progress toward removing `Token` completely, lex parentheses
on demand instead of eagerly.
|
|
|
|
|
|
|
|
|
|
|
| |
The lexer was previously an iterator over tokens, but that expressivity is not
actually used in the parser. Instead, we have `input.h` that adapts the token
iterator interface into an iterface that is actually useful.
As a first step toward simplifying the lexer implementation to no longer be an
iterator over tokens, update its interface by moving the adaptation from input.h
to the lexer itself. This requires extensive changes to the lexer unit tests,
which will not have to change further when we actually simplify the lexer
implementation.
|
|
|
|
|
|
|
|
|
|
| |
In addition to normal identifiers, support parsing identifiers of the format
`$"..."`. This format is not yet allowed by the standard, but it is a popular
proposed extension (see https://github.com/WebAssembly/spec/issues/617 and
https://github.com/WebAssembly/annotations/issues/21).
Binaryen has historically allowed a similar format and has supported arbitrary
non-standard identifier characters, so it's much easier to support this extended
syntax than to fix everything to use the restricted standard syntax.
|
|
|
|
|
|
| |
Have a single implementation for lexing each of unsigned, signed, and
uninterpreted integers, each generic over the bit width of the integer. This
reduces duplication in the existing code and it will make it much easier to
support lexing more 8- and 16-bit integers.
|
|
|
|
|
|
| |
And put the new files in a new source directory, "parser". This is a rough split
and is not yet expected to dramatically improve compile times. The exact
organization of the new files is subject to change, but this splitting should be
enough to make further parser development more pleasant.
|
|
|
|
| |
MSVC is making `NAN` negative, so use an explicitly constructed positive NaN
instead.
|
|
|
|
|
| |
In favor of the more portable code snippet using `std::copysign`. Also
reintroduce assertions that the NaNs have the expected signs. This continues
work started in #5302.
|
|
|
|
| |
Apply cleanups suggested by aheejin in post-merge code review of previous
parser PRs.
|
|
|
|
|
|
|
|
|
|
|
| |
Implement the basic infrastructure for the full WAT parser with just enough
detail to parse basic modules that contain only imported globals. Parsing
functions correspond to elements of the grammar in the text specification and
are templatized over context types that correspond to each phase of parsing.
Errors are explicitly propagated via `Result<T>` and `MaybeResult<T>` types.
Follow-on PRs will implement additional phases of parsing and parsing for new
elements in the grammar.
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add methods to `Token` for determining whether the token can be interpreted as a
particular token type, returning the interpreted value as appropriate. These
methods perform additional bounds checks for integers and NaN payloads that
could not be done during the initial lexing because the lexer did not know what
the intended token type was. The float methods also reinterpret integer tokens
as floating point tokens since the float grammar is a superset of the integer
grammar and inject the NaN payloads into parsed NaN values.
Move all bounds checking to these new classifier functions to have it in one
place.
|
|
|
|
|
|
|
|
| |
Previously we were tracking whether integer tokens were signed but we did not
differentiate between positive and negative signs. Unfortunately, without
differentiating them, there's no way to tell the difference between an in-bounds
negative integer and a wildly out-of-bounds positive integer when trying to
perform bounds checks for s32 tokens. Fix the problem by tracking not only
whether there is a sign on an integer token, but also what the sign is.
|
|
wat-parser-internal.h was already quite large after implementing just the lexer,
so it made sense to rename it to be lexer-specific and start a new file for the
higher-level parser. Also make it a proper .cpp file and split the testable
interface out into wat-lexer.h.
|