| Commit message | Author | Age | Files | Lines |
|
Add an `isUTF8` utility and use it in both the text and binary parsers.
Add missing checks for overlong encodings and overlarge code points in
our WTF8 reader, which the new utility uses. Re-enable the spec tests
that test UTF-8 validation.
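
As a rough illustration of the two checks named above (overlong encodings and overlarge code points), here is a minimal standalone sketch of a strict UTF-8 validator. The name `isValidUTF8Sketch` is hypothetical and this is not the code from the change itself. It also rejects surrogate code points, which strict UTF-8 requires but a WTF-8 reader would accept.

```cpp
#include <cstddef>
#include <cstdint>
#include <string_view>

// Hypothetical helper (not Binaryen's implementation): returns true if
// `bytes` is well-formed, strict UTF-8.
bool isValidUTF8Sketch(std::string_view bytes) {
  std::size_t i = 0;
  while (i < bytes.size()) {
    auto lead = static_cast<std::uint8_t>(bytes[i]);
    std::size_t trailing = 0;    // number of continuation bytes expected
    std::uint32_t codePoint = 0; // decoded value so far
    std::uint32_t minimum = 0;   // smallest value that needs this length
    if (lead < 0x80) {
      ++i; // single-byte (ASCII) sequence
      continue;
    } else if ((lead & 0xE0) == 0xC0) {
      trailing = 1;
      codePoint = lead & 0x1F;
      minimum = 0x80;
    } else if ((lead & 0xF0) == 0xE0) {
      trailing = 2;
      codePoint = lead & 0x0F;
      minimum = 0x800;
    } else if ((lead & 0xF8) == 0xF0) {
      trailing = 3;
      codePoint = lead & 0x07;
      minimum = 0x10000;
    } else {
      return false; // stray continuation byte or invalid lead byte
    }
    if (bytes.size() - i - 1 < trailing) {
      return false; // sequence is truncated
    }
    for (std::size_t j = 1; j <= trailing; ++j) {
      auto next = static_cast<std::uint8_t>(bytes[i + j]);
      if ((next & 0xC0) != 0x80) {
        return false; // not a continuation byte
      }
      codePoint = (codePoint << 6) | (next & 0x3F);
    }
    if (codePoint < minimum) {
      return false; // overlong encoding
    }
    if (codePoint > 0x10FFFF) {
      return false; // overlarge code point
    }
    if (codePoint >= 0xD800 && codePoint <= 0xDFFF) {
      return false; // surrogate: allowed in WTF-8, not in strict UTF-8
    }
    i += trailing + 1;
  }
  return true;
}
```

For example, the two-byte sequence `C0 80` decodes to code point 0, which fits in one byte, so the sketch rejects it as overlong.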
|
The lexer previously had both `getPos` and `getIndex` APIs that did different
things, but after a recent refactoring there is no difference between the index
and the position. Deduplicate the API surface.
|
Lex integers and floats on demand to avoid wasted work. Remove `Token`
completely now that all kinds of tokens are lexed on demand.
|
Lex them on demand instead to avoid wasted work.
|
Lex them on demand instead to avoid wasted work.
|
Lex them on demand instead to avoid wasted work.
|
The lexer currently lexes tokens eagerly and stores them in a `Token` variant
ahead of when they are actually requested by the parser. It is wasteful,
however, to classify tokens before they are requested by the parser because it
is likely that the next token will be precisely the kind the parser requests.
The work of checking and rejecting other possible classifications ahead of time
is not useful.

To make incremental progress toward removing `Token` completely, lex parentheses
on demand instead of eagerly.
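
The idea can be sketched with a miniature lexer in which the parser asks for a specific token kind and the lexer attempts only that one classification. The class `MiniLexer` and its method names are hypothetical and only illustrate the approach; they are not the lexer's actual interface.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical miniature lexer (illustrative only): nothing is classified
// until the parser asks for a particular kind of token.
class MiniLexer {
  std::string_view buffer;
  std::size_t pos = 0;

  void skipSpace() {
    while (pos < buffer.size() &&
           (buffer[pos] == ' ' || buffer[pos] == '\n' || buffer[pos] == '\t')) {
      ++pos;
    }
  }

  // Consume `c` if it is the next character, then skip trailing whitespace.
  bool takeChar(char c) {
    if (pos < buffer.size() && buffer[pos] == c) {
      ++pos;
      skipSpace();
      return true;
    }
    return false;
  }

public:
  explicit MiniLexer(std::string_view input) : buffer(input) { skipSpace(); }

  // Each request checks exactly one classification; no `Token` is stored.
  bool takeLParen() { return takeChar('('); }
  bool takeRParen() { return takeChar(')'); }
  bool empty() const { return pos == buffer.size(); }
};
```

A failed request leaves the position unchanged, so the parser can go on to ask for a different kind of token.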
|
Updating just one or the other of these tools would cause the tests
spec/import-after-*.fail.wast to fail, since only the updated tool would
correctly fail to parse their contents. To avoid this, update both tools at
once. (The tests erroneously pass before this change because check.py does not
ensure that .fail.wast tests fail, only that failing tests end in .fail.wast.)

In wasm-shell, to minimize the diff, only use the new parser to parse modules
and instructions. Continue using the legacy parsing based on s-expressions for
the other wast commands. Updating the parsing of the other commands to use
`Lexer` instead of `SExpressionParser` is left as future work. The boundary
between the two parsing styles is somewhat hacky, but it is worth it to enable
incremental development.

Update the tests to fix incorrect wast rejected by the new parser. Many of the
spec/old_* tests use non-standard forms from before Wasm MVP was standardized,
so fixing them would have been onerous. All of these tests have non-old_*
variants, so simply delete them.
|
Parse annotations using the standards-track `(@annotation ...)` format as well
as the `;;@ source-map:0:1` format. Have the lexer implicitly collect
annotations while it skips whitespace and add lexer APIs to access the
annotations since the last token was parsed. Collect annotations before parsing
each instruction and pass the annotations explicitly to the parser and parser
context functions for instructions. Add an API to `IRBuilder` to set a debug
location to be attached to the next visited or created instruction and use it
from the parser.
|
Replace the general `peek` method that returned a `Token` with specific peek
methods that look for (but do not consume) specific kinds of tokens. This change
is a prerequisite for simplifying the lexer implementation by removing `Token`
entirely.
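
One way to get peek methods that look for a token without consuming it is to run the corresponding consuming method on a throwaway copy of the lexer. The sketch below assumes a lexer that is cheap to copy (a view plus an offset). `PeekingLexer` and its members are hypothetical names, and `takeKeyword` is simplified: it does not check that the keyword ends at a token boundary.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical lexer state: a view of the input plus a cursor, so copying
// it is cheap.
struct PeekingLexer {
  std::string_view buffer;
  std::size_t pos = 0;

  // Consuming methods advance `pos` only on success.
  bool takeLParen() {
    if (pos < buffer.size() && buffer[pos] == '(') {
      ++pos;
      return true;
    }
    return false;
  }

  bool takeKeyword(std::string_view expected) {
    if (buffer.substr(pos, expected.size()) == expected) {
      pos += expected.size();
      return true;
    }
    return false;
  }

  // Peeking methods run the consuming method on a copy, so the original
  // cursor is never moved.
  bool peekLParen() const { return PeekingLexer(*this).takeLParen(); }
  bool peekKeyword(std::string_view expected) const {
    return PeekingLexer(*this).takeKeyword(expected);
  }
};
```

With this shape, each peek method answers one yes-or-no question, so no general `Token` value has to be returned to the caller.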
|
Remove the layer of abstraction sitting between the parser and the lexer now
that the lexer has an interface the parser can use directly.
|
The lexer was previously an iterator over tokens, but that expressivity is not
actually used in the parser. Instead, we have `input.h` that adapts the token
iterator interface into an interface that is actually useful.

As a first step toward simplifying the lexer implementation to no longer be an
iterator over tokens, update its interface by moving the adaptation from input.h
to the lexer itself. This requires extensive changes to the lexer unit tests,
which will not have to change further when we actually simplify the lexer
implementation.
|
In addition to normal identifiers, support parsing identifiers of the format
`$"..."`. This format is not yet allowed by the standard, but it is a popular
proposed extension (see https://github.com/WebAssembly/spec/issues/617 and
https://github.com/WebAssembly/annotations/issues/21).

Binaryen has historically allowed a similar format and has supported arbitrary
non-standard identifier characters, so it's much easier to support this extended
syntax than to fix everything to use the restricted standard syntax.
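
To make the two identifier forms concrete, here is a small sketch that accepts either `$` followed by standard identifier characters or `$"..."` with arbitrary contents. The function names are hypothetical, and the quoted form is simplified: it assumes there are no escape sequences inside the quotes, which a real lexer would have to handle.

```cpp
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// Identifier characters of the standard text format: ASCII letters and
// digits plus a fixed set of punctuation.
bool isIdChar(char c) {
  if (std::isalnum(static_cast<unsigned char>(c))) {
    return true;
  }
  return std::string_view("!#$%&'*+-./:<=>?@\\^_`|~").find(c) !=
         std::string_view::npos;
}

// Hypothetical sketch: returns the name of the identifier at the start of
// `in`, or nullopt if `in` does not start with an identifier.
std::optional<std::string> lexIdentSketch(std::string_view in) {
  if (in.empty() || in[0] != '$') {
    return std::nullopt;
  }
  in.remove_prefix(1);
  if (!in.empty() && in[0] == '"') {
    // Extended form $"...": everything up to the closing quote is the name.
    // Simplification: escape sequences are not interpreted.
    std::size_t close = in.find('"', 1);
    if (close == std::string_view::npos) {
      return std::nullopt; // unterminated string
    }
    return std::string(in.substr(1, close - 1));
  }
  // Standard form: one or more identifier characters.
  std::size_t len = 0;
  while (len < in.size() && isIdChar(in[len])) {
    ++len;
  }
  if (len == 0) {
    return std::nullopt;
  }
  return std::string(in.substr(0, len));
}
```

The quoted form is what lets a name contain characters such as spaces or parentheses that the standard identifier syntax excludes.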
|
Have a single implementation for lexing each of unsigned, signed, and
uninterpreted integers, each generic over the bit width of the integer. This
reduces duplication in the existing code and it will make it much easier to
support lexing more 8- and 16-bit integers.
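
A sketch of what "generic over the bit width" could look like: one shared routine reads the sign and magnitude, and three small templates apply the range check for unsigned, signed, and uninterpreted integers of any width. All names here are hypothetical, and the sketch handles decimal digits only (no hexadecimal and no `_` separators).

```cpp
#include <cstdint>
#include <limits>
#include <optional>
#include <string_view>
#include <type_traits>

// Sign and magnitude of a decimal literal, before any width is applied.
struct RawInt {
  bool hasSign = false;
  bool negative = false;
  std::uint64_t magnitude = 0;
};

// Shared by all three kinds: parse [+-]?[0-9]+ into a RawInt.
std::optional<RawInt> lexRawDecimal(std::string_view text) {
  RawInt raw;
  if (!text.empty() && (text[0] == '+' || text[0] == '-')) {
    raw.hasSign = true;
    raw.negative = text[0] == '-';
    text.remove_prefix(1);
  }
  if (text.empty()) {
    return std::nullopt;
  }
  constexpr std::uint64_t limit = std::numeric_limits<std::uint64_t>::max();
  for (char c : text) {
    if (c < '0' || c > '9') {
      return std::nullopt;
    }
    std::uint64_t digit = static_cast<std::uint64_t>(c - '0');
    if (raw.magnitude > (limit - digit) / 10) {
      return std::nullopt; // would overflow 64 bits
    }
    raw.magnitude = raw.magnitude * 10 + digit;
  }
  return raw;
}

// Unsigned: no sign allowed, value must fit in T.
template<typename T> std::optional<T> lexUnsigned(std::string_view text) {
  static_assert(std::is_unsigned_v<T>);
  auto raw = lexRawDecimal(text);
  if (!raw || raw->hasSign || raw->magnitude > std::numeric_limits<T>::max()) {
    return std::nullopt;
  }
  return static_cast<T>(raw->magnitude);
}

// Signed: optional sign, value must fit in the two's complement range of T.
template<typename T> std::optional<T> lexSigned(std::string_view text) {
  static_assert(std::is_signed_v<T>);
  auto raw = lexRawDecimal(text);
  if (!raw) {
    return std::nullopt;
  }
  auto max = static_cast<std::uint64_t>(std::numeric_limits<T>::max());
  if (!raw->negative) {
    if (raw->magnitude > max) {
      return std::nullopt;
    }
    return static_cast<T>(raw->magnitude);
  }
  if (raw->magnitude > max + 1) {
    return std::nullopt;
  }
  if (raw->magnitude == 0) {
    return T(0);
  }
  // Negate without overflow: -(m - 1) - 1 stays within T's range.
  return static_cast<T>(-static_cast<T>(raw->magnitude - 1) - 1);
}

// Uninterpreted: accept either range and return the raw bit pattern.
template<typename T> std::optional<T> lexUninterpreted(std::string_view text) {
  static_assert(std::is_unsigned_v<T>);
  if (auto u = lexUnsigned<T>(text)) {
    return u;
  }
  if (auto s = lexSigned<std::make_signed_t<T>>(text)) {
    return static_cast<T>(*s);
  }
  return std::nullopt;
}
```

Supporting another width under this design only requires a new instantiation, for example `lexSigned<std::int16_t>`, rather than a new hand-written function.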
|
And put the new files in a new source directory, "parser". This is a rough split
and is not yet expected to dramatically improve compile times. The exact
organization of the new files is subject to change, but this splitting should be
enough to make further parser development more pleasant.
|