| Commit message (Collapse) | Author | Age | Files | Lines |
| |
This will hopefully fix the build on the coverage builder.
| |
Add an `isUTF8` utility and use it in both the text and binary parsers.
Add missing checks for overlong encodings and overlarge code points in
our WTF8 reader, which the new utility uses. Re-enable the spec tests
that test UTF-8 validation.
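
A minimal sketch of the kind of validation described above, assuming a standalone helper. The name `isValidUTF8` is hypothetical and this is not the project's actual `isUTF8` implementation. It shows where the overlong-encoding and overlarge-code-point checks fit, plus the surrogate check that separates strict UTF-8 from WTF-8.

```cpp
#include <cstddef>
#include <cstdint>
#include <string_view>

// Hypothetical helper for illustration only.
// Returns true if `bytes` is well-formed UTF-8.
bool isValidUTF8(std::string_view bytes) {
  std::size_t i = 0;
  while (i < bytes.size()) {
    uint8_t lead = static_cast<uint8_t>(bytes[i]);
    std::size_t trailing = 0; // number of continuation bytes expected
    uint32_t codePoint = 0;   // payload bits accumulated so far
    uint32_t minValue = 0;    // smallest code point needing this length
    if (lead < 0x80) {
      codePoint = lead;
    } else if ((lead & 0xE0) == 0xC0) {
      trailing = 1;
      codePoint = lead & 0x1F;
      minValue = 0x80;
    } else if ((lead & 0xF0) == 0xE0) {
      trailing = 2;
      codePoint = lead & 0x0F;
      minValue = 0x800;
    } else if ((lead & 0xF8) == 0xF0) {
      trailing = 3;
      codePoint = lead & 0x07;
      minValue = 0x10000;
    } else {
      return false; // stray continuation byte or invalid leading byte
    }
    if (bytes.size() - i - 1 < trailing) {
      return false; // unexpected end of string
    }
    for (std::size_t j = 1; j <= trailing; ++j) {
      uint8_t b = static_cast<uint8_t>(bytes[i + j]);
      if ((b & 0xC0) != 0x80) {
        return false; // invalid trailing byte
      }
      codePoint = (codePoint << 6) | (b & 0x3F);
    }
    if (codePoint < minValue) {
      return false; // overlong encoding
    }
    if (codePoint > 0x10FFFF) {
      return false; // overlarge code point
    }
    if (codePoint >= 0xD800 && codePoint <= 0xDFFF) {
      return false; // surrogates are allowed in WTF-8 but not in UTF-8
    }
    i += trailing + 1;
  }
  return true;
}
```

Dropping the final surrogate check would turn this into a (simplified) WTF-8 acceptance test, which is one plausible way a UTF-8 check can be layered on a WTF-8 reader.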
| |
The latest idea for efficient string constants is to encode the constants in
the import names of their globals and implement fast paths in the engines for
materializing those constants at instantiation time without needing to parse
anything in JS. This strategy only works for valid strings (i.e. strings without
unpaired surrogates) because only valid strings can be used as import names in
the WebAssembly syntax.

Add a new configuration of the StringLowering pass that encodes valid string
contents in import names, falling back to the JSON custom section approach for
invalid strings.

To test this change, update the printer to escape import and export names
properly and update the legacy parser to parse escapes in import and export
names properly. As a drive-by, remove the incorrect check in the parser that the
import module and base names are non-empty.
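
The valid/invalid split described above can be sketched as follows, assuming the string is available as a sequence of 16-bit code units. `isWellFormedUTF16`, `ConstantStorage`, and `chooseStorage` are hypothetical names for illustration, not the StringLowering pass's real API.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper: true if `units` contains no unpaired surrogates,
// i.e. the string could be written as a valid import name.
bool isWellFormedUTF16(std::u16string_view units) {
  for (std::size_t i = 0; i < units.size(); ++i) {
    char16_t u = units[i];
    if (u >= 0xD800 && u <= 0xDBFF) {
      // A high surrogate must be followed immediately by a low surrogate.
      if (i + 1 == units.size()) {
        return false;
      }
      char16_t next = units[i + 1];
      if (next < 0xDC00 || next > 0xDFFF) {
        return false;
      }
      ++i; // skip the low half of the pair
    } else if (u >= 0xDC00 && u <= 0xDFFF) {
      return false; // low surrogate with no preceding high surrogate
    }
  }
  return true;
}

// Hypothetical: where a string constant would be stored.
enum class ConstantStorage { ImportName, JSONSection };

ConstantStorage chooseStorage(std::u16string_view units) {
  return isWellFormedUTF16(units) ? ConstantStorage::ImportName
                                  : ConstantStorage::JSONSection;
}
```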
| |
WTF-16, i.e. arbitrary sequences of 16-bit values, is the encoding of Java and
JavaScript strings, and using the same encoding makes the interpretation of
string operations trivial, even when accounting for non-ASCII characters.
Specifically, use little-endian WTF-16.

Re-encode string constants from WTF-8 to WTF-16 in the parsers, then back to
WTF-8 in the writers. Update the constructor for string `Literal`s to interpret
the string as WTF-16 and store a sequence of WTF-16 code units, i.e. 16-bit
integers. Update `Builder::makeConstantExpression` accordingly to convert from
the new `Literal` string representation back to a WTF-16 string.

Update the interpreter to remove the logic for detecting non-ASCII characters
and bailing out. The naive implementations of all the string operations are
correct now that our string encoding matches the JS string encoding.
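
As an illustration of what little-endian WTF-16 looks like at the byte level, here is a minimal sketch that appends one code point to a byte string. The function names are hypothetical and the choice of `std::string` as the byte container is an assumption, not a description of how the project stores its strings.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: append one 16-bit code unit, low byte first.
void appendCodeUnitLE(std::string& out, uint16_t unit) {
  out.push_back(static_cast<char>(unit & 0xFF));
  out.push_back(static_cast<char>(unit >> 8));
}

// Hypothetical helper: append a code point as little-endian WTF-16.
// Assumes codePoint <= 0x10FFFF.
void appendCodePointWTF16LE(std::string& out, uint32_t codePoint) {
  if (codePoint < 0x10000) {
    // One code unit. WTF-16 permits lone surrogates, so no check is needed.
    appendCodeUnitLE(out, static_cast<uint16_t>(codePoint));
  } else {
    // Two code units: a high surrogate followed by a low surrogate.
    uint32_t offset = codePoint - 0x10000;
    appendCodeUnitLE(out, static_cast<uint16_t>(0xD800 | (offset >> 10)));
    appendCodeUnitLE(out, static_cast<uint16_t>(0xDC00 | (offset & 0x3FF)));
  }
}
```

With this layout, operations such as length or indexed code unit access reduce to arithmetic on 16-bit units, which matches how JS defines them.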
| |
Catch and report all kinds of WTF-8 encoding errors in the source strings,
including invalid leading bytes, invalid trailing bytes, unexpected ends of
strings, and invalid surrogate sequences. Insert replacement characters into the
output as necessary. Add a TODO about minimizing size by escaping only those
code points mandated to be escaped by the JSON spec. Generally improve
readability of the code.
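
A minimal sketch of conservative JSON escaping for a single decoded code point, assuming decoding from WTF-8 has already happened elsewhere. `appendHexEscape` and `appendJSONEscaped` are hypothetical names, and escaping everything outside printable ASCII is an assumption chosen for simplicity. The JSON spec only mandates escaping `"`, `\`, and the control characters U+0000 through U+001F, which is the size optimization the TODO refers to.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper: append a \uXXXX escape for one 16-bit code unit.
void appendHexEscape(std::string& out, uint16_t unit) {
  char buf[7]; // backslash, 'u', four hex digits, terminating NUL
  std::snprintf(buf, sizeof(buf), "\\u%04x", static_cast<unsigned>(unit));
  out += buf;
}

// Hypothetical helper: append one code point to a JSON string body.
void appendJSONEscaped(std::string& out, uint32_t codePoint) {
  if (codePoint > 0x10FFFF) {
    codePoint = 0xFFFD; // substitute the replacement character
  }
  if (codePoint == '"' || codePoint == '\\') {
    out.push_back('\\');
    out.push_back(static_cast<char>(codePoint));
  } else if (codePoint >= 0x20 && codePoint < 0x7F) {
    out.push_back(static_cast<char>(codePoint)); // printable ASCII as-is
  } else if (codePoint < 0x10000) {
    // Includes lone surrogates, which JSON can carry as \uXXXX escapes.
    appendHexEscape(out, static_cast<uint16_t>(codePoint));
  } else {
    // Supplementary code points are written as an escaped surrogate pair.
    uint32_t offset = codePoint - 0x10000;
    appendHexEscape(out, static_cast<uint16_t>(0xD800 | (offset >> 10)));
    appendHexEscape(out, static_cast<uint16_t>(0xDC00 | (offset & 0x3FF)));
  }
}
```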
| |
Also add an end-to-end test using Node.js to verify we can parse the escaped
content properly using TextDecoder+JSON.parse.
| |
Now that we have a .cpp file, none of the code that was in string.h needs to be
in a header any more.
|
Update identifiers used in tests to use a format supported by the new text
parser, i.e. either the standard format with its limited set of allowed
characters or the non-standard `$"..."` format. Notably, any name containing
square brackets or curly braces now uses the string format.

Input automatically updated with this script:
https://gist.github.com/tlively/4e22311736661849e641d02e521a0748

The printer is updated to properly escape names in more places as well. The
logic for escaping names is moved to a common location so that the type
printing logic in wasm-type.cpp can use it as well.
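
The choice between the two identifier formats can be sketched roughly as below. `isIdChar` and `printName` are hypothetical names, not the printer's real functions. The allowed character set follows the `idchar` production of the WebAssembly text format, and the `\hh` escapes follow its string syntax.

```cpp
#include <string>
#include <string_view>

// True if `c` may appear in a standard text-format identifier.
bool isIdChar(char c) {
  if ((c >= '0' && c <= '9') || (c >= 'A' && c <= 'Z') ||
      (c >= 'a' && c <= 'z')) {
    return true;
  }
  constexpr std::string_view punctuation = "!#$%&'*+-./:<=>?@\\^_`|~";
  return punctuation.find(c) != std::string_view::npos;
}

// Hypothetical printer helper: use `$name` when every character is an
// idchar, otherwise fall back to the quoted `$"..."` form with escapes.
std::string printName(std::string_view name) {
  bool simple = !name.empty();
  for (char c : name) {
    if (!isIdChar(c)) {
      simple = false;
    }
  }
  if (simple) {
    return "$" + std::string(name);
  }
  static const char* digits = "0123456789abcdef";
  std::string out = "$\"";
  for (char c : name) {
    unsigned char b = static_cast<unsigned char>(c);
    if (c == '"' || c == '\\') {
      out += '\\';
      out += c;
    } else if (b >= 0x20 && b < 0x7F) {
      out += c; // printable ASCII needs no escape inside quotes
    } else {
      // Everything else becomes a two-digit hex escape.
      out += '\\';
      out += digits[b >> 4];
      out += digits[b & 0xF];
    }
  }
  out += '"';
  return out;
}
```

Under these assumptions a name such as `a[0]` prints as `$"a[0]"`, since square brackets are not in the standard identifier character set.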
|