path: root/src/support/string.cpp
Commit message | Author | Age | Files | Lines
* [Strings] Represent string values as WTF-16 internally (#6418) | Thomas Lively | 2024-03-22 | 1 | -52/+179
    WTF-16, i.e. arbitrary sequences of 16-bit values, is the encoding of Java and JavaScript strings, and using the same encoding makes the interpretation of string operations trivial, even when accounting for non-ascii characters. Specifically, use little-endian WTF-16.

    Re-encode string constants from WTF-8 to WTF-16 in the parsers, then back to WTF-8 in the writers. Update the constructor for string `Literal`s to interpret the string as WTF-16 and store a sequence of WTF-16 code units, i.e. 16-bit integers. Update `Builder::makeConstantExpression` accordingly to convert from the new `Literal` string representation back to a WTF-16 string.

    Update the interpreter to remove the logic for detecting non-ascii characters and bailing out. The naive implementations of all the string operations are correct now that our string encoding matches the JS string encoding.
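The WTF-8 to little-endian WTF-16 re-encoding described in this commit can be illustrated with a minimal sketch. This is not the code from string.cpp: the names `takeCodePoint` and `wtf8ToWtf16LE` are hypothetical, and the sketch does not reject a surrogate pair spelled as two 3-byte sequences, which a complete WTF-8 validator would have to do.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical helper: decode one code point starting at `pos`, advancing it.
// Surrogate code points (U+D800..U+DFFF) are accepted, which is what
// distinguishes WTF-8 from strict UTF-8. Returns nullopt on malformed input.
std::optional<uint32_t> takeCodePoint(std::string_view s, size_t& pos) {
  static const uint32_t minForLength[] = {0, 0x80, 0x800, 0x10000};
  assert(pos < s.size());
  uint8_t lead = s[pos++];
  uint32_t cp;
  int trailing;
  if (lead < 0x80) {
    cp = lead;
    trailing = 0;
  } else if ((lead & 0xE0) == 0xC0) {
    cp = lead & 0x1F;
    trailing = 1;
  } else if ((lead & 0xF0) == 0xE0) {
    cp = lead & 0x0F;
    trailing = 2;
  } else if ((lead & 0xF8) == 0xF0) {
    cp = lead & 0x07;
    trailing = 3;
  } else {
    return std::nullopt; // invalid leading byte
  }
  for (int i = 0; i < trailing; ++i) {
    if (pos >= s.size() || (uint8_t(s[pos]) & 0xC0) != 0x80) {
      return std::nullopt; // truncated input or invalid trailing byte
    }
    cp = (cp << 6) | (uint8_t(s[pos++]) & 0x3F);
  }
  if (cp < minForLength[trailing] || cp > 0x10FFFF) {
    return std::nullopt; // overlong encoding or out of range
  }
  return cp;
}

// Hypothetical converter: returns the WTF-16 code units as bytes, low byte
// first (little-endian), or nullopt if the input is malformed.
std::optional<std::string> wtf8ToWtf16LE(std::string_view in) {
  std::string out;
  auto put = [&](uint32_t unit) {
    out.push_back(char(unit & 0xFF));
    out.push_back(char((unit >> 8) & 0xFF));
  };
  size_t pos = 0;
  while (pos < in.size()) {
    auto cp = takeCodePoint(in, pos);
    if (!cp) {
      return std::nullopt;
    }
    if (*cp >= 0x10000) {
      // Supplementary code points become a high/low surrogate pair.
      uint32_t v = *cp - 0x10000;
      put(0xD800 + (v >> 10));
      put(0xDC00 + (v & 0x3FF));
    } else {
      put(*cp);
    }
  }
  return out;
}
```

Because each code unit occupies exactly two bytes, operations such as length or indexing on the converted string reduce to plain arithmetic on 16-bit values, which is the simplification the commit message refers to.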
* Improve JSON string encoding (#6328) | Thomas Lively | 2024-02-21 | 1 | -69/+103
    Catch and report all kinds of WTF-8 encoding errors in the source strings, including invalid leading bytes, invalid trailing bytes, unexpected ends of strings, and invalid surrogate sequences. Insert replacement characters into the output as necessary. Add a TODO about minimizing size by escaping only those code points mandated to be escaped by the JSON spec. Generally improve readability of the code.
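As a rough illustration of the error handling described here, the following sketch escapes a WTF-8 string as a JSON string literal and substitutes U+FFFD for malformed input. It is an assumption-laden sketch, not the code in string.cpp: `escapeJSON` and `appendUnit` are hypothetical names, the sketch escapes every non-printable or non-ASCII code point rather than the minimal set, and it does not detect the invalid-surrogate-sequence case the commit mentions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>
#include <string_view>

// Hypothetical helper: append one 16-bit code unit as a \uXXXX escape.
void appendUnit(std::string& out, uint32_t unit) {
  assert(unit <= 0xFFFF);
  char buf[7];
  std::snprintf(buf, sizeof buf, "\\u%04x", unsigned(unit));
  out += buf;
}

// Hypothetical escaper: appends a quoted JSON string to `out`. Returns false
// if any malformed sequence was found; each one is replaced with U+FFFD.
bool escapeJSON(std::string_view in, std::string& out) {
  static const uint32_t minForLength[] = {0, 0x80, 0x800, 0x10000};
  bool valid = true;
  out += '"';
  size_t i = 0;
  while (i < in.size()) {
    uint8_t lead = in[i++];
    uint32_t cp = 0;
    int trailing = 0;
    bool ok = true;
    if (lead < 0x80) {
      cp = lead;
    } else if ((lead & 0xE0) == 0xC0) {
      cp = lead & 0x1F;
      trailing = 1;
    } else if ((lead & 0xF0) == 0xE0) {
      cp = lead & 0x0F;
      trailing = 2;
    } else if ((lead & 0xF8) == 0xF0) {
      cp = lead & 0x07;
      trailing = 3;
    } else {
      ok = false; // invalid leading byte
    }
    for (int k = 0; ok && k < trailing; ++k) {
      if (i >= in.size() || (uint8_t(in[i]) & 0xC0) != 0x80) {
        ok = false; // unexpected end of string or invalid trailing byte
      } else {
        cp = (cp << 6) | (uint8_t(in[i++]) & 0x3F);
      }
    }
    if (ok && (cp < minForLength[trailing] || cp > 0x10FFFF)) {
      ok = false; // overlong encoding or out of range
    }
    if (!ok) {
      valid = false;
      appendUnit(out, 0xFFFD); // replacement character
    } else if (cp == '"') {
      out += "\\\"";
    } else if (cp == '\\') {
      out += "\\\\";
    } else if (cp >= 0x20 && cp < 0x7F) {
      out += char(cp);
    } else if (cp < 0x10000) {
      appendUnit(out, cp);
    } else {
      // Supplementary code points are written as an escaped surrogate pair.
      uint32_t v = cp - 0x10000;
      appendUnit(out, 0xD800 + (v >> 10));
      appendUnit(out, 0xDC00 + (v & 0x3FF));
    }
  }
  out += '"';
  return valid;
}
```

When a trailing byte is invalid, the sketch leaves that byte unconsumed so it is reinterpreted as a possible leading byte on the next iteration; this keeps one bad byte from swallowing the valid characters after it.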
* StringLowering: Escape the JSON in the custom section (#6316) | Alon Zakai | 2024-02-20 | 1 | -1/+87
    Also add an end-to-end test using node to verify we can parse the escaped content properly using TextDecoder+JSON.parse.
* [NFC] Move code to string.cpp (#6282) | Thomas Lively | 2024-02-06 | 1 | -0/+86
    Now that we have a .cpp file, none of the code that was in string.h needs to be in a header any more.
* Properly stringify names in tests (#6279) | Thomas Lively | 2024-02-06 | 1 | -0/+57
    Update identifiers used in tests to use a format supported by the new text parser, i.e. either the standard format with its limited set of allowed characters or the non-standard `$"..."` format. Notably, any name containing square or curly braces now uses the string format.

    Input automatically updated with this script: https://gist.github.com/tlively/4e22311736661849e641d02e521a0748

    The printer is updated to properly escape names in more places as well. The logic for escaping names is moved to a common location so that the type printing logic in wasm-type.cpp can use it as well.
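The choice between the two identifier formats can be sketched as follows. This is a hypothetical illustration rather than the printer's actual logic: `isIdChar` and `printName` are invented names, the allowed-character set is the `idchar` set from the WebAssembly text format, and the escaping inside the quoted form (backslash escapes for `"` and `\`, two-digit hex escapes for other non-printable or non-ASCII bytes) is an assumption.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <string_view>

// True if `c` may appear in a standard text-format identifier.
bool isIdChar(char c) {
  if ((c >= '0' && c <= '9') || (c >= 'a' && c <= 'z') ||
      (c >= 'A' && c <= 'Z')) {
    return true;
  }
  static const std::string_view punctuation = "!#$%&'*+-./:<=>?@\\^_`|~";
  return punctuation.find(c) != std::string_view::npos;
}

// Hypothetical printer: use `$name` when every character is an idchar,
// otherwise fall back to the quoted `$"..."` form with escapes.
std::string printName(std::string_view name) {
  bool standard = !name.empty();
  for (char c : name) {
    if (!isIdChar(c)) {
      standard = false;
      break;
    }
  }
  std::string out = "$";
  if (standard) {
    out += name;
    return out;
  }
  out += '"';
  for (char c : name) {
    unsigned char byte = static_cast<unsigned char>(c);
    if (c == '"' || c == '\\') {
      out += '\\';
      out += c;
    } else if (byte >= 0x20 && byte < 0x7F) {
      out += c;
    } else {
      // Assumed escape style: backslash followed by two hex digits.
      char buf[4];
      int written = std::snprintf(buf, sizeof buf, "\\%02x", unsigned(byte));
      assert(written == 3);
      out += buf;
    }
  }
  out += '"';
  return out;
}
```

Square and curly braces are not in the `idchar` set, so a name such as `a[0]` falls through to the quoted form, consistent with the behavior the commit message describes.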