path: root/src/support/string.h
Commit message | Author | Age | Files | Lines
* Validate that names are valid UTF-8 (#6682) | Thomas Lively | 2024-06-19 | 1 | -0/+3
| Add an `isUTF8` utility and use it in both the text and binary parsers. Add missing checks for overlong encodings and overlarge code points in our WTF8 reader, which the new utility uses. Re-enable the spec tests that test UTF-8 validation.
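The checks this commit describes can be sketched roughly as follows. This is a minimal illustration, not Binaryen's actual `isUTF8`; the name `isValidUTF8` and its signature are assumptions made for the example. It rejects overlong encodings, code points above U+10FFFF, and surrogate code points (which WTF-8 would accept but strict UTF-8 does not).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper for illustration only; not Binaryen's API.
bool isValidUTF8(const std::string& s) {
  size_t i = 0;
  while (i < s.size()) {
    uint8_t lead = static_cast<uint8_t>(s[i]);
    if (lead < 0x80) { // single-byte (ASCII) sequence
      i++;
      continue;
    }
    size_t len;   // total bytes in this sequence
    uint32_t cp;  // code point accumulated so far
    uint32_t min; // smallest code point that needs `len` bytes
    if ((lead & 0xE0) == 0xC0) {
      len = 2; cp = lead & 0x1F; min = 0x80;
    } else if ((lead & 0xF0) == 0xE0) {
      len = 3; cp = lead & 0x0F; min = 0x800;
    } else if ((lead & 0xF8) == 0xF0) {
      len = 4; cp = lead & 0x07; min = 0x10000;
    } else {
      return false; // stray continuation byte or invalid lead byte
    }
    if (s.size() - i < len) {
      return false; // truncated sequence
    }
    for (size_t j = 1; j < len; j++) {
      uint8_t c = static_cast<uint8_t>(s[i + j]);
      if ((c & 0xC0) != 0x80) {
        return false; // not a continuation byte
      }
      cp = (cp << 6) | (c & 0x3F);
    }
    if (cp < min) {
      return false; // overlong encoding
    }
    if (cp > 0x10FFFF) {
      return false; // overlarge code point
    }
    if (cp >= 0xD800 && cp <= 0xDFFF) {
      return false; // surrogates are not valid in strict UTF-8
    }
    i += len;
  }
  return true;
}
```

Under these assumptions, `"\xC0\x80"` (an overlong encoding of U+0000) and `"\xF4\x90\x80\x80"` (U+110000) are both rejected, while ordinary multi-byte text is accepted.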
* [Strings] Add a string lowering pass using magic imports (#6497) | Thomas Lively | 2024-04-15 | 1 | -0/+5
| The latest idea for efficient string constants is to encode the constants in the import names of their globals and implement fast paths in the engines for materializing those constants at instantiation time without needing to parse anything in JS. This strategy only works for valid strings (i.e. strings without unpaired surrogates) because only valid strings can be used as import names in the WebAssembly syntax.
|
| Add a new configuration of the StringLowering pass that encodes valid string contents in import names, falling back to the JSON custom section approach for invalid strings.
|
| To test this change, update the printer to escape import and export names properly and update the legacy parser to parse escapes in import and export names properly.
|
| As a drive-by, remove the incorrect check in the parser that the import module and base names are non-empty.
* [Strings] Represent string values as WTF-16 internally (#6418) | Thomas Lively | 2024-03-22 | 1 | -2/+17
| WTF-16, i.e. arbitrary sequences of 16-bit values, is the encoding of Java and JavaScript strings, and using the same encoding makes the interpretation of string operations trivial, even when accounting for non-ascii characters. Specifically, use little-endian WTF-16.
|
| Re-encode string constants from WTF-8 to WTF-16 in the parsers, then back to WTF-8 in the writers.
|
| Update the constructor for string `Literal`s to interpret the string as WTF-16 and store a sequence of WTF-16 code units, i.e. 16-bit integers. Update `Builder::makeConstantExpression` accordingly to convert from the new `Literal` string representation back to a WTF-16 string.
|
| Update the interpreter to remove the logic for detecting non-ascii characters and bailing out. The naive implementations of all the string operations are correct now that our string encoding matches the JS string encoding.
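The WTF-8 to little-endian WTF-16 re-encoding mentioned above might look something like the sketch below. The function name `wtf8ToWtf16LE`, its out-parameter signature, and the `appendUnitLE` helper are assumptions for illustration, not the code added by this commit. Unlike a strict UTF-8 decoder, it lets lone surrogates through as single 16-bit units.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Append one 16-bit code unit in little-endian byte order.
static void appendUnitLE(std::string& out, uint16_t unit) {
  out.push_back(static_cast<char>(unit & 0xFF));
  out.push_back(static_cast<char>(unit >> 8));
}

// Hypothetical converter for illustration only. Returns false on
// malformed input; `out` receives the WTF-16 bytes on success.
bool wtf8ToWtf16LE(const std::string& in, std::string& out) {
  out.clear();
  size_t i = 0;
  while (i < in.size()) {
    uint8_t lead = static_cast<uint8_t>(in[i]);
    size_t len;
    uint32_t cp;
    uint32_t min;
    if (lead < 0x80) {
      len = 1; cp = lead; min = 0;
    } else if ((lead & 0xE0) == 0xC0) {
      len = 2; cp = lead & 0x1F; min = 0x80;
    } else if ((lead & 0xF0) == 0xE0) {
      len = 3; cp = lead & 0x0F; min = 0x800;
    } else if ((lead & 0xF8) == 0xF0) {
      len = 4; cp = lead & 0x07; min = 0x10000;
    } else {
      return false; // invalid lead byte
    }
    if (in.size() - i < len) {
      return false; // truncated sequence
    }
    for (size_t j = 1; j < len; j++) {
      uint8_t c = static_cast<uint8_t>(in[i + j]);
      if ((c & 0xC0) != 0x80) {
        return false; // not a continuation byte
      }
      cp = (cp << 6) | (c & 0x3F);
    }
    if (cp < min || cp > 0x10FFFF) {
      return false; // overlong encoding or overlarge code point
    }
    i += len;
    if (cp < 0x10000) {
      // One code unit. Lone surrogates (0xD800..0xDFFF) pass through
      // unchanged, which is what distinguishes WTF-16 from UTF-16.
      appendUnitLE(out, static_cast<uint16_t>(cp));
    } else {
      // Supplementary code point: split into a surrogate pair.
      cp -= 0x10000;
      appendUnitLE(out, static_cast<uint16_t>(0xD800 | (cp >> 10)));
      appendUnitLE(out, static_cast<uint16_t>(0xDC00 | (cp & 0x3FF)));
    }
  }
  return true;
}
```

Storing the result as a byte string with two bytes per code unit is only one possible representation; a `std::u16string` would work equally well for the sketch.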
* StringLowering: Escape the JSON in the custom section (#6316) | Alon Zakai | 2024-02-20 | 1 | -1/+3
| Also add an end-to-end test using node to verify we can parse the escaped content properly using TextDecoder+JSON.parse.
* [NFC] Move code to string.cpp (#6282) | Thomas Lively | 2024-02-06 | 1 | -84/+6
| Now that we have a .cpp file, none of the code that was in string.h needs to be in a header any more.
* Properly stringify names in tests (#6279) | Thomas Lively | 2024-02-06 | 1 | -0/+3
| Update identifiers used in tests to use a format supported by the new text parser, i.e. either the standard format with its limited set of allowed characters or the non-standard `$"..."` format. Notably, any name containing square or curly braces now uses the string format.
|
| Input automatically updated with this script: https://gist.github.com/tlively/4e22311736661849e641d02e521a0748
|
| The printer is updated to properly escape names in more places as well. The logic for escaping names is moved to a common location so that the type printing logic in wasm-type.cpp can use it as well.
* Support one-line-one-function file format for asyncify lists (#6051) | Alexander Guryanov | 2023-10-30 | 1 | -3/+36
| If there are newlines in the list, then we split using them in a simple manner (that does not take into account nesting of any other delimiters).
|
| Fixes #6047
| Fixes #5271
* Modernize code to C++17 (#3104) | Max Graey | 2021-11-22 | 1 | -6/+3
|
* cleanup to allow binaryen to be built in more strict environments (#3566) | walkingeyerobot | 2021-02-16 | 1 | -0/+1
|
* [GC] Fix parsing/printing of ref types using i31 (#3469) | Alon Zakai | 2021-01-07 | 1 | -0/+4
| This lets us parse (ref null i31) and (ref i31) and not just i31ref. It also fixes the parsing of i31ref, making it nullable for now, which we need to do until we support non-nullability.
|
| Fix some internal handling of i31 where we had just i31ref (which meant we just handled the non-nullable type). After fixing a bug in printing (where we didn't print out (ref null i31) properly), I found a simplification, to remove TypeName.
* asyncify: support *-matching in whitelist and blacklist (#2344) | Beuc | 2019-09-23 | 1 | -5/+6
| See emscripten-core/emscripten#9381 for rationale.
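For readers unfamiliar with `*`-matching, a pattern match where `*` stands for any run of characters (including none) can be sketched as below. The name `globMatch` is hypothetical, and the exact matching semantics used by the Asyncify lists are not stated in this log, so treat this as one plausible interpretation rather than the implemented behavior.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical matcher for illustration only: '*' in `pattern` matches
// any (possibly empty) run of characters in `value`; every other
// character must match literally.
bool globMatch(const std::string& pattern, const std::string& value) {
  const size_t npos = std::string::npos;
  size_t p = 0, v = 0;
  size_t starP = npos; // position of the most recent '*' in pattern
  size_t starV = 0;    // position in value where that '*' began matching
  while (v < value.size()) {
    if (p < pattern.size() && pattern[p] == '*') {
      // Remember the star and first try matching it against nothing.
      starP = p++;
      starV = v;
    } else if (p < pattern.size() && pattern[p] == value[v]) {
      p++;
      v++;
    } else if (starP != npos) {
      // Mismatch: backtrack and let the last '*' absorb one more char.
      p = starP + 1;
      v = ++starV;
    } else {
      return false;
    }
  }
  // Any trailing stars match the empty remainder.
  while (p < pattern.size() && pattern[p] == '*') {
    p++;
  }
  return p == pattern.size();
}
```

With this sketch, `globMatch("*foo*", "void foo(int)")` succeeds because the stars absorb the text on either side of `foo`.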
* Support response files, and use that in Asyncify (#2319) | Alon Zakai | 2019-08-30 | 1 | -0/+10
| See emscripten-core/emscripten#9206, the asyncify names can need complex escaping, so this provides an escape hatch.
* Proper Asyncify list name handling (#2275) | Alon Zakai | 2019-07-31 | 1 | -0/+42
| The lists are comma separated, but the names can have internal commas since they are human-readable. This adds awareness of bracketing things, so void foo(int, double) is parsed as a single function name, properly.
|
| Helps emscripten-core/emscripten#9128
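The bracket-aware splitting described above can be illustrated with a simple depth counter. This is a sketch under assumptions: the function name `splitTopLevel` and the exact set of bracket characters are chosen for the example and are not taken from the commit. It also ignores corner cases such as names containing `operator<`.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical splitter for illustration only: splits on commas that
// are not nested inside (), <>, [] or {}.
std::vector<std::string> splitTopLevel(const std::string& list) {
  std::vector<std::string> names;
  std::string current;
  int depth = 0; // current bracket nesting depth
  for (char c : list) {
    if (c == '(' || c == '<' || c == '[' || c == '{') {
      depth++;
    } else if (c == ')' || c == '>' || c == ']' || c == '}') {
      if (depth > 0) {
        depth--;
      }
    }
    if (c == ',' && depth == 0) {
      // A top-level comma ends the current name.
      names.push_back(current);
      current.clear();
    } else {
      current.push_back(c);
    }
  }
  if (!current.empty()) {
    names.push_back(current);
  }
  return names;
}
```

Given `"void foo(int, double),bar"`, the comma inside the parentheses is at depth 1 and is kept, so the result has two entries: `void foo(int, double)` and `bar`.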
* Bysyncify: allow wildcard endings in import list (#2190) | Alon Zakai | 2019-06-30 | 1 | -0/+69
| This allows us to do things in emscripten like note that all env.invoke_* functions are important.
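A wildcard ending amounts to a prefix comparison. The sketch below shows the idea; the name `matchesImport` is hypothetical and the code is not taken from the commit.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical check for illustration only: a pattern ending in '*'
// matches any name that starts with the text before the '*'; any other
// pattern must match the name exactly.
bool matchesImport(const std::string& pattern, const std::string& name) {
  if (!pattern.empty() && pattern.back() == '*') {
    size_t prefixLen = pattern.size() - 1;
    // Compare the first prefixLen characters of both strings. If name
    // is shorter than the prefix, the comparison reports a mismatch.
    return name.compare(0, prefixLen, pattern, 0, prefixLen) == 0;
  }
  return pattern == name;
}
```

For example, `matchesImport("env.invoke_*", "env.invoke_vii")` holds, while `matchesImport("env.foo", "env.foobar")` does not because there is no trailing `*`.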