author | Thomas Lively <tlively@google.com> | 2024-03-22 16:56:33 -0700 |
---|---|---|
committer | GitHub <noreply@github.com> | 2024-03-22 23:56:33 +0000 |
commit | b3fea30f84fef3ff7aa77775e00b83ba62d997cc (patch) | |
tree | 53494a466d8e56d34d849d14927817a22f843748 /src/passes | |
parent | d3414c3deaebe7ba35731a8c20d7fa5f5a833ca3 (diff) | |
[Strings] Represent string values as WTF-16 internally (#6418)

WTF-16, i.e. arbitrary sequences of 16-bit values, is the encoding of Java and
JavaScript strings, and using the same encoding makes the interpretation of
string operations trivial, even when accounting for non-ASCII characters.
Specifically, use little-endian WTF-16.

Re-encode string constants from WTF-8 to WTF-16 in the parsers, then back to
WTF-8 in the writers. Update the constructor for string `Literal`s to interpret
the string as WTF-16 and store a sequence of WTF-16 code units, i.e. 16-bit
integers. Update `Builder::makeConstantExpression` accordingly to convert from
the new `Literal` string representation back to a WTF-16 string.

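To illustrate the writer-side re-encoding described above, here is a minimal sketch of converting little-endian WTF-16 bytes to WTF-8. The helper name `wtf16leToWtf8` and its signature are hypothetical and chosen for illustration only; this is not the implementation of `String::convertWTF16ToWTF8`, which may differ in interface and error handling.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <string_view>

// Hypothetical helper (not Binaryen's API). Treats `in` as little-endian
// WTF-16 bytes and appends the WTF-8 encoding to `out`. Returns false only
// when the input has an odd number of bytes.
bool wtf16leToWtf8(std::string_view in, std::string& out) {
  if (in.size() % 2 != 0) {
    return false;
  }
  // Read the 16-bit code unit starting at byte offset i.
  auto unit = [&](size_t i) -> uint32_t {
    return uint32_t(uint8_t(in[i])) | (uint32_t(uint8_t(in[i + 1])) << 8);
  };
  for (size_t i = 0; i < in.size(); i += 2) {
    uint32_t cp = unit(i);
    // Combine a high surrogate with an immediately following low surrogate.
    if (cp >= 0xD800 && cp <= 0xDBFF && i + 3 < in.size()) {
      uint32_t lo = unit(i + 2);
      if (lo >= 0xDC00 && lo <= 0xDFFF) {
        cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
        i += 2;
      }
    }
    // Generalized UTF-8: unpaired surrogates get the 3-byte form.
    if (cp < 0x80) {
      out.push_back(char(cp));
    } else if (cp < 0x800) {
      out.push_back(char(0xC0 | (cp >> 6)));
      out.push_back(char(0x80 | (cp & 0x3F)));
    } else if (cp < 0x10000) {
      out.push_back(char(0xE0 | (cp >> 12)));
      out.push_back(char(0x80 | ((cp >> 6) & 0x3F)));
      out.push_back(char(0x80 | (cp & 0x3F)));
    } else {
      out.push_back(char(0xF0 | (cp >> 18)));
      out.push_back(char(0x80 | ((cp >> 12) & 0x3F)));
      out.push_back(char(0x80 | ((cp >> 6) & 0x3F)));
      out.push_back(char(0x80 | (cp & 0x3F)));
    }
  }
  return true;
}
```

Because unpaired surrogates are encoded rather than rejected, any sequence of 16-bit values can be written out, which is the property that distinguishes WTF-8 from strict UTF-8.
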
Update the interpreter to remove the logic for detecting non-ASCII characters
and bailing out. The naive implementations of all the string operations are
correct now that our string encoding matches the JS string encoding.
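As a rough illustration of why the naive implementations become correct, the sketch below models a string value as a plain sequence of 16-bit code units. The type `CodeUnits` and the functions `measure`, `concat`, and `equal` are hypothetical names for this example and are not taken from Binaryen's interpreter.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a string value: a sequence of WTF-16 code units.
using CodeUnits = std::vector<uint16_t>;

// Length in code units, the same quantity a JS string reports as `length`.
size_t measure(const CodeUnits& s) { return s.size(); }

// Concatenation is plain sequence concatenation. A high surrogate at the end
// of `a` and a low surrogate at the start of `b` simply become adjacent.
CodeUnits concat(const CodeUnits& a, const CodeUnits& b) {
  CodeUnits out(a);
  out.insert(out.end(), b.begin(), b.end());
  return out;
}

// Equality is unit-by-unit comparison, with no normalization step.
bool equal(const CodeUnits& a, const CodeUnits& b) { return a == b; }
```

With a byte-oriented encoding such as WTF-8, the same operations would need extra care around multi-byte sequences and surrogate halves; on code units they reduce to ordinary vector operations.
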
Diffstat (limited to 'src/passes')
-rw-r--r-- | src/passes/Print.cpp | 8 |
-rw-r--r-- | src/passes/StringLowering.cpp | 8 |
2 files changed, 14 insertions, 2 deletions
diff --git a/src/passes/Print.cpp b/src/passes/Print.cpp
index 643f1cc3f..80047a281 100644
--- a/src/passes/Print.cpp
+++ b/src/passes/Print.cpp
@@ -2232,7 +2232,13 @@ struct PrintExpressionContents
   }
   void visitStringConst(StringConst* curr) {
     printMedium(o, "string.const ");
-    String::printEscaped(o, curr->string.str);
+    // Re-encode from WTF-16 to WTF-8.
+    std::stringstream wtf8;
+    [[maybe_unused]] bool valid =
+      String::convertWTF16ToWTF8(wtf8, curr->string.str);
+    assert(valid);
+    // TODO: Use wtf8.view() once we have C++20.
+    String::printEscaped(o, wtf8.str());
   }
   void visitStringMeasure(StringMeasure* curr) {
     switch (curr->op) {
diff --git a/src/passes/StringLowering.cpp b/src/passes/StringLowering.cpp
index e0d3fbad0..322f0deb2 100644
--- a/src/passes/StringLowering.cpp
+++ b/src/passes/StringLowering.cpp
@@ -147,8 +147,14 @@ struct StringGathering : public Pass {
       }
 
       auto& string = strings[i];
+      // Re-encode from WTF-16 to WTF-8 to make the name easier to read.
+      std::stringstream wtf8;
+      [[maybe_unused]] bool valid =
+        String::convertWTF16ToWTF8(wtf8, string.str);
+      assert(valid);
+      // TODO: Use wtf8.view() once we have C++20.
       auto name = Names::getValidGlobalName(
-        *module, std::string("string.const_") + std::string(string.str));
+        *module, std::string("string.const_") + std::string(wtf8.str()));
       globalName = name;
       newNames.insert(name);
       auto* stringConst = builder.makeStringConst(string);