From b3fea30f84fef3ff7aa77775e00b83ba62d997cc Mon Sep 17 00:00:00 2001
From: Thomas Lively
Date: Fri, 22 Mar 2024 16:56:33 -0700
Subject: [Strings] Represent string values as WTF-16 internally (#6418)

WTF-16, i.e. arbitrary sequences of 16-bit values, is the encoding of
Java and JavaScript strings, and using the same encoding makes the
interpretation of string operations trivial, even when accounting for
non-ascii characters. Specifically, use little-endian WTF-16.

Re-encode string constants from WTF-8 to WTF-16 in the parsers, then
back to WTF-8 in the writers. Update the constructor for string
`Literal`s to interpret the string as WTF-16 and store a sequence of
WTF-16 code units, i.e. 16-bit integers. Update
`Builder::makeConstantExpression` accordingly to convert from the new
`Literal` string representation back to a WTF-16 string.

Update the interpreter to remove the logic for detecting non-ascii
characters and bailing out. The naive implementations of all the string
operations are correct now that our string encoding matches the JS
string encoding.
---
 scripts/fuzz_opt.py | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/scripts/fuzz_opt.py b/scripts/fuzz_opt.py
index 9831eb467..686895790 100755
--- a/scripts/fuzz_opt.py
+++ b/scripts/fuzz_opt.py
@@ -333,9 +333,6 @@ INITIAL_CONTENTS_IGNORE = [
     'exception-handling.wast',
     'translate-to-new-eh.wast',
     'rse-eh.wast',
-    # Non-UTF8 strings trap in V8, and have limitations in our interpreter
-    'string-lowering.wast',
-    'precompute-strings.wast',
 ]
-- 
cgit v1.2.3
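
Not part of the patch, but as an illustration of the encoding the commit
message describes: WTF-16 is a sequence of arbitrary 16-bit code units,
so unlike strict UTF-16 it may contain unpaired surrogates. A minimal
Python sketch of re-encoding a string into little-endian WTF-16 code
units (function names here are hypothetical, not Binaryen APIs):

```python
def to_wtf16_le(s: str) -> bytes:
    # "surrogatepass" lets lone surrogates through, which WTF-16
    # permits but strict UTF-16 forbids.
    return s.encode("utf-16-le", errors="surrogatepass")

def code_units(s: str) -> list[int]:
    # Split the little-endian byte stream into 16-bit code units,
    # mirroring a representation of strings as lists of 16-bit ints.
    b = to_wtf16_le(s)
    return [int.from_bytes(b[i:i + 2], "little") for i in range(0, len(b), 2)]
```

For example, `code_units("😀")` yields the surrogate pair
`[0xD83D, 0xDE00]`, and a lone surrogate such as `"\ud800"` round-trips
as the single unit `[0xD800]` rather than raising an encoding error.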