| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
|
|
|
|
|
|
|
|
|
|
| |
Implement binary and text parsing and printing of shared basic heap types and
incorporate them into the type hierarchy.
To avoid the massive amount of code duplication that would be necessary if we
were to add separate enum variants for each of the shared basic heap types, use
bit 0 to indicate whether the type is shared and replace `getBasic()` with
`getBasic(Unshared)`, which clears that bit. Update all the use sites to record
whether the original type was shared and produce shared or unshared output
without code duplication.
|
|
|
|
|
|
|
|
|
| |
When popping past an unreachable instruction would lead to popping from an empty
stack or popping an incorrect type, we need to avoid popping and produce new
Unreachable instructions instead to ensure we parse valid IR. The logic for
this was flawed and made the synthetic Unreachable come before the popped
unreachable child, which was not correct in the case that that popped
unreachable was a branch or other non-trapping instruction. Fix and simplify the
logic and re-enable the spec test that uncovered the bug.
|
|
|
|
|
|
| |
Rather than treating them as custom sections. Also fix UB where invalid
`Section` enum values could be used as keys in a map. Use the raw `uint8_t`
section IDs as keys instead. Re-enable a disabled spec test that was failing
because of this bug and UB.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The code used i instead of index, as in this pseudocode:
for i in range(num_names):
index = readU32LEB() # index of the data segment to name
name = readName() # name to give that segment
data[i] = name # XXX 'i' should be 'index'
That (funnily enough) happened to always work before since we write names in
order. That is, normally given segments A,B,C we'd write then in the names section
as A,B,C. Then the reader, which had the bug, would always have i and index
identical in value anyhow. But if a wasm producer used different indexes, a
problem could happen.
To test this, add a binary file that has a reversed name section.
Fixes #6672
|
|
|
|
| |
Also update the parser so that implicit type uses are not matched with shared
function types.
|
|
|
|
|
| |
Add the feature and flags to enable and disable it. Require the new feature to
be enabled for shared heap types to validate. To make the test work, update the
validator to actually check features for global types.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Parse the text format for shared composite types as described in the
shared-everything thread proposal. Update the parser to use 'comptype' instead
of 'strtype' to match the final GC spec and add the new syntactic class
'sharecomptype'.
Update the type canonicalization logic to take sharedness into account to avoid
merging shared and unshared types. Make the same change in the TypeMerging pass.
Ensure that shared and unshared types cannot be in a subtype relationship with
each other.
Follow-up PRs will add shared abstract heap types, binary parsing and emitting
for shared types, and fuzzer support for shared types.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The binary writing of `stringview_wtf16.slice` requires scratch locals to store
the `start` and `end` operands while the string operand is converted to a
stringview. To avoid unbounded binary bloat when round-tripping, we detect the
case that `start` and `end` are already `local.get`s and avoid using scratch
locals by deferring the binary writing of the `local.get` operands until after
the stringview conversoins is emitted.
We previously optimized the scratch locals for `start` and `end` independently,
but this could produce incorrect code in the case where the `local.get` for
`start` is deferred but its value is changed by a `local.set` in the code for
`end`. Fix the problem by only optimizing to avoid scratch locals in the case
where both `start` and `end` are already `local.get`s, so they will still be
emitted in the original relative order and they cannot interfere with each other
anyway.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The parser was incorrectly handling the parsing of declarative element segments whose `init` is a `vec(expr)`.
https://webassembly.github.io/spec/core/binary/modules.html#element-section
Binry parser was simply reading a single `u32LEB` value for `init`
instead of parsing a expression regardless `usesExpressions = true`.
This commit updates the `WasmBinaryReader::readElementSegments` function
to correctly parse the expressions for declarative element segments by
calling `readExpression` instead of `getU32LEB` when `usesExpressions = true`.
Resolves the parsing exception:
"[parse exception: bad section size, started at ... not being equal to new position ...]"
Related discussion: https://github.com/tanishiking/scala-wasm/issues/136
|
|
|
|
|
| |
Remove `SExpressionParser`, `SExpressionWasmBuilder`, and `cashew::Parser`.
Simplify gen-s-parser.py. Remove the --new-wat-parser and
--deprecated-wat-parser flags.
|
|
|
|
|
|
|
|
|
|
| |
caught it (#6626)
The DataSegment was manually added to .dataSegments, but we need to add it
using addDataSegment so the maps are updated and getDataSegment(name)
works.
Also add validation that would have caught this earlier: check that each item in
the item lists can be fetched by name.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
With this PR we generate global.gets in globals, which we did not do before.
We do that by replacing makeConst (the only thing we did before, for the
contents of globals) with makeTrivial, and add code to makeTrivial to sometimes
make a global.get. When no suitable global exists, makeGlobalGet will emit a
constant, so there is no danger in trying.
Also raise the number of globals a little.
Also explicitly note the current limitation of requiring all tuple globals to contain
tuple.make and nothing else, including not global.get, and avoid adding such
invalid global.gets in tuple globals in the fuzzer.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Use the new wast parser to parse a full script up front, then traverse the
parsed script data structure and execute the commands. wasm-shell had previously
used the new wat parser for top-level modules, but it now uses the new parser
for module assertions as well. Fix various bugs this uncovered.
After this change, wasm-shell supports all the assertions used in the upstream
spec tests (although not new kinds of assertions introduced in any proposals).
Uncomment various `assert_exhaustion` tests that we can now execute.
Other kinds of assertions remain commented out in our tests: wasm-shell now
supports `assert_unlinkable`, but the interpreter does not eagerly check for the
existence of imports, so those tests do not pass. Tests that check for NaNs also
remain commented out because they do not yet use the standard syntax that
wasm-shell now supports for canonical and arithmetic NaN results, and our
interpreter would not pass all of those tests even if they did use the standard
syntax.
|
|
|
|
|
|
|
|
|
|
|
| |
validation (#6603)
GlobalRefining did not traverse module code, so it did not update global.gets
in other globals.
Add missing validation that actually errors on that: We did not check global.get
types.
These could be separate PRs but it would be difficult to test them separately.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This makes us compliant with the wasm spec by adding a cast: we use the refined
type for br_if fallthrough values, and the wasm spec uses the branch target. If the
two differ, we add a cast after the br_if to make things match.
Alternatively we could match the wasm spec's typing in our IR, but we hope the wasm
spec will improve here, and so this is will only be temporary in that case. Even if not,
this is useful because by using the most refined type in the IR we optimize in the best
way possible, and only suffer when we emit fixups in the binary, but in practice those
cases are very rare: br_if is almost always dropped rather than used, in real-world
code (except for fuzz cases and exploits).
We check carefully when a br_if value is actually used (and not dropped) and its type
actually differs, and it does not already have a cast. The last condition ensures that
we do not keep adding casts over repeated roundtripping.
|
|
|
|
|
| |
Changes to wasm-validator.cpp here are mostly for consistency between
elem and data segment validation.
|
|
|
|
|
|
| |
The stringref proposal has been superseded by the imported JS strings proposal,
but the former has many more operations than the latter. To reduce complexity,
remove all operations that are part of stringref but not part of imported
strings.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The stringview types from the stringref proposal have three irregularities that
break common invariants and require pervasive special casing to handle properly:
they are supertypes of `none` but not subtypes of `any`, they cannot be the
targets of casts, and they cannot be used to construct nullable references. At
the same time, the stringref proposal has been superseded by the imported
strings proposal, which does not have these irregularities. The cost of
maintaing and improving our support for stringview types is no longer worth the
benefit of supporting them.
Simplify the code base by entirely removing the stringview types and related
instructions that do not have analogues in the imported strings proposal and do
not make sense in the absense of stringviews.
Three remaining instructions, `stringview_wtf16.get_codeunit`,
`stringview_wtf16.slice`, and `stringview_wtf16.length` take stringview operands
in the stringref proposal but cannot be removed because they lower to operations
from the imported strings proposal. These instructions are changed to take
stringref operands in Binaryen IR, and to allow a graceful upgrade path for
users of these instructions, the text and binary parsers still accept but ignore
`string.as_wtf16`, which is the instruction used to convert stringrefs to
stringviews. The binary writer emits code sequences that use scratch locals and `string.as_wtf16` to keep the output valid.
Future PRs will further align binaryen with the imported strings proposal
instead of the stringref proposal, for example by making `string` a subtype of
`extern` instead of a subtype of `any` and by removing additional instructions
that do not have analogues in the imported strings proposal.
|
|
|
|
| |
I recently add TableSize/Grow and noticed I didn't need these. It seems
they are superfluous.
|
|
|
|
|
|
|
|
|
|
|
|
| |
(#6520)
;;@
with nothing else (no source:line) can be used to specify that the following
expression does not have any debug info associated to it. This can be used
to stop the automatic propagation of debug info in the text parsers.
The text printer has also been updated to output this comment when needed.
|
|
|
|
|
|
|
|
|
|
|
| |
Change `countScratchLocals` to return the count and type of necessary scratch
locals. It used to encode them as keys in the global map from scratch local
types to local indices, which could not handle having more than one scratch
local of a given type and was generally harder to reason about due to its use of
global state. Take the opportunity to avoid emitting unnecessary scratch locals
for `TupleExtract` expressions that will be optimized to not use them.
Also simplify and better document the calculation of the mapping from IR indices
to binary indices for all locals, scratch and non-scratch.
|
|
|
|
|
|
|
| |
Tests is still very limited. Hopefully we can use the upstream spec
tests soon and avoid having to write our own tests for
`.set/.set/.fill/etc`.
See https://github.com/WebAssembly/memory64/issues/51
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously we had passes --generate-stack-ir, --optimize-stack-ir, --print-stack-ir
that could be run like any other passes. After generating StackIR it was stashed on
the function and invalidated if we modified BinaryenIR. If it wasn't invalidated then
it was used during binary writing. This PR switches things so that we optionally
generate, optimize, and print StackIR only during binary writing. It also removes
all traces of StackIR from wasm.h - after this, StackIR is a feature of binary writing
(and printing) logic only.
This is almost NFC, but there are some minor noticeable differences:
1. We no longer print has StackIR in the text format when we see it is there. It
will not be there during normal printing, as it is only present during binary writing.
(but --print-stack-ir still works as before; as mentioned above it runs during writing).
2. --generate/optimize/print-stack-ir change from being passes to being flags that
control that behavior instead. As passes, their order on the commandline mattered,
while now it does not, and they only "globally" affect things during writing.
3. The C API changes slightly, as there is no need to pass it an option "optimize" to
the StackIR APIs. Whether we optimize is handled by --optimize-stack-ir which is
set like other optimization flags on the PassOptions object, so we don't need the
old option to those C APIs.
The main benefit here is simplifying the code, so we don't need to think about
StackIR in more places than just binary writing. That may also allow future
improvements to our usage of StackIR.
|
|
|
|
| |
It seems like that each of the callsites already has looked up the
`Memory` object so this helper is not doing anything useful.
|
|
|
|
|
|
|
|
|
| |
This allows writing of binaries with DWARF info when multivalue is
enabled. Currently we just crash when both are enabled together. This
just assumes, unless we have run DWARF-invalidating passes, all locals
added for tuples or scratch locals would have been added at the end of
the local list, so just printing all locals in order would preserve the
DWARF info. Tuple locals are expanded in place and scratch locals are
added at the end.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* Keep debug locations at function start
The `fn_prolog_epilog.debugInfo` test is failing otherwise, since there
was debug information associated to the nop instruction at the beginning
of the function.
* Do not clear the debug information when reaching the end of the source map
The last segment should extend to the end of the function.
* Propagate debug location from the function prolog to its first instruction
* Fix printing of epilogue location
The text parser no longer propagates locations to the epilogue, so we
should always print the location if there is one.
* Fix debug location smearing
The debug location of the last instruction should not smear into the
function epilogue, and a debug location from a previous function should
not smear into the prologue of the current function.
|
|
|
|
|
| |
Without this the fuzzer can error on differences in behavior between V8 and us.
Also move the limitations constants to their own header.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Helping #6509, this fixes debugging support for StackIR, which makes it more
possible to use StackIR in more places.
The fix is basically just to pass around some more state, and then to call the
parent with "please write debug info" at the correct times, mirroring the
similar calls in BinaryenIRWriter.
The relevant Emscripten tests pass, and the source map test modified
here produces identical output in StackIR and non-StackIR modes (the
test is also simplified to remove --new-wat-parser which is no longer
needed, after which the test can clearly show that StackIR has the same
output as BinaryenIR).
|
|
|
|
|
|
|
| |
When the input has branches to block scope, IR builder generally has to add a
wrapper block with a label name for the branch to target. To reduce the parsed
IR size, add a special case for when the wrapped expression is already an
unnamed block. In that case we can simply add the label to the existing block
instead of creating a new wrapper block.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
(#6549)
As suggested in #6434 (comment) , lower ref.cast of string views
to ref.as_non_null in binary writing. It is a simple hack that avoids the
problem of V8 not allowing them to be cast.
Add fuzzing support for the last three core string operations, after which
that problem becomes very frequent.
Also add yet another makeTrappingRefUse that was missing in that
fuzzer code.
|
|
|
|
| |
Disallow returns from having any children, even unreachable children, in
function that do not return any values.
|
|
|
|
|
|
|
|
|
|
|
|
| |
When branches target control flow structures other than blocks or loops,
the IRBuilder wraps those control flow structures with an extra block
for the branches to target in Binaryen IR. When the control flow
structure is unreachable because all its bodies are unreachable, the
wrapper block may still need to have a non-unreachable type if it is
targeted by branches. This is achieved by tracking whether the wrapper
block will be targeted by any branches and use the control flow
structure's original, non-unreachable type if so. However, this was not
properly tracked when moving into the `else` branch of an `if` or the
`catch`/`cath_all` handlers of a `try` block.
|
|
|
|
|
|
|
|
|
|
|
| |
The lexer currently lexes tokens eagerly and stores them in a `Token` variant
ahead of when they are actually requested by the parser. It is wasteful,
however, to classify tokens before they are requested by the parser because it
is likely that the next token will be precisely the kind the parser requests.
The work of checking and rejecting other possible classifications ahead of time
is not useful.
To make incremental progress toward removing `Token` completely, lex parentheses
on demand instead of eagerly.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The new text parser is faster and more standards compliant than the old text
parser. Enable it by default in wasm-opt and update the tests to reflect the
slightly different results it produces. Besides following the spec, the new
parser differs from the old parser in that it:
- Does not synthesize `loop` and `try` labels unnecessarily
- Synthesizes different block names in some cases
- Parses exports in a different order
- Parses `nop`s instead of empty blocks for empty control flow arms
- Does not support parsing Poppy IR
- Produces different error messages
- Cannot parse `pop` except as the first instruction inside a `catch`
|
|
|
|
|
|
|
|
| |
The new wat parser currently considers itself to be at the end of the file
whenever it cannot lex another token. This is not quite right, but fixing it
causes parser errors because of the extra null character we were appending to
files when we read them. This null character is not useful since we can already
read files as `std::string`, which always has an implicit null character, so
remove it. Clean up some users of `read_file` while we're at it.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In the standard text format, try scopes can be targeted by both normal branches
and delegates, but in Binaryen IR we only allow them to be targeted by
delegates, so we have to translate branches to try scopes into branches to
wrapper blocks instead. These wrapper blocks must have different names than the
try expressions they wrap, so we actually need to track two label names for try
expressions: one for delegates and another for normal branches.
We previously tried to avoid this complexity by tracking only the branch label
and computing the delegate label from the branch label as necessary, but that
produced unnecessary wrapper blocks and ugly label names that did not appear in
the source.
To produce better IR and minimize the diff when switching to the new text
parser, bite the bullet and track the delegate and branch label names
separately. This eliminates unnecessary wrapper blocks and keeps try names the
same as in the wat source where possible.
|
|
|
|
| |
To reduce the size of the test output diff when switching to the new text
parser, update it to generate the same block names as the legacy parser.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We previously would eagerly drop all concretely typed expressions on the stack
when pushing an unreachable instruction. This was semantically correct and
closely modeled the semantics of unreachable instructions, which implicitly drop
the entire stack and start a new polymorphic stack. However, it also meant that
the structure of the parsed IR did not match the structure of the folded input,
which meant that tests involving unreachable children would not parse as
intended, preventing the test from testing the intended behavior.
For example, this wat:
```wasm
(i32.add
(i32.const 0)
(unreachable)
)
```
Would previously parse into this IR:
```wasm
(drop
(i32.const 0)
)
(i32.add
(unreachable)
(unreachable)
)
```
To fix this problem, we need to stop eagerly dropping stack values when
encountering an unreachable instruction so we can still pop expressions pushed
before the unreachable as direct children of later instructions. In the example
above, we need to keep the `i32.const 0` on the stack so it is available to be
popped and become a child of the `i32.add`.
However, the naive solution of simply popping past unreachables would produce
invalid IR in some cases. For example, consider this wat:
```wasm
f32.const 0
unreachable
i32.add
```
The naive solution would parse this wat into this IR:
```wasm
(i32.add
(f32.const 0)
(unreachable)
)
```
But we do not want to parse an `i32.add` with an `f32`-typed child. Neither do
we want to reject this input, since it is a perfectly valid Wasm fragment. In this
case, we actually want the old behavior of dropping the `f32.const` and
replacing it with another `unreachable` as the first child of the `i32.add`.
To both match the input structure where possible and also gracefully fall back
to the old behavior of dropping expressions prior to the unreachable, collect
constraints on the types of each child for each kind of expression and compare
them to the types of available expressions on the stack when an unreachable
instruction will be popped. When the constraints are satisfied, pop expressions
normally, even after popping the unreachable instruction. Otherwise, drop the
instructions that precede the unreachable instruction to ensure we parse valid
IR.
To collect the constraints, add a new `ChildTyper` utility that calls a
different callback for each kind of possible type constraint for each child. In
the future, this utility can be used to simplify the validator as well.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The latest idea for efficient string constants is to encode the constants in
the import names of their globals and implement fast paths in the engines for
materializing those constants at instantiation time without needing to parse
anything in JS. This strategy only works for valid strings (i.e. strings without
unpaired surrogates) because only valid strings can be used as import names in
the WebAssembly syntax.
Add a new configuration of the StringLowering pass that encodes valid string
contents in import names, falling back to the JSON custom section approach for
invalid strings.
To test this chang, update the printer to escape import and export names
properly and update the legacy parser to parse escapes in import and export
names properly. As a drive-by, remove the incorrect check in the parser that the
import module and base names are non-empty.
|
|
|
|
|
|
|
| |
- Only write explicit function names.
- When merging modules, the name of types, globals and tags in all
modules but the first were lost.
- Set name as explicit when copying a function with a new name.
|
|
|
|
|
|
|
| |
When we need to pop a tuple and the top value on the stack is unreachable, just
pop the unreachable rather than producing a tuple.make. This always produces
valid IR since an unreachable is always valid where a tuple would otherwise be
expected. It also avoids bloating the parsed IR, since we would previously parse
a `tuple.make` where all the children were unreachable in this case.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is a combined commit covering multiple PRs fixing the handling of return
calls in different areas. The PRs are all landed as a single commit to ensure
internal consistency and avoid problems with bisection.
Original PR descriptions follow:
* Fix inlining of `return_call*` (#6448)
Previously we transformed return calls in inlined function bodies into normal
calls followed by branches out to the caller code. Similarly, when inlining a
`return_call` callsite, we simply added a `return` after the body inlined at the
callsite. These transformations would have been correct if the semantics of
return calls were to call and then return, but they are not correct for the
actual semantics of returning and then calling.
The previous implementation is observably incorrect for return calls inside try
blocks, where the previous implementation would run the inlined body within the
try block, but the proper semantics would be to run the inlined body outside the
try block.
Fix the problem by transforming inlined return calls to branches followed by
calls rather than as calls followed by branches. For the case of inlined return
call callsites, insert branches out of the original body of the caller and
inline the body of the callee as a sibling of the original caller body. For the
other case of return calls appearing in inlined bodies, translate the return
calls to branches out to calls inserted as siblings of the original inlined
body.
In both cases, it would have been convenient to use multivalue block return to
send call parameters along the branches to the calls, but unfortunately in our
IR that would have required tuple-typed scratch locals to unpack the tuple of
operands at the call sites. It is simpler to just use locals to propagate the
operands in the first place.
* Fix interpretation of `return_call*` (#6451)
We previously interpreted return calls as calls followed by returns, but that is
not correct both because it grows the size of the execution stack and because it
runs the called functions in the wrong context, which can be observable in the
case of exception handling.
Update the interpreter to handle return calls correctly by adding a new
`RETURN_CALL_FLOW` that behaves like a return, but carries the arguments and
reference to the return-callee rather than normal return values.
`callFunctionInternal` is updated to intercept this flow and call return-called
functions in a loop until a function returns with some other kind of flow.
Pull in the upstream spec tests return_call.wast, return_call_indirect.wast, and
return_call_ref.wast with light editing so that we parse and validate them
successfully.
* Handle return calls in wasm-ctor-eval (#6464)
When an evaluated export ends in a return call, continue evaluating the
return-called function. This requires propagating the parameters, handling the
case that the return-called function might be an import, and fixing up local
indices in case the final function has different parameters than the original
function.
* Update effects.h to handle return calls correctly (#6470)
As far as their surrounding code is concerned return calls are no different from
normal returns. It's only from a caller's perspective that a function containing
a return call also has the effects of the return-callee. To model this more
precisely in EffectAnalyzer, stash the throw effect of return-callees on the
side and only merge it in at the end when analyzing the effects of a full
function body.
|
|
|
|
|
|
|
|
| |
This PR is part of a series that adds basic support for the typed
continuations/wasmfx proposal.
This particular PR adds cont and nocont as top and bottom types for
continuation types, completely analogous to func and nofunc for function types
(also: exn and noexn).
|
|
|
|
| |
- Output segment names even when no memory is declared.
- Only write explicit names.
|
|
|
| |
The types was ignored and funcref was always used instead.
|
|
|
|
|
|
| |
The stringview types (`stringview_wtf8`, `stringview_wtf16`, and
`stringview_iter`) are not subtypes of `any` even though they are supertypes of
`none`. This breaks the type system invariant that types share a bottom type iff
they share a top type, but we can work around that.
|
|
|
|
|
|
|
|
|
|
|
| |
Previously we printed strings as WTF-8 in the output of fuzz-exec, but this
could produce invalid unicode output and did not make unprintable characters
visible. Fix both these problems by escaping the output, using the JSON string
escape procedure since the string to be escaped is WTF-16. Reimplement the same
escaping procedure in fuzz_shell.js so that the way we print strings when
running on a real JS engine matches the way we print them in our own fuzz-exec
interpreter.
Fixes #6435.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
WTF-16, i.e. arbitrary sequences of 16-bit values, is the encoding of Java and
JavaScript strings, and using the same encoding makes the interpretation of
string operations trivial, even when accounting for non-ascii characters.
Specifically, use little-endian WTF-16.
Re-encode string constants from WTF-8 to WTF-16 in the parsers, then back to
WTF-8 in the writers. Update the constructor for string `Literal`s to interpret
the string as WTF-16 and store a sequence of WTF-16 code units, i.e. 16-bit
integers. Update `Builder::makeConstantExpression` accordingly to convert from
the new `Literal` string representation back to a WTF-16 string.
Update the interpreter to remove the logic for detecting non-ascii characters
and bailing out. The naive implementations of all the string operations are
correct now that our string encoding matches the JS string encoding.
|