(#7008)
When we gather strings, we create a new global for each one, which becomes the
canonical defining global that is then used everywhere else. We create such a
global if one does not already exist, but if we happen to have such a global - a
global that simply defines a string - then we reuse it. However, we did not handle
the case where a use appeared before the definition, and we failed to sort the
definition before the use.
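A minimal sketch of the required ordering (hypothetical names, assuming the GC and strings features): one place where use-before-definition matters is global initializers, where the reused canonical global must be emitted before any global that reads it.

```wat
(module
  ;; $str already defines "hi", so it is reused as the canonical global;
  ;; it must be sorted before any initializer that global.gets it.
  (global $str (ref string) (string.const "hi"))
  (global $use (ref string) (global.get $str)))
```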
|
In the WebAssembly text format, strings can generally be arbitrary
bytes, but identifiers must be valid UTF-8. Check for UTF-8 validity
when parsing string-style identifiers in the lexer.
Update StringLowering to generate valid UTF-8 global names even for
strings that may not be valid UTF-8, and test that text round-tripping
works correctly after StringLowering.
Fixes #6937.
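A small sketch of the distinction (hypothetical module): a string-style identifier must be valid UTF-8, while an ordinary string literal, such as a data segment, may carry arbitrary bytes via escapes.

```wat
(module
  ;; string-style identifier: must be valid UTF-8
  (global $"über-global" i32 (i32.const 0))
  ;; plain string literal: arbitrary bytes are allowed
  (memory 1)
  (data (i32.const 0) "\ff\fe"))
```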
|
The best way to lower strings is via the "magic imports" API that uses
the names of imported string globals as their values. This approach only
works for valid UTF-8 strings, though. The existing
string-lowering-magic-imports pass falls back to putting non-UTF-8
strings in a JSON custom section, but this requires the runtime to
support that custom section for correctness. To help catch errors early
when runtimes do not support the strings custom section, add a new pass
that uses magic imports and raises an error if there are any invalid
strings.
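Roughly, magic-imports output looks like the sketch below; the exact import module name used by the pass is an assumption here. The key point is that the string's value is the import name itself, so the engine can materialize the constant without any JS-side parsing, and with the new pass a string that cannot be encoded this way is reported as an error rather than silently moved to the custom section.

```wat
(module
  ;; the import *name* carries the string contents
  (import "'" "hello world" (global $string.const_0 (ref extern))))
```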
|
Audit the remaining occurrences of `== HeapType::` and fix those that did
not handle shared types correctly. Add tests for some of the fixes;
others are NFC but clarify the code.
|
We need StringLowering to modify even public types, as it must replace every
single stringref with externref, even if that modifies the ABI. To achieve that
we told it that all string-using types were private, which let TypeUpdater update
them, but TypeUpdater moves all private types into a single new rec group, which
meant public and private types ended up in the same group. As a result, a single
public type would make the entire group public, preventing optimizations and
breaking things as in #6630 and #6640.
Ideally TypeUpdater would modify public types while keeping them in the same
rec groups, but this may be a very specific issue for StringLowering, and that
might be a lot of work. Instead, just make StringLowering handle public types of
functions in a manual way, which is simple and should handle all cases that
matter in practice, at least in J2Wasm.
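A sketch of the failure mode (hypothetical types): once a public type shares a rec group with formerly private types, the whole group is public and can no longer be optimized.

```wat
(module
  (rec
    (type $public (func (param externref)))  ;; public: used on an export
    (type $private (struct (field i32))))    ;; was private, now pinned by $public
  (func (export "f") (type $public)))
```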
|
The stringref proposal has been superseded by the imported JS strings proposal,
but the former has many more operations than the latter. To reduce complexity,
remove all operations that are part of stringref but not part of imported
strings.
|
The stringview types from the stringref proposal have three irregularities that
break common invariants and require pervasive special casing to handle properly:
they are supertypes of `none` but not subtypes of `any`, they cannot be the
targets of casts, and they cannot be used to construct nullable references. At
the same time, the stringref proposal has been superseded by the imported
strings proposal, which does not have these irregularities. The cost of
maintaining and improving our support for stringview types is no longer worth the
benefit of supporting them.
Simplify the code base by entirely removing the stringview types and related
instructions that do not have analogues in the imported strings proposal and do
not make sense in the absence of stringviews.
Three remaining instructions, `stringview_wtf16.get_codeunit`,
`stringview_wtf16.slice`, and `stringview_wtf16.length` take stringview operands
in the stringref proposal but cannot be removed because they lower to operations
from the imported strings proposal. These instructions are changed to take
stringref operands in Binaryen IR, and to allow a graceful upgrade path for
users of these instructions, the text and binary parsers still accept but ignore
`string.as_wtf16`, which is the instruction used to convert stringrefs to
stringviews. The binary writer emits code sequences that use scratch locals and `string.as_wtf16` to keep the output valid.
Future PRs will further align binaryen with the imported strings proposal
instead of the stringref proposal, for example by making `string` a subtype of
`extern` instead of a subtype of `any` and by removing additional instructions
that do not have analogues in the imported strings proposal.
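For illustration, a sketch of the new IR shape (assuming the strings feature): the remaining accessors take a stringref operand directly, and the binary writer re-inserts `string.as_wtf16` (plus a scratch local where needed) so the emitted module remains valid.

```wat
(module
  (func $len (param $s stringref) (result i32)
    (stringview_wtf16.length (local.get $s)))
  (func $unit (param $s stringref) (param $i i32) (result i32)
    (stringview_wtf16.get_codeunit (local.get $s) (local.get $i))))
```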
|
We were reusing mutable globals in StringGathering, which meant that we would
use a global to represent a particular string, but if that global was mutated then it
could contain a different string during execution.
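A sketch of why that is unsound (hypothetical module): a mutable global that merely starts out holding a given string is not a safe canonical definition for it.

```wat
(module
  (global $g (mut stringref) (string.const "hello"))
  (func $clobber
    ;; after this runs, $g no longer holds "hello"
    (global.set $g (string.const "other"))))
```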
|
The latest idea for efficient string constants is to encode the constants in
the import names of their globals and implement fast paths in the engines for
materializing those constants at instantiation time without needing to parse
anything in JS. This strategy only works for valid strings (i.e. strings without
unpaired surrogates) because only valid strings can be used as import names in
the WebAssembly syntax.
Add a new configuration of the StringLowering pass that encodes valid string
contents in import names, falling back to the JSON custom section approach for
invalid strings.
To test this change, update the printer to escape import and export names
properly and update the legacy parser to parse escapes in import and export
names properly. As a drive-by, remove the incorrect check in the parser that the
import module and base names are non-empty.
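A sketch of the two encodings side by side; the "'" module name for magic imports is an assumption here, while the numbered "string.const" import form matches the existing JSON custom section approach.

```wat
(module
  ;; valid string: contents encoded directly in the import name
  (import "'" "hello" (global $valid (ref extern)))
  ;; invalid string (e.g. one with an unpaired surrogate): falls back to a
  ;; numbered import backed by the "string.consts" JSON custom section
  (import "string.const" "0" (global $invalid (ref extern))))
```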
|
WTF-16, i.e. arbitrary sequences of 16-bit values, is the encoding of Java and
JavaScript strings, and using the same encoding makes the interpretation of
string operations trivial, even when accounting for non-ASCII characters.
Specifically, use little-endian WTF-16.
Re-encode string constants from WTF-8 to WTF-16 in the parsers, then back to
WTF-8 in the writers. Update the constructor for string `Literal`s to interpret
the string as WTF-16 and store a sequence of WTF-16 code units, i.e. 16-bit
integers. Update `Builder::makeConstantExpression` accordingly to convert from
the new `Literal` string representation back to a WTF-16 string.
Update the interpreter to remove the logic for detecting non-ASCII characters
and bailing out. The naive implementations of all the string operations are
correct now that our string encoding matches the JS string encoding.
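A small example of the re-encoding: the text and binary formats still carry WTF-8 bytes, while the `Literal` now stores little-endian WTF-16 code units.

```wat
(module
  ;; "h€" is the WTF-8 byte sequence 68 E2 82 AC in the binary format, but the
  ;; Literal now holds the WTF-16 code units 0x0068, 0x20AC.
  (global $s (ref string) (string.const "h€")))
```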
|
When we do a local.set of a value into a local then we have both a subtyping constraint - for
the value to be valid to put in that local - and also a flow of a value, which can then reach
more places. Such flow then interacts with casts in Unsubtyping, since it needs to know
what can flow where in order to know how casts force us to keep subtyping relations.
That regressed in the not-actually-NFC #6323, in which I added seemingly innocuous
lines that add subtyping constraints for ref.eq. It seems fine to require that the arms
of a RefEq be of type eqref, but Unsubtyping then assumed those arms flowed into a
location of type eqref, which meant casts might force us to not optimize some
things.
To fix this, differentiate the rare case of non-flowing subtyping constraints, which is
basically only RefEq. There are perhaps a few more cases (like i31 operations) but they
do not matter in practice for Unsubtyping anyhow; I suggest we land this first to undo
the regression and then at our leisure investigate the other instructions.
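A sketch of the RefEq case (hypothetical types): the arms are only required to be eqref-compatible; no value actually flows into an eqref-typed location, so this constraint should not interact with casts.

```wat
(module
  (type $A (struct))
  (func $same (param $x (ref $A)) (param $y (ref $A)) (result i32)
    (ref.eq (local.get $x) (local.get $y))))
```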
|
Previously we lowered this to `getCodePointAt`, which has different semantics
around surrogate pairs.
|
This replaces horrible hacks to find which nulls need to switch (from none to
noext) with general code using SubtypingDiscoverer. That helper is aware of
where each expression is written, so we can find those nulls trivially.
This is NFC on existing usage but should fix any remaining bugs with null
constants.
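For example (a sketch, using the spec names `none`/`noextern` for the none/noext bottom types): a null constant written into a location that becomes externref must switch its bottom type.

```wat
(module
  ;; before lowering: (global $g (ref null string) (ref.null none))
  (global $g externref (ref.null noextern)))
```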
|
Also add an end-to-end test using node to verify we can parse the escaped
content properly using TextDecoder+JSON.parse.
|
StringAs's output must be non-nullable, so add a cast.
|
The input module might use an array-of-16-bit-elements type that is buried somewhere
in a giant rec group, but that is not valid for imported strings: the array type now
appears on an import and must match the expected ABI, which requires it to be in its
own rec group.
The old array16 type remains in the module after this transformation, but all uses of it
are replaced with uses of the new array16 type.
Also move makeImports to after updateTypes: there are no types to update in the new
imports, so the old order was not incorrect, but it made debugging less pleasant.
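A sketch of the required shape; the `wasm:js-string` builtin name and signature below follow the imported-strings (JS string builtins) proposal and are assumptions as far as this log goes.

```wat
(module
  ;; the 16-bit array type sits alone in its own (singleton) rec group
  (type $array16 (array (mut i16)))
  (import "wasm:js-string" "fromCharCodeArray"
    (func (param (ref null $array16) i32 i32) (result (ref extern)))))
```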
|
Replacing the string heap type with extern is dangerous as they do not share top/bottom
types. In practice this works out almost everywhere except for a few ifs, which we can fix
up as a hack for now.
|
We want to actually remove all stringref appearances, in both public and
private types.
|
All those in the list from #6271 (comment)
|
This extends StringGathering by replacing the gathered string globals with imported
globals. It adds a custom section listing the strings that the imports are expected to
provide. It also replaces the string type with extern.
This is a complete lowering of strings, except for the string operations, which remain a TODO.
After running this, no strings remain in the wasm, and the outside JS is expected
to provide the proper imports, which it can do by processing the JSON of the
strings in the custom section "string.consts", which looks like
["foo", "bar", ..]
That is, an array of strings, which are imported as
(import "string.const" "0" (global $string.const_foo (ref extern))) ;; foo
(import "string.const" "1" (global $string.const_bar (ref extern))) ;; bar
This pass finds every string.const and creates a global for it. After this transform, no
string.const appears anywhere but in a global, and each string appears in exactly one
global, which is then global.get-ed everywhere.
This avoids overhead in VMs where executing a string.const is an allocation, and is
also a good step towards imported strings. For that, this pass will be extended from
gathering to a full lowering pass, which will first gather into globals as this pass does,
and then turn each of those globals with a string.const into an imported externref.
(For that reason this pass is in a file called StringLowering, as the two passes will
share much of their code, and the larger pass should decide the name I think.)
This pass runs in -O2 and above. Repeated executions have no downside (see
details in code).
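A sketch of the transform (hypothetical global name): after gathering, each distinct string has a single defining global and every use site is a global.get of it.

```wat
(module
  (global $string.const_hi (ref string) (string.const "hi"))
  (func $a (result stringref) (global.get $string.const_hi))
  (func $b (result stringref) (global.get $string.const_hi)))
```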