path: root/src/passes/StringLowering.cpp
Commit message  Author  Age  Files  Lines
* [Strings] StringGathering: Handle uses of strings before their definitions (#7008)  Alon Zakai  2024-10-15  1  -10/+8
| When we gather strings, we create a new global for each one, which becomes the canonical defining global for that string and is used everywhere else. We create such a global if we lack one, but if the module already has a global that simply defines the string, we reuse it. However, we did not handle the case where a use appeared before the definition, and we failed to sort the definition before the use.
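The ordering requirement in #7008 can be pictured with a stable partition that moves string-defining globals ahead of everything else while keeping the relative order within each group. This is a minimal Python sketch on a toy model; the tuple layout and the name `sort_definitions_first` are hypothetical and are not Binaryen's actual code.

```python
def sort_definitions_first(module_globals, defining_names):
    """Return the globals with string-defining ones first (stable order).

    module_globals: list of (name, init_expr) tuples, a toy stand-in for a
    module's global list. defining_names: set of names of globals whose
    initializer is just a string constant.
    """
    # sorted() is stable, and False sorts before True, so defining globals
    # move to the front without being reordered among themselves.
    return sorted(module_globals, key=lambda g: g[0] not in defining_names)


globals_in_module = [
    ("user", "global.get $str"),      # use appears before the definition
    ("other", "i32.const 0"),
    ("str", 'string.const "foo"'),    # the defining global
]
ordered = sort_definitions_first(globals_in_module, {"str"})
```

After the call, `str` precedes `user`, so the use no longer comes before its definition.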
* Require string-style identifiers to be UTF-8 (#6941)  Thomas Lively  2024-09-16  1  -2/+5
| In the WebAssembly text format, strings can generally be arbitrary bytes, but identifiers must be valid UTF-8. Check for UTF-8 validity when parsing string-style identifiers in the lexer. Update StringLowering to generate valid UTF-8 global names even for strings that may not be valid UTF-8, and test that text round-tripping works correctly after StringLowering. Fixes #6937.
* Only generate string.consts custom section if it is needed (#6893)  Goktug Gokdogan  2024-09-05  1  -7/+10
|
* Add a string lowering mode disallowing non-UTF-8 strings (#6861)  Thomas Lively  2024-08-21  1  -2/+19
| The best way to lower strings is via the "magic imports" API that uses the names of imported string globals as their values. This approach only works for valid UTF-8 strings, though. The existing string-lowering-magic-imports pass falls back to putting non-UTF-8 strings in a JSON custom section, but this requires the runtime to support that custom section for correctness. To help catch errors early when runtimes do not support the strings custom section, add a new pass that uses magic imports and raises an error if there are any invalid strings.
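The two behaviours described in #6861 (fall back to the JSON section, or fail early) can be sketched as a partition over the string constants. This Python sketch is illustrative only: `plan_lowering` and its `strict` flag are hypothetical names, and "valid" is approximated here as "encodable as UTF-8", which rejects unpaired surrogates.

```python
def is_valid_utf8_string(s: str) -> bool:
    """True if s can be encoded as UTF-8, i.e. has no unpaired surrogates."""
    try:
        s.encode("utf-8")
        return True
    except UnicodeEncodeError:
        return False


def plan_lowering(strings, strict=False):
    """Split strings into (magic_imports, json_fallback).

    With strict=True, any string that cannot be a magic import is an error,
    mirroring the idea of the stricter pass; otherwise it goes to the fallback.
    """
    magic, fallback = [], []
    for s in strings:
        (magic if is_valid_utf8_string(s) else fallback).append(s)
    if strict and fallback:
        raise ValueError(f"{len(fallback)} string(s) are not valid UTF-8")
    return magic, fallback


magic, fallback = plan_lowering(["foo", "bad\ud800"])
```

Calling `plan_lowering(["foo", "bad\ud800"], strict=True)` raises instead of silently relying on the custom section.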
* Fix direct comparisons with unshared basic heap types (#6845)  Thomas Lively  2024-08-16  1  -4/+9
| Audit the remaining occurrences of `== HeapType::` and fix those that did not handle shared types correctly. Add tests for some of the fixes; others are NFC but clarify the code.
* [NFC] Add HeapType::getFeatures() (#6707)  Alon Zakai  2024-06-27  1  -1/+1
|
* [Strings] Keep public and private types separate in StringLowering (#6642)  Alon Zakai  2024-06-10  1  -13/+39
| We need StringLowering to modify even public types, as it must replace every single stringref with externref, even if that modifies the ABI. To achieve that, we told it that all string-using types were private, which let TypeUpdater update them. The problem is that TypeUpdater moves all private types to a single new rec group, which meant public and private types ended up in the same group. As a result, a single public type would make the whole group public, preventing optimizations and breaking things as in #6630 and #6640.
|
| Ideally TypeUpdater would modify public types while keeping them in the same rec groups, but this may be a very specific issue for StringLowering, and that might be a lot of work. Instead, just make StringLowering handle public types of functions in a manual way, which is simple and should handle all cases that matter in practice, at least in J2Wasm.
* [Strings] Remove operations not included in imported strings (#6589)  Thomas Lively  2024-05-15  1  -5/+6
| The stringref proposal has been superseded by the imported JS strings proposal, but the former has many more operations than the latter. To reduce complexity, remove all operations that are part of stringref but not part of imported strings.
* [Strings] Remove stringview types and instructions (#6579)  Thomas Lively  2024-05-15  1  -34/+6
| The stringview types from the stringref proposal have three irregularities that break common invariants and require pervasive special casing to handle properly: they are supertypes of `none` but not subtypes of `any`, they cannot be the targets of casts, and they cannot be used to construct nullable references. At the same time, the stringref proposal has been superseded by the imported strings proposal, which does not have these irregularities. The cost of maintaining and improving our support for stringview types is no longer worth the benefit of supporting them.
|
| Simplify the code base by entirely removing the stringview types and related instructions that do not have analogues in the imported strings proposal and do not make sense in the absence of stringviews.
|
| Three remaining instructions, `stringview_wtf16.get_codeunit`, `stringview_wtf16.slice`, and `stringview_wtf16.length`, take stringview operands in the stringref proposal but cannot be removed because they lower to operations from the imported strings proposal. These instructions are changed to take stringref operands in Binaryen IR. To allow a graceful upgrade path for users of these instructions, the text and binary parsers still accept but ignore `string.as_wtf16`, which is the instruction used to convert stringrefs to stringviews. The binary writer emits code sequences that use scratch locals and `string.as_wtf16` to keep the output valid.
|
| Future PRs will further align Binaryen with the imported strings proposal instead of the stringref proposal, for example by making `string` a subtype of `extern` instead of a subtype of `any` and by removing additional instructions that do not have analogues in the imported strings proposal.
* [Strings] Do not reuse mutable globals in StringGathering (#6531)  Alon Zakai  2024-04-24  1  -1/+2
| We were reusing mutable globals in StringGathering, which meant that we would use a global to represent a particular string, but if that global was mutated then it could contain a different string during execution.
* [Strings] Add a string lowering pass using magic imports (#6497)  Thomas Lively  2024-04-15  1  -11/+24
| The latest idea for efficient string constants is to encode the constants in the import names of their globals and implement fast paths in the engines for materializing those constants at instantiation time without needing to parse anything in JS. This strategy only works for valid strings (i.e. strings without unpaired surrogates) because only valid strings can be used as import names in the WebAssembly syntax.
|
| Add a new configuration of the StringLowering pass that encodes valid string contents in import names, falling back to the JSON custom section approach for invalid strings.
|
| To test this change, update the printer to escape import and export names properly and update the legacy parser to parse escapes in import and export names properly. As a drive-by, remove the incorrect check in the parser that the import module and base names are non-empty.
* [Strings] Lower string.concat in StringLowering (#6453)  Thomas Lively  2024-03-29  1  -0/+9
|
* [Strings] Represent string values as WTF-16 internally (#6418)  Thomas Lively  2024-03-22  1  -1/+7
| WTF-16, i.e. arbitrary sequences of 16-bit values, is the encoding of Java and JavaScript strings, and using the same encoding makes the interpretation of string operations trivial, even when accounting for non-ASCII characters. Specifically, use little-endian WTF-16.
|
| Re-encode string constants from WTF-8 to WTF-16 in the parsers, then back to WTF-8 in the writers. Update the constructor for string `Literal`s to interpret the string as WTF-16 and store a sequence of WTF-16 code units, i.e. 16-bit integers. Update `Builder::makeConstantExpression` accordingly to convert from the new `Literal` string representation back to a WTF-16 string.
|
| Update the interpreter to remove the logic for detecting non-ASCII characters and bailing out. The naive implementations of all the string operations are correct now that our string encoding matches the JS string encoding.
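The WTF-8 to WTF-16 re-encoding mentioned in #6418 can be approximated in Python with the `surrogatepass` error handler, which lets unpaired surrogates survive both codecs. This is a sketch of the idea only; the function names are made up here, and Binaryen's own conversion routines are written in C++ and may differ in edge cases.

```python
def wtf8_to_wtf16le(data: bytes) -> bytes:
    """Re-encode WTF-8 bytes as little-endian WTF-16 bytes."""
    # "surrogatepass" allows lone surrogates, which strict UTF-8 rejects.
    text = data.decode("utf-8", "surrogatepass")
    return text.encode("utf-16-le", "surrogatepass")


def wtf16le_to_wtf8(data: bytes) -> bytes:
    """Re-encode little-endian WTF-16 bytes back to WTF-8 bytes."""
    text = data.decode("utf-16-le", "surrogatepass")
    return text.encode("utf-8", "surrogatepass")


# U+00E9 is two bytes in UTF-8 and one 16-bit code unit in WTF-16.
e_acute_16 = wtf8_to_wtf16le(b"\xc3\xa9")
# An unpaired high surrogate U+D800, which only the WTF encodings allow.
lone_16 = wtf8_to_wtf16le(b"\xed\xa0\x80")
```

The round trip through `wtf16le_to_wtf8` returns the original bytes for both inputs, including the unpaired surrogate.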
* SubtypingDiscoverer: Differentiate non-flow subtyping constraints (#6344)  Alon Zakai  2024-02-27  1  -0/+4
| When we do a local.set of a value into a local, we have both a subtyping constraint (the value must be valid to put in that local) and a flow of a value, which can then reach more places. Such flow then interacts with casts in Unsubtyping, since it needs to know what can flow where in order to know how casts force us to keep subtyping relations.
|
| That regressed in the not-actually-NFC #6323, in which I added the innocuous lines to add subtyping constraints in ref.eq. It seems fine to require that the arms of a RefEq must be of type eqref, but Unsubtyping then assumed those arms flowed into a location of type eqref, which means casts might force us to not optimize some things.
|
| To fix this, differentiate the rare case of non-flowing subtyping constraints, which is basically only RefEq. There are perhaps a few more cases (like i31 operations) but they do not matter in practice for Unsubtyping anyhow; I suggest we land this first to undo the regression and then at our leisure investigate the other instructions.
* [StringLowering] Lower `stringview_wtf16.get_codeunit` to `charCodeAt` (#6353)  Thomas Lively  2024-02-26  1  -4/+4
| Previously we lowered this to `getCodePointAt`, which has different semantics around surrogate pairs.
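The difference in semantics around surrogate pairs that #6353 refers to can be shown with a small model of the two lookups. This Python sketch mimics JavaScript-style code-unit and code-point access; the helper names are chosen for illustration and do not correspond to Binaryen or engine functions.

```python
def code_units(s: str) -> list[int]:
    """Return the 16-bit code units of s in WTF-16 order."""
    data = s.encode("utf-16-le", "surrogatepass")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


def char_code_at(s: str, index: int) -> int:
    """Code-unit lookup: always returns a single 16-bit value."""
    return code_units(s)[index]


def code_point_at(s: str, index: int) -> int:
    """Code-point lookup: combines a high/low surrogate pair into one value."""
    units = code_units(s)
    first = units[index]
    if 0xD800 <= first <= 0xDBFF and index + 1 < len(units):
        second = units[index + 1]
        if 0xDC00 <= second <= 0xDFFF:
            return 0x10000 + ((first - 0xD800) << 10) + (second - 0xDC00)
    return first


text = "a\U0001F600"          # 'a' followed by one astral character
unit = char_code_at(text, 1)   # the high surrogate only
point = code_point_at(text, 1) # the full code point
```

For index 1, the code-unit lookup yields `0xD83D` while the code-point lookup yields `0x1F600`, which is why the two are not interchangeable for a `get_codeunit` lowering.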
* [NFC] Use SubtypingDiscoverer in StringLowering (#6325)  Alon Zakai  2024-02-20  1  -49/+49
| This replaces horrible hacks to find which nulls need to switch (from none to noext) with general code using SubtypingDiscoverer. That helper is aware of where each expression is written, so we can find those nulls trivially. This is NFC on existing usage but should fix any remaining bugs with null constants.
* StringLowering: Escape the JSON in the custom section (#6316)  Alon Zakai  2024-02-20  1  -8/+12
| Also add an end-to-end test using node to verify we can parse the escaped content properly using TextDecoder+JSON.parse.
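To illustrate why escaping matters for the custom section, here is a minimal Python sketch that serializes a list of strings to ASCII-only JSON and parses it back. It relies on `json.dumps` escaping quotes, backslashes, and non-ASCII code units as `\uXXXX`; this is an analogy for the property being tested, not a description of the escaping code Binaryen actually uses.

```python
import json


def encode_string_section(strings: list[str]) -> bytes:
    """Serialize strings as ASCII-only JSON, escaping everything else."""
    # ensure_ascii=True (the default) emits \uXXXX escapes, including for
    # unpaired surrogates, so the payload is always plain ASCII bytes.
    return json.dumps(strings, ensure_ascii=True).encode("ascii")


def decode_string_section(payload: bytes) -> list[str]:
    """Parse the payload back into the original list of strings."""
    return json.loads(payload.decode("ascii"))


originals = ['say "hi"', "back\\slash", "lone\ud800", "smile\U0001F600"]
payload = encode_string_section(originals)
restored = decode_string_section(payload)
```

The round trip preserves every string, including the one with an unpaired surrogate.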
* StringLowering: Lower nulls in call params (#6317)  Alon Zakai  2024-02-20  1  -0/+10
|
* StringLowering: Properly handle nullable inputs to StringAs (#6307)  Alon Zakai  2024-02-14  1  -1/+11
| StringAs's output must be non-nullable, so add a cast.
* StringLowering: Fix up nulls written to struct.new fields (#6306)  Alon Zakai  2024-02-14  1  -16/+36
|
* StringLowering: Use an array16 type in its own rec group (#6302)  Alon Zakai  2024-02-13  1  -9/+25
| The input module might use an array type with 16-bit elements that is somewhere in a giant rec group, but that is not valid for imported strings: that array type is now on an import and must match the expected ABI, which is to be in its own rec group. The old array16 type remains in the module after this transformation, but all uses of it are replaced with uses of the new array16 type.
|
| Also move makeImports to after updateTypes: there are no types to update in the new imports. The ordering does not matter functionally, but it can make debugging less pleasant, so improve it.
* StringLowering: Hack around if issue with bottom types (#6303)  Alon Zakai  2024-02-13  1  -0/+21
| Replacing the string heap type with extern is dangerous as they do not share top/bottom types. In practice this works out almost everywhere except for a few ifs, which we can fix up as a hack for now.
* StringLowering: Modify string=>extern also in public types (#6301)  Alon Zakai  2024-02-13  1  -1/+15
| We want to actually remove all stringref appearances, in both public and private types.
* [NFC] Add links to specs in StringLowering (#6292)  Alon Zakai  2024-02-08  1  -0/+4
|
* StringLowering: Lower all remaining important string operations (#6283)  Alon Zakai  2024-02-08  1  -0/+84
| All those in the list from #6271 (comment)
* StringLowering: Start to lower instructions (#6281)  Alon Zakai  2024-02-06  1  -0/+82
|
* StringLowering pass (#6271)  Alon Zakai  2024-02-05  1  -4/+60
| This extends StringGathering by replacing the gathered string globals with imported globals. It adds a custom section with the strings that the imports are expected to provide. It also replaces the string type with extern.
|
| This is a complete lowering of strings, except for string operations, which are a TODO. After running this, no strings remain in the wasm, and the outside JS is expected to provide the proper imports, which it can do by processing the JSON of the strings in the custom section "string.consts", which looks like
|
|   ["foo", "bar", ..]
|
| That is, an array of strings, which are imported as
|
|   (import "string.const" "0" (global $string.const_foo (ref extern))) ;; foo
|   (import "string.const" "1" (global $string.const_bar (ref extern))) ;; bar
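As a rough illustration of the consumer side described in #6271, the following Python sketch turns the JSON payload of the "string.consts" section into an import table keyed by index, matching the `"string.const" "0"`, `"string.const" "1"` naming in the message. The function name is hypothetical, and the sketch assumes the payload is UTF-8 encoded JSON text; real embedders would do this in JS.

```python
import json


def build_string_imports(section_payload: bytes) -> dict[str, dict[str, str]]:
    """Map each string in the section to the import ("string.const", "<index>")."""
    strings = json.loads(section_payload.decode("utf-8"))
    # The import base name is the string's position in the JSON array.
    return {"string.const": {str(i): s for i, s in enumerate(strings)}}


imports = build_string_imports(b'["foo", "bar"]')
```

Here `imports["string.const"]["0"]` is `"foo"` and `imports["string.const"]["1"]` is `"bar"`, which is the value each imported global is expected to hold.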
* StringGathering pass (#6257)  Alon Zakai  2024-01-31  1  -0/+180
  This pass finds all string.const and creates globals for them. After this transform, no string.const appears anywhere but in a global, and each string appears in one global which is then global.get-ed everywhere.

  This avoids overhead in VMs where executing a string.const is an allocation, and is also a good step towards imported strings. For that, this pass will be extended from gathering to a full lowering pass, which will first gather into globals as this pass does, and then turn each of those globals with a string.const into an imported externref. (For that reason this pass is in a file called StringLowering, as the two passes will share much of their code, and the larger pass should decide the name I think.)

  This pass runs in -O2 and above. Repeated executions have no downside (see details in code).
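The gathering step described in #6257 can be sketched on a toy IR: every `string.const` in a function body is replaced by a `global.get` of a global that holds that string, with one global per distinct string. The `ToyModule` class, the instruction tuples, and the generated global names are all hypothetical simplifications, not Binaryen data structures.

```python
from dataclasses import dataclass, field


@dataclass
class ToyModule:
    # Function name -> list of (opcode, operand) instructions.
    functions: dict[str, list[tuple[str, str]]]
    # Global name -> the string constant that global defines.
    string_globals: dict[str, str] = field(default_factory=dict)


def gather_strings(module: ToyModule) -> None:
    """Replace each ("string.const", s) with ("global.get", name) in place."""
    # Reuse existing defining globals, so running the pass twice is harmless.
    name_for = {s: name for name, s in module.string_globals.items()}
    for body in module.functions.values():
        for i, (op, operand) in enumerate(body):
            if op != "string.const":
                continue
            if operand not in name_for:
                name = f"string.const_{len(name_for)}"
                name_for[operand] = name
                module.string_globals[name] = operand
            body[i] = ("global.get", name_for[operand])


toy = ToyModule(functions={
    "f": [("string.const", "foo"), ("string.const", "bar")],
    "g": [("string.const", "foo")],
})
gather_strings(toy)
```

After the call, both uses of `"foo"` read the same global, and a second call changes nothing because no `string.const` remains in any function body.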