summaryrefslogtreecommitdiff
path: root/src/wasm
Commit message (Collapse)AuthorAgeFilesLines
* Work around a gcc 13 issue with signbit that made us not compute fmin of -0 ↵Alon Zakai2023-10-041-4/+14
| | | | properly (#5994)
* [Parser] Parse labels and br (#5970)Thomas Lively2023-10-022-14/+87
| | | | | | The parser previously parsed labels and could attach them to control flow structures, but did not maintain the context necessary to correctly parse branches. Support parsing labels as both names and indices in IRBuilder, handling shadowing correctly, and use that support to implement parsing of br.
* Refine ref.test's castType during refinalization (#5985)Thomas Lively2023-10-022-0/+6
| | | | | | Just like we do with other casts, refine the cast type to be the greatest lower bound of its previous cast type and its input type. The difference is that the output type of ref.test remains i32, but it's still useful to retain more precise type information.
* wasm-s-parser: Add context in validation errors (#5981)Alon Zakai2023-09-281-201/+188
| | | | Instead of just reporting the reason and line + column, also log out the element the error occurred at.
* Support function contexts in IRBuilder (#5967)Thomas Lively2023-09-221-31/+35
| | | | | | Add a `visitFunctionStart` function to IRBuilder and make it responsible for setting the function's body when the context is closed. This will simplify outlining, will be necessary to support branches to function scope properly, and removes an extra block around function bodies in the new wat parser.
* [Parser] Support loops (#5966)Thomas Lively2023-09-211-5/+20
| | | Parse loops in the new wat parser and add support for them to the IRBuilder.
* [Parser] Parse if-else in the new wat parser and IRBuilder (#5963)Thomas Lively2023-09-211-15/+108
| | | | | | Parse both the straight-line and folded versions of if, including the abbreviations that allow omitting the else clause. In the IRBuilder, generalize the scope stack to be able to track scopes other than blocks and add methods for visiting the beginnings of ifs and elses.
* Support i8/i16 mutable arrays as public types for string interop (#5814)Alon Zakai2023-09-212-1/+28
| | | | | Probably any array of non-reference data can be allowed to be public and sent out of the module, as it is just data. For now, however, just special case the i8 and i16 array types which are useful already for string interop.
* Error on multivalue inputs that we do not handle (#5962)Alon Zakai2023-09-201-2/+6
| | | | | | Before in getType() we silently dropped the params of a signature type. Now we verify that it is none, or we error. Helps #5950
* [NFC] Split the new wat parser into multiple files (#5960)Thomas Lively2023-09-193-4935/+0
| | | | | | And put the new files in a new source directory, "parser". This is a rough split and is not yet expected to dramatically improve compile times. The exact organization of the new files is subject to change, but this splitting should be enough to make further parser development more pleasant.
* Reland "Optimize tuple.extract of gets in BinaryInstWriter" (#5955)Thomas Lively2023-09-181-1/+42
| | | | | | | | | In general, the binary lowering of tuple.extract expects that all the tuple values are on top of the stack, so it inserts drops and possibly uses a scratch local to ensure only the extracted value is left. However, when the extracted tuple expression is a local.get, local.tee, or global.get, it's much more efficient to change the lowering of the get or tee to ensure that only the extracted value is on the stack to begin with. Implement that optimization in the binary writer.
* Fix visitBlock and add visitBlockStart in IRBuilder (#5959)Thomas Lively2023-09-191-2/+7
| | | | | | | | | | Visiting a block should push it onto the stack just like visiting any other expression, but we previously had a `visitBlock` that introduced a new scope instead. Fix `visitBlock` to behave as expected and introduce a new `visitBlockStart` method to introduce a new scope. Unfortunately this cannot be fully tested yet because the wat parser uses the `makeXYZ` API intead of the `visit` API, but at least I updated `makeBlock` to call `visitBlockStart`, so that is tested.
* Fix validation error message for table.fill (#5953)Thomas Lively2023-09-181-4/+3
| | | table.fill requires bulk memory to be enabled, not reference types.
* Implement table.fill (#5949)Thomas Lively2023-09-186-0/+71
| | | | | | | | This instruction was standardized as part of the bulk memory proposal, but we never implemented it until now. Leave similar instructions like table.copy as future work. Fixes #5939.
* Remove legacy type defintion text syntax (#5948)Thomas Lively2023-09-181-40/+4
| | | | | | | Remove support for the "struct_subtype", "array_subtype", "func_subtype", and "extends" notations we used at various times to declare WasmGC types, leaving only support for the standard text fromat for declaring types. Update all the tests using the old formats and delete tests that existed solely to test the old formats.
* Revert "Optimize tuple.extract of gets in BinaryInstWriter (#5941)" (#5945)Thomas Lively2023-09-141-42/+1
| | | | | This reverts commit 56ce1eaba7f500b572bcfe06e3248372e9672322. The binary writer optimization is not always correct when stack IR optimizations have run. Revert the change until we can fix it.
* Optimize tuple.extract of gets in BinaryInstWriter (#5941)Thomas Lively2023-09-141-1/+42
| | | | | | | | | In general, the binary lowering of tuple.extract expects that all the tuple values are on top of the stack, so it inserts drops and possibly uses a scratch local to ensure only the extracted value is left. However, when the extracted tuple expression is a local.get, local.tee, or global.get, it's much more efficient to change the lowering of the get or tee to ensure that only the extracted value is on the stack to begin with. Implement that optimization in the binary writer.
* Encode command line to UTF8 on Windows (#5671)Derek Schuff2023-09-141-3/+11
| | | | | | | | | | | | | | | | This PR changes how file paths and the command line are handled. On startup on Windows, we process the wstring version of the command line (including the file paths) and re-encode it to UTF8 before handing it off to the rest of the command line handling logic. This means that all paths are stored in UTF8-encoded std::strings as they go through the program, right up until they are used to open files. At that time, they are converted to the appropriate native format with the new to_path function before passing to the stdlib open functions. This has the advantage that all of the non-file-opening code can use a single type to hold paths (which is good since std::filesystem::path has proved problematic in some cases), but has the disadvantage that someone could add new code that forgets to convert to_path before opening. That's somewhat mitigated by the fact that most of the code uses the ModuleIOBase classes for opening files. Fixes #4995
* Replace i31.new with ref.i31 everywhere (#5931)Thomas Lively2023-09-131-2/+2
| | | | | Replace i31.new with ref.i31 in the printer, tests, and source code. Continue parsing i31.new for the time being to allow a graceful transition. Also update the JS API to reflect the new instruction name.
* Replace I31New with RefI31 everywhere (#5930)Thomas Lively2023-09-137-21/+21
| | | | | | | | Globally replace the source string "I31New" with "RefI31" in preparation for renaming the instruction from "i31.new" to "ref.i31", as implemented in the spec in https://github.com/WebAssembly/gc/pull/422. This would be NFC, except that it also changes the string in the external-facing C APIs. A follow-up PR will make the corresponding behavioral change.
* Remove legacy GC text syntax (#5929)Thomas Lively2023-09-121-34/+4
| | | | Remove the old forms of ref.test and ref.cast that took heap types instead of ref types and remove the old array.init_static name for array.new_fixed.
* Make final types the default (#5918)Thomas Lively2023-09-094-21/+28
| | | | | | | | | | | | | Match the spec and parse the shorthand binary and text formats as final and emit final types without supertypes using the shorthands as well. This is a potentially-breaking change, since the text and binary shorthands can no longer be used to define types that have subtypes. Also make TypeBuilder entries final by default to better match the spec and update the internal APIs to use the "open" terminology rather than "final" terminology. Future changes will update the text format to use the standard "sub open" rather than the current "sub final" keywords. The exception is the new wat parser, which supporst "sub open" as of this change, since it didn't support final types at all previously.
* [NFC] Source maps: Simplify the code and add comments (#5912)Alon Zakai2023-09-061-21/+16
| | | | | | | | | | | | | | | Almost entirely trivial, except for this part: - if (nextDebugLocation.availablePos == 0 && - nextDebugLocation.previousPos <= pos) { + if (nextDebugLocation.availablePos == 0) { return; I believe removing the extra check has no effect. Removing it does not change anything in the test suite, and logically, if we set availablePos to 0 then we definitely want to return here - we set it to 0 to indicate there is nothing left to read, which is what the code after it does. As a result, we can remove the previousPos field entirely.
* Remove the GCNNLocals feature (#5080)Thomas Lively2023-08-312-42/+8
| | | | | Now that the WasmGC spec has settled on a way of validating non-nullable locals, we no longer need this experimental feature that allowed nonstandard uses of non-nullable locals.
* Parse non-nullable tuple elements without special handling (#5910)Thomas Lively2023-08-301-23/+5
| | | | | | | In the binary parser, when creating a scratch local to hold multivalue results as tuples, we previously ensured that the scratch local did not contain any non-nullable by modifying its type and inserting ref.as_non_null as necessary. Now that we properly support non-nullable elements in tuple locals, however, this parser behavior is no longer necessary. Remove it.
* Source maps: Fix locations without debug info between ones that do (#5906)Alon Zakai2023-08-301-10/+55
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Previously we did nothing for instructions without debug info. So if we had one that did, followed by one that didn't, the one that didn't could get "smeared" with the debug info of the first. Source map locations are the start of segments, apparently, and so if we say a location has info then all others after it will as well, until the next segment. To fix that, add support for source map entries with just one field, the binary location. Such entries have no debug info (no file:line:col), and though the source maps spec is not very clear on this, this seems like the right way to prevent this problem: to stop a segment with debug info by starting a new one without, when we know we don't want that info any more. That is, before this PR we could have this: ;; file.cpp:10:1 (nop) ;; binary offset 5 in wasm (nop) ;; binary offset 6 in wasm and the second nop would get the same debug annotation, since we just have one source map segment, [5, file.cpp, 10, 1] // start at offset 5 in wasm With this PR, we emit: [5, file.cpp, 10, 1] // start at offset 5 in wasm; file.cpp:10:1 [6] // start at offset 6 in wasm; no debug info This does add 7% to source map sizes, however, since we add those 1-length segments now, but that seems unavoidable to fix this bug. To implement this, add a new field that says if the next location in the source map has debug info or not, and use that.
* Validate and fix up tuples with non-nullable elements (#5909)Thomas Lively2023-08-301-3/+6
| | | | | | The code validating and fixing up non-nullable locals previously did not correctly handle tuples that contained non-nullable elements, which could have resulted in invalid modules going undetected. Update the code to handle tuples and add tests.
* Refactor IRBuilder to build blocks internally (#5901)Thomas Lively2023-08-282-220/+316
| | | | | | | | | | | | | The initial PR introducing IRBuilder kept the interface the same as the previous internal interface in the new wat parser. This PR updates that interface to avoid exposing implementation details of the IRBuilder and to provide an API that matches the binary format. For example, after calling `makeBlock` or `visitBlock` at the beginning of a block, users now call `visitEnd()` at the end of the block without having to manually install the block's contents. Providing this improved interface requires refactoring some of the IRBuilder internals. While we are refactoring things anyway, put in extra effort to avoid unnecessarily splitting up and recombining tuples that could simply be returned from a multivalue block.
* Simplify and consolidate type printing (#5816)Thomas Lively2023-08-241-14/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When printing Binaryen IR, we previously generated names for unnamed heap types based on their structure. This was useful for seeing the structure of simple types at a glance without having to separately go look up their definitions, but it also had two problems: 1. The same name could be generated for multiple types. The generated names did not take into account rec group structure or finality, so types that differed only in these properties would have the same name. Also, generated type names were limited in length, so very large types that shared only some structure could also end up with the same names. Using the same name for multiple types produces incorrect and unparsable output. 2. The generated names were not useful beyond the most trivial examples. Even with length limits, names for nontrivial types were extremely long and visually noisy, which made reading disassembled real-world code more challenging. Fix these problems by emitting simple indexed names for unnamed heap types instead. This regresses readability for very simple examples, but the trade off is worth it. This change also reduces the number of type printing systems we have by one. Previously we had the system in Print.cpp, but we had another, more general and extensible system in wasm-type-printing.h and wasm-type.cpp as well. Remove the old type printing system from Print.cpp and replace it with a much smaller use of the new system. This requires significant refactoring of Print.cpp so that PrintExpressionContents object now holds a reference to a parent PrintSExpression object that holds the type name state. This diff is very large because almost every test output changed slightly. To minimize the diff and ease review, change the type printer in wasm-type.cpp to behave the same as the old type printer in Print.cpp except for the differences in name generation. These changes will be reverted in much smaller PRs in the future to generally improve how types are printed.
* Update stringref text format (#5891)Jérôme Vouillon2023-08-221-26/+38
| | | | | | | | | | | * Allow new syntax for some stringref opcodes Fixes #5607 * Update stringref text output * Update tests with new syntax for stringref opcodes Except in test/lit/strings.wat, to check that the legacy syntax still works.
* Factor IRBuilder utility out of the new wat parser (#5880)Thomas Lively2023-08-223-490/+800
| | | | | | | | | | | | | | | | | | | | | | | | | | | Add an IRBuilder utility in a new wasm-ir-builder.h header. IRBuilder is extremely similar to Builder, except that it manages building full trees of Binaryen IR from a linear sequence of instructions, whereas Builder only builds a single IR node at a time. To build full IR trees, IRBuilder maintains an internal stack of expressions, popping children off the stack and pushing the new node onto the stack whenever it builds a new node. In addition to providing makeXYZ function to allocate, initialize, and finalize new IR nodes, IRBuilder also provides a visit() method that can be used when the user has already allocated the IR nodes and only needs to reconstruct the connections between them. This will be useful in outlining both for constructing outlined functions and for reconstructing functions around arbitrary outlined holes. Besides the new wat parser and outlining, this new utility can also eventually be used in the binary parser and to convert from Poppy IR back to Binaryen IR if that ever becomes necessary. To simplify this initial change, IRBuilder exposes the same interface as the code it replaces in the wat parser. A future change requiring more extensive changes to the wat parser will simplify this interface. Also, since the new code is tested only via the new wat parser, it only supports building instructions that were already supported by the new wat parser to avoid trying to support any instructions without corresponding testing. Implementing support for the remaining instructions is left as future work.
* Make the legacy parser follow more closely the standard GC text format (#5889)Jérôme Vouillon2023-08-221-13/+37
| | | | | | | | | | | * Allow empty `then` and `else` clauses * Allow standard syntax for `ref.test` and `ref.cast` Fixes #5795 * Allow size immediate in `array.new_fixed` Fixes #5769
* Rename multimemory flag (#5890)Ashley Nelson2023-08-213-7/+7
| | | Renaming the multimemory flag in Binaryen to match its naming in LLVM.
* Fix finalization of call_ref to handle refined target types (#5883)Thomas Lively2023-08-213-7/+22
| | | | | | | | | | Previously CallRef::finalize() would never update the type of the CallRef, even if the type of the call target had been refined to give a more precise result type. Besides unnecessarily losing type information, this could also lead to validation errors, since the validator checks that the type of CallRef matches the result type of the target signature. Fix the bug by updating CallRef's type based on its target signature in CallRef::finalize() and add a test that depends on this refinalization.
* Further improve ref.cast during finalization (#5882)Thomas Lively2023-08-171-16/+11
| | | | | | We previously improved the nullability and heap type of the ref.cast target type in RefCast::finalize() based on what we knew about its input type. Simplify the code and make this improvement more powerful by using the greatest lower bound of the original cast target and input type.
* Ensure br_on_cast* target type is subtype of input type (#5881)Thomas Lively2023-08-175-0/+22
| | | | | | | | | | | | | | | | The WasmGC spec will require that the target cast type of br_on_cast and br_on_cast_fail be a subtype of the input type, but so far Binaryen has not enforced this constraint, so it could produce invalid modules when optimizations refined the input to a br_on_cast* such that it was no longer a supertype of the cast target type. Fix this problem by setting the cast target type to be the greatest lower bound of the original cast target type and the current input type in `BrOn::finalize()`. This maintains the invariant that the cast target type should be a subtype of the input type and it also does not change cast behavior; any value that could make the original cast succeed at runtime necessarily inhabits both the original cast target type and the input type, so it also must inhabit their greatest lower bound and will make the updated cast succeed as well.
* Improve cast optimizations (#5876)Thomas Lively2023-08-171-0/+25
| | | | | | | | | | | | Simplify the optimization of ref.cast and ref.test in OptimizeInstructions by moving the loop that examines fallthrough values one at a time out to a shared function in properties.h. Also simplify ref.cast optimization by analyzing the cast result in just one place. In addition to simplifying the code, also make the cast optimizations more powerful by analyzing the nullability and heap type of the cast value independently, resulting in a potentially more precise analysis of the cast behavior. Also improve optimization power by considering fallthrough values when optimizing the SuccessOnlyIfNonNull case.
* [NFC] Factor `Result` and `MaybeResult` into a utility header (#5878)Thomas Lively2023-08-141-5/+0
| | | Allow them to be used for more than just the new text parser.
* Remove legacy WasmGC instructions (#5861)Thomas Lively2023-08-094-201/+53
| | | | | Remove old, experimental instructions and type encodings that will not be shipped as part of WasmGC. Updating the encodings and text format to match the final spec is left as future work.
* Fix a fuzz bug in TypeMapper (#5851)Thomas Lively2023-08-021-7/+34
| | | | | | | | | | | | | | | | | | TypeMapper is a utility used to globally rewrite types, mapping some eliminated source types into destination types they should be replaced with. This was previously done by first rewriting all the types in the IR according to the given mapping, then rewriting the type definitions and updating all the types in the IR again. Not only was doing the rewriting twice inefficient, it also introduced a subtle bug where the set of private types eligible to be rewritten could be inconsistent because updating types in the IR could change the types of control flow structures. The fuzzer found a case where this inconsistency caused the type rebuilding to fail. Fix the bug by first building the new types with the mapping applied and only then rewriting the IR a single time. Also add a `TypeBuilder::dump` utility for use in debugging. Fixes #5845.
* Fix binary writing of strings without GC enabled (#5836)Alon Zakai2023-07-311-4/+10
|
* Initial support for `final` types (#5803)Thomas Lively2023-07-063-48/+98
| | | | | | | | | | | | | | | | | | | | Implement support in the type system for final types, which are not allowed to have any subtypes. Final types are syntactically different from similar non-final types, so type canonicalization is made aware of finality. Similarly, TypeMerging and TypeSSA are updated to work correctly in the presence of final types as well. Implement binary and text parsing and emitting of final types. Use the standard text format to represent final types and interpret the non-standard "struct_subtype" and friends as non-final. This allows a graceful upgrade path for users currently using the non-standard text format, where they can update their code to use final types correctly at the point when they update to use the standard format. Once users have migrated to using the fully expanded standard text format, we can update update Binaryen's parsers to interpret the MVP shorthands as final types to match the spec without breaking those users. To make it safe for V8 to independently start interpreting types declared without `sub` as final, also reserve that shorthand encoding only for types that have no strict subtypes.
* [NFC] Fix the use of "strict" in subtypes.h (#5804)Thomas Lively2023-07-061-2/+1
| | | | | | | | Previously we incorrectly used "strict" to mean the immediate subtypes of a type, when in fact a strict subtype of a type is any subtype excluding the type itself. Rename the incorrect `getStrictSubTypes` to `getImmediateSubTypes`, rename the redundant `getAllStrictSubTypes` to `getStrictSubTypes`, and rename the redundant `getAllSubTypes` to `getSubTypes`. Fixing the capitalization of "SubType" to "Subtype" is left as future work.
* Limit printing of Literal[s] in a general way (#5792)Alon Zakai2023-06-281-16/+49
| | | | | | | | Previously we limited printing in a single Literals. But we can have infinitely recursive GC literals, or just huge graphs even without infinite recursion where no single Literals is that big (but we still get exponential blowup). This PR adds a general limit on how much we print once we start to print a Literal or Literals.
* Limit literal printing to a reasonable limit (#5779)Alon Zakai2023-06-211-0/+8
| | | | | | | | | Without this, a massive GC array for example can end up being printed as a huge amount of output, which is a problem in the fuzzer. I noticed this because a testcase was taking minutes to run. Perhaps we should add an option (BINARYEN_PRINT_FULL=1?) to disable this limit, but let's see if we ever need that.
* [NFC] Simplify `Tuple` by making it an alias of `TypeList` (#5775)Thomas Lively2023-06-202-27/+30
| | | | | | | Rather than wrap a `TypeList`, make `Tuple` an alias of `TypeList`. This means removing `Tuple::toString`, but that had no callers and was of limited use for debugging anyway. In return, the use of tuples becomes much less verbose. In the future, it may make sense to remove one of `Tuple` and `TypeList`.
* [NFC] Rewrite isorecursive canonicalization (#5774)Thomas Lively2023-06-151-341/+158
| | | | | | | Rewrite the type canonicalization algorithm to fully canonicalize a single rec group at a time rather than canonicalizing multiple rec groups at once in multiple stages. The previous code was useful when it had to be shared with equirecursive and nominal canonicalization, but was much more complicated than necessary for just isorecursive canonicalization, which is all we support today.
* [NFC] Simplify the HeapTypeStore (#5765)Thomas Lively2023-06-131-107/+38
| | | | | | | | | | | | | Now that we support only isorecursive typing, which canonicalizes heap types by the rec group rather than individually, we no longer need a canonicalizing store for heap types. Simplify the `Store` implementation, which was previously generic over both `HeapType`s and `Type`s, to instead work only for `Type`s. Replace `globalHeapTypeStore` with a simple vector to keep canonicalized `HeapTypeInfo`s alive. Also simplify global canonicalization to not replace heap type uses, since that is already done separately as part of isorecursive rec group canonicalization. This simplification does require adding new information to `CanonicalizationState`, but further work will simplify that away as well.
* Rename WasmBinaryBuilder to WasmBinaryReader (NFC) (#5767)Heejin Ahn2023-06-133-205/+197
| | | | | | We have `WasmBinaryBuilder` that read binary into Binaryen IR and `WasmBinaryWriter` that writes Binaryen IR to binary. To me `WasmBinaryBuilder` sounds similar to `WasmBinaryWriter`, which builds binary. How about renaming it to `WasmBinaryReader`?
* Update br_on_cast binary and text format (#5762)Thomas Lively2023-06-123-32/+48
| | | | | | | | | | | | The final versions of the br_on_cast and br_on_cast_fail instructions have two reference type annotations: one for the input type and one for the cast target type. In the binary format, this is represented as a flags byte followed by two encoded heap types. Upgrade all of the tests at once to use the new versions of the instructions and drop support for the old instructions from the text parser. Keep support in the binary parser to avoid breaking users, though. Drop some binary tests of deprecated instruction encodings that would be more effort to update than they're worth. Re-land with fixes of #5734