summaryrefslogtreecommitdiff
path: root/src/wasm-builder.h
Commit message (Collapse)AuthorAgeFilesLines
* Implement table.fill (#5949)Thomas Lively2023-09-181-0/+12
| | | | | | | | This instruction was standardized as part of the bulk memory proposal, but we never implemented it until now. Leave similar instructions like table.copy as future work. Fixes #5939.
* Replace I31New with RefI31 everywhere (#5930)Thomas Lively2023-09-131-4/+4
| | | | | | | | Globally replace the source string "I31New" with "RefI31" in preparation for renaming the instruction from "i31.new" to "ref.i31", as implemented in the spec in https://github.com/WebAssembly/gc/pull/422. This would be NFC, except that it also changes the string in the external-facing C APIs. A follow-up PR will make the corresponding behavioral change.
* Factor IRBuilder utility out of the new wat parser (#5880)Thomas Lively2023-08-221-0/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | Add an IRBuilder utility in a new wasm-ir-builder.h header. IRBuilder is extremely similar to Builder, except that it manages building full trees of Binaryen IR from a linear sequence of instructions, whereas Builder only builds a single IR node at a time. To build full IR trees, IRBuilder maintains an internal stack of expressions, popping children off the stack and pushing the new node onto the stack whenever it builds a new node. In addition to providing makeXYZ function to allocate, initialize, and finalize new IR nodes, IRBuilder also provides a visit() method that can be used when the user has already allocated the IR nodes and only needs to reconstruct the connections between them. This will be useful in outlining both for constructing outlined functions and for reconstructing functions around arbitrary outlined holes. Besides the new wat parser and outlining, this new utility can also eventually be used in the binary parser and to convert from Poppy IR back to Binaryen IR if that ever becomes necessary. To simplify this initial change, IRBuilder exposes the same interface as the code it replaces in the wat parser. A future change requiring more extensive changes to the wat parser will simplify this interface. Also, since the new code is tested only via the new wat parser, it only supports building instructions that were already supported by the new wat parser to avoid trying to support any instructions without corresponding testing. Implementing support for the remaining instructions is left as future work.
* Remove legacy WasmGC instructions (#5861)Thomas Lively2023-08-091-4/+1
| | | | | Remove old, experimental instructions and type encodings that will not be shipped as part of WasmGC. Updating the encodings and text format to match the final spec is left as future work.
* TypeRefining: Add casts when we must (#5840)Alon Zakai2023-07-261-1/+3
| | | | | | | | | See the example in the code and test for a situation that requires this for validation. To fix validation we add a cast. That should practically always be removed by later optimizations, and the fact it took the fuzzer this long to even find such a situation also adds confidence that this won't be adding overhead (and in this situation, the optimizer will definitely remove the cast).
* [NFC] Refactor each of ArrayNewSeg and ArrayInit into subclasses for ↵Alon Zakai2023-05-041-15/+37
| | | | | | | | | | | Data/Elem (#5692) ArrayNewSeg => ArrayNewSegData, ArrayNewSegElem ArrayInit => ArrayInitData, ArrayInitElem Basically we remove the opcode and use the class type to differentiate them. This adds some code but it makes the representation simpler and more compact in memory, and it will help with #5690
* Implement array.fill, array.init_data, and array.init_elem (#5637)Thomas Lively2023-04-061-0/+28
| | | | | These complement array.copy, which we already supported, as an initial complete set of bulk array operations. Replace the WIP spec tests with the upstream spec tests, lightly edited for compatibility with Binaryen.
* Use Names instead of indices to identify segments (#5618)Thomas Lively2023-04-041-3/+3
| | | | | | | | | | All top-level Module elements are identified and referred to by Name, but for historical reasons element and data segments were referred to by index instead. Fix this inconsistency by using Names to refer to segments from expressions that use them. Also parse and print segment names like we do for other elements. The C API is partially converted to use names instead of indices, but there are still many functions that refer to data segments by index. Finishing the conversion can be done in the future once it becomes necessary.
* [Wasm GC] Fix CoalesceLocals i31 local.get removal (#5619)Alon Zakai2023-04-041-1/+6
| | | | When removing a local.get we must replace it with something of the identical type, and not make it non-nullable.
* [Wasm GC] wasm-ctor-eval: Handle externalized data (#5582)Alon Zakai2023-03-161-0/+4
|
* [NFC] Templatize `makeBlock` so it can be used with any iterable (#5583)Thomas Lively2023-03-161-28/+22
| | | | | | | | Replace the different overloads we previously had for different kinds of containers with generic templates. We still need dedicated overloads for `std::initializer_list` because it is never inferred in a template context, though. Also, since `std::initializer_list` does not allow subscripting, update the arena vector implementation to use iterators instead now that initializer lists can be passed down to that layer without being reified as vectors.
* [NFC] Internally rename `ArrayInit` to `ArrayNewFixed` (#5526)Thomas Lively2023-02-281-3/+3
| | | | | | | | To match the standard instruction name, rename the expression class without changing any parsing or printing behavior. A follow-on PR will take care of the functional side of this change while keeping support for parsing the old name. This change will allow `ArrayInit` to be used as the expression class for the upcoming `array.init_data` and `array.init_elem` instructions.
* [Strings] Initial string execution support (#5491)Alon Zakai2023-02-151-0/+8
| | | | | | | | | | Store string data as GC data. Inefficient (one Const per char), but ok for now. Implement string.new_wtf16 and string.const, enough for basic testing. Create strings in makeConstantExpression, which enables ctor-eval support. Print strings in fuzz-exec which makes testing easier.
* [Strings] Add experimental StringNew variants (#5459)Alon Zakai2023-01-261-3/+8
| | | | | | string.from_code_point makes a string from an int code point. string.new_utf8*_try makes a utf8 string and returns null on a UTF8 encoding error rather than trap.
* [Strings] Add string.compare (#5453)Alon Zakai2023-01-251-1/+2
| | | See WebAssembly/stringref#58
* Replace `RefIs` with `RefIsNull` (#5401)Thomas Lively2023-01-091-3/+2
| | | | | | | | | | | | | | | * Replace `RefIs` with `RefIsNull` The other `ref.is*` instructions are deprecated and expressible in terms of `ref.test`. Update binary and text parsing to parse those instructions as `RefTest` expressions. Also update the printing and emitting of `RefTest` expressions to emit the legacy instructions for now to minimize test changes and make this a mostly non-functional change. Since `ref.is_null` is the only `RefIs` instruction left, remove the `RefIsOp` field and rename the expression class to `RefIsNull`. The few test changes are due to the fact that `ref.is*` instructions are now subject to `ref.test` validation, and in particular it is no longer valid to perform a `ref.is_func` on a value outside of the `func` type hierarchy.
* Consolidate br_on* operations (#5399)Thomas Lively2023-01-061-9/+2
| | | | | | | | | | | | | | | | | | The `br_on{_non}_{data,i31,func}` operations are deprecated and directly representable in terms of the new `br_on_cast` and `br_on_cast_fail` instructions, so remove their dedicated IR opcodes in favor of representing them as casts. `br_on_null` and `br_on_non_null` cannot be consolidated the same way because their behavior is not directly representable in terms of `br_on_cast` and `br_on_cast_fail`; when the cast to null bottom type succeeds, the null check instructions implicitly drop the null value whereas the cast instructions would propagate it. Add special logic to the binary writer and printer to continue emitting the deprecated instructions for now. This will allow us to update the test suite in a separate future PR with no additional functional changes. Some tests are updated because the validator no longer allows passing non-func data to `br_on_func`. Doing so has not made sense since we separated the three reference type hierarchies.
* Support br_on_cast null (#5397)Thomas Lively2023-01-051-2/+2
| | | | | | | | | As well as br_on_cast_fail null. Unlike the existing br_on_cast* instructions, these new instructions treat the cast as succeeding when the input is a null. Update the internal representation of the cast type in `BrOn` expressions to be a `Type` rather than a `HeapType` so it will include nullability information. Also update and improve `RemoveUnusedBrs` to handle the new instructions correctly and optimize in more cases.
* [Parser] Parse blocks (#5393)Thomas Lively2023-01-051-0/+8
| | | | | | | | | | | | | Parse both the folded and unfolded forms of blocks and structure the code to make supporting additional block instructions like if-else and try-catch relatively simple. Parsing block types is extra fun because they may implicitly define new signature heap types via a typeuse, but only if their types are not given by a single result type. To figuring out whether a new type may be introduced in all the relevant parsing stages, always track at least the arity of parsed results. The parser parses block labels, but more work will be required to support branch instructions that use them.
* Support `ref.test null` (#5368)Thomas Lively2022-12-211-2/+2
| | | This new variant of ref.test returns 1 if the input is null.
* Update RefCast representation to drop extra HeapType (#5350)Thomas Lively2022-12-201-3/+2
| | | | | | | | | The latest upstream version of ref.cast is parameterized with a target reference type, not just a heap type, because the nullability of the result is parameterizable. As a first step toward implementing these new, more flexible ref.cast instructions, change the internal representation of ref.cast to use the expression type as the cast target rather than storing a separate heap type field. For now require that the encoded semantics match the previously allowed semantics, though, so that none of the optimization passes need to be updated.
* Add standard versions of WasmGC casts (#5331)Thomas Lively2022-12-071-24/+0
| | | | | | | We previously supported only the non-standard cast instructions introduced when we were experimenting with nominal types. Parse the names and opcodes of their standard counterparts and switch to emitting the standard names and opcodes. Port all of the tests to use the standard instructions, but add additional tests showing that the non-standard versions are still parsed correctly.
* Revert "Revert "Make `call_ref` type annotations mandatory (#5246)" (#5265)" ↵Thomas Lively2022-11-161-29/+0
| | | | | (#5266) This reverts commit 570007dbecf86db5ddba8d303896d841fc2b2d27.
* Revert "Make `call_ref` type annotations mandatory (#5246)" (#5265)Thomas Lively2022-11-161-0/+29
| | | | | This reverts commit b2054b72b7daa89b7ad161c0693befad06a20c90. It looks like the necessary V8 change has not rolled out everywhere yet.
* Make `call_ref` type annotations mandatory (#5246)Thomas Lively2022-11-151-29/+0
| | | | They were optional for a while to allow users to gracefully transition to using them, but now make them mandatory to match the upstream WasmGC spec.
* Implement `array.new_data` and `array.new_elem` (#5214)Thomas Lively2022-11-071-0/+14
| | | | | | | | | In order to test them, fix the binary and text parsers to accept passive data segments even if a module has no memory. In addition to parsing and emitting the new instructions, also implement their validation and interpretation. Test the interpretation directly with wasm-shell tests adapted from the upstream spec tests. Running the upstream spec tests directly would require fixing too many bugs in the legacy text parser, so it will have to wait for the new text parser to be ready.
* [Parser] Parse loads and stores (#5174)Thomas Lively2022-10-211-6/+6
| | | | | | | | | | Add parsing functions for `memarg`s, the offset and align fields of load and store instructions. These fields are interesting because they are lexically reserved words that need to be further parsed to extract their actual values. On top of that, add support for parsing all of the load and store instructions. This required fixing a buffer overflow problem in the generated parser code and adding more information to the signatures of the SIMD load and store instructions. `SIMDLoadStoreLane` instructions are particularly interesting because they may require backtracking to parse correctly.
* Implement bottom heap types (#5115)Thomas Lively2022-10-071-14/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | These types, `none`, `nofunc`, and `noextern` are uninhabited, so references to them can only possibly be null. To simplify the IR and increase type precision, introduce new invariants that all `ref.null` instructions must be typed with one of these new bottom types and that `Literals` have a bottom type iff they represent null values. These new invariants requires several additional changes. First, it is now possible that the `ref` or `target` child of a `StructGet`, `StructSet`, `ArrayGet`, `ArraySet`, or `CallRef` instruction has a bottom reference type, so it is not possible to determine what heap type annotation to emit in the binary or text formats. (The bottom types are not valid type annotations since they do not have indices in the type section.) To fix that problem, update the printer and binary emitter to emit unreachables instead of the instruction with undetermined type annotation. This is a valid transformation because the only possible value that could flow into those instructions in that case is null, and all of those instructions trap on nulls. That fix uncovered a latent bug in the binary parser in which new unreachables within unreachable code were handled incorrectly. This bug was not previously found by the fuzzer because we generally stop emitting code once we encounter an instruction with type `unreachable`. Now, however, it is possible to emit an `unreachable` for instructions that do not have type `unreachable` (but are known to trap at runtime), so we will continue emitting code. See the new test/lit/parse-double-unreachable.wast for details. Update other miscellaneous code that creates `RefNull` expressions and null `Literals` to maintain the new invariants as well.
* Fix multi-memory + C API for MemoryGrow and MemorySize (#4953)Alon Zakai2022-08-231-6/+18
| | | | | | | | | | | | | | | | Those instructions need to know if the memory is 64-bit or not. We looked that up on the module globally, which is convenient, but in the C API this was actually a breaking change, it turns out. To keep things working, provide that information when creating a MemoryGrow or MemorySize, as another parameter in the C API. In the C++ API (wasm-builder), support both modes, and default to the automatic lookup. We already require a bunch of other explicit info when creating expressions, like making a Call requires the return type (we don't look it up globally), and even a LocalGet requires the local type (we don't look it up on the function), so this is consistent with those. Fixes #4946
* Mutli-Memories Support in IR (#4811)Ashley Nelson2022-08-171-26/+76
| | | | | | | This PR removes the single memory restriction in IR, adding support for a single module to reference multiple memories. To support this change, a new memory name field was added to 13 memory instructions in order to identify the memory for the instruction. It is a goal of this PR to maintain backwards compatibility with existing text and binary wasm modules, so memory indexes remain optional for memory instructions. Similarly, the JS API makes assumptions about which memory is intended when only one memory is present in the module. Another goal of this PR is that existing tests behavior be unaffected. That said, tests must now explicitly define a memory before invoking memory instructions or exporting a memory, and memory names are now printed for each memory instruction in the text format. There remain quite a few places where a hardcoded reference to the first memory persist (memory flattening, for example, will return early if more than one memory is present in the module). Many of these call-sites, particularly within passes, will require us to rethink how the optimization works in a multi-memories world. Other call-sites may necessitate more invasive code restructuring to fully convert away from relying on a globally available, single memory pointer.
* [Strings] string.new.array methods have start:end arguments (#4888)Alon Zakai2022-08-091-0/+12
|
* Remove RTTs (#4848)Thomas Lively2022-08-051-95/+3
| | | | | | | RTTs were removed from the GC spec and if they are added back in in the future, they will be heap types rather than value types as in our implementation. Updating our implementation to have RTTs be heap types would have been more work than deleting them for questionable benefit since we don't know how long it will be before they are specced again.
* [Strings] GC variants for string.encode (#4817)Alon Zakai2022-07-211-2/+5
|
* Remove basic reference types (#4802)Thomas Lively2022-07-201-22/+11
| | | | | | | | | Basic reference types like `Type::funcref`, `Type::anyref`, etc. made it easy to accidentally forget to handle reference types with the same basic HeapTypes but the opposite nullability. In principle there is nothing special about the types with shorthands except in the binary and text formats. Removing these shorthands from the internal type representation by removing all basic reference types makes some code more complicated locally, but simplifies code globally and encourages properly handling both nullable and non-nullable reference types.
* [Strings] stringview_*.slice (#4805)Alon Zakai2022-07-151-3/+21
| | | | | | | Unfortunately one slice is the same as python [start:end], using 2 params, and the other slice is one param, [CURR:CURR+num] (where CURR is implied by the current state in the iter). So we can't use a single class here. Perhaps a different name would be good, like slice vs substring (like JS does), but I picked names to match the current spec.
* [Strings] stringview access operations (#4798)Alon Zakai2022-07-131-0/+32
|
* [Strings] string.as (#4797)Alon Zakai2022-07-121-0/+7
|
* [Strings] string.eq (#4781)Alon Zakai2022-07-081-0/+7
|
* [Strings] string.concat (#4777)Alon Zakai2022-07-081-0/+7
|
* [Strings] string.encode (#4776)Alon Zakai2022-07-071-0/+9
|
* [Strings] string.measure (#4775)Alon Zakai2022-07-071-0/+7
|
* [Strings] Add string.const (#4768)Alon Zakai2022-07-061-0/+6
| | | | | This is more work than a typical instruction because it also adds a new section: all the (string.const "foo") strings are put in a new "strings" section in the binary, and the instructions refer to them by index.
* [Strings] Add string.new* instructions (#4761)Alon Zakai2022-06-291-0/+9
| | | | | | This is the first instruction from the Strings proposal. This includes everything but interpreter support.
* First class Data Segments (#4733)Ashley Nelson2022-06-211-0/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * Updating wasm.h/cpp for DataSegments * Updating wasm-binary.h/cpp for DataSegments * Removed link from Memory to DataSegments and updated module-utils, Metrics and wasm-traversal * checking isPassive when copying data segments to know whether to construct the data segment with an offset or not * Removing memory member var from DataSegment class as there is only one memory rn. Updated wasm-validator.cpp * Updated wasm-interpreter * First look at updating Passes * Updated wasm-s-parser * Updated files in src/ir * Updating tools files * Last pass on src files before building * added visitDataSegment * Fixing build errors * Data segments need a name * fixing var name * ran clang-format * Ensuring a name on DataSegment * Ensuring more datasegments have names * Adding explicit name support * Fix fuzzing name * Outputting data name in wasm binary only if explicit * Checking temp dataSegments vector to validateBinary because it's the one with the segments before we processNames * Pass on when data segment names are explicitly set * Ran auto_update_tests.py and check.py, success all around * Removed an errant semi-colon and corrected a counter. Everything still passes * Linting * Fixing processing memory names after parsed from binary * Updating the test from the last fix * Correcting error comment * Impl kripken@ comments * Impl tlively@ comments * Updated tests that remove data print when == 0 * Ran clang format * Impl tlively@ comments * Ran clang-format
* Add ref.cast_nop_static (#4656)Thomas Lively2022-05-111-1/+4
| | | | | | This unsafe experimental instruction is semantically equivalent to ref.cast_static, but V8 will unsafely turn it into a nop. This is meant to help us measure cast overhead more precisely than we can by globally turning all casts into nops.
* Remove externref (#4633)Thomas Lively2022-05-041-2/+0
| | | | | | Remove `Type::externref` and `HeapType::ext` and replace them with uses of anyref and any, respectively, now that we have unified these types in the GC proposal. For backwards compatibility, continue to parse `extern` and `externref` and maintain their relevant C API functions.
* [NominalFuzzing] Fix replaceWithIdenticalType() on nondefaultable tuples (#4605)Alon Zakai2022-04-211-1/+1
|
* Change from storing Signature to HeapType on CallIndirect (#4352)Thomas Lively2021-11-221-3/+4
| | | | | | | | | | | | With nominal function types, this change makes it so that we preserve the identity of the function type used with call_indirect instructions rather than recreating a function heap type, which may or may not be the same as the originally parsed heap type, from the function signature during module writing. This will simplify the type system implementation by removing the need to store a "canonical" nominal heap type for each unique signature. We previously depended on those canonical types to avoid creating multiple duplicate function types during module writing, but now we aren't creating any new function types at all.
* Fix RTTs for RTT-less instructions (#4294)Thomas Lively2021-11-031-1/+1
| | | | | | | | | | | | Allocation and cast instructions without explicit RTTs should use the canonical RTTs for the given types. Furthermore, the RTTs for nominal types should reflect the static type hierarchy. Previously, however, we implemented allocations and casts without RTTs using an alternative system that only used static types rather than RTT values. This alternative system would work fine in a world without first-class RTTs, but it did not properly allow mixing instructions that use RTTs and instructions that do not use RTTs as intended by the M4 GC spec. This PR fixes the issue by using canonical RTTs where appropriate and cleans up the relevant casting code using std::variant.
* Add table.grow operation (#4245)Max Graey2021-10-181-0/+8
|