path: root/src/wasm/wasm-validator.cpp
Commit message [Author, Age, Files, Lines]
* Implement more TypeGeneralizing transfer functions (#6118) [Thomas Lively, 2023-11-15, 1 file, -12/+14]
| | | | | | | Finish the transfer functions for all expressions except for string instructions, exception handling instructions, tuple instructions, and branch instructions that carry values. The latter require more work in the CFG builder because dropping the extra stack values happens after the branch but before the target block.
* Implement table.copy (#6078) [Alon Zakai, 2023-11-06, 1 file, -0/+22]
| | | Helps #5951
* Allow rec groups of public function types in closed world (#6053) [Alon Zakai, 2023-10-26, 1 file, -6/+16]
| Closed-world mode allows function types to escape if they are on exported functions, because that has been possible since wasm MVP and cannot be avoided. But we also need to allow all types in those types' rec groups. Consider this case:
|
|   (module
|     (rec
|       (type $0 (func))
|       (type $1 (func))
|     )
|     (func "0" (type $0)
|       (nop)
|     )
|     (func "1" (type $1)
|       (nop)
|     )
|   )
|
| The two exported functions make the two types public, so this module validates in closed world mode. Now imagine that metadce removes one export:
|
|   (module
|     (rec
|       (type $0 (func))
|       (type $1 (func))
|     )
|     (func "0" (type $0)
|       (nop)
|     )
|     ;; The export "1" is gone.
|   )
|
| Before this PR that no longer validates, because it only marks the type $0 as public. But when a type is public that makes its entire rec group public, so $1 is errored on. To fix that, this PR allows all types in a rec group of an exported function's type, which makes that last module validate.
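The rule described above can be sketched roughly as follows. This is a minimal illustration, not Binaryen's code: `SketchType`, its `recGroup` index, and `collectPublicTypes` are hypothetical names chosen for this example.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for a heap type: a name plus the index of the rec
// group it belongs to. This is not Binaryen's representation.
struct SketchType {
  std::string name;
  int recGroup;
};

// Collect the public types: every type used by an exported function, plus
// every other type that shares a rec group with one of those types.
std::set<std::string> collectPublicTypes(
    const std::vector<SketchType>& allTypes,
    const std::vector<std::string>& exportedFunctionTypes) {
  // First pass: find the rec groups touched by an exported function's type.
  std::set<int> publicGroups;
  for (const SketchType& type : allTypes) {
    for (const std::string& exported : exportedFunctionTypes) {
      if (type.name == exported) {
        publicGroups.insert(type.recGroup);
      }
    }
  }
  // Second pass: every member of a public rec group is public.
  std::set<std::string> publicTypes;
  for (const SketchType& type : allTypes) {
    if (publicGroups.count(type.recGroup)) {
      publicTypes.insert(type.name);
    }
  }
  return publicTypes;
}
```

With `$0` and `$1` in one rec group and only `$0` exported, both come back as public, which corresponds to the second module validating.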
* Fix segfault in catch validator (#6032) [Philip Blair, 2023-10-23, 1 file, -26/+24]
| | | | | The problem was if you construct a try expression which references a nonexistent tag in one of its catch blocks, the validation code successfully identified the null pointer but then proceeded to try to read from it.
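The shape of this kind of fix can be sketched as below: report the missing tag and then skip the checks that would read from it. The types and the function name are hypothetical and the error strings are made up for illustration; the actual validator code differs.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a tag; only the field this sketch reads.
struct SketchTag {
  bool hasResults;
};

// Returns the validation errors found for the catch tags of one try.
std::vector<std::string> validateCatchTags(
    const std::map<std::string, SketchTag>& moduleTags,
    const std::vector<std::string>& catchTagNames) {
  std::vector<std::string> errors;
  for (const std::string& name : catchTagNames) {
    auto it = moduleTags.find(name);
    const SketchTag* tag = (it == moduleTags.end()) ? nullptr : &it->second;
    if (tag == nullptr) {
      errors.push_back("catch references nonexistent tag " + name);
      // Skip the remaining checks instead of reading through a null pointer.
      continue;
    }
    if (tag->hasResults) {
      errors.push_back("catch tag must not have results: " + name);
    }
  }
  return errors;
}
```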
* [typed-cont] Allow result types on tags (#5997) [Frank Emrich, 2023-10-05, 1 file, -4/+17]
| This PR is part of a series that adds basic support for the typed continuations proposal. This PR relaxes the restriction that tags must not have results, only params. Tags with results must not be used for exception handling and are only allowed if the typed continuations feature is enabled. As a minor point, this PR also changes the printing of tags without params: to make the presentation consistent, (param) is omitted when printing a tag.
* [Parser] Parse labels and br (#5970) [Thomas Lively, 2023-10-02, 1 file, -2/+5]
| | | | | | The parser previously parsed labels and could attach them to control flow structures, but did not maintain the context necessary to correctly parse branches. Support parsing labels as both names and indices in IRBuilder, handling shadowing correctly, and use that support to implement parsing of br.
* Refine ref.test's castType during refinalization (#5985) [Thomas Lively, 2023-10-02, 1 file, -0/+4]
| | | | | | Just like we do with other casts, refine the cast type to be the greatest lower bound of its previous cast type and its input type. The difference is that the output type of ref.test remains i32, but it's still useful to retain more precise type information.
* Support i8/i16 mutable arrays as public types for string interop (#5814) [Alon Zakai, 2023-09-21, 1 file, -1/+5]
| | | | | Probably any array of non-reference data can be allowed to be public and sent out of the module, as it is just data. For now, however, just special case the i8 and i16 array types which are useful already for string interop.
* Fix validation error message for table.fill (#5953) [Thomas Lively, 2023-09-18, 1 file, -4/+3]
| | | table.fill requires bulk memory to be enabled, not reference types.
* Implement table.fill (#5949) [Thomas Lively, 2023-09-18, 1 file, -0/+19]
| | | | | | | | This instruction was standardized as part of the bulk memory proposal, but we never implemented it until now. Leave similar instructions like table.copy as future work. Fixes #5939.
* Replace i31.new with ref.i31 everywhere (#5931) [Thomas Lively, 2023-09-13, 1 file, -2/+2]
| | | | | Replace i31.new with ref.i31 in the printer, tests, and source code. Continue parsing i31.new for the time being to allow a graceful transition. Also update the JS API to reflect the new instruction name.
* Replace I31New with RefI31 everywhere (#5930) [Thomas Lively, 2023-09-13, 1 file, -2/+2]
| | | | | | | | Globally replace the source string "I31New" with "RefI31" in preparation for renaming the instruction from "i31.new" to "ref.i31", as implemented in the spec in https://github.com/WebAssembly/gc/pull/422. This would be NFC, except that it also changes the string in the external-facing C APIs. A follow-up PR will make the corresponding behavioral change.
* Remove the GCNNLocals feature (#5080) [Thomas Lively, 2023-08-31, 1 file, -39/+7]
| | | | | Now that the WasmGC spec has settled on a way of validating non-nullable locals, we no longer need this experimental feature that allowed nonstandard uses of non-nullable locals.
* Validate and fix up tuples with non-nullable elements (#5909) [Thomas Lively, 2023-08-30, 1 file, -3/+6]
| | | | | | The code validating and fixing up non-nullable locals previously did not correctly handle tuples that contained non-nullable elements, which could have resulted in invalid modules going undetected. Update the code to handle tuples and add tests.
* Rename multimemory flag (#5890) [Ashley Nelson, 2023-08-21, 1 file, -2/+2]
| | | Renaming the multimemory flag in Binaryen to match its naming in LLVM.
* Ensure br_on_cast* target type is subtype of input type (#5881) [Thomas Lively, 2023-08-17, 1 file, -0/+5]
| | | | | | | | | | | | | | | | The WasmGC spec will require that the target cast type of br_on_cast and br_on_cast_fail be a subtype of the input type, but so far Binaryen has not enforced this constraint, so it could produce invalid modules when optimizations refined the input to a br_on_cast* such that it was no longer a supertype of the cast target type. Fix this problem by setting the cast target type to be the greatest lower bound of the original cast target type and the current input type in `BrOn::finalize()`. This maintains the invariant that the cast target type should be a subtype of the input type and it also does not change cast behavior; any value that could make the original cast succeed at runtime necessarily inhabits both the original cast target type and the input type, so it also must inhabit their greatest lower bound and will make the updated cast succeed as well.
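The greatest-lower-bound refinement can be illustrated with a toy model. The sketch assumes a single-inheritance, acyclic hierarchy stored as a child-to-supertype map, and uses the string "none" as the bottom type for unrelated inputs; `SketchRef`, `isSubHeapType`, and `greatestLowerBound` are hypothetical names, not the functions Binaryen uses.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical hierarchy: maps each heap type to its declared supertype.
using Hierarchy = std::map<std::string, std::string>;

// Hypothetical reference type: a heap type plus nullability.
struct SketchRef {
  std::string heapType;
  bool nullable;
};

// Walk up the supertype chain from `sub` looking for `super`.
bool isSubHeapType(const Hierarchy& supers,
                   std::string sub,
                   const std::string& super) {
  while (true) {
    if (sub == super) {
      return true;
    }
    auto it = supers.find(sub);
    if (it == supers.end()) {
      return false;
    }
    sub = it->second;
  }
}

// The GLB is nullable only if both inputs are. Its heap type is the more
// specific of the two, or the bottom type if neither is a subtype of the other.
SketchRef greatestLowerBound(const Hierarchy& supers,
                             const SketchRef& a,
                             const SketchRef& b) {
  bool nullable = a.nullable && b.nullable;
  if (isSubHeapType(supers, a.heapType, b.heapType)) {
    return {a.heapType, nullable};
  }
  if (isSubHeapType(supers, b.heapType, a.heapType)) {
    return {b.heapType, nullable};
  }
  return {"none", nullable};
}
```

If an optimization refines the input of a cast to `(ref null $B)` down to `(ref $C)` with `$C <: $B`, the updated cast target becomes `(ref $C)`, which is still a subtype of the input type.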
* [NFC] Simplify `Tuple` by making it an alias of `TypeList` (#5775) [Thomas Lively, 2023-06-20, 1 file, -1/+1]
| | | | | | | Rather than wrap a `TypeList`, make `Tuple` an alias of `TypeList`. This means removing `Tuple::toString`, but that had no callers and was of limited use for debugging anyway. In return, the use of tuples becomes much less verbose. In the future, it may make sense to remove one of `Tuple` and `TypeList`.
* Validate tag param types (#5759) [Alon Zakai, 2023-06-08, 1 file, -0/+5]
| | | | | We already validated function params, but were missing tags. Without this the fuzzer can get confused if a type is only used in a tag.
* Strings: Add initial validation checks (#5758) [Alon Zakai, 2023-06-08, 1 file, -0/+91]
| | | | | | | | | This is far from comprehensive, but it checks strings being enabled for all the instructions. Without this, the fuzzer can get confused because it checks if code validates and then proceeds under that assumption, so any missing validation checks can cause problems (specifically, if we have a string.const without strings enabled then we error during writing of the string, since we don't do the initial pass to find all strings to deduplicate them).
* [NFC] Refactor each of ArrayNewSeg and ArrayInit into subclasses for Data/Elem (#5692) [Alon Zakai, 2023-05-04, 1 file, -55/+86]
| ArrayNewSeg => ArrayNewSegData, ArrayNewSegElem
| ArrayInit => ArrayInitData, ArrayInitElem
|
| Basically we remove the opcode and use the class type to differentiate them. This adds some code but it makes the representation simpler and more compact in memory, and it will help with #5690
* Implement array.fill, array.init_data, and array.init_elem (#5637) [Thomas Lively, 2023-04-06, 1 file, -12/+110]
| | | | | These complement array.copy, which we already supported, as an initial complete set of bulk array operations. Replace the WIP spec tests with the upstream spec tests, lightly edited for compatibility with Binaryen.
* Support multiple memories in RemoveUnusedModuleElements (#5604) [Thomas Lively, 2023-04-04, 1 file, -6/+1]
| | | | | | | | Add support for memory and data segment module elements and treat them uniformly with other module elements rather than as special cases. There is a cyclic dependency between memories (or tables) and their active segments because exported or accessed memories (or tables) keep their active segments alive, but active segments for imported memories (or tables) keep their memories (or tables) alive as well.
* Use Names instead of indices to identify segments (#5618) [Thomas Lively, 2023-04-04, 1 file, -9/+9]
| | | | | | | | | | All top-level Module elements are identified and referred to by Name, but for historical reasons element and data segments were referred to by index instead. Fix this inconsistency by using Names to refer to segments from expressions that use them. Also parse and print segment names like we do for other elements. The C API is partially converted to use names instead of indices, but there are still many functions that refer to data segments by index. Finishing the conversion can be done in the future once it becomes necessary.
* [NFC] Remove our bespoke `make_unique` implementation (#5613) [Thomas Lively, 2023-03-31, 1 file, -1/+1]
| | | | This code predates our adoption of C++14 and can now be removed in favor of `std::make_unique`, which should be more efficient.
* Do not treat `atomic.fence` as using a memory (#5603) [Thomas Lively, 2023-03-29, 1 file, -2/+0]
| * Do not treat `atomic.fence` as using a memory
|
|   Update RemoveUnusedModuleElements so that it no longer keeps the memory alive due to an `atomic.fence` instruction and update validation to allow modules to use `atomic.fence` without a memory.
|
| * update wasm2js tests
* Add bulk-array.wast spec test outline (#5568) [Thomas Lively, 2023-03-16, 1 file, -18/+11]
| | | | | | | | | Add spec/bulk-array.wast, which contains an outline of the tests that will be necessary for the upcoming bulk array instructions: array.copy (already implemented), array.fill, array.init_data, and array.init_elem. Although the test file does not actually contain any tests yet, it contains some setup code defining types, globals, and element segments that the tests will use. Fix miscellaneous bugs in parsing, validation, and printing to allow this setup code to run without issues.
* Make constant expression validation stricter (#5557) [Thomas Lively, 2023-03-10, 1 file, -36/+29]
| | | | | | | | | | Previously we treated global.get as a constant expression and only additionally verified that the target globals were immutable in some cases. But global.get of a mutable global is never a constant expression, and further, only imported globals are available in constant expressions unless GC is enabled. Fix constant expression validation to only allow global.get of immutable, imported globals, and fix all the invalid tests.
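The stricter rule for `global.get` in constant expressions can be written as a small predicate. This is a sketch of the rule as the message states it; `SketchGlobal` and `isConstantGlobalGet` are hypothetical names.

```cpp
#include <cassert>

// Hypothetical stand-in for the properties of the global being read.
struct SketchGlobal {
  bool isMutable;
  bool isImported;
};

// A global.get is a constant expression only if the target global is
// immutable, and additionally imported unless GC is enabled.
bool isConstantGlobalGet(const SketchGlobal& global, bool gcEnabled) {
  if (global.isMutable) {
    // Reading a mutable global is never a constant expression.
    return false;
  }
  return global.isImported || gcEnabled;
}
```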
* [NFC] Internally rename `ArrayInit` to `ArrayNewFixed` (#5526) [Thomas Lively, 2023-02-28, 1 file, -2/+2]
| | | | | | | | To match the standard instruction name, rename the expression class without changing any parsing or printing behavior. A follow-on PR will take care of the functional side of this change while keeping support for parsing the old name. This change will allow `ArrayInit` to be used as the expression class for the upcoming `array.init_data` and `array.init_elem` instructions.
* Fix validation of DataDrop (#5517) [Alon Zakai, 2023-02-23, 1 file, -3/+6]
| | | Fixes #5511
* Replace `RefIs` with `RefIsNull` (#5401) [Thomas Lively, 2023-01-09, 1 file, -6/+7]
| | | | | | | | | | | | | | | * Replace `RefIs` with `RefIsNull` The other `ref.is*` instructions are deprecated and expressible in terms of `ref.test`. Update binary and text parsing to parse those instructions as `RefTest` expressions. Also update the printing and emitting of `RefTest` expressions to emit the legacy instructions for now to minimize test changes and make this a mostly non-functional change. Since `ref.is_null` is the only `RefIs` instruction left, remove the `RefIsOp` field and rename the expression class to `RefIsNull`. The few test changes are due to the fact that `ref.is*` instructions are now subject to `ref.test` validation, and in particular it is no longer valid to perform a `ref.is_func` on a value outside of the `func` type hierarchy.
* Support br_on_cast null (#5397) [Thomas Lively, 2023-01-05, 1 file, -3/+8]
| | | | | | | | | As well as br_on_cast_fail null. Unlike the existing br_on_cast* instructions, these new instructions treat the cast as succeeding when the input is a null. Update the internal representation of the cast type in `BrOn` expressions to be a `Type` rather than a `HeapType` so it will include nullability information. Also update and improve `RemoveUnusedBrs` to handle the new instructions correctly and optimize in more cases.
* Allow non-nullable ref.cast of nullable references (#5386) [Thomas Lively, 2023-01-04, 1 file, -5/+5]
| | | | | | | This new cast configuration was not expressible with the legacy cast instructions. Although it is valid in Wasm, do not allow nullable casts of non-nullable references, since those would unnecessarily lose type information. Convert such casts to be non-nullable during expression finalization.
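The finalization rule mentioned at the end can be sketched as follows. The `SketchCast` struct and `finalizeCast` are hypothetical; the sketch tracks only the two nullability bits involved.

```cpp
#include <cassert>

// Hypothetical stand-in for a ref.cast: nullability of the input reference
// and of the cast target type.
struct SketchCast {
  bool inputNullable;
  bool targetNullable;
};

// A nullable cast of a non-nullable input would only lose type information,
// so finalization makes such a cast non-nullable.
void finalizeCast(SketchCast& cast) {
  if (!cast.inputNullable) {
    cast.targetNullable = false;
  }
}
```

A non-nullable cast of a nullable input is left alone, since that is the newly allowed configuration.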
* Support `ref.test null` (#5368) [Thomas Lively, 2022-12-21, 1 file, -1/+1]
| | | This new variant of ref.test returns 1 if the input is null.
* Update RefCast representation to drop extra HeapType (#5350) [Thomas Lively, 2022-12-20, 1 file, -1/+7]
| | | | | | | | | The latest upstream version of ref.cast is parameterized with a target reference type, not just a heap type, because the nullability of the result is parameterizable. As a first step toward implementing these new, more flexible ref.cast instructions, change the internal representation of ref.cast to use the expression type as the cast target rather than storing a separate heap type field. For now require that the encoded semantics match the previously allowed semantics, though, so that none of the optimization passes need to be updated.
* Do not optimize public types (#5347) [Thomas Lively, 2022-12-16, 1 file, -0/+39]
| Do not optimize or modify public heap types in any way. Public heap types include the types of imported or exported functions, tables, globals, etc. This is important to maintain the public interface of a module and ensure it can still link and interact as intended with the outside world.
|
| Also add a validation error if we find any nontrivial public types that are not the types of imported or exported functions. This error is meant to help the user ensure that type optimizations are not silently inhibited. In the future, we may want to add options to silence this error or downgrade it to a warning.
|
| This commit only updates the type updating machinery to avoid updating public types. It does not update any optimization passes accordingly. Since we avoid modifying public signature types already, this is not expected to break anything, but in the future once we have function subtyping or if we make the error optional, we may have to update some of our optimization passes.
* Allow casting to basic heap types (#5332) [Thomas Lively, 2022-12-08, 1 file, -28/+33]
| | | | | | | The standard casting instructions now allow casting to basic heap types, not just user-defined types, but they also require that the intended type and argument type have a common supertype. Update the validator to use the standard rules, update the binary parser and printer to allow basic types, and update the tests to remove or modify newly invalid test cases.
* Fix validation and inlining bugs (#5301) [Thomas Lively, 2022-11-29, 1 file, -2/+5]
| | | | | | | | | | | | | | Inlining had a bug where it gave return_calls in inlined callees concrete types even when they should have remained unreachable. This bug flew under the radar because validation had a bug where it allowed expressions to have concrete types when they should have been unreachable. The fuzzer found this bug by adding another pass after inlining where the unexpected types caused an assertion failure. Fix the bugs and add a test that would have triggered the inlining bug. Unfortunately the test would have also passed before this change due to the validation bug, but it's better than nothing. Fixes #5294.
* Validator: Print the field number on subtyping errors (#5297) [Alon Zakai, 2022-11-29, 1 file, -4/+6]
* [NFC] Expand comment about validating function type features (#5286) [Thomas Lively, 2022-11-22, 1 file, -1/+3]
| | | This addresses feedback missed in #5279.
* Validate that GC is enabled for rec groups and supertypes (#5279) [Thomas Lively, 2022-11-22, 1 file, -0/+3]
| | | | | | | | | Update `HeapType::getFeatures` to report that GC is used for heap types that have nontrivial recursion groups or supertypes. Update validation to check the features on function heap types, not just their individual params and results. This fixes a fuzz bug in #5239 where initial contents included a rec group but the fuzzer disabled GC. Since the resulting module passed validation, the rec groups made it into the binary output, making the type section malformed.
* Implement `array.new_data` and `array.new_elem` (#5214) [Thomas Lively, 2022-11-07, 1 file, -0/+68]
| | | | | | | | | In order to test them, fix the binary and text parsers to accept passive data segments even if a module has no memory. In addition to parsing and emitting the new instructions, also implement their validation and interpretation. Test the interpretation directly with wasm-shell tests adapted from the upstream spec tests. Running the upstream spec tests directly would require fixing too many bugs in the legacy text parser, so it will have to wait for the new text parser to be ready.
* Fix binary parsing of data segment memory (#5208) [Thomas Lively, 2022-11-03, 1 file, -3/+5]
| The binary parser was eagerly getting the name of memories to set the `memory` field of data segments, but that meant that when the memory names were updated later while parsing the names section, the data segment memory fields would become out of date. Fix the issue by deferring setting the `memory` fields like we do for other parts of IR that reference memories.
|
| Also fix a segfault in the validator that was triggered by the reproducer for this bug before the bug was fixed.
|
| Fixes #5204.
* [NFC] Mention relevant flags in validator errors (#5203) [Alon Zakai, 2022-11-01, 1 file, -93/+116]
| | | | | | | | | | E.g. Atomic operation (atomics are disabled) => Atomic operations require threads [--enable-threads]
* Remove excessive validation that is not in the wasm spec (#5167) [Alon Zakai, 2022-10-20, 1 file, -28/+1]
| | | | | | | | Specifically if a segment offset was a const, we checked that it made sense. But the wasm spec doesn't do that, and it actually causes some issues (#5163). In theory this extra validation might be useful - compile-time error rather than runtime - but if we want this it should probably be an optional thing, like an opt-in flag or a --lint pass or such.
* Implement `array` basic heap type (#5148) [Thomas Lively, 2022-10-18, 1 file, -6/+28]
| | | | | | | | | `array` is the supertype of all defined array types and for now is a subtype of `data`. (Once `data` becomes `struct` this will no longer be true.) Update the binary and text parsing of `array.len` to ignore the obsolete type annotation and update the binary emitting to emit a zero in place of the old type annotation and the text printing to print an arbitrary heap type for the annotation. A follow-on PR will add support for the newer unannotated version of `array.len`.
* Validate memory initial < max only if memory hasMax (#5125) [Ashley Nelson, 2022-10-07, 1 file, -2/+4]
| | | Making a change to wasm-validator so that Memory::kUnlimitedSize is treated properly like an unlimited case. The check for whether memory.initial < memory.max will only happen if memory.hasMax() — meaning if memory.max is not set to kUnlimitedSize.
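A minimal sketch of the guarded check is shown below. It assumes the "unlimited" sentinel is the largest 64-bit value and that the comparison is inclusive; `SketchMemory`, `kSketchUnlimitedSize`, and `limitsAreValid` are hypothetical names for this illustration, not the validator's code.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Assumed sentinel meaning "no maximum was declared".
constexpr uint64_t kSketchUnlimitedSize = std::numeric_limits<uint64_t>::max();

// Hypothetical stand-in for a memory's declared limits, in pages.
struct SketchMemory {
  uint64_t initial;
  uint64_t max;
  bool hasMax() const { return max != kSketchUnlimitedSize; }
};

// Compare initial against max only when a maximum was actually declared.
bool limitsAreValid(const SketchMemory& memory) {
  if (!memory.hasMax()) {
    return true;
  }
  return memory.initial <= memory.max;
}
```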
* Implement bottom heap types (#5115) [Thomas Lively, 2022-10-07, 1 file, -29/+73]
| These types, `none`, `nofunc`, and `noextern` are uninhabited, so references to them can only possibly be null. To simplify the IR and increase type precision, introduce new invariants that all `ref.null` instructions must be typed with one of these new bottom types and that `Literals` have a bottom type iff they represent null values. These new invariants require several additional changes.
|
| First, it is now possible that the `ref` or `target` child of a `StructGet`, `StructSet`, `ArrayGet`, `ArraySet`, or `CallRef` instruction has a bottom reference type, so it is not possible to determine what heap type annotation to emit in the binary or text formats. (The bottom types are not valid type annotations since they do not have indices in the type section.)
|
| To fix that problem, update the printer and binary emitter to emit unreachables instead of the instruction with undetermined type annotation. This is a valid transformation because the only possible value that could flow into those instructions in that case is null, and all of those instructions trap on nulls.
|
| That fix uncovered a latent bug in the binary parser in which new unreachables within unreachable code were handled incorrectly. This bug was not previously found by the fuzzer because we generally stop emitting code once we encounter an instruction with type `unreachable`. Now, however, it is possible to emit an `unreachable` for instructions that do not have type `unreachable` (but are known to trap at runtime), so we will continue emitting code. See the new test/lit/parse-double-unreachable.wast for details.
|
| Update other miscellaneous code that creates `RefNull` expressions and null `Literals` to maintain the new invariants as well.
* Refactor interaction between Pass and PassRunner (#5093) [Thomas Lively, 2022-09-30, 1 file, -1/+3]
| | | | | | | | | | | | | | Previously only WalkerPasses had access to the `getPassRunner` and `getPassOptions` methods. Move those methods to `Pass` so all passes can use them. As a result, the `PassRunner` passed to `Pass::run` and `Pass::runOnFunction` is no longer necessary, so remove it. Also update `Pass::create` to return a unique_ptr, which is more efficient than having it return a raw pointer only to have the `PassRunner` wrap that raw pointer in a `unique_ptr`. Delete the unused template `PassRunner::getLast()`, which looks like it was intended to enable retrieving previous analyses and has been in the code base since 2015 but is not implemented anywhere.
* Remove typed-function-references feature (#5030) [Thomas Lively, 2022-09-09, 1 file, -17/+8]
| In practice typed function references will not ship before GC and is not independently useful, so it's not necessary to have a separate feature for it. Roll the functionality previously enabled by --enable-typed-function-references into --enable-gc instead.
|
| This also avoids a problem with the ongoing implementation of the new GC bottom heap types. That change will make all ref.null instructions in Binaryen IR refer to one of the bottom heap types. But since those bottom types are introduced in GC, it's not valid to emit them in binaries unless GC is enabled. The fix if only reference types is enabled is to emit (ref.null func) instead of (ref.null nofunc), but that doesn't always work if typed function references are enabled because a function type more specific than func may be required. Getting rid of typed function references as a separate feature makes this a nonissue.
* [Wasm GC] Support non-nullable locals in the "1a" form (#4959) [Alon Zakai, 2022-08-31, 1 file, -26/+36]
| An overview of this is in the README in the diff here (conveniently, it is near the top of the diff). Basically, we fix up nn locals after each pass, by default. This keeps things easy to reason about - what validates is what is valid wasm - but there are some minor nuances as mentioned there, in particular, we ignore nameless blocks (which are commonly added by various passes; ignoring them means we can keep more locals non-nullable).
|
| The key addition here is LocalStructuralDominance which checks which local indexes have the "structural dominance" property of 1a, that is, that each get has a set in its block or an outer block that precedes it. I optimized that function quite a lot to reduce the overhead of running that logic after each pass. The overhead is something like 2% on J2Wasm and 0% on Dart (0%, because in this mode we shrink code size, so there is less work actually, and it balances out).
|
| Since we run fixups after each pass, this PR removes logic to manually call the fixup code from various places we used to call it (like eh-utils and various passes).
|
| Various passes are now marked as requiresNonNullableLocalFixups => false. That lets us skip running the fixups after them, which we normally do automatically. This helps avoid overhead. Most passes still need the fixups, though - any pass that adds a local, or a named block, or moves code around, likely does.
|
| This removes a hack in SimplifyLocals that is no longer needed. Before we worked to avoid moving a set into a try, as it might not validate. Now, we just do it and let fixups happen automatically if they need to: in the common case they probably don't, so the extra complexity seems not worth it.
|
| Also removes a hack from StackIR. That hack tried to avoid roundtrip adding a nondefaultable local. But we have the logic to fix that up now, and opts will likely keep it non-nullable as well.
|
| Various tests end up updated here because now a local can be non-nullable - previous fixups are no longer needed.
|
| Note that this doesn't remove the gc-nn-locals feature. That has been useful for testing, and may still be useful in the future - it basically just allows nn locals in all positions (that can't read the null default value at the entry). We can consider removing it separately.
|
| Fixes #4824
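The "structural dominance" property can be illustrated on a toy tree of blocks, sets, and gets. This is a simplified sketch under stated assumptions: it treats every block as a scope (it does not model the nameless-block nuance), ignores other control flow, and uses hypothetical names (`SketchNode`, `findNonDominatedLocals`) rather than the interface of LocalStructuralDominance.

```cpp
#include <cassert>
#include <set>
#include <utility>
#include <vector>

// Hypothetical toy IR: a node is a block of children, a local.set, or a
// local.get.
struct SketchNode {
  enum Kind { Block, LocalSet, LocalGet };
  Kind kind;
  int index;                        // local index, for LocalSet / LocalGet
  std::vector<SketchNode> children; // contents, for Block
};

SketchNode makeSet(int index) { return SketchNode{SketchNode::LocalSet, index, {}}; }
SketchNode makeGet(int index) { return SketchNode{SketchNode::LocalGet, index, {}}; }
SketchNode makeBlock(std::vector<SketchNode> children) {
  return SketchNode{SketchNode::Block, 0, std::move(children)};
}

// Walk in execution order. A set is visible to later gets in the same block
// and in nested blocks, and is forgotten when its block ends.
void scan(const SketchNode& node,
          std::set<int>& visibleSets,
          std::set<int>& nonDominated) {
  switch (node.kind) {
    case SketchNode::LocalSet:
      visibleSets.insert(node.index);
      break;
    case SketchNode::LocalGet:
      if (!visibleSets.count(node.index)) {
        nonDominated.insert(node.index);
      }
      break;
    case SketchNode::Block: {
      std::set<int> saved = visibleSets;
      for (const SketchNode& child : node.children) {
        scan(child, visibleSets, nonDominated);
      }
      visibleSets = saved; // sets made inside the block go out of scope
      break;
    }
  }
}

// Returns the local indexes that have a get with no preceding set in its
// block or an outer block.
std::set<int> findNonDominatedLocals(const SketchNode& body) {
  std::set<int> visibleSets;
  std::set<int> nonDominated;
  scan(body, visibleSets, nonDominated);
  return nonDominated;
}
```

In this model a local whose index is reported would need a fixup (for example, becoming nullable), while the others can stay non-nullable.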