summaryrefslogtreecommitdiff
path: root/src/wasm/wasm-validator.cpp
Commit message (Collapse)AuthorAgeFilesLines
...
* Implement `array` basic heap type (#5148)Thomas Lively2022-10-181-6/+28
| | | | | | | | | `array` is the supertype of all defined array types and for now is a subtype of `data`. (Once `data` becomes `struct` this will no longer be true.) Update the binary and text parsing of `array.len` to ignore the obsolete type annotation and update the binary emitting to emit a zero in place of the old type annotation and the text printing to print an arbitrary heap type for the annotation. A follow-on PR will add support for the newer unannotated version of `array.len`.
* Validate memory initial < max only if memory hasMax (#5125)Ashley Nelson2022-10-071-2/+4
| | | Making a change to wasm-validator so that Memory::kUnlimitedSize is treated properly like an unlimited case. The check for whether memory.initial < memory.max will only happen if memory.hasMax() — meaning if memory.max is not set to kUnlimitedSize.
* Implement bottom heap types (#5115)Thomas Lively2022-10-071-29/+73
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | These types, `none`, `nofunc`, and `noextern` are uninhabited, so references to them can only possibly be null. To simplify the IR and increase type precision, introduce new invariants that all `ref.null` instructions must be typed with one of these new bottom types and that `Literals` have a bottom type iff they represent null values. These new invariants requires several additional changes. First, it is now possible that the `ref` or `target` child of a `StructGet`, `StructSet`, `ArrayGet`, `ArraySet`, or `CallRef` instruction has a bottom reference type, so it is not possible to determine what heap type annotation to emit in the binary or text formats. (The bottom types are not valid type annotations since they do not have indices in the type section.) To fix that problem, update the printer and binary emitter to emit unreachables instead of the instruction with undetermined type annotation. This is a valid transformation because the only possible value that could flow into those instructions in that case is null, and all of those instructions trap on nulls. That fix uncovered a latent bug in the binary parser in which new unreachables within unreachable code were handled incorrectly. This bug was not previously found by the fuzzer because we generally stop emitting code once we encounter an instruction with type `unreachable`. Now, however, it is possible to emit an `unreachable` for instructions that do not have type `unreachable` (but are known to trap at runtime), so we will continue emitting code. See the new test/lit/parse-double-unreachable.wast for details. Update other miscellaneous code that creates `RefNull` expressions and null `Literals` to maintain the new invariants as well.
* Refactor interaction between Pass and PassRunner (#5093)Thomas Lively2022-09-301-1/+3
| | | | | | | | | | | | | | Previously only WalkerPasses had access to the `getPassRunner` and `getPassOptions` methods. Move those methods to `Pass` so all passes can use them. As a result, the `PassRunner` passed to `Pass::run` and `Pass::runOnFunction` is no longer necessary, so remove it. Also update `Pass::create` to return a unique_ptr, which is more efficient than having it return a raw pointer only to have the `PassRunner` wrap that raw pointer in a `unique_ptr`. Delete the unused template `PassRunner::getLast()`, which looks like it was intended to enable retrieving previous analyses and has been in the code base since 2015 but is not implemented anywhere.
* Remove typed-function-references feature (#5030)Thomas Lively2022-09-091-17/+8
| | | | | | | | | | | | | | | | In practice typed function references will not ship before GC and is not independently useful, so it's not necessary to have a separate feature for it. Roll the functionality previously enabled by --enable-typed-function-references into --enable-gc instead. This also avoids a problem with the ongoing implementation of the new GC bottom heap types. That change will make all ref.null instructions in Binaryen IR refer to one of the bottom heap types. But since those bottom types are introduced in GC, it's not valid to emit them in binaries unless unless GC is enabled. The fix if only reference types is enabled is to emit (ref.null func) instead of (ref.null nofunc), but that doesn't always work if typed function references are enabled because a function type more specific than func may be required. Getting rid of typed function references as a separate feature makes this a nonissue.
* [Wasm GC] Support non-nullable locals in the "1a" form (#4959)Alon Zakai2022-08-311-26/+36
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | An overview of this is in the README in the diff here (conveniently, it is near the top of the diff). Basically, we fix up nn locals after each pass, by default. This keeps things easy to reason about - what validates is what is valid wasm - but there are some minor nuances as mentioned there, in particular, we ignore nameless blocks (which are commonly added by various passes; ignoring them means we can keep more locals non-nullable). The key addition here is LocalStructuralDominance which checks which local indexes have the "structural dominance" property of 1a, that is, that each get has a set in its block or an outer block that precedes it. I optimized that function quite a lot to reduce the overhead of running that logic after each pass. The overhead is something like 2% on J2Wasm and 0% on Dart (0%, because in this mode we shrink code size, so there is less work actually, and it balances out). Since we run fixups after each pass, this PR removes logic to manually call the fixup code from various places we used to call it (like eh-utils and various passes). Various passes are now marked as requiresNonNullableLocalFixups => false. That lets us skip running the fixups after them, which we normally do automatically. This helps avoid overhead. Most passes still need the fixups, though - any pass that adds a local, or a named block, or moves code around, likely does. This removes a hack in SimplifyLocals that is no longer needed. Before we worked to avoid moving a set into a try, as it might not validate. Now, we just do it and let fixups happen automatically if they need to: in the common code they probably don't, so the extra complexity seems not worth it. Also removes a hack from StackIR. That hack tried to avoid roundtrip adding a nondefaultable local. But we have the logic to fix that up now, and opts will likely keep it non-nullable as well. Various tests end up updated here because now a local can be non-nullable - previous fixups are no longer needed. Note that this doesn't remove the gc-nn-locals feature. That has been useful for testing, and may still be useful in the future - it basically just allows nn locals in all positions (that can't read the null default value at the entry). We can consider removing it separately. Fixes #4824
* Implement `extern.externalize` and `extern.internalize` (#4975)Thomas Lively2022-08-291-0/+35
| | | | These new GC instructions infallibly convert between `extern` and `any` references now that those types are not in the same hierarchy.
* Adding Multi-Memories Wasm Feature (#4968)Ashley Nelson2022-08-251-0/+6
| | | Adding multi-memories to the the list of wasm-features.
* Multi-Memories Validate (#4945)Ashley Nelson2022-08-231-37/+43
| | | This change loops through memories to validate each meets wasm spec. Also factors data segment validation out from memory validation, as it makes more sense for data segments to stand alone like the other module-level elements.
* Validator: Validate unreachable calls more carefully (#4909)Alon Zakai2022-08-181-0/+57
| | | | | | | | | | | | | | | | | | Normally the validator will find stale types properly, by just running refinalize and seeing if the type has changed (if so, then some code forgot to refinalize). However, refinalize is a local operation, so it does not apply to calls: a call's proper type is determined by the global information of the function we are calling. As a result, we would not notice errors like this: (call $foo) ;; type: unreachable Refinalizing that would not change the type from unreachable to the proper type, since that is global information. To validate this properly, validate that a call whose type is unreachable actually has an unreachable child. That rules out an invalid unreachable type here, which leaves concrete types, that we already have proper global validation for. The code here is generalized to handle non-call things as well, but it only helps expressions requiring global validation, so it likely only helps global.get and a few others.
* Restore the `extern` heap type (#4898)Thomas Lively2022-08-171-4/+4
| | | | | | | The GC proposal has split `any` and `extern` back into two separate types, so reintroduce `HeapType::ext` to represent `extern`. Before it was originally removed in #4633, externref was a subtype of anyref, but now it is not. Now that we have separate heaptype type hierarchies, make `HeapType::getLeastUpperBound` fallible as well.
* Mutli-Memories Support in IR (#4811)Ashley Nelson2022-08-171-64/+76
| | | | | | | This PR removes the single memory restriction in IR, adding support for a single module to reference multiple memories. To support this change, a new memory name field was added to 13 memory instructions in order to identify the memory for the instruction. It is a goal of this PR to maintain backwards compatibility with existing text and binary wasm modules, so memory indexes remain optional for memory instructions. Similarly, the JS API makes assumptions about which memory is intended when only one memory is present in the module. Another goal of this PR is that existing tests behavior be unaffected. That said, tests must now explicitly define a memory before invoking memory instructions or exporting a memory, and memory names are now printed for each memory instruction in the text format. There remain quite a few places where a hardcoded reference to the first memory persist (memory flattening, for example, will return early if more than one memory is present in the module). Many of these call-sites, particularly within passes, will require us to rethink how the optimization works in a multi-memories world. Other call-sites may necessitate more invasive code restructuring to fully convert away from relying on a globally available, single memory pointer.
* Validator: More carefully check for stale types (#4907)Alon Zakai2022-08-161-4/+8
| | | | | | | The validation logic to check for stale types (code where we forgot to run refinalize) had a workaround for a control flow issue. That workaround meant we didn't catch errors where a type was concrete but it should be unreachable. This PR makes that workaround only apply for control flow structures, so we can catch more errors.
* Validator: Validate intrinsics (#4880)Alon Zakai2022-08-161-8/+55
| | | | | | | | | | call.without.effects has a specific form, where the last parameter is a function reference, and that function reference must have the right type for the other parameters if called with them: (call $call.without.effects (..i32..) (..f64..) (..function reference, which takes params i32 and f64..)
* Function-level pass-debug mode 2 validation (#4897)Alon Zakai2022-08-121-0/+17
| | | | | | In BINARYEN_PASS_DEBUG=2 we save the module before each pass, and if validation fails afterwards, we print the module before. This PR does the same for function-parallel passes - in that case, we can actually show the specific function that broke validation, as opposed to the whole module.
* [EH] Pop should be supertype of tag type (#4901)Heejin Ahn2022-08-111-1/+1
| | | | `pop`s type should be a supertype, not a subtype, of the tag's type within `catch`.
* Remove RTTs (#4848)Thomas Lively2022-08-051-126/+17
| | | | | | | RTTs were removed from the GC spec and if they are added back in in the future, they will be heap types rather than value types as in our implementation. Updating our implementation to have RTTs be heap types would have been more work than deleting them for questionable benefit since we don't know how long it will be before they are specced again.
* Remove basic reference types (#4802)Thomas Lively2022-07-201-18/+10
| | | | | | | | | Basic reference types like `Type::funcref`, `Type::anyref`, etc. made it easy to accidentally forget to handle reference types with the same basic HeapTypes but the opposite nullability. In principle there is nothing special about the types with shorthands except in the binary and text formats. Removing these shorthands from the internal type representation by removing all basic reference types makes some code more complicated locally, but simplifies code globally and encourages properly handling both nullable and non-nullable reference types.
* First class Data Segments (#4733)Ashley Nelson2022-06-211-20/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * Updating wasm.h/cpp for DataSegments * Updating wasm-binary.h/cpp for DataSegments * Removed link from Memory to DataSegments and updated module-utils, Metrics and wasm-traversal * checking isPassive when copying data segments to know whether to construct the data segment with an offset or not * Removing memory member var from DataSegment class as there is only one memory rn. Updated wasm-validator.cpp * Updated wasm-interpreter * First look at updating Passes * Updated wasm-s-parser * Updated files in src/ir * Updating tools files * Last pass on src files before building * added visitDataSegment * Fixing build errors * Data segments need a name * fixing var name * ran clang-format * Ensuring a name on DataSegment * Ensuring more datasegments have names * Adding explicit name support * Fix fuzzing name * Outputting data name in wasm binary only if explicit * Checking temp dataSegments vector to validateBinary because it's the one with the segments before we processNames * Pass on when data segment names are explicitly set * Ran auto_update_tests.py and check.py, success all around * Removed an errant semi-colon and corrected a counter. Everything still passes * Linting * Fixing processing memory names after parsed from binary * Updating the test from the last fix * Correcting error comment * Impl kripken@ comments * Impl tlively@ comments * Updated tests that remove data print when == 0 * Ran clang format * Impl tlively@ comments * Ran clang-format
* Update relaxed SIMD instructionsThomas Lively2022-06-071-2/+1
| | | | | Update the opcodes for all relaxed SIMD instructions and remove the unsigned dot product instructions that are no longer in the proposal.
* [NFC] Refactor EHUtils::findPops() method (#4704)Alon Zakai2022-06-011-25/+2
| | | | This moves it out of the validator so it can be used elsewhere. It will be used in #4685
* [Wasm GC] Fix CFG traversal of call_ref and add missing validation check (#4690)Alon Zakai2022-05-251-0/+32
| | | | | | | | We were missing CallRef in the CFG traversal code in a place where we note possible exceptions. As a result we thought CallRef cannot throw, and were missing some control flow edges. To actually detect the problem, we need to validate non-nullable locals properly, which we were not doing. This adds that as well.
* Validator: Check features for ref.null's type (#4677)Alon Zakai2022-05-181-0/+5
|
* Remove externref (#4633)Thomas Lively2022-05-041-7/+4
| | | | | | Remove `Type::externref` and `HeapType::ext` and replace them with uses of anyref and any, respectively, now that we have unified these types in the GC proposal. For backwards compatibility, continue to parse `extern` and `externref` and maintain their relevant C API functions.
* [NominalFuzzing] Add a validation error on ref.cast's etc. intended type (#4606)Alon Zakai2022-04-211-0/+7
|
* Implement relaxed SIMD dot product instructions (#4586)Thomas Lively2022-04-111-1/+3
| | | As proposed in https://github.com/WebAssembly/relaxed-simd/issues/52.
* [SIMD] Make swizzle's opcode name consistent (NFC) (#4585)Heejin Ahn2022-04-091-2/+2
| | | | Other opcode ends with `Inxm` or `Fnxm` (where n and m are integers), while `i8x16.swizzle`'s opcode name doesn't have an `I` in there.
* Implement i16x8.relaxed_q15mulr_s (#4583)Thomas Lively2022-04-071-1/+2
| | | As proposed in https://github.com/WebAssembly/relaxed-simd/issues/40.
* [Wasm GC] Fix non-nullable tuples (#4555)Alon Zakai2022-03-301-4/+4
| | | | | | Apply the same logic to tuple fields as we do for all other fields, when checking whether a non-nullable value is valid. Fixes #4554
* Add support for extended-const proposal (#4529)Sam Clegg2022-03-191-9/+20
| | | See https://github.com/WebAssembly/extended-const
* Fix an assertion in the validator on call_ref heaptypes (#4496)Alon Zakai2022-02-031-4/+5
| | | | We can only call getHeapType if it is indeed a function type. Otherwise we should show the error and move on.
* [EH] Enable fuzzer with initial contents (#4409)Heejin Ahn2022-01-041-1/+2
| | | | | | | | | This enables fuzzing EH with initial contents. fuzzing.cpp/h does not yet support generation of EH instructions, but with this we can still fuzz EH based on initial contents. The fuzzer ran successfully for more than 1,900,000 iterations, with my local modification that always enables EH and lets the fuzzer select only EH tests for its initial contents.
* Modernize code to C++17 (#3104)Max Graey2021-11-221-5/+2
|
* Change from storing Signature to HeapType on CallIndirect (#4352)Thomas Lively2021-11-221-5/+9
| | | | | | | | | | | | With nominal function types, this change makes it so that we preserve the identity of the function type used with call_indirect instructions rather than recreating a function heap type, which may or may not be the same as the originally parsed heap type, from the function signature during module writing. This will simplify the type system implementation by removing the need to store a "canonical" nominal heap type for each unique signature. We previously depended on those canonical types to avoid creating multiple duplicate function types during module writing, but now we aren't creating any new function types at all.
* Add support for relaxed-simd instructions (#4320)Ng Zhi An2021-11-151-1/+10
| | | | | | | | | | | | | | | | | | | | | This adds relaxed-simd instructions based on the current status of the proposal https://github.com/WebAssembly/relaxed-simd/blob/main/proposals/relaxed-simd/Overview.md. Binary opcodes are based on what is listed in https://github.com/WebAssembly/relaxed-simd/blob/main/proposals/relaxed-simd/Overview.md#binary-format. Text names are not fixed yet, and some sort sort of names that maps to the non-relaxed versions are chosen for this prototype. Support for these instructions have been added to LLVM via builtins, adding support here will allow Emscripten to successfully compile files that use those builtins. Interpreter support has also been added, and they delegate to the non-relaxed versions of the instructions. Most instructions are implemented in the interpreter the same way as the non-relaxed simd128 instructions, except for fma/fms, which is always fused.
* [EH] Improve catch validation (#4315)Heejin Ahn2021-11-081-0/+65
| | | | | | | | | | | | | | | | | | | This improves validation of `catch` bodies mostly by checking the validity of `pop`s. For every `catch` body: - Checks if its tag exists - If the tag's type is none: - Ensures there shouldn't be any `pop`s - If the tag's type is not none: - Checks if there's a single `pop` within the catch body - Checks if the tag type matches the `pop`'s type - Checks if the `pop`'s location is valid For every `catch_all` body: - Ensures there shuldn't be any `pop`s This uncovers several bugs related to `pop`s in existing tests, which this PR also fixes.
* Fix RTTs for RTT-less instructions (#4294)Thomas Lively2021-11-031-1/+4
| | | | | | | | | | | | Allocation and cast instructions without explicit RTTs should use the canonical RTTs for the given types. Furthermore, the RTTs for nominal types should reflect the static type hierarchy. Previously, however, we implemented allocations and casts without RTTs using an alternative system that only used static types rather than RTT values. This alternative system would work fine in a world without first-class RTTs, but it did not properly allow mixing instructions that use RTTs and instructions that do not use RTTs as intended by the M4 GC spec. This PR fixes the issue by using canonical RTTs where appropriate and cleans up the relevant casting code using std::variant.
* Add table.grow operation (#4245)Max Graey2021-10-181-0/+19
|
* Add table.size operation (#4224)Max Graey2021-10-081-0/+9
|
* Add table.set operation (#4215)Max Graey2021-10-071-0/+17
|
* Implement table.get (#4195)Alon Zakai2021-09-301-0/+15
| | | | Adds the part of the spec test suite that this passes (without table.set we can't do it all).
* [Wasm GC] Implement static (rtt-free) StructNew, ArrayNew, ArrayInit (#4172)Alon Zakai2021-09-231-12/+36
| | | | | | | | | See #4149 This modifies the test added in #4163 which used static casts on dynamically-created structs and arrays. That was technically not valid (as we won't want users to "mix" the two forms). This makes that test 100% static, which both fixes the test and gives test coverage to the new instructions added here.
* [Wasm GC] Add static variants of ref.test, ref.cast, and br_on_cast* (#4163)Alon Zakai2021-09-201-12/+49
| | | | | | | | | | | | These variants take a HeapType that is the type we intend to cast to, and do not take an RTT. These are intended to be more statically optimizable. For now though this PR just implements the minimum to get them parsing and to get through the optimizer without crashing. Spec: https://docs.google.com/document/d/1afthjsL_B9UaMqCA5ekgVmOm75BVFu6duHNsN9-gnXw/edit# See #4149
* Add an Intrinsics mechanism, and a call.without.effects intrinsic (#4126)Alon Zakai2021-09-101-0/+11
| | | | | | | | | | | | | | | | | | | | | | | | | An "intrinsic" is modeled as a call to an import. We could also add new IR things for them, but that would take more work and lead to less clear errors in other tools if they try to read a binary using such a nonstandard extension. A first intrinsic is added here, call.without.effects This is basically the same as call_ref except that the optimizer is free to assume the call has no side effects. Consequently, if the result is not used then it can be optimized out (as even if it is not used then side effects could have kept it around). Likewise, the lack of side effects allows more reordering and other things. A lowering pass for intrinsics is provided. Rather than automatically lower them to normal wasm at the end of optimizations, the user must call that pass explicitly. A typical workflow might be -O --intrinsic-lowering -O That optimizes with the intrinsic present - perhaps removing calls thanks to it - then lowers it into normal wasm - it turns into a call_ref - and then optimizes further, which would turns the call_ref into a direct call, potentially inline, etc.
* [Wasm GC] ArrayInit support (#4138)Alon Zakai2021-09-101-0/+26
| | | | | | | array.init is like array.new_with_rtt except that it takes as arguments the values to initialize the array with (as opposed to a size and an optional initial value). Spec: https://docs.google.com/document/d/1afthjsL_B9UaMqCA5ekgVmOm75BVFu6duHNsN9-gnXw/edit#
* Support subtyping in tail calls (#4032)Thomas Lively2021-07-281-5/+7
| | | | | | | | | The tail call spec does not include subtyping because it is based on the upstream spec, which does not contain subtyping. However, there is no reason that subtyping shouldn't apply to tail calls like it does for any other call or return. Update the validator to allow subtyping and avoid a related null pointer dereference while we're at it. Do not run the test in with --nominal because it is buggy in that mode.
* Preserve Function HeapTypes (#3952)Thomas Lively2021-06-301-14/+14
| | | | | | | | | When using nominal types, func.ref of two functions with identical signatures but different HeapTypes will yield different types. To preserve these semantics, Functions need to track their HeapTypes, not just their Signatures. This PR replaces the Signature field in Function with a HeapType field and adds new utility methods to make it almost as simple to update and query the function HeapType as it was to update and query the Function Signature.
* [EH] Make tag's attribute encoding detail (#3947)Heejin Ahn2021-06-211-4/+0
| | | | | | | | | This removes `attribute` field from `Tag` class, making the reserved and unused field known only to binary encoder and decoder. This also removes the `attribute` parameter from `makeTag` and `addTag` methods in wasm-builder.h, C API, and Binaryen JS API. Suggested in https://github.com/WebAssembly/binaryen/pull/3946#pullrequestreview-687756523.
* [EH] Replace event with tag (#3937)Heejin Ahn2021-06-181-21/+20
| | | | | | | | | | | We recently decided to change 'event' to 'tag', and to 'event section' to 'tag section', out of the rationale that the section contains a generalized tag that references a type, which may be used for something other than exceptions, and the name 'event' can be confusing in the web context. See - https://github.com/WebAssembly/exception-handling/issues/159#issuecomment-857910130 - https://github.com/WebAssembly/exception-handling/pull/161
* [Wasm GC] Add experimental support for non-nullable locals (#3932)Alon Zakai2021-06-151-0/+3
| | | | | | | | | | | | | | This adds a new feature flag, GCNNLocals that enables support for non-nullable locals. No validation is applied to check that they are actually assigned before their use yet - this just allows experimentation to begin. This feature is not enabled by default even with -all. If we enabled it, then it would take effect in most of our tests and likely confuse current users as well. Instead, the flag must be opted in explicitly using --enable-gc-nn-locals. That is, this is an experimental feature flag, and as such must be explicitly enabled. (Once the spec stabilizes, we will remove the feature anyhow when we implement the final status of non-nullability. )