path: root/src/tools/fuzzing
Commit message | Author | Age | Files | Lines
* Support `array` and `struct` types in the type fuzzer (#5308) | Thomas Lively | 2022-12-02 | 1 | -40/+54
  Since `data` has been removed from the upstream proposal and `struct` has been added in its place, update the type fuzzer to be structured around `struct` and `array` (which it had not previously been updated to support) rather than `data`. A follow-on PR will make the broader change of removing `data` and adding `struct`.
* Implement `array` basic heap type (#5148) | Thomas Lively | 2022-10-18 | 2 | -12/+25
  `array` is the supertype of all defined array types and for now is a subtype of `data`. (Once `data` becomes `struct` this will no longer be true.) Update the binary and text parsing of `array.len` to ignore the obsolete type annotation and update the binary emitting to emit a zero in place of the old type annotation and the text printing to print an arbitrary heap type for the annotation. A follow-on PR will add support for the newer unannotated version of `array.len`.
* Make `Name` a pointer, length pair (#5122) | Thomas Lively | 2022-10-11 | 1 | -1/+1
  The goal is to support null characters (i.e. zero bytes) in strings. Rewrite the underlying interned `IString` to store a `std::string_view` rather than a `const char*`, reduce the number of map lookups necessary to intern a string, and present a more immutable interface.
  Most importantly, replace the `c_str()` method that returned a `const char*` with a `toString()` method that returns a `std::string`. This new method can correctly handle strings containing null characters. A `const char*` can still be had by calling `data()` on the `std::string_view`, although this usage should be discouraged.
  This change is NFC in spirit, although not in practice. It does not intend to support any particular new functionality, but it is probably now possible to use strings containing null characters in at least some cases. At least one parser bug is also incidentally fixed. Follow-on PRs will explicitly support and test strings containing nulls for particular use cases.
  The C API still uses `const char*` to represent strings. As strings containing nulls become better supported by the rest of Binaryen, this will no longer be sufficient. Updating the C and JS APIs to use pointer, length pairs is left as future work.
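The motivation for the pointer, length pair can be shown with standard C++ alone: a bare `const char*` carries no length, so anything that measures it with `strlen` stops at the first zero byte, while a `std::string_view` stores pointer and length together. This is a minimal standalone sketch; `ToyName` and `lengthViaCString` are hypothetical and are not Binaryen's `IString` or `Name`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <string_view>

// Hypothetical stand-in for an interned name: it keeps a pointer, length
// pair (a std::string_view) instead of a bare null-terminated pointer.
struct ToyName {
  std::string_view str;

  // The length is stored, so embedded zero bytes are counted.
  std::size_t size() const { return str.size(); }

  // Copies every byte, including embedded zero bytes.
  std::string toString() const { return std::string(str); }
};

// What a consumer of a bare `const char*` sees: strlen stops at the first
// zero byte, so everything after it is silently lost.
std::size_t lengthViaCString(const char* s) { return std::strlen(s); }
```

With the five bytes `"ab\0cd"`, `ToyName::size()` reports 5 and `toString()` round-trips all of them, while `lengthViaCString` on the same storage reports 2.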
* Implement bottom heap types (#5115) | Thomas Lively | 2022-10-07 | 2 | -19/+75
  These types, `none`, `nofunc`, and `noextern`, are uninhabited, so references to them can only possibly be null. To simplify the IR and increase type precision, introduce new invariants that all `ref.null` instructions must be typed with one of these new bottom types and that `Literals` have a bottom type iff they represent null values.
  These new invariants require several additional changes. First, it is now possible that the `ref` or `target` child of a `StructGet`, `StructSet`, `ArrayGet`, `ArraySet`, or `CallRef` instruction has a bottom reference type, so it is not possible to determine what heap type annotation to emit in the binary or text formats. (The bottom types are not valid type annotations since they do not have indices in the type section.) To fix that problem, update the printer and binary emitter to emit unreachables instead of the instruction with undetermined type annotation. This is a valid transformation because the only possible value that could flow into those instructions in that case is null, and all of those instructions trap on nulls.
  That fix uncovered a latent bug in the binary parser in which new unreachables within unreachable code were handled incorrectly. This bug was not previously found by the fuzzer because we generally stop emitting code once we encounter an instruction with type `unreachable`. Now, however, it is possible to emit an `unreachable` for instructions that do not have type `unreachable` (but are known to trap at runtime), so we will continue emitting code. See the new test/lit/parse-double-unreachable.wast for details.
  Update other miscellaneous code that creates `RefNull` expressions and null `Literals` to maintain the new invariants as well.
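The two invariants can be sketched on a toy heap-type lattice: every hierarchy (any, func, extern) has its own bottom type, a `ref.null` is always typed with the bottom of its hierarchy, and an accessor whose reference operand is bottom-typed can be emitted as `unreachable`. All names below (`Heap`, `bottomOf`, `makeRefNull`, `emitUnreachableInstead`) are hypothetical and do not reflect Binaryen's real `HeapType` API.

```cpp
#include <cassert>

// Toy heap types for three hierarchies: any (with eq below it), func, and
// extern. Each hierarchy has its own uninhabited bottom type.
enum class Heap { Any, Eq, None, Func, NoFunc, Extern, NoExtern };

// Map a heap type to the bottom type of its hierarchy.
Heap bottomOf(Heap h) {
  switch (h) {
    case Heap::Any:
    case Heap::Eq:
    case Heap::None:
      return Heap::None;
    case Heap::Func:
    case Heap::NoFunc:
      return Heap::NoFunc;
    case Heap::Extern:
    case Heap::NoExtern:
      return Heap::NoExtern;
  }
  return Heap::None; // not reached for valid enum values
}

bool isBottom(Heap h) { return bottomOf(h) == h; }

// Hypothetical ref.null node: under the invariant, its type is always bottom.
struct ToyRefNull {
  Heap type;
};

ToyRefNull makeRefNull(Heap annotated) { return ToyRefNull{bottomOf(annotated)}; }

// If the reference operand of e.g. struct.get is bottom-typed, there is no
// type-section index to print as the annotation. Only null can flow in, and
// null traps, so emitting `unreachable` instead preserves behavior.
bool emitUnreachableInstead(Heap refOperand) { return isBottom(refOperand); }
```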
* [Fuzzing] Allow recombine() to replace with a subtype (#5101) | Alon Zakai | 2022-10-03 | 1 | -4/+43
  Previously it would randomly replace an expression with another one with the exact same type. Allowing a subtype may give us more coverage.
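The relaxed selection rule amounts to filtering candidates by a subtype check instead of type equality. The sketch below assumes a toy single-inheritance hierarchy stored as a map from each type to its supertype; `isSubType`, `Candidate`, and `pickReplacement` are hypothetical names, not the fuzzer's actual `recombine()` code.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <optional>
#include <random>
#include <string>
#include <vector>

// Hypothetical toy hierarchy: each type names its (single) supertype.
using TypeName = std::string;
const std::map<TypeName, TypeName> supertypeOf = {{"B", "A"}, {"C", "Z"}};

// a <: b if b is reachable from a by following supertype links (reflexive).
bool isSubType(TypeName a, const TypeName& b) {
  while (true) {
    if (a == b) return true;
    auto it = supertypeOf.find(a);
    if (it == supertypeOf.end()) return false;
    a = it->second;
  }
}

struct Candidate {
  std::string expr;
  TypeName type;
};

// Pick a random candidate whose type is a subtype of `wanted`, rather than
// only candidates whose type equals `wanted` exactly.
std::optional<Candidate> pickReplacement(const std::vector<Candidate>& pool,
                                         const TypeName& wanted,
                                         std::mt19937& rng) {
  std::vector<Candidate> ok;
  for (const auto& c : pool) {
    if (isSubType(c.type, wanted)) ok.push_back(c);
  }
  if (ok.empty()) return std::nullopt;
  std::uniform_int_distribution<std::size_t> dist(0, ok.size() - 1);
  return ok[dist(rng)];
}
```

With candidates of types `A`, `B` (a subtype of `A`), and `C`, a request for `A` can now return either the `A` or the `B` candidate, whereas exact matching could only return the `A` one.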
* Remove typed-function-references feature (#5030) | Thomas Lively | 2022-09-09 | 1 | -6/+3
  In practice typed function references will not ship before GC and are not independently useful, so it's not necessary to have a separate feature for it. Roll the functionality previously enabled by --enable-typed-function-references into --enable-gc instead.
  This also avoids a problem with the ongoing implementation of the new GC bottom heap types. That change will make all ref.null instructions in Binaryen IR refer to one of the bottom heap types. But since those bottom types are introduced in GC, it's not valid to emit them in binaries unless GC is enabled. The fix if only reference types is enabled is to emit (ref.null func) instead of (ref.null nofunc), but that doesn't always work if typed function references are enabled because a function type more specific than func may be required. Getting rid of typed function references as a separate feature makes this a nonissue.
* [NFC] Remove unused code in type fuzzer (#5023) | Thomas Lively | 2022-09-07 | 1 | -67/+0
  The only call to `generateSubBasic` was removed as part of a bug fix in #4346, but the function itself was not removed. Remove it and other unused functions it depends on now.
* Update fuzzer to newer GC spec regarding JS interop (#4965) | Alon Zakai | 2022-08-31 | 1 | -7/+24
  Do not export functions that have types not allowed in the rules for JS interop. Only very few GC types can be on the JS boundary atm.
* [Wasm GC] Support non-nullable locals in the "1a" form (#4959) | Alon Zakai | 2022-08-31 | 1 | -0/+4
  An overview of this is in the README in the diff here (conveniently, it is near the top of the diff). Basically, we fix up nn locals after each pass, by default. This keeps things easy to reason about - what validates is what is valid wasm - but there are some minor nuances as mentioned there, in particular, we ignore nameless blocks (which are commonly added by various passes; ignoring them means we can keep more locals non-nullable).
  The key addition here is LocalStructuralDominance which checks which local indexes have the "structural dominance" property of 1a, that is, that each get has a set in its block or an outer block that precedes it. I optimized that function quite a lot to reduce the overhead of running that logic after each pass. The overhead is something like 2% on J2Wasm and 0% on Dart (0%, because in this mode we shrink code size, so there is less work actually, and it balances out).
  Since we run fixups after each pass, this PR removes logic to manually call the fixup code from various places we used to call it (like eh-utils and various passes). Various passes are now marked as requiresNonNullableLocalFixups => false. That lets us skip running the fixups after them, which we normally do automatically. This helps avoid overhead. Most passes still need the fixups, though - any pass that adds a local, or a named block, or moves code around, likely does.
  This removes a hack in SimplifyLocals that is no longer needed. Before, we worked to avoid moving a set into a try, as it might not validate. Now, we just do it and let fixups happen automatically if they need to: in the common code they probably don't, so the extra complexity seems not worth it.
  Also removes a hack from StackIR. That hack tried to avoid roundtrip adding a nondefaultable local. But we have the logic to fix that up now, and opts will likely keep it non-nullable as well.
  Various tests end up updated here because now a local can be non-nullable - previous fixups are no longer needed.
  Note that this doesn't remove the gc-nn-locals feature. That has been useful for testing, and may still be useful in the future - it basically just allows nn locals in all positions (that can't read the null default value at the entry). We can consider removing it separately.
  Fixes #4824
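The structural dominance property described above ("each get has a set in its block or an outer block that precedes it"), including the relaxation that nameless blocks are ignored, can be sketched as a single ordered walk over a toy tree. This is not Binaryen's `LocalStructuralDominance`; `Node`, `walk`, and `nonDominatedLocals` are hypothetical, and the toy ignores parameters and local types.

```cpp
#include <cassert>
#include <set>
#include <utility>
#include <vector>

// Hypothetical toy IR: a node is a local.set, a local.get, or a block.
// Only named blocks act as scopes; nameless blocks are transparent.
struct Node {
  enum Kind { Set, Get, Block } kind = Block;
  int index = -1;             // local index (Set and Get only)
  bool named = false;         // whether a Block has a label
  std::vector<Node> children; // Block children, in execution order
};

Node makeSet(int i) { Node n; n.kind = Node::Set; n.index = i; return n; }
Node makeGet(int i) { Node n; n.kind = Node::Get; n.index = i; return n; }
Node makeBlock(bool named, std::vector<Node> kids) {
  Node n;
  n.kind = Node::Block;
  n.named = named;
  n.children = std::move(kids);
  return n;
}

// Walk in execution order. `setSoFar` holds indexes with a preceding set in
// the current scope or an enclosing one.
void walk(const Node& node, std::set<int>& setSoFar, std::set<int>& violations) {
  switch (node.kind) {
    case Node::Set:
      setSoFar.insert(node.index);
      break;
    case Node::Get:
      if (!setSoFar.count(node.index)) violations.insert(node.index);
      break;
    case Node::Block:
      if (node.named) {
        // Sets inside a named block do not dominate code after the block.
        std::set<int> inner = setSoFar;
        for (const auto& child : node.children) walk(child, inner, violations);
      } else {
        // Nameless blocks are ignored: their sets stay visible afterwards.
        for (const auto& child : node.children) walk(child, setSoFar, violations);
      }
      break;
  }
}

// Local indexes with at least one get not structurally dominated by a set.
std::set<int> nonDominatedLocals(const Node& body) {
  std::set<int> setSoFar, violations;
  walk(body, setSoFar, violations);
  return violations;
}
```

A set inside a nameless block still dominates a later get, while the same set inside a named block does not; in the real feature, such locals are the ones that would need a fixup.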
* Separate `func` into a separate type hierarchy (#4955) | Thomas Lively | 2022-08-22 | 2 | -33/+12
  Just like `extern` is no longer a subtype of `any` in the new GC type system, `func` is no longer a subtype of `any`, either. Make that change in our type system implementation and update tests and fuzzers accordingly.
* Materialize non-null externrefs in the fuzzer (#4952) | Thomas Lively | 2022-08-22 | 1 | -2/+7
  Some fuzzer initial contents contain non-nullable externrefs that cause the fuzzer to try to materialize non-nullable externref values. Previously the fuzzer did not support this and crashed with an assertion failure. Fix the assertion failure by instead returning a null cast to non-null, which will trap at runtime but at least produce a valid module.
* Restore the `extern` heap type (#4898) | Thomas Lively | 2022-08-17 | 2 | -6/+30
  The GC proposal has split `any` and `extern` back into two separate types, so reintroduce `HeapType::ext` to represent `extern`. Before it was originally removed in #4633, externref was a subtype of anyref, but now it is not. Now that we have separate heaptype type hierarchies, make `HeapType::getLeastUpperBound` fallible as well.
* Multi-Memories Support in IR (#4811) | Ashley Nelson | 2022-08-17 | 1 | -53/+118
  This PR removes the single memory restriction in IR, adding support for a single module to reference multiple memories. To support this change, a new memory name field was added to 13 memory instructions in order to identify the memory for the instruction.
  It is a goal of this PR to maintain backwards compatibility with existing text and binary wasm modules, so memory indexes remain optional for memory instructions. Similarly, the JS API makes assumptions about which memory is intended when only one memory is present in the module. Another goal of this PR is that existing tests' behavior be unaffected. That said, tests must now explicitly define a memory before invoking memory instructions or exporting a memory, and memory names are now printed for each memory instruction in the text format.
  There remain quite a few places where a hardcoded reference to the first memory persists (memory flattening, for example, will return early if more than one memory is present in the module). Many of these call-sites, particularly within passes, will require us to rethink how the optimization works in a multi-memories world. Other call-sites may necessitate more invasive code restructuring to fully convert away from relying on a globally available, single memory pointer.
* Remove RTTs (#4848) | Thomas Lively | 2022-08-05 | 3 | -45/+9
  RTTs were removed from the GC spec and if they are added back in in the future, they will be heap types rather than value types as in our implementation. Updating our implementation to have RTTs be heap types would have been more work than deleting them for questionable benefit since we don't know how long it will be before they are specced again.
* Remove basic reference types (#4802) | Thomas Lively | 2022-07-20 | 2 | -68/+16
  Basic reference types like `Type::funcref`, `Type::anyref`, etc. made it easy to accidentally forget to handle reference types with the same basic HeapTypes but the opposite nullability. In principle there is nothing special about the types with shorthands except in the binary and text formats. Removing these shorthands from the internal type representation by removing all basic reference types makes some code more complicated locally, but simplifies code globally and encourages properly handling both nullable and non-nullable reference types.
* [Strings] Add string proposal types (#4755) | Alon Zakai | 2022-06-29 | 2 | -0/+20
  This starts to implement the Wasm Strings proposal:
  https://github.com/WebAssembly/stringref/blob/main/proposals/stringref/Overview.md
  This just adds the types.
* First class Data Segments (#4733) | Ashley Nelson | 2022-06-21 | 1 | -18/+22
  * Updating wasm.h/cpp for DataSegments
  * Updating wasm-binary.h/cpp for DataSegments
  * Removed link from Memory to DataSegments and updated module-utils, Metrics and wasm-traversal
  * checking isPassive when copying data segments to know whether to construct the data segment with an offset or not
  * Removing memory member var from DataSegment class as there is only one memory rn. Updated wasm-validator.cpp
  * Updated wasm-interpreter
  * First look at updating Passes
  * Updated wasm-s-parser
  * Updated files in src/ir
  * Updating tools files
  * Last pass on src files before building
  * added visitDataSegment
  * Fixing build errors
  * Data segments need a name
  * fixing var name
  * ran clang-format
  * Ensuring a name on DataSegment
  * Ensuring more datasegments have names
  * Adding explicit name support
  * Fix fuzzing name
  * Outputting data name in wasm binary only if explicit
  * Checking temp dataSegments vector to validateBinary because it's the one with the segments before we processNames
  * Pass on when data segment names are explicitly set
  * Ran auto_update_tests.py and check.py, success all around
  * Removed an errant semi-colon and corrected a counter. Everything still passes
  * Linting
  * Fixing processing memory names after parsed from binary
  * Updating the test from the last fix
  * Correcting error comment
  * Impl kripken@ comments
  * Impl tlively@ comments
  * Updated tests that remove data print when == 0
  * Ran clang format
  * Impl tlively@ comments
  * Ran clang-format
* Fuzzer: Add support for creating structs and arrays in makeConst (#4707) | Alon Zakai | 2022-06-01 | 1 | -8/+20
  #4659 adds a testcase with an import of (ref $struct). This could cause an error in the fuzzer, since it wants to remove imports (because the various fuzzers cannot pass in custom imports - they want to just run the wasm). When it tries to remove that import it tries to create a constant for a struct reference, and fails. To fix that, add enough support to create structs and arrays at least in the simple case where all their fields are defaultable.
* Fuzzer: Refactor makeConst into separate functions [NFC] (#4709) | Alon Zakai | 2022-06-01 | 1 | -85/+107
  This just moves code around + adds assertions.
* Validator: Check features for ref.null's type (#4677) | Alon Zakai | 2022-05-18 | 1 | -0/+2
* [GC Fuzzing] Avoid non-nullable eqref without GC (#4675) | Alon Zakai | 2022-05-18 | 1 | -2/+22
  With only reference types but not GC, we cannot easily create a constant for eqref for example. Only GC adds i31.new etc. To avoid assertions in the fuzzer, avoid randomly picking (ref eq) etc., that is, keep it nullable so that we can emit a (ref.null eq) if we need a constant value of that type.
* [Fuzzer] Reduce trap probability in function ref fallback code (#4653) | Alon Zakai | 2022-05-16 | 1 | -10/+15
  Also improve comments.
  As suggested in #4647
* [Fuzzer] Fix another reference types vs gc types issue (#4647) | Alon Zakai | 2022-05-06 | 1 | -36/+37
  Diff without whitespace is smaller.
  We can't emit HeapType::data without GC. Switching to func to fix that uncovered another problem: makeRefFuncConst had a TODO to handle the case where we need a function to refer to but have created none yet. In fact, code for that TODO already existed at the end of the function, but the logic before it did not reach it. Fix up the logic in between to actually get there.
* Fix fuzzer's choosing of reference types (#4642) | Alon Zakai | 2022-05-05 | 1 | -7/+18
  * Don't emit "i31" or "data" if GC is not enabled, as only the GC feature adds those.
  * Don't emit "any" without GC either. While it is allowed, fuzzer limitations prevent this atm (see details in comment - it's fixable).
* Remove externref (#4633) | Thomas Lively | 2022-05-04 | 2 | -30/+4
  Remove `Type::externref` and `HeapType::ext` and replace them with uses of anyref and any, respectively, now that we have unified these types in the GC proposal. For backwards compatibility, continue to parse `extern` and `externref` and maintain their relevant C API functions.
* [NominalFuzzing] Fix TranslateToFuzzReader::getSubType(Rtt) (#4604) | Alon Zakai | 2022-04-21 | 1 | -0/+6
  Randomly selecting a depth is ok for structural typing, but in nominal it must match the actual hierarchy of types.
* [SIMD] Make swizzle's opcode name consistent (NFC) (#4585) | Heejin Ahn | 2022-04-09 | 1 | -1/+1
  Other opcodes end with `Inxm` or `Fnxm` (where n and m are integers), while `i8x16.swizzle`'s opcode name doesn't have an `I` in it.
* Isorecursive type fuzzing (#4501) | Thomas Lively | 2022-02-04 | 1 | -25/+71
  Add support for isorecursive types to wasm-fuzz-types by generating recursion groups and ensuring that children types are only selected from candidates through the end of the current group. For non-isorecursive systems, treat all the types as belonging to a single group so that their behavior is unchanged.
  Also fix two small bugs found by the fuzzer: LUB calculation was taking the wrong path for isorecursive types and isorecursive validation was not handling basic heap types properly.
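The group-bounded child selection can be sketched as follows: carve the type indexes into consecutive recursion groups, then let each type reference any index below the end of its own group (earlier groups plus forward references within its group). With `isorecursive` false, every type falls into one group covering all types, mirroring the "single group" fallback above. `GeneratedTypes` and `generate` are hypothetical, and the group sizes (1 to 3) and child counts (0 to 3) are arbitrary choices for illustration, not wasm-fuzz-types' actual parameters.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical output: groupEnd[i] is one past the last index of type i's
// recursion group; children[i] lists the type indexes that type i references.
struct GeneratedTypes {
  std::vector<std::size_t> groupEnd;
  std::vector<std::vector<std::size_t>> children;
};

GeneratedTypes generate(std::size_t numTypes, std::mt19937& rng, bool isorecursive) {
  GeneratedTypes out;
  // Default: one big group ending at numTypes (the non-isorecursive case).
  out.groupEnd.assign(numTypes, numTypes);
  out.children.resize(numTypes);
  if (isorecursive) {
    // Carve the types into consecutive recursion groups of 1 to 3 types.
    std::uniform_int_distribution<std::size_t> groupSize(1, 3);
    for (std::size_t start = 0; start < numTypes;) {
      std::size_t end = std::min(numTypes, start + groupSize(rng));
      for (std::size_t i = start; i < end; ++i) out.groupEnd[i] = end;
      start = end;
    }
  }
  std::uniform_int_distribution<int> numChildren(0, 3);
  for (std::size_t i = 0; i < numTypes; ++i) {
    // Children come only from candidates through the end of i's own group.
    std::uniform_int_distribution<std::size_t> pick(0, out.groupEnd[i] - 1);
    int n = numChildren(rng);
    for (int c = 0; c < n; ++c) out.children[i].push_back(pick(rng));
  }
  return out;
}
```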
* [EH] Enable fuzzer with initial contents (#4409) | Heejin Ahn | 2022-01-04 | 1 | -2/+6
  This enables fuzzing EH with initial contents. fuzzing.cpp/h does not yet support generation of EH instructions, but with this we can still fuzz EH based on initial contents. The fuzzer ran successfully for more than 1,900,000 iterations, with my local modification that always enables EH and lets the fuzzer select only EH tests for its initial contents.
* [Fuzzer] Allow empty data in --translate-to-fuzz (#4406) | Heejin Ahn | 2021-12-28 | 1 | -2/+2
  When a parameter and a member variable have the same name within a constructor, to access (and change) the member variable, we need to either use `this->` or change the name of the parameter. The current code ended up changing the parameter and didn't affect the status of the member variable, which remained empty.
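The shadowing pitfall described above can be reproduced in a few lines of standard C++. `BuggyReader` and `FixedReader` are hypothetical, and the fallback of appending one zero byte for empty input is an assumption made for illustration, not necessarily what the fuzzer does.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical types illustrating the shadowing bug; not the fuzzer's code.
struct BuggyReader {
  std::vector<char> bytes;
  explicit BuggyReader(std::vector<char> bytes) : bytes(std::move(bytes)) {
    // Bug: in the body, `bytes` names the (moved-from) parameter, so the
    // fallback byte goes into the parameter and the member stays empty.
    if (bytes.empty()) {
      bytes.push_back(0);
    }
  }
};

struct FixedReader {
  std::vector<char> bytes;
  explicit FixedReader(std::vector<char> bytes) : bytes(std::move(bytes)) {
    // Fix: qualify with this-> so the member is checked and updated.
    if (this->bytes.empty()) {
      this->bytes.push_back(0);
    }
  }
};
```

In the member initializer list, `bytes(std::move(bytes))` is unambiguous (member on the left, parameter inside the parentheses), which is why the bug only shows up in the constructor body. Renaming the parameter would work just as well as `this->`.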
* Change from storing Signature to HeapType on CallIndirect (#4352) | Thomas Lively | 2021-11-22 | 1 | -1/+1
  With nominal function types, this change makes it so that we preserve the identity of the function type used with call_indirect instructions rather than recreating a function heap type, which may or may not be the same as the originally parsed heap type, from the function signature during module writing.
  This will simplify the type system implementation by removing the need to store a "canonical" nominal heap type for each unique signature. We previously depended on those canonical types to avoid creating multiple duplicate function types during module writing, but now we aren't creating any new function types at all.
* Check for correct subtyping in the type fuzzer (#4350) | Thomas Lively | 2021-11-20 | 2 | -84/+94
  Check that types that were meant to have a subtype relationship actually do. To expose the intended subtyping to the fuzzer, expose `subtypeIndices` in the return value of the type generation function.
* Allow building basic HeapTypes in nominal mode (#4346) | Thomas Lively | 2021-11-19 | 1 | -19/+11
  As we work toward allowing nominal and structural types to coexist, any difference in how they can be built or used will be an inconvenient footgun that we will have to work around. In the spirit of reducing the differences between the type systems, allow TypeBuilder to construct basic HeapTypes in nominal mode just as it can in equirecursive mode.
  Although this change is a net increase in code complexity for not much benefit (wasm-opt never needs to build basic HeapTypes), it is also an incremental step toward getting rid of separate type system modes, so I expect it to simplify other PRs in the near future.
  This change also uncovered a bug in how the type fuzzer generated subtypes of basic HeapTypes. The generated subtypes did not necessarily have the intended `Kind`, which caused failures in nominal subtype validation in the fuzzer.
* Small cleanups in type fuzzer (#4337) | Thomas Lively | 2021-11-17 | 1 | -18/+12
  - Do not require defaultable types in function returns
  - Increase likelihood of `none` function return types
  - Correctly generate subtypes of basic types
  - Actually check output in tests
  - Print to cout instead of cerr
* Add a fuzzer specifically for types (#4328) | Thomas Lively | 2021-11-15 | 6 | -39/+714
  Add a new fuzzer binary that repeatedly generates random types to find bugs in the type system implementation. Each iteration creates some number of root types followed by some number of subtypes thereof. Each built type can contain arbitrary references to other built types, regardless of their order of construction.
  Right now the fuzzer only finds fatal errors in type building (and in its own implementation), but it is meant to be extended to check other properties in the future, such as that LUB calculations work as expected.
  The logic for creating types is also intended to be integrated into the main fuzzer in a follow-on PR so that the main fuzzer can fuzz with arbitrarily more interesting GC types.
* Fuzz more basic GC types (#4303) | Thomas Lively | 2021-11-04 | 1 | -116/+244
  Generate both nullable and non-nullable references to basic HeapTypes and introduce `i31` and `data` HeapTypes. Generate subtypes rather than exact types for all concrete-typed children.
* [NFC] Factor fuzzer randomness into a separate utility (#4304) | Thomas Lively | 2021-11-04 | 3 | -59/+148
  In preparation for using it from a separate file specifically for generating random HeapTypes that has no need to depend on all of fuzzing.h.
* [NFC] Create a .cpp file for fuzzer implementation (#4279) | Thomas Lively | 2021-10-26 | 1 | -0/+3024
  Having a monolithic header file containing all the implementation meant there was no good way to split up the code or introduce new files. The new implementation file and source directory will make it much easier to add new fuzzing functionality in new files.