summaryrefslogtreecommitdiff
path: root/src/wasm
Commit message (Collapse)AuthorAgeFilesLines
* [Parser] Token classification (#4699)Thomas Lively2022-06-011-22/+151
| | | | | | | | | | | | Add methods to `Token` for determining whether the token can be interpreted as a particular token type, returning the interpreted value as appropriate. These methods perform additional bounds checks for integers and NaN payloads that could not be done during the initial lexing because the lexer did not know what the intended token type was. The float methods also reinterpret integer tokens as floating point tokens since the float grammar is a superset of the integer grammar and inject the NaN payloads into parsed NaN values. Move all bounds checking to these new classifier functions to have it in one place.
* [NFC] Refactor EHUtils::findPops() method (#4704)Alon Zakai2022-06-011-25/+2
| | | | This moves it out of the validator so it can be used elsewhere. It will be used in #4685
* wasm-emscripten-finalize: Improve detection of mainReadsParams (#4701)Sam Clegg2022-05-311-6/+10
| | | | | | The first way to should detect this is if the main function actually doesn't take any params. They we fallback to looking deeper. In preparation for https://reviews.llvm.org/D75277
* Remove renameMainArgcArgv from wasm-emscripten-finalize (#4700)Sam Clegg2022-05-311-12/+0
| | | | | | | | | This part to finalize is currently not used and was added in preparation for https://reviews.llvm.org/D75277. However, the better solution to dealing with this alternative name for main is on the emscripten side. The main reason for this is that doing the rename here in binaryen would require finalize to always re-write the binary, which is expensive.
* [Parser] Replace Signedness with ternary Sign (#4698)Thomas Lively2022-05-271-23/+21
| | | | | | | | Previously we were tracking whether integer tokens were signed but we did not differentiate between positive and negative signs. Unfortunately, without differentiating them, there's no way to tell the difference between an in-bounds negative integer and a wildly out-of-bounds positive integer when trying to perform bounds checks for s32 tokens. Fix the problem by tracking not only whether there is a sign on an integer token, but also what the sign is.
* [Parser][NFC] Improve comments about default NaN payloads (#4697)Thomas Lively2022-05-271-0/+5
|
* [Parser][NFC] Remove extraneous braces from std::optional returns (#4696)Thomas Lively2022-05-271-9/+9
|
* [Parser][NFC] Create a public wat-lexer.h header (#4695)Thomas Lively2022-05-272-241/+101
| | | | | | wat-parser-internal.h was already quite large after implementing just the lexer, so it made sense to rename it to be lexer-specific and start a new file for the higher-level parser. Also make it a proper .cpp file and split the testable interface out into wat-lexer.h.
* [EH] Export tags (#4691)Heejin Ahn2022-05-262-1/+2
| | | | | | | | | | | | | | | | | This adds exported tags to `exports` section in wasm-emscripten-finalize metadata so Emscripten can use it. Also fixes a bug in the parser. We have only recognized the export format of ```wasm (tag $e2 (param f32)) (export "e2" (tag $e2)) ``` and ignored this format: ```wasm (tag $e1 (export "e1") (param i32)) ``` Companion patch: https://github.com/emscripten-core/emscripten/pull/17064
* [Parser][NFC] Clarify escaped string lexing (#4694)Thomas Lively2022-05-261-24/+27
| | | | Improve comments and variable names to make it clear that we allocate and build a separate string only when necessary to handle escape sequences.
* [Parser] Lex floating point values (#4693)Thomas Lively2022-05-261-25/+231
| | | | | | | | | | | | | | | Rather than trying to actually implement the parsing of float values, which cannot be done naively due to precision concerns, just parse the float grammar then postprocess the parsed text into a form we can pass to `strtod` to do the actual parsing of the value. Since the float grammar reuses `num` and `hexnum` from the integer grammar but does not care about overflow, add a mode to `LexIntCtx`, `num`, and `hexnum` to allow parsing overflowing numbers. For NaNs, store the payload as a separate value rather than as part of the parsed double. The payload will be injected into the NaN at a higher level of the parser once we know whether we are parsing an f64 or an f32 and therefore know what the allowable payload values are.
* [Parser] Lex keywords (#4688)Thomas Lively2022-05-251-1/+36
| | | | | Also include reserved words that look like keywords to avoid having to find and enumerate all the valid keywords. Invalid keywords will be rejected at a higher level in the parser instead.
* [Parser] Lex strings (#4687)Thomas Lively2022-05-251-1/+164
|
* [Parser] Lex idchar and identifiers (#4686)Thomas Lively2022-05-251-3/+82
|
* [Wasm GC] Fix CFG traversal of call_ref and add missing validation check (#4690)Alon Zakai2022-05-251-0/+32
| | | | | | | | We were missing CallRef in the CFG traversal code in a place where we note possible exceptions. As a result we thought CallRef cannot throw, and were missing some control flow edges. To actually detect the problem, we need to validate non-nullable locals properly, which we were not doing. This adds that as well.
* [Parser] Start a new text format parser (#4680)Thomas Lively2022-05-241-0/+557
| | | | | | | | | | | | | | | | | | | | | | | | | Begin implementing a new text format parser that will accept the standard text format. Start with a lexer that can iterate over tokens in an underlying text buffer. The initial supported tokens are integers, parentheses, and whitespace including comments. The implementation is in a new private internal header so it can be included into a gtest source file even though it is not meant to be a public API. Once the parser is more complete, there will be an additional public header exposing a more concise public API and the private header will be included into a source file that implements that public API. The new parser will improve on the existing text format parser not only because it will accept the full standard text format, but also because its code will be simpler and easier to maintain and because it will hopefully be faster as well. The new parser will be built out of small functions that closely mirror the grammar productions given in the spec and will heavily use C++17 features like string_view, optional, and variant to provide more self-documenting and efficient code. Future PRs will add support for lexing other kinds of tokens followed by support for parsing more complex constructs.
* Fix binary parsing of the prototype nominal format (#4679)Thomas Lively2022-05-191-3/+0
| | | | | | We were checking that nominal modules only had a single element in their type sections, but that's not correct for the prototype nominal binary format we still want to support. The test for this missed catching the bug because it wasn't actually parsing in nominal mode.
* [NFC] Deduplicate imported global parsing (#4678)Thomas Lively2022-05-191-23/+7
| | | | | | | | | | Share the logic for parsing imported and non-imported globals of the formats: (import "module" "base" (global $name? type)) (global $name? type init) This fixes #4676, since the deleted logic for parsing imported globals did not handle parsing GC types correctly.
* Validator: Check features for ref.null's type (#4677)Alon Zakai2022-05-181-0/+5
|
* Allow TypeBuilder::grow to take 0 as an argument (#4668)Thomas Lively2022-05-161-1/+1
| | | | | | | There's no reason not to allow growing by zero slots, but previously doing so would trigger an assertion. This caused a crash when roundtripping a trivial module. Fixes #4667.
* [NFC] Make Literal::makeNull take a HeapType (#4664)Alon Zakai2022-05-131-1/+1
| | | | | | | | Taking a Type is redundant as we only care about the heap type - the nullability must be Nullable. This avoids needing an assertion in the function, that is, it makes the API more type-safe.
* Add ref.cast_nop_static (#4656)Thomas Lively2022-05-113-4/+17
| | | | | | This unsafe experimental instruction is semantically equivalent to ref.cast_static, but V8 will unsafely turn it into a nop. This is meant to help us measure cast overhead more precisely than we can by globally turning all casts into nops.
* Parse the prototype nominal binary format (#4644)Thomas Lively2022-05-041-9/+40
| | | | | | In f124a11ca3 we removed support for the prototype nominal binary format entirely, but that means that we can no longer parse older binary modules that used that format. Fix this regression by restoring the ability to parse the prototype binary format.
* Remove externref (#4633)Thomas Lively2022-05-046-62/+10
| | | | | | Remove `Type::externref` and `HeapType::ext` and replace them with uses of anyref and any, respectively, now that we have unified these types in the GC proposal. For backwards compatibility, continue to parse `extern` and `externref` and maintain their relevant C API functions.
* Update the type section binary format (#4625)Thomas Lively2022-05-021-64/+62
| | | | | | | | | | Print subtype declarations using the standards-track format with a vector of supertypes followed by a normal type declaration rather than our interim nominal format that used alternative versions of the func, struct, and array forms. Desugar the nominal format to additionally emit all the types into a single large recursion group. Currently V8 is performing this desugaring, but after this change and a future change that fixes the order of nominal types to ensure supertypes precede subtypes, it will no longer need to.
* [NominalFuzzing] Add a validation error on ref.cast's etc. intended type (#4606)Alon Zakai2022-04-211-0/+7
|
* Implement relaxed SIMD dot product instructions (#4586)Thomas Lively2022-04-114-8/+56
| | | As proposed in https://github.com/WebAssembly/relaxed-simd/issues/52.
* [SIMD] Make swizzle's opcode name consistent (NFC) (#4585)Heejin Ahn2022-04-093-6/+6
| | | | Other opcode ends with `Inxm` or `Fnxm` (where n and m are integers), while `i8x16.swizzle`'s opcode name doesn't have an `I` in there.
* Implement i16x8.relaxed_q15mulr_s (#4583)Thomas Lively2022-04-073-3/+12
| | | As proposed in https://github.com/WebAssembly/relaxed-simd/issues/40.
* [Wasm GC] Fix stacky non-nullable tuples (#4561)Alon Zakai2022-03-311-7/+8
| | | | | #4555 fixed validation for such tuples, but we also did not handle them in "stacky" code using pops etc., due to a logic bug in the binary reading code.
* [Wasm GC] Fix non-nullable tuples (#4555)Alon Zakai2022-03-302-4/+16
| | | | | | Apply the same logic to tuple fields as we do for all other fields, when checking whether a non-nullable value is valid. Fixes #4554
* Remove nearly unused {Heap}Type::isCompound (#4541)Thomas Lively2022-03-221-2/+2
|
* Add support for extended-const proposal (#4529)Sam Clegg2022-03-193-9/+25
| | | See https://github.com/WebAssembly/extended-const
* Include type names in error messages from building (#4517)Thomas Lively2022-02-181-1/+10
| | | | Instead of just reporting the type index that causes an error when building types, report the name of the responsible type when parsing the text format.
* Eagerly canonicalize basic types (#4507)Thomas Lively2022-02-081-8/+21
| | | | | | | | | We were already eagerly canonicalizing basic HeapTypes when building types so the more complicated canonicalization algorithms would not have to handle noncanonical heap types, but we were not doing the same for Types. Equirecursive canonicalization was properly handling noncanonical Types everywhere, but isorecursive canonicalization was not. Rather than update isorecursive canonicalization in multiple places, remove the special handling from equirecursive canonicalization and canonicalize types in a single location.
* Generate heap type names when printing types (#4503)Thomas Lively2022-02-071-91/+138
| | | | | | | | | | | | | | | | | | | | | | The previous printing system in the Types API would print the full recursive structure of a Type or HeapType with special markers using de Bruijn indices to avoid infinite recursion and a separate special marker for when the size exceeded an arbitrary upper limit. In practice, the types printed by that system were not human readable, so all that complexity was not useful. Replace that system with a new system that always emits a HeapType name rather than recursing into the structure of inner HeapTypes. Add methods for printing Types and HeapTypes with custom HeapType name generators. Also add a new wasm-type-printing.h header with off-the-shelf type name generators that implement simple naming schemes sufficient for tests and the type fuzzer. Note that these new printing methods and the old printing methods they augment are not used for emitting text modules. Printing types as part of expressions and modules is handled by separate code in Print.cpp and the printing API modified in this PR is mostly used for debugging. However, the new printing methods are general enough that Print.cpp should be able to use them as well, so update the format used to print types in the modified printing system to match the text format in anticipation of making that change in a follow-up PR.
* Validate supertypes after isorecursive canonicalization (#4506)Thomas Lively2022-02-071-13/+22
| | | | | | | | | | | | | | Validating nominally declared supertypes depends on being able to test type equality, which in turn depends on the compared types having already been canonicalized. Previously we validated supertypes before canonicalization, so validation would fail in cases where it should have succeeded. Fix the bug by canonicalizing first. This means that the global type store can now end up holding invalid types, but those types will never be exposed to the user, so that's not a huge problem. Also fix an unrelated bug that was preventing the test from passing, which is that supertypes were being hashed and compared without the special logic for detecting self-references. This bug preventing the equivalent recursion groups in the test from being deduplicated.
* Isorecursive type fuzzing (#4501)Thomas Lively2022-02-041-1/+5
| | | | | | | | | | Add support for isorecursive types to wasm-fuzz-types by generating recursion groups and ensuring that children types are only selected from candidates through the end of the current group. For non-isorecursive systems, treat all the types as belonging to a single group so that their behavior is unchanged. Also fix two small bugs found by the fuzzer: LUB calculation was taking the wrong path for isorecursive types and isorecursive validation was not handling basic heap types properly.
* Isorecursive binary format (#4494)Thomas Lively2022-02-031-16/+51
| | | | | | | | | | | Write and parse recursion groups in binary type sections. Unlike in the text format, where we ignore recursion groups when not using isorecursive types, do not allow parsing binary recursion group when using other type systems. Doing so would produce incorrect results because recursions groups only count as single entries in the type system vector so we dynamically grow the TypeBuilder when we encounter them. That would change the mapping of later indices to types, and would change the meaning of previous type definitions that use those later indices. This is not a problem in the isorecursive system because in that system type definitions are not allowed to use later indices.
* Fix an assertion in the validator on call_ref heaptypes (#4496)Alon Zakai2022-02-031-4/+5
| | | | We can only call getHeapType if it is indeed a function type. Otherwise we should show the error and move on.
* Topological sorting of types in isorecursive output (#4492)Thomas Lively2022-02-021-34/+36
| | | | | | | | | | | | | | | | | | | Generally we try to order types by decreasing use count so that frequently used types get smaller indices. For the equirecursive and nominal systems, there are no contraints on the ordering of types, so we just have to sort them according to their use counts. For the isorecursive type system, however, there are a number of ordering constraints that have to be met for the type section to be valid. First, types in the same recursion group must be adjacent so they can be grouped together. Second, groups must be ordered topologically so that they only refer to types in themselves or prior groups. Update type ordering to produce a valid isorecursive output by performing a topological sort on the recursion groups. While performing the sort, prefer to visit and finish processing the most used groups first as a heuristic to improve the final ordering. Do not reorder types within groups, since doing so would change type identity and could affect the external interface of the module. Leave that reordering to an optimization pass (not yet implemented) that users can explicitly opt in to.
* Isorecursive HeapType constructors (#4482)Thomas Lively2022-01-291-15/+65
| | | | | Update the HeapType constructors that take Signature, Structs, and Arrays to work properly with isorecursive typing. This is particularly important for the Signature constructor, which is used frequently throughout the code base.
* Isorecursive canonicalization (#4481)Thomas Lively2022-01-291-20/+409
| | | | | | | | | | | | | | | | | | | | | | | | | Isorecursive canonicalization is similar to equirecursive canonicalization in that it deduplicates types based on their structure, but unlike equirecursive canonicalization, isorecursive canonicalization does not need to minimize the type definition first. Another difference is that structures are deduplicated at the level of recursion groups rather than individual types, so we cannot reuse the existing `FiniteShapeHasher` and `FiniteShapeEquator` utilities. Instead, introduce a new `RecGroupStructure` wrapper that provides equality comparison and hashing very similar to the former utilities. Another feature of the isorecursive type system is that its constraints on the order of recursion groups means that they are already topologically sorted and can be incrementally canonicalized in a bottom-up manner. This incremental canonicalization means that the `RecGroupStructure` utility can assume that all child `HeapTypes` have already been canonicalized and can avoid traversing anything but the top-level type definitions in the wrapped recursion group. The only exception is self-references into the wrapped recursion group itself, which may not be canonical. That special case is detected and handled without any nontrivial extra work. Overall, the canonicalization algorithm traverses each type definition once to canonicalize its `HeapType` use sites, then once to hash its recursion group structure, then finally once to canonicalize its `Type` use sites in `globallyCanonicalize`, giving only linear time complexity.
* Add a HeapType method for getting the rec group index (#4480)Thomas Lively2022-01-271-0/+7
| | | | | Storing the rec group index on the HeapTypeInfo avoids having to do a linear scan through the rec group to find the index for a particular type. This will be important for isorecursive canonicalization, which uses rec group indices.
* [NFC] Split shallow and deep updating of CanonicalizationState (#4478)Thomas Lively2022-01-261-19/+41
| | | | | | | Since isorecursive canonicalization will happen incrementally in a bottom-up manner, it will be more efficient if can more precisely control which types get updated at each step. Refactor the `CanonicalizationState` interface to separately expose shallow updating, which updates only the top-level state, and deep updating of the use sites within HeapTypeInfos.
* Isorecursive type validation (#4475)Thomas Lively2022-01-261-43/+102
| | | | | | | When building isorecursive types, validate their relationships according to the rules described in https://github.com/WebAssembly/gc/pull/243. Specifically, supertypes must be declared before their subtypes to statically prevent cycles and child types must be declared either before or in the same recursion group as their parents.
* Make `TypeBuilder::build()` fallible (#4474)Thomas Lively2022-01-253-24/+51
| | | | | | | | | | | It is possible for type building to fail, for example if the declared nominal supertypes form a cycle or are structurally invalid. Previously we would report a fatal error and kill the program from inside `TypeBuilder::build()` in these situations, but this handles errors at the wrong layer of the code base and is inconvenient for testing the error cases. In preparation for testing the new error cases introduced by isorecursive typing, make type building fallible and add new tests for existing error cases. Also fix supertype cycle detection, which it turns out did not work correctly.
* Parse, create, and print isorecursive recursion groups (#4464)Thomas Lively2022-01-212-16/+144
| | | | | | | | | | | | | In `--hybrid` isorecursive mode, associate each defined type with a recursion group, represented as a `(rec ...)` wrapping the type definitions in the text format. Parse that text format, create the rec groups using a new TypeBuilder method, and print the rec groups in the printer. The only semantic difference rec groups currently make is that if one type in a rec group will be included in the output, all the types in that rec group will be included. This is because changing a rec group in any way (for example by removing a type) changes the identity of the types in that group in the isorecursive type system. Notably, rec groups do not yet participate in validation, so `--hybrid` is largely equivalent to `--nominal` for now.
* Create `ParentIndexIterator` to reduce iterator boilerplate (#4469)Thomas Lively2022-01-211-13/+4
| | | | | | | Add a utility class for defining all the common operations like pre- and post- increment and decrement, addition and subtraction, and assigning addition and subtraction for iterators that are comprised of a parent object and an index into that parent object. Use the new utility to reduce the boilerplate in wasm-type.h. Add a new test of the iterator behavior.
* Reset global type state between tests (#4468)Thomas Lively2022-01-201-0/+13
| | | | | Add a `destroyAllTypes` function to clear the global state of the type system and use it in a custom gtest test fixture to ensure that each test starts and ends with a fresh state.