path: root/src/wasm
Commit message | Author | Age | Files | Lines
* [Wasm GC] Properly represent nulls in i31 (#4819)Alon Zakai2022-07-251-6/+6
| | | | | The encoding here is simple: we store i31 values in the literal.i32 field. The top bit says if a value exists, which means literal.i32 == 0 is the same as null.
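The encoding described above can be sketched as follows. This is a minimal illustration, not Binaryen's actual code: the helper names (`encodeI31`, `isNullI31`, `getI31Unsigned`, `getI31Signed`) and the constants are hypothetical, and only the rule "top bit means a value exists, so an all-zero word is null" comes from the commit message.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical helpers for illustration; these are not Binaryen's API.
constexpr uint32_t kPresentBit = 0x80000000u;  // top bit: "a value exists"
constexpr uint32_t kPayloadMask = 0x7fffffffu; // low 31 bits: the i31 payload

// Pack a 31-bit payload and mark it as present.
inline uint32_t encodeI31(uint32_t payload) {
  assert((payload & kPresentBit) == 0 && "payload must fit in 31 bits");
  return (payload & kPayloadMask) | kPresentBit;
}

// A word with no bits set has no present bit, so it stands for null.
inline bool isNullI31(uint32_t bits) { return bits == 0; }

// Unsigned read: the payload, or nullopt for null.
inline std::optional<uint32_t> getI31Unsigned(uint32_t bits) {
  if (isNullI31(bits)) {
    return std::nullopt;
  }
  return bits & kPayloadMask;
}

// Signed read: sign-extend the payload from bit 30.
inline std::optional<int32_t> getI31Signed(uint32_t bits) {
  if (isNullI31(bits)) {
    return std::nullopt;
  }
  uint32_t payload = bits & kPayloadMask;
  if (payload & 0x40000000u) {
    payload |= kPresentBit; // copy bit 30 into bit 31
  }
  return static_cast<int32_t>(payload);
}
```

The point of the present bit is that an i31 holding the number 0 is stored as `0x80000000`, which stays distinguishable from the null word `0`.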
* [Strings] GC variants for string.encode (#4817)Alon Zakai2022-07-214-3/+45
|
* Remove basic reference types (#4802)Thomas Lively2022-07-208-240/+106
| | | | | | | | | Basic reference types like `Type::funcref`, `Type::anyref`, etc. made it easy to accidentally forget to handle reference types with the same basic HeapTypes but the opposite nullability. In principle there is nothing special about the types with shorthands except in the binary and text formats. Removing these shorthands from the internal type representation by removing all basic reference types makes some code more complicated locally, but simplifies code globally and encourages properly handling both nullable and non-nullable reference types.
* [Strings] Add string.new GC variants (#4813)Alon Zakai2022-07-194-4/+53
|
* [Strings] stringview_wtf16.length (#4809)Alon Zakai2022-07-182-0/+5
| | | | This measures the length of a view, so it seems simplest to make it a sub-operation of the existing measure instruction.
* [Strings] stringview_*.slice (#4805)Alon Zakai2022-07-155-0/+97
| | | | | | | Unfortunately one slice is the same as Python's [start:end], using 2 params, and the other slice takes one param, [CURR:CURR+num] (where CURR is implied by the current state in the iter). So we can't use a single class here. Perhaps a different name would be good, like slice vs substring (like JS does), but I picked names to match the current spec.
* [Strings] stringview access operations (#4798)Alon Zakai2022-07-135-0/+173
|
* [Parser][NFC] Refactor to use context callbacks (#4799)Thomas Lively2022-07-121-442/+514
| | | | | | | | | | | | | | | | | The parser functions previously both parsed the input and controlled what was done with the results using `constexpr` if-else chains. As the number of parsing contexts grew, these if-else chains became increasingly complex and distracting from the core parsing logic of the parsing functions. To simplify the code, refactor the parsing functions to replace the `constexpr` if-else chains with unconditional calls to methods on the context. To avoid duplicating most method definitions for multiple parsing contexts, introduce new utility contexts that implement common methods and (ab)use inheritance and multiple inheritance to reuse their methods from the main parsing contexts. This change will also make it easier to reuse the parser code for entirely different purposes in the future by providing new context implementations. For example, V8 could reuse the code and provide different parser contexts that construct V8-internal data structures rather than Binaryen data structures.
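The refactoring pattern described above (one parsing function, with behavior supplied by the context) might look roughly like this. It is a toy sketch under stated assumptions: `CountingCtx`, `CollectingCtx`, `parseNames`, and the space-separated "grammar" are all hypothetical and far simpler than the real parser contexts.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical contexts; names and the toy grammar are illustrative only.

// An early-phase context that only counts declarations.
struct CountingCtx {
  size_t count = 0;
  void addName(std::string_view) { ++count; }
};

// A later-phase context that records the names themselves.
struct CollectingCtx {
  std::vector<std::string> names;
  void addName(std::string_view name) { names.emplace_back(name); }
};

// The parsing function contains only parsing logic. Instead of a
// `constexpr` if-else chain over context types, it unconditionally calls
// a method that every context provides.
template<typename Ctx> void parseNames(Ctx& ctx, std::string_view in) {
  size_t pos = 0;
  while (pos < in.size()) {
    size_t end = in.find(' ', pos);
    if (end == std::string_view::npos) {
      end = in.size();
    }
    if (end > pos) {
      ctx.addName(in.substr(pos, end - pos)); // skip empty tokens
    }
    pos = end + 1;
  }
}
```

Adding a new use of the parser then means writing a new context type with the same method names, without touching `parseNames`.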
* [Strings] string.as (#4797)Alon Zakai2022-07-125-0/+68
|
* [Parser] Start to parse instructions (#4789)Thomas Lively2022-07-111-28/+890
| | | | | | | | | | | | | | | | | | | | | Update gen-s-parser.py to produce a second version of its parsing code that works with the new wat parser. The new version automatically replaces the `s` element argument in the existing parser with the `ctx` and `in` arguments used by the new parser, so adding new instructions will not require any additional work in gen-s-parser.py after this change. Also add stub `make***` functions to the new wat parser, with a few filled out, namely `makeNop`, `makeUnreachable`, `makeConst`, and `makeRefNull`. Update the `global` parser to parse global initializer instructions and update wat-kitchen-sink.wast to demonstrate that the instructions are parsed correctly. Adding new instruction classes will require adding a new `make***` function to wat-parser.cpp in addition to wasm-s-parser.{h,cpp} after this change, but adding a trivial failing implementation is good enough for the time being, so I don't expect this to appreciably increase our maintenance burden in the near term. The infrastructure for parsing folded instructions, instructions with operands, and control flow instructions will be implemented in future PRs.
* [Parser] Parse rec groups (#4785)Thomas Lively2022-07-081-8/+42
|
* [Strings] string.is_usv_sequence (#4783)Alon Zakai2022-07-082-0/+5
| | | | | | | This implements it as a StringMeasure opcode. They do have the same number of operands, same trapping behavior, and same return type. They both get a string and do some inspection of it to return an i32. Perhaps the name could be StringInspect or something like that, rather than StringMeasure..? But I think for now this might be good enough, and the spec may change anyhow later.
* [Strings] string.eq (#4781)Alon Zakai2022-07-084-0/+30
|
* [Parser] Parse standard subtype declarations (#4778)Thomas Lively2022-07-081-49/+92
| | | | Parse type definitions with the format `(type $t (sub $super ...))`. Update the test to use hybrid types so that the subtypes are reflected in the test output.
* [Strings] string.concat (#4777)Alon Zakai2022-07-084-0/+31
|
* [Strings] string.encode (#4776)Alon Zakai2022-07-074-0/+74
|
* Group reference types in binary format. (#4774)Alon Zakai2022-07-071-0/+23
| | | | | | | | | | | | Grouping all references together makes it easier for baseline compilers to zero out memory (as the zeroing out may be different for MVP types vs. references). This puts all references together, either at the start or the end. As a heuristic for choosing between the two, we check whether the first local is a reference. As the optimizer will sort locals by frequency, this ensures that the most-frequent local stays in index 0. Fixes #4773; see that issue for more details.
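A minimal sketch of the grouping heuristic, assuming a simplified `Local` record and a hypothetical `groupReferences` function (neither is Binaryen's actual data structure): if the first local is a reference, references go first, otherwise they go last, and a stable partition keeps the relative order within each group.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for a function local.
struct Local {
  std::string type;
  bool isRef;
};

// Group all reference-typed locals together. Whichever kind the first
// local has is placed first, so the local at index 0 does not move.
inline void groupReferences(std::vector<Local>& locals) {
  if (locals.empty()) {
    return;
  }
  const bool refsFirst = locals.front().isRef;
  std::stable_partition(
    locals.begin(), locals.end(), [refsFirst](const Local& local) {
      return local.isRef == refsFirst;
    });
}
```

Because the partition is stable and keyed on the first local's kind, the most frequently used local (already sorted to index 0) keeps its cheap index.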
* [Strings] string.measure (#4775)Alon Zakai2022-07-074-6/+77
|
* [Strings] Add string.const (#4768)Alon Zakai2022-07-064-0/+113
| | | | | This is more work than a typical instruction because it also adds a new section: all the (string.const "foo") strings are put in a new "strings" section in the binary, and the instructions refer to them by index.
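The "strings live in a section, instructions refer to them by index" idea can be illustrated with a small table. This is a sketch only: `StringTable` and `intern` are hypothetical names, and deduplicating identical strings is an assumption of this example that the commit message does not state.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical table modeling a "strings" section.
struct StringTable {
  std::vector<std::string> strings;                  // contents, in section order
  std::unordered_map<std::string, uint32_t> indexes; // string -> section index

  // Return the index an instruction would use to refer to `s`, adding the
  // string to the table on first use (deduplication is an assumption here).
  uint32_t intern(const std::string& s) {
    auto it = indexes.find(s);
    if (it != indexes.end()) {
      return it->second;
    }
    uint32_t index = static_cast<uint32_t>(strings.size());
    strings.push_back(s);
    indexes.emplace(s, index);
    return index;
  }
};
```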
* [Strings] Add feature flag for Strings proposal (#4766)Alon Zakai2022-06-303-0/+10
|
* [Strings] Print shorthand types where possible (#4763)Alon Zakai2022-06-291-1/+17
|
* [Strings] Add string.new* instructions (#4761)Alon Zakai2022-06-294-0/+80
| | | | | | This is the first instruction from the Strings proposal. This includes everything but interpreter support.
* [Strings] Add string proposal types (#4755)Alon Zakai2022-06-294-0/+91
| | | | | | | | This starts to implement the Wasm Strings proposal https://github.com/WebAssembly/stringref/blob/main/proposals/stringref/Overview.md This just adds the types.
* [Parser] Parse struct and array types (#4745)Thomas Lively2022-06-221-13/+176
| | | | | | | | | Parse struct and array type definitions along with field names. Only the most basic definitions are parsed for now; subtype definitions (both nominal prototype and standard formats) and recursion groups are left to follow-on PRs. Since there is no official standard for the text format for GC type definitions, attempt to define a grammar that allows abbreviations that we already use widely, such as making `(field ... )` optional except for named fields.
* First class Data Segments (#4733)Ashley Nelson2022-06-215-84/+127
|   * Updating wasm.h/cpp for DataSegments
|   * Updating wasm-binary.h/cpp for DataSegments
|   * Removed link from Memory to DataSegments and updated module-utils, Metrics and wasm-traversal
|   * checking isPassive when copying data segments to know whether to construct the data segment with an offset or not
|   * Removing memory member var from DataSegment class as there is only one memory rn. Updated wasm-validator.cpp
|   * Updated wasm-interpreter
|   * First look at updating Passes
|   * Updated wasm-s-parser
|   * Updated files in src/ir
|   * Updating tools files
|   * Last pass on src files before building
|   * added visitDataSegment
|   * Fixing build errors
|   * Data segments need a name
|   * fixing var name
|   * ran clang-format
|   * Ensuring a name on DataSegment
|   * Ensuring more datasegments have names
|   * Adding explicit name support
|   * Fix fuzzing name
|   * Outputting data name in wasm binary only if explicit
|   * Checking temp dataSegments vector to validateBinary because it's the one with the segments before we processNames
|   * Pass on when data segment names are explicitly set
|   * Ran auto_update_tests.py and check.py, success all around
|   * Removed an errant semi-colon and corrected a counter. Everything still passes
|   * Linting
|   * Fixing processing memory names after parsed from binary
|   * Updating the test from the last fix
|   * Correcting error comment
|   * Impl kripken@ comments
|   * Impl tlively@ comments
|   * Updated tests that remove data print when == 0
|   * Ran clang format
|   * Impl tlively@ comments
|   * Ran clang-format
* Do not emit recursion groups without GC enabled (#4738)Thomas Lively2022-06-181-2/+7
| | | | | | | | We emit nominal types as a single large recursion group, but this produces invalid modules when --nominal or --hybrid was used without GC enabled. Fix the bug by always emitting types as though they were structural (i.e. without recursion groups) when GC is not enabled. Fixes #4723.
* Fix table exporting (#4736)Alon Zakai2022-06-171-1/+2
| | | | | | | This code was apparently not updated when we added multi-table support, and still had the old hardcoded index 0. Fixes #4711
* [Parser][NFC] Small code cleanups (#4729)Thomas Lively2022-06-143-8/+8
| | | | Apply cleanups suggested by aheejin in post-merge code review of previous parser PRs.
* Fix an unused variable warning (#4728)walkingeyerobot2022-06-141-0/+1
|
* [NFC] Optimize non-equirecursive LUB calculations (#4722)Thomas Lively2022-06-141-89/+148
| | | | | | | | | | | | | | | | | | | Equirecursive LUB calculations potentially require building new recursive heap types that did not already exist in the system, so they have a complicated code path that uses a TypeBuilder to construct a LUB from the ground up. In contrast, nominal and isorecursive LUB calculations never introduce new heap types, so computing their LUBs is much simpler. Previously we were using the same code path with the TypeBuilder for all type systems out of convenience, but this commit factors out the LUB calculations for nominal and isorecursive types into a separate code path that does not use a TypeBuilder. Not only should this make LUB calculations faster for GC workloads, it also avoids a mysterious race condition during parallel LUB calculations with isorecursive types that resulted in a temporary type escaping from one thread and being used-after-free from another thread. It would be good to fix that bug properly, but it is very difficult to investigate. Sweeping it under the rug instead is the best trade off for now. Fixes #4719.
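To see why a LUB that never creates new types is much simpler, consider a sketch where every type has at most one declared supertype. This is an illustration under that assumption, with hypothetical names (`TypeNode`, `depthOf`, `leastUpperBound`); it is not the code path this commit adds, and it ignores basic heap types and nullability.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical node in a declared-supertype chain.
struct TypeNode {
  const TypeNode* super = nullptr;
};

// Number of supertype links between `t` and the root of its chain.
inline size_t depthOf(const TypeNode* t) {
  size_t depth = 0;
  for (; t->super; t = t->super) {
    ++depth;
  }
  return depth;
}

// The deepest type that is an ancestor of (or equal to) both inputs, or
// nullptr if the two chains share no type. No new types are built.
inline const TypeNode* leastUpperBound(const TypeNode* a, const TypeNode* b) {
  size_t depthA = depthOf(a);
  size_t depthB = depthOf(b);
  // Raise the deeper type until both are at the same depth.
  for (; depthA > depthB; --depthA) {
    a = a->super;
  }
  for (; depthB > depthA; --depthB) {
    b = b->super;
  }
  // Walk up in lockstep; unrelated chains reach nullptr together.
  while (a != b) {
    a = a->super;
    b = b->super;
  }
  return a;
}
```

The result is always one of the existing nodes (or nothing), which is the property that lets the nominal and isorecursive paths skip the TypeBuilder.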
* [Parser] Parse function types (#4718)Thomas Lively2022-06-141-11/+369
| | | | | | Begin implementing the second phase of parsing, parsing of type definitions. Extend `valtype` to parse both user-defined and built-in ref types, add `type` as a top-level module field, and implement parsers for params, results, and functype definitions.
* [Parser] Begin parsing modules (#4716)Thomas Lively2022-06-103-4/+668
| | | | | | | | | | | Implement the basic infrastructure for the full WAT parser with just enough detail to parse basic modules that contain only imported globals. Parsing functions correspond to elements of the grammar in the text specification and are templatized over context types that correspond to each phase of parsing. Errors are explicitly propagated via `Result<T>` and `MaybeResult<T>` types. Follow-on PRs will implement additional phases of parsing and parsing for new elements in the grammar.
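Explicit error propagation through `Result<T>` and `MaybeResult<T>` could be sketched like this. The commit names the two types, but everything else here is an assumption for illustration: the `std::variant` representation, the `Err` and `None` structs, the accessor names, and the toy `maybeU32`/`u32` parsers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <string_view>
#include <utility>
#include <variant>

struct Err { std::string msg; }; // a parse error
struct None {};                  // "this production does not start here"

// Either a value or an error (assumed representation).
template<typename T> struct Result {
  std::variant<T, Err> val;
  Result(T v) : val(std::move(v)) {}
  Result(Err e) : val(std::move(e)) {}
  const Err* getErr() const { return std::get_if<Err>(&val); }
  const T& operator*() const { return std::get<T>(val); }
};

// A value, nothing, or an error (assumed representation).
template<typename T> struct MaybeResult {
  std::variant<T, None, Err> val;
  MaybeResult(T v) : val(std::move(v)) {}
  MaybeResult(None n) : val(n) {}
  MaybeResult(Err e) : val(std::move(e)) {}
  const Err* getErr() const { return std::get_if<Err>(&val); }
  explicit operator bool() const { return std::holds_alternative<T>(val); }
  const T& operator*() const { return std::get<T>(val); }
};

// Optional production: None if no digit is present, Err on overflow.
inline MaybeResult<uint32_t> maybeU32(std::string_view& in) {
  if (in.empty() || in[0] < '0' || in[0] > '9') {
    return None{};
  }
  uint64_t n = 0;
  size_t i = 0;
  for (; i < in.size() && in[i] >= '0' && in[i] <= '9'; ++i) {
    n = n * 10 + uint64_t(in[i] - '0');
    if (n > UINT32_MAX) {
      return Err{"u32 out of range"};
    }
  }
  in.remove_prefix(i); // consume the digits only on success
  return uint32_t(n);
}

// Required production: forwards errors and turns "nothing" into an error.
inline Result<uint32_t> u32(std::string_view& in) {
  auto result = maybeU32(in);
  if (const Err* err = result.getErr()) {
    return *err;
  }
  if (!result) {
    return Err{"expected u32"};
  }
  return *result;
}
```

The three-state `MaybeResult` lets a caller try alternatives in sequence without confusing "not this production" with "malformed input".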
* Update relaxed SIMD instructionsThomas Lively2022-06-073-17/+1
| | | | | Update the opcodes for all relaxed SIMD instructions and remove the unsigned dot product instructions that are no longer in the proposal.
* [Parser] Token classification (#4699)Thomas Lively2022-06-011-22/+151
| | | | | | | | | | | | Add methods to `Token` for determining whether the token can be interpreted as a particular token type, returning the interpreted value as appropriate. These methods perform additional bounds checks for integers and NaN payloads that could not be done during the initial lexing because the lexer did not know what the intended token type was. The float methods also reinterpret integer tokens as floating point tokens since the float grammar is a superset of the integer grammar and inject the NaN payloads into parsed NaN values. Move all bounds checking to these new classifier functions to have it in one place.
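As an illustration of the deferred NaN payload check, the sketch below validates a payload against the width of the target float type and then injects it into the NaN bit pattern. The function names (`makeNanF32`, `makeNanF64`, `bitsOfF32`) are hypothetical; the bounds follow the WebAssembly text format rule that a NaN payload must be nonzero and fit in the significand (23 bits for f32, 52 bits for f64).

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <optional>

// Build an f32 NaN with the given payload, or nullopt if the payload is
// zero (that would encode infinity) or does not fit in 23 bits.
inline std::optional<float> makeNanF32(bool negative, uint64_t payload) {
  constexpr uint64_t kMaxPayload = (1ull << 23) - 1;
  if (payload == 0 || payload > kMaxPayload) {
    return std::nullopt;
  }
  uint32_t bits = 0x7f800000u | static_cast<uint32_t>(payload);
  if (negative) {
    bits |= 0x80000000u;
  }
  float f;
  std::memcpy(&f, &bits, sizeof(f)); // reinterpret the bits as a float
  return f;
}

// Same check for f64, where the payload may use up to 52 bits.
inline std::optional<double> makeNanF64(bool negative, uint64_t payload) {
  constexpr uint64_t kMaxPayload = (1ull << 52) - 1;
  if (payload == 0 || payload > kMaxPayload) {
    return std::nullopt;
  }
  uint64_t bits = 0x7ff0000000000000ull | payload;
  if (negative) {
    bits |= 0x8000000000000000ull;
  }
  double d;
  std::memcpy(&d, &bits, sizeof(d));
  return d;
}

// Helper to inspect the bit pattern of a float.
inline uint32_t bitsOfF32(float f) {
  uint32_t bits;
  std::memcpy(&bits, &f, sizeof(bits));
  return bits;
}
```

The same payload can therefore be valid for an f64 token and out of bounds for an f32 token, which is why the check cannot happen while lexing.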
* [NFC] Refactor EHUtils::findPops() method (#4704)Alon Zakai2022-06-011-25/+2
| | | | This moves it out of the validator so it can be used elsewhere. It will be used in #4685
* wasm-emscripten-finalize: Improve detection of mainReadsParams (#4701)Sam Clegg2022-05-311-6/+10
| | | | | | The first way we should detect this is if the main function actually doesn't take any params. Then we fall back to looking deeper. In preparation for https://reviews.llvm.org/D75277
* Remove renameMainArgcArgv from wasm-emscripten-finalize (#4700)Sam Clegg2022-05-311-12/+0
| | | | | | | | | This part of finalize is currently not used and was added in preparation for https://reviews.llvm.org/D75277. However, the better solution for dealing with this alternative name for main is on the emscripten side. The main reason for this is that doing the rename here in binaryen would require finalize to always rewrite the binary, which is expensive.
* [Parser] Replace Signedness with ternary Sign (#4698)Thomas Lively2022-05-271-23/+21
| | | | | | | | Previously we were tracking whether integer tokens were signed but we did not differentiate between positive and negative signs. Unfortunately, without differentiating them, there's no way to tell the difference between an in-bounds negative integer and a wildly out-of-bounds positive integer when trying to perform bounds checks for s32 tokens. Fix the problem by tracking not only whether there is a sign on an integer token, but also what the sign is.
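The ambiguity can be made concrete with a small sketch. It assumes a hypothetical token layout in which the lexed integer is stored as a 64-bit two's complement word next to a ternary `Sign`; the names `IntTok` and `getS32` are illustrative. With that layout, `-1` and `18446744073709551615` produce the same word, and only the sign tells them apart.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <optional>

enum class Sign { None, Pos, Neg };

// Hypothetical token layout: the value as a 64-bit two's complement word,
// plus which sign (if any) was written in the source text.
struct IntTok {
  uint64_t n;
  Sign sign;
};

// Interpret the token as an s32, or return nullopt if it is out of bounds.
inline std::optional<int32_t> getS32(const IntTok& tok) {
  if (tok.sign == Sign::Neg) {
    // Words in [kMin, 2^64 - 1] are the two's complement forms of
    // INT32_MIN..-1; a word of 0 is "-0".
    constexpr uint64_t kMin = static_cast<uint64_t>(
      static_cast<int64_t>(std::numeric_limits<int32_t>::min()));
    if (tok.n == 0 || tok.n >= kMin) {
      return static_cast<int32_t>(static_cast<int64_t>(tok.n));
    }
    return std::nullopt;
  }
  // Unsigned or explicitly positive: the word is the magnitude itself.
  if (tok.n <= static_cast<uint64_t>(std::numeric_limits<int32_t>::max())) {
    return static_cast<int32_t>(tok.n);
  }
  return std::nullopt;
}
```

With only a boolean "has a sign" flag, the `Sign::Neg` branch could not be told apart from a `+` prefix, so the same all-ones word would have to be either wrongly accepted or wrongly rejected.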
* [Parser][NFC] Improve comments about default NaN payloads (#4697)Thomas Lively2022-05-271-0/+5
|
* [Parser][NFC] Remove extraneous braces from std::optional returns (#4696)Thomas Lively2022-05-271-9/+9
|
* [Parser][NFC] Create a public wat-lexer.h header (#4695)Thomas Lively2022-05-272-241/+101
| | | | | | wat-parser-internal.h was already quite large after implementing just the lexer, so it made sense to rename it to be lexer-specific and start a new file for the higher-level parser. Also make it a proper .cpp file and split the testable interface out into wat-lexer.h.
* [EH] Export tags (#4691)Heejin Ahn2022-05-262-1/+2
| This adds exported tags to the `exports` section in wasm-emscripten-finalize metadata so Emscripten can use it.
|
| Also fixes a bug in the parser. Previously we only recognized the export format of
| ```wasm
| (tag $e2 (param f32))
| (export "e2" (tag $e2))
| ```
| and ignored this format:
| ```wasm
| (tag $e1 (export "e1") (param i32))
| ```
| Companion patch: https://github.com/emscripten-core/emscripten/pull/17064
* [Parser][NFC] Clarify escaped string lexing (#4694)Thomas Lively2022-05-261-24/+27
| | | | Improve comments and variable names to make it clear that we allocate and build a separate string only when necessary to handle escape sequences.
* [Parser] Lex floating point values (#4693)Thomas Lively2022-05-261-25/+231
| | | | | | | | | | | | | | | Rather than trying to actually implement the parsing of float values, which cannot be done naively due to precision concerns, just parse the float grammar then postprocess the parsed text into a form we can pass to `strtod` to do the actual parsing of the value. Since the float grammar reuses `num` and `hexnum` from the integer grammar but does not care about overflow, add a mode to `LexIntCtx`, `num`, and `hexnum` to allow parsing overflowing numbers. For NaNs, store the payload as a separate value rather than as part of the parsed double. The payload will be injected into the NaN at a higher level of the parser once we know whether we are parsing an f64 or an f32 and therefore know what the allowable payload values are.
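The "match the grammar, then hand cleaned-up text to `strtod`" approach might be sketched as follows. This is a simplified illustration, not the lexer's code: `LexedFloat`, `stripUnderscores`, and `lexFloat` are hypothetical names, and the sketch assumes the input has already been matched against the float grammar, since `strtod` on its own accepts spellings the WebAssembly text format does not.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstdlib>
#include <optional>
#include <string>
#include <string_view>

// The parsed double plus, for `nan:0x...` tokens, the payload kept
// separately until the target type (f32 or f64) is known.
struct LexedFloat {
  double d;
  std::optional<uint64_t> nanPayload;
};

// Wasm numbers may contain `_` separators, which strtod does not accept.
inline std::string stripUnderscores(std::string_view text) {
  std::string out;
  for (char c : text) {
    if (c != '_') {
      out.push_back(c);
    }
  }
  return out;
}

// Assumes `text` already matched the float grammar.
inline std::optional<LexedFloat> lexFloat(std::string_view text) {
  std::string_view body = text;
  bool negative = false;
  if (!body.empty() && (body[0] == '+' || body[0] == '-')) {
    negative = body[0] == '-';
    body.remove_prefix(1);
  }
  if (body.substr(0, 6) == "nan:0x") {
    // Keep the payload out of the double; it is injected later.
    std::string digits = stripUnderscores(body.substr(6));
    if (digits.empty()) {
      return std::nullopt;
    }
    char* end = nullptr;
    uint64_t payload = std::strtoull(digits.c_str(), &end, 16);
    if (*end != '\0') {
      return std::nullopt;
    }
    double nan = std::nan("");
    return LexedFloat{negative ? -nan : nan, payload};
  }
  // Decimal, hex float, `inf`, and plain `nan` all go through strtod.
  std::string cleaned = stripUnderscores(text);
  char* end = nullptr;
  double d = std::strtod(cleaned.c_str(), &end);
  if (end == cleaned.c_str() || *end != '\0') {
    return std::nullopt;
  }
  return LexedFloat{d, std::nullopt};
}
```

Delegating the digit-to-double conversion to `strtod` avoids reimplementing correctly rounded parsing, which is the precision concern the commit message mentions.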
* [Parser] Lex keywords (#4688)Thomas Lively2022-05-251-1/+36
| | | | | Also include reserved words that look like keywords to avoid having to find and enumerate all the valid keywords. Invalid keywords will be rejected at a higher level in the parser instead.
* [Parser] Lex strings (#4687)Thomas Lively2022-05-251-1/+164
|
* [Parser] Lex idchar and identifiers (#4686)Thomas Lively2022-05-251-3/+82
|
* [Wasm GC] Fix CFG traversal of call_ref and add missing validation check (#4690)Alon Zakai2022-05-251-0/+32
| | | | | | | | We were missing CallRef in the CFG traversal code in a place where we note possible exceptions. As a result we thought CallRef cannot throw, and were missing some control flow edges. To actually detect the problem, we need to validate non-nullable locals properly, which we were not doing. This adds that as well.
* [Parser] Start a new text format parser (#4680)Thomas Lively2022-05-241-0/+557
| | | | | | | | | | | | | | | | | | | | | | | | | Begin implementing a new text format parser that will accept the standard text format. Start with a lexer that can iterate over tokens in an underlying text buffer. The initial supported tokens are integers, parentheses, and whitespace including comments. The implementation is in a new private internal header so it can be included into a gtest source file even though it is not meant to be a public API. Once the parser is more complete, there will be an additional public header exposing a more concise public API and the private header will be included into a source file that implements that public API. The new parser will improve on the existing text format parser not only because it will accept the full standard text format, but also because its code will be simpler and easier to maintain and because it will hopefully be faster as well. The new parser will be built out of small functions that closely mirror the grammar productions given in the spec and will heavily use C++17 features like string_view, optional, and variant to provide more self-documenting and efficient code. Future PRs will add support for lexing other kinds of tokens followed by support for parsing more complex constructs.
* Fix binary parsing of the prototype nominal format (#4679)Thomas Lively2022-05-191-3/+0
| | | | | | We were checking that nominal modules only had a single element in their type sections, but that's not correct for the prototype nominal binary format we still want to support. The test for this missed catching the bug because it wasn't actually parsing in nominal mode.