forks/binaryen.git -

	Commit message (Collapse)	Author	Age	Files	Lines
*	[NFC] Make MemoryOrder parameters non-optional (#7171)	Thomas Lively	2024-12-21	1	-1/+2
\| \| \| \| \| \|	Update Builder and IRBuilder makeStructGet and makeStructSet functions to require the memory order to be explicitly supplied. This is slightly more verbose, but will reduce the chances that we forget to properly consider synchronization when implementing new features in the future.
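The effect of dropping a default argument can be sketched as follows. This is a minimal illustration, not Binaryen's actual declarations: the `StructGet` struct, the enumerator names, and the simplified `makeStructGet` signature are all assumptions made for the example.

```cpp
#include <cstdint>

// Hypothetical stand-ins for illustration only.
enum class MemoryOrder { Unordered, SeqCst, AcqRel };

struct StructGet {
  uint32_t index;
  MemoryOrder order;
};

// Before (sketch): a defaulted parameter lets callers silently get Unordered.
//   StructGet makeStructGet(uint32_t index,
//                           MemoryOrder order = MemoryOrder::Unordered);

// After (sketch): every caller must state the ordering it wants, so a call
// that forgets to consider synchronization no longer compiles.
inline StructGet makeStructGet(uint32_t index, MemoryOrder order) {
  return StructGet{index, order};
}
```

A call such as `makeStructGet(2, MemoryOrder::SeqCst)` is slightly longer than before, which is the verbosity the commit message accepts in exchange for the explicitness.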
*	Fix UBSan on CI (#7173)	Thomas Lively	2024-12-20	1	-7/+7
\| \| \| \| \| \| \| \| \| \|	The UBSan builder started failing with an error about a misaligned store in wasm-ctor-eval.cpp. The store was already done via `memcpy` to avoid alignment issues, but apparently this is no longer enough. Use `void*` as the destination type to further avoid giving the impression of guaranteed alignment. Also fix UB when executing std::abs on minimum negative integers in literal.cpp.
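Both kinds of undefined behavior mentioned above can be avoided along these lines. The helper names are hypothetical, and returning the minimum value unchanged from `wrappingAbs` is an assumption about the intended wrapping result, not a copy of the code in literal.cpp.

```cpp
#include <cstdint>
#include <cstring>
#include <limits>

// Store a value at an arbitrary, possibly misaligned address. Taking the
// destination as void* avoids implying any alignment guarantee.
template<typename T> void storeUnaligned(void* dest, T value) {
  std::memcpy(dest, &value, sizeof(T));
}

// Load a value from an arbitrary, possibly misaligned address.
template<typename T> T loadUnaligned(const void* src) {
  T value;
  std::memcpy(&value, src, sizeof(T));
  return value;
}

// std::abs on the minimum int32_t is undefined because the result is not
// representable. This helper returns the input unchanged in that case
// (an assumed two's-complement wrapping result).
inline int32_t wrappingAbs(int32_t x) {
  if (x == std::numeric_limits<int32_t>::min()) {
    return x;
  }
  return x < 0 ? -x : x;
}
```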
*	Rename indexType -> addressType. NFC (#7060)	Sam Clegg	2024-11-07	1	-1/+1
\| \| \|	See https://github.com/WebAssembly/memory64/pull/92
*	[wasm64] Make interpreter table methods operate on Address, not Index (#7062)	Alon Zakai	2024-11-07	1	-3/+4
\| \| \|	This allows 64-bit bounds checking to work properly.
*	[wasm64] Fix wasm-ctor-eval + utils on 64-bit indexes for memory64 (#7059)	Alon Zakai	2024-11-06	1	-3/+5
\| \| \| \|	Some places assumed a 32-bit index.
*	[NFC] Use RAII to manage call depth tracking in the interpreter (#7049)	Alon Zakai	2024-11-01	1	-1/+1
\| \| \| \| \| \| \|	The old code manually managed it for no good reason that I can see. After this, there is no difference between callFunction and callFunctionInternal, so fold them together.
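A minimal sketch of RAII-managed call depth, assuming a toy `Interpreter` class: the `DepthGuard` name, the `maxDepth` value, and the recursive body are illustrative and are not taken from the interpreter itself.

```cpp
#include <cstddef>
#include <stdexcept>

// Hypothetical interpreter fragment for illustration only.
struct Interpreter {
  static constexpr std::size_t maxDepth = 250;
  std::size_t callDepth = 0;

  // Increments the depth on construction and restores it on destruction,
  // including when the call exits through an exception.
  struct DepthGuard {
    std::size_t& depth;
    explicit DepthGuard(std::size_t& depth) : depth(depth) { ++depth; }
    ~DepthGuard() { --depth; }
    DepthGuard(const DepthGuard&) = delete;
    DepthGuard& operator=(const DepthGuard&) = delete;
  };

  // Recurses n times to stand in for nested wasm calls.
  int callFunction(int n) {
    DepthGuard guard(callDepth);
    if (callDepth > maxDepth) {
      throw std::runtime_error("stack limit");
    }
    return n == 0 ? 0 : 1 + callFunction(n - 1);
  }
};
```

Because the guard's destructor runs on every exit path, no separate wrapper function is needed just to decrement the counter, which is consistent with folding callFunction and callFunctionInternal together.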
*	Replace the old topological sort everywhere (#6902)	Thomas Lively	2024-09-10	1	-27/+9
\| \| \| \| \| \| \| \| \|	To avoid having two separate topological sort utilities in the code base, replace remaining uses of the old DFS-based, CRTP topological sort with the newer Kahn's algorithm implementation. This would be NFC, except that the new topological sort produces a different order than the old topological sort, so the output of some passes is reordered.
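For reference, Kahn's algorithm can be sketched as below. This is a generic textbook version over an adjacency list, not the utility in the code base; the function name and signature are assumptions.

```cpp
#include <cstddef>
#include <queue>
#include <vector>

// graph[i] lists the successors of vertex i. On success, fills `order` with
// the vertices in a topological order and returns true. Returns false if the
// graph contains a cycle.
inline bool topologicalSort(const std::vector<std::vector<std::size_t>>& graph,
                            std::vector<std::size_t>& order) {
  order.clear();
  std::vector<std::size_t> inDegree(graph.size(), 0);
  for (const auto& succs : graph) {
    for (auto s : succs) {
      ++inDegree[s];
    }
  }
  // Vertices with no remaining predecessors are ready to be emitted.
  std::queue<std::size_t> ready;
  for (std::size_t i = 0; i < graph.size(); ++i) {
    if (inDegree[i] == 0) {
      ready.push(i);
    }
  }
  while (!ready.empty()) {
    auto v = ready.front();
    ready.pop();
    order.push_back(v);
    for (auto s : graph[v]) {
      if (--inDegree[s] == 0) {
        ready.push(s);
      }
    }
  }
  return order.size() == graph.size();
}
```

A DFS-based sort and this queue-based one can both be valid yet emit different orders for the same graph, which is why switching implementations can reorder pass output.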
*	Add a --preserve-type-order option (#6916)	Thomas Lively	2024-09-10	1	-1/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Unlike other module elements, types are not stored on the `Module`. Instead, they are collected by traversing the IR before printing and binary writing. The code that collects the types tries to optimize the order of rec groups based on the number of times each type is used. As a result, the output order of types generally has no relation to the input order of types. In addition, most type optimizations rewrite the types into a single large rec group, and the order of types in that group is essentially arbitrary. Changes to the code for counting type uses, sorting types, or sorting rec groups can yield very large changes in the output order of types, producing test diffs that are hard to review and potentially harming the readability of tests by moving output types away from the corresponding input types. To help make test output more stable and readable, introduce a tool option that causes the order of output types to match the order of input types as closely as possible. It is implemented by having the parsers record the indices of the input types on the `Module` just like they already record the type names. The `GlobalTypeRewriter` infrastructure used by type optimizations associates the new types with the old indices just like it already does for names and also respects the input order when rewriting types into a large recursion group. By default, wasm-opt and other tools clear the recorded type indices after parsing the module, so their default behavior is not modified by this change. Follow-on PRs will use the new flag in more tests, which will generate large diffs but leave the tests in stable, more readable states that will no longer change due to other changes to the optimizing type sorting logic.
*	[NFC] Rename the old topological sort utility (#6914)	Thomas Lively	2024-09-06	1	-2/+2
\| \| \| \|	This will allow both the old and new topological sort utilities to be included into the same .cpp file while we phase out the old utility.
*	Fix direct comparisons with unshared basic heap types (#6845)	Thomas Lively	2024-08-16	1	-3/+5
	Audit the remaining occurrences of `== HeapType::` and fix those that did not handle shared types correctly. Add tests for some of the fixes; others are NFC but clarify the code.
*	Implement table.init (#6827)	Alon Zakai	2024-08-16	1	-9/+15
	Also use TableInit in the interpreter to initialize the module's table state, which will now handle traps properly, fixing #6431.
*	Restore isString type methods (#6815)	Thomas Lively	2024-08-06	1	-2/+1
	PR #6803 proposed removing Type::isString and HeapType::isString in favor of more explicit, verbose callsites. There was no consensus to make this change, but it was accidentally committed as part of #6804. Revert the accidental change, except for the useful, noncontroversial parts, such as fixing the `isString` implementation and a few other locations to correctly handle shared types.
*	[NFC] Add HeapType::getKind returning a new HeapTypeKind enum (#6804)	Thomas Lively	2024-08-06	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The HeapType API has functions like `isBasic()`, `isStruct()`, `isSignature()`, etc. to test the classification of a heap type. Many users have to call these functions in sequence and handle all or most of the possible classifications. When we add a new kind of heap type, finding and updating all these sites is a manual and error-prone process. To make adding new heap type kinds easier, introduce a new API that returns an enum classifying the heap type. The enum can be used in switch statements and the compiler's exhaustiveness checker will flag use sites that need to be updated when we add a new kind of heap type. This commit uses the new enum internally in the type system, but follow-on commits will add new uses and convert uses of the existing APIs to use `getKind` instead.
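The benefit of an enum plus `switch` can be sketched like this. The enumerators listed here are illustrative assumptions and may not match the real `HeapTypeKind`; `describe` is a hypothetical helper.

```cpp
#include <string>

// Hypothetical enum for illustration; the real enumerators may differ.
enum class HeapTypeKind { Basic, Func, Struct, Array };

// There is deliberately no default case. With -Wswitch (enabled by -Wall in
// GCC and Clang), adding a new enumerator makes the compiler flag this
// switch until the new kind is handled.
inline std::string describe(HeapTypeKind kind) {
  switch (kind) {
    case HeapTypeKind::Basic:
      return "basic";
    case HeapTypeKind::Func:
      return "func";
    case HeapTypeKind::Struct:
      return "struct";
    case HeapTypeKind::Array:
      return "array";
  }
  // Not reached for valid enumerators.
  return "unknown";
}
```

By contrast, a chain of `isBasic()`, `isStruct()`, and similar tests gives the compiler nothing to check when a new classification is added.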
*	Rename external conversion instructions (#6716)	Jérôme Vouillon	2024-07-08	1	-1/+1
\| \| \| \| \| \| \| \| \|	Rename instructions `extern.internalize` into `any.convert_extern` and `extern.externalize` into `extern.convert_any` to follow more closely the spec. This was changed in https://github.com/WebAssembly/gc/issues/432. The legacy name is still accepted in text inputs and in the C and JS APIs.
*	[Strings] Remove stringview types and instructions (#6579)	Thomas Lively	2024-05-15	1	-8/+0
	The stringview types from the stringref proposal have three irregularities that break common invariants and require pervasive special casing to handle properly: they are supertypes of `none` but not subtypes of `any`, they cannot be the targets of casts, and they cannot be used to construct nullable references. At the same time, the stringref proposal has been superseded by the imported strings proposal, which does not have these irregularities. The cost of maintaining and improving our support for stringview types is no longer worth the benefit of supporting them.
	Simplify the code base by entirely removing the stringview types and related instructions that do not have analogues in the imported strings proposal and do not make sense in the absence of stringviews.
	Three remaining instructions, `stringview_wtf16.get_codeunit`, `stringview_wtf16.slice`, and `stringview_wtf16.length` take stringview operands in the stringref proposal but cannot be removed because they lower to operations from the imported strings proposal. These instructions are changed to take stringref operands in Binaryen IR, and to allow a graceful upgrade path for users of these instructions, the text and binary parsers still accept but ignore `string.as_wtf16`, which is the instruction used to convert stringrefs to stringviews. The binary writer emits code sequences that use scratch locals and `string.as_wtf16` to keep the output valid.
	Future PRs will further align binaryen with the imported strings proposal instead of the stringref proposal, for example by making `string` a subtype of `extern` instead of a subtype of `any` and by removing additional instructions that do not have analogues in the imported strings proposal.
*	[StackIR] Run StackIR during binary writing and not as a pass (#6568)	Alon Zakai	2024-05-09	1	-1/+1
	Previously we had passes --generate-stack-ir, --optimize-stack-ir, --print-stack-ir that could be run like any other passes. After generating StackIR it was stashed on the function and invalidated if we modified BinaryenIR. If it wasn't invalidated then it was used during binary writing.
	This PR switches things so that we optionally generate, optimize, and print StackIR only during binary writing. It also removes all traces of StackIR from wasm.h - after this, StackIR is a feature of binary writing (and printing) logic only.
	This is almost NFC, but there are some minor noticeable differences:
	1. We no longer print has StackIR in the text format when we see it is there. It will not be there during normal printing, as it is only present during binary writing. (but --print-stack-ir still works as before; as mentioned above it runs during writing).
	2. --generate/optimize/print-stack-ir change from being passes to being flags that control that behavior instead. As passes, their order on the commandline mattered, while now it does not, and they only "globally" affect things during writing.
	3. The C API changes slightly, as there is no need to pass it an option "optimize" to the StackIR APIs. Whether we optimize is handled by --optimize-stack-ir which is set like other optimization flags on the PassOptions object, so we don't need the old option to those C APIs.
	The main benefit here is simplifying the code, so we don't need to think about StackIR in more places than just binary writing. That may also allow future improvements to our usage of StackIR.
*	[Strings] wasm-ctor-eval: Stop on seeing a string view, which we cannot precompute (#6561)	Alon Zakai	2024-04-29	1	-0/+8
\|
*	Do not add an extra null character when reading files (#6538)	Thomas Lively	2024-04-24	1	-2/+0
\| \| \| \| \| \| \| \|	The new wat parser currently considers itself to be at the end of the file whenever it cannot lex another token. This is not quite right, but fixing it causes parser errors because of the extra null character we were appending to files when we read them. This null character is not useful since we can already read files as `std::string`, which always has an implicit null character, so remove it. Clean up some users of `read_file` while we're at it.
*	Handle return calls correctly	Thomas Lively	2024-04-08	1	-68/+126
	This is a combined commit covering multiple PRs fixing the handling of return calls in different areas. The PRs are all landed as a single commit to ensure internal consistency and avoid problems with bisection. Original PR descriptions follow:

	* Fix inlining of `return_call` (#6448)
	Previously we transformed return calls in inlined function bodies into normal calls followed by branches out to the caller code. Similarly, when inlining a `return_call` callsite, we simply added a `return` after the body inlined at the callsite. These transformations would have been correct if the semantics of return calls were to call and then return, but they are not correct for the actual semantics of returning and then calling. The previous implementation is observably incorrect for return calls inside try blocks, where the previous implementation would run the inlined body within the try block, but the proper semantics would be to run the inlined body outside the try block.
	Fix the problem by transforming inlined return calls to branches followed by calls rather than as calls followed by branches. For the case of inlined return call callsites, insert branches out of the original body of the caller and inline the body of the callee as a sibling of the original caller body. For the other case of return calls appearing in inlined bodies, translate the return calls to branches out to calls inserted as siblings of the original inlined body.
	In both cases, it would have been convenient to use multivalue block return to send call parameters along the branches to the calls, but unfortunately in our IR that would have required tuple-typed scratch locals to unpack the tuple of operands at the call sites. It is simpler to just use locals to propagate the operands in the first place.

	* Fix interpretation of `return_call` (#6451)
	We previously interpreted return calls as calls followed by returns, but that is not correct both because it grows the size of the execution stack and because it runs the called functions in the wrong context, which can be observable in the case of exception handling.
	Update the interpreter to handle return calls correctly by adding a new `RETURN_CALL_FLOW` that behaves like a return, but carries the arguments and reference to the return-callee rather than normal return values. `callFunctionInternal` is updated to intercept this flow and call return-called functions in a loop until a function returns with some other kind of flow.
	Pull in the upstream spec tests return_call.wast, return_call_indirect.wast, and return_call_ref.wast with light editing so that we parse and validate them successfully.

	* Handle return calls in wasm-ctor-eval (#6464)
	When an evaluated export ends in a return call, continue evaluating the return-called function. This requires propagating the parameters, handling the case that the return-called function might be an import, and fixing up local indices in case the final function has different parameters than the original function.

	* Update effects.h to handle return calls correctly (#6470)
	As far as their surrounding code is concerned return calls are no different from normal returns. It's only from a caller's perspective that a function containing a return call also has the effects of the return-callee. To model this more precisely in EffectAnalyzer, stash the throw effect of return-callees on the side and only merge it in at the end when analyzing the effects of a full function body.
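The loop described for the interpreter can be sketched with a toy model. Everything here is an assumption made for illustration: the `Flow` struct, the `Function` alias, and this simplified `callFunction` do not match the real interpreter types, and only the idea of a flow that carries the return-callee and its arguments comes from the description above.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Toy flow: either ordinary results, or a request to return-call `callee`.
struct Flow {
  bool isReturnCall = false;
  std::size_t callee = 0;      // meaningful only when isReturnCall is true
  std::vector<int64_t> values; // results, or arguments for the callee
};

using Function = std::function<Flow(const std::vector<int64_t>&)>;

// Runs funcs[index] and keeps running return-called functions in a loop
// until one returns with an ordinary flow. The host stack does not grow
// with the number of return calls.
inline std::vector<int64_t> callFunction(const std::vector<Function>& funcs,
                                         std::size_t index,
                                         std::vector<int64_t> args) {
  while (true) {
    Flow flow = funcs[index](args);
    if (!flow.isReturnCall) {
      return flow.values;
    }
    // The caller's frame is already gone; continue with the return-callee.
    index = flow.callee;
    args = std::move(flow.values);
  }
}
```

Because each return-callee runs only after the previous function has fully returned, it executes outside any handlers that were active in the caller, which is the context difference the description refers to.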
*	wasm-ctor-eval: Properly eval strings (#6276)	Alon Zakai	2024-02-05	1	-8/+3
\| \| \| \| \| \| \|	#6244 tried to do this but was not quite right. It treated a string like an array or a struct, which means create a global for it. But just creating a global isn't enough, as it needs to also be sorted in the right place etc. which requires changes in other places. But there is a much simpler solution here: string constants are just constants, which we can emit in-line, so do that.
*	wasm-ctor-eval: Eval strings (#6244)	Alon Zakai	2024-01-25	1	-0/+6
\|
*	Fix handling of exported imported functions (#6044)	Alon Zakai	2023-10-24	1	-1/+10
\| \| \| \| \| \| \| \|	Two trivial places did not handle that case, and assumed an exported function was actually defined (and not imported). Also add some const stuff to fix compilation after this change. This was discovered by #6026
*	wasm-ctor-eval: Limit memory to a reasonable amount (#5896)	Alon Zakai	2023-08-23	1	-0/+11
\| \| \| \| \| \|	In practice we don't need high addresses, and when they happen the current implementation can OOM, so exit early on them instead. Fixes #5893
*	[Wasm GC] wasm-ctor-eval: Handle cycles of data (#5685)	Alon Zakai	2023-05-05	1	-57/+376
	A cycle of data is something we can't just naively emit as wasm globals. If at runtime we end up, for example, with an object A that refers to itself, then we can't just emit
	  (global $A (struct.new $A (global.get $A)))
	The global.get is of this very global, and such a self-reference is invalid. So we need to break such cycles as we emit them. The simple idea used here is to find paths in the cycle that are nullable and mutable, and replace the initial value with a null that is fixed up later in the start function:
	  (global $A (struct.new $A (ref.null $A)))
	  (func $start (struct.set (global.get $A) (global.get $A)))
	This is not optimal in terms of breaking cycles, but it is fast (linear time) and simple, and does well in practice on j2wasm (where cycles in fact occur).
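A much-simplified sketch of the idea, assuming a toy model in which every object has at most one reference field. The `Object` and `Output` types, the `serialize` function, and the abbreviated wasm-like strings are all hypothetical; the real implementation works on Binaryen IR and handles general graphs.

```cpp
#include <set>
#include <string>
#include <vector>

// Toy model: each object is one struct with at most one reference field.
struct Object {
  std::string name;        // also used as the name of its defining global
  std::string ref;         // name of the referenced object, or empty
  bool refNullableMutable; // whether the field may hold null and be set later
};

struct Output {
  std::vector<std::string> globals; // emitted in order
  std::vector<std::string> fixups;  // statements for the start function
};

// Single pass over the objects. A reference to an object whose global has
// not been emitted yet (including a self-reference) is replaced by a null
// and patched later in the start function. Returns false if such a
// reference is not nullable and mutable, since it cannot be deferred.
inline bool serialize(const std::vector<Object>& objects, Output& out) {
  std::set<std::string> defined;
  for (const auto& obj : objects) {
    std::string init;
    if (obj.ref.empty()) {
      init = "";
    } else if (defined.count(obj.ref)) {
      init = " (global.get $" + obj.ref + ")";
    } else if (obj.refNullableMutable) {
      init = " (ref.null $" + obj.ref + ")";
      out.fixups.push_back("(struct.set (global.get $" + obj.name +
                           ") (global.get $" + obj.ref + "))");
    } else {
      return false;
    }
    out.globals.push_back("(global $" + obj.name + " (struct.new $" +
                          obj.name + init + "))");
    defined.insert(obj.name);
  }
  return true;
}
```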
*	[Wasm GC] Allow extern.externalize in globals (#5585)	Alon Zakai	2023-03-17	1	-0/+5
\| \| \| \| \| \| \| \| \| \|	This fixes wasm-ctor-eval on evalling a GC data structure that contains a field initialized with an externalized value. Per the spec this is a constant instruction and I verified that V8 allows this. Also add missing validation in wasm-ctor-eval of the output (which makes debugging this kind of thing a little easier).
*	[Wasm GC] wasm-ctor-eval: Handle externalized data (#5582)	Alon Zakai	2023-03-16	1	-4/+25
\|
*	[NFC] Internally rename `ArrayInit` to `ArrayNewFixed` (#5526)	Thomas Lively	2023-02-28	1	-1/+1
\| \| \| \| \| \| \| \|	To match the standard instruction name, rename the expression class without changing any parsing or printing behavior. A follow-on PR will take care of the functional side of this change while keeping support for parsing the old name. This change will allow `ArrayInit` to be used as the expression class for the upcoming `array.init_data` and `array.init_elem` instructions.
*	[wasm-ctor-eval] Properly handle multiple ctors with GC (#5522)	Alon Zakai	2023-02-24	1	-8/+22
\| \| \| \| \| \| \| \| \| \|	Before, a single ctor with GC worked, but any subsequent ones simply dropped the globals from the previous ones, because we were missing an addGlobal in an important place. Also, we can get confused about which global names are in use in the module, so fix that as well by storing them directly (we keep removing and re-adding globals, so we can't use the normal module mechanism to find which names are in use).
*	[wasm-ctor-eval] Stop evalling at table.set for now (#5516)	Alon Zakai	2023-02-23	1	-0/+7
\| \| \| \|	Until we get full support for serializing table changes, stop evalling so we do not break things.
*	[wasm-ctor-eval] Add v128 load/store support (#5512)	Alon Zakai	2023-02-23	1	-0/+8
\|
*	[wasm-ctor-eval] Add support for multivalue serialization and a quiet mode (#5510)	Alon Zakai	2023-02-23	1	-18/+56
	Simply loop over the values and use tuple.make. This also adds a lit test for ctor-eval. I found that the problem blocking us before was the logging, which confuses the update script. As this test at least does not require that logging, this PR adds a --quiet flag that disables the logging, and then a lit test just works.
*	Make `Name` a pointer, length pair (#5122)	Thomas Lively	2022-10-11	1	-8/+11
	With the goal of supporting null characters (i.e. zero bytes) in strings. Rewrite the underlying interned `IString` to store a `std::string_view` rather than a `const char*`, reduce the number of map lookups necessary to intern a string, and present a more immutable interface. Most importantly, replace the `c_str()` method that returned a `const char*` with a `toString()` method that returns a `std::string`. This new method can correctly handle strings containing null characters. A `const char*` can still be had by calling `data()` on the `std::string_view`, although this usage should be discouraged.
	This change is NFC in spirit, although not in practice. It does not intend to support any particular new functionality, but it is probably now possible to use strings containing null characters in at least some cases. At least one parser bug is also incidentally fixed. Follow-on PRs will explicitly support and test strings containing nulls for particular use cases.
	The C API still uses `const char*` to represent strings. As strings containing nulls become better supported by the rest of Binaryen, this will no longer be sufficient. Updating the C and JS APIs to use pointer, length pairs is left as future work.
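The difference between a pointer, length pair and a bare C string can be seen in a small standalone sketch. It uses only the standard library; the two free functions are hypothetical helpers and are not the `IString` or `Name` interface.

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <string_view>

// Copies every byte of the view, including any embedded zero bytes.
inline std::string toString(std::string_view name) {
  return std::string(name.data(), name.size());
}

// Treats the pointer as a C string, so it stops at the first zero byte.
// The pointer must refer to zero-terminated storage.
inline std::size_t cStringLength(const char* s) {
  return std::strlen(s);
}
```

For a view built as `std::string_view("a\0b", 3)`, `toString` yields three bytes, while `cStringLength` on the same data reports one, which is why going through `data()` is discouraged.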
*	Implement bottom heap types (#5115)	Thomas Lively	2022-10-07	1	-4/+1
	These types, `none`, `nofunc`, and `noextern` are uninhabited, so references to them can only possibly be null. To simplify the IR and increase type precision, introduce new invariants that all `ref.null` instructions must be typed with one of these new bottom types and that `Literals` have a bottom type iff they represent null values. These new invariants require several additional changes.
	First, it is now possible that the `ref` or `target` child of a `StructGet`, `StructSet`, `ArrayGet`, `ArraySet`, or `CallRef` instruction has a bottom reference type, so it is not possible to determine what heap type annotation to emit in the binary or text formats. (The bottom types are not valid type annotations since they do not have indices in the type section.) To fix that problem, update the printer and binary emitter to emit unreachables instead of the instruction with undetermined type annotation. This is a valid transformation because the only possible value that could flow into those instructions in that case is null, and all of those instructions trap on nulls.
	That fix uncovered a latent bug in the binary parser in which new unreachables within unreachable code were handled incorrectly. This bug was not previously found by the fuzzer because we generally stop emitting code once we encounter an instruction with type `unreachable`. Now, however, it is possible to emit an `unreachable` for instructions that do not have type `unreachable` (but are known to trap at runtime), so we will continue emitting code. See the new test/lit/parse-double-unreachable.wast for details.
	Update other miscellaneous code that creates `RefNull` expressions and null `Literals` to maintain the new invariants as well.
*	Changing Fatal() to assert() (#4982)	Ashley Nelson	2022-09-09	1	-3/+1
\| \| \|	Replacing Fatal() call sites in src/shell-interface.h & src/tools/wasm-ctor-eval.cpp that were added in the Multi-Memories PR with assert()
*	Multi-Memories Support in IR (#4811)	Ashley Nelson	2022-08-17	1	-37/+63
	This PR removes the single memory restriction in IR, adding support for a single module to reference multiple memories. To support this change, a new memory name field was added to 13 memory instructions in order to identify the memory for the instruction. It is a goal of this PR to maintain backwards compatibility with existing text and binary wasm modules, so memory indexes remain optional for memory instructions. Similarly, the JS API makes assumptions about which memory is intended when only one memory is present in the module. Another goal of this PR is that existing test behavior be unaffected. That said, tests must now explicitly define a memory before invoking memory instructions or exporting a memory, and memory names are now printed for each memory instruction in the text format. There remain quite a few places where hardcoded references to the first memory persist (memory flattening, for example, will return early if more than one memory is present in the module). Many of these call-sites, particularly within passes, will require us to rethink how the optimization works in a multi-memories world. Other call-sites may necessitate more invasive code restructuring to fully convert away from relying on a globally available, single memory pointer.
*	Remove RTTs (#4848)	Thomas Lively	2022-08-05	1	-1/+0
	RTTs were removed from the GC spec and if they are added back in the future, they will be heap types rather than value types as in our implementation. Updating our implementation to have RTTs be heap types would have been more work than deleting them for questionable benefit since we don't know how long it will be before they are specced again.
*	First class Data Segments (#4733)	Ashley Nelson	2022-06-21	1	-7/+8
	* Updating wasm.h/cpp for DataSegments
	* Updating wasm-binary.h/cpp for DataSegments
	* Removed link from Memory to DataSegments and updated module-utils, Metrics and wasm-traversal
	* checking isPassive when copying data segments to know whether to construct the data segment with an offset or not
	* Removing memory member var from DataSegment class as there is only one memory right now. Updated wasm-validator.cpp
	* Updated wasm-interpreter
	* First look at updating Passes
	* Updated wasm-s-parser
	* Updated files in src/ir
	* Updating tools files
	* Last pass on src files before building
	* added visitDataSegment
	* Fixing build errors
	* Data segments need a name
	* fixing var name
	* ran clang-format
	* Ensuring a name on DataSegment
	* Ensuring more datasegments have names
	* Adding explicit name support
	* Fix fuzzing name
	* Outputting data name in wasm binary only if explicit
	* Checking temp dataSegments vector to validateBinary because it's the one with the segments before we processNames
	* Pass on when data segment names are explicitly set
	* Ran auto_update_tests.py and check.py, success all around
	* Removed an errant semi-colon and corrected a counter. Everything still passes
	* Linting
	* Fixing processing memory names after parsed from binary
	* Updating the test from the last fix
	* Correcting error comment
	* Impl kripken@ comments
	* Impl tlively@ comments
	* Updated tests that remove data print when == 0
	* Ran clang format
	* Impl tlively@ comments
	* Ran clang-format
*	[Wasm GC] [ctor-eval] Evaluate and serialize GC data (#4491)	Alon Zakai	2022-02-03	1	-4/+169
	This ended up simpler than I thought. We can simply emit global and local data as we go, creating globals as necessary to contain GC data, and referring to them using global.get later. That will ensure that data identity works (things referring to the same object in the interpreter will refer to the same object when the wasm is loaded).
	In more detail, each live GC item is created in a "defining global", a global that is immutable and of the precise type of that data. Then we just read from that location in any place that wants to refer to that data. That is, something like
	  function foo() {
	    var x = Bar(10);
	    var y = Bar(20);
	    var z = x;
	    z.value++; // first object now contains 11
	    ...
	  }
	will be evalled into something like
	  var define$0 = Bar(11); // note the ++ has taken effect here
	  var define$1 = Bar(20);
	  function foo() {
	    var x = define$0;
	    var y = define$1;
	    var z = define$0;
	    ...
	  }
	This PR should handle everything but "cycles", that is, GC data that at runtime ends up forming a loop. Leaving that for later work (not sure how urgent it is to fix).
*	[Docs] Document wasm-ctor-eval (#4493)	Alon Zakai	2022-02-03	1	-3/+2
\|
*	Interpreter: Remove GlobalManager (#4486)	Alon Zakai	2022-01-31	1	-84/+21
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	GlobalManager is another class that added complexity in the interpreter logic, and did not help. In fact it hurts extensibility, as when one wants to extend the interpreter one has another class to customize, and it is templated on the main runner, so again as #4479 we end up with annoying template cycles. This simply removes that class. That makes the interpreter code strictly simpler. Applying that change to wasm-ctor-eval also ends up fixing a pre-existing bug, so this PR gets testing through that. The ctor-eval issue was that we did not extend the GlobalManager properly in the past: we checked for accesses on imported globals there, but not in the main class, i.e., not on global.get operations. Needing to do things in two places is an example of the previous complexity. The fix is simply to implement visitGlobalGet in one place, and remove all the GlobalManager logic added in ctor-eval, which then gets a lot simpler as well. The new imported-global-2.wast checks for that bug (a global.get of an import should stop us from evalling). Existing tests cover the other cases, like it being ok to read a non-imported global, etc. The existing test indirect-call3.wast required a slight change: There was a global.get of an imported global, which was ignored in the place it happened (an init of an elem segment); the new code checks all global.gets, so it now catches that.
*	[NFC] Refactor ModuleInstanceBase+RuntimeExpressionRunner into a single class (#4479)	Alon Zakai	2022-01-28	1	-25/+22
	As recently discussed, the interpreter code is way too complex. Trying to add ctor-eval stuff I need, I got stuck and ended up spending some time to get rid of some of the complexity.
	We had a ModuleInstanceBase class which was basically an instance of a module, that is, an execution of it. And internally we have RuntimeExpressionRunner which is a runner that integrates with the ModuleInstanceBase - basically, it uses the runtime info to execute code. For example, the MIB has globals info, and the RER would read it from there. But these two classes are really just one functionality - an execution of a module. We get rid of some complexity by removing the separation between them, ending up with a class that can run a module.
	One set of problems we avoid is that we can now extend the single class in a simple way. Before, we would need to extend both - and inform each other of those changes. That gets "fun" with CRTP which we use everywhere. In other words, each of the two classes depended on the other / would need to be templated on the other. Specifically, MIB.callFunction would need to be given the RER to run with, and so that would need to be templated on it. This ends up leading to a bunch more templating all around - all complexity that we just don't need. See the simplification to the wasm-ctor-eval for some of that (and even worse complexity would have been needed without this PR in the next steps for that tool to eval GC stuff).
	The final single class is now called ModuleRunner.
	Also fixes a pre-existing issue uncovered by this PR. We had the delegate target on the runner, but it should be tied to a function scope. This happened to not be a problem if one always created a new runner for each scope, but this PR makes the runner longer-lived, so the stale data ended up mattering. The PR moves that data to the proper place.
	Note: Diff without whitespace is far, far smaller.
*	LiteralList => Literals (#4451)	Alon Zakai	2022-01-13	1	-3/+3
\| \| \| \| \| \| \|	LiteralList overlaps with Literals, but is less efficient as it is not a SmallVector. Add reserve/capacity methods to SmallVector which are now necessary to compile.
*	[ctor-eval] Eval functions with params if ignoring external input (#4446)	Alon Zakai	2022-01-12	1	-6/+24
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When ignoring external input, assume params have a value of 0. This makes it possible to eval main(argc, argv) if one is careful and does not actually use those values. This is basically a workaround for main always receiving argc/argv, even if the C code has no args (in that case the compiler emits __original_main for the user's main, and wraps it with a main that adds the args, hence the problem). This is similar to the existing support for handling wasi_args_get when ignoring external input, although it just sets values of zeros for the params. Perhaps it could check for main() specifically and return 1 for argc and a proper buffer for argv somehow, but I think if a program wants to use --ignore-external-input it can avoid actually reading argc/argv.
*	[ctor-eval] Followup refactoring to use std::optional for EvalCtorOutcome (#4448)	Alon Zakai	2022-01-12	1	-21/+16
\|
*	[ctor-eval] Eval functions with a return value (#4443)	Alon Zakai	2022-01-12	1	-24/+49
\| \| \|	This is necessary for e.g. main() which returns an i32.
*	[ctor-eval] Stop if there are any memory.init instructions (#4442)	Alon Zakai	2022-01-11	1	-18/+25
\| \| \| \| \| \| \| \|	This tool depends (atm) on flattening memory segments. That is not compatible with memory.init which cares about segment identities. This changes flatten() only by adding the check for MemoryInit. The rest is unchanged, although I saw the other two params are not needed and I removed them while I was there.
*	[ctor-eval] Add an option to keep some exports (#4441)	Alon Zakai	2022-01-11	1	-15/+39
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	By default wasm-ctor-eval removes exports that it manages to completely eval (if it just partially evals then the export remains, but points to a function with partially-evalled contents). However, in some cases we do want to keep the export around even so, for example during fuzzing (as the fuzzer wants to call the same exports before and after wasm-ctor-eval runs) and also if there is an ABI we need to preserve (like if we manage to eval all of main()), or if the function returns a value (which we don't support yet, but this is a PR to prepare for that). Specifically, there is now a new option: --kept-exports foo,bar That is a list of exports to keep around. Note that when we keep around an export after evalling the ctor we make the export point to a new function. That new function just contains a nop, so that nothing happens when it is called. But the original function is kept around as it may have other callers, who we do not want to modify.
*	[ctor-eval] Fix evalling of overlapping table segments (#4440)	Alon Zakai	2022-01-11	1	-19/+25
\|
*	[ctor-eval] Partial evaluation (#4438)	Alon Zakai	2022-01-11	1	-23/+179
	This lets us eval part of a function but not all, which is necessary to handle real-world things like __wasm_call_ctors in LLVM output, as that is the single ctor that is exported and it has calls to the actual ctors.
	To do so, we look for a toplevel block and execute its items one by one, in a FunctionScope. If we stop in the middle, then we are performing a partial eval. In that case, we only remove the parts of the function that we evalled, and we also serialize the locals whose values we read from the FunctionScope. For example, consider this:
	  function foo() {
	    return 10;
	  }
	  function __wasm_call_ctors() {
	    var x;
	    x = foo();
	    x++;
	    // We stop evalling here.
	    import1();
	    import2(x);
	  }
	We can eval x = foo() and x++, but we must stop evalling when we reach the first of those imports. The partially-evalled function then looks like this:
	  function __wasm_call_ctors() {
	    var x;
	    x = 11;
	    import1();
	    import2(x);
	  }
	That is, we evalled two lines of executing code and simply removed them, and then we wrote out the value of the local at that point, and then the rest of the code in the function is as it used to be.
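The item-by-item strategy can be sketched with a toy model. The `Item`, `Locals`, and `PartialResult` types and the `partialEval` function are hypothetical and stand in for the toplevel block, the FunctionScope, and the serialized output; the real tool operates on Binaryen IR.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <string>
#include <vector>

using Locals = std::map<std::string, int32_t>;

// One item of the toplevel block.
struct Item {
  std::string text;
  // Empty when the item cannot be evalled (for example, a call to an import).
  std::function<void(Locals&)> eval;
};

struct PartialResult {
  Locals locals;                 // local values to serialize at the stop point
  std::vector<std::string> rest; // items left in the function unchanged
};

// Executes items in order until the first one that cannot be evalled.
// Everything from that item onward is kept as-is.
inline PartialResult partialEval(const std::vector<Item>& body) {
  PartialResult result;
  std::size_t i = 0;
  for (; i < body.size() && body[i].eval; ++i) {
    body[i].eval(result.locals);
  }
  for (; i < body.size(); ++i) {
    result.rest.push_back(body[i].text);
  }
  return result;
}
```

Applied to the `__wasm_call_ctors` example, the two evallable items leave `x` at 11 in `locals`, and `rest` holds the two import calls.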
*	[ctor-eval] Switch logging from stderr to stdout (#4432)	Alon Zakai	2022-01-07	1	-7/+7
\| \| \| \| \|	This logging is central to what this tool does, and not optional, so stdout makes more sense I think. Also, as I'm re-integrating this on the emscripten side, this makes it simpler.