There are several reasons why a function may not be trained in deterministically.
To validate this quickly, we need to inspect profile.data (the alternative requires performing the split). However, profile.data is a binary file and is not self-sufficient, so we cannot currently use it for such validation.
Therefore, to allow a quick check of whether a particular function has been trained in, we need to dump profile.data in a more readable format.
This PR prints the list of functions to be kept (in the main wasm) and the functions to be split (moved to deferred.wasm) to the console in a readable format.
Added a new option `--print-profile`, which takes:
- the input path to orig.wasm (the original wasm file that will be used later during the split), given as the positional input
- the input path to the profile.data to print, given as the value of `--print-profile`

Optionally, pass `--unescape` to unescape the function names.
Usage:
```
binaryen\build>bin\wasm-split.exe test\profile_data\MY.orig.wasm --print-profile=test\profile_data\profile.data > test\profile_data\out.log
```
Note: meaning of the prefixes:
- `+` => fn to be kept in main wasm
- `-` => fn to be split and moved to deferred wasm
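The `+`/`-` convention can be illustrated with a minimal C++ sketch of the printing step. It assumes the profile has already been decoded into the set of function names that ran during training. `ProfileView` and `printProfile` are hypothetical names, and the real tool's exact output layout may differ.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical, simplified view of a decoded profile: the functions of
// orig.wasm in index order, plus the names of those that ran during training.
struct ProfileView {
  std::vector<std::string> functions;
  std::unordered_set<std::string> trainedIn;
};

// Prints one line per function: '+' if it stays in the main wasm,
// '-' if it would be split out into the deferred wasm.
void printProfile(const ProfileView& profile, std::ostream& out) {
  for (const auto& name : profile.functions) {
    bool keep = profile.trainedIn.count(name) > 0;
    out << (keep ? '+' : '-') << ' ' << name << '\n';
  }
}
```

Writing to a `std::ostream` rather than directly to stdout lets the same function feed the console or a `std::ostringstream` in a test.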
Basic reference types like `Type::funcref`, `Type::anyref`, etc. made it easy to
accidentally forget to handle reference types with the same basic HeapTypes but
the opposite nullability. In principle there is nothing special about the types
with shorthands except in the binary and text formats. Removing these shorthands
from the internal type representation by removing all basic reference types
makes some code more complicated locally, but simplifies code globally and
encourages properly handling both nullable and non-nullable reference types.
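The failure mode described above can be illustrated with a toy model in which a reference type is a heap type plus a nullability bit. `HeapKind`, `RefType`, and the helper functions are hypothetical and much simpler than Binaryen's real `Type`/`HeapType` classes.

```cpp
#include <cassert>

// Toy model: a reference type is a heap type plus nullability.
enum class HeapKind { Func, Any, Eq };

struct RefType {
  HeapKind heap;
  bool nullable;
  bool operator==(const RefType& other) const {
    return heap == other.heap && nullable == other.nullable;
  }
};

// A "shorthand" constant such as funcref only names the nullable variant.
constexpr RefType funcref{HeapKind::Func, true};

// Fragile: comparing against the shorthand silently misses (ref func).
bool isFuncRefFragile(RefType t) { return t == funcref; }

// Robust: inspect the heap type and treat nullability as a separate question.
bool isFuncRef(RefType t) { return t.heap == HeapKind::Func; }
```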
This starts to implement the Wasm Strings proposal
https://github.com/WebAssembly/stringref/blob/main/proposals/stringref/Overview.md
This just adds the types.
Nominal types don't make much sense without GC, and in particular trying to emit
them with typed function references but not GC enabled can result in invalid
binaries because nominal types do not respect the type ordering constraints
required by the typed function references proposal. Making this change was
mostly straightforward, but required fixing the fuzzer to use --nominal only
when GC is enabled and required exiting early from nominal-only optimizations
when GC was not enabled.
Fixes #4756.
* Updating wasm.h/cpp for DataSegments
* Updating wasm-binary.h/cpp for DataSegments
* Removed link from Memory to DataSegments and updated module-utils, Metrics and wasm-traversal
* checking isPassive when copying data segments to know whether to construct the data segment with an offset or not
* Removing memory member var from DataSegment class as there is only one memory right now. Updated wasm-validator.cpp
* Updated wasm-interpreter
* First look at updating Passes
* Updated wasm-s-parser
* Updated files in src/ir
* Updating tools files
* Last pass on src files before building
* added visitDataSegment
* Fixing build errors
* Data segments need a name
* fixing var name
* ran clang-format
* Ensuring a name on DataSegment
* Ensuring more datasegments have names
* Adding explicit name support
* Fix fuzzing name
* Outputting data name in wasm binary only if explicit
* Checking temp dataSegments vector to validateBinary because it's the one with the segments before we processNames
* Pass on when data segment names are explicitly set
* Ran auto_update_tests.py and check.py, success all around
* Removed an errant semi-colon and corrected a counter. Everything still passes
* Linting
* Fixing processing memory names after parsed from binary
* Updating the test from the last fix
* Correcting error comment
* Impl kripken@ comments
* Impl tlively@ comments
* Updated tests that remove data print when == 0
* Ran clang format
* Impl tlively@ comments
* Ran clang-format
Implement the basic infrastructure for the full WAT parser with just enough
detail to parse basic modules that contain only imported globals. Parsing
functions correspond to elements of the grammar in the text specification and
are templatized over context types that correspond to each phase of parsing.
Errors are explicitly propagated via `Result<T>` and `MaybeResult<T>` types.
Follow-on PRs will implement additional phases of parsing and parsing for new
elements in the grammar.
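The explicit error-propagation style can be sketched in C++17. This is a simplified assumption, not Binaryen's actual definition: a `Result` holds a value or an error, and a `MaybeResult` can also report that the grammar element was simply not present. `Err`, `None`, `parseU32`, and `expectU32` are hypothetical.

```cpp
#include <cassert>
#include <cctype>
#include <cstdint>
#include <string>
#include <string_view>
#include <variant>

struct Err { std::string msg; };
struct None {};

// Either a value or an error.
template <typename T> using Result = std::variant<T, Err>;
// A value, an error, or "this grammar element is not present here".
template <typename T> using MaybeResult = std::variant<T, None, Err>;

// Hypothetical grammar element: an unsigned decimal integer.
MaybeResult<uint32_t> parseU32(std::string_view in) {
  if (in.empty() || !std::isdigit(static_cast<unsigned char>(in[0]))) {
    return None{};  // not an integer at all: the caller may try another rule
  }
  uint64_t value = 0;
  for (char c : in) {
    if (!std::isdigit(static_cast<unsigned char>(c))) {
      return Err{"unexpected character"};
    }
    value = value * 10 + (c - '0');
    if (value > UINT32_MAX) {
      return Err{"integer overflow"};
    }
  }
  return static_cast<uint32_t>(value);
}

// A required element: absence is turned into an error and propagated.
Result<uint32_t> expectU32(std::string_view in) {
  auto r = parseU32(in);
  if (auto* v = std::get_if<uint32_t>(&r)) return *v;
  if (auto* e = std::get_if<Err>(&r)) return *e;
  return Err{"expected integer"};
}
```

Keeping "absent" separate from "malformed" lets optional grammar elements fall through to other rules without masking real errors.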
#4659 adds a testcase with an import of (ref $struct). This could cause an error in
the fuzzer, since it wants to remove imports (because the various fuzzers cannot pass
in custom imports - they want to just run the wasm). When it tries to remove that
import it tries to create a constant for a struct reference, and fails. To fix that, add
enough support to create structs and arrays at least in the simple case where all their
fields are defaultable.
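The "all fields defaultable" condition can be sketched with a toy field model. In the GC proposal, numeric fields default to zero and nullable references default to null, while non-nullable references have no default. `FieldType` and `canMakeDefaultStruct` are hypothetical names.

```cpp
#include <cassert>
#include <vector>

// Toy field model: either a numeric field or a reference of some nullability.
struct FieldType {
  bool isRef;
  bool nullable;  // only meaningful when isRef is true
};

// Numbers default to zero and nullable references default to null;
// a non-nullable reference has no default value.
bool isDefaultable(const FieldType& f) { return !f.isRef || f.nullable; }

// A fuzzer can conjure a constant of this struct (or array) type with a
// default-initializing allocation only if every field is defaultable.
bool canMakeDefaultStruct(const std::vector<FieldType>& fields) {
  for (const auto& f : fields) {
    if (!isDefaultable(f)) {
      return false;
    }
  }
  return true;
}
```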
This just moves code around + adds assertions.
This part of finalize is currently not used and was added in preparation
for https://reviews.llvm.org/D75277.
However, the better solution to dealing with this alternative name for
main is on the emscripten side. The main reason for this is that
doing the rename here in binaryen would require finalize to always
re-write the binary, which is expensive.
With only reference types but not GC, we cannot easily create a constant
for eqref for example. Only GC adds i31.new etc. To avoid assertions in
the fuzzer, avoid randomly picking (ref eq) etc., that is, keep it nullable
so that we can emit a (ref.null eq) if we need a constant value of that type.
The old code would short-circuit and not do anything after we managed
any reduction in the loop here. That would end up doing entire iterations of
the whole pipeline before removing another element segment, which could
be slow.
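The difference between the old and new loop shapes can be sketched as follows. `tryRemoveSegment` is a hypothetical stand-in for one reduction attempt that reports whether it succeeded. Walking the segments backwards is an assumption made here so that a removal does not shift the indices still to be tried.

```cpp
#include <cassert>
#include <functional>

// Hypothetical callback: attempt to remove segment i and report success.
using TryRemove = std::function<bool(int)>;

// Old shape: give up after the first success, so every further removal
// costs another full iteration of the outer reduction pipeline.
bool reduceShortCircuit(int numSegments, const TryRemove& tryRemoveSegment) {
  for (int i = numSegments - 1; i >= 0; i--) {
    if (tryRemoveSegment(i)) {
      return true;
    }
  }
  return false;
}

// New shape: keep trying the remaining segments after a success.
bool reduceAll(int numSegments, const TryRemove& tryRemoveSegment) {
  bool reduced = false;
  for (int i = numSegments - 1; i >= 0; i--) {
    if (tryRemoveSegment(i)) {
      reduced = true;
    }
  }
  return reduced;
}
```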
Also improve comments.
As suggested in #4647
Diff without whitespace is smaller.
We can't emit HeapType::data without GC. While fixing that by switching to func,
another problem was uncovered: makeRefFuncConst had a TODO to handle
the case where we need a function to refer to but have created none yet. In
fact, that TODO was already handled at the end of the function. Fix up the
logic in between to actually get there.
* Don't emit "i31" or "data" if GC is not enabled, as only the GC feature adds those.
* Don't emit "any" without GC either. While it is allowed, fuzzer limitations prevent
this at the moment (see details in the comment; it's fixable).
Remove `Type::externref` and `HeapType::ext` and replace them with uses of
anyref and any, respectively, now that we have unified these types in the GC
proposal. For backwards compatibility, continue to parse `extern` and
`externref` and maintain their relevant C API functions.
Previously we'd only try to remove functions from index 0, so we missed
some opportunities. With this change we still go through all the functions
if things go well, but we start from a deterministic random location in the
vector.
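Visiting every function while starting from a deterministic pseudo-random offset can be sketched with modular indexing. Here the start is drawn from a seeded `std::mt19937`, which stands in for the reducer's own deterministic randomness. `visitFrom` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Returns the order in which to visit `count` items: every index exactly
// once, starting from a seed-determined position and wrapping around.
std::vector<size_t> visitFrom(size_t count, unsigned seed) {
  std::vector<size_t> order;
  if (count == 0) {
    return order;
  }
  std::mt19937 rng(seed);
  size_t start = rng() % count;
  for (size_t i = 0; i < count; i++) {
    order.push_back((start + i) % count);
  }
  return order;
}
```

Because the generator is seeded, two runs over the same input visit the functions in the same order, which keeps reductions reproducible.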
Randomly selecting a depth is ok for structural typing, but in nominal it
must match the actual hierarchy of types.
The same module can end up with different types after some transformations, like
--roundtrip, even though the difference is not observable. Basically, we should
not compare types between separate modules, which is what the fuzzer does.
Other opcode names end with `Inxm` or `Fnxm` (where n and m are integers),
while `i8x16.swizzle`'s opcode name doesn't have an `I` in it.
As we recently noted in #4555, the fact that Feature::All and FeatureSet.setAll()
differ is potentially confusing.
I think the best thing is to make them identical. This does that, and adds a
new Feature::AllPossible which is everything possible and not just the
set of all features that are enabled by -all.
This undoes part of #4555 as now the old/simpler code works properly.
Apply the same logic to tuple fields as we do for all other fields,
when checking whether a non-nullable value is valid.
Fixes #4554
This adds a new signature-pruning pass that prunes parameters from
signature types where those parameters are never used in any function
that has that type. This is similar to DeadArgumentElimination but works
on a set of functions, and it can handle indirect calls.
Also move a little code from SignatureRefining into a shared place to
avoid duplication of logic to update signature types.
This pattern happens in j2wasm code, for example if all method functions
for some virtual method just return a constant and do not use the this
pointer.
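The core analysis can be sketched as a union of used parameters per signature. A parameter can be pruned from a signature type only if no function with that type reads it. This sketch assumes each function is already summarized by its signature id and the parameter indexes it uses (`FuncInfo` and `prunableParams` are hypothetical). The real pass also has to rewrite the signature types and all call sites, including indirect calls, which is omitted here.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <vector>

struct FuncInfo {
  int sigId;                 // which signature type the function has
  std::set<int> usedParams;  // parameter indexes the body actually reads
};

// For each signature, returns the parameter indexes that no function of that
// signature uses, i.e. the candidates for pruning.
std::map<int, std::vector<int>> prunableParams(
  const std::vector<FuncInfo>& funcs, const std::map<int, int>& numParams) {
  std::map<int, std::set<int>> used;
  for (const auto& f : funcs) {
    used[f.sigId].insert(f.usedParams.begin(), f.usedParams.end());
  }
  std::map<int, std::vector<int>> result;
  for (const auto& [sig, count] : numParams) {
    for (int i = 0; i < count; i++) {
      if (!used[sig].count(i)) {
        result[sig].push_back(i);
      }
    }
  }
  return result;
}
```

In the j2wasm pattern mentioned above, param 0 would be `this`. If no implementation of the method reads it, it shows up as prunable for that signature.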
See https://github.com/WebAssembly/extended-const
* use [[noreturn]] available since C++11 instead of compiler-specific attributes
* replace deprecated std::is_pod with is_trivial&&is_standard_layout (also available since C++11/14)
* explicitly capture this in [=] lambdas
* extra const functions in FeatureSet, fix implicit cast warning by using the features field directly
* Use CMAKE_CXX_STANDARD to ensure the C++ standard parameter is set on all targets, remove manual compiler flag workaround.
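Several of the idioms in the list above can be shown in isolation. This is an illustrative sketch with hypothetical names (`fatal`, `Point`, `Counter`, `isPodLike`), not code taken from the Binaryen sources.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <type_traits>

// Standard C++11 attribute instead of compiler-specific spellings such as
// __attribute__((noreturn)) or __declspec(noreturn).
[[noreturn]] void fatal(const char* msg) {
  std::fprintf(stderr, "fatal: %s\n", msg);
  std::abort();
}

struct Point {
  int x, y;
};

// std::is_pod is deprecated in C++20; this is the equivalent combined check.
template <typename T>
constexpr bool isPodLike =
  std::is_trivial<T>::value && std::is_standard_layout<T>::value;

static_assert(isPodLike<Point>, "Point should be plain old data");

struct Counter {
  int count = 0;

  // Implicitly capturing `this` through [=] is deprecated in C++20, so the
  // lambda names `this` (and the other captured variable) explicitly.
  auto adder(int step) {
    return [this, step]() { count += step; };
  }
};
```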
Introduce static consts with PassOptions Defaults.
Add assertion to verify that the default options are the Os options.
Also update the text in relevant tests.
Allow IndexedTypeNameGenerator to be configured with a custom prefix and also
allow it to be parameterized with an explicit fallback generator. This allows
multiple IndexedTypeNameGenerators to be composed together, for example.
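Composing name generators through a prefix and an explicit fallback can be sketched as below. Types are modeled as integer ids, and `DefaultNames` and `IndexedNames` are hypothetical. Binaryen's real generators operate on `HeapType` and have a different interface.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

using TypeId = int;  // hypothetical stand-in for a HeapType

// Last-resort generator: derives a name from the id itself.
struct DefaultNames {
  std::string getName(TypeId t) { return "type" + std::to_string(t); }
};

// Names the types in one list by index, using a configurable prefix, and
// defers to an explicitly supplied fallback generator for all other types.
template <typename Fallback = DefaultNames> struct IndexedNames {
  std::map<TypeId, size_t> indices;
  std::string prefix;
  Fallback& fallback;

  IndexedNames(const std::vector<TypeId>& types,
               std::string namePrefix,
               Fallback& fallbackGen)
    : prefix(std::move(namePrefix)), fallback(fallbackGen) {
    for (size_t i = 0; i < types.size(); i++) {
      indices.emplace(types[i], i);
    }
  }

  std::string getName(TypeId t) {
    auto it = indices.find(t);
    if (it != indices.end()) {
      return prefix + std::to_string(it->second);
    }
    return fallback.getName(t);
  }
};
```

Because the fallback is a template parameter, one indexed generator can fall back to another. This gives a chain such as outer list, then inner list, then default naming.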
Add an option for running the asyncify transformation on the primary module
emitted by wasm-split. The idea is that the placeholder functions should be able
to unwind the stack while the secondary module is asynchronously loaded, then
once the placeholder functions have been patched out by the secondary module the
stack should be rewound and end up in the correct secondary function.
Add a new fuzz checker to wasm-type-fuzzer that builds copies of the originally
built types, randomly selecting for each child type from all potential sources,
including both the originally built types and the not-yet-built duplicate types.
After building the new types, check that they are indeed identical to the old
types, which means that nothing has gone wrong with canonicalization.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The previous printing system in the Types API would print the full recursive
structure of a Type or HeapType with special markers using de Bruijn indices to
avoid infinite recursion and a separate special marker for when the size
exceeded an arbitrary upper limit. In practice, the types printed by that system
were not human readable, so all that complexity was not useful.
Replace that system with a new system that always emits a HeapType name rather
than recursing into the structure of inner HeapTypes. Add methods for printing
Types and HeapTypes with custom HeapType name generators. Also add a new
wasm-type-printing.h header with off-the-shelf type name generators that
implement simple naming schemes sufficient for tests and the type fuzzer.
Note that these new printing methods and the old printing methods they augment
are not used for emitting text modules. Printing types as part of expressions
and modules is handled by separate code in Print.cpp and the printing API
modified in this PR is mostly used for debugging. However, the new printing
methods are general enough that Print.cpp should be able to use them as well, so
update the format used to print types in the modified printing system to match
the text format in anticipation of making that change in a follow-up PR.
This makes it easier to get an overview of what methods exist by looking at the
shorter struct definition.
Add support for isorecursive types to wasm-fuzz-types by generating recursion
groups and ensuring that child types are only selected from candidates
through the end of the current group. For non-isorecursive systems, treat all
the types as belonging to a single group so that their behavior is unchanged.
Also fix two small bugs found by the fuzzer: LUB calculation was taking the
wrong path for isorecursive types and isorecursive validation was not handling
basic heap types properly.
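The selection constraint can be sketched with types identified by index and recursion groups given as consecutive sizes. A child may refer to any type up to the end of its own group, including later members of that group, but never to a type in a later group. Treating all types as one group reproduces the non-isorecursive behavior. `groupEnd` and `pickChild` are hypothetical helpers.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// groupSizes partitions type indices [0, total) into consecutive recursion
// groups. Returns the exclusive end index of the group containing `type`.
size_t groupEnd(const std::vector<size_t>& groupSizes, size_t type) {
  size_t end = 0;
  for (size_t size : groupSizes) {
    end += size;
    if (type < end) {
      return end;
    }
  }
  assert(false && "type index out of range");
  return end;
}

// Picks a child for `type` uniformly from the candidates [0, groupEnd).
size_t pickChild(const std::vector<size_t>& groupSizes,
                 size_t type,
                 std::mt19937& rng) {
  size_t end = groupEnd(groupSizes, type);
  std::uniform_int_distribution<size_t> dist(0, end - 1);
  return dist(rng);
}
```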
These might help reduction. Most newer passes, like say --type-refining, are not
going to actually help by themselves without other passes, so those are not added
(they get run in the -O2 etc. modes, which at least gives them a chance to help).
DeadArgumentElimination: Might help by itself, if just removing arguments reduces
code size. In some cases applying constants may increase code size, but
the -optimizing variant helps there.
GlobalTypeOptimization: This can remove type fields which can shrink the type
section by a lot. This is the reason I realized I should open this PR, when I
happened to notice that running that pass manually after reduction helped a lot more.
SimplifyGlobals: Can remove unused globals, merge identical immutable ones,
etc., all of which can help code size directly.
This ended up simpler than I thought. We can simply emit global and
local data as we go, creating globals as necessary to contain GC data,
and referring to them using global.get later. That will ensure that
data identity works (things referring to the same object in the interpreter
will refer to the same object when the wasm is loaded). In more detail,
each live GC item is created in a "defining global", a global that is
immutable and of the precise type of that data. Then we just read from
that location in any place that wants to refer to that data. That is,
something like

```
function foo() {
  var x = Bar(10);
  var y = Bar(20);
  var z = x;
  z.value++; // first object now contains 11
  ...
}
```

will be evalled into something like

```
var define$0 = Bar(11); // note the ++ has taken effect here
var define$1 = Bar(20);
function foo() {
  var x = define$0;
  var y = define$1;
  var z = define$0;
  ...
}
```
This PR should handle everything but "cycles", that is, GC data that at
runtime ends up forming a loop. Leaving that for later work (not sure
how urgent it is to fix).
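The identity-preserving part of the scheme can be sketched as a map from live objects to their defining globals. The first time an object is referenced, it gets a new `define$N` global holding its final contents. Every later reference to the same object becomes a read of that global. `Obj`, `DefiningGlobals`, and the `$Bar` struct type in the emitted text are hypothetical. As in the PR, cycles are not handled.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

struct Obj {
  int value;  // stand-in for the contents of a live GC object
};

class DefiningGlobals {
public:
  // Returns the expression to emit wherever `obj` is referenced.
  std::string refer(const Obj* obj) {
    auto it = names_.find(obj);
    if (it == names_.end()) {
      std::string name = "define$" + std::to_string(defs_.size());
      // One immutable global of the object's precise type per live object.
      defs_.push_back("(global $" + name + " (ref $Bar) (struct.new $Bar " +
                      "(i32.const " + std::to_string(obj->value) + ")))");
      it = names_.emplace(obj, name).first;
    }
    return "(global.get $" + it->second + ")";
  }

  const std::vector<std::string>& definitions() const { return defs_; }

private:
  std::unordered_map<const Obj*, std::string> names_;
  std::vector<std::string> defs_;
};
```

Keying on object identity rather than contents is what makes `z = x` above refer to `define$0` instead of creating a third global.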
GlobalManager is another class that added complexity in the interpreter logic,
and did not help. In fact it hurts extensibility, as when one wants to extend the
interpreter one has another class to customize, and it is templated on the main
runner, so, as with #4479, we again end up with annoying template cycles.
This simply removes that class. That makes the interpreter code strictly
simpler. Applying that change to wasm-ctor-eval also ends up fixing a
pre-existing bug, so this PR gets testing through that.
The ctor-eval issue was that we did not extend the GlobalManager properly
in the past: we checked for accesses on imported globals there, but not in
the main class, i.e., not on global.get operations. Needing to do things in
two places is an example of the previous complexity. The fix is simply to
implement visitGlobalGet in one place, and remove all the GlobalManager
logic added in ctor-eval, which then gets a lot simpler as well.
The new imported-global-2.wast checks for that bug (a global.get of an
import should stop us from evalling). Existing tests cover the other cases,
like it being ok to read a non-imported global, etc. The existing test
indirect-call3.wast required a slight change: There was a global.get of
an imported global, which was ignored in the place it happened (an init
of an elem segment); the new code checks all global.gets, so it now
catches that.
class (#4479)
As recently discussed, the interpreter code is way too complex. Trying to add
ctor-eval stuff I need, I got stuck and ended up spending some time to get rid
of some of the complexity.
We had a ModuleInstanceBase class which was basically an instance of a
module, that is, an execution of it. And internally we have RuntimeExpressionRunner
which is a runner that integrates with the ModuleInstanceBase - basically, it uses
the runtime info to execute code. For example, the MIB has globals info, and the
RER would read it from there.
But these two classes are really just one functionality - an execution of a module.
We get rid of some complexity by removing the separation between them, ending
up with a class that can run a module.
One set of problems we avoid is that we can now extend the single class in a
simple way. Before, we would need to extend both - and inform each other of
those changes. That gets "fun" with CRTP which we use everywhere. In other
words, each of the two classes depended on the other / would need to be
templated on the other. Specifically, MIB.callFunction would need to be given
the RER to run with, and so that would need to be templated on it. This ends up
leading to a bunch more templating all around - all complexity that we just
don't need. See the simplification to the wasm-ctor-eval for some of that (and
even worse complexity would have been needed without this PR in the next
steps for that tool to eval GC stuff).
The final single class is now called ModuleRunner.
Also fixes a pre-existing issue uncovered by this PR. We had the delegate
target on the runner, but it should be tied to a function scope. This happened
to not be a problem if one always created a new runner for each scope, but
this PR makes the runner longer-lived, so the stale data ended up mattering.
The PR moves that data to the proper place.
Note: Diff without whitespace is far, far smaller.
We emitted the right text to stdout to indicate a trap in one code path, but did
not return a Trap from the function. As a result, we'd continue and hit the
assert on the next line.
It is possible for type building to fail, for example if the declared nominal
supertypes form a cycle or are structurally invalid. Previously we would report
a fatal error and kill the program from inside `TypeBuilder::build()` in these
situations, but this handles errors at the wrong layer of the code base and is
inconvenient for testing the error cases.
In preparation for testing the new error cases introduced by isorecursive
typing, make type building fallible and add new tests for existing error cases.
Also fix supertype cycle detection, which it turns out did not work correctly.
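Detecting a cycle among declared supertypes can be sketched as walking each supertype chain while tracking which types are on the current chain. Types here are plain indexes, each with an optional declared supertype that is assumed to be a valid index. `findSupertypeCycle` is hypothetical and only illustrates the check. It does not show how `TypeBuilder::build()` reports the failure.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// supers[i] is the declared supertype of type i, if any (assumed in range).
// Returns a type that lies on a supertype cycle, or std::nullopt if every
// chain terminates.
std::optional<size_t> findSupertypeCycle(
  const std::vector<std::optional<size_t>>& supers) {
  enum State { Unvisited, OnChain, Done };
  std::vector<State> state(supers.size(), Unvisited);
  for (size_t start = 0; start < supers.size(); start++) {
    std::vector<size_t> chain;
    std::optional<size_t> cur = start;
    while (cur && state[*cur] == Unvisited) {
      state[*cur] = OnChain;
      chain.push_back(*cur);
      cur = supers[*cur];
    }
    if (cur && state[*cur] == OnChain) {
      return *cur;  // the chain walked back into itself
    }
    // The chain ended or reached already-verified types: all acyclic.
    for (size_t t : chain) {
      state[t] = Done;
    }
  }
  return std::nullopt;
}
```

Marking finished chains as `Done` keeps the check linear. It also avoids a classic bug: mistaking a second path into an already-verified chain for a cycle.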
Add gtest as a git submodule in third_party and integrate it into the build the
same way WABT does. Adds a new executable, `binaryen-unittests`, to execute
`gtest_main`. As a nontrivial example test, port one of the `TypeBuilder` tests
from example/ to gtest/.
Using gtest has a number of advantages over the current example tests:
- Tests are compiled and linked at build time rather than runtime, surfacing
errors earlier and speeding up test execution.
- Tests are all built into a single binary, reducing overall link time and
further reducing test overhead.
- Tests are built from the same CMake project as the rest of Binaryen, so
compiler settings (e.g. sanitizers) are applied uniformly rather than having
to be separately set via the COMPILER_FLAGS environment variable.
- Using the industry-standard gtest rather than our own script reduces our
maintenance burden.
Using gtest will lower the barrier to writing C++ tests and will hopefully lead
to us having more proper unit tests.
Eventually this will enable the isorecursive hybrid type system described in
https://github.com/WebAssembly/gc/pull/243, but for now it just throws a fatal
error if used.
This is useful for the case where we might want to finalize
without extracting metadata.
See: https://github.com/emscripten-core/emscripten/pull/15918
LiteralList overlaps with Literals, but is less efficient as it is not a
SmallVector.
Add reserve/capacity methods to SmallVector which are now
necessary to compile.
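A small-vector keeps its first N elements in inline storage and spills the rest into a heap-allocated vector. This is why it avoids allocations for short lists. The sketch below shows one plausible way to define `reserve` and `capacity` for that layout. `SmallVec` is a simplified, hypothetical class, not Binaryen's `SmallVector`, and it assumes `T` is default-constructible.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

template <typename T, size_t N> class SmallVec {
  std::array<T, N> fixed_{};  // inline storage for the first N elements
  size_t usedFixed_ = 0;
  std::vector<T> flexible_;   // overflow storage beyond N elements

public:
  void push_back(const T& x) {
    if (usedFixed_ < N) {
      fixed_[usedFixed_++] = x;
    } else {
      flexible_.push_back(x);
    }
  }

  T& operator[](size_t i) { return i < N ? fixed_[i] : flexible_[i - N]; }

  size_t size() const { return usedFixed_ + flexible_.size(); }

  // Only the part beyond the inline storage needs heap space.
  void reserve(size_t n) {
    if (n > N) {
      flexible_.reserve(n - N);
    }
  }

  size_t capacity() const { return N + flexible_.capacity(); }
};
```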
When ignoring external input, assume params have a value of 0. This
makes it possible to eval main(argc, argv) if one is careful and does
not actually use those values.
This is basically a workaround for main always receiving argc/argv,
even if the C code has no args (in that case the compiler emits
__original_main for the user's main, and wraps it with a main
that adds the args, hence the problem).
This is similar to the existing support for handling wasi_args_get
when ignoring external input, although here it just sets all the params to
zero. Perhaps it could check for main() specifically and return
1 for argc and a proper buffer for argv somehow, but I think if a program
wants to use --ignore-external-input it can avoid actually reading
argc/argv.
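Conceptually, the change synthesizes an all-zero argument list from the function's parameter types when external input is ignored. The sketch uses a hypothetical `ValType` enum and `Value` struct in place of Binaryen's `Type` and `Literal`.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

enum class ValType { I32, I64, F32, F64 };

struct Value {
  ValType type;
  int64_t bits;  // raw payload; all-zero bits also encode +0.0 for floats
};

// Builds arguments for calling a function such as main(argc, argv) without
// external input: every parameter receives the zero value of its type.
std::vector<Value> zeroArgs(const std::vector<ValType>& params) {
  std::vector<Value> args;
  args.reserve(params.size());
  for (ValType t : params) {
    args.push_back(Value{t, 0});
  }
  return args;
}
```

On wasm32, `main(argc, argv)` takes two i32 params, so both argc and the argv pointer become 0. Evaluation is only safe if the program never reads them.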