| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
(#2913)
When doing manual tuning of calls using asyncify lists, we want it to
be possible to write out all the functions that can be on the stack when
pausing, and for that to work. This did not quite work right with the
ignore-indirect option: that would ignore all indirect calls all the
time, so that if foo() calls bar() indirectly, that indirect call was
not instrumented (we didn't check for a pause around it), even if
both foo() and bar() were listed. There was no way to make that
work (except for not ignoring indirect calls at all).
This PR makes the add-list and only-lists fully instrument the functions
mentioned in them: both themselves, and indirect calls from them.
(Note that direct calls need no special handling - we can just add
the direct call target to the add-list or only-list.)
This may add some overhead to existing users, but only in a function
that is instrumented anyhow, and also indirect calls are slow anyhow,
so it's probably fine. And it is simpler to do it this way instead of
adding another list for indirect call handling.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Asyncify does a whole-program analysis to figure out the list of functions
to instrument. In
emscripten-core/emscripten#10746 (comment)
we realized that we need another type of list there, an "add list" which is a
list of functions to add to the instrumented functions list, that is, that we
should definitely instrument.
The use case in that link is that we disable indirect calls, but there is
one special indirect call that we do need to instrument. Being able to add
just that one can be much more efficient than assuming all indirect calls in
a big codebase need instrumentation. Similar issues can come up if we
add a profile-guided option to asyncify, which we've discussed.
The existing lists were not good enough to allow that, so a new option
is needed. I took the opportunity to rename the old ones to something
better and more consistent, so after this PR we have 3 lists as follows:
* The old "remove list" (previously "blacklist") which removes functions
from the list of functions to be instrumented.
* The new "add list" which adds to that list (note how add/remove are
clearly parallel).
* The old "only list" (previously "whitelist") which simply replaces the
entire list, and so only those functions are instrumented and no other.
This PR temporarily still supports the old names in the commandline
arguments, to avoid immediate breakage for our CI.
|
|
|
|
| |
To avoid the conditional trailing comma.
|
|
|
|
|
|
|
| |
The process of DCE'ing the argument handling is already handled in
a different way in standalone more. In standalone mode the entry
point is `_start` which takes no args, and argv code is included
on-demand via the presence or absence the wasi syscalls for argument
processing (__wasi_args_get/__wasi_args_sizes_get).
|
|
|
|
|
|
|
| |
anyref future semantics were changed to only represent opaque host values, and thus renamed to externref.
[Chromium](https://bugs.chromium.org/p/v8/issues/detail?id=7748#c360) was just updated to today (not yet released). I couldn't find a Mozilla bugzilla ticket mentioning externref so I don't immediately know if they've updated yet.
https://github.com/WebAssembly/reference-types/pull/87
|
|
|
|
| |
Expressions that may throw cannot be sinked into 'try'. At the start of
'try', we drop all sinkables that may throw.
|
|
|
|
|
| |
* Micro-optimize base64Decode
* Update test expectations
|
|
|
| |
As specified in https://github.com/WebAssembly/simd/pull/232.
|
|
|
|
|
|
|
|
|
| |
Instead of instrumenting every local.get, instrument parameters
on arrival at a function once on entry. After that, every local will
always contain a de-naned value (since we would denan on a
local.set). This is more efficient and also less confusing I think.
Also avoid doing anything to values that fall through as they
have already been fixed up.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This prevents `exnref.pop`s from being sinked and separated from
`catch`. For example,
```wast
(try
(do)
(catch
(local.set $0 (exnref.pop))
(call $foo
(i32.const 3)
(local.get $0)
)
)
)
```
Here, if we sink `exnref.pop` to remove `local.set $0` and
`local.get $0`, it becomes this:
```wast
(try
(do)
(catch
(nop)
(call $foo
(i32.const 3)
(exnref.pop)
)
)
)
```
This move was possible because `i32.const 3` does not have any side
effects. But this is incorrect because now `exnref.pop` does not follow
right after `catch`.
To prevent this, this patch checks this case in `canSink` in
SimplifyLocals. When we encountered a similar case in CodeFolding, we
prevented every expression that contains `Pop` anywhere in it from being
moved, which was too conservative. This adds `danglingPop` property in
`EffectAnalyzer`, so that only pops that are not enclosed within a
`catch` count as 'dangling pops` and we only prevent those pops from
being moved or sinked.
|
|
|
|
|
|
| |
After #2783 `SideEffects::Branches` includes possibly throwing
expressions, which can be calls (when EH is enabled). This changes
`SideEffects::Branches` back to only include branches, returns, and
infinite loops as it was before #2783.
|
|
|
|
| |
In CodeFolding, we should not take an expression that may throw out of a
`try` scope. This patch adds this restriction in `canMove`.
|
|
|
|
|
|
| |
This moves the fuzzer de-NaN logic out into a separate pass. This is
cleaner and also better since the old way would de-NaN once, but then
the reducer could generate code with nans. The new way lets us de-NaN
while reducing.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
(#2862)
In the .debug_loc section the Start/End address offsets in a location list are
relative to the address of the compilation unit that refers that location list.
There is a problem in function wasm::Debug:: updateLoc(), which compares these
offsets with the actual module addresses of expressions and functions, causing
the generation of invalid location lists.
The fix is not trivial, because the DWARF debug_loc section does not specify
which is the compilation unit associated to each location list entry.
A simple workaround is to store, in LocationUpdater, a map of location list
offsets to the base address of the compilation units referencing them, and that
can be easily calculated in updateDIE().
|
|
|
|
|
|
|
|
|
| |
The special-casing of unreachable there could lead to bad
behavior, where we did nothing to the unreachable and ended
up moving something with side effects before it, see testcase
in test/passes/flatten_all-features.wast.
This emits less efficient code, but only if --dce was not run
earlier, so probably not worth optimizing.
|
|
|
|
|
|
| |
Push and Pop have been superseded by tuples for their original
intended purpose of supporting multivalue. Pop is still used to
represent block arguments for exception handling, but there are no
plans to use Push for anything now or in the future.
|
| |
|
|
|
|
|
|
|
| |
These are now implemented in assembly as part of emscripten's
compiler-rt.
See: https://github.com/emscripten-core/emscripten/pull/11166
|
|
|
|
|
|
|
| |
- `br_on_exn`'s target block cannot be optimized to have a separate
return value. This handles that in `SimplifyLocals`.
- `br_on_exn` and `rethrow` can trap (when the arg is null). This
handles that in `EffectAnalyzer`.
- Fix a few nits
|
|
|
|
| |
This is the only instruction in the current spec proposal that had not
yet been implemnented in the tools.
|
|
|
|
| |
Turns out we had a testcase for this already, but were doing the
wrong thing on it.
|
|
|
|
|
|
|
|
|
|
| |
This change was generated by running:
$ ./scripts/test/generate_lld_tests.py --binaryen-bin=$PWD/../binaryen-out/bin/ $PWD/../llvm-build/bin/ $PWD/../emscripten
Then:
$ ./auto_update_tests.py --binaryen-bin=../binaryen-out/bin/ lld
|
|
|
|
|
| |
This test verifies that functions in the llvm input source that
do stack pointer manipulation get correctly handled by
`wasm-emscripten-finalize --check-stack-overflow` (StackLimitEnforcer)
|
|
|
|
|
|
|
|
|
|
|
|
| |
In `ReFinalize`'s branch handling, `updateBreakValueType` is supposed to
be executed only when the branch itself is not replaced with its
argument (because it is guaranteed not to be taken).
Also this moves `visitBrOnExn` from `RuntimeExpressionRunner` to its
base class `ExpressionRunner`, because it does not depend on anything on
the runtime instance to work. This is effectively NFC for now because
`visitTry` is still only implemented only in `RuntimeExpressionRunner`
because it relies on multivalue handling of it, and without it we cannot
create a valid exception `Literal`.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously we were able to omit the new syntax `do` when `try` body is
empty. This makes `do` clause mandatory, so when a `try` body is empty,
the folded text format will be
```
(try
(do)
(catch
...
)
```
Suggested in
https://github.com/WebAssembly/exception-handling/issues/52#issuecomment-626696720.
|
|
|
|
|
|
|
| |
This adds support for `throw`, `rethrow`, and `br_on_exn` in
MergeBlocks. While unrelated instructions within blocks can be hoisted
as in other instructions, `br_on_exn` requires a special handling in
`ProblemFinder`, because unlike `br_if`, its `exnref` argument itself
cannot be moved out of `br_on_exn`.
|
|
|
| |
As specified in https://github.com/WebAssembly/simd/pull/122.
|
|
|
|
|
|
| |
This API enables use cases where we want to keep the original expression, yet utilize passes like `vacuum` or `precompute` to evaluate it without implicitly modifying the original.
C-API: **BinaryenExpressionCopy**(expr, module)
JS-API: **Module#copyExpression**(expr)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In WebAssembly/exception-handling#52, We decided to put `try` bodies in
a `do` clause to be more consistent with `catch`.
- Before
```wast
(try
...
(catch
...
)
)
```
- After
```wast
(try
(do
...
)
(catch
...
)
)
```
Another upside of this change is when there are multiple instructions
within a `try` body, we no longer need to wrap them in a `block`.
|
|
|
|
| |
This adds missing handlings for `throw` and `rethrow` in DCE. They
should set `reachable` variable to `false`, like other branches.
|
|
|
|
|
|
| |
This feature was very useful in the early days of the C API,
but has not shown usefuless for quite a while, and has a
significant maintenance burden, so it it's makes sense to
remove it now.
|
|
|
|
|
|
|
|
|
| |
This adds interpreter support for EH instructions. This adds
`ExceptionPackage` struct, which contains info of a thrown exception (an
event tag and thrown values), and the union in `Literal` can take a
`unique_ptr` to `ExceptionPackage`. We need a destructor, a copy
constructor, and an assignment operator for `Literal`, because the union
in `Literal` now has a member that cannot be trivially copied or
deleted.
|
|
|
| |
As described in https://github.com/WebAssembly/simd/pull/209.
|
|
|
|
|
|
|
| |
This allows emscripten to statically set the initial value of the
stack pointer.
Should allow use to avoid doing it dynamically at startup:
https://github.com/emscripten-core/emscripten/pull/11031
|
|
|
|
| |
This list is identical to the export list no there is no need to
output this twice.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We had some ad-hoc tuning of which nodes to emit more
frequently in the fuzzer, but it wasn't very good. Things like
loads and stores for example were far too rare. Also it wasn't
easy to adjust the frequencies.
This adds a simple way to adjust them, by passing a size_t
which is the "weight" of that node. Then it just makes that
number of copies of it, making it more likely to be picked.
Example output comparison:
node before after
================================
binary 281 365
block 898 649
break 278 144
call 182 290
call_indirect 9 42
const 808 854
drop 43 92
global.get 440 398
global.set 223 171
if 335 254
load 22 84
local.get 429 301
local.set 434 211
loop 176 99
nop 117 54
return 264 197
select 8 33
store 1 39
unary 405 304
unreachable 1 2
Lots of noise here obviously, but there are large increases
for loads and stores compared to before.
Also add a testcase of random data of the typical size the
fuzzer runs, and print metrics on it. This might help us get
a feel for how future tuning changes affect frequencies.
|
|
|
| |
Since the --roundtrip pass is more general than --fuzz-binary anyways. Also reimplements `ModuleUtils::clearModule` to use the module destructor and placement new to ensure that no members are missed.
|
|
|
|
| |
type. fixes #2807 (#2808)
|
| |
|
| |
|
|
|
|
|
|
| |
Without this change only the import gets renamed not the internal
name. Since the internal name is the one that ends up in the name
section this means that rename wasn't effecting the name section.
|
|
|
|
|
|
|
| |
Refactors most of the precompute pass's expression runner into its
base class so it can also be used via the C and JS APIs. Also adds
the option to populate the runner with known constant local and global
values upfront, and remembers assigned intermediate values as well
as traversing into functions if requested.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This adds dummy interpreter support for EH instructions, mainly for
fuzzing. The plan is to make the interpreter support for EH instructions
correctly using Asyncify in the future. Also to support the correct
behavior we will need a `Literal` of `exnref` type too, which will be
added later too.
Currently what this dummy implementation does is:
- `try`-`catch`-`end`: only runs `try` body and ignores `catch` body
- `throw`: traps
- `retyrow`:
- Traps on nullref argument (correct behavior based on the spec)
- Traps otherwise too (dummy implementation for now)
- `br_on_exn`:
- Traps on nullref (correct behavior)
- Otherwise we assume the current expression matches the current event
and extracts a 0 literal based on the current type.
This also adds some interpreter tests, which tests the basic dummy
behaviors for now. (Deleted tests are the ones that weren't tested
before.)
|
|
|
|
|
|
|
|
| |
Emit tuple.make, tuple.extract, and multivalue control flow, and tuple locals
and globals when multivalue is enabled. Also slightly refactors the top-level
`makeConcrete` function to be more selective about what it tries to
make based on the requested type to reduce the number of trivial nodes
created because the requested type is incompatible with the requested
node.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The main benefit here is comparing VMs, instead of just comparing
each VM to itself after opts. Comparing VMs is a little tricky since there
is room for nondeterminism with how results are printed and other
annoying things, which is why that didn't work well earlier.
With this PR I can run 10's of thousands of iterations without finding
any issues between v8 and the binaryen interpreter. That's after
fixing the various issues over the last few days as found by this:
#2760 #2757 #2750 #2752
Aside from that main benefit I ended up adding more improvements
to make it practical to do all that testing:
Randomize global fuzz settings like whether we allow NaNs and
out-of-bounds memory accesses. (This was necessary here since
we have to disable cross-VM comparisons if NaNs are enabled.)
Better logging of statistics like how many times each handler
was run.
Remove redundant FuzzExecImmediately handler (looks like
after past refactorings it was no longer adding any value).
Deterministic testcase handling: if you run e.g. fuzz_opt.py 42 it
will run one testcase and exactly the same one. If you run without
an argument it will run forever until it fails, and if it fails, it prints out
that ID so that you can easily reproduce it (I guess, on the same
binaryen + same python, not sure how python's deterministic
RNG changes between versions and builds).
Upgrade to Python 3.
|
|
|
|
|
|
|
|
|
|
| |
Previously we tried to reuse `Const` node if a precomputed value is a
constant node. But now we have two more kinds of constant node
(`RefNull` and `RefFunc`), so we shouldn't reuse them interchangeably,
meaning we shouldn't try to reuse a `Const` node when the value at hand
is a `RefNull`. This correctly checks the type of node and tries to
reuse only if the types of nodes match.
Fixes #2759.
|
|
|
|
|
|
| |
When it is certain that the try body does not throw, we can replace the
try-catch with the try body. But in this case we have to notify the type
updater that the catch body is removed, so that all parents' type should
be updated properly.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
I am working to bring up the fuzzer on comparisons between VMs.
Comparing between the binaryen interpreter and v8, it found some
atomics issues:
Atomic operations, including loads and stores, must be aligned
or they trap.
AtomicRMW did the wrong thing with the operands.
AtomicCmpxchg must wrap the input to the proper size (if we
only load 1 byte, only look at 1 byte of the expected value too).
AtomicWait and AtomicNotify must take into account their
offsets. Also SIMDLoadExtend was missing that. This was
confusing in the code as two getFinalAddresses existed,
one that doesn't compute with an offset, and one that does.
I renamed the one without to getFinalAddressWithoutOffset
so it's explicit and we can easily see we only call that one on
an instruction without an offset (which is the case for
MemoryInit, MemoryCopy, and MemoryFill).
AtomicNotify must check its address to see if it should trap,
even though we don't actually have multiple threads running.
Atomic loads of fewer bytes than the type always do an
unsigned extension, not signed.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Instead of using indices into the global interned type table. This
means that a lock is *never* needed to access an expanded Type. The
Type lock is now only acquired when a complex Type is created. On a
real-world wasm2js workload this improves wall clock time by 23% on my
machine with 72 cores and makes traffic on the Type lock entirely
insignificant.
**Before**
72 cores
real 0m6.914s
user 184.014s
sys 0m3.995s
1 core
real 0m25.903s
user 0m25.658s
sys 0m0.253s
**After**
72 cores
real 5.349s
user 70.309s
sys 9.691s
1 core
real 25.859s
user 25.615s
sys 0.253s
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In wasm2js we ignore things that trap in wasm that we can't
really handle, like a load from memory out of bounds would
trap in wasm, but in JS we don't want to emit a bounds check
on each load. So wasm2js focuses on programs that don't
trap.
However, this is annoying in the fuzzer as it turns out that
our behavior for places where wasm would trap was not
deterministic. That is, wasm would trap, wasm2js would not
trap and do behavior X, and wasm2js with optimizations
would also not trap but do behavior Y != X. This produced
false positives in the fuzzer (and might be annoying in
manual debugging too).
As a workaround, this adds a --deterministic flag to wasm2js,
which tries to be deterministic about what it does for cases
where wasm would trap. This handles the case of an int
division by 0 which traps in wasm but without this flag could
have different behavior in wasm2js with or without opts
(see details in the patch).
|