| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
| |
Export an object with a `.value` property like the wasm JS API does
in browsers, and implement it with a getter and setter.
Fixes #4522
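A minimal sketch of the idea, not the code that is actually emitted: the names `makeGlobal`, `counter` and `exportedGlobal` are hypothetical, and the `| 0` coercion is an assumption about how an i32 global might be stored. It only illustrates exposing a `.value` property through a getter and a setter.

```javascript
// Hypothetical illustration: wrap module-internal state in an object whose
// `.value` property reads and writes it, similar in shape to WebAssembly.Global.
function makeGlobal(getValue, setValue) {
  return {
    get value() { return getValue(); },
    set value(v) { setValue(v); },
  };
}

// Stand-in for a wasm global that lives inside the module.
let counter = 7;
const exportedGlobal = makeGlobal(
  () => counter,
  (v) => { counter = v | 0; } // assumed coercion to a 32-bit integer
);

exportedGlobal.value = 41.9;           // the write goes through the setter
const readBack = exportedGlobal.value; // the read goes through the getter
```

With a getter and setter the exported object holds no copy of the value, so reads always reflect the module's current state.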
|
|
|
|
|
|
| |
Without this, the result in a build without assertions might be quite
confusing. See #4410
Also make the internal names more obviously internal names.
|
| |
|
|
|
|
|
|
| |
Also, fix bug where pointer was being used directly to
index into Int32Array. I suppose this code had basically
zero users until I tried to land this change in emscripten:
https://github.com/emscripten-core/emscripten/pull/15742
|
|
|
|
|
|
|
|
|
|
|
|
| |
This removes the old hardcoded value numbering in that pass and makes
it use the new code that was split into helper code. The immediate benefit
of this is to make the code aware of identical constants: if two locals have
the same constant then they do not interfere. Future improvements to
numbering will also automatically help here.
This changes some constants in existing tests so that they keep testing
what they were testing before, and adds new tests for the new benefit here.
This implements a proposed TODO from #4314
|
|
|
|
|
|
| |
It seems that with this emscripten change DCE is able to remove
the `assert` JS runtime function, making this call to assert fail
with `ReferenceError: assert is not defined`.
|
| |
The old algorithm can be summarized as: In each basic block, start at the beginning.
Each pair of live locals there might interfere with each other, as they might arrive from
different entry blocks with different values. Afterwards, go through the block and find
overlapping live ranges, and mark interferences there as well.
This is non-linear because at the start of the block we do a double-loop over all
pairs of live locals, which in general can be O(N^2) (N - number of locals). It also
has the downside of ignoring copies: if two locals have overlapping live ranges but
they must have identical values on those ranges, they do not actually interfere,
for example
  x = 10;
  y = x;
  .. // live ranges overlap here
  foo(x, y); // live ranges end here.
We can ignore this overlap since the copy shows they are identical there, but the
pass did not take this into account. To some extent other passes can remove such
copies (SimplifyLocals, MergeLocals, RedundantSetElimination), but in general
this was a weak spot for the optimizer.
I realized there is a solution to both these problems: In Wasm, given that we have
a default value for all locals, if a local is live at the start of a block then it must be
live at the end of all the blocks reaching it. That is so because the liveness will
extend backwards all the way to some set of the local, possibly all the way to
the zero-initialization at the start of the function, and it extends that way through
all predecessor blocks. A consequence of this is that there are no interferences
between locals that only occur during a merge: The live ranges include the
predecessor blocks, and theirs, and so forth, until we reach a block where one
of the locals is assigned a value different than the other. That is a necessary and
sufficient condition for interference, and therefore when processing a block we
only need to look at its contents, and can ignore the merging of control flow,
which allows us to be linear.
More details on this and on the new algorithm in comments in the source, but
the basic idea is that it simply goes through each block in a linear way, finding
which values are assigned to each local (using a numbering of unique values),
and noting which are live at each time. If two locals are live and one is assigned
a value that is not the same as the value in the other, mark them as interfering.
This is of substantial benefit to j2wasm output, I believe because it is common
there to find local subexpression elimination opportunities after inlining, and
each time we find one we add a local. If we inline different functions into the
same target, we may end up with copied locals for each of them. (This was
not noticed in the past because it is very rare on LLVM output, which has
already had inlining and GVN etc. done.)
There is a small benefit to LLVM output as well, though just a few
percent at best. However, it is enough to be noticeable on some of
the code size tests.
This is also faster than the previous pass. It's normally not noticeable
as this pass is not one of the slowest anyhow, but I found some real-world
codebases where the pass becomes 50% faster. I have not found any
case where it is slower than the old algorithm.
Fuzzed over several days to be sure this is correct, and also verified
on the emscripten test suite.
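The per-block scan described above can be sketched roughly as follows. This is a simplified illustration and not the pass itself: the step format (`set`, `copyOf`, `constant`, `liveAfter`) and the function name `findInterferences` are hypothetical, liveness is assumed to be precomputed, and a live local with no known value is conservatively treated as different.

```javascript
// Hypothetical sketch. Each step assigns one local:
//   { set: "y", copyOf: "x", liveAfter: [...] }  or
//   { set: "x", constant: 10, liveAfter: [...] }
// `liveAfter` lists the locals that are live right after the step.
function findInterferences(steps) {
  const valueOf = new Map();     // local -> value number
  const constantIds = new Map(); // constant -> value number (identical constants share one)
  const interferences = new Set();
  let nextId = 0;

  for (const step of steps) {
    let id;
    if (step.copyOf !== undefined && valueOf.has(step.copyOf)) {
      id = valueOf.get(step.copyOf); // a copy holds the same value as its source
    } else if (step.constant !== undefined) {
      if (!constantIds.has(step.constant)) {
        constantIds.set(step.constant, nextId++);
      }
      id = constantIds.get(step.constant);
    } else {
      id = nextId++; // unknown value: give it a fresh number
    }
    valueOf.set(step.set, id);

    // A dead store cannot interfere with anything.
    if (!step.liveAfter.includes(step.set)) continue;

    // The assigned local interferes with every other live local
    // that holds a different value number.
    for (const other of step.liveAfter) {
      if (other !== step.set && valueOf.get(other) !== id) {
        interferences.add([step.set, other].sort().join(","));
      }
    }
  }
  return interferences;
}

// x = 10; y = x; both live afterwards: identical values, so no interference.
const copyExample = findInterferences([
  { set: "x", constant: 10, liveAfter: ["x"] },
  { set: "y", copyOf: "x", liveAfter: ["x", "y"] },
]);

// x = 10; y = 20; both live afterwards: different values, so they interfere.
const distinctExample = findInterferences([
  { set: "x", constant: 10, liveAfter: ["x"] },
  { set: "y", constant: 20, liveAfter: ["x", "y"] },
]);
```

The scan never looks at pairs of live locals at block entry; work is only done at each assignment, which is what avoids the quadratic start-of-block loop.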
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Canonicalize:
(signed)x > -1 ==> x >= 0
(signed)x <= -1 ==> x < 0
(signed)x < 1 ==> x <= 0
(signed)x >= 1 ==> x > 0
(unsigned)x < 1 ==> x == 0
(unsigned)x >= 1 ==> x != 0
This should help #4265, and in general 0 is usually a more
common constant, and reasonable to canonicalize to.
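As a quick sanity check of these equivalences (an illustration only, with JS numbers standing in for wasm i32 values; the sample list is arbitrary), `| 0` gives the signed view of a value and `>>> 0` the unsigned view:

```javascript
// Boundary values plus a few small ones; an arbitrary sample, not an exhaustive proof.
const samples = [-2147483648, -2, -1, 0, 1, 2, 2147483647];

const allEquivalent = samples.every((raw) => {
  const s = raw | 0;   // signed 32-bit interpretation
  const u = raw >>> 0; // unsigned 32-bit interpretation
  return (
    (s > -1) === (s >= 0) &&
    (s <= -1) === (s < 0) &&
    (s < 1) === (s <= 0) &&
    (s >= 1) === (s > 0) &&
    (u < 1) === (u === 0) &&
    (u >= 1) === (u !== 0)
  );
});
```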
|
|
|
|
|
|
|
|
|
|
|
|
| |
If all a select's inputs are boolean, we can sometimes turn the select
into an AND or an OR operation,
x ? y : 0 => x & y
x ? 1 : y => x | y
I believe LLVM aggressively canonicalizes to this form. It makes sense
to do here too as it is smaller (save the constant 0 or 1). It also allows
further optimizations (which is why LLVM does it) but I don't think we
have those yet.
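The two rewrites can be checked over all boolean inputs with a small script. This is an illustration only; `select` here is a hypothetical JS stand-in for the wasm instruction, which chooses between two already-evaluated operands.

```javascript
// Stand-in for wasm `select`: both operands are already evaluated.
const select = (ifTrue, ifFalse, condition) => (condition ? ifTrue : ifFalse);

let selectMatchesBitwise = true;
for (const x of [0, 1]) {
  for (const y of [0, 1]) {
    if (select(y, 0, x) !== (x & y)) selectMatchesBitwise = false; // x ? y : 0
    if (select(1, y, x) !== (x | y)) selectMatchesBitwise = false; // x ? 1 : y
  }
}

// With a non-boolean input the rewrite is not valid:
// 2 ? 1 : 0 is 1, but 2 & 1 is 0.
const nonBooleanDiffers = select(1, 0, 2) !== (2 & 1);
```

The counterexample shows why the inputs must all be known to be boolean before the rewrite applies.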
|
|
|
|
|
|
|
|
|
|
|
|
| |
Enable it in -O3 and -Os and higher.
This helps very little on output from LLVM, but also it does not alter
compile times much anyhow. On code that has not been run through
an optimizing compiler already, this can help quite a lot, e.g., 15% of
code size on some wasm GC samples.
This will not normally help with speed, as optimizing VMs do such
things anyhow. However, this can help baseline compilers and
interpreters and so forth.
|
| |
Fixes #3973
Loads:
f32.reinterpret_i32(i32.load(x)) => f32.load(x)
f64.reinterpret_i64(i64.load(x)) => f64.load(x)
i32.reinterpret_f32(f32.load(x)) => i32.load(x)
i64.reinterpret_f64(f64.load(x)) => i64.load(x)
Stores:
f32.store(y, f32.reinterpret_i32(x)) => i32.store(y, x)
f64.store(y, f64.reinterpret_i64(x)) => i64.store(y, x)
i32.store(y, i32.reinterpret_f32(x)) => f32.store(y, x)
i64.store(y, i64.reinterpret_f64(x)) => f64.store(y, x)
Also optimize reinterprets that are undone:
i32.reinterpret_f32(f32.reinterpret_i32(x)) => x
i64.reinterpret_f64(f64.reinterpret_i64(x)) => x
f32.reinterpret_i32(i32.reinterpret_f32(x)) => x
f64.reinterpret_i64(i64.reinterpret_f64(x)) => x
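The load rule and the round-trip rule can be illustrated in JS with a `DataView`. This is only a sketch: the helper names are hypothetical stand-ins for the wasm instructions, and little-endian access is used to match wasm memory.

```javascript
const memory = new DataView(new ArrayBuffer(8));

// Stand-ins for wasm loads (wasm memory is little-endian).
const i32Load = (addr) => memory.getInt32(addr, true);
const f32Load = (addr) => memory.getFloat32(addr, true);

// Stand-ins for the reinterpret instructions: same bits, different type.
const scratch = new DataView(new ArrayBuffer(4));
function f32ReinterpretI32(bits) {
  scratch.setInt32(0, bits, true);
  return scratch.getFloat32(0, true);
}
function i32ReinterpretF32(value) {
  scratch.setFloat32(0, value, true);
  return scratch.getInt32(0, true);
}

memory.setFloat32(0, 1.5, true);

// f32.reinterpret_i32(i32.load(x)) reads the same bits as f32.load(x).
const viaReinterpret = f32ReinterpretI32(i32Load(0));
const directLoad = f32Load(0);

// A reinterpret that is immediately undone yields the original bits.
const roundTrip = i32ReinterpretF32(f32ReinterpretI32(0x3fc00000));
```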
|
| |
(select
  (foo
    (X)
  )
  (foo
    (Y)
  )
  (condition)
)
=>
(foo
  (select
    (X)
    (Y)
    (condition)
  )
)
To make this simpler, refactor optimizeTernary to be templated.
|
|
|
|
|
|
|
|
| |
The passive keyword has been removed from spec's text format, and now
any data segment that doesn't have an offset is considered as passive.
This PR removes that from both the parser and the Print pass, plus all tests
that used that syntax.
Fixes #2339
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This implements emscripten-core/emscripten#13744
Inlining functions with a single use allows us to remove the function afterward.
That looks highly beneficial, shrinking every single benchmark in emscripten's
benchmark suite, by an average of 2% on the macrobenchmarks and 3.5% on
all of them. Speed also improves, although mostly on the microbenchmarks so
that might be less realistic.
There may be a slight downside to startup time due to emitting larger functions,
but given the baseline compilers in VMs these days it seems worth it, as the
delay would be just to get to the upper tier. On the benchmark suite the risk
seems low.
See more details in the PR above.
|
|
|
| |
Support has been there all along, but we didn't have a reference test of it.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
- atomic.notify -> memory.atomic.notify
- i32.atomic.wait -> memory.atomic.wait32
- i64.atomic.wait -> memory.atomic.wait64
See WebAssembly/threads#149.
This renames instruction name printing but not the internal data
structure names, such as `AtomicNotify`, which are not always the same
as printed instruction names anyway. This also does not modify C API.
But this fixes interface functions in binaryen.js because it seems
binaryen.js's interface functions all follow the corresponding
instruction names.
|
|
|
|
|
|
|
|
|
| |
This is because we may need to reference the segments
during the start function. For example in the case of
pthreads we conditionally load passive segments during
start.
Tested in emscripten with: tests/runner.py wasm2js1
|
|
|
|
|
| |
The asmFunc now sets the outer scope's `bufferView` variable
as well as its own internal views.
|
| |
|
|
|
|
|
|
|
| |
Using addition in more places is better for gzip, and helps simplify the
optimizer as well.
Add a FinalOptimizer phase to do optimizations like our signed LEB tweaks, to
reduce binary size in the rare case when we do want a subtraction.
|
| |
|
|
|
| |
We can only pack memory if we know it is zero-filled before us.
|
|
|
|
|
| |
Selectify turns an if-else into a select where possible. Previously we abandoned
hope if any part of the if had a side effect. But it's fine for the condition to have a
side effect, so long as moving it to the end doesn't invalidate the arms.
|
| |
| |
The DCE pass is one of the oldest in binaryen, and had quite a lot of
cruft from the changes in unreachability and other stuff in wasm and
binaryen's history. This PR rewrites it from scratch, making it about
1/3 the size.
I noticed this when looking for places to use code autogeneration.
The old version had annoying boilerplate, while the new one avoids
any need for it.
There may be noticeable differences, as the old pass did more than
it needed to. It overlapped with remove-unused-names for some
reason I don't remember. The new pass leaves that to the other
pass to do. I added another run of remove-unused-names to avoid
noticeable differences in optimized builds, but you can see
differences in the testcases that only run DCE by itself. (The test
differences in this PR are mostly whitespace.)
(The overlap is that if a block ended up not needed, that is, all
branches to it were removed, the old DCE would remove the block.)
This pass is about 15% faster than the old version. However, when
adding another run of remove-unused-names the difference
basically vanishes, so this isn't a speedup.
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
| |
This will allow for the complete removal of
`__growWasmMemory` as a followup. We currently
unconditionally generate this function
in `generateMemoryGrowthFunction`.
See #3180
|
|
|
| |
It's not an actual constructor, just a JS function that returns the object.
|
|
|
|
|
| |
Also, format the asmFunc call to make it more readable in the ES6
modules case.
|
|
|
|
|
|
|
|
|
|
|
|
| |
These test output files are ignored and so contain stale output
that is neither checked during `check.py` nor updated during
`auto_update_tests.py`.
There are three classes of tests here:
1. Spec tests that end in 64.wast are ignored by scripts/test/wasm2js.py
2. Spec tests that are globally ignored by shared.py:SPEC_TESTS_TO_SKIP
3. hello_world.2asm.js. I can't tell where this came from; it seems
like an anomaly.
|
| |
|
|
|
| |
Add floating point Eq and Ne operators to Properties::isSymmetric. Also treat additional float ops as symmetric specifically in OptimizeInstructions when their operands are known to be non-NaN.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
It was hardcoded as "env.memory", which is usually correct. But if we minify
import names, as in -O3 in emscripten, we need to use the minified name.
Note how in the test it now emits
var memory = env.a;
for the import.
Fixes emscripten-core/emscripten#12123
This was not noticed earlier since that import is only used in memory
growth. The tests that would catch it are wasm2js3.test*memory_growth*
but we only run wasm2js1 on CI. I'll add testing after this lands.
|
| |
Previously we used "Top" for both exports and the top level
(which has functions and globals). The warning about name
collisions there was meant only for exports (where a name that
collides, and so must be renamed, means that there will
be an externally-visible oddness for the user). But it applied
to functions too, which could be annoying, and was not
dangerous (at worst, it might be confusing when reading the
emitted JS and seeing NAME_1, NAME_2, but there is no
effect on execution or on exports).
To fix this, add a new Export name scope. This separates
function names from export names. However, it runs into
another issue which is that when checking for a name conflict
we had a big set of all the names in all the scopes. That is,
FOO would only ever be used in one scope, period, and
other appearances of that Name in wasm would get a
suffix. As a result, if an exported function FOO has the name
foo, we'd export it as FOO but name the function FOO_1
which is annoying. To fix that, keep sets of all names in each
scope. When mangling a name we can then only care about
the relevant scope, EXCEPT for local names, which must
also not conflict with function names. That is, this would be
bad:
  function foo(bar) {
    var bar = 0;
  }
  function bar() { ..
It's not ok to call a parameter "bar" if there is a function by
that name (well, it could be if it isn't called in that scope).
So when mangling the Local scope, also check the Top one
as well.
The test output changes are due to non-overlapping scopes,
specifically Local and Label. It's fine to have
  foo : while(1) {
    var foo = 5;
  }
Those "foo"s do not conflict.
Fixes emscripten-core/emscripten#11743
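Both scoping claims can be checked directly in JS. This is a small illustration with hypothetical function names: labels live in their own namespace, while a local does shadow a function of the same name.

```javascript
// A label and a local with the same name do not conflict.
function labelAndLocal() {
  var iterations = 0;
  foo: while (1) {
    var foo = 5; // a local named `foo`
    iterations++;
    break foo;   // refers to the label, not the variable
  }
  return [foo, iterations];
}
const [fooValue, loopIterations] = labelAndLocal();

// A local named like a function shadows that function inside the scope.
function bar() { return "function bar"; }
function shadowsBar(bar) {
  var bar = 0;       // same binding as the parameter
  return typeof bar; // the outer function `bar` is not reachable by name here
}
const typeInsideScope = shadowsBar(123);
```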
|
|
|
|
| |
optimizeBoolean does not receive a boolean, it is done when the
output flows into a boolean context.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
It is usually fine to do if (x | 0) => if (x) since it just cares if the
value is 0 or not. However, if the cast turns it into 0, then that is
incorrect, which the fuzzer found as
-2147483648 + -2147483648 | 0
(the sum is -2^32, which | 0 is 0).
We can maybe look into doing this in a safe way, but for now
just remove it. It doesn't have a big impact on code size as this
is pretty rare (e.g. the minimal runtime code size test is not
broken by this).
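The fuzzer's case can be reproduced in plain JS (variable names are illustrative only). The sum does not fit in 32 bits, so it is nonzero as a double but becomes 0 once `| 0` wraps it:

```javascript
const sum = -2147483648 + -2147483648; // exact as a JS double, outside the i32 range
const wrapped = sum | 0;               // ToInt32 wraps modulo 2^32

// `if (x | 0)` and `if (x)` disagree for this x:
const branchWithCast = wrapped ? "taken" : "not taken";
const branchWithoutCast = sum ? "taken" : "not taken";
```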
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We emit FUNCTION_TABLE[ptr], where FUNCTION_TABLE is a JS
array. That is a rare case where true is handled differently than 1
(a typed array or an add would cast, etc.), so we must explicitly cast
there.
Fixes an issue that existed before, but became a problem due to
#2869 which optimized some selects into a form that emitted a true
or a false, and if that was a function pointer, it could be bad, see
https://app.circleci.com/pipelines/github/emscripten-core/emscripten/6699/workflows/0c4da49c-75d0-4b0a-8fac-686a8330a3fe/jobs/336520
The new test/wasm2js/indirect-select.2asm.js.opt output shows
what happened there.
Verified as passing emscripten's wasm2js1 wasm2js2 test suites.
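The difference between `true` and `1` as an index into a plain JS array can be seen in a few lines. This is an illustration with a hypothetical two-entry table, not the emitted code:

```javascript
// Hypothetical function table with two entries.
const FUNCTION_TABLE = [
  function zero() { return "zero"; },
  function one() { return "one"; },
];

const ptr = true; // e.g. an optimized select that produced a boolean

// Indexing a plain array with `true` looks up the property named "true".
const withoutCast = FUNCTION_TABLE[ptr];

// An explicit cast turns it into index 1.
const castCallResult = FUNCTION_TABLE[ptr | 0]();

// Contexts that coerce on their own: arithmetic, and stores into a typed array.
const addResult = ptr + 1;
const heap = new Int32Array(1);
heap[0] = ptr;
```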
|
|
|
|
|
| |
x ? 1 : 0 => !!x
and so forth.
|
|
|
|
|
|
|
|
|
|
|
| |
i64 reinterprets were lowered in the i64 pass, and i32s at the very end, in
wasm2js itself. This could break since in between the i64 pass and wasm2js
we run optimizations, and the optimizer was not aware of what we lower
the i32 reinterprets to - calls to use scratch memory. Those calls have a
side effect of altering scratch memory. The optimizer just saw an i32
reinterpret, and moved it across the i64 reinterpret's scratch memory calls.
This makes 32-bit reinterprets use separate scratch memory from 64-bit ones,
which means they can never interfere with each other.
|
|
|
|
|
|
| |
The usual "trick" to extend: shift left so the sign bit in the small
integer is now the sign bit in a 32-bit integer, then shift right to
spread that sign bit out and return the lower bits to their
proper place, (x << 24) >> 24.
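A short JS illustration of the trick follows. The function names are hypothetical, and the 16-bit variant is an added assumption that applies the same pattern with a shift of 16.

```javascript
// Sign-extend the low 8 bits of x to a 32-bit integer.
function signExtend8(x) {
  return (x << 24) >> 24; // `>>` is an arithmetic shift, so the sign bit is copied down
}

// Same pattern for the low 16 bits (assumed analogue, shift by 16).
function signExtend16(x) {
  return (x << 16) >> 16;
}

const extended = [
  signExtend8(0xff),    // low byte 0xff
  signExtend8(0x7f),    // low byte 0x7f, sign bit clear
  signExtend8(0x180),   // upper bits are discarded, low byte 0x80
  signExtend16(0x8000), // low half 0x8000
];
```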
|
|
|
|
|
| |
Atomic loads, stores, RMW, cmpXchg, wait, and notify. This is enough
to get the asm.js atomics tests in the emscripten test suite to pass, at least
(but they are a subset of the entire pthreads suite).
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
That code originally used memory location 1024 to save 64 bits of
data (as that is what rust does apparently). We refactored it
manually to instead use a scratch memory helper, which is safer.
However, that 64-bit function ends up legalized, which actually
changes the interface between the module and the outside,
which is confusing and causes problems with optimizations
that can remove the getTempRet0 imports, see
emscripten-core/emscripten#11456
Instead, just use a global i64 to stash those bits. This requires
adding support for copying globals from the intrinsics module,
but otherwise seems simpler overall.
|
| |
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Adds special helper functions for data.drop etc., as unlike most
wasm instructions these are too big to emit inline.
Track passive segments at runtime in var memorySegments
whose indexes are the segment indexes.
Emit var bufferView whenever the memory exists, even without
memory segments, as we do still need the view in order to
operate on it.
Also adds a few constants for atomics that will be useful in future
PRs (as this PR updates the constant lists anyhow).
|
|
|
|
|
| |
* Micro-optimize base64Decode
* Update test expectations
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In wasm2js we ignore things that trap in wasm that we can't
really handle. For example, an out-of-bounds load from memory
would trap in wasm, but in JS we don't want to emit a bounds check
on each load. So wasm2js focuses on programs that don't
trap.
However, this is annoying in the fuzzer as it turns out that
our behavior for places where wasm would trap was not
deterministic. That is, wasm would trap, wasm2js would not
trap and do behavior X, and wasm2js with optimizations
would also not trap but do behavior Y != X. This produced
false positives in the fuzzer (and might be annoying in
manual debugging too).
As a workaround, this adds a --deterministic flag to wasm2js,
which tries to be deterministic about what it does for cases
where wasm would trap. This handles the case of an int
division by 0 which traps in wasm but without this flag could
have different behavior in wasm2js with or without opts
(see details in the patch).
|
|
|
|
|
| |
Since the global is never read, we know that any write operation
will be unobservable.
|