path: root/src
Commit message / Author / Age / Files / Lines
...
* Add feature flag for relaxed-simd (#4183)Ng Zhi An2021-09-238-1/+18
|
* [Wasm GC] Implement static (rtt-free) StructNew, ArrayNew, ArrayInit (#4172)Alon Zakai2021-09-2322-125/+451
| | | | | | | | | See #4149 This modifies the test added in #4163 which used static casts on dynamically-created structs and arrays. That was technically not valid (as we won't want users to "mix" the two forms). This makes that test 100% static, which both fixes the test and gives test coverage to the new instructions added here.
* [Refactoring] Code reusage in spliceIntoBlock (#4174)Max Graey2021-09-222-11/+3
|
* Use UniqueDeferringQueue in Precompute (#4179)Alon Zakai2021-09-222-7/+7
|
* Properly error on bad function names in text format (#4177)Alon Zakai2021-09-222-2/+12
|
* Remove unneeded work in Precompute (#4176)Alon Zakai2021-09-221-1/+0
| | | | | isSSA is not called anywhere. See #4165
* [Wasm GC] Fix invalid intermediate IR in OptimizeInstructions (#4169)Alon Zakai2021-09-201-12/+9
| | | | | | | | | | | | We added an optional ReFinalize in OptimizeInstructions at some point, but that is not valid: The ReFinalize only updates types when all other work is done, but the pass works incrementally. The bug the fuzzer found is that a child is changed to be unreachable, and then the parent is optimized before finalize() is called on it, which led to an assertion being hit (as the child was unreachable but not the parent, which should also be). To fix this, do not change types in this pass. Emit an extra block with a declared type when necessary. Other passes can remove the extra block.
* Tiny code cleanups. NFC (#4171)Alon Zakai2021-09-202-2/+1
| | | | | Remove an unnecessary include, and fix a typo in a macro declaration (that macro is not tested as it seems nothing uses DELEGATE_END yet, but I may use it soon).
* [Matcher] Add bval for matching boolean literals (#4162)Max Graey2021-09-202-12/+38
|
* Fix an unused variable warning. NFC (#4170)walkingeyerobot2021-09-201-0/+1
|
* [Wasm GC] Add static variants of ref.test, ref.cast, and br_on_cast* (#4163)Alon Zakai2021-09-2016-169/+445
| | | | | | | | | | | | These variants take a HeapType that is the type we intend to cast to, and do not take an RTT. These are intended to be more statically optimizable. For now though this PR just implements the minimum to get them parsing and to get through the optimizer without crashing. Spec: https://docs.google.com/document/d/1afthjsL_B9UaMqCA5ekgVmOm75BVFu6duHNsN9-gnXw/edit# See #4149
* Fix interpreting of ref.as_func|data (#4164)Alon Zakai2021-09-201-2/+2
|
* Partial inlining via function splitting (#4152)Alon Zakai2021-09-172-45/+595
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This PR helps with functions like this: function foo(x) { if (x) { .. lots of work here .. } } If "lots of work" is large enough, then we won't inline such a function. However, we may end up calling into the function only to get a false on that if and immediately exit. So it is useful to partially inline this function, basically by creating a split of it into a condition part that is inlineable function foo$inlineable(x) { if (x) { foo$outlined(); } } and an outlined part that is not inlineable: function foo$outlined(x) { .. lots of work here .. } We can then inline the inlineable part. That means that a call like foo(param); turns into if (param) { foo$outlined(); } In other words, we end up replacing a call and then a check with a check and then a call. Any time that the condition is false, this will be a speedup. The cost here is increased size, as we duplicate the condition into the callsites. For that reason, only do this when heavily optimizing for speed. This is a 10% speedup on j2cl. This helps two types of functions there: Java class inits, which often look like "have I been initialized before? if not, do all this work", and also assertion methods which look like "if the input is null, throw an exception".
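The trade-off described above can be sketched as a small simulation. This is only an illustration in Python, not Binaryen output: `foo`, `foo_outlined`, the `stats` counters, and `measure` are hypothetical names, and "function_calls" stands in for the call overhead that partial inlining avoids when the condition is false.

```python
def foo(x, stats):
    """The original function: every call pays the call overhead."""
    stats["function_calls"] += 1
    if x:
        stats["work_done"] += 1  # stands in for ".. lots of work here .."


def foo_outlined(x, stats):
    """The outlined part: only reached when the condition was true."""
    stats["function_calls"] += 1
    stats["work_done"] += 1


def run_original(inputs, stats):
    # A call, and then a check inside the callee.
    for param in inputs:
        foo(param, stats)


def run_partially_inlined(inputs, stats):
    # The inlineable condition now lives at the call site:
    # a check, and then a call only when needed.
    for param in inputs:
        if param:
            foo_outlined(param, stats)


def measure(run, inputs):
    """Run one variant on fresh counters and return them."""
    stats = {"function_calls": 0, "work_done": 0}
    run(inputs, stats)
    return stats
```

With mostly-false inputs such as `[0, 0, 1, 0]`, both variants do the same work, but the partially inlined variant makes a call only for the one true input.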
* [Wasm GC] Optimize away ref.as_non_null going into local.set in TNH mode (#4157)Alon Zakai2021-09-161-10/+25
| | | | | | | | | | If we can remove such traps, we can remove ref.as_non_null if the local type is nullable anyhow. If we support non-nullable locals, however, then do not do this, as it could inhibit specializing the local type later. Do the same for tees which we had existing code for. Background: #4061 (comment)
* Fix regression from #4130 (#4158)Alon Zakai2021-09-161-2/+8
| | | | | | | | | That PR reused the same node twice in the output, which fails on the assertion in BINARYEN_PASS_DEBUG=1 mode. No new test is needed because the existing test suite fails already in that mode. That the PR managed to land seems to say that we are not testing pass-debug mode on our lit tests, which we need to investigate.
* [Wasm GC] Fix OptimizeInstructions on unreachable ref.test (#4156)Alon Zakai2021-09-151-0/+4
| | | | | | Avoids a crash in calling getHeapType when there isn't one. Also add the relevant lit test (and a few others) to the list of files to fuzz more heavily.
* [Wasm GC] Fix lack of packing in array.init (#4153)Alon Zakai2021-09-141-1/+2
|
* [OptimizeInstructions] Optimize memory.fill with constant arguments (#4130)Max Graey2021-09-141-1/+126
| | | | | | | | | | | | | | This is a reland of #3071 Do similar optimizations as in #3038 but for memory.fill. `memory.fill(d, v, 0)` ==> `{ drop(d), drop(v) }` only with `ignoreImplicitTraps` or `trapsNeverHappen` `memory.fill(d, v, 1)` ==> `store8(d, v)` Further simplifications can be done only if v is constant because otherwise binary size would increase: `memory.fill(d, C, 1)` ==> `store8(d, (C & 0xFF))` `memory.fill(d, C, 2)` ==> `store16(d, (C & 0xFF) * 0x0101)` `memory.fill(d, C, 4)` ==> `store32(d, (C & 0xFF) * 0x01010101)` `memory.fill(d, C, 8)` ==> `store64(d, (C & 0xFF) * 0x0101010101010101)` `memory.fill(d, C, 16)` ==> `store128(d, i8x16.splat(C & 0xFF))`
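The constant rules above rely on multiplication replicating the low byte of C across the stored value. A minimal Python check of that arithmetic, assuming little-endian memory as in wasm; `splat_constant` and `fill_then_load` are hypothetical helper names, not Binaryen functions:

```python
# Multipliers taken from the rules above, keyed by fill size in bytes.
MULTIPLIERS = {
    1: 0x01,
    2: 0x0101,
    4: 0x01010101,
    8: 0x0101010101010101,
}


def splat_constant(value, size):
    """Constant that a single store of `size` bytes would write."""
    return (value & 0xFF) * MULTIPLIERS[size]


def fill_then_load(value, size):
    """Reference: fill `size` bytes with the low byte of `value`,
    then read them back as one little-endian integer."""
    memory = bytes([value & 0xFF]) * size
    return int.from_bytes(memory, "little")
```

Because `C & 0xFF` is below 0x100, the multiplication never carries between bytes, which is why the two functions agree for every size in the table.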
* RemoveUnusedBrs::tablify() improvements: handle EqZ and tee (#4144)Alon Zakai2021-09-131-9/+40
| | | | | | | | | | | | tablify() attempts to turn a sequence of br_ifs into a single br_table. This PR adds some flexibility to the specific pattern it looks for, specifically: * Accept i32.eqz as a comparison to zero, and not just to look for i32.eq against a constant. * Allow the first condition to be a tee. If it is, compare later conditions to local.get of that local. This will allow more br_tables to be emitted in j2cl output.
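The idea behind turning a chain of comparisons into one table lookup can be sketched in Python. This is an analogy only: the branch labels and function names below are hypothetical, and real br_table operands are unsigned i32 values.

```python
def br_if_chain(x):
    """A sequence of conditional branches, one comparison each."""
    if x == 0:  # the i32.eqz form of "compare to zero"
        return "case0"
    if x == 1:
        return "case1"
    if x == 2:
        return "case2"
    return "default"


# One table indexed by the value, with a default for out-of-range indexes.
TARGETS = ["case0", "case1", "case2"]


def br_table(x):
    """A single indexed branch replacing the chain above."""
    if 0 <= x < len(TARGETS):
        return TARGETS[x]
    return "default"
```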
* OptimizeInstructions: Optimize boolean selects (#4147)Alon Zakai2021-09-132-0/+21
| | | | | | | | | | | | If all of a select's inputs are boolean, we can sometimes turn the select into an AND or an OR operation, x ? y : 0 => x & y x ? 1 : y => x | y I believe LLVM aggressively canonicalizes to this form. It makes sense to do here too as it is smaller (it saves the constant 0 or 1). It also allows further optimizations (which is why LLVM does it) but I don't think we have those yet.
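The two rewrites above can be checked exhaustively over boolean inputs. A small Python sketch, where `select` is a hypothetical stand-in for the wasm instruction:

```python
from itertools import product


def select(cond, if_true, if_false):
    """Stand-in for wasm select: pick a value based on a condition."""
    return if_true if cond else if_false


def boolean_select_rules_hold():
    """Check both rewrites for every combination of boolean x and y."""
    for x, y in product((0, 1), repeat=2):
        if select(x, y, 0) != (x & y):
            return False
        if select(x, 1, y) != (x | y):
            return False
    return True
```

The boolean requirement matters: with x = 2 and y = 1, `select(2, 1, 0)` is 1 while `2 & 1` is 0.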
* Support new dylink.0 custom section format (#4141)Sam Clegg2021-09-114-8/+90
| | | | | | | See also: spec change: https://github.com/WebAssembly/tool-conventions/pull/170 llvm change: https://reviews.llvm.org/D109595 wabt change: https://github.com/WebAssembly/wabt/pull/1707 emscripten change: https://github.com/emscripten-core/emscripten/pull/15019
* Add an Intrinsics mechanism, and a call.without.effects intrinsic (#4126)Alon Zakai2021-09-109-2/+212
| | | | | | | | | | | | | | | | | | | | | | | | | An "intrinsic" is modeled as a call to an import. We could also add new IR things for them, but that would take more work and lead to less clear errors in other tools if they try to read a binary using such a nonstandard extension. A first intrinsic is added here, call.without.effects This is basically the same as call_ref except that the optimizer is free to assume the call has no side effects. Consequently, if the result is not used then it can be optimized out (normally, even an unused call must be kept around because of its possible side effects). Likewise, the lack of side effects allows more reordering and other things. A lowering pass for intrinsics is provided. Rather than automatically lower them to normal wasm at the end of optimizations, the user must call that pass explicitly. A typical workflow might be -O --intrinsic-lowering -O That optimizes with the intrinsic present - perhaps removing calls thanks to it - then lowers it into normal wasm - it turns into a call_ref - and then optimizes further, which would turn the call_ref into a direct call, potentially inline, etc.
* [NFC] Add a comment in MergeBlocks about subtyping (#4142)Alon Zakai2021-09-101-2/+8
| | | Followup to #4137
* [Wasm GC] ArrayInit support (#4138)Alon Zakai2021-09-1022-1/+152
| | | | | | | array.init is like array.new_with_rtt except that it takes as arguments the values to initialize the array with (as opposed to a size and an optional initial value). Spec: https://docs.google.com/document/d/1afthjsL_B9UaMqCA5ekgVmOm75BVFu6duHNsN9-gnXw/edit#
* Refactor MergeBlocks to use iteration; adds Wasm GC support (#4137)Alon Zakai2021-09-091-30/+22
| | | | | | | | | MergeBlocks was written a very long time ago, before the iteration API, so it had a bunch of hardcoded things for specific instructions. In particular, that did not handle GC. This does a small refactoring to use iteration. The refactoring is NFC, but while doing so it adds support for new relevant instructions, including wasm GC.
* Rename isIntrinsicallyNondeterministic() to isGenerative() (#4092)Alon Zakai2021-09-095-34/+31
|
* [OptimizeInstructions] propagate sign for integer multiplication (#4098)Max Graey2021-09-091-0/+50
| | | | | | | | | | | | ```ts -x * -y => (x * y) -x * y => -(x * y) x * -y => -(x * y), if x != C && y != C -x * C => x * -C, if C != C_pot || shrinkLevel != 0 -x * C => -(x * C), otherwise ``` We are skipping propagation when lhs and rhs are constants because this should be handled by constant folding. Also skip cases like `-x * 4 -> x * -4` for `shrinkLevel == 0`, as this will be further converted to `-(x << 2)`.
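These identities hold under wrap-around integer arithmetic, which can be checked directly. A Python sketch assuming 32-bit values; `neg32`, `mul32`, `shl32`, and `sign_rules_hold` are hypothetical helpers that emulate i32 operations with a mask:

```python
MASK32 = 0xFFFFFFFF


def neg32(x):
    """Two's complement negation, wrapped to 32 bits."""
    return (-x) & MASK32


def mul32(x, y):
    """Multiplication wrapped to 32 bits."""
    return (x * y) & MASK32


def shl32(x, shift):
    """Left shift wrapped to 32 bits."""
    return (x << shift) & MASK32


def sign_rules_hold(samples):
    """Check each rewrite above for all pairs drawn from `samples`."""
    for x in samples:
        for y in samples:
            if mul32(neg32(x), neg32(y)) != mul32(x, y):
                return False
            if mul32(neg32(x), y) != neg32(mul32(x, y)):
                return False
            if mul32(x, neg32(y)) != neg32(mul32(x, y)):
                return False
            # -x * C == x * -C, with y playing the role of the constant C.
            if mul32(neg32(x), y) != mul32(x, neg32(y)):
                return False
        # -x * 4 == -(x << 2), the power-of-two case.
        if mul32(neg32(x), 4) != neg32(shl32(x, 2)):
            return False
    return True
```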
* Make static buffers in numToString thread local (#4134)Thomas Lively2021-09-091-4/+6
| | | | | | | Validation is performed on multiple threads at once and when there are multiple validation failures, those threads can all end up in `numToString` at the same time as they construct their respective error messages. Previously the threads would race on their access to the snprintf buffers, sometimes leading to segfaults. Fix the data races by making the buffers thread local.
* Do not use a library for wasm-split files (#4132)Thomas Lively2021-09-081-4/+2
|
* [wasm-split] Do not add exports of imported memories (#4133)Thomas Lively2021-09-081-12/+14
| | | | | | We can assume that imported memories (and the profiling data they contain) are already accessible from the module's environment, so there's no need to export them. This also avoids needing to add knowledge of "profile-memory" to Emscripten's library_dylink.js.
* Enumerate objects for wasm-split-lib (#4128)Thomas Lively2021-09-071-1/+1
| | | To support CMake 3.10. `add_executable` does not support OBJECT libraries until 3.12.
* Inlining: Track names over multiple iterations, not pointers (#4127)Alon Zakai2021-09-071-2/+6
| | | | | | | | It can be confusing during debugging to keep a map of pointers when we might have removed some of those functions from the module meanwhile (if you iterate over it in some additional debug logging). This change has no observable effect, however, as no bug could have actually occurred in practice given that nothing is done with the pointers in the actual code.
* Show a clear error on asyncify+references. (#4125)Alon Zakai2021-09-073-3/+31
| | | Helps #3739
* wasm-split: Export the memory if it is not already (#4121)Alon Zakai2021-09-071-1/+14
|
* [wasm-split] Add an option for recording profile data in memory (#4120)Thomas Lively2021-09-035-55/+165
| | | | | | | | | | | | | | | | To avoid requiring a static memory allocation, wasm-split's instrumentation defaults to recording profile data in Wasm globals. This causes problems for multithreaded applications because the globals are thread-local, but it is not always feasible to arrange for a separate profile to be dumped on each thread. To simplify the profiling of such multithreaded applications, add a new instrumentation mode that stores the profiling data in shared memory instead of in globals. This allows a single profile to be written that correctly reflects the called functions on all threads. This new mode is not on by default because it requires users to ensure that the program will not trample the in-memory profiling data. The data is stored beginning at address zero and occupies one byte per declared function in the instrumented module. Emscripten can be told to leave this memory free using the GLOBAL_BASE option.
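The in-memory layout described above (one byte per declared function, beginning at address zero) can be pictured with a byte array. This Python sketch assumes that a function's byte is set to 1 when it is called; that encoding and all names below are assumptions for illustration, not taken from wasm-split.

```python
def make_memory(size):
    """A zero-initialized linear memory shared by all threads."""
    return bytearray(size)


def record_call(memory, function_index):
    """Mark a function as called. Assumption: the flag value is 1."""
    memory[function_index] = 1


def called_functions(memory, num_functions):
    """Read the profile back from the first `num_functions` bytes."""
    return [i for i in range(num_functions) if memory[i] != 0]
```

Since every thread writes into the same region, a single dump reflects calls from all of them, which is also why the program itself must not use those first bytes.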
* Optimize away dominated calls to functions that run only once (#4111)Alon Zakai2021-09-035-3/+456
| | | | | | | | | | | | | | | | | | | | | | | Some functions run only once with this pattern: function foo() { if (foo$ran) return; foo$ran = 1; ... } If that global is not ever set to 0, then the function's payload (after the initial if and return) will never execute more than once. That means we can optimize away dominated calls: foo(); foo(); // we can remove this To do this, we find which globals are "once", which means they can fit in that pattern, as they are never set to 0. If a function looks like the above pattern, and its global is "once", then the function is "once" as well, and we can perform this optimization. This removes over 8% of static calls in j2cl.
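Why the second call is removable can be seen in a small simulation. This is a Python illustration of the pattern only; `OnceModule`, `foo_ran`, and `payload_runs` are hypothetical names, with `foo_ran` playing the role of the "once" global.

```python
class OnceModule:
    def __init__(self):
        self.foo_ran = 0       # the "once" global: never set back to 0
        self.payload_runs = 0  # how many times the payload executed

    def foo(self):
        if self.foo_ran:
            return
        self.foo_ran = 1
        self.payload_runs += 1  # stands in for the "..." payload


def before_optimization(module):
    module.foo()
    module.foo()  # dominated by the first call, so it can only exit early


def after_optimization(module):
    module.foo()  # the dominated call has been removed


def final_state(run):
    """Run one variant on a fresh module and report its observable state."""
    module = OnceModule()
    run(module)
    return (module.foo_ran, module.payload_runs)
```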
* [NFC] Split wasm-split into multiple files (#4119)Thomas Lively2021-09-038-969/+1101
| | | | | As wasm-split has gained new functionality, its implementation file has become large. In preparation for adding even more functionality, split the existing implementation across multiple files in a new tools/wasm-split subdirectory.
* Support specialized function types in element segments (#4109)Alon Zakai2021-09-028-39/+70
| | | | | | Before this, the element segments would be printed as having type funcref, and then if their table had a specialized type, the element type would not be a subtype of the table and validation would fail.
* Fix the effects of array.copy (#4118)Alon Zakai2021-09-011-0/+2
| | | | | | This appeared to be a regression from #4117, however this was always a bug, and that PR just exposed it. That is, somehow we forgot to indicate the effects of ArrayCopy, and after that PR we'd vacuum it out incorrectly.
* [Refactoring] Cleanup asm2wasm. Use JS instead ASM prefix where possible. NFC (#4090)Max Graey2021-09-019-474/+152
|
* Use TrapsNeverHappen mode in more places in Vacuum (#4117)Alon Zakai2021-09-012-4/+4
| | | | | | | | | | | | | | We had already replaced the check on drop, but we can also use that mode on all the other things there, as the pass never does reorderings of things - it just removes them. For example, the pass can now remove part of a dropped thing, (drop (struct.get (foo))) => (drop (foo)) In this example the struct.get can be removed, even if the foo can't.
* Use the new module version of EffectAnalyzer (#4116)Alon Zakai2021-08-3115-83/+54
| | | | | | | | | | | This finishes the refactoring started in #4115 by doing the same change to pass a Module into EffectAnalyzer instead of features. To do so this refactors the fallthrough API and a few other small things. After those changes, this PR removes the old feature constructor of EffectAnalyzer entirely. This requires a small breaking change in the C API, changing BinaryenExpressionGetSideEffects's feature param to a module. That makes this change not NFC, but otherwise it is.
* Add a Module parameter to EffectAnalyzer. NFC (#4115)Alon Zakai2021-08-3114-110/+115
| | | | | | | | | | | | | Knowing the module will allow us to do more analysis in the effect analyzer. For now, this just refactors the code to allow providing a module instead of features, and to infer the features from the module. This actually shortens the code in most places which is nice (just pass module instead of module->features). This modifies basically all callers to use the new module form, except for the fallthrough logic. That would require some more refactoring, so to keep this PR reasonably small that is not yet done.
* Handle extra info in dylink section (#4112)Sam Clegg2021-08-314-41/+21
| | | | | If extra data is found in this section simply propagate it. Also, remove some dead code from wasm-binary.cpp.
* Use ModuleReader::readStdin for file "-" (#4114)Thomas Lively2021-08-301-2/+2
| | | | | | | | | After #4106 we already treat the input file "-" as a shorthand for reading from stdin at the file.cpp level. However, the "-" input was still treated as a normal file name at the wasm-io.cpp file and as a result was always treated as text input. This commit updates wasm-io.cpp to use the stdin code path supporting both binary and text input for "-". Fixes #4105 (again).
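One way to support both binary and text input on stdin is to read all bytes first and then look at the leading magic number. The Python sketch below illustrates that idea only; it is not a description of how wasm-io.cpp is written. The only fact relied on is that wasm binaries begin with the bytes `\0asm`; `read_input` and `classify_module_bytes` are hypothetical names.

```python
import sys

WASM_MAGIC = b"\x00asm"  # the first four bytes of every wasm binary


def read_input(filename, stdin=None):
    """Read raw bytes from `filename`, treating "-" as stdin."""
    if filename == "-":
        stream = stdin if stdin is not None else sys.stdin.buffer
        return stream.read()
    with open(filename, "rb") as handle:
        return handle.read()


def classify_module_bytes(data):
    """Decide between the binary and text formats from the content."""
    return "binary" if data[:4] == WASM_MAGIC else "text"
```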
* [API] Add type argument for BinaryenAddTable method (#4107)Max Graey2021-08-273-5/+7
| | | In the JS API this is optional and it defaults to `funcref`.
* Read from stdin when the input file is `-` (#4106)Thomas Lively2021-08-272-2/+18
| | | | We already supported `-` as meaning stdout for output and this is useful in similar situations. Fixes #4105.
* Asyncify: Degrade gracefully if too many locals to compute relevantLiveLocals (#4108)Alon Zakai2021-08-271-0/+9
|
* Dominator Tree (#4100)Alon Zakai2021-08-261-0/+178
| | | | | | | | Add a class to compute the dominator tree for a CFG consisting of a list of basic blocks assumed to be in reverse postorder. This will be useful once cfg-walker emits blocks in reverse-postorder (which it almost does, another PR will handle that). Then we can write optimization passes that use block dominance.
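A dominator tree over blocks in reverse postorder can be computed with a simple iterative scheme in the style of Cooper, Harvey, and Kennedy. The Python sketch below is one such approach under stated assumptions and may differ from the class added in this commit: blocks are identified by their reverse-postorder index, block 0 is the entry, every block is reachable, and `preds[i]` lists the predecessors of block `i`. All names are hypothetical.

```python
UNDEFINED = -1


def compute_immediate_dominators(preds):
    """Return idom[i] for each block; the entry is its own idom."""
    count = len(preds)
    if count == 0:
        return []
    idom = [UNDEFINED] * count
    idom[0] = 0

    def intersect(a, b):
        # Walk both blocks up the partial tree until they meet. In
        # reverse postorder a dominator always has a smaller index,
        # so the block with the larger index is the one that moves.
        while a != b:
            while a > b:
                a = idom[a]
            while b > a:
                b = idom[b]
        return a

    changed = True
    while changed:
        changed = False
        for block in range(1, count):
            new_idom = UNDEFINED
            for pred in preds[block]:
                if idom[pred] == UNDEFINED:
                    continue  # predecessor not processed yet (back edge)
                if new_idom == UNDEFINED:
                    new_idom = pred
                else:
                    new_idom = intersect(pred, new_idom)
            if new_idom != idom[block]:
                idom[block] = new_idom
                changed = True
    return idom


def dominates(idom, a, b):
    """True if block `a` dominates block `b` (every block dominates itself)."""
    while b != a and b != 0:
        b = idom[b]
    return b == a
```

For a diamond-shaped CFG (entry 0 branching to 1 and 2, which both reach 3), the entry is the immediate dominator of all three other blocks, since neither arm alone dominates the join.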
* Costs: Index => CostType (#4103)Alon Zakai2021-08-241-82/+86
| | | The cost type isn't an index in a wasm binary, it's just a number.