forks/binaryen.git -

	Commit message (Collapse)	Author	Age	Files	Lines
*	[Wasm GC] Optimize subsequent struct.sets after a struct.new (#4244)	Alon Zakai	2021-10-14	1	-0/+719
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This optimizes this type of pattern: (local.set $x (struct.new X Y Z)) (struct.set (local.get $x) X') => (local.set $x (struct.new X' Y Z)) Note how the struct.set is removed, and X' moves to where X was. This removes almost 90% (!) of the struct.sets in j2wasm output, which reduces total code size by 2.5%. However, I see no speedup with this - I guess that either this is not on the hot path, or V8 optimizes it well already, or the CPU is making stores "free" anyhow...
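The struct.new/struct.set rewrite described above can be sketched on a toy instruction list. This is a minimal illustration only: the tuple-based IR and the function name `fold_struct_sets` are assumptions made up for this sketch, not Binaryen's IR or API, and field values are treated as opaque constants with no side effects.

```python
def fold_struct_sets(instrs):
    """Fold struct.set instructions into an immediately preceding struct.new.

    Toy IR (hypothetical, for illustration only):
      ("local.set", local, ("struct.new", [field values]))
      ("struct.set", local, field_index, new_value)
    Field values are assumed to be side-effect-free constants.
    """
    out = []
    for ins in instrs:
        prev = out[-1] if out else None
        folds = (
            ins[0] == "struct.set"
            and prev is not None
            and prev[0] == "local.set"
            and prev[1] == ins[1]              # same local
            and isinstance(prev[2], tuple)
            and prev[2][0] == "struct.new"
        )
        if folds:
            # Move the new value to where the old field value was,
            # and drop the struct.set entirely.
            fields = list(prev[2][1])
            fields[ins[2]] = ins[3]
            out[-1] = ("local.set", prev[1], ("struct.new", fields))
        else:
            out.append(ins)
    return out


before = [
    ("local.set", "$x", ("struct.new", ["X", "Y", "Z"])),
    ("struct.set", "$x", 0, "X'"),
]
after = fold_struct_sets(before)
assert after == [("local.set", "$x", ("struct.new", ["X'", "Y", "Z"]))]
```

Because the folded result is again a local.set of a struct.new, several consecutive struct.sets fold one after another. A real pass would also have to verify that moving the new value is safe; the sketch simply assumes it is.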
*	Precompute: Track reference identity (#4243)	Alon Zakai	2021-10-14	2	-164/+532
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Precompute will run the interpreter on struct.new etc. repeatedly, as it keeps doing so while it propagates constant values around (if one of the operands to the struct.new becomes constant, that could have a noticeable effect). But creating new GC data means we lose track of their identity, and so ref.eq would not work, and we disabled basically all struct operations. This implements identity tracking so we can start to optimize there, which is a step towards using it for immutable field propagation. To track identity, always store the data representing each struct.new in the source using the same GCData structure. That keeps identity consistent no matter how many times we execute.
*	MergeBlocks: Allow side effects in a ternary's first element (#4238)	Alon Zakai	2021-10-13	3	-46/+136
\| \| \| \| \| \| \| \| \| \| \|	Side effects in the first element are always ok there, as they are not moved across anything else: they happen before their parent both before and after the opt. The pass just left ternary as a TODO, so do at least one part of that now (we can do the rest as well, with some care). This is fairly useful on array.set which has 3 operands, and the first often has interesting things in it.
*	[Selectify] Increase TooCostlyToRunUnconditionally from 7 to 9 (#4228)	Max Graey	2021-10-13	2	-8/+55
\| \| \| \|	This makes Binaryen match LLVM on a real-world case, which is probably the safest heuristic to use.
*	[Wasm GC] Take advantage of immutable struct fields in effects.h (#4240)	Alon Zakai	2021-10-13	2	-1/+63
\| \| \| \| \| \|	This is the easy part of using immutability more: Just note immutable fields as such when we read from them, and then a write to a struct does not interfere with such reads. That is, only a read from a mutable field can notice the effect of a write.
*	Fix function name `BinaryenTableSizeSetTable` (#4230)	Paulo Matos	2021-10-12	1	-1/+5
\| \| \| \| \|	`BinaryenTableSizeSetTable` was being declared in the header correctly, but defined as `BinaryenTableSetSizeTable`. Add test for `BinaryenTableSizeGetTable` and `BinaryenTableSizeSetTable`.
*	Fix tee/as-non-null reordering when writing to a non-nullable param (#4232)	Alon Zakai	2021-10-11	1	-0/+31
\|
*	Add table.size operation (#4224)	Max Graey	2021-10-08	7	-25/+138
\|
*	Parse milestone 4 nominal types (#4222)	Thomas Lively	2021-10-08	2	-0/+130
\| \| \| \| \| \| \| \| \|	Implement parsing the new {func,struct,array}_subtype format for nominal types. For now, the new format is parsed the same way the old-style (extends X) format is parsed, i.e. in --nominal mode types are parsed as nominal but otherwise they are parsed as equirecursive. Intentionally do not parse the new types unconditionally as nominal for now to allow frontends to update their nominal text format while continuing to use the workflow of running wasm-opt without --nominal to lower nominal types to structural types.
*	Emit heap types for call_indirect that match the table (#4221)	Alon Zakai	2021-10-08	2	-2/+90
\| \| \| \| \| \| \| \|	See #4220 - this lets us handle the common case for now of simply having an identical heap type to the table when the signature is identical. With this PR, #4207's optimization of call_ref + table.get into call_indirect now leads to a binary that works in V8 in nominal mode.
*	Directize: Do not optimize if a table has a table.set (#4218)	Alon Zakai	2021-10-07	1	-0/+64
\| \| \|	Followup to #4215
*	Add table.set operation (#4215)	Max Graey	2021-10-07	9	-31/+225
\|
*	[Wasm GC] GlobalTypeOptimization: Turn fields immutable when possible (#4213)	Alon Zakai	2021-10-06	2	-0/+350
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add a new pass to perform global type optimization. So far this just does one thing, to find fields with no struct.set and to turn them immutable (where possible - sub and supertypes must agree). To do that, this adds a GlobalTypeRewriter utility which rewrites all the heap types in the module, allowing changes while doing so. In this PR, the change is to flip the mutable field. Otherwise, the utility handles all the boilerplate of creating temp heap types using a TypeBuilder, and it handles replacing the types in every place they are used in the module. This is not enabled by default yet as I don't see enough of a benefit on j2cl. This PR is basically the simplest thing to do in the space of global type optimization, and the simplest way I can think of to fully test the GlobalTypeRewriter (which can't be done as a unit test, really, since we want to emit a full module and validate it etc.). This PR builds the foundation for more complicated things like removing unused fields, subtyping fields, and more.
*	[OptimizeInstructions] Fold select into zero or single expression for some patterns (#4181)	Max Graey	2021-10-05	3	-27/+422
\| \| \| \| \| \| \| \| \| \| \|	i32(x) ? i32(x) : 0 ==> x; i32(x) ? 0 : i32(x) ==> {x, 0}; i64(x) == 0 ? 0 : i64(x) ==> x; i64(x) != 0 ? i64(x) : 0 ==> x; i64(x) == 0 ? i64(x) : 0 ==> {x, 0}; i64(x) != 0 ? 0 : i64(x) ==> {x, 0}
*	Implement standalone nominal types (#4201)	Thomas Lively	2021-10-05	2	-0/+527
\| \| \| \| \| \| \| \|	These new nominal types do not depend on the global type system being changed with the --nominal flag. Instead, they can coexist with the existing equirecursive structural types, as required in the new milestone 4 spec. This PR implements subtyping, upper bounding, canonicalizing, and other type operations but using the new types in the parsers and elsewhere in Binaryen is left to a follow-on PR.
*	Fix roundtripping specialized element segments of table zero (#4212)	Alon Zakai	2021-10-05	1	-0/+42
\| \| \| \| \|	Before this fix, the first table (index 0) is counted as its element segment having "no table index" even when its type is not funcref, which could break things if that table had a more specialized type.
*	Optimize call_indirect of a select of two constants (#4208)	Alon Zakai	2021-10-04	1	-0/+299
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	(call_indirect ..args.. (select (i32.const x) (i32.const y) (condition) ) ) => (if (condition) (call $func-for-x ..args.. ) (call $func-for-y ..args.. ) ) To do this we must reorder the condition with the args, and also use the args more than once, so place them all in locals. This works towards the goal of polymorphic devirtualization, that is, turning an indirect call of more than one possible target into more than one direct call.
*	Optimize call_ref+table.get => call_indirect (#4207)	Alon Zakai	2021-10-04	1	-5/+28
\| \| \|	Rather than load from the table and call that reference, call using the table.
*	Fix inlining name collision (#4206)	Alon Zakai	2021-10-04	1	-1/+28
\|
*	Port a batch of passes tests to lit (#4202)	Thomas Lively	2021-10-04	19	-2090/+2212
\| \| \| \| \| \| \| \|	- fpcast-emu.wast - generate-dyncalls_all-features.wast - generate-i64-dyncalls.wast - instrument-locals_all-features_disable-typed-function-references.wast - instrument-memory.wast - instrument-memory64.wast
*	Implement table.get (#4195)	Alon Zakai	2021-09-30	4	-25/+212
\| \| \| \|	Adds the part of the spec test suite that this passes (without table.set we can't do it all).
*	[Wasm GC] Optimize static (rtt-free) operations (#4186)	Alon Zakai	2021-09-30	2	-15/+665
\| \| \| \| \| \| \| \| \| \| \| \|	Now that they are all implemented, we can optimize them. This removes the big if that ignored static operations, and implements things for them. In general this matches the existing rtt-using case, but there are a few things we can do better, which this does: * A cast of a subtype to a type always succeeds. * A test of a subtype to a type is always 1 (if non-nullable). * Repeated static casts can leave just the most demanding of them.
*	Add a SmallSet and use it in LocalGraph. NFC (#4188)	Alon Zakai	2021-09-29	2	-0/+222
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	A SmallSet starts with fixed storage that it uses in the simplest possible way (linear scan, no sorting). If it exceeds a size then it starts using a normal std::set. So for small amounts of data it avoids allocation and any other overhead. This adds a unit test and also uses it in LocalGraph which provides a large amount of additional coverage. I also changed an unrelated data structure from std::map to std::unordered_map which I noticed while doing profiling in LocalGraph. (And a tiny bit of additional refactoring there.) This makes LocalGraph-using passes like ssa-nomerge and precompute-propagate 10-15% faster on a bunch of real-world codebases I tested.
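The switch-over behaviour described above can be modelled in a few lines. This is a sketch of the idea only, not Binaryen's C++ `SmallSet`: the default capacity of 4 and the method names are assumptions, and in Python it shows the control flow rather than the allocation savings.

```python
class SmallSet:
    """Toy model: linear scan over fixed storage, then fall back to a set."""

    def __init__(self, capacity=4):  # capacity value is an assumption
        self._capacity = capacity
        self._fixed = []        # unsorted, scanned linearly
        self._flexible = None   # becomes a normal set once we overflow

    def insert(self, value):
        if self._flexible is not None:
            self._flexible.add(value)
        elif value in self._fixed:          # linear scan, no sorting
            return
        elif len(self._fixed) < self._capacity:
            self._fixed.append(value)
        else:
            # Exceeded the fixed size: move everything to a normal set.
            self._flexible = set(self._fixed)
            self._flexible.add(value)
            self._fixed = []

    def __contains__(self, value):
        if self._flexible is not None:
            return value in self._flexible
        return value in self._fixed

    def __len__(self):
        if self._flexible is not None:
            return len(self._flexible)
        return len(self._fixed)

    def uses_fixed_storage(self):
        return self._flexible is None


s = SmallSet(capacity=2)
s.insert(10)
s.insert(10)   # duplicate, ignored
s.insert(20)
assert s.uses_fixed_storage() and len(s) == 2
s.insert(30)   # overflow: switches to the flexible set
assert not s.uses_fixed_storage() and 30 in s and len(s) == 3
```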
*	Clang-format c/cpp files in test directory (#4192)	Heejin Ahn	2021-09-29	34	-1701/+2406
\| \| \| \| \| \| \| \| \|	This clang-formats c/cpp files in test/ directory, and updates clang-format-diff.sh so that it does not ignore test/ directory anymore. bigswitch.cpp is excluded from formatting, because there are big commented-out code blocks, and apparently clang-format messes up formatting in them. Also to make matters worse, different clang-format versions do different things on those commented-out code blocks.
*	Disable partial inlining by default and add a flag for it. (#4191)	Alon Zakai	2021-09-27	4	-626/+585
\| \| \| \| \|	Locally I saw a 10% speedup on j2cl but reports of regressions have arrived, so let's disable it for now pending investigation. The option added here should make it easy to experiment.
*	[wasm-split] Disallow mixing --profile, --keep-funcs, and --split-funcs (#4187)	Thomas Lively	2021-09-24	4	-14/+35
\| \| \| \| \| \| \| \| \| \| \| \| \|	Previously the set of functions to keep was initially empty, then the profile added new functions to keep, then the --keep-funcs functions were added, then the --split-funcs functions were removed. This method of composing these different options was arbitrary and not necessarily intuitive, and it prevented reasonable workflows from working. For example, providing only a --split-funcs list would result in all functions being split out no matter which functions were listed. To make the behavior of these options, and --split-funcs in particular, more intuitive, disallow mixing them and when --split-funcs is used, split out only the listed functions.
*	Precompute: Only run a single LocalGraph iteration (#4184)	Alon Zakai	2021-09-23	2	-2/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Precompute has a mode in which it propagates results from local.sets to local.gets. That constructs a LocalGraph which is a non-trivial amount of work. We used to run multiple iterations of this, but investigation shows that such opportunities are extremely rare, as doing just a single propagation iteration has no effect on the entire emscripten benchmark suite, nor on j2cl output. Furthermore, we run this pass twice in the normal pipeline (once early, once late) so even if there are such opportunities they may be optimized already. And, --converge is a way to get additional iterations of all passes if a user wants that, so it makes sense not to do costly work for more iterations automatically. In effect, 99.99% of the time before this pass we would create the LocalGraph twice: once the first time, then a second time only to see that we can't actually optimize anything further. This PR makes us only create it once, which makes precompute-propagate 10% faster on j2cl and even faster on other things like poppler (33%) and LLVM (29%). See the change in the test suite for an example of a case that does require more than one iteration to be optimized. Note that even there, we only manage to get benefit from a second iteration by doing something that overlaps with another pass (optimizing out an if with condition 0), which shows even more how unnecessary the extra work was. See #4165
*	RemoveUnusedBrs: Optimize if-of-if pattern (#4180)	Alon Zakai	2021-09-23	2	-254/+403
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	if (A) { if (B) { C } } => if (A ? B : 0) { C } when B has no side effects, and is fast enough to consider running unconditionally. In that case, we replace an if with a select and a zero, which is the same size, but should be faster and may be further optimized. As suggested in #4168
*	Add feature flag for relaxed-simd (#4183)	Ng Zhi An	2021-09-23	8	-2/+17
\|
*	[Wasm GC] Implement static (rtt-free) StructNew, ArrayNew, ArrayInit (#4172)	Alon Zakai	2021-09-23	9	-35/+334
\| \| \| \| \| \| \| \| \|	See #4149. This modifies the test added in #4163 which used static casts on dynamically-created structs and arrays. That was technically not valid (as we won't want users to "mix" the two forms). This makes that test 100% static, which both fixes the test and gives test coverage to the new instructions added here.
*	[Wasm GC] Fix invalid intermediate IR in OptimizeInstructions (#4169)	Alon Zakai	2021-09-20	3	-21/+67
\| \| \| \| \| \| \| \| \| \| \| \|	We added an optional ReFinalize in OptimizeInstructions at some point, but that is not valid: The ReFinalize only updates types when all other work is done, but the pass works incrementally. The bug the fuzzer found is that a child is changed to be unreachable, and then the parent is optimized before finalize() is called on it, which led to an assertion being hit (as the child was unreachable but not the parent, which should also be). To fix this, do not change types in this pass. Emit an extra block with a declared type when necessary. Other passes can remove the extra block.
*	[Matcher] Add bval for matching boolean literals (#4162)	Max Graey	2021-09-20	2	-2/+14
\|
*	[Wasm GC] Add static variants of ref.test, ref.cast, and br_on_cast* (#4163)	Alon Zakai	2021-09-20	6	-7/+342
\| \| \| \| \| \| \| \| \| \| \| \|	These variants take a HeapType that is the type we intend to cast to, and do not take an RTT. These are intended to be more statically optimizable. For now though this PR just implements the minimum to get them parsing and to get through the optimizer without crashing. Spec: https://docs.google.com/document/d/1afthjsL_B9UaMqCA5ekgVmOm75BVFu6duHNsN9-gnXw/edit# See #4149
*	Fix interpreting of ref.as_func\|data (#4164)	Alon Zakai	2021-09-20	2	-18/+68
\|
*	Partial inlining via function splitting (#4152)	Alon Zakai	2021-09-17	2	-578/+2325
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This PR helps with functions like this: function foo(x) { if (x) { .. lots of work here .. } } If "lots of work" is large enough, then we won't inline such a function. However, we may end up calling into the function only to get a false on that if and immediately exit. So it is useful to partially inline this function, basically by creating a split of it into a condition part that is inlineable function foo$inlineable(x) { if (x) { foo$outlined(); } } and an outlined part that is not inlineable: function foo$outlined(x) { .. lots of work here .. } We can then inline the inlineable part. That means that a call like foo(param); turns into if (param) { foo$outlined(); } In other words, we end up replacing a call and then a check with a check and then a call. Any time that the condition is false, this will be a speedup. The cost here is increased size, as we duplicate the condition into the callsites. For that reason, only do this when heavily optimizing for speed. This is a 10% speedup on j2cl. This helps two types of functions there: Java class inits, which often look like "have I been initialized before? if not, do all this work", and also assertion methods which look like "if the input is null, throw an exception".
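The "check then call" versus "call then check" trade-off above can be made concrete with a small model. This is an illustrative sketch, not the Binaryen pass: the `run` helper and its call counter are hypothetical, and the outlined function is passed the parameter explicitly here.

```python
def run(inputs, partially_inlined):
    """Count function calls made for a list of call-site arguments."""
    calls = 0
    work = []

    def foo(x):
        # Original function: the check happens after the call.
        nonlocal calls
        calls += 1
        if x:
            work.append(x)  # stands in for ".. lots of work here .."

    def foo_outlined(x):
        # Outlined part: only the heavy body, no check.
        nonlocal calls
        calls += 1
        work.append(x)

    for param in inputs:
        if partially_inlined:
            # The inlineable condition has been copied into the call site.
            if param:
                foo_outlined(param)
        else:
            foo(param)
    return calls, work


inputs = [0, 3, 0, 0, 5]
assert run(inputs, partially_inlined=False) == (5, [3, 5])
assert run(inputs, partially_inlined=True) == (2, [3, 5])
```

The same work is performed in both versions, but calls whose condition is false disappear, at the price of repeating the condition at every call site.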
*	[Wasm GC] Optimize away ref.as_non_null going into local.set in TNH mode (#4157)	Alon Zakai	2021-09-16	3	-1/+64
\| \| \| \| \| \| \| \| \| \|	If we can remove such traps, we can remove ref.as_non_null if the local type is nullable anyhow. If we support non-nullable locals, however, then do not do this, as it could inhibit specializing the local type later. Do the same for tees which we had existing code for. Background: #4061 (comment)
*	Propagate environment variables to lit test commands (#4159)	Thomas Lively	2021-09-16	1	-0/+3
\| \| \| \| \| \|	This means that when check.py tries to run the lit tests with BINARYEN_PASS_DEBUG, this is now correctly reflected in the tests. Manually validated to catch the bug identified in https://github.com/WebAssembly/binaryen/pull/4130#discussion_r709619855.
*	Fix regression from #4130 (#4158)	Alon Zakai	2021-09-16	1	-1/+211
\| \| \| \| \| \| \| \| \|	That PR reused the same node twice in the output, which fails on the assertion in BINARYEN_PASS_DEBUG=1 mode. No new test is needed because the existing test suite fails already in that mode. That the PR managed to land seems to say that we are not testing pass-debug mode on our lit tests, which we need to investigate.
*	[Wasm GC] Fix OptimizeInstructions on unreachable ref.test (#4156)	Alon Zakai	2021-09-15	1	-4/+30
\| \| \| \| \| \|	Avoids a crash in calling getHeapType when there isn't one. Also add the relevant lit test (and a few others) to the list of files to fuzz more heavily.
*	[Wasm GC] Fix lack of packing in array.init (#4153)	Alon Zakai	2021-09-14	2	-0/+29
\|
*	[OptimizeInstructions] Optimize memory.fill with constant arguments (#4130)	Max Graey	2021-09-14	2	-83/+343
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is a reland of #3071. Do similar optimizations as in #3038 but for memory.fill. `memory.fill(d, v, 0)` ==> `{ drop(d), drop(v) }` only with `ignoreImplicitTraps` or `trapsNeverHappen` `memory.fill(d, v, 1)` ==> `store8(d, v)` Further simplifications can be done only if v is constant because otherwise binary size would increase: `memory.fill(d, C, 1)` ==> `store8(d, (C & 0xFF))` `memory.fill(d, C, 2)` ==> `store16(d, (C & 0xFF) * 0x0101)` `memory.fill(d, C, 4)` ==> `store32(d, (C & 0xFF) * 0x01010101)` `memory.fill(d, C, 8)` ==> `store64(d, (C & 0xFF) * 0x0101010101010101)` `memory.fill(d, C, 16)` ==> `store128(d, i8x16.splat(C & 0xFF))`
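The constant-fill rewrites above rely on the fact that multiplying a masked byte by 0x01...01 replicates that byte across the wider store. A minimal sketch (not Binaryen code; the helper names `splat_byte` and `fill_bytes` are hypothetical) checks this against a byte-by-byte fill for the 1, 2, 4 and 8 byte cases:

```python
def splat_byte(c: int, size: int) -> int:
    """Value a single `size`-byte store must write to match memory.fill(d, c, size)."""
    multiplier = int.from_bytes(b"\x01" * size, "little")  # 0x01, 0x0101, 0x01010101, ...
    return (c & 0xFF) * multiplier


def fill_bytes(c: int, size: int) -> bytes:
    """Reference behaviour: memory.fill writes the low byte of c, `size` times."""
    return bytes([c & 0xFF]) * size


for size in (1, 2, 4, 8):
    for c in (0, 1, 0x7F, 0xAB, 0xFF, 0x1234, -1):
        stored = splat_byte(c, size).to_bytes(size, "little")
        assert stored == fill_bytes(c, size)
```

The mask matters: without `C & 0xFF`, a constant such as 0x1234 would leak its upper bits into neighbouring bytes of the store.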
*	RemoveUnusedBrs::tablify() improvements: handle EqZ and tee (#4144)	Alon Zakai	2021-09-13	2	-0/+190
\| \| \| \| \| \| \| \| \| \| \| \|	tablify() attempts to turn a sequence of br_ifs into a single br_table. This PR adds some flexibility to the specific pattern it looks for, specifically: * Accept i32.eqz as a comparison to zero, and not just to look for i32.eq against a constant. * Allow the first condition to be a tee. If it is, compare later conditions to local.get of that local. This will allow more br_tables to be emitted in j2cl output.
*	OptimizeInstructions: Optimize boolean selects (#4147)	Alon Zakai	2021-09-13	2	-4/+250
\| \| \| \| \| \| \| \| \| \| \| \|	If all a select's inputs are boolean, we can sometimes turn the select into an AND or an OR operation: x ? y : 0 => x & y, and x ? 1 : y => x \| y. I believe LLVM aggressively canonicalizes to this form. It makes sense to do here too as it is smaller (save the constant 0 or 1). It also allows further optimizations (which is why LLVM does it) but I don't think we have those yet.
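The two select rewrites above can be checked exhaustively over boolean inputs. The `select` function below is a small model of the wasm instruction written for this sketch, not a Binaryen API:

```python
from itertools import product


def select(if_true: int, if_false: int, condition: int) -> int:
    """Model of wasm select: returns if_true when condition is nonzero."""
    return if_true if condition != 0 else if_false


for x, y in product((0, 1), repeat=2):
    assert select(y, 0, x) == (x & y)   # x ? y : 0  =>  x & y
    assert select(1, y, x) == (x | y)   # x ? 1 : y  =>  x | y

# The rewrite needs boolean inputs: with x = 2 and y = 1 the AND form differs.
assert select(1, 0, 2) == 1 and (2 & 1) == 0
```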
*	Support new dylink.0 custom section format (#4141)	Sam Clegg	2021-09-11	2	-3/+3
\| \| \| \| \| \| \|	See also: spec change: https://github.com/WebAssembly/tool-conventions/pull/170 llvm change: https://reviews.llvm.org/D109595 wabt change: https://github.com/WebAssembly/wabt/pull/1707 emscripten change: https://github.com/emscripten-core/emscripten/pull/15019
*	Add an Intrinsics mechanism, and a call.without.effects intrinsic (#4126)	Alon Zakai	2021-09-10	3	-0/+288
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	An "intrinsic" is modeled as a call to an import. We could also add new IR things for them, but that would take more work and lead to less clear errors in other tools if they try to read a binary using such a nonstandard extension. A first intrinsic is added here, call.without.effects. This is basically the same as call_ref except that the optimizer is free to assume the call has no side effects. Consequently, if the result is not used then it can be optimized out (otherwise, even if it is not used, side effects could have kept it around). Likewise, the lack of side effects allows more reordering and other things. A lowering pass for intrinsics is provided. Rather than automatically lower them to normal wasm at the end of optimizations, the user must call that pass explicitly. A typical workflow might be -O --intrinsic-lowering -O That optimizes with the intrinsic present - perhaps removing calls thanks to it - then lowers it into normal wasm - it turns into a call_ref - and then optimizes further, which would turn the call_ref into a direct call, potentially inline, etc.
*	[Wasm GC] ArrayInit support (#4138)	Alon Zakai	2021-09-10	8	-9/+142
\| \| \| \| \| \| \|	array.init is like array.new_with_rtt except that it takes as arguments the values to initialize the array with (as opposed to a size and an optional initial value). Spec: https://docs.google.com/document/d/1afthjsL_B9UaMqCA5ekgVmOm75BVFu6duHNsN9-gnXw/edit#
*	Refactor MergeBlocks to use iteration; adds Wasm GC support (#4137)	Alon Zakai	2021-09-09	1	-1/+60
\| \| \| \| \| \| \| \| \|	MergeBlocks was written a very long time ago, before the iteration API, so it had a bunch of hardcoded things for specific instructions. In particular, that did not handle GC. This does a small refactoring to use iteration. The refactoring is NFC, but while doing so it adds support for new relevant instructions, including wasm GC.
*	[OptimizeInstructions] propagate sign for integer multiplication (#4098)	Max Graey	2021-09-09	2	-0/+266
\| \| \| \| \| \| \| \| \| \| \| \|	```ts -x * -y => (x * y) -x * y => -(x * y) x * -y => -(x * y), if x != C && y != C -x * C => x * -C, if C != C_pot \|\| shrinkLevel != 0 -x * C => -(x * C), otherwise ``` We are skipping propagation when lhs and rhs are constants because this should be handled by constant folding. Also skip cases like `-x * 4 -> x * -4` for `shrinkLevel != 0`, as this will be further converted to `-(x << 2)`.
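The sign-propagation identities above hold under wrapping integer arithmetic, which can be confirmed with a short check. This sketch models i32 operations by masking to 32 bits; the helper names `neg`, `mul` and `shl` are made up for the illustration:

```python
MASK32 = 0xFFFFFFFF


def neg(a: int) -> int:
    """i32 negation with wraparound."""
    return (-a) & MASK32


def mul(a: int, b: int) -> int:
    """i32 multiplication with wraparound."""
    return (a * b) & MASK32


def shl(a: int, k: int) -> int:
    """i32 shift left with wraparound."""
    return (a << k) & MASK32


samples = [0, 1, 2, 3, 7, 0x7FFFFFFF, 0x80000000, 0xFFFFFFFF, 0x12345678]
for x in samples:
    for y in samples:
        assert mul(neg(x), neg(y)) == mul(x, y)        # -x * -y => x * y
        assert mul(neg(x), y) == neg(mul(x, y))        # -x * y  => -(x * y)
        assert mul(x, neg(y)) == neg(mul(x, y))        # x * -y  => -(x * y)
        assert mul(neg(x), y) == mul(x, neg(y))        # -x * C  => x * -C
    assert mul(neg(x), 4) == neg(shl(x, 2))            # -x * 4  => -(x << 2)
```

The edge case 0x80000000 (the minimum signed value, which is its own negation) is included because it is where such identities most often go wrong; they still hold modulo 2^32.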
*	[wasm-split] Do not add exports of imported memories (#4133)	Thomas Lively	2021-09-08	1	-0/+12
\| \| \| \| \| \|	We can assume that imported memories (and the profiling data they contain) are already accessible from the module's environment, so there's no need to export them. This also avoids needing to add knowledge of "profile-memory" to Emscripten's library_dylink.js.
*	wasm-split: Export the memory if it is not already (#4121)	Alon Zakai	2021-09-07	1	-1/+4
\|