summaryrefslogtreecommitdiff
path: root/test
Commit message (Collapse)AuthorAgeFilesLines
* Return the correct flow when an RTT is breaking (#4310)Thomas Lively2021-11-051-0/+45
| | | Fixes #4308.
* Fuzz more basic GC types (#4303)Thomas Lively2021-11-042-60/+60
| | | | | Generate both nullable and non-nullable references to basic HeapTypes and introduce `i31` and `data` HeapTypes. Generate subtypes rather than exact types for all concrete-typed children.
* ChildLocalizer: Do not use locals needlessly (#4292)Alon Zakai2021-11-041-1/+84
| | | | | | | | | | | We only need to use locals if there are effects we can't remove, or if they interact with other children. Improve the comment to explain what the ChildLocalizer is working towards: a state where all the children of the expression can be reordered or removed freely (local.gets have that property, as do other things if they have no relevant effects). Aside from avoiding wasteful locals, this is necessary for running GlobalTypeOptimization on j2wasm: That code will do a global.get of an rtt, and those cannot be placed in locals.
* Fix RTTs for RTT-less instructions (#4294)Thomas Lively2021-11-033-40/+54
| | | | | | | | | | | | Allocation and cast instructions without explicit RTTs should use the canonical RTTs for the given types. Furthermore, the RTTs for nominal types should reflect the static type hierarchy. Previously, however, we implemented allocations and casts without RTTs using an alternative system that only used static types rather than RTT values. This alternative system would work fine in a world without first-class RTTs, but it did not properly allow mixing instructions that use RTTs and instructions that do not use RTTs as intended by the M4 GC spec. This PR fixes the issue by using canonical RTTs where appropriate and cleans up the relevant casting code using std::variant.
* Generate lit checks for fuzz-exec output (#4301)Thomas Lively2021-11-031-0/+141
| | | Use the new capability in a new test of RTT behavior that will be fixed in #4284,
* Effects: Differentiate mutable from immutable globals (#4286)Alon Zakai2021-10-294-10/+133
| | | | | | | | | | | | | Similar to what we do with structs, if a global is immutable then we know it cannot interact with calls. This changes the JS API for getSideEffects(). That was actually broken, as passing in the optional module param would just pass it along to the compiled C code, so it was coerced to 0 or 1, and not a pointer to a module. To fix that, this now does module.ptr to actually get the pointer, and this is now actually tested as without a module we cannot compute the effects of a global. This PR also makes the module param mandatory in the JS API, as again, without a module we can't compute global effects. (The module param has already been mandatory in the C++ API for some time.)
* Heap2Local: Handle loops (#4288)Alon Zakai2021-10-282-1/+68
| | | | | | | When the allocation we optimize away flows through a loop, then just like with a block we must change the type to be nullable, since we are replacing the allocation with a null. Fixes #4287
* Switch binaryen.js/wasm to ESM (#4280)dcode2021-10-281-2/+2
|
* Fix printing of some unreachable GC instructions (#4281)Alon Zakai2021-10-277-31/+69
| | | | | We have separate logic for printing their headers and bodies, and they were not in sync. Specifically, we would not emit drops in the body of a block, which is not valid, and would fail roundtripping on text.
* [Wasm GC] Constant Field Propagation: Handle immutable globals (#4258)Alon Zakai2021-10-271-0/+298
| | | | | | | | | | | | | | | | | | If we write an immutable global to a field, and that is the only thing we ever write, then we can replace reads of the field with a get of the global. To do that, this tracks immutable globals written to fields and not just constant values. Normally this is not needed, as if the global is immutable then we propagate its constant value to everywhere anyhow. However, for references this is useful: If we have a global immutable vtable, for example, then we cannot replace a get of it with a constant. So this PR helps with immutable reference types in globals, allowing us to propagate global.gets to them to more places, which then can allow optimizations there. This + later opts removes 25% of array.gets from j2wasm. I believe almost all of those are itable calls, so this means those are getting devirtualized now. I see something like a 5% speedup due to that.
* [OptimizeInstructions] Canonicalize relational ops with near zero on rhs (#4272)Max Graey2021-10-267-43/+134
| | | | | | | | | | | | | Canonicalize: (signed)x > -1 ==> x >= 0 (signed)x <= -1 ==> x < 0 (signed)x < 1 ==> x <= 0 (signed)x >= 1 ==> x > 0 (unsigned)x < 1 ==> x == 0 (unsigned)x >= 1 ==> x != 0 This should help #4265, and in general 0 is usually a more common constant, and reasonable to canonicalize to.
* [Wasm GC] Add missing Tuple logic to TypeRewriter (#4278)Alon Zakai2021-10-261-0/+35
| | | | | Without this, we'd just return the old type for the tuple, which meant its fields referred to unrewritten types, and possible validation errors if the types changed.
* GlobalTypeOptimization: Handle side effects in removed fields in struct.new ↵Alon Zakai2021-10-261-0/+157
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | (#4263) If struct.new operands have side effects, and we are removing the operand as the field is removed, we must keep the side effects. To handle that, store all the operands in locals and read from the locals, and then removing a local.get is always safe to do, and nothing has been reordered: (struct.new (A) (side effect) ;; this field will be removed (B) ) => (local.set $a (A)) (local.set $t (side effect)) (local.set $b (B)) (struct.new (local.get $a) (local.get $b) ) Later passes can remove unneeded local operations etc. This is necessary before enabling this pass, as this corner case occurs on j2wasm.
* [EH] Support try-delegate in CFGWalker (#4269)Heejin Ahn2021-10-211-0/+358
| | | | This adds support for `try`-`delegate` to `CFGWalker`. This also adds a single test for `catch`-less `try`.
* [Wasm GC] Global Type Optimization: Remove unread fields (#4255)Alon Zakai2021-10-202-6/+584
| | | | | | | | | | | | | | | Add struct.get tracking, and if a field is never read from, simply remove it. This will error if a field is written using struct.new with a value with side effects. It is not clear we can handle that, as if the struct.new is in a global then we can't save the other values to locals etc. to reorder things. We could perhaps use other globals for it (ugh) but at least for now, that corner case does not happen on any code I can see. This allows a quite large code size reduction on j2wasm output (20%). The reason is that many vtable fields are not actually read, and so removing them and the ref.func they hold allows us to get rid of those functions, and code that they reach.
* SimplifyGlobals: Detect trivial read-only-to-write functions (#4257)Alon Zakai2021-10-191-0/+149
| | | | | | | | | | | | | | | | | | | | | We already detected code that looks like if (foo == 0) { foo = 1; } That "read only to write" pattern occurs also in functions, like this: function bar() { if (foo == 0) return; foo = 1; } This PR detects that pattern. It moves code around to share almost all the logic with the previous pattern (the git diff is not that useful there, sadly, but looking at them side by side that should be obvious). This helps in j2cl on some common clinits, where the clinit function ends up empty, which is exactly this pattern.
* OptimizeInstructions: Ignore unreachable subsequent sets (#4259)Alon Zakai2021-10-191-0/+47
| | | Fuzzing followup to #4244.
* MergeBlocks: optimize If conditions (#4260)Alon Zakai2021-10-194-124/+152
| | | | | Code in the If condition can be moved out to before the if. Existing test updates are 99% whitespace.
* Add table.grow operation (#4245)Max Graey2021-10-188-71/+285
|
* Add a --structural flag (#4252)Thomas Lively2021-10-1610-171/+40
| | | | | | | | | Just as the --nominal flag forces all types to be parsed as nominal, the --structural flag forces all types to be parsed as equirecursive. This is the current default behavior, but a future PR will change the default to parse types as either structural or nominal according to their syntax or encoding. This new flag will then be necessary to get the current behavior. Also take this opportunity to deduplicate more flags in the help tests.
* [Wasm GC] Propagate immutable fields (#4251)Alon Zakai2021-10-151-0/+820
| | | | Very simple with the work so far, just add StructGet/ArrayGet code to check if the field is immutable, and allow the get to go through in that case.
* [wasm-metadce] Add support for tags (#4250)Heejin Ahn2021-10-144-0/+41
| | | | | | This adds support for tag-using instructions (`throw` and `catch`) to wasm-metadce. We had to use a hacky workaround in emscripten-core/emscripten#15266 because of the lack of this support; after this lands we can remove it.
* Switch from "extends" to M4 nominal syntax (#4248)Thomas Lively2021-10-1428-221/+2254
| | | | | | | | Switch from "extends" to M4 nominal syntax Change all test inputs from using the old (extends $super) syntax to using the new *_subtype syntax for their inputs and also update the printer to emit the new syntax. Add a new test explicitly testing the old notation to make sure it keeps working until we remove support for it.
* [Wasm GC] Optimize subsequent struct.sets after a struct.new (#4244)Alon Zakai2021-10-141-0/+719
| | | | | | | | | | | | | | | | | This optimizes this type of pattern: (local.set $x (struct.new X Y Z)) (struct.set (local.get $x) X') => (local.set $x (struct.new X' Y Z)) Note how the struct.set is removed, and X' moves to where X was. This removes almost 90% (!) of the struct.sets in j2wasm output, which reduces total code size by 2.5%. However, I see no speedup with this - I guess that either this is not on the hot path, or V8 optimizes it well already, or the CPU is making stores "free" anyhow...
* Precompute: Track reference identity (#4243)Alon Zakai2021-10-142-164/+532
| | | | | | | | | | | | | | Precompute will run the interpreter on struct.new etc. repeatedly, as it keeps doing so while it propagates constant values around (if one of the operands to the struct.new becomes constant, that could have a noticeable effect). But creating new GC data means we lose track of their identity, and so ref.eq would not work, and we disabled basically all struct operations. This implements identity tracking so we can start to optimize there, which is a step towards using it for immutable field propagation. To track identity, always store the data representing each struct.new in the source using the same GCData structure. That keeps identity consistent no matter how many times we execute.
* MergeBlocks: Allow side effects in a ternary's first element (#4238)Alon Zakai2021-10-133-46/+136
| | | | | | | | | | | Side effects in the first element are always ok there, as they are not moved across anything else: they happen before their parent both before and after the opt. The pass just left ternary as a TODO, so do at least one part of that now (we can do the rest as well, with some care). This is fairly useful on array.set which has 3 operands, and the first often has interesting things in it.
* [Selectify] Increase TooCostlyToRunUnconditionally from 7 to 9 (#4228)Max Graey2021-10-132-8/+55
| | | | This makes Binaryen match LLVM on a real-world case, which is probably the safest heuristic to use.
* [Wasm GC] Take advantage of immutable struct fields in effects.h (#4240)Alon Zakai2021-10-132-1/+63
| | | | | | This is the easy part of using immutability more: Just note immutable fields as such when we read from them, and then a write to a struct does not interfere with such reads. That is, only a read from a mutable field can notice the effect of a write.
* Fix function name `BinaryenTableSizeSetTable` (#4230)Paulo Matos2021-10-121-1/+5
| | | | | `BinaryenTableSizeSetTable` was being declared in the header correctly, but defined as `BinaryenTableSetSizeTable`. Add test for `BinaryenTableSizeGetTable` and `BinaryenTableSizeSetTable`.
* Fix tee/as-non-null reordering when writing to a non-nullable param (#4232)Alon Zakai2021-10-111-0/+31
|
* Add table.size operation (#4224)Max Graey2021-10-087-25/+138
|
* Parse milestone 4 nominal types (#4222)Thomas Lively2021-10-082-0/+130
| | | | | | | | | Implement parsing the new {func,struct,array}_subtype format for nominal types. For now, the new format is parsed the same way the old-style (extends X) format is parsed, i.e. in --nominal mode types are parsed as nominal but otherwise they are parsed as equirecursive. Intentionally do not parse the new types unconditionally as nominal for now to allow frontends to update their nominal text format while continuing to use the workflow of running wasm-opt without --nominal to lower nominal types to structural types.
* Emit heap types for call_indirect that match the table (#4221)Alon Zakai2021-10-082-2/+90
| | | | | | | | See #4220 - this lets us handle the common case for now of simply having an identical heap type to the table when the signature is identical. With this PR, #4207's optimization of call_ref + table.get into call_indirect now leads to a binary that works in V8 in nominal mode.
* Directize: Do not optimize if a table has a table.set (#4218)Alon Zakai2021-10-071-0/+64
| | | Followup to #4215
* Add table.set operation (#4215)Max Graey2021-10-079-31/+225
|
* [Wasm GC] GlobalTypeOptimization: Turn fields immutable when possible (#4213)Alon Zakai2021-10-062-0/+350
| | | | | | | | | | | | | | | | | | | | | Add a new pass to perform global type optimization. So far this just does one thing, to find fields with no struct.set and to turn them immutable (where possible - sub and supertypes must agree). To do that, this adds a GlobalTypeRewriter utility which rewrites all the heap types in the module, allowing changes while doing so. In this PR, the change is to flip the mutable field. Otherwise, the utility handles all the boilerplate of creating temp heap types using a TypeBuilder, and it handles replacing the types in every place they are used in the module. This is not enabled by default yet as I don't see enough of a benefit on j2cl. This PR is basically the simplest thing to do in the space of global type optimization, and the simplest way I can think of to fully test the GlobalTypeRewriter (which can't be done as a unit test, really, since we want to emit a full module and validate it etc.). This PR builds the foundation for more complicated things like removing unused fields, subtyping fields, and more.
* [OptimizeInstructions] Fold select into zero or single expression for some ↵Max Graey2021-10-053-27/+422
| | | | | | | | | | | patterns (#4181) i32(x) ? i32(x) : 0 ==> x i32(x) ? 0 : i32(x) ==> {x, 0} i64(x) == 0 ? 0 : i64(x) ==> x i64(x) != 0 ? i64(x) : 0 ==> x i64(x) == 0 ? i64(x) : 0 ==> {x, 0} i64(x) != 0 ? 0 : i64(x) ==> {x, 0}
* Implement standalone nominal types (#4201)Thomas Lively2021-10-052-0/+527
| | | | | | | | These new nominal types do not depend on the global type sytem being changed with the --nominal flag. Instead, they can coexist with the existing equirecursive structural types, as required in the new milestone 4 spec. This PR implements subtyping, upper bounding, canonicalizing, and other type operations but using the new types in the parsers and elsewhere in Binaryen is left to a follow-on PR.
* Fix roundtripping specialized element segments of table zero (#4212)Alon Zakai2021-10-051-0/+42
| | | | | Before this fix, the first table (index 0) is counted as its element segment having "no table index" even when its type is not funcref, which could break things if that table had a more specialized type.
* Optimize call_indirect of a select of two constants (#4208)Alon Zakai2021-10-041-0/+299
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | (call_indirect ..args.. (select (i32.const x) (i32.const y) (condition) ) ) => (if (condition) (call $func-for-x ..args.. ) (call $func-for-y ..args.. ) ) To do this we must reorder the condition with the args, and also use the args more than once, so place them all in locals. This works towards the goal of polymorphic devirtualization, that is, turning an indirect call of more than one possible target into more than one direct call.
* Optimize call_ref+table.get => call_indirect (#4207)Alon Zakai2021-10-041-5/+28
| | | Rather than load from the table and call that reference, call using the table.
* Fix inlining name collision (#4206)Alon Zakai2021-10-041-1/+28
|
* Port a batch of passes tests to lit (#4202)Thomas Lively2021-10-0419-2090/+2212
| | | | | | | | - fpcast-emu.wast - generate-dyncalls_all-features.wast - generaite-i64-dyncalls.wast - instrument-locals_all-features_disable-typed-function-references.wast - instrument-memory.wast - instrument-memory64.wast
* Implement table.get (#4195)Alon Zakai2021-09-304-25/+212
| | | | Adds the part of the spec test suite that this passes (without table.set we can't do it all).
* [Wasm GC] Optimize static (rtt-free) operations (#4186)Alon Zakai2021-09-302-15/+665
| | | | | | | | | | | | Now that they are all implemented, we can optimize them. This removes the big if that ignored static operations, and implements things for them. In general this matches the existing rtt-using case, but there are a few things we can do better, which this does: * A cast of a subtype to a type always succeeds. * A test of a subtype to a type is always 1 (if non-nullable). * Repeated static casts can leave just the most demanding of them.
* Add a SmallSet and use it in LocalGraph. NFC (#4188)Alon Zakai2021-09-292-0/+222
| | | | | | | | | | | | | | | | | A SmallSet starts with fixed storage that it uses in the simplest possible way (linear scan, no sorting). If it exceeds a size then it starts using a normal std::set. So for small amounts of data it avoids allocation and any other overhead. This adds a unit test and also uses it in LocalGraph which provides a large amount of additional coverage. I also changed an unrelated data structure from std::map to std::unordered_map which I noticed while doing profiling in LocalGraph. (And a tiny bit of additional refactoring there.) This makes LocalGraph-using passes like ssa-nomerge and precompute-propagate 10-15% faster on a bunch of real-world codebases I tested.
* Clang-format c/cpp files in test directory (#4192)Heejin Ahn2021-09-2934-1701/+2406
| | | | | | | | | This clang-formats c/cpp files in test/ directory, and updates clang-format-diff.sh so that it does not ignore test/ directory anymore. bigswitch.cpp is excluded from formatting, because there are big commented-out code blocks, and apparently clang-format messes up formatting in them. Also to make matters worse, different clang-format versions do different things on those commented-out code blocks.
* Disable partial inlining by default and add a flag for it. (#4191)Alon Zakai2021-09-274-626/+585
| | | | | Locally I saw a 10% speedup on j2cl but reports of regressions have arrived, so let's disable it for now pending investigation. The option added here should make it easy to experiment.
* [wasm-split] Disallow mixing --profile, --keep-funcs, and --split-funcs (#4187)Thomas Lively2021-09-244-14/+35
| | | | | | | | | | | | | Previously the set of functions to keep was initially empty, then the profile added new functions to keep, then the --keep-funcs functions were added, then the --split-funcs functions were removed. This method of composing these different options was arbitrary and not necessarily intuitive, and it prevented reasonable workflows from working. For example, providing only a --split-funcs list would result in all functions being split out not matter which functions were listed. To make the behavior of these options, and --split-funcs in particular, more intuitive, disallow mixing them and when --split-funcs is used, split out only the listed functions.
* Precompute: Only run a single LocalGraph iteration (#4184)Alon Zakai2021-09-232-2/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | Precompute has a mode in which it propagates results from local.sets to local.gets. That constructs a LocalGraph which is a non-trivial amount of work. We used to run multiple iterations of this, but investigation shows that such opportunities are extremely rare, as doing just a single propagation iteration has no effect on the entire emscripten benchmark suite, nor on j2cl output. Furthermore, we run this pass twice in the normal pipeline (once early, once late) so even if there are such opportunities they may be optimized already. And, --converge is a way to get additional iterations of all passes if a user wants that, so it makes sense not to costly work for more iterations automatically. In effect, 99.99% of the time before this pass we would create the LocalGraph twice: once the first time, then a second time only to see that we can't actually optimize anything further. This PR makes us only create it once, which makes precompute-propagate 10% faster on j2cl and even faster on other things like poppler (33%) and LLVM (29%). See the change in the test suite for an example of a case that does require more than one iteration to be optimized. Note that even there, we only manage to get benefit from a second iteration by doing something that overlaps with another pass (optimizing out an if with condition 0), which shows even more how unnecessary the extra work was. See #4165