path: root/src
Commit message | Author | Age | Files | Lines
...
* OptimizeInstructions: Ignore unreachable subsequent sets (#4259)Alon Zakai2021-10-191-0/+5
    Fuzzing followup to #4244.
* MergeBlocks: optimize If conditions (#4260)Alon Zakai2021-10-191-0/+5
    Code in the If condition can be moved out to before the if. Existing test updates are 99% whitespace.
* Update to C++17 and use std::optional for getSuperType (#4203)Derek Schuff2021-10-188-35/+30
    This sets the C++ standard variable in the build to C++17, and makes use of std::optional (a C++17 library feature) in one place, to test that it's working.
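    Below is a minimal, self-contained sketch of the std::optional idiom this describes; the HeapType alias, the hierarchy map, and the getSuperType signature here are stand-ins for illustration, not Binaryen's actual code.

        #include <iostream>
        #include <map>
        #include <optional>
        #include <string>

        using HeapType = std::string;

        // Absence of a supertype is expressed in the return type rather than
        // with a sentinel value or an out-parameter.
        std::optional<HeapType> getSuperType(
            const std::map<HeapType, HeapType>& supers, const HeapType& type) {
          auto it = supers.find(type);
          if (it == supers.end()) {
            return std::nullopt; // no declared supertype
          }
          return it->second;
        }

        int main() {
          std::map<HeapType, HeapType> supers{{"$B", "$A"}};
          if (auto super = getSuperType(supers, "$B")) {
            std::cout << "super of $B is " << *super << '\n';
          }
          if (!getSuperType(supers, "$A")) {
            std::cout << "$A has no supertype\n";
          }
        }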
* Add table.grow operation (#4245)Max Graey2021-10-1824-42/+343
* Add a --structural flag (#4252)Thomas Lively2021-10-161-2/+9
    Just as the --nominal flag forces all types to be parsed as nominal, the --structural flag forces all types to be parsed as equirecursive. This is the current default behavior, but a future PR will change the default to parse types as either structural or nominal according to their syntax or encoding. This new flag will then be necessary to get the current behavior. Also take this opportunity to deduplicate more flags in the help tests.
* [Wasm GC] Propagate immutable fields (#4251)Alon Zakai2021-10-151-2/+31
    Very simple given the work so far: just add StructGet/ArrayGet code to check whether the field is immutable, and allow the get to go through in that case.
* [wasm-metadce] Add support for tags (#4250)Heejin Ahn2021-10-141-0/+17
    This adds support for tag-using instructions (`throw` and `catch`) to wasm-metadce. We had to use a hacky workaround in emscripten-core/emscripten#15266 because of the lack of this support; after this lands we can remove it.
* Switch from "extends" to M4 nominal syntax (#4248)Thomas Lively2021-10-142-12/+41
    Change all test inputs from using the old (extends $super) syntax to using the new *_subtype syntax, and also update the printer to emit the new syntax. Add a new test that explicitly tests the old notation to make sure it keeps working until we remove support for it.
* [Wasm GC] Optimize subsequent struct.sets after a struct.new (#4244)Alon Zakai2021-10-141-0/+141
    This optimizes this type of pattern:

        (local.set $x (struct.new X Y Z))
        (struct.set (local.get $x) X')
        =>
        (local.set $x (struct.new X' Y Z))

    Note how the struct.set is removed, and X' moves to where X was. This removes almost 90% (!) of the struct.sets in j2wasm output, which reduces total code size by 2.5%. However, I see no speedup with this - I guess that either this is not on the hot path, or V8 optimizes it well already, or the CPU is making stores "free" anyhow...
* Refactor binaryen-c to use Builder when possible. NFC (#4247)Max Graey2021-10-141-59/+32
* [wasm-metadce] Don't add null names to roots (#4246)Heejin Ahn2021-10-141-7/+5
    Not sure why the current code tries to add the name even when it is null, but it causes `dump()` to behave strangely and pollute stdout when it tries to print `root.str`. This also changes code that prints `Name.str` to print just `Name`; when `Name.str` is null, that prints `(null Name)` instead of polluting stdout, and it is the recommended way of printing `Name` anyway.
* Precompute: Track reference identity (#4243)Alon Zakai2021-10-141-15/+78
    Precompute will run the interpreter on struct.new etc. repeatedly, as it keeps doing so while it propagates constant values around (if one of the operands to the struct.new becomes constant, that could have a noticeable effect). But creating new GC data on each evaluation means we lose track of identity, and so ref.eq would not work, and we disabled basically all struct operations there. This implements identity tracking so we can start to optimize there, which is a step towards using it for immutable field propagation. To track identity, always store the data representing each struct.new in the source using the same GCData structure. That keeps identity consistent no matter how many times we execute.
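    A toy sketch of the identity-tracking idea (the Expression, GCData, and Evaluator types here are simplified stand-ins, not Binaryen's interpreter classes): the allocation for a given struct.new expression is cached, so re-evaluating it yields the same object and reference equality is preserved.

        #include <cassert>
        #include <map>
        #include <memory>
        #include <utility>
        #include <vector>

        struct Expression {};                       // stands in for a struct.new node
        struct GCData { std::vector<int> fields; }; // stands in for allocated GC data

        struct Evaluator {
          // One shared GCData per struct.new expression in the source.
          std::map<Expression*, std::shared_ptr<GCData>> cache;

          std::shared_ptr<GCData> evalStructNew(Expression* curr,
                                                std::vector<int> fields) {
            auto& data = cache[curr];
            if (!data) {
              data = std::make_shared<GCData>(GCData{std::move(fields)});
            }
            return data; // same identity no matter how many times we execute
          }
        };

        int main() {
          Evaluator eval;
          Expression structNew;
          auto a = eval.evalStructNew(&structNew, {1, 2});
          auto b = eval.evalStructNew(&structNew, {1, 2});
          assert(a == b); // a ref.eq-style comparison still sees one allocation
        }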
* MergeBlocks: Allow side effects in a ternary's first element (#4238)Alon Zakai2021-10-131-6/+2
    Side effects in the first element are always ok there, as they are not moved across anything else: they happen before their parent both before and after the opt. The pass just left ternary as a TODO, so do at least one part of that now (we can do the rest as well, with some care). This is fairly useful on array.set which has 3 operands, and the first often has interesting things in it.
* [Selectify] Increase TooCostlyToRunUnconditionally from 7 to 9 (#4228)Max Graey2021-10-131-1/+4
    This makes Binaryen match LLVM on a real-world case, which is probably the safest heuristic to use.
* Fix table.size typo in declarations (#4242)Max Graey2021-10-131-1/+1
* [Wasm GC] Take advantage of immutable struct fields in effects.h (#4240)Alon Zakai2021-10-131-8/+26
    This is the easy part of using immutability more: Just note immutable fields as such when we read from them, and then a write to a struct does not interfere with such reads. That is, only a read from a mutable field can notice the effect of a write.
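    A small illustrative sketch of that interference rule, using hypothetical flag names rather than the actual effects.h fields: a struct write only conflicts with reads of mutable fields.

        #include <cassert>

        struct Effects {
          bool readsMutableStruct = false;   // e.g. struct.get of a mutable field
          bool readsImmutableStruct = false; // e.g. struct.get of an immutable field
          bool writesStruct = false;         // e.g. struct.set

          bool invalidates(const Effects& other) const {
            // A write only interferes with reads of mutable fields; an immutable
            // field can only have been set at allocation, so reads of it are safe.
            return (writesStruct && other.readsMutableStruct) ||
                   (readsMutableStruct && other.writesStruct);
          }
        };

        int main() {
          Effects write;
          write.writesStruct = true;
          Effects immutableRead;
          immutableRead.readsImmutableStruct = true;
          Effects mutableRead;
          mutableRead.readsMutableStruct = true;
          assert(!write.invalidates(immutableRead)); // can reorder across the write
          assert(write.invalidates(mutableRead));    // cannot
        }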
* Minor fixes in binary type name emitting (#4239)Alon Zakai2021-10-131-2/+3
    Add an assert on not emitting a null name (which would cause a crash a few lines down, on trying to read its bytes). I hit that when writing a buggy pass that updated field names. Also fix the case of a type not having a name but some of its fields having names. We can't test that at the moment since our text format requires types to have names anyhow, so this is a fix for a possible future where we do allow parsing non-named types.
* [Costs] More precise costs for int div & rem (#4229)Max Graey2021-10-121-2/+2
    Div/rem by a constant can be optimized by VMs, so it is usually closer to the speed of a mul. Div on 64-bit (either with or without a constant) can be slower than 32-bit, so bump that up by one as well.
* Fix function name `BinaryenTableSizeSetTable` (#4230)Paulo Matos2021-10-121-1/+1
    `BinaryenTableSizeSetTable` was declared correctly in the header, but defined as `BinaryenTableSetSizeTable`. Add tests for `BinaryenTableSizeGetTable` and `BinaryenTableSizeSetTable`.
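    A small usage sketch of the two functions, assuming the usual Binaryen C API getter/setter shape for expression fields; the retargetTableSize helper and the "other_table" name are made up for illustration, and expr is presumed to be a table.size expression obtained elsewhere.

        #include <cstdio>
        #include <binaryen-c.h>

        void retargetTableSize(BinaryenExpressionRef expr) {
          const char* oldTable = BinaryenTableSizeGetTable(expr);
          std::printf("table.size currently targets: %s\n", oldTable);
          // This is the setter whose definition previously had the typo'd name.
          BinaryenTableSizeSetTable(expr, "other_table");
        }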
* Remove forgotten call_ref-related logic in Directize. NFC (#4233)Alon Zakai2021-10-111-4/+2
    We moved call_ref out of there, but it was still checking for the possible presence of call_refs (using the feature), which meant that even if we had no valid tables to optimize on, we'd scan the whole module.
* Add runOnModuleCode helper. NFC (#4234)Alon Zakai2021-10-115-5/+11
    This method is in parallel to runOnFunction above it. It sets the runner and then does the walk, like that method. Also set runner to nullptr by default. I noticed ubsan was warning on things here, which this should avoid, but otherwise I'm not aware of an actual bug, so this should be NFC. But it does provide a safer API that should avoid future bugs.
* Fix tee/as-non-null reordering when writing to a non-nullable param (#4232)Alon Zakai2021-10-111-1/+5
* Fix typo in comment (#4231)Paulo Matos2021-10-111-1/+1
* Add table.size operation (#4224)Max Graey2021-10-0823-6/+162
* Parse milestone 4 nominal types (#4222)Thomas Lively2021-10-081-15/+39
    Implement parsing the new {func,struct,array}_subtype format for nominal types. For now, the new format is parsed the same way the old-style (extends X) format is parsed, i.e. in --nominal mode types are parsed as nominal but otherwise they are parsed as equirecursive. Intentionally do not parse the new types unconditionally as nominal for now to allow frontends to update their nominal text format while continuing to use the workflow of running wasm-opt without --nominal to lower nominal types to structural types.
* Emit heap types for call_indirect that match the table (#4221)Alon Zakai2021-10-085-7/+47
    See #4220 - this lets us handle the common case for now: when the signature is identical, simply use a heap type identical to the table's. With this PR, #4207's optimization of call_ref + table.get into call_indirect now leads to a binary that works in V8 in nominal mode.
* Directize: Do not optimize if a table has a table.set (#4218)Alon Zakai2021-10-071-13/+44
    Followup to #4215
* Add table.set operation (#4215)Max Graey2021-10-0725-18/+369
* Rename field names from "name" to "field" in DELEGATE macros (#4216)Alon Zakai2021-10-068-218/+218
    Clearer this way.
* [Wasm GC] GlobalTypeOptimization: Turn fields immutable when possible (#4213)Alon Zakai2021-10-066-0/+432
    Add a new pass to perform global type optimization. So far this just does one thing: find fields with no struct.set and turn them immutable (where possible - sub- and supertypes must agree).

    To do that, this adds a GlobalTypeRewriter utility which rewrites all the heap types in the module, allowing changes while doing so. In this PR, the change is to flip the mutable field. Otherwise, the utility handles all the boilerplate of creating temp heap types using a TypeBuilder, and it handles replacing the types in every place they are used in the module.

    This is not enabled by default yet as I don't see enough of a benefit on j2cl. This PR is basically the simplest thing to do in the space of global type optimization, and the simplest way I can think of to fully test the GlobalTypeRewriter (which can't be done as a unit test, really, since we want to emit a full module and validate it etc.). This PR builds the foundation for more complicated things like removing unused fields, subtyping fields, and more.
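    A self-contained sketch of the decision the pass makes, using a toy representation rather than the actual GlobalTypeRewriter/TypeBuilder machinery: a field can only be made immutable if it is never written by a struct.set, and the matching field in sub- and supertypes must end up with the same mutability, which one can enforce by propagating "must stay mutable" across the hierarchy until it stabilizes.

        #include <cstddef>
        #include <iostream>
        #include <vector>

        int main() {
          // Type 1 declares type 0 as its supertype; both have two fields.
          std::vector<int> superOf = {-1, 0};
          // keepMutable[type][field]: did the whole-program scan see a struct.set?
          std::vector<std::vector<bool>> keepMutable = {{false, false},
                                                        {false, true}};

          // Sync each type with its direct supertype until nothing changes, so
          // that matching fields agree: if either needs to stay mutable, both do.
          bool changed = true;
          while (changed) {
            changed = false;
            for (size_t type = 0; type < superOf.size(); ++type) {
              int super = superOf[type];
              if (super < 0) {
                continue;
              }
              for (size_t field = 0; field < keepMutable[type].size(); ++field) {
                bool either = keepMutable[type][field] || keepMutable[super][field];
                if (either != keepMutable[type][field] ||
                    either != keepMutable[super][field]) {
                  keepMutable[type][field] = keepMutable[super][field] = either;
                  changed = true;
                }
              }
            }
          }

          for (size_t type = 0; type < keepMutable.size(); ++type) {
            for (size_t field = 0; field < keepMutable[type].size(); ++field) {
              std::cout << "type " << type << ", field " << field << ": "
                        << (keepMutable[type][field] ? "stays mutable"
                                                     : "can be made immutable")
                        << '\n';
            }
          }
        }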
* [OptimizeInstructions] Fold select into zero or single expression for some patterns (#4181)Max Graey2021-10-051-0/+57
        i32(x) ? i32(x) : 0       ==>  x
        i32(x) ? 0 : i32(x)       ==>  {x, 0}
        i64(x) == 0 ? 0 : i64(x)  ==>  x
        i64(x) != 0 ? i64(x) : 0  ==>  x
        i64(x) == 0 ? i64(x) : 0  ==>  {x, 0}
        i64(x) != 0 ? 0 : i64(x)  ==>  {x, 0}
* Implement standalone nominal types (#4201)Thomas Lively2021-10-052-21/+97
    These new nominal types do not depend on the global type system being changed with the --nominal flag. Instead, they can coexist with the existing equirecursive structural types, as required in the new milestone 4 spec. This PR implements subtyping, upper bounding, canonicalizing, and other type operations, but using the new types in the parsers and elsewhere in Binaryen is left to a follow-on PR.
* Update nominal binary format to match milestone 4 (#4211)Thomas Lively2021-10-041-14/+37
    Update the binary format used in --nominal mode to match the format of nominal types in milestone 4. In particular, types without declared supertypes are now emitted using the nominal type codes with either `func` or `data` as their supertypes. This change is hopefully enough to get --nominal mode code running on V8's milestone 4 implementation until the rest of the type system changes can be implemented for use without --nominal.
* Fix roundtripping specialized element segments of table zero (#4212)Alon Zakai2021-10-051-1/+6
    Before this fix, the element segments of the first table (index 0) were counted as having "no table index" even when the table's type was not funcref, which could break things if that table had a more specialized type.
* Optimize call_indirect of a select of two constants (#4208)Alon Zakai2021-10-041-29/+100
        (call_indirect ..args..
          (select
            (i32.const x)
            (i32.const y)
            (condition)
          )
        )
        =>
        (if (condition)
          (call $func-for-x
            ..args..
          )
          (call $func-for-y
            ..args..
          )
        )

    To do this we must reorder the condition with the args, and also use the args more than once, so place them all in locals. This works towards the goal of polymorphic devirtualization, that is, turning an indirect call with more than one possible target into more than one direct call.
* Fix RefNull in wasm-delegations-fields.def (#4210)Alon Zakai2021-10-041-1/+0
    The type field is present in all Expressions, but RefNull's delegations marked it as if it were a new field. That meant that we processed it twice, which was mostly just some extra work.
* Refactor generic functionality out of ConstantFieldPropagation. NFC (#4209)Alon Zakai2021-10-043-249/+409
    This just moves code outside and makes it more generic. One set of functionality is "struct utils", which are tools to scan wasm for info about the usage of struct fields, and to analyze that data. The other tool is a general analysis of nominal subtypes. The code will be useful in a few upcoming passes, so this will avoid a significant amount of code duplication.
* Optimize call_ref+table.get => call_indirect (#4207)Alon Zakai2021-10-041-0/+13
    Rather than load from the table and call that reference, call using the table.
* Fix inlining name collision (#4206)Alon Zakai2021-10-042-0/+39
* Fix Emscripten build by changing `|` to `||` (#4205)Thomas Lively2021-10-041-1/+1
    Emscripten must have rolled in a new warning about using `|` on booleans.
* Use ptrdiff_t instead of long for type iterator diff (#4200)Derek Schuff2021-10-011-1/+1
* Remove use of std::iterator (#4199)Derek Schuff2021-10-012-6/+12
    It's deprecated in C++17.
* Implement table.get (#4195)Alon Zakai2021-09-3021-25/+189
    Adds the part of the spec test suite that this passes (without table.set we can't do it all).
* [Wasm GC] Optimize static (rtt-free) operations (#4186)Alon Zakai2021-09-301-91/+134
    Now that they are all implemented, we can optimize them. This removes the big if that ignored static operations, and implements things for them. In general this matches the existing rtt-using case, but there are a few things we can do better, which this does:
      - A cast of a subtype to a type always succeeds.
      - A test of a subtype to a type is always 1 (if non-nullable).
      - Repeated static casts can leave just the most demanding of them.
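    A toy sketch of the last rule (hypothetical string-based types and subtype map, not Binaryen's type system): when a cast to one type is nested in a cast to another and one is a subtype of the other, only the more demanding cast needs to remain.

        #include <cassert>
        #include <map>
        #include <string>

        using Type = std::string;

        bool isSubType(const std::map<Type, Type>& superOf, Type sub,
                       const Type& super) {
          while (true) {
            if (sub == super) {
              return true;
            }
            auto it = superOf.find(sub);
            if (it == superOf.end()) {
              return false;
            }
            sub = it->second;
          }
        }

        // Given a cast to `inner` nested directly inside a cast to `outer`,
        // return the single cast worth keeping, assuming one type is a subtype
        // of the other (otherwise both casts are meaningful).
        Type mostDemanding(const std::map<Type, Type>& superOf, const Type& outer,
                           const Type& inner) {
          return isSubType(superOf, inner, outer) ? inner : outer;
        }

        int main() {
          std::map<Type, Type> superOf{{"$B", "$A"}}; // $B is a subtype of $A
          assert(mostDemanding(superOf, "$A", "$B") == "$B");
          assert(mostDemanding(superOf, "$B", "$A") == "$B");
        }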
* Add a SmallSet and use it in LocalGraph. NFC (#4188)Alon Zakai2021-09-293-8/+288
    A SmallSet starts with fixed storage that it uses in the simplest possible way (linear scan, no sorting). If it exceeds that size then it starts using a normal std::set. So for small amounts of data it avoids allocation and any other overhead. This adds a unit test and also uses it in LocalGraph, which provides a large amount of additional coverage. I also changed an unrelated data structure from std::map to std::unordered_map, which I noticed while doing profiling in LocalGraph. (And a tiny bit of additional refactoring there.) This makes LocalGraph-using passes like ssa-nomerge and precompute-propagate 10-15% faster on a bunch of real-world codebases I tested.
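    A simplified, self-contained sketch of the SmallSet idea (not the actual implementation): up to N items live in fixed storage scanned linearly, and on overflow everything moves to a std::set.

        #include <algorithm>
        #include <array>
        #include <cassert>
        #include <cstddef>
        #include <set>

        template<typename T, size_t N>
        struct SmallSetSketch {
          std::array<T, N> fixed{};
          size_t used = 0;          // number of valid entries in fixed storage
          std::set<T> flexible;     // used once we overflow the fixed storage
          bool usingFlexible = false;

          void insert(const T& x) {
            if (usingFlexible) {
              flexible.insert(x);
              return;
            }
            if (std::find(fixed.begin(), fixed.begin() + used, x) !=
                fixed.begin() + used) {
              return; // already present
            }
            if (used < N) {
              fixed[used++] = x;
              return;
            }
            // Overflow: move everything into the std::set.
            usingFlexible = true;
            flexible.insert(fixed.begin(), fixed.end());
            flexible.insert(x);
          }

          bool count(const T& x) const {
            if (usingFlexible) {
              return flexible.count(x) > 0;
            }
            return std::find(fixed.begin(), fixed.begin() + used, x) !=
                   fixed.begin() + used;
          }
        };

        int main() {
          SmallSetSketch<int, 2> s;
          s.insert(1);
          s.insert(2);
          s.insert(2);
          s.insert(3); // overflows into the std::set
          assert(s.count(1) && s.count(2) && s.count(3) && !s.count(4));
        }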
* Disable partial inlining by default and add a flag for it. (#4191)Alon Zakai2021-09-273-2/+16
    Locally I saw a 10% speedup on j2cl but reports of regressions have arrived, so let's disable it for now pending investigation. The option added here should make it easy to experiment.
* Inlining: Remove unneeded functions in linear time (#4190)Alon Zakai2021-09-271-2/+7
    By mistake, the recent partial inlining work introduced quadratic time into the compiler: erasing a single function from the list of functions takes linear time, which is why we have removeFunctions, which removes a group at a time. This isn't noticeable on small programs, but on j2cl output this makes the inlining-optimizing step 2x faster. See #4165.
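    A self-contained sketch of the linear-time removal described above, with a toy Function type rather than the actual Module API: the whole group is removed in a single remove_if pass instead of erasing functions one at a time.

        #include <algorithm>
        #include <cassert>
        #include <memory>
        #include <string>
        #include <unordered_set>
        #include <vector>

        struct Function { std::string name; };

        // One pass over the list, regardless of how many functions are removed.
        void removeFunctions(std::vector<std::unique_ptr<Function>>& funcs,
                             const std::unordered_set<std::string>& toRemove) {
          funcs.erase(std::remove_if(funcs.begin(), funcs.end(),
                                     [&](const std::unique_ptr<Function>& f) {
                                       return toRemove.count(f->name) > 0;
                                     }),
                      funcs.end());
        }

        int main() {
          std::vector<std::unique_ptr<Function>> funcs;
          for (auto* name : {"a", "b", "c"}) {
            funcs.push_back(std::make_unique<Function>(Function{name}));
          }
          removeFunctions(funcs, {"a", "c"});
          assert(funcs.size() == 1 && funcs[0]->name == "b");
        }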
* [wasm-split] Disallow mixing --profile, --keep-funcs, and --split-funcs (#4187)Thomas Lively2021-09-242-41/+40
    Previously the set of functions to keep was initially empty, then the profile added new functions to keep, then the --keep-funcs functions were added, then the --split-funcs functions were removed. This method of composing these different options was arbitrary and not necessarily intuitive, and it prevented reasonable workflows from working. For example, providing only a --split-funcs list would result in all functions being split out, no matter which functions were listed. To make the behavior of these options, and --split-funcs in particular, more intuitive, disallow mixing them, and when --split-funcs is used, split out only the listed functions.
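    A minimal sketch of the mutual-exclusion check this implies, as a standalone program with hypothetical argument scanning (not the actual wasm-split option parser):

        #include <iostream>
        #include <string>
        #include <vector>

        int main(int argc, char** argv) {
          std::vector<std::string> args(argv + 1, argv + argc);
          bool hasProfile = false, hasKeep = false, hasSplit = false;
          for (const auto& arg : args) {
            if (arg.rfind("--profile", 0) == 0) { hasProfile = true; }
            if (arg.rfind("--keep-funcs", 0) == 0) { hasKeep = true; }
            if (arg.rfind("--split-funcs", 0) == 0) { hasSplit = true; }
          }
          // At most one of the three options may be used at a time.
          if (int(hasProfile) + int(hasKeep) + int(hasSplit) > 1) {
            std::cerr << "error: --profile, --keep-funcs, and --split-funcs are "
                         "mutually exclusive\n";
            return 1;
          }
          std::cout << "options ok\n";
          return 0;
        }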
* Precompute: Only run a single LocalGraph iteration (#4184)Alon Zakai2021-09-231-19/+24
    Precompute has a mode in which it propagates results from local.sets to local.gets. That constructs a LocalGraph, which is a non-trivial amount of work. We used to run multiple iterations of this, but investigation shows that such opportunities are extremely rare: doing just a single propagation iteration has no effect on the entire emscripten benchmark suite, nor on j2cl output. Furthermore, we run this pass twice in the normal pipeline (once early, once late), so even if there are such opportunities they may be optimized already. And --converge is a way to get additional iterations of all passes if a user wants that, so it makes sense not to do costly work for more iterations automatically.

    In effect, before this change we would create the LocalGraph twice 99.99% of the time: once the first time, then a second time only to see that we can't actually optimize anything further. This PR makes us only create it once, which makes precompute-propagate 10% faster on j2cl and even faster on other things like poppler (33%) and LLVM (29%).

    See the change in the test suite for an example of a case that does require more than one iteration to be optimized. Note that even there, we only manage to get benefit from a second iteration by doing something that overlaps with another pass (optimizing out an if with condition 0), which shows even more how unnecessary the extra work was. See #4165.
* RemoveUnusedBrs: Optimize if-of-if pattern (#4180)Alon Zakai2021-09-231-2/+41
        if (A) {
          if (B) {
            C
          }
        }
        =>
        if (A ? B : 0) {
          C
        }

    when B has no side effects, and is fast enough to consider running unconditionally. In that case, we replace an if with a select and a zero, which is the same size, but should be faster and may be further optimized. As suggested in #4168.