summaryrefslogtreecommitdiff
path: root/test/lit/passes
Commit message (Collapse)AuthorAgeFilesLines
* [EH] Support try-delegate in CFGWalker (#4269)Heejin Ahn2021-10-211-0/+358
| | | | This adds support for `try`-`delegate` to `CFGWalker`. This also adds a single test for `catch`-less `try`.
* [Wasm GC] Global Type Optimization: Remove unread fields (#4255)Alon Zakai2021-10-202-6/+584
| | | | | | | | | | | | | | | Add struct.get tracking, and if a field is never read from, simply remove it. This will error if a field is written using struct.new with a value with side effects. It is not clear we can handle that, as if the struct.new is in a global then we can't save the other values to locals etc. to reorder things. We could perhaps use other globals for it (ugh) but at least for now, that corner case does not happen on any code I can see. This allows a quite large code size reduction on j2wasm output (20%). The reason is that many vtable fields are not actually read, and so removing them and the ref.func they hold allows us to get rid of those functions, and code that they reach.
* SimplifyGlobals: Detect trivial read-only-to-write functions (#4257)Alon Zakai2021-10-191-0/+149
| | | | | | | | | | | | | | | | | | | | | We already detected code that looks like if (foo == 0) { foo = 1; } That "read only to write" pattern occurs also in functions, like this: function bar() { if (foo == 0) return; foo = 1; } This PR detects that pattern. It moves code around to share almost all the logic with the previous pattern (the git diff is not that useful there, sadly, but looking at them side by side that should be obvious). This helps in j2cl on some common clinits, where the clinit function ends up empty, which is exactly this pattern.
* OptimizeInstructions: Ignore unreachable subsequent sets (#4259)Alon Zakai2021-10-191-0/+47
| | | Fuzzing followup to #4244.
* MergeBlocks: optimize If conditions (#4260)Alon Zakai2021-10-192-55/+91
| | | | | Code in the If condition can be moved out to before the if. Existing test updates are 99% whitespace.
* [Wasm GC] Propagate immutable fields (#4251)Alon Zakai2021-10-151-0/+820
| | | | Very simple with the work so far, just add StructGet/ArrayGet code to check if the field is immutable, and allow the get to go through in that case.
* Switch from "extends" to M4 nominal syntax (#4248)Thomas Lively2021-10-1416-172/+2105
| | | | | | | | Switch from "extends" to M4 nominal syntax Change all test inputs from using the old (extends $super) syntax to using the new *_subtype syntax for their inputs and also update the printer to emit the new syntax. Add a new test explicitly testing the old notation to make sure it keeps working until we remove support for it.
* [Wasm GC] Optimize subsequent struct.sets after a struct.new (#4244)Alon Zakai2021-10-141-0/+719
| | | | | | | | | | | | | | | | | This optimizes this type of pattern: (local.set $x (struct.new X Y Z)) (struct.set (local.get $x) X') => (local.set $x (struct.new X' Y Z)) Note how the struct.set is removed, and X' moves to where X was. This removes almost 90% (!) of the struct.sets in j2wasm output, which reduces total code size by 2.5%. However, I see no speedup with this - I guess that either this is not on the hot path, or V8 optimizes it well already, or the CPU is making stores "free" anyhow...
* Precompute: Track reference identity (#4243)Alon Zakai2021-10-141-22/+518
| | | | | | | | | | | | | | Precompute will run the interpreter on struct.new etc. repeatedly, as it keeps doing so while it propagates constant values around (if one of the operands to the struct.new becomes constant, that could have a noticeable effect). But creating new GC data means we lose track of their identity, and so ref.eq would not work, and we disabled basically all struct operations. This implements identity tracking so we can start to optimize there, which is a step towards using it for immutable field propagation. To track identity, always store the data representing each struct.new in the source using the same GCData structure. That keeps identity consistent no matter how many times we execute.
* MergeBlocks: Allow side effects in a ternary's first element (#4238)Alon Zakai2021-10-132-21/+119
| | | | | | | | | | | Side effects in the first element are always ok there, as they are not moved across anything else: they happen before their parent both before and after the opt. The pass just left ternary as a TODO, so do at least one part of that now (we can do the rest as well, with some care). This is fairly useful on array.set which has 3 operands, and the first often has interesting things in it.
* [Selectify] Increase TooCostlyToRunUnconditionally from 7 to 9 (#4228)Max Graey2021-10-131-3/+50
| | | | This makes Binaryen match LLVM on a real-world case, which is probably the safest heuristic to use.
* [Wasm GC] Take advantage of immutable struct fields in effects.h (#4240)Alon Zakai2021-10-131-0/+61
| | | | | | This is the easy part of using immutability more: Just note immutable fields as such when we read from them, and then a write to a struct does not interfere with such reads. That is, only a read from a mutable field can notice the effect of a write.
* Fix tee/as-non-null reordering when writing to a non-nullable param (#4232)Alon Zakai2021-10-111-0/+31
|
* Emit heap types for call_indirect that match the table (#4221)Alon Zakai2021-10-082-2/+90
| | | | | | | | See #4220 - this lets us handle the common case for now of simply having an identical heap type to the table when the signature is identical. With this PR, #4207's optimization of call_ref + table.get into call_indirect now leads to a binary that works in V8 in nominal mode.
* Directize: Do not optimize if a table has a table.set (#4218)Alon Zakai2021-10-071-0/+64
| | | Followup to #4215
* [Wasm GC] GlobalTypeOptimization: Turn fields immutable when possible (#4213)Alon Zakai2021-10-061-0/+348
| | | | | | | | | | | | | | | | | | | | | Add a new pass to perform global type optimization. So far this just does one thing, to find fields with no struct.set and to turn them immutable (where possible - sub and supertypes must agree). To do that, this adds a GlobalTypeRewriter utility which rewrites all the heap types in the module, allowing changes while doing so. In this PR, the change is to flip the mutable field. Otherwise, the utility handles all the boilerplate of creating temp heap types using a TypeBuilder, and it handles replacing the types in every place they are used in the module. This is not enabled by default yet as I don't see enough of a benefit on j2cl. This PR is basically the simplest thing to do in the space of global type optimization, and the simplest way I can think of to fully test the GlobalTypeRewriter (which can't be done as a unit test, really, since we want to emit a full module and validate it etc.). This PR builds the foundation for more complicated things like removing unused fields, subtyping fields, and more.
* [OptimizeInstructions] Fold select into zero or single expression for some ↵Max Graey2021-10-052-12/+419
| | | | | | | | | | | patterns (#4181) i32(x) ? i32(x) : 0 ==> x i32(x) ? 0 : i32(x) ==> {x, 0} i64(x) == 0 ? 0 : i64(x) ==> x i64(x) != 0 ? i64(x) : 0 ==> x i64(x) == 0 ? i64(x) : 0 ==> {x, 0} i64(x) != 0 ? 0 : i64(x) ==> {x, 0}
* Optimize call_indirect of a select of two constants (#4208)Alon Zakai2021-10-041-0/+299
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | (call_indirect ..args.. (select (i32.const x) (i32.const y) (condition) ) ) => (if (condition) (call $func-for-x ..args.. ) (call $func-for-y ..args.. ) ) To do this we must reorder the condition with the args, and also use the args more than once, so place them all in locals. This works towards the goal of polymorphic devirtualization, that is, turning an indirect call of more than one possible target into more than one direct call.
* Optimize call_ref+table.get => call_indirect (#4207)Alon Zakai2021-10-041-5/+28
| | | Rather than load from the table and call that reference, call using the table.
* Fix inlining name collision (#4206)Alon Zakai2021-10-041-1/+28
|
* Port a batch of passes tests to lit (#4202)Thomas Lively2021-10-046-0/+2212
| | | | | | | | - fpcast-emu.wast - generate-dyncalls_all-features.wast - generaite-i64-dyncalls.wast - instrument-locals_all-features_disable-typed-function-references.wast - instrument-memory.wast - instrument-memory64.wast
* [Wasm GC] Optimize static (rtt-free) operations (#4186)Alon Zakai2021-09-301-3/+662
| | | | | | | | | | | | Now that they are all implemented, we can optimize them. This removes the big if that ignored static operations, and implements things for them. In general this matches the existing rtt-using case, but there are a few things we can do better, which this does: * A cast of a subtype to a type always succeeds. * A test of a subtype to a type is always 1 (if non-nullable). * Repeated static casts can leave just the most demanding of them.
* Disable partial inlining by default and add a flag for it. (#4191)Alon Zakai2021-09-272-624/+579
| | | | | Locally I saw a 10% speedup on j2cl but reports of regressions have arrived, so let's disable it for now pending investigation. The option added here should make it easy to experiment.
* RemoveUnusedBrs: Optimize if-of-if pattern (#4180)Alon Zakai2021-09-232-254/+403
| | | | | | | | | | | | | | | | | | | if (A) { if (B) { C } } => if (A ? B : 0) { C } when B has no side effects, and is fast enough to consider running unconditionally. In that case, we replace an if with a select and a zero, which is the same size, but should be faster and may be further optimized. As suggested in #4168
* [Wasm GC] Implement static (rtt-free) StructNew, ArrayNew, ArrayInit (#4172)Alon Zakai2021-09-232-1/+44
| | | | | | | | | See #4149 This modifies the test added in #4163 which used static casts on dynamically-created structs and arrays. That was technically not valid (as we won't want users to "mix" the two forms). This makes that test 100% static, which both fixes the test and gives test coverage to the new instructions added here.
* [Wasm GC] Fix invalid intermediate IR in OptimizeInstructions (#4169)Alon Zakai2021-09-203-21/+67
| | | | | | | | | | | | We added an optional ReFinalize in OptimizeInstructions at some point, but that is not valid: The ReFinalize only updates types when all other works is done, but the pass works incrementally. The bug the fuzzer found is that a child is changed to be unreachable, and then the parent is optimized before finalize() is called on it, which led to an assertion being hit (as the child was unreachable but not the parent, which should also be). To fix this, do not change types in this pass. Emit an extra block with a declared type when necessary. Other passes can remove the extra block.
* Partial inlining via function splitting (#4152)Alon Zakai2021-09-172-578/+2325
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This PR helps with functions like this: function foo(x) { if (x) { .. lots of work here .. } } If "lots of work" is large enough, then we won't inline such a function. However, we may end up calling into the function only to get a false on that if and immediately exit. So it is useful to partially inline this function, basically by creating a split of it into a condition part that is inlineable function foo$inlineable(x) { if (x) { foo$outlined(); } } and an outlined part that is not inlineable: function foo$outlined(x) { .. lots of work here .. } We can then inline the inlineable part. That means that a call like foo(param); turns into if (param) { foo$outlined(); } In other words, we end up replacing a call and then a check with a check and then a call. Any time that the condition is false, this will be a speedup. The cost here is increased size, as we duplicate the condition into the callsites. For that reason, only do this when heavily optimizing for size. This is a 10% speedup on j2cl. This helps two types of functions there: Java class inits, which often look like "have I been initialized before? if not, do all this work", and also assertion methods which look like "if the input is null, throw an exception".
* [Wasm GC] Optimize away ref.as_non_null going into local.set in TNH mode (#4157)Alon Zakai2021-09-163-1/+64
| | | | | | | | | | If we can remove such traps, we can remove ref.as_non_null if the local type is nullable anyhow. If we support non-nullable locals, however, then do not do this, as it could inhibit specializing the local type later. Do the same for tees which we had existing code for. Background: #4061 (comment)
* Fix regression from #4130 (#4158)Alon Zakai2021-09-161-1/+211
| | | | | | | | | That PR reused the same node twice in the output, which fails on the assertion in BINARYEN_PASS_DEBUG=1 mode. No new test is needed because the existing test suite fails already in that mode. That the PR managed to land seems to say that we are not testing pass-debug mode on our lit tests, which we need to investigate.
* [Wasm GC] Fix OptimizeInstructions on unreachable ref.test (#4156)Alon Zakai2021-09-151-4/+30
| | | | | | Avoids a crash in calling getHeapType when there isn't one. Also add the relevant lit test (and a few others) to the list of files to fuzz more heavily.
* [OptimizeInstructions] Optimize memory.fill with constant arguments (#4130)Max Graey2021-09-142-83/+343
| | | | | | | | | | | | | | This is reland of #3071 Do similar optimizations as in #3038 but for memory.fill. `memory.fill(d, v, 0)` ==> `{ drop(d), drop(v) }` only with `ignoreImplicitTraps` or `trapsNeverHappen` `memory.fill(d, v, 1)` ==> `store8(d, v)` Further simplifications can be done only if v is constant because otherwise binary size would increase: `memory.fill(d, C, 1)` ==> `store8(d, (C & 0xFF))` `memory.fill(d, C, 2)` ==> `store16(d, (C & 0xFF) * 0x0101)` `memory.fill(d, C, 4)` ==> `store32(d, (C & 0xFF) * 0x01010101)` `memory.fill(d, C, 8)` ==> `store64(d, (C & 0xFF) * 0x0101010101010101)` `memory.fill(d, C, 16)` ==> `store128(d, i8x16.splat(C & 0xFF))`
* OptimizeInstructions: Optimize boolean selects (#4147)Alon Zakai2021-09-131-0/+246
| | | | | | | | | | | | If all a select's inputs are boolean, we can sometimes turn the select into an AND or an OR operation, x ? y : 0 => x & y x ? 1 : y => x | y I believe LLVM aggressively canonicalizes to this form. It makes sense to do here too as it is smaller (save the constant 0 or 1). It also allows further optimizations (which is why LLVM does it) but I don't think we have those yet.
* Add an Intrinsics mechanism, and a call.without.effects intrinsic (#4126)Alon Zakai2021-09-102-0/+286
| | | | | | | | | | | | | | | | | | | | | | | | | An "intrinsic" is modeled as a call to an import. We could also add new IR things for them, but that would take more work and lead to less clear errors in other tools if they try to read a binary using such a nonstandard extension. A first intrinsic is added here, call.without.effects This is basically the same as call_ref except that the optimizer is free to assume the call has no side effects. Consequently, if the result is not used then it can be optimized out (as even if it is not used then side effects could have kept it around). Likewise, the lack of side effects allows more reordering and other things. A lowering pass for intrinsics is provided. Rather than automatically lower them to normal wasm at the end of optimizations, the user must call that pass explicitly. A typical workflow might be -O --intrinsic-lowering -O That optimizes with the intrinsic present - perhaps removing calls thanks to it - then lowers it into normal wasm - it turns into a call_ref - and then optimizes further, which would turns the call_ref into a direct call, potentially inline, etc.
* Refactor MergeBlocks to use iteration; adds Wasm GC support (#4137)Alon Zakai2021-09-091-1/+60
| | | | | | | | | MergeBlocks was written a very long time ago, before the iteration API, so it had a bunch of hardcoded things for specific instructions. In particular, that did not handle GC. This does a small refactoring to use iteration. The refactoring is NFC, but while doing so it adds support for new relevant instructions, including wasm GC.
* [OptimizeInstructions] propagate sign for integer multiplication (#4098)Max Graey2021-09-092-0/+266
| | | | | | | | | | | | ```ts -x * -y => (x * y) -x * y => -(x * y) x * -y => -(x * y), if x != C && y != C -x * C => x * -C, if C != C_pot || shrinkLevel != 0 -x * C => -(x * C), otherwise ``` We are skipping propagation when lhs and rhs are constants because this should handled by constant folding. Also skip cases like `-x * 4 -> x * -4` for `shrinkLevel != 0`, as this will be further converted to `-(x << 2)`.
* Optimize away dominated calls to functions that run only once (#4111)Alon Zakai2021-09-031-0/+1429
| | | | | | | | | | | | | | | | | | | | | | | Some functions run only once with this pattern: function foo() { if (foo$ran) return; foo$ran = 1; ... } If that global is not ever set to 0, then the function's payload (after the initial if and return) will never execute more than once. That means we can optimize away dominated calls: foo(); foo(); // we can remove this To do this, we find which globals are "once", which means they can fit in that pattern, as they are never set to 0. If a function looks like the above pattern, and it's global is "once", then the function is "once" as well, and we can perform this optimization. This removes over 8% of static calls in j2cl.
* Use TrapsNeverHappen mode in more places in Vacuum (#4117)Alon Zakai2021-09-011-13/+50
| | | | | | | | | | | | | | We had already replaced the check on drop, but we can also use that mode on all the other things there, as the pass never does reorderings of things - it just removes them. For example, the pass can now remove part of a dropped thing, (drop (struct.get (foo))) => (drop (foo)) In this example the struct.get can be removed, even if the foo can't.
* OptimizeInstructions: Handle trivial ref.cast and ref.test (#4097)Alon Zakai2021-08-244-53/+317
| | | | | If the types are completely incompatible, we know the cast will fail. However, ref.cast does allow a null to pass through, which makes it a little more complicated.
* Enable LocalCSE by default (#4089)Alon Zakai2021-08-192-53/+63
| | | | | | | | | | | | Enable it in -O3 and -Os and higher. This helps very little on output from LLVM, but also it does not alter compile times much anyhow. On code that has not been run through an optimizing compiler already, this can help quite a lot, e.g., 15% of code size on some wasm GC samples. This will not normally help with speed, as optimizing VMs do such things anyhow. However, this can help baseline compilers and interpreters and so forth.
* [Wasm GC] Effects: Differentiate Struct and Array types (#4088)Alon Zakai2021-08-181-2/+53
| | | | | | | | | | | This allows common patterns in J2CL to be optimized, where we write to various array indices and get the values or the reference from a struct. It would be nice to do even better here, and look at actually specific types, but I think we should be careful to keep the runtime constant. That seems hard to do if we accumulate a list of types and do Type::isSubType on them etc. But maybe someone has a better idea than this PR?
* LocalCSE: ignore traps (#4085)Alon Zakai2021-08-172-9/+50
| | | | | | | | | | | | | | | | | | | | If we replace A A A with (local.set A) (local.get) (local.get) then it is ok for A to trap (so long as it does so deterministically), as if it does trap then the first appearance will do so, and the others not be reached anyhow. This helps GC code as often there are repeated struct.gets and such that may trap.
* TrapsNeverHappen mode (#4059)Alon Zakai2021-08-172-0/+144
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The goal of this mode is to remove obviously-unneeded code like (drop (i32.load (local.get $x))) In general we can't remove it, as the load might trap - we'd be removing a side effect. This is fairly rare in general, but actually becomes quite annoying with wasm GC code where such patterns are more common, and we really need to remove them. Historically the IgnoreImplicitTraps option was meant to help here. However, in practice it did not quite work well enough for most production code, as mentioned e.g. in #3934 . TrapsNeverHappen mode is an attempt to fix that, based on feedback from @askeksa in that issue, and also I believe this implements an idea that @fitzgen mentioned a while ago (sorry, I can't remember where exactly...). So I'm hopeful this will be generally useful and not just for GC. The idea in TrapsNeverHappen mode is that traps are assumed to not actually happen at runtime. That is, if there is a trap in the code, it will not be reached, or if it is reached then it will not trap. For example, an (unreachable) would be assumed to never be reached, which means that the optimizer can remove it and any code that executes right before it: (if (..condition..) (block (..code that can be removed, if it does not branch out..) (..code that can be removed, if it does not branch out..) (..code that can be removed, if it does not branch out..) (unreachable))) And something like a load from memory is assumed to not trap, etc., which in particular would let us remove that dropped load from earlier. This mode should be usable in production builds with assertions disabled, if traps are seen as failing assertions. That might not be true of all release builds (maybe some use traps for other purposes), but hopefully in some. That is, if traps are like assertions, then enabling this new mode would be like disabling assertions in release builds and living with the fact that if an assertion would have been hit then that is "undefined behavior" and the optimizer might have removed the trap or done something weird. TrapsNeverHappen (TNH) is different from IgnoreImplicitTraps (IIT). The old IIT mode would just ignore traps when computing effects. That is a simple model, but a problem happens with a trap behind a condition, like this: if (x != 0) foo(1 / x); We won't trap on integer division by zero here only because of the guarding if. In IIT, we'd compute no side effects on 1 / x, and then we might end up moving it around, depending on other code in the area, and potentially out of the if - which would make it happen unconditionally, which would break. TNH avoids that problem because it does not simply ignore traps. Instead, there is a new hasUnremovableSideEffects() method that must be opted-in by passes. That checks if there are no side effects, or if there are, if we can remove them - and we know we can remove a trap if we are running under TrapsNeverHappen mode, as the trap won't happen by assumption. A pass must only use that method where it is safe, that is, where it would either remove the side effect (in which case, no problem), or if not, that it at least does not move it around (avoiding the above problem with IIT). This PR does not implement all optimizations possible with TNH, just a small initial set of things to get started. It is already useful on wasm GC code, including being as good as IIT on removing unnecessary casts in some cases, see the test suite updates here. Also, a significant part of the 18% speedup measured in #4052 (comment) is due to my testing with this enabled, as otherwise the devirtualization there leaves a lot of unneeded code.
* LocalCSE rewrite (#4079)Alon Zakai2021-08-175-1282/+775
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Technically this is not a new pass, but it is a rewrite almost from scratch. Local Common Subexpression Elimination looks for repeated patterns, stuff like this: x = (a + b) + c y = a + b => temp = a + b x = temp + c y = temp The old pass worked on flat IR, which is inefficient, and was overly complicated because of that. The new pass uses a new algorithm that I think is pretty simple, see the detailed comment at the top. This keeps the pass enabled only in -O4, like before - right after flattening the IR. That is to make this as minimal a change as possible. Followups will enable the pass in the main pipeline, that is, we will finally be able to run it by default. (Note that to make the pass work well after flatten, an extra simplify-locals is added - the old pass used to do part of simplify-locals internally, which was one source of complexity. Even so, some of the -O4 tests have changes, due to minor factors - they are just minor orderings etc., which can be seen by inspecting the outputs before and after using e.g. --metrics) This plus some followup work leads to large wins on wasm GC output. On j2cl there is a common pattern of repeated struct.gets, so common that this pass removes 85% of all struct.gets, which makes the total binary 15% smaller. However, on LLVM-emitted code the benefit is minor, less than 1%.
* [Wasm GC] ConstantFieldPropagation: Ignore copies (#4084)Alon Zakai2021-08-161-0/+161
| | | | | | | | When looking for all values written to a field, we can ignore values that are loaded from that same field, i.e., are copied from something already present there. Such operations never introduce new values. This helps by a small but non-zero amount on j2cl.
* [Wasm GC] Fix OptimizeInstructions on folding of identical code with nominal ↵Alon Zakai2021-08-161-0/+53
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | typing (#4069) (if (result i32) (local.get $x) (struct.get $B 1 (ref.null $B) ) (struct.get $C 1 (ref.null $C) ) ) With structural typing it is safe to turn this into this: (struct.get $A 1 (if (result (ref $A)) (local.get $x) (ref.null $B) (ref.null $C) ) ) Here $A is the LUB of the others. This works since $A must have field 1 in it. But with nominal types it is possible that the LUB in fact does not have that field, and we would not validate. This actually seems like a more general issue that might happen with other things, even though atm perhaps it can't. For simplicity, avoid this pattern in both nominal and structural typing, to avoid making a difference between them.
* Add nominal typing mode to a test in prep for #4069 (#4081)Alon Zakai2021-08-131-1/+513
|
* [Wasm GC] Fix LocalSubtyping on unreachable sets with incompatible values ↵Alon Zakai2021-08-111-0/+49
| | | | | | | | (#4051) We ignore sets in unreachable code, but their values may not be compatible with a new type we specialize a local for. That is, the validator cares about unreachable sets, while logically we don't need to, and this pass doesn't. Fix up such unreachable sets at the end.
* SimplifyGlobals: Optimize away globals that are only read in order to write ↵Alon Zakai2021-08-101-0/+240
| | | | | | | | | | | | | | | | | | themselves (#4070) If the only uses of a global are if (global == 0) { global = 1; } Then we do not need that global: while it has both reads and writes, the value in the global does not cause anything observable. It is read, but only to write to itself, and nothing else. This happens in real-world code from j2cl quite a lot, as they have an initialization pattern with globals, and in some cases we can optimize away the work done in the initialization, leaving only the globals in this pattern.
* Improve optimization of call_ref into direct calls (#4068)Alon Zakai2021-08-102-25/+203
| | | | | | | | | | | | | First, move the tiny pattern of call-ref-of-ref-func from Directize into OptimizeInstructions. This is important because Directize is a global optimization pass - it looks at the table to see if a CallIndirect can be turned into a direct call. We only run global passes at the end of the pipeline, but we don't need any global data for call-ref of a ref-func, and OptimizeInstructions is the place for such patterns. Second, extend that to also handle fallthrough values. This is less simple, but as call_ref is so inefficient, it's worth doing all we can.
* Improve inlining limits for recursion (#4067)Alon Zakai2021-08-101-1/+203
| | | | | | | | | | | | | | Previously we would keep doing iterations of inlining until we hit the number of functions in the module (at which point, we would definitely know we are recursing). This prevents infinite recursion, but it can take a very very long time to notice that in a huge module with one tiny recursive function and 100,000 other ones. To do better than that, track how many times we've inlined into a function. After a fixed number of such inlinings, stop. Aside from avoding very slow compile times, such infinite recursion likely is not that beneficial to do a great many times, so anyhow it is best to stop after a few iterations.