path: root/src
Commit message | Author | Age | Files | Lines
...
* [OptimizeInstructions] propagate sign for integer multiplication (#4098) | Max Graey | 2021-09-09 | 1 | -0/+50
|
| ```ts
| -x * -y => (x * y)
| -x * y => -(x * y)
| x * -y => -(x * y), if x != C && y != C
| -x * C => x * -C, if C != C_pot || shrinkLevel != 0
| -x * C => -(x * C), otherwise
| ```
|
| We are skipping propagation when lhs and rhs are constants because this should be handled by constant folding.
| Also skip cases like `-x * 4 -> x * -4` for `shrinkLevel != 0`, as this will be further converted to `-(x << 2)`.
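The first three rewrites rely on negation commuting with wrapping 32-bit multiplication. A minimal C++ sketch that checks those identities numerically follows; the helper names `neg`, `mul`, and `signRulesHold` are illustrative assumptions and are not taken from the pass.

```cpp
#include <cstdint>

// Wrapping 32-bit negation and multiplication, modelled with unsigned
// arithmetic (which is defined to wrap modulo 2^32 in C++).
inline uint32_t neg(uint32_t x) { return 0u - x; }
inline uint32_t mul(uint32_t x, uint32_t y) { return x * y; }

// Returns true if the three sign rewrites agree for the given operands:
//   -x * -y == x * y
//   -x *  y == -(x * y)
//    x * -y == -(x * y)
inline bool signRulesHold(uint32_t x, uint32_t y) {
  return mul(neg(x), neg(y)) == mul(x, y) &&
         mul(neg(x), y) == neg(mul(x, y)) &&
         mul(x, neg(y)) == neg(mul(x, y));
}
```

The check also passes for edge values such as `0x80000000`, since every operation is taken modulo 2^32.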
* Make static buffers in numToString thread local (#4134) | Thomas Lively | 2021-09-09 | 1 | -4/+6
| | | | | | | Validation is performed on multiple threads at once and when there are multiple validation failures, those threads can all end up in `numToString` at the same time as they construct their respective error messages. Previously the threads would race on their access to the snprintf buffers, sometimes leading to segfaults. Fix the data races by making the buffers thread local.
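A minimal sketch of the kind of fix described above, assuming a formatting helper that writes into a fixed buffer with `snprintf`. The function body, the buffer size, and the `%g` format are assumptions for illustration and are not Binaryen's actual `numToString`.

```cpp
#include <cstdio>
#include <string>

// Hypothetical stand-in for the helper named in the commit message.
std::string numToString(double value) {
  // thread_local gives every thread its own buffer, so concurrent callers
  // never write into the same storage. With a plain `static` buffer, two
  // threads formatting error messages at once would race here.
  thread_local char buffer[64];
  std::snprintf(buffer, sizeof(buffer), "%g", value);
  return std::string(buffer);
}
```

Returning a `std::string` copy also means the caller never holds a pointer into the per-thread buffer.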
* Do not use a library for wasm-split files (#4132) | Thomas Lively | 2021-09-08 | 1 | -4/+2
|
* [wasm-split] Do not add exports of imported memories (#4133) | Thomas Lively | 2021-09-08 | 1 | -12/+14
| | | | | | We can assume that imported memories (and the profiling data they contain) are already accessible from the module's environment, so there's no need to export them. This also avoids needing to add knowledge of "profile-memory" to Emscripten's library_dylink.js.
* Enumerate objects for wasm-split-lib (#4128) | Thomas Lively | 2021-09-07 | 1 | -1/+1
| | | To support CMake 3.10. `add_executable` does not support OBJECT libraries until 3.12.
* Inlining: Track names over multiple iterations, not pointers (#4127) | Alon Zakai | 2021-09-07 | 1 | -2/+6
| | | | | | | | It can be confusing during debugging to keep a map of pointers when we might have removed some of those functions from the module meanwhile (if you iterate over it in some additional debug logging). This change has no observable effect, however, as no bug could have actually occurred in practice given that nothing is done with the pointers in the actual code.
* Show a clear error on asyncify+references. (#4125) | Alon Zakai | 2021-09-07 | 3 | -3/+31
| | | Helps #3739
* wasm-split: Export the memory if it is not already (#4121) | Alon Zakai | 2021-09-07 | 1 | -1/+14
|
* [wasm-split] Add an option for recording profile data in memory (#4120) | Thomas Lively | 2021-09-03 | 5 | -55/+165
|
| To avoid requiring a static memory allocation, wasm-split's instrumentation defaults to recording profile data in Wasm globals. This causes problems for multithreaded applications because the globals are thread-local, but it is not always feasible to arrange for a separate profile to be dumped on each thread.
|
| To simplify the profiling of such multithreaded applications, add a new instrumentation mode that stores the profiling data in shared memory instead of in globals. This allows a single profile to be written that correctly reflects the called functions on all threads.
|
| This new mode is not on by default because it requires users to ensure that the program will not trample the in-memory profiling data. The data is stored beginning at address zero and occupies one byte per declared function in the instrumented module. Emscripten can be told to leave this memory free using the GLOBAL_BASE option.
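The in-memory layout described above (one byte per declared function, starting at address zero) can be modelled roughly as follows. This is only a sketch: `recordCall` and `calledFunctions` are hypothetical names, linear memory is modelled as a plain byte vector, and the real instrumentation writes to shared Wasm memory rather than a C++ container.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Marks the function with the given index as called. The profile byte for
// function i is assumed to live at address i, i.e. the data begins at zero.
void recordCall(std::vector<uint8_t>& memory, size_t funcIndex) {
  memory.at(funcIndex) = 1;
}

// Collects the indices of all functions whose profile byte is set.
std::vector<size_t> calledFunctions(const std::vector<uint8_t>& memory,
                                    size_t numFuncs) {
  std::vector<size_t> called;
  for (size_t i = 0; i < numFuncs && i < memory.size(); ++i) {
    if (memory[i] != 0) {
      called.push_back(i);
    }
  }
  return called;
}
```

Because all threads write the same value into the same shared bytes, one profile read at the end reflects calls made on every thread, which is the property the commit message is after.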
* Optimize away dominated calls to functions that run only once (#4111) | Alon Zakai | 2021-09-03 | 5 | -3/+456
|
| Some functions run only once with this pattern:
|
|   function foo() {
|     if (foo$ran) return;
|     foo$ran = 1;
|     ...
|   }
|
| If that global is not ever set to 0, then the function's payload (after the initial if and return) will never execute more than once. That means we can optimize away dominated calls:
|
|   foo();
|   foo(); // we can remove this
|
| To do this, we find which globals are "once", which means they can fit in that pattern, as they are never set to 0. If a function looks like the above pattern, and its global is "once", then the function is "once" as well, and we can perform this optimization.
|
| This removes over 8% of static calls in j2cl.
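As a rough illustration of the idea, the sketch below removes repeated calls to "once" functions from a straight-line sequence of calls, where every earlier call dominates every later one. It is a simplification under stated assumptions: the real pass works on a CFG with dominance information, and `removeDominatedCalls` is a hypothetical name.

```cpp
#include <set>
#include <string>
#include <vector>

// Given calls in straight-line order and the set of functions known to be
// "once", keep the first call to each once-function and drop later repeats.
// Calls to other functions are always kept.
std::vector<std::string>
removeDominatedCalls(const std::vector<std::string>& calls,
                     const std::set<std::string>& onceFunctions) {
  std::set<std::string> alreadyCalled;
  std::vector<std::string> kept;
  for (const auto& callee : calls) {
    bool isOnce = onceFunctions.count(callee) > 0;
    // insert().second is false when the callee was already seen.
    if (isOnce && !alreadyCalled.insert(callee).second) {
      continue; // dominated by an earlier call: the payload cannot run again
    }
    kept.push_back(callee);
  }
  return kept;
}
```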
* [NFC] Split wasm-split into multiple files (#4119) | Thomas Lively | 2021-09-03 | 8 | -969/+1101
| | | | | As wasm-split has gained new functionality, its implementation file has become large. In preparation for adding even more functionality, split the existing implementation across multiple files in a new tools/wasm-split subdirectory.
* Support specialized function types in element segments (#4109) | Alon Zakai | 2021-09-02 | 8 | -39/+70
| | | | | | Before this, the element segments would be printed as having type funcref, and then if their table had a specialized type, the element type would not be a subtype of the table and validation would fail.
* Fix the effects of array.copy (#4118) | Alon Zakai | 2021-09-01 | 1 | -0/+2
| | | | | | This appeared to be a regression from #4117, however this was always a bug, and that PR just exposed it. That is, somehow we forgot to indicate the effects of ArrayCopy, and after that PR we'd vacuum it out incorrectly.
* [Refactoring] Cleanup asm2wasm. Use JS instead of ASM prefix where possible. NFC (#4090) | Max Graey | 2021-09-01 | 9 | -474/+152
|
* Use TrapsNeverHappen mode in more places in Vacuum (#4117) | Alon Zakai | 2021-09-01 | 2 | -4/+4
|
| We had already replaced the check on drop, but we can also use that mode on all the other things there, as the pass never does reorderings of things - it just removes them.
|
| For example, the pass can now remove part of a dropped thing,
|
|   (drop (struct.get (foo))) => (drop (foo))
|
| In this example the struct.get can be removed, even if the foo can't.
* Use the new module version of EffectAnalyzer (#4116) | Alon Zakai | 2021-08-31 | 15 | -83/+54
| | | | | | | | | | | This finishes the refactoring started in #4115 by doing the same change to pass a Module into EffectAnalyzer instead of features. To do so this refactors the fallthrough API and a few other small things. After those changes, this PR removes the old feature constructor of EffectAnalyzer entirely. This requires a small breaking change in the C API, changing BinaryenExpressionGetSideEffects's feature param to a module. That makes this change not NFC, but otherwise it is.
* Add a Module parameter to EffectAnalyzer. NFC (#4115) | Alon Zakai | 2021-08-31 | 14 | -110/+115
| | | | | | | | | | | | | Knowing the module will allow us to do more analysis in the effect analyzer. For now, this just refactors the code to allow providing a module instead of features, and to infer the features from the module. This actually shortens the code in most places which is nice (just pass module instead of module->features). This modifies basically all callers to use the new module form, except for the fallthrough logic. That would require some more refactoring, so to keep this PR reasonably small that is not yet done.
* Handle extra info in dylink section (#4112) | Sam Clegg | 2021-08-31 | 4 | -41/+21
| | | | | If extra data is found in this section simply propagate it. Also, remove some dead code from wasm-binary.cpp.
* Use ModuleReader::readStdin for file "-" (#4114) | Thomas Lively | 2021-08-30 | 1 | -2/+2
| | | | | | | | | After #4106 we already treat the input file "-" as a shorthand for reading from stdin at the file.cpp level. However, the "-" input was still treated as a normal file name at the wasm-io.cpp file and as a result was always treated as text input. This commit updates wasm-io.cpp to use the stdin code path supporting both binary and text input for "-". Fixes #4105 (again).
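A sketch of the behaviour described above, assuming nothing about Binaryen's actual `file.cpp` or `wasm-io.cpp` code: treat `-` as stdin, read all bytes, and only then decide between binary and text by looking for the WebAssembly binary magic bytes `\0asm`. The helper names are hypothetical.

```cpp
#include <fstream>
#include <iostream>
#include <iterator>
#include <string>

// True when the name is the conventional shorthand for standard input.
bool isStdinName(const std::string& filename) { return filename == "-"; }

// True when the data starts with the WebAssembly binary magic "\0asm".
// Anything else would be handed to the text parser in this sketch.
bool looksLikeWasmBinary(const std::string& data) {
  return data.size() >= 4 && data[0] == '\0' && data[1] == 'a' &&
         data[2] == 's' && data[3] == 'm';
}

// Reads the whole input: from stdin for "-", from the named file otherwise.
std::string readAll(const std::string& filename) {
  using Iter = std::istreambuf_iterator<char>;
  if (isStdinName(filename)) {
    return std::string(Iter(std::cin), Iter());
  }
  std::ifstream file(filename, std::ios::binary);
  return std::string(Iter(file), Iter());
}
```

Deciding the format from the bytes rather than from the file name is what lets the `-` path accept both binary and text input.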
* [API] Add type argument for BinaryenAddTable method (#4107) | Max Graey | 2021-08-27 | 3 | -5/+7
| | | In the JS API this is optional and it defaults to `funcref`.
* Read from stdin when the input file is `-` (#4106) | Thomas Lively | 2021-08-27 | 2 | -2/+18
| | | | We already supported `-` as meaning stdout for output and this is useful in similar situations. Fixes #4105.
* Asyncify: Degrade gracefully if too many locals to compute relevantLiveLocals (#4108) | Alon Zakai | 2021-08-27 | 1 | -0/+9
|
* Dominator Tree (#4100) | Alon Zakai | 2021-08-26 | 1 | -0/+178
| | | | | | | | Add a class to compute the dominator tree for a CFG consisting of a list of basic blocks assumed to be in reverse postorder. This will be useful once cfg-walker emits blocks in reverse-postorder (which it almost does, another PR will handle that). Then we can write optimization passes that use block dominance.
* Costs: Index => CostType (#4103) | Alon Zakai | 2021-08-24 | 1 | -82/+86
| | | The cost type isn't an index in a wasm binary, it's just a number.
* OptimizeInstructions: Handle trivial ref.cast and ref.test (#4097) | Alon Zakai | 2021-08-24 | 3 | -29/+160
| | | | | If the types are completely incompatible, we know the cast will fail. However, ref.cast does allow a null to pass through, which makes it a little more complicated.
* Ensure cfg-traversal emits blocks in reverse postorder, refactoring try. NFC (#4101) | Alon Zakai | 2021-08-24 | 1 | -26/+48
|
| Reverse postorder basically just means that a block's immediate dominator must precede it in the list. That is useful because then algorithms that look at dominance can simply process the list in order, and the immediate dominator will have already been seen before each block.
|
| Another way to put it is that in reverse postorder a block's dominators appear before it in the list, as do all non-loop predecessors. At least in reducible graphs that is the case, and our IR, like wasm, is reducible.
|
| It is pretty natural to emit reverse postorder on wasm given the reducibility: simply process the wasm in postorder, and make sure to create new basic blocks only when reaching their code - that is, do not create them "ahead of time". We were doing that in a single place, for try-catch, so this PR refactors that. Specifically it makes us create the basic blocks for catches right when we reach them, and not earlier. So the data structure that used to store them becomes a list of things to connect to them.
|
| This is useful for #4100, see more details there.
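To show why reverse postorder is convenient for dominance, here is a sketch of the well-known Cooper-Harvey-Kennedy iterative algorithm over blocks numbered in reverse postorder. This is an assumption about one way to use such an ordering and is not claimed to be the implementation added in #4100; `computeIdoms` and the predecessor-list representation are hypothetical.

```cpp
#include <cstddef>
#include <vector>

// preds[i] lists the predecessor indices of block i. Block 0 is the entry,
// and blocks are assumed to be numbered in reverse postorder.
// Returns idom[i], the immediate dominator of block i (the entry maps to
// itself; unreachable blocks map to -1).
std::vector<int> computeIdoms(const std::vector<std::vector<int>>& preds) {
  const int kUnset = -1;
  std::vector<int> idom(preds.size(), kUnset);
  if (preds.empty()) {
    return idom;
  }
  idom[0] = 0;
  bool changed = true;
  while (changed) {
    changed = false;
    for (size_t b = 1; b < preds.size(); ++b) {
      int newIdom = kUnset;
      for (int p : preds[b]) {
        if (idom[p] == kUnset) {
          continue; // predecessor not processed yet (e.g. a loop back edge)
        }
        if (newIdom == kUnset) {
          newIdom = p;
          continue;
        }
        // Intersect: walk both candidates up the dominator tree until they
        // meet. Lower indices are closer to the entry in reverse postorder.
        int a = p;
        int c = newIdom;
        while (a != c) {
          while (a > c) a = idom[a];
          while (c > a) c = idom[c];
        }
        newIdom = a;
      }
      if (newIdom != idom[b]) {
        idom[b] = newIdom;
        changed = true;
      }
    }
  }
  return idom;
}
```

For a diamond (0 to 1 and 2, both to 3) every block's immediate dominator is the entry; for a simple loop the back edge is skipped on the first pass because its source has not been processed yet.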
* wasm-split: accept file in keep-funcs/split-funcs (#4053) | Aleksander Guryanov | 2021-08-23 | 1 | -4/+36
|
* [Wasm GC] Nulls compare equal regardless of type (#4094) | Alon Zakai | 2021-08-19 | 1 | -3/+6
|
* Enable LocalCSE by default (#4089) | Alon Zakai | 2021-08-19 | 1 | -0/+3
| | | | | | | | | | | | Enable it in -O3 and -Os and higher. This helps very little on output from LLVM, but also it does not alter compile times much anyhow. On code that has not been run through an optimizing compiler already, this can help quite a lot, e.g., 15% of code size on some wasm GC samples. This will not normally help with speed, as optimizing VMs do such things anyhow. However, this can help baseline compilers and interpreters and so forth.
* Optimize LocalCSE hash computations using a stack. NFC (#4091) | Alon Zakai | 2021-08-18 | 2 | -8/+44
| | | | | | | | Before, we'd compute the hash of a child, then store that in a map, then the parent would find the child's hash in the map using the pointer to the child. But as we do a simple postorder walk, we can use a stack, and avoid hashing the child pointers. This makes it 10% faster or so.
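The stack idea can be sketched as follows, under simplifying assumptions: each node is reduced to an opcode and a child count, nodes arrive in postorder, and the hash-combining function is an arbitrary mixer chosen for illustration. None of these names come from the LocalCSE source.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Simplified expression node: just an opcode and the number of children.
struct Node {
  uint32_t opcode;
  size_t numChildren;
};

// Mixes a value into a running hash; the constant is arbitrary here.
inline uint64_t combine(uint64_t seed, uint64_t value) {
  return seed ^ (value + 0x9e3779b97f4a7c15ULL + (seed << 6) + (seed >> 2));
}

// `postorder` lists nodes so that children precede their parent. Child
// hashes sit on top of the stack when the parent is visited, so no map
// from child pointer to hash is needed.
std::vector<uint64_t> hashPostorder(const std::vector<Node>& postorder) {
  std::vector<uint64_t> result;
  std::vector<uint64_t> stack;
  for (const Node& node : postorder) {
    assert(stack.size() >= node.numChildren);
    size_t first = stack.size() - node.numChildren;
    uint64_t hash = node.opcode;
    for (size_t i = first; i < stack.size(); ++i) {
      hash = combine(hash, stack[i]);
    }
    stack.resize(first);   // pop the children
    stack.push_back(hash); // push this node for its own parent
    result.push_back(hash);
  }
  return result;
}
```

Two structurally identical subtrees produce the same hash, which is the property a common-subexpression pass needs in order to find repeated candidates.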
* [Wasm GC] Effects: Differentiate Struct and Array types (#4088) | Alon Zakai | 2021-08-18 | 1 | -16/+24
| | | | | | | | | | | This allows common patterns in J2CL to be optimized, where we write to various array indices and get the values or the reference from a struct. It would be nice to do even better here, and look at actually specific types, but I think we should be careful to keep the runtime constant. That seems hard to do if we accumulate a list of types and do Type::isSubType on them etc. But maybe someone has a better idea than this PR?
* Deprecate IgnoreImplicitTraps (#4087) | Alon Zakai | 2021-08-17 | 1 | -0/+1
|
* Add TrapsNeverHappen to SideEffects's API (#4086) | Max Graey | 2021-08-17 | 4 | -1/+12
|
* LocalCSE: ignore traps (#4085) | Alon Zakai | 2021-08-17 | 1 | -0/+9
|
| If we replace
|
|   A
|   A
|   A
|
| with
|
|   (local.set A)
|   (local.get)
|   (local.get)
|
| then it is ok for A to trap (so long as it does so deterministically), as if it does trap then the first appearance will do so, and the others not be reached anyhow.
|
| This helps GC code as often there are repeated struct.gets and such that may trap.
* TrapsNeverHappen mode (#4059) | Alon Zakai | 2021-08-17 | 5 | -13/+105
|
| The goal of this mode is to remove obviously-unneeded code like
|
|   (drop
|     (i32.load
|       (local.get $x)))
|
| In general we can't remove it, as the load might trap - we'd be removing a side effect. This is fairly rare in general, but actually becomes quite annoying with wasm GC code where such patterns are more common, and we really need to remove them.
|
| Historically the IgnoreImplicitTraps option was meant to help here. However, in practice it did not quite work well enough for most production code, as mentioned e.g. in #3934. TrapsNeverHappen mode is an attempt to fix that, based on feedback from @askeksa in that issue, and also I believe this implements an idea that @fitzgen mentioned a while ago (sorry, I can't remember where exactly...). So I'm hopeful this will be generally useful and not just for GC.
|
| The idea in TrapsNeverHappen mode is that traps are assumed to not actually happen at runtime. That is, if there is a trap in the code, it will not be reached, or if it is reached then it will not trap. For example, an (unreachable) would be assumed to never be reached, which means that the optimizer can remove it and any code that executes right before it:
|
|   (if (..condition..)
|     (block
|       (..code that can be removed, if it does not branch out..)
|       (..code that can be removed, if it does not branch out..)
|       (..code that can be removed, if it does not branch out..)
|       (unreachable)))
|
| And something like a load from memory is assumed to not trap, etc., which in particular would let us remove that dropped load from earlier.
|
| This mode should be usable in production builds with assertions disabled, if traps are seen as failing assertions. That might not be true of all release builds (maybe some use traps for other purposes), but hopefully in some. That is, if traps are like assertions, then enabling this new mode would be like disabling assertions in release builds and living with the fact that if an assertion would have been hit then that is "undefined behavior" and the optimizer might have removed the trap or done something weird.
|
| TrapsNeverHappen (TNH) is different from IgnoreImplicitTraps (IIT). The old IIT mode would just ignore traps when computing effects. That is a simple model, but a problem happens with a trap behind a condition, like this:
|
|   if (x != 0) foo(1 / x);
|
| We won't trap on integer division by zero here only because of the guarding if. In IIT, we'd compute no side effects on 1 / x, and then we might end up moving it around, depending on other code in the area, and potentially out of the if - which would make it happen unconditionally, which would break.
|
| TNH avoids that problem because it does not simply ignore traps. Instead, there is a new hasUnremovableSideEffects() method that must be opted-in by passes. That checks if there are no side effects, or if there are, if we can remove them - and we know we can remove a trap if we are running under TrapsNeverHappen mode, as the trap won't happen by assumption. A pass must only use that method where it is safe, that is, where it would either remove the side effect (in which case, no problem), or if not, that it at least does not move it around (avoiding the above problem with IIT).
|
| This PR does not implement all optimizations possible with TNH, just a small initial set of things to get started. It is already useful on wasm GC code, including being as good as IIT on removing unnecessary casts in some cases, see the test suite updates here. Also, a significant part of the 18% speedup measured in #4052 (comment) is due to my testing with this enabled, as otherwise the devirtualization there leaves a lot of unneeded code.
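The difference between the two queries can be modelled with a deliberately tiny effect summary. Everything below is a hypothetical simplification: the `Effects` and `PassOptions` structs and the helper constructors are invented for illustration, and Binaryen's real `EffectAnalyzer` tracks many more kinds of effects.

```cpp
// Hypothetical, reduced effect summary for one expression.
struct Effects {
  bool writesState = false; // e.g. a store or a call with side effects
  bool mayTrap = false;     // e.g. a load or an integer division
};

// Hypothetical options struct carrying only the TNH flag.
struct PassOptions {
  bool trapsNeverHappen = false;
};

// The ordinary query: a possible trap still counts as a side effect, so
// code that might trap is never freely moved around.
bool hasSideEffects(const Effects& e) { return e.writesState || e.mayTrap; }

// The opt-in query for passes that only *remove* code: under TNH a trap is
// assumed not to happen, so it does not block removal.
bool hasUnremovableSideEffects(const Effects& e, const PassOptions& options) {
  if (e.writesState) {
    return true;
  }
  return e.mayTrap && !options.trapsNeverHappen;
}

// Effects of something like (i32.load (local.get $x)): it may trap only.
Effects loadEffects() {
  Effects e;
  e.mayTrap = true;
  return e;
}

// Effects of something like a store: it writes state and may trap.
Effects storeEffects() {
  Effects e;
  e.writesState = true;
  e.mayTrap = true;
  return e;
}

PassOptions optionsWithTNH(bool enabled) {
  PassOptions options;
  options.trapsNeverHappen = enabled;
  return options;
}
```

In this model a dropped load is removable only when the flag is set, while `hasSideEffects` keeps reporting the trap, which is what prevents the IIT-style hoisting problem described above.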
* LocalCSE rewrite (#4079) | Alon Zakai | 2021-08-17 | 9 | -229/+577
|
| Technically this is not a new pass, but it is a rewrite almost from scratch.
|
| Local Common Subexpression Elimination looks for repeated patterns, stuff like this:
|
|   x = (a + b) + c
|   y = a + b
|   =>
|   temp = a + b
|   x = temp + c
|   y = temp
|
| The old pass worked on flat IR, which is inefficient, and was overly complicated because of that. The new pass uses a new algorithm that I think is pretty simple, see the detailed comment at the top.
|
| This keeps the pass enabled only in -O4, like before - right after flattening the IR. That is to make this as minimal a change as possible. Followups will enable the pass in the main pipeline, that is, we will finally be able to run it by default. (Note that to make the pass work well after flatten, an extra simplify-locals is added - the old pass used to do part of simplify-locals internally, which was one source of complexity. Even so, some of the -O4 tests have changes, due to minor factors - they are just minor orderings etc., which can be seen by inspecting the outputs before and after using e.g. --metrics)
|
| This plus some followup work leads to large wins on wasm GC output. On j2cl there is a common pattern of repeated struct.gets, so common that this pass removes 85% of all struct.gets, which makes the total binary 15% smaller. However, on LLVM-emitted code the benefit is minor, less than 1%.
* [Wasm GC] ConstantFieldPropagation: Ignore copies (#4084) | Alon Zakai | 2021-08-16 | 1 | -7/+35
| | | | | | | | When looking for all values written to a field, we can ignore values that are loaded from that same field, i.e., are copied from something already present there. Such operations never introduce new values. This helps by a small but non-zero amount on j2cl.
* Support nominal typing in wasm-reduce (#4080) | Alon Zakai | 2021-08-16 | 1 | -3/+8
| | | | | Use ToolOptions there, which adds --nominal support. We must also pass --nominal to the sub-commands we run.
* [Wasm GC] Fix OptimizeInstructions on folding of identical code with nominal typing (#4069) | Alon Zakai | 2021-08-16 | 1 | -4/+27
|
|   (if (result i32)
|     (local.get $x)
|     (struct.get $B 1
|       (ref.null $B)
|     )
|     (struct.get $C 1
|       (ref.null $C)
|     )
|   )
|
| With structural typing it is safe to turn this into this:
|
|   (struct.get $A 1
|     (if (result (ref $A))
|       (local.get $x)
|       (ref.null $B)
|       (ref.null $C)
|     )
|   )
|
| Here $A is the LUB of the others. This works since $A must have field 1 in it. But with nominal types it is possible that the LUB in fact does not have that field, and we would not validate.
|
| This actually seems like a more general issue that might happen with other things, even though atm perhaps it can't. For simplicity, avoid this pattern in both nominal and structural typing, to avoid making a difference between them.
* [JS/C API] Expose zeroFilledMemory option for JS and C API (#4071) | Max Graey | 2021-08-13 | 3 | -0/+25
|
* Add a shallowHash() method (#4077) | Alon Zakai | 2021-08-12 | 2 | -70/+85
| | | | | | This adds and tests the new method. It will be used in a new pass later, where computing shallow hashes allows it to be done in linear time. 99% of the diff is whitespace.
* Fix the Switch operand order in LinearExecutionWalker (#4076) | Alon Zakai | 2021-08-12 | 2 | -2/+2
| | | | | | | | | | This caused no noticeable bugs, but it could in theory in new passes - in fact in a pass I will open later this week it did. Also fix the order in wasm.h. That part has no effect, but it is nice to be consistent. After this PR, everything should match the single source of truth which is wasm-delegations-fields.h (as that is used in printing, binary reading/writing, etc., so it has to be correct). Also Switch now matches the ordering in Break.
* Fix signed_ field initialization in Load. (#4075) | Alon Zakai | 2021-08-12 | 2 | -8/+2
| | | | | | | | | | | This was being set in the creation of Loads in the binary reader, but forgotten in the SIMD logic - which ends up creating a Load with type v128, and signed_ was uninitialized. Very hard to test this, but I saw it "break" hash value computation which is how I noticed this. Also initialize the I31 sign field. Now all of them in wasm.h are properly initialized.
* [Wasm GC] Fix LocalSubtyping on unreachable sets with incompatible values (#4051) | Alon Zakai | 2021-08-11 | 1 | -1/+28
|
| We ignore sets in unreachable code, but their values may not be compatible with a new type we specialize a local for. That is, the validator cares about unreachable sets, while logically we don't need to, and this pass doesn't. Fix up such unreachable sets at the end.
* SimplifyGlobals: Optimize away globals that are only read in order to write themselves (#4070) | Alon Zakai | 2021-08-10 | 1 | -14/+158
|
| If the only uses of a global are
|
|   if (global == 0) {
|     global = 1;
|   }
|
| Then we do not need that global: while it has both reads and writes, the value in the global does not cause anything observable. It is read, but only to write to itself, and nothing else.
|
| This happens in real-world code from j2cl quite a lot, as they have an initialization pattern with globals, and in some cases we can optimize away the work done in the initialization, leaving only the globals in this pattern.
* Improve optimization of call_ref into direct calls (#4068) | Alon Zakai | 2021-08-10 | 2 | -9/+83
| | | | | | | | | | | | | First, move the tiny pattern of call-ref-of-ref-func from Directize into OptimizeInstructions. This is important because Directize is a global optimization pass - it looks at the table to see if a CallIndirect can be turned into a direct call. We only run global passes at the end of the pipeline, but we don't need any global data for call-ref of a ref-func, and OptimizeInstructions is the place for such patterns. Second, extend that to also handle fallthrough values. This is less simple, but as call_ref is so inefficient, it's worth doing all we can.
* Improve inlining limits for recursion (#4067) | Alon Zakai | 2021-08-10 | 1 | -18/+50
|
| Previously we would keep doing iterations of inlining until we hit the number of functions in the module (at which point, we would definitely know we are recursing). This prevents infinite recursion, but it can take a very very long time to notice that in a huge module with one tiny recursive function and 100,000 other ones.
|
| To do better than that, track how many times we've inlined into a function. After a fixed number of such inlinings, stop. Aside from avoiding very slow compile times, such infinite recursion likely is not that beneficial to do a great many times, so anyhow it is best to stop after a few iterations.
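A minimal sketch of the per-function limit described above. The class name, the string keys, and the limit passed by the caller are assumptions for illustration; the commit message does not state the actual limit or data structure used.

```cpp
#include <string>
#include <unordered_map>

// Tracks how many times code has been inlined into each function and
// refuses further inlining once a fixed budget is used up.
class InliningBudget {
  std::unordered_map<std::string, int> counts; // keyed by function name
  int maxInliningsPerFunction;

public:
  explicit InliningBudget(int limit) : maxInliningsPerFunction(limit) {}

  // Returns true and records the inlining if the target still has budget;
  // returns false once the limit for that target has been reached.
  bool tryInlineInto(const std::string& target) {
    int& count = counts[target];
    if (count >= maxInliningsPerFunction) {
      return false;
    }
    ++count;
    return true;
  }
};
```

With this shape the cost of detecting runaway recursion depends on the limit and not on the number of functions in the module, so one tiny recursive function in a huge module is cut off after a few iterations.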
* [Wasm GC] RefEq(x, null) => RefIsNull(x) (#4066) | Alon Zakai | 2021-08-09 | 1 | -0/+12
|
* Fix BrOn logic in RemoveUnusedBrs (#4062) | Alon Zakai | 2021-08-09 | 1 | -47/+74
| | | | | | | | | | | This only moves code around. visitBrOn was in the main part of the pass, which was incorrect as it could interfere with other work being done there. Specifically, we have a stack of Ifs there, and if we replace a BrOn with an If, an assertion was hit. To fix this, run it like sinkBlocks(), in a separate interleaved phase. This fixes a bug reported by askeksa-google here: https://github.com/WebAssembly/gc/issues/226#issuecomment-868739853
* [Wasm GC] Track struct.new and struct.set separately in ConstantFieldPropagation (#4064) | Alon Zakai | 2021-08-09 | 1 | -51/+88
|
| Previously we tracked them in the same way. That means that we did the same when seeing if either a struct.new or a struct.set can write to the memory that is read by a struct.get, where the rule is that if either type is a subtype of the other then they might. But with struct.new we know the precise type, which means we can do better.
|
| Specifically, if we see a new of type B, then only a get of a supertype of B can possibly read that data: it is not possible for our struct of type B to appear in a location that requires a subtype of B. Conceptually:
|
|   A = type struct
|   B = type extends A
|   C = type extends B
|
|   x = struct.new<B>
|
|   struct.get<A>(y) // x might appear here, as it can be assigned to a
|                    // variable y of a supertype
|   struct.get<C>(y) // x cannot appear here
|
| This allows more devirtualization. It is a followup for #4052 that implements a TODO from there.
|
| The diff without whitespace is simpler.
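The two rules can be written down over a toy nominal hierarchy. This is a sketch under stated assumptions: types are plain strings, each type has at most one declared supertype, and the function names are hypothetical and not taken from ConstantFieldPropagation.

```cpp
#include <map>
#include <string>

// Maps each type to its declared supertype; root types have no entry.
using Supertypes = std::map<std::string, std::string>;

// The A <- B <- C chain from the commit message.
Supertypes exampleHierarchy() { return {{"B", "A"}, {"C", "B"}}; }

// True if `sub` equals `super` or reaches it by following supertype links.
bool isSubType(const Supertypes& supers, std::string sub,
               const std::string& super) {
  while (true) {
    if (sub == super) {
      return true;
    }
    auto it = supers.find(sub);
    if (it == supers.end()) {
      return false;
    }
    sub = it->second;
  }
}

// struct.new has a precise type, so only a get on that type or one of its
// supertypes can observe the new object.
bool newMayBeReadBy(const Supertypes& supers, const std::string& newType,
                    const std::string& getType) {
  return isSubType(supers, newType, getType);
}

// struct.set only knows a static type for its reference, so the written
// object may be of a subtype: related in either direction is enough.
bool setMayBeReadBy(const Supertypes& supers, const std::string& setType,
                    const std::string& getType) {
  return isSubType(supers, setType, getType) ||
         isSubType(supers, getType, setType);
}
```

For the hierarchy above, a `struct.new<B>` can be seen by `struct.get<A>` but not by `struct.get<C>`, while a `struct.set<B>` must still be assumed visible to both.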