forks/binaryen.git -

	Commit message (Collapse)	Author	Age	Files	Lines
*	[StackIR] Run StackIR during binary writing and not as a pass (#6568)	Alon Zakai	2024-05-09	1	-9/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Previously we had passes --generate-stack-ir, --optimize-stack-ir, --print-stack-ir that could be run like any other passes. After generating StackIR it was stashed on the function and invalidated if we modified BinaryenIR. If it wasn't invalidated then it was used during binary writing. This PR switches things so that we optionally generate, optimize, and print StackIR only during binary writing. It also removes all traces of StackIR from wasm.h - after this, StackIR is a feature of binary writing (and printing) logic only. This is almost NFC, but there are some minor noticeable differences: 1. We no longer print has StackIR in the text format when we see it is there. It will not be there during normal printing, as it is only present during binary writing. (but --print-stack-ir still works as before; as mentioned above it runs during writing). 2. --generate/optimize/print-stack-ir change from being passes to being flags that control that behavior instead. As passes, their order on the commandline mattered, while now it does not, and they only "globally" affect things during writing. 3. The C API changes slightly, as there is no need to pass it an option "optimize" to the StackIR APIs. Whether we optimize is handled by --optimize-stack-ir which is set like other optimization flags on the PassOptions object, so we don't need the old option to those C APIs. The main benefit here is simplifying the code, so we don't need to think about StackIR in more places than just binary writing. That may also allow future improvements to our usage of StackIR.
*	Fix incorrect wat in tests (#6207)	Thomas Lively	2024-01-08	1	-1/+1
\| \| \| \| \| \| \| \| \|	The new wat parser is much more strict than the legacy wat parser; the latter accepts all sorts of things that the spec does not allow. To ease an eventual transition to using the new wat parser by default, update the tests to use the standard text format in many places where they previously did not. We do not yet have a way to prevent new errors from being introduced into the test suite, but at least there will now be many fewer errors when it comes time to make the switch.
*	Require `then` and `else` with `if` (#6201)	Thomas Lively	2024-01-04	1	-18/+42
\| \| \| \| \| \| \| \| \| \| \| \|	We previously supported (and primarily used) a non-standard text format for conditionals in which the condition, if-true expression, and if-false expression were all simply s-expression children of the `if` expression. The standard text format, however, requires the use of `then` and `else` forms to introduce the if-true and if-false arms of the conditional. Update the legacy text parser to require the standard format and update all tests to match. Update the printer to print the standard format as well. The .wast and .wat test inputs were mechanically updated with this script: https://gist.github.com/tlively/85ae7f01f92f772241ec994c840ccbb1
*	Simplify and consolidate type printing (#5816)	Thomas Lively	2023-08-24	1	-13/+13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When printing Binaryen IR, we previously generated names for unnamed heap types based on their structure. This was useful for seeing the structure of simple types at a glance without having to separately go look up their definitions, but it also had two problems: 1. The same name could be generated for multiple types. The generated names did not take into account rec group structure or finality, so types that differed only in these properties would have the same name. Also, generated type names were limited in length, so very large types that shared only some structure could also end up with the same names. Using the same name for multiple types produces incorrect and unparsable output. 2. The generated names were not useful beyond the most trivial examples. Even with length limits, names for nontrivial types were extremely long and visually noisy, which made reading disassembled real-world code more challenging. Fix these problems by emitting simple indexed names for unnamed heap types instead. This regresses readability for very simple examples, but the trade off is worth it. This change also reduces the number of type printing systems we have by one. Previously we had the system in Print.cpp, but we had another, more general and extensible system in wasm-type-printing.h and wasm-type.cpp as well. Remove the old type printing system from Print.cpp and replace it with a much smaller use of the new system. This requires significant refactoring of Print.cpp so that PrintExpressionContents object now holds a reference to a parent PrintSExpression object that holds the type name state. This diff is very large because almost every test output changed slightly. To minimize the diff and ease review, change the type printer in wasm-type.cpp to behave the same as the old type printer in Print.cpp except for the differences in name generation. These changes will be reverted in much smaller PRs in the future to generally improve how types are printed.
*	LinearExecutionTraversal: Add an option to connect adjacent code, use in ↵	Alon Zakai	2023-08-08	1	-4/+2
\| \| \| \| \| \| \| \| \| \| \|	SimplifyLocals (#5860) This addresses most of the minor regression from the correctness fix in #5857. That PR makes us consider calls as branching, but in some cases it is ok to ignore that branching (see the comment in the code here), which this PR allows as an option. This undoes one test change from that PR, showing it undoes the regression for SimplifyLocals. More tests are added to cover this specifically as well.
*	[NFC] Avoid unneeded work in GTO (#5304)	Alon Zakai	2022-11-30	1	-5/+4
\| \| \| \| \|	As noticed in #5303, the test changes here are because we did unnecessary work which created a new rec group, which then led to a rec group being printed out.
*	Change the default type system to isorecursive (#5239)	Thomas Lively	2022-11-23	1	-13/+14
\| \| \| \| \| \| \| \| \| \|	This makes Binaryen's default type system match the WasmGC spec. Update the way type definitions without supertypes are printed to reduce the output diff for MVP tests that do not involve WasmGC. Also port some type-builder.cpp tests from test/example to test/gtest since they needed to be rewritten to work with isorecursive type anyway. A follow-on PR will remove equirecursive types completely.
*	Fix name of port_passes_tests_to_lit.py script. NFC (#4902)	Sam Clegg	2022-08-12	1	-1/+1
\| \| \|	I was reading these tests and failing to find the names script.
*	[Precompute][SIMD] Enable constant folding for simd (#4381)	Max Graey	2021-12-13	1	-1/+14
\|
*	CoalesceLocals: Remove a redundant tee of the same local as a parent set (#4318)	Alon Zakai	2021-11-09	1	-5/+3
\| \| \| \| \| \|	Tiny followup to #4314 Also updates some function types in test output, fixing breakage on main after racing landings.
*	CoalesceLocals: Rewrite the algorithm to be linear and to ignore copies (#4314)	Alon Zakai	2021-11-10	1	-4/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The old algorithm can be summarized as: In each basic block, start at the beginning. Each pair of live locals there might interfere with each other, as they might arrive from different entry blocks with different values. Afterwards, go through the block and find overlapping live ranges, and mark interferences there as well. This is non-linear because at the start of the block we do a double-loop over all pairs of live locals, which in general can be O(N^2) (N - number of locals). It also has the downside of ignoring copies: if two locals have overlapping live ranges but they must have identical values on those ranges, they do not actually interfere, for example x = 10; y = x; .. // live ranges overlap here foo(x, y); // live ranges end here. We can ignore this overlap since the copy shows they are identical there, but the pass did not take this into account. To some extent other passes can remove such copies (SimplifyLocals, MergeLocals, RedundantSetElimination), but in general this was a weak spot for the optimizer. I realized there is a solution to both these problems: In Wasm, given that we have a default value for all locals, if a local is live at the start of a block then it must be live at the end of all the blocks reaching it. That is so because the liveness will extend backwards all the way to some set of the local, possibly all the way to the zero-initialization at the start of the function, and it extends that way through all predecessor blocks. A consequence of this is that there are no interferences between locals that only occur during a merge: The live ranges include the predecessor blocks, and theirs, and so forth, until we reach a block where one of the locals is assigned a value different than the other. That is a necessary and sufficient condition for intererence, and therefore when processing a block we only need to look at its contents, and can ignore the merging of control flow, which allows us to be linear. More details on this and on the new algorithm in comments in the source, but the basic idea is that it simply goes through each block in a linear way, finding which values are assigned to each local (using a numbering of unique values), and noting which are live at each time. If two locals are live and one is assigned a value that is not the same as the value in the other, mark them as interfering. This is of substantial benefit to j2wasm output, I believe because it is common there to find local subexpression elimination opportunities after inlining, and each time we find one we add a local. If we inline different functions into the same target, we may end up with copied locals for each of them. (This was not noticed in the past because it is very rare on LLVM output, which has already had inlining and GVN etc. done.) There is a small benefit to LLVM output as well, though just a few percent at best. However, it is enough to be noticeable on some of the code size tests. This is also faster than the previous pass. It's normally not noticeable as this pass is not one of the slowest anyhow, but I found some real-world codebases where the pass becomes 50% faster. I have not found any case where it is slower than the old algorithm. Fuzzed over several days to be sure this is correct, and also verified on the emscripten test suite.
*	[OptimizeInstructions] Fold select into zero or single expression for some ↵	Max Graey	2021-10-05	1	-2/+126
\| \| \| \| \| \| \| \| \| \| \|	patterns (#4181) i32(x) ? i32(x) : 0 ==> x i32(x) ? 0 : i32(x) ==> {x, 0} i64(x) == 0 ? 0 : i64(x) ==> x i64(x) != 0 ? i64(x) : 0 ==> x i64(x) == 0 ? i64(x) : 0 ==> {x, 0} i64(x) != 0 ? 0 : i64(x) ==> {x, 0}
*	[OptimizeInstructions] propagate sign for integer multiplication (#4098)	Max Graey	2021-09-09	1	-0/+38
\| \| \| \| \| \| \| \| \| \| \| \|	```ts -x * -y => (x * y) -x * y => -(x * y) x * -y => -(x * y), if x != C && y != C -x * C => x * -C, if C != C_pot \|\| shrinkLevel != 0 -x * C => -(x * C), otherwise ``` We are skipping propagation when lhs and rhs are constants because this should handled by constant folding. Also skip cases like `-x * 4 -> x * -4` for `shrinkLevel != 0`, as this will be further converted to `-(x << 2)`.
*	Enable LocalCSE by default (#4089)	Alon Zakai	2021-08-19	1	-14/+12
\| \| \| \| \| \| \| \| \| \| \| \|	Enable it in -O3 and -Os and higher. This helps very little on output from LLVM, but also it does not alter compile times much anyhow. On code that has not been run through an optimizing compiler already, this can help quite a lot, e.g., 15% of code size on some wasm GC samples. This will not normally help with speed, as optimizing VMs do such things anyhow. However, this can help baseline compilers and interpreters and so forth.
*	Port test/passes/O* to lit (#3969)	Thomas Lively	2021-07-07	1	-0/+117