| Commit message | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously we had passes --generate-stack-ir, --optimize-stack-ir, --print-stack-ir
that could be run like any other passes. After generating StackIR it was stashed on
the function and invalidated if we modified BinaryenIR. If it wasn't invalidated then
it was used during binary writing. This PR switches things so that we optionally
generate, optimize, and print StackIR only during binary writing. It also removes
all traces of StackIR from wasm.h - after this, StackIR is a feature of binary writing
(and printing) logic only.
This is almost NFC, but there are some minor noticeable differences:
1. We no longer print a "has StackIR" note in the text format when StackIR is present. It
will not be there during normal printing, as StackIR only exists during binary writing.
(--print-stack-ir still works as before; as mentioned above, it runs during writing.)
2. --generate/optimize/print-stack-ir change from being passes to being flags that
control that behavior instead. As passes, their order on the command line mattered,
while now it does not, and they only "globally" affect things during writing.
3. The C API changes slightly, as there is no longer a need to pass an "optimize" option
to the StackIR APIs. Whether we optimize is handled by --optimize-stack-ir, which is
set like other optimization flags on the PassOptions object, so we don't need the
old option to those C APIs.
The main benefit here is simplifying the code, so we don't need to think about
StackIR in more places than just binary writing. That may also allow future
improvements to our usage of StackIR.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We previously supported a non-standard `(func "name" ...` syntax for declaring
functions exported with the quoted name. Since that is not part of the standard
text format, drop support for it, replacing it with the standard `(func $name
(export "name") ...` syntax instead.
Also replace our other usage of the quoted form in our text output, which was
where we quoted names containing characters that are not allowed to appear in
standard names. To handle that case, adjust our output from `"$name"` to
`$"name"`, which is the standards-track way of supporting such names. Also fix
how we detect non-standard name characters to match the spec.
Update the lit test output generation script to account for these changes,
including by making the `$` prefix on names mandatory. This causes the script to
stop interpreting declarative element segments using the `(elem declare ...`
syntax as being named "declare", so to keep our generated output from regressing,
count "declare" as a name in the script.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When printing Binaryen IR, we previously generated names for unnamed heap types
based on their structure. This was useful for seeing the structure of simple
types at a glance without having to separately go look up their definitions, but
it also had two problems:
1. The same name could be generated for multiple types. The generated names did
not take into account rec group structure or finality, so types that differed
only in these properties would have the same name. Also, generated type names
were limited in length, so very large types that shared only some structure
could also end up with the same names. Using the same name for multiple types
produces incorrect and unparsable output.
2. The generated names were not useful beyond the most trivial examples. Even
with length limits, names for nontrivial types were extremely long and visually
noisy, which made reading disassembled real-world code more challenging.
Fix these problems by emitting simple indexed names for unnamed heap types
instead. This regresses readability for very simple examples, but the trade-off
is worth it.
This change also reduces the number of type printing systems we have by one.
Previously we had the system in Print.cpp, but we had another, more general and
extensible system in wasm-type-printing.h and wasm-type.cpp as well. Remove the
old type printing system from Print.cpp and replace it with a much smaller use
of the new system. This requires significant refactoring of Print.cpp so that the
PrintExpressionContents object now holds a reference to a parent
PrintSExpression object that holds the type name state.
This diff is very large because almost every test output changed slightly. To
minimize the diff and ease review, change the type printer in wasm-type.cpp to
behave the same as the old type printer in Print.cpp except for the differences
in name generation. These changes will be reverted in much smaller PRs in the
future to generally improve how types are printed.
|
|
|
|
|
|
|
|
|
| |
As found in #3682, the current implementation of type ordering is not correct,
and although the immediate issue would be easy to fix, I don't think the current
intended comparison algorithm is correct in the first place. Rather than trying to
switch to a correct algorithm (which I am not sure I know how to
implement, although I have an idea), this PR removes Type ordering entirely. In
places that used Type ordering with std::set or std::map because they require
deterministic iteration order, this PR uses InsertOrdered{Set,Map} instead.
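For intuition, here is a minimal sketch of what an insertion-ordered set provides; this
illustrates the concept only and is not Binaryen's actual InsertOrdered{Set,Map} code:
iteration follows first-insertion order, so it is deterministic without any ordering on Type.

```cpp
#include <unordered_set>
#include <vector>

// Toy insertion-ordered set: membership via a hash set, iteration order via a
// vector recording first insertions. Deterministic without an ordering on T.
template<typename T>
struct ToyInsertOrderedSet {
  std::vector<T> order;
  std::unordered_set<T> seen;

  void insert(const T& x) {
    if (seen.insert(x).second) {
      order.push_back(x);
    }
  }
  auto begin() const { return order.begin(); }
  auto end() const { return order.end(); }
};
```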
|
|
|
|
|
|
|
| |
Specifically, pick a simple positive canonical NaN as the output whenever the output
is a NaN. This is the same as what tools like wabt do.
This fixes a testcase found by the fuzzer on #3289, but it was not that PR's fault.
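For concreteness, this is what a simple positive canonical NaN looks like at the bit level,
assuming the usual quiet-NaN encodings (sign bit clear, only the top significand bit set);
this is an illustration, not the interpreter's actual code:

```cpp
#include <cstdint>
#include <cstring>

// Positive canonical (quiet) NaN bit pattern for f32.
float canonicalNaN32() {
  uint32_t bits = 0x7fc00000u;
  float f;
  std::memcpy(&f, &bits, sizeof(f));
  return f;
}

// Positive canonical (quiet) NaN bit pattern for f64.
double canonicalNaN64() {
  uint64_t bits = 0x7ff8000000000000ull;
  double d;
  std::memcpy(&d, &bits, sizeof(d));
  return d;
}
```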
|
|
|
|
|
|
|
|
|
|
|
|
| |
Similar to clang and gcc, --fast-math makes us ignore corner cases of floating-point
math like NaN changes and (not done yet) lack of associativity and so forth.
In the future we may want to have separate fast math flags for each specific thing,
like gcc and clang do.
This undoes some changes (#2958 and #3096) where we assumed it was
ok to not change NaN bits, but @binji corrected us. We can only do such things in fast
math mode. This puts those optimizations behind that flag, adds tests for it, and
restores the interpreter to the simpler code from before with no special cases.
|
|
|
|
|
|
|
|
|
|
| |
Similar to #2958, but for multiplication. I thought this was limited
only to division (it doesn't happen for addition, for example), but the
fuzzer found that it does indeed happen for multiplication as well.
Overall these are something of a workaround for the interpreter doing
normal f32/f64 multiplications using the host CPU, so we pick up
any oddness of its NaN behavior. Using soft float might be safer
(but much slower).
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
It's valid to change NaN bits in that case per the wasm spec, but
if we do so then fuzz testcases will fail on the optimization of
nan:foo / 1 => nan:foo
That is, it is ok to leave the bits as they are, and if we do that then
we are consistent with the simple and valid optimization of removing
a divide by 1.
Found by the fuzzer - it looks like on x64, for some float32 NaNs,
the bits will actually change (see the testcase). I've seen this on
two machines consistently, so it's apparently normal.
Disable an old wasm spectest that has been updated upstream
anyhow; the new test here is even stricter and verifies that
the interpreter changes literally no bits.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We used std::cerr as a workaround for the fact that this logging
interfered with spec testing. But it's easy enough to filter
out this stuff for the spec tests.
The benefit of using std::cout is that, as you can see in
the test output here, this is relevant test output - it's not
a side channel for debugging. If the rest of the interpreter
output is in std::cout but only traps are in std::cerr, then
they might end up out of order etc., so it's best to keep them
all together.
This will also allow easier addition of tests for fuzz testcases.
|
|
|
|
|
|
|
|
| |
`BinaryIndexes` was only used in two places (Print.cpp and
wasm-binary.h), so it didn't seem to be a great fit for
module-utils.h. This change moves it to wasm-binary.h and removes its
usage in Print.cpp. This means that function indexes are no longer
printed, but those were of limited utility and were the source of
annoying noise when updating tests, anyway.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Function signatures were previously redundantly stored on Function
objects as well as on FunctionType objects. These two signature
representations had to always be kept in sync, which was error-prone
and needlessly complex. This PR takes advantage of the new ability of
Type to represent multiple value types by consolidating function
signatures as a pair of Types (params and results) stored on the
Function object.
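Roughly, the shape of the change is something like the following sketch (the names here
are illustrative stand-ins, not the exact Binaryen declarations):

```cpp
#include <vector>

// Stand-in for Binaryen's Type, which can now represent a list of value types.
enum class ValueType { i32, i64, f32, f64 };
using Type = std::vector<ValueType>;

// The signature lives directly on the function as a pair of Types; there is
// no separate named FunctionType object to keep in sync.
struct ToyFunction {
  Type params;
  Type results;
  // ... name, locals, body, etc.
};
```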
Since there are no longer module-global named function types,
significant changes had to be made to the printing and emitting of
function types, as well as their parsing and manipulation in various
passes.
The C and JS APIs and their tests also had to be updated to remove
named function types.
|
|
|
|
|
|
|
|
|
|
|
| |
The `$` is not actually part of the name; it's the marker that starts
a name in the wat format. It can be confusing to see it show up when
doing `cerr << name`, for example.
This change has Print.cpp add the `$`, which seems like the right place
to do this. It also revealed a bunch of places where we were not calling
printName to escape all the names we were printing.
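As a minimal sketch of the idea (not the actual Print.cpp code, whose printName also
handles escaping): the stored name never contains the `$`; the printer adds it at output time.

```cpp
#include <iostream>
#include <string>

// Hypothetical helper in the spirit of printName: the '$' is purely a
// text-format marker, added only when printing.
void printDollarName(std::ostream& o, const std::string& name) {
  o << '$' << name;
}

int main() {
  std::string name = "foo";          // stored internally without '$'
  std::cerr << name << '\n';         // debug logging prints just "foo"
  printDollarName(std::cout, name);  // text output prints "$foo"
  std::cout << '\n';
}
```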
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The current `<<` operator on `Literal` prints `[type].const` along with the value. But
`[type].const` is an instruction rather than part of the literal itself, and
printing it with the literal makes less sense once we have
literals whose types don't have `const` instructions (such as reference
types).
This patch
- Makes the `<<` operator on `Literal` print only its value
- Makes wasm-shell's shell interface comply with the spec interpreter's
printing format (`value : type`).
- Prints wasm-shell's `[trap]` message to stderr.
These make all `fix_` routines for spec tests in check.py unnecessary.
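As a toy illustration of the new output shape (not Binaryen's actual Literal class), the
`<<` operator prints only the value, and the shell appends the `value : type` form the
spec interpreter uses:

```cpp
#include <iostream>

// Toy literal holding just a value and a type name for display purposes.
struct ToyLiteral {
  double value;
  const char* type; // e.g. "f64"
};

// Prints only the value, with no "[type].const" prefix.
std::ostream& operator<<(std::ostream& o, const ToyLiteral& lit) {
  return o << lit.value;
}

int main() {
  ToyLiteral lit{42.0, "f64"};
  std::cout << lit << " : " << lit.type << '\n'; // shell-style "42 : f64"
}
```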
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
- Refactored & fixed typeuse parsing rules so now the rules more closely
follow the spec. There have been multiple parsing rules that were
different in subtle ways, which are supposed to be the same according
to the spec.
- Duplicate types, i.e., types with the same signature, in the type
section are allowed as long as they don't have the same given name.
If a name is given, we use it; if type name is not given, we
generate one in the form of `$FUNCSIG$` + signature string. If the
same generated name already exists in the type section, we append
`_` at the end. This causes most of the changes in the autogenerated
type names in test outputs.
- A typeuse has to be in the order of (type) -> (param) -> (result),
if more than one of them exist. In case of function definitions,
(local) has to be after all of these. Fixed some test cases that
violate this rule.
- When only (param)/(result) are given, its type will be the type with
the smallest existing type index whose parameter and result are the
same. If there's no such type, a new type will be created and
inserted.
- Added a test case `duplicate_types.wast` to test type namings for
duplicate types.
- Refactored `parseFunction` function.
- Add more overrides to helper functions: `getSig` and
`ensureFunctionType`.
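A rough sketch of the naming scheme described above (the details here are illustrative,
not the parser's exact code):

```cpp
#include <set>
#include <string>

// Generate a name for an unnamed type: "$FUNCSIG$" plus the signature
// string, appending '_' until the name is unique in the type section.
std::string generateTypeName(const std::string& sigString,
                             std::set<std::string>& usedNames) {
  std::string name = "$FUNCSIG$" + sigString;
  while (usedNames.count(name)) {
    name += '_';
  }
  usedNames.insert(name);
  return name;
}
```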
|
|
|
|
|
|
|
|
|
|
|
| |
The main fuzz_opt.py script compares JS VMs, and separately runs binaryen's fuzz-exec that compares the binaryen interpreter to itself (before and after opts). This PR lets us directly compare binaryen's interpreter output to JS VMs. This found a bunch of minor things we can do better on both sides, giving more fuzz coverage.
To enable this, a bunch of tiny fixes were needed:
* Add --fuzz-exec-before which is like --fuzz-exec but just runs the code before opts are run, instead of before and after.
* Normalize double printing (so JS and C++ print comparable things). This includes negative zero in JS, which we never printed properly until now.
* Various improvements to how we print fuzz-exec logging - remove unhelpful things, and normalize the others across JS and C++.
* Properly legalize the wasm when --emit-js-wrapper is used (i.e., when we will run the code from JS), and use that in the JS wrapper code.
|
|
|
| |
This now makes --generate-stack-ir --print-stack-ir emit a fully valid .wat wasm file, in stacky format.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This adds a new IR, "Stack IR". This represents wasm at a very low level, as a simple stream of instructions, basically the same as wasm's binary format. This is unlike Binaryen IR which is structured and in a tree format.
This gives some small wins on binary sizes, less than 1% in most cases, usually 0.25-0.50% or so. That's not much by itself, but looking forward this prepares us for multi-value, which we really need an IR like this to optimize well. Also, it's possible there is more we can do already - currently only a few stack IR optimizations are implemented:
- DCE
- local2stack - check if a set_local/get_local pair can be removed, keeping the set's value on the stack so that, if the stars align, it can be popped instead of doing the get.
- Block removal - remove any blocks with no branches, as the code is still valid in the wasm binary format without them.
Implementation-wise, the IR is defined in wasm-stack.h. A new StackInst is defined, representing a single instruction. Most are simple reflections of Binaryen IR (an add, a load, etc.), and just pointers to them. Control flow constructs are expanded into multiple instructions, like a block turns into a block begin and end, and we may also emit extra unreachables to handle the fact Binaryen IR has unreachable blocks/ifs/loops but wasm does not. Overall, all the Binaryen IR differences with wasm vanish on the way to stack IR.
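As a rough sketch of the shape of this IR (illustrative, not the exact wasm-stack.h definitions): most stack instructions just point back at the Binaryen IR expression they reflect, while control flow constructs expand into begin/end markers.

```cpp
struct Expression; // a Binaryen IR node

struct ToyStackInst {
  enum Op {
    Basic,      // an add, a load, etc. - a direct reflection of Binaryen IR
    BlockBegin, // a Binaryen IR block expands into a begin and an end
    BlockEnd,
    IfBegin,
    IfElse,
    IfEnd,
    LoopBegin,
    LoopEnd,
  } op;
  Expression* origin; // the Binaryen IR expression this was expanded from
};
```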
Where this IR lives: Each Function now has a unique_ptr to stack IR, that is, a function may have stack IR alongside the main IR. If the stack IR is present, we write it out during binary writing; if not, we do the same binaryen IR => wasm binary process as before (this PR should not affect speed there). This design lets us use normal Passes on stack IR; in particular, this PR defines 3 passes:
- Generate stack IR
- Optimize stack IR (might be worth splitting out into separate passes eventually)
- Print stack IR for debugging purposes
Having these as normal passes is convenient, as they can then run in parallel across functions and benefit from all the other conveniences of our current Pass system. However, a downside of keeping the second IR as an option on Functions, and using normal Passes to operate on it, is that we may get out of sync: if you generate stack IR and then modify binaryen IR, the stack IR may no longer be valid (for example, maybe you removed locals or modified instructions in place, etc.). To avoid that, Passes now declare whether they modify Binaryen IR; if they do, we throw away the stack IR.
Miscellaneous notes:
- Just writing Stack IR, then writing to binary - no optimizations - is 20% slower than going directly to binary, which is one reason why we still support direct writing. This does lead to some "fun" C++ template code to make that convenient: there is a single StackWriter class, templated over the "mode", which is either Binaryen2Binary (direct writing), Binaryen2Stack, or Stack2Binary. This avoids a lot of boilerplate as the 3 modes share a lot of code in overlapping ways.
- Stack IR does not support source maps / debug info. We just don't use that IR if debug info is present.
- A tiny text format comment, (; has Stack IR ;), indicates stack IR is present, if it is (when emitting non-minified text). This may help with debugging, just in case people forget. There is also a pass to print out the stack IR for debug purposes, as mentioned above.
- The sieve binaryen.js test was actually not validating all along - these new opts broke it in a more noticeable manner. Fixed.
- Added extra checks in pass-debug mode, to verify that if stack IR should have been thrown out, it was. This should help avoid any confusion with the IR being invalid.
- Added a comment about the possible future of stack IR as the main IR, depending on optimization results, following some discussion earlier today.
|
| |
|
|
|
|
|
|
|
|
| |
* run all execution on the same instance, so side effects of memory writes persist, which matches how we run the code in a js vm, so we can directly compare
* fuzz only exported functions, not things that opts might remove
* note results in fuzz-exec by export name
|
|
|
|
|
|
| |
This adds a new method of fuzzing, "translate to fuzz" which means we consider the input to be a stream of data that we translate into a valid wasm module. It's sort of like a random seed for a process that creates a random wasm module. By using the input that way, we can explore the space of valid wasm modules quickly, and it makes afl-fuzz integration easy.
Also adds a "fuzz binary" option which is similar to "fuzz execution". It makes wasm-opt not only execute the code before and after opts, but also write to binary and read from it, helping to fuzz the binary format.
|
|
a negative value to a positive one, as trapping is tricky
|