forks/binaryen.git -

	Commit message (Collapse)	Author	Age	Files	Lines
*	[NFC] Change topological sort utilities to functions (#6889)	Thomas Lively	2024-09-03	2	-72/+74
\| \| \| \| \| \| \|	Previously they were structs and their results were accessed with `operator*()`, but that was unnecessarily complicated and could lead to problems with temporary lifetimes being too short. Simplify the utilities by making them functions. This also allows the wrapper templates to infer the proper element types automatically.
*	Add a utility for finding minimal topological sorts (#6884)	Thomas Lively	2024-08-29	2	-9/+115
\| \| \| \| \| \| \| \|	Reuse the code implementing Kahn's topological sort algorithm with a new configuration that uses a min-heap to always choose the best available element. Also add wrapper utilities that can find topological sorts of graphs with arbitrary element types, not just indices.
*	Simplify TopologicalOrders (#6811)	Thomas Lively	2024-08-07	2	-28/+22
\| \| \| \| \| \|	Make `TopologicalOrders` its own iterator rather than having a separate iterator class that wraps a pointer to `TopologicalOrders`. This simplifies usage in cases where an iterator needs to be persistently stored. Notably, all of the tests continue working as they are.
*	Add a utility for iterating over all topological orders (#6801)	Thomas Lively	2024-08-05	4	-2/+225
\| \| \| \| \| \| \| \| \|	Use an extension of Kahn's algorithm for finding topological orders that iteratively makes every possible choice at every step to find all the topological orders. The order being constructed and the set of possible choices are managed in-place in the same buffer, so the algorithm takes linear time and space plus amortized constant time per generated order. This will be used in an upcoming type optimization.
*	Add a disjoint sets (union-find) utility (#6797)	Thomas Lively	2024-08-01	1	-0/+87
\| \| \| \|	This will be used in an upcoming type optimization pass and may be generally useful.
*	Add a Tarjan's Strongly Connected Component utilty (#6790)	Thomas Lively	2024-07-30	1	-0/+262
\| \| \| \| \| \| \| \| \|	Implement a non-recursive version of Tarjan's Strongly Connected Component algorithm that consumes and produces iterators for maximum flexibility. This will be used in an optimization that transforms the heap type graph to use minimal recursion groups, which correspond to the strongly connected components of the type graph.
*	[NFC] Add pragma to ignore maybe-uninitialized in strings.cpp (#6686)	Thomas Lively	2024-06-20	1	-0/+5
\| \| \|	This will hopefully fix the build on the coverage builder.
*	Validate that names are valid UTF-8 (#6682)	Thomas Lively	2024-06-19	2	-0/+25
\| \| \| \| \| \|	Add an `isUTF8` utility and use it in both the text and binary parsers. Add missing checks for overlong encodings and overlarge code points in our WTF8 reader, which the new utility uses. Re-enable the spec tests that test UTF-8 validation.
*	Fuzzer: Better fuzzing of globals (#6611)	Alon Zakai	2024-05-21	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \|	With this PR we generate global.gets in globals, which we did not do before. We do that by replacing makeConst (the only thing we did before, for the contents of globals) with makeTrivial, and add code to makeTrivial to sometimes make a global.get. When no suitable global exists, makeGlobalGet will emit a constant, so there is no danger in trying. Also raise the number of globals a little. Also explicitly note the current limitation of requiring all tuple globals to contain tuple.make and nothing else, including not global.get, and avoid adding such invalid global.gets in tuple globals in the fuzzer.
*	Do not add an extra null character when reading files (#6538)	Thomas Lively	2024-04-24	1	-3/+3
\| \| \| \| \| \| \| \|	The new wat parser currently considers itself to be at the end of the file whenever it cannot lex another token. This is not quite right, but fixing it causes parser errors because of the extra null character we were appending to files when we read them. This null character is not useful since we can already read files as `std::string`, which always has an implicit null character, so remove it. Clean up some users of `read_file` while we're at it.
*	[Strings] Add a string lowering pass using magic imports (#6497)	Thomas Lively	2024-04-15	2	-12/+34
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The latest idea for efficient string constants is to encode the constants in the import names of their globals and implement fast paths in the engines for materializing those constants at instantiation time without needing to parse anything in JS. This strategy only works for valid strings (i.e. strings without unpaired surrogates) because only valid strings can be used as import names in the WebAssembly syntax. Add a new configuration of the StringLowering pass that encodes valid string contents in import names, falling back to the JSON custom section approach for invalid strings. To test this chang, update the printer to escape import and export names properly and update the legacy parser to parse escapes in import and export names properly. As a drive-by, remove the incorrect check in the parser that the import module and base names are non-empty.
*	[Strings] Represent string values as WTF-16 internally (#6418)	Thomas Lively	2024-03-22	3	-55/+202
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	WTF-16, i.e. arbitrary sequences of 16-bit values, is the encoding of Java and JavaScript strings, and using the same encoding makes the interpretation of string operations trivial, even when accounting for non-ascii characters. Specifically, use little-endian WTF-16. Re-encode string constants from WTF-8 to WTF-16 in the parsers, then back to WTF-8 in the writers. Update the constructor for string `Literal`s to interpret the string as WTF-16 and store a sequence of WTF-16 code units, i.e. 16-bit integers. Update `Builder::makeConstantExpression` accordingly to convert from the new `Literal` string representation back to a WTF-16 string. Update the interpreter to remove the logic for detecting non-ascii characters and bailing out. The naive implementations of all the string operations are correct now that our string encoding matches the JS string encoding.
*	[NFC] Fix build error on RISC-V 64 (#6410)	moui0	2024-03-18	1	-1/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Similar issue as: #6330 FAILED: src/passes/CMakeFiles/passes.dir/Precompute.cpp.o /usr/bin/c++ -I/build/binaryen/src/binaryen-version_117/src -I/build/binaryen/src/binaryen-version_117/third_party/llvm-project/include -I/build/binaryen/src/binaryen-version_117/build -march=rv64gc -mabi=lp64d -O2 -pipe -fno-plt -fexceptions -Wp,-D_FORTIFY_SOURCE=2 -Wformat -Werror=format-security -fstack-clash-protection -Wp,-D_GLIBCXX_ASSERTIONS -g -ffile-prefix-map=/build/binaryen/src=/usr/src/debug/binaryen -DBUILD_LLVM_DWARF -Wall -Werror -Wextra -Wno-unused-parameter -Wno-dangling-pointer -fno-omit-frame-pointer -fno-rtti -Wno-implicit-int-float-conversion -Wno-unknown-warning-option -Wswitch -Wimplicit-fallthrough -Wnon-virtual-dtor -fPIC -fdiagnostics-color=always -O3 -DNDEBUG -UNDEBUG -std=c++17 -MD -MT src/passes/CMakeFiles/passes.dir/Precompute.cpp.o -MF src/passes/CMakeFiles/passes.dir/Precompute.cpp.o.d -o src/passes/CMakeFiles/passes.dir/Precompute.cpp.o -c /build/binaryen/src/binaryen-version_117/src/passes/Precompute.cpp In file included from /build/binaryen/src/binaryen-version_117/src/wasm-traversal.h:30, from /build/binaryen/src/binaryen-version_117/src/pass.h:24, from /build/binaryen/src/binaryen-version_117/src/ir/intrinsics.h:20, from /build/binaryen/src/binaryen-version_117/src/ir/effects.h:20, from /build/binaryen/src/binaryen-version_117/src/passes/Precompute.cpp:30: In copy constructor ‘wasm::SmallVector<wasm::Expression, 10>::SmallVector(const wasm::SmallVector<wasm::Expression, 10>&)’, inlined from ‘constexpr std::pair<_T1, _T2>::pair(const _T1&, const _T2&) [with _U1 = wasm::Select* const; _U2 = wasm::SmallVector<wasm::Expression, 10>; typename std::enable_if<(std::_PCC<true, _T1, _T2>::_ConstructiblePair<_U1, _U2>() && std::_PCC<true, _T1, _T2>::_ImplicitlyConvertiblePair<_U1, _U2>()), bool>::type <anonymous> = true; _T1 = wasm::Select const; _T2 = wasm::SmallVector<wasm::Expression, 10>]’ at /usr/include/c++/13.2.1/bits/stl_pair.h:559:21, inlined from ‘T& wasm::InsertOrderedMap<Key, T>::operator[](const Key&) [with Key = wasm::Select; T = wasm::SmallVector<wasm::Expression, 10>]’ at /build/binaryen/src/binaryen-version_117/src/support/insert_ordered.h:112:29: /build/binaryen/src/binaryen-version_117/src/support/small_vector.h:42:38: error: ‘<unnamed>.wasm::SmallVector<wasm::Expression, 10>::fixed’ is used uninitialized [-Werror=uninitialized] 42 \| template<typename T, size_t N> class SmallVector { \| ^~~~~~~~~~~ In file included from /build/binaryen/src/binaryen-version_117/src/passes/Precompute.cpp:38: /build/binaryen/src/binaryen-version_117/src/support/insert_ordered.h: In function ‘T& wasm::InsertOrderedMap<Key, T>::operator[](const Key&) [with Key = wasm::Select; T = wasm::SmallVector<wasm::Expression, 10>]’: /build/binaryen/src/binaryen-version_117/src/support/insert_ordered.h:112:29: note: ‘<anonymous>’ declared here 112 \| std::pair<const Key, T> kv = {k, {}}; \| ^~
*	[NFC] Use ifdef-else in threads.cpp (#6355)	Alon Zakai	2024-02-27	1	-2/+2
\|
*	[Emscripten port] Fix core count logic for Emscripten+pthreads (#6350)	Alon Zakai	2024-02-26	1	-3/+5
\| \| \| \|	Before this all Emscripten builds would use 1 core, but it is important to allow pthreads builds there to use more.
*	Fix build error on aarch64 [NFC] (#6330)	Darren Worrall	2024-02-21	1	-0/+14
\|
*	Improve JSON string encoding (#6328)	Thomas Lively	2024-02-21	1	-69/+103
\| \| \| \| \| \| \| \|	Catch and report all kinds of WTF-8 encoding errors in the source strings, including invalid leading bytes, invalid trailing bytes, unexpected ends of strings, and invalid surrogate sequences. Insert replacement characters into the output as necessary. Add a TODO about minimizing size by escaping only those code points mandated to be escaped by the JSON spec. Generally improve readability of the code.
*	StringLowering: Escape the JSON in the custom section (#6316)	Alon Zakai	2024-02-20	3	-4/+92
\| \| \| \|	Also add an end-to-end test using node to verify we can parse the escaped content properly using TextDecoder+JSON.parse.
*	[NFC] Move code to string.cpp (#6282)	Thomas Lively	2024-02-06	2	-84/+92
\| \| \| \|	Now that we have a .cpp file, none of the code that was in string.h needs to be in a header any more.
*	Properly stringify names in tests (#6279)	Thomas Lively	2024-02-06	5	-0/+123
\| \| \| \| \| \| \| \| \| \| \| \| \|	Update identifiers used in tests to use a format supported by the new text parser, i.e. either the standard format with its limited set of allowed characters or the non-standard `$"..."` format. Notably, any name containing square or curly braces now uses the string format. Input automatically updated with this script: https://gist.github.com/tlively/4e22311736661849e641d02e521a0748 The printer is updated to properly escape names in more places as well. The logic for escaping names is moved to a common location so that the type printing logic in wasm-type.cpp can use it as well.
*	JSON: Add simple printing and creation (#6265)	Alon Zakai	2024-02-01	3	-0/+46
\|
*	Memory flattening: Check for overflow (#6233)	Alon Zakai	2024-01-24	1	-0/+43
\| \| \| \| \|	Fixes a fuzz testcase for wasm-ctor-eval. Add the beginnings of a polyfill for stdckdint.h to help that.
*	[Parser] Parse tables and element segments (#6147)	Thomas Lively	2023-12-06	1	-0/+3
\| \| \| \| \| \| \|	These module fields are especially complex to parse because they contain both nontrivial types and instructions, so their parsing logic needs to be spread out across the ParseDecls, ParseModuleTypes, and ParseDefs phases of parsing. This applies to in-line elements in table definitions as well, which means we need to be able to match a table to its in-line element segment across multiple phases.
*	Support one-line-one-function file format for asyncify lists (#6051)	Alexander Guryanov	2023-10-30	1	-3/+36
\| \| \| \| \| \| \|	If there are newlines in the list, then we split using them in a simple manner (that does not take into account nesting of any other delimiters). Fixes #6047 Fixes #5271
*	[Outlining] Filter Local Set (#6018)	Ashley Nelson	2023-10-18	1	-0/+14
\| \| \| \| \|	Adds a general purpose walker named FilterStringifyWalker, intended to walk control flow and take note of whether any of the expressions satisfy the condition. Also includes an << overload for SuffixTree::RepeatedSubstring to make debugging easier.
*	Encode command line to UTF8 on Windows (#5671)	Derek Schuff	2023-09-14	4	-9/+83
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This PR changes how file paths and the command line are handled. On startup on Windows, we process the wstring version of the command line (including the file paths) and re-encode it to UTF8 before handing it off to the rest of the command line handling logic. This means that all paths are stored in UTF8-encoded std::strings as they go through the program, right up until they are used to open files. At that time, they are converted to the appropriate native format with the new to_path function before passing to the stdlib open functions. This has the advantage that all of the non-file-opening code can use a single type to hold paths (which is good since std::filesystem::path has proved problematic in some cases), but has the disadvantage that someone could add new code that forgets to convert to_path before opening. That's somewhat mitigated by the fact that most of the code uses the ModuleIOBase classes for opening files. Fixes #4995
*	Avoid off_t in small_vector.h (#5936)	Alon Zakai	2023-09-13	1	-1/+3
\| \| \| \|	Fixes #5928 , on FreeBSD off_t is not defined in the headers we include.
*	[NFC] Factor `Result` and `MaybeResult` into a utility header (#5878)	Thomas Lively	2023-08-14	1	-0/+87
\| \| \|	Allow them to be used for more than just the new text parser.
*	[Outlining] Integration test for suffix_tree and stringify (#5839)	Ashley Nelson	2023-08-03	1	-0/+11
\| \| \|	Adds an integration test that identifies the substrings of a stringified wasm module using the suffix_tree.
*	SmallVector iteration improvements (#5825)	Alon Zakai	2023-07-26	1	-1/+26
\|
*	[Outlining] LLVM Suffix Tree (#5821)	Ashley Nelson	2023-07-18	5	-1/+754
\| \| \| \| \|	This PR adds LLVM's suffix tree data structure to Binaryen. This suffix tree is implemented using Ukkonen's algorithm for linear-time suffix tree construction, and is intended for fast substring queries. Note: All of the .h and .cpp files included are from LLVM. These files were copied directly instead of imported into our existing LLVM integration (in third_party/) to avoid bumping the commit hash and avoid the potential for complications with upstream changes.
*	TypeSSA: Handle collisions by adding a hash to ensure a fresh rec group (#5724)	Alon Zakai	2023-05-19	1	-1/+4
\| \| \|	Fixes #5720
*	[NFC] Remove our bespoke `make_unique` implementation (#5613)	Thomas Lively	2023-03-31	2	-8/+3
\| \| \| \|	This code predates our adoption of C++14 and can now be removed in favor of `std::make_unique`, which should be more efficient.
*	Fix ambiguous operators under C++20 (#5567)	Thomas Lively	2023-03-10	1	-2/+4
\| \| \| \| \| \| \| \| \| \| \| \|	When resolving `operator!=`, C++20 also considers `operator==` implementations when the types on `operator!=` do not match exactly. This caused the modified code to have no most-specific overload to choose, resulting in an error. This is actually a bug in the language that is being fixed, but there exist compilers without the fix applied. Work around the problem by updating the types in the declaration of `operator==` and `operator!=` to be more exact. This is a copy of #5029 with formatting fixes.
*	Add Valmari-Lehtinen DFA minimization as a standalone utility (#5430)	Thomas Lively	2023-01-17	3	-0/+466
\| \| \| \| \| \| \| \|	We used to have this algorithm in wasm-type.cpp, where we used it to implement equirecursive type canonicalization, but we removed it when we removed equirecursive typing. Bring the algorithm back as a standalone utility for future use in optimization passes. In particular, it will be useful in TypeMerging for identifying the greatest fixed point of mergeable types rather than the smallest fixed point.
*	Make wasm::IString::operator bool() explicit (#5371)	higher-performance	2022-12-21	1	-1/+1
\| \| \|	Fixes #5370
*	Do not optimize public types (#5347)	Thomas Lively	2022-12-16	1	-4/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Do not optimize or modify public heap types in any way. Public heap types include the types of imported or exported functions, tables, globals, etc. This is important to maintain the public interface of a module and ensure it can still link interact as intended with the outside world. Also add validation error if we find any nontrivial public types that are not the types of imported or exported functions. This error is meant to help the user ensure that type optimizations are not silently inhibited. In the future, we may want to add options to silence this error or downgrade it to a warning. This commit only updates the type updating machinery to avoid updating public types. It does not update any optimization passes accordingly. Since we avoid modifying public signature types already, this is not expected to break anything, but in the future once we have function subtyping or if we make the error optional, we may have to update some of our optimization passes.
*	Use C++17's [[maybe_unused]]. NFC (#5309)	Sam Clegg	2022-12-02	2	-4/+3
\|
*	[NFC] Avoid scanning SmallSets twice during insert (#5221)	Alon Zakai	2022-11-30	1	-15/+39
\| \| \|	As suggested in #5218
*	Avoid calling back() on an empty string in setBinaryenBinDir (#5280)	Piotr Sarna	2022-11-18	1	-1/+1
\| \| \| \| \| \| \| \| \|	std::string::back() is only well defined for non-empty strings. Without the change, wasm-reduce fails if it is called from $PATH, because then, the parent directory is an empty string. A workaround is to explicitly set the binaryen path with -b, and it is still necessary after this fix, but at least the program ends with a comprehensible error message instead of a generic assertion failure from the standard library.
*	Fix warnings from -Wheader-hygiene and -Wimplicit-const-int-float-conversion ↵	Martin Kustermann	2022-11-17	1	-2/+2
\| \| \| \| \| \| \| \| \|	(#5273) When `-Wheader-hygiene` is enabled, C compiler will warn when using namespace directive in global context in header file. When `-Wimplicit-const-int-float-conversion` is enabled C compiler will warn on implicit integer to double conversions that change values.
*	Switch from `typedef` to `using` in C++ code. NFC (#5258)	Sam Clegg	2022-11-15	3	-8/+8
\| \| \| \|	This is more modern and (IMHO) easier to read than that old C typedef syntax.
*	Fix SmallSet ordering (#5218)	Alon Zakai	2022-11-04	1	-31/+92
\| \| \|	We did not preserve the ordering of the fixed-size storage there.
*	[NFC] Add a generic hash implementation for tuples (#5162)	Thomas Lively	2022-10-19	1	-0/+18
\| \| \| \| \|	We already provided a specialization of `std::hash` for arbitrary pairs, so add one for `std::tuple` as well. Use the new specialization where we were previously using nested pairs just to be able to use the pair specialization.
*	Make `Name` a pointer, length pair (#5122)	Thomas Lively	2022-10-11	5	-18/+221
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	With the goal of supporting null characters (i.e. zero bytes) in strings. Rewrite the underlying interned `IString` to store a `std::string_view` rather than a `const char`, reduce the number of map lookups necessary to intern a string, and present a more immutable interface. Most importantly, replace the `c_str()` method that returned a `const char` with a `toString()` method that returns a `std::string`. This new method can correctly handle strings containing null characters. A `const char` can still be had by calling `data()` on the `std::string_view`, although this usage should be discouraged. This change is NFC in spirit, although not in practice. It does not intend to support any particular new functionality, but it is probably now possible to use strings containing null characters in at least some cases. At least one parser bug is also incidentally fixed. Follow-on PRs will explicitly support and test strings containing nulls for particular use cases. The C API still uses `const char` to represent strings. As strings containing nulls become better supported by the rest of Binaryen, this will no longer be sufficient. Updating the C and JS APIs to use pointer, length pairs is left as future work.
*	[NFC] Remove `cashew::` namespace from IString (#5126)	Thomas Lively	2022-10-10	1	-1/+1
\| \| \| \|	As an NFC preliminary change that will minimize the diff in #5122, which moves IString to the wasm namespace.
*	Change `exit()` to `Fatal()`; make it possible to throw on `Fatal()`. (#5087)	Brian Anderson	2022-10-01	2	-12/+27
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch makes binaryen easier to call from other applications by making more errors recoverable instead of early-exiting. The main thing it does is change three calls to exit on I/O errors into calls to Fatal(), which is an existing custom abstraction for handling unrecoverable errors. Currently Fatal's destructor calls _Exit(1). My intent is to make it possible for Fatal to not exit, but to throw, allowing an embedding application to catch the exception. Because the previous early exits were exiting with error code EXIT_FAILURE, I also changed Fatal to exit with EXIT_FAILURE. The test suite continues to pass so I assume this is ok. Next I changed Fatal to buffer its error message until the destructor instead of immediately printing it to stderr. This is for ease of patching Fatal to throw instead. Finally, I also included the patch I need to make Fatal throw when THROW_ON_FATAL is defined at compile time. I can carry this patch out of tree, but it is a small patch, so perhaps you will be willing to take it. I am happy to remove it. Fixes #4938
*	Fix bugs with copying expressions (#5099)	Thomas Lively	2022-09-30	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	It does not make sense to construct an `Expression` directly because all expressions must be specific expressions. However, we previously allowed constructing Expressions, and in particular we allowed them to be copy constructed. Unrelatedly, `Fatal::operator<<` took its argument by value. Together, these two facts produced UB when printing Expressions in fatal error messages because a new Expression would be copy constructed with the original expression ID but without any of the actual data from the original specific expression. For example, when trying to print a Block, the printing code would try to look at the expression list, but the expression list would be junk stack data because the copied Expression does not contain an expression list. Fix the problem by making Expression's constructors visible only to its subclasses and making `Fatal::operator<<` take its argument by forwarding reference instead of by value.
*	[GUFA] Improve hashing (#5091)	Alon Zakai	2022-09-28	1	-1/+1
\| \| \| \|	Avoid manually doing bitshifts etc. - leave combining to the core hash logic, which can do a better job.
*	Fix SmallVector's resize() method (#4979)	Alon Zakai	2022-08-29	1	-0/+2
\| \| \| \|	A resize from a large amount to a small amount would sometimes not clear the flexible storage, if we used it before but not after.