summaryrefslogtreecommitdiff
path: root/src/support
Commit message (Collapse)AuthorAgeFilesLines
* Fuzzer: Better fuzzing of globals (#6611)Alon Zakai2024-05-211-1/+1
| | | | | | | | | | | | | With this PR we generate global.gets in globals, which we did not do before. We do that by replacing makeConst (the only thing we did before, for the contents of globals) with makeTrivial, and add code to makeTrivial to sometimes make a global.get. When no suitable global exists, makeGlobalGet will emit a constant, so there is no danger in trying. Also raise the number of globals a little. Also explicitly note the current limitation of requiring all tuple globals to contain tuple.make and nothing else, including not global.get, and avoid adding such invalid global.gets in tuple globals in the fuzzer.
* Do not add an extra null character when reading files (#6538)Thomas Lively2024-04-241-3/+3
| | | | | | | | The new wat parser currently considers itself to be at the end of the file whenever it cannot lex another token. This is not quite right, but fixing it causes parser errors because of the extra null character we were appending to files when we read them. This null character is not useful since we can already read files as `std::string`, which always has an implicit null character, so remove it. Clean up some users of `read_file` while we're at it.
* [Strings] Add a string lowering pass using magic imports (#6497)Thomas Lively2024-04-152-12/+34
| | | | | | | | | | | | | | | | | The latest idea for efficient string constants is to encode the constants in the import names of their globals and implement fast paths in the engines for materializing those constants at instantiation time without needing to parse anything in JS. This strategy only works for valid strings (i.e. strings without unpaired surrogates) because only valid strings can be used as import names in the WebAssembly syntax. Add a new configuration of the StringLowering pass that encodes valid string contents in import names, falling back to the JSON custom section approach for invalid strings. To test this chang, update the printer to escape import and export names properly and update the legacy parser to parse escapes in import and export names properly. As a drive-by, remove the incorrect check in the parser that the import module and base names are non-empty.
* [Strings] Represent string values as WTF-16 internally (#6418)Thomas Lively2024-03-223-55/+202
| | | | | | | | | | | | | | | | WTF-16, i.e. arbitrary sequences of 16-bit values, is the encoding of Java and JavaScript strings, and using the same encoding makes the interpretation of string operations trivial, even when accounting for non-ascii characters. Specifically, use little-endian WTF-16. Re-encode string constants from WTF-8 to WTF-16 in the parsers, then back to WTF-8 in the writers. Update the constructor for string `Literal`s to interpret the string as WTF-16 and store a sequence of WTF-16 code units, i.e. 16-bit integers. Update `Builder::makeConstantExpression` accordingly to convert from the new `Literal` string representation back to a WTF-16 string. Update the interpreter to remove the logic for detecting non-ascii characters and bailing out. The naive implementations of all the string operations are correct now that our string encoding matches the JS string encoding.
* [NFC] Fix build error on RISC-V 64 (#6410)moui02024-03-181-1/+11
| | | | | | | | | | | | | | | | | | | | | | Similar issue as: #6330 FAILED: src/passes/CMakeFiles/passes.dir/Precompute.cpp.o /usr/bin/c++ -I/build/binaryen/src/binaryen-version_117/src -I/build/binaryen/src/binaryen-version_117/third_party/llvm-project/include -I/build/binaryen/src/binaryen-version_117/build -march=rv64gc -mabi=lp64d -O2 -pipe -fno-plt -fexceptions -Wp,-D_FORTIFY_SOURCE=2 -Wformat -Werror=format-security -fstack-clash-protection -Wp,-D_GLIBCXX_ASSERTIONS -g -ffile-prefix-map=/build/binaryen/src=/usr/src/debug/binaryen -DBUILD_LLVM_DWARF -Wall -Werror -Wextra -Wno-unused-parameter -Wno-dangling-pointer -fno-omit-frame-pointer -fno-rtti -Wno-implicit-int-float-conversion -Wno-unknown-warning-option -Wswitch -Wimplicit-fallthrough -Wnon-virtual-dtor -fPIC -fdiagnostics-color=always -O3 -DNDEBUG -UNDEBUG -std=c++17 -MD -MT src/passes/CMakeFiles/passes.dir/Precompute.cpp.o -MF src/passes/CMakeFiles/passes.dir/Precompute.cpp.o.d -o src/passes/CMakeFiles/passes.dir/Precompute.cpp.o -c /build/binaryen/src/binaryen-version_117/src/passes/Precompute.cpp In file included from /build/binaryen/src/binaryen-version_117/src/wasm-traversal.h:30, from /build/binaryen/src/binaryen-version_117/src/pass.h:24, from /build/binaryen/src/binaryen-version_117/src/ir/intrinsics.h:20, from /build/binaryen/src/binaryen-version_117/src/ir/effects.h:20, from /build/binaryen/src/binaryen-version_117/src/passes/Precompute.cpp:30: In copy constructor ‘wasm::SmallVector<wasm::Expression*, 10>::SmallVector(const wasm::SmallVector<wasm::Expression*, 10>&)’, inlined from ‘constexpr std::pair<_T1, _T2>::pair(const _T1&, const _T2&) [with _U1 = wasm::Select* const; _U2 = wasm::SmallVector<wasm::Expression*, 10>; typename std::enable_if<(std::_PCC<true, _T1, _T2>::_ConstructiblePair<_U1, _U2>() && std::_PCC<true, _T1, _T2>::_ImplicitlyConvertiblePair<_U1, _U2>()), bool>::type <anonymous> = true; _T1 = wasm::Select* const; _T2 = wasm::SmallVector<wasm::Expression*, 10>]’ at /usr/include/c++/13.2.1/bits/stl_pair.h:559:21, inlined from ‘T& wasm::InsertOrderedMap<Key, T>::operator[](const Key&) [with Key = wasm::Select*; T = wasm::SmallVector<wasm::Expression*, 10>]’ at /build/binaryen/src/binaryen-version_117/src/support/insert_ordered.h:112:29: /build/binaryen/src/binaryen-version_117/src/support/small_vector.h:42:38: error: ‘<unnamed>.wasm::SmallVector<wasm::Expression*, 10>::fixed’ is used uninitialized [-Werror=uninitialized] 42 | template<typename T, size_t N> class SmallVector { | ^~~~~~~~~~~ In file included from /build/binaryen/src/binaryen-version_117/src/passes/Precompute.cpp:38: /build/binaryen/src/binaryen-version_117/src/support/insert_ordered.h: In function ‘T& wasm::InsertOrderedMap<Key, T>::operator[](const Key&) [with Key = wasm::Select*; T = wasm::SmallVector<wasm::Expression*, 10>]’: /build/binaryen/src/binaryen-version_117/src/support/insert_ordered.h:112:29: note: ‘<anonymous>’ declared here 112 | std::pair<const Key, T> kv = {k, {}}; | ^~
* [NFC] Use ifdef-else in threads.cpp (#6355)Alon Zakai2024-02-271-2/+2
|
* [Emscripten port] Fix core count logic for Emscripten+pthreads (#6350)Alon Zakai2024-02-261-3/+5
| | | | Before this all Emscripten builds would use 1 core, but it is important to allow pthreads builds there to use more.
* Fix build error on aarch64 [NFC] (#6330)Darren Worrall2024-02-211-0/+14
|
* Improve JSON string encoding (#6328)Thomas Lively2024-02-211-69/+103
| | | | | | | | Catch and report all kinds of WTF-8 encoding errors in the source strings, including invalid leading bytes, invalid trailing bytes, unexpected ends of strings, and invalid surrogate sequences. Insert replacement characters into the output as necessary. Add a TODO about minimizing size by escaping only those code points mandated to be escaped by the JSON spec. Generally improve readability of the code.
* StringLowering: Escape the JSON in the custom section (#6316)Alon Zakai2024-02-203-4/+92
| | | | Also add an end-to-end test using node to verify we can parse the escaped content properly using TextDecoder+JSON.parse.
* [NFC] Move code to string.cpp (#6282)Thomas Lively2024-02-062-84/+92
| | | | Now that we have a .cpp file, none of the code that was in string.h needs to be in a header any more.
* Properly stringify names in tests (#6279)Thomas Lively2024-02-065-0/+123
| | | | | | | | | | | | | Update identifiers used in tests to use a format supported by the new text parser, i.e. either the standard format with its limited set of allowed characters or the non-standard `$"..."` format. Notably, any name containing square or curly braces now uses the string format. Input automatically updated with this script: https://gist.github.com/tlively/4e22311736661849e641d02e521a0748 The printer is updated to properly escape names in more places as well. The logic for escaping names is moved to a common location so that the type printing logic in wasm-type.cpp can use it as well.
* JSON: Add simple printing and creation (#6265)Alon Zakai2024-02-013-0/+46
|
* Memory flattening: Check for overflow (#6233)Alon Zakai2024-01-241-0/+43
| | | | | Fixes a fuzz testcase for wasm-ctor-eval. Add the beginnings of a polyfill for stdckdint.h to help that.
* [Parser] Parse tables and element segments (#6147)Thomas Lively2023-12-061-0/+3
| | | | | | | These module fields are especially complex to parse because they contain both nontrivial types and instructions, so their parsing logic needs to be spread out across the ParseDecls, ParseModuleTypes, and ParseDefs phases of parsing. This applies to in-line elements in table definitions as well, which means we need to be able to match a table to its in-line element segment across multiple phases.
* Support one-line-one-function file format for asyncify lists (#6051)Alexander Guryanov2023-10-301-3/+36
| | | | | | | If there are newlines in the list, then we split using them in a simple manner (that does not take into account nesting of any other delimiters). Fixes #6047 Fixes #5271
* [Outlining] Filter Local Set (#6018)Ashley Nelson2023-10-181-0/+14
| | | | | Adds a general purpose walker named FilterStringifyWalker, intended to walk control flow and take note of whether any of the expressions satisfy the condition. Also includes an << overload for SuffixTree::RepeatedSubstring to make debugging easier.
* Encode command line to UTF8 on Windows (#5671)Derek Schuff2023-09-144-9/+83
| | | | | | | | | | | | | | | | This PR changes how file paths and the command line are handled. On startup on Windows, we process the wstring version of the command line (including the file paths) and re-encode it to UTF8 before handing it off to the rest of the command line handling logic. This means that all paths are stored in UTF8-encoded std::strings as they go through the program, right up until they are used to open files. At that time, they are converted to the appropriate native format with the new to_path function before passing to the stdlib open functions. This has the advantage that all of the non-file-opening code can use a single type to hold paths (which is good since std::filesystem::path has proved problematic in some cases), but has the disadvantage that someone could add new code that forgets to convert to_path before opening. That's somewhat mitigated by the fact that most of the code uses the ModuleIOBase classes for opening files. Fixes #4995
* Avoid off_t in small_vector.h (#5936)Alon Zakai2023-09-131-1/+3
| | | | Fixes #5928 , on FreeBSD off_t is not defined in the headers we include.
* [NFC] Factor `Result` and `MaybeResult` into a utility header (#5878)Thomas Lively2023-08-141-0/+87
| | | Allow them to be used for more than just the new text parser.
* [Outlining] Integration test for suffix_tree and stringify (#5839)Ashley Nelson2023-08-031-0/+11
| | | Adds an integration test that identifies the substrings of a stringified wasm module using the suffix_tree.
* SmallVector iteration improvements (#5825)Alon Zakai2023-07-261-1/+26
|
* [Outlining] LLVM Suffix Tree (#5821)Ashley Nelson2023-07-185-1/+754
| | | | | This PR adds LLVM's suffix tree data structure to Binaryen. This suffix tree is implemented using Ukkonen's algorithm for linear-time suffix tree construction, and is intended for fast substring queries. Note: All of the .h and .cpp files included are from LLVM. These files were copied directly instead of imported into our existing LLVM integration (in third_party/) to avoid bumping the commit hash and avoid the potential for complications with upstream changes.
* TypeSSA: Handle collisions by adding a hash to ensure a fresh rec group (#5724)Alon Zakai2023-05-191-1/+4
| | | Fixes #5720
* [NFC] Remove our bespoke `make_unique` implementation (#5613)Thomas Lively2023-03-312-8/+3
| | | | This code predates our adoption of C++14 and can now be removed in favor of `std::make_unique`, which should be more efficient.
* Fix ambiguous operators under C++20 (#5567)Thomas Lively2023-03-101-2/+4
| | | | | | | | | | | | When resolving `operator!=`, C++20 also considers `operator==` implementations when the types on `operator!=` do not match exactly. This caused the modified code to have no most-specific overload to choose, resulting in an error. This is actually a bug in the language that is being fixed, but there exist compilers without the fix applied. Work around the problem by updating the types in the declaration of `operator==` and `operator!=` to be more exact. This is a copy of #5029 with formatting fixes.
* Add Valmari-Lehtinen DFA minimization as a standalone utility (#5430)Thomas Lively2023-01-173-0/+466
| | | | | | | | We used to have this algorithm in wasm-type.cpp, where we used it to implement equirecursive type canonicalization, but we removed it when we removed equirecursive typing. Bring the algorithm back as a standalone utility for future use in optimization passes. In particular, it will be useful in TypeMerging for identifying the greatest fixed point of mergeable types rather than the smallest fixed point.
* Make wasm::IString::operator bool() explicit (#5371)higher-performance2022-12-211-1/+1
| | | Fixes #5370
* Do not optimize public types (#5347)Thomas Lively2022-12-161-4/+5
| | | | | | | | | | | | | | | | | Do not optimize or modify public heap types in any way. Public heap types include the types of imported or exported functions, tables, globals, etc. This is important to maintain the public interface of a module and ensure it can still link interact as intended with the outside world. Also add validation error if we find any nontrivial public types that are not the types of imported or exported functions. This error is meant to help the user ensure that type optimizations are not silently inhibited. In the future, we may want to add options to silence this error or downgrade it to a warning. This commit only updates the type updating machinery to avoid updating public types. It does not update any optimization passes accordingly. Since we avoid modifying public signature types already, this is not expected to break anything, but in the future once we have function subtyping or if we make the error optional, we may have to update some of our optimization passes.
* Use C++17's [[maybe_unused]]. NFC (#5309)Sam Clegg2022-12-022-4/+3
|
* [NFC] Avoid scanning SmallSets twice during insert (#5221)Alon Zakai2022-11-301-15/+39
| | | As suggested in #5218
* Avoid calling back() on an empty string in setBinaryenBinDir (#5280)Piotr Sarna2022-11-181-1/+1
| | | | | | | | | std::string::back() is only well defined for non-empty strings. Without the change, wasm-reduce fails if it is called from $PATH, because then, the parent directory is an empty string. A workaround is to explicitly set the binaryen path with -b, and it is still necessary after this fix, but at least the program ends with a comprehensible error message instead of a generic assertion failure from the standard library.
* Fix warnings from -Wheader-hygiene and -Wimplicit-const-int-float-conversion ↵Martin Kustermann2022-11-171-2/+2
| | | | | | | | | (#5273) When `-Wheader-hygiene` is enabled, C compiler will warn when using namespace directive in global context in header file. When `-Wimplicit-const-int-float-conversion` is enabled C compiler will warn on implicit integer to double conversions that change values.
* Switch from `typedef` to `using` in C++ code. NFC (#5258)Sam Clegg2022-11-153-8/+8
| | | | This is more modern and (IMHO) easier to read than that old C typedef syntax.
* Fix SmallSet ordering (#5218)Alon Zakai2022-11-041-31/+92
| | | We did not preserve the ordering of the fixed-size storage there.
* [NFC] Add a generic hash implementation for tuples (#5162)Thomas Lively2022-10-191-0/+18
| | | | | We already provided a specialization of `std::hash` for arbitrary pairs, so add one for `std::tuple` as well. Use the new specialization where we were previously using nested pairs just to be able to use the pair specialization.
* Make `Name` a pointer, length pair (#5122)Thomas Lively2022-10-115-18/+221
| | | | | | | | | | | | | | | | | | | | | | | With the goal of supporting null characters (i.e. zero bytes) in strings. Rewrite the underlying interned `IString` to store a `std::string_view` rather than a `const char*`, reduce the number of map lookups necessary to intern a string, and present a more immutable interface. Most importantly, replace the `c_str()` method that returned a `const char*` with a `toString()` method that returns a `std::string`. This new method can correctly handle strings containing null characters. A `const char*` can still be had by calling `data()` on the `std::string_view`, although this usage should be discouraged. This change is NFC in spirit, although not in practice. It does not intend to support any particular new functionality, but it is probably now possible to use strings containing null characters in at least some cases. At least one parser bug is also incidentally fixed. Follow-on PRs will explicitly support and test strings containing nulls for particular use cases. The C API still uses `const char*` to represent strings. As strings containing nulls become better supported by the rest of Binaryen, this will no longer be sufficient. Updating the C and JS APIs to use pointer, length pairs is left as future work.
* [NFC] Remove `cashew::` namespace from IString (#5126)Thomas Lively2022-10-101-1/+1
| | | | As an NFC preliminary change that will minimize the diff in #5122, which moves IString to the wasm namespace.
* Change `exit()` to `Fatal()`; make it possible to throw on `Fatal()`. (#5087)Brian Anderson2022-10-012-12/+27
| | | | | | | | | | | | | | | | This patch makes binaryen easier to call from other applications by making more errors recoverable instead of early-exiting. The main thing it does is change three calls to exit on I/O errors into calls to Fatal(), which is an existing custom abstraction for handling unrecoverable errors. Currently Fatal's destructor calls _Exit(1). My intent is to make it possible for Fatal to not exit, but to throw, allowing an embedding application to catch the exception. Because the previous early exits were exiting with error code EXIT_FAILURE, I also changed Fatal to exit with EXIT_FAILURE. The test suite continues to pass so I assume this is ok. Next I changed Fatal to buffer its error message until the destructor instead of immediately printing it to stderr. This is for ease of patching Fatal to throw instead. Finally, I also included the patch I need to make Fatal throw when THROW_ON_FATAL is defined at compile time. I can carry this patch out of tree, but it is a small patch, so perhaps you will be willing to take it. I am happy to remove it. Fixes #4938
* Fix bugs with copying expressions (#5099)Thomas Lively2022-09-301-1/+1
| | | | | | | | | | | | | | | | | It does not make sense to construct an `Expression` directly because all expressions must be specific expressions. However, we previously allowed constructing Expressions, and in particular we allowed them to be copy constructed. Unrelatedly, `Fatal::operator<<` took its argument by value. Together, these two facts produced UB when printing Expressions in fatal error messages because a new Expression would be copy constructed with the original expression ID but without any of the actual data from the original specific expression. For example, when trying to print a Block, the printing code would try to look at the expression list, but the expression list would be junk stack data because the copied Expression does not contain an expression list. Fix the problem by making Expression's constructors visible only to its subclasses and making `Fatal::operator<<` take its argument by forwarding reference instead of by value.
* [GUFA] Improve hashing (#5091)Alon Zakai2022-09-281-1/+1
| | | | Avoid manually doing bitshifts etc. - leave combining to the core hash logic, which can do a better job.
* Fix SmallVector's resize() method (#4979)Alon Zakai2022-08-291-0/+2
| | | | A resize from a large amount to a small amount would sometimes not clear the flexible storage, if we used it before but not after.
* Allow `BINARYEN_DEBUG` environment variable to be used in place of ↵Sam Clegg2022-08-041-0/+4
| | | | | | | | | `--debug`. NFC (#4874) For example I found it useful to able to do something like this: ``` $ BINARYEN_DEBUG=post-emscripten ./test/runner sometest ```
* [NFC] Make InsertOrderedMap::insert's param const (#4669)Alon Zakai2022-05-171-1/+1
| | | | | Being a const reference allows writing insert({a, b}), which will be useful in a future PR, and there is no reason to actually update the reference.
* Lift the restriction in liveness-traversal.h that supported max 65535 locals ↵juj2022-04-281-0/+76
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | in a function. (#4567) * Lift the restriction in liveness-traversal.h that supported max 65535 locals in a function. * Lint * Fix typo * Fix static * Lint * Lint * Lint * Add needed canRun function * lint * Use either a sparse or a dense matrix for tracking liveness copies, depending on the locals count. * Lint * Fix lint * Lint * Implement sparse_square_matrix class and use that as a backing. * Lint * Lint * Lint #includes * Lint * Lint includes * Remove unnecessary code * Fix canonical accesses to copies matrix * Lint * Add missing variable update * Remove canRun() function * Address review * Update expected test results * Update test name * Add asserts to sparse_square_matrix set and get functions that they are not out of bound. * Lint includes * Update test expectation * Use .clear() + .resize() to reset totalCopies vector
* SmallSet: Mark iterator parent as const (#4613)Alon Zakai2022-04-251-2/+2
| | | | | | | | | | | | | | | | | | | We already assume the parent does not change, binaryen/src/support/small_set.h Lines 202 to 208 in 94d77ef // std::set allows changes while iterating. For us here, though, it would // be nontrivial to support that given we have two iterators that we // generalize over (switching "in the middle" would not be easy or fast), // so error on that. if (usingFixed != other.usingFixed) { Fatal() << "SmallSet does not support changes while iterating"; } This also marks the parent as const to reflect that. This fixed a weird C++ compilation error I had when working on something unrelated, but seems worth landing independently.
* Fix errors when building in C++20 mode (#4528)Jakub Szewczyk2022-03-182-9/+11
| | | | | | | * use [[noreturn]] available since C++11 instead of compiler-specific attributes * replace deprecated std::is_pod with is_trivial&&is_standard_layout (also available since C++11/14) * explicitly capture this in [=] lambdas * extra const functions in FeatureSet, fix implicit cast warning by using the features field directly * Use CMAKE_CXX_STANDARD to ensure the C++ standard parameter is set on all targets, remove manual compiler flag workaround.
* Generate heap type names when printing types (#4503)Thomas Lively2022-02-071-0/+29
| | | | | | | | | | | | | | | | | | | | | | The previous printing system in the Types API would print the full recursive structure of a Type or HeapType with special markers using de Bruijn indices to avoid infinite recursion and a separate special marker for when the size exceeded an arbitrary upper limit. In practice, the types printed by that system were not human readable, so all that complexity was not useful. Replace that system with a new system that always emits a HeapType name rather than recursing into the structure of inner HeapTypes. Add methods for printing Types and HeapTypes with custom HeapType name generators. Also add a new wasm-type-printing.h header with off-the-shelf type name generators that implement simple naming schemes sufficient for tests and the type fuzzer. Note that these new printing methods and the old printing methods they augment are not used for emitting text modules. Printing types as part of expressions and modules is handled by separate code in Print.cpp and the printing API modified in this PR is mostly used for debugging. However, the new printing methods are general enough that Print.cpp should be able to use them as well, so update the format used to print types in the modified printing system to match the text format in anticipation of making that change in a follow-up PR.
* CRTP topological sort utility (#4499)Thomas Lively2022-02-031-0/+118
| | | | | | | | Refactor the `TopologicalSortStack` into a `TopologicalSort` CRTP utility that manipulates the DFS stack internally instead of exposing it to the user. The only method users have to implement to use the utility is `pushPredecessors`, which should call the provided `push` method on all the predecessors of a given item. The public interface to `TopologicalSort` is an input iterator, so it can easily be used in range-based for loops.
* Topological sorting of types in isorecursive output (#4492)Thomas Lively2022-02-021-0/+2
| | | | | | | | | | | | | | | | | | | Generally we try to order types by decreasing use count so that frequently used types get smaller indices. For the equirecursive and nominal systems, there are no contraints on the ordering of types, so we just have to sort them according to their use counts. For the isorecursive type system, however, there are a number of ordering constraints that have to be met for the type section to be valid. First, types in the same recursion group must be adjacent so they can be grouped together. Second, groups must be ordered topologically so that they only refer to types in themselves or prior groups. Update type ordering to produce a valid isorecursive output by performing a topological sort on the recursion groups. While performing the sort, prefer to visit and finish processing the most used groups first as a heuristic to improve the final ordering. Do not reorder types within groups, since doing so would change type identity and could affect the external interface of the module. Leave that reordering to an optimization pass (not yet implemented) that users can explicitly opt in to.