| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
| |
Previously they were structs and their results were accessed with
`operator*()`, but that was unnecessarily complicated and could lead to
problems with temporary lifetimes being too short. Simplify the
utilities by making them functions. This also allows the wrapper
templates to infer the proper element types automatically.
|
|
|
|
|
|
|
|
| |
Reuse the code implementing Kahn's topological sort algorithm with a new
configuration that uses a min-heap to always choose the best available
element.
Also add wrapper utilities that can find topological sorts of graphs
with arbitrary element types, not just indices.
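
A minimal sketch of the idea, assuming a graph given as adjacency lists over vertex indices, where a smaller index counts as "better". The function name `minTopologicalSort` is hypothetical and is not the utility added by this commit:

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <vector>

// Hypothetical helper: returns the lexicographically smallest topological
// order of a DAG whose successors are listed per vertex in `succs`.
std::vector<size_t>
minTopologicalSort(const std::vector<std::vector<size_t>>& succs) {
  // Count incoming edges for every vertex.
  std::vector<size_t> indegree(succs.size(), 0);
  for (const auto& out : succs) {
    for (size_t s : out) {
      ++indegree[s];
    }
  }
  // Min-heap of vertices whose predecessors have all been emitted.
  std::priority_queue<size_t, std::vector<size_t>, std::greater<size_t>> ready;
  for (size_t i = 0; i < succs.size(); ++i) {
    if (indegree[i] == 0) {
      ready.push(i);
    }
  }
  std::vector<size_t> order;
  while (!ready.empty()) {
    // Always take the smallest available vertex.
    size_t v = ready.top();
    ready.pop();
    order.push_back(v);
    for (size_t s : succs[v]) {
      if (--indegree[s] == 0) {
        ready.push(s);
      }
    }
  }
  // A cycle would leave some vertices unemitted.
  assert(order.size() == succs.size() && "graph must be acyclic");
  return order;
}
```

Replacing the min-heap with a plain stack or queue gives ordinary Kahn's algorithm; only the choice of the next ready vertex differs.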
|
|
|
|
|
|
| |
Make `TopologicalOrders` its own iterator rather than having a separate
iterator class that wraps a pointer to `TopologicalOrders`. This
simplifies usage in cases where an iterator needs to be persistently
stored. Notably, all of the tests continue working as they are.
|
|
|
|
|
|
|
|
|
| |
Use an extension of Kahn's algorithm for finding topological orders that
iteratively makes every possible choice at every step to find all the
topological orders. The order being constructed and the set of possible
choices are managed in-place in the same buffer, so the algorithm takes
linear time and space plus amortized constant time per generated order.
This will be used in an upcoming type optimization.
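
For illustration only, the enumeration can be sketched with plain recursive backtracking. This is a simpler variant than the in-place, iterative scheme described above: it rescans all vertices at every step, so it does not have the same complexity. The names `Graph`, `extendOrders`, and `allTopologicalOrders` are hypothetical:

```cpp
#include <cstddef>
#include <vector>

using Graph = std::vector<std::vector<size_t>>;

// Hypothetical helper: tries every vertex that currently has no unemitted
// predecessors as the next element, then undoes the choice.
static void extendOrders(const Graph& succs,
                         std::vector<size_t>& indegree,
                         std::vector<bool>& used,
                         std::vector<size_t>& current,
                         std::vector<std::vector<size_t>>& results) {
  if (current.size() == succs.size()) {
    results.push_back(current);
    return;
  }
  for (size_t v = 0; v < succs.size(); ++v) {
    if (used[v] || indegree[v] != 0) {
      continue;
    }
    // Choose v.
    used[v] = true;
    current.push_back(v);
    for (size_t s : succs[v]) {
      --indegree[s];
    }
    extendOrders(succs, indegree, used, current, results);
    // Undo the choice so the next candidate sees the original state.
    for (size_t s : succs[v]) {
      ++indegree[s];
    }
    current.pop_back();
    used[v] = false;
  }
}

std::vector<std::vector<size_t>> allTopologicalOrders(const Graph& succs) {
  std::vector<size_t> indegree(succs.size(), 0);
  for (const auto& out : succs) {
    for (size_t s : out) {
      ++indegree[s];
    }
  }
  std::vector<bool> used(succs.size(), false);
  std::vector<size_t> current;
  std::vector<std::vector<size_t>> results;
  extendOrders(succs, indegree, used, current, results);
  return results;
}
```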
|
|
|
|
| |
This will be used in an upcoming type optimization pass and may be
generally useful.
|
|
|
|
|
|
|
|
|
| |
Implement a non-recursive version of Tarjan's Strongly Connected
Component algorithm that consumes and produces iterators for maximum
flexibility.
This will be used in an optimization that transforms the heap type graph
to use minimal recursion groups, which correspond to the strongly
connected components of the type graph.
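
A rough sketch of a non-recursive Tarjan's algorithm, assuming a graph stored as adjacency lists over indices rather than the iterator-based interface this commit describes. The function name and the `Frame` struct are hypothetical:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper: returns the strongly connected components of the
// graph, each as a list of vertex indices. An explicit work stack replaces
// the recursion of the textbook formulation.
std::vector<std::vector<size_t>>
stronglyConnectedComponents(const std::vector<std::vector<size_t>>& succs) {
  const size_t n = succs.size();
  const size_t unvisited = n; // sentinel; real indices are always < n
  std::vector<size_t> index(n, unvisited), lowlink(n, 0);
  std::vector<bool> onStack(n, false);
  std::vector<size_t> stack;
  struct Frame {
    size_t node;
    size_t nextChild;
  };
  std::vector<Frame> work;
  std::vector<std::vector<size_t>> sccs;
  size_t counter = 0;

  for (size_t root = 0; root < n; ++root) {
    if (index[root] != unvisited) {
      continue;
    }
    work.push_back({root, 0});
    while (!work.empty()) {
      size_t v = work.back().node;
      if (index[v] == unvisited) {
        // First visit: assign an index and push onto the component stack.
        index[v] = lowlink[v] = counter++;
        stack.push_back(v);
        onStack[v] = true;
      }
      if (work.back().nextChild < succs[v].size()) {
        size_t w = succs[v][work.back().nextChild++];
        if (index[w] == unvisited) {
          work.push_back({w, 0}); // "recurse" into w
        } else if (onStack[w]) {
          lowlink[v] = std::min(lowlink[v], index[w]);
        }
        continue;
      }
      // All children are done. If v is a root, pop its component.
      if (lowlink[v] == index[v]) {
        sccs.emplace_back();
        size_t w;
        do {
          w = stack.back();
          stack.pop_back();
          onStack[w] = false;
          sccs.back().push_back(w);
        } while (w != v);
      }
      work.pop_back();
      if (!work.empty()) {
        // Propagate the lowlink to the parent, as the recursive return would.
        size_t parent = work.back().node;
        lowlink[parent] = std::min(lowlink[parent], lowlink[v]);
      }
    }
  }
  return sccs;
}
```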
|
|
|
| |
This will hopefully fix the build on the coverage builder.
|
|
|
|
|
|
| |
Add an `isUTF8` utility and use it in both the text and binary parsers.
Add missing checks for overlong encodings and overlarge code points in
our WTF8 reader, which the new utility uses. Re-enable the spec tests
that test UTF-8 validation.
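
To illustrate the kinds of checks involved, here is a minimal standalone validator. It is a sketch written from the UTF-8 rules, not the code added by this commit, and the name `isValidUTF8` is hypothetical:

```cpp
#include <cstddef>
#include <cstdint>
#include <string_view>

// Hypothetical helper: returns true if `str` is well-formed UTF-8.
bool isValidUTF8(std::string_view str) {
  size_t i = 0;
  while (i < str.size()) {
    uint8_t lead = static_cast<uint8_t>(str[i]);
    if (lead < 0x80) {
      ++i; // ASCII
      continue;
    }
    size_t trailing;
    uint32_t codePoint;
    uint32_t minValue; // smallest code point that needs this many bytes
    if ((lead & 0xE0) == 0xC0) {
      trailing = 1, codePoint = lead & 0x1F, minValue = 0x80;
    } else if ((lead & 0xF0) == 0xE0) {
      trailing = 2, codePoint = lead & 0x0F, minValue = 0x800;
    } else if ((lead & 0xF8) == 0xF0) {
      trailing = 3, codePoint = lead & 0x07, minValue = 0x10000;
    } else {
      return false; // invalid leading byte
    }
    if (str.size() - i - 1 < trailing) {
      return false; // unexpected end of string
    }
    for (size_t j = 1; j <= trailing; ++j) {
      uint8_t byte = static_cast<uint8_t>(str[i + j]);
      if ((byte & 0xC0) != 0x80) {
        return false; // invalid trailing byte
      }
      codePoint = (codePoint << 6) | (byte & 0x3F);
    }
    if (codePoint < minValue) {
      return false; // overlong encoding
    }
    if (codePoint > 0x10FFFF) {
      return false; // overlarge code point
    }
    if (codePoint >= 0xD800 && codePoint <= 0xDFFF) {
      return false; // surrogates are allowed in WTF-8 but not in UTF-8
    }
    i += trailing + 1;
  }
  return true;
}
```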
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
With this PR we generate global.gets in globals, which we did not do before.
We do that by replacing makeConst (the only thing we did before, for the
contents of globals) with makeTrivial, and add code to makeTrivial to sometimes
make a global.get. When no suitable global exists, makeGlobalGet will emit a
constant, so there is no danger in trying.
Also raise the number of globals a little.
Also explicitly note the current limitation of requiring all tuple globals to contain
tuple.make and nothing else, including not global.get, and avoid adding such
invalid global.gets in tuple globals in the fuzzer.
|
|
|
|
|
|
|
|
| |
The new wat parser currently considers itself to be at the end of the file
whenever it cannot lex another token. This is not quite right, but fixing it
causes parser errors because of the extra null character we were appending to
files when we read them. This null character is not useful since we can already
read files as `std::string`, which always has an implicit null character, so
remove it. Clean up some users of `read_file` while we're at it.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The latest idea for efficient string constants is to encode the constants in
the import names of their globals and implement fast paths in the engines for
materializing those constants at instantiation time without needing to parse
anything in JS. This strategy only works for valid strings (i.e. strings without
unpaired surrogates) because only valid strings can be used as import names in
the WebAssembly syntax.
Add a new configuration of the StringLowering pass that encodes valid string
contents in import names, falling back to the JSON custom section approach for
invalid strings.
To test this change, update the printer to escape import and export names
properly and update the legacy parser to parse escapes in import and export
names properly. As a drive-by, remove the incorrect check in the parser that the
import module and base names are non-empty.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
WTF-16, i.e. arbitrary sequences of 16-bit values, is the encoding of Java and
JavaScript strings, and using the same encoding makes the interpretation of
string operations trivial, even when accounting for non-ASCII characters.
Specifically, use little-endian WTF-16.
Re-encode string constants from WTF-8 to WTF-16 in the parsers, then back to
WTF-8 in the writers. Update the constructor for string `Literal`s to interpret
the string as WTF-16 and store a sequence of WTF-16 code units, i.e. 16-bit
integers. Update `Builder::makeConstantExpression` accordingly to convert from
the new `Literal` string representation back to a WTF-16 string.
Update the interpreter to remove the logic for detecting non-ASCII characters
and bailing out. The naive implementations of all the string operations are
correct now that our string encoding matches the JS string encoding.
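
As a small illustration of the target encoding, the sketch below appends one code point to a byte string as little-endian WTF-16 code units. The helper name `appendWTF16LE` is hypothetical and this is not the conversion routine from the commit:

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper: appends `codePoint` to `out` as little-endian WTF-16.
// Lone surrogates (0xD800..0xDFFF) are written as a single code unit, which
// is what distinguishes WTF-16 from strict UTF-16.
void appendWTF16LE(std::string& out, uint32_t codePoint) {
  assert(codePoint <= 0x10FFFF && "not a Unicode code point");
  auto appendUnit = [&](uint16_t unit) {
    out.push_back(static_cast<char>(unit & 0xFF)); // low byte first
    out.push_back(static_cast<char>(unit >> 8));
  };
  if (codePoint < 0x10000) {
    appendUnit(static_cast<uint16_t>(codePoint));
  } else {
    // Split into a high and a low surrogate.
    uint32_t v = codePoint - 0x10000;
    appendUnit(static_cast<uint16_t>(0xD800 | (v >> 10)));
    appendUnit(static_cast<uint16_t>(0xDC00 | (v & 0x3FF)));
  }
}
```

Because every code unit occupies exactly two bytes, operations such as length and indexing reduce to simple arithmetic on the byte string.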
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Similar issue as: #6330
FAILED: src/passes/CMakeFiles/passes.dir/Precompute.cpp.o
/usr/bin/c++ -I/build/binaryen/src/binaryen-version_117/src -I/build/binaryen/src/binaryen-version_117/third_party/llvm-project/include -I/build/binaryen/src/binaryen-version_117/build -march=rv64gc -mabi=lp64d -O2 -pipe -fno-plt -fexceptions -Wp,-D_FORTIFY_SOURCE=2 -Wformat -Werror=format-security -fstack-clash-protection -Wp,-D_GLIBCXX_ASSERTIONS -g -ffile-prefix-map=/build/binaryen/src=/usr/src/debug/binaryen -DBUILD_LLVM_DWARF -Wall -Werror -Wextra -Wno-unused-parameter -Wno-dangling-pointer -fno-omit-frame-pointer -fno-rtti -Wno-implicit-int-float-conversion -Wno-unknown-warning-option -Wswitch -Wimplicit-fallthrough -Wnon-virtual-dtor -fPIC -fdiagnostics-color=always -O3 -DNDEBUG -UNDEBUG -std=c++17 -MD -MT src/passes/CMakeFiles/passes.dir/Precompute.cpp.o -MF src/passes/CMakeFiles/passes.dir/Precompute.cpp.o.d -o src/passes/CMakeFiles/passes.dir/Precompute.cpp.o -c /build/binaryen/src/binaryen-version_117/src/passes/Precompute.cpp
In file included from /build/binaryen/src/binaryen-version_117/src/wasm-traversal.h:30,
from /build/binaryen/src/binaryen-version_117/src/pass.h:24,
from /build/binaryen/src/binaryen-version_117/src/ir/intrinsics.h:20,
from /build/binaryen/src/binaryen-version_117/src/ir/effects.h:20,
from /build/binaryen/src/binaryen-version_117/src/passes/Precompute.cpp:30:
In copy constructor ‘wasm::SmallVector<wasm::Expression*, 10>::SmallVector(const wasm::SmallVector<wasm::Expression*, 10>&)’,
inlined from ‘constexpr std::pair<_T1, _T2>::pair(const _T1&, const _T2&) [with _U1 = wasm::Select* const; _U2 = wasm::SmallVector<wasm::Expression*, 10>; typename std::enable_if<(std::_PCC<true, _T1, _T2>::_ConstructiblePair<_U1, _U2>() && std::_PCC<true, _T1, _T2>::_ImplicitlyConvertiblePair<_U1, _U2>()), bool>::type <anonymous> = true; _T1 = wasm::Select* const; _T2 = wasm::SmallVector<wasm::Expression*, 10>]’ at /usr/include/c++/13.2.1/bits/stl_pair.h:559:21,
inlined from ‘T& wasm::InsertOrderedMap<Key, T>::operator[](const Key&) [with Key = wasm::Select*; T = wasm::SmallVector<wasm::Expression*, 10>]’ at /build/binaryen/src/binaryen-version_117/src/support/insert_ordered.h:112:29:
/build/binaryen/src/binaryen-version_117/src/support/small_vector.h:42:38: error: ‘<unnamed>.wasm::SmallVector<wasm::Expression*, 10>::fixed’ is used uninitialized [-Werror=uninitialized]
42 | template<typename T, size_t N> class SmallVector {
| ^~~~~~~~~~~
In file included from /build/binaryen/src/binaryen-version_117/src/passes/Precompute.cpp:38:
/build/binaryen/src/binaryen-version_117/src/support/insert_ordered.h: In function ‘T& wasm::InsertOrderedMap<Key, T>::operator[](const Key&) [with Key = wasm::Select*; T = wasm::SmallVector<wasm::Expression*, 10>]’:
/build/binaryen/src/binaryen-version_117/src/support/insert_ordered.h:112:29: note: ‘<anonymous>’ declared here
112 | std::pair<const Key, T> kv = {k, {}};
| ^~
|
| |
|
|
|
|
| |
Before this, all Emscripten builds would use 1 core, but it is important to
allow pthreads builds there to use more.
|
| |
|
|
|
|
|
|
|
|
| |
Catch and report all kinds of WTF-8 encoding errors in the source strings,
including invalid leading bytes, invalid trailing bytes, unexpected ends of
strings, and invalid surrogate sequences. Insert replacement characters into the
output as necessary. Add a TODO about minimizing size by escaping only those
code points mandated to be escaped by the JSON spec. Generally improve
readability of the code.
|
|
|
|
| |
Also add an end-to-end test using node to verify we can parse the escaped
content properly using TextDecoder+JSON.parse.
|
|
|
|
| |
Now that we have a .cpp file, none of the code that was in string.h needs to be
in a header any more.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Update identifiers used in tests to use a format supported by the new text
parser, i.e. either the standard format with its limited set of allowed
characters or the non-standard `$"..."` format. Notably, any name containing
square brackets or curly braces now uses the string format.
Input automatically updated with this script:
https://gist.github.com/tlively/4e22311736661849e641d02e521a0748
The printer is updated to properly escape names in more places as well. The
logic for escaping names is moved to a common location so that the type
printing logic in wasm-type.cpp can use it as well.
|
| |
|
|
|
|
|
| |
Fixes a fuzz testcase for wasm-ctor-eval.
Add the beginnings of a polyfill for stdckdint.h to help that.
|
|
|
|
|
|
|
| |
These module fields are especially complex to parse because they contain both
nontrivial types and instructions, so their parsing logic needs to be spread out
across the ParseDecls, ParseModuleTypes, and ParseDefs phases of parsing. This
applies to in-line elements in table definitions as well, which means we need to
be able to match a table to its in-line element segment across multiple phases.
|
|
|
|
|
|
|
| |
If there are newlines in the list, then we split using them in a simple manner
(that does not take into account nesting of any other delimiters).
Fixes #6047
Fixes #5271
|
|
|
|
|
| |
Adds a general purpose walker named FilterStringifyWalker, intended to walk control flow and take note of whether any of the expressions satisfy the condition.
Also includes an << overload for SuffixTree::RepeatedSubstring to make debugging easier.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This PR changes how file paths and the command line are handled. On startup on Windows,
we process the wstring version of the command line (including the file paths) and re-encode
it to UTF8 before handing it off to the rest of the command line handling logic. This means
that all paths are stored in UTF8-encoded std::strings as they go through the program, right
up until they are used to open files. At that time, they are converted to the appropriate native
format with the new to_path function before passing to the stdlib open functions.
This has the advantage that all of the non-file-opening code can use a single type to hold paths
(which is good since std::filesystem::path has proved problematic in some cases), but has the
disadvantage that someone could add new code that forgets to convert to_path before
opening. That's somewhat mitigated by the fact that most of the code uses the ModuleIOBase
classes for opening files.
Fixes #4995
|
|
|
|
| |
Fixes #5928. On FreeBSD, `off_t` is not defined in the headers we include.
|
|
|
| |
Allow them to be used for more than just the new text parser.
|
|
|
| |
Adds an integration test that identifies the substrings of a stringified wasm module using the suffix_tree.
|
| |
|
|
|
|
|
| |
This PR adds LLVM's suffix tree data structure to Binaryen. This suffix tree is implemented using Ukkonen's algorithm for linear-time suffix tree construction, and is intended for fast substring queries.
Note: All of the .h and .cpp files included are from LLVM. These files were copied directly instead of imported into our existing LLVM integration (in third_party/) to avoid bumping the commit hash and avoid the potential for complications with upstream changes.
|
|
|
| |
Fixes #5720
|
|
|
|
| |
This code predates our adoption of C++14 and can now be removed in favor of
`std::make_unique`, which should be more efficient.
|
|
|
|
|
|
|
|
|
|
|
|
| |
When resolving `operator!=`, C++20 also considers `operator==` implementations
when the types on `operator!=` do not match exactly. This caused the modified
code to have no most-specific overload to choose, resulting in an error. This is
actually a bug in the language that is being fixed, but there exist compilers
without the fix applied.
Work around the problem by updating the types in the declaration of `operator==`
and `operator!=` to be more exact.
This is a copy of #5029 with formatting fixes.
|
|
|
|
|
|
|
|
| |
We used to have this algorithm in wasm-type.cpp, where we used it to implement
equirecursive type canonicalization, but we removed it when we removed
equirecursive typing. Bring the algorithm back as a standalone utility for
future use in optimization passes. In particular, it will be useful in
TypeMerging for identifying the greatest fixed point of mergeable types rather
than the smallest fixed point.
|
|
|
| |
Fixes #5370
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Do not optimize or modify public heap types in any way. Public heap types
include the types of imported or exported functions, tables, globals, etc. This
is important to maintain the public interface of a module and ensure it can
still link and interact as intended with the outside world.
Also add a validation error if we find any nontrivial public types that are not
the types of imported or exported functions. This error is meant to help the
user ensure that type optimizations are not silently inhibited. In the future,
we may want to add options to silence this error or downgrade it to a warning.
This commit only updates the type updating machinery to avoid updating public
types. It does not update any optimization passes accordingly. Since we avoid
modifying public signature types already, this is not expected to break
anything, but in the future once we have function subtyping or if we make the
error optional, we may have to update some of our optimization passes.
|
| |
|
|
|
| |
As suggested in #5218
|
|
|
|
|
|
|
|
|
| |
std::string::back() is only well defined for non-empty strings.
Without the change, wasm-reduce fails if it is called from
$PATH, because then, the parent directory is an empty string.
A workaround is to explicitly set the binaryen path with -b,
and it is still necessary after this fix, but at least the program
ends with a comprehensible error message instead of a generic
assertion failure from the standard library.
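
The general pattern is to test for emptiness before calling `back()`. A minimal sketch, using a hypothetical helper that is not taken from wasm-reduce:

```cpp
#include <string>

// Hypothetical helper: appends a path separator unless the directory is
// empty or already ends with one.
std::string ensureTrailingSeparator(std::string dir) {
  // Check emptiness first: calling back() on an empty string is undefined.
  if (!dir.empty() && dir.back() != '/') {
    dir.push_back('/');
  }
  return dir;
}
```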
|
|
|
|
|
|
|
|
|
| |
(#5273)
When `-Wheader-hygiene` is enabled, the compiler warns about `using
namespace` directives at global scope in header files.
When `-Wimplicit-const-int-float-conversion` is enabled, the compiler
warns about implicit integer-to-floating-point conversions that change values.
|
|
|
|
| |
This is more modern and (IMHO) easier to read than that old C typedef
syntax.
|
|
|
| |
We did not preserve the ordering of the fixed-size storage there.
|
|
|
|
|
| |
We already provided a specialization of `std::hash` for arbitrary pairs, so add
one for `std::tuple` as well. Use the new specialization where we were
previously using nested pairs just to be able to use the pair specialization.
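
One way to hash a tuple is to fold the hashes of its elements together. The sketch below uses a separate functor rather than a `std::hash` specialization, and the names `hashCombine`, `TupleHash`, and `TripleMap` are hypothetical; the combining formula is the widely used Boost-style one, not necessarily the one used here:

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <tuple>
#include <unordered_map>

// Hypothetical helper: mixes `value` into `seed`.
inline void hashCombine(std::size_t& seed, std::size_t value) {
  seed ^= value + 0x9e3779b9 + (seed << 6) + (seed >> 2);
}

// Hypothetical functor: hashes a tuple by combining the hashes of all of its
// elements in order.
struct TupleHash {
  template<typename... Ts>
  std::size_t operator()(const std::tuple<Ts...>& t) const {
    std::size_t seed = sizeof...(Ts);
    std::apply(
      [&](const Ts&... elems) {
        (hashCombine(seed, std::hash<Ts>{}(elems)), ...);
      },
      t);
    return seed;
  }
};

// Example use: a map keyed by a flat tuple instead of nested pairs.
using TripleMap =
  std::unordered_map<std::tuple<int, std::string, char>, int, TupleHash>;
```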
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
With the goal of supporting null characters (i.e. zero bytes) in strings.
Rewrite the underlying interned `IString` to store a `std::string_view` rather
than a `const char*`, reduce the number of map lookups necessary to intern a
string, and present a more immutable interface.
Most importantly, replace the `c_str()` method that returned a `const char*`
with a `toString()` method that returns a `std::string`. This new method can
correctly handle strings containing null characters. A `const char*` can still
be had by calling `data()` on the `std::string_view`, although this usage should
be discouraged.
This change is NFC in spirit, although not in practice. It does not intend to
support any particular new functionality, but it is probably now possible to use
strings containing null characters in at least some cases. At least one parser
bug is also incidentally fixed. Follow-on PRs will explicitly support and test
strings containing nulls for particular use cases.
The C API still uses `const char*` to represent strings. As strings containing
nulls become better supported by the rest of Binaryen, this will no longer be
sufficient. Updating the C and JS APIs to use pointer, length pairs is left as
future work.
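
The difference between the two representations can be seen in a small standalone example. The helper names are hypothetical and do not correspond to `IString` methods:

```cpp
#include <string>
#include <string_view>

// Goes through a `const char*`, so the copy stops at the first zero byte.
std::string copyViaCString(const std::string& s) {
  return std::string(s.c_str());
}

// Goes through a `std::string_view`, which carries an explicit length, so
// embedded zero bytes are preserved.
std::string copyViaView(std::string_view v) {
  return std::string(v);
}
```

For a three-byte string `a`, zero byte, `b`, the first helper returns a one-byte string while the second returns all three bytes.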
|
|
|
|
| |
As an NFC preliminary change that will minimize the diff in #5122, which moves
IString to the wasm namespace.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch makes binaryen easier to call from other applications by making more errors recoverable instead of early-exiting.
The main thing it does is change three calls to exit on I/O errors into calls to Fatal(), which is an existing custom abstraction for handling unrecoverable errors. Currently Fatal's destructor calls _Exit(1).
My intent is to make it possible for Fatal to not exit, but to throw, allowing an embedding application to catch the exception.
Because the previous early exits were exiting with error code EXIT_FAILURE, I also changed Fatal to exit with EXIT_FAILURE. The test suite continues to pass so I assume this is ok.
Next I changed Fatal to buffer its error message until the destructor instead of immediately printing it to stderr. This is for ease of patching Fatal to throw instead.
Finally, I also included the patch I need to make Fatal throw when THROW_ON_FATAL is defined at compile time. I can carry this patch out of tree, but it is a small patch, so perhaps you will be willing to take it. I am happy to remove it.
Fixes #4938
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
It does not make sense to construct an `Expression` directly because all
expressions must be specific expressions. However, we previously allowed
constructing Expressions, and in particular we allowed them to be copy
constructed. Unrelatedly, `Fatal::operator<<` took its argument by value.
Together, these two facts produced UB when printing Expressions in fatal error
messages because a new Expression would be copy constructed with the original
expression ID but without any of the actual data from the original specific
expression. For example, when trying to print a Block, the printing code would
try to look at the expression list, but the expression list would be junk stack
data because the copied Expression does not contain an expression list.
Fix the problem by making Expression's constructors visible only to its
subclasses and making `Fatal::operator<<` take its argument by forwarding
reference instead of by value.
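
A reduced sketch of both halves of the fix, using simplified stand-ins for the real classes. `Message` plays the role of `Fatal` here, and `describe` is a hypothetical helper; none of this is the actual Binaryen code:

```cpp
#include <sstream>
#include <string>
#include <type_traits>
#include <utility>

// Simplified stand-in for the base class: constructors are protected, so
// only subclasses can create or copy an Expression.
struct Expression {
  enum Id { ConstId };
  Id id;

protected:
  explicit Expression(Id id) : id(id) {}
  Expression(const Expression&) = default;
};

struct Const : Expression {
  int value;
  explicit Const(int value) : Expression(ConstId), value(value) {}
};

// Slicing is now rejected at compile time.
static_assert(!std::is_copy_constructible<Expression>::value,
              "Expression cannot be copied from outside its subclasses");

// Printing inspects the id and then reads subclass data, which is only safe
// if `expr` really refers to the original object.
std::ostream& operator<<(std::ostream& os, const Expression& expr) {
  if (expr.id == Expression::ConstId) {
    os << "(const " << static_cast<const Const&>(expr).value << ")";
  }
  return os;
}

// Stand-in for the error reporter: the forwarding reference passes the
// argument through without copying it.
struct Message {
  std::ostringstream buffer;
  template<typename T> Message& operator<<(T&& arg) {
    buffer << std::forward<T>(arg);
    return *this;
  }
};

std::string describe(const Expression& expr) {
  Message msg;
  msg << "unexpected expression: " << expr;
  return msg.buffer.str();
}
```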
|
|
|
|
| |
Avoid manually doing bitshifts etc. - leave combining to the core hash
logic, which can do a better job.
|
|
|
|
| |
A resize from a large amount to a small amount would sometimes not clear
the flexible storage, if we used it before but not after.
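
The shape of the bug can be illustrated with a toy container that keeps the first `N` elements inline and the rest in a `std::vector`. `TinyVector` is a hypothetical, heavily simplified stand-in for the real class:

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical toy container: the first N elements live in `fixed`, any
// further elements live in `flexible`.
template<typename T, size_t N> class TinyVector {
  size_t usedFixed = 0;
  std::array<T, N> fixed{};
  std::vector<T> flexible;

public:
  size_t size() const { return usedFixed + flexible.size(); }

  void push_back(const T& x) {
    if (usedFixed < N) {
      fixed[usedFixed++] = x;
    } else {
      flexible.push_back(x);
    }
  }

  T& operator[](size_t i) { return i < N ? fixed[i] : flexible[i - N]; }

  void resize(size_t newSize) {
    size_t newFixed = std::min(N, newSize);
    // Reset any fixed slots that become newly visible.
    for (size_t i = usedFixed; i < newFixed; ++i) {
      fixed[i] = T();
    }
    usedFixed = newFixed;
    if (newSize > N) {
      flexible.resize(newSize - N);
    } else {
      // The easy step to miss: shrinking below N must also empty the
      // flexible storage, or stale elements keep counting toward size().
      flexible.clear();
    }
  }
};
```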
|