| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
|
|
| |
Adds a function to filter branch and return instructions from being included in potential outlining candidates.
|
|
|
|
|
| |
Adds a general purpose walker named FilterStringifyWalker, intended to walk control flow and take note of whether any of the expressions satisfy the condition.
Also includes an << overload for SuffixTree::RepeatedSubstring to make debugging easier.
|
|
|
| |
Fixes #6003
|
|
|
|
| |
With this, the fuzzer can replace e.g. an eq expression with a specific struct type,
because now it is away that struct types have eq as their ancestor.
|
|
|
|
| |
A later PR will add getSuperType which will mean "get the general super type -
either declared, or not".
|
|
|
| |
Adds a std::variant to represent the context of why a unique symbol was inserted in the stringified module. This allows us to pass necessary contextual data to subclasses of StringifyWalker in a structured manner.
|
|
|
| |
This PR adds a StringProcessor struct intended to hold functions that filter vectors of SuffixTree::RepeatedSubstring, and a test of its first functionality, removing overlapping repeated substrings.
|
|
|
| |
Abstracts repeat code
|
|
|
|
|
|
| |
And put the new files in a new source directory, "parser". This is a rough split
and is not yet expected to dramatically improve compile times. The exact
organization of the new files is subject to change, but this splitting should be
enough to make further parser development more pleasant.
|
|
|
|
|
|
|
| |
Remove support for the "struct_subtype", "array_subtype", "func_subtype", and
"extends" notations we used at various times to declare WasmGC types, leaving
only support for the standard text fromat for declaring types. Update all the
tests using the old formats and delete tests that existed solely to test the old
formats.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Match the spec and parse the shorthand binary and text formats as final and emit
final types without supertypes using the shorthands as well. This is a
potentially-breaking change, since the text and binary shorthands can no longer
be used to define types that have subtypes.
Also make TypeBuilder entries final by default to better match the spec and
update the internal APIs to use the "open" terminology rather than "final"
terminology. Future changes will update the text format to use the standard "sub
open" rather than the current "sub final" keywords. The exception is the new wat
parser, which supporst "sub open" as of this change, since it didn't support
final types at all previously.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This change introduces StackLattice, a lattice to model stack-related
behavior. It is templated on a separate lattice whose elements model
some property of individual values on the stack. The StackLattice
allows users to access the top of the stack, push abstract values, and
pop them. Comparisons and least upper bound operations are done on a
value by value basis starting from the top of the stack and moving
toward the bottom. This is because it allows stacks from different
scopes to be joined easily.
An application of StackLattice is to model the wasm value stack. The goal
is to organize lattice elements representing individual stack values in a
natural way which mirrors the wasm value stack. Transfer functions operate
on each stack value individually. The stack lattice is an intermediate
structure which is not intended to be directly operated on. Rather, it
simulates the push and pop behavior of instructions.
|
|
|
| |
Adds an integration test that identifies the substrings of a stringified wasm module using the suffix_tree.
|
|
|
|
|
|
|
|
| |
None of that code is speed-sensitive, or at least doesn't need to be inlined to be
fast. Move it to cpp for faster compile times.
This caused a cascade of necessary header fixes (i.e. after removing unneeded
header inclusions in module-utils.h, files that improperly depended on that
stopped working and needed an added include).
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This adds a TrapsNeverHappen oracle that is used inside the main PossibleContents
oracle of GUFA. The idea is that when traps never happen we can reason "backwards"
from information to things that must be true before it:
temp = x.field;
x.cast_to<Y>(); // Y is a subtype of x's type X
Here we cast x to a subtype. If we assume traps never happen then the cast must
succeed, and that means we can assume we had a Y on the previous line, where
perhaps that information lets us infer the value of x.field.
This PR focuses on calls, which are the more interesting situation to optimize
because other passes do some work already inside functions. Specifically, we look
for things that will trap in the called function or the caller, such as if the called
function always casts a param to some type, we can assume the caller passes
such a type in. And if we have a call_ref then any target that would trap cannot be
called (at least in a closed world).
This has some benefits, in particular when combined with --gufa-cast-all since
that casts more things, which lets us apply the inferences made here. I see 3.3%
fewer call_ref instructions on a Kotlin testcase, for example. This helps more
on -Os when we inline less.
|
| |
|
|
|
|
|
| |
Changes the static asserts checking a lattice type to require a non-static
compare function instead of a static one. Also changes the compare
functions of existing lattices to be non-static.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Before this PR, if a call had no paths to a catch in the same function then we skipped
creating a new basic block right after it. As a result, we could have a call in the middle
of a basic block. If EH is enabled that means we might transfer control flow out of
the function from the middle of a block. But it is better to have the property that
any transfer of control flow - to another basic block, or outside of the function - can
only happen at the end of a basic block.
This causes some overhead, but a subsequent PR (#5838) will remove that as a
followup, and this PR adds a little code to pass the module and check if EH is enabled,
and avoid the overhead if not, which at least avoids regressing the non-EH case
until that followup lands.
|
|
|
|
|
| |
The LLVM suffix tree expects to be provided with a vector of 32-bit unsigned integers. This PR makes it easier to integrate our wasm program string with the suffix tree.
Because the range of possible values is reduced from 2^64 to 2^32, a signed integer was added to manage the next separator value and ensure we're using every possible negative number.
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
This change implements a reaching definitions analysis which is intended
to be equivalent to the information provided by LocalGraph, specifically
the Flower class of LocalGraph.
It also introduces a CRTP utility in visitor-transfer-function.h which
implements most commonly found visitor-type transfer function
functionalities. The MonotoneCFGAnalyzer is also modified to add a phase
to collect results after the analysis is solved from the final CFG
states.
|
|
|
|
|
| |
This PR adds LLVM's suffix tree data structure to Binaryen. This suffix tree is implemented using Ukkonen's algorithm for linear-time suffix tree construction, and is intended for fast substring queries.
Note: All of the .h and .cpp files included are from LLVM. These files were copied directly instead of imported into our existing LLVM integration (in third_party/) to avoid bumping the commit hash and avoid the potential for complications with upstream changes.
|
|
|
|
|
| |
This PR is part of the effort to bring Outlining to Binaryen.
HashStringifyWalker is a subclass of StringifyWalker #5772, and used to encode a wasm module as a "string". Our "string" is a vector and each symbol is a uint64_t, providing us with a capacity of 2^64 symbols.
|
|
|
|
|
|
|
|
|
|
| |
This change creates a lattice, FinitePowersetLattice, to represent finite
powerset lattices constructed from sets containing members of arbitrary type
The bitvector finite powerset lattice class is renamed FiniteIntPowersetLattice.
The templated FinitePowersetLattice class contains a FiniteIntPowersetLattice
object, and over that provides functionality to map lattice element members
to bitvector indices. Methods are also provided by the lattice to add or
remove members of the given type from lattice members as an abstraction
over flipping bits in the bitvector.
|
|
|
|
|
|
|
| |
Analyzer (#5794)
This change makes the transfer function an object separate from the monotone analyzer. The transfer function class type is a generic template of the monotone analyzer, and an instance of the transfer function is instantiated and then passed into the monotone analyzer via its constructor. The API of the transfer function is enforced by a static assertion.
This change also introduces LivenessTransferFunction as a transfer function for liveness analysis as an example.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Implement support in the type system for final types, which are not allowed to
have any subtypes. Final types are syntactically different from similar
non-final types, so type canonicalization is made aware of finality. Similarly,
TypeMerging and TypeSSA are updated to work correctly in the presence of final
types as well.
Implement binary and text parsing and emitting of final types. Use the standard
text format to represent final types and interpret the non-standard
"struct_subtype" and friends as non-final. This allows a graceful upgrade path
for users currently using the non-standard text format, where they can update
their code to use final types correctly at the point when they update to use the
standard format. Once users have migrated to using the fully expanded standard
text format, we can update update Binaryen's parsers to interpret the MVP
shorthands as final types to match the spec without breaking those users.
To make it safe for V8 to independently start interpreting types declared
without `sub` as final, also reserve that shorthand encoding only for types that
have no strict subtypes.
|
|
|
|
|
|
|
|
| |
Previously we incorrectly used "strict" to mean the immediate subtypes of a
type, when in fact a strict subtype of a type is any subtype excluding the type
itself. Rename the incorrect `getStrictSubTypes` to `getImmediateSubTypes`,
rename the redundant `getAllStrictSubTypes` to `getStrictSubTypes`, and rename
the redundant `getAllSubTypes` to `getSubTypes`. Fixing the capitalization of
"SubType" to "Subtype" is left as future work.
|
|
|
| |
StringifyWalker is a new Walker with UnifiedExpressionVisitor. This walker performs a shallow visit of control-flow (try, if, block, loop) expressions and their simple expression siblings before then visiting the children of each control-flow expression in postorder. As a result, this walker un-nests nested control flow structures, so the expression visit order does not correspond to a normal postorder traversal of the function.
|
|
|
| |
This change introduces a finite-height powerset lattice FinitePowersetLattice where each element's height is determined when constructed in runtime. This is implemented using a vector of `bools. This change also modifies BlockState and MonotoneCFGAnalyzer to be templated on a generic lattice type instead of using a hardcoded lattice. It additionally fixes an iterator bug in MonotoneCFGAnalyzer::fromCFG which assigned a temporary iterator object as predecessor and successor pointers to BlockStates instead of pointers to actual BlockState objects.
|
|
|
| |
This introduces a limited monotone flow-sensitive liveness analysis on local indices as an initial proof of concept for the creation of a monotone flow-sensitive static analysis framework. Tests are included in test/gtest/cfg.cpp.
|
|
|
|
|
| |
Such tests all have to handle disabling and re-enabling color output for printed
wast and have to handle parsing input wast, so add a test fixture that makes
these tasks easier.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add a new "analysis" source directory that will contain the source for a new
static program analysis framework. To start the framework, add a CFG utility
that provides convenient iterators for iterating through the basic blocks of the
CFG as well as the predecessors, successors, and contents of each block.
The new CFGs are constructed using the existing CFGWalker, but they are
different in that the new utility is meant to provide a usable representation of
a CFG whereas CFGWalker is meant to allow collecting arbitrary information about
each basic block in a CFG.
For testing and debugging purposes, add `print` methods to CFGs and basic
blocks. This requires exposing the ability to print expression contents
excluding children, which was something we previously did only for StackIR.
Also add a new gtest file with a test for constructing and printing a CFG. The
test reveals some strange properties of the current CFG construction, including
empty blocks and strange placement of `loop` instructions, but fixing these
problems is left as future work.
|
|
|
|
|
|
|
|
|
|
|
| |
This capability was originally introduced to support calculating LUBs in the
equirecursive type system, but has not been needed for anything except tests
since the equirecursive type system was removed. Since building basic heap types
is no longer useful and was a source of significant complexity, remove the APIs
that allowed it and the tests that used those APIs.
Also remove test/example/type-builder.cpp, since a significant portion of it
tested the removed APIs and the rest is already better tested in
test/gtest/type-builder.cpp.
|
|
|
|
|
| |
And since the only type system left is the standard isorecursive type system,
remove `TypeSystem` and its associated APIs entirely. Delete a few tests that
only made sense under the isorecursive type system.
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously (ref.as_non_null (global.get ..)) would return the global with no changes,
and if the global was nullable then the type didn't match the output, which hit an
assertion (where GUFA checks that the contents match the declared type in the wasm).
To fix this, refine global types, that is, the type we track on GlobalInfo may be more
refined than the global itself. In the above example, if the global is nullable then
the GlobalInfo would point to that global but have a non-nullable type.
In fact the code was already prepared for this, and few changes were needed.
|
|
|
|
|
|
|
|
|
| |
If the module does not have a name for a particular type, the new utility falls
back to use a different user-configurable type name generator, just like the
existing IndexedTypeNameGenerator does.
Also change how heap types are printed by this printing machinery (which is
currently only used for debugging) so that their names are printed in addition
to their contents. This makes the printer much more useful for debugging.
|
|
|
|
|
|
|
|
| |
We used to have this algorithm in wasm-type.cpp, where we used it to implement
equirecursive type canonicalization, but we removed it when we removed
equirecursive typing. Bring the algorithm back as a standalone utility for
future use in optimization passes. In particular, it will be useful in
TypeMerging for identifying the greatest fixed point of mergeable types rather
than the smallest fixed point.
|
|
|
|
|
|
| |
`struct` has replaced `data` in the upstream spec, so update Binaryen's types to
match. We had already supported `struct` as an alias for data, but now remove
support for `data` entirely. Also remove instructions like `ref.is_data` that
are deprecated and do not make sense without a `data` type.
|
|
|
|
| |
MSVC is making `NAN` negative, so use an explicitly constructed positive NaN
instead.
|
|
|
|
|
| |
In favor of the more portable code snippet using `std::copysign`. Also
reintroduce assertions that the NaNs have the expected signs. This continues
work started in #5302.
|
|
|
|
| |
Equirecursive is no longer standards track and its implementation is extremely
complex. Remove it.
|
|
|
|
|
|
|
|
|
|
| |
This makes Binaryen's default type system match the WasmGC spec.
Update the way type definitions without supertypes are printed to reduce the
output diff for MVP tests that do not involve WasmGC. Also port some
type-builder.cpp tests from test/example to test/gtest since they needed to be
rewritten to work with isorecursive type anyway.
A follow-on PR will remove equirecursive types completely.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Fixes a longstanding problem with isorecursive canonicalization that only showed
up in MacOS and occasionally Windows builds. The problem was that
`RecGroupEquator` was not quite correct in the presence of self-references in
rec groups. Specifically, `RecGroupEquator` did not differentiate between
instances of the same type appearing across two rec groups where the type was a
self-reference in one group but not in the other.
The reason this only showed up occasionally on some platforms was that this bug
could only cause incorrect behavior if two groups that would incorrectly be
compared as equal were hashed into the same bucket of a hash map. Apparently the
hash map used on Linux never hashes the two problematic groups into the same
bucket.
|
|
|
|
|
| |
This makes the logic symmetric and easier to read.
Measuring speed, this seems identical to before, so that concern seems fine.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Now that we have a cone type, we are able to represent in PossibleContents the
natural content of a wasm location: a type or any of its subtypes. This allows us to
enforce the wasm typing rules, that is, to filter the data arriving at a location by the
wasm type of the location.
Technically this could be unnecessary if we had full implementations of flowFoo
and so forth, that is, tailored code for each wasm expression that makes sure we
only contain and flow content that fits in the wasm type. Atm we don't have that,
and until the wasm spec stabilizes it's probably not worth the effort. Instead,
simply filter based on the type, which gives the same result (though it does take
a little more work; I measured it at 3% or so of runtime).
While doing so normalize cones to their actual maximum depth, which simplifies
things and will help more later as well.
|
|
|
|
|
|
|
|
|
| |
`array` is the supertype of all defined array types and for now is a subtype of
`data`. (Once `data` becomes `struct` this will no longer be true.) Update the
binary and text parsing of `array.len` to ignore the obsolete type annotation
and update the binary emitting to emit a zero in place of the old type
annotation and the text printing to print an arbitrary heap type for the
annotation. A follow-on PR will add support for the newer unannotated version of
`array.len`.
|
|
|
|
|
|
| |
As the number of basic heap types has grown, the complexity of the subtype and
LUB calculations has grown as well. To ensure that they are correct, test the
complete matrix of basic types and trivial user-defined types. Fix the subtype
calculation to make string types subtypes of `any` to make the test pass.
|
|
|
|
|
|
| |
This computes how deep the children of a heap type are. This will be useful in
cone type optimizations, since we want to "normalize" cones: a cone of depth
infinity can just be a cone of the actual maximum depth of existing children, etc.,
and it's simpler to have a single canonical representation to avoid extra work.
|