| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We previously supported a non-standard `(func "name" ...` syntax for declaring
functions exported with the quoted name. Since that is not part of the standard
text format, drop support for it, replacing it with the standard `(func $name
(export "name") ...` syntax instead.
Also replace our other usage of the quoted form in our text output, which was
where we quoted names containing characters that are not allowed to appear in
standard names. To handle that case, adjust our output from `"$name"` to
`$"name"`, which is the standards-track way of supporting such names. Also fix
how we detect non-standard name characters to match the spec.
Update the lit test output generation script to account for these changes,
including by making the `$` prefix on names mandatory. This causes the script to
stop interpreting declarative element segments with the `(elem declare ...`
syntax as being named "declare", so prevent our generated output from regressing
by counting "declare" as a name in the script.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Match the spec and parse the shorthand binary and text formats as final and emit
final types without supertypes using the shorthands as well. This is a
potentially-breaking change, since the text and binary shorthands can no longer
be used to define types that have subtypes.
Also make TypeBuilder entries final by default to better match the spec and
update the internal APIs to use the "open" terminology rather than "final"
terminology. Future changes will update the text format to use the standard "sub
open" rather than the current "sub final" keywords. The exception is the new wat
parser, which supporst "sub open" as of this change, since it didn't support
final types at all previously.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When printing Binaryen IR, we previously generated names for unnamed heap types
based on their structure. This was useful for seeing the structure of simple
types at a glance without having to separately go look up their definitions, but
it also had two problems:
1. The same name could be generated for multiple types. The generated names did
not take into account rec group structure or finality, so types that differed
only in these properties would have the same name. Also, generated type names
were limited in length, so very large types that shared only some structure
could also end up with the same names. Using the same name for multiple types
produces incorrect and unparsable output.
2. The generated names were not useful beyond the most trivial examples. Even
with length limits, names for nontrivial types were extremely long and visually
noisy, which made reading disassembled real-world code more challenging.
Fix these problems by emitting simple indexed names for unnamed heap types
instead. This regresses readability for very simple examples, but the trade off
is worth it.
This change also reduces the number of type printing systems we have by one.
Previously we had the system in Print.cpp, but we had another, more general and
extensible system in wasm-type-printing.h and wasm-type.cpp as well. Remove the
old type printing system from Print.cpp and replace it with a much smaller use
of the new system. This requires significant refactoring of Print.cpp so that
PrintExpressionContents object now holds a reference to a parent
PrintSExpression object that holds the type name state.
This diff is very large because almost every test output changed slightly. To
minimize the diff and ease review, change the type printer in wasm-type.cpp to
behave the same as the old type printer in Print.cpp except for the differences
in name generation. These changes will be reverted in much smaller PRs in the
future to generally improve how types are printed.
|
|
|
|
| |
The function type should be printed there just like for non-imported
functions.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
A cycle of data is something we can't just naively emit as wasm globals. If
at runtime we end up, for example, with an object A that refers to itself,
then we can't just emit
(global $A
(struct.new $A
(global.get $A)))
The struct.get is of this very global, and such a self-reference is invalid. So
we need to break such cycles as we emit them. The simple idea used here
is to find paths in the cycle that are nullable and mutable, and replace the
initial value with a null that is fixed up later in the start function:
(global $A
(struct.new $A
(ref.null $A)))
(func $start
(struct.set
(global.get $A)
(global.get $A)))
)
This is not optimal in terms of breaking cycles, but it is fast (linear time)
and simple, and does well in practice on j2wasm (where cycles in fact
occur).
|
|
|
|
|
|
|
| |
Without the hint, we always look for a valid name using name$0, $1, $2, etc.,
starting from 0, and in some cases that can lead to quadratic behavior.
Noticed on a testcase in the fuzzer that runs for over 24 seconds (I gave up at
that point) but takes only 2 seconds with this.
|
|
|
|
|
|
|
|
|
|
| |
Before, a single ctor with GC worked, but any subsequent ones simply dropped
the globals from the previous ones, because we were missing an addGlobal in
an important place.
Also, we can get confused about which global names are in use in the module, so fix
that as well by storing them directly (we keep removing and re-adding globals, so
we can't use the normal module mechanism to find which names are in use).
|
|
|
|
|
|
|
|
|
|
| |
This makes Binaryen's default type system match the WasmGC spec.
Update the way type definitions without supertypes are printed to reduce the
output diff for MVP tests that do not involve WasmGC. Also port some
type-builder.cpp tests from test/example to test/gtest since they needed to be
rewritten to work with isorecursive type anyway.
A follow-on PR will remove equirecursive types completely.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
These types, `none`, `nofunc`, and `noextern` are uninhabited, so references to
them can only possibly be null. To simplify the IR and increase type precision,
introduce new invariants that all `ref.null` instructions must be typed with one
of these new bottom types and that `Literals` have a bottom type iff they
represent null values. These new invariants requires several additional changes.
First, it is now possible that the `ref` or `target` child of a `StructGet`,
`StructSet`, `ArrayGet`, `ArraySet`, or `CallRef` instruction has a bottom
reference type, so it is not possible to determine what heap type annotation to
emit in the binary or text formats. (The bottom types are not valid type
annotations since they do not have indices in the type section.)
To fix that problem, update the printer and binary emitter to emit unreachables
instead of the instruction with undetermined type annotation. This is a valid
transformation because the only possible value that could flow into those
instructions in that case is null, and all of those instructions trap on nulls.
That fix uncovered a latent bug in the binary parser in which new unreachables
within unreachable code were handled incorrectly. This bug was not previously
found by the fuzzer because we generally stop emitting code once we encounter an
instruction with type `unreachable`. Now, however, it is possible to emit an
`unreachable` for instructions that do not have type `unreachable` (but are
known to trap at runtime), so we will continue emitting code. See the new
test/lit/parse-double-unreachable.wast for details.
Update other miscellaneous code that creates `RefNull` expressions and null
`Literals` to maintain the new invariants as well.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
An overview of this is in the README in the diff here (conveniently, it is near the
top of the diff). Basically, we fix up nn locals after each pass, by default. This keeps
things easy to reason about - what validates is what is valid wasm - but there are
some minor nuances as mentioned there, in particular, we ignore nameless blocks
(which are commonly added by various passes; ignoring them means we can keep
more locals non-nullable).
The key addition here is LocalStructuralDominance which checks which local
indexes have the "structural dominance" property of 1a, that is, that each get has
a set in its block or an outer block that precedes it. I optimized that function quite
a lot to reduce the overhead of running that logic after each pass. The overhead
is something like 2% on J2Wasm and 0% on Dart (0%, because in this mode we
shrink code size, so there is less work actually, and it balances out).
Since we run fixups after each pass, this PR removes logic to manually call the
fixup code from various places we used to call it (like eh-utils and various passes).
Various passes are now marked as requiresNonNullableLocalFixups => false.
That lets us skip running the fixups after them, which we normally do automatically.
This helps avoid overhead. Most passes still need the fixups, though - any pass
that adds a local, or a named block, or moves code around, likely does.
This removes a hack in SimplifyLocals that is no longer needed. Before we
worked to avoid moving a set into a try, as it might not validate. Now, we just do it
and let fixups happen automatically if they need to: in the common code they
probably don't, so the extra complexity seems not worth it.
Also removes a hack from StackIR. That hack tried to avoid roundtrip adding a
nondefaultable local. But we have the logic to fix that up now, and opts will
likely keep it non-nullable as well.
Various tests end up updated here because now a local can be non-nullable -
previous fixups are no longer needed.
Note that this doesn't remove the gc-nn-locals feature. That has been useful for
testing, and may still be useful in the future - it basically just allows nn locals in
all positions (that can't read the null default value at the entry). We can consider
removing it separately.
Fixes #4824
|
|
This ended up simpler than I thought. We can simply emit global and
local data as we go, creating globals as necessary to contain GC data,
and referring to them using global.get later. That will ensure that
data identity works (things referring to the same object in the interpreter
will refer to the same object when the wasm is loaded). In more detail,
each live GC item is created in a "defining global", a global that is
immutable and of the precise type of that data. Then we just read from
that location in any place that wants to refer to that data. That is,
something like
function foo() {
var x = Bar(10);
var y = Bar(20);
var z = x;
z.value++; // first object now contains 11
...
}
will be evalled into something like
var define$0 = Bar(11); // note the ++ has taken effect here
var define$1 = Bar(20);
function foo() {
var x = define$0;
var y = define$1;
var z = define$0;
...
}
This PR should handle everything but "cycles", that is, GC data that at
runtime ends up forming a loop. Leaving that for later work (not sure
how urgent it is to fix).
|