| Commit message | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add support for that section to the YAML layer, and add
code to update it.
The updating is slightly tricky - unlike .debug_ranges, the
size of entries is not fixed. So we can't just skip entries,
as the end marker is smaller than a normal entry. Instead,
replace now-invalid segments with (1, 1) which is of size
0 and so should be ignored by the debugger (we can't use
(0, 0) as that would be an end marker, and (-1, *) is
the special base marker).
In the future we probably do want to do this in a more
sophisticated manner, completely rewriting the indexes
into the section as well. For now though this should be
enough for when binaryen does not optimize (as we
don't move/reorder anything).
Note that this doesn't update the location description
(like where on the wasm expression stack the value is).
Again, that is correct for when binaryen doesn't
optimize, but for fully optimized builds we would need
to track things (which would be hard!).
Also clean up some code that uses "Extra" instead of
"Delimiter" that was missed before, and shorten some
unnecessarily long names.
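The (1, 1) replacement described above can be sketched roughly as follows. This is a minimal illustration, not the code in this commit: `LocSpan`, `updateLocSpans`, the `mapAddr` callback and the use of 0 as "no new address" are all assumptions made for the example, and the base marker is simply left untouched here.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <vector>

// One (begin, end) pair from a location list. Hypothetical type for this sketch.
struct LocSpan {
  uint32_t begin;
  uint32_t end;
};

// (-1, *) is the special base marker.
constexpr uint32_t BaseMarker = uint32_t(-1);
// Convention of this sketch only: the callback returns 0 for "no new address".
constexpr uint32_t NoAddr = 0;

// Rewrites the spans in place. mapAddr is a hypothetical callback that returns
// the new address for an old one, or NoAddr if the old address is gone.
inline void updateLocSpans(std::vector<LocSpan>& spans,
                           const std::function<uint32_t(uint32_t)>& mapAddr) {
  for (auto& span : spans) {
    if (span.begin == 0 && span.end == 0) {
      continue; // end marker, keep as is
    }
    if (span.begin == BaseMarker) {
      continue; // base marker, left untouched in this sketch
    }
    uint32_t newBegin = mapAddr(span.begin);
    uint32_t newEnd = mapAddr(span.end);
    if (newBegin == NoAddr || newEnd == NoAddr) {
      // We cannot skip the entry, so turn it into an empty span of size 0.
      span = {1, 1};
    } else {
      span = {newBegin, newEnd};
    }
  }
}
```

The number of entries stays the same, so nothing that indexes into the section needs to be rewritten.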
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Pretty straightforward given all we have so far.
Note that fannkuch3_manyopts has an example of
a sequence of ranges of which some must be skipped
while others must not, showing we handle that by
skipping the bad ones and updating the remaining. That
is, if we have a sequence of two (begin, end) spans
[(10, 20),
(30, 40)]
It's possible (10, 20) maps in the new binary to (110, 120)
while (30, 40) was eliminated by the optimizer and we have
nothing valid to map it to. In that case we emit
[(110, 120)]
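A small sketch of the skip-or-update behavior in the example above. The `Range` type, the `updateRanges` name and the old-to-new address map are assumptions for illustration, not the actual code in this commit.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <vector>

// A (begin, end) span of addresses. Hypothetical type for this sketch.
struct Range {
  uint32_t begin;
  uint32_t end;
};

// Returns the updated ranges. A range is kept only if both of its endpoints
// have a new address in oldToNew; otherwise it is skipped entirely.
inline std::vector<Range>
updateRanges(const std::vector<Range>& oldRanges,
             const std::unordered_map<uint32_t, uint32_t>& oldToNew) {
  std::vector<Range> result;
  for (const auto& range : oldRanges) {
    auto newBegin = oldToNew.find(range.begin);
    auto newEnd = oldToNew.find(range.end);
    if (newBegin == oldToNew.end() || newEnd == oldToNew.end()) {
      continue; // eliminated by the optimizer, nothing valid to map it to
    }
    result.push_back({newBegin->second, newEnd->second});
  }
  return result;
}
```

With the input from the example, [(10, 20), (30, 40)] and a map that only knows 10 and 20, this returns [(110, 120)].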
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Just some trivial fixes:
* Properly reset prologue after each line (unlike others, this
flag should be reset immediately).
* Test for a function's end address first, as LLVM output appears to
use 1-past-the-end-of-the-function as a location in that function,
and not the next (note the first byte of the next function, which is
ambiguously identical to that value, is used at least in low_pc;
I'm not sure if it's used in debug lines too).
* Ignore the same address if LLVM emitted it more than once, which
it does sometimes.
|
|
|
|
|
|
|
| |
We need to track end_sequence directly, and use either
end_sequence or copy (copy emits a line without marking
it as ending a sequence).
After this, fib2 debug line output looks perfect.
|
|
|
|
| |
While line and address values of 0 should be skipped, it
seems that lines with column 0 are valid and are emitted by LLVM.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
DWARF from LLVM can refer to the first byte belonging to the function,
where the size LEB is, or to the first byte after that, where the local
declarations are, or the end opcode, or to one byte past that which is
one byte past the bytes that belong to the function. We aren't sure why
LLVM does this, but track it all for now.
After this all debug line positions are identified. However,
in some cases a debug line refers to one past the end of the
function, which may be an LLVM bug. That location is ambiguous
as it could also be the first byte of the next function (what
made this discovery possible was when this happened to the
last function, after which there is another section).
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Control flow structures have those in addition to the normal span of
(start, end), and we need to track them too.
Tracking them during reading requires us to track control flow
structures while parsing, so that we can know which structure
an end/else/catch refers to.
We track these locations using a map on the side of instruction
to its "extra" locations. That avoids increasing the size of the
tracking info for the much more common non-control flow
instructions.
Note that there is one more 'end' location, that of the function
(not referring to any instruction). I left that to a later PR to
not increase this one too much.
|
|
|
|
|
| |
Instead of hackishly advancing the read position in the
binary buffer, call readExpression which will do that, and
also do all the debug info handling for us.
|
|
|
|
|
|
| |
LLVM points to the start of the function in some debug line
entries - right after the size LEB of the function, which is
where the locals are declared, and before any instructions.
|
|
|
|
| |
This will make it easier to switch to something else for
offsets in wasm binaries if we get >4GB files.
|
|
|
|
|
|
|
| |
Update high_pc values. These are interesting as they
may be a relative offset compared to the low_pc.
For functions we already had both a start and an end. Add
such tracking for instructions as well.
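To illustrate why a relative high_pc is interesting, here is a rough sketch. The `PcRange` struct and both helpers are hypothetical names for this example. It assumes we already know the new start and new end addresses and only need to re-encode high_pc in the same style as before.

```cpp
#include <cassert>
#include <cstdint>

// A low_pc/high_pc pair. highIsOffset says whether highPc is stored as an
// offset from lowPc or as an absolute address. Hypothetical type.
struct PcRange {
  uint32_t lowPc;
  uint32_t highPc;
  bool highIsOffset;
};

// The absolute end address, regardless of how highPc is encoded.
inline uint32_t absoluteEnd(const PcRange& range) {
  return range.highIsOffset ? range.lowPc + range.highPc : range.highPc;
}

// Re-encodes the range for the new binary, keeping the original encoding
// style. newLow and newEnd are the (already computed) new addresses.
inline PcRange updatePcRange(const PcRange& old, uint32_t newLow,
                             uint32_t newEnd) {
  assert(newEnd >= newLow);
  PcRange updated;
  updated.lowPc = newLow;
  updated.highIsOffset = old.highIsOffset;
  // A relative high_pc must be recomputed against the new low_pc, as the
  // code in between may have grown or shrunk.
  updated.highPc = old.highIsOffset ? newEnd - newLow : newEnd;
  return updated;
}
```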
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Track the beginning and end of each function, both when reading
and writing.
We track expressions and functions separately, instead of having a single
big map of (oldAddr) => (newAddr) because of the potentially ambiguous case
of the final expression in a function: its end might be identical in offset
to the end of the function. So we have two different things that map to the
same offset. However, if the context is "the end of the function" then the
updated address is the new end of the function, even if the function ends
with a different instruction now, as the old last instruction might have
moved or been optimized out. Concretely, we have getNewExprAddr
and getNewFuncAddr, so we can ask to update the location of either
an expression or a function, and use that contextual information.
This checks for the DIE tag in order to know what we are looking for.
To be safe, if we hit an unknown tag, we halt, so that we don't silently
miss things.
As the test updates show, the new things we can do thanks to this
PR are to update compile unit and subprogram low_pc locations.
Note btw that in the first test (dwarfdump_roundtrip_dwarfdump.bin.txt)
we change 5 to 0: that is correct since that test does not write out
DWARF (it intentionally has no -g), so we do not track binary
locations while writing, and so we have nothing to update to (the
other tests show actual updating).
Also fix the order in the python test runner code to show a diff
of expected to encountered, and not the reverse, which confused
me.
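The ambiguity between an expression and a function at the same offset can be sketched like this. Only the names getNewExprAddr and getNewFuncAddr come from this PR; the `LocationUpdater` struct, its three maps and the use of 0 for "no new address" are assumptions for the sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

using Addr = uint32_t;

// Keeps expressions and functions in separate maps, so that the same old
// offset can be updated differently depending on the context.
struct LocationUpdater {
  std::unordered_map<Addr, Addr> exprAddrs;  // old expression addr => new
  std::unordered_map<Addr, Addr> funcStarts; // old function start => new
  std::unordered_map<Addr, Addr> funcEnds;   // old function end => new

  // Returns 0 if the expression no longer exists (e.g. optimized out).
  Addr getNewExprAddr(Addr oldAddr) const { return lookup(exprAddrs, oldAddr); }

  // A function address may be either the start or the end of a function.
  Addr getNewFuncAddr(Addr oldAddr) const {
    Addr start = lookup(funcStarts, oldAddr);
    return start != 0 ? start : lookup(funcEnds, oldAddr);
  }

private:
  static Addr lookup(const std::unordered_map<Addr, Addr>& map, Addr oldAddr) {
    auto it = map.find(oldAddr);
    return it == map.end() ? 0 : it->second;
  }
};
```

If the old offset 50 was both the end of the last expression and the end of the function, and that expression was optimized out, then getNewExprAddr(50) finds nothing while getNewFuncAddr(50) still returns the new end of the function.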
|
|
|
|
|
|
|
| |
Mostly straightforward: go over the dwarf entries, find the
low_pc ones, and update their positions. A slight oddity is
that we must traverse both the dwarf context - which has
the rich APIs for analysis - and the YAML data structure -
which is minimal but is used for writing out.
|
|
|
|
|
|
|
|
|
|
|
| |
Check if an entry starts a new range of addresses. Each range is a set of
related addresses, where in particular, if the first has been zeroed out
by the linker, we must omit the entire range. If we do not, then the
initial range is 0 and the others are offsets relative to it, which will
look like random addresses, perhaps into the middle of instructions, and
perhaps that happen to collide with real ones (a debugger would ignore
those, so we must too; it's easier and better to simply omit them).
See https://bugs.llvm.org/show_bug.cgi?id=44516#c2
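A rough sketch of omitting a whole range when its first address was zeroed out. The `LineRow` struct and its `startsRange` flag are hypothetical; how an entry is detected as starting a new range is not shown here.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// A simplified debug line entry. startsRange marks the first entry of a set
// of related addresses. Hypothetical type for this sketch.
struct LineRow {
  uint32_t addr;
  uint32_t line;
  bool startsRange;
};

// Drops every range whose first address is 0 (zeroed out by the linker).
// The later entries of such a range are relative to that 0 and so would
// look like random addresses, so they are dropped as well.
inline std::vector<LineRow> dropZeroedRanges(const std::vector<LineRow>& rows) {
  std::vector<LineRow> kept;
  bool omitting = false;
  for (const auto& row : rows) {
    if (row.startsRange) {
      omitting = (row.addr == 0);
    }
    if (!omitting) {
      kept.push_back(row);
    }
  }
  return kept;
}
```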
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Multiple tables appear to be emitted when linking files
together. This fixes our support for that, which did not
update their size properly. This required patching the
YAML emitting code from LLVM in order to measure
the size and then emit it, as that code is apparently
not designed to handle changes in line table
contents.
Other minor fixes:
* Set the flags for our dwarfdump command to emit
the same as llvm-dwarfdump does with -v -all.
* Add support for a few more opcodes,
set_discriminator, set_basic_block, fixed_advance_pc,
set_isa.
* Handle a compile unit without abbreviations in the
YAML code (again, apparently not something this
LLVM code was intended to do).
* Handle a compile unit with zero entries in the
YAML code (ditto).
* Properly set the AddressSize - we use the
DWARFContext in a different way than LLVM expects,
apparently.
With this the emscripten test suite passes with
-gforce_dwarf without crashing.
My overall impression so far from the YAML code is
that it probably isn't a long-term solution for us. Perhaps
it may end up being scaffolding, that is, we can
replace it with our own code eventually that is based
on it, and remove most of the LLVM code. Before
deciding that we should get everything working first,
and this seems like the quickest path there.
|
|
|
|
|
|
|
|
|
|
| |
* Remove implicit conversion operators from Type
Now types must be explicitly converted to uint32_t with Type::getID or
to ValueType with Type::getVT. This fixes #2572 for switches that use
Type::getVT.
* getVT => getSingle
|
|
|
|
|
| |
(#2542)" (#2576)
This reverts commit f62e171c38bea14302f9b79f7941a248ea704425.
|
| |
|
|
|
|
|
| |
This allows subtypes for arguments of `throw`. This also renames
`shouldBeSubTypeOrUnreachable` to `shouldBeSubTypeOrFirstIsUnreachable`,
to be consistent with `shouldBeEqualOrFirstIsUnreachable`.
|
|
|
|
|
|
| |
This adds line and column info to wast parser exception messages to be
more readable when they are encountered. In other cases this makes
existing line and column numbers more fine grained, or adds some helpful
strings (if line and column info is not available).
|
|
|
|
|
|
| |
- Allow `atomic.notify` and `atomic.wait` instructions to parse memory
arguments (`align` and `offset`) and print the offset in these
instructions when writing binary, rather than assuming it to be 0
- Change arguments of `parseMemAttributes` to be references
|
|
|
|
|
|
|
|
|
|
|
| |
We have not been generating push and pop instructions in the stack IR.
Even though they are not written in binary, they have to be in the stack
IR to match the number of inputs and outputs of instructions.
Currently `BinaryenIRWriter` is used both for stack IR generation and
binary generation, so we should emit those instructions in
`BinaryenIRWriter`. `BinaryenIRToBinaryWriter`, which inherits
`BinaryenIRWriter`, does not do anything for push and pop instructions,
so they are still not emitted in binary.
|
|
|
|
|
|
|
|
|
|
|
|
| |
This adds support for the reference type proposal. This includes support
for all reference types (`anyref`, `funcref`(=`anyfunc`), and `nullref`)
and four new instructions: `ref.null`, `ref.is_null`, `ref.func`, and
new typed `select`. This also adds subtype relationship support between
reference types.
This does not include table instructions yet. This also does not include
wasm2js support.
Fixes #2444 and fixes #2447.
|
|
|
|
|
|
|
|
|
|
|
| |
Several type-related functions currently exist outside of `Type`
class and thus in the `wasm`, effectively global, namespace. This moves
these functions into `Type` class, making them either member functions
or static functions.
Also this renames `getSize` to `getByteSize` so that it is not
confused with `size`, which returns the number of types in multiple
types. This also reorders the functions in `wasm-type.cpp` to
match the order in `wasm-type.h`.
|
|
|
|
| |
This does something similar to #2489 for more functions, removing
boilerplate code for each module element using template functions.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
With this, we can update DWARF debug line info properly as
we write a new binary.
To do that we track binary locations as we write. Each
instruction is mapped to the location it is written to. We
must also adjust them as we move code around because
of LEB optimization (we emit a function or a section
with a 5-byte LEB placeholder, the maximal size; later
we shrink it which is almost always possible).
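The placeholder shrinking can be sketched as below, to show why locations need adjusting. The helper names are made up for this example and this is not the writer's actual code; it only assumes unsigned 32-bit LEB128 values.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// The maximal size of a 32-bit LEB, which is what the placeholder uses.
constexpr size_t MaxLEB32Bytes = 5;

// Minimal unsigned LEB128 encoding of a 32-bit value.
inline std::vector<uint8_t> encodeULEB32(uint32_t value) {
  std::vector<uint8_t> out;
  do {
    uint8_t byte = value & 0x7f;
    value >>= 7;
    if (value != 0) {
      byte |= 0x80; // more bytes follow
    }
    out.push_back(byte);
  } while (value != 0);
  return out;
}

// How many bytes we save by replacing the 5-byte placeholder with the
// minimal encoding of the final size.
inline size_t lebShrinkage(uint32_t sizeValue) {
  return MaxLEB32Bytes - encodeULEB32(sizeValue).size();
}

// Everything written after the placeholder moves back by the shrinkage, so
// tracked locations (hypothetical flat list here) must move back too.
inline void shiftLocations(std::vector<uint32_t>& locations, size_t shrinkage) {
  for (auto& loc : locations) {
    assert(loc >= shrinkage);
    loc -= uint32_t(shrinkage);
  }
}
```

For example, a body size of 300 needs only 2 LEB bytes, so all locations after that placeholder move back by 3.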
writeDWARFSections() now takes a second param, the new
locations of instructions. It then maps debug line info from the
original offsets in the binary to the new offsets in the binary
being written.
The core logic for updating the debug line section is in
wasm-debug.cpp. It basically tracks state machine logic
both to read the existing debug lines and to emit the new
ones. I couldn't find a way to reuse LLVM code for this, but
reading LLVM's code was very useful here.
A final tricky thing we need to do is to update the DWARF
section's internal size annotation. The LLVM YAML writing
code doesn't do that for us. Luckily it's pretty easy, in
fixEmittedSection we just update the first 4 bytes in place
to have the section size, after we've emitted it and know
the size.
This ignores debug lines with a 0 in the line, col, or addr,
see WebAssembly/debugging#9 (comment)
This ignores debug line offsets into the middle of
instructions, which LLVM sometimes emits for some
reason, see WebAssembly/debugging#9 (comment)
Handling that would likely at least double our memory
usage, which is unfortunate - we are run in an LTO manner,
where the entire app's DWARF is present, and it may be
massive. I think we should see if such odd offsets are
a bug in LLVM, and if we can fix or prevent that.
This does not emit "special" opcodes for debug lines. Those
are purely an optimization, which I wanted to leave for
later. (Even without them we decrease the size quite a lot,
btw, as many lines have 0s in them...)
This adds some testing that shows we can load and save
fib2.c and fannkuch.cpp properly. The latter includes more
than one function and has nontrivial code.
To actually emit correct offsets a few minor fixes are
done here:
* Fix the code section location tracking during reading -
the correct offset we care about is the body of the code
section, not including the section declaration and size.
* Fix wasm-stack debug line emitting. We need to update
in BinaryInstWriter::visit(), that is, right before writing
bytes for the instruction. That differs from
BinaryenIRWriter::visit which is a recursive function
that also calls the children - so the offset there would be
of the first child. For some reason that is correct with
source maps, I don't understand why, but it's wrong for
DWARF...
* Print code section offsets in hex, to match other tools.
Remove DWARFUpdate pass, which was useful for testing
temporarily, but doesn't make sense now (it just updates without
writing a binary).
cc @yurydelendik
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* Reland "Fix renaming in FixInvokeFunctionNamesWalker (#2513)"
In the previous iteration of this change we were not calling
`renameFunctions` for each of the functions we removed.
The problem manifested itself when we rename the imported function
`emscripten_longjmp_jmpbuf` to `emscripten_longjmp`. In this case the
import of `emscripten_longjmp` already exists so we remove the import of
`emscripten_longjmp_jmpbuf` but we were not correctly calling
renameFunctions to handle the rename of all the uses.
Add an additional test case to cover the failures that we saw on the
emscripten tree.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Optionally track the binary format code section offsets,
that is, when loading a binary, remember where each IR
node was read from. This is necessary for DWARF
debug info, as these are the offsets DWARF refers to.
(Note that eventually we may want to do something
else, like first read the DWARF and only then add
debug info annotations into the IR in a more LLVM-like
manner, but this is more straightforward and should be
enough to update debug lines and ranges).
This tracking adds noticeable overhead - every single
IR node adds an entry in a map - so avoid it unless
actually necessary. Specifically, if the user passes in
-g and there are actually DWARF sections in the
binary, and we are not about to remove those sections,
then we need it.
Print binary format code section offsets in text, when
printing with -g. This will help debug and test dwarf
support. It looks like
;; code offset: 0x7
as an annotation right before each node.
Also add support for -g in wasm-opt tests (unlike
a pass, it has just one - as a prefix).
Helps #2400
|
|
|
| |
This reverts commit f0a2e2c75c7bb3008f10b6edbb8dc4cfd27b7d28.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This imports LLVM code for DWARF handling. That code has the
Apache 2 license like us. It's also the same code used to
emit DWARF in the common toolchain, so it seems like a safe choice.
This adds two passes: --dwarfdump which runs the same code LLVM
runs for llvm-dwarfdump. This shows we can parse it ok, and will
be useful for debugging. And --dwarfupdate writes out the DWARF
sections (unchanged from what we read, so it just roundtrips - for
updating we need #2515).
This puts LLVM in thirdparty which is added here.
All the LLVM code is behind USE_LLVM_DWARF, which is on
by default, but off in JS for now, as it increases code size by 20%.
This current approach imports the LLVM files directly. This is not
how they are intended to be used, so it required a bunch of
local changes - more than I expected actually, for the platform-specific
stuff. For now this seems to work, so it may be good enough, but
in the long term we may want to switch to linking against libllvm.
A downside to doing that is that binaryen users would need to
have an LLVM build, and even in the waterfall builds we'd have a
problem - while we ship LLVM there anyhow, we constantly update
it, which means that binaryen would need to be on latest llvm all
the time too (which otherwise, given DWARF is quite stable, we
might not need to constantly update).
An even larger issue is that as I did this work I learned about how
DWARF works in LLVM, and while the reading code is easy to
reuse, the writing code is trickier. The main code path is heavily
integrated with the MC layer, which we don't have - we might want
to create a "fake MC layer" for that, but it sounds hard. Instead,
there is the YAML path which is used mostly for testing, and which
can convert DWARF to and from YAML and from binary. Using
the non-YAML parts there, we can convert binary DWARF to
the YAML layer's nice Info data, then convert that to binary. This
works, however, this is not the path LLVM uses normally, and it
supports only some basic DWARF sections - I had to add ranges
support, in fact. So if we need more complex things, we may end
up needing to use the MC layer approach, or consider some other
DWARF library. However, hopefully that should not affect the core
binaryen code which just calls a library for DWARF stuff.
Helps #2400
|
|
|
| |
As specified in https://github.com/WebAssembly/simd/pull/126.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This fixes https://github.com/emscripten-core/emscripten/issues/9950.
The issue only shows up when debug names are not present so most of
the changes in this CL come from disabling debug names in the lld tests.
We want to make sure that wasm-emscripten-finalize runs fine without
debug names so I think it makes most sense to test in this mode.
The actual bugfix is in wasm-emscripten.cpp as part of the
FixInvokeFunctionNamesWalker. The problem was that the name of the function
rather than its import name was being added to importRenames. This means
that when debug names were present (and the two names were the same)
we didn't see the bug.
|
|
|
|
|
|
|
|
|
| |
In normal mode we call a JS import, but we can't import from JS
in standalone mode. Instead, just trap in that case with an
unreachable. (The error reporting is not as good in this case, but
at least it catches all errors and halts, and the emitted wasm is
valid for standalone mode.)
Helps emscripten-core/emscripten#10019
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
According to the current spec, `local.tee`'s return type should be the
same as its local's type. (Discussions on whether we should change this
rule are going on in WebAssembly/reference-types#55, but here I will
assume this spec does not change. If this changes, we should change many
parts of Binaryen transformation anyway...)
But currently in Binaryen `local.tee`'s type is computed from its
value's type. This didn't make any difference in the MVP, but after we
have subtype relationship in #2451, this can become a problem. For
example:
```
(func $test (result funcref) (local $0 anyref)
(local.tee $0
(ref.func $test)
)
)
```
This shouldn't validate in the spec, but this will pass Binaryen
validation with the current `local.tee` implementation.
This makes `local.tee`'s type computed from the local's type, and makes
`LocalSet::makeTee` get a type parameter, to which we should pass
its corresponding local's type. We don't embed the local type in the
class `LocalSet` because it may increase memory size.
This also fixes the type of `local.get` to be the local type where
`local.get` and `local.set` pair is created from `local.tee`.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Function signatures were previously redundantly stored on Function
objects as well as on FunctionType objects. These two signature
representations had to always be kept in sync, which was error-prone
and needlessly complex. This PR takes advantage of the new ability of
Type to represent multiple value types by consolidating function
signatures as a pair of Types (params and results) stored on the
Function object.
Since there are no longer module-global named function types,
significant changes had to be made to the printing and emitting of
function types, as well as their parsing and manipulation in various
passes.
The C and JS APIs and their tests also had to be updated to remove
named function types.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently `none` and `unreachable` types are stored as the same empty
`{}` in src/wasm/wasm-type.cpp. This makes `Type::operator<` behave
incorrectly when given `none` and `unreachable`, because it expands both
given types and lexicographically compares them, when both of the expanded
vectors will be empty.
This was found by the fuzzer. This line in `Modder::visitExpression`
tries to retrieve candidates of the same type. Because we can't really
compare these two types, if you give `unreachable` as the key,
candidates of `none` type can be returned. This generates incorrect code
that ends up failing in validation in a very weird way.
It was hard to generate a small testcase to trigger this part because it
was found by generating fuzzed code from a random data file. But I guess
this fix is pretty straightforward.
Fixes #2512.
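The problem can be reproduced in miniature like this. It is a standalone sketch with made-up names (`Basic`, `expand`, `BuggyLess`, `lookupCandidate`), not the code in wasm-type.cpp, and it only demonstrates the bug, not the fix in this commit.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// A tiny stand-in for the type system. Hypothetical for this sketch.
enum class Basic { None, Unreachable, I32, I64 };

// Both none and unreachable expand to the same empty vector, as in the bug.
inline std::vector<Basic> expand(Basic type) {
  if (type == Basic::None || type == Basic::Unreachable) {
    return {};
  }
  return {type};
}

// Compares the expanded vectors lexicographically. Two empty vectors are
// never "less than" each other, so none and unreachable look equivalent.
struct BuggyLess {
  bool operator()(Basic a, Basic b) const { return expand(a) < expand(b); }
};

// Stores a candidate under none only, then looks up the given key.
inline std::string lookupCandidate(Basic key) {
  std::map<Basic, std::string, BuggyLess> candidates;
  candidates[Basic::None] = "none candidate";
  auto it = candidates.find(key);
  return it == candidates.end() ? "" : it->second;
}
```

Looking up `unreachable` here returns the candidate stored under `none`, which is the kind of confusion the fuzzer ran into.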
|
|
|
|
|
| |
Also fix a bug in splitting the names of the trace channels. Obviously
I can't write string.split correctly in C first time around.
|
|
|
|
|
| |
This works more like llvm's unreachable handler in that it preserves
information even in release builds.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
That was needed for the super-old wasm type system, where we allowed
(block $x
(br_if $x
(unreachable)
(nop)
)
)
That is, we differentiated "taken" branches from "named" ones (just
referred to by name, but not actually taken as it's in unreachable code).
We don't need to differentiate those any more. Remove the ReFinalize
code that considered it, and also remove the named/taken distinction in
other places.
|
|
|
| |
This is in line with modern cmake conventions and is much less SHOUTY!
|
|
|
|
|
|
| |
This means that debugging/tracing can now be enabled and controlled
centrally without managing and passing state around the codebase.
|
|
|
|
|
|
|
|
|
|
|
| |
This creates utility functions for removing module elements: removing
one element by name, and removing multiple elements using a predicate
function. And makes other parts of code use it. I think this is a
lighter-handed approach than calling `Module::updateMaps` after removing
only a part of module elements.
This also fixes a bug in the inlining pass: it didn't call
`Module::updateMaps` after removing functions. After this patch callers
don't need to additionally call it anyway.
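The idea of removal utilities that keep the lookup map in sync can be sketched as follows. `MiniModule` and `Func` are simplified stand-ins invented for this example; they are not the real `Module` API.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <memory>
#include <string>
#include <unordered_map>
#include <vector>

struct Func {
  std::string name;
};

// A tiny module with a list of functions and a name => function map that
// must always agree with the list.
class MiniModule {
public:
  std::vector<std::unique_ptr<Func>> functions;

  void addFunction(const std::string& name) {
    functions.push_back(std::unique_ptr<Func>(new Func{name}));
    functionsMap[name] = functions.back().get();
  }

  Func* getFunctionOrNull(const std::string& name) const {
    auto it = functionsMap.find(name);
    return it == functionsMap.end() ? nullptr : it->second;
  }

  // Removes one function by name.
  void removeFunction(const std::string& name) {
    removeFunctions([&](Func* func) { return func->name == name; });
  }

  // Removes all functions for which pred returns true, updating the map as
  // we go so callers do not need a separate "update maps" step.
  void removeFunctions(const std::function<bool(Func*)>& pred) {
    functions.erase(
      std::remove_if(functions.begin(), functions.end(),
                     [&](const std::unique_ptr<Func>& func) {
                       if (pred(func.get())) {
                         functionsMap.erase(func->name);
                         return true;
                       }
                       return false;
                     }),
      functions.end());
  }

private:
  std::unordered_map<std::string, Func*> functionsMap;
};
```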
|
|
|
|
|
|
|
|
|
| |
using the `$<TARGET_OBJECTS:objlib>` syntax. Use this variable when
adding `libbinaryen` as static or shared library. Additionally, use the
variable with the object files to simplify the `TARGET_LINK_LIBRARIES`
commands: add the object libraries to the sources of executables and
drop the use of our libraries in `TARGET_LINK_LIBRARIES`. (Object
libraries cannot be linked but must be used as sources. See
https://cmake.org/pipermail/cmake/2018-June/067721.html)
|
|
|
|
|
|
|
|
|
|
| |
Create a new ParallelFunctionAnalysis helper, which lets us
run in parallel on all functions and collect info from them,
without manually handling locks etc.
Use that in the binary writing code's type collection logic,
avoiding a lock for each type increment.
Also add Signature printing which was useful to debug this.
|
|
|
|
|
|
|
|
|
| |
We were only updating the imported Function's type name
field and failing to update its params and results. This caused the
binary writer to start using the wrong types after #2466.
This PR fixes the code to update both type representations on the
imported function. This double bookkeeping will be removed entirely in
an upcoming PR.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The current `<<` operator on `Literal` prints `[type].const` with it. But
`[type].const` is rather an instruction than a literal itself, and
printing it with the literals makes less sense when we later have
literals whose types don't have `const` instructions (such as reference
types).
This patch
- Makes `<<` operator on `Literal` print only its value
- Makes wasm-shell's shell interface comply with the spec interpreter's
printing format (`value : type`).
- Prints wasm-shell's `[trap]` message to stderr
These make all `fix_` routines for spec tests in check.py unnecessary.
|
|
|
|
|
| |
(#2474)
This reverts commit bf8f36c31c0b8e6213bce840be66937dd6d0f6af.
|
|
|
|
|
|
|
|
|
| |
This is the start of a larger refactoring to remove FunctionType entirely and
store types and signatures directly on the entities that use them. This PR
updates BrOnExn and Events to remove their use of FunctionType and makes the
BinaryWriter traverse the module and collect types rather than using the global
FunctionType list. While we are collecting types, we also sort them by frequency
as an optimization. Remaining uses of FunctionType in Function, CallIndirect,
and parsing will be removed in a future PR.
|
|
|
|
|
|
|
|
|
|
|
|
| |
* Transform libraries created in subdirectories from statically linked
libraries to CMake object libraries.
* Link object libraries as `PRIVATE` to `libbinaryen`.
According to CMake documentation: "Libraries and targets following
PRIVATE are linked to, but are not made part of the link interface."
This is exactly what we want, as we only want the C API to be part of
the interface.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Adds the ability to create multivalue types from vectors of concrete value
types. All types are transparently interned, so their representation is still a
single uint32_t. Types can be extracted into vectors of their component parts,
and all the single value types expand into vectors containing themselves.
Multivalue types are not yet used in the IR, but their creation and inspection
functionality is exposed and tested in the C and JS APIs.
Also makes common type predicates methods of Type and improves the ergonomics of
type printing.
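A minimal sketch of what transparent interning could look like. The `TypeInterner` class, the choice of four basic IDs and the lack of any locking are assumptions for this illustration only and do not reflect the actual implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Maps each distinct list of component types to a single uint32_t ID.
// IDs 0..3 stand for four basic value types (a made-up numbering).
class TypeInterner {
public:
  TypeInterner() {
    // Register the basic types first, so that basic ID i expands to {i}.
    const uint32_t numBasic = 4;
    for (uint32_t id = 0; id < numBasic; id++) {
      uint32_t assigned = intern({id});
      assert(assigned == id);
      (void)assigned;
    }
  }

  // Returns the ID for this list of components, creating it if needed.
  uint32_t intern(const std::vector<uint32_t>& components) {
    auto it = ids.find(components);
    if (it != ids.end()) {
      return it->second;
    }
    uint32_t id = uint32_t(lists.size());
    lists.push_back(components);
    ids.emplace(components, id);
    return id;
  }

  // Extracts a type into the vector of its component parts.
  const std::vector<uint32_t>& expand(uint32_t id) const {
    return lists.at(id);
  }

private:
  std::vector<std::vector<uint32_t>> lists;
  std::map<std::vector<uint32_t>, uint32_t> ids;
};
```

Interning the same list twice gives back the same ID, so comparing two types stays a plain integer comparison.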
|