| Commit message (Collapse) | Author | Age | Files | Lines |
| |
See https://github.com/WebAssembly/tool-conventions/blob/main/CodeMetadata.md for the specification.
In particular, this PR implements the following:
- Parsing code metadata sections in BinaryReader, providing appropriate callbacks that a BinaryReaderDelegate can implement:
  - BinaryReaderObjdump: show the sections in a human-readable form
  - BinaryReaderIR: add code metadata in the IR as expressions
- Parsing code metadata annotations in text format, adding them in the IR like the BinaryReaderIR does
- Writing the code metadata present in the IR in the proper sections when converting IR to binary
- Support in wasm-decompiler for showing code metadata as comments in the pseudo-code
All the features have corresponding tests.
Support for code metadata is gated through the --enable-code-metadata feature. For reading/writing in the text format, --enable-annotations is also required.
Missing features:
- Support for function-level code metadata (offset 0)
- Extensive validation in validator.cc (like making sure that all metadata instances are at the same code offset of an instruction)
| |
Now that we have C++17 we don't need our own string_view class anymore.
Depends on #1825
| |
This applies clang-format to the whole codebase.
I noticed we have .clang-format in wabt but the codebase is not very
well formatted. This kind of mass-formatting PR has fans and skeptics
because it can mess with `git blame`, but we did a similar thing in
Binaryen a few years ago (WebAssembly/binaryen#2048, which was merged in
WebAssembly/binaryen#2059) and it was not very confusing after all.
If we are ever going to format the codebase, I think it is easier to do
it in a single big PR than dozens of smaller PRs.
This is using the existing .clang-format file in this repo, which
follows the style of Chromium. If we think this does not suit the
current formatting style, we can potentially tweak .clang-format too.
For example, I noticed the current codebase puts many `case` statements
within a single line when they are short, but the current .clang-format
does not allow that.
This does not include files in src/prebuilt, because they are generated.
This also manually fixes some comment lines, because mechanically
applying clang-format to long inline comments can look weird.
I also added a clang-format check hook in the GitHub CI in #1683, which
I think can be less controversial, given that it only checks the diff.
---
After discussions, we ended up reverting many changes, especially
one-liner functions and switch-cases, which are too many to wrap in
`// clang-format off` and `// clang-format on`. I also considered fixing
`.clang-format` to allow those one-liners but it caused a larger churn
in other parts. So currently the codebase does not conform to
`.clang-format` 100%, but we decided it's fine.
| |
* wasm-decompile: Avoid trailing whitespace in data declarations
* wasm-decompile: Avoid trailing whitespace in binary operators
| |
It would previously assume the blocktype is "simple" (at most a single result value), but now also supports function signatures.
Also fixed it ignoring the validator result.
| |
Const previously stored each value as a union of bit patterns (uint32_t,
uint64_t, v128, etc). It was then extended to support cases where a NaN
value (either arithmetic or canonical) was expected.
    bool is_expected_nan;
    union {
      uint32_t u32;
      uint32_t f32_bits;
      ...
      ExpectedNan expected;
    }
With the SIMD proposal, it's possible for each lane of a f32x4 or f64x2
to be a float or an expected NaN, so this doesn't work anymore. It's
possible to move ExpectedNan out of the union, but it's a bit clumsy to
use properly:
    bool is_expected_nan[4];
    ExpectedNan expected[4];
    union { ... }
Instead, I took this as an opportunity to clean up the class a bit.
First, ExpectedNan is extended to handle the case where it is not a NaN
(i.e. not a not a number), which allows us to remove the bool. Then I
store the rest of the data as an array of `uint32_t`, and provide
accessor functions instead.
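A minimal sketch of the layout described above, for illustration only. All names here (`ConstSketch`, `set_f32`, `set_expected_nan`, and the enumerator names) are hypothetical and are not taken from the WABT sources. The sketch covers only the four 32-bit lanes of an f32x4.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical sketch; names are illustrative, not WABT's actual API.
// None means "this lane holds an ordinary value, not an expected NaN".
enum class ExpectedNan { None, Canonical, Arithmetic };

static_assert(sizeof(float) == sizeof(uint32_t), "sketch assumes 32-bit float");

class ConstSketch {
 public:
  // Store a concrete f32 in one of the four 32-bit lanes.
  void set_f32(int lane, float value) {
    assert(lane >= 0 && lane < 4);
    std::memcpy(&data_[lane], &value, sizeof(value));
    nan_[lane] = ExpectedNan::None;
  }

  // Mark a lane as "a NaN of this kind is expected here".
  void set_expected_nan(int lane, ExpectedNan kind) {
    assert(lane >= 0 && lane < 4);
    data_[lane] = 0;
    nan_[lane] = kind;
  }

  // Accessors replace direct access to union members.
  float f32(int lane) const {
    float value;
    std::memcpy(&value, &data_[lane], sizeof(value));
    return value;
  }
  uint32_t f32_bits(int lane) const { return data_[lane]; }
  ExpectedNan expected_nan(int lane) const { return nan_[lane]; }
  bool is_expected_nan(int lane) const {
    return nan_[lane] != ExpectedNan::None;
  }

 private:
  std::array<uint32_t, 4> data_{};    // raw bits, one entry per 32-bit lane
  std::array<ExpectedNan, 4> nan_{};  // value-initialized to None
};
```

Keeping the NaN expectation per lane lets a single value mix concrete floats and expected NaNs without a separate flag.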
| |
* Workaround for Cygwin build
On Cygwin, `__STRICT_ANSI__` hides the POSIX definitions. Use the
gnu++11 language standard instead.
* wasm-decompile: Silence -Wsign-compare
Silence the -Wsign-compare warning by using an unsigned literal one.
* wasm-objdump: Fix 4294967296 output on disasm
Use `%u` instead of `%lu` as we use `uint32_t` here.
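As an illustration of the last fix, here is a minimal sketch of formatting a `uint32_t` with a conversion specifier of matching width. The helper `FormatU32` is hypothetical and not part of wasm-objdump. It uses the standard `PRIu32` macro from `<cinttypes>`, whereas the commit itself uses `%u`.

```cpp
#include <cassert>
#include <cinttypes>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper: format a 32-bit value with a matching specifier.
std::string FormatU32(uint32_t value) {
  char buffer[16];
  // PRIu32 expands to the conversion that matches uint32_t on this platform,
  // so the argument width always agrees with the format string.
  int written = std::snprintf(buffer, sizeof(buffer), "%" PRIu32, value);
  assert(written > 0 && written < static_cast<int>(sizeof(buffer)));
  return buffer;
}
```

Passing a `uint32_t` to `%lu` is a mismatch wherever `unsigned long` is wider than 32 bits, which is why the specifier has to follow the argument type.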
| |
The decompiler assumes it can define a variable where it is first
assigned to, which works for almost all cases, but occasionally there
is a use of a variable outside of the scope where it was defined.
This detects that case, and makes sure that variable is pre-declared.
| |
This allows wasm .o files to have more readable names, or even final
linked modules if the linking information is preserved (with e.g.
--emit-relocs in LLD).
This is implemented as part of the WABT IR representation, so
benefits wasm2wat as well.
Names obtained this way are only set for functions if the function
doesn't also have a name in the name section, but are preferred over
the export name if there is one.
| |
This makes them easier to look up than the large integer
constants LLVM output is full of.
| |
If deriving a "struct" from load/store ops fails, the next
best thing is a typed pointer, if all accesses are to the
same type.
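The fallback rule can be sketched as follows. This is an assumption-laden illustration: the `Access` record and the function `TypedPointerType` are hypothetical and do not reflect how wasm-decompile stores its load/store information.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical record of one load/store through a given base variable.
struct Access {
  unsigned offset;
  std::string type;  // e.g. "int", "float"
};

// Returns the pointee type if every access uses the same type, so the
// variable can be shown as a typed pointer; otherwise returns nullopt.
std::optional<std::string> TypedPointerType(
    const std::vector<Access>& accesses) {
  if (accesses.empty()) return std::nullopt;
  for (const Access& access : accesses) {
    if (access.type != accesses.front().type) return std::nullopt;
  }
  return accesses.front().type;
}
```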
Also fixed some precedence related issues.
| |
- Now has an index that is relative to the type.
- Now detects the common case where the index is shifted to
produce a new base address.
| |
What was before: `block L { STATS }`
is now `{ STATS; label L: }`
or when possible just: `STATS; label L:`
The latter having no indentation at all, and thus automatically
flattening all `br_table` nestings and other common patterns.
It was initially attempted to create a proper switch out of `br_table`,
but the typical LLVM output is so intertwined (with br/br_if jumping
in and out of the br_table targets etc) that a switch could have only
been cleanly applied to a small subset of cases. The current
simple label flattening works with all wasm code equally, but is
a bit more low level.
Also rename `break` into `goto`, reflecting what it is really doing.
Though here, `goto` only ever jumps downwards; backwards jumps to the
`loop` construct are still called `continue`.
| |
This outputs some more WABT IR node types with special purpose
syntax, rather than the default catch-all of a function call.
Still incomplete (especially for >MVP), more later.
Reworking br_table will be a separate PR.
| |
Previously it would simply bracket all binary exps. Now it has a
precedence system that is in line with what people know from most
programming languages.
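A minimal sketch of precedence-aware bracketing, assuming a hypothetical `Rendered` record and made-up precedence constants (none of these names come from wasm-decompile). An operand is wrapped in parentheses only when it binds less tightly than its position requires.

```cpp
#include <cassert>
#include <string>

// Hypothetical rendered expression: its text plus the precedence of its
// outermost operator (higher binds tighter; atoms use the highest value).
struct Rendered {
  std::string text;
  int precedence;
};

constexpr int kAtom = 100;
constexpr int kMul = 20;
constexpr int kAdd = 10;

// Wrap a child in parentheses only if it binds less tightly than required.
std::string Wrap(const Rendered& child, int min_precedence) {
  if (child.precedence < min_precedence) return "(" + child.text + ")";
  return child.text;
}

// Combine two operands with a left-associative binary operator.
Rendered Binary(const Rendered& lhs, const std::string& op, int precedence,
                const Rendered& rhs) {
  // The right operand must bind strictly tighter, so that a - (b - c)
  // keeps its parentheses while (a - b) - c drops them.
  return {Wrap(lhs, precedence) + " " + op + " " + Wrap(rhs, precedence + 1),
          precedence};
}
```

With this rule, `(a + b) * c` keeps its brackets while `a + b * c` is printed without any.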
| |
The previous implementation was too simplistic, as it didn't do the
renaming at the correct location (such that it can catch all
occurrences), and was also ineffective at cutting down gigantic
STL signatures to something manageable. This version creates more
usable identifiers in almost all cases.
| |
This generally can't work, since the local in question may still be
used after the block, but in this case was also causing some bad
corruption of the exp_stack (thanks, vector::emplace, for not
asserting on values out of range).
Also refactored the affected code to make it easier to debug.
| |
The code had 3 ways of doing string composition:
- Using + and += on string/string_view
- ostringstream
- wabt::Stream
Of these, the first was by far the most widely used, simply
because decompilation is a hierarchical process, which requires
storing intermediate strings before knowing what surrounds them
(thus unsuitable for streams).
To make the code more uniform, everything was converted to use
the first approach. To not get further performance degradations,
some more efficient concatenation methods were added, that also
work with wabt::string_view.
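One way such a concatenation helper could look, as a sketch only. The name `Concat` is hypothetical, and C++17 `std::string_view` stands in for `wabt::string_view` here. The idea is to compute the total length up front and allocate once, rather than creating a temporary for every `+`.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Hypothetical helper: concatenate any number of string-like pieces with a
// single allocation, instead of a chain of operator+ temporaries.
template <typename... Pieces>
std::string Concat(const Pieces&... pieces) {
  std::string result;
  // Sum the lengths of all pieces first so the buffer is sized only once.
  result.reserve((std::string_view(pieces).size() + ... + 0));
  // Append each piece in order.
  (result.append(std::string_view(pieces)), ...);
  return result;
}
```

The pieces can be any mix of string literals, `std::string`, and `std::string_view`, for example `Concat("if (", cond, ") ", body)`.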
| |
This tries to make code more readable by summarizing patterns of
load/store ops into "struct" declarations.
Initial version, can probably be improved, but has all essentials
of the idea in place.
| |
* wasm-decompile: Output of other sections + import/export.
This now outputs data, memories, globals, tables, and import/export of
these (and functions).
Changed the syntax to be more consistent and refactored how it is
checked.
* code-review fixes
* Fixed printf format warning.
| |
For example: multi-value, and void exps while there are non-void
exps on the stack.
It now uses temp variables instead of pseudo push/pop, as the latter
weren't particularly readable and had an ordering problem that was
hard to make intuitive.
The new system covers all possible situations, generates as few
variables as possible, has clearer comments, and tests.
| |
These are pretty minimal, more will be added as part of feature-PRs.
| |
This will pave the way for a better multi-pass analysis that
can collect information for the final output pass.
| |
global myvar:type = initializer; // At file scope.
var myvar = initializer; // Local, in a function.
Also takes care of lifting these out to function level if they
occur inside an exp, or make use of an uninitialized local.
| |
br becomes break or continue, and br_if the same, but prefixed
by an if(..).
These refer to the enclosing loop/block by generated label name,
this syntax may change.
We may later want to replace these by while/do-while/switch and
other special cases, but for now this is decently readable.
Also added ; to statements.
| |
Previously, this was just the last value(s) of a block.
| |
This really de-tangles the code, as in-line assignments are
hard to read.
To make this possible, I had to track the current stack depth, take
into account unreachable paths and a few other support features.
Also added debug output upon assert.
| |
This was initially using the same names generated for the wat
format. Modified the GenerateNames function slightly to allow
alpha based names with no $ prefix, which appears to make for less
"noisy" looking code.
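To illustrate what alpha-based names without a `$` prefix might look like, here is a sketch that maps an index to a short alphabetic name. The function `AlphaName` and its exact numbering scheme are assumptions for illustration; the scheme actually used by GenerateNames may differ.

```cpp
#include <cassert>
#include <string>

// Hypothetical sketch: map an index to an alphabetic name with no "$"
// prefix: 0 -> "a", 25 -> "z", 26 -> "aa", 27 -> "ab", ...
std::string AlphaName(unsigned index) {
  std::string name;
  for (;;) {
    // Prepend the letter for the current least-significant position.
    name.insert(name.begin(), static_cast<char>('a' + index % 26));
    if (index < 26) break;
    // Move to the next position; the -1 makes "aa" follow "z".
    index = index / 26 - 1;
  }
  return name;
}
```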
|
* [WIP] Added initial skeleton code for wasm-decompile.
* Code review changes.
|