path: root/src/decompiler.cc
Commit message | Author | Age | Files | Lines
* Add initial support for code metadata (#1840) | Yuri Iozzelli | 2022-02-25 | 1 | -0/+6
  See https://github.com/WebAssembly/tool-conventions/blob/main/CodeMetadata.md for the specification.
  In particular, this PR implements the following:
  - Parsing code metadata sections in BinaryReader, providing appropriate callbacks that a BinaryReaderDelegate can implement:
    - BinaryReaderObjdump: show the sections in a human-readable form
    - BinaryReaderIR: add code metadata in the IR as expressions
  - Parsing code metadata annotations in text format, adding them in the IR like the BinaryReaderIR does
  - Writing the code metadata present in the IR in the proper sections when converting IR to binary
  - Support in wasm-decompile for showing code metadata as comments in the pseudo-code
  All the features have corresponding tests.
  Support for code metadata is gated through the --enable-code-metadata feature. For reading/writing in the text format, --enable-annotations is also required.
  Missing features:
  - Support for function-level code metadata (offset 0)
  - Extensive validation in validator.cc (like making sure that all metadata instances are at the same code offset as an instruction)
* Use C++17 string_view (#1826) | Sam Clegg | 2022-02-11 | 1 | -11/+11
  Now that we have C++17 we don't need our own string_view class anymore.
  Depends on #1825
* Clang-format codebase (#1684) | Heejin Ahn | 2021-12-20 | 1 | -72/+91
  This applies clang-format to the whole codebase.
  I noticed we have .clang-format in wabt but the codebase is not very well formatted. This kind of mass-formatting PR has fans and skeptics because it can mess with `git blame`, but we did a similar thing in Binaryen a few years ago (WebAssembly/binaryen#2048, which was merged in WebAssembly/binaryen#2059) and it was not very confusing after all.
  If we are ever going to format the codebase, I think it is easier to do it in a single big PR than dozens of smaller PRs.
  This is using the existing .clang-format file in this repo, which follows the style of Chromium. If we think this does not suit the current formatting style, we can potentially tweak .clang-format too. For example, I noticed the current codebase puts many `case` statements within a single line when they are short, but the current .clang-format does not allow that.
  This does not include files in src/prebuilt, because they are generated. This also manually fixes some comment lines, because mechanically applying clang-format to long inline comments can look weird.
  I also added a clang-format check hook in the GitHub CI in #1683, which I think can be less controversial, given that it only checks the diff.
  ---
  After discussions, we ended up reverting many changes, especially one-liner functions and switch-cases, which are too many to wrap in `// clang-format off` and `// clang-format on`. I also considered fixing `.clang-format` to allow those one-liners but it caused a larger churn in other parts. So currently the codebase does not conform to `.clang-format` 100%, but we decided it's fine.
* wasm-decompile: Avoid trailing whitespace (#1714) | relrelb | 2021-09-27 | 1 | -7/+10
  * wasm-decompile: Avoid trailing whitespace in data declarations
  * wasm-decompile: Avoid trailing whitespace in binary operators
* Added initial "memory64" proposal support (#1500) | Wouter van Oortmerssen | 2020-08-07 | 1 | -1/+1
* [decompiler] fixed blocks with params. (#1497) | Wouter van Oortmerssen | 2020-07-23 | 1 | -2/+2
  It would previously assume the blocktype is "simple" (at most a single result value), but now also supports function signatures.
  Also fixed the decompiler ignoring the validator result.
* Refactor Const struct's internal storage (#1356) | Ben Smith | 2020-03-16 | 1 | -10/+9
  Const previously stored each value as a union of bit patterns (uint32_t, uint64_t, v128, etc). It was then extended to support cases where a NaN value (either arithmetic or canonical) was expected.
      bool is_expected_nan;
      union {
        uint32_t u32;
        uint32_t f32_bits;
        ...
        ExpectedNan expected;
      }
  With the SIMD proposal, it's possible for each lane of a f32x4 or f64x2 to be a float or an expected NaN, so this doesn't work anymore. It's possible to move ExpectedNan out of the union, but it's a bit clumsy to use properly:
      bool is_expected_nan[4];
      ExpectedNan expected[4];
      union {
        ...
      }
  Instead, I took this as an opportunity to clean up the class a bit. First, ExpectedNan is extended to handle the case where it is not a NaN (i.e. not a not a number), which allows us to remove the bool. Then I store the rest of the data as an array of `uint32_t`, and provide accessor functions instead.
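The storage layout described in this commit can be sketched roughly as follows. This is only an illustration under stated assumptions: the class name `LaneConst`, its accessors, and the `ExpectedNan` enumerators are hypothetical and are not taken from WABT's actual `Const` definition.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical names; a sketch of the idea, not WABT's actual Const class.
// "None" plays the role of "not an expected NaN", replacing the old bool.
enum class ExpectedNan : uint8_t { None, Canonical, Arithmetic };

class LaneConst {
 public:
  // Store the bit pattern of `value` in one of the four 32-bit lanes.
  void set_f32(int lane, float value) {
    assert(lane >= 0 && lane < 4);
    static_assert(sizeof(float) == sizeof(uint32_t), "f32 must be 32 bits");
    std::memcpy(&data_[lane], &value, sizeof(value));
    nan_[lane] = ExpectedNan::None;
  }

  // Mark a lane as expecting a NaN of the given kind.
  void set_expected_nan(int lane, ExpectedNan nan) {
    assert(lane >= 0 && lane < 4);
    data_[lane] = 0;
    nan_[lane] = nan;
  }

  uint32_t f32_bits(int lane) const { return data_[lane]; }
  ExpectedNan expected_nan(int lane) const { return nan_[lane]; }
  bool is_expected_nan(int lane) const {
    return nan_[lane] != ExpectedNan::None;
  }

 private:
  std::array<uint32_t, 4> data_{};    // Raw bits, one entry per 32-bit lane.
  std::array<ExpectedNan, 4> nan_{};  // Value-initialized, i.e. all None.
};
```

Keeping the NaN expectation per lane is what lets a single f32x4 constant mix concrete floats with expected NaNs, which the single-bool layout could not express.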
* Cygwin build fixes (#1332) | okuoku | 2020-02-11 | 1 | -1/+1
  * Workaround for Cygwin build
    On Cygwin, POSIX definitions are hidden when `__STRICT_ANSI__` is defined. Use the gnu++11 language mode instead.
  * wasm-decompile: Silence -Wsign-compare
    Silence the -Wsign-compare warning by using an unsigned literal one.
  * wasm-objdump: Fix 4294967296 output on disasm
    Use `%u` instead of `%lu` as we use `uint32_t` here.
* wasm-decompile: escape hatch for variables used outside scope. (#1322) | Wouter van Oortmerssen | 2020-01-30 | 1 | -7/+18
  The decompiler assumes it can define a variable where it is first assigned to, which works for almost all cases, but occasionally there is a use of a variable outside of the scope where it was defined. This detects that case, and makes sure that variable is pre-declared.
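One way to model the check described above is to record the block in which a variable is first assigned and compare it with the block of each later use. The sketch below assumes a simple parent-linked block tree; the names `Block`, `IsWithin`, and `NeedsPreDecl` are hypothetical and do not come from the decompiler's source.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model: each block records its parent; -1 marks the function body.
struct Block {
  int parent;
};

// True if `block` is `ancestor` itself or is nested somewhere inside it.
bool IsWithin(const std::vector<Block>& blocks, int block, int ancestor) {
  for (int b = block; b != -1; b = blocks[static_cast<std::size_t>(b)].parent) {
    if (b == ancestor) return true;
  }
  return false;
}

// A variable defined (first assigned) in `def_block` must be pre-declared at
// function level if it is used in a block that is not inside `def_block`.
bool NeedsPreDecl(const std::vector<Block>& blocks, int def_block, int use_block) {
  assert(def_block >= 0 && use_block >= 0);
  return !IsWithin(blocks, use_block, def_block);
}
```

With blocks `{{-1}, {0}, {0}}` (a function body with two sibling child blocks), a variable defined in block 1 and used in block 0 or block 2 would need pre-declaring, while one defined in block 0 would not.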
* wasm-decompile: use symbols from linking section for names. (#1318) | Wouter van Oortmerssen | 2020-01-27 | 1 | -12/+20
  This allows wasm .o files to have more readable names, or even final linked modules if the linking information is preserved (with e.g. --emit-relocs in LLD).
  This is implemented as part of the WABT IR representation, so benefits wasm2wat as well.
  Names obtained this way are only set for functions if the function doesn't also have a name in the name section, but are preferred over the export name if there is one.
* wasm-decompile: absolute accesses refer to data segments (#1302) | Wouter van Oortmerssen | 2020-01-16 | 1 | -2/+34
  This makes them easier to look up than the large integer constants LLVM output is full of.
* wasm-decompile: wrap data declarations. (#1298) | Wouter van Oortmerssen | 2020-01-13 | 1 | -2/+17
* wasm-decompile: support for pointers to single types. (#1296) | Wouter van Oortmerssen | 2020-01-10 | 1 | -19/+32
  If deriving a "struct" from load/store ops fails, the next best thing is a typed pointer, if all accesses are to the same type.
  Also fixed some precedence related issues.
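The fallback rule in this commit ("a typed pointer, if all accesses are to the same type") amounts to a uniformity check over the recorded accesses. A minimal sketch, assuming a hypothetical `Access` record with a textual type name (neither is taken from the decompiler's source):

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical record of one load/store through a given base variable.
struct Access {
  uint32_t offset;   // Byte offset from the base address.
  std::string type;  // Type name used by the load/store, e.g. "int".
};

// Returns the common type if every access uses the same one, so the base
// can be shown as a pointer to that type; returns nullopt otherwise.
std::optional<std::string> SingleAccessType(const std::vector<Access>& accesses) {
  if (accesses.empty()) return std::nullopt;
  const std::string& first = accesses.front().type;
  for (const Access& a : accesses) {
    if (a.type != first) return std::nullopt;
  }
  return first;
}
```

Accesses of `int` at offsets 0 and 8 would yield `int`, while a mix of `int` and `float` yields no single type and the caller would have to fall back to a more generic form.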
* wasm-decompile: friendlier general load/store ops. (#1284) | Wouter van Oortmerssen | 2020-01-09 | 1 | -8/+54
  - Now has an index that is relative to the type.
  - Now detects the common case where the index is shifted to produce a new base address.
* wasm-decompile: blocks now represented as labels (#1282) | Wouter van Oortmerssen | 2020-01-06 | 1 | -18/+30
  What was before: `block L { STATS }` is now `{ STATS; label L: }`, or when possible just: `STATS; label L:`
  The latter has no indentation at all, and thus automatically flattens all `br_table` nestings and other common patterns.
  It was initially attempted to create a proper switch out of `br_table`, but the typical LLVM output is so intertwined (with br/br_if jumping in and out of the br_table targets etc) that a switch could only have been cleanly applied to a small subset of cases. The current simple label flattening works with all wasm code equally, but is a bit more low level.
  Also rename `break` into `goto`, reflecting what it is really doing. Here, though, `goto` only ever jumps downwards; backwards jumps to the `loop` construct are still called `continue`.
* wasm-decompile: supporting some more node types specifically. (#1279) | Wouter van Oortmerssen | 2020-01-06 | 1 | -1/+64
  This outputs some more WABT IR node types with special purpose syntax, rather than the default catch-all of a function call.
  Still incomplete (especially for >MVP), more later. Reworking br_table will be a separate PR.
* wasm-decompile: added precedence support. (#1277) | Wouter van Oortmerssen | 2020-01-02 | 1 | -36/+80
  Previously it would simply bracket all binary exps. Now it has a precedence system that is in line with what people know from most programming languages.
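The change from "bracket everything" to precedence-aware output can be illustrated with a small sketch. The precedence levels and the names `Prec`, `Value`, `BracketIfNeeded`, and `Binary` are assumptions made for this example; the decompiler's real precedence table is not reproduced here.

```cpp
#include <cassert>
#include <string>

// Hypothetical precedence levels, lowest binding first.
enum class Prec { Assign = 0, Compare, Add, Multiply, Atom };

// A piece of output text together with the precedence of its outermost operator.
struct Value {
  std::string text;
  Prec prec;
};

// Wrap in parentheses only if the operand binds less tightly than required.
std::string BracketIfNeeded(const Value& v, Prec required) {
  return v.prec < required ? "(" + v.text + ")" : v.text;
}

// Compose a left-associative binary expression.
Value Binary(const Value& lhs, const std::string& op, Prec prec, const Value& rhs) {
  assert(prec != Prec::Atom);  // Atom is reserved for leaves.
  // The right operand must bind strictly tighter, so `a - (b - c)` keeps its brackets.
  Prec rhs_required = static_cast<Prec>(static_cast<int>(prec) + 1);
  return {BracketIfNeeded(lhs, prec) + " " + op + " " + BracketIfNeeded(rhs, rhs_required),
          prec};
}
```

Under these assumptions, multiplying the sum of `a` and `b` by `c` prints as `(a + b) * c`, whereas adding `c` to the product of `a` and `b` prints as `a * b + c` with no brackets.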
* wasm-decompile: overhauled name filtering. (#1272) | Wouter van Oortmerssen | 2019-12-23 | 1 | -2/+5
  The previous implementation was too simplistic, as it didn't do the renaming at the correct location (such that it can catch all occurrences), and was also very ineffective at cutting down gigantic STL signatures to something manageable.
  This version creates more usable identifiers in almost all cases.
* wasm-decompile: fixed PreDecl being added to nested blocks. (#1271) | Wouter van Oortmerssen | 2019-12-19 | 1 | -7/+11
  This generally can't work, since the local in question may still be used after the block, but in this case was also causing some bad corruption of the exp_stack (thanks, vector::emplace, for not asserting on values out of range).
  Also refactored the affected code to be easier to debug.
* wabt-decompile: cleaned up string composition. (#1265) | Wouter van Oortmerssen | 2019-12-13 | 1 | -147/+118
  The code had 3 ways of doing string composition:
  - Using + and += on string/string_view
  - ostringstream
  - wabt::Stream
  Of these, the first was by far the most widely used, simply because decompilation is a hierarchical process, which requires storing intermediate strings before knowing what surrounds them (thus unsuitable for streams).
  To make the code more uniform, everything was converted to use the first approach. To not get further performance degradations, some more efficient concatenation methods were added, that also work with wabt::string_view.
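A "more efficient concatenation method" of the kind mentioned here typically computes the total length first and allocates once, instead of growing through repeated `+`. The sketch below is an assumption about the general technique, not the helper added in this commit: the name `Concat` is hypothetical, and `std::string_view` stands in for `wabt::string_view`.

```cpp
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <string>
#include <string_view>

// Hypothetical helper: join several pieces with a single allocation.
std::string Concat(std::initializer_list<std::string_view> parts) {
  std::size_t total = 0;
  for (std::string_view p : parts) total += p.size();

  std::string out;
  out.reserve(total);  // One allocation up front instead of one per `+`.
  for (std::string_view p : parts) out.append(p.data(), p.size());
  return out;
}
```

Because the parts are views, both string literals and existing `std::string` values can be passed without copying them first, for example `Concat({"if (", cond, ") { }"})` where `cond` is a `std::string`.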
* wasm-decompile: Load/Store tracking for struct output. (#1258) | Wouter van Oortmerssen | 2019-12-09 | 1 | -59/+54
  This tries to make code more readable by summarizing patterns of load/store ops into "struct" declarations.
  Initial version, can probably be improved, but has all essentials of the idea in place.
* wasm-decompile: Output of other sections + import/export. (#1233) | Wouter van Oortmerssen | 2019-11-25 | 1 | -10/+97
  * wasm-decompile: Output of other sections + import/export.
    This now outputs data, memories, globals, tables, and import/export of these (and functions).
    Changed the syntax to be more consistent and refactored how it is checked.
  * code-review fixes
  * Fixed printf format warning.
* wasm-decompile: reworked how "stacky" code gets decompiled. (#1205) | Wouter van Oortmerssen | 2019-11-07 | 1 | -7/+20
  For example: multi-value, and void exps while there are non-void exps on the stack.
  It now uses temp variables instead of pseudo push/pop, as the latter weren't particularly readable and had an ordering problem that was hard to make intuitive.
  The new system covers all possible situations, generates as few variables as possible, has clearer comments, and tests.
* wasm-decompile: Added initial tests. (#1195) | Wouter van Oortmerssen | 2019-10-24 | 1 | -12/+14
  These are pretty minimal, more will be added as part of feature-PRs.
* wasm-decompile: improved Return & Drop. (#1192) | Wouter van Oortmerssen | 2019-10-18 | 1 | -0/+8
* wasm-decompile: Refactored code to first build an AST. (#1191) | Wouter van Oortmerssen | 2019-10-18 | 1 | -269/+164
  This will pave the way for a better multi-pass analysis that can collect information for the final output pass.
* wasm-decompile: Declaring local and global vars. (#1173) | Wouter van Oortmerssen | 2019-09-25 | 1 | -12/+54
      global myvar:type = initializer;  // At file scope.
      var myvar = initializer;  // Local, in a function.
  Also takes care of lifting these out to function level if these happen inside an exp, or make use of an uninitialized local.
* wasm-decompile: Implement br/br_if (#1172) | Wouter van Oortmerssen | 2019-09-25 | 1 | -12/+35
  br becomes break or continue, and br_if the same, but prefixed by an if(..). These refer to the enclosing loop/block by generated label name; this syntax may change.
  We may later want to replace these by while/do-while/switch and other special cases, but for now this is decently readable.
  Also added ; to statements.
* wasm-decompile: added explicit return statements and types. (#1170) | Wouter van Oortmerssen | 2019-09-25 | 1 | -51/+73
  Previously, this was just the last value(s) of a block.
* wasm-decompile: Output Tee as Set+Get instead if possible. (#1165) | Wouter van Oortmerssen | 2019-09-25 | 1 | -29/+77
  This really de-tangles the code, as in-line assignments are hard to read.
  To make this possible, I had to track the current stack depth, take into account unreachable paths and a few other support features.
  Also added debug output upon assert.
* wasm-decompile: Improved naming. (#1163) | Wouter van Oortmerssen | 2019-09-23 | 1 | -4/+32
  This was initially using the same names generated for the wat format. Modified the GenerateNames function slightly to allow alpha based names with no $ prefix, which appears to make for less "noisy" looking code.
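As an illustration of what "alpha based names with no $ prefix" could look like, the sketch below maps an index to a short lowercase name (`a`, `b`, ..., `z`, `aa`, `ab`, ...). This is one possible scheme chosen for the example; the function name `AlphaName` is hypothetical and the scheme is not claimed to match what GenerateNames actually produces.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical scheme: 0 -> "a", 25 -> "z", 26 -> "aa", 27 -> "ab", ...
std::string AlphaName(uint32_t index) {
  std::string name;
  do {
    // Prepend the letter for the lowest base-26 digit.
    name.insert(name.begin(), static_cast<char>('a' + index % 26));
    index /= 26;
    // Subtracting one per extra letter makes "aa" follow "z" directly.
  } while (index-- > 0);
  return name;
}
```

Compared with wat-style names such as `$var0`, names of this form carry no sigil or numeric suffix, which is the kind of reduction in visual noise the commit describes.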
* Decompiler: Recurse into block exprs (#1161) | Jacob Gravelle | 2019-09-16 | 1 | -0/+9
* [WIP] Added initial skeleton code for wasm-decompile. (#1155) | Wouter van Oortmerssen | 2019-09-12 | 1 | -0/+436
  * [WIP] Added initial skeleton code for wasm-decompile.
  * Code review changes.