-rw-r--r-- README.md | 53
-rw-r--r-- docs/decompiler.md | 169
-rw-r--r-- man/wasm-decompile.1 | 55
3 files changed, 260 insertions, 17 deletions
@@ -9,6 +9,7 @@ WABT (we pronounce it "wabbit") is a suite of tools for WebAssembly, including:
  - [**wasm2wat**](https://webassembly.github.io/wabt/doc/wasm2wat.1.html): the inverse of wat2wasm, translate from the binary format back to the text format (also known as a .wat)
  - [**wasm-objdump**](https://webassembly.github.io/wabt/doc/wasm-objdump.1.html): print information about a wasm binary. Similiar to objdump.
  - [**wasm-interp**](https://webassembly.github.io/wabt/doc/wasm-interp.1.html): decode and run a WebAssembly binary file using a stack-based interpreter
+ - [**wasm-decompile**](https://webassembly.github.io/wabt/doc/wasm-decompile.1.html): decompile a wasm binary into readable C-like syntax.
  - [**wat-desugar**](https://webassembly.github.io/wabt/doc/wat-desugar.1.html): parse .wat text form as supported by the spec interpreter (s-expressions, flat syntax, or mixed) and print "canonical" flat format
  - [**wasm2c**](https://webassembly.github.io/wabt/doc/wasm2c.1.html): convert a WebAssembly binary file to a C source and header
  - [**wasm-strip**](https://webassembly.github.io/wabt/doc/wasm-strip.1.html): remove sections of a WebAssembly binary file
@@ -104,8 +105,8 @@ executable called `wasm2c` which conflicts with the `wasm2c` directory. On some
 systems (typically macOS), this doesn't build properly. If you see these errors,
 you can build using CMake directly as described above.
 
-You'll need [CMake](https://cmake.org). If you just run `make`, it will run CMake for you,
-and put the result in `out/clang/Debug/` by default:
+You'll need [CMake](https://cmake.org). If you just run `make`, it will run CMake for you,
+and put the result in `bin/clang/Debug/` by default:
 
 > Note: If you are on macOS, you will need to use CMake version 3.2 or higher
 
@@ -189,24 +190,24 @@ Some examples:
 
 ```sh
 # parse and typecheck test.wat
-$ out/wat2wasm test.wat
+$ bin/wat2wasm test.wat
 
 # parse test.wat and write to binary file test.wasm
-$ out/wat2wasm test.wat -o test.wasm
+$ bin/wat2wasm test.wat -o test.wasm
 
 # parse spec-test.wast, and write verbose output to stdout (including the
 # meaning of every byte)
-$ out/wat2wasm spec-test.wast -v
+$ bin/wat2wasm spec-test.wast -v
 
 # parse spec-test.wast, and write files to spec-test.json. Modules are written
 # to spec-test.0.wasm, spec-test.1.wasm, etc.
-$ out/wast2json spec-test.wast -o spec-test.json
+$ bin/wast2json spec-test.wast -o spec-test.json
 ```
 
 You can use `--help` to get additional help:
 
 ```console
-$ out/wat2wasm --help
+$ bin/wat2wasm --help
 ```
 
 Or try the [online demo](https://webassembly.github.io/wabt/demo/wat2wasm/).
@@ -217,16 +218,16 @@ Some examples:
 
 ```sh
 # parse binary file test.wasm and write text file test.wat
-$ out/wasm2wat test.wasm -o test.wat
+$ bin/wasm2wat test.wasm -o test.wat
 
 # parse test.wasm and write test.wat
-$ out/wasm2wat test.wasm -o test.wat
+$ bin/wasm2wat test.wasm -o test.wat
 ```
 
 You can use `--help` to get additional help:
 
 ```console
-$ out/wasm2wat --help
+$ bin/wasm2wat --help
 ```
 
 Or try the [online demo](https://webassembly.github.io/wabt/demo/wasm2wat/).
@@ -237,28 +238,46 @@ Some examples:
 
 ```sh
 # parse binary file test.wasm, and type-check it
-$ out/wasm-interp test.wasm
+$ bin/wasm-interp test.wasm
 
 # parse test.wasm and run all its exported functions
-$ out/wasm-interp test.wasm --run-all-exports
+$ bin/wasm-interp test.wasm --run-all-exports
 
 # parse test.wasm, run the exported functions and trace the output
-$ out/wasm-interp test.wasm --run-all-exports --trace
+$ bin/wasm-interp test.wasm --run-all-exports --trace
 
 # parse test.json and run the spec tests
-$ out/wasm-interp test.json --spec
+$ bin/wasm-interp test.json --spec
 
 # parse test.wasm and run all its exported functions, setting the value stack
 # size to 100 elements
-$ out/wasm-interp test.wasm -V 100 --run-all-exports
+$ bin/wasm-interp test.wasm -V 100 --run-all-exports
 ```
 
 You can use `--help` to get additional help:
 
 ```console
-$ out/wasm-interp --help
+$ bin/wasm-interp --help
 ```
 
+## Running wasm-decompile
+
+For example:
+
+```sh
+# parse binary file test.wasm and write text file test.dcmp
+$ bin/wasm-decompile test.wasm -o test.dcmp
+```
+
+You can use `--help` to get additional help:
+
+```console
+$ bin/wasm-decompile --help
+```
+
+See [decompiler.md](docs/decompiler.md) for more information on the language
+being generated.
+
 ## Running wasm2c
 
 See [wasm2c.md](wasm2c/README.md)
@@ -283,7 +302,7 @@ There are configurations for the Address Sanitizer (ASAN), Memory Sanitizer
 (MSAN), Leak Sanitizer (LSAN) and Undefine Behavior Sanitizer (UBSAN). You can
 read about the behaviors of the sanitizers in the link above, but essentially
 the Address Sanitizer finds invalid memory accesses (use after free, access
-out-of-bounds, etc.), Memory Sanitizer finds uses of uninitialized memory,
+out-of-bounds, etc.), Memory Sanitizer finds uses of uninitialized memory,
 the Leak Sanitizer finds memory leaks, and the Undefined Behavior Sanitizer
 finds undefined behavior (surprise!).
diff --git a/docs/decompiler.md b/docs/decompiler.md
new file mode 100644
index 00000000..5dcd78c8
--- /dev/null
+++ b/docs/decompiler.md
@@ -0,0 +1,169 @@
+# wasm-decompile
+
+Decompiles binary wasm modules into a text format that is significantly
+more compact and familiar (for users of C-style languages).
+
+Example:
+
+`bin/wasm-decompile test.wasm -o test.dcmp`
+
+## Goals.
+
+This tool is aimed at users that want to be able to "read" large volumes
+of Wasm code such as language, runtime and tool developers, or any programmers
+that may not have the source code of the generated wasm available, or are
+trying to understand what the generated code does.
+
+The syntax has been designed to be as light-weight and as readable as possible,
+while still allowing one to see the underlying Wasm constructs clearly.
+
+## Non-goals.
+
+Be a programming language.
+
+Though compiling this output code back into a wasm module is possible,
+such functionality is currently not provided. The format is very low-level,
+much like Wasm itself, so even though it looks more high level than the .wat
+format, it wouldn't be any more suitable for general purpose programming.
+
+## Language guide.
+
+This section shows some aspects of the language in terms of how they map to
+Wasm and/or how they might differ from a typical C-like language. It does
+not try to define the actual semantics of Wasm; the reader is expected to
+already be mostly familiar with that.
+
+### Naming.
+
+wasm-decompile, much like wasm2wat, derives names from import/export
+declarations and the name section where possible. For things that have no
+names, names are generated starting from `a`, `b`, `c` and so forth.
+
+In addition, prefixes are used for things that are not arguments/locals:
+`f_` for functions, `g_` for globals, etc.
+
+Existing names may be generated "demangled" C++ function signatures, which
+in the case of functions using STL types may end up several hundred characters
+long. Besides removing characters not typically part of an identifier, the
+decompiler also strips common keywords/types from these in an effort to
+reduce their size.
+
+### Top level declarations.
+
+Top level items may be preceded with `import` or `export`.
+
+Memory is declared like `memory m(initial: 1, max: 0);`
+
+Globals: `global my_glob:int;`
+
+Data: `data d_a(offset: 0) = "Hello, World!";`
+
+Functions (see below for instructions that may appear between `{}`):
+`function f(a:int, b:int):int { return a + b; }`
+
+### Statements and expressions.
+
+An expression is generated for any sequence of Wasm instructions that
+leaves exactly 1 value on the stack.
+
+For instructions that leave no values on the stack, a statement is
+generated, which is an expression that sits on its own line in the context
+of a control-flow block, or the function itself. A statement may also be
+generated for expressions that return a value through control flow, such
+as a branch instruction.
+
+Instructions that leave multiple values on the stack, or otherwise do stack
+operations that break the "expression order", instead force the values to be
+written to temporary variables (named `t1`, `t2` etc.) which the subsequent
+instructions can then operate upon (this does not happen with MVP-only code).
+
+### Declaration of arguments and locals.
+
+Arguments are defined in the function signature, as shown above.
+
+Locals are defined upon first use: `var my_local:int = 1;`
+
+### Types.
+
+The decompiler uses `int` and `long` for 32-bit and 64-bit integers, and
+`float` and `double` for 32-bit and 64-bit floating point numbers.
+
+Besides these, there are the types `byte` and `ubyte` (8-bit), `short` and
+`ushort` (16-bit), and `uint`, which are used exclusively with certain
+load/store operations.
+
+### Loads and stores.
+
+These tend to be the hardest to "read" in Wasm code, as they've lost all
+context of the data structures and types the language that Wasm was compiled
+from was operating upon.
+
+wasm-decompile has a few features to try and make these more readable.
+
+The basic form looks like an array indexing operation, so `o[2]:int` says: read
+element 2 from `o` when seen as an array of ints. This thus accesses 4 bytes
+at byte-offset 8.
+
+`o` is just declared as an `int`, since there is no such thing as a pointer
+type in Wasm. But wasm-decompile tries to derive them. For example, if the
+code is doing `o[0]:int = o[1]:int + o[2]:int`, then wasm-decompile assumes
+`o` points to a struct with 3 ints, and may instead decompile this to:
+
+    var o:{ a:int, b:int, c:int };
+    o.a = o.b + o.c
+
+The `{}` type is a nameless struct declaration (named ones tbd) that gives the
+reader a hint of what kind of memory layout `o` is accessing. This seems more
+informative than just uncorrelated indices all over the code.
+
+Sadly, optimized output from a compiler like LLVM often reworks memory accesses
+in such crazy ways that this "struct detection" fails, for example it falls
+back to indexing operations when there are holes or overlaps in the memory
+layout, or types are mixed, etc. This happens even more so when locals such
+as `o` are being re-used for unrelated things in memory.
+
+Additionally, wasm-decompile tries to clean up typical indexing operations.
+For example, when accessing any array of 32-bit elements, generated Wasm
+code often looks like `(base + (index << 2))[0]:int`, since Wasm has no
+built-in way to scale the index by the type of thing being loaded.
+wasm-decompile then transforms this into just `base[index]:int`, since the
+scaling of anything between the `[]` by the type size is already implied.
+
+### Control flow.
+
+Wasm's if-then maps fairly directly to a C-like `if (c) { 1; } else { 2; }`.
+Unlike most languages, these if-thens can also be expressions, as shown in
+this example (wasm-decompile does not currently use the `?:` ternary).
+
+Wasm's loop becomes a `loop L { ...; continue L; }` structure. The
+inclusion of a label means nested loops can continue any of them.
+
+Wasm's blocks are little more than a label for forward jumps, and cause
+excessive amounts of nesting in other text formats such as .wat, so here
+they are reduced to what they naturally are: a label. This label uses `{}`
+for denoting a block only when used as an expression, so typically does not
+indent, and thus doesn't cause endless nesting:
+
+    if (c) goto L;
+    ...
+    label L:
+
+### Operator precedence.
+
+wasm-decompile uses the following operator precedence to reduce the amount of
+`()` needed in expressions, from high (needs no `()`) to low (always needs
+`()` when nested):
+
+* `()`, `a`, `1`, `a()`
+* `[]`
+* `if () {} else {}`
+* `*`, `/`, `%`
+* `+`, `-`
+* `<<`, `>>`
+* `==`, `!=`, `<`, `>`, `>=`, `<=`
+* `&`, `|`
+* `min`, `max`
+* `=`
+
+Only `+` and `*` are associative, i.e. can have multiple of them in sequence
+without additional `()`.
diff --git a/man/wasm-decompile.1 b/man/wasm-decompile.1
new file mode 100644
index 00000000..178f694d
--- /dev/null
+++ b/man/wasm-decompile.1
@@ -0,0 +1,55 @@
+.Dd $Mdocdate$
+.Dt WABT 1
+.Os
+.Sh NAME
+.Nm wasm-decompile
+.Nd translate from the binary format to readable C-like syntax
+.Sh SYNOPSIS
+.Nm wasm-decompile
+.Op options
+.Ar file
+.Sh DESCRIPTION
+.Nm
+translate from the binary format to readable C-like syntax.
+.Pp
+The options are as follows:
+.Bl -tag -width Ds
+.It Fl v , Fl Fl verbose
+Use multiple times for more info
+.It Fl Fl help
+Print a help message
+.It Fl o , Fl Fl output=FILENAME
+Output file for the decompiled file, by default use stdout
+.It Fl Fl enable-exceptions
+Experimental exception handling
+.It Fl Fl disable-mutable-globals
+Import/export mutable globals
+.It Fl Fl enable-saturating-float-to-int
+Saturating float-to-int operators
+.It Fl Fl enable-sign-extension
+Sign-extension operators
+.It Fl Fl enable-simd
+SIMD support
+.It Fl Fl enable-threads
+Threading support
+.El
+.Sh EXAMPLES
+Parse binary file test.wasm and write text file test.dcmp
+.Pp
+.Dl $ wasm-decompile test.wasm -o test.dcmp
+.Sh SEE ALSO
+.Xr wasm2wat 1 ,
+.Xr wasm-interp 1 ,
+.Xr wasm-objdump 1 ,
+.Xr wasm-opcodecnt 1 ,
+.Xr wasm-strip 1 ,
+.Xr wasm-validate 1 ,
+.Xr wasm2c 1 ,
+.Xr wast2json 1 ,
+.Xr wat-desugar 1 ,
+.Xr wat2wasm 1 ,
+.Xr spectest-interp 1
+.Sh BUGS
+If you find a bug, please report it at
+.br
+.Lk https://github.com/WebAssembly/wabt/issues .