path: root/src/tools/wasm-emscripten-finalize.cpp
Commit message | Author | Age | Files | Lines
* Make `Name` a pointer, length pair (#5122) | Thomas Lively | 2022-10-11 | 1 | -1/+0
  With the goal of supporting null characters (i.e. zero bytes) in strings. Rewrite the underlying interned `IString` to store a `std::string_view` rather than a `const char*`, reduce the number of map lookups necessary to intern a string, and present a more immutable interface.

  Most importantly, replace the `c_str()` method that returned a `const char*` with a `toString()` method that returns a `std::string`. This new method can correctly handle strings containing null characters. A `const char*` can still be had by calling `data()` on the `std::string_view`, although this usage should be discouraged.

  This change is NFC in spirit, although not in practice. It does not intend to support any particular new functionality, but it is probably now possible to use strings containing null characters in at least some cases. At least one parser bug is also incidentally fixed. Follow-on PRs will explicitly support and test strings containing nulls for particular use cases.

  The C API still uses `const char*` to represent strings. As strings containing nulls become better supported by the rest of Binaryen, this will no longer be sufficient. Updating the C and JS APIs to use pointer, length pairs is left as future work.
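Illustrative sketch (not taken from the commit): the difference between a pointer, length pair and a C string can be shown with a small stand-in type. `NameSketch` and `cStringLength` are hypothetical names for this example only; they are not Binaryen's `Name` or `IString`.

```cpp
#include <cstring>
#include <string>
#include <string_view>

// Hypothetical stand-in for an interned name stored as a pointer, length pair.
struct NameSketch {
  std::string_view str;

  // Copies every byte of the view, including embedded zero bytes.
  std::string toString() const { return std::string(str); }
};

// Length as seen through a C string, which stops at the first zero byte.
inline std::size_t cStringLength(const NameSketch& name) {
  return std::strlen(name.str.data());
}
```

For a three-byte name `a`, zero byte, `b`, `toString()` keeps all three bytes, while the C-string view sees only the first one. That is the kind of truncation the `data()` caveat above warns about.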
* Remove metadata generation from wasm-emscripten-finalize (#4863) | Sam Clegg | 2022-08-07 | 1 | -31/+3
  This is no longer needed by emscripten as of: https://github.com/emscripten-core/emscripten/pull/16529
* Remove renameMainArgcArgv from wasm-emscripten-finalize (#4700) | Sam Clegg | 2022-05-31 | 1 | -6/+0
  This part of finalize is currently unused and was added in preparation for https://reviews.llvm.org/D75277. However, the better solution for dealing with this alternative name for main is on the emscripten side. The main reason is that doing the rename here in binaryen would require finalize to always rewrite the binary, which is expensive.
* Remove unused wasm-emscripten-finalize option `--initial-stack-pointer` (#4490) | Sam Clegg | 2022-02-01 | 1 | -8/+0
* wasm-emscripten-finalize: Remove legacy --new-pic-abi option (#4483) | Sam Clegg | 2022-01-27 | 1 | -6/+0
* Add --no-emit-metadata option to wasm-emscripten-finalize (#4450) | Sam Clegg | 2022-01-19 | 1 | -3/+14
  This is useful for the case where we might want to finalize without extracting metadata.

  See: https://github.com/emscripten-core/emscripten/pull/15918
* Add categories to --help text (#4421) | Alon Zakai | 2022-01-05 | 1 | -0/+22
  The general shape of the --help output is now:

    ========================
    wasm-foo

    Does the foo operation
    ========================

    wasm-foo opts:
    --------------
      --foo-bar ..

    Tool opts:
    ----------
      ..

  The options are now in categories, with the more specific ones - most likely to be wanted by the user - first. I think this makes the list a lot less confusing. In particular, in wasm-opt all the opt passes are now in their own category.

  Also add a script to make it easy to update the help tests.
* Apply features from the commandline first (#3960) | Alon Zakai | 2021-07-02 | 1 | -2/+1
  As suggested in https://github.com/WebAssembly/binaryen/pull/3955#issuecomment-871016647

  This applies commandline features first. If the features section is present, and disallows some of them, then we warn. Otherwise, the features can combine (for example, a wasm may enable feature X because it has to use it, and a user can simply add the flag for feature Y if they want the optimizer to try to use it; both flags will then be enabled).

  This is important because in some cases we need to know the features before parsing the wasm, in the case that the wasm does not use the features section. In particular, non-nullable GC locals have an effect during parsing. (Typed function references also do, but we found a way to apply their effect all the time, that is, always use the refined type, and that happened to not break the case where the feature is disabled - but such a workaround is not possible with non-nullable locals.)

  To make this less error-prone, add a FeatureSet input as a parameter to WasmBinaryBuilder. That is, when building a module, we must give it the features to use while doing so.

  This will unblock #3955. That PR will also add a test for the actual usage of a feature during loading (the test can only be added there, after that PR unbreaks things).
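Illustrative sketch (not taken from the commit): the combining rule described above can be modelled with plain bit masks. The feature constants, `FeatureMerge`, and `mergeFeatures` are hypothetical names, not Binaryen's `FeatureSet` API. The log does not say what happens to a disallowed feature beyond the warning, so this sketch only reports the conflict and leaves the handling open.

```cpp
#include <cstdint>

// Hypothetical feature bits, for illustration only.
constexpr std::uint32_t FeatureX = 1u << 0;
constexpr std::uint32_t FeatureY = 1u << 1;

struct FeatureMerge {
  std::uint32_t enabled;   // union of command-line and section features
  std::uint32_t conflicts; // command-line features the section disallows
};

// Command-line features are applied first; features the section says it
// uses are then added on top. A non-zero `conflicts` is where a tool
// would emit its warning.
inline FeatureMerge mergeFeatures(std::uint32_t commandLine,
                                  std::uint32_t sectionUsed,
                                  std::uint32_t sectionDisallowed) {
  return {commandLine | sectionUsed, commandLine & sectionDisallowed};
}
```

With a section that uses X and a command line that adds Y, both end up enabled and no conflict is reported, matching the X/Y example in the message.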
* Remove renaming of __wasm_call_ctors (#3811) | Sam Clegg | 2021-04-15 | 1 | -9/+0
  See https://github.com/emscripten-core/emscripten/issues/13893
* wasm-emscripten-finalize: Do not skip the start function body (#3714) | Alon Zakai | 2021-03-22 | 1 | -3/+6
  When we can skip function bodies, we still need to parse the start function for the pthreads case, see details in the comments.

  This still gives us 99% of the speedup as the start function is just 1 function and it's not that big, so with this we return to full speed after the reversion in #3705.
* Revert the effect of #3689 (#3705) | Alon Zakai | 2021-03-18 | 1 | -2/+3
  That PR assumed that wasm-emscripten-finalize does not need to scan function bodies for metadata. But there is a case where it does, which is that EM_ASMs with pthreads do still require scanning of the code. So that approach is not valid.

  We could maybe disable the optimization just on pthreads, but I think major use cases need that. Also there is no simple way to disable it atm, we'd need changes on both emscripten and binaryen.

  Also that PR can no longer be reverted cleanly due to other changes.

  For all those reasons, this just disables the optimization so that users of tot are no longer broken, while we figure out a valid way to optimize this use case.
* wasm-emscripten-finalize: Do not read the Names section when not writing output (#3698) | Alon Zakai | 2021-03-18 | 1 | -1/+6
  When not writing output we don't need debug info, as it is not relevant for our metadata. This saves loading and interning all the names, which takes several seconds on massive inputs.

  This is possible in principle in other tools, but this does not change anything in them for now. (We do use names internally in some nontrivial ways without opting in to it, so that would require further refactoring. Also the other tools almost always do write an output.)

  This is not 100% unobservable. If validation fails then the validation error would just contain the function index instead of the name from the Names section if there is one. However finalize does not validate atm so that would only matter if we change that later.
* Skip function bodies in wasm-emscripten-finalize when we don't need them (#3689) | Alon Zakai | 2021-03-17 | 1 | -3/+13
  After sbc100's work on EM_ASM and EM_JS they are now parsed from the wasm using exports etc. and so we no longer need to parse function bodies. As a result if we are not emitting a wasm from wasm-emscripten-finalize then all we are doing is scanning global structures like imports and exports and emitting metadata about them. And indeed we do not need to emit a wasm in some cases, specifically when not optimizing and when using WASM_BIGINT (to avoid needing to legalize).

  We had considered skipping wasm-emscripten-finalize entirely in that situation, and instead parsing the metadata from the wasm in python on the emscripten side. However sbc100 had the brilliant idea today to just skip function bodies. That is very simple to do - no need to write another parser for wasm, and also look at how simple this PR is - and also it will be faster to run wasm-emscripten-finalize in this mode than to run python. (With the only downside that the bytes of the wasm are loaded even if they aren't parsed; but almost certainly they are in the disk cache anyhow.)

  This PR implements that idea: when wasm-emscripten-finalize knows it will not write a wasm output, it notes "skip function bodies". The binary reader then skips the bodies and places unreachables there instead (so that the wasm still validates).

  There are no new tests here because this can't be tested - by design it is an unobservable optimization. (If we could notice the bodies have been skipped, we would not have skipped them.)

  This is also why no changes are needed on the emscripten side to benefit from this speedup. Basically when binaryen sees it will not need X, it skips parsing of X automatically.

  Benchmarking speed, it is as fast as you'd expect: the wasm-emscripten-finalize step is 15x faster on SQLite (1MB of wasm) and almost 50x faster on the biggest wasm I have on my drive (40MB of LLVM). (These numbers are on release builds, without debug info - debug info makes things slower, so the speedup is lower there, and will need further work.)

  Tested manually and also on wasm0 wasm2 other on emscripten.
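Illustrative sketch (not taken from the commit): skipping is cheap because each function body in the wasm code section is prefixed by its byte size, so a reader can jump over it without decoding. The code below is a simplified stand-in, not Binaryen's binary reader. `Bytes`, `readU32Leb`, and `readCodeSection` are hypothetical names, and the input is assumed to be the section contents without the section header.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Decode an unsigned LEB128 value of at most 5 bytes, advancing `pos`.
inline std::uint32_t readU32Leb(const Bytes& in, std::size_t& pos) {
  std::uint32_t result = 0;
  for (int shift = 0; shift < 35; shift += 7) {
    std::uint8_t byte = in.at(pos++); // at() throws on truncated input
    result |= std::uint32_t(byte & 0x7f) << shift;
    if ((byte & 0x80) == 0) {
      return result;
    }
  }
  throw std::runtime_error("LEB128 value too long");
}

// Read a code section: a function count, then size-prefixed bodies.
// When skipping, each body is replaced by a single `unreachable` opcode.
inline std::vector<Bytes> readCodeSection(const Bytes& in,
                                          bool skipFunctionBodies) {
  std::size_t pos = 0;
  std::uint32_t count = readU32Leb(in, pos);
  std::vector<Bytes> bodies;
  for (std::uint32_t i = 0; i < count; i++) {
    std::uint32_t size = readU32Leb(in, pos);
    if (size > in.size() - pos) {
      throw std::runtime_error("truncated function body");
    }
    if (skipFunctionBodies) {
      bodies.push_back(Bytes{0x00}); // 0x00 is the `unreachable` opcode
    } else {
      // A real reader would decode locals and instructions here.
      bodies.emplace_back(in.begin() + pos, in.begin() + pos + size);
    }
    pos += size; // the size prefix lets us jump straight to the next body
  }
  return bodies;
}
```

In both modes the function count is preserved, which is why metadata that depends only on imports, exports, and other global structures is unaffected.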
* finalize: remove initializers from metadata output (#3479) | Sam Clegg | 2021-01-11 | 1 | -13/+1
  See https://github.com/emscripten-core/emscripten/pull/13208
* finalize: there can only ever be a single initializer function. NFC. (#3452) | Sam Clegg | 2020-12-18 | 1 | -3/+3
* Refactor printing code so that printing Expressions always works (#3450) | Alon Zakai | 2020-12-17 | 1 | -5/+2
  This avoids needing to add an include of wasm-printing if a file doesn't already have it. To achieve that, add the std::ostream hooks in wasm.h, and also use them when possible, removing the need for the special WasmPrinter object.

  Also stop printing in "full" (print types on each line) in error messages by default. The user can still get that, as always, using BINARYEN_PRINT_FULL=1 in the env.
* wasm-emscripten-finalize: Remove staticBump from metadata (#3300) | Sam Clegg | 2020-10-29 | 1 | -25/+1
  Emscripten no longer needs this information as of https://github.com/emscripten-core/emscripten/pull/12643.

  This also removes the need to export __data_end.
* Remove support for emscripten legacy PIC ABI (#3299) | Sam Clegg | 2020-10-29 | 1 | -24/+6
* Remove now-redundant --mutable-sp flag from finalize (#3273) | Sam Clegg | 2020-10-23 | 1 | -8/+0
* Remove now-redundant stack pointer manipulation passes (#3251) | Sam Clegg | 2020-10-18 | 1 | -7/+0
  The use of these passes was removed on the emscripten side in https://github.com/emscripten-core/emscripten/pull/12536.
* finalize: remove legacy support for "table" import (#3249) | Sam Clegg | 2020-10-16 | 1 | -7/+0
  These days we always export the table, except in the case of dynamic linking, and even then we use the name `__indirect_function_table`.
* finalize: add --mutable-sp flag (#3250) | Sam Clegg | 2020-10-15 | 1 | -3/+13
  This flag disables the features of `wasm-emscripten-finalize` that replace the mutable global import of `__stack_pointer`.

  See the corresponding emscripten change that depends on this one: https://github.com/emscripten-core/emscripten/pull/12536
* finalize: move more functionality behind legacyPIC (#3248) | Sam Clegg | 2020-10-15 | 1 | -9/+11
  Internalizing of the stack pointer is only needed in legacy PIC mode, since in the new PIC mode we support mutable globals.

  Also the additional ASSIGN_GOT_ENTRIES function only exists in support of the legacy mode.
* Rename Emscripten EHSjLj functions in wasm backend (#3191) | Heejin Ahn | 2020-10-10 | 1 | -2/+0
  Now that we are renaming invoke wrappers and `emscripten_longjmp_jmpbuf` in the wasm backend, this deletes all related renaming routines and relevant tests.

  Depends on #3192.

  Addresses: #3043 and #3081
  Companions: https://reviews.llvm.org/D88697 emscripten-core/emscripten#12399
* Stop generating __growWasmMemory (#3180) | Sam Clegg | 2020-10-01 | 1 | -1/+0
  This depends on https://github.com/emscripten-core/emscripten/pull/12391
* wasm-emscripten-finalize: Add --new-pic-abi option (#3118) | Sam Clegg | 2020-09-11 | 1 | -3/+16
  This option skips the PIC ABI transforms that are normally done by wasm-emscripten-finalize and keeps the LLVM PIC ABI in place.

  The LLVM ABI uses mutable globals (GOT.mem.foo and GOT.func.bar) for data and function offsets rather than accessor functions (g$foo and g$bar).
* wasm-emscripten-finalize: Don't rename the imported table (#3101) | Alon Zakai | 2020-09-03 | 1 | -9/+4
  When minimizing wasm changes, leave it as __indirect_function_table which is what LLVM emits.

  This also removes the renaming of the memory. That was never needed as LLVM already emits "memory" there.

  See #3043
* wasm-emscripten-finalize: Add flags to limit dynCall creation (#3070) | Sam Clegg | 2020-08-26 | 1 | -5/+25
  Two new flags here, one to completely remove dynCalls, and another to limit them to only signatures that contain i64.

  See #3043
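Illustrative sketch (not taken from the commit): the two flags amount to a three-way choice about which signatures get a dynCall. `DynCallMode` and `shouldGenerateDynCall` are hypothetical names, and the log above does not give the flag spellings. The sketch also assumes emscripten-style signature strings in which the letter `j` stands for i64.

```cpp
#include <string>

// Hypothetical modes: no flag, the "only i64" flag, and the "remove" flag.
enum class DynCallMode { All, OnlyI64, None };

// `sig` is assumed to be an emscripten-style signature string, e.g. "vij"
// for a function taking (i32, i64) and returning void.
inline bool shouldGenerateDynCall(DynCallMode mode, const std::string& sig) {
  switch (mode) {
    case DynCallMode::All:
      return true;
    case DynCallMode::OnlyI64:
      return sig.find('j') != std::string::npos;
    case DynCallMode::None:
      return false;
  }
  return false; // not reached; keeps compilers quiet about missing returns
}
```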
* Remove old EM_ASM handling method (#3069) | Alon Zakai | 2020-08-21 | 1 | -0/+1
  The minimizeWasmChanges flag now does nothing (but new changes are coming, so keep it around) - this moves us to always doing the new way of things.

  With that we can update the tests.

  See #3043
* wasm-emscripten-finalize: Make EM_ASM modifications optional (#3044) | Alon Zakai | 2020-08-19 | 1 | -0/+11
  wasm-emscripten-finalize renames EM_ASM calls to have the signature in the name. This isn't actually useful - emscripten doesn't benefit from that. I think it was optimized in fastcomp, and in upstream we copied the general form but not the optimizations, and then EM_JS came along which is easier to optimize anyhow.

  This PR makes those changes optional: when not doing them, it just leaves the calls as they are. Emscripten will need some changes to handle that, but those are simple.

  For convenience this adds a flag to "minimize wasm changes". The idea is that this flag avoids needing a double-roll or other inconvenience as the changes need to happen in tandem on the emscripten side. The same flag can be reused for later changes similar to this one. When they are all done we can remove the flag. (Note how the code ifdefed by the flag can be removed once we no longer need the old way of doing things - that is, the new approach is simpler on the binaryen side).

  See #3043
* Make wasm-emscripten-finalize's output optional (#3055) | Alon Zakai | 2020-08-17 | 1 | -18/+22
  This helps towards the goal of allowing emscripten to not always modify the wasm during link. Until now wasm-emscripten-finalize always wrote an output, while with this PR it only does so if it was asked to, either by giving it an output filename, or asking for text output.

  The only noticeable change from this should be to make what was an error before (not specifying an output or asking for text) into a non-error (run and print metadata, but do not write the wasm).
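Illustrative sketch (not taken from the commit): the rule above reduces to one predicate over the tool's options. `FinalizeOptionsSketch` and `shouldWriteOutput` are hypothetical names, not the tool's actual option handling.

```cpp
#include <string>

// Hypothetical subset of the tool's options.
struct FinalizeOptionsSketch {
  std::string outputFile; // empty when no output filename was given
  bool emitText = false;  // true when text output was requested
};

// Output is written only when it was asked for, either by naming an
// output file or by requesting text output. Otherwise the tool would
// just run and print metadata.
inline bool shouldWriteOutput(const FinalizeOptionsSketch& options) {
  return !options.outputFile.empty() || options.emitText;
}
```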
* Refactor wasm-emscripten-finalize to use a single pass runner (#2987) | Sam Clegg | 2020-08-05 | 1 | -47/+41
* Move generateDynCallThunks into its own pass. NFC. (#3000) | Sam Clegg | 2020-08-04 | 1 | -1/+3
  The core logic is still living in EmscriptenGlueGenerator because it's also used by fixInvokeFunctionNames. As a followup we can figure out how to make these more independent.
* Move stack-check into its own pass (#2994) | Sam Clegg | 2020-07-27 | 1 | -3/+10
  This new pass takes an optional stack-check-handler argument which is the name of the function to call on stack overflow. If no argument is passed then it just traps.
* wasm-emscripten-finalize: remove exportWasiStart (#2986) | Sam Clegg | 2020-07-27 | 1 | -4/+1
  This should not be needed since in emscripten standalone mode we always include a crt1.o that includes _start.
* Move emscripten PIC ABI conversion to a pass. NFC. (#2985) | Sam Clegg | 2020-07-24 | 1 | -1/+5
  Doing it this way happens to re-order the __assign_got_entries function in the module, but it's otherwise NFC.
* Move ReplaceStackPoint into a pass (#2984) | Sam Clegg | 2020-07-24 | 1 | -1/+3
  First step in making wasm-emscripten-finalize use more passes.
* Support new clang mangling of main (__main_argc_argv) (#2671) | Sam Clegg | 2020-06-10 | 1 | -0/+3
  The plan is that for standalone mode we can function just like wasi-sdk and call the correct main from crt1.c.

  For non-standalone mode we still want to export and call main directly, so we rename __main_argc_argv back to main as part of finalize.

  See https://reviews.llvm.org/D70700
* Remove stackSave/stackAlloc/stackRestore code generation (#2852) | Sam Clegg | 2020-05-20 | 1 | -1/+0
  These are now implemented in assembly as part of emscripten's compiler-rt.

  See: https://github.com/emscripten-core/emscripten/pull/11166
* JS/Wasm BigInt support for wasm-emscripten-finalize (#2726) | Alon Zakai | 2020-04-07 | 1 | -2/+10
  If wasm-emscripten-finalize is given the BigInt flag, then we will be using BigInts on the JS side, and need no legalization at all since i64s will just be BigInts.
* Avoid fp$ access in MAIN_MODULES (#2704) | Alon Zakai | 2020-03-27 | 1 | -6/+7
  Depends on emscripten-core/emscripten#10741 which ensures that table indexes are unique. With that guarantee, a main module can just add its function pointers into the table, and use them based on that index. The loader will then see them in the table and then give other modules the identical function pointer for a function, ensuring function pointer equality.

  This avoids calling fp$ functions during startup for the main module's own functions (which are slow). We do still call fp$s of things we import from outside, as we don't have anything to put in the table for them, we depend on the loader for that.

  I suspect this can also be done with SIDE_MODULES, but did not want to try too much at once.
* Strip DWARF in finalize, to avoid keeping it around til later unnecessarily (#2544) | Alon Zakai | 2019-12-20 | 1 | -0/+8
  Without this, the first wasm-opt invocation will remove it. But it can be very large, and we will soon start to automatically update it when it exists, so avoid the work if we aren't actually building a final output with DWARF.
* Binary format code section offset tracking (#2515) | Alon Zakai | 2019-12-19 | 1 | -0/+7
  Optionally track the binary format code section offsets, that is, when loading a binary, remember where each IR node was read from. This is necessary for DWARF debug info, as these are the offsets DWARF refers to.

  (Note that eventually we may want to do something else, like first read the DWARF and only then add debug info annotations into the IR in a more LLVM-like manner, but this is more straightforward and should be enough to update debug lines and ranges).

  This tracking adds noticeable overhead - every single IR node adds an entry in a map - so avoid it unless actually necessary. Specifically, if the user passes in -g and there are actually DWARF sections in the binary, and we are not about to remove those sections, then we need it.

  Print binary format code section offsets in text, when printing with -g. This will help debug and test DWARF support. It looks like

    ;; code offset: 0x7

  as an annotation right before each node.

  Also add support for -g in wasm-opt tests (unlike a pass, it has just one - as a prefix).

  Helps #2400
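Illustrative sketch (not taken from the commit): the "only when actually necessary" condition and the per-node map can be modelled as below. `ExprSketch`, `OffsetTracker`, and `needCodeOffsets` are hypothetical names, not Binaryen's types.

```cpp
#include <cstddef>
#include <unordered_map>

// Hypothetical stand-in for an IR node.
struct ExprSketch {};

// Tracking is needed only if debug info was requested (-g), the binary
// really has DWARF sections, and those sections are not about to be removed.
inline bool needCodeOffsets(bool debugInfoRequested,
                            bool hasDwarfSections,
                            bool willStripDwarf) {
  return debugInfoRequested && hasDwarfSections && !willStripDwarf;
}

// Remembers where each node was read from. When disabled, note() does
// nothing, so the map stays empty and the per-node overhead is avoided.
struct OffsetTracker {
  bool enabled = false;
  std::unordered_map<const ExprSketch*, std::size_t> offsets;

  void note(const ExprSketch* node, std::size_t binaryOffset) {
    if (enabled) {
      offsets[node] = binaryOffset;
    }
  }
};
```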
* Support stack overflow checks in standalone mode (#2525) | Alon Zakai | 2019-12-12 | 1 | -0/+2
  In normal mode we call a JS import, but we can't import from JS in standalone mode. Instead, just trap in that case with an unreachable. (The error reporting is not as good in this case, but at least it catches all errors and halts, and the emitted wasm is valid for standalone mode.)

  Helps emscripten-core/emscripten#10019
* Add some tracing to wasm-emscripten-finalize (#2505) | Sam Clegg | 2019-12-05 | 1 | -8/+13
  Also fix a bug in splitting the names of the trace channels. Obviously I can't write string.split correctly in C first time around.
* Convert to using DEBUG macros (#2497) | Sam Clegg | 2019-12-04 | 1 | -4/+2
  This means that debugging/tracing can now be enabled and controlled centrally without managing and passing state around the codebase.
* Support --pass-arg in ToolOptions. (#2429) | Alon Zakai | 2019-11-11 | 1 | -0/+1
  This will allow us to pass pass arguments to wasm-emscripten-finalize, which runs legalize-js-interface internally, which recently added an optional argument.
* Don't add __wasm_call_ctors to startup function list in wasm standalone mode (#2384) | Sam Clegg | 2019-10-14 | 1 | -2/+6
  In this mode crt1 takes care of calling it.
* wasm-emscripten-finalize: Add more checking of __data_end global (#2352) | Sam Clegg | 2019-09-23 | 1 | -0/+3
* Add a --standalone-wasm flag to wasm-emscripten-finalize (#2333) | Alon Zakai | 2019-09-18 | 1 | -1/+17
  The flag indicates that we want to run the wasm by itself, without JS support. In that case we don't emit JS dynCalls etc., and we also emit a wasi _start if there is a main, i.e., we try to use the current conventions in the wasm-only space.