summaryrefslogtreecommitdiff
path: root/src/wasm/wasm-io.cpp
Commit message (Collapse)AuthorAgeFilesLines
* [Parser] Parse tags and throw (#6126)Thomas Lively2023-11-201-2/+1
| | | | Also fix the parser to correctly error if an imported item appears after a non-imported item and make the corresponding fix to the test.
* Encode command line to UTF8 on Windows (#5671)Derek Schuff2023-09-141-3/+11
| | | | | | | | | | | | | | | | This PR changes how file paths and the command line are handled. On startup on Windows, we process the wstring version of the command line (including the file paths) and re-encode it to UTF8 before handing it off to the rest of the command line handling logic. This means that all paths are stored in UTF8-encoded std::strings as they go through the program, right up until they are used to open files. At that time, they are converted to the appropriate native format with the new to_path function before passing to the stdlib open functions. This has the advantage that all of the non-file-opening code can use a single type to hold paths (which is good since std::filesystem::path has proved problematic in some cases), but has the disadvantage that someone could add new code that forgets to convert to_path before opening. That's somewhat mitigated by the fact that most of the code uses the ModuleIOBase classes for opening files. Fixes #4995
* Rename WasmBinaryBuilder to WasmBinaryReader (NFC) (#5767)Heejin Ahn2023-06-131-1/+1
| | | | | | We have `WasmBinaryBuilder` that read binary into Binaryen IR and `WasmBinaryWriter` that writes Binaryen IR to binary. To me `WasmBinaryBuilder` sounds similar to `WasmBinaryWriter`, which builds binary. How about renaming it to `WasmBinaryReader`?
* [NFC] Remove our bespoke `make_unique` implementation (#5613)Thomas Lively2023-03-311-2/+2
| | | | This code predates our adoption of C++14 and can now be removed in favor of `std::make_unique`, which should be more efficient.
* [Parser][NFC] Small code cleanups (#4729)Thomas Lively2022-06-141-1/+1
| | | | Apply cleanups suggested by aheejin in post-merge code review of previous parser PRs.
* [Parser] Begin parsing modules (#4716)Thomas Lively2022-06-101-4/+16
| | | | | | | | | | | Implement the basic infrastructure for the full WAT parser with just enough detail to parse basic modules that contain only imported globals. Parsing functions correspond to elements of the grammar in the text specification and are templatized over context types that correspond to each phase of parsing. Errors are explicitly propagated via `Result<T>` and `MaybeResult<T>` types. Follow-on PRs will implement additional phases of parsing and parsing for new elements in the grammar.
* Use ModuleReader::readStdin for file "-" (#4114)Thomas Lively2021-08-301-2/+2
| | | | | | | | | After #4106 we already treat the input file "-" as a shorthand for reading from stdin at the file.cpp level. However, the "-" input was still treated as a normal file name at the wasm-io.cpp file and as a result was always treated as text input. This commit updates wasm-io.cpp to use the stdin code path supporting both binary and text input for "-". Fixes #4105 (again).
* Apply features from the commandline first (#3960)Alon Zakai2021-07-021-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | As suggested in https://github.com/WebAssembly/binaryen/pull/3955#issuecomment-871016647 This applies commandline features first. If the features section is present, and disallows some of them, then we warn. Otherwise, the features can combine (for example, a wasm may enable feature X because it has to use it, and a user can simply add the flag for feature Y if they want the optimizer to try to use it; both flags will then be enabled). This is important because in some cases we need to know the features before parsing the wasm, in the case that the wasm does not use the features section. In particular, non-nullable GC locals have an effect during parsing. (Typed function references also does, but we found a way to apply its effect all the time, that is, always use the refined type, and that happened to not break the case where the feature is disabled - but such a workaround is not possible with non-nullable locals.) To make this less error-prone, add a FeatureSet input as a parameter to WasmBinaryBuilder. That is, when building a module, we must give it the features to use while doing so. This will unblock #3955 . That PR will also add a test for the actual usage of a feature during loading (the test can only be added there, after that PR unbreaks things).
* [wasm-split] Add an option to emit only the module names (#3901)Thomas Lively2021-05-251-0/+3
| | | | | | Even when other names are stripped, it can be useful for wasm-split to preserve the module name so that the split modules can be differentiated in stack traces. Adding this option to wasm-split requires adding similar options to ModuleWriter and WasmBinaryWriter.
* wasm-emscripten-finalize: Do not read the Names section when not writing ↵Alon Zakai2021-03-181-0/+1
| | | | | | | | | | | | | | | | | output (#3698) When not writing output we don't need debug info, as it is not relevant for our metadata. This saves loading and interning all the names, which takes several seconds on massive inputs. This is possible in principle in other tools, but this does not change anything in them for now. (We do use names internally in some nontrivial ways without opting in to it, so that would require further refactoring. Also the other tools almost always do write an output.) This is not 100% unobservable. If validation fails then the validation error would just contain the function index instead of the name from the Names section if there is one. However finalize does not validate atm so that would only matter if we change that later.
* Skip function bodies in wasm-emscripten-finalize when we don't need them (#3689)Alon Zakai2021-03-171-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | After sbc100 's work on EM_ASM and EM_JS they are now parsed from the wasm using exports etc. and so we no longer need to parse function bodies. As a result if we are not emitting a wasm from wasm-emscripten-finalize then all we are doing is scanning global structures like imports and exports and emitting metadata about them. And indeed we do not need to emit a wasm in some cases, specifically when not optimizing and when using WASM_BIGINT (to avoid needing to legalize). We had considering skipping wasm-emscripten-finalize entirely in that situation, and instead to parse the metadata from the wasm in python on the emscripten side. However sbc100 had the brilliant idea today to just skip function bodies. That is very simple to do - no need to write another parser for wasm, and also look at how simple this PR is - and also it will be faster to run wasm-emscripten-finalize in this mode than to run python. (With the only downside that the bytes of the wasm are loaded even if they aren't parsed; but almost certainly they are in the disk cache anyhow.) This PR implements that idea: when wasm-emscripten-finalize knows it will not write a wasm output, it notes "skip function bodies". The binary reader then skips the bodies and places unreachables there instead (so that the wasm still validates). There are no new tests here because this can't be tested - by design it is an unobservable optimization. (If we could notice the bodies have been skipped, we would not have skipped them.) This is also why no changes are needed on the emscripten side to benefit from this speedup. Basically when binaryen sees it will not need X, it skips parsing of X automatically. Benchmarking speed, it is as fast as you'd expect: the wasm-emscripten-finalize step is 15x faster on SQLite (1MB of wasm) and almost 50x faster on the biggest wasm I have on my drive (40MB of LLVM). (These numbers are on release builds, without debug info - debug into makes things slower, so the speedup is lower there, and will need further work.) Tested manually and also on wasm0 wasm2 other on emscripten.
* Refactor printing code so that printing Expressions always works (#3450)Alon Zakai2020-12-171-1/+1
| | | | | | | | This avoids needing to add include wasm-printing if a file doesn't already have it. To achieve that, add the std::ostream hooks in wasm.h, and also use them when possible, removing the need for the special WasmPrinter object. Also stop printing in "full" (print types on each line) in error messages by default. The user can still get that, as always, using BINARYEN_PRINT_FULL=1 in the env.
* Poppy IR wast parsing and validation (#3105)Thomas Lively2020-09-091-4/+4
| | | | | Adds an IR profile to each function so the validator can determine which validation rules to apply and adds a flag to have the wast parser set the profile to Poppy for testing purposes.
* Binary format code section offset tracking (#2515)Alon Zakai2019-12-191-3/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Optionally track the binary format code section offsets, that is, when loading a binary, remember where each IR node was read from. This is necessary for DWARF debug info, as these are the offsets DWARF refers to. (Note that eventually we may want to do something else, like first read the DWARF and only then add debug info annotations into the IR in a more LLVM-like manner, but this is more straightforward and should be enough to update debug lines and ranges). This tracking adds noticeable overhead - every single IR node adds an entry in a map - so avoid it unless actually necessary. Specifically, if the user passes in -g and there are actually DWARF sections in the binary, and we are not about to remove those sections, then we need it. Print binary format code section offsets in text, when printing with -g. This will help debug and test dwarf support. It looks like ;; code offset: 0x7 as an annotation right before each node. Also add support for -g in wasm-opt tests (unlike a pass, it has just one - as a prefix). Helps #2400
* Convert to using DEBUG macros (#2497)Sam Clegg2019-12-041-26/+21
| | | | | | This means that debugging/tracing can now be enabled and controlled centrally without managing and passing state around the codebase.
* clang-tidy braces changes (#2075)Alon Zakai2019-05-011-5/+10
| | | Applies the changes in #2065, and temprarily disables the hook since it's too slow to run on a change this large. We should re-enable it in a later commit.
* Apply format changes from #2048 (#2059)Alon Zakai2019-04-261-18/+29
| | | Mass change to apply clang-format to everything. We are applying this in a PR by me so the (git) blame is all mine ;) but @aheejin did all the work to get clang-format set up and all the manual work to tidy up some things to make the output nicer in #2048
* Allow tools to read from stdin (#1950)Thomas Lively2019-03-181-7/+39
| | | | This is necessary to write tests that don't require temporary files, such as in #1948, and is generally useful.
* wasm-opt source map support (#1557)Alon Zakai2018-06-071-3/+17
| | | | | | | | | | * support source map input in wasm-opt, refactoring the loading code into wasm-io * use wasm-io in wasm-as * support output source maps in wasm-opt * add a test for wasm-opt and source maps
* Fold wasm-link-metadata into wasm-emscripten-finalize (#1408)Jacob Gravelle2018-02-141-5/+20
| | | | | | | * wasm-link-metadata: Use `__data_end` symbol. * Add --global-base param to emscripten-wasm-finalize to compute staticBump properly * Let ModuleWriter write to a provided Output object
* First pass at LLD support for Emscripten (#1346)Jacob Gravelle2018-01-221-3/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * Skeleton of a beginning of o2wasm, WIP and probably not going to be used * Get building post-cherry-pick * ast->ir, remove commented out code, include a debug module print because linking * Read linking section, print emscripten metadata json * WasmBinaryWriter emits user sections on Module * Remove debugging prints, everything that isn't needed to build metadata * Rename o2wasm to lld-metadata * lld-metadata support for outputting to file * Use tables index instead of function index for initializer functions * Add lld-emscripten tool to add emscripten-runtime functions to wasm modules (built with lld) * Handle EM_ASM in lld-emscripten * Add a list of functions to forcibly export (for initializer functions) * Disable incorrect initializer function reading * Add error printing when parsing .o files in lld-metadata * Remove ';; METADATA: ' prefix from lld-metadata, output is now standalone json * Support em_asm consts that aren't at the start of a segment * Initial test framework for lld-metadata tool * Add em_asm test * Add support for WASM_INIT_FUNCS in the linking section * Remove reloc section parsing because it's unused * lld-emscripten can read and write text * Add test harness for lld-emscripten * Export all functions for now * Add missing lld test output * Add support for reading object files differently Only difference so far is in importing mutable globals being an object file representation for symbols, but invalid wasm. * Update help strings * Update linking tests for stackAlloc fix * Rename lld-emscripten,lld-metadata to wasm-emscripten-finalize,wasm-link-metadata * Add help text to header comments * auto& instead of auto & * Extract LinkType to abi/wasm-object.h * Remove special handling for wasm object file reading, allow mutable globals * Add braces around default switch case * Fix flake8 errors * Handle generating dyncall thunks for imports as well * Use explicit bool for stackPointerGlobal * Use glob patterns for lld file iteration * Use __wasm_call_ctors for all initializer functions
* Exporting/importing debug location information from .wast/.asm.js/.s formats ↵Yury Delendik2017-06-011-2/+11
| | | | | | | | (#1017) * Extends wasm-as, wasm-dis and s2wasm to consume debug locations. * Exports source map from asm2wasm
* only read first 4 bytes to check if a file is a wasm binary (#894)Alon Zakai2017-02-021-3/+8
|
* Read/Write Abstraction (#889)Alon Zakai2017-01-261-0/+86
* Added ModuleReader/Writer classes that support text and binary I/O * Use them in wasm-opt and asm2wasm