summaryrefslogtreecommitdiff
path: root/src/wasm/wasm-io.cpp
Commit message (Collapse)AuthorAgeFilesLines
* [NFC] Encapsulate source map reader state (#7132)Thomas Lively2024-12-031-13/+6
| | | | | | | | | | | | Move all state relevant to reading source maps out of WasmBinaryReader and into a new utility, SourceMapReader. This is a prerequisite for parallelizing the parsing of function bodies, since the source map reader state is different at the beginning of each function. Also take the opportunity to simplify the way we read source maps, for example by deferring the reading of anything but the position of a debug location until it will be used and by using `std::optional` instead of singleton `std::set`s to store function prologue and epilogue debug locations.
* Remove obsolete parser code (#6607)Thomas Lively2024-05-291-12/+3
| | | | | Remove `SExpressionParser`, `SExpressionWasmBuilder`, and `cashew::Parser`. Simplify gen-s-parser.py. Remove the --new-wat-parser and --deprecated-wat-parser flags.
* [StackIR] Run StackIR during binary writing and not as a pass (#6568)Alon Zakai2024-05-091-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | Previously we had passes --generate-stack-ir, --optimize-stack-ir, --print-stack-ir that could be run like any other passes. After generating StackIR it was stashed on the function and invalidated if we modified BinaryenIR. If it wasn't invalidated then it was used during binary writing. This PR switches things so that we optionally generate, optimize, and print StackIR only during binary writing. It also removes all traces of StackIR from wasm.h - after this, StackIR is a feature of binary writing (and printing) logic only. This is almost NFC, but there are some minor noticeable differences: 1. We no longer print has StackIR in the text format when we see it is there. It will not be there during normal printing, as it is only present during binary writing. (but --print-stack-ir still works as before; as mentioned above it runs during writing). 2. --generate/optimize/print-stack-ir change from being passes to being flags that control that behavior instead. As passes, their order on the commandline mattered, while now it does not, and they only "globally" affect things during writing. 3. The C API changes slightly, as there is no need to pass it an option "optimize" to the StackIR APIs. Whether we optimize is handled by --optimize-stack-ir which is set like other optimization flags on the PassOptions object, so we don't need the old option to those C APIs. The main benefit here is simplifying the code, so we don't need to think about StackIR in more places than just binary writing. That may also allow future improvements to our usage of StackIR.
* [Parser] Do not eagerly lex parens (#6540)Thomas Lively2024-04-251-1/+0
| | | | | | | | | | | The lexer currently lexes tokens eagerly and stores them in a `Token` variant ahead of when they are actually requested by the parser. It is wasteful, however, to classify tokens before they are requested by the parser because it is likely that the next token will be precisely the kind the parser requests. The work of checking and rejecting other possible classifications ahead of time is not useful. To make incremental progress toward removing `Token` completely, lex parentheses on demand instead of eagerly.
* [Parser] Enable the new text parser by default (#6371)Thomas Lively2024-04-251-1/+1
| | | | | | | | | | | | | | The new text parser is faster and more standards compliant than the old text parser. Enable it by default in wasm-opt and update the tests to reflect the slightly different results it produces. Besides following the spec, the new parser differs from the old parser in that it: - Does not synthesize `loop` and `try` labels unnecessarily - Synthesizes different block names in some cases - Parses exports in a different order - Parses `nop`s instead of empty blocks for empty control flow arms - Does not support parsing Poppy IR - Produces different error messages - Cannot parse `pop` except as the first instruction inside a `catch`
* Do not add an extra null character when reading files (#6538)Thomas Lively2024-04-241-2/+1
| | | | | | | | The new wat parser currently considers itself to be at the end of the file whenever it cannot lex another token. This is not quite right, but fixing it causes parser errors because of the extra null character we were appending to files when we read them. This null character is not useful since we can already read files as `std::string`, which always has an implicit null character, so remove it. Clean up some users of `read_file` while we're at it.
* [Parser] Parse tags and throw (#6126)Thomas Lively2023-11-201-2/+1
| | | | Also fix the parser to correctly error if an imported item appears after a non-imported item and make the corresponding fix to the test.
* Encode command line to UTF8 on Windows (#5671)Derek Schuff2023-09-141-3/+11
| | | | | | | | | | | | | | | | This PR changes how file paths and the command line are handled. On startup on Windows, we process the wstring version of the command line (including the file paths) and re-encode it to UTF8 before handing it off to the rest of the command line handling logic. This means that all paths are stored in UTF8-encoded std::strings as they go through the program, right up until they are used to open files. At that time, they are converted to the appropriate native format with the new to_path function before passing to the stdlib open functions. This has the advantage that all of the non-file-opening code can use a single type to hold paths (which is good since std::filesystem::path has proved problematic in some cases), but has the disadvantage that someone could add new code that forgets to convert to_path before opening. That's somewhat mitigated by the fact that most of the code uses the ModuleIOBase classes for opening files. Fixes #4995
* Rename WasmBinaryBuilder to WasmBinaryReader (NFC) (#5767)Heejin Ahn2023-06-131-1/+1
| | | | | | We have `WasmBinaryBuilder` that read binary into Binaryen IR and `WasmBinaryWriter` that writes Binaryen IR to binary. To me `WasmBinaryBuilder` sounds similar to `WasmBinaryWriter`, which builds binary. How about renaming it to `WasmBinaryReader`?
* [NFC] Remove our bespoke `make_unique` implementation (#5613)Thomas Lively2023-03-311-2/+2
| | | | This code predates our adoption of C++14 and can now be removed in favor of `std::make_unique`, which should be more efficient.
* [Parser][NFC] Small code cleanups (#4729)Thomas Lively2022-06-141-1/+1
| | | | Apply cleanups suggested by aheejin in post-merge code review of previous parser PRs.
* [Parser] Begin parsing modules (#4716)Thomas Lively2022-06-101-4/+16
| | | | | | | | | | | Implement the basic infrastructure for the full WAT parser with just enough detail to parse basic modules that contain only imported globals. Parsing functions correspond to elements of the grammar in the text specification and are templatized over context types that correspond to each phase of parsing. Errors are explicitly propagated via `Result<T>` and `MaybeResult<T>` types. Follow-on PRs will implement additional phases of parsing and parsing for new elements in the grammar.
* Use ModuleReader::readStdin for file "-" (#4114)Thomas Lively2021-08-301-2/+2
| | | | | | | | | After #4106 we already treat the input file "-" as a shorthand for reading from stdin at the file.cpp level. However, the "-" input was still treated as a normal file name at the wasm-io.cpp file and as a result was always treated as text input. This commit updates wasm-io.cpp to use the stdin code path supporting both binary and text input for "-". Fixes #4105 (again).
* Apply features from the commandline first (#3960)Alon Zakai2021-07-021-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | As suggested in https://github.com/WebAssembly/binaryen/pull/3955#issuecomment-871016647 This applies commandline features first. If the features section is present, and disallows some of them, then we warn. Otherwise, the features can combine (for example, a wasm may enable feature X because it has to use it, and a user can simply add the flag for feature Y if they want the optimizer to try to use it; both flags will then be enabled). This is important because in some cases we need to know the features before parsing the wasm, in the case that the wasm does not use the features section. In particular, non-nullable GC locals have an effect during parsing. (Typed function references also does, but we found a way to apply its effect all the time, that is, always use the refined type, and that happened to not break the case where the feature is disabled - but such a workaround is not possible with non-nullable locals.) To make this less error-prone, add a FeatureSet input as a parameter to WasmBinaryBuilder. That is, when building a module, we must give it the features to use while doing so. This will unblock #3955 . That PR will also add a test for the actual usage of a feature during loading (the test can only be added there, after that PR unbreaks things).
* [wasm-split] Add an option to emit only the module names (#3901)Thomas Lively2021-05-251-0/+3
| | | | | | Even when other names are stripped, it can be useful for wasm-split to preserve the module name so that the split modules can be differentiated in stack traces. Adding this option to wasm-split requires adding similar options to ModuleWriter and WasmBinaryWriter.
* wasm-emscripten-finalize: Do not read the Names section when not writing ↵Alon Zakai2021-03-181-0/+1
| | | | | | | | | | | | | | | | | output (#3698) When not writing output we don't need debug info, as it is not relevant for our metadata. This saves loading and interning all the names, which takes several seconds on massive inputs. This is possible in principle in other tools, but this does not change anything in them for now. (We do use names internally in some nontrivial ways without opting in to it, so that would require further refactoring. Also the other tools almost always do write an output.) This is not 100% unobservable. If validation fails then the validation error would just contain the function index instead of the name from the Names section if there is one. However finalize does not validate atm so that would only matter if we change that later.
* Skip function bodies in wasm-emscripten-finalize when we don't need them (#3689)Alon Zakai2021-03-171-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | After sbc100 's work on EM_ASM and EM_JS they are now parsed from the wasm using exports etc. and so we no longer need to parse function bodies. As a result if we are not emitting a wasm from wasm-emscripten-finalize then all we are doing is scanning global structures like imports and exports and emitting metadata about them. And indeed we do not need to emit a wasm in some cases, specifically when not optimizing and when using WASM_BIGINT (to avoid needing to legalize). We had considering skipping wasm-emscripten-finalize entirely in that situation, and instead to parse the metadata from the wasm in python on the emscripten side. However sbc100 had the brilliant idea today to just skip function bodies. That is very simple to do - no need to write another parser for wasm, and also look at how simple this PR is - and also it will be faster to run wasm-emscripten-finalize in this mode than to run python. (With the only downside that the bytes of the wasm are loaded even if they aren't parsed; but almost certainly they are in the disk cache anyhow.) This PR implements that idea: when wasm-emscripten-finalize knows it will not write a wasm output, it notes "skip function bodies". The binary reader then skips the bodies and places unreachables there instead (so that the wasm still validates). There are no new tests here because this can't be tested - by design it is an unobservable optimization. (If we could notice the bodies have been skipped, we would not have skipped them.) This is also why no changes are needed on the emscripten side to benefit from this speedup. Basically when binaryen sees it will not need X, it skips parsing of X automatically. Benchmarking speed, it is as fast as you'd expect: the wasm-emscripten-finalize step is 15x faster on SQLite (1MB of wasm) and almost 50x faster on the biggest wasm I have on my drive (40MB of LLVM). (These numbers are on release builds, without debug info - debug into makes things slower, so the speedup is lower there, and will need further work.) Tested manually and also on wasm0 wasm2 other on emscripten.
* Refactor printing code so that printing Expressions always works (#3450)Alon Zakai2020-12-171-1/+1
| | | | | | | | This avoids needing to add include wasm-printing if a file doesn't already have it. To achieve that, add the std::ostream hooks in wasm.h, and also use them when possible, removing the need for the special WasmPrinter object. Also stop printing in "full" (print types on each line) in error messages by default. The user can still get that, as always, using BINARYEN_PRINT_FULL=1 in the env.
* Poppy IR wast parsing and validation (#3105)Thomas Lively2020-09-091-4/+4
| | | | | Adds an IR profile to each function so the validator can determine which validation rules to apply and adds a flag to have the wast parser set the profile to Poppy for testing purposes.
* Binary format code section offset tracking (#2515)Alon Zakai2019-12-191-3/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Optionally track the binary format code section offsets, that is, when loading a binary, remember where each IR node was read from. This is necessary for DWARF debug info, as these are the offsets DWARF refers to. (Note that eventually we may want to do something else, like first read the DWARF and only then add debug info annotations into the IR in a more LLVM-like manner, but this is more straightforward and should be enough to update debug lines and ranges). This tracking adds noticeable overhead - every single IR node adds an entry in a map - so avoid it unless actually necessary. Specifically, if the user passes in -g and there are actually DWARF sections in the binary, and we are not about to remove those sections, then we need it. Print binary format code section offsets in text, when printing with -g. This will help debug and test dwarf support. It looks like ;; code offset: 0x7 as an annotation right before each node. Also add support for -g in wasm-opt tests (unlike a pass, it has just one - as a prefix). Helps #2400
* Convert to using DEBUG macros (#2497)Sam Clegg2019-12-041-26/+21
| | | | | | This means that debugging/tracing can now be enabled and controlled centrally without managing and passing state around the codebase.
* clang-tidy braces changes (#2075)Alon Zakai2019-05-011-5/+10
| | | Applies the changes in #2065, and temprarily disables the hook since it's too slow to run on a change this large. We should re-enable it in a later commit.
* Apply format changes from #2048 (#2059)Alon Zakai2019-04-261-18/+29
| | | Mass change to apply clang-format to everything. We are applying this in a PR by me so the (git) blame is all mine ;) but @aheejin did all the work to get clang-format set up and all the manual work to tidy up some things to make the output nicer in #2048
* Allow tools to read from stdin (#1950)Thomas Lively2019-03-181-7/+39
| | | | This is necessary to write tests that don't require temporary files, such as in #1948, and is generally useful.
* wasm-opt source map support (#1557)Alon Zakai2018-06-071-3/+17
| | | | | | | | | | * support source map input in wasm-opt, refactoring the loading code into wasm-io * use wasm-io in wasm-as * support output source maps in wasm-opt * add a test for wasm-opt and source maps
* Fold wasm-link-metadata into wasm-emscripten-finalize (#1408)Jacob Gravelle2018-02-141-5/+20
| | | | | | | * wasm-link-metadata: Use `__data_end` symbol. * Add --global-base param to emscripten-wasm-finalize to compute staticBump properly * Let ModuleWriter write to a provided Output object
* First pass at LLD support for Emscripten (#1346)Jacob Gravelle2018-01-221-3/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * Skeleton of a beginning of o2wasm, WIP and probably not going to be used * Get building post-cherry-pick * ast->ir, remove commented out code, include a debug module print because linking * Read linking section, print emscripten metadata json * WasmBinaryWriter emits user sections on Module * Remove debugging prints, everything that isn't needed to build metadata * Rename o2wasm to lld-metadata * lld-metadata support for outputting to file * Use tables index instead of function index for initializer functions * Add lld-emscripten tool to add emscripten-runtime functions to wasm modules (built with lld) * Handle EM_ASM in lld-emscripten * Add a list of functions to forcibly export (for initializer functions) * Disable incorrect initializer function reading * Add error printing when parsing .o files in lld-metadata * Remove ';; METADATA: ' prefix from lld-metadata, output is now standalone json * Support em_asm consts that aren't at the start of a segment * Initial test framework for lld-metadata tool * Add em_asm test * Add support for WASM_INIT_FUNCS in the linking section * Remove reloc section parsing because it's unused * lld-emscripten can read and write text * Add test harness for lld-emscripten * Export all functions for now * Add missing lld test output * Add support for reading object files differently Only difference so far is in importing mutable globals being an object file representation for symbols, but invalid wasm. * Update help strings * Update linking tests for stackAlloc fix * Rename lld-emscripten,lld-metadata to wasm-emscripten-finalize,wasm-link-metadata * Add help text to header comments * auto& instead of auto & * Extract LinkType to abi/wasm-object.h * Remove special handling for wasm object file reading, allow mutable globals * Add braces around default switch case * Fix flake8 errors * Handle generating dyncall thunks for imports as well * Use explicit bool for stackPointerGlobal * Use glob patterns for lld file iteration * Use __wasm_call_ctors for all initializer functions
* Exporting/importing debug location information from .wast/.asm.js/.s formats ↵Yury Delendik2017-06-011-2/+11
| | | | | | | | (#1017) * Extends wasm-as, wasm-dis and s2wasm to consume debug locations. * Exports source map from asm2wasm
* only read first 4 bytes to check if a file is a wasm binary (#894)Alon Zakai2017-02-021-3/+8
|
* Read/Write Abstraction (#889)Alon Zakai2017-01-261-0/+86
* Added ModuleReader/Writer classes that support text and binary I/O * Use them in wasm-opt and asm2wasm