| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
| |
Move all state relevant to reading source maps out of WasmBinaryReader
and into a new utility, SourceMapReader. This is a prerequisite for
parallelizing the parsing of function bodies, since the source map
reader state is different at the beginning of each function.
Also take the opportunity to simplify the way we read source maps, for
example by deferring the reading of anything but the position of a debug
location until it will be used and by using `std::optional` instead of
singleton `std::set`s to store function prologue and epilogue debug
locations.
|
|
|
|
|
| |
Remove `SExpressionParser`, `SExpressionWasmBuilder`, and `cashew::Parser`.
Simplify gen-s-parser.py. Remove the --new-wat-parser and
--deprecated-wat-parser flags.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously we had passes --generate-stack-ir, --optimize-stack-ir, --print-stack-ir
that could be run like any other passes. After generating StackIR it was stashed on
the function and invalidated if we modified BinaryenIR. If it wasn't invalidated then
it was used during binary writing. This PR switches things so that we optionally
generate, optimize, and print StackIR only during binary writing. It also removes
all traces of StackIR from wasm.h - after this, StackIR is a feature of binary writing
(and printing) logic only.
This is almost NFC, but there are some minor noticeable differences:
1. We no longer print has StackIR in the text format when we see it is there. It
will not be there during normal printing, as it is only present during binary writing.
(but --print-stack-ir still works as before; as mentioned above it runs during writing).
2. --generate/optimize/print-stack-ir change from being passes to being flags that
control that behavior instead. As passes, their order on the commandline mattered,
while now it does not, and they only "globally" affect things during writing.
3. The C API changes slightly, as there is no need to pass it an option "optimize" to
the StackIR APIs. Whether we optimize is handled by --optimize-stack-ir which is
set like other optimization flags on the PassOptions object, so we don't need the
old option to those C APIs.
The main benefit here is simplifying the code, so we don't need to think about
StackIR in more places than just binary writing. That may also allow future
improvements to our usage of StackIR.
|
|
|
|
|
|
|
|
|
|
|
| |
The lexer currently lexes tokens eagerly and stores them in a `Token` variant
ahead of when they are actually requested by the parser. It is wasteful,
however, to classify tokens before they are requested by the parser because it
is likely that the next token will be precisely the kind the parser requests.
The work of checking and rejecting other possible classifications ahead of time
is not useful.
To make incremental progress toward removing `Token` completely, lex parentheses
on demand instead of eagerly.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The new text parser is faster and more standards compliant than the old text
parser. Enable it by default in wasm-opt and update the tests to reflect the
slightly different results it produces. Besides following the spec, the new
parser differs from the old parser in that it:
- Does not synthesize `loop` and `try` labels unnecessarily
- Synthesizes different block names in some cases
- Parses exports in a different order
- Parses `nop`s instead of empty blocks for empty control flow arms
- Does not support parsing Poppy IR
- Produces different error messages
- Cannot parse `pop` except as the first instruction inside a `catch`
|
|
|
|
|
|
|
|
| |
The new wat parser currently considers itself to be at the end of the file
whenever it cannot lex another token. This is not quite right, but fixing it
causes parser errors because of the extra null character we were appending to
files when we read them. This null character is not useful since we can already
read files as `std::string`, which always has an implicit null character, so
remove it. Clean up some users of `read_file` while we're at it.
|
|
|
|
| |
Also fix the parser to correctly error if an imported item appears after a
non-imported item and make the corresponding fix to the test.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This PR changes how file paths and the command line are handled. On startup on Windows,
we process the wstring version of the command line (including the file paths) and re-encode
it to UTF8 before handing it off to the rest of the command line handling logic. This means
that all paths are stored in UTF8-encoded std::strings as they go through the program, right
up until they are used to open files. At that time, they are converted to the appropriate native
format with the new to_path function before passing to the stdlib open functions.
This has the advantage that all of the non-file-opening code can use a single type to hold paths
(which is good since std::filesystem::path has proved problematic in some cases), but has the
disadvantage that someone could add new code that forgets to convert to_path before
opening. That's somewhat mitigated by the fact that most of the code uses the ModuleIOBase
classes for opening files.
Fixes #4995
|
|
|
|
|
|
| |
We have `WasmBinaryBuilder` that read binary into Binaryen IR and
`WasmBinaryWriter` that writes Binaryen IR to binary. To me
`WasmBinaryBuilder` sounds similar to `WasmBinaryWriter`, which builds
binary. How about renaming it to `WasmBinaryReader`?
|
|
|
|
| |
This code predates our adoption of C++14 and can now be removed in favor of
`std::make_unique`, which should be more efficient.
|
|
|
|
| |
Apply cleanups suggested by aheejin in post-merge code review of previous
parser PRs.
|
|
|
|
|
|
|
|
|
|
|
| |
Implement the basic infrastructure for the full WAT parser with just enough
detail to parse basic modules that contain only imported globals. Parsing
functions correspond to elements of the grammar in the text specification and
are templatized over context types that correspond to each phase of parsing.
Errors are explicitly propagated via `Result<T>` and `MaybeResult<T>` types.
Follow-on PRs will implement additional phases of parsing and parsing for new
elements in the grammar.
|
|
|
|
|
|
|
|
|
| |
After #4106 we already treat the input file "-" as a shorthand for reading from
stdin at the file.cpp level. However, the "-" input was still treated as a
normal file name at the wasm-io.cpp file and as a result was always treated as
text input. This commit updates wasm-io.cpp to use the stdin code path
supporting both binary and text input for "-".
Fixes #4105 (again).
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
As suggested in
https://github.com/WebAssembly/binaryen/pull/3955#issuecomment-871016647
This applies commandline features first. If the features section is present, and
disallows some of them, then we warn. Otherwise, the features can combine
(for example, a wasm may enable feature X because it has to use it, and a user
can simply add the flag for feature Y if they want the optimizer to try to use it;
both flags will then be enabled).
This is important because in some cases we need to know the features before
parsing the wasm, in the case that the wasm does not use the features section.
In particular, non-nullable GC locals have an effect during parsing. (Typed
function references also does, but we found a way to apply its effect all the time,
that is, always use the refined type, and that happened to not break the case
where the feature is disabled - but such a workaround is not possible with
non-nullable locals.)
To make this less error-prone, add a FeatureSet input as a parameter to
WasmBinaryBuilder. That is, when building a module, we must give it the
features to use while doing so.
This will unblock #3955 . That PR will also add a test for the actual usage
of a feature during loading (the test can only be added there, after that PR
unbreaks things).
|
|
|
|
|
|
| |
Even when other names are stripped, it can be useful for wasm-split to preserve
the module name so that the split modules can be differentiated in stack traces.
Adding this option to wasm-split requires adding similar options to ModuleWriter
and WasmBinaryWriter.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
output (#3698)
When not writing output we don't need debug info, as it is not relevant for
our metadata. This saves loading and interning all the names, which takes
several seconds on massive inputs.
This is possible in principle in other tools, but this does not change anything
in them for now. (We do use names internally in some nontrivial ways without
opting in to it, so that would require further refactoring. Also the other tools
almost always do write an output.)
This is not 100% unobservable. If validation fails then the validation error would
just contain the function index instead of the name from the Names section if
there is one. However finalize does not validate atm so that would only matter
if we change that later.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
After sbc100 's work on EM_ASM and EM_JS they are now parsed from
the wasm using exports etc. and so we no longer need to parse function bodies.
As a result if we are not emitting a wasm from wasm-emscripten-finalize then all we are
doing is scanning global structures like imports and exports and emitting metadata
about them. And indeed we do not need to emit a wasm in some cases, specifically
when not optimizing and when using WASM_BIGINT (to avoid needing to
legalize).
We had considering skipping wasm-emscripten-finalize entirely in that situation,
and instead to parse the metadata from the wasm in python on the emscripten
side. However sbc100 had the brilliant idea today to just skip function bodies.
That is very simple to do - no need to write another parser for wasm, and also
look at how simple this PR is - and also it will be faster to run
wasm-emscripten-finalize in this mode than to run python. (With the only
downside that the bytes of the wasm are loaded even if they aren't parsed; but
almost certainly they are in the disk cache anyhow.)
This PR implements that idea: when wasm-emscripten-finalize knows it will
not write a wasm output, it notes "skip function bodies". The binary reader then
skips the bodies and places unreachables there instead (so that the wasm still
validates).
There are no new tests here because this can't be tested - by design it is an
unobservable optimization. (If we could notice the bodies have been skipped,
we would not have skipped them.) This is also why no changes are needed on
the emscripten side to benefit from this speedup. Basically when binaryen sees
it will not need X, it skips parsing of X automatically.
Benchmarking speed, it is as fast as you'd expect: the wasm-emscripten-finalize
step is 15x faster on SQLite (1MB of wasm) and almost 50x faster on the biggest
wasm I have on my drive (40MB of LLVM). (These numbers are on release
builds, without debug info - debug into makes things slower, so the speedup is
lower there, and will need further work.)
Tested manually and also on wasm0 wasm2 other on emscripten.
|
|
|
|
|
|
|
|
| |
This avoids needing to add include wasm-printing if a file doesn't already have it.
To achieve that, add the std::ostream hooks in wasm.h, and also use them
when possible, removing the need for the special WasmPrinter object.
Also stop printing in "full" (print types on each line) in error messages by default. The
user can still get that, as always, using BINARYEN_PRINT_FULL=1 in the env.
|
|
|
|
|
| |
Adds an IR profile to each function so the validator can determine
which validation rules to apply and adds a flag to have the wast
parser set the profile to Poppy for testing purposes.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Optionally track the binary format code section offsets,
that is, when loading a binary, remember where each IR
node was read from. This is necessary for DWARF
debug info, as these are the offsets DWARF refers to.
(Note that eventually we may want to do something
else, like first read the DWARF and only then add
debug info annotations into the IR in a more LLVM-like
manner, but this is more straightforward and should be
enough to update debug lines and ranges).
This tracking adds noticeable overhead - every single
IR node adds an entry in a map - so avoid it unless
actually necessary. Specifically, if the user passes in
-g and there are actually DWARF sections in the
binary, and we are not about to remove those sections,
then we need it.
Print binary format code section offsets in text, when
printing with -g. This will help debug and test dwarf
support. It looks like
;; code offset: 0x7
as an annotation right before each node.
Also add support for -g in wasm-opt tests (unlike
a pass, it has just one - as a prefix).
Helps #2400
|
|
|
|
|
|
| |
This means that debugging/tracing can now be enabled and controlled
centrally without managing and passing state around the codebase.
|
|
|
| |
Applies the changes in #2065, and temprarily disables the hook since it's too slow to run on a change this large. We should re-enable it in a later commit.
|
|
|
| |
Mass change to apply clang-format to everything. We are applying this in a PR by me so the (git) blame is all mine ;) but @aheejin did all the work to get clang-format set up and all the manual work to tidy up some things to make the output nicer in #2048
|
|
|
|
| |
This is necessary to write tests that don't require temporary files,
such as in #1948, and is generally useful.
|
|
|
|
|
|
|
|
|
|
| |
* support source map input in wasm-opt, refactoring the loading code into wasm-io
* use wasm-io in wasm-as
* support output source maps in wasm-opt
* add a test for wasm-opt and source maps
|
|
|
|
|
|
|
| |
* wasm-link-metadata: Use `__data_end` symbol.
* Add --global-base param to emscripten-wasm-finalize to compute staticBump properly
* Let ModuleWriter write to a provided Output object
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* Skeleton of a beginning of o2wasm, WIP and probably not going to be used
* Get building post-cherry-pick
* ast->ir, remove commented out code, include a debug module print because linking
* Read linking section, print emscripten metadata json
* WasmBinaryWriter emits user sections on Module
* Remove debugging prints, everything that isn't needed to build metadata
* Rename o2wasm to lld-metadata
* lld-metadata support for outputting to file
* Use tables index instead of function index for initializer functions
* Add lld-emscripten tool to add emscripten-runtime functions to wasm modules (built with lld)
* Handle EM_ASM in lld-emscripten
* Add a list of functions to forcibly export (for initializer functions)
* Disable incorrect initializer function reading
* Add error printing when parsing .o files in lld-metadata
* Remove ';; METADATA: ' prefix from lld-metadata, output is now standalone json
* Support em_asm consts that aren't at the start of a segment
* Initial test framework for lld-metadata tool
* Add em_asm test
* Add support for WASM_INIT_FUNCS in the linking section
* Remove reloc section parsing because it's unused
* lld-emscripten can read and write text
* Add test harness for lld-emscripten
* Export all functions for now
* Add missing lld test output
* Add support for reading object files differently
Only difference so far is in importing mutable globals being an object
file representation for symbols, but invalid wasm.
* Update help strings
* Update linking tests for stackAlloc fix
* Rename lld-emscripten,lld-metadata to wasm-emscripten-finalize,wasm-link-metadata
* Add help text to header comments
* auto& instead of auto &
* Extract LinkType to abi/wasm-object.h
* Remove special handling for wasm object file reading, allow mutable globals
* Add braces around default switch case
* Fix flake8 errors
* Handle generating dyncall thunks for imports as well
* Use explicit bool for stackPointerGlobal
* Use glob patterns for lld file iteration
* Use __wasm_call_ctors for all initializer functions
|
|
|
|
|
|
|
|
| |
(#1017)
* Extends wasm-as, wasm-dis and s2wasm to consume debug locations.
* Exports source map from asm2wasm
|
| |
|
|
* Added ModuleReader/Writer classes that support text and binary I/O
* Use them in wasm-opt and asm2wasm
|