| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
| |
--skip-pass can now be specified more than once on the commandline.
|
|
|
|
|
|
|
|
|
|
|
|
| |
Each pass instance can now store an argument for it, which can be different.
This may be a breaking change for the corner case of running a pass multiple
times and setting the pass's argument multiple times as well (before, the last
pass argument affected them all; now, it affects the last instance only). This
only affects arguments with the name of a pass; others remain global, as
before (and multiple passes can read them, in fact). See the CHANGELOG for
details.
Fixes #6646
|
|
|
|
|
|
|
|
|
| |
Normally we use it when optimizing (above a certain level). This lets the user
prevent it from being used even then.
Also add optimization options to wasm-metadce so that this is possible
there as well and not just in wasm-opt (this also opens the door to running
more passes in metadce, which may be useful later).
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously we had passes --generate-stack-ir, --optimize-stack-ir, --print-stack-ir
that could be run like any other passes. After generating StackIR it was stashed on
the function and invalidated if we modified BinaryenIR. If it wasn't invalidated then
it was used during binary writing. This PR switches things so that we optionally
generate, optimize, and print StackIR only during binary writing. It also removes
all traces of StackIR from wasm.h - after this, StackIR is a feature of binary writing
(and printing) logic only.
This is almost NFC, but there are some minor noticeable differences:
1. We no longer print has StackIR in the text format when we see it is there. It
will not be there during normal printing, as it is only present during binary writing.
(but --print-stack-ir still works as before; as mentioned above it runs during writing).
2. --generate/optimize/print-stack-ir change from being passes to being flags that
control that behavior instead. As passes, their order on the commandline mattered,
while now it does not, and they only "globally" affect things during writing.
3. The C API changes slightly, as there is no need to pass it an option "optimize" to
the StackIR APIs. Whether we optimize is handled by --optimize-stack-ir which is
set like other optimization flags on the PassOptions object, so we don't need the
old option to those C APIs.
The main benefit here is simplifying the code, so we don't need to think about
StackIR in more places than just binary writing. That may also allow future
improvements to our usage of StackIR.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We tested --generate-global-effects --vacuum and such, but not
--generate-global-effects -O3 or the other -O flags. Unfortunately, our
targeted testing missed a bug because of that. Specifically, we have special
logic for -O flags to make sure the passes they expand into run with the
proper opt and shrink levels, but that logic happened to also interfere with
global effect computation. It would also interfere with allowing GUFA info
or other things to be stored on the side, which we've proposed. This PR
fixes that + future issues.
The fix is to just allow a pass runner to execute more than once. We thought
to avoid that and assert against it to keep the model "hermetic" (you create
a pass runner, you run the passes, and you throw it out), which feels nice in
a way, but it led to the bug here, and I'm not sure it would prevent any other
ones really. It is also more code. It is simpler to allow a runner to execute more
than once, and add a method to clear it. With that, the logic for -O3 execution
is both simpler and does not interfere with anything but the opt and shrink
level flags: we create a single runner, give it the proper options, and then keep
using that runner + those options as we go, normally.
|
|
|
|
|
|
|
|
|
|
|
| |
This is a followup to #5333 . That fixed the selection of which passes to run, but
forgot to also fix the global state of the current optimize/shrink levels. This PR
fixes that. As a result, running -O3 -Oz will now work as expected: the first -O3
will run the right passes (as #5333 fixed) and while running them, the global
optimize/shrinkLevels will be -O3 (and not -Oz), which this PR fixes.
A specific result of this is that -O3 -Oz used to inline less, since the invocation
of inlining during -O3 thought we were optimizing for size. The new test verifies
that we do fully inline in the first -O3 now.
|
|
|
|
|
|
|
|
| |
For example,
-O3 --skip-pass=vacuum
will run -O3 normally but it will not run the vacuum pass at all
(which normally runs more than once in -O3).
|
|
|
|
| |
This allows tools like wasm-reduce to be told to operate in closed-world mode. That
lets them validate in the more strict way of that mode.
|
|
|
|
|
|
|
|
|
|
| |
Previously -O3 -O1 would run -O1 twice since the last flag set the global opt level
to 1, and then all invocations of the optimizer pipeline read that. This makes each
pipeline define its own opt level.
This has been a long-standing annoyance, which wasn't much noticed except that
with wasm GC there is more of a need to run the optimization pipeline more than
once. And sometimes it is nice to run different levels.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
With this change we default to an open world, that is, we do the safe thing
by default: we no longer assume a closed world. Users that want a closed
world must pass --closed-world.
Atm we just do not run passes that assume a closed world. (We might later
refine them to find which types don't escape and only optimize those.) The
RemoveUnusedModuleElements is an exception in that the closed-world
flag influences one part of its operation, but not the rest.
Fixes #5292
|
|
|
| |
The flag does nothing so far.
|
|
|
|
|
| |
Introduce static consts with PassOptions Defaults.
Add assertion to verify that the default options are the Os options.
Also update the text in relevant tests.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The general shape of the --help output is now:
========================
wasm-foo
Does the foo operation
========================
wasm-foo opts:
--------------
--foo-bar ..
Tool opts:
----------
..
The options are now in categories, with the more specific ones - most likely to be
wanted by the user - first. I think this makes the list a lot less confusing.
In particular, in wasm-opt all the opt passes are now in their own category.
Also add a script to make it easy to update the help tests.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This adds `EHUtils::handleBlockNestedPops`, which can be called at the
end of passes that has a possibility to put `pop`s inside `block`s. This
method assumes there exists a `pop` in a first-descendant line, even
though it can be nested within a block. This allows a `pop` to be nested
within a `block` or a `try`, but not a `loop`, since that means the
`pop` can run multile times. In case of `if`, `pop` can exist only in
its condition; if a `pop` is in its true or false body, that's not in
the first-descendant line.
This can be useful when optimization passes create blocks to do
transformations. Wrapping expressions wiith a block does not change
semantics most of the time, but if pops happen to be inside a block
generated by those passes, they can result in invalid binaries.
To test this, this adds `passes/test_passes.cpp`, which is intended to
contain multiple test passes that test a single (or more) utility
functions separately. Without this kind of pass, it is hard to test
various cases in which nested `pop`s can be generated in existing
passes. This PR also adds `PassRegistry::registerTestPass`, which
registers a pass that's intended only for internal testing and does not
show up in `wasm-opt --help`.
Fixes #4237.
|
|
|
|
|
| |
Locally I saw a 10% speedup on j2cl but reports of regressions have
arrived, so let's disable it for now pending investigation. The option added
here should make it easy to experiment.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The goal of this mode is to remove obviously-unneeded code like
(drop
(i32.load
(local.get $x)))
In general we can't remove it, as the load might trap - we'd be removing
a side effect. This is fairly rare in general, but actually becomes quite
annoying with wasm GC code where such patterns are more common,
and we really need to remove them.
Historically the IgnoreImplicitTraps option was meant to help here. However,
in practice it did not quite work well enough for most production code, as
mentioned e.g. in #3934 . TrapsNeverHappen mode is an attempt to fix that,
based on feedback from @askeksa in that issue, and also I believe this
implements an idea that @fitzgen mentioned a while ago (sorry, I can't
remember where exactly...). So I'm hopeful this will be generally useful
and not just for GC.
The idea in TrapsNeverHappen mode is that traps are assumed to not
actually happen at runtime. That is, if there is a trap in the code, it will
not be reached, or if it is reached then it will not trap. For example, an
(unreachable) would be assumed to never be reached, which means
that the optimizer can remove it and any code that executes right before
it:
(if
(..condition..)
(block
(..code that can be removed, if it does not branch out..)
(..code that can be removed, if it does not branch out..)
(..code that can be removed, if it does not branch out..)
(unreachable)))
And something like a load from memory is assumed to not trap, etc.,
which in particular would let us remove that dropped load from earlier.
This mode should be usable in production builds with assertions
disabled, if traps are seen as failing assertions. That might not be true
of all release builds (maybe some use traps for other purposes), but
hopefully in some. That is, if traps are like assertions, then enabling
this new mode would be like disabling assertions in release builds
and living with the fact that if an assertion would have been hit then
that is "undefined behavior" and the optimizer might have removed
the trap or done something weird.
TrapsNeverHappen (TNH) is different from IgnoreImplicitTraps (IIT).
The old IIT mode would just ignore traps when computing effects.
That is a simple model, but a problem happens with a trap behind
a condition, like this:
if (x != 0) foo(1 / x);
We won't trap on integer division by zero here only because of the
guarding if. In IIT, we'd compute no side effects on 1 / x, and then
we might end up moving it around, depending on other code in
the area, and potentially out of the if - which would make it happen
unconditionally, which would break.
TNH avoids that problem because it does not simply ignore traps.
Instead, there is a new hasUnremovableSideEffects() method
that must be opted-in by passes. That checks if there are no side
effects, or if there are, if we can remove them - and we know we can
remove a trap if we are running under TrapsNeverHappen mode,
as the trap won't happen by assumption. A pass must only use that
method where it is safe, that is, where it would either remove the
side effect (in which case, no problem), or if not, that it at least does
not move it around (avoiding the above problem with IIT).
This PR does not implement all optimizations possible with
TNH, just a small initial set of things to get started. It is already
useful on wasm GC code, including being as good as IIT on removing
unnecessary casts in some cases, see the test suite updates here.
Also, a significant part of the 18% speedup measured in
#4052 (comment)
is due to my testing with this enabled, as otherwise the devirtualization
there leaves a lot of unneeded code.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We have --pass-arg that allows sending an argument to a pass, like this:
wasm-opt --do-stuff --pass-arg=do-stuff@FUNCTION_NAME
With this PR that is equivalent to this:
wasm-opt --do-stuff=FUNCTION_NAME
That is,one can just give an argument to a pass on the commandline.
This fixes the Optional mode in command-line.h/cpp. That was not
actually used anywhere before this PR.
Also rename --extract-function's pass argument to match it. That is, the usage
used to be
wasm-opt --extract-function --pass-arg=extract@FUNCTION_NAME
Note how the pass name differed from the pass-arg name. This changes it to
match. This is a breaking change, but I doubt this is used enough to justify
any deprecation / backwards compatibility effort, and any usage is almost
certainly manual, and with PR writing it manually becomes easier as one
can do
wasm-opt --extract-function=FUNCTION_NAME
The existing test for that is kept (&renamed), and a new test added to test the
new notation.
This is a step towards unifying the symbol map functionality between
wasm-as and wasm-opt (later PRs will turn the symbol mapping pass into
a pass that receives an argument).
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This implements emscripten-core/emscripten#13744
Inlining functions with a single use allows us to remove the function afterward.
That looks highly beneficial, shrinking every single benchmark in emscripten's
benchmark suite, by an average of 2% on the macrobenchmarks and 3.5% on
all of them. Speed also improves, although mostly on the microbenchmarks so
that might be less realistic.
There may be a slight downside to startup time due to emitting larger functions,
but given the baseline compilers in VMs these days it seems worth it, as the
delay would be just to get to the upper tier. On the benchmark suite the risk
seems low.
See more details in the PR above.
|
| |
|
|
|
|
|
|
|
|
|
| |
When set, we can assume an imported memory was not modified before us.
That lets us assume it is all zeros and so we can optimize out zeros from
memory segments.
This does not actually do anything with the flag, just adds it. This is to avoid
a rolling problem. Next emscripten can emit it without erroring, and then we
can start to use it.
|
|
|
|
|
|
|
|
|
|
|
|
| |
Similar to clang and gcc, --fast-math makes us ignore corner cases of floating-point
math like NaN changes and (not done yet) lack of associativity and so forth.
In the future we may want to have separate fast math flags for each specific thing,
like gcc and clang do.
This undoes some changes (#2958 and #3096) where we assumed it was
ok to not change NaN bits, but @binji corrected us. We can only do such things in fast
math mode. This puts those optimizations behind that flag, adds tests for it, and
restores the interpreter to the simpler code from before with no special cases.
|
|
|
|
|
|
|
|
|
| |
Split that mode into an option to check for loops (which indicate a function
is "heavy") and a constant check for having calls. The case of calls is
different as we would need more logic to avoid infinite recursion if we are
willing to inling functions with calls.
Practically, this renames allowHeavyweight to allowFunctionsWithLoops.
|
|
|
|
|
| |
As discussed in #2921, this allows inlining of functions not identified
as "lightweight" (that include a loop, for example).
|
|
|
|
|
|
| |
This will allow us to pass pass args to
wasm-emscripten-finalize, which runs
legalize-js-interface internally, which recently
added an optional argument.
|
|
|
| |
ignore-imports makes it not assume that any import may unwind/rewind the stack. ignore-indirect makes it not assume any indirect call can reach an unwind/rewind (which means, it assumes there is not an indirect call on the stack while unwinding).
|
|
|
|
|
| |
Use one shared location in optimization-options as much as possible.
This also allows tools like wasm2js to receive that flag.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* Inlining: exposed inlining thresholds as command-line parameters.
This will allow easier experimentation with optimal settings.
Also tweaked the default logic slightly to always inline single
caller functions up to a certain size.
The command-line arguments were tested to have the desired effect
for example by the Makefile change in this commit:
https://github.com/aardappel/lobster/commit/39ae393e27ff363ab095bbb26c90d6fe17570104
which in turn relies on:
https://github.com/emscripten-core/emscripten/pull/8635
* Grouped inlining options & reverted defaults.
Now uses same defaults for inlining as before for the sake of
not having to redo a lot of tests.
Added FIXME to indicate that the current inlining logic needs
fixing.
* Fixed default values now pulled from code.
* clang-format
|
|
|
| |
Windows filenames can't contain colons. Use @ instead for passing arguments to passes.
|
|
|
| |
Applies the changes in #2065, and temprarily disables the hook since it's too slow to run on a change this large. We should re-enable it in a later commit.
|
|
|
| |
Mass change to apply clang-format to everything. We are applying this in a PR by me so the (git) blame is all mine ;) but @aheejin did all the work to get clang-format set up and all the manual work to tidy up some things to make the output nicer in #2048
|
|
|
|
|
| |
This allows us to emit a (potentially modified) target features
section and conditionally emit other sections such as the DataCount
section based on the presence of features.
|
|
|
|
|
|
|
|
|
| |
This allows
wasm-opt --pass-arg=KEY:VALUE
where KEY and VALUE are strings. It is then added to passOptions.arguments, where passes can read it.
This is used in ExtractFunction instead of an env var.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
See #1919 - we did not do this consistently before.
This adds a lowMemoryUnused option to PassOptions. It can be passed on the commandline with --low-memory-unused. If enabled, we run the new optimize-added-constants pass, which does the real work here, replacing older code in post-emscripten.
Aside from running at the proper time (unlike the old pass, see #1919), this also has a -propagate mode, which can do stuff like this:
y = x + 10
[..]
load(y)
[..]
load(y)
=>
y = x + 10
[..]
load(x, offset=10)
[..]
load(x, offset=10)
That is, it can propagate such offsets to the loads/stores. This pattern is common in big interpreter loops, where the pointers are offsets into a big struct of state.
The pass does this propagation by using a new feature of LocalGraph, which can verify which locals are in SSA mode. Binaryen IR is not SSA (intentionally, since it's a later IR), but if a local only has a single set for all gets, that means that local is in such a state, and can be optimized. The tricky thing is that all locals are initialized to zero, so there are at minimum two sets. But if we verify that the real set dominates all the gets, then the zero initialization cannot reach them, and we are safe.
This PR also makes safe-heap aware of lowMemoryUnused. If so, we check for not just an access of 0, but the range 0-1023.
This makes zlib 5% faster, with either the wasm backend or asm2wasm. It also makes it 0.5% smaller. Also helps sqlite (1.5% faster) and lua (1% faster)
|
|
|
|
|
| |
* Also fixes some bugs in wasm2js tests that did not validate.
* Rename FeatureOptions => ToolOptions, as they now contain all the basic stuff each tool needs for commandline options (validation yes or no, and which features if so).
|
|
|
|
| |
Add feature flags and struct interface. Default feature set has all feature enabled.
|
|
|
|
|
|
|
|
|
| |
This defines a new -O4 optimization mode, as flatten + flat-only opts (currently local-cse) + -O3.
In practice, flattening is not needed for LLVM output, which is pretty flat already (no block or if values, etc., even if it does use tees and does nest expressions; and LLVM has already done gvn etc. anyhow). In general, though, wasm generated by a non-LLVM compiler may naturally be nested because wasm allows that. See for example #1593 where an AssemblyScript testcase requires flattening to be fully optimized. So -O4 can help there.
-O4 takes 3x longer to run than -O3 in my testing, basically because flat IR is much bigger. But when it's useful it may be worth it. It does handle that AssemblyScript testcase and others like it. There's not much big real-world code that isn't LLVM yet, but running the fuzzer - which happily creates nested stuff all the time - I see -O4 consistently shrink the size by around 20% over -O3.
|
|
|
|
| |
makes loading large wasm files more than twice as fast (#1496)
|
|
|
| |
The & on the type is the proper convention.
|
|
|
|
|
| |
Followup to #1357. This moves the optimization settings into pass.h, and uses it from there in the various places.
This also splits up huge lines from the tracing code, which put all block children (whose number can be arbitrarily large) on one line. This seems to have caused random errors on the bots, I suspect from overflowing a buffer. Anyhow, it's much more clear to split the lines at a reasonable length.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* Add optimize, shrink level and debug info options to C/JS
* Add instantiate functionality for creating additional unique instances of the API
* Use a workaround when running tests in node
Tests misuse a module as a script by concatenating, so instead of catching this case in the library, catch it there
* Update sieve test
Seems optimized output changed due to running with optimize levels 2/1 now
* Use the options with all pass runners
* Update relooper-fuzz C-API test
* Share defaults between tools and the C-API
* Add a test for optimize levels
* Unify node test support in check.by and auto_update_tests.py
* Also add getters for optimize levels and test them
* Also test debugInfo
* Add debug info to C tests that used it as well
* Fix missing NODEJS import in auto_update_tests
* Detect node.js version (WASM support)
* Update hello-world JS test (now also runs with node)
* feature-test WebAssembly in node instead
* Document that these options apply globally, and where
* Make sure hello-world.js output doesn't differ between mozjs/node
|
|
|
|
|
|
|
|
|
|
|
|
| |
This enum describes which wasm features the IR is expected to include. The
validator should reject operations which require excluded features, and passes
should avoid producing IR which requires excluded features.
This makes it easier to catch possible errors in Binaryen producers (e.g.
emscripten). Asm2wasm has a flag to enable or disable atomics. Other
tools currently just accept all features (as, dis and opt are just for
inspecting or modifying existing modules, so it would be annoying to have to use
flags with those tools and I expect the risk of accidentally introducing
atomics to be low).
|
|
|
|
|
|
|
|
| |
* avoid returning a PassRunner just for OptimizationOptions, it would need a more careful design with a copy constructor. instead, just simplify the API to do the thing we need, which is run the passes
* disallow copy constructor
* delete copy operator too
|
|
|
|
| |
* refactor optimization opts helper code to a class
|
| |
|
|
|
|
|
|
|
|
| |
* add --ignore-implicit-traps option, and by default do not ignore them, to properly preserve semantics
* implicit traps can be reordered, but are side effects and should not be removed
* add testing for --ignore-implicit-traps
|
| |
|
|
And use them in wasm-opt and asm2wasm consistently and uniformly.
|