Fuzzing: ClusterFuzz integration (#7079)

The main addition here is a bundle_clusterfuzz.py script, which packages up the exact files that should be uploaded to ClusterFuzz. It also documents the process of bundling and testing. You can do

  bundle_clusterfuzz.py OUTPUT_FILE.tgz

That bundles wasm-opt from ./bin, which is enough for local testing. For actually uploading to ClusterFuzz, we need a portable build, and @dschuff had the idea to reuse the emsdk build, which works nicely. Doing

  bundle_clusterfuzz.py OUTPUT_FILE.tgz --build-dir=/path/to/emsdk/upstream/

will bundle wasm-opt (+libs) from the emsdk. I verified that those builds work on ClusterFuzz.

I added several forms of testing here. First, our main fuzzer fuzz_opt.py now has a ClusterFuzz testcase handler, which simulates a ClusterFuzz environment. Second, there are smoke tests that run in the unit test suite, and can also be run separately:

  python -m unittest test/unit/test_cluster_fuzz.py

Those unit tests can also run on a given bundle, e.g. one created from an emsdk build, for testing right before upload:

  BINARYEN_CLUSTER_FUZZ_BUNDLE=/path/to/bundle.tgz python -m unittest test/unit/test_cluster_fuzz.py

A third piece of testing is a new --fuzz-passes test. --fuzz-passes is a mode for -ttf (translate random data into a valid wasm fuzz testcase) that uses the random data to pick and run a set of passes, to further shape the wasm. (--fuzz-passes had no previous testing; this PR fixes it, tidies it up a little, and adds some newer passes.)

Otherwise, this PR includes the key run.py script that is bundled and then executed by ClusterFuzz: basically a Python script that runs wasm-opt -ttf [..] to generate testcases, sets up their JS, and emits them.

fuzz_shell.js, the JS that executes testcases, now checks whether it has already been provided the binary data of a wasm file. If so, it does not read a wasm file from argv. (This is needed because ClusterFuzz expects a single file per testcase, so we make a JS file with the wasm bundled inside it.)
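As an illustration of the single-file testcase format described above, here is a minimal sketch (in Python, like run.py) of a sanity check one could run on a generated testcase. It assumes the exact prefix that run.py's get_js_file_contents() writes, `var binary = new Uint8Array([...]);`; the helper names and the check itself are hypothetical and not part of this PR.

```python
import re

# The wasm binary preamble: the magic bytes "\0asm" followed by version 1,
# encoded as a little-endian u32.
WASM_HEADER = b'\x00asm\x01\x00\x00\x00'


def make_testcase_js(wasm_bytes, shell_js):
    # Prepend a line defining |binary|, in the same format run.py uses, so
    # that fuzz_shell.js picks it up instead of reading a file from argv.
    contents = ','.join(str(b) for b in wasm_bytes)
    return f'var binary = new Uint8Array([{contents}]);\n\n' + shell_js


def extract_embedded_wasm(js):
    # Recover the embedded bytes from the first line, or None if absent.
    match = re.match(r'var binary = new Uint8Array\(\[([0-9,]*)\]\);', js)
    if not match:
        return None
    text = match.group(1)
    return bytes(int(x) for x in text.split(',')) if text else b''


def looks_like_wasm_testcase(js):
    # True if the JS embeds bytes that start with a valid wasm preamble.
    wasm = extract_embedded_wasm(js)
    return wasm is not None and wasm.startswith(WASM_HEADER)
```

In practice one would read a generated fuzz-binaryen-N.js file and pass its text to looks_like_wasm_testcase().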
author: Alon Zakai <azakai@google.com> 2024-11-19 09:28:01 -0800
committer: GitHub <noreply@github.com> 2024-11-19 09:28:01 -0800
commit: b0e999a2b8841d8be21cbcdc84cbc1d6469e36d7
tree: 55f1d24ca38d3a0c9b6e9197f0e1a28493c50f50
parent: 25b8e6a714d2217e8735a925bc751900bce09d53
11 files changed, 808 insertions, 22 deletions
diff --git a/scripts/bundle_clusterfuzz.py b/scripts/bundle_clusterfuzz.py
new file mode 100755
index 000000000..a03553837
--- /dev/null
+++ b/scripts/bundle_clusterfuzz.py
@@ -0,0 +1,135 @@
+#!/usr/bin/python3
+
+'''
+Bundle files for uploading to ClusterFuzz.
+
+Usage:
+
+bundle_clusterfuzz.py OUTPUT_FILE.tgz [--build-dir=BUILD_DIR]
+
+The output file will be a .tgz file.
+
+If a build directory is provided, we will look under it to find bin/wasm-opt
+and lib/libbinaryen.so. A useful place to get builds from is the Emscripten SDK,
+as you can do
+
+  ./emsdk install tot
+
+after which ./upstream/ (from the emsdk dir) will contain builds of wasm-opt and
+libbinaryen.so (that are designed to run on as many systems as possible, by not
+depending on newer libc symbols, etc., as opposed to a normal local build).
+Thus, the full workflow could be
+
+  cd emsdk
+  ./emsdk install tot
+  cd ../binaryen
+  python3 scripts/bundle_clusterfuzz.py binaryen_wasm_fuzzer.tgz --build-dir=../emsdk/upstream
+
+When using --build-dir in this way, you are responsible for ensuring that the
+wasm-opt in the build dir is compatible with the scripts in the current dir
+(e.g., if run.py here passes a flag that exists only in a newer or older
+version of wasm-opt, the bundled wasm-opt may fail to run).
+
+Before uploading to ClusterFuzz, it is worth doing the following:
+
+  1. Run the local fuzzer (scripts/fuzz_opt.py). That includes a ClusterFuzz
+     testcase handler, which simulates what ClusterFuzz does.
+
+  2. Run the unit tests, which include smoke tests for our ClusterFuzz support:
+
+       python -m unittest test/unit/test_cluster_fuzz.py
+
+     Look at the logs, which will contain statistics on the wasm files the
+     fuzzer emits, and see that they look reasonable.
+
+     You should run the unit tests on the bundle you are about to upload, by
+     setting the proper env var like this (using the same filename as above):
+
+       BINARYEN_CLUSTER_FUZZ_BUNDLE=`pwd`/binaryen_wasm_fuzzer.tgz python -m unittest test/unit/test_cluster_fuzz.py
+
+     Note that you must pass an absolute filename (e.g. using pwd as shown).
+
+     The unittest logs should show, at the very start, that this bundle is
+     being used ("Using existing bundle: ..." rather than "Making a new
+     bundle"). Note that some of the unittests also create their own bundles, to
+     test the bundling script itself, so later down you will see logging of
+     bundle creation even if you provide a bundle.
+
+After uploading to ClusterFuzz, you can wait a while for it to run, and then:
+
+  1. Inspect the log to see that we generate all the testcases properly, and
+     their sizes look reasonably random, etc.
+
+  2. Inspect the sample testcase and run it locally, to see that
+
+       d8 --wasm-staging testcase.js
+
+     properly runs the testcase, emitting logging etc.
+
+  3. Check the stats and crashes page (known crashes should at least be showing
+     up). Note that these may take longer to show up than 1 and 2.
+'''
+
+import os
+import sys
+import tarfile
+
+# Read the filenames first, as importing |shared| changes the directory.
+output_file = os.path.abspath(sys.argv[1])
+print(f'Bundling to: {output_file}')
+assert output_file.endswith('.tgz'), 'Can only generate a .tgz'
+
+build_dir = None
+if len(sys.argv) >= 3:
+    assert sys.argv[2].startswith('--build-dir=')
+    build_dir = sys.argv[2].split('=', 1)[1]
+    build_dir = os.path.abspath(build_dir)
+    # Delete the argument, as importing |shared| scans it.
+    sys.argv.pop()
+
+from test import shared # noqa
+
+# Pick where to get the builds
+if build_dir:
+    binaryen_bin = os.path.join(build_dir, 'bin')
+    binaryen_lib = os.path.join(build_dir, 'lib')
+else:
+    binaryen_bin = shared.options.binaryen_bin
+    binaryen_lib = shared.options.binaryen_lib
+
+with tarfile.open(output_file, "w:gz") as tar:
+    # run.py
+    run = os.path.join(shared.options.binaryen_root, 'scripts', 'clusterfuzz', 'run.py')
+    print(f'  .. run:         {run}')
+    tar.add(run, arcname='run.py')
+
+    # fuzz_shell.js
+    fuzz_shell = os.path.join(shared.options.binaryen_root, 'scripts', 'fuzz_shell.js')
+    print(f'  .. fuzz_shell:  {fuzz_shell}')
+    tar.add(fuzz_shell, arcname='scripts/fuzz_shell.js')
+
+    # wasm-opt binary
+    wasm_opt = os.path.join(binaryen_bin, 'wasm-opt')
+    print(f'  .. wasm-opt:    {wasm_opt}')
+    tar.add(wasm_opt, arcname='bin/wasm-opt')
+
+    # For a dynamic build we also need libbinaryen.so and possibly other files.
+    # Try both .so and .dylib suffixes for more OS coverage.
+    for suffix in ['.so', '.dylib']:
+        libbinaryen = os.path.join(binaryen_lib, f'libbinaryen{suffix}')
+        if os.path.exists(libbinaryen):
+            print(f'  .. libbinaryen: {libbinaryen}')
+            tar.add(libbinaryen, arcname=f'lib/libbinaryen{suffix}')
+
+            # The emsdk build also includes some more necessary files.
+            for name in [f'libc++{suffix}', f'libc++{suffix}.2', f'libc++{suffix}.2.0']:
+                path = os.path.join(binaryen_lib, name)
+                if os.path.exists(path):
+                    print(f'  ......... : {path}')
+                    tar.add(path, arcname=f'lib/{name}')
+
+print('Done.')
+print('To run the tests on this bundle, do:')
+print()
+print(f'BINARYEN_CLUSTER_FUZZ_BUNDLE={output_file} python -m unittest test/unit/test_cluster_fuzz.py')
+print()
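A quick way to confirm that a bundle has the layout bundle_clusterfuzz.py produces, before running the unit tests on it, is to list its members. A minimal sketch, assuming the arcnames used above (run.py, scripts/fuzz_shell.js, bin/wasm-opt, and optional files under lib/); the function names are hypothetical.

```python
import tarfile

# Paths every bundle should contain, matching the arcnames used above.
REQUIRED_MEMBERS = ['run.py', 'scripts/fuzz_shell.js', 'bin/wasm-opt']


def missing_bundle_members(bundle_path):
    # Return the required paths that are absent from a .tgz bundle.
    with tarfile.open(bundle_path, 'r:gz') as tar:
        names = set(tar.getnames())
    return [m for m in REQUIRED_MEMBERS if m not in names]


def bundled_libraries(bundle_path):
    # List the shared libraries (if any) that were packed under lib/.
    with tarfile.open(bundle_path, 'r:gz') as tar:
        return sorted(n for n in tar.getnames() if n.startswith('lib/'))
```

An emsdk-based bundle would be expected to list libbinaryen and the libc++ files under lib/, while a bundle of a static local build may have none.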
diff --git a/scripts/clusterfuzz/run.py b/scripts/clusterfuzz/run.py
new file mode 100755
index 000000000..efddfc2d4
--- /dev/null
+++ b/scripts/clusterfuzz/run.py
@@ -0,0 +1,163 @@
+#
+# Copyright 2024 WebAssembly Community Group participants
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#         http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+'''
+ClusterFuzz run.py script: when run by ClusterFuzz, it uses wasm-opt to generate
+a fixed number of testcases. This is a "blackbox fuzzer", see
+
+https://google.github.io/clusterfuzz/setting-up-fuzzing/blackbox-fuzzing/
+
+This file should be bundled up together with the other files it needs, see
+bundle_clusterfuzz.py.
+'''
+
+import os
+import getopt
+import random
+import subprocess
+import sys
+
+# The V8 flags we put in the "fuzzer flags" files, which tell ClusterFuzz how to
+# run V8. By default we apply all staging flags.
+FUZZER_FLAGS_FILE_CONTENTS = '--wasm-staging'
+
+# Maximum size of the random data that we feed into wasm-opt -ttf. This is
+# smaller than fuzz_opt.py's INPUT_SIZE_MAX because that script is tuned for
+# fuzzing large wasm files (to reduce the overhead we have of launching many
+# processes per file), which is less of an issue on ClusterFuzz.
+MAX_RANDOM_SIZE = 15 * 1024
+
+# The prefix for fuzz files.
+FUZZ_FILENAME_PREFIX = 'fuzz-'
+
+# The prefix for flags files.
+FLAGS_FILENAME_PREFIX = 'flags-'
+
+# The name of the fuzzer (appears after FUZZ_FILENAME_PREFIX /
+# FLAGS_FILENAME_PREFIX).
+FUZZER_NAME_PREFIX = 'binaryen-'
+
+# The root directory of the bundle this will be in, which is the directory of
+# this very file.
+ROOT_DIR = os.path.dirname(os.path.abspath(__file__))
+
+# The path to the wasm-opt binary that we run to generate testcases.
+FUZZER_BINARY_PATH = os.path.join(ROOT_DIR, 'bin', 'wasm-opt')
+
+# The path to the fuzz_shell.js script that will execute the wasm in each
+# testcase.
+JS_SHELL_PATH = os.path.join(ROOT_DIR, 'scripts', 'fuzz_shell.js')
+
+# The arguments we provide to wasm-opt to generate wasm files.
+FUZZER_ARGS = [
+    # Generate a wasm from random data.
+    '--translate-to-fuzz',
+    # Run some random passes, to further shape the random wasm we emit.
+    '--fuzz-passes',
+    # Enable all features but disable ones not yet ready for fuzzing. This may
+    # be a smaller set than fuzz_opt.py, as that enables a few experimental
+    # flags, while here we just fuzz with d8's --wasm-staging.
+    '-all',
+    '--disable-shared-everything',
+    '--disable-fp16',
+]
+
+
+# Returns the file name for fuzz or flags files.
+def get_file_name(prefix, index):
+    return f'{prefix}{FUZZER_NAME_PREFIX}{index}.js'
+
+
+# Returns the contents of a .js fuzz file, given particular wasm contents that
+# we want to be executed.
+def get_js_file_contents(wasm_contents):
+    # Start with the standard JS shell.
+    with open(JS_SHELL_PATH) as file:
+        js = file.read()
+
+    # Prepend the wasm contents, so they are used (rather than the normal
+    # mechanism where the wasm file's name is provided in argv).
+    wasm_contents = ','.join([str(c) for c in wasm_contents])
+    js = f'var binary = new Uint8Array([{wasm_contents}]);\n\n' + js
+    return js
+
+
+def main(argv):
+    # Parse the options. See
+    # https://google.github.io/clusterfuzz/setting-up-fuzzing/blackbox-fuzzing/#uploading-a-fuzzer
+    output_dir = '.'
+    num = 100
+    expected_flags = ['input_dir=', 'output_dir=', 'no_of_files=']
+    optlist, _ = getopt.getopt(argv[1:], '', expected_flags)
+    for option, value in optlist:
+        if option == '--output_dir':
+            output_dir = value
+        elif option == '--no_of_files':
+            num = int(value)
+
+    for i in range(1, num + 1):
+        input_data_file_path = os.path.join(output_dir, f'{i}.input')
+        wasm_file_path = os.path.join(output_dir, f'{i}.wasm')
+
+        # wasm-opt may fail to run in rare cases (when the fuzzer emits code it
+        # detects as invalid). Just try again in such a case.
+        for attempt in range(0, 100):
+            # Generate random data.
+            random_size = random.SystemRandom().randint(1, MAX_RANDOM_SIZE)
+            with open(input_data_file_path, 'wb') as file:
+                file.write(os.urandom(random_size))
+
+            # Generate wasm from the random data.
+            cmd = [FUZZER_BINARY_PATH] + FUZZER_ARGS
+            cmd += ['-o', wasm_file_path, input_data_file_path]
+            try:
+                subprocess.check_call(cmd)
+            except subprocess.CalledProcessError:
+                # Try again, unless this was the last attempt.
+                print('(oops, retrying wasm-opt)')
+                if attempt == 99:
+                    # Something is very wrong!
+                    raise
+                continue
+            # Success, leave the loop.
+            break
+
+        # Generate a testcase from the wasm
+        with open(wasm_file_path, 'rb') as file:
+            wasm_contents = file.read()
+        testcase_file_path = os.path.join(output_dir,
+                                          get_file_name(FUZZ_FILENAME_PREFIX, i))
+        js_file_contents = get_js_file_contents(wasm_contents)
+        with open(testcase_file_path, 'w') as file:
+            file.write(js_file_contents)
+
+        # Emit a corresponding flags file.
+        flags_file_path = os.path.join(output_dir,
+                                       get_file_name(FLAGS_FILENAME_PREFIX, i))
+        with open(flags_file_path, 'w') as file:
+            file.write(FUZZER_FLAGS_FILE_CONTENTS)
+
+        print(f'Created testcase: {testcase_file_path}, {len(wasm_contents)} bytes')
+
+        # Remove temporary files.
+        os.remove(input_data_file_path)
+        os.remove(wasm_file_path)
+
+    print(f'Created {num} testcases.')
+
+
+if __name__ == '__main__':
+    main(sys.argv)
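The retry loop in main() above regenerates the random input and reruns wasm-opt until it succeeds. A sketch of the same idea as a reusable helper, assuming each attempt is wrapped in a callable that writes fresh random data before invoking wasm-opt (rerunning -ttf on the same input would most likely fail the same way). The helper name retry() is hypothetical. In a Python for loop the loop variable is reassigned on every iteration, so the attempt count comes from the range alone.

```python
import subprocess


def retry(attempt_fn, max_attempts=100):
    # Call |attempt_fn| until it returns normally, making at most
    # |max_attempts| calls. The caller is assumed to generate fresh random
    # input inside attempt_fn, so every retry runs wasm-opt on new data.
    # If the final attempt still fails, its error propagates to the caller.
    for attempt in range(1, max_attempts + 1):
        try:
            return attempt_fn()
        except subprocess.CalledProcessError:
            if attempt == max_attempts:
                raise
            print('(oops, retrying wasm-opt)')
```

In run.py, attempt_fn would write os.urandom() data to the .input file and then call subprocess.check_call() on the wasm-opt command.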
diff --git a/scripts/fuzz_opt.py b/scripts/fuzz_opt.py
index bf712c821..cd583e026 100755
--- a/scripts/fuzz_opt.py
+++ b/scripts/fuzz_opt.py
@@ -36,6 +36,7 @@ import subprocess
 import random
 import re
 import sys
+import tarfile
 import time
 import traceback
 from os.path import abspath
@@ -1574,6 +1575,84 @@ class RoundtripText(TestCaseHandler):
         run([in_bin('wasm-opt'), abspath('a.wast')] + FEATURE_OPTS)
 
 
+# Fuzz in a near-identical manner to how we fuzz on ClusterFuzz. This is mainly
+# to see that fuzzing that way works properly (it is unlikely to catch anything
+# the other fuzzers here miss, though that is possible). That is, running this
+# script continuously gives continuous assurance that ClusterFuzz should be
+# running OK.
+#
+# Note that this is *not* deterministic like the other fuzzers: it runs run.py
+# like ClusterFuzz does, and that generates its own random data. If a bug is
+# caught here, it must be reduced manually.
+class ClusterFuzz(TestCaseHandler):
+    frequency = 0.1
+
+    def handle(self, wasm):
+        self.ensure()
+
+        # run.py should emit these two files. Delete them to make sure they
+        # are created by run.py in the next step.
+        fuzz_file = 'fuzz-binaryen-1.js'
+        flags_file = 'flags-binaryen-1.js'
+        for f in [fuzz_file, flags_file]:
+            if os.path.exists(f):
+                os.unlink(f)
+
+        # Call run.py, similarly to how ClusterFuzz does.
+        run([sys.executable,
+             os.path.join(self.clusterfuzz_dir, 'run.py'),
+             '--output_dir=' + os.getcwd(),
+             '--no_of_files=1'])
+
+        # We should see the two files.
+        assert os.path.exists(fuzz_file)
+        assert os.path.exists(flags_file)
+
+        # Run the testcase in V8, similarly to how ClusterFuzz does.
+        cmd = [shared.V8]
+        # The flags are given in the flags file - we do *not* use our normal
+        # flags here!
+        with open(flags_file, 'r') as f:
+            flags = f.read()
+        cmd.append(flags)
+        # Run the fuzz file, which contains a modified fuzz_shell.js - we do
+        # *not* run fuzz_shell.js normally.
+        cmd.append(os.path.abspath(fuzz_file))
+        # No wasm file needs to be provided: it is hardcoded into the JS. Note
+        # that we use run_vm(), which will ignore known issues in our output and
+        # in V8. Those issues may cause V8 to e.g. reject a binary we emit that
+        # is invalid, but that should not be a problem for ClusterFuzz (it isn't
+        # a crash).
+        output = run_vm(cmd)
+
+        # Verify that we called something. The fuzzer should always emit at
+        # least one exported function (unless we've decided to ignore the entire
+        # run).
+        if output != IGNORE:
+            assert FUZZ_EXEC_CALL_PREFIX in output
+
+    def ensure(self):
+        # The first time we actually run, set things up: make a bundle like the
+        # one ClusterFuzz receives, and unpack it for execution into a dir. The
+        # existence of that dir shows we've ensured all we need.
+        if hasattr(self, 'clusterfuzz_dir'):
+            return
+
+        self.clusterfuzz_dir = 'clusterfuzz'
+        if os.path.exists(self.clusterfuzz_dir):
+            shutil.rmtree(self.clusterfuzz_dir)
+        os.mkdir(self.clusterfuzz_dir)
+
+        print('Bundling for ClusterFuzz')
+        bundle = 'fuzz_opt_clusterfuzz_bundle.tgz'
+        run([in_binaryen('scripts', 'bundle_clusterfuzz.py'), bundle])
+
+        print('Unpacking for ClusterFuzz')
+        tar = tarfile.open(bundle, "r:gz")
+        tar.extractall(path=self.clusterfuzz_dir)
+        tar.close()
+
+
 # The global list of all test case handlers
 testcase_handlers = [
     FuzzExec(),
@@ -1585,7 +1664,8 @@ testcase_handlers = [
     Merge(),
     # TODO: enable when stable enough, and adjust |frequency| (see above)
     # Split(),
-    RoundtripText()
+    RoundtripText(),
+    ClusterFuzz(),
 ]
 
 
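The ClusterFuzz handler above appends the whole contents of the flags file to the d8 command as a single argument. That works because run.py writes exactly one flag (--wasm-staging) with no trailing newline. If a flags file ever held several flags, they would need to be split first. A minimal sketch, assuming shell-style, space-separated flags in the file; v8_command() is a hypothetical helper, not part of fuzz_opt.py.

```python
import shlex


def v8_command(v8, flags_file, fuzz_file):
    # Build a d8 command line from a ClusterFuzz-style flags file, splitting
    # its contents into separate arguments (and dropping any newline).
    with open(flags_file) as f:
        flags = shlex.split(f.read())
    return [v8] + flags + [fuzz_file]
```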
diff --git a/scripts/fuzz_shell.js b/scripts/fuzz_shell.js
index d9a994896..ce817646e 100644
--- a/scripts/fuzz_shell.js
+++ b/scripts/fuzz_shell.js
@@ -25,14 +25,18 @@ if (typeof process === 'object' && typeof require === 'function') {
   };
 }
 
-// We are given the binary to run as a parameter.
-var binary = readBinary(argv[0]);
+// The binary to be run. This may be set already (by code that runs before this
+// script), and if not, we get the filename from argv.
+var binary;
+if (!binary) {
+  binary = readBinary(argv[0]);
+}
 
 // Normally we call all the exports of the given wasm file. But, if we are
 // passed a final parameter in the form of "exports:X,Y,Z" then we call
 // specifically the exports X, Y, and Z.
 var exportsToCall;
-if (argv[argv.length - 1].startsWith('exports:')) {
+if (argv.length > 0 && argv[argv.length - 1].startsWith('exports:')) {
   exportsToCall = argv[argv.length - 1].substr('exports:'.length).split(',');
   argv.pop();
 }
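The new argv.length check matters because a testcase with an embedded binary may be run with no arguments at all, in which case argv[argv.length - 1] is undefined and calling startsWith() on it would throw. The same argument handling, sketched in Python for consistency with the other scripts here (split_exports_arg() is a hypothetical name):

```python
def split_exports_arg(argv):
    # Mirror of the fuzz_shell.js logic: if the last argument looks like
    # "exports:X,Y,Z", drop it and return the listed export names. Otherwise
    # return None, meaning "call every export". The emptiness check handles
    # the case where the wasm is embedded in the JS and argv is empty.
    if argv and argv[-1].startswith('exports:'):
        return argv[:-1], argv[-1][len('exports:'):].split(',')
    return list(argv), None
```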
diff --git a/src/tools/fuzzing/fuzzing.cpp b/src/tools/fuzzing/fuzzing.cpp
index cbdbff3ca..ed653ef6b 100644
--- a/src/tools/fuzzing/fuzzing.cpp
+++ b/src/tools/fuzzing/fuzzing.cpp
@@ -55,16 +55,23 @@ TranslateToFuzzReader::TranslateToFuzzReader(Module& wasm,
       wasm, read_file<std::vector<char>>(filename, Flags::Binary)) {}
 
 void TranslateToFuzzReader::pickPasses(OptimizationOptions& options) {
+  // Pick random passes to further shape the wasm. This is similar to how we
+  // pick random passes in fuzz_opt.py, but the goal there is to find problems
+  // in the passes, while the goal here is more to shape the wasm, so that
+  // translate-to-fuzz emits interesting outputs (the latter is important for
+  // things like ClusterFuzz, where we are using Binaryen to fuzz other things
+  // than itself). As a result, the list of passes here is different from
+  // fuzz_opt.py.
   while (options.passes.size() < 20 && !random.finished() && !oneIn(3)) {
-    switch (upTo(32)) {
+    switch (upTo(42)) {
       case 0:
       case 1:
       case 2:
       case 3:
       case 4: {
-        options.passes.push_back("O");
         options.passOptions.optimizeLevel = upTo(4);
-        options.passOptions.shrinkLevel = upTo(4);
+        options.passOptions.shrinkLevel = upTo(3);
+        options.addDefaultOptPasses();
         break;
       }
       case 5:
@@ -83,7 +90,14 @@ void TranslateToFuzzReader::pickPasses(OptimizationOptions& options) {
         options.passes.push_back("duplicate-function-elimination");
         break;
       case 10:
-        options.passes.push_back("flatten");
+        // Some features do not support flatten yet.
+        if (!wasm.features.hasReferenceTypes() &&
+            !wasm.features.hasExceptionHandling() && !wasm.features.hasGC()) {
+          options.passes.push_back("flatten");
+          if (oneIn(2)) {
+            options.passes.push_back("rereloop");
+          }
+        }
         break;
       case 11:
         options.passes.push_back("inlining");
@@ -127,11 +141,9 @@ void TranslateToFuzzReader::pickPasses(OptimizationOptions& options) {
       case 24:
         options.passes.push_back("reorder-locals");
         break;
-      case 25: {
-        options.passes.push_back("flatten");
-        options.passes.push_back("rereloop");
+      case 25:
+        options.passes.push_back("directize");
         break;
-      }
       case 26:
         options.passes.push_back("simplify-locals");
         break;
@@ -150,18 +162,115 @@ void TranslateToFuzzReader::pickPasses(OptimizationOptions& options) {
       case 31:
         options.passes.push_back("vacuum");
         break;
+      case 32:
+        options.passes.push_back("merge-locals");
+        break;
+      case 33:
+        options.passes.push_back("licm");
+        break;
+      case 34:
+        options.passes.push_back("tuple-optimization");
+        break;
+      case 35:
+        options.passes.push_back("rse");
+        break;
+      case 36:
+        options.passes.push_back("monomorphize");
+        break;
+      case 37:
+        options.passes.push_back("monomorphize-always");
+        break;
+      case 38:
+      case 39:
+      case 40:
+      case 41:
+        // GC specific passes.
+        if (wasm.features.hasGC()) {
+          // Most of these depend on closed world, so just set that.
+          options.passOptions.closedWorld = true;
+
+          switch (upTo(16)) {
+            case 0:
+              options.passes.push_back("abstract-type-refining");
+              break;
+            case 1:
+              options.passes.push_back("cfp");
+              break;
+            case 2:
+              options.passes.push_back("gsi");
+              break;
+            case 3:
+              options.passes.push_back("gto");
+              break;
+            case 4:
+              options.passes.push_back("heap2local");
+              break;
+            case 5:
+              options.passes.push_back("heap-store-optimization");
+              break;
+            case 6:
+              options.passes.push_back("minimize-rec-groups");
+              break;
+            case 7:
+              options.passes.push_back("remove-unused-types");
+              break;
+            case 8:
+              options.passes.push_back("signature-pruning");
+              break;
+            case 9:
+              options.passes.push_back("signature-refining");
+              break;
+            case 10:
+              options.passes.push_back("type-finalizing");
+              break;
+            case 11:
+              options.passes.push_back("type-refining");
+              break;
+            case 12:
+              options.passes.push_back("type-merging");
+              break;
+            case 13:
+              options.passes.push_back("type-ssa");
+              break;
+            case 14:
+              options.passes.push_back("type-unfinalizing");
+              break;
+            case 15:
+              options.passes.push_back("unsubtyping");
+              break;
+            default:
+              WASM_UNREACHABLE("unexpected value");
+          }
+        }
+        break;
       default:
         WASM_UNREACHABLE("unexpected value");
     }
   }
+
   if (oneIn(2)) {
+    // We randomize these when we pick -O?, but sometimes do so even without, as
+    // they affect some passes.
     options.passOptions.optimizeLevel = upTo(4);
+    options.passOptions.shrinkLevel = upTo(3);
   }
-  if (oneIn(2)) {
-    options.passOptions.shrinkLevel = upTo(4);
+
+  if (!options.passOptions.closedWorld && oneIn(2)) {
+    options.passOptions.closedWorld = true;
+  }
+
+  // Usually DCE at the very end, to ensure that our binaries validate in other
+  // VMs, due to how non-nullable local validation and unreachable code
+  // interact. See fuzz_opt.py and
+  //   https://github.com/WebAssembly/binaryen/pull/5665
+  //   https://github.com/WebAssembly/binaryen/issues/5599
+  if (wasm.features.hasGC() && !oneIn(10)) {
+    options.passes.push_back("dce");
   }
-  std::cout << "opt level: " << options.passOptions.optimizeLevel << '\n';
-  std::cout << "shrink level: " << options.passOptions.shrinkLevel << '\n';
+
+  // TODO: We could in theory run some function-level passes on particular
+  //       functions, but then we'd need to do this after generation, not
+  //       before (and random data no longer remains then).
 }
 
 void TranslateToFuzzReader::build() {
diff --git a/src/tools/wasm-opt.cpp b/src/tools/wasm-opt.cpp
index 3e1152179..3e429a976 100644
--- a/src/tools/wasm-opt.cpp
+++ b/src/tools/wasm-opt.cpp
@@ -161,8 +161,8 @@ int main(int argc, const char* argv[]) {
          })
     .add("--fuzz-passes",
          "-fp",
-         "Pick a random set of passes to run, useful for fuzzing. this depends "
-         "on translate-to-fuzz (it picks the passes from the input)",
+         "When doing translate-to-fuzz, pick a set of random passes from the "
+         "input to further shape the wasm",
          WasmOptOption,
          Options::Arguments::Zero,
          [&](Options* o, const std::string& arguments) { fuzzPasses = true; })
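--fuzz-passes draws its choices from the same random input as -ttf. The loop in pickPasses() keeps picking while fewer than 20 passes are queued, input remains, and a oneIn(3) roll does not end it, so with uniformly random input it stops after roughly two picks on average (a geometric distribution with stop probability about 1/3, giving a mean of (2/3)/(1/3) = 2). The model below is a simplification: its ByteRandom class, which consumes one byte per up_to() call, is a hypothetical stand-in for Binaryen's actual random-data reader, and it counts picks rather than passes, since one pick may add no passes (e.g. flatten when reference types are enabled, or a GC pass when GC is disabled) or more than one (flatten followed by rereloop).

```python
class ByteRandom:
    # Hypothetical, simplified stand-in for Binaryen's random-data reader:
    # every up_to() call consumes exactly one byte of the input.
    def __init__(self, data):
        self.data = data
        self.pos = 0

    def finished(self):
        return self.pos >= len(self.data)

    def up_to(self, n):
        # Return a value in [0, n); yields 0 once the input is exhausted.
        if self.finished():
            return 0
        value = self.data[self.pos] % n
        self.pos += 1
        return value

    def one_in(self, n):
        return self.up_to(n) == 0


def count_picks(data, max_picks=20):
    # Model of the loop condition in pickPasses(). The real loop compares
    # options.passes.size() against 20; here every pick counts as one.
    rng = ByteRandom(data)
    picks = 0
    while picks < max_picks and not rng.finished() and not rng.one_in(3):
        rng.up_to(42)  # which pass to pick; the choice itself is not modeled
        picks += 1
    return picks
```

Averaging count_picks(os.urandom(1000)) over many runs should come out close to 2 under this model.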
diff --git a/test/lit/help/wasm-opt.test b/test/lit/help/wasm-opt.test
index b30f62150..1ac823fa7 100644
--- a/test/lit/help/wasm-opt.test
+++ b/test/lit/help/wasm-opt.test
@@ -41,10 +41,10 @@
 ;; CHECK-NEXT:   --initial-fuzz,-if                            Initial wasm content in
 ;; CHECK-NEXT:                                                 translate-to-fuzz (-ttf) mode
 ;; CHECK-NEXT:
-;; CHECK-NEXT:   --fuzz-passes,-fp                             Pick a random set of passes to
-;; CHECK-NEXT:                                                 run, useful for fuzzing. this
-;; CHECK-NEXT:                                                 depends on translate-to-fuzz (it
-;; CHECK-NEXT:                                                 picks the passes from the input)
+;; CHECK-NEXT:   --fuzz-passes,-fp                             When doing translate-to-fuzz,
+;; CHECK-NEXT:                                                 pick a set of random passes from
+;; CHECK-NEXT:                                                 the input to further shape the
+;; CHECK-NEXT:                                                 wasm
 ;; CHECK-NEXT:
 ;; CHECK-NEXT:   --no-fuzz-memory                              don't emit memory ops when
 ;; CHECK-NEXT:                                                 fuzzing
diff --git a/test/passes/fuzz_metrics_passes_noprint.bin.txt b/test/passes/fuzz_metrics_passes_noprint.bin.txt
new file mode 100644
index 000000000..b4d67bab0
--- /dev/null
+++ b/test/passes/fuzz_metrics_passes_noprint.bin.txt
@@ -0,0 +1,35 @@
+Metrics
+total
+ [exports]      : 23      
+ [funcs]        : 34      
+ [globals]      : 30      
+ [imports]      : 5       
+ [memories]     : 1       
+ [memory-data]  : 17      
+ [table-data]   : 6       
+ [tables]       : 1       
+ [tags]         : 0       
+ [total]        : 9415    
+ [vars]         : 105     
+ Binary         : 726     
+ Block          : 1537    
+ Break          : 331     
+ Call           : 306     
+ CallIndirect   : 10      
+ Const          : 1479    
+ Drop           : 83      
+ GlobalGet      : 778     
+ GlobalSet      : 584     
+ If             : 531     
+ Load           : 164     
+ LocalGet       : 774     
+ LocalSet       : 570     
+ Loop           : 244     
+ Nop            : 105     
+ RefFunc        : 6       
+ Return         : 94      
+ Select         : 70      
+ Store          : 86      
+ Switch         : 2       
+ Unary          : 654     
+ Unreachable    : 281     
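For comparing metrics output across runs, like the expected output above, it can help to parse it into a dictionary. A minimal sketch; the regular expression assumes exactly the `name : count` layout shown above, and parse_metrics() is a hypothetical helper.

```python
import re

# Matches lines like " [funcs]        : 34" and " Binary         : 726".
METRIC_LINE = re.compile(r'^\s*(\[[\w-]+\]|\w+)\s*:\s*(\d+)\s*$')


def parse_metrics(text):
    # Turn metrics pass output into a {name: count} dict. The square
    # brackets on summary entries are kept, so they cannot collide with
    # expression names. Header lines such as "Metrics" are skipped.
    metrics = {}
    for line in text.splitlines():
        match = METRIC_LINE.match(line)
        if match:
            metrics[match.group(1)] = int(match.group(2))
    return metrics
```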
diff --git a/test/passes/fuzz_metrics_passes_noprint.passes b/test/passes/fuzz_metrics_passes_noprint.passes
new file mode 100644
index 000000000..1d1a109be
--- /dev/null
+++ b/test/passes/fuzz_metrics_passes_noprint.passes
@@ -0,0 +1 @@
+translate-to-fuzz_fuzz-passes_metrics
diff --git a/test/passes/fuzz_metrics_passes_noprint.wasm b/test/passes/fuzz_metrics_passes_noprint.wasm
new file mode 100644
index 000000000..24c4a2e2e
--- /dev/null
+++ b/test/passes/fuzz_metrics_passes_noprint.wasm
diff --git a/test/unit/test_cluster_fuzz.py b/test/unit/test_cluster_fuzz.py
new file mode 100644
index 000000000..293cfa339
--- /dev/null
+++ b/test/unit/test_cluster_fuzz.py
@@ -0,0 +1,259 @@
+import os
+import platform
+import re
+import statistics
+import subprocess
+import sys
+import tarfile
+import tempfile
+import unittest
+
+from scripts.test import shared
+from . import utils
+
+
+def get_build_dir():
+    # wasm-opt is in the bin/ dir, and the build dir is one above it,
+    # and contains bin/ and lib/.
+    return os.path.dirname(os.path.dirname(shared.WASM_OPT[0]))
+
+
+# Windows is not yet supported.
+@unittest.skipIf(platform.system() == 'Windows', "Windows is not yet supported")
+class ClusterFuzz(utils.BinaryenTestCase):
+    @classmethod
+    def setUpClass(cls):
+        # Bundle up our ClusterFuzz package, and unbundle it to a directory.
+        # Keep the directory alive in a class var.
+        cls.temp_dir = tempfile.TemporaryDirectory()
+        cls.clusterfuzz_dir = cls.temp_dir.name
+
+        bundle = os.environ.get('BINARYEN_CLUSTER_FUZZ_BUNDLE')
+        if bundle:
+            print(f'Using existing bundle: {bundle}')
+        else:
+            print('Making a new bundle')
+            bundle = os.path.join(cls.clusterfuzz_dir, 'bundle.tgz')
+            cmd = [shared.in_binaryen('scripts', 'bundle_clusterfuzz.py')]
+            cmd.append(bundle)
+            cmd.append(f'--build-dir={get_build_dir()}')
+            shared.run_process(cmd)
+
+        print('Unpacking bundle')
+        tar = tarfile.open(bundle, "r:gz")
+        tar.extractall(path=cls.clusterfuzz_dir)
+        tar.close()
+
+        print('Ready')
+
+    # Test our bundler for ClusterFuzz.
+    def test_bundle(self):
+        # The bundle should contain certain files:
+        # 1. run.py, the main entry point.
+        self.assertTrue(os.path.exists(os.path.join(self.clusterfuzz_dir, 'run.py')))
+        # 2. scripts/fuzz_shell.js, the js testcase shell
+        self.assertTrue(os.path.exists(os.path.join(self.clusterfuzz_dir, 'scripts', 'fuzz_shell.js')))
+        # 3. bin/wasm-opt, the wasm-opt binary in a static build
+        wasm_opt = os.path.join(self.clusterfuzz_dir, 'bin', 'wasm-opt')
+        self.assertTrue(os.path.exists(wasm_opt))
+
+        # See that we can execute the bundled wasm-opt. It should be able to
+        # print out its version.
+        out = subprocess.check_output([wasm_opt, '--version'], text=True)
+        self.assertIn('wasm-opt version ', out)
+
+    # Generate N testcases, using run.py from a temp dir, and outputting to a
+    # testcase dir.
+    def generate_testcases(self, N, testcase_dir):
+        proc = subprocess.run([sys.executable,
+                               os.path.join(self.clusterfuzz_dir, 'run.py'),
+                               f'--output_dir={testcase_dir}',
+                               f'--no_of_files={N}'],
+                              text=True,
+                              stdout=subprocess.PIPE,
+                              stderr=subprocess.PIPE)
+        self.assertEqual(proc.returncode, 0)
+        return proc
+
+    # Test the bundled run.py script.
+    def test_run_py(self):
+        temp_dir = tempfile.TemporaryDirectory()
+
+        N = 10
+        proc = self.generate_testcases(N, temp_dir.name)
+
+        # We should have logged the creation of N testcases.
+        self.assertEqual(proc.stdout.count('Created testcase:'), N)
+
+        # We should have actually created them.
+        for i in range(0, N + 2):
+            fuzz_file = os.path.join(temp_dir.name, f'fuzz-binaryen-{i}.js')
+            flags_file = os.path.join(temp_dir.name, f'flags-binaryen-{i}.js')
+            # We actually emit the range [1, N], so 0 or N+1 should not exist.
+            if i >= 1 and i <= N:
+                self.assertTrue(os.path.exists(fuzz_file))
+                self.assertTrue(os.path.exists(flags_file))
+            else:
+                self.assertFalse(os.path.exists(fuzz_file))
+                self.assertFalse(os.path.exists(flags_file))
+
+    def test_fuzz_passes(self):
+        # We should see interesting passes being run in run.py. This is *NOT* a
+        # deterministic test, since the number of passes run is random (we just
+        # let run.py run normally, to simulate the real environment), so flakes
+        # are possible here. However, we do the check in a way that the
+        # statistical likelihood of a flake is insignificant. Specifically, we
+        # just check that we see a different number of passes run in two
+        # different invocations, which is enough to prove that we are running
+        # different passes each time. And the number of passes is on average
+        # over 100 here (10 testcases, and each runs 0-20 passes or so).
+        temp_dir = tempfile.TemporaryDirectory()
+        N = 10
+
+        # Try many times to see a different number, to make flakes even less
+        # likely. In the worst case, if there were only two possible numbers of
+        # passes run, with equal probability, then even if we ran all 100
+        # iterations every second, we could go for billions of billions of
+        # years without a flake. (And if there are only two numbers with
+        # *non*-equal probability, then something is very wrong, and we'd like
+        # to see errors.)
+        seen_num_passes = set()
+        for i in range(100):
+            os.environ['BINARYEN_PASS_DEBUG'] = '1'
+            try:
+                proc = self.generate_testcases(N, temp_dir.name)
+            finally:
+                del os.environ['BINARYEN_PASS_DEBUG']
+
+            num_passes = proc.stderr.count('running pass')
+            print(f'num passes: {num_passes}')
+            seen_num_passes.add(num_passes)
+            if len(seen_num_passes) > 1:
+                return
+        raise Exception(f'We always only saw {seen_num_passes} passes run')
+
+    def test_file_contents(self):
+        # As with test_fuzz_passes, this is nondeterministic, but statistically
+        # it is almost impossible to get a flake here.
+        temp_dir = tempfile.TemporaryDirectory()
+        N = 100
+        self.generate_testcases(N, temp_dir.name)
+
+        # To check for interesting wasm file contents, we'll note how many
+        # struct.news appear (a signal that we are emitting WasmGC, and also a
+        # non-trivial number of them), the sizes of the wasm files, and the
+        # exports.
+        seen_struct_news = []
+        seen_sizes = []
+        seen_exports = []
+
+        # The number of struct.news appears in the metrics report like this:
+        #
+        # StructNew      : 18
+        #
+        struct_news_regex = re.compile(r'StructNew\s+:\s+(\d+)')
+
+        # The number of exports appears in the metrics report like this:
+        #
+        # [exports]      : 1
+        #
+        exports_regex = re.compile(r'\[exports\]\s+:\s+(\d+)')
+
+        for i in range(1, N + 1):
+            fuzz_file = os.path.join(temp_dir.name, f'fuzz-binaryen-{i}.js')
+            flags_file = os.path.join(temp_dir.name, f'flags-binaryen-{i}.js')
+
+            # The flags file must contain --wasm-staging
+            with open(flags_file) as f:
+                self.assertEqual(f.read(), '--wasm-staging')
+
+            # The fuzz files begin with
+            #
+            #   var binary = new Uint8Array([..binary data as numbers..]);
+            #
+            with open(fuzz_file) as f:
+                first_line = f.readline().strip()
+                start = 'var binary = new Uint8Array(['
+                end = ']);'
+                self.assertTrue(first_line.startswith(start))
+                self.assertTrue(first_line.endswith(end))
+                numbers = first_line[len(start):-len(end)]
+
+            # Convert to binary, and see that it is a valid file.
+            numbers_array = [int(x) for x in numbers.split(',')]
+            binary_file = os.path.join(temp_dir.name, 'file.wasm')
+            with open(binary_file, 'wb') as f:
+                f.write(bytes(numbers_array))
+            metrics = subprocess.check_output(
+                shared.WASM_OPT + ['-all', '--metrics', binary_file, '-q'], text=True)
+
+            # Update with what we see.
+            struct_news = re.findall(struct_news_regex, metrics)
+            if not struct_news:
+                # No line is emitted when --metrics sees no struct.news.
+                struct_news = ['0']
+            # Metrics should contain one line for StructNews.
+            self.assertEqual(len(struct_news), 1)
+            seen_struct_news.append(int(struct_news[0]))
+
+            seen_sizes.append(os.path.getsize(binary_file))
+
+            exports = re.findall(exports_regex, metrics)
+            # Metrics should contain one line for exports.
+            self.assertEqual(len(exports), 1)
+            seen_exports.append(int(exports[0]))
+
+        print()
+
+        # struct.news appear to be distributed as mean 15, stddev 24, median 10,
+        # so over 100 samples we are incredibly likely to see a count of at
+        # least 10 at least once. It is also incredibly unlikely for the stddev
+        # to be zero.
+        print(f'mean struct.news:   {statistics.mean(seen_struct_news)}')
+        print(f'stdev struct.news:  {statistics.stdev(seen_struct_news)}')
+        print(f'median struct.news: {statistics.median(seen_struct_news)}')
+        self.assertGreaterEqual(max(seen_struct_news), 10)
+        self.assertGreater(statistics.stdev(seen_struct_news), 0)
+
+        print()
+
+        # Sizes (in bytes) appear to be distributed as mean 2933, stddev 2011,
+        # median 2510.
+        print(f'mean sizes:   {statistics.mean(seen_sizes)}')
+        print(f'stdev sizes:  {statistics.stdev(seen_sizes)}')
+        print(f'median sizes: {statistics.median(seen_sizes)}')
+        self.assertGreaterEqual(max(seen_sizes), 1000)
+        self.assertGreater(statistics.stdev(seen_sizes), 0)
+
+        print()
+
+        # exports appear to be distributed as mean 9, stddev 6, median 8.
+        print(f'mean exports:   {statistics.mean(seen_exports)}')
+        print(f'stdev exports:  {statistics.stdev(seen_exports)}')
+        print(f'median exports: {statistics.median(seen_exports)}')
+        self.assertGreaterEqual(max(seen_exports), 8)
+        self.assertGreater(statistics.stdev(seen_exports), 0)
+
+        print()
+
+    # "zzz" in test name so that this runs last. If it runs first, it can be
+    # confusing as it appears next to the logging of which bundle we use (see
+    # setUpClass).
+    def test_zzz_bundle_build_dir(self):
+        cmd = [shared.in_binaryen('scripts', 'bundle_clusterfuzz.py')]
+        cmd.append('bundle.tgz')
+        # Test that we notice the --build-dir flag. Here we pass an invalid
+        # value, so we should error.
+        cmd.append('--build-dir=foo_bar')
+
+        # Expected error.
+        with self.assertRaises(subprocess.CalledProcessError):
+            subprocess.check_call(cmd, stdout=subprocess.PIPE,
+                                  stderr=subprocess.PIPE)
+
+        # Test with a valid --build-dir.
+        cmd.pop()
+        cmd.append(f'--build-dir={get_build_dir()}')
+        subprocess.check_call(cmd)
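test_file_contents relies on each generated fuzz-binaryen-N.js starting with a single line that embeds the wasm module as a JS Uint8Array literal, which is how one testcase file can carry both the JS and the wasm. The sketch below shows that decoding step on its own. decode_testcase_line is a hypothetical helper, not part of Binaryen, and unlike the test it also checks the standard 8-byte wasm preamble (magic number "\0asm" plus version 1) instead of leaving validation to wasm-opt.

```python
# Hypothetical helper (not part of Binaryen): extract the wasm bytes from the
# first line of a generated testcase, which is expected to look like
#   var binary = new Uint8Array([0,97,115,109,1,0,0,0,...]);
PREFIX = 'var binary = new Uint8Array(['
SUFFIX = ']);'

# Every wasm binary starts with the magic number "\0asm" and version 1.
WASM_PREAMBLE = b'\x00asm\x01\x00\x00\x00'


def decode_testcase_line(line: str) -> bytes:
    """Return the embedded wasm bytes, or raise ValueError if malformed."""
    line = line.strip()
    if not (line.startswith(PREFIX) and line.endswith(SUFFIX)):
        raise ValueError('not a bundled-wasm testcase line')
    numbers = line[len(PREFIX):-len(SUFFIX)]
    # bytes() raises ValueError for any value outside 0..255, and int()
    # raises ValueError for non-numeric entries.
    data = bytes(int(x) for x in numbers.split(','))
    if not data.startswith(WASM_PREAMBLE):
        raise ValueError('missing wasm magic number/version')
    return data


if __name__ == '__main__':
    # The smallest valid module is just the preamble.
    line = 'var binary = new Uint8Array([0,97,115,109,1,0,0,0]);'
    print(decode_testcase_line(line))
```

The returned bytes can then be written to a .wasm file and handed to wasm-opt --metrics, as the test does.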
author	Alon Zakai <azakai@google.com>	2024-11-19 09:28:01 -0800
committer	GitHub <noreply@github.com>	2024-11-19 09:28:01 -0800
commit	b0e999a2b8841d8be21cbcdc84cbc1d6469e36d7 (patch)
tree	55f1d24ca38d3a0c9b6e9197f0e1a28493c50f50
parent	25b8e6a714d2217e8735a925bc751900bce09d53 (diff)
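The flake estimate in the test_fuzz_passes comment can be checked with a short calculation. The sketch assumes the comment's worst case: exactly two possible pass counts, each with probability 1/2, and 100 iterations per test run, with one full run per second.

```python
# Worst case assumed in the test_fuzz_passes comment: two equally likely
# pass counts, 100 iterations per test run, one test run per second.
ITERATIONS = 100
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60

# The test flakes only if every iteration reports the same count. Either of
# the two counts could be the repeated one, hence the factor of 2.
p_flake = 2 * 0.5 ** ITERATIONS  # equals 2 ** -99

# At one run per second, the expected wait for the first flake is
# 1 / p_flake seconds (the mean of a geometric distribution).
expected_years = (1 / p_flake) / SECONDS_PER_YEAR

print(f'flake probability per run: {p_flake:.2e}')
print(f'expected years until a flake: {expected_years:.2e}')
```

This gives a per-run flake probability of about 1.6e-30 and an expected wait of about 2e22 years, which is comfortably beyond the "billions of billions" (1e18) of years the comment claims.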