author | Thomas Lively <tlively@google.com> | 2022-10-11 11:16:14 -0500 |
---|---|---|
committer | GitHub <noreply@github.com> | 2022-10-11 16:16:14 +0000 |
commit | b83450ed1fd98cec4453024f57f892b31851ea50 (patch) | |
tree | bf0467d96c9966d0f4699ea0afcdf25905b4098c /src/support/istring.cpp | |
parent | 6d4ac3162c290e32a98de349d49e26e904a40414 (diff) | |
Make `Name` a pointer, length pair (#5122)
With the goal of supporting null characters (i.e. zero bytes) in strings,
rewrite the underlying interned `IString` to store a `std::string_view` rather
than a `const char*`, reduce the number of map lookups necessary to intern a
string, and present a more immutable interface.
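The "fewer map lookups" point can be illustrated with a minimal sketch. `internSketch` and its `pool` parameter are hypothetical names for this example and are much simpler than the interner in this commit: the pool here owns `std::string` copies rather than views. The idea it shows is that a single `emplace` call both checks for an existing entry and adds a missing one, so no separate `find` is needed.

```cpp
#include <cassert>
#include <string>
#include <string_view>
#include <unordered_set>

// Hypothetical, simplified interner (not Binaryen's implementation).
// Returns a view of the pooled copy of `s`, adding a copy if none exists yet.
inline std::string_view internSketch(std::unordered_set<std::string>& pool,
                                     std::string_view s) {
  // One call performs the "is it already there?" check and the insertion.
  // `inserted` is false when an equal string was already pooled. Note that
  // emplace may still build a temporary node before discovering a duplicate.
  auto [it, inserted] = pool.emplace(s);
  (void)inserted;
  // Elements of std::unordered_set keep their addresses across rehashing,
  // so the returned view stays valid for as long as the pool lives.
  return *it;
}
```

Because the key carries its length, two inputs that differ only after a zero byte are pooled as distinct strings, and equal inputs share one address.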
Most importantly, replace the `c_str()` method that returned a `const char*`
with a `toString()` method that returns a `std::string`. This new method can
correctly handle strings containing null characters. A `const char*` can still
be had by calling `data()` on the `std::string_view`, although this usage should
be discouraged.
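A minimal sketch of why the conversion matters. `NameSketch` and `viaCString()` are hypothetical names invented for this example and do not reflect Binaryen's actual class layout; only the `std::string_view` storage, `data()`, and the idea of a `toString()` method come from the description above.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Hypothetical stand-in for an interned name (not Binaryen's class).
struct NameSketch {
  std::string_view str;

  // Copies exactly str.size() bytes, so embedded zero bytes are preserved.
  std::string toString() const { return std::string(str); }

  // Legacy-style conversion through a C string: stops at the first zero
  // byte. Assumes the bytes behind `str` are null terminated.
  std::string viaCString() const { return std::string(str.data()); }
};
```

For a three-byte name `a`, zero byte, `b`, `toString()` yields all three bytes while the C-string route yields only `a`, which is the reason to prefer the length-aware method.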
This change is NFC (no functional change) in spirit, although not in practice. It does not intend to
support any particular new functionality, but it is probably now possible to use
strings containing null characters in at least some cases. At least one parser
bug is also incidentally fixed. Follow-on PRs will explicitly support and test
strings containing nulls for particular use cases.
The C API still uses `const char*` to represent strings. As strings containing
nulls become better supported by the rest of Binaryen, this will no longer be
sufficient. Updating the C and JS APIs to use pointer, length pairs is left as
future work.
Diffstat (limited to 'src/support/istring.cpp')
-rw-r--r-- | src/support/istring.cpp | 88 |
1 files changed, 88 insertions, 0 deletions
diff --git a/src/support/istring.cpp b/src/support/istring.cpp
new file mode 100644
index 000000000..8a3319b5e
--- /dev/null
+++ b/src/support/istring.cpp
@@ -0,0 +1,88 @@
+/*
+ * Copyright 2022 WebAssembly Community Group participants
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#include "istring.h"
+
+namespace wasm {
+
+std::string_view IString::interned(std::string_view s, bool reuse) {
+  // We need a set of string_views that can be modified in-place to minimize
+  // the number of lookups we do. Since set elements cannot normally be
+  // modified, wrap the string_views in a container that provides mutability
+  // even through a const reference.
+  struct MutStringView {
+    mutable std::string_view str;
+    MutStringView(std::string_view str) : str(str) {}
+  };
+  struct MutStringViewHash {
+    size_t operator()(const MutStringView& mut) const {
+      return std::hash<std::string_view>{}(mut.str);
+    }
+  };
+  struct MutStringViewEqual {
+    bool operator()(const MutStringView& a, const MutStringView& b) const {
+      return a.str == b.str;
+    }
+  };
+  using StringSet =
+    std::unordered_set<MutStringView, MutStringViewHash, MutStringViewEqual>;
+
+  // The authoritative global set of interned string views.
+  static StringSet globalStrings;
+
+  // The global backing store for interned strings that do not otherwise have
+  // stable addresses.
+  static std::vector<std::vector<char>> allocated;
+
+  // Guards access to `globalStrings` and `allocated`.
+  static std::mutex mutex;
+
+  // A thread-local cache of strings to reduce contention.
+  thread_local static StringSet localStrings;
+
+  auto [localIt, localInserted] = localStrings.insert(s);
+  if (!localInserted) {
+    // We already had a local copy of this string.
+    return localIt->str;
+  }
+
+  // No copy yet in the local cache. Check the global cache.
+  std::unique_lock<std::mutex> lock(mutex);
+  auto [globalIt, globalInserted] = globalStrings.insert(s);
+  if (!globalInserted) {
+    // We already had a global copy of this string. Cache it locally.
+    localIt->str = globalIt->str;
+    return localIt->str;
+  }
+
+  if (!reuse) {
+    // We have a new string, but it doesn't have a stable address. Create a copy
+    // of the data at a stable address we can use. Make sure it is null
+    // terminated so legacy uses that get a C string still work.
+    allocated.emplace_back();
+    auto& data = allocated.back();
+    data.reserve(s.size() + 1);
+    data.insert(data.end(), s.begin(), s.end());
+    data.push_back('\0');
+    s = std::string_view(allocated.back().data(), s.size());
+  }
+
+  // Intern our new string.
+  localIt->str = globalIt->str = s;
+  return s;
+}
+
+} // namespace wasm
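The copy step in the `!reuse` branch can be tried in isolation with the sketch below. `BackingStoreSketch` and `viewsSurviveGrowth` are hypothetical names for this example, not part of Binaryen. The sketch assumes that growing the outer `std::vector` moves each inner `std::vector<char>` without relocating its heap buffer, which is how standard library implementations behave in practice and is what keeps previously returned views usable.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical helper mirroring the copy step of the diff (not Binaryen code).
class BackingStoreSketch {
  // Each inner vector owns one string's bytes plus a terminating zero byte.
  std::vector<std::vector<char>> allocated;

public:
  // Copies `s` into owned storage, appends a zero byte for legacy C-string
  // users, and returns a view of the copy that excludes that terminator.
  std::string_view store(std::string_view s) {
    allocated.emplace_back();
    auto& data = allocated.back();
    data.reserve(s.size() + 1);
    data.insert(data.end(), s.begin(), s.end());
    data.push_back('\0');
    return std::string_view(data.data(), s.size());
  }
};

// Returns true if a view handed out early still reads the same bytes after
// the outer vector has grown many times.
inline bool viewsSurviveGrowth() {
  BackingStoreSketch store;
  std::string_view first = store.store(std::string_view("a\0b", 3));
  for (int i = 0; i < 1000; ++i) {
    store.store("filler");
  }
  return first == std::string_view("a\0b", 3) && first.data()[3] == '\0';
}
```

The view's length excludes the terminator, so embedded zero bytes remain part of the string while code that still reads the data as a C string finds a terminator immediately after it.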