Unraveling_string_view
C++: string_view: Deep dive
Jasmine Lopez
Prithvi Okade
Topics
• Motivation
• Performance benefits & basics
• string_view: Constructors, useful functions
• string vs. string_view and their interoperability
• When to use string_view
• Using string_view safely
• Intro to span
• span vs. string_view
• Case study of an optimization using string_view.
Motivation
• Consider a function foo which operates on an immutable string.
• In C++ we would generally declare it with the following signature:
1 void foo(const std::string& str);
If this is in a performance-sensitive portion of the code and we do not want memory allocation, we
may need to write alternate overloads:
2 void foo(const char* str, size_t len);
3 void foo(const char* str);
For code reuse, “1” and “3” will end up calling “2”, and the code will miss the niceties of using the
string API set.
With a single function taking string_view, all three call forms work:
foo(s);                    // prints: hello
foo({p_str, p_str_size});  // prints: hello
foo(p_str);                // prints: hello
Apart from this convenience, string_view also provides performance benefits which we will see shortly.
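A minimal sketch of the single-signature version. The body of foo here is hypothetical (it only reports the length it receives); the names s, p_str and p_str_size follow the calls on the slide.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical body: returns the number of characters it was given.
std::size_t foo(std::string_view str) { return str.size(); }

// Exercises the three call forms against the one signature.
void demo_call_forms() {
  const std::string s("hello");
  const char* p_str = "hello";
  const std::size_t p_str_size = 5;

  assert(foo(s) == 5);                    // std::string converts implicitly
  assert(foo({p_str, p_str_size}) == 5);  // pointer + length
  assert(foo(p_str) == 5);                // null-terminated C string
}
```

None of the three calls allocates: each one only builds a pointer/length pair.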
Basics
• string_view does not allocate any memory.
• It consists of a) pointer to string and b) length.
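The pointer-plus-length layout can be observed directly. The sketch below (helper names are illustrative, not from the slides) checks that a view created from a std::string refers to the string's own buffer, and that substr on a view yields another pointer/length pair into that same buffer.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Returns true when the view refers to the string's own buffer,
// i.e. creating the view copied no characters.
bool shares_buffer(const std::string& s, std::string_view sv) {
  return sv.data() == s.data() && sv.size() == s.size();
}

void demo_no_copy() {
  const std::string s("hello, string_view");
  const std::string_view sv = s;  // stores s.data() and s.size() only
  assert(shares_buffer(s, sv));

  // substr on a view is another pointer/length pair into the same buffer.
  const std::string_view word = sv.substr(7, 6);
  assert(word == "string");
  assert(word.data() == s.data() + 7);
}
```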
There are a lot of memory allocations for strings. Each substr call will cause a memory allocation.
string_view can be a good replacement for this scenario.
Performance benefits
• The fact that string_view does not allocate memory can be used to gain performance in some
scenarios. E.g., String splitting.
vector<string_view> split_string_sv(string_view str, char delim) {
  vector<string_view> splits;
  size_t index = 0;
  while (true) {
    const auto found_index = str.find(delim, index);
    if (found_index != string::npos) {
      splits.emplace_back(str.substr(index, found_index - index));
      index = found_index + 1;
    } else {
      splits.emplace_back(str.substr(index));
      break;
    }
  }
  return splits;
}

template <typename Collection>
void print(const Collection& coll) {
  for (size_t i = 0; i < coll.size(); ++i) {
    cout << coll[i];
    if (i != coll.size() - 1)
      cout << '+';
  }
  cout << '\n';
}

string s("hello|how|are|you");
print(split_string_sv(s, '|'));

Output:
hello+how+are+you
// C++ 20
typedef basic_string_view<char8_t> u8string_view;
} // namespace std
C++17 Constructors
constexpr basic_string_view() noexcept;
constexpr basic_string_view(const basic_string_view& other) noexcept = default;
constexpr basic_string_view(const CharT* s, size_type count);
constexpr basic_string_view(const CharT* s);
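As an illustration of the four constructors listed above, a short sketch (the variable names are made up for this example):

```cpp
#include <cassert>
#include <string_view>

void demo_constructors() {
  constexpr std::string_view empty;                // default: no data, size 0
  constexpr std::string_view counted("hello", 3);  // pointer + count
  constexpr std::string_view c_str("hello");       // pointer only, length computed
  constexpr std::string_view copy(c_str);          // copy: same pointer, same length

  assert(empty.empty() && empty.data() == nullptr);
  assert(counted == "hel");
  assert(c_str.size() == 5);
  assert(copy.data() == c_str.data() && copy.size() == c_str.size());
}
```

Copying a view never copies characters, which is why the copy constructor can be defaulted and noexcept.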
How does the following code work? string has a conversion operator to string_view.
// C++23 code.
std::string_view sv{nullptr};  // Compilation error: the nullptr_t constructor is deleted in C++23.
Usage:
auto sv = "hello"sv;
string_view sv1 = "hello"sv;
Since operator""sv does not need to do strlen, it can contain embedded \0’s.
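A small sketch of the difference, using a literal with an embedded \0 (the constant names are illustrative):

```cpp
#include <string_view>

using namespace std::string_view_literals;

// The literal operator receives the length from the compiler.
constexpr std::string_view kWithLiteral = "ab\0cd"sv;
// The const char* constructor measures up to the first '\0'.
constexpr std::string_view kWithCtor("ab\0cd");

static_assert(kWithLiteral.size() == 5);
static_assert(kWithLiteral[3] == 'c');
static_assert(kWithCtor.size() == 2);
```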
string_view vs. string
• string owns memory, string_view does not.
• string is always null terminated, string_view may not.
void foo(string_view s) {
  if (s.empty())  // Always check for empty before using.
    return;
  // Always use size() to figure out the range to operate on. Never use just data().
  const auto n = s.size();
  // Do stuff with n.
}
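To see why data() alone is not enough, here is a sketch in which a view covers only the first word of a longer buffer. The helper to_owned is hypothetical.

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <string_view>

// Copies exactly the characters the view covers.
std::string to_owned(std::string_view sv) { return std::string(sv); }

void demo_not_null_terminated() {
  const char* text = "hello world";
  const std::string_view word = std::string_view(text).substr(0, 5);

  assert(word == "hello");
  // data() still points into "hello world": there is no '\0' after "hello",
  // so a C API that ignores size() sees the whole buffer.
  assert(std::strlen(word.data()) == 11);
  assert(to_owned(word) == "hello");
}
```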
string_view vs. string: library functions
• string_view is “mostly” non-mutable, so it does not have the following “mutating”
methods present in string:
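• reserve
• shrink_to_fit
• clear
• insert
• erase
• push_back
• pop_back
• append
• operator+=
• replace
• resize
string_view does have two modifiers, which only shrink the view and never touch the characters:
• remove_prefix
• remove_suffix (e.g., the view “name_” becomes “name”)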
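A minimal sketch of the two modifiers; both helper functions are hypothetical and exist only for this illustration.

```cpp
#include <string_view>

// Hypothetical helper: drops one trailing '_' if present.
constexpr std::string_view strip_trailing_underscore(std::string_view name) {
  if (!name.empty() && name.back() == '_') {
    name.remove_suffix(1);  // shrinks the view; the characters are untouched
  }
  return name;
}

// Hypothetical helper: drops leading spaces.
constexpr std::string_view skip_leading_spaces(std::string_view text) {
  while (!text.empty() && text.front() == ' ') {
    text.remove_prefix(1);
  }
  return text;
}

static_assert(strip_trailing_underscore("name_") == "name");
static_assert(skip_leading_spaces("  name") == "name");
```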
string_view vs. string: library functions
string s("hello");
cout << s[0] << s.at(1) << '\n';
s[0] = 'H';     // OK
s.at(1) = 'E';  // OK

auto s = "hello"sv;
cout << s[0] << s.at(1) << '\n';
s[0] = 'H';     // error: cannot assign to return value because function 'operator[]' returns a const value
s.at(1) = 'E';  // error: at() also returns a const value

std::string_view sv("hello");
// Out of bound access: operator[] is unchecked (undefined behavior); sv.at(100) would throw std::out_of_range.
std::cout << sv[100] << '\n';
string_view / string: Interoperability
string can be automatically converted to string_view.
std::string s("hello");
std::string_view sv = s;
A a("hello");
std::string s(a);
string_view / string: Interoperability
string_view to string conversion must be explicit.
// C++17
template <class StringViewLike>
explicit basic_string(const StringViewLike& t,
                      const Allocator& alloc = Allocator());
// From C++20
template <class StringViewLike>
constexpr explicit basic_string(const StringViewLike& t,
                                const Allocator& alloc = Allocator());

std::string_view sv("hello");
std::string s = sv;  // Does not compile: the constructor is explicit.

std::string_view sv("hello");
std::string s(sv);   // OK: explicit construction.
Without an implicit conversion, every string construction is visible in the code, so developers are more likely to write optimal code.
void Baz(std::string_view sv) {
  Foo(std::string{sv});
  Bar(std::string{sv});
}

Realize duplication and update.

void Baz(std::string_view sv) {
  // Construct string once and use it twice.
  const std::string s{sv};
  Foo(s);
  Bar(s);
}
Where to use?
string_view can be used as a function argument to remove the need of multiple functions
dealing with strings.
void foo(const std::string& str);
void foo(const char* str, size_t len);
void foo(const char* str);
can be replaced by:
void foo(std::string_view sv);
Functions accepting const string& can be replaced with string_view to remove memory allocation.
void foo(const std::string& str);
becomes
void foo(std::string_view sv);
Ensure that string_view is not converted to string later. That will cause us to lose the optimization.
Where to use?
• string_view can be used in constexpr functions.
• string constructors are constexpr only since C++20.
• Can be used to create compile-time string constants.
constexpr string_view kHello("Hello");
This form (not using operator""sv) may take a little longer to compile, since a strlen equivalent is needed.
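A short sketch of compile-time use. The constant kWorld and the function starts_with_upper are illustrative names, not from the slides.

```cpp
#include <string_view>

using namespace std::string_view_literals;

constexpr std::string_view kHello("Hello");  // length computed at compile time
constexpr auto kWorld = "World"sv;           // length supplied by the compiler

// A constexpr function operating on a view.
constexpr bool starts_with_upper(std::string_view sv) {
  return !sv.empty() && sv.front() >= 'A' && sv.front() <= 'Z';
}

static_assert(kHello.size() == 5);
static_assert(kWorld.substr(1, 3) == "orl");
static_assert(starts_with_upper(kHello));
```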
Where to use?
If std::string is used in many functions, std::string_view constants are a little more cumbersome
for developers to type, but more performant.

#include <string>
#include <string_view>
constexpr char kHelloStr[] = "Hello";
constexpr std::string_view kHelloSv{"Hello"};

// Function taking const std::string&:
void Foo(const std::string& s) {}
int main() {
  Foo(kHelloStr);
}

// Function taking std::string_view:
void Foo(std::string_view s) {}
int main() {
  Foo(kHelloStr);
  Foo(kHelloSv);
}
Where to use?
• In some cases, to gain performance, string_view can be returned from functions.
Some scenarios:
• If the function is returning compile time constant memory.
• If the function is returning parts of string_view which were sent in arguments to the
function.
string_view GetConstString(EnumValue e) {
// Returns constant string based on enum value.
// e.g. return "enumvalue1"sv.
}
• Since string_view does not own its memory, it needs to be used carefully to avoid use-
after-free scenarios.
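The two scenarios can be sketched as follows. The enumerators of EnumValue and the function FirstWord are hypothetical; only the name GetConstString comes from the slide.

```cpp
#include <string_view>

// Hypothetical enum used only for this illustration.
enum class EnumValue { kFirst, kSecond };

// Safe: string literals have static storage duration.
constexpr std::string_view GetConstString(EnumValue e) {
  switch (e) {
    case EnumValue::kFirst: return "enumvalue1";
    case EnumValue::kSecond: return "enumvalue2";
  }
  return "unknown";
}

// Safe as long as the caller keeps the argument's memory alive:
// the result is a part of the view that was passed in.
constexpr std::string_view FirstWord(std::string_view text) {
  return text.substr(0, text.find(' '));
}

static_assert(GetConstString(EnumValue::kSecond) == "enumvalue2");
static_assert(FirstWord("hello world") == "hello");
```

In the second function the returned view is only as long-lived as the caller's buffer, so the caller must not pass a temporary string and keep the result.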
Problem: Assigning strings to string_view
string foo();

auto& s = foo();
// error: non-const lvalue reference to type 'basic_string<...>' cannot bind to a
// temporary of type 'basic_string<...>'

string_view s = foo();  // Compiles, but the temporary string is destroyed at the end of this statement.
cout << s << '\n'; // CRASH / UNDEFINED BEHAVIOR AT RUNTIME.
Problem: Returning string_view from functions
string_view foo() {
  return "hello"sv; // This is fine.
}
This is fine because "hello" is part of read-only memory.

string_view foo() {
  string s("hello");
  return s; // BAD.
}
s will be destroyed at the end of the function, hence the memory pointed to by the
string_view will be dangling.

string_view foo(const string& s) {
  return s;
}

string s("hello");
const auto sv = foo(s);
cout << sv << '\n'; // FINE

const auto sv = foo("hello");  // The temporary string's memory is destroyed at the end of the statement.
cout << sv << '\n'; // CRASH / UNDEFINED BEHAVIOR AT RUNTIME.
Problem: Returning string_view from class methods
class A {
  string s_;
 public:
  string_view get_s() const { return s_; }
  void set_s(string s) { s_ = move(s); }
};

A a;
const auto as = a.get_s();
cout << as << '\n'; // FINE.

A a;
const auto as = a.get_s();
a.set_s("hello");  // The memory that “as” points to is destroyed here.
cout << as << '\n'; // CRASH / UNDEFINED BEHAVIOR AT RUNTIME.

struct A {
  string_view sv;
  A(string_view sv) : sv(sv) {}
};

A a1("hello"sv);
A a2("hello");
cout << a1.sv << a2.sv << '\n'; // FINE

string foo();
A a3(foo());  // a3.sv refers to a temporary string that is destroyed at the end of this statement.
cout << a1.sv << a2.sv << a3.sv << '\n'; // UNDEFINED BEHAVIOR.
Problem: Catching issues with Warnings as Errors
Check out clang's dangling warning as errors. Check out MSVC’s Lifetime Rules of the C++ Core
Guidelines (-WLifetime).
Problem: returning string_view from functions
string_view foo() {
  string s("hello");
  return s; // BAD.
}
s will be destroyed at the end of the function, hence the memory pointed to by the string_view
will be dangling.
Caught with -Wreturn-stack-address, which shows up as a default warning in clang:
warning: address of stack memory associated with local variable 's' returned [-Wreturn-stack-address]
6 | return s;

string_view foo(const string& s LIFETIME_BOUND)
{
  return s;
}
warning: temporary whose address is used as value of local variable
'sv' will be destroyed at the end of the full-expression [-Wdangling]
15 | const auto sv = foo("hello");
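The slides do not show how LIFETIME_BOUND is defined. A plausible sketch, assuming it wraps Clang's [[clang::lifetimebound]] attribute and expands to nothing on compilers that lack it:

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Assumption: LIFETIME_BOUND stands for Clang's lifetimebound attribute.
// On compilers without it, the macro expands to nothing.
#if defined(__has_cpp_attribute)
#if __has_cpp_attribute(clang::lifetimebound)
#define LIFETIME_BOUND [[clang::lifetimebound]]
#endif
#endif
#ifndef LIFETIME_BOUND
#define LIFETIME_BOUND
#endif

std::string_view foo(const std::string& s LIFETIME_BOUND) { return s; }

void demo_lifetime_bound() {
  const std::string s("hello");
  const auto sv = foo(s);  // fine: s outlives sv
  assert(sv == "hello");
  // const auto bad = foo("hello");  // with Clang, this line triggers -Wdangling
}
```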
span: Motivation
Consider the following functions:

void foo(int* arr, int n) {
  for (int i = 0; i < n; ++i) {
    cout << i << ' ';
  }
  cout << '\n';
}
It is easy to make a mistake in this code: it should have been arr[i].
This is the most error-prone function, since we must depend on the caller to provide a valid “n”.

void foo(const array<int, 5>& arr) {
  for (const auto i : arr) {
    cout << i << ' ';
  }
  cout << '\n';
}
array needs an exact size specification. That reduces the usability for general cases.

template<typename T, size_t N>
void foo(array<T, N> arr) {
  for (const auto i : arr) {
    cout << i << ' ';
  }
  cout << '\n';
}
This helps with both uses:
foo(array{1, 2, 3, 4, 5});
foo(array{1, 2, 3});

void foo(const std::vector<int>& vec) {
  for (const auto i : vec) {
    cout << i << ' ';
  }
  cout << '\n';
}
vector is typesafe, but needs extra memory during construction.

int arr[] = {1, 2, 3, 4, 5};
foo(arr, size(arr));         // 0 1 2 3 4
foo(array{1, 2, 3, 4, 5});   // 1 2 3 4 5
foo(vector{1, 2, 3, 4, 5});  // 1 2 3 4 5

Can we have some type which can consume all these contiguous containers with a single interface?
span: Motivation
void foo(span<int> s) {
  for (const auto i : s) {
    cout << i << ' ';
  }
  cout << '\n';
}

span provides a single representation for different types of “contiguous” sequences of elements.
It also helps to decouple the interface from the actual type of contiguous sequence
(C-style array, std::array, std::vector).
It is lightweight, does not allocate memory and holds only a pointer and length.

int arr[] = {1, 2, 3, 4, 5};
foo(arr);   // 1 2 3 4 5

array arr1{1, 2, 3, 4, 5};
array arr2{1, 2, 3};
foo(arr1);  // 1 2 3 4 5
foo(arr2);  // 1 2 3
[Diagram: span s refers to the memory at v.data(); s.size = 3]
vector v{1, 2, 3, 4};
// Dynamic span, initially with 2 elements.
span s{v.begin() + 1, 2};
[Diagram: vector v holds 1 2 3 4; span s refers into v's memory; s.size = 2]
// Modifying data through a span of const elements (s_const): compilation errors.
s_const[1] = 5;
// error: cannot assign to return value because function 'operator[]' returns a const value
s_const.front() = 6;
// error: cannot assign to return value because function 'front' returns a const value
s_const.back() = 4;
// error: cannot assign to return value because function 'back' returns a const value
sort(s_const.begin(), s_const.end());
// error: cannot assign to return value because function 'operator*' returns a const value
//   *__start = _Ops::__iter_move(__child_i);
span: usage scenario
Without span:
const int arr[] = {1, 2, 3};
foo(arr, size(arr));   // 1 2 3
foo(array{1, 2, 3});   // 1 2 3
foo(vector{1, 2, 3});  // 1 2 3

With span:
const int arr[] = {1, 2, 3};
foo(arr);              // 1 2 3
foo(array{1, 2, 3});   // 1 2 3
foo(vector{1, 2, 3});  // 1 2 3
span: usage scenario
spans are also views which don’t own memory, so they can lead to dangling pointer access scenarios.

Returning span from a function:
span<int> GetSpanBad() {
  vector v{1, 2, 3};
  return v;
}
const auto s = GetSpanBad();
// Cannot use elements, since they have been destroyed.

Returning span from a class member function:
class A {
 public:
  A(initializer_list<int> l) : v_(l) {}
  span<const int> GetVec() const { return v_; }
  void Add(initializer_list<int> l) { v_.insert(v_.end(), l); }
 private:
  vector<int> v_;
};

A a{1, 2, 3, 4, 5};
for (const auto i : a.GetVec()) {
  cout << i << ' ';
}
// Output: 1 2 3 4 5

A a{1, 2, 3, 4, 5};
const auto s = a.GetVec();
// The underlying memory for span has been destroyed.
a.Add({6, 7, 8, 9, 10});
// Undefined behavior, read deleted memory.
for (const auto i : s)
  cout << i << ' ';
// Example output: 0 0 840699920 22077 5
span: best practices for usage
Use as argument to function which accepts any contiguous container.
void foo(span<const int> s);
Use as return value of function only when memory is backed by storage that will remain unchanged, e.g., globals.
span<const int> GetErrorCodes() {
// Not magic static.
static constexpr int kArr[]{10, 30, 30};
return span{kArr};
}
Don’t use span to hold non-const containers in local scope, because the container may get modified in the
same scope.
vector v{1, 2, 3, 4, 5};
span s{v};
// Reallocates memory, so "s" refers to deleted memory.
v.insert(v.end(), {6, 7, 8, 9, 10});
Since it has low overhead and is cheap to copy, pass by value instead of const &.
string_view vs. span
Both refer to contiguous sequence of elements starting at position zero with standard operations.
Both are lightweight easy-to-copy objects with a pointer and a size member.
string_view:
• Read-only view over strings.
• Always constant, cannot be used to modify the referred string.
span:
• View over contiguous sequence of elements.
• span<T> can modify contents. span<const T> cannot.
// In header
const std::vector<std::string>& GetKnownHosts();
// In source file.
const std::vector<std::string>& GetKnownHosts() {
static const std::vector<std::string> known_hosts{
"bing.com",
"microsoft.com",
"sharepoint.com"
};
return known_hosts;
}
// In header
std::span<const std::string_view> GetKnownHosts();
// In source file.
std::span<const std::string_view> GetKnownHosts() {
static constexpr std::string_view kKnownHosts[] = {
"bing.com", "microsoft.com", "sharepoint.com"};
return kKnownHosts;
}
error: no viable conversion from returned value of type 'const std::string_view[3]' (aka 'const basic_string_view<char>[3]') to
function return type 'std::span<std::string_view>' (aka 'span<basic_string_view<char>>')
return kKnownHosts;
^~~~~~~~~~~
<<TRUNCATED>>
Optimization: Searching through the map
Original:
// In header
const std::vector<std::string>& GetKnownHosts();

class A {
 public:
  // Other stuff.
  bool IsInMap(const std::string& host) const {
    return host_int_map_.contains(host);
  }
 private:
  std::map<std::string, int> host_int_map_;
};

bool HasKnownHost(const A& a) {
  return std::ranges::any_of(
      GetKnownHosts(),
      [&a](const auto& host)
      { return a.IsInMap(host); }
  );
}

Updated:
// In header file
std::span<const std::string_view> GetKnownHosts();

Using the above version, we immediately run into errors:
error: no viable conversion from 'const std::string_view' to 'const
std::string' (aka 'const basic_string<char>')
    GetKnownHosts(), [&a](const auto& host) { return a.IsInMap(host); });
                                                               ^~~~

A fix is to do an explicit conversion to string:
bool HasKnownHost(const A& a) {
  return std::ranges::any_of(
      GetKnownHosts(),
      [&a](const auto& host)
      { return a.IsInMap(std::string{host}); }
  );
}
// In header
std::span<const std::string_view> GetKnownHosts();

Before:
class A {
 public:
  // Other stuff.
  bool IsInMap(const std::string& host) const {
    return host_int_map_.contains(host);
  }
 private:
  std::map<std::string, int> host_int_map_;
};

After changing IsInMap to take std::string_view (does not compile):
class A {
 public:
  // Other stuff.
  bool IsInMap(std::string_view host) const {
    return host_int_map_.contains(host);  // Does not compile
  }
 private:
  std::map<std::string, int> host_int_map_;
};
Do a string conversion.

Before (does not compile):
class A {
 public:
  // Other stuff.
  bool IsInMap(std::string_view host) const {
    return host_int_map_.contains(host);
  }
 private:
  std::map<std::string, int> host_int_map_;
};

After:
class A {
 public:
  // Other stuff.
  bool IsInMap(std::string_view host) const {
    return host_int_map_.contains(std::string{host});
  }
 private:
  std::map<std::string, int> host_int_map_;
};
A better fix is to use a transparent comparator:

class A {
 public:
  // Other stuff.
  bool IsInMap(std::string_view host) const {
    return host_int_map_.contains(host);
  }
 private:
  std::map<std::string, int, std::less<>> host_int_map_;
};

Compared with the original class, only two lines change:
  bool IsInMap(const std::string& host) const    becomes    bool IsInMap(std::string_view host) const
  std::map<std::string, int> host_int_map_;      becomes    std::map<std::string, int, std::less<>> host_int_map_;
If you have a non-constructor function that accepts const std::string&, consider whether you can
convert it to std::string_view.
Victor Ciura
  Enough string_view to hang ourselves
  A Short Life span<> For a Regular Mess - std::span
Chandranath Bhattacharyya
Thank you! Questions?