std::optional<T> and non-POD C++ types

C++17 introduced the std::optional<T> class template, which is analogous to the Maybe/Optional monad found in many other languages. “Analogous” is doing a lot of work in this statement: the C++ type checker won't stop you from dereferencing an empty optional the way the type checkers of Rust, Haskell, Scala, Kotlin, TypeScript and many other languages do.

That does not make it useless. As with many things in C++, we will be careful™ when using it and write only programs that do not dereference an empty optional.
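The compiler accepts *o on an empty optional (the behavior is undefined), so being careful™ comes down to checking before dereferencing or using the accessors that check for you. The following is a minimal sketch of the standard options. The helpers describe and valueThrowsWhenEmpty are hypothetical; value_or, value and std::bad_optional_access are part of C++17's <optional>.

```cpp
#include <optional>
#include <string>

// Hypothetical helper showing two safe ways to read an optional.
std::string describe(const std::optional<std::string> &o) {
  // 1. Test first, then dereference: *o is only evaluated when o holds a value.
  if (o) {
    return "value: " + *o;
  }
  // 2. value_or() returns a fallback instead of touching empty storage.
  return "value_or: " + o.value_or("<empty>");
}

// 3. value() checks at run time and throws std::bad_optional_access when the
//    optional is empty. Using * or -> on an empty optional is undefined behavior.
bool valueThrowsWhenEmpty() {
  std::optional<std::string> empty;
  try {
    (void)empty.value();
  } catch (const std::bad_optional_access &) {
    return true;
  }
  return false;
}
```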

In languages that deal mostly with reference types, an optional type can be implemented as an object that wraps a reference and a tag bit that tells whether the optional holds some data or nothing.¹ In C++, on the other hand, std::optional<T> stores the value of type T inline, inside the optional object itself. That means the behavior of an optional of T depends heavily on the specifics of the type it wraps.
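This inline storage can be observed directly. The sketch below assumes a hypothetical 64-byte Payload type and a hypothetical helper, storedInline, that checks whether the address of the contained value falls within the optional object's own bytes. Comparing addresses through std::uintptr_t is a pragmatic check that works on mainstream compilers, not a guarantee spelled out by the standard.

```cpp
#include <array>
#include <cstdint>
#include <optional>

// Hypothetical 64-byte value type used only for illustration.
using Payload = std::array<char, 64>;

// std::optional<T> embeds the T in its own storage (next to an "engaged"
// flag), so it can never be smaller than the T it may hold.
static_assert(sizeof(std::optional<Payload>) >= sizeof(Payload));

// True if the contained value lies within the bytes of the optional itself,
// i.e. it is stored inline rather than behind a pointer to the heap.
bool storedInline(const std::optional<Payload> &o) {
  const auto self = reinterpret_cast<std::uintptr_t>(&o);
  const auto value = reinterpret_cast<std::uintptr_t>(&*o);
  return value >= self && value + sizeof(Payload) <= self + sizeof(o);
}
```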

For integers, floats, and characters, std::optional<T> doesn't bring many surprises. In this post I want to look at what happens when non-POD types are wrapped in an optional. To do that, I will write a class that prints a different message from each special member function:

#include <cstdio>    // puts
#include <optional>
#include <string>
#include <utility>   // std::move

class Object {
 private:
  std::string _s;

 public:
  Object() { puts("default-constructed"); }
  ~Object() { puts("destroyed"); }

  explicit Object(const std::string &s) : _s(s) { puts("constructed"); }

  Object(const Object &m) : _s(m._s) { puts("copy-constructed"); }
  Object &operator=(const Object &m) {
    puts("copy-assigned");
    _s = m._s;
    return *this;
  }

  Object(Object &&m) : _s(std::move(m._s)) { puts("move-constructed"); }
  Object &operator=(Object &&m) {
    puts("move-assigned");
    _s = std::move(m._s);
    return *this;
  }

  void dump() const { puts(_s.c_str()); }
};

Next, a function that returns an optional of this class, a common use case for optionals. The returned optional contains a value if the string argument is non-empty and is std::nullopt otherwise.

std::optional<Object> maybe(const std::string &s) {
  if (s.empty()) {
    return std::nullopt;
  }
  return Object(s);
}

The good

When using a std::optional<Object>, neither the Object constructors nor its destructor have to be called if the variable never gets populated with a value.

To see this in action, consider program1

void program1(const std::string &s) {
  const std::optional<Object> o = maybe(s);
  if (o) {
    o->dump();
  } else {
    puts("<empty>");
  }
}

and its output when called with an empty string

<empty>
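A quick way to confirm this is to count special member calls instead of printing them. In this sketch, Tracked is a hypothetical type with static counters; none of the empty optionals created below ever construct or destroy one.

```cpp
#include <optional>

// Hypothetical type that counts its constructions and destructions.
struct Tracked {
  static inline int constructed = 0;
  static inline int destroyed = 0;
  Tracked() { ++constructed; }
  Tracked(const Tracked &) { ++constructed; }
  ~Tracked() { ++destroyed; }
};

// Creates, copies and resets empty optionals; no Tracked is ever involved.
void useEmptyOptionals() {
  std::optional<Tracked> a;                 // default-constructed: empty
  std::optional<Tracked> b = std::nullopt;  // explicitly empty
  a = b;                                    // copy-assigning an empty optional
  b.reset();                                // resetting an already empty optional
}
```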

The ugly

Things get more involved when program1 is called with "Hello!" and the optional gets populated

constructed
move-constructed
destroyed
Hello!
destroyed

The return Object(s) line in maybe calls Object::Object(const std::string &) to create a temporary Object, which then gets moved into the storage within the std::optional<Object>. If Object didn't have a move constructor, it would be copied here instead. At the end of the return statement in maybe, the moved-from temporary Object is destroyed, and at the caller (program1), Object::~Object has to be called implicitly once more to destroy the Object held inside the std::optional<Object> instance.
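The same sequence can be checked with counters instead of console output. This sketch assumes a hypothetical Counts struct and a trimmed-down Object that increments it; maybe is the version from the post.

```cpp
#include <optional>
#include <string>
#include <utility>

// Hypothetical counters standing in for the puts() calls of Object.
struct Counts {
  int constructed = 0;
  int moved = 0;
  int destroyed = 0;
};
inline Counts counts;

class Object {
  std::string _s;

 public:
  explicit Object(const std::string &s) : _s(s) { ++counts.constructed; }
  Object(Object &&m) noexcept : _s(std::move(m._s)) { ++counts.moved; }
  ~Object() { ++counts.destroyed; }
};

std::optional<Object> maybe(const std::string &s) {
  if (s.empty()) {
    return std::nullopt;
  }
  // Builds a temporary Object, then moves it into the optional's storage.
  return Object(s);
}
```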

This can be improved by having std::optional<Object> forward the arguments to Object::Object, so that it constructs the Object directly in the optional's storage area without a temporary

   if (s.empty()) {
     return std::nullopt;
   }
-  return Object(s);
+  return std::optional<Object>(s);

With this change, the output of program1("Hello!") becomes

constructed
Hello!
destroyed

Only one constructor invocation and one destructor invocation. A win!
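Besides the converting constructor used above, C++17 offers two other spellings that construct the contained value in place: the std::in_place tag constructor and std::make_optional. A sketch comparing the three, again with a hypothetical counting Object; each variant performs one construction and no moves.

```cpp
#include <optional>
#include <string>
#include <utility>

// Hypothetical counters for the special member functions of Object.
struct Counts {
  int constructed = 0;
  int moved = 0;
  int destroyed = 0;
};
inline Counts counts;

class Object {
  std::string _s;

 public:
  explicit Object(const std::string &s) : _s(s) { ++counts.constructed; }
  Object(Object &&m) noexcept : _s(std::move(m._s)) { ++counts.moved; }
  ~Object() { ++counts.destroyed; }
};

// The converting constructor from the post: Object is built from s directly.
std::optional<Object> maybeDirect(const std::string &s) {
  if (s.empty()) {
    return std::nullopt;
  }
  return std::optional<Object>(s);
}

// Tag constructor: arguments after std::in_place go to Object::Object.
std::optional<Object> maybeInPlace(const std::string &s) {
  if (s.empty()) {
    return std::nullopt;
  }
  return std::optional<Object>(std::in_place, s);
}

// Factory function that forwards its arguments the same way.
std::optional<Object> maybeMakeOptional(const std::string &s) {
  if (s.empty()) {
    return std::nullopt;
  }
  return std::make_optional<Object>(s);
}
```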

However, most functions returning a std::optional<T> call some other function that returns a T on the code path that creates and returns the optional. That takes us back to the same duplicated constructor and destructor invocations.

Object makeObject(const std::string &s);

std::optional<Object> maybe(const std::string &s) {
  if (s.empty()) {
    return std::nullopt;
  }
  return makeObject(s);
}

To improve this while keeping the logic of makeObject separate from maybe, we would have to change makeObject to allow perfect forwarding of the parameters from maybe, to makeObject, to std::optional<Object>::optional, to Object::Object!
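One possible shape for that change, sketched under the assumption that makeObject can become a variadic template. The name makeMaybeObject is hypothetical, not part of the original code; it forwards its arguments through std::in_place so that Object::Object runs directly inside the optional's storage.

```cpp
#include <optional>
#include <string>
#include <utility>

inline int moves = 0;  // counts Object move constructions

class Object {
  std::string _s;

 public:
  explicit Object(const std::string &s) : _s(s) {}
  Object(Object &&m) noexcept : _s(std::move(m._s)) { ++moves; }
  const std::string &str() const { return _s; }
};

// Hypothetical replacement for makeObject: instead of returning an Object,
// it forwards whatever arguments it receives to Object::Object.
template <typename... Args>
std::optional<Object> makeMaybeObject(bool present, Args &&...args) {
  if (!present) {
    return std::nullopt;
  }
  return std::optional<Object>(std::in_place, std::forward<Args>(args)...);
}

std::optional<Object> maybe(const std::string &s) {
  // maybe -> makeMaybeObject -> std::optional<Object>::optional -> Object::Object
  return makeMaybeObject(!s.empty(), s);
}
```

Any work the original makeObject did after constructing its Object would now have to go through the optional, which is part of what makes this refactoring intrusive.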

Another common way of writing these functions is to declare a variable of type T, perform some operations on it, and then return it wrapped in an optional. This has the same problem we started with.

void doSomething(Object &o);

std::optional<Object> maybe(const std::string &s) {
  if (s.empty()) {
    return std::nullopt;
  }
  Object o(s);
  doSomething(o);
  return o;
}
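With a hypothetical counting Object (and a no-op doSomething standing in for the real work), this version shows the extra construction: the local o and the Object inside the returned optional are distinct objects, so one has to be built from the other. Under the implicit-move rules for return statements, return o; treats o as an rvalue here and picks the move constructor; the sketch counts copies and moves together so it does not depend on that detail.

```cpp
#include <optional>
#include <string>
#include <utility>

// Hypothetical counters for the special member functions of Object.
struct Counts {
  int constructed = 0;
  int copiedOrMoved = 0;
  int destroyed = 0;
};
inline Counts counts;

class Object {
  std::string _s;

 public:
  explicit Object(const std::string &s) : _s(s) { ++counts.constructed; }
  Object(const Object &m) : _s(m._s) { ++counts.copiedOrMoved; }
  Object(Object &&m) noexcept : _s(std::move(m._s)) { ++counts.copiedOrMoved; }
  ~Object() { ++counts.destroyed; }
};

// Hypothetical stand-in for whatever work the real code does on the object.
void doSomething(Object &) {}

std::optional<Object> maybe(const std::string &s) {
  if (s.empty()) {
    return std::nullopt;
  }
  Object o(s);
  doSomething(o);
  return o;  // o must be moved (or copied) into the optional, then destroyed
}
```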

The bad

These problems might not be a big deal in most situations, but if you insist on returning non-PODs wrapped in an optional, make sure that:

- the wrapped type is cheaply movable; otherwise your program might be copying it on every function call because of innocent-looking code;
- your destructors are defined out of line (outside the class declaration, in a source file rather than the header), so the compiler doesn't inline them into both functions, the caller and the callee that returns the optional, and increase the binary size.
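To illustrate the first point, here is a sketch with a hypothetical CopyOnly type. Because it declares a copy constructor but no move constructor, the compiler does not generate a move constructor for it, and the innocent-looking return CopyOnly(s); copies the temporary into the optional.

```cpp
#include <optional>
#include <string>
#include <utility>

inline int copies = 0;  // counts CopyOnly copy constructions

// A user-declared copy constructor suppresses the implicit move constructor,
// so rvalues of CopyOnly bind to the copy constructor instead.
class CopyOnly {
  std::string _s;

 public:
  explicit CopyOnly(std::string s) : _s(std::move(s)) {}
  CopyOnly(const CopyOnly &other) : _s(other._s) { ++copies; }
};

// Same shape as maybe() in the post, but converting the temporary CopyOnly
// into a std::optional<CopyOnly> has to copy it.
std::optional<CopyOnly> maybe(const std::string &s) {
  if (s.empty()) {
    return std::nullopt;
  }
  return CopyOnly(s);
}
```

Adding CopyOnly(CopyOnly &&) = default; to the class would turn that copy into a move.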

The specific situation in which I’ve seen this really affect binary size and possibly performance is when functions are written to return instances of classes generated by the Google Protocol Buffers compiler.

Google Protocol Buffers for C++ was designed before C++11 (i.e. before move semantics were added to the language). Its APIs and generated code are designed to make it possible to use the classes without ever invoking copy constructors. It's a good API.

If a function is never invoked, it doesn't need to be in the compiled binary. You may notice a sudden increase in your program's binary size when a single call to a big function is added to the codebase. Returning an optional of a Protocol Buffers object is enough to instantiate a lot of code that would otherwise never be needed.

Let’s take a look at the generated code based on a Protocol Buffers message called Date

class Date PROTOBUF_FINAL : public ::PROTOBUF_NAMESPACE_ID::Message {
 public:
  inline Date() : Date(nullptr) {}
  virtual ~Date();

  Date(const Date& from);
  Date(Date&& from) noexcept
    : Date() {
    *this = ::std::move(from);
  }

  inline Date& operator=(const Date& from) {
    CopyFrom(from);
    return *this;
  }
  inline Date& operator=(Date&& from) noexcept {
    if (GetArena() == from.GetArena()) {
      if (this != &from) InternalSwap(&from);
    } else {
      CopyFrom(from);
    }
    return *this;
  }
  ...

The move constructor (by calling operator=(Date &&)) can potentially call both InternalSwap and CopyFrom. The latter is called when the objects can't be swapped because they are allocated in different arenas and have to be copied instead. Using the move constructor of this class therefore pulls both the moving (swapping) and the copying functions into the binary. This explains why returning an optional of a Protocol Buffers class increases the binary size of a program that previously had no need for InternalSwap or CopyFrom.
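The control flow can be reproduced without protobuf. In this sketch Arena and Message are hypothetical, simplified stand-ins, not protobuf's real classes; they only copy the shape of the generated move constructor and move assignment shown above. Moving out of an arena-allocated message into a default-constructed one (which has no arena) takes the copy path, while moving between two heap-allocated messages swaps.

```cpp
#include <string>
#include <utility>

// Hypothetical stand-in for protobuf's arena allocator.
struct Arena {};

// Hypothetical stand-in for a generated message class.
class Message {
  Arena *arena_;
  std::string payload_;

 public:
  static inline int swaps = 0;
  static inline int copies = 0;

  explicit Message(Arena *arena = nullptr) : arena_(arena) {}

  // Like the generated move constructor: default-construct without an arena,
  // then delegate to move assignment.
  Message(Message &&from) noexcept : Message() { *this = std::move(from); }

  Message &operator=(Message &&from) noexcept {
    if (arena_ == from.arena_) {
      if (this != &from) InternalSwap(&from);
    } else {
      CopyFrom(from);  // different arenas: the "move" becomes a deep copy
    }
    return *this;
  }

  void InternalSwap(Message *other) {
    ++swaps;
    std::swap(payload_, other->payload_);
  }

  void CopyFrom(const Message &from) {
    ++copies;
    payload_ = from.payload_;
  }
};
```

Both branches are reachable from the move constructor, so both InternalSwap and CopyFrom have to be compiled in as soon as it is used.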

Recommendation

Adopting a more C-like way of initializing structures avoids the unnecessary move-constructor calls and the extraneous destructor calls. This pattern fits nicely with the code generated by Protocol Buffers.

Let’s add a new member function to the Object class

void set(const std::string &s) { _s = s; }

and write an alternative to program1 — program2

[[nodiscard]] bool maybe(const std::string &s, Object &out_object) {
  if (s.empty()) {
    return false;
  }
  out_object.set(s);
  return true;
}

void program2(const std::string &s) {
  Object o;
  if (maybe(s, o)) {
    o.dump();
  } else {
    puts("<empty>");
  }
}

The maybe function was rewritten to take an output parameter and return a boolean. out_object is changed in-place and doesn’t have to be moved into an optional and then destroyed within maybe. As expected, program2("Hello!") generates a cleaner output

default-constructed
Hello!
destroyed

program2 never has to call the move constructor, so it can be discarded by the linker, and the destructor is called only once. If the destructor were inlined, it would be inlined once in the program, not twice.
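The counts for program2 can be checked the same way. This sketch uses a hypothetical counting Object with a str() accessor in place of dump(); the out-parameter maybe is the one from the post.

```cpp
#include <string>
#include <utility>

// Hypothetical counters for the special member functions of Object.
struct Counts {
  int defaultConstructed = 0;
  int moved = 0;
  int copied = 0;
  int destroyed = 0;
};
inline Counts counts;

class Object {
  std::string _s;

 public:
  Object() { ++counts.defaultConstructed; }
  Object(const Object &m) : _s(m._s) { ++counts.copied; }
  Object(Object &&m) noexcept : _s(std::move(m._s)) { ++counts.moved; }
  ~Object() { ++counts.destroyed; }

  void set(const std::string &s) { _s = s; }
  const std::string &str() const { return _s; }
};

// Fills out_object in place; no optional, no temporary, no move.
[[nodiscard]] bool maybe(const std::string &s, Object &out_object) {
  if (s.empty()) {
    return false;
  }
  out_object.set(s);
  return true;
}
```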

Conclusion

Optionals are far from zero-cost abstractions in C++. If this cost matters to you, taking output parameters and returning non-discardable booleans is an advantageous alternative to a function returning std::optional<T> when objects of type T are expensive² to move and/or destroy.

¹ Languages like TypeScript can statically determine whether an object is set based on its position in the control flow of the program.
² In the sense of run time and code size in the binary.
