Leveraging Zero-Cost Abstractions in C++: Variadic Templates

C++’s strength comes largely from the zero-cost abstractions it provides. Stroustrup explains what this means in his papers on C++:

In general, C++ implementations obey the zero-overhead principle: What you don’t use, you don’t pay for [Stroustrup, 1994]. And further: What you do use, you couldn’t hand code any better.

This can be achieved because:

C++ maps directly onto hardware. Its basic types (such as char, int, and double) map directly into memory entities (such as bytes, words, and registers), most arithmetic and logical operations provided by processors are available for those types. Pointers, arrays, and references directly reflect the addressing hardware. There is no “abstract”, “virtual” or mathematical model between the C++ programmer’s expressions and the machine’s facilities. This allows relatively simple and very good code generation.

Variadic Functions in C/C++

Variadic functions have long been possible in C and C++ through the standard <stdarg.h> header (<cstdarg> in C++).

Here is how you can implement an average function that takes a variable number of arguments (called as average(3, 1.0, 2.0, 4.0)):

#include <cstdarg>

double average(int count, ...) {
    va_list args;
    // Requires the last fixed parameter to get the address
    va_start(args, count);

    double sum = 0.0;
    for (int j = 0; j < count; j++) {
        sum += va_arg(args, double);
    }

    va_end(args);

    return sum / count;
}

The syntax is a little complicated, but it gets the job done. Complicated syntax is not the only problem with this kind of implementation, though.

It’s necessary to have at least one fixed parameter (count in this case); the va_start macro uses it to locate the start of the argument list. Because a va_list carries no information about its own length, the caller has to pass the length correctly, which is what count is for here. A call like average(4, 1.0, 2.0) reads past the arguments that were actually passed, which is undefined behavior, and the compiler isn’t able to warn you about it.

Once the length is known, we loop and call va_arg(args, double) repeatedly to read the next double from the va_list. This is problematic for two reasons. First, we pay the cost of a runtime loop, whereas (1.0 + 2.0 + 4.0) / 3 would be the ideal zero-overhead way to compute the average of three numbers. Second, we are trusting the caller to pass only doubles: the compiler won’t warn you if you call average(3, a, "foo", c).

printf is a very commonly used variadic function. Compilers and library implementors use non-standard extensions to check printf calls at compile time, so that it doesn’t suffer from the common problems with variadic functions. As a result, printf("%ld\n", avg); raises a warning when avg is a double:

warning: format ‘%ld’ expects argument of type ‘long int’, but argument 2 has type ‘double’ [-Wformat=]

Without such extensions, a standard implementation of printf is problematic because it relies on the format string alone to figure out the length of the va_list and the types of the arguments. It’s the programmer’s responsibility to ensure that the right number of arguments is passed and that the format specifiers match the arguments’ types.
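To make the "non-standard extensions" concrete, here is a minimal sketch of how a user-defined printf-style function can opt into the same checking on GCC and Clang. The function name format_message and the buffer size are hypothetical, chosen only for illustration; the format attribute is a compiler extension, not standard C++, so it is guarded by a preprocessor check.

```cpp
#include <cstdarg>
#include <cstdio>
#include <string>

// Hypothetical helper, not part of the article's code.
// The attribute is a GCC/Clang extension: parameter 1 is the format string
// and type checking of the variadic arguments starts at parameter 2.
#if defined(__GNUC__)
__attribute__((format(printf, 1, 2)))
#endif
std::string format_message(const char* fmt, ...) {
    char buffer[256];  // assumed large enough for this sketch; longer output is truncated
    va_list args;
    va_start(args, fmt);
    // vsnprintf writes at most sizeof(buffer) bytes, including the terminator.
    std::vsnprintf(buffer, sizeof(buffer), fmt, args);
    va_end(args);
    return std::string(buffer);
}
```

With format warnings enabled, a mismatched call such as format_message("%ld", 1.5) should be diagnosed the same way the printf call above is. On compilers without the attribute the function still works, but mismatches go unchecked.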

Variadic Templates to the Rescue

C++11 introduced variadic templates. Let’s implement the average function using variadic templates and see how it compares with the va_list version. The first advantage: we won’t need a count parameter, so we can simply call average(a, b, c) to calculate the average of 3 numbers.

First, we need a function to count the number of arguments passed:

int count() {
  return 0;
}

template<typename T, typename... Args>
int count(T n, Args... args) {
  return 1 + count(args...);
}

We have to define count for two cases: the zero-argument case and the one-or-more-arguments case.

typename... Args declares a template parameter pack, which is what makes the template variadic. Args can hold zero or more types, so each argument can have a different type. count<int, const char*, double>(1, "a", 2.0), or simply count(1, "a", 2.0) with the types deduced, returns 3.

When count(a, b, c) is called, n receives a (so T becomes the type of a) and args, of type Args..., receives b and c. The call then effectively returns 1 + count(b, c), which is what the expression 1 + count(args...) expands to.
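The recursion can be checked against the built-in sizeof... operator, which has been part of the language since C++11 and yields the number of elements in a pack directly. The sketch below repeats count and adds a hypothetical count_pack helper (not from the article) for comparison.

```cpp
#include <cstddef>

// Base case: no arguments left to count.
int count() {
  return 0;
}

// Recursive case: peel off one argument and count the rest.
// The parameter is left unnamed because only its presence matters.
template<typename T, typename... Args>
int count(T, Args... args) {
  return 1 + count(args...);
}

// Hypothetical alternative: sizeof... gives the pack size without recursion.
template<typename... Args>
std::size_t count_pack(Args...) {
  return sizeof...(Args);
}
```

For the same call, for example count(1, "a", 2.0) and count_pack(1, "a", 2.0), both report three arguments. The recursive form is kept in the rest of this article because it mirrors how sum is written.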

Similarly, we can define a variadic template function that sums all parameters:

double sum() {
  return 0.0;
}

template<typename T, typename... Args>
double sum(T n, Args... args) {
  return n + sum(args...);
}

The sum of zero numbers is 0.0, and the sum of a number n followed by the numbers in args is n + sum(args...).
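To see what the compiler has to work with, it can help to unfold one call by hand. The sketch below repeats sum and adds a hypothetical expanded_by_hand function that spells out what sum(1, 2.5f, 4.0) reduces to; the sample values are assumptions picked so that the arithmetic is exact.

```cpp
// Base case: the sum of nothing is 0.0.
double sum() {
  return 0.0;
}

// Recursive case: first argument plus the sum of the remaining ones.
template<typename T, typename... Args>
double sum(T n, Args... args) {
  return n + sum(args...);
}

// Hypothetical illustration: sum(1, 2.5f, 4.0) unfolds, one level per
// argument, into this right-nested expression ending in the base case.
double expanded_by_hand() {
  return 1 + (2.5f + (4.0 + 0.0));
}
```

Each level returns a double, so mixed int, float, and double arguments are converted as the additions are evaluated from the innermost call outward.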

Now we can easily define the average function using sum and count:

template<typename T, typename... Args>
double average(T n, Args... args) {
  return (n + sum(args...)) / (1 + count(args...));
}

This function takes one or more arguments and divides their sum (n + sum(args...)) by their number (1 + count(args...)).

Defining it like double average(Args... args) { return sum(args...) / count(args...); } would be problematic: average() would then compile and divide 0.0 by 0, and it doesn’t make sense to calculate the average of zero numbers anyway.
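Put together, the three pieces form a small self-contained unit. The following sketch simply collects the definitions above in the order the compiler needs them; nothing here is new apart from leaving the unused parameter of count unnamed.

```cpp
// Number of arguments: base case and recursive case.
int count() {
  return 0;
}

template<typename T, typename... Args>
int count(T, Args... args) {
  return 1 + count(args...);
}

// Sum of arguments: base case and recursive case.
double sum() {
  return 0.0;
}

template<typename T, typename... Args>
double sum(T n, Args... args) {
  return n + sum(args...);
}

// Average of one or more arguments. The first argument is split off as n,
// so the divisor 1 + count(args...) is always at least 1.
template<typename T, typename... Args>
double average(T n, Args... args) {
  return (n + sum(args...)) / (1 + count(args...));
}
```

With these definitions, average(1, 2) evaluates (1 + 2.0) / 2, and average(5) evaluates (5 + 0.0) / 1, so a single value is its own average. No count argument is passed anywhere.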

Variadic Templates Allow the C++ Compiler to Complain About Problems

Let’s abuse the function and see how the compiler reacts. Calling average with no arguments (average()) does not compile:

error: no matching function for call to ‘average()’
note: candidate is:
note: candidate expects 2 arguments, 0 provided

Despite what the note suggests, calling it with a single argument does work: Args can be an empty pack, so average(a) simply returns a.

Passing a non-numeric value (average(a, "foo")) won’t work:

required from ‘double average(T, Args ...) [with T = double; Args = {const char*}]’
required from here
error: invalid operands of types ‘const char*’ and ‘double’ to binary ‘operator+’

You can’t add a const char* and a double, so average(a, "foo") fails to compile.
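The error above surfaces deep inside sum. If a more direct message is wanted, one option is a static_assert at the entry point. The sketch below is an assumption about how that could look, not the article's code: all_arithmetic and checked_average are hypothetical names, and sizeof... stands in for count to keep the example short.

```cpp
#include <type_traits>

// Hypothetical trait: true when every type in the pack is arithmetic.
// The primary template covers the empty pack.
template<typename... Ts>
struct all_arithmetic : std::true_type {};

// Partial specialization: check the first type, then recurse on the rest.
template<typename T, typename... Ts>
struct all_arithmetic<T, Ts...>
    : std::integral_constant<bool,
          std::is_arithmetic<T>::value && all_arithmetic<Ts...>::value> {};

double sum() {
  return 0.0;
}

template<typename T, typename... Args>
double sum(T n, Args... args) {
  return n + sum(args...);
}

// Hypothetical variant of average that rejects non-numeric arguments
// with a readable message at the call boundary.
template<typename T, typename... Args>
double checked_average(T n, Args... args) {
  static_assert(all_arithmetic<T, Args...>::value,
                "average() only accepts numeric arguments");
  return (n + sum(args...)) / (1 + sizeof...(Args));
}
```

A call such as checked_average(a, "foo") should then report the static_assert message; the operator+ error from sum may still be printed after it, since the assertion adds a diagnostic rather than suppressing later ones.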

Variadic Templates Allow the C++ Compiler to Optimize the Generated Code

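The generated code is very efficient. With optimizations enabled, the recursive calls to sum and count are inlined away, so average(a, b, c) compiles to essentially the same straight-line code as (a + b + c) / 3 would: three additions and one division by a constant, with no loop and no function calls. The only leftover in the listing below is the 0.0 from sum’s base case (the pxor instruction in main).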

count():
	movl	$0, %eax
	ret
sum():
	pxor	%xmm0, %xmm0
	ret
.LC2:
	.string	"%lf\n"
main:
	subq	$8, %rsp
	pxor	%xmm0, %xmm0
	addsd	c(%rip), %xmm0
	addsd	b(%rip), %xmm0
	addsd	a(%rip), %xmm0
	divsd	.LC1(%rip), %xmm0
	movl	$.LC2, %edi
	movl	$1, %eax
	call	printf
	movl	$0, %eax
	addq	$8, %rsp
	ret
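The listing reads a, b, and c from memory, which suggests they are global variables, but the source that produced it is not shown. The sketch below is a guess at a comparable setup: the globals and their values are assumptions, and average_variadic and average_by_hand are hypothetical wrappers that make it easy to compare the two versions side by side in an optimized build (for example with g++ -O2 -S).

```cpp
int count() {
  return 0;
}

template<typename T, typename... Args>
int count(T, Args... args) {
  return 1 + count(args...);
}

double sum() {
  return 0.0;
}

template<typename T, typename... Args>
double sum(T n, Args... args) {
  return n + sum(args...);
}

template<typename T, typename... Args>
double average(T n, Args... args) {
  return (n + sum(args...)) / (1 + count(args...));
}

// Assumed globals; non-const so the compiler cannot fold the result
// into a constant and has to emit the additions.
double a = 1.0, b = 2.0, c = 4.0;

// Variadic template version: evaluates a + (b + (c + 0.0)), then divides by 3.
double average_variadic() {
  return average(a, b, c);
}

// Hand-written version for comparison.
double average_by_hand() {
  return (a + b + c) / 3;
}
```

The two functions group the additions differently and the template version starts from 0.0, so the instruction sequences need not be byte-for-byte identical, but neither should contain a loop or a call to sum or count once inlining has run.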