Skip to main content

Using The Pigeonhole Principle in C++ Metaprogramming

The Pigeonhole Principle is one of the most obvious fundamentals in mathematics. It is so obvious that you may be surprised that there is even a name for it. It states that:

"If n items are put into m containers, with n > m, then at least one container must contain more than one item."

Alternatively,

"If there are n items and m containers, with n > m, and only one item can fit in a container, then at least one item must remain out."

For those who prefer visuals and really hate math:


Even though the principle is simple it has been used to prove many complex mathematical theorems and lemmas. Here is one I find quite interesting:

"Incompressible strings of every length exist."

Alternatively,

"There is a file of every size that your favorite zip program can't compress."

The solution is left to the reader as an exercise.

So, does the Pigeonhole Principle show up in programming. Of course it does. That's why std::vector must allocate memory when its capacity is full. OK, but does it manifest in more interesting ways? As it turns out, it has been used in compile-time meta-programming to achieve interesting results. It manifests in preprocessor meta-programming and in template meta-programming in two distinct flavors.

The Pigeonhole Principle in C++ Preprocessor Meta-programming

Check out the following example. Also available here. The original author of this trick is unknown to me.
#include <iostream>

#define COUNT_ARGS(...)     PP_NARG_IMPL(__VA_ARGS__,PP_RSEQ_N()) 
#define PP_NARG_IMPL(...)   PP_ARG_N(__VA_ARGS__) 
#define PP_ARG_N( _1, _2, _3, _4, _5, _6, _7, _8, _9, _10, N, ...) N 
#define PP_RSEQ_N() 10,9,8,7,6,5,4,3,2,1,0 

int main()
{
  std::cout << COUNT_ARGS(a,b,c,d); // prints 4
}
COUNT_ARGS is a "simple" macro that counts the number of variadic arguments it is called with. It does that by using a preprocessing programming trick based on the Pigeonhole principle. Here is how the macro expands:
  1. The COUNT_ARGS macro substitutes the arguments (a,b,c,d) in the __VA_ARGS__ part before calling PP_NARG_IMPL. The PP_RSEQ_N macro is a list of integers from 10 to 0, which is substituted in the PP_NARG_IMPL. Therefore, the PP_NARG_IMPL macro is "called" with actual arguments = a,b,c,d,10,9,8,7,6,5,4,3,2,1,0
  2. The PP_NARG_IMPL macro simply forwards its arguments to the PP_ARG_N macro.
  3. The PP_ARG_N macro is where the Pigeonhole Principle comes in to play. It has 11 named arguments: From _1, _2, _3, etc. and N. Note that _1, _2, etc. are not special. They are just macro arguments with an underscore at the beginning. You may want to rename them as one, two, three, four, etc. It won't make a difference. The PP_ARG_N always expands to its 11th argument because of N.
  4. The original argument list has 15 arguments but there are only 11 arguments to the PP_ARG_N macro. Obviously, not all are going to fit. The PP_ARG_N macro only "picks up" the first actual argument that does not get a slot (i.e., 11th)
  5. As N always coincides with the 11th actual argument, the PP_ARG_N results in that value producing the count.
Needless to say, that's clever! Now let's proceed with template meta-programming.

The Pigeonhole Principle in C++ Template Meta-programming

Check out the following example. Also available here.
int main()
{
 auto x = ::nth<7>(0,"1",'2',3,"4",'5',6,"7");
 std::cerr << x << std::endl;
}
The goal is to access the N-th element in a variadic function argument list. The output of the above program should be 7.

There are many ways to go about implementing it, most using recursion of some sort. However, there is one implementation I came across, which I find particularly interesting. Why? You guessed it ... It uses the Pigeonhole Principle to avoid recursion.

The code was originally written by Richard Smith. I found it through a post by Roland Bock on the boost developers mailing list. If you prefer more comments, please see the same example with comments by LJEvans.
#include <utility>
#include <iostream>

namespace detail
{
  struct any { template<typename T> any(T &&) {} };

  template<typename T, typename U> struct first { typedef T type; };

  template<typename ...Ts>
  struct select_impl 
  {
    template<typename U, typename ...Vs>
 static U &&select(typename first<any, Ts>::type..., U &&u, Vs &&...) 
    {
    return static_cast<U&&>(u);
    }
  };

  template<std::size_t... Idx, typename... Ts>
  static auto select(const std::index_sequence<Idx...>&, Ts&&... ts)
  {
     return select_impl<decltype(Idx)...>::select(static_cast<Ts&&>(ts)...);
  }
}

template<std::size_t N, typename ...Ts>
auto nth(Ts &&...ts)
{
  return detail::select(std::make_index_sequence<N>(), static_cast<Ts&&>(ts)...);
}

int main()
{
 auto x = ::nth<7>(0,"1",'2',3,"4",'5',6,"7"); // prints 7
 std::cerr << x << std::endl;
}
Here is how the nth<7>(...) function works in the example above.
  1. N is 7 and Ts is a variadic parameter pack of integers, character strings, and plain characters.
  2. The std::make_index_sequence is a new addition in C++14 that produces an instance of std::index_sequence given a compile-time integral constant. Here, it produces std::index_sequence<0,1,2,3,4,5,6>.
  3. The formal arguments to the nth function (captured in the parameter pack ts) are forwarded to detail::select using a static_cast. This function must return the nth argument among the forwarded arguments.
  4. In detail::select, the Idx parameter pack represents the indices from 0 to 6. Its deduced by the compiler looking at the type of the index_sequence instance.
  5. The select_impl class template is instantiated with the decltype of each member in the Idx parameter pack. decltype(ts)... expands in to a list of types for every member in Ids. In this case, it is just 'int, int, int,... 7 times. The remaining arguments to select_impl::select are just being forwarded as before.
  6. The select_impl::select has access to Ts parameter pack, which is at the class-template level. Recall that it is 'int,int,int,....'. The list of formal arguments to select_impl::select is broken down in to 3 parts: a variadic piece of N-1 arguments at the beginning, U&& in the middle, and everything else in Vs.
  7. The first N-1 arguments to select_impl::select are "absorbed" using the detail::any class. The detail::any has a single argument constructor that converts argument of any type to any. The first N-1 arguments are thus converted to any. In our example, all the arguments from 0 to 6 are converted to any. The conversion is achieved using an in place parameter pack expansion 'typename first::type...'. For every argument in the Ts parameter pack, the 'first' meta-function is applied, which results into the 'any' type every time.
  8. As the first N-1 arguments are out of the way, U&& necessarily fits the N-th argument. This is where the Pigeonhole Principle springs back in to action.
  9. The remaining argument after the N-th (if any) are left unused in the Vs parameter pack.

So, there it is: returning the N-th argument in a argument list without using recursion. In practice, however, std::make_index_sequence is implemented using recursion. So, the above code is not truly recursion-free.

OK ... So you read it all! I'm sure you found the use of the Pigeonhole Principle in processing variadics in C++ very interesting.

Comments

Unknown said…
Roland Bock did not come up with that trick. His post to the Boost mailing list gives a link to a bug report with an attachment that shows the trick. The code was originally written by Richard Smith. Please correct the attribution. Thanks.
Anonymous said…
Your post is interesting. THanks you for this comparison

Unfortunately, it is really sad to learn that C++ Template programming are much more complex than MACRO.

Honestly, the Template code is too much complex to be used in industry and I really wonder where this language is going to...

It is like SFINAE vs simple #ifdef test.
How to create our operating system?
Unknown said…
I have just visited your website and I found it very informative and useful for readers.Thanks for sharing and please keep updating with your views.


Maths Tutoring | IELTS Exam
Anonymous said…
https://www.amazon.jobs/en/jobs/423500
Anonymous said…
Roland Bock did not come up with that trick. His post to the Boost mailing list gives a link to a bug report with an attachment that shows the trick. The code was originally written by Richard Smith. Please correct the attribution.
Thanks for sharing
golden slot mobile
gclub

Popular Content

Unit Testing C++ Templates and Mock Injection Using Traits

Unit testing your template code comes up from time to time. (You test your templates, right?) Some templates are easy to test. No others. Sometimes it's not clear how to about injecting mock code into the template code that's under test. I've seen several reasons why code injection becomes challenging. Here I've outlined some examples below with roughly increasing code injection difficulty. Template accepts a type argument and an object of the same type by reference in constructor Template accepts a type argument. Makes a copy of the constructor argument or simply does not take one Template accepts a type argument and instantiates multiple interrelated templates without virtual functions Lets start with the easy ones. Template accepts a type argument and an object of the same type by reference in constructor This one appears straight-forward because the unit test simply instantiates the template under test with a mock type. Some assertion might be tested in

Multi-dimensional arrays in C++11

What new can be said about multi-dimensional arrays in C++? As it turns out, quite a bit! With the advent of C++11, we get new standard library class std::array. We also get new language features, such as template aliases and variadic templates. So I'll talk about interesting ways in which they come together. It all started with a simple question of how to define a multi-dimensional std::array. It is a great example of deceptively simple things. Are the following the two arrays identical except that one is native and the other one is std::array? int native[3][4]; std::array<std::array<int, 3>, 4> arr; No! They are not. In fact, arr is more like an int[4][3]. Note the difference in the array subscripts. The native array is an array of 3 elements where every element is itself an array of 4 integers. 3 rows and 4 columns. If you want a std::array with the same layout, what you really need is: std::array<std::array<int, 4>, 3> arr; That's quite annoying for

Covariance and Contravariance in C++ Standard Library

Covariance and Contravariance are concepts that come up often as you go deeper into generic programming. While designing a language that supports parametric polymorphism (e.g., templates in C++, generics in Java, C#), the language designer has a choice between Invariance, Covariance, and Contravariance when dealing with generic types. C++'s choice is "invariance". Let's look at an example. struct Vehicle {}; struct Car : Vehicle {}; std::vector<Vehicle *> vehicles; std::vector<Car *> cars; vehicles = cars; // Does not compile The above program does not compile because C++ templates are invariant. Of course, each time a C++ template is instantiated, the compiler creates a brand new type that uniquely represents that instantiation. Any other type to the same template creates another unique type that has nothing to do with the earlier one. Any two unrelated user-defined types in C++ can't be assigned to each-other by default. You have to provide a