If you cannot name it, it shouldn't exist!


Abstract

As I age, I become less and less tolerant of certain things. Today it is the turn of (unnamed) tuples, pairs, and similar atrocities.

One of the most controversial topics in programming, if not the most controversial, is the naming of variables. Yes, sure, there are religious wars over braces (on the same line? on the next one?) and over indentation (should that be three spaces? maybe four? why not tabs?), and there are divergent opinions on how comments should be written, if they should be written at all. But the question of naming variables, and to a certain extent other programming objects such as types, classes, namespaces and so on, is a deeper one. It is deeper because it is all about what programming is: abstraction.

Programming languages have evolved to be more and more abstract from the hardware and the machines running them: we even added layers of virtual machines to abstract the hardware further, freeing our thoughts and making them expressible through programming. Programming languages have also moved away from the way computers work: there are no such things as classes or functional constructs (as in functional programming) down at the level of the hardware. Even so, we use them in day-to-day programming. Even low-level programming languages are abstract enough that we do not need to care about how the memory works and, as a matter of fact, even assembly language relies on abstractions such as virtual memory, or instructions operating directly on memory cells that in reality need to be translated into multiple operations at the hardware level without the programmer needing to know.

A lot has already been written on how to name variables, for instance in The Practice of Programming and Code Complete, and even more methodical approaches have been proposed, such as the Hungarian notation in Windows (while hated, we should at least recognise its merits in a world lacking powerful IDEs to remind programmers of types). This write-up is not here to stress the same concepts and ideas again. I am writing it to discuss some constructs that, under the promise of making things easier, can easily result in future problems: unnamed tuples (which include, to a certain extent, std::pair in C++).

Why are unnamed tuples a horrible idea?

First of all, let's clarify that I am referring to unnamed tuples, which rely solely on the position of an element as the way to access it. To some extent, the first and second fields of a std::pair are an extremely convoluted way of accessing the first and second elements of an object storing two values of possibly different types, without using any more appropriate or distinctive name for them. More commonly, an unnamed tuple is some object of the form

 std::tuple<int, int, string> x(42, 21, "Hello World!");

While we cannot deny a priori that such a programming element could represent an object of the real world, it certainly hides the meaning of the components of the object itself. Let me clarify with an example

 std::tuple<string, int, string> customer("Francesco Nidito", 40, "XYZ st.");

Without thinking too much: what is the int in the middle? Could that be the age? The house number? The credit available? I cannot deny that while writing it you know perfectly well what it is but, most probably, in a few months' time you will not remember. We can lie to ourselves in the following way

 // - Name
 // - House Number 
 // - Address
 //
 std::tuple<string, int, string> customer("Francesco Nidito", 40, "XYZ st.");

But are we going to remember what the int was hundreds of lines away from the declaration? We can always put comments everywhere to remind ourselves what it was. Even worse, we could easily make mistakes that are very difficult to find:

 ...
 stuff.shipToAddress(std::get<0>(customer));
 ...

When I am famous, there will be a Francesco Nidito road (and even a square!) but at the moment we are sending the parcel nowhere.

Why are tuples widely used?

According to Larry Wall, the father of Perl, one of the three virtues of programmers is laziness. In his own words

«(laziness) makes you write labor-saving programs that other people will find useful, and document what you wrote so you don't have to answer so many questions about it» — from Programming Perl

While the first part could be true, the second one is, for the most part, false, as documentation is quite a rare commodity in programming. On the other hand, laziness in a more mundane sense is actually widespread in the field of software engineering.

In this energy-saving effort, tuples and pairs help a lot: why should I create a short-lived class / type just to make it possible to return three values from this function? It is way faster just to pack them in a tuple and return them!

In some cases, that is an obvious saving of time. For instance, in the case below, creating a small struct with two fields representing the quotient and the remainder makes the code twice as large, if not more ♠.

 // returns the quotient and remainder of the
 // integer division of x by y
 //
 std::tuple<int, int> div(int x, int y) {
    return std::tuple<int, int>(x/y, x%y);
 }
 ...
 auto [q, r] = div(someNumber, anotherNumber);

At the same time, if instead of assigning the quotient and the remainder to q and r we had started to copy them around inside the tuple, we would have totally lost sight of what they were supposed to be. If not now or next week, then in a couple of months, coming back to the same code, we would most probably lose a lot of time working out what they are.

A good question could be: do they really save time? At the very moment of using them, most probably yes, but in the long run they really don't: reading the code again months after writing it, to make changes or fix bugs, can be time consuming because everything looks like a std::get<0>(something) and a std::get<1>(something). If you are really writing some throwaway code (that you write, use once, and then delete) it could probably save something, but if you are writing code to stay, please don't use them, or you (or someone else) will pay the price in the future.

It happened to me: in one piece of code in which I needed some small structures of two fields, I (ab)used the std::pair type, and when I had to modify the code some months later I found myself in a forest of .first and .second without really remembering why they were there and what they were.
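To make the .first / .second forest concrete, here is a small sketch. It is not the code from the anecdote: LineRange, LineRangePair and both functions are hypothetical names chosen for illustration. The two functions compute the same thing, but only one of them says what it is computing.

```cpp
#include <cassert>
#include <utility>

// Hypothetical example: a span of source lines stored in a pair.
// Nothing in the type says whether .second is a count or an end line.
using LineRangePair = std::pair<int, int>;

int lastLineFromPair(const LineRangePair& p) {
    return p.first + p.second - 1;
}

// The same data with named fields.
struct LineRange {
    int firstLine;
    int lineCount;
};

int lastLine(const LineRange& r) {
    return r.firstLine + r.lineCount - 1;
}

// Usage: both versions agree, but only the second documents itself.
void usage() {
    assert(lastLineFromPair(LineRangePair{10, 5}) == 14);
    assert(lastLine(LineRange{10, 5}) == 14);
}
```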

What can be done?

In C++ there is not a lot that can be done other than avoiding the std::pair and std::tuple types and creating some real types instead. Notice that structures can be just as quick to create and initialize, for instance

 struct div_t {
    const int quot;
    const int rem;
 };
 ...
 div_t div(int x, int y) {
    return div_t { .quot=x/y, .rem=x%y };
 }

Once the type is defined, the expression div_t { .quot=x/y, .rem=x%y } just tells the compiler to create an object of type div_t and to initialize the fields quot and rem with the right values (designated initializers like these require C++20). While the length of the expression depends on the name of the type, it is not longer than the one using std::tuple. Moreover, the code above is easier to read and to review, as it is easy to check which field receives the result of which operation.

Notice that in modern C++ you can still assign quot and rem to q and r using structured bindings, as in the case of the std::tuple!
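A minimal sketch of that combination, assuming a C++20 compiler (designated initializers are C++20, structured bindings are C++17). The names DivResult and divide are hypothetical, chosen here only to avoid clashing with div_t and div from stdlib.h.

```cpp
#include <cassert>

// Hypothetical names, picked to avoid the div_t / div already in <stdlib.h>.
struct DivResult {
    int quot;
    int rem;
};

// Designated initializers make it visible which value goes into which field.
DivResult divide(int x, int y) {
    return DivResult{.quot = x / y, .rem = x % y};
}

// Usage: structured bindings unpack the fields in declaration order,
// while the struct keeps its meaningful names for everyone else.
void usage() {
    auto [q, r] = divide(7, 2);
    assert(q == 3);
    assert(r == 1);
}
```

The caller gets the same short q and r as with the tuple, yet any code that stores or forwards the result still sees quot and rem.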

Post Scriptum

I received quite a bit of feedback from readers, and I want to clarify a few things:

"So we should not use array with indices": arrays are positional / geometric containers, not data structures. If something has index 0, it is because it fits the first position, and that is meaning enough. On the other hand, using an array of three integers instead of a class / struct containing the three fields should be punishable with death 😊

"This means that he'd hate Tacit programming": the elision of temporary variables is fine, as they could have had a name. If a result makes sense only after a composition of functions, let's say f(g(something)), it makes sense not to have a temporary value in the middle, as that makes it clearer that the expression has a reasonable value only after the application of both functions.

More generally, the point of the write-up is mainly about using (unnamed) tuples, and in C++ std::pair, instead of creating types with meaningful names that help in understanding the code from a logical point of view.

Notes

♠ stdlib.h declares the function div, which actually returns a struct div_t with two fields, quot and rem. Bear with me, this is just for the sake of the example.
