Note to self: Never try to modify your examples in realtime during a presentation, to show an idea that just came through your mind: You might succeed.

The problem

We have an old pre-C++11 class:

class C {
    string _s;
public:
    C(const string &s) : _s(s) {}
    const string &get() const { return _s; }
};

As you can see, class C contains a function get() which returns a reference to its internal state. In normal code, we must take care not to use this reference after the C object has been destroyed. This was quite an improbable event in pre-C++11 code since, if we were good programmers, we probably never returned a complex object (like an object of type C) from a function by value.

In C++11, thanks to move semantics, this class receives an automatic upgrade: the compiler implicitly generates a move constructor and a move assignment operator for it. So in our brand new code it’s quite efficient to write something like:

C f() {
    return C("test");
}

In how many ways can we capture the return value of this function?

1. Direct use

We can use the returned value directly:

cout << f().get();

This is safe: f() returns a temporary instance of C, which is kept alive until the end of the full-expression (in practice, until the semicolon).

2. Store the object in a local variable

Create the object, store it in a new variable, and then call get() where the value is used:

C c = f();
cout << c.get();

This is also safe: the variable c plays the role of the temporary in the previous example, but it now lives longer, until the end of the enclosing scope.

3. Store the object in a local variable and the string in a reference

We can additionally store the result of get() in a local reference for later use:

C c = f();
const string &s = c.get();
cout << s;

This is the same as before, except that s is now visible for the rest of the enclosing scope. This is safe, because c’s lifetime is also the full enclosing scope.

4. Store a reference to the object

What about storing the object in a reference? Something like

const C &c = f();
cout << c.get();

Wait… should this work?

Yes! And the reason why it works is called temporary lifetime extension (TLE): the lifetime of the temporary returned by f is extended to match that of the reference it is bound to, i.e. the variable c. In practice, this case is equivalent to case 2.
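The difference between an unbound temporary and one bound to a const reference can be made visible with a small sketch. Tracer, make() and the events log are hypothetical names introduced only for this illustration, and the expected order assumes the compiler elides the copy on return (guaranteed since C++17, and done by mainstream compilers before that).

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical global log of what happened, in order.
std::vector<std::string> events;

// Hypothetical type that records its own construction and destruction.
struct Tracer {
    Tracer()  { events.push_back("construct"); }
    ~Tracer() { events.push_back("destroy"); }
};

Tracer make() { return Tracer(); }

// Case 4: the temporary is bound to a const reference.
std::vector<std::string> bound_to_reference() {
    events.clear();
    {
        const Tracer &t = make();   // lifetime extended to match t
        (void)t;                    // t is not otherwise used in this sketch
        events.push_back("use");
    }                               // t leaves scope: the temporary is destroyed here
    return events;
}

// No reference: the temporary dies at the end of the full-expression.
std::vector<std::string> unbound_temporary() {
    events.clear();
    make();
    events.push_back("use");
    return events;
}

// Usage sketch, to be called from main().
void check_lifetimes() {
    const std::vector<std::string> extended = {"construct", "use", "destroy"};
    const std::vector<std::string> plain    = {"construct", "destroy", "use"};
    assert(bound_to_reference() == extended);
    assert(unbound_temporary() == plain);
}
```

With the reference, destruction is postponed until after the "use" step; without it, the object is already gone when the next statement runs.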

5. Store a reference to the object and the string in a reference

const C &c = f();
const string &s = c.get();
cout << s;

This is equivalent to 3, with TLE on c as in case 4.

6. Create on-the-fly and store the result

const string &s = f().get();
cout << s;

Aaand, this is a dangling reference.

Why?

The reference s is bound to the result of get(), which is itself a reference, and not to the temporary returned by f(), so no lifetime is extended. As in the first case, the temporary returned by f() lives until the end of the full-expression (i.e. until the semicolon), but the reference returned by get() is stored in s, so it remains accessible (and gets used) afterwards.
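A minimal sketch of why case 6 dangles, without ever reading through the dangling reference: the class below is the C from the article with a logging destructor added. The events log, case6() and check_case6() are hypothetical names used only for this illustration.

```cpp
#include <cassert>
#include <string>
#include <vector>

std::vector<std::string> events;   // hypothetical log, for illustration only

class C {
    std::string _s;
public:
    C(const std::string &s) : _s(s) {}
    ~C() { events.push_back("C destroyed"); }   // added to trace the lifetime
    const std::string &get() const { return _s; }
};

C f() { return C("test"); }

std::vector<std::string> case6() {
    events.clear();
    const std::string &s = f().get();    // s refers to a member of the temporary
    events.push_back("next statement");  // the temporary is already gone by now
    (void)s;                             // s is deliberately never read: that would be undefined behaviour
    return events;
}

// Usage sketch, to be called from main().
void check_case6() {
    const std::vector<std::string> expected = {"C destroyed", "next statement"};
    assert(case6() == expected);
}
```

The destructor entry is logged before the statement that follows the declaration of s, so anything that reads s from that point on touches a destroyed string.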

This is really nasty: we must somehow instruct the users of our object not to follow this pattern, or inhibit them from doing it.

Possible Solutions

It’s pretty clear that the problem is that we’re using a reference for return, isn’t it? Let’s see some possible solutions.

Always store in a variable

In practice, the idea is documenting your library by kindly asking the user not to use your function f in an expression, but rather always store its returned value in a variable.

Leave the code as is, but make 6 “invalid by design”. Incidentally, another perfectly fine use case is invalidated as well (case 1).

Test case   Result   Notes
1           N/A      Invalid by design
2           Pass
3           Pass
4           Pass     TLE on c
5           Pass     TLE on c
6           N/A      Invalid by design

It works, ok, but who RTFM after all? The probability of this non-solution failing is quite high.

Always return by value (a.k.a. “the condom”)

This seems the safest way. We pay a little penalty, but if you always return by value, you have no lifetime problems, right?

class C {
    string _s;
public:
    C(const string &s) : _s(s) {}
    string get() const { return _s; }
};

 

Test case   Result   Notes
1           Pass     Additional copy of get()’s result
2           Pass     Additional copy of get()’s result
3           Pass     Additional copy of get()’s result, TLE on s
4           Pass     Additional copy of get()’s result, TLE on c
5           Pass     Additional copy of get()’s result, TLE on both s and c
6           Pass     Copy of get()’s result, TLE on s

It solves 6, but it adds quite a big overhead everywhere.

Always return by value and store in a variable (a.k.a. “the double condom”)

The union of the previous two solutions: it makes 6 (and 1) “invalid by design”, but it’s fault-tolerant in case the user didn’t really RTFM.

Test case   Result       Notes
1           N/A (Pass)   Invalid by design. Additional copy of get()’s result
2           Pass         Additional copy of get()’s result
3           Pass         Additional copy of get()’s result, TLE on s
4           Pass         Additional copy of get()’s result, TLE on c
5           Pass         Additional copy of get()’s result, TLE on both s and c
6           N/A (Pass)   Invalid by design. Fallback with copy of get()’s result, TLE on s

In practice, its only effect is to maximize the number of potential copies, and to make a lot of code “invalid by design”.

Selective invalidation of return from temporary

The problem is keeping references into temporaries. Can we explicitly forbid this at compile time, so that the user doesn’t need to RTFM? Apparently we can.

class C {
    string _s;
public:
    C(const string &s) : _s(s) {}
    const string &get() const & { return _s; }
    const string &get() && = delete;
};

In this solution we distinguish two uses of class C: when the object is an r-value (a temporary), invocation of get() is forbidden.

Quite elegant, but still…

Test case   Result               Notes
1           Compile time error   Invalidates good code 🙁
2           Pass
3           Pass
4           Pass                 TLE on c
5           Pass                 TLE on c
6           Compile time error   🙂

6 is invalidated at compile time. Unfortunately, 1 is also invalidated (and we might break some perfectly fine code).
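The effect of the deleted overload can be checked at compile time without writing code that fails to build. This is only a sketch: can_call_get is a hypothetical detection helper, not part of the class or of the standard library.

```cpp
#include <string>
#include <type_traits>
#include <utility>

class C {
    std::string _s;
public:
    C(const std::string &s) : _s(s) {}
    const std::string &get() const & { return _s; }
    const std::string &get() && = delete;
};

// Hypothetical helper: true when `std::declval<T>().get()` is a valid expression.
template <class T, class = void>
struct can_call_get : std::false_type {};

template <class T>
struct can_call_get<T, decltype(void(std::declval<T>().get()))> : std::true_type {};

// Calling get() on an l-value (cases 2 to 5) compiles.
static_assert(can_call_get<C &>::value, "get() on an l-value should compile");

// Calling get() on a non-const r-value (cases 1 and 6) picks the deleted overload.
static_assert(!can_call_get<C>::value, "get() on an r-value should be rejected");
```

Selecting a deleted function makes the expression ill-formed, which is why the second assertion holds and why cases 1 and 6 both stop compiling.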

Move on return from temporary

This is probably the most elegant solution I could find: overload the r-value version of get() and make it return a moved version of the content.

class C {
    string _s;
public:
    C(const string &s) : _s(s) {}
    const string &get() const & { return _s; }
    string get() && { return move(_s); }
};

It won’t break existing code, but it will slightly modify its behaviour.

Test case   Result   Notes
1           Pass     Additional move of get()’s result
2           Pass
3           Pass
4           Pass     TLE on c
5           Pass     TLE on c
6           Pass     Move of get()’s result, TLE on s

So, in this case we’re paying a move in case 1 that we weren’t paying before (it cannot be elided, since it moves out of a member, but moving a string is cheap), and the price for making case 6 work is equally small.
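As a sketch of what the two overloads return, the snippet below checks the result types at compile time and then exercises case 6. The names case6() and check_case6() are hypothetical, used only for this illustration.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <utility>

class C {
    std::string _s;
public:
    C(const std::string &s) : _s(s) {}
    const std::string &get() const & { return _s; }
    std::string get() && { return std::move(_s); }
};

C f() { return C("test"); }

// On an l-value, get() still hands out a reference to the member.
static_assert(std::is_same<decltype(std::declval<C &>().get()),
                           const std::string &>::value,
              "l-value overload returns a reference");

// On a temporary, get() returns a new string by value.
static_assert(std::is_same<decltype(f().get()), std::string>::value,
              "r-value overload returns by value");

std::string case6() {
    const std::string &s = f().get();   // s binds to the returned string: TLE applies
    return s;                           // safe: s refers to a live object
}

// Usage sketch, to be called from main().
void check_case6() {
    assert(case6() == "test");
}
```

Because the r-value overload returns a string by value, the reference in case 6 now binds to a temporary of its own, and that temporary is the one whose lifetime gets extended.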

If I find the time, I’ll try to do a performance analysis on these additional costs.

PS: I’m not sure move(…) is really needed inside the body of get() &&. _s, being part of a && object, should be && itself, and so move(…) won’t do anything. But I need to verify this.
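One way to check the PS is to count copies. In the sketch below, Probe, Holder and the two counting functions are hypothetical names invented for the test. The result matches the first comment below: inside an &&-qualified member function the member is still an l-value, so without move(…) it gets copied.

```cpp
#include <cassert>
#include <utility>

// Hypothetical payload type that counts how often it is copied.
struct Probe {
    static int copies;
    Probe() = default;
    Probe(const Probe &) { ++copies; }
    Probe(Probe &&) noexcept {}
};
int Probe::copies = 0;

// Hypothetical holder with two r-value-only getters.
struct Holder {
    Probe _p;
    Probe without_move() && { return _p; }             // _p is an l-value here: copy
    Probe with_move() &&    { return std::move(_p); }  // explicit cast to r-value: move
};

int copies_without_move() {
    Probe::copies = 0;
    Holder().without_move();
    return Probe::copies;
}

int copies_with_move() {
    Probe::copies = 0;
    Holder().with_move();
    return Probe::copies;
}

// Usage sketch, to be called from main().
void check_move_is_needed() {
    assert(copies_without_move() == 1);
    assert(copies_with_move() == 0);
}
```

The && qualifier only takes part in overload selection; it does not change the value category of the members named inside the function body.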

4 Comments


  1. `std::move` is, in fact, needed. Inside `get() &&`, `*this` is still just the same old lvalue. The `&&` part is only interesting for selecting the right overload, see also this answer of mine on SO: http://stackoverflow.com/a/8610728/500104


    • Thank you, very interesting answer on SO!

  2. peppe

    For a simple example of application of this in real world code: https://code.woboq.org/qt5/qtbase/src/corelib/tools/qstring.h.html#408

    toLower/toUpper/etc. return a copy of the string in lowercase/uppercase/etc.. However, if the string is a rvalue, there’s usually no need to allocate an extra string (*): just do the replacement in place. The helpers are simply overloaded on QString & (in place) and const QString & (not in place).

    (*) to be pedantic, since this is Unicode, a reallocation might still be necessary if one code point needs to get translated into a bigger number of code units…


    • Thank you, I didn’t know they added move semantics to QT5 already, my experience with QT is a bit outdated now.

