NOTE (2016/03/26 01:30PM): The content of this post has been fixed. I must thank Lawrence Crowl for pointing out my mistake at the SG14 GDC 2016 meeting.
It is well known that using ordering comparison operators (<, <=, >, >=) with mixed signed/unsigned types results in warnings and, ultimately, in unexpected behaviour.
I never loved this, and I always tried to avoid it with a simple trick, which can apparently be turned into a definition that gives these operators unsurprising semantics in the mixed-sign cases.
A simple function to convert from signed to unsigned is the key:
template<typename T> auto to_unsigned(T value) { return std::make_unsigned_t<T>(value); }
Applied to a negative value, this function changes it (the conversion is well defined, but it wraps modulo 2^n); all the positive signed values, however, fit in the corresponding unsigned type, and the conversion keeps their value intact.
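For illustration, a small example of what the helper does; the wrapped value in the second assertion assumes a 32-bit int:

```cpp
#include <cassert>
#include <type_traits>

// the helper from above
template<typename T> auto to_unsigned(T value) { return std::make_unsigned_t<T>(value); }

int main() {
    assert(to_unsigned(42) == 42u);          // a positive value is preserved exactly
    assert(to_unsigned(-1) == 4294967295u);  // a negative value wraps around (assuming 32-bit int)
}
```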
Now the various cases are actually straightforward, and are listed in the table below for the expression X <op> Y:
<op> | X | Y | Expression |
---|---|---|---|
< or <= | signed | unsigned | (X < 0) || (to_unsigned(X) <op> Y) |
< or <= | unsigned | signed | (Y >= 0) && (X <op> to_unsigned(Y)) |
> or >= | signed | unsigned | (X >= 0) && (to_unsigned(X) <op> Y) |
> or >= | unsigned | signed | (Y < 0) || (X <op> to_unsigned(Y)) |
All the comparisons now behave as expected (the comparisons against 0 are done in the signed type, the comparisons between the two operands are done entirely in the unsigned type).
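As a rough sketch of how the first table could be turned into code, limited to int and unsigned int for brevity (safe_less is just a name I made up here; the other relational operators follow the same pattern):

```cpp
#include <type_traits>

template<typename T> auto to_unsigned(T value) { return std::make_unsigned_t<T>(value); }

// signed < unsigned: a negative left operand is always smaller than any unsigned value
inline bool safe_less(int x, unsigned y) { return (x < 0) || (to_unsigned(x) < y); }

// unsigned < signed: a negative right operand can never be greater than an unsigned value
inline bool safe_less(unsigned x, int y) { return (y >= 0) && (x < to_unsigned(y)); }
```

With these definitions safe_less(-1, 1u) is true, while the built-in (-1 < 1u) is false because of the usual arithmetic conversions.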
For completeness, let’s also add the equality operators
<op> | X | Y | Expression |
---|---|---|---|
== | signed | unsigned | (X >= 0) && (to_unsigned(X) <op> Y) |
== | unsigned | signed | (Y >= 0) && (X <op> to_unsigned(Y)) |
!= | signed | unsigned | (X < 0) || (to_unsigned(X) <op> Y) |
!= | unsigned | signed | (Y < 0) || (X <op> to_unsigned(Y)) |
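A similar sketch for the equality operators (again, safe_equal and safe_not_equal are only illustrative names of mine):

```cpp
#include <type_traits>

template<typename T> auto to_unsigned(T value) { return std::make_unsigned_t<T>(value); }

// signed == unsigned: a negative value can never be equal to an unsigned one
inline bool safe_equal(int x, unsigned y) { return (x >= 0) && (to_unsigned(x) == y); }
inline bool safe_equal(unsigned x, int y) { return (y >= 0) && (x == to_unsigned(y)); }

// signed != unsigned: a negative value is always different from an unsigned one
inline bool safe_not_equal(int x, unsigned y) { return (x < 0) || (to_unsigned(x) != y); }
inline bool safe_not_equal(unsigned x, int y) { return (y < 0) || (x != to_unsigned(y)); }
```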
So, what’s the problem? If it’s that simple, why has nobody standardised it yet?
I think the problem lies in the equality comparison, which is defined oddly and can cause confusion. The current wording of the standard is the following (taken from one of the latest C++14 drafts):
- Many binary operators that expect operands of arithmetic or enumeration type cause conversions and yield result types in a similar way. The purpose is to yield a common type, which is also the type of the result. This pattern is called the usual arithmetic conversions, which are defined as follows:
- […]
- Otherwise, the integral promotions (4.5) shall be performed on both operands. Then the following rules shall be applied to the promoted operands:
- If both operands have the same type, no further conversion is needed.
- Otherwise, if both operands have signed integer types or both have unsigned integer types, the operand with the type of lesser integer conversion rank shall be converted to the type of the operand with greater rank.
- Otherwise, if the operand that has unsigned integer type has rank greater than or equal to the rank of the type of the other operand, the operand with signed integer type shall be converted to the type of the operand with unsigned integer type.
- Otherwise, if the type of the operand with signed integer type can represent all of the values of the type of the operand with unsigned integer type, the operand with unsigned integer type shall be converted to the type of the operand with signed integer type.
- Otherwise, both operands shall be converted to the unsigned integer type corresponding to the type of the operand with signed integer type.
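To see what the last two rules mean in practice, here is a small sketch; it assumes a platform where int is 32 bits and long is 64 bits (e.g. x86-64 Linux), so the outcome of the second assertion is platform-dependent:

```cpp
#include <type_traits>

// int vs unsigned int: equal rank, so the signed operand is converted to unsigned int
static_assert(std::is_same<decltype(1 + 1u), unsigned int>::value,
              "same rank: the result has the unsigned type");

// unsigned int vs long: if long can represent every unsigned int (64-bit long),
// the unsigned operand is converted to long and the result stays signed
static_assert(std::is_same<decltype(1u + 1L), long>::value,
              "wider signed type: the result has the signed type (on LP64 platforms)");
```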
Since the conversion from signed to unsigned is deterministic, one is entitled to assume that the following code will print “true”:

int s = -1;
unsigned int u = (unsigned int)s; //* explicit conversion of a negative value to unsigned
if (s == u)                       //* implicit conversion of a negative value to unsigned
    cout << "true";
else
    cout << "false";
Someone already interpreted this code as correct, such as here. It was my opinion that the two lines marked as //* would invoke undefined behaviour themselves (the second one because of the conversion, and not the comparison); as it turns out (see the note above and the comments below), the conversion is perfectly well defined.
The same reasoning also covers the inequalities, where the outcome is even harder to accept: for normal humans, the fact that on most platforms (1u < -1) evaluates to true doesn’t really make much sense.
From this point, it is also straightforward to design a set of rules for promotions in signed/unsigned comparison. Actually, by rewriting the operands as described in the tables above, all the comparisons between different types (short, int, long, …) become well-defined against existing rules (plus the assumption above).
Can we define a new type which behaves correctly? Yes, but it must be noted that a change like this might break existing (but arguably wrong) code.
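As a taste of the idea (this is only a rough sketch with a made-up name, not the actual class that will be discussed in the next posts):

```cpp
#include <type_traits>

template<typename T> auto to_unsigned(T value) { return std::make_unsigned_t<T>(value); }

// A thin wrapper whose comparisons with unsigned values follow the tables above
// instead of the usual arithmetic conversions.
struct checked_int {
    int value;

    friend bool operator<(checked_int x, unsigned y) {
        return (x.value < 0) || (to_unsigned(x.value) < y);
    }
    friend bool operator==(checked_int x, unsigned y) {
        return (x.value >= 0) && (to_unsigned(x.value) == y);
    }
    // ... the remaining operators and argument orders follow the same tables
};

// checked_int{-1} <  1u   yields true  (instead of the built-in false)
// checked_int{-1} == 1u   yields false
```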
Acknowledgments:
J. W. Walker, Safer Comparison of Signed and Unsigned Integers in C++, 2013, May 6th (retrieved 2016, Mar 14th)
Update: I just read this from Denis Bider, posted a few hours ago: apparently this is a hot topic today :)
#
The conversion of -1 to unsigned will always be well defined, as I pointed out in my SO answer here: http://stackoverflow.com/a/22801135/1708801. -1 will convert to the maximum unsigned value.
Unsigned integers obey the laws of arithmetic modulo 2^n, so they cannot overflow in the sense that signed integers can: signed overflow in an expression is undefined behavior, and a conversion to a signed type is implementation-defined when the value cannot be represented by the signed integer.
This is backed up by the C++14 draft standard, section 3.9.1 [basic.fundamental], which says:
Unsigned integers shall obey the laws of arithmetic modulo 2^n where n is the number of bits in the value
representation of that particular size of integer.
and section 4.7 [conv.integral] which covers integral conversion and says:
If the destination type is unsigned, the resulting value is the least unsigned integer congruent to the source
integer (modulo 2^n where n is the number of bits used to represent the unsigned type). [ Note: In a two’s
complement representation, this conversion is conceptual and there is no change in the bit pattern (if there
is no truncation). —end note ]
This is different from a conversion to a signed integer, which is implementation-defined if the value cannot be represented in the destination type, and from overflow in an expression, which is undefined behavior as stated in section 5 [expr]:
If during the evaluation of an expression, the result is not mathematically defined or not in the range of
representable values for its type, the behavior is undefined.
So there is no undefined behavior in either of the lines you indicate. The results here are unintuitive if you do not understand the rules well enough, but they are very much well defined.
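For instance, on a typical platform with 32-bit unsigned int (so the modulus is 2^32 = 4294967296):

```cpp
#include <cassert>

int main() {
    int s = -1;
    unsigned int u = (unsigned int)s;  // well defined: -1 + 2^32 = 4294967295
    assert(u == 4294967295u);
    assert(s == u);                    // s converts to the same value, so this holds as well
}
```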
#
Thank you Shafik, I was indeed talking with Lawrence Crowl on Monday, who informed me about the problem, and that’s why I immediately updated the post with the red note above to clarify that part. I’m currently out for GDC, and that’s why I didn’t have time to go back and fix the post, but I promise I’ll do it as soon as I get back home.
I confirm, it is sadly a well-defined behaviour we apparently cannot get rid of so easily. The results are not only unintuitive: the fact that (-1 > 1u) holds is unexpected and, in general, just plain wrong. The fact that the current rules explicitly make it correct makes little difference; the main result of this inspection isn’t really a proposal to change this behaviour for ints (which is still possible, although it’s quite improbable that the committee would ever consider it), but rather to explain how to produce a class that gets rid of this terrible behaviour.
This is going to be expanded in the next posts.
#
Note that if you use gcc or clang, -Wconversion will catch most of these cases, and combined with -Werror it will force you to fix each of these issues, whether it is a bug or intended behavior to be made explicit via static_cast. There are some differences between gcc and clang, but if you use both you should catch the vast majority of these issues.
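For instance, something along these lines (the exact set of warnings each flag enables differs a bit between gcc and clang, so treat this as an approximation):

```cpp
// Compiled with something like: g++ -Wall -Wconversion -Wsign-conversion -Werror example.cpp
int s = -1;
unsigned int u = s;                             // may be diagnosed: implicit sign conversion
unsigned int v = static_cast<unsigned int>(s);  // explicit cast states the intent, no warning
bool b = (s < 1u);                              // -Wsign-compare (part of -Wall in C++) flags the mixed comparison
```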