Consider the classical sequence point example:
i = i++;
The C and C++ standards state that the behavior of the above expression is undefined because the = operator is not associated with a sequence point.
What confuses me is that ++ has a higher precedence than = and so the above expression, based on precedence, must evaluate i++ first and then do the assignment. Thus, if we start with i = 0, we should always end up with i = 0 (or i = 1, if the expression was i = ++i) and not undefined behavior. What am I missing?
Operator precedence (and associativity) determines how an expression is parsed, that is, which operands are grouped with which operator. It says nothing about the order of evaluation of the operands, which is a separate concept. Example:
a() + b() * c()
Operator precedence dictates that the results of b() and c() must be multiplied before the product is added to the result of a(). However, it says nothing about the order in which these functions are called. That is specified separately for each operator. Most often, the order of evaluation is unspecified (unspecified behavior), meaning that the standard lets the compiler pick any order it likes. The compiler need not document this order, nor does it need to behave consistently. The reason is to give compilers more freedom when translating expressions, which can mean faster compilation and possibly also faster code.
In the above example, I wrote a simple test program and my compiler executed the above functions in the order a(), b(), c(). The fact that the program needs to execute both b() and c() before it can multiply the results doesn’t mean that it must evaluate those operands in any given order.
This is where sequence points come in. A sequence point is a point in the program’s execution where all previous evaluations and their side effects must be complete. So sequence points are mostly related to order of evaluation and not so much to operator precedence.
In the example above, the three operands are unsequenced in relation to each other, meaning that no sequence point dictates the order of evaluation.
It therefore becomes problematic when side effects are introduced in such unsequenced expressions. If we write i++ + i++ * i++, then we still don’t know the order in which these operands are evaluated, so we can’t determine what the result will be. This is because both + and * have unspecified/unsequenced order of evaluation of their operands. Worse, since i is modified several times with nothing sequencing the modifications, the behavior is undefined.
Had we written i++ || i++ && i++, then the behavior would be well-defined, because both || and && specify the order of evaluation to be left-to-right, and there is a sequence point between the evaluation of the left and the right operand. Thus if(i++ || i++ && i++) is perfectly portable and safe (although unreadable) code.
As for the expression i = i++;, the problem here is that the = operator is specified as follows (C11 6.5.16):
The side effect of updating the stored value of the left operand is sequenced after the value computations of the left and right operands. The evaluations of the operands are unsequenced.
This expression is actually close to being well-defined, because the text says that the left operand must not be updated before the value of the right operand is computed. The problem is the very last sentence: the evaluations of the operands are unsequenced relative to each other.
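And since the expression contains the side effect of i++, it invokes undefined behavior: we can’t know whether the operand i or the operand i++ is evaluated first, and nothing sequences the update of i done by i++ relative to the update of i done by the assignment.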
(There’s more to it, since the standard also says that an operand should not be used twice in an expression for unrelated purposes, but that’s another story.)