c c++ implementation-defined-behavior undefined-behavior unspecified-behavior

Undefined, unspecified and implementation-defined behavior

621

What is undefined behavior (UB) in C and C++? What about unspecified behavior and implementation-defined behavior? What is the difference between them?


470

Undefined behavior is one of those aspects of the C and C++ languages that can be surprising to programmers coming from other languages (other languages try to hide it better). Basically, it is possible to write C++ programs that do not behave in a predictable way, even though many C++ compilers will not report any errors in the program!

Let’s look at a classic example:

#include <iostream>

int main()
{
    char* p = "hello!\n";   // yes I know, deprecated conversion
    p[0] = 'y';
    p[5] = 'w';
    std::cout << p;
}

The variable p points to the string literal "hello!\n", and the two assignments below try to modify that string literal. What does this program do? According to section 2.14.5 paragraph 11 of the C++ standard, it invokes undefined behavior:

The effect of attempting to modify a string literal is undefined.

I can hear people screaming “But wait, I can compile this no problem and get the output yellow” or “What do you mean undefined, string literals are stored in read-only memory, so the first assignment attempt results in a core dump”. This is exactly the problem with undefined behavior. Basically, the standard allows anything to happen once you invoke undefined behavior (even nasal demons). If there is a “correct” behavior according to your mental model of the language, that model is simply wrong; the C++ standard has the only vote, period.
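If the goal is simply a modifiable string, one well-defined variant copies the literal's characters into an array before changing them. The following is only a minimal sketch; the function name make_yellow is hypothetical and not part of the original example.

```cpp
#include <string>

// Hypothetical helper: modifies a copy of the literal's characters.
// The literal itself is never written to, so no undefined behavior is involved.
std::string make_yellow()
{
    char buf[] = "hello!\n";  // the array holds its own copy of the characters
    buf[0] = 'y';
    buf[5] = 'w';
    return buf;               // converted to std::string on return
}
```

Calling make_yellow() yields the string "yellow\n" on every conforming implementation, which is the output some people only happen to observe with the original program.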

Other examples of undefined behavior include accessing an array beyond its bounds, dereferencing the null pointer, accessing objects after their lifetime ended or writing allegedly clever expressions like i++ + ++i.

Section 1.9 of the C++ standard also mentions undefined behavior’s two less dangerous brothers, unspecified behavior and implementation-defined behavior:

The semantic descriptions in this International Standard define a parameterized nondeterministic abstract machine.

Certain aspects and operations of the abstract machine are described in this International Standard as implementation-defined (for example, sizeof(int)). These constitute the parameters of the abstract machine. Each implementation shall include documentation describing its characteristics and behavior in these respects.

Certain other aspects and operations of the abstract machine are described in this International Standard as unspecified (for example, order of evaluation of arguments to a function). Where possible, this International Standard defines a set of allowable behaviors. These define the nondeterministic aspects of the abstract machine.

Certain other operations are described in this International Standard as undefined (for example, the effect of dereferencing the null pointer). [ Note: this International Standard imposes no requirements on the behavior of programs that contain undefined behavior. end note ]
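The implementation-defined "parameters" mentioned in the quoted passage can be inspected from within a program. The sketch below is illustrative only; the helper name int_storage_bits is hypothetical. It relies on nothing beyond the lower bounds that the standard guarantees.

```cpp
#include <climits>
#include <cstddef>
#include <limits>

// Hypothetical helper: number of bits of storage this implementation uses for int.
// The result is implementation-defined, but fixed for a given implementation.
std::size_t int_storage_bits()
{
    return sizeof(int) * CHAR_BIT;
}

// Only these lower bounds are guaranteed; the actual values vary by platform.
static_assert(CHAR_BIT >= 8, "a byte has at least 8 bits");
static_assert(std::numeric_limits<int>::max() >= 32767, "int covers at least 16 bits");
```

Portable code can depend on the static_assert conditions, but not on a particular value such as 32 for int_storage_bits().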

Specifically, section 1.3.24 states:

Permissible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message).

What can you do to avoid running into undefined behavior? Basically, you have to read good C++ books by authors who know what they’re talking about. Avoid internet tutorials. Avoid bullschildt.


  • 11

    It’s a weird fact that resulted from the merge that this answer only covers C++ but this question’s tags include C. C has a different notion of “undefined behavior”: It will still require the implementation to give diagnostic messages even if behavior is also stated to be undefined for certain rule violations (constraint violations).

    Nov 20, 2010 at 4:45

  • 12

    @Benoit It is undefined behavior because the standard says it’s undefined behavior, period. On some systems, string literals are indeed stored in the read-only text segment, and the program will crash if you try to modify a string literal. On other systems, the string literal will indeed appear to change. The standard does not mandate what has to happen. That’s what undefined behavior means.

    Jan 17, 2013 at 13:37

  • 8

    @FredOverflow, Why does a good compiler allow us to compile code that gives undefined behavior? Exactly what good can compiling this kind of code give? Why didn’t all good compilers give us a huge red warning sign when we are trying to compile code that gives undefined behavior?

    – Pacerier

    Sep 27, 2013 at 8:53


  • 17

    @Pacerier There are certain things that are not checkable at compile time. For example it is not always possible to guarantee that a null pointer is never dereferenced, but this is undefined.

    Dec 8, 2013 at 14:16

  • 4

    @Celeritas, undefined behavior can be non-deterministic. For example, it is impossible to know ahead of time what the contents of uninitialized memory will be, e.g. int f(){int a; return a;}: the value of a may change between function calls.

    – Mark

    Oct 12, 2015 at 20:56
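For the uninitialized-variable case in the last comment, the well-defined counterpart is to initialize the local before reading it. This is a minimal sketch; the name f_initialized is hypothetical.

```cpp
// Counterpart to int f(){int a; return a;}: the local is value-initialized,
// so reading it is well-defined and yields 0 on every call.
int f_initialized()
{
    int a{};   // value-initialization sets a to 0
    return a;
}
```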


114

Well, this is basically a straight copy-paste from the standard

3.4.1
1 implementation-defined behavior
unspecified behavior where each implementation documents how the choice is made

2 EXAMPLE An example of implementation-defined behavior is the propagation of the high-order bit when a signed integer is shifted right.

3.4.3
1 undefined behavior
behavior, upon use of a nonportable or erroneous program construct or of erroneous data, for which this International Standard imposes no requirements

2 NOTE Possible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message).

3 EXAMPLE An example of undefined behavior is the behavior on integer overflow.

3.4.4
1 unspecified behavior
use of an unspecified value, or other behavior where this International Standard provides two or more possibilities and imposes no further requirements on which is chosen in any instance

2 EXAMPLE An example of unspecified behavior is the order in which the arguments to a function are evaluated.
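Since signed integer overflow is the standard's own example of undefined behavior, one common way to stay clear of it is to test the operands before adding. The sketch below is only one possible approach; the helper name checked_add and its interface are assumptions made for illustration.

```cpp
#include <limits>

// Hypothetical helper: stores a + b in result and returns true when the sum
// fits in an int; returns false and leaves result untouched otherwise.
bool checked_add(int a, int b, int& result)
{
    const int max = std::numeric_limits<int>::max();
    const int min = std::numeric_limits<int>::min();
    // The comparisons are arranged so that they cannot overflow themselves.
    if ((b > 0 && a > max - b) || (b < 0 && a < min - b))
        return false;   // a + b would overflow
    result = a + b;
    return true;
}
```

The overflow is detected without ever performing the overflowing addition, so the check itself stays within defined behavior.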


  • 4

    What’s the difference between implementation-defined and unspecified behaviour?

    – Zolomon

    Mar 7, 2010 at 21:23

  • 29

    @Zolomon: Just like it says: basically the same thing, except that in case of implementation-defined the implementation is required to document (to guarantee) what exactly is going to happen, while in case of unspecified the implementation is not required to document or guarantee anything.

    Mar 7, 2010 at 21:27


  • 1

    @Zolomon: It’s reflected in the difference between 3.4.1 and 3.4.4.

    – sbi

    Mar 7, 2010 at 21:28

  • 11

    @Celeritas: Hyper-modern compilers can do better than that. Given int foo(int x) { if (x >= 0) launch_missiles(); return x << 1; } a compiler can determine that since all means of invoking the function that don’t launch the missiles invoke Undefined Behavior, it can make the call to launch_missiles() unconditional.

    – supercat

    May 5, 2015 at 17:24

  • 2

    @northerner As the quote states, unspecified behavior is usually restricted to a limited set of possible behaviors. In some cases you might even come to the conclusion that all of these possibilities are acceptable in the given context, in which case unspecified behavior is not a problem at all. Undefined behavior is completely unrestricted (e.g. “the program may decide to format your hard drive”). Undefined behavior is always a problem.

    Mar 21, 2017 at 7:41
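The x << 1 in the foo example from the comments is undefined for negative x in C and in C++ before C++20, and for values whose doubled result does not fit in an int. A guarded variant might look like the sketch below; the helper name checked_shift_left_by_one and its interface are hypothetical.

```cpp
#include <limits>

// Hypothetical helper: stores x << 1 in result and returns true only when the
// shift is well-defined in every C and C++ revision (x is non-negative and the
// doubled value fits in an int). Returns false and leaves result untouched otherwise.
bool checked_shift_left_by_one(int x, int& result)
{
    if (x < 0 || x > std::numeric_limits<int>::max() / 2)
        return false;
    result = x << 1;
    return true;
}
```

With the check in place, no execution path reaches the shift with an operand for which the behavior is undefined, so a compiler has nothing to "optimize away" on that basis.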


68

Maybe simpler wording could be easier to understand than the rigorous definition of the standards.

implementation-defined behavior:
The language says that we have data types. The compiler vendors decide which sizes they use and document what they chose.

undefined behavior:
You are doing something wrong. For example, you have a very large value in an int that doesn’t fit in a char. How do you put that value in the char? Actually, there is no way! Anything could happen, but the most sensible thing would be to take the first byte of that int and put it in the char. It is just wrong to assign only the first byte, but that’s what happens under the hood.

unspecified behavior:
Which of these two functions is executed first?

void fun(int n, int m);

int fun1() {
    std::cout << "fun1";
    return 1;
}
int fun2() {
    std::cout << "fun2";
    return 2;
}

//...

fun(fun1(), fun2()); // which one is executed first?

The language doesn’t specify the order of evaluation, left to right or right to left! So unspecified behavior may or may not result in undefined behavior, but your program certainly should not depend on unspecified behavior.
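When the order matters, it can be fixed by evaluating each argument in its own statement before the call. The sketch below adapts the functions above under stated assumptions: the call_log vector, the int return type of fun and the wrapper call_in_fixed_order are hypothetical additions that make the order observable.

```cpp
#include <string>
#include <vector>

std::vector<std::string> call_log;   // records the order in which the functions run

int fun1() { call_log.push_back("fun1"); return 1; }
int fun2() { call_log.push_back("fun2"); return 2; }

// Returns n - m so that the argument positions are visible in the result.
int fun(int n, int m) { return n - m; }

// Hypothetical wrapper: each call is a separate full-expression, so fun1 is
// guaranteed to run before fun2, whatever order the compiler would pick for
// fun(fun1(), fun2()).
int call_in_fixed_order()
{
    const int a = fun1();
    const int b = fun2();
    return fun(a, b);
}
```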


@eSKay I think your question is worth editing the answer to clarify more 🙂

for fun(fun1(), fun2()); isn’t the behaviour “implementation defined”? The compiler has to choose one or the other course, after all?

The difference between implementation-defined and unspecified is that the compiler is supposed to pick a behavior in the first case but it doesn’t have to in the second case. For example, an implementation must have one and only one definition of sizeof(int). So, it can’t say that sizeof(int) is 4 for some portion of the program and 8 for others. This is unlike unspecified behavior, where the compiler can say: “OK I am gonna evaluate these arguments left-to-right and the next function’s arguments are evaluated right-to-left.” It can happen in the same program, that’s why it is called unspecified. In fact, C++ could have been made easier if some of the unspecified behaviors were specified. Take a look here at Dr. Stroustrup’s answer for that:

It is claimed that the difference between what can be produced giving the compiler this freedom and requiring “ordinary left-to-right evaluation” can be significant. I’m unconvinced, but with innumerable compilers “out there” taking advantage of the freedom and some people passionately defending that freedom, a change would be difficult and could take decades to penetrate to the distant corners of the C and C++ worlds. I am disappointed that not all compilers warn against code such as ++i+i++. Similarly, the order of evaluation of arguments is unspecified.

IMO far too many “things” are left undefined or unspecified. That’s easy to say and even to give examples of, but hard to fix. It should also be noted that it is not all that difficult to avoid most of the problems and produce portable code.


  • 3

    for fun(fun1(), fun2()); isn’t the behaviour "implementation defined"? The compiler has to choose one or the other course, after all?

    – Lazer

    Mar 8, 2010 at 5:14


  • 1

    @AraK: thanks for explaining. I understand it now. Btw, "I am gonna evaluate these arguments left-to-right and the next function's arguments are evaluated right-to-left" I understand this can happen. Does it really, with compilers that we use these days?

    – Lazer

    Mar 8, 2010 at 10:23

  • 1

    @eSKay You have to ask a guru about this who got his hands dirty with many compilers 🙂 AFAIK VC evaluates arguments right-to-left always.

    Mar 8, 2010 at 10:28

  • 5

    @Lazer: It can definitely happen. Simple scenario: foo(bar, boz()) and foo(boz(), bar), where bar is an int and boz() is a function returning int. Assume a CPU where parameters are expected to be passed in registers R0-R1. Function results are returned in R0; functions may trash R1. Evaluating “bar” before “boz()” would require saving a copy of bar somewhere else before calling boz() and then loading that saved copy. Evaluating “bar” after “boz()” will avoid a memory store and re-fetch, and is an optimization many compilers would do regardless of their order in the argument list.

    – supercat

    Mar 21, 2011 at 20:12


  • 6

    I don’t know about C++ but the C standard says that a conversion of an int to a char is either implementation defined or even well defined (depending on the actual values and signedness of types). See C99 §6.3.1.3 (unchanged in C11).

    Jan 14, 2013 at 10:18
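The point made in the last comment can be sketched in C++ as well: converting an out-of-range int to unsigned char is well-defined (the value is reduced modulo UCHAR_MAX + 1), while conversion to a signed char was implementation-defined before C++20. Neither case is undefined behavior. The helper names below are hypothetical.

```cpp
#include <climits>

// Conversion to an unsigned type is well-defined: the result is the source
// value reduced modulo UCHAR_MAX + 1.
unsigned char to_uchar(int value)
{
    return static_cast<unsigned char>(value);
}

// Hypothetical check: -1 maps to UCHAR_MAX on every implementation,
// whatever the width of a byte.
bool minus_one_wraps_to_max()
{
    return to_uchar(-1) == UCHAR_MAX;
}
```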