Invisible Unicorns

Undefined Behaviour and For-Loops

Recently, my boss was interviewing for an intern and shared some of the example-code written by the candidates. One of these was interesting in its wrong-ness. Roughly-speaking, the candidate had written a for-loop that looked like this:

for (int i = 0; i++) {
    // ...
}

This is, quite obviously, not a valid for loop in Java (the language that they wanted to write their code in) because i++ does not evaluate to a boolean and Java is picky about that here. It is, however, kind of a valid for-loop in C if you add a semicolon after the i++. I wondered what would happen when you compiled that.

The syntax of the for loop in C is:

for (initializer; condition; update) { ... }

C has no dedicated boolean type, so 1 is true and 0 is false. This means that i++, since it evaluates to a number, can be considered to be a boolean value in C. Our test program looks like this:

#include <stdio.h>

int main(void)
{
    int n = 0;
    int i;
    for (i = 0; i++;) {
        n++;
    }
    printf("%d\n", n);
}

I compiled this (on Linux, on x8664, using GCC) and ran it and it immediately output `0`. Because the `++` is evaluated after_ the value of i is taken, since i starts at 0 it's immediately false so the loop is never entered. For the sake of continuing this experiment, I switched the code to ++i so that the loop would actually be run.

Running the ++i version of the program took a while longer, but it eventually output -1. This happened because the integer "wrapped around" from a large positive number to a negative number and eventually it gets back to being zero. This is C, though, and C has undefined behaviour. In C, the act of adding to an integer, when that addition would cause integer overflow, is undefined behaviour. When the compiler encounters an operation that would result in undefined behaviour, it's allowed to do anything that it wants to do. The reason that we got the "boring" behaviour in our test program is that we were compiling without optimization. If we turn on optimization, via the -O2 compiler flag, things start to get interesting.

$ cc -O2 -o ubfor ubfor.c
ubfor.c: In function ‘main’:
ubfor.c:7:21: warning: iteration 2147483647 invokes undefined behavior [-Waggressive-loop-optimizations]
    7 |         for (i = 0; ++i;) {
      |

This time when we attempt to compile the program, we get a warning about the fact that this for-loop will result in undefined behaviour. When we run the program this time, no matter how long we wait, it will never end—i.e. it has been compiled to an infinite loop. We can see this if we disassemble the executable (using intel assembly syntax) as follows:

$ objdump -M intel -d ubfor

The main function, in its entirety, is:

0000000000001020 <main>:
    1020:       eb fe                   jmp    1020 <main>
    1022:       66 2e 0f 1f 84 00 00    cs nop WORD PTR [rax+rax*1+0x0]
    1029:       00 00 00 
    102c:       0f 1f 40 00             nop    DWORD PTR [rax+0x0]

As soon as we get into main, we jump to the beginning of main, forever. This program does nothing other than loop eternally. The nop operation was more confusing to me because it seemed awfully complicated for a "do nothing" operation. Looking at the x86\_64 developer manual (on page 1378) we find the multi-bye NOP opcode. Weirdly, none of the instructions match exactly, but it's close to the 9-byte one which is

66 NOP DWORD ptr [EAX + EAX*1 + 00000000H]

having the byte sequence

66 0F 1F 84 00 00 00 00 00H

In our instruction, we have an extra 2e, presumably just to pad the instruction. The actual meaning of the bytes in our 10-bytes nop are as follows:

value (default can be 16 or 32 bits) * 2e - CS segment override prefix: sets the default segment register used for memory accesses to CS * 0F - NOP (instruction) * 1F - DWORD (double word): 4-byte argument * 84 00 00 00 00H is a memory access

Since the instruction is a NOP, the memory instruction is irrelevant apart from its use to fill space, but I wanted to know what it meant anyway. On page 522 of the x86_64 manual, we find a table explaining the bits in the 84 byte. The first two bits (10 in our case from 8 = 0b1000) are the "mode" bit. The next 3 bits are the "reg/opcode" bits and they specify if a register or an opcode extension will be used. The final 3 bits are the R/M bits and these specify either a particular register or memory addressing mode.

After this (84) byte is the SIB which is 00. Its bits are:

Therefore this byte means EAX + EAX*1. Finally we have the 32-bit (4-byte) displacement value, which is 0. Ultimately, the memory access is, as the disassembler said: [EAX + EAX*1 + 00000000H] (by the way, the "H" means "hexadecimal"). We now know exactly what all of this code means. It's very complex for code that's meant to do nothing!

This brings us back to how we got here in the first place: undefined behaviour. Since addition-with-overflow is undefined behaviour, during optimization the compiler assumes that it is impossible, meaning that the condition of the for loop will never be true, so rather than do the work of running any code after the infinite loop, it gets deleted. Then the code inside the loop does nothing of interest so it is deleted as well. That's how we get from a badly-coded for-loop to a main function that does nothing, forever.