>>How's that "bad code"? On most desktop architecture, integers are coded as 2's complement, and wrap around.
It doesn't matter, the language specs are very clear about it.
Think about it: if you write a piece of code using unsigned ints which you know is going to overflow, what is the first thing you do? You check the docs for what happens. Does it cap? Does it wrap around? You check the standard, and it says that unsigned ints wrap around. Now you use that in your code, because maybe that's a useful behavior.
You do the same thing with signed ints: you check the standard, and it says the behavior is undefined, which means that you, as a programmer, promise you won't do that. So you don't.
>>Those behaviours are both perfectly well defined. Why can't one expect to be able to rely on them?
Because the language spec says so.
>>Why integer overflow can't simply be "implementation defined"?
Maybe it could, and maybe it would be a better language. It has been undefined behavior for a long time though, and that means your end of the contract as a programmer is that you won't invoke it.
Your argument just comes down to: "Gee, maybe C would be a better language if signed overflow was implementation-defined instead of UB". Maybe that's true, I don't know. It doesn't matter though; it's similar to saying Python would be a better language if it didn't have an interpreter lock, and then complaining that your multithreaded program doesn't use all the cores it should.
You sound like programming languages are a Given™, never to be modified, or at least not by us lowly programmers. That's too pessimistic for my taste.
As for whether C would be better off… that's beside the point. Taken to its extreme, undefined behaviour upon signed integer overflow breaks C's own philosophy.
C is supposed to be close to the metal. Originally, it was. So you'd expect that if some `ADD R1 R2` instruction triggers an overflow, you might observe funny results that might have funny consequences down the line. You'd further expect the `+` operator to have the exact same consequences, because C is close to the metal.
What you do not expect is this (imagine you're writing a PID controller for a DSP whose integer arithmetic saturates):
int signal = input + 1;
if (signal == input) { // tests for saturation
    // dead code, because undefined behaviour
    //
    // The only way for this test to be true is
    // for the signed integer addition to overflow.
    // Signed integer overflow is undefined, so
    // the compiler is allowed to assume it never
    // happens.
    //
    // Therefore, the test above is always false,
    // and the code right here is marked as dead.
}
Dead code is not whatever funny behaviour might have arisen from the `ADD R1 R2` above. That's the compiler doing weird stuff because "undefined behaviour" allowed it to go crazy. This is not what you'd expect of a low-level language: the craziness is supposed to come from the platform, not the compiler.
Now, C being what it is, the quick fix would be to compare against INT_MAX instead. It's the portable way to do this test, and it would avoid that crazy dead code elimination. But it is not enough: if `input == INT_MAX`, the addition itself still overflows, and who knows what would happen. The real fix would be something like this:
int signal = input + (input == INT_MAX ? 0 : 1);
if (signal == INT_MAX) {
    // live code, yay!
}
I have to emulate saturation in software! Why?!? The platform already does this with the `ADD` instruction, why can't I just use it? Why am I even using a low-level language?
In this particular case, the spirit of the C standard was clearly to have addition map to whatever hardware instruction was available. If the `ADD` wrapped around, so would `+`. If it saturated, so would `+`. And if it trapped like a divide by zero, so would `+`.
Instead, compiler writers took the standard to the letter, and read "undefined" as a license to eliminate code that wasn't dead under any such low-level assumption. Doing this clearly violates the idea that C is close to the metal.