
    int d[16];
    int SATD (void)
    {
        int satd = 0, dd, k;
        for (dd=d[k=0]; k<16; dd=d[++k]) {
            satd += (dd < 0 ? -dd : dd);
        }
        return satd;
    }
This gets optimized into an infinite loop, and the paper says that this is incorrect. I'm on the side of the compiler: this code is broken, and if you do an out-of-bounds access, an infinite loop is the least of your worries.


Independent of undefined behavior: Who writes such god-awful code?

There:

    for (k=0; k < 16; k++)
    {
      dd = d[k];
      satd += (dd < 0 ? -dd : dd);
    }
Just one more line, less error-prone, idiomatic for-loop and variable accesses (therefore easier and faster to comprehend), and best of all: no undefined behavior.

("But if my code is simple, how will other people know what a genius I am?")

There's no way I would accept the code from that benchmark in a code review.


FWIW I compiled this with gcc 4.9.3 -O3 and got:

    satd.c:7:29: warning: iteration 15u invokes undefined behavior [-Waggressive-loop-optimizations]
         for (dd=d[k=0]; k<16; dd=d[++k]) {
                                 ^


Good evidence that compiler writers aren't quite so malicious as the author portrays.

Warnings, not optimizations, are what most programmers actually want from undefined behavior.


On a reasonable machine (e.g., x86, x86_64, ARM, ARM64), in a reasonable context (i.e., not the very tippy top stack frame), the one-past-the-end read is harmless. The value is unused, and the read itself is not going to segfault because it's going to read the frame pointer or return address out of the stack frame.


... unless your implementation does bounds checking at runtime. This actually happens with the new crop of sanitizers, or even with the hardware-accelerated bounds-checking features of modern Intel CPUs.

It would be a shame if people had to disable critical safety features to work around crashes caused by such "harmless" out-of-bounds reads.


That's also taking the view that the code is executing pointless busy work as the result is meaningless -- seems fair enough to optimise the whole thing away to me under those circumstances, although I suppose the compiler ought to put in some "random" return value. Four would probably do nicely as it's already been proven to be the best random number.


When you say "optimize the whole thing away", do you mean that it's reasonable to skip that particular load, or reasonable to skip the entire loop? Anton and I (and I presume 'swolchok') are all for reordering the code to avoid the unnecessary load, and if it had some benefit, would probably be fine with replacing it with some random or non-random number.

The part that makes us cringe is for the compiler to reason backward from the undefined load to removing the entire loop, even for the values that are within range. While accepting that compiler would be spec compliant in doing so (replacing main() with "return 0" would also be spec compliant), we question whether that really makes for a better/faster/safer user experience. Essentially, we think that the clear intent of the code should have greater influence on the particular optimizations that are chosen.


I do agree that this isn't quite so straightforward, but I also can't see that it's necessary for the compiler to guess what is meant here. Even supposing the array is initialised elsewhere there is a read past the end of the array -- I hope we'd both agree that a segfault in response is perfectly reasonable.

If that's the case then maybe forcing a segfault is better than optimising the loop away, but I do err on the compiler's side here. As others have pointed out, the compiler warns that undefined behaviour is invoked by the code and it probably doesn't actually do what the author thinks it does. The diagnostic also isn't required by the standard.

The clear intent of the code certainly isn't clear to me -- whatever calculation is done involving a value read from past the end of the array only tells me that the return value isn't used in any meaningful way, and if that's the case then why not eliminate the code?

[EDIT] Ok, just re-read the code, I guess the value is actually unused so the function return value is presumably OK, but I think the option of the segfault is still reasonable for the compiler to do.


A segmentation violation when doing the out-of-bounds access would be a possible behaviour of C* code.

Concerning the intent: Note that the result of the out-of-bounds access is not being used. Looks to me like SATD() just produces the sum of absolute values of d.


Even with buggy code, you want the behavior of the generated code to reflect the behavior of the source code, or debugging becomes an exercise in frustration (especially, but not only on platforms where running a debugger is not an option).


Yeah, this is undefined behaviour and should not be expected to work, 100% agreed.

Wouldn't it be faster to remove that costly ternary conditional operator by zeroing out the first bit? Well, there's one degenerate case, for minvalue of int, but that is degenerate either way.


> Wouldn't it be faster to remove that costly ternary conditional operator by zeroing out the first bit?

No, that wouldn't work for two's-complement numbers [1], only for sign-magnitude numbers (like IEEE floating-point numbers). For example, the value -2 is stored as 11111110 (in an 8-bit signed integer); setting the most significant bit to 0 would result in 01111110, or 126 in decimal.

[1] https://en.wikipedia.org/wiki/Two's_complement


Oh, what an awkward blunder. I know two's complement (that is why I talked about the degenerate case of minvalue). Why did I think this would work?


Maybe you were temporarily confused because the MSB is sufficient to test for negativity?


The paper says that it doesn't get optimized into an infinite loop, because the GCC developers changed the compiler to make it well defined.


But they only changed their minds for this particular code because it is part of a benchmark, which is the kind of code they support instead of useful code.


This is Anton's view of the situation, and not an actual one AFAICT :)

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53073

What really happened: It was diagnosed, and they produced a fix for the SPEC source code. SPEC said it does not plan to offer the fix. So they then went and said "okay, I guess we'll try to add an option to give people a chance to have this random crap work". The flag was added to disable this assumption so people could use it if they need it.

I'm not sure what better response you want from developers than that.

Nowhere, in any of the bug report, will you see someone say "yeah, we gotta make sure this still works because it's a benchmark".

They treat it like any other undefined-behavior bug, and say "that's undefined behavior". You can look through the bug database and find literally tens of thousands of cases where they say that: benchmarks, non-benchmarks, etc. They do it whether it's important for optimization or not. It's really pretty consistent.

So taking this example and extrapolating that the only reason it was done is because it's a benchmark is, well, pretty typical of this author :)


Would it have gotten a special flag if it were not a SPEC benchmark? I read that as Anton's point.


Yes. It would have.


The "optimization" was added and turned on by default (for some higher optimization levels) in a pre-release of gcc-4.8. After it turned out to miscompile SPEC, they disabled it for the case occurring there (it still seems to be on by default for other cases); it was disabled in released gcc-4.8 and all following gcc versions. When I asked a gcc maintainer about that, he wrote that it had turned out that this "optimization" did not really provide a significant performance benefit; I asked how they evaluated that, and he wrote that they used a set of relevant benchmarks. I have seen little reluctance by the gcc maintainers to enable "optimizations" by default, even if they miscompile real-world programs. My conclusion is that SPEC has a special status among gcc maintainers, but draw your own conclusions.


This optimization has been around a long time.

You are wrong about the timeline: it was added in 2012, well before the 4.8 branch point (it's closer to the 4.7 timeline). It was not added in a pre-release of gcc 4.8, and it was actually added to target some common cases. In fact, it existed before then in other forms (I can point you at code back to 2004 doing the same thing). It has been on by default there for a long time. It happened not to miscompile SPEC before then by luck.

"After it turned out to miscompile SPEC, they disabled it for the case occurring there (it still seems to be on by default for other cases)"

No, it was not. It is on by default for all cases, AFAICT. Please point to the patch you think disabled it.

"it was disabled in released gcc-4.8 and all following gcc versions."

Again, I don't see a patch that does this. If this has happened, I imagine it's not on purpose.

Here's the current state of this optimization: from https://gcc.gnu.org/svn/gcc/trunk/gcc/common.opt

    faggressive-loop-optimizations
    Common Report Var(flag_aggressive_loop_optimizations) Optimization Init(1)
    Aggressively optimize loops using language constraints.

The Init(1) means it's on by default. Note it is mentioned in the release notes for 4.8, for all versions: https://gcc.gnu.org/gcc-4.8/changes.html

Nowhere will you see anything that says it is not the default, nor will you see changes mentioning that the default was changed.

"When I asked a gcc maintainer about that, he wrote that it turned out that this "optimization" did not really provide a significant performance benefit; I asked how they evaluated that and he wrote that they used a set of relevant benchmarks."

Which "gcc maintainer" did you ask? I ask because I'm a listed maintainer for the SCEV-related optimizations (see https://gcc.gnu.org/svn/gcc/trunk/MAINTAINERS) and you didn't ask me :)

I asked Sebastian as well, and you didn't ask him.

While compiler vendors do target SPEC, GCC generally does not, and in fact, has pretty much repeatedly refused optimizations that are only useful for SPEC.

FYI: putting optimization in quotes repeatedly does not help your case.


I tested compiling SATD() with gcc-4.8.2 and gcc-5.2.0 with -O3, and they do not "optimize" SATD() into an infinite loop. That some version between 4.7 and 4.8 did actually do it was well publicized, and is also documented in bug report 53073; this bug report was resolved as "INVALID", yet somehow a change was introduced before 4.8 was released that results in compiling SPEC as intended. If you want to know which patch achieved that, you can bisect the commits.

The gcc maintainer who justified that change to me is Andrew Haley. He may not be working on SCEV, whatever that is, but, unlike you, he admits to this change that has provably happened.


"I tested compiling SATD() with gcc-4.8.2 and gcc-5.2.0 with -O3, and they do not "optimize" SATD() into an infinite loop."

and from this you assume it was specifically changed (much like you assume a lot of bad faith throughout), instead of "just falling by the wayside as the result of some other change", which is infinitely more likely.

I have no desire to know what patch did it, actually; I'm pointing out that you are, pretty much everywhere, going off half-cocked with assumptions and accusations.

"The gcc maintainer who justified that change to me is Andrew Haley. He may not be working on SCEV, whatever that is, but, unlike you, he admits to this change that has provably happened."

Andrew pretty much has only ever worked on the Java front end, and that's what he maintains. SCEV is the thing that changed here.

I'm not sure why Andrew would have ever said anything about SPEC (since it's not Java ;P), but I actually at this point suspect more bad faith on your part, given what you've said so far.


well, that does make him seem a tad unreasonable then


What's wrong with this code? I don't see it.

(disclaimer: I am a programmer, not a compiler)


This:

    k<16; dd=d[++k]
, whenever the code reaches the end of the loop, effectively becomes

    dd=d[++k]
    if(k<16) goto startOfLoop;
That, in turn, is equivalent to:

    k+=1
    dd=d[k]
    if(k<16) goto startOfLoop
In the second line, k is used to index into array d. Conforming programs do not access memory outside of arrays, so the compiler can infer that

  0 <= k < 16
Given that, it is clear that the goto startOfLoop branch will always be taken. Inside the loop, the program accesses some memory, but that data is never visible to the outside world, so the compiler can optimize away the memory reads and writes.

The compiler is that aggressive because it has to be in real-world programs. For example, compilers using C as their target instruction set may generate bounds checks for every array access, trusting the C compiler to remove as many of them as possible. Macros are another common cause of superfluous source statements that programmers will expect the compiler to optimize away.

I think it will be very tricky to detect cases where compilers should be wary of applying 'obvious' optimizations.


I would be pretty miffed if I wrote, say, a Pascal compiler with array bounds checks, and the compiler "optimized" the checks away just because accessing a[i] would be undefined behaviour. OTOH, in a loop like

    for (i=0; i<16; i++) { if (i>=0 && i<16) do something with a[i] else report an error; }

you can certainly optimize the if to if(1). That's an optimization*.


No, it would not. Compilers can remove boundary checks that provably happen after accessing an item of an array, not those before the array is accessed.

For your example, the Pascal code (at least, I hope this is Pascal; it has been a while since I wrote any):

  if (i >= 0) and (i < 16) then
  begin
    x := a[i]
  end
  else
  begin
    ReportError;
  end
is equivalent to the C code:

  if (i>=0 && i<16)
  {
    if(i>=0 && i<16)
    {
       x = a[i];
    } else {
      RuntimeAbort("Array access outside bounds");
    }
  } else {
    ReportError();
  }
A good Pascal compiler, like a good C compiler, would optimise away that second boundary check and the call to RuntimeAbort. Neither compiler is allowed to optimise away the first check.


The point of my posting was that you don't need "optimizations" to optimize away redundant bounds checks; optimization* is able to do it just fine. Sorry if I did not get that across clearly. What does your "No, it would not" refer to?


Raymond Chen has a series of blog posts (e.g. https://blogs.msdn.microsoft.com/oldnewthing/20140627-00/?p=...) that cover some surprising consequences that undefined behavior can have on program optimization. I'm kind of sympathetic to the point of view that these are abuses of the freedoms granted to the compiler, and that it would be better for it to just generate instructions that match the illegal code. But as others have pointed out, you can generally get the compiler to warn you when it detects something like this, and anyone who uses C and doesn't turn on "treat warnings as errors" gets what they deserve. If you want a language that does runtime bounds checking there are plenty to choose from.

See also this related SO question, and ones it links to: http://stackoverflow.com/questions/24296571/why-does-this-lo...


The (expected) last iteration will try to access d[16] (which is UB) in the statement dd=d[++k].



