This article has a strange take on algorithmic scaling.
L1, L2, LL, MM, Disk, NetDisk ..., GalacticRAM.
Wow, you actually expected to traverse 'system boundaries' with distinct access costs and still maintain theoretical scaling?
An algorithm that is O(n) will be O(n) in L1, O(n) in L2, ..., and yes, O(n) in GalacticRAM. Each (sub-)system traversal happens because a capacity limit was exceeded.
There is likely a more interesting mathematical result buried in here: a series of sub-systems, each with a roughly constant order-of-magnitude increase in access cost, in aggregate transforms f -> f', with this interesting n^0.5 term showing up instead of the logs.
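A toy model of that aggregate effect (my own sketch, not taken from the article): if touching address i costs roughly sqrt(i), per the physical latency argument, the average access cost over n addresses itself grows like sqrt(n):

```python
import math

def avg_access_cost(n):
    # Toy model: assume touching address i costs ~sqrt(i).
    return sum(math.sqrt(i) for i in range(1, n + 1)) / n

# The average grows like (2/3) * sqrt(n): quadrupling n roughly
# doubles the per-access cost, so a "linear" scan of n elements
# costs ~n^1.5 in this model rather than ~n.
for n in (10_000, 40_000, 160_000):
    print(f"n={n:>7}  avg cost={avg_access_cost(n):8.2f}")
```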
This is the 2nd post I've seen recently submitted to HN that plays fast and loose with big-O notation. It has a defined meaning. If someone wants to talk about something else, great: use different notation. People already seem to have some trouble grokking big-O, and we don't need to muddy it with ad-hoc usages.
According to the exact literal meaning, every algorithm you ever run on your computer is O(1), because your computer is finite. The same can be applied to the reachable universe.
That’s not true: an algorithm is a mathematical “entity”, and it can be (and is) defined over unbounded inputs. The actual implementation on a real machine might not run forever, but there is no such limit in the math.
The article was specifically talking about putting algorithms onto real machines and the speed consequences. This creates a new version of the algorithm with certain limits. And because of these limits, there is a maximum runtime for any halting code, and it is a constant number.
No? I'm tired and it's been way too long since I dealt with Big O, so someone will surely correct me.
Your computer being finite is irrelevant. Big O is about growth factor.
Just because you are capped at value X (< infinity) does not mean that every value below X has the same characteristics. If it did, sure, you are O(1). But if it is N^2 it will still be N^2 even if your machine only has resources for N < 12.
Big O is about the eventual behavior for large enough N. An algorithm is O(N^2) if, for sufficiently large inputs, the running time is < c * N^2 for some fixed constant c. That's a really weak guarantee for practical purposes -- it allows an algorithm to take stupidly long amounts of time on all practical inputs, so long as eventually its running time is within the given bound.
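To make "sufficiently large" concrete (my numbers, not the parent's): f(n) = 3n^2 + 100n + 1000 is O(n^2), but the witness constant c = 4 only kicks in past a threshold:

```python
def f(n):
    # A concrete running time with big lower-order terms.
    return 3 * n**2 + 100 * n + 1000

c = 4
# Smallest N0 with f(n) <= c * n^2 for all n >= N0. Since
# f(n) - 4n^2 = -n^2 + 100n + 1000 is decreasing past n = 50,
# the bound holds forever once it first holds.
n0 = next(n for n in range(1, 1000) if f(n) <= c * n * n)
print(n0)  # -> 110
```

Below n = 110 the bound with c = 4 simply fails, which is exactly the "weak guarantee" point: the definition says nothing about small inputs.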
The colloquial use of big O as a general growth factor doesn't hold up to its definition, if you want to be sure it's valid for small inputs too.
I'm in agreement that saying algorithms on computers are all O(1) doesn't make much sense. In a finite memory model, it's impossible to solve big problems, so the running time function becomes undefined for big enough inputs, so big O becomes completely inappropriate in this situation.
> I'm in agreement that saying algorithms on computers are all O(1) doesn't make much sense. In a finite memory model, it's impossible to solve big problems, so the running time function becomes undefined for big enough inputs, so big O becomes completely inappropriate in this situation.
The difference between "everything is O(1)" and "you're not allowed to use O" is pretty minor anyway.
Either way it takes a lot of very useful analysis and throws it out the window because a particular branch of math was being too pedantic about needing infinity.
Use big O anyway. It works fine. It's a good tool even within the limits of real machines. If you're not abusing it on purpose, it tells you useful information in the exact same way that the pure math version does.
> That I use Big O to analyze time and not operations is important.
This is why we have benchmarks and big-O for different things. If we care about cache level sizes and the constant factors, use benchmarks with increasing orders of magnitude.
Or, to stay abstract, model N as the bits of memory so that the access cost grows with N. To represent specific cache level sizes and latencies, use benchmarks everyone will understand.
I mean, big O comes from that particular branch of math. The whole point of it -- even in CS -- is it's asymptotic analysis. (It's also been long criticized in CS for this deficiency. Usually it's the hidden constant factor, but the minimum problem size before it's relevant is an issue too!)
It's fine colloquially misusing big O to mean something like "for usual inputs, the running time is roughly this function", but you can't then in the same breath say all practical algorithms are technically O(1), because that's just not true in the technical sense (and it's useless or at least inaccurate in the colloquial one).
It doesn't really matter, but to add more to the discussion, the way I see it is that computer programs are implementations of algorithms on some restricted set of inputs. The algorithm has the asymptotic analysis, but the program itself does not.
> It's fine colloquially misusing big O to mean something like "for usual inputs, the running time is roughly this function", but you can't then in the same breath say all practical algorithms are technically O(1), because that's just not true in the technical sense (and it's useless or at least inaccurate in the colloquial one).
My argument is "there's no reason to stick to the exact technical definition, because the exact technical definition breaks when you apply it to real computers". I don't think that's contradictory in any way.
But I don't think this is a misuse either. Talking about "usual inputs" and "roughly" would be a misuse, sure. But if I can prove it uses at most an^2 + bn + c steps, and I call it O(n^2), then what's the problem?
This is typically why we don't pick N to mean all of the resources in the universe. I guess you're the kind of guy that chooses not to use 8-bit bytes in an interview?
Instead, typically it's the number of accesses into some array. Other times people get cute and it's `log2(input length)`, so somebody's algorithm could be O(N), while if N were just `input length` it'd be N^2.
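The classic illustration of how the choice of N changes the answer (my example, not the parent's): trial-division primality testing is linear if N is the value, but exponential if N is the bit length:

```python
def is_prime_naive(v):
    # ~v divisions: O(N) if N = the value v, but v can be 2^b
    # for a b-bit input, so O(2^N) if N = the bit length b.
    return v >= 2 and all(v % d for d in range(2, v))

assert is_prime_naive(97)
assert not is_prime_naive(91)  # 91 = 7 * 13
```

Same code, same work; only the choice of N differs, and with it the complexity class you'd quote.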
This isn't about picking N. This is saying "No matter what N you pick, your runtime will be smaller than roughly 2^[number of atoms in the reachable universe]" (Because if you hit the same state twice, you're stuck in a loop that will never halt.)
So yes, your algorithm is bound by O(N^2), and that's the number that matters on practical computers... but it's also bound by O(1) with a stupidly large constant. So it's O(1).
> I guess you're the kind of guy that chooses not to use 8-bit bytes in an interview?
Dude, I'm here specifically to argue against insisting on the exact literal meaning, and to choose the practical option.
...which becomes meaningless and not useful. There is a whole area of mechanical sympathy and cache aware or cache oblivious algorithms that are well worth discussing and learning, but we don't need to use big-O notation to do so.
I have a video link to an MIT lecture that disproves your assertion. Cache aware, cache adaptive, and cache oblivious algorithms are definitively analyzed using big O. In fact it is big O notions that provide assurances about these approaches.
Mechanical sympathy has little (if anything) to do with big-O, btw. LMAX, the poster child of mechanical sympathy, is really about minimizing the use of memory coherence machinery and not taking the substantial latency hits.
It's useful notation even when we're not dealing with mathematical infinities. Unless you have a better alternative for the same purpose, "don't use it" is a pretty bad answer. It's being used for pretty much the same meaning. Not meaningless at all.
Arriving at sqrt(n) by eyeballing a linear fit on a log plot doesn't mean the same thing to me. It would be better to frame the discussion as an illustration of the limitations of big-O analysis in real-world cases than to say that the big-O of linked-list traversal is really O(sqrt(N)).
Second, there are three images of the same graph, each showing how access times aren't linear when they cross cache boundaries. There aren't any transforms and there aren't any "aggregate transforms f -> f’".
Third, algorithmic complexity is about abstract amounts of work relative to amount of elements. It isn't about "a series of sub-systems" or "galactic RAM".
This article shouldn't use big O notation because it isn't really about algorithms, it is just about non constant access times.
We strongly & emphatically disagree. Analysis assumes a homogeneous computational regime for the "infinite" store in question; otherwise it is basically meaningless. If this assumption is known not to hold (for example, when considering the role of caching in algorithms), that is specifically incorporated into the analysis. See "cache models" and cache-oblivious algorithms for examples.
Cache-Oblivious Structures I, MIT, by Erik Demaine
None of this is true for basic O notation, it's just what exponent the algorithm carries on the number of elements.
Big-O notation doesn't even have to do with this article, it's just showing that memory access isn't strictly constant. This all seems like you're trying to argue something that isn't a part of this thread and I'm not sure what that is.