> it's worth pointing out that binary search (or low-level implementation variants) is the best only if you know nothing about the data beyond the fact that it is sorted / monotonic

Also if you do not learn anything about the data while performing the binary search, no? Like, if you are constantly below the estimate, you could guess that the distribution is biased toward large values and adjust your next guess based on that prediction.


> Also [IFF] you do not learn anything about the data while performing the binary search, no?

Yes, absolutely!

I forgot to share this general perspective above, and it's too late to edit, so I'll add it here...

Since binary search assumes only monotonicity, splitting your interval into two equal parts extracts one bit of information per step, and any other choice would extract less information on average. One bit of information per step is how you end up needing log(n) steps to find the answer.
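A minimal sketch of that claim (my own illustration, not from the thread): each probe of the midpoint answers one yes/no question, so the candidate interval halves every step and you need about log2(n) probes.

  def binary_search(a, target):
      lo, hi = 0, len(a)
      while lo < hi:
          mid = (lo + hi) // 2      # split the interval into two equal parts
          if a[mid] < target:       # one bit learned: target is in the upper half...
              lo = mid + 1
          else:                     # ...or in the lower half
              hi = mid
      return lo if lo < len(a) and a[lo] == target else -1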

To accelerate your search, you basically need to extract those log(n) bits as fast as you can. You can think of that as leveraging both the prior and everything you learn along the way -- adaptively designing each step as the optimal experiment, extracting the maximum amount of information. And adaptive local models of your search space (gradient / hessian / etc) allow you to extract many more bits of information from each query / experiment, provided the function you are inverting has some local structure.

PS: That is why we leverage these ideas to "search" for the optimum, among a space of solutions.
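To make the "local models extract more bits" point concrete, here is a toy sketch (my own; the names and the fallback policy are my choices, not from the comment): inverting a monotone function f for f(x) = y, a secant step uses the local slope to place each query and typically extracts far more than one bit per probe when f is smooth; alternating with bisection keeps the worst case within a factor of two of plain bisection.

  def invert_monotone(f, y, lo, hi, tol=1e-9):
      # Assumes f is increasing on [lo, hi] and f(lo) <= y <= f(hi).
      flo, fhi = f(lo), f(hi)
      secant = True
      while hi - lo > tol:
          if secant and fhi != flo:
              # Local linear model: probe where the secant line crosses y.
              x = lo + (y - flo) * (hi - lo) / (fhi - flo)
              if not (lo < x < hi):
                  x = (lo + hi) / 2
          else:
              x = (lo + hi) / 2     # plain bisection step
          secant = not secant       # alternate, so the worst case stays logarithmic
          fx = f(x)
          if fx < y:
              lo, flo = x, fx
          else:
              hi, fhi = x, fx
      return (lo + hi) / 2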


For a list of sorted values with no other knowledge, binary search is optimal. Provably so: it is simple information theory over binary decisions.

You can do better if the list is stable, by reusing information across searches.

But information gathered during searches requires great complexity to leverage, as searches are an irregular information-gathering scheme.

So spend RAM up front on speedup structures:

1) Create a table that maps the first 8 bits of the key to lower and upper indexes into the list, then binary search between those bounds over the last 8 bits (sketched below). That cuts the search time in half.

2) Go all the way, and create an array of 32,768 indexes, with all-1s marking misses. Either way, the search returns in O(1).

Stable lists allow a sliding parametric trade-off between RAM lookup and binary search, from full lookup to full binary.
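A rough sketch of scheme 1 (my own code, not the commenter's; it assumes sorted, non-negative 16-bit integer keys, and the helper names are illustrative): a 256-entry table maps the key's high byte to the slice of the list sharing that byte, and a bounded binary search finishes the job.

  import bisect

  def build_index(sorted_keys):
      # Map each possible high byte to the [lo, hi) slice of the list
      # whose keys share that high byte.
      table = []
      for b in range(256):
          lo = bisect.bisect_left(sorted_keys, b << 8)
          hi = bisect.bisect_left(sorted_keys, (b + 1) << 8)
          table.append((lo, hi))
      return table

  def search(sorted_keys, table, key):
      lo, hi = table[key >> 8]                           # O(1) jump on the first 8 bits
      i = bisect.bisect_left(sorted_keys, key, lo, hi)   # binary search on the rest
      return i if i < hi and sorted_keys[i] == key else -1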


It's not possible to learn anything about other elements when performing binary search, _except_ the only thing there is to learn: whether the target is before or after the element just compared.

If we were to guess that there is a bias in the distribution based on recently seen elements, the guess is at least as likely to be wrong as to be right. And if we guess incorrectly, in the worst case the algorithm degrades to a linear scan.

Unless we have prior knowledge. For example: if there is a particular distribution, or if we know we're dealing with integers without any repetition (i.e. each element is strictly greater than the previous one), etc.


> It's not possible to learn anything about other elements when performing binary search, _except_ the only thing there is to learn: whether the target is before or after the element just compared.

You have another piece of information: you don't only know whether the element was before or after the compared element. You also know the delta between the value you looked at and the value you're looking for. And you also have the delta from the previous item you looked at.
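That is essentially interpolation search. A hedged sketch (mine, assuming roughly uniformly distributed integer keys): use the value deltas at the ends of the interval to estimate where the target should sit, instead of always probing the middle.

  def interpolation_search(a, target):
      lo, hi = 0, len(a) - 1
      while lo <= hi and a[lo] <= target <= a[hi]:
          if a[hi] == a[lo]:        # flat interval: avoid dividing by zero
              break
          # Estimate the position from the value deltas instead of halving.
          mid = lo + (target - a[lo]) * (hi - lo) // (a[hi] - a[lo])
          if a[mid] == target:
              return mid
          if a[mid] < target:
              lo = mid + 1
          else:
              hi = mid - 1
      return lo if lo <= hi and a[lo] == target else -1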


And you always start off knowing the total length of the array, and the width of the datatype.

Actually deciding what to do with that information without incurring a bunch more cache misses in the process may be tricky.


Is the disconnect here that in many datasets there is some implicit distribution? For example, if we are searching for English words, we can assume that the number of words or sentences starting with "Q" or "Z" is very small while the ones starting with "T" are many. Or, if the first three lookups in a binary search all start with "T", we are probably being asked to search just the "T" section of a dictionary.

Depending on the problem space such assumptions can prove right enough to be worth using despite sometimes being wrong. Of course if you've got the compute to throw at it (and the problem is large) take the Contact approach: why do one when you can do two in parallel for twice the price (cycles)?


Assuming your key space is anything like randomly distributed.

Thinking about it--yeah, if you can anticipate anything like a random distribution it's a few extra instructions to reduce the number of values looked up. In the old days that would have been very unlikely to be a good deal, but with so many algorithms dominated by the cache (I've seen more than one case where a clearly less efficient algorithm that reduced memory reads turned out better) I suspect there's a lot of such things that don't go the way we learned them in the stone age.


> If we would guess that there is a bias in the distribution based on recently seen elements, the guess is at least as likely to be wrong as it is to be right.

This is true for abstract and random data. I don't think it's true for real world data.

For example, Python's sort function "knows nothing" about the data you're passing in. But it does look for some shortcuts, and these end up saving time on average.
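A quick illustration of one such shortcut (my own example, assuming CPython's Timsort): sorting already-sorted data is far faster than sorting shuffled data of the same size, because the sort detects pre-existing runs.

  import random, timeit

  data = list(range(1_000_000))
  shuffled = data[:]
  random.shuffle(shuffled)

  # One long pre-sorted run: near-linear time.
  print(timeit.timeit(lambda: sorted(data), number=10))
  # Random order: the full n log n work.
  print(timeit.timeit(lambda: sorted(shuffled), number=10))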


Also of interest: https://github.com/uutils/coreutils/issues

Is rewriting old software in a newer language a bit like adding people to a late project?


Ubuntu does this to get rid of the GPL license.

What makes this statement racist?

Disregarding genuine differences between cultures and intentionally treating those cultures as identical is racist. Turns out he did come around later in life - https://www.thenational.scot/news/23716840.george-orwells-an...

> Disregarding genuine differences between cultures and intentionally treating those cultures as identical is racist.

Just for the record, I don't think that's a widely used definition of "racism". Also, the linked article does not really support the view that he was "disregarding genuine differences"; if anything, his prejudice made him see differences where there were none (or misattribute traits that he noticed in some individuals to a whole population). The article does speak about his contempt for the Gaelic language, though, which I guess is what annoys you the most? But then the article also states that he changed his mind on that.


The article you linked goes to great lengths about Orwell's alleged Scotophobia. I don't see how particularly hating Scots supports "intentionally treating those cultures as identical". If that were true, then Orwell would hate all Scots and Englishmen equally.

Also, what's the downside of treating different cultures as identical, aside from potentially offending people? As opposed to the other kind of racism, where people are treated as lesser subhumans, which ultimately led to slavery. Why are both casually referred to as racism when one has far more far-reaching consequences?


If he were an entirely rational being I would agree, but he clearly wasn't, and he did hold both the view that Scottish identity isn't worth anything and the view that Scots are essentially identical to the English.

Let's be serious, I'm not saying Orwell would be out enslaving Scots given the chance, but saying Scottish culture IS English culture is intentionally erasing a culture and is a racist statement, not unlike the common view that all of Africa is culturally homogeneous.


I really find this amazing; it gives you the feeling that learning the ideograms would be enough to get the gist of most texts. Quite a bright yet simple hack.

I wish HN would feature more projects like this one instead of whatever is currently on the front page.


TY!!

Yeah, this actually went viral when I posted it on Instagram. Maybe I'll need to post it on HN at a different time, or wait for someone else to organically post it :)


Does not work on FF, iOS: the translation is drawn with the same font.

Oops, I recently switched the storage to Git LFS and the deploy workflow didn't download the fonts.

The same goes for Brave, iOS.

> So what was valued? ...
None of that was valued much compared to lineage, though.


Serviceability by owner is less and less important as fewer and fewer owners actually do maintenance themselves, though.


Even if 100% of owners choose to pay someone else to do it, they are still benefiting from the user-serviceable standard.

First, anything serviceable by the owner is also accessible to a local garage or independent repair shop. That means a competitive market for those owners, rather than being stuck paying extra to a local monopoly or to a rent-seeking manufacturer.

Second, it makes long-term repairability of the product much easier: things don't just suddenly become irreparable because the manufacturer closed down their "unlock codes for trusted affiliates" site. Their asset retains more of its value.

There are things which provide value even when nobody uses them.


Option 1: "I am so much smarter than these stupid engineers"

Option 2: "Certainly, I do not understand the various trade-offs that led to this design"

I wish that, at least in software tech circles, there were more of the option 2 kind of thinking.


Option 3: "Engineers have discussed this at length and all agree that the only justifications for this design are based in extremely short-term thinking."


I've been not betting on x86 in the long term since the PowerPC was announced ;)


It's sad to admit, but being a Michelin-starred chef does not make it easier to find a job in the fast-food industry. It can even be detrimental.

