Hacker News | past | comments | ask | show | jobs | submit | mskkm's comments

The public comments on OpenReview now include explicit allegations that the TurboQuant paper knowingly misrepresented RaBitQ and understated its results. The RaBitQ authors also report in a technical note that several of TurboQuant's runtime and recall numbers do not reproduce from the released code under the paper's stated setup. In the note, TurboQuant generally loses to RaBitQ: https://arxiv.org/abs/2604.19528. If these public allegations hold up, this is not just overhype or sloppy citation practice; it points to a distorted comparison and benchmark claims that do not survive reproduction.

It went through ICLR review with scores of 4, 4, 6, 10. Seriously? Open-source implementations: where is the official code? CUDA kernels: where?

As of yesterday, I've put up the source code, btw:

https://github.com/Safebots/KV


seems to be a scam

"The TurboQuant paper (ICLR 2026) contains serious issues in how it describes RaBitQ, including incorrect technical claims and misleading theory/experiment comparisons. We flagged these issues to the authors before submission. They acknowledged them, but chose not to fix them. The paper was later accepted and widely promoted by Google, reaching tens of millions of views.

We’re speaking up now because once a misleading narrative spreads, it becomes much harder to correct. We’ve written a public comment on openreview (https://openreview.net/forum?id=tO3ASKZlok).

We would greatly appreciate your attention and help in sharing it."

https://x.com/gaoj0017/status/2037532673812443214


Pied Piper vibes. As far as I can tell, this algorithm is hardly compatible with modern GPU architectures. My guess is that's why the paper reports accuracy-vs-space but conveniently avoids reporting inference wall-clock time. The baseline numbers also look seriously underreported. "Several orders of magnitude" speedups for vector search? Really? Has anyone actually reproduced these results?



They confirmed the accuracy on NIAH but didn't reproduce the claimed 8x efficiency.


Efficient execution on the GPU appears to have been one of the authors' specific aims. Table 2 of their paper shows real-world performance that, at a glance, appears compatible with inference.


This is not an LLM inference result. Table 2 is the part I find most questionable. Claiming orders-of-magnitude improvements in vector search over standard methods is an extraordinary claim. If it actually held up in practice, I would have expected to see independent reproductions or real-world adoption by now. It’s been about a year since the paper came out, and I haven’t seen much of either. That doesn’t prove the claim is false, but it certainly doesn’t inspire confidence.


Classic academic move. If the authors show accuracy-vs-space charts but hide end-to-end latency, it usually means their code is slower in practice than vanilla fp16 with no compression at all. Polar coordinates are absolute poison for parallel GPU compute.
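For what it's worth, the end-to-end number being asked for here is cheap to produce. A minimal wall-clock sketch of the uncompressed brute-force baseline (toy shapes, plain float32 NumPy standing in for fp16 kernels; everything here is a hypothetical stand-in, not the paper's setup):

```python
import time
import numpy as np

rng = np.random.default_rng(0)
db = rng.standard_normal((20_000, 64), dtype=np.float32)    # hypothetical corpus
queries = rng.standard_normal((32, 64), dtype=np.float32)   # hypothetical query batch

t0 = time.perf_counter()
scores = queries @ db.T                                     # brute-force inner-product search
top10 = np.argpartition(-scores, 10, axis=1)[:, :10]        # unordered top-10 ids per query
elapsed = time.perf_counter() - t0

print(f"{elapsed * 1e3 / len(queries):.3f} ms/query end-to-end")
```

Any compressed method claiming large speedups should beat this trivially measurable baseline at matched recall, which is exactly the comparison the charts omit.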


I don't think they're using polar coordinates? They're quantizing to grid centroids.
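A grid-centroid scheme of the kind described here can be sketched like this (toy illustration only: the per-vector max-abs scaling and uniform grid are my assumptions, not the paper's actual algorithm; the point is that the operation is purely coordinate-wise, with no trig and no cross-coordinate dependencies, which is what makes it GPU-friendly):

```python
import numpy as np

def quantize_to_grid(x: np.ndarray, levels: int = 16):
    """Snap each coordinate to the nearest centroid on a uniform grid in [-1, 1]."""
    scale = max(float(np.abs(x).max()), 1e-12)            # per-vector scale (assumption)
    grid = np.linspace(-1.0, 1.0, levels)                 # centroid positions
    idx = np.abs(x[:, None] / scale - grid[None, :]).argmin(axis=1)
    return idx.astype(np.uint8), scale

def dequantize(idx: np.ndarray, scale: float, levels: int = 16) -> np.ndarray:
    grid = np.linspace(-1.0, 1.0, levels)
    return grid[idx] * scale

x = np.array([0.9, -0.3, 0.05, -0.72])
idx, scale = quantize_to_grid(x)
x_hat = dequantize(idx, scale)
print(np.abs(x - x_hat).max())  # error bounded by half a grid step times the scale
```

Each coordinate is processed independently, so the whole thing maps onto SIMD lanes or a trivial CUDA kernel; polar coordinates would instead couple coordinates through angles and transcendental functions.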


true

