Hacker News | past | comments | ask | show | jobs | submit | mskkm's comments

The public comments on OpenReview now include explicit allegations that the TurboQuant paper knowingly misrepresented RaBitQ and understated its results. The RaBitQ authors also report in a technical note that several of TurboQuant's runtime and recall numbers do not reproduce from the released code under the paper's stated setup. In the note, TurboQuant generally loses to RaBitQ: https://arxiv.org/abs/2604.19528. If these public allegations hold up, this is not just overhype or sloppy citation practice; it points to a distorted comparison and benchmark claims that do not survive reproduction.

It went through ICLR review with scores of 4, 4, 6, 10. Seriously? Open-source implementations: where is the official code? CUDA kernels: where?

As of yesterday, I've put up the source code, btw:

https://github.com/Safebots/KV


seems to be a scam

"The TurboQuant paper (ICLR 2026) contains serious issues in how it describes RaBitQ, including incorrect technical claims and misleading theory/experiment comparisons. We flagged these issues to the authors before submission. They acknowledged them, but chose not to fix them. The paper was later accepted and widely promoted by Google, reaching tens of millions of views.

We’re speaking up now because once a misleading narrative spreads, it becomes much harder to correct. We’ve written a public comment on openreview (https://openreview.net/forum?id=tO3ASKZlok).

We would greatly appreciate your attention and help in sharing it."

https://x.com/gaoj0017/status/2037532673812443214


Pied Piper vibes. As far as I can tell, this algorithm is hardly compatible with modern GPU architectures. My guess is that's why the paper reports accuracy-vs-space but conveniently avoids reporting inference wall-clock time. The baseline numbers also look seriously underreported. "Several orders of magnitude" speedups for vector search? Really? Has anyone actually reproduced these results?



They confirmed the accuracy on NIAH but didn't reproduce the claimed 8x efficiency.


Efficient execution on the GPU appears to have been one of the authors' specific aims. Table 2 of their paper shows real-world performance that, at a glance, appears compatible with inference.


This is not an LLM inference result. Table 2 is the part I find most questionable. Claiming orders-of-magnitude improvements in vector search over standard methods is an extraordinary claim. If it actually held up in practice, I would have expected to see independent reproductions or real-world adoption by now. It’s been about a year since the paper came out, and I haven’t seen much of either. That doesn’t prove the claim is false, but it certainly doesn’t inspire confidence.


Classic academic move. If the authors show accuracy-vs-space charts but hide end-to-end latency, it usually means their code is slower in practice than vanilla fp16 with no compression at all. Polar coordinates are absolute poison for parallel GPU compute.
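For what it's worth, the end-to-end number being asked for here is cheap to produce. A minimal wall-clock sketch of the uncompressed brute-force baseline (toy shapes, plain float32 NumPy standing in for fp16 kernels; everything here is a hypothetical stand-in, not the paper's setup):

```python
import time
import numpy as np

rng = np.random.default_rng(0)
db = rng.standard_normal((20_000, 64), dtype=np.float32)    # hypothetical corpus
queries = rng.standard_normal((32, 64), dtype=np.float32)   # hypothetical query batch

t0 = time.perf_counter()
scores = queries @ db.T                                     # brute-force inner-product search
top10 = np.argpartition(-scores, 10, axis=1)[:, :10]        # unordered top-10 ids per query
elapsed = time.perf_counter() - t0

print(f"{elapsed * 1e3 / len(queries):.3f} ms/query end-to-end")
```

Any compressed method claiming large speedups should beat this trivially measurable baseline at matched recall, which is exactly the comparison the charts omit.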


I don't think they're using polar coordinates? They're quantizing to grid centroids.
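A grid-centroid scheme of the kind described here can be sketched like this (toy illustration only: the per-vector max-abs scaling and uniform grid are my assumptions, not the paper's actual algorithm; the point is that the operation is purely coordinate-wise, with no trig and no cross-coordinate dependencies, which is what makes it GPU-friendly):

```python
import numpy as np

def quantize_to_grid(x: np.ndarray, levels: int = 16):
    """Snap each coordinate to the nearest centroid on a uniform grid in [-1, 1]."""
    scale = max(float(np.abs(x).max()), 1e-12)            # per-vector scale (assumption)
    grid = np.linspace(-1.0, 1.0, levels)                 # centroid positions
    idx = np.abs(x[:, None] / scale - grid[None, :]).argmin(axis=1)
    return idx.astype(np.uint8), scale

def dequantize(idx: np.ndarray, scale: float, levels: int = 16) -> np.ndarray:
    grid = np.linspace(-1.0, 1.0, levels)
    return grid[idx] * scale

x = np.array([0.9, -0.3, 0.05, -0.72])
idx, scale = quantize_to_grid(x)
x_hat = dequantize(idx, scale)
print(np.abs(x - x_hat).max())  # error bounded by half a grid step times the scale
```

Each coordinate is processed independently, so the whole thing maps onto SIMD lanes or a trivial CUDA kernel; polar coordinates would instead couple coordinates through angles and transcendental functions.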


true

