Yes, it's all wrong, because:
a) recall is designed to measure binary relevance, but vector similarity scores aren't good relevance judgments, and they aren't binary.
b) most models optimise purely for distance, which makes nDCG look great but causes content to clump together. That loses local ranking precision: the noise in the ordering the embeddings produce is significantly greater than the approximation error introduced by the ANN index.
c) bi-encoders have significantly higher error than cross-encoders (see the sketch below). Basically every vector DB is spending at least an order of magnitude more resources than it needs to optimising bi-encoder efficiency, and the bi-encoder score is the wrong target anyway.
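To make the bi-encoder vs cross-encoder distinction concrete, here's a minimal sketch assuming the sentence-transformers library; the model names are just common examples, not anything specific to the article:

```python
# Bi-encoder vs cross-encoder scoring, using sentence-transformers.
from sentence_transformers import SentenceTransformer, CrossEncoder

query = "how do I reset my password"
docs = ["Steps to reset a forgotten password",
        "Password strength best practices"]

# Bi-encoder: query and docs are embedded independently, then compared.
# Fast (docs can be pre-embedded), but each side is scored blind.
bi = SentenceTransformer("all-MiniLM-L6-v2")
q = bi.encode(query, normalize_embeddings=True)
d = bi.encode(docs, normalize_embeddings=True)
bi_scores = d @ q  # cosine similarity on normalized embeddings

# Cross-encoder: the model reads each (query, doc) pair jointly.
# Much slower, but considerably more accurate relevance judgments.
cross = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
cross_scores = cross.predict([(query, doc) for doc in docs])
```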
Yeah, basically all the vector "database" solutions on the market have chosen data-dependent indexes, so you need the data upfront. Imagine if regular databases needed all the data upfront before they could build an index. It's kind of crazy...
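For anyone unfamiliar with the distinction, here's a sketch using faiss; the index types and sizes are illustrative, not what any particular vendor uses:

```python
# Data-dependent vs data-independent indexing in faiss.
import numpy as np
import faiss

d = 128
xb = np.random.default_rng(0).standard_normal((10_000, d)).astype(np.float32)

# Data-dependent: an IVF index must be trained on (a sample of) the
# corpus before it can accept vectors at all.
quantizer = faiss.IndexFlatL2(d)
ivf = faiss.IndexIVFFlat(quantizer, d, 100)
ivf.train(xb)  # without this pass over the data, add() would fail
ivf.add(xb)

# Data-independent: a random-projection LSH index needs no training
# pass, so vectors can be indexed as they arrive.
lsh = faiss.IndexLSH(d, 64)  # 64-bit hashes
lsh.add(xb)
```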
You pay a decent cost to compute the hash; it's a compression algorithm of sorts. But the hashed data is a fraction of the size and comparison is far faster. If you do many comparisons, or compare the same items more than once, you amortise the hashing cost very quickly.
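A minimal sketch of why that amortises, assuming only NumPy; the random-hyperplane hash and the sizes are illustrative:

```python
# One-off hashing cost up front, then cheap Hamming-distance comparisons.
import numpy as np

rng = np.random.default_rng(0)
dim, bits = 768, 256
vectors = rng.standard_normal((10_000, dim)).astype(np.float32)

# One-off cost: project onto random hyperplanes and pack the sign bits.
# Each vector shrinks from 3 KB of float32 to a 32-byte code.
planes = rng.standard_normal((dim, bits)).astype(np.float32)
codes = np.packbits(vectors @ planes > 0, axis=1)

query = rng.standard_normal(dim).astype(np.float32)
qcode = np.packbits(query @ planes > 0)

# Hamming distance: XOR the packed bytes and count the set bits.
hamming = np.unpackbits(np.bitwise_xor(codes, qcode), axis=1).sum(axis=1)

# Versus the full-precision comparison over ~100x more data.
l2 = np.linalg.norm(vectors - query, axis=1)
```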
I was part of the team behind the above article. Happy to answer questions.
In terms of accuracy, it totally depends on the resolution you need. We can get >99% agreement with exact L2 search waaaaay faster, with a tenth of the memory overhead. For what we are doing, that is the perfect trade-off.
As for LSH, we tried random-projection hashing and quantization and were always disappointed.
So it seems like the neural network producing the neural hash is still a standard CNN operating on the usual vector representations? And then the learned hash gets used in a downstream problem...
Or is there actually some interesting hash-based neural algorithm lurking around somewhere?
Network-based hashing is great for maximising the information quality of the hash (compared to other LSH methods). It works well for compressing existing vectors very efficiently.
Very soon, things like language embeddings will skip the float vectors and the network will output hashes directly. These are much faster, because the network can learn to spend more bits where it needs resolution, as opposed to using floatXX for every dimension. It's amazing to see it work, but it's not fully there yet.
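To give a feel for the "compress existing vectors with a network" idea, here's a toy sketch assuming PyTorch; the architecture and loss are illustrative guesses, not our actual model. A small head maps float embeddings to k-bit codes, using a tanh relaxation during training and hard sign() bits at inference:

```python
# Learned hashing head: tanh-relaxed codes in training, sign bits at inference.
import torch
import torch.nn as nn

class HashHead(nn.Module):
    def __init__(self, dim: int = 384, bits: int = 64):
        super().__init__()
        self.proj = nn.Linear(dim, bits)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # tanh keeps the code differentiable during training
        return torch.tanh(self.proj(x))

    def hash(self, x: torch.Tensor) -> torch.Tensor:
        # hard binarisation at inference: {-1, +1} per bit
        return torch.sign(self.proj(x))

def similarity_preserving_loss(codes, sims):
    # push code dot products toward the original cosine similarities
    bits = codes.shape[1]
    return ((codes @ codes.T / bits - sims) ** 2).mean()

# tiny training loop on random data, just to show the moving parts
head = HashHead()
opt = torch.optim.Adam(head.parameters(), lr=1e-3)
x = torch.randn(256, 384)
x = x / x.norm(dim=1, keepdim=True)
sims = x @ x.T  # cosine similarities of the original embeddings
for _ in range(100):
    opt.zero_grad()
    loss = similarity_preserving_loss(head(x), sims)
    loss.backward()
    opt.step()
```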
Hello! First I would like to say this is a very cool writeup. I'm not a computer scientist but do dabble a bit in neural networks. Is it possible this could be used to build a convolutional neural network?
Disclaimer: I work at Algolia.