
I'm not a Postgres specialist but, if memory serves, when joining two tables the entire hash table must be loaded into memory on one side (typically the smaller side, of course).


You're mixing up hash joins with hash indexes.


I know they're different things, but isn't the main reason to use hash indexes in a relational DB to then do a hash join?

More to the point: even if you're just talking about indexes, you can't load/use half a hash table, so wouldn't the point also apply to isolated use of a hash table?

I know it's not what was asked for, just a related point.


> I know they're different things, but isn't the main reason to use hash indexes in a relational DB to then do a hash join?

I think the primary use case is O(1) lookup by key, i.e. similar to how key/value stores work.

I'm not aware of the planner being able to reuse an existing hash index to perform a hash join. AFAIK the hash buckets are created on the fly. That does sound like a cool idea though.
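That on-the-fly construction is the classic build/probe pattern: hash the smaller input into buckets, then stream the larger input past it. A rough Python sketch of the idea (illustrative only, with made-up names; this is not Postgres's actual implementation):

```python
def hash_join(small, large, small_key, large_key):
    """Join two lists of dicts on the given key columns."""
    # Build phase: hash every row of the smaller relation by its join key.
    buckets = {}
    for row in small:
        buckets.setdefault(row[small_key], []).append(row)
    # Probe phase: stream the larger relation, looking up each row's key.
    result = []
    for row in large:
        for match in buckets.get(row[large_key], []):
            result.append({**match, **row})
    return result

users = [{"id": 1, "name": "ann"}, {"id": 2, "name": "bob"}]
orders = [{"user_id": 2, "total": 5}, {"user_id": 2, "total": 7}]
print(hash_join(users, orders, "id", "user_id"))
```

Note the asymmetry: only the build side (`small`) has to fit in memory as a hash table; the probe side is just streamed through.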

> More to the point: even if you're just talking about indexes, you can't load/use half a hash table, so wouldn't the point also apply to isolated use of a hash table?

What do you mean by half a hash table?


Thanks for this clarification

>> What do you mean by half a hash table?

What I'm trying to get across is my impression that you don't have to load an entire B-tree index into memory to use it; you can traverse it. A hash index, by contrast, can't be "streamed" or traversed like that: you have to use the whole thing.
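To illustrate the traversal point with a toy (definitely not how Postgres lays out its B-tree pages): a lookup descends from the root and only touches one node per level, so most of the tree can stay on disk.

```python
import bisect

class Node:
    def __init__(self, keys, children=None):
        self.keys = keys            # sorted keys in this node
        self.children = children    # None for leaf nodes

def btree_search(node, key, fetched):
    """Descend from the root, 'fetching' one node per level."""
    fetched.append(node)            # pretend this is a disk page read
    if node.children is None:
        return key in node.keys
    i = bisect.bisect_right(node.keys, key)
    return btree_search(node.children[i], key, fetched)

# Root with two leaves: only 2 of the 3 nodes are touched per lookup.
root = Node([10], [Node([1, 5]), Node([10, 20])])
fetched = []
print(btree_search(root, 5, fetched), len(fetched))  # True 2
```

With a real index the fan-out is huge, so a lookup in a tree of millions of keys reads only a handful of pages.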

I read some papers on hash indexes and hash joins several months ago, and know little about the Postgres specifics... just wanting to wrap my head around it, so please help me understand if I'm off base with this.


>> That means you don't load it into memory at all, you just jump to Bucket[Hash(KEY)] and then check there for KEY (it may or may not be there due to false positives and the bucket being full).

Right, of course. You're making me realize this really is my confusion between hash indexes and hash joins. The issue I'm thinking of is all about dynamically constructed hash joins. Sorry, you called it right from the start but it took me a minute to see... thanks for helping talk through it.


A hash index allows you to determine which bucket an arbitrary key is mapped to without having to look anything else up.

That means you don't load it into memory at all, you just jump to Bucket[Hash(KEY)] and then check there for KEY (it may or may not be there due to false positives and the bucket being full).
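A minimal sketch of that bucket jump (illustrative, not Postgres's on-disk hash index layout): hash the key, jump straight to one bucket, and scan only that bucket.

```python
def make_index(n_buckets=8):
    return [[] for _ in range(n_buckets)]

def insert(index, key, value):
    index[hash(key) % len(index)].append((key, value))

def lookup(index, key):
    # Only Bucket[Hash(KEY)] is examined; no other bucket is read.
    bucket = index[hash(key) % len(index)]
    for k, v in bucket:
        if k == key:          # different keys can share a bucket, so compare
            return v
    return None

idx = make_index()
insert(idx, "alice", 42)
print(lookup(idx, "alice"))  # 42
print(lookup(idx, "bob"))    # None
```

The key comparison inside the bucket is what handles the "false positive" case: the hash sent us to the right bucket, but the bucket may hold other keys too.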




