This is a good description, except that TileDB (the open source client) is not t...

biggestlou · on June 15, 2020

This is a very interesting development that I'd like to learn more about. Whenever I've played around with writing databases (just as toy projects) I've always done so using RocksDB or something similar as a backend. This "thick client" model, though, seems to have a lot of potential benefits, most notably no need to worry about disk space or volumes (so say goodbye to a bunch of config parameters) and no need for a tiered storage setup or S3 migration tools (already accomplished!). Not ideal for most use cases but intriguing for some!

jakebol · on June 15, 2020

There are a lot of issues with S3, though: latency, poor performance for small reads/writes, timeouts, API rate limits, API costs, and consistency issues that are poorly understood by third-party developers.

A "thick client" also doesn't perform well unless that client is located on a node in the same region. I think, as with everything, it works well in some cases and not well in others.

manigandham · on June 15, 2020

It's not so difficult if you control the data. Snowflake offers a relational data warehouse built on EC2/S3 (and now other clouds) with its own column-oriented data format (a hybrid called PAX). It can seek to the right columns and rows by getting the exact byte ranges from an S3 object.
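
To make the byte-range idea concrete, here is a minimal sketch in Python. It is not Snowflake's actual format: the layout (fixed-width int64 column chunks followed by a JSON footer and a 4-byte footer length) and the names `build_object`, `FakeObjectStore`, and `read_column` are all made up for illustration, and the in-memory store only stands in for S3.

```python
import json
import struct

ROW_WIDTH = 8  # assumption: every value is a little-endian int64


def build_object(columns):
    """Pack columns into one blob: column chunks, JSON footer, 4-byte footer length."""
    body = b""
    index = {}
    for name, values in columns.items():
        chunk = struct.pack(f"<{len(values)}q", *values)
        index[name] = [len(body), len(chunk)]  # [offset, length] of this column
        body += chunk
    footer = json.dumps(index).encode()
    return body + footer + struct.pack("<I", len(footer))


class FakeObjectStore:
    """Hypothetical stand-in for S3 that serves inclusive byte ranges."""

    def __init__(self, blob):
        self.blob = blob
        self.bytes_served = 0  # tracks how much data the reads transferred

    def size(self):
        return len(self.blob)

    def get_range(self, start, end):
        data = self.blob[start:end + 1]  # end is inclusive, as in HTTP Range
        self.bytes_served += len(data)
        return data


def read_column(store, name, row_start, row_stop):
    """Fetch rows [row_start, row_stop) of one column using three small range reads."""
    size = store.size()
    # 1. The last 4 bytes say how long the footer is.
    (footer_len,) = struct.unpack("<I", store.get_range(size - 4, size - 1))
    # 2. The footer maps column name -> [offset, length].
    footer = json.loads(store.get_range(size - 4 - footer_len, size - 5))
    offset, _length = footer[name]
    # 3. Fixed-width values let us compute the exact byte range for the rows.
    first = offset + row_start * ROW_WIDTH
    last = offset + row_stop * ROW_WIDTH - 1
    raw = store.get_range(first, last)
    return list(struct.unpack(f"<{row_stop - row_start}q", raw))


columns = {"id": list(range(1000)), "price": [i * 10 for i in range(1000)]}
store = FakeObjectStore(build_object(columns))
rows = read_column(store, "price", 100, 103)
print(rows, store.bytes_served, "of", store.size(), "bytes read")
```

Against a real bucket, each `get_range` call would correspond to a GET request carrying a `Range: bytes=start-end` header. Note that even this toy reader needs three round trips per lookup, so per-request latency still matters.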

jakebol · on June 15, 2020

This is true (and a property of all? cloud formats: Delta, Hudi, Iceberg, Parquet, etc.).

I was referring more to the fact that the cloud vendors can co-design their infrastructure and software to support their database services.