This makes me wonder how other people here feel about the arguments presented in this post. Personally, I actually agree with some of these points!
> difficulty deploying schema changes
Definitely agreed. Even with tools like Flyway, Liquibase, dbmate, or the framework-provided options (such as Active Record Migrations for Rails and Doctrine for Symfony), most migrations still end up feeling brittle, because you often do things like renaming a column, processing data into a new format, or cleaning up old data. You want to do that anyway, but then you realize that instead of simply renaming a column, you'll probably need a rolling migration for the apps that use the DB: create a new column that the app writes into, migrate all of the app instances to the new version, and only then clean up the old column, god forbid validations use the wrong column while this is going on. I don't think technologies like MongoDB work around problems like this either, since dealing with missing fields in an "old" version of a document would still be annoying. I don't know of any good solutions for how data evolves over time, regardless of technology.
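The rolling rename described above is often called the expand-contract pattern. A minimal sketch, using SQLite purely for illustration (the table and column names are invented):

```python
import sqlite3

# Expand-contract sketch: rename `login` to `username` without breaking
# app instances that still read the old column.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, login TEXT)")
db.executemany("INSERT INTO users (login) VALUES (?)", [("alice",), ("bob",)])

# Expand: add the new column alongside the old one and backfill it.
db.execute("ALTER TABLE users ADD COLUMN username TEXT")
db.execute("UPDATE users SET username = login WHERE username IS NULL")

# ...deploy app versions that write both columns but read the new one,
# one instance at a time...

# Contract: once no running instance touches `login`, drop it
# (requires SQLite >= 3.35):
# db.execute("ALTER TABLE users DROP COLUMN login")

print(db.execute("SELECT username FROM users ORDER BY id").fetchall())
# [('alice',), ('bob',)]
```

The validation hazard mentioned above is exactly the window between "expand" and "contract", when both columns exist and every writer has to keep them in sync.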
> difficulty sharding
Definitely agreed. In general, most DBMSes seem to scale vertically far better than they do horizontally. For example, master-slave replication is doable, but once you want master-master replication, you run into problems with latency and data consistency. There are solutions like TiDB that attempt to give you a distributed database transparently, without making you worry about its inner workings, but that only works until suddenly it doesn't. This problem seems to affect most distributed systems, and I'm not sure how to address it, short of making each new data entry reference the previous state, like CouchDB does with revisions ( https://docs.couchdb.org/en/stable/intro/api.html#revisions ), and even that won't always help.
> an awkward square-table model and a terrible query language
Partially agreed. SQL is pretty reasonable for what it does, despite its dialects being somewhat inconsistent, many of the procedural extensions being clunky, and most of the in-database-processing-heavy systems I've encountered being a nightmare from a debugging and logging perspective, though I guess that's mostly the fault of the surrounding tooling. Discoverability can be a big problem if OTLT and EAV are heavily used ( https://tonyandrews.blogspot.com/2004/10/otlt-and-eav-two-bi... ) and foreign keys are not. Window functions, analytical functions, partitioning, and other features feel unintuitive in some systems, but that could also be a matter of familiarity and a steep learning curve.
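For a concrete taste of the window-function syntax in question, here's a running total per group, sketched with SQLite (>= 3.25 for window support) and invented data:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (customer TEXT, amount INTEGER)")
db.executemany("INSERT INTO orders VALUES (?, ?)",
               [("a", 10), ("a", 20), ("b", 5)])

# SUM() OVER (...) computes a per-customer running total without
# collapsing the rows the way GROUP BY would.
rows = db.execute("""
    SELECT customer, amount,
           SUM(amount) OVER (PARTITION BY customer
                             ORDER BY amount
                             ROWS UNBOUNDED PRECEDING) AS running_total
    FROM orders
    ORDER BY customer, amount
""").fetchall()
print(rows)  # [('a', 10, 10), ('a', 20, 30), ('b', 5, 5)]
```

The `PARTITION BY` / `ORDER BY` / frame-clause split is arguably the unintuitive part: three separate knobs that each change the result in a different way.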
> if you make the mistake of trying to use the transactional functionality that's the one actual selling point of those datastores then you're practically guaranteed to deadlock yourself in production at some point during your growth process
Partially agreed. It can definitely happen, but being able to revert bad changes to the data, or to test them in the first place, sometimes feels like a godsend. Sure, there should always be a local instance that's safe to break, but in practice that rarely comes true.
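What "being able to revert bad changes" buys you, sketched with SQLite; the sanity check and the schema are invented for illustration:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
db.executemany("INSERT INTO accounts (balance) VALUES (?)", [(100,), (50,)])
db.commit()

try:
    # Risky bulk change inside an (implicit) transaction.
    db.execute("UPDATE accounts SET balance = balance - 80")
    # Sanity check before committing: no account may go negative.
    (bad,) = db.execute(
        "SELECT COUNT(*) FROM accounts WHERE balance < 0").fetchone()
    if bad:
        raise ValueError("update would produce negative balances")
    db.commit()
except ValueError:
    db.rollback()  # revert the bad change entirely

print(db.execute("SELECT balance FROM accounts ORDER BY id").fetchall())
# [(100,), (50,)] -- the update was rolled back
```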
>Definitely agreed, in general it seems like most DBMS mostly scale vertically better than they do horizontally.
NoSQL databases do nothing special to achieve horizontal scaling. They simply don't support transactions or atomic operations across documents. If that's what you want, you can just choose an RDBMS with that behavior.
Sounds like vaporware to me. If there are any RDBMSes that support practical autoscaling, they're certainly less mature/established than e.g. Cassandra.
> Well, you want to do that anyways, but then you realize that instead of simply renaming a column, you'll probably do a rolling migration for the apps that use the DB, therefore you need to create a new column that the app will write data into, then migrate all of the app instances to the new version and then clean up the old column, god forbid validations use the wrong column while this is going on. I don't think it's possible to work around problems like this with technologies like MongoDB either, since then dealing with missing data in an "old" version of a document would still be annoying. I don't know of any good solutions to address how data evolves over time, regardless of technology.
IME the best way to do it is to build your system on stream transformation (e.g. Kafka): produce the new representation in parallel, wait for it to catch up, migrate the readers over gradually, and eventually stop producing the old representation. That tends to be what you end up doing with a traditional RDBMS too, but with something like Kafka the pieces you use are normal parts of your workflow, so it's less error-prone.
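A toy sketch of that parallel-produce pattern, with plain lists standing in for Kafka topics (no real Kafka client; the names `topic_v1`, `topic_v2`, and `to_v2` are invented):

```python
# Old and new representations live side by side during the migration.
topic_v1, topic_v2 = [], []

def to_v2(event):
    # New representation: rename "login" -> "username", keep other fields.
    rest = {k: v for k, v in event.items() if k != "login"}
    return {"username": event["login"], **rest}

def produce(event):
    # While migrating, every write goes to both topics.
    topic_v1.append(event)
    topic_v2.append(to_v2(event))

produce({"login": "alice", "amount": 10})
produce({"login": "bob", "amount": 5})

# Readers switch to topic_v2 at their own pace; once all have switched,
# stop producing to topic_v1 and retire it.
print(topic_v2)
# [{'username': 'alice', 'amount': 10}, {'username': 'bob', 'amount': 5}]
```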
> It seems like this problem affects most distributed systems and i'm not sure how to address it, short of making each new data entry reference the previous state, like CouchDB does with revisions ( https://docs.couchdb.org/en/stable/intro/api.html#revisions ) and even that won't always help.
There are two approaches that I've known to work:

1. Actual multiple concurrent versions, as you say, with vector clocks or equivalent, forcing the clients to resolve conflicts if you're not using CRDTs. Riak was the best version of this approach.

2. Having a clear shard key and allowing each partition to have its own "owner", making it clear what you do and don't guarantee across partitions, e.g. Kafka.
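A minimal vector-clock sketch of approach 1: each replica keeps a per-replica counter, and two versions conflict exactly when neither clock dominates the other (the replica names are invented):

```python
def increment(clock, replica):
    # Return a new clock with this replica's counter bumped.
    clock = dict(clock)
    clock[replica] = clock.get(replica, 0) + 1
    return clock

def compare(a, b):
    """Return 'before', 'after', 'equal', or 'concurrent' (a conflict)."""
    keys = set(a) | set(b)
    a_le_b = all(a.get(k, 0) <= b.get(k, 0) for k in keys)
    b_le_a = all(b.get(k, 0) <= a.get(k, 0) for k in keys)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    return "concurrent"  # neither dominates: the client must resolve

v1 = increment({}, "replica_a")      # write seen by replica A
v2 = increment(v1, "replica_b")      # B builds on A's write
fork = increment(v1, "replica_a")    # A writes again without seeing B

print(compare(v1, v2))    # before
print(compare(v2, fork))  # concurrent
```

The "concurrent" case is the one that forces either client-side conflict resolution or a CRDT merge.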
> Partially agreed, SQL is pretty reasonable for what it does, despite its dialects being somewhat inconsistent, many of the procedural extensions being clunky and most of the in-database processing heavy systems that i've encountered being a nightmare from a debugging and logging perspective, though i guess that's mostly the fault of the tooling surrounding them.
I wasn't talking about the fancy analytics so much as just the basic data model - e.g. having a collection-valued column is just way harder than it should be. Everything being nullable everywhere is also a significant pain.
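The standard relational workaround for a collection-valued column is a child table plus a join, which illustrates the "harder than it should be" point. SQLite for illustration; the schema is invented:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    -- The "collection": one row per element, instead of a list in a cell.
    CREATE TABLE post_tags (
        post_id INTEGER NOT NULL REFERENCES posts(id),
        tag     TEXT    NOT NULL
    );
    INSERT INTO posts (id, title) VALUES (1, 'hello');
    INSERT INTO post_tags VALUES (1, 'intro'), (1, 'meta');
""")

# Reassembling the collection takes a join every time you want it back.
rows = db.execute("""
    SELECT t.tag
    FROM posts p JOIN post_tags t ON t.post_id = p.id
    WHERE p.title = 'hello'
    ORDER BY t.tag
""").fetchall()
print([r[0] for r in rows])  # ['intro', 'meta']
```

Two extra tables' worth of ceremony for what a document store expresses as `tags: ["intro", "meta"]` on the record itself.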