This makes me wonder how other people here feel about the arguments presented in this post. Personally, I actually agree with some of these points!
> difficulty deploying schema changes
Definitely agreed. Even with tools like Flyway, Liquibase, dbmate, or the framework-provided options (such as Active Record Migrations for Rails and Doctrine for Symfony), most migrations still end up feeling brittle, because you often do things like renaming a column, processing data into a new format, or cleaning up old data. You want to do that anyway, but then you realize that instead of simply renaming a column, you'll probably need a rolling migration for the apps that use the DB: create a new column that the app writes into, migrate all of the app instances to the new version, and only then clean up the old column, god forbid validations use the wrong column while this is going on. I don't think technologies like MongoDB work around problems like this either, since dealing with missing fields in an "old" version of a document would still be annoying. I don't know of any good solutions for how data evolves over time, regardless of technology.
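The rolling rename described above is often called the expand-contract pattern. A minimal sketch, using SQLite purely for illustration (the table and column names are invented):

```python
import sqlite3

# Expand-contract sketch: rename `login` to `username` without breaking
# app instances that still read the old column.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, login TEXT)")
db.executemany("INSERT INTO users (login) VALUES (?)", [("alice",), ("bob",)])

# Expand: add the new column alongside the old one and backfill it.
db.execute("ALTER TABLE users ADD COLUMN username TEXT")
db.execute("UPDATE users SET username = login WHERE username IS NULL")

# ...deploy app versions that write both columns but read the new one,
# one instance at a time...

# Contract: once no running instance touches `login`, drop it
# (requires SQLite >= 3.35):
# db.execute("ALTER TABLE users DROP COLUMN login")

print(db.execute("SELECT username FROM users ORDER BY id").fetchall())
# [('alice',), ('bob',)]
```

The validation hazard mentioned above is exactly the window between "expand" and "contract", when both columns exist and every writer has to keep them in sync.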
> difficulty sharding
Definitely agreed. In general, most DBMSes seem to scale vertically far better than they do horizontally. For example, master-slave replication is doable, but once you want master-master replication, you run into problems with latency and data consistency. There are solutions like TiDB that attempt to give you a distributed database transparently, without making you worry about its inner workings, but that only works until suddenly it doesn't. This problem seems to affect most distributed systems, and I'm not sure how to address it, short of making each new data entry reference the previous state, like CouchDB does with revisions ( https://docs.couchdb.org/en/stable/intro/api.html#revisions ), and even that won't always help.
> an awkward square-table model and a terrible query language
Partially agreed. SQL is pretty reasonable for what it does, despite its dialects being somewhat inconsistent, many of the procedural extensions being clunky, and most of the in-database-processing-heavy systems I've encountered being a nightmare from a debugging and logging perspective, though I guess that's mostly the fault of the surrounding tooling. Discoverability can be a big problem if OTLT and EAV are heavily used ( https://tonyandrews.blogspot.com/2004/10/otlt-and-eav-two-bi... ) and foreign keys are not. Window functions, analytical functions, partitioning, and other features feel unintuitive in some systems, but that could also be a matter of familiarity and a steep learning curve.
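For a concrete taste of the window-function syntax in question, here's a running total per group, sketched with SQLite (>= 3.25 for window support) and invented data:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (customer TEXT, amount INTEGER)")
db.executemany("INSERT INTO orders VALUES (?, ?)",
               [("a", 10), ("a", 20), ("b", 5)])

# SUM() OVER (...) computes a per-customer running total without
# collapsing the rows the way GROUP BY would.
rows = db.execute("""
    SELECT customer, amount,
           SUM(amount) OVER (PARTITION BY customer
                             ORDER BY amount
                             ROWS UNBOUNDED PRECEDING) AS running_total
    FROM orders
    ORDER BY customer, amount
""").fetchall()
print(rows)  # [('a', 10, 10), ('a', 20, 30), ('b', 5, 5)]
```

The `PARTITION BY` / `ORDER BY` / frame-clause split is arguably the unintuitive part: three separate knobs that each change the result in a different way.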
> if you make the mistake of trying to use the transactional functionality that's the one actual selling point of those datastores then you're practically guaranteed to deadlock yourself in production at some point during your growth process
Partially agreed. It can definitely happen, but being able to revert bad changes to the data, or to test them in the first place, sometimes feels like a godsend. Sure, there should always be a local instance that's safe to break, but in practice that rarely comes true.
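What "being able to revert bad changes" buys you, sketched with SQLite; the sanity check and the schema are invented for illustration:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
db.executemany("INSERT INTO accounts (balance) VALUES (?)", [(100,), (50,)])
db.commit()

try:
    # Risky bulk change inside an (implicit) transaction.
    db.execute("UPDATE accounts SET balance = balance - 80")
    # Sanity check before committing: no account may go negative.
    (bad,) = db.execute(
        "SELECT COUNT(*) FROM accounts WHERE balance < 0").fetchone()
    if bad:
        raise ValueError("update would produce negative balances")
    db.commit()
except ValueError:
    db.rollback()  # revert the bad change entirely

print(db.execute("SELECT balance FROM accounts ORDER BY id").fetchall())
# [(100,), (50,)] -- the update was rolled back
```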
>Definitely agreed, in general it seems like most DBMS mostly scale vertically better than they do horizontally.
NoSQL databases do nothing special to achieve horizontal scaling. They simply don't support transactions or atomic operations across documents. If that's what you want, you can just choose an RDBMS with that behavior.
Sounds like vaporware to me. If there are any RDBMSes that support practical autoscaling, they're certainly less mature/established than e.g. Cassandra.
> Well, you want to do that anyways, but then you realize that instead of simply renaming a column, you'll probably do a rolling migration for the apps that use the DB, therefore you need to create a new column that the app will write data into, then migrate all of the app instances to the new version and then clean up the old column, god forbid validations use the wrong column while this is going on. I don't think it's possible to work around problems like this with technologies like MongoDB either, since then dealing with missing data in an "old" version of a document would still be annoying. I don't know of any good solutions to address how data evolves over time, regardless of technology.
IME the best way to do it is to build your system on stream transformation (e.g. Kafka): produce the new representation in parallel, wait for it to catch up, migrate the readers over gradually, and eventually stop producing the old representation. That tends to be what you end up doing with a traditional RDBMS too, but with something like Kafka the pieces you use are normal parts of your workflow, so it's less error-prone.
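A toy sketch of that parallel-produce pattern, with plain lists standing in for Kafka topics (no real Kafka client; the names `topic_v1`, `topic_v2`, and `to_v2` are invented):

```python
# Old and new representations live side by side during the migration.
topic_v1, topic_v2 = [], []

def to_v2(event):
    # New representation: rename "login" -> "username", keep other fields.
    rest = {k: v for k, v in event.items() if k != "login"}
    return {"username": event["login"], **rest}

def produce(event):
    # While migrating, every write goes to both topics.
    topic_v1.append(event)
    topic_v2.append(to_v2(event))

produce({"login": "alice", "amount": 10})
produce({"login": "bob", "amount": 5})

# Readers switch to topic_v2 at their own pace; once all have switched,
# stop producing to topic_v1 and retire it.
print(topic_v2)
# [{'username': 'alice', 'amount': 10}, {'username': 'bob', 'amount': 5}]
```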
> It seems like this problem affects most distributed systems and i'm not sure how to address it, short of making each new data entry reference the previous state, like CouchDB does with revisions ( https://docs.couchdb.org/en/stable/intro/api.html#revisions ) and even that won't always help.
There are two approaches that I've known to work:

1. Actual multiple concurrent versions, as you say, with vector clocks or equivalent, forcing the clients to resolve conflicts if you're not using CRDTs. Riak was the best version of this approach.

2. Having a clear shard key and allowing each partition to have its own "owner", making it clear what you do and don't guarantee across partitions, e.g. Kafka.
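A minimal vector-clock sketch of approach 1: each replica keeps a per-replica counter, and two versions conflict exactly when neither clock dominates the other (the replica names are invented):

```python
def increment(clock, replica):
    # Return a new clock with this replica's counter bumped.
    clock = dict(clock)
    clock[replica] = clock.get(replica, 0) + 1
    return clock

def compare(a, b):
    """Return 'before', 'after', 'equal', or 'concurrent' (a conflict)."""
    keys = set(a) | set(b)
    a_le_b = all(a.get(k, 0) <= b.get(k, 0) for k in keys)
    b_le_a = all(b.get(k, 0) <= a.get(k, 0) for k in keys)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    return "concurrent"  # neither dominates: the client must resolve

v1 = increment({}, "replica_a")      # write seen by replica A
v2 = increment(v1, "replica_b")      # B builds on A's write
fork = increment(v1, "replica_a")    # A writes again without seeing B

print(compare(v1, v2))    # before
print(compare(v2, fork))  # concurrent
```

The "concurrent" case is the one that forces either client-side conflict resolution or a CRDT merge.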
> Partially agreed, SQL is pretty reasonable for what it does, despite its dialects being somewhat inconsistent, many of the procedural extensions being clunky and most of the in-database processing heavy systems that i've encountered being a nightmare from a debugging and logging perspective, though i guess that's mostly the fault of the tooling surrounding them.
I wasn't talking about the fancy analytics so much as just the basic data model - e.g. having a collection-valued column is just way harder than it should be. Everything being nullable everywhere is also a significant pain.
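The standard relational workaround for a collection-valued column is a child table plus a join, which illustrates the "harder than it should be" point. SQLite for illustration; the schema is invented:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    -- The "collection": one row per element, instead of a list in a cell.
    CREATE TABLE post_tags (
        post_id INTEGER NOT NULL REFERENCES posts(id),
        tag     TEXT    NOT NULL
    );
    INSERT INTO posts (id, title) VALUES (1, 'hello');
    INSERT INTO post_tags VALUES (1, 'intro'), (1, 'meta');
""")

# Reassembling the collection takes a join every time you want it back.
rows = db.execute("""
    SELECT t.tag
    FROM posts p JOIN post_tags t ON t.post_id = p.id
    WHERE p.title = 'hello'
    ORDER BY t.tag
""").fetchall()
print([r[0] for r in rows])  # ['intro', 'meta']
```

Two extra tables' worth of ceremony for what a document store expresses as `tags: ["intro", "meta"]` on the record itself.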