SQLite is a great embedded database, and thanks to its use in browsers and on mobile it is the most widely used database in the world, by orders of magnitude.
But it also comes with lots of limitations.
* there is no type safety, unless you run with the new strict mode, which comes with some significant drawbacks (e.g. it is limited to a handful of primitive types)
* a very narrow set of column types, and limited functionality in general
* the big one for me: limited migration support, requiring quite a lot of ceremony for common tasks (e.g. rewriting a whole table and swapping it out)
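For concreteness, the "rewrite and swap" ceremony looks roughly like this. This is a sketch using Python's stdlib sqlite3 module; all table and column names are invented for the example:

```python
import sqlite3

# Hypothetical example: remove a column the "old" way (before DROP COLUMN
# existed) by rebuilding the table inside a transaction and swapping it in.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, legacy_flag INTEGER);
    INSERT INTO users (name, legacy_flag) VALUES ('alice', 1), ('bob', 0);

    BEGIN;
    -- 1. create the new shape of the table
    CREATE TABLE users_new (id INTEGER PRIMARY KEY, name TEXT);
    -- 2. copy the data across
    INSERT INTO users_new (id, name) SELECT id, name FROM users;
    -- 3. swap it in place of the old table
    DROP TABLE users;
    ALTER TABLE users_new RENAME TO users;
    COMMIT;
""")
rows = con.execute("SELECT id, name FROM users ORDER BY id").fetchall()
print(rows)  # [(1, 'alice'), (2, 'bob')]
```

Indexes, triggers, and views on the old table would also need to be recreated, which is where most of the ceremony comes from.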
These approaches (like fly.io's) with read replication also (apparently?) seem to throw away read-after-write consistency. That might be fine for certain use cases, and even desirable for resilience, but it can impact application design quite a lot.
With SQLite you have to do a lot more in your own code because the database gives you fewer tools. That is usually fine, because most usage is "single writer, single or a few local readers". Moving that to a distributed setting with multiple deployed versions of code is not without difficulty.
This seems to be mitigated/solved here, though, by the ability to run worker code "next to the database".
I'm somewhat surprised they went this route. It probably makes sense given the constraints of Cloudflare's architecture and the complexity of running a more advanced globally distributed database.
On the upside: hopefully this usage in somewhat unusual domains can lead to funding for more upstream SQLite features.
> * the big one for me: very limited migration support, requiring quite a lot of ceremony for common tasks (eg rewriting a whole table and swapping it out)
I don't know where this idea of having to swap out a whole table in SQLite came from, but it simply isn't true. Over the last 13 years I have upgraded production HashBackup databases at customer sites a total of 35 times without rewriting and swapping out tables, using the ALTER statement just like in other databases.
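To illustrate the kind of in-place ALTER migrations I mean, here is a sketch using Python's stdlib sqlite3 module; the table is invented, and the version requirements are noted in the comments:

```python
import sqlite3

# Typical in-place schema migrations using ALTER, with no table rewrite.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE backups (id INTEGER PRIMARY KEY, path TEXT)")
con.execute("INSERT INTO backups (path) VALUES ('/data/a')")

con.execute("ALTER TABLE backups ADD COLUMN size INTEGER")         # long supported
con.execute("ALTER TABLE backups RENAME COLUMN path TO filename")  # SQLite >= 3.25
# ALTER TABLE backups DROP COLUMN size;                            -- SQLite >= 3.35

cols = [row[1] for row in con.execute("PRAGMA table_info(backups)")]
print(cols)  # ['id', 'filename', 'size']
```

Existing rows don't need to be touched for ADD COLUMN, which is why these migrations are cheap even on large tables.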
For the most recent upgrade, I upgraded to strict tables, which I could also have done without a rebuild/swap. I chose to do a rebuild/swap this one time because I wanted to reorder some columns. Why? Because columns stored with default or null values don't have row space allocated if the column is at the end of the row.
For a long time SQLite did not have DROP COLUMN and RENAME COLUMN support, both of which are pretty essential.
I'm embarrassed to admit that I didn't realize RENAME COLUMN was actually added in 3.25, almost four years ago.
DROP COLUMN was only just added last year in 3.35.
I'm surprised a database schema lasted 9/12 years without ever renaming or dropping a column.
This changes things! But even now, ALTER TABLE is not transactional. So, especially with many concurrent readers, there can definitely be situations where you'd still want to rewrite.
I'm not sure what you mean by "not transactional". SQLite implements transaction support at the "page" level, and builds all other database operations on top of it, which means anything that touches the bytes of the database file is transaction-safe. You can verify this for yourself:
sqlite> CREATE TABLE foo(a,b,c);
sqlite> INSERT INTO foo VALUES (1,2,3);
sqlite> BEGIN;
sqlite> ALTER TABLE foo DROP COLUMN b;
sqlite> SELECT * FROM foo;
1|3
sqlite> ROLLBACK;
sqlite> SELECT * FROM foo;
1|2|3
It's of course still subject to SQLite's normal restrictions on locking, which means a long-running ALTER statement will block concurrent writers (and probably also concurrent readers if you're not running in WAL mode).
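A quick way to see the WAL-mode difference for yourself. This is a sketch with Python's stdlib sqlite3; in WAL mode a second connection keeps reading the last committed snapshot while a write transaction is open:

```python
import sqlite3, tempfile, os

# WAL mode lets readers proceed concurrently with a single writer; in the
# default rollback-journal mode a writer blocks readers as well.
path = os.path.join(tempfile.mkdtemp(), "demo.db")
con = sqlite3.connect(path)
mode = con.execute("PRAGMA journal_mode=WAL").fetchone()[0]
print(mode)  # 'wal'

con.execute("CREATE TABLE t (x)")
con.execute("BEGIN IMMEDIATE")           # take the write lock
con.execute("INSERT INTO t VALUES (1)")  # not yet committed

reader = sqlite3.connect(path)
visible = reader.execute("SELECT COUNT(*) FROM t").fetchone()[0]
print(visible)  # 0: the reader sees the pre-write snapshot, without blocking
con.commit()
```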
> I'm surprised a database schema lasted 9/12 years without ever renaming or dropping a column.
I did have a couple of columns that were no longer needed and would have dropped them, but instead I just set them to null and ignored them. Nulls only take 1 byte of space in a row. I dropped them when DROP COLUMN was added.
It would really help if SQLite3 had a `MERGE`, or, failing that, `FULL OUTER JOIN`. In fact, I want it to have `FULL OUTER JOIN` even if it gains a `MERGE`.
`FULL OUTER JOIN` is the secret to diff'ing table sources. `MERGE` is just a diff operation + insert/update/delete statements to make the target table more like the source one (or even completely like the source one).
`FULL OUTER JOIN` is essential to implementing `MERGE`. Granted, one could implement `MERGE` without implementing `FULL OUTER JOIN` as a public feature, but that seems silly.
Sadly, the SQLite3 dev team specifically says they will not implement `FULL OUTER JOIN`[0].
Implementing `MERGE`-like updates without `FULL OUTER JOIN` is possible (using two `LEFT OUTER JOIN`s), but it's an O(N log N) operation instead of O(N).
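For anyone curious, the two-`LEFT OUTER JOIN` workaround for diffing two tables looks something like this (a sketch using Python's stdlib sqlite3, with made-up tables; native RIGHT/FULL JOIN landed in SQLite 3.39, which makes this unnecessary on newer versions):

```python
import sqlite3

# Emulate FULL OUTER JOIN: a LEFT JOIN for all left rows, plus a UNION ALL
# of the right-only rows -- the standard pre-3.39 workaround for table diffs.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE src (id INTEGER PRIMARY KEY, v TEXT);
    CREATE TABLE dst (id INTEGER PRIMARY KEY, v TEXT);
    INSERT INTO src VALUES (1, 'a'), (2, 'b');  -- 2 exists only in src
    INSERT INTO dst VALUES (1, 'a'), (3, 'c');  -- 3 exists only in dst
""")
diff = con.execute("""
    SELECT s.id, d.id FROM src s LEFT JOIN dst d ON s.id = d.id
    UNION ALL
    SELECT s.id, d.id FROM dst d LEFT JOIN src s ON s.id = d.id
    WHERE s.id IS NULL
""").fetchall()
print(sorted(diff, key=lambda r: r[0] or r[1]))
# [(1, 1), (2, None), (None, 3)] -> matched / insert candidate / delete candidate
```

From that diff, a `MERGE`-like update is just issuing the corresponding INSERT/UPDATE/DELETE statements against the target.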
The lack of `FULL OUTER JOIN` is a serious flaw in SQLite3. IMO.
RIGHT and FULL JOIN are on the trunk branch of SQLite and will (very likely) appear in the next release. Please grab a copy of the latest pre-release snapshot of SQLite (https://sqlite.org/download.html) and try out the new RIGHT/FULL JOIN support. Report any problems on the forum, or directly to me at drh at sqlite dot org.
"D1 will create read-only clones of your data, close to where your users are, and constantly keep them up-to-date with changes."
Sounds like there will be no synchronous replication and instead there will be a background process to "constantly keep [read-only clones] up-to-date". This means that a stale read from an older read replica can occur even after a write transaction has successfully committed on the "primary" used for writes.
So, while the consistency is not "thrown away", it's no longer strong consistency? Anyway, Kyle from Jepsen will figure it out soon, I guess :)
Yeah, so you can always opt-in to strong consistency by transferring execution to the primary (see the "Embedded Compute" section of the blog). Then it's pretty much exactly the same as a DO.
Just clarifying - D1 without read replicas is strongly consistent. If you add read replicas, those can have replication lag and will not be strongly consistent.
Thanks for the clarification, that is what I would expect.
Does SQLite support some kind of monotonic transaction id that can be used as a cache coherency key? Say a client writes a new record to the database which returns `{"result": "ok", "transaction_id": 123}`, then to ensure that subsequent read requests are coherent they provide a header that checks that the read replica has transaction_id >= 123 and either waits for replication before serving or fails the request. (Perhaps a good use for the embedded worker?)
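As far as I know SQLite has no built-in client-visible transaction id, so you'd roll your own. A hedged sketch of the idea (Python stdlib sqlite3; the `txlog` table and function names are invented for illustration):

```python
import sqlite3

# Maintain a monotonically increasing "txid" in the SAME transaction as each
# write; a replica only serves a read once it has replicated up to that txid.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE kv (k TEXT PRIMARY KEY, v TEXT);
    CREATE TABLE txlog (id INTEGER PRIMARY KEY AUTOINCREMENT);
""")

def write(con, k, v):
    with con:  # one transaction: the write and the txid bump commit together
        con.execute("INSERT OR REPLACE INTO kv VALUES (?, ?)", (k, v))
        cur = con.execute("INSERT INTO txlog DEFAULT VALUES")
        return cur.lastrowid  # returned to the client as its coherency key

def replica_can_serve(replica_con, required_txid):
    # Serve the read only if this replica has caught up to required_txid;
    # otherwise wait for replication or fail the request.
    (latest,) = replica_con.execute("SELECT IFNULL(MAX(id), 0) FROM txlog").fetchone()
    return latest >= required_txid

txid = write(con, "user:1", "alice")
print(txid, replica_can_serve(con, txid))  # here the "replica" is the primary itself
```

The obvious downside is exactly the write amplification mentioned above: every write transaction also touches `txlog`, duplicating bookkeeping the WAL already does internally.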
Yes, you could do it manually, but it would be nice if the solution didn't require carefully managing update queries so the journal addition isn't missed, or increasing write amplification by manually updating a journal table, when that information probably already exists somewhere in the WAL implementation.
Interesting that D1 is built on top of Durable Objects. Does this mean that it would be practical for a single worker to access multiple D1 databases, so it could use, for example, a separate database for each tenant in a B2B SaaS application? Edit: And could each database be in a different primary region?
That is interesting. I wish CF would give us some more information as I've assumed that there must be a lack of strong consistency which would be a major drawback.
Edit: But that would mean that durable objects can't be replicated asynchronously? That would mean a big latency hit. Then what's the difference from a central DB in one datacenter?
I’m not familiar with Durable Objects. When D1 does replication to read replicas, if it’s not doing synchronous replication, then it’s not strongly consistent, is that correct?
I'd like to see some up-front D1 & R2 benchmarks (read/write/IOPS). I can't judge the value of invocation pricing until I can judge my use case's performance. Here's hoping it's better than NVMe RAID 10 under the hood of D1, as some big SQLite reads suffer under slow storage.
Why? Yes, SQLite doesn't have all the features Postgres has. Postgres doesn't have all the features SQLite has either. What's wrong with having different tools with different sets of tradeoffs? It's a different shape of Lego, and that's fine: some things call for a 1/3-height 2x2 and others call for a full-height 1x8.
I haven't tried DuckDB, but I have been googling about it. I think I saw a discussion where it was mentioned that DuckDB isn't a replacement for SQLite. It is an OLAP database [0], which makes its ingestion slower than SQLite's, I think. So it is meant for analytics, not as a full-fledged replacement for SQLite.