
I maintain a couple of MySQL-based applications. I don't really use any features outside of "standard SQL" — is there a reason to switch over to Pg? I haven't used Pg before and usually default to MySQL.


One big argument: Transactional DDL. For example:

  begin;
  alter table foos add answer int not null default 42;
  alter table foos drop column plumbus;
  update foos set name = upper(name);
  create table bars (t serial);
  drop table dingbats;
  rollback;  -- Or, of course, commit
What's the benefit? Atomic migrations. You can create, alter, and drop tables, update data, etc. in a single transaction, and it will either commit if all the changes succeed, or roll back everything if any of them fail.
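
A minimal illustration of the same property, using Python's sqlite3 module (SQLite, like Postgres, supports transactional DDL; the table name is invented):

```python
import sqlite3

# DDL inside a transaction: the CREATE TABLE is undone by the rollback.
con = sqlite3.connect(":memory:")
con.isolation_level = None  # manage transactions explicitly

con.execute("BEGIN")
con.execute("CREATE TABLE bars (t INTEGER PRIMARY KEY)")
con.execute("ROLLBACK")

# After the rollback, the table never existed.
tables = [r[0] for r in con.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]
print(tables)  # -> []
```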

This is not possible in MySQL, or almost any other database [1], including Oracle — DDL statements aren't usually transactional. (In MySQL, I believe a DDL statement implicitly commits the current transaction without warning, but I could be wrong.)

Beyond that, I'd mention: PostGIS, arrays, functional indexes, and window functions. You may not use these things today, but once you discover them, you're bound to.

[1] https://wiki.postgresql.org/wiki/Transactional_DDL_in_Postgr...


I use transactional DDL in my tests. All the tables, triggers, etc. are set up inside a transaction, and then the actual tests run inside nested transactions. At the end of the test run, the outer transaction gets rolled back, and everything disappears.

I don't know if it accomplishes anything truly new (other than ideas that aren't very useful in practice, like being able to have multiple test runs going in parallel), but it's a pretty neat way to do it, and it works well.
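
A sketch of that pattern, using sqlite3 for illustration (it also supports transactional DDL and savepoints); a real suite would run against the production database:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.isolation_level = None  # take manual control of transactions

con.execute("BEGIN")                        # outer transaction: schema setup
con.execute("CREATE TABLE users (name TEXT)")

con.execute("SAVEPOINT test_1")             # one test = one nested savepoint
con.execute("INSERT INTO users VALUES ('alice')")
rows = con.execute("SELECT count(*) FROM users").fetchone()[0]
con.execute("ROLLBACK TO test_1")           # undo that test's writes

con.execute("ROLLBACK")                     # end of run: everything disappears
```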


Transactional tests have some downsides, unfortunately. If your tests test transactional code, that code itself cannot create transactions; it has to use savepoints, which aren't quite the same. Transactional tests also don't work for testing anything concurrent, unless you share the session across threads/goroutines/whatever.

Lastly, if a test fails you'd typically like to leave the data behind so that you can inspect it. A transactional test that rolls back on failure won't allow that.


Savepoints, with proper management of them, seem to match a conceptual nested transaction as far as I've seen. We've got a test-bed connection manager that maps savepoint creation, rollback, and release onto transaction-management functions, so doing something like beginTransaction && beginTransaction works fine.
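
A minimal sketch of such a manager (sqlite3 here for illustration; the class and names are invented):

```python
import sqlite3

# Maps "nested transactions" onto one real transaction plus savepoints.
class TxManager:
    def __init__(self, con):
        self.con = con
        self.depth = 0

    def begin(self):
        if self.depth == 0:
            self.con.execute("BEGIN")
        else:
            self.con.execute(f"SAVEPOINT sp{self.depth}")
        self.depth += 1

    def commit(self):
        self.depth -= 1
        if self.depth == 0:
            self.con.execute("COMMIT")
        else:
            self.con.execute(f"RELEASE sp{self.depth}")

    def rollback(self):
        self.depth -= 1
        if self.depth == 0:
            self.con.execute("ROLLBACK")
        else:
            self.con.execute(f"ROLLBACK TO sp{self.depth}")

con = sqlite3.connect(":memory:")
con.isolation_level = None
con.execute("CREATE TABLE t (x INTEGER)")

tx = TxManager(con)
tx.begin()                 # real transaction
tx.begin()                 # savepoint sp1
con.execute("INSERT INTO t VALUES (1)")
tx.rollback()              # rolls back only the inner "transaction"
tx.commit()
```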


There may be some semantic nuances. Savepoints aren't real transactions, so they "inherit" the locks of the parent transaction, for example. But it might not matter in practice in the context of tests.


That's true, thanks. None of those are relevant for me, but there definitely are cases where it wouldn't be the right approach.


I use Pg's Template Databases[1] for this. Gives me the freedom to use transactions and concurrency in tests.

Without this built-in feature, I'd have used filesystem snapshots, if I didn't mind the time it'd take to stop and start Pg.

----

1: https://www.postgresql.org/docs/current/manage-ag-templatedb...


I don't think one would migrate just because Pg's strength is in ALTER TABLE, which isn't something people do on a daily basis.

Might want to mention the downside of using MySQL as well. (Am also interested to know as a daily MySQL user.)


If Oracle DDL is not transactional, what's the point of its Edition-Based Redefinition feature?


Oracle's "editions" are more like versioning, I think. Last I checked, only a very limited set of schema objects were editionable (views, triggers, procedures, etc.), not including tables or indexes.


At my PHP-shop company, most projects are limited to MySQL 5.7 (legacy reasons, dependency reasons, boss-likes-MySQL reasons...). They are all handicapped by MySQL's feature set and can't update to 8 yet. If they had used Postgres some years ago, they would get:

- JSON column (actually MySQL 5.6 supports it but I doubt if it's as good as Postgres)

- Window functions (available in MySQL 8.x only, while this has been available since Postgres 9.x)

- Materialized views: views that are stored physically like a table, which can hold aggregated, pre-calculated data like sums and counts

- Indexing on function expression

- Better query plan explanation
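
To illustrate what window functions buy you, here is a small example run via sqlite3 (which has supported them since 3.25; Postgres since 8.4, MySQL since 8.0). The table and data are invented:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
con.executemany("INSERT INTO sales VALUES (?, ?)",
                [("east", 10), ("east", 30), ("west", 20)])

# Per-row output plus a per-group aggregate, without a self-join.
rows = con.execute("""
    SELECT region, amount,
           sum(amount) OVER (PARTITION BY region) AS region_total
    FROM sales
    ORDER BY region, amount
""").fetchall()
print(rows)  # -> [('east', 10, 40), ('east', 30, 40), ('west', 20, 20)]
```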


Also suffering under mysql 5.7 here and agree. Also even stuff like CTEs/WITH make queries more readable and composite field types like ARRAY are still missing (you see GROUP_CONCAT shenanigans being used instead).

For indexing on function expressions in particular, the workaround we use is to add a generated column and index that.
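
For comparison, a direct expression index (the Postgres feature discussed above, no extra column needed), sketched with sqlite3, which also supports them; names are invented:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE people (name TEXT)")
con.execute("CREATE INDEX people_lower_idx ON people (lower(name))")
con.execute("INSERT INTO people VALUES ('Alice')")

# The planner can use the index when the query repeats the expression.
plan = con.execute(
    "EXPLAIN QUERY PLAN SELECT name FROM people WHERE lower(name) = 'alice'"
).fetchall()
found = con.execute(
    "SELECT name FROM people WHERE lower(name) = 'alice'").fetchall()
print(found)  # -> [('Alice',)]
```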


Be warned that in PostgreSQL, WITH is an optimization barrier, and is planned to remain that way to serve that purpose. If you can, prefer using views to enhance readability (and testability as a bonus). PostgreSQL views (unlike those in MySQL) do not prevent optimization across them.
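
A tiny illustration of the view-for-readability suggestion (sqlite3 here; table and view names are invented):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (customer TEXT, amount INTEGER)")
con.executemany("INSERT INTO orders VALUES (?, ?)",
                [("a", 5), ("a", 7), ("b", 3)])

# The view names the sub-query; the planner can still optimize through it.
con.execute("""
    CREATE VIEW customer_totals AS
    SELECT customer, sum(amount) AS total FROM orders GROUP BY customer
""")
rows = con.execute(
    "SELECT total FROM customer_totals WHERE customer = 'a'").fetchall()
print(rows)  # -> [(12,)]
```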


No, CTEs are not planned to remain a barrier, this is already fixed in the next version which is in feature freeze right now.

https://www.depesz.com/2019/02/19/waiting-for-postgresql-12-...


My favorite feature of PostgreSQL 12, except perhaps REINDEX CONCURRENTLY; but I am very biased, since I was involved in both patches (both were large projects involving many devs and reviewers). It is awesome to finally see both land.


This is the best news I've heard all week.


I have tried to express my joy at this news to my less SQL literate co-workers... that failed so I wanted to let it out here. This is the best news, I am overjoyed!


Oh wow, that is news to me! A welcome change.


Which is very often a good thing. I have tuned more than one query by moving a sub-query/derived table into a CTE.

What bothers me more is that a CTE prevents parallel execution, but I think that too is fixed in Postgres 12.


> Indexing on function expression

MySQL 5.7 fully supports this. See https://dev.mysql.com/doc/refman/5.7/en/create-table-generat... and https://dev.mysql.com/doc/refman/5.7/en/create-table-seconda...

> JSON column (actually MySQL 5.6 supports it but I doubt if it's as good as Postgres)

Actually MySQL 5.6 doesn't support this, but 5.7 does, quite well: https://dev.mysql.com/doc/refman/5.7/en/json.html


Indexing a generated/computed column is not the same as creating an index on an expression. If you want to support several different expressions you need to create a new column each time.

Additionally, an ALTER TABLE blocks access to the table. Indexes can be created concurrently while other transactions can still read and write the table.

But MySQL doesn't support indexing the complete JSON value for arbitrary queries. You can only index specific expressions by creating a computed column with that expression and indexing that.


> If you want to support several different expressions you need to create a new column each time

Yes and no. Generated columns in MySQL can optionally be "virtual". An indexed virtual column is functionally identical to an index on an expression.
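
That workaround can be sketched with sqlite3 (3.31+), whose generated-column syntax is close to MySQL's; names are invented:

```python
import sqlite3

con = sqlite3.connect(":memory:")
# A virtual column computed from an expression, then indexed.
con.execute("""
    CREATE TABLE people (
        name TEXT,
        name_lower TEXT GENERATED ALWAYS AS (lower(name)) VIRTUAL
    )
""")
con.execute("CREATE INDEX people_name_lower ON people (name_lower)")
con.execute("INSERT INTO people (name) VALUES ('Alice')")

# Queries against the virtual column behave like an expression index.
rows = con.execute(
    "SELECT name FROM people WHERE name_lower = 'alice'").fetchall()
print(rows)  # -> [('Alice',)]
```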

> Additionally, an ALTER TABLE blocks access to the table.

It depends substantially on the specific ALTER and version of MySQL. Many ALTERs do not block access to the table in modern MySQL; some are even instantaneous.

> But MySQL doesn't support indexing the complete JSON value for arbitrary queries. You can only index specific expressions by creating a computed column with that expression and indexing that.

What's the difference, functionally speaking? (Asking honestly, not being snarky -- I may not understand what you are saying / what the equivalent postgres feature is?)


> What's the difference, functionally speaking? (Asking honestly, not being snarky -- I may not understand what you are saying / what the equivalent postgres feature is?)

You create a single index, e.g.:

  create index on the_table using gin(jsonb_column);

And that one index supports many different types of conditions.

Check if a specific key/value combination is contained in the JSON:

  where jsonb_column @> '{"key": "value"}'

This also works with nested values:

  where jsonb_column @> '{"key1": {"key2": {"key3": 42}}}'

Check if an array below a key contains one or multiple values:

  where jsonb_column @> '{"tags": ["one"]}'
  where jsonb_column @> '{"tags": ["one", "two"]}'

Check if all keys from a list of keys are present:

  where jsonb_column ?& array['key1', 'key2']

All those conditions are covered by just one index.


Interesting, thanks! Indeed, there isn't an equivalent feature in MySQL yet.

Out of curiosity, how commonly is this used at scale? I'd imagine there are significant trade-offs with write amplification, meaning it would consume a lot of space and make writes slow. (vs making indexes on specific expressions, I mean. That said, you're right -- there are definitely use-cases where making indexes on specific expressions isn't practical or is too inflexible.)


I'm stuck on 5.7 because the previous dev wrote the worst sprocs I've seen (no exaggeration), and until I've ripped them all out I daren't move to 8. It was on 5.5 when I started, but with much effort I got it tested enough to be reasonably confident that 5.7 would work.

It's an excruciating process though.


Actually window functions were introduced in Postgres 8.4


If there is an analytics db/replica, your data analysts will worship the ground you walk on if you migrate from MySQL to Postgres.


Interesting, can you elaborate, I'm considering a switch.


Not the parent, but even as a MySQL expert I'd agree that Postgres is often a better choice for OLAP-heavy workloads (e.g. very complex reporting and analytical queries). MySQL's query planner historically hasn't been great for these, and ditto for its feature set overall, although it is improving.

Meanwhile, for high-volume OLTP workloads, MySQL (with either InnoDB or MyRocks on the storage engine side) has some compelling advantages... this is one reason why social networks lean towards MySQL for their OLTP product data, and/or have stayed on MySQL despite having the resources to switch.

As with all things in computer science, there are trade-offs and it all depends on your workload :)


Please humor me if I sound a bit triggered in this response, as MySQL pre-v8.0 is like if the creators went to data analysts and asked: "what features could we remove, and what changes could we make to an SQL-standard-compliant DB (postgres), in order to slow you down as much as possible, hinder your ability to read and write queries, and generally make your life a living hell?"

I really get that MySQL is good for what it does, from an engineer's point of view. It is an absolute piss-poor excuse for a database, prior to v8.0.

So what's wrong with MySQL (again, prior to v8.0, but no one seems to use the damn current version)?

-Not ANSI SQL compliant (unlike Postgres)

-No CTEs/WITH clause (?!)

-no WINDOW FUNCTIONS (?!?!?!?)

-"schemas are called databases", which makes for bizarre interpretation of `information_schema` queries, which behave the same across all other DBs except MySQL. What I mean is that MySQL calls each schema its own database. This results in having to connect the same DB multiple times to other programs/APIs/inputs that accept JDBC.

-Worse replication options than postgres; not ACID compliant by default

-Don't know the programming term for this... but the horrendous "select col1, col2, col3... colN, count(<field>) from table group by 1" implicit GROUP BY. The system takes your INVALID query and does things under the hood to return a result. Systems should enforce correct syntax (you must group by all non-aggregated columns; MySQL just implicitly does this under the hood).

-on a tangentially related note, MySQL returns NULL instead of a divide-by-zero error when you divide by zero. Divide-by-zero errors are one of the few things that should ALWAYS RETURN AN ERROR NO MATTER WHAT

-mysql doesn't support EXCEPT clauses

-doesn't support FULL OUTER JOIN

-doesn't support generate_series,

-poor JSON support

-very limited, poor array/unnest support

-insert VALUES () (in postgres) not supported

-lack of the standard pipe (||) concatenation operator

-weird datatype support, and queries don't support the ::cast syntax

-doesn't support `select t1.*, t2.field1, t2.field2 from t1 join t2 on t1.id = t2.id`; that is, you cannot select * from one table and only certain fields from the other.

-case dependence in field and table names when not escape-quoted (mysql uses backticks, postgres uses double quotes for escaping names). What the fuck is this? SQL is a case-insensitive language, and then the creators build in case sensitivity?

-As I mentioned above, mysql uses backticks to escape names. This is abnormal for SQL databases.

-mysql LIKE is case-insensitive (what the hell, it's case-sensitive everywhere else). Postgres has LIKE, and ILIKE (insensitive-like).

-ugly and strange support for INTERVAL syntax (intervals, despite being strings, give a syntax error in mysql). Example: in postgres or redshift you would write `select current_timestamp - interval '1 week'`. In MySQL, you'd have to do `select current_timestamp - interval 1 week` (the '1 week' could be '7 month' or '2 day'... it's a string and should be in single quotes; MySQL doesn't do this)

-mysql only supports the normal SQL comment of `--` if it's followed by whitespace, and pushes `#` instead. No other database does that.

-probably the worst EXPLAIN/EXPLAIN ANALYZE plans I've ever seen from any database, ever

-this is encapsulated in the prior points, but you can't do something simple like `select <fields>, row_number() over () as rownum from table`. Instead you have to declare variables and increment them in the query

-did I mention it's just straight up not SQL standard compliant?

At least MySQL 8.0 supports window functions and CTEs (seriously it's a death knell to a data analyst not to have these). They are the absolute #1 biggest piece of missing functionality to an analyst in my opinion.

This entire post focused on "mySQL have-nots", rather than "Postgres-haves" so I do think there are actually _even more_ advantages to using Postgres over MySQL. I understand MySQL is very fast for writes, but to my understanding it's not even like Postgres is slow for writes, and on the querying side of the coin, it's a universe of difference.

If you ever use MySQL in the future and there will be a data analyst existing somewhere downstream of you, I implore you to use MySQL v8.0 and nothing older, at any cost, for their sake.


I'm not sure what you mean by the "star from one table, specific columns from another" issue... if I'm understanding correctly I'm pretty sure MySQL allows just that.

Division by zero errors and non-"magical" GROUP BY have been the default mode of operation for a _little_ longer, since the 5.7 series.


Hmm, you're right. Must've been user error on that one ;)

I stand by the rest of my points, however.


Still mostly the same for MySQL 8?


MySQL 8.0 has Window Functions and CTEs. Of the entire list of things I mentioned, window functions and CTE's comprise 80% of the "weight" of my issues of that list.

In other words, while I (or another analyst) would likely still prefer Postgres over MySQL of any version, I wouldn't really have too much to complain about if I was using v8.


- A robust security model with role inheritance that supports both column and row policies.

- PLV8/PLPython/C functions/etc (with security!)

- TimescaleDB

- Better JSON query support

- Foreign Data Wrappers

- Better window function support

- A richer extension ecosystem (IMO)

Honestly, at this point I wouldn't use MySQL unless you only care about slightly better performance for very simple queries and simpler multi-master scaling/replication. Even saying that, if you don't need that simple multi-master scaling RIGHT NOW, improvements to the Postgres multi-master scaling story are not too far off on the roadmap, so I would still choose PG in that case.


The benefits are better defaults in terms of data reliability. Hard to say if migration is worth it to you without a lot more details, but I'd definitely recommend trying it in a new project.


We just migrated a medium size project using pgloader with great success and minimal headaches, which seems like a big step up from the last time I had to migrate a project. Highly recommended, and it might be easier than you think!


Frankly, data reliability concerns with modern MySQL / InnoDB are very outdated FUD.

Many of the largest tech companies rely on MySQL as their primary data store. They would not do so if it was unreliable with persistence.

There are many valid reasons to choose Postgres over MySQL, or vice versa -- they have different strengths and weaknesses. But there are no major differences regarding data reliability today, nor have there been for many years now.


I haven't kept up, but I remember there being constraints that were accepted by the parser but ignored by the engine (using InnoDB; I don't remember which specific constraint was concerned).

Is it still the case?


It was the case until very recently. Check constraints are now supported in MySQL 8: https://dev.mysql.com/doc/refman/8.0/en/create-table-check-c...

I believe MariaDB added support for them a couple years earlier, but am not certain.

More broadly, I would agree it's a very painful "gotcha" to have aspects of CREATE TABLE be accepted by the parser but ignored by the engine. However, in MySQL's defense, theoretically this type of flexibility does allow third-party storage engines to support these features if the engine's developer wishes.

Ideally, the engine should throw an error if you try using a feature it does not support, but in a few specific cases it does not (at least for InnoDB). This can be very frustrating, for sure. But at least it's documented. And no database is perfect; they all have similarly-frustrating inconsistencies somewhere.
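
For reference, an engine-enforced CHECK constraint looks like this (as in Postgres, and MySQL 8.0.16+); sqlite3 is used here purely for illustration, with invented names:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE accounts (balance INTEGER CHECK (balance >= 0))")
con.execute("INSERT INTO accounts VALUES (10)")      # passes the check

rejected = False
try:
    con.execute("INSERT INTO accounts VALUES (-5)")  # violates the check
except sqlite3.IntegrityError:
    rejected = True
print(rejected)  # -> True
```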


To be clearer, it's about the defaults regarding strictness of data acceptance that help to avoid trouble up front, not that mysql randomly corrupts/deletes data later.


Sure, and MySQL 5.7, released 4 years ago, fixed those bad defaults.

It isn't fair to compare Postgres-of-today to MySQL-of-over-4-years-ago.


Ok, looks like that version is in LTS distros now. Perhaps the poster's legacy apps are taking advantage of them.

There were other deficiencies mentioned in Kleppmann's book, Designing Data-Intensive Applications, but I don't remember the specifics now.


Even if OP is using 5.6 or previous, strict mode has been available as an option for over 15 years, and can be enabled dynamically (no restart required).

I simply don't see any valid argument for avoiding MySQL due to "data reliability" concerns in 2019.

> There were other minor deficiencies mentioned in Kleppmann's book on designing data apps, but I don't remember the specifics now.

Well, I can't really respond to non-specific points from a book I haven't read. I'm happy to respond to any specifics re: data reliability concerns, if you want to cite them. FWIW, I have quite extensive expertise on the subject of massive-scale MySQL (16 years of MySQL use; led development of Facebook's internal DBaaS; rebuilt most of Tumblr's backend during its hockey-stick growth period).


It takes years to build up trust, seconds to lose it, and years for obsolete documentation to disappear. It's an uphill battle. In the meantime there's postgres, and it's free.


MySQL defaults to InnoDB. Is there a different metric for reliability you had in mind?

Where I work, we chose MySQL back in 2012 due to production quality async replication. I think (but am never sure) that that is now good in Postgres land.

PG has a lot of SQL features I'd love to use and can't. OTOH MySQL's query planner is predictably dumb, which means I can write queries and have a good idea about how well (or not) they'll execute.


I can't remember the last time I started a project using MySQL; maybe it is catching up, but PITR, partitioned tables, and document-oriented columns are some features I use a lot. PostGIS also used to be stronger than the MySQL counterpart.


I'm pretty sure you're frequently using LIMIT, which (TIL) is non-standard SQL. (PostgreSQL shares that syntax though.)



