This feature is the sort of thing that a clever person will use as a shortcut to actually writing robust ETL code. I have had nothing but nightmares dealing with foreign data wrappers in Postgres.
The caveats using this feature -
1/ Strict upper bound on how much data it can pull in.
2/ MySQL migrations need to be run on both Postgres and MySQL
3/ No way to gracefully migrate or version
4/ MySQL’s “loose” typing doesn’t play well with Postgres’s “strict” typing. This means data can break the fdw.
5/ Pain in the ass to debug.
6/ This is debatable, but I don’t believe that application code (SQL code in this case) should live in the database.
7/ Postgres and MySQL have different performance characteristics. This can lead to hard-to-debug performance problems if you are explicitly or implicitly using a fdw in your query.
8/ If you want to keep a local copy of MySQL data in Postgres (to solve for #7) you then have to write code to keep it up to date. This defeats the convenience of the fdw.
9/ At least when I was using fdws, they didn’t have predicate pushdown. This forces you to structure queries in a weird way to get filters to behave as you expect.
10/ You have to manage schema type mismatch between MySQL and Postgres. This isn’t fun or productive work.
My experience is colored by inheriting a disaster of interconnected fdws. But given that negative bias, I still haven’t seen enough value from fdws to justify their use outside of small scale corner cutting.
FDWs can be incredibly useful in the case of needing database-level, low-latency access to a reasonably small amount of data managed on another server, and you don't want or cannot spend the time to write "proper" ETL code.
> This feature is the sort of thing that a clever person will use as a shortcut to actually writing robust ETL code.
With mysql_fdw you can write the ETL code itself in PostgreSQL: you expose (a subset of) the MySQL tables through the FDW, and then you write SQL code to transform it and copy it to your analytics tables. That's exactly what I did in one project: at night the whole analytics database was recreated in 15 minutes or so (the biggest table had about a hundred million rows).
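A sketch of that setup, with hypothetical server, schema, and table names (recent versions of mysql_fdw support IMPORT FOREIGN SCHEMA; on older versions you'd define the foreign tables by hand):

```sql
-- mysql_fdw must be installed on the PostgreSQL server.
CREATE EXTENSION IF NOT EXISTS mysql_fdw;

CREATE SERVER mysql_server
  FOREIGN DATA WRAPPER mysql_fdw
  OPTIONS (host '10.0.0.5', port '3306');      -- hypothetical host

CREATE USER MAPPING FOR CURRENT_USER
  SERVER mysql_server
  OPTIONS (username 'etl_user', password 'secret');

-- Expose a subset of the MySQL tables in a local schema.
CREATE SCHEMA IF NOT EXISTS mysql_src;
IMPORT FOREIGN SCHEMA sales
  LIMIT TO (orders, customers)
  FROM SERVER mysql_server INTO mysql_src;

-- The nightly ETL is then plain SQL on the Postgres side.
CREATE TABLE analytics.orders_by_customer AS
SELECT c.id, c.name,
       count(*)     AS order_count,
       sum(o.total) AS revenue
FROM mysql_src.orders o
JOIN mysql_src.customers c ON c.id = o.customer_id
GROUP BY c.id, c.name;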
Most of the caveats don't apply in that case:
> 1/ Strict upper bound on how much data it can pull in.
I don't know what you mean by that. As far as I know, there are no such bounds.
> 2/ MySQL migrations need to be run on both Postgres and MySQL
Since the analytics database is recreated every night, that is not the case.
> 3/ No way to gracefully migrate or version
Same as 2.
> 4/ MySQL’s “loose” typing doesn’t play well with Postgres’s “strict” typing. This means data can break the fdw.
I never ran into that, but it is possible.
> 5/ Pain in the ass to debug.
Much easier to debug than ETL scripts that talk to two different databases, in my opinion. You can interactively write the SQL code in psql. I used pgTap to unit test the SQL code.
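For context, a pgTap test for one transformation can be as small as this (table and column names are made up for illustration):

```sql
BEGIN;
SELECT plan(2);

-- The ETL step should have produced the target table.
SELECT has_table('analytics', 'orders_by_customer', 'target table exists');

-- Sanity check on the transformed data.
SELECT is(
  (SELECT count(*) FROM analytics.orders_by_customer WHERE revenue < 0),
  0::bigint,
  'no negative revenue after transformation'
);

SELECT * FROM finish();
ROLLBACK;
```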
> 6/ This is debatable, but I don’t believe that application code (SQL code in this case) should live in the database.
I don't see any problem with that.
> 7/ Postgres and MySQL have different performance characteristics. This can lead to hard-to-debug performance problems if you are explicitly or implicitly using a fdw in your query.
In my approach you just do a 'SELECT * FROM x' on the mysql side. All performance problems are easily debugged on the postgres side.
> 8/ If you want to keep a local copy of MySQL data in Postgres (to solve for #7) you then have to write code to keep it up to date. This defeats the convenience of the fdw.
If you recreate the analytics database periodically this is a non-issue.
> 9/ At least when I was using fdws, they didn’t have predicate pushdown. This causes you to do structure queries in a weird way to get filters to work as you expect.
PostgreSQL has had predicate pushdown for years now.
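You can check whether a given filter actually gets pushed down by looking at the remote query in the plan (foreign table and column names here are hypothetical):

```sql
EXPLAIN VERBOSE
SELECT * FROM mysql_src.orders WHERE status = 'shipped';
-- If pushdown works, the plan's remote query includes the WHERE
-- clause; otherwise the whole table is fetched and filtered locally.
```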
> 10/ You have to manage schema type mismatch between MySQL and Postgres. This isn’t fun or productive work.
You have to do that anyway, even if you're using ETL scripts.
If you use the FDW, you can implicitly create the tables on the postgres side:
CREATE TABLE my_analytics_table
AS
SELECT foo, bar FROM mysql_fdw_table WHERE ...;
That way my_analytics_table will get the PostgreSQL types corresponding to the MySQL table's columns. If anything changes, it gets automatically propagated. Analytics queries against my_analytics_table might break, though. Usually, when it's a widening change (e.g. from an 8-bit to a 16-bit integer), things will continue working.
I've used the mysql FDW for a project at work, and it was really helpful. We replaced an old "ETL" built from a bunch of hard-to-maintain scripts with mysql_fdw and some materialized views.
It made things easier for us because it helps you write maintainable logic: you can sync multiple databases and multiple servers without many changes.
A few things to note though:
- we had issues with 0000-00-00 dates, which are valid in MySQL but not in Postgres. So we had to include a conversion in PL/pgSQL in the materialized view.
- We stumbled upon a bug in mysql_fdw in which a selected row is silently skipped if the value of a column exceeds a given size. This is really hard to catch and a really bad experience, as you won't get any error, notice, or warning.
- we had trouble with a few type casts, such as json and others.
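One way to handle the zero-date issue, sketched with hypothetical names: declare the date column as text on the foreign table so the value can be fetched at all, then convert it in the materialized view.

```sql
-- Map the MySQL DATE column as text so '0000-00-00' can be fetched
-- without a conversion error on the Postgres side.
CREATE FOREIGN TABLE mysql_src.orders (
  id         bigint,
  created_at text    -- DATE in MySQL; fetched as text on purpose
) SERVER mysql_server OPTIONS (dbname 'sales', table_name 'orders');

-- Convert in the materialized view, turning zero dates into NULL.
CREATE MATERIALIZED VIEW analytics.orders AS
SELECT id,
       NULLIF(created_at, '0000-00-00')::date AS created_at
FROM mysql_src.orders;
```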
I once used an FDW to synchronize a Microsoft SQL Server DB with PostgreSQL. After setting it up, it was just a matter of a single UPDATE to sync quite complex data across systems. It isn't a perfect way, but given there was no public interface in the MS SQL-backed application, it was a pretty effective solution.
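Assuming the SQL Server table is already exposed through an FDW (tds_fdw is a common choice for SQL Server) as mssql_src.customers, the sync step can indeed be a single statement; all names here are hypothetical:

```sql
UPDATE local_customers l
SET name  = r.name,
    email = r.email
FROM mssql_src.customers r   -- foreign table backed by SQL Server
WHERE l.id = r.id;
```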
Services should be communicating through public interfaces, not databases. Otherwise you're confusing implementation details with public interfaces and change becomes really hard and risky.
I can see this being useful in some legacy system that you have to maintain that is not changing anymore, but otherwise I don't see a good use for it for properly designed and new projects.
Suppose we want to run a complex reporting query on tables in Sales database (running MySQL) and Finance database (running PostgreSQL). What is the good option without duplicating data from one system to another?
ETL, that way you can also ensure your data gets put in a format that'd make it easy to report on.
Granted, it's not quite as quick as this, and not suited for "one-off" reporting, but it's a lot more flexible. Put things in a nice star schema and you have a good way to do analytics on your data.
Having a process that copies data into a reporting system solves both brittleness (changes to either system might break things) and performance (reporting queries would otherwise affect realtime ones). Also, since it's decoupled, pulling in data from a third system, e.g. a CRM, is possible.
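A minimal star-schema sketch of what such a reporting target could look like (all names hypothetical): a fact table keyed to dimension tables, which the ETL process populates from the source systems.

```sql
-- Dimension: one row per customer, whichever source system they came from.
CREATE TABLE dim_customer (
  customer_key serial PRIMARY KEY,
  source_id    bigint,          -- id in the source system (MySQL, CRM, ...)
  name         text
);

-- Fact: one row per sale, referencing the dimension.
CREATE TABLE fact_sales (
  customer_key int REFERENCES dim_customer,
  sale_date    date,
  amount       numeric(12, 2)
);
```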
There are cases where you do know that those don't change as fast or that the actual core of your software needs a fundamental rewrite anyway to pivot. Think about number crunching or comparable small software.