The Performance Cost of Node.js and GraphQL (softwareatscale.dev)
41 points by todsacerdoti on Oct 8, 2023 | hide | past | favorite | 37 comments


All this pain when most of the time folks could just fire up Postgraphile or Hasura, point them at their Postgres database, and go sip mai tais by the pool. I honestly don't understand why folks insist on writing their resolvers by hand in 2023.

"But I don't want folks having direct knowledge of my database schema," I hear as a retort.

1. Most of the time for most projects, the GraphQL schema or REST API is a direct analog of the database.

2. You can always make a new Postgres schema with views to expose just the data features you want with annotations to show where the foreign key relationships should be. Still worlds easier than writing and maintaining resolver and data loader logic by hand.


Postgraphile just released their v5 update (still in beta) and it’s even more awesome than v4. No more crazy queries to address the n+1 problem.


I looked at GraphQL and couldn't get past the fact that it doesn't provide a way to express joins. Isn't that a problem?


A join is a SQL-ism. GraphQL allows you to express _relationships_, which the underlying database would fulfill via a join.


True - but the important point worth discussing here is that most common GraphQL implementations make it really hard to think in joins.

The resolver pattern and data loader pattern make joins really painful.

eg: implement a simple top-N query (fetch 10 users and the most recent 5 articles for each) in GraphQL with a resolver tree, or even data loader.


> eg: implement a simple top-N query (fetch 10 users and the most recent 5 articles for each) in GraphQL with a resolver tree, or even data loader.

A most_recent_articles view?
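For illustration, a sketch of what such a view could look like, using window functions (shown here with Python and SQLite; the table, column, and view names are assumptions for the example, not from the article):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE articles (id INTEGER PRIMARY KEY, user_id INTEGER, title TEXT, created TEXT);

-- Rank each user's articles by recency and keep only the top 5.
CREATE VIEW most_recent_articles AS
SELECT id, user_id, title, created
FROM (
  SELECT a.*,
         ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY created DESC) AS rn
  FROM articles a
)
WHERE rn <= 5;
""")

# Seed one user with 7 articles; the view should expose only the 5 newest.
conn.execute("INSERT INTO users VALUES (1, 'alice')")
for i in range(7):
    conn.execute("INSERT INTO articles VALUES (?, 1, ?, ?)",
                 (i, f"post {i}", f"2023-01-0{i + 1}"))

rows = conn.execute(
    "SELECT title FROM most_recent_articles "
    "WHERE user_id = 1 ORDER BY created DESC"
).fetchall()
print([r[0] for r in rows])  # ['post 6', 'post 5', 'post 4', 'post 3', 'post 2']
```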


query {
  Users (first: x) {
    …,
    articles (first: y, sortBy: CREATED) {
      ..
    }
  }
}

How would one implement the users and articles resolvers so that they'd be as efficient as a most_recent_articles view in SQL?


- Resolver for the most recent articles

- Resolver for n users by ID

- Caching layer

This will be quite performant in general, since in the worst case we do 2 SQL queries, but in most cases we hit a cache.
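A minimal sketch of the batching idea behind that caching/data-loader approach (plain Python rather than Node, no framework; all names are invented for illustration): per-item resolver calls queue their keys, and a single batch function resolves them in one fetch.

```python
class ToyLoader:
    """Illustrative data loader: collects keys, resolves them in one batch."""

    def __init__(self, batch_fn):
        self.batch_fn = batch_fn
        self.queue = []
        self.cache = {}

    def load(self, key):
        # Defer: just remember the key; values are fetched on dispatch().
        if key not in self.cache and key not in self.queue:
            self.queue.append(key)

    def dispatch(self):
        # One batch call covers every queued key: 1 query instead of N.
        if self.queue:
            for key, value in zip(self.queue, self.batch_fn(self.queue)):
                self.cache[key] = value
            self.queue.clear()
        return self.cache


calls = []

def fetch_users(ids):
    calls.append(list(ids))  # record how many "SQL queries" were issued
    return [{"id": i, "name": f"user{i}"} for i in ids]

loader = ToyLoader(fetch_users)
for uid in (1, 2, 3):  # a resolver tree would call load() once per item
    loader.load(uid)
users = loader.dispatch()
print(len(calls))  # 1: one batched fetch, not three
```

Real data loaders do the queueing asynchronously per event-loop tick, but the coalescing is the same idea.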


The problem is that it’s hard to get the articles resolver to make one query to fetch all the recent articles. So you’ll end up with n+1 fetches from the db.

In the articles resolver, you have to:

SELECT * FROM articles WHERE user_id = X ORDER BY created DESC LIMIT 5;

And you’ll have to run this statement n times - once for each user.

Not sure how a data-fetching cache layer at the app server layer will help here.
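To make that concrete, here's an illustrative comparison (Python with SQLite standing in for the app server and DB; schema invented): the per-user loop issues N queries, while ranking within each user keeps the per-user LIMIT in a single batched statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (user_id INTEGER, title TEXT, created TEXT)")
for uid in (1, 2, 3):
    for i in range(10):
        conn.execute("INSERT INTO articles VALUES (?, ?, ?)",
                     (uid, f"u{uid}-post{i}", f"2023-01-{i + 1:02d}"))

user_ids = [1, 2, 3]

# N+1 style: the articles resolver runs once per user (3 queries here).
n_plus_1 = {
    uid: [r[0] for r in conn.execute(
        "SELECT title FROM articles WHERE user_id = ? "
        "ORDER BY created DESC LIMIT 5", (uid,))]
    for uid in user_ids
}

# Batched style: one query for all users. A plain IN (...) would lose the
# per-user LIMIT, so rank within each user and filter on the rank instead.
placeholders = ",".join("?" * len(user_ids))
batched_rows = conn.execute(f"""
    SELECT user_id, title FROM (
      SELECT user_id, title,
             ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY created DESC) AS rn
      FROM articles WHERE user_id IN ({placeholders})
    ) WHERE rn <= 5
    ORDER BY user_id, rn
""", user_ids).fetchall()

batched = {uid: [] for uid in user_ids}
for uid, title in batched_rows:
    batched[uid].append(title)

print(n_plus_1 == batched)  # True: same result, 1 query instead of N
```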


@alex_lav: Your SQL query here is exactly what I meant! In SQL, this is easy, both to write and to optimize perf.

Implementing this with GraphQL resolvers (how folks typically write GraphQL servers) is hard - there's a users function and an articles function that are both called, so it's hard to implement those 2 functions contributing together towards a query plan. This was my point.

In fact a REST endpoint that takes a list of users and runs this query with the `IN` part parameterized is easier to build here. But the GraphQL version is a pain.


Sorry, I feel like there must be something I'm not understanding about the limitation you're trying to convey.

WITH relevant_users AS (
  SELECT id FROM users WHERE id IN (1)
)
SELECT users.name, mra.title
FROM users
INNER JOIN most_recent_articles mra
  ON users.id = mra.user_id
INNER JOIN relevant_users
  ON users.id = relevant_users.id;

This is a single query that can fetch users filtered by the `first:` argument (although I just used IN, you could add whatever filtering options you wanted in the relevant_users CTE), joined with the most recent articles view we've discussed previously.


Lateral JOIN
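For reference - in Postgres, a lateral join lets the subquery reference each outer row, so the top-N-per-user fetch becomes one statement. A sketch with illustrative names:

  SELECT u.id, u.name, a.title, a.created
  FROM users u
  CROSS JOIN LATERAL (
    SELECT title, created
    FROM articles
    WHERE articles.user_id = u.id
    ORDER BY created DESC
    LIMIT 5
  ) a;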


Hasura.io was mentioned above. I'd recommend you take a look at their schema extensions to do this.

However, please keep in mind GraphQL is an API convention, much as REST is. REST doesn't provide a way to express joins either. Neither is direct DB access.


And thus I don't really see the point. When it first came out, I hoped GraphQL would allow me to craft my queries in one place, and reduce the tedium and error-proneness of writing client and server code to pack and unpack the data structures. But it doesn't, as far as I could tell.

Thanks for the reply.


You might want to look into tRPC and how the client has the definitions for the server. It’s a really nice dev experience


Thanks! I will.


Yeah, Hasura is really nice. Saves a lot of time


Postgraphile looks awesome. Thx for pointing it out


> Instead of multiple batches of promises, we fetch the user, items, and details in one shot. This brings up the meta-question about why use GraphQL in the first place, but that’s a larger conversation for a separate time.

At my current job we have a TON of methods like the one just above this paragraph, and they're terrible to work with. It's hard to write (correctly), it's hard to test (correctly), and it is simply tedious to make work. I could write four lines of declarative GQL (that fetches exactly what I ask for from the DB) or a real mess of a method.

In practice, it's rarely the case that you're returning 10k of anything. To look at the author's benchmark, the difference between 20ms and 60ms for a far nicer and more maintainable dev experience (for 1000 returned items) is not a compelling argument.

Hell, if I look at our stack, the biggest cost is creating parameterized queries with many hundreds of IDs to select to prevent N+1 queries. I've spent countless hours optimizing the slowest code (that uses the pattern the author suggests) to use JOINs instead. Removing the round trip and the cost of shipping the list of IDs back to the DB consumed far more time than you might expect.

If you want to go down an even deeper rabbit hole, look at the cost of serializing your data to JSON and deserializing it on the client. When you're shipping tens of thousands of records to the client, that's likely hundreds of kilobytes or even megabytes of data. Serializing that is expensive!


Totally feel the pain. Ironically, most DBs have started supporting really good json aggregation functions.

Combined with lateral joins, it’s possible to push this ser/de down to the DB directly, with minimal unpacking and repacking when sending that data back to the client!
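For illustration, here's that pushdown sketched with SQLite's json_group_array standing in for Postgres's json_agg (schema invented; assumes a SQLite build with the JSON functions enabled, which Python's bundled SQLite has shipped for years):

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE articles (author_id INTEGER, title TEXT);
INSERT INTO authors VALUES (1, 'alice'), (2, 'bob');
INSERT INTO articles VALUES (1, 'a1'), (1, 'a2'), (2, 'b1');
""")

# One row per author: the DB builds the JSON array itself, so the author
# columns are not repeated once per article row on the wire.
rows = conn.execute("""
    SELECT a.name, json_group_array(ar.title)
    FROM authors a
    JOIN articles ar ON ar.author_id = a.id
    GROUP BY a.id
    ORDER BY a.id
""").fetchall()

result = {name: json.loads(titles) for name, titles in rows}
print(result)  # e.g. {'alice': ['a1', 'a2'], 'bob': ['b1']}
```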


To me those advanced json aggregation functions exist to handle the fact that most DBs accept de-normalized JSON fields.

Direct DB ser/de looks like a bad idea; it's often much easier to scale the application server layer than the DB layer in most cases.

And in most situations you want to "expand" the data as close as possible to the final location.

If you really care about having very low overhead ser/de you would probably go for protobuf/Cap'n Proto almost everywhere instead.


> it's often much easier to scale the application server layer than the DB layer in most cases.

Hm…ignoring the specific design decision here of using json agg - I think the open question is: is that sentiment practically still true in 2023? Is it worth scaling app servers, or is it better to use Postgres well and scale a managed PG offering?

Do we need to spend time writing boring code when we could instead just use DBs better? And then instead we can spend time where it matters in the app server.

> And in most situations you want to "expand" the data as close as possible to the final location.

What do you mean by “expand”?

From a perf pov: Imagine you’re fetching 100 authors and each had 1000 articles, would you rather a) fetch mxn Cartesian product or b) ask the DB to send you a json aggregation where each author has a 1000 articles? There’s a significant db perf, intermediate layer ser/de, and network latency difference right?


> Imagine you’re fetching 100 authors and each had 1000 articles, would you rather a) fetch mxn Cartesian product or b) ask the DB to send you a json aggregation where each author has a 1000 articles?

I would be surprised if the DB performed better with an aggregation than just returning the results. Assuming you have reasonable indices, it’s a trivial join.

Network latency, maybe. Assuming 20 bytes per name, with no compression that’s an extra 2 MB.

You could write the query with a CONCAT as a subquery as a poor man’s aggregation, but then you’re very much at the mercy of the planner deciding to do all of that in one pass. I’m not sure that it would. On mobile, or else I’d check.
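The subquery-as-poor-man's-aggregation idea, sketched with SQLite's group_concat (Postgres would use string_agg; the schema is invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE articles (author_id INTEGER, title TEXT);
INSERT INTO authors VALUES (1, 'alice'), (2, 'bob');
INSERT INTO articles VALUES (1, 'a1'), (1, 'a2'), (2, 'b1');
""")

# A correlated subquery concatenates each author's titles into one string,
# collapsing the result to one row per author.
rows = conn.execute("""
    SELECT name,
           (SELECT group_concat(title, ',')
            FROM articles WHERE articles.author_id = authors.id) AS titles
    FROM authors ORDER BY id
""").fetchall()

print(rows)  # e.g. [('alice', 'a1,a2'), ('bob', 'b1')]
```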


Like you said, it's not the join cost that's significant :)

But for the DB to be sending less data over the network means that the DB is doing less CPU work in helping process that data.

Benchmarking should be straightforward for a particular use-case.


> Is it worth scaling app servers or is it better to use Postgres well and scale a managed PG offering?

That is a very good question indeed; it probably depends on whether compute or the DB offering is cheaper to scale for you. From my very anecdotal experience using both, the DB is usually much more expensive to scale up.

> What do you mean by “expand”?

Getting more data through the pipes, getting closer to the limit of the network pipes.

That doesn't sound like a realistic scenario to me - pulling Y articles from X authors.

If you're not pulling directly from articles, and you're pulling from X authors, X>1, you most likely don't actually want the articles; more likely you want some stat of each article, like article score, subject, tags, or comment counts. Then I don't see either the JOIN expansion or the JSON ser/de being a serious issue at all compared to the DB retrieval costs.


Most applications don't even need to optimise for N+1 queries, and the ones that do could get by with just optimising it in a few places. Measure first, then optimise as you see the need.

GraphQL batching together what would be multiple waterfall REST requests is such a perf boost for most cases that N+1 is really only an issue if your DB is bottlenecked.


I'd normally agree, but this is dangerous advice. Why? Because by the time you notice, it's not slow, it's down. N+1 queries don't scale badly as a function of users; they scale badly as a function of cumulative user behavior. One unexpectedly active user can make you self-DoS your database. Even a DB proxy like pgbouncer won't really help, because parameterized queries are likely implemented with stored procedures, which require an exclusive DB connection, layering bottlenecks on bottlenecks.


In my experience, you typically don’t just jump straight past slow with n+1s.

Sure, if you introduce an n+1 to a large, existing table it will slow down. However, most endpoints are going to start at zero.


I think there are 2 subtleties here on where you optimize perf:

1. Frontend makes n+1 requests to the api. Either use GraphQL or build a custom endpoint that shapes a perfect response.

2. Build an api server that fetches data from the db efficiently without n+1 hitting the DB.

The whole tension introduced by resolver/data-loader ways of building GraphQL is that while it makes 1 easy, it makes 2 harder, compared to bespoke REST-ish endpoints.


true, but ORMs also make 2 harder


IMHO: The way we often approach GraphQL execution is a little dated. It’s perhaps legacy thinking from times when we had data layers that couldn’t be expected to scale even for the majority of use cases. But that has completely changed over the last 5-10 years - DBs can scale pretty massively, for specific workloads with very little work required at the app server layer. So if we used DBs to actually model as close to our domain entities as possible (and yes, we’ll need multiple DBs sooner or later), then a huge part of our work is already technically done.

So - here’s another way to approach building GraphQL:

1. Build a “read graph” so that you get a highly composable API that can almost entirely be pushed down into a data layer (eg: Postgres). This would involve techniques similar to building a compiler - which is hard, but the great news is that it’s easy to make it reusable across people who share the same data layer (eg: Postgres).

There are only 2 business logic concerns at this layer, both of which can be pushed down, which helps simplify the problem: 1) modelling/transformation, which can be done in the language of the data layer and pushed down (like views); 2) authz, which can also be done in the language of the data layer and pushed down (like RLS).

2. The “command graph”: a set of controllers written in whatever language you please that makes changes to the data layer (or delegates to an orchestration system) and returns a reference to the read graph after having made changes.

This is kind of how we approach the problem at Hasura - a connector SDK that makes it easy to build reusable query compilers for different DBs or API services.

And it’s been great to see similar ideas in the ecosystem - let’s make it easy to execute a tree of functions to create a query plan, instead of executing a tree of data fetch functions (like the OP). Grafast by the postgraphile folks comes to mind.


One of Bun's assertions is that their AsyncLocalStorage is significantly faster than Node's, b/c it purposefully only implements the subset of async_hooks that can be implemented w/o affecting performance:

https://github.com/oven-sh/bun/pull/3089

I haven't had a chance to test it out yet, but perhaps the OP could add Bun to the mix, to compare its promise overhead vs. Node's.

Also there is an issue in Node about performing the same optimization:

https://github.com/nodejs/node/issues/46265


> We could write a one-shot resolver that implements the entire query. [...] This brings up the meta-question about why use GraphQL in the first place [...].

This seems like a reasonable approach to me. Give downstream developers the opportunity to work unobstructed, then observe which queries are slow / heavily used, and implement optimised resolvers for just those cases.

In the worst case, you might not save any work on your end, but at least teams that depend on your api are never blocked from iterating.


Or just stop separating front end and backend. The worst companies to work for are the ones who separate these 2 then try to patch it with junk like graphql.


Well, personally I have a bigger aversion to companies who hire dogmatic developers that discard an entire technology as junk, without even considering the scenarios in which you might prefer a horizontal separation of teams.


I only received upset emails when a (business) consumable API was GraphQL only.


"2x-3x latency increase"

With respect to what?



