To me that reads more like monorepo is a central point of failure and they’re scrambling to bandaid the consequence of that decision. And the bandaids aren’t gonna scale to 1000 people
I guess they’re missing whatever Google has to make their monorepo scale
Problems don’t go away with fractured repos. They just change shape. Many repos maybe get you more reliable CI, but you pay for it with increased cost of integrating dependencies and increased complexity with debugging breaks in production (assuming many repos mean many services).
In my experience, multiple small repos don’t even have better CI reliability than a mono repo as less is invested because it affects fewer people. 10 person repos regularly have flaky tests that never get addressed because “we’ll deal with it later”. The tolerance for flakiness goes up when you can attribute it to a close teammate you know is heads down on something critical instead of it feeling like a random test you don’t even care about.
Kind of? Almost tautologically, if you have multiple repos, it’s less likely that everyone will be stuck at once. But it’s entirely possible that the total “stuck time” per engineer is no lower across a year.
In my experience the only repos that never get stuck are ones with no checkin gates.
Mendral co-founder here. What happens at PostHog is not uncommon. While building Mendral, we talked to hundreds of teams, and they all describe a similar situation. Initially they come to us to make their CI pipelines faster, but as the agent dives in, the urgency shifts to keeping all the pipelines reliable. It’s a natural consequence of a growing codebase with a growing test suite. Of course the setup has to change eventually: splitting the test suite, running only the parts of CI relevant to the changed code, etc. But the situation described in the article is widespread among products that grow quickly.
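For the curious, "running only the parts of CI relevant to the changed code" often looks something like this. A minimal sketch, assuming GitHub Actions (the comment doesn't say which CI system; the paths and job names are illustrative):

```yaml
# Hypothetical workflow: only run the backend test job when files
# under backend/ or shared/ change, instead of the whole suite on
# every pull request.
name: backend-tests
on:
  pull_request:
    paths:
      - "backend/**"
      - "shared/**"
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Placeholder test command; substitute your actual runner.
      - run: make test-backend
```

The trade-off is that path filters must be kept in sync with the real dependency graph by hand; monorepo build tools like Bazel compute the affected targets from the graph instead.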