fair point on blocking sends — but that's an implementation detail, not a structural one.
From my pov, the worker pool's job isn't to absorb saturation. it's to make capacity explicit so the layer above can route around it. a bounded queue that returns ErrQueueFull immediately is a signal, not a failure — it tells the load balancer to try another instance.
saturation on a single instance isn't a scheduler problem, it's a provisioning signal. the fix is horizontal, not vertical. once you're running N instances behind something that understands queue depth, the "unfair scheduler under contention" scenario stops being reachable in production — by design, not by luck.
the FrankenPHP case looks like a single-instance stress test pushed to the limit, which is a valid benchmark but not how you'd architect for HA.
My use case was building an append-only blob store with mandatory encryption. Using a semaphore + direct goroutine calls to limit background write concurrency, instead of a channel + dedicated writer goroutines, was a net win across a wide variety of write sizes and max concurrent in-flight writes. It is interesting that FrankenPHP + Caddy came to almost the same conclusion despite vastly different work being done.
this makes sense for your workload, but might the right primitive be a function of your payload profile and business constraints?
in my case the problem doesn't arise because control plane and data plane are separated by design — metadata and signals never share a concurrency primitive with chunk writes. the data plane only sees chunks of similar order of magnitude, so a fixed worker pool doesn't overprovision on small payloads or stall on large ones.
curious whether your control and data plane are mixed on the same path, or whether the variance is purely in the blob sizes themselves.
if it's the latter: I wonder if batching sub-1MB payloads upstream would have given you the same result without changing the concurrency primitive. did you have constraints that made that impractical?
In my case, "background writes" literally means "do the io.WriteAt for this fixed-size buffer in another goroutine so that the one servicing the blob write can get on with encryption / CRC calculation / stuffing the resulting byte stream into fixed-size buffers". Handling it that way lets me keep the IO to the kernel as saturated as possible without the added scheduling + mutex overhead that sending work through a channel incurs, while still keeping a hard upper bound on IO in flight (max semaphore weight) and on write buffer allocations (sync.Pool). My fixed-size buffers are 32k, and it is a net win even there.
right — no variance, question was off target. worth noting though: the sema-bounded WriteAt goroutines are structurally a fan-out over homogeneous units, even if the pipeline feels linear from the blob's perspective. that's probably why the channel adds nothing — no fan-in, no aggregation, just bounded fire-and-forget.
And to put it plainly: we won't be able to manage LLM-generated contributions without LLMs. It's physically impossible at this scale.
Which means the immune system has to be built from the same substrate as the threat. The question isn't whether to use AI for review — it's whether that review layer will be open, distributed, and community-owned, or closed, centralized, and controlled by whoever gets there first.
But there's a layer above that which is easy to skip over: human supervision.
Not line-by-line review — that's already gone. What remains is supervision of curated logs, at ratios that might look something like 1 in 10^10. The human role is no longer technical production. It's oversight. And that's a genuinely new function that we don't have good tools for yet.
The flow is perpetual. It doesn't stop, it doesn't slow down, it only accelerates. Which means we'll need to build tooling specifically designed to absorb volume, abstract it into supervisable signals, and train us to work at that level of abstraction — where the unit of human attention is no longer a line of code or a PR, but a pattern across millions of automated actions.
Automation isn't the threat to manage. It's the only viable response to production at this frequency. The question is whether we build the abstraction layer deliberately, as a community, before someone builds it for us.
Something worth sitting with, rather than a conclusion:
As PR velocity reaches this scale — 100 per hour, hundreds of thousands of lines a day — I find myself wondering about the collective immune system side of this.
If we're not yet organized around injection and obfuscation at the community level, PR saturation itself becomes a distinguishable attack vector — and not just for backdoors.
Two distinct risks worth separating:
Offensive saturation: flood a competitor or a fast-moving startup with automated PRs. Their human review bandwidth collapses. Real community contributions drown in noise. The project slows, maintainers burn out, momentum dies. No backdoor needed — attrition is enough.
Forced opening: a project overwhelmed by volume lowers its review standards to survive. It merges faster, checks less. The saturation wasn't meant to block — it was meant to open. Once standards drop, real injection becomes trivial.
The unsettling part: this vector requires no particular skill, is already available, and is organically indistinguishable from legitimate viral growth.
To envision an open source that survives AI, maybe we need to envision an open source AI that protects open source.
Genuinely curious if others are thinking about this, and whether anyone has seen serious work in this direction already.
The "job as library" pattern is simple: instead of wiring jobs into main or a framework, you split into three things.
Your queue is a struct with New(db) — it knows submit, poll, complete, fail, nothing else.
Your worker is another struct that loops on the queue and dispatches to handlers registered via RegisterHandler("type", fn). Your handlers are pure functions (ctx,payload) → (result, error) carried by a dependency struct.
Main just assembles: open DB, create queue, create worker, register handlers, call worker.Start(ctx). Result: each handler is unit-testable without the worker or network, the worker is reusable across any pipeline, and lifecycle is controlled by simply calling the context's cancel function.
Bonus: here the queue is a SQLite table with atomic poll (BEGIN IMMEDIATE), zero external infra.
The whole "framework" is 500 lines of readable Go, not an opaque DSL. TL;DR: every service is a library with New() + Start(ctx), the binary is just an assembler.
The "all in connectivity" pattern means every capability in your system — embeddings, document extraction, replication, MCP tools — is called through one interface: router.Call(ctx,"service", payload).
The router looks up a SQLite routes table to decide how to fulfill that call: in-memory function (local), HTTP POST (http), QUIC stream (quic), MCP tool (mcp), vector embedding (embed), DB replication (dbsync), or silent no-op (noop).
You code everything as local function calls — monolith. When you need to split a service out, you UPDATE one row in the routes table, the watcher picks it up via PRAGMA data_version, and the next call goes remote.
Zero code change, zero restart. Built-in circuit breaker, retry with backoff, fallback-to-local on remote failure, SSRF guard.
The caller never knows where the work happens.
That's the "all in connectivity" pattern: the boundary between monolith and microservices is a config row, not an architecture decision.
Nice work — a local-first distributed pattern worth building on.
Three questions:
Why blockchain for access rights? A signed Merkle structure or a Certificate Transparency-style log would give the same guarantees without the operational complexity. What does the blockchain add here that a simpler append-only signed registry doesn't?
The threat model is unclear. If the blockchain provider controls validation, the "accessible only to end users" guarantee depends on trusting that provider. This is the oracle problem — the chain guarantees integrity of what's inside it, but not the truthfulness of what gets written in. Who runs the chain, and what happens if they're compromised or write false access rights?
Go is listed first in the bindings but the example code is Python. Is the Go binding at feature parity, or is Python the primary target?
Hi Horos,
thanks for your comments, I really appreciate it!
1. Perhaps I am misusing the term "blockchain". Access is granted with signed blocks. Each block can introduce changes, like granting/removing access for other users, including the encryption key in an envelope. Each block links to the previous via hash. There is no consensus mechanism.
2. The vault is defined by a storage and the public keys of the creator. A client must know the creator's keys in advance and uses them to verify signatures. The creator can then grant admin rights to other users with specific blocks. An access grant not signed by an admin will be rejected by a user. It is not really about data truth, because the target is more information exchange. Does that answer the question?
3. Go is the implementation language, not really a binding. I use Python in the first example because it is more compact. However the guide shows samples for all supported languages. The primary target is Go for server side and Dart for mobile. Python is effective for samples and experiments.
Thanks for sharing — always a pleasure to discover what others are building.
A few thoughts after your answers:
The E2E file sync part has existing solutions, and your access rights system is really a signed append-only log rather than a blockchain (no consensus, no decentralization) — which is fine, but the term might create misleading expectations.
What I'm more curious about is the access model itself. How are access tokens created and transferred? Who consumes them, and how does authorization propagate? Have you considered a salted API where each user carries a unique identifier, so the whole grant/revoke/delegate flow goes through a single unified mechanism regardless of what's being accessed?
The SQL sync layer is what actually caught my eye — I've worked on similar problems for specific use cases, and encrypted database sync between peers is a genuinely hard problem. That feels like your real differentiator.
On that note: does the SQL layer reference the file content or file paths? I'm guessing you built both interfaces because they're correlated — the SQL holds structured data that points to the encrypted files. If so, that's worth making explicit, because right now they look like two unrelated features rather than two sides of the same system.
my Claude drives its own Brave autonomously, even for UI?