Thanks for sharing — sounds like you've dealt with similar
challenges.
On identity and trust boundaries: each agent in Splox runs
with isolated credentials scoped to the tools the user
explicitly connects. Agents can't discover or access services
beyond what's been granted. The MCP protocol helps here —
tool access is defined per-connection, so permissions are
inherently scoped rather than bolted on after the fact.
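To make that scoping concrete, here's a rough sketch of the idea (all names here — Connection, grant, call_tool — are made up for illustration, not Splox's or MCP's actual API): a tool simply doesn't exist from the agent's point of view unless it was explicitly granted on that connection.

```python
# Hypothetical sketch of per-connection tool scoping: the agent can only
# see and call tools the user explicitly connected. Nothing is named
# after a real Splox or MCP interface.

class Connection:
    def __init__(self):
        self._granted = {}  # tool name -> callable

    def grant(self, name, tool):
        """User explicitly connects a tool; nothing else becomes visible."""
        self._granted[name] = tool

    def call_tool(self, name, *args):
        # Scoping is inherent: an ungranted tool can't be reached at all,
        # so there's nothing to "bolt on" afterward.
        if name not in self._granted:
            raise PermissionError(f"tool {name!r} was never granted")
        return self._granted[name](*args)

    def list_tools(self):
        # Discovery is limited to what was granted on this connection.
        return sorted(self._granted)
```

The point of the sketch is that permission checks live at the only entry point, so there's no separate ACL layer to keep in sync.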
For the "3am on Saturday" problem — that's exactly why we
built the Event Hub with silence detection. If an agent stops hearing from a service it's
monitoring, it reacts to that. Subscription state persists across
restarts.
On what's worth automating: it splits roughly into two camps. The
more common camp is repetitive operational work — monitoring
markets, responding to messages, deploying code, updating
spreadsheets. But the more interesting use cases are
decision-based: the trading agent deciding when to open/close
positions, or a support agent deciding whether to escalate.
The Event Hub is what makes the decision-based ones viable.
Agents subscribe to real-time events and react based on
triggers — you can use structured filters or even natural
language conditions ("fire when the user seems frustrated").
So the agent isn't just on a cron loop, it's genuinely
reacting to context.
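The structured-filter side of that can be sketched in a few lines (Subscription and the filter shape are hypothetical names, and a natural-language condition like "the user seems frustrated" would be a model call rather than the dict match shown here):

```python
# Sketch of structured trigger filters: a subscription fires only when
# an event matches every condition. Names are illustrative, not an API.

def matches(event, filters):
    """True if the event carries every filter key with the given value."""
    return all(event.get(k) == v for k, v in filters.items())

class Subscription:
    def __init__(self, filters, handler):
        self.filters = filters
        self.handler = handler

    def deliver(self, event):
        # A natural-language condition would replace matches() with an
        # LLM judgment over the event; the wiring stays the same.
        if matches(event, self.filters):
            self.handler(event)
            return True
        return False
```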
On failure states: agents have built-in timeouts on
subscriptions, automatic retries with exponential backoff,
and silence detection (they can react to the absence of
events, not just their presence). If something breaks, the
subscription expires and the agent can re-evaluate. Long-
running agents also persist their state across restarts so
they pick up where they left off.
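The retry-then-expire behavior is roughly this pattern (the function name, delays, and attempt limit are illustrative values, not Splox's defaults):

```python
# Sketch of retries with exponential backoff plus an expiry: after the
# last attempt the error propagates, which is the "subscription expires
# and the agent re-evaluates" step described above.
import time

def retry_with_backoff(op, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Run op(); on failure wait base_delay * 2**attempt and retry.
    Returns op's result, or re-raises after max_attempts."""
    for attempt in range(max_attempts):
        try:
            return op()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # expiry: hand control back to the agent
            sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...
```

Injecting `sleep` as a parameter is just to keep the sketch testable without real waiting.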
There's also a workflow builder where you connect multiple
agents together in non-linear graphs — agents run async
and pass results between each other. So you can have one
agent monitoring, another analyzing, another executing —
all coordinating without a linear chain.
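The monitor/analyze/execute wiring can be sketched with plain asyncio (agent names, the fake event stream, and the queue plumbing are all assumptions for illustration): three agents run as concurrent tasks and pass results through queues rather than a single linear pipeline.

```python
# Sketch of a non-linear agent graph: three async "agents" coordinate
# through queues instead of a step-by-step chain.
import asyncio

async def monitor(out_q):
    for price in [101, 99, 105]:      # stand-in for a live event stream
        await out_q.put(price)
    await out_q.put(None)             # end-of-stream marker

async def analyzer(in_q, out_q):
    while (price := await in_q.get()) is not None:
        await out_q.put("buy" if price < 100 else "hold")
    await out_q.put(None)

async def executor(in_q, actions):
    while (action := await in_q.get()) is not None:
        actions.append(action)        # stand-in for placing an order

async def run_graph():
    q1, q2, actions = asyncio.Queue(), asyncio.Queue(), []
    await asyncio.gather(monitor(q1), analyzer(q1, q2), executor(q2, actions))
    return actions
```

Because the agents only share queues, you can fan out (one monitor feeding several analyzers) without restructuring anything — which is the non-linear part.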
That makes sense — the shift from task automation to decision automation feels like the real inflection point.
The silence detection aspect is especially interesting. Reacting to the absence of signals is something most workflow tools still struggle with, and it’s usually where long-running systems fail in practice.
Curious whether users tend to start with predefined agent patterns, or if they’re designing workflows from scratch once they understand the event model? I imagine abstraction becomes important pretty quickly as graphs grow.
Both, actually. Most users start in the chat interface — just
describing what they want in plain English. The agent figures
out which tools to use and how to react. No graph, no config.
Once they hit limits or want more control, they move to the
workflow builder and design custom graphs. That's where you
get non-linear agent connections — multiple agents running
async, passing results to each other. One monitors, one
analyzes, one executes.
Abstraction is definitely the challenge as graphs grow. Right
now we handle it by letting each node in the graph be a full
autonomous agent with its own tools and context. So you're
composing agents, not steps. Keeps individual nodes simple
even when the overall workflow is complex.
Good catches — just added Devstral Small 1 (May 2025, Apache 2.0), Devstral 2 (Dec 2025, modified MIT), and Devstral Small 2 (Dec 2025, Apache 2.0). Thanks for the feedback!
Fair point — updated the tagline to 'The complete history of LLMs'. AI as a field goes back decades; this is specifically tracking the transformer/LLM era from 2017 onward.
Great resource — Dr. Thompson's table is exhaustive. llm-timeline.com takes a different angle: visual timeline format, focused on base/foundation models only, filterable by open/closed source. Different tools for different needs.
Fair point on T5 — just marked it as a milestone. On Llama 3.1: it's there as a milestone because it was the first open model to match GPT-4 at 405B, which felt like a genuine inflection point. Happy to debate the milestone criteria though — what would you add?
That was Llama 3, which is already marked as a milestone.
Also, I'd add apple/DCLM-7B (not as a milestone, imo), since it was arguably the first fully open model that was at least somewhat competitive with closed-data models.