More

walmsles · 2026-04-27T22:42:08 1777329728

Not for a lack of trying! Had to enforce tool building through code as models tend to just execute arbitrary code when allowed.

walmsles · 2026-04-27T22:34:48 1777329288

This is essentially ADRs — capturing what the agent learned and why. The manual trigger is the interesting constraint though; the hard part is teaching the agent to recognise the moment a decision worth recording has been made, without being asked. That's what the triggers/suppression definitions are trying to formalise — the when of capture, not just the what.

walmsles · 2026-04-27T22:31:59 1777329119

The three bootstrap tools are a partial answer to (1) — the tool surface never grows, only the registry does, so context pollution is bounded by the search interface rather than the full tool list. Whether the registry search stays useful as it grows is an open question, semantic search over capability definitions is probably the next step.

(2) is where the structured capability format earns its keep over free-text memory. Triggers and suppression conditions give you inspectable, versioned invocation policy rather than prose that degrades over time. Still early though.

(3) I don't have a good answer to yet. Your point about feedback loops is the right framing — knowing whether the agent is actually getting better rather than just accumulating more tools is unsolved. The audit angle (administrators reasoning about which tools fire, when, and whether they should) is where I think this needs to go, but I haven't built that layer.

One thing that might directly address your caching point though — ADRs (Architecture Decision Records). The article that spawned Tendril started with giving an agent a record_decision capability that wrote ADRs to the filesystem. ADRs as agent cache is an interesting framing: structured, persistent, searchable records of why decisions were made at the moment they were made. That's arguably a better cache primitive than summarisation — decisions don't degrade the way summaries do, and they give you something to reason about for regression detection too.

Your tree/hierarchy observation resonates — the registry is a flat index right now which probably doesn't scale past a few dozen capabilities without some grouping structure.

walmsles · 2026-04-27T21:43:48 1777326228

IKR!!!!

walmsles · 2026-04-27T21:39:35 1777325975

The registry itself is searchable. The system prompts guides the agent to search it to find tools. Right now its a naive implementation as it's a local tool. I am exploring the idea of more structured policy here. It's not net new or different to skills or MCP it externalities the invocation policy which I feel is really important when looking to formalise or scale agent tools in larger organisations.

It's more an idea I decided to share because I think we need more thinking in this space as we all run towards agent networks of networks.

Will review the README.md. the article I wrote looks at the aspect of "when" which I found interesting in the original case I wrote about.

Tendril and find tools is more an experimental look at "how do we discover tools at scale" and how do agents know what to choose.

More importantly how do administrators reason about the tools and when are they used and are they being used correctly (agent validation).

I feel the focus of "when" is more human oriented IMO.

walmsles · 2026-04-27T21:26:40 1777325200

It's an open experiment, the utility of tendril is the concept. I am more curious about how good can the tool making get. Frontier models tend to be very specific about what they build so don't get specific bloat (yet).

walmsles · 2026-04-27T13:35:25 1777296925

I built this while working on a coding agent that kept starting cold every session. The deeper problem was that agent frameworks give you what a tool does and how to call it, but no structured answer to when — when should a tool fire autonomously, and when should it stay silent. That judgement is always implicit, scattered across system prompts and tool descriptions.

Tendril is a reference implementation of what I'm calling the Agent Capability pattern. It starts with three bootstrap tools and builds everything else itself. The key constraint: there's no direct code execution. The agent can only run registered capabilities, so every task forces it to write a tool, define its invocation conditions, and register it for future sessions. The registry accumulates across sessions.

I also ran the self-extending loop against five local models — Qwen3-8B, Gemma 4, Mistral Small 3.1, Devstral Small 2, Salesforce xLAM-2. None passed.

The failure modes were distinct enough to be worth writing up separately: https://serverlessdna.com/strands/ai-agents/agents-know-what...

Stack: AWS Strands TypeScript SDK, Bedrock (Claude Sonnet), Deno sandbox, Tauri + React desktop shell.

dd8601fn · 2026-04-27T15:42:10 1777304530

I did something that sounds similar for my home assistant.

The agent never executes anything. It has like four tools… search, request execute, request build, request update.

The tool service runs vector search against the tools catalog.

The build generalizes the requested function and runs authoring with review steps, declaring needed credentials and network access.

The adversarial reviewer can reject back to the authoring three times.

After passing, the tool is registered and embeddings are done for search. It’s live for future use.

Credentials are stored encrypted, and only get injected by the tools catalog service during tool execution. The network resources are declared so tool function execution can be better sandboxed (it’s not, yet).

The agent never has access to credentials and cannot do anything without going through vetted functions in the tool service.

Agent, author process, reviewer, embedding… all can be different models running local or remote.

Event bus, agent, tool service… all separate containers.

I have an url if you want to read a bit about what I did: https://dcd.fyi/agent

It’s really just meant for me, but if you’re interested in more details on anything let me know. There’s nothing super special in it.

esafak · 2026-04-27T14:34:19 1777300459

You can list the uses of the available tools in the AGENTS. I keep my agents on a tight leash, and self-extension runs counter to this. I would not my agent to spontaneously develop the ability to tap my bank account, for example.

walmsles · 2026-04-27T22:44:20 1777329860

The Deno sandbox is the answer here — network access is restricted to an allowlist, and the execution environment has scoped permissions. The agent builds tools within those constraints, it can't reach anything you haven't explicitly allowed

walmsles · 2026-03-14T07:16:07 1773472567

The portal is the B2B component for agent monetisation platform I am building. We are also building free and open source agent discovery - boring federated infra servicekind of like DNS - to ensure Agent discovery is free from lock-in and available for everyone - this is the no Vendor Lock-in component. AWS was a choice and its for a hackathon run by AWS so - needed to.

No Vendor Lock-in is the discovery piece Tethyr.cloud is built on. Thats the killer bit that matters - so we all collectively own agent discovery.

walmsles · 2026-03-14T04:22:49 1773462169

I built Tethyr Cloud for the AWS AIdeas competition - it's a B2B agent federation platform addressing a problem I see emerging: agent discovery is fragmenting across proprietary registries with vendor lock-in. The architecture has two parts: open-tethyr (OSS): Agent discovery server implementing the Agent Exchange (AX) protocol draft by Aaron Sempf. Free, decentralized discovery that prevents corporate gatekeeping. Tethyr Cloud (commercial): Trust layer for B2B commerce - subscriptions, rate limiting, usage metering. Built on the discovery network but doesn't own the data. Key design decisions:

Peer-to-peer agent execution - payloads never touch Tethyr infrastructure Stateless JWT validation with <200ms p95 latency Fire-and-forget usage reporting off the hot path (SQS buffering) Dual-API boundary (AppSync for dashboard, API Gateway REST for SDK)

Built on AWS Amplify Gen 2 in 4 weeks. Running costs: $0/month in Free Tier. Full writeup: https://builder.aws.com/content/3Au66gquBesFWDxq83Luj1SzZOk/...

Top 300 projects advance based on article likes by March 20.

Feedback welcome.

walmsles · 2026-02-05T21:45:59 1770327959

I built a library that uses an LLM as an orchestrator to coordinate multiple agents at runtime. You define what each agent does in markdown files using RFC 2119 constraints (MUST, SHOULD, MAY), and the orchestrator figures out who to call and when based on the user's request.

This builds on AWS Strands Agent SOPs (markdown format for agent workflows released in November). The difference: instead of manually chaining agents or defining explicit flows, the orchestrator reads available agent capabilities and decides the execution path dynamically.

Add a new agent by dropping in a markdown file. No code changes to coordination logic.

The bet: LLMs are better at runtime orchestration than developers are at predicting workflows upfront, especially when requirements change. Natural language is more maintainable when both producers (agent authors) and consumers (orchestrator) are LLMs.

Built on AWS Strands SDK and Bedrock with Claude models. Using this in a technical bootcamp next week to teach students complex agent workflows without coordination code.

GitHub: https://github.com/serverless-dna/sop-agents npm: https://www.npmjs.com/package/@serverless-dna/sop-agents AWS Strands blog: https://aws.amazon.com/blogs/opensource/introducing-strands-...