I've found that this works for smaller projects but starts breaking down when you're dealing with large-scale codebases or systems that change frequently.
Let us know if you have any feedback. We welcome contributors and are improving every day based on what our users want. GitHub: https://github.com/ix-infrastructure/Ix
This is really interesting, especially the chunking and parallel Haiku approach.
Curious how it holds up as note volume grows. At some point you're still doing N relevance checks per tool call. Do you hit a scaling limit there, or does caching keep it manageable? Also wondering if you've seen any drift in relevance when notes become more numerous or overlapping.
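To be concrete about the pattern I'm asking about, here's a rough sketch of N parallel relevance checks fronted by a cache. Everything here is hypothetical (the `check_relevance` stand-in, the cache keying, the parallelism), not your actual implementation:

```python
import asyncio
import hashlib

# Hypothetical sketch of the pattern in question: one relevance check
# per note per tool call, with a cache keyed on (note, query).
# check_relevance() stands in for a parallel Haiku call; none of these
# names are from the actual project.

_cache: dict[str, bool] = {}

def _key(note: str, query: str) -> str:
    return hashlib.sha256(f"{note}\x00{query}".encode()).hexdigest()

async def check_relevance(note: str, query: str) -> bool:
    # Placeholder for a model call scoring note relevance.
    await asyncio.sleep(0)  # simulate network latency
    return query.lower() in note.lower()

async def relevant_notes(notes: list[str], query: str) -> list[str]:
    async def check(note: str) -> bool:
        k = _key(note, query)
        if k not in _cache:  # cache hit avoids a model call
            _cache[k] = await check_relevance(note, query)
        return _cache[k]

    flags = await asyncio.gather(*(check(n) for n in notes))
    # Still N checks on a cold cache, which is the scaling question.
    return [n for n, ok in zip(notes, flags) if ok]
```

Even with the cache, a cold start or a fresh query shape still costs N calls, which is where I'd expect the limit to show up.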
the distribution muscle being a separate skill is so underrated. most technical founders assume if you build something good it finds its own audience. it almost never does. sounds like you're learning the hard way, which is probably the only way
the duct tape framing is fair but the deeper issue is the model has no persistent understanding of the system it's working in. each generation starts from scratch with no memory of prior context or architectural decisions. that's a harder problem than prompt engineering but it's solvable at the infrastructure layer
Hey HN, I'm Tanner, one of the founders of Ix.
Every time we started a new AI session, the agent had no idea what system it was working in. It wasn't just losing the conversation; it was losing the entire architecture. We kept throwing more context at the problem. It didn't fix it. The problem wasn't retrieval. There was no map.
So we built one.
Ix ingests your codebase and generates a structured architectural map of your systems, subsystems, modules, and dependencies, built deterministically without an LLM inferring structure.
`ix map .` finishes in under 90 seconds on any codebase. Agents working from the map use 30% fewer tokens on average, sometimes 80%+.
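For a concrete picture, here's roughly the kind of deterministic extraction we mean: walk the tree, parse imports from the AST, emit a module/dependency graph. This is an illustrative sketch for a Python codebase, not our actual implementation or output schema:

```python
import ast
import json
from pathlib import Path

def build_map(root: str) -> dict:
    """Sketch: deterministic module/dependency map for a Python tree.

    No LLM involved; structure comes straight from the AST. The field
    names below are illustrative, not Ix's real output schema.
    """
    modules = {}
    root_path = Path(root)
    for path in root_path.rglob("*.py"):
        rel = path.relative_to(root_path)
        mod = ".".join(rel.with_suffix("").parts)
        try:
            tree = ast.parse(path.read_text(encoding="utf-8"))
        except SyntaxError:
            continue  # skip unparseable files; a real tool would record them
        imports = sorted({
            alias.name.split(".")[0]
            for node in ast.walk(tree)
            if isinstance(node, ast.Import)
            for alias in node.names
        } | {
            node.module.split(".")[0]
            for node in ast.walk(tree)
            if isinstance(node, ast.ImportFrom) and node.module
        })
        modules[mod] = {"path": str(rel), "depends_on": imports}
    return {"root": root, "modules": modules}

if __name__ == "__main__":
    print(json.dumps(build_map("."), indent=2))
```

Because the structure comes from parsing rather than inference, the same codebase always produces the same map, which is what makes it usable as persistent memory.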
We are not replacing your vector DB or RAG setup. We are solving a different layer entirely: system structure memory, not conversation memory. It's open source, and good first issues are labeled in the repo for anyone who wants to get involved.
Live on GitHub. Happy to answer questions.
The 0.92 semantic similarity threshold is interesting. Curious how you landed on that number? Too low and you risk false cache hits, but too high and you lose most of the semantic benefit. Did you run experiments, or is it tunable per use case?
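For anyone following along, this is the shape of mechanism I'm asking about: a cache that reuses a stored result when a new query embeds within some cosine similarity of an old one. The `embed()` here is a dummy stand-in so the sketch runs, and nothing below is the actual implementation:

```python
import numpy as np

# Minimal sketch of a semantic cache with a similarity cutoff.
# embed() is a placeholder, not a real embedding model; 0.92 is the
# threshold under discussion, made tunable here.

def embed(text: str) -> np.ndarray:
    # Hash-seeded pseudo-embedding so this runs standalone. It does
    # NOT put similar texts close together; it only shows the shape.
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.standard_normal(384)
    return v / np.linalg.norm(v)

class SemanticCache:
    def __init__(self, threshold: float = 0.92):
        self.threshold = threshold  # tunable per use case
        self.entries: list[tuple[np.ndarray, str]] = []

    def get(self, query: str) -> str | None:
        q = embed(query)
        for vec, value in self.entries:
            # Dot product of unit vectors is cosine similarity.
            if float(q @ vec) >= self.threshold:
                return value  # near-duplicate query: reuse cached result
        return None

    def put(self, query: str, value: str) -> None:
        self.entries.append((embed(query), value))
```

The threshold directly trades recall for precision here, which is why I'm curious whether 0.92 came out of measurement or intuition.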