Yes, this sort of auto-regressive error propagation is a real concern, for the same reason it's a real concern with LLMs in general.
If you force the output of an LLM to begin with an error, the LLM tends to continue down that erroneous path.
In practice, we didn't see much of this kind of error propagation. One solution would be to give some agent the task of occasionally reviewing the NERDs for contradictions, along with the ability to search through the source material as needed. That of course creates the possibility of catastrophic forgetting, where the agent rewrites a NERD in an effort to remove a contradiction and ends up deleting something important.
We didn't see a lot of error propagation, but one example where we did: in Harry Potter, Prof Dumbledore is introduced as a mysterious hooded character. So the NERD-writer would create a NERD for "mysterious hooded man." There's no tool for the agent to change the title of a NERD, so the system is stuck with that title now. Sometimes the system would build the entire Dumbledore entry under "mysterious hooded man"; sometimes it would make a new Dumbledore entity and link a reference back to the "mysterious hooded man" entity; and sometimes it wouldn't link them at all. None of those outcomes are great.
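To make the missing rename tool concrete, here's a rough sketch of what one could look like. This is not what our system does; `NerdStore`, `rename`, and `resolve` are names made up for illustration. The idea is just that a rename leaves a redirect behind, so anything that still points at the old title keeps working.

```python
from dataclasses import dataclass, field


@dataclass
class Nerd:
    title: str
    body: str = ""


@dataclass
class NerdStore:
    # Hypothetical in-memory store, for illustration only.
    docs: dict[str, Nerd] = field(default_factory=dict)
    redirects: dict[str, str] = field(default_factory=dict)

    def create(self, title: str, body: str = "") -> Nerd:
        doc = Nerd(title, body)
        self.docs[title] = doc
        return doc

    def resolve(self, title: str) -> str:
        # Follow redirects so that old titles still reach the current document.
        seen = set()
        while title in self.redirects:
            if title in seen:
                raise ValueError(f"redirect cycle at {title!r}")
            seen.add(title)
            title = self.redirects[title]
        return title

    def get(self, title: str) -> Nerd:
        return self.docs[self.resolve(title)]

    def rename(self, old: str, new: str) -> None:
        old = self.resolve(old)
        if new in self.docs:
            raise ValueError(f"{new!r} already exists; merge instead")
        doc = self.docs.pop(old)
        doc.title = new
        self.docs[new] = doc
        self.redirects.pop(new, None)  # `new` is a live title again
        self.redirects[old] = new      # the old title becomes a redirect


store = NerdStore()
store.create("mysterious hooded man", "First seen before being named.")
store.rename("mysterious hooded man", "Albus Dumbledore")
print(store.get("mysterious hooded man").title)
```

Merging two entries that were created separately is the harder case, and this sketch deliberately refuses to do it.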
We originally developed NERDs inside my last startup for monitoring the progress of solar developments. There are many different multi-modal event feeds that you need to monitor for a holistic view of a project. NERDs helped glue together the events around entities.
Only later did we adapt the technique to work on long books. The existing long-book benchmarks seemed like the most appropriate way to show the core idea to a wider audience.
So ya, I'm confident that this central idea can be applied in many different domains.
This maps closely to something we've been exploring in our recent paper. The core issue is that flat context windows don't organize information scalably, so as agents work in parallel they lose track of which version of 'reality' applies to which component. We proposed NERDs (Networked Entity Representation Documents): Wikipedia-style docs that consolidate all info about a code entity (its state, relationships, recent changes) into a single navigable document, cross-linked with other documents, that any agent can read. The idea is that the shared memory is entity-centered rather than chronological. Might be relevant: https://www.techrxiv.org/users/1021468/articles/1381483-thin...
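As a toy illustration of "entity-centered rather than chronological" (not the paper's implementation; the event log and field names below are invented), here's a time-ordered log regrouped into one document per entity, with cross-links between entities that show up in the same event:

```python
from collections import defaultdict

# Hypothetical chronological event log: (timestamp, entities mentioned, note).
events = [
    (1, ["parser.py", "Token"], "Token gained a `span` field"),
    (2, ["lexer.py"], "lexer switched to a generator"),
    (3, ["parser.py", "lexer.py"], "parser now consumes the lexer lazily"),
]


def build_entity_docs(log):
    """Regroup a time-ordered log into one document per entity."""
    docs = defaultdict(lambda: {"notes": [], "links": set()})
    for ts, entities, note in log:
        for name in entities:
            docs[name]["notes"].append((ts, note))
            # Cross-link entities that appear in the same event.
            docs[name]["links"].update(e for e in entities if e != name)
    return dict(docs)


docs = build_entity_docs(events)
print(sorted(docs["parser.py"]["links"]))
```

An agent working on `parser.py` then reads one document instead of scanning the whole log for mentions of it.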
We talk a little bit about this in the paper, but just a bit. And it is an important question. Real entities change! We’re not just trying to create a representation of now, but a historical record of how we got here. The two ideas we played with that didn’t make it into the paper were:
1) Explicitly tell the NERD system to keep a timeline for each entity that tracks “core state” changes.
2) Let the NERD-agents also access the full change log of the NERD documents, so that they could see the history of the document. Possibly like a git history.
For the paper we left these out because they both added too much complexity.
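For what it's worth, here is roughly the shape both ideas could take on a single document. Again, this is a sketch under my own assumptions, not something from the paper: `VersionedNerd`, `timeline`, and `revisions` are illustrative names, and edits are assumed to be recorded in time order.

```python
from dataclasses import dataclass, field


@dataclass
class VersionedNerd:
    title: str
    body: str = ""
    timeline: list = field(default_factory=list)   # idea 1: (when, core-state change)
    revisions: list = field(default_factory=list)  # idea 2: (when, previous body, reason)

    def record_state_change(self, when, change):
        self.timeline.append((when, change))

    def edit(self, when, new_body, reason):
        # Keep the old body so a later agent can see what was rewritten or dropped.
        self.revisions.append((when, self.body, reason))
        self.body = new_body

    def body_at(self, when):
        """Return the body as it stood at time `when`."""
        for ts, previous, _ in self.revisions:
            if ts > when:
                return previous
        return self.body


nerd = VersionedNerd("Entity A", "first draft")
nerd.edit(5, "second draft", "new information")
nerd.edit(9, "third draft", "removed a contradiction")
nerd.record_state_change(9, "status changed")
print(nerd.body_at(4), "|", nerd.body_at(5), "|", nerd.body_at(10))
```

Keeping the previous body around also gives a reviewing agent a way to check whether a rewrite deleted something important.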
This is my roman empire, and I go back and forth on my conclusions at least once a day.
On one hand, clarity and structure make a platform that's easy to build and collaborate on. If the system enforces the rules, and the rules are a good model of reality, everyone knows what to expect.
Pushing the world forward one ISO standard at a time.
On the other hand, greatness can't be planned. By the time we know enough to make a plan, the really important stuff has already happened. "Everyone" expected solar to always be a somewhat marginal energy source, so why spend a lot of time standardizing formats?
And it's not like this is just a thing in tech. Buildings used to be fine-tolerance artifacts built by craftsmen. Now we slap them together from prefab parts and just add more caulk until it works.
I'm genuinely shocked that the electrical grid works. And the more I learn about how it works, the more shocked I become.
Are we losing our attention spans as a rational response to a world that is changing faster and faster, or is our lack of attention creating a less stable world?
Ultimately, we make progress not when the code runs fast, but when the humans run fast; but sometimes that means the code needs to run fast too.
Haven't heard this said before, but is this a reference to how often men think about Rome? That's very amusing.
RDF is my Roman empire. It's the original and best Web 3: open-world knowledge graphs. It feels like if the ecosystem around it had developed better, it would provide a way to interchange data without having to do Big Standards Up Front.
Any org could publish data under their own schemas, communities could start to converge on the most helpful ones, provide translations between terms, and gradually things would evolve, without having to all agree first. RDF provides a baseline for interop, on top of which specific worlds of knowledge can evolve.
I work in solar software too and every day I am sad that manufacturers provide their datasheets as PDFs. Not even Excel files.
> I'm genuinely shocked that the electrical grid works. And the more I learn about how it works, the more shocked I become.