Hacker News | jwpapi's comments

Has someone verified this was an actual bug?

One of AI’s strengths is definitely exploration, e.g. in finding bugs, but it still has a high false-positive rate. Depending on context, that matters or it won’t.

Also, one has to be aware that there are a lot of bugs that AI won’t find but humans would.

I don’t have the expertise to verify this bug actually happened, but I’m curious.


It's not even clear if AI was used to find the bug: they mention modeling the software with an "AI-native" language, whatever that means. What is also not clear is how they found themselves modeling the gyroscope software of the Apollo code to begin with.

But, I do think their explanation of the lock acquisition and the failure scenario is quite clear and compelling.


They have some spec language, and here,

https://github.com/juxt/Apollo-11/tree/master/specs

there are many thousands of lines of code in it.

Anyway, it seems it would take a dedicated professional serious work to understand whether this bug is real. And considering this looks like an ad for their business, I would be skeptical.


> It's not even clear if AI was used to find the bug: they mention modeling the software with an "ai native" language, whatever that means.

Could the "AI native language" they used be Apache Drools? The "when" syntax reminded me of it...

https://kie.apache.org/docs/10.0.x/drools/drools/language-re...

(Apache Drools is an open source rule language and interpreter to declaratively formulate and execute rule-based specifications; it easily integrates with Java code.)


How did you pick out AI native and miss the rest of the SAME sentence?

> We found this defect by distilling a behavioural specification of the IMU subsystem using Allium, an AI-native behavioural specification language.


That does not answer my confusion, especially since static analysis could reach the same conclusion with that language. It's not clear what role AI played at all.

It seems pretty clear when you follow the link?

https://juxt.github.io/allium/


> It's not even clear if AI was used to find the bug

The intro says “We used Claude and Allium”. Allium looks like a tool they’ve built for Claude.

So the article is about how they used their AI tooling and workflow to find the bug.


The article does not explain anything about how they used AI; it just has some relation to the behavioral model a human seems to have written (and an AI does not seem necessary to use it!)

Sure it does.

They used their AI tool to extract the rules for the Apollo guidance system based on the source code.

Then they used Claude to check if all paths followed those rules.
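The two-step workflow described above (extract rules from the source, then check every path against them) can be sketched as a toy path checker. This is an invented illustration, not the actual Allium/Claude tooling; all event names here are hypothetical.

```python
# Toy sketch of rule-based path checking (names invented; not the real
# Allium workflow). The rule: every path that sets gyros_busy must clear
# it again before the path terminates.

def violates_lock_rule(path):
    """Return True if the path acquires the gyro lock but never releases it."""
    busy = False
    for event in path:
        if event == "set_gyros_busy":
            busy = True
        elif event == "clear_gyros_busy":
            busy = False
    return busy  # still held at end of the path -> rule violated

paths = [
    ["set_gyros_busy", "torque_gyros", "clear_gyros_busy"],  # well-behaved
    ["set_gyros_busy", "error_exit"],                        # lock leaked
]
for p in paths:
    print(p, "VIOLATION" if violates_lock_rule(p) else "ok")
```

The interesting part in the article is presumably that the rule itself was distilled from the original code, and the path tracing was done by the model rather than by a hand-written checker like this.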


>It's not even clear if AI was used to find the bug

It's not even clear you read the article


Where do you think my confusion came from? All it says is that AI assisted in resolving the gyroscope lock path, not why they decided to model the gyroscope lock path to begin with.

Please, keep your offensive comments to yourself when a clarifying comment might have sufficed.


Even worse, the other child comments are speculating (and didn't RTFA either) when the answer is clear in the article.

> We found this defect by distilling a behavioural specification of the IMU subsystem using Allium, an AI-native behavioural specification language.


That's the opposite of clear to me.

Has the article been updated?

2nd paragraph starts with: "We used Claude and Allium"

And later on: "With that obligation written down, Claude traced every path that runs after gyros_busy is set to true"


> distilling

A.k.a. fabricating. No wonder they chose to use "AI".


solid bench brother

thank you:^)

The endgame is IPOing those AI companies and getting them into indexes, forcing index funds to buy them. That seemed to be an evergreen investment category, but I’m not so sure anymore..

Did somebody say crypto?


Their backers certainly have enough money and political muscle to force this outcome.

I’ve noticed a lot of fake TikTok comments recently and was already wondering..

Fake comments are like cockroaches. If you see one, you must assume there are 10x more that you are not noticing

Thank you so much. These comments let me believe in my sanity in an over-hyped world.

I see how people think it’s more productive, but honestly I iterate on my code 10-15 times before it goes into production, to make sure it logs the right things, communicates intent clearly, and the types are shared and defined where they should be. It’s stored in the right folder, and so on.

Whilst the lazy option of just passing it to CC is there, I feel more productive writing it on my own, because I go in small iterations, especially when I need to test stuff.

Let’s say I have to build an automated workflow, and for step 1 alone I need to test error handling, max concurrency, set up idempotency, and proper logging, plus proper intent communication to my future self. Once I’m done I never have to worry about this specific code again (OK, some errors can be tricky, to be fair); often this function is practically just my thought, available whenever I need it. This only works with good variable naming and also good spacing of a function. Nobody really talks about it, but if a very unimportant part takes a lot of space in a service, it should probably be refactored into a smaller service.

The goal is to have a function that I probably never have to look at again, and if I do, it answers as fast as possible all the questions my future self would ask once he’s forgotten what decisions needed to be made or how the external parts work. When it breaks, I know what went wrong, and when I run it in an orchestration, I have the right amount of feedback.

Like others, I could go on about this for a long time, and I’m aware of the other side of the coin, over-engineering, but I just feel that having solid composable units actually enables you to later build features and functionality that might be a moat.

Slow, flaky units are far less likely to become an asset..

And even if I let AI draft the initial flow, honestly the review will never be as good as the step-by-step version I built myself.

I have to say AI is great for improving you as a developer: it double-checks you and answers broad questions, before things get too detailed and you need to experiment or read docs. It helps cover all the basics.


So don't write slow, flaky unit tests? Or better yet, have the AI make them not slow and not flaky? Or, if you wanna be old school, figure out why they're flaky yourself and then fix it? If it's a timing thing, fix that; if it's a database thing, mock the hell out of it and integration-test. At this point, if your tests suck, you only have yourself to blame.
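As a hedged illustration of the "mock the database" suggestion (all the function and field names here are invented, not from any particular codebase):

```python
# Replace the real database dependency with a mock so the test is fast
# and non-flaky. `fetch_user` / `get_display_name` are hypothetical names.
from unittest.mock import Mock

def get_display_name(db, user_id):
    """Look up a user and return a prettified name, or 'guest' if missing."""
    row = db.fetch_user(user_id)
    return row["name"].title() if row else "guest"

db = Mock()
db.fetch_user.return_value = {"name": "ada lovelace"}
assert get_display_name(db, 1) == "Ada Lovelace"  # found: name is title-cased

db.fetch_user.return_value = None
assert get_display_name(db, 2) == "guest"  # missing: falls back to default
```

No network, no test database, no sleeps: the test exercises only the logic under test and runs in microseconds, every time.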

Sorry I don’t get your point and you didn’t seem to get mine.

I’m saying I would guess I’m faster building manually than letting AI write it; arguably it won’t even achieve the level I feel best with in the future, i.e. the one having the best business impact on my project.

Also, the way I semantically define unit tests is that they are instant and non-flaky because they are deterministic; otherwise it would be a service test for me.
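A tiny sketch of that definition of a unit test, using clock injection to keep the code deterministic (function and parameter names are invented for illustration):

```python
# Pass the current time in as a parameter instead of calling time.time()
# inside the function: the result then depends only on its inputs, so the
# test is instant, deterministic, and can never flake.

def is_expired(created_at, ttl_seconds, now):
    """Return True once `now` is at least `ttl_seconds` past `created_at`."""
    return now - created_at >= ttl_seconds

# Deterministic unit tests: no sleeping, no real clock, same result every run.
assert is_expired(created_at=100.0, ttl_seconds=30.0, now=131.0) is True
assert is_expired(created_at=100.0, ttl_seconds=30.0, now=120.0) is False
```

The moment the function reads the real clock (or a database, or the network) itself, the test becomes time-dependent, which is what pushes it into the flaky "service test" category.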


Once I started using agents and Claude Code hid more and more of the changes it made from me, it all went downhill..

I tried swarms as well, but I came back too. It’s not worth it: even the small amount of extra describing, double-checking, and fine-tuning is not worth the effort compared to what the worse code will cost me in the future, especially when I don’t know about it.

Is the formula based on the actual probabilities? I’m pretty sure you could do that, but it’s not clear that it is.

Can somebody explain to me how we can trust these AIs at scale? It’s a simple task that every experienced dev will know the solution to right away.

Opus goes wrong 3 times, wastes a lot of time, and obviously gives wrong suggestions.


The whole point is that you guide them; you make sure they don’t make these mistakes.

The power of AI is not that it’s smarter or better than you (at least not yet); it’s that it will just keep grinding while you’re thinking and making new things, holding context that you can’t.

You can set up an agent with a good harness and good guardrails to make sure the code will be all right, while you do elevated work.

Use them as the tool they are, not as the tool you wish them to be.


I agree; that is my point. It’s just contrary to the whole agent logic that is pushed atm.

Have you tried feeding the "tsc" error messages to Opus? That should reset the model's logic.

Another chill day on Linux..
