Hacker News | jonnycoder's comments

This is clever and provides a clean alternative to using custom plugins and mcp servers for doing code reviews.

For example, given the degradation of Claude over the past 1-2 months, I am always asking Codex to review Claude's plans and vice versa, and I get excellent results that way.

Also, making a skill an API call allows for easy deployment, provided the security around tool calling can be isolated in an ephemeral sandbox.


Thanks! Sandbox deployment is on the roadmap. I already have a RuntimeAdapter interface in my architecture that I'll use to isolate the VMs. I'm doing exactly the same thing: cross-referencing the models to challenge each other's plans, and my code reviewer agent's API is a big help.
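The comment names a RuntimeAdapter interface but doesn't show it; here is a hypothetical sketch of what such an abstraction might look like in Python. Only the name RuntimeAdapter comes from the comment; LocalRuntime, SandboxRuntime, and the run() signature are illustrative assumptions:

```python
from abc import ABC, abstractmethod
import subprocess


class RuntimeAdapter(ABC):
    """Abstracts where a skill's code actually executes."""

    @abstractmethod
    def run(self, command: list[str]) -> str:
        """Execute a command and return its stdout."""


class LocalRuntime(RuntimeAdapter):
    """Runs directly on the host; no isolation."""

    def run(self, command: list[str]) -> str:
        result = subprocess.run(command, capture_output=True, text=True, check=True)
        return result.stdout


class SandboxRuntime(RuntimeAdapter):
    """Placeholder for an ephemeral-VM backend: same interface, isolated execution."""

    def run(self, command: list[str]) -> str:
        # A real implementation would provision a VM, copy inputs in,
        # run the command, collect stdout, then tear the VM down.
        raise NotImplementedError("ephemeral sandbox backend not implemented")
```

The point of the adapter is that the skill-execution code can be written once against RuntimeAdapter and later switched to the sandboxed backend without changes.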

I agree. I use Codex 5.4 xhigh as my reviewer and it catches major issues in Opus 4.6 implementation plans. I'm pretty close to switching to Codex because of how inconsistent Claude Code has become.

Everything in our lives is a black box, but I agree that depending on black boxes of non-deterministic, sporadic quality is a huge red flag.

No, most systems in daily life can be understood if you are willing to take the time.

That doesn’t mean you personally are required to, but some people do, and how much you lean on the system of social trust determines how much of it remains opaque to you.


I do the same but I often find that the subtasks are done in a very lazy way.

Yeah, I went through my global Claude skills and /context yesterday because Claude was performing terribly. I deleted a bunch of stuff, including memory, and anecdotally got better results later in the day.

It’s shifting for knowledge workers too; we just need to pivot. I have had many app ideas for a while, and now AI lets me build them quickly. Access to education and knowledge led to your advanced education; now access to cheap, fast building leads to product execution. Use your PhD brain to come up with a well-researched idea and plan, then go execute.


Just a note that everyone is doing that, at 10x speed, and very good people can now output 100x thanks to AI.


Those who are essentially vibe coding will find their code large, brittle, and unmaintainable beyond a certain size, depending on how it's organized. They will be able to make 100x the toys, but toys aren't what make the world work.


Yeah, but those are amateurs. Every developer like you and me is going to do the same, or be whipped into doing the same. But the world only needs so many games, so many TODO apps, so many...so, either you are already a top developer, which of course means you shouldn't worry, or else.


Their support team likes to sit on things for a while too. I'm on day 4 of waiting for Azure to approve my support request to increase Azure Batch vCPUs from the default of 4 to 20 for the ESv3 series. I signed up last week and converted to a paid account. I'm going to use Google Cloud Batch today instead.


You’ve made a fundamental mistake and you’ll have the same result from every cloud provider.

You’re using a legacy v3 series that is being removed from the data centres in an era where you could be using v6 or newer instances that are being freshly deployed and are readily available.

If you can’t be bothered to keep an eye on these absolute basics, you’re going to have a rough time with any public cloud, no matter their logo design.

Right now you're paying more for less compute and having to deal with low availability too! Go read the docs and catch up to the last decade of virtual hardware changes.

Or, just run this and pick a size:

    # List the E-series v6/v7 VM sizes that Batch supports in the target region
    Get-AzBatchSupportedVMSku -Location 'centralus' | `
    ? Name -like 'Standard_E*v[67]'


Thanks I will try that!


I tend to agree. I spent a lot of time revising skills for my brownfield repo: writing better prompts to create a plan with clear requirements, writing a skill/command to decompose a plan, having a clear testing skill to write tests and validate, and finally adding a code-reviewer step using a different model (in my case Codex, since Claude did the development). My last PR was as close to perfect as I have gotten so far.


That's pretty cool. I'm working in MapLibre myself, and your JSON maps seem like they could also be used to demo a workflow or tutorial in a mapping product.


Prompting is just step 1. Creating and reviewing a plan is step 2. Step 0 was iterating and getting the right skills in place. Step 3 is a command/skill that decomposes the problem into small implementation steps, each with a dependency and a way to verify/test it. Step 4 is executing the implementation plan using sub-agents and ensuring validation/testing passes. Step 5 is a code review using Codex (since I use Claude for implementation).
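The decomposition in step 3 (small implementation steps, each with a dependency) implies an execution order for step 4. Here is a hypothetical sketch of that ordering in Python; the Step shape and function names are illustrative, not the commenter's actual tooling:

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    """One implementation step from the decomposed plan."""
    name: str
    depends_on: list[str] = field(default_factory=list)


def execution_order(steps: list[Step]) -> list[str]:
    """Topologically sort steps so each runs only after its dependencies."""
    done: set[str] = set()
    order: list[str] = []
    pending = {s.name: s for s in steps}
    while pending:
        # A step is ready once every dependency has already been completed.
        ready = [n for n, s in pending.items() if all(d in done for d in s.depends_on)]
        if not ready:
            raise ValueError("circular dependency between steps")
        for name in sorted(ready):
            order.append(name)
            done.add(name)
            del pending[name]
    return order


plan = [Step("api", ["model"]), Step("model", ["schema"]), Step("schema")]
print(execution_order(plan))  # ['schema', 'model', 'api']
```

Each emitted step would then be handed to a sub-agent along with its verification criteria, which is what makes the per-step "how to verify/test" field in step 3 pay off.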

