I can’t understand this. The first thing I do with new agent driven project is set up quality checks. Linters, test frameworks, static analysis, etc… Whatever I would expect a developer to do, I would expect an agent to do. All implementation has to go through build success and mixed agent reviews before moving on.
I might not do this with initial research/throwaway prototype, but once I know what direction to go and expect code to go to production it is vital to set guard rails.
> The first thing I do with new agent driven project is set up quality checks. Linters, test frameworks, static analysis, etc
I do this too, but then I sit and observe how agent gets very creative by going around all of these layers just to get to the finish line faster.
Say, for example, if I needlessly pass a mutable reference and the linter screams at me, I know it's either linter is wrong in this case, or I should listen to it and change the signature. If I make the lazy choice, I will be dissatisfied with myself, I might even get scolded, or even fired if I keep making lazy choices.
LLM doesn't get these feelings.
LLM will almost always go for silencing it because it prevents it from reaching the 'reward'. If you put guardrails so that LLM isn't allowed to silence anything, then you get things like 'ok, I'll just do foo.accessed = 1 to satisfy the linter'.
Same story with tests. Who decides when it's the test that should be changed/deleted or the implementation?
You have to not "stress" the agents out over testing. If a gate is no failing tests they cheat. If the gate is triage failing tests, quantify risk of failing test, prioritize in next work cycles... agents behave amazingly better at cheating tests.
> Same story with tests. Who decides when it's the test that should be changed/deleted or the implementation?
Claude is remarkably good at figuring this is out. I asked it to look at a failing test in a large and messy Python codebase. It found the root cause and then asked whether the failure was either a regression or an insufficiently specified test, performed its own investigation, and found that the test harness was missing mocks that were exposed by the bug fix.
If you point it at a specific thing and ask a specific question, yes, it will figure it out.
But I never have "fix this test" as a task. What happens when you task it with a feature implementation and test breaks in the middle of the session? It will not behave the same way.
I can generate a lot of tests amounting to assert(true). Yeah, LLM generated tests aren't quite that simplistic, but are you checking that all the tests actually make sense and test anything useful? If no, those tests are useless. If yes, I don't actually believe you.
It's the typical 10 line diff getting scrutinized to death, 1000 line diff: Instant LGTM.
As a percentage of good to mediocre, maybe.
Engineers of 40 years ago were probably better than engineers 20 years ago. Less of them and more constraints they had to deal with.
Democratization of technology makes it easier for more people to use. It applies to programming as much as just using a computer.
If I don’t know what I don’t know, how am I going to build something any better than a coding agent?
An approach on a couple of projects has been to prototype with the agent, learn, write a design and then start over. I then know the areas to look into more detail.
That's really the catch, and I think we're just figuring out how to approach this. I think your iteration approach is correct. I do a lot of asking AI to tell me about blind spots, to double check it's work in specific ways. I've asked it to let me know what it might have hallucinated, where it cut corners, and a bunch of other things along these lines. It will be interesting to see how things evolve in this area, and if AI gets good to the point that we have to do this less.
This is my go to solution for code sync across macOS laptop, Windows VMs, and Linux VMs to build and run/debug across environments. Unless something has changed, exclusions of build artifacts was always an issue with cloud sync providers.
I have been doing more cross compilation on macOS, copy and run on those other machines lately for prototypes, but for IDE based debugging it’s great to edit local or remote and get it all synced to the machine to run it in seconds.
My wife's work laptop gives this stupid warning anytime any USBC charger is plugged in, other than the Dell brick. So even a dock delivering 100w would get a complaint. The Dell brick offers non-standard charging at 140w, which can't get replaced by standards compliant, smaller chargers.
I am not sure how, but at one point even private browser mode would still have me logged in to Entra ID. Couldn’t log out of main browser and same session would follow me to private.
Claude can do mermaid diagrams and I started with those, but I have been asking it to generate draw.io diagrams as of late. I haven't actually tried the AI integration recommendations for draw.io, yet. I will have to pull the skill and references to see if it makes the process faster.
My M4 MacBook Pro for work just came a few weeks ago with 128 GB of RAM. Some simple voice customization started using 90GB. The unified memory value is there.
reply