Even with Serena and detailed plans crafted by Gemini that lay out file-by-file changes, Claude will sometimes go off the rails. Claude is very task-completion driven, and it's willing to relax the constraints of the task to complete in the face of even slight adversity. I can't tell you the number of times I've had Claude try to install a python computational library, get an error, then either try to hand-roll the algorithm (in PYTHON) or just return a hard coded or mock result. The worst part is that Claude will tell you that it completed the task as instructed in the final summary; Claude lying is a meme for a reason.
I have to agree with pretty much all of this. Specifically, I've had Claude fail at creating a database migration using tooling then go on to create the migration manually. My only reaction to anyone doing this, be it human or computer, is "You did WHAT!?".