Claude has gotten noticeably worse for me too. It goes into long exploration loops for 5+ minutes even when I point it to the exact files to inspect. Then 30 minutes later I hit session limits. Three sessions like that in a day, and suddenly 25% of the weekly limit is gone.
I ended up buying the $100 Codex plan. So far it has been much more generous with usage and more accurate than Claude for the kind of work I do.
That said, Codex has its own issues. Its personality can be a bit off-putting for my taste. I had to add extra instructions in Agents.md just to make it less snarky. I was annoyed enough that I explicitly told it not to use the word “canonical.”
On UI/UX taste, I still think current Codex is behind the Jan/Feb era of Claude Code. Claude used to have much better finesse there. But for backend logic, hard debugging, and complex problem-solving, Codex has been clearly better for me. These days I use Impeccable Skillset inside Codex to compensate for the weaker UI taste, but it still does not quite match the polish and instinct Claude Code used to have.
I used to be a huge Claude Code advocate. At this point, I cannot recommend it in good conscience.
My advice now is simple: try the $20 plans for Codex and Cursor, and see which one matches your workflow and vibes best.
I had a weird experience at work last week where Claude was just thinking forever about tasks and not actually doing anything. It was unusable. The next day it was fine again.
The way Claude/Codex behave is entirely consistent with how every vibe coded project (of mine) has ended up so far. I bet those guys have no idea what's going on and are taking guesses because no one understands the thing they've made.
I was having this issue yesterday. The same prompt would send it into a loop where it would appear to be doing nothing for 30+ minutes until I cancelled it. It would show 400 tokens used and that's it.
I tested on a previous version (2.1.68) and it still ran into this never-ending loop, BUT at least the token count kept steadily increasing.
So we are seeing 1. some sort of model degradation, is my guess (which is why it can't break out of a thinking loop on some problems), as well as 2. a clear drop in thinking-token UI transparency.
Ya, I've had this experience more than a few times recently. I've heard people claiming they serve quantized models during high load, but it happens in Cursor as well, so I don't think it's specific to Anthropic's subscription. It could be that the context window has just gotten into a state that confuses the model... but that wouldn't explain why it appears to be temporary...
My best guess is this is the result of the companies running "experiments" to test changes. Or it's just all in my head :)
These days Cursor feels more capable and reliable than Claude Code (at least for my workflow). For personal projects, I use Cursor for planning and verification but run Claude Code just for implementation, to save $.
Not the guy you're responding to, but when this happens the token counter is frozen at some low value (e.g. 1k-10k) as well, so it's not thinking in circles but rather not thinking (or doing anything, for that matter) at all.
When I left it running overnight, it finally sent a message saying it had exceeded the 64,000 output-token limit.
This happened to me as well! It was especially infuriating because I had just barely upgraded to the $200 per month plan because I exhausted my weekly quota. Then the entire next day was a complete bust because of this issue. I want my money back!
I've been using the Codex Business subscription (about €30) for multiple months now. Even there they've cut back on the quota. A few months back it was hard for me to reach the limit.
Now it is much easier.
Still, in comparison with Claude Code, the Codex quota is a much better deal.
However, they should not make it worse...
The promotion has been extended until May 31st for the $100 and $200 subs.
At the same time, they’ve been giving out a ton of additional quota resets seemingly every other week (and committed to an additional reset for every million additional users until they hit 10mil on codex).
So they’ve really set a high bar for people’s expectations on their quota limits.
Once they drop the 2x promotion for good and stop the frequent resets, there are going to be a lot of complaints.
> Claude has gotten noticeably worse for me too. It goes into long exploration loops for 5+ minutes even when I point it to the exact files to inspect.
This is what I'm working on proving now.
It is more that there is a confidence score while thinking. Opus will quit if it is too high and will grind on if the confidence score is close to the real answer. Haiku handles this well too.
If you give Sonnet a hard task, it won't quit when it should.
Nonetheless, that issue has been fixed with Opus.
I'll try to show that the speed of using Opus on tasks of medium to hard difficulty works out consistently to the same price or cheaper than running them with Haiku and Sonnet. Easier tasks, the known busy work, are cheaper to run with Haiku.
Stella Laurenzo, AMD’s director of AI, filed a detailed GitHub issue on April 2 documenting that Claude Code now reads a third as much code before editing it, rewrites entire files twice as often, and abandons tasks midway at rates that were previously zero. Her analysis of nearly 7,000 sessions puts precise numbers on how Anthropic’s coding tool has degraded since early March.
It was pretty much the first CLI agent and had a benchmark that was the go-to at the start of LLM coding. Now the benchmark doesn't get updated, and aider never got a mention in discussions of CLI tools, until now.
By the way, what are you using it for? I bought Max and Pro plans for Claude and Codex, developed a few apps with them, and after the initial excitement ("Wow, I can get results 10x faster!") I felt the net sum was negative for me. I didn't learn much except the current quirks of each model/tool, I didn't enjoy the process, and the end result was not good enough for my standards. In the end I deleted all those projects and unsubscribed.
For me it’s mostly useful in day-to-day coding, not “build an entire app and walk away” coding.
TDD was never really my natural style, but LLMs are great at generating the obvious test cases quickly. That lets me spend more of my attention on the edge cases, the invariants, and the parts that actually need judgment.
Frontend is another area where they help a lot. It’s not my strongest side, so pairing an LLM with shadcn/ui gets me to a decent, responsive UI much faster than I would on my own. Same with deployment and infra glue work across Cloudflare, AWS, Hetzner, and similar platforms.
I’m basically a generalist with stronger instincts in backend work, data modeling, and system design. So the value for me is that I can lean into those strengths and use LLMs to cover more ground in the areas where I’m weaker.
That said, I do think this only works if you’re using them as leverage, not as a substitute for taste or judgment.
> It goes into long exploration loops for 5+ minutes even when I point it to the exact files to inspect.
Give it a custom sandbox and context for the work, so it has no opportunity to roam around when not required. Agentic AI coding is hugely wasteful of context and tokens in general (compared to generic chat, which is how most people use AI); there's a whole lot of scope for improvement there.
> But the problem is it used to not need that before. These days, you have to think twice before you summon a subagent.
This is exactly what I (and many others) kept trying to tell the pro-AI folk 18 months ago: there is no value to jumping on the product early because any "experience" you have with it is easily gained by newcomers, and anything you learned can easily be swapped out from under you anyway.
The value is all the things I built with it? Surely, this constant change deteriorates the experience but to be clear, here we're nitpicking on the experience, not questioning the value.
I also don't understand the "pro-AI" phrase. It's a tool, it brings results. I'm not pro-car when I drive to work.
The sandbox is fine, but if the parent has given explicit instruction of files to inspect, why is it not centering there? Is the recent breakage that the base prompt makes it always try to explore for more context even if you try to focus it?
Because the "explicit instruction" you give AI is not deterministic as in a normal computer program. It's a complete black box and the context is also most likely polluted by all sorts of weird stuff. Putting it on as tight of a leash as possible should be seen as normal.
They changed plan mode so that it's instructed to follow a multi-step plan, the first step being to explore the code base. When you tell it to focus it's getting contradictory instructions from plan mode vs your prompt and it's essentially a coin flip which one it picks.
It does seem like a cynical attempt to make more money.
I also gave up on my Claude Code subscription. It's running out in 2 weeks and I have canceled it. My current MAX session got rate-limited in 2 hours of work and that's just absurd.
Codex seems to give the $20 plan for free for 1 month and that's what I signed up for.
Let's see how it compares when I can't use my Claude max sub for 3 more hours.
When they bumped the context size up to 1m tokens they made it much easier to blow through session limits quickly unless you manually compact or keep sessions short.
Codex has been better for me, but it's WAY too nitpicky/defensive. It always wants to make changes that add complexity and code to solve a problem that's impossible to happen (e.g. a multiprocess race condition on a daemon I only ever run one instance of).
You just convinced me to try it. Claude just copy pastes, does search and replace, zero abstractions and I'm the one that needs to think about the edge cases.
You may think that's a good thing, but it's not. Codex is great at coming up with solutions to problems that don't exist and failing to find solutions to problems that do. In the end you have 300 new lines of code and nothing to show for it.
I'm adding two extra gpus to my local rig. Turns out qwen 3.5 122b is already enough to handle (finish with moderate guidance) non-planning parts of my tasks.
I am also on Codex while Claude seems to be blatantly ignoring instructions (as recently as Thursday: when I made the switch). The huge Claude context helps with planning, so that's all it does now.
Codex consumes way fewer resources and is much snappier.
The product was performing badly and you thought this would be solved by spending more money on it?
When will people realize this is the same as vendor lock-in?
"Maybe if I spend more money on the max plan it will be better" > no it will be the same
"Maybe if I change my prompt it will work" > no it will be the same
"Maybe if I try it via this API instead of that API it will improve" > no it will be the same.
Claude, ChatGPT, Gemini etc all of these SOTA models are carefully trained, with platforms carefully designed to get you to pay more for "better" output, or try different things instead of using a different product.
It's to keep you in the ecosystem and keep you exploring. There is a reason you can't see the layers upon layers of scaffolding they have. And there's a reason why, two weeks after a major update, the model is suddenly "bad" and "frustrating". It's the same reason it's done with A/B testing: when you complain, someone else has no issues; when they complain, you have no issues. It muddies the water intentionally.
None of it is because you're doing anything wrong; it's not a skill issue. It's a careful strategy to extract as much engagement and money from customers as possible. It's the same reason they give people who buy new gun skins in Call of Duty easier matches in matchmaking for their first couple of games.
The only mistake you made was paying MORE, hoping it would get better. It won't, that's not what makes them money. Making people angry and making people waste their time, while others have no issues, and making them explore and try different things for longer so they can show to investors how long people use these AI tools is what makes them money.
When competitors have a better product these issues go away
When a new model is released these issues don't exist
I was paying a ton of money for claude, once I stopped and cancelled my subscription entirely, suddenly sonnet 4.6 is performing like opus and I don't have prompts using 10% of my quota in one message despite being the same complexity.
You ask that as if there is some insight to the question, but the insight is hard to find. What the person you replied to is saying, applies to both Claude and Codex.
Maybe I’m in the minority here, but while directories and similar channels are useful, I felt like I was just shooting darts in the dark without understanding sales and marketing from first principles and hoping something would stick.
I had three side projects and kept struggling to get any real traction or traffic without becoming spammy across the internet. So I decided to approach it the same way I approach learning anything new: through books, courses, and solid foundational material.
HN had a few excellent suggestions. One of them was Founding Sales. Another, which I came across through a friend’s recommendation, was Alex Hormozi’s series. He seems to have something of a cult following, which made me a bit skeptical at first, so I decided to just read the first 100 pages before forming an opinion.
I ended up finding it genuinely useful, especially for understanding the psychology and mindset needed to sell something. I now highly recommend his book $100M Leads to technical friends who are trying to figure out how to sell what they’ve built.
I’m still learning; if you have any good recommendations, please drop them below.
Agreed, had the same experience. Codex feels lazy - I have to explicitly tell it to research existing code before it stops giving hand-wavy answers. Doc lookup is particularly bad; I even gave it access to a Context7 MCP server for documentation and it barely made a difference. The personality also feels off-putting, even after tweaking the experimental flag settings to make it friendlier.
For people suggesting it’s a skill issue: I’ve been using Claude Code for the past 6 months and I genuinely want to make Codex work - it was highly recommended by peers and friends. I’ve tried different model settings, explicitly instructed it to plan first and only execute after my approval, tested it on both Python and TypeScript backend codebases. Results are consistently underwhelming compared to Claude Code.
Claude Code just works for me out of the box. My default workflow is plan mode - a few iterations to nail the approach, then Claude one-shots the implementation after I approve. Haven’t been able to replicate anything close to that with Codex
+1 to this. Been using Codex the last few months, and this morning I asked it to plan a change. It gave me generic instructions like 'Check if you're using X' or 'Determine if logic is doing Y' - I was like WTF.
Curious: are you doing the same planning with Codex, out-of-band or otherwise? To get a comparable outcome you'd need to use Codex in a plan state (there are experimental settings, not recommended) or other means (an explicit, detailed, reusable prompt for planning a change). It's a missing feature if your preference is planning in the CLI (I do not prefer this).
You are correct in that this mode isn't "out of the box" as it is with Claude (but I don't use it in Claude either).
My preference is to have smart models generate a plan with provided source. I wrote (with AI) a simple python tool that'll filter a codebase and let me select all files or just a subset. I then attach that as context and have a smart model with large context (usually Opus, GPT-5.2, and Gemini 3 Pro in parallel), give me their version of a plan. I then take the best parts of each plan, slap it into a single markdown and have Codex execute in a phased manner. I usually specify that the plan should be phased.
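A tool like the one described is easy to sketch. This is a hypothetical minimal version, not the commenter's actual script; the extension list and skip list are assumptions:

```python
import os
from pathlib import Path

# Hypothetical sketch of a "codebase filter" tool: walk a repo, keep only
# likely source files, and concatenate a chosen subset into one blob to
# attach as context for a large-context model.

SOURCE_EXTS = {".py", ".ts", ".go", ".md"}           # assumption
SKIP_DIRS = {".git", "node_modules", "__pycache__"}  # assumption

def list_source_files(root: str) -> list[Path]:
    """All source files under root, skipping vendored/generated dirs."""
    files = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune skip-dirs in place so os.walk never descends into them.
        dirnames[:] = [d for d in dirnames if d not in SKIP_DIRS]
        for name in filenames:
            p = Path(dirpath) / name
            if p.suffix in SOURCE_EXTS:
                files.append(p)
    return sorted(files)

def build_context(files: list[Path]) -> str:
    """Concatenate files with path headers, ready to paste into a prompt."""
    parts = [f"===== {p} =====\n{p.read_text(errors='replace')}" for p in files]
    return "\n\n".join(parts)
```

From here, "select all files or just a subset" is just slicing or filtering the returned list before calling `build_context`.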
I prefer out-of-CLI planning because frankly it doesn't matter how good Codex or Claude Code dive in, they always miss something unless they read every single file and config. And if they do that, they tip over. Doing it out of band with specialized tools, I can ensure they give me a high quality plan that aligns with the code and expectations, in a single shot (much faster).
Then Claude/Codex/Gemini implement the phased plan - either all at once - or stepwise with me testing the app at each stage.
But yeah, it's not a skill issue on your part if you're used to Plan -> Implement within Claude Code. The Experimental /collab feature does this but it's not supported and more experimental than even the experimental settings.
Came across an official Anthropic repo for GitHub Actions, very relevant to what you mentioned. Your idea of scheduled doc updates using an LLM is brilliant; I'm stealing it.
https://github.com/anthropics/claude-code-action
No path for busy people, unfortunately. Learn everything from the ground up, from containers to Compose to k3s, maybe on to kubeadm or a hosted offering. The huge abstraction layers in Kubernetes serve their purpose well, but they can screw you up when anything goes slightly wrong in an upper layer.
For a start, ignore operators, ignore custom CSI/CNI, ignore IAM/RBAC. Once you feel good about the basics, you can expand.
k3sup a cluster, then ask an AI how to serve an nginx static site on it using Traefik, and to explain every step and what it does (it should provide: a ConfigMap, a Deployment, a Service, and an Ingress).
k3s ships a CSI and CNI out of the box (Container Storage Interface, Container Network Interface): the CNI is Flannel, and the local-path provisioner just maps volumes to disk (PVCs).
Traefik is what routes your traffic from outside the cluster to the inside (to an Ingress resource).
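For orientation, the four resources mentioned look roughly like this. This is a hand-written sketch, not a tested manifest; all names, the host, and the image tag are placeholders:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: static-site
data:
  index.html: |
    <h1>hello from k3s</h1>
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: static-site
spec:
  replicas: 1
  selector:
    matchLabels: {app: static-site}
  template:
    metadata:
      labels: {app: static-site}
    spec:
      containers:
        - name: nginx
          image: nginx:alpine
          ports: [{containerPort: 80}]
          volumeMounts:
            - {name: html, mountPath: /usr/share/nginx/html}
      volumes:
        - name: html
          configMap: {name: static-site}
---
apiVersion: v1
kind: Service
metadata:
  name: static-site
spec:
  selector: {app: static-site}
  ports: [{port: 80, targetPort: 80}]
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: static-site
spec:
  rules:
    - host: site.example.com   # placeholder host
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service: {name: static-site, port: {number: 80}}
```

The chain is: Traefik watches the Ingress, which routes to the Service, which load-balances to the Deployment's pods, which mount the ConfigMap as the nginx web root.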
Is there a comprehensive leaderboard like ClickBench but for vector DBs? Something that measures both the qualitative (precision/recall) and quantitative aspects (query perf at 95th/99th percentile, QPS at load, compression ratios, etc.)?
ANN-Benchmark exists but it’s algorithm-focused rather than full-stack database testing, so it doesn’t capture real-world ops like concurrent writes, filtering, or resource management under load.
Would be great to see something more comprehensive and vendor-neutral emerge, especially testing things like: tail latencies under concurrent load, index build time vs. quality tradeoffs, memory/disk usage, and behavior during failures/recovery.
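The two axes are cheap to measure once you have a system under test. A toy sketch (brute-force search stands in for the database here, and sizes are arbitrary):

```python
import math
import random
import time

def recall_at_k(approx_ids, exact_ids, k):
    """Fraction of the true top-k neighbours the ANN search returned."""
    return len(set(approx_ids[:k]) & set(exact_ids[:k])) / k

random.seed(0)
base = [[random.gauss(0, 1) for _ in range(32)] for _ in range(1000)]
queries = [[random.gauss(0, 1) for _ in range(32)] for _ in range(20)]

def exact_search(q, k):
    """Brute-force nearest neighbours; stands in for the DB under test."""
    order = sorted(range(len(base)), key=lambda i: math.dist(base[i], q))
    return order[:k]

# Quantitative side: per-query latency, reported as a p95 tail figure.
latencies = []
for q in queries:
    t0 = time.perf_counter()
    ids = exact_search(q, k=10)
    latencies.append(time.perf_counter() - t0)
p95 = sorted(latencies)[int(0.95 * len(latencies))]

# Qualitative side: compare an ANN index's result against the exact top-k.
approx = exact_search(queries[0], k=10)   # a real ANN index would go here
assert recall_at_k(approx, exact_search(queries[0], k=10), k=10) == 1.0
```

A full-stack benchmark would run the latency loop under concurrent writes and filters, which is exactly the part ANN-Benchmark doesn't cover.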
Seconding Fullmetal Alchemist. I hear the remake (Fullmetal Alchemist: Brotherhood) is usually regarded as the better version. More suggestions: Neon Genesis Evangelion, Death Note, Sousou no Frieren, Cowboy Bebop, Nichijou, Tengen Toppa Gurren Lagann, Bakemonogatari. There are also quite a few good movies; anything by Studio Ghibli is great, and so are Akira, Perfect Blue, and Ghost in the Shell.
Some of those aren't really going to appeal to people unfamiliar with the conventions of the genre and some of its big personalities; e.g. Evangelion is a deconstruction of the once-popular giant-robot genre and Hideaki Anno's personal couch trip rolled into one.
Brotherhood follows the plot of the source comic, which is regarded as having a better ending. The original series aired concurrently with the comic and had to diverge once it caught up with the ongoing comic and ran out of chapters to adapt.
Slightly tangential but this was a learning moment for me.
This reminds me of a story where Sage Mandavya established the first juvenile law in Hindu mythology.
<story starts>
Long ago, there lived a great sage named Mandavya who had taken a vow of silence and spent his days in deep meditation. One day, while he sat motionless beneath a tree with his arms raised in penance, a group of thieves being pursued by the king’s soldiers fled into his hermitage. They hid their stolen loot near the sage and escaped through the other side.
When the king’s soldiers arrived, they found the stolen goods but the sage—deep in meditation and bound by his vow of silence—neither confirmed nor denied their presence. The soldiers arrested him and brought him before the king, accusing him of harboring criminals.
Despite his spiritual stature, the king ordered a severe punishment: Mandavya was to be impaled on a stake (shula)—a horrific execution where a wooden spike was driven through the body. However, due to his immense yogic powers and detachment from the physical world, the sage did not die. He remained alive on the stake, enduring the agony with superhuman patience.
Eventually, other sages intervened, the king realized his grave error, and Mandavya was freed. But the damage was done. When the sage finally left his mortal body, he went directly to Yamaloka—the realm of Yama, the god of death and justice—to demand an explanation.
“Why did I have to suffer such a gruesome fate?” Sage Mandavya asked Lord Yama. “What terrible sin did I commit to deserve impalement?”
Yama consulted his records and replied, “When you were a child, you caught a dragonfly and pierced it with a needle through its body, watching it suffer for your amusement. That act of cruelty resulted in your punishment - you experienced the same suffering you inflicted on that innocent creature.”
Sage Mandavya was furious. “That was when I was a child!” he protested. “I was too young to understand the difference between right and wrong, between sin and virtue. How can you punish an ignorant child with the same severity as a knowing adult?”
Yama tried to explain that karma operates impartially, but Mandavya would not accept this. In his righteous anger, the sage cursed Yama himself: “For this unjust judgment, you shall be born as a human on Earth and experience mortality yourself!”
This curse led to Yama being born as Vidura, the wise and virtuous counselor in the Mahabharata - a human who, despite his wisdom and righteousness, had to endure the limitations and sufferings of mortal life.
But Mandavya didn’t stop there. Using his spiritual authority, he proclaimed a new divine law: “No sin committed by a child below the age of fourteen shall count toward their karmic debt equivalent to that of an adult. Children who do not yet understand dharma and adharma shall not be punished for their ignorant actions.”
This became the first “juvenile law” in Hindu mythology—a recognition that children, in their innocence and ignorance, deserve compassion and correction rather than severe punishment.
<story ends>
When I was a child, I too wanted to catch a dragonfly and tie a thread to it so it would fly around like a little pet. But my mother stopped me. She told me this very story of Sage Mandavya, and it scarred me for life. I never forgot it, and I never tried to catch and bind a dragonfly again.
1. If it were possible for an ordinary mortal to impose arbitrary curses on the god of death and justice, the world would quickly descend into utter chaos.
2. If children are completely free from accountability, adults will form them into an army and convince them to commit crimes on their behalf, leading to an intolerable situation. This may already be a standard way of doing business in some parts of the world.
> If children are completely free from accountability, adults will form them into an army and convince them to commit crimes on their behalf, leading to an intolerable situation. This may already be a standard way of doing business in some parts of the world.
This is an ongoing problem in Norway now and I think it has been in Sweden for some time.
If you want to read more, search for the foxtrot network.
> 1. If it were possible for an ordinary mortal to impose arbitrary curses on the god of death and justice, the world would quickly descend into utter chaos.
Mandavya is not just any mortal; he is an enlightened sage. In Hinduism, enlightened beings are considered superior to gods. There’s another story about Sage Markandeya (one of the nine immortals, the Chiranjeevis) who caused the death of Yama, the God of Death. In Hindu cosmology, all the gods hold honorary responsibilities, and nothing is permanent, not even the position of Brahma, the Creator.
> 2. If children are completely free from accountability, adults will form them into an army and convince them to commit crimes on their behalf, leading to an intolerable situation. This may already be a standard way of doing business in some parts of the world
I believe he introduced a juvenile law, which involves reduced sentences or milder punishments rather than granting complete immunity from consequences.
> 1. If it were possible for an ordinary mortal to impose arbitrary curses on the god of death and justice, the world would quickly descend into utter chaos.
Opportunity myth? Mortals are simply temporarily embarrassed gods?