Seems OpenAI knew this is forthcoming so they front ran the news? I was really h...

erichocean · 2025-07-21T17:50:32 1753120232

I regularly have the opposite experience: o3 is almost unusable, and Gemini 2.5 Pro is reliably great. Claude Opus 4 is a close second.

o3 is so bad it makes me wonder if I'm being served a different model? My o3 responses are so truncated and simplified as to be useless. Maybe my problems aren't a good fit, but whatever it is: o3 output isn't useful.

Davidzheng · 2025-07-21T17:57:57 1753120677

I have this distinctive feeling that o3 tries to trick me intentionally when it can't solve a problem by cleverly hiding its mistakes. But I could be imagining it

int_19h · 2025-07-21T18:45:06 1753123506

It's certainly the "laziest" model, in the sense that it seems to be the likeliest to avoid doing the actual work and generate "TBD" stubs instead.

helloplanets · 2025-07-21T18:14:34 1753121674

Are you using a tool other than ChatGPT? If so, check the full prompt that's being sent. It can sometimes kneecap the model.

Tools having slightly unsuitable built in prompts/context sometimes lead to the models saying weird stuff out of the blue, instead of it actually being a 'baked in' behavior of the model itself. Seen this happen for both Gemini 2.5 Pro and o3.

square_usual · 2025-07-21T18:25:33 1753122333

Are you using o3 on the official ChatGPT app or via API? I use it on the app and it performs very well, it's my go-to model for general purpose LLM use.

erichocean · 2025-07-21T21:34:48 1753133688

official ChatGPT app

sigmoid10 · 2025-07-21T17:25:33 1753118733

>I was really high on Gemini 2.5 Pro after release but I kept going back to o3 for anything I cared about

Same here. I was impressed by their benchmarks and topping most leaderboards, but in day to day use they still feel so far behind.

aerhardt · 2025-07-21T18:33:33 1753122813

I use o3, openAI API and Claude Code. Genuinely curious what about Gemini 2.5 is so far behind?

nicce · 2025-07-22T07:33:04 1753169584

I don’t think it is behind in anything. It is just harder to make obey and redefine the default system command. It is very verbose model by default.

sigmoid10 · 2025-07-22T11:29:15 1753183755

It's not just much more verbose in general, it easily gets lost in verbosity. As in often trying to solve issues that aren't there. And when you try to make it focus, it doesn't get the issue at all.

danjl · 2025-07-21T17:28:53 1753118933

I think that's most likely just your view, and not really based on evidence.