Hacker News: beaker52's comments

I have had LLMs refuse several of my requests. I still got my answers, but at least they tried.

Yeah, I was asking a SOTM about copy.fail, and it was freaking out and indirectly tried to call me a hacker a few times. Weirdly, all I did was slightly reword my requests, and they all went through. Granted, I am not actually a hacker, so I guess my follow-up questions made it realize that I was asking for educational purposes, but it was definitely the most accusatory, curt, and outright abrasive I have seen an LLM behave.

The biggest problem isn't the token slot machine refusing to give you the answer, but the fact that multiple refusals can end up flagging your account and getting you banned from the service.

While contributing to a friend's Remembrance research, I was pretty surprised when Gemini Pro suddenly refused to answer any more questions about photos from the Höcker Album after it spotted an "SS" insignia.

Ironically, the justification it gave was that it wasn't its fault because it was just following orders. I hope this hasn't landed me on Google's list of undesirables.

Grok, for better or worse, didn't seem to mind.


I've been able to have DeepSeek give me an unofficial account of what happened in Tiananmen Square in 1989.

It even went as far as confirming that we should always base our opinion on multiple sources, not just the government.

We should create badges like "script kiddie", "llm hacker", "grandpa's printer adjuster".


It doesn’t really come as a surprise to me that these companies are struggling to reliably fix issues with software which relies on a central component which is nondeterministic.

But they made their own bed with that one.


I've noticed a lack of product cohesion in general and it does make me wonder if it's a result of dogfooding AI.

For example, Chat, Cowork and Code have no overlap: projects created in one mode are not available in another and can't be shared.

As another example, using Claude with one of their hosted environments gives you a nice integration with GitHub on the desktop, but some of it also requires 'gh' to be installed and authenticated, and you don't have that available without configuring a workaround and sharing a PAT. It doesn't use the GH connector for everything. Switch to remote-control (ideal on Windows/WSL) or local, and that deep integration is gone: you're back to prompting the model to commit and push, and the UI isn't integrated in the same way.

Cowork will absolutely blow through your quota for one task but chat and code will give you much more breathing room.

Projects in Code are based on repos, whereas in Chat and Cowork they are stateful entities. You can't attach a repo to a Cowork project or attach external knowledge to a Code project (and maybe you want that, because creating a design doc or doing research isn't a programming task).

Use Claude Code on the CLI and you can't provide inline comments on a plan. There's a technical limitation there, I suppose.

The desktop app is very nice and evolving but it's not a single coherent offering even within the same mode of operation. And I think that's something that is easy to do if you're getting AI to build shit in a silo.


This is "you ship your org chart", not AI.

https://en.wikipedia.org/wiki/Conway%27s_law


Even a distributed or silo'd org chart has some affinity across the hierarchy in order to keep things in overall alignment. You wouldn't expect to use a product suite that is, holistically, not fully compatible with its own ecosystem, even down to not having a single concept of a project. Or requiring a CLI tool in an ephemeral environment that you cannot easily configure.

That's clearly a trade-off that Anthropic have accepted but it makes for a disappointing UX. Which is a shame because Claude Desktop could easily become a hands-off IDE if it nailed things down better.


And the multiple concepts of subscriptions for products, and the idea of MCPs/connectors that aren't shared between the different modalities, and the idea of API key vs subscription, and two different inbound websites (claude.ai and claude.com)...


Agreed. I use the Claude desktop app almost every day, and have used Code and Cowork since their respective launch dates, and even I still have a really hard time grokking what each is for. It becomes even more confusing when you enable the (Anthropic-provided) filesystem extension for Chat mode. Anthropic really needs to streamline this.


YES! I thought it was just me being a bit scattered. But uploading an important file to a project only to have it not there because....<garbled answer from Claude> is distracting to say the least. I don't know what I've enabled offhand but I hate having to stop and try to work out why Claude can't reference a file uploaded to the project in a chat within that project. I think they should pause on all the wild aspirations and devote some time to fundamentals.


Add to that the fact that the Notion MCP works for Chat but not Code. Now my workflow has docs I comment on with others in Notion, while the actual work and source of truth is in GitHub.

I need to fall back to Codex to keep things in sync, but that's a great opportunity to also compare how things run, and it catches a lot of issues with Claude Code and is great at fixing small/medium issues.


Absolutely, it's dogfooding AI and vibing huge features on top of a house of cards. It's a fucking mess, and the product design is simultaneously confusing and infuriating. But the product is useful, and I'm more productive with it than without it now.


Well, the fun part is that the algorithms themselves are deterministic. They are just so afraid of model distillation that they force some randomness on top (and now hide the thinking). Arguably, for coding you'd want temperature=0, so that any variation would depend on the input tokens alone.
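To make the temperature point concrete, here is a minimal Python sketch of a single decoding step, assuming a plain softmax sampler over a list of logits (real inference stacks add top-k/top-p and other steps, and the function names below are hypothetical). Dividing logits by the temperature sharpens or flattens the distribution; at temperature 0 the step degenerates to greedy argmax, which is repeatable for identical logits.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Convert raw logits to probabilities, scaled by a temperature > 0."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_next_token(logits, temperature, rng):
    """Pick a token index; temperature == 0 is treated as greedy argmax."""
    if temperature == 0:
        # Greedy decoding: always the highest-scoring token.
        return max(range(len(logits)), key=lambda i: logits[i])
    probs = softmax_with_temperature(logits, temperature)
    # Draw one index from the categorical distribution defined by probs.
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


# Hypothetical scores for three candidate tokens.
logits = [2.0, 1.5, 0.1]
rng = random.Random()
greedy_picks = {sample_next_token(logits, 0, rng) for _ in range(100)}
sampled_picks = {sample_next_token(logits, 1.0, rng) for _ in range(1000)}
print(greedy_picks)   # {0}: the same token every time
print(sampled_picks)  # typically several different indices
```

The sketch only covers the sampling step itself; it says nothing about whether the logits feeding it are bit-for-bit identical between runs on a shared serving stack.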


Meh. Temp 0 means throwing away huge swathes of the information painstakingly acquired through training, for minimal benefit if any. Nondeterminism is a red herring: the model is still going to be an inscrutable black box with mostly unknowable nonlinear transition boundaries w.r.t. inputs, even if you make it perfectly repeatable. It doesn't protect you from tiny changes in inputs causing large changes in outputs _with no explanation as to why_. And in the process you've made the model significantly stupider.

As for distillation... sampling from the temp 1 distribution makes it easier.


Bringing up computational determinism in the early days of AI was absolutely career-limiting. But now, even if the model itself is deterministic at batch size 1, load balancing for MoE routing can make things non-deterministic at any larger batch size. Good luck with that, guys!
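The batch-size effect is easy to see in a toy capacity-limited router. This is a hypothetical Python sketch, not how any particular provider's serving stack works: real MoE layers use learned gating, top-k routing and various load-balancing schemes, and batch-dependent floating-point reduction order is another common source of variation. It only shows that when each expert has a per-batch capacity, the expert a token lands on can depend on the other tokens sharing its batch.

```python
def route_top1_with_capacity(token_scores, capacity):
    """Toy MoE router: each token goes to its highest-scoring expert unless
    that expert is already full, in which case it falls back to its next choice.

    token_scores: list of per-token score lists, one score per expert.
    Returns the chosen expert index per token (None if every expert is full).
    """
    num_experts = len(token_scores[0])
    load = [0] * num_experts  # tokens assigned to each expert so far
    assignments = []
    for scores in token_scores:
        # Rank experts for this token from best to worst score.
        ranked = sorted(range(num_experts), key=lambda e: scores[e], reverse=True)
        chosen = None
        for expert in ranked:
            if load[expert] < capacity:
                chosen = expert
                load[expert] += 1
                break
        assignments.append(chosen)
    return assignments


# Hypothetical gate scores over two experts.
my_token = [0.9, 0.1]       # prefers expert 0
other_token = [0.95, 0.05]  # another request's token, also prefers expert 0

alone = route_top1_with_capacity([my_token], capacity=1)
batched = route_top1_with_capacity([other_token, my_token], capacity=1)
print(alone[0])    # 0: expert 0 has room
print(batched[1])  # 1: expert 0 was filled by the other request's token
```

The same token, with the same scores, ends up in a different expert purely because of who else was in the batch, so the output can change even with greedy decoding.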


  Location: London/UK
  Remote: Yes
  Willing to relocate: No
  Technologies: Experienced Generalist / Go / TypeScript / AWS
  Résumé/CV: https://dri.me/C0BM3ArVTLwKv7zVOtAwY1k3tOZGkw
  Eml: luke at lukebarton co uk
Product engineer. Coming out of some time off. Open to principal/staff/well-compensated roles. I do my best work in fluid, fuzzy, real-world problem spaces, where things need to be figured out, where creativity and adaptability are valuable traits. System design. XP/DDD. Coaching & Mentoring. Available immediately.


The play was to use AI as an opportunity to quietly insert adverts into a platform full of paying users.

The moment your company starts playing the pauper and enshittifying the products I already pay for is the moment I stop giving you any money at all. Try it. I’m not paying you money so you can try to make more money from me. Either add value and convince me to pay more, or fuck off.


I don’t mind looking stupid. It’s actually an important part of my identity - I lay my humanity bare. I am of flesh after all.

I’m starting to suspect that it’s making it more difficult for me to land a job though. I don’t know. There’s something about it. It’s almost as if businesses aren’t hiring human beings, but I can’t quite put my finger on it.


This is a distinctly Zed solution - trying to move the agent experience into the editor, rather than just giving the agent an interface with which to control and read from the editor.

Not only do the most popular editors have little-to-no incentive to implement it (they’re more interested in pushing their own first-class implementations, rather than integrating those of others), it’s much more work to integrate the evolving agent experience into the IDE than it would be to provide IDE integration points for the agents themselves.

So, I think this project would have been much more successful if it had been more focussed on keeping the agent and IDE experiences separated but united by the protocol, instead of trying to deeply marry them. But that’s not in line with Zed’s vision and monetization strategy.

It won’t be long before the big players start to release their own cloud-based editors. They’ll be cloud-based because the moat is wider, and they’ll try to move coding to the cloud the way Google Workspace moved docs to the cloud, probably with huge token discounts to capture people. If you squint, you can already see this starting with Claude Desktop, which runs its agent loop in the cloud (you can tell because skills appear to need to be uploaded).

Notably, Microsoft, with VS Code and GitHub, has a web-based editor advantage in this space, but no models.


It's not just Zed; Emacs has a thriving ACP implementation in agent-shell[0], which allows for some very cool integrations[1]. There are a fair number of other clients[2] as well.

[0]: https://github.com/xenodium/agent-shell

[1]: https://www.youtube.com/watch?v=HJQ86HuSIJI

[2]: https://agentclientprotocol.com/get-started/clients


The second half of this is spot on. The present is about making IDEs that can integrate with agents, not the other way around. Soon Claude and Codex will do that for us on their own hosts, and the argument will be that it saves sending the context up.


I imagine that it’s because the rug is becoming insufficient to cover the growing dirt pile.

I’m here for it. Corruption is a problem worth solving, so I’m happy to bother the ycombinator readership with it.


As a non-US reader, I find it honestly a bit frustrating to see how often people from the US forget that a large portion of HN readers are from other countries and don’t share the same context for posts like this. It ends up treating the US context as universal.

And don’t get me wrong: I agree that corruption is horrible. I live in a country where corruption was and still is rampant. Political discussions more closely tied to, let’s say, AI companies such as OpenAI or Anthropic and their dealings with the Pentagon do spark interest, since they are somewhat more directly connected to decisions we can make as tech professionals in other countries, whether for moral, ethical, or practical reasons. That is not really the case for posts like these, however. To your point, I would love to see the tech/hacker community come up with ideas for solving corruption, even if it’s just philosophical discussion.

If my point still doesn’t make sense, imagine seeing posts about corruption cases from any other non-US country being posted on HN. What would you think about those?


When I browse sites based in other countries, I don't complain when there's a lot of talk specific to that country. I didn't know what Eurovision was until last week, but now LMNC is representing the UK. There's a lot of talk about how it should be boycotted because of Israel, and about how a bunch of people I've never heard of are corrupt. I'm just there to cheer on LMNC, but I get why it's being overshadowed by the current politics.


Well, when it came to news about Silvio Berlusconi in Italy, I was incredulous that any established democracy would tolerate such corruption.

Which is why I owe Italians an apology nowadays.


Given what we know about Trump, "bunga bunga" parties with consenting adults sound positively pedestrian.


I don't think the answer to that is to discourage posting US-centric stories about serious political issues. I think the answer is to encourage people from other countries to post theirs, too.

We need more understanding of each other and of each other's situations, not less. The more we tech people bury our heads in the sand about politics—every country's politics—the more likely we are to create more situations like the one we're in today.


Someone in the UK government is furiously writing this down.


> it feels natural to me that the line for images should be aligned with the line for the act itself

Not before we get GTA 6 please.


Next, they'll be banning porn depicting sex between ministers and farm animals.


Basically the plot of the Black Mirror pilot.



Incidentally,

In November 2015, solicitor Myles Jackman said that performing a sexual act with a dead animal would not be illegal under the Sexual Offences Act 2003. He stated that possessing a photograph of such an act would be illegal under the Criminal Justice and Immigration Act 2008 if it was produced for pornographic purposes, but not if the purpose was "satire, political commentary or simple grossness".


What? So putting your willy in a pig on camera is totally fine as long as you do it ironically? Why, and how, would any reasonable human being decide what the purpose of a photograph of sex with animals is? Have furries been overstepping the law the whole time?


More like Father Ted.


Not really, no.

More like precisely the plot of a Black Mirror episode, and some rather plausible rumours about David Cameron. Have you not seen them?


The one with Sacha Baron Cohen running in elections?



