redox99's comments

Also if you tweet a link to the content instead of tweeting the actual content, you get penalized by the algorithm.

They do this in almost every tweet.


This has changed recently. Links no longer appear to be penalized.

> If it slightly beats or even matches Opus 4.6

It doesn't though


Curious why you think this. Any data points that led you to it?

The benchmarks they released

What do you mean? In most cases, the benchmarks show a larger number for Muse and a smaller number for Opus.

On multimodal benchmarks, yes, but Opus is definitely edging it out on the text/reasoning and agentic benchmarks.

I think the general skepticism is because they are late to the race, and they are releasing an Opus-4.6-equivalent model now, when Anthropic is teasing Mythos.


Besides the claim that Opus and Gemini Flash share 99% of their style being suspicious, the point that you are wasting money on the expensive model is nonsensical. You pay primarily for the intelligence, not the writing style.

Is this article AI slop?


That's kind of crazy. Why doesn't Microsoft revoke such certs such that you can't sign new software with it?

Because it's mostly just performative.

Reddit turned into much more of an echo chamber over time. The moderators and the downvote system destroyed the site. The shift from a free-speech, libertarian, and anarchist ethos to a heavily left-leaning one definitely didn't help.

> AI programming is fundamentally different from programming

It's really not. Maybe vibecoding, in its original definition (not looking at generated code) is fundamentally different. But most people are not vibe coding outside of pet projects, at least yet.


Hopefully this does not devolve into ‘nuh-uh’-‘it is too’ but I disagree.

Even putting aside the AI engineering part where you use a model as a brick in your program.

Classic programming is based on the assumption that there is a formal, strict input language. When programming I think in that language; I hold the data structures and connections in my head. When debugging I have intuition about what is going on because I know how the code works.

When working on somebody else’s code base I bisect, I try to find the abstractions.

When coding with AI this does not happen. I can check the code it outputs but the speed and quantity does not permit the same level of understanding unless I eschew all benefits of using AI.

When coding with AI I think about the context, the spec, the general shape of the code. When the code doesn’t build or crashes the first reflex is not to look at the code. It’s prompting AI to figure it out.


This is the same argument that people used to have against compilers

It is not. One version of a compiler on one platform transforms a specific input into an exact and predictable artefact.

A compiler will tell you what is wrong. On top of that the intent is 100% preserved even when it is wrong.

An LLM will transform an arbitrarily vague input into an output. Adding more specification may or may not change the output.

There is a fundamental difference between asking for “make me a server in go that answers with the current time on port 80” and actually writing out the code, where you _have to_ make all the decisions, such as “wait, in what format?”, beforehand. (And using the defaults is also making a decision, because there are defaults.)

Compilers have undefined behaviour, but UB exists in well-defined places.

Even a 100% perfect LLM that never makes a mistake has, by definition, UB everywhere the spec is lacking.


Right, they allow for the idea of gradual specification - you can write in broad strokes where you don't care about the details, and in fine detail when you do. Whether the LLM followed the spec or not is mostly down to having the right tooling.

Compilers are an abstraction. AI coding is not an abstraction by any reasonable definition.

You're only thinking that because we're mostly still at the imperative, REPL stage.

We're telling them what to do in a loop. Instead we should be declaring what we want to be true.


You’re describing a hypothetical that doesn’t exist. Even if we assume it will exist someday we can’t reasonably compare it to what exists today.

It exists today, please message me if you’d like to try it

The value is in the imperative: the computer does what you tell it to do. That control is very powerful, and it is arguably a major reason computer technology is as powerful and popular as it is today. Bits don't, generally speaking, argue with you the way analog programming, whether by electronics or mechanical means, did before the transistor.

You can certainly write in an imperative or functional style, but you are still telling the computer what you want. LLMs take imprecise language and generate only a loose binding to what people actually intend. They have their use cases too, but they have a radically different locus of control. Compilers don't ask you to give up precision either; they will do what you tell them to do. An LLM can do whatever it thinks is the most likely next token, which is foundationally different from what we do when we engage in programming, or writing in general.


It very much is. It’s more like telling an intern what to do and then reviewing their code. Anyone can do it, and it results in (mostly) slop.

>But most people are not vibe coding outside of pet projects, at least yet.

Major corporations have had outages thanks to AI slop code. Lol the idea that people aren't vibe coding outside of pet projects is hilarious.


The idea that everyone using LLMs is vibe coding is equally hilarious.

If you use an LLM to generate source code you are vibecoding.

You specify the problem in natural language (the vibes) and the LLM spits out source (the code).

Whether you review it or not, that is vibecoding. You did not go through the rigor of translating the requirements to a programming language, you had a nondeterministic black box generate something in the rough general vicinity of the prompt.

Are people seriously trying to redefine what vibecoding is?


> If you use an LLM to generate source code you are vibecoding

No, you're not.

> Are people seriously trying to redefine what vibecoding is?

Yes, you are.


No, that is literally vibecoding. Reviewing vibecoded source is just an extra step. It's like saying "I'm not power-tool gardening, I use a pair of gardening scissors afterwards." You still did power-tool gardening.

As additional proof, the dictionary definition of vibe coding is "the use of artificial intelligence prompted by natural language to assist with the writing of computer code" [1]

It seems like vibecoders don't like the label and are retconning the term.

[1] https://www.collinsdictionary.com/dictionary/english/vibe-co...


Both you and the Collins dictionary (merely one dictionary, not an absolute authority) are retconning. “Vibe coding”, as originally coined in this tweet, means something more specific: to generate code with LLMs and not really look at the output. The term itself suggests this too: reviewing code is not exactly a vibes-based activity, is it?

https://xcancel.com/karpathy/status/1886192184808149383


Here's Merriam Webster with the same definition: https://www.merriam-webster.com/dictionary/vibe%20coding

That tweet coins the term, we agree there. The activity it describes is using natural language to generate software. Whether you add a review process or not doesn't substantially change that. Sure, Karpathy says he doesn't "read the diffs anymore". Why does he say "anymore"? Clearly he was reading them at some point. If not reading any diffs was a core part of the activity, that wouldn't be the case, the tweet itself clearly outlines that as optional. He's clearly not talking about a core part of the activity.


I think the tweet is pretty clear on its intention for the definition and I’m not interested in arguing about it.

I do think the dictionary definitions, such as they are, are coming from a real place: some people do use the more general definition. And you seem to already know about both definitions. So why argue so belligerently and definitively in the first place? Parent comments you were replying to were obviously using the original definition. Talking about “retconning” is obviously silly given this timeline. Meaning in language is not a race to be the first to make it into a dictionary. It’s a very new phenomenon that new terms make it so quickly into a dictionary at all, and they’re always under review. So maybe factor that into your commentary?


Because I believe the broad definition is more widely used, I also don't think the narrow term is useful or meaningful, and I think it's being used purely by vibe coding practitioners who feel that the term has negative connotations.

This all started with the parent comment telling someone else (belligerently and definitively) using the broader definition that they were wrong.


The narrow term is very useful, there is obviously a world of difference between reviewing the output of an LLM and not - the latter is irresponsible. It shouldn’t be surprising that people bristle when being accused of it. It doesn’t make sense to accuse someone of redefining a term to make themselves feel better when the history of the term shows that yours is the redefinition. The simpler explanation is that the accused just doesn’t like being called irresponsible - not that they’re trying to defend LLM code generation from someone who doesn’t like it.

You're saying what I'm saying. They feel self conscious about the term "vibe coding".

And to be clear, nobody accused the people who lashed out here. They reacted to general statements that people are vibe coding.

I also don't understand why the term vibe coding couldn't contain a spectrum of responsible use. Just say you're reviewing your vibe coded commits!

Clearly the issue here is about how vibe coders perceive the term vibe coding. Some of them feel that it's demeaning and are trying to wiggle their way out of the label by arguing semantics.


No, people think it’s demeaning because they are using a different definition to you, the definition which was the original one. Don’t know how I can put it clearer.

You say no, but then you agree that they think it's demeaning. Are you saying no just to say no, because you dislike how I'm framing this?

I don't think you've shown that the narrow definition is the original one. That's just a claim with no evidence or argument for it.

If you think the tweet is that evidence, I disagree. The tweet itself could be used to support both definitions. Personally I think it's more inline with the broader definition (see previous posts in this thread).


I think the tweet is crystal clear evidence that “vibe coding” was meant to mean “LLM code generation without reviewing the generated code”. Plenty of other parent commenters in this thread clearly think the same. Think what you like, but your interpretation is very strange, and the pushback and downvotes you’re getting is because of that.

This is still not an argument.

You are still just stating opinions without any arguments. If you think the tweet is crystal clear evidence of your point, please show why. If you think my interpretation is strange (even though I've already shown you two normative sources that agree with me), please show why.

Look, there's already a term for unreviewed, nonsensical genAI output: slop. The original tweet does not comment on the quality of the code; slop, on the other hand, is specifically about the quality of the output. Call it slop if you want to specify that it's unreviewed.

Downvotes are not proof of anything. I'm getting roughly 0.5 downvotes per post, that's to be expected when multiple people are disagreeing with me about something they care about. And HN has been flooded by LLM enthusiasts for the past couple of years. This is not surprising.


Correct, I'm not making an argument on the quality of the evidence, I'm expressing a different opinion and explaining the disconnect. I'm not interested in convincing you as I don't think that will happen, but I did think that you were missing a distinction and could understand the difference even if you thought differently. Apparently not.

The takes on LLM programming on reddit are hilarious and borderline sad. It's way past the point of denial, now into delusions.

They truly believe LLMs are close to useless and won't improve. They believe it's all just a bubble that will pop and people will go back to coding character by character.


Are there really 70 percent sRGB laptops at $600?


Power cycling is not a solution. It's a crappy workaround, and you still had downtime because of it. The device should never get stuck in the first place, and the solution for that is fixing whatever bug is in the firmware.

If they want to reduce support calls, then have more reliable gear.


> Power cycling is not a solution. It's a crappy workaround, and you still had downtime because of it. The device should never get stuck in the first place, and the solution for that is fixing whatever bug is in the firmware.

I'm sympathetic to the argument that companies should make support calls less necessary by providing better products and services, but "just write bug-free software" is not a solution.


This isn't a case where you need bug free software. This is a case where the frequency of fatal bugs is directly proportional to the support cost. Fix the common bugs, then write off the support for rare ones as a cost of doing business.

The effect of cheap robo support is not reducing the cost of support. It is reducing the cost of development by enabling a more buggy product while maintaining the previous support costs.


Giving the device enough RAM to survive memory leaks during heavy usage would also be a valid option, as is automatic rebooting to get the device back into a clean state before the user experiences a persistent loss of connectivity. There are a wealth of available workarounds when you control everything about the device's hardware and software and almost everything about the network environments it'll be operating in. Fixing all the tricky, subtle software bugs is not necessary.


For a community full of engineers, I'm always surprised that people take absolutist views on minor technical decisions rather than thinking about the trade-offs that got them there.


The obvious trade-off here is engineering effort vs. support cost, and when the tech-support solution is "have you tried turning it off and on again?", we know which path was chosen.


You can't just throw RAM at embedded devices that you make millions of and have extremely thin margins on. Have you bothered to look at the price of RAM today? At high numbers and low margins you can barely afford to throw capacitors at them, let alone precious rare expensive RAM.


No, XFinity are the ones who decided their routers “““need””” to have unwanted RAM-hungry extra functionality beyond just serving their residential customers' needs. Their routers participate in an entire access-sharing system so they can greedily double-dip by reselling access to your own connection that you already pay them for:

- https://www.xfinity.com/learn/internet-service/wifi

- https://www.xfinity.com/support/articles/xfinity-wifi-hotspo...


We're talking about devices where the retail price is approximately one month of revenue from one customer, and that's if there isn't an extra fee specifically for the equipment rental. Yes, consumer electronics tend to have very thin margins, but residential ISPs are playing a very different game.


A memory leak will, by definition, consume any amount of RAM; adding more RAM is not a solution either.


You're implying all software/hardware is of equal quality. I've had many routers with years of uptime, never requiring a reboot.

And I'm sure they had a lot of bugs, but not every bug means hanging to the point of requiring a reboot during normal operation.

Even a proper watchdog would, after some downtime, recover the system.


IME ChatGPT is pretty mid at search. Grok, although significantly dumber, is really strong at diligently going through hundreds of search results, and is much more tuned to rely on search results instead of its internal knowledge (which, depending on the case, can be better or worse). It's the only situation where Grok is worth using, IMO.

Gemini is really good with many topics. Vastly superior to ChatGPT for agronomy.

You should always use the best model for the job, not just stick to one.


I'd be friends with you. Wish you had contact info in your profile.

