Hacker News | swalsh's comments

Try running with Open Code. It works quite well.

I had an equally painful experience with Open Code. I don't think the harness is the issue. It's the need for a large context window and slow inference.

Been using the model for a few hours now. I'm actually really impressed with it. This is the first time I've found value in an image model for stuff I actually do. I've been using it to build PowerPoint slides and mockups. It's CRAZY good at that.

Yeah, it's funny. I would expect to see more enthusiasm versus just basic run-of-the-mill, "oh, there it is". Leave it to the HN crowd. This is incredible. I don't even like OpenAI.

LLMs make for great day 1 demos, but in a few weeks I promise you many people will be able to tell nearly all of the images generated by this are AI. It just takes time and exposure to figure out the new common flaws.

Frankly, I'm not sure they'll ever actually solve this problem, or if it'll be a continuous game of whack-a-mole. Regardless, there's a large crowd of people out there who, if they can tell something is AI generated, will not support the company behind it. Being able to tell anything is AI generated cheapens brands.


Your thinking is like everyone else's, and it's backwards. The world will learn to accept it as the standard way of doing things; people will appreciate one generation over another and look at manual image creation as a niche activity, like blacksmithing vs. assembly-line manufacturing and automation. With the latter, you appreciate the intent and the end result. Same thing here; people are just adjusting to it.

HN is engineer-heavy, so it's a bunch of people who spend their days looking at code. If it's not a coding model, they'll likely never use it.

To the average HN'er, images and design are superfluous aesthetic decoration for normies.

And for those on HN who do care about aesthetics, they're using Midjourney, which blows any GPT/Gemini model out of the water when it comes to taste even if it doesn't follow your prompt very well.

The examples given on this landing page are stock image-esque trash outside of the improvements in visual text generation.


My understanding is GPT 6 works via synaptic space reasoning... which I find terrifying. I hope, if true, OpenAI does some safety testing on that, beyond what they normally do.


From the recent New Yorker piece on Sam:

“My vibes don’t match a lot of the traditional A.I.-safety stuff,” Altman said. He insisted that he continued to prioritize these matters, but when pressed for specifics he was vague: “We still will run safety projects, or at least safety-adjacent projects.” When we asked to interview researchers at the company who were working on existential safety—the kinds of issues that could mean, as Altman once put it, “lights-out for all of us”—an OpenAI representative seemed confused. “What do you mean by ‘existential safety’?” he replied. “That’s not, like, a thing.”


Amusing! Even if they believe that, they should know the company communicated the opposite earlier.


No chance an OpenAI spokesperson doesn't know what existential safety is.


I did not read the response as...

>Please provide the definition of Existential Safety.

I read:

>Are you mentally stable? Our product would never hurt humanity--how could any language model?


The absolute gall of this guy to laugh off a question about x-risks. Meanwhile, also Sam Altman, in 2015: "Development of superhuman machine intelligence is probably the greatest threat to the continued existence of humanity. There are other threats that I think are more certain to happen (for example, an engineered virus with a long incubation period and a high mortality rate) but are unlikely to destroy every human in the universe in the way that SMI could. Also, most of these other big threats are already widely feared." [1]

[1] https://blog.samaltman.com/machine-intelligence-part-1


Why are these people always like this?


Likely an improvement on:

> We study a novel language model architecture that is capable of scaling test-time computation by implicitly reasoning in latent space. Our model works by iterating a recurrent block, thereby unrolling to arbitrary depth at test-time. This stands in contrast to mainstream reasoning models that scale up compute by producing more tokens. Unlike approaches based on chain-of-thought, our approach does not require any specialized training data, can work with small context windows, and can capture types of reasoning that are not easily represented in words. We scale a proof-of-concept model to 3.5 billion parameters and 800 billion tokens. We show that the resulting model can improve its performance on reasoning benchmarks, sometimes dramatically, up to a computation load equivalent to 50 billion parameters.

<https://arxiv.org/abs/2502.05171>


Oh you mean literally the thing in AI2027 that gets everyone killed? Wonderful.


AI 2027 is not a real thing which happened. At best, it is informed speculation.


Funny, if you open their website and go to April 2026 you literally see this: $26B revenue (Anthropic beat $30B) + pro human hacking (mythos?).

I don't put much stock in the predictions, but they've made great calls so far.


I agree that they called many things remarkably well! That doesn't change the fact that AI 2027 is not a thing which happened, so it isn't valid to point out "this killed us in AI 2027." There are many reasons to want to preserve CoT monitorability. Instead of AI 2027, I'd point to https://arxiv.org/html/2507.11473.


That sounds really interesting. Do you have some hints on where to read more?


Oh, of course they will /s


I had no idea one could buy a Blackhawk for $1.5M


It's the fuel cost that gets you...


Someone's not watching HeavyDSparks ;)

https://www.youtube.com/watch?v=m3P3FWkBFU4


There are Chinooks with no bids as of yet... as well as a Bombardier Challenger.


I gave the same prompt (a small Rust project that's not easy, but not overly sophisticated) to both Gemma 4 26B and Qwen 3.5 27B via OpenCode. Qwen 3.5 ran for a bit over an hour before I killed it; Gemma 4 ran for about 20 minutes before it gave up. Lots of failed tool calls.

I asked codex to write a summary about both code bases.

"Dev 1" Qwen 3.5

"Dev 2" Gemma 4

Dev 1 is the stronger engineer overall. They showed better architectural judgment, stronger completeness, and better maintainability instincts. The weakness is execution rigor: they built more, but didn’t verify enough, so important parts don’t actually hold up cleanly.

Dev 2 looks more like an early-stage prototyper. The strength is speed to a rough first pass, but the implementation is much less complete, less polished, and less dependable. The main weakness is lack of finish and technical rigor.

If I were choosing between them as developers, I’d take Dev 1 without much hesitation.

Looking at the code myself, I'd agree with Codex.


There are issues with the chat template right now[0], so tool calling does not work reliably[1].

Every time people try to rush to judge open models on launch day... it never goes well. There are ~always bugs on launch day.

[0]: https://github.com/ggml-org/llama.cpp/pull/21326

[1]: https://github.com/ggml-org/llama.cpp/issues/21316


What causes these? Given how simple the LLM interface is (just completion), why don't teams make a simple, standardized template available with their model release so the inference engine can just read it and work properly? Can someone explain the difficulty with that?


The model does have the format specified, but there is no _one_ standard. For this model it's defined in the tokenizer_config.json [0]. As for llama.cpp, they seem to be using a more type-safe approach to reading the arguments.

[0] https://huggingface.co/google/gemma-4-31B-it/blob/main/token...
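For the curious: a chat template is just a string-formatting recipe that maps a list of {role, content} messages onto the exact turn layout the model was trained on, usually shipped as a Jinja string in tokenizer_config.json. A minimal sketch in plain Python of what such a template computes (Gemma-style turn markers used for illustration; this is not the model's actual template):

```python
# Sketch of what a chat template does: turn a message list into the
# prompt string the model expects. Real templates are Jinja strings
# shipped in tokenizer_config.json; this mimics a Gemma-style layout.

def render_chat(messages, add_generation_prompt=True):
    out = []
    for m in messages:
        out.append(f"<start_of_turn>{m['role']}\n{m['content']}<end_of_turn>\n")
    if add_generation_prompt:
        # Trailing open turn cues the model to respond next.
        out.append("<start_of_turn>model\n")
    return "".join(out)

print(render_chat([{"role": "user", "content": "Hello"}]))
```

A subtle mismatch (a missing newline, a wrong turn marker, a mis-rendered tool-call block) is enough to make tool calling flaky, which is why engines that reimplement templates by hand tend to have launch-day bugs.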


Hm, but surely there will be converters for such simple formats? I'm confused as to how there can be calling bugs when the model already includes the template.


was just merged


It was just an example of a bug, not that it was the only bug. I’ve personally reported at least one other for Gemma 4 on llama.cpp already.

In a few days, I imagine that Gemma 4 support should be in better shape.


Qwen 3.5 27B is dense, so (I think) it should be compared to Gemma 4 31B.

Or Gemma 4 26B(-A4B) should be compared to Qwen 3.5 35B(-A3B).


Exactly; compare MoE with MoE and dense with dense, otherwise it's apples and oranges.


It's coding to coding. I couldn't care less how the model is architected; I only care how it performs in a real-world scenario.


If you don't care about how it's architected, why do you care about size? Compare it to Q3.5 397B-A17B.

Just like smaller size models are speed / cost optimization, so is MoE.

G4 26B-A4B goes 150 t/s on a 4090/5090, 80 t/s on an M5 Max. Q3.5 35B-A3B is comparably fast. They are flash-lite/nano-class models.

G4 31B, despite the small increase in total parameter count, is over 5 times slower. Q3.5 27B is comparably slow. They are approximating flash/mini-class models (I believe sizes of proprietary models in this class are closer to Q3.5 122B-A10B or Llama 4 Scout 109B-A17B).


The implication is that there is (should be) a major speed difference - naively you'd expect the MoE to be 10x faster and cheaper, which can be pretty relevant on real world tasks.
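The naive arithmetic behind that expectation, as a quick sketch (illustrative only: per-token decode work scales roughly with *active* parameters, and this ignores memory bandwidth and expert-routing overhead, which eat into the gain in practice):

```python
# Back-of-envelope decode cost: per-token work scales roughly with
# *active* parameters, not total. Parameter counts are from the thread.

def naive_speedup(dense_b, active_b):
    """Ratio of per-token work: dense model vs. an MoE's active slice."""
    return dense_b / active_b

# Gemma 4 31B (dense) vs Gemma 4 26B-A4B (~4B active per token):
print(naive_speedup(31, 4))  # 7.75x less work per token

# Qwen 3.5 27B (dense) vs Qwen 3.5 35B-A3B (~3B active per token):
print(naive_speedup(27, 3))  # 9.0x
```

That ballpark lines up with the "over 5 times slower" dense numbers reported above, with the shortfall from the naive ratio explained by the overheads this sketch ignores.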


The models are not technically comparable: the Qwen is dense, the Gemma is MoE. The ~33B models are the other way around!


Try using Grok 4.1 reasoning. It's crazy cheap, and really it's not that bad.


Sure, it might try to subtly steer you towards fascism, but other than that, it's great.


Neurons that fire together wire together. Your brain optimizes for your environment over time. As we get older, our brains run in a more optimized way than when we're younger. That's why older hunters are more effective than younger hunters: they're finely tuned for their environment. It's an evolutionary advantage. But it also means they're not firing in "novel" ways as much as the "kids". Kids are more creative, I think, because their brains are still adapting and exploring novelty; their neuron connections aren't as deeply tied together yet.

This is also maybe one of the biggest pitfalls as our society gets "older", with more old people and fewer "kids". We need kids to force us to do things differently.


Oh, I've been looking for a project for my 11-year-old... he's a very project-oriented learner, which schools don't seem to do anymore.


What country are you in?


Speak for yourself, I have never thrown away code at this rate in my entire career. I couldn't keep up this pace without AI codegen.


Did you read the article? I don’t think that refutes anything the author said even a little bit.


XD


I bet Claude was hyping this guy up as he was building it. "Absolutely, a Rust compiler written in PHP is a great idea!"


Every compiler in any language for any language has at the very least educational value.

On the other hand, demeaning comments without any traces of constructive criticism don't have any value.


Does it matter who the sycophant was or just that there was a sycophant?

My partner does that as well as LLMs at this point; "Sure honey, I remember you've talked a lot about Rust and about Clojure in the past, and you seem excited about this Clojure-To-Rust transpiler you're building, it sounds like a great idea!", is that bad too?


There is no comment on whether LLMs/agents have been used. I feel like projects should explicitly say if they were _or_ were not used. There is no license file, and no copyright header either. This feels like "fauxpen-source": imagine getting LEX+YACC to generate a parser, and presenting the generated C code as "open-source".

This is just another way to throw binaries over the wire, but much worse. This has the _worst_ qualities of the GPL _and_ pseudo-free-software-licenses (i.e. the EULAs used by mongo and others). It has all the deceptive qualities of the latter (e.g. we are open but not really -- similar to Sun Microsystems [love this company btw, in spite of its blunders], trying to convince people that NeWS is "free" but that the cost of media [the CD-ROM] is $900), with the viral qualities of the former (e.g. the fruit of the poison tree problem -- if you use this in your code, then not only can you not copyright the code, but you might actually be liable for infringement of copyright and/or patents).

I would appreciate it if the contributor, mrconter11, would treat HN as an internet space filled with intelligent, thinking people, and not a bunch of shallow and mindless rubes. (Please (1) explicitly disclose both the use and the absence of use of LLMs -- people are more likely to use your software this way, and it preserves the integrity of the open source ecosystem, and (2) share your prompts and sessions.)

So passes the glory of open source.


According to his README, he seems to have built a 3D engine completely from scratch 8 years ago without using any library:

https://github.com/mrconter1/IntuitiveEngine

> A simple 3D engine made only with 2D drawLine functions.


That is (slightly) reassuring (but the rest of his portfolio does not inspire confidence). Nevertheless, we should be required to disclose whether the code has been (legally) tainted or not. This will help people make informed decisions, and will also help people replace the code if legal consequences appear on the horizon, or if they are ready to move from prototype to production.


Slightly?

