electroglyph · 2026-05-07T09:19:14 1778145554

nice writeup! looking forward to doing some more training as soon as i get some more data sorted. it'll be a custom arch, but i'll probably shoehorn it into unsloth for a speed boost.

danielhanchen · 2026-05-07T10:22:58 1778149378

Thank you!

electroglyph · 2026-05-05T06:41:57 1777963317

you can train it, but not fully

mips_avatar · 2026-05-06T05:16:01 1778044561

I trained Karpathy's d28 1.6b nanochat on a 3090. Took an extremely long time but I did it.

electroglyph · 2026-05-01T08:24:18 1777623858

that's in the ideal scenario where it's only seen a single copy of it tho

electroglyph · 2026-05-01T08:21:49 1777623709

it was 1.3e-6 billion years ago!

electroglyph · 2026-04-28T05:31:48 1777354308

i'm doing inference on a free mi300x instance from AMD right now. not sure if the software stack is just old or what, but here's what i've observed: stuck on an old version of vllm pre-Transformers 5 support. it lacks MoE support for qwen3 models. oss-120b is faaaar slower than it should be.

int8 quantization seems like it's almost supported, but not quite. speeds drop to a fraction of full precision speed and the server seems like it intermittently hangs. int4 quantization not supported. fp8 quantization not supported.

again, maybe AMD is just being lazy with what they've provided, but it's not a great look.

right now the fastest smart model i can run is full precision qwen3-32b. with 120 parallel requests (short context) i'm getting PP @ 4500 tokens/sec and TG @ 1300 tokens/sec
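for a rough sense of what those numbers mean per request, here is a minimal sketch. it assumes the PP and TG figures are aggregate totals across all 120 parallel requests and that the load is spread evenly, neither of which is stated above. `per_request_rate` is a hypothetical helper, not part of vllm.

```python
def per_request_rate(aggregate_tokens_per_sec: float, parallel_requests: int) -> float:
    """Average tokens/sec seen by one request, assuming an even split (an assumption)."""
    if parallel_requests <= 0:
        raise ValueError("parallel_requests must be positive")
    return aggregate_tokens_per_sec / parallel_requests


# Figures quoted above, treated as aggregate throughput (assumption).
PARALLEL_REQUESTS = 120
pp_per_request = per_request_rate(4500, PARALLEL_REQUESTS)  # prompt processing
tg_per_request = per_request_rate(1300, PARALLEL_REQUESTS)  # token generation

print(f"PP per request: {pp_per_request:.1f} tokens/sec")
print(f"TG per request: {tg_per_request:.1f} tokens/sec")
```

under those assumptions each request would see roughly 37.5 tokens/sec of prompt processing and about 10.8 tokens/sec of generation.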

electroglyph · 2026-04-26T02:31:32 1777170692

but should you drive or walk to the car wash?

electroglyph · 2026-04-25T22:57:57 1777157877

i dunno, Opus is losing its edge imo. i regularly use a mix of models, including Opus, glm 5.1, kimi 2.6, etc. and i find that all of them are pretty much equally good at "average" coding, but on difficult stuff they're nearly equally bad. i can't deny that Opus has an edge, but it's not a huge one.

electroglyph · 2026-04-25T22:27:38 1777156058

> they also don't know what they don't know

they sort of do tho:

https://transformer-circuits.pub/2025/introspection/index.ht...

2ndorderthought · 2026-04-25T23:08:16 1777158496

I won't quibble even though I likely should. Have to remember this is HN and companies need to shill their work otherwise ... Yes.

I will play along and assume this is sound. 10-40% +/- 10% is along the lines of "sort of" in a completely unreliable, unguaranteed, and unproven way, sure.

electroglyph · 2026-04-25T10:15:42 1777112142

how about a unicode art tool?

https://electroglyph.github.io/atheriz_draw/

electroglyph · 2026-04-20T09:00:25 1776675625

https://sleepingrobots.com/dreams/stop-using-ollama/