More

oceansweep · 2026-04-16T16:36:48 1776357408

migrate to rust.

oceansweep · 2026-04-03T04:04:37 1775189077

Likely meant 'swyx', https://www.latent.space/podcast

swyx · 2026-04-03T04:13:49 1775189629

(came from f5 bot) haha yeah if any ai companies are looking i’ll talk for $200m!!!

oceansweep · 2026-03-26T13:13:12 1774530792

If you don’t mind a little instability while I work out the bugs, might be interested in my project: https://github.com/rmusser01/tldw_server ; it’s not quite fully ready yet but the backend api is functional and has a full RAG system with a customizable and tweakable local-first ETL so you can use it without relying on any third party services.

oceansweep · 2026-01-26T17:20:58 1769448058

You can absolutely downvote posts. You have to have a certain amount of karma before the option becomes available.

digiown · 2026-01-26T17:39:39 1769449179

No I was wrong. You can't downvote posts. Flags are used instead, apparently.

freedomben · 2026-01-27T17:12:55 1769533975

Yes, and I will fully agree with you that flags are overpowered. That system does need to be re-worked IMHO.

nebezb · 2026-01-26T21:12:00 1769461920

freedomben has 28k karma. I don’t think the downvote button is coming.

oceansweep · 2025-12-07T15:42:08 1765122128

Yes. That is absolutely the case. One of the Most popular handguns does not have a safety switch that must be toggled before firing. (Glock series handguns)

If someone performs a negligent discharge, they are responsible, not Glock. It does have other safety mechanisms to prevent accidental fires not resulting from a trigger pull.

agentultra · 2025-12-07T16:50:57 1765126257

You seem to be getting hung up on the details of guns and missing the point that it’s a bad analogy.

Another way LLMs are not guns: you don’t need a giant data centre owned by a mega corp to use your gun.

Can’t do science because GlockGPT is down? Too bad I guess. Let’s go watch the paint dry.

The reason I made it is because this is inherently how we designed LLMs. They will make bad citations and people need to be careful.

oceansweep · 2025-12-03T18:26:27 1764786387

https://fiction.live/stories/Fiction-liveBench-Mar-25-2025/o...

oceansweep · 2025-11-29T01:33:14 1764379994

Just wanted to comment on the fact that I remember seeing that comment, and it left such an impression I remember it 7 years later. Thanks for the reminder, going to bookmark it this time.

oceansweep · 2025-11-07T01:38:20 1762479500

Epyc Genoa CPU/Mobo + 700GB of DDR5 ram. The model is a MoE, so you don't need to stuff it all into VRAM, you can use a single 3090/5090 to hold the activated weights, and hold the remaining weights in DDR5 ram. Can see their deployment guide for reference here: https://github.com/kvcache-ai/ktransformers/blob/main/doc/en...

oceansweep · 2025-10-20T16:06:20 1760976380

Mean time between failures

oceansweep · 2025-07-06T18:14:34 1751825674

Are you doing this with vLLM? If you're using Llama.cpp/Ollama, you could likely see some pretty massive improvements.

kgeist · 2025-07-06T19:27:24 1751830044

We're using llama.cpp. We use all kinds of different models other than Qwen3, and vLLM startup when switching models is prohibitively slow (several times slower than llama.cpp, which is already 5 sec)

From what I understand, vLLM is best when there's only 1 active model pinned to the GPU and you have many concurrent users (4, 8 etc.). But with just a single 32 GB GPU you have to switch the models pretty often, and you can't fit more than 2 concurrent users anyway (without sacrificing the context length considerably: 4 users = just 16k context, 8 users = 8k context), so I think vLLM so far isn't worth it. Once we have several cards, we may switch to vLLM.