oceansweep's comments

Migrate to Rust.


(Came from F5 Bot.) Haha, yeah, if any AI companies are looking, I'll talk for $200M!!!


If you don't mind a little instability while I work out the bugs, you might be interested in my project: https://github.com/rmusser01/tldw_server ; it's not quite fully ready yet, but the backend API is functional and includes a full RAG system with a customizable, tweakable, local-first ETL pipeline, so you can use it without relying on any third-party services.
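(If you're curious what the general pattern looks like, here's a minimal local-first RAG sketch: ingest, chunk, index, retrieve. It uses a toy bag-of-words index so it needs no third-party services. Purely illustrative, not tldw_server's actual API; all names and helpers here are made up.)

    # Generic local-first RAG flow: extract (read docs), transform (chunk + vectorize),
    # load (in-memory index), then retrieve by similarity. Toy bag-of-words scoring,
    # no external services; not the tldw_server API.
    from collections import Counter
    import math

    def chunk(text: str, size: int = 200) -> list[str]:
        words = text.split()
        return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

    def vectorize(text: str) -> Counter:
        return Counter(text.lower().split())

    def cosine(a: Counter, b: Counter) -> float:
        dot = sum(a[t] * b[t] for t in a)
        na = math.sqrt(sum(v * v for v in a.values()))
        nb = math.sqrt(sum(v * v for v in b.values()))
        return dot / (na * nb) if na and nb else 0.0

    docs = {"notes.txt": "LLMs can hallucinate citations. Always verify sources before trusting output."}
    index = [(name, c, vectorize(c)) for name, doc in docs.items() for c in chunk(doc)]

    def retrieve(query: str, k: int = 3):
        qv = vectorize(query)
        return sorted(index, key=lambda row: cosine(qv, row[2]), reverse=True)[:k]

    for name, text, _ in retrieve("do LLMs make up citations?"):
        print(name, "->", text)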


You can absolutely downvote posts. You have to have a certain amount of karma before the option becomes available.


No, I was wrong. You can't downvote posts; flags are used instead, apparently.


Yes, and I fully agree with you that flags are overpowered. That system does need to be reworked, IMHO.


freedomben has 28k karma. I don’t think the downvote button is coming.


Yes, that is absolutely the case. One of the most popular handguns, the Glock series, does not have a safety switch that must be toggled before firing.

If someone performs a negligent discharge, they are responsible, not Glock. The gun does have other safety mechanisms to prevent accidental discharges that don't result from a trigger pull.


You seem to be getting hung up on the details of guns and missing the point that it’s a bad analogy.

Another way LLMs are not guns: you don’t need a giant data centre owned by a mega corp to use your gun.

Can’t do science because GlockGPT is down? Too bad I guess. Let’s go watch the paint dry.

The reason I made it is that this is inherently how we designed LLMs: they will make bad citations, and people need to be careful.



Just wanted to say that I remember seeing that comment; it left such an impression that I still remember it 7 years later. Thanks for the reminder; going to bookmark it this time.


Epyc Genoa CPU/mobo + 700GB of DDR5 RAM. The model is a MoE, so you don't need to stuff it all into VRAM; you can use a single 3090/5090 to hold the activated weights and keep the remaining weights in DDR5 RAM. You can see their deployment guide for reference here: https://github.com/kvcache-ai/ktransformers/blob/main/doc/en...
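To make the "activated weights on GPU, rest in DDR5" idea concrete, here's a toy PyTorch sketch of MoE offloading. It's purely illustrative (made-up sizes, and not how ktransformers actually implements it): the router picks the top-k experts per batch, and only those experts' weights get copied from host RAM into VRAM for the matmul.

    # Toy illustration of the MoE-offloading idea (NOT the ktransformers API):
    # all expert weights live in ordinary CPU/DDR5 memory, and only the experts
    # the router activates get copied to the GPU for the matmul.
    import torch

    N_EXPERTS, TOP_K, D = 8, 2, 1024                  # made-up sizes
    gpu = "cuda" if torch.cuda.is_available() else "cpu"

    # Expert weights stay in host RAM (pinned so host-to-device copies are fast).
    experts_cpu = [torch.randn(D, D).pin_memory() if gpu == "cuda" else torch.randn(D, D)
                   for _ in range(N_EXPERTS)]
    router = torch.nn.Linear(D, N_EXPERTS).to(gpu)

    def moe_forward(x: torch.Tensor) -> torch.Tensor:
        """x: (batch, D) on the GPU. Only the top-k experts' weights touch VRAM."""
        topk = router(x).topk(TOP_K, dim=-1)
        out = torch.zeros_like(x)
        for e in topk.indices.unique().tolist():            # only activated experts
            w = experts_cpu[e].to(gpu, non_blocking=True)    # stream weights in
            mask = (topk.indices == e).any(dim=-1)           # tokens routed to e
            out[mask] += x[mask] @ w                         # compute on GPU
        return out

    x = torch.randn(4, D, device=gpu)
    print(moe_forward(x).shape)                       # torch.Size([4, 1024])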


Mean time between failures


Are you doing this with vLLM? If you're using llama.cpp/Ollama, you could likely see some pretty massive improvements by switching.


We're using llama.cpp. We use all kinds of different models other than Qwen3, and vLLM's startup when switching models is prohibitively slow (several times slower than llama.cpp, which already takes ~5 seconds).

From what I understand, vLLM is best when there's only one active model pinned to the GPU and you have many concurrent users (4, 8, etc.). But with just a single 32 GB GPU, you have to switch models pretty often, and you can't fit more than 2 concurrent users anyway without sacrificing context length considerably (4 users = just 16k context, 8 users = 8k context), so I think vLLM isn't worth it so far. Once we have several cards, we may switch to vLLM.
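Rough back-of-the-envelope for why per-user context shrinks like that (illustrative numbers only; the real KV-cache cost per token depends on the model, quantization, and vLLM's own overheads):

    # How a fixed KV-cache budget divides among concurrent users
    # (all numbers are assumptions, not measured values).
    GPU_VRAM_GB  = 32
    WEIGHTS_GB   = 18      # e.g. a ~32B model at ~4-bit, rough guess
    KV_GB_PER_1K = 0.22    # assumed KV-cache cost per 1k tokens of context

    kv_budget_gb = GPU_VRAM_GB - WEIGHTS_GB
    total_ctx_k  = kv_budget_gb / KV_GB_PER_1K        # ~64k tokens total

    for users in (1, 2, 4, 8):
        print(f"{users} user(s) -> ~{total_ctx_k / users:.0f}k context each")
    # 1 user(s) -> ~64k context each
    # 2 user(s) -> ~32k context each
    # 4 user(s) -> ~16k context each
    # 8 user(s) -> ~8k context each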

