Hacker News | 44za12's comments

I read it as an article in defence of boring tech with a fancier/clickbaity title.

Here’s the more honest one I wrote a while back:

https://aazar.me/posts/in-defense-of-boring-technology


While I agree with your points, this one could be more nuanced:

> Infrastructure: Bare Server > Containers > Kubernetes

The problem with recommending a bare server first is that bare metal fails. Every couple of years a component gives out - a PSU, a RAID controller, a drive. A bare metal server is also more expensive than a VPS.

Paradoxically, a k3s distro with 3 small nodes and a load balancer at Hetzner may cost you less than a bare metal server and will definitely give you much better availability in the long run, albeit with less performance for the same money.


In 5 years of running 3x Dell R620s 24/7 - which were already 9 years old when I got them - I had two sticks of RAM have ECC errors, and one PSU fail. The RAM technically didn’t have to be replaced, but I chose to. The PSU of course had a hot spare, so the system switched over and informed me without issue.

IME, hardware is much more reliable than people think.


Specialised models easily beat SOTA, case in point: https://nehmeailabs.com/flashcheck


All of us use more or less the same keyboards, so maybe a human "randomly" typing a large number isn't as random as we'd like to think. Just as "asdf" and "xcvb" are common strings because those keys sit together, there's probably some pattern here as well.


Especially for those very large numbers in the top ten (like 166884362531608099236779 with 6779 searches), and the relatively small number of total "votes" (probably less than a million), I think the only likely explanation for their rank is ballot-stuffing.


That means there is less entropy than in purely random strings, not that this specific number would be so far outside the distribution. My money would be on someone hammering it.
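As a rough sketch of what "less entropy" means here: a per-digit frequency estimate, applied to the number from the thread. (This only measures symbol frequencies, not keyboard adjacency; a uniformly random digit string approaches log2(10) ≈ 3.32 bits per digit.)

```python
from collections import Counter
from math import log2

def shannon_entropy(s: str) -> float:
    """Bits per symbol, estimated from symbol frequencies."""
    counts = Counter(s)
    n = len(s)
    return -sum((c / n) * log2(c / n) for c in counts.values())

# Human-typed "random" numbers tend to repeat nearby keys,
# which pulls this estimate below the uniform ceiling.
typed = "166884362531608099236779"
print(round(shannon_entropy(typed), 2))
```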


This is the way. I actually mapped out the decision tree for this exact process and more here:

https://github.com/NehmeAILabs/llm-sanity-checks


That's interesting. Is there any kind of mapping to these respective models somewhere?


Yes, I included a 'Model Selection Cheat Sheet' in the README (scroll down a bit).

I map them by task type:

Tiny (<3B): Gemma 3 1B (could try 4B as well), Phi-4-mini (good for classification).

Small (8B-17B): Qwen 3 8B, Llama 4 Scout (good for RAG/extraction).

Frontier: GPT-5, Llama 4 Maverick, GLM, Kimi.

Is that what you meant?


At the risk of being obvious: do you have a tiny LLM gating this decision, classifying each task and directing it to the appropriate model?
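A minimal sketch of such a gate, with a plain keyword heuristic standing in for the tiny classifier LLM (the tier names, keywords, and model labels below are all made up for illustration):

```python
# Routing layer: a cheap classifier decides which model tier
# handles a request before anything expensive is called.
TIERS = {
    "classification": "tiny-3b",
    "extraction": "small-8b",
    "reasoning": "frontier",
}

def classify_task(prompt: str) -> str:
    """Stand-in for a tiny gating LLM that returns a task type."""
    p = prompt.lower()
    if "label" in p or "categorize" in p:
        return "classification"
    if "extract" in p or "parse" in p:
        return "extraction"
    return "reasoning"

def route(prompt: str) -> str:
    """Pick the model tier for a prompt."""
    return TIERS[classify_task(prompt)]

print(route("Extract the invoice total from this email"))  # → small-8b
```

In practice the heuristic would be replaced by an actual sub-3B model call, with "reasoning" as the fallback tier when the classifier is unsure.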


>Before you reach for a frontier model, ask yourself: does this actually need a trillion-parameter model?

>Most tasks don't. This repo helps you figure out which ones.

About a year ago I was testing Gemini 2.5 Pro and Gemini 2.5 Flash for agentic coding. I found they could both do the same task, but Gemini Pro was way slower and more expensive.

This blew my mind because I'd previously been obsessed with "best/smartest model", and suddenly realized what I actually wanted was "fastest/dumbest/cheapest model that can handle my task!"


For simple extraction tasks, a delimiter-separated string uses 11 tokens vs 35 for JSON. Output tokens are the latency bottleneck.
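A rough illustration of the size difference, and of how trivially the delimiter form parses back given a fixed field order (the record and field names are invented):

```python
import json

# The same extracted record in two output formats.
record = {"name": "Ada", "age": "36", "city": "London"}

as_json = json.dumps(record)          # keys, quotes, braces all cost tokens
as_delim = "|".join(record.values())  # "Ada|36|London"

# Parsing the delimiter form back, given an agreed field order:
fields = ["name", "age", "city"]
parsed = dict(zip(fields, as_delim.split("|")))

# Character counts; token counts shrink similarly since the
# structural overhead (keys, quotes, braces) disappears.
print(len(as_json), len(as_delim))
```

The trade-off is that the field order becomes an implicit contract between prompt and parser, so it only suits stable, simple schemas.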


Shameless plug.

I’ve been using a CLI tool I created over 2 years ago; it just works. I had more ideas but never got around to incorporating them.

https://github.com/44za12/horcrux


6 years for me if we're counting :)

https://github.com/edify42/otp-codegen


Love the minimalism.


Have been using remove.bg for this for years now.


Yes, I’ve built a free tool that delivers the same background removal results as remove.bg


Like a semaphore?


A semaphore limits concurrency; this one automatically groups (batches) input.
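As a sketch of the distinction: a hypothetical micro-batcher that groups items arriving within a short window, flushing when the batch is full or the window closes (names and thresholds are illustrative, not the library's API):

```python
import asyncio

async def micro_batcher(queue, process_batch, max_size=8, max_wait=0.05):
    """Group queued items into batches: flush when full or after max_wait."""
    while True:
        batch = [await queue.get()]  # block until the first item arrives
        deadline = asyncio.get_running_loop().time() + max_wait
        while len(batch) < max_size:
            timeout = deadline - asyncio.get_running_loop().time()
            if timeout <= 0:
                break
            try:
                batch.append(await asyncio.wait_for(queue.get(), timeout))
            except asyncio.TimeoutError:
                break
        await process_batch(batch)

async def demo():
    q = asyncio.Queue()
    results = []

    async def handle(batch):
        results.append(batch)

    task = asyncio.create_task(micro_batcher(q, handle, max_size=3, max_wait=0.01))
    for i in range(3):
        await q.put(i)
    await asyncio.sleep(0.05)  # let the batcher drain the queue
    task.cancel()
    return results

print(asyncio.run(demo()))  # → [[0, 1, 2]]
```

A semaphore would have let three handlers run side by side; the batcher instead delivers all three items to one handler call.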


I’ve had great luck with all Gemma 3 variants; on certain tasks the quantized 27B has worked as well as 2.5 Flash. Can’t wait to get my hands dirty with this one.


Can you benchmark Kimi K2 and GLM 4.5 as well? Would be interesting to see where they land.


