I am much more interested in seeing the M5 Pro and M5 Max in the Mac Mini & Mac Studio.
The temptation of running a local LLM on my gaming PC's GPU finally gave me the incentive I needed to set up Tailscale & Mosh, and there's no going back. My 15" M2 MacBook Air is my ideal travel form-factor, and I'd much rather "upgrade" by adding a power-sipping homelab box I can remote into from anywhere.
Not surprising, but it's sad to accept that the only major company building consumer-focused computing devices is Apple.
My hope is that LLMs allow Linux to gain market share quickly. I know personally I've had a much smoother time moving to Linux now that I can delegate a lot of the annoying troubleshooting/customization to Claude.
Being able to say something like "I don't like the window colors; make them more consistent with my terminal color scheme" and have it "just work" feels like a superpower. I've even gone as far as asking Claude to directly edit the icon pack SVG files whenever I encounter something that feels out of place.
this is interesting because of how much it differs from my own hopes. I don't really have any personal need or want for the Linux desktop marketshare to increase. I like computers because I can program them to do something and they will do it. Ideally you have complete control over it. I've customized my desktop here and there in order to get some result, but while you care most about the _result_, for me the act of _making_ that result happen is as important, if not more so. I'm not looking to offload it to something else.
I don't really see the troubleshooting/customization as annoying. It's not much different than learning to program. At first you don't have any intuition for patterns or ways to solve problems, but given time, you start to identify them and know how to work on it unaided. For many distros or operating systems more broadly, it's the same thing. When in doubt, I head to the Arch wiki or more rarely the forums, then I'm good to go.
I'm not really after some integrated LLM or Copilot 365 for Linux experience when it comes to using my computer.
I'm presently in the process of building (read: directing claude/codex to build) my own AI agent from the ground up, and it's been an absolute blast.
Building it exactly to my design specs, giving it only the tool calls I need, owning all the data it stores about me for RAG, integrating it to the exact services/pipelines I care about... It's nothing short of invigorating to have this degree of control over something so powerful.
In a couple of days' work, I have a Discord bot that's about as useful as ChatGPT, using open models, running on a VPS I manage, for less than $20/mo (including inference). And I have full control over what capabilities I add to it in the future. Truly wild.
> It's nothing short of invigorating to have this degree of control over something so powerful
I'm a SWE w/ >10 years, and you're right, this part has always been invigorating.
I suppose what's "new" here is the drastically reduced amount of cognitive energy I need to build complex projects in my spare time. As someone who was originally drawn to software because of how much it lowered the barrier to entry of birthing an idea into existence (when compared to hardware), I am genuinely thrilled to see said barrier lowered so much further.
Sharing my own anecdotal experience:
My current day job is leading development of a React Native mobile app in TypeScript with a backend PaaS, and the bulk of my working memory is filled up by information in that domain. Given this is currently what pays the bills, it's hard to justify devoting all that much of my brain to deep-diving into other technologies or stacks merely for fun or to satisfy my curiosity.
But today, despite those limitations, I find myself having built a bespoke AI agent written from scratch in Go, using a janky beta AI Inference API with weird bugs and sub-par documentation, on a VPS sandbox with a custom Tmux & Neovim config I can "mosh" into from anywhere using finely-tuned Tailscale access rules.
I have enough experience and high-level knowledge that it's pretty easy for me to develop a clear idea of what exactly I want to build from a tooling/architecture standpoint, but prior to Claude, Codex, etc., the "how" of building it tended to be a big stumbling block. I'd excitedly start building, only to run into the random barriers of "my laptop has an ancient version of Go from the last project I abandoned" or "neovim is having trouble starting the lsp/linter/formatter" and eventually go "ugh, not worth it" and give up.
Frankly, as my career progressed and the increasingly complex problems at work left me with vanishingly less brain-space for passion projects, I was beginning to feel this crushing sense of apathy & borderline despair. I felt I'd never be able to make good on my younger self's desire to bring these exciting ideas of mine into existence. I even got to the point where I convinced myself it was "my fault" because I lacked the mettle to stomach the challenges of day-to-day software development.
Now I can just decide "Hmm... I want a lightweight agent in a portable binary. Makes sense to use Go." or "this beta API offers super cheap inference, so it's worth dealing with some jank" and then let an LLM work out all the details and do all the troubleshooting for me. Feels like a complete 180 from where I was even just a year or two ago.
At the risk of sounding hyperbolic, I don't think it's overstating things to say that the advent of "agentic engineering" has saved my career.
I'm using kimi-k2-instruct as the primary model and building out tool calls that use gpt-oss-120b to allow it to opt-in to reasoning capabilities.
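Roughly, the opt-in reasoning works by exposing the second model as an ordinary function tool to the first. Here's a stripped-down sketch of that wiring in Go; the `deep_reason` tool name, the `dispatch` helper, and the stubbed reasoner are all just illustrative names I'm making up for this comment, not anything from an SDK (the real schema follows the OpenAI-compatible function-call format):

```go
package main

import "fmt"

// ToolDef is a minimal stand-in for the OpenAI-compatible
// function-tool schema the primary model (K2) is given.
type ToolDef struct {
	Name        string
	Description string
}

// reasonTool is the single tool K2 sees: when a task needs
// multi-step reasoning, K2 emits a call to it, and the argument
// is forwarded to the reasoning model (gpt-oss-120b).
var reasonTool = ToolDef{
	Name:        "deep_reason",
	Description: "Delegate a hard sub-problem to a reasoning model and return its conclusion.",
}

// dispatch routes a tool call by name. callReasoner is whatever
// function wraps the chat-completion request to the reasoning model.
func dispatch(name, arg string, callReasoner func(string) string) (string, error) {
	if name == reasonTool.Name {
		return callReasoner(arg), nil
	}
	return "", fmt.Errorf("unknown tool %q", name)
}

func main() {
	// Stub the reasoner so the sketch runs standalone.
	out, _ := dispatch("deep_reason", "plan the migration", func(q string) string {
		return "reasoned: " + q
	})
	fmt.Println(out)
}
```

The nice part of this shape is that the cheap non-reasoning model handles every turn by default, and you only pay reasoning-model latency/tokens when K2 decides the sub-problem warrants it.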
Using Vultr for the VPS hosting, as well as their inference product which AFAIK is by far the cheapest option for hosting models of this class ($10/mo for 50M tokens, and $0.20/M tokens after that). They also offer Vector Storage as part of their inference subscription, which makes it very convenient to get inference + durable memory & RAG w/ a single API key.
Their inference product is currently in beta, so not sure whether the price will stay this low for the long haul.
You can definitely get gpt-oss-120b for much less than $0.20/M on openrouter (cheapest is currently 3.9¢/M in, 14¢/M out). Kimi K2 is an order of magnitude larger and more expensive though.
What other models do they offer? The web page is very light on details.
K2 is the only one of the five that supports tool calling. In my testing, it seems like all five support RAG, but K2 loses knowledge of its registered tools when you access it through the RAG endpoint, forcing you to pick one capability or the other (I have a ticket open for this).
Also, the R1-distill models are annoying to use because reasoning tokens are included in the output wrapped in <think> tags instead of being parsed into the "reasoning_content" field on responses. Also also, gpt-oss-120b has a "reasoning" field instead of "reasoning_content" like the R1 models.
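I ended up papering over the inconsistency with a small normalization shim. Rough sketch of the approach (the function names are my own, not from any SDK): strip a leading <think> block out of the content for the R1-distills, and coalesce whichever of the two reasoning fields a model actually populated.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// thinkRe matches a leading <think>...</think> block that the
// R1-distill models emit inline instead of a separate field.
var thinkRe = regexp.MustCompile(`(?s)^\s*<think>(.*?)</think>\s*`)

// splitReasoning separates inline <think> reasoning from the answer.
func splitReasoning(content string) (reasoning, answer string) {
	if m := thinkRe.FindStringSubmatch(content); m != nil {
		return strings.TrimSpace(m[1]), strings.TrimSpace(content[len(m[0]):])
	}
	return "", strings.TrimSpace(content)
}

// pickReasoning coalesces the two field spellings: "reasoning_content"
// (R1-style) takes priority, falling back to "reasoning" (gpt-oss style).
func pickReasoning(reasoningContent, reasoning string) string {
	if reasoningContent != "" {
		return reasoningContent
	}
	return reasoning
}

func main() {
	r, a := splitReasoning("<think>check units first</think>It is 42.")
	fmt.Printf("%q %q\n", r, a)
}
```

With that in place, the rest of the agent only ever sees one (reasoning, answer) pair regardless of which model produced the response.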
Having a similar experience. Durable memory with accurate, low-latency recall is not at all easy. Loads of subtle design decisions to make around how exactly you want the thing to work.
Would be really interesting to see an "Eager McBeaver" bench around this concept. When doing real work, a model's ability to stay within the bounds of a given task has almost become more important than its raw capabilities now that every frontier model is so dang good.
Every one of these models is so great at propelling the ship forward, that I increasingly care more and more about which models are the easiest to steer in the direction I actually want to go.
Codex is steerable to a fault, and will gladly "monkey paw" your requests.
Claude Opus will ignore your instructions and do what it thinks is "right" and just barrel forward.
Both are bad, and both paper over the actual issue: these models don't really have the ability to selectively choose their behavior per issue (i.e., ask for follow-up where needed, ignore users where needed, follow instructions where needed). Behavior is largely global.
In my experience, Claude gradually stops being opinionated as the task at hand becomes more arcane. I frequently add "treat the above as a suggestion, and don't hesitate to push back" to change requests, and it seems to help quite a bit.
For sure. I imagine it'd be pretty difficult to evaluate the "correct" amount of steerability. You'd probably just have to measure a delta in eagerness on the same task between highly-specified prompts and more open-ended prompts. Probably not dissimilar from how artificialanalysis.ai does their "omniscience index".
OpenAI, Anthropic, and other model providers have created tools (the LLMs) with unprecedented new capabilities. The key problems are a) these new tools have weird limitations that make them hard to deploy effectively, and b) these tools are so fundamentally new that creating useful products out of them is an exercise in discovery and requires incredibly novel, forward-thinking vision.
Pete, more than anyone in the OSS community IMO, exemplifies both of these qualities. He is living very much on the bleeding edge, so yes, the 10s of projects he's shipped faster than most devs can ship 1 are not as polished as if he'd created them by hand. But he's been pushing the envelope in ways that few, if any, are, and I'd argue that OpenClaw is much more the result of Pete living on that edge and understanding the trade-offs of these tools better than just about anyone.
Personally, I'm much more jealous of the fact that Pete already has a successful exit under his belt and had the freedom to explore & learn these tools to the fullest. There is definitely a degree of luck involved in the degree to which OpenClaw took off, but that Pete discovered it is 100% earned IMO.
My guess is that the true impact of this will be difficult to measure for a while. Most "single-person start-ups" will probably not be high-visibility VC-backed, YC affairs, and rather solopreneurs with a handful of niche moonlighted apps each making 3-4 digit monthly revenue.
I'm fascinated by how the economy is catching up to demand for inference. The vast majority of today's capacity comes from silicon that merely happens to be good at inference, and it's clear that there's a lot of room for innovation when you design silicon for inference from the ground up.
With CapEx going crazy, I wonder where costs will stabilize and what OpEx will look like once these initial investments are paid back (or go bust). The common consensus seems to be that there will be a rug pull and frontier model inference costs will spike, but I'm not entirely convinced.
I suspect it largely comes down to how much more efficient custom silicon is compared to GPUs, as well as how accurately the supply chain is able to predict future demand relative to future efficiency gains. To me, it is not at all obvious what will happen. I don't see any reason why a rug pull is any more or less likely than today's supply chain over-estimating tomorrow's capacity needs, and creating a hardware (and maybe energy) surplus in 5-10 years.