Purely marketing from front to back. It's a little bit similar to how AMD spent too much time on R&D for a "true quad-core" processor (which ended up being Phenom) back in the late aughts (2008 or so?). Intel kinda glued two dual-core CPUs together and beat them to market with a "quad-core" processor, securing dominance for about 15 years. I used the Phenom II X4 for almost that entire 15-year period - it was really great! But they didn't sell that many.
Unfortunately it seems like fiscal success in 2023 mostly means jumping on the hype train of the day and beating the drum for all it's worth, and being ready to pivot to the next thing quickly. Rather than maximizing actual value in your product.
>Unfortunately it seems like fiscal success in 2023 mostly means jumping on the hype train of the day and beating the drum for all it's worth, and being ready to pivot to the next thing quickly. Rather than maximizing actual value in your product.
We have nobody to blame but ourselves. Consumers are far too happy to just jump on the hypiest train and don't care enough to actually educate themselves about which product has maximum value. It's the entire reason Apple still exists as a company, for example.
>> It's the entire reason Apple still exists as a company, for example.
Apple makes great products with a great user experience. And I say this as someone who switched from a Windows PC to a Linux PC on my desktop 20 years ago. My family all went Apple though, and I can completely see why. I do have an MBA, and I'm putting Linux on an old (un-upgradable) iMac of theirs because the hardware is very nice.
Does Apple charge a premium for their brand? Yes. Does that mean they're on a hype train? Nope, they really do put together a good UX - with the exception of eliminating standard ports in favor of expensive adapters.
> It's the entire reason Apple still exists as a company, for example.
Nope. Apple still exists because they were the first ones to sell digital media players that weren't a DRM clusterfuck (looking at you, Zune) and that had actual storage capacity, battery life, and sound quality. The iPod saved Apple - at one point it contributed 40% of revenue - and with the iPod Touch and iPhone's jailbreaks proving demand for apps, it paved the way to the endless cash cow that is the App Store.
This is ahistorical. iPods were locked to a specific iTunes library at a time when other players acted as mass storage devices, drag and drop.
DRM-free didn't happen until after the iPhone [0]
iPod was a better product with better marketing, but it was not freer (libre) than its competition. (Personally I liked my Creative Zen with a Yahoo Music Unlimited subscription [1] - DRM was a feature! Spotify wouldn't come along for another 5 or 6 years.)
I was just getting into high school. I had a few of the earlier competitors. The difference between those and an iPod was that the iPod "just worked."
To me, it's the reason Apple made it. When family bought Apple stuff, it tended to function. When they bought Windows machines, Dells, or SanDisk media players, the stuff tended to end up unplugged on its side, with a screwdriver in one hand and a phone in the other.
But iTunes *was* the user experience, with playlists and a proper desktop interface. Drag and drop into iTunes was much more convenient for 99% of users - except maybe you.
I got the AirPods 2 recently. It's not just that they work really nicely with my iPhone - the sound quality blew me away, and spatial audio with an iPhone is almost unnerving in its realism, albeit not of much use.
A bunch of people have already pointed out that Apple is a weird thing to use here, but no one has really pointed out that Apple actually takes the opposite approach with remarkable success.
Apple typically waits for competitors to jump on the hype train, usually with poor results, then learns from their lessons and steps in with a product that appeals to non-early-adopters.
Smartphone hype had basically died down by the time the iPhone was introduced. Apple Watch wasn't out until 5 years after Fitbit (which was by no means the first smartwatch). Apple Vision came at a time when a number of AR/VR devices had failed to gain any sort of traction and everyone had kind of accepted that AR/VR wasn't going to be mass market for a long time; it really only benefits from hype around the Apple brand, not because anyone is particularly excited about non-Apple VR.
> It's the entire reason Apple still exists as a company, for example.
I genuinely can't believe you've said this with a straight face. If anything, Apple is a company that ruthlessly figures out what best delivers value to their customers and iterates like crazy on that principle.
> We have nobody to blame but ourselves. Consumers are far too happy to just jump on the hypiest train and don't care enough to actually educate themselves about which product has maximum value. It's the entire reason Apple still exists as a company, for example.
And water is all too happy to flow downhill. So we build dams to hold it back.
> Consumers are far too happy to just jump on the hypiest train and don't care enough to actually educate themselves about which product has maximum value. It's the entire reason Apple still exists as a company, for example.
Can you elaborate on this? It seems to me that, observing Apple, it is a company that creates and captures significant value as a business. They are excelling in some key business areas, such as supply chain logistics, while also moving the needle in a meaningful way with the development of best-in-class hardware, specifically in sensor packages, baseband processors, and CPUs. Some of that has to do with their buying up TSMC capacity on N3 and other nodes; however, there is real business acumen in choosing to do this and in understanding the supply chain well enough to dominate the market from a hardware perspective.
While I agree that this release from Intel is mostly marketing, it isn't clear to me that Apple is driving a hype train, rather than actually meaningfully moving the needle.
>Some of that has to do with their buying up TSMC capacity on N3 and other nodes; however, there is real business acumen in choosing to do this
That seems a little excessive. I'd say that, on the contrary, it would take a profound lack of business acumen to do otherwise in Apple's position. If you're trying to sell a premium product at that scale, what else would you do?
Good question. I'd honestly agree with you. There was an article linked here the other day quoting Charlie Munger saying that the secret to success in business is less about doing the right things and more about not doing the wrong things. Unfortunately, if you look at the hardware business today, there are a lot of companies clearly doing the wrong things; Apple clearly is not. It still goes to my point that Apple is far more than a hype train - there's actually something of substance there.
Why blame consumers? The market is designed by advertising to be opaque like this. You aren't supposed to quickly glean technical information and take relevant action. You are supposed to only find the hype when you go looking, and buy into it. This is our modern marketplace. The consumers are blameless; they are a captive audience here, and Joe Schmoe down the street isn't making computers for you out of his garage.
Sort of true, but the problem is that "ourselves" includes everyone from the top 2% of consumers, with highly specialized education, training, and experience in the relevant technology, down to the randos who couldn't explain the technology if a gun were held to their temple and who just use "social proof" as their primary criterion for buying anything.
Category 2 and everything in between makes up the bulk of the customer base. So even if 100% of the Category 1 people, plus everyone they knew, bought only true technological advancements and never bought marketing-driven products, there are not enough of them to make a difference to the marketing & finance "leaders" at those companies.
> On stage, Gelsinger teased what he claimed would be the "world's largest AI supercomputer in Europe," and one of the "top 15 AI supercomputers in the world."
Disregarding what constitutes an "AI supercomputer", surely the "world's largest AI supercomputer in Europe" would just be Europe's largest AI supercomputer?
I interpreted it as the world's largest AI supercomputer, which may not be the fastest, and which happens to be located in Europe. Size and performance are two different metrics, especially if you're using lower-density processing.
Phenom I was truly awful, sadly... especially once the bugs and instability turned up. It wasn't really competitive even before it had to be turned down.
A marketing announcement leading to competition is a good thing.
It is only a positive thing to have robust competition in a market. This goes not only for the customers, but the employees in the companies that are competing. Sort of sucks to work in a company that won't (or can't) compete.
I think folks are underestimating how far we have to go on inference performance, even ignoring LLMs like phi-1.5. PyTorch was meant to be easy to use first and fast second. This turned out to be the winning combo for ad hoc science workloads.
For pure LLM inference, you can go much further by fusing operations together and eventually building ASIC extensions to improve transformer perf.
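To make the fusion point concrete, here's a rough PyTorch sketch (my own illustration, not from the parent comment): PyTorch 2.x already ships one such fused op, scaled_dot_product_attention, which collapses the matmul/softmax/matmul of attention into a single kernel.

    import torch
    import torch.nn.functional as F

    q = torch.randn(1, 8, 128, 64)   # (batch, heads, seq, head_dim)
    k = torch.randn(1, 8, 128, 64)
    v = torch.randn(1, 8, 128, 64)

    # Unfused reference: three separate ops, each materializing an intermediate.
    attn = torch.softmax((q @ k.transpose(-2, -1)) / 64 ** 0.5, dim=-1) @ v

    # Fused path: one call that can dispatch to a FlashAttention-style kernel.
    fused = F.scaled_dot_product_attention(q, k, v)

    print(torch.allclose(attn, fused, atol=1e-5))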
I’d easily believe that we will see a 10x reduction in inference costs over the next 5 years as hardware folks catch up.
> PyTorch was meant to be easy to use first, and fast second.
People have been saying this for years, yet most training and a majority(?) of inference is done in PyTorch.
We have all sorts of cool ML compilers, some that directly support importing PyTorch projects, largely sitting there gathering dust. Even torch.compile is catching on like molasses in winter.
Most "production" ML models of the last 10 years were only a few millions of parameters owing to limited training data and simplistic tasks. While BERT and similar models raised the parameter count substantially, my anecdotal experience observing hundreds of ML projects was that BERT based models provided only a marginal (to the business) gain in performance over smaller models.
We are now seeing mass consumer and enterprise adoption of models in the hundreds of billions of parameters (GPT-4). These models use relatively standard, building blocks - and are even chained together in a relatively standard manner. If everyone's laptop was running a 7B parameter LLM, or actively using a cloud based derivative - then LLM inference would have a chance at being the dominant compute workload in the industry.
This will definitely change the way that inference is done, even if training remains the same.
That, but I think many users don't even know it's an option to try.
I am hoping PyTorch 2.1 improves this situation. torch.compile should "just work" much more often with `dynamic=True`, and its widespread usage might make people wonder "what other cost-reducing compilation frameworks are out there?"
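For anyone who hasn't tried it, a minimal sketch of what that looks like (assumes PyTorch 2.x; the model and shapes here are made up):

    import torch
    import torch.nn as nn

    model = nn.Sequential(nn.Linear(768, 768), nn.GELU(), nn.Linear(768, 768))
    compiled = torch.compile(model, dynamic=True)   # note: Python True, not `true`

    # Different sequence lengths reuse the same compiled graph instead of recompiling.
    for seq_len in (16, 64, 256):
        x = torch.randn(seq_len, 768)
        print(seq_len, compiled(x).shape)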
I think the changed cost of money will improve this over time. It’s possible interest rates will swing back (and there is still a lot of denial about how far back) but IMO the days of throwing money at things and punting on efficiency appear to be waning.
I was really impressed with Microsoft's paper on LLM architectures, where instead of using giant chips with giant memory buses, they pipe together a bunch of cheap, monolithic, SRAM-heavy SoCs.
This would work really well for something like Llama, where the layers can be easily split up into SRAM-sized chunks, and the whole thing can be pipelined as long as many users are making requests.
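A rough sketch of that chunking idea in plain PyTorch (the layer sizes and SRAM budget are made-up numbers, purely to illustrate the split; a real pipeline would run each chunk on its own SoC with different requests in flight):

    import torch
    import torch.nn as nn

    layers = [nn.Linear(512, 512) for _ in range(8)]   # stand-in for transformer blocks

    def param_bytes(module):
        return sum(p.numel() * p.element_size() for p in module.parameters())

    SRAM_BUDGET = 4 * 1024 * 1024   # pretend each cheap SoC has 4 MiB of SRAM

    # Greedily pack layers into SRAM-sized chunks (one chunk per SoC).
    chunks, current, used = [], [], 0
    for layer in layers:
        size = param_bytes(layer)
        if current and used + size > SRAM_BUDGET:
            chunks.append(nn.Sequential(*current))
            current, used = [], 0
        current.append(layer)
        used += size
    chunks.append(nn.Sequential(*current))

    # With many concurrent requests, each chunk can work on a different
    # request at the same time; here we just run them back to back.
    x = torch.randn(4, 512)
    for chunk in chunks:
        x = chunk(x)
    print(len(chunks), "chunks;", x.shape)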
That's how we get locked into that model - e.g. how processors became optimized for LLVM/the C ecosystem (whereas in the past there were Lisp-optimized processors, general processors, etc.)
I've learned The Register isn't a very good rag; they push the cattiness so much that you can only get information from between the lines. The reality is that "AI" is an important part of the future - local AI, if it's going to be a good future. The other reality is that Apple is way ahead with its M chips for consumer devices, with their tremendous memory bandwidth and GPUs well suited to running LLMs at up to the speed of a 4080. If they didn't limit RAM so much and could scale up to the level of professional GPUs, they would be very compelling.
AMD is being very flaky with their rollout. They only provide the ML engine on some of their new chips, and let some vendors block it out in the BIOS. I've been watching their GitHub demo site, and they won't commit to anything but a very limited demo for Windows.
Intel at least released an open-source kit earlier this year and will support their VPU on all Meteor Lake+ chips, as well as other processors. Consistency and openness are needed to break Nvidia's stranglehold here, and OpenVINO could be a good step.
Meteor Lake will have 128 GB/s of on-package RAM bandwidth - a fraction of what the higher-end M chips offer. It really remains to be seen how this will shake out, but it seems likely there will be a new class of computing device, and it may make all others feel very second class.
The cattiness is fine here, because the validity of the use case doesn't mean their chips have a big enough advantage on that use case to matter. Not just because of all the vector units you already have in the CPU and iGPU cores, but because, as another comment points out, memory bandwidth is often the bottleneck, and a bunch of matrix-multiply boosts won't affect that.
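Back-of-envelope version of the bandwidth argument (the 4-bit 7B model size and the ~64 GB/s DDR5 figure are assumptions, not measurements):

    # Every generated token has to stream (roughly) all the weights from RAM,
    # so memory bandwidth caps tokens/sec no matter how many matmul units you add.
    model_bytes = 7e9 * 0.5        # assumed: ~7B params at 4-bit quantization ≈ 3.5 GB
    ram_bandwidth = 64e9           # assumed: ~64 GB/s dual-channel DDR5
    print(f"~{ram_bandwidth / model_bytes:.0f} tokens/sec upper bound")  # ≈ 18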
I'm not entirely sure that Intel is in the best position to make a consumer-grade specialty "AI" PC, because they make CPUs, not GPUs. Maybe that's a bad interpretation on my end, since Intel has provided integrated graphics for computers without a dedicated GPU.
Side note: as a hobbyist game dev I've learned about shaders. A shader is simply a program written for a GPU. In game dev we talk about shaders for graphics output, but one can write generic shaders too (look up GPGPU). My eyes were opened to the concept that there are programs written for the CPU and programs written for the GPU.
A CPU typically has <=12 cores and each core computes very quickly. A GPU typically has <=1024 cores and each core computes quickly (but not as quickly as a CPU core). It follows that a program ideal for a GPU is one that benefits from massive parallelization. I'm still noodling on what that ideal program could be, outside of the obvious graphics processing.
Sharing in case others hadn't realized this tidbit and/or someone more experienced can share their thoughts.
They do make GPUs! They've made some beefy iGPUs in the past, like Broadwell with eDRAM, and they make discrete consumer GPUs like the very reasonably priced 16GB Arc A770 now, and they make 128GB server GPUs!
They have some very large GPUs, like Falcon Shores and the Battlemage enthusiast die, on their roadmap.
Also, AI is not always as "core happy" as you would think. For instance, llama.cpp can essentially saturate a DDR5 RAM bus generating tokens on a relatively modest CPU, hence a huge iGPU would bring limited benefits in that specific phase. It's also a relatively "serial" operation, since the next tokens depend on the previous ones, hence it's hard to get good GPU utilization without serving multiple clients in parallel. And other "AI chip" designs have diverged from GPUs, like this one:
Wow, you sent me down a serious rabbit hole. That core looks awesome, and the ring bus blew my mind [1].
What is the practical difference between a very wide SIMD processor and very many single-instruction processors? Is there any? They say this processor comprises 16 "slices", each 256B SIMD + 1 MiB cache. Is that different at all from having 16 256B processors?
A huge SIMD core should have a die area advantage (which in turn gives it other advantages) over a bunch of little processors of the same "width" executing instructions independently. And there is less overhead.
In exchange, it can't process different instruction streams simultaneously. It has to perform the same operations over a giant chunk of data, or otherwise "waste" the huge SIMD width.
Another highlight of this thing:
> There is also extensive support for predication with 8 predication registers. The unit is optimized for 8-bit integers (9-bit calculations)
From everything I read, NCore would have been a low-cost LLM monster. Centaur would probably be alive and selling them like hotcakes if they had come out with it now, instead of then.
> A CPU typically has <=12 cores and each core computes very quickly. A GPU typically has <=1024 cores and each core computes quickly (but not as quick as a CPU core).
That's not entirely accurate... CUDA "cores" are mostly analogous to SIMD lanes, which IIRC on Nvidia are grouped 32 wide (a warp), so 1024/32 = 32 actual individual compute units...
Similarly, most Intel CPUs support AVX, which can be 16 or even 32 lanes wide in the case of AVX-512 - which gives you 384 lanes.
There are of course many other details that make the comparison difficult...
Of course the essence of your thinking is in the right direction, but it's just that, in some ways, CPUs and GPUs have converged more than may be obvious at first glance.
I will, however, take issue with the point that Intel may not be well placed. Given the similarities, along with Intel's experience with GPUs and projects like Larrabee (which eventually became Xeon Phi), they certainly have the technical expertise. Whether they are institutionally well suited for the task, we'll just have to wait and find out.
> I'm not entirely sure that intel is in the best position to make consumer grade specialty "AI" PC because they make CPUs not GPUs.
Intel makes (albeit recently) the Arc series of dedicated GPUs.
I'm not a graphics programmer so it's not quite clear how the logic units compare between Nvidia, AMD, and Intel, but this link suggests thousands of shading units/cores.
> I'm still noodling on what that ideal program could be, outside of the obvious graphics processing.
Basically any kind of big linear algebra workload. Things where you have a big vector/matrix and you need to apply some operation to the whole thing. This applies to graphics processing, but also to DSP in general: RF processing, image manipulation, video encoding, etc. Audio stuff tends not to need it because you only need to process 44,100 samples per second, and CPUs are fast enough for that.
AI stuff, I understand, uses a lot of linear algebra, so GPUs are good for that workload also.
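A tiny example of that kind of workload in PyTorch (falls back to the CPU if no GPU is present):

    import torch

    device = "cuda" if torch.cuda.is_available() else "cpu"
    a = torch.randn(4096, 4096, device=device)
    b = torch.randn(4096, 4096, device=device)
    # One matmul: ~137 GFLOPs of identical multiply-accumulates, independent
    # across the ~16.7M output elements - exactly what a GPU is built for.
    c = a @ b
    print(c.shape, device)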
Intel do make GPUs. They're relatively new to the high-performance space, but they have been making iGPUs for a very long time. Their success in the dGPU market is far from certain, but they're making a credible effort.
More to the point, we use GPUs because that's what we have, not because they're optimal. Intel have a lot of experience in designing special-purpose accelerators, with perhaps the best example being Quick Sync - a low-end Intel CPU outperforms pretty much anything at video encoding and decoding. Inferencing is a very different workload to training and there is substantial potential for on-CPU accelerators to massively improve the performance and efficiency of inferencing tasks.
"CUDA C++ extends C++ by allowing the programmer to define C++ functions, called kernels, that, when called, are executed N times in parallel by N different CUDA threads, as opposed to only once like regular C++ functions."
GPUs are increasingly optimised for machine learning workloads, but they aren't really designed for it on an architectural level and still leave a lot of potential performance and efficiency on the table. Current ML implementations are still quite kludgy and path-dependent, with lots of technical debt. There is huge potential for optimisation throughout the stack.
I’d welcome correction, but my understanding is that NPUs are “native” neural weight processors that trade the flexibility of doing graphics for the optimization of not having a layer like CUDA between neural net and hardware.
The idea isn't terrible, but I don't understand the market. People do 90% of their work "on the web" or "in the cloud", but they're going to be concerned about running an LLM locally (and have no ability to audit security)?
Individuals aren't concerned about that, but plenty of companies and government agencies are. Most wouldn't even want to audit the LLM, they just want to audit that the information doesn't leave their network. Running the LLM locally on some "AI PC" would satisfy that perfectly, and the types of organizations that care the most about this are also the ones with the most generous budgets for these kinds of problems.
>Running the LLM locally on some "AI PC" would satisfy that perfectly, and the types of organizations that care the most about this are also the ones with the most generous budgets for these kinds of problems.
There are far better solutions, today, for intranet LLMs than having an AI PC on each worker's desk.
>having no ability to audit security vs running one "in the cloud"?
Seems like the opposite to me.
By "ability" I mean, literally, ability.
If the customer is someone who does most of their computing in the cloud today, what value is a "personal secure LLM"? Can this person even tell if their local LLM is secure? Doubt it. For the average person, that is best left to "the cloud".
I think they meant people trust the web enough to use it for most of their activities so the ability to audit security in a local LLM may not be a big selling point.
No, the point is that I'm already doing basically all of my computing (both work and non-work) via web applications. All of the AI integrations into products can happily run on backends rather than locally on my machine. There are exceptions, of course, but this is the state of the bulk of modern computing.
A PC with a built-in AI "IT guy" would be a game changer. It sits there crunching all the logs and performance data, making tweaks or recommendations for solving known or potential issues.
Much of what I did in my systems support role was sifting through log files, trying to find the answers amid mountains of crappy search results.
Apple's been making hay with "neural engines" in their hardware for quite a while. It is curious why Intel hasn't made a bigger splash with its efforts, which go back a ways and have included (semi?) cancelled things like RealSense, plus acquisitions like Movidius (whose chips also formed the core of early products like the Google AIY Vision Kit; AIY is still being marketed as a deployment target for Coral).
I guess that: 1. Apple has been able to capitalize on their constrained stack of HW/SW. 2. Intel has struggled with being the unpopular dish at a well optioned buffet.
On-device ML inferencing is going to be an interesting space.
Your iPhone, iPad, Mac, and Apple TV make use of a specialized neural processing unit called Apple Neural Engine (ANE) that's way faster and more energy efficient than the CPU or GPU.
This year at WWDC 2022, Apple is making available an open-source reference PyTorch implementation of the Transformer architecture . . . specifically optimized for the Apple Neural Engine (ANE), the energy-efficient and high-throughput engine for ML inference on Apple silicon.
Intel RealSense Technology, formerly known as Intel Perceptual Computing, was a product range of depth and tracking technologies designed to give machines and devices depth perception capabilities. . . In August 2021 Intel announced it was "winding down" its RealSense computer vision division to focus on its core businesses.
The AIY Maker Kit . . . Using a Raspberry Pi and a few accessories, our tutorials show you how to build a portable device that can identify objects, people, and body poses with a camera, and recognize voice commands or other sounds with a microphone. . . [using] TensorFlow Lite models. . . .
Very true. Apple is in a prime position to make strides in language model development on their hardware. An Apple-designed LLM that runs locally would be amazing. Especially if they could integrate it with iOS or macOS.
While I'm very excited for an LLM-powered Siri that has access to my calendar, email, and messages, it's worth noting that llama.cpp is using Metal for acceleration, and the M2 Mac Studio is one of the best values out there for running very large models like Llama 2 70B derivatives or Falcon 180B. The shared RAM/VRAM allows you to load huge models, and it's got decent tokens/sec performance.
In the last few weeks, the initial prompt evaluation step got much faster in llama.cpp, so everything that depends on it, like the Python library and LM Studio, is faster as well. I'm very happy with my purchase at this point and it seems to keep getting better.
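For reference, here's roughly what that looks like through the llama-cpp-python bindings (the GGUF filename is a hypothetical local file; n_gpu_layers=-1 offloads all layers to Metal on Apple Silicon):

    from llama_cpp import Llama

    llm = Llama(
        model_path="llama-2-70b.Q4_K_M.gguf",  # hypothetical local GGUF file
        n_gpu_layers=-1,                       # offload every layer to the GPU/Metal
        n_ctx=4096,
    )
    out = llm("Summarize the Battle of Hastings in one sentence.", max_tokens=64)
    print(out["choices"][0]["text"])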
I'm curious about this memory thing with the Macs. I was looking at Airs with the M2, and the memory is only 8 GB. Is that the limit if I'm, say... running containers and applications?
Where can I learn more?
I don't understand how one can ship a computer today with only 8 GB of memory. My phone has more than that. Are developers able to work with this amount of memory, or are they forking out $3k+ for a Pro with 24 GB?
I went with a Framework because I got good hardware for way less money
My previous MBP had 32 GB of RAM and it was constantly swapping. Now I have 64, and it's holding up. I guess it really depends on your work environment, but I need Docker, IntelliJ, plus a bunch of other stuff, and it quickly adds up.
I don't know how anyone can use a laptop with only 8 GB of RAM. Maybe if all they do on it is Facebook and YouTube ¯\_(ツ)_/¯.
> Apple is in a prime position to make strides in language model development on their hardware.
Ok, I'll take the bait and disagree with your agreement.
Apple has a good thing going for now. However, their tightly coupled HW/OS and insular corporate culture limit their reach. LLM is a good example of this. They may have their own cooking in some secret room. Or not. What is the value=(reward/risk) of making something like an on-device LLM that Apple might do? Maybe Apple will acquire you or maybe you get Sherlocked.
As an example, Siri was an acquisition and a neat trick, but it's barely improved since 2011 (I'm still a basically satisfied many-times-daily user). Although Apple has changed a lot since 2011, it hasn't demonstrated an ability to execute something like meaningful integration and improvement of an LLM. Therefore I would not agree that "Apple is in a prime position to make strides in language model development on their hardware."