> “My belief is in the next 5 to 10 years, RISC-V will take over all the data ce...

brucehoult · on June 13, 2023

The first Graviton machines (which were worse than the current 64 core RISC-V SG2042) were installed in November 2018. Four and a half years later are now up to 20% or 25% of total AWS capacity. Even without any acceleration of adoption, it looks like they could be the majority of AWS within 10 years from first deployment.

There seems to be every reason RISC-V could do it faster. Graviton 1 level chips are available right now, Graviton 3 or better coming from multiple companies (including Keller's) in 2024-2025. The software is more critical, but once x86-only things have been made portable to Arm (which can be hard work), further porting them to RISC-V is much much easier.

cjsplat · on June 13, 2023

Cloud services have a lot of back end.

Cluster management, file systems, disk / storage systems, network management, database systems.

None of those require user or OS instruction set compatibility for legacy apps that are hard or impossible to recompile.

And most of these applications don't really require gonzo superscalar performance. Add more cores, support more data streams.

If you can eliminate licensing costs for that portion of your fleet, then you only need to expand the ISA compatible portion of your fleet as demanded by paying customers.

As an example, suppose all of a cloud provider's services can migrate to RISC-V. As organic demand for x86 Cloud among customers grows, services can shift incrementally to the cheaper home grown platforms. And since the freed up machines are at least partially depreciated, the cost of these servers is much less than what a customer would pay for new servers on prem. (depreciated Cap-ex, far better Op-Ex).

The interesting question is the transition rate of end customer apps to the new ISA vs the growth rate of locked ISA apps.

Eventually the locked ISA apps portion becomes a lot like the current IBM mainframe business. Very valuable to a very small number of customers.

The only counter for this is if x86 can crank performance per $TCO so far that the non-x86 branch can't compete in business terms, which has historically been the issue with ARM.

inkyoto · on June 14, 2023

> Cluster management, file systems, disk / storage systems, network management, database systems.

… and fully managed cloud services (serverless databases, API gateways, messaging components, supplementary services etc etc). Which is where the hardware replacement is likely most frantic but impossible to glean into without having the insider knowledge. And since the fully managed services are charged on the usage basis, it is possible to utilise the hardware more efficiently under the hood and completely transparently to the end user (if properly architected) as opposed to spinning up cloud VM's.

als0 · on June 13, 2023

> The only counter for this is if x86 can crank performance per $TCO so far that the non-x86 branch can't compete in business terms, which has historically been the issue with ARM.

If we take AWS for example, isn't the performance per TCO better of an Arm-based Graviton instance better than x86? I don't think the historical issue you cite represents the future.

cjsplat · on June 13, 2023

Impossible to know from the outside.

We know what they are selling it for, but that isn't the same.

True TCO needs to include the cost to develop the chip - after all, that is folded into the x86 price.

If you assume that the Graviton project is $250M per chip design for the 3 iterations, and the online estimates of 1 million chips is accurate, then you need to add about $750 per CPU, beyond the probably $250 per chip fab'ed and packaged.

$1000 per chip gets you a lot of x86 horsepower.

nl · on June 14, 2023

$250M seems a massive overestimate given that the majority of the design is done by ARM and licensed from them.

cjsplat · on June 14, 2023

For example see :

https://www.semianalysis.com/p/the-dark-side-of-the-semicond...

I think you are overestimating the value and amount of help from ARM, especially in the more recent Graviton generations, but of course I don't know Amazon's actual chip cost profile.

tambourine_man · on June 13, 2023

Yes, 5 years seems crazily optimistic to me.

But this is Jim Keller and I’m a random internet guy.

Varloom · on June 13, 2023

This man knows exactly what he is talking about. He was responsible for designing the original AMD Athlon 64. He worked at apple to transition from Generic Samsung ARM SoC to their own Apple silicon which is the base for modern M1 Apple silicon. He worked for Intel (we'll see his work in Lunar Lake, Jim Keller's Royal Core Project). And most importantly he worked again at AMD and gave us Zen architecture.

throwaway4good · on June 13, 2023

I find a completely silly claim but I guess as a CEO you need to talk your book.

Here is another interview with Jim Keller where he explains why instruction sets doesn't matter that much:

https://www.anandtech.com/show/16762/an-anandtech-interview-...

ladberg · on June 13, 2023

That's not exactly my takeaway, e.g. he says this in that interview which is pretty consistent:

> So if I was just going to say if I want to build a computer really fast today, and I want it to go fast, RISC-V is the easiest one to choose. It’s the simplest one, it has got all the right features, it has got the right top eight instructions that you actually need to optimize for, and it doesn't have too much junk.

throwaway4good · on June 14, 2023

Yes. Because it has less legacy, it is easier to build for if you are going for on scratch. The point is most code is the same handful of instructions.

twoodfin · on June 13, 2023

My optimistic take: “All the data centers” are the Cloud hyperscalers, who are increasingly delivering value through PaaS/SaaS vs. raw VMs and IaaS.

They’re choosing the CPUs they like best, can turn over quickly if it’s worthwhile, and if the performance/economics of RISC-V are suitably appealing will do so.

I wonder how much of, say, S3’s infrastructure is running on Graviton?

guerby · on June 13, 2023

I hope RISC-V servers will come with open BIOS, given Ron Minnich stance on the proprietary BIOS issue during his time at google I think hyperscalers would like that too.

Recently had basic BIOS/BMC bugs it's annoying as hell.

syntheweave · on June 13, 2023

Data centers are one of the best demographics for adopting new architectures because more of the software can be custom-built towards a narrow application: Get a Linux stack to build, add some network I/O, add some virtualization, and you can do all sorts of things.

Client apps have a much harder time making that jump because the environment is more holistic, the hardware more varied, and the need for "must-have" proprietary apps more imperative.

inkyoto · on June 14, 2023

> Data centers are one of the best demographics […]

Hardly. Data centers are a dying breed, and their number has been rapidly dwindling in recent years. DC (and the mythical «on-prem» by extension) has effectively become a dirty word in contemporary times. The most brutal lift-and-shift approach (without discussing the merits of doing so) is most common: create a backup, spin up a cloud «VM», restore from the backup and turn the server in the DC off forever. No-one is going to even remotely consider a new hardware architecture, not even in the cloud.

Moreover, since servers do not exist in a vacuum and either run business apps or do something at least remotely useful, that entails the software migration to adopt the new platform. And the adoption has to be force-pushed onto the app developers otherwise they won't bother, and for them to convert/migrate the app onto a new architecture, they need desktops/laptops that run on the new ISA, and no viable server and desktop hardware exists in June 2023 – it will come along later with «later» not having a clear definition. Talking open source is a moot point as most businesses out there run commercially procured business apps.

xarope · on June 14, 2023

Data centers in general are NOT a dying breed, and it's more a case of rapidly growing, not dwindling. Perhaps you are referring to individual companies moving to the cloud, and colo type activity (albeit institutions with strict regulation may still require a backup colo) dwindling?

However, the cloud resource providers are definitely growing (https://www.statista.com/outlook/tmo/data-center/worldwide#r...), and there is a huge push for more power and heat efficient architecture, whether on the server/network/supporting infrastructure side.

intvocoder · on June 14, 2023

This doesn’t seem to comport with Amazon’s experience, investment, and trajectory with Graviton, based on public reference customers and a few personal anecdotes.

Ampere isn’t a slouch either…

otabdeveloper4 · on June 14, 2023

AWS and Azure are datacenters too. You're paying a x10 markup to make accounting shenanigans easier, but the technology is exactly the same.

inkyoto · on June 14, 2023

> AWS and Azure are datacenters too.

They are, but they are not data centers in the traditional sense of the term. The GP was referring to the traditional data centers as far as I understand.

> You're paying a x10 markup to make accounting shenanigans easier,

Whilst cloud platforms do allow one to accrue an eye-watering cloud bill by virtue of shooting oneself with a double-barelled gun, the fault is always on the user and the «10x markup» is complete bonkers and is a fiction.

As an isolated random example, API gateway in AWS serving 100 million requests 32 kB each with at least 99.95% SLA will cost US$100 a month. AWS EventBridge for the same 100 million monthly events with at least 99.99% availability will also cost US$100 a month.

That is US$200 in total monthly for a couple of the most critical components of a modern data processing backbone that scales out nearly indefinitely, requires no maintenance nor manual supervision and is always patched up security wise and is shielded from DDoS attacks. Compared to the same SLA, scalability and opex costs in a traditional data centre, they are a steal. Again, we are talking about at least 99.95% and 99.99% SLA for each service.

If one uses the cloud to spin up cloud VM's and databases that run 24x7 and result in an averags 10% monthly CPU utilisation, they are cooking the cloud wrong, they are wasting their own money and they are the only ones to blame the 10x markup that is a delusion caused by ignorance.

> but the technology is exactly the same.

The underlying technology might be the same, but is abstracted from the user who can no longer care about it, use a service and pay for the actual usage only. The platform optimises the resource utilisation and distribution automatically. That is the value proposition of the cloud today and not 15 years ago.

otabdeveloper4 · on June 14, 2023

> They are, but they are not data centers in the traditional sense of the term.

They are, 100%, without a doubt, absolutely, data centers in the traditional sense of the term.

> ...the «10x markup» is complete bonkers and is a fiction.

Go compare prices of e.g. Hetzner or OVA and come back to me again with that "fiction".

> 100 million monthly events

That's only about 35 events per second. Hosting over at Hetzner will cost you maybe $25 a month. So yeah, the x10 markup is real.

inkyoto · on June 14, 2023

> Go compare prices of e.g. Hetzner or OVA and come back to me again with that "fiction".

I have given two real examples of two real and highly useful fully managed services with made-up data volumes along with their respective costs. Feel free to demonstrate which managed services API gateway and pub/sub services Hetzner or OVA have to offer that come close or the same, functionality and SLA wise, – to compare.

> That's only about 35 events per second.

Irrelevant. I am not running a NASDAQ clone, and most businesses do not come anywhere close to generating 35 events per second anyway. If I happen to have a higher event rate, the service will scale for me without me lifting a finger. Whereas if a server hosted in a data centre has been underprovisioned, it will require a full time ops engineer to reprovision it, set it up and potentially restore from a backup. That entails resource planning (a human must be available) and time spent on doing it. None of that is free, especially operations.

> […] Hosting over at Hetzner will cost you maybe $25 a month.

It is the «maybe» component that invalidates the claim. Other than «go and compare it yourself» and hand-waving, I have seen slightly less than zero evidence as a counter-argument so far.

Most importantly, I am not interested in hosting and daily operations, whereas the business is interested in a working solution, and the business wants it quickly. Hosting and tinkering with, you know, stuff and trinkets on a Linux box is an antithesis of the fast delivery.

The vast majority of servers in data centers idle by most of the time anyway consuming electricity and generating pollution for no-one's gain so the argument is moot.

otabdeveloper4 · on June 15, 2023

> "Hosting and tinkering"

It isn't 1992 anymore, people don't "tinker", they have orchestration in 2023.

The orchestration tools for self-hosted are cheaper, more standard and more reliable. (Because Amazon's and Google's stuff is actually built on top of standard stacks, except with extra corporate stupidity added.)

Regardless of whether you use something industry standard or something proprietary, you will need to have an ops team that knows orchestration. (And an AWS orchestration team will be more expensive, because, again, their stuff is non-standard and proprietary.)

There are reasons for using AWS, but cost or time to market is never one of them.

Osiris · on June 13, 2023

Considering modern processors spend 4-5 years in development before public release, someone would have to be building the game changing RISC-V CPU right now.

Maybe he meant that development on RISC-V CPUs would start in earnest in the next 5-10 years?

snvzz · on June 13, 2023

>Considering modern processors spend 4-5 years in development before public release, someone would have to be building the game changing RISC-V CPU right now.

And they are.

Tenstorrent is working on Ascalon. Wei-han Lien (lead architect of M1 at Apple) is the lead architect. Ascalon is a RISC-V microarchitecture expected to be released in 2024, with similar performance to projected AMD Zen5 (also 2024), but lower power consumption.

Ventana Veyron is due late 2023. A very high performance server chip AIUI implementing RVA22+V.

Rivos has been working on something RISC-V, with a very strong team, for several years now.

SiFive's next iteration of high performance CPUs is expected to be strong.

Alibaba group has something in the works, too.

And this is all just the tip of the iceberg. There's way more known projects ongoing, and even more that we do not know of.

WatchDog · on June 14, 2023

The reason that processors take so long to develop, might have something to do with the complexity of the ISA, most of CPU development effort is spent on verification, if you have a simpler ISA, I would imagine that it makes verification easier.

snvzz · on June 14, 2023

Furthermore, for RISC-V there's no need to reinvent the wheel re: verification.

Multiple companies are offering RISC-V verification services.

pier25 · on June 13, 2023

Who knows. There's a huge financial motivation to move to RISC-V to save on energy and heat. Cooling is one of the biggest expenses of a data center.

blueblob · on June 13, 2023

Is RISC-V any more efficient than arm?

pier25 · on June 13, 2023

Probably similar but the advantage of RISC-V is that it's open source.

AFAIK x86 is still dominating the data centers. There's a bit of ARM going on but wouldn't it make more sense to switch straight to RISC-V?

blueblob · on June 13, 2023

Yeah, an open source architecture sounds like it could be a game changer for the SBC market

otabdeveloper4 · on June 14, 2023

No, because aarch64 linux distros only became a viable thing just recently.

Who knows how many decades it will take for e.g. Debian to support RISC-V properly.

blueblob · on June 16, 2023

I didn't downvote you; I think you make a reasonable, albeit exaggerated, point. I think it's also important to look through the lens that a lot of things linux supports are reverse engineered and that's why they take a long time to implement. This is de facto different with everything being open so I expect support will come faster. There's also the fact that this aligns better with the ideologies of a lot of free software enthusiasts so they may be more likely to work on it.

imtringued · on June 13, 2023

All the Chinese datacenters.

doctor_eval · on June 13, 2023

I read this and immediately thought of that aphorism, which I heard back in the dialup days when it seemed the internet was never going to take off:

Important changes always take much longer than we expect, but have much greater impact than we imagine.

snvzz · on June 13, 2023

>Isn't this essentially impossible?

RISC-V is inevitable.

jylam · on June 13, 2023

Why ?

klelatti · on June 13, 2023

My interpretation of 'take over' would be a majority of new server installs would be RISC-V based. There is a lead time for development, orders etc plus customers have to be content to switch to a new architecture. Amazon's Arm program started what 6ish years ago and they are at 20% installs (from my recollection).

brucehoult · on June 14, 2023

4 1/2 years. November 2018. 20% in 2022, quite possibly 25% by now.