They have customers already; one (Argonne National Laboratory) is named explicitly.
The issue with using 'industry standard' benchmarks is that it's like measuring a bus's efficiency by shuttling around a single person at a time. The CS-1 is simply bigger than that; the workloads where it provides the most value are ones sized to fit, and built specifically for the device.
This does make it hard to evaluate as outsiders (certainly for similar reasons I never liked Graphcore), but I don't think it means anything as grim as you say. The recipe fits.
They could always release figures for larger networks - they don't have to target ResNet-50 (the MLPerf standard). I don't think anyone would hold it against them if they showed massive improvements in something like GPT-2 training time (a network they claim is 37,000x the size of ResNet-50).
That sounds like horseshit to me. Very large public datasets and models are available for testing training on a chip or system of any size. ImageNet is large enough for this, and if it's not sufficient, OpenImages is also available.
To me as a practitioner, a meaningful metric would be "it trains an ImageNet classifier to e.g. 80% top-1 in a minute". If it's not suitable for CNNs, do BERT or something else non-convolutional. Even better if I can replicate the result in a public cloud somewhere. They know this, and yet all we have is a single mention of a customer under an NDA and no public benchmarks of any kind, let alone verifiable ones. If it did excel at those workloads, we'd already know.
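For what it's worth, the metric I mean is "time-to-accuracy", which is also roughly what MLPerf measures: run training until a target quality is hit and report the wall clock. A minimal sketch of that harness (the `train_step`/`evaluate` callables are hypothetical stand-ins for a real training loop and validation pass, not anything Cerebras ships):

```python
import time

def time_to_accuracy(train_step, evaluate, target_top1, max_steps=10_000):
    """Run train_step() until evaluate() reaches target_top1 top-1 accuracy.

    Returns (steps_taken, elapsed_seconds). This is the shape of a
    time-to-accuracy benchmark: the only numbers that matter are the
    quality target and the wall-clock time to reach it.
    """
    start = time.monotonic()
    for step in range(1, max_steps + 1):
        train_step()
        if evaluate() >= target_top1:
            return step, time.monotonic() - start
    raise RuntimeError(f"top-1 {target_top1} not reached in {max_steps} steps")

if __name__ == "__main__":
    # Toy stand-in "model" whose accuracy improves by an exact binary
    # fraction each step, so the comparison below is float-safe.
    acc = [0.0]
    def train_step():
        acc[0] += 0.0625
    def evaluate():
        return acc[0]
    steps, elapsed = time_to_accuracy(train_step, evaluate, target_top1=0.80)
    print(steps)  # 13 steps: 13 * 0.0625 = 0.8125 >= 0.80
```

The point is that any vendor can publish this for any public dataset, whatever the chip's size; the harness doesn't care what hardware is underneath.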
> Cerebras hasn’t released MLPerf results or any other independently verifiable apples-to-apples comparisons. Instead the company prefers to let customers try out the CS-1 using their own neural networks and data.
> This approach is not unusual, according to analysts. “Everybody runs their own models that they developed for their own business,” says Karl Freund, an AI analyst at Moor Insights. “That’s the only thing that matters to buyers.”
Sounds like instead of benchmarks, prospective customers get a chance to run a workload of their choice on the hardware before purchase. Assuming support is good, that's way better than looking at benchmarks, because you're guaranteed that the performance you're comparing is for workloads you actually care about.
The appropriately large models with public recognition I know of use attention, which is too memory-hungry to work effectively on the CS-1. The datasets aren't the issue.
I'm fine with skepticism. It's certainly plausible that they don't actually do all that well.