They have customers already; one (Argonne National Laboratory) is named explicitly.
The issue with using 'industry standard' benchmarks is that it's like measuring a bus's efficiency by shuttling around a single person at a time. The CS-1 is simply bigger than that; the workloads where it provides the most value are ones sized to fit, and built specifically for the device.
This does make it hard to evaluate as outsiders (certainly for similar reasons I never liked Graphcore), but I don't think it means anything as grim as you say. The recipe fits.
They could always release figures for larger networks - they don't have to target ResNet-50 (the MLPerf standard). I don't think anyone would hold it against them if they showed massive improvements in something like GPT-2 training time (a network they claim is 37,000x the size of ResNet-50).
That sounds like horseshit to me. Very large public datasets and models are available for testing training on a chip or system of any size. ImageNet is large enough for this, and if it's not sufficient, OpenImages is also available.
To me as a practitioner, a meaningful metric would be "it trains an ImageNet classifier to e.g. 80% top-1 in a minute". If it's not suitable for CNNs, do BERT or something else non-convolutional. Even better if I can replicate the result in a public cloud somewhere. They know this, and yet all we have is a single mention of a customer under an NDA and no public benchmarks of any kind, let alone verifiable ones. If it did excel at those workloads, we'd already know.
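For what it's worth, the metric I mean is "time-to-accuracy", which is also roughly what MLPerf measures: run training until a target quality is hit and report the wall clock. A minimal sketch of that harness (the `train_step`/`evaluate` callables are hypothetical stand-ins for a real training loop and validation pass, not anything Cerebras ships):

```python
import time

def time_to_accuracy(train_step, evaluate, target_top1, max_steps=10_000):
    """Run train_step() until evaluate() reaches target_top1 top-1 accuracy.

    Returns (steps_taken, elapsed_seconds). This is the shape of a
    time-to-accuracy benchmark: the only numbers that matter are the
    quality target and the wall-clock time to reach it.
    """
    start = time.monotonic()
    for step in range(1, max_steps + 1):
        train_step()
        if evaluate() >= target_top1:
            return step, time.monotonic() - start
    raise RuntimeError(f"top-1 {target_top1} not reached in {max_steps} steps")

if __name__ == "__main__":
    # Toy stand-in "model" whose accuracy improves by an exact binary
    # fraction each step, so the comparison below is float-safe.
    acc = [0.0]
    def train_step():
        acc[0] += 0.0625
    def evaluate():
        return acc[0]
    steps, elapsed = time_to_accuracy(train_step, evaluate, target_top1=0.80)
    print(steps)  # 13 steps: 13 * 0.0625 = 0.8125 >= 0.80
```

The point is that any vendor can publish this for any public dataset, whatever the chip's size; the harness doesn't care what hardware is underneath.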
> Cerebras hasn’t released MLPerf results or any other independently verifiable apples-to-apples comparisons. Instead the company prefers to let customers try out the CS-1 using their own neural networks and data.
> This approach is not unusual, according to analysts. “Everybody runs their own models that they developed for their own business,” says Karl Freund, an AI analyst at Moor Insights. “That’s the only thing that matters to buyers.”
Sounds like instead of benchmarks, prospective customers get a chance to run a workload of their choice on the hardware before purchase. Assuming support is good, that's way better than looking at benchmarks, because you're guaranteed that the performance you're comparing is for workloads you actually care about.
The appropriately large models with public recognition I know of use attention, which is too memory-hungry to work effectively on the CS-1. The datasets aren't the issue.
I'm fine with skepticism. It's certainly plausible that they don't actually do all that well.