Five and a half years ago, one DGX-2 would be in the top 10 supercomputers in the world[1], and you'll probably be able to rent one on EC2 for under twenty bucks an hour before the year's out. You can already get the DGX-1 for under ten bucks an hour right now.
It falls around 220th in the November 2012 list, and would make it onto the top 10 for the November 2007 list (comparing Rpeak) and the June 2008 list (comparing Rmax).
Systems of comparable performance in the top 10 used between 300 and 500 kW (30-50x the DGX-2).
Red Storm, which has a listed Rpeak of 127 TFLOPs and placed sixth in the November 2007 TOP500, was "relatively inexpensive," costing only in the ballpark of $75M (~180x the price of the DGX-2).
There is no published price, and the main difference is that the new Quadro supports Nvidia's OptiX ray-tracing API, plus a fairly modest performance boost over the Tesla V100 card.
While it is eye-popping, you would be correct in saying this is pretty much just a relaunch under the Quadro brand.
Does anyone know when graphics cards will be available at sane prices for people who actually want to use them one (or two max) at a time to render graphics?
The bad news is that a lot of system builders like us (we build pre-built systems pre-installed with Deep Learning software http://lambdalabs.com) are now used to paying more than MSRP for our GPUs. MSRP for a 1080Ti is supposed to be $699. They haven't been available even in large bulk purchases at that price for a while now. I don't see it going back down any time soon.
It's really hard to come up with a timeline, but I'd say the worst has passed. It will take time to recover inventory, but the alt-coin market (basically, not-Bitcoin cryptocurrencies) has died down a lot and the rush to acquire mining capital has likely diminished.
If Ethereum does move over to proof-of-stake, then the market will be flooded with 1080Tis. I expect the DeepLearning11 server will then be a goldmine for DL researchers - cheap as chips.
I can't say when, but you might be happy to know that graphics card availability has been improving throughout March, albeit at expensive, though no longer insane, prices. I've been monitoring availability of various AMD and Nvidia cards since January (using nowinstock.com): previously cards would be available for a few hours at a time at an outlet - now we're up to weeks, and prices have been coming down a bit, though they are still above MSRP. If the ongoing cryptocoin downward spiral persists for a few more months, GPU prices ought to come down.
Almost all of the Nvidia GTX cards are back in stock on Amazon (for Prime shipping) and are relatively close to MSRP. (Edit: You will still have to sort through overpriced ones)
Yeah, depends on what you mean by close. I see the cheapest 1080 Ti at Newegg right now for $909. Bought mine a year ago (Founders Edition) for $699. It is getting better, but we are still far from sanity. IMHO this (year-old) card should sell today for ~$500 if things were normal.
They understood really early on that you need to invest in software just as much as you would need to invest in hardware if not more.
Open standards tend to be a horse designed by a committee: it can take years for them to evolve and reach any consensus, and they will never match the speed at which hardware can evolve and adapt to market requirements.
So NVIDIA essentially made their own software ecosystem, which can be just as flexible as their hardware and, more importantly, allows NVIDIA to be proactive rather than reactive.
To repeat the other comments, right now cuDNN is the advantage - that is manifested as TensorFlow/Keras/PyTorch. AMD have ROCm, and in these benchmarks it was 10x or more slower than P100s when training CNNs - https://www.pcper.com/reviews/Graphics-Cards/NVIDIA-TITAN-V-...
How sustainable is the advantage? Not that big - you don't need CUDA compatibility like hiptensorflow tried, in a classic shortcuts-don't-work way. Just an alternative to cuDNN for Vega that is integrated into the TensorFlow distributed binaries.
Yeah, pretty much. They are claiming the future is specialization in compute. Tensor Cores are an attempt at this - they should more accurately be called "4x4 matrix multiply in half precision with a full-precision accumulate". That is the instruction they support. Unless you are training a very deep CNN with a lot of 4x4 kernels, they won't give you much of a speedup. You're left then with the GPU memory bandwidth: 484 GB/s for the 1080Ti and about 950 GB/s for the V100. So about twice the training performance for ten times the price. Not a good deal, IMO.
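To make that concrete, here's a back-of-envelope sketch using the bandwidth figures above; it assumes training throughput scales with memory bandwidth (true for bandwidth-bound workloads), and the prices are rough assumptions taken from the "ten times the price" claim, not quotes:

```python
# Back-of-envelope price/performance for bandwidth-bound training.
# Assumption: throughput scales with memory bandwidth; prices are rough.
def bw_per_dollar(mem_bw_gbs: float, price_usd: float) -> float:
    """Memory bandwidth per dollar, a proxy for training throughput per dollar."""
    return mem_bw_gbs / price_usd

gtx_1080ti = bw_per_dollar(484, 700)    # ~MSRP
tesla_v100 = bw_per_dollar(950, 7000)   # "ten times the price" (assumed)

print(f"1080Ti: {gtx_1080ti:.3f} GB/s per dollar")
print(f"V100:   {tesla_v100:.3f} GB/s per dollar")
print(f"1080Ti advantage: {gtx_1080ti / tesla_v100:.1f}x")  # ~5.1x
```

Roughly a 5x price/performance edge for the consumer card, under those assumptions.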
When you compare that with the top Vega cards, it's a complete rip-off - but Vega cards are currently useless for deep learning. I'm not convinced AMD have put the money in to build the libraries, so we're stuck with NVIDIA for a while yet (unless Intel by some miracle pulls one off with its new platform).
NVLink is good if you are doing distributed training and sharing large amounts of gradients (very large models). If you're doing parallel experiments or your models are not huge, then PCI-e (16 GB/s) is generally good enough.
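A rough way to see when PCIe is "good enough": estimate how long one gradient sync takes at a given link speed. The model size and the NVLink aggregate figure below are assumptions for illustration:

```python
# Rough estimate of one full gradient transfer across an interconnect.
# Real all-reduce algorithms move ~2x this much data, but the scale holds.
def gradient_sync_seconds(params_millions: float, link_gbs: float,
                          bytes_per_param: int = 4) -> float:
    """Seconds to move one fp32 gradient copy across the link."""
    total_bytes = params_millions * 1e6 * bytes_per_param
    return total_bytes / (link_gbs * 1e9)

# A 100M-parameter model (assumed size):
pcie = gradient_sync_seconds(100, 16)     # PCIe 3.0 x16, ~16 GB/s
nvlink = gradient_sync_seconds(100, 150)  # NVLink 2.0 aggregate (assumed)

print(f"PCIe:   {pcie * 1000:.0f} ms per sync")    # 25 ms
print(f"NVLink: {nvlink * 1000:.1f} ms per sync")  # 2.7 ms
```

If your per-step compute time dwarfs 25 ms, or you sync infrequently, the PCIe penalty mostly disappears.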
AMD don't have anything close that I am aware of, but their new Ryzen platform will have double the number of PCIe lanes, which makes PCI-e competitive again -
https://news.ycombinator.com/item?id=14450924
NVidia definitely does have an edge in GPU hardware. AMD Vega is roughly tied with NVidia Pascal in performance, but Vega consumes more power and came out two years later.
NVIDIA may have an edge in GPU hardware, but probably not in GPU hardware architecture. AMD is contractually required to use GlobalFoundries' process, which is particularly bad in the current generation but should be competitive in the next. My understanding is that the process difference accounts for most of the edge NVIDIA has.
They were clever to understand that developers wanted the freedom to code GPGPU in C, C++, Fortran, plus any other language able to target their bytecode (PTX), instead of being bound to programming in crufty C.
Then they created nice numeric libraries and graphical debuggers for GPU programming.
Their new Volta GPUs were explicitly designed to be developed in C++ (there are a few talks about it).
By the time Khronos woke up to the idea that maybe they should support something other than C and invoking compilers and linkers at runtime - with OpenCL 2.0 - most developers were already deeply invested in CUDA.
One note on this. Their graphical debuggers aren't user-friendly at all, and the Eclipse-based "Nsight" is a pile of garbage. I haven't used the Visual Studio version, but if it isn't way, way better, then people might as well stick to vim and cuda-gdb... or just put printf statements everywhere like most of us do.
They really need to develop a purpose-built IDE or work on their integration way more.
They don't really have an edge today. They just achieved big lock-in, and inertia of those who depend on CUDA now prevents them from using other hardware.
Even if you assume APIs aren't copyrightable, I don't think anyone cared to implement CUDA itself except for Nvidia. There is really no point in proliferating such APIs, since there are OpenCL / Vulkan already which are open to begin with.
What someone could try implementing though, is translating CUDA into OpenCL (if that's possible). That would be useful to break lock-in.
What you miss is that there is a reason OpenCL didn't get traction, and the lack of tools translating CUDA to OpenCL is not it. OpenCL 2.0+ is a lot better than previous versions, but it is too little, too late.
Legacy CUDA software would only be the obstacle if the OpenCL ecosystem were otherwise better. This is not the case. Code translation from CUDA to OpenCL is a solution looking for a problem.
The ecosystem depends on CUDA; it doesn't care what that gets translated to, no? So it would work with translation, until it's properly rewritten to use open APIs. It's a solution for lock-in that limits your hardware choices, which is a problem. You don't need to look for the problem; it's pretty obvious.
> I thought the result of Oracle v. Google is that APIs can’t be copyrighted?
The result—not final yet—is that the Federal Circuit ruled APIs are eligible for copyright, but that ruling didn't create binding precedent that applies outside the Oracle v Google case. So future cases are still likely to produce the result that APIs can't be copyrighted, unless those cases also include the patent claims necessary to get them into the Federal Circuit for appeals.
I think people are underestimating the difficulty of developing high performance microarchitecture for GPU or CPU.
A new clean-sheet architecture design takes 5-7 years, even for teams that have been doing it constantly for decades at places like Intel, AMD, ARM, or Nvidia. This includes optimizing the design for the process technology, yield, etc. Then there are economies of scale and price points.
Recent examples:
* Nvidia's Volta microarchitecture design started in 2013; launch was December 2017
* AMD's Zen CPU architecture design started in 2012, and the CPU was out in 2017
In deep learning, they presented high-quality, hand-optimized building blocks way before anybody else did (cuDNN). An effect of that is that the libraries were built around CUDA and cuDNN, and now AMD is still trying to catch up. Intel just hasn't delivered a fast-enough, flexible-enough, cheap-enough GPU or GPU alternative, AFAIK.
Sheeeit. I’d love some of the stuff they’re smoking. You can build a 100 GPU rig for half as much. Just scatter it around the office so it’s not “in the data center”.
Well, those who were willing to pay the ~$150k for the DGX-1 (the Volta 16 GB upgrade, IIRC) won't necessarily find it too much - and NVIDIA is after quick money before the markets start slipping down the hype curve.
You also get quite some meat compared to the DGX-1: the equivalent of 2x DGX-1 in terms of GPUs and NICs, with 4x the HBM2 capacity, plus the NVLink fabric. Plus more SSD storage and Xeon Platinums to round it off.
Oh, and 350 lbs, let's not forget. :D
Expensive, yes! Will it sell? I'd be very surprised if it didn't!
Not only that, but the switch has the exact same throughput as the actual memory of the V100 until you get to fairly large block sizes. It was shown in a presentation.
Hahha, good one. Just that by the time that one dev implements everything, you'll be overtaken by the competition -- at least that's what everyone's fear is, and it's partly warranted.
Many problems are bandwidth-limited. More cores may indeed actually hurt performance as contention for the limiting resource makes it increasingly hard for any core to run efficiently.
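The bandwidth-limited point can be made concrete with a roofline-style check: a kernel is bandwidth-bound when its arithmetic intensity falls below the machine's ratio of peak compute to peak bandwidth. The figures below are illustrative, not from a datasheet:

```python
# Roofline-style check: a kernel is bandwidth-bound when its arithmetic
# intensity (FLOPs per byte moved) is below peak_flops / peak_bandwidth.
def is_bandwidth_bound(flops_per_byte: float,
                       peak_tflops: float, peak_bw_gbs: float) -> bool:
    ridge = (peak_tflops * 1e12) / (peak_bw_gbs * 1e9)  # FLOPs per byte
    return flops_per_byte < ridge

# E.g. a card with 11 TFLOPs and 484 GB/s: ridge point ~22.7 FLOPs/byte
print(is_bandwidth_bound(1.0, 11, 484))   # SAXPY-like kernel: True
print(is_bandwidth_bound(60.0, 11, 484))  # large dense matmul: False
```

For kernels on the left of the ridge, adding cores just adds contention for memory; only more bandwidth helps.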
Do you think they actually mean 900 Gb/s (gigabit) here? 18 ports of today's ~50 Gb/s SerDes would give 900 Gb/s, and that'd be twice the bandwidth of their NVLink 2.0 stuff (~25 Gb/s), which would seem a reasonable evolution.
Edit: Turns out that a single NVLink port is 8 bits wide, and since 144 50 Gb/s SerDes on a (big) chip is perfectly doable, 900 GB/s must be correct.
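A quick sanity check on that arithmetic (lane and port counts as stated above):

```python
# 144 SerDes lanes at 50 Gb/s each, divided by 8 bits per byte = 900 GB/s.
LANES = 144
GBIT_PER_LANE = 50
aggregate_gb_per_s = LANES * GBIT_PER_LANE / 8
print(aggregate_gb_per_s)  # 900.0

# Equivalently, 18 ports of 8 lanes each:
PORTS, LANES_PER_PORT = 18, 8
assert PORTS * LANES_PER_PORT == LANES
```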
You didn't work back in the day when, for high-end products, if you had to ask how much, you couldn't afford it.
In 1980 or so my office mate was working on a way to measure the efficiency of different toilet designs - to try and save water.
I looked at using the then very new image-recognition hardware (and, I think, neural nets), but sadly realised that with a base hardware price of £250,000 (around a million in today's money) it wasn't practical.
Which was a pity, as the organisation was into ML/AI back then and even hired one of the first knowledge engineers (as they were called back then).
The DGX-2 server (16x V100) costs the same as about 26 DeepLearning11 servers (10x 1080Ti) -
https://www.servethehome.com/deeplearning11-10x-nvidia-gtx-1...
With 260 1080Ti GPUs, you can do neural architecture search that competes with some published work by Google.
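The GPU-count arithmetic behind that comparison, as a quick sketch: the DGX-2's $399K list price was announced, but the ~$15K DeepLearning11 price here is only implied by the 26x ratio in the comment above, not a quote:

```python
# GPU-count arithmetic for DGX-2 vs. DeepLearning11 servers.
# $399K is the announced DGX-2 price; ~$15K per DL11 box is an assumption.
dgx2_price, dgx2_gpus = 399_000, 16
dl11_price, dl11_gpus = 15_000, 10

servers = dgx2_price // dl11_price   # DL11 boxes for the same money
total_gpus = servers * dl11_gpus     # 1080Ti count across those boxes

print(servers, total_gpus)           # 26 260
print(total_gpus / dgx2_gpus)        # 16.25 (x the GPU count)
```

Of course, raw GPU count ignores the NVSwitch fabric, which is the point the reply below makes.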
You are again missing the benefits of the switch. That factors into the cost at some point. You will not get the same scaling by adding an equivalent number of GPUs spread across separate systems as you will get with the same number of GPUs and a switch in between.
Too bad this came from one of the worst companies in the world when it comes to its policy toward open source. Too bad Google is playing on their side by not merging OpenCL support into TensorFlow.
More DGX-2 information - https://www.anandtech.com/show/12587/nvidias-dgx2-sixteen-v1...
Quadro GV100 announced - https://www.anandtech.com/show/12579/big-volta-comes-to-quad...
Tesla V100 memory bumped to 32GB - https://www.anandtech.com/show/12576/nvidia-bumps-all-tesla-...