Few people, Nvidia’s competitors included, would dispute the fact that Nvidia is calling the shots in the AI chip game today. The announcement of the new Ampere AI chip at Nvidia’s main event, GTC, stole the limelight last week.
There has been ample coverage, including here on ZDNet. Tiernan Ray provided an in-depth analysis of what is new and noteworthy about the chip architecture itself. Andrew Brust focused on the software side of things, expanding on Nvidia’s support for Apache Spark, one of the most successful open-source frameworks for data engineering, analytics and machine learning.
Let’s pick up from where they left off and put the new architecture into perspective by comparing it against the competition in terms of performance, economics and software.
Nvidia’s double bottom line
The core of Ray’s analysis is capturing Nvidia’s intention with the new generation of chips: to provide one chip family that can serve both for “training” neural networks, where the neural network’s operation is first developed on a set of examples, and for inference, the phase in which predictions are made based on new incoming data.
Ray notes that this is a departure from today’s situation, where different Nvidia chips turn up in different computer systems for either training or inference. He goes on to add that Nvidia hopes to make an economic argument to AI shops that it is best to buy an Nvidia-based system that can perform both tasks.
“You get all that overhead of extra memory, CPUs and power supplies on 56 servers … collapsing into one,” said Nvidia CEO Jensen Huang. “The economic value proposition is really off the charts and that’s the thing that’s really exciting.”
Jonah Alben, Nvidia’s senior VP of GPU engineering, told analysts that Nvidia had already pushed Volta, its previous-generation chip, as far as it could go without catching fire. It went even further with Ampere, which features 54 billion transistors and can execute 5 petaflops of performance, or about 20 times more than Volta.
So Nvidia is after a double bottom line: better performance and better economics. Let’s recall that Nvidia also recently added support for Arm CPUs. Although Arm processor performance may not be on par with Intel at this point, its frugal power needs make it an attractive option for the data center too, according to analysts.
On the software front, in addition to Apache Spark support, Nvidia also unveiled Jarvis, a new application framework for building conversational AI services. To offer interactive, personalized experiences, Nvidia notes, companies need to train their language-based applications on data specific to their own product offerings and customer requirements.
However, building a service from scratch requires deep AI expertise, large amounts of data, and compute resources to train the models, as well as software to regularly update the models with new data. Jarvis aims to address these challenges by offering an end-to-end deep learning pipeline for conversational AI.
Jarvis includes state-of-the-art deep learning models, which can be further fine-tuned using Nvidia NeMo, optimized for inference using TensorRT, and deployed in the cloud and at the edge using Helm charts available on NGC, Nvidia’s catalog of GPU-optimized software.
Intel and GraphCore: high-profile challengers
Coming back to software, this is something we have noted time and again with Nvidia: its lead is not just in hardware. In fact, Nvidia’s software and partner ecosystem may be the hardest part for the competition to match. However, the competition is making moves too. Some competitors may challenge Nvidia on economics, others on performance. Let’s see what the challengers are up to.
Intel has been working on its Nervana technology for a while. In late 2019, Intel made waves when it acquired startup Habana Labs for $2 billion. As analyst Karl Freund notes, since the acquisition Intel has been working on switching its AI acceleration from Nervana technology to Habana Labs.
Freund also highlights the importance of the software stack. He notes that Intel’s AI software stack is second only to Nvidia’s, layered to provide support (through abstraction) for a wide variety of chips, including Xeon, Nervana, Movidius and even Nvidia GPUs. Habana Labs has two separate AI chips: Gaudi for training and Goya for inference.
Intel is betting that Gaudi and Goya can match Nvidia’s chips. MLPerf inference benchmark results published last year were positive for Goya. However, we will have to wait and see how it fares against Nvidia’s Ampere and Nvidia’s ever-evolving software stack.
Another high-profile challenger is GraphCore. The UK-based AI chip maker has an architecture designed from the ground up for high performance, and has reached unicorn status. GraphCore has also kept busy expanding its market footprint and working on its software.
From Dell servers to Microsoft Azure’s cloud and Baidu’s PaddlePaddle hardware ecosystem, GraphCore has a number of significant deals in place. GraphCore has also been working on its own software stack, Poplar. In the last month, Poplar has seen a new version and a new analysis tool.
If Intel has a lot of catching up to do, that certainly applies to GraphCore as well. Both vendors, however, appear to be on a similar track: innovating at the hardware level in the hope of challenging Nvidia with a new and radically different approach, custom-built for AI workloads. At the same time, they are working on their software stacks and building their market presence.
Fractionalizing AI hardware with a software solution from Run:AI
Last but not least, there are a few lower-profile challengers taking a different approach. Startup Run:AI recently came out of stealth mode, announcing $13 million in funding for what sounds like an unorthodox solution: rather than offering another AI chip, Run:AI offers a software layer to speed up the execution of machine learning workloads, on-premises and in the cloud.
The company works closely with AWS and is a VMware technology partner. Its core value proposition is to act as a management platform that bridges the gap between the various AI workloads and the various hardware chips, running a really efficient and fast AI computing platform.
Run:AI recently unveiled its fractional GPU sharing for Kubernetes deep learning workloads. Aimed at lightweight AI tasks at scale, such as inference, the fractional GPU system gives data science and AI engineering teams the ability to run multiple workloads simultaneously on a single GPU, thus lowering costs.
Omri Geller, Run:AI co-founder and CEO, told ZDNet that Nvidia’s announcement about “fractionalizing” the GPU, or running separate jobs within a single GPU, is revolutionary for GPU hardware. Geller said he has seen many customers with this need, especially for inference workloads: why use a full GPU for a job that does not require the full compute and memory of a GPU?
“However, we believe that this is more easily managed in the software stack than at the hardware level, and the reason is flexibility. While hardware slicing creates ‘smaller GPUs’ with a static amount of memory and compute cores, software solutions allow GPUs to be divided into any number of smaller GPUs, each with a chosen memory footprint and compute power.
In addition, fractionalizing with a software solution is possible with any GPU or AI accelerator, not just Ampere servers, which improves TCO for all of a company’s compute resources, not just the latest ones. This is, in fact, what Run:AI’s fractional GPU feature enables.”
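To make Geller’s flexibility argument concrete, here is a toy Python model comparing the two approaches as bin-packing problems. It is a sketch under stated assumptions, not how Run:AI or Nvidia actually implement partitioning: the 40 GB GPU, the hypothetical four-way equal split standing in for “static” hardware slicing, and the job sizes are all illustrative values, and both placements use simple first-fit packing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    name: str
    memory_gb: int    # requested GPU memory (hypothetical)
    compute_pct: int  # requested share of one GPU's compute, in percent


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division, avoiding floating-point rounding."""
    return -(-a // b)


def gpus_needed_software(jobs, gpu_memory_gb, gpu_compute_pct=100):
    """Software fractioning: each job gets exactly the memory/compute it requests.

    Jobs are packed first-fit; a new GPU is opened when none has room left.
    """
    free = []  # remaining [memory_gb, compute_pct] for each GPU in use
    for job in jobs:
        if job.memory_gb > gpu_memory_gb or job.compute_pct > gpu_compute_pct:
            raise ValueError(f"{job.name} does not fit on a single GPU")
        for capacity in free:
            if capacity[0] >= job.memory_gb and capacity[1] >= job.compute_pct:
                capacity[0] -= job.memory_gb
                capacity[1] -= job.compute_pct
                break
        else:
            free.append([gpu_memory_gb - job.memory_gb,
                         gpu_compute_pct - job.compute_pct])
    return len(free)


def gpus_needed_static(jobs, gpu_memory_gb, slices_per_gpu, gpu_compute_pct=100):
    """Static slicing: each GPU is pre-cut into equal slices; jobs take whole slices."""
    free_slices = []  # remaining slice count for each GPU in use
    for job in jobs:
        # Slices needed = max over memory and compute, each rounded up.
        k = max(ceil_div(job.memory_gb * slices_per_gpu, gpu_memory_gb),
                ceil_div(job.compute_pct * slices_per_gpu, gpu_compute_pct))
        if k > slices_per_gpu:
            raise ValueError(f"{job.name} does not fit on a single GPU")
        for i, remaining in enumerate(free_slices):
            if remaining >= k:
                free_slices[i] -= k
                break
        else:
            free_slices.append(slices_per_gpu - k)
    return len(free_slices)


# Twelve small inference jobs, each needing 6 GB and 15% of a 40 GB GPU.
jobs = [Job(f"infer-{i}", 6, 15) for i in range(12)]
print(gpus_needed_software(jobs, 40))   # 6 jobs fit per GPU -> 2 GPUs
print(gpus_needed_static(jobs, 40, 4))  # 1 slice per job, 4 per GPU -> 3 GPUs
```

In this toy setup, each job wastes 4 GB and 10% of compute inside a fixed 10 GB / 25% slice, so static slicing needs a third GPU where right-sized software fractions need only two. Real hardware partitioning offers more than one slice size, so the gap depends heavily on how well the workload matches the available profiles.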
An accessibility layer for FPGAs with InAccel
InAccel is a Greek startup built around the premise of providing an FPGA manager that enables distributed acceleration of large datasets across clusters of FPGA resources using simple programming models. Founder and CEO Chris Kachris told ZDNet there are several arguments regarding the benefits of FPGAs versus GPUs, especially for AI workloads.
Kachris noted that FPGAs can provide better energy efficiency (performance/watt) in some cases, and they can also achieve lower latency than GPUs for deep neural networks (DNNs). For DNNs, Kachris went on to add, FPGAs can achieve high throughput using a low batch size, resulting in much lower latency. In applications where latency and energy efficiency are critical, FPGAs can prevail.
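Why a low batch size translates into lower latency can be seen with a back-of-the-envelope model, sketched here in Python. It assumes requests arrive at a steady rate and that processing a batch costs a fixed overhead plus a per-item cost; all numbers are hypothetical and are not measurements of any FPGA or GPU. Batching amortizes the fixed overhead and raises throughput, but the first request in a batch has to wait for the batch to fill.

```python
def first_request_latency_ms(batch_size, arrivals_per_s, setup_ms, per_item_ms):
    """Latency seen by the first request in a batch (simplified model).

    It waits for (batch_size - 1) further arrivals, then for the whole batch
    to be processed. Queueing behind earlier batches is ignored.
    """
    wait_ms = (batch_size - 1) * 1000.0 / arrivals_per_s
    compute_ms = setup_ms + per_item_ms * batch_size
    return wait_ms + compute_ms


def throughput_per_s(batch_size, setup_ms, per_item_ms):
    """Items processed per second when batches run back to back."""
    return batch_size * 1000.0 / (setup_ms + per_item_ms * batch_size)


# Hypothetical device: 2 ms fixed cost per batch, 0.05 ms per item,
# serving 1,000 requests per second.
for batch in (1, 8, 64):
    print(batch,
          round(first_request_latency_ms(batch, 1000, 2.0, 0.05), 2),
          round(throughput_per_s(batch, 2.0, 0.05)))
```

Under these assumptions, batch 64 delivers far more throughput than batch 1 but adds roughly 63 ms of waiting before computation even starts. A device that reaches high throughput at small batch sizes, which is the claim Kachris makes for FPGAs, avoids that fill-up delay.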
However, deploying FPGA clusters at scale remains challenging, and this is the problem InAccel is out to solve. Its solutions aim to provide scalable deployment of FPGA clusters and to supply the missing abstraction: OS-like layers for the FPGA world. InAccel’s Orchestrator enables easy deployment, instant scaling and automated resource management of FPGA clusters.
Kachris likened InAccel to VMware/Kubernetes, or Run:AI/Bitfusion, for the FPGA world. He also claimed that InAccel makes FPGAs easier for software developers to use, and noted that FPGA vendors such as Intel and Xilinx have recognized the importance of a strong ecosystem and formed strong alliances to help expand it:
“It seems that cloud providers will have to offer a diverse and heterogeneous infrastructure, as different platforms have pros and cons. Most of these providers offer fully heterogeneous resources (CPUs, GPUs, FPGAs and dedicated accelerators), so users can choose the optimal resource.
Several cloud providers, such as AWS and Alibaba, have started deploying FPGAs because they see the potential benefits. However, FPGA deployment is still challenging, as users need to be familiar with the FPGA tool flow. We enable software developers to get all the benefits of FPGAs using familiar PaaS and SaaS models and high-level frameworks (Spark, Scikit-learn, Keras), making FPGA deployment in the cloud much easier.”
Hedge your bets
It takes more than fast chips to be the leader in this field. Economics is one aspect potential users need to consider; ecosystem and software are others. Taking everything into account, it seems that Nvidia is still ahead of the competition.
However, it is also interesting to note that this is starting to look less and less like a monoculture. Innovation comes from different places, in different shapes and forms. This is something Nvidia’s Alben also acknowledged. And this is certainly something cloud providers, server vendors and application builders seem to be aware of.
Hedging one’s bets in the AI chip market may be the smart thing to do.