Ampere will merge inference and training workloads in data center and cloud

Note: This was published via Tractica in 2020.

At the 2020 GTC keynote, Nvidia announced its latest compute platform for enterprises, Ampere. Ampere is a giant step up from the previous generation, and the white paper gives quite a few details. While the chipset specs contain many impressive feats, a few jump out as having long-term potential to disrupt the AI chipset market.

The number that has changed most drastically is inference performance in comparison to the V100. Inference compute at INT8 has increased 10X over the V100 to 624 TOPS via Tensor Cores (Tensor Cores perform matrix multiply-and-accumulate calculations for a given data format). The V100 was not exactly optimized to run inference workloads: although it offered an impressive 116 Tensor TFLOPS, its integer performance was 62 TOPS. Inference in production rarely uses floating point as a data format, and hence the V100 was restricted primarily to training workloads. The jump from 62 TOPS to 624 TOPS is 10X. In comparison, training performance has seen a modest increase (although via a different data format) from 125 TFLOPS to 156 TFLOPS.
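To make the data-format point concrete, here is a minimal NumPy sketch of the multiply-and-accumulate pattern that Tensor Cores accelerate in hardware: low-precision INT8 inputs with a wider INT32 accumulator, plus the headline ratios quoted above. This is an illustration only, not Nvidia's implementation, and the matrix sizes are arbitrary rather than actual Tensor Core tile shapes.

```python
# Minimal sketch of INT8 multiply-and-accumulate, the pattern Tensor Cores
# accelerate in hardware: 8-bit integer inputs, 32-bit integer accumulation.
# Sizes are illustrative, not real Tensor Core tile shapes.
import numpy as np

A = np.random.randint(-128, 127, size=(64, 64), dtype=np.int8)   # activations
B = np.random.randint(-128, 127, size=(64, 64), dtype=np.int8)   # weights
C = np.zeros((64, 64), dtype=np.int32)                           # accumulator

# D = A x B + C, accumulated in INT32 so INT8 products do not overflow
D = A.astype(np.int32) @ B.astype(np.int32) + C

# The headline ratios quoted above
print(624 / 62)    # ~10X INT8 inference uplift (V100 -> A100)
print(156 / 125)   # ~1.25X training uplift (V100 FP16 -> A100 TF32)
```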

Nvidia has created a separate, low-cost product line for inference, with the T4 being the state of the art. As inference pipelines have more or less standardized on the 8-bit integer data format, the T4 was optimized for 8-bit inference with 130 TOPS of INT8 performance.
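For readers unfamiliar with why 8-bit integers became the standard inference format, the sketch below shows a generic symmetric quantization scheme: FP32 values are mapped to INT8 with a per-tensor scale, and the scale is reapplied after the integer math. This is a common textbook approach for illustration, not necessarily the calibrated, per-channel scheme a production stack such as TensorRT would use.

```python
# Generic symmetric per-tensor INT8 quantization sketch (illustrative only;
# production inference stacks typically use calibrated, per-channel schemes).
import numpy as np

def quantize_int8(x: np.ndarray):
    """Map FP32 values to INT8 with a single symmetric scale factor."""
    scale = np.abs(x).max() / 127.0                       # largest magnitude maps to +/-127
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

weights = np.random.randn(4, 4).astype(np.float32)
q, s = quantize_int8(weights)
print(np.max(np.abs(weights - dequantize(q, s))))         # small quantization error
```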

The biggest difference between inference and training users is their willingness to pay for compute. While training systems come out of the R&D budget, inference systems fall under IT's OpEx budget. The inference chipset user's primary concern is the price per unit of inference compute. If you consider the T4 list price of $3,000 per card, the price of INT8 compute comes to $23/TOPS (see the table below). While the A100 price has not been officially announced, assuming it is similar to the V100 (~$12,000), the price per unit of inference compute drops below the T4's, to $19/TOPS.

Chipset    Price      INT8 Compute    Price per TOPS
T4         $3,000     130 TOPS        $23/TOPS
A100       $12,000    624 TOPS        $19/TOPS
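The price-per-TOPS figures in the table follow directly from list price divided by rated INT8 throughput. The snippet below reproduces that arithmetic; note that the $12,000 A100 price is the assumption stated above, not an announced figure.

```python
# Price per INT8 TOPS = list price / rated INT8 throughput.
# The A100 price is an assumption (similar to V100), not an announced figure.
cards = {
    "T4":   {"price_usd": 3_000,  "int8_tops": 130},
    "A100": {"price_usd": 12_000, "int8_tops": 624},
}

for name, c in cards.items():
    print(f"{name}: ${c['price_usd'] / c['int8_tops']:.0f}/TOPS")
# T4: $23/TOPS
# A100: $19/TOPS
```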

So while the V100 was never considered a solution for inference, the A100 will definitely stand out as one going forward. This is valuable to companies looking to maximize their resource utilization in the data center: if resources are not being used for training, they can be allocated to inference. Cloud companies and hyperscalers will particularly relish this, as the chipsets can be offered to a wider range of users to run both inference and training workloads, thereby maximizing their revenue potential. Nvidia's virtualization software introduced with Ampere facilitates this even further.

This doesn’t mean that the T4 product line will end. It will most likely find a home in other products, in particular for edge inference. Compute requirements for inference workloads at the edge are on the rise, and the T4 will fill that gap. The emergence of edge cloud and 5G will also drive demand for edge servers and workstations, providing opportunities for T4-like chipsets. We expect edge workloads to be primarily inference until suitable training frameworks and applications emerge; once that happens, T4-like chipsets will also need a floating-point data path for training.

The AI chipset market is large, and the only constant in the AI world is change. We have seen rapid advances in just a few years, going from roughly 10 TFLOPS (P100) to almost 1 PetaOPS (the rated capacity of the A100 with sparse INT4), a roughly 100X increase in peak rated throughput, albeit across very different data formats. The enterprise market is realizing its need for AI compute, and many product lines are now emerging for different applications.