Several years ago, Google began working on its own custom software for machine learning and artificial intelligence workloads, dubbed TensorFlow. Last year, the company announced that it had designed its own tensor processing unit (TPU), an ASIC built for high throughput of low-precision arithmetic. Now, Google has released some performance data for its TPU, comparing it with Intel’s Haswell CPUs and Nvidia’s K80 (Kepler-based) dual-GPU data center card.
Before we dive into the data, we need to talk about the workloads Google is discussing. All of Google’s benchmarks measure inference performance rather than initial neural network training. Nvidia has a graphic that summarizes the differences between the two.
Teaching a neural network what to recognize and how to recognize it is referred to as training, and these workloads are still typically run on CPUs or GPUs. Inference refers to the neural network’s ability to apply what it learned from training. Google makes it clear that it is only interested in low-latency operations and that it has imposed strict responsiveness criteria on the benchmarks we’ll discuss below.
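To make the training/inference split concrete, here is a minimal Python sketch. It is a toy, not anything from Google's paper: the one-weight model, the sample data, and the function names `train` and `infer` are all assumptions made for illustration. Training loops over the data many times to adjust the weight, while inference is a single cheap pass that applies the finished weight to a new input.

```python
# Toy illustration of training vs. inference.
# The model, data, and function names are hypothetical, not from Google's paper.

def train(samples, epochs=200, lr=0.01):
    """Training: fit y ~ w * x by gradient descent on mean squared error."""
    w = 0.0
    for _ in range(epochs):
        # Gradient of the mean squared error with respect to w
        grad = sum(2 * (w * x - y) * x for x, y in samples) / len(samples)
        w -= lr * grad
    return w


def infer(w, x):
    """Inference: apply the already-trained weight to a new input."""
    return w * x


# Training data follows y = 2x, so the learned weight should approach 2.
samples = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
w = train(samples)

# Inference on an input the model has not seen before.
prediction = infer(w, 4.0)
```

The expensive, iterative part lives entirely in `train`. Once the weight is fixed, `infer` does a fixed amount of arithmetic per request, which is the kind of workload where predictable latency matters most.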
Google’s TPU design, benchmarks
The first part of Google’s paper discusses the various types of deep neural networks it deploys and the specific benchmarks it uses, and offers a diagram of the TPU’s physical layout, pictured below. The TPU is specifically designed for 8-bit integer workloads and prioritizes consistently low latency over raw…