6.4.3 Obtaining the Reuse Parameters
6.5 Eyexam: Framework for Evaluating Performance
6.5.1 Simple 1-D Convolution Example
6.5.2 Apply Performance Analysis Framework to 1-D Example
6.6 Tools for Map Space Exploration
PART III Co-Design of DNN Hardware and Algorithms
7.1 Benefits of Reduced Precision
7.2 Determining the Bit Width
7.2.1 Quantization
7.2.2 Standard Components of the Bit Width
7.3 Mixed Precision: Different Precision for Different Data Types
7.4 Varying Precision: Change Precision for Different Parts of the DNN
7.5 Binary Nets
7.6 Interplay Between Precision and Other Design Choices
7.7 Summary of Design Considerations for Reducing Precision
8.1 Sources of Sparsity
8.1.1 Activation Sparsity
8.1.2 Weight Sparsity
8.2 Compression
8.2.1 Tensor Terminology
8.2.2 Classification of Tensor Representations
8.2.3 Representation of Payloads
8.2.4 Representation Optimizations
8.2.5 Tensor Representation Notation
8.3 Sparse Dataflow
8.3.1 Exploiting Sparse Weights
8.3.2 Exploiting Sparse Activations
8.3.3 Exploiting Sparse Weights and Activations
8.3.4 Exploiting Sparsity in FC Layers
8.3.5 Summary of Sparse Dataflows
8.4 Summary
9 Designing Efficient DNN Models
9.1 Manual Network Design
9.1.1 Improving Efficiency of CONV Layers
9.1.2 Improving Efficiency of FC Layers
9.1.3 Improving Efficiency of Network Architecture After Training
9.2 Neural Architecture Search
9.2.1 Shrinking the Search Space
9.2.2 Improving the Optimization Algorithm
9.2.3 Accelerating the Performance Evaluation
9.2.4 Example of Neural Architecture Search
9.3 Knowledge Distillation
9.4 Design Considerations for Efficient DNN Models
10.1 Processing Near Memory
10.1.1 Embedded High-Density Memories
10.1.2 Stacked Memory (3-D Memory)
10.2 Processing in Memory
10.2.1 Non-Volatile Memories (NVM)
10.2.2 Static Random Access Memories (SRAM)
10.2.3 Dynamic Random Access Memories (DRAM)
10.2.4 Design Challenges
10.3 Processing in Sensor
10.4 Processing in the Optical Domain
Preface
Deep neural networks (DNNs) have become extraordinarily popular; however, they come at the cost of high computational complexity. As a result, there has been tremendous interest in enabling efficient processing of DNNs. The challenge of DNN acceleration is threefold:
• to achieve high performance and efficiency,
• to provide sufficient flexibility to cater to a wide and rapidly changing range of workloads, and
• to integrate well into existing software frameworks.
In order to understand the current state of the art in addressing this challenge, this book aims to provide an overview of DNNs, the various tools for understanding their behavior, and the techniques being explored to efficiently accelerate their computation. It aims to explain foundational concepts and highlight key design considerations when building hardware for processing DNNs rather than trying to cover all possible design configurations, which is not feasible given the fast pace of the field (see Figure 1). It is targeted at researchers and practitioners who are familiar with computer architecture and who are interested in how to efficiently process DNNs or how to design DNN models that can be efficiently processed. We hope that this book will provide a structured introduction to readers who are new to the field, while also formalizing and organizing key concepts to provide insights that may spark new ideas for those who are already in the field.
Organization
This book is organized into three modules, each consisting of several chapters. The first module aims to provide overall background on the field of DNNs and insight into the characteristics of the DNN workload.
• Chapter 1 provides background on the context of why DNNs are important, their history, and their applications.
• Chapter 2 gives an overview of the basic components of DNNs and popular DNN models currently in use. It also describes the various resources used for DNN research and development, including the software frameworks and the public datasets that are used for training and evaluation.
The second module focuses on the design of hardware for processing DNNs. It discusses various architecture design decisions depending on the degree of customization (from general-purpose platforms to full custom hardware) and design considerations when mapping DNN workloads onto these architectures. Both temporal and spatial architectures are considered.
Figure 1: It has been observed that the number of ML publications is growing exponentially at