SYNTHESIS LECTURES ON COMPUTER ARCHITECTURE
Lecture #33
Series Editor: Margaret Martonosi, Princeton University
Series ISSN: Print 1935-3235, Electronic 1935-3243
Customizable Computing
Yu-Ting Chen, Jason Cong, Michael Gill, Glenn Reinman, and Bingjun Xiao
University of California, Los Angeles
ABSTRACT
Since the end of Dennard scaling in the early 2000s, improving the energy efficiency of computation has been the main concern of both the research community and industry. The large energy-efficiency gap between general-purpose processors and application-specific integrated circuits (ASICs) motivates the exploration of customizable architectures, which can be adapted to the workload. In this Synthesis Lecture, we present an overview of and introduction to recent developments in energy-efficient customizable architectures, including customizable cores and accelerators, on-chip memory customization, and interconnect optimization. In addition to discussing the general techniques used in each area and classifying the different approaches, we highlight and illustrate some of the most successful design examples in each category and discuss their impact on performance and energy efficiency. We hope that this work captures the state of the art in research and development on customizable architectures and serves as a useful reference for further research, design, and implementation toward large-scale deployment in future computing systems.
KEYWORDS
accelerator architectures, memory architecture, multiprocessor interconnection, parallel architectures, reconfigurable architectures, memory, green computing
Contents
2.1 Customizable System-On-Chip Design
2.1.2 On-Chip Memory Hierarchy
3.2 Dynamic Core Scaling and Defeaturing
3.4 Customized Instruction Set Extensions
3.4.1 Vector Instructions
3.4.2 Custom Compute Engines
3.4.3 Reconfigurable Instruction Sets
3.4.4 Compiler Support for Custom Instructions
4 Loosely Coupled Compute Engines
4.2 Loosely Coupled Accelerators
4.2.1 Wire-Speed Processor
4.2.2 Comparing Hardware and Software LCA Management
4.2.3 Utilizing LCAs
4.3 Accelerators Using Field-Programmable Gate Arrays
4.4 Coarse-Grain Reconfigurable Arrays
4.4.1 Static Mapping
4.4.2 Run-Time Mapping
4.4.3 CHARM
4.4.4 Using Composable Accelerators
5 On-Chip Memory Customization
5.1.1 Caches and Buffers (Scratchpads)
5.1.2 On-Chip Memory System Customizations
5.2 CPU Cache Customizations
5.2.1 Coarse-Grain Customization Strategies
5.2.2 Fine-Grain Customization Strategies
5.3 Buffers for Accelerator-Rich Architectures
5.3.1 Shared Buffer System Design for Accelerators
5.3.2 Customization of Buffers Inside an Accelerator
5.4 Providing Buffers in Caches for CPUs and Accelerators
5.4.1 Providing Software-Managed Scratchpads for CPUs
5.4.2 Providing Buffers for Accelerators
5.5 Caches with Disparate Memory Technologies
5.5.1 Coarse-Grain Customization Strategies
5.5.2 Fine-Grain Customization Strategies
6.2 Topology Customization
6.2.1 Application-Specific Topology Synthesis
6.2.2 Reconfigurable Shortcut Insertion
6.2.3 Partial Crossbar Synthesis and Reconfiguration
6.3 Routing Customization
6.3.1 Application-Aware Deadlock-Free Routing
6.3.2 Data Flow Synthesis
6.4 Customization Enabled by New Device/Circuit Technologies
6.4.1 Optical Interconnects
6.4.2 Radio-Frequency Interconnects
6.4.3 RRAM-Based Interconnects