Liliana Andrade

Multi-Processor System-on-Chip 1



host CPU and the GPGPU relies on a software stack that results in long and hard-to-predict latencies (Cavicchioli et al. 2019).

      2.2.2. Machine learning inference


      Figure 2.3. Operation of a Volta tensor core (NVIDIA 2020)

      Machine learning computations normally rely on FP32 arithmetic; however, using 16-bit representations for training and 8-bit representations for inference yields significant savings in memory footprint and gains in performance and efficiency, with acceptable precision loss. The main 16-bit formats are FP16; BF16, which is FP32 with the 16 low-order mantissa bits truncated (Intel 2018); and INT16, which covers the 16-bit integer and fixed-point representations (Figure 2.4a). These reduced bit-width formats are used as multiplication operands in linear operations, whose results are still accumulated in FP32, INT32 or larger fixed-point representations.
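
      The relation between FP32 and BF16, and the use of reduced-width operands with a wider accumulator, can be illustrated with a minimal Python sketch. It is not taken from the cited sources: the function names are hypothetical, plain truncation is assumed (hardware implementations may round instead), and FP32 accumulation is emulated by rounding each partial sum to binary32.

```python
import struct

def fp32_bits(x: float) -> int:
    """Return the IEEE 754 binary32 bit pattern of x as an unsigned integer."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def bits_to_fp32(bits: int) -> float:
    """Interpret a 32-bit pattern as an IEEE 754 binary32 value."""
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFFFFFF))[0]

def to_fp32(x: float) -> float:
    """Round a Python float (binary64) to the nearest binary32 value."""
    return bits_to_fp32(fp32_bits(x))

def to_bf16_truncate(x: float) -> float:
    """Emulate BF16 by clearing the 16 low-order mantissa bits of FP32.

    Assumption: plain truncation, no rounding.
    """
    return bits_to_fp32(fp32_bits(x) & 0xFFFF0000)

def dot_bf16_fp32_acc(a, b):
    """Dot product with BF16 multiplication operands and an FP32 accumulator."""
    acc = 0.0
    for x, y in zip(a, b):
        # Two 8-bit significands give a 16-bit product, exact in binary32.
        prod = to_bf16_truncate(x) * to_bf16_truncate(y)
        # Round the running sum back to binary32 after every addition.
        acc = to_fp32(acc + prod)
    return acc

if __name__ == "__main__":
    print(hex(fp32_bits(3.14159265)))      # FP32 bit pattern of pi
    print(to_bf16_truncate(3.14159265))    # same value with 7 mantissa bits left
    print(dot_bf16_fp32_acc([1.0, 2.0], [3.0, 4.0]))
```

      The sign bit and the 8 exponent bits are untouched by the mask, which is why BF16 keeps the dynamic range of FP32 while giving up precision.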

      While the mainstream 8-bit formats used in convolutional network inference are signed or unsigned integers (Jacob et al. 2018; Krishnamoorthi 2018), floating-point formats smaller than 16 bits are also being investigated. Their purpose is to eliminate the complexities associated with small-integer quantization: fake quantization, where weights and activations are quantized and dequantized in succession during both the forward and backward passes of training; and post-training calibration, where the histogram of activations is collected on a representative dataset to adjust the saturation thresholds. Microsoft introduced the Msfp8 data format (Chung et al. 2018), which is FP16 truncated to 8 bits, with only 2 bits of mantissa left, along with its extension Msfp9. Among the reduced bit-width floating-point formats, however, the Posit8 representations generate the most interest (Carmichael et al. 2019).
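
      The two integer quantization steps named above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the procedure of the cited papers: the function names are hypothetical, the quantization is symmetric and per-tensor, and the calibration simply takes a percentile of the observed activation magnitudes as a stand-in for a histogram-based threshold search.

```python
def calibrate_threshold(activations, percentile=99.9):
    """Pick a saturation threshold from observed activation magnitudes.

    Assumption: a percentile of |activation| approximates the threshold
    that a histogram-based calibration would select.
    """
    mags = sorted(abs(a) for a in activations)
    if not mags:
        raise ValueError("calibration needs at least one activation")
    idx = min(len(mags) - 1, int(len(mags) * percentile / 100.0))
    return mags[idx]

def fake_quantize(x, threshold, num_bits=8):
    """Quantize x to a signed integer grid, then dequantize it back to float.

    The returned value is a float, but it lies exactly on the integer grid,
    so later computations see the quantization error.
    """
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    qmax = 2 ** (num_bits - 1) - 1        # 127 for 8 bits
    scale = threshold / qmax              # real value of one integer step
    q = round(x / scale)                  # quantize (Python rounds ties to even)
    q = max(-qmax, min(qmax, q))          # saturate at the calibrated threshold
    return q * scale                      # dequantize

if __name__ == "__main__":
    samples = [0.1, -0.2, 0.3, 50.0]      # made-up activations with one outlier
    t = calibrate_threshold(samples, percentile=50)
    print(t, [fake_quantize(s, t) for s in samples])
```

      With a threshold below the largest sample, the outlier saturates while the remaining values keep a finer step, which is the trade-off that calibration adjusts.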


      Figure 2.4. Numerical formats used in deep learning inference (adapted from Gustafson (2017) and Rodriguez et al. (2018))

      2.2.3. Application requirements

      In the case of automated driving applications (Figure 2.5), the perception and path planning functions require programmability, high performance and energy efficiency, which leads to the use of multi-core or GPGPU many-core processors. Multi-core processing entails significant sharing of execution resources in the memory hierarchy, which negatively impacts time predictability (Wilhelm and Reineke 2012). Even with a predictable execution model (Forsberg et al. 2017), the functional safety of the perception and path planning functions may only reach ISO 26262 ASIL-B. Conversely, vehicle control algorithms, as well as sensor and actuator management, must be delegated to electronic control units that are specifically designed to host ASIL-D functions.


      Figure 2.5. Autoware automated driving system functions (CNX 2019)


      Figure 2.6. Application domains and partitions on the MPPA3 processor

      Table 2.1. Cyber-security requirements by application area



Defense Avionics Automotive
Hardware root of trust image image image
Physical attack protection