9 Smart Cameras and MPSoCs
9.1. Introduction
9.2. Early VLSI video processors
9.3. Video signal processors
9.4. Accelerators
9.5. From VSP to MPSoC
9.6. Graphics processing units
9.7. Neural networks and tensor processing units
9.8. Conclusion
9.9. References
10 Software Compilation and Optimization Techniques for Heterogeneous Multi-core Platforms
10.1. Introduction
10.2. Dataflow modeling
10.3. Source-to-source-based compiler infrastructure
10.4. Software distribution
10.5. Results
10.6. Conclusion
10.7. References
11 Index
List of Tables
1 Chapter 1
Table 1.1. Processing requirement corners as per standard specification
Table 1.2. Kernel parameters for corner use cases
Table 1.3. Selected GFDM implementation variants
Table 1.4. Kernel profile: cycles, memory accesses, and density
Table 1.5. GFDM: required frequency budget and performance on our vDSP
2 Chapter 2
Table 2.1. Implementation properties of various coding schemes
3 Chapter 8
Table 8.1. A comparison of large FFTs mapped onto a GPU and an FPGA
Table 8.2. A comparison of the four back-pressure schemes of Figure 8.18
Table 8.3. A summary of the StaccatoLab execution model
4 Chapter 10
Table 10.1. Retargeting MAPS towards MPSoC platforms
List of Illustrations
1 Chapter 1
Figure 1.1. Application mapping on the rate–latency plane with regard to the rel...
Figure 1.2. Comparing 14 OFDM symbols’ TTI duration of 4G and 5G
Figure 1.3. Processing load in kRB/s for 5G NR FR1 (Damjancevic et al. 2019)
Figure 1.4. Processing load in kRB/s for 5G NR FR2
Figure 1.5. Tiled “Kachel” MPSoC with decentralized tightly coupled memories
Figure 1.6. Heterogeneous MPSoC with a central shared memory architecture
Figure 1.7. GFDM processing dataflow diagram
Figure 1.8. Visualization of time-domain GFDM filtering
Figure 1.9. GFDM pseudo-code
Figure 1.10. Precision test bed set up
Figure 1.11. Varied precision quantization of GFDM
Figure 1.12. GFDM EVM for varied data and ACC complex bit-lengths compared to ad...
Figure 1.13. vDSP simplified HW block diagram
2 Chapter 2
Figure 2.1. State-of-the-art commercial system-on-chip baseband architecture
Figure 2.2. Left: 306 Gbit/s turbo decoder. Middle: 288 Gbit/s LDPC decoder. Rig...
3 Chapter 3
Figure 3.1. Security in LoRaWAN
Figure 3.2. Boot process in an STM32MP1 device
Figure 3.3. Execution environments in OP-TEE enabled organization based on ARM T...
Figure 3.4. LoRaWAN gateway using an RAK831 RF with a GPS (top two shields), the...
Figure 3.5. Execution of gateway packet forwarder in OP-TEE enabled organization...
4 Chapter 4
Figure 4.1. Hypervisor types
Figure 4.2. Hyperconverged versus disaggregated architectures
Figure 4.3. Comparison of the NexVisor I/O architecture to standard Xen
Figure 4.4. Optimized I/O datapath operations in the NexVisor for local and remo...
Figure 4.5. High-level view of disaggregated storage architecture, showing the I...
Figure 4.6. ATA over Ethernet (AoE) header format (Hopkins and Coile 2009)
Figure 4.7. Storage virtualization data structures used by the accelerated datap...
Figure 4.8. Hardware architecture for the disaggregated storage acceleration car...
Figure 4.9. Thousands of sequential read I/Os per second, using fio on four VM c...
Figure 4.10. Thousands of sequential write I/Os per second, using fio on four VM...
Figure 4.11. Thousands of sequential read I/Os per second, using fio on four VM ...
Figure 4.12. AoE read throughput scaling for one to four client flows
5 Chapter 5
Figure 5.1. Overall ECU/DCU costs (B$) evolution and breakdown between standard ...
Figure 5.2. Overview of the FACE PCU and PIU infrastructure
Figure 5.3. Synthetic result of the hardware benchmark
Figure 5.4. Overview of the FACE PCU structure
Figure 5.5. Daughterboard physical form factor
Figure 5.6. Overview of the FACE PIU structure
Figure 5.7. Example of hardware setup of the FACE platform. PCU front and back s...
Figure 5.8. Illustration of AUTOSAR adaptive platform software architecture
Figure 5.9. ADAS polygraph model
Figure 5.10. The FACE instrumented prototype setup
Figure 5.11. Use case interface
6 Chapter 6
Figure 6.1. The anatomy of a desktop in the 1980s
Figure 6.2. The anatomy of a modern dual-socket server blade
Figure 6.3. Chip organization in an x86-based CPU (left) and a custom many-core ...
Figure 6.4. Speedup of near-memory processing: many-core OoO CPU and Mondrian wi...
7 Chapter 7
Figure 7.1. Overview on the operation of VPSim
Figure 7.2. Simplistic implementation of the PL011 UART in Python
Figure 7.3. System validation flow using the hybrid prototyping solution
Figure 7.4. Synchronization mechanism between VPSim and FPGA in TLM R/W
Figure 7.5. Communication scheme in VPSim during co-simulation and co-emulation
Figure 7.6. Example of FmiValue declaration
Figure 7.7. Structure of the generated virtual platform FMU
Figure 7.8. Parallel implementation of the virtual platform FMU
8 Chapter 8
Figure 8.1. The top 20 of the TOP500 and GREEN500 supercomputers (left) and the ...
Figure 8.2. The three axes of FFT parallelism (left) and various machine rooflin...
Figure 8.3. Homogeneous flow graph G comprising seven nodes and seven edges
Figure 8.4. Flow graph G (top), its state after three cycles (mid), its flow (ev...
Figure 8.5. Graph Glorenz (top), its flow during the first eight cycles (left) a...
Figure 8.6. Definition graph Glorenz
Figure 8.7. Definition of graph GMA
Figure 8.8. Node ma computes the moving average over a window N = 11 samples. Th...
Figure 8.9. An example of an FSM for a cyclo-static dataflow node
Figure 8.10. Subgraph RO (left, top) comprises three CSDF nodes ro[i] and perfor...
Figure 8.11. SWITCH and SELECT nodes as used in Boolean Dataflow
Figure 8.12. Recursive dataflow graph Sort.64 (top), Sort.64 connected to a rand...
Figure 8.13. Definition of node class Split
Figure 8.14. Definition of node class IMG
Figure 8.15. A toy processor farm: dataflow graph (top), flow plot (left) and da...
Figure 8.16. Dataflow graph Sobel with I/O to RAM node ram (top), 50μsec flow pl...
Figure 8.17. Three tricycle