Liliana Andrade

Multi-Processor System-on-Chip 2



if we were to underestimate all the complexity layers and their implications.

      1.5.3. Measurements for low-end and high-end use cases

      To analyze GFDM kernel performance under the high-end and low-end workloads derived from the standards analysis in section 1.2 and Table 1.1, we need a vDSP architecture on which to execute the GFDM kernel.

      We use the theoretical minimum number of operations as the baseline for a fair and unbiased comparison between the code variants and their utilization of processor resources. We call this metric the implementation execution density ρ and define it as the ratio of the theoretical minimum operations to the measured cycles. The theoretical minimum includes a) only general/standard arithmetic or logical operations (no fused/composite operations that combine several into one, with MAC as the only exception), and b) memory accesses. The theoretical minimum of vector operations depends on the implementation variant, i.e. on the loop order combination and the vectorization.
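      The definition of ρ can be illustrated with a minimal sketch. The function names and the operation and cycle counts below are hypothetical and serve only to show the arithmetic; they are not measurements from this chapter.

```python
def theoretical_minimum(arithmetic_ops: int, memory_accesses: int) -> int:
    """Theoretical minimum operations: standard arithmetic/logical
    operations (a MAC counts as one) plus memory accesses."""
    return arithmetic_ops + memory_accesses


def execution_density(min_ops: int, measured_cycles: int) -> float:
    """Execution density rho = theoretical minimum operations / measured cycles."""
    if measured_cycles <= 0:
        raise ValueError("measured_cycles must be positive")
    return min_ops / measured_cycles


# Hypothetical counts for one code variant (illustration only).
min_ops = theoretical_minimum(arithmetic_ops=1200, memory_accesses=400)
rho = execution_density(min_ops, measured_cycles=2000)
print(f"rho = {rho:.2f}")
```

      Because the theoretical minimum depends on the loop order and vectorization, it would have to be recomputed for each variant before the resulting ρ values are compared.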

      Use case / Metric                   Black       Blue

      low-end, LTE legacy
        required budget [MHz]             1.01        5.39
        our vDSP processing time [µs]     0.504       2.695
        vDSP utilization [%]              0.10        0.54
        min. vDSPs to run [#]             1           1

      CA high-end, FR2 (4×CA, µ = 3, 400 MHz)
        required budget [MHz]             921.2       5,505.5
        our vDSP processing time [µs]     57.58       344.09
        vDSP utilization [%]              92.12       550.55
        min. vDSPs to run [#]             1           6

      MIMO CA high-end, FR2 (8×8, 4×CA, µ = 3, 400 MHz)
        required budget [GHz]             7.37        44.04
        our vDSP processing time [µs]     460.6       2,752.7
        vDSP utilization [%]              736.95      4,404.38
        min. vDSPs to run [#]             8           45

      The results support the discussion in section 1.2: the low-end use cases can run almost effortlessly in parallel with other kernels and tasks on a vDSP. Surprisingly, even the CA high-end black GFDM fits on a single vDSP core and meets the deadline, albeit at a heavy load (92.12% utilization). Since there are several vDSPs on the MPSoC, running this modulation flavor is an option to consider, provided, of course, that the memory bandwidth allows the black flavor to be used. Finally, as expected, the MIMO CA high-end use case is better served by HW accelerator engines than by many fully loaded vDSP cores.
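      The relation between the required budget, the utilization and the minimum number of vDSPs in the table can be reproduced with a short sketch. It assumes a 1,000 MHz vDSP clock, which is inferred from the table ratios (a 921.2 MHz budget corresponds to 92.12% utilization) and is not stated in this section. It also reads the MIMO budgets of 7.37 and 44.04 as GHz, consistent with the utilizations listed for that use case. The function names are hypothetical.

```python
import math

# Assumption: 1,000 MHz vDSP clock, inferred from the table
# (921.2 MHz budget <-> 92.12 % utilization); not given in the text.
VDSP_CLOCK_MHZ = 1000.0


def utilization_percent(required_mhz: float, clock_mhz: float = VDSP_CLOCK_MHZ) -> float:
    """Share of one vDSP consumed by the kernel, in percent."""
    return 100.0 * required_mhz / clock_mhz


def min_vdsps(required_mhz: float, clock_mhz: float = VDSP_CLOCK_MHZ) -> int:
    """Smallest number of fully dedicated vDSPs that covers the budget."""
    return max(1, math.ceil(required_mhz / clock_mhz))


# Required budgets in MHz, taken from the table as (black, blue).
# The MIMO values are the printed 7.37 and 44.04 interpreted as GHz.
budgets_mhz = {
    "low-end LTE legacy": (1.01, 5.39),
    "CA high-end FR2": (921.2, 5505.5),
    "MIMO CA high-end FR2": (7370.0, 44040.0),
}

for use_case, (black, blue) in budgets_mhz.items():
    print(
        f"{use_case}: "
        f"black {utilization_percent(black):.2f} % on {min_vdsps(black)} vDSP(s), "
        f"blue {utilization_percent(blue):.2f} % on {min_vdsps(blue)} vDSP(s)"
    )
```

      Under these assumptions the sketch yields the vDSP counts listed in the table (1 and 1, 1 and 6, 8 and 45). The MIMO utilizations differ slightly from the printed values because the GHz budgets are rounded to two decimals.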

      This chapter closely followed an SW implementation of the GFDM algorithm on a SotA vDSP and noted the considerations that apply to the handset workloads expected in modern and future mobile communications. We gave analyses and conclusions on four layers: specification requirements, translation of theory to pseudo-code, precision analysis and requirements, and implementation space exploration.