For a color version of all figures in this book, see www.iste.co.uk/andrade/multi1.zip.
1. Numbers in each pair denote, respectively, the bit-width of the multiplicands and of the accumulator.
2. Motivated by saving silicon area, not by an architectural constraint.
4. Passing the OpenCL 1.2 conformance tests with PoCL is work in progress.
5. https://www.ansys.com/products/embedded-software/ansys-scade-suite.