Mohamed Zahran

Heterogeneous Computing



the way to process technology. The aim of this book is to introduce heterogeneous computing from a big-picture perspective. Whether you are a hardware designer or a software developer, you need to know how the pieces of the puzzle fit together.

      This book will discuss several architecture designs of heterogeneous systems, the role of the operating system, and the need for more efficient programming models. The main goal is to bring researchers and engineers to the research frontier of the new era that started a few years ago and is expected to continue for decades.

      Acknowledgments

      First and foremost, I would like to thank my whole family for their support, encouragement, and unconditional love. I would like to thank Steve Welch, who actually gave me the idea of writing this book. A big thank you also goes to Tamer Özsu, the editor-in-chief of ACM Books, for his encouragement, flexibility, and willingness to answer many questions. Without him, this book wouldn’t have seen the light of day.

      Anything I have learned in my scientific endeavors is due to my professors, my students, and my colleagues. I cannot thank them enough. Dear students, we learn from you as much as you learn from us.

      Mohamed Zahran

      September 2018

      1 Why Are We Forced to Deal with Heterogeneous Computing?

      When computers were first built, about seven decades ago, there was one item on the wish list: correctness. Then soon a second wish appeared: speed. The notion of speed has, of course, changed a great deal between those early days and applications and today’s requirements, but in general we can say that we want fast execution. After a few more decades and the proliferation of desktop PCs and then laptops, power became the third wish, whether in the context of battery life or electricity bills. As computers infiltrated many fields and were used in many applications, such as military and health care, we were forced to add a fourth wish: reliability. We do not want a computer to fail during a medical procedure, for example, and it would have been a big loss (financially and scientifically) if Curiosity, the rover NASA landed on Mars in 2012, had failed. (And yes, Curiosity is a computer.) In today’s interconnected world, security has become a must, and this is the fifth wish. Correctness, speed, power, reliability, and security are the five main wishes we want from any computer system. The order of the items differs based on the application, societal needs, and the market segment. This wish list directs the advances in hardware and software, yet the enabling technologies for fulfilling the wish list are themselves hardware advances and software evolution. So there is a feedback cycle between the wish list and hardware and software advances, and this cycle is affected by societal needs. This chapter explains the changes we have been through, from the dawn of computer systems until today, that made heterogeneous computing a must.

      In this chapter we see how computing systems evolved into the current status quo. We learn about the concept of heterogeneity and how to make use of it. At the end of this chapter, ask yourself: Have we reached heterogeneous computing willingly? Or against our will? I hope by then you will have an answer.

      In 1965 Gordon Moore, who cofounded Intel together with Robert Noyce, published a four-page paper that became very famous [Moore 1965]. This paper, titled “Cramming More Components onto Integrated Circuits,” predicted that the number of components in an integrated circuit (IC) would double every year. (He did not mention transistors specifically, but the prediction evolved to mean transistors.) Over time the doubling period in this prediction was stretched to two years and then settled on 18 months. This is what we call Moore’s law: the number of transistors on a chip is expected to double every 18 months. The 50th anniversary of Moore’s law was in 2015! More transistors per chip means more features, which in turn means, hopefully, better performance. Life was very rosy for both the hardware community and the software community. On the hardware side, features such as speculative execution, superscalar execution, and simultaneous multithreading were coupled with better process technology and higher clock frequency, producing faster and faster processors. On the software side, you could write your program and expect it to get faster with every new generation of processors without any effort on your part! Until everything stopped around 2004. What happened?
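
      Here is a rough worked example of what exponential doubling implies. The sketch below evaluates N(t) = N0 · 2^(t/T), where N0 is a starting transistor count, t the elapsed time, and T the doubling period. The starting count of one million transistors and the six-year horizon are hypothetical values chosen only for illustration. The point is how strongly the result depends on whether T is 12, 18, or 24 months.

```python
def projected_transistors(n0: float, years: float, doubling_months: float = 18.0) -> float:
    """Project a transistor count forward, assuming pure exponential doubling.

    n0: starting count (hypothetical); years: elapsed time;
    doubling_months: assumed doubling period T in months.
    """
    return n0 * 2 ** (years * 12 / doubling_months)


# Hypothetical starting point: one million transistors, projected six years ahead.
start = 1_000_000
for months in (12, 18, 24):
    n = projected_transistors(start, 6, months)
    print(f"doubling every {months} months -> {n:,.0f} transistors after 6 years")
```

      Over six years, a 12-month doubling period yields 64 million transistors. An 18-month period yields 16 million, and a 24-month period only 8 million.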

      Around 2004 Dennard scaling stopped. In 1974 Robert Dennard and several coauthors [Dennard et al. 1974] published a paper predicting that voltage and current should scale proportionally to the linear dimensions of the transistors. This became known as Dennard scaling, and it works hand in hand with Moore’s law. Transistors get smaller and hence faster, and their voltage and current also scale down, so power density can stay almost constant, or at least does not increase quickly. However, a closer look at the Dennard scaling prediction shows that the authors ignored leakage current, which was insignificant when the paper was published. As transistors get smaller and smaller, leakage becomes more significant. The paper also ignored the threshold voltage at which the transistor switches, which limits how far the supply voltage can be lowered. Around 2004 these two factors overcame the Dennard scaling prediction. We still increase the number of transistors per Moore’s law, but power density now increases as well. Higher power density has many effects. One of them is increased packaging cost. Also, dealing with power dissipation becomes problematic and expensive. Given all that, and given that dynamic power is proportional to clock frequency, we are stuck! What is the solution?
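
      The scaling argument can be checked with a few lines of arithmetic. The sketch below assumes the textbook dynamic-power model P = α·C·V²·f, where α is the switching activity, C the capacitance, V the supply voltage, and f the clock frequency. It ignores leakage entirely. Shrinking linear dimensions by a factor k divides capacitance by k, multiplies frequency by k, and divides area by k². Under ideal Dennard scaling the voltage also drops by k. In the post-2004 case we assume, for simplicity, that the voltage cannot drop at all. The factor k = 1.4 (roughly a 0.7× linear shrink per generation) is an illustrative assumption, and all quantities are relative, not physical units.

```python
from dataclasses import dataclass


@dataclass
class Transistor:
    capacitance: float  # relative switching capacitance C
    voltage: float      # relative supply voltage V
    frequency: float    # relative clock frequency f
    area: float         # relative area

    def dynamic_power(self, activity: float = 1.0) -> float:
        # Dynamic (switching) power: P = alpha * C * V^2 * f (leakage ignored)
        return activity * self.capacitance * self.voltage ** 2 * self.frequency

    def power_density(self) -> float:
        return self.dynamic_power() / self.area


def shrink(t: Transistor, k: float, scale_voltage: bool) -> Transistor:
    """One technology generation that shrinks linear dimensions by 1/k."""
    return Transistor(
        capacitance=t.capacitance / k,
        voltage=t.voltage / k if scale_voltage else t.voltage,  # voltage stuck if False
        frequency=t.frequency * k,  # smaller transistors switch faster
        area=t.area / k ** 2,
    )


base = Transistor(capacitance=1.0, voltage=1.0, frequency=1.0, area=1.0)
k = 1.4  # illustrative scaling factor (~0.7x linear shrink)
ideal = shrink(base, k, scale_voltage=True)
post_dennard = shrink(base, k, scale_voltage=False)
print(f"ideal Dennard power density: {ideal.power_density():.2f}")
print(f"voltage-stuck power density: {post_dennard.power_density():.2f}")
```

      With ideal scaling, power density stays at 1.00 relative to the previous generation. With the voltage stuck, it grows by k² ≈ 1.96 per generation, which is the power-density problem described above.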

      The solution is to stop increasing the clock frequency and instead increase the number of cores per chip, mostly at lower frequency. We can no longer increase frequency; otherwise power density becomes unbearable. With simpler cores and lower frequency, we reduce power dissipation and consumption. With multiple cores, we hope to maintain higher performance. Figure 1.1 [Rupp 2018] tells the whole story and shows the trends of several aspects of microprocessors throughout the years. As the figure shows, from around 2004 the number of logical cores started to increase beyond one. The word logical includes physical cores with simultaneous multithreading (SMT) capability, also known as hyperthreading technology [Tullsen et al. 1995, Lo et al. 1997]. So a single core with four-way hyperthreading is counted as four logical cores. With SMT and the increase in the number of physical cores, we can see a sharp increase in the number of logical cores (note the logarithmic scale). The power curve shows that the 1990s were not a very friendly decade: power increased steadily. After we moved to multicore, this growth slowed somewhat. Multicore design itself helped, as did the rise of dark-silicon techniques [Allred et al. 2012, Bose 2013, Esmaeilzadeh et al. 2011] and some VLSI tricks. “Dark silicon” refers to the parts of the processor that must be turned off (hence dark) so that the heat generated does not exceed the maximum that the cooling system can dissipate (called the thermal design point, or TDP). How do we manage dark silicon while still increasing performance? This question has resulted in many research papers in the second decade of the twenty-first century. We can think of the dark-silicon strategy as a way to continue increasing the number of cores per chip while keeping power and temperature at a manageable level. The figure also shows that we stopped, or substantially slowed, increasing clock frequency.
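
      Here is a simplified Python model that makes the dark-silicon budget concrete. It assumes, hypothetically, that every active core draws the same power, that powered-off (dark) cores draw nothing, and that the only constraint is the chip’s TDP. The 64-core chip, 3 W per core, and 96 W TDP are made-up numbers for illustration.

```python
import math


def dark_silicon_fraction(total_cores: int, watts_per_core: float, tdp_watts: float) -> float:
    """Fraction of cores that must stay powered off to respect the TDP.

    Simplified model (assumption): each active core draws watts_per_core,
    dark cores draw nothing, and total active power must not exceed tdp_watts.
    """
    active = min(total_cores, math.floor(tdp_watts / watts_per_core))
    return (total_cores - active) / total_cores


# Hypothetical chip: 64 cores, 96 W TDP.
print(dark_silicon_fraction(64, 3.0, 96.0))  # cores at full speed, 3 W each
print(dark_silicon_fraction(64, 1.5, 96.0))  # slower, lower-power cores, 1.5 W each
```

      At 3 W per core, only 32 cores fit in the budget, so half the chip must stay dark. Halving the per-core power, for example by running the cores at lower frequency and voltage, lets all 64 cores be active within the same budget. This is the trade-off behind simpler, slower cores.
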

      With this bag of tricks, we have so far sustained a steady increase in the number of transistors, as the top curve of the figure shows. One interesting curve remains in the figure: single-thread (i.e., sequential program) performance. Single-thread performance increased steadily until almost 2010. The major reason is Moore’s law, which allowed computer architects to use these transistors to add more features (from pipelining to superscalar to speculative execution, etc.). Another reason is the increase in clock frequency, which continued until around 2004. A few minor factors make single-thread performance slightly better with multicore. One is that a single-thread program has a higher chance of executing on a core by itself, without sharing resources with another program. The other is thread migration. If a single-thread program is running on a core and that core becomes hot, the frequency and voltage will be scaled down, slowing the program. If the program is running on a multicore and thread migration is supported, the program may migrate to another core. It loses some performance in the migration process but continues at full speed afterwards.
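
      The benefit of thread migration can be illustrated with a toy model. The sketch below assumes, hypothetically, that a program runs at full speed (1 unit of work per second) until its core heats up after t_hot seconds. Without migration, the core is then throttled to a fixed fraction of full speed for the rest of the run. With migration, the program hops to a cool core, paying a fixed migration cost each time. The model also assumes a cool core is always available. All parameter values are illustrative, not measurements.

```python
def time_without_migration(work: float, t_hot: float, throttled_speed: float) -> float:
    """Seconds to finish `work` units on one core that throttles once it gets hot.

    Full speed is 1 unit/s for the first t_hot seconds, then throttled_speed units/s.
    """
    if work <= t_hot:
        return work
    return t_hot + (work - t_hot) / throttled_speed


def time_with_migration(work: float, t_hot: float, migration_cost: float) -> float:
    """Seconds to finish `work` units when the thread hops to a cool core
    every t_hot seconds, paying migration_cost seconds per hop."""
    elapsed = 0.0
    remaining = work
    while remaining > t_hot:
        elapsed += t_hot + migration_cost  # run until the core is hot, then migrate
        remaining -= t_hot
    return elapsed + remaining  # final stretch finishes before the new core gets hot


# Hypothetical parameters: 100 units of work, cores heat up after 20 s,
# throttling drops speed to 60%, and each migration costs 0.5 s.
print(f"no migration:   {time_without_migration(100, 20, 0.6):.1f} s")
print(f"with migration: {time_with_migration(100, 20, 0.5):.1f} s")
```

      With these made-up numbers, 100 units of work finish in about 153.3 s on a throttled core and in 102.0 s with migration. Four migrations cost 2 s in total but avoid running 80 units of work at 60% speed.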
