Gadi Taubenfeld

Distributed Computing Pearls


Скачать книгу

an algorithm should also specify how to find the maximum in each subset.)

      The term distributed algorithms refers to algorithms where the computing elements are physically far away from each other and communicate by sending and receiving messages (as done on the Internet). The term concurrent algorithms refers to algorithms where the computing elements are physically very close to each other and communicate by reading from and writing to shared memory locations (as done inside a multiprocessor computer). In the field of distributed computing, both types of algorithms are studied.

      When a processor executes a computer program (such as a web browser), the execution itself is called a process. A process runs on a processor, which is the physical hardware. The physical location of the different processes or processors—we shall use the terms interchangeably—involved in a single concurrent activity can be anywhere from the same computer to different computers anywhere in the world.

      There are two main technological underpinnings of the fascinating rapid developments of computer networks and multiprocessor computers. The first is, of course, the advances in the design of faster hardware. The second is the development of efficient concurrent and distributed algorithms for supporting complex interactions between processors and computers. This book tells the story of the development of such algorithms. It is a fascinating story!

      In addition to the study of concurrent and distributed algorithms, the field of distributed computing also explores the inherent limitations of distributed systems: what problems cannot be solved in particular systems. Identifying features of a specific distributed system architecture that make it inadequate for solving certain problems is crucial for the design of better systems which can overcome such limitations.

      Impossibility results help us understand the crucial limitations of real systems, why certain systems are (computationally) weak while others are powerful, when should we stop looking for a solution for a given problem, and how to adjust a problem statement or a system model to overcome an impossibility result.

      Impossibility results usually depend on assumptions about: how the computing elements communicate with one another, what kinds of failures may occur, or whether randomization can be used. Such results are usually hard to prove.

      Through the book, we will introduce and discuss some of the most fundamental impossibility results of distributed computing. We will highlight the insights gained from these results so that the reader can understand and appreciate their utmost importance.

      In 1968, Edsger Wybe Dijkstra, one of the most influential members of computing science’s founding generation, published his famous paper “Co-operating Sequential Processes” [14], that originated the field of concurrent programming. A few concepts and results from Dijkstra’s papers are covered in Chapters 2 and 7.

      The Internet traces its beginning back to the early 1960s. At that time several research groups had invented a new technique, called packet switching, to transmit information as an efficient and robust alternative for circuit switching which is the transmission method used by the telephone network. Packet switching is the transmission method used today by the Internet. James F. Kurose and Keith W. Ross’ excellent book Computer Networking: A Top-Down Approach explains and uses the Internet’s architecture and protocols as primary vehicles for studying fundamental computer networking concepts [31].

      In 1995, Gordon Moore published an article predicting exponential growth in the density of transistors in integrated circuits [37]. Since then, this prediction, known as “Moore’s Law,” went on to become a self-fulfilling prophecy. Moore’s Law has become actual fact, with the doubling of transistor density every 18 months, as well as exponential growth in clock rates. However, due to inherent limitations imposed by the laws of physics, this exponential growth of the computing power of uniprocessors has to decline. The constraints of heat dissipation due to high-power consumption by fast uniprocessors have forced chip manufacturers to develop multicore architectures, where increases in throughput are achieved by packaging multiple cores embedded on the same chip which resides inside a single multiprocessor computer.

      1Multicore means multiple processors embedded on the same chip (i.e., on the same piece of semiconducting material).

      2In mid-2007, IBM unveiled Blue Gene/P, the second generation of one of the most powerful supercomputers in the world. The Blue Gene/P supercomputer configuration is a 294,912-processor, 72-rack system harnessed to a high-speed, optical network. It is used for executing applications like hydrodynamics, quantum chemistry, molecular dynamics, climate modeling, and financial modeling.

      3The multicore era for mainstream computers began in spring 2005 when Intel and AMD (followed the lead of IBM and Sun Microsystems) announced that their microprocessors would rely on multiple processors or cores, and introduced computers with two processors each (dual-core processors) which enable the execution of two programs in parallel.

      CHAPTER 2

       One Loaf of Bread, Please

      In many distributed systems, sometimes there is a need to ensure that two things will not happen at the same time. We use the too much bread problem to demonstrate the difficulty involved in such a type of synchronization when communication is restricted to sending and receiving messages.

      Alice and Bob are sharing an apartment. They have decided that at any time they will try to have precisely one loaf of bread in the kitchen. Let’s assume that Alice arrives home in the afternoon, and finds that there is no bread. So, she leaves for the bakery to buy bread. After she leaves, Bob arrives, he also finds that there is no bread and goes to buy bread. In the end, each one of them buys a loaf of bread, and they end up with too much bread. So, Alice and Bob are looking for a solution to ensure the following.

      1. Only one person buys a loaf of bread, when there is no bread.

      2. Someone always buys a loaf of bread, when there is no bread.

      A solution in which only Bob is responsible for buying bread is not acceptable. In such a solution, there is a scenario where Alice arrives home and finds that there is no bread, and waits forever for Bob to show up. A proper solution should ensure that, when there is no bread, someone always buys bread even if only one person shows up.

      Alice and Bob cannot see each other, and they communicate only by sending each other short text messages. It is assumed that a message that is sent is never lost and arrives immediately to its destination. However, between the time Alice finishes checking whether she has received a message and starts sending a message, Bob may send her a message.

      To see the corresponding synchronization problem for computing systems, replace a loaf of bread in the above example with a file, and let Alice and Bob be the names of two computers that are trying to avoid updating a shared file at the same time. As already mentioned in Chapter 1, without proper synchronization, the integrity of the data may be destroyed if two computers update a common file at the same time, and as a result, deposits and withdrawals could be lost, confirmed reservations might have disappeared, etc. Alice and Bob which communicate only by sending text messages, correspond to computers which communicate by sending and receiving messages.

      Figure 2.1: The code for the first incorrect attempt. The statement “if (no A)” means if Alice is not present, and the statement “if (no B)” means if Bob is not present.