Seifedine Kadry

Big Data



are often used for failover and backup purposes. Without clustering, if the server running an application goes down, the application is unavailable until the server is up again. In a highly available cluster, if a node becomes inoperative, service continues by failing over from the inoperative node to another node, without administrative intervention. Such clusters must maintain data integrity while failing over the service from one node to another. High availability systems consist of several nodes that communicate with each other and share information. Their many redundant nodes make the system highly fault tolerant, because the remaining nodes absorb faults and failures. Such systems also ensure high reliability and scalability. The higher the redundancy, the higher the availability. A highly available system eliminates single points of failure.
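The failover behavior described above can be illustrated with a minimal sketch. The names `Node`, `FailoverCluster`, and `handle` are hypothetical and chosen only for illustration. The sketch assumes that a node's health is known through a simple flag, whereas a real cluster would detect failures through heartbeats or similar monitoring.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """A cluster node; `healthy` stands in for real health monitoring (assumption)."""
    name: str
    healthy: bool = True


class FailoverCluster:
    """Routes requests to an active node and fails over when that node is down."""

    def __init__(self, nodes):
        if not nodes:
            raise ValueError("cluster needs at least one node")
        self.nodes = list(nodes)
        self.active = self.nodes[0]

    def _fail_over(self):
        # Promote the first healthy node; no administrator is involved.
        for node in self.nodes:
            if node.healthy:
                self.active = node
                return
        raise RuntimeError("no healthy node available")

    def handle(self, request):
        # Check the active node before serving; switch nodes if it is down.
        if not self.active.healthy:
            self._fail_over()
        return f"{self.active.name} handled {request}"


cluster = FailoverCluster([Node("node-1"), Node("node-2")])
print(cluster.handle("req-1"))    # served by node-1
cluster.nodes[0].healthy = False  # simulate a node failure
print(cluster.handle("req-2"))    # served by node-2 after failover
```

The sketch covers only request routing. Maintaining data integrity during failover, which the text requires, would additionally need shared or replicated state between the nodes.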

      Highly available systems are essential for an organization that has to protect its business against the loss or corruption of transactional data and against the risk of system outage. Under certain circumstances, these risks can cost the business millions of dollars. Certain applications, such as online platforms, may face a sudden increase in traffic. Managing these traffic spikes requires a robust solution such as cluster computing. Billing, banking, and e‐commerce demand a system that is highly available with zero loss of transactional data.

      2.1.1.2 Load Balancing Cluster

      Round robin load balancing, weight‐based load balancing, random load balancing, and server affinity load balancing are examples of load balancing. Round robin load balancing chooses servers in sequential order, starting from the first server in the list and continuing until the last server is chosen. Once the last server has been chosen, selection resets to the top of the list. The weight‐based load balancing algorithm takes into account a weight previously assigned to each server. The weight is a numerical value between 1 and 100 that determines the proportion of the load the server can bear relative to the other servers. If the servers bear equal weight, the load is distributed equally among them. Random load balancing routes requests to servers at random. It is suitable only for homogeneous clusters, where the machines are similarly configured, because random routing does not allow for differences in processing power among the machines. Server affinity load balancing is the ability of the load balancer to remember the server on which a client initiated its request and to route that client's subsequent requests to the same server.
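The four strategies above can be sketched as follows. All class and method names are hypothetical. The weighted balancer is one possible interpretation: it picks servers at random with probability proportional to their weights, so the proportions hold only on average. The affinity balancer assumes that each client can be identified by a `client_id` and delegates the first choice to another balancer.

```python
import itertools
import random


class RoundRobinBalancer:
    """Cycles through the server list in order, wrapping back to the top."""

    def __init__(self, servers):
        self._cycle = itertools.cycle(list(servers))

    def pick(self, client_id=None):
        return next(self._cycle)


class WeightedBalancer:
    """Picks servers with probability proportional to their weight (1 to 100)."""

    def __init__(self, weights, rng=None):
        for server, weight in weights.items():
            if not 1 <= weight <= 100:
                raise ValueError(f"weight for {server} must be between 1 and 100")
        self._servers = list(weights)
        self._weights = list(weights.values())
        self._rng = rng or random.Random()

    def pick(self, client_id=None):
        return self._rng.choices(self._servers, weights=self._weights, k=1)[0]


class RandomBalancer:
    """Picks a server uniformly at random; suited to similarly configured machines."""

    def __init__(self, servers, rng=None):
        self._servers = list(servers)
        self._rng = rng or random.Random()

    def pick(self, client_id=None):
        return self._rng.choice(self._servers)


class AffinityBalancer:
    """Remembers each client's first server and keeps routing the client there."""

    def __init__(self, inner):
        self._inner = inner   # balancer used for a client's first request
        self._assigned = {}   # client_id -> server

    def pick(self, client_id):
        if client_id not in self._assigned:
            self._assigned[client_id] = self._inner.pick(client_id)
        return self._assigned[client_id]


servers = ["s1", "s2", "s3"]
rr = RoundRobinBalancer(servers)
print([rr.pick() for _ in range(4)])  # ['s1', 's2', 's3', 's1']

sticky = AffinityBalancer(RoundRobinBalancer(servers))
print(sticky.pick("alice"), sticky.pick("bob"), sticky.pick("alice"))  # s1 s2 s1
```

Wrapping another balancer inside `AffinityBalancer` keeps the affinity logic separate from the policy used for a client's first request. This is a design choice of the sketch, not something the text prescribes.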

      2.1.2 Cluster Structure

      In a basic cluster structure, a group of computers is linked together and works as a single computer. Clusters are deployed to improve performance and availability. Based on how these computers are linked together, cluster structure is classified into two types:

       Symmetric clusters

       Asymmetric clusters


       Replication—Replication is the process of placing the same set of data over multiple nodes. Replication can be performed using a peer‐to‐peer model or a master‐slave model.

       Sharding—Sharding is the process of placing different sets of data on different nodes.

       Sharding and Replication—Sharding and replication can either be used alone or together.
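Replication in the master‐slave model mentioned above can be illustrated with a minimal in-memory sketch. The class `ReplicatedStore` and its methods are hypothetical, and the sketch assumes synchronous replication: every write to the master is copied to all slaves before the call returns. Real systems often replicate asynchronously and must handle node failures, which is omitted here.

```python
class ReplicatedStore:
    """Keeps the same key-value data on a master and on every slave node."""

    def __init__(self, num_slaves):
        self.master = {}
        # One independent dictionary per slave node.
        self.slaves = [{} for _ in range(num_slaves)]

    def write(self, key, value):
        # Writes go to the master first, then to each slave (synchronous, an assumption).
        self.master[key] = value
        for slave in self.slaves:
            slave[key] = value

    def read(self, key, replica=0):
        # Reads can be served by any slave because all hold the same data.
        return self.slaves[replica][key]


store = ReplicatedStore(num_slaves=2)
store.write("emp-1", "Alice")
print(store.read("emp-1", replica=0), store.read("emp-1", replica=1))  # Alice Alice
```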

      2.2.1 Sharding

      Sharding is the process of partitioning very large data sets into smaller, easily manageable chunks called shards. The shards are stored by distributing them across multiple machines called nodes. No two shards of the same file are stored on the same node: each shard occupies a separate node, and the shards spread across the nodes collectively constitute the data set.
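The partitioning described above can be sketched as follows. The sketch assumes hash-based sharding on a record key, which is only one of several possible partitioning schemes and is not stated in the text. The function names, the node names, and the sample employee records are hypothetical. `zlib.crc32` is used instead of Python's built-in `hash` because it gives the same result on every run.

```python
import zlib


def shard_for(key, num_shards):
    """Maps a key to a shard index with a stable hash (hash-based sharding is an assumption)."""
    return zlib.crc32(key.encode("utf-8")) % num_shards


def shard_records(records, key_field, node_names):
    """Splits records into one shard per node; each record lands in exactly one shard."""
    shards = {name: [] for name in node_names}
    for record in records:
        index = shard_for(str(record[key_field]), len(node_names))
        shards[node_names[index]].append(record)
    return shards


# Hypothetical employee data set and node names, for illustration only.
employees = [{"id": i, "name": f"employee-{i}"} for i in range(1, 13)]
nodes = ["node-A", "node-B", "node-C", "node-D"]

shards = shard_records(employees, "id", nodes)
for node, shard in shards.items():
    print(node, [record["id"] for record in shard])

# The shards together make up the original data set.
assert sum(len(shard) for shard in shards.values()) == len(employees)
```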


      Figure 2.6b shows an example of how a data block is split up into shards across multiple nodes. A data set with employee details is split up into four small blocks: shard A, shard B, shard C, shard D and