of architecture, some systems may experience lower performance than they would with a single larger pool. In addition, with multiple smaller pools, capacity planning becomes more complex: growth across disk pools may be uneven, and more disk resources are likely to sit unused overall.
Both options have advantages and drawbacks, and there is no one perfect solution. However, the decision to design a solution around multiple smaller pools rather than one universal disk pool will likely come down to one or more of the following key design factors:
• Disk pools based on function, such as development, QA, production, and so on. This option may be preferred if you are concerned about the performance of specific environments and want to isolate them so they cannot impact the production system.
• In multitenanted environments, whether public or based on internal business units, each tenant can be allocated its own pool. However, depending on the environment and SLAs, each tenant might end up with multiple pools in order to address specific I/O characteristics of various applications.
• Application-based pools, such as database or email systems. This can provide optimum performance, as applications of a similar type often have similar I/O characteristics, which makes pools designed around application type worth considering. However, this approach also carries the risk that one database, for instance, generates very high volumes of I/O and impacts other databases residing on the same disk pool.
• Drive technology and RAID type. This allows you to place data on the storage type that best matches the application I/O characteristics, such as read-heavy versus write-heavy or random versus sequential access. However, this approach can also increase costs and does not address any specific application I/O intensity requirement.
• Storage tier–based pools (such as Gold, Silver, and Bronze) could allow you to mix drive technologies and/or RAID types within each pool, therefore reducing the number of pools required to support most application types, configurations, and SLAs.
RAID Sets
The term RAID has already been used multiple times in different contexts, so let’s address this technology next.
RAID (redundant array of independent disks) combines two or more disk drives into a logical grouping, typically known as a RAID set. Under the control of a RAID controller (or in the case of a storage system, the storage processors or controllers), the RAID set appears to the connected hosts as a single logical disk drive, even though it is made up of multiple physical disks. RAID sets provide four primary advantages to a storage system:
• Higher data availability
• Increased capacity
• Improved I/O performance
• Streamlined management of storage devices
Typically, the storage array management software handles the following aspects of RAID technology:
• Management and control of disk aggregation
• Translation of I/O requests between the logical and the physical entities
• Error correction if disk failures occur
The physical disks that make up a RAID set can be either traditional mechanical disks or solid-state flash drives (SSDs). RAID sets have various levels, each optimized for specific use cases. Unlike many other common technologies, RAID levels are not standardized by an industry group or standardization committee. As a result, some storage vendors provide their own unique implementation of RAID technology. However, the following common RAID levels are covered in this chapter:
• RAID 0–striping
• RAID 1–mirroring
• RAID 5–striping with parity
• RAID 6–striping with double parity
• RAID 10–combining mirroring and striping
Determining which type of RAID to use when building a storage solution largely depends on three factors: capacity, availability, and performance. This section addresses the basic concepts that provide a foundation for understanding disk arrays, and how RAID can enable increased capacity by combining physical disks, provide higher availability in case of a drive failure, and increase performance through parallel drive access.
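To put two of these factors, capacity and availability, into concrete terms, the following Python sketch compares usable capacity and drive-failure tolerance for the RAID levels listed above. The drive counts and the 2 TB drive size are purely illustrative assumptions, and the figures are idealized; exact behavior varies by vendor implementation.

# Rough capacity and fault-tolerance comparison for common RAID levels.
# Drive counts and drive size below are illustrative assumptions only.

def raid_summary(level: str, drives: int, drive_tb: float) -> tuple[float, int]:
    """Return (usable capacity in TB, minimum drive failures tolerated)."""
    if level == "RAID 0":        # striping only, no redundancy
        return drives * drive_tb, 0
    if level == "RAID 1":        # two-way mirroring
        return drives * drive_tb / 2, 1
    if level == "RAID 5":        # striping with single parity
        return (drives - 1) * drive_tb, 1
    if level == "RAID 6":        # striping with double parity
        return (drives - 2) * drive_tb, 2
    if level == "RAID 10":       # mirrored pairs that are then striped
        return drives * drive_tb / 2, 1
    raise ValueError(f"unknown RAID level: {level}")

configs = [("RAID 0", 4), ("RAID 1", 2), ("RAID 5", 4), ("RAID 6", 6), ("RAID 10", 4)]
for level, drives in configs:
    usable, tolerated = raid_summary(level, drives, drive_tb=2.0)
    print(f"{level:8s} {drives} drives: {usable:4.1f} TB usable, tolerates {tolerated}+ failed drive(s)")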
A key element of RAID is redundancy, which improves fault tolerance. Depending on the RAID level configured, redundancy is achieved through two mechanisms: mirroring and striping (the latter in combination with parity). Before addressing the RAID set capabilities typically used in storage array systems, we must first explain these two terms and what they mean for availability, capacity, performance, and manageability.
NOTE
Some storage systems also provide a JBOD configuration, which is an acronym for just a bunch of disks. In this configuration, the disks do not use any specific RAID level, and instead act as stand-alone drives. This type of disk arrangement is most typically employed for storage devices that contain swap files or spooling data, where redundancy is not paramount.
Striping in RAID Sets
As highlighted previously, RAID sets are made up of multiple physical disks. Within each disk are groups of contiguously addressed blocks, called strips. The set of aligned strips that spans all disks within the RAID set is called a stripe (see Figure 2.3).
Figure 2.3 Strips and stripes
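As a simple way to visualize how strips and stripes relate, the short Python sketch below maps a logical block number to the disk, stripe, and strip offset that would hold it in a plain striped layout. The strip size, block size, and disk count are illustrative assumptions, not values from any particular array.

# Minimal illustration of how a striped layout places logical blocks.
# Strip size, block size, and disk count are illustrative assumptions.

BLOCK_SIZE = 512               # bytes per logical block
STRIP_SIZE = 64 * 1024         # bytes per strip (a contiguous run of blocks on one disk)
DISKS = 4                      # physical disks in the RAID set

BLOCKS_PER_STRIP = STRIP_SIZE // BLOCK_SIZE

def locate(block: int) -> tuple[int, int, int]:
    """Return (disk index, stripe index, offset within strip) for a logical block."""
    strip_number = block // BLOCKS_PER_STRIP   # which strip, counting across all disks
    disk = strip_number % DISKS                # strips rotate round-robin across disks
    stripe = strip_number // DISKS             # aligned strips across all disks form a stripe
    offset = block % BLOCKS_PER_STRIP          # position of the block inside its strip
    return disk, stripe, offset

# Blocks 0..127 land on disk 0 (stripe 0), blocks 128..255 on disk 1, and so on.
print(locate(0))     # (0, 0, 0)
print(locate(128))   # (1, 0, 0)
print(locate(640))   # (1, 1, 0)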
Striping improves performance by distributing data across the disks in the RAID set (see Figure 2.4). This use of multiple independent disks allows multiple reads and writes to take place concurrently, providing one of the main advantages of disk striping: improved performance. For instance, striping data across three hard disks would provide three times the bandwidth of a single drive. Therefore, if each drive runs at 175 input/output operations per second (IOPS), disk striping would make available up to 525 IOPS for data reads and writes from that RAID set.
Figure 2.4 Performance in striping
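The IOPS figure quoted above is simple arithmetic: the aggregate performance of an idealized striped set is the per-drive rate multiplied by the number of drives. A minimal sketch of that calculation follows; it ignores controller and parity overhead, which reduce real-world results.

def striped_iops(drives: int, iops_per_drive: int) -> int:
    # Idealized upper bound: every drive services I/O in parallel.
    return drives * iops_per_drive

print(striped_iops(3, 175))   # 525 IOPS, matching the example in the text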
Striping also offers the following additional benefits:
• Large amounts of data are split up as they are written: the first piece is sent to the first drive, the second piece to the second drive, and so on. These pieces are reassembled when the data is read.
• Increasing the number of physical disks in the RAID set increases performance, as more data can be read or written simultaneously.
• A higher stripe width means more drives in the set, and therefore better performance.
• Striping is managed by the storage controllers and is therefore transparent to the vSphere platform.
As part of the same mechanism, parity is provided as a redundancy check to ensure that data is protected without requiring a full set of duplicate drives, as illustrated in Figure 2.5. Parity is what gives a striped RAID set its fault tolerance, and it provides the following functionality:
Figure 2.5 Redundancy through parity
• If a single disk in the array fails, the other disks have enough redundant data so that the data from the failed disk can be recovered.
• Like striping, parity is generally a function of the RAID controller or storage controller, and is therefore fully transparent to the vSphere platform.
• Parity information can be either stored on a separate, dedicated drive or distributed across all drives in the RAID set.
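As a simplified illustration of how parity enables this recovery, the Python sketch below computes a parity strip as the XOR of the data strips in a stripe and then rebuilds a "failed" strip from the survivors. This models the general technique only, not any vendor's RAID implementation, and the strip contents are made up for the example.

# Simplified model of parity-based recovery in a striped RAID set.
# Parity is the XOR of the data strips in a stripe; any single missing
# strip can be rebuilt by XORing the remaining strips with the parity.

from functools import reduce

def xor_strips(strips: list[bytes]) -> bytes:
    """XOR equal-length strips together, byte by byte."""
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), strips)

# Three data strips belonging to one stripe (contents are illustrative).
data = [b"AAAAAAAA", b"BBBBBBBB", b"CCCCCCCC"]
parity = xor_strips(data)               # written to the parity strip

# Simulate the loss of the disk holding the second strip.
survivors = [data[0], data[2], parity]
rebuilt = xor_strips(survivors)         # XOR of the survivors recreates the lost strip

assert rebuilt == data[1]
print(rebuilt)                          # b'BBBBBBBB'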