
RAID 5 vs RAID 6: Which one is best for SSDs?

Compare RAID 5 and RAID 6, including resiliency, performance, erasure coding, and the impact on SSDs.

Two RAID levels that get compared quite a lot are RAID 5 and RAID 6. What are the differences between them, and which one should you use for SSDs?

Parity Data – What is it?

Understanding parity data is key to understanding the differences between RAID 5 and RAID 6 and which one you might want to use with SSDs. Parity information protects your data and lets you recover what was lost if a disk fails in your storage configuration. The system computes this parity data and stores it across the disks in the RAID array.

Parity data is the key ingredient to raid

Parity data provides error correction. It’s a mathematical technique that adds redundancy so you can recover your data after a drive failure. The core idea is to combine the values of the bits at the same position across multiple data blocks to calculate a parity bit.

XOR Operation and Parity

One of the most common operations to determine parity is the XOR (Exclusive OR) operation. Let’s consider a simplified example using binary data:

  • Data Block A: 1010

  • Data Block B: 1100

If we were to XOR the corresponding bits of these blocks, we’d get:

1 XOR 1 = 0
0 XOR 1 = 1
1 XOR 0 = 1
0 XOR 0 = 0

So, our parity block becomes: 0110
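The same calculation can be sketched in a few lines of Python. This is only an illustration of the XOR step above; the function name `xor_parity` is made up for this example and does not come from any RAID tool.

```python
def xor_parity(*blocks: str) -> str:
    """Return the bitwise XOR of equal-length binary strings such as '1010'."""
    width = len(blocks[0])
    if any(len(block) != width for block in blocks):
        raise ValueError("all blocks must have the same length")
    parity = 0
    for block in blocks:
        parity ^= int(block, 2)  # XOR each block into the running parity
    return format(parity, f"0{width}b")  # keep leading zeros


block_a = "1010"
block_b = "1100"
print(xor_parity(block_a, block_b))  # prints 0110
```

Because XOR works bit by bit, the same function accepts any number of blocks, which is how a parity block can cover more than two data drives.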

Using Parity for Data Recovery

Let’s assume Data Block A (1010) gets lost or corrupted. We can recover it using the remaining Data Block B and the parity block.

Xor operations allow restoring data from parity information

Using the XOR operation again:

  • Lost Data Block A: ???? (This is what we want to recover)

  • Data Block B: 1100

  • Parity Block: 0110

If we XOR Data Block B and the Parity Block:

1 XOR 0 = 1
1 XOR 1 = 0
0 XOR 1 = 1
0 XOR 0 = 0

The result is 1010, which is the original Data Block A!
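Here is a minimal Python sketch of that recovery step. It first repeats the two-block example from above, then shows the same idea with three byte-sized blocks. The helper name `xor_blocks` and the sample byte values are assumptions made for illustration.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte strings together, one byte position at a time."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


# The two-block example from the text, written as integers.
block_b = 0b1100
parity = 0b0110
recovered_a = block_b ^ parity
print(format(recovered_a, "04b"))  # prints 1010

# The same idea with three data blocks (hypothetical sample values).
data = [b"\x0a", b"\x0c", b"\x05"]
parity_block = xor_blocks(data)

# Pretend data[1] is lost: XOR the surviving blocks with the parity block.
survivors = [data[0], data[2], parity_block]
assert xor_blocks(survivors) == data[1]
```

XOR-ing everything that survived always yields the one missing block, no matter which block was lost.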

This is a basic example, but the principle extends to RAID arrays with multiple drives. In configurations like RAID 5, the parity data is spread across all drives rather than being located in a single “parity drive.” This distribution enhances performance and ensures that the loss of any single drive can be recovered using the data from the remaining drives combined with the distributed parity data.
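To picture what "spread across all drives" means, here is a small sketch that prints where the parity block could sit in each stripe. The rotation rule used here is just one possible layout, assumed for illustration; real controllers and software RAID implementations offer several layouts, and the function name `stripe_layout` is hypothetical.

```python
def stripe_layout(num_drives: int, num_stripes: int):
    """Return one row per stripe, rotating the parity block across drives.

    Assumed rule for this sketch: parity starts on the last drive and moves
    one drive to the left with every stripe.
    """
    rows = []
    for stripe in range(num_stripes):
        parity_drive = (num_drives - 1 - stripe) % num_drives
        row = []
        data_index = 0
        for drive in range(num_drives):
            if drive == parity_drive:
                row.append(f"P{stripe}")
            else:
                row.append(f"D{stripe}.{data_index}")
                data_index += 1
        rows.append(row)
    return rows


for row in stripe_layout(num_drives=4, num_stripes=4):
    print("  ".join(row))
```

With four drives, the parity block lands on the last drive for the first stripe and moves one drive to the left for each following stripe, so no single drive holds all of the parity.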

RAID 6 and Dual Parity

While RAID 5 uses a single set of parity data, RAID 6 introduces dual parity, allowing for recovery even if two drives fail simultaneously. This dual parity is more complex than the single parity in RAID 5.

RAID 6 uses two distinct mathematical operations (including XOR) to calculate two sets of parity information. This means even if two sets of data are lost, the dual parity and the remaining data blocks can still reconstruct the original information.
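The article does not say which second operation is used, so the following is a sketch under an assumption: one widely used construction (the Linux md driver uses it, for example) keeps the XOR parity as "P" and adds a second parity "Q" computed with multiplication in the finite field GF(2^8). All function names here are made up for the example, and each "block" is a single byte to keep things short.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8), reducing by the polynomial 0x11D."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11D
        b >>= 1
    return result


def gf_pow(base: int, exponent: int) -> int:
    result = 1
    for _ in range(exponent):
        result = gf_mul(result, base)
    return result


def gf_inv(a: int) -> int:
    """Inverse of a nonzero byte: a**254, since a**255 == 1 in GF(2^8)."""
    return gf_pow(a, 254)


def pq_parity(data):
    """Return (P, Q) for a list of data bytes, one byte per drive."""
    p = q = 0
    for i, d in enumerate(data):
        p ^= d                        # P: plain XOR parity
        q ^= gf_mul(gf_pow(2, i), d)  # Q: each drive weighted by 2**i
    return p, q


def recover_two(data, x, y, p, q):
    """Rebuild data[x] and data[y] (both lost) from the survivors, P and Q."""
    p_rest = q_rest = 0
    for i, d in enumerate(data):
        if i in (x, y):
            continue
        p_rest ^= d
        q_rest ^= gf_mul(gf_pow(2, i), d)
    p_xy = p ^ p_rest  # equals Dx ^ Dy
    q_xy = q ^ q_rest  # equals 2**x * Dx ^ 2**y * Dy
    gx, gy = gf_pow(2, x), gf_pow(2, y)
    dx = gf_mul(q_xy ^ gf_mul(gy, p_xy), gf_inv(gx ^ gy))
    return dx, p_xy ^ dx


data = [0x0A, 0x0C, 0x05, 0xF0]  # hypothetical sample bytes
p, q = pq_parity(data)
damaged = list(data)
damaged[1] = damaged[3] = None   # two drives fail at once
assert recover_two(damaged, 1, 3, p, q) == (0x0C, 0xF0)
```

Two unknown blocks need two independent equations, which is why a second XOR alone would not be enough and a different operation is required for the second parity.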

What is RAID 5?

RAID 5 stripes data, along with parity, across all disks in your RAID array. It needs a minimum of three disks and can withstand a single disk failure. Its main advantage is good read performance. However, write performance takes a hit because of the overhead of calculating and saving parity blocks.

Raid 5 is a standard in raid arrays

Key Features of RAID 5:

  • Redundancy: Protection against single disk failure.

  • Write Performance: Writes are slower because parity information has to be calculated and written along with the data.

  • Capacity: You lose the equivalent of one disk of capacity to parity information.

  • Hardware vs Software RAID: RAID 5 can be implemented using both hardware RAID controllers and software RAID solutions. But always use hardware if possible.

What is RAID 6?

RAID 6 is another common RAID level. Its advantage is that you can lose 2 disks instead of just 1. But there are disadvantages to know about, especially with SSDs. It has to calculate and write twice as much parity information as RAID 5. With SSDs, that is something to consider because of the extra wear it introduces.

Raid 6 protects your data even more than raid 5

Key Differences between RAID 5 vs RAID 6:

  • Fault Tolerance: While RAID 5 offers fault tolerance against a single disk failure, RAID 6 is resilient to two simultaneous failures.

  • Write Speed: RAID 6 can have noticeably slower write speeds than RAID 5 since it has to write double the parity information.

  • Disk Requirements: At least four disks are needed for RAID 6 compared to RAID 5 which has a minimum of three.

  • Parity RAID: RAID 6 is sometimes called “dual parity RAID” because of the two sets of parity data it uses.
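The capacity and fault-tolerance differences above boil down to simple arithmetic. Here is a small sketch; the function name `raid_summary` and the example disk sizes are made up for illustration.

```python
def raid_summary(level: int, num_disks: int, disk_tb: float) -> dict:
    """Usable capacity and tolerated failures for RAID 5 or RAID 6."""
    parity_disks = {5: 1, 6: 2}[level]  # disks' worth of capacity lost to parity
    min_disks = {5: 3, 6: 4}[level]
    if num_disks < min_disks:
        raise ValueError(f"RAID {level} needs at least {min_disks} disks")
    data_disks = num_disks - parity_disks
    return {
        "usable_tb": data_disks * disk_tb,
        "tolerated_failures": parity_disks,
        "efficiency": data_disks / num_disks,
    }


# Hypothetical array: six 2 TB SSDs.
print(raid_summary(5, 6, 2.0))
print(raid_summary(6, 6, 2.0))
```

For the same six disks, RAID 6 gives up one more disk of capacity than RAID 5 in exchange for surviving a second failure.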

Performance

RAID 5 performs fewer write operations than RAID 6 because it only has to calculate and store one set of parity data. RAID 6, with its double parity, has slower write speeds due to the additional parity calculations and writes.
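A rough way to reason about this is the commonly quoted "write penalty" rule of thumb: a small random write costs about 4 disk I/Os on RAID 5 and about 6 on RAID 6. The sketch below uses those figures as assumptions, not measurements; real arrays with caching or full-stripe writes can behave quite differently, and the function names and IOPS numbers are made up for the example.

```python
# Assumed rule-of-thumb cost of one small random write:
#   RAID 5: read old data + read old parity + write data + write parity = 4
#   RAID 6: the same, plus a read and a write for the second parity     = 6
WRITE_PENALTY = {"RAID 5": 4, "RAID 6": 6}


def updated_parity(old_parity: int, old_data: int, new_data: int) -> int:
    """New XOR parity after one data block changes (read-modify-write)."""
    return old_parity ^ old_data ^ new_data


def effective_iops(raw_iops: float, write_fraction: float, penalty: int) -> float:
    """Estimate host-visible IOPS for a given read/write mix."""
    return raw_iops / ((1 - write_fraction) + write_fraction * penalty)


# Changing block A from 1010 to 0011 while block B stays 1100:
new_parity = updated_parity(0b0110, 0b1010, 0b0011)
assert new_parity == 0b0011 ^ 0b1100

# Hypothetical array with 100,000 raw IOPS and 30% writes.
for level, penalty in WRITE_PENALTY.items():
    print(level, round(effective_iops(100_000, 0.3, penalty)))
```

The `updated_parity` helper shows why the old data and old parity have to be read first: the new parity is derived from them rather than recalculated from every drive.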

Disk Failures and Data Recovery

In RAID 5, if a disk fails, the RAID array can keep operating, using the parity data already written to recreate the data from the failed drive. However, RAID 5 is vulnerable to data loss during the rebuild: a single additional drive failure while the array is in a degraded state results in the whole array being lost.

RAID 6 offers better data protection: the system can continue operating even with two failed disks. The dual parity data lets the system rebuild data even if two disks fail simultaneously.
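To get a feel for the rebuild-window risk, here is a deliberately simple model. It assumes drives fail independently at a constant rate and ignores unrecoverable read errors, so treat the output as an illustration only. The annual failure rate, rebuild time, and function names are all hypothetical.

```python
import math


def drive_failure_prob(annual_failure_rate: float, hours: float) -> float:
    """Chance that one drive fails within `hours`, assuming a constant rate."""
    rate_per_hour = -math.log(1.0 - annual_failure_rate) / 8760.0
    return 1.0 - math.exp(-rate_per_hour * hours)


def prob_at_least(k: int, n: int, p: float) -> float:
    """Probability that at least k of n independent drives fail."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def rebuild_loss_risk(num_disks: int, annual_failure_rate: float,
                      rebuild_hours: float, parity_disks: int) -> float:
    """Risk of losing the array during a rebuild, after one drive has failed.

    RAID 5 (parity_disks=1) is lost after 1 more failure,
    RAID 6 (parity_disks=2) only after 2 more failures.
    """
    survivors = num_disks - 1
    p = drive_failure_prob(annual_failure_rate, rebuild_hours)
    return prob_at_least(parity_disks, survivors, p)


# Hypothetical numbers: 8 drives, 2% annual failure rate, 24-hour rebuild.
raid5_risk = rebuild_loss_risk(8, 0.02, 24, parity_disks=1)
raid6_risk = rebuild_loss_risk(8, 0.02, 24, parity_disks=2)
print(f"RAID 5: {raid5_risk:.6%}   RAID 6: {raid6_risk:.6%}")
```

Whatever inputs you plug in, the RAID 6 figure stays below the RAID 5 figure, because RAID 6 needs two further failures in the same window before data is lost.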

RAID 5/6 vs Erasure Coding in HCI Systems

Hyper-converged infrastructure (HCI) systems bring a different approach to data resiliency, including Erasure Coding. Let’s quickly compare traditional RAID 5/6 with Erasure Coding as found in HCI systems like VMware vSAN.

RAID 5/6 Overview:

RAID 5/6: These RAID levels offer redundancy through parity data, ensuring data integrity and availability during single (RAID 5) or double (RAID 6) disk failures.

  • Performance: RAID 5/6 has balanced read and write speeds suitable for various applications, with RAID 6 experiencing slightly slower writes due to dual parity calculations.
  • Capacity Efficiency: While RAID 5 loses one disk’s capacity for parity, RAID 6 sacrifices two.
  • Scalability: Traditional RAID structures may face scalability issues as they depend on dedicated hardware RAID controllers, limiting the number of disks in the array.
Raid5 erasure coding between 4 nodes

Erasure Coding Overview:

Erasure Coding (EC): EC is a forward error correction technique employed in HCI systems, breaking data into fragments, expanding and encoding them, and then storing them across different locations. This is typically done between nodes and not disks. So, these systems are often referred to as RAIN (Redundant Array of Independent Nodes).

  • Performance: While read operations in EC are efficient, write operations might suffer due to computational overhead, especially during fragment creation and encoding processes.
  • Capacity Efficiency: EC has higher storage efficiency than RAID 6, as it can provide the same or higher levels of redundancy without sacrificing as much storage space.
  • Scalability: EC excels in scalability, being a software-defined solution, making it apt for the distributed architecture found in HCI systems, easily accommodating growing data needs.
Raid6 erasure coding between 6 nodes

Quick Comparison:

  • Efficiency: Erasure Coding is typically more storage-efficient than RAID 5/6, especially important in large-scale and cloud environments where storage efficiency is paramount.
  • Fault Tolerance: While RAID 5/6 are limited to tolerating one or two disk failures respectively, Erasure Coding can be configured to withstand multiple failures, providing enhanced reliability.
  • Use Case Suitability: RAID 5/6 might be preferable for smaller, hardware-defined storage solutions, while Erasure Coding is well-suited for large, distributed, and software-defined storage environments inherent to HCI systems.
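Storage efficiency for erasure coding is easy to work out: with k data fragments and m parity fragments, the usable share is k / (k + m). The sketch below compares two layouts with plain mirroring. The 3+1 and 4+2 layouts are assumed here to match the 4-node and 6-node figures; check your HCI platform's documentation for the layouts it actually uses. The function names are made up for the example.

```python
from fractions import Fraction


def ec_efficiency(data_fragments: int, parity_fragments: int) -> Fraction:
    """Usable share of raw capacity for a k+m erasure coding layout."""
    return Fraction(data_fragments, data_fragments + parity_fragments)


def mirror_efficiency(failures_tolerated: int) -> Fraction:
    """Usable share when keeping full copies (one extra copy per failure)."""
    return Fraction(1, failures_tolerated + 1)


# Assumed layouts: 3+1 across 4 nodes, 4+2 across 6 nodes.
print("3+1 erasure coding:", ec_efficiency(3, 1))    # prints 3/4
print("4+2 erasure coding:", ec_efficiency(4, 2))    # prints 2/3
print("mirror, 1 failure: ", mirror_efficiency(1))   # prints 1/2
print("mirror, 2 failures:", mirror_efficiency(2))   # prints 1/3
```

For the same number of tolerated failures, the erasure-coded layouts keep more of the raw capacity usable than full copies do, which is where the efficiency argument comes from.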

Best RAID Levels for SSDs

Understanding how raid levels affect ssd wear

SSDs can only endure a finite number of write (program/erase) cycles before they wear out. Different RAID levels generate different amounts of writes, so some will wear out your SSDs faster than others.

RAID Levels and SSD Wear

Certain RAID levels perform more write operations than others, which can lead to accelerated wear on SSDs:

  • RAID 0 (Striping): Stripes data across multiple SSDs for good performance. No parity data is written in RAID 0, so it doesn’t add additional writes, making it relatively friendly for SSD wear.

  • RAID 1 (Mirroring): RAID 1 mirrors data so that it is stored on both drives. Each mirrored drive receives the same number of writes as a standalone drive would, so it doesn’t accelerate wear on either drive.

  • RAID 5 (Single Parity): Each write operation in RAID 5 also updates parity, which increases what is called write amplification and accelerates SSD wear. However, with modern SSDs and their wear-leveling mechanisms, this is usually still acceptable.

  • RAID 6 (Dual Parity): This RAID level is more of a concern when it comes to SSDs. Since it updates two sets of parity data for every data write, it puts more wear and tear on your SSDs than RAID 5. It is less ideal for SSDs.
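As a rough sketch of the parity overhead described above, the snippet below counts how much data reaches flash when every small write also has to update its parity blocks. It is a simplification under stated assumptions: it only counts parity updates for small random writes, and it ignores full-stripe writes, mirroring, and the SSD's own internal write amplification. The names and the 10 TB figure are hypothetical.

```python
# Parity blocks rewritten for each small data write (assumption for this sketch).
PARITY_BLOCKS_PER_SMALL_WRITE = {"RAID 0": 0, "RAID 5": 1, "RAID 6": 2}


def flash_writes_tb(host_writes_tb: float, level: str) -> float:
    """Total TB written across the array for a given amount of host writes."""
    return host_writes_tb * (1 + PARITY_BLOCKS_PER_SMALL_WRITE[level])


# Hypothetical workload: 10 TB of small random writes from the host.
for level in PARITY_BLOCKS_PER_SMALL_WRITE:
    print(f"{level}: {flash_writes_tb(10, level):.0f} TB written to flash")
```

Comparing the result with the endurance rating (TBW) of your drives gives a first estimate of how much sooner a dual-parity array could reach its wear limit under a write-heavy workload.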

Parity calculations and wear patterns

Best RAID Levels for SSDs

  1. RAID 1 (Mirroring): Offers redundancy without accelerating wear. It’s a good choice for critical data that doesn’t require large storage pools.

  2. RAID 10 (1+0): Combining the best of RAID 1 and RAID 0, RAID 10 will give you redundancy from mirroring and better performance from striping. The wear considerations are similar to RAID 1, making it SSD-friendly.

  3. RAID 5: RAID 5 can be a good choice for SSDs given modern wear-leveling mechanisms and overprovisioning. However, regular monitoring of SSD health and prompt replacement of worn drives are essential.

Considering TRIM

Another aspect of RAID with SSDs is the TRIM command. TRIM allows the operating system to tell the SSD which blocks of data are no longer in use and can be wiped. Making sure that your RAID controller or software RAID supports TRIM is important for optimal SSD performance and longevity.
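On Linux, one way to check whether a block device (including a software RAID device) advertises TRIM/discard support is to read its `discard_max_bytes` attribute in sysfs, where a value of 0 means discard is not supported. The sketch below assumes a Linux system; the function name `supports_discard` and the device name `md0` are only examples.

```python
from pathlib import Path


def supports_discard(device: str, sysfs_root: str = "/sys/block") -> bool:
    """Return True if the Linux block device advertises discard (TRIM) support."""
    attribute = Path(sysfs_root) / device / "queue" / "discard_max_bytes"
    try:
        return int(attribute.read_text().strip()) > 0
    except (OSError, ValueError):
        # Missing device, non-Linux system, or unreadable value.
        return False


# "md0" is a hypothetical software RAID device name.
print("md0 supports TRIM:", supports_discard("md0"))
```

Hardware RAID controllers may hide the SSDs behind a virtual disk, so for those it is worth confirming TRIM support in the controller's own documentation as well.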

Wrapping up

There are many comparisons of RAID 5 vs RAID 6 out there. These are two of the most common RAID levels in the enterprise and definitely ones you need to be aware of. When using them with SSDs, there are several factors to consider, as we have seen.


Brandon Lee

Brandon Lee is the Senior Writer, Engineer and owner at Virtualizationhowto.com and has over two decades of experience in Information Technology. Having worked for numerous Fortune 500 companies as well as in various industries, he has extensive experience in various IT segments and is a strong advocate for open source technologies. Brandon holds many industry certifications, loves the outdoors and spending time with family. Also, he goes through the effort of testing and troubleshooting issues, so you don't have to.
