The Storage Mistakes That Quietly Hurt Your Proxmox Performance

Proxmox storage performance

One of the things I have learned after running Proxmox clusters full time in the home lab is that storage problems can be much harder to spot than something like CPU issues. If CPU usage is high, we can all spot that pretty easily. If networking breaks, things stop working. Storage issues are different. In many cases, the problem is not the raw performance of the hardware, although it can be. It is often the decisions you make around your hardware that hurt as well. Let’s take a look at some of the storage mistakes that can quietly hurt your Proxmox performance in your home lab, and in production environments as well.

Looking at throughput only and not latency

One of the mistakes that has bitten me and affected Proxmox storage performance in the past is looking at the throughput numbers for your storage while ignoring the latency of your environment. A storage array, as an example, may be able to do several gigabytes per second. But virtualization workloads care much more about latency than raw transfer speeds.

VMs generate lots of small random reads and writes. Things like databases, Kubernetes workloads, monitoring stacks, and even simple Linux servers constantly perform small I/O operations. If latency is unpredictable or high, the entire environment starts to feel slow even if throughput numbers still look great using various benchmarks.

A great command to know, especially if you are running Ceph with Proxmox, is the following:

ceph osd perf

Below, you can see a really clean working Ceph cluster with enterprise drives and 0 ms of latency on both reads and writes.

Viewing latency in a proxmox ceph cluster

Also, if you decide to benchmark your environment, be sure to benchmark with random I/O and not just simple sequential read and write tests. Sequential tests almost always look good in a benchmark, even with consumer grade NVMe drives.
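A quick way to run a random I/O test is with fio. This is just a sketch; the file path, size, and runtime are assumptions you should adjust for the storage you want to test:

```shell
# 70/30 random read/write mix at 4K blocks, bypassing the page cache.
# /mnt/test/fio.bin is a placeholder path on the storage under test.
fio --name=randrw-test \
    --filename=/mnt/test/fio.bin --size=4G \
    --rw=randrw --rwmixread=70 --bs=4k \
    --iodepth=32 --numjobs=4 \
    --ioengine=libaio --direct=1 \
    --runtime=60 --time_based --group_reporting
```

Pay attention to the completion latency (clat) percentiles in the output, not just the bandwidth line, since those tail latencies are what your VMs actually feel.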

In Proxmox, you can also usually spot storage latency issues with a “real world” check of the following:

  • VM responsiveness during backups
  • Delays opening consoles
  • Ceph latency metrics
  • High iowait percentages
  • Slow snapshot creation
  • Applications freezing briefly under load
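To put a number on iowait and per-disk latency on a node, iostat from the sysstat package is handy. A minimal sketch:

```shell
# Install the sysstat package (Debian/Proxmox hosts)
apt install -y sysstat

# Extended per-device stats every 2 seconds, 5 reports.
# Watch %iowait in the CPU line and r_await/w_await (ms) per device.
iostat -x 2 5
```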

Low, consistent latency is usually a lot more important than chasing huge throughput numbers in the home lab.

Using consumer SSDs for heavy virtualization workloads

Earlier this year, when I built my 5-node Proxmox Ceph cluster, I was bitten by this. Consumer SSDs have improved tremendously over the last few years, but not all SSDs perform well under virtualization workloads, especially Ceph, and this can drastically impact Proxmox storage performance.

Ceph’s write amplification has a way of brutalizing consumer grade storage. One of the biggest nose dives in performance you will see comes from using low-cost QLC NVMe drives for VM storage in a Ceph cluster.

QLC drives can perform extremely well during burst activity because they rely heavily on SLC caching. However, once the SLC cache fills up, sustained write performance can nose dive dramatically. In some tests, write speeds can fall below even those of a spinning disk during long write operations.

This becomes very noticeable in a Ceph cluster, or even in more traditional storage types, when you are doing the following operations or tasks:

  • running backups
  • rebuilding Ceph
  • restoring VMs
  • moving large VM disks
  • running Kubernetes clusters
  • performing replication jobs

It can be easy to assume that “NVMe equals fast,” but not all NVMe is created equal. NAND type definitely matters in a virtualization cluster with a software-defined storage environment like Ceph.
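One way to catch a drive that looks fast in short benchmarks but collapses under sustained load is a long sequential write with fio. Again, a sketch with assumed paths and sizes; run it long enough to outrun the drive’s SLC cache:

```shell
# Long sustained sequential write; on QLC drives you will often see
# bandwidth fall off a cliff once the SLC cache is exhausted.
# /mnt/test/fio.bin is a placeholder path on the drive under test.
fio --name=sustained-write \
    --filename=/mnt/test/fio.bin --size=32G \
    --rw=write --bs=1M --iodepth=16 \
    --ioengine=libaio --direct=1 \
    --runtime=600 --time_based
```

Watching the bandwidth over the full runtime, rather than the final average, is what exposes the cache cliff.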

Note below that the Ceph OSDs I have underlined are “consumer” grade NVMe drives. Note how their latency compares to the enterprise drives, which sit at 0 for both reads and writes.

Comparing latency on enterprise vs consumer grade drives

TLC drives are usually a much better option for virtualization storage. Enterprise SSDs are usually an even better choice because they are designed for sustained write endurance with low latency. These are exactly the characteristics we need in Proxmox clusters with Ceph.

This does not mean consumer SSDs cannot work in Proxmox. I have run a home lab successfully for years with consumer hardware. The key is understanding the needs of the type of storage you are using. A simple thin pool backed by a single consumer grade NVMe drive usually works just fine. But when it comes to Ceph, these drives will definitely fall short.

Check out my benchmark comparison of consumer NVMe drives vs enterprise NVMe drives in my Proxmox Ceph cluster here:

Mixing different drives in the same storage pool

Another mistake that I have made many times and that quietly hurts performance is mixing different drive types in the same pool. Let’s face it, this is really common in a home lab due to the very nature of the way we have to acquire hardware on a budget.

When we resort to using whatever hardware is available in the home lab (nothing wrong with this as we can still learn by doing that), we see a lot of the following:

  • different SSD brands
  • mixed capacities
  • different NAND types
  • SATA mixed with NVMe
  • older drives mixed with new drives

The problem is that storage pools often operate at the speed of the slowest drive. For instance, in Ceph, if you use a very fast enterprise NVMe drive in a pool along with a consumer-grade QLC NVMe drive, your enterprise drive is going to be hamstrung by the performance (or lack thereof) of the consumer-grade drive.

In Ceph especially, inconsistent drives can create major performance issues and weirdness. One slower OSD can affect the overall performance of the cluster because workloads distribute across multiple disks. You will see this same type of issue appear with ZFS pools. A single slow drive can be the bottleneck during resilvering processes or heavy writes.

Does this mean you must buy identical hardware forever? Not necessarily. Home labs evolve over time. But we can minimize the impact of slower vs faster drives by keeping drives with similar performance characteristics in the same storage tier. By doing this, your fast drives won’t be bottlenecked by the limitations of the slower drives in the pool.

Different drive types and models in the same ceph cluster

It is a good idea to try to group similar endurance levels, similar write speeds, similar latency, and similar capacities together in your pools. This will create much more predictable storage behavior.
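One practical way to enforce this kind of tiering in Ceph is with CRUSH device classes, so a pool only uses drives of one class. A sketch; the rule name and pool name here are hypothetical placeholders for your own:

```shell
# See which device classes Ceph has assigned to your OSDs (hdd, ssd, nvme)
ceph osd crush class ls
ceph osd tree

# Create a replicated CRUSH rule restricted to the nvme class,
# then point an existing pool ("vm-pool" is a placeholder) at it
ceph osd crush rule create-replicated fast-nvme default host nvme
ceph osd pool set vm-pool crush_rule fast-nvme
```

With rules like this in place, a slower OSD in a different class can no longer drag down the pool serving your VMs.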

Running Ceph using 1 GbE networking

This is a big one. We have already explored the fact that Ceph can definitely surface performance issues with consumer grade drives. But it is also very telling about the quality and performance of the network that carries the storage traffic.

One of the biggest mistakes I see is thinking Ceph will perform well on basic 1 GbE networking simply because the environment is small. Technically, Ceph can run on 1 GbE. Practically, performance can become painful very quickly once you have multiple nodes and several VMs running. If Ceph begins a rebalance operation or has to rebuild objects, this can absolutely cripple a cluster running on 1 GbE.

It is fine to play around with Ceph on 1 GbE. But, even in a home lab, if you are considering using Ceph as the backend storage for your home lab workloads, do yourself a favor. Build the storage network with 10 GbE. It is a dramatic difference with Ceph. I found that even 2.5 GbE feels a bit anemic, especially if a rebalance is triggered.

Check out my coverage of this topic, switching from 2.5 GbE to 10 GbE:

I switched to 10 gig networking for proxmox and ceph

In my own lab, I am currently running dual 10Gb links with jumbo frames. This made a massive difference in Ceph responsiveness and stability during heavy operations, like backups or rebalancing.
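If you go this route, jumbo frames need to be enabled end to end, on the NICs, bridges, and switches alike. A sketch of what that can look like on a Proxmox node; the interface name and addresses are placeholders for your own storage network:

```shell
# /etc/network/interfaces fragment - dedicated storage NIC with MTU 9000
auto enp5s0
iface enp5s0 inet static
    address 10.10.10.11/24
    mtu 9000

# Verify jumbo frames actually pass without fragmentation:
# 9000 MTU minus 28 bytes of ICMP/IP headers = 8972 byte payload
ping -M do -s 8972 10.10.10.12
```

If the ping fails while smaller payloads work, something in the path (often the switch) is still at the default 1500 MTU.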

Ignoring discard and TRIM support

This is one that hits very close to home for me as of recently. I realized in my home lab that I hadn’t done my due diligence to make sure that blocks could be reclaimed successfully on thin provisioned storage. Many of my VMs were created without discard enabled. This was especially true for VMs that I had imported from VMware vSphere.

What this meant for me was that over time, my thin provisioned storage kept growing and my Ceph usage continued to go up. If you are using traditional storage, your SAN space disappears slowly and your backups become larger over time. Storage performance may degrade as a result as well.

Enabling discard on supported storage types is essential, and you then need to follow through in the VM disk configuration itself to make sure discard is enabled there as well.
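In Proxmox, you can enable discard on an existing VM disk from the CLI with qm set. A sketch, assuming a hypothetical VMID of 100 with a disk on local-lvm; substitute your own IDs and storage names:

```shell
# Enable discard (and advertise the disk as an SSD) on scsi0;
# the VM should use the VirtIO SCSI controller for discard to work
qm set 100 --scsi0 local-lvm:vm-100-disk-0,discard=on,ssd=1
```

The guest still has to issue the trims itself, which is where fstrim comes in below.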

You can manually reclaim storage in Linux using the following command:

fstrim -av
Running fstrim in linux

Modern versions of Windows handle this process automatically with built-in TRIM support. Be sure to check out my post on this one here:

Using RAID controllers with ZFS

This is still one of the biggest mistakes that many make with ZFS: using RAID controllers to manage the disks instead of letting ZFS manage them directly. ZFS wants direct disk access. Traditional hardware RAID controllers get in the way, as they interfere with how ZFS manages data integrity, caching, and recovery behavior.

When you place ZFS on top of hardware RAID, you get problems like SMART visibility breaking, disk failures becoming harder to see, and caching becoming unpredictable. It is not a good thing when ZFS loses visibility into the underlying disks in your pool. Performance tuning in this configuration can also become difficult or impossible.

Instead, the better approach is using HBA mode on your RAID controller, where it passes the disks through for direct management by ZFS. To do this, you need to make sure that your RAID controller supports something like IT mode for direct disk passthrough.

ZFS was designed to manage the disks itself. This is one reason many Proxmox builders specifically choose HBAs instead of traditional RAID controllers for Proxmox storage performance.
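A quick way to sanity-check that ZFS really has direct access to the disks is to confirm that the raw devices and their SMART data are visible. A sketch; the device name is a placeholder:

```shell
# The pool should list whole physical disks, not one big virtual RAID volume
zpool status
lsblk -o NAME,MODEL,SERIAL,SIZE,TRAN

# SMART data should come straight from the drive; behind hardware RAID
# this often fails or requires vendor-specific passthrough options
smartctl -a /dev/sda
```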

Putting backups on the same datastore as production VMs

You would be surprised at how many times this happens, even in production environments. Putting backups on the same storage where you store your production data is a huge “no no” and definitely not something you want to do.

Outside of this being very dangerous in terms of data loss (having all your “eggs in one basket”), it is also bad for performance. It means that you are reading from your production storage and writing to your backup storage, which are one and the same. This creates massive overhead on read and write performance. You want to have these split out.

If you don’t have these separated, you may notice things like the following:

  • VMs become sluggish
  • latency spikes appear
  • applications pause briefly
  • Docker hosts become unstable

Good options for storing your backups include dedicated PBS storage or separate SSD pools outside of your production environment. You can also target NAS devices or offsite replication targets. By doing this, you are not only improving performance by making sure your production storage doesn’t get saturated with backup activity, you are also following best practices by keeping your backups separate from production.
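Once a separate backup target is defined in Proxmox, pointing backups at it is straightforward. A sketch with vzdump, assuming a hypothetical VMID of 100 and a storage entry named pbs-backup; substitute your own names:

```shell
# Back up VM 100 to a dedicated backup datastore instead of the
# production pool the VM disks live on
vzdump 100 --storage pbs-backup --mode snapshot
```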

Check out the new settings in Proxmox Backup Server 4.2 that help us with offsite backups even more:

Adding s3 to proxmox backup server

Wrapping up

Hopefully, this list of storage mistakes commonly made in Proxmox environments, both ones I have seen and ones I have made myself, will help you avoid repeating them. Storage, like compute and networking, is one of the most important components of a home lab or production environment that performs well and runs the workloads you want to run, and does that in a way that is performant and stable. One of the best things you can do in your home lab is periodically step back and check whether your storage design still matches your workloads, instead of the workloads you originally built the environment around. How about you, what mistakes have you seen made in the area of storage? Please share what you have seen or experienced in your lab.


About The Author

Brandon Lee


Brandon Lee is the Senior Writer, Engineer and owner at Virtualizationhowto.com, and a 7-time VMware vExpert with over two decades of experience in Information Technology. Having worked for numerous Fortune 500 companies as well as in various industries, he has extensive experience in many IT segments and is a strong advocate for open source technologies. Brandon holds many industry certifications, loves the outdoors, and enjoys spending time with family. He also goes through the effort of testing and troubleshooting issues, so you don't have to.
