Your Drives Might Be Failing. Check These Free Tools


Storage is one of the most critical parts of any home lab, and of computing environments in general. Unfortunately, storage issues can creep in without any obvious signs that something is going wrong. VMs may be up and running, containers responding, backups completing, and so on. Recently, after ordering some “supposedly new” SSDs for home lab use, I decided to run a quick health check before putting them into production. What I found changed how I approach disk validation in my home lab. In this post, I will walk through the free tools I use to check disk health and catch problems early, before they turn into data loss.

Wait, individual disk health is a thing?

If you haven’t experienced many storage failures, or you are just beginning your home lab journey, you may not realize that you need to keep an eye on the health of individual disks. Here is a tip I learned recently: with all the price gouging on enterprise storage driven by the AI boom, you definitely want to check any drive that is sold to you as “new” in the package (more on that below).

In the home lab, you are often running things like:

  • Hypervisors (Proxmox, XCP-ng, Hyper-V, ESXi, etc.)
  • Distributed storage like Ceph
  • Container workloads (Docker hosts, Swarm clusters, Kubernetes)
  • Backup jobs and snapshots

All of these generate a steady stream of I/O. The tricky part, especially if you are using SSDs for performance, is that disks can degrade silently. You might not see any immediate errors, but under the surface any of the following silent drive problems may be developing:

| Issue | What it means | Why it matters |
|---|---|---|
| Higher than expected wear | The SSD has been used more than anticipated (high write cycles or wear level) | Shorter lifespan and possible early failure, especially in write-heavy workloads |
| Increasing reallocated sectors | The drive is remapping bad sectors to spare ones | A sign of physical degradation of the media (platter surface or flash cells) and growing failure risk |
| Rising error counts | Read/write or uncorrectable errors are being logged | Data integrity may already be at risk, even if the system still looks stable |
| Performance degradation | Slower or inconsistent read/write speeds | Can be a warning sign of failing hardware or worn-out NAND cells |

Unfortunately, by the time you notice a real problem, your data may already be at risk because of failing hardware. That is why proactively checking disk health with the right tools is one of the best things you can do to keep storage healthy in the home lab.

Also, this is a great way to protect yourself from scammers selling drives “new in the package” that have actually been used before.

How can checking the drives you buy protect you?

Well, let’s just say there are sellers out there trying to pass off used drives as new or barely used. Granted, enterprise-class drives usually have very high DWPD ratings. DWPD stands for Drive Writes Per Day: how many times you can write the full capacity of the drive every day over its warranty period.

So if you have a 1 TB SSD rated at 1 DWPD, you can write 1 TB per day, every day, for the entire warranty period (usually 3 to 5 years).
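To put DWPD into concrete numbers, the rated endurance in terabytes written (TBW) is simply capacity × DWPD × 365 × warranty years. The small shell sketch below does that math with awk and also estimates what share of that budget a drive has already used, given the TB-written figure you read from its SMART data. The function names are hypothetical helpers for illustration, and the inputs should come from your drive's datasheet.

```shell
# Hypothetical helper: rated endurance in TB written (TBW).
#   TBW = capacity_TB * DWPD * 365 * warranty_years
rated_tbw() {
  # $1 = capacity in TB, $2 = DWPD rating, $3 = warranty period in years
  awk -v cap="$1" -v dwpd="$2" -v yrs="$3" \
    'BEGIN { printf "%.0f\n", cap * dwpd * 365 * yrs }'
}

# Hypothetical helper: percent of rated endurance already consumed.
endurance_used_pct() {
  # $1 = TB already written (from SMART), $2 = rated TBW
  awk -v written="$1" -v tbw="$2" \
    'BEGIN { printf "%.1f\n", 100 * written / tbw }'
}

# rated_tbw 1 1 5              -> 1825  (1 TB drive, 1 DWPD, 5-year warranty)
# endurance_used_pct 900 1825  -> 49.3
```

For example, the 1 TB, 1 DWPD drive above works out to 1,825 TB written over a 5-year warranty (1,095 TB over 3 years), so a supposedly new drive that already shows hundreds of TB written has burned through a large share of its rated endurance.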

Still, if you buy drives expecting them to be new and unused, you don’t want to find out that a drive has already seen major use and has little runway left. For instance, I recently installed a drive that was sold as new, and its health data looked like this. That told me the drive had been heavily used before it ever reached me.

Supposedly new drive that was heavily used

SMART data uncovers potential disk issues

Most modern hard drives and SSDs support SMART, which stands for Self-Monitoring, Analysis, and Reporting Technology. SMART data exposes internal metrics from the drive that give you a really good peek at its overall health.

SMART data can also be misleading with some tools, though. A tool may give you a big green OK that says everything is healthy, while warning signs are still there if you look deeper into the SMART data. What should you look for in the SMART metrics? Take note of the following:

  • Wear leveling count might be high
  • Power-on hours might not match what you expect, especially if “new”
  • Uncorrectable errors might be non-zero
  • Total bytes written might show heavy usage

This is exactly what I saw with those “new” SSDs. On the surface, they looked fine. But once I dug into the SMART data, it was clear they had already seen significant use. So the goal is not just to check SMART status. It is to interpret the data behind it.
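For NVMe SSDs, smartctl -a prints a “SMART/Health Information” section with lines such as Percentage Used, Data Units Written, Power On Hours, and Media and Data Integrity Errors. The sketch below is a hypothetical helper that assumes that text layout and pulls those four values out, so you can compare them with what a “new” drive should show. Per the NVMe specification, one data unit is 1,000 blocks of 512 bytes, so Data Units Written is multiplied by 512,000 bytes to get TB.

```shell
# Hypothetical helper: summarize the NVMe SMART/Health section of
# `smartctl -a /dev/nvme0` (text output) into key=value lines.
# Assumes lines like "Data Units Written:   12,345,678 [6.32 TB]".
nvme_summary() {
  awk -F':' '
    {
      key = $1
      val = $2
      sub(/^[ \t]+/, "", val)   # trim the padding after the colon
      split(val, parts, " ")    # the first token is the number
      num = parts[1]
      gsub(/[,%]/, "", num)     # drop thousands separators and percent signs
    }
    key == "Power On Hours"                  { print "power_on_hours=" num }
    key == "Percentage Used"                 { print "percentage_used=" num }
    key == "Media and Data Integrity Errors" { print "media_errors=" num }
    # One NVMe data unit = 1,000 x 512 bytes = 512,000 bytes.
    key == "Data Units Written" { printf "tb_written=%.2f\n", num * 512000 / 1e12 }
  '
}

# Usage (as root): smartctl -a /dev/nvme0 | nvme_summary
```

On a drive that really is new, you would expect very low power-on hours, a Percentage Used of 0, very little data written, and zero media errors; exact numbers vary with vendor and factory testing.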

smartctl and smartd

If you are running Linux, or a Linux-based platform like Proxmox, these tools can be the foundation of your disk health checks. The smartctl tool is part of the smartmontools package and gives you direct access to SMART data.

A basic command looks like this:

smartctl -a /dev/sda
Running the smartctl -a command to view the SMART data for a disk

This will dump everything about the drive, including the following information.

  • Health status
  • Power-on hours
  • Wear indicators (especially for SSDs)
  • Reallocated sectors
  • Temperature history
  • Error logs
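If you have several disks in a host, you can let smartctl find them for you. The sketch below uses hypothetical function names and assumes smartctl's --scan output format (one device per line, followed by its -d type); it reduces each drive's smartctl -H result to a single verdict. Keep in mind that PASSED only means the drive has not tripped its own failure thresholds, so treat it as a first pass before reading the full attribute output.

```shell
# Reduce smartctl health output to a single verdict word.
# ATA/SATA and NVMe drives print a line like:
#   SMART overall-health self-assessment test result: PASSED
# SAS/SCSI drives print:
#   SMART Health Status: OK
health_verdict() {
  awk -F': *' '
    /overall-health self-assessment test result|SMART Health Status/ { print $2; found = 1 }
    END { if (!found) print "UNKNOWN" }'
}

# Walk every device reported by `smartctl --scan` (run as root). Each scan line
# looks like "/dev/sda -d sat # /dev/sda [SAT], ATA device", so the device is
# the first field and the -d type is the third.
health_all() {
  smartctl --scan | while read -r dev _ type _; do
    printf '%s (%s): ' "$dev" "$type"
    smartctl -H -d "$type" "$dev" | health_verdict
  done
}
```

Source the functions in a root shell and run health_all to get one line per drive.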

What makes smartctl so powerful is that it works in just about any environment you would run in a home lab. You can run it on the following:

  • Linux servers
  • Proxmox hosts
  • NAS devices
  • Many enterprise environments

You can also run short and long tests:

smartctl -t short /dev/sda
smartctl -t long /dev/sda
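Self-tests run inside the drive in the background, and smartctl prints an estimated completion time when it starts one. Once the test has finished, smartctl -l selftest /dev/sda shows the self-test log with the newest entry first. The sketch below uses hypothetical helper names and assumes that log layout (a "Num  Test_Description ..." header followed by one row per test); it checks whether the most recent test completed without error.

```shell
# Print the most recent entry from a self-test log, i.e. the first row after
# the "Num  Test_Description ..." header in `smartctl -l selftest` output.
latest_selftest() {
  awk 'hdr && NF { sub(/^[ \t]+/, ""); print; exit }
       /^Num[ \t]/ { hdr = 1 }'
}

# Succeed only if that most recent self-test completed without error.
selftest_ok() {
  latest_selftest | grep -q 'Completed without error'
}

# Usage (as root, after the test has finished):
#   smartctl -l selftest /dev/sda | selftest_ok && echo "last self-test passed"
```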

When you run these tests, they can help you identify issues that may not be visible in the raw SMART data. The smartd tool complements smartctl by running as a daemon and alerting you when something changes. Instead of manually checking disks, you can have your system notify you when a threshold is crossed, error counts start increasing, or the health status of a disk changes. For a home lab, this is a huge benefit. It turns disk monitoring into something proactive, instead of a reactive task that you have to remember to run.
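smartd reads its settings from smartd.conf (/etc/smartd.conf on Debian and Proxmox, /etc/smartmontools/smartd.conf on RHEL and Fedora). A minimal sketch of a monitoring line might look like the following, assuming you want every detected drive checked, a short self-test every night, a long self-test once a week, and temperature alerts. The schedule, temperature limits, and mail recipient are example values, and -m only works if the host can actually send mail.

```text
# smartd.conf: monitor every drive smartd can detect
#   -a                              health status, attributes, error and self-test logs,
#                                   pending and offline-uncorrectable sector counts
#   -s (S/../.././02|L/../../6/03)  short self-test daily at 02:00,
#                                   long self-test Saturdays at 03:00
#   -W 4,45,55                      report 4 C changes, log at 45 C, warn at 55 C
#   -m root                         email warnings to root
DEVICESCAN -a -s (S/../.././02|L/../../6/03) -W 4,45,55 -m root
```

If your distribution's default file already contains a DEVICESCAN line, edit that line rather than adding a second one. Then run smartd -q onecheck to confirm smartd can register your drives with the new settings, and restart the service (smartmontools on Debian/Proxmox, smartd on RHEL/Fedora).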

Using a GUI with GSmartControl

If you do not want to parse CLI output, GSmartControl is a great option. It is a graphical front end for smartctl, so it gives you the same data in an interface that makes it easy to see your disk health at a glance. You can view SMART attributes, run self-tests, see health summaries, and spot warnings that may already be showing up.

GSmartControl is a GUI front end for smartmontools

Below is an example of running one of the self-tests included in GSmartControl.

Running a proactive self-test with GSmartControl

And here is the result after a successful self-test:

Successful self-test using GSmartControl

Especially on Windows or Linux hosts with a desktop, this is a great tool that I often use when I want a quick overview without digging through command-line output.

Easy Windows checks with CrystalDiskInfo

If you are running Windows systems in your lab, or just want a quick check on a workstation, CrystalDiskInfo is a well-known free tool that is very easy to use on Windows. It gives you a simple health rating for each disk, along with temperature readings and SMART attributes, and it will alert you when something isn’t quite right.

Running CrystalDiskInfo to get the health of a drive on a Windows system

What I like about it is how quickly you know whether something is wrong. If a drive shows “Caution” or “Bad,” you know immediately that you need a game plan to replace the disk and make sure your backups are good. It is also useful for testing drives outside your main lab before putting them into production.

PassMark DiskCheckup

Another simple option is PassMark DiskCheckup. It is free to download and use, and it is a lightweight tool focused on giving you access to SMART monitoring data. It is not as feature-rich as some of the other tools covered here, but if you want quick checks and access to SMART data, it is an easy one to use.

PassMark DiskCheckup gives you quick access to SMART monitoring data

Validating disks with badblocks

SMART data tells you what the drive reports about itself. A tool called badblocks, on the other hand, actually tests the disk. This is important when you get a new drive, or when you suspect physical issues because a disk has been acting flaky. It also lets you stress the drive before putting it into a production environment.

You can run a basic read-only test with badblocks like this:

badblocks -sv /dev/sda
Badblocks command to identify bad blocks

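This scans the disk in read-only mode (-s shows progress, -v gives verbose output) and reports any blocks that may be bad. Keep in mind that the write-mode test (-w) overwrites every block and destroys all data on the disk, so use it with caution; badblocks also offers a slower non-destructive read-write mode (-n). This is one of the best tools for verifying that a drive is actually healthy, and not just reporting that it is.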
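If you want to burn in a new drive with the destructive write test before it goes into production, it is worth wrapping badblocks in a guard so a typo cannot wipe a disk that is in use. The sketch below uses hypothetical function names; its guard only checks /proc/mounts, so it will not notice swap, LVM physical volumes, ZFS pools, or Ceph OSDs, and you should still confirm the device name with lsblk first. A full -w run writes and verifies several patterns across the whole disk, so it can take a very long time on large hard drives.

```shell
# Hypothetical guard: succeed if the device or any of its partitions
# (sdb1, nvme1n1p2, ...) appears in the mounts file (default /proc/mounts).
device_in_use() {
  grep -Eq "^$1(p?[0-9]+)?[[:space:]]" "${2:-/proc/mounts}"
}

# Hypothetical burn-in wrapper. DESTROYS ALL DATA on the target disk.
#   -b 4096  use 4 KiB blocks (keeps the block count in range on large disks)
#   -w       destructive write-mode test (writes and verifies test patterns)
#   -s -v    show progress and report errors verbosely
burn_in() {
  if device_in_use "$1"; then
    echo "refusing: $1 or one of its partitions is mounted" >&2
    return 1
  fi
  badblocks -b 4096 -wsv "$1"
}

# Usage (as root, after double-checking the device name with lsblk):
#   burn_in /dev/sdX
```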

What to look for in SMART data

As a recap, SMART monitoring data is a great place to start when looking at the overall health of your disks. Here are the key pieces of information to look at:

| SMART attribute | What to check for | Why it matters |
|---|---|---|
| Power-on hours | Unexpectedly high hours on a “new” drive | Indicates the drive has been used and has a reduced remaining lifespan |
| Wear indicators | High percentage used or wear leveling count | Shows how much of the SSD’s endurance has been consumed |
| Reallocated sectors | Any non-zero value | Points to possible physical problems, with sectors being remapped |
| Uncorrectable errors | Any non-zero value | Data could not be recovered; serious risk of data loss |
| Total bytes written | Higher than expected for the age of the drive | Reveals actual usage and helps estimate remaining lifespan |
| Temperature | Consistently high temps (especially under load) | Heat speeds up wear and shortens overall drive life |
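To turn the table above into a quick automated check for ATA/SATA drives, the sketch below reads smartctl -A output and prints a warning for non-zero reallocated, pending, or uncorrectable counts, for power-on hours above a threshold, and for high temperatures. It assumes the usual attribute IDs (5, 9, 187, 194, 197, 198), which most but not all vendors follow. Wear indicators are left out because they differ by vendor (for example 177 Wear_Leveling_Count on Samsung, 233 Media_Wearout_Indicator on Intel), and NVMe drives report these values in a different format. The function name and the MAX_HOURS and MAX_TEMP defaults are illustrative, not recommendations.

```shell
# Hypothetical helper: flag warning signs in `smartctl -A` output for
# ATA/SATA drives. Attribute IDs follow common conventions but vary by vendor:
#   5 Reallocated_Sector_Ct     187 Reported_Uncorrect
#   197 Current_Pending_Sector  198 Offline_Uncorrectable
#   9 Power_On_Hours            194 Temperature_Celsius
# MAX_HOURS and MAX_TEMP are illustrative defaults; tune them for your drives.
smart_flags() {
  awk -v max_hours="${MAX_HOURS:-100}" -v max_temp="${MAX_TEMP:-50}" '
    # Attribute rows start with a numeric ID; RAW_VALUE starts at field 10.
    $1 ~ /^[0-9]+$/ && NF >= 10 {
      id = $1 + 0
      raw = $10 + 0   # leading number only, e.g. "36 (Min/Max 20/45)" -> 36
      if ((id == 5 || id == 187 || id == 197 || id == 198) && raw > 0)
        printf "WARN %s=%d (should be 0)\n", $2, raw
      else if (id == 9 && raw > max_hours)
        printf "WARN %s=%d (above %d hours)\n", $2, raw, max_hours
      else if (id == 194 && raw > max_temp)
        printf "WARN %s=%d (above %d C)\n", $2, raw, max_temp
    }
  '
}

# Usage (as root): smartctl -A /dev/sda | smart_flags
```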

Wrapping up

Disk health problems and silent failures are among the most common causes of issues in a home lab, and they can easily be overlooked. My recent experience with “new” SSDs is also a reminder that you cannot assume anything about your hardware, especially if you buy it secondhand. The good news is that a whole range of free tools gives you a simple way to check overall drive health and get the whole story on whether a drive has been used before or has potential issues. How about you? What tools do you use to stay on top of drive health in the home lab?


Google is updating how articles are shown. Don’t miss our leading home lab and tech content, written by humans, by setting Virtualization Howto as a preferred source.

About The Author

Brandon Lee

Brandon Lee is the Senior Writer, Engineer and owner at Virtualizationhowto.com, and a 7-time VMware vExpert, with over two decades of experience in Information Technology. Having worked for numerous Fortune 500 companies as well as in various industries, he has extensive experience in various IT segments and is a strong advocate for open source technologies. Brandon holds many industry certifications, loves the outdoors and spending time with family. Also, he goes through the effort of testing and troubleshooting issues, so you don't have to.
