What if a hard drive could tell you it was going to fail before it actually did? Is that possible? Each day Backblaze records the SMART stats that are reported by the 67,814 hard drives we have spinning in our Sacramento data center. SMART stands for Self-Monitoring, Analysis, and Reporting Technology and is a monitoring system included in hard drives that reports on various attributes of the state of a given drive.

While we’ve looked at SMART stats before, this time we’ll dig into the SMART stats we use in determining drive failure and we’ll also look at a few other stats we find interesting.

We use Smartmontools to capture the SMART data. This is done once a day for each hard drive. We add in a few elements, such as drive model and serial number, and create a row in the daily log for each drive. You can download these log files from our website. Drives which have failed are marked as such and their data is no longer logged. Sometimes a drive will be removed from service even though it has not failed, like when we upgrade a Storage Pod by replacing 1TB drives with 4TB drives. In this case, the 1TB drive is not marked as a failure, but the SMART data will no longer be logged.

SMART Stats We Use to Predict Hard Drive Failure

For the last few years we’ve used the following five SMART stats as a means of helping determine if a drive is going to fail.

[Table: the five SMART stats we track]

While no single SMART stat is found in all failed hard drives, here’s what happens when we consider all five SMART stats as a group.

Operational drives with one or more of our five SMART stats greater than zero: 4.2%.

Failed drives with one or more of our five SMART stats greater than zero: 76.7%.

That means that 23.3% of failed drives showed no warning from the SMART stats we record. Are these stats useful? I’ll let you decide if you’d like to have a sign of impending drive failure 76.7% of the time.

Having a given drive stat with a value that is greater than zero may mean nothing at the moment. For example, a drive may have a SMART 5 raw value of two, meaning two drive sectors have been remapped. On its own, such a value means little until combined with other factors. The reality is it can take a fair amount of intelligence (both human and artificial) during the evaluation process to reach the conclusion that an operational drive is going to fail.

One thing that helps is when we observe multiple SMART errors. The following chart shows how often one, two, three, four, or all five of the SMART stats we track have a raw value that is greater than zero.

[Chart: incidence of one to five tracked SMART stats with a raw value greater than zero]

To clarify, a value of one means that of the five SMART stats we track, only one has a value greater than zero, while a value of five means that all five SMART stats we track have a value greater than zero.

But, before we decide that multiple errors help, let’s take a look at the correlation between these SMART stats as seen in the chart below.

[Chart: correlation between the tracked SMART stats]

In most instances, the stats have little correlation and can be considered independent. Only SMART 197 and 198 have a good correlation, meaning we could consider them as one indicator versus two. Why do we continue to collect both SMART 197 and SMART 198? Two reasons: 1) the correlation isn’t perfect, so there’s room for error, and 2) not all drive manufacturers report both attributes.

How does understanding the correlation, or lack thereof, of these SMART stats help us? Let’s say a drive reported a SMART 5 raw value of 10 and a SMART 197 raw value of 20. From that we could conclude the drive is deteriorating and should be scheduled for replacement. Whereas, if the same drive had a SMART 197 raw value of 5 and a SMART 198 raw value of 20 and no other errors, we might hold off on replacing the drive while awaiting more data, such as how frequently the errors occur.
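
To make the counting concrete, here is a minimal Python sketch of how one might tally, for each row of a daily log, how many of the tracked SMART stats have a raw value greater than zero, and then collapse SMART 197 and 198 into a single indicator because they correlate well. It is an illustration under stated assumptions, not the actual evaluation process: the column names (`serial_number`, `smart_<id>_raw`), the sample rows, and the helper names are hypothetical, and since the text only names SMART 5, 197, and 198, the other two attribute IDs in the tuple (187 and 188) are assumed.

```python
import csv
import io

# The five tracked attributes. Only 5, 197 and 198 are named in the text;
# 187 and 188 are assumed here for illustration.
TRACKED = (5, 187, 188, 197, 198)

# Attributes described as correlating well enough to count as one indicator.
CORRELATED_GROUPS = [{197, 198}]


def nonzero_stats(row, tracked=TRACKED):
    """Return the set of tracked attribute IDs whose raw value is > 0."""
    hits = set()
    for attr in tracked:
        raw = row.get(f"smart_{attr}_raw", "")
        # Not every manufacturer reports every attribute, so a missing
        # or empty value is treated the same as zero.
        if raw not in ("", None) and int(raw) > 0:
            hits.add(attr)
    return hits


def independent_indicators(hits, groups=CORRELATED_GROUPS):
    """Count indicators, collapsing each correlated group into one."""
    count = len(hits)
    for group in groups:
        overlap = len(hits & group)
        if overlap > 1:
            count -= overlap - 1
    return count


def summarize(csv_text):
    """Map serial number -> (nonzero stat count, independent indicator count)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    summary = {}
    for row in reader:
        hits = nonzero_stats(row)
        summary[row["serial_number"]] = (len(hits), independent_indicators(hits))
    return summary


# Hypothetical daily-log rows mirroring the two scenarios in the text,
# plus a drive that reports no errors (and leaves one attribute blank).
SAMPLE_LOG = """serial_number,smart_5_raw,smart_187_raw,smart_188_raw,smart_197_raw,smart_198_raw
DRIVE_A,10,0,0,20,0
DRIVE_B,0,0,0,5,20
DRIVE_C,0,0,,0,0
"""

if __name__ == "__main__":
    for serial, (nonzero, indicators) in summarize(SAMPLE_LOG).items():
        print(serial, nonzero, indicators)
```

In this sketch, the drive with SMART 5 and SMART 197 both above zero counts as two independent indicators, while the drive with only SMART 197 and SMART 198 above zero counts as one, which matches the reasoning for scheduling the first for replacement and waiting for more data on the second.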