Solid-state drives (SSDs) are massively deployed in various fields, especially in data centers, for their excellent cost-effectiveness. However, SSDs may fail due to their imperfect manufacturing processes, resulting in system-level failures and even downtime in data centers. This makes SSD failure prediction critical.
[...]
In this work, we study the failure characteristics of over 200,000 drives from industry data centers over a 4-year period, as well as daily data.
[...]
First, to cope with the differences between failures, a diff-state method is proposed for differential machine learning modeling of SSDs in different “States”. We define the “State” of an SSD, which represents the range of values in which the SSD currently lies in terms of some key attributes. Through flash reliability characteristics, we distinguish between different failures before training the model to obtain accurate predictions of different failure behaviors.
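To make the "State" idea concrete, here is a minimal sketch of how I read it: bucket each drive by the range its key attribute falls into, then fit a separate model per bucket. The abstract does not name the attributes or the ranges, so `wear_pct`, the `WEAR_EDGES` cut points, and the `failure_rate` stand-in model are all assumptions for illustration, not the paper's method.

```python
from bisect import bisect_right
from collections import defaultdict

# Hypothetical bin edges for one key attribute (percent of rated wear used).
# The real attributes and ranges are not given in the abstract.
WEAR_EDGES = [20.0, 60.0]  # state 0: <20, state 1: 20 to <60, state 2: >=60


def state_of(record):
    """Return the index of the value range the record's attribute falls in."""
    return bisect_right(WEAR_EDGES, record["wear_pct"])


def split_by_state(records):
    """Group records by state so each group can be modeled separately."""
    groups = defaultdict(list)
    for r in records:
        groups[state_of(r)].append(r)
    return dict(groups)


def train_per_state(records, fit):
    """Fit one model per state; `fit` is any callable taking a list of records."""
    return {s: fit(rs) for s, rs in split_by_state(records).items()}


def failure_rate(rs):
    """Stand-in 'model': the observed failure fraction within one state."""
    return sum(r["failed"] for r in rs) / len(rs)


# Toy records, invented purely to exercise the code.
records = [
    {"wear_pct": 5.0, "failed": 0},
    {"wear_pct": 15.0, "failed": 0},
    {"wear_pct": 45.0, "failed": 1},
    {"wear_pct": 50.0, "failed": 0},
    {"wear_pct": 85.0, "failed": 1},
]
models = train_per_state(records, failure_rate)
```

In a real pipeline `fit` would be an actual classifier trained on each state's records, and a drive would be scored by the model for whatever state it is currently in.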
[...]
The evaluation results on the real dataset show that the predictive ability of Prophet improves markedly, achieving high recall and a low false-positive rate while providing sufficient response time for the processing of failed SSDs.
MeteorMarc · 25m ago
Unfortunately, they made the abstract into a table of contents and left out the results. As a teaser, it works OK.