Inproceedings

To Fail Or Not To Fail: Predicting Hard Disk Drive Failure Time Windows

Proceedings of the 20th International GI/ITG Conference on Measurement, Modelling and Evaluation of Computing Systems, pages 19-36. Springer, Cham (March 2020)
DOI: 10.1007/978-3-030-43024-5_2

Abstract

Due to the increasing size of today's data centers and the expectation of 24/7 availability, the complexity of hardware administration continuously increases. Techniques such as the Self-Monitoring, Analysis, and Reporting Technology (S.M.A.R.T.) support hardware monitoring. However, these techniques often lack algorithms for intelligent data analytics. In particular, integrating machine learning to identify potential failures in advance seems promising for reducing administration overhead. In this work, we present three machine learning approaches to (i) identify imminent failures, (ii) predict time windows for failures, and (iii) predict the exact time-to-failure. In a case study with real data from 369 hard disks, we achieve an F1-score of up to 98.0% and 97.6% for predicting potential failures with two or multiple time windows, respectively, and a hit rate of 84.9% (with a mean absolute error of 4.5 hours) for predicting the time-to-failure.
