Provisioning a (new) disk
SMART
see also ./smart.md
Check SMART attributes for failures:
smartctl -a /dev/sda | grep -iE '(error|uncorrect|pending|recovered|fail)' | grep -v '0$'
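The grep above is a quick heuristic: it drops every line ending in 0, so a raw value such as 10 is hidden as well. A slightly stricter check can be sketched as follows, assuming the ATA attribute table printed by `smartctl -A` (raw value in column 10). The helper name `smart_nonzero` and the attribute list are illustrative assumptions, and NVMe drives use a different output format.

```shell
# Hypothetical helper: reads `smartctl -A` output on stdin and prints
# NAME=RAW for a few failure-related attributes whose raw value is non-zero.
# Assumes the ATA attribute table layout (attribute name in column 2,
# raw value in column 10).
smart_nonzero() {
  awk '$2 ~ /^(Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable|Reported_Uncorrect)$/ && $10 != 0 { print $2 "=" $10 }'
}
```

Usage: `smartctl -A /dev/sda | smart_nonzero`. Empty output means none of the listed attributes has a non-zero raw value.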
Run SMART self-tests (each test runs in the background; check the log once it has finished):
smartctl -t short /dev/sda
smartctl -l selftest /dev/sda
smartctl -t long /dev/sda
smartctl -l selftest /dev/sda
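To check the outcome of the most recent self-test without reading the log by eye, something like the following could work. It is a minimal sketch that assumes the ATA self-test log layout, where the newest entry is numbered `# 1` and a passing test reads "Completed without error". The function name `selftest_latest_ok` is made up for illustration.

```shell
# Hypothetical helper: reads `smartctl -l selftest` output on stdin and
# succeeds only if the newest entry ("# 1") completed without error.
# Assumes the ATA self-test log format; NVMe logs look different.
selftest_latest_ok() {
  awk '/^# *1 / { found = 1; ok = ($0 ~ /Completed without error/) }
       END { exit !(found && ok) }'
}
```

Usage: `smartctl -l selftest /dev/sda | selftest_latest_ok && echo "latest self-test passed"`.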
Badblocks
If the disk can be wiped (the -w write test destroys all data on it):
badblocks -svw -b 4096 -c 65536 -p 1 /dev/sdX
Afterwards, check SMART log again
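Because the write test is destructive, a small guard before running badblocks may be worthwhile. The sketch below only checks whether the device or one of its partitions appears in mounts-format input such as /proc/mounts. It does not detect use by LVM, md RAID, or swap, and its prefix match also catches longer names such as /dev/sdaa. The name `device_in_use` is a hypothetical helper.

```shell
# Hypothetical helper: succeeds if any mount source on stdin starts with
# the given device path (e.g. /dev/sda matches /dev/sda1).
# Input is expected in /proc/mounts format (device in column 1).
device_in_use() {
  awk -v dev="$1" 'index($1, dev) == 1 { found = 1 } END { exit !found }'
}
```

Usage: `device_in_use /dev/sdX < /proc/mounts && echo "refusing: /dev/sdX is mounted" >&2`.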
NVMe / SSD
Do a secure erase, see ./ssd.md
Monitoring
- Be sure prometheus node-exporter exports smartmon_* or nvme_* metrics:
curl -s localhost:9100/metrics | grep -iE '^(nvme|smartmon)'
- Make sure these metrics get scraped; see metrics from smartmon_device_info, nvme_critical_warning_total, node_nvme_info
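The exporter check can be wrapped so that it is usable from a provisioning script. This is a minimal sketch using the same pattern as the curl command in these notes; the function name `count_disk_metrics` is an assumption. With that pattern, series prefixed `node_` (such as node_nvme_info) are not counted.

```shell
# Hypothetical helper: reads Prometheus text-format metrics on stdin and
# prints how many sample lines start with nvme or smartmon.
# "# HELP" / "# TYPE" comment lines start with "#" and are not counted.
count_disk_metrics() {
  grep -ciE '^(nvme|smartmon)' || true   # grep -c exits 1 on zero matches
}
```

Usage: `curl -s localhost:9100/metrics | count_disk_metrics`. A result of 0 suggests that neither the smartmon nor the nvme collector output is being exported.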