raid-check on RHEL/CentOS/Fedora

While attending a gig at a club, I got a message from one of our database servers, stating:

/etc/cron.weekly/99-raid-check:
WARNING: mismatch_cnt is not 0 on /dev/md0

A quick Google search on my phone showed it’s not a “This system will die in a minute” issue, but something fairly trivial.
Since I was tired anyway (4:25 AM), I decided to go home and look into it.

Basically, since RHEL and CentOS 5.4 (and, I think, Fedora 12) there’s a new script with a weekly cronjob.
The cronjob is called
‘/etc/cron.weekly/99-raid-check’.

The very first thing that script does is check whether /etc/sysconfig/raid-check exists; if it does, the script reads it in, and if not, it exits.
It then checks whether the variable ‘ENABLED’ in raid-check is set to ‘yes’, and exits otherwise.
So check that variable and set it to ‘yes’ if it isn’t.
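
To see quickly whether the cronjob would run at all on a given box, the two checks described above can be sketched as a small shell function. This is only an approximation of what the real script does, not a copy of it; the function name raid_check_enabled is made up for this example.

```shell
#!/bin/bash
# Sketch only: mimics the two guard checks described above.
# raid_check_enabled is a hypothetical helper, not part of the distro script.
raid_check_enabled() {
    local conf="${1:-/etc/sysconfig/raid-check}"

    # First guard: the config file has to exist.
    [ -f "$conf" ] || return 1

    # Second guard: ENABLED has to be "yes".
    # Source the file in a subshell so its variables don't leak into ours.
    ( . "$conf"; [ "$ENABLED" = "yes" ] )
}

# Usage:
#   raid_check_enabled && echo "raid-check will run" || echo "raid-check is disabled"
```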

In the same config file, raid-check, you can set which devices should be checked and which should also be repaired.

By default those are empty, so it’s a good idea to enter your devices (md0, md1, etc.) into CHECK_DEVS and REPAIR_DEVS.

The default nonetheless seems to check the devices: I got a message even though no devices had been entered into that variable, but no repair was done.
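
For illustration, the relevant part of the config could look roughly like this. The device names are the ones from this post. The space-separated list format is my assumption, so compare it with the comments in your own copy of the file before relying on it.

```shell
# /etc/sysconfig/raid-check (excerpt, illustrative values only)
ENABLED=yes
# Assumption: device lists are space-separated; verify against the
# comments in the file shipped on your system.
CHECK_DEVS="md0 md1"
REPAIR_DEVS="md0 md1"
```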

If you get a mail stating

/etc/cron.weekly/99-raid-check:
WARNING: mismatch_cnt is not 0 on /dev/md0

while no devices have been entered into the REPAIR_DEVS variable, log into the complaining box and run the following:

[root@dbbox ~]# cat /sys/block/md0/md/mismatch_cnt
256
[root@dbbox ~]# echo repair >> /sys/block/md0/md/sync_action
[root@dbbox ~]# echo check >> /sys/block/md0/md/sync_action
[root@dbbox ~]# cat /sys/block/md0/md/mismatch_cnt
0
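
One thing the session above doesn’t show: a repair or check pass takes a while on a bigger array, and sync_action reads ‘idle’ again only once the pass has finished. A minimal sketch that waits between the steps could look like this. The function names and the MD_SYSFS_ROOT and MD_POLL_SECONDS variables are made up for this example (the former only exists so the functions can be tried against a scratch directory); run it as root against the real /sys.

```shell
#!/bin/bash
# Sketch only: repair an md array, wait, re-check, then print mismatch_cnt.
# MD_SYSFS_ROOT and MD_POLL_SECONDS are hypothetical knobs for this example.

# Block until the array reports "idle" in sync_action.
md_wait_idle() {
    local md_dir="${MD_SYSFS_ROOT:-/sys/block}/$1/md"
    [ -r "${md_dir}/sync_action" ] || return 1
    while [ "$(cat "${md_dir}/sync_action")" != "idle" ]; do
        sleep "${MD_POLL_SECONDS:-10}"
    done
}

# Same steps as the manual session, with a wait after each action.
md_repair_and_recheck() {
    local dev="$1"
    local md_dir="${MD_SYSFS_ROOT:-/sys/block}/${dev}/md"

    echo repair > "${md_dir}/sync_action" || return 1
    md_wait_idle "$dev" || return 1

    echo check > "${md_dir}/sync_action" || return 1
    md_wait_idle "$dev" || return 1

    cat "${md_dir}/mismatch_cnt"
}

# Usage (as root):
#   md_repair_and_recheck md0
```

Progress of a running pass also shows up in /proc/mdstat, if you’d rather watch it than poll.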

Problem, as much as it’s not really one, solved.
Now enter the devices you want repaired on an error like the one above into REPAIR_DEVS="" in /etc/sysconfig/raid-check, and next time this occurs it will be handled automatically.