[Lvlug] Detecting new bad blocks on a disk while a system is running?

jmm jmcnam at speakeasy.net
Wed Dec 20 13:05:26 EST 2006


> 
> The question is something like this: if a Linux system is running, and some 
> part of the disk develops an error, what happens?  I mean is there anything 
> that a Linux system does while running that might detect a (new) bad area on 
> disk and then either report that to the user, or add it to the bad blocks 
> list and then readjust the layout of the files on the disk to avoid that new 
> bad area?
> 

I think, to start, you might want to look at smartd. You could at least be 'warned' that some sectors/blocks might be going bad, and base your decisions about when to replace the physical disk on that output.
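
As a rough illustration of what to watch for, here's a minimal Python sketch that picks the sector-related attributes out of the table printed by `smartctl -A`. It assumes the usual ten-column layout (ID, name, flag, value, worst, threshold, type, updated, when_failed, raw value). The sample text is made up for illustration, and which attributes a given drive reports can vary by vendor. Reallocated_Sector_Ct counts sectors the drive has already remapped to spares; Current_Pending_Sector counts sectors waiting to be remapped.

```python
# Minimal sketch: flag SMART attributes that usually point at failing sectors.
# Assumes the attribute-table layout printed by `smartctl -A /dev/sdX`.
# SAMPLE is illustrative text, not output captured from a real drive.

WATCHED = {"Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable"}

SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       8
  9 Power_On_Hours          0x0032   090   090   000    Old_age   Always       -       9120
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       2
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
"""


def parse_attributes(text):
    """Return {attribute_name: raw_value} for each data row of the table."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # Data rows start with a numeric attribute ID and have >= 10 columns.
        if len(fields) >= 10 and fields[0].isdigit():
            try:
                # Column 10 is the raw value; skip rows where it isn't a plain integer.
                attrs[fields[1]] = int(fields[9])
            except ValueError:
                continue
    return attrs


def bad_sector_warnings(text):
    """Return only the watched attributes whose raw value is non-zero."""
    return {name: value for name, value in parse_attributes(text).items()
            if name in WATCHED and value > 0}


print(bad_sector_warnings(SAMPLE))
# {'Reallocated_Sector_Ct': 8, 'Current_Pending_Sector': 2}
```

Against a real drive you'd feed it something like `subprocess.run(["smartctl", "-A", "/dev/sda"], capture_output=True, text=True).stdout` (usually as root), though letting smartd do the checking and mail you is the easier route.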

On-the-fly 'badblocks' marking/correction on a 'live' file system on a single drive is something I haven't heard of, at least for ext2 and ext3. Maybe some of the more advanced/exotic file systems such as XFS or JFS can do this kind of 'live preemptive auto-correcting' (is that what you're thinking of?). Unless you have 'another copy of the bits' hanging around, as in a basic RAID-1 (mirror) setup, I'm not sure how you'd recover bits lost to a truly damaged or bad sector that cropped up suddenly on a single drive.
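
To make the 'another copy of the bits' point concrete, here's a toy Python model of a two-way mirror. The class and its names are hypothetical, not md or any real RAID interface. Each block is written to two in-memory 'disks' along with a CRC32 checksum; on read, a copy that fails the checksum is rewritten from the good one. The checksum part is closer to what Sun describes for ZFS; a plain RAID-1 generally relies on the drive reporting a read error instead. With only one copy there's nothing to repair from, which is exactly the single-drive problem.

```python
import zlib


class MirroredBlocks:
    """Toy two-way mirror (hypothetical, for illustration only).

    Each block lives on two 'disks' (plain lists) plus a CRC32 checksum.
    A read that finds a corrupt or missing copy returns the intact copy
    and rewrites the damaged one from it.
    """

    def __init__(self, nblocks):
        self.disks = [[None] * nblocks, [None] * nblocks]
        self.checksums = [None] * nblocks
        self.repairs = 0

    def write(self, idx, data: bytes):
        self.checksums[idx] = zlib.crc32(data)
        for disk in self.disks:
            disk[idx] = data

    def _intact(self, idx, data):
        return data is not None and zlib.crc32(data) == self.checksums[idx]

    def read(self, idx):
        copies = [disk[idx] for disk in self.disks]
        good = next((c for c in copies if self._intact(idx, c)), None)
        if good is None:
            # Both copies damaged: nothing left to heal from.
            raise OSError(f"block {idx}: no intact copy left")
        for disk, copy in zip(self.disks, copies):
            if not self._intact(idx, copy):
                disk[idx] = good  # rewrite the damaged copy from the good one
                self.repairs += 1
        return good


m = MirroredBlocks(4)
m.write(0, b"hello")
m.disks[0][0] = b"hellp"      # simulate a silently damaged sector on disk 0
print(m.read(0), m.repairs)   # b'hello' 1
```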

Interesting topic, though, and it seems such 'self-healing' system concepts are currently what's driving much of the SAN and virtualization markets in the enterprise (e.g. entire virtual servers that reboot/replicate themselves when they detect trouble, or 'heartbeat' arrangements of computers that keep services going mostly transparently to the users). All cool stuff, and much of it done cost-effectively with FLOSS.

Still, even if the FS could do such a thing, at some point you would have to monitor it anyway, or get some regular indication of what it was doing. What if the drive got progressively and/or rapidly worse over time and the FS just kept moving bits around until there were no more 'good' blocks left? That could get nasty... ;)
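
To illustrate why you'd still want to keep an eye on it, here's a toy Python model of a layer that remaps bad blocks into a fixed pool of spares. It's hypothetical, not how ext3, XFS or JFS actually work. Remapping only hides bad blocks until the spares run out, so the sketch records a warning once a chosen fraction of the spares (the assumed `warn_fraction` parameter) is used, and raises an error when none are left.

```python
class RemappingDisk:
    """Hypothetical block layer that remaps bad blocks into a spare pool."""

    def __init__(self, spare_blocks, warn_fraction=0.5):
        self.free_spares = list(spare_blocks)
        self.total_spares = len(self.free_spares)
        self.remap = {}              # bad block number -> spare block number
        self.warn_fraction = warn_fraction
        self.warnings = []

    def mark_bad(self, block):
        """Remap a bad block to the next free spare and return the spare."""
        if block in self.remap:
            return self.remap[block]
        if not self.free_spares:
            # The 'no more good blocks left' case: remapping can't hide it now.
            raise OSError(f"block {block} is bad and no spare blocks remain")
        spare = self.free_spares.pop(0)
        self.remap[block] = spare
        used = self.total_spares - len(self.free_spares)
        if used / self.total_spares >= self.warn_fraction:
            self.warnings.append(f"{used}/{self.total_spares} spares used")
        return spare

    def resolve(self, block):
        """Translate a logical block number to where it is actually stored."""
        return self.remap.get(block, block)


d = RemappingDisk(spare_blocks=[1000, 1001, 1002, 1003])
for bad in (7, 9, 12):
    d.mark_bad(bad)
print(d.resolve(7), d.resolve(8))  # 1000 8
print(d.warnings)                  # ['2/4 spares used', '3/4 spares used']
```

The point of the `warnings` list is the one made above: something has to report how fast the spares are being consumed, or the remapping just papers over a dying drive.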

A few links on the topic:
http://www.linuxjournal.com/article/6983
http://smartlinux.sourceforge.net/
http://www.cyberciti.biz/tips/monitoring-hard-disk-health-with-smartd-under-linux-or-unix-operating-systems.html

Here's a link about Sun's 'ZFS' (they claim it's 'self-healing'):
http://www.sun.com/2004-0914/feature/
