[Lvlug] Detecting new bad blocks on a disk while a system
jmcnam at speakeasy.net
Wed Dec 20 13:05:26 EST 2006
> The question is something like this: if a Linux system is running, and some
> part of the disk develops an error, what happens? I mean is there anything
> that a Linux system does while running that might detect a (new) bad area on
> disk and then either report that to the user, or add it to the bad blocks
> list and then readjust the layout of the files on the disk to avoid that new
> bad area?
I think, to start, you might want to look at smartd. You could at least be 'warned' that some sectors/blocks might be going bad and make a physical disk timing/replacement decision based on that output.
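As a rough illustration of what 'being warned' could look like, here is a minimal Python sketch that reads the attribute table printed by `smartctl -A` (smartctl ships with smartd in smartmontools) and flags non-zero sector counters. The function names and the list of watched attributes are my own assumptions, not anything smartd itself defines, and attribute names can vary by drive vendor, so treat it as a starting point only.

```python
import subprocess

# Assumed set of attributes worth watching; names differ between vendors.
WATCHED = ("Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable")


def parse_smart_attributes(text):
    """Map attribute name -> integer raw value from `smartctl -A` style text."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns;
        # the 10th column is the start of RAW_VALUE.
        if len(fields) >= 10 and fields[0].isdigit():
            try:
                attrs[fields[1]] = int(fields[9])
            except ValueError:
                continue  # skip raw values that are not plain integers
    return attrs


def suspicious(attrs, watched=WATCHED):
    """Return the watched attributes whose raw value is above zero."""
    return {name: attrs[name] for name in watched if attrs.get(name, 0) > 0}


def read_smart_table(device):
    """Run `smartctl -A` on a device (usually needs root) and return its output."""
    # smartctl's exit status is a bit mask, so a non-zero status is not
    # treated as a fatal error here.
    result = subprocess.run(["smartctl", "-A", device],
                            capture_output=True, text=True)
    return result.stdout
```

Something like `suspicious(parse_smart_attributes(read_smart_table("/dev/sda")))` would then return an empty dict on a drive with no reallocated or pending sectors (the device name is just an example). smartd does this kind of polling for you and can send mail, so a script like this is only useful if you want the numbers in your own tooling.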
On-the-fly 'badblocks' marking/correction on a 'live' file system on a single drive is something I've not heard about, at least for ext2 and ext3. Maybe some of the more advanced/exotic FSs such as XFS or JFS can do this 'live preemptive auto-correcting' (is that what you're thinking?). Unless you have 'another copy of the bits' hanging around, as in a basic RAID-1 setup (mirror), I'm not sure how you'd get back bits lost to a truly damaged or bad sector that cropped up very quickly on a single drive.
Interesting topic, though, and it seems such 'self-healing' system concepts are currently what's driving much of the SAN and Virtualization markets in the enterprise (e.g. entire virtual servers which just reboot/replicate themselves when they detect trouble, or 'heartbeat' arrangements of computers which keep services going mostly transparently to the users). All cool stuff, and much of it done cost-effectively with FLOSS.
Still, at some point, even if the FS could do such a thing, you would have to monitor it anyway, or at least get some regular indication of what it was doing. What if the drive got progressively and/or rapidly worse over time and the FS just kept trying to move bits around until there were no more 'good' blocks left? That could get nasty... ;)
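To make the 'getting progressively worse' worry concrete, here is a small Python sketch of a trend check over bad-sector counts collected at regular intervals (for example a reallocated-sector raw value logged once a day). The function names and the `min_growth` threshold are hypothetical choices of mine, and a real policy would probably want something smarter than first-versus-last.

```python
def growth_per_interval(samples):
    """Differences between consecutive counts, oldest sample first."""
    return [later - earlier for earlier, later in zip(samples, samples[1:])]


def is_worsening(samples, min_growth=1):
    """True if the count grew by at least min_growth from first to last sample."""
    if len(samples) < 2:
        return False  # one reading says nothing about a trend
    return samples[-1] - samples[0] >= min_growth
```

A steady count, even a non-zero one, is usually less alarming than one that keeps climbing, which is why the check looks at growth and not at the absolute number.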
A few links on the topic:
Here's a link about Sun's 'ZFS'... (they claim it is 'self-healing')