[Lvlug] Detecting new bad blocks on a disk while a system is running?

Randy Kramer rhkramer at gmail.com
Sat Dec 16 13:31:35 EST 2006


I've had a question floating around in the back of my head for a 
while.  I'll try to ask it; if nothing else, that may clarify my thinking.  (I 
have reasons for asking, and I'll mention those below.)

The question is something like this: if a Linux system is running, and some 
part of the disk develops an error, what happens?  I mean, is there anything 
that a Linux system does while running that might detect a (new) bad area on 
disk and then either report that to the user, or add it to the bad blocks 
list and then readjust the layout of the files on the disk to avoid that new 
bad area?

The reason I ask is that a few times in the last 12 months, I've had my system 
start to act funny in some way.  (ATM, I can't recall those occurrences well 
enough to describe them, but I'm sure I wrote to this list about at least one, 
describing some odd symptoms and seeking help in troubleshooting them.)

In the two cases I know occurred (there might have been a third), after some 
time either the system actually crashed or I intentionally or accidentally 
rebooted, and I then found that the system would not boot up.  On further 
investigation, I typically found that one of my partitions was now bad (can't 
remember exactly how I determined that), and I solved the problem by creating 
a new partition out of unused space on the disk, then reinstalling as 
necessary.

Looking back, if Linux does nothing to look for bad blocks in a running system 
(or if it does something but I've missed the error reports--maybe I need to 
start looking at those ;-), that might explain these occurrences.

I was then going to ask how to prevent such occurrences.  If Linux does look 
for these occurrences and reports them in one of the logs, I need to start 
watching that log at least occasionally.  

But if not, what kind of "maintenance" can be done on a continually running 
system to detect such occurrences?

And, if I find out that something like that has occurred, what should I do at 
that point?   (Create a new partition and move the files from the old to the 
new partition?)

Does an fsck deal with this kind of occurrence?  Should I run fsck at regular 
intervals (maybe most easily by rebooting)?

Randy Kramer

