[Lvlug] Re: Detecting new bad blocks on a disk while a system is running?
Keith Erekson
kbe2 at Lehigh.EDU
Wed Dec 20 12:03:33 EST 2006
The tool you're looking for is smartd, the monitoring daemon from
smartmontools (which also includes the smartctl command-line tool). In
Debian, it's packaged as smartmontools.
You can configure it to check the SMART status of your drives on a regular
basis, which generally catches these things. You can also have it run
longer self-tests periodically (weekly?), which should find these sorts of
problems before they get really bad.
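
If you want to see what such a check involves, here is a rough Python
sketch (not anything shipped with smartmontools) that reads the attribute
table printed by "smartctl -A" and flags nonzero sector-related counters.
The function names, the list of watched attributes, and the /dev/sda
default are my own assumptions for illustration; attribute names vary by
drive vendor, and smartd already does this kind of monitoring for you.

```python
import subprocess

# Attribute names that commonly point at failing sectors, as printed by
# `smartctl -A` on many ATA drives. This list is an assumption; adjust it
# for your drive.
WATCHED = ("Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable")


def parse_smart_attributes(text):
    """Return {attribute_name: raw_value} parsed from `smartctl -A` output."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns;
        # the raw value is the 10th column.
        if len(fields) >= 10 and fields[0].isdigit():
            raw = fields[9]
            if raw.isdigit():
                attrs[fields[1]] = int(raw)
    return attrs


def suspicious(attrs):
    """Return the watched attributes whose raw count is above zero."""
    return {name: attrs[name] for name in WATCHED if attrs.get(name, 0) > 0}


def check_device(device="/dev/sda"):
    """Run `smartctl -A` on a device (hypothetical default) and report trouble."""
    # smartctl usually needs root. Its exit status is a bitmask of conditions,
    # so a nonzero status is not treated as an error here.
    result = subprocess.run(["smartctl", "-A", device], capture_output=True, text=True)
    return suspicious(parse_smart_attributes(result.stdout))
```

Calling check_device("/dev/sda") as root from a cron job and mailing
yourself whenever it returns a non-empty dict would be one crude way to
use it; a rising reallocated or pending sector count is usually the cue
to back up and plan on replacing the drive.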
http://sourceforge.net/projects/smartmontools
-Keith
PS: Sorry for the weird spacing below; I couldn't figure out how to make
Thunderbird reply to the digest intelligently.
I've had a question sort of floating around in the back of my head for a
while. I'll try to ask it; if nothing else, that may clarify my thinking. (I
have reasons to ask the question, I'll probably mention those.)
The question is something like this: if a Linux system is running, and some
part of the disk develops an error, what happens? I mean is there anything
that a Linux system does while running that might detect a (new) bad area on
disk and then either report that to the user, or add it to the bad blocks
list and then readjust the layout of the files on the disk to avoid that new
bad area?
The reason I ask is that a few times in the last 12 months, I've had my system
start to act funny in some way. (ATM, I can't recall those occurrences well
enough to describe them--I'm sure I wrote about at least one to this list,
describing some sort of (odd) symptoms and seeking help in troubleshooting
them.)
In the two cases I know occurred (there might have been a third), after some
time either the system actually crashed or I intentionally or accidentally
rebooted, and then found that the system would not boot up. On further
investigation, I typically found that one of my partitions was now bad (can't
remember exactly how I determined that), and I solved the problem by creating
a new partition out of unused space on the disk, then reinstalling as
necessary.
Looking back, if Linux does nothing to look for bad blocks in a running system
(or if it does something but I've missed the error reports--maybe I need to
start looking at those ;-) ), that might explain these occurrences.
I was then going to ask how to prevent such occurrences. If Linux does look
for these occurrences and reports them in one of the logs, I need to start
watching that log at least occasionally.
But if not, what kind of "maintenance" can be done on a continually running
system to detect such occurrences?
And, if I find out that something like that has occurred, what should I do at
that point? (Create a new partition and move the files from the old to the
new partition?)
Does an fsck deal with this kind of occurrence? Should I run fsck at regular
intervals (maybe most easily by rebooting)?
Randy Kramer
--
Keith Erekson '03, '05
Senior Computing Consultant
Library & Technology Services
Lehigh University
keith at lehigh.edu