Checking the Sanity of Your System
Have you ever tried to do something and it didn’t behave the way you expected
it to? You read the manual and typed in the example character for character
only to find it didn’t work right. Your first assumption is that the manual is
wrong, but rather than reporting a bug, you try the command on another machine
and to your amazement, it behaves exactly as you expect. The only logical reason
is that your machine has gone insane.
Well, at least that’s the attitude I have
had on numerous occasions. Although this personification of the system helps
relieve stress sometimes, it does little to get to the heart of the problem. If
you want, you could check every single file on your system (or at least those
related to your problem) and ensure that permissions are correct, the size is
right, and that all the support files are there. Although this works in many
cases, often figuring out which programs and files are involved is not easy.
Fortunately, help is on the way. Linux provides several useful tools with
which you can not only check the sanity of your system but return it to normal.
I’ve already talked about the first set of tools. These are the monitoring tools
such as ps and vmstat. Although these programs
cannot correct your problems, they can indicate where problems lie.
If the problem is the result of
a corrupt file (either the contents are corrupt or the permissions
are wrong), the system monitoring tools cannot help much. However, several tools specifically address different aspects of your system.
Linux provides a utility to compute a checksum
on a file, called sum. It can compute the sum in two ways, both of which
yield a 16-bit value. The -s option selects the System V method, which simply
adds up the bytes in the file. The -r option selects the BSD method, which
rotates the running total before adding each byte. In my opinion, the -r method
is more reliable because the byte order is important as well. With the -s
method, a file containing the word “housecat” would have the same checksum if
you changed that single word to “cathouse.” Although both words have the exact
same bytes, they are in a different order and give a different meaning. What you
get with no options at all depends on the version: older versions of sum simply
added up the bytes, whereas the newer versions I have seen default to the -r
behavior. If you want the byte-adding behavior (and the same checksum for both
words in this example), use the -s option.
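The “housecat” and “cathouse” comparison is easy to try for yourself. The following is a minimal sketch, assuming the GNU version of sum, which accepts both the -s and -r options; other implementations of sum may not. The variable names are made up for the illustration.

```shell
# Checksum two strings that contain the same bytes in a different order.
# Assumes GNU sum: -s adds up the bytes, -r rotates before each addition.
sysv_a=$(printf 'housecat' | sum -s)
sysv_b=$(printf 'cathouse' | sum -s)
bsd_a=$(printf 'housecat' | sum -r)
bsd_b=$(printf 'cathouse' | sum -r)

echo "sum -s: housecat=$sysv_a cathouse=$sysv_b"
echo "sum -r: housecat=$bsd_a cathouse=$bsd_b"

if [ "$sysv_a" = "$sysv_b" ]; then
    echo "sum -s cannot tell the two words apart"
fi
if [ "$bsd_a" != "$bsd_b" ]; then
    echo "sum -r notices the change in byte order"
fi
```

Each value printed consists of the checksum followed by the number of blocks read.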
On many systems, there is
the md5sum command. Instead of creating a 16-bit checksum,
md5sum creates a 128-bit
checksum. This makes it substantially more difficult to hide the fact that a file
has changed.
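As a rough illustration of how md5sum can be used to notice a changed file, the sketch below records a checksum and then re-checks it with the -c option of GNU md5sum. The scratch directory and file names are invented for the example.

```shell
# Record an MD5 checksum for a file, then detect a later change.
# The directory and file names are hypothetical.
workdir=$(mktemp -d)
printf 'housecat\n' > "$workdir/sample.txt"

# Store the checksum in the format that md5sum -c can read back.
( cd "$workdir" && md5sum sample.txt > sample.md5 )

# Unchanged file: md5sum -c exits with status 0.
if ( cd "$workdir" && md5sum -c sample.md5 >/dev/null 2>&1 ); then
    before=ok
else
    before=changed
fi

# Same bytes in a different order: the check now fails.
printf 'cathouse\n' > "$workdir/sample.txt"
if ( cd "$workdir" && md5sum -c sample.md5 >/dev/null 2>&1 ); then
    after=ok
else
    after=changed
fi

echo "before edit: $before, after edit: $after"
rm -rf "$workdir"
```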
Because of the importance of the file’s checksum,
I created a shell script while I was in tech support that would
run on a freshly installed system. As it ran, it would store in a database all
the information provided in the permissions
lists, plus the size of the file (from an ls -l listing), the type of
file (using the file command), and the checksum
(using sum with the -r option). If I was on the phone with a customer and things didn’t appear right, I could do a quick grep of that file name and get the necessary information. If they didn’t match, I knew something was out of whack.
Unfortunately
for the customer, much of the information that my script and database provided
was something to which they didn’t have access. Now, each system
administrator could write a similar script and call
up that information. However, most administrators do not consider this issue
until it’s too late.
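A script of this kind does not need to be elaborate. What follows is only a sketch of the idea, not the script I used in tech support: the function name, the one-line-per-file layout, and the database location are all assumptions made for the example, and it expects the file command and a sum that accepts -r.

```shell
# Hypothetical baseline recorder: one line per file holding its name,
# ls -l listing, file type, and checksum, separated by "|".
record_baseline() {
    # $1 = directory to scan, $2 = database file to write
    find "$1" -type f | sort | while IFS= read -r f; do
        listing=$(ls -ld "$f")          # permissions, owner, size, date
        if command -v file >/dev/null 2>&1; then
            ftype=$(file -b "$f")       # file type without the leading name
        else
            ftype=unknown               # file command not installed
        fi
        checksum=$(sum -r < "$f")       # checksum and block count
        printf '%s|%s|%s|%s\n' "$f" "$listing" "$ftype" "$checksum"
    done > "$2"
}

# Demonstration on a scratch directory rather than on a real system.
demo=$(mktemp -d)
printf 'housecat\n' > "$demo/example.conf"
record_baseline "$demo" "$demo.db"

# Later, a quick grep of the file name pulls up what was recorded.
entry=$(grep 'example.conf' "$demo.db")
echo "$entry"
rm -rf "$demo" "$demo.db"
```

On a real system you would run record_baseline right after installation and keep the resulting file somewhere safe, preferably off the machine.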
We now get to the “sanity checker” with which perhaps most people are familiar:
fsck, the file system checker. Anyone who has lived through a system
crash or had the system shut down improperly has seen fsck.
One unfamiliar aspect of fsck is the fact that it is actually several programs,
one for each of the different file systems. This is done because of the
complexities of analyzing and correcting problems on each file system. As a
result of these complexities, very little of the code can be shared. What can be
shared is found within the fsck program.
When it runs, fsck determines what type of file system you want to check and
runs the appropriate command. For example, if you were checking an ext2fs file
system, the program that would do the actual checking would be
fsck.ext2 (typically in the /sbin directory).
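The naming scheme can be sketched in a few lines. This is a simplified illustration of the dispatch step only, not how fsck is actually implemented, and the function name is made up for the example.

```shell
# Simplified illustration: build the name of the helper program
# from the file system type. The real fsck does considerably more.
fsck_helper_name() {
    # $1 = file system type, for example ext2
    printf 'fsck.%s\n' "$1"
}

helper=$(fsck_helper_name ext2)
echo "helper for ext2: $helper"

# Report whether that helper is installed in the usual directories.
for dir in /sbin /usr/sbin; do
    if [ -x "$dir/$helper" ]; then
        echo "found $dir/$helper"
    fi
done
```

On systems where fsck comes from the util-linux package, running fsck with the -N option shows which helper it would run without actually executing it.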
Another very useful sanity checker is the rpm program itself (assuming that
your system uses the RPM package format). As we talked about earlier, the rpm
program is used to install additional software. However, it has many more
options that you can use to test the integrity of your system.
When the system is
installed, all of the file information is stored in several files located in
/var/lib/rpm. These are hashed files that rpm can use but mean very little to us
humans. Therefore, I am not going to go into more detail about these files.
Assuming you know what file is causing the problem, you
can use rpm
to determine the package to which this file belongs. The syntax would be
rpm -q -f <full_path_to_file>
The -q puts rpm into query mode and the -f tells it to query the
following file and report the package to which it belongs. Once you know to what
package a file belongs, you can verify the package. For example, let’s say that
you believe that there is something wrong with the xv file viewer. Its
full path is /usr/bin/X11R6/xv, so to find out to what
package it belongs, the command would be
rpm -q -f /usr/bin/X11R6/xv
This tells you that xv is part of the package
xv-3.10a-3
Now use the -V option to verify the package:
rpm -V xv-3.10a-3
If rpm returns with no response, the package is fine. What if the owner and group
are wrong? You would end up with an output that looks like
this:
..UG. /usr/bin/X11R6/xv
Each position in this string stands for one characteristic of the file, and a dot means that the characteristic checked out. These characteristics are
| 5 | MD5 checksum |
| S | File size |
| L | Symbolic link |
| T | Modification time |
| D | Device |
| U | User |
| G | Group |
| M | Mode (permissions and file type) |
If any of these characteristics are incorrect, rpm will display the appropriate letter.
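If you do not want to memorize the letters, a small helper can translate them for you. The function below is a hypothetical convenience written for this example, not part of rpm. It takes the string of dots and letters from one line of rpm -V output and prints the names from the table above.

```shell
# Hypothetical helper: translate the letters from one line of rpm -V
# output into the names of the characteristics that failed.
decode_verify_flags() {
    flags=$1
    result=
    while [ -n "$flags" ]; do
        rest=${flags#?}                 # everything after the first character
        letter=${flags%"$rest"}         # the first character itself
        flags=$rest
        case $letter in
            5) name="MD5 checksum" ;;
            S) name="File size" ;;
            L) name="Symbolic link" ;;
            T) name="Modification time" ;;
            D) name="Device" ;;
            U) name="User" ;;
            G) name="Group" ;;
            M) name="Mode" ;;
            *) continue ;;              # a dot (or anything else) is skipped
        esac
        result="${result:+$result, }$name"
    done
    printf '%s\n' "$result"
}

decode_verify_flags "..UG."
```

For the example output shown earlier, this prints the two names User and Group.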
If you wanted to check all of the packages you could create a script that looks
like this:
rpm -qa | while read -r pkg
do
    echo "======== $pkg ================"
    rpm -V "$pkg"
done
The first rpm command simply lists all of the packages and pipes the list into the read,
which then loops through each package and verifies it. Since only file names
are listed, you should have some separator between packages so that you can
tell to which package each file belongs.
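On a system with many packages, most of the separators will be followed by nothing at all. One possible variation on the loop above, sketched here with a made-up function name, prints the separator only for packages that actually report a problem.

```shell
# Variation on the loop above: print a header only for packages
# for which rpm -V reports something.
verify_all_packages() {
    rpm -qa | while read -r pkg; do
        # rpm -V exits with a non-zero status when a file fails
        # verification, so ignore the status and look at the output.
        report=$(rpm -V "$pkg" 2>&1) || true
        if [ -n "$report" ]; then
            printf '======== %s ========\n%s\n' "$pkg" "$report"
        fi
    done
}
```

You could then run verify_all_packages and redirect its output to a file to read at your leisure. Verifying every package can take quite a while.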