Checking the Sanity of Your System

Have you ever tried to do something and it didn’t behave the way you expected it to? You read the manual and typed in the example character for character, only to find it didn’t work right. Your first assumption is that the manual is wrong, but rather than reporting a bug, you try the command on another machine and, to your amazement, it behaves exactly as you expect. The only logical explanation is that your machine has gone insane.

Well, at least that’s the attitude I have had on numerous occasions. Although this personification of the system helps relieve stress sometimes, it does little to get to the heart of the problem. If you want, you could check every single file on your system (or at least those related to your problem) and ensure that the permissions are correct, the size is right, and all the support files are there. Although this works in many cases, figuring out which programs and files are involved is often not easy.

Fortunately, help is on the way. Linux provides several useful tools with which you can not only check the sanity of your system but return it to normal. I’ve already talked about the first set of tools. These are the monitoring tools such as ps and vmstat. Although these programs cannot correct your problems, they can indicate where problems lie.

If the problem is the result of a corrupt file (either the contents are corrupt or the permissions are wrong), the system monitoring tools cannot help much. However, several tools specifically address different aspects of your system.

Linux provides a utility called sum to compute a checksum on a file. It provides three ways of determining the sum. The first is with no options at all, which reports a 16-bit sum. The next way uses the -r option, which again produces a 16-bit checksum but uses an older, rotating method to compute it. In my opinion, this method is more reliable because the byte order matters as well. With the simple additive method, a file containing the word “housecat” would have the same checksum if you changed that single word to “cathouse.” Although both words have exactly the same bytes, they are in a different order and have a different meaning. Note that newer versions of sum (the GNU version, for example) default to the -r behavior. If you want the simple additive behavior (and the same checksum for both words in this example), use the -s option.
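The housecat/cathouse point is easy to check at the command line. The sketch below assumes the GNU version of sum, where -r selects the rotating algorithm and -s the simple additive one; other implementations may assign these options differently. The variable names are only for illustration.

```shell
# Compare the two algorithms on two words made of the same bytes.
# Assumes GNU sum: -r = rotating method, -s = simple additive method.
a_simple=$(printf 'housecat' | sum -s | awk '{print $1}')
b_simple=$(printf 'cathouse' | sum -s | awk '{print $1}')
a_rotate=$(printf 'housecat' | sum -r | awk '{print $1}')
b_rotate=$(printf 'cathouse' | sum -r | awk '{print $1}')

echo "additive: housecat=$a_simple cathouse=$b_simple"
echo "rotating: housecat=$a_rotate cathouse=$b_rotate"
```

The additive pair should match, because adding the same bytes in any order gives the same total, while the rotating pair should differ.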

On many systems, there is the md5sum command. Instead of creating a 16-bit checksum, md5sum creates a 128-bit checksum. This makes it substantially more difficult to hide the fact that a file has changed.
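As a small illustration of how md5sum can reveal a change, the following sketch records a checksum and then checks the file against it with the -c option. The file name and contents are made up for the demonstration, and everything happens in a temporary directory.

```shell
# Record an MD5 checksum for a file, then detect a later change.
workdir=$(mktemp -d)
printf 'original contents\n' > "$workdir/sample.conf"

# Save the checksum list (md5sum writes "<hash>  <path>" lines).
md5sum "$workdir/sample.conf" > "$workdir/baseline.md5"

# Unchanged file: md5sum -c exits with status 0.
status_before=0
md5sum -c "$workdir/baseline.md5" >/dev/null 2>&1 || status_before=$?

# Change a single character and check again: the status becomes non-zero.
printf 'Original contents\n' > "$workdir/sample.conf"
status_after=0
md5sum -c "$workdir/baseline.md5" >/dev/null 2>&1 || status_after=$?

echo "before change: $status_before, after change: $status_after"
rm -rf "$workdir"
```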

Because of the importance of the file’s checksum, I created a shell script while I was in tech support that would run on a freshly installed system. As it ran, it would store in a database all the information provided in the permissions lists, plus the size of the file (from an ls -l listing), the type of file (using the file command), and the checksum (using sum with the -r option). If I was on the phone with a customer and things didn’t appear right, I could do a quick grep of that file name and get the necessary information. If they didn’t match, I knew something was out of whack.
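A script along these lines might look like the sketch below. It is not the original script; the function name record_baseline, the tab-separated layout, and the example paths are assumptions chosen to keep the idea short.

```shell
# record_baseline FILE...
# Print one tab-separated line per file: path, ls -l listing, file type,
# and the 16-bit checksum from sum -r.
record_baseline() {
    for f in "$@"; do
        [ -f "$f" ] || continue          # skip anything that is not a regular file
        listing=$(ls -l "$f")            # permissions, owner, group, size, date
        # Fall back to "unknown" if the file command is missing or fails.
        ftype=$(file -b "$f" 2>/dev/null || echo unknown)
        cksum=$(sum -r "$f" | awk '{print $1}')
        printf '%s\t%s\t%s\t%s\n' "$f" "$listing" "$ftype" "$cksum"
    done
}

# Example: record a few system files, then look one up later with grep.
# record_baseline /bin/ls /bin/cat > /var/tmp/baseline.db
# grep '^/bin/ls' /var/tmp/baseline.db
```

Running this on a freshly installed system and keeping the output somewhere safe gives you something to compare against later.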

Unfortunately for the customer, much of the information that my script and database provided was something to which they didn’t have access. Now, each system administrator could write a similar script and call up that information. However, most administrators do not consider this issue until it’s too late.

We now get to the “sanity checker” with which perhaps most people are familiar: fsck, the file system checker. Anyone who has lived through a system crash or had the system shut down improperly has seen fsck. One unfamiliar aspect of fsck is the fact that it is actually several programs, one for each of the different file systems. This is done because of the complexities of analyzing and correcting problems on each file system. As a result of these complexities, very little of the code can be shared. What can be shared is found within the fsck program.

When it runs, fsck determines what type of file system you want to check and runs the appropriate command. For example, if you were checking an ext2fs file system, the program that would do the actual checking would be fsck.ext2 (typically in the /sbin directory).
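The naming convention can be pictured as a simple lookup. The function below only illustrates that convention and is not how fsck itself is implemented; the name find_fsck_helper and the list of directories searched are assumptions.

```shell
# find_fsck_helper FSTYPE
# Print the path of the type-specific checker (fsck.FSTYPE) if one is
# installed in the usual system directories; return 1 otherwise.
find_fsck_helper() {
    for dir in /sbin /usr/sbin /bin /usr/bin; do
        if [ -x "$dir/fsck.$1" ]; then
            echo "$dir/fsck.$1"
            return 0
        fi
    done
    return 1
}

find_fsck_helper ext2 || echo "no fsck.ext2 found on this system"
```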

Another very useful sanity checker is the RPM program itself, rpm (assuming that your system uses the RPM package format). As we talked about earlier, the rpm program is used to install additional software. However, it has many more options that you can use to test the integrity of your system.

When the system is installed, all of the file information is stored in several files located in /var/lib/rpm. These are hashed files that rpm can use but mean very little to us humans. Therefore, I am not going to go into more detail about these files.

Assuming you know what file is causing the problem, you can use rpm to determine the package to which this file belongs. The syntax would be

rpm -q -f <full_path_to_file>

The -q puts rpm into query mode and the -f tells it to query the file that follows and report the package to which it belongs. Once you know to what package a file belongs, you can verify the package. For example, let’s say that you believe that there is something wrong with the xv file viewer. Its full path is /usr/bin/X11R6/xv, so to find out to what package it belongs, the command would be

rpm -q -f /usr/bin/X11R6/xv

This tells you that xv is part of the package

xv-3.10a-3

Now use the -V option to verify the package:

rpm -V xv-3.10a-3

If rpm returns with no response, the package is fine. What if the owner and group are wrong? You would end up with an output that looks like this:

..UG. /usr/bin/X11R6/xv

Each dot represents a particular characteristic of the file. These characteristics are

5 MD5 checksum
S File size
L Symbolic link
T Modification time
D Device
U User
G Group
M Mode (permissions and file type)

If any of these characteristics are incorrect, rpm will display the appropriate letter.
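To make the letters easier to read, a small helper can translate the attribute string into words using the table above. This is only a sketch; the function name explain_flags is an assumption, and it looks at each character on its own rather than relying on its position in the string.

```shell
# explain_flags FLAGS
# Translate the attribute string from one line of rpm -V output into words.
# Dots (attribute is fine) are skipped.
explain_flags() {
    flags=$1
    while [ -n "$flags" ]; do
        rest=${flags#?}             # everything after the first character
        char=${flags%"$rest"}       # the first character itself
        case "$char" in
            5) echo "MD5 checksum" ;;
            S) echo "File size" ;;
            L) echo "Symbolic link" ;;
            T) echo "Modification time" ;;
            D) echo "Device" ;;
            U) echo "User" ;;
            G) echo "Group" ;;
            M) echo "Mode (permissions and file type)" ;;
        esac
        flags=$rest
    done
}

# For the example output shown earlier, this prints "User" and "Group".
explain_flags "..UG."
```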

If you wanted to check all of the packages you could create a script that looks like this:

rpm -qa | while read pkg
do
    echo "======== $pkg ================"
    rpm -V $pkg
done

The first rpm command simply lists all of the packages and pipes the list into the read, which then loops through each package and verifies it. Since every file with a problem will be listed, you should have some separator between packages.
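On a system with many packages, most of the separators will be followed by nothing at all. One possible variation, sketched here, prints the separator only for packages where the verification reports something. The function name report_failed_packages and the output file in the example are assumptions.

```shell
# report_failed_packages
# Verify every installed package, but print a separator and the details
# only for packages where rpm -V reports something.
report_failed_packages() {
    rpm -qa | while read -r pkg; do
        # rpm -V exits non-zero when it finds a problem; keep going anyway.
        result=$(rpm -V "$pkg" 2>&1) || true
        if [ -n "$result" ]; then
            echo "======== $pkg ================"
            echo "$result"
        fi
    done
}

# Example (this can take a while on a large system):
# report_failed_packages > /tmp/rpm-verify.txt
```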