Finding Out About Your System

One challenging aspect of tech support is that before you talk to the customer, you have no way of knowing what the problem will be. It can be anything from simple questions that are easily answered by reading the manual to long, drawn-out system crashes.

Late one Monday afternoon, when I had been idle the longest, my phone rang as the next customer came into the queue. The customer described the situation as simply that his computer would no longer boot. For some reason, the system rebooted itself and now it would not boot.

When I asked the customer how far the system got and what, if any, error messages were on the screen, his response indicated that the root file system was trashed. At that point, I knew it was going to be a five-minute call. In almost every case, there is no way to recover from this. On a few rare occasions, fsck can clean things up enough for the system to boot. Because the customer had already tried that, this was not one of those occasions.

We began discussing the options, which were very limited. He could reinstall the operating system and then the data, or he could send his hard disk to a data recovery service. Because this was a county government office, they had the work of dozens of people on the machine. They had backups from the night before, but all of that day's work would be lost.

Because no one else was waiting in the queue to talk to me, I decided to poke around a little longer. Maybe the messages we saw would point us to a way to recover. We booted from the emergency boot/root set again and started to look around. The fdisk utility reported the partition table as valid, and it looked as though just the root file system was trashed, which is bad enough.

I was about ready to give up when the customer mentioned that the fdisk table didn’t look right. Three entries in the table had starting and ending blocks. This didn’t sound right because he only had two partitions: root and swap. So I checked /etc/fstab and discovered that another file system was being mounted.

Because the data was probably already trashed, there was no harm in continuing, so we decided to try running fsck on it. Amazingly enough, fsck ran through relatively quickly and reported just a few errors. We mounted the file system and, holding our breath, we did a listing of the directory. Lo and behold, there was his data. All the files appeared to be intact. Because this was all in a directory named /data, he simply assumed that there was no /usr or /home file system, which there wasn’t. However, there was a second file system.

I suggested backing up the data just to be safe. Because it was an additional file system, however, reinstalling the OS could preserve it. Within a couple of hours, he could be up and running again. The lesson learned: Make sure you know the configuration of your system! If at all possible, keep data away from the root file system and do a backup as often as you can afford to. The lesson for me was to have the customer read each entry one by one.
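One way to act on that lesson is to look at what the system itself says it mounts. The following is only a minimal sketch: the function name list_fstab is made up for this illustration, and it assumes the usual whitespace-separated layout of /etc/fstab (device, mount point, file system type, and so on).

```shell
#!/bin/sh
# list_fstab [FILE]: print device, mount point, and file system type for
# every entry in an fstab-style file. The function name is hypothetical;
# FILE defaults to /etc/fstab.
list_fstab() {
    fstab="${1:-/etc/fstab}"
    if [ ! -r "$fstab" ]; then
        echo "cannot read $fstab" >&2
        return 1
    fi
    # Skip blank lines, comments, and lines with fewer than three fields.
    awk 'NF >= 3 && $1 !~ /^#/ { printf "%-25s %-15s %s\n", $1, $2, $3 }' "$fstab"
}

# Show the entries for this system, if the file is readable.
list_fstab || echo "no readable fstab on this system" >&2
```

Had the customer's machine been checked this way beforehand, the /data entry would have appeared as a third line next to root and swap.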

Being able to manage and administer your system requires that you know something about how your system is defined and configured. What values have been established for various parameters? What is the base address of your SCSI host adapters? What is the maximum UID that you can have on a system? All of these are questions that will eventually crop up, if they haven’t already.

The nice thing is that the system can answer these questions for you, if you know what to ask and where to ask it. In this section, we are going to take a look at where the system keeps much of its important configuration information and what you can use to get at it.

As a user, much of the information that you can get will be useful only to satisfy your curiosity. You can normally read most of the files that I am going to talk about. However, you won’t be able to run a few of the utilities, such as fdisk, so what they have to say will be hidden from you.

If you are an administrator, there are probably many nooks and crannies of the system into which you have never looked, and many you probably never knew existed. After reading this section, I hope you will gain some new insight into where information is stored. For the more advanced system administrator, this may only serve as a refresher. Who knows? Maybe the gurus out there will learn a thing or two. Table 0-1 gives you a good overview of the various configuration files on your system.

The command getconf will display the maximum allowed value of various system configuration parameters. It is useful if you run into problems and you think you might have reached some predefined limit. For example, you might try to create a filename or username that is too long. The getconf command will show you what the maximum is. For example:

# getconf LOGNAME_MAX

will show you the maximum length of the user name. It can also show you things that are less meaningful to users, such as the size of a page in memory.
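To check several limits at once, something like the following sketch may help. The helper name show_limit is made up for this illustration. The variable names used are the POSIX ones (LOGIN_NAME_MAX is the POSIX spelling for the login name limit), and which names are accepted can differ between getconf implementations. Limits that belong to a file system, such as NAME_MAX, need a path as a second argument.

```shell
#!/bin/sh
# show_limit NAME [PATH]: print one getconf value, or "unavailable" if
# this getconf does not recognize NAME. The helper name is hypothetical.
show_limit() {
    value=$(getconf "$@" 2>/dev/null) || value="unavailable"
    printf '%-16s %s\n' "$1" "$value"
}

show_limit PAGESIZE          # size of a page in memory, in bytes
show_limit LOGIN_NAME_MAX    # maximum length of a login name
show_limit NAME_MAX /        # longest filename on the root file system
show_limit PATH_MAX /        # longest pathname relative to /
```

A value of "undefined" in the output comes from getconf itself and means the variable is known but has no fixed limit on this system.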

A list of the essential system files can be found here.