When Things Go Wrong


Until you become very accustomed to using Linux you’re likely to make mistakes
(which also happens to people who have been working with Linux for a long
time). In this section, we’ll be talking about some common mistakes and
problems that occur when you first start using Linux.

Usually when you make a mistake the system will let you know in some way.
When you are using the command line, the system tells you in the form
of error messages. For example, if you try to execute a command and the
command does not exist, the system may report something like this:

bash: some_command: command not found

Such an error can also occur when the command does exist but does not reside
in a directory in your search path. You can find more about this in
the section on directory paths.
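
One way to check both possibilities is to ask the shell where it would find a command and to look at the search path itself. The following is only a minimal sketch: the helper name find_cmd is made up for this illustration, and some_command stands for any command that is not installed.

```shell
# find_cmd is a hypothetical helper: report where the shell finds a command.
find_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1 found: $(command -v "$1")"
    else
        echo "$1 not found in PATH"
        return 1
    fi
}

# Show each directory of the search path on its own line.
echo "$PATH" | tr ':' '\n'

find_cmd ls
find_cmd some_command || echo "find_cmd returned exit code $?"
```

If a command turns out to live in a directory that is not listed, you can either call it with its full path or add that directory to the search path.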

The system may still report an error, even if it can execute the command, for
example when the command acts on a file that does not exist. The more
command, for instance, displays the contents of a file. If the file you
want to look at does not exist, you might get the error:

some_file: No such file or directory

In the first example, the error came from your shell as it
tried to execute the command. In the second case, the error came from
the more command as it encountered the error when trying
to access the file.

In both these cases, the problem is pretty obvious. In other cases, it is
not so clear what went wrong. Often you include such commands within shell
scripts and want to change the flow of the script based on the failure or
success of the program. When a command ends, it provides its “exit code” or
“return code” in the special variable $?. By convention, 0 means success and
anything else means failure. So immediately after a command fails,
running this command will show you the exit code:

echo $?
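
Inside a script you usually do not print the exit code but branch on it. Here is a small sketch of that idea. The path /no/such/file is simply a placeholder that is assumed not to exist, and the variable name status is arbitrary.

```shell
# Run a command and change the flow of the script based on its exit code.
# /no/such/file is a placeholder path assumed not to exist.
if ls /no/such/file >/dev/null 2>&1; then
    echo "ls succeeded"
else
    status=$?    # save it right away; the next command overwrites $?
    echo "ls failed with exit code $status"
fi
```

The exact number you see depends on the program, which is why the script only tests for zero or non-zero.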

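Note that it is up to the program to provide both the text message and the
return code. Sometimes you end up with a text message that does not make sense
(or there is no text at all), so all you get is the return code, which is
probably even less understandable. Many programs document their return codes
in their man-pages, so look there first. The numeric error codes used by the
system itself are listed in /usr/include/asm/errno.h (on newer systems this
file just includes others, such as /usr/include/asm-generic/errno-base.h), but
keep in mind that a program is free to choose its own return codes.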

You need to be aware that errors on one system (i.e. one Linux distribution) are not necessarily errors on other systems. For example, if you forget the space in this command, some distributions will give you an error:

ls-l

However, on SUSE Linux, this will generate the same output as if you had not forgotten the space. This is because ls-l is an alias for the command ls -l. As the name implies, an alias is a way of referring to something by a different name. For details take a look at the section on aliases.

It has happened that I did a directory listing and saw a particular file, but when I tried to remove it, the system told me the file did not exist. The most likely explanation is that I misspelled the filename, but that wasn’t it. What can happen sometimes is that a control character ends up becoming part of the filename. This typically happens with the backspace, as it is not defined as the same character on every system. Often the backspace is CTRL-H, but you might create a file on a system with a different backspace key and end up with a filename that contains CTRL-H. When the filename is displayed, the terminal prints the name and, when it reaches the backspace, backs up one character before continuing. For example, your ls output might show you this file:

jimmo

However, when you try to erase it you get an error message. To see any “non-printable” characters you would use the -q option to ls, which displays each of them as a question mark. This might show you:

jimmoo?

This tells you that the file name actually contains two o’s and a trailing backspace. Since the backspace backs up over the last ‘o’ in the display, you do not see it when the file name is displayed normally.
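
If you want to try this out safely, the following sketch recreates the situation in a scratch directory and then removes the file. The variable names dir and name are arbitrary, and the -b option (which shows escape sequences instead of question marks) is assumed to be available, as it is in GNU ls.

```shell
# Work in a scratch directory so nothing important is touched.
dir=$(mktemp -d)

# Build a file name that ends in a backspace (octal 010), as in the example.
name=$(printf 'jimmoo\010')
touch "$dir/$name"

ls -q "$dir"    # non-printable characters are shown as ?
ls -b "$dir"    # GNU ls: shown as an escape sequence instead

# At the prompt, a wildcard lets the shell match the odd character for you,
# and -i asks before each removal:
#     rm -i jimmo*
# This sketch removes the file directly so that it runs without prompting.
rm -- "$dir/$name"
```

The wildcard approach is usually the easiest at the prompt, since you never have to type the control character yourself.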

Sometimes you lose control of programs and they seem to “run away”. In other cases, a program may seem to hang and freeze your terminal. Although this can be caused by a bug in the software or a flaky piece of hardware, oftentimes the user has made a mistake without even being aware of it. This can be extremely frustrating for the beginner, since you do not even know how you got yourself into the situation, let alone how to get out.

When I first started learning Unix (even before Linux was born) I would start programs and quickly see that I needed to stop them. I knew I could stop the program with some combination of the control key and some other letter. In my rush to stop the program, I would press the control key and many different letters in sequence. On some occasions, the program would simply stop and go no further. On other occasions, the program would appear to stop, but I would later discover that it was still running. What happened was that I hit a combination that did not stop the program but did something else.

In the first example, where the program would stop and go no further, I had in effect “suspended” the program. This is typically done by pressing CTRL-S. Strictly speaking, CTRL-S tells the terminal to stop displaying output, so the program goes to sleep as soon as it has more to write and waits for me to tell it to continue. This feature can obviously be useful in the proper circumstance, but when it is unexpected and you don’t know what you did, it can be very unnerving. To put things right, you resume the output with CTRL-Q.

In the second example, where the program seemed to have disappeared, I had also suspended the program but at the same time had put it in the “background”. This special feature of Unix shells dates from the time before graphical interfaces were common. It was a great waste of time to start a program and then have to wait for it to complete, when all you were interested in was the output, which you could simply write to a file. Instead you put a program in the background and the shell returned to the prompt, ready for the next command. It’s sometimes necessary to do this once a command has already started. You do so by pressing CTRL-Z, which suspends the program and returns you to the prompt. You then issue the bg command, which resumes the suspended command in the background. (This is all part of “job control”, which is discussed in another section.)
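
CTRL-Z and bg only make sense at an interactive prompt, but you can also put a command in the background from the start by ending the line with an ampersand (&). The sketch below shows this in script form. Here sleep merely stands in for a long-running program, and the variable name bgpid is arbitrary.

```shell
# Start a command in the background from the outset with a trailing &.
# sleep stands in for any long-running program.
sleep 1 &
bgpid=$!    # $! holds the PID of the most recent background command
echo "started background process $bgpid"

jobs        # list the jobs this shell knows about

# The shell is free for other work in the meantime.
# wait blocks until the background job has finished.
wait "$bgpid"
echo "background job ended with exit code $?"
```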

To stop the program, what I actually wanted to do was to “interrupt” it. This is typically done with CTRL-C. What this actually does is send a signal to the program, in this case an interrupt signal. You can define which key combination sends which signal. We talk about this in the section on terminal settings.

When you put a command that sends output to the screen in the background, you need to be careful about running other programs in the meantime. Otherwise the output can get mixed up, making it difficult to see which output belongs to which command.

There have been occasions where I have issued a command and the shell jumped to the next line, then simply displayed a greater than symbol (>). What this often means is that the shell does not think you are done with the command. This typically happens when you enclose something on the command line in quotes and forget to close the quotes. For example, if I wanted to search for my name in a file I would use the grep command. If I were to do it like this:

grep James Mohr filename.txt

I would get an error message saying that the file “Mohr” did not exist.

To issue this command correctly I would have to include my name inside quotes, like this:

grep "James Mohr" filename.txt


However, if I forgot the final quote, the shell would not think the command was
done yet and would treat the enter key that I pressed as part of the command. What I
would need to do here is to interrupt the command, as we discussed previously. Note this
can also happen if you use single quotes. Since the shell does not see any difference
between a single quote and an apostrophe, you need to be careful with what you type. For
example, if I wanted to print the phrase “I’m Jim”, I might be tempted to do it like this:

echo I'm Jim

However, the shell does not understand contractions. It treats the apostrophe as an opening single quote and thinks I have not finished the command.
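
Both quoting problems have simple fixes, sketched below. The helper count_args is made up for this illustration. It only prints how many arguments it received, to show what the quotes change.

```shell
# Double quotes protect the apostrophe:
echo "I'm Jim"

# So does a backslash in front of it:
echo I\'m Jim

# Quotes also keep a phrase containing a space together as one argument.
# count_args is a hypothetical helper that prints its argument count.
count_args() { echo $#; }
count_args James Mohr      # two separate arguments
count_args "James Mohr"    # one argument
```

This is why grep treated “Mohr” as a file name in the unquoted example: it arrived as a second, separate argument.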

As we will discuss in the section on pipes and redirection, you can send the output of a command to a file. This is done with the greater than symbol (>). The generic syntax looks like this:

command > filename

This can cause problems if the command you issue expects more arguments than you gave it. For example, suppose I were searching the contents of a file for occurrences of a particular phrase but left out the name of the file to search:

grep phrase > filename

What would happen is that the shell would drop down to the next line and simply wait, either forever or until you interrupted the command. The reason is that when no input file is named, the grep command reads its input from the keyboard (standard input). It is waiting for you to type in text, which it then searches. Each line you type that contains the phrase is written into the file. If that’s not what you want, the solution here is also to interrupt the command. You can also enter the end-of-file character (CTRL-D), which tells grep to stop reading input.
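
The difference is easy to see when something other than the keyboard supplies the input. In this sketch the file names input.txt, matches.txt and from_stdin.txt are placeholders created in a scratch directory.

```shell
# Scratch files for the sketch (all names are placeholders).
tmp=$(mktemp -d)
printf 'first phrase here\nnothing to see\nanother phrase\n' > "$tmp/input.txt"

# With an input file named, grep reads that file and finishes at once.
grep phrase "$tmp/input.txt" > "$tmp/matches.txt"

# Without one, grep reads standard input. Here a pipe supplies the text
# that you would otherwise have to type at the keyboard.
printf 'typed phrase\nother line\n' | grep phrase > "$tmp/from_stdin.txt"

cat "$tmp/matches.txt" "$tmp/from_stdin.txt"
```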

One thing to keep in mind is that you can put a program in the background even if the shell does not understand job control. In this case, it is impossible to bring the command back to the foreground in order to interrupt it. You need to do something else. As we discussed earlier, Linux provides you with a tool to display the processes that you are currently running (the ps command). Simply typing ps on the command line might give you something like this:

  PID TTY          TIME CMD
29518 pts/3    00:00:00 bash
30962 pts/3    00:00:00 ps

The PID column in the ps output is the process identifier (PID).

If it is not run in the background, a child process will continue to do its job until it is finished and then report back to its parent. A little housecleaning is done and the process disappears from the system. However, sometimes the child doesn’t end like it is supposed to. One case is when it becomes a “runaway” process. There are a number of causes of runaway processes, but essentially it means that the process is no longer needed but does not disappear from the system.

The result of this is often that the parent cannot end either. In general, the parent should not end until all of its children are done (although there are cases where this is desired). If processes continue to run they take up resources and can even bring the system to a standstill.

In cases where you have “runaway” processes, or any other time a process is running that you need to stop, you can send the process a signal to stop execution if you know its PID. This is done with the kill command, and the syntax is quite simple:

kill <PID>

By default, the kill command sends a termination signal to that process. Unfortunately, there are some cases where a process can ignore that termination signal. However, you can send a much more urgent “kill” signal like this:

kill -9 <PID>

Here “9” is the number of the SIGKILL, or kill, signal. In general, you should first try signal 15, or SIGTERM, which is what kill sends by default. This sends a terminate signal and gives the process a chance to end “gracefully”. You should also look to see if the process you want to stop has any children.
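
The “ask first, insist later” approach can be sketched as a small shell function. The name stop_process and the five-second grace period are assumptions made for this illustration, and sleep stands in for the runaway process.

```shell
# stop_process is a hypothetical helper: ask politely first, then insist.
stop_process() {
    pid=$1
    # Send SIGTERM. If that fails, the process is already gone
    # (or is not ours to signal), so there is nothing more to do.
    kill -TERM "$pid" 2>/dev/null || return 0

    # Give the process up to 5 seconds to end on its own.
    # kill -0 sends no signal; it only checks whether the PID still exists.
    i=0
    while [ "$i" -lt 5 ] && kill -0 "$pid" 2>/dev/null; do
        sleep 1
        i=$((i + 1))
    done

    # Still there? Send SIGKILL, which cannot be caught or ignored.
    if kill -0 "$pid" 2>/dev/null; then
        kill -KILL "$pid" 2>/dev/null
    fi
    return 0
}

sleep 60 &          # stand-in for a runaway process
stop_process $!
```

The grace period matters because a program that receives SIGTERM may still need a moment to close its files and clean up after itself.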

For details on what other signals can be sent and the behavior in different circumstances look at the kill man-page or simply try kill -l:

 1) SIGHUP        2) SIGINT        3) SIGQUIT       4) SIGILL
 5) SIGTRAP       6) SIGABRT       7) SIGBUS        8) SIGFPE
 9) SIGKILL      10) SIGUSR1      11) SIGSEGV      12) SIGUSR2
13) SIGPIPE      14) SIGALRM      15) SIGTERM      17) SIGCHLD
18) SIGCONT      19) SIGSTOP      20) SIGTSTP      21) SIGTTIN
22) SIGTTOU      23) SIGURG       24) SIGXCPU      25) SIGXFSZ
26) SIGVTALRM    27) SIGPROF      28) SIGWINCH     29) SIGIO
30) SIGPWR       31) SIGSYS       35) SIGRTMIN     36) SIGRTMIN+1
37) SIGRTMIN+2   38) SIGRTMIN+3   39) SIGRTMIN+4   40) SIGRTMIN+5
41) SIGRTMIN+6   42) SIGRTMIN+7   43) SIGRTMIN+8   44) SIGRTMIN+9
45) SIGRTMIN+10  46) SIGRTMIN+11  47) SIGRTMIN+12  48) SIGRTMIN+13
49) SIGRTMIN+14  50) SIGRTMAX-14  51) SIGRTMAX-13  52) SIGRTMAX-12
53) SIGRTMAX-11  54) SIGRTMAX-10  55) SIGRTMAX-9   56) SIGRTMAX-8
57) SIGRTMAX-7   58) SIGRTMAX-6   59) SIGRTMAX-5   60) SIGRTMAX-4
61) SIGRTMAX-3   62) SIGRTMAX-2   63) SIGRTMAX-1   64) SIGRTMAX

Keep in mind that sending signals to a process is not just to kill a process. In fact, sending signals to processes is a common way for processes to communicate with each other. You can find more details about signals in the section on interprocess communication.

In some circumstances, it is not easy to kill processes by their PID. For example, if something starts dozens of other processes, it is impractical to type in all of their PIDs. To solve this problem Linux has the killall command, which takes the command name instead of the PID. You can also use the -i (--interactive) option to have killall ask you whether each process should be killed, or the -w (--wait) option to wait for all killed processes to die. Note that if a process ignores the signal or if it is a zombie, then killall may end up waiting forever.

There have been cases where I have frantically tried to stop a runaway program and repeatedly pressed Ctrl-C. The result is that the terminal gets into an undefined state in which it does not react properly to any input, that is, when you press the various keys. For example, pressing the enter key may not bring you to a new line (which it normally should do). If you try executing a command, it’s possible the command is not executed properly, because the system has not identified the enter key correctly. You can return your terminal to a “sane” condition by inputting:

stty sane Ctrl-J

The Ctrl-J character is the line feed character and is necessary as the system does not recognize the enter key.

It has happened to me a number of times that the screen saver was activated and it was as if the system had simply frozen. There were no error messages, no keys worked and the machine did not even respond across the network (telnet, ping, etc.). Unfortunately, the only thing to do in this case is to turn the computer off and then on again.

On the other hand, you can prevent these problems in advance. The most likely cause is that the Advanced Power Management (APM) is having problems. In this case, you should disable the APM within the system BIOS. Some machines also have something called “hardware monitoring”. This can cause problems as well, and should be disabled.

Problems can also be caused by the Advanced Programmable Interrupt Controller (APIC). This can be deactivated by changing the boot string used by either LILO or grub, that is, by adding “disableapic” to your boot line.