Basic Shell Scripting
In many of the other sections on the shell and utilities, we talked about a
few programming constructs that you could use to create a quick
script to perform some complex task. What if you wanted to
repeat that task with different parameters each time? One simple solution is
to re-type everything each time. Obviously not a happy thing.
A better approach is to save the commands in a file. We could use vi or some
other text editor to create the file. However, we could instead take advantage
of a characteristic of the cat command, which is normally used to output the
contents of a file to the screen: you can also redirect the output of cat to
another file.
If we wanted to combine the contents of several files, we could do something like
this:
cat file1 file2 file3 > newfile
This would combine file1, file2, and file3 into newfile.
What happens if we leave out the names of the source files? In this
instance, our command would look like this:
cat > newfile
Now, cat will take its input from the default input file, stdin.
We can now type in lines, one at a time. When we are done, we tell cat to close the file by
sending it an end-of-file character, Ctrl-D. So, to create the new command, we
would issue the cat command as above and type in our command as
follows:
<CTRL-D>
Note that here the secondary prompt, >, does not appear because it is cat
that is reading our input and not the shell.
We now have a file containing the five lines that we typed in, which we can
use as a shell script.
However, right now, all that we have is a file named newfile that contains
five lines. We need to tell the system that it is a shell script that can be
executed. Remember that in our discussion of operating system basics, I said that
a file's permissions need to be set before you can execute the file. To change the
permissions, we need a new command: chmod. (Read as "change mode" because we are
changing the mode of the file.)
The chmod command is used not only to change access to a file, but also to
tell the system that it should try to execute the file. I said "try"
because the system would read that file, line by line, and would try to execute
each line. If we typed some garbage into a shell script, the system would try
to execute each line and would probably report "not found" for every line.
To make a file executable, we need to give it execute permission. To give
everyone execute permission, you use the chmod command like this:
chmod a+x newfile
Now the file newfile has execute permission, so, in a sense, it is
executable. However, remember that I said the system would read each line. In
order for a shell script to function correctly, it also needs to be readable by
the person executing it. In order to read a file, you need to have read
permission on that file. More than likely, you already have read
permission on the file since you created it. However, since we gave everyone
execute permission, let's give them all read permission as well, like
this:
chmod a+r newfile
You now have a new command called newfile. This can be executed just like
any command the system provides for you. If that file resides in a directory
somewhere in your path, all you need to do is type its name. Otherwise (as we
talked about before), you need to enter the path as well. Keep in mind that this
read requirement applies to scripts; the system does not need to be able to read
binary programs. All it needs to be able to do is execute them. Now you have your
first shell script and your first self-written UNIX command.
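The whole cycle from this section can be tried in one go. This is a minimal sketch, not the original script: the letters directory, its single letter, and the five-line body of newfile are hypothetical, and a here-document (the <<'EOF' ... EOF lines) stands in for typing at the keyboard and pressing Ctrl-D.

```shell
# Work in a scratch directory so nothing else is touched.
cd "$(mktemp -d)" || exit 1

# Hypothetical sample data: one letter that mentions a boat.
mkdir letters
printf 'Dear dealer,\nI would like to buy a boat.\n' > letters/dealer.txt

# cat copies its standard input into newfile; the here-document
# plays the role of the keyboard plus Ctrl-D.
cat > newfile <<'EOF'
find ./letters -type f | while read FILE
do
    echo "$FILE"
    grep boat "$FILE"
done
EOF

# Give everyone read and execute permission, then run it by path.
chmod a+rx newfile
./newfile
```

Because the scratch directory is not in your PATH, the sketch runs newfile with an explicit ./ prefix.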
What happens if, after looking through all of the files, you don't find the
one you are looking for? Maybe you were trying to be sophisticated and used
"small aquatic vehicle" instead of "boat". Now, six months later, you cannot
remember what you called it. Looking through every file might take a long time.
If only you could shorten the search a little. Because you remember that the
letter you wrote was to the boat dealer, if you could remember the name of the
dealer, you could find the letter.
The problem is that six months after you wrote it, you can no more remember
the dealer's name than you can remember whether you called it a "small aquatic
vehicle" or not. If you are like me, seeing the dealer's name will jog your
memory. Therefore, if you could just look at the top portion of each letter, you
might find what you are looking for. You can take advantage of the fact that the
address is always at the top of the letter and use a command that is designed to
look there. This is the head command, and we use it like this:
find ./letters/taxes -exec head {} \;
This will look at the first 10 (the default for head) lines of each of the
files that it finds. If the addressee were not in the first ten lines, but
rather in the first 20 lines, we could change the command to be
find ./letters/taxes -exec head -20 {} \;
The problem with this is that 20 lines is almost an entire screen. If you
ran this, it would be comparable to running more on every file and hitting q to
exit after it showed the first screen. Fortunately, we can add another command
to restrict the output even further. This is the tail command, which is just the
opposite of head as it shows you the bottom of a file. So, if we knew that the
address resided on lines 16-20, we could run a command like this:
find ./letters/taxes -exec head -20 {} \; | tail -5
The intent is that find passes the first 20 lines of each file through the pipe,
and then tail displays the last five of them. So you would get lines 16-20 of every
file, right? Not quite.
The problem is that the shell sees these as two separate commands joined by the
pipe: find ./letters/taxes -exec head -20 {} \; and tail -5.
All of the output of the find is sent through the pipe as a single stream,
and it is the last five lines of this stream that tail shows. Therefore, if find
had found 100 files, we would see lines 16-20 of the last file only and nothing
at all from the first 99!
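You can see the problem with two small files. In this sketch, the directory layout and the files a.txt and b.txt (whose lines simply read a1 to a20 and b1 to b20) are hypothetical; the -type f option, which is not in the command above, keeps find from handing the directory itself to head.

```shell
# Scratch directory with two hypothetical 20-line files.
cd "$(mktemp -d)" || exit 1
mkdir -p letters/taxes
awk 'BEGIN { for (i = 1; i <= 20; i++) print "a" i }' > letters/taxes/a.txt
awk 'BEGIN { for (i = 1; i <= 20; i++) print "b" i }' > letters/taxes/b.txt

# tail receives one 40-line stream, not two files, so it prints only
# lines 16-20 of whichever file find happened to visit last.
find ./letters/taxes -type f -exec head -20 {} \; | tail -5
```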
The solution is to add two other shell constructs: while and read. The while
construct carries out a particular command (or set of commands) as long as some
condition is true. The read command reads a line of input, either from the
keyboard or, as part of a more complicated construction, from a pipe. So, using
cat again to create a command as we did above, we could have something like this:
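The original script is not reproduced here, but from the description that follows, a minimal sketch might look like this. It assumes the same hypothetical ./letters/taxes layout as before; in practice you would save the loop in a file with cat and make it executable, just as we did with newfile.

```shell
# Scratch directory with two hypothetical 20-line files.
cd "$(mktemp -d)" || exit 1
mkdir -p letters/taxes
awk 'BEGIN { for (i = 1; i <= 20; i++) print "a" i }' > letters/taxes/a.txt
awk 'BEGIN { for (i = 1; i <= 20; i++) print "b" i }' > letters/taxes/b.txt

# find prints one path per line; read puts each one into FILE,
# and the loop ends when find has no more output.
find ./letters/taxes -type f -print | while read FILE
do
    echo "$FILE"                  # show which file we are looking at
    head -20 "$FILE" | tail -5    # lines 16-20 of this file only
done
```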
In this example, the while and read work together. The while will continue
so long as it can read something into the variable
FILE; that is, so long as
there is output coming from find. Here again, we also need to enclose the body
of the loop within the do-done pair.
The first line of the loop simply echoes the name of the file so we can keep
track of what file is being looked at. Once we find the correct name, we can use
it as the search criterion for a find | grep command. This requires looking
through each file twice. However, if all you need to see is the address,
then
this is a lot quicker than doing a more on every file.
If you have read through the other sections, you have a pretty good idea of
how commands can be put together to do
a wide variety of tasks. However, to create more complicated scripts, we need
more than just a few commands. There are several shell
constructs that you need
to be familiar with to make complicated scripts. A couple (the while and for-in
constructs) we already covered. However, there are several more that can be very
useful in a wide range of circumstances.
There are several things we need to talk about before we jump in.
The first is the idea of arguments. Like binary programs, shell scripts can
accept arguments and use them as they work. For example,
let's assume we have a script called myscript that takes three arguments. The
first is the name of a directory, the second is a file name, and the third is a
word to search for. The script will search the directory for all files whose
names contain the given file name and then search those files for
the word specified. A very simple version of the script might look like
this:
ls -1 $1 | grep $2 | while read file
do
    grep $3 ${1}/${file}
done
The syntax is:
myscript directory file_name word
I discussed the while-do-done construct when I
discussed different commands like find and grep. The one difference here is that
we are sending the output of a command through a second pipe
before we send it
to the while.
This also brings up a new construct: ${1}/${file}. By enclosing a variable
name in curly braces, we mark exactly where the name ends, which lets us join
variables to each other and to other text. In this case, we take the
name of the directory (${1}), and tack on a "/" for a directory separator,
followed by the name of a file that grep found (${file}). This builds up the
path name to the file.
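To try myscript without touching your home directory, this sketch builds a small hypothetical docs directory in a scratch location. It quotes each variable (an addition), which keeps directory names or words containing spaces in one piece.

```shell
cd "$(mktemp -d)" || exit 1

# Hypothetical sample files. Both trip files match "trip", but only
# trip2.txt mentions a boat; notes.txt mentions one, but its name
# does not match.
mkdir docs
echo "We took the train." > docs/trip1.txt
echo "We rented a boat."  > docs/trip2.txt
echo "Buy a boat?"        > docs/notes.txt

# $1 = directory, $2 = part of a file name, $3 = word to search for.
cat > myscript <<'EOF'
ls -1 "$1" | grep "$2" | while read file
do
    grep "$3" "${1}/${file}"
done
EOF
chmod a+rx myscript

./myscript docs trip boat    # prints: We rented a boat.
```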
When we run the program like this
myscript /home/jimmo trip boat
the three arguments /home/jimmo, trip, and boat are assigned to the
positional parameters 1, 2, and 3, respectively. "Positional" because the number
they are assigned is based on where they appear in the command. Because the
positional parameters are shell
variables, we need to refer to them with the
leading dollar sign ($).
When the shell interprets the command, what is actually run is
ls -1 /home/jimmo | grep trip | while read file
do
    grep boat /home/jimmo/${file}
done
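If we wanted, we could make the script a little more self-documenting by
assigning the values of the positional parameters to variables. The new script
might look like this:
DIR=$1
FILENAME=$2
WORD=$3
ls -1 $DIR | grep $FILENAME | while read file
do
    grep $WORD ${DIR}/${file}
done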
If we started the script again with the same arguments, first /home/jimmo
would get assigned to the variable
DIR, trip would get assigned to the variable
FILENAME, and boat would get assigned to WORD. When the command was interpreted
and run, it would still be evaluated the same way.
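You can watch these assignments without writing a script at all: set -- assigns its arguments to the positional parameters of the current shell. The file name letter1 below is a hypothetical stand-in for a name that read would supply.

```shell
# Make /home/jimmo, trip, and boat positional parameters 1, 2, and 3.
set -- /home/jimmo trip boat

DIR=$1
FILENAME=$2
WORD=$3
echo "$DIR $FILENAME $WORD"    # prints: /home/jimmo trip boat

# The braces mark where each variable name ends, so the pieces join cleanly.
file=letter1
echo "${DIR}/${file}"          # prints: /home/jimmo/letter1
```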
Being able to assign positional parameters to variables is useful for a
couple of reasons. First is the issue of self-documenting code. In this example,
the script is very small and because we know what the script is doing, we
probably would not have made the assignments to the variables. However, if we
had a larger script, then making the assignment is very valuable in terms of
keeping track of things.
The next issue is that many older shells can reference only 10
positional parameters directly, $0 through $9. The first, $0, refers
to the script itself. What this can be used for, we'll get to in a minute. The
others, $1-$9, refer to the arguments that are passed to the script. What
happens if you have more than nine arguments? This is where the shift
instruction comes in. It moves the arguments "down" in the positional
parameters list.
For example, let's assume we changed the first part of the script like this:
DIR=$1
shift
FILENAME=$1
WORD=$2
On the first line, the value of positional parameter 1 is
/home/jimmo and we assign it to the variable
DIR. In the next line, the shift moves every positional parameter down. Because $0
remains unchanged, what was in $1 (/home/jimmo) drops out of the bottom. Now, the value
of positional parameter 1 is trip, which is assigned to the variable
FILENAME, and positional parameter 2 (boat) is assigned to WORD.
If we had 10 arguments, the tenth would initially be unavailable to us.
However, once we do the shift, what was the tenth argument
is shifted down and
becomes the ninth. It is now accessible through the positional parameter 9. If
we had more than 10, there are a couple of ways to get access to them. First, we
could issue enough shifts until the arguments all moved down far enough. Or, we
could use the fact that shift can take as an argument
the number of shifts it should do. Therefore, using
shift 9
makes the tenth argument
positional parameter 1.
What about the other nine arguments? Are they gone? If you never assigned
them to a variable, then yes, they are gone. However, if you assigned them to a
variable before you made the shift, you still have access to their
values. Newer shells (such as bash) can also reference positional parameters
beyond 9 directly, by enclosing the number in braces, as in ${10}.
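Here is a quick way to try this with set --, using eleven made-up arguments, a1 through a11.

```shell
set -- a1 a2 a3 a4 a5 a6 a7 a8 a9 a10 a11
echo "$#"       # prints: 11
echo "${10}"    # prints: a10 (without braces, $10 would mean $1 followed by 0)

shift           # a1 drops out, so a10 is now $9
echo "$9"       # prints: a10

shift 8         # eight more shifts (nine in all): a10 is now $1
echo "$1 $#"    # prints: a10 2
```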
However, being able to shift positional parameters comes in handy in other instances,
which brings up a new parameter: $*. This parameter refers to all
the positional parameters (except $0). So, if we had 10 positional parameters and
did a shift 2 (ignoring whatever we did with the first two), the parameter $*
would contain the values of the last eight arguments.
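For example, with ten made-up arguments:

```shell
set -- one two three four five six seven eight nine ten
shift 2      # "one" and "two" drop out
echo "$#"    # prints: 8
echo "$*"    # prints: three four five six seven eight nine ten
```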
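In our sample script above, if we wanted to search for a phrase and
not just a single word, we could change the script to look like this:
DIR=$1
FILENAME=$2
shift 2
WORD=$*
ls -1 $DIR | grep $FILENAME | while read file
do
    grep "$WORD" ${DIR}/${file}
done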
The first change was that after assigning positional parameters 1 and 2 to
variables, we shifted twice, effectively removing the first two arguments. We
then assigned the remaining arguments to the variable WORD (WORD=$*). Because
this could have been a phrase, we needed to enclose the variable in
double quotes ("$WORD"). Now we can search for phrases as well as single words.
If we did not include the double quotes, the shell would split the phrase into
individual arguments, and grep would treat every word after the first as a file name.
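To see the difference the quotes make, this sketch substitutes a small hypothetical function, count_args, for grep; it simply reports how many arguments it received.

```shell
# Hypothetical helper: report how many arguments were passed.
count_args() {
    echo "$#"
}

set -- docs trip small aquatic vehicle
shift 2
WORD=$*

count_args $WORD      # prints: 3 (the phrase is split into three words)
count_args "$WORD"    # prints: 1 (the phrase stays one argument)
```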
Another useful parameter keeps track of the total number of parameters: $#.
In the previous script, what would happen if we had only two arguments? The grep
would fail because there would be nothing for it to search for. Therefore, it
would be a good thing to keep track of the number of arguments.
We first need to introduce a new construct: if-then-fi. This is similar to
the while-do-done construct, in that the if-fi pair marks the beginning and end
of the block (fi is simply if reversed). The difference is that instead of
repeating the commands within the block while the specific condition is true, we
do them only once, if the condition is true. In general, it looks like this:
if [ condition ]
then
    commands
fi
The conditions are all defined in the test man-page.
They can be string
comparisons, arithmetic comparisons, and even conditions where we test specific
files, such as whether the files have write permission. Check out the test
man-page for more examples.
Because we want to check the number of arguments passed to our script, we
will do an arithmetic comparison. We can check if the values are equal, the
first is less than the second, the second is less than the first, the first is
greater than or equal to the second, and so on. In our case, we want to ensure
that there are at least three arguments, because having more is valid
if we are going to be searching for a phrase. Therefore, we want to compare the
number of arguments and check if it is greater than or equal to 3. So, we might
have something like this:
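A minimal sketch of that test, wrapped in a hypothetical function named check_args so that it can be called with different numbers of arguments (inside a function, $# counts the function's own arguments):

```shell
# Hypothetical stand-in for the body of myscript.
check_args() {
    if [ $# -ge 3 ]
    then
        echo "searching $1 for files matching $2"
    fi
}

check_args /home/jimmo trip boat    # prints: searching /home/jimmo for files matching trip
check_args /home/jimmo trip         # prints nothing; the test is false
```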
If we have only two arguments, the test inside the brackets is false, the
commands inside the if block are skipped, and the program simply exits silently.
However, to me, this is not enough. We want to know what's going on; therefore,
we use another construct: else. When this construct is used with the if-then-fi,
we are saying that if the test evaluates to true, do one thing; otherwise, do
something else. In our example program, we might have something like this:
if [ $# -ge 3 ]
then
    DIR=$1
    FILENAME=$2
    shift 2
    WORD=$*
    ls -1 $DIR | grep $FILENAME | while read file
    do
        grep "$WORD" ${DIR}/${file}
    done
else
    echo "Insufficient number of arguments"
fi
If we only put in two arguments, the if fails and the commands between the
else and the fi are executed. To make the script a little more friendly, we
usually tell the user what the correct syntax is; therefore, we might change the
end of the script to look like this:
else
    echo "Usage: $0 <directory> <file_name> <word>"
fi
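A cut-down version of the whole script can be tried as a sketch. It lives in a bin directory inside a scratch location, and the echo in the then branch is a hypothetical placeholder for the real search.

```shell
cd "$(mktemp -d)" || exit 1
mkdir bin

cat > bin/myscript <<'EOF'
if [ $# -ge 3 ]
then
    echo "searching $1"    # placeholder for the real search
else
    echo "Usage: $0 <directory> <file_name> <word>"
fi
EOF
chmod a+rx bin/myscript

./bin/myscript /home/jimmo file
# prints: Usage: ./bin/myscript <directory> <file_name> <word>
```

Running the same file as sh bin/myscript /home/jimmo file prints bin/myscript in the usage line instead, because $0 reflects how the script was called.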
The important part of this change is the use of the $0. As I mentioned a
moment ago, this refers to the program itself: not just its name, but
the way it was called. Had we hard-coded the line to look like this
echo "Usage: myscript <directory> <file_name> <word>"
then no matter how we started the script, the output would always be
Usage: myscript <directory> <file_name> <word>
However, if we used $0 instead, we could start the program like this
/home/jimmo/bin/myscript /home/jimmo file
and the output would be
Usage: /home/jimmo/bin/myscript <directory> <file_name> <word>
On the other hand, if we started it like this
./bin/myscript /home/jimmo file
the output would be
Usage: ./bin/myscript <directory> <file_name> <word>
One thing to keep in mind is that the else needs to be within the matching
if-fi pair. The key here is the word matching. We could nest
if-then-else-fi constructs several layers deep if we wanted. We just need to keep
track of things. The key issues are that each fi closes the most recent if that
is still open and that each else is enclosed within its own if-fi pair. Here is
how multiple sets might look:
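The original listing is not reproduced here; the following is one way three conditions might nest. The hypothetical function nested takes three words, yes or no, standing in for condition1, condition2, and condition3, and each comment marks which if a fi closes.

```shell
nested() {
    if [ "$1" = yes ]                 # condition1
    then
        if [ "$2" = yes ]             # condition2
        then
            if [ "$3" = yes ]         # condition3
            then
                echo "1, 2, and 3 are true"
            else
                echo "1 and 2 are true, 3 is false"
            fi                        # closes the condition3 if
        else
            echo "1 is true, 2 is false"
        fi                            # closes the condition2 if
    else
        echo "1 is false"
    fi                                # closes the condition1 if
}

nested yes yes no    # prints: 1 and 2 are true, 3 is false
```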
This doesn't take into account the possibility that condition1 is false,
but that either condition2 or condition3 is true or that conditions 1 and 3 are true, but 2 is false.
However, you should see how to construct nested conditional statements.
What if we had a single variable
that could take on several values? Depending on the value that it acquired, the program would
behave differently. This could be used as a menu, for example. Many system administrators
build such a menu into their users' .profile (or .login) so that they never need to get to a
shell. They simply input the number of the program that they want to run and
away they go.
To do something like this, we need to introduce yet another construct: the
case-esac pair. Like the if-fi pair, esac is the reverse of case. So to
implement a menu, we might have something like this:
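A minimal sketch of such a menu follows. The three programs it starts (date, pwd, and ls) are hypothetical choices standing in for whatever the administrator wants to offer, the catch-all *) entry is an addition, and the menu is wrapped in a hypothetical function named menu so that a choice can be piped into it.

```shell
menu() {
    echo "a) show the date"
    echo "b) show the current directory"
    echo "c) list the current directory"
    read choice
    case $choice in
    a)  date
        ;;
    b)  pwd
        ;;
    c)  ls
        ;;
    *)  echo "unknown choice"    # addition: anything else
        ;;
    esac
}

echo b | menu    # prints the menu, then the current directory
```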
If the value of choice that we input is a, b, or c, the appropriate program
is started. The things to note are the in on the first line, the expected value
that is followed by a closing parenthesis, and that there are two semi-colons at
the end of each block.
It is the closing parenthesis that indicates the end of the possibilities.
If we wanted, we could have included other possibilities for the different
options. In addition, because the double semi-colons mark the end of the block,
we could have simply added another command before we got to the end of the
block. For example, if we wanted our script to recognize either upper- or
lowercase, we could change it to look like this:
If necessary, we could also include a range of characters, as in
[a-c])
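Here is a sketch that combines both ideas in a hypothetical function, classify, which takes the choice as its argument instead of reading it. Patterns separated by | share one block, and a bracketed range matches any single character in it.

```shell
classify() {
    case $1 in
    a|A)    echo "option a"
            ;;
    b|B)    echo "option b"
            ;;
    [c-e])  echo "one of c, d, or e"
            ;;
    *)      echo "unknown choice"
            ;;
    esac
}

classify A    # prints: option a
classify d    # prints: one of c, d, or e
```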
Now, whatever is called as the result of one of these choices does not have
to be a UNIX
command. Because each line is interpreted as if it were executed
from the command line,
we could have included anything as though we had executed
the command from the command line.
Provided they are known to the shell
script,
this also includes aliases, variables, and even shell
functions.
A shell
function behaves similarly to functions in other programming
languages. It is a portion of the script that is set off from the rest of the
program and is accessed through its name. These are the same as the functions we
talked about in our discussion of shells. The only apparent difference is that
functions created inside of a shell script will disappear when the shell
exits.
To prevent this, run the script with the . (dot) command, which reads the script
into the current shell instead of starting a new one.
For example, if we had a function inside a script called myscript, we would
start it like this:
. ./myscript
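The difference is easy to see with a hypothetical script, funcs, that does nothing but define a function named greet.

```shell
cd "$(mktemp -d)" || exit 1

# Hypothetical script that only defines a function.
cat > funcs <<'EOF'
greet() {
    echo "hello, $1"
}
EOF

# Run in a child shell: the function vanishes when that shell exits.
sh ./funcs
command -v greet || echo "greet is not defined"

# Read it with . instead: the function now lives in this shell.
. ./funcs
greet jimmo    # prints: hello, jimmo
```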
One construct that I find very useful is select. With select, you can have a
quick menuing system. It takes the form
select name in word1 word2 ...
do
    list
done
where each word is presented in a list and preceded by a number. Inputting
that number sets the value of name to
the word following that number. Confused? Let's look at an example. Assume we
have a simple script that looks like this:
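A sketch of such a script follows. The date, ls -l, and exit entries match the discussion below; the third entry, pwd, is a hypothetical stand-in. Because select is a bash (and Korn shell) feature that not every sh provides, the sketch runs the script with bash explicitly and feeds it the choices 3 and 4 instead of typing them.

```shell
cd "$(mktemp -d)" || exit 1

cat > myselect <<'EOF'
select var in date "ls -l" pwd exit
do
    $var    # run whatever word the chosen number selected
done
EOF

# Choice 3 runs pwd; choice 4 runs exit, which ends the script.
printf '3\n4\n' | bash ./myselect
```

bash writes the numbered menu and the prompt to standard error, so only the output of pwd appears on standard output.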
When we run this script, we get
The "#?" is whatever you have defined as the PS3 variable, the prompt that select
displays. Here, we have just left it at the default, but we could have set it to
something else. For example:
This would make the prompt more obvious, but keep in mind that if you set
PS3 outside the script (for example, in your .profile), it would apply
everywhere, not just to this script.
In our example, when we input 1, we get the date. First, however, the word
"date" is assigned to the variable
"var." The single line within the list
expands that variable
and the line is executed. This gives us the date. If we
were to input 2, the variable
"var" would be assigned the word "ls -l" and we
would get a long listing of the current directory (not where the script
resides). If we input 4, when the line was executed, we would exit from the
script.
In an example above, we briefly discussed the special parameter $#. This is
useful in scripts, as it keeps track of how many positional parameters there
were, and if there are not enough, we can report an error. Another parameter is
$*, which contains all of the positional parameters. To check the exit
status of the last command you executed, use the $? parameter.
The process ID of the current shell is stored in the $$ parameter. Paired with
this is the $! parameter, which holds the process ID of the last command run
in the background.
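A short sketch of all three parameters; the sleep command is just a convenient stand-in for a background job.

```shell
# $? holds the exit status of the most recent command.
false
echo "false returned $?"    # prints: false returned 1
true
echo "true returned $?"     # prints: true returned 0

# $$ is the process ID of the current shell.
echo "this shell is process $$"

# $! is the process ID of the most recent background command.
sleep 1 &
echo "sleep is running as process $!"
wait $!                     # wait for that background job to finish
```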
One thing I sort of glossed over up to this point was the tests we made in the
if statements in the examples above. In one case we had this:
if [ $# -ge 3 ]
As we mentioned, this checks the number of command-line arguments ($#) and tests
whether it is greater than or equal to 3. We could have written it like this:
if test $# -ge 3
with exactly the same result. In bash, both [ and test are built
into the shell. However, with some other shells, they are external commands
(typically installed side by side, for example as /usr/bin/test and /usr/bin/[).
If you look at either the test or bash man-page, you will see that there are
many more things we can test. In our examples, we were either testing two
strings or testing numerical values. We can also test many different conditions
related to files, not just variables as we did in these examples.
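A quick way to convince yourself that [ and test are the same is to run both forms side by side. The arguments set with set -- are made up, and the file test on /tmp assumes a typical UNIX system where /tmp exists.

```shell
set -- /home/jimmo trip boat

# These two lines perform exactly the same numeric test.
if [ $# -ge 3 ]; then echo "enough arguments (using [)"; fi
if test $# -ge 3; then echo "enough arguments (using test)"; fi

# A string test and a file test, for comparison.
if [ "$2" = trip ]; then echo "the second argument is trip"; fi
if [ -d /tmp ]; then echo "/tmp is a directory"; fi
```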
Many of the system scripts (such as those under /etc/rc.d) will first test
whether a particular file exists before proceeding. For
example, a script might want to test if a configuration file exists. If so, it
will read that file and use the values found in that file. Otherwise it will use
default values. Sometimes these scripts will check whether a file exists and is
executable. In both cases, a missing file could mean an error occurred or simply
that a particular package was not installed.
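The following sketch shows that pattern. The configuration file /etc/sysconfig/mydaemon, the program /usr/sbin/mydaemon, and the PORT setting are all hypothetical names used only for illustration.

```shell
# Hypothetical names used only for illustration.
CONFIG=/etc/sysconfig/mydaemon
PROGRAM=/usr/sbin/mydaemon

# Default value, used when there is no configuration file.
PORT=8080

# -r: the file exists and is readable. If so, read its settings
# into this shell with the . command, overriding the default.
if [ -r "$CONFIG" ]
then
    . "$CONFIG"
fi

# -x: the file exists and is executable. A missing program may just
# mean the package is not installed, so report it and carry on.
if [ -x "$PROGRAM" ]
then
    echo "would start $PROGRAM on port $PORT"
else
    echo "$PROGRAM not installed; skipping"
fi
```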