Looking Through Files
In the section on looking for files, we talk about
various methods for finding a particular file on your system.
Let's assume for a moment that we were looking for a particular file. We used the find
command to look for a specific file name, but none of the commands we issued came up with a
matching file; there was not a single match of any kind. This might mean that we
removed the file. On the other hand, we might have named it yacht.txt or
something similar. What can we do to find it?
We could jump through the same hoops of trying various spellings and letter combinations,
as we did for yacht and boat. However,
what if the customer had a canoe or a junk? Are we stuck trying every possible
word for boat? Yes, unless we know something about the file, even if that
something is something inside the file itself.
The nice thing is that grep doesn't have to be at the end of a pipe.
One of its arguments can be the name of a file, and you can even list several files,
because grep takes the first argument as the pattern it should
look for and treats the rest as the files to search. If we were to enter
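something like this (the -i option makes the match case-insensitive)

grep -i boat ./letters/taxes/*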
we would search the contents of all the files in the directory
./letters/taxes looking for the word Boat or boat.
If the file we were looking for happened to be in the directory
./letters/taxes, then all we would need to do is run more on the file to read it. However,
if things are like the examples above, where we have dozens of directories to look through,
this is impractical. So, we turn back to find.
One useful option to find is -exec. When a file is found, you use -exec to
execute a command. We can therefore use find to find the files, then use -exec
to run grep on them. Still, you might be asking yourself what good this is to you.
Because you probably don't have dozens of files on your
system related to taxes, let's use an example with files that you most probably
do have.
Let's find all the files in the /etc directory containing /bin/sh. This would be run as
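something like this:

find /etc -exec grep /bin/sh {} \;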
The curly braces ({ }) are substituted for the file found by the search, so
the actual grep command would be something like
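grep /bin/sh /etc/profile

where /etc/profile stands in for whatever file find has just handed over.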
The "\;" is a flag saying that this is the end of the command.
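For our taxes example, the corresponding command would presumably be something like

find ./letters/taxes -exec grep "[Bb]oat" {} \;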
What the find command does is search for all the files that match the
specified criteria (in this case, there were no criteria, so it found them all)
and then run grep on each one, searching for the pattern [Bb]oat.
Do you know what this tells us? It says that there is a file somewhere under
the directory ./letters/taxes that contains either "boat" or "Boat." It doesn't
tell us what the file name is, because of the way the -exec is handled. Each file
name is handed off one at a time, replacing the {}. It is as though we had
entered individual commands, something like
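grep [Bb]oat ./letters/taxes/letter1
grep [Bb]oat ./letters/taxes/letter2
grep [Bb]oat ./letters/taxes/letter3

where the file names are simply whatever find happened to turn up.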
If we had entered
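a command that names several files at once, such as

grep "[Bb]oat" ./letters/taxes/*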
then grep would have output the name of the file in front of each matching line
it found. However, because find hands each file to grep separately, we
don't see the file names. We could use the -l option to grep, but that would
only give us the file name. That might be okay if there were only one or two files.
However, if a line in a file mentioned a "boat trip" or a "boat trailer," these
might not be what we were looking for, and with the -l option we
wouldn't see the actual line to tell the difference. It's a catch-22.
To get what we need, we must introduce a new command: xargs. By using it as
one end of a pipe,
you can repeat the same command on different files without
actually having to input the command multiple times.
In this case, we would get what we wanted by typing
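find ./letters/taxes -print | xargs grep -i boat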
The first part is the same as we talked about earlier. The find command
simply prints all the names it finds (all of them, in this case, because there
were no search criteria) and passes them to xargs. Next, xargs bundles those
names and builds grep command lines from them. However, unlike the -exec option
to find, this means grep is handed many file names at once, so it outputs the
name of the file before each matching line.
Obviously, this example does not find those instances where the file we were
looking for contained words like "yacht" or "canoe" instead of "boat."
Unfortunately, the only way to catch all the possibilities is to specify
each one. So, that's what we might do. Rather than listing every possible
synonym for boat, let's just take three: boat, yacht, and canoe.
To do this, we need to run the find | xargs command three times. However,
rather than typing in the command each time, we are going to take advantage of a
useful aspect of the shell.
In some instances, the shell knows when you want to
continue with a command and gives you a secondary prompt. If you are running sh
or ksh, then this is probably denoted as ">."
For example, if we typed
find ./letters/taxes -print |
the shell
knows that the command cannot end with a pipe (|), so it
gives us a secondary prompt (> or ?, depending on the shell) where we can continue typing
> xargs grep -i boat
The shell
interprets these two lines as if we had typed them both on one
line. We can combine this with a shell
construct that lets us do loops: the
for-in construct in sh, ksh, and bash, and the foreach construct in csh. It would look
like this:
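In sh, ksh, or bash, something along these lines:

for j in boat yacht canoe
do
    find ./letters/taxes -print | xargs grep -i $j
done

and in csh:

foreach j (boat yacht canoe)
    find ./letters/taxes -print | xargs grep -i $j
end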
In this case, we are using the variable
j, although we could have called it
anything we wanted. When we put together quick little commands, we save
ourselves a little typing by using single letter variables.
In the bash/sh/ksh example, we need to enclose the body of the loop inside the
do-done pair. In the csh example, we need to include the end. In both cases,
this little command we have written will loop three times. Each time,
the variable $j is replaced with one of the three words that we used. If we had
thought up another dozen or so synonyms for boat, then we could have included
them all. Remember also that the shell knows that a pipe (|) does not end
the command, so typing the loop across several lines like this works as well.
Doing this from the command line
has a drawback. If we want to use the same
command again, we need to retype everything. However, using another trick, we
can save the command. Remember that both ksh and csh have history mechanisms
that allow you to repeat and edit commands you recently entered. However, what
happens tomorrow when you want to run the command again? Granted, ksh has the
.sh_history file (and bash has .bash_history), but what about sh and csh?
Why not save commands that we use often in a file that we have all the time?
To do this, you would create a basic shell script, and we have a
whole section just on that topic.
When looking through files, I am often confronted with the situation where I am
not just looking for a single piece of text, but for several possible matches. Imagine a
data file that contains a list of machines and their various characteristics,
each on a separate line that starts with the name of that characteristic. For example:
Name: lin-db-01
IP: 192.168.22.10
Make: HP
CPU: 700
RAM: 512
Location: Room 3
All I want is the computer name, the IP address and the location, but not the
others. I could do three individual greps, each with a different pattern.
However, it would be difficult to make the association between the separate
entries. That is, the first time I would have a list of machine names, the
second time a list of IP addresses, and the third time a list of locations. I
have written scripts before that handle this kind of situation, but in this case
it would be easier to use a standard Linux command: egrep. The egrep command
is an extension of the basic grep command (the 'e' stands for extended).
Older versions of grep did not support extended regular expressions, such as
using the pipe symbol (|) to choose between several alternative patterns, so
extended grep was born. For details on building patterns like this, check out
the section on regular expressions.
One extension is the
ability to have multiple search patterns that are checked simultaneously. That
is, if any of the patterns are found, the line is displayed. So in the problem
above we might have a command like this:
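egrep "Name:|IP:|Location:" machines

Here machines is just an assumed name for the data file shown above.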
This would then list all of the respective lines in order, making the
association between the name and the other values a piece of cake.
Another variant of grep is fgrep, which interprets the search pattern as a
list of fixed strings, separated by newlines, any of which is to be
matched. On some systems, grep, egrep and fgrep will all be a
hard link to the same file.
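As a quick sketch, assuming the same machines file as above, the following would print every line containing either of the two fixed strings; the quoted pattern contains an embedded newline, which is what separates them:

fgrep "Name:
Location:" machines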
I am often confronted with files where I
want to filter out the "noise". That is, there is a lot of stuff in the files
that I don't want to see. A common example is looking through large shell
scripts or configuration files when I am not sure exactly what I am looking for.
I know it when I see it, but simply grepping for that term is impossible, as I am
not sure what it is. Therefore, it would be nice to ignore things like comments and
empty lines.
Once again we could use egrep, as there are two expressions we
want to match. However, this time we also use the -v option, which simply flips
or inverts the meaning of the match. Let's say there was a start-up script that
contained a variable you were looking for. You might have something like this:
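egrep -v "^$|^#" /etc/init.d/network

Here /etc/init.d/network is only a stand-in for whichever start-up script you are interested in.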
The first part of the expression says to match the beginning of the line (^)
followed immediately by the end of the line ($), which turns out to match every empty
line. The second part of the expression says to match all lines that start
with the pound sign (a comment). Because of the -v, those matches are discarded,
which ends up giving me all of the "interesting"
lines in the file. The long form of the option is easier to remember: --invert-match.
You may also run into a case where all you are interested in is which files
contain a particular expression. This is where the -l option comes in (long
version: --files-with-matches). For example, when I made some style changes to
my web site I wanted to find all of the files that contained a table. This means
the file had to contain the <TABLE> tag. Since this tag could contain
some options, I was interested in all of the files that contained "<TABLE".
This could be done like this:
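grep -l '<TABLE' *.html

where *.html is simply assumed to cover the pages in question.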
There is an important thing to note here. In the section on
interpreting the command, we learn that the shell sets up file
redirection before it tries to execute the command. If we don't include the
less-than symbol in the single quotes, the shell will try to redirect the input
from a file named "TABLE". See the section on
quotes for details on this.
The -l option (long version:
--files-with-matches) says to simply list the names of the files that contain a
match. The -L option (long version: --files-without-match) does the opposite: it
lists the files in which no match was found at all. Note that in both cases, the
lines containing the matches are not displayed, just the file names.
Another common option is -q
(long: --quiet or --silent). This does not display anything at all. So, what's the use in that? Well, often you simply want to know whether a particular value exists in a file. Regardless of the options you use, grep returns an exit status of 0 if any matches
were found and 1 if no matches were found. So you can check the $? variable after
running grep -q; if it is 0, you found a match. Check out the section on
basic shell scripting for details on $? and other special
variables.
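A minimal sketch of such a test, reusing our taxes example:

grep -q "[Bb]oat" ./letters/taxes/*
if [ $? -eq 0 ]
then
    echo "found a match"
fi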
Keep in mind that grep does not need to read through
files itself. Instead, it can sit at one end of a pipe. For example, I have a number of
scripts that look through the process list to see if a particular process is
running. If so, then I know all is well. However, if the process is not running,
a message is sent to the administrators.
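A sketch of such a check might look like this, where the process name (httpd) and the mail recipient (admin) are only placeholders:

if ! ps -e | grep -q httpd
then
    echo "httpd is not running" | mail -s "process check" admin
fi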