Welcome to Linux Knowledge Base and Tutorial
"The place where you learn linux"
Linux Tutorial - Shells and Utilities - Looking for Files


Looking for Files

In the section on Interacting with the System we talked about using the ls command to look for files. There we had the example of looking in the sub-directory ./letters/taxes for specific files. Using the ls command, we might have something like this:

What if the taxes directory contained a subdirectory for each year for the past five years, each of these contained a subdirectory for each month, each of these contained a subdirectory for federal, state, and local taxes, and each of these contained 10 letters?

If we knew that the letter we were looking for was somewhere in the taxes subdirectory, the command

would show us the sub-directories of taxes (federal, local, state), and it would show their contents. We could then look through this output for the file we were looking for.
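A command that behaves as described might look like the following sketch. The scratch directory and the letter names are made up for illustration; only the ./letters/taxes layout comes from the text.

```shell
# Scratch tree standing in for ./letters/taxes (file names are hypothetical).
tmp=$(mktemp -d)
cd "$tmp" || exit 1
mkdir -p letters/taxes/federal letters/taxes/local letters/taxes/state
touch letters/taxes/federal/letter1 letters/taxes/state/letter2

# The shell expands * to federal, local and state; ls then lists each
# directory under its own "name:" heading, followed by its contents.
listing=$(ls ./letters/taxes/*)
printf '%s\n' "$listing"

cd / && rm -rf "$tmp"
```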

What if the file we were looking for was five levels deeper? We could keep adding wildcards (*) until we reached the right directory, as in:

This might work, but what happens if the files were six levels deeper? Well, we could add an extra wildcard. What if it were 10 levels deeper and we didn't know it? Well, we could fill the line with wildcards. Even if we had too many, we would still find the file we were looking for.

Fortunately for us, we don't have to type in 10 asterisks to get what we want. We can use the -R option to ls to do a recursive listing. The -R option also avoids the "argument list too long" error that we might get with wildcards. So, the solution here is to use the ls command like this:
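As a minimal sketch of the recursive listing, assuming a small stand-in tree (the year, month, and file name below are invented):

```shell
# Scratch tree: one letter buried several levels below ./letters/taxes.
tmp=$(mktemp -d)
cd "$tmp" || exit 1
mkdir -p letters/taxes/2008/march/federal
touch letters/taxes/2008/march/federal/boat.txt

# -R descends into every sub-directory, however deep the tree is,
# so no wildcards are needed on the command line.
listing=$(ls -R ./letters/taxes)
printf '%s\n' "$listing"

cd / && rm -rf "$tmp"
```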

The problem is that we now have 1,800 files to look through. Piping them through more and looking for the right file will be very time-consuming. If we knew that it was there, but we missed it on the first pass, we would have to run through the whole thing again.

The alternative is to have the more command search for the right file for you. Because the output is more than one screen, more will display the first screen and at the bottom display --More--. Here, we could type a slash (/) followed by the name of the file and press Enter. Now more will search through the output until it finds the name of the file. Now we know that the file exists.

The problem here is the output of the ls command. We can find out whether a file exists by this method, but we cannot really tell where it is. If you try this, you will see that more jumps to the spot in the output where the file is (if it is there). However, all we see is the file name, not what directory it is in. Actually, this problem exists even if we don't execute a search.

If you use more as the command and not the end of a pipe, instead of just seeing --More--, you will probably see something like

--More--(16%)

This means that you have read 16 percent of the file.

However, we don't need to use more for that. Because we don't want to look at the entire output (just search for a particular file), we can use one of three commands that Linux provides to do pattern searching: grep, egrep, and fgrep. The names sound a little odd to the Linux beginner, but grep stands for global regular expression print. The other two are newer versions that do similar things. For example, egrep searches for patterns that are full regular expressions, and fgrep searches for fixed strings and is a bit faster (on current systems these behave like grep -E and grep -F). We go into detail about the grep command in the section on looking through files.

Let's assume that we are tax consultants and have 50 subdirectories, one for each client. Each subdirectory is further broken down by year and type of tax (state, local, federal, sales, etc.). A couple of years ago, a client of ours bought a boat. We have a new client who also wants to buy a boat, and we need some information in that old file.

Because we know the name of the file, we can use grep to find it, like this:
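One way this could look is a recursive listing piped through grep. This is a sketch only: the starting directory is assumed to be ./letters/taxes, and the file names are taken from the examples in the text.

```shell
# Scratch tree with a few of the file names mentioned in the text.
tmp=$(mktemp -d)
cd "$tmp" || exit 1
mkdir -p letters/taxes/federal letters/taxes/local letters/taxes/state
touch letters/taxes/federal/boats letters/taxes/local/Boat.txt \
      letters/taxes/state/letter.boat

# grep keeps only the lines of the listing that contain "boat".
# Boat.txt is not matched because the pattern is case-sensitive.
matches=$(ls -R ./letters/taxes | grep boat)
printf '%s\n' "$matches"

cd / && rm -rf "$tmp"
```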

If the file is called boats, boat.txt, boats.txt, or letter.boat, grep will find it because grep is only looking for the pattern boat. Because that pattern exists in all four of those file names, all four would be potential matches.

The problem is that the file may not be called boat.txt, but rather Boat.txt. Remember, unlike DOS, UNIX is case-sensitive. Therefore, grep sees boat.txt and Boat.txt as different files. The solution here would be to tell grep to look for both.

Remember our discussion on regular expressions in the section on shell basics? Not only can we use regular expressions for file names, we can use them in the arguments to commands. The term regular expression is even part of grep's name. Using regular expressions, the command might look like this:
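A sketch of such a command, assuming the same ./letters/taxes starting point (the sub-directory names are placeholders):

```shell
# Scratch tree: the same name in three different cases.
tmp=$(mktemp -d)
cd "$tmp" || exit 1
mkdir -p letters/taxes/federal letters/taxes/local letters/taxes/state
touch letters/taxes/federal/boat.txt letters/taxes/local/BOAT.TXT \
      letters/taxes/state/Boat.txt

# [Bb] matches either an upper- or lowercase "b". The single quotes keep
# the shell from treating the brackets as a file name pattern.
matches=$(ls -R ./letters/taxes | grep '[Bb]oat')
printf '%s\n' "$matches"

cd / && rm -rf "$tmp"
```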

This would now find both boat.txt and Boat.txt.

Some of you may see a problem with this as well. Not only does Linux see a difference between boat.txt and Boat.txt, but also between Boat.txt and BOAT.TXT. To catch all possibilities, we would have to have a command something like this:

Although this is perfectly correct syntax and will find the files no matter what case the word "boat" is in, it is too much work. The programmers who developed grep realized that people would want to look for things regardless of what case they are in. Therefore, they built in the -i option, which simply says ignore the case. Therefore, the command

will not only find boats, boat.txt, boats.txt, and letter.boat, but it will also find Boat.txt and BOAT.TXT as well.
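Both approaches can be compared side by side. The sketch below assumes the same hypothetical ./letters/taxes tree as before and checks that the long bracket pattern and the -i option select the same names.

```shell
# Scratch tree: the same name in three different cases.
tmp=$(mktemp -d)
cd "$tmp" || exit 1
mkdir -p letters/taxes/federal letters/taxes/local letters/taxes/state
touch letters/taxes/federal/boat.txt letters/taxes/local/BOAT.TXT \
      letters/taxes/state/Boat.txt

# The long way: spell out both cases for every letter.
long=$(ls -R ./letters/taxes | grep '[Bb][Oo][Aa][Tt]')

# The short way: let grep ignore case.
short=$(ls -R ./letters/taxes | grep -i boat)
printf '%s\n' "$short"

cd / && rm -rf "$tmp"
```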

If you've been paying attention, you might have noticed something. Although the grep command will tell you about the existence of a file, it won't tell you where it is. This is just like piping it through more. The only difference is that we're filtering out something. Therefore, it still won't tell you the path.

Now, this isn't grep's fault. It did what it was supposed to do. We told it to search for a particular pattern and it did. Also, it displayed that pattern for us. The problem is still the fact that the ls command is not displaying the full paths of the files, just their names.

Instead of ls, let's use a different command. Let's use find instead. Just as its name implies, find is used to find things. What it finds is files. If we change the command to look like this:

This finds what we are looking for and gives us the paths as well.

Before we go on, let's look at the syntax of the find command. There are a lot of options and it does look forbidding at first. We find it is easiest to think of it this way:

In this case, the "where" is ./letters/taxes. Therefore, find starts its search in the ./letters/taxes directory. Here, we have no search criteria; we simply tell it to do something. That something was to -print out what it finds. Because the files it finds all have a path relative to ./letters/taxes, this is included in the output. Therefore, when we pipe it through grep, we get the path to the file we are looking for.
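Put together, the pipeline described above might look like the following sketch. The tree is a made-up stand-in, and the final sort is only there to make the output order predictable.

```shell
# Scratch tree: the same name in three different cases.
tmp=$(mktemp -d)
cd "$tmp" || exit 1
mkdir -p letters/taxes/federal letters/taxes/local letters/taxes/state
touch letters/taxes/federal/boat.txt letters/taxes/local/BOAT.TXT \
      letters/taxes/state/Boat.txt

# General shape: find <where> <search criteria> <what to do>
# Here: where = ./letters/taxes, no criteria, action = -print.
# grep then filters the printed paths.
paths=$(find ./letters/taxes -print | grep -i boat | LC_ALL=C sort)
printf '%s\n' "$paths"

cd / && rm -rf "$tmp"
```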

We also need to be careful because the find command we are using will also find directories named boat. This is because we did not specify any search criteria. If instead we wanted it just to look for regular files (which is often a good idea), we could change the command to look like this:
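A sketch of the difference, assuming a hypothetical tree that contains both a file and an (empty) directory with "boat" in the name:

```shell
# Scratch tree: one regular file and one directory that both match "boat".
tmp=$(mktemp -d)
cd "$tmp" || exit 1
mkdir -p letters/taxes/federal letters/taxes/state/boats
touch letters/taxes/federal/boat.txt

# No search criteria: directories are printed as well.
everything=$(find ./letters/taxes -print | grep -i boat | LC_ALL=C sort)

# -type f restricts the search to regular files.
files_only=$(find ./letters/taxes -type f -print | grep -i boat | LC_ALL=C sort)
printf '%s\n' "$files_only"

cd / && rm -rf "$tmp"
```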

Here we see the option -type f as the search criterion. This finds only files of type f, which stands for regular files. This could also be a d for directories, c for character special files, b for block special files, and so on. Check out the find man-page for other types that you can use.

Too complicated? Let's make things easier by avoiding grep. There are many different things that we can use as search criteria for find. Take a quick look at the man-page and you will see that you can search for a specific owner, groups, permissions, and even names. Instead of having grep do the search for us, let's save a step (and time) by having find do the search for us. The command would then look like this:
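A minimal sketch using the -name test, again on a made-up tree:

```shell
# Scratch tree with several similarly named files.
tmp=$(mktemp -d)
cd "$tmp" || exit 1
mkdir -p letters/taxes/federal letters/taxes/state
touch letters/taxes/federal/boat letters/taxes/federal/boat.txt \
      letters/taxes/state/Boat

# -name compares the last component of each path with the given name.
found=$(find ./letters/taxes -name boat -print)
printf '%s\n' "$found"

cd / && rm -rf "$tmp"
```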

This will find any file named boat and list its respective path. The problem here is that it will only find the files named boat. It won't find the files boat.txt, boats.txt, or even Boat.

The nice thing is that the -name test understands the same wildcard patterns as the shell, including bracket expressions, so we could issue the command like this:

(Note that we included the single quotes (') to prevent the square brackets ([]) from being interpreted by the shell first.)
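As a sketch, with the pattern quoted as the note describes (the tree below is hypothetical):

```shell
# Scratch tree: three spellings of the same name.
tmp=$(mktemp -d)
cd "$tmp" || exit 1
mkdir -p letters/taxes/federal letters/taxes/local letters/taxes/state
touch letters/taxes/federal/boat letters/taxes/local/BOAT \
      letters/taxes/state/Boat

# The quoted pattern reaches find untouched; find does the matching itself.
found=$(find ./letters/taxes -name '[Bb]oat' -print | LC_ALL=C sort)
printf '%s\n' "$found"

cd / && rm -rf "$tmp"
```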

This command tells find to look for all files named either boat or Boat. However, this won't find BOAT. We are almost there.

We have two alternatives. One is to expand the find to include all possibilities, as in

This will find all the files with any combination of those four letters and print them out. However, it won't find boat.txt. Therefore, we need to change it yet again. This time we have

Here we have passed the wildcard (*) to find to tell it to find anything that starts with "boat" (upper- or lowercase), followed by anything else. If we add an extra asterisk, as in

we not only get boat.txt, but also newboat.txt, which the first example would have missed.
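The three patterns discussed above can be tried one after another. This is a sketch on an invented tree; the variable names are only there to keep the three results apart.

```shell
# Scratch tree covering the cases from the text.
tmp=$(mktemp -d)
cd "$tmp" || exit 1
mkdir -p letters/taxes/federal letters/taxes/local letters/taxes/state
touch letters/taxes/federal/boat letters/taxes/local/BOAT \
      letters/taxes/state/Boat.txt letters/taxes/state/newboat.txt

# Exactly four letters, any case: misses Boat.txt and newboat.txt.
exact=$(find ./letters/taxes -name '[Bb][Oo][Aa][Tt]' -print | LC_ALL=C sort)

# Trailing *: also matches Boat.txt, but still not newboat.txt.
prefix=$(find ./letters/taxes -name '[Bb][Oo][Aa][Tt]*' -print | LC_ALL=C sort)

# Leading and trailing *: "boat" may appear anywhere in the name.
anywhere=$(find ./letters/taxes -name '*[Bb][Oo][Aa][Tt]*' -print | LC_ALL=C sort)
printf '%s\n' "$anywhere"

cd / && rm -rf "$tmp"
```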

This works. Is there an easier way? Well, sort of. There is a way that is easier in the sense that there are fewer characters to type in. This is:

Isn't this the same command that we issued before? Yes, it is. In this particular case, this combination of find and grep is the easier solution, because all we are looking for is the path to a specific file. However, these examples show you different options of find and different ways to use them. That's one of the nice things about Linux. There are many ways to get the same result.

Note that more recent versions of find do not require the -print option, as this is the default behavior.

Looking for files with specific names is only one use of find. However, if you look at the find man-page, you will see there are many other options you can use. One thing I frequently do is to look for files that are older than a specific age. For example, on many systems, I don't want to hang on to log files that are older than six months. Here I could use the -mtime option like this:

This says to find everything in the /usr/log/mylogs directory that is older than 180 days (not exactly six months, but it works). If I wanted, I could have used the -name option to specify a particular file pattern:
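Both variants might look like the sketch below. To keep it runnable anywhere, a scratch mylogs directory stands in for /usr/log/mylogs, and touch -t is used to backdate two made-up files.

```shell
# Scratch directory standing in for /usr/log/mylogs (file names are made up).
tmp=$(mktemp -d)
mkdir "$tmp/mylogs"
touch "$tmp/mylogs/fresh.log"
touch -t 200001010000 "$tmp/mylogs/old.log" "$tmp/mylogs/old.txt"   # backdated

# Everything last modified more than 180 days ago.
old=$(find "$tmp/mylogs" -mtime +180 -print | LC_ALL=C sort)

# The same, restricted to names ending in .log.
old_logs=$(find "$tmp/mylogs" -name '*.log' -mtime +180 -print)
printf '%s\n' "$old_logs"

rm -rf "$tmp"
```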

One problem with this is what determines how "old" a file is. The first answer for many people is that the age of a file is how long it has been since the file was created. Well, if I created a file two years ago, but added new data to it a minute ago, is it "older" than a file that I created yesterday, but have not changed since then? It really depends on what you are interested in. For log files, I would say that the time the data in the file was last changed is more significant than when the file was created. Therefore, -mtime is fitting, as it bases its time on when the data was changed.

However, that's not always the case. Sometimes, you are interested in the last time the file was used, or accessed. This is when you would use the -atime option. This is helpful in finding old files on your system that no one has used for a long time.

You could also use the -ctime option, which is based on when the file's "status" was last changed. The status is changed when the permissions or file owner is changed. I have used this option in security contexts. For example, on some of our systems there are only a few places, such as /var/log, that contain files that should change at all. If I search on all files that were changed at all (content or status), it might give me an indication of improper activity on the system. I can run a script a couple of times an hour to show me the files that have changed within the last day. If anything shows up, I suspect a security problem (obviously ignoring files that are supposed to change).

Three files that we specifically monitor are /etc/passwd, /etc/group and /etc/shadow. Interestingly enough, we want one of these files, /etc/shadow, to change once a month. This is our "proof" that the root password was changed, as it should be, at regular intervals. Note that we have other mechanisms to ensure that it was the root password that was changed and not simply something else in the file, but you get the idea. One place you see this mechanism at work is your /usr/lib/cron/run-crons file, which is started from /etc/crontab every 15 minutes.

One shortcoming of -mtime and the others is that they measure time in 24-hour increments starting from now. That means that you cannot find anything that was changed within the last hour, for example. For this, newer versions of find have the -cmin, -amin and -mmin options, which measure times in minutes. So, to find all of the files changed within the last hour (i.e. last 60 minutes) we might have something like this:

In this example, the value was preceded with a minus sign (-), which means that we are looking for files with a value less than what we specified. In this case, we want values less than 60 minutes. In the earlier -mtime example, we used a plus sign (+) before the value, which means values greater than what we specified. If you use neither one, then the time is exactly what you specified.
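The effect of the sign can be sketched as follows, assuming a find that supports -mmin (GNU find does) and two made-up files, one of them backdated with touch -t:

```shell
# Scratch directory with one fresh file and one backdated file.
tmp=$(mktemp -d)
touch "$tmp/recent"
touch -t 200001010000 "$tmp/stale"

# Minus sign: modified less than 60 minutes ago.
recent=$(find "$tmp" -type f -mmin -60 -print)

# Plus sign: modified more than 60 minutes ago.
stale=$(find "$tmp" -type f -mmin +60 -print)

# No sign (-mmin 60) would match only files modified exactly 60 minutes ago.
printf 'recent: %s\nstale:  %s\n' "$recent" "$stale"

rm -rf "$tmp"
```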

In the same vein are the options -newer, -anewer, and -cnewer, which find files that are newer than the file specified.
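A small sketch of -newer, using a backdated reference file (both file names are invented for the example):

```shell
# Scratch directory: a backdated reference file and a file changed just now.
tmp=$(mktemp -d)
touch -t 200001010000 "$tmp/reference"
touch "$tmp/changed"

# -newer matches files modified more recently than the reference file.
newer=$(find "$tmp" -type f -newer "$tmp/reference" -print)
printf '%s\n' "$newer"

rm -rf "$tmp"
```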

Note also that these commands find everything in the specified path older or younger than what we specify. This includes files, directories, device nodes and so forth. Maybe this is what you want, but not always. Particularly if you are using the -exec option and want to search through each file you find, looking for "non-files" is not necessarily a good idea. To specify a file type, find provides you with the -type option. Among the possible file types are:

  • b - block device
  • c - character device
  • d - directory
  • p - named pipe (FIFO)
  • f - regular file
  • l - symbolic link
  • s - socket

As you might expect, you can combine the -type option with the other options we discussed, to give you something like this:
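For instance, combining -type with -mtime might look like this sketch. A scratch mylogs directory is assumed, with one old file and one old sub-directory, both backdated with touch -t.

```shell
# Scratch directory: an old regular file and an old sub-directory.
tmp=$(mktemp -d)
mkdir -p "$tmp/mylogs/archive"
touch -t 200001010000 "$tmp/mylogs/old.log" "$tmp/mylogs/archive"

# Age test alone: the old directory is reported too.
any=$(find "$tmp/mylogs" -mtime +180 -print | LC_ALL=C sort)

# Age test plus -type f: only regular files are reported.
files=$(find "$tmp/mylogs" -type f -mtime +180 -print)
printf '%s\n' "$files"

rm -rf "$tmp"
```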

The good news and the bad news at this point is that there are many, many more options you can use. For example, you can search for files based on their permissions (-perm), their owner (-user), their size (-size), and so forth. Many I use occasionally; some I have never used. See the find man-page for a complete list.

In addition to the -exec option, there are a number of other ones that are applied to the files that are found (rather than used to restrict what files are found). Note that in most documentation, the options used to restrict the search are called tests and the options that perform an operation on the files are called actions. One very simple action is -ls, which does a listing of the files the same as using the -dils options to the ls command.

A variant of the -exec action is -ok. Rather than simply performing the action on each file, -ok will first ask you to confirm that it should do it. Pressing "Y" or "y" will run the command; pressing anything else will not.
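The actions mentioned here can be sketched as follows. The files and the wc command are arbitrary choices for illustration. With -exec, find replaces {} with the path of each file it finds, and the escaped semicolon marks the end of the command. The -ok line is left as a comment because it waits for an answer at the keyboard.

```shell
# Scratch directory with two small text files.
tmp=$(mktemp -d)
printf 'one\ntwo\n' > "$tmp/a.txt"
printf 'three\n' > "$tmp/b.txt"

# -exec runs the command once for each file found.
counts=$(find "$tmp" -type f -name '*.txt' -exec wc -l {} \; | LC_ALL=C sort)
printf '%s\n' "$counts"

# -ls prints a long listing line for each match.
find "$tmp" -type f -ls

# -ok works like -exec but asks for confirmation before each command:
# find "$tmp" -type f -name '*.txt' -ok rm {} \;

rm -rf "$tmp"
```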

With what we have discussed so far, you might run into a snag if there is more than one criterion you want to search on (i.e. more than one test). Find addresses that by allowing you to combine tests using either OR (-o or -or) or AND (-a or -and). Furthermore, you can negate the result of any test (! or -not). Let's say we wanted to find all of the HTML files that were not owned by the user jimmo. Our command might look like this:

This brings up an important issue. In the section on interpreting the command, we talk about the fact that the shell expands wildcards before passing them to the command to be executed. In this example, if there were a file in the current directory ending in .html, the shell would first expand *.html to that name before passing it to find. We therefore need to "protect" it before we pass it. This is done using single quotes and the resulting command might look like this:
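A sketch of the quoted form follows. So that it runs on any system, the current user's numeric ID stands in for jimmo (find accepts a numeric ID with -user), and the HTML files are made up.

```shell
# Scratch directory with HTML files in and below the current directory.
tmp=$(mktemp -d)
cd "$tmp" || exit 1
mkdir sub
touch index.html about.html notes.txt sub/page.html

me=$(id -u)   # stand-in for the user jimmo from the text

# Unquoted, the shell would replace *.html with "about.html index.html"
# before find ever ran. The single quotes hand the pattern to find intact.
mine=$(find . -user "$me" -name '*.html' -print | LC_ALL=C sort)

# Negated test: HTML files NOT owned by that user (none in this tree).
not_mine=$(find . ! -user "$me" -name '*.html' -print)
printf '%s\n' "$mine"

cd / && rm -rf "$tmp"
```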

For details on how quoting works, check out the section on quotes.

It is important to keep in mind the order in which things are evaluated. First comes negation (! or -not), followed by AND (-a or -and), and finally OR (-o or -or). To force evaluation in a particular order, you can enclose expressions in parentheses. For example, if we wanted all of the files or directories owned by either root or bin, the command might look like this:

This requires a little explanation. I said that you would use parentheses to group the tests together. However, they are preceded here with a backslash. The reason is that the shell would otherwise see the parentheses and try to execute what is inside in a separate shell, which is not what we want.
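The grouping can be sketched like this. To stay runnable without special accounts, the current user's numeric ID and user ID 0 (root) stand in for the root and bin users named in the text, and a -type f test is added to show how grouping interacts with the implied AND.

```shell
# Scratch directory with two regular files owned by the current user.
tmp=$(mktemp -d)
cd "$tmp" || exit 1
mkdir docs
touch report.txt docs/memo.txt

me=$(id -u)   # stand-in; the text's example uses the users root and bin

# Grouped: (owned by me OR owned by uid 0) AND a regular file.
grouped=$(find . \( -user "$me" -o -user 0 \) -type f -print | LC_ALL=C sort)

# Ungrouped: AND binds tighter than OR, so -type f and -print attach only
# to the second -user test. Files that satisfy the first test are never
# printed, and here every file does, so the result is empty.
ungrouped=$(find . -user "$me" -o -user 0 -type f -print)
printf '%s\n' "$grouped"

cd / && rm -rf "$tmp"
```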

Copyright 2002-2009 by James Mohr. Licensed under modified GNU Free Documentation License (Portions of this material originally published by Prentice Hall, Pearson Education, Inc). See here for details. All rights reserved.
  
