Welcome to Linux Knowledge Base and Tutorial
"The place where you learn linux"

Files and Directories

Another key aspect of any operating system is the concept of a file. A file is nothing more than a related set of bytes on disk or other media. These bytes are labeled with a name, which is then used as a means of referring to that set of bytes. In most cases, it is through the name that the operating system is able to track down the file's exact location on the disk.

There are three kinds of files with which most people are familiar: programs, text files, and data files. However, on a UNIX system, there are other kinds of files. One of the most common is the device file, also called a device node. Under UNIX, every device is treated as a file. The operating system gains access to the hardware through the device files, which tell the system which device driver to use for that hardware.

Another kind of file is a pipe. Like a real pipe, stuff goes in one end and out the other. Some are named pipes. That is, they have a name that exists permanently in the filesystem on the hard disk. Others are temporary and are unnamed pipes. These do not exist once the processes using them have ended. On Linux, the data passing through either kind of pipe is buffered in kernel memory rather than stored on the hard disk. We'll talk more about pipes later.
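As a minimal sketch of an unnamed pipe, the following Python snippet uses os.pipe() from the standard library to create one and push a few bytes through it. The function name pipe_roundtrip is made up for this illustration, and the sketch assumes the message is small enough to fit in the pipe's buffer.

```python
import os


def pipe_roundtrip(message: bytes) -> bytes:
    """Write a small message into an unnamed pipe and read it back out."""
    read_fd, write_fd = os.pipe()              # one file descriptor per end
    try:
        os.write(write_fd, message)            # stuff goes in one end...
    finally:
        os.close(write_fd)                     # closing the write end signals end-of-file
    chunks = []
    try:
        while True:
            chunk = os.read(read_fd, 4096)     # ...and comes out the other
            if not chunk:                      # an empty read means nothing is left
                break
            chunks.append(chunk)
    finally:
        os.close(read_fd)
    return b"".join(chunks)


if __name__ == "__main__":
    print(pipe_roundtrip(b"hello through the pipe"))
```

Once both descriptors are closed, nothing of the pipe remains, which matches the temporary nature of unnamed pipes described above.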

Unlike operating systems such as DOS, UNIX neither expects nor follows any pattern for file names. DOS will not even attempt to execute programs that do not end with .EXE, .COM, or .BAT. UNIX, on the other hand, is just as happy to execute a program called program as it is a program called program.txt. In fact, you can use any character in a file name except for "/" and the NUL character (a byte with the value zero).

However, completely random things can happen if the operating system tries to execute a text file as if it were a binary program. To prevent this, UNIX has two mechanisms to ensure that text does not get randomly executed. The first is the file's permission bits. The permission bits determine who can read, write, and execute a particular file. You can see the permissions of a file by doing a long listing of that file. We will get into what the permissions mean a little later. The second is that the system must recognize a magic number within the program indicating that it is a binary executable. To see what kinds of files can be recognized by their magic numbers, take a look in /etc/magic (the exact location varies between systems). This file contains a list of file types and the information that the file command uses to determine a file's type.

Even if a file is set to allow you to execute it, the beginning portion of the file must contain the right information to tell the operating system how to start the program. If that information is missing, the shell will attempt to run the file as a shell script (similar to a DOS batch file). If the lines in the file do not belong to a shell script and you try to execute it, you end up with a screen full of errors.
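The two checks can be sketched in Python. This is an illustration only: it looks at just two well-known beginnings of a file (the ELF magic number used by Linux binaries and the "#!" line used by scripts), whereas the real system knows more formats. All function names here are hypothetical.

```python
import os
import stat
import tempfile


def describe_start_of_file(path: str) -> str:
    """Guess how a file would be started, based on its first bytes."""
    with open(path, "rb") as handle:
        head = handle.read(4)
    if head == b"\x7fELF":                     # magic number of ELF binaries
        return "ELF binary"
    if head.startswith(b"#!"):                 # interpreter line of a script
        return "script with an interpreter line"
    return "no recognized magic number"


def owner_may_execute(path: str) -> bool:
    """Check the owner's execute permission bit."""
    return bool(os.stat(path).st_mode & stat.S_IXUSR)


def demo() -> list:
    """Create three sample files in a temporary directory and inspect them."""
    samples = [
        ("binary", b"\x7fELF\x02\x01\x01", 0o644),
        ("script", b"#!/bin/sh\necho hi\n", 0o755),
        ("notes", b"just some text\n", 0o644),
    ]
    results = []
    with tempfile.TemporaryDirectory() as workdir:
        for name, content, mode in samples:
            path = os.path.join(workdir, name)
            with open(path, "wb") as handle:
                handle.write(content)
            os.chmod(path, mode)
            results.append((name, describe_start_of_file(path), owner_may_execute(path)))
    return results


if __name__ == "__main__":
    for row in demo():
        print(row)
```

Note that the sample called binary only carries the magic number; it is not a working program and has no execute bit set.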

What you name your file is up to you. You are not limited by the eight-letter name and three-letter extension as you are in DOS. You can still use periods as separators, but that's all they are. They do not have the same "special" meaning that they do under DOS. For example, you could have files called

letter.txt
letter.text
letter_txt
letter_to_jim
letter.to.jim

Only the first file example is valid under DOS, but all are valid under Linux. Note that even in older versions of UNIX where you were limited to 14 characters in a file name, all of these are still valid. With Linux, I have been able to create file names that are 255 characters long. However, such long file names are not easy to work with. Note that if you are running either Windows NT or Windows 95, you can create file names that are basically the same as with Linux.

Also keep in mind that although you can create file names with spaces in them, doing so can cause problems. Spaces are used to separate the different components on the command line. You can tell your shell to treat a name with spaces as a single unit by enclosing it in quotes. However, you need to be careful. Typically, I simply use an underscore (_) where the file name ought to have a space. It looks almost the same and I don't run into problems.
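To see how a command line gets split at spaces, here is a small sketch using Python's shlex module, which follows shell-like quoting rules. It only imitates the splitting step and is not the shell itself. The file name "my letter.txt" and the function name split_command are invented for the example.

```python
import shlex


def split_command(line: str) -> list:
    """Split a command line into words the way a POSIX-style shell would."""
    return shlex.split(line)


if __name__ == "__main__":
    # Without quotes, the name falls apart into two separate words.
    print(split_command("cat my letter.txt"))
    # With quotes, the name stays together as one unit.
    print(split_command('cat "my letter.txt"'))
    # An underscore avoids the question entirely.
    print(split_command("cat my_letter.txt"))
```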

One naming convention does have special meaning in Linux: "dot" files. In these files, the first character is a "." (dot). If you have such a file, it will by default be invisible to you. That is, when you do a listing of a directory containing a "dot" file, you won't see it.

However, unlike the DOS/Windows concept of "hidden" files, "dot" files can be seen by simply using the -a (all) option to ls, as in ls -a. (ls is a command used to list the contents of directories.) With DOS/Windows the "dir" command can show you hidden files and directories, but has no option to show these along with the others.
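The "dot" file convention is simple enough to imitate in a few lines of Python. This is a sketch of the idea, not how ls is implemented; the function names are hypothetical. One difference to be aware of: os.listdir() never returns the special entries "." and "..", whereas ls -a does show them.

```python
import os
import tempfile


def list_names(directory: str, show_all: bool = False) -> list:
    """Return the names in a directory, hiding dot files unless show_all is set."""
    names = sorted(os.listdir(directory))
    if show_all:
        return names
    return [name for name in names if not name.startswith(".")]


def demo() -> tuple:
    """Create one normal file and one dot file, then list them both ways."""
    with tempfile.TemporaryDirectory() as workdir:
        for name in ("letter.txt", ".profile"):
            with open(os.path.join(workdir, name), "w"):
                pass
        return list_names(workdir), list_names(workdir, show_all=True)


if __name__ == "__main__":
    print(demo())
```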

The ability to group your files together into some kind of organizational structure is very helpful. Instead of having to wade through thousands of files on your hard disk to find the one you want, Linux, along with other operating systems, enables you to group the files into a directory. Under Linux, a directory is actually nothing more than a file itself with a special format. It contains the names of the files associated with it and some pointers or other information to tell the system where the data for the file actually reside on the hard disk.

Directories do not actually "contain" the files that are associated with them. Physically (that is, how they exist on the disk), directories are just files in a certain format. The directory structure is imposed on them by the program you use, such as ls.

The directories have information that points to where the real files are. In comparison, you might consider a phone book. A phone book does not contain the people listed in it, just their names and telephone numbers. A directory has the same information: the names of files and their numbers. In this case, instead of a telephone number, there is an information node number, or inode number.

The logical structure in a telephone book is that names are grouped alphabetically. It is very common for two entries (names) that appear next to each other in the phone book to be in different parts of the city. Just like names in the phone book, names that are next to each other in a directory may be in distant parts of the hard disk.
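The phone book view of a directory can be made visible with Python's os.scandir(), which reports each name together with its inode number. The sketch below is only an illustration with made-up function and file names; it also compares the number stored in the directory entry with the one reported by os.stat() for the file itself.

```python
import os
import tempfile


def directory_entries(directory: str) -> list:
    """Return (name, inode number) pairs, like the lines of a phone book."""
    with os.scandir(directory) as entries:
        return sorted((entry.name, entry.inode()) for entry in entries)


def demo() -> list:
    """List a small directory and look up each file's inode a second way."""
    with tempfile.TemporaryDirectory() as workdir:
        for name in ("chris.txt", "jim.txt"):
            with open(os.path.join(workdir, name), "w") as handle:
                handle.write("hello\n")
        return [
            (name, inode, os.stat(os.path.join(workdir, name)).st_ino)
            for name, inode in directory_entries(workdir)
        ]


if __name__ == "__main__":
    for name, from_directory, from_stat in demo():
        print(name, from_directory, from_stat)
```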

As I mentioned, directories are logical groupings of files. In fact, directories are nothing more than files that have a particular structure imposed on them. It is common to say that the directory "contains" those files or the file is "in" a particular directory. In a sense, this is true. The file that is the directory "contains" the name of the file. Although this is the only connection between the directory and the file, we will continue to use this terminology. You can find more details about this in the section on files and file systems.

One kind of file is a directory. A directory can contain files and more directories. These, in turn, can contain still more files and directories. The result is a hierarchical tree structure of directories, files, more directories, and more files. A directory that contains another directory is referred to as the parent directory of the child directory, or subdirectory, that it contains. (Most references I have seen refer only to parent and subdirectories. Rarely have I seen references to child directories.)

When referring to directories under UNIX, there is often either a leading or trailing slash ("/"), and sometimes both. The top of the directory tree is referred to with a single "/" and is called the "root" directory. Subdirectories are referred to by this slash followed by their name, such as /bin or /dev. As you proceed down the directory tree, each subsequent directory is separated by a slash. The concatenation of slashes and directory names is referred to as a path. Several levels down, you might end up with a path such as /home/jimmo/letters/personal/chris.txt, where chris.txt is the actual file and /home/jimmo/letters/personal is all of the directories leading to that file. The directory /home contains the subdirectory jimmo, which contains the subdirectory letters, which contains the subdirectory personal. This directory contains the file chris.txt.
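The way a path breaks down into directories and a file name can be shown with Python's pathlib module. This sketch works purely on the text of the path, so the example file does not need to exist. The function names are hypothetical.

```python
from pathlib import PurePosixPath


def split_path(path: str) -> tuple:
    """Separate a path into the directories leading to a file and the file name."""
    parsed = PurePosixPath(path)
    return str(parsed.parent), parsed.name


def components(path: str) -> list:
    """Return every component of the path, starting with the root directory."""
    return list(PurePosixPath(path).parts)


if __name__ == "__main__":
    example = "/home/jimmo/letters/personal/chris.txt"
    print(split_path(example))
    print(components(example))
```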

Movement up and down the tree is accomplished by means of the cd (change directory) command, which is part of your shell. Although this is often difficult to grasp at first, you are not actually moving anywhere. One of the things that the operating system keeps track of within the context of each process is the process's current directory, also referred to as the current working directory. This is merely the name of a directory on the system. Your process has no physical contact with this directory; it is just keeping the directory name in memory.

When you change directories, this portion of the process memory is changed to reflect your new "location." You can "move" up and down the tree or make jumps to completely unrelated parts of the directory tree. However, all that really happens is that the current working directory portion of your process gets changed.
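A short Python sketch shows that the current working directory is simply a piece of per-process state that can be changed and changed back. The functions os.getcwd() and os.chdir() are part of the standard library; the names visit and demo are made up for this example.

```python
import os
import tempfile


def visit(directory: str) -> str:
    """Change into a directory, report where the process 'is', and change back."""
    original = os.getcwd()                 # remember the current working directory
    os.chdir(directory)                    # only this process's notion of "here" changes
    try:
        return os.getcwd()
    finally:
        os.chdir(original)                 # "move" back to where we started


def demo() -> tuple:
    """Visit a temporary directory and confirm we end up where we began."""
    before = os.getcwd()
    with tempfile.TemporaryDirectory() as workdir:
        inside = visit(workdir)
        arrived = inside == os.path.realpath(workdir)
    return arrived, os.getcwd() == before


if __name__ == "__main__":
    print(demo())
```

Nothing on the disk is touched by the change of directory itself; other processes keep their own current directories.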

Although there can be many files with the same name, each combination of directories and file name must be unique. This is because the operating system refers to every file on the system by this unique combination of directories and file name. In the example above, I have a personal letter called chris.txt. I might also have a business letter by the same name. Its path (or the combination of directories and file name) would be /home/jimmo/letters/business/chris.txt. Someone else named John might also have a business letter to Chris. John's path might be /home/john/letters/business/chris.txt. This might look something like this:

[Image: Example of home directories.]

One thing to note is that John's business letter to Chris may be the exact same file as Jim's. I am not talking about one being a copy of the other. Rather, I am talking about a situation where both names point to the same physical locations on the hard disk. Because both files are referencing the same bits on the disk, they must therefore be the same file.

This is accomplished through the concept of a link. Like a chain link, a file link connects two pieces together. I mentioned above the "telephone number" for a file was its inode. This number actually points to a special place on the disk called the inode table, with the inode number being the offset into this table. Each entry in this table not only contains the file's physical location on this disk, but the owner of the file, the access permissions, and the number of links, as well as many other things. In the case where the two files are referencing the same entry in the inode table, these are referred to as hard links. A soft link or symbolic link is where a file is created that contains the path of the other file. We will get into the details of this later.

An inode does not contain the name of a file. The name is only contained within the directory. Therefore, it is possible to have multiple directory entries that have the same inode, just as there can be multiple entries in the phone book, all with the same phone number. We'll get into a lot more detail about inodes in the section on filesystems. A directory and where the inodes point to on the hard disk might look like this:

[Image: The relationship between file names, inodes and physical data on your hard disk.]

Let's think about the telephone book analogy once again. Although it is not common for an individual to have multiple listings, there might be two people with the same number. For example, if you were sharing a house with three of your friends, there might be only one telephone. However, each of you would have an entry in the phone book. I could get the same phone to ring by dialing the telephone number of four different people. I could also get to the same inode with four different file names.
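Hard and symbolic links can be tried out with os.link() and os.symlink() from Python's standard library. The following is a sketch that works in a temporary directory; the file names and the demo function are invented for the illustration.

```python
import os
import tempfile


def demo() -> dict:
    """Create a file, a hard link and a symbolic link, then compare their inodes."""
    with tempfile.TemporaryDirectory() as workdir:
        original = os.path.join(workdir, "chris.txt")
        hard = os.path.join(workdir, "chris_hard.txt")
        soft = os.path.join(workdir, "chris_soft.txt")
        with open(original, "w") as handle:
            handle.write("Dear Chris\n")
        os.link(original, hard)            # second name for the same inode
        os.symlink(original, soft)         # separate file that stores the path
        return {
            "same_inode": os.stat(original).st_ino == os.stat(hard).st_ino,
            "link_count": os.stat(original).st_nlink,
            "symlink_has_own_inode": os.lstat(soft).st_ino != os.stat(original).st_ino,
            "symlink_stores_path": os.readlink(soft) == original,
        }


if __name__ == "__main__":
    print(demo())
```

The link count of 2 reflects the two names pointing at one inode; the symbolic link does not add to it because it only records a path.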

Under Linux, files and directories are grouped into units called filesystems. A filesystem is a portion of your hard disk that is administered as a single unit. Filesystems exist within a section of the hard disk called a partition. Each hard disk can be broken down into multiple partitions and the filesystem is created within the partition. Each has specific starting and ending points that are managed by the system. (Note: Some dialects of UNIX allow multiple filesystems within a partition.)

When you create a filesystem under Linux, this is comparable to formatting the partition under DOS. The filesystem structure is laid out and a table is created to tell you where the actual data are located. This table, called the inode table in UNIX, is where almost all the information related to the file is kept.
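Some of the information kept in the inode can be read with os.stat() in Python. This sketch picks out a handful of fields (inode number, permissions, owner, link count, and size); the function names and the sample file are hypothetical.

```python
import os
import stat
import tempfile


def inode_summary(path: str) -> dict:
    """Collect a few of the fields the system keeps about a file."""
    info = os.stat(path)
    return {
        "inode": info.st_ino,
        "permissions": stat.filemode(info.st_mode),   # same form as a long listing
        "owner_uid": info.st_uid,
        "links": info.st_nlink,
        "size": info.st_size,
    }


def demo() -> dict:
    """Create a five-byte file with known permissions and summarize it."""
    with tempfile.TemporaryDirectory() as workdir:
        path = os.path.join(workdir, "letter.txt")
        with open(path, "w") as handle:
            handle.write("hello")
        os.chmod(path, 0o640)              # owner read/write, group read
        return inode_summary(path)


if __name__ == "__main__":
    print(demo())
```

Notice that the file's name is not among the fields: as described earlier, the name lives only in the directory.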

In an operating system such as Linux, a file is more than just the basic unit of data. Instead, almost everything is either treated as a file or is only accessed through files. For example, to read the contents of a data file, the operating system must access the hard disk. Linux treats the hard disk as if it were a file. It opens it like a file, reads it like a file, and closes it like a file. The same applies to other hardware such as tape drives and printers. Even memory is treated as a file. The files used to access the physical hardware are the device files that I mentioned earlier.

When the operating system wants to access any hardware device, it first opens a file that "points" toward that device (the device node). Based on information it finds in the inode, the operating system determines what kind of device it is and can therefore access it in the proper manner. This includes opening, reading, and closing, just like any other file.
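The kind of device behind a device node can be read from its inode with os.stat(). The sketch below assumes a UNIX-like system where /dev/null exists; the function names are made up. The major and minor numbers it reports are what lead the kernel to the right driver, and their values differ between systems, so the example does not assume any particular numbers.

```python
import os
import stat


def device_kind(path: str) -> str:
    """Report whether a path refers to a character device, a block device, or neither."""
    mode = os.stat(path).st_mode
    if stat.S_ISCHR(mode):
        return "character device"
    if stat.S_ISBLK(mode):
        return "block device"
    return "not a device"


def device_numbers(path: str) -> tuple:
    """Return the (major, minor) numbers recorded for a device node."""
    rdev = os.stat(path).st_rdev
    return os.major(rdev), os.minor(rdev)


if __name__ == "__main__":
    print(device_kind("/dev/null"), device_numbers("/dev/null"))
    # A device node is opened, written, and closed just like any other file.
    with open("/dev/null", "w") as sink:
        sink.write("this text is discarded\n")
```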

If, for example, you are reading a file from the hard disk, not only do you have the file open that you are reading, but the operating system has opened the file that relates to the filesystem within the partition, the partition on the hard disk, and the hard disk itself (more about these later). Three additional files are opened every time you log in or start a shell. These are the files that relate to input, output, and error messages.

Normally, when you log in, you get to a shell prompt. When you type a command on the keyboard and press Enter, a moment later something comes onto your screen. If you made a mistake or the program otherwise encountered an error, there will probably be some message on your screen to that effect. The keyboard where you are typing in your data is the input, referred to as standard input (standard in or stdin), and that is where input comes from by default. The program displays a message on your screen, which is the output, referred to as standard output (standard out or stdout). Although it appears on that same screen, the error message appears on standard error (stderr).
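That standard output and standard error are separate streams, even though both normally end up on the same screen, can be shown with a small Python sketch. The function report writes one line to each stream, and capture_streams temporarily redirects both so they can be looked at separately. Both names are hypothetical.

```python
import contextlib
import io
import sys


def report(value: int) -> None:
    """Write a normal message to stdout and an error message to stderr."""
    print(f"result: {value}")                                    # standard output
    print("warning: this goes to standard error", file=sys.stderr)


def capture_streams(func, *args) -> tuple:
    """Run func with stdout and stderr redirected into separate buffers."""
    out, err = io.StringIO(), io.StringIO()
    with contextlib.redirect_stdout(out), contextlib.redirect_stderr(err):
        func(*args)
    return out.getvalue(), err.getvalue()


if __name__ == "__main__":
    report(42)                             # both lines appear on the same screen
    print(capture_streams(report, 42))     # but they travel on different streams
```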

Although stdin and stdout appear to be separate physical devices (keyboard and monitor), there is only one connection to the system. This is one of those device files I talked about a moment ago. When you log in, the file (device) is opened for both reading, so you can get data from the keyboard, and writing, so that output can go to the screen and you can see the error messages.

These three concepts (standard in, standard out, and standard error) may be somewhat difficult to understand at first. At this point, it suffices to understand that these represent input, output, and error messages. We'll get into the details a bit later.


Copyright 2002-2009 by James Mohr. Licensed under modified GNU Free Documentation License (Portions of this material originally published by Prentice Hall, Pearson Education, Inc). See here for details. All rights reserved.
  



