Disk Layout

Originally, you could only get four partitions on a hard disk. Though this was not a major issue at first, as people got larger hard disks, there was a greater need to break things down in a certain structure. In addition, certain OSes needed a separate space on which to swap. If you had one partition for your root file system, one for swap, one for user information, and one for common data, you would have just run out.

To solve the problem and still maintain backward compatibility, DOS-based machines were able to create an extended partition that contained logical partitions within it. Other systems, like SCO, allow you to have multiple file systems within a single partition to overcome the limitation.

To be able to access data on your hard disk, there has to be some pre-defined structure. Without structure, the unorganized data end up looking like my desk, where there are several piles of papers that I have to look through to find what I am looking for. Instead, the layout of a hard disk follows a very consistent pattern, so consistent that it is even possible for different operating systems to share the hard disk.

Basic to this structure is the concept of a partition. A partition defines a portion of the hard disk to be used by one operating system or another. The partition can be any size, even the entire hard disk. Near the very beginning of the disk is the partition table. The partition table fits inside the first 512-byte sector of the disk but can still define where each partition begins and how large it is. In addition, the partition table indicates which of the partitions is active. This decides which partition the system should go to when looking for an operating system to boot. The partition table is outside of any partition.
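To make the partition table a little more concrete, here is a minimal sketch in Python that decodes one. It assumes the classic PC layout, in which four 16-byte entries sit at offset 446 of the first 512-byte sector and the sector ends with the signature bytes 0x55 0xAA. The helper names and the demo values (an active partition of type 0x83 starting at sector 63) are made up for illustration, and the sector is built in memory rather than read from a real disk.

```python
import struct

PARTITION_TABLE_OFFSET = 446   # assumed classic PC layout: table starts here
ENTRY_SIZE = 16                # four entries of 16 bytes each
SECTOR_SIZE = 512


def parse_partition_table(sector: bytes):
    """Return the used primary partition entries of a 512-byte boot sector."""
    if len(sector) != SECTOR_SIZE or sector[510:512] != b"\x55\xaa":
        raise ValueError("not a valid boot sector")
    partitions = []
    for i in range(4):
        start = PARTITION_TABLE_OFFSET + i * ENTRY_SIZE
        entry = sector[start:start + ENTRY_SIZE]
        # Layout: active flag, 3 bytes skipped, type, 3 bytes skipped,
        # first sector (32-bit), number of sectors (32-bit), little-endian.
        flag, ptype, first_sector, num_sectors = struct.unpack("<B3xB3xII", entry)
        if ptype == 0:          # type 0 marks an unused slot
            continue
        partitions.append({
            "number": i + 1,
            "active": flag == 0x80,
            "type": ptype,
            "first_sector": first_sector,
            "num_sectors": num_sectors,
        })
    return partitions


def make_demo_sector():
    """Build a synthetic boot sector with one active partition (made-up values)."""
    sector = bytearray(SECTOR_SIZE)
    entry = struct.pack("<B3xB3xII", 0x80, 0x83, 63, 204800)
    sector[PARTITION_TABLE_OFFSET:PARTITION_TABLE_OFFSET + ENTRY_SIZE] = entry
    sector[510:512] = b"\x55\xaa"
    return bytes(sector)


if __name__ == "__main__":
    for part in parse_partition_table(make_demo_sector()):
        print(part)
```

The "active" flag is the piece the boot code looks at when deciding which partition to start from.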

Once the system has determined which partition is active, the CPU knows to go to the very first block of data within that partition and begin executing the instructions there. However, if LILO is set up to run out of your master boot block, it doesn’t care about the active partition. It does what you tell it.

Typically, special control structures that impose an additional structure are created at the beginning of the partition. This structure makes the partition a file system.

There are two control structures at the beginning of the file system: the superblock and the inode table. The superblock contains information about the type of file system, its size, how many data blocks there are, the number of free inodes, free space available, and where the inode table is. On the ext2 filesystem, copies of the superblock are stored at regular intervals for efficiency and in case the original gets trashed.
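As a rough illustration of what the superblock holds, the sketch below decodes a handful of fields from an ext2 superblock. It assumes the usual ext2 on-disk layout: the primary superblock begins 1,024 bytes into the partition, starts with seven little-endian 32-bit counters, and carries the magic number 0xEF53 at byte offset 56. The function names and the demo values are invented for the example, and the superblock is built in memory rather than read from a device.

```python
import struct

SUPERBLOCK_OFFSET = 1024   # where the primary superblock starts in the partition
EXT2_MAGIC = 0xEF53


def parse_superblock(raw: bytes):
    """Decode a few fields from the start of an ext2 superblock."""
    (inodes_count, blocks_count, _reserved_blocks, free_blocks, free_inodes,
     first_data_block, log_block_size) = struct.unpack_from("<7I", raw, 0)
    (magic,) = struct.unpack_from("<H", raw, 56)
    if magic != EXT2_MAGIC:
        raise ValueError("not an ext2 superblock")
    return {
        "inodes_count": inodes_count,
        "blocks_count": blocks_count,
        "free_blocks": free_blocks,
        "free_inodes": free_inodes,
        "first_data_block": first_data_block,
        # The block size is stored as a shift count relative to 1,024 bytes.
        "block_size": 1024 << log_block_size,
    }


def make_demo_superblock():
    """Build a synthetic superblock with made-up counters."""
    raw = bytearray(1024)
    struct.pack_into("<7I", raw, 0, 2048, 8192, 409, 7000, 2000, 1, 0)
    struct.pack_into("<H", raw, 56, EXT2_MAGIC)
    return bytes(raw)


if __name__ == "__main__":
    print(parse_superblock(make_demo_superblock()))
```

To try this against a real file system image, you would seek to SUPERBLOCK_OFFSET in the image and read 1,024 bytes before calling parse_superblock.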

Many users are not aware that different file systems reside on different parts of the hard disk and, in many cases, on different physical disks. From the user’s perspective, the entire directory structure is one unit from the top (/) down to the deepest subdirectory. To carry out this deception, the system administrator needs to mount file systems by mounting the device node associated with the file system (e.g., /dev/home) onto a mountpoint (e.g., /home). This can be done either manually, with the mount command, or by having the system do it for you when it boots. This is done with entries in /etc/fstab.

Conceptually, the mountpoint serves as a detour sign for the system. If there is no file system mounted on the mountpoint, the system can just drive through and access what’s there. If a file system is mounted, when the system gets to the mountpoint, it sees the detour sign and immediately diverts in another direction. Just as roads, trees, and houses still exist on the other side of the detour sign, any file or directory that exists underneath the mountpoint is still there. You just can’t get to it.

Let’s look at an example. You have created a file system on the first partition of your second hard disk, so the device node would be /dev/hdb1. You want to mount this file system onto the directory /home. Let’s say that when you first installed the system and before you first mounted the /dev/hdb1 file system, you created some users with their home directories in /home. For example, /home/jimmo. When you do finally mount the /dev/hdb1 file system onto the /home directory, you no longer see /home/jimmo. It is still there, but once the system reaches the /home directory, it is redirected somewhere else.
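The detour idea can be modeled in a few lines of Python. This is only a sketch: it parses text in the whitespace-separated style of /etc/fstab (device, mountpoint, type, and so on) and then picks the mounted file system whose mountpoint is the longest matching prefix of a path. The root device /dev/hda1 and the function names are assumptions made up for the example; the kernel’s real path lookup works quite differently.

```python
def parse_fstab(text):
    """Return (device, mountpoint, fstype) for each non-comment line."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        entries.append((fields[0], fields[1], fields[2]))
    return entries


def filesystem_for(path, entries):
    """Pick the entry whose mountpoint is the longest prefix of path."""
    best = None
    for device, mountpoint, fstype in entries:
        prefix = mountpoint.rstrip("/") + "/"
        if path == mountpoint or path.startswith(prefix):
            if best is None or len(mountpoint) > len(best[1]):
                best = (device, mountpoint, fstype)
    return best


# Hypothetical table: /dev/hda1 as the root device is an assumption.
DEMO_FSTAB = """\
# device     mountpoint  type  options   dump  pass
/dev/hda1    /           ext2  defaults  1     1
/dev/hdb1    /home       ext2  defaults  1     2
"""

if __name__ == "__main__":
    entries = parse_fstab(DEMO_FSTAB)
    for path in ("/etc/passwd", "/home/jimmo"):
        print(path, "->", filesystem_for(path, entries)[0])
```

With /dev/hdb1 in the table, anything under /home is served from the second disk, which is why a /home/jimmo directory created earlier on the root file system drops out of sight.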

The way Linux accesses its file systems is different from what a lot of people are accustomed to. Let’s consider what happens when you open a file. All the program needs to know is the name of the file, which it tells the operating system, which then has to convert it to a physical location on the disk. This usually means converting it to an inode first.

Because the conversion between a file name and the physical location on the disk will be different for different file system types, Linux has implemented a concept called the Virtual File System (VFS) layer. When a program makes a system call that accesses the file system (such as open), the kernel actually calls a function within the VFS layer. It is then the VFS’s responsibility to call the file-system-specific code to access the data. The figure below shows what this looks like graphically.

Figure – File System Layers

Because it has to interact with every file system type, the VFS has a set of functions that every file system implements. It has to know about all the normal operations that occur on a file such as opening, reading, closing, etc., as well as know about file system structures, such as inodes.
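The idea of a common set of functions that every file system implements can be sketched as a toy model in Python. Nothing here is kernel code: the class names, the two operations, and the sample files are all invented for the illustration. The point is only that the caller talks to one layer, and that layer hands the work to whichever file-system-specific code is registered.

```python
class FileSystem:
    """Operations every file system type must provide (toy model)."""

    def lookup(self, name):
        raise NotImplementedError

    def read(self, inode):
        raise NotImplementedError


class ToyExt2(FileSystem):
    """Keeps a name-to-inode table and the data per inode."""

    def __init__(self):
        self.names = {"motd": 12}
        self.data = {12: b"welcome\n"}

    def lookup(self, name):
        return self.names[name]

    def read(self, inode):
        return self.data[inode]


class ToyFat(FileSystem):
    """Stores files in a flat list and uses the list index as a stand-in inode."""

    def __init__(self):
        self.files = [("README.TXT", b"hello\n")]

    def lookup(self, name):
        for index, (fname, _) in enumerate(self.files):
            if fname == name:
                return index
        raise FileNotFoundError(name)

    def read(self, inode):
        return self.files[inode][1]


class VFS:
    """Dispatches generic requests to the file system mounted at a mountpoint."""

    def __init__(self):
        self.mounts = {}

    def mount(self, mountpoint, fs):
        self.mounts[mountpoint] = fs

    def read_file(self, mountpoint, name):
        fs = self.mounts[mountpoint]
        return fs.read(fs.lookup(name))


if __name__ == "__main__":
    vfs = VFS()
    vfs.mount("/", ToyExt2())
    vfs.mount("/dos", ToyFat())
    print(vfs.read_file("/", "motd"))
    print(vfs.read_file("/dos", "README.TXT"))
```

The caller of read_file never needs to know which kind of file system sits behind a mountpoint, which is the property the VFS layer is there to provide.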

If you want more details, there is a whole section on the VFS.

To address certain problems, the Second Extended File System (ext2fs) was developed. This is an enhanced version of the Extended File System (extfs). The ext2fs was designed to fix some problems in the extfs, as well as add some features. Linux supports a large number of other file systems, but as of this writing, the ext2fs seems to be the most common. In the following discussion we will be talking specifically about the ext2fs in order to explain how inodes work. Although the details are specific to the ext2fs, the concepts apply to many other file systems.

Among other things that the inode keeps track of are file types and permissions, number of links, owner and group, size of the file, and when it was last modified. In the inode, you will also find 15 pointers to the actual data on the hard disk.
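Most of this bookkeeping is visible from user space. As a small sketch, the Python function below uses the standard os.stat call to read back the fields just listed for a file. The function name and the dictionary keys are my own choices for the example; the block pointers themselves are not exposed through this interface.

```python
import os
import stat
import tempfile
import time


def describe(path):
    """Return the inode information that os.stat exposes for path."""
    st = os.stat(path)
    return {
        "inode": st.st_ino,
        "type_and_permissions": stat.filemode(st.st_mode),
        "links": st.st_nlink,
        "owner_uid": st.st_uid,
        "group_gid": st.st_gid,
        "size": st.st_size,
        "modified": time.ctime(st.st_mtime),
    }


if __name__ == "__main__":
    # Use a throwaway file so the example does not depend on any real path.
    with tempfile.NamedTemporaryFile() as tmp:
        tmp.write(b"hello")
        tmp.flush()
        for key, value in describe(tmp.name).items():
            print(f"{key}: {value}")
```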

Note that these are pointers to the data and not the data itself. Each one of the 15 pointers to the data is a block address on the hard disk. For the following discussion, please refer to the figure below.

Figure – Inodes Pointing to Disk Blocks

Each of these blocks is 1,024 bytes. Therefore, the maximum file size on a Linux system is 15KiB. Wait a minute! That doesn’t sound right, does it? It isn’t. If (and that’s a big if) all of these pointers pointed to data blocks, then you could only have a file up to 15KiB. However, dozens of files in the /bin directory alone are larger than 15KiB. How’s that?

The answer is that only 12 of these pointers actually point to data, so there is really only 12KiB that you can access directly. These are referred to as data blocks or direct data blocks. The thirteenth pointer points to a block on the hard disk outside of the inode table that actually contains the real pointers to the data. These are the indirect data blocks. Each pointer is a 4-byte value, so a 1,024-byte block holds 256 of them. In the figure above, the thirteenth entry is a pointer to block 567. Block 567 contains 256 pointers to indirect data blocks. One of these pointers points to block 33453, which contains the actual data. Block 33453 is an indirect data block.

Because each of the data blocks that the 256 pointers in block 567 point to contains 1,024 bytes of data, there is an additional 256KiB of data. So, with 12KiB for the direct data blocks and 256KiB for the indirect data blocks, we now have a maximum file size of 268KiB.

Hmmm. Still not good. There are files on your system larger than that. So that brings us to the fourteenth pointer. This points not to data blocks, not to a block of pointers to data blocks, but to blocks that point to blocks that point to data blocks. These are the double-indirect data blocks.

In the figure, the fourteenth pointer contains a pointer to block 5601. Block 5601 contains pointers to other blocks, one of which is block 5151. However, block 5151 does not contain data, but even more pointers. One of these pointers points to block 56732, and it is block 56732 that finally contains the data.

We have a block of 256 entries, each of which points to a block that contains 256 pointers to 1,024-byte data blocks. This gives us 64MiB, just for the double-indirect data blocks. At this point, the additional size gained by the single-indirect and direct data blocks is negligible. Therefore, let’s just say we can access more than 64MiB. Now, that’s much better. You would be hard-pressed to find a system with files larger than 64MiB (unless we are talking about large database applications). However, we’re not through yet. We have one pointer left.

So, not to bore you with too many of the details, let’s do the math quickly. The last pointer points to a block containing 256 pointers to other blocks, each of which points to 256 other blocks. At this point, we already have 65,536 blocks. Each of these 65,536 blocks contains 256 pointers to the actual data blocks. Here we have 16,777,216 pointers to data blocks, which gives us a grand total of 17,179,869,184 bytes, or 16GiB, of data (plus the comparatively insignificant amount we get from the double-indirect data blocks). As you might have guessed, these are the triple-indirect data blocks.

In the figure, the fifteenth pointer contains a pointer to block 43. That block contains 256 pointers, one of which points to block 1979. Block 1979 also contains 256 pointers, one of which points to block 988. Block 988 also contains 256 pointers, though these pointers point to the actual data, for example, block 911.
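The walk through the direct, single-, double-, and triple-indirect pointers can be summarized as a small lookup. The sketch below takes the number of a block within a file (counting from 0) and reports which kind of pointer reaches it and which slot is followed at each level. It assumes 12 direct pointers and 1,024-byte blocks with 4-byte pointers, so 256 pointers per block. The function name is invented for the example, and the code only does the arithmetic; it does not read a real inode.

```python
DIRECT_POINTERS = 12


def locate_block(n, ptrs_per_block=256):
    """Return (level, slot indices) for block n of a file, counting from 0."""
    if n < 0:
        raise ValueError("block number must not be negative")
    if n < DIRECT_POINTERS:
        return ("direct", [n])
    n -= DIRECT_POINTERS
    if n < ptrs_per_block:
        # One extra hop: the slot inside the single-indirect block.
        return ("single-indirect", [n])
    n -= ptrs_per_block
    if n < ptrs_per_block ** 2:
        # Two hops: slot in the first block, then slot in the second.
        return ("double-indirect", list(divmod(n, ptrs_per_block)))
    n -= ptrs_per_block ** 2
    if n < ptrs_per_block ** 3:
        first, rest = divmod(n, ptrs_per_block ** 2)
        second, third = divmod(rest, ptrs_per_block)
        return ("triple-indirect", [first, second, third])
    raise ValueError("block number beyond what the 15 pointers can address")


if __name__ == "__main__":
    for block in (0, 11, 12, 268, 65804):
        print(block, locate_block(block))
```

Block 12 is the first one that needs the thirteenth pointer, block 268 (12 + 256) is the first one reached through the fourteenth, and block 65,804 (268 + 65,536) is the first one reached through the fifteenth.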

If we increase the block size to 4KiB (4,096 bytes), we end up with more pointers in each of the indirect blocks (1,024 instead of 256), so they can point to more blocks. In the end, the pointers could address files of about 4TiB. However, because the size field in the inode is a 32-bit value, we max out at 4GiB.
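The arithmetic for different block sizes is easy to check with a few lines of Python. This sketch assumes 12 direct pointers, 4-byte block pointers, and a 32-bit size field, as described in the text; the function name and parameters are made up for the example.

```python
def max_file_size(block_size, pointer_size=4, direct=12, size_field_bits=32):
    """Return (bytes addressable by the pointers, limit after the size field)."""
    ptrs = block_size // pointer_size            # pointers per indirect block
    blocks = direct + ptrs + ptrs ** 2 + ptrs ** 3
    by_pointers = blocks * block_size
    by_size_field = 2 ** size_field_bits - 1     # largest value the field holds
    return by_pointers, min(by_pointers, by_size_field)


if __name__ == "__main__":
    for size in (1024, 4096):
        by_pointers, effective = max_file_size(size)
        print(f"{size}-byte blocks: {by_pointers} bytes by pointers, "
              f"{effective} bytes with a 32-bit size field")
```

For 1,024-byte blocks, the pointers alone reach a little over 16GiB, and for 4,096-byte blocks a little over 4TiB. In both cases the 32-bit size field is the limit that actually applies.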

If you want more details, there is a whole section on the ext2fs.

Linux’s support for file systems is without a doubt the most extensive of any operating system. In addition to “standard Linux” file systems, there is also support for FAT, VFAT, ISO9660 (CD-ROM), NFS, plus file systems mounted from Windows machines using Samba (via the SMB protocol). Is that all? Nope! There are also drivers to support several compressed formats such as Stacker and DoubleSpace. The driver for the Windows NT file system (NTFS) can even circumvent that “annoying” security.

Warning: I have seen Linux certification prep books that talk about the inode being a “unique” number. This can be extremely misleading. While it is true that any given inode will only appear once in the inode table, this does not mean that multiple files cannot have the same inode. If they do, then they point to the same data on the hard disk, despite having different names.
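You can see two names sharing one inode for yourself. The sketch below creates a file in a temporary directory, gives it a second name with a hard link (os.link), and compares the inode numbers reported by os.stat. The file names are invented for the example, and it assumes the temporary directory lives on a file system that supports hard links.

```python
import os
import tempfile


def same_inode_demo():
    """Create a file plus a hard link and report what the two names share."""
    with tempfile.TemporaryDirectory() as tmpdir:
        original = os.path.join(tmpdir, "report.txt")
        alias = os.path.join(tmpdir, "summary.txt")
        with open(original, "w") as f:
            f.write("shared data\n")
        os.link(original, alias)      # second name for the same inode
        first, second = os.stat(original), os.stat(alias)
        with open(alias) as f:
            content = f.read()
        return first.st_ino == second.st_ino, first.st_nlink, content


if __name__ == "__main__":
    same, links, content = same_inode_demo()
    print("same inode:", same)
    print("link count:", links)
    print("content read through the second name:", content, end="")
```

Both names report the same inode number, the link count goes up to 2, and the data written under the first name can be read through the second.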