The EXT2 File System

The Second Extended File system (EXT2)

Figure: Physical Layout of the EXT2 File system

The Second Extended File system was devised (by Rémy Card) as an extensible and powerful file system for Linux. It is also the most successful file system so far in the Linux community and is the basis for all of the currently shipping Linux distributions.

The EXT2 file system, like many file systems, is built on the premise that the data held in files is kept in data blocks. These data blocks are all of the same length and, although that length can vary between different EXT2 file systems, the block size of a particular EXT2 file system is set when it is created (using mke2fs). Every file’s size is rounded up to an integral number of blocks. If the block size is 1024 bytes, then a file of 1025 bytes will occupy two 1024 byte blocks. Unfortunately this means that on average you waste half a block per file. Usually in computing you trade off CPU usage against memory and disk space utilisation. In this case Linux, along with most operating systems, accepts relatively inefficient disk usage in order to reduce the workload on the CPU.

Not all of the blocks in the file system hold data; some must be used to contain the information that describes the structure of the file system. EXT2 defines the file system topology by describing each file in the system with an inode data structure. An inode describes which blocks the data within a file occupies as well as the access rights of the file, the file’s modification times and the type of the file. Every file in the EXT2 file system is described by a single inode and each inode has a single unique number identifying it. The inodes for the file system are all kept together in inode tables. EXT2 directories are simply special files (themselves described by inodes) which contain pointers to the inodes of their directory entries.
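The block rounding described above is simple arithmetic, and a short sketch makes the wasted space visible. The function names here are illustrative only and are not part of any EXT2 tool; the default of 1024 bytes is taken from the example in the text.

```python
def blocks_needed(file_size: int, block_size: int = 1024) -> int:
    """Number of whole blocks needed to hold file_size bytes."""
    # Ceiling division: round up to an integral number of blocks.
    return -(-file_size // block_size)


def wasted_bytes(file_size: int, block_size: int = 1024) -> int:
    """Unused bytes at the end of the last block allocated to the file."""
    return blocks_needed(file_size, block_size) * block_size - file_size


if __name__ == "__main__":
    print(blocks_needed(1025))  # 2
    print(wasted_bytes(1025))   # 1023
```

A 1025 byte file therefore leaves 1023 bytes of its second block unused, which is the worst case for a 1024 byte block size.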

The figure above shows the layout of the EXT2 file system as occupying a series of blocks in a block structured device. So far as each file system is concerned, block devices are just a series of blocks that can be read and written. A file system does not need to concern itself with where on the physical media a block should be put; that is the job of the device’s driver. Whenever a file system needs to read information or data from the block device containing it, it requests that its supporting device driver read an integral number of blocks. The EXT2 file system divides the logical partition that it occupies into Block Groups.

Each group duplicates information critical to the integrity of the file system as well as holding real files and directories as blocks of information and data. This duplication is necessary should a disaster occur and the file system need recovering. The subsections describe in more detail the contents of each Block Group.

One benefit of the ext2fs over the extfs is the size of the file systems that can be managed. Currently (after some enhancements in the VFS layer), the ext2fs can access file systems as large as 4TB. In contrast to other UNIX systems, the ext2fs uses variable-length directory entries and can have file names that are as long as 255 characters.

When creating the file system, the ext2fs enables you to choose what size block you want. Using larger blocks will speed up data transfer because the disk head does not need to seek as often. However, if you have a lot of small files, a larger block size means you waste more space.

Also to speed up access, the ext2fs uses a technique called a “fast symbolic link.” On many UNIX systems, the path to which a symbolic link points is stored in a file of its own. This means that each time a file is accessed through a symbolic link, the disk is accessed to get the inode of the link, the path is read out of the link’s file, the inode of the target needs to be read, and only then can the actual file be accessed.

With a fast symbolic link, the path to the file is stored in the inode itself. This not only speeds up access but also saves the disk space that a separate file would otherwise take. The only drawback is that when the path to the real file has 60 or more characters, it cannot fit in the inode and must sit in a file. Therefore, if you are using symbolic links and want to increase performance, make sure the path has fewer than 60 characters.
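The rule above can be expressed as a one-line check. This is only a sketch: it assumes the limit is counted in bytes of the encoded path and uses the “fewer than 60” threshold from the text. The name is_fast_symlink is hypothetical and does not correspond to any kernel or library function.

```python
# Space available for a link target inside the inode, per the text above.
FAST_SYMLINK_LIMIT = 60


def is_fast_symlink(target: str) -> bool:
    """True if the target path is short enough to be stored in the inode."""
    # Assumption: the limit applies to the encoded byte length of the path.
    return len(target.encode("utf-8")) < FAST_SYMLINK_LIMIT


if __name__ == "__main__":
    print(is_fast_symlink("/usr/bin/python"))  # True
    print(is_fast_symlink("/" + "x" * 80))     # False
```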

Another advantage of the ext2fs is its reliability. The ext2fs is made up of what are called “block groups.” Each block group holds a copy of the superblock and of the block group descriptors, as well as a block bitmap, an inode bitmap, a piece of the inode table, and data blocks.

The block group descriptor also contains an entry that holds the number of directories within the block group. When creating a new directory, the system will try to put the directory into the block group with the fewest directories. This spreads directories evenly across the disk, which makes accessing any one directory quicker.
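The placement rule above can be sketched as a search for the minimum directory count. This is a deliberately simplified model and not the real allocator, which weighs more factors. The GroupDescriptor class, its field names, and the requirement that the chosen group still has a free inode are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class GroupDescriptor:
    """Hypothetical, simplified view of one block group descriptor."""
    index: int
    used_dirs_count: int    # number of directories in this group
    free_inodes_count: int  # assumed: the group needs room for a new inode


def pick_group_for_new_directory(groups: list[GroupDescriptor]) -> int:
    """Return the index of the group with the fewest directories."""
    candidates = [g for g in groups if g.free_inodes_count > 0]
    if not candidates:
        raise OSError("no block group has a free inode")
    # min() returns the first group on a tie, so lower indexes win.
    return min(candidates, key=lambda g: g.used_dirs_count).index


if __name__ == "__main__":
    groups = [
        GroupDescriptor(0, used_dirs_count=5, free_inodes_count=10),
        GroupDescriptor(1, used_dirs_count=2, free_inodes_count=0),
        GroupDescriptor(2, used_dirs_count=3, free_inodes_count=4),
    ]
    print(pick_group_for_new_directory(groups))  # 2
```

Group 1 has the fewest directories but no free inode, so the sketch falls back to group 2.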

Because the block groups contain copies of the primary control structures, the file system can be repaired from these copies should the superblock at the start of the disk get corrupted. In addition, because the inode table is spread out across the disk, you have to search less. Plus, the distance between the inode table and the data blocks is reduced, thereby increasing performance even further.

There’s still more! The ext2fs will preallocate up to eight adjacent blocks when it allocates a block for a file. This gives the file a little room to grow. By preallocating the blocks, you have a file that is located in the same area of the disk. This speeds up all sequential accesses.

The directory entries in the ext2fs form a singly linked list, as compared to an array with fixed entry lengths on some systems. Within each directory entry, you will find the name of the file as well as the inode number. Note that this is the only place where the name of the file appears. In addition, there’s a field that holds the total length of the record in bytes (which is always a multiple of 4); it is used to calculate the start of the next entry. Therefore, no explicit pointers are needed, unlike in other linked lists.
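To make the “linked list without pointers” idea concrete, the following sketch walks a directory block by adding each record length to the current offset. The exact byte layout used here (a 4-byte inode number, a 2-byte record length, a 1-byte name length and a 1-byte file type, all little-endian, followed by the name) is an assumption that the text above does not spell out. The helpers pack_entry and walk_entries are hypothetical names.

```python
import struct

# Assumed on-disk header: inode (u32), rec_len (u16), name_len (u8), file_type (u8).
HEADER = struct.Struct("<IHBB")


def pack_entry(inode: int, name: str) -> bytes:
    """Build one directory entry, padded to a multiple of 4 bytes."""
    raw = name.encode()
    rec_len = (HEADER.size + len(raw) + 3) & ~3
    padded_name = raw.ljust(rec_len - HEADER.size, b"\0")
    return HEADER.pack(inode, rec_len, len(raw), 0) + padded_name


def walk_entries(block: bytes):
    """Yield (inode, name) for every live entry in a directory block."""
    offset = 0
    while offset < len(block):
        inode, rec_len, name_len, _ = HEADER.unpack_from(block, offset)
        if rec_len < HEADER.size:
            raise ValueError("corrupt record length")
        if inode != 0:  # an inode number of 0 marks an unused entry
            start = offset + HEADER.size
            yield inode, block[start:start + name_len].decode()
        offset += rec_len  # the record length leads to the next entry


if __name__ == "__main__":
    block = pack_entry(2, ".") + pack_entry(2, "..") + pack_entry(12, "hello.txt")
    for inode, name in walk_entries(block):
        print(inode, name)
```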

When a file is deleted, the inode number in its directory entry is set to 0 and the previous entry “takes over” the slot by extending its record length. This saves time because no shifts are required. There may be a slight loss in space, but a later entry that fits into the old slot can reuse it. Because of this scheme, you can implement long file names without wasting space. In some systems, specific-length fields are set aside. If the file name doesn’t fill up the slot, the space is just wasted.
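The deletion scheme can be sketched on an in-memory directory block. As an assumption not stated in the text, each entry starts with a 4-byte inode number, a 2-byte record length, a 1-byte name length and a 1-byte file type (little-endian), followed by the name. The functions make_block, delete_entry and list_names are hypothetical helpers for this illustration only.

```python
import struct

# Assumed header layout: inode (u32), rec_len (u16), name_len (u8), file_type (u8).
HEADER = struct.Struct("<IHBB")


def make_block(entries) -> bytearray:
    """Pack (inode, name) pairs back to back, each padded to 4 bytes."""
    out = bytearray()
    for inode, name in entries:
        raw = name.encode()
        rec_len = (HEADER.size + len(raw) + 3) & ~3
        out += HEADER.pack(inode, rec_len, len(raw), 0)
        out += raw.ljust(rec_len - HEADER.size, b"\0")
    return out


def delete_entry(block: bytearray, name: str) -> bool:
    """Clear the entry's inode number and let the previous entry absorb it."""
    target = name.encode()
    offset, prev = 0, None
    while offset < len(block):
        inode, rec_len, name_len, _ = HEADER.unpack_from(block, offset)
        start = offset + HEADER.size
        if inode != 0 and block[start:start + name_len] == target:
            struct.pack_into("<I", block, offset, 0)  # inode number set to 0
            if prev is not None:
                # The previous entry "takes over" by growing its record length.
                prev_len = struct.unpack_from("<H", block, prev + 4)[0]
                struct.pack_into("<H", block, prev + 4, prev_len + rec_len)
            return True
        prev, offset = offset, offset + rec_len
    return False


def list_names(block: bytearray) -> list[str]:
    """Names of all live entries, found by following the record lengths."""
    names, offset = [], 0
    while offset < len(block):
        inode, rec_len, name_len, _ = HEADER.unpack_from(block, offset)
        if inode != 0:
            start = offset + HEADER.size
            names.append(bytes(block[start:start + name_len]).decode())
        offset += rec_len
    return names


if __name__ == "__main__":
    block = make_block([(2, "."), (2, ".."), (12, "a.txt"), (13, "b.txt")])
    delete_entry(block, "a.txt")
    print(list_names(block))  # ['.', '..', 'b.txt']
```

Nothing is moved within the block: its size is unchanged, and a walk that follows the record lengths simply jumps over the freed slot.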