However, deciding which filesystems to use is not simply a matter of sharing data. It also depends on what your needs are and, to a limited extent, on the Linux distribution you are running. As we talked about in the sections on the disk layout and the EXT2 filesystem, a filesystem is basically nothing more than a set of data structures on the hard disk. What distinguishes the various filesystems is how these structures are defined and how they are used. Which one you choose can have dramatic effects on your performance, the reliability of the data, the time required to recover from system crashes, the maximum number of files supported and even how specifically you can control access to the files.
Typically, when you install Linux, you are provided with a default and a handful of choices, should you not want the default. Most people choose the default, which may be an ext2fs or ext3fs, or even a ReiserFS. The ext2fs seems to be the most common and the one referred to specifically in most texts that I know. However, that does not mean it (or any of the other default filesystems) is naturally the best for your specific purpose.
When I first started using Linux, I would always choose the ext2fs (actually there wasn’t much else at that time). Then I moved to the ReiserFS. It wasn’t until I knew what it could do, and that I wanted to use its features, that I moved on to the XFS. This is not to say that the XFS is the best, just that I want to use the features that it has.
Note that your distribution may not have the filesystem you want, particularly if the filesystem is relatively new. However, Linux allows you to add one easily. You can download the driver from the appropriate web site, but check your distribution first. It may already be there, just not activated in your kernel. Also, before you download it and try to add it, make sure the version of the filesystem you want is supported by the version of the kernel you have. There is tight integration between the filesystem drivers and the kernel, so you want to make sure that they are 100% compatible.
My personal opinion is that unless you have a very specific reason to move to another filesystem, then you are probably best served by using the defaults for your particular distribution. That way you know the filesystem will work with your kernel and the tools needed by that filesystem work correctly.
In the section on the EXT2 filesystem, we go into details about the ext2fs. Although the concepts discussed are the same for many filesystems, the implementation is different, plus there are many additions or extensions each filesystem provides.
For example, the ext3fs has the same basic features as the ext2fs, but provides journalling. In essence, what journalling does is keep a record (or journal) of activity on the filesystem. Should the system crash, it is easier and quicker to simply re-read the journal to correct any problems. The way this works is similar to the way a modern database (like Oracle) works. Since data is usually written in blocks (not necessarily hard disk blocks), the filesystem might become corrupt if the system suddenly stopped in the middle of writing (for example because of a power failure or system crash). To make sure that any resulting problems are addressed, modern filesystems support the concept of a “dirty bit”. This is simply a marker at the beginning of the filesystem which indicates that it was not unmounted properly or “cleanly” and thus the filesystem is dirty. When the system reboots, it sees that the filesystem was not unmounted cleanly and begins performing the necessary checks on the entire filesystem. Naturally, the larger the filesystem, the longer these checks take.
Well, what if the system knew what areas of the filesystem had been used? That way it would need to check only those areas that were changed. Sounds like a great idea, but how does it keep track of what was changed? By writing the changes in the journal. When the system reboots, it looks for transactions (operations) in the journal that were not completed (pending) and reacts accordingly. It’s possible that you will still lose data, since the journal cannot recover data that was not yet completely written to it, and there are cases where the safest thing to do is to skip a given transaction (rollback). However, recovering from crashes takes a matter of seconds or minutes and not hours.
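The recovery idea described above can be sketched as a toy model in Python. This is only an illustration under simplifying assumptions: the names JournalEntry and replay are made up for this example, the “disk” is just a dictionary, and a real journalling filesystem works on disk blocks with far more bookkeeping.

```python
from dataclasses import dataclass


@dataclass
class JournalEntry:
    txid: int        # transaction number (hypothetical field)
    block: int       # which pretend disk block is changed
    data: str        # new contents for that block
    committed: bool  # True once the entry was completely written to the journal


def replay(disk, journal):
    """Apply committed entries to the disk and skip the incomplete ones.

    Returns the list of transaction ids that were skipped (rolled back).
    """
    skipped = []
    for entry in journal:
        if entry.committed:
            disk[entry.block] = entry.data
        else:
            skipped.append(entry.txid)
    return skipped


# A pretend disk with three blocks, and the journal found after a crash.
disk = {0: "old-a", 1: "old-b", 2: "old-c"}
journal = [
    JournalEntry(1, 0, "new-a", committed=True),
    JournalEntry(2, 2, "new-c", committed=False),  # the crash hit during this write
]
skipped = replay(disk, journal)
print(disk)
print("skipped transactions:", skipped)
```

Only the blocks named in the journal are looked at, which is why recovery time depends on the size of the journal and not on the size of the filesystem.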
The first journalling filesystem with Linux support was the ReiserFS. Another journalling filesystem is IBM’s JFS, which was originally written for AIX (the IBM version of UNIX) and then ported to OS/2. Eventually, the OS/2 implementation was made open source and then ported to Linux. Then there is the XFS, which came from IRIX, the Silicon Graphics version of UNIX. The XFS has the advantage that it supports access control lists (ACLs), which allow much finer control of filesystem access than is possible with traditional UNIX filesystems.
If you have an existing ext2fs and do not want to reboot, you can easily convert it to the ext3fs. Plus, the ext3fs can be read using the ext2fs driver (for example, if you move it to another system without ext3fs support). Being just an extension to the existing ext2fs, the ext3fs is very reliable. Considering how long the ext2fs has been around, some people consider it “out-dated”. However, older does not necessarily equate to out-dated, but it does equate to being stable and reliable. Plus, the ext3fs provides journalling for systems with the ext2fs without having to re-install.
Another consideration is the performance of your filesystem. With the speed of CPUs and disk sub-systems today, most home users do not run into filesystem performance problems. So, choosing a filesystem just because it has better performance is not always the best decision. One thing that is important is the way in which you in particular access the hard disk (your “access patterns”). Interestingly enough, the XFS and JFS typically work better with the smaller files (< 100 MB, if you call that small), which is what home users normally have.
If you find it hard to decide between different filesystems and want to know how they perform, then I would say that the best thing to do is to test them in your own environment. Benchmarking by others under “laboratory” conditions might be good to get a general idea of what the performance is. However, there are a number of different factors that may bring you different results, such as the hardware, the applications, and usage patterns.
If you need to do performance tests, do them on the exact same hardware. Even go so far as to overwrite each filesystem with the next one, so that every test uses the same part of the disk. Believe it or not, the physical location on the disk can have a great effect on the performance. Since it takes longer to move the read-write heads to places further in on the disk, I know some administrators who place disk-intensive data (like a database) at the very beginning of the disk and other things further on. So if the filesystems are on different parts of the disk, that might affect your tests.
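As a starting point for such a test, here is a minimal Python sketch that times the creation and re-reading of a number of small files. The function name time_small_files and its default values are assumptions made for this example, not a standard benchmark, and the read phase will mostly be served from the system's cache, so treat the result as a rough indication only.

```python
import os
import tempfile
import time


def time_small_files(directory, count=200, size=4096):
    """Create `count` files of `size` bytes in `directory`, read them back,
    delete them, and return the time spent writing and reading in seconds."""
    payload = b"x" * size
    paths = [os.path.join(directory, f"bench_{i}.dat") for i in range(count)]
    start = time.perf_counter()
    for path in paths:
        with open(path, "wb") as f:
            f.write(payload)
            f.flush()
            os.fsync(f.fileno())  # force the data out to the disk
    for path in paths:
        with open(path, "rb") as f:
            f.read()
    elapsed = time.perf_counter() - start
    # Clean up so that repeated runs start from the same state.
    for path in paths:
        os.remove(path)
    return elapsed


# Point this at a directory on the filesystem under test. A temporary
# directory is used here only so that the sketch runs anywhere.
with tempfile.TemporaryDirectory() as d:
    elapsed = time_small_files(d)
    print(f"{elapsed:.3f} seconds")
```

To compare filesystems, run the same call with the same count and size against a directory on each of them, and change those values to resemble the files your own applications create.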
For home users, the efficiency with which space is allocated is less of an issue. Today, 40 GB or even 60 GB is a small hard disk. My first hard disk was only 20 MB, which is 2,000 to 3,000 times smaller. One reason this is less of an issue for home users is that they typically use IDE drives, whereas SCSI is more common on the servers businesses use. Since IDE drives are cheaper byte-for-byte than SCSI, you don’t find as many home users with SCSI hard drives, so it is likely that their hard disks are larger. Unless you download everything you have ever found on the Internet, you are probably not going to need to squeeze every kilobyte out of the hard disk.
Remember that data on the hard disk is typically stored in blocks of 512 bytes. However, the various filesystems have block sizes ranging from 512 bytes to as much as 64 Kbytes. Note that regardless of the size, they are typically still “power-of-two” multiples of the 512 byte block size of the disk, such as 2^2*512, 2^3*512 or 2^4*512.
For argument’s sake, let’s say that you have the smallest possible block size and you create a file that is 513 bytes. Since it is larger than one block, you obviously will need two. However, 511 bytes of the second block go unused. If the block size is 4K, the file fits in one block, but you lose just under 3.5K of space. If you use a 64K block size, then you end up losing over 60K of space!
Realistically, this is all good theory, but it’s different in practice. If you just consider operating system files, you have files that range in size from just a few bytes to several megabytes, and since they are almost never an exact multiple of any block size, you will always lose space. The only question is how much. Obviously, using the smallest block size means you lose the least amount of space (max. 511 bytes per file). However, there is almost always a slight performance degradation because the system needs to manage each block individually, and the more blocks there are, the more there is to manage. Furthermore, the more pieces there are, the easier it is for the file to become fragmented (spread out all over the disk) and thus decrease performance even further as the hard disk needs to move back and forth to gather all of the various pieces.
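The arithmetic for the 513 byte file can be checked with a few lines of Python. The function names blocks_needed and wasted_bytes are made up for this illustration, and the calculation assumes that a file always occupies whole blocks.

```python
def blocks_needed(file_size, block_size):
    # Round up: a partly filled block still occupies a whole block.
    return (file_size + block_size - 1) // block_size


def wasted_bytes(file_size, block_size):
    # Space allocated to the file minus the space it actually uses.
    return blocks_needed(file_size, block_size) * block_size - file_size


for block_size in (512, 4096, 65536):
    print(block_size, blocks_needed(513, block_size), wasted_bytes(513, block_size))
# prints:
# 512 2 511
# 4096 1 3583
# 65536 1 65023
```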
The ReiserFS addresses this issue in a unique way in that it has a special mechanism to manage all of the file “tails” (the extra parts that fill up only part of a block). The tails from multiple files are combined in separate blocks, thus decreasing the total number of blocks needed. However, I personally don’t think that this feature should be the sole reason for choosing the ReiserFS. Even if you managed to save 100 MB like that, it would only be 1% of a 10 GB hard disk and (in my opinion) not worth worrying about. The XFS solves this problem by storing small files directly within the inode. Once again, the savings are no longer relevant (in my opinion).
Now consider a filesystem that stores an Oracle database. You might only have 50 files on it, and no new ones are ever added. Of the 50, 40 are less than 100K, so you save almost nothing. The other 10 files are database files and each is 2 GB. The savings are not even worth the time to read this paragraph.
On the other hand, on a web server or other system that has millions of small files, the savings might amount to something significant. However, I need to point out the word “might”. If all you need to do is add a 20 GB hard disk to solve your problems, the low cost of the new hard drive probably outweighs other factors.
Also you need to consider the space required for the journalling information. In most cases, it is so small by comparison to the drive that it is usually not worth considering. However, if you are using some kind of removable media that only has 100 MB, then the 32 MB that the ReiserFS requires for its journal information is a big chunk!
As we talk about in the section on filesystem permissions, Linux uses the traditional owner-group-other, read-write-execute permissions. In general, this is all that is needed for home users, or even on company servers that run applications for users rather than having users log in, such as database or web servers. However, particularly for file servers, you often need even finer control of who has access to which files and directories and what kind of access they have. This is done through access control lists (ACLs).
As its name implies, an access control list is a list that controls access to files and directories. Instead of being able to specify permissions for just a single user, a single group and everyone else, you can list specific users and groups. For example, you own a file and you want to give read access to just a specific person (for example, your boss) or to a number of people in different groups. Theoretically, you could create a new group containing these people, but that could mean that you need a new group for each combination. Also, groups can only be created by the system administrator, so you have to wait until they get around to doing it. With ACLs, the owner of the file has the ability to make changes.
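Whether the savings matter depends on the files you actually have, and you can estimate this yourself. The following Python sketch adds up the unused space in the last block of every file under a directory for several block sizes. The name total_slack is made up for this example, and the estimate deliberately ignores things like tail packing, files stored in the inode, and sparse files, so it is an upper bound on the simple block model described above and not what any particular filesystem will really use.

```python
import os
import tempfile


def total_slack(root, block_size):
    """Sum the unused bytes in the last block of every regular file under root."""
    slack = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if os.path.islink(path) or not os.path.isfile(path):
                    continue
                size = os.path.getsize(path)
            except OSError:
                continue  # the file vanished or is unreadable; ignore it
            remainder = size % block_size
            if remainder:
                slack += block_size - remainder
    return slack


# A small demonstration tree: files of 513, 100 and 8192 bytes.
with tempfile.TemporaryDirectory() as root:
    for name, size in (("a", 513), ("b", 100), ("c", 8192)):
        with open(os.path.join(root, name), "wb") as f:
            f.write(b"x" * size)
    for block_size in (512, 4096, 65536):
        print(block_size, total_slack(root, block_size))
# prints:
# 512 923
# 4096 7579
# 65536 187803
```

To look at real data, call total_slack with the top directory of the files in question instead of the demonstration tree.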
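To make the difference from the plain owner-group-other scheme concrete, here is a toy model in Python. Everything in it (the FileAcl class, the allowed function, the user and group names) is invented for this illustration. It is a simplified lookup and not the algorithm any filesystem really uses; for instance, it leaves out the mask entry that real ACL implementations have.

```python
from dataclasses import dataclass, field


@dataclass
class FileAcl:
    owner: str
    group: str
    owner_perms: str = "rw"
    group_perms: str = "r"
    other_perms: str = ""
    user_entries: dict = field(default_factory=dict)   # extra per-user entries
    group_entries: dict = field(default_factory=dict)  # extra per-group entries


def allowed(acl, user, groups, wanted):
    """Return True if `user`, a member of `groups`, has the permission `wanted`.

    Simplified lookup order: owner, named users, groups, everyone else.
    """
    if user == acl.owner:
        return wanted in acl.owner_perms
    if user in acl.user_entries:
        return wanted in acl.user_entries[user]
    matching = []
    if acl.group in groups:
        matching.append(acl.group_perms)
    matching += [perms for g, perms in acl.group_entries.items() if g in groups]
    if matching:
        return any(wanted in perms for perms in matching)
    return wanted in acl.other_perms


# The owner gives read access to one specific person without needing a new group.
report = FileAcl(owner="jim", group="staff", group_perms="", other_perms="")
report.user_entries["boss"] = "r"

print(allowed(report, "boss", {"managers"}, "r"))  # True
print(allowed(report, "boss", {"managers"}, "w"))  # False
print(allowed(report, "anna", {"sales"}, "r"))     # False
```

The single extra entry for the boss replaces what would otherwise be a group created by the administrator just for this one file.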
An important thing to keep in mind is that only the XFS has native support for ACLs (at least as of this writing). There are packages available for ext2fs, ext3fs and ReiserFS. However, since XFS is also a journalling filesystem, the others offer little advantage (which is why I moved all of my filesystems to XFS).
As of this writing, Linux supports the following filesystem types: