EXT2 Files

Finding a File in an EXT2 File System

A Linux filename has the same format as every other Unix filename.
It is a series of directory names separated by forward slashes (“/”) and
ending in the file’s name. One example filename would be /home/rusling/.cshrc,
where home and rusling are directory names and the file’s name is .cshrc.
Like all other Unix systems, Linux does not care about the format of the
filename itself; it can consist of any of the printable characters, although
each component of the name is limited to 255 characters in EXT2.
To find the inode representing this file within an EXT2 file system, the system
must parse the filename a directory at a time until it reaches the file itself.
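
As a minimal sketch of that parsing step, the example name can be broken into the components that are looked up one at a time. The helper name split_path is hypothetical and is used here only for illustration; it is not a kernel routine.

```python
def split_path(path):
    """Split a slash-separated filename into its components.

    Empty pieces (from the leading "/" or from doubled slashes) are dropped.
    """
    return [part for part in path.split("/") if part]


components = split_path("/home/rusling/.cshrc")
directories = components[:-1]   # names that must be looked up as directories
filename = components[-1]       # the final component is the file's own name
```

Each name in directories is resolved in turn, and only the last component names the file itself.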

The first inode we need is the inode for the root of the file system and we
find its number in the file system’s superblock.
To read an EXT2 inode we must look for it in the inode table of the appropriate
Block Group.
If, for example, the root inode number is 42, then we need the 42nd inode from
the inode table of Block Group 0.
The root inode is for an EXT2 directory; in other words, the mode of the root
inode describes it as a directory and its data blocks contain EXT2 directory entries.
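
The arithmetic for finding an inode in the right Block Group can be sketched as follows. EXT2 inode numbers start at 1, so the number is reduced by one before dividing by the number of inodes in each group. The function name locate_inode and the figure of 1712 inodes per group are assumptions made for this example; the real count is read from the superblock.

```python
def locate_inode(inode_number, inodes_per_group):
    """Return (block group, zero-based slot in that group's inode table)."""
    index = inode_number - 1          # inode numbers are 1-based
    return index // inodes_per_group, index % inodes_per_group


# Inode 42 with an assumed 1712 inodes per group:
group, slot = locate_inode(42, inodes_per_group=1712)
```

Here slot is counted from zero, so slot 41 is the 42nd entry of the inode table in Block Group 0.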


home is just one of the many directory entries and this directory entry gives us
the number of the inode describing the /home directory.
We have to read this directory (by first reading its inode and then
reading the directory entries from the data blocks described by its inode)
to find the rusling entry which gives us the number of the inode describing the
/home/rusling directory.
Finally we read the directory entries pointed at by the inode describing the
/home/rusling directory to find the inode number of the .cshrc file and
from this we get the data blocks containing the information in the file.
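
The whole walk can be sketched with a toy in-memory model. Everything in it is hypothetical: the superblock dictionary, the inode numbers and the file contents are made up, and real directory entries live in data blocks rather than Python dictionaries.

```python
# Hypothetical superblock: it only records the root inode number here.
superblock = {"root_inode": 42}

# inode number -> ("dir", {name: inode number}) or ("file", contents)
inodes = {
    42: ("dir", {"home": 11}),
    11: ("dir", {"rusling": 12}),
    12: ("dir", {".cshrc": 13}),
    13: ("file", b"set history = 100\n"),
}


def lookup(path):
    """Resolve a path to an inode number, one directory at a time."""
    ino = superblock["root_inode"]
    for name in (part for part in path.split("/") if part):
        kind, entries = inodes[ino]
        if kind != "dir":
            raise NotADirectoryError(name)
        if name not in entries:
            raise FileNotFoundError(path)
        ino = entries[name]           # follow the directory entry
    return ino


cshrc_inode = lookup("/home/rusling/.cshrc")
contents = inodes[cshrc_inode][1]
```

Each pass through the loop corresponds to reading one directory's inode and then searching its entries for the next name.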

Changing the Size of a File in an EXT2 File System

One common problem with a file system is its tendency to fragment.
The blocks that hold the file’s data get spread all over the file system
and this makes sequentially accessing the data blocks of a file more and more
inefficient the further apart the data blocks are.
The EXT2 file system tries to overcome this by allocating the new blocks for a
file physically close to its current data blocks or at least in the same Block Group
as its current data blocks.
Only when this fails does it allocate data blocks in another Block Group.


Whenever a process attempts to write data into a file the Linux file system checks to
see if the data has gone off the end of the file’s last allocated block.
If it has, then it must allocate a new data block for this file.
Until the allocation is complete, the process cannot run; it must wait for
the file system to allocate a new data block and write the rest of the data to it before
it can continue.
The first thing that the EXT2 block allocation routines do is to lock the
EXT2 Superblock for this file system. Allocating and deallocating blocks changes
fields within the superblock, and the Linux file system cannot allow more than one
process to do this at the same time.
If another process needs to allocate more data blocks, it will have to wait until
this process has finished.
Processes waiting for the superblock are suspended, unable to run, until
control of the superblock is relinquished by its current user.
Access to the superblock is granted on a first come, first
served basis and once a process has control of the superblock, it keeps control
until it has finished.
Having locked the superblock, the process checks that there are enough free blocks
left in this file system. If there are not enough free blocks, then this attempt to
allocate more will fail and the process will relinquish control of this file system’s
superblock.
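
The locking and the free block check can be sketched with an ordinary mutex. This is only an illustration: the Superblock class and reserve_block function are hypothetical, decrementing the count stands in for the real allocation, and Python's threading.Lock does not promise the first come, first served ordering described above.

```python
import threading


class Superblock:
    """Toy superblock: a lock, a free block count and a dirty flag."""

    def __init__(self, free_blocks):
        self.lock = threading.Lock()
        self.free_blocks = free_blocks
        self.dirty = False


def reserve_block(sb):
    """Take one block from the free count, or fail if none are left."""
    with sb.lock:                     # other allocators wait here
        if sb.free_blocks == 0:
            return False              # leaving the block releases the lock
        sb.free_blocks -= 1
        sb.dirty = True               # the superblock has been changed
        return True


sb = Superblock(free_blocks=1)
first = reserve_block(sb)
second = reserve_block(sb)
```

Whether the attempt succeeds or fails, the lock is released on the way out, so a waiting process can then take its turn.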


If there are enough free blocks in the file system, the process tries to
allocate one.


If the EXT2 file system has been built to preallocate data blocks then we may be able
to take one of those.
The preallocated blocks do not actually exist; they are just reserved within the
allocated block bitmap.
The VFS inode representing the file that we are trying to allocate a new data block
for has two EXT2 specific fields, prealloc_block and prealloc_count, which
are the block number of the first preallocated data block and how many of them there
are, respectively.
If there were no preallocated blocks or block preallocation is not enabled, the EXT2
file system must allocate a new block.
The EXT2 file system first looks to see if the data block after the last data block
in the file is free. Logically, this is the most efficient block to allocate as
it makes sequential accesses much quicker.
If this block is not free, then the search widens and it looks for a data block within
64 blocks of the ideal block.
This block, although not ideal, is at least fairly close and within the same Block Group
as the other data blocks belonging to this file.
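
The use of prealloc_block and prealloc_count described above can be sketched as follows. The Inode class and take_preallocated function are hypothetical, and the sketch assumes that the preallocated blocks form one contiguous run which is consumed from the front.

```python
class Inode:
    """Toy inode holding only the two preallocation fields from the text."""

    def __init__(self, prealloc_block=0, prealloc_count=0):
        self.prealloc_block = prealloc_block   # first preallocated block
        self.prealloc_count = prealloc_count   # how many are reserved


def take_preallocated(inode):
    """Return the next preallocated block number, or None if there are none."""
    if inode.prealloc_count == 0:
        return None                   # caller must search for a new block
    block = inode.prealloc_block
    inode.prealloc_block += 1         # assumed: the run is contiguous
    inode.prealloc_count -= 1
    return block


inode = Inode(prealloc_block=100, prealloc_count=2)
taken = [take_preallocated(inode) for _ in range(3)]
```

Once the count reaches zero the function returns None, which corresponds to falling back on the search for a new block.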


If no block in that range is free, the process looks in each of the
other Block Groups in turn until it finds some free blocks.
The block allocation code looks for a
cluster of eight free data blocks somewhere in one of the Block Groups.
If it cannot find eight together, it will settle for less.
If block preallocation is wanted and enabled it will update prealloc_block and
prealloc_count accordingly.
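
The search order described above can be sketched over toy block bitmaps, where each Block Group is a list of booleans and True means the block is in use. The function names are hypothetical. The sketch also makes three assumptions that the text does not state: the nearby search only looks forward from the ideal block, a shorter cluster is tried only after no group has a longer one, and the ideal block's own group is tried last after all the others.

```python
NEAR_WINDOW = 64   # "within 64 blocks of the ideal block"
CLUSTER = 8        # preferred size of a free cluster


def find_run(bitmap, length):
    """Return the start of the first run of `length` free blocks, or None."""
    run = 0
    for i, used in enumerate(bitmap):
        run = 0 if used else run + 1
        if run == length:
            return i - length + 1
    return None


def choose_block(groups, goal_group, goal):
    """Pick (group, block index) for a new block, or None if all are full."""
    bitmap = groups[goal_group]
    if not bitmap[goal]:                                   # 1. ideal block
        return goal_group, goal
    end = min(goal + NEAR_WINDOW + 1, len(bitmap))
    for i in range(goal + 1, end):                         # 2. nearby block
        if not bitmap[i]:
            return goal_group, i
    n = len(groups)
    order = [(goal_group + k) % n for k in range(1, n + 1)]
    for length in range(CLUSTER, 0, -1):                   # 3. eight, then fewer
        for g in order:
            start = find_run(groups[g], length)
            if start is not None:
                return g, start
    return None
```

For example, if Block Group 0 is full and Block Group 1 has its first run of eight free blocks starting at index 3, then choose_block returns (1, 3) for a goal in group 0.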


Wherever it finds the free block, the block allocation code updates the Block Group’s
block bitmap and allocates a data buffer in the buffer cache.
That data buffer is uniquely identified by the file system’s supporting device identifier
and the block number of the allocated block.
The data in the buffer is zeroed and the buffer is marked as “dirty” to show that
its contents have not been written to the physical disk.
Finally, the superblock itself is marked as “dirty” to show that it has been changed
and it is unlocked.
If there were any processes waiting for the superblock, the first one in the queue
is allowed to run again and will gain exclusive control of the superblock for its
file operations.
The process’s data is written to the new data block and, if that data block is filled,
the entire process is repeated and another data block allocated.
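
The buffer step described above can be sketched with a dictionary keyed by device identifier and block number. The Buffer class, the get_new_buffer function, the 1024 byte block size and the device number are all assumptions made for this example; they are not the kernel's buffer cache interface.

```python
BLOCK_SIZE = 1024   # assumed block size for the example


class Buffer:
    """Toy buffer for one newly allocated block."""

    def __init__(self, device, block):
        self.device = device
        self.block = block
        self.data = bytearray(BLOCK_SIZE)   # bytearray(n) is zero-filled
        self.dirty = True                   # not yet written to disk


buffer_cache = {}


def get_new_buffer(device, block):
    """Create a zeroed, dirty buffer and index it by (device, block)."""
    buf = Buffer(device, block)
    buffer_cache[(device, block)] = buf
    return buf


buf = get_new_buffer(device=0x0301, block=1234)
```

The pair (device, block) is what makes the buffer unique: the same block number on a different device would get a separate entry.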