Hash Tables
Linked lists are handy ways of tying data structures together, but navigating
linked lists can be inefficient.
If you were searching for a particular element, you might easily have to look at the whole
list before you find the one that you need.
Linux uses another technique, hashing, to get around this restriction.
A hash table is an array or vector of pointers.
An array, or vector, is simply a set of things coming one after another in memory.
A bookshelf could be said to be an array of books.
Arrays are accessed by an index, which is an offset into the array's associated
area in memory.
Taking the bookshelf analogy a little further, you could describe each book by its position
on the shelf; you might ask for the 5th book.
A hash table is an array of pointers to data structures and its index is derived from information
in those data structures.
If you had data structures describing the population of a village then you could use a person's
age as an index.
To find a particular person's data you could use their age as an index into the population hash
table and then follow the pointer to the data structure containing the person's details.
Unfortunately many people in the village are likely to have the same age and so the hash
table pointer becomes a pointer to a chain or list of data structures each describing people
of the same age.
However, searching these shorter chains is still faster than searching all of the data
structures.
As a hash table speeds up access to commonly used data structures, Linux often uses hash tables
to implement caches.
Caches are handy information that needs to be accessed quickly and are usually a subset
of the full set of information available.
Data structures are put into a cache and kept there because the kernel often
accesses them.
The drawback to caches is that they are more complex to use and maintain than simple
linked lists or hash tables.
If the data structure can be found in the cache (this is known as a cache hit),
then all well and good.
If it cannot then all of the relevant data structures must be searched and, if the
data structure exists at all, it must be added into the cache.
In adding new data structures into the cache an old cache entry may need discarding.
Linux must decide which one to discard, the danger being that the discarded data structure
may be the next one that Linux needs.
|