In Linux, as in Unix TM, programs and commands are normally executed by a command interpreter. A command interpreter is a user process like any other process and is called a shell 2.
There are many shells in Linux, some of the most popular are sh, bash and tcsh. With the exception of a few built in commands, such as cd and pwd, a command is an executable binary file. For each command entered, the shell searches the directories in the process’s search path, held in the PATH environment variable, for an executable image with a matching name. If the file is found it is loaded and executed. The shell clones itself using the fork mechanism described above and then the new child process replaces the binary image that it was executing, the shell, with the contents of the executable image file just found. Normally the shell waits for the command to complete, or rather for the child process to exit. You can cause the shell to run again by pushing the child process to the background by typing control-Z, which causes a SIGSTOP signal to be sent to the child process, stopping it. You then use the shell command bg to push it into a background, the shell sends it a SIGCONT signal to restart it, where it will stay until either it ends or it needs to do terminal input or output.
An executable file can have many formats or even be a script file. Script files have to be recognized and the appropriate interpreter run to handle them; for example /bin/sh interprets shell scripts. Executable object files contain executable code and data together with enough information to allow the operating system to load them into memory and execute them. The most commonly used object file format used by Linux is ELF but, in theory, Linux is flexible enough to handle almost any object file format.
Figure: Registered Binary Formats
As with file systems, the binary formats supported by Linux are either built into the kernel at kernel build time or available to be loaded as modules. The kernel keeps a list of supported binary formats (see figure 4.3) and when an attempt is made to execute a file, each binary format is tried in turn until one works.
Commonly supported Linux binary formats are a.out and ELF. Executable files do not have to be read completely into memory, a technique known as demand loading is used. As each part of the executable image is used by a process it is brought into memory. Unused parts of the image may be discarded from memory.
The ELF (Executable and Linkable Format) object file format, designed by the Unix System Laboratories, is now firmly established as the most commonly used format in Linux. Whilst there is a slight performance overhead when compared with other object file formats such as ECOFF and a.out, ELF is felt to be more flexible. ELF executable files contain executable code, sometimes refered to as text, and data. Tables within the executable image describe how the program should be placed into the process’s virtual memory. Statically linked images are built by the linker (ld), or link editor, into one single image containing all of the code and data needed to run this image. The image also specifies the layout in memory of this image and the address in the image of the first code to execute.
Figure: ELF Executable File Format
The figure above shows the layout of a statically linked ELF executable image.
It is a simple C program that prints “hello world” and then exits. The header describes it as an ELF image with two physical headers (e_phnum is 2) starting 52 bytes (e_phoff) from the start of the image file. The first physical header describes the executable code in the image. It goes at virtual address 0x8048000 and there is 65532 bytes of it. This is because it is a statically linked image which contains all of the library code for the printf() call to output “hello world”. The entry point for the image, the first instruction for the program, is not at the start of the image but at virtual address 0x8048090 (e_entry). The code starts immediately after the second physical header. This physical header describes the data for the program and is to be loaded into virtual memory at address 0x8059BB8. This data is both readable and writeable. You will notice that the size of the data in the file is 2200 bytes (p_filesz) whereas its size in memory is 4248 bytes. This because the first 2200 bytes contain pre-initialized data and the next 2048 bytes contain data that will be initialized by the executing code.
When Linux loads an ELF executable image into the process’s virtual address space, it does not actually load the image.
It sets up the virtual memory data structures, the process’s vm_area_struct tree and its page tables. When the program is executed page faults will cause the program’s code and data to be fetched into physical memory. Unused portions of the program will never be loaded into memory. Once the ELF binary format loader is satisfied that the image is a valid ELF executable image it flushes the process’s current executable image from its virtual memory. As this process is a cloned image (all processes are) this, old, image is the program that the parent process was executing, for example the command interpreter shell such as bash. This flushing of the old executable image discards the old virtual memory data structures and resets the process’s page tables. It also clears away any signal handlers that were set up and closes any files that are open. At the end of the flush the process is ready for the new executable image. No matter what format the executable image is, the same information gets set up in the process’s mm_struct. There are pointers to the start and end of the image’s code and data. These values are found as the ELF executable images physical headers are read and the sections of the program that they describe are mapped into the process’s virtual address space. That is also when the vm_area_struct data structures are set up and the process’s page tables are modified. The mm_struct data structure also contains pointers to the parameters to be passed to the program and to this process’s environment variables.
ELF Shared Libraries
A dynamically linked image, on the other hand, does not contain all of the code and data required to run. Some of it is held in shared libraries that are linked into the image at run time. The ELF shared library’s tables are also used by the dynamic linker when the shared library is linked into the image at run time. Linux uses several dynamic linkers, ld.so.1, libc.so.1 and ld-linux.so.1, all to be found in /lib. The libraries contain commonly used code such as language subroutines. Without dynamic linking, all programs would need their own copy of the these libraries and would need far more disk space and virtual memory. In dynamic linking, information is included in the ELF image’s tables for every library routine referenced. The information indicates to the dynamic linker how to locate the library routine and link it into the program’s address space.
Script files are executables that need an interpreter to run them. There are a wide variety of interpreters available for Linux; for example wish, perl and command shells such as tcsh. Linux uses the standard Unux TM convention of having the first line of a script file contain the name of the interpreter. So, a typical script file would start:
The script binary loader tries to find the intepreter for the script.
It does this by attempting to open the executable file that is named in the first line of the script. If it can open it, it has a pointer to its VFS inode and it can go ahead and have it interpret the script file. The name of the script file becomes argument zero (the first argument) and all of the other arguments move up one place (the original first argument becomes the new second argument and so on). Loading the interpreter is done in the same way as Linux loads all of its executable files. Linux tries each binary format in turn until one works. This means that you could in theory stack several interpreters and binary formats making the Linux binary format handler a very flexible piece of software.