Executing Programs
In Linux, as in Unix™, programs and commands are normally executed by a command
interpreter.
A command interpreter is a user process like any other process and is called a shell.
There are many shells in Linux; some of the most popular are sh, bash and
tcsh.
With the exception of a few built-in commands, such as cd and pwd, a
command is an executable binary file.
For each command entered, the shell searches the directories in the process's search
path, held in the PATH environment variable, for an executable image with a matching
name.
If the file is found, it is loaded and executed.
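The PATH search can be sketched in C as follows. This is a minimal illustration under the assumption of a POSIX system. It does not show how any particular shell is written. find_in_path is a hypothetical name, and real shells handle more cases; bash, for instance, also remembers where it found each command.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: look 'name' up in the colon-separated directories
   listed in $PATH, roughly as a shell does. On success the full path name
   is written to 'out' and 0 is returned; otherwise -1 is returned. */
int find_in_path(const char *name, char *out, size_t outlen)
{
    const char *path = getenv("PATH");

    /* A name containing '/' is run as given rather than searched for;
       this sketch simply declines to handle it. */
    if (path == NULL || strchr(name, '/') != NULL)
        return -1;

    const char *dir = path;
    for (;;) {
        const char *end = strchr(dir, ':');
        size_t dirlen = end ? (size_t)(end - dir) : strlen(dir);

        /* An empty PATH entry conventionally means the current directory. */
        int n = (dirlen == 0)
            ? snprintf(out, outlen, "./%s", name)
            : snprintf(out, outlen, "%.*s/%s", (int)dirlen, dir, name);

        struct stat st;
        if (n > 0 && (size_t)n < outlen &&
            stat(out, &st) == 0 && S_ISREG(st.st_mode) &&
            access(out, X_OK) == 0)
            return 0;               /* found an executable regular file */

        if (end == NULL)
            return -1;              /* no directories left to try */
        dir = end + 1;
    }
}
```

On most systems, find_in_path("sh", buf, sizeof buf) fills buf with a path such as /bin/sh or /usr/bin/sh.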
The shell clones itself using the fork mechanism described above and then the new child
process replaces the binary image that it was executing, the shell, with the contents of
the executable image file just found.
Normally the shell waits for the command to complete, or rather for the child process
to exit.
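Here is a minimal sketch of the fork, exec and wait sequence, assuming a POSIX system. The helper name run_command is hypothetical. execvp() performs its own PATH search, so the child does not look the command up itself. The exit status 127 for a command that cannot be run follows the usual shell convention.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run a command the way a shell does. Fork a child,
   have the child replace its image with the program, then wait for the
   child to exit. Returns the child's exit status, or -1 if the child
   could not be created or did not exit normally. */
int run_command(char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;                 /* fork failed */

    if (pid == 0) {
        /* Child: on success execvp() never returns. */
        execvp(argv[0], argv);
        _exit(127);                /* exec failed: "command not found" */
    }

    /* Parent (the "shell"): wait for the child to exit. */
    int status;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```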
You can get the shell back by typing control-Z, which causes the terminal driver to
send a SIGTSTP signal to the child process, stopping it.
You then use the shell command bg to push it into the background; the
shell sends it a SIGCONT signal to restart it, and it stays there
until either it ends or it needs to do terminal input or output.
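The stop and restart steps can be reproduced from a program with kill() and waitpid(). In this sketch the parent plays the part of the shell. It sends SIGSTOP directly rather than relying on a terminal. The terminal's control-Z generates SIGTSTP, which a program can catch or ignore, unlike SIGSTOP. The parent uses the WUNTRACED and WCONTINUED options to see the child stop and continue; WCONTINUED needs Linux 2.6.10 or later. The function name stop_and_continue_child is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo: fork a child, stop it, continue it, then terminate
   it, checking that waitpid() reports each state change.
   Returns 0 if every step behaved as expected, -1 otherwise. */
int stop_and_continue_child(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        for (;;)
            pause();               /* child idles until it is killed */
    }

    int status, ok = 1;

    /* Like control-Z: stop the child. SIGSTOP is used here because,
       unlike SIGTSTP, it cannot be caught or ignored. */
    kill(pid, SIGSTOP);
    if (waitpid(pid, &status, WUNTRACED) != pid ||
        !WIFSTOPPED(status) || WSTOPSIG(status) != SIGSTOP)
        ok = 0;

    /* Like bg: let the child run again. */
    kill(pid, SIGCONT);
    if (ok && (waitpid(pid, &status, WCONTINUED) != pid ||
               !WIFCONTINUED(status)))
        ok = 0;

    /* Clean up: terminate the child and reap it. */
    kill(pid, SIGTERM);
    if (waitpid(pid, &status, 0) != pid || !WIFSIGNALED(status))
        ok = 0;

    return ok ? 0 : -1;
}
```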
An executable file can have many formats or even be a script file.
Script files have to be recognized and the appropriate interpreter run to handle
them; for example /bin/sh interprets shell scripts.
Executable object files contain executable code and data together with enough
information to allow the operating system to load them into memory and execute
them.
The object file format most commonly used by Linux is ELF but, in theory,
Linux is flexible enough to handle almost any object file format.
Figure: Registered Binary Formats
As with file systems, the binary formats supported by Linux are either built
into the kernel at kernel build time or available to be loaded as modules.
The kernel keeps a list of supported binary formats (see figure 4.3)
and when an attempt is made to execute a file, each binary format is tried
in turn until one works.
Commonly supported Linux binary formats are a.out and ELF.
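The try-each-format-in-turn idea can be modelled in user space. In the kernel, each registered format is a struct linux_binfmt. Its load_binary function returns -ENOEXEC when the file is not in that format, and the next format is then tried. The sketch below borrows that convention but uses hypothetical names. It only checks each format's magic number rather than loading anything.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* A toy version of the kernel's list of registered binary formats.
   Each handler inspects the first bytes of a file and returns 0 if it
   recognises the format, or -ENOEXEC to say "not mine, try the next". */
struct binfmt {
    const char *name;
    int (*check)(const unsigned char *buf, size_t len);
};

static int check_elf(const unsigned char *buf, size_t len)
{
    /* ELF files start with the four bytes 0x7f 'E' 'L' 'F'. */
    return (len >= 4 && memcmp(buf, "\177ELF", 4) == 0) ? 0 : -ENOEXEC;
}

static int check_script(const unsigned char *buf, size_t len)
{
    /* Script files start with "#!". */
    return (len >= 2 && buf[0] == '#' && buf[1] == '!') ? 0 : -ENOEXEC;
}

static const struct binfmt formats[] = {
    { "elf",    check_elf },
    { "script", check_script },
};

/* Try each registered format in turn until one accepts the file.
   Returns the name of the matching format, or NULL if none does. */
const char *find_binfmt(const unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < sizeof formats / sizeof formats[0]; i++) {
        if (formats[i].check(buf, len) == 0)
            return formats[i].name;
    }
    return NULL;   /* a real execve() would fail with ENOEXEC here */
}
```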
Executable files do not have to be read completely into memory; instead, a technique
known as demand loading is used.
As each part of the executable image is used by a process it is brought into memory.
Unused parts of the image may be discarded from memory.
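User programs can use the same demand loading mechanism through mmap(). The sketch below assumes a POSIX system. It maps a file and reads its first byte. Creating the mapping reads no file data, and the page is brought in by a page fault on first access. The kernel may also read ahead neighbouring pages. The function name first_byte_mapped is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: map a whole file read-only and return its first
   byte, or -1 on error. mmap() only sets up the mapping; file data is
   fetched by page faults when pages are first touched. */
int first_byte_mapped(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size == 0) {
        close(fd);
        return -1;
    }

    unsigned char *p = mmap(NULL, (size_t)st.st_size, PROT_READ,
                            MAP_PRIVATE, fd, 0);
    close(fd);                     /* the mapping stays valid after close */
    if (p == MAP_FAILED)
        return -1;

    int c = p[0];                  /* first access: page fault brings page in */
    munmap(p, (size_t)st.st_size);
    return c;
}
```

For example, first_byte_mapped("/proc/self/exe") returns 0x7f, the first byte of the running program's ELF image.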
ELF
The ELF (Executable and Linkable Format) object file format, designed
by the Unix System Laboratories, is now firmly established as the most
commonly used format in Linux.
Whilst there is a slight performance overhead when compared with other
object file formats such as ECOFF and a.out, ELF is felt to be more flexible.
ELF executable files contain executable code, sometimes referred to as text,
and data.
Tables within the executable image describe how the program should be placed into the
process's virtual memory.
Statically linked images are built by the linker (ld), or link editor, into a
single image containing all of the code and data needed to run this image.
The image also specifies the layout in memory of this image and the address in the
image of the first code to execute.
Figure: ELF Executable File Format
The figure above shows the layout of a statically linked ELF executable
image.
It is the image of a simple C program that prints "hello world" and then exits.
The header describes it as an ELF image with two physical headers (e_phnum is 2; the
ELF specification calls them program headers) starting 52 bytes (e_phoff) from the
start of the image file, immediately after the 52-byte ELF header itself.
The first physical header describes the executable code in the image.
It goes at virtual address 0x8048000 and there are 65532 bytes of it.
This is because it is a statically linked image which contains all of the library
code for the printf() call to output "hello world".
The entry point for the image, the first instruction for the program, is not at
the start of the image but at virtual address 0x8048090 (e_entry).
The code starts immediately after the second physical header.
This physical header describes the data for the program and is to be loaded into
virtual memory at address 0x8059BB8.
This data is both readable and writeable.
You will notice that the size of the data in the file is 2200 bytes (p_filesz)
whereas its size in memory (p_memsz) is 4248 bytes.
This is because the first 2200 bytes contain pre-initialized data and the next 2048 bytes
hold uninitialized data (the bss), which takes up no space in the file and is zero-filled
in memory when the image is loaded.
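The header fields mentioned above can be read with the structure definitions in glibc's <elf.h>. This sketch assumes the file has the same byte order as the machine reading it. struct elf_summary and read_elf_summary are hypothetical names. The sketch handles both header sizes: 32-bit files like the one in the figure have a 52-byte header, sizeof(Elf32_Ehdr), and the 64-bit files most current systems use have a 64-byte header.

```c
#include <elf.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical summary of the header fields discussed in the text. */
struct elf_summary {
    int is_64bit;
    unsigned long long entry;   /* e_entry: first instruction to execute */
    unsigned long long phoff;   /* e_phoff: file offset of program headers */
    unsigned int phnum;         /* e_phnum: number of program headers */
};

/* Read the ELF header of 'path' into *out. Returns 0 on success, or -1
   if the file cannot be read or is not an ELF file. No byte swapping is
   done, so the file is assumed to match the host's byte order. */
int read_elf_summary(const char *path, struct elf_summary *out)
{
    unsigned char buf[sizeof(Elf64_Ehdr)];
    FILE *f = fopen(path, "rb");
    if (!f)
        return -1;
    size_t n = fread(buf, 1, sizeof buf, f);
    fclose(f);

    if (n < EI_NIDENT || memcmp(buf, ELFMAG, SELFMAG) != 0)
        return -1;                       /* not an ELF file */

    if (buf[EI_CLASS] == ELFCLASS32 && n >= sizeof(Elf32_Ehdr)) {
        Elf32_Ehdr h;
        memcpy(&h, buf, sizeof h);       /* copy to avoid alignment issues */
        out->is_64bit = 0;
        out->entry = h.e_entry;
        out->phoff = h.e_phoff;
        out->phnum = h.e_phnum;
        return 0;
    }
    if (buf[EI_CLASS] == ELFCLASS64 && n >= sizeof(Elf64_Ehdr)) {
        Elf64_Ehdr h;
        memcpy(&h, buf, sizeof h);
        out->is_64bit = 1;
        out->entry = h.e_entry;
        out->phoff = h.e_phoff;
        out->phnum = h.e_phnum;
        return 0;
    }
    return -1;                           /* unknown class or short file */
}
```

For the image in the figure, this would report e_phnum 2, e_phoff 52 and e_entry 0x8048090. The bss size follows directly from the second program header: p_memsz minus p_filesz, 4248 - 2200 = 2048 bytes.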
When Linux loads an ELF executable image into the process's virtual address space, it
does not actually load the image.
It sets up the virtual memory data structures, the process's vm_area_struct tree and
its page tables.
When the program is executed, page faults will cause the program's code and data to be
fetched into physical memory.
Unused portions of the program will never be loaded into memory.
Once the ELF binary format loader is satisfied that the image is a valid ELF executable
image, it flushes the process's current executable image from its virtual memory.
As this process is a cloned image (all processes are), this old image is the
program that the parent process was executing, for example a command interpreter shell
such as bash.
This flushing of the old executable image discards the old virtual memory data structures
and resets the process's page tables.
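It also resets any signal handlers that were set up to their default actions and closes
any open files that are marked close-on-exec; other open files, such as standard input,
output and error, remain open for the new program.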
At the end of the flush the process is ready for the new executable image.
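A small experiment shows whether a file survives exec. The outcome depends on its close-on-exec flag (FD_CLOEXEC). This sketch assumes /bin/sh exists. The helper name fd_survives_exec and the choice of descriptor 5 are arbitrary. The child asks the shell to redirect from descriptor 5, and that redirection only succeeds if the descriptor is still open after the exec.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo: returns 1 if an open file is still open in the new
   program after exec, 0 if the exec closed it, or -1 on error. */
int fd_survives_exec(int cloexec)
{
    int fd = open("/dev/null", O_RDWR);
    if (fd < 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fd);
        return -1;
    }
    if (pid == 0) {
        if (dup2(fd, 5) < 0)               /* fd 5 starts without FD_CLOEXEC */
            _exit(126);
        if (cloexec && fcntl(5, F_SETFD, FD_CLOEXEC) < 0)
            _exit(126);
        /* ": <&5" fails with "Bad file descriptor" if fd 5 was closed. */
        execl("/bin/sh", "sh", "-c", ": <&5", (char *)NULL);
        _exit(127);                        /* the exec itself failed */
    }

    close(fd);
    int status;
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    int code = WEXITSTATUS(status);
    if (code == 126 || code == 127)
        return -1;                         /* setup or exec failure */
    return code == 0 ? 1 : 0;
}
```

Called with 0 it returns 1, because the file stays open. Called with 1 it returns 0, because the exec closed the close-on-exec descriptor.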
No matter what format the executable image is, the same information gets set up in the
process's mm_struct.
There are pointers to the start and end of the image's code and data.
These values are found as the ELF executable image's physical headers are read and the
sections of the program that they describe are mapped into the process's virtual address
space.
That is also when the vm_area_struct data structures are set up and the process's page
tables are modified.
The mm_struct data structure also contains pointers to the parameters to be passed to
the program and to this process's environment variables.
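The kernel's mm_struct is not directly visible from user space, but Linux reports several of its fields in /proc/<pid>/stat. The sketch below reads the code, data, argument and environment boundaries for the calling process. Field numbers follow the proc(5) manual page, and the argument and environment fields need Linux 3.5 or later. struct mm_layout and read_mm_layout are hypothetical names.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical subset of mm_struct values as exported by /proc. */
struct mm_layout {
    unsigned long long start_code, end_code;   /* fields 26, 27 */
    unsigned long long start_data, end_data;   /* fields 45, 46 */
    unsigned long long arg_start, arg_end;     /* fields 48, 49 */
    unsigned long long env_start, env_end;     /* fields 50, 51 */
};

/* Fill *out from /proc/self/stat. Returns 0 on success, -1 on failure. */
int read_mm_layout(struct mm_layout *out)
{
    char buf[4096];
    FILE *f = fopen("/proc/self/stat", "r");
    if (!f)
        return -1;
    size_t n = fread(buf, 1, sizeof buf - 1, f);
    fclose(f);
    buf[n] = '\0';

    /* Field 2 (the command name) is in parentheses and may contain
       spaces, so parsing starts after the last ')', at field 3. */
    char *p = strrchr(buf, ')');
    if (!p)
        return -1;

    int field = 3;
    for (char *tok = strtok(p + 1, " \n"); tok; tok = strtok(NULL, " \n")) {
        unsigned long long v = strtoull(tok, NULL, 10);
        switch (field) {
        case 26: out->start_code = v; break;
        case 27: out->end_code   = v; break;
        case 45: out->start_data = v; break;
        case 46: out->end_data   = v; break;
        case 48: out->arg_start  = v; break;
        case 49: out->arg_end    = v; break;
        case 50: out->env_start  = v; break;
        case 51: out->env_end    = v; break;
        }
        field++;
    }
    return field > 51 ? 0 : -1;    /* older kernels report fewer fields */
}
```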
ELF Shared Libraries
A dynamically linked image, on the other hand, does not contain all of the code and data
required to run.
Some of it is held in shared libraries that are linked into the image at run time.
The ELF shared library's tables are also used by the dynamic linker when the
shared library is linked into the image at run time.
Linux uses several dynamic linkers, ld.so.1, libc.so.1 and ld-linux.so.1, all
to be found in /lib.
The libraries contain commonly used code such as language subroutines.
Without dynamic linking, all programs would need their own copy of these libraries
and would need far more disk space and virtual memory.
In dynamic linking, information is included in the ELF image's tables for every
library routine referenced.
The information indicates to the dynamic linker how to locate the library routine and link
it into the program's address space.
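A program can see the result of run-time linking from the inside with the dl_iterate_phdr() function. It walks the list of objects that the dynamic linker has mapped into the process. This sketch assumes glibc or another C library that provides dl_iterate_phdr(). The library names it prints depend on the system. list_loaded_objects and show_object are hypothetical names.

```c
#define _GNU_SOURCE
#include <link.h>
#include <stddef.h>
#include <stdio.h>

/* Called once per object (the program itself and each shared library)
   that the dynamic linker has mapped into this process. */
static int show_object(struct dl_phdr_info *info, size_t size, void *data)
{
    (void)size;
    int *count = data;
    const char *name = (info->dlpi_name && info->dlpi_name[0])
                           ? info->dlpi_name : "(main program)";
    printf("%2d: %s at base 0x%lx (%d program headers)\n",
           *count, name, (unsigned long)info->dlpi_addr, info->dlpi_phnum);
    (*count)++;
    return 0;                      /* non-zero would stop the iteration */
}

/* Print the loaded objects and return how many there are. */
int list_loaded_objects(void)
{
    int count = 0;
    dl_iterate_phdr(show_object, &count);
    return count;
}
```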
Script Files
Script files are executables that need an interpreter to run them.
There is a wide variety of interpreters available for Linux; for example wish,
perl and command shells such as tcsh.
Linux uses the standard Unix™ convention of having the first line of a script file
contain the name of the interpreter. So, a typical script file would start:
#!/usr/bin/wish
The script binary loader tries to find the interpreter for the script.
It does this by attempting to open the executable file that is named in the
first line of the script.
If it can open it, it has a pointer to its VFS inode and it can go ahead
and have it interpret the script file.
The name of the interpreter becomes argument zero (the first argument), followed by any
optional argument given after the interpreter's name on the #! line, and then the name of
the script file.
The script's original arguments, from the original second argument onwards, follow in order.
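The execve(2) manual page describes the argument list that the interpreter receives. It is the interpreter, then the optional argument from the #! line, then the script's path name, then the original arguments from argv[1] onwards. The sketch below builds that list from a script's first line, roughly as Linux does. build_script_argv is a hypothetical helper, and it ignores details such as the kernel's limit on the length of the #! line.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: given the first line of a script (modified in
   place), build the argument list the interpreter is run with:
       interpreter [optional-arg] script-path old_argv[1] old_argv[2] ...
   new_argv must have room for max_args pointers, including the final
   NULL. Returns the new argument count, or -1 if the line does not
   name an interpreter or there is not enough room. */
int build_script_argv(char *line, const char *script_path,
                      char *const old_argv[], const char **new_argv,
                      int max_args)
{
    if (strncmp(line, "#!", 2) != 0)
        return -1;                   /* not a script */

    char *p = line + 2;
    p[strcspn(p, "\n")] = '\0';      /* only the first line is used */
    p += strspn(p, " \t");           /* skip blanks before the path */
    if (*p == '\0')
        return -1;                   /* no interpreter named */

    char *interp = p;
    char *opt = NULL;
    p += strcspn(p, " \t");          /* find the end of the path */
    if (*p != '\0') {
        *p++ = '\0';
        p += strspn(p, " \t");
        if (*p != '\0') {
            /* On Linux the rest of the line is passed as ONE argument;
               trailing blanks are removed here. */
            size_t n = strlen(p);
            while (n > 0 && (p[n - 1] == ' ' || p[n - 1] == '\t'))
                p[--n] = '\0';
            opt = p;
        }
    }

    int old_count = 0;               /* arguments after old_argv[0] */
    if (old_argv[0] != NULL)
        while (old_argv[old_count + 1] != NULL)
            old_count++;

    int needed = 1 + (opt != NULL) + 1 + old_count + 1;   /* +1 for NULL */
    if (needed > max_args)
        return -1;

    int argc = 0;
    new_argv[argc++] = interp;
    if (opt != NULL)
        new_argv[argc++] = opt;
    new_argv[argc++] = script_path;  /* the original argv[0] is dropped */
    for (int i = 1; i <= old_count; i++)
        new_argv[argc++] = old_argv[i];
    new_argv[argc] = NULL;
    return argc;
}
```

Take a script starting with #!/bin/sh -e, run as myscript a b. The interpreter is started with /bin/sh, -e, the script's path, a and b.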
Loading the interpreter is done in the same way as Linux loads all of its executable
files.
Linux tries each binary format in turn until one works.
This means that you could in theory stack several interpreters and binary formats
making the Linux binary format handler a very flexible piece of software.