TCP-IP
Before we talk about the details of networking, we should first talk about the process of
network communication. Let's take a network
program such as telnet. The telnet program allows you to
log in to a remote system. You end up with a shell
just as if you had logged in locally. Although you
are inputting commands on your local keyboard and the output is appearing on your local screen, all other activity is happening on the remote machine.
For simplicity's sake, we can say that there is a telnet program running on each computer.
When you type something on your local keyboard, the local copy of telnet
accepts the input. It passes the
information through the network to the telnet on the remote machine. The
command is executed and the output is handed to the remote telnet. That information is passed back
through the network to the local telnet, which then displays the information on your screen.
Although it may appear as if there is a constant flow of information between your
local machine and the remote one, this is not the case. At any given time
there may be dozens, if not hundreds, of programs using the network. Since only one can use the network at
a time, there needs to be a mechanism to allow each program to have its turn.
Think back on our discussion on the kernel.
When we need something from the hard disk, the system does not read everything at once. If it did,
one process could hog the computer if it needed to read in a large file. Instead, disk requests are
sent in smaller chunks and the program only thinks that it gets everything it wants. Something
similar is done with network connections.
Computers are like human beings in that they need to speak the same language in order to
communicate. Regardless of how they are connected, be it serial or Ethernet,
the computers must know how to talk to each other. The communication is carried out in a pre-defined
manner, called a "protocol". Like the protocols diplomats and politicians go through,
computer protocols determine how each side behaves and how it should react to behavior by its
counterpart. Roughly speaking, even the interaction between the computer and the hardware, such as
the hard disk, can be considered a protocol.
The most common protocol used by UNIX variants, including Linux, is TCP/IP.
However, it is more accurate to call TCP/IP a
protocol suite, or protocol family. This is because TCP/IP actually consists
of several different protocols. Even the name consists of two different protocols as TCP/IP stands
for Transmission Control Protocol/Internet Protocol.
Because it contains many different protocols, the TCP/IP suite gives computers many different
ways to talk to each other. However, TCP/IP is not the only protocol suite.
There are dozens, if not hundreds, of different ones, although only a small
portion have gained wide acceptance. Linux itself uses only a few, although the TCP/IP family is
what is delivered by default and most commonly used.
Although the name refers to two specific protocols, TCP/IP usually refers to an entire suite of
protocols and programs. The result of many years of planning and discussion, the TCP/IP suite
includes a set of standards which specify how computers ought to communicate. By following these
standards, computers "speak" the same language and can therefore communicate. In addition to the
actual means of communication, the TCP/IP suite defines conventions for connecting different
networks and routing traffic through routers, bridges and other types of connections.
The TCP/IP suite is the result of a Defense Advanced Research Projects Agency (DARPA) research
project on network connectivity. Its wide availability has since made it the most commonly installed
network software. Many versions provide source code, which resides in the public domain,
allowing users to adapt it to many new systems. Today, essentially all vendors of network
hardware (e.g. bridges, routers) support the TCP/IP suite, as it is the standard protocol
suite on the Internet and in most companies.
Whereas the data being transferred to and from the hard disk is talked about in terms of blocks,
the unit of information transfer across a network connection is referred to as
a packet. Depending on the program you are using, packets can be
different sizes. In any event, they are small enough to cross the network
quickly, so that no one process hogs the network. In addition, the packets go across the network
so fast that you don't notice that your data is broken into packets. This is similar to the way the
CPU manages processes. Each one gets a very small turn on the processor.
Because it switches so fast between processes, it only seems like you have the processor to
yourself.
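To make the idea of breaking data into packets more concrete, here is a minimal sketch in Python. The function names, the sequence numbers and the payload size of 512 bytes are assumptions chosen for illustration; they are not taken from any particular protocol.

```python
def split_into_packets(data: bytes, payload_size: int = 512) -> list[tuple[int, bytes]]:
    """Split data into (sequence_number, payload) pairs.

    payload_size is an arbitrary illustrative value, not a real protocol limit.
    """
    if payload_size <= 0:
        raise ValueError("payload_size must be positive")
    return [
        (seq, data[offset:offset + payload_size])
        for seq, offset in enumerate(range(0, len(data), payload_size))
    ]


def reassemble(packets: list[tuple[int, bytes]]) -> bytes:
    """Sort the packets by sequence number and join their payloads."""
    return b"".join(payload for _, payload in sorted(packets))


message = b"x" * 1200
packets = split_into_packets(message, payload_size=512)
print([len(payload) for _, payload in packets])  # [512, 512, 176]
```

The receiving side only needs the sequence numbers to put the pieces back together, even if they arrive out of order.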
If we take a step back and look at the process of network
communication more abstractly, we see each portion supported by and supporting another. We can say
that each portion sits on top of another. Or in other words the protocols are stacked on top
of each other. Therefore, TCP/IP is often referred to as a protocol
stack. To see how these layers look graphically, take a look at Figure 0-1.
Each portion of the stack is referred to as a layer. At the bottom of the stack is the
layer that is responsible for the physical connection between the two computers. This is the physical
layer. Sitting on top of the physical layer is the layer that is responsible for
the network portion of the stack. That is, it ensures that packets either stay on the local network
or get to the right network, and at the same time ensures that packets get to the right network
address. This is the network layer.
On top of the network layer is the layer that ensures that the packets have been transmitted correctly.
That is, there are no errors and all packets have been received. This is the transport layer.
Finally, at the top of all of this is the layer that the user sees. Since the programs that we use
are often called applications, this upper layer is called the application layer.
Figure 0-1: Network Layers
Conceptually, each layer is talking to its counterpart on the other system. That is, telnet on
the local machine is passing data to telnet on the remote machine. TCP on the
remote machine sends an acknowledgment to TCP on the local machine when it receives a
packet. IP on the local machine gets information from IP
on the remote machine that tells it that this packet is destined for the local machine. Then there
are the network interface cards that communicate with each other using their
specific language.
This communication between corresponding layers is all conceptual. The actual communication
takes place between the different layers on each machine, not the corresponding layers on
both machines.
When the application layer has data to send, it prepends an application header onto the data it needs to send.
This header contains information necessary for the application to get the data
to the right part of the application on the receiving side. The application then calls up
TCP to send the information along. TCP wraps that data into a TCP
packet, which contains a TCP header followed by the application data
(including header). TCP then hands the packet (also called a TCP segment) to
IP. Like the layers before it, IP wraps the packet up and prepends an IP
header, to create an IP datagram. Finally, IP hands it off to the hardware driver. If the hardware is
Ethernet, the driver adds both an Ethernet header and an Ethernet trailer. This
creates an Ethernet frame. To see how the encapsulation looks graphically,
take a look at Figure 0-2.
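The wrapping described above can be sketched in a few lines of Python. The bracketed labels such as [TCP] and [ETH] are placeholders invented for this illustration; real headers are binary structures with many fields, and the trailer label [FCS] simply stands in for the Ethernet trailer.

```python
def encapsulate(payload: bytes, header: bytes, trailer: bytes = b"") -> bytes:
    """Wrap a payload with a header in front and an optional trailer behind."""
    return header + payload + trailer


# Hypothetical data typed by the user; each layer wraps what the layer above gave it.
app_data = b"ls -l\n"
app_message = encapsulate(app_data, b"[APP]")                 # application header
tcp_segment = encapsulate(app_message, b"[TCP]")              # TCP header
ip_datagram = encapsulate(tcp_segment, b"[IP]")               # IP header
ethernet_frame = encapsulate(ip_datagram, b"[ETH]", b"[FCS]")  # header and trailer

print(ethernet_frame)  # b'[ETH][IP][TCP][APP]ls -l\n[FCS]'
```

Reading the result from left to right shows the order in which the receiving machine will unwrap the layers: hardware first, application last.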
As we see, it is the TCP layer that the application
talks to. TCP sticks the data from the application into a kind of envelope (the process is called
encapsulation) and passes it to the IP layer. Just as the
operating system has a mechanism to keep track of which area of memory belongs
to what processes, the network has a means of keeping track of what data
belongs to what process. This is the job of TCP. It is also the responsibility of TCP to ensure
that the packets are delivered with the correct contents and then to put them in the right order.
(Encapsulation is shown graphically in Figure 0-2.)
Error detection is the job of the TCP envelope, which contains a checksum
of the data contained within the packet. This checksum information sits in the packet header
and is checked on all packets. If the checksum doesn't match the contents of the packet or the
packet doesn't arrive at all, it is the job of TCP to ensure that packet is resent. On the sending
end, TCP waits for an acknowledgment that each packet has been received. If it hasn't received one
within a specific period it will resend that packet. Because of this checksum and the resending of
packets, TCP is considered a reliable connection.
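As an illustration of how such a checksum can work, the following Python sketch computes the 16-bit one's-complement checksum described in RFC 1071, which is the kind of checksum used by TCP. It is a simplification: a real TCP checksum also covers the TCP header and a pseudo-header built from IP information, whereas this sketch only looks at the bytes it is given. The function and variable names are assumptions made for the example.

```python
def internet_checksum(data: bytes) -> int:
    """Return the 16-bit one's-complement checksum of data (RFC 1071 style)."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length data with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # add each 16-bit word
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold the carries back in
    return ~total & 0xFFFF


payload = bytes.fromhex("0001f203f4f5f6f7")
sent_checksum = internet_checksum(payload)
print(hex(sent_checksum))  # 0x220d

# The receiver recomputes the checksum; a mismatch means the packet was damaged.
corrupted = b"\xff" + payload[1:]
print(internet_checksum(corrupted) == sent_checksum)  # False
```

When the comparison fails, or no acknowledgment arrives in time, the sender simply transmits the packet again.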
Figure 0-2: Encapsulation of data
Another protocol that is often used is the User Datagram Protocol (UDP). Like TCP, UDP sits on top of IP.
However, UDP provides a connectionless transport between applications. Services that use UDP, such as
the Network File System (NFS), must provide their own mechanism to ensure
delivery and correct sequencing of packets. Since it can be either broadcast or multicast, UDP also
offers one-to-many services. Because UDP itself neither acknowledges nor resends packets, it is considered
unreliable.
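The connectionless nature of UDP is easy to see with Python's standard socket module. The sketch below sends a single datagram over the loopback address 127.0.0.1 without setting up a connection first; the function name and the two second timeout are assumptions chosen for the example. Nothing in it acknowledges or resends the datagram, so an application that needs delivery guarantees has to add them itself.

```python
import socket


def udp_roundtrip(message: bytes) -> bytes:
    """Send one datagram to a local receiver and return what was received."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as receiver, \
         socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sender:
        receiver.bind(("127.0.0.1", 0))   # port 0: let the system pick a free port
        receiver.settimeout(2.0)          # do not wait forever for a lost datagram
        # No connect or accept step: the datagram is simply addressed and sent.
        sender.sendto(message, receiver.getsockname())
        data, _source = receiver.recvfrom(4096)
        return data


print(udp_roundtrip(b"hello over UDP"))
```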
Closest to the hardware level, IP is a protocol
that provides the delivery mechanism for the other protocols. The IP layer serves the same function as
your house address, telling the upper layers how to get to where they need to go. In fact, the
pieces of information used by IP to get data to its destination are called IP
addresses. However, IP does not guarantee that the packets arrive in the right order or that they
arrive at all. Just as a letter to your house must be sent by registered mail to ensure that
it gets delivered with the contents intact, IP depends on the upper layers to ensure the integrity
and sequencing of the packets. Therefore, IP is considered unreliable.
Since the hardware, that is, the network
cards, does the actual, physical transfer of the packets, it is important that the cards can be addressed
somehow. Each card has its own unique identifier. This is the Media Access Control,
or MAC, address. The MAC address
is a 48-bit number that is usually represented by six pairs of hexadecimal
digits, separated by (usually) dashes or colons. Each manufacturer of network cards is assigned a
specific range of addresses, which is usually identified by the first three pairs of digits. The
remaining three pairs identify the individual card.
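The structure of a MAC address can be shown with a short Python sketch. The helper names are assumptions made for this example; the code accepts either dashes or colons as separators and also accepts pairs written without a leading zero.

```python
def parse_mac(text: str) -> bytes:
    """Convert a MAC address such as '0:0:2:c:8c:d2' into its 6 raw bytes."""
    parts = text.replace("-", ":").split(":")
    if len(parts) != 6:
        raise ValueError(f"expected 6 groups, got {len(parts)}")
    return bytes(int(part, 16) for part in parts)


def format_mac(mac: bytes) -> str:
    """Format raw bytes as colon-separated pairs of hexadecimal digits."""
    return ":".join(f"{byte:02x}" for byte in mac)


def manufacturer_prefix(mac: bytes) -> str:
    """Return the first three pairs, which identify the manufacturer's range."""
    return format_mac(mac[:3])


mac = parse_mac("0:0:2:c:8c:d2")
print(len(mac) * 8)              # 48
print(format_mac(mac))           # 00:00:02:0c:8c:d2
print(manufacturer_prefix(mac))  # 00:00:02
```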
When sending a packet, the IP
layer has to figure out how to send the packet. If the destination is on a different physical
network, then IP needs to send it to the appropriate gateway.
However, if the destination machine is on the local network, the IP layer uses the Address
Resolution Protocol (ARP) to determine the MAC address of the
Ethernet card that has that IP address.
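The decision between "local network" and "gateway" can be sketched with Python's standard ipaddress module. This is a simplification under stated assumptions: the network 194.113.47.0/24 and the gateway address 194.113.47.1 are made up for the example, and a real system consults a routing table rather than a single network.

```python
import ipaddress


def next_hop(destination: str, local_network: str, gateway: str) -> str:
    """Return the address the packet should be handed to on the local network.

    If the destination is on the local network, it is delivered directly;
    otherwise it goes to the gateway.
    """
    network = ipaddress.ip_network(local_network)
    if ipaddress.ip_address(destination) in network:
        return destination
    return gateway


LOCAL_NETWORK = "194.113.47.0/24"  # assumed for illustration
GATEWAY = "194.113.47.1"           # hypothetical gateway address

print(next_hop("194.113.47.147", LOCAL_NETWORK, GATEWAY))  # 194.113.47.147
print(next_hop("10.0.0.5", LOCAL_NETWORK, GATEWAY))        # 194.113.47.1
```

In both cases the address that comes back is the one whose MAC address has to be found before the packet can leave the machine.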
To figure this out, ARP will broadcast an ARP packet across the entire local network
asking which MAC address belongs to a particular IP
address. Although every machine gets this broadcast, only the one that matches will
respond. The answer is then stored by the IP layer in its internal ARP table. You can look at the ARP
table at any time by running the command:
arp -a
This would give you a response similar to:
siemau 194.113.47.147 at 0:0:2:c:8c:d2
This has the general format:
<machine name> (IP address) at <MAC address>
Since the ARP table is cached, IP
does not have to send out an ARP request every time it needs to make a connection. Instead, it can
quickly look in the ARP table to make the IP-MAC translation. Then, the packet
is sent to the appropriate machine.
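The benefit of caching can be sketched as follows. The class and its names are hypothetical, and the "broadcast" is simulated with a dictionary lookup instead of real network traffic. Real ARP tables also expire old entries after a while, which this sketch leaves out.

```python
from typing import Callable, Dict


class ArpCache:
    """A toy ARP table: ask the network once, then answer from the cache."""

    def __init__(self, broadcast_request: Callable[[str], str]) -> None:
        self._broadcast_request = broadcast_request
        self._table: Dict[str, str] = {}
        self.requests_sent = 0

    def resolve(self, ip_address: str) -> str:
        """Return the MAC address for ip_address, broadcasting only on a miss."""
        if ip_address not in self._table:
            self.requests_sent += 1
            self._table[ip_address] = self._broadcast_request(ip_address)
        return self._table[ip_address]


# Stand-in for the machines that would answer a real ARP broadcast.
hosts_on_network = {"194.113.47.147": "0:0:2:c:8c:d2"}

cache = ArpCache(hosts_on_network.__getitem__)
first = cache.resolve("194.113.47.147")
second = cache.resolve("194.113.47.147")
print(first, second, cache.requests_sent)  # 0:0:2:c:8c:d2 0:0:2:c:8c:d2 1
```

Two lookups produce only one simulated request, which is exactly the saving the ARP table provides.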
Status and error information is exchanged between machines through the Internet Control Message
Protocol (ICMP). This information can be used by other protocols to recover from transmission
problems or by system administrators to detect problems in the network. One of
the most commonly used diagnostic tools, "ping", makes use of ICMP.
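To give an idea of what an ICMP message looks like, the following Python sketch builds the kind of "echo request" that ping sends. The layout (type 8, code 0, checksum, identifier, sequence number) follows the ICMP specification in RFC 792, while the function names, the identifier and the payload are assumptions made for the example. The sketch only builds the bytes; actually sending them requires a raw socket and usually administrator privileges.

```python
import struct

ICMP_ECHO_REQUEST = 8  # ICMP message type for an echo request


def ones_complement_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum, as used by ICMP."""
    if len(data) % 2:
        data += b"\x00"
    total = sum((data[i] << 8) | data[i + 1] for i in range(0, len(data), 2))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_echo_request(identifier: int, sequence: int, payload: bytes = b"") -> bytes:
    """Build an ICMP echo request: 8-byte header followed by the payload."""
    # First pack the header with a zero checksum, then fill in the real value.
    header = struct.pack("!BBHHH", ICMP_ECHO_REQUEST, 0, 0, identifier, sequence)
    checksum = ones_complement_checksum(header + payload)
    header = struct.pack("!BBHHH", ICMP_ECHO_REQUEST, 0, checksum, identifier, sequence)
    return header + payload


packet = build_echo_request(identifier=0x1234, sequence=1, payload=b"ping test")
print(len(packet))                       # 17
print(ones_complement_checksum(packet))  # 0 means the packet checks out
```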
At the bottom of the pile is the hardware or link layer. As I mentioned before, this can be
represented by many different kinds of physical connections: Ethernet,
token-ring, fiber-optics, ISDN, RS-232 to name a few.
This four-layer model is common when referring to computer networks. This is the model that is
most commonly used and the one that I will use throughout the book. There is another model that
consists of seven layers. This is referred to as the OSI model, but we won't be using it here.