Accessing the Web | The Linux Tutorial

Accessing the Web

It was difficult to decide where to put this topic. You can’t access the Web without networking; however, it loses much of its impact unless you are using a graphical interface like X. Because the Web is a network of machines accessed in a common manner, I figured the networking chapter would be the best place to talk about it. I think this is a good choice, since there are also character-based programs that do not require X.

So what is the Web? Well, as I just mentioned, it is a network of machines. Not all machines on the Internet are part of the Web, but we can safely say that all machines on the Web are part of the Internet. “The Web” is short for the World Wide Web, and, as its name implies, it connects machines all over the world.

Created in 1989 at the internationally renowned CERN research lab in Switzerland, the Web was originally conceived as a means of linking physicists from all over the world. Because it is easy to use and to integrate into an existing network, the Web has grown into a community of tens of thousands of sites with millions of users accessing it. With the integration of Web access software, on-line services have opened the Web up to millions of people who couldn’t have used it before.

What the Web really is, is a vast network of inter-linked documents, or resources. These resources may be pure text, but they can also include images, sound and even video. The links between resources are made using hypertext. Now, hypertext is not something new. It has been used for years in on-line help systems, such as those in MS-Windows programs. Certain words or phrases are presented in a different format (often a different color, or underlined). These words or phrases are linked to other resources. When you click on one of them, the linked resource is called up. This resource could be the next page, a graphics image, or even a video.

Resources are loaded from their source by means of the Hypertext Transfer Protocol, HTTP. In principle, this is very much like FTP, in that resources are files that are transferred to the requesting site. It is then up to the requesting application to make use of that resource, such as displaying an image or playing an animation. In many cases, files are actually retrieved using ftp instead of HTTP, and the application simply saves the file on the local machine.
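To make the request/response idea concrete, here is a small Python sketch of what an HTTP exchange looks like as text. It assumes the simple HTTP/1.0 request format and uses the placeholder host www.domain.name; the function names are hypothetical, no network connection is made, and the reply bytes are a canned example rather than output from a real server.

```python
# Sketch of an HTTP/1.0 exchange as plain text. The host, path and the
# canned reply below are placeholders; nothing is sent over a network.


def build_get_request(host: str, path: str = "/") -> bytes:
    """Return the bytes a client would send to ask for one resource."""
    lines = [
        f"GET {path} HTTP/1.0",  # method, resource, protocol version
        f"Host: {host}",
        "",  # an empty line ends the request headers
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def split_response(raw: bytes) -> tuple[int, dict[str, str], bytes]:
    """Split a raw HTTP response into status code, headers and body."""
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    status_code = int(status_line.split()[1])
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status_code, headers, body


request = build_get_request("www.domain.name", "/index.html")
print(request.decode("ascii"))

raw_reply = b"HTTP/1.0 200 OK\r\nContent-Type: text/html\r\n\r\n<p>Hello</p>"
status, headers, body = split_response(raw_reply)
print(status, headers["content-type"], body)
```

The body comes back as raw bytes on purpose: as the text says, it is up to the requesting application to decide what to do with the resource.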

The application that is used to access the Web is called a Web browser. Web resources are provided by Web servers. A Web server is simply a machine running the HTTP daemon, httpd. Like other network daemons, httpd receives requests from a Web client (such as Mozilla or Konqueror) and sends back the requested resource.
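The request/response cycle between a browser and a Web server can be tried on a single machine with Python’s standard library. This is a sketch assuming Python 3.7 or later: the http.server module stands in for httpd here (it is a different and much simpler program), start_server is a hypothetical helper name, and urllib.request plays the part of the Web client.

```python
# Toy Web server built on Python's standard http.server module. It plays
# the role the text gives to httpd: wait for requests from a client and
# send back the requested file. Not intended for production use.
import functools
import http.server
import pathlib
import tempfile
import threading
import urllib.request


def start_server(document_root: str, port: int = 0) -> http.server.HTTPServer:
    """Serve files from document_root on localhost in a background thread.

    port=0 asks the operating system for any free port.
    """
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=document_root
    )
    server = http.server.HTTPServer(("127.0.0.1", port), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


if __name__ == "__main__":
    # Create a one-page "site" in a temporary directory.
    root = tempfile.mkdtemp()
    pathlib.Path(root, "index.html").write_text("<p>Hello, Web</p>")

    server = start_server(root)
    port = server.server_address[1]
    # Act as the Web client: request the page and print what comes back.
    with urllib.request.urlopen(f"http://127.0.0.1:{port}/index.html") as reply:
        print(reply.read().decode())
    server.shutdown()
    server.server_close()
```

Binding to 127.0.0.1 keeps the toy server invisible to other machines on the network.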

Like the ftp daemon, httpd is a relatively secure means of allowing anonymous access to your system. You can define a root directory, which, as with ftp, prevents users from going “above” it. Access to files or directories can be restricted on a per-machine basis, and you can even provide password control over files.
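Per-machine access control boils down to checking the client’s address before serving anything. The sketch below assumes access is granted by network address ranges; the function name and the example networks are made up, and password control is not shown. Real servers express this in their own configuration files rather than in code like this.

```python
# Hypothetical sketch of per-machine access control: a request is accepted
# only if the client's address falls inside one of the allowed networks.
import ipaddress


def is_allowed(client_ip: str, allowed_networks: list[str]) -> bool:
    """Return True if client_ip belongs to any of the allowed networks."""
    address = ipaddress.ip_address(client_ip)
    return any(
        address in ipaddress.ip_network(network) for network in allowed_networks
    )


# Example allow-list: a hypothetical local network plus the machine itself.
ALLOWED = ["192.168.42.0/24", "127.0.0.1/32"]
print(is_allowed("192.168.42.17", ALLOWED))  # True
print(is_allowed("10.0.0.5", ALLOWED))       # False
```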

When httpd starts, it reads its configuration files and begins listening for requests from a document viewer (one that uses the HTTP protocol). When a document is requested, httpd checks for the file relative to the DocumentRoot (defined in srm.conf).
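Looking up a file relative to the DocumentRoot, while keeping clients from climbing “above” it, can be sketched in Python as follows. This illustrates the idea under stated assumptions and is not httpd’s actual code: resolve_path is a hypothetical name, the URL path is assumed to have had any query string removed already, POSIX-style paths are assumed, and /srv/www is only an example root.

```python
# Sketch of mapping a requested URL path onto a file below the document
# root while refusing any path that resolves to somewhere "above" it.
# resolve_path and /srv/www are hypothetical; POSIX paths are assumed.
import os
from urllib.parse import unquote


def resolve_path(document_root: str, url_path: str) -> str:
    """Return the file that url_path names, or raise PermissionError."""
    root = os.path.realpath(document_root)
    # Decode %xx escapes (so "%2e%2e" becomes "..") and drop leading slashes
    # so that the path is joined relative to the root.
    relative = unquote(url_path).lstrip("/")
    # realpath collapses ".." components and follows symbolic links.
    candidate = os.path.realpath(os.path.join(root, relative))
    # Whatever the request said, the result must still lie inside the root.
    if os.path.commonpath([root, candidate]) != root:
        raise PermissionError(f"{url_path!r} is outside the document root")
    return candidate


if __name__ == "__main__":
    print(resolve_path("/srv/www", "/manuals/index.html"))
    try:
        resolve_path("/srv/www", "/../../etc/passwd")
    except PermissionError as error:
        print("refused:", error)
```

Comparing with os.path.commonpath rather than a plain string prefix test matters: a sibling directory such as /srv/www2 starts with the same characters as /srv/www but is not inside it.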

Web pages are written in the Hypertext Markup Language (HTML). An HTML page is a “plain-text” file that can be edited with any editor, such as vi. Recently, as a result of the increasing popularity of the Web, dozens, if not hundreds, of commercial HTML editors have become available. The HTML commands are similar to, but simpler than, those used by troff. In addition to formatting commands, there are built-in commands that tell the Web browser to go out and retrieve a document. You can also create links to specific locations (labels) within that document. Access to the document is by means of a Uniform Resource Locator (URL).
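Because HTML is plain text, a program can find the links in a page just as a browser does. This sketch uses Python’s standard html.parser module to collect link targets (href) and labels defined with the older &lt;a name=...&gt; form; the sample page and the LinkCollector class are invented for illustration.

```python
# Sketch: collect hypertext links and labels from an HTML page using
# Python's standard html.parser module. The sample page is invented.
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect link targets (href) and named labels (name) from <a> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []
        self.labels: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)  # attrs is a list of (name, value) pairs
        if attributes.get("href"):
            self.links.append(attributes["href"])
        if attributes.get("name"):
            self.labels.append(attributes["name"])


PAGE = """<html><body>
<h1><a name="top">Accessing the Web</a></h1>
<p>See the <a href="http://www.domain.name/index.html">home page</a>
or jump to <a href="#top">the top of this page</a>.</p>
</body></html>"""

collector = LinkCollector()
collector.feed(PAGE)
print(collector.links)   # ['http://www.domain.name/index.html', '#top']
print(collector.labels)  # ['top']
```

The "#top" link points at the label inside the same document, which is how links to specific locations within a page are written.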

There are several types of URLs that perform different functions. Several different protocols can be used to access these resources, such as ftp, http, gopher, or even telnet, and the protocol name forms the first part of the URL. If you leave off the protocol name, the Web browser assumes that the URL refers to a file on your local system. However, just as with ftp or telnet, you can explicitly refer to the local machine. I encourage using absolute names like that, as they make transferring Web pages that much easier.
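The parts of a URL can be pulled apart with Python’s urllib.parse.urlparse. In this sketch the describe helper is hypothetical, and the rule that a URL without a protocol name means a local file is modeled explicitly to mirror the browser behavior described above; a file: URL is one standard way to name a local file explicitly. The host names are placeholders.

```python
# Sketch: split URLs into protocol, host and path. A URL with no protocol
# name is treated as a local file, mirroring the browser behavior above.
from urllib.parse import urlparse


def describe(url: str) -> tuple[str, str, str]:
    """Return (protocol, host, path) for url, defaulting to a local file."""
    parts = urlparse(url)
    scheme = parts.scheme or "file"  # no protocol name: assume a local file
    return scheme, parts.netloc, parts.path


for url in [
    "http://www.domain.name/index.html",
    "ftp://ftp.domain.name/pub/README",
    "gopher://gopher.domain.name/",
    "telnet://host.domain.name",
    "file:///home/user/index.html",  # explicit reference to the local machine
    "index.html",                    # no protocol name at all
]:
    print(url, "->", describe(url))
```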

All that you need to access the Web is an Internet connection. If you can use ftp and telnet, then you can probably use the Web. So, assuming you have a Web browser and an Internet connection, the question is: where do you go? This is comparable to “Given an unlimited-value plane ticket, where would you go on vacation?” The sky is the limit.

As I mentioned, the convention is that the Web server’s machine name is www.domain.name. To access its home page, the URL would be http://www.domain.name. For example, to get to your own home page, the URL is http://www.yourdomain.com. To keep from typing so much, I will simply refer to the domain name, and you can expand it the rest of the way. In some cases, where the convention is not followed, I’ll give you the missing information.
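The naming convention amounts to a simple string expansion. This hypothetical helper applies it; it cannot know whether a given site actually follows the convention, so for sites that don’t, you still need the full URL.

```python
# Hypothetical helper applying the www.<domain> naming convention.
# It only expands the name; it does not check that such a server exists.
def home_page_url(domain: str) -> str:
    """Turn 'domain.name' into 'http://www.domain.name'."""
    domain = domain.strip().lower()
    if not domain.startswith("www."):
        domain = "www." + domain
    return "http://" + domain


print(home_page_url("yourdomain.com"))      # http://www.yourdomain.com
print(home_page_url("www.yourdomain.com"))  # http://www.yourdomain.com
```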

I remember when comet Shoemaker-Levy 9 was making history by plowing into the backside of Jupiter. The Jet Propulsion Laboratory had a Web site on which they regularly updated the images of Jupiter. I still remember my friends asking me if I had seen the “latest” images. If they were more than three hours old, I would shrug them off as ancient history. With the explosion of the Internet and the spread of webcams, it is now possible to get live images from all over the world, directly on your desktop.

The issue of Usenet newsgroups opens up a whole can of worms. Without oversimplifying too much, we could say that Usenet was the first nationwide on-line bulletin board. Whereas more commercial services like CompuServe store their messages in a central location, Usenet is based on the “store and forward” principle. That is, messages are stored on one machine and forwarded to the next at regular intervals. If those intervals are long, it may be hours or even days before a message has propagated to every site.
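The effect of forwarding intervals on propagation time can be shown with a toy model of store and forward. The assumptions are deliberately simple and do not describe how any real news software schedules transfers: the sites form a single chain, each site forwards its stored articles in a batch at fixed multiples of its interval (in hours), and transfers themselves take no time. arrival_times is a hypothetical name.

```python
# Toy model of "store and forward": each site holds incoming articles and
# passes them on only at its next scheduled batch. Batches run at whole
# multiples of each site's interval, and transfers take zero time.
import math


def arrival_times(post_hour: float, intervals: list[float]) -> list[float]:
    """Return the hour at which an article reaches each site in a chain.

    intervals[i] is how often site i forwards its stored articles to
    site i + 1. The first entry of the result is the posting site itself.
    """
    times = [post_hour]
    for interval in intervals:
        arrived = times[-1]
        # The article waits at this site until the next batch run.
        next_batch = math.ceil(arrived / interval) * interval
        times.append(next_batch)
    return times


# An article posted at hour 7 passes through sites that forward every
# 6, 24 and 16 hours respectively.
print(arrival_times(7, [6, 24, 16]))  # [7, 12, 24, 32]
```

Here the article needs 25 hours to reach the fourth site even though each transfer is instantaneous; all of the delay is time spent waiting for the next batch.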

Messages are organized into a hierarchical tree structure, very much like many things in UNIX (although you don’t have to be running a UNIX machine to access Usenet). Groups range from things like rec.arts.startrek.fandom to alt.sex.bondage to comp.unix.admin.
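The dotted names map directly onto a tree, much as slash-separated path names map onto the UNIX directory tree. This sketch builds the hierarchy as nested Python dictionaries; the function names are hypothetical, and comp.unix.questions is added to groups from the text so that two groups share a branch.

```python
# Sketch: build the newsgroup hierarchy as nested dictionaries by splitting
# each group name at its dots. build_tree and show are hypothetical names.


def build_tree(groups: list[str]) -> dict:
    """Return a nested dict with one level per component of a group name."""
    tree: dict = {}
    for group in groups:
        node = tree
        for part in group.split("."):
            # Reuse the branch if it already exists, otherwise create it.
            node = node.setdefault(part, {})
    return tree


def show(tree: dict, depth: int = 0) -> None:
    """Print the tree, one name component per line, indented by depth."""
    for name in sorted(tree):
        print("  " * depth + name)
        show(tree[name], depth + 1)


groups = [
    "rec.arts.startrek.fandom",
    "comp.unix.admin",
    "comp.unix.questions",
]
show(build_tree(groups))
```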

Although I would love to go into more detail, this really goes beyond the scope of this book. Instead, I would like to recommend Using UUCP and Usenet by Grace Todino and Dale Dougherty, and Managing UUCP and Usenet by Tim O’Reilly and Grace Todino, both from O’Reilly and Associates. In addition, there is a relatively new book that goes into more detail about how Usenet is organized, what newsgroups are available, and some general information about behavior and interaction with others when participating in Usenet newsgroups. This is Usenet Netnews for Everyone by Jenny Fristrup, from Prentice Hall.