Notice: This material is excerpted from Special Edition Using HTML, 2nd Edition, ISBN: 0-7897-0758-6. This material has not yet been through the final proof reading stage that it will pass through before being published in printed form. Some errors may exist here that will be corrected before the book is published. This material is provided "as is" without any warranty of any kind.
by Mark Brown
Contrary to what the media would have you believe, the World Wide Web
did not spring into being overnight. Though relatively new in human terms,
the Web has a venerable
genealogy for a computing technology. It can trace
its roots back over 25 years, which is more than half the distance back
to the primordial dawn of the electronic computing age.
However, the media is right in noting that the Web's phenomenal growth has so far outstripped that of any of its predecessors that, like a prize hog, it has left almost no room at the trough for any of them anymore. But like that prize hog, the Web is so much bigger and better and so much more valuable than the network technologies that preceded it, there is little reason to mourn the fact that they've been superseded.
In this chapter I'll discuss the history, development, and characteristics of the Web. You'll find out where it came from and what it's good for. If you're the impatient type and you want to just start using HTML to develop Web pages as quickly as possible, you can certainly skip this chapter and jump right in. However, as with all things, a little understanding of the background and underlying structure of the Web will not only enhance your enjoyment of and appreciation for what it is and what it can do, but it might even give you some insights into how to approach the development of your own Web sites.
The Web came out of the Internet, and it is both empowered and limited
by the structure of the
Internet. Today, most
Web browsers include the
capability to access other
Internet technologies, such as Gopher, e-mail,
and Usenet news, as well as the World Wide Web. So the more you know about
the Internet as a whole, as well as the Web's place in it, the better you'll
understand how to exploit the entire Net to its fullest potential.
Then, too, the Web and the Internet are more than just technology: they
are an environment in which the members of an entire cyberculture communicate,
trade, and interact. If you hope to establish your own Web site and make
yourself a part of that culture, you'd better know what you're getting
into. In a way, it's like moving to another country and trying to set up
shop; if you don't speak the lingo and learn the customs, you'll never
become a part of the community.
In the late 1950s, at the height of the Cold War, the Department of
Defense began to worry about what would happen to the nation's
communications
systems in the event of an
atomic war. It was obvious that maintaining
communications would be vital to the waging of a worldwide war, but it
was also obvious that the very nature of an all-out
nuclear conflict would
practically guarantee that the nation's existing
communications systems
would be knocked out.
In 1962, Paul Baran, a researcher at the government's RAND think tank, described a solution to the problem in a paper titled "On Distributed Communications Networks." He proposed a nationwide system of computers connected together using a decentralized network so that if one or more major nodes were destroyed, the rest could dynamically adjust their connections to maintain communications.
If, for example, a computer in Washington, D.C., needed to communicate
with one in Los Angeles, it might normally pass the information first to
a computer in Kansas City, then on to L.A. But if Kansas City were destroyed
or knocked out by an
A-bomb blast, the Washington computer could reroute
its communications through, say, Chicago instead, and the data would still
arrive safely in L.A. (though too late to help the unfortunate citizens
of Kansas City).
The proposal was discussed, developed, and expanded by various members
of the computing community. In 1969, the first packet-switching network
was funded by the
Pentagon's Advanced Research Projects Agency (ARPA).
So What's Packet Switching?
Packet switching is a method of breaking up data files into small pieces-usually only a couple of kilobytes or less-called packets, which can then be transmitted to another location. There, the packets are reassembled to recreate the original file. Packets don't have to be transmitted in order or even by the same route. In fact, the same packet can be transmitted by several different routes just in case some don't come through. The receiving software at the other end throws away duplicate packets, checks to see if others haven't come through (and asks the originating computer to try to send them again), sorts them into their original order, and puts them back together again into a duplicate of the original data file. Although this isn't the fastest way to transmit data, it is certainly one of the most reliable.
Packet switching also enables several users to send data over the same connection by interleaving packets from each data stream, routing each to its own particular destination.
Besides the original file data, data packets may include information about where they came from, the places they've visited in transit, and where they're going to. The data they contain may be compressed and/or encrypted. Packets almost always also include some kind of information to indicate whether the data that arrives at the destination is the same data that was sent in the first place.
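If you're curious about the mechanics, the short Python sketch below (a modern illustration, not anything from the ARPAnet era) captures the essentials: each packet carries a sequence number, an expected total, and a checksum, and the receiving end throws away duplicates, notices anything missing, verifies each piece, and puts the rest back in the right order.

import hashlib
import random

def make_packets(data, size=1024):
    # Break the data into numbered packets, each tagged with a checksum.
    total = -(-len(data) // size)          # number of packets to expect
    return [{"seq": seq,
             "total": total,
             "checksum": hashlib.md5(data[start:start + size]).hexdigest(),
             "payload": data[start:start + size]}
            for seq, start in enumerate(range(0, len(data), size))]

def reassemble(received):
    # Discard duplicates, check each piece, and restore the original order.
    unique = {}
    for pkt in received:
        if hashlib.md5(pkt["payload"]).hexdigest() != pkt["checksum"]:
            raise ValueError("packet %d was damaged in transit" % pkt["seq"])
        unique[pkt["seq"]] = pkt["payload"]              # duplicates overwrite harmlessly
    missing = set(range(received[0]["total"])) - set(unique)
    if missing:
        raise ValueError("ask the sender to retransmit %s" % sorted(missing))
    return b"".join(unique[seq] for seq in sorted(unique))

message = b"Hello from Washington, D.C. " * 200
packets = make_packets(message)
random.shuffle(packets)                    # packets arrive out of order...
packets.append(random.choice(packets))     # ...and one of them arrives twice
assert reassemble(packets) == message      # the file comes back intact anyway

Real protocols add timers, acknowledgments, and retransmission requests on top of this, but the bookkeeping is essentially the same.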
ARPAnet, as it was called, linked four research facilities: the University of California at Los Angeles (UCLA), the Stanford Research Institute (SRI), the University of California at Santa Barbara (UCSB), and the University of Utah. By 1971, ARPAnet had grown to include 15 nodes; there were a grand total of 40 by 1972. That year also marked the creation of the InterNetworking Working Group (INWG), which was needed to establish common protocols for the rapidly growing system.
For more on the history of the Internet, consult Bruce Sterling's excellent article on the topic at gopher://oak.zilker.net:70/00/bruces/F_SF_Science_Column/F_SF_Five_.
Because ARPAnet was decentralized, it was easy for computer administrators
to add their machines to the network. All they needed was a phone line,
a little hardware, and some free NCP (Network Control Protocol)
software. Within just a few years, there were over a hundred mainframe
computers connected to ARPAnet, including some overseas.
ARPAnet immediately became a forum for the exchange of information and
ideas. Collaboration among scientists and educators was the number one
use of the system, and the main incentive for new sites to want to be connected.
Thus, it is not surprising that the first major application developed for
use on the ARPAnet was electronic mail.
With the advent of Ray Tomlinson's e-mail system in 1972, researchers
connected to the Net could establish one-on-one communication links with
colleagues all over the world and could exchange ideas and research at
a pace never before imagined. With the eventual addition of the ability
to send mail to multiple recipients, mailing lists were born and users
began open discussions on a multitude of topics, including "frivolous"
topics, such as science fiction.
There are thousands of mailing lists you can subscribe to on the Internet today, covering topics as diverse as PERL programming and dog breeding. For a list of some of the many mailing lists available on the Net, check out Stephanie de Silva's list of Publicly Accessible Mailing Lists, updated monthly, at http://www.neosoft.com/internet/paml/, the list of LISTSERV lists at http://tile.net/listserv/, or the forms-searchable Liszt database of 25,000 mailing lists at http://www.liszt.com/.
E-mail has proven its value over time and has remained one of the major
uses of the
Net. In fact, e-mail is now handled internally by many World
Wide Web browsers, such as Netscape 2.0 (see
fig. 1.1), so a separate e-mail program is not required.
Fig. 1.1
Reading or sending e-mail with
Netscape Navigator 2.0 brings
up a separate
e-mail window, shown here.
See "Linking HTML Documents"
You can find answers to most of your questions about
Internet e-mail in the
directory of e-mail FAQs at ftp://ftp.uu.net/usenet/news.answers/mail/.
Deciphering Internet e-mail addresses can be a bit challenging. Like
a letter sent through the mail, an
electronic mail message must be sent
to a specific address (or list of addresses). The format for an e-mail
address is name@site (which is verbalized as "name at site").
The name portion of the address is the recipient's personal e-mail
account name. At many sites, this may be the user's first initial and last
name. For example, my
e-mail account name is
mbrown. However, e-mail names can be anything from an obscure set of numbers and/or letters (70215.1034) to a funky nickname (spanky). (One nearly ubiquitous
e-mail name is
webmaster. This
generic name is used by Webmasters
at most of the
Web sites in the world.)
The site portion of an e-mail address is the
domain name of the server
that the account is on. For example, all
America Online users are at
aol.com,
and all
CompuServe users are at
compuserve.com. I'm at neural.com, so my
complete
e-mail address is mbrown@neural.com.
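If you ever need to take an address like that apart in a program, the two halves separate cleanly at the @ sign. Here is a tiny Python sketch, using the address above purely as sample data:

name, site = "mbrown@neural.com".split("@")   # split at the @ sign
print(name)   # mbrown      (the personal account name)
print(site)   # neural.com  (the domain name of the mail server)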
If you don't know someone's e-mail address, there are a variety of "white pages" services available on the Web for looking them up. As always, a good list of such services can be found on Yahoo! at http://www.yahoo.com/Reference/White_Pages/. My current favorite is the Internet Address Finder at http://www.iaf.net/ (see fig. 1.2).
For more information on Internet e-mail addresses, including lists of domain names for many popular online services, see John J. Chew's and Scott Yanoff's interactive forms-based "Inter-Network Mail Guide" at http://alpha.acast.nova.edu/cgi-bin/inmgq.pl.
The Internet Address Finder can be used to find the
e-mail addresses
of over 3.5 million
Internet users.
A logical extension of the mailing list is the interactive conference,
or
newsgroup. The concept of
interactive conferencing actually predates
the existence of the computers to do it on; it was first proposed by Vannevar
Bush in an article titled "As We May Think" in the Atlantic
Monthly in 1945 (v176(1), pp. 101-108).
The first actual online conferencing system was called Delphi
(after the Greek oracle), and it debuted in 1970. Though slow, it did enable
hundreds of researchers at multiple locations to participate in an organized,
ongoing, international discussion group. It is not an exaggeration to say
that it revolutionized the way research is done.
In 1976, AT&T Bell Labs added UUCP (UNIX-to-UNIX CoPy) to
the
UNIX V7 operating system. Tom Truscott and
Jim Ellis of Duke University
and Steve Bellovin at the
University of North Carolina developed the first
version of
Usenet, the UNIX User Network, using UUCP and
UNIX shell scripts
and connected the two sites in 1979.
Usenet quickly became the
online conferencing
system of choice on the Net. In 1986, the
Network News Transfer Protocol
(NNTP) was created to improve
Usenet news performance over
TCP/IP networks.
Since then, it has grown to accommodate more than 2.5 million people a
month and is available to over ten million users at over 200,000 sites.
Another important online conferencing system, BITNET (the "Because It's Time NETwork"), was started two years after Usenet at the City University of New York (CUNY). BITNET uses
e-mail and a group mailing list server (listserv) to distribute more than 4,000 discussion groups to thousands of users daily.
Although
BITNET traffic has peaked and is likely to be superseded completely by
Usenet at some time in the future, it still plays an important role in online conferencing.
Usenet Newsgroups
There are over 10,000 active Usenet newsgroups, all of which are organized into hierarchies by subject matter. The seven major categories are comp. (computing), misc. (miscellaneous topics), news. (Usenet itself), rec. (recreation and hobbies), sci. (science), soc. (social issues), and talk. (debate and discussion).
There are also less-official groups that may not be carried by all
Usenet sites. The following are the three most popular:
If you have a question about what a newsgroup is all about or what is appropriate to post, you can usually find a Frequently Asked Questions (FAQ) list that will give you the answer. Most of the
Usenet newsgroup FAQs are posted every month to the
newsgroup news.answers. Many
Web sites archive the most current
Usenet FAQs. ftp://ftp.uu.net/usenet/news.answers/ is a good place to start.
In some Usenet groups, it's more important to stay on topic than it is in others. For example, you really don't want the messages in a scientific research group to degenerate into flame wars over which personal computer is best. To make sure this doesn't happen, many of the more serious Usenet groups are moderated.
In a moderated group, all posted articles are first mailed to a human moderator who combs through the messages to make sure they're on topic. Appropriate messages are then posted for everyone to see, while inappropriate messages are deleted. The moderator may even e-mail
posters of inappropriate messages to warn them not to repeat their indiscretions, or may lock them out of the
newsgroup altogether.
Usenet is not the
Internet or even a part of the
Internet; it
may be thought of as operating in parallel to and in conjunction with the
Internet. While most
Internet sites carry
Usenet newsfeeds, there is no
direct or official relationship between the two. However, Usenet news has
become such an important part of computer internetworking that a
newsreader
is now built into many
Web browsers (see fig.
1.3).
Many browsers, such as Netscape 2.0, now incorporate an integral
newsreader for reading and posting to
Usenet newsgroups.
The definitive
online guide to
Usenet is the comprehensive list of
Usenet FAQs archived at http://www.cis.ohio-state.edu/hypertext/faq/usenet/usenet/top.html.
You can find Usenet newsgroups of interest using the search form at
http://www.cen.uiuc.edu/cgi-bin/find-news.
The Usenet Info Center Launch Pad at http://sunsite.unc.edu/usenet-i/
also offers a wealth of information on Usenet, including lists and indexes
of available Usenet discussion groups.
By the mid-1970s, many government agencies were on the ARPAnet, but each was running on a network developed by the lowest bidder for their specific project. For example, the Army's system was built by DEC, the Air Force's by IBM, and the Navy's by Unisys. All were capable networks, but all spoke different languages. What was clearly needed to make things work smoothly was a set of networking protocols that would tie together disparate networks and enable them to communicate with each other.
In 1974, Vint Cerf and Bob Kahn published a paper titled "
A Protocol
for
Packet Network Intercommunication" that detailed a design that would
solve the problem. In 1982, this solution was implemented as TCP/IP.
TCP stands for Transmission Control Protocol; IP is the abbreviation
for Internet Protocol. With the advent of TCP/IP, the word Internet-which
is a
shorthand for interconnected networks-entered
the language.
The TCP portion of TCP/IP provides data
transmission verification
between client and server: If data is lost or scrambled, TCP triggers
retransmission
until the errors are corrected.
You've probably heard the term socket mentioned in conjunction with TCP/IP. A socket is a package of subroutines that provide access to TCP/IP protocols. For example, most Windows systems have a file called winsock.
dll in the
windows/system directory that is required for a
Web browser or other communications program to hook up to the
Internet.
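To see what a socket looks like in practice, here is a small, self-contained Python sketch (modern code, not the winsock API itself, and the messages are arbitrary). One end listens for a TCP connection on the local machine, the other connects and exchanges a few bytes; everything TCP does to guarantee delivery happens invisibly in between.

import socket
import threading

# Set up a listening TCP socket on this machine; port 0 lets the system pick a free port.
srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.bind(("127.0.0.1", 0))
srv.listen(1)
port = srv.getsockname()[1]

def echo_once():
    conn, _addr = srv.accept()        # wait for one client to connect
    with conn:
        data = conn.recv(1024)        # TCP delivers these bytes intact and in order
        conn.sendall(data.upper())    # echo them back, shouted

threading.Thread(target=echo_once, daemon=True).start()

# The client side: this is essentially the service winsock.dll provides to a browser.
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as client:
    client.connect(("127.0.0.1", port))    # the TCP connection is established here
    client.sendall(b"hello, internet")
    print(client.recv(1024))               # b'HELLO, INTERNET'
srv.close()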
The IP portion of TCP/IP moves data packets from node to node. It decodes
addresses and routes data to designated destinations. The
Internet Protocol
(IP) is what creates the network of networks, or
Internet, by linking
systems at different levels. It can be used by small computers to communicate
across a LAN (Local Area Network) in the same room or with computer networks
around the world. Individual computers connected via a
LAN (either Ethernet
or token ring) can share the
LAN setup with both TCP/IP and other network
protocols, such as Novell or Windows for Workgroups. One computer on the
LAN then provides the
TCP/IP connection to the outside world.
The Department of Defense quickly declared the
TCP/IP suite as the standard
protocol for internetworking
military computers. TCP/IP has been ported
to most
computer systems, including
personal computers, and has become
the new standard in internetworking. It is the protocol set that provides
the infrastructure for the
Internet today.
TCP/IP comprises over 100 different protocols. It includes services for remote logon, file transfers, and data indexing and retrieval, among others.
An excellent source of additional information on TCP/IP is the Introduction to
TCP/IP Gopher site at the
University of California at Davis. Check it out at
gopher://gopher-chem.ucdavis.edu/11/Index/Internet_aw/Intro_the_Internet/intro.to.ip/.
One of the driving forces behind the development of ARPAnet was the desire to afford researchers at various locations the ability to log on to remote computers and run programs. At the time, there were very few computers in existence and only a handful of powerful supercomputers (though the supercomputers of the early 1970s were nowhere near as powerful as the desktop machines of today).
Along with e-mail, remote logon was one of the very first capabilities
built into the ARPAnet.
Today, there is less reason for logging on to a remote system and running
programs there. Most major government agencies, colleges, and research
facilities have their own computers, each of which is as powerful as the
computers at other sites.
TCP/IP provides a remote logon capability through the
Telnet
protocol. Users generally log in to a
UNIX shell account on the remote
system using a text-based or
graphics-based terminal program. With Telnet,
the user can list and navigate through directories on the remote system
and run programs.
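Here is a rough sketch of what a scripted Telnet logon looks like, using Python's telnetlib module (shipped with the standard library through Python 3.12). The host name, account, and password are placeholders, not real credentials.

import telnetlib

HOST = "shell.example.edu"            # hypothetical campus UNIX host
USER = "mbrown"
PASSWORD = "not-a-real-password"

tn = telnetlib.Telnet(HOST, 23, timeout=30)
tn.read_until(b"login: ")             # wait for the remote system's login prompt
tn.write(USER.encode("ascii") + b"\n")
tn.read_until(b"Password: ")
tn.write(PASSWORD.encode("ascii") + b"\n")
tn.write(b"ls\n")                     # list the files in the home directory
tn.write(b"exit\n")
print(tn.read_all().decode("ascii", "replace"))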
The most popular programs run on shell accounts are probably
e-mail
programs, such as PINE; Usenet news readers, such as nn or rn; and text
editors, such as vi or Emacs. Students are the most common users of Telnet
these days; professors, scientists, and administrators are more likely
to have a more direct means of access to powerful computers, such as an
X Windows terminal.
Most Web browsers don't include built-in
Telnet capabilities.
Telnet
connections are usually established using a stand-alone
terminal program,
such as that shown in figure 1.4. These programs can also provide Telnet
capabilities from within a Web browser if you configure them as helper applications.
A Telnet session can be initiated with an
Internet computer using
a stand-alone
terminal program, such as QVTNET on Windows shown here.
An excellent
online guide to
Telnet is located on the
University of Washington Library's site at http://www.lib.washington.edu/libinfo/inetguides/inet6.html.
The ability to transfer data between computers is central to the internetworking
concept. TCP/IP implements computer-to-computer data transfers through
FTP (File Transfer Protocol).
An FTP session involves first connecting to and signing on to an
FTP
server somewhere on the Net. Most
public FTP sites allow anonymous
FTP.
This means you can sign in with the user name anonymous and use
your e-mail address as your password. However, some sites are restricted
and require the use of an assigned user name and password.
Once in, you can list the files available on the site and move around through the directory structure just as though you were on your own system. When you've found a file of interest, you can transfer it to your computer using the get command (or mget for multiple files). You can also upload files to an FTP site using the put command.
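The same session is easy to sketch in code. The fragment below uses Python's standard ftplib module; the host name, directory, and file name are placeholders, so substitute any anonymous FTP site and a file it actually carries.

from ftplib import FTP

with FTP("ftp.example.org") as ftp:              # connect to the FTP server
    ftp.login("anonymous", "mbrown@neural.com")  # your e-mail address is the password
    ftp.cwd("/pub")                              # move into a directory
    ftp.retrlines("LIST")                        # list the files there
    with open("README", "wb") as local_file:     # the equivalent of the get command
        ftp.retrbinary("RETR README", local_file.write)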
The FTP process was originally designed for text-only
UNIX shell style
systems. But today, there are many
FTP programs available that go way beyond
the original
FTP capabilities, adding windows, menus, buttons, automated
uploading and downloading, site directories, and many more modern amenities.
One of the biggest lists of
FTP sites on the Web is the
Monster FTP Sites List at http://hoohoo.ncsa.uiuc.edu/ftp/.
Using Anonymous FTP to obtain
freeware and
shareware programs,
electronic
texts, and
multimedia files remains one of the most popular activities
on the Internet-so much so that
FTP capabilities are now built into most
Web browsers (see fig. 1.5).
Web browsers, such as Netscape 2.0, generally handle anonymous
FTP
too, automatically creating an on-screen directory file with icons and
clickable links.
When accessing an FTP site using a Web browser, the URL will be preceded by
ftp:// rather than the http:// shown when you're viewing a Web site.
Individual files on an FTP site are handled according to the way they
are defined in your browser's configuration setup, just as though you were
browsing a Web site. For example, if you're exploring an
FTP site and click
the link for a .gif picture file, it will be displayed in the browser window.
Text files and
HTML encoded files will be displayed too. If you have configured
helper applications for sound or video, clicking these types of files will
display them using the configured helper applications. Clicking an unconfigured
file type will generally bring up a
requester asking you to configure a
viewer or save the file to disk.
Since you most often want to save files to disk from an FTP site, not view them, you can generally get around all this by using the browser's interactive option to save a file rather than display it. For example, in Netscape you can choose to save a file rather than view it by simply holding down the Shift key before clicking the file's link.
You might wonder, with hundreds of FTP sites on the Net and millions
of files stored at those sites, how in the world can you ever hope to find
the file you're looking for? Archie is the answer. Archie is a program
for finding files stored on any anonymous FTP site on the
Internet. The
Archie Usage Guide at http://info.rutgers.edu/Computing/Network/Internet/Guide/archie.html
provides an excellent overview of Archie, including instructions on how
to find and hook up to Archie servers on the Net.
The complete list of
FTP-related FAQs is located online at
http://www.cis.ohio-state.edu/hypertext/faq/usenet/ftp-list/faq/faq.html.
Along with e-mail, remote logon, and file transfer, information indexing and retrieval was one of the original big four concepts behind the idea of internetworking.
Though there were a plethora of different data indexing and retrieval
experiments in the early days of the Net, none was ubiquitous until, in
1991, Paul Lindner and Mark P. McCahill at the University of Minnesota
created
Gopher. Though it suffered from an overly cute (but highly
descriptive) name, its technique for organizing files under an intuitive
menuing system won it instant acceptance on the Net.
Gopher treats all data as a menu, a document, an index, or a Telnet
connection. Through Telnet, one
Gopher site can access others, making it
a true internetwork application capable of delivering data to a user from
a multitude of sites via a single interface.
The direct precursor in both concept and function to the World Wide
Web,
Gopher lacks
hypertext links or
graphic elements. Its function on
the Net is being taken over by the Web, though there are currently still
several thousand Gopher sites on the Net, and it will probably be years
before Gopher disappears completely. Because so much information is still
contained in Gopher databases, the ability to navigate and view Gopherspace
is now built into most
Web browsers (see fig.
1.6).
Gopher sites like this one are displayed just fine by most
Web browsers.
When accessing a Gopher site using a Web browser, the URL will be preceded by
gopher:// rather than the http:// shown when you're viewing a Web site.
As Archie is to FTP, Veronica is to Gopher. That is, if you want to know where something is on any Gopher site on the Net, the Veronica program can tell you. For a connection to Veronica via the Web, go to http://www.scs.unr.edu/veronica.html.
Although I'm slightly embarrassed to do so, I know that I must pass along to you the information that Veronica is actually an acronym, though it is almost never capitalized as one should be. What does it stand for? Would you believe Very Easy Rodent Oriented Net-wide Index to Computerized Archives?
The
Net's best Gopher sites are on the
Gopher Jewels list at http://galaxy.einet.net/GJ/.
For more about Gopher, consult the Gopher FAQ at http://www.cis.ohio-state.edu/hypertext/faq/usenet/gopher-faq/faq.html.
With the near-universal changeover to TCP/IP protocols in the years
following 1982, the word Internet became the common term for referring
to the worldwide network of research, military, and
university computers.
In 1983, ARPAnet was divided into ARPAnet and MILNET. MILNET was soon
integrated into the Defense Data Network, which had been created in 1982.
ARPAnet's role as the network backbone was taken over by NSFNET (the National
Science Foundation NETwork), which had been created in 1986 with the aid
of NASA and the
Department of Energy to provide an improved backbone speed
of 56Kbps for interconnecting a new
generation of research supercomputers.
Connections proliferated, especially to colleges, when in 1989 NSFNET was
overhauled for faster T1 line connectivity by
IBM, Merit, and
MCI. ARPAnet
was finally retired in 1990.
In 1993, InterNIC (the Internet Network Information Center) was created
by the National Science Foundation to provide information, a directory
and database, and registration services to the Internet community. InterNIC
is, thus, the closest thing there is to an Internet administrative center.
However, InterNIC doesn't dictate
Internet policy or run some huge central
computer that controls the
Net. Its sole purpose is to handle organizational
and "bookkeeping" functions, such as assigning Internet addresses
(see the sidebar, "Domain Names").
Computers on the
Internet are referenced using IP addresses, which are composed of a series of four numbers separated by periods (always called dots). Each number is an 8-bit integer (a number from 0-255). For example, the
IP address of my Web server at
Neural Applications is 198.137.221.9 (verbalized as "one-ninety-eight dot one-thirty-seven dot two-twenty-one dot nine").
However, because addresses composed of nothing but numbers are difficult for humans to remember, in 1983 the University of Wisconsin developed the Domain Name Server (DNS), which was then introduced to the Net during the following year. DNS automatically and invisibly translates names composed of real words into their
numeric IP addresses, which makes the Net a lot more user-friendly. To use the same example cited above, the DNS address of Neural's Web server is www.neural.com (pronounced "double-u double-u double-u dot neural dot cahm").
There is no formula for calculating an IP address from a
domain name-the correlation must be established by looking one or the other up in a table.
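That lookup is exactly the service DNS provides, and most programming languages expose it directly. The short Python sketch below translates a name into its dotted IP address and attempts the reverse; the addresses shown are only examples and may resolve differently, or not at all, by the time you try them.

import socket

print(socket.gethostbyname("www.example.com"))        # name -> dotted IP address

try:
    name, _aliases, _addrs = socket.gethostbyaddr("198.137.221.9")
    print(name)                                       # IP address -> name, if a reverse record exists
except socket.herror:
    print("no reverse (PTR) record for that address")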
Domain names consist of two or more parts, separated by periods (always, in Internet parlance, pronounced dot). Generally speaking, the leftmost part of the name is the most specific, with sections further to the right more general. A computer may have more than one
domain name assigned to it, but any given domain name will "resolve" into only one specific IP address (which is unique for each machine). Usually, all the machines on one network will share a right-hand and middle
domain name portion. For example, you might see computers at one site with names like www.neural.com, mail.neural.com, and ftp.neural.com.
The leftmost portion of a
domain name may indicate its purpose; for example, www. for a Web server or mail. for a mail server.
The
rightmost portion of a domain name often indicates the type of site it lives on. The most common domain name extensions are .com (commercial), .edu (educational), .gov (government), .mil (military), .net (network providers), and .org (organizations, usually nonprofit).
Other (generally two-letter) extensions indicate a site's country of origin, such as .ca for Canada, .de for Germany, or .fr for France.
The topic of
domain names is covered to the point of exhaustion in the
Usenet FAQ on the topic, which can be downloaded from ftp://ftp.uu.net/usenet/news.answers/internet/tcp-ip/domains-faq/.
Your organization can get an IP address assigned by sending electronic
mail to Hostmaster@INTERNIC.NET.
This service used to be free, but there is now a reasonable charge because
of the tremendous growth of the Internet and the privatization of the process.
For more information, point your browser to InterNIC's Web site at http://rs.internic.net/rs-internic.html.
One of the best
online guides to the
Internet as a whole is the
Electronic Freedom Foundation's Extended Guide to the
Internet at http://www.eff.org/papers/bdgtti/eegtti.html.
By 1990, the European High-Energy Particle Physics Lab (CERN) had become
the largest
Internet site in Europe and was the driving force in getting
the rest of Europe connected to the Net. To help promote and facilitate
the concept of distributed computing via the
Internet, Tim Berners-Lee
created the
World Wide Web in 1991.
The Web was an extension of the Gopher idea, but with many, many improvements.
Inspired by Ted Nelson's work on Xanadu and the hypertext concept, the
World Wide Web incorporated
graphics, typographic text styles, and-most
importantly-hypertext links.
The
hypertext concept predates
personal computers. It was first proposed by computer visionary Ted Nelson in his ground-breaking self-published book Computer Lib/Dream Machines in 1974.
In a nutshell,
electronic hypertext involves adding links to words or phrases. When selected, these links jump you to associated text in the same document or in another document altogether. For example, you could click an unfamiliar term and jump to a definition, or add your own notes that would be optionally displayed when you or someone else selected the note's hyperlink.
The hypertext concept has since been expanded to incorporate the idea of hypermedia, in which links can also be added to and from
graphics, video, and
audio clips.
The Web uses three new technologies: HTML, or
HyperText Markup Language,
is used to write
Web pages; a
Web server computer uses HTTP (HyperText
Transfer Protocol) to transmit those pages; and a
Web browser client program
receives the data, interprets it, and displays the results.
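If you want a feel for the browser's half of that conversation, the minimal Python sketch below (the host name is only a placeholder; any public Web server will do) sends an HTTP GET request and prints the beginning of the HTML markup the server sends back.

import http.client

conn = http.client.HTTPConnection("www.example.com", 80)
conn.request("GET", "/")                  # the request a browser sends for a page
response = conn.getresponse()
print(response.status, response.reason)   # for example, 200 OK
html = response.read().decode("utf-8", "replace")
print(html[:200])                         # the first few lines of HTML markup
conn.close()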
Using HTML, almost anyone with a text editor and an
Internet site can
build visually interesting pages that organize and present information
in a way seldom seen in other
online venues. In fact,
Web sites are said
to be composed of pages because the information on them looks more
like magazine pages than traditional computer screens.
HTML is, itself, an application of the much more complex
SGML, or
Standard Generalized Markup Language.
SGML is also used for creating pages on the Web, though it takes a different browser to be able to view SGML pages.
SGML is discussed further in Chapter 4, "Building Blocks of HTML."
HTML is a markup language, which means that Web pages can only be viewed
by using a specialized
Internet terminal program called a
Web browser.
In the beginning, the potential was there for the typical computing "chicken
and the egg problem": no one would create
Web pages because no one
owned a browser program to view them with, and no one would get a browser
program because there were no Web pages to view.
Fortunately, this did not happen because shortly after the Web was invented,
a killer browser program was released to the Internet community-free of
charge!
In 1993, the National Center for Supercomputing Applications (NCSA)
at the University of Illinois at Urbana-Champaign released Mosaic, a Web
browser designed by Marc Andreessen and developed by a team of students
and staff at the
University of Illinois (see
fig. 1.7). It spread like wildfire through the Internet community; within
a year, an estimated two million users were on the Web with Mosaic. Suddenly,
everyone was browsing the Web, and everyone else was creating Web pages.
Nothing in the
history of computing had grown so fast.
NCSA Mosaic, the browser that drove the phenomenal growth of the
World Wide Web, is still available free of charge for Windows,
Windows
NT,
Windows 95,
UNIX, and
Macintosh.
By mid-1993, there were 130 sites on the World Wide Web. Six months later, there were over 600. Today, there are almost 100,000 Web sites in the world (some sources say there may be twice that many). For the first few months of its existence, the Web was doubling in size every three months. Even now, its doubling rate is (depending on whom you believe) less than five months. Table 1.1 shows just how quickly the Web has grown over its three-year history.
Table 1.1 Growth of the World Wide Web
Date      Web Sites
6/93            130
12/93           623
6/94          2,738
12/94        10,022
6/95         23,500
1/96         90,000
Source: "Measuring the Growth of the Web," Copyright 1995, Matthew Gray, http://www.netgen.com.
For more information on NCSA Mosaic, check out the NCSA Web site at http://www.ncsa.uiuc.edu/SDG/Software/Mosaic/.
If the number of Web sites were to keep doubling at the current rate,
there would be over 300 Web sites in the world for every man, woman, and
child by the end of 1998. Clearly, this will not happen, but it does serve
to illustrate just how fast the Web is expanding! See figure
1.8 for a graphical perspective.
The Internet is growing at a phenomenal rate as a whole, but the
Web is growing so much faster that it almost seems destined to take over
the whole Net.
For a wealth of both more and less accurate demographic information on the growth of the Internet in general and the World Wide Web in specific, begin with Yahoo!'s list of sites at http://www.yahoo.com/Computers_and_Internet/Internet/Statistics_and_Demographics/. One good site to try is the
GVU WWW User Survey at http://www.cc.gatech.edu/gvu/user_surveys/User_Survey_Home.html.
Mosaic's success-and the fact that its source code was distributed for
free!-spawned a wave of new browser introductions. Each topped the previous
by adding new HTML commands and features. Marc Andreessen moved on from
NCSA and joined with Jim Clark of Silicon Graphics to found
Netscape Communications
Corporation. They took along most of the NCSA Mosaic development team,
which quickly turned out the first version of Netscape Navigator for Windows,
Macintosh, and UNIX platforms. Because of its many new features and free
trial preview offer, Netscape (as it is usually called) quickly became
the most popular browser on the Web. The Web's incredible growth even attracted
Microsoft's attention, and in 1995, they introduced their Internet Explorer
Web browser to coincide with the launch of their new WWW service, the Microsoft
Network (MSN).
See "How
Web Browsers and Servers Work Together"
See "Netscape-Specific
Extensions to HTML"
See "Additional
HTML Extensions Supported by Other Browsers"
Established online services like
CompuServe,
America Online, and Prodigy
scrambled to meet their users' demands to add
Web access to their systems.
Most of them quickly developed their own version of Mosaic, customized
to work in conjunction with their proprietary online services. This enabled
millions of established commercial service subscribers to spill over onto
the Web virtually overnight; "old-timers" who had been on the
Web since its beginning (only a year and a half or so before) suddenly
found themselves overtaken by a tidal wave of Web-surfing newbies.
Even
television discovered the Web, and it seemed that every other
news
report featured a story about surfing the Net.
"All that growth is impressive," you say, "but just what exactly is the Web good for?" Good question, and one with hundreds of good answers.
People are on the Web to conduct business, to exchange information, to express their creativity, to collaborate, and to just plain have fun.
Some of the survey information used in this section is Copyright
(c)1995 CommerceNet Consortium/Nielsen Media Research.
Today, there are over 37 million adults in North America with access
to the Internet. Some 24 million of them actually use their access, and 18 million
use their Internet access time to browse the
World Wide Web. The total
amount of time spent cruising the Web is greater than the time spent using
all other
Internet services combined, and is roughly equivalent to the
time
North Americans spend watching
rented videotapes.
The number of people using the Internet is increasing so rapidly that
if the growth rate were to continue at the current rate, by 2003 every
person in the world would be on the Web!
Increasingly, people are using the Web to conduct business. Today, over 50 percent of the sites on the Web are commercial (with a .com domain name). Over half of the users of the Web look for products at least occasionally and-since Web users are predominantly upscale, well educated, and affluent-business is paying attention. Expect Web growth in the near future to continue to be driven and driven hard by business expansion into cyberspace.
But Web surfers also use the Net for more traditional telecommunications
purposes. Three-fourths browse the Web. Two-thirds exchange e-mail. One-third
download software by FTP. One in three takes part in discussion groups,
and one in five is active in multimedia.
The World Wide Web didn't get its name by accident. It truly is a web
that encompasses just about every topic in the world. A quick look at the
premier topic index on the Web, Yahoo! (http://www.yahoo.com),
reveals topics as diverse as art, world news, sports, business, libraries,
classified advertising, education, TV, science, fitness, and politics (see
fig. 1.9). You can't get much more diverse than that! There are literally
thousands of sites listed on Yahoo! under each of these topics and many
more.
If you really want to know what's on the Web, you need look no further than Yahoo!
But mere mass isn't the main draw of the Web. It's the way in which
all that information is presented. The best Web sites integrate
graphics,
hypertext links, and even video and audio. They make finding information
interesting, fun, and intuitive.
Marshall McLuhan asserted that the medium is the message, and this is
certainly true with the Web. Because its hypermedia presentation style
can overwhelm its content if done poorly, the Web is a real challenge to
developers. But when done well, the results are fantastic, such as the
tour of an abandoned US missile silo shown in figure
1.10 (http://www.xvt.com/users/kevink/silo/site.html).
For more information about the World Wide Web, consult the WWW FAQ at http://sunsite.unc.edu/boutell/index.html.
Fig. 1.10
A really cool Web site integrates user interface and content
seamlessly.
See "Distributing Information on the Web"
Now that you know where the Web came from, it's time to jump into the whole melange feet first-but with your eyes open. HTML (HyperText Markup Language) is what you use to create Web pages, and it's the topic of this book.
HTML is relatively simple in both concept and execution. In fact, if
you have ever used a very old word processor, you are already familiar
with the concept of a markup language.
In the "good old days" of word processing, if you wanted text to appear in, say, italics, you might surround it with control characters like this:
/Ithis is in italics/I
The "/I" at the beginning would indicate to the word processor
that, when printed, the text following should be italicized. The "/I" at the end
would turn off italics so that any text afterward would be printed in a
normal font. You literally marked up the text for printing just
as you would if you were making editing marks on a printed copy with an
editor's red pencil.
HTML works in much the same way. If, for example, you want text to appear on a Web page in italics, you mark it like this:
<I>this is in italics</I>
Almost everything you create in HTML relies on marks, or tags, like these.
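To see those marks from the software's side, here is a tiny Python sketch built on the standard html.parser module. It simply reports each tag it meets and the text in between, which is essentially the first step any Web browser takes before deciding how to draw a page. (The fragment fed to it is just the italics example from above.)

from html.parser import HTMLParser

class TagWatcher(HTMLParser):
    def handle_starttag(self, tag, attrs):
        print("turn on  <%s>" % tag)
    def handle_endtag(self, tag):
        print("turn off </%s>" % tag)
    def handle_data(self, data):
        print("plain text: %r" % data)

TagWatcher().feed("Some text, <I>this is in italics</I>, more text.")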
See "How Web Browsers and Servers Work Together"
The rest of this book elaborates on that simple fact.
Although you don't need to know every term that's bandied about on the Internet to be able to work, play, and develop on the Web, an understanding of a few key terms will help you to better understand what's going on there. Here's a short glossary of Internet and Web terms to help you get started.
For more on computer terminology, check out the Free Online Dictionary of Computing at http://wfn-shop.princeton.edu/cgi-bin/foldoc. If computer abbreviations and acronyms have you confused, seek enlightenment at BABEL, a dictionary of such alphabet soup at http://www.access.digex.net/~ikind/babel96a.html. But if you want to become a real Net insider, you'll have to learn the slang; for that, check out the latest version of the legendary Jargon File at http://www.ccil.org/jargon/jargon.html.
Copyright ©1996, Que Corporation