
1. General questions

What is XML?
What is XML for?
What is SGML?
What is HTML?
Aren't XML, SGML, and HTML all the same thing?
What is the difference between SGML/XML and C or C++?
Who is responsible for XML?
Why is XML such an important development?
How can XML make SGML simpler and still let you define your own document types?
Why not just carry on extending HTML?
Why do we need all this SGML stuff? Why not just use Word or Notes?
Where do I find more information about XML?
Where can I discuss implementation and development of XML?

What is XML?

XML is the Extensible Markup Language (extensible because it is not a fixed format like HTML). It is designed to enable the use of SGML on the World Wide Web.
XML is not a single, predefined markup language: it's a metalanguage -- a language for describing other languages -- which lets you design your own markup. (A predefined markup language like HTML defines a way to describe information in one specific class of documents: XML lets you define your own customized markup languages for different classes of document.) It can do this because it's written in SGML, the international standard metalanguage for markup.

What is XML for?

XML is designed "to make it easy and straightforward to use SGML on the Web: easy to define document types, easy to author and manage SGML-defined documents, and easy to transmit and share them across the Web."
It defines "an extremely simple dialect of SGML which is completely described in the XML Specification. The goal is to enable generic SGML to be served, received, and processed on the Web in the way that is now possible with HTML."
"For this reason, XML has been designed for ease of implementation, and for interoperability with both SGML and HTML" [quotes from the XML spec].

What is SGML?

SGML is the Standard Generalized Markup Language (ISO 8879), the international standard for defining descriptions of the structure and content of different types of electronic document. There is an SGML FAQ at http://www.infosys.utas.edu.au/info/sgmlfaq.txt which is posted every month to the comp.text.sgml newsgroup, and the SGML Web pages are at http://www.oasis-open.org/cover/sgml-xml.html.
ISO standards are governed by the International Organization for Standardization in Geneva, Switzerland, and voted into or out of existence by representatives from every country's national standards body.
The representation of countries at the ISO is not a matter for this FAQ. Please do not submit queries to the maintainer about how or why your ISO representatives have or have not voted.

What is HTML?

HTML is the HyperText Markup Language (RFC 1866), a specific application of SGML used on the World Wide Web.

Aren't XML, SGML, and HTML all the same thing?

Not quite. SGML is the 'mother tongue', used for describing thousands of different document types in many fields of human activity, from transcriptions of ancient Irish manuscripts to the technical documentation for stealth bombers, and from patients' clinical records to musical notation.
HTML is just one of these document types, the one most frequently used on the Web. It defines a simple, fixed type of document with markup designed for a common class of office or technical report, with headings, paragraphs, lists, illustrations, etc, and some provision for hypertext and multimedia.
XML is an abbreviated version of SGML, to make it easier for you to define your own document types, and to make it easier for programmers to write programs to handle them. It omits the more complex and less-used parts of SGML in return for the benefits of being easier to write applications for, easier to understand, and more suited to delivery and interoperability over the Web. But it is still SGML, and XML files may still be parsed and validated the same as any other SGML file (see the question on XML software).
Programmers may find it useful to think of XML as being SGML-- rather than HTML++.

What is the difference between SGML/XML and C or C++?

C and C++ (and others like Fortran, or Pascal, or Basic, or Java or dozens more) are programming languages with which you specify calculations, actions, and decisions to be carried out:
    do when @front(@date,6) is equal "01-Apr"
       print "April Fool!\n"
    else
       print @days(@datesub("25-Dec",@date)),\
             " shopping days to Christmas\n"
    done
SGML and XML are markup specification languages with which you can design ways of describing information, usually for storage, transmission, or processing by a program:
<p>It was the week after <event class="festival">Christmas</event> but <name class="person">Max</name>'s mind was still running on the prank he had played on <name class="person">Louise</name> the previous <name class="month">April</name>.</p>
On its own, a file of SGML or XML text (including HTML) doesn't do anything: you have to have a program to do something with it.
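
As a minimal sketch of such a program, the following Python reads the sample paragraph above with the standard library's xml.etree.ElementTree module and lists the people it names. The function name people_named and the constant SAMPLE are hypothetical, chosen only for illustration; the element and attribute names come from the sample itself.

```python
import xml.etree.ElementTree as ET

# The sample paragraph from above, held as a string for illustration.
SAMPLE = (
    '<p>It was the week after <event class="festival">Christmas</event> '
    'but <name class="person">Max</name>\'s mind was still running on the '
    'prank he had played on <name class="person">Louise</name> the previous '
    '<name class="month">April</name>.</p>'
)


def people_named(xml_text):
    """Return the text of every <name class="person"> element, in document order."""
    root = ET.fromstring(xml_text)
    return [
        el.text
        for el in root.iter("name")
        if el.get("class") == "person"
    ]


if __name__ == "__main__":
    print(people_named(SAMPLE))
```

The markup only says which words are names of people; it is the program that decides what to do with that information (here, collect them into a list).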

Who is responsible for XML?

XML is a project of the World Wide Web Consortium (W3C), and the development of the specification is being supervised by their XML Working Group. A Special Interest Group of co-opted contributors and experts from various fields contributed comments and reviews by email.
XML is a public format: it is not a proprietary development of any company. The v1.0 specification was accepted by the W3C as a Recommendation on Feb 10, 1998.

Why is XML such an important development?

It removes two constraints which are holding back Web developments: dependence on a single, fixed document type (HTML), and the complexity of full SGML.
XML simplifies the levels of optionality in SGML, and allows the development of user-defined document types on the Web.

How can XML make SGML simpler and still let you define your own document types?

To make SGML simpler, XML redefines some of SGML's internal values and parameters, and removes a large number of the more complex and sometimes less-used features which made it harder to write processing programs (see http://www.w3.org/TR/NOTE-sgml-xml-971215).
Although it retains all of SGML's structural abilities which let you define and manage your own document types, XML introduces a new class of document which does not require you to use a predefined document type description (basically you can make up your own markup so long as you stick strictly to the syntactic rules). See the questions about "valid" and "well-formed" documents, and how to define your own document types in the Developers' Section.
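
The idea of made-up markup that still obeys the syntactic rules can be sketched in Python. This is an illustration only, assuming the standard library's xml.etree.ElementTree parser, which checks well-formedness but does not validate against a document type description; the function name is_well_formed and the element names walk, hill, and grade are invented for the example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if xml_text parses as well-formed XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


# Made-up markup with no document type declaration: every element is closed
# and properly nested, so it follows the syntactic rules.
good = "<walk><hill>Ben Nevis</hill><grade>hard</grade></walk>"

# The same content with overlapping tags breaks those rules.
bad = "<walk><hill>Ben Nevis<grade></hill>hard</grade></walk>"

if __name__ == "__main__":
    print(is_well_formed(good))
    print(is_well_formed(bad))
```

Checking that a document is valid, as opposed to merely well-formed, needs a validating parser and a document type description, which this sketch does not attempt.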

Why not just carry on extending HTML?

HTML is already overburdened with dozens of interesting but often incompatible inventions from different manufacturers, because it provides only one way of describing your information.
XML will allow groups of people or organizations to create their own customized markup languages for exchanging information in their domain (music, chemistry, electronics, hill-walking, finance, surfing, petroleum geology, linguistics, cooking, knitting, stellar cartography, history, engineering, rabbit-keeping, mathematics, etc).
HTML is at the limit of its usefulness as a way of describing information, and while it will continue to play an important role for the content it currently represents, many new applications require a more robust and flexible infrastructure.
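
To illustrate the kind of customized markup language mentioned above, here is a small sketch in Python that builds a document in an invented cooking vocabulary using the standard library's xml.etree.ElementTree module. The element names (recipe, title, ingredients, ingredient), the quantity attribute, and the function make_recipe are all assumptions made up for this example, not part of any published document type.

```python
import xml.etree.ElementTree as ET


def make_recipe(title, ingredients):
    """Build a <recipe> element; all element names here are invented for illustration."""
    recipe = ET.Element("recipe")
    ET.SubElement(recipe, "title").text = title
    items = ET.SubElement(recipe, "ingredients")
    for name, quantity in ingredients:
        # Each ingredient carries its quantity as an attribute and its name as text.
        item = ET.SubElement(items, "ingredient", quantity=quantity)
        item.text = name
    return recipe


if __name__ == "__main__":
    doc = make_recipe("Soda bread", [("flour", "450g"), ("buttermilk", "400ml")])
    print(ET.tostring(doc, encoding="unicode"))
```

A group exchanging recipes could agree on names like these and describe their information directly, rather than forcing it into HTML's headings and paragraphs.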

Why do we need all this SGML stuff? Why not just use Word or Notes?

Information on a network which connects many different types of computer has to be usable on all of them. Public information cannot afford to be restricted to one make or model or manufacturer, or to cede control of its data format to private hands. It is also helpful for such information to be in a form that can be reused in many different ways, as this can minimize wasted time and effort. Proprietary data formats, no matter how well documented or publicized, are simply not an option: their control still resides in private hands and they can be changed or withdrawn arbitrarily without notice.
SGML is the international standard for defining this kind of application, but those who need an alternative based on different software for other purposes are entirely free to implement similar services using such a system, especially if they are for private use.

Where do I find more information about XML?

Online, there's the XML Specification and ancillary documentation available from the W3C; the XML Web pages with an extensive list of online reference material in Robin Cover's SGML pages; and a summary and condensed FAQ from Tim Bray.
The items listed below are the ones I have been told about: please mail me if you come across others.
Further details of these are on the GCA's Web site.
There are lists of books, articles, and software for XML in Robin Cover's SGML and XML pages. That site should always be your first port of call: please look there first before using the form in this FAQ to ask about software or documentation.

Where can I discuss implementation and development of XML?

Please Read The Fine Documentation which you will be sent when you join a mailing list, as it contains important information, particularly about what to do when your email address changes.
There is a mailing list called xml-dev for those committed to developing components for XML. You can subscribe by sending a 1-line mail message to majordomo@ic.ac.uk saying:
    subscribe xml-dev your@email.address
(substituting your correct email address). To unsubscribe, send a 1-line message to the same address saying:
    unsubscribe xml-dev your@email.address
The list is hypermailed for online reference at http://www.lists.ic.ac.uk/hypermail/xml-dev/. Note that this list is for those people actively involved in developing resources for XML. It is not for general information about XML (see this FAQ and other sources) or for general discussion about SGML implementation and resources (see below).
There is a general-purpose mailing list called XML-L for public discussions: to subscribe, send a 1-line mail message to LISTSERV@listserv.heanet.ie saying:
    subscribe XML-L forename surname
(substituting your own forename and surname). To unsubscribe, send a 1-line message to the same address saying:
    unsubscribe XML-L
(Note that LISTSERV lists like XML-L don't need you to give your email address: they read it from your email headers.) You can access XML-L and its archives, as well as subscribe and unsubscribe interactively, from http://listserv.heanet.ie/xml-l.html.
Please note that there is a lot of inaccurate and misleading information published in print and on the Web about subscribing to mailing lists. The information given here is correct - use it.
There are mailing lists being set up in other languages.
The Usenet newsgroup comp.text.xml is for discussions of XML. If this is not available on your local news server, ask your Internet Provider to add it, or use a Web interface like DejaNews.
