
4. Developers and Implementors (including WebMasters and server operators)

Where's the spec?
What are these terms 'DTDless', 'valid', and 'well-formed'?
Which should I use in my DTD, attributes or elements?
What else has changed between SGML and XML?
What XML software can I use today?
Do I have to change any of my server software to work with XML?
Can I still use server-side INCLUDEs?
Can I (and my authors) still use client-side INCLUDEs?
I'm trying to understand the XML Spec: why does SGML (and XML) have such difficult terminology?
Is there a Developer's API kit for XML?
How does XML fit with the DOM?
Is there a conformance test suite for XML processors?
How do I include one DTD (or fragment) in another?
I've already got SGML DTDs: how do I convert them for use with XML?
What's the story on XML and EDI?

Where's the spec?

Right here (http://www.w3.org/TR/REC-xml). Includes the EBNF. There are also versions in Japanese (http://www.fxis.co.jp/DMS/sgml/xml/); Spanish (http://www.ucc.ie/xml/faq-es.html); Korean (http://xml.t2000.co.kr/faq/index.html) and a Java-ised annotated version at http://www.xml.com/axml/testaxml.htm.
Eve Maler has released the DTD and documentation used for the spec itself: this is a new version that was used to encode the XML, XLink, XPointer, DOM, etc specifications. Be aware that this version is no longer compatible with the version that XML 1.0 uses; please send any comments or questions to Eve.

What are these terms 'DTDless', 'valid', and 'well-formed'?

Full SGML uses a Document Type Definition (DTD) to describe the markup (elements) available in any specific type of document. However, the design and construction of a DTD can be a complex and non-trivial task, so XML has been designed so it can be used either with or without a DTD. DTDless operation means you can invent markup without having to define it formally, at the penalty of losing automated control over the structuring of additional documents of the same type.
To make this work, a DTDless file in effect defines its own markup informally, by the simple existence and location of elements where you create them. But when an XML application such as a browser encounters a DTDless file, it needs to be able to understand the document structure while it reads it, because it has no DTD to tell it what to expect, so some changes have been made to the rules.
For example, HTML's <IMG> element is defined as "EMPTY": it doesn't have an end-tag. An XML application reading a file without a DTD and encountering <IMG> would have no way to know whether or not to expect an end-tag, so the concept of 'well-formed' files has become necessary. This makes the start and end of every element, and the occurrence of EMPTY elements, completely unambiguous.
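As a rough illustration of the point about EMPTY elements, the sketch below feeds two small fragments to a non-validating parser with no DTD present. It uses Python's standard `xml.parsers.expat` module; the helper name `is_well_formed` and the sample fragments are made up for this example.

```python
import xml.parsers.expat

def is_well_formed(text):
    """Return True if `text` parses as well-formed XML; no DTD is consulted."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(text, True)
        return True
    except xml.parsers.expat.ExpatError:
        return False

# HTML-style empty element with no end-tag: the parser cannot know it is
# EMPTY, so the following </P> is reported as a mismatched tag.
html_style = "<P>A picture: <IMG SRC='pic.gif'></P>"

# XML-style empty element: the trailing slash marks it as EMPTY.
xml_style = "<P>A picture: <IMG SRC='pic.gif'/></P>"

print(is_well_formed(html_style))
print(is_well_formed(xml_style))
```

Only the second fragment is accepted, because the `/>` form tells the parser that no end-tag will follow.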
'Well-formed' files
All XML documents, both DTDless and valid, must be well-formed.
XML files with no DTD are considered to have &lt;, &gt;, &apos;, &quot;, and &amp; predefined and thus available for use even without a DTD. Valid XML files should, for interoperability, declare them explicitly if they use them. If you want to use more than these five default character entities, but you want to avoid having to write a full DTD, it is possible to declare just character entities on their own in the internal subset of a standalone XML file (thanks to Richard Lander for this).
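A minimal sketch of declaring character entities in the internal subset, with no external DTD, might look like the following. The document, the element name `note`, and the choice of `nbsp` and `eacute` are illustrative assumptions; the parsing uses Python's standard `xml.etree.ElementTree`, which is non-validating.

```python
import xml.etree.ElementTree as ET

# A standalone file whose internal subset declares two extra character
# entities. The five predefined ones (such as &amp;) need no declaration.
DOC = """<?xml version="1.0" standalone="yes"?>
<!DOCTYPE note [
  <!ENTITY nbsp "&#160;">
  <!ENTITY eacute "&#233;">
]>
<note>caf&eacute;&nbsp;&amp;&nbsp;bar</note>"""

root = ET.fromstring(DOC)
text = root.text  # entity references are replaced by the declared characters
print(repr(text))
```

The parser expands both the declared entities and the predefined `&amp;` without ever seeing a full DTD.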
Valid XML
Valid XML files are those which have a Document Type Definition (DTD) like other SGML applications, and which adhere to it. They must already be well-formed.
A valid file begins like any other SGML file with a Document Type Declaration, but may have an optional XML Declaration prepended:
<?xml version="1.0"?>
<!DOCTYPE advert SYSTEM "http://www.foo.org/ad.dtd">
<advert>
  <headline>...<pic/>...</headline>
  <text>...</text>
</advert>
The XML Specification defines an SGML Declaration for XML which is fixed for all instances (the declaration has been removed from the text of the Specification and is now in a separate document). An XML version of the specified DTD must be accessible to the XML processor, either by being available locally (ie the user already has a copy on disk), or by being retrievable via the network. You can specify this by supplying the URL for the DTD in a System Identifier (as in the example above). It is possible (many people would say preferable) to supply a Formal Public Identifier, but if used, this must precede the System Identifier, which must still be given (and only the PUBLIC keyword is used):
<!DOCTYPE advert PUBLIC "-//Foo, Inc//DTD Advertisements//EN" "http://www.foo.org/ad.dtd">
The defaults for the other attributes of the XML Declaration are version="1.0" and encoding="UTF-8".
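To see how a processor picks up the two identifiers from the Document Type Declaration, here is a small sketch using Python's standard `xml.dom.minidom`. It is non-validating and, as used here, does not fetch the DTD; it only reports what the declaration says. The element content is filled in with made-up text.

```python
from xml.dom import minidom

DOC = """<?xml version="1.0"?>
<!DOCTYPE advert PUBLIC "-//Foo, Inc//DTD Advertisements//EN"
                        "http://www.foo.org/ad.dtd">
<advert><headline>Sale<pic/></headline><text>Everything must go</text></advert>"""

doc = minidom.parseString(DOC)
doctype = doc.doctype

# The Formal Public Identifier comes first, the System Identifier (URL) second.
identifiers = (doctype.name, doctype.publicId, doctype.systemId)
print(identifiers)
```

A validating processor would go on to retrieve the DTD via one of these identifiers; this sketch stops at reading them.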

Which should I use in my DTD, attributes or elements?

There is no single answer to this: a lot depends on what you are designing the document type for. The two extremes are best illustrated with examples.
The choice turns on what you want to do with the information and which parts of it are most easily accessed by each method. A rule of thumb for conventional textual documents is that if the markup were all stripped away, the bare text should still be readable and usable, even if inconvenient. For database output, however, or other machine-generated documents, 'reading' may not be meaningful, so it is perfectly possible to have documents where all the data is in attributes, and the document contains no character data in content models at all. See http://www.oasis-open.org/cover/elementsAndAttrs.html for more.
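The rule of thumb can be tried out mechanically. In the sketch below the same made-up record is encoded once with elements and once with attributes, and a hypothetical helper `bare_text` strips all the markup to show what is left for a human reader. It uses Python's standard `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET

# Element-heavy: the data lives in character content.
AS_ELEMENTS = "<person><name>Ada Lovelace</name> was born in <born>1815</born>.</person>"

# Attribute-heavy: the same data, with no character content at all.
AS_ATTRIBUTES = '<person name="Ada Lovelace" born="1815"/>'

def bare_text(xml_text):
    """Strip all markup and return only the character data."""
    return "".join(ET.fromstring(xml_text).itertext())

readable = bare_text(AS_ELEMENTS)
empty = bare_text(AS_ATTRIBUTES)
attrs = ET.fromstring(AS_ATTRIBUTES).attrib

print(repr(readable))
print(repr(empty))
print(attrs["name"], attrs["born"])
```

The first form survives the stripping as a readable sentence; the second leaves nothing, which is fine for machine-generated data but not for a textual document.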

What else has changed between SGML and XML?

The principal changes are in what you can do in writing a Document Type Definition (DTD). To simplify the syntax and make it easier to write processing software, a large number of SGML markup declaration options have been suppressed (see the list of omitted features).
An extra delimiter is permitted in Names (the colon) for use in experiments with namespaces (enabling DTDs to distinguish element source, ownership, or application). A colon may only appear in mid-name, though, not at the start or the end. Work is ongoing to define how these can be declared and referenced using element and attribute markup.

What XML software can I use today?

Details are no longer in this FAQ as they are now changing too rapidly to be kept up to date: see the XML pages at http://www.oasis-open.org/cover/xml.html.
For a detailed guide to examples of SGML and XML programs and the concepts behind them, see the editor's book Understanding SGML and XML Tools (Kluwer, 1998, 0-7923-8169-6).
For browsers see the question on XML Browsers and the details of the xml-dev mailing list for software developers. Bert Bos keeps a list of some XML developments in bison, flex, perl and Python.
Information for developers of Chinese XML systems can be found at the Chinese XML Now! website of Academia Sinica (http://www.ascc.net/xml/). This site includes an FAQ and test files.

Do I have to change any of my server software to work with XML?

Only to serve .xml files with the correct MIME type (application/xml, see RFC 2376). All that is needed is to edit the mime-types file (or its equivalent) and add the line:
application/xml xml XML
In some servers (eg Apache), users can change the MIME type for specific file types from their own directories by using directives in a .htaccess file. The MIME content-type text/xml must only be applied to pure ASCII files (ISO 646 IRV) because of a character-set restriction in the RFC: for all normal use, application/xml is the one to go for.
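For servers or scripts written in a programming language rather than configured through a mime-types file, the equivalent step is to register the mapping in whatever lookup table the software uses. As a sketch only, this is how it might look with Python's standard `mimetypes` module; the file name `advert.xml` is an assumption.

```python
import mimetypes

# A private table, so the process-wide defaults are left untouched.
table = mimetypes.MimeTypes()

# Same effect as the "application/xml xml" line in a mime-types file.
table.add_type("application/xml", ".xml")

content_type, encoding = table.guess_type("advert.xml")
print(content_type)
```

The value returned is what the server would then send in the Content-Type header for that file.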
Since XML is designed to support stylesheets and sophisticated hyperlinking, XML documents may be accompanied by ancillary files in the same way that SGML files are: DTDs, entity files, catalogs, stylesheets, etc, which may need other MIME Content-Type entries, such as text/css for CSS stylesheets. XUA (XML User Agent), which is one of the planned deliverables of the XML WG, might provide a mechanism for packaging XML documents and XSL styles into a single message.
If you run scripts that generate HTML and you want them to work with XML, they will need to be modified to produce the relevant document type.

Can I still use server-side INCLUDEs?

Yes, so long as what they generate ends up as part of an XML-conformant file (ie either valid or just well-formed).

Can I (and my authors) still use client-side INCLUDEs?

The same rule applies as for server-side INCLUDEs, so you need to ensure that any embedded code which gets passed to a third-party engine (eg SDQL enquiries, Java writes, LiveWire requests, streamed content, etc) does not contain any characters which might be misinterpreted as XML markup (ie no angle brackets or ampersands): either use a CDATA marked section to avoid your XML application parsing the embedded code, or use the standard &lt;, &gt;, and &amp; character entity references instead.
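Both approaches can be sketched in a few lines of Python using the standard `xml.sax.saxutils.escape` function. The element name `script`, the sample code string, and the helper `cdata` are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

code = "if (a < b && b > c) run();"

# Option 1: replace the markup characters with &lt;, &gt;, and &amp;.
escaped = "<script>" + escape(code) + "</script>"

# Option 2: wrap the code in a CDATA marked section. The sequence "]]>"
# would end the section early, so it is split across two sections.
def cdata(text):
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"

marked = "<script>" + cdata(code) + "</script>"

# Either way, an XML parser hands the original code back unchanged.
print(ET.fromstring(escaped).text == code)
print(ET.fromstring(marked).text == code)
```

The CDATA form keeps the embedded code readable in the source file; the entity form works in places where marked sections are awkward, such as attribute values.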

I'm trying to understand the XML Spec: why does SGML (and XML) have such difficult terminology?

For implementation to succeed, the terminology needs to be precise. Design goal 8 of the specification tells us that "the design of XML shall be formal and concise". To describe XML in formal terms, the specification uses the concise language of Computer Science, which is often confusing to non-CS people because it uses well-known English words in a specialised sense which can be very different from their commonly understood meanings -- for example, 'grammar', 'production', 'token', or 'terminal'.
The specification rarely explains these terms because of the other part of this design goal: the specification should be concise. It doesn't repeat explanations that are available elsewhere. In essence this means that to grok the fullness of the spec, you need foreknowledge of computer science and SGML.
Sloppy terminology in specifications causes misunderstandings, so formal standards have to be phrased in formal terminology. This FAQ is not a formal document, and the astute reader may already have noticed it refers to 'element names' where 'element type names' is more correct; but the former is more widely understood.
Those new to SGML may want to read something like the Gentle Introduction to SGML chapter of the TEI Guidelines.
Thanks to Bob DuCharme for suggestions and some bits from his book on the XML Spec.

Is there a Developer's API kit for XML?

Several are available or under development. Details of these and other XML software are held on the SGML/XML Web pages.
The big conversion and application development engines like Balise, Omnimark, and SGMLC are all working on adding XML. Details of SGML software of all kinds are on the SGML Web pages.

How does XML fit with the DOM?

The Document Object Model (DOM) (http://www.w3.org/TR/PR-DOM-Level-1) provides an abstract API for constructing, accessing, and manipulating XML and HTML documents. A "binding" of the DOM to a particular programming language provides a concrete API.
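As one example of a binding, Python's standard library includes `xml.dom.minidom`, a lightweight DOM implementation. The sketch below touches each of the three activities named above (accessing, manipulating, constructing); the sample document and its text are made up.

```python
from xml.dom import minidom

doc = minidom.parseString("<advert><headline>Sale</headline></advert>")

# Accessing: find an element and read its character data.
headline = doc.getElementsByTagName("headline")[0]
before = headline.firstChild.data

# Manipulating: set an attribute and change the text in place.
headline.setAttribute("lang", "en")
headline.firstChild.data = "Closing-down sale"

# Constructing: create a new element and attach it to the tree.
text = doc.createElement("text")
text.appendChild(doc.createTextNode("Everything must go"))
doc.documentElement.appendChild(text)

result = doc.documentElement.toxml()
print(result)
```

The method names (`getElementsByTagName`, `createElement`, `appendChild`, and so on) come from the abstract DOM interfaces, so code against another language's binding looks much the same.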

Is there a conformance test suite for XML processors?

James Clark has a collection of test cases for testing XML parsers at http://www.jclark.com/xml/ which includes a conformance test.

How do I include one DTD (or fragment) in another?

This works exactly the same as for regular SGML. First you declare the entity you want to include, and then you reference it by name:
<!ENTITY % mylists PUBLIC "-//Foo, Inc//ENTITIES Common list structures//EN" "dtds/listfrag.ent">
...
%mylists;
Such declarations traditionally go all together towards the top of the main DTD file, where they can be managed and maintained, but this is not essential so long as they are declared before they are used. You use Parameter Entity syntax for this (the percent sign) because the file is to be included at DTD compile time, not when the document instance itself is parsed.
Note that a URL is compulsory in XML for all external file references: standard rules for dereferencing URLs apply (assume the same method, server, and directory as the containing document). The URL can be supplied either as a System Identifier alone:
<!ENTITY % mydtd SYSTEM "http://www.foo.bar/~blort/my.dtd">
or as a second parameter to a formal Public Identifier as in the earlier example.
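To make the mechanism concrete, here is a sketch of a processor resolving an external parameter entity while it reads the DTD. It uses Python's standard `xml.parsers.expat` with parameter entity parsing switched on. The fragment contents, the `list` and `item` element types, and the function name `declared_elements` are assumptions; a real resolver would fetch the file named by the System Identifier rather than use a canned string.

```python
import xml.parsers.expat as expat

# Stand-in for the contents of "dtds/listfrag.ent" (hypothetical).
FRAGMENT = b"<!ELEMENT list (item+)>\n<!ELEMENT item (#PCDATA)>"

DOC = b"""<!DOCTYPE list [
<!ENTITY % mylists PUBLIC "-//Foo, Inc//ENTITIES Common list structures//EN"
                          "dtds/listfrag.ent">
%mylists;
]>
<list><item>one</item></list>"""

def declared_elements(document, fragment):
    """Parse `document`, resolving each external parameter entity to `fragment`."""
    names = []      # element types declared anywhere in the DTD
    requested = []  # (public identifier, system identifier) pairs asked for
    parser = expat.ParserCreate()
    parser.SetParamEntityParsing(expat.XML_PARAM_ENTITY_PARSING_ALWAYS)
    parser.ElementDeclHandler = lambda name, model: names.append(name)

    def resolve(context, base, system_id, public_id):
        requested.append((public_id, system_id))
        # Parse the fragment in a child parser that shares the parent's DTD.
        child = parser.ExternalEntityParserCreate(context)
        child.Parse(fragment, True)
        return 1  # non-zero tells expat the entity was handled

    parser.ExternalEntityRefHandler = resolve
    parser.Parse(document, True)
    return names, requested

names, requested = declared_elements(DOC, FRAGMENT)
print(names)
print(requested)
```

The declarations in the fragment are reported as though they had been written in the main DTD at the point of the `%mylists;` reference.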

I've already got SGML DTDs: how do I convert them for use with XML?

There are numerous projects being started to convert common or popular SGML DTDs to XML format (for example Patrice Bonhomme is working on an unofficial XML version of the TEI Lite DTD: details of that are discussed on the TEI-L mailing list).
The following checklist comes courtesy of Sean McGrath (author of XML By Example, Prentice Hall, 1998) [my italics]:
and many more: see the question on the bits of SGML that were removed for XML for a reference to the complete list;
And last but not least, CONCUR!
There are some important differences between the internal and external subset portion of a DTD in XML: marked sections can only occur in the external subset. Parameter Entities must be used to replace entire declarations in the internal subset portion of a DTD, eg the following is invalid XML:
<!DOCTYPE x [
<!ENTITY % modelx "(A|B)*">
<!ELEMENT x %modelx;>
]>
<x></x>
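The restriction can be checked with any XML parser. The sketch below uses Python's standard `xml.etree.ElementTree` (non-validating, built on expat) to compare the rejected form with one possible rewrite in which the Parameter Entity stands for an entire declaration. The entity name `declx` and the helper `parses` are made up for the example.

```python
import xml.etree.ElementTree as ET

# Parameter Entity used inside a declaration: not allowed in the internal subset.
PARTIAL = """<!DOCTYPE x [
<!ENTITY % modelx "(A|B)*">
<!ELEMENT x %modelx;>
]>
<x></x>"""

# Parameter Entity that stands for an entire declaration: allowed.
WHOLE = """<!DOCTYPE x [
<!ENTITY % declx "<!ELEMENT x (A|B)*>">
%declx;
]>
<x></x>"""

def parses(text):
    """Return True if the parser accepts `text` without error."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

print(parses(PARTIAL))
print(parses(WHOLE))
```

Moving the original declarations into an external DTD file is the other way out, since the restriction applies only to the internal subset.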

What's the story on XML and EDI?

Electronic Data Interchange (EDI) has been used in e-commerce for many years to exchange documents between commercial partners to a transaction. It has required special proprietary software, but there are now moves to enable EDI data to travel inside XML. Details of developments are at http://www.xmledi.com/ and there is a guideline document at http://www.geocities.com/WallStreet/Floor/5815/guide.htm
