4. Developers and Implementors
(including WebMasters and server operators)
|
4. Vývojáři a Implementátoři (včetně správů webu a serverů)
|
Where's the spec? | Kde nalézt dokumentaci. |
What are these terms 'DTDless',
'valid', and 'well-formed'? | Co znamená 'bezDTD', 'validní' a 'správně-formátovaný'? |
Which should I use in my DTD, attributes or elements? | Je lepší používat atributy nebo prvky? |
What else has changed between SGML and XML? | Jaké jsou další změny mezi SGML a XML |
What XML software can I use today? | Jaký XML software mohu použít? |
Do I have to change any of my server software to work
with XML? | Musím změnit programové vybavení serveru, abych mohl použít XML? |
Can I still use server-side
INCLUDEs? | Mohu stále používat programy na straně serveru? |
Can I (and my authors) still use
client-side INCLUDEs? | |
I'm trying to understand the XML
Spec: why does SGML (and XML) have such difficult terminology?
| Snažím se porozumět XML specifikaci. Proč má SGML (a XML) tak obtížnou terminologii? |
Is there a Developer's API kit for
XML? | Existuje API pro vývojáře XML? |
How does XML fit with the DOM? | Jaké jsou souvislosti mezi XML a DOM? |
Is there a
conformance test suite for XML processors? | |
How do I include
one DTD (or fragment) in another? | Jak mohu použít jedno DTD (nebo jeho část) v jiném? |
I've already
got SGML DTDs: how do I convert them for use with XML? | |
What's the story
on XML and EDI? | Jak to vypadá s XML a EDI? |
Where's the spec?
|
Kde nalézt dokumentaci.
|
Right here: http://www.w3.org/TR/REC-xml (it includes the EBNF).
There are also versions in Japanese (http://www.fxis.co.jp/DMS/sgml/xml/),
Spanish (http://www.ucc.ie/xml/faq-es.html), and
Korean (http://xml.t2000.co.kr/faq/index.html),
and a Java-ised annotated version at
http://www.xml.com/axml/testaxml.htm. | |
Eve Maler has released the DTD and documentation
used for the spec itself: this is a new version that was used to encode
the XML, XLink, XPointer, DOM, and other
specifications. Be aware that this version is no longer compatible with
the version that XML 1.0 uses; please send any comments or questions to
Eve. | |
What are these terms 'DTDless',
'valid', and 'well-formed'?
|
Co znamená 'bezDTD', 'validní' a 'správně-formátovaný'?
|
Full SGML uses a Document Type Definition (DTD) to describe the
markup (elements) available in any specific type of document. However,
the design and construction of a DTD can be a complex and non-trivial
task, so XML has been designed so it can be used either with or without
a DTD. DTDless operation means you can invent markup without having to
define it formally, at the penalty of losing automated control over
the structuring of additional documents of the same type.
| SGML používá DTD pro popis dostupných prvků v daném typu dokumentu. Tvorba a návrh DTD ovšem může být značně komplikovaná záležitost, takže XML byl navržen tak, aby byl použitelný s i bez DTD. Pokud tedy nepoužijete DTD, můžete vytvářet nové prvky, aniž je předem definujete, ztrácíte však možnost automatické kontroly. |
To make this work, a DTDless file in effect defines
its own markup informally, by the simple existence and location of elements
where you create them. But when an XML application such as a browser
encounters a DTDless file, it needs to be able to understand the
document structure while it reads it, because it has no DTD to tell it what
to expect, so some changes have been made to the rules.
| Aby to bylo možné, soubor bez DTD určuje svoji syntaxi z polohy a názvů použitých prvků. Pokud ovšem XML aplikace zracovává takový soubor, musí být schopná porozumět struktuře dokumentu v okamžiku zpracovávání, protože žádné DTD ji o této struktuře neinformovalo předem. Bylo proto nutné změnit některá pravidla. |
For example, HTML's <IMG>
element is defined as "EMPTY": it
doesn't have an end-tag. An XML application reading a
file without a DTD and encountering <IMG> would
have no way to know whether or not to expect an end-tag,
so the concept of 'well-formed' files has
become necessary. This makes the start and end of every element, and the
occurrence of EMPTY elements completely unambiguous.
|
Tak např. HTML <IMG>
prvek je definován jako "EMPTY", nemá koncový tag. XML program, který by procházel soubor bez DTD nemůže vědět, zda bude nebo nebude za <IMG> následovat </IMG>. Bylo nutné vytvořit koncept 'správně-formátovaných' souborů. Tím byl začátek a konec prvků, stejně jako obsah EMPTY prvků přesně udán. |
'Well-formed' files
| Správně-formátované soubory |
All XML documents, both DTDless and valid, must be well-formed:
| Všechny XML dokumenty, s i bez DTD musí být správně-formátované: |
- if there is
no DTD in use, the document must start with a Standalone
Document Declaration (SDD) saying so:
<?xml version="1.0" standalone="yes"?>
<foo>
<bar>...<blort/>...</bar>
</foo>
David Brownell notes: "XML that's
'just' well-formed doesn't need to use a
Standalone Document Declaration at all. Such declarations are there to
permit certain speedups when processing documents while ignoring
external parameter entities -- basically, you can't rely on
external declarations in standalone documents. The types that are
relevant are entities and attributes. Standalone documents
must not require any kind of attribute value normalization or
defaulting, otherwise they are invalid."
- all tags must be balanced: that is, all elements which
may contain character data must have both start- and end-tags present
(omission is not allowed except for
empty elements, see below);
- all attribute values must be in quotes (the
single-quote character [the apostrophe] may be used if the value
contains a double-quote character, and vice versa):
if you need both, use &apos; or
&quot;, and declare them in the
internal subset;
- any EMPTY element
tags (eg those with no end-tag, like HTML's
<IMG>,
<HR>, and <BR>)
must either end with "/>"
or be made to appear non-EMPTY by adding a real
end-tag;
Example:
<BR> would become either
<BR/> or
<BR></BR>
- there must not be any isolated markup-start characters
(< or &) in your text data (ie
they must be given as &lt; and
&amp;), and the sequence
]]> must be given as ]]&gt;
if it does not occur as the end of a CDATA marked
section;
- elements must nest inside each other properly (no
overlapping markup, same rule as for all SGML);
- Well-formed files with no DTD may use attributes on
any element, but the attributes must all be of type CDATA by default.
|
- všechny tagy musí být vyvážené, tedy: všechny prvky, které mohou obsahovat text nebo jiné prvky, musí mít počáteční i koncový tag, není povolena žádná výjimka, pokud se nejedná o prázdné (EMPTY) prvky.
- všechny hodnoty atributů musí být v uvozovkách ("" nebo ''), pokud hodnota obsahuje " použijte '' a naopak. Pokud hodnota obsahuje " i ' , použijte &apos; nebo &quot;
- libovolný EMPTY (prázdný) prvek, tedy bez koncového tagu, jako je v HTML
<IMG>, <HR>, <BR> atd., musí končit s "/>" nebo k nim musíte přidat koncový tag. Příklad:
<BR> může být <BR/> nebo <BR></BR>
- dokument nesmí obsahovat < nebo &, který neuvádí formátovací instrukce. Pokud tyto znaky potřebujete, použijte &lt; a &amp;, a posloupnost ]]> musí být udána jako ]]&gt;, pokud nevyznačuje konec sekce CDATA
- prvky se nesmí překrývat, tedy jeden prvek musí být zcela uvnitř jiného prvku. Prvek nemůže začít v jednom a končit v jiném prvku.
- Správně-formátované soubory bez DTD mohou mít atributy u všech prvků, všechny tyto atributy jsou považovány za typ CDATA.
|
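The rules in the list above can be tried out with any XML parser. The following is a minimal sketch, assuming Python and its standard xml.etree.ElementTree module (neither is mentioned in this FAQ); the helper name is_well_formed and the sample strings are made up for illustration. It only checks well-formedness, not validity.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts `text` as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Hypothetical samples, one per rule in the list above.
samples = {
    "balanced": "<foo><bar>text<blort/></bar></foo>",
    "empty_with_end_tag": "<p>line<BR></BR></p>",
    "unclosed_empty": "<p>line<BR></p>",          # <BR> never closed
    "unquoted_attribute": "<img src=pic.gif/>",   # value not in quotes
    "bare_ampersand": "<p>AT&T</p>",              # isolated & in text
    "overlapping": "<b><i>text</b></i>",          # elements overlap
}

results = {name: is_well_formed(text) for name, text in samples.items()}

for name, ok in results.items():
    print(f"{name}: {'well-formed' if ok else 'NOT well-formed'}")
```

Only the first two samples pass; each of the others breaks exactly one of the rules.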
XML files with no DTD are
considered to have &lt;, &gt;, &apos;, &quot;, and &amp; predefined and thus available for use
even without a DTD. Valid XML files must declare them explicitly if
they use them. If you want to use more than these five default
character entities, but you want to avoid having to write a full DTD,
it is possible to declare just character entities on their own in the
internal subset of a standalone XML file (thanks to Richard Lander for
this).
| XML soubory bez DTD mají předdefinovány hodnoty &lt;, &gt;, &apos;, &quot; a &amp;, proto jsou použitelné i bez DTD. Validní XML soubory je musí explicitně deklarovat, pokud je chtějí použít. Pokud chcete použít více než těchto pět základních entit, nemusíte psát kompletní DTD, ale deklarujte je v interním subsetu XML souboru. |
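As an illustration of declaring character entities in the internal subset, here is a small sketch. It is not Richard Lander's original example: the entity names eacute and copy, the sample text, and the use of Python's xml.etree.ElementTree are all assumptions made for this illustration.

```python
import xml.etree.ElementTree as ET

# A standalone file with no external DTD. The internal subset (the part
# between [ and ]) declares two extra character entities.
document = """<?xml version="1.0" standalone="yes"?>
<!DOCTYPE foo [
  <!ENTITY eacute "&#233;">
  <!ENTITY copy   "&#169;">
]>
<foo>caf&eacute; &copy; 1998 &amp; later</foo>"""

root = ET.fromstring(document)
text = root.text  # entity references are expanded by the parser


def parses(source: str) -> bool:
    """Return True if the parser accepts `source`."""
    try:
        ET.fromstring(source)
        return True
    except ET.ParseError:
        return False


# Without the declaration, the same reference is an error.
undeclared_ok = parses("<foo>caf&eacute;</foo>")

print(text)
print("undeclared entity accepted:", undeclared_ok)
```

The &amp; reference needs no declaration here because it is one of the five predefined entities.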
Valid XML | Validní XML |
Valid XML files are those which have a
Document Type Definition (DTD) like
other SGML applications, and which
adhere to it. They must already be well-formed.
| Validní XML soubory obsahují DTD jako ostatní SGML aplikace a řídí se podle ní. Musí být rovněž
správně-formátované. |
A valid file begins like any other SGML file with a Document Type
Declaration, but may have an optional XML Declaration
prepended:
<?xml version="1.0"?>
<!DOCTYPE advert SYSTEM "http://www.foo.org/ad.dtd">
<advert>
<headline>...<pic/>...</headline>
<text>...</text>
</advert>
The XML
Specification defines an SGML Declaration for XML which is fixed
for all instances (the declaration has been removed from the text of the
Specification and is now in a separate document). An
XML version of the specified DTD must be
accessible to the XML processor, either by being available locally
(ie the user already has a copy on disk), or by being retrievable via the
network. You can specify this by supplying the URL for the DTD in a
System Identifier (as in the example above). It is possible (many people
would say preferable) to supply a Formal Public Identifier,
but if used, this must precede the System
Identifier, which must still be given (and only the PUBLIC
keyword is used):
<!DOCTYPE advert PUBLIC "-//Foo, Inc//DTD Advertisements//EN"
"http://www.foo.org/ad.dtd">
|
Validní soubor začíná jako všechny SGML soubory s DTD, tuto deklaraci však může předcházet XML deklarace, např. <?xml version="1.0"?>
<!DOCTYPE advert SYSTEM "http://www.foo.org/ad.dtd">
<advert>
<headline>...<pic/>...</headline>
<text>...</text>
</advert> XML
specifikace definuje SGML deklaraci pro XML která je pevně dána pro všechny případy, tato deklarace byla vyjmuta z původní specifikace a je obsažena ve
zvlášním dokumentu.
XML verze udané DTD musí být dostupná pro XML procesor, buďto lokálně nebo na síti. Její polohu můžete určit s pomocí URL uvnitř Identifikátoru systému (System Identifier). Je rovněž možné poskytnout Formal Public Identifier, pokud je však použit, musí předcházet System Identifier, který musí být rovněž uveden (a použito klíčové slovo PUBLIC),
<!DOCTYPE advert PUBLIC "-//Foo, Inc//DTD Advertisements//EN"
"http://www.foo.org/ad.dtd"> |
The defaults for the other attributes of the XML Declaration are
version="1.0" and encoding="UTF-8".
| Další atributy v XML deklaraci mají přednastaveny hodnoty
version="1.0" a encoding="UTF-8". |
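To see how a processor picks up the two identifiers from the Document Type Declaration, here is a rough sketch using Python's standard xml.dom.minidom (an assumption; this FAQ does not prescribe any tool). minidom is a non-validating parser: it records the identifiers but does not fetch the DTD or check the document against it, so this shows only where the information ends up. The element content is invented for illustration.

```python
from xml.dom import minidom

document = """<?xml version="1.0"?>
<!DOCTYPE advert PUBLIC "-//Foo, Inc//DTD Advertisements//EN"
  "http://www.foo.org/ad.dtd">
<advert>
  <headline>Sale<pic/></headline>
  <text>Everything must go</text>
</advert>"""

doc = minidom.parseString(document)
doctype = doc.doctype

# The declared document type name should match the root element.
print("name:     ", doctype.name)
print("public id:", doctype.publicId)
print("system id:", doctype.systemId)
print("root:     ", doc.documentElement.tagName)
```

A validating processor would go one step further and use the System Identifier (or a catalog lookup on the Public Identifier) to retrieve the DTD.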
Which should I use in my DTD, attributes or elements?
|
Je lepší používat atributy nebo prvky?
|
There is no single answer to this: a lot depends on what
you are designing the document type for. The two extremes are best
illustrated with examples.
| Na tuto otázku neexistuje jednoduchá odpověď, záleží na tom, pro koho je daný typ dokumentu určen. Dva protipóly ukazuje následující příklad: |
-
'Traditional' textual practice
is to put the 'real' text (what would be
printed) as character data content, and keep the metadata (like line
numbers) in attributes, from where they can more easily be isolated
for analysis or special treatment like display in the margin or in a mouseover:
<l n="184"><sp>Portia</sp><text>The quality of mercy is not strain'd,</text></l>
-
But from the systems point of view, there is nothing
'wrong' with storing the data the other way
round, especially where the volume of text data on each occasion is
relatively small:
<line speaker="Portia" text="The quality of mercy is not strain'd,">184</line>
|
-
'Tradiční' textová praxe předpisuje použití vlastního textu v obsahu prvků a atributy jsou využity pro metadata, jako čísla řádků, odkud je lze snáze získat.
<l n="184"><sp>Portia</sp><text>The quality of mercy is not strain'd,</text></l>
-
Z hlediska systémového však není nic špatného na jiných přístupech, zejména pokud je vlastního textu relativně málo:
<line speaker="Portia" text="The quality of mercy is not strain'd,">184</line>
|
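From a program's point of view the two styles above carry the same information and differ only in how you get at it. A minimal sketch, assuming Python's xml.etree.ElementTree (the function names from_elements and from_attributes are invented for this example):

```python
import xml.etree.ElementTree as ET

# Style 1: text in element content, metadata in an attribute.
element_style = ET.fromstring(
    '<l n="184"><sp>Portia</sp>'
    "<text>The quality of mercy is not strain'd,</text></l>"
)

# Style 2: everything except the line number in attributes.
attribute_style = ET.fromstring(
    '<line speaker="Portia" '
    'text="The quality of mercy is not strain\'d,">184</line>'
)


def from_elements(line):
    """Return (number, speaker, text) from the element-heavy form."""
    return (int(line.get("n")), line.findtext("sp"), line.findtext("text"))


def from_attributes(line):
    """Return (number, speaker, text) from the attribute-heavy form."""
    return (int(line.text), line.get("speaker"), line.get("text"))


record_a = from_elements(element_style)
record_b = from_attributes(attribute_style)
print(record_a == record_b)
```

What changes is what survives when the markup is stripped: the first form leaves the speaker and the line of verse, the second leaves only the number.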
A lot will depend on what you want to do with the
information and which bits of it are most easily accessed by each method.
A rule of thumb for conventional textual documents is that if the
markup were all stripped away, the bare text should still be readable
and usable, even if inconvenient. For database output, however, or
other machine-generated documents, 'reading'
may not be meaningful, so it is perfectly possible to have documents
where all the data is in attributes, and the
document contains no character data in content models at all. See
http://www.oasis-open.org/cover/elementsAndAttrs.html
for more.
| Velmi záleží na tom, jak hodláte s informacemi dále naložit a která metoda je pohodlnější pro zpracování. Platí empirické pravidlo, že po odstranění všech formátovacích prvků, vlastní text by stále měl být použitelný, jakkoliv nepohodlně. Pro výstup z databází nebo jiných programů ovšem čitelný výstup nedává smysl, takže mohou být i všechna data obsažena v atributech. (viz http://www.oasis-open.org/cover/elementsAndAttrs.html pro více informací) |
What else has changed between SGML and XML?
|
Jaké jsou další změny mezi SGML a XML
|
The principal changes
are in what you can do in writing a Document Type Definition (DTD). To
simplify the syntax and make it easier to write processing software, a
large number of SGML markup declaration options have been suppressed
(see the list of omitted
features).
| Hlavními změnami prošly DTD. Za účelem zjednodušení syntaxe bylo potlačeno mnoho možnostíSGML ( viz. seznam |
An extra delimiter is permitted in Names
(the colon) for use in experiments with namespaces (enabling DTDs to
distinguish element source, ownership, or application). A colon may only
appear in mid-name, though, not at the start or the end. Work is ongoing
to define how these can be declared and referenced using element and
attribute markup.
| V názvech je povolena dvojtečka pro definice "namespace". |
What XML software can I use today?
|
Jaký XML software mohu použít?
|
Details are no longer in this FAQ as they are now changing too rapidly to
be kept up to date: see the XML pages at http://www.oasis-open.org/cover/xml.html. | |
For a detailed guide to examples of SGML and XML programs and the concepts behind
them, see the editor's book Understanding SGML and XML
Tools (Kluwer, 1998, ISBN 0-7923-8169-6). | |
For browsers see the question on
XML Browsers and the details of the
xml-dev mailing
list for software developers. Bert Bos keeps
a list of some XML
developments in bison, flex, perl and Python. | |
Information for developers of Chinese XML systems can be found
at the Chinese XML Now! website of Academia
Sinica: http://www.ascc.net/xml/
This site includes an FAQ and test files. | |
Do I have to change any of my server software to work
with XML?
|
Musím změnit programové vybavení serveru, abych mohl použít XML?
|
Only to serve up .xml files as the
correct MIME type (application/xml, see
RFC2376), so for
serving XML documents all that is needed is to edit the
mime-types file (or its equivalent) and add the line
application/xml xml XML
| Pokud vás pouze zajímá správný MIME typ u vašich souborů .xml, zkonzultujte RFC2376). Vše, co potřebujete je editace souboru s mime-typy nebo jeho ekvivalentu. |
In some servers
(eg Apache),
users can change the MIME type for specific file types from their own
directories by using directives in a .htaccess file. The MIME
content-type text/xml must only be applied to
pure ASCII files (ISO 646 IRV) because of a character-set restriction
in the RFC: for all normal use,
application/xml is the one to go for.
| Některé servery, např. Apache, umožňují jednotlivým uživatelům změnu z jejich vlastního adresáře souborem .htaccess . |
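The effect of the mime-types entry can be imitated in a few lines. This is only a sketch, assuming Python's standard mimetypes module as a stand-in for a server's type table; a real server reads its own configuration file, and the file name catalogue.xml is made up.

```python
import mimetypes

# Register the mapping this FAQ recommends. It plays the same role as
# the "application/xml xml XML" line in a server's mime-types file.
mimetypes.add_type("application/xml", ".xml")

# Ask which Content-Type would be sent for a (hypothetical) file.
content_type, encoding = mimetypes.guess_type("catalogue.xml")

print("Content-Type:", content_type)
```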
Since XML is designed to support
stylesheets and sophisticated hyperlinking, XML documents may be
accompanied by ancillary files in the same way that SGML files are:
DTDs, entity files, catalogs, stylesheets, etc, which may need other
MIME Content-Type entries, such as text/css
for CSS stylesheets. XUA (XML User Agent), which is one of the planned
deliverables of the XML WG, might provide a mechanism for packaging
XML documents and XSL styles into a single message.
| XML bylo vytvořen tak, aby podporoval tabulky stylů a všestranný hypertext. XML dokumenty tedy mohou být doprovázeny množstvím podpůrných souborů stejně jako SGML: DTD, soubory entit, katalogy, tabulky stylů, které vyžadují vlastní MIME. |
If you run scripts generating HTML, which you wish to work with
XML, they will need to be modified to produce the relevant document
type.
| Pokud používáte skripty, které generují HTML a které mají spolupracovat s XML, musíte je modifikovat tak, aby generovaly žádané typy dokumentů. |
Can I still use server-side
INCLUDEs?
|
Mohu stále používat programy na straně serveru?
|
Yes, so long as what they generate ends up as part of an
XML-conformant file (ie either
valid or just well-formed). | |
Can I (and my authors) still use
client-side INCLUDEs?
|
|
The same rule applies as for
server-side INCLUDEs,
so you need to ensure that any embedded code which gets passed to a
third-party engine (eg SDQL
enquiries, Java
writes, LiveWire requests,
streamed content,
etc) does not contain any characters
which might be misinterpreted as XML markup (ie
no angle brackets or ampersands): either use a CDATA
marked section to avoid your XML application parsing the embedded code,
or use the standard <,
>, and
& character entity references
instead. | |
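Both ways of protecting embedded code can be sketched as follows, assuming Python with its standard xml.sax.saxutils and xml.etree.ElementTree modules. The script string and the element name script are invented for illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical embedded code containing angle brackets and ampersands.
script = "if (a < b && c > d) { run(); }"

# Option 1: replace the markup characters with entity references.
escaped_doc = "<script>" + escape(script) + "</script>"

# Option 2: wrap the code in a CDATA marked section. This only works
# when the code does not itself contain the sequence "]]>".
if "]]>" in script:
    raise ValueError("code contains ]]> and cannot go in one CDATA section")
cdata_doc = "<script><![CDATA[" + script + "]]></script>"

# Either way, an XML parser hands the original code back unchanged.
recovered_from_escape = ET.fromstring(escaped_doc).text
recovered_from_cdata = ET.fromstring(cdata_doc).text

print(escaped_doc)
print(cdata_doc)
```

The CDATA form keeps the embedded code readable in the source file; the escaped form works everywhere character data is allowed, including attribute values if the quotes are escaped too.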
I'm trying to understand the XML
Spec: why does SGML (and XML) have such difficult terminology?
|
Snažím se porozumět XML specifikaci. Proč má SGML (a XML) tak obtížnou terminologii?
|
For implementation to succeed, the
terminology needs to be precise. Design goal 8 of the specification
tells us that "the design of XML shall be formal and
concise". To describe XML in formal terms, the specification
uses the concise language of Computer Science, which is often
confusing to non-CS people because it uses well-known English words in
a specialised sense which can be very different from their commonly
understood meanings -- for example,
'grammar', 'production',
'token', or
'terminal'.
| Pokud mají implementace uspět, musí být terminologie přesná. Jeden z cílů XML (č.8) říká: konstrukční pravidla pro XML musí být stručná a výstižná. XML specifikace je tedy popsána v jazyce počítačové vědy, který často mate čtenáře z jiných oblastí, neboť používá dobře známá slova z angličtiny ve specifickém významu, který se může podstatně lišit od běžného použití. |
The specification rarely explains these terms because of the other part
of this design goal: the specification should be concise. It doesn't
repeat explanations that are available elsewhere. In essence this
means that to grok the fullness of the spec, you need
foreknowledge of computer science and SGML.
| Specifikace jen zřídka tyto termíny vysvětluje, aby splnila podmínku stručnosti. Neopakuje vysvětlení, pokud je toto vysvětlení již poskytnuto jinde. |
Sloppy terminology in specifications causes misunderstandings, so
formal standards have to be phrased in formal terminology. This FAQ is not a
formal document, and the astute reader may already have noticed it
refers to 'element names' where 'element
type names' is more correct; but the former is more widely
understood.
| Neurčitá terminologie ve specifikacích způsobuje zmatek, standardy tedy vyžadují standardní terminologii. Tento dokument není standardem, a proto používá terminologii, které není zcela přesná, ale srozumitelnější. |
Those new to SGML may want to
read something like the Gentle
Introduction to SGML chapter of the
TEI Guidelines. | |
Thanks to Bob DuCharme for suggestions and some bits from
his book on the XML Spec. | |
Is there a Developer's API kit for
XML?
|
Existuje API pro vývojáře XML?
|
Several are available or under development. Details of these and
other XML software are held on the SGML/XML Web pages. | |
The big conversion and application development engines like
Balise, Omnimark,
and SGMLC are all working on adding XML.
Details of SGML software of all kinds are on
the SGML Web pages. | |
How does XML fit with the DOM?
|
Jaké jsou souvislosti mezi XML a DOM?
|
The Document Object Model (DOM) (http://www.w3.org/TR/PR-DOM-Level-1)
provides an abstract API for constructing, accessing, and manipulating
XML and HTML documents. A "binding" of the DOM to a
particular programming language provides a concrete API. | |
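As one example of such a binding, Python's standard library ships xml.dom.minidom, a lightweight implementation of the DOM interfaces. The sketch below assumes that module and uses invented sample content; other languages have their own bindings with the same method names.

```python
from xml.dom import minidom

doc = minidom.parseString("<advert><headline>Sale</headline></advert>")

# Accessing: find an element and read its character data.
headline = doc.getElementsByTagName("headline")[0]
headline_text = headline.firstChild.data

# Constructing: create a new element with a text node inside it.
text = doc.createElement("text")
text.appendChild(doc.createTextNode("Everything must go"))

# Manipulating: attach the new element and set an attribute.
doc.documentElement.appendChild(text)
headline.setAttribute("lang", "en")

serialised = doc.documentElement.toxml()
print(serialised)
```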
Is there a
conformance test suite for XML processors?
|
|
James Clark has a collection of test cases for testing XML
parsers at http://www.jclark.com/xml/
which includes a conformance test. | |
How do I include
one DTD (or fragment) in another?
|
Jak mohu použít jedno DTD (nebo jeho část) v jiném?
|
This works exactly the same as for regular SGML. First you
declare the entity you want to include, and then you reference it by
name:
Můžete stejným způsobem jako v SGML. :
| |
<!ENTITY % mylists PUBLIC
"-//Foo, Inc//ENTITIES Common list structures//EN"
"dtds/listfrag.ent">
...
%mylists;
| |
Such declarations traditionally go all together towards the top
of the main DTD file, where they can be managed and maintained, but
this is not essential so long as they are declared before they are
used. You use Parameter Entity syntax for this (the percent sign)
because the file is to be included at DTD compile time, not when the
document instance itself is parsed.
| Tyto deklarace se tradičně nacházejí na začátku hlavního DTD, kde jsou snáze přístupné, není to však podmínkou. |
Note that a URL is compulsory in XML for all external file
references: standard rules for dereferencing URLs apply (assume the same
method, server, and directory as the containing document). The URL can
be supplied either as a System Identifier alone:
<!ENTITY mydtd SYSTEM "http://www.foo.bar/~blort/my.dtd">
or as a second parameter to a
formal Public Identifier
as in the earlier example. | |
I've already
got SGML DTDs: how do I convert them for use with XML?
|
|
There are numerous projects being started to convert common or
popular SGML DTDs to XML format (for example Patrice Bonhomme is working
on an unofficial XML version of the TEI Lite DTD: details of that are
discussed on the TEI-L mailing list). | |
The following checklist comes courtesy of Sean McGrath
(author of XML By Example, Prentice Hall, 1998)
[my italics]: | |
- No equivalent of the SGML Declaration.
So keywords, character set
etc are essentially fixed;
- Tag minimization is not allowed, so <!ELEMENT x - O (A,B)>
becomes <!ELEMENT x (A,B)> and
<!ELEMENT x - O EMPTY>
becomes <!ELEMENT x EMPTY>;
-
#PCDATA must only occur extreme
left in an OR model, and the model must be repeatable with *,
eg <!ELEMENT x (A|B|#PCDATA|C)>
becomes <!ELEMENT x (#PCDATA|A|B|C)*>
and <!ELEMENT x (A,#PCDATA)>
is illegal;
- No CDATA, RCDATA
elements [declared content];
- Some SGML attribute types are not allowed in XML
eg NUTOKEN. Also
there are no NOTATION attributes (data attributes);
- Some SGML attribute defaults are not allowed in XML
eg CONREF;
- Comments cannot be inline to declarations like
[they can in standard SGML]
<!ELEMENT x (A,B) -- this is an SGML comment in a
declaration -->
- A whole bunch of SGML optional features are not
present in XML:
- all forms of tag
minimization (OMITTAG, DATATAG,
SHORTREF, etc);
- Link Process
Definitions;
- Multiple
DTDs per document
and many more: see
the question on the bits of SGML that were
removed for XML for a reference to the complete list;
| |
And last but not least, CONCUR!
There are some important differences between the internal and external
subset portions of a DTD in XML:
marked sections can only occur in the external
subset. Parameter Entity references in the internal subset can only
stand in for entire declarations, not parts of them, eg
the following is invalid XML: | |
<!DOCTYPE x [
<!ENTITY % modelx "(A|B)*">
<!ELEMENT x %modelx;>
]>
<x></x>
| |
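You can watch a parser enforce this restriction. The sketch below assumes Python's xml.etree.ElementTree, which sits on top of the expat parser; the second document and the entity name decl are made up to show the permitted form, where the parameter entity replaces a whole declaration.

```python
import xml.etree.ElementTree as ET

# The example above: a parameter entity used INSIDE a declaration.
rejected = """<!DOCTYPE x [
<!ENTITY % modelx "(A|B)*">
<!ELEMENT x %modelx;>
]>
<x></x>"""

# Permitted in the internal subset: the parameter entity stands in
# for an entire declaration (hypothetical entity name "decl").
accepted = """<!DOCTYPE x [
<!ENTITY % decl "<!ELEMENT x (A|B)*>">
%decl;
]>
<x></x>"""


def parses(source: str) -> bool:
    """Return True if the parser accepts `source`."""
    try:
        ET.fromstring(source)
        return True
    except ET.ParseError:
        return False


rejected_ok = parses(rejected)
accepted_ok = parses(accepted)
print("inside a declaration:", rejected_ok)
print("whole declaration:   ", accepted_ok)
```

In the external subset (a separate DTD file) both forms are allowed, which is one reason larger DTDs are usually kept out of the document itself.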
What's the story
on XML and EDI?
|
Jak to vypadá s XML a EDI?
|
Electronic Data Interchange has been used in e-commerce for many years
to exchange documents between commercial partners to a transaction. It
has required special proprietary software, but there are now moves to
enable EDI data to travel inside XML. Details of developments are at
http://www.xmledi.com/
and there is a guideline document at http://www.geocities.com/WallStreet/Floor/5815/guide.htm
|
Výměna elektronických dokumentů (EDI) je používána v elektronickém obchodu již řadu let při výměně informací mezi obchodními partnery. Vyžadovala speciální komerční software, nyní je však snahou převést EDI do XML. Podrobnosti naleznete na adrese http://www.xmledi.com/ a vodítka poskytuje http://www.geocities.com/WallStreet/Floor/5815/guide.htm.
|