What is XML

If you are new to XML, perhaps the most confusing aspect is its similarity to HTML, which makes XML seem familiar at first but also tends to obscure the finer details of what makes XML tick.

We will, therefore, start by looking at what XML really is and why you need XML.

The XML Specification
The W3C specification defines XML as a subset of SGML, so to properly understand XML, it is useful to take a closer look at SGML first.

SGML stands for Standard Generalized Markup Language. It was developed for large-scale applications, such as aircraft maintenance or power plant documentation, whose documents are intended to be maintained over the long term.

XML seems so similar to HTML because HTML is defined as an application of SGML. XML is actually much more similar to SGML than to HTML, because HTML is only one specific application of SGML, used to describe web pages.

As XML was created to simplify SGML, it is no wonder that the W3C has now decided to redefine HTML 4.0 as an XML application, thereby creating XHTML 1.0. But this need not concern us at the moment, because we are still faced with the fundamental question "What is XML?".

To answer this, let us define what XML is not:

·	It is not a programming language.

·	It is not the next generation of HTML.

·	It is not a database.

·	It is not specific to any horizontal or vertical market.

·	It is not the solution to all your problems, but it can be a very powerful tool in building such a solution.

XML is a clearly defined way to structure, describe, and interchange data.

Data in this context really means every conceivable kind of data! You can use XML for such diverse things as describing mathematical formulas, chemical compounds, astronomical information, financial derivatives, and architectural blueprints, annotating Shakespearean plays, collecting Buddhist teachings, or voice processing in telephone systems!

To get a feeling for XML, let us take a look at a simple XML document:

<product>
  <name>Apple</name>
  <price>0.10</price>
</product>

The < and > angle brackets are used to distinguish between the so-called "markup" (between the brackets) and the actual data of the document (outside of the brackets).
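
To make the split between markup and data concrete, here is a minimal sketch in Python using the standard library's xml.etree.ElementTree module (Python is only our choice for illustration; XML itself does not depend on any programming language). It assumes the product document holds a "name" element with the data "Apple" and a "price" element with the data "0.10"; the function name split_markup_and_data is hypothetical.

```python
import xml.etree.ElementTree as ET

# The product document, assuming a name and a price element as in the text.
PRODUCT_XML = """<product>
  <name>Apple</name>
  <price>0.10</price>
</product>"""

def split_markup_and_data(xml_text):
    """Return (element names, text values) found in an XML string.

    Element names come from the markup between < and >; text values are the
    character data between the tags. Whitespace-only text (indentation) is
    skipped.
    """
    root = ET.fromstring(xml_text)
    names = []
    data = []
    for element in root.iter():  # document order, root element first
        names.append(element.tag)
        if element.text and element.text.strip():
            data.append(element.text.strip())
    return names, data

names, data = split_markup_and_data(PRODUCT_XML)
print(names)  # ['product', 'name', 'price']
print(data)   # ['Apple', '0.10']
```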

An XML document consists of individual elements that are marked by start-tags and end-tags (hence the term markup). Each tag carries the name of its element, which is how the elements are distinguished from one another.

A start-tag is enclosed in < and >, an end-tag in </ and >, and every start-tag must be matched by a corresponding end-tag (an element with no content may instead be written as a single empty-element tag, such as <br/>). The example XML document contains one element called "product", which in turn contains two elements: "name" (which contains the data "Apple") and "price" (which contains the data "0.10"). Unlike HTML, XML does not prescribe a predefined set of element names (such as "body", "h1", and "p"); you can make up your own to suit the particular needs of your data.
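
A parser enforces this pairing of tags. The following Python sketch, again using xml.etree.ElementTree, accepts a document if it parses without raising ET.ParseError. Note that this checks only that the document is well-formed; it does not validate it against a DTD or schema. The helper is_well_formed and the sample element names "fruit" and "colour" are made up for illustration.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if xml_text parses as XML, False on a syntax error."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True

# Every start-tag is closed by a matching end-tag: accepted.
print(is_well_formed("<product><name>Apple</name><price>0.10</price></product>"))  # True

# The end-tag for <price> is missing: rejected.
print(is_well_formed("<product><name>Apple</name><price>0.10</product>"))  # False

# Element names are made up freely; there is no fixed vocabulary as in HTML.
print(is_well_formed("<fruit><colour>red</colour></fruit>"))  # True
```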

This simple XML document also shows a very important aspect of XML: it is "self-describing". In addition to structuring the actual data, the XML element names (sometimes also called tag names) describe the information provided in the document (in our case, the price of an apple). If you compare this to the way such data is traditionally exchanged between different applications (e.g. comma-separated value or CSV files), you can easily see the benefit:

"Apple";0.10

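The difference is easy to show in code. In this Python sketch (standard csv and xml.etree.ElementTree modules), the CSV line has to be read by position, whereas the XML values are looked up by the element names that describe them. The semicolon delimiter matches the CSV line above; the variable names are illustrative.

```python
import csv
import io
import xml.etree.ElementTree as ET

CSV_LINE = '"Apple";0.10\n'
PRODUCT_XML = "<product><name>Apple</name><price>0.10</price></product>"

# CSV: values are identified only by position. The reader must know, from
# some outside agreement, that column 0 is the name and column 1 the price.
row = next(csv.reader(io.StringIO(CSV_LINE), delimiter=";"))
csv_name, csv_price = row[0], row[1]

# XML: each value is found by the element name that describes it.
product = ET.fromstring(PRODUCT_XML)
xml_name = product.find("name").text
xml_price = product.find("price").text

print(csv_name, csv_price)  # Apple 0.10
print(xml_name, xml_price)  # Apple 0.10
```
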
This becomes even more obvious if you look at a slightly more complicated XML document (as shown in the Text View of XML Spy):

<invoice>
  <product>
    <name>Apple</name>
  </product>
  <product>
    <name>Orange</name>
  </product>
  <product>
    <name>Strawberries</name>
  </product>
  <product>
    <name>Banana</name>
  </product>
</invoice>

You can immediately see another crucial property of XML here: elements can be nested in whatever way best shows the semantic structure of the data they contain, and elements can be repeated when more than one item of data of the same kind needs to be listed. Our example now describes an invoice with four products and a total.
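
Repeated elements map naturally onto a loop. This Python sketch uses a simplified version of the invoice that keeps only the four product names; the prices and the total are left out rather than invented. Element.findall returns the matching direct children in document order.

```python
import xml.etree.ElementTree as ET

# Simplified invoice: only the product names from the text are included.
INVOICE_XML = """<invoice>
  <product><name>Apple</name></product>
  <product><name>Orange</name></product>
  <product><name>Strawberries</name></product>
  <product><name>Banana</name></product>
</invoice>"""

invoice = ET.fromstring(INVOICE_XML)

# findall("product") returns every direct <product> child, in document order.
products = invoice.findall("product")
names = [p.findtext("name") for p in products]

print(len(products))  # 4
print(names)          # ['Apple', 'Orange', 'Strawberries', 'Banana']
```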

Also note that some elements contain additional information within the start-tag. These are attributes, which always have a name and a value and are written as name="value" (e.g. currency="US$"). Attributes supply additional information that augments the data of the element (in our example, the currency of the total).
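
Reading an attribute is a separate operation from reading an element's data. In this Python sketch, only currency="US$" comes from the text; the element name "total" and the amount 12.34 are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: the element name "total" and the amount 12.34 are
# made up; only the attribute currency="US$" is taken from the text.
TOTAL_XML = '<total currency="US$">12.34</total>'

total = ET.fromstring(TOTAL_XML)

amount = total.text               # the element's data
currency = total.get("currency")  # the attribute value (None if absent)

print(amount, currency)  # 12.34 US$
print(total.attrib)      # {'currency': 'US$'}
```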

A disadvantage of XML is that markup grows along with the data: the bigger an XML document is, the more markup it contains, which can make it harder to find the data itself. This slight disadvantage is usually more than offset by the flexibility of XML and by the fact that XML is inherently readable by both humans and machines.

XML Spy offers a concise presentation of an XML document, called the Enhanced Grid View. This view lets you see and directly manipulate the elements in your XML document, including the actual data they contain:

This is the same XML document as shown in the Text View above. The product names and their prices are shown as columns of a table, just as you would expect in a grid view.

Editing in this view is much more comfortable, since you can simply:

·	drag & drop elements

·	insert new rows

·	copy/paste your data to and from other applications (e.g. Excel), and

·	manipulate data in a graphical way that is not possible in views offered by other products
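
The idea behind a grid presentation, turning repeated elements into table rows with one column per child element, can be sketched in a few lines of Python. This is not how XML Spy implements the Enhanced Grid View; it only illustrates the mapping. The function invoice_to_rows and the prices are hypothetical. The tab-separated output is the kind of text that spreadsheets such as Excel typically split into cells when pasted.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical invoice: the prices are made up for illustration only.
INVOICE_XML = """<invoice>
  <product><name>Apple</name><price>0.10</price></product>
  <product><name>Orange</name><price>0.20</price></product>
</invoice>"""

def invoice_to_rows(xml_text):
    """Turn each repeated <product> element into one table row.

    The child element names of the first product become the column headers.
    """
    invoice = ET.fromstring(xml_text)
    products = invoice.findall("product")
    header = [child.tag for child in products[0]]
    rows = [[p.findtext(col) for col in header] for p in products]
    return [header] + rows

table = invoice_to_rows(INVOICE_XML)

# Write the table as tab-separated text, one line per row.
buffer = io.StringIO()
csv.writer(buffer, delimiter="\t", lineterminator="\n").writerows(table)
print(buffer.getvalue())
```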

You now have a first impression of an XML document and have learned about the two most important features of XML: elements and attributes. We will explain the other concepts of XML in the tutorial when we look at the specific features offered by XML Spy.

Before we continue, let us first consider the ever-important "why" question...

© 2002 Altova