
2. XML: Managing Document Components

Documents vs. Files
Information vs. Documents
XML and its parent technology, SGML (Standard Generalized Markup Language), provide a foundation for managing not only documents but also the information components from which documents are composed. This capability follows from some notable characteristics of XML data.

Documents vs. Files

In XML, a document is independent of the files that store it: one document can span many files, or one file can contain many documents. This is the distinction, described earlier in this series, between the physical and the logical structure of information. XML data is described primarily by its logical structure. A logical structure is concerned first with what the pieces of information are and how they relate to one another, and only secondarily with the physical items, such as files, that hold them.
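To make the split between logical and physical structure concrete, here is a minimal Python sketch that uses the standard library's XInclude support (`xml.etree.ElementInclude`) to assemble one logical `<book>` from three physical "files". The file names, the element names, and the in-memory `FILES` dictionary that stands in for a file system are all hypothetical. XInclude is only one of several XML mechanisms for spreading a document across files; external entities are another.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

# Hypothetical physical files. In a real system these would be separate
# files on disk; a dict keeps the sketch self-contained.
FILES = {
    "book.xml": """<book xmlns:xi="http://www.w3.org/2001/XInclude">
  <title>Colonial Economics</title>
  <xi:include href="chapter1.xml"/>
  <xi:include href="chapter2.xml"/>
</book>""",
    "chapter1.xml": "<chapter><title>Trade Routes</title></chapter>",
    "chapter2.xml": "<chapter><title>Currency</title></chapter>",
}

def load_from_dict(href, parse, encoding=None):
    """Loader with the same call shape ElementInclude uses for its default loader."""
    if parse == "xml":
        return ET.fromstring(FILES[href])
    return FILES[href]  # parse="text" includes are returned as plain text

def assemble(name):
    """Build one logical document from several physical 'files'."""
    root = ET.fromstring(FILES[name])
    ElementInclude.include(root, loader=load_from_dict)  # replaces xi:include in place
    return root

book = assemble("book.xml")
print([ch.findtext("title") for ch in book.iter("chapter")])
```

Once `assemble()` has run, downstream code works with `<chapter>` elements and never needs to know which chapter came from which file.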
Rather than relying on file headers and other system-specific characteristics of a file as the primary means of understanding and managing information, XML relies on the markup in the data itself. A chapter is not a chapter because it resides in a file called chapter1.doc, but because its content is enclosed in <chapter> and </chapter> element tags. Because XML elements can have attributes, the components of a document can be extensively self-descriptive. If a chapter's markup is rich in attributes, as in <chapter language="English" subject="colonial economics" revision_date="19980623" author="Joan X. Pringle" thesis_advisor="Ramona Winkelhoff">, you can learn a great deal about the chapter without actually reading it. When elements carry self-describing metadata with them, systems that understand XML syntax can operate on those elements in useful ways, just as a traditional document management system operates on files. But there is a major difference.
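The following Python sketch uses the standard `xml.etree.ElementTree` module to read the attributes of the `<chapter>` example above and to select chapters by their metadata without examining their content. The enclosing `<thesis>` element and the helper names are hypothetical illustrations. They are not taken from any particular document management product.

```python
import xml.etree.ElementTree as ET

# Attribute values copied from the <chapter> example; the content is elided.
# The <thesis> wrapper is a hypothetical root element.
DOC = """<thesis>
  <chapter language="English" subject="colonial economics"
           revision_date="19980623" author="Joan X. Pringle"
           thesis_advisor="Ramona Winkelhoff">
    <para>...</para>
  </chapter>
</thesis>"""

def chapter_metadata(xml_text):
    """Return each chapter's attributes without touching its content."""
    root = ET.fromstring(xml_text)
    return [dict(ch.attrib) for ch in root.iter("chapter")]

def chapters_by(xml_text, **criteria):
    """Select chapters whose attributes match every given criterion."""
    root = ET.fromstring(xml_text)
    return [ch for ch in root.iter("chapter")
            if all(ch.get(k) == v for k, v in criteria.items())]

meta = chapter_metadata(DOC)
print(meta[0]["author"], meta[0]["revision_date"])
print(len(chapters_by(DOC, subject="colonial economics", language="English")))
```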

Information vs. Documents

XML markup provides metadata for every component of a document, not merely for the object that contains the document as a whole. This makes the pieces of information that constitute a document just as manageable as the fields of a record in a database. XML data must follow syntactic rules for well-formedness and proper nesting of elements. As a result, document management systems that can correctly read and parse XML can apply document management functions, such as those mentioned above, to any and all information components inside the document.
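As a rough illustration of treating document components like database fields, this Python sketch first checks well-formedness. The parser rejects improperly nested markup. The sketch then builds an index that maps a positional path for every element to its tag, attributes, and text. The path notation loosely imitates XPath. The sample manual and the function names are hypothetical, and a real system would more likely store such records in a repository than in a dictionary.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """True if the text parses as well-formed XML (proper nesting, one root, etc.)."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

def index_components(xml_text):
    """Map a positional path for every element to (tag, attributes, text).

    ET.fromstring raises ET.ParseError on malformed data, so the index is
    only ever built from properly nested markup.
    """
    root = ET.fromstring(xml_text)
    index = {}

    def walk(elem, path):
        index[path] = (elem.tag, dict(elem.attrib), (elem.text or "").strip())
        counts = {}  # per-tag position counter among this element's children
        for child in elem:
            counts[child.tag] = counts.get(child.tag, 0) + 1
            walk(child, f"{path}/{child.tag}[{counts[child.tag]}]")

    walk(root, f"/{root.tag}")
    return index

# Hypothetical sample document.
MANUAL = """<manual>
  <chapter id="c1"><title>Setup</title><step>Unpack the unit.</step>
    <step>Connect power.</step></chapter>
</manual>"""

for path, record in index_components(MANUAL).items():
    print(path, record)
```

In principle, each record in the index, down to a single `<step>`, could then be versioned, searched, or access-controlled independently of the document that contains it.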
XML's focus on information rather than on documents offers several important capabilities:
While standard document management systems offer some measure of information reuse through file sharing, information management systems based on XML or SGML let people share common pieces of information without storing copies of each piece in multiple places.
Because these systems let people focus on the information components that make up documents rather than on the documents themselves, they can identify and capture useful components with lasting value that are "buried" inside documents of limited value. A particular document may be useful for only a short time, while chunks of information inside it remain reusable and valuable much longer.
Because the information components in XML documents are identifiable, manipulable, and manageable, XML information management technology can deliver real economies in applications such as the translation of technical manuals. (Look for an article devoted to this subject in the future.)
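One way to picture these translation economies is a simple translation memory keyed by component content. In this hypothetical Python sketch, the text of each translatable component is whitespace-normalized and hashed with SHA-256. A component is sent for translation only if it has not been translated before and has not already appeared in the new manual. The component types in `TRANSLATABLE`, the memory entry, and the sample manual are all assumptions. For brevity, the sketch also ignores mixed content, meaning text that follows child elements.

```python
import hashlib
import xml.etree.ElementTree as ET

TRANSLATABLE = ("title", "step", "warning")  # hypothetical component types

def components(xml_text):
    """Yield the normalized text of each translatable component in document order."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        if elem.tag in TRANSLATABLE and elem.text and elem.text.strip():
            yield " ".join(elem.text.split())  # collapse runs of whitespace

def key(text):
    """Content hash used as the translation-memory key."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def needs_translation(xml_text, memory):
    """Return components not already in memory, each listed only once."""
    pending, seen = [], set()
    for text in components(xml_text):
        k = key(text)
        if k not in memory and k not in seen:
            seen.add(k)
            pending.append(text)
    return pending

# Hypothetical memory carried over from a previously translated manual.
memory = {key("Disconnect power before servicing."):
          "Vor der Wartung die Stromversorgung trennen."}

NEW_MANUAL = """<manual>
  <warning>Disconnect power before servicing.</warning>
  <step>Remove the cover.</step>
  <step>Remove the cover.</step>
  <warning>Disconnect   power before servicing.</warning>
</manual>"""

print(needs_translation(NEW_MANUAL, memory))
```

In this sample, four components collapse to a single new sentence that needs translation. The savings come from the markup making each component identifiable on its own.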
